Commit Graph

71120 Commits

Author SHA1 Message Date
Bindhiya Kanangot Balakrishnan 641fa27699 [SWDEV-566543] Fix param validation in FrequenciesRead test (#2430)
Fixed incorrect error code expectation in FrequenciesRead
test when calling amdsmi_get_gpu_pci_bandwidth() with nullptr
parameter.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-12-23 15:38:25 -08:00
Ioannis Assiouras 49b8900158 SWDEV-558849 - keep the lastEnqueueCommand_ when PAL backend is enabled (#2320) 2025-12-23 21:24:09 +00:00
ammallya c2c4d4c1f5 Revert "Adding full build capability to theROCK for HIP changes (#2003)" (#2441)
This reverts commit 0a52f5c101.

Reverts #2003

MIOpen build failures on windows causing blockers on unrelated file changes.
2025-12-23 13:01:08 -08:00
vedithal-amd 61fd728fdb [rocprofiler-compute] Faster counter accuracy testing (#2420)
* Faster counter accuracy testing

* Better handle SPI_CSN_* metrics for lesser than MI350 series

* Use metric filtering to collect only relevant counters for comparison

* Ensure all workload folders are deleted after testing is completed

* Don't use clean_existing=False

* Add manual test for all counter accuracy
2025-12-23 13:13:53 -05:00
vedithal-amd d7302d6c1c [rocprofiler-compute] Test env. vars. in rocprofiler-sdk backend (#2414)
* Test env. vars. in rocprofiler-sdk backend

* Improve rocprofiler-sdk backend test case to check for env. vars. and
  ensure we do not overwrite irrelevant env. vars.

* Remove unnecessary usage of ROCPROF_INDIVIDUAL_XCC_MODE env. var.

* Formatting fixes

* Test fixes

* Remove redundant code in tests

* Remove usage of utils_mod and use utils instead, this prevents
  duplicate imports
2025-12-23 13:13:28 -05:00
vedithal-amd 588773f9bf [rocprofiler-compute] Fix for multi process workload profiling (#2418)
* Fix for multi process workload profiling

Native counter collection tool updates:
    * Do not dump empty counter data for a process
    * Use PID instead of UUID for dumped csv files to facilitate correlation
    * Handle merging multiple pairs of rocpd (from sdk tool) and csv (from
      native tool) files
    * Handle merging multiple pairs of csv (from sdk tool) and csv (from
      native tool) files

Rocpd output format updates:
    * Merge multiple rocpd databases into a single csv
    * Reset dispatch id and kernel id for unique dispatches and unique
      kernels respectively
    * Retain multiple rocpd databases per run for multi process workloads

* Add test case for multiprocess profiling using rocflop workload

* Add rocflop

* Fix native counter csv to rocprofv3 csv conversion

* Use kernel_id instead of dispatch_id to correlate native counter csv
  and kernel trace csv

* python formatting using ruff 0.14 instead of 0.13
2025-12-23 13:12:18 -05:00
marandje 3e49440495 SWDEV-555178 - Calculate phys mem offset for remap range (#1879) 2025-12-23 10:27:42 +01:00
Milan Radosavljevic 719556fbba [rocprofiler-systems] Add SIGKILL delay option (#2384)
## Motivation

When profiling multi-process applications where a parent process sends SIGKILL to child processes, the termination can occur before the profiler has a chance to flush collected data. This PR introduces a configurable delay before SIGKILL signals are forwarded, allowing profiling data to be captured before process termination. This is a workaround.

## Technical Details

- Added new configuration setting `ROCPROFSYS_KILL_DELAY` (default: 0 seconds) to specify a delay before SIGKILL signals are forwarded to other processes
- Implemented `kill_gotcha` component that intercepts the `kill()` system call
- The gotcha only delays SIGKILL signals sent to external processes (pid > 0 and not self)
- Integrated `kill_gotcha_t` into the `preinit_bundle_t` for early initialization
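
The filtering rule above can be sketched as follows. This is an illustration in Python only; the real component is a gotcha wrapper around `kill()`, and its code is not shown here. The names `configured_delay`, `should_delay`, and `delayed_kill`, as well as the injectable `kill` and `sleep` parameters, are hypothetical. Only `ROCPROFSYS_KILL_DELAY`, its default of 0 seconds, and the "SIGKILL to pid > 0 and not self" rule come from the description.

```python
import os
import signal
import time


def configured_delay(environ=os.environ):
    # ROCPROFSYS_KILL_DELAY is named in the PR; parsing it as float seconds is an assumption.
    return float(environ.get("ROCPROFSYS_KILL_DELAY", "0"))


def should_delay(sig, pid, self_pid, delay_seconds):
    # Only SIGKILL aimed at another, specific process (pid > 0 and not self) is delayed.
    return (
        sig == signal.SIGKILL
        and pid > 0
        and pid != self_pid
        and delay_seconds > 0
    )


def delayed_kill(pid, sig, delay_seconds=0.0, kill=os.kill, sleep=time.sleep):
    # Hypothetical wrapper: wait first so the target can flush its data, then forward the signal.
    if should_delay(sig, pid, os.getpid(), delay_seconds):
        sleep(delay_seconds)
    return kill(pid, sig)
```

Passing `kill` and `sleep` as parameters keeps the sketch testable without sending real signals; with the default delay of 0 the wrapper forwards every signal immediately.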
2025-12-22 21:17:57 -05:00
Young Hui - AMD 37e3b8a3db [rocpd] Write rocpd yaml files as a list, even when only 1 file (#2288) 2025-12-22 17:56:59 -05:00
habajpai-amd 447025011a [Rocprof-Sys] Resolve crash when profiling TensorFlow GPU application (#2381)
* fix: resolve crash when profiling TensorFlow GPU application

* incorporate review comments

* updated min_rows from 3 to 2 for threads table validation as internal threads are not profiled and are now correctly bypassed
2025-12-22 14:00:55 -05:00
Ammar ELWazir 1f8e8e3fbf Add CODEOWNERS for rocprofiler-sdk project (#2427)
## Motivation

Missing CODEOWNERS for ROCProfiler-SDK

## Technical Details

Add CODEOWNERS for rocprofiler-sdk project

## Submission Checklist

- [X] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2025-12-22 12:16:09 -05:00
Gopesh Bhardwaj 9141f26905 [Documentation] updating roctx library linkage documentation (#2251) 2025-12-22 10:36:13 -05:00
ammallya 0a52f5c101 Adding full build capability to theROCK for HIP changes (#2003)
## Add Full Build Capability to theROCK for HIP

### Summary
This PR adds full build support to **theROCK** for HIP-related changes, ensuring that all components are built.

### Changes
- Enabled full build coverage for the following projects:
  - `projects/clr`
  - `projects/hip`
  - `projects/hip-tests`
  - `projects/rocr-runtime`
- Updated build configuration to include all targets for the above projects.
- Ensured rocm-libraries is pulled to build optional components.

### Motivation
These changes are required to support HIP development and testing within theROCK by ensuring all components are built together. This improves reliability and integration testing.
2025-12-22 05:31:32 -08:00
marantic-amd ba1380a75d Put cached perfetto traces as default one (#2138)
* Put cached perfetto traces as default one

* Improve cached data and perfetto traces in order to be more aligned with E2E tests

* Addressing PR comments and findings

* Force early instrumentation bundle instantiation

* Sync-up instrumented containers with thread growth data

* Revert ompvv number of host threads to default 8

* Fixed counter track namings for amd-smi

* AIPROFSYST-34 [rocprof-sys] Update documentation describing newly introduced changes to default tracing mechanism
2025-12-22 12:47:35 +01:00
Aleksandar Djordjevic 7da3275b42 [rocprofiler-systems] Improve metadata parsing (#2238)
* Improve metadata JSON parsing
* Fix string ownership
2025-12-22 12:30:51 +01:00
abchoudh-amd 5b241f3e61 Fixed ctests (#2406) 2025-12-22 13:12:58 +05:30
Geo Min 3635953cd8 Revert "Adding org var and dynamic selection of targets (#2317)" (#2416)
This reverts commit c9ac018395.
2025-12-19 14:51:53 -08:00
cadolphe-amd 14c949a827 SWDEV-572676 - adjust tile size to 32 in Unit_hipCGThreadBlockTileType for Navi4x (#2379)
* SWDEV-572676 - adjust tile size to 32 for Navi4x

* SWDEV-572676 - change tile size from fixed value to warp size
2025-12-19 16:43:34 -05:00
Sourabh U Betigeri d552491985 SWDEV-572329 - Remove barrier packet (#2304) 2025-12-19 13:37:48 -08:00
Sourabh U Betigeri fdc1660dfa SWDEV-565304 - Pass numa node to migrate pages correctly (#1729)
* SWDEV-565304 - Pass cpuId of the thread currently running

* SWDEV-565304 - Numa id to be returned

* SWDEV-565304 - Numa id to be returned
2025-12-19 13:36:53 -08:00
Matt Arsenault 0c0d8dc974 SWDEV-548892 - Stop using __ockl_lane_id (#2186)
__lane_id already exists and is identical.
2025-12-19 20:34:55 +01:00
systems-assistant[bot] 7c989ac022 [SWDEV-525635] Updated output file handling options (#1896)
Currently, if the input file name already exists, the tool
appends output to the existing file. Added overwrite, append,
and no (discard) options to choose from.
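
A minimal sketch of the three choices, assuming a hypothetical `write_output` helper with an `if_exists` parameter (neither name comes from the tool, and the actual option syntax is not shown in this message). The default of "append" mirrors the previous behavior described above.

```python
import tempfile
from pathlib import Path

MODES = ("overwrite", "append", "no")


def write_output(path, text, if_exists="append"):
    # Hypothetical helper: returns True if text was written, False if it was discarded.
    if if_exists not in MODES:
        raise ValueError(f"if_exists must be one of {MODES}")
    path = Path(path)
    if path.exists() and if_exists == "no":
        return False  # "no": keep the existing file and discard the new output
    mode = "w" if if_exists == "overwrite" else "a"
    with path.open(mode, encoding="utf-8") as handle:
        handle.write(text)
    return True


def demo():
    # Exercise all three modes against a temporary file.
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "report.txt"
        write_output(out, "first\n")
        write_output(out, "second\n", if_exists="append")
        skipped = not write_output(out, "ignored\n", if_exists="no")
        appended = out.read_text(encoding="utf-8")
        write_output(out, "fresh\n", if_exists="overwrite")
        return appended, skipped, out.read_text(encoding="utf-8")
```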

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Co-authored-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-12-19 13:10:42 -06:00
habajpai-amd 7b00d3a89b fix: prevent double-free crash during process exit in amd-smi (#2213) 2025-12-19 11:56:40 +05:30
Sourabh U Betigeri 883fdfb820 Revert "clr: Minor fixes for error return" (#2399)
- This reverts commit 8dd8436e43c7f0d062fd73252bf61c35615d181d.
- Resolve MIOpen test failures observed in TheRock
- TheRock Issue: ROCm/TheRock#2642
- rocm-systems issue: #2400
2025-12-18 18:40:13 -05:00
Jason Bonnell 112b4fd413 [rocprofiler-compute] Add SDK dependency to rocprofiler-compute-tarball.yml workflow (#2329)
* Install rocm-dev in rocprofiler-compute-tarball.yml workflow

* Update paths for push and PR for rocprofiler-compute-tarball.yml

* Add ROCm dependencies to disttest job

* cmake fix binary link creation and fix format

* Use python3 instead of python3.9 in RHEL 8 and RHEL 9 workflows

* set default python3 to python3.9 in rhel8

* Try alternatives setup for python3 in RHEL8 env

* Add pip install cmake to debug RHEL8 issue

* Remove python3.11 in RHEL8 workflow

* Add back comment regarding RHEL8

---------

Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
2025-12-18 11:56:23 -05:00
vedithal-amd e4abee4f7d [rocprofiler-compute] Improve iteration multiplexing code and documentation (#2080)
* Improve Iteration multiplexing

* Improve iteration multiplexing documentation by adding usage note and
  listing caveats

* Bugfixes for iteration multiplexing
    * Use merge iteration multiplexing in analysis webui and db mode
    * Do not remove Dispatch_ID column in merge iteration multiplexing
      since it is needed for analysis of top dispatches based on
      duration

* Bugfixes for analysis logic
    * Graceful handling of missing counters in case of iteration
      multiplexing
    * Improved warnings when metrics could not be calculated due to
      missing counter data
    * Fix the check to prevent showing table when a column is full of
      N/A
    * Improve detection of empty values when metric evaluation fails
      due to missing counter data

* Bugfixes for profile logic
    * Fix kernel filtering during roofline benchmark phase

* Update changelog for bugfixes

* Remove unnecessary columns when merging dispatches for iteration multiplexing

* bugfix

* Better analysis warnings

* fix to_std() in parser

* Use median in merge iteration multiplex

* Address review comments

* Fix cmake formatting

* fix None handling of parser util functions

* Enable stochastic counter accuracy test

* fix cmake formatting
2025-12-18 11:51:21 -05:00
Adam Pryor bd6c6852fc [SWDEV-566924] Update KFD_ID metric to use amd-smi instead of rocprof (#2355) 2025-12-18 08:39:19 -06:00
Jatin Chaudhary fdf73116d5 Do not allocate code objects when we map a static code object (#2332) 2025-12-18 09:22:02 +00:00
habajpai-amd b4e04b07ed test: add unit tests for common utilities from PR #1249 (#2237)
* test: add unit tests for common utilities from PR #1249

* incorporate review comments specific to tests formatting

* use filesystem API instead of std::system for safer cleanup

* Add ghc/filesystem submodule v1.5.14 for portable C++17 filesystem support

* fix: add cmake/GhcFilesystem.cmake for CI submodule auto-checkout

* incorporate review comment

* incorporate review comment
2025-12-18 11:03:14 +05:30
Maneesh Gupta 4a9833e70e Revert "Add HasExpertSchedMode device prop (#2241)" (#2371)
This reverts commit c0b4aef5ad.
2025-12-17 21:26:44 -08:00
David Yat Sin 5ebd50c0b4 rocr: Fix asyncHandler segfault (#2261)
Fix initialization order for the async events handler. The polling
thread would launch before the wake signal is initialized.
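
The ordering issue can be illustrated with a small Python sketch. This is not the ROCr code; the class `AsyncEventsHandler` and its members are hypothetical stand-ins, with `threading.Event` playing the role of the wake signal. The point shown is only that the signal is created before the polling thread is started.

```python
import threading


class AsyncEventsHandler:
    """Hypothetical stand-in: a polling thread that sleeps on a wake signal."""

    def __init__(self):
        self.polls = 0
        self._stop = False
        # Create the wake signal first, so the thread never sees it uninitialized.
        self._wake = threading.Event()
        # Only now launch the polling thread, which immediately waits on the signal.
        self._thread = threading.Thread(target=self._poll, daemon=True)
        self._thread.start()

    def _poll(self):
        while True:
            self._wake.wait()
            self._wake.clear()
            if self._stop:
                return
            self.polls += 1

    def notify(self):
        self._wake.set()

    def shutdown(self):
        self._stop = True
        self._wake.set()
        self._thread.join()
```

Swapping the two statements in `__init__` would let `_poll` touch `self._wake` before it exists, which is the kind of failure the fix avoids.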
2025-12-17 20:52:20 -08:00
Shadi Dashmiz 96f6b6e251 SWDEV-571304 : Fix the constructor for __half (#2240)
- comply with cuda

- Fix usecase for constexpr

Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
2025-12-17 11:15:20 -05:00
Filip Jankovic c0b4aef5ad Add HasExpertSchedMode device prop (#2241)
* Add HasExpertSchedMode device prop

* Add unit tests for HasExpertSchedMode

* Add gfx12 check for HasExpertSchedMode prop

* Update gfx major version check and test for ExpertSchedMode

* Minor fix and ROCr version bump

* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h

* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h

* Apply suggestion from @dayatsin-amd

* Apply suggestion from @dayatsin-amd

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
Co-authored-by: David Yat Sin <77975354+dayatsin-amd@users.noreply.github.com>
2025-12-17 17:06:08 +01:00
marantic-amd 79ad00fb15 Scale down memory usage data when the actual data is stored to cache (#2343) 2025-12-17 14:57:41 +01:00
Benjamin Welton e3c051d9b8 [RDC] Optimize RDC counter sampling with greedy packing algorithm (#1590)
* Optimize RDC counter sampling with greedy packing algorithm

This change significantly reduces the number of rocprofiler-sdk sample calls
by implementing a greedy packing algorithm that groups multiple counters into
the minimal number of hardware profiles.

Key improvements:
- Implement greedy packing algorithm to combine counters into minimal profiles
- Add ProfileSet structure to manage packed counter configurations
- Cache packed profile sets for reuse across queries
- Group telemetry field requests by GPU for bulk processing
- Reduce sample calls by ~35% (from 100 to 65 for typical workloads)

Performance impact:
- 13 counters now packed into 3 profiles (77% compression)
- Reduces overhead from profile creation and context switching
- More efficient utilization of hardware counter resources

Implementation details:
- Added create_profiles_for_counters() using greedy algorithm
- Added sample_counters_with_packing() for bulk sampling
- Modified telemetry layer to use rocp_lookup_bulk()
- Preserves all field transformations and special handling

Testing shows successful packing with expected performance gains.
No functional changes to external APIs or behavior.
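
As a rough illustration of the idea, the sketch below packs counters with a first-fit greedy pass. It is not the RDC implementation: the capacity model (each counter belongs to a hardware block with a fixed number of slots per profile) and the names `pack_counters`, `block_of`, and `slots_per_block` are assumptions made for the example. `compression_ratio` reproduces the 13-counters-into-3-profiles figure quoted above and guards the empty case.

```python
from collections import Counter


def pack_counters(counters, block_of, slots_per_block):
    """First-fit greedy packing; assumes every block has at least one slot."""
    profiles = []  # list of lists of counter names
    usage = []     # per-profile count of slots used in each block
    for name in counters:
        block = block_of[name]
        for members, used in zip(profiles, usage):
            if used[block] < slots_per_block[block]:
                members.append(name)
                used[block] += 1
                break
        else:
            # No existing profile has room in this block: open a new profile.
            profiles.append([name])
            usage.append(Counter({block: 1}))
    return profiles


def compression_ratio(n_counters, n_profiles):
    # Fraction of sample calls saved versus one profile per counter.
    if n_counters == 0:
        return 0.0  # avoid dividing by zero when nothing was requested
    return 1.0 - n_profiles / n_counters
```

With 13 counters in 3 profiles, `compression_ratio(13, 3)` is about 0.77, matching the 77% quoted in the message. A first-fit pass is fast but does not guarantee the true minimum number of profiles in general.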

Co-Authored-By: Ben Welton <bwelton@amd.com>

* Address PR review feedback

This commit addresses all review comments from the initial PR:

1. Fix division by zero risk in debug logging
   - Added check for empty counters vector before calculating compression ratio
   - Avoids potential division by zero when logging profile creation stats

2. Improve thread safety for statistics tracking
   - Changed static uint64_t to std::atomic<uint64_t> for thread-safe counters
   - Prevents race conditions in multi-threaded sampling scenarios

3. Remove unused variable
   - Removed unused profile_index variable that was incremented but never used
   - Cleaned up dead code

4. Clean up code formatting
   - Removed extra blank lines for consistency
   - Applied formatting fixes across modified files

5. Refactor code duplication between rocp_lookup and rocp_lookup_bulk
   - Created apply_field_transformation() helper function
   - Eliminates ~70 lines of duplicated switch statement logic
   - Centralizes field transformation logic in single location
   - Makes future maintenance easier

6. Document non-rocprofiler metrics handling
   - Added comments explaining how bulk lookup handles special cases
   - Clarifies that non-profiler fields like KFD_ID are handled in transformation

All changes maintain backward compatibility and pass compilation.

Co-Authored-By: Ben Welton <bwelton@amd.com>

---------

Co-authored-by: Ben Welton <bwelton@amd.com>
Co-authored-by: Adam Pryor <61172547+adam360x@users.noreply.github.com>
2025-12-17 07:56:33 -06:00
abchoudh-amd 6d9d880d31 [rocprofiler-compute] Counter accuracy tests and improvements for iteration multiplexing (#2011)
* Added laplace solver in samples

* Add laplace eqn in CMake

* Added counter accuracy test

* Add iteration CLI arg for laplace eq

* Unnest profile method

* Missing counter warning

* Updated insufficient kernel warning

* Added reference for laplace equation

* variable name change

* Added comments for data comparison

* Included scipy as test requirement

* Added line number for ref

* split stochastic and deterministic tests

* Added order cli option for laplace_eqn

* Install laplace eqn

* Missing counter warning

* Warn about missing kernels during analysis

* Update tests

* Split iteration multiplexing ctests

* Updated warning

* Incorporated copilot's suggestions
2025-12-17 18:26:39 +05:30
xuchen-amd c738e73d99 [rocprofiler-compute][tui] menu bar lag fix (#1942) 2025-12-16 17:02:27 -05:00
amd-juwillia 3a3738ad98 Added AMDSMI CI to rocm-systems (#2074)
Signed-off-by: Justin Williams <Justin.Williams@amd.com>
2025-12-16 13:52:42 -06:00
Geo Min c9ac018395 Adding org var and dynamic selection of targets (#2317) 2025-12-16 10:46:59 -08:00
Joseph Narlo 16f06808d4 [SWDEV-565460] AMD SMI Document Multiple Init Best Practices (#2293)
* [SWDEV-565460] AMD SMI Document Multiple Init Best Practices

Signed-off-by: amd-josnarlo <josnarlo.amd.com>

* Add sphinxcontrib-mermaid to render diagram in HTML

bump rocm-docs-core to 1.31.0
pip-compile requirements.txt

---------

Signed-off-by: amd-josnarlo <josnarlo.amd.com>
Co-authored-by: amd-josnarlo <josnarlo.amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
2025-12-16 11:06:18 -06:00
Aleksandar Djordjevic 0b4a309ff7 [rocprofiler-systems] Add span (#2142)
* Add span
* Update unit tests
2025-12-16 13:38:47 +01:00
Kent Russell 0a2ea9ef55 hsakmt: Expose and use CWSR and Control stack sizes (#2200)
* hsakmt: Expose CWSR and Control stack sizes

This is better than hardcoding values and hoping that they align with
KFD's definitions

Signed-off-by: Kent Russell <kent.russell@amd.com>

* hsakmt: Use CwsrSize and CtlStackSize if available

If KFD is providing the CwsrSize and CtlStackSize, use the maximum
of those and the old calculations for the ctx_save_restore_size
and ctl_stack_size defined in the queue

Signed-off-by: Kent Russell <kent.russell@amd.com>

* hsakmt: Add warning when ABI<1.20 on GFX1151

CwsrSize and CtlStackSize are reported by KFD ABI 1.20. GFX1151
specifically may have some issues if these regions are misaligned, so
report a strong warning during topology initialization if the system is
GFX1151 but is using KFD ABI < 1.20
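
The two rules described in these commits can be sketched as below. The function names and the convention that a missing KFD value is `None` or 0 are assumptions for illustration; the libhsakmt code itself is not shown here. Representing the ABI version as a `(major, minor)` tuple is a deliberate choice in the sketch, so that 1.9 compares as older than 1.20.

```python
def effective_size(kfd_reported, legacy_computed):
    # If KFD provides a size, take the larger of it and the old calculation;
    # otherwise fall back to the old calculation alone.
    if not kfd_reported:
        return legacy_computed
    return max(kfd_reported, legacy_computed)


def needs_abi_warning(gfx_target, abi_version, minimum=(1, 20)):
    # Tuple comparison orders versions correctly: (1, 9) < (1, 20).
    return gfx_target == "gfx1151" and abi_version < minimum
```

For example, `effective_size(8192, 4096)` keeps the KFD-reported value, while `effective_size(None, 4096)` keeps the legacy one.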

Signed-off-by: Kent Russell <kent.russell@amd.com>

---------

Signed-off-by: Kent Russell <kent.russell@amd.com>
2025-12-16 06:26:14 -06:00
Milan Radosavljevic 666e76deac [rocprofiler-systems] Add cached demangler and replace old demangle (#2135)
* Add cached demangler and replace old

* Add unit tests

* Applied suggestions from code review

* Applied suggestions from code review
2025-12-16 08:32:18 +01:00
arvindcheru 21afa807a9 Enable Lintian Support for ROCM-SMI, ROCMINFO (#1650)
* Enable Lintian Support for ROCM-SMI

* Enable Lintian Support for ROCMINFO

* Updated Lintian Override File Processing

* Update UT Fix for Lintian rocmsmi,rocminfo

* Update UT Fixes, Review Comments

* Update Review Comments - removed extra white spaces, added error check for gzip, date commands

* Update Review Comments - Correcting License Type

* Sync Lintian ChangeLog

* Changelog data sync enhanced

* Update Review Comments, UT fix

* white space cleanup - precommit check
2025-12-15 14:35:28 -06:00
randyh62 1240b592a5 Git url fix (#2285)
* Update README-doc.md

Correct GitHub URL for components moved into rocm-systems

* Update amd_clr.rst

Update github.com URLs

* Update Dockerfile

Update rocm-systems paths

* Update CONTRIBUTING.md

update for rocm-systems

* Update CONTRIBUTING.md

minor change

* Update CONTRIBUTING.md

* Update CONTRIBUTING.md

* Update hip_runtime_api.rst

Update for rocm-systems

* Update installation.rst

update URL to libhsakmt

* Update what_is_hip.rst

* Update projects/clr/CONTRIBUTING.md

Co-authored-by: Dominic Widdows <dwiddows@gmail.com>

* Update projects/clr/README-doc.md

Co-authored-by: Dominic Widdows <dwiddows@gmail.com>

* Update Dockerfile

Update git clone for sparse checkout

* Update projects/hip/CONTRIBUTING.md

* Update projects/clr/CONTRIBUTING.md

* Update projects/hipother/CONTRIBUTING.md

---------

Co-authored-by: Dominic Widdows <dwiddows@gmail.com>
2025-12-15 11:57:18 -08:00
Mario Limonciello 08949cb884 Run pre-commit's whitespace related hooks on projects/amdsmi (#2119)
* Run pre-commit's whitespace related hooks on projects/amdsmi

In order for pre-commit to be useful, everything needs to meet a common
baseline.

* Add whitespace back to Changelog for formatting

---------

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-12-15 13:20:47 -06:00
gabrpham 48e57d3e2a Version bump and Changelog update for ROCm version 7.2 (#2201)
* Update projects/amdsmi/CHANGELOG.md
* Bump to 26.2.1
---------

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
2025-12-15 13:19:30 -06:00
Jonathan R. Madsen a71cc3cc88 [rocprofiler-sdk] Optimize rocprofiler-sdk find_clients() (#2267) 2025-12-15 11:47:37 -06:00
systems-assistant[bot] b002c6a739 SWDEV-538607 - Add SIMDe as a build dependency, remove naked intrinsic use. (#500)
Co-authored-by: Alex Voicu <alexandru.voicu@amd.com>
Co-authored-by: Ioannis Assiouras <Ioannis.Assiouras@amd.com>
2025-12-15 17:40:51 +00:00
Ajay GunaShekar b9dc8f729a SWDEV-566268 - skip 2 failing tests on rock Windows (#2308) 2025-12-15 12:32:37 -05:00