Commit graph

1823 Commits

Author SHA1 Message Date
vedithal-amd d7302d6c1c [rocprofiler-compute] Test env. vars. in rocprofiler-sdk backend (#2414)
* Test env. vars. in rocprofiler-sdk backend

* Improve rocprofiler-sdk backend test case to check for env. vars. and
  ensure we do not overwrite irrelevant env. vars.

* Remove unnecessary usage of ROCPROF_INDIVIDUAL_XCC_MODE env. var.

* Formatting fixes

* Test fixes

* Remove redundant code in tests

* Remove usage of utils_mod and use utils instead, this prevents
  duplicate imports
2025-12-23 13:13:28 -05:00
vedithal-amd 588773f9bf [rocprofiler-compute] Fix for multi process workload profiling (#2418)
* Fix for multi process workload profiling

Native counter collection tool updates:
    * Do not dump empty counter data for a process
    * Use PID instead of UUID for dumped csv files to facilitate correlation
    * Handle merging multiple pairs of rocpd (from sdk tool) and csv (from
      native tool) files
    * Handle merging multiple pairs of csv (from sdk tool) and csv (from
      native tool) files

Rocpd output format updates:
    * Merge multiple rocpd databases into a single csv
    * Reset dispatch id and kernel id for unique dispatches and unique
      kernels respectively
    * Retain multiple rocpd databases per run for multi process workloads

* Add test case for multiprocess profiling using rocflop workload

* Add rocflop

* Fix native counter csv to rocprofv3 csv conversion

* Use kernel_id instead of dispatch_id to correlate native counter csv
  and kernel trace csv

* python formatting using ruff 0.14 instead of 0.13
2025-12-23 13:12:18 -05:00
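As a rough illustration of the multi-process handling described in #2418 above, the sketch below merges per-process dispatch rows, skips processes with no data, and renumbers dispatch and kernel ids globally. The row layout, the field names (`pid`, `dispatch_id`, `kernel_name`), and the `merge_process_rows` helper are assumptions made for this example; they are not the tool's actual schema or code.

```python
def merge_process_rows(rows_by_pid):
    """Merge per-process dispatch rows and renumber ids globally.

    rows_by_pid maps a PID to a list of dicts with the hypothetical keys
    "dispatch_id" and "kernel_name".
    """
    merged = []
    kernel_ids = {}  # kernel name -> global kernel id (assumed uniqueness key)
    next_dispatch_id = 0
    for pid in sorted(rows_by_pid):
        rows = rows_by_pid[pid]
        if not rows:
            continue  # nothing was collected for this process
        for row in sorted(rows, key=lambda r: r["dispatch_id"]):
            # Reuse the id of a kernel seen before, otherwise assign the next one.
            kernel_id = kernel_ids.setdefault(row["kernel_name"], len(kernel_ids))
            merged.append(
                {
                    "pid": pid,
                    "dispatch_id": next_dispatch_id,
                    "kernel_id": kernel_id,
                    "kernel_name": row["kernel_name"],
                }
            )
            next_dispatch_id += 1
    return merged
```

Keeping the PID in every merged row is what allows rows from different sources to be correlated afterwards.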
marandje 3e49440495 SWDEV-555178 - Calculate phys mem offset for remap range (#1879) 2025-12-23 10:27:42 +01:00
Milan Radosavljevic 719556fbba [rocprofiler-systems] Add SIGKILL delay option (#2384)
## Motivation

When profiling multi-process applications where a parent process sends SIGKILL to child processes, the termination can occur before the profiler has a chance to flush collected data. This PR introduces a configurable delay before SIGKILL signals are forwarded, allowing profiling data to be captured before process termination. This is a workaround.

## Technical Details

- Added new configuration setting `ROCPROFSYS_KILL_DELAY` (default: 0 seconds) to specify a delay before SIGKILL signals are forwarded to other processes
- Implemented `kill_gotcha` component that intercepts the `kill()` system call
- The gotcha only delays SIGKILL signals sent to external processes (pid > 0 and not self)
- Integrated `kill_gotcha_t` into the `preinit_bundle_t` for early initialization
2025-12-22 21:17:57 -05:00
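The behavior described in #2384 above can be sketched in a few lines. The real component is a wrapper around `kill()` inside rocprofiler-systems; the Python function below only mimics the decision logic (delay SIGKILL sent to another process, forward everything else immediately). The name `delayed_kill` and its parameters are hypothetical, and how `ROCPROFSYS_KILL_DELAY` is read is not shown. It assumes a POSIX system where `signal.SIGKILL` exists.

```python
import os
import signal
import time


def delayed_kill(pid, sig, delay_seconds=0.0, kill_fn=os.kill, sleep_fn=time.sleep):
    """Forward a signal, pausing first when it is SIGKILL aimed at another process."""
    # "External" follows the commit text: pid > 0 and not the calling process.
    is_external = pid > 0 and pid != os.getpid()
    if sig == signal.SIGKILL and is_external and delay_seconds > 0:
        sleep_fn(delay_seconds)  # give the target time to flush its data
    return kill_fn(pid, sig)
```

Passing `kill_fn` and `sleep_fn` as parameters is a choice made for this sketch so the logic can be exercised without sending real signals.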
Young Hui - AMD 37e3b8a3db [rocpd] Write rocpd yaml files as a list, even when only 1 file (#2288) 2025-12-22 17:56:59 -05:00
habajpai-amd 447025011a [Rocprof-Sys] Resolve crash when profiling TensorFlow GPU application (#2381)
* fix: resolve crash when profiling TensorFlow GPU application

* incorporate review comments

* updated min_rows from 3 to 2 for threads table validation as internal threads are not profiled and are now correctly bypassed
2025-12-22 14:00:55 -05:00
Gopesh Bhardwaj 9141f26905 [Documentation] updating roctx library linkage documentation (#2251) 2025-12-22 10:36:13 -05:00
marantic-amd ba1380a75d Put cached perfetto traces as default one (#2138)
* Put cached perfetto traces as default one

* Improve cached data and perfetto traces in order to be more aligned with E2E tests

* Addressing PR comments and findings

* Force early instrumentation bundle instantiation

* Sync-up instrumented containers with thread growth data

* Revert ompvv number of host threads to default 8

* Fixed counter track namings for amd-smi

* AIPROFSYST-34 [rocprof-sys] Update documentation describing newly introduced changes to default tracing mechanism
2025-12-22 12:47:35 +01:00
Aleksandar Djordjevic 7da3275b42 [rocprofiler-systems] Improve metadata parsing (#2238)
* Improve metadata JSON parsing
* Fix string ownership
2025-12-22 12:30:51 +01:00
abchoudh-amd 5b241f3e61 Fixed ctests (#2406) 2025-12-22 13:12:58 +05:30
cadolphe-amd 14c949a827 SWDEV-572676 - adjust tile size to 32 in Unit_hipCGThreadBlockTileType for Navi4x (#2379)
* SWDEV-572676 - adjust tile size to 32 for Navi4x

* SWDEV-572676 - change tile size from fixed value to warp size
2025-12-19 16:43:34 -05:00
Sourabh U Betigeri d552491985 SWDEV-572329 - Remove barrier packet (#2304) 2025-12-19 13:37:48 -08:00
Sourabh U Betigeri fdc1660dfa SWDEV-565304 - Pass numa node to migrate pages correctly (#1729)
* SWDEV-565304 - Pass cpuId of the thread currently running

* SWDEV-565304 - Numa id to be returned

* SWDEV-565304 - Numa id to be returned
2025-12-19 13:36:53 -08:00
Matt Arsenault 0c0d8dc974 SWDEV-548892 - Stop using __ockl_lane_id (#2186)
__lane_id already exists and is identical.
2025-12-19 20:34:55 +01:00
systems-assistant[bot] 7c989ac022 [SWDEV-525635] Updated output file handling options (#1896)
Currently, if the input file name already exists, the tool
appends output to the existing file. Added overwrite, append,
or no (discard) options to choose from.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Co-authored-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-12-19 13:10:42 -06:00
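The three choices named in the commit above (overwrite, append, or no) map naturally onto file open modes. The following is a minimal sketch of that mapping, not the tool's implementation; the function name `open_output`, the `if_exists` parameter, and its string values are assumptions made for illustration.

```python
import os


def open_output(path, if_exists="append"):
    """Return an open text file, or None when existing output should be kept.

    if_exists is one of "overwrite", "append", or "no" (hypothetical names).
    """
    if if_exists not in ("overwrite", "append", "no"):
        raise ValueError(f"unknown option: {if_exists!r}")
    if os.path.exists(path):
        if if_exists == "no":
            return None  # discard new output, leave the file untouched
        mode = "w" if if_exists == "overwrite" else "a"
    else:
        mode = "w"  # nothing to preserve, so every option creates the file
    return open(path, mode, encoding="utf-8")
```

Defaulting to `"append"` mirrors the behavior the commit message describes as the previous one.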
habajpai-amd 7b00d3a89b fix: prevent double-free crash during process exit in amd-smi (#2213) 2025-12-19 11:56:40 +05:30
Sourabh U Betigeri 883fdfb820 Revert "clr: Minor fixes for error return" (#2399)
- This reverts commit 8dd8436e43c7f0d062fd73252bf61c35615d181d.
- Resolve MIOpen test failures observed in TheRock
- TheRock Issue: ROCm/TheRock#2642
- rocm-systems issue: #2400
2025-12-18 18:40:13 -05:00
Jason Bonnell 112b4fd413 [rocprofiler-compute] Add SDK dependency to rocprofiler-compute-tarball.yml workflow (#2329)
* Install rocm-dev in rocprofiler-compute-tarball.yml workflow

* Update paths for push and PR for rocprofiler-compute-tarball.yml

* Add ROCm dependencies to disttest job

* cmake fix binary link creation and fix format

* Use python3 instead of python3.9 in RHEL 8 and RHEL 9 workflows

* set default python3 to python3.9 in rhel8

* Try alternatives setup for python3 in RHEL8 env

* Add pip install cmake to debug RHEL8 issue

* Remove python3.11 in RHEL8 workflow

* Add back comment regarding RHEL8

---------

Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
2025-12-18 11:56:23 -05:00
vedithal-amd e4abee4f7d [rocprofiler-compute] Improve iteration multiplexing code and documentation (#2080)
* Improve Iteration multiplexing

* Improve iteration multiplexing documentation by adding usage note and
  listing caveats

* Bugfixes for iteration multiplexing
    * Use merge iteration multiplexing in analysis webui and db mode
    * Do not remove Dispatch_ID column in merge iteration multiplexing
      since it is needed for analysis of top dispatches based on
      duration

* Bugfixes for analysis logic
    * Graceful handling of missing counters in case of iteration
      multiplexing
    * Improved warnings when metrics could not be calculated due to
      missing counter data
    * Fix the check to prevent showing table when a column is full of
      N/A
    * Improve detection of empty values when metric evaluation fails
      due to missing counter data

* Bugfixes for profile logic
    * Fix kernel filtering during roofline benchmark phase

* Update changelog for bugfixes

* Remove unnecessary columns when merging dispatches for iteration multiplexing

* bugfix

* Better analysis warnings

* fix to_std() in parser

* Use median in merge iteration multiplex

* Address review comments

* Fix cmake formatting

* fix None handling of parser util functions

* Enable stochastic counter accuracy test

* fix cmake formatting
2025-12-18 11:51:21 -05:00
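The commit above mentions using the median when merging iteration-multiplexed data and handling missing counters gracefully. A small sketch of that idea follows. The data layout (one dict of counter values per iteration) and the function name `merge_by_median` are assumptions for this example; the commit does not describe the actual structures.

```python
from statistics import median


def merge_by_median(iterations):
    """Combine per-iteration counter readings into one row using the median.

    iterations is a list of dicts mapping counter name -> value; a counter
    may be absent (or None) in iterations where it was not collected.
    """
    merged = {}
    names = sorted({name for row in iterations for name in row})
    for name in names:
        values = [row[name] for row in iterations if row.get(name) is not None]
        # A counter that was never collected stays None instead of raising.
        merged[name] = median(values) if values else None
    return merged
```

The median is less sensitive than the mean to a single outlier iteration, which is one plausible reason for preferring it here.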
Adam Pryor bd6c6852fc [SWDEV-566924] Update KFD_ID metric to use amd-smi instead of rocprof (#2355) 2025-12-18 08:39:19 -06:00
Jatin Chaudhary fdf73116d5 Do not allocate code objects when we map a static code object (#2332) 2025-12-18 09:22:02 +00:00
habajpai-amd b4e04b07ed test: add unit tests for common utilities from PR #1249 (#2237)
* test: add unit tests for common utilities from PR #1249

* incorporate review comments specific to tests formatting

* use filesystem API instead of std::system for safer cleanup

* Add ghc/filesystem submodule v1.5.14 for portable C++17 filesystem support

* fix: add cmake/GhcFilesystem.cmake for CI submodule auto-checkout

* incorporate review comment

* incorporate review comment
2025-12-18 11:03:14 +05:30
Maneesh Gupta 4a9833e70e Revert "Add HasExpertSchedMode device prop (#2241)" (#2371)
This reverts commit c0b4aef5ad.
2025-12-17 21:26:44 -08:00
David Yat Sin 5ebd50c0b4 rocr: Fix asyncHandler segfault (#2261)
Fix initialization order for the async events handler. The polling
thread would launch before the wake signal is initialized.
2025-12-17 20:52:20 -08:00
Shadi Dashmiz 96f6b6e251 SWDEV-571304 : Fix the constructor for __half (#2240)
- comply with cuda

- Fix usecase for constexpr

Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
2025-12-17 11:15:20 -05:00
Filip Jankovic c0b4aef5ad Add HasExpertSchedMode device prop (#2241)
* Add HasExpertSchedMode device prop

* Add unit tests for HasExpertSchedMode

* Add gfx12 check for HasExpertSchedMode prop

* Update gfx major version check and test for ExpertSchedMode

* Minor fix and ROCr version bump

* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h

* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h

* Apply suggestion from @dayatsin-amd

* Apply suggestion from @dayatsin-amd

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
Co-authored-by: David Yat Sin <77975354+dayatsin-amd@users.noreply.github.com>
2025-12-17 17:06:08 +01:00
marantic-amd 79ad00fb15 Scale down memory usage data when the actual data is stored to cache (#2343) 2025-12-17 14:57:41 +01:00
Benjamin Welton e3c051d9b8 [RDC] Optimize RDC counter sampling with greedy packing algorithm (#1590)
* Optimize RDC counter sampling with greedy packing algorithm

This change significantly reduces the number of rocprofiler-sdk sample calls
by implementing a greedy packing algorithm that groups multiple counters into
the minimal number of hardware profiles.

Key improvements:
- Implement greedy packing algorithm to combine counters into minimal profiles
- Add ProfileSet structure to manage packed counter configurations
- Cache packed profile sets for reuse across queries
- Group telemetry field requests by GPU for bulk processing
- Reduce sample calls by ~35% (from 100 to 65 for typical workloads)

Performance impact:
- 13 counters now packed into 3 profiles (77% compression)
- Reduces overhead from profile creation and context switching
- More efficient utilization of hardware counter resources

Implementation details:
- Added create_profiles_for_counters() using greedy algorithm
- Added sample_counters_with_packing() for bulk sampling
- Modified telemetry layer to use rocp_lookup_bulk()
- Preserves all field transformations and special handling

Testing shows successful packing with expected performance gains.
No functional changes to external APIs or behavior.

Co-Authored-By: Ben Welton <bwelton@amd.com>

* Address PR review feedback

This commit addresses all review comments from the initial PR:

1. Fix division by zero risk in debug logging
   - Added check for empty counters vector before calculating compression ratio
   - Avoids potential division by zero when logging profile creation stats

2. Improve thread safety for statistics tracking
   - Changed static uint64_t to std::atomic<uint64_t> for thread-safe counters
   - Prevents race conditions in multi-threaded sampling scenarios

3. Remove unused variable
   - Removed unused profile_index variable that was incremented but never used
   - Cleaned up dead code

4. Clean up code formatting
   - Removed extra blank lines for consistency
   - Applied formatting fixes across modified files

5. Refactor code duplication between rocp_lookup and rocp_lookup_bulk
   - Created apply_field_transformation() helper function
   - Eliminates ~70 lines of duplicated switch statement logic
   - Centralizes field transformation logic in single location
   - Makes future maintenance easier

6. Document non-rocprofiler metrics handling
   - Added comments explaining how bulk lookup handles special cases
   - Clarifies that non-profiler fields like KFD_ID are handled in transformation

All changes maintain backward compatibility and pass compilation.

Co-Authored-By: Ben Welton <bwelton@amd.com>

---------

Co-authored-by: Ben Welton <bwelton@amd.com>
Co-authored-by: Adam Pryor <61172547+adam360x@users.noreply.github.com>
2025-12-17 07:56:33 -06:00
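To make the packing idea in #1590 above concrete, here is a hedged sketch of one greedy strategy (first-fit decreasing) together with the compression-ratio calculation and its empty-input guard. The cost model is an assumption: each counter is given a slot cost and each profile a fixed slot capacity, whereas the real constraints come from the hardware and rocprofiler-sdk and are not modeled here. The names `pack_counters` and `compression_ratio` are hypothetical and do not correspond to the functions listed in the commit.

```python
def pack_counters(counter_costs, capacity):
    """Greedily pack counters into profiles using first-fit decreasing.

    counter_costs maps counter name -> slots needed; capacity is the number
    of slots per profile. Both are simplifying assumptions.
    """
    profiles = []  # each entry: [remaining_capacity, [counter names]]
    # Place the most expensive counters first; break ties by name.
    ordered = sorted(counter_costs.items(), key=lambda kv: (-kv[1], kv[0]))
    for name, cost in ordered:
        if cost > capacity:
            raise ValueError(f"{name} does not fit in any profile")
        for profile in profiles:
            if profile[0] >= cost:
                profile[0] -= cost
                profile[1].append(name)
                break
        else:
            # No existing profile had room, so open a new one.
            profiles.append([capacity - cost, [name]])
    return [names for _, names in profiles]


def compression_ratio(num_counters, num_profiles):
    """Fraction of per-counter profiles avoided; safe for an empty counter list."""
    if num_counters == 0:
        return 0.0
    return 1.0 - num_profiles / num_counters
```

With the figures quoted in the commit, 13 counters in 3 profiles gives 1 - 3/13, or about 77%. A greedy heuristic like this is not guaranteed to find the true minimum number of profiles in general.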
abchoudh-amd 6d9d880d31 [rocprofiler-compute] Counter accuracy tests and improvements for iteration multiplexing (#2011)
* Added laplace solver in samples

* Add laplace eqn in CMake

* Added counter accuracy test

* Add iteration CLI arg for laplace eq

* Unnest profile method

* Missing counter warning

* Updated insufficient kernel warning

* Added reference for laplace equation

* variable name change

* Added comments for data comparison

* Included scipy as test requirement

* Added line number for ref

* split stochastic and deterministic tests

* Added order cli option for laplace_eqn

* Install laplace eqn

* Missing counter warning

* Warn about missing kernels during analysis

* Update tests

* Split iteration multiplexing ctests

* Updated warning

* Incorporated copilot's suggestions
2025-12-17 18:26:39 +05:30
xuchen-amd c738e73d99 [rocprofiler-compute][tui] menu bar lag fix (#1942) 2025-12-16 17:02:27 -05:00
amd-juwillia 3a3738ad98 Added AMDSMI CI to rocm-systems (#2074)
Signed-off-by: Justin Williams <Justin.Williams@amd.com>
2025-12-16 13:52:42 -06:00
Joseph Narlo 16f06808d4 [SWDEV-565460] AMD SMI Document Multiple Init Best Practices (#2293)
* [SWDEV-565460] AMD SMI Document Multiple Init Best Practices

Signed-off-by: amd-josnarlo <josnarlo.amd.com>

* Add sphinxcontrib-mermaid to render diagram in HTML

bump rocm-docs-core to 1.31.0
pip-compile requirements.txt

---------

Signed-off-by: amd-josnarlo <josnarlo.amd.com>
Co-authored-by: amd-josnarlo <josnarlo.amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
2025-12-16 11:06:18 -06:00
Aleksandar Djordjevic 0b4a309ff7 [rocprofiler-systems] Add span (#2142)
* Add span
* Update unit tests
2025-12-16 13:38:47 +01:00
Kent Russell 0a2ea9ef55 hsakmt: Expose and use CWSR and Control stack sizes (#2200)
* hsakmt: Expose CWSR and Control stack sizes

This is better than hardcoding values and hoping that they align with
KFD's definitions

Signed-off-by: Kent Russell <kent.russell@amd.com>

* hsakmt: Use CwsrSize and CtlStackSize if available

If KFD is providing the CwsrSize and CtlStackSize, use the maximum
of those and the old calculations for the ctx_save_restore_size
and ctl_stack_size defined in the queue

Signed-off-by: Kent Russell <kent.russell@amd.com>

* hsakmt: Add warning when ABI<1.20 on GFX1151

CwsrSize and CtlStackSize are reported by KFD ABI 1.20. GFX1151
specifically may have some issues if these regions are misaligned, so
report a strong warning during topology initialization if the system is
GFX1151 but is using KFD ABI < 1.20

Signed-off-by: Kent Russell <kent.russell@amd.com>

---------

Signed-off-by: Kent Russell <kent.russell@amd.com>
2025-12-16 06:26:14 -06:00
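The selection rule in the second part of #2200 above (use the maximum of the KFD-provided value and the old calculation, when the KFD value is available) can be written as a one-line decision. This is only a sketch: the function name `effective_size` is hypothetical, and treating `None` or `0` as "not provided" is an assumption, not something the commit states.

```python
def effective_size(legacy_size, kfd_size=None):
    """Pick the larger of the legacy computed size and the KFD-reported one.

    kfd_size is None (or 0) when the kernel driver does not report a value;
    that convention is an assumption of this sketch.
    """
    if not kfd_size:
        return legacy_size
    return max(legacy_size, kfd_size)
```

Taking the maximum means a driver-reported value can only grow the allocation, never shrink it below what the older calculation produced.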
Milan Radosavljevic 666e76deac [rocprofiler-systems] Add cached demangler and replace old demangle (#2135)
* Add cached demangler and replace old

* Add unit tests

* Applied suggestions from code review

* Applied suggestions from code review
2025-12-16 08:32:18 +01:00
arvindcheru 21afa807a9 Enable Lintian Support for ROCM-SMI, ROCMINFO (#1650)
* Enable Lintian Support for ROCM-SMI

* Enable Lintian Support for ROCMINFO

* Updated Lintian Override File Processing

* Update UT Fix for Lintian rocmsmi,rocminfo

* Update UT Fixes, Review Comments

* Update Review Comments - removed extra white spaces, added error check for gzip, date commands

* Update Review Comments - Correcting License Type

* Sync Lintian ChangeLog

* Changelog data sync enhanced

* Update Review Comments, UT fix

* white space cleanup - precommit check
2025-12-15 14:35:28 -06:00
randyh62 1240b592a5 Git url fix (#2285)
* Update README-doc.md

Correct GitHub URL for components moved into rocm-systems

* Update amd_clr.rst

Update github.com URLs

* Update Dockerfile

Update rocm-systems paths

* Update CONTRIBUTING.md

update for rocm-systems

* Update CONTRIBUTING.md

minor change

* Update CONTRIBUTING.md

* Update CONTRIBUTING.md

* Update hip_runtime_api.rst

Update for rocm-systems

* Update installation.rst

update URL to libhsakmt

* Update what_is_hip.rst

* Update projects/clr/CONTRIBUTING.md

Co-authored-by: Dominic Widdows <dwiddows@gmail.com>

* Update projects/clr/README-doc.md

Co-authored-by: Dominic Widdows <dwiddows@gmail.com>

* Update Dockerfile

Update git clone for sparse checkout

* Update projects/hip/CONTRIBUTING.md

* Update projects/clr/CONTRIBUTING.md

* Update projects/hipother/CONTRIBUTING.md

---------

Co-authored-by: Dominic Widdows <dwiddows@gmail.com>
2025-12-15 11:57:18 -08:00
Mario Limonciello 08949cb884 Run pre-commit's whitespace related hooks on projects/amdsmi (#2119)
* Run pre-commit's whitespace related hooks on projects/amdsmi

In order for pre-commit to be useful, everything needs to meet a common
baseline.

* Add whitespace back to Changelog for formatting

---------

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
2025-12-15 13:20:47 -06:00
gabrpham 48e57d3e2a Version bump and Changelog update for ROCm version 7.2 (#2201)
* Update projects/amdsmi/CHANGELOG.md
* Bump to 26.2.1
---------

Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
2025-12-15 13:19:30 -06:00
Jonathan R. Madsen a71cc3cc88 [rocprofiler-sdk] Optimize rocprofiler-sdk find_clients() (#2267) 2025-12-15 11:47:37 -06:00
systems-assistant[bot] b002c6a739 SWDEV-538607 - Add SIMDe as a build dependency, remove naked intrinsic use. (#500)
Co-authored-by: Alex Voicu <alexandru.voicu@amd.com>
Co-authored-by: Ioannis Assiouras <Ioannis.Assiouras@amd.com>
2025-12-15 17:40:51 +00:00
Ajay GunaShekar b9dc8f729a SWDEV-566268 - skip 2 failing tests on rock Windows (#2308) 2025-12-15 12:32:37 -05:00
Matt Arsenault 49565f9d9f SWDEV-548892 - Always declare used ocml and ockl device libs functions (#2230)
Ignore __CLANG_HIP_RUNTIME_WRAPPER_INCLUDED__. This should not be relying
on declarations from the clang builtin headers. There is no issue declaring
the same intrinsics multiple times. This will enable removal of declarations
from the clang builtin headers.
2025-12-15 17:23:33 +01:00
Fábio Mestre 447beeb00b Replace usages of __ockl_gws_init with __builtin_amdgcn_ds_gws_init (#2235) 2025-12-15 16:56:14 +01:00
jamessiddeley-amd 706a8382a5 [rocprof-compute] added graceful exit with corrupt roofline.csv in profile and analyze mode (#1811)
* added graceful errors/exit in profile/analyze roofline.csv

* edit if statement truth

* restore if statement truth (roofline_csv needs at least 2 rows)

* addressed comments and skipped showing roof metrics when data invalid

* fix workload merge

* changed warning to error

* removed redundant variable definition

* added roofline csv validate check in TUI

* add test cases to test validation function

* ruff format

* simplified TUI roofline handling
2025-12-12 17:06:37 -05:00
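A validation step like the one described in #1811 above might look roughly as follows. The commit notes that the roofline csv needs at least 2 rows; this sketch assumes that means a header plus one data row, and additionally assumes every row should have as many fields as the header. The function name `validate_roofline_csv` and the returned messages are hypothetical.

```python
import csv
import io


def validate_roofline_csv(text):
    """Return (ok, message) for roofline CSV text.

    Assumes a valid file has a header plus at least one data row (2 rows)
    and that every row has as many fields as the header.
    """
    try:
        # Blank lines parse as empty rows and are ignored.
        rows = [row for row in csv.reader(io.StringIO(text)) if row]
    except csv.Error as exc:
        return False, f"unreadable csv: {exc}"
    if len(rows) < 2:
        return False, "expected a header and at least one data row"
    width = len(rows[0])
    for index, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            return False, f"row {index} has {len(row)} fields, expected {width}"
    return True, "ok"
```

Returning a message instead of raising lets the caller decide whether to exit, log an error, or skip the roofline output.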
jamessiddeley-amd 81720183ad [rocprof-compute] Merge CDash Nightly and Continuous workflow files (#2279)
* merged code-coverage and continuous workflow files

* fixed runner typos and added build mode

* add actor name to Continuous build

* improve error handling and remove redundant verbose

* fixed workflow file log output

* revert logs output in run_ci.py

* ruff format
2025-12-12 17:04:56 -05:00
Dominic Widdows 9a8ed9f45d Doc updates updating internal links from deprecated repos to rocm-systems project locations (#2294)
* Update README documentation links for clarity and consistency across projects

- Changed links in the README files for `clr`, `hipother`, and `hip-tests` to use relative paths instead of absolute URLs, improving navigation within the repository.

* Update CONTRIBUTING documentation to use relative links for improved navigation

- Changed absolute URLs to relative paths in the CONTRIBUTING.md files for the hip and hipother projects, enhancing consistency and ease of access within the repository.
2025-12-12 13:21:42 -08:00
Ajay GunaShekar 0bb5638481 Rock: hip-tests installation path to remain same for linux/windows (#2187)
* Rock: hip-tests installation path remains same for linux and windows

On TheRock, the installation path is the same on Linux and Windows:
share/hip/catch_tests

On internal Windows builds, hip-tests is installed to catch_tests;
a flag passed internally controls the path.
2025-12-12 08:45:28 -08:00
vedithal-amd 4870725a62 Do not use absolute python path when adding tests (#2282) 2025-12-12 10:57:19 -05:00
vedithal-amd 793732a04e [rocprofiler-compute] Improve amdsmi interface (#2245)
* Improve amdsmi interface

* Fix issue where max mem clock was being set as max gfx clock

* Handle the case when all device handles might not be usable due to
  devices being hidden by ROCR and HIP environment variables

* Fix get gpu vram size to return str in KB

* Improve testing of amdsmi interface functions
2025-12-12 09:02:37 -05:00