Commit Graph

2066 Commits

Author SHA1 Message Date
Flora Cui be04fa8250 rocr: reorder HsaNodeProperties to improve compatibility (#2447)
Signed-off-by: Flora Cui <flora.cui@amd.com>
2026-01-08 09:56:39 +08:00
David Galiffi cb17e59a57 [rocprofiler-systems] Improve build time by refactoring RCCL test cmake (#1656)
Improve cmake configuration time by making sure the rccl-tests are built during the build phase rather than the configuration phase.
2026-01-07 19:51:54 -05:00
anujshuk-amd c35a7dd8cb [rocprofiler-systems] Update timemory submodule (#2440)
- Fixes SWDEV-559349 
- Fix build failure caused by correct libunwind not being found in some environments.
- Updated the `timemory` submodule to commit `24407d37ab85c46ba6c18fba9498320f825ee4e4`.
2026-01-07 19:35:23 -05:00
Ajay GunaShekar 95ab459a4c Use static catch2.lib instead of catch2.dll (#2419)
* Use static catch2.lib instead of catch2.dll

Using catch2.dll increases execution time by 12x

* handle debug option for static catch2

* SWDEV-573539 - skip atomics on Windows since it's taking a very long time to execute

mlsejenkins needs newer cmake but compiler breaks with newer versions
so skipping on windows can be a workaround for now

---------

Co-authored-by: Joseph Macaranas <145489236+jayhawk-commits@users.noreply.github.com>
2026-01-07 14:35:25 -08:00
Alysa Liu 5be4fddf06 kfdtest: Support blit kernel copy (#677)
Add support for blit kernel copy.
Add GpuMemCopyTest test for KFDQMTest.
2026-01-07 16:48:11 -05:00
Aleksandar Djordjevic aecea25a61 [rocprofiler-systems] CMake Cleanup (#2455)
## Technical Details

- Removed `configure_file()` call that was generating `defines.hpp` from `defines.hpp.in` and update CMake file to reference renamed file.
- Remove duplicate `find_library(pthread_LIBRARY NAMES pthread pthreads)`
2026-01-07 14:07:37 -05:00
anujshuk-amd 596ffce5fe [rocprof-sys] Fix segfault from thread ID array overflow (#2172)
**Thread limit configuration and enforcement:**

* Added a check in `CMakeLists.txt` to ensure `ROCPROFSYS_MAX_THREADS` is at least 128, automatically setting it to 128 with a warning if a lower value is provided.
* Replaced hardcoded thread limit (`allowed_max_threads`) in `pthread_create_gotcha.cpp` with the configurable `ROCPROFSYS_MAX_THREADS` value, ensuring all runtime checks and warnings use the actual configured limit.

**Documentation improvements:**

* Updated the development guide to explain the new thread limit behavior, including how exceeding the limit is handled gracefully, how to configure it, and the build-time validation rules.

**Test updates:**

* Modified thread limit tests to use the configurable `ROCPROFSYS_MAX_THREADS` value instead of a hardcoded limit and expanded the range of tested thread values.
* Increased test timeouts to accommodate larger thread counts and ensure reliability with higher limits.
2026-01-07 14:03:37 -05:00
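The thread-limit behavior described in #2172 can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `kMinThreads`, `effective_max_threads`, and `thread_slots` are hypothetical names, and the real limit comes from the `ROCPROFSYS_MAX_THREADS` build setting.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical constant: the floor of 128 is taken from the commit message.
constexpr std::size_t kMinThreads = 128;

// Mirrors the build-time rule: a requested limit below the floor is raised to it.
constexpr std::size_t effective_max_threads(std::size_t requested)
{
    return std::max(requested, kMinThreads);
}

// Hypothetical fixed-capacity table of thread IDs. Once the limit is reached,
// acquire() reports failure instead of writing past the end of the array.
template <std::size_t MaxThreads>
class thread_slots
{
public:
    // Returns the assigned slot index, or std::nullopt when the table is full.
    std::optional<std::size_t> acquire()
    {
        if(m_next >= MaxThreads) return std::nullopt;  // over the limit: refuse, do not overflow
        m_ids[m_next] = m_next;
        return m_next++;
    }

private:
    std::array<std::size_t, MaxThreads> m_ids{};
    std::size_t                         m_next = 0;
};
```

A caller that receives `std::nullopt` would presumably emit a warning and leave the extra thread unprofiled; that handling is an assumption here.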
vedithal-amd 050e88ee71 Remove unused python packages (#2437)
* Remove dependency on following unused python packages by updating
  requirements.txt, LICENSE, standalone binary requirements, cmake and
  docker requirements
    * matplotlib
    * kaleido
    * pymongo
    * colorlover
    * tqdm

* Remove unused code from src/utils/gui.py

* Reformat python using ruff
2026-01-07 09:03:49 -05:00
Godavarthy Surya, Anusha 1ef6a86ee3 SWDEV-549711 - Improve graph DEBUG dot print for segments (#2205)
Co-authored-by: Anusha GodavarthySurya<agodavar@amd.com>
2026-01-07 14:07:49 +05:30
Stella Laurenzo 81eed26ec6 [amdsmi] Add include dirs for libdrm. (#2504)
This has started failing on various developer build systems. Looking at it, it is not precisely clear how this ever worked given that nothing appears to be adding the DRM include dirs.

I'd prefer that we remove this delay loading (at least for TheRock builds where it is never needed), but in the meantime, this does fix the issue and is verified on an affected system.

Fixes https://github.com/ROCm/TheRock/issues/2744
2026-01-06 15:18:20 -08:00
Yazen AL Musaffar cb372748f8 [ROCM-SMI] [SWDEV-569731] rsmi tests failing on Frequency/Power/GpuMetrics ReadOnly Fix (#2303)
* Updated unsupported metric version file for rocm_smi_tests Frequency/Power/GpuMetrics ReadOnly tests

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>
2026-01-06 16:46:38 -06:00
Gerardo Hernandez 50644f5aef SWDEV-508225 remove assertions when loading fat binary (#2013)
* SWDEV-508225 - do not assert() after calling digestFatBinary() if it fails. Otherwise assertions trigger easily on systems that have both an APU and a discrete GPU when the code was compiled for the discrete one

* SWDEV-508225 - fix that when using a non-existent ordinal in HIP_VISIBLE_DEVICES, getCurrentArch() would crash
2026-01-06 21:53:32 +00:00
Daniel Oliveira 32fde0f73d [SWDEV-568613] Add gpu_metrics 1.0 support for older GPUs (#2444)
fix: Add gpu_metrics 1.0 support which is still used by some hardware

Code changes related to the following:
  * APIs
  * Unit tests

Signed-off-by: Oliveira, Daniel <daniel.oliveira@amd.com>
2026-01-06 14:25:13 -06:00
systems-assistant[bot] c6b7448227 Add support for get and set APIs for CPUISOFreqPolicy and DFCState Co… (#1901)
* Add support for get and set APIs for CPUISOFreqPolicy and DFCState Control

  - Add support for get and set APIs for CPUISOFreqPolicy and DFCState Control
    in AMD SMI and also in the CLI tool

* CHANGELOG.md file updated

* SWDEV-562837: Update amdsmi-py-api.md as per the new APIs

Updated amdsmi-py-api.md as per the new APIs added.

---------

Signed-off-by: Soumya <sranjanr@amd.com>
Signed-off-by: gabrpham <Gabriel.Pham@amd.com>
Co-authored-by: Saka Sitharammurthy <SitharamMurthy.Saka@amd.com>
2026-01-06 10:37:07 -06:00
SakaSitharammurthy 6c98c49362 [SWDEV-568731] Updated example code in amdsmi-py-api.md file (#2311)
Addresses:
- SWDEV-568731
- SWDEV-568724
- SWDEV-568695

Signed-off-by: Saka, SitharamMurthy <SitharamMurthy.Saka@amd.com>
2026-01-06 10:34:36 -06:00
pghoshamd 637b0d71f0 SWDEV-569319 Replace ScopedAcquire with stdcpp wrappers (#2146)
* SWDEV-569319 Replace ScopedAcquire with stdcpp wrappers

* Remove KernelMutex and KernelSharedMutex abstractions with std::mutex and std::shared_mutex

* Replaced unique_locks with lock_guards

* More changes

* Replace new and deletes with smart pointers

* Replaced some more with shared ptrs

* Replacements with smart pointers - pt 2

* missed change
2026-01-06 10:59:34 -05:00
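As a rough illustration of the pattern #2146 moves toward (standard mutexes, scoped lock guards, and smart pointers), consider the sketch below. The `registry` class is hypothetical and exists only to show the locking idiom; it is not taken from the ROCR sources.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Hypothetical registry guarded by a std::shared_mutex.
class registry
{
public:
    // Writers take the mutex exclusively; std::lock_guard releases it at scope exit.
    void insert(const std::string& key, int value)
    {
        std::lock_guard<std::shared_mutex> lock(m_mutex);
        m_items[key] = std::make_unique<int>(value);  // owned by the map, no manual delete
    }

    // Readers take the mutex in shared mode, so lookups can run concurrently.
    bool lookup(const std::string& key, int& out) const
    {
        std::shared_lock<std::shared_mutex> lock(m_mutex);
        auto it = m_items.find(key);
        if(it == m_items.end()) return false;
        out = *it->second;
        return true;
    }

    std::size_t size() const
    {
        std::shared_lock<std::shared_mutex> lock(m_mutex);
        return m_items.size();
    }

private:
    mutable std::shared_mutex                             m_mutex;
    std::unordered_map<std::string, std::unique_ptr<int>> m_items;
};
```

`std::lock_guard` fits here because the lock is never released early or moved; `std::unique_lock` is only needed when that flexibility is required.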
vedithal-amd e005f8487b [rocprofiler-compute] Add gfx arch. based pre-processor guards and runtime checks in rocflop.cpp (#2487)
* Remove MFMA functionality in rocflop sample since it's not supported in MI50

* Add gfx arch based support for MFMA and SMFMAC in rocflop.cpp

* Add --int32 usage doc

* Address review comments
2026-01-06 10:17:54 -05:00
Jonathan R. Madsen 7fcea905f3 [rocprofiler-sdk] Fix double-buffering emplace and flush synchronization (#2334)
* Fix buffer tracing synchronization lock

- PR #529 (in rocprofiler-sdk-internal) introduced waiting on the syncer flag when emplacing in a buffer, to prevent overwriting buffer records that are currently being processed in a buffer flush callback
- The above fix blocked both buffers while a buffer flush callback was executing, instead of blocking only the buffer being flushed.

* Add rocpd tests for duplicate records

* Address code review comments
2026-01-06 06:06:18 -06:00
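The idea behind #2334, blocking only the buffer that is being flushed, might look like the sketch below. This is an assumption-laden illustration: `double_buffer` and its members are hypothetical, records are plain `int`s, and the real rocprofiler-sdk implementation relies on its own syncer flag rather than these mutexes.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <functional>
#include <mutex>
#include <vector>

// Hypothetical two-slot buffer with one lock per slot.
class double_buffer
{
public:
    // Append a record to whichever slot is currently active.
    void emplace(int record)
    {
        for(;;)
        {
            const std::size_t           idx = m_active.load();
            std::lock_guard<std::mutex> lock(m_locks[idx]);
            // A flush may have swapped slots while we waited for the lock;
            // retry so the record lands in the slot that is active now.
            if(idx != m_active.load()) continue;
            m_slots[idx].push_back(record);
            return;
        }
    }

    // Swap the active slot first, then process the old slot under its own lock.
    // Writers that pick up the new active slot are not blocked by the callback.
    std::size_t flush(const std::function<void(const std::vector<int>&)>& callback)
    {
        std::lock_guard<std::mutex> flush_lock(m_flush);  // one flush at a time
        const std::size_t           old = m_active.load();
        m_active.store(1 - old);
        std::lock_guard<std::mutex> lock(m_locks[old]);
        callback(m_slots[old]);
        const std::size_t count = m_slots[old].size();
        m_slots[old].clear();
        return count;
    }

private:
    std::array<std::vector<int>, 2> m_slots{};
    std::array<std::mutex, 2>       m_locks{};
    std::mutex                      m_flush;
    std::atomic<std::size_t>        m_active{ 0 };
};
```

Only a writer that read the old slot index just before the swap waits for the flush to finish; it then retries against the new active slot, so records being processed by the callback are never overwritten.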
habajpai-amd 9e4d1c31c7 fix: prevent static initialization deadlock in thread_data (#2474)
* fix: prevent static initialization deadlock in thread_data

* update comment
2026-01-06 16:39:32 +05:30
Jason Bonnell 1d5a6e9bfe Update rocprofiler workflows to use new mi325 runner names (#2467)
* Update rocprofiler workflows to use new runner naming for mi325

* Add input options to workflow_dispatch for rocprofiler-systems CI workflow

* Update runner name on therock-ci-linux.yml as well
2026-01-05 15:41:01 -05:00
AidanBeltonS 39d8432893 SWDEV-566854 - Improve memory object handling (#1939)
* Improve memory object handling for memcpy

* update

* Pass offsets and make hip_graph changes

* Update projects/clr/hipamd/src/hip_memory.cpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Remove unnecessary command overload

* Update based on feedback

* Fix failing hipGraphTests

* Fix graph bugs

* Fix failing memcpy tests

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-05 18:05:56 +00:00
Benjamin Welton 7871f53563 Add gfx950 support to ValuPipeIssueUtil counter (#2396)
Add gfx950 (MI350) to the ValuPipeIssueUtil counter definition to
enable RDC_FI_PROF_VALU_PIPE_ISSUE_UTIL telemetry field support on
MI350 hardware.
2026-01-05 09:37:34 -08:00
Julia Jiang 88f4bb1988 SWDEV-564412 - fix test failure on hipSetValidDevices_with_hipMemcpyPeer (#2150) 2026-01-05 12:36:31 -05:00
Julia Jiang 0f0504d79d SWDEV-564412-Fix soft hang in HIP sub-test hipMemVmm_Uncached (#2223) 2026-01-05 12:36:08 -05:00
Julia Jiang 3568e0df02 SWDEV-563487 - Fix catch tests failures on Windows (#2097) 2026-01-05 12:35:41 -05:00
Shadi Dashmiz 2789ea429a SWDEV-565300: Fix coherency range mode in mem pool pointers (#2296)
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
2026-01-05 11:33:11 -05:00
jamessiddeley-amd 53fd27c0ed [rocprofiler-compute] Improve roofline logging for roofline.csv (#2390)
* enhanced roofline log output for graceful exit

* addressed comment, added block filtering

* ruff format
2026-01-02 14:41:28 -05:00
Swati Rawat 3f004c9237 Update using-rocprofv3-with-openmp.rst (#2473) 2026-01-02 22:29:39 +05:30
Sv. Lockal afaa412d9d [rocprofiler-register] Fix compilation with libc++ (#1241)
`tests/rocprofiler/rocprofiler.cpp` uses `std::string` without including `<string>` directly.
This works with libstdc++ due to transitive includes, but fails with libc++.

Closes #1240
2026-01-02 22:26:56 +05:30
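The fix in #1241 follows the usual include-what-you-use rule: a file that names `std::string` should include `<string>` itself rather than rely on another header pulling it in. A minimal sketch, with `make_label` as a hypothetical helper that does not appear in the actual test file:

```cpp
#include <string>  // included directly because std::string is used below

// Hypothetical helper that builds a label such as "agent-3".
std::string make_label(const std::string& name, int id)
{
    return name + "-" + std::to_string(id);
}
```

With the direct include, the file no longer depends on which other headers a given standard library happens to include transitively.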
Ioannis Assiouras aecc845456 SWDEV-573589 - Fixed performance regression due to the increase of the signal pool (#2470) 2026-01-02 12:50:56 +00:00
Joseph Narlo 03f714dd25 [SWDEV-567254] Sync Unified and Linux header (#2220)
* [SWDEV-567254] Sync Unified and Linux header

Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>

* Latest sync changes

* Sync

* Add back guest_windows tag

* Sync

---------

Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
Co-authored-by: amd-josnarlo <josnarlo.amd.com>
2025-12-30 13:27:55 -06:00
vedithal-amd ca32193c84 Fix test cases (#2462) 2025-12-30 11:39:20 -05:00
Jimbo a59d46ffbf SWDEV-567545 - Implement block_rank in co-op grid groups (#2182)
* SWDEV-567545 - Implement block_rank in co-op grid groups
2025-12-29 11:39:23 -05:00
Adam Pryor 5bf6e366dd [SWDEV-548460] Add RDC Policy Reset Message (#2180)
* [SWDEV-548460] Add RDC Policy Reset Message

* [rdc] Bump version to 1.3.0

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

* chore: [rdc] Format CMakeLists.txt

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>

---------

Signed-off-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
Co-authored-by: Galantsev, Dmitrii <dmitrii.galantsev@amd.com>
2025-12-29 08:31:13 -08:00
German Andryeyev 741b4b9fdf SWDEV-558849 - Fix Windows build for ROCR backend (#2368) 2025-12-29 08:35:22 -05:00
vedithal-amd ea3fb1b810 Remove SMFMAC functionality in rocflop sample since it's not supported in MI100 (#2456) 2025-12-27 09:47:54 -05:00
vedithal-amd 9c1560b8bb [rocprofiler-compute] Fix merging logic for multi process (#2445)
* Fix merging logic for multi process

* Fix dispatch id reset logic in case of rocpd format

* Fix kernel id reset logic in case of csv format

* Revert correlation logic change in csv format

* Do inner join instead of left join
2025-12-27 09:47:42 -05:00
abchoudh-amd 983386e40b [rocprofiler-compute] Write raw counter and metric values (#2314)
* Added tool for dumping counter and metric values

* Skip Linting

* Added support for iteration multiplexing

* Remove subparser and suppress compute options

* Specify output dir

* Add kernel info

* csv name change

* Added comments

* Support dispatch id-less dataframes

* Formatting fix

* Add default for path

* Print help with no args

* Support only single workload
2025-12-26 14:06:57 +05:30
marantic-amd bb83791b17 Remove redundant ROCPROFSYS_TRACE_CACHED variable from the code (#2434) 2025-12-25 13:36:04 +01:00
marantic-amd c3132773c8 Fix agent device ID in the cached kernel_dispatch trace (#2452) 2025-12-25 10:23:16 +01:00
Bindhiya Kanangot Balakrishnan 641fa27699 [SWDEV-566543] Fix param validation in FrequenciesRead test (#2430)
Fixed incorrect error code expectation in FrequenciesRead
test when calling amdsmi_get_gpu_pci_bandwidth() with nullptr
parameter.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2025-12-23 15:38:25 -08:00
Ioannis Assiouras 49b8900158 SWDEV-558849 - keep the lastEnqueueCommand_ when PAL backend is enabled (#2320) 2025-12-23 21:24:09 +00:00
vedithal-amd 61fd728fdb [rocprofiler-compute] Faster counter accuracy testing (#2420)
* Faster counter accuracy testing

* Better handle SPI_CSN_* metrics for lesser than MI350 series

* Use metric filtering to collect only relevant counters for comparison

* Ensure all workload folders are deleted after testing is completed

* Don't use clean_existing=False

* Add manual test for all counter accuracy
2025-12-23 13:13:53 -05:00
vedithal-amd d7302d6c1c [rocprofiler-compute] Test env. vars. in rocprofiler-sdk backend (#2414)
* Test env. vars. in rocprofiler-sdk backend

* Improve rocprofiler-sdk backend test case to check for env. vars. and
  ensure we do not overwrite irrelevant env. vars.

* Remove unnecessary usage of ROCPROF_INDIVIDUAL_XCC_MODE env. var.

* Formatting fixes

* Test fixes

* Remove redundant code in tests

* Remove usage of utils_mod and use utils instead, this prevents
  duplicate imports
2025-12-23 13:13:28 -05:00
vedithal-amd 588773f9bf [rocprofiler-compute] Fix for multi process workload profiling (#2418)
* Fix for multi process workload profiling

Native counter collection tool updates:
    * Do not dump empty counter data for a process
    * Use PID instead of UUID for dumped csv files to facilitate correlation
    * Handle merging multiple pairs of rocpd (from sdk tool) and csv (from
      native tool) files
    * Handle merging multiple pairs of csv (from sdk tool) and csv (from
      native tool) files

Rocpd output format updates:
    * Merge multiple rocpd databases into a single csv
    * Reset dispatch id and kernel id for unique dispatches and unique
      kernels respectively
    * Retain multiple rocpd databases per run for multi process workloads

* Add test case for multiprocess profiling using rocflop workload

* Add rocflop

* Fix native counter csv to rocprofv3 csv conversion

* Use kernel_id instead of dispatch_id to correlate native counter csv
  and kernel trace csv

* python formatting using ruff 0.14 instead of 0.13
2025-12-23 13:12:18 -05:00
marandje 3e49440495 SWDEV-555178 - Calculate phys mem offset for remap range (#1879) 2025-12-23 10:27:42 +01:00
Milan Radosavljevic 719556fbba [rocprofiler-systems] Add SIGKILL delay option (#2384)
## Motivation

When profiling multi-process applications where a parent process sends SIGKILL to child processes, the termination can occur before the profiler has a chance to flush collected data. This PR introduces a configurable delay before SIGKILL signals are forwarded, allowing profiling data to be captured before process termination. This is a workaround.

## Technical Details

- Added new configuration setting `ROCPROFSYS_KILL_DELAY` (default: 0 seconds) to specify a delay before SIGKILL signals are forwarded to other processes
- Implemented `kill_gotcha` component that intercepts the `kill()` system call
- The gotcha only delays SIGKILL signals sent to external processes (pid > 0 and not self)
- Integrated `kill_gotcha_t` into the `preinit_bundle_t` for early initialization
2025-12-22 21:17:57 -05:00
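The filtering rule described in #2384 (delay only SIGKILL, only for external processes) can be sketched as below. The names `should_delay_kill` and `delayed_kill` are hypothetical, and the delay is passed in as a plain parameter; how the real `kill_gotcha` component reads `ROCPROFSYS_KILL_DELAY` and wraps `kill()` is not shown in the commit message.

```cpp
#include <signal.h>     // SIGKILL, kill()
#include <sys/types.h>  // pid_t
#include <unistd.h>     // getpid()

#include <chrono>
#include <thread>

// Hypothetical predicate: delay only SIGKILL sent to another single process
// (pid > 0 and not the caller itself), and only when a delay is configured.
inline bool should_delay_kill(pid_t target, int sig, pid_t self, int delay_seconds)
{
    return sig == SIGKILL && target > 0 && target != self && delay_seconds > 0;
}

// Hypothetical wrapper: sleep for the configured delay, then forward the signal.
inline int delayed_kill(pid_t target, int sig, int delay_seconds)
{
    if(should_delay_kill(target, sig, getpid(), delay_seconds))
        std::this_thread::sleep_for(std::chrono::seconds(delay_seconds));
    return ::kill(target, sig);
}
```

With the default delay of 0 seconds the wrapper forwards every signal immediately, which matches the stated default behavior.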
Young Hui - AMD 37e3b8a3db [rocpd] Write rocpd yaml files as a list, even when only 1 file (#2288) 2025-12-22 17:56:59 -05:00
habajpai-amd 447025011a [Rocprof-Sys] Resolve crash when profiling TensorFlow GPU application (#2381)
* fix: resolve crash when profiling TensorFlow GPU application

* incorporate review comments

* updated min_rows from 3 to 2 for threads table validation as internal threads are not profiled and are now correctly bypassed
2025-12-22 14:00:55 -05:00
Gopesh Bhardwaj 9141f26905 [Documentation] updating roctx library linkage documentation (#2251) 2025-12-22 10:36:13 -05:00