Commit Graph

74772 Commits

Author SHA1 Message Date
systems-assistant[bot] 88f07baa92 SWDEV-493792 - add split barriers for grid_group (#508)
* SWDEV-493792 - add split barriers for grid_group

* add tests

* Update change log

* Add Navi4 split barrier

* Update docs

* Use new Catch2 Approx macro

* Update split_barrier.cc to check for coop groups

---------

Co-authored-by: Jatin Chaudhary <jatchaud@amd.com>
Co-authored-by: Jatin Chaudhary <51944368+cjatin@users.noreply.github.com>
2026-01-19 09:17:00 -08:00
lloginov-amd e49b501e9a Add scratch memory support (#2211) 2026-01-19 16:24:30 +01:00
Gopesh Bhardwaj 1ac805cb35 [rocprofiler-sdk][Documentation] Updating CHANGELOG for 7.2 (#2573)
* Updating CHANGELOG for 7.2

* Updated CHANGELOG

* Addressed feedback

* Addressed Feedback

* Updated based on review comments

* Update installation steps and documentation links

Updated installation documentation and links to latest repository.

* Addressed Feedback

* Updated CHANGELOG

* Addressed feedback

* updated CHANGELOG

* Apply suggestion from @prbasyal-amd

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

---------

Co-authored-by: Swati Rawat <120587655+SwRaw@users.noreply.github.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2026-01-17 14:55:55 +05:30
Aravind Ravikumar f336ad5133 Enable Robust Multi-Node RCCL Testing with cvs-sbatch and Improve CI Reliability (#2123)
* sbatch changes and TheRock SHA update

* Move tests location from /home to /apps/cvs_tests

* Add comments and move credential.ini file to /apps/cvs_tests

* Changed salloc reservation to rccl reservation

---------

Co-authored-by: Aravind Ravikumar <arravikum@amd.com>

[ROCm/rccl commit: 239d62f545]
2026-01-16 23:13:06 -05:00
Aravind Ravikumar 239d62f545 Enable Robust Multi-Node RCCL Testing with cvs-sbatch and Improve CI Reliability (#2123)
* sbatch changes and TheRock SHA update

* Move tests location from /home to /apps/cvs_tests

* Add comments and move credential.ini file to /apps/cvs_tests

* Changed salloc reservation to rccl reservation

---------

Co-authored-by: Aravind Ravikumar <arravikum@amd.com>
2026-01-16 23:13:06 -05:00
habajpai-amd b53c99669c Revert "fix: prevent double-free crash during process exit in amd-smi (#2213)" (#2640)
This reverts commit 7b00d3a89b.

The workaround is no longer needed - root cause fixed in:
- rocm-smi-lib (PR #2531): Made devInfoTypesStrings file-local static
- amdsmi (PR #2575): Added visibility("hidden") attribute
2026-01-16 16:08:52 -05:00
Rahul Vaidya 62dab32433 Update AlltoAll and AlltoAllv API for 2.28.3 compatibility (#161)
Signed-off-by: ravaidya <ravaidya@amd.com>

[ROCm/rccl-tests commit: a52452e891]
2026-01-16 11:28:40 -08:00
Rahul Vaidya a52452e891 Update AlltoAll and AlltoAllv API for 2.28.3 compatibility (#161)
Signed-off-by: ravaidya <ravaidya@amd.com>
2026-01-16 11:28:40 -08:00
cfallows-amd 005f07004d [rocprofiler-compute] Update README (#2589)
* Update readme general section and citation version and date.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Minor change to project title; changing it now so it is not forgotten, but we are waiting on feedback about the citation from R&D.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Edit citation from R&D feedback

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2026-01-16 12:48:19 -05:00
Rahul Manocha 249a947dc7 Fix set/get access failure for VMM on windows (#2280)
* Fix set/get access failure for VMM on windows

* separate code paths for Linux and Windows to avoid using import/export calls on Windows

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2026-01-16 08:34:21 -08:00
akolliasAMD 2606c13155 Tests package (#384)
* added packaging for the tests and for the driver.sh

* making .sh files into programs so they keep permissions

[ROCm/rocshmem commit: e7269cb925]
2026-01-16 09:10:36 -07:00
akolliasAMD e7269cb925 Tests package (#384)
* added packaging for the tests and for the driver.sh

* making .sh files into programs so they keep permissions
2026-01-16 09:10:36 -07:00
German Andryeyev 07a6b45535 rocr: restore the original line 2026-01-16 11:05:24 -05:00
Aurelien Bouteiller ede2adfe49 new tester: put to all pes from all lanes concurrently (#112)
* Add put to all pes from all lanes concurrently

* Remove wg_init, use size_t for size params, 64bit data exchange (more
bits for verification masking)

* Rename to flood-test, add put,putnbi,p,get,getnbi,g variants, count time
correctly

* Add flood tester to the testing script

* add to gda test case w/o the _g variant that is not implemented.

[ROCm/rocshmem commit: cca7872bcf]
2026-01-16 10:40:48 -05:00
Aurelien Bouteiller cca7872bcf new tester: put to all pes from all lanes concurrently (#112)
* Add put to all pes from all lanes concurrently

* Remove wg_init, use size_t for size params, 64bit data exchange (more
bits for verification masking)

* Rename to flood-test, add put,putnbi,p,get,getnbi,g variants, count time
correctly

* Add flood tester to the testing script

* add to gda test case w/o the _g variant that is not implemented.
2026-01-16 10:40:48 -05:00
vedithal-amd f64d8e0f43 [rocprofiler-compute] Improve native tool discovery and partition detection (#2630)
* Improve native tool discovery and partition detection

- Enhanced native tool path resolution to support CMAKE_INSTALL_LIBDIR variations
  (lib, lib64, lib32, etc.) using glob pattern matching
- Extracted path variables to avoid duplication in error messages
- Improved error message clarity by showing exact paths searched for .so and .cpp files
- Simplified code path construction using consistent Path.resolve().parents[x] syntax

- Fixed redundant partition warnings on pre-MI300 GPUs by adding architecture check
- Only query compute/memory partition on MI300+ series (gfx940+)
- Added proper type hints for gpu_arch parameter
- Moved gpu_info extraction after soc_info to ensure gpu_arch is available
- Improved code comments for MI300 series threshold

* Handle gpu arch like a hex string
2026-01-16 10:36:19 -05:00
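The two ideas in this commit message (glob-based lookup across `lib`, `lib64`, `lib32`, and parsing the GPU architecture as a hex string) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: the function names, the `lib*/` pattern, and the directory layout are hypothetical.

```python
from pathlib import Path


def find_native_tool(install_prefix: Path, file_name: str) -> list[Path]:
    """Return every match under lib, lib64, lib32, ... below the prefix.

    Hypothetical helper: the real layout and glob pattern may differ.
    """
    return sorted(install_prefix.glob(f"lib*/{file_name}"))


def arch_at_least_gfx940(gpu_arch: str) -> bool:
    """Parse the part after 'gfx' as hexadecimal and compare with 0x940.

    Parsing as hex matters for names such as 'gfx90a', which are not
    valid decimal numbers. Caveat: a plain numeric threshold also admits
    four-digit names such as 'gfx1100', so real code would likely need
    an extra check for those families.
    """
    if not gpu_arch.startswith("gfx"):
        return False
    try:
        return int(gpu_arch[3:], 16) >= 0x940
    except ValueError:
        return False
```

Returning a list rather than the first match lets the caller report every path that was searched when nothing is found, which is the kind of error message the commit describes.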
Fábio Mestre e6236417f7 SWDEV-571222 - Fix bf16 headers on gcc (#2260)
GCC does not support anonymous structs with members that have non-trivial constructors. This commit changes the header to remove the union when compiling with gcc. This should be a non-breaking change for other compilers.
2026-01-16 15:02:48 +00:00
Edgar Gabriel 3ce10dc688 fix allreduce tester (#385)
- use the reduce_psync buffers for synchronization in allreduce, not the
  barrier_psync.
- execute a wwg barrier after the allreduce operation. After internal
  discussion it was determined that it is required for correctness.

[ROCm/rocshmem commit: 6f512e92a5]
2026-01-16 08:10:25 -06:00
Edgar Gabriel 6f512e92a5 fix allreduce tester (#385)
- use the reduce_psync buffers for synchronization in allreduce, not the
  barrier_psync.
- execute a wwg barrier after the allreduce operation. After internal
  discussion it was determined that it is required for correctness.
2026-01-16 08:10:25 -06:00
Fábio Mestre 7794ac9ac6 [hip-tests] Fix Float16 accuracy tests (#2178)
Tests were relying on floats for calculating ulp values when validating the output. This is not correct given that the calculations are done using Float16. The fix is to update the test framework to use fp16 ulp instead.
2026-01-16 13:25:11 +00:00
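To see why the choice of ulp matters, here is a minimal sketch that measures an error in binary16 (Float16) ulps using only the Python standard library. It is not the hip-tests framework code; `fp16_ulp` and `ulp_error` are hypothetical names, and the sketch assumes finite inputs below the largest finite Float16 value.

```python
import struct


def fp16_ulp(x: float) -> float:
    """Gap between |x| rounded to binary16 and the next larger binary16 value.

    Assumes |x| is finite and below the largest finite binary16 value.
    """
    # Reinterpret the half-precision encoding of |x| as a 16-bit integer.
    bits = struct.unpack("<H", struct.pack("<e", abs(x)))[0]
    lower = struct.unpack("<e", struct.pack("<H", bits))[0]
    upper = struct.unpack("<e", struct.pack("<H", bits + 1))[0]
    return upper - lower


def ulp_error(actual: float, expected: float) -> float:
    """Error of `actual` relative to `expected`, in binary16 ulps."""
    return abs(actual - expected) / fp16_ulp(expected)


# At 1.0 one binary16 ulp is 2**-10, while one float32 ulp is 2**-23,
# so a tolerance expressed in float32 ulps is 8192 times stricter.
print(fp16_ulp(1.0))
```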
Kian Cossettini 9f014db6a4 [rocprofiler-systems] Update install path for examples (#2625)
* Update install path for examples to `share/rocprofiler-systems/examples`

----

Co-authored-by: Kian Cossettini <Kian.Cossettini@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2026-01-15 21:51:16 -05:00
German Andryeyev e438308541 rocr/libhsakmt: Add wsl build in thunk 2026-01-15 17:29:50 -05:00
Omri Mor 93493e3e46 ionic: fix byteswap functions (added in #345), missed in #368 (#388)
[ROCm/rocshmem commit: 885e41ec62]
2026-01-15 14:19:19 -08:00
Omri Mor 885e41ec62 ionic: fix byteswap functions (added in #345), missed in #368 (#388) 2026-01-15 14:19:19 -08:00
German Andryeyev 5c5b9729ff Add 'projects/rocr-runtime/libhsakmt/include/hsakmt/drm/' from commit '8c47e25315e70f9c8cdd57a5790d3e080938c969'
git-subtree-dir: projects/rocr-runtime/libhsakmt/include/hsakmt/drm
git-subtree-mainline: 5319163521
git-subtree-split: 8c47e25315
2026-01-15 16:06:07 -05:00
Omri Mor 3260759dfd Replace byteswap interface to align with C++23 std::byteswap (#368)
* byteswap<T> returns by value
* replace hand-rolled implementations with Clang __builtin_bswap<N> intrinsics
* new high-level interface endian::to_be, endian::from_be, etc. to indicate conversion direction

[ROCm/rocshmem commit: cf8b72a047]
2026-01-15 13:03:01 -08:00
Omri Mor cf8b72a047 Replace byteswap interface to align with C++23 std::byteswap (#368)
* byteswap<T> returns by value
* replace hand-rolled implementations with Clang __builtin_bswap<N> intrinsics
* new high-level interface endian::to_be, endian::from_be, etc. to indicate conversion direction
2026-01-15 13:03:01 -08:00
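The interface shape described here (a byteswap that returns by value, plus direction-named helpers) can be illustrated with a small sketch. The rocshmem change itself is C++; the Python below only mirrors the idea, and the explicit `width` parameter is an assumption needed because Python integers carry no fixed size.

```python
import sys


def byteswap(value: int, width: int) -> int:
    """Return `value` with its `width` bytes reversed (returns by value)."""
    return int.from_bytes(value.to_bytes(width, "little"), "big")


def to_be(value: int, width: int) -> int:
    """Convert a host-order integer to big-endian representation."""
    return value if sys.byteorder == "big" else byteswap(value, width)


def from_be(value: int, width: int) -> int:
    """Convert a big-endian integer back to host order."""
    # Byte reversal is its own inverse, so both directions share the logic;
    # the separate names exist only to make the direction explicit.
    return to_be(value, width)
```

Naming the direction at the call site makes it harder to apply a swap twice or to forget one, which is a plausible reason for adding such helpers on top of a raw byteswap.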
German Andryeyev 5319163521 Add 'projects/rocr-runtime/libhsakmt/include/impl/' from commit 'c34ec1e52fcb52da248c00207ebe646197ea9d3e'
git-subtree-dir: projects/rocr-runtime/libhsakmt/include/impl
git-subtree-mainline: 55f7d39fa5
git-subtree-split: c34ec1e52f
2026-01-15 15:54:37 -05:00
German Andryeyev 55f7d39fa5 Add 'projects/rocr-runtime/libhsakmt/src/dxg/' from commit '029690f0a4f62fefefbb67305a066a72e99f8c0b'
git-subtree-dir: projects/rocr-runtime/libhsakmt/src/dxg
git-subtree-mainline: 8760fb4976
git-subtree-split: 029690f0a4
2026-01-15 15:51:21 -05:00
Mark Meserve 8760fb4976 attach: Formalize ROCAttach API (#1653)
* attach: Formalize ROCAttach API

- Make ROCAttach public with public headers
- Change detach to take a PID
  - attach and detach are now reentrant
- Cleanup of states and signal handling in ptrace session
- Fixes mixed up definition of ROCPROF_ATTACH_TOOL_LIBRARY
  - ROCPROF_ATTACH_TOOL_LIBRARY now always means the tool library loaded by the attachment target
  - ROCPROF_ATTACH_LIBRARY refers to the library used to perform attachment
- Add direct call of rocprof-attach
- Fix python library call of rocprof-attach
  - Function now named attach(), changed from main()

* attach: rocprof-compute ROCAttach updates

- Update to new library names
- Correct usage of C lib detach

* attach: add test for rocattach

- Disable ASan, TSan, and UBSan for the new parallel-attach test
- Lower log level for LSan tests, existing behavior from other tests

---------

Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
2026-01-15 14:32:14 -06:00
dsclear-amd 2482bff0b7 Excludes (more) docs-only changes from .azuredevops/rocm_ci_caller.yml. (#2615)
Motivation

We wish to avoid triggering full Jenkins runs for docs-only PRs, as this takes up testing resources and slows development time. rocm_ci_caller.yml already excludes some docs-only changes, but this can be improved to exclude them along more paths.
Technical Details

The checks that rocm_ci_caller.yml uses to determine whether a changed file in a PR warrants a Jenkins run have been extended to exclude more paths and more file suffixes.
JIRA ID

AIROCDOC-78, AIROCDOC-424
Test Plan

    1. Created a test branch users/dsclear/shorten_workflows_test_root with the changes in this PR, branched from develop.
    2. Branched users/dsclear/shorten_workflows_test_bin_3 and users/dsclear/shorten_workflows_test_text_3 from users/dsclear/shorten_workflows_test_root.
    3. Modified users/dsclear/shorten_workflows_test_bin_3 to add two .h files, and submitted a PR into users/dsclear/shorten_workflows_test_root (Test PR, do not merge. Test PR to test Jenkins CI/CD modifications. #2613).
    4. Modified users/dsclear/shorten_workflows_test_text_3 to add a new .txt file, and submitted a PR into users/dsclear/shorten_workflows_test_root (Test PR, do not merge. Test PR to test Jenkins CI/CD modifications (docs only). #2614).

Test Result

The test PR in step 3 caused rocm_ci_caller.yml to attempt to trigger Jenkins, as this is a 'non-docs' change.
The test PR in step 4 had the attempt to trigger Jenkins skipped, as this is a 'docs-only' change.
2026-01-15 14:54:20 -05:00
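The decision described in this commit (skip Jenkins when every changed file is documentation) can be sketched as a simple predicate. The real logic lives in `.azuredevops/rocm_ci_caller.yml`; the suffix and directory sets below are hypothetical placeholders, not the lists used by the workflow.

```python
from pathlib import PurePosixPath

# Hypothetical examples only; the actual excluded suffixes and paths
# are defined in rocm_ci_caller.yml.
DOC_SUFFIXES = {".md", ".rst", ".txt"}
DOC_DIRS = {"docs"}


def is_docs_only_file(path: str) -> bool:
    """True if the file counts as documentation by suffix or by directory."""
    p = PurePosixPath(path)
    in_doc_dir = any(part in DOC_DIRS for part in p.parts[:-1])
    return p.suffix.lower() in DOC_SUFFIXES or in_doc_dir


def should_trigger_ci(changed_files: list[str]) -> bool:
    """Trigger a full run as soon as one changed file is not docs-only."""
    return any(not is_docs_only_file(f) for f in changed_files)
```

Under these assumed lists, a PR that adds two `.h` files triggers a run and a PR that adds only a `.txt` file does not, which matches the two test PRs in the test plan.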
Mario Limonciello 838b3dccf1 Adjust amdgpu version output for amd-smi (#2563)
* Fix the amdgpu version string comparison

The intention was to avoid showing the string when it
carries no information.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>

* Display the kernel version in amd-smi output

This is an interesting debugging point, especially in the case of
not having a DKMS package installed.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Moving os_kernel_version to static --driver

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

---------

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2026-01-15 11:11:58 -08:00
yugang-amd fe60c39256 Bump rocm-docs-core to 1.31.2 (#2627)
* Update requirements.in

* Update requirements.txt
2026-01-15 13:18:30 -05:00
yugang-amd bcd9119dbc Bump rocm-docs-core to 1.31.2 (#387)
[ROCm/rocshmem commit: 491739c9b4]
2026-01-15 13:17:51 -05:00
yugang-amd 491739c9b4 Bump rocm-docs-core to 1.31.2 (#387) 2026-01-15 13:17:51 -05:00
Bindhiya Kanangot Balakrishnan aa16cca39a [SWDEV-549108] Increase gpu_metrics API execution test threshold (#2617)
Increased threshold from 2100 µs to 3100 µs to accommodate
gpu_metric read time variation across Navi systems.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2026-01-15 11:20:17 -06:00
Matthias Gehre 1883f736ad Fix double-free crash when librocm_smi64.so and libamd_smi.so are loaded together (#2531)
Problem:
When TheRock-based PyTorch package is installed along with amdsmi, importing
torch causes a double-free crash on exit (GitHub issue ROCm/TheRock#2269).

Root cause:
Both librocm_smi64.so and libamd_smi.so export the C++ static member
'amd::smi::Device::devInfoTypesStrings'. When libraries are loaded with
RTLD_GLOBAL, the dynamic linker resolves libamd_smi.so's reference to this
symbol to the one in librocm_smi64.so. This causes:
1. librocm_smi64.so registers its destructor for devInfoTypesStrings
2. libamd_smi.so also registers a destructor, but for the SAME address
3. On exit, both destructors run on the same object -> double-free

Fix:
Change devInfoTypesStrings from a class static member to a file-local static
variable. This ensures the symbol has internal linkage and is not exported,
preventing the symbol collision.

Changes:
- rocm_smi_device.h: Remove static member declaration
- rocm_smi_device.cc: Change from 'Device::devInfoTypesStrings' to file-local
  'static const std::map<...> devInfoTypesStrings'
- rocm_smi.cc: Remove the global alias to the (now removed) class member

Tested on gfx1151. `import torch` crashed on exit before the fix, and doesn't crash after the fix.
2026-01-15 08:43:47 -08:00
Filip Jankovic 29cd25df66 Add hipDeviceAttributeExpertSchedMode (#2435)
* Add hipDeviceAttributeExpertSchedMode

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>

* Update hipDeviceAttributeExpertSchedMode unit test

* Move check to ROCr from thunk interface

* Revert unrelated whitespace changes

* Revert version bump

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
2026-01-15 08:41:39 -08:00
Milan Radosavljevic 940488ed58 [rocprofiler-systems] Fix naming and description of process_page category (#2606) 2026-01-15 16:10:50 +01:00
Milan Radosavljevic 318d13870f [rocprofiler-systems] Update logging to use spdlog library (#2428)
## Motivation

- Structured logging with proper log levels (TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL)
- Better performance through compile-time formatting
- Consistent formatting using fmt library
- Runtime log level control via arguments and environment variables
- Easier maintenance and debugging capabilities

## Technical Details

- Added spdlog as a submodule and integrated it into CMake build system
- Created new `rocprofiler-systems-logger` library wrapping spdlog functionality
- Replaced custom logging macros (`ROCPROFSYS_VERBOSE`, `ROCPROFSYS_DEBUG`, `ROCPROFSYS_FATAL`, `ROCPROFSYS_REQUIRE`, `ROCPROFSYS_CI_THROW`, etc.) with spdlog equivalents (`LOG_DEBUG`, `LOG_WARNING`, `LOG_CRITICAL`, etc.)
- Implemented log level control through command-line arguments and environment variables
- Converted assertion macros to proper error handling with exceptions and std::abort()
2026-01-14 15:27:51 -05:00
Joseph Narlo 499127c0b9 [SWDEV-553434] No direct way to get the BASEBOARD temperature info (#2502)
* [SWDEV-553434] No direct way to get the BASEBOARD temperature info. Need to iterate all gpus

Signed-off-by: amd-josnarlo <josnarlo.amd.com>

---------

Signed-off-by: amd-josnarlo <josnarlo.amd.com>
Co-authored-by: amd-josnarlo <josnarlo.amd.com>
2026-01-14 13:52:58 -06:00
David Yat Sin a3b445118d SWDEV-519413 - Ignore ROCr shutdown events (#1616)
ROCr now reports a shutdown event, but this is not a fatal error. Ignore
this event.
2026-01-14 11:28:03 -08:00
xuchen-amd 71b9ea6ba0 [rocprofiler-compute] improve config management system (#2359) 2026-01-14 13:20:27 -05:00
Luca Bruni d7ff927690 [clr] Fix device printf pointer advancement issue with string format specifiers (#1313) 2026-01-14 13:05:25 -05:00
habajpai-amd bad8d915c3 Fix: Add visibility hidden to devInfoTypesStrings to prevent symbol interposition (#2575) 2026-01-14 09:48:49 -08:00
Gopesh Bhardwaj b18db05091 [rocprofiler-sdk] Fixing docs build (#2608) 2026-01-14 10:13:17 -05:00
pghoshamd d2a1fc945e SWDEV-569319 Fix dangling reference warning (#2509)
* SWDEV-569319 Fix dangling reference warning

* fix nullptr warning

* use emplace

* return regular pointer
2026-01-13 15:39:03 -06:00
hongkzha-amd 9dc2488b6b rocrtst: Add test cases for interrupt disabled mode (#2385)
Add explicit test cases to verify ROCr functionality with interrupts
disabled (HSA_ENABLE_INTERRUPT=0). This ensures compatibility with
virtio, dtif, and WSL configurations which require interrupt-disabled
mode.

Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
2026-01-13 12:10:11 -06:00
hongkzha-amd b3c4e94e70 rocr: Improve memory protection and WSL compatibility (#2274)
* rocr: Add ProtectMemory API and use it in RemoveAccess
Replace munmap + mmap with mprotect when removing memory access.
This improves performance by 5-10x, ensures atomicity (no race
condition window), and prepares for WSL/DXG compatibility fixes.

Suggested-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>

* rocr: Skip CPU mapping operations on WSL
On WSL, CPU cannot access GPU VRAM due to platform restrictions.
CPU access would fault-in system RAM instead, causing data corruption
and memory leaks. Return HSA_STATUS_ERROR to fail fast rather than
silently creating broken mappings. GPU-to-GPU mappings remain functional.

Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>

* rocr: reduce ifdef linux
v2: Fix IsDXG check logic

Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>

---------
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
2026-01-13 12:08:20 -06:00
Geo Min dfdb64572c [TheRock CI] Adding working single node tests (#2142)
* Adding working single node tests

* Revert to old docker sha

* adding back no perf tests

---------

Co-authored-by: Aravind Ravikumar <arravikum@amd.com>

[ROCm/rccl commit: 4b295c9893]
2026-01-13 08:35:58 -08:00