76333 Commits

Author SHA1 Message Date
Aryan Salmanpour 91be23b3d0 TheRock compatibility - Devicelib Support (#684)
* TheRock compatibility - Devicelib Support

* clean up

[ROCm/rocdecode commit: 5784f8cffe]
2025-12-10 14:51:43 -08:00
Aryan Salmanpour 5784f8cffe TheRock compatibility - Devicelib Support (#684)
* TheRock compatibility - Devicelib Support

* clean up
2025-12-10 14:51:43 -08:00
Aryan Salmanpour 20b8575993 TheRock compatibility - Devicelib Support (#215)
[ROCm/rocjpeg commit: 62007a9f8b]
2025-12-10 14:49:58 -08:00
Aryan Salmanpour 62007a9f8b TheRock compatibility - Devicelib Support (#215) 2025-12-10 14:49:58 -08:00
yugang-amd 195fe4e5ee GDA docs style edits (#362)
* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/sphinx/_toc.yml.in

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Apply suggestions from code review

Co-authored-by: yugang-amd <yugang.wang@amd.com>

---------

Co-authored-by: Yiltan <ytemucin@amd.com>

[ROCm/rocshmem commit: bbad1d8539]
2025-12-10 17:03:58 -05:00
yugang-amd bbad1d8539 GDA docs style edits (#362)
* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/install.rst

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Update docs/sphinx/_toc.yml.in

Co-authored-by: yugang-amd <yugang.wang@amd.com>

* Apply suggestions from code review

Co-authored-by: yugang-amd <yugang.wang@amd.com>

---------

Co-authored-by: Yiltan <ytemucin@amd.com>
2025-12-10 17:03:58 -05:00
shwetakhatri-amd 0835f2e75a rocrtst: Updated CMakeFiles to find_package instead of hardcoded (#2095)
* rocrtst: Updated CMakeFiles to find_package instead of hardcoded

This is to support the TheRock build environment

* rocrtst: Fix CMake to use find_package() instead of hardcoded ENV paths

Fixed CMake style issues from the code review of the previous (first) commit

* rocrtst: Fix rocrtst NUMA dependency detection to use find_package

Also added handling of missing headers

* rocrtst: Fix NUMA and hwloc detection for cross-platform builds

---------

Co-authored-by: Shweta Khatri <shweta.khatri@amd.com>
2025-12-10 16:16:25 -05:00
Geo Min 1b4eef8f86 Correct runner name (#2098)
[ROCm/rccl commit: 5384a8abb2]
2025-12-10 11:44:48 -08:00
Geo Min 5384a8abb2 Correct runner name (#2098) 2025-12-10 11:44:48 -08:00
David Galiffi 70562eb854 Add ROCm 7.1 to workflows (#2256)
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2025-12-10 13:41:22 -05:00
Aurelien Bouteiller e783b47388 Bump version number for post-7.2 devel (#356)
[ROCm/rocshmem commit: 64460f0ec9]
2025-12-10 13:03:20 -05:00
Aurelien Bouteiller 64460f0ec9 Bump version number for post-7.2 devel (#356) 2025-12-10 13:03:20 -05:00
German Andryeyev 3895aadba6 SWDEV-558849 - Make ROCR path in Windows more stable (#2181) 2025-12-10 12:37:10 -05:00
Pengda Xie 1d6b26f829 SWDEV-556684 - HSAIL Cleanup re-apply commit 4abdfe5: (#2024)
Removed some options

-xnack, -force-wgp-mode, -force-wave-size-32, -round-trip-spirv,
-fe-gen-spirv, -lower-pipe-builtins=0|1, -lower-atomics=0|1,
-set-lds=<value>, -set-scalar-registers=<value>,
-set-vector-registers=<value>, -limit-scalar-registers=<value>,
-limit-vector-registers=<value>, -sc-xnack-iommu,
-faa-for-barrier/-fno-a-for-barrier, -sc-dev-format, -verify-lwspir,
-verify-hwspir, -ffma-enable/-fno-fma-enable,
-fmad-enable/-fno-mad-enable, -fdisable-avx/-fno-disable-avx,
-fforce-llvm/-fno-force-llvm, -print-compile-phases,
-kernel-cache-enforce-miss, -kernel-cache-wipe, -kernel-cache,
-sc[=<filename>]/--load-sc-dll[=<filename>],
-be[=<filename>]/--load-be-dll[=<filename>],
-cg[=<filename>]/--load-cg-dll[=<filename>],
-link[=<filename>]/--load-link-dll[=<filename>],
-opt[=<filename>]/--load-opt-dll[=<filename>],
-fe[=<filename>]/--load-fe-dll[=<filename>],
-cl[=<filename>]/--load-cl-dll[=<filename>], -just-kernel=<kernel-name>,
-use-debugil, -fmulti-level-call/-fno-multi-level-call,
-fdebug-call/-fno-debug-call, -fmacro-call/-fno-macro-call,
-fstack-uav/-fno-stack-uav, -fdef-res-id/-fno-def-res-id,
-wokth=int/--waves-opt-kernel-threshold,
-ilkth=int/--inline-kernel-size-threshold,
-ilsth=int/--inline-size-threshold, -ilcth=int/--inline-cost-threshold,
-scopt=int/--sc-opt-level, -flib-no-inline/-fno-lib-no-inline,
-fuser-no-inline/-fno-user-no-inline,
-scras=int/--sc-si-opt-reg-alloc-strategy, -fsc-post-ra-sched,
-fsc-live-sched/-fno-sc-live-sched, -fsc-use-buffer-for-hsa-global,
-fsc-schedule-no-reorder, -fsc-min-reg-schedule,
-fsc-bias-schedule-to-minimize-insts,
-fsc-bias-schedule-to-minimize-regs, -fsc-disable-merge-memory,
-fsc-disable-loop-unroll, -fsc-use-mubuf/-fno-sc-use-mubuf,
-fsc-selective-inline/-fno-sc-selective-inline,
-fsc-keep-calls/-fno-sc-keep-calls, -slc=0|1/--simplifylibcall,
-stack-alignment=<n>, -fdiv2fmul=0|1, -prt-opt-liveness=0|1,
-liveness=0|1, -SRAE-threshold=<value>, -memcombine-max-vec-gen=<value>,
-small-global-objects, -fast-fmaf, -fast-fma, -bfo=0|1, -ebb=0|1, -aa,
-mem2reg=0|1, -licm=0|1, -unroll-allow-partial,
-unroll-threshold=<positive integer>, -unroll-count=<positive integer>,
-apt/--ap-threshold=<positive integer>, -srt/--sr-threshold=<positive
integer>, -fdebug-linker/-fno-debug-linker, -fbin-gpu64/-fno-bin-gpu64,
-fbin-disasm/-fno-bin-disasm, -fbin-bif30, -fbin-hsail/-fno-bin-hsail,
-fbin-amdil/-fno-bin-amdil, -fbin-spir/-fno-bin-spir, -fonly-bin-source,
-fper-pointer-uav/-fno-per-pointer-uav

Co-authored-by: Konstantin Zhuravlyov <kzhuravl_dev@outlook.com>
2025-12-10 09:09:12 -08:00
corey-derochie-amd de82a18790 Fixed unit-test env var list parsing and improved filtered test run speed (#1626)
* Fixed parsing of env var lists, which was overwriting the mutable env var string and polluting future parses.

* Fixed all tests to obey UT_DATATYPES and UT_REDOPS filters.

* Allow tests to bail early via `GTEST_SKIP` if UT_DATATYPES or UT_REDOPS filters give a test size of zero. This allows tests to run much faster with filters on.

* Wrapped the support checks in helper functions on `TestBed`.

[ROCm/rccl commit: 18e9ad913b]
2025-12-10 10:06:44 -07:00
corey-derochie-amd 18e9ad913b Fixed unit-test env var list parsing and improved filtered test run speed (#1626)
* Fixed parsing of env var lists, which was overwriting the mutable env var string and polluting future parses.

* Fixed all tests to obey UT_DATATYPES and UT_REDOPS filters.

* Allow tests to bail early via `GTEST_SKIP` if UT_DATATYPES or UT_REDOPS filters give a test size of zero. This allows tests to run much faster with filters on.

* Wrapped the support checks in helper functions on `TestBed`.
2025-12-10 10:06:44 -07:00
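
The filtering behaviour described in the commit above can be sketched roughly as follows. This is a simplified Python model, not the RCCL test code: the comma separator, the helper names (`parse_env_list`, `filter_cases`, `should_skip`), and the sample values are assumptions made for illustration. It shows the two ideas from the message: parse the variable without modifying the source string, and skip a test when a filter leaves nothing to run.

```python
import os


def parse_env_list(name, env=None):
    """Return the values of env var `name` as a new list.

    The raw string is only read, never modified, so repeated calls see the
    same input. The comma separator is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    raw = env.get(name, "")
    return [item.strip() for item in raw.split(",") if item.strip()]


def filter_cases(supported, name, env=None):
    """Keep only the supported values that the filter allows, if one is set."""
    allowed = parse_env_list(name, env)
    if not allowed:  # no filter set: keep everything
        return list(supported)
    return [value for value in supported if value in allowed]


def should_skip(datatypes, redops):
    """A test with no datatypes or no reduction ops has nothing to run."""
    return not datatypes or not redops


# Hypothetical filter values, passed as a dict instead of the real environment.
fake_env = {"UT_DATATYPES": "float, int32", "UT_REDOPS": "sum"}
datatypes = filter_cases(["int32", "float", "double"], "UT_DATATYPES", fake_env)
redops = filter_cases(["prod", "max"], "UT_REDOPS", fake_env)
```

Here the test supports only `prod` and `max`, so the `sum` filter leaves an empty list and `should_skip(datatypes, redops)` is true; in the real test suite that is the point where `GTEST_SKIP` would be invoked.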
Fábio Mestre d4fe3f1cc3 [hip-tests] Update API coverage report generator (#1932)
* [hip-tests] Update API coverage report generator

Updates the HIP API coverage tool. It now takes
extra arguments for the location of the catch test folder
and for the working directory. This avoids issues where the output
of the executable is dependent on the path where it is being
executed from.

Also updates CMakeLists.txt to integrate seamlessly with the
hip-tests project and avoid using commands which rely on
relative paths.

* Remove double new line

* Remove CMake option to generate coverage

Removes the CMake option to generate coverage. Instead, explicitly removes
the gen_coverage target from all (this is already the default but
doing it explicitly prevents confusion).
2025-12-10 17:53:47 +01:00
Yiltan 258d264ecc Add default context alltoall API (#350)
[ROCm/rocshmem commit: fddbe7b15d]
2025-12-10 11:43:15 -05:00
Yiltan fddbe7b15d Add default context alltoall API (#350) 2025-12-10 11:43:15 -05:00
Aurelien Bouteiller 972893bab2 Reenable building test-only with external MPI (#352)
[ROCm/rocshmem commit: 1a16b3bedc]
2025-12-10 11:40:29 -05:00
Aurelien Bouteiller 1a16b3bedc Reenable building test-only with external MPI (#352) 2025-12-10 11:40:29 -05:00
Rahul Manocha 0c1f87a7f6 SWDEV-558848 - vmm api support for rocr on windows (#1761)
* SWDEV-558848 - vmm api support for rocr on windows

* Fixes to VMM handle Map/Unmap Set/Get Access

* Fix GetShareableHandle to use pointer for shareable handle

* Update os specific map/unmap memory calls

* clang format update

* Minor syntax fixes from code review

Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>
2025-12-10 08:39:51 -08:00
Venkateshwar Reddy Kandula 465633d707 [AQLProfile] Fix sqtt legacy tests due to command buffer size underestimating (#2194) 2025-12-10 09:25:38 -06:00
jamessiddeley-amd d27bd37042 [rocprof-compute] Fix roofline "test_roof_plot_modes" test case (#2217)
* fix roof test to be isolated file paths

* fix typo

* addressed comments

* fix typos
2025-12-10 10:01:49 -05:00
vedithal-amd 252a5e8146 [rocprofiler-compute] Remove TCP_TCP_LATENCY_sum counter for MI300 (#2174)
* Remove TCP_TCP_LATENCY_sum counter for MI300

* Remove TCP_TCP_LATENCY_sum counter which is unsupported for MI300 per register specification

* Remove VL1 Lat metric from memory chart section (block 3) for MI300
  since it uses TCP_TCP_LATENCY_sum counter which is unsupported

* Remove references to TCP_TCP_LATENCY_sum

* Update CHANGELOG

* reword changelog
2025-12-10 09:41:46 -05:00
cfallows-amd 9d34098350 [rocprofiler-compute] Roofline runtime compilation patch (#2232)
* Add install into CMakeLists.txt file; resolves 'no hip module' issues.
* Re-add printout line for peak VALU during benchmarking, removed by accident in a different commit.
* Add CHANGELOG entry for commit 2bfa9a4 ("Integrate roofline benchmark into rocprof-compute (#2015)")

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Run formatter checks on rocprof-compute to clear PR checks

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Update benchmark.py link in changelog

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestions to CHANGELOG from code review

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2025-12-10 01:44:28 -05:00
yangsu13 e55a439c13 wsl/librocdxg: Change hsaKmtQueueRingDoorbell interface
PR: https://github.com/ROCm/rocm-systems/pull/2068/commits/a87adf6444faf9bd841e9fdd38335aa8008b2d06

Signed-off-by: yangsu13 <Yang.Su2@amd.com>
Reviewed-by: Flora Cui <flora.cui@amd.com>
2025-12-10 14:08:35 +08:00
Zachary Vincze 81c492eba2 CMakeLists - ${CMAKE_CURRENT_SOURCE_DIR} updates (#683)
* Add ${CMAKE_CURRENT_SOURCE_DIR} over ${CMAKE_SOURCE_DIR} where required

* Address review comments

[ROCm/rocdecode commit: 0ad47c91df]
2025-12-09 21:35:46 -08:00
Zachary Vincze 0ad47c91df CMakeLists - ${CMAKE_CURRENT_SOURCE_DIR} updates (#683)
* Add ${CMAKE_CURRENT_SOURCE_DIR} over ${CMAKE_SOURCE_DIR} where required

* Address review comments
2025-12-09 21:35:46 -08:00
Mario Limonciello 73778bf83c Adjust policy for memory display on APUs (#1967)
* Read the ids_flags when fetching GPU info

The ids_flags contains the flags that can help identify if a GPU
is a dGPU or an APU.

* Show correct memory pool for APUs

The kernel policy for APUs will be to choose the bigger pool of
memory (GTT or VRAM) for KFD work.  Adjust the policy for the monitor
and default commands to show the right memory pool when using an APU.
2025-12-09 21:49:06 -06:00
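
The display policy described in the commit above can be modelled in a few lines. This is a minimal sketch, not the amd-smi implementation: the function name, the `is_apu` flag (standing in for whatever is derived from `ids_flags`), and the tie-breaking choice of VRAM when both pools are equal are assumptions.

```python
def pool_to_display(is_apu, vram_total, gtt_total):
    """Pick which memory pool to report, as a (name, size) pair.

    dGPUs always report VRAM. For APUs this follows the policy in the commit
    message: report the bigger of GTT and VRAM. Preferring VRAM on a tie is
    an assumption of this sketch.
    """
    if not is_apu:
        return "VRAM", vram_total
    if gtt_total > vram_total:
        return "GTT", gtt_total
    return "VRAM", vram_total
```

For example, an APU with a 512 MiB VRAM carve-out and 16 GiB of GTT would report the GTT pool, while a dGPU with the same numbers would still report VRAM.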
Geo Min 2e0abab81a [ci] Bumping TheRock CI commit hash (#2097)
* Bumping TheRock CI commit hash

* fixing artifact group

[ROCm/rccl commit: 6af9087b0c]
2025-12-09 16:25:57 -08:00
Geo Min 6af9087b0c [ci] Bumping TheRock CI commit hash (#2097)
* Bumping TheRock CI commit hash

* fixing artifact group
2025-12-09 16:25:57 -08:00
Geo Min 879d010974 Bumping commit hash for TheRock (#2244) 2025-12-09 14:56:20 -08:00
Maisam Arif 63da8d2e08 [SWDEV-568673] Updated Documentation Examples for Python APIs (#2017)
* [SWDEV-568673] Updated Documentation Examples for Python APIs

* amdsmi_get_processor_type
* amdsmi_gpu_create_counter
* amdsmi_gpu_destroy_counter

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>

* [SWDEV-568997] - Updated Documentation Examples for Python APIs

* amdsmi_topo_get_p2p_status

Signed-off-by: Sumanth Gavini <sumanth.gavini@amd.com>

* [SWDEV-568997] - Updated Documentation Examples for Python APIs

* [SWDEV-568997] - amdsmi_topo_get_p2p_status
* [SWDEV-568990] - amdsmi_set_gpu_clk_range
* [SWDEV-568987] - amdsmi_set_gpu_od_clk_info
* [SWDEV-568969] - AmdSmiEventReader
* [SWDEV-568964] - amdsmi_set_gpu_power_profile
* [SWDEV-568953] - amdsmi_gpu_create_counter
* [SWDEV-568939] - amdsmi_set_cpu_pcie_link_rate
* [SWDEV-568937] - amdsmi_get_cpu_socket_lclk_dpm_level

Signed-off-by: Sumanth Gavini <sumanth.gavini@amd.com>

* Fixes:
    SWDEV-568716 [TCT][amd-smi]: NameError: name 'handle' is not defined when calling amdsmi_get_processor_handles(handle)
    SWDEV-568726 [TCT][amd-smi]: TypeError: list indices must be integers or slices, not str when accessing cache_values['cache_properties']
    SWDEV-568526 [TCT][amd-smi]: AMD SMI Python API Documentation Error – Incorrect variable name in sample code
    SWDEV-569017 [TCT][amd-smi]: correction required for amdsmi_set_clk_freq API in python API document page
    SWDEV-569025 [TCT][amd-smi]: amdsmi_get_link_metrics python API raises key error, correction required in python API sample documentation

Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>

* Fix: SWDEV-568727 [TCT][amd-smi]: Mandatory arguments 'encoding' and 'link_name' need to be updated in Python API Sample documentation

Signed-off-by: amd-josnarlo <josnarlo.amd.com>

---------

Signed-off-by: Maisam Arif <Maisam.Arif@amd.com>
Signed-off-by: Sumanth Gavini <sumanth.gavini@amd.com>
Signed-off-by: Joseph Narlo <joseph.narlo@amd.com>
Signed-off-by: amd-josnarlo <josnarlo.amd.com>
Co-authored-by: Sumanth Gavini <sumanth.gavini@amd.com>
Co-authored-by: Joseph Narlo <joseph.narlo@amd.com>
Co-authored-by: amd-josnarlo <josnarlo.amd.com>
2025-12-09 16:16:50 -06:00
Jason Bonnell 74cf2a85ec Fix rocprofiler-sdk CI workflow tarball install errors (#2225)
* Add ls statement for debugging /opt directory file naming

* Update ROCM_VERSION from 7.0.0 to 7.1.1 in SDK CI

* Update amdgpu debian package for Ubuntu in Dockerfile.ci

* disable HIP/CLR build in codeql (#2242)

---------

Co-authored-by: Venkateshwar Reddy Kandula <Venkateshwarreddy.Kandula@amd.com>
2025-12-09 16:06:35 -05:00
Rahul Manocha af5c453551 hip-tests: Enable standalone test targets with cmake (#2189)
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2025-12-09 11:07:42 -08:00
dependabot[bot] 103b31c51a Docs - Bump rocm-docs-core[api_reference] from 1.30.1 to 1.31.0 in /docs/sphinx (#214)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.30.1 to 1.31.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.31.0/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.30.1...v1.31.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.31.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rocjpeg commit: 8a59339d34]
2025-12-09 10:54:31 -08:00
dependabot[bot] 8a59339d34 Docs - Bump rocm-docs-core[api_reference] from 1.30.1 to 1.31.0 in /docs/sphinx (#214)
Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.30.1 to 1.31.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/v1.31.0/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.30.1...v1.31.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-version: 1.31.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-12-09 10:54:31 -08:00
Dominic Widdows 75bea883e1 Remove redirect notice from redirect target (#2104)
README is copied from https://github.com/ROCm/clr which redirects to target
https://github.com/ROCm/rocm-systems/tree/develop/projects/clr

This is correct for https://github.com/ROCm/clr I think, but unnecessary for https://github.com/ROCm/rocm-systems/edit/develop/projects/clr/README.md which is already the correct redirect target.
2025-12-09 10:47:51 -08:00
Benjamin Welton 65c048e918 Change rocprofiler-sdk CMake compatibility to AnyNewerVersion (#1632)
* Change rocprofiler-sdk CMake compatibility to AnyNewerVersion

Update CMake package version compatibility from SameMinorVersion to
AnyNewerVersion to allow downstream packages (like RDC) to use newer
versions of rocprofiler-sdk without requiring exact minor version match.

This fixes compatibility issues where RDC requests 1.0.0 but finds 1.1.0.

* Update projects/rocprofiler-sdk/cmake/rocprofiler_config_install.cmake

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Change rocpd and roctx CMake compatibility to SameMajorVersion

Update COMPATIBILITY setting from SameMinorVersion to SameMajorVersion
for both rocpd and roctx packages to allow compatibility across minor
versions within the same major version.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-12-09 09:51:55 -08:00
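
The effect of the compatibility modes named in the commit above can be illustrated with a small model. This is a simplified Python sketch of the documented CMake behaviour, not CMake's own logic: it assumes plain three-component versions and ignores version ranges and the ExactVersion mode. The function names are made up for this example.

```python
def parse_version(text):
    """Turn a dotted version string such as "1.1.0" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(installed, requested, mode):
    """Return True if `installed` satisfies a request for `requested`."""
    inst = parse_version(installed)
    req = parse_version(requested)
    if inst < req:  # every mode rejects an installed version that is older
        return False
    if mode == "AnyNewerVersion":
        return True
    if mode == "SameMajorVersion":
        return inst[0] == req[0]
    if mode == "SameMinorVersion":
        return inst[:2] == req[:2]
    raise ValueError(f"unknown compatibility mode: {mode}")
```

With the versions from the commit message, a request for 1.0.0 against an installed 1.1.0 is rejected under SameMinorVersion but accepted under SameMajorVersion and AnyNewerVersion. Only AnyNewerVersion would also accept an installed 2.0.0.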
Swati Rawat 87e61f514c Update ROCTracer README for the GitHub link (#1745)
* Update README for the GitHub link

* Updating links to rocm-systems
2025-12-09 09:42:48 -08:00
Larry Meadows 938fe1ca8e Add missing formatters for ompt_mutex_t and ompt_target_t. (#1343)
These are used by rocprofiler-system when it
    generates Perfetto output.
2025-12-09 11:40:48 -06:00
Victor Zhang aaecffa50b SWDEV-568847 - prevent UAF when registering callbacks on completed events (#2066)
* SWDEV-568847 - prevent UAF when registering callbacks on completed events

* cache the status() of event earlier

* Update command.cpp

* revert cl_event.cpp

* Update cl_event.cpp

---------

Co-authored-by: cadolphe-amd <chris.adolphe@amd.com>
2025-12-09 11:38:45 -05:00
Aurelien Bouteiller 92459fa840 Update version to 3.2.0 for 7.2.0 rocm release (#351)
[ROCm/rocshmem commit: ef5f2be215]
2025-12-09 10:26:55 -05:00
Aurelien Bouteiller ef5f2be215 Update version to 3.2.0 for 7.2.0 rocm release (#351) 2025-12-09 10:26:55 -05:00
Jatin Chaudhary eea93d58a2 SWDEV-554626 - return correct error code (#1107)
* SWDEV-554626 - return hipErrorInvalidDeviceFunction when we cannot load the module
Return correct error code when modules are empty

* Match the error codes

* Revert the error code
2025-12-09 16:10:25 +01:00
Anatolii Rozanov f98c72d627 Add host API for *_on_stream operations (#340)
* Add functional test for barrier_all_on_stream

* Add rocshmem_barrier_all_on_stream support for GDA and RO backends

Implements rocshmem_barrier_all_on_stream operation for
GPU Direct Access and Reverse Offload backends.

Previously, rocshmem_barrier_all_on_stream was only supported for IPC backend.

* Add functional test for rocshmem_broadcastmem_on_stream

* Add host-side rocshmem_broadcastmem_on_stream API

Implement stream-based broadcast collective operation

- Add rocshmem_broadcastmem_on_stream host API and kernel implementation
- Add functional test TeamBroadcastmemOnStreamTester with multi-stream
  support and correctness verification
- Use per-workgroup contexts to avoid contention across parallel streams

API:
rocshmem_broadcastmem_on_stream(team, dest, source, nelems, pe_root, stream)

* Add functional test for rocshmem_getmem_on_stream

* Add host-side rocshmem_getmem_on_stream API

Implement stream-based point-to-point RMA get operation

- Add rocshmem_getmem_on_stream host API and kernel implementation
- Support for asynchronous getmem operations on HIP streams
- Add backend support for GDA, RO, and IPC contexts
- Use work-group collective getmem for efficient memory transfer

API:
rocshmem_getmem_on_stream(dest, source, nelems, pe, stream)

(AI Assist)

* Add host-side rocshmem_putmem_on_stream API

- Add rocshmem_putmem_on_stream for asynchronous remote writes
- Support for concurrent RMA operations on HIP streams
- Add backend support for GDA, RO, and IPC contexts
- Use work-group device collective operation

API:
rocshmem_putmem_on_stream(dest, source, bytes, pe, stream)

(AI Assist)

* Add functional test for rocshmem_putmem_on_stream

* Add host-side rocshmem_putmem_signal_on_stream API

Enables asynchronous putmem operations with signaling on HIP streams.

The implementation includes:
- Kernel wrapper rocshmem_putmem_signal_kernel
- Host interface putmem_signal_on_stream method
- Context layer support across all backends (IPC, GDA, RO)
- Public API

Function signature:
void rocshmem_putmem_signal_on_stream(void *dest, const void *source,
                                      size_t bytes, uint64_t *sig_addr,
                                      uint64_t signal, int sig_op,
                                      int pe, hipStream_t stream);

* Add functional test for rocshmem_putmem_signal_on_stream

* Add host-side rocshmem_signal_wait_until_on_stream API

Enables asynchronous signal wait operations on HIP streams.

The implementation includes:
- Kernel wrapper rocshmem_signal_wait_until_kernel
- Host interface signal_wait_until_on_stream method
- Context layer support across all backends (IPC, GDA, RO)
- Native uint64_t support in wait_until API (generated from P2P_SYNC.py)

Function signature:
void rocshmem_signal_wait_until_on_stream(uint64_t *sig_addr, int cmp,
                                          uint64_t cmp_value,
                                          hipStream_t stream);

(AI Assist)

* Add functional test for rocshmem_signal_wait_until_on_stream

* Add documentation for stream API functions

This commit adds API documentation for the following host-side
stream functions:

- rocshmem_barrier_all_on_stream (collective routines)
- rocshmem_broadcastmem_on_stream (collective routines)
- rocshmem_getmem_on_stream (RMA operations)
- rocshmem_putmem_on_stream (RMA operations)
- rocshmem_putmem_signal_on_stream (signaling operations)
- rocshmem_signal_wait_until_on_stream (point-to-point sync)

The documentation includes function signatures, parameter descriptions,
and detailed explanations of asynchronous behavior and stream handling.

(AI Assist)

* Rename "bytes" -> "nelems"

* Add "_TEST_" to the variables used in tests

* Remove incorrect hipStreamDefault usage

hipStreamDefault is not the default stream; it is a flag.

If stream == nullptr, pass it through to the kernel launch unchanged. The kernel is then launched on the default stream.

[ROCm/rocshmem commit: d0c8380650]
2025-12-09 08:55:46 -06:00
Anatolii Rozanov d0c8380650 Add host API for *_on_stream operations (#340)
* Add functional test for barrier_all_on_stream

* Add rocshmem_barrier_all_on_stream support for GDA and RO backends

Implements rocshmem_barrier_all_on_stream operation for
GPU Direct Access and Reverse Offload backends.

Previously, rocshmem_barrier_all_on_stream was only supported for IPC backend.

* Add functional test for rocshmem_broadcastmem_on_stream

* Add host-side rocshmem_broadcastmem_on_stream API

Implement stream-based broadcast collective operation

- Add rocshmem_broadcastmem_on_stream host API and kernel implementation
- Add functional test TeamBroadcastmemOnStreamTester with multi-stream
  support and correctness verification
- Use per-workgroup contexts to avoid contention across parallel streams

API:
rocshmem_broadcastmem_on_stream(team, dest, source, nelems, pe_root, stream)

* Add functional test for rocshmem_getmem_on_stream

* Add host-side rocshmem_getmem_on_stream API

Implement stream-based point-to-point RMA get operation

- Add rocshmem_getmem_on_stream host API and kernel implementation
- Support for asynchronous getmem operations on HIP streams
- Add backend support for GDA, RO, and IPC contexts
- Use work-group collective getmem for efficient memory transfer

API:
rocshmem_getmem_on_stream(dest, source, nelems, pe, stream)

(AI Assist)

* Add host-side rocshmem_putmem_on_stream API

- Add rocshmem_putmem_on_stream for asynchronous remote writes
- Support for concurrent RMA operations on HIP streams
- Add backend support for GDA, RO, and IPC contexts
- Use work-group device collective operation

API:
rocshmem_putmem_on_stream(dest, source, bytes, pe, stream)

(AI Assist)

* Add functional test for rocshmem_putmem_on_stream

* Add host-side rocshmem_putmem_signal_on_stream API

Enables asynchronous putmem operations with signaling on HIP streams.

The implementation includes:
- Kernel wrapper rocshmem_putmem_signal_kernel
- Host interface putmem_signal_on_stream method
- Context layer support across all backends (IPC, GDA, RO)
- Public API

Function signature:
void rocshmem_putmem_signal_on_stream(void *dest, const void *source,
                                      size_t bytes, uint64_t *sig_addr,
                                      uint64_t signal, int sig_op,
                                      int pe, hipStream_t stream);

* Add functional test for rocshmem_putmem_signal_on_stream

* Add host-side rocshmem_signal_wait_until_on_stream API

Enables asynchronous signal wait operations on HIP streams.

The implementation includes:
- Kernel wrapper rocshmem_signal_wait_until_kernel
- Host interface signal_wait_until_on_stream method
- Context layer support across all backends (IPC, GDA, RO)
- Native uint64_t support in wait_until API (generated from P2P_SYNC.py)

Function signature:
void rocshmem_signal_wait_until_on_stream(uint64_t *sig_addr, int cmp,
                                          uint64_t cmp_value,
                                          hipStream_t stream);

(AI Assist)

* Add functional test for rocshmem_signal_wait_until_on_stream

* Add documentation for stream API functions

This commit adds API documentation for the following host-side
stream functions:

- rocshmem_barrier_all_on_stream (collective routines)
- rocshmem_broadcastmem_on_stream (collective routines)
- rocshmem_getmem_on_stream (RMA operations)
- rocshmem_putmem_on_stream (RMA operations)
- rocshmem_putmem_signal_on_stream (signaling operations)
- rocshmem_signal_wait_until_on_stream (point-to-point sync)

The documentation includes function signatures, parameter descriptions,
and detailed explanations of asynchronous behavior and stream handling.

(AI Assist)

* Rename "bytes" -> "nelems"

* Add "_TEST_" to the variables used in tests

* Remove incorrect hipStreamDefault usage

hipStreamDefault is not the default stream; it is a flag.

If stream == nullptr, pass it through to the kernel launch unchanged. The kernel is then launched on the default stream.
2025-12-09 08:55:46 -06:00
Mario Limonciello a08170bc75 Apu prerequisites (#1946)
* Don't require powercap support

APUs don't necessarily support setting a power cap from sysfs.
Ignore failures caused by the file being missing.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Show edge temperature in default output if hotspot is missing

APUs don't have a hotspot temperature, but they do have an edge temperature.
Use that instead.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Format all "power" keys as watts

There will be more power keys when APU support is added, so format
them properly.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Don't show power limit in output if it's invalid

APUs can't set power limit using power_cap1 interface.  The limit
will be 0 and thus the UX looks weird in default output.
Only add the `/power_limit` if it's valid.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Unify sizes of `amdsmi_power_info_t`

Sizes are used inconsistently.  This causes tools to not show
N/A when they should.  Make them unified.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

---------

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-12-08 21:36:45 -06:00
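
The output rules described in the commit above (fall back to the edge temperature, format power in watts, omit an invalid power limit) can be sketched as follows. This is an illustrative Python model only: the dictionary keys, the function name, and the exact output strings are assumptions, not the amd-smi code.

```python
def format_summary(metrics):
    """Build display strings from a dict of raw readings (hypothetical keys)."""
    out = {}

    # Prefer the hotspot temperature; fall back to edge when it is missing.
    temp = metrics.get("hotspot_temp")
    if temp is None:
        temp = metrics.get("edge_temp")
    out["temperature"] = "N/A" if temp is None else f"{temp} C"

    # Format power in watts and append the limit only when it is valid.
    # A limit of 0 (or a missing limit) is treated as invalid here.
    power = metrics.get("socket_power")
    limit = metrics.get("power_limit")
    text = "N/A" if power is None else f"{power} W"
    if power is not None and limit:
        text += f"/{limit} W"
    out["power"] = text
    return out
```

An APU-like reading with no hotspot sensor and a zero power limit then yields a plain `12 W` rather than `12 W/0 W`, while a dGPU-like reading keeps the `power/limit` form.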
Dimple Prajapati b9c172de16 Add IBGDA backend flag to enable bitcode generation (#347)
* Change to enable ibgda bitcode compilation

* Apply suggestion from @abouteiller

---------

Co-authored-by: Aurelien Bouteiller <aurelien.bouteiller@amd.com>

[ROCm/rocshmem commit: fbe57306b9]
2025-12-08 16:19:48 -08:00