76333 Commits

Author SHA1 Message Date
systems-assistant[bot] 720a5bcf9a SWDEV-547526 - Add missing free calls (#531)
Co-authored-by: Vladana Stojiljkovic <Vladana.Stojiljkovic@amd.com>
2025-11-13 11:16:41 +01:00
systems-assistant[bot] 7450910e53 SWDEV-548241 - Add missing destroy calls in graph tests (#520)
Co-authored-by: Vladana Stojiljkovic <Vladana.Stojiljkovic@amd.com>
2025-11-13 11:13:40 +01:00
Tim Huang e2d83014cf rocr/dtif: Add ring doorbell for sdma user queue (#1619)
Signed-off-by: Tim Huang <tim.huang@amd.com>
2025-11-13 15:08:08 +08:00
gilbertlee-amd 22d9a038a2 [GRAPH] Adding support for rail-optimized trees for MI3XX with 4 NICs (#2031)
[ROCm/rccl commit: 46b032b760]
2025-11-12 19:34:27 -06:00
gilbertlee-amd 46b032b760 [GRAPH] Adding support for rail-optimized trees for MI3XX with 4 NICs (#2031) 2025-11-12 19:34:27 -06:00
systems-assistant[bot] 061948a5ec [rocpd] Adding merge and package submodules for rocpd (#164)
* adding ROCpd database merge

* adding ROCpd database merge concatenating all tables

* update merge script

  - copy all tables from files

* fix merge format

* Add package submodule, initial POC.  Need to refine

* Minor fixes and clean up duplicated code in package.py

* Revamp metadata layout, add wildcard and .rpdb parsing

* Add auto merge & package when > 5 DBs, add examples, don't use auto_merge when using sub-commands merge & package

* - Extend package/yaml inputs to all rocpd modules
- Improve handling of more corner cases for bad input files when parsing input parameters (bad yaml files, bad .rpdb folder, folders as input)
- Changed to use UUID in merged filename instead of the time, in auto-merge algorithm

* Minor text fixes for consistency between modules

* Add more wildcard support and add package, merge tests

* Make changes based on review suggestions

* Move parsing packages into importer.py, simplified adding required params to a function

* fix package test by flattening input list before processing

* Integrate merge.py changes from Jonathan to add name-collision checks, recreating indexes, foreign key check (disabled for now, due to processing time)

* Rework rocpd.<submodule>.{add_args,process_args}

- add_args function returns a functor which accepts input and args
- time_window functor returned from add_args automatically applies time windowing of input

* change merge&package limit to 1, merge should create data views

* Move files by default instead of making copies

- copying can be enabled by passing "copy=True" or --copy cmdline argument

* refactor package to make the logic cleaner, set merge limit back to 5

* Allow automerge-limit param to override limit, change default back to 1.  Tests updated to use query, much quicker

* Update --help instructions for package

---------

Co-authored-by: acanadas <acanadas@amd.com>
Co-authored-by: a-canadasruiz <Araceli.CanadasRuiz@amd.com>
Co-authored-by: Young Hui <young.hui@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-11-12 17:07:12 -05:00
Sourabh U Betigeri f58393108f SWDEV-564408 - Reduces hip-tests runtime Pt2 (#1724) 2025-11-12 13:37:00 -08:00
David Yat Sin 7b097599c4 rocr: Fix race condition in SetAsyncSignalHandler (#1642)
Fix race condition when SetAsyncSignalHandler is called for the first time:
async_events_thread_ could be null and launched twice.

Refactored async-events to use lazy_pointer.
2025-11-12 13:54:26 -05:00
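
The race described in #1642 (a worker thread that is still null being launched twice by concurrent first-time callers) can be sketched as follows. This is a minimal Python analogy, not the ROCr code: `LazyWorker`, `ensure_started`, and the `starts` counter are hypothetical names, and a plain lock stands in for the `lazy_pointer` refactor mentioned in the commit.

```python
import threading


class LazyWorker:
    """Hypothetical helper: starts a background thread at most once."""

    def __init__(self, target):
        self._target = target
        self._lock = threading.Lock()
        self._thread = None   # created lazily on first use
        self.starts = 0       # counts how many times the thread was launched

    def ensure_started(self):
        # The null check and the launch happen under one lock, so two
        # concurrent first-time callers cannot both see "no thread yet".
        with self._lock:
            if self._thread is None:
                self._thread = threading.Thread(target=self._target, daemon=True)
                self._thread.start()
                self.starts += 1
        return self._thread
```

Without the lock, two callers could both observe `self._thread is None` and each start a thread, which is the double launch the commit message describes.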
pcritchl-amd 60cd210dac Reapply "SWDEV-562996 - Build fix: Ubertrace callback calling convention mismatch on x86 (#1587)" (#1717) (#1754) 2025-11-12 13:47:24 -05:00
Aryan Salmanpour f0a07d9676 Drop libva-amdgpu/libva-amdgpu-devel use of RHEL8 (#202)
[ROCm/rocjpeg commit: 78af63d67c]
2025-11-12 10:44:43 -08:00
Aryan Salmanpour 78af63d67c Drop libva-amdgpu/libva-amdgpu-devel use of RHEL8 (#202) 2025-11-12 10:44:43 -08:00
Ioannis Assiouras 4f91b68988 SWDEV-559166 - Remove obsolete member execInfoOffset from KernelParameters (#1790) 2025-11-12 17:20:36 +00:00
nawrinsu cac8dc67fd Add tuner config file (2,4,8 nodes) for gfx950 (#2012)
* Add tuner config file (2,4,8 nodes) for gfx950

* remove alltoall

* Added comment regarding allgather direct

[ROCm/rccl commit: c488c5307e]
2025-11-12 09:16:36 -08:00
nawrinsu c488c5307e Add tuner config file (2,4,8 nodes) for gfx950 (#2012)
* Add tuner config file (2,4,8 nodes) for gfx950

* remove alltoall

* Added comment regarding allgather direct
2025-11-12 09:16:36 -08:00
Yiltan a500bc8029 only use rocm_install if we build the tools (#316)
[ROCm/rocshmem commit: 73786e203e]
2025-11-12 10:58:49 -05:00
Yiltan 73786e203e only use rocm_install if we build the tools (#316) 2025-11-12 10:58:49 -05:00
David Galiffi f8694173f6 Round the sum of percentages before validating to account for floating point errors (#1824)
* Round the sum of percentages before validating to account for floating point errors
---------

Co-authored-by: Kian Cossettini <Kian.Cossettini@amd.com>
2025-11-12 09:26:25 -05:00
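
The idea behind #1824 (round the sum of percentages before validating it) can be sketched as below. This is an illustrative example only, not the code from the commit: the function name `percentages_sum_to_100` and the `ndigits=6` precision are assumptions.

```python
def percentages_sum_to_100(percentages, ndigits=6):
    """Return True if the values add up to 100 after rounding.

    Summing floats accumulates tiny representation errors, so a strict
    `sum(...) == 100` check can reject inputs that are correct on paper.
    Rounding the total first absorbs those errors. (ndigits is an assumed
    tolerance, chosen for illustration.)
    """
    total = round(sum(percentages), ndigits)
    return total == 100.0


# Each list adds up to 100 on paper; the rounded check accepts both.
print(percentages_sum_to_100([33.3, 33.3, 33.4]))
print(percentages_sum_to_100([0.1] * 1000))

# A list that is genuinely short of 100 is still rejected.
print(percentages_sum_to_100([50.0, 49.0]))
```

Rounding keeps the validation strict for real mistakes (a missing percentage point) while tolerating errors many orders of magnitude smaller than the chosen precision.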
Aleksei Tumakaev 90ac6675c2 [rocpd] Fix negative timestamp delta in perfetto (#1568)
* Fix negative delta_ts in perfetto
2025-11-12 15:08:58 +01:00
Edgar Gabriel 722b54fddb replace MPI function call. (#317)
* replace MPI function call.

* add two missing defs for RO

[ROCm/rocshmem commit: e1a7e20b1b]
2025-11-12 07:38:47 -06:00
Edgar Gabriel e1a7e20b1b replace MPI function call. (#317)
* replace MPI function call.

* add two missing defs for RO
2025-11-12 07:38:47 -06:00
Satyanvesh Dittakavi 07dd4c85e7 SWDEV-546308 - Implement hipKernelGetParamInfo API (#1783) 2025-11-12 14:09:26 +05:30
swargamrambabu e6b1ec25bd SWDEV-561337 Additional Tests for hipStreamCopyAttributes API (#1607)
* SWDEV-548797 Additional Tests for hipStreamCopyAttributes API

* SWDEV-548797 : Added sanity check section for negative test case
2025-11-12 14:05:54 +05:30
systems-assistant[bot] f99baf5481 SWDEV-519340 - Enable and fix hipModuleLoad test (#607) 2025-11-12 09:28:49 +01:00
David Galiffi 3ad7c20961 Change test condition from transpose-sampling to roctx-api-sampling (#1784) 2025-11-11 17:39:05 -05:00
jofrn 8f9da259ac Fix memory leak in hip_fatbin.cpp UncompressAndPopulateCodeObject (#1692)
Wrap amd_comgr_data_t item returned from action_data_get_data() in
ComgrDataUniqueHandle to ensure it gets released.
2025-11-11 16:48:06 -05:00
systems-assistant[bot] a66ca8809b SWDEV-511239 - Remove `and` and use `&&` for preprocessors (#506)
This shows up as a warning in MSVC.

Co-authored-by: Jatin Chaudhary <JatinJaikishan.Chaudhary@amd.com>
2025-11-11 09:43:57 -08:00
Kapil S. Pawar c4d7680749 Added Functional Tests for CSV Tuner Plugin (#1968)
* Add functional tests for CSV Tuner Plugin

* Updated directory structure

* Updated and renamed directories

* Updated csv conf files

* Updated readme

* Updated readme

* Updated readme

[ROCm/rccl commit: c8da880dc7]
2025-11-11 10:11:19 -06:00
Kapil S. Pawar c8da880dc7 Added Functional Tests for CSV Tuner Plugin (#1968)
* Add functional tests for CSV Tuner Plugin

* Updated directory structure

* Updated and renamed directories

* Updated csv conf files

* Updated readme

* Updated readme

* Updated readme
2025-11-11 10:11:19 -06:00
Dingming Wu 0d3fba9a22 Adjust nChannels on gfx950 based on ranks and nodes for better bandwidth (#2027)
[ROCm/rccl commit: b811645688]
2025-11-11 09:46:51 -06:00
Dingming Wu b811645688 Adjust nChannels on gfx950 based on ranks and nodes for better bandwidth (#2027) 2025-11-11 09:46:51 -06:00
Gheorghe-Teodor Bercea 3da73a7526 Fix compilation when enabling indirect function calls (#1994)
Fix compilation when enabling indirect function calls.

[ROCm/rccl commit: 1678bb9ae7]
2025-11-11 09:36:48 -05:00
Gheorghe-Teodor Bercea 1678bb9ae7 Fix compilation when enabling indirect function calls (#1994)
Fix compilation when enabling indirect function calls.
2025-11-11 09:36:48 -05:00
Giovanni Lenzi Baraldi 07a563c475 AQLprofile SQTT double buffer support (#1787) 2025-11-11 13:01:22 +01:00
Todd tiantuo Li cf536a8c1a SWDEV-554372 - Add 3 HIP_GET_PROC_ADDRESS_xxx flags (#1771) 2025-11-10 23:29:40 -08:00
cfallows-amd 683a63d9ec Update rocprofiler-compute workflows (#1788)
* Update workflow files to use general public rocm dev build images from dockerhub.
Old method was to borrow rocprofiler-systems images but they do not contain rocm install anymore, so we cannot rely on them.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Add workflow files to paths on push and PR

* Revert change of image for red hat variant because the image offered in official rocm image release is too large for runners.
Going back to using systems team images and installing rocm on them (as they do) as a workaround until we can get a smaller package size docker image with ROCm included.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

Adjusted python3-devel install line with an if else determined by distro version.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: jbonnell-amd <jason.bonnell@amd.com>
2025-11-10 20:48:39 -05:00
Mustafa Abduljabbar b12399898d Reduce LL threshold for a2a (#2032)
[ROCm/rccl commit: 52f9526bd6]
2025-11-10 19:14:23 -05:00
Mustafa Abduljabbar 52f9526bd6 Reduce LL threshold for a2a (#2032) 2025-11-10 19:14:23 -05:00
Jin Jung 83291d71a1 SWDEV-558855 - hipExternalMemoryGetMappedBuffer test with CPU-nonvisible memory (#1760) 2025-11-10 17:35:03 -05:00
amd-hsivasun 946eacdd4a [Ex CI] Disable hip-tests pipeline (#1785) 2025-11-10 17:33:42 -05:00
Kapil S. Pawar 6bbc4b5d48 [RcclReplayer] Compile without the need for RCCL to be compiled (#2039)
[ROCm/rccl commit: acdafac49f]
2025-11-10 15:38:48 -06:00
Kapil S. Pawar acdafac49f [RcclReplayer] Compile without the need for RCCL to be compiled (#2039) 2025-11-10 15:38:48 -06:00
SaleelK 5e418ca256 clr: Allow all engines but prefer recommended engines (#1750)
* Also honor ROC_P2P_SDMA_SIZE for IPC, since IPC can also mean P2P
2025-11-10 13:10:46 -08:00
David Galiffi 3883bd3e93 Support for TheRock builds (#1545)
* Cleaning up some BUILD_<dep> config variables

The `ROCPROFSYS_BUILD_<dep>` settings were being translated to `BUILD_<dep>` for the old Dyninst dependencies.
Remove this extra layer
Add `rocprofiler_systems_add_option` for the `ROCPROFSYS_BUILD_<dep>` options, so there is a better description in the CMakeCache.

* Changes to support USE_ROCM in TheRock builds

* Removed `amd-smi::roctx` from Findamd-smi.cmake

* Fix linking error on rocm-6.4 when including amd_smi

* Format cmake

* Fix typo in logs

* Removing Findamd-smi.cmake

* Refactor the cmake parameters for `amd-smi`.

The `drm` libraries were only required by amdsmi for rocm-6.4.0. There was no point adding them for other versions.
2025-11-10 14:38:51 -05:00
AL Musaffar, Yazen 699890a3f5 Fix for XGMI and SOC policies KeyError (#823)
Fix for amd-smi XGMI and SOC policies errors

Signed-off-by: Yazen AL Musaffar <Yazen.ALMusaffar@amd.com>
2025-11-10 12:41:47 -06:00
AL Musaffar, Yazen 93a719b894 Fix for XGMI and SOC policies KeyError (#823)
Fix for amd-smi XGMI and SOC policies errors

Signed-off-by: Yazen AL Musaffar <Yazen.ALMusaffar@amd.com>

[ROCm/amdsmi commit: 699890a3f5]
2025-11-10 12:41:47 -06:00
Jatin Chaudhary 68098c4d90 SWDEV-560329 - Fix some tests (#1378) 2025-11-10 18:22:03 +00:00
Aleksandar Djordjevic f39a60ac25 [rocprofiler-systems] Apply new CMake formatting for the latest gersemi version (#1778)
* Fix cmake formatting

* Updated rev. in `.pre-commit-config.yaml`

* Pin the gersemi used in CI to v0.23.1, matching the pre-commit

---------

Co-authored-by: Aleksandar Djordjevic <adjordje@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-11-10 13:08:44 -05:00
Dingming Wu 23870ceccd Fail the job if flag HIP_HOST_UNCACHED_MEMORY is not set on MI350x (#2023)
* Fail the job if compiler flag HIP_HOST_UNCACHED_MEMORY is not turned on on mi350x
Place the check after initTransportsRank as the GPU arch info in comm->topo->nodes info is populated after that.

* Update src/init.cc to use ERROR instead of WARN
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: 05f914c997]
2025-11-10 11:54:35 -06:00
Dingming Wu 05f914c997 Fail the job if flag HIP_HOST_UNCACHED_MEMORY is not set on MI350x (#2023)
* Fail the job if compiler flag HIP_HOST_UNCACHED_MEMORY is not turned on on mi350x
Place the check after initTransportsRank as the GPU arch info in comm->topo->nodes info is populated after that.

* Update src/init.cc to use ERROR instead of WARN
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
2025-11-10 11:54:35 -06:00
Dingming Wu c601f9b3f8 Increment opCount for intra-node comms as well (#2024)
* Enhance logging in NCCL initialization
It's convenient to log comms obj and default channels together for debugging

* Add opCount to collDevWork and update increment logic
Added opCount to collDevWork and incremented it when proxyOpQueue is empty (e.g., for intra-node comms)

* Clarify opCount increment logic in enqueue.cc
Updated comment to clarify incrementing opCount for intranode communications.

* Refactor NCCL_INIT logging format
Updated logging format for NCCL_INIT to improve clarity.

* Remove duplicate INFO logging in init.cc

[ROCm/rccl commit: b00ee4c83c]
2025-11-10 11:23:49 -06:00