Commit graph

74797 Commits

Author SHA1 Message Date
systems-assistant[bot] 1211790607 Direct Reduce Scatter Implementation (#2765)
* Add new implementation of direct send/recv reduce scatter

* Resolved conflicts

* Add multiple channels support to the reduction kernel of direct reduce scatter and adjust offset into buffer to utilize multiple channels.

* Resolve validation issue when the number of elements is not divisible by the number of channels, which left elements unaccounted for in the reduction.

* fix proxy hang

* set maxSrcs to 64 in reduceCopy

* optimize multi-channel code

* fix validation issue in single node MI300

* Tune the message size range for 2, 4, and 8 nodes

* Move Direct RS into separate kernel

* Add Copyright

* resolve review comments

* resolve review comments

* fix merge build issue

* revert move Direct RS into separate kernel

* address review comments

* address review comments

---------

Co-authored-by: KawtharShafie <kawtharshafie@gmail.com>
Co-authored-by: Ghadeer Alabandi <abandiga@gmail.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
2026-01-30 09:27:27 -06:00
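The multi-channel validation fix in this commit (elements left unaccounted for when the element count is not divisible by the channel count) comes down to how the buffer is partitioned across channels. A minimal sketch, assuming a contiguous split in which the first `count % nChannels` channels each take one extra element; `channel_slice` and `ChannelSlice` are hypothetical names, not the actual RCCL kernel code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical: half-open range [begin, end) of elements reduced by one channel.
struct ChannelSlice {
    std::size_t begin;
    std::size_t end;
};

// Split `count` elements across `nChannels` so that no element is dropped.
ChannelSlice channel_slice(std::size_t count, std::size_t nChannels, std::size_t ch)
{
    assert(nChannels > 0 && ch < nChannels);
    const std::size_t base = count / nChannels;  // elements every channel gets
    const std::size_t rem  = count % nChannels;  // leftovers a naive split would lose
    // The first `rem` channels take one extra element each.
    const std::size_t begin = ch * base + (ch < rem ? ch : rem);
    const std::size_t len   = base + (ch < rem ? 1 : 0);
    return {begin, begin + len};
}
```

With 10 elements on 4 channels this yields [0,3), [3,6), [6,8), [8,10): contiguous and covering every element.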
systems-assistant[bot] 055909d335 Set default max channels to 48 for MI350 multi-node (#2759)
* make 48 the default max channels for MI350

* address review comments

---------

Co-authored-by: Ghadeer Alabandi <abandiga@gmail.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
2026-01-30 09:22:42 -06:00
Alysa Liu 13091e18ad libhsakmt: Add THEROCK_SANITIZER support for ASAN builds (#2978)
Add THEROCK_SANITIZER support for ASAN builds.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2026-01-30 10:02:10 -05:00
Jin Jung 25d0107d24 SWDEV-575867: Fix error code for mapped graphics resources (#2662) 2026-01-30 07:47:13 -05:00
Alexandra Sidorova 8800e03058 [CLR] Added missed ostream include to amd_hip_bfloat16.h (#2960) 2026-01-30 07:42:38 -05:00
Jan Stephan 20a745962f Fix graph API binary paths (#2884)
Signed-off-by: Jan Stephan <jan.stephan@amd.com>
2026-01-30 10:51:37 +01:00
Junhua Shen 0d98c3bdd5 libhsakmt: Implement per-context topology for multi-context KFD support (#2405)
This enhances libhsakmt's capabilities for multi-context KFD support by implementing per-context topology management.

Changes:
* Add hsaKmtGetClockCountersCtx for multi-context support
  - Add context-aware version of hsaKmtGetClockCounters
  - Original API is retained as a wrapper calling the ctx-version with primary context

* Enable independent debug sessions across multiple KFD contexts
  - Create hsa_kfd_debug_context, introduce context-aware debug APIs, and move debug state to per-context storage

* Add perf sub-context for per-context performance counter management
  - Introduce hsa_kfd_perf_context, move counter properties, add context-aware perf APIs, and update initialization

* Refactor FMM for per-context resource management
  - Refactor multiple global variables related to FMM, including
    GPU ID arrays, svm, cpuvm_aperture, and mem_handle_aperture, into hsa_kfd_fmm_context

* Implement per-context topology for complete context isolation
  - Migrate global topology data (g_system, g_props, map_user_to_sysfs_node_id)
     to per-context hsa_kfd_topology_context structure
  - Update all topology functions to accept HsaKFDContext parameter for
     context-aware operations (validate_nodeid, get_node_props, get_iolink_props, etc.)
  - Refactor topology snapshot management for per-context isolation
  - Add context-aware PMC trace access APIs

Signed-off-by: Junhua Shen <Junhua.Shen@amd.com>
2026-01-30 09:42:25 +08:00
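The compatibility pattern described for hsaKmtGetClockCountersCtx (the original API kept as a wrapper that calls the context-aware version with the primary context) can be sketched as follows. All names, types, and signatures here (`get_clock_counters_ctx`, `get_clock_counters`, `Context`, `g_primary_ctx`, the placeholder clock values) are hypothetical stand-ins; the real libhsakmt API is not reproduced.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; the real libhsakmt types and signatures differ.
struct ClockCounters { std::uint64_t gpu_clock; std::uint64_t cpu_clock; };
struct Context { int id; std::uint64_t fake_clock_base; };

// Hypothetical process-wide primary context used by legacy entry points.
Context g_primary_ctx{0, 1000};

// Context-aware variant: all state is read from the context passed in.
int get_clock_counters_ctx(Context* ctx, std::uint32_t node, ClockCounters* out)
{
    if (ctx == nullptr || out == nullptr) return -1;  // invalid argument
    out->gpu_clock = ctx->fake_clock_base + node;      // placeholder values
    out->cpu_clock = ctx->fake_clock_base;
    return 0;
}

// Legacy API kept for compatibility: forwards to the ctx variant with the
// primary context, so existing callers see unchanged behavior.
int get_clock_counters(std::uint32_t node, ClockCounters* out)
{
    return get_clock_counters_ctx(&g_primary_ctx, node, out);
}
```

The design choice is that only one code path holds the logic; the legacy entry point adds no behavior of its own.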
vedithal-amd a838b0c07b [rocprofiler-compute] Fix test case for MI 308 (#2934)
* Fix test case for MI 308

* Use consistent naming of GPUs in comment
2026-01-29 18:54:52 -05:00
Venkateshwar Reddy Kandula dea3da3a6f [rocprofiler-sdk][CI] Fix rocprofiler-sdk CI: change ROCm version to 7.2.0 (#2979)
* Update aqlprofile-continuous_integration.yml to use rocm-7.2.0

* Update rocprofiler-sdk-continuous_integration.yml to use rocm-7.2.0

* Update rocprofiler-sdk-docs.yml to use rocm-7.2.0
2026-01-29 17:28:52 -06:00
pghoshamd adaaa883b2 SWDEV-576090 Fix mem leaks and double free of signals (#2817) 2026-01-29 16:53:27 -05:00
Mark Meserve 94c246eb9e attach: fix typos and older names in documentation (#2684) 2026-01-29 16:46:24 -05:00
vedithal-amd 4b364df43b [rocprofiler-compute] Enable panel level csv files for roofline panel (#2887)
* Enable panel level csv files for roofline panel

* Fixed comments
2026-01-29 15:32:54 -05:00
Rahul Manocha c4f7593001 clr: Update signal count and pool size for staging buffer (#2889)
* clr: Update signal count and pool size for staging buffer

* Change variable naming, etc.

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2026-01-29 10:34:00 -08:00
systems-assistant[bot] 58c203e252 Fix channel overuse for 1 rank comms (#2760)
* Fix channel overuse for 1 rank comms

* limit channels when warpSpeed is enabled but not used

* enable std::min check against # of CUs for maxChannels computation when warpSpeed is enabled

---------

Co-authored-by: Mustafa Abduljabbar <muabdulj@amd.com>
Co-authored-by: isaki001 <ioannissakiotis@gmail.com>
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
2026-01-29 12:13:46 -06:00
Benjamin Welton b509e9bd77 [rocprofiler-sdk] Fix domain_ops_padding for 515+ HIP operations (#2941)
* [rocprofiler-sdk] Fix domain_ops_padding for 515+ HIP operations

The HIP runtime API now has 515+ operations (as of ROCm 7.x), but
domain_ops_padding was set to 512. This caused std::out_of_range
exceptions when checking operations >= 512 via std::bitset::test().

Changes:
- Increase domain_ops_padding from 512 to 1024
- Add compile-time static_assert to validate padding is sufficient
  for all API domains (HIP, HSA, marker, RCCL, rocDecode, rocJPEG)

Co-Authored-By: Claude (claude-opus-4.5) <noreply@anthropic.com>

* Update projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/context/domain.cpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [rocprofiler-sdk] Apply clang-format-11 to domain.cpp

Co-Authored-By: Claude (claude-opus-4.5) <noreply@anthropic.com>

* Rework implementation to ensure coverage of all operation enums

* Fix compiler error in unit test for enum_string.cpp

* Fix data types of domain_ops_padding values

* Revert some changes in domain.cpp

---------

Co-authored-by: Claude (claude-opus-4.5) <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2026-01-29 12:26:33 -05:00
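A sketch of the failure mode and the compile-time guard described in this commit: `std::bitset<N>::test(pos)` throws `std::out_of_range` when `pos >= N`, so a 512-bit set cannot test HIP operation IDs of 512 or above. The per-domain counts below are placeholders (only "515+" for HIP comes from the commit message), and `test_throws` is a hypothetical helper.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Placeholder operation counts; the real values come from rocprofiler-sdk's
// generated enums and are NOT reproduced here.
constexpr std::size_t kHipOpCount    = 515;  // "515+" per the commit message
constexpr std::size_t kHsaOpCount    = 200;  // placeholder
constexpr std::size_t kMarkerOpCount = 10;   // placeholder

constexpr std::size_t domain_ops_padding = 1024;

// Compile-time guard in the spirit of the commit: every domain must fit.
static_assert(kHipOpCount <= domain_ops_padding, "HIP ops exceed padding");
static_assert(kHsaOpCount <= domain_ops_padding, "HSA ops exceed padding");
static_assert(kMarkerOpCount <= domain_ops_padding, "marker ops exceed padding");

using domain_ops_bits = std::bitset<domain_ops_padding>;

// Reports whether bitset::test() would throw std::out_of_range for `op`.
template <std::size_t N>
bool test_throws(const std::bitset<N>& bits, std::size_t op)
{
    try {
        (void) bits.test(op);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

With a 512-bit set, testing operation 514 throws; with the 1024-bit padding it does not, and the `static_assert`s turn any future overflow into a build error instead of a runtime exception.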
Yiannis Papadopoulos 66ee941fea rocr/aie: Throw exception for malformed packets and packet submission errors (#2528) 2026-01-29 10:52:42 -05:00
pghoshamd bc20b51f40 SWDEV-561708 Counted queue size from env var (#2844)
* SWDEV-561708 Counted queue size from env var

* use counted_queue_size for test

* remove rocrtst changes; add a const for default queue size

* Remove env var from test; use queue->size

* Improve env var documentation

* Correct type
2026-01-29 10:00:37 -05:00
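One way to read a queue size from an environment variable with a constant fallback, as this commit describes. The variable name, the default value of 1024, the function name `counted_queue_size_from`, and the validation rules (reject empty, non-numeric, negative, zero, or out-of-range input) are all assumptions for illustration, not the ROCr implementation.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical default; the real constant's name and value are not given.
constexpr std::uint32_t kDefaultCountedQueueSize = 1024;

// Parse an env-var style string; fall back to the default on bad input.
std::uint32_t counted_queue_size_from(const char* value)
{
    if (value == nullptr || *value == '\0' || *value == '-')
        return kDefaultCountedQueueSize;  // unset, empty, or negative
    errno = 0;
    char* end = nullptr;
    unsigned long v = std::strtoul(value, &end, 10);
    if (errno != 0 || *end != '\0' || v == 0 || v > UINT32_MAX)
        return kDefaultCountedQueueSize;  // non-numeric, zero, or out of range
    return static_cast<std::uint32_t>(v);
}

// Usage with a hypothetical variable name:
// auto size = counted_queue_size_from(std::getenv("HYPOTHETICAL_QUEUE_SIZE"));
```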
Venkateshwar Reddy Kandula a7c3e8392a [rocprofiler-sdk] Use venv for fixing CI docker image workflow (#2955)
* use python virtual env for aws cli

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* use 7.2 amdgpu for ubuntu

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-29 09:53:15 -05:00
David Galiffi 4c458fae9c [rocprofiler-systems] Fix ROCM_VERSION guard used for the scratch_memory_record structure (#2948)
- Fix ROCM_VERSION guard used for the scratch_memory_record structure
- This fixes a rocm/7.0.2 build failure

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2026-01-29 09:34:27 -05:00
moniljethva b5e4074c78 Adding support for GFX 11.5 in AQL Profiler (#2340)
* Adding support of AQL Profiler for GFX 11.5

* Removing hard coded value for sa_number

* Adding instance count for WGP block, removing hard coded values.

* Fixed SQ counter block and TD counter block instances
2026-01-29 17:39:12 +05:30
Jaydeep 190d9a8e27 SWDEV-561273 - hip samples on TheRock build using HIP LANGUAGE and hip-lang package. (#1794) 2026-01-29 09:15:58 +01:00
Bindhiya Kanangot Balakrishnan fa6f071751 [SWDEV-574637] Avoid redundant hive gpu resets (#2657)
Mode-1 GPU reset affects entire XGMI hive. Added
xgmi_hive_id check to reset only once for same-hive
GPUs while preserving separate resets for different
hives or no hives.
 - Example:
   `sudo amd-smi reset -G` or `sudo amd-smi reset -G -g 0`
   on MI300 will reset all GPUs only once.

Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
2026-01-28 22:59:17 -06:00
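The once-per-hive reset rule above can be sketched as a deduplication over `xgmi_hive_id`. This assumes a hive ID of 0 means the GPU is not part of any hive; `Gpu` and `gpus_to_reset` are hypothetical names, not the amd-smi implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical GPU record; xgmi_hive_id == 0 is assumed to mean "no hive".
struct Gpu { int index; std::uint64_t xgmi_hive_id; };

// GPU indices to issue a mode-1 reset on: one GPU per hive (a reset on any
// member resets the whole hive), plus every GPU that is not in a hive.
std::vector<int> gpus_to_reset(const std::vector<Gpu>& selected)
{
    std::set<std::uint64_t> reset_hives;
    std::vector<int> out;
    for (const Gpu& g : selected) {
        if (g.xgmi_hive_id == 0) {
            out.push_back(g.index);  // standalone GPU: always reset
        } else if (reset_hives.insert(g.xgmi_hive_id).second) {
            out.push_back(g.index);  // first GPU seen in this hive
        }
    }
    return out;
}
```

For eight GPUs that all share one hive, only the first is reset; GPUs in different hives, or in no hive, each still get their own reset.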
Sumanth Gavini e9c72b06b0 [ROCM-1036] Dynamic fan support detection in set -h (#2721)
Show "N/A" for ASICs without fan support
`amd-smi set -h` fan help text will be dynamic instead of "0-255 or 0-100%"

Signed-off-by: Sumanth Gavini <sumanth.gavini@amd.com>
2026-01-28 22:44:25 -06:00
koushikbillakanti-amd e9b143323a [SWDEV-498649] Fix reset cli AttributeError (#2203)
* Fix SWDEV-498649: Handle missing attributes safely in set_gpu

---------

Co-authored-by: gabrpham <Gabriel.Pham@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2026-01-28 22:39:50 -06:00
Yazen AL Musaffar 19725abbf4 [SWDEV-560702] Per-process MEM usage does not add up to per-GPU MEM usage. (#2888)
* Update Python docs for process memory usage
* Added comment for processes total memory usage

---------

Signed-off-by: yalmusaf <Yazen.ALMusaffar@amd.com>
2026-01-28 22:34:20 -06:00
Loganaden Velvindron bf36e5f620 Fix disabled fortify source security flag (#2570)
Fix spurious character that caused CI issue.
2026-01-28 22:30:24 -06:00
peterjunpark 159e751788 docs(amdsmi): add link to amd-smi-virt (#2543)
Update install page virt references
Signed-off-by: Peter Park <peter.park@amd.com>
2026-01-28 22:24:55 -06:00
Joseph Narlo 48a4cda75c [SWDEV-552552] Provide CLI testing within amd-smi-lib-tests install (#2485)
* Add common module
* Added information to help with unknowns
* Allow paring of cmds
* change cmd print default
* Reduce cmds to be tested

---------

Signed-off-by: amd-josnarlo <joseph.narlo@amd.com>
Co-authored-by: amd-josnarlo <joseph.narlo@amd.com>
2026-01-28 22:16:01 -06:00
Adam Pryor cf3e283d85 [FMDEV-170733] Remove amd-smi ptl set check (#2933)
Signed-off-by: Arif, Maisam <Maisam.Arif@amd.com>
2026-01-28 22:12:17 -06:00
systems-assistant[bot] 27be824745 [SWDEV-565483] Add power profile set/get to amd-smi CLI (#1905)
* Fix exception handling in power profile commands
* Update CHANGELOG.md
* Update amdsmi_parser.py for the single character argument for --profile as -o

---------

Co-authored-by: Koushik Billakanti <Koushik.Billakanti@amd.com>
Co-authored-by: gabrpham <Gabriel.Pham@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2026-01-28 22:00:18 -06:00
Gopesh Bhardwaj 680a92769c Fixing aqlprofile ASM statement (#2881)
* Fixing aqlprofile ASM statement

* Removing f16 tests
2026-01-29 09:01:41 +05:30
Tao Sang 66a1e38387 SWDEV-577011 Fix missing ais symbols in Windows (#2871)
Fix missing ais symbols in rocr in Windows
2026-01-28 22:29:30 -05:00
Copilot 14f9f2537a Add artifact upload steps to AMDSMI CI workflow for PR builds (#2936) 2026-01-28 18:14:47 -05:00
David Yat Sin 99d88827fb Update CODEOWNERS for ROCR-Runtime (#2790) 2026-01-28 15:53:23 -05:00
Yazen AL Musaffar 0c54f1d6f6 [AMD-SMI] [SWDEV-572092] amd-smi does not redirect output to file when --json option is used. (#2389)
* Fix broken amd-smi JSON file redirection

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>

* merge branch develop into SWDEV-572092

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>

---------

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>
2026-01-28 13:54:44 -06:00
German Andryeyev a5ada1e6e3 SWDEV-567852 - Clean-up HIP events (#2708)
* SWDEV-567852 - Clean-up HIP events

Removed unused fields, optimized memory allocation, improved encapsulation, modernized with C++11 auto, added documentation
2026-01-28 13:34:07 -05:00
Swati Rawat 9de4a2ebb1 Correct rocprofv3 usage instructions (#2925)
* Correct rocprofv3 usage

* Apply suggestion from @SwRaw

* Apply suggestion from @SwRaw

* Update .gitignore
2026-01-28 22:46:19 +05:30
Jason Bonnell d917259953 Add --verbose to ctest to get more output (#2928) 2026-01-28 22:43:14 +05:30
Sajina PK e265e0e24f [rocprofiler-systems]: Add documentation for communication API tracing (#2478)
Add documentation for communication runtime tracing for MPI, UCX, RCCL.

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2026-01-27 23:48:27 -05:00
SaleelK 5c7c549301 clr: Fix some nullptr checks and prints (#2825) 2026-01-27 16:45:17 -08:00
vedithal-amd 996202f560 [rocprofiler-compute] Backport documentation changes from ROCm 7.1 release branch (#2894)
* Backport documentation changes from ROCm 7.1 release branch

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Address review comments

---------

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-27 17:22:41 -05:00
vedithal-amd 717cdde126 Update test_metric_validation.py to handle MI325X (#2866) 2026-01-27 16:12:05 -05:00
vedithal-amd 93407271df [rocprofiler-compute] Fix docker file for testing (#2883)
* Fix docker file for testing

* Add correct WORKDIR
2026-01-27 16:11:29 -05:00
cfallows-amd 4d7f709510 [rocprofiler-compute] Update baseline comparison notes in documentation (#2878)
* Update baseline comparison with anchor, text, samples, image in CLI page. Fixes broken 404 links after grafana was removed.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Update options in list to full name, correct gpu id option.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Formatting and broken intersphinx fixed

* Indentation formatting fixed

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: prbasyal <prbasyal@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2026-01-27 16:04:21 -05:00
Yiannis Papadopoulos fdb19e5a4c rocr: Format script skips non-existing files in sparse checkouts (#2360) 2026-01-27 15:58:53 -05:00
Shadi Dashmiz b816d10802 Fix pointer attribute query from a peer device (#2722)
* Fix pointer attribute query from a peer device

Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>

* SWDEV-577116: Fix query on peer device

- If access is disabled, the query should return an error.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-27 15:25:14 -05:00
sluzynsk-amd f37b100c34 SWDEV-563777 - further reduce compilation warnings (#2331)
This change resolves some of the warnings generated during clr builds.
Quiet regular output of doxygen.
Disable non-documented warnings of doxygen.

Signed-off-by: Sebastian Luzynski <Sebastian.Luzynski@amd.com>
2026-01-27 20:51:16 +01:00
Yazen AL Musaffar b7829db10a [AMD-SMI] [SWDEV-553392] Removed Driver Reload capability from amd-smi cli only. (#2665)
* Removed Driver Reload capability from amd-smi cli only

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>

* Updates

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>

* updates

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>

* Update CHANGELOG.md

---------

Signed-off-by: yalmusaf_amdeng <Yazen.ALMusaffar@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
2026-01-27 13:33:03 -06:00
Ioannis Assiouras a66c6ca156 Removed extra marker when syncing graph streams back to the launch stream (#2823) 2026-01-27 19:26:48 +00:00
Venkateshwar Reddy Kandula 7f5e443e44 format rocprofiler-sdk via black. (#2703)
Co-authored-by: Venkateshwar Reddy Kandula <venkateshwar.kandula1306@gmail.com>
2026-01-27 13:30:50 -05:00