* Add new implementation of direct send/recv reduce scatter
* Resolved conflicts
* Add multiple channels support to the reduction kernel of direct reduce scatter and adjust offset into buffer to utilize multiple channels.
* Resolve validation issue when the number of elements is not divisible by the number of channels, which left elements unaccounted for in the reduction.
* fix proxy hang
* set maxSrcs to 64 in reduceCopy
* optimize multi-channel code
* fix validation issue in single node MI300
* Tune the message size range for 2,4, and 8 Nodes
* Move Direct RS into separate kernel
* Add Copyright
* resolve review comments
* resolve review comments
* fix merge build issue
* revert move Direct RS into separate kernel
* address review comments
* address review comments
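
The multi-channel validation fix above (element count not divisible by the channel count) can be illustrated with a small sketch. This is not the kernel code from this change; `channel_ranges` is a hypothetical helper showing one common way to spread the remainder over the first channels so that no element is dropped.

```python
def channel_ranges(count, n_channels):
    """Split `count` elements into `n_channels` contiguous (offset, size) ranges.

    Hypothetical helper: the first `count % n_channels` channels take one
    extra element, so the ranges always cover every element exactly once.
    """
    base, rem = divmod(count, n_channels)
    ranges = []
    offset = 0
    for ch in range(n_channels):
        size = base + (1 if ch < rem else 0)
        ranges.append((offset, size))
        offset += size
    return ranges


count, n_channels = 10, 4

# Naive split: every channel takes count // n_channels elements,
# which covers only 8 of the 10 elements.
naive_covered = (count // n_channels) * n_channels

# Remainder-aware split: covers all 10 elements.
ranges = channel_ranges(count, n_channels)
covered = sum(size for _, size in ranges)
```

With 10 elements and 4 channels, the naive split covers 8 elements, while the remainder-aware split covers all 10.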
---------
Co-authored-by: KawtharShafie <kawtharshafie@gmail.com>
Co-authored-by: Ghadeer Alabandi <abandiga@gmail.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
This enhances libhsakmt's capabilities for multi-context KFD support by implementing per-context topology management.
Changes:
* Add hsaKmtGetClockCountersCtx for multi-context support
- Add context-aware version of hsaKmtGetClockCounters
- Original API is retained as a wrapper calling the ctx-version with primary context
* Enable independent debug sessions across multiple KFD contexts
- Create hsa_kfd_debug_context, introduce context-aware debug APIs, and make debug state per-context
* Add perf sub-context for per-context performance counter management
- Introduce hsa_kfd_perf_context, move counter properties, add context-aware perf APIs, and update initialization
* Refactor FMM for per-context resource management
- Move multiple FMM-related global variables, including the
GPU ID arrays, svm, cpuvm_aperture, and mem_handle_aperture, into hsa_kfd_fmm_context
* Implement per-context topology for complete context isolation
- Migrate global topology data (g_system, g_props, map_user_to_sysfs_node_id)
to per-context hsa_kfd_topology_context structure
- Update all topology functions to accept HsaKFDContext parameter for
context-aware operations (validate_nodeid, get_node_props, get_iolink_props, etc.)
- Refactor topology snapshot management for per-context isolation
- Add context-aware PMC trace access APIs
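
The wrapper pattern described above (a context-aware entry point, with the original API retained as a thin wrapper over the primary context) might look roughly like the following sketch. It is a Python model only, not the libhsakmt C API: `KfdContext`, `PRIMARY_CONTEXT`, the function names, and the stored values are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class KfdContext:
    """Hypothetical per-context state, standing in for per-context topology."""
    name: str
    clock_counters: dict = field(default_factory=dict)  # node_id -> counter


# Hypothetical primary context that the legacy API implicitly uses.
PRIMARY_CONTEXT = KfdContext("primary", {0: 100})


def get_clock_counters_ctx(ctx, node_id):
    """Context-aware version: reads only the state owned by `ctx`."""
    if node_id not in ctx.clock_counters:
        raise KeyError(f"node {node_id} not present in context {ctx.name}")
    return ctx.clock_counters[node_id]


def get_clock_counters(node_id):
    """Original API, kept as a wrapper that passes the primary context."""
    return get_clock_counters_ctx(PRIMARY_CONTEXT, node_id)


# A second context sees its own data, independent of the primary one.
secondary = KfdContext("secondary", {0: 7})
primary_value = get_clock_counters(0)
secondary_value = get_clock_counters_ctx(secondary, 0)
```

Existing callers keep using the original function unchanged, while new callers pass an explicit context.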
Signed-off-by: Junhua Shen <Junhua.Shen@amd.com>
* Update aqlprofile-continuous_integration.yml to use rocm-7.2.0
* Update rocprofiler-sdk-continuous_integration.yml to use rocm-7.2.0
* Update rocprofiler-sdk-docs.yml to use rocm-7.2.0
* clr: Update signal count and pool size for staging buffer
* Rename variables and related identifiers
---------
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
* Fix channel overuse for 1 rank comms
* limit channels when warpSpeed is enabled but not used
* enable std::min check against # of CUs for maxChannels computation when warpSpeed is enabled
---------
Co-authored-by: Mustafa Abduljabbar <muabdulj@amd.com>
Co-authored-by: isaki001 <ioannissakiotis@gmail.com>
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
* [rocprofiler-sdk] Fix domain_ops_padding for 515+ HIP operations
The HIP runtime API now has 515+ operations (as of ROCm 7.x), but
domain_ops_padding was set to 512. This caused std::out_of_range
exceptions when checking operations >= 512 via std::bitset::test().
Changes:
- Increase domain_ops_padding from 512 to 1024
- Add compile-time static_assert to validate padding is sufficient
for all API domains (HIP, HSA, marker, RCCL, rocDecode, rocJPEG)
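
The failure mode and the added check can be modeled in a few lines. The real code is C++ (`std::bitset` plus a `static_assert`); the Python sketch below is only an analogue. `FixedBitset` and `check_padding` are hypothetical names, and the count of 515 is taken from the description above as a lower bound for illustration.

```python
class FixedBitset:
    """Minimal fixed-size bitset whose test() is bounds-checked,
    mirroring how std::bitset::test() throws std::out_of_range."""

    def __init__(self, size):
        self.size = size
        self.bits = 0

    def _check(self, pos):
        if not 0 <= pos < self.size:
            raise IndexError(f"bit {pos} out of range for size {self.size}")

    def set(self, pos):
        self._check(pos)
        self.bits |= 1 << pos

    def test(self, pos):
        self._check(pos)
        return bool((self.bits >> pos) & 1)


def check_padding(padding, op_counts):
    """Fail early if any domain has more operations than the padding allows.
    Stands in for the compile-time static_assert (hypothetical helper)."""
    for domain, count in op_counts.items():
        assert count <= padding, f"{domain}: {count} ops exceed padding {padding}"


op_counts = {"HIP": 515}  # illustrative value based on the text above

old_mask = FixedBitset(512)
try:
    old_mask.test(514)      # operation id >= 512
    old_failed = False
except IndexError:
    old_failed = True       # the analogue of the out_of_range exception

check_padding(1024, op_counts)  # passes with the increased padding
new_mask = FixedBitset(1024)
new_mask.set(514)
```

The early check turns a runtime exception on a rarely exercised operation id into a failure that appears as soon as the operation count outgrows the padding.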
Co-Authored-By: Claude (claude-opus-4.5) <noreply@anthropic.com>
* Update projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/context/domain.cpp
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* [rocprofiler-sdk] Apply clang-format-11 to domain.cpp
Co-Authored-By: Claude (claude-opus-4.5) <noreply@anthropic.com>
* Rework implementation to ensure coverage of all operation enums
* Fix compiler error in unit test for enum_string.cpp
* Fix data types of domain_ops_padding values
* Revert some changes in domain.cpp
---------
Co-authored-by: Claude (claude-opus-4.5) <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* SWDEV-561708 Counted queue size from env var
* use counted_queue_size for test
* remove rocrtst changes; add a const for default queue size
* Remove env var from test; use queue->size
* Improve env var documentation
* Correct type
- Fix ROCM_VERSION guard used for the scratch_memory_record structure
- This fixes a rocm/7.0.2 build failure
---------
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
* Adding support of AQL Profiler for GFX 11.5
* Removing hard coded value for sa_number
* Adding instance count for WGP block, removing hard coded values.
* Fixed SQ counter block and TD counter block instances
A Mode-1 GPU reset affects the entire XGMI hive. Added an
xgmi_hive_id check so that GPUs in the same hive are reset
only once, while GPUs in different hives or in no hive are
still reset separately.
- Example:
`sudo amd-smi reset -G` or `sudo amd-smi reset -G -g 0`
on MI300 will reset all GPUs only once.
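
The once-per-hive logic can be sketched as follows. This is not the amd-smi implementation: `plan_resets` is a hypothetical helper, and treating a hive id of `0` or `None` as "no hive" is an assumption made for the example.

```python
def plan_resets(gpus):
    """Return the GPU indices that should actually receive a reset.

    `gpus` is a list of (gpu_index, xgmi_hive_id) pairs. A falsy hive id
    (0 or None) is assumed to mean the GPU is not part of any hive.
    """
    seen_hives = set()
    to_reset = []
    for index, hive_id in gpus:
        if hive_id:
            if hive_id in seen_hives:
                continue  # this hive was already reset via another GPU
            seen_hives.add(hive_id)
        to_reset.append(index)
    return to_reset


# Two GPUs share hive 0xA, one is in hive 0xB, two are in no hive.
gpus = [(0, 0xA), (1, 0xA), (2, 0xB), (3, None), (4, 0)]
plan = plan_resets(gpus)
```

Here GPU 1 is skipped because its hive was already reset through GPU 0, while the GPUs in a different hive or in no hive are each reset.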
Signed-off-by: Bindhiya Kanangot Balakrishnan <Bindhiya.KanangotBalakrishnan@amd.com>
Show "N/A" for ASICs without fan support
`amd-smi set -h` fan help text will be dynamic instead of "0-255 or 0-100%"
Signed-off-by: Sumanth Gavini <sumanth.gavini@amd.com>
* Update Python docs for process memory usage
* Added comment for processes total memory usage
---------
Signed-off-by: yalmusaf <Yazen.ALMusaffar@amd.com>
* Add common module
* Added information to help with unknowns
* Allow paring of cmds
* change cmd print default
* Reduce cmds to be tested
---------
Signed-off-by: amd-josnarlo <joseph.narlo@amd.com>
Co-authored-by: amd-josnarlo <joseph.narlo@amd.com>
* Fix exception handling in power profile commands
* Update CHANGELOG.md
* Update amdsmi_parser.py to use -o as the single-character argument for --profile
---------
Co-authored-by: Koushik Billakanti <Koushik.Billakanti@amd.com>
Co-authored-by: gabrpham <Gabriel.Pham@amd.com>
Co-authored-by: Maisam Arif <Maisam.Arif@amd.com>
This change resolves some of the warnings generated during clr builds.
Quiet regular output of doxygen.
Disable doxygen warnings about undocumented items.
Signed-off-by: Sebastian Luzynski <Sebastian.Luzynski@amd.com>