Commit Graph

76354 Commits

Author SHA1 Message Date
Donato Capitella a2686c9f41 Fix(critical): Prevent ncclInternalError when SMI is disabled by mocking getDeviceIndexByPciBusId 2026-02-01 12:48:18 +00:00
Donato Capitella 532214edfb Fix: Export rsmi_init shim with default visibility to be seen by PyTorch 2026-02-01 12:12:28 +00:00
Donato Capitella aec38e7dde Fix(critical): Add rsmi_init shim to satisfy PyTorch linker dependencies when SMI is disabled 2026-02-01 12:10:13 +00:00
Donato Capitella f4b6e5f450 Fix: Unconditionally include SMI headers in build list to fix hipify missing file error 2026-02-01 11:54:43 +00:00
Donato Capitella 0586700b06 fix: disable AMD SMI for gfx1151 targets in CMake and remove a debug error from the SMI wrapper header. 2026-02-01 11:49:27 +00:00
Donato Capitella 3f31d17ae7 build: Add compile-time error when ROCM_SMI is disabled. 2026-02-01 11:44:38 +00:00
Donato Capitella f227312867 Fix(Refactor): Switch SMI logic to whitelist (RCCL_SMI_ENABLED) and remove redundant fallback code 2026-02-01 11:31:39 +00:00
Donato Capitella 54de8024d3 Perf: Add NO_COMPRESS option to disable slow offload-compress 2026-02-01 11:14:25 +00:00
Donato Capitella 3bd4e81a8b Fix: Switch to add_compile_definitions for SMI_DISABLED and remove redundant target_ call 2026-02-01 11:13:07 +00:00
Donato Capitella 7504897fe4 Fix cmake syntax error: add missing endif() 2026-02-01 10:55:56 +00:00
Donato Capitella 1d5c0c1add Fix(critical): Move SMI_DISABLED logic to top of CMakeLists.txt and force via target_compile_definitions 2026-02-01 10:55:06 +00:00
Donato Capitella 2e6df33acc Fix(critical): Introduce SMI_DISABLED define to forcibly disable SMI usage in headers 2026-02-01 10:39:37 +00:00
Donato Capitella cd91b85935 Fix: Provide inline dummy SMI symbols when SMI is disabled to prevent link errors 2026-02-01 10:27:12 +00:00
Donato Capitella 484bd5bf0f Fix: Properly guard rocm_smi_wrap.cc content with USE_ROCMSMI 2026-02-01 10:13:38 +00:00
Donato Capitella 95b150d96a Fix: Do not compile rocm_smi_wrap.cc when ENABLE_AMDSMI is OFF 2026-02-01 10:11:26 +00:00
Donato Capitella 6289de70ad Force unset USE_AMDSMI internal cache variable when ENABLE_AMDSMI is OFF 2026-02-01 09:56:24 +00:00
Donato Capitella f1f0851398 Fix undefined amdsmi_init by properly guarding SMI code and adding ENABLE_AMDSMI option 2026-02-01 09:34:49 +00:00
Donato Capitella b4f25507ec Allow disabling SMI support via ENABLE_AMDSMI in cmake 2026-02-01 09:07:16 +00:00
Donato Capitella d2ea5d5d4c fix(rccl): disable symmetric kernels when GENERATE_SYM_KERNELS is OFF 2026-02-01 08:44:52 +00:00
Donato Capitella 8126402d12 fix(rccl): fix typo in ncclSymkGetKernelPtr fallback 2026-02-01 08:26:01 +00:00
Donato Capitella 0b8251289a feat(rccl): add gfx1151 support 2026-01-31 16:42:58 +00:00
Pedram Alizadeh c19441b2b9 Reducing the p2pnChannels to 32 (from 64) for send/recv based collectives on multi-node MI350 (2 and 4 nodes) (#2977) 2026-01-30 18:23:09 -05:00
ammallya aa840563a7 Migration of rocdecode and rocjpeg complete (#2998) 2026-01-30 14:37:28 -08:00
systems-assistant[bot] 4358cad858 [AzureCI] Add all_reduce_bias to rccl-tests CI (#2768)
* [AzureCI] Add all_reduce_bias to rccl-tests CI

* Increase rccl-tests timeout to 2 hours

---------

Co-authored-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
2026-01-30 14:29:06 -07:00
Ameya Keshava Mallya 821a6e0700 Merge remote-tracking branch 'origin/develop' into preserved/rocjpeg 2026-01-30 20:58:59 +00:00
Shadi Dashmiz f1e5612e26 SWDEV-572439: make assert_fail constexpr in the hip headers (#2392)
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
2026-01-30 15:41:17 -05:00
Shadi Dashmiz e1844f6a59 SWDEV-573004 - fix shfl_sync for compiler init value (#2533)
- add attribute for maybe undef

Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
2026-01-30 15:39:42 -05:00
Ameya Keshava Mallya 2ff4a999c3 Merge remote-tracking branch 'origin/develop' into preserved/rocdecode 2026-01-30 20:37:24 +00:00
Ameya Keshava Mallya f4f1df295a Add 'projects/rocjpeg/' from commit '06a08d3cb83b7e77555ff2baebedfe4e52fa5dbb'
git-subtree-dir: projects/rocjpeg
git-subtree-mainline: d0396f30b3
git-subtree-split: 06a08d3cb8
2026-01-30 20:35:04 +00:00
Ameya Keshava Mallya d0396f30b3 Add 'projects/rocdecode/' from commit 'b0bab079403eda171f9056409fa96b0908f61073'
git-subtree-dir: projects/rocdecode
git-subtree-mainline: 5d609c1e57
git-subtree-split: b0bab07940
2026-01-30 20:33:26 +00:00
xuchen-amd 5d609c1e57 fix script input bug for delta generation (#2944) 2026-01-30 14:45:12 -05:00
cfreeamd 5172701708 rocr: Correct gpu dumped core contents (#2851)
Includes several tests (rocrtst) for this capability.
2026-01-30 09:38:09 -08:00
Sjoerd Simons 7e6b7cb50b Improve path readability check (#2967)
On modern systems, device nodes such as render nodes are made accessible
via the udev uaccess functionality, which adds the logged-in user to the
ACL of the device. This means just checking for user and group is bound to
give false positives. Instead, use os.access as a first check.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-30 11:34:39 -06:00
systems-assistant[bot] 327778ef18 Add nccl_debug variable and values (#2756)
* nccl-debug variables table test

* spacing

* spacing

* RCCL variable edits from SME

* Update projects/rccl/docs/api-reference/env-variables.rst

Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>

---------

Co-authored-by: Matt Williams <Matt.Williams+amdeng@amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Matt Williams <matt.williams@amd.com>
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
2026-01-30 12:00:18 -05:00
David Yat Sin 00e8a67165 rocr: Restore mmap flags back to MAP_PRIVATE (#2886)
Change mmap flags back to MAP_PRIVATE as MAP_SHARED increases allocation
time. Transparent huge pages are disabled for MAP_SHARED by default.
2026-01-30 08:36:05 -08:00
systems-assistant[bot] 1211790607 Direct Reduce Scatter Implementation (#2765)
* Add new implementation of direct send/recv reduce scatter

* Resolved conflicts

* Add multiple channels support to the reduction kernel of direct reduce scatter and adjust offset into buffer to utilize multiple channels.

* Resolve validation issue when number of elements is not divisible by number of channels, leaving elements unaccounted for in reduction.

* fix proxy hang

* set maxSrcs to 64 in reduceCopy

* optimize multi-channel code

* fix validation issue in single node MI300

* Tune the message size range for 2,4, and 8 Nodes

* Move Direct RS into separate kernel

* Add Copyright

* resolve review comments

* resolve review comments

* fix merge build issue

* revert move Direct RS into separate kernel

* address review comments

* address review comments

---------

Co-authored-by: KawtharShafie <kawtharshafie@gmail.com>
Co-authored-by: Ghadeer Alabandi <abandiga@gmail.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
2026-01-30 09:27:27 -06:00
systems-assistant[bot] 055909d335 Set default max channels to 48 for MI350 multi-node (#2759)
* make 48 the default max channels for MI350

* address review comments

---------

Co-authored-by: Ghadeer Alabandi <abandiga@gmail.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
2026-01-30 09:22:42 -06:00
Alysa Liu 13091e18ad libhsakmt: Add THEROCK_SANITIZER support for ASAN builds (#2978)
Add THEROCK_SANITIZER support for ASAN builds.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2026-01-30 10:02:10 -05:00
Jin Jung 25d0107d24 SWDEV-575867: Fix error code for mapped graphics resources (#2662) 2026-01-30 07:47:13 -05:00
Alexandra Sidorova 8800e03058 [CLR] Added missed ostream include to amd_hip_bfloat16.h (#2960) 2026-01-30 07:42:38 -05:00
Jan Stephan 20a745962f Fix graph API binary paths (#2884)
Signed-off-by: Jan Stephan <jan.stephan@amd.com>
2026-01-30 10:51:37 +01:00
Junhua Shen 0d98c3bdd5 libhsakmt: Implement per-context topology for multi-context KFD support (#2405)
This enhances libhsakmt's capabilities for multi-context KFD support by implementing per-context topology management.

Changes:
* Add hsaKmtGetClockCountersCtx for multi-context support
  - Add context-aware version of hsaKmtGetClockCounters
  - Original API is retained as a wrapper calling the ctx-version with primary context

* Enable independent debug sessions across multiple KFD contexts
  - Create hsa_kfd_debug_context, introduce context-aware debug APIs, shift debug state to per-context

* Add perf sub-context for per-context performance counter management
  - Introduce hsa_kfd_perf_context, move counter properties, add context-aware perf APIs, and update initialization

* Refactor FMM for per-context resource management
  - Refactor multiple global variables related to FMM, including
    GPU ID arrays, svm, cpuvm_aperture, and mem_handle_aperture to hsa_kfd_fmm_context

* Implement per-context topology for complete context isolation
  - Migrate global topology data (g_system, g_props, map_user_to_sysfs_node_id)
     to per-context hsa_kfd_topology_context structure
  - Update all topology functions to accept HsaKFDContext parameter for
     context-aware operations (validate_nodeid, get_node_props, get_iolink_props, etc.)
  - Refactor topology snapshot management for per-context isolation
  - Add context-aware PMC trace access APIs

Signed-off-by: Junhua Shen <Junhua.Shen@amd.com>
2026-01-30 09:42:25 +08:00
vedithal-amd a838b0c07b [rocprofiler-compute] Fix test case for MI 308 (#2934)
* Fix test case for MI 308

* Use consistent naming of GPUs in comment
2026-01-29 18:54:52 -05:00
Venkateshwar Reddy Kandula dea3da3a6f [rocprofiler-sdk][CI] Fix rocprofiler-sdk CI change rocm version to 7.2.0 (#2979)
* Update aqlprofile-continuous_integration.yml to use rocm-7.2.0

* Update rocprofiler-sdk-continuous_integration.yml to use rocm-7.2.0

* Update rocprofiler-sdk-docs.yml to use rocm-7.2.0
2026-01-29 17:28:52 -06:00
pghoshamd adaaa883b2 SWDEV-576090 Fix mem leaks and double free of signals (#2817) 2026-01-29 16:53:27 -05:00
Mark Meserve 94c246eb9e attach: fix typos and older names in documentation (#2684) 2026-01-29 16:46:24 -05:00
vedithal-amd 4b364df43b [rocprofiler-compute] Enable panel level csv files for roofline panel (#2887)
* Enable panel level csv files for roofline panel

* Fixed comments
2026-01-29 15:32:54 -05:00
Rahul Manocha c4f7593001 clr: Update signal count and pool size for staging buffer (#2889)
* clr: Update signal count and pool size for staging buffer

* Change to naming of variables etc

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2026-01-29 10:34:00 -08:00
systems-assistant[bot] 58c203e252 Fix channel overuse for 1 rank comms (#2760)
* Fix channel overuse for 1 rank comms

* limit channels when warpSpeed is enabled but not used

* enable std::min check against # of CUs for maxChannels computation when warpSpeed is enabled

---------

Co-authored-by: Mustafa Abduljabbar <muabdulj@amd.com>
Co-authored-by: isaki001 <ioannissakiotis@gmail.com>
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
2026-01-29 12:13:46 -06:00
Benjamin Welton b509e9bd77 [rocprofiler-sdk] Fix domain_ops_padding for 515+ HIP operations (#2941)
* [rocprofiler-sdk] Fix domain_ops_padding for 515+ HIP operations

The HIP runtime API now has 515+ operations (as of ROCm 7.x), but
domain_ops_padding was set to 512. This caused std::out_of_range
exceptions when checking operations >= 512 via std::bitset::test().

Changes:
- Increase domain_ops_padding from 512 to 1024
- Add compile-time static_assert to validate padding is sufficient
  for all API domains (HIP, HSA, marker, RCCL, rocDecode, rocJPEG)

Co-Authored-By: Claude (claude-opus-4.5) <noreply@anthropic.com>

* Update projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/context/domain.cpp

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [rocprofiler-sdk] Apply clang-format-11 to domain.cpp

Co-Authored-By: Claude (claude-opus-4.5) <noreply@anthropic.com>

* Rework implementation to ensure coverage of all operation enums

* Fix compiler error in unit test for enum_string.cpp

* Fix data types of domain_ops_padding values

* Revert some changes in domain.cpp

---------

Co-authored-by: Claude (claude-opus-4.5) <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2026-01-29 12:26:33 -05:00
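
The out-of-range failure described in the domain_ops_padding commit above (b509e9bd77) can be reproduced in isolation. The following is a minimal sketch, not the rocprofiler-sdk code: `old_padding`, `new_padding`, `hip_op_count`, `is_enabled`, and `query_throws` are hypothetical names, and 515 is simply the operation count quoted in the commit message. It shows that `std::bitset::test` throws `std::out_of_range` for an index at or beyond the bitset size, and how a `static_assert` can catch an undersized padding at compile time.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical constants for illustration; not the rocprofiler-sdk definitions.
constexpr std::size_t old_padding  = 512;
constexpr std::size_t new_padding  = 1024;
constexpr std::size_t hip_op_count = 515;  // figure quoted in the commit message

// Compile-time guard in the spirit of the commit: the build fails if the
// padding cannot hold every operation index.
static_assert(hip_op_count <= new_padding, "padding too small for operation count");

// Returns true if the bit for `op` is set. std::bitset::test throws
// std::out_of_range when op >= N.
template <std::size_t N>
bool is_enabled(const std::bitset<N>& ops, std::size_t op)
{
    return ops.test(op);
}

// Returns true if querying `op` on an empty bitset of size N throws
// std::out_of_range, false if the query completes normally.
template <std::size_t N>
bool query_throws(std::size_t op)
{
    std::bitset<N> ops;
    try
    {
        (void) is_enabled(ops, op);
    }
    catch(const std::out_of_range&)
    {
        return true;
    }
    return false;
}
```

Under these assumptions, `query_throws<old_padding>(hip_op_count - 1)` reports the exception, while the same query against `new_padding` completes normally.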