* Adding working single node tests
* Revert to old docker sha
* adding back no perf tests
---------
Co-authored-by: Aravind Ravikumar <arravikum@amd.com>
[ROCm/rccl commit: 4b295c9893]
* ROCSHMEM linking/building to match MSCCL++ style
* add rocSHMEM as a submodule
* Move rocSHMEM submodule to ext-src/rocSHMEM
* Adding proper submodule support, as well as a patch for rocSHMEM
* Cleaning up INCLUDE_DIR vs INCLUDE_DIRS mixup
* updating patch file
* Pointing rocshmem submodule to edgars fixup patch
* Adding IBVERBS link to the submodule build
* More IBVERBS patching
* pin rocshmem submodule to b534423
* Adding IPC support in rocSHMEM build
* updating rocshmem submodule to resolve CQ errors
* Updating submodule to include recent a2a optimizations
* invoke rocshmem alltoall from rccl
* Updating submodule to fix CQ error number hang
* Updating submodule to include a2a improvements and bug fixes
* Updating submodule to point to Yiltan's fork and doorbell ring removal commit
* Updating hash to correspond with submodule change
* Updating to no-ctx wg call and updating submodule
* copy-in/copy-out using multiple CUs
* Updating rocSHMEM submodule to include doorbell improvements
* updating gitmodule to point to upstream
* code cleanup and adjust threshold
* guard rocshmem a2a invocation
* Only build with rocshmem when specified
* code cleanup
* address review comments
* Removing debugging failure case
Signed-off-by: Thomas Huber <thomas.huber@amd.com>
* whitespace fix
* Adding rocshmem compile guard
* Removing unnecessary comment
Signed-off-by: Thomas Huber <thomas.huber@amd.com>
* remove commented lines
* address review comments
* cleanup
---------
Signed-off-by: Thomas Huber <thomas.huber@amd.com>
Co-authored-by: Thomas Huber <thomas.huber@amd.com>
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k12-27.cs-aus.dcgpu>
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-09.cs-aus.dcgpu>
Co-authored-by: Islam <nusislam@amd.com>
Co-authored-by: Nusrat Islam <nusislam@dell300x-ccs-aus-k13-03.cs-aus.dcgpu>
[ROCm/rccl commit: 27648b0900]
* Update device.h for hip_bfloat16 inclusion guard
Prevents other files in ROCm from including the old hip/hip_bfloat16.h, which is guarded by _HIP_INCLUDE_HIP_AMD_DETAIL_HIP_BFLOAT16_H_ and _HIP_BFLOAT16_H_
* Update device.h to handle old hip_bfloat16.h
Added a workaround for old hip_bfloat16.h header usage.
[ROCm/rccl commit: 8e4dbfdf37]
* Added python test runner to execute rccl tests
* Disabled capture output to avoid hangs
* Add RCCL_TEST_MPI_HOSTFILE env var to get the hostfile
* Converted test_type to boolean gtest flag
* Removed unused return values
* Added custom rccl library usage
* Removed json output
* Updates to test_runner: added num_gpus field
* Address review comments
* Prepend env vars for single node, single process executions
* Added separate enums for exit and result codes
* Update configuration files
* Moved configurations to its own dir
* Address review comments
* Update tools/scripts/test_runner/README.md
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
---------
Co-authored-by: Corey Derochie <161367113+corey-derochie-amd@users.noreply.github.com>
[ROCm/rccl commit: 0c2c61d2f1]
* Force ring in WarpSpeed manual mode and log event
* Skip usage for non-ring in WarpSpeed auto mode
* Enable WarpSpeed when its CU count is set
[ROCm/rccl commit: 93fdcb160c]
* Added support for AMD ROCm net-ib alongside vanilla net-ib, with auto-generation to detect conflicts early during NCCL sync and enable future customizations.
* Integrated AMD AINIC support in RCCL for out-of-the-box usage, leveraging performance improvements by default, channel pinning for optimal pipeline performance, and extended support for 32B in-line CTS messages.
* Implemented internal derivation of AINIC-specific flags when RCCL AINIC environment parameter is set, and checks before initializing AINIC net-ib methods.
* Included snapshot of auto-generated ROCm net-ib file (src/transport/net_ib_rocm.cc) for reference.
* Fixed typos in RCCL param API (RCCL_AINIC_ROCE) and dlclose.
* Updated plugin loading logic:
  - Load internal ROCmIB plugin only when NCCL_NET_PLUGIN is not set.
  - Load default internal net-ib only when not AINIC and no external plugin env is set.
[ROCm/rccl commit: 9f4651f20f]
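The plugin loading rules in the entry above can be read as a small decision function. The following is a minimal sketch, not the RCCL implementation: `NetBackend` and `selectNetBackend` are hypothetical names, and it assumes that the "external plugin env" is NCCL_NET_PLUGIN and that the internal ROCmIB plugin is the AINIC path.

```cpp
#include <string>

// Hypothetical enum for illustration; not an RCCL type.
enum class NetBackend { ExternalPlugin, InternalRocmIb, InternalDefaultIb };

// Hypothetical helper: picks a network backend from the AINIC setting and
// the value of NCCL_NET_PLUGIN (empty string means "not set").
// Assumption: an external plugin always takes priority when it is requested.
NetBackend selectNetBackend(bool ainicEnabled, const std::string& netPluginEnv) {
  if (!netPluginEnv.empty()) {
    return NetBackend::ExternalPlugin;   // user asked for an external plugin
  }
  if (ainicEnabled) {
    return NetBackend::InternalRocmIb;   // internal ROCmIB, only without NCCL_NET_PLUGIN
  }
  return NetBackend::InternalDefaultIb;  // not AINIC and no external plugin
}
```

With these assumptions, the default internal net-ib is reached only when AINIC is off and no plugin is requested, which matches the two rules listed in the entry.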
* Update unit tests for alt_rsmi impl
- Create distinct test executable for alt_rsmi testing
- Updated alt_rsmi tests to use public methods
- Compiles alt_rsmi.cc with ARSMI_TEST_BUILD
- Enables external linkage of internal variables
- Only for AltRsmiTests.cpp that manipulates internals
- Clean separation for test behavior
* Address review comments
* restore hidden symbol visibility
[ROCm/rccl commit: 74690ea705]
* remove node-count and threshold restrictions from p2p-batching
* remove batching threshold usage, fix typo for using batching-enablement flag
---------
Co-authored-by: Mustafa Abduljabbar <mustafa.abduljabbar@amd.com>
[ROCm/rccl commit: 7c1049d2a4]
* Add ncclCommDump API
* remove trailing whitespace changes
* Add more proxy trace timestamps
* Add facebook_rccl namespace before proxyTrace timestamp call
* Clean up ProxyTrace construction
* Move updateProxyOpCounter to member function
* Move setProxyOpTimestamp to member function
* Move addNewProxyOp to member function
* Make internal methods private
* Make ProxyTrace thread safe
* Fix unit tests
* Fix overwritten ProxyTrace DONE setting in net.cc
[ROCm/rccl commit: 08dd75712f]
* Fixed parsing of env var lists which were overwriting the mutable env var string and polluting future parses.
* Fixed all tests to obey UT_DATATYPES and UT_REDOPS filters.
* Allow tests to bail early via `GTEST_SKIP` if UT_DATATYPES or UT_REDOPS filters give a test size of zero. This allows tests to run much faster with filters on.
* Wrapped the support checks in helper functions on `TestBed`.
[ROCm/rccl commit: 18e9ad913b]
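The env var list fix in the entry above concerns parsing that modified the environment string in place, which can happen with an in-place tokenizer. A minimal sketch of the safer pattern follows: tokenize a private copy so repeated parses see the same value. `parseEnvList` is a hypothetical helper, and the comma delimiter is an assumption, not something the commit states.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a delimited environment variable into tokens.
// The string returned by getenv is never modified; parsing runs on a copy.
std::vector<std::string> parseEnvList(const char* name, char delim = ',') {
  std::vector<std::string> items;
  const char* raw = std::getenv(name);
  if (raw == nullptr) {
    return items;  // variable not set: empty list
  }
  const std::string copy(raw);      // private copy of the env value
  std::istringstream stream(copy);
  std::string token;
  while (std::getline(stream, token, delim)) {
    if (!token.empty()) {
      items.push_back(token);       // skip empty entries such as "a,,b"
    }
  }
  return items;
}
```

Calling the helper twice on the same variable returns the same list, because the environment itself is left untouched.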
* Added single process isolation support to execute tests
* Address review comments
* Update README
* Removed requirement of explicit call to clear method
* Added macros for simplified usage
* Updated tests to use process isolation framework
* Adjust summary output format for isolated tests
* Updated rccl_wrap tests
* Used process isolation in AllocTests
* Used process isolation and fixed failing tests
* Modified test output, added signal handling
Updated macros to handle lambdas
* Convert argcheck tests to isolated tests
* Convert proxy tests to isolated tests
* Remove non-supported test
* Fixed file descriptor handling and clearing env vars for tests
[ROCm/rccl commit: 7e10267dfd]
* Added MPI support to execute unit/functional tests
Update node and process validation
Updated node detection count and modified validation method
Update validation logic to include max procs and nodes
* Address review comments
* Fix warnings
* Added a new NET transport test and clean up
* Added MPI test logging mechanism
* Decoupled GTest framework
* Added Net IB functional tests
* Updated with resource guards
* Added NET IB tests and refactored code
* Update P2pWorkflow test
* Update documentation
* Add MPI_TESTS_ENABLED guard to the file
* Fix Shm and NetIB tests
* Applied refactoring and cleanup
* Replaced BufferGuard with AutoGuard
* Modified test debug logging
* Use macro to reduce NcclTypeTraits code duplication
- Replace repetitive template specializations with a single
DEFINE_NCCL_TYPE_TRAIT macro
- Use stringification operator (#) to auto-generate type name strings
- Add #undef to keep macro from polluting namespace
- Makes adding new type mappings trivial
* Unify buffer initialization with generic pattern function
- Remove initializeBufferWithCustomPattern
- Make initializeBufferWithPattern generic with PatternFunc template param
- Now single function handles all patterns via lambda injection
- Updated all test files to use lambdas for pattern generation
- Pattern logic now visible at call site (self-documenting)
* Unify buffer verification with pluggable pattern function
- Remove verifyBufferWithCustomCheck
- Make verifyBufferData generic with PatternFunc template param
- Single function handles all verification patterns via lambda injection
- Updated all test files to use lambdas
- Better defaults: num_samples=0 means verify all elements
- Pattern logic now visible at call site (self-documenting)
* Docs: Add DeviceBufferHelpers section to MPITestRunner.md
- Document new refactored buffer initialization/verification API
- Explain pluggable pattern functions with lambda examples
- Show type mapping and automatic float/int comparison
- Include migration guide from old API to new unified functions
- Demonstrate best practices with real-world examples
- Reference recent refactoring commits (macro-based type traits)
* Docs: Update documentation and examples
- Update on DeviceBufferHelpers
- Update examples using DeviceBufferHelpers methods, e.g. data verification
* Address review comment.
- Replace manual pattern generation loop with initializeBufferWithPattern call
- Use downloadBuffer to get host copy instead of manual hipMemcpy
* Remove non-existent dependency
* Remove duplicate testcase
* Code cleanup in test files
* Moved common constants to base class
[ROCm/rccl commit: 29e1567b95]
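The type-trait macro and the lambda-based buffer helpers described in the entry above can be sketched roughly as follows. This is a host-only illustration under stated assumptions: the names `DEFINE_NCCL_TYPE_TRAIT`, `NcclTypeTraits`, `initializeBufferWithPattern`, `verifyBufferData`, `PatternFunc`, and `num_samples` come from the commit message, but the signatures, the `DataType` stand-in enum, the use of `std::vector` in place of device buffers, the sampling stride, and the 1e-6 float tolerance are all assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Stand-in for ncclDataType_t so the sketch compiles without RCCL headers.
enum class DataType { Int32, Float32 };

// Primary template is left undefined: unmapped types fail at compile time.
template <typename T> struct NcclTypeTraits;

// One invocation per mapping; #CPP_TYPE stringifies the C++ type name.
#define DEFINE_NCCL_TYPE_TRAIT(CPP_TYPE, ENUM_VALUE)      \
  template <> struct NcclTypeTraits<CPP_TYPE> {           \
    static constexpr DataType value = ENUM_VALUE;         \
    static const char* name() { return #CPP_TYPE; }       \
  };

DEFINE_NCCL_TYPE_TRAIT(int, DataType::Int32)
DEFINE_NCCL_TYPE_TRAIT(float, DataType::Float32)
#undef DEFINE_NCCL_TYPE_TRAIT  // keep the macro out of the rest of the file

// Fills the buffer with pattern(i) for every index i.
template <typename T, typename PatternFunc>
void initializeBufferWithPattern(std::vector<T>& buffer, PatternFunc pattern) {
  for (std::size_t i = 0; i < buffer.size(); ++i) {
    buffer[i] = static_cast<T>(pattern(i));
  }
}

// Checks the buffer against pattern(i). num_samples == 0 verifies every
// element; otherwise num_samples evenly strided elements are checked.
template <typename T, typename PatternFunc>
bool verifyBufferData(const std::vector<T>& buffer, PatternFunc pattern,
                      std::size_t num_samples = 0) {
  const std::size_t n = buffer.size();
  const std::size_t count =
      (num_samples == 0 || num_samples > n) ? n : num_samples;
  if (count == 0) {
    return true;  // empty buffer: nothing to verify
  }
  const std::size_t stride = n / count;  // at least 1 because count <= n
  for (std::size_t s = 0; s < count; ++s) {
    const std::size_t i = s * stride;
    const T expected = static_cast<T>(pattern(i));
    if (std::is_floating_point<T>::value) {
      // Floating-point types are compared with an assumed tolerance.
      const double diff = std::fabs(static_cast<double>(buffer[i]) -
                                    static_cast<double>(expected));
      if (diff > 1e-6) return false;
    } else if (buffer[i] != expected) {
      return false;  // integer types are compared exactly
    }
  }
  return true;
}
```

A caller passes the same lambda to both helpers, for example `[](std::size_t i) { return static_cast<int>(i * 3); }`, so the pattern is visible at the call site instead of being hidden in a dedicated helper.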