* Fixed parsing of env var lists which were overwriting the mutable env var string and polluting future parses.
* Fixed all tests to obey UT_DATATYPES and UT_REDOPS filters.
* Allow tests to bail early via `GTEST_SKIP` if UT_DATATYPES or UT_REDOPS filters give a test size of zero. This allows tests to run much faster with filters on.
* Wrapped the support checks in helper functions on `TestBed`.
[ROCm/rccl commit: 18e9ad913b]
* Added single process isolation support to execute tests
* Address review comments
* Update README
* Removed requirement of explicit call to clear method
* Added macros for simplified usage
* Updated tests to use process isolation framework
* Adjust summary output format for isolated tests
* Updated rccl_wrap tests
* Used process isolation in AllocTests
* Used process isolation and fixed failing tests
* Modified test output, added signal handling
Updated macros to handle lambdas
* Convert argcheck tests to isolated tests
* Convert proxy tests to isolated tests
* Remove non-supported test
* Fixed file descriptor handling and clearing env vars for tests
[ROCm/rccl commit: 7e10267dfd]
* Added MPI support to execute unit/functional tests
Update node and process validation
Updated node detection count and modified validation method
Update validation logic to include max procs and nodes
* Address review comments
* Fix warnings
* Added a new NET transport test and clean up
* Added MPI test logging mechanism
* Decoupled GTest framework
* Added Net IB functional tests
* Updated with resource guards
* Added NET IB tests and refactored code
* Update P2pWorkflow test
* Update documentation
* Add MPI_TESTS_ENABLED guard to the file
* Fix Shm and NetIB tests
* Applied refactoring and cleanup
* Replaced BufferGuard with AutoGuard
* Modified test debug logging
* Use macro to reduce NcclTypeTraits code duplication
- Replace repetitive template specializations with a single
DEFINE_NCCL_TYPE_TRAIT macro
- Use stringification operator (#) to auto-generate type name strings
- Add #undef to keep macro from polluting namespace
- Makes adding new type mappings trivial
* Unify buffer initialization with generic pattern function
- Remove initializeBufferWithCustomPattern
- Make initializeBufferWithPattern generic with PatternFunc template param
- Now single function handles all patterns via lambda injection
- Updated all test files to use lambdas for pattern generation
- Pattern logic now visible at call site (self-documenting)
* Unify buffer verification with pluggable pattern function
- Remove verifyBufferWithCustomCheck
- Make verifyBufferData generic with PatternFunc template param
- Single function handles all verification patterns via lambda injection
- Updated all test files to use lambdas
- Better defaults: num_samples=0 means verify all elements
- Pattern logic now visible at call site (self-documenting)
* Docs: Add DeviceBufferHelpers section to MPITestRunner.md
- Document new refactored buffer initialization/verification API
- Explain pluggable pattern functions with lambda examples
- Show type mapping and automatic float/int comparison
- Include migration guide from old API to new unified functions
- Demonstrate best practices with real-world examples
- Reference recent refactoring commits (macro-based type traits)
* Docs: Update documentation and examples
- Update on DeviceBufferHelpers
- Update examples using DeviceBufferHelpers methods, e.g. data verification
* Address review comment.
- Replace manual pattern generation loop with initializeBufferWithPattern call
- Use downloadBuffer to get host copy instead of manual hipMemcpy
* Remove non-existent dependency
* Remove duplicate testcase
* Code cleanup in test files
* Moved common constants to base class
[ROCm/rccl commit: 29e1567b95]
* update AG direct and single node LL threshold
* update thresholds based on MI350 expeirmental results
* disable using LL for direct AG
* enable direct AG for lower GPU counts
* direct AG single node tuning
* fix in-place buffer allocation for AG unit test
* whitespace fix
* gate direct AG for gfx950 and gfx942
---------
Co-authored-by: Nusrat Islam <nusislam@nova-login-gtu2.prov.gtu.zts.cpe.ice.amd.com>
[ROCm/rccl commit: d22a39e954]
Changed `TestBedChild` protocol to send the result code before the return value to avoid hanging if the call fails. Switched `TestBedChild::GetUniqueId` to use this.
[ROCm/rccl commit: b88c134874]
* Increased max stack size to 640
* Added new binary for executing unit tests
Added new unit tests for argcheck.cc and alt_rsmi.cc files
Modified the method to execute unit tests to cover static methods
by using a bash script to convert static to non-static functions
and variables on the fly restricted to debug build type.
[ROCm/rccl commit: 275fdd43c1]
* removing extra build time by removing the gfx11xx arch from using hip_fp8
---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
[ROCm/rccl commit: 697bee4ee8]
* Revert "Revert "replacing rccl_float8 with hip_fp8 and address compatibility …"
This reverts commit 30eecfdb25.
* [UT] Modify max stack size to 496
* adding a check for OCP type and replacing ROCM_VERSION with HIP_VERSION
* addressing the ci failure
* Adding the device tag
---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
[ROCm/rccl commit: 170acf3bda]
* Skipping AllReduce FP8 test on 9 to 16 ranks (gfx90a) as it's using Tree algorithm not RING
---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
[ROCm/rccl commit: 5f691aaf65]
* Limit P2P channels per peer to not exceeding max channels
* [UT] test single GPU cases for all collectives
* [UT] fix out of range root value
[ROCm/rccl commit: 4237caad69]
* working tests for a single message size
* move call_RCCL routine StandaloneUtils, create .cpp file for StandaloneUtils so that it can be included in several tests
* simplify test invocation
* remove unecessary logs and exit from ncclCommRegister
* set expected results for allGather
* skip test if nranks doesn't match number of gpus, call getAndDistributeNCCLid only from parent process
* fix improper size of expected-results vector
* Removing unused changes.
* Refactored to create a new file for the forked collectives call, as StandaloneUtils is for the Standalone tests. Renamed the functions to be slightly more accurate and follow existing naming conventions.
* Apply suggestions from code review
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
---------
Co-authored-by: isaki001 <isakioti@banff-pla-r27-38.pla.dcgpu>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
Co-authored-by: Corey Derochie <corey.derochie@amd.com>
[ROCm/rccl commit: 3398fa78fe]
* Initializing all ranks to the same value to avoid failure of UT AllReduce for FP8 type
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
[ROCm/rccl commit: 39483c55f8]
* mapping devices wrt pci
* Gpu allocation by using pci mapping
* Passing gpuPriorityOrder in as an argument rather than making the functions non-static.
* Removing redundant testBed instance calling
[ROCm/rccl commit: 69b2b712ab]
* Unit Tests for RCCL in CPX mode
* override pow2gpus set by cpx mode by user argument
* Adding comment for UT_POW2_GPUS
* Additional comment on why using pow2gpus for cpx mode.
[ROCm/rccl commit: 289a80c4e9]
* adding all nccl apis to api_support to enable rccl tracing by rocprofv3
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rccl commit: db840f024e]
* Template unroll for RCCL kernels
* Adding unroll template arg during CMake hipification
* Reduce linking parallel jobs to avoid OOM in CI
* Workaround issues with UT tests
SWDEV-469533: register spill fix is needed for mainline build
LWPCOMMLIBS-369: cannot enable 112 channels with 80 CUs
Use -parallel-jobs=8 for linking
* CI: do not use -j 16 when building
* CI: use -j 8 when building
* Only reduce parallel linking job for CI extended
* Restore original jenkins command. Change parallel linking jobs in cmake
* Disable MSCCLPP
---------
Co-authored-by: gilbertlee-amd <gilbert.lee@amd.com>
[ROCm/rccl commit: 89349f2ce4]
MSCCL can now run in a multi-threaded configuration. To test in the unit tests, added the ENABLE_OPENMP compile definition flag and the --openmp-test-enable flag to the unit test build script. To activate, set the environment variables UT_MULTITHREADED=1 and UT_PROCESS_MASK=1. Set Jenkins to use this mode.
[ROCm/rccl commit: 0c36d571ea]
* initial checkin
* resolve cr comments
* resolve the build issue
* fix the data correctless issue
* update fp8 header file and update the unit test for fp8 support
* remove fp16 from fp8 headers
* fix ut issue and catch up the latest code from develop
* udate according to cr comments
* update ut according to cr comments
* update num floats for each SumPostDiv from 4 to 6
* update fp8 header file name
* fix the typo
[ROCm/rccl commit: 6777e65c1d]