88 Revisions

Author SHA1 Message Date
Marzieh Berenjkoub d7293281f3 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: 858b4e76eb]
2026-01-20 13:04:02 -06:00
Deeksha Goplani ea1f021496 Added new unit test for register.cc (#1712)
* new unit test for register.cc

Signed-off-by: Deeksha Goplani <deeksha.goplani@amd.com>

* Add new register API tests

* Fix debug message ordering issue

---------

Signed-off-by: Deeksha Goplani <deeksha.goplani@amd.com>
Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>

[ROCm/rccl commit: 420b3b840e]
2026-01-09 17:04:01 -06:00
corey-derochie-amd de82a18790 Fixed unit-test env var list parsing and improved filtered test run speed (#1626)
* Fixed parsing of env var lists which were overwriting the mutable env var string and polluting future parses.

* Fixed all tests to obey UT_DATATYPES and UT_REDOPS filters.

* Allow tests to bail early via `GTEST_SKIP` if UT_DATATYPES or UT_REDOPS filters give a test size of zero. This allows tests to run much faster with filters on.

* Wrapped the support checks in helper functions on `TestBed`.

[ROCm/rccl commit: 18e9ad913b]
2025-12-10 10:06:44 -07:00
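The env var list fix in #1626 above describes parsing that overwrote the mutable env var string and polluted later parses. The commit does not show the original code. One common way this class of bug arises is running `strtok` directly on the buffer returned by `getenv`: that writes `'\0'` over each delimiter in place, and the C standard says a program shall not modify that string at all. The C++ sketch below shows the non-mutating alternative, which copies the value first and then splits the copy. The names `splitList` and `readEnvList` are hypothetical and are not RCCL's helpers. The sketch also assumes a comma-separated format for filters such as UT_DATATYPES.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a delimiter-separated list such as
// "float32,bfloat16" into tokens. The input is read through an
// istringstream, so the caller's string is never modified.
std::vector<std::string> splitList(const std::string& text, char delim = ',') {
  std::vector<std::string> tokens;
  std::istringstream stream(text);
  std::string token;
  while (std::getline(stream, token, delim)) {
    if (!token.empty()) tokens.push_back(token);  // drop empty entries, e.g. "a,,b"
  }
  return tokens;
}

// Hypothetical wrapper: std::getenv returns a pointer into the environment,
// which must not be modified. Constructing a std::string copies the value,
// so every call parses the same untouched text.
std::vector<std::string> readEnvList(const char* name) {
  const char* raw = std::getenv(name);
  if (raw == nullptr) return {};  // variable unset: empty filter
  return splitList(raw);
}
```

Calling `readEnvList("UT_DATATYPES")` twice returns the same tokens both times, because neither call writes to the environment.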
Atul Kulkarni 11ffeda52f Added a Process Isolated Test Runner (#1993)
* Added single process isolation support to execute tests

* Address review comments

* Update README

* Removed requirement of explicit call to clear method

* Added macros for simplified usage

* Updated tests to use process isolation framework

* Adjust summary output format for isolated tests

* Updated rccl_wrap tests

* Used process isolation in AllocTests

* Used process isolation and fixed failing tests

* Modified test output, added signal handling

Updated macros to handle lambdas

* Convert argcheck tests to isolated tests

* Convert proxy tests to isolated tests

* Remove non-supported test

* Fixed file descriptor handling and clearing env vars for tests

[ROCm/rccl commit: 7e10267dfd]
2025-12-08 10:36:05 -06:00
Atul Kulkarni 142860442a Enable MPI support to execute MPI specific unit/functional tests (#1996)
* Added MPI support to execute unit/functional tests

Update node and process validation
Updated node detection count and modified validation method
Update validation logic to include max procs and nodes

* Address review comments

* Fix warnings

* Added a new NET transport test and clean up

* Added MPI test logging mechanism

* Decoupled GTest framework

* Added Net IB functional tests

* Updated with resource guards

* Added NET IB tests and refactored code

* Update P2pWorkflow test

* Update documentation

* Add MPI_TESTS_ENABLED guard to the file

* Fix Shm and NetIB tests

* Applied refactoring and cleanup

* Replaced BufferGuard with AutoGuard

* Modified test debug logging

* Use macro to reduce NcclTypeTraits code duplication

- Replace repetitive template specializations with a single
  DEFINE_NCCL_TYPE_TRAIT macro
- Use stringification operator (#) to auto-generate type name strings
- Add #undef to keep macro from polluting namespace
- Makes adding new type mappings trivial

* Unify buffer initialization with generic pattern function

- Remove initializeBufferWithCustomPattern
- Make initializeBufferWithPattern generic with PatternFunc template param
- Now single function handles all patterns via lambda injection
- Updated all test files to use lambdas for pattern generation
- Pattern logic now visible at call site (self-documenting)

* Unify buffer verification with pluggable pattern function

- Remove verifyBufferWithCustomCheck
- Make verifyBufferData generic with PatternFunc template param
- Single function handles all verification patterns via lambda injection
- Updated all test files to use lambdas
- Better defaults: num_samples=0 means verify all elements
- Pattern logic now visible at call site (self-documenting)

* Docs: Add DeviceBufferHelpers section to MPITestRunner.md

- Document new refactored buffer initialization/verification API
- Explain pluggable pattern functions with lambda examples
- Show type mapping and automatic float/int comparison
- Include migration guide from old API to new unified functions
- Demonstrate best practices with real-world examples
- Reference recent refactoring commits (macro-based type traits)

* Docs: Update documentation and examples

- Update on DeviceBufferHelpers
- Update examples using DeviceBufferHelpers methods, e.g. data verification

* Address review comment.

- Replace manual pattern generation loop with initializeBufferWithPattern call
- Use downloadBuffer to get host copy instead of manual hipMemcpy

* Remove non-existent dependency

* Remove duplicate testcase

* Code cleanup in test files

* Moved common constants to base class

[ROCm/rccl commit: 29e1567b95]
2025-12-06 16:05:37 -06:00
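Two of the refactors in #1996 above are C++ idioms that are easy to show in isolation. The first generates type-trait specializations from a `DEFINE_NCCL_TYPE_TRAIT`-style macro that stringifies the type with `#`. The second replaces separate custom-pattern functions with one template that takes the pattern as a callable. The sketch below is host-only and uses `std::vector` in place of device buffers. It relies on hypothetical names (`DataType`, `TypeTraits`, `DEFINE_TYPE_TRAIT`, `initializeWithPattern`, `verifyWithPattern`) that stand in for the real test helpers, whose signatures are not shown here. The commit does not say how `num_samples` chooses elements, so checking only the first N is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for ncclDataType_t so the sketch builds without RCCL headers.
enum class DataType { Int32, Float32, Float64 };

// Primary template is declared but never defined, so using an unmapped
// type is a compile-time error.
template <typename T>
struct TypeTraits;

// One line per mapping: #T stringifies the type exactly as written at the call.
#define DEFINE_TYPE_TRAIT(T, ENUM_VALUE)                   \
  template <>                                              \
  struct TypeTraits<T> {                                   \
    static constexpr DataType type = DataType::ENUM_VALUE; \
    static const char* name() { return #T; }               \
  };

DEFINE_TYPE_TRAIT(std::int32_t, Int32)
DEFINE_TYPE_TRAIT(float, Float32)
DEFINE_TYPE_TRAIT(double, Float64)
#undef DEFINE_TYPE_TRAIT  // keep the helper macro from leaking into later code

// Single initializer: the pattern is injected as a callable taking the index.
template <typename T, typename PatternFunc>
void initializeWithPattern(std::vector<T>& buffer, PatternFunc pattern) {
  for (std::size_t i = 0; i < buffer.size(); ++i) buffer[i] = static_cast<T>(pattern(i));
}

// Matching verifier. numSamples == 0 checks every element; otherwise only the
// first numSamples elements are checked (an assumption for this sketch).
template <typename T, typename PatternFunc>
bool verifyWithPattern(const std::vector<T>& buffer, PatternFunc expected,
                       std::size_t numSamples = 0) {
  const std::size_t count =
      (numSamples == 0 || numSamples > buffer.size()) ? buffer.size() : numSamples;
  for (std::size_t i = 0; i < count; ++i) {
    if (buffer[i] != static_cast<T>(expected(i))) return false;
  }
  return true;
}
```

Adding a new mapping takes one more `DEFINE_TYPE_TRAIT(...)` line. A call such as `initializeWithPattern(buf, [](std::size_t i) { return i * 2.0f; })` keeps the pattern visible at the call site, which is the self-documenting property the commit message mentions.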
Atul Kulkarni 86a4dd95f6 Remove static to non-static conversion used in tests (#2084)
* Remove coll_reg tests which are unsupported

* removed static to non-static conversion feature

[ROCm/rccl commit: 7ec8e73e12]
2025-12-04 18:03:14 -06:00
Atul Kulkarni 0ced7aede8 Fix rccl test suite to use hip_bf16.h instead of hip_bfloat16.h for the __bf16 intrinsic (#2082)
[ROCm/rccl commit: cc6e259a02]
2025-12-04 10:02:06 -06:00
Atul Kulkarni e4aef19511 Added new unit tests for AllReduce with Bias API (#2036)
* Added new unit tests for AllReduce with Bias API

* Address review comments

[ROCm/rccl commit: 7c12b0b76b]
2025-12-03 17:37:34 -06:00
Nusrat Islam d6d5fac152 Update direct AG and single node LL threshold (#1944)
* update AG direct and single node LL threshold

* update thresholds based on MI350 experimental results

* disable using LL for direct AG

* enable direct AG for lower GPU counts

* direct AG single node tuning

* fix in-place buffer allocation for AG unit test

* whitespace fix

* gate direct AG for gfx950 and gfx942

---------

Co-authored-by: Nusrat Islam <nusislam@nova-login-gtu2.prov.gtu.zts.cpe.ice.amd.com>

[ROCm/rccl commit: d22a39e954]
2025-10-09 10:48:50 -05:00
corey-derochie-amd af1c448ed1 Changed TestBedChild to avoid hang if the call fails (#1875)
Changed `TestBedChild` protocol to send the result code before the return value to avoid hanging if the call fails. Switched `TestBedChild::GetUniqueId` to use this.

[ROCm/rccl commit: b88c134874]
2025-08-23 00:17:34 -05:00
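The `TestBedChild` change in #1875 above is a small ordering rule in a parent/child pipe protocol. The child sends the result code first, and sends the return value only if the call succeeded. The sketch below assumes a POSIX pipe and a fixed-size payload. The names `UniqueId`, `sendResult`, and `receiveResult` are hypothetical, not the actual `TestBedChild` code. Suppose the parent instead waited for the payload unconditionally. A failed call would then leave it blocked in `read` for as long as the write end stays open, which is the hang the commit avoids.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <unistd.h>

// Hypothetical payload standing in for ncclUniqueId (an opaque 128-byte blob).
struct UniqueId {
  char bytes[128];
};

// Loop until exactly n bytes are written; short writes are possible on pipes.
static bool writeAll(int fd, const void* buf, std::size_t n) {
  const char* p = static_cast<const char*>(buf);
  while (n > 0) {
    ssize_t w = write(fd, p, n);
    if (w <= 0) return false;
    p += w;
    n -= static_cast<std::size_t>(w);
  }
  return true;
}

// Loop until exactly n bytes are read; EOF before that counts as failure.
static bool readAll(int fd, void* buf, std::size_t n) {
  char* p = static_cast<char*>(buf);
  while (n > 0) {
    ssize_t r = read(fd, p, n);
    if (r <= 0) return false;
    p += r;
    n -= static_cast<std::size_t>(r);
  }
  return true;
}

// Child side: the result code always goes first; the payload follows only on success.
bool sendResult(int fd, int status, const UniqueId* id) {
  if (!writeAll(fd, &status, sizeof(status))) return false;
  if (status != 0) return true;  // failure: nothing else will be sent
  return writeAll(fd, id, sizeof(*id));
}

// Parent side: read the result code, and wait for the payload only if it is 0.
// Returns the child's status, or -1 if the pipe itself failed.
int receiveResult(int fd, UniqueId* id) {
  int status = 0;
  if (!readAll(fd, &status, sizeof(status))) return -1;
  if (status != 0) return status;  // no payload follows, so do not block on read
  return readAll(fd, id, sizeof(*id)) ? 0 : -1;
}
```

In a real test, the child would call `sendResult` after `fork()`, and the parent would call `receiveResult` on the other end of the pipe. A single-process round trip through one pipe is simply the smallest way to exercise both the success and failure paths.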
Arm Patinyasakdikul 8557ea33ad Test: delete child object to address memory leak. (#1863)
[ROCm/rccl commit: 9d3acffa5f]
2025-08-20 10:15:03 -05:00
awelling2801 0d34963b35 Added tests for coll_reg (#1700)
Changes to coll_reg

Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>

[ROCm/rccl commit: 5ecc1b7ede]
2025-07-31 13:49:23 -05:00
awelling2801 839fcb54b5 Added tests for transport.cc (#1725)
Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>

[ROCm/rccl commit: 7320752bf3]
2025-07-31 11:04:28 -05:00
Atul Kulkarni c94fb7c58e Code coverage improvements (#1665)
* Increased max stack size to 640

* Added new binary for executing unit tests

Added new unit tests for argcheck.cc and alt_rsmi.cc files

Modified the method to execute unit tests to cover static methods
by using a bash script to convert static to non-static functions
and variables on the fly restricted to debug build type.

[ROCm/rccl commit: 275fdd43c1]
2025-07-17 11:20:49 -05:00
mberenjk 1623fcc7a1 Improving build time by removing the gfx11xx and host code from rccl_float8.h (#1789)
* reduce build time by excluding the gfx11xx arch from using hip_fp8

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: 697bee4ee8]
2025-07-09 14:03:47 -05:00
Rakesh Roy 82a822b646 Fix chrono build error (#1790)
[ROCm/rccl commit: dd3b1d816c]
2025-07-04 08:27:30 -05:00
BertanDogancay c0c9312e38 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: aaf023976a]
2025-06-20 07:54:49 -05:00
Arm Patinyasakdikul 7f7f1cede3 Added missing copyright message. (#1742)
* Added missing copyright message.

* addressed comments.

[ROCm/rccl commit: 6c37ae9470]
2025-06-12 09:58:01 -05:00
Atul Kulkarni 4cd71722f2 Added new ENABLE_CODE_COVERAGE option. (#1664)
Modified install.sh script to add this new option

[ROCm/rccl commit: 682ed36fe6]
2025-06-10 12:12:36 -05:00
Arm Patinyasakdikul 59597ad8a7 Test: bump max stack size once again to match the current expectation.
[ROCm/rccl commit: c07445d5b4]
2025-05-23 11:18:25 -05:00
Arm Patinyasakdikul 2cb65ba466 Test: Change max stack size to 520 to accommodate new ROCm changes.
[ROCm/rccl commit: 523e0893e4]
2025-05-21 20:21:27 -05:00
corey-derochie-amd 65d67dce7a Switched to using the hip_fp8 header instead of rccl_float8, resolving compatibility issues. (#1546)
* Revert "Revert "replacing rccl_float8 with hip_fp8 and address compatibility …"

This reverts commit 30eecfdb25.

* [UT] Modify max stack size to 496

* adding a check for OCP type and replacing ROCM_VERSION with HIP_VERSION

* addressing the ci failure

* Adding the device tag

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: 170acf3bda]
2025-05-14 15:33:03 -05:00
BertanDogancay d045d0ca23 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: a6bf9bfc9e]
2025-04-23 20:47:43 -07:00
gilbertlee-amd 8023be9355 Adding UT_DEBUG_PAUSE to unit tests (#1653)
[ROCm/rccl commit: ee85a70bb4]
2025-04-21 21:15:07 -06:00
Tim 58ee618194 RCCL Replayer update (#1603)
RCCL recorder w/ suggested change and UT

[ROCm/rccl commit: 9a55ff60a9]
2025-04-19 00:21:27 -04:00
AbandiGa acf0bc1c6e added copyright (#1635)
[ROCm/rccl commit: 7a84c5dbb0]
2025-04-14 09:46:18 -05:00
BertanDogancay 8ed27fde74 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: 0b2062c560]
2025-03-27 12:53:04 -05:00
Nilesh M Negi 8cfbc0fbd1 [UT] Increase stack size for StandaloneTests to 480 (#1616)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl commit: d6b987a53f]
2025-03-21 21:33:32 -05:00
mberenjk a3a598efb3 Skipping AllReduce test on more than 8 ranks for FP8 type on Hyabusa (#1598)
* Skipping AllReduce FP8 test on 9 to 16 ranks (gfx90a), as it uses the Tree algorithm rather than Ring

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: 5f691aaf65]
2025-03-17 10:22:49 -05:00
Wenkai Du afd04a5117 Limit P2P channels per peer so they do not exceed max channels (#1594)
* Limit P2P channels per peer so they do not exceed max channels

* [UT] test single GPU cases for all collectives

* [UT] fix out of range root value

[ROCm/rccl commit: 4237caad69]
2025-03-11 09:32:09 -07:00
Nilesh M Negi 4ccbaabdc9 [UT] Include iomanip if not defined (#1510)
* [UT] Include iomanip if not defined

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* Remove include guards

`<iomanip>` has its own include guards, so the extra guards are not needed.

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: 4e406acc43]
2025-02-11 08:48:47 -07:00
isaki001 a40d4eb960 non-hipGraph MSCCL++ tests for allReduce and allGather (#1503)
* working tests for a single message size

* move call_RCCL routine to StandaloneUtils, and create a .cpp file for StandaloneUtils so that it can be included in several tests

* simplify test invocation

* remove unnecessary logs and exit from ncclCommRegister

* set expected results for allGather

* skip test if nranks doesn't match number of gpus, call getAndDistributeNCCLid only from parent process

* fix improper size of expected-results vector

* Removing unused changes.

* Refactored to create a new file for the forked collectives call, as StandaloneUtils is for the Standalone tests. Renamed the functions to be slightly more accurate and follow existing naming conventions.

* Apply suggestions from code review

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

---------

Co-authored-by: isaki001 <isakioti@banff-pla-r27-38.pla.dcgpu>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
Co-authored-by: Corey Derochie <corey.derochie@amd.com>

[ROCm/rccl commit: 3398fa78fe]
2025-02-04 09:11:32 -06:00
corey-derochie-amd 8e6bedeedc Increased gfx90a stack size expectation to 320 to match latest compiler. (#1487)
[ROCm/rccl commit: c68b558ed5]
2025-01-16 17:04:51 -07:00
mberenjk 300f954185 Initializing all ranks to the same value to avoid failure of UT AllR… (#1459)
* Initializing all ranks to the same value to avoid failure of the AllReduce UT for the FP8 type

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: 39483c55f8]
2025-01-02 11:39:02 -06:00
saurabhAMD 69d976532b GPU allocation for CPX Unit Tests using PCI bus id (#1403)
* mapping devices w.r.t. PCI

* GPU allocation using PCI mapping

* Passing gpuPriorityOrder in as an argument rather than making the functions non-static.

* Removing redundant testBed instance calling

[ROCm/rccl commit: 69b2b712ab]
2024-11-04 10:51:00 -06:00
Bertan Dogancay 251df02d42 Increase MAX_STACK_SIZE for UT (#1398)
[ROCm/rccl commit: 984f1e4343]
2024-11-01 13:07:45 -04:00
Tim e346e19065 Adjustment for UT Sendrecv (#1400)
Enabled UT SendRecv to the same rank and refactored the UBR call

[ROCm/rccl commit: fd9924cfe7]
2024-10-30 15:13:53 -04:00
saurabhAMD e3b39ab309 Making variable names consistent in EnvVars.cpp (#1327)
* Making variable names consistent in EnvVars.cpp

[ROCm/rccl commit: 4856309413]
2024-09-11 09:23:31 -05:00
saurabhAMD fdaef9dd82 Enabling Unit Tests for CPX mode (#1324)
* Unit Tests for RCCL in CPX mode

* allow a user argument to override the pow2gpus setting chosen for CPX mode

* Adding comment for UT_POW2_GPUS

* Additional comment on why pow2gpus is used for CPX mode.

[ROCm/rccl commit: 289a80c4e9]
2024-09-09 10:12:33 -05:00
Tim 1bd3db8fc7 Update EnvVars.cpp
[ROCm/rccl commit: 757d1891e9]
2024-09-04 16:55:36 -04:00
mberenjk 886b576722 adding all nccl apis to api_support to enable rccl tracing by rocprofv3 (#1297)
* adding all nccl apis to api_support to enable rccl tracing by rocprofv3

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rccl commit: db840f024e]
2024-08-22 12:36:07 -05:00
Tim 3261e2a5fd Adding User Buffer Registration support for Unit test (#1199)
* Adding UBR support for UT SendRecv

Signed-off-by: Tim Hu <timhu102@amd.com>

* Update test/common/TestBedChild.cpp

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

---------

Signed-off-by: Tim Hu <timhu102@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

[ROCm/rccl commit: a4793286c7]
2024-07-30 13:39:25 -04:00
akolliasAMD 37c44d531b gfx12: Disable LL protocol (#1268)
[ROCm/rccl commit: c246e25f8e]
2024-07-26 08:59:55 -06:00
Wenkai Du 54e4899607 Template unroll for RCCL kernels (#1250)
* Template unroll for RCCL kernels

* Adding unroll template arg during CMake hipification

* Reduce linking parallel jobs to avoid OOM in CI

* Workaround issues with UT tests

SWDEV-469533: register spill fix is needed for mainline build
LWPCOMMLIBS-369: cannot enable 112 channels with 80 CUs
Use -parallel-jobs=8 for linking

* CI: do not use -j 16 when building

* CI: use -j 8 when building

* Only reduce parallel linking job for CI extended

* Restore original jenkins command. Change parallel linking jobs in cmake

* Disable MSCCLPP

---------

Co-authored-by: gilbertlee-amd <gilbert.lee@amd.com>

[ROCm/rccl commit: 89349f2ce4]
2024-07-19 08:15:59 -07:00
corey-derochie-amd 37bf54b8f8 Enable multi-threading for MSCCL (#1203)
MSCCL can now run in a multi-threaded configuration. To test in the unit tests, added the ENABLE_OPENMP compile definition flag and the --openmp-test-enable flag to the unit test build script. To activate, set the environment variables UT_MULTITHREADED=1 and UT_PROCESS_MASK=1. Set Jenkins to use this mode.

[ROCm/rccl commit: 0c36d571ea]
2024-07-04 09:34:38 -06:00
saurabhAMD de7ea612d7 Unit Tests for testing channels (#1222)
[ROCm/rccl commit: e170f41ddd]
2024-06-25 10:10:10 -05:00
saurabhAMD 44064a612c Enable UT to test with more than 64 channels
[ROCm/rccl commit: 392a73fdef]
2024-06-13 13:54:08 -05:00
Bertan Dogancay dea5e83940 [UT] Start supporting multiple group calls and graphs (#1151)
* Start supporting multiple group calls UT

[ROCm/rccl commit: 0ec41f1386]
2024-04-25 11:11:16 -06:00
mberenjk da835cff9c replacing rccl_bfloat16 with hip_bfloat16 (#1126)
Co-authored-by: mberenjk <mberenjk@amd.com>

[ROCm/rccl commit: 428837ffe4]
2024-04-11 11:30:37 -05:00
Andy li e373bd44bf Enable fp8 support (#1101)
* initial checkin

* resolve CR comments

* resolve the build issue

* fix the data correctness issue

* update fp8 header file and update the unit test for fp8 support

* remove fp16 from fp8 headers

* fix UT issue and catch up with the latest code from develop

* update according to CR comments

* update UT according to CR comments

* update num floats for each SumPostDiv from 4 to 6

* update fp8 header file name

* fix the typo

[ROCm/rccl commit: 6777e65c1d]
2024-03-08 15:17:53 -08:00