217 Commits

Author SHA1 Message Date
Marzieh Berenjkoub d7293281f3 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: 858b4e76eb]
2026-01-20 13:04:02 -06:00
Deeksha Goplani ea1f021496 Added new unit test for register.cc (#1712)
* new unit test for register.cc

Signed-off-by: Deeksha Goplani <deeksha.goplani@amd.com>

* Add new register API tests

* Fix debug message ordering issue

---------

Signed-off-by: Deeksha Goplani <deeksha.goplani@amd.com>
Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>

[ROCm/rccl commit: 420b3b840e]
2026-01-09 17:04:01 -06:00
Atul Kulkarni c64c23fbee Removes default visibility in debug mode and updates unit tests for alt_rsmi impl (#2091)
* Update unit tests for alt_rsmi impl

- Create distinct test executable for alt_rsmi testing
- Updated alt_rsmi tests to use public methods
- Compiles alt_rsmi.cc with ARSMI_TEST_BUILD
- Enables external linkage of internal variables
- Only for AltRsmiTests.cpp that manipulates internals
- Clean separation for test behavior

* Address review comments

* restore hidden symbol visibility

[ROCm/rccl commit: 74690ea705]
2025-12-17 10:27:00 -08:00
Ahmed Khan f17357d0d4 Add ncclCommDump API (#2068)
* Add ncclCommDump API

* remove trailing whitespace changes

* Add more proxy trace timestamps

* Add facebook_rccl namespace before proxyTrace timestamp call

* Clean up ProxyTrace construction

* Move updateProxyOpCounter to member function

* Move setProxyOpTimestamp to member function

* Move addNewProxyOp to member function

* Make internal methods private

* Make ProxyTrace thread safe

* Fix unit tests

* Fix overwritten ProxyTrace DONE setting in net.cc

[ROCm/rccl commit: 08dd75712f]
2025-12-11 15:02:35 -07:00
corey-derochie-amd de82a18790 Fixed unit-test env var list parsing and improved filtered test run speed (#1626)
* Fixed parsing of env var lists, which was overwriting the mutable env var string and polluting future parses.

* Fixed all tests to obey UT_DATATYPES and UT_REDOPS filters.

* Allow tests to bail early via `GTEST_SKIP` if UT_DATATYPES or UT_REDOPS filters give a test size of zero. This allows tests to run much faster with filters on.

* Wrapped the support checks in helper functions on `TestBed`.
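
The env var list fix above can be illustrated with a minimal sketch. It assumes the original bug came from tokenizing the environment string in place; the helper names `splitList` and `getEnvList` are hypothetical and are not taken from the RCCL test sources. The idea is to tokenize a copy so that repeated parses of the same variable see the same input.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a delimited value into non-empty tokens.
// The input is taken by const reference and copied into a stream, so the
// caller's string is never modified by the parse.
std::vector<std::string> splitList(const std::string& value, char delim = ',') {
  std::vector<std::string> tokens;
  std::istringstream stream(value);
  std::string token;
  while (std::getline(stream, token, delim)) {
    if (!token.empty()) tokens.push_back(token);
  }
  return tokens;
}

// Hypothetical helper: read an env var as a list; empty if the variable is unset.
std::vector<std::string> getEnvList(const char* name) {
  assert(name != nullptr);
  // std::getenv returns a pointer into the environment; it is only read here.
  const char* raw = std::getenv(name);
  if (raw == nullptr) return {};
  return splitList(raw);
}
```

With this shape, calling `getEnvList("UT_DATATYPES")` twice returns the same tokens both times, and an empty result could be used by a test to bail out early.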

[ROCm/rccl commit: 18e9ad913b]
2025-12-10 10:06:44 -07:00
Atul Kulkarni 11ffeda52f Added a Process Isolated Test Runner (#1993)
* Added single process isolation support to execute tests

* Address review comments

* Update README

* Removed requirement of explicit call to clear method

* Added macros for simplified usage

* Updated tests to use process isolation framework

* Adjust summary output format for isolated tests

* Updated rccl_wrap tests

* Used process isolation in AllocTests

* Used process isolation and fixed failing tests

* Modified test output, added signal handling

Updated macros to handle lambdas

* Convert argcheck tests to isolated tests

* Convert proxy tests to isolated tests

* Remove non-supported test

* Fixed file descriptor handling and clearing env vars for tests

[ROCm/rccl commit: 7e10267dfd]
2025-12-08 10:36:05 -06:00
Atul Kulkarni 142860442a Enable MPI support to execute MPI specific unit/functional tests (#1996)
* Added MPI support to execute unit/functional tests

Update node and process validation
Updated node detection count and modified validation method
Update validation logic to include max procs and nodes

* Address review comments

* Fix warnings

* Added a new NET transport test and clean up

* Added MPI test logging mechanism

* Decoupled GTest framework

* Added Net IB functional tests

* Updated with resource guards

* Added NET IB tests and refactored code

* Update P2pWorkflow test

* Update documentation

* Add MPI_TESTS_ENABLED guard to the file

* Fix Shm and NetIB tests

* Applied refactoring and cleanup

* Replaced BufferGuard with AutoGuard

* Modified test debug logging

* Use macro to reduce NcclTypeTraits code duplication

- Replace repetitive template specializations with a single
  DEFINE_NCCL_TYPE_TRAIT macro
- Use stringification operator (#) to auto-generate type name strings
- Add #undef to keep macro from polluting namespace
- Makes adding new type mappings trivial

* Unify buffer initialization with generic pattern function

- Remove initializeBufferWithCustomPattern
- Make initializeBufferWithPattern generic with PatternFunc template param
- Now single function handles all patterns via lambda injection
- Updated all test files to use lambdas for pattern generation
- Pattern logic now visible at call site (self-documenting)

* Unify buffer verification with pluggable pattern function

- Remove verifyBufferWithCustomCheck
- Make verifyBufferData generic with PatternFunc template param
- Single function handles all verification patterns via lambda injection
- Updated all test files to use lambdas
- Better defaults: num_samples=0 means verify all elements
- Pattern logic now visible at call site (self-documenting)

* Docs: Add DeviceBufferHelpers section to MPITestRunner.md

- Document new refactored buffer initialization/verification API
- Explain pluggable pattern functions with lambda examples
- Show type mapping and automatic float/int comparison
- Include migration guide from old API to new unified functions
- Demonstrate best practices with real-world examples
- Reference recent refactoring commits (macro-based type traits)

* Docs: Update documentation and examples

- Update on DeviceBufferHelpers
- Update examples using DeviceBufferHelpers methods, e.g. data verification

* Address review comment.

- Replace manual pattern generation loop with initializeBufferWithPattern call
- Use downloadBuffer to get host copy instead of manual hipMemcpy

* Remove non-existent dependency

* Remove duplicate testcase

* Code cleanup in test files

* Moved common constants to base class
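
The type-trait macro and the lambda-injected pattern functions described in this commit can be sketched roughly as follows. This is a host-only illustration under stated assumptions: `DataType`, `TypeTraits`, `DEFINE_TYPE_TRAIT`, `fillWithPattern`, and `verifyWithPattern` are hypothetical stand-ins, not the actual `DEFINE_NCCL_TYPE_TRAIT`, `initializeBufferWithPattern`, or `verifyBufferData` signatures, and exact equality replaces whatever float/int comparison the real helpers perform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the real data type enum.
enum class DataType { Int32, Float32 };

// Primary template is left undefined so unsupported types fail at compile time.
template <typename T>
struct TypeTraits;

// One macro invocation per mapping replaces a hand-written specialization.
// The # operator turns the C++ type token into its name string.
#define DEFINE_TYPE_TRAIT(cpp_type, enum_value)       \
  template <>                                         \
  struct TypeTraits<cpp_type> {                       \
    static constexpr DataType value = enum_value;     \
    static const char* name() { return #cpp_type; }   \
  };

DEFINE_TYPE_TRAIT(int32_t, DataType::Int32)
DEFINE_TYPE_TRAIT(float, DataType::Float32)

#undef DEFINE_TYPE_TRAIT  // keep the helper macro out of later code

// Fill a buffer by calling the injected pattern function for each index.
template <typename T, typename PatternFunc>
void fillWithPattern(std::vector<T>& buffer, PatternFunc pattern) {
  for (std::size_t i = 0; i < buffer.size(); ++i) buffer[i] = pattern(i);
}

// Check a buffer against the same kind of pattern function.
// num_samples == 0 means every element is verified; otherwise that many
// evenly spaced elements are checked.
template <typename T, typename PatternFunc>
bool verifyWithPattern(const std::vector<T>& buffer, PatternFunc pattern,
                       std::size_t num_samples = 0) {
  if (buffer.empty()) return true;
  const std::size_t count =
      (num_samples == 0 || num_samples > buffer.size()) ? buffer.size() : num_samples;
  const std::size_t stride = buffer.size() / count;
  for (std::size_t k = 0; k < count; ++k) {
    const std::size_t i = k * stride;
    if (buffer[i] != pattern(i)) return false;
  }
  return true;
}
```

A call site would pass the pattern as a lambda, for example `fillWithPattern(buf, [](std::size_t i) { return static_cast<int32_t>(i * 2); })`, which keeps the pattern logic visible where it is used.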

[ROCm/rccl commit: 29e1567b95]
2025-12-06 16:05:37 -06:00
Atul Kulkarni 1a986dc190 Remove legacy AltRsmi tests (#2090)
These tests will be replaced by new tests.

[ROCm/rccl commit: 8ad446b271]
2025-12-05 16:53:55 -06:00
Atul Kulkarni 63aa3bb537 Remove legacy Shm and P2p tests (#2089)
These tests will be replaced by MPI tests.

[ROCm/rccl commit: 0d797d1f6c]
2025-12-05 16:53:28 -06:00
Atul Kulkarni 86a4dd95f6 Remove static to non-static conversion used in tests (#2084)
* Remove coll_reg tests which are unsupported

* removed static to non-static conversion feature

[ROCm/rccl commit: 7ec8e73e12]
2025-12-04 18:03:14 -06:00
Atul Kulkarni 0ced7aede8 Fix rccl test suite to use hip_bf16.h instead of hip_bfloat16.h for the __bf16 intrinsic (#2082)
[ROCm/rccl commit: cc6e259a02]
2025-12-04 10:02:06 -06:00
Atul Kulkarni e4aef19511 Added new unit tests for AllReduce with Bias API (#2036)
* Added new unit tests for AllReduce with Bias API

* Address review comments

[ROCm/rccl commit: 7c12b0b76b]
2025-12-03 17:37:34 -06:00
Kapil S. Pawar acb0d614a5 Functional Tests for Ext-Profiler Plugin (#2007)
* Add functional tests for CSV Tuner Plugin

* Updated directory structure

* Updated and renamed directories

* Updated csv conf files

* Added tests for ext-profiler

* Updated readme

* Updated readme

[ROCm/rccl commit: c7f400dbff]
2025-11-18 11:20:39 -06:00
Kapil S. Pawar c4d7680749 Added Functional Tests for CSV Tuner Plugin (#1968)
* Add functional tests for CSV Tuner Plugin

* Updated directory structure

* Updated and renamed directories

* Updated csv conf files

* Updated readme

* Updated readme

* Updated readme

[ROCm/rccl commit: c8da880dc7]
2025-11-11 10:11:19 -06:00
Arm Patinyasakdikul 03e92dc942 Added copyrights for Palamida scan 7.2. (#2018)
[ROCm/rccl commit: 84fdcab68a]
2025-10-30 13:33:20 -05:00
Atul Kulkarni 884138205d Added ROCM_VERSION restriction to alloc unit tests (#1989)
[ROCm/rccl commit: 26dc7abb32]
2025-10-28 12:54:34 -05:00
Nusrat Islam d6d5fac152 Update direct AG and single node LL threshold (#1944)
* update AG direct and single node LL threshold

* update thresholds based on MI350 experimental results

* disable using LL for direct AG

* enable direct AG for lower GPU counts

* direct AG single node tuning

* fix in-place buffer allocation for AG unit test

* whitespace fix

* gate direct AG for gfx950 and gfx942

---------

Co-authored-by: Nusrat Islam <nusislam@nova-login-gtu2.prov.gtu.zts.cpe.ice.amd.com>

[ROCm/rccl commit: d22a39e954]
2025-10-09 10:48:50 -05:00
Atul Kulkarni 980392b279 Updated tests based on NCCL 2.27.3-1 sync (#1892)
[ROCm/rccl commit: 9839d1c7c8]
2025-09-18 09:56:09 -05:00
Laura Promberger b9be197d53 Bump minimum cmake version to 3.16 to enable cmake 4 (#1909)
Minimum required cmake version of test/CMakeLists.txt is bumped from 2.8
to 3.16. This aligns with the version used in CMakeLists.txt and will
enable building with cmake 4.

[ROCm/rccl commit: 0f6fec1553]
2025-09-16 23:10:22 -05:00
Kapil S. Pawar a8f84f32a4 Added new tests for rccl_wrap - rcclOverrideProtocol, rcclOverrideAlgorithm (#1895)
* Added new unit tests for rccl_wrap

[ROCm/rccl commit: 86a6d06e40]
2025-09-15 18:00:26 -05:00
Kapil S. Pawar 80aa4daa4d Added new tests for rccl_wrap - rcclSetPipelining (#1890)
* Added tests for rcclSetPipelining

* Added conditions to skip the test

* Updated message size

[ROCm/rccl commit: f418a4c6d0]
2025-09-05 09:29:11 -05:00
ycui1984 1999f2eba8 [rocm_regression] Return errors when HSA_NO_SCRATCH_RECLAIM=1 even for rocm>=6.4.0 (#1867)
* [rocm_regression] Return errors when HSA_NO_SCRATCH_RECLAIM=1 even for rocm >= 6.4.0
* [rocm_regression] Check firmware version
* [rocm_regression] Resolve review comments
* [rocm_regression] Move hsa env checking into init once func
* [rocm_regression] Prevent hot fix version in firmware
* [rocm_regression] Improve unit tests

[ROCm/rccl commit: 361d596229]
2025-08-29 11:18:23 -05:00
Kapil S. Pawar 3d889cc189 Code coverage tests for param.cc (#1872)
* Added code coverage unit tests for param.cc

* Updated ParamTests.cpp and removed ParamTestsConfFile.txt

* Updated ParamTests.cpp

* Removed NCCL_LOG_INFO and added sample config file

---------

Co-authored-by: Pawar <kpawar@ctr2-alola-ctrl-01.amd.com>

[ROCm/rccl commit: c9becd89cd]
2025-08-27 09:30:37 -05:00
ishkool f500628ef2 Code coverage tests for net_socket.cc (#1840)
* Code coverage UTs for net_socket.cc

* Addressed review comments

---------

Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>

[ROCm/rccl commit: c288fbf1b2]
2025-08-27 09:24:21 -05:00
corey-derochie-amd af1c448ed1 Changed TestBedChild to avoid hang if the call fails (#1875)
Changed `TestBedChild` protocol to send the result code before the return value to avoid hanging if the call fails. Switched `TestBedChild::GetUniqueId` to use this.
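
The status-first ordering described above can be sketched with a plain POSIX pipe. This is only an illustration of the idea, not the `TestBedChild` code: `sendResult`, `receiveResult`, `kSuccess`, and the fixed-width message layout are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <unistd.h>

// Hypothetical message layout: a status code always comes first; the payload
// follows only when the status reports success.
constexpr int32_t kSuccess = 0;

// Child side: report the outcome of a call, then the value if there is one.
bool sendResult(int fd, int32_t status, uint64_t value) {
  if (write(fd, &status, sizeof(status)) != static_cast<ssize_t>(sizeof(status))) {
    return false;
  }
  if (status != kSuccess) return true;  // nothing more to send on failure
  return write(fd, &value, sizeof(value)) == static_cast<ssize_t>(sizeof(value));
}

// Parent side: read the status first and only wait for a payload on success.
// Returns false if the pipe itself failed; otherwise *status holds the code.
bool receiveResult(int fd, int32_t* status, uint64_t* value) {
  assert(status != nullptr && value != nullptr);
  if (read(fd, status, sizeof(*status)) != static_cast<ssize_t>(sizeof(*status))) {
    return false;
  }
  if (*status != kSuccess) return true;  // no blocking read for a value that never arrives
  return read(fd, value, sizeof(*value)) == static_cast<ssize_t>(sizeof(*value));
}
```

If the value were sent first, a parent waiting on it would block forever whenever the child's call failed and no value was produced; reading the status first lets the parent return immediately in that case.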

[ROCm/rccl commit: b88c134874]
2025-08-23 00:17:34 -05:00
awelling2801 40462cc845 Added new tests for rccl_wrap - rcclUpdateThreadThreshold (#1855)
* Added tests for rccl_wrap - rcclUpdateThreadThreshold

* Added gtest_skip for skipped tests

* Added tests for new functions rcclSetP2pNetChunkSize and rcclSetPxn

---------

Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>

[ROCm/rccl commit: a1a65c65c4]
2025-08-21 16:39:53 -05:00
Arm Patinyasakdikul 8557ea33ad Test: delete child object to address memory leak. (#1863)
[ROCm/rccl commit: 9d3acffa5f]
2025-08-20 10:15:03 -05:00
ishkool 377160e0c9 Code Coverage: Proxy.cc tests (#1818)
* Proxy.cc tests

* Update ProxyTest.cpp

Cleaned up the code.

* Update ProxyTests.cpp

Bring back deleting dynamically allocated memory

[ROCm/rccl commit: 876f985e0f]
2025-08-15 19:06:32 -05:00
Atul Kulkarni 38e88ba87e Added new unit tests for src/enqueue.cc (#1853)
[ROCm/rccl commit: 84f3cc6a02]
2025-08-15 18:26:26 -05:00
ishkool 61a189bc84 Code Coverage Unit Tests for comm.h (#1783)
* File containing test for comm.h

* Update CommTest.cpp

Added gtest API for assert

* Update CommTest.cpp

Adding copyright

* Update CommTest.cpp

Removing info and tested as not required.

* Update and rename CommTest.cpp to CommTests.cpp

* Update CMakeLists.txt

[ROCm/rccl commit: 6453273aa6]
2025-08-15 17:44:24 -05:00
Rahul Vaidya baa6a61535 [BUILD] Fix UT packaging on Debian family OS (#1854)
* Fix UT packaging on Debian family OSes

Signed-off-by: ravaidya <ravaidya@amd.com>

* Split OR condition when performing Debian checks

Signed-off-by: ravaidya <ravaidya@amd.com>

---------

Signed-off-by: ravaidya <ravaidya@amd.com>

[ROCm/rccl commit: ee9ed3ef87]
2025-08-11 17:03:16 -05:00
Nilesh M Negi 74adb64dfb [BUILD] Fix UT packaging on Debian OS (#1848)
[ROCm/rccl commit: 5036d0e713]
2025-08-11 09:43:26 -05:00
Rahul Vaidya 70a5f2f317 Fix rccl-UnitTests packaging on Debian systems (#1846)
Signed-off-by: ravaidya <ravaidya@amd.com>

[ROCm/rccl commit: cbbc713b03]
2025-08-08 12:28:56 -05:00
awelling2801 c5b4e1bc78 Created coverage tests for rccl_wrap (#1694)
* Created coverage tests for rccl_wrap

RCCL_EXPOSE_STATIC off by default

Coverage tests for rccl_wrap.cc

* Remove RCCL_EXPOSE_STATIC dependency

* Removed Rcclwrap.RcclGetAlgoInfoTest

* Remove comments

* Corrected RCCL_EXPOSE_STATIC definition logic

---------

Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>
Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>

[ROCm/rccl commit: 82bea39280]
2025-08-06 14:48:00 -05:00
Atul Kulkarni 35283394ed Add unit tests for graph/xml.cc & graph/xml.h (#1833)
* Added new binary for executing unit tests

Added new unit tests for argcheck.cc and alt_rsmi.cc files

Modified the method to execute unit tests to cover static methods
by using a bash script to convert static to non-static functions
and variables on the fly restricted to debug build type.

* Added new unit tests for src/transport/shm.cc

* Added new unit tests for graph/xml.cc

[ROCm/rccl commit: 0e7d7da55d]
2025-08-01 14:20:27 -05:00
awelling2801 0d34963b35 Added tests for coll_reg (#1700)
Changes to coll_reg

Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>

[ROCm/rccl commit: 5ecc1b7ede]
2025-07-31 13:49:23 -05:00
awelling2801 839fcb54b5 Added tests for transport.cc (#1725)
Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>

[ROCm/rccl commit: 7320752bf3]
2025-07-31 11:04:28 -05:00
Rahul Vaidya d65eb0b021 Fix RHEL10 packaging for rcclras and rccl-UnitTests (#1831)
Signed-off-by: ravaidya <ravaidya@amd.com>

[ROCm/rccl commit: 0adc5edc74]
2025-07-31 11:00:49 -05:00
ycui1984 39c508b80d Add collective latency profiler (#1785)
* [LatencyProfiler] Initial commit

* [LatencyProfiler] Add unit tests

* [LatencyProfiler] add more

* [LatencyProfiler] Pass unit tests

* [LatencyProfiler] Add hooks to integrate with meta internal tools

* [LatencyProfiler] Restore install.sh

* [LatencyProfiler] Resolved comments: (1) add proper license, (2) use proper namespace

* [LatencyProfiler] Add header

[ROCm/rccl commit: 874cd657ef]
2025-07-30 14:59:28 -07:00
awelling2801 da2bb8a578 Added tests for Ipcsocket (#1690)
Co-authored-by: Welling <awelling@ctr2-alola-ctrl-01.amd.com>

[ROCm/rccl commit: 9843adaab2]
2025-07-29 10:03:28 -05:00
awelling2801 88dcaaddc5 Code coverage improvements for alloc.h (#1676)
* Added tests for alloc.h

* Added tests for ZeroElementCopy and MemcpyNullSrcOrDstPointer

---------

Co-authored-by: Welling <awelling@ctr2-alola-ctrl-01.amd.com>

[ROCm/rccl commit: e118aadc14]
2025-07-29 09:19:57 -05:00
peizhang56 5c02be7b51 Add Unit Test for bitops.h (#1821)
* Add Unit Test for bitops.h

* Change the style

* Fix the code review comments

* Add more test cases

[ROCm/rccl commit: fe182d6546]
2025-07-28 11:25:15 -05:00
Atul Kulkarni de0d446e03 Added new unit tests for src/transport/p2p.cc (#1774)
[ROCm/rccl commit: 81ec6bff4c]
2025-07-25 12:57:57 -05:00
Atul Kulkarni bd53bdf447 Added new unit tests for src/transport/shm.cc (#1689)
[ROCm/rccl commit: 1c3d1b3842]
2025-07-25 05:54:42 -05:00
Atul Kulkarni c94fb7c58e Code coverage improvements (#1665)
* Increased max stack size to 640

* Added new binary for executing unit tests

Added new unit tests for argcheck.cc and alt_rsmi.cc files

Modified the method to execute unit tests to cover static methods
by using a bash script to convert static to non-static functions
and variables on the fly restricted to debug build type.

[ROCm/rccl commit: 275fdd43c1]
2025-07-17 11:20:49 -05:00
Nilesh M Negi 41c985462c [BUILD] Use fmt-header instead of libfmt (#1791)
[ROCm/rccl commit: 6b4ad0fd74]
2025-07-10 17:19:53 -05:00
mberenjk 1623fcc7a1 Improving build time by removing the gfx11xx and host code from rccl_float8.h (#1789)
* reducing build time by excluding the gfx11xx arch from using hip_fp8

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: 697bee4ee8]
2025-07-09 14:03:47 -05:00
Rakesh Roy 82a822b646 Fix chrono build error (#1790)
[ROCm/rccl commit: dd3b1d816c]
2025-07-04 08:27:30 -05:00
Dingming Wu d34a38ccfc Add proxyTrace (#1732)
This feature tracks the proxy events and status of each send/recv op. ProxyTrace keeps a fixed number of active ops in host mem and dumps the status of each op when the program crashes or hangs.
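
A fixed-size trace of this kind could be kept in a small ring buffer, as in the sketch below. Everything here is hypothetical: `OpRecord`, `OpTrace`, the overwrite-oldest policy, and the dump format are assumptions for illustration and do not describe the actual ProxyTrace data structures. The mutex reflects the general idea of making such a trace safe to update from multiple threads.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical per-op record; a real trace would store more detail.
struct OpRecord {
  uint64_t opCount;
  int peer;
  bool done;
};

// Fixed-capacity trace: once full, the newest op overwrites the oldest.
template <std::size_t Capacity>
class OpTrace {
  static_assert(Capacity > 0, "trace needs at least one slot");

 public:
  void add(uint64_t opCount, int peer) {
    std::lock_guard<std::mutex> lock(mutex_);
    records_[next_ % Capacity] = OpRecord{opCount, peer, false};
    ++next_;
  }

  // Marks a still-tracked op as done; returns false if it was already evicted.
  bool markDone(uint64_t opCount) {
    std::lock_guard<std::mutex> lock(mutex_);
    const std::size_t live = liveCount();
    for (std::size_t k = 0; k < live; ++k) {
      OpRecord& r = records_[(next_ - live + k) % Capacity];
      if (r.opCount == opCount) {
        r.done = true;
        return true;
      }
    }
    return false;
  }

  // Renders the tracked ops from oldest to newest, one per line.
  std::string dump() const {
    std::lock_guard<std::mutex> lock(mutex_);
    std::ostringstream out;
    const std::size_t live = liveCount();
    for (std::size_t k = 0; k < live; ++k) {
      const OpRecord& r = records_[(next_ - live + k) % Capacity];
      out << "op " << r.opCount << " peer " << r.peer
          << (r.done ? " DONE" : " IN_PROGRESS") << '\n';
    }
    return out.str();
  }

 private:
  std::size_t liveCount() const { return next_ < Capacity ? next_ : Capacity; }

  mutable std::mutex mutex_;
  std::array<OpRecord, Capacity> records_{};
  std::size_t next_ = 0;
};
```

A crash or hang handler would then call `dump()` to report which tracked ops had completed and which were still in progress.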

[ROCm/rccl commit: 020dcf0a7c]
2025-06-25 23:01:34 -05:00
BertanDogancay c0c9312e38 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: aaf023976a]
2025-06-20 07:54:49 -05:00