Commit graph

217 commits

Author SHA1 Message Date
Marzieh Berenjkoub 858b4e76eb Merge remote-tracking branch 'nccl/master' into develop 2026-01-20 13:04:02 -06:00
Deeksha Goplani 420b3b840e Added new unit test for register.cc (#1712)
* new unit test for register.cc

Signed-off-by: Deeksha Goplani <deeksha.goplani@amd.com>

* Add new register API tests

* Fix debug message ordering issue

---------

Signed-off-by: Deeksha Goplani <deeksha.goplani@amd.com>
Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>
2026-01-09 17:04:01 -06:00
Atul Kulkarni 74690ea705 Removes default visibility in debug mode and updates unit tests for alt_rsmi impl (#2091)
* Update unit tests for alt_rsmi impl

- Create distinct test executable for alt_rsmi testing
- Updated alt_rsmi tests to use public methods
- Compiles alt_rsmi.cc with ARSMI_TEST_BUILD
- Enables external linkage of internal variables
- Only for AltRsmiTests.cpp that manipulates internals
- Clean separation for test behavior

* Address review comments

* restore hidden symbol visibility
2025-12-17 10:27:00 -08:00
Ahmed Khan 08dd75712f Add ncclCommDump API (#2068)
* Add ncclCommDump API

* remove trailing whitespace changes

* Add more proxy trace timestamps

* Add facebook_rccl namespace before proxyTrace timestamp call

* Clean up ProxyTrace construction

* Move updateProxyOpCounter to member function

* Move setProxyOpTimestamp to member function

* Move addNewProxyOp to member function

* Make internal methods private

* Make ProxyTrace thread safe

* Fix unit tests

* Fix overwritten ProxyTrace DONE setting in net.cc
2025-12-11 15:02:35 -07:00
corey-derochie-amd 18e9ad913b Fixed unit-test env var list parsing and improved filtered test run speed (#1626)
* Fixed parsing of env var lists which were overwriting the mutable env var string and polluting future parses.

* Fixed all tests to obey UT_DATATYPES and UT_REDOPS filters.

* Allow tests to bail early via `GTEST_SKIP` if UT_DATATYPES or UT_REDOPS filters give a test size of zero. This allows tests to run much faster with filters on.

* Wrapped the support checks in helper functions on `TestBed`.
2025-12-10 10:06:44 -07:00
Atul Kulkarni 7e10267dfd Added a Process Isolated Test Runner (#1993)
* Added single process isolation support to execute tests

* Address review comments

* Update README

* Removed requirement of explicit call to clear method

* Added macros for simplified usage

* Updated tests to use process isolation framework

* Adjust summary output format for isolated tests

* Updated rccl_wrap tests

* Used process isolation in AllocTests

* Used process isolation and fixed failing tests

* Modified test output, added signal handling

Updated macros to handle lambdas

* Convert argcheck tests to isolated tests

* Convert proxy tests to isolated tests

* Remove non-supported test

* Fixed file descriptor handling and clearing env vars for tests
2025-12-08 10:36:05 -06:00
Atul Kulkarni 29e1567b95 Enable MPI support to execute MPI specific unit/functional tests (#1996)
* Added MPI support to execute unit/functional tests

Update node and process validation
Updated node detection count and modified validation method
Update validation logic to include max procs and nodes

* Address review comments

* Fix warnings

* Added a new NET transport test and clean up

* Added MPI test logging mechanism

* Decoupled GTest framework

* Added Net IB functional tests

* Updated with resource guards

* Added NET IB tests and refactored code

* Update P2pWorkflow test

* Update documentation

* Add MPI_TESTS_ENABLED guard to the file

* Fix Shm and NetIB tests

* Applied refactoring and cleanup

* Replaced BufferGuard with AutoGuard

* Modified test debug logging

* Use macro to reduce NcclTypeTraits code duplication

- Replace repetitive template specializations with a single
  DEFINE_NCCL_TYPE_TRAIT macro
- Use stringification operator (#) to auto-generate type name strings
- Add #undef to keep macro from polluting namespace
- Makes adding new type mappings trivial

* Unify buffer initialization with generic pattern function

- Remove initializeBufferWithCustomPattern
- Make initializeBufferWithPattern generic with PatternFunc template param
- Now single function handles all patterns via lambda injection
- Updated all test files to use lambdas for pattern generation
- Pattern logic now visible at call site (self-documenting)

* Unify buffer verification with pluggable pattern function

- Remove verifyBufferWithCustomCheck
- Make verifyBufferData generic with PatternFunc template param
- Single function handles all verification patterns via lambda injection
- Updated all test files to use lambdas
- Better defaults: num_samples=0 means verify all elements
- Pattern logic now visible at call site (self-documenting)

* Docs: Add DeviceBufferHelpers section to MPITestRunner.md

- Document new refactored buffer initialization/verification API
- Explain pluggable pattern functions with lambda examples
- Show type mapping and automatic float/int comparison
- Include migration guide from old API to new unified functions
- Demonstrate best practices with real-world examples
- Reference recent refactoring commits (macro-based type traits)

* Docs: Update documentation and examples

- Update on DeviceBufferHelpers
- Update examples using DeviceBufferHelpers methods, e.g. data verification

* Address review comment.

- Replace manual pattern generation loop with initializeBufferWithPattern call
- Use downloadBuffer to get host copy instead of manual hipMemcpy

* Remove non-existent dependency

* Remove duplicate testcase

* Code cleanup in test files

* Moved common constants to base class
2025-12-06 16:05:37 -06:00
Atul Kulkarni 8ad446b271 Remove legacy AltRsmi tests (#2090)
These tests will be replaced by new tests.
2025-12-05 16:53:55 -06:00
Atul Kulkarni 0d797d1f6c Remove legacy Shm and P2p tests (#2089)
These tests will be replaced by MPI tests.
2025-12-05 16:53:28 -06:00
Atul Kulkarni 7ec8e73e12 Remove static to non-static conversion used in tests (#2084)
* Remove coll_reg tests which are unsupported

* removed static to non-static conversion feature
2025-12-04 18:03:14 -06:00
Atul Kulkarni cc6e259a02 Fix rccl test suite to use hip_bf16.h instead of hip_bfloat16.h for the __bf16 intrinsic (#2082) 2025-12-04 10:02:06 -06:00
Atul Kulkarni 7c12b0b76b Added new unit tests for AllReduce with Bias API (#2036)
* Added new unit tests for AllReduce with Bias API

* Address review comments
2025-12-03 17:37:34 -06:00
Kapil S. Pawar c7f400dbff Functional Tests for Ext-Profiler Plugin (#2007)
* Add functional tests for CSV Tuner Plugin

* Updated directory structure

* Updated and renamed directories

* Updated csv conf files

* Added tests for ext-profiler

* Updated readme

* Updated readme
2025-11-18 11:20:39 -06:00
Kapil S. Pawar c8da880dc7 Added Functional Tests for CSV Tuner Plugin (#1968)
* Add functional tests for CSV Tuner Plugin

* Updated directory structure

* Updated and renamed directories

* Updated csv conf files

* Updated readme

* Updated readme

* Updated readme
2025-11-11 10:11:19 -06:00
Arm Patinyasakdikul 84fdcab68a Added copyrights for Palamida scan 7.2. (#2018) 2025-10-30 13:33:20 -05:00
Atul Kulkarni 26dc7abb32 Added ROCM_VERSION restriction to alloc unit tests (#1989) 2025-10-28 12:54:34 -05:00
Nusrat Islam d22a39e954 Update direct AG and single node LL threshold (#1944)
* update AG direct and single node LL threshold

* update thresholds based on MI350 experimental results

* disable using LL for direct AG

* enable direct AG for lower GPU counts

* direct AG single node tuning

* fix in-place buffer allocation for AG unit test

* whitespace fix

* gate direct AG for gfx950 and gfx942

---------

Co-authored-by: Nusrat Islam <nusislam@nova-login-gtu2.prov.gtu.zts.cpe.ice.amd.com>
2025-10-09 10:48:50 -05:00
Atul Kulkarni 9839d1c7c8 Updated tests based on NCCL 2.27.3-1 sync (#1892) 2025-09-18 09:56:09 -05:00
Laura Promberger 0f6fec1553 Bump minimum cmake version to 3.16 to enable cmake 4 (#1909)
Minimum required cmake version of test/CMakeLists.txt is bumped from 2.8
to 3.16. This aligns with the version used in CMakeLists.txt and will
enable building with cmake 4.
2025-09-16 23:10:22 -05:00
Kapil S. Pawar 86a6d06e40 Added new tests for rccl_wrap - rcclOverrideProtocol, rcclOverrideAlgorithm (#1895)
* Added new unit tests for rccl_wrap
2025-09-15 18:00:26 -05:00
Kapil S. Pawar f418a4c6d0 Added new tests for rccl_wrap - rcclSetPipelining (#1890)
* Added tests for rcclSetPipelining

* Added conditions to skip the test

* Updated message size
2025-09-05 09:29:11 -05:00
ycui1984 361d596229 [rocm_regression] Return errors when HSA_NO_SCRATCH_RECLAIM=1 even for rocm>=6.4.0 (#1867)
* [rocm_regression] Return errors when HSA_NO_SCRATCH_RECLAIM=1 even for rocm >= 6.4.0
* [rocm_regression] Check firmware version
* [rocm_regression] Resolve review comments
* [rocm_regression] Move hsa env checking into init once func
* [rocm_regression] Prevent hot fix version in firmware
* [rocm_regression] Improve unit tests
2025-08-29 11:18:23 -05:00
Kapil S. Pawar c9becd89cd Code coverage tests for param.cc (#1872)
* Added code coverage unit tests for param.cc

* Updated ParamTests.cpp and removed ParamTestsConfFile.txt

* Updated ParamTests.cpp

* Removed NCCL_LOG_INFO and added sample config file

---------

Co-authored-by: Pawar <kpawar@ctr2-alola-ctrl-01.amd.com>
2025-08-27 09:30:37 -05:00
ishkool c288fbf1b2 Code coverage tests for net_socket.cc (#1840)
* Code coverage UTs for net_socket.cc

* Addressed review comments

---------

Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>
2025-08-27 09:24:21 -05:00
corey-derochie-amd b88c134874 Changed TestBedChild to avoid hang if the call fails (#1875)
Changed `TestBedChild` protocol to send the result code before the return value to avoid hanging if the call fails. Switched `TestBedChild::GetUniqueId` to use this.
2025-08-23 00:17:34 -05:00
awelling2801 a1a65c65c4 Added new tests for rccl_wrap - rcclUpdateThreadThreshold (#1855)
* Added tests for rccl_wrap - rcclUpdateThreadThreshold

* Skip tests with gtest_skip

* Added tests for new functions rcclSetP2pNetChunkSize and rcclSetPxn

---------

Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>
2025-08-21 16:39:53 -05:00
Arm Patinyasakdikul 9d3acffa5f Test: delete child object to address memory leak. (#1863) 2025-08-20 10:15:03 -05:00
ishkool 876f985e0f Code Coverage: Proxy.cc tests (#1818)
* Proxy.cc tests

* Update ProxyTest.cpp

Cleaned up the code.

* Update ProxyTests.cpp

Bring back deleting dynamically allocated memory
2025-08-15 19:06:32 -05:00
Atul Kulkarni 84f3cc6a02 Added new unit tests for src/enqueue.cc (#1853) 2025-08-15 18:26:26 -05:00
ishkool 6453273aa6 Code Coverage Unit Tests for comm.h (#1783)
* File containing test for comm.h

* Update CommTest.cpp

Added gtest API for assert

* Update CommTest.cpp

Adding copyright

* Update CommTest.cpp

Removing info and tested as not required.

* Update and rename CommTest.cpp to CommTests.cpp

* Update CMakeLists.txt
2025-08-15 17:44:24 -05:00
Rahul Vaidya ee9ed3ef87 [BUILD] Fix UT packaging on Debian family OS (#1854)
* Fix UT packaging on Debian family OSes

Signed-off-by: ravaidya <ravaidya@amd.com>

* Split OR condition when performing Debian checks

Signed-off-by: ravaidya <ravaidya@amd.com>

---------

Signed-off-by: ravaidya <ravaidya@amd.com>
2025-08-11 17:03:16 -05:00
Nilesh M Negi 5036d0e713 [BUILD] Fix UT packaging on Debian OS (#1848) 2025-08-11 09:43:26 -05:00
Rahul Vaidya cbbc713b03 Fix rccl-UnitTests packaging on Debian systems (#1846)
Signed-off-by: ravaidya <ravaidya@amd.com>
2025-08-08 12:28:56 -05:00
awelling2801 82bea39280 Created coverage tests for rccl_wrap (#1694)
* Created coverage tests for rccl_wrap

RCCL_EXPOSE_STATIC off by default

Coverage tests for rccl_wrap.cc

* Remove RCCL_EXPOSE_STATIC dependency

* Removed Rcclwrap.RcclGetAlgoInfoTest

* Remove comments

* Corrected RCCL_EXPOSE_STATIC definition logic

---------

Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>
Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>
2025-08-06 14:48:00 -05:00
Atul Kulkarni 0e7d7da55d Add unit tests for graph/xml.cc & graph/xml.h (#1833)
* Added new binary for executing unit tests

Added new unit tests for argcheck.cc and alt_rsmi.cc files

Modified the method to execute unit tests to cover static methods
by using a bash script to convert static to non-static functions
and variables on the fly restricted to debug build type.

* Added new unit tests for src/transport/shm.cc

* Added new unit tests for graph/xml.cc
2025-08-01 14:20:27 -05:00
awelling2801 5ecc1b7ede Added tests for coll_reg (#1700)
Changes to coll_reg

Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>
2025-07-31 13:49:23 -05:00
awelling2801 7320752bf3 Added tests for transport.cc (#1725)
Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>
2025-07-31 11:04:28 -05:00
Rahul Vaidya 0adc5edc74 Fix RHEL10 packaging for rcclras and rccl-UnitTests (#1831)
Signed-off-by: ravaidya <ravaidya@amd.com>
2025-07-31 11:00:49 -05:00
ycui1984 874cd657ef Add collective latency profiler (#1785)
* [LatencyProfiler] Initial commit

* [LatencyProfiler] Add unit tests

* [LatencyProfiler] add more

* [LatencyProfiler] Pass unit tests

* [LatencyProfiler] Add hooks to integrate with meta internal tools

* [LatencyProfiler] Restore install.sh

* [LatencyProfiler] Resolved comments: 1. add proper license, 2. use proper namespace

* [LatencyProfiler] Add header
2025-07-30 14:59:28 -07:00
awelling2801 9843adaab2 Added tests for Ipcsocket (#1690)
Co-authored-by: Welling <awelling@ctr2-alola-ctrl-01.amd.com>
2025-07-29 10:03:28 -05:00
awelling2801 e118aadc14 Code coverage improvements for alloc.h (#1676)
* Added tests for alloc.h

* Added tests for ZeroElementCopy and MemcpyNullSrcOrDstPointer

---------

Co-authored-by: Welling <awelling@ctr2-alola-ctrl-01.amd.com>
2025-07-29 09:19:57 -05:00
peizhang56 fe182d6546 Add Unit Test for bitops.h (#1821)
* Add Unit Test for bitops.h

* Change the style

* Fix the code review comments

* Add more test cases
2025-07-28 11:25:15 -05:00
Atul Kulkarni 81ec6bff4c Added new unit tests for src/transport/p2p.cc (#1774) 2025-07-25 12:57:57 -05:00
Atul Kulkarni 1c3d1b3842 Added new unit tests for src/transport/shm.cc (#1689) 2025-07-25 05:54:42 -05:00
Atul Kulkarni 275fdd43c1 Code coverage improvements (#1665)
* Increased max stack size to 640

* Added new binary for executing unit tests

Added new unit tests for argcheck.cc and alt_rsmi.cc files

Modified the method to execute unit tests to cover static methods
by using a bash script to convert static to non-static functions
and variables on the fly restricted to debug build type.
2025-07-17 11:20:49 -05:00
Nilesh M Negi 6b4ad0fd74 [BUILD] Use fmt-header instead of libfmt (#1791) 2025-07-10 17:19:53 -05:00
mberenjk 697bee4ee8 Improving build time by removing the gfx11xx and host code from rccl_float8.h (#1789)
* reduce build time by excluding the gfx11xx arch from using hip_fp8

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-07-09 14:03:47 -05:00
Rakesh Roy dd3b1d816c Fix chrono build error (#1790) 2025-07-04 08:27:30 -05:00
Dingming Wu 020dcf0a7c Add proxyTrace (#1732)
This feature tracks the proxy events and status of each send/recv op. ProxyTrace keeps a fixed number of active ops in host mem and dumps the status of each op when the program crashes or hangs.
2025-06-25 23:01:34 -05:00
BertanDogancay aaf023976a Merge remote-tracking branch 'nccl/master' into develop 2025-06-20 07:54:49 -05:00