Commit graph

195 commits

Author SHA1 Message Date
Kapil S. Pawar c9becd89cd Code coverage tests for param.cc (#1872)
* Added code coverage unit tests for param.cc

* Updated ParamTests.cpp and removed ParamTestsConfFile.txt

* Updated ParamTests.cpp

* Removed NCCL_LOG_INFO and added sample config file

---------

Co-authored-by: Pawar <kpawar@ctr2-alola-ctrl-01.amd.com>
2025-08-27 09:30:37 -05:00
ishkool c288fbf1b2 Code coverage tests for net_socket.cc (#1840)
* Code coverage UTs for net_socket.cc

* Addressed review comments

---------

Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>
2025-08-27 09:24:21 -05:00
corey-derochie-amd b88c134874 Changed TestBedChild to avoid hang if the call fails (#1875)
Changed `TestBedChild` protocol to send the result code before the return value to avoid hanging if the call fails. Switched `TestBedChild::GetUniqueId` to use this.
2025-08-23 00:17:34 -05:00
awelling2801 a1a65c65c4 Added new tests for rccl_wrap - rcclUpdateThreadThreshold (#1855)
* Added tests for rccl_wrap - rcclUpdateThreadThreshold

* Skipped tests gtest_skip added

* Added tests for new functions rcclSetP2pNetChunkSize and rcclSetPxn

---------

Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>
2025-08-21 16:39:53 -05:00
Arm Patinyasakdikul 9d3acffa5f Test: delete child object to address memory leak. (#1863) 2025-08-20 10:15:03 -05:00
ishkool 876f985e0f Code Coverage: Proxy.cc tests (#1818)
* Proxy.cc tests

* Update ProxyTest.cpp

Cleaned up the code.

* Update ProxyTests.cpp

Bring back deleting dynamically allocated memory
2025-08-15 19:06:32 -05:00
Atul Kulkarni 84f3cc6a02 Added new unit tests for src/enqueue.cc (#1853) 2025-08-15 18:26:26 -05:00
ishkool 6453273aa6 Code Coverage Unit Tests for comm.h (#1783)
* File containing test for comm.h

* Update CommTest.cpp

Added gtest API for assert

* Update CommTest.cpp

Adding copyright

* Update CommTest.cpp

Removing info and tested as not required.

* Update and rename CommTest.cpp to CommTests.cpp

* Update CMakeLists.txt
2025-08-15 17:44:24 -05:00
Rahul Vaidya ee9ed3ef87 [BUILD] Fix UT packaging on Debian family OS (#1854)
* Fix UT packaging on Debian family OSes

Signed-off-by: ravaidya <ravaidya@amd.com>

* Split OR condition when performing Debian checks

Signed-off-by: ravaidya <ravaidya@amd.com>

---------

Signed-off-by: ravaidya <ravaidya@amd.com>
2025-08-11 17:03:16 -05:00
Nilesh M Negi 5036d0e713 [BUILD] Fix UT packaging on Debian OS (#1848) 2025-08-11 09:43:26 -05:00
Rahul Vaidya cbbc713b03 Fix rccl-UnitTests packaging on Debian systems (#1846)
Signed-off-by: ravaidya <ravaidya@amd.com>
2025-08-08 12:28:56 -05:00
awelling2801 82bea39280 Created coverage tests for rccl_wrap (#1694)
* Created coverage tests for rccl_wrap

RCCL_EXPOSE_STATIC off by default

Coverage tests for rccl_wrap.cc

* Remove RCCL_EXPOSE_STATIC dependency

* Removed Rcclwrap.RcclGetAlgoInfoTest

* Remove comments

* Corrected RCCL_EXPOSE_STATIC definition logic

---------

Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>
Co-authored-by: Atul Kulkarni <atul.kulkarni@amd.com>
2025-08-06 14:48:00 -05:00
Atul Kulkarni 0e7d7da55d Add unit tests for graph/xml.cc & graph/xml.h (#1833)
* Added new binary for executing unit tests

Added new unit tests for argcheck.cc and alt_rsmi.cc files

Modified the method to execute unit tests to cover static methods
by using a bash script to convert static to non-static functions
and variables on the fly restricted to debug build type.

* Added new unit tests for src/transport/shm.cc

* Added new unit tests for graph/xml.cc
2025-08-01 14:20:27 -05:00
awelling2801 5ecc1b7ede Added tests for coll_reg (#1700)
Changes to coll_reg

Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>
2025-07-31 13:49:23 -05:00
awelling2801 7320752bf3 Added tests for transport.cc (#1725)
Co-authored-by: Welling <awelling@ctr2-alola-login-01.amd.com>
2025-07-31 11:04:28 -05:00
Rahul Vaidya 0adc5edc74 Fix RHEL10 packaging for rcclras and rccl-UnitTests (#1831)
Signed-off-by: ravaidya <ravaidya@amd.com>
2025-07-31 11:00:49 -05:00
ycui1984 874cd657ef Add collective latency profiler (#1785)
* [LatencyProfiler] Initial commit

* [LatencyProfiler] Add unit tests

* [LatencyProfiler] add more

* [LatencyProfiler] Pass unit tests

* [LatencyProfiler] Add hooks to integrate with meta internal tools

* [LatencyProfiler] Restore install.sh

* [LatencyProfiler] Resolved comments: 1. add proper license; 2. use proper namespace

* [LatencyProfiler] Add header
2025-07-30 14:59:28 -07:00
awelling2801 9843adaab2 Added tests for Ipcsocket (#1690)
Co-authored-by: Welling <awelling@ctr2-alola-ctrl-01.amd.com>
2025-07-29 10:03:28 -05:00
awelling2801 e118aadc14 Code coverage improvements for alloc.h (#1676)
* Added tests for alloc.h

* Added tests for ZeroElementCopy and MemcpyNullSrcOrDstPointer

---------

Co-authored-by: Welling <awelling@ctr2-alola-ctrl-01.amd.com>
2025-07-29 09:19:57 -05:00
peizhang56 fe182d6546 Add Unit Test for bitops.h (#1821)
* Add Unit Test for bitops.h

* Change the style

* Fix the code review comments

* Add more test cases
2025-07-28 11:25:15 -05:00
Atul Kulkarni 81ec6bff4c Added new unit tests for src/transport/p2p.cc (#1774) 2025-07-25 12:57:57 -05:00
Atul Kulkarni 1c3d1b3842 Added new unit tests for src/transport/shm.cc (#1689) 2025-07-25 05:54:42 -05:00
Atul Kulkarni 275fdd43c1 Code coverage improvements (#1665)
* Increased max stack size to 640

* Added new binary for executing unit tests

Added new unit tests for argcheck.cc and alt_rsmi.cc files

Modified the method to execute unit tests to cover static methods
by using a bash script to convert static to non-static functions
and variables on the fly restricted to debug build type.
2025-07-17 11:20:49 -05:00
Nilesh M Negi 6b4ad0fd74 [BUILD] Use fmt-header instead of libfmt (#1791) 2025-07-10 17:19:53 -05:00
mberenjk 697bee4ee8 Improving build time by removing the gfx11xx and host code from rccl_float8.h (#1789)
* removing extra build time by removing the gfx11xx arch from using hip_fp8

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-07-09 14:03:47 -05:00
Rakesh Roy dd3b1d816c Fix chrono build error (#1790) 2025-07-04 08:27:30 -05:00
Dingming Wu 020dcf0a7c Add proxyTrace (#1732)
This feature tracks the proxy events and status of each send/recv op. ProxyTrace keeps a fixed number of active ops in host mem and dumps the status of each op when the program crashes or hangs.
2025-06-25 23:01:34 -05:00
BertanDogancay aaf023976a Merge remote-tracking branch 'nccl/master' into develop 2025-06-20 07:54:49 -05:00
Tim ba97c9c18b replayer update v0 (#1733)
* First version of new replayer, with comments on future TODOs

* plus minor fixes for UT

* Updated format of recorder, especially in binary department, according to replayer's need
2025-06-13 15:05:34 -04:00
Arm Patinyasakdikul 6c37ae9470 Added missing copyright message. (#1742)
* Added missing copyright message.

* addressed comments.
2025-06-12 09:58:01 -05:00
Atul Kulkarni 682ed36fe6 Added new ENABLE_CODE_COVERAGE option. (#1664)
Modified install.sh script to add this new option
2025-06-10 12:12:36 -05:00
vstojilj 2ac44cfe4e SWDEV-536040 - Include <thread> header (#1724) 2025-06-06 10:28:11 -06:00
Arm Patinyasakdikul c07445d5b4 Test: bump max stacksize once again to match current expectation. 2025-05-23 11:18:25 -05:00
Arm Patinyasakdikul 523e0893e4 Test: Change max stack size to 520 to accommodate new ROCm changes. 2025-05-21 20:21:27 -05:00
corey-derochie-amd 170acf3bda Switched to using the hip_fp8 header instead of rccl_float8, resolving compatibility issues. (#1546)
* Revert "Revert "replacing rccl_float8 with hip_fp8 and address compatibility …"

This reverts commit 824b81c034.

* [UT] Modify max stack size to 496

* adding a check for OCP type and replacing ROCM_VERSION with HIP_VERSION

* addressing the ci failure

* Adding the device tag

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-05-14 15:33:03 -05:00
mberenjk e70003736e Write JSON file to /tmp directory to avoid incorrect write access in recorderTest (#1680)
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-05-07 13:58:27 -05:00
Siu Chi Chan 9525c5b2ef rccl-UnitTests - link to dl library (#1673) 2025-05-02 21:20:22 -05:00
deeksha-amd 2486838465 Added new tests for improving the code coverage (#1656)
Signed-off-by: Deeksha Goplani <deeksha.goplani@amd.com>
2025-04-30 18:01:11 -05:00
BertanDogancay a6bf9bfc9e Merge remote-tracking branch 'nccl/master' into develop 2025-04-23 20:47:43 -07:00
gilbertlee-amd ee85a70bb4 Adding UT_DEBUG_PAUSE to unit tests (#1653) 2025-04-21 21:15:07 -06:00
Tim 9a55ff60a9 RCCL Replayer update (#1603)
RCCL recorder w/ suggested change and UT
2025-04-19 00:21:27 -04:00
AbandiGa 7a84c5dbb0 added copyright (#1635) 2025-04-14 09:46:18 -05:00
BertanDogancay 0b2062c560 Merge remote-tracking branch 'nccl/master' into develop 2025-03-27 12:53:04 -05:00
Nilesh M Negi d6b987a53f [UT] Increase stack size for StandaloneTests to 480 (#1616)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2025-03-21 21:33:32 -05:00
gilbertlee-amd 626dc50ab5 Removing the experimental clique kernel files (#1610) 2025-03-20 18:10:01 -06:00
gilbertlee-amd 9a4e49ff1a Pseudo-randomly adding zero-byte sends in AllToAllv unit test (#1597) 2025-03-20 17:00:48 -06:00
mberenjk 5f691aaf65 Skipping AllReduce test on more than 8 ranks for FP8 type on Hyabusa (#1598)
* Skipping AllReduce FP8 test on 9 to 16 ranks (gfx90a) as it's using Tree algorithm not RING

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-03-17 10:22:49 -05:00
Wenkai Du 4237caad69 Limit P2P channels per peer to not exceed max channels (#1594)
* Limit P2P channels per peer to not exceed max channels

* [UT] test single GPU cases for all collectives

* [UT] fix out of range root value
2025-03-11 09:32:09 -07:00
isaki001 59c55842f1 fix the size of the recv buffer in AllGather UBR test (#1564) 2025-03-05 11:42:15 -06:00
Nilesh M Negi 4e406acc43 [UT] Include iomanip if not defined (#1510)
* [UT] Include iomanip if not defined

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* Remove include guards

`iomanip.h` has pre-defined include guards. These are not needed.

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
2025-02-11 08:48:47 -07:00