Commit Graph

110 Commits

Author SHA1 Message Date
akolliasAMD f4858e14b2 rearranged how the min and max functions are part of msccl (#1025)
* rearranged how the min and max functions are part of msccl

* added more coverage on in place graph tests
2023-12-21 08:58:33 -07:00
akolliasAMD d8dc282eeb AllReduceTests, fixed the number of roots (#925) 2023-10-20 10:25:11 -06:00
Bertan Dogancay c1f57a7041 Modify All-To-All doc (#896)
* Modify All-To-All doc

* Update nccl.h.in

* update unit-tests

---------

Co-authored-by: gilbertlee-amd <44450918+gilbertlee-amd@users.noreply.github.com>
2023-09-27 12:45:21 -04:00
Bertan Dogancay 0a01dc2f19 Add 0-byte test for send/recv (#865) 2023-08-29 09:14:18 -06:00
Bertan Dogancay 9d11cd092f Add ncclCommSplit test (#852)
Add ncclSplitCommTest
2023-08-25 16:26:45 -06:00
gilbertlee-amd a5a25bdff7 Removing unnecessary chrpath check for unit tests (#811) 2023-07-20 10:28:04 -06:00
Wenkai Du ce6a2ffac8 Merge pull request #782 from ROCmSoftwarePlatform/2.18.3
Sync up with NCCL 2.18.3
2023-06-29 15:04:16 -07:00
akolliasAMD cf8cfa88a8 Re-enabled graph tests (#736)
* enabled graph tests
* joined multi and single process CI testing
2023-06-29 08:08:17 -06:00
gilbertlee-amd f7c553edad Report unit test environment variable values as part of output (#789) 2023-06-29 07:13:05 -06:00
Wenkai Du abd0615351 Merge remote-tracking branch 'nccl/master' into develop 2023-06-26 22:51:56 +00:00
Pedram Alizadeh 520f15e61b resolving the pthread-gtest linking issue for rccl-UnitTests (#768) 2023-06-06 14:21:40 -04:00
gilbertlee-amd 777d8747a5 Refactoring CMakeFiles (#755) 2023-05-25 16:08:54 -06:00
Pedram Alizadeh 53c1c38f0e Disabled hipgraph tests! (#725) 2023-04-13 17:42:05 -04:00
akolliasAMD 2ce7d971e5 lessened the amount of child processes to active ones (#720) 2023-04-11 08:59:56 -06:00
gilbertlee-amd 27e0cb43c2 Unit test performance refactor (#700)
* Refactoring unit tests to improve performance
* Spawning child processes during InitComms instead of on TestBed construction
* Temporarily disabling graph unit tests
2023-04-06 12:28:53 -06:00
gilbertlee-amd 00c3d8d850 Adding interactive mode for unit tests (UT_INTERACTIVE) (#715) 2023-03-21 10:58:24 -06:00
akolliasAMD 9a0d4a07a6 Test Fixes (#710)
* splitting CI tests in running SP first and MP second
* set device before hipStreamSynchronize on tests
2023-03-21 08:48:39 -06:00
Ziyue Yang e3b2342f39 MSCCL: Improve executor and integrate scheduler (#694)
* MSCCL: improve executor and add scheduler for testing

* Use external scheduler

* Fix cmake error

* Address comments

* Fix thread safe issue

* Make MSCCL lifecycle APIs thread safe

* Make MSCCL internal scheduler aware of topology hint

* Revise error message
2023-03-14 14:34:25 -07:00
gilbertlee-amd 80ed608a9d Multi stream unit test (#693)
* Adding multi-stream support to unit tests
2023-02-23 13:28:50 -07:00
gilbertlee-amd f63d3b1978 Adding UnitTest timing summary (UT_SHOW_TIMING) (#692) 2023-02-22 08:57:13 -07:00
akolliasAMD d119c0886e UnitTests: made reduceScatter run a smaller amount of tests (#691) 2023-02-21 16:21:24 -07:00
gilbertlee-amd a640c6983f Unit test fail check (#689)
* Adding fall-through on unit test failure

* Workaround for hipGraph validity check issue
2023-02-18 08:50:46 -08:00
gilbertlee-amd df46645ff8 Switching to relaxed capture for unit tests (#679) 2023-02-08 11:28:58 -07:00
Pedram Alizadeh fddb5e6be8 UnitTest: add test cases for 2.14 API (ncclCommInitRankConfig and ncclCommFinalize for non-blocking communicator) (#674) 2023-02-03 17:36:30 -05:00
Pedram Alizadeh fbe52b6caa removed the wrapper script so that the old name is no longer referenced (#676) 2023-01-31 11:11:02 -05:00
akolliasAMD 24aa8bd802 added a different way for getting device count, by running it in a child process (#665) 2022-12-14 16:10:14 -07:00
Pedram Alizadeh 54a3da04eb Revert "UnitTest: add test cases for 2.14 API (ncclCommInitRankConfig and ncclCommFinalize for non-blocking communicator) (#662)" (#666)
This reverts commit 8250092367.
2022-12-14 11:28:40 -05:00
PedramAlizadeh 45872d170f Changed the name of UnitTests to rccl-UnitTests (wrapper executable included). 2022-12-13 21:45:57 +00:00
Pedram Alizadeh 8250092367 UnitTest: add test cases for 2.14 API (ncclCommInitRankConfig and ncclCommFinalize for non-blocking communicator) (#662) 2022-12-13 16:05:09 -05:00
Ziyue Yang adafc0f759 Add MSCCL Support (#658)
* Add MSCCL support

* Add alignment and message size checking

* Fix nRanks checking, in-place and out-of-place tests and group call handling

* Fix hipGraph unit test

* Change MSCCL init warning to INFO

* Revise license info
2022-12-12 15:51:04 -08:00
gilbertlee-amd faed69f9fc Graph unit tests (#656)
* Adding hipGraph unit tests
2022-12-01 10:28:42 -07:00
Ranjith Ramakrishnan b397cb16ea Correct hsa header path for new directory layout 2022-11-04 09:52:16 -07:00
raramakr b32f38126d Merge pull request #635 from raramakr/swdev
Correct include and library path for new directory layout
2022-10-14 15:48:44 -07:00
gilbertlee-amd ebb8b5bf63 Updating files for missing licenses (#637) 2022-10-14 13:49:16 -06:00
Ranjith Ramakrishnan cf4e963aaf Correct include and library path for new directory layout
Use actual header files and libraries, rather than using wrapper header files and library softlinks
2022-10-14 01:32:04 -07:00
akolliasAMD 06bce9d0c9 added stream synch after hipMemset (#609) 2022-08-30 16:18:37 -06:00
Edgar Gabriel f6e00dec13 introduce support for ncclFloat16/half in UT 2022-08-24 15:28:24 +00:00
gilbertlee-amd dae11c2aca Disable clique AllReduce UnitTest (#595) 2022-08-04 18:30:00 -06:00
akolliasAMD 686dbc8bc6 updated alltoallV test to reflect how send counts are done in perf tests (#586) 2022-07-21 14:59:34 -06:00
akolliasAMD 8b9291eb47 moved default number of max ranks per gpu to 1 2022-06-22 17:37:49 +00:00
Edgar a87d61db2b extending the unit-tests for multi-rank support 2022-06-10 14:23:19 +00:00
gilbertlee-amd 700b473211 Moving opt-in custom signal handler from UnitTests into RCCL (#550)
* Enable via RCCL_ENABLE_SIGNALHANDLER=1
2022-05-20 09:56:38 -06:00
Edgar 2bf6d254b6 add a signal handler and backtrace
Tweak the signal handler and force non-release build
Increase ulimit locked memory value
Update the signal handler to use bfd symbol resolution.
Include configure logic to find bfd functions.
Add optional C++ function name demangling
2022-04-25 10:48:17 -04:00
Liam Wrubleski a8f1e61f48 Packages for test and benchmark executables on all supported OSes using CPack. (#512) 2022-03-21 15:04:14 -06:00
akolliasAMD 65ea3d80db Added alltoallv test and optional args variable on collective args (#514)
* Added alltoallv test and optional args variable on collective args
2022-03-18 13:55:11 -04:00
Nirmal Unnikrishnan 676a4737c1 File reorganization as per the new defined standard
The header files will be in /opt/rocm-xxx/include/rccl
Libraries and cmake will be in /opt/rocm-xxx/lib folder.
Added wrappers for header files using rocm-cmake functions for backward compatibility.
2022-03-08 17:32:02 +00:00
gilbertlee-amd 0687940b84 Changing initialization method for UnitTests (#510) 2022-03-07 09:22:55 -07:00
gilbertlee-amd 699dc30f05 [UnitTests] Check process mask for custom tests (#507) 2022-03-02 17:24:14 -07:00
akolliasAMD ff54e79799 Added Unit test for nccl send recv (#506)
Added Send Receive test that tests through all pairs
2022-03-02 15:50:16 -05:00
gilbertlee-amd 29ad0f5fbe Unit test refactor (#500)
Refactoring and consolidating single-process / multi-process unit testing
2022-02-25 08:59:07 -07:00