gilbertlee-amd
00c3d8d850
Adding interactive mode for unit tests (UT_INTERACTIVE) ( #715 )
2023-03-21 10:58:24 -06:00
akolliasAMD
9a0d4a07a6
Test Fixes ( #710 )
...
* splitting CI tests to run SP first and MP second
* set device before hipStreamSynchronize on tests
2023-03-21 08:48:39 -06:00
Ziyue Yang
e3b2342f39
MSCCL: Improve executor and integrate scheduler ( #694 )
...
* MSCCL: improve executor and add scheduler for testing
* Use external scheduler
* Fix cmake error
* Address comments
* Fix thread safe issue
* Make MSCCL lifecycle APIs thread safe
* Make MSCCL internal scheduler aware of topology hint
* Revise error message
2023-03-14 14:34:25 -07:00
gilbertlee-amd
80ed608a9d
Multi stream unit test ( #693 )
...
* Adding multi-stream support to unit tests
2023-02-23 13:28:50 -07:00
gilbertlee-amd
f63d3b1978
Adding UnitTest timing summary (UT_SHOW_TIMING) ( #692 )
2023-02-22 08:57:13 -07:00
akolliasAMD
d119c0886e
UnitTests: made reduceScatter run a smaller amount of tests ( #691 )
2023-02-21 16:21:24 -07:00
gilbertlee-amd
a640c6983f
Unit test fail check ( #689 )
...
* Adding fall-through on unit test failure
* Workaround for hipGraph validity check issue
2023-02-18 08:50:46 -08:00
gilbertlee-amd
df46645ff8
Switching to relaxed capture for unit tests ( #679 )
2023-02-08 11:28:58 -07:00
Pedram Alizadeh
fddb5e6be8
UnitTest: add test cases for 2.14 API (ncclCommInitRankConfig and ncclCommFinalize for non-blocking communicator) ( #674 )
2023-02-03 17:36:30 -05:00
Pedram Alizadeh
fbe52b6caa
removed the wrapper script so that the old name is no longer referenced ( #676 )
2023-01-31 11:11:02 -05:00
akolliasAMD
24aa8bd802
added a different way for getting device count, by running it in a child process ( #665 )
2022-12-14 16:10:14 -07:00
Pedram Alizadeh
54a3da04eb
Revert "UnitTest: add test cases for 2.14 API (ncclCommInitRankConfig and ncclCommFinalize for non-blocking communicator) ( #662 )" ( #666 )
...
This reverts commit 8250092367 .
2022-12-14 11:28:40 -05:00
PedramAlizadeh
45872d170f
Changed the name of UnitTests to rccl-UnitTests (wrapper executable included).
2022-12-13 21:45:57 +00:00
Pedram Alizadeh
8250092367
UnitTest: add test cases for 2.14 API (ncclCommInitRankConfig and ncclCommFinalize for non-blocking communicator) ( #662 )
2022-12-13 16:05:09 -05:00
Ziyue Yang
adafc0f759
Add MSCCL Support ( #658 )
...
* Add MSCCL support
* Add alignment and message size checking
* Fix nRanks checking, in-place and out-of-place tests and group call handling
* Fix hipGraph unit test
* Change MSCCL init warning to INFO
* Revise license info
2022-12-12 15:51:04 -08:00
gilbertlee-amd
faed69f9fc
Graph unit tests ( #656 )
...
* Adding hipGraph unit tests
2022-12-01 10:28:42 -07:00
Ranjith Ramakrishnan
b397cb16ea
Correct hsa header path for new directory layout
2022-11-04 09:52:16 -07:00
raramakr
b32f38126d
Merge pull request #635 from raramakr/swdev
...
Correct include and library path for new directory layout
2022-10-14 15:48:44 -07:00
gilbertlee-amd
ebb8b5bf63
Updating files for missing licenses ( #637 )
2022-10-14 13:49:16 -06:00
Ranjith Ramakrishnan
cf4e963aaf
Correct include and library path for new directory layout
...
Use actual header files and libraries, rather than using wrapper header files and library softlinks
2022-10-14 01:32:04 -07:00
akolliasAMD
06bce9d0c9
added stream synch after hipMemset ( #609 )
2022-08-30 16:18:37 -06:00
Edgar Gabriel
f6e00dec13
introduce support for ncclFloat16/half in UT
2022-08-24 15:28:24 +00:00
gilbertlee-amd
dae11c2aca
Disable clique AllReduce UnitTest ( #595 )
2022-08-04 18:30:00 -06:00
akolliasAMD
686dbc8bc6
updated alltoallV test to reflect how send counts are done in perf tests ( #586 )
2022-07-21 14:59:34 -06:00
akolliasAMD
8b9291eb47
moved default number of max ranks per gpu to 1
2022-06-22 17:37:49 +00:00
Edgar
a87d61db2b
extending the unit-tests for multi-rank support
2022-06-10 14:23:19 +00:00
gilbertlee-amd
700b473211
Moving opt-in custom signal handler from UnitTests into RCCL ( #550 )
...
* Enable via RCCL_ENABLE_SIGNALHANDLER=1
2022-05-20 09:56:38 -06:00
Edgar
2bf6d254b6
add a signal handler and backtrace
...
Tweak the signal handler and force non-release build
Increase ulimit locked memory value
Update the signal handler to use bfd symbol resolution.
Include configure logic to find bfd functions.
Optionally add C++ function name demangling
2022-04-25 10:48:17 -04:00
Liam Wrubleski
a8f1e61f48
Packages for test and benchmark executables on all supported OSes using CPack. ( #512 )
2022-03-21 15:04:14 -06:00
akolliasAMD
65ea3d80db
Added alltoallv test and optional args variable on collective args ( #514 )
...
* Added alltoallv test and optional args variable on collective args
2022-03-18 13:55:11 -04:00
Nirmal Unnikrishnan
676a4737c1
File reorganization as per the new defined standard
...
The header files will be in /opt/rocm-xxx/include/rccl
Libraries and cmake will be in /opt/rocm-xxx/lib folder.
Added wrappers for header files using rocm-cmake functions for backward compatibility.
2022-03-08 17:32:02 +00:00
gilbertlee-amd
0687940b84
Changing initialization method for UnitTests ( #510 )
2022-03-07 09:22:55 -07:00
gilbertlee-amd
699dc30f05
[UnitTests] Check process mask for custom tests ( #507 )
2022-03-02 17:24:14 -07:00
akolliasAMD
ff54e79799
Added Unit test for nccl send recv ( #506 )
...
Added Send Receive test that tests through all pairs
2022-03-02 15:50:16 -05:00
gilbertlee-amd
29ad0f5fbe
Unit test refactor ( #500 )
...
Refactoring and consolidating single-process / multi-process unit testing
2022-02-25 08:59:07 -07:00
Wenkai Du
6268b87c16
Unit tests: fix number of GPU detection ( #484 )
2022-01-05 15:06:12 -08:00
Stanley Tsang
bbbb35ceec
Fixing setting of GPUs to 2 when 1 or fewer GPUs on system for unit tests ( #481 )
2021-12-09 11:04:31 -07:00
Wenkai Du
03a830293c
gtest: dynamically generate tests based on test machine's GPU count ( #467 )
...
* gtest: dynamically generate tests based on test machine's GPU count
* Adjust test element size and bfloat16 threshold for up to 16 GPUs
2021-11-16 10:28:26 -08:00
Stanley Tsang
a6dba6b9dd
Remove hardcoded references to /opt/rocm when using chrpath ( #469 )
...
* Fixing cmake_install_prefix search to include /opt/rocm-xxxx
* Removing all hard references to /opt/rocm with ROCM_PATH
2021-11-15 15:00:55 -07:00
Wenkai Du
bc2932be4e
Unit Test: use range from 0 to 1 for floating point test data ( #459 )
...
* Unit Test: use range from 0 to 1 for floating point test data
* gtest: Update init data and bfloat16 threshold
2021-11-08 09:21:09 -08:00
Stanley Tsang
2f87073514
Fixing cmake_install_prefix search to include /opt/rocm-xxxx ( #462 )
2021-11-06 07:58:26 -07:00
Stanley Tsang
7e55b211c5
Build AllReduce only mode ( #443 )
...
* Initial commit of all_reduce_only support
* Working AllReduce only build
* Removing printfs and restoring release build
* Restore P2P index
* Updates to build_allreduce_only mode.
* cleaning up macro ifdefs
2021-10-26 17:36:46 -06:00
Stanley Tsang
d23dfc12c1
Re-enable use of chrpath to manually set rpath for unit tests. ( #448 )
...
* Re-enable use of chrpath to manually set rpath for unit tests.
* Add check for chrpath
2021-10-26 11:10:04 -06:00
Wenkai Du
1faff323b4
Unit Test: support ncclAvg
2021-08-25 14:15:54 -07:00
Wenkai Du
215904ee8e
Fix unit tests static build ( #403 )
2021-07-09 09:35:32 -07:00
Eiden Yoshida
5c3e7d8b67
Fix static builds ( #393 )
2021-06-23 09:19:48 -06:00
Wenkai Du
59d2867b01
Remove hard coded /opt/rocm from cmake ( #396 )
2021-06-21 08:29:23 -07:00
Wenkai Du
6dcae8a459
Select sendrecv path based on collective data size ( #391 )
...
* Select sendrecv path based on collective data size
* Add comments on packing and unpacking group field
* Toggling RCCL_P2P_NET_DISABLE in combined calls unit tests
2021-06-10 17:51:04 -07:00
Stanley Tsang
f6f5e16fe6
Fixing bug with ExtractSubDataset function not fully initializing subdataset ( #390 )
2021-06-10 14:35:39 -06:00
Stanley Tsang
256403d4f0
Adding support for hipMallocManaged() in unit tests ( #375 )
...
* Adding HMM support for unit tests
* Fixing HMM opt-in check
2021-05-25 17:07:12 -06:00