Commit Graph

95 Commits

Author SHA1 Message Date
gilbertlee-amd 00c3d8d850 Adding interactive mode for unit tests (UT_INTERACTIVE) (#715) 2023-03-21 10:58:24 -06:00
akolliasAMD 9a0d4a07a6 Test Fixes (#710)
* splitting CI tests to run SP first and MP second
* set device before hipStreamSynchronize on tests
2023-03-21 08:48:39 -06:00
Ziyue Yang e3b2342f39 MSCCL: Improve executor and integrate scheduler (#694)
* MSCCL: improve executor and add scheduler for testing

* Use external scheduler

* Fix cmake error

* Address comments

* Fix thread safe issue

* Make MSCCL lifecycle APIs thread safe

* Make MSCCL internal scheduler aware of topology hint

* Revise error message
2023-03-14 14:34:25 -07:00
gilbertlee-amd 80ed608a9d Multi stream unit test (#693)
* Adding multi-stream support to unit tests
2023-02-23 13:28:50 -07:00
gilbertlee-amd f63d3b1978 Adding UnitTest timing summary (UT_SHOW_TIMING) (#692) 2023-02-22 08:57:13 -07:00
akolliasAMD d119c0886e UnitTests: made reduceScatter run a smaller amount of tests (#691) 2023-02-21 16:21:24 -07:00
gilbertlee-amd a640c6983f Unit test fail check (#689)
* Adding fall-through on unit test failure

* Workaround for hipGraph validity check issue
2023-02-18 08:50:46 -08:00
gilbertlee-amd df46645ff8 Switching to relaxed capture for unit tests (#679) 2023-02-08 11:28:58 -07:00
Pedram Alizadeh fddb5e6be8 UnitTest: add test cases for 2.14 API (ncclCommInitRankConfig and ncclCommFinalize for non-blocking communicator) (#674) 2023-02-03 17:36:30 -05:00
Pedram Alizadeh fbe52b6caa removed the wrapper script so that the old name is no longer referenced (#676) 2023-01-31 11:11:02 -05:00
akolliasAMD 24aa8bd802 added a different way for getting device count, by running it in a child process (#665) 2022-12-14 16:10:14 -07:00
Pedram Alizadeh 54a3da04eb Revert "UnitTest: add test cases for 2.14 API (ncclCommInitRankConfig and ncclCommFinalize for non-blocking communicator) (#662)" (#666)
This reverts commit 8250092367.
2022-12-14 11:28:40 -05:00
PedramAlizadeh 45872d170f Changed the name of UnitTests to rccl-UnitTests (wrapper executable included). 2022-12-13 21:45:57 +00:00
Pedram Alizadeh 8250092367 UnitTest: add test cases for 2.14 API (ncclCommInitRankConfig and ncclCommFinalize for non-blocking communicator) (#662) 2022-12-13 16:05:09 -05:00
Ziyue Yang adafc0f759 Add MSCCL Support (#658)
* Add MSCCL support

* Add alignment and message size checking

* Fix nRanks checking, in-place and out-of-place tests and group call handling

* Fix hipGraph unit test

* Change MSCCL init warning to INFO

* Revise license info
2022-12-12 15:51:04 -08:00
gilbertlee-amd faed69f9fc Graph unit tests (#656)
* Adding hipGraph unit tests
2022-12-01 10:28:42 -07:00
Ranjith Ramakrishnan b397cb16ea Correct hsa header path for new directory layout 2022-11-04 09:52:16 -07:00
raramakr b32f38126d Merge pull request #635 from raramakr/swdev
Correct include and library path for new directory layout
2022-10-14 15:48:44 -07:00
gilbertlee-amd ebb8b5bf63 Updating files for missing licenses (#637) 2022-10-14 13:49:16 -06:00
Ranjith Ramakrishnan cf4e963aaf Correct include and library path for new directory layout
Use actual header files and libraries, rather than wrapper header files and library softlinks
2022-10-14 01:32:04 -07:00
akolliasAMD 06bce9d0c9 added stream synch after hipMemset (#609) 2022-08-30 16:18:37 -06:00
Edgar Gabriel f6e00dec13 introduce support for ncclFloat16/half in UT 2022-08-24 15:28:24 +00:00
gilbertlee-amd dae11c2aca Disable clique AllReduce UnitTest (#595) 2022-08-04 18:30:00 -06:00
akolliasAMD 686dbc8bc6 updated alltoallV test to reflect how send counts are done in perf tests (#586) 2022-07-21 14:59:34 -06:00
akolliasAMD 8b9291eb47 moved default number of max ranks per gpu to 1 2022-06-22 17:37:49 +00:00
Edgar a87d61db2b extending the unit-tests for multi-rank support 2022-06-10 14:23:19 +00:00
gilbertlee-amd 700b473211 Moving opt-in custom signal handler from UnitTests into RCCL (#550)
* Enable via RCCL_ENABLE_SIGNALHANDLER=1
2022-05-20 09:56:38 -06:00
Edgar 2bf6d254b6 add a signal handler and backtrace
Tweak the signal handler and force non-release build
Increase ulimit locked memory value
Update the signal handler to use bfd symbol resolution.
Include configure logic to find bfd functions.
Optionally add C++ function name demangling
2022-04-25 10:48:17 -04:00
Liam Wrubleski a8f1e61f48 Packages for test and benchmark executables on all supported OSes using CPack. (#512) 2022-03-21 15:04:14 -06:00
akolliasAMD 65ea3d80db Added alltoallv test and optional args variable on collective args (#514)
* Added alltoallv test and optional args variable on collective args
2022-03-18 13:55:11 -04:00
Nirmal Unnikrishnan 676a4737c1 File reorganization as per the new defined standard
The header files will be in /opt/rocm-xxx/include/rccl
Libraries and cmake will be in /opt/rocm-xxx/lib folder.
Added wrappers for header files using rocm-cmake functions for backward compatibility.
2022-03-08 17:32:02 +00:00
gilbertlee-amd 0687940b84 Changing initialization method for UnitTests (#510) 2022-03-07 09:22:55 -07:00
gilbertlee-amd 699dc30f05 [UnitTests] Check process mask for custom tests (#507) 2022-03-02 17:24:14 -07:00
akolliasAMD ff54e79799 Added Unit test for nccl send recv (#506)
Added Send Receive test that tests through all pairs
2022-03-02 15:50:16 -05:00
gilbertlee-amd 29ad0f5fbe Unit test refactor (#500)
Refactoring and consolidating single-process / multi-process unit testing
2022-02-25 08:59:07 -07:00
Wenkai Du 6268b87c16 Unit tests: fix number of GPU detection (#484) 2022-01-05 15:06:12 -08:00
Stanley Tsang bbbb35ceec Fixing setting of GPUs to 2 when 1 or less GPUs on system for unit tests (#481) 2021-12-09 11:04:31 -07:00
Wenkai Du 03a830293c gtest: dynamically generate tests based on test machine's GPU count (#467)
* gtest: dynamically generate tests based on test machine's GPU count

* Adjust test element size and bfloat16 threshold for up to 16 GPUs
2021-11-16 10:28:26 -08:00
Stanley Tsang a6dba6b9dd Remove hardcoded references to /opt/rocm when using chrpath (#469)
* Fixing cmake_install_prefix search to include /opt/rocm-xxxx

* Removing all hard references to /opt/rocm with ROCM_PATH
2021-11-15 15:00:55 -07:00
Wenkai Du bc2932be4e Unit Test: use range from 0 to 1 for floating point test data (#459)
* Unit Test: use range from 0 to 1 for floating point test data

* gtest: Update init data and bfloat16 threshold
2021-11-08 09:21:09 -08:00
Stanley Tsang 2f87073514 Fixing cmake_install_prefix search to include /opt/rocm-xxxx (#462) 2021-11-06 07:58:26 -07:00
Stanley Tsang 7e55b211c5 Build AllReduce only mode (#443)
* Initial commit of all_reduce_only support

* Working AllReduce only build

* Removing printfs and restoring release build

* Restore P2P index

* Updates to build_allreduce_only mode.

* cleaning up macro ifdefs
2021-10-26 17:36:46 -06:00
Stanley Tsang d23dfc12c1 Re-enable use of chrpath to manually set rpath for unit tests. (#448)
* Re-enable use of chrpath to manually set rpath for unit tests.

* Add check for chrpath
2021-10-26 11:10:04 -06:00
Wenkai Du 1faff323b4 Unit Test: support ncclAvg 2021-08-25 14:15:54 -07:00
Wenkai Du 215904ee8e Fix unit tests static build (#403) 2021-07-09 09:35:32 -07:00
Eiden Yoshida 5c3e7d8b67 Fix static builds (#393) 2021-06-23 09:19:48 -06:00
Wenkai Du 59d2867b01 Remove hard coded /opt/rocm from cmake (#396) 2021-06-21 08:29:23 -07:00
Wenkai Du 6dcae8a459 Select sendrecv path based on collective data size (#391)
* Select sendrecv path based on collective data size

* Add comments on packing and unpacking group field

* Toggling RCCL_P2P_NET_DISABLE in combined calls unit tests
2021-06-10 17:51:04 -07:00
Stanley Tsang f6f5e16fe6 Fixing bug with ExtractSubDataset function not fully initializing subdataset (#390) 2021-06-10 14:35:39 -06:00
Stanley Tsang 256403d4f0 Adding support for hipMallocManaged() in unit tests (#375)
* Adding HMM support for unit tests

* Fixing HMM opt-in check
2021-05-25 17:07:12 -06:00