PedramAlizadeh
45872d170f
Changed the name of UnitTests to rccl-UnitTests (wrapper executable included).
2022-12-13 21:45:57 +00:00
Pedram Alizadeh
8250092367
UnitTest: add test cases for 2.14 API (ncclCommInitRankConfig and ncclCommFinalize for non-blocking communicator) ( #662 )
2022-12-13 16:05:09 -05:00
Ziyue Yang
adafc0f759
Add MSCCL Support ( #658 )
...
* Add MSCCL support
* Add alignment and message size checking
* Fix nRanks checking, in-place and out-of-place tests and group call handling
* Fix hipGraph unit test
* Change MSCCL init warning to INFO
* Revise license info
2022-12-12 15:51:04 -08:00
gilbertlee-amd
faed69f9fc
Graph unit tests ( #656 )
...
* Adding hipGraph unit tests
2022-12-01 10:28:42 -07:00
Ranjith Ramakrishnan
b397cb16ea
Correct hsa header path for new directory layout
2022-11-04 09:52:16 -07:00
raramakr
b32f38126d
Merge pull request #635 from raramakr/swdev
...
Correct include and library path for new directory layout
2022-10-14 15:48:44 -07:00
gilbertlee-amd
ebb8b5bf63
Updating files for missing licenses ( #637 )
2022-10-14 13:49:16 -06:00
Ranjith Ramakrishnan
cf4e963aaf
Correct include and library path for new directory layout
...
Use actual header files and libraries, rather than using wrapper header files and library softlinks
2022-10-14 01:32:04 -07:00
akolliasAMD
06bce9d0c9
added stream synch after hipMemset ( #609 )
2022-08-30 16:18:37 -06:00
Edgar Gabriel
f6e00dec13
introduce support for ncclFloat16/half in UT
2022-08-24 15:28:24 +00:00
gilbertlee-amd
dae11c2aca
Disable clique AllReduce UnitTest ( #595 )
2022-08-04 18:30:00 -06:00
akolliasAMD
686dbc8bc6
updated alltoallV test to reflect how send counts are done in perf tests ( #586 )
2022-07-21 14:59:34 -06:00
akolliasAMD
8b9291eb47
moved default number of max ranks per gpu to 1
2022-06-22 17:37:49 +00:00
Edgar
a87d61db2b
extending the unit-tests for multi-rank support
2022-06-10 14:23:19 +00:00
gilbertlee-amd
700b473211
Moving opt-in custom signal handler from UnitTests into RCCL ( #550 )
...
* Enable via RCCL_ENABLE_SIGNALHANDLER=1
2022-05-20 09:56:38 -06:00
Edgar
2bf6d254b6
add a signal handler and backtrace
...
Tweak the signal handler and force non-release build
Increase ulimit locked memory value
Update the signal handler to use bfd symbol resolution.
Include configure logic to find bfd functions.
Optionally add C++ function name demangling
2022-04-25 10:48:17 -04:00
Liam Wrubleski
a8f1e61f48
Packages for test and benchmark executables on all supported OSes using CPack. ( #512 )
2022-03-21 15:04:14 -06:00
akolliasAMD
65ea3d80db
Added alltoallv test and optional args variable on collective args ( #514 )
...
* Added alltoallv test and optional args variable on collective args
2022-03-18 13:55:11 -04:00
Nirmal Unnikrishnan
676a4737c1
File reorganization as per the new defined standard
...
The header files will be in /opt/rocm-xxx/include/rccl
Libraries and cmake will be in /opt/rocm-xxx/lib folder.
Added wrappers for header files using rocm-cmake functions for backward compatibility.
2022-03-08 17:32:02 +00:00
gilbertlee-amd
0687940b84
Changing initialization method for UnitTests ( #510 )
2022-03-07 09:22:55 -07:00
gilbertlee-amd
699dc30f05
[UnitTests] Check process mask for custom tests ( #507 )
2022-03-02 17:24:14 -07:00
akolliasAMD
ff54e79799
Added Unit test for nccl send recv ( #506 )
...
Added Send Receive test that tests through all pairs
2022-03-02 15:50:16 -05:00
gilbertlee-amd
29ad0f5fbe
Unit test refactor ( #500 )
...
Refactoring and consolidating single-process / multi-process unit testing
2022-02-25 08:59:07 -07:00
Wenkai Du
6268b87c16
Unit tests: fix number of GPU detection ( #484 )
2022-01-05 15:06:12 -08:00
Stanley Tsang
bbbb35ceec
Fixing setting of GPUs to 2 when 1 or less GPUs on system for unit tests ( #481 )
2021-12-09 11:04:31 -07:00
Wenkai Du
03a830293c
gtest: dynamically generate tests based on test machine's GPU count ( #467 )
...
* gtest: dynamically generate tests based on test machine's GPU count
* Adjust test element size and bfloat16 threshold for up to 16 GPUs
2021-11-16 10:28:26 -08:00
Stanley Tsang
a6dba6b9dd
Remove hardcoded references to /opt/rocm when using chrpath ( #469 )
...
* Fixing cmake_install_prefix search to include /opt/rocm-xxxx
* Removing all hard references to /opt/rocm with ROCM_PATH
2021-11-15 15:00:55 -07:00
Wenkai Du
bc2932be4e
Unit Test: use range from 0 to 1 for floating point test data ( #459 )
...
* Unit Test: use range from 0 to 1 for floating point test data
* gtest: Update init data and bfloat16 threshold
2021-11-08 09:21:09 -08:00
Stanley Tsang
2f87073514
Fixing cmake_install_prefix search to include /opt/rocm-xxxx ( #462 )
2021-11-06 07:58:26 -07:00
Stanley Tsang
7e55b211c5
Build AllReduce only mode ( #443 )
...
* Initial commit of all_reduce_only support
* Working AllReduce only build
* Removing printfs and restoring release build
* Restore P2P index
* Updates to build_allreduce_only mode.
* cleaning up macro ifdefs
2021-10-26 17:36:46 -06:00
Stanley Tsang
d23dfc12c1
Re-enable use of chrpath to manually set rpath for unit tests. ( #448 )
...
* Re-enable use of chrpath to manually set rpath for unit tests.
* Add check for chrpath
2021-10-26 11:10:04 -06:00
Wenkai Du
1faff323b4
Unit Test: support ncclAvg
2021-08-25 14:15:54 -07:00
Wenkai Du
215904ee8e
Fix unit tests static build ( #403 )
2021-07-09 09:35:32 -07:00
Eiden Yoshida
5c3e7d8b67
Fix static builds ( #393 )
2021-06-23 09:19:48 -06:00
Wenkai Du
59d2867b01
Remove hard coded /opt/rocm from cmake ( #396 )
2021-06-21 08:29:23 -07:00
Wenkai Du
6dcae8a459
Select sendrecv path based on collective data size ( #391 )
...
* Select sendrecv path based on collective data size
* Add comments on packing and unpacking group field
* Toggling RCCL_P2P_NET_DISABLE in combined calls unit tests
2021-06-10 17:51:04 -07:00
Stanley Tsang
f6f5e16fe6
Fixing bug with ExtractSubDataset function not fully initializing subdataset ( #390 )
2021-06-10 14:35:39 -06:00
Stanley Tsang
256403d4f0
Adding support for hipMallocManaged() in unit tests ( #375 )
...
* Adding HMM support for unit tests
* Fixing HMM opt-in check
2021-05-25 17:07:12 -06:00
gilbertlee-amd
2daadcc834
Disabling env var caching for all unit tests ( #371 )
...
* Disabling env var caching for all unit tests
2021-05-18 12:56:30 -06:00
Stanley Tsang
0b2bfdd6d8
Multiprocess unit test various fixes ( #367 )
...
* Re-enabling mp unit tests
* Fixing shared memory leak and other bugs related to shared mem for MP unit tests
* Revert 43bfbfc97bf9edbae1f386d461439091618ff8ed
* Further tightening up unlinks
* Moving test check macros to separate header file
* Tightening up shared memory unlinking for clique kernels, add munmap for host barrier for MP unit tests
* Updating new MP unit test
* Fixing mqueue bug
* Fixing memory leak in MP unit tests
2021-05-14 09:38:49 -06:00
gilbertlee-amd
e796b1645c
Clique tuning upgrade ( #352 ) ( #19 )
...
* Enabling clique for any XGMI-connected topology, adding tuning
* Updating CHANGELOG for clique tuning
* Re-working clique barrier system to work on multi-process / multi-gpu
2021-05-11 08:44:59 -06:00
Wenkai Du
a4ea1fed5b
Merge remote-tracking branch 'nccl/master' into develop
2021-05-05 16:01:01 -07:00
Wenkai Du
8e180cf087
Revert "Port alltoall[v]" ( #325 )
...
This reverts commit f4d5d3d620.
2021-03-06 13:59:31 -08:00
Gilbert Lee
f1a9ce3fa5
Using GTEST_SKIP() to skip unit tests that have insufficient devices. Skipping out earlier
2021-02-09 03:54:04 +00:00
Stanley Tsang
d00b7d17bd
Update MP UT to support arbitrary # of GPUs; multiple bugfixes ( #16 )
...
* Fixing temp file creation/deletion for Clique kernel mode.
* Refactoring of MP unit tests; include bugfixes and general support for any number of GPUs
* GroupCall MP UT properly quits when too many devices specified
* MP UT will programmatically set NCCL_COMM_ID if not specified; updated install script
2021-02-05 16:49:25 -08:00
Wenkai Du
ab1e7a0318
Merge remote-tracking branch 'origin/develop' into 2.8.3
2021-02-04 20:02:34 -05:00
Gilbert Lee
01a998b17c
Removing in-place tests from Combined calls (no support for send/recv)
2021-01-28 20:09:03 +00:00
gilbertlee-amd
3e62ceddc5
Clique kernel support ( #295 ) ( #15 )
...
* Adding experimental clique-based kernels (opt-in only)
Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
Co-authored-by: Gilbert Lee <gilbert.lee@amd.com>
Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>
Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>
2021-01-28 09:45:01 -07:00
Stanley Tsang
d3fa257682
Adding multiprocess unit tests ( #312 )
...
Adding multiprocess unit tests for collectives.
To run, NCCL_COMM_ID=$HOSTNAME:12345 build/release/test/UnitTestsMultiProcess
2021-01-15 16:34:36 -07:00
Wenkai Du
b33a2cac8b
gtest: add scatter to combined calls and use loops ( #303 )
...
* gtest: add scatter to combined calls and use loops
* gtest: run validation inside loop
* gtest: revert small element count to 2520
* gtest: fix memory leak in validation
(cherry picked from commit b0853ccd51)
* Fix combined call UT
* Fix memory leak
* Fix alltoallv test
2021-01-14 19:28:01 -05:00