Commit Graph

60 Commits

Author SHA1 Message Date
Wenkai Du 6268b87c16 Unit tests: fix number of GPU detection (#484) 2022-01-05 15:06:12 -08:00
Stanley Tsang bbbb35ceec Fixing setting of GPUs to 2 when 1 or less GPUs on system for unit tests (#481) 2021-12-09 11:04:31 -07:00
Wenkai Du 03a830293c gtest: dynamically generate tests based on test machine's GPU count (#467)
* gtest: dynamically generate tests based on test machine's GPU count

* Adjust test element size and bfloat16 threshold for up to 16 GPUs
2021-11-16 10:28:26 -08:00
Stanley Tsang a6dba6b9dd Remove hardcoded references to /opt/rocm when using chrpath (#469)
* Fixing cmake_install_prefix search to include /opt/rocm-xxxx

* Removing all hard references to /opt/rocm with ROCM_PATH
2021-11-15 15:00:55 -07:00
Wenkai Du bc2932be4e Unit Test: use range from 0 to 1 for floating point test data (#459)
* Unit Test: use range from 0 to 1 for floating point test data

* gtest: Update init data and bfloat16 threshold
2021-11-08 09:21:09 -08:00
Stanley Tsang 2f87073514 Fixing cmake_install_prefix search to include /opt/rocm-xxxx (#462) 2021-11-06 07:58:26 -07:00
Stanley Tsang 7e55b211c5 Build AllReduce only mode (#443)
* Initial commit of all_reduce_only support

* Working AllReduce only build

* Removing printfs and restoring release build

* Restore P2P index

* Updates to build_allreduce_only mode.

* cleaning up macro ifdefs
2021-10-26 17:36:46 -06:00
Stanley Tsang d23dfc12c1 Re-enable use of chrpath to manually set rpath for unit tests. (#448)
* Re-enable use of chrpath to manually set rpath for unit tests.

* Add check for chrpath
2021-10-26 11:10:04 -06:00
Wenkai Du 1faff323b4 Unit Test: support ncclAvg 2021-08-25 14:15:54 -07:00
Wenkai Du 215904ee8e Fix unit tests static build (#403) 2021-07-09 09:35:32 -07:00
Eiden Yoshida 5c3e7d8b67 Fix static builds (#393) 2021-06-23 09:19:48 -06:00
Wenkai Du 59d2867b01 Remove hard coded /opt/rocm from cmake (#396) 2021-06-21 08:29:23 -07:00
Wenkai Du 6dcae8a459 Select sendrecv path based on collective data size (#391)
* Select sendrecv path based on collective data size

* Add comments on packing and unpacking group field

* Toggling RCCL_P2P_NET_DISABLE in combined calls unit tests
2021-06-10 17:51:04 -07:00
Stanley Tsang f6f5e16fe6 Fixing bug with ExtractSubDataset function not fully initializing subdataset (#390) 2021-06-10 14:35:39 -06:00
Stanley Tsang 256403d4f0 Adding support for hipMallocManaged() in unit tests (#375)
* Adding HMM support for unit tests

* Fixing HMM opt-in check
2021-05-25 17:07:12 -06:00
gilbertlee-amd 2daadcc834 Disabling env var caching for all unit tests (#371)
* Disabling env var caching for all unit tests
2021-05-18 12:56:30 -06:00
Stanley Tsang 0b2bfdd6d8 Multiprocess unit test various fixes (#367)
* Re-enabling mp unit tests

* Fixing shared memory leak and other bugs related to shared mem for MP unit tests

* Revert 43bfbfc97bf9edbae1f386d461439091618ff8ed

* Further tightening up unlinks

* Moving test check macros to separate header file

* Tightening up shared memory unlinking for clique kernels, add munmap for host barrier for MP unit tests

* Updating new MP unit test

* Fixing mqueue bug

* Fixing memory leak in MP unit tests
2021-05-14 09:38:49 -06:00
gilbertlee-amd e796b1645c Clique tuning upgrade (#352) (#19)
* Enabling clique for any XGMI-connected topology, adding tuning
* Updating CHANGELOG for clique tuning
* Re-working clique barrier system to work on multi-process / multi-gpu
2021-05-11 08:44:59 -06:00
Wenkai Du a4ea1fed5b Merge remote-tracking branch 'nccl/master' into develop 2021-05-05 16:01:01 -07:00
Wenkai Du 8e180cf087 Revert "Port alltoall[v]" (#325)
This reverts commit f4d5d3d620.
2021-03-06 13:59:31 -08:00
Gilbert Lee f1a9ce3fa5 Using GTEST_SKIP() to skip unit tests that have insufficient devices. Skipping out earlier 2021-02-09 03:54:04 +00:00
Stanley Tsang d00b7d17bd Update MP UT to support arbitrary # of GPUs; multiple bugfixes (#16)
* Fixing temp file creation/deletion for Clique kernel mode.

* Refactoring of MP unit tests; include bugfixes and general support for any number of GPUs

* GroupCall MP UT properly quits when too many devices specified

* MP UT will programmatically set NCCL_COMM_ID if not specified; updated install script
2021-02-05 16:49:25 -08:00
Wenkai Du ab1e7a0318 Merge remote-tracking branch 'origin/develop' into 2.8.3 2021-02-04 20:02:34 -05:00
Gilbert Lee 01a998b17c Removing in-place tests from Combined calls (no support for send/recv) 2021-01-28 20:09:03 +00:00
gilbertlee-amd 3e62ceddc5 Clique kernel support (#295) (#15)
* Adding experimental clique-based kernels (opt-in only)

Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
Co-authored-by: Gilbert Lee <gilbert.lee@amd.com>
Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>

Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>
2021-01-28 09:45:01 -07:00
Stanley Tsang d3fa257682 Adding multiprocess unit tests (#312)
Adding multiprocess unit tests for collectives.  

To run, NCCL_COMM_ID=$HOSTNAME:12345 build/release/test/UnitTestsMultiProcess
2021-01-15 16:34:36 -07:00
Wenkai Du b33a2cac8b gtest: add scatter to combined calls and use loops (#303)
* gtest: add scatter to combined calls and use loops

* gtest: run validation inside loop

* gtest: revert small element count to 2520

* gtest: fix memory leak in validation

(cherry picked from commit b0853ccd51)

* Fix combined call UT

* Fix memory leak

* Fix alltoallv test
2021-01-14 19:28:01 -05:00
Wenkai Du b0853ccd51 gtest: add scatter to combined calls and use loops (#303)
* gtest: add scatter to combined calls and use loops

* gtest: run validation inside loop

* gtest: revert small element count to 2520

* gtest: fix memory leak in validation
2020-11-13 17:57:44 -08:00
gilbertlee-amd 41bcfb8878 Clique kernel support (#295)
* Adding experimental clique-based kernels (opt-in only)

Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
Co-authored-by: Gilbert Lee <gilbert.lee@amd.com>
Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>
2020-11-10 15:44:10 -07:00
Wenkai Du b871ea3c0c Add Alltoallv RCCL kernel implementation (#269)
* Add alltoallv API and implementation

* Extend Rome P2P channel limit to multinode and alltoall kernels

* topo_expl: fix compilation and sync up with main

* gtest: use RCCL alltoallv API

* Code review changes
2020-09-30 16:25:36 -07:00
Wenkai Du 60819dcf8d Merge pull request #262 from wenkaidu/alignment
Make data alignment requirements match the ISA manual
2020-09-08 10:40:42 -07:00
Aaron Enye Shi 958b213428 Add RCCL Static Lib Creation with -fgpu-rdc
RCCL uses -fgpu-rdc to compile its source objects. When linking
the RCCL static library, the link and archive step must go through
hipcc and use the flag --emit-static-lib. When compiling
UnitTests, the librccl.a must be consumed through -l and -L.
2020-09-03 11:25:41 -04:00
Wenkai Du b163a8898f gtest: add alltoallv test 2020-09-02 21:28:32 +00:00
Wenkai Du 7e3f841fab Merge remote-tracking branch 'nccl/master' into 2.7.8 2020-08-10 16:11:00 +00:00
saadrahim 0dc019e35f Download GTest if not found in system (#237)
Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
2020-08-06 09:36:58 -06:00
Stanley Tsang 684f3e6af4 Adding better naming to unit tests for filtering; adding short and full unit test suites (#235) 2020-07-21 12:19:47 -06:00
saadrahim 99a491273f Changing GTest inclusion in cmake to use find_package (#234)
* GTest is used via find_package. No longer downloaded in cmake.

* Adding error handling
2020-07-15 20:51:48 -06:00
gilbertlee-amd f87ba17737 Removing UnitTest as install, removing unused env var (#231) 2020-07-10 09:30:28 -06:00
Wenkai Du 8db0aa8f4c gtest: extend testing up to 8 GPUs 2020-06-29 09:32:31 -07:00
Wenkai Du fee1a20b74 gtest: add scatter, gather and all to all unit tests 2020-06-09 17:44:15 -07:00
Stanley Tsang 20fa04d9b6 Updating copyright notices for 2020. 2020-01-29 15:28:08 -08:00
Wenkai Du fe6d012eb0 Merge remote-tracking branch 'remotes/rccl/master' into rccl_2.5.6_cleanup 2020-01-29 15:28:03 -08:00
Wenkai Du 1e55645d97 Misc fixes and improvements for 2.5.6
1. Fix RCCL unit test
2. Add ROME detection and tuning
3. Change default P2P level
4. Fix search algorithm for XGMI
5. Remove explicit channel duplication with implicit by using half of link speed
6. Add collective trace support
7. Correct Intel Skylake CPU detection and bandwidth
8. Fix topo connect function
9. Disable GDR read and remove unreachable code
10. Disable LL128 kernels
11. Add tuning parameters
12. Use original clock64() implementation which returns RTC counter value
13. Print out timestamp of collective trace
14. Do not use struct ncclColl in kernel launch parameter
15. Fix abort handling and add tracing
17. Add __launch_bounds__ to kernel functions
18. Remove unused abortCount
19. Unset default MIN_NRINGS and MIN_NCHANNELS
20. Do not allocate shared memory when not using LL128 kernels
21. Correct time print out in tuning log
2020-01-29 15:27:05 -08:00
gilbertlee-amd 000bce6f27 Removing OpenMP from unit tests (#163) 2019-12-20 11:41:56 -07:00
Wenkai Du 4ca05c1297 Support bfloat16 on rest of the unit tests 2019-11-18 14:18:34 -08:00
Wenkai Du bdac0256a5 Add bfloat16 all reduce unit test 2019-11-18 13:50:29 -08:00
Akila Premachandra f48ae5c98d Added hip-clang options to install script, and openmp/pthread options to CMakeLists.txt 2019-08-23 22:02:42 +00:00
Wenkai Du f11c8f60cd RCCL 2.4 update 2019-08-14 10:42:35 -07:00
Sylvain Jeaugey f93fe9bfd9 2.3.5-5
Add support for inter-node communication using sockets and InfiniBand/RoCE.
Improve latency.
Add support for aggregation.
Improve LL/regular tuning.
Remove tests as those are now at github.com/nvidia/nccl-tests .
2018-09-25 14:12:01 -07:00
sclarkson 680a35c6b7 fix tests on maxwell 2017-11-11 19:22:06 -08:00