83 commits

Author SHA1 Message Date
PedramAlizadeh 45872d170f Changed the name of UnitTests to rccl-UnitTests (wrapper executable included). 2022-12-13 21:45:57 +00:00
Pedram Alizadeh 8250092367 UnitTest: add test cases for 2.14 API (ncclCommInitRankConfig and ncclCommFinalize for non-blocking communicator) (#662) 2022-12-13 16:05:09 -05:00
Ziyue Yang adafc0f759 Add MSCCL Support (#658)
* Add MSCCL support

* Add alignment and message size checking

* Fix nRanks checking, in-place and out-of-place tests and group call handling

* Fix hipGraph unit test

* Change MSCCL init warning to INFO

* Revise license info
2022-12-12 15:51:04 -08:00
gilbertlee-amd faed69f9fc Graph unit tests (#656)
* Adding hipGraph unit tests
2022-12-01 10:28:42 -07:00
Ranjith Ramakrishnan b397cb16ea Correct hsa header path for new directory layout 2022-11-04 09:52:16 -07:00
raramakr b32f38126d Merge pull request #635 from raramakr/swdev
Correct include and library path for new directory layout
2022-10-14 15:48:44 -07:00
gilbertlee-amd ebb8b5bf63 Updating files for missing licenses (#637) 2022-10-14 13:49:16 -06:00
Ranjith Ramakrishnan cf4e963aaf Correct include and library path for new directory layout
Use actual header files and libraries, rather than using wrapper header files and library softlinks
2022-10-14 01:32:04 -07:00
akolliasAMD 06bce9d0c9 added stream synch after hipMemset (#609) 2022-08-30 16:18:37 -06:00
Edgar Gabriel f6e00dec13 introduce support for ncclFloat16/half in UT 2022-08-24 15:28:24 +00:00
gilbertlee-amd dae11c2aca Disable clique AllReduce UnitTest (#595) 2022-08-04 18:30:00 -06:00
akolliasAMD 686dbc8bc6 updated alltoallV test to reflect how send counts are done in perf tests (#586) 2022-07-21 14:59:34 -06:00
akolliasAMD 8b9291eb47 moved default number of max ranks per gpu to 1 2022-06-22 17:37:49 +00:00
Edgar a87d61db2b extending the unit-tests for multi-rank support 2022-06-10 14:23:19 +00:00
gilbertlee-amd 700b473211 Moving opt-in custom signal handler from UnitTests into RCCL (#550)
* Enable via RCCL_ENABLE_SIGNALHANDLER=1
2022-05-20 09:56:38 -06:00
Edgar 2bf6d254b6 add a signal handler and backtrace
Tweak the signal handler and force non-release build
Increase ulimit locked memory value
Update the signal handler to use bfd symbol resolution.
Include configure logic to find bfd functions.
Optionally add C++ function name demangling
2022-04-25 10:48:17 -04:00
Liam Wrubleski a8f1e61f48 Packages for test and benchmark executables on all supported OSes using CPack. (#512) 2022-03-21 15:04:14 -06:00
akolliasAMD 65ea3d80db Added alltoallv test and optional args variable on collective args (#514)
* Added alltoallv test and optional args variable on collective args
2022-03-18 13:55:11 -04:00
Nirmal Unnikrishnan 676a4737c1 File reorganization as per the new defined standard
The header files will be in /opt/rocm-xxx/include/rccl
Libraries and cmake will be in /opt/rocm-xxx/lib folder.
Added wrappers for header files using rocm-cmake functions for backward compatibility.
2022-03-08 17:32:02 +00:00
gilbertlee-amd 0687940b84 Changing initialization method for UnitTests (#510) 2022-03-07 09:22:55 -07:00
gilbertlee-amd 699dc30f05 [UnitTests] Check process mask for custom tests (#507) 2022-03-02 17:24:14 -07:00
akolliasAMD ff54e79799 Added Unit test for nccl send recv (#506)
Added Send Receive test that tests through all pairs
2022-03-02 15:50:16 -05:00
gilbertlee-amd 29ad0f5fbe Unit test refactor (#500)
Refactoring and consolidating single-process / multi-process unit testing
2022-02-25 08:59:07 -07:00
Wenkai Du 6268b87c16 Unit tests: fix number of GPU detection (#484) 2022-01-05 15:06:12 -08:00
Stanley Tsang bbbb35ceec Fixing setting of GPUs to 2 when 1 or fewer GPUs on system for unit tests (#481) 2021-12-09 11:04:31 -07:00
Wenkai Du 03a830293c gtest: dynamically generate tests based on test machine's GPU count (#467)
* gtest: dynamically generate tests based on test machine's GPU count

* Adjust test element size and bfloat16 threshold for up to 16 GPUs
2021-11-16 10:28:26 -08:00
Stanley Tsang a6dba6b9dd Remove hardcoded references to /opt/rocm when using chrpath (#469)
* Fixing cmake_install_prefix search to include /opt/rocm-xxxx

* Removing all hard references to /opt/rocm with ROCM_PATH
2021-11-15 15:00:55 -07:00
Wenkai Du bc2932be4e Unit Test: use range from 0 to 1 for floating point test data (#459)
* Unit Test: use range from 0 to 1 for floating point test data

* gtest: Update init data and bfloat16 threshold
2021-11-08 09:21:09 -08:00
Stanley Tsang 2f87073514 Fixing cmake_install_prefix search to include /opt/rocm-xxxx (#462) 2021-11-06 07:58:26 -07:00
Stanley Tsang 7e55b211c5 Build AllReduce only mode (#443)
* Initial commit of all_reduce_only support

* Working AllReduce only build

* Removing printfs and restoring release build

* Restore P2P index

* Updates to build_allreduce_only mode.

* cleaning up macro ifdefs
2021-10-26 17:36:46 -06:00
Stanley Tsang d23dfc12c1 Re-enable use of chrpath to manually set rpath for unit tests. (#448)
* Re-enable use of chrpath to manually set rpath for unit tests.

* Add check for chrpath
2021-10-26 11:10:04 -06:00
Wenkai Du 1faff323b4 Unit Test: support ncclAvg 2021-08-25 14:15:54 -07:00
Wenkai Du 215904ee8e Fix unit tests static build (#403) 2021-07-09 09:35:32 -07:00
Eiden Yoshida 5c3e7d8b67 Fix static builds (#393) 2021-06-23 09:19:48 -06:00
Wenkai Du 59d2867b01 Remove hard coded /opt/rocm from cmake (#396) 2021-06-21 08:29:23 -07:00
Wenkai Du 6dcae8a459 Select sendrecv path based on collective data size (#391)
* Select sendrecv path based on collective data size

* Add comments on packing and unpacking group field

* Toggling RCCL_P2P_NET_DISABLE in combined calls unit tests
2021-06-10 17:51:04 -07:00
Stanley Tsang f6f5e16fe6 Fixing bug with ExtractSubDataset function not fully initializing subdataset (#390) 2021-06-10 14:35:39 -06:00
Stanley Tsang 256403d4f0 Adding support for hipMallocManaged() in unit tests (#375)
* Adding HMM support for unit tests

* Fixing HMM opt-in check
2021-05-25 17:07:12 -06:00
gilbertlee-amd 2daadcc834 Disabling env var caching for all unit tests (#371)
* Disabling env var caching for all unit tests
2021-05-18 12:56:30 -06:00
Stanley Tsang 0b2bfdd6d8 Multiprocess unit test various fixes (#367)
* Re-enabling mp unit tests

* Fixing shared memory leak and other bugs related to shared mem for MP unit tests

* Revert 43bfbfc97bf9edbae1f386d461439091618ff8ed

* Further tightening up unlinks

* Moving test check macros to separate header file

* Tightening up shared memory unlinking for clique kernels, add munmap for host barrier for MP unit tests

* Updating new MP unit test

* Fixing mqueue bug

* Fixing memory leak in MP unit tests
2021-05-14 09:38:49 -06:00
gilbertlee-amd e796b1645c Clique tuning upgrade (#352) (#19)
* Enabling clique for any XGMI-connected topology, adding tuning
* Updating CHANGELOG for clique tuning
* Re-working clique barrier system to work on multi-process / multi-gpu
2021-05-11 08:44:59 -06:00
Wenkai Du a4ea1fed5b Merge remote-tracking branch 'nccl/master' into develop 2021-05-05 16:01:01 -07:00
Wenkai Du 8e180cf087 Revert "Port alltoall[v]" (#325)
This reverts commit f4d5d3d620.
2021-03-06 13:59:31 -08:00
Gilbert Lee f1a9ce3fa5 Using GTEST_SKIP() to skip unit tests that have insufficient devices. Skipping out earlier 2021-02-09 03:54:04 +00:00
Stanley Tsang d00b7d17bd Update MP UT to support arbitrary # of GPUs; multiple bugfixes (#16)
* Fixing temp file creation/deletion for Clique kernel mode.

* Refactoring of MP unit tests; include bugfixes and general support for any number of GPUs

* GroupCall MP UT properly quits when too many devices specified

* MP UT will programmatically set NCCL_COMM_ID if not specified; updated install script
2021-02-05 16:49:25 -08:00
Wenkai Du ab1e7a0318 Merge remote-tracking branch 'origin/develop' into 2.8.3 2021-02-04 20:02:34 -05:00
Gilbert Lee 01a998b17c Removing in-place tests from Combined calls (no support for send/recv) 2021-01-28 20:09:03 +00:00
gilbertlee-amd 3e62ceddc5 Clique kernel support (#295) (#15)
* Adding experimental clique-based kernels (opt-in only)

Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
Co-authored-by: Gilbert Lee <gilbert.lee@amd.com>
Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>
2021-01-28 09:45:01 -07:00
Stanley Tsang d3fa257682 Adding multiprocess unit tests (#312)
Adding multiprocess unit tests for collectives.  

To run, NCCL_COMM_ID=$HOSTNAME:12345 build/release/test/UnitTestsMultiProcess
2021-01-15 16:34:36 -07:00
Wenkai Du b33a2cac8b gtest: add scatter to combined calls and use loops (#303)
* gtest: add scatter to combined calls and use loops

* gtest: run validation inside loop

* gtest: revert small element count to 2520

* gtest: fix memory leak in validation

(cherry picked from commit b0853ccd51)

* Fix combined call UT

* Fix memory leak

* Fix alltoallv test
2021-01-14 19:28:01 -05:00