Wenkai Du
cd17cf6dce
Update Rome model matching and add new models ( #516 )
...
* Update Rome model matching and add new models
* Add missing file
* Models update
2022-03-21 10:54:40 -07:00
akolliasAMD
65ea3d80db
Added alltoallv test and optional args variable on collective args ( #514 )
...
* Added alltoallv test and optional args variable on collective args
2022-03-18 13:55:11 -04:00
nunnikri
a04da71647
Merge pull request #511 from nunnikri/develop
...
File reorganization as per the new defined standard
2022-03-10 08:39:29 -08:00
Nirmal Unnikrishnan
115461cc04
File reorganization with backward compatibility
...
Updated the header file location and export path
2022-03-10 01:28:41 +00:00
Nirmal Unnikrishnan
676a4737c1
File reorganization as per the new defined standard
...
The header files will be in /opt/rocm-xxx/include/rccl.
Libraries and CMake files will be in the /opt/rocm-xxx/lib folder.
Added wrappers for header files using rocm-cmake functions for backward compatibility.
2022-03-08 17:32:02 +00:00
gilbertlee-amd
0687940b84
Changing initialization method for UnitTests ( #510 )
2022-03-07 09:22:55 -07:00
Wenkai Du
d6d6af710e
Force ring algorithm on single node ( #509 )
2022-03-04 10:29:02 -08:00
gilbertlee-amd
b634b2f1c2
Adding NCCL_DEBUG=INFO for CI runs ( #508 )
2022-03-03 18:04:28 -07:00
gilbertlee-amd
699dc30f05
[UnitTests] Check process mask for custom tests ( #507 )
2022-03-02 17:24:14 -07:00
akolliasAMD
ff54e79799
Added Unit test for nccl send recv ( #506 )
...
Added Send Receive test that tests through all pairs
2022-03-02 15:50:16 -05:00
gilbertlee-amd
29ad0f5fbe
Unit test refactor ( #500 )
...
Refactoring and consolidating single-process / multi-process unit testing
2022-02-25 08:59:07 -07:00
Ziyue Yang
b569c0a1db
Add Pivot AllToAll algorithm for Rome model ( #503 )
...
* add a2a pivot interface
* remove debug info
* address comments
* fix bug
* remove custom script
* address comments
* fix bug
2022-02-20 21:09:47 -08:00
Wenkai Du
94e0dc8bfd
Allow additional options to be passed in through model's definition ( #501 )
2022-02-17 08:28:58 -08:00
Wenkai Du
02096c9936
Add another Rome model ( #497 )
2022-02-12 10:30:16 -08:00
gilbertlee-amd
f3c2cafd9d
[TransferBench] Fix for cases with subsets of configured numa nodes ( #495 )
2022-02-07 12:16:19 -07:00
gilbertlee-amd
84d5fce7dd
TransferBench: Adding ability to reindex GPUs based on PCIe address ( #494 )
2022-02-02 08:51:41 -07:00
Wenkai Du
400df49dbe
Generate proper b-tree with non-repeating channels ( #493 )
2022-01-19 15:09:17 -08:00
Wenkai Du
e94720fe3b
Merge pull request #492 from wenkaidu/develop
...
Sync up with NCCL
2022-01-18 12:55:05 -08:00
Stanley Tsang
b002ef1aaf
Search for GTest 1.11; fix detection of GTest after install ( #476 )
2022-01-17 15:18:01 -07:00
Wenkai Du
973d0111db
Merge remote-tracking branch 'nccl/master' into develop
2022-01-17 13:34:36 -08:00
Roopa Malavally
336ff4fe5f
Add files via upload
2022-01-16 22:19:54 -08:00
Roopa Malavally
f467fb57e8
Delete classification-maptest.xml
2022-01-16 22:19:33 -08:00
Roopa Malavally
62ef66b656
Add files via upload
2022-01-16 08:12:03 -08:00
Roopa Malavally
39cf27222c
Add files via upload
2022-01-16 08:11:40 -08:00
Wenkai Du
598c6fdded
Update Rome models ( #491 )
2022-01-14 10:03:30 -08:00
Wenkai Du
369c021992
topo_expl: update for 2.11.4 ( #490 )
...
* topo_expl: update for 2.11.4
* topo_expl: revert a few logging changes
2022-01-13 13:33:07 -08:00
Wenkai Du
6ff7690cb5
Use noinline and a few other fixes ( #489 )
...
* Use noinline and a few other fixes
* Tune collectives
2022-01-11 16:51:06 -08:00
gilbertlee-amd
7add135529
Updating CHANGELOG.md ( #488 )
2022-01-10 11:34:31 -07:00
Wenkai Du
3669e12432
Use hipGraph instead of cudaGraph ( #487 )
2022-01-10 08:26:01 -08:00
Wenkai Du
565fbeb5e9
Tune collectives for 2.11.4 ( #486 )
2022-01-10 08:25:47 -08:00
gilbertlee-amd
2530a2f084
[TransferBench] Updating for 2.11.4. Decoupling from RCCL kernel ( #485 )
2022-01-05 16:33:25 -07:00
Wenkai Du
6268b87c16
Unit tests: fix number of GPU detection ( #484 )
2022-01-05 15:06:12 -08:00
Wenkai Du
4234a638b5
Merge pull request #482 from ROCmSoftwarePlatform/2.11.4
...
Sync up with 2.11.4
2022-01-05 09:31:51 -08:00
Wenkai Du
f8d0775a6f
Add another Rome model ( #483 )
2022-01-05 09:26:31 -08:00
Wenkai Du
434ecb0e1f
Merge remote-tracking branch 'origin/develop' into 2.11.4
2022-01-03 09:54:16 -08:00
Stanley Tsang
bbbb35ceec
Fixing setting of GPUs to 2 when 1 or fewer GPUs on system for unit tests ( #481 )
2021-12-09 11:04:31 -07:00
Chang Lan
c5790b3672
Build fastsocket plugin from ext-net
2021-12-09 08:41:05 +01:00
Ke Wen
c88c9f873f
Add env NCCL_NET_DISABLE_INTRA
...
Disable NET transport for intra-node communication by setting the env to 1.
It provides an option to error out instead of falling back to NET when superior intra-node transports (P2P and SHM) are unavailable.
2021-12-08 16:28:19 +01:00
Wenkai Du
a94b953bcc
Update Rome model ( #479 )
2021-12-03 08:24:51 -08:00
Wenkai Du
8a08a2f579
Update tuning parameters ( #478 )
2021-11-30 08:51:11 -08:00
gilbertlee-amd
1157c2edfe
[TransferBench] Adding more preset benchmarks to filter read mode, cpu vs gpu pairs ( #477 )
2021-11-24 18:05:37 -07:00
gilbertlee-amd
539de1216f
Minor cppcheck fixes, adding suppression file ( #475 )
...
* Minor cppcheck fixes, adding suppression file
2021-11-24 10:23:59 -07:00
Wenkai Du
e9bf01fb7e
Determine fine grained memory availability at RCCL bootstrapping ( #471 )
2021-11-19 08:12:53 -08:00
Chris Jones
8cf7325d69
Perform busIdToInt64 on the stack.
...
I noticed when I enabled `NCCL_DEBUG_SUBSYS=ALLOC` that this function is
called thousands of times, making the log output unintelligible.
Fortunately, this function can be implemented without heap allocations.
2021-11-19 09:35:55 +01:00
Stanley Tsang
7b8b54955b
Set ROCM_PATH CMake variable in install script ( #470 )
...
* Fixing cmake_install_prefix search to include /opt/rocm-xxxx
* Replacing all hard-coded references to /opt/rocm with ROCM_PATH
* Setting ROCM_PATH CMake variable in install script
2021-11-18 14:44:19 -07:00
Wenkai Du
03a830293c
gtest: dynamically generate tests based on test machine's GPU count ( #467 )
...
* gtest: dynamically generate tests based on test machine's GPU count
* Adjust test element size and bfloat16 threshold for up to 16 GPUs
2021-11-16 10:28:26 -08:00
Stanley Tsang
a6dba6b9dd
Remove hardcoded references to /opt/rocm when using chrpath ( #469 )
...
* Fixing cmake_install_prefix search to include /opt/rocm-xxxx
* Replacing all hard-coded references to /opt/rocm with ROCM_PATH
2021-11-15 15:00:55 -07:00
Wenkai Du
3a919c1f49
Merge remote-tracking branch 'nccl/master' into develop
2021-11-11 14:22:12 -08:00
Wenkai Du
e05de8fd26
Remove extra work element copy ( #465 )
2021-11-09 13:52:03 -08:00
gilbertlee-amd
1c7ef1b790
[TransferBench] Adding #CUs / RRLW mode to p2p benchmark ( #464 )
2021-11-08 14:36:04 -07:00