Stanley Tsang
f152c8d160
Update MP UT to support arbitrary # of GPUs; multiple bugfixes (#16)
* Fixing temp file creation/deletion for Clique kernel mode.
* Refactoring of MP unit tests; include bugfixes and general support for any number of GPUs
* GroupCall MP UT properly quits when too many devices specified
* MP UT will programmatically set NCCL_COMM_ID if not specified; updated install script
[ROCm/rccl commit: d00b7d17bd]
2021-02-05 16:49:25 -08:00
Wenkai Du
fe8923ebba
Add gfx908 Rome 4 NICs model
[ROCm/rccl commit: 6dfdfef98f]
2021-02-06 00:19:47 +00:00
Gilbert Lee
b954d85935
[TransferBench] Fixing some merge issues
[ROCm/rccl commit: f372c53d52]
2021-02-05 16:46:20 +00:00
Wenkai Du
ae5779702a
Merge remote-tracking branch 'origin/develop' into 2.8.3
[ROCm/rccl commit: ab1e7a0318]
2021-02-04 20:02:34 -05:00
Gilbert Lee
1dfb88f554
[topo_expl] Updating for 2.8.3
[ROCm/rccl commit: 2f541508c5]
2021-02-04 19:08:42 +00:00
Gilbert Lee
f2d07cb9a6
[ib-test] Update for 2.8.3]
[ROCm/rccl commit: 9aac1ed38f]
2021-02-04 19:05:03 +00:00
Gilbert Lee
1643d05c75
[TransferBench] Updating for 2.8.3
[ROCm/rccl commit: 9ce203dd0a]
2021-02-04 18:58:25 +00:00
gilbertlee-amd
16d625ca27
Tuning some clique-based kernel parameters (#315)
[ROCm/rccl commit: 1990ffd76a]
2021-02-03 20:00:08 -07:00
Wenkai Du
57abf599b2
Enable GPU direct RDMA read from GPU
[ROCm/rccl commit: 5f97122442]
2021-02-03 02:48:30 +00:00
gilbertlee-amd
60c74f63fa
[TransferBench] Restore some previous fixes - memory leak, PCIe address (#314)
[ROCm/rccl commit: 62e0447e9a]
2021-02-01 09:48:09 -07:00
Gilbert Lee
6bf9b0d36a
Removing in-place tests from Combined calls (no support for send/recv)
[ROCm/rccl commit: 01a998b17c]
2021-01-28 20:09:03 +00:00
gilbertlee-amd
c981e76efe
Clique kernel support (#295) (#15)
* Adding experimental clique-based kernels (opt-in only)
Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
Co-authored-by: Gilbert Lee <gilbert.lee@amd.com>
Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>
[ROCm/rccl commit: 3e62ceddc5]
2021-01-28 09:45:01 -07:00
Wenkai Du
7f9c15b843
Use less unroll for clique kernels (#313)
[ROCm/rccl commit: 41e47a36e7]
2021-01-15 17:48:10 -08:00
Stanley Tsang
d7ed44eb9a
Adding multiprocess unit tests (#312)
Adding multiprocess unit tests for collectives.
To run, NCCL_COMM_ID=$HOSTNAME:12345 build/release/test/UnitTestsMultiProcess
[ROCm/rccl commit: d3fa257682]
2021-01-15 16:34:36 -07:00
Wenkai Du
d4382de267
Improve collective trace
[ROCm/rccl commit: 2ddbe6646b]
2021-01-14 19:28:01 -05:00
Wenkai Du
560224fe9f
gtest: add scatter to combined calls and use loops (#303)
* gtest: add scatter to combined calls and use loops
* gtest: run validation inside loop
* gtest: revert small element count to 2520
* gtest: fix memory leak in validation
(cherry picked from commit 36935cfbee)
* Fix combined call UT
* Fix memory leak
* Fix alltoallv test
[ROCm/rccl commit: b33a2cac8b]
2021-01-14 19:28:01 -05:00
Wenkai Du
2c49121171
Port alltoall[v]
[ROCm/rccl commit: f4d5d3d620]
2021-01-14 19:28:01 -05:00
Wenkai Du
41bead5a4e
Do not allow GPU as intermediate
[ROCm/rccl commit: 105db19a11]
2021-01-14 19:28:01 -05:00
Wenkai Du
34c6013299
Revert "Changes to topology based on XGMI (#272)"
This reverts commit 0a9adc16f4.
[ROCm/rccl commit: e055229e56]
2021-01-14 19:28:01 -05:00
Wenkai Du
adff98765c
Merge remote-tracking branch 'nccl/master' into no-target-id
[ROCm/rccl commit: d469947641]
2021-01-14 19:27:53 -05:00
Wenkai Du
4ea285c527
Fix Rome PCIe 2 node topology generation (#310)
[ROCm/rccl commit: 373a108516]
2020-12-15 17:16:17 -08:00
gilbertlee-amd
c570f09681
[TransferBench] Fixing bug with fine-grained memory allocation (#311)
* Fixing bug with fine-grained memory
[ROCm/rccl commit: 41c35dad48]
2020-12-15 17:37:31 -07:00
gilbertlee-amd
5155abb250
[TransferBench] Adding ability to perform CPU-executed copies, various upgrades (#309)
* Adding CPU based execution, fixing typos, adding Fine-grained mem
* Exposing sampling factor when generating range of data sizes
* Refactoring how Links are launched, now once per thread
* Documentation updates
[ROCm/rccl commit: ae0c4092c7]
2020-12-11 10:21:14 -07:00
gilbertlee-amd
9b48f92d72
[TransferBench] Support multiple of 4 byte sizes, changing default GPU timing mechanism (#307)
* Changing default timing mechanism, adjusting CPU bandwidth calc, adding flag to use combined timing
* Adding support for smaller transfers (byte size must be multiple of 4 instead of 128)
[ROCm/rccl commit: b80ae551b1]
2020-12-04 14:57:13 -07:00
Wenkai Du
9e83df4ad3
Adding backward compatibility for target-id syntax for AMDGPU_TARGETS (#306)
[ROCm/rccl commit: 882d52ad7e]
2020-12-04 13:55:56 -08:00
Wenkai Du
b68ff1ebba
Add Rome model and improve search (#305)
[ROCm/rccl commit: 975b14dffa]
2020-11-17 14:55:06 -08:00
Sylvain Jeaugey
a8908b34ee
2.8.3-1
Optimization for Tree allreduce on A100.
Improve aggregation performance.
Use shared buffers for inter-node send/recv.
Add NVTX profiling hooks.
Accelerate alltoall connections by merging communication for all
channels.
Add support for one hop communication through NVLink, for faster
send/recv communication on cubemesh topologies like DGX-1.
Improve alltoall scheduling to better balance intra/inter node
communication.
Increase send/recv parallelism by 8x, each warp sending or
receiving to a different peer.
Net: move to v4.
Net: make flush operation asynchronous to accelerate alltoall.
Net: define maximum number of requests.
Fix hang when using LL128 protocol after 2^31 steps.
Fix #379: topology injection failing when using less GPUs than
described in the XML.
Fix #394: protocol mismatch causing hangs or crashes when using
one GPU per node.
[ROCm/rccl commit: 920dbe5b35]
2020-11-17 11:08:52 -08:00
Wenkai Du
32fdfc93fc
Merge remote-tracking branch 'origin/master' into develop
[ROCm/rccl commit: 1943bac646]
2020-11-16 12:16:53 -05:00
Wenkai Du
f19cbc8e51
Use device's link width and speed if port doesn't report (#304)
[ROCm/rccl commit: 554729079d]
2020-11-13 17:58:04 -08:00
Wenkai Du
36935cfbee
gtest: add scatter to combined calls and use loops (#303)
* gtest: add scatter to combined calls and use loops
* gtest: run validation inside loop
* gtest: revert small element count to 2520
* gtest: fix memory leak in validation
[ROCm/rccl commit: b0853ccd51]
2020-11-13 17:57:44 -08:00
Stanley Tsang
f373cd2fdc
Fixing IPC handle leak (#302)
[ROCm/rccl commit: 2958f7eace]
2020-11-13 10:32:42 -07:00
gilbertlee-amd
f66d05193a
Adding RCCL_CLIQUE_DEBUG to help debug experimental clique feature (#300)
[ROCm/rccl commit: c8d08a7c2f]
2020-11-13 09:07:11 -07:00
Wenkai Du
62d21047b8
Skip unused peer connection in scatter and gather (#301)
[ROCm/rccl commit: 4e68229c8b]
2020-11-12 15:47:34 -08:00
Colin Smith
1349b382cd
Merge pull request #299 from ROCmSoftwarePlatform/develop
Enable target id build
[ROCm/rccl commit: 377b43470b]
2020-11-10 15:47:42 -07:00
gilbertlee-amd
a7ef699687
Clique kernel support (#295)
* Adding experimental clique-based kernels (opt-in only)
Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
Co-authored-by: Gilbert Lee <gilbert.lee@amd.com>
Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>
[ROCm/rccl commit: 41bcfb8878]
2020-11-10 15:44:10 -07:00
Wenkai Du
273974393e
Use target id of xnack off (#298)
[ROCm/rccl commit: 1fdb216f87]
2020-11-10 11:10:48 -08:00
Wenkai Du
a0109b2eae
Use ncclSend/ncclRecv for alltoall type of collectives as default (#297)
[ROCm/rccl commit: 2e8b3a0857]
2020-11-09 11:23:17 -08:00
gilbertlee-amd
1af4a5c9fa
Adding a CHANGELOG (#296)
[ROCm/rccl commit: bdd8adf1ca]
2020-11-05 13:38:30 -07:00
Wenkai Du
a4dd1a9548
Improve GPU direct RDMA handling on Rome (#294)
[ROCm/rccl commit: 709b7e4880]
2020-11-03 14:29:08 -08:00
Wenkai Du
c0c64d970a
Add more Rome models (#292)
[ROCm/rccl commit: dfa3c41ede]
2020-10-30 21:26:04 -07:00
gilbertlee-amd
2931959e6e
Adding output to CSV, removing OpenMP, decreasing default numBytes to 64MB, adding aggregate stats (#290)
[ROCm/rccl commit: bfab1d3592]
2020-10-27 09:00:33 -06:00
Wenkai Du
e7ea1a585e
Fix lintian errors (#287)
[ROCm/rccl commit: 2ecfc62ec8]
2020-10-21 16:20:53 -07:00
gilbertlee-amd
a062c80298
[TransferBench] Displaying PCIe Bus ID (#288)
* Adding PCIe BusID per GPU in topology display
[ROCm/rccl commit: 61e1a71d14]
2020-10-21 16:13:36 -06:00
gilbertlee-amd
0282595de5
TransferBench Typo. Pinned host memory uses C not P (#286)
[ROCm/rccl commit: 769418c5c7]
2020-10-21 12:05:38 -06:00
xietingwew
0277094dd2
fix proxyArgs for trace log
[ROCm/rccl commit: 084207e685]
2020-10-21 09:18:40 -07:00
saadrahim
5439649936
Adding sles15, centos7 and centos8 testing (#283)
[ROCm/rccl commit: e8177c9ee7]
2020-10-20 09:39:03 -06:00
Wenkai Du
1aae6b1344
Fix incorrect pointer checking for scatter and gather (#285)
[ROCm/rccl commit: dcad0ef7cb]
2020-10-19 13:27:09 -07:00
gilbertlee-amd
f4b9a0d8e5
Removing unnecessary flags from CI (#278)
* Removing unnecessary flags from CI
* Re-adding HSA_FORCE_FINE_GRAIN_PCIE in CI
[ROCm/rccl commit: 9b3f762b68]
2020-10-19 13:08:24 -06:00
saadrahim
0465dffe6f
Updating copyright for documentation (#282)
[ROCm/rccl commit: 49aa6d7afe]
2020-10-19 13:07:15 -06:00
Wenkai Du
d1781365d6
Merge pull request #279 from wenkaidu/nccl_sync
Sync up with latest NCCL master branch
[ROCm/rccl commit: a7deecb104]
2020-10-16 11:21:35 -07:00