Commit graph

546 Commits

Author SHA1 Message Date
Stanley Tsang f152c8d160 Update MP UT to support arbitrary # of GPUs; multiple bugfixes (#16)
* Fixing temp file creation/deletion for Clique kernel mode.

* Refactoring of MP unit tests; include bugfixes and general support for any number of GPUs

* GroupCall MP UT properly quits when too many devices specified

* MP UT will programmatically set NCCL_COMM_ID if not specified; updated install script

[ROCm/rccl commit: d00b7d17bd]
2021-02-05 16:49:25 -08:00
Wenkai Du fe8923ebba Add gfx908 Rome 4 NICs model
[ROCm/rccl commit: 6dfdfef98f]
2021-02-06 00:19:47 +00:00
Gilbert Lee b954d85935 [TransferBench] Fixing some merge issues
[ROCm/rccl commit: f372c53d52]
2021-02-05 16:46:20 +00:00
Wenkai Du ae5779702a Merge remote-tracking branch 'origin/develop' into 2.8.3
[ROCm/rccl commit: ab1e7a0318]
2021-02-04 20:02:34 -05:00
Gilbert Lee 1dfb88f554 [topo_expl] Updating for 2.8.3
[ROCm/rccl commit: 2f541508c5]
2021-02-04 19:08:42 +00:00
Gilbert Lee f2d07cb9a6 [ib-test] Update for 2.8.3
[ROCm/rccl commit: 9aac1ed38f]
2021-02-04 19:05:03 +00:00
Gilbert Lee 1643d05c75 [TransferBench] Updating for 2.8.3
[ROCm/rccl commit: 9ce203dd0a]
2021-02-04 18:58:25 +00:00
gilbertlee-amd 16d625ca27 Tuning some clique-based kernel parameters (#315)
[ROCm/rccl commit: 1990ffd76a]
2021-02-03 20:00:08 -07:00
Wenkai Du 57abf599b2 Enable GPU direct RDMA read from GPU
[ROCm/rccl commit: 5f97122442]
2021-02-03 02:48:30 +00:00
gilbertlee-amd 60c74f63fa [TransferBench] Restore some previous fixes - memory leak, PCIe address (#314)
[ROCm/rccl commit: 62e0447e9a]
2021-02-01 09:48:09 -07:00
Gilbert Lee 6bf9b0d36a Removing in-place tests from Combined calls (no support for send/recv)
[ROCm/rccl commit: 01a998b17c]
2021-01-28 20:09:03 +00:00
gilbertlee-amd c981e76efe Clique kernel support (#295) (#15)
* Adding experimental clique-based kernels (opt-in only)

Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
Co-authored-by: Gilbert Lee <gilbert.lee@amd.com>
Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>

Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>

[ROCm/rccl commit: 3e62ceddc5]
2021-01-28 09:45:01 -07:00
Wenkai Du 7f9c15b843 Use less unroll for clique kernels (#313)
[ROCm/rccl commit: 41e47a36e7]
2021-01-15 17:48:10 -08:00
Stanley Tsang d7ed44eb9a Adding multiprocess unit tests (#312)
Adding multiprocess unit tests for collectives.  

To run, NCCL_COMM_ID=$HOSTNAME:12345 build/release/test/UnitTestsMultiProcess

[ROCm/rccl commit: d3fa257682]
2021-01-15 16:34:36 -07:00
Wenkai Du d4382de267 Improve collective trace
[ROCm/rccl commit: 2ddbe6646b]
2021-01-14 19:28:01 -05:00
Wenkai Du 560224fe9f gtest: add scatter to combined calls and use loops (#303)
* gtest: add scatter to combined calls and use loops

* gtest: run validation inside loop

* gtest: revert small element count to 2520

* gtest: fix memory leak in validation

(cherry picked from commit 36935cfbee)

* Fix combined call UT

* Fix memory leak

* Fix alltoallv test


[ROCm/rccl commit: b33a2cac8b]
2021-01-14 19:28:01 -05:00
Wenkai Du 2c49121171 Port alltoall[v]
[ROCm/rccl commit: f4d5d3d620]
2021-01-14 19:28:01 -05:00
Wenkai Du 41bead5a4e Do not allow GPU as intermediate
[ROCm/rccl commit: 105db19a11]
2021-01-14 19:28:01 -05:00
Wenkai Du 34c6013299 Revert "Changes to topology based on XGMI (#272)"
This reverts commit 0a9adc16f4.


[ROCm/rccl commit: e055229e56]
2021-01-14 19:28:01 -05:00
Wenkai Du adff98765c Merge remote-tracking branch 'nccl/master' into no-target-id
[ROCm/rccl commit: d469947641]
2021-01-14 19:27:53 -05:00
Wenkai Du 4ea285c527 Fix Rome PCIe 2 node topology generation (#310)
[ROCm/rccl commit: 373a108516]
2020-12-15 17:16:17 -08:00
gilbertlee-amd c570f09681 [TransferBench] Fixing bug with fine-grained memory allocation (#311)
* Fixing bug with fine-grained memory

[ROCm/rccl commit: 41c35dad48]
2020-12-15 17:37:31 -07:00
gilbertlee-amd 5155abb250 [TransferBench] Adding ability to perform CPU-executed copies, various upgrades (#309)
* Adding CPU based execution, fixing typos, adding Fine-grained mem
* Exposing sampling factor when generating range of data sizes
* Refactoring how Links are launched, now once per thread
* Documentation updates

[ROCm/rccl commit: ae0c4092c7]
2020-12-11 10:21:14 -07:00
gilbertlee-amd 9b48f92d72 [TransferBench] Support multiple of 4 byte sizes, changing default GPU timing mechanism (#307)
* Changing default timing mechanism, adjusting CPU bandwidth calc, adding flag to use combined timing
* Adding support for smaller transfers (byte size must be multiple of 4 instead of 128)

[ROCm/rccl commit: b80ae551b1]
2020-12-04 14:57:13 -07:00
Wenkai Du 9e83df4ad3 Adding backward compatibility for target-id syntax for AMDGPU_TARGETS (#306)
[ROCm/rccl commit: 882d52ad7e]
2020-12-04 13:55:56 -08:00
Wenkai Du b68ff1ebba Add Rome model and improve search (#305)
[ROCm/rccl commit: 975b14dffa]
2020-11-17 14:55:06 -08:00
Sylvain Jeaugey a8908b34ee 2.8.3-1
Optimization for Tree allreduce on A100.
Improve aggregation performance.
Use shared buffers for inter-node send/recv.
Add NVTX profiling hooks.
Accelerate alltoall connections by merging communication for all
channels.
Add support for one hop communication through NVLink, for faster
send/recv communication on cubemesh topologies like DGX-1.
Improve alltoall scheduling to better balance intra/inter node
communication.
Increase send/recv parallelism by 8x, each warp sending or
receiving to a different peer.
Net: move to v4.
Net: make flush operation asynchronous to accelerate alltoall.
Net: define maximum number of requests.
Fix hang when using LL128 protocol after 2^31 steps.
Fix #379 : topology injection failing when using less GPUs than
described in the XML.
Fix #394 : protocol mismatch causing hangs or crashes when using
one GPU per node.


[ROCm/rccl commit: 920dbe5b35]
2020-11-17 11:08:52 -08:00
Wenkai Du 32fdfc93fc Merge remote-tracking branch 'origin/master' into develop
[ROCm/rccl commit: 1943bac646]
2020-11-16 12:16:53 -05:00
Wenkai Du f19cbc8e51 Use device's link width and speed if port doesn't report (#304)
[ROCm/rccl commit: 554729079d]
2020-11-13 17:58:04 -08:00
Wenkai Du 36935cfbee gtest: add scatter to combined calls and use loops (#303)
* gtest: add scatter to combined calls and use loops

* gtest: run validation inside loop

* gtest: revert small element count to 2520

* gtest: fix memory leak in validation

[ROCm/rccl commit: b0853ccd51]
2020-11-13 17:57:44 -08:00
Stanley Tsang f373cd2fdc Fixing IPC handle leak (#302)
[ROCm/rccl commit: 2958f7eace]
2020-11-13 10:32:42 -07:00
gilbertlee-amd f66d05193a Adding RCCL_CLIQUE_DEBUG to help debug experimental clique feature (#300)
[ROCm/rccl commit: c8d08a7c2f]
2020-11-13 09:07:11 -07:00
Wenkai Du 62d21047b8 Skip unused peer connection in scatter and gather (#301)
[ROCm/rccl commit: 4e68229c8b]
2020-11-12 15:47:34 -08:00
Colin Smith 1349b382cd Merge pull request #299 from ROCmSoftwarePlatform/develop
Enable target id build

[ROCm/rccl commit: 377b43470b]
2020-11-10 15:47:42 -07:00
gilbertlee-amd a7ef699687 Clique kernel support (#295)
* Adding experimental clique-based kernels (opt-in only)

Co-authored-by: Stanley Tsang <stanley.tsang@amd.com>
Co-authored-by: Gilbert Lee <gilbert.lee@amd.com>
Co-authored-by: Wenkai Du <43822138+wenkaidu@users.noreply.github.com>

[ROCm/rccl commit: 41bcfb8878]
2020-11-10 15:44:10 -07:00
Wenkai Du 273974393e Use target id of xnack off (#298)
[ROCm/rccl commit: 1fdb216f87]
2020-11-10 11:10:48 -08:00
Wenkai Du a0109b2eae Use ncclSend/ncclRecv for alltoall type of collectives as default (#297)
[ROCm/rccl commit: 2e8b3a0857]
2020-11-09 11:23:17 -08:00
gilbertlee-amd 1af4a5c9fa Adding a CHANGELOG (#296)
[ROCm/rccl commit: bdd8adf1ca]
2020-11-05 13:38:30 -07:00
Wenkai Du a4dd1a9548 Improve GPU direct RDMA handling on Rome (#294)
[ROCm/rccl commit: 709b7e4880]
2020-11-03 14:29:08 -08:00
Wenkai Du c0c64d970a Add more Rome models (#292)
[ROCm/rccl commit: dfa3c41ede]
2020-10-30 21:26:04 -07:00
gilbertlee-amd 2931959e6e Adding output to CSV, removing OpenMP, decreasing default numBytes to 64MB, adding aggregate stats (#290)
[ROCm/rccl commit: bfab1d3592]
2020-10-27 09:00:33 -06:00
Wenkai Du e7ea1a585e Fix lintian errors (#287)
[ROCm/rccl commit: 2ecfc62ec8]
2020-10-21 16:20:53 -07:00
gilbertlee-amd a062c80298 [TransferBench] Displaying PCIe Bus ID (#288)
* Adding PCIe BusID per GPU in topology display

[ROCm/rccl commit: 61e1a71d14]
2020-10-21 16:13:36 -06:00
gilbertlee-amd 0282595de5 TransferBench Typo. Pinned host memory uses C not P (#286)
[ROCm/rccl commit: 769418c5c7]
2020-10-21 12:05:38 -06:00
xietingwew 0277094dd2 fix proxyArgs for trace log
[ROCm/rccl commit: 084207e685]
2020-10-21 09:18:40 -07:00
saadrahim 5439649936 Adding sles15, centos7 and centos8 testing (#283)
[ROCm/rccl commit: e8177c9ee7]
2020-10-20 09:39:03 -06:00
Wenkai Du 1aae6b1344 Fix incorrect pointer checking for scatter and gather (#285)
[ROCm/rccl commit: dcad0ef7cb]
2020-10-19 13:27:09 -07:00
gilbertlee-amd f4b9a0d8e5 Removing unnecessary flags from CI (#278)
* Removing unnecessary flags from CI

* Re-adding HSA_FORCE_FINE_GRAIN_PCIE in CI

[ROCm/rccl commit: 9b3f762b68]
2020-10-19 13:08:24 -06:00
saadrahim 0465dffe6f Updating copyright for documentation (#282)
[ROCm/rccl commit: 49aa6d7afe]
2020-10-19 13:07:15 -06:00
Wenkai Du d1781365d6 Merge pull request #279 from wenkaidu/nccl_sync
Sync up with latest NCCL master branch

[ROCm/rccl commit: a7deecb104]
2020-10-16 11:21:35 -07:00