Commit graph

439 commits

Author SHA1 Message Date
Wenkai Du 34c6013299 Revert "Changes to topology based on XGMI (#272)"
This reverts commit 0a9adc16f4.


[ROCm/rccl commit: e055229e56]
2021-01-14 19:28:01 -05:00
Wenkai Du adff98765c Merge remote-tracking branch 'nccl/master' into no-target-id
[ROCm/rccl commit: d469947641]
2021-01-14 19:27:53 -05:00
Sylvain Jeaugey a8908b34ee 2.8.3-1
Optimization for Tree allreduce on A100.
Improve aggregation performance.
Use shared buffers for inter-node send/recv.
Add NVTX profiling hooks.
Accelerate alltoall connections by merging communication for all
channels.
Add support for one hop communication through NVLink, for faster
send/recv communication on cubemesh topologies like DGX-1.
Improve alltoall scheduling to better balance intra/inter node
communication.
Increase send/recv parallelism by 8x, each warp sending or
receiving to a different peer.
Net: move to v4.
Net: make flush operation asynchronous to accelerate alltoall.
Net: define maximum number of requests.
Fix hang when using LL128 protocol after 2^31 steps.
Fix #379 : topology injection failing when using less GPUs than
described in the XML.
Fix #394 : protocol mismatch causing hangs or crashes when using
one GPU per node.


[ROCm/rccl commit: 920dbe5b35]
2020-11-17 11:08:52 -08:00
Wenkai Du a0109b2eae Use ncclSend/ncclRecv for alltoall type of collectives as default (#297)
[ROCm/rccl commit: 2e8b3a0857]
2020-11-09 11:23:17 -08:00
gilbertlee-amd 1af4a5c9fa Adding a CHANGELOG (#296)
[ROCm/rccl commit: bdd8adf1ca]
2020-11-05 13:38:30 -07:00
Wenkai Du a4dd1a9548 Improve GPU direct RDMA handling on Rome (#294)
[ROCm/rccl commit: 709b7e4880]
2020-11-03 14:29:08 -08:00
Wenkai Du c0c64d970a Add more Rome models (#292)
[ROCm/rccl commit: dfa3c41ede]
2020-10-30 21:26:04 -07:00
gilbertlee-amd 2931959e6e Adding output to CSV, removing OpenMP, decreasing default numBytes to 64MB, adding aggregate stats (#290)
[ROCm/rccl commit: bfab1d3592]
2020-10-27 09:00:33 -06:00
Wenkai Du e7ea1a585e Fix lintian errors (#287)
[ROCm/rccl commit: 2ecfc62ec8]
2020-10-21 16:20:53 -07:00
gilbertlee-amd a062c80298 [TransferBench] Displaying PCIe Bus ID (#288)
* Adding PCIe BusID per GPU in topology display

[ROCm/rccl commit: 61e1a71d14]
2020-10-21 16:13:36 -06:00
gilbertlee-amd 0282595de5 TransferBench Typo. Pinned host memory uses C not P (#286)
[ROCm/rccl commit: 769418c5c7]
2020-10-21 12:05:38 -06:00
xietingwew 0277094dd2 fix proxyArgs for trace log
[ROCm/rccl commit: 084207e685]
2020-10-21 09:18:40 -07:00
saadrahim 5439649936 Adding sles15, centos7 and centos8 testing (#283)
[ROCm/rccl commit: e8177c9ee7]
2020-10-20 09:39:03 -06:00
Wenkai Du 1aae6b1344 Fix incorrect pointer checking for scatter and gather (#285)
[ROCm/rccl commit: dcad0ef7cb]
2020-10-19 13:27:09 -07:00
gilbertlee-amd f4b9a0d8e5 Removing unnecessary flags from CI (#278)
* Removing unnecessary flags from CI

* Re-adding HSA_FORCE_FINE_GRAIN_PCIE in CI

[ROCm/rccl commit: 9b3f762b68]
2020-10-19 13:08:24 -06:00
saadrahim 0465dffe6f Updating copyright for documentation (#282)
[ROCm/rccl commit: 49aa6d7afe]
2020-10-19 13:07:15 -06:00
Wenkai Du d1781365d6 Merge pull request #279 from wenkaidu/nccl_sync
Sync up with latest NCCL master branch

[ROCm/rccl commit: a7deecb104]
2020-10-16 11:21:35 -07:00
Eiden Yoshida f1dc3f1e86 Update sramecc and xnack to ANY (#284)
Co-authored-by: Tony <Tony.Tye@amd.com>
Co-authored-by: Wenkai Du <Wenkai.Du@amd.com>

[ROCm/rccl commit: 205b5507b4]
2020-10-16 00:25:18 -06:00
Wenkai Du 194135a40c Merge remote-tracking branch 'nccl/master' into nccl_sync
[ROCm/rccl commit: c835d8263a]
2020-10-15 18:42:38 -04:00
gilbertlee-amd 94437eef28 Revert "Initial support for clique-based kernels (#276)" (#280)
This reverts commit d68a532bc6.

[ROCm/rccl commit: 84a2541e01]
2020-10-15 11:30:18 -07:00
Sylvain Jeaugey 591ffd32fe Fix affinity move
[ROCm/rccl commit: 0e14394c5f]
2020-10-13 16:58:05 -07:00
Sylvain Jeaugey 5de6b6681d Make sure proxy threads inherit the CPU affinity.
[ROCm/rccl commit: c6dbdb0084]
2020-10-13 16:37:52 -07:00
Wenkai Du 8b120c0508 Update Rome single node models (#277)
[ROCm/rccl commit: 33babcb5e2]
2020-10-13 13:33:09 -07:00
gilbertlee-amd d68a532bc6 Initial support for clique-based kernels (#276)
* Initial support for clique-based kernels

[ROCm/rccl commit: 2b8184808d]
2020-10-13 11:22:04 -06:00
Wenkai Du 41260bb948 Rework Rome detection and add multiple network ports models (#274)
* Rework Rome detection and add multiple network ports models

* Remove unused opCount in p2p transport

[ROCm/rccl commit: ae008fd2db]
2020-10-07 13:37:36 -07:00
Wenkai Du e12db6f2ab Don't download GTest unless building unit test (#275)
[ROCm/rccl commit: 88a062342b]
2020-10-02 15:25:40 -07:00
Wenkai Du dbde26e681 Add Alltoallv RCCL kernel implementation (#269)
* Add alltoallv API and implementation

* Extend Rome P2P channel limit to multinode and alltoall kernels

* topo_expl: fix compilation and sync up with main

* gtest: use RCCL alltoallv API

* Code review changes

[ROCm/rccl commit: b871ea3c0c]
2020-09-30 16:25:36 -07:00
nunnikri 256de55920 SWDEV-253325 : Changing amdgpu-target to cuda-gpu-arch (#268)
[ROCm/rccl commit: aa985bfb7e]
2020-09-25 15:44:56 -06:00
Stanley Tsang 67a8d86d78 Updating inline asm to not require explicit L1 cache invalidation (#270)
[ROCm/rccl commit: acca2ae20a]
2020-09-25 13:46:26 -06:00
gilbertlee-amd 5ca117d7cd New TransferBench features (#273)
* Upgrading TransferBench to support pinned CPU memory, expanding functionality, cleaning up env vars

[ROCm/rccl commit: ee262819a7]
2020-09-25 12:20:48 -06:00
gilbertlee-amd 0a9adc16f4 Changes to topology based on XGMI (#272)
* Alterations to topology search to improve XGMI-enabled nodes

[ROCm/rccl commit: 01bd2573db]
2020-09-25 12:20:09 -06:00
Wenkai Du 7ba087e069 Ensure all ranks on same send/receive or alltoall kernel path (#271)
[ROCm/rccl commit: 44fcde7835]
2020-09-24 08:25:04 -07:00
Wenkai Du 37f7eec6b7 Change network plugin name to librccl-net.so (#266)
[ROCm/rccl commit: d871fceb54]
2020-09-18 13:23:30 -07:00
Wenkai Du f0a303664e Limit P2P channels on Rome
[ROCm/rccl commit: 42955f5f4f]
2020-09-17 17:20:32 -07:00
lijietang f6b08ca547 Add rccl bw test script in tools (#255)
[ROCm/rccl commit: bbe233f8c1]
2020-09-11 16:59:03 +08:00
Stanley Tsang 209133fadf Adding the ability to force install dependencies (namely gtest); gtest library installation fix for centos (#265)
* Adding the ability to force install dependencies (namely gtest); gtest library installation fix for centos

* Removing potentially unnecessary dependencies from install script

[ROCm/rccl commit: 8c90aefb6d]
2020-09-10 17:27:22 -06:00
Wenkai Du a3402d6aeb Merge pull request #262 from wenkaidu/alignment
Make data alignment requirements matching ISA manual

[ROCm/rccl commit: 60819dcf8d]
2020-09-08 10:40:42 -07:00
Stanley Tsang 818b44e27d Adding XNACK flags. (#264)
* Adding XNACK flags.

[ROCm/rccl commit: f2e5db7bf7]
2020-09-08 11:36:30 -06:00
Aaron Enye Shi 0a3a397481 Add RCCL Static Lib Creation with -fgpu-rdc
RCCL uses -fgpu-rdc to compile its source objects. When linking
the RCCL static library, the link and archive step must go through
hipcc and use the flag --emit-static-lib. When compiling
UnitTests, the librccl.a must be consumed through -l and -L.


[ROCm/rccl commit: 958b213428]
2020-09-03 11:25:41 -04:00
Wenkai Du 09639a5d54 Fix broken profiling build (#263)
[ROCm/rccl commit: e2042ccf8a]
2020-09-02 15:39:52 -07:00
Wenkai Du 81bf52ddee gtest: add alltoallv test
[ROCm/rccl commit: b163a8898f]
2020-09-02 21:28:32 +00:00
Wenkai Du cfa1228504 Make data alignment requirements matching ISA manual
From https://developer.amd.com/wp-content/resources/Vega_Shader_ISA.pdf

8.1.7. Alignment
For Dword or larger reads or writes, the two LSBs of the byte-address
are ignored, thus forcing Dword alignment.


[ROCm/rccl commit: 4751992231]
2020-09-01 21:21:58 +00:00
Wenkai Du 778ab61097 Fix incorrect threads split in sendrecv (#261)
[ROCm/rccl commit: 4180e6409e]
2020-08-31 17:33:22 -07:00
Wenkai Du 03bb6bcb54 Increase minimal channels for gfx908 (#259)
[ROCm/rccl commit: c5cbece6d0]
2020-08-26 11:40:11 -07:00
Wenkai Du 0898fea746 Only use software barrier for synchronization (#258)
[ROCm/rccl commit: b0919dc46c]
2020-08-25 13:16:34 -07:00
Wenkai Du 5f49a0e088 Add NPS4 support on some models (#256)
* Add NPS4 support on some models

* Add XML models

[ROCm/rccl commit: 391bbf3f1e]
2020-08-19 11:03:20 -07:00
gilbertlee-amd 3e4ddd065b Upgrading various TransferBench features (#257)
[ROCm/rccl commit: ec9af40fcd]
2020-08-19 09:47:19 -06:00
Wenkai Du 3d5fb8142e Add another Rome model (#249)
* Add another Rome model

* Add gfx908 4P3L models and support

* Revert "Use cached value for detecting GDR support only once"

This reverts commit 0108a1219d.

* Skip using ibverb for GPU direct RDMA detection

* Fine tune one Rome model

[ROCm/rccl commit: a51e4071e3]
2020-08-17 10:51:02 -07:00
gilbertlee-amd 1a9b00a7fd Fixes to make TransferBench compile for hipclang (#254)
[ROCm/rccl commit: c985478133]
2020-08-13 12:25:28 -06:00
saadrahim 67bb880b8b Adding gfx908 to CI (#253)
[ROCm/rccl commit: 6d8e19929c]
2020-08-13 11:07:33 -06:00