Wenkai Du
d469947641
Merge remote-tracking branch 'nccl/master' into no-target-id
2021-01-14 19:27:53 -05:00
Sylvain Jeaugey
920dbe5b35
2.8.3-1
...
Optimization for Tree allreduce on A100.
Improve aggregation performance.
Use shared buffers for inter-node send/recv.
Add NVTX profiling hooks.
Accelerate alltoall connections by merging communication for all
channels.
Add support for one hop communication through NVLink, for faster
send/recv communication on cubemesh topologies like DGX-1.
Improve alltoall scheduling to better balance intra/inter node
communication.
Increase send/recv parallelism by 8x, each warp sending or
receiving to a different peer.
Net: move to v4.
Net: make flush operation asynchronous to accelerate alltoall.
Net: define maximum number of requests.
Fix hang when using LL128 protocol after 2^31 steps.
Fix #379 : topology injection failing when using fewer GPUs than
described in the XML.
Fix #394 : protocol mismatch causing hangs or crashes when using
one GPU per node.
2020-11-17 11:08:52 -08:00
Wenkai Du
2e8b3a0857
Use ncclSend/ncclRecv for alltoall type of collectives as default ( #297 )
2020-11-09 11:23:17 -08:00
gilbertlee-amd
bdd8adf1ca
Adding a CHANGELOG ( #296 )
2020-11-05 13:38:30 -07:00
Wenkai Du
709b7e4880
Improve GPU direct RDMA handling on Rome ( #294 )
2020-11-03 14:29:08 -08:00
Wenkai Du
dfa3c41ede
Add more Rome models ( #292 )
2020-10-30 21:26:04 -07:00
gilbertlee-amd
bfab1d3592
Adding output to CSV, removing OpenMP, decreasing default numBytes to 64MB, adding aggregate stats ( #290 )
2020-10-27 09:00:33 -06:00
Wenkai Du
2ecfc62ec8
Fix lintian errors ( #287 )
2020-10-21 16:20:53 -07:00
gilbertlee-amd
61e1a71d14
[TransferBench] Displaying PCIe Bus ID ( #288 )
...
* Adding PCIe BusID per GPU in topology display
2020-10-21 16:13:36 -06:00
gilbertlee-amd
769418c5c7
TransferBench Typo. Pinned host memory uses C not P ( #286 )
2020-10-21 12:05:38 -06:00
xietingwew
084207e685
fix proxyArgs for trace log
2020-10-21 09:18:40 -07:00
saadrahim
e8177c9ee7
Adding sles15, centos7 and centos8 testing ( #283 )
2020-10-20 09:39:03 -06:00
Wenkai Du
dcad0ef7cb
Fix incorrect pointer checking for scatter and gather ( #285 )
2020-10-19 13:27:09 -07:00
gilbertlee-amd
9b3f762b68
Removing unnecessary flags from CI ( #278 )
...
* Removing unnecessary flags from CI
* Re-adding HSA_FORCE_FINE_GRAIN_PCIE in CI
2020-10-19 13:08:24 -06:00
saadrahim
49aa6d7afe
Updating copyright for documentation ( #282 )
2020-10-19 13:07:15 -06:00
Wenkai Du
a7deecb104
Merge pull request #279 from wenkaidu/nccl_sync
...
Sync up with latest NCCL master branch
2020-10-16 11:21:35 -07:00
Eiden Yoshida
205b5507b4
Update sramecc and xnack to ANY ( #284 )
...
Co-authored-by: Tony <Tony.Tye@amd.com>
Co-authored-by: Wenkai Du <Wenkai.Du@amd.com>
2020-10-16 00:25:18 -06:00
Wenkai Du
c835d8263a
Merge remote-tracking branch 'nccl/master' into nccl_sync
2020-10-15 18:42:38 -04:00
gilbertlee-amd
84a2541e01
Revert "Initial support for clique-based kernels ( #276 )" ( #280 )
...
This reverts commit 2b8184808d.
2020-10-15 11:30:18 -07:00
Sylvain Jeaugey
0e14394c5f
Fix affinity move
2020-10-13 16:58:05 -07:00
Sylvain Jeaugey
c6dbdb0084
Make sure proxy threads inherit the CPU affinity.
2020-10-13 16:37:52 -07:00
Wenkai Du
33babcb5e2
Update Rome single node models ( #277 )
2020-10-13 13:33:09 -07:00
gilbertlee-amd
2b8184808d
Initial support for clique-based kernels ( #276 )
...
* Initial support for clique-based kernels
2020-10-13 11:22:04 -06:00
Wenkai Du
ae008fd2db
Rework Rome detection and add multiple network ports models ( #274 )
...
* Rework Rome detection and add multiple network ports models
* Remove unused opCount in p2p transport
2020-10-07 13:37:36 -07:00
Wenkai Du
88a062342b
Don't download GTest unless building unit test ( #275 )
2020-10-02 15:25:40 -07:00
Wenkai Du
b871ea3c0c
Add Alltoallv RCCL kernel implementation ( #269 )
...
* Add alltoallv API and implementation
* Extend Rome P2P channel limit to multinode and alltoall kernels
* topo_expl: fix compilation and sync up with main
* gtest: use RCCL alltoallv API
* Code review changes
2020-09-30 16:25:36 -07:00
nunnikri
aa985bfb7e
SWDEV-253325 : Changing amdgpu-target to cuda-gpu-arch ( #268 )
2020-09-25 15:44:56 -06:00
Stanley Tsang
acca2ae20a
Updating inline asm to not require explicit L1 cache invalidation ( #270 )
2020-09-25 13:46:26 -06:00
gilbertlee-amd
ee262819a7
New TransferBench features ( #273 )
...
* Upgrading TransferBench to support pinned CPU memory, expanding functionality, cleaning up env vars
2020-09-25 12:20:48 -06:00
gilbertlee-amd
01bd2573db
Changes to topology based on XGMI ( #272 )
...
* Alterations to topology search to improve XGMI-enabled nodes
2020-09-25 12:20:09 -06:00
Wenkai Du
44fcde7835
Ensure all ranks are on the same send/receive or alltoall kernel path ( #271 )
2020-09-24 08:25:04 -07:00
Wenkai Du
d871fceb54
Change network plugin name to librccl-net.so ( #266 )
2020-09-18 13:23:30 -07:00
Wenkai Du
45a8f09e97
Merge pull request #267 from wenkaidu/p2p
...
Limit P2P channels on Rome
2020-09-18 11:35:35 -07:00
Wenkai Du
42955f5f4f
Limit P2P channels on Rome
2020-09-17 17:20:32 -07:00
lijietang
bbe233f8c1
Add rccl bw test script in tools ( #255 )
2020-09-11 16:59:03 +08:00
Stanley Tsang
8c90aefb6d
Adding the ability to force install dependencies (namely gtest); gtest library installation fix for centos ( #265 )
...
* Adding the ability to force install dependencies (namely gtest); gtest library installation fix for centos
* Removing potentially unnecessary dependencies from install script
2020-09-10 17:27:22 -06:00
Wenkai Du
60819dcf8d
Merge pull request #262 from wenkaidu/alignment
...
Make data alignment requirements match the ISA manual
2020-09-08 10:40:42 -07:00
Stanley Tsang
f2e5db7bf7
Adding XNACK flags. ( #264 )
...
* Adding XNACK flags.
2020-09-08 11:36:30 -06:00
Aaron Enye Shi
958b213428
Add RCCL Static Lib Creation with -fgpu-rdc
...
RCCL uses -fgpu-rdc to compile its source objects. When linking
the RCCL static library, the link and archive step must go through
hipcc and use the flag --emit-static-lib. When compiling
UnitTests, librccl.a must be consumed through -l and -L.
2020-09-03 11:25:41 -04:00
Wenkai Du
e2042ccf8a
Fix broken profiling build ( #263 )
2020-09-02 15:39:52 -07:00
Wenkai Du
b163a8898f
gtest: add alltoallv test
2020-09-02 21:28:32 +00:00
Wenkai Du
4751992231
Make data alignment requirements match the ISA manual
...
From https://developer.amd.com/wp-content/resources/Vega_Shader_ISA.pdf
8.1.7. Alignment
For Dword or larger reads or writes, the two LSBs of the byte-address
are ignored, thus forcing Dword alignment.
2020-09-01 21:21:58 +00:00
Wenkai Du
4180e6409e
Fix incorrect threads split in sendrecv ( #261 )
2020-08-31 17:33:22 -07:00
Wenkai Du
c5cbece6d0
Increase minimal channels for gfx908 ( #259 )
2020-08-26 11:40:11 -07:00
Wenkai Du
b0919dc46c
Only use software barrier for synchronization ( #258 )
2020-08-25 13:16:34 -07:00
Wenkai Du
391bbf3f1e
Add NPS4 support on some models ( #256 )
...
* Add NPS4 support on some models
* Add XML models
2020-08-19 11:03:20 -07:00
gilbertlee-amd
ec9af40fcd
Upgrading various TransferBench features ( #257 )
2020-08-19 09:47:19 -06:00
Wenkai Du
a51e4071e3
Add another Rome model ( #249 )
...
* Add another Rome model
* Add gfx908 4P3L models and support
* Revert "Use cached value for detecting GDR support only once"
This reverts commit 67c8e72ce3.
* Skip using ibverb for GPU direct RDMA detection
* Fine tune one Rome model
2020-08-17 10:51:02 -07:00
gilbertlee-amd
c985478133
Fixes to make TransferBench compile for hipclang ( #254 )
2020-08-13 12:25:28 -06:00
saadrahim
6d8e19929c
Adding gfx908 to CI ( #253 )
2020-08-13 11:07:33 -06:00