Wenkai Du
dcad0ef7cb
Fix incorrect pointer checking for scatter and gather ( #285 )
2020-10-19 13:27:09 -07:00
Wenkai Du
c835d8263a
Merge remote-tracking branch 'nccl/master' into nccl_sync
2020-10-15 18:42:38 -04:00
gilbertlee-amd
84a2541e01
Revert "Initial support for clique-based kernels ( #276 )" ( #280 )
...
This reverts commit 2b8184808d.
2020-10-15 11:30:18 -07:00
Sylvain Jeaugey
0e14394c5f
Fix affinity move
2020-10-13 16:58:05 -07:00
Sylvain Jeaugey
c6dbdb0084
Make sure proxy threads inherit the CPU affinity.
2020-10-13 16:37:52 -07:00
Wenkai Du
33babcb5e2
Update Rome single node models ( #277 )
2020-10-13 13:33:09 -07:00
gilbertlee-amd
2b8184808d
Initial support for clique-based kernels ( #276 )
...
* Initial support for clique-based kernels
2020-10-13 11:22:04 -06:00
Wenkai Du
ae008fd2db
Rework Rome detection and add multiple network ports models ( #274 )
...
* Rework Rome detection and add multiple network ports models
* Remove unused opCount in p2p transport
2020-10-07 13:37:36 -07:00
Wenkai Du
b871ea3c0c
Add Alltoallv RCCL kernel implementation ( #269 )
...
* Add alltoallv API and implementation
* Extend Rome P2P channel limit to multinode and alltoall kernels
* topo_expl: fix compilation and sync up with main
* gtest: use RCCL alltoallv API
* Code review changes
2020-09-30 16:25:36 -07:00
Stanley Tsang
acca2ae20a
Updating inline asm to not require explicit L1 cache invalidation ( #270 )
2020-09-25 13:46:26 -06:00
gilbertlee-amd
01bd2573db
Changes to topology based on XGMI ( #272 )
...
* Alterations to topology search to improve XGMI-enabled nodes
2020-09-25 12:20:09 -06:00
Wenkai Du
44fcde7835
Ensure all ranks on same send/receive or alltoall kernel path ( #271 )
2020-09-24 08:25:04 -07:00
Wenkai Du
d871fceb54
Change network plugin name to librccl-net.so ( #266 )
2020-09-18 13:23:30 -07:00
Wenkai Du
42955f5f4f
Limit P2P channels on Rome
2020-09-17 17:20:32 -07:00
Wenkai Du
60819dcf8d
Merge pull request #262 from wenkaidu/alignment
...
Make data alignment requirements match the ISA manual
2020-09-08 10:40:42 -07:00
Wenkai Du
e2042ccf8a
Fix broken profiling build ( #263 )
2020-09-02 15:39:52 -07:00
Wenkai Du
4751992231
Make data alignment requirements match the ISA manual
...
From https://developer.amd.com/wp-content/resources/Vega_Shader_ISA.pdf
8.1.7. Alignment
For Dword or larger reads or writes, the two LSBs of the byte-address
are ignored, thus forcing Dword alignment.
2020-09-01 21:21:58 +00:00
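The alignment rule quoted in commit 4751992231 can be illustrated with a small sketch. This is not RCCL code: the helper names `effective_dword_address` and `is_dword_aligned` are hypothetical, and Python is used only to show the arithmetic. Ignoring the two least significant bits of a byte address amounts to rounding it down to a multiple of 4.

```python
DWORD_BYTES = 4  # a Dword is 4 bytes, so alignment is governed by the two LSBs


def effective_dword_address(byte_address: int) -> int:
    """Return the address with its two least significant bits cleared.

    Hypothetical helper that models the quoted rule; not part of RCCL.
    """
    if byte_address < 0:
        raise ValueError("byte_address must be non-negative")
    return byte_address & ~(DWORD_BYTES - 1)


def is_dword_aligned(byte_address: int) -> bool:
    """True when the address is already a multiple of 4."""
    return byte_address % DWORD_BYTES == 0
```

Under this model, an access issued at a misaligned address such as `0x1003` would land on the Dword starting at `0x1000`, which is why the caller has to guarantee alignment rather than rely on the hardware to report an error.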
Wenkai Du
4180e6409e
Fix incorrect threads split in sendrecv ( #261 )
2020-08-31 17:33:22 -07:00
Wenkai Du
c5cbece6d0
Increase minimal channels for gfx908 ( #259 )
2020-08-26 11:40:11 -07:00
Wenkai Du
b0919dc46c
Only use software barrier for synchronization ( #258 )
2020-08-25 13:16:34 -07:00
Wenkai Du
391bbf3f1e
Add NPS4 support on some models ( #256 )
...
* Add NPS4 support on some models
* Add XML models
2020-08-19 11:03:20 -07:00
Wenkai Du
a51e4071e3
Add another Rome model ( #249 )
...
* Add another Rome model
* Add gfx908 4P3L models and support
* Revert "Use cached value for detecting GDR support only once"
This reverts commit 67c8e72ce3.
* Skip using ibverb for GPU direct RDMA detection
* Fine tune one Rome model
2020-08-17 10:51:02 -07:00
Wenkai Du
7e3d8a31cc
Collect gcnArch and hipDeviceArch_t in XML ( #252 )
2020-08-12 15:48:38 -07:00
Wenkai Du
066223333d
Merge pull request #248 from wenkaidu/2.7.8
...
2.7.8
2020-08-11 08:20:37 -07:00
Wenkai Du
7e3f841fab
Merge remote-tracking branch 'nccl/master' into 2.7.8
2020-08-10 16:11:00 +00:00
Wenkai Du
09ef75656a
Add more Rome 4P2H models
2020-08-06 18:20:02 +00:00
Jack Snyder
de49a77074
Setting type when gpu sub node is discovered
2020-08-05 13:39:23 -07:00
Eric Badger
700c0e0f24
Don't require NIC devices to have specific PCI class
...
If a PCI node is the parent of a NIC, treat it as such, regardless of
the PCI class code for the device. This allows non-traditional devices
to act as NICs via the net plugin mechanism.
For consistency, treat GPUs similarly.
2020-08-05 12:46:29 -07:00
Wenkai Du
5b03132ace
Allow ring setup through NCCL_RINGS to facilitate testing
2020-08-04 21:07:00 +00:00
Wenkai Du
d1e20b4c5e
Improve 4P2H topology on Rome ( #243 )
...
1. Use bi-directional rings
2. GPU search is sorted by PCI device ID to get consistent results
2020-07-28 14:21:44 -07:00
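Point 2 of commit d1e20b4c5e (sorting the GPU search by PCI ID so results are consistent) can be sketched as follows. This is an illustration only, not the RCCL implementation. It assumes IDs are given as `domain:bus:device.function` hexadecimal strings, and the function names `parse_pci_bus_id` and `sort_gpus_by_pci_id` are hypothetical.

```python
def parse_pci_bus_id(bus_id: str) -> tuple:
    """Split 'dddd:bb:dd.f' into a tuple of integers for numeric comparison.

    Assumed string format; hypothetical helper, not an RCCL function.
    """
    domain, bus, dev_fn = bus_id.split(":")
    device, function = dev_fn.split(".")
    return (int(domain, 16), int(bus, 16), int(device, 16), int(function, 16))


def sort_gpus_by_pci_id(bus_ids):
    """Return the IDs in a fixed order that does not depend on input order."""
    return sorted(bus_ids, key=parse_pci_bus_id)
```

Because the order depends only on the IDs themselves, two runs that enumerate the same GPUs in a different sequence would still start the search from the same ordering.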
David Addison
033d799524
2.7.8-1
...
Fix collective mismatch error when using ncclSend/ncclRecv
2020-07-27 16:34:09 -07:00
Wenkai Du
e7a10aa0e4
Topology tuning for 4P2H on Rome ( #242 )
...
* Topology tuning for 4P2H on Rome
* Use ncclTopoIdToIndex
2020-07-27 11:53:57 -07:00
Wenkai Du
8d5fb920b6
ib-test: support multiple channels ( #241 )
2020-07-27 11:03:12 -07:00
Wenkai Du
d5f90e19b5
Add 8P6L multi-node models ( #239 )
2020-07-21 14:10:36 -07:00
Wenkai Du
ab787c767e
Change default channels duplication for chordal ring ( #233 )
2020-07-14 15:16:50 -07:00
Wenkai Du
5215130168
Revert "Split primitive class to smaller structures" ( #230 )
...
This reverts commit 486fd436af.
2020-07-08 11:06:50 -07:00
Riatre Foo
2d8601701d
Fix build action order
...
Add $(INCTARGETS) to build dependencies of %.o and $(DEVICELIB).
As there were no dep files during the first build, Make may kick off source
compilation before nccl.h got generated, which leads to occasional build
failures on systems with high core count. The build failure could be
reproduced reliably with a `sleep 5` in $(INCDIR)/nccl.h rule.
2020-07-07 10:20:51 -07:00
Wenkai Du
da3b197d6c
Merge remote-tracking branch 'nccl/master' into develop
2020-07-01 16:51:25 -07:00
Wenkai Du
964c4c2061
Merge sendrecv kernel from NCCL 2.7.3
...
This commit was cherry-picked and modified from
https://github.com/NVIDIA/nccl/commit/5949d96f36d050e59d05872f8bbffd2549318e95
2020-06-29 08:47:46 -07:00
Wenkai Du
b90735c935
Use separate threads for send and receive
2020-06-29 08:47:15 -07:00
Sylvain Jeaugey
1952325569
2.7.6-1
...
Fix crash when NVswitch is not visible inside a VM.
2020-06-26 16:35:54 -07:00
Sylvain Jeaugey
01afd20a77
2.7.5-1
...
Minor fixes for A100 platforms.
Add a WARN for invalid GroupEnd call.
2020-06-26 14:39:49 -07:00
Wenkai Du
84f8ba3bb0
Revert "Use posix_memalign for network buffer allocation on host memory" ( #222 )
2020-06-24 11:25:55 -07:00
Wenkai Du
0eb19a563a
Use posix_memalign for network buffer allocation on host memory ( #221 )
...
* Use posix_memalign for network buffer allocation on host memory
* ib-test: add ability to specify run iterations
* ib-test: define iterations as multiple of default cycles
* Add a check of the posix_memalign return value
2020-06-22 13:06:25 -07:00
Stanley Tsang
8d21adb5e3
Documentation updates for NCCL 2.7.0 ( #219 )
...
* Making hip-clang the default compiler; documentation update
* Adding back --hip-clang to install.sh as a silent option for CI
* Documentation updates for NCCL 2.7
* Restoring deleted line in install script
2020-06-16 16:48:11 -06:00
Wenkai Du
cfa97eccd3
Add IB/RDMA unit test
2020-06-16 18:29:17 +00:00
Wenkai Du
95b8f70d15
Limit network profiling support to simple protocol and avoid overflow
2020-06-15 20:51:36 +00:00
Wenkai Du
7484e53ff7
Rework network proxy profiling
2020-06-13 03:13:58 +00:00
Wenkai Du
b257676f30
Reduce RCCL kernel count as we don't pass first coll in argument
2020-06-12 21:30:04 +00:00
Wenkai Du
a6d621176c
Sender rank's opCount may be ahead by one if it finishes earlier
2020-06-12 03:39:45 +00:00