Commit Graph

201 Commits

Author SHA1 Message Date
Wenkai Du dcad0ef7cb Fix incorrect pointer checking for scatter and gather (#285) 2020-10-19 13:27:09 -07:00
Wenkai Du c835d8263a Merge remote-tracking branch 'nccl/master' into nccl_sync 2020-10-15 18:42:38 -04:00
gilbertlee-amd 84a2541e01 Revert "Initial support for clique-based kernels (#276)" (#280)
This reverts commit 2b8184808d.
2020-10-15 11:30:18 -07:00
Sylvain Jeaugey 0e14394c5f Fix affinity move 2020-10-13 16:58:05 -07:00
Sylvain Jeaugey c6dbdb0084 Make sure proxy threads inherit the CPU affinity. 2020-10-13 16:37:52 -07:00
Wenkai Du 33babcb5e2 Update Rome single node models (#277) 2020-10-13 13:33:09 -07:00
gilbertlee-amd 2b8184808d Initial support for clique-based kernels (#276)
* Initial support for clique-based kernels
2020-10-13 11:22:04 -06:00
Wenkai Du ae008fd2db Rework Rome detection and add multiple network ports models (#274)
* Rework Rome detection and add multiple network ports models

* Remove unused opCount in p2p transport
2020-10-07 13:37:36 -07:00
Wenkai Du b871ea3c0c Add Alltoallv RCCL kernel implementation (#269)
* Add alltoallv API and implementation

* Extend Rome P2P channel limit to multinode and alltoall kernels

* topo_expl: fix compilation and sync up with main

* gtest: use RCCL alltoallv API

* Code review changes
2020-09-30 16:25:36 -07:00
Stanley Tsang acca2ae20a Updating inline asm to not require explicit L1 cache invalidation (#270) 2020-09-25 13:46:26 -06:00
gilbertlee-amd 01bd2573db Changes to topology based on XGMI (#272)
* Alterations to topology search to improve XGMI-enabled nodes
2020-09-25 12:20:09 -06:00
Wenkai Du 44fcde7835 Ensure all ranks on same send/receive or alltoall kernel path (#271) 2020-09-24 08:25:04 -07:00
Wenkai Du d871fceb54 Change network plugin name to librccl-net.so (#266) 2020-09-18 13:23:30 -07:00
Wenkai Du 42955f5f4f Limit P2P channels on Rome 2020-09-17 17:20:32 -07:00
Wenkai Du 60819dcf8d Merge pull request #262 from wenkaidu/alignment
Make data alignment requirements matching ISA manual
2020-09-08 10:40:42 -07:00
Wenkai Du e2042ccf8a Fix broken profiling build (#263) 2020-09-02 15:39:52 -07:00
Wenkai Du 4751992231 Make data alignment requirements matching ISA manual
From https://developer.amd.com/wp-content/resources/Vega_Shader_ISA.pdf

8.1.7. Alignment
For Dword or larger reads or writes, the two LSBs of the byte-address
are ignored, thus forcing Dword alignment.
2020-09-01 21:21:58 +00:00
Wenkai Du 4180e6409e Fix incorrect threads split in sendrecv (#261) 2020-08-31 17:33:22 -07:00
Wenkai Du c5cbece6d0 Increase minimal channels for gfx908 (#259) 2020-08-26 11:40:11 -07:00
Wenkai Du b0919dc46c Only use software barrier for synchronization (#258) 2020-08-25 13:16:34 -07:00
Wenkai Du 391bbf3f1e Add NPS4 support on some models (#256)
* Add NPS4 support on some models

* Add XML models
2020-08-19 11:03:20 -07:00
Wenkai Du a51e4071e3 Add another Rome model (#249)
* Add another Rome model

* Add gfx908 4P3L models and support

* Revert "Use cached value for detecting GDR support only once"

This reverts commit 67c8e72ce3.

* Skip using ibverb for GPU direct RDMA detection

* Fine tune one Rome model
2020-08-17 10:51:02 -07:00
Wenkai Du 7e3d8a31cc Collect gcnArch and hipDeviceArch_t in XML (#252) 2020-08-12 15:48:38 -07:00
Wenkai Du 066223333d Merge pull request #248 from wenkaidu/2.7.8
2.7.8
2020-08-11 08:20:37 -07:00
Wenkai Du 7e3f841fab Merge remote-tracking branch 'nccl/master' into 2.7.8 2020-08-10 16:11:00 +00:00
Wenkai Du 09ef75656a Add more Rome 4P2H models 2020-08-06 18:20:02 +00:00
Jack Snyder de49a77074 Setting type when gpu sub node is discovered 2020-08-05 13:39:23 -07:00
Eric Badger 700c0e0f24 Don't require NIC devices to have specific PCI class
If a PCI node is the parent of a NIC, treat it as such, regardless of
the PCI class code for the device. This allows non-traditional devices
to act as NICs via the net plugin mechanism.

For consistency, treat GPUs similarly.
2020-08-05 12:46:29 -07:00
Wenkai Du 5b03132ace Allow setup ring through NCCL_RINGS to facilitate testing 2020-08-04 21:07:00 +00:00
Wenkai Du d1e20b4c5e Improve 4P2H topology on Rome (#243)
1. Use bi-directional rings
2. GPU search is sorted by PCI device ID to get consistent results
2020-07-28 14:21:44 -07:00
David Addison 033d799524 2.7.8-1
Fix collective mismatch error when using ncclSend/ncclRecv
2020-07-27 16:34:09 -07:00
Wenkai Du e7a10aa0e4 Topology tuning for 4P2H on Rome (#242)
* Topology tuning for 4P2H on Rome

* Use ncclTopoIdToIndex
2020-07-27 11:53:57 -07:00
Wenkai Du 8d5fb920b6 ib-test: support multiple channels (#241) 2020-07-27 11:03:12 -07:00
Wenkai Du d5f90e19b5 Add 8P6L multi-node models (#239) 2020-07-21 14:10:36 -07:00
Wenkai Du ab787c767e Change default channels duplication for chordal ring (#233) 2020-07-14 15:16:50 -07:00
Wenkai Du 5215130168 Revert "Split primitive class to smaller structures" (#230)
This reverts commit 486fd436af.
2020-07-08 11:06:50 -07:00
Riatre Foo 2d8601701d Fix build action order
Add $(INCTARGETS) to build dependencies of %.o and $(DEVICELIB).
As there were no dep files during the first build, Make may kick off source
compilation before nccl.h got generated, which leads to occasional build
failures on systems with high core count. The build failure could be
reproduced reliably with a `sleep 5` in $(INCDIR)/nccl.h rule.
2020-07-07 10:20:51 -07:00
Wenkai Du da3b197d6c Merge remote-tracking branch 'nccl/master' into develop 2020-07-01 16:51:25 -07:00
Wenkai Du 964c4c2061 Merge sendrecv kernel from NCCL 2.7.3
This commit was cherry-picked and modified from
https://github.com/NVIDIA/nccl/commit/5949d96f36d050e59d05872f8bbffd2549318e95
2020-06-29 08:47:46 -07:00
Wenkai Du b90735c935 Use separate threads for send and receive 2020-06-29 08:47:15 -07:00
Sylvain Jeaugey 1952325569 2.7.6-1
Fix crash when NVswitch is not visible inside a VM.
2020-06-26 16:35:54 -07:00
Sylvain Jeaugey 01afd20a77 2.7.5-1
Minor fixes for A100 platforms.
Add a WARN for invalid GroupEnd call.
2020-06-26 14:39:49 -07:00
Wenkai Du 84f8ba3bb0 Revert use posix_memalign for network buffer allocation on host memory (#222) 2020-06-24 11:25:55 -07:00
Wenkai Du 0eb19a563a Use posix_memalign for network buffer allocation on host memory (#221)
* Use posix_memalign for network buffer allocation on host memory

* ib-test: add ability to specify run iterations

* ib-test: define iterations as multiple of default cycles

* Add checking to posix_memalign return value
2020-06-22 13:06:25 -07:00
Stanley Tsang 8d21adb5e3 Documentation updates for NCCL 2.7.0 (#219)
* Making hip-clang the default compiler; documentation update

* Adding back --hip-clang to install.sh as a silent option for CI

* Documentation updates for NCCL 2.7

* Restoring deleted line in install script
2020-06-16 16:48:11 -06:00
Wenkai Du cfa97eccd3 Add IB/RDMA unit test 2020-06-16 18:29:17 +00:00
Wenkai Du 95b8f70d15 Limit network profiling support to simple protocol and avoid overflow 2020-06-15 20:51:36 +00:00
Wenkai Du 7484e53ff7 Rework network proxy profiling 2020-06-13 03:13:58 +00:00
Wenkai Du b257676f30 Reduce RCCL kernel count as we don't pass first coll in argument 2020-06-12 21:30:04 +00:00
Wenkai Du a6d621176c Sender rank's opCount maybe ahead by one if it finishes earlier 2020-06-12 03:39:45 +00:00
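
Note on commit 4751992231 (data alignment): the ISA text quoted in that commit message says that for Dword or larger reads or writes the two LSBs of the byte address are ignored, which forces Dword alignment. The following is a minimal sketch of that arithmetic only. It is not RCCL code, and the helper names `effective_dword_address` and `is_dword_aligned` are hypothetical, introduced here purely for illustration.

```python
# Illustrative sketch of the alignment rule quoted in commit 4751992231.
# Assumption: addresses are modeled as non-negative Python integers;
# the helper names below are hypothetical and do not exist in RCCL.

DWORD_BYTES = 4  # one Dword is 4 bytes, so the two LSBs select a byte within it


def effective_dword_address(byte_address: int) -> int:
    """Return the address used once the two LSBs are ignored."""
    return byte_address & ~(DWORD_BYTES - 1)


def is_dword_aligned(byte_address: int) -> bool:
    """Return True when the two LSBs are already zero."""
    return byte_address % DWORD_BYTES == 0


if __name__ == "__main__":
    for address in (0x1000, 0x1001, 0x1002, 0x1003, 0x1004):
        print(
            hex(address),
            "->",
            hex(effective_dword_address(address)),
            "aligned" if is_dword_aligned(address) else "unaligned",
        )
```

Under this rule a misaligned Dword access does not reach the bytes the caller named: the hardware silently uses the rounded-down address, which is presumably why the commit aligns the library's requirements with the manual.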