Commit graph

114 commits

Author SHA1 Message Date
Wenkai Du 7bbce085cc Enable LL128 protocol support (#605)
* Enable LL128 protocol support

* Use shared memory object directly when possible
2022-09-08 14:45:27 -07:00
gilbertlee-amd 47b2fc3a30 Adding opt-in hipGraph support for RCCL via RCCL_ENABLE_HIPGRAPH (#608)
Adding opt-in hipGraph support via RCCL_ENABLE_HIPGRAPH
2022-09-06 10:29:46 -06:00
Edgar Gabriel 4141ec1151 fix channelcount for multi-rank scenario 2022-08-22 19:09:22 +00:00
Wenkai Du 14b8ff153f Repurpose profiling implementation to simple timestamps tracing (#600) 2022-08-18 15:34:46 -07:00
Ziyue Yang f6b9686482 Improve alignment and tuning for Pivot A2A algorithm (#593)
* Improve alignment and tuning for Pivot A2A algorithm

* enable pivot a2a by default
2022-08-05 19:40:19 -07:00
gilbertlee-amd a89a9966aa Adding git hash info to version output line (#572) 2022-06-28 16:42:51 -06:00
Ziyue Yang 6e93fafdc3 Add Feature - Add NPKit Support in RCCL (#564)
* apply npkit

* fix bug

* add npkit in readme
2022-06-20 14:30:19 -07:00
Edgar 0336ffdf70 Introduce multi-rank support per device.
This is a single commit of the source code changes required to
introduce support for multiple ranks per device.
A new interface (ncclCommRankInitMulti) has to be used to make use of
this new feature.
2022-06-10 14:23:12 +00:00
Wenkai Du 7a6c6927ae Enable timing profile option (#558) 2022-06-03 07:05:13 -07:00
Aristotelis e0864e7093 Merge remote-tracking branch 'ncclRepo/master' into develop 2022-06-02 15:27:24 +00:00
Wenkai Du ef499c4810 Add another Rome model (#553)
* Add another Rome model

* Add option to force enable intranet on single node

* Limit p2p channels to number of ranks

* Refine p2p channels handling
2022-05-31 11:31:30 -07:00
akolliasAMD 98f0809a39 Added creation of new tree and added switch for using treesplit for specific cases (#551) 2022-05-25 18:55:14 -04:00
Wenkai Du 6707a270b1 Add switch for pivot alltoall kernel (#549) 2022-05-17 18:14:04 -07:00
Sylvain Jeaugey 7aa1c46fd5 2.12.12-1
Improve allreduce performance when we have more than one network interface per
GPU and we need to use PXN to close rings.
Add support for PCI Gen5 on 5.4 kernels.
Fix crash when setting NCCL_SET_THREAD_NAME.
Fix random crash in init due to uninitialized struct.
Fix hang on cubemesh topologies.
Add P2P_DIRECT_DISABLE parameter to disable direct access to pointers within a
process.
2022-05-13 00:26:57 -07:00
Wenkai Du d28e1cb44f Merge remote-tracking branch 'nccl/master' into develop 2022-04-18 11:15:25 -07:00
Wenkai Du bbe780ca6c Support multiple tuning tables (#522)
* Support multiple tuning tables

* [UnitTests] Skip managed memory testing
2022-03-31 17:09:21 -07:00
Sylvain Jeaugey 353e8ba446 2.12.10-1
Fix bug with CollNet
Fix bug with zero-bytes send/recv operations
Fix NCCL_PARAM implementation to avoid taking a lock on every call
Fix bug when setting NCCL_IB_QPS_PER_CONNECTION to more than one.
Improve error reporting for network errors.
2022-03-30 02:27:01 -07:00
Sylvain Jeaugey 3c223c105a 2.12.7-1
Add network communication through another GPU connected with NVLink
(PXN).
Add aggregation of messages coming from different local GPUs through
PXN and going to the same destination.
Add new v5 plugin API with grouped receives and tags.
Add compat for v4 plugins.
Add naming of NCCL threads to help debugging.
Fix NVLink detection and avoid data corruption when some NVLinks are
down.
Add support for Relaxed Ordering for IB.
Add profiling and timing infrastructure.
2022-03-02 20:48:56 +01:00
Ziyue Yang b569c0a1db Add Pivot AllToAll algorithm for Rome model (#503)
* add a2a pivot interface

* remove debug info

* address comments

* fix bug

* remove custom script

* address comments

* fix bug
2022-02-20 21:09:47 -08:00
Wenkai Du 598c6fdded Update Rome models (#491) 2022-01-14 10:03:30 -08:00
Wenkai Du 434ecb0e1f Merge remote-tracking branch 'origin/develop' into 2.11.4 2022-01-03 09:54:16 -08:00
Wenkai Du e9bf01fb7e Determine fine grained memory availability at RCCL bootstrapping (#471) 2021-11-19 08:12:53 -08:00
Wenkai Du 3a919c1f49 Merge remote-tracking branch 'nccl/master' into develop 2021-11-11 14:22:12 -08:00
Wenkai Du 0331e39f81 Update Rome model matching (#461)
* Update Rome model matching

* Add another Rome model

* Automatically setup NET GDR level from model
2021-11-05 08:53:47 -07:00
Wenkai Du 26fc6b0919 profiling: fix incorrect print out in timing profile (#457) 2021-11-03 16:22:21 -07:00
Wenkai Du ec36c4c326 Enable timing profiling mode (#447) 2021-10-27 08:21:48 -07:00
Wenkai Du 2508507d0a Fix PCIe gen detection (#437)
* Fix PCIe gen detection

* Update profiling support
2021-10-15 08:23:50 -07:00
Wenkai Du 2249a1d9d3 Add more Rome models (#434)
* Add more Rome models

* Update models and tuning

* Update tuning
2021-10-12 08:23:20 -07:00
Wenkai Du 29c729d8b6 Trim NICs when all GPUs are connected by XGMI (#430)
* Trim NICs when all GPUs are connected by XGMI

* Only enable clique with maximum of 2 hops
2021-10-05 18:27:43 -07:00
Ke Wen e11238b302 2.11.4-1
Add new API for creating a reduction operation which multiplies the input by a rank-specific scalar before doing an inter-rank summation (see: ncclRedOpCreatePreMulSum).
Improve CollNet (SHARP) performance of ncclAllReduce when captured in a CUDA Graph via user buffer registration.
Add environment variable NCCL_NET_PLUGIN="<suffix>" to allow user to choose among multiple NCCL net plugins by substituting into "libnccl-net-<suffix>.so".
Fix memory leak of NVB connections.
Fix topology detection of IB Virtual Functions (SR-IOV).
2021-09-08 16:06:23 -07:00
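The 2.11.4-1 entry above describes a reduction that multiplies each rank's input by a rank-specific scalar before summing across ranks (ncclRedOpCreatePreMulSum). As a rough illustration of that arithmetic only, here is a minimal pure-Python sketch. It does not call NCCL or RCCL; the function name `premul_sum` and its list-based arguments are hypothetical and chosen purely for illustration.

```python
def premul_sum(inputs, scalars):
    """Illustrative only: emulate 'pre-multiply then sum' across ranks.

    inputs  -- one buffer (list of numbers) per rank, all the same length
    scalars -- one scalar per rank, applied to that rank's buffer
    Hypothetical helper; not part of the NCCL/RCCL API.
    """
    if len(inputs) != len(scalars):
        raise ValueError("need exactly one scalar per rank")
    length = len(inputs[0])
    if any(len(buf) != length for buf in inputs):
        raise ValueError("all rank buffers must have the same length")
    # Element i of the result is the sum over ranks of scalar[r] * input[r][i].
    return [
        sum(s * buf[i] for buf, s in zip(inputs, scalars))
        for i in range(length)
    ]


if __name__ == "__main__":
    # Two ranks, two elements each; rank 0 is scaled by 0.5, rank 1 by 2.0.
    print(premul_sum([[1.0, 2.0], [3.0, 4.0]], [0.5, 2.0]))
```

With equal scalars of 1/N for N ranks, the same arithmetic yields an average, which is one plausible use of per-rank scaling.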
Wenkai Du 8ee2b7932a Merge remote-tracking branch 'origin/develop' into 2.10.3 2021-09-13 15:51:53 -07:00
Wenkai Du ef432e48e1 Update tuning table (#424) 2021-09-13 08:39:01 -07:00
Wenkai Du 31bd4236f1 Remove atomic from profiling 2021-09-08 14:20:32 -05:00
Wenkai Du 5c8380ff5b Implement NIC identification and remapping (#420)
* Add 1H16P GPU model

* Implement NIC identification and remapping

* Revert "Sort IB devices based on device name (#413)"

This reverts commit 2d0ed8dff6.

* Fix permute and check order

* Correction on IB speed reporting

* Revert "Allow user to link layer with RCCL_IB_HCA_SKIP_LINK_LAYER (#361)"

This reverts commit caf5c9992a.
2021-08-24 09:42:04 -07:00
Wenkai Du d5f93649ff Merge remote-tracking branch 'origin/develop' into 2.10.3 2021-08-24 09:49:47 -07:00
Wenkai Du bf2339f93e Merge remote-tracking branch 'nccl/master' into 2.10.3 2021-07-30 16:23:14 -07:00
Wenkai Du 818cdb16a8 Query XGMI links from xml and adjust gfx906 channel usage (#410) 2021-07-27 17:32:41 -07:00
Wenkai Du 135d47d125 topo_expl: fix build after switching to rocm-smi-lib (#405)
* topo_expl: fix build after switching to rocm-smi-lib

* Use minimal of 4 channels for gfx908
2021-07-27 08:30:08 -07:00
Lu bd6dbca8fb Add more info to RCCL logging for topo-aware optim. 2021-07-22 09:52:39 -07:00
Ke Wen 7e51592129 2.10.3-1
Add support for bfloat16.
Add ncclAvg reduction operation.
Improve performance for aggregated operations.
Improve performance for tree.
Improve network error reporting.
Add NCCL_NET parameter to force a specific network.
Add NCCL_IB_QPS_PER_CONNECTION parameter to split IB traffic onto multiple queue pairs.
Fix topology detection error in WSL2.
Fix proxy memory elements affinity (improve alltoall performance).
Fix graph search on cubemesh topologies.
Fix hang in cubemesh during NVB connections.
2021-07-08 14:30:14 -07:00
Wenkai Du 56155ff5b6 Use rocm_smi_lib for getting topology information (#402)
* Use rocm_smi_lib for getting topology information

* Add rocm-smi-lib dependency to RCCL package
2021-07-08 13:23:11 -07:00
Wenkai Du 6dcae8a459 Select sendrecv path based on collective data size (#391)
* Select sendrecv path based on collective data size

* Add comments on packing and unpacking group field

* Toggling RCCL_P2P_NET_DISABLE in combined calls unit tests
2021-06-10 17:51:04 -07:00
Wenkai Du b815a2800f Setup collectives threshold for enabling intranet (#387)
* Setup collectives threshold for enabling intranet

* Use separate operation counters for coll and p2p
2021-06-09 13:24:26 -07:00
Wenkai Du e3abf1c2ec Merge remote-tracking branch 'nccl/master' into develop 2021-05-25 20:52:15 -07:00
Wenkai Du 87727383fe Merge remote-tracking branch 'nccl/master' into 2.9.8 2021-05-17 10:15:16 -07:00
Sylvain Jeaugey 3fec2fa5ee 2.9.9-1
Fix crash when setting NCCL_MAX_P2P_NCHANNELS below nchannels.
Fix hang during sendrecv dynamic NVB connection establishment on
cubemesh topologies.
Add environment variable to only use SHARP on communicators beyond
a given number of ranks.
Add debug subsystem to trace memory allocations.
Fix compilation with TRACE=1. (Issue #505)
2021-05-12 11:09:31 -07:00
gilbertlee-amd e796b1645c Clique tuning upgrade (#352) (#19)
* Enabling clique for any XGMI-connected topology, adding tuning
* Updating CHANGELOG for clique tuning
* Re-working clique barrier system to work on multi-process / multi-gpu
2021-05-11 08:44:59 -06:00
Sylvain Jeaugey ca8485b0d0 2.9.8-1
Fix memory leaks.
Fix crash in bootstrap error case.
Fix Collnet clean-up issue.
Make PCI switch vendor/device optional for XML injection.
Add support for nvidia-peermem module.
2021-05-10 14:00:03 -07:00
Wenkai Du a4ea1fed5b Merge remote-tracking branch 'nccl/master' into develop 2021-05-05 16:01:01 -07:00
Wenkai Du ed237dcaa7 Use better name for kernel collective trace enable (#357)
"NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT,COLL" enables collectives API
trace. Adding "RCCL_KERNEL_COLL_TRACE_ENABLE=1" enables kernel traces.
2021-04-26 08:35:53 -07:00
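The commit message above names three environment variables. As a sketch of how they might be set together when launching a program, assuming a hypothetical helper `rccl_trace_env` and a hypothetical binary path, the following Python builds the environment without touching the current process:

```python
import os


def rccl_trace_env(kernel_trace=False, base=None):
    """Return a copy of the environment with the trace variables set.

    The variable names and values come from the commit message above;
    the helper itself is hypothetical and only assembles a dict.
    """
    env = dict(os.environ if base is None else base)
    # Collectives API trace, as quoted in the commit message.
    env["NCCL_DEBUG"] = "INFO"
    env["NCCL_DEBUG_SUBSYS"] = "INIT,COLL"
    if kernel_trace:
        # Additionally enable kernel traces.
        env["RCCL_KERNEL_COLL_TRACE_ENABLE"] = "1"
    return env


if __name__ == "__main__":
    # Show only the variables this helper sets.
    for key, value in sorted(rccl_trace_env(kernel_trace=True, base={}).items()):
        print(f"{key}={value}")
```

The resulting dict can be passed as the `env` argument of `subprocess.run` when starting an application, for example `subprocess.run(["./my_rccl_app"], env=rccl_trace_env(kernel_trace=True))`, where `./my_rccl_app` is a placeholder.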