rocm-systems

Author	SHA1	Message	Date
Wenkai Du	7bbce085cc	Enable LL128 protocol support (#605) * Enable LL128 protocol support * Use shared memory object directly when possible	2022-09-08 14:45:27 -07:00
gilbertlee-amd	47b2fc3a30	Adding opt-in hipGraph support for RCCL via RCCL_ENABLE_HIPGRAPH (#608) Adding opt-in hipGraph support via RCCL_ENABLE_HIPGRAPH	2022-09-06 10:29:46 -06:00
Edgar Gabriel	4141ec1151	fix channelcount for multi-rank scenario	2022-08-22 19:09:22 +00:00
Wenkai Du	14b8ff153f	Repurpose profiling implementation to simple timestamps tracing (#600)	2022-08-18 15:34:46 -07:00
Ziyue Yang	f6b9686482	Improve alignment and tuning for Pivot A2A algorithm (#593) * Improve alignment and tuning for Pivot A2A algorithm * enable pivot a2a by default	2022-08-05 19:40:19 -07:00
gilbertlee-amd	a89a9966aa	Adding git hash info to version output line (#572)	2022-06-28 16:42:51 -06:00
Ziyue Yang	6e93fafdc3	Add Feature - Add NPKit Support in RCCL (#564) * apply npkit * fix bug * add npkit in readme	2022-06-20 14:30:19 -07:00
Edgar	0336ffdf70	Introduce multi-rank support per device. This is a single commit of the source code changes required to introduce support for multiple ranks per device. A new interface (ncclCommRankInitMulti) has to be used to make use of this new feature.	2022-06-10 14:23:12 +00:00
Wenkai Du	7a6c6927ae	Enable timing profile option (#558)	2022-06-03 07:05:13 -07:00
Aristotelis	e0864e7093	Merge remote-tracking branch 'ncclRepo/master' into develop	2022-06-02 15:27:24 +00:00
Wenkai Du	ef499c4810	Add another Rome model (#553) * Add another Rome model * Add option to force enable intranet on single node * Limit p2p channels to number of ranks * Refine p2p channels handling	2022-05-31 11:31:30 -07:00
akolliasAMD	98f0809a39	Added creation of new tree and added switch for using treesplit for specific cases (#551)	2022-05-25 18:55:14 -04:00
Wenkai Du	6707a270b1	Add switch for pivot alltoall kernel (#549)	2022-05-17 18:14:04 -07:00
Sylvain Jeaugey	7aa1c46fd5	2.12.12-1 Improve allreduce performance when we have more than one network interface per GPU and we need to use PXN to close rings. Add support for PCI Gen5 on 5.4 kernels. Fix crash when setting NCCL_SET_THREAD_NAME. Fix random crash in init due to uninitialized struct. Fix hang on cubemesh topologies. Add P2P_DIRECT_DISABLE parameter to disable direct access to pointers within a process.	2022-05-13 00:26:57 -07:00
Wenkai Du	d28e1cb44f	Merge remote-tracking branch 'nccl/master' into develop	2022-04-18 11:15:25 -07:00
Wenkai Du	bbe780ca6c	Support multiple tuning tables (#522) * Support multiple tuning tables * [UnitTests] Skip managed memory testing	2022-03-31 17:09:21 -07:00
Sylvain Jeaugey	353e8ba446	2.12.10-1 Fix bug with CollNet Fix bug with zero-bytes send/recv operations Fix NCCL_PARAM implementation to avoid taking a lock on every call Fix bug when setting NCCL_IB_QPS_PER_CONNECTION to more than one. Improve error reporting for network errors.	2022-03-30 02:27:01 -07:00
Sylvain Jeaugey	3c223c105a	2.12.7-1 Add network communication through another GPU connected with NVLink (PXN). Add aggregation of messages coming from different local GPUs through PXN and going to the same destination. Add new v5 plugin API with grouped receives and tags. Add compat for v4 plugins. Add naming of NCCL threads to help debugging. Fix NVLink detection and avoid data corruption when some NVLinks are down. Add support for Relaxed Ordering for IB. Add profiling and timing infrastructure.	2022-03-02 20:48:56 +01:00
Ziyue Yang	b569c0a1db	Add Pivot AllToAll algorithm for Rome model (#503) * add a2a pivot interface * remove debug info * address comments * fix bug * remove custom script * address comments * fix bug	2022-02-20 21:09:47 -08:00
Wenkai Du	598c6fdded	Update Rome models (#491)	2022-01-14 10:03:30 -08:00
Wenkai Du	434ecb0e1f	Merge remote-tracking branch 'origin/develop' into 2.11.4	2022-01-03 09:54:16 -08:00
Wenkai Du	e9bf01fb7e	Determine fine grained memory availability at RCCL bootstrapping (#471)	2021-11-19 08:12:53 -08:00
Wenkai Du	3a919c1f49	Merge remote-tracking branch 'nccl/master' into develop	2021-11-11 14:22:12 -08:00
Wenkai Du	0331e39f81	Update Rome model matching (#461) * Update Rome model matching * Add another Rome model * Automatically setup NET GDR level from model	2021-11-05 08:53:47 -07:00
Wenkai Du	26fc6b0919	profiling: fix incorrect print out in timing profile (#457)	2021-11-03 16:22:21 -07:00
Wenkai Du	ec36c4c326	Enable timing profiling mode (#447)	2021-10-27 08:21:48 -07:00
Wenkai Du	2508507d0a	Fix PCIe gen detection (#437) * Fix PCIe gen detection * Update profiling support	2021-10-15 08:23:50 -07:00
Wenkai Du	2249a1d9d3	Add more Rome models (#434) * Add more Rome models * Update models and tuning * Update tuning	2021-10-12 08:23:20 -07:00
Wenkai Du	29c729d8b6	Trim NICs when all GPUs are connected by XGMI (#430) * Trim NICs when all GPUs are connected by XGMI * Only enable clique with maximum of 2 hops	2021-10-05 18:27:43 -07:00
Ke Wen	e11238b302	2.11.4-1 Add new API for creating a reduction operation which multiplies the input by a rank-specific scalar before doing an inter-rank summation (see: ncclRedOpCreatePreMulSum). Improve CollNet (SHARP) performance of ncclAllReduce when captured in a CUDA Graph via user buffer registration. Add environment variable NCCL_NET_PLUGIN="<suffix>" to allow user to choose among multiple NCCL net plugins by substituting into "libnccl-net-<suffix>.so". Fix memory leak of NVB connections. Fix topology detection of IB Virtual Functions (SR-IOV).	2021-09-08 16:06:23 -07:00
Wenkai Du	8ee2b7932a	Merge remote-tracking branch 'origin/develop' into 2.10.3	2021-09-13 15:51:53 -07:00
Wenkai Du	ef432e48e1	Update tuning table (#424)	2021-09-13 08:39:01 -07:00
Wenkai Du	31bd4236f1	Remove atomic from profiling	2021-09-08 14:20:32 -05:00
Wenkai Du	5c8380ff5b	Implement NIC identification and remapping (#420) * Add 1H16P GPU model * Implement NIC identification and remapping * Revert "Sort IB devices based on device name (#413)" This reverts commit `2d0ed8dff6`. * Fix permute and check order * Correction on IB speed reporting * Revert "Allow user to link layer with RCCL_IB_HCA_SKIP_LINK_LAYER (#361)" This reverts commit `caf5c9992a`.	2021-08-24 09:42:04 -07:00
Wenkai Du	d5f93649ff	Merge remote-tracking branch 'origin/develop' into 2.10.3	2021-08-24 09:49:47 -07:00
Wenkai Du	bf2339f93e	Merge remote-tracking branch 'nccl/master' into 2.10.3	2021-07-30 16:23:14 -07:00
Wenkai Du	818cdb16a8	Query XGMI links from xml and adjust gfx906 channel usage (#410)	2021-07-27 17:32:41 -07:00
Wenkai Du	135d47d125	topo_expl: fix build after switching to rocm-smi-lib (#405) * topo_expl: fix build after switching to rocm-smi-lib * Use minimal of 4 channels for gfx908	2021-07-27 08:30:08 -07:00
Lu	bd6dbca8fb	Add more info to RCCL logging for topo-aware optim.	2021-07-22 09:52:39 -07:00
Ke Wen	7e51592129	2.10.3-1 Add support for bfloat16. Add ncclAvg reduction operation. Improve performance for aggregated operations. Improve performance for tree. Improve network error reporting. Add NCCL_NET parameter to force a specific network. Add NCCL_IB_QPS_PER_CONNECTION parameter to split IB traffic onto multiple queue pairs. Fix topology detection error in WSL2. Fix proxy memory elements affinity (improve alltoall performance). Fix graph search on cubemesh topologies. Fix hang in cubemesh during NVB connections.	2021-07-08 14:30:14 -07:00
Wenkai Du	56155ff5b6	Use rocm_smi_lib for getting topology information (#402) * Use rocm_smi_lib for getting topology information * Add rocm-smi-lib dependency to RCCL package	2021-07-08 13:23:11 -07:00
Wenkai Du	6dcae8a459	Select sendrecv path based on collective data size (#391) * Select sendrecv path based on collective data size * Add comments on packing and unpacking group field * Toggling RCCL_P2P_NET_DISABLE in combined calls unit tests	2021-06-10 17:51:04 -07:00
Wenkai Du	b815a2800f	Setup collectives threshold for enabling intranet (#387) * Setup collectives threshold for enabling intranet * Use separate operation counters for coll and p2p	2021-06-09 13:24:26 -07:00
Wenkai Du	e3abf1c2ec	Merge remote-tracking branch 'nccl/master' into develop	2021-05-25 20:52:15 -07:00
Wenkai Du	87727383fe	Merge remote-tracking branch 'nccl/master' into 2.9.8	2021-05-17 10:15:16 -07:00
Sylvain Jeaugey	3fec2fa5ee	2.9.9-1 Fix crash when setting NCCL_MAX_P2P_NCHANNELS below nchannels. Fix hang during sendrecv dynamic NVB connection establishment on cubemesh topologies. Add environment variable to only use SHARP on communicators beyond a given number of ranks. Add debug subsystem to trace memory allocations. Fix compilation with TRACE=1. (Issue #505)	2021-05-12 11:09:31 -07:00
gilbertlee-amd	e796b1645c	Clique tuning upgrade (#352) (#19) * Enabling clique for any XGMI-connected topology, adding tuning * Updating CHANGELOG for clique tuning * Re-working clique barrier system to work on multi-process / multi-gpu	2021-05-11 08:44:59 -06:00
Sylvain Jeaugey	ca8485b0d0	2.9.8-1 Fix memory leaks. Fix crash in bootstrap error case. Fix Collnet clean-up issue. Make PCI switch vendor/device optional for XML injection. Add support for nvidia-peermem module.	2021-05-10 14:00:03 -07:00
Wenkai Du	a4ea1fed5b	Merge remote-tracking branch 'nccl/master' into develop	2021-05-05 16:01:01 -07:00
Wenkai Du	ed237dcaa7	Use better name for kernel collective trace enable (#357) "NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=INIT,COLL" enables collectives API trace. Adding "RCCL_KERNEL_COLL_TRACE_ENABLE=1" enables kernel traces.	2021-04-26 08:35:53 -07:00


114 Commits