18 Commits

Author SHA1 Message Date
Wenkai Du a0cef69110 npkit: add broadcast trace (#1166) 2024-05-07 14:00:16 -07:00
Wenkai Du 4e1b8c1cbb MSCCL: add support for out-of-place all reduce (#1156) 2024-04-28 19:49:09 -07:00
Wenkai Du f330b82985 Revert "Use relaxed atomics for LL on GFX11 (#859)" (#1148)
This reverts commit 6a0a6a37d9.

Use inline asm for 128b load on GFX11 for better performance.
2024-04-26 07:49:55 -07:00
Shilei Tian efe99057b0 SWDEV-455705: Fix an UB that could lead to miscompilation (#1155) 2024-04-25 10:10:01 -07:00
Wenkai Du 9e0c9b4ed8 Replace __HIP_PLATFORM_HCC__ with __HIP_PLATFORM_AMD__ (#1154) 2024-04-25 07:19:18 -07:00
BertanDogancay e1a835910e Merge remote-tracking branch 'nccl/master' into develop 2024-04-23 13:34:00 -07:00
mberenjk 428837ffe4 replacing rccl_bfloat16 with hip_bfloat16 (#1126)
Co-authored-by: mberenjk <mberenjk@amd.com>
2024-04-11 11:30:37 -05:00
Andy li 6777e65c1d Enable fp8 support (#1101)
* initial checkin

* resolve cr comments

* resolve the build issue

* fix the data correctness issue

* update fp8 header file and update the unit test for fp8 support

* remove fp16 from fp8 headers

* fix ut issue and catch up the latest code from develop

* update according to cr comments

* update ut according to cr comments

* update num floats for each SumPostDiv from 4 to 6

* update fp8 header file name

* fix the typo
2024-03-08 15:17:53 -08:00
BertanDogancay 6f3310605c Disable unsupported ld/st instructions 2024-02-15 13:58:16 -08:00
BertanDogancay 76f83f95ab Merge remote-tracking branch 'rccl/develop' into 2.19.4 2024-02-15 13:37:14 -08:00
Sylvain Jeaugey b6475625fb 2.20.3-1
Add support for alternating rings, allow for cross-nic rings without
cross-rail communication.
Add support for user buffer registration for network send/recv.
Optimize aggregated operations to better utilize all channels.
Add flattening for BCM PCI gen5 switches.
Add support for inter-node NVLink communication.
Add support for port fusion in NET/IB.
Add support for ReduceScatter and AllGather using Collnet.
Update net API to v8.
Fix hang during A2A connection.
2024-02-13 04:22:38 -08:00
Wenkai Du d999d9ad21 Merge remote-tracking branch 'rccl/develop' into 2.19.4 2024-02-09 11:31:03 -06:00
BertanDogancay da85abab54 Fix stack size 2024-01-31 17:09:07 -08:00
Wenkai Du 1a134b283b Merge remote-tracking branch 'rccl/develop' into 2.19.4 2024-01-31 11:53:10 -06:00
BertanDogancay 31ec5d5cb0 correct data type 2024-01-28 19:55:19 -08:00
Wenkai Du 4aafb2a3c5 Fix sendrecv merge 2024-01-24 16:23:53 -08:00
BertanDogancay 81ddf9de89 Merge remote-tracking branch 'nccl/v2.19' into develop 2024-01-24 15:25:33 -08:00
Sylvain Jeaugey f9c3dc251e 2.19.1-1
Add local user buffer registration for NVLink SHARP.
Add tuning plugin support.
Increase net API to v7 to allow for device-side packet reordering;
remove support for v4 plugins.
Add support for RoCE ECE.
Add support for C2C links.
Better detect SHM allocation failures to avoid crash with Bus Error.
Fix missing thread unlocks in bootstrap (Fixes #936).
Disable network flush by default on H100.
Move device code from src/collectives/device to src/device.
2023-09-26 05:50:33 -07:00