Andy li
6777e65c1d
Enable fp8 support ( #1101 )
...
* initial checkin
* resolve cr comments
* resolve the build issue
* fix the data correctness issue
* update fp8 header file and update the unit test for fp8 support
* remove fp16 from fp8 headers
* fix ut issue and catch up the latest code from develop
* update according to cr comments
* update ut according to cr comments
* update num floats for each SumPostDiv from 4 to 6
* update fp8 header file name
* fix the typo
2024-03-08 15:17:53 -08:00
Wenkai Du
ff951e607d
Improve debug messages of memory allocations ( #1107 )
2024-03-08 10:55:10 -08:00
Wenkai Du
77615cce28
msccl: fix scratch memory allocation after API change ( #1103 )
2024-03-06 11:11:04 -08:00
Wenkai Du
cbd955627e
Add support for using contiguous memory for GPU direct RDMA ( #1096 )
...
Enabled by env var RCCL_NET_CONTIGUOUS_MEM=1
2024-02-29 10:06:43 -08:00
Wenkai Du
df98a6957d
Add another Rome model ( #1095 )
2024-02-28 10:46:05 -08:00
Bertan Dogancay
b617aecc31
Implement ROCTX ( #1094 )
...
* Implement roctx
2024-02-27 15:46:15 -07:00
Wenkai Du
74f9e5db64
Add new GPU model ( #1080 )
2024-02-23 12:19:42 -08:00
Wenkai Du
c5ab37211b
Update RCCL/MSCCL work FIFO depth to 256K ( #1091 )
2024-02-21 17:15:11 -08:00
Bertan Dogancay
b275ed0b56
LL128 check if all XGMI ( #1089 )
2024-02-21 09:41:40 -07:00
Bertan Dogancay
2fb12a9358
Merge pull request #1079 from BertanDogancay/2.19.4-sync
...
2.19.4 Sync
2024-02-16 09:50:11 -07:00
akolliasAMD
bac57421c7
Allow bus id to be null ( #1085 )
...
* Allow bus id to be null
2024-02-15 16:36:51 -07:00
BertanDogancay
6f3310605c
Disable unsupported ld/st instructions
2024-02-15 13:58:16 -08:00
BertanDogancay
76f83f95ab
Merge remote-tracking branch 'rccl/develop' into 2.19.4
2024-02-15 13:37:14 -08:00
Wenkai Du
51003c9980
Use native half without conversion ( #1083 )
2024-02-13 16:57:34 -08:00
Wenkai Du
1f0af90206
Fix undefined symbol when nvtx is not enabled ( #1082 )
2024-02-13 14:03:43 -08:00
BertanDogancay
32cca51894
Fix docs
2024-02-11 22:32:55 -08:00
Wenkai Du
d999d9ad21
Merge remote-tracking branch 'rccl/develop' into 2.19.4
2024-02-09 11:31:03 -06:00
Wenkai Du
5669b0d7b6
2.18.5 fix ( #1077 )
...
* Revert "Revert "2.18.5-1""
This reverts commit 767fde8210.
* Fix initial net device value
2024-02-09 09:18:38 -08:00
Bertan Dogancay
8a442faa12
Nvtx support ( #1076 )
...
* NVTX support
2024-02-08 14:08:24 -07:00
Wenkai Du
5257c753c5
msccl: use relaxed atomics on scratch buffer ( #1075 )
2024-02-08 12:09:56 -08:00
Wenkai Du
704c9ef0d1
Doubling P2P channels per peer on single node gfx94x only ( #1074 )
2024-02-07 14:05:57 -08:00
Wenkai Du
1d989f6524
Doubling P2P channels per peer on single node only ( #1069 )
2024-02-02 12:41:00 -08:00
BertanDogancay
12ac20ade5
Revert re-usage of connect and listen ports
2024-02-01 10:03:13 -08:00
BertanDogancay
00fdb1ef51
Clean up
2024-01-31 17:27:15 -08:00
BertanDogancay
da85abab54
Fix stack size
2024-01-31 17:09:07 -08:00
Wenkai Du
95f87232c4
Fix transport merge
2024-01-31 17:35:12 -06:00
Wenkai Du
1a134b283b
Merge remote-tracking branch 'rccl/develop' into 2.19.4
2024-01-31 11:53:10 -06:00
BertanDogancay
9ff53eeeae
Merge remote-tracking branch 'nccl/master' into develop
2024-01-30 14:43:43 -08:00
Bertan Dogancay
01b359027b
Include common.h in enqueue.cc instead ( #1067 )
2024-01-30 08:24:22 -08:00
Wenkai Du
f7550d83b8
msccl: ensure memory coherence after data receive ( #1062 )
2024-01-30 08:22:50 -08:00
BertanDogancay
31ec5d5cb0
correct data type
2024-01-28 19:55:19 -08:00
Pedram Alizadeh
ccfb35fa6d
modifying the tuning table to improve the performance of allreduce for 8MB and 16MB for single-node MI300X ( #1063 )
2024-01-26 09:05:53 -05:00
Wenkai Du
be8ef4367f
colltrace: fix dropped trace messages ( #1059 )
...
* colltrace: fix dropped trace messages
* Remove extra space
2024-01-25 13:31:53 -08:00
Wenkai Du
ffde530af5
Increase P2P channels per peer ( #1060 )
2024-01-25 11:21:58 -08:00
Wenkai Du
4aafb2a3c5
Fix sendrecv merge
2024-01-24 16:23:53 -08:00
BertanDogancay
81ddf9de89
Merge remote-tracking branch 'nccl/v2.19' into develop
2024-01-24 15:25:33 -08:00
Wenkai Du
7987015a19
Revert "msccl: build same number of kernels as in ROCm 5.7" ( #1058 )
...
This reverts commit f960174d03be7e5174baa83b256526d388a38842.
2024-01-24 08:43:50 -08:00
Bertan Dogancay
5564d65e71
Use binary search for direct function calls ( #1057 )
...
* Use binary search for direct function calls
* fix scratch mem issue on MI300
2024-01-22 17:37:56 -07:00
Bertan Dogancay
c4dbf8a914
Fix collective trace when rccl is configured ( #1056 )
...
* Fix collective trace when rccl is configured
2024-01-22 09:26:44 -07:00
Wenkai Du
7e25d5bc55
Use new HIP graph API compatible with CUDA 11030 ( #991 )
...
* Use new HIP graph API compatible with CUDA 11030
* Update dependency to ROCm 6.1
* Fix single stream use case
2024-01-21 19:00:50 -08:00
Nilesh M Negi
8b97a20943
COLLECTIVES: Switch to unroll 2 for MI300 ( #1051 )
...
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-01-19 12:16:05 -06:00
Bertan Dogancay
28d9b170c9
[DEV] Configure functions in RCCL ( #986 )
...
* configure functions in rccl
2024-01-18 15:07:16 -07:00
Wenkai Du
3325f96c56
Only use full MAXCHANNELS for gfx94x ( #1050 )
2024-01-17 09:00:49 -08:00
Pedram Alizadeh
b08124c85d
adding rccl tuning parameters for MI300X gfx942 with 8 GPUs single and multi-node ( #1047 )
2024-01-16 13:44:32 -05:00
Wenkai Du
261707d90a
Add option to force enable network transport on single node ( #1046 )
2024-01-16 07:54:18 -08:00
PedramAlizadeh
767fde8210
Revert "2.18.5-1"
...
This reverts commit 559b70f86c.
2024-01-12 16:54:19 +00:00
Bertan Dogancay
cf248d9402
Addressing the compiler warning ( #988 )
2024-01-10 14:59:40 -07:00
Hossein Pourreza
735178c1fe
cover more gpu/nic mapping cases ( #1037 )
2024-01-10 08:01:37 -08:00
Wenkai Du
5851ae5974
Re-enable LL128 on gfx90a if compiler supports it ( #1036 )
2024-01-10 08:01:11 -08:00
Nilesh M Negi
249e9f7f65
Un-escaped character causes error with address sanitizer builds ( #992 )
...
Signed-off-by: Nilesh M Negi <Nilesh.Negi@amd.com>
Co-authored-by: Jenkins <jenkins-compute@amd.com>
2024-01-09 13:28:32 -06:00