Wenkai Du | ff951e607d | 2024-03-08 10:55:10 -08:00
    Improve debug messages of memory allocations (#1107)
Wenkai Du | 77615cce28 | 2024-03-06 11:11:04 -08:00
    msccl: fix scratch memory allocation after API change (#1103)
Wenkai Du | cbd955627e | 2024-02-29 10:06:43 -08:00
    Add support for using contiguous memory for GPU direct RDMA (#1096)
    Enabled by environment variable RCCL_NET_CONTIGUOUS_MEM=1
Wenkai Du | df98a6957d | 2024-02-28 10:46:05 -08:00
    Add another Rome model (#1095)
Bertan Dogancay | b617aecc31 | 2024-02-27 15:46:15 -07:00
    Implement ROCTX (#1094)
    * Implement roctx
Wenkai Du | 74f9e5db64 | 2024-02-23 12:19:42 -08:00
    Add new GPU model (#1080)
Wenkai Du | c5ab37211b | 2024-02-21 17:15:11 -08:00
    Update RCCL/MSCCL work FIFO depth to 256K (#1091)
Bertan Dogancay | b275ed0b56 | 2024-02-21 09:41:40 -07:00
    LL128 check if all XGMI (#1089)
Bertan Dogancay | 2fb12a9358 | 2024-02-16 09:50:11 -07:00
    Merge pull request #1079 from BertanDogancay/2.19.4-sync
    2.19.4 Sync
akolliasAMD | bac57421c7 | 2024-02-15 16:36:51 -07:00
    Allow bus id to be null (#1085)
    * Allow bus id to be null
BertanDogancay | 6f3310605c | 2024-02-15 13:58:16 -08:00
    Disable unsupported ld/st instructions
BertanDogancay | 76f83f95ab | 2024-02-15 13:37:14 -08:00
    Merge remote-tracking branch 'rccl/develop' into 2.19.4
Wenkai Du | 51003c9980 | 2024-02-13 16:57:34 -08:00
    Use native half without conversion (#1083)
Wenkai Du | 1f0af90206 | 2024-02-13 14:03:43 -08:00
    Fix undefined symbol when nvtx is not enabled (#1082)
BertanDogancay | 32cca51894 | 2024-02-11 22:32:55 -08:00
    Fix docs
Wenkai Du | d999d9ad21 | 2024-02-09 11:31:03 -06:00
    Merge remote-tracking branch 'rccl/develop' into 2.19.4
Wenkai Du | 5669b0d7b6 | 2024-02-09 09:18:38 -08:00
    2.18.5 fix (#1077)
    * Revert "Revert "2.18.5-1""
      This reverts commit 767fde8210.
    * Fix initial net device value
Bertan Dogancay | 8a442faa12 | 2024-02-08 14:08:24 -07:00
    Nvtx support (#1076)
    * NVTX support
Wenkai Du | 5257c753c5 | 2024-02-08 12:09:56 -08:00
    msccl: use relaxed atomics on scratch buffer (#1075)
Wenkai Du | 704c9ef0d1 | 2024-02-07 14:05:57 -08:00
    Doubling P2P channels per peer on single node gfx94x only (#1074)
Wenkai Du | 1d989f6524 | 2024-02-02 12:41:00 -08:00
    Doubling P2P channels per peer on single node only (#1069)
BertanDogancay | 12ac20ade5 | 2024-02-01 10:03:13 -08:00
    Revert re-usage of connect and listen ports
BertanDogancay | 00fdb1ef51 | 2024-01-31 17:27:15 -08:00
    Clean up
BertanDogancay | da85abab54 | 2024-01-31 17:09:07 -08:00
    Fix stack size
Wenkai Du | 95f87232c4 | 2024-01-31 17:35:12 -06:00
    Fix transport merge
Wenkai Du | 1a134b283b | 2024-01-31 11:53:10 -06:00
    Merge remote-tracking branch 'rccl/develop' into 2.19.4
BertanDogancay | 9ff53eeeae | 2024-01-30 14:43:43 -08:00
    Merge remote-tracking branch 'nccl/master' into develop
Bertan Dogancay | 01b359027b | 2024-01-30 08:24:22 -08:00
    Include common.h in enqueue.cc instead (#1067)
Wenkai Du | f7550d83b8 | 2024-01-30 08:22:50 -08:00
    msccl: ensure memory coherence after data receive (#1062)
BertanDogancay | 31ec5d5cb0 | 2024-01-28 19:55:19 -08:00
    correct data type
Pedram Alizadeh | ccfb35fa6d | 2024-01-26 09:05:53 -05:00
    Modify the tuning table to improve allreduce performance for 8MB and 16MB message sizes on single-node MI300X (#1063)
Wenkai Du | be8ef4367f | 2024-01-25 13:31:53 -08:00
    colltrace: fix dropped trace messages (#1059)
    * colltrace: fix dropped trace messages
    * Remove extra space
Wenkai Du | ffde530af5 | 2024-01-25 11:21:58 -08:00
    Increase P2P channels per peer (#1060)
Wenkai Du | 4aafb2a3c5 | 2024-01-24 16:23:53 -08:00
    Fix sendrecv merge
BertanDogancay | 81ddf9de89 | 2024-01-24 15:25:33 -08:00
    Merge remote-tracking branch 'nccl/v2.19' into develop
Wenkai Du | 7987015a19 | 2024-01-24 08:43:50 -08:00
    Revert "msccl: build same number of kernels as in ROCm 5.7" (#1058)
    This reverts commit f960174d03be7e5174baa83b256526d388a38842.
Bertan Dogancay | 5564d65e71 | 2024-01-22 17:37:56 -07:00
    Use binary search for direct function calls (#1057)
    * Use binary search for direct function calls
    * Fix scratch memory issue on MI300
Bertan Dogancay | c4dbf8a914 | 2024-01-22 09:26:44 -07:00
    Fix collective trace when rccl is configured (#1056)
    * Fix collective trace when rccl is configured
Wenkai Du | 7e25d5bc55 | 2024-01-21 19:00:50 -08:00
    Use new HIP graph API compatible with CUDA 11030 (#991)
    * Use new HIP graph API compatible with CUDA 11030
    * Update dependency to ROCm 6.1
    * Fix single stream use case
Nilesh M Negi | 8b97a20943 | 2024-01-19 12:16:05 -06:00
    COLLECTIVES: Switch to unroll 2 for MI300 (#1051)
    Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Bertan Dogancay | 28d9b170c9 | 2024-01-18 15:07:16 -07:00
    [DEV] Configure functions in RCCL (#986)
    * configure functions in rccl
Wenkai Du | 3325f96c56 | 2024-01-17 09:00:49 -08:00
    Only use full MAXCHANNELS for gfx94x (#1050)
Pedram Alizadeh | b08124c85d | 2024-01-16 13:44:32 -05:00
    Add RCCL tuning parameters for MI300X (gfx942) with 8 GPUs, single- and multi-node (#1047)
Wenkai Du | 261707d90a | 2024-01-16 07:54:18 -08:00
    Add option to force enable network transport on single node (#1046)
PedramAlizadeh | 767fde8210 | 2024-01-12 16:54:19 +00:00
    Revert "2.18.5-1"
    This reverts commit 559b70f86c.
Bertan Dogancay | cf248d9402 | 2024-01-10 14:59:40 -07:00
    Addressing the compiler warning (#988)
Hossein Pourreza | 735178c1fe | 2024-01-10 08:01:37 -08:00
    Cover more GPU/NIC mapping cases (#1037)
Wenkai Du | 5851ae5974 | 2024-01-10 08:01:11 -08:00
    Re-enable LL128 on gfx90a if compiler supports it (#1036)
Nilesh M Negi | 249e9f7f65 | 2024-01-09 13:28:32 -06:00
    Unescaped character causes error in AddressSanitizer builds (#992)
    Signed-off-by: Nilesh M Negi <Nilesh.Negi@amd.com>
    Co-authored-by: Jenkins <jenkins-compute@amd.com>
Pedram Alizadeh | aa5c84c997 | 2024-01-09 13:29:29 -05:00
    Merge pull request #1022 from PedramAlizadeh/sync_nccl_2.18.6
    Sync to nccl 2.18.6