Mustafa Abduljabbar
128b0e7074
Remove MSCCL single node AllGather XMLs ( #1693 )
...
* Remove MSCCL single node XMLs
* Remove comment on MSCCL AG single node support
[ROCm/rccl commit: d665547eef ]
2025-05-13 17:07:03 -05:00
Mustafa Abduljabbar
a85cfaa680
[AllGather MSCCL] Multinode and single node support up to certain send count ( #1650 )
...
* Add multinode and singlenode allgather XML
[ROCm/rccl commit: aa7991dfc8 ]
2025-04-24 09:02:03 -04:00
Pedram Alizadeh
b225281747
single-node AR msccl algorithm tuning for MI300 ( #1629 )
...
[ROCm/rccl commit: 5b36b68d06 ]
2025-04-10 10:42:28 -04:00
Wenkai Du
5f8571dcbc
msccl: disable 1-shot xmls ( #1375 )
...
MSCCL 1-shot xmls may cause different output values on different ranks.
Disabling them for now to avoid undefined behavior in applications.
[ROCm/rccl commit: 62d10fdc25 ]
2024-10-14 15:10:53 -07:00
Wenkai Du
9ad1fe571b
Temporarily disable MSCCL all gather XMLs due to UT failure ( #1373 )
...
[ROCm/rccl commit: a680e329e6 ]
2024-10-12 08:43:16 -07:00
ClementLinCF
4f56aa5f8c
Optimize NCHANNELS and MSCCL config for gfx942 80CUs ( #1195 )
...
* Optimize NCHANNELS and MSCCL config for gfx942 80CUs
Set appropriately for different NCCL_MIN_NCHANNELS and MSCCL config,
potentially improving communication perf on the MI300x 80CUs
* Delete tools/msccl-algorithms/allreduce_1step_mccl_8_2_16777216_LL.xml
* Change the factor of gfx94 and update msccl config
[ROCm/rccl commit: cab25f919e ]
2024-06-01 07:07:46 -07:00
Wenkai Du
3906e992f8
MSCCL: add support for out-of-place all reduce ( #1156 )
...
[ROCm/rccl commit: 4e1b8c1cbb ]
2024-04-28 19:49:09 -07:00
Pedram Alizadeh
61f89d680d
msccl algorithms tuning for alltoall on MI300 ( #1120 )
...
Co-authored-by: PedramAlizadeh <amd@pmohamma.com>
[ROCm/rccl commit: c2fc1d6809 ]
2024-03-21 20:35:29 -04:00
Pedram Alizadeh
17b9546da9
msccl algorithms tuning for allgather on MI300 ( #1110 )
...
[ROCm/rccl commit: 50f22e8317 ]
2024-03-14 12:18:26 -04:00
Pedram Alizadeh
bf48d1bc4d
msccl algorithms tuning for allreduce on MI300 ( #1088 )
...
[ROCm/rccl commit: 5a0f9990a9 ]
2024-02-21 11:31:56 -05:00
Ziyue Yang
e3d45f9de4
Improve MSCCL algorithms ( #1023 )
...
[ROCm/rccl commit: 0a53077c9c ]
2024-01-03 14:51:34 -08:00
Ziyue Yang
62299668bd
Tune MSCCL all-reduce algorithm ( #1009 )
...
[ROCm/rccl commit: bb144dcd50 ]
2023-12-08 17:47:02 -06:00
Wen-Heng (Jack) Chung
0266febb31
Let 320KB message size use LL protocol. ( #1006 )
...
[ROCm/rccl commit: 8e8323252a ]
2023-12-06 18:14:31 -06:00
Ziyue Yang
cef45b8311
Fix mscclAlgoHandle not initialized issue ( #995 )
...
[ROCm/rccl commit: e44e112a17 ]
2023-12-01 07:58:01 -08:00
Ziyue Yang
f0c47d085e
Move MSCCL algorithm loading to initialization to work around HIP graph conflict ( #982 )
...
* MSCCL: pre-specify channels and pre-load algorithms
* add mutex
* fix bug
* clean include
* disable all-gathers temporarily
[ROCm/rccl commit: 4bb0b4a380 ]
2023-11-30 09:47:20 -08:00
Ziyue Yang
2351578d5b
Optimize MSCCL all-gather algorithms for gfx942 ( #964 )
...
[ROCm/rccl commit: 7ae95db5b8 ]
2023-11-15 08:18:59 -08:00
akolliasAMD
691df735a3
Revert "Introduce allgather for MSCCL on 8 sockets up to 320KB. ( #931 )" ( #939 )
...
This reverts commit 769f00db5c.
[ROCm/rccl commit: 9f02ee8dea ]
2023-10-30 23:52:58 -06:00
Wen-Heng (Jack) Chung
769f00db5c
Introduce allgather for MSCCL on 8 sockets up to 320KB. ( #931 )
...
[ROCm/rccl commit: bfb8642450 ]
2023-10-24 18:41:12 -05:00
Wen-Heng (Jack) Chung
89a8493ef8
Introduce allgather MSCCL XML specification for MI250X up to 320KB. ( #930 )
...
[ROCm/rccl commit: 3f9ffe4788 ]
2023-10-24 18:35:55 -05:00
Wen-Heng (Jack) Chung
fc2a13c077
Introduce 1-shot allreduce for MI250X Hayabusa. ( #929 )
...
[ROCm/rccl commit: 72d5fbddfd ]
2023-10-24 16:31:18 -05:00
Wen-Heng (Jack) Chung
49e52e7269
Introduce 1pass allreduce. Tailor it for very small message sizes <= 20KB. ( #919 )
...
[ROCm/rccl commit: 341926c60a ]
2023-10-16 16:31:08 -05:00
Wenkai Du
af04103d72
Add MSCCL xml files ( #861 )
...
[ROCm/rccl commit: aeca1af374 ]
2023-08-23 14:12:34 -07:00
Ziyue Yang
f7f669e7f0
MSCCL: Improve executor and integrate scheduler ( #694 )
...
* MSCCL: improve executor and add scheduler for testing
* Use external scheduler
* Fix cmake error
* Address comments
* Fix thread safe issue
* Make MSCCL lifecycle APIs thread safe
* Make MSCCL internal scheduler aware of topology hint
* Revise error message
[ROCm/rccl commit: e3b2342f39 ]
2023-03-14 14:34:25 -07:00