Mustafa Abduljabbar
d665547eef
Remove MSCCL single node AllGather XMLs (#1693)
* Remove MSCCL single node XMLs
* Remove comment on MSCCL AG single node support
2025-05-13 17:07:03 -05:00
Mustafa Abduljabbar
aa7991dfc8
[AllGather MSCCL] Multinode and single node support up to certain send count (#1650)
* Add multinode and singlenode allgather XML
2025-04-24 09:02:03 -04:00
Pedram Alizadeh
5b36b68d06
single-node AR msccl algorithm tuning for MI300 (#1629)
2025-04-10 10:42:28 -04:00
Wenkai Du
62d10fdc25
msccl: disable 1-shot xmls (#1375)
MSCCL 1-shot xmls may cause different output values on different ranks.
Disabling them for now to avoid undefined behavior in applications.
2024-10-14 15:10:53 -07:00
Wenkai Du
a680e329e6
Temporarily disable MSCCL all gather XMLs due to UT failure (#1373)
2024-10-12 08:43:16 -07:00
ClementLinCF
cab25f919e
Optimize NCHANNELS and MSCCL config for gfx942 80CUs (#1195)
* Optimize NCHANNELS and MSCCL config for gfx942 80CUs
Set appropriately for different NCCL_MIN_NCHANNELS and MSCCL config,
potentially improving communication perf on the MI300x 80CUs
* Delete tools/msccl-algorithms/allreduce_1step_mccl_8_2_16777216_LL.xml
* Change the factor of gfx94 and update msccl config
2024-06-01 07:07:46 -07:00
Wenkai Du
4e1b8c1cbb
MSCCL: add support for out-of-place all reduce (#1156)
2024-04-28 19:49:09 -07:00
Pedram Alizadeh
c2fc1d6809
msccl algorithms tuning for alltoall on MI300 (#1120)
Co-authored-by: PedramAlizadeh <amd@pmohamma.com>
2024-03-21 20:35:29 -04:00
Pedram Alizadeh
50f22e8317
msccl algorithms tuning for allgather on MI300 (#1110)
2024-03-14 12:18:26 -04:00
Pedram Alizadeh
5a0f9990a9
msccl algorithms tuning for allreduce on MI300 (#1088)
2024-02-21 11:31:56 -05:00
Ziyue Yang
0a53077c9c
Improve MSCCL algorithms (#1023)
2024-01-03 14:51:34 -08:00
Ziyue Yang
bb144dcd50
Tune MSCCL all-reduce algorithm (#1009)
2023-12-08 17:47:02 -06:00
Wen-Heng (Jack) Chung
8e8323252a
Let 320KB message size uses LL protocol. (#1006)
2023-12-06 18:14:31 -06:00
Ziyue Yang
e44e112a17
Fix mscclAlgoHandle not initialized issue (#995)
2023-12-01 07:58:01 -08:00
Ziyue Yang
4bb0b4a380
Move MSCCL algorithm loading to initialization to workaround HIP graph conflict (#982)
* MSCCL: pre-specify channels and pre-load algorithms
* add mutex
* fix bug
* clean include
* disable all-gathers temporarily
2023-11-30 09:47:20 -08:00
Ziyue Yang
7ae95db5b8
Optimize MSCCL all-gather algorithms for gfx942 (#964)
2023-11-15 08:18:59 -08:00
akolliasAMD
9f02ee8dea
Revert "Introduce allgather for MSCCL on 8 sockets up to 320KB. (#931)" (#939)
This reverts commit bfb8642450.
2023-10-30 23:52:58 -06:00
Wen-Heng (Jack) Chung
bfb8642450
Introduce allgather for MSCCL on 8 sockets up to 320KB. (#931)
2023-10-24 18:41:12 -05:00
Wen-Heng (Jack) Chung
3f9ffe4788
Introduce allgather MSCCL XML specification for MI250X up to 320KB. (#930)
2023-10-24 18:35:55 -05:00
Wen-Heng (Jack) Chung
72d5fbddfd
Introduce 1-shot allreduce for MI250X Hayabusa. (#929)
2023-10-24 16:31:18 -05:00
Wen-Heng (Jack) Chung
341926c60a
Introduce 1pass allreduce. Tailor it for very small message sizes <= 20KB. (#919)
2023-10-16 16:31:08 -05:00
Wenkai Du
aeca1af374
Add MSCCL xml files (#861)
2023-08-23 14:12:34 -07:00
Ziyue Yang
e3b2342f39
MSCCL: Improve executor and integrate scheduler (#694)
* MSCCL: improve executor and add scheduler for testing
* Use external scheduler
* Fix cmake error
* Address comments
* Fix thread safe issue
* Make MSCCL lifecycle APIs thread safe
* Make MSCCL internal scheduler aware of topology hint
* Revise error message
2023-03-14 14:34:25 -07:00