Commit Graph

246 Commits

Author SHA1 Message Date
Wenkai Du 4e1b8c1cbb MSCCL: add support for out-of-place all reduce (#1156) 2024-04-28 19:49:09 -07:00
Wenkai Du 9e0c9b4ed8 Replace __HIP_PLATFORM_HCC__ with __HIP_PLATFORM_AMD__ (#1154) 2024-04-25 07:19:18 -07:00
gilbertlee-amd 4cb62f999a Rail optimization for rings (#1140)
- Modifies the ring creation algorithm to be friendlier to rail-optimized topologies (should not affect classic fabric topologies)
2024-04-15 12:03:57 -06:00
gilbertlee-amd 93982533d7 [topo_expl] Adding -n option to override number of nodes (#1134) 2024-04-04 15:11:47 -06:00
Wenkai Du e8c76fd806 rccl_prim_test: increase max number of workgroups and test iterations (#1132) 2024-04-03 11:29:21 -07:00
corey-derochie-amd 503a472a25 Replaced ROCmSoftwarePlatform and RadeonOpenCompute links with ROCm links. (#1125) 2024-03-25 16:29:13 -06:00
corey-derochie-amd 9eefc68cb5 Fixes the copyright comment block on each of topo_expl/models/*.xml. The format was not valid XML. (#1124) 2024-03-25 16:21:17 -06:00
Pedram Alizadeh c2fc1d6809 msccl algorithms tuning for alltoall on MI300 (#1120)
Co-authored-by: PedramAlizadeh <amd@pmohamma.com>
2024-03-21 20:35:29 -04:00
Pedram Alizadeh 50f22e8317 msccl algorithms tuning for allgather on MI300 (#1110) 2024-03-14 12:18:26 -04:00
Andy li 6777e65c1d Enable fp8 support (#1101)
* initial checkin

* resolve cr comments

* resolve the build issue

* fix the data correctless issue

* update fp8 header file and update the unit test for fp8 support

* remove fp16 from fp8 headers

* fix ut issue and catch up the latest code from develop

* udate according to cr comments

* update ut according to cr comments

* update num floats for each SumPostDiv from 4 to 6

* update fp8 header file name

* fix the typo
2024-03-08 15:17:53 -08:00
Wenkai Du d2224fd3e1 topo_expl: 2.19.4 update and fix build error (#1098) 2024-03-07 08:52:50 -08:00
Wenkai Du df98a6957d Add another Rome model (#1095) 2024-02-28 10:46:05 -08:00
Wenkai Du 74f9e5db64 Add new GPU model (#1080) 2024-02-23 12:19:42 -08:00
Pedram Alizadeh 5a0f9990a9 msccl algorithms tuning for allreduce on MI300 (#1088) 2024-02-21 11:31:56 -05:00
BertanDogancay 76f83f95ab Merge remote-tracking branch 'rccl/develop' into 2.19.4 2024-02-15 13:37:14 -08:00
akolliasAMD 16d7f372b7 Npkit updates (#1084)
* removed warmup runs to be an opt in
2024-02-15 07:48:45 -07:00
Wenkai Du d1575a1622 topo_expl: 2.19 update 2024-01-31 16:11:14 -06:00
Wenkai Du 600b44fee5 topo-expl: fix broken build (#1048) 2024-01-17 08:59:03 -08:00
Wenkai Du f7e39fced2 Doubling buffer size to fix NCCL INFO corruption with increased channels (#1035) 2024-01-08 08:14:33 -08:00
Wenkai Du cfc04a8aef p2p-latency-tests: fix build by switching to gcnArchName (#1030)
* p2p-latency-tests: fix build by switching to gcnArchName

* rccl-prim-test: switch to gcnArchName
2024-01-04 13:36:48 -08:00
Ziyue Yang 0a53077c9c Improve MSCCL algorithms (#1023) 2024-01-03 14:51:34 -08:00
Ziyue Yang bb144dcd50 Tune MSCCL all-reduce algorithm (#1009) 2023-12-08 17:47:02 -06:00
Wen-Heng (Jack) Chung 8e8323252a Let 320KB message size uses LL protocol. (#1006) 2023-12-06 18:14:31 -06:00
Ziyue Yang e44e112a17 Fix mscclAlgoHandle not initialized issue (#995) 2023-12-01 07:58:01 -08:00
Ziyue Yang 4bb0b4a380 Move MSCCL algorithm loading to initialization to workaround HIP graph conflict (#982)
* MSCCL: pre-specify channels and pre-load algorithms

* add mutex

* fix bug

* clean include

* disable all-gathers temporarily
2023-11-30 09:47:20 -08:00
akolliasAMD c71bae1608 npkit trace script now syncs the on average difference per rank (#981) 2023-11-28 11:03:55 -07:00
gilbertlee-amd 213869a6b4 JitterBench (#975) 2023-11-23 11:14:11 -07:00
Wenkai Du 50b2dd9fd7 Add special handling of gfx940 (#976)
* Add special handling of gfx940

* Update ring base
2023-11-22 15:07:36 -08:00
Ziyue Yang 7ae95db5b8 Optimize MSCCL all-gather algorithms for gfx942 (#964) 2023-11-15 08:18:59 -08:00
gilbertlee-amd d50bab28bf Adding LaunchBench tool (#952) 2023-11-03 12:04:52 -06:00
akolliasAMD 9f02ee8dea Revert "Introduce allgather for MSCCL on 8 sockets up to 320KB. (#931)" (#939)
This reverts commit bfb8642450.
2023-10-30 23:52:58 -06:00
Wen-Heng (Jack) Chung bfb8642450 Introduce allgather for MSCCL on 8 sockets up to 320KB. (#931) 2023-10-24 18:41:12 -05:00
Wen-Heng (Jack) Chung 3f9ffe4788 Introduce allgather MSCCL XML specification for MI250X up to 320KB. (#930) 2023-10-24 18:35:55 -05:00
Wen-Heng (Jack) Chung 72d5fbddfd Introduce 1-shot allreduce for MI250X Hayabusa. (#929) 2023-10-24 16:31:18 -05:00
Wen-Heng (Jack) Chung 341926c60a Introduce 1pass allreduce. Tailor it for very small message sizes <= 20KB. (#919) 2023-10-16 16:31:08 -05:00
mberenjk 7e2d905376 adding cuda support for EmptyKernelTest (#913) 2023-10-11 14:11:12 -05:00
gilbertlee-amd 7dbf47e07b Adding a simple EmptyKernelTest to measure launch latency (#910) 2023-10-04 17:22:48 -06:00
Pedram Alizadeh 3f6c2b9b32 Adding a script that will download/compile/run TransferBench/RCCL/UCX/RCCL-tests/RCCL-Unittests/hip-mpi-testsuite (#895) 2023-09-27 12:44:36 -04:00
akolliasAMD 762a42859e Fixed topo_expl (#891) 2023-09-13 12:05:35 -06:00
Audrey MP e58ec78d35 Gcn arch name (#886)
We use CMake to determine if we're compiling against a version of ROCm that supports gcnArchName and handles architecture checking appropriately. It includes a few helper functions as drop ins for the functionality we used gcnArch for before; sometimes to enable flags, and sometimes to set frequencies.
2023-09-12 15:34:40 -04:00
Wenkai Du c6dd6f6237 Update ll_latency_test and add CUDA version (#873) 2023-08-30 16:29:42 -07:00
Wenkai Du aa95985867 rccl-prim-test: use non-temporal access (#867) 2023-08-28 08:28:05 -07:00
Wenkai Du aeca1af374 Add MSCCL xml files (#861) 2023-08-23 14:12:34 -07:00
akolliasAMD d33cd5a233 NCCL_TREES variable and rome model fixes (#856) 2023-08-21 10:35:37 -06:00
Wenkai Du 148e3430f4 p2p/ll-latency-test: convert to single thread tests (#857) 2023-08-21 07:48:37 -07:00
Wenkai Du 7044599575 Add new model support (#847)
* Add new model support

* Update new rings
2023-08-10 17:14:51 -07:00
Wenkai Du 0441eec32e p2p_latency_test: clean up IPC temp files at exit (#832) 2023-07-31 08:11:07 -07:00
Wenkai Du c424979c14 ll_latency_test: fix time calculation (#825)
* ll_latency_test: fix time calculation

* Measure time after barrier

* Read time stamp only from thread 0
2023-07-27 09:04:35 -07:00
Wenkai Du 1c1ec096e2 tools: Add LL latency test (#820)
* Add LL latency test

* Correct name in usage
2023-07-25 20:08:04 -07:00
Bertan Dogancay 8bab4f04b7 Implement RCCL Replayer (#817)
* Implement RCCL Replayer
2023-07-24 16:26:22 -06:00