Commit graph

227 commits

Author SHA1 Message Date
Wenkai Du cfc04a8aef p2p-latency-tests: fix build by switching to gcnArchName (#1030)
* p2p-latency-tests: fix build by switching to gcnArchName

* rccl-prim-test: switch to gcnArchName
2024-01-04 13:36:48 -08:00
Ziyue Yang 0a53077c9c Improve MSCCL algorithms (#1023) 2024-01-03 14:51:34 -08:00
Ziyue Yang bb144dcd50 Tune MSCCL all-reduce algorithm (#1009) 2023-12-08 17:47:02 -06:00
Wen-Heng (Jack) Chung 8e8323252a Let 320KB message size uses LL protocol. (#1006) 2023-12-06 18:14:31 -06:00
Ziyue Yang e44e112a17 Fix mscclAlgoHandle not initialized issue (#995) 2023-12-01 07:58:01 -08:00
Ziyue Yang 4bb0b4a380 Move MSCCL algorithm loading to initialization to workaround HIP graph conflict (#982)
* MSCCL: pre-specify channels and pre-load algorithms

* add mutex

* fix bug

* clean include

* disable all-gathers temporarily
2023-11-30 09:47:20 -08:00
akolliasAMD c71bae1608 npkit trace script now syncs the on average difference per rank (#981) 2023-11-28 11:03:55 -07:00
gilbertlee-amd 213869a6b4 JitterBench (#975) 2023-11-23 11:14:11 -07:00
Wenkai Du 50b2dd9fd7 Add special handling of gfx940 (#976)
* Add special handling of gfx940

* Update ring base
2023-11-22 15:07:36 -08:00
Ziyue Yang 7ae95db5b8 Optimize MSCCL all-gather algorithms for gfx942 (#964) 2023-11-15 08:18:59 -08:00
gilbertlee-amd d50bab28bf Adding LaunchBench tool (#952) 2023-11-03 12:04:52 -06:00
akolliasAMD 9f02ee8dea Revert "Introduce allgather for MSCCL on 8 sockets up to 320KB. (#931)" (#939)
This reverts commit bfb8642450.
2023-10-30 23:52:58 -06:00
Wen-Heng (Jack) Chung bfb8642450 Introduce allgather for MSCCL on 8 sockets up to 320KB. (#931) 2023-10-24 18:41:12 -05:00
Wen-Heng (Jack) Chung 3f9ffe4788 Introduce allgather MSCCL XML specification for MI250X up to 320KB. (#930) 2023-10-24 18:35:55 -05:00
Wen-Heng (Jack) Chung 72d5fbddfd Introduce 1-shot allreduce for MI250X Hayabusa. (#929) 2023-10-24 16:31:18 -05:00
Wen-Heng (Jack) Chung 341926c60a Introduce 1pass allreduce. Tailor it for very small message sizes <= 20KB. (#919) 2023-10-16 16:31:08 -05:00
mberenjk 7e2d905376 adding cuda support for EmptyKernelTest (#913) 2023-10-11 14:11:12 -05:00
gilbertlee-amd 7dbf47e07b Adding a simple EmptyKernelTest to measure launch latency (#910) 2023-10-04 17:22:48 -06:00
Pedram Alizadeh 3f6c2b9b32 Adding a script that will download/compile/run TransferBench/RCCL/UCX/RCCL-tests/RCCL-Unittests/hip-mpi-testsuite (#895) 2023-09-27 12:44:36 -04:00
akolliasAMD 762a42859e Fixed topo_expl (#891) 2023-09-13 12:05:35 -06:00
Audrey MP e58ec78d35 Gcn arch name (#886)
We use CMake to determine if we're compiling against a version of ROCm that supports gcnArchName and handles architecture checking appropriately. It includes a few helper functions as drop ins for the functionality we used gcnArch for before; sometimes to enable flags, and sometimes to set frequencies.
2023-09-12 15:34:40 -04:00
Wenkai Du c6dd6f6237 Update ll_latency_test and add CUDA version (#873) 2023-08-30 16:29:42 -07:00
Wenkai Du aa95985867 rccl-prim-test: use non-temporal access (#867) 2023-08-28 08:28:05 -07:00
Wenkai Du aeca1af374 Add MSCCL xml files (#861) 2023-08-23 14:12:34 -07:00
akolliasAMD d33cd5a233 NCCL_TREES variable and rome model fixes (#856) 2023-08-21 10:35:37 -06:00
Wenkai Du 148e3430f4 p2p/ll-latency-test: convert to single thread tests (#857) 2023-08-21 07:48:37 -07:00
Wenkai Du 7044599575 Add new model support (#847)
* Add new model support

* Update new rings
2023-08-10 17:14:51 -07:00
Wenkai Du 0441eec32e p2p_latency_test: clean up IPC temp files at exit (#832) 2023-07-31 08:11:07 -07:00
Wenkai Du c424979c14 ll_latency_test: fix time calculation (#825)
* ll_latency_test: fix time calculation

* Measure time after barrier

* Read time stamp only from thread 0
2023-07-27 09:04:35 -07:00
Wenkai Du 1c1ec096e2 tools: Add LL latency test (#820)
* Add LL latency test

* Correct name in usage
2023-07-25 20:08:04 -07:00
Bertan Dogancay 8bab4f04b7 Implement RCCL Replayer (#817)
* Implement RCCL Replayer
2023-07-24 16:26:22 -06:00
Wenkai Du a7fcd58a97 Enable gfx94x (#808) (#816)
(cherry picked from commit 94da229a7788d74685d1591a4e75a8341de64f41)
2023-07-21 07:31:27 -07:00
Ziyue Yang b1cddcaf9a Add GPU P2P ping-pong latency test tool (#804)
* Add GPU P2P ping-pong latency test tool

* Address comments

* Fix IPC issue in gfx94x
2023-07-14 07:41:29 -07:00
Wenkai Du f41ea11444 rccl-prim-test: calculate iterations' standard deviation (#803)
* rccl-prim-test: calculate iterations' standard deviation

* Add default ring configuration for gfx940

* Use hipDeviceMallocUncached on gfx94x
2023-07-13 11:05:50 -07:00
Wenkai Du 43f13cd25a rccl-prim-test: calculate throughput standard deviations (#802) 2023-07-12 10:04:40 -07:00
Wenkai Du abd0615351 Merge remote-tracking branch 'nccl/master' into develop 2023-06-26 22:51:56 +00:00
Bertan Dogancay 0c77c66221 Disable Colltrace for --fast option (#778)
* Disable Colltrace for --fast option

* Limit nprocs for CI
2023-06-21 14:16:09 -06:00
Bertan Dogancay f35777e9b0 improve compilation time and create timetrace plot (#773)
* improve compilation time and create time-trace plot

* set default value for nproc
2023-06-14 09:17:51 -06:00
akolliasAMD 9cdac774ea Wall clock update and npkit trace script Update (#771)
* changed builtin clock to wall_clock64
* updated npkit_Trace_generator to the new version of npkit
2023-06-07 17:47:10 -06:00
gilbertlee-amd 20b567caac Updating NOTICES.txt and LICENSE.txt (#770) 2023-06-07 09:45:03 -06:00
Wenkai Du 3af90902c8 Add NCCL_NCHANNELS_PER_PEER override (#767)
Also fix topol_expl build issue
2023-06-06 08:41:38 -07:00
akolliasAMD 2b1efa9e9a added time results on npkit generator (#749) 2023-05-30 12:57:25 -06:00
akolliasAMD c88475462b added modified npkit_trace_generator.py to scripts (#738)
* added modified npkit_trace_generator.py to scripts
2023-05-09 10:11:35 -06:00
Wenkai Du addbf4bd90 rccl-prim-test: minor update (#718) 2023-04-03 07:30:04 -07:00
Ziyue Yang e3b2342f39 MSCCL: Improve executor and integrate scheduler (#694)
* MSCCL: improve executor and add scheduler for testing

* Use external scheduler

* Fix cmake error

* Address comments

* Fix thread safe issue

* Make MSCCL lifecycle APIs thread safe

* Make MSCCL internal scheduler aware of topology hint

* Revise error message
2023-03-14 14:34:25 -07:00
Wenkai Du e1cb45ff22 Merge remote-tracking branch 'nccl/master' into HEAD 2023-02-04 01:44:43 +00:00
Wenkai Du a0dd8e0b84 topo_expl: fix broken build by adding hipify steps (#670) 2023-01-06 07:29:40 -08:00
Ziyue Yang adafc0f759 Add MSCCL Support (#658)
* Add MSCCL support

* Add alignment and message size checking

* Fix nRanks checking, in-place and out-of-place tests and group call handling

* Fix hipGraph unit test

* Change MSCCL init warning to INFO

* Revise license info
2022-12-12 15:51:04 -08:00
gilbertlee-amd faed69f9fc Graph unit tests (#656)
* Adding hipGraph unit tests
2022-12-01 10:28:42 -07:00
Wenkai Du 94ad7f6f51 Update tuning table and fix topo_expl 2022-11-07 18:24:24 +00:00