Wenkai Du
cfc04a8aef
p2p-latency-tests: fix build by switching to gcnArchName (#1030)
* p2p-latency-tests: fix build by switching to gcnArchName
* rccl-prim-test: switch to gcnArchName
2024-01-04 13:36:48 -08:00
Ziyue Yang
0a53077c9c
Improve MSCCL algorithms (#1023)
2024-01-03 14:51:34 -08:00
Ziyue Yang
bb144dcd50
Tune MSCCL all-reduce algorithm (#1009)
2023-12-08 17:47:02 -06:00
Wen-Heng (Jack) Chung
8e8323252a
Let 320KB message size use LL protocol. (#1006)
2023-12-06 18:14:31 -06:00
Ziyue Yang
e44e112a17
Fix mscclAlgoHandle not initialized issue (#995)
2023-12-01 07:58:01 -08:00
Ziyue Yang
4bb0b4a380
Move MSCCL algorithm loading to initialization to work around HIP graph conflict (#982)
* MSCCL: pre-specify channels and pre-load algorithms
* add mutex
* fix bug
* clean include
* disable all-gathers temporarily
2023-11-30 09:47:20 -08:00
akolliasAMD
c71bae1608
npkit trace script now syncs the average difference per rank (#981)
2023-11-28 11:03:55 -07:00
gilbertlee-amd
213869a6b4
JitterBench (#975)
2023-11-23 11:14:11 -07:00
Wenkai Du
50b2dd9fd7
Add special handling of gfx940 (#976)
* Add special handling of gfx940
* Update ring base
2023-11-22 15:07:36 -08:00
Ziyue Yang
7ae95db5b8
Optimize MSCCL all-gather algorithms for gfx942 (#964)
2023-11-15 08:18:59 -08:00
gilbertlee-amd
d50bab28bf
Adding LaunchBench tool (#952)
2023-11-03 12:04:52 -06:00
akolliasAMD
9f02ee8dea
Revert "Introduce allgather for MSCCL on 8 sockets up to 320KB. (#931)" (#939)
This reverts commit bfb8642450.
2023-10-30 23:52:58 -06:00
Wen-Heng (Jack) Chung
bfb8642450
Introduce allgather for MSCCL on 8 sockets up to 320KB. (#931)
2023-10-24 18:41:12 -05:00
Wen-Heng (Jack) Chung
3f9ffe4788
Introduce allgather MSCCL XML specification for MI250X up to 320KB. (#930)
2023-10-24 18:35:55 -05:00
Wen-Heng (Jack) Chung
72d5fbddfd
Introduce 1-shot allreduce for MI250X Hayabusa. (#929)
2023-10-24 16:31:18 -05:00
Wen-Heng (Jack) Chung
341926c60a
Introduce 1pass allreduce. Tailor it for very small message sizes <= 20KB. (#919)
2023-10-16 16:31:08 -05:00
mberenjk
7e2d905376
adding CUDA support for EmptyKernelTest (#913)
2023-10-11 14:11:12 -05:00
gilbertlee-amd
7dbf47e07b
Adding a simple EmptyKernelTest to measure launch latency (#910)
2023-10-04 17:22:48 -06:00
Pedram Alizadeh
3f6c2b9b32
Adding a script that will download/compile/run TransferBench/RCCL/UCX/RCCL-tests/RCCL-Unittests/hip-mpi-testsuite (#895)
2023-09-27 12:44:36 -04:00
akolliasAMD
762a42859e
Fixed topo_expl (#891)
2023-09-13 12:05:35 -06:00
Audrey MP
e58ec78d35
Gcn arch name (#886)
We use CMake to determine whether we are compiling against a version of ROCm that supports gcnArchName, and handle architecture checking accordingly. This change includes a few helper functions as drop-in replacements for the functionality we previously used gcnArch for: sometimes to enable flags, and sometimes to set frequencies.
2023-09-12 15:34:40 -04:00
Wenkai Du
c6dd6f6237
Update ll_latency_test and add CUDA version (#873)
2023-08-30 16:29:42 -07:00
Wenkai Du
aa95985867
rccl-prim-test: use non-temporal access (#867)
2023-08-28 08:28:05 -07:00
Wenkai Du
aeca1af374
Add MSCCL XML files (#861)
2023-08-23 14:12:34 -07:00
akolliasAMD
d33cd5a233
NCCL_TREES variable and Rome model fixes (#856)
2023-08-21 10:35:37 -06:00
Wenkai Du
148e3430f4
p2p/ll-latency-test: convert to single-thread tests (#857)
2023-08-21 07:48:37 -07:00
Wenkai Du
7044599575
Add new model support (#847)
* Add new model support
* Update new rings
2023-08-10 17:14:51 -07:00
Wenkai Du
0441eec32e
p2p_latency_test: clean up IPC temp files at exit (#832)
2023-07-31 08:11:07 -07:00
Wenkai Du
c424979c14
ll_latency_test: fix time calculation (#825)
* ll_latency_test: fix time calculation
* Measure time after barrier
* Read timestamp only from thread 0
2023-07-27 09:04:35 -07:00
Wenkai Du
1c1ec096e2
tools: Add LL latency test (#820)
* Add LL latency test
* Correct name in usage
2023-07-25 20:08:04 -07:00
Bertan Dogancay
8bab4f04b7
Implement RCCL Replayer (#817)
* Implement RCCL Replayer
2023-07-24 16:26:22 -06:00
Wenkai Du
a7fcd58a97
Enable gfx94x (#808) (#816)
(cherry picked from commit 94da229a7788d74685d1591a4e75a8341de64f41)
2023-07-21 07:31:27 -07:00
Ziyue Yang
b1cddcaf9a
Add GPU P2P ping-pong latency test tool (#804)
* Add GPU P2P ping-pong latency test tool
* Address comments
* Fix IPC issue in gfx94x
2023-07-14 07:41:29 -07:00
Wenkai Du
f41ea11444
rccl-prim-test: calculate iterations' standard deviation (#803)
* rccl-prim-test: calculate iterations' standard deviation
* Add default ring configuration for gfx940
* Use hipDeviceMallocUncached on gfx94x
2023-07-13 11:05:50 -07:00
Wenkai Du
43f13cd25a
rccl-prim-test: calculate throughput standard deviations (#802)
2023-07-12 10:04:40 -07:00
Wenkai Du
abd0615351
Merge remote-tracking branch 'nccl/master' into develop
2023-06-26 22:51:56 +00:00
Bertan Dogancay
0c77c66221
Disable Colltrace for --fast option (#778)
* Disable Colltrace for --fast option
* Limit nprocs for CI
2023-06-21 14:16:09 -06:00
Bertan Dogancay
f35777e9b0
improve compilation time and create time-trace plot (#773)
* improve compilation time and create time-trace plot
* set default value for nproc
2023-06-14 09:17:51 -06:00
akolliasAMD
9cdac774ea
Wall clock update and npkit trace script update (#771)
* changed builtin clock to wall_clock64
* updated npkit_trace_generator to the new version of npkit
2023-06-07 17:47:10 -06:00
gilbertlee-amd
20b567caac
Updating NOTICES.txt and LICENSE.txt (#770)
2023-06-07 09:45:03 -06:00
Wenkai Du
3af90902c8
Add NCCL_NCHANNELS_PER_PEER override (#767)
Also fix topo_expl build issue
2023-06-06 08:41:38 -07:00
akolliasAMD
2b1efa9e9a
added time results to npkit generator (#749)
2023-05-30 12:57:25 -06:00
akolliasAMD
c88475462b
added modified npkit_trace_generator.py to scripts (#738)
* added modified npkit_trace_generator.py to scripts
2023-05-09 10:11:35 -06:00
Wenkai Du
addbf4bd90
rccl-prim-test: minor update (#718)
2023-04-03 07:30:04 -07:00
Ziyue Yang
e3b2342f39
MSCCL: Improve executor and integrate scheduler (#694)
* MSCCL: improve executor and add scheduler for testing
* Use external scheduler
* Fix cmake error
* Address comments
* Fix thread-safety issue
* Make MSCCL lifecycle APIs thread safe
* Make MSCCL internal scheduler aware of topology hint
* Revise error message
2023-03-14 14:34:25 -07:00
Wenkai Du
e1cb45ff22
Merge remote-tracking branch 'nccl/master' into HEAD
2023-02-04 01:44:43 +00:00
Wenkai Du
a0dd8e0b84
topo_expl: fix broken build by adding hipify steps (#670)
2023-01-06 07:29:40 -08:00
Ziyue Yang
adafc0f759
Add MSCCL Support (#658)
* Add MSCCL support
* Add alignment and message size checking
* Fix nRanks checking, in-place and out-of-place tests and group call handling
* Fix hipGraph unit test
* Change MSCCL init warning to INFO
* Revise license info
2022-12-12 15:51:04 -08:00
gilbertlee-amd
faed69f9fc
Graph unit tests (#656)
* Adding hipGraph unit tests
2022-12-01 10:28:42 -07:00
Wenkai Du
94ad7f6f51
Update tuning table and fix topo_expl
2022-11-07 18:24:24 +00:00