Wenkai Du
c424979c14
ll_latency_test: fix time calculation ( #825 )
* ll_latency_test: fix time calculation
* Measure time after barrier
* Read time stamp only from thread 0
2023-07-27 09:04:35 -07:00
Wenkai Du
1c1ec096e2
tools: Add LL latency test ( #820 )
* Add LL latency test
* Correct name in usage
2023-07-25 20:08:04 -07:00
Bertan Dogancay
8bab4f04b7
Implement RCCL Replayer ( #817 )
* Implement RCCL Replayer
2023-07-24 16:26:22 -06:00
Wenkai Du
a7fcd58a97
Enable gfx94x ( #808 ) ( #816 )
(cherry picked from commit 94da229a7788d74685d1591a4e75a8341de64f41)
2023-07-21 07:31:27 -07:00
Ziyue Yang
b1cddcaf9a
Add GPU P2P ping-pong latency test tool ( #804 )
* Add GPU P2P ping-pong latency test tool
* Address comments
* Fix IPC issue in gfx94x
2023-07-14 07:41:29 -07:00
Wenkai Du
f41ea11444
rccl-prim-test: calculate iterations' standard deviation ( #803 )
* rccl-prim-test: calculate iterations' standard deviation
* Add default ring configuration for gfx940
* Use hipDeviceMallocUncached on gfx94x
2023-07-13 11:05:50 -07:00
Wenkai Du
43f13cd25a
rccl-prim-test: calculate throughput standard deviations ( #802 )
2023-07-12 10:04:40 -07:00
Wenkai Du
abd0615351
Merge remote-tracking branch 'nccl/master' into develop
2023-06-26 22:51:56 +00:00
Bertan Dogancay
0c77c66221
Disable Colltrace for --fast option ( #778 )
* Disable Colltrace for --fast option
* Limit nprocs for CI
2023-06-21 14:16:09 -06:00
Bertan Dogancay
f35777e9b0
improve compilation time and create timetrace plot ( #773 )
* improve compilation time and create time-trace plot
* set default value for nproc
2023-06-14 09:17:51 -06:00
akolliasAMD
9cdac774ea
Wall clock update and npkit trace script Update ( #771 )
* changed builtin clock to wall_clock64
* updated npkit_trace_generator to the new version of npkit
2023-06-07 17:47:10 -06:00
gilbertlee-amd
20b567caac
Updating NOTICES.txt and LICENSE.txt ( #770 )
2023-06-07 09:45:03 -06:00
Wenkai Du
3af90902c8
Add NCCL_NCHANNELS_PER_PEER override ( #767 )
Also fix topo_expl build issue
2023-06-06 08:41:38 -07:00
akolliasAMD
2b1efa9e9a
added time results on npkit generator ( #749 )
2023-05-30 12:57:25 -06:00
akolliasAMD
c88475462b
added modified npkit_trace_generator.py to scripts ( #738 )
* added modified npkit_trace_generator.py to scripts
2023-05-09 10:11:35 -06:00
Wenkai Du
addbf4bd90
rccl-prim-test: minor update ( #718 )
2023-04-03 07:30:04 -07:00
Ziyue Yang
e3b2342f39
MSCCL: Improve executor and integrate scheduler ( #694 )
* MSCCL: improve executor and add scheduler for testing
* Use external scheduler
* Fix cmake error
* Address comments
* Fix thread safe issue
* Make MSCCL lifecycle APIs thread safe
* Make MSCCL internal scheduler aware of topology hint
* Revise error message
2023-03-14 14:34:25 -07:00
Wenkai Du
e1cb45ff22
Merge remote-tracking branch 'nccl/master' into HEAD
2023-02-04 01:44:43 +00:00
Wenkai Du
a0dd8e0b84
topo_expl: fix broken build by adding hipify steps ( #670 )
2023-01-06 07:29:40 -08:00
Ziyue Yang
adafc0f759
Add MSCCL Support ( #658 )
* Add MSCCL support
* Add alignment and message size checking
* Fix nRanks checking, in-place and out-of-place tests and group call handling
* Fix hipGraph unit test
* Change MSCCL init warning to INFO
* Revise license info
2022-12-12 15:51:04 -08:00
gilbertlee-amd
faed69f9fc
Graph unit tests ( #656 )
* Adding hipGraph unit tests
2022-12-01 10:28:42 -07:00
Wenkai Du
94ad7f6f51
Update tuning table and fix topo_expl
2022-11-07 18:24:24 +00:00
Wenkai Du
4f0e223db4
Merge remote-tracking branch 'nccl/master' into develop
2022-10-20 15:41:29 +00:00
Wenkai Du
fc554a2428
topo_expl: fix compilation error ( #639 )
2022-10-19 14:19:50 -07:00
gilbertlee-amd
10dbd2a452
Fixing formatting for copyright ( #638 )
2022-10-19 13:43:21 -06:00
gilbertlee-amd
ebb8b5bf63
Updating files for missing licenses ( #637 )
2022-10-14 13:49:16 -06:00
gilbertlee-amd
bd7d589446
Removing TransferBench from tools ( #632 )
Point to new TransferBench repo
2022-09-30 11:53:32 -06:00
Wen-Heng (Jack) Chung
84054c3b30
Tweak unroll factors.
2022-09-22 13:03:04 -05:00
Gilbert Lee
009e79623f
Merge branch 'develop' into 2.13.4
2022-09-09 23:07:04 +00:00
gilbertlee-amd
dd56135a9a
Updating stream caching ( #614 )
- Adding non-captured hipStream for use in setup
2022-09-09 16:30:15 -06:00
gilbertlee-amd
65d78e9a1d
GraphBench ( #613 )
Adding simple GraphBench tool for comparing RCCL hipGraph performance
2022-09-09 12:12:25 -06:00
Wenkai Du
a79d9e3586
Merge remote-tracking branch 'nccl/master' into develop
2022-09-09 16:05:38 +00:00
akolliasAMD
06bce9d0c9
added stream synch after hipMemset ( #609 )
2022-08-30 16:18:37 -06:00
arvindcheru
2cb2f9493a
HIP Path default updated to ROCM_PATH (reorg path) ( #592 )
Updated default path for hip to ROCM_PATH (/opt/rocm instead of /opt/rocm/hip) as per new/current structure.
2022-08-04 13:38:41 -04:00
Edgar
0336ffdf70
Introduce multi-rank support per device.
This is a single commit of the source code changes required to
introduce support for multiple ranks per device.
A new interface (ncclCommRankInitMulti) has to be used to make use of
this new feature.
2022-06-10 14:23:12 +00:00
Wenkai Du
ef499c4810
Add another Rome model ( #553 )
* Add another Rome model
* Add option to force enable intranet on single node
* Limit p2p channels to number of ranks
* Refine p2p channels handling
2022-05-31 11:31:30 -07:00
Wenkai Du
c5b77121f0
Update Rome model ( #552 )
2022-05-26 09:59:23 -07:00
akolliasAMD
98f0809a39
Added creation of new tree and added switch for using treesplit for specific cases ( #551 )
2022-05-25 18:55:14 -04:00
Wenkai Du
283dc86a73
Refine and add new Rome models ( #548 )
2022-05-17 08:23:59 -07:00
gilbertlee-amd
685bcea127
[TransferBench] Syncing with TransferBench v1.02 ( #541 )
2022-04-27 20:43:24 -06:00
Wenkai Du
063da25563
topo_expl: fix build and add tuning support ( #539 )
2022-04-26 15:40:07 -07:00
Wenkai Du
d28e1cb44f
Merge remote-tracking branch 'nccl/master' into develop
2022-04-18 11:15:25 -07:00
Wenkai Du
2151c79d14
Add new Rome model ( #536 )
2022-04-13 11:45:40 -07:00
Wenkai Du
ba4c165bf3
Add new Rome model ( #535 )
2022-04-12 13:27:32 -07:00
gilbertlee-amd
def6832287
Transfer bench single stream mode ( #531 )
- Adding single stream mode
- Removing some unused env vars
- Adding output to CSV mode for p2p benchmark, topology listing modes
2022-04-08 15:20:55 -06:00
Wenkai Du
bbe780ca6c
Support multiple tuning tables ( #522 )
* Support multiple tuning tables
* [UnitTests] Skip managed memory testing
2022-03-31 17:09:21 -07:00
gilbertlee-amd
2d558c9abc
Adding explicit request for coarse-grained host memory due to changes in hipHostMalloc ( #517 )
2022-03-25 13:05:07 -06:00
Wenkai Du
cd17cf6dce
Update Rome model matching and add new models ( #516 )
* Update Rome model matching and add new models
* Add missing file
* Models update
2022-03-21 10:54:40 -07:00
Ziyue Yang
b569c0a1db
Add Pivot AllToAll algorithm for Rome model ( #503 )
* add a2a pivot interface
* remove debug info
* address comments
* fix bug
* remove custom script
* address comments
* fix bug
2022-02-20 21:09:47 -08:00
gilbertlee-amd
f3c2cafd9d
[TransferBench] Fix for cases with subsets of configured numa nodes ( #495 )
2022-02-07 12:16:19 -07:00