Wenkai Du
b0b2e27df3
Rework kernel launch code ( #449 )
...
[ROCm/rccl commit: d221fb672a ]
2021-10-28 07:26:11 -07:00
Wenkai Du
747216e2b2
Enable timing profiling mode ( #447 )
...
[ROCm/rccl commit: ec36c4c326 ]
2021-10-27 08:21:48 -07:00
Stanley Tsang
a6feafd5dc
Build AllReduce only mode ( #443 )
...
* Initial commit of all_reduce_only support
* Working AllReduce only build
* Removing printfs and restoring release build
* Restore P2P index
* Updates to build_allreduce_only mode.
* cleaning up macro ifdefs
[ROCm/rccl commit: 7e55b211c5 ]
2021-10-26 17:36:46 -06:00
Wenkai Du
15143b1cfb
Query XGMI link count through rocm_smi_lib API ( #442 )
...
[ROCm/rccl commit: 14a184eb67 ]
2021-10-26 10:30:20 -07:00
Stanley Tsang
0afd607328
Re-enable use of chrpath to manually set rpath for unit tests. ( #448 )
...
* Re-enable use of chrpath to manually set rpath for unit tests.
* Add check for chrpath
[ROCm/rccl commit: d23dfc12c1 ]
2021-10-26 11:10:04 -06:00
gilbertlee-amd
bf024320e4
[TransferBench] Changing default per block multiple to 256B, adding BLOCK_BYTES env var ( #446 )
...
[ROCm/rccl commit: 18246fc191 ]
2021-10-25 11:23:29 -06:00
Saad Rahim
83dc6396a5
Removing unmaintained dockerfiles ( #439 )
...
[ROCm/rccl commit: 31f9e79775 ]
2021-10-22 16:11:23 -06:00
Roopa Malavally
e381d2d830
Update attributions.rst
...
[ROCm/rccl commit: 8486554e4b ]
2021-10-21 21:08:48 -07:00
gilbertlee-amd
b795cc090b
TransferBench p2p benchmark mode ( #444 )
...
* [TransferBench] Adding a p2p benchmark mode
* [TransferBench] Switching to using single sync mode by default (USE_SINGLE_SYNC=1)
[ROCm/rccl commit: 550d732d6c ]
2021-10-21 15:28:16 -06:00
Wenkai Du
aac9d57633
Fix collnet tuning parameters ( #441 )
...
[ROCm/rccl commit: b4cefc05ed ]
2021-10-20 20:45:36 -07:00
Wenkai Du
4d979ce13d
Fix PCIe gen detection ( #437 )
...
* Fix PCIe gen detection
* Update profiling support
[ROCm/rccl commit: 2508507d0a ]
2021-10-15 08:23:50 -07:00
gilbertlee-amd
fe4285d002
[TransferBench] Adding comment echoing to help distinguish tests ( #438 )
...
[ROCm/rccl commit: f6b7ac693e ]
2021-10-13 14:56:57 -06:00
gilbertlee-amd
ad1a620333
[TransferBench] Adding shared memory per threadblock env var. Defaulting to 1 threadblock per CU ( #436 )
...
[ROCm/rccl commit: 269f07fbc3 ]
2021-10-12 09:32:54 -06:00
Wenkai Du
b587b55c2e
Add more Rome models ( #434 )
...
* Add more Rome models
* Update models and tuning
* Update tuning
[ROCm/rccl commit: 2249a1d9d3 ]
2021-10-12 08:23:20 -07:00
gilbertlee-amd
227848b70f
[TransferBench] Adding ability to specify suffix for numBytes ( #435 )
...
[ROCm/rccl commit: aa917c3fc8 ]
2021-10-08 16:36:19 -06:00
gilbertlee-amd
94c60f772d
Updating licensing / attribution for documentation ( #432 )
...
[ROCm/rccl commit: a6368bac99 ]
2021-10-08 13:17:24 -06:00
gilbertlee-amd
fef14c1b73
[TransferBench] Fixing advanced config, adding new all-1-hop sample test ( #433 )
...
* [TransferBench] Fixing advanced config, adding new all-1-hop sample test
[ROCm/rccl commit: e506d14d18 ]
2021-10-07 15:57:21 -06:00
Wenkai Du
d377c4dcc6
Add another Rome model ( #431 )
...
[ROCm/rccl commit: e0053311c0 ]
2021-10-06 08:17:12 -07:00
Wenkai Du
91eca0d7d2
Trim NICs when all GPUs are connected by XGMI ( #430 )
...
* Trim NICs when all GPUs are connected by XGMI
* Only enable clique with maximum of 2 hops
[ROCm/rccl commit: 29c729d8b6 ]
2021-10-05 18:27:43 -07:00
Gilbert Lee
5be5b37e19
[TransferBench] Update to 2.10.3
...
[ROCm/rccl commit: 68ec3f84e6 ]
2021-08-02 05:53:20 -05:00
Wenkai Du
a44ed86b46
Merge remote-tracking branch 'origin/develop' into 2.10.3
...
[ROCm/rccl commit: 5b72727670 ]
2021-09-15 10:33:25 -07:00
Wenkai Du
1eaf495391
Remove extra L1 cache invalidate and restore __ATOMIC_SEQ_CST atomics ( #426 )
...
[ROCm/rccl commit: 5ae3f3f954 ]
2021-09-14 18:30:16 -07:00
Wenkai Du
d6064367f0
Merge remote-tracking branch 'origin/develop' into 2.10.3
...
[ROCm/rccl commit: 3667d308ab ]
2021-09-13 17:32:48 -07:00
Wenkai Du
f4387b2954
Use relaxed atomics and add sleep and wakeup in barrier loop ( #425 )
...
* Use relaxed atomics and add sleep and wakeup in barrier loop
* atomicAdd in ROCm 4.3 only supports unsigned long long
* Switch to atomicAdd and atomicExch in more places
* Restore LOAD/STORE define to __ATOMIC_SEQ_CST
* Restore atomic for sizes FIFO
[ROCm/rccl commit: 020484bf40 ]
2021-09-13 17:03:49 -07:00
Wenkai Du
8acdb77cc0
Merge remote-tracking branch 'origin/develop' into 2.10.3
...
[ROCm/rccl commit: 8ee2b7932a ]
2021-09-13 15:51:53 -07:00
Wenkai Du
9ffeb41fe1
Update tuning table ( #424 )
...
[ROCm/rccl commit: ef432e48e1 ]
2021-09-13 08:39:01 -07:00
Wenkai Du
934885526d
Merge pull request #423 from wenkaidu/prim-test
...
rccl-prim-test: support 8p1h and 16p1h testing
[ROCm/rccl commit: a2421f8b4a ]
2021-09-08 17:01:19 -07:00
Wenkai Du
d2580c8cf5
Improve barrier implementation
...
[ROCm/rccl commit: adb8d63352 ]
2021-09-08 16:14:32 -05:00
Wenkai Du
d75504e9dc
Remove atomic from profiling
...
[ROCm/rccl commit: 31bd4236f1 ]
2021-09-08 14:20:32 -05:00
Wenkai Du
310d51056f
rccl-prim-test: enable 8p1h and 16p1h test
...
[ROCm/rccl commit: 7558b5e2bf ]
2021-09-08 11:51:26 -05:00
Wenkai Du
4f610a2239
Revert "rccl-prim-test: add all-to-all benchmark ( #185 )"
...
This reverts commit e3e1c6b29c.
[ROCm/rccl commit: b22d097524 ]
2021-09-07 16:41:46 -05:00
gilbertlee-amd
06b0e1c4e2
[TransferBench] ConfigFile parsing fixes, adding additional info ( #422 )
...
* [TransferBench] Adding GPU to NUMA distance detection, parsing fixes, config file generation fix
* [TransferBench] Fixing up NUMA node detection by filtering pools
[ROCm/rccl commit: 51d64894ff ]
2021-09-07 15:28:16 -06:00
Wenkai Du
95b37d97cf
Merge remote-tracking branch 'nccl/master' into 2.10.3
...
[ROCm/rccl commit: 21969d7f89 ]
2021-09-01 16:34:44 -07:00
John Bachan
04553b802a
Fix to https://github.com/NVIDIA/nccl/issues/560
...
ncclGroups containing operations of mixed datatype, element, or collective
would induce a crash.
[ROCm/rccl commit: 5f2f2f670f ]
2021-08-31 15:50:05 -07:00
Wenkai Du
b9508a6aba
Implement NIC identification and remapping ( #420 )
...
* Add 1H16P GPU model
* Implement NIC identification and remapping
* Revert "Sort IB devices based on device name (#413)"
This reverts commit de0c586bad.
* Fix permute and check order
* Correction on IB speed reporting
* Revert "Allow user to link layer with RCCL_IB_HCA_SKIP_LINK_LAYER (#361)"
This reverts commit fa690c47a0.
[ROCm/rccl commit: 5c8380ff5b ]
2021-08-24 09:42:04 -07:00
Gilbert Lee
e0fd3c6ba3
Modifying ReduceOrCopyMulti to accept number of preOp source to support clique-based kernels
...
[ROCm/rccl commit: ae13d2a354 ]
2021-08-11 11:00:34 -05:00
Wenkai Du
eaf54184bf
Improve clique kernel performance by increasing unroll
...
[ROCm/rccl commit: a4929465c5 ]
2021-08-26 18:06:09 -07:00
Wenkai Du
a5e85a66fd
Fix typo that affects clique kernels
...
[ROCm/rccl commit: 574f0aca53 ]
2021-08-26 11:10:58 -07:00
Wenkai Du
bd03ec2a45
Unit Test: support ncclAvg
...
[ROCm/rccl commit: 1faff323b4 ]
2021-08-25 14:15:54 -07:00
Wenkai Du
da7ac55e6f
Fix kernel data trace
...
[ROCm/rccl commit: 60ca7484c0 ]
2021-08-24 14:02:53 -07:00
Wenkai Du
4fd7a14087
Merge remote-tracking branch 'origin/develop' into 2.10.3
...
[ROCm/rccl commit: d5f93649ff ]
2021-08-24 09:49:47 -07:00
Wenkai Du
57518da006
Add gfx908 VM model ( #418 )
...
[ROCm/rccl commit: 5f15ed6e3e ]
2021-08-10 08:55:11 -07:00
Wenkai Du
51198b536d
Use noinline for kernel functions
...
[ROCm/rccl commit: 707c687090 ]
2021-08-06 09:15:04 -07:00
Wenkai Du
79bc1e3fde
Fix incorrect network proxy received bytes reporting
...
[ROCm/rccl commit: 01d3b20a66 ]
2021-08-05 17:45:48 -07:00
gilbertlee-amd
b0c3a1790f
[TransferBench] Removing dependency on hip_fp16 header, fixing swapped output CSV header ( #416 )
...
[ROCm/rccl commit: 1ed272e5f0 ]
2021-08-04 10:53:41 -06:00
Wenkai Du
6e997354e8
Merge branch 'develop' into 2.10.3
...
[ROCm/rccl commit: babbd1047b ]
2021-08-04 09:45:22 -07:00
Wenkai Du
de0c586bad
Sort IB devices based on device name ( #413 )
...
[ROCm/rccl commit: 2d0ed8dff6 ]
2021-08-03 15:32:41 -07:00
Wenkai Du
4b89e98675
Merge remote-tracking branch 'nccl/master' into 2.10.3
...
[ROCm/rccl commit: bf2339f93e ]
2021-07-30 16:23:14 -07:00
Wenkai Du
4b082ceb32
XGMI connection is always prioritized over NET regardless of hops ( #412 )
...
[ROCm/rccl commit: 3e27227562 ]
2021-07-29 11:12:42 -07:00
Eiden Yoshida
d4bdf8fab7
Add basic rtest.xml ( #411 )
...
[ROCm/rccl commit: 229ca88ee6 ]
2021-07-28 11:53:03 -06:00