Commit Graph

626 Commits

Author SHA1 Message Date
Wenkai Du fd98ee84b4 Update Rome model matching (#461)
* Update Rome model matching

* Add another Rome model

* Automatically setup NET GDR level from model

[ROCm/rccl commit: 0331e39f81]
2021-11-05 08:53:47 -07:00
rachanaramanna 709b3dc85b Update LICENSE.txt (#450)
[ROCm/rccl commit: 04c10a6025]
2021-11-05 09:13:53 -06:00
Wenkai Du befaa35e12 profiling: fix incorrect print out in timing profile (#457)
[ROCm/rccl commit: 26fc6b0919]
2021-11-03 16:22:21 -07:00
pavahora 93290f6230 Updating googletest to 1.11.0 (#454)
Co-authored-by: Vahora <pavahora@amd.com>

[ROCm/rccl commit: ee1a11ca7e]
2021-11-02 15:44:35 -06:00
Wenkai Du df59f64e3f Support different protocols and algorithms in all reduce only build (#455)
* Support different protocols and algorithms in all reduce only build

* Restore deleted line in error

[ROCm/rccl commit: 29170a8b5f]
2021-11-02 08:39:08 -07:00
Wenkai Du a11b55a37f Check rocm_smi64Config.h on older ROCm build (#452)
[ROCm/rccl commit: 4643a17f83]
2021-10-28 07:26:28 -07:00
Wenkai Du b0b2e27df3 Rework kernel launch code (#449)
[ROCm/rccl commit: d221fb672a]
2021-10-28 07:26:11 -07:00
Wenkai Du 747216e2b2 Enable timing profiling mode (#447)
[ROCm/rccl commit: ec36c4c326]
2021-10-27 08:21:48 -07:00
Stanley Tsang a6feafd5dc Build AllReduce only mode (#443)
* Initial commit of all_reduce_only support

* Working AllReduce only build

* Removing printfs and restoring release build

* Restore P2P index

* Updates to build_allreduce_only mode.

* cleaning up macro ifdefs

[ROCm/rccl commit: 7e55b211c5]
2021-10-26 17:36:46 -06:00
Wenkai Du 15143b1cfb Query XGMI link count through rocm_smi_lib API (#442)
[ROCm/rccl commit: 14a184eb67]
2021-10-26 10:30:20 -07:00
Stanley Tsang 0afd607328 Re-enable use of chrpath to manually set rpath for unit tests. (#448)
* Re-enable use of chrpath to manually set rpath for unit tests.

* Add check for chrpath

[ROCm/rccl commit: d23dfc12c1]
2021-10-26 11:10:04 -06:00
gilbertlee-amd bf024320e4 [TransferBench] Changing default per block multiple to 256B, adding BLOCK_BYTES env var (#446)
[ROCm/rccl commit: 18246fc191]
2021-10-25 11:23:29 -06:00
Saad Rahim 83dc6396a5 Removing unmaintained dockerfiles (#439)
[ROCm/rccl commit: 31f9e79775]
2021-10-22 16:11:23 -06:00
Roopa Malavally e381d2d830 Update attributions.rst
[ROCm/rccl commit: 8486554e4b]
2021-10-21 21:08:48 -07:00
gilbertlee-amd b795cc090b TransferBench p2p benchmark mode (#444)
* [TransferBench] Adding a p2p benchmark mode
* [TransferBench] Switching to using single sync mode by default (USE_SINGLE_SYNC=1)

[ROCm/rccl commit: 550d732d6c]
2021-10-21 15:28:16 -06:00
Wenkai Du aac9d57633 Fix collnet tuning parameters (#441)
[ROCm/rccl commit: b4cefc05ed]
2021-10-20 20:45:36 -07:00
Wenkai Du 4d979ce13d Fix PCIe gen detection (#437)
* Fix PCIe gen detection

* Update profiling support

[ROCm/rccl commit: 2508507d0a]
2021-10-15 08:23:50 -07:00
gilbertlee-amd fe4285d002 [TransferBench] Adding comment echoing to help distinguish tests (#438)
[ROCm/rccl commit: f6b7ac693e]
2021-10-13 14:56:57 -06:00
gilbertlee-amd ad1a620333 [TransferBench] Adding shared memory per threadblock env var. Defaulting to 1 threadblock per CU (#436)
[ROCm/rccl commit: 269f07fbc3]
2021-10-12 09:32:54 -06:00
Wenkai Du b587b55c2e Add more Rome models (#434)
* Add more Rome models

* Update models and tuning

* Update tuning

[ROCm/rccl commit: 2249a1d9d3]
2021-10-12 08:23:20 -07:00
gilbertlee-amd 227848b70f [TransferBench] Adding ability to specify suffix for numBytes (#435)
[ROCm/rccl commit: aa917c3fc8]
2021-10-08 16:36:19 -06:00
gilbertlee-amd 94c60f772d Updating licensing / attribution for documentation (#432)
[ROCm/rccl commit: a6368bac99]
2021-10-08 13:17:24 -06:00
gilbertlee-amd fef14c1b73 [TransferBench] Fixing advanced config, adding new all-1-hop sample test (#433)
* [TransferBench] Fixing advanced config, adding new all-1-hop sample test

[ROCm/rccl commit: e506d14d18]
2021-10-07 15:57:21 -06:00
Wenkai Du d377c4dcc6 Add another Rome model (#431)
[ROCm/rccl commit: e0053311c0]
2021-10-06 08:17:12 -07:00
Wenkai Du 91eca0d7d2 Trim NICs when all GPUs are connected by XGMI (#430)
* Trim NICs when all GPUs are connected by XGMI

* Only enable clique with maximum of 2 hops

[ROCm/rccl commit: 29c729d8b6]
2021-10-05 18:27:43 -07:00
Gilbert Lee 5be5b37e19 [TransferBench] Update to 2.10.3
[ROCm/rccl commit: 68ec3f84e6]
2021-08-02 05:53:20 -05:00
Wenkai Du a44ed86b46 Merge remote-tracking branch 'origin/develop' into 2.10.3
[ROCm/rccl commit: 5b72727670]
2021-09-15 10:33:25 -07:00
Wenkai Du 1eaf495391 Remove extra L1 cache invalidate and restore __ATOMIC_SEQ_CST atomics (#426)
[ROCm/rccl commit: 5ae3f3f954]
2021-09-14 18:30:16 -07:00
Wenkai Du d6064367f0 Merge remote-tracking branch 'origin/develop' into 2.10.3
[ROCm/rccl commit: 3667d308ab]
2021-09-13 17:32:48 -07:00
Wenkai Du f4387b2954 Use relaxed atomics and add sleep and wakeup in barrier loop (#425)
* Use relaxed atomics and add sleep and wakeup in barrier loop

* atomicAdd in ROCm 4.3 only supports unsigned long long

* Switch to atomicAdd and atomicExch in more places

* Restore LOAD/STORE define to __ATOMIC_SEQ_CST

* Restore atomic for sizes FIFO

[ROCm/rccl commit: 020484bf40]
2021-09-13 17:03:49 -07:00
Wenkai Du 8acdb77cc0 Merge remote-tracking branch 'origin/develop' into 2.10.3
[ROCm/rccl commit: 8ee2b7932a]
2021-09-13 15:51:53 -07:00
Wenkai Du 9ffeb41fe1 Update tuning table (#424)
[ROCm/rccl commit: ef432e48e1]
2021-09-13 08:39:01 -07:00
Wenkai Du 934885526d Merge pull request #423 from wenkaidu/prim-test
rccl-prim-test: support 8p1h and 16p1h testing

[ROCm/rccl commit: a2421f8b4a]
2021-09-08 17:01:19 -07:00
Wenkai Du d2580c8cf5 Improve barrier implementation
[ROCm/rccl commit: adb8d63352]
2021-09-08 16:14:32 -05:00
Wenkai Du d75504e9dc Remove atomic from profiling
[ROCm/rccl commit: 31bd4236f1]
2021-09-08 14:20:32 -05:00
Wenkai Du 310d51056f rccl-prim-test: enable 8p1h and 16p1h test
[ROCm/rccl commit: 7558b5e2bf]
2021-09-08 11:51:26 -05:00
Wenkai Du 4f610a2239 Revert "rccl-prim-test: add all-to-all benchmark (#185)"
This reverts commit e3e1c6b29c.


[ROCm/rccl commit: b22d097524]
2021-09-07 16:41:46 -05:00
gilbertlee-amd 06b0e1c4e2 [TransferBench] ConfigFile parsing fixes, adding additional info (#422)
* [TransferBench] Adding GPU to NUMA distance detection, parsing fixes, config file generation fix

* [TransferBench] Fixing up NUMA node detection by filtering pools

[ROCm/rccl commit: 51d64894ff]
2021-09-07 15:28:16 -06:00
Wenkai Du 95b37d97cf Merge remote-tracking branch 'nccl/master' into 2.10.3
[ROCm/rccl commit: 21969d7f89]
2021-09-01 16:34:44 -07:00
John Bachan 04553b802a Fix to https://github.com/NVIDIA/nccl/issues/560
ncclGroups containing operations of mixed datatype, element, or collective
would induce a crash.


[ROCm/rccl commit: 5f2f2f670f]
2021-08-31 15:50:05 -07:00
Wenkai Du b9508a6aba Implement NIC identification and remapping (#420)
* Add 1H16P GPU model

* Implement NIC identification and remapping

* Revert "Sort IB devices based on device name (#413)"

This reverts commit de0c586bad.

* Fix permute and check order

* Correction on IB speed reporting

* Revert "Allow user to link layer with RCCL_IB_HCA_SKIP_LINK_LAYER (#361)"

This reverts commit fa690c47a0.

[ROCm/rccl commit: 5c8380ff5b]
2021-08-24 09:42:04 -07:00
Gilbert Lee e0fd3c6ba3 Modifying ReduceOrCopyMulti to accept number of preOp source to support clique-based kernels
[ROCm/rccl commit: ae13d2a354]
2021-08-11 11:00:34 -05:00
Wenkai Du eaf54184bf Improve clique kernel performance by increasing unroll
[ROCm/rccl commit: a4929465c5]
2021-08-26 18:06:09 -07:00
Wenkai Du a5e85a66fd Fix typo that affects clique kernels
[ROCm/rccl commit: 574f0aca53]
2021-08-26 11:10:58 -07:00
Wenkai Du bd03ec2a45 Unit Test: support ncclAvg
[ROCm/rccl commit: 1faff323b4]
2021-08-25 14:15:54 -07:00
Wenkai Du da7ac55e6f Fix kernel data trace
[ROCm/rccl commit: 60ca7484c0]
2021-08-24 14:02:53 -07:00
Wenkai Du 4fd7a14087 Merge remote-tracking branch 'origin/develop' into 2.10.3
[ROCm/rccl commit: d5f93649ff]
2021-08-24 09:49:47 -07:00
Wenkai Du 57518da006 Add gfx908 VM model (#418)
[ROCm/rccl commit: 5f15ed6e3e]
2021-08-10 08:55:11 -07:00
Wenkai Du 51198b536d Use noinline for kernel functions
[ROCm/rccl commit: 707c687090]
2021-08-06 09:15:04 -07:00
Wenkai Du 79bc1e3fde Fix incorrect network proxy received bytes reporting
[ROCm/rccl commit: 01d3b20a66]
2021-08-05 17:45:48 -07:00