Wenkai Du
e9bf01fb7e
Determine fine grained memory availability at RCCL bootstrapping ( #471 )
2021-11-19 08:12:53 -08:00
Stanley Tsang
7b8b54955b
Set ROCM_PATH CMake variable in install script ( #470 )
...
* Fixing cmake_install_prefix search to include /opt/rocm-xxxx
* Removing all hard references to /opt/rocm with ROCM_PATH
* Setting ROCM_PATH CMake variable in install script
2021-11-18 14:44:19 -07:00
Wenkai Du
03a830293c
gtest: dynamically generate tests based on test machine's GPU count ( #467 )
...
* gtest: dynamically generate tests based on test machine's GPU count
* Adjust test element size and bfloat16 threshold for up to 16 GPUs
2021-11-16 10:28:26 -08:00
Stanley Tsang
a6dba6b9dd
Remove hardcoded references to /opt/rocm when using chrpath ( #469 )
...
* Fixing cmake_install_prefix search to include /opt/rocm-xxxx
* Removing all hard references to /opt/rocm with ROCM_PATH
2021-11-15 15:00:55 -07:00
Wenkai Du
e05de8fd26
Remove extra work element copy ( #465 )
2021-11-09 13:52:03 -08:00
gilbertlee-amd
1c7ef1b790
[TransferBench] Adding #CUs / RRLW mode to p2p benchmark ( #464 )
2021-11-08 14:36:04 -07:00
Wenkai Du
bc2932be4e
Unit Test: use range from 0 to 1 for floating point test data ( #459 )
...
* Unit Test: use range from 0 to 1 for floating point test data
* gtest: Update init data and bfloat16 threshold
2021-11-08 09:21:09 -08:00
Stanley Tsang
2f87073514
Fixing cmake_install_prefix search to include /opt/rocm-xxxx ( #462 )
2021-11-06 07:58:26 -07:00
Wenkai Du
33bdd557c8
Do not use async stream for memory allocation and transport setup without graph ( #460 )
2021-11-05 13:39:14 -07:00
Wenkai Du
0331e39f81
Update Rome model matching ( #461 )
...
* Update Rome model matching
* Add another Rome model
* Automatically setup NET GDR level from model
2021-11-05 08:53:47 -07:00
rachanaramanna
04c10a6025
Update LICENSE.txt ( #450 )
2021-11-05 09:13:53 -06:00
Wenkai Du
26fc6b0919
profiling: fix incorrect print out in timing profile ( #457 )
2021-11-03 16:22:21 -07:00
pavahora
ee1a11ca7e
Updating googletest to 1.11.0 ( #454 )
...
Co-authored-by: Vahora <pavahora@amd.com>
2021-11-02 15:44:35 -06:00
Wenkai Du
29170a8b5f
Support different protocols and algorithms in all reduce only build ( #455 )
...
* Support different protocols and algorithms in all reduce only build
* Restore deleted line in error
2021-11-02 08:39:08 -07:00
Wenkai Du
4643a17f83
Check rocm_smi64Config.h on older ROCm build ( #452 )
2021-10-28 07:26:28 -07:00
Wenkai Du
d221fb672a
Rework kernel launch code ( #449 )
2021-10-28 07:26:11 -07:00
Wenkai Du
ec36c4c326
Enable timing profiling mode ( #447 )
2021-10-27 08:21:48 -07:00
Stanley Tsang
7e55b211c5
Build AllReduce only mode ( #443 )
...
* Initial commit of all_reduce_only support
* Working AllReduce only build
* Removing printfs and restoring release build
* Restore P2P index
* Updates to build_allreduce_only mode.
* cleaning up macro ifdefs
2021-10-26 17:36:46 -06:00
Wenkai Du
14a184eb67
Query XGMI link count through rocm_smi_lib API ( #442 )
2021-10-26 10:30:20 -07:00
Stanley Tsang
d23dfc12c1
Re-enable use of chrpath to manually set rpath for unit tests. ( #448 )
...
* Re-enable use of chrpath to manually set rpath for unit tests.
* Add check for chrpath
2021-10-26 11:10:04 -06:00
gilbertlee-amd
18246fc191
[TransferBench] Changing default per block multiple to 256B, adding BLOCK_BYTES env var ( #446 )
2021-10-25 11:23:29 -06:00
Saad Rahim
31f9e79775
Removing unmaintained dockerfiles ( #439 )
2021-10-22 16:11:23 -06:00
Roopa Malavally
8486554e4b
Update attributions.rst
2021-10-21 21:08:48 -07:00
gilbertlee-amd
550d732d6c
TransferBench p2p benchmark mode ( #444 )
...
* [TransferBench] Adding a p2p benchmark mode
* [TransferBench] Switching to using single sync mode by default (USE_SINGLE_SYNC=1)
2021-10-21 15:28:16 -06:00
Wenkai Du
b4cefc05ed
Fix collnet tuning parameters ( #441 )
2021-10-20 20:45:36 -07:00
Wenkai Du
2508507d0a
Fix PCIe gen detection ( #437 )
...
* Fix PCIe gen detection
* Update profiling support
2021-10-15 08:23:50 -07:00
gilbertlee-amd
f6b7ac693e
[TransferBench] Adding comment echoing to help distinguish tests ( #438 )
2021-10-13 14:56:57 -06:00
gilbertlee-amd
269f07fbc3
[TransferBench] Adding shared memory per threadblock env var. Defaulting to 1 threadblock per CU ( #436 )
2021-10-12 09:32:54 -06:00
Wenkai Du
2249a1d9d3
Add more Rome models ( #434 )
...
* Add more Rome models
* Update models and tuning
* Update tuning
2021-10-12 08:23:20 -07:00
gilbertlee-amd
aa917c3fc8
[TransferBench] Adding ability to specify suffix for numBytes ( #435 )
2021-10-08 16:36:19 -06:00
gilbertlee-amd
a6368bac99
Updating licensing / attribution for documentation ( #432 )
2021-10-08 13:17:24 -06:00
gilbertlee-amd
e506d14d18
[TransferBench] Fixing advanced config, adding new all-1-hop sample test ( #433 )
...
* [TransferBench] Fixing advanced config, adding new all-1-hop sample test
2021-10-07 15:57:21 -06:00
Wenkai Du
e0053311c0
Add another Rome model ( #431 )
2021-10-06 08:17:12 -07:00
Wenkai Du
29c729d8b6
Trim NICs when all GPUs are connected by XGMI ( #430 )
...
* Trim NICs when all GPUs are connected by XGMI
* Only enable clique with maximum of 2 hops
2021-10-05 18:27:43 -07:00
Wenkai Du
51a1cf428e
Merge pull request #428 from ROCmSoftwarePlatform/2.10.3
...
Sync up with NCCL 2.10.3
2021-09-17 08:23:43 -07:00
Wenkai Du
5ae3f3f954
Remove extra L1 cache invalidate and restore __ATOMIC_SEQ_CST atomics ( #426 )
2021-09-14 18:30:16 -07:00
Wenkai Du
020484bf40
Use relaxed atomics and add sleep and wakeup in barrier loop ( #425 )
...
* Use relaxed atomics and add sleep and wakeup in barrier loop
* atomicAdd in ROCm 4.3 only supports unsigned long long
* Switch to atomicAdd and atomicExch in more places
* Restore LOAD/STORE define to __ATOMIC_SEQ_CST
* Restore atomic for sizes FIFO
2021-09-13 17:03:49 -07:00
Wenkai Du
ef432e48e1
Update tuning table ( #424 )
2021-09-13 08:39:01 -07:00
Wenkai Du
a2421f8b4a
Merge pull request #423 from wenkaidu/prim-test
...
rccl-prim-test: support 8p1h and 16p1h testing
2021-09-08 17:01:19 -07:00
Wenkai Du
adb8d63352
Improve barrier implementation
2021-09-08 16:14:32 -05:00
Wenkai Du
31bd4236f1
Remove atomic from profiling
2021-09-08 14:20:32 -05:00
Wenkai Du
7558b5e2bf
rccl-prim-test: enable 8p1h and 16p1h test
2021-09-08 11:51:26 -05:00
Wenkai Du
b22d097524
Revert "rccl-prim-test: add all-to-all benchmark ( #185 )"
...
This reverts commit ebc823e603.
2021-09-07 16:41:46 -05:00
gilbertlee-amd
51d64894ff
[TransferBench] ConfigFile parsing fixes, adding additional info ( #422 )
...
* [TransferBench] Adding GPU to NUMA distance detection, parsing fixes, config file generation fix
* [TransferBench] Fixing up NUMA node detection by filtering pools
2021-09-07 15:28:16 -06:00
Wenkai Du
5c8380ff5b
Implement NIC identification and remapping ( #420 )
...
* Add 1H16P GPU model
* Implement NIC identification and remapping
* Revert "Sort IB devices based on device name (#413)"
This reverts commit 2d0ed8dff6.
* Fix permute and check order
* Correction on IB speed reporting
* Revert "Allow user to link layer with RCCL_IB_HCA_SKIP_LINK_LAYER (#361)"
This reverts commit caf5c9992a.
2021-08-24 09:42:04 -07:00
Wenkai Du
5f15ed6e3e
Add gfx908 VM model ( #418 )
2021-08-10 08:55:11 -07:00
gilbertlee-amd
1ed272e5f0
[TransferBench] Removing dependency on hip_fp16 header, fixing swapped output CSV header ( #416 )
2021-08-04 10:53:41 -06:00
Wenkai Du
2d0ed8dff6
Sort IB devices based on device name ( #413 )
2021-08-03 15:32:41 -07:00
Gilbert Lee
68ec3f84e6
[TransferBench] Update to 2.10.3
2021-08-02 05:53:20 -05:00
Wenkai Du
5b72727670
Merge remote-tracking branch 'origin/develop' into 2.10.3
2021-09-15 10:33:25 -07:00