Commit Graph

717 Commits

Author SHA1 Message Date
Wenkai Du e9bf01fb7e Determine fine grained memory availability at RCCL bootstrapping (#471) 2021-11-19 08:12:53 -08:00
Stanley Tsang 7b8b54955b Set ROCM_PATH CMake variable in install script (#470)
* Fixing cmake_install_prefix search to include /opt/rocm-xxxx

* Removing all hard references to /opt/rocm with ROCM_PATH

* Setting ROCM_PATH CMake variable in install script
2021-11-18 14:44:19 -07:00
Wenkai Du 03a830293c gtest: dynamically generate tests based on test machine's GPU count (#467)
* gtest: dynamically generate tests based on test machine's GPU count

* Adjust test element size and bfloat16 threshold for up to 16 GPUs
2021-11-16 10:28:26 -08:00
Stanley Tsang a6dba6b9dd Remove hardcoded references to /opt/rocm when using chrpath (#469)
* Fixing cmake_install_prefix search to include /opt/rocm-xxxx

* Removing all hard references to /opt/rocm with ROCM_PATH
2021-11-15 15:00:55 -07:00
Wenkai Du e05de8fd26 Remove extra work element copy (#465) 2021-11-09 13:52:03 -08:00
gilbertlee-amd 1c7ef1b790 [TransferBench] Adding #CUs / RRLW mode to p2p benchmark (#464) 2021-11-08 14:36:04 -07:00
Wenkai Du bc2932be4e Unit Test: use range from 0 to 1 for floating point test data (#459)
* Unit Test: use range from 0 to 1 for floating point test data

* gtest: Update init data and bfloat16 threshold
2021-11-08 09:21:09 -08:00
Stanley Tsang 2f87073514 Fixing cmake_install_prefix search to include /opt/rocm-xxxx (#462) 2021-11-06 07:58:26 -07:00
Wenkai Du 33bdd557c8 Do not use async stream for memory allocation and transport setup without graph (#460) 2021-11-05 13:39:14 -07:00
Wenkai Du 0331e39f81 Update Rome model matching (#461)
* Update Rome model matching

* Add another Rome model

* Automatically setup NET GDR level from model
2021-11-05 08:53:47 -07:00
rachanaramanna 04c10a6025 Update LICENSE.txt (#450) 2021-11-05 09:13:53 -06:00
Wenkai Du 26fc6b0919 profiling: fix incorrect print out in timing profile (#457) 2021-11-03 16:22:21 -07:00
pavahora ee1a11ca7e Updating googletest to 1.11.0 (#454)
Co-authored-by: Vahora <pavahora@amd.com>
2021-11-02 15:44:35 -06:00
Wenkai Du 29170a8b5f Support different protocols and algorithms in all reduce only build (#455)
* Support different protocols and algorithms in all reduce only build

* Restore deleted line in error
2021-11-02 08:39:08 -07:00
Wenkai Du 4643a17f83 Check rocm_smi64Config.h on older ROCm build (#452) 2021-10-28 07:26:28 -07:00
Wenkai Du d221fb672a Rework kernel launch code (#449) 2021-10-28 07:26:11 -07:00
Wenkai Du ec36c4c326 Enable timing profiling mode (#447) 2021-10-27 08:21:48 -07:00
Stanley Tsang 7e55b211c5 Build AllReduce only mode (#443)
* Initial commit of all_reduce_only support

* Working AllReduce only build

* Removing printfs and restoring release build

* Restore P2P index

* Updates to build_allreduce_only mode.

* cleaning up macro ifdefs
2021-10-26 17:36:46 -06:00
Wenkai Du 14a184eb67 Query XGMI link count through rocm_smi_lib API (#442) 2021-10-26 10:30:20 -07:00
Stanley Tsang d23dfc12c1 Re-enable use of chrpath to manually set rpath for unit tests. (#448)
* Re-enable use of chrpath to manually set rpath for unit tests.

* Add check for chrpath
2021-10-26 11:10:04 -06:00
gilbertlee-amd 18246fc191 [TransferBench] Changing default per block multiple to 256B, adding BLOCK_BYTES env var (#446) 2021-10-25 11:23:29 -06:00
Saad Rahim 31f9e79775 Removing unmaintained dockerfiles (#439) 2021-10-22 16:11:23 -06:00
Roopa Malavally 8486554e4b Update attributions.rst 2021-10-21 21:08:48 -07:00
gilbertlee-amd 550d732d6c TransferBench p2p benchmark mode (#444)
* [TransferBench] Adding a p2p benchmark mode
* [TransferBench] Switching to using single sync mode by default (USE_SINGLE_SYNC=1)
2021-10-21 15:28:16 -06:00
Wenkai Du b4cefc05ed Fix collnet tuning parameters (#441) 2021-10-20 20:45:36 -07:00
Wenkai Du 2508507d0a Fix PCIe gen detection (#437)
* Fix PCIe gen detection

* Update profiling support
2021-10-15 08:23:50 -07:00
gilbertlee-amd f6b7ac693e [TransferBench] Adding comment echoing to help distinguish tests (#438) 2021-10-13 14:56:57 -06:00
gilbertlee-amd 269f07fbc3 [TransferBench] Adding shared memory per threadblock env var. Defaulting to 1 threadblock per CU (#436) 2021-10-12 09:32:54 -06:00
Wenkai Du 2249a1d9d3 Add more Rome models (#434)
* Add more Rome models

* Update models and tuning

* Update tuning
2021-10-12 08:23:20 -07:00
gilbertlee-amd aa917c3fc8 [TransferBench] Adding ability to specify suffix for numBytes (#435) 2021-10-08 16:36:19 -06:00
gilbertlee-amd a6368bac99 Updating licensing / attribution for documentation (#432) 2021-10-08 13:17:24 -06:00
gilbertlee-amd e506d14d18 [TransferBench] Fixing advanced config, adding new all-1-hop sample test (#433)
* [TransferBench] Fixing advanced config, adding new all-1-hop sample test
2021-10-07 15:57:21 -06:00
Wenkai Du e0053311c0 Add another Rome model (#431) 2021-10-06 08:17:12 -07:00
Wenkai Du 29c729d8b6 Trim NICs when all GPUs are connected by XGMI (#430)
* Trim NICs when all GPUs are connected by XGMI

* Only enable clique with maximum of 2 hops
2021-10-05 18:27:43 -07:00
Wenkai Du 51a1cf428e Merge pull request #428 from ROCmSoftwarePlatform/2.10.3
Sync up with NCCL 2.10.3
2021-09-17 08:23:43 -07:00
Wenkai Du 5ae3f3f954 Remove extra L1 cache invalidate and restore __ATOMIC_SEQ_CST atomics (#426) 2021-09-14 18:30:16 -07:00
Wenkai Du 020484bf40 Use relaxed atomics and add sleep and wakeup in barrier loop (#425)
* Use relaxed atomics and add sleep and wakeup in barrier loop

* atomicAdd in ROCm 4.3 only support unsigned long long

* Switch to atomicAdd and atomicExch in more places

* Restore LOAD/STORE define to __ATOMIC_SEQ_CST

* Restore atomic for sizes FIFO
2021-09-13 17:03:49 -07:00
Wenkai Du ef432e48e1 Update tuning table (#424) 2021-09-13 08:39:01 -07:00
Wenkai Du a2421f8b4a Merge pull request #423 from wenkaidu/prim-test
rccl-prim-test: support 8p1h and 16p1h testing
2021-09-08 17:01:19 -07:00
Wenkai Du adb8d63352 Improve barrier implementation 2021-09-08 16:14:32 -05:00
Wenkai Du 31bd4236f1 Remove atomic from profiling 2021-09-08 14:20:32 -05:00
Wenkai Du 7558b5e2bf rccl-prim-test: enable 8p1h and 16p1h test 2021-09-08 11:51:26 -05:00
Wenkai Du b22d097524 Revert "rccl-prim-test: add all-to-all benchmark (#185)"
This reverts commit ebc823e603.
2021-09-07 16:41:46 -05:00
gilbertlee-amd 51d64894ff [TransferBench] ConfigFile parsing fixes, adding additional info (#422)
* [TransferBench] Adding GPU to NUMA distance detection, parsing fixes, config file generation fix

* [TransferBench] Fixing up NUMA node detection by filtering pools
2021-09-07 15:28:16 -06:00
Wenkai Du 5c8380ff5b Implement NIC identification and remapping (#420)
* Add 1H16P GPU model

* Implement NIC identification and remapping

* Revert "Sort IB devices based on device name (#413)"

This reverts commit 2d0ed8dff6.

* Fix permute and check order

* Correction on IB speed reporting

* Revert "Allow user to link layer with RCCL_IB_HCA_SKIP_LINK_LAYER (#361)"

This reverts commit caf5c9992a.
2021-08-24 09:42:04 -07:00
Wenkai Du 5f15ed6e3e Add gfx908 VM model (#418) 2021-08-10 08:55:11 -07:00
gilbertlee-amd 1ed272e5f0 [TransferBench] Removing dependency on hip_fp16 header, fixing swapped output CSV header (#416) 2021-08-04 10:53:41 -06:00
Wenkai Du 2d0ed8dff6 Sort IB devices based on device name (#413) 2021-08-03 15:32:41 -07:00
Gilbert Lee 68ec3f84e6 [TransferBench] Update to 2.10.3 2021-08-02 05:53:20 -05:00
Wenkai Du 5b72727670 Merge remote-tracking branch 'origin/develop' into 2.10.3 2021-09-15 10:33:25 -07:00