Commit Graph

711 Commits

Author SHA1 Message Date
Wenkai Du bc2932be4e Unit Test: use range from 0 to 1 for floating point test data (#459)
* Unit Test: use range from 0 to 1 for floating point test data

* gtest: Update init data and bfloat16 threshold
2021-11-08 09:21:09 -08:00
Stanley Tsang 2f87073514 Fixing cmake_install_prefix search to include /opt/rocm-xxxx (#462) 2021-11-06 07:58:26 -07:00
Wenkai Du 33bdd557c8 Do not use async stream for memory allocation and transport setup without graph (#460) 2021-11-05 13:39:14 -07:00
Wenkai Du 0331e39f81 Update Rome model matching (#461)
* Update Rome model matching

* Add another Rome model

* Automatically setup NET GDR level from model
2021-11-05 08:53:47 -07:00
rachanaramanna 04c10a6025 Update LICENSE.txt (#450) 2021-11-05 09:13:53 -06:00
Wenkai Du 26fc6b0919 profiling: fix incorrect print out in timing profile (#457) 2021-11-03 16:22:21 -07:00
pavahora ee1a11ca7e Updating googletest to 1.11.0 (#454)
Co-authored-by: Vahora <pavahora@amd.com>
2021-11-02 15:44:35 -06:00
Wenkai Du 29170a8b5f Support different protocols and algorithms in all reduce only build (#455)
* Support different protocols and algorithms in all reduce only build

* Restore deleted line in error
2021-11-02 08:39:08 -07:00
Wenkai Du 4643a17f83 Check rocm_smi64Config.h on older ROCm build (#452) 2021-10-28 07:26:28 -07:00
Wenkai Du d221fb672a Rework kernel launch code (#449) 2021-10-28 07:26:11 -07:00
Wenkai Du ec36c4c326 Enable timing profiling mode (#447) 2021-10-27 08:21:48 -07:00
Stanley Tsang 7e55b211c5 Build AllReduce only mode (#443)
* Initial commit of all_reduce_only support

* Working AllReduce only build

* Removing printfs and restoring release build

* Restore P2P index

* Updates to build_allreduce_only mode.

* cleaning up macro ifdefs
2021-10-26 17:36:46 -06:00
Wenkai Du 14a184eb67 Query XGMI link count through rocm_smi_lib API (#442) 2021-10-26 10:30:20 -07:00
Stanley Tsang d23dfc12c1 Re-enable use of chrpath to manually set rpath for unit tests. (#448)
* Re-enable use of chrpath to manually set rpath for unit tests.

* Add check for chrpath
2021-10-26 11:10:04 -06:00
gilbertlee-amd 18246fc191 [TransferBench] Changing default per block multiple to 256B, adding BLOCK_BYTES env var (#446) 2021-10-25 11:23:29 -06:00
Saad Rahim 31f9e79775 Removing unmaintained dockerfiles (#439) 2021-10-22 16:11:23 -06:00
Roopa Malavally 8486554e4b Update attributions.rst 2021-10-21 21:08:48 -07:00
gilbertlee-amd 550d732d6c TransferBench p2p benchmark mode (#444)
* [TransferBench] Adding a p2p benchmark mode
* [TransferBench] Switching to using single sync mode by default (USE_SINGLE_SYNC=1)
2021-10-21 15:28:16 -06:00
Wenkai Du b4cefc05ed Fix collnet tuning parameters (#441) 2021-10-20 20:45:36 -07:00
Wenkai Du 2508507d0a Fix PCIe gen detection (#437)
* Fix PCIe gen detection

* Update profiling support
2021-10-15 08:23:50 -07:00
gilbertlee-amd f6b7ac693e [TransferBench] Adding comment echoing to help distinguish tests (#438) 2021-10-13 14:56:57 -06:00
gilbertlee-amd 269f07fbc3 [TransferBench] Adding shared memory per threadblock env var. Defaulting to 1 threadblock per CU (#436) 2021-10-12 09:32:54 -06:00
Wenkai Du 2249a1d9d3 Add more Rome models (#434)
* Add more Rome models

* Update models and tuning

* Update tuning
2021-10-12 08:23:20 -07:00
gilbertlee-amd aa917c3fc8 [TransferBench] Adding ability to specify suffix for numBytes (#435) 2021-10-08 16:36:19 -06:00
gilbertlee-amd a6368bac99 Updating licensing / attribution for documentation (#432) 2021-10-08 13:17:24 -06:00
gilbertlee-amd e506d14d18 [TransferBench] Fixing advanced config, adding new all-1-hop sample test (#433)
* [TransferBench] Fixing advanced config, adding new all-1-hop sample test
2021-10-07 15:57:21 -06:00
Wenkai Du e0053311c0 Add another Rome model (#431) 2021-10-06 08:17:12 -07:00
Wenkai Du 29c729d8b6 Trim NICs when all GPUs are connected by XGMI (#430)
* Trim NICs when all GPUs are connected by XGMI

* Only enable clique with maximum of 2 hops
2021-10-05 18:27:43 -07:00
Wenkai Du 51a1cf428e Merge pull request #428 from ROCmSoftwarePlatform/2.10.3
Sync up with NCCL 2.10.3
2021-09-17 08:23:43 -07:00
Wenkai Du 5ae3f3f954 Remove extra L1 cache invalidate and restore __ATOMIC_SEQ_CST atomics (#426) 2021-09-14 18:30:16 -07:00
Wenkai Du 020484bf40 Use relaxed atomics and add sleep and wakeup in barrier loop (#425)
* Use relaxed atomics and add sleep and wakeup in barrier loop

* atomicAdd in ROCm 4.3 only support unsigned long long

* Switch to atomicAdd and atomicExch in more places

* Restore LOAD/STORE define to __ATOMIC_SEQ_CST

* Restore atomic for sizes FIFO
2021-09-13 17:03:49 -07:00
Wenkai Du ef432e48e1 Update tuning table (#424) 2021-09-13 08:39:01 -07:00
Wenkai Du a2421f8b4a Merge pull request #423 from wenkaidu/prim-test
rccl-prim-test: support 8p1h and 16p1h testing
2021-09-08 17:01:19 -07:00
Wenkai Du adb8d63352 Improve barrier implementation 2021-09-08 16:14:32 -05:00
Wenkai Du 31bd4236f1 Remove atomic from profiling 2021-09-08 14:20:32 -05:00
Wenkai Du 7558b5e2bf rccl-prim-test: enable 8p1h and 16p1h test 2021-09-08 11:51:26 -05:00
Wenkai Du b22d097524 Revert "rccl-prim-test: add all-to-all benchmark (#185)"
This reverts commit ebc823e603.
2021-09-07 16:41:46 -05:00
gilbertlee-amd 51d64894ff [TransferBench] ConfigFile parsing fixes, adding additional info (#422)
* [TransferBench] Adding GPU to NUMA distance detection, parsing fixes, config file generation fix

* [TransferBench] Fixing up NUMA node detection by filtering pools
2021-09-07 15:28:16 -06:00
Wenkai Du 5c8380ff5b Implement NIC identification and remapping (#420)
* Add 1H16P GPU model

* Implement NIC identification and remapping

* Revert "Sort IB devices based on device name (#413)"

This reverts commit 2d0ed8dff6.

* Fix permute and check order

* Correction on IB speed reporting

* Revert "Allow user to link layer with RCCL_IB_HCA_SKIP_LINK_LAYER (#361)"

This reverts commit caf5c9992a.
2021-08-24 09:42:04 -07:00
Wenkai Du 5f15ed6e3e Add gfx908 VM model (#418) 2021-08-10 08:55:11 -07:00
gilbertlee-amd 1ed272e5f0 [TransferBench] Removing dependency on hip_fp16 header, fixing swapped output CSV header (#416) 2021-08-04 10:53:41 -06:00
Wenkai Du 2d0ed8dff6 Sort IB devices based on device name (#413) 2021-08-03 15:32:41 -07:00
Gilbert Lee 68ec3f84e6 [TransferBench] Update to 2.10.3 2021-08-02 05:53:20 -05:00
Wenkai Du 5b72727670 Merge remote-tracking branch 'origin/develop' into 2.10.3 2021-09-15 10:33:25 -07:00
Wenkai Du 3667d308ab Merge remote-tracking branch 'origin/develop' into 2.10.3 2021-09-13 17:32:48 -07:00
Wenkai Du 8ee2b7932a Merge remote-tracking branch 'origin/develop' into 2.10.3 2021-09-13 15:51:53 -07:00
Wenkai Du 21969d7f89 Merge remote-tracking branch 'nccl/master' into 2.10.3 2021-09-01 16:34:44 -07:00
John Bachan 5f2f2f670f Fix to https://github.com/NVIDIA/nccl/issues/560
ncclGroup's containing operations of mixed datatype, element, or collective
would induce crash.
2021-08-31 15:50:05 -07:00
Gilbert Lee ae13d2a354 Modifying ReduceOrCopyMulti to accept number of preOp source to support clique-based kernels 2021-08-11 11:00:34 -05:00
Wenkai Du a4929465c5 Improve clique kernel performance by increasing unroll 2021-08-26 18:06:09 -07:00