Commit Graph

734 Commits

Author SHA1 Message Date
gilbertlee-amd 7add135529 Updating CHANGELOG.md (#488) 2022-01-10 11:34:31 -07:00
Wenkai Du 3669e12432 Use hipGraph instead of cudaGraph (#487) 2022-01-10 08:26:01 -08:00
Wenkai Du 565fbeb5e9 Tune collectives for 2.11.4 (#486) 2022-01-10 08:25:47 -08:00
gilbertlee-amd 2530a2f084 [TransferBench] Updating for 2.11.4. Decoupling from RCCL kernel (#485) 2022-01-05 16:33:25 -07:00
Wenkai Du 6268b87c16 Unit tests: fix number of GPU detection (#484) 2022-01-05 15:06:12 -08:00
Wenkai Du 4234a638b5 Merge pull request #482 from ROCmSoftwarePlatform/2.11.4
Sync up with 2.11.4
2022-01-05 09:31:51 -08:00
Wenkai Du f8d0775a6f Add another Rome model (#483) 2022-01-05 09:26:31 -08:00
Wenkai Du 434ecb0e1f Merge remote-tracking branch 'origin/develop' into 2.11.4 2022-01-03 09:54:16 -08:00
Stanley Tsang bbbb35ceec Fixing setting of GPUs to 2 when 1 or less GPUs on system for unit tests (#481) 2021-12-09 11:04:31 -07:00
Wenkai Du a94b953bcc Update Rome model (#479) 2021-12-03 08:24:51 -08:00
Wenkai Du 8a08a2f579 Update tuning parameters (#478) 2021-11-30 08:51:11 -08:00
gilbertlee-amd 1157c2edfe [TransferBench] Adding more preset benchmarks to filter read mode, cpu vs gpu pairs (#477) 2021-11-24 18:05:37 -07:00
gilbertlee-amd 539de1216f Minor cppcheck fixes, adding suppression file (#475)
* Minor cppcheck fixes, adding suppression file
2021-11-24 10:23:59 -07:00
Wenkai Du e9bf01fb7e Determine fine grained memory availability at RCCL bootstrapping (#471) 2021-11-19 08:12:53 -08:00
Stanley Tsang 7b8b54955b Set ROCM_PATH CMake variable in install script (#470)
* Fixing cmake_install_prefix search to include /opt/rocm-xxxx

* Removing all hard references to /opt/rocm with ROCM_PATH

* Setting ROCM_PATH CMake variable in install script
2021-11-18 14:44:19 -07:00
Wenkai Du 03a830293c gtest: dynamically generate tests based on test machine's GPU count (#467)
* gtest: dynamically generate tests based on test machine's GPU count

* Adjust test element size and bfloat16 threshold for up to 16 GPUs
2021-11-16 10:28:26 -08:00
Stanley Tsang a6dba6b9dd Remove hardcoded references to /opt/rocm when using chrpath (#469)
* Fixing cmake_install_prefix search to include /opt/rocm-xxxx

* Removing all hard references to /opt/rocm with ROCM_PATH
2021-11-15 15:00:55 -07:00
Wenkai Du 3a919c1f49 Merge remote-tracking branch 'nccl/master' into develop 2021-11-11 14:22:12 -08:00
Wenkai Du e05de8fd26 Remove extra work element copy (#465) 2021-11-09 13:52:03 -08:00
gilbertlee-amd 1c7ef1b790 [TransferBench] Adding #CUs / RRLW mode to p2p benchmark (#464) 2021-11-08 14:36:04 -07:00
Wenkai Du bc2932be4e Unit Test: use range from 0 to 1 for floating point test data (#459)
* Unit Test: use range from 0 to 1 for floating point test data

* gtest: Update init data and bfloat16 threshold
2021-11-08 09:21:09 -08:00
Stanley Tsang 2f87073514 Fixing cmake_install_prefix search to include /opt/rocm-xxxx (#462) 2021-11-06 07:58:26 -07:00
Wenkai Du 33bdd557c8 Do not use async stream for memory allocation and transport setup without graph (#460) 2021-11-05 13:39:14 -07:00
Wenkai Du 0331e39f81 Update Rome model matching (#461)
* Update Rome model matching

* Add another Rome model

* Automatically setup NET GDR level from model
2021-11-05 08:53:47 -07:00
rachanaramanna 04c10a6025 Update LICENSE.txt (#450) 2021-11-05 09:13:53 -06:00
Wenkai Du 26fc6b0919 profiling: fix incorrect print out in timing profile (#457) 2021-11-03 16:22:21 -07:00
pavahora ee1a11ca7e Updating googletest to 1.11.0 (#454)
Co-authored-by: Vahora <pavahora@amd.com>
2021-11-02 15:44:35 -06:00
Wenkai Du 29170a8b5f Support different protocols and algorithms in all reduce only build (#455)
* Support different protocols and algorithms in all reduce only build

* Restore deleted line in error
2021-11-02 08:39:08 -07:00
Wenkai Du 4643a17f83 Check rocm_smi64Config.h on older ROCm build (#452) 2021-10-28 07:26:28 -07:00
Wenkai Du d221fb672a Rework kernel launch code (#449) 2021-10-28 07:26:11 -07:00
Wenkai Du ec36c4c326 Enable timing profiling mode (#447) 2021-10-27 08:21:48 -07:00
Stanley Tsang 7e55b211c5 Build AllReduce only mode (#443)
* Initial commit of all_reduce_only support

* Working AllReduce only build

* Removing printfs and restoring release build

* Restore P2P index

* Updates to build_allreduce_only mode.

* cleaning up macro ifdefs
2021-10-26 17:36:46 -06:00
Wenkai Du 14a184eb67 Query XGMI link count through rocm_smi_lib API (#442) 2021-10-26 10:30:20 -07:00
Stanley Tsang d23dfc12c1 Re-enable use of chrpath to manually set rpath for unit tests. (#448)
* Re-enable use of chrpath to manually set rpath for unit tests.

* Add check for chrpath
2021-10-26 11:10:04 -06:00
gilbertlee-amd 18246fc191 [TransferBench] Changing default per block multiple to 256B, adding BLOCK_BYTES env var (#446) 2021-10-25 11:23:29 -06:00
Saad Rahim 31f9e79775 Removing unmaintained dockerfiles (#439) 2021-10-22 16:11:23 -06:00
Roopa Malavally 8486554e4b Update attributions.rst 2021-10-21 21:08:48 -07:00
gilbertlee-amd 550d732d6c TransferBench p2p benchmark mode (#444)
* [TransferBench] Adding a p2p benchmark mode
* [TransferBench] Switching to using single sync mode by default (USE_SINGLE_SYNC=1)
2021-10-21 15:28:16 -06:00
Wenkai Du b4cefc05ed Fix collnet tuning parameters (#441) 2021-10-20 20:45:36 -07:00
Wenkai Du 2508507d0a Fix PCIe gen detection (#437)
* Fix PCIe gen detection

* Update profiling support
2021-10-15 08:23:50 -07:00
gilbertlee-amd f6b7ac693e [TransferBench] Adding comment echoing to help distinguish tests (#438) 2021-10-13 14:56:57 -06:00
gilbertlee-amd 269f07fbc3 [TransferBench] Adding shared memory per threadblock env var. Defaulting to 1 threadblock per CU (#436) 2021-10-12 09:32:54 -06:00
Wenkai Du 2249a1d9d3 Add more Rome models (#434)
* Add more Rome models

* Update models and tuning

* Update tuning
2021-10-12 08:23:20 -07:00
gilbertlee-amd aa917c3fc8 [TransferBench] Adding ability to specify suffix for numBytes (#435) 2021-10-08 16:36:19 -06:00
gilbertlee-amd a6368bac99 Updating licensing / attribution for documentation (#432) 2021-10-08 13:17:24 -06:00
gilbertlee-amd e506d14d18 [TransferBench] Fixing advanced config, adding new all-1-hop sample test (#433)
* [TransferBench] Fixing advanced config, adding new all-1-hop sample test
2021-10-07 15:57:21 -06:00
Wenkai Du e0053311c0 Add another Rome model (#431) 2021-10-06 08:17:12 -07:00
Wenkai Du 29c729d8b6 Trim NICs when all GPUs are connected by XGMI (#430)
* Trim NICs when all GPUs are connected by XGMI

* Only enable clique with maximum of 2 hops
2021-10-05 18:27:43 -07:00
John Bachan 30ca3fcacf Fix compilation failure in "src/enqueue.cc" on older GCC because of
missing `#include <cstring>`.
2021-09-23 09:55:16 -07:00
Sylvain Jeaugey 4ec992fab7 Fix Collnet when GDR is disabled 2021-09-22 05:19:16 -07:00