gilbertlee-amd
7add135529
Updating CHANGELOG.md ( #488 )
2022-01-10 11:34:31 -07:00
Wenkai Du
3669e12432
Use hipGraph instead of cudaGraph ( #487 )
2022-01-10 08:26:01 -08:00
Wenkai Du
565fbeb5e9
Tune collectives for 2.11.4 ( #486 )
2022-01-10 08:25:47 -08:00
gilbertlee-amd
2530a2f084
[TransferBench] Updating for 2.11.4. Decoupling from RCCL kernel ( #485 )
2022-01-05 16:33:25 -07:00
Wenkai Du
6268b87c16
Unit tests: fix number of GPU detection ( #484 )
2022-01-05 15:06:12 -08:00
Wenkai Du
4234a638b5
Merge pull request #482 from ROCmSoftwarePlatform/2.11.4
...
Sync up with 2.11.4
2022-01-05 09:31:51 -08:00
Wenkai Du
f8d0775a6f
Add another Rome model ( #483 )
2022-01-05 09:26:31 -08:00
Wenkai Du
434ecb0e1f
Merge remote-tracking branch 'origin/develop' into 2.11.4
2022-01-03 09:54:16 -08:00
Stanley Tsang
bbbb35ceec
Fixing setting of GPUs to 2 when 1 or less GPUs on system for unit tests ( #481 )
2021-12-09 11:04:31 -07:00
Wenkai Du
a94b953bcc
Update Rome model ( #479 )
2021-12-03 08:24:51 -08:00
Wenkai Du
8a08a2f579
Update tuning parameters ( #478 )
2021-11-30 08:51:11 -08:00
gilbertlee-amd
1157c2edfe
[TransferBench] Adding more preset benchmarks to filter read mode, cpu vs gpu pairs ( #477 )
2021-11-24 18:05:37 -07:00
gilbertlee-amd
539de1216f
Minor cppcheck fixes, adding suppression file ( #475 )
...
* Minor cppcheck fixes, adding suppression file
2021-11-24 10:23:59 -07:00
Wenkai Du
e9bf01fb7e
Determine fine grained memory availability at RCCL bootstrapping ( #471 )
2021-11-19 08:12:53 -08:00
Stanley Tsang
7b8b54955b
Set ROCM_PATH CMake variable in install script ( #470 )
...
* Fixing cmake_install_prefix search to include /opt/rocm-xxxx
* Removing all hard references to /opt/rocm with ROCM_PATH
* Setting ROCM_PATH CMake variable in install script
2021-11-18 14:44:19 -07:00
Wenkai Du
03a830293c
gtest: dynamically generate tests based on test machine's GPU count ( #467 )
...
* gtest: dynamically generate tests based on test machine's GPU count
* Adjust test element size and bfloat16 threshold for up to 16 GPUs
2021-11-16 10:28:26 -08:00
Stanley Tsang
a6dba6b9dd
Remove hardcoded references to /opt/rocm when using chrpath ( #469 )
...
* Fixing cmake_install_prefix search to include /opt/rocm-xxxx
* Removing all hard references to /opt/rocm with ROCM_PATH
2021-11-15 15:00:55 -07:00
Wenkai Du
3a919c1f49
Merge remote-tracking branch 'nccl/master' into develop
2021-11-11 14:22:12 -08:00
Wenkai Du
e05de8fd26
Remove extra work element copy ( #465 )
2021-11-09 13:52:03 -08:00
gilbertlee-amd
1c7ef1b790
[TransferBench] Adding #CUs / RRLW mode to p2p benchmark ( #464 )
2021-11-08 14:36:04 -07:00
Wenkai Du
bc2932be4e
Unit Test: use range from 0 to 1 for floating point test data ( #459 )
...
* Unit Test: use range from 0 to 1 for floating point test data
* gtest: Update init data and bfloat16 threshold
2021-11-08 09:21:09 -08:00
Stanley Tsang
2f87073514
Fixing cmake_install_prefix search to include /opt/rocm-xxxx ( #462 )
2021-11-06 07:58:26 -07:00
Wenkai Du
33bdd557c8
Do not use async stream for memory allocation and transport setup without graph ( #460 )
2021-11-05 13:39:14 -07:00
Wenkai Du
0331e39f81
Update Rome model matching ( #461 )
...
* Update Rome model matching
* Add another Rome model
* Automatically setup NET GDR level from model
2021-11-05 08:53:47 -07:00
rachanaramanna
04c10a6025
Update LICENSE.txt ( #450 )
2021-11-05 09:13:53 -06:00
Wenkai Du
26fc6b0919
profiling: fix incorrect print out in timing profile ( #457 )
2021-11-03 16:22:21 -07:00
pavahora
ee1a11ca7e
Updating googletest to 1.11.0 ( #454 )
...
Co-authored-by: Vahora <pavahora@amd.com>
2021-11-02 15:44:35 -06:00
Wenkai Du
29170a8b5f
Support different protocols and algorithms in all reduce only build ( #455 )
...
* Support different protocols and algorithms in all reduce only build
* Restore deleted line in error
2021-11-02 08:39:08 -07:00
Wenkai Du
4643a17f83
Check rocm_smi64Config.h on older ROCm build ( #452 )
2021-10-28 07:26:28 -07:00
Wenkai Du
d221fb672a
Rework kernel launch code ( #449 )
2021-10-28 07:26:11 -07:00
Wenkai Du
ec36c4c326
Enable timing profiling mode ( #447 )
2021-10-27 08:21:48 -07:00
Stanley Tsang
7e55b211c5
Build AllReduce only mode ( #443 )
...
* Initial commit of all_reduce_only support
* Working AllReduce only build
* Removing printfs and restoring release build
* Restore P2P index
* Updates to build_allreduce_only mode.
* cleaning up macro ifdefs
2021-10-26 17:36:46 -06:00
Wenkai Du
14a184eb67
Query XGMI link count through rocm_smi_lib API ( #442 )
2021-10-26 10:30:20 -07:00
Stanley Tsang
d23dfc12c1
Re-enable use of chrpath to manually set rpath for unit tests. ( #448 )
...
* Re-enable use of chrpath to manually set rpath for unit tests.
* Add check for chrpath
2021-10-26 11:10:04 -06:00
gilbertlee-amd
18246fc191
[TransferBench] Changing default per block multiple to 256B, adding BLOCK_BYTES env var ( #446 )
2021-10-25 11:23:29 -06:00
Saad Rahim
31f9e79775
Removing unmaintained dockerfiles ( #439 )
2021-10-22 16:11:23 -06:00
Roopa Malavally
8486554e4b
Update attributions.rst
2021-10-21 21:08:48 -07:00
gilbertlee-amd
550d732d6c
TransferBench p2p benchmark mode ( #444 )
...
* [TransferBench] Adding a p2p benchmark mode
* [TransferBench] Switching to using single sync mode by default (USE_SINGLE_SYNC=1)
2021-10-21 15:28:16 -06:00
Wenkai Du
b4cefc05ed
Fix collnet tuning parameters ( #441 )
2021-10-20 20:45:36 -07:00
Wenkai Du
2508507d0a
Fix PCIe gen detection ( #437 )
...
* Fix PCIe gen detection
* Update profiling support
2021-10-15 08:23:50 -07:00
gilbertlee-amd
f6b7ac693e
[TransferBench] Adding comment echoing to help distinguish tests ( #438 )
2021-10-13 14:56:57 -06:00
gilbertlee-amd
269f07fbc3
[TransferBench] Adding shared memory per threadblock env var. Defaulting to 1 threadblock per CU ( #436 )
2021-10-12 09:32:54 -06:00
Wenkai Du
2249a1d9d3
Add more Rome models ( #434 )
...
* Add more Rome models
* Update models and tuning
* Update tuning
2021-10-12 08:23:20 -07:00
gilbertlee-amd
aa917c3fc8
[TransferBench] Adding ability to specify suffix for numBytes ( #435 )
2021-10-08 16:36:19 -06:00
gilbertlee-amd
a6368bac99
Updating licensing / attribution for documentation ( #432 )
2021-10-08 13:17:24 -06:00
gilbertlee-amd
e506d14d18
[TransferBench] Fixing advanced config, adding new all-1-hop sample test ( #433 )
...
* [TransferBench] Fixing advanced config, adding new all-1-hop sample test
2021-10-07 15:57:21 -06:00
Wenkai Du
e0053311c0
Add another Rome model ( #431 )
2021-10-06 08:17:12 -07:00
Wenkai Du
29c729d8b6
Trim NICs when all GPUs are connected by XGMI ( #430 )
...
* Trim NICs when all GPUs are connected by XGMI
* Only enable clique with maximum of 2 hops
2021-10-05 18:27:43 -07:00
John Bachan
30ca3fcacf
Fix compilation failure in "src/enqueue.cc" on older GCC because of missing `#include <cstring>`.
2021-09-23 09:55:16 -07:00
Sylvain Jeaugey
4ec992fab7
Fix Collnet when GDR is disabled
2021-09-22 05:19:16 -07:00