paulfreddy
bbb0c59cd4
Changes for multiple ROCm installation ( #164 )
...
* Changes for multiple ROCm installation
1. Set version to 2.10.1
2. Add CMAKE_INSTALL_PREFIX to necessary places
3. Cleanup, fix rpath, use prefix in install.sh
* Changes for multiple ROCm installation
1. Set soversion to match release version
2. Add CMAKE_INSTALL_PREFIX to necessary places
3. Cleanup, fix rpath, use prefix in install.sh
[ROCm/rccl commit: 15c917244d ]
2020-01-08 21:28:16 -08:00
Gilbert Lee
5783917a75
Changing single sync mode to time all iterations instead of just last
...
[ROCm/rccl commit: e5074ce94d ]
2019-12-20 17:08:39 -08:00
gilbertlee-amd
71635198b8
Removing OpenMP from unit tests ( #163 )
...
[ROCm/rccl commit: 000bce6f27 ]
2019-12-20 11:41:56 -07:00
gilbertlee-amd
a461b6d139
Adding new sleep after sync capability for data fabric profiling ( #162 )
...
Fixing missing header include for ROCm 3.0 changes
[ROCm/rccl commit: 2f4269d06d ]
2019-12-12 15:20:54 -07:00
saadrahim
26e161a7a2
Package fix ( #161 )
...
* Fixing RHEL dependency on rocm-dev
[ROCm/rccl commit: 0092b35132 ]
2019-12-06 16:06:50 -07:00
saadrahim
13de181fbc
Changing package dependency to rocm-dev ( #160 )
...
[ROCm/rccl commit: bd59b6f880 ]
2019-12-06 14:00:25 -07:00
Wenkai Du
35ad901dfe
Merge pull request #158 from wenkaidu/p2p
...
Change default P2P level
[ROCm/rccl commit: 9e10cde644 ]
2019-12-04 16:30:58 -08:00
Wenkai Du
b25dd83e7e
Merge pull request #157 from wenkaidu/readme
...
Change manual build instructions to fit most common usage
[ROCm/rccl commit: e9ca3a8029 ]
2019-12-04 14:50:41 -08:00
Wenkai Du
272d22fbe3
Change default P2P level
...
[ROCm/rccl commit: 90e928bcd5 ]
2019-12-04 21:05:10 +00:00
Wenkai Du
fedce64117
Change manual build instructions to fit most common usage
...
[ROCm/rccl commit: 00a910c2da ]
2019-11-26 12:40:26 -08:00
Wenkai Du
10a18c53f6
Merge pull request #155 from wenkaidu/direct
...
Disable direct buffers to reduce scratch memory size
[ROCm/rccl commit: b1ed4b7fa8 ]
2019-11-21 09:39:09 -08:00
Wenkai Du
4cb52294d5
Disable direct buffers to reduce scratch memory size
...
[ROCm/rccl commit: a0be2b8812 ]
2019-11-20 13:03:16 -08:00
Wenkai Du
5e3917961e
Merge pull request #154 from wenkaidu/bf16
...
Add bfloat16 support in RCCL
[ROCm/rccl commit: 9a70ee2eb1 ]
2019-11-19 09:07:51 -08:00
Wenkai Du
55ad3c801e
Support bfloat16 on rest of the unit tests
...
[ROCm/rccl commit: 4ca05c1297 ]
2019-11-18 14:18:34 -08:00
Wenkai Du
7dc39b8928
Add bfloat16 all reduce unit test
...
[ROCm/rccl commit: bdac0256a5 ]
2019-11-18 13:50:29 -08:00
Wenkai Du
1e182391ad
Add bfloat16 support in RCCL
...
Preprocessor symbol RCCL_BFLOAT16 is used as feature indicator
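
A minimal sketch of how calling code might test for this feature indicator at compile time. It assumes that `RCCL_BFLOAT16` is defined by the RCCL header when bfloat16 is supported; the helper name `hasRcclBfloat16` is hypothetical and not part of RCCL.

```cpp
// In real code, the RCCL header would be included before this check so that
// RCCL_BFLOAT16 is visible if the installed library defines it (assumption).

// Hypothetical helper: true only when the build saw the RCCL_BFLOAT16 symbol.
constexpr bool hasRcclBfloat16() {
#ifdef RCCL_BFLOAT16
  return true;
#else
  return false;
#endif
}
```

Callers could branch on `hasRcclBfloat16()` to fall back to another datatype when building against an older RCCL.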
[ROCm/rccl commit: 5e109ed400 ]
2019-11-18 13:45:53 -08:00
Wenkai Du
4de763fe00
Merge pull request #153 from wenkaidu/fuji
...
Temporarily disable 0x803 target due to build error
[ROCm/rccl commit: 58a6e535f6 ]
2019-11-14 11:46:21 -08:00
Wenkai Du
8b1ce44c2a
Temporarily disable 0x803 target due to build error
...
[ROCm/rccl commit: cd7ab1425b ]
2019-11-14 11:17:41 -08:00
Wenkai Du
8e60479385
Merge pull request #151 from wenkaidu/prim_test
...
rccl_prim_test: Generalize ring topology and duplications
[ROCm/rccl commit: 55c07e4fb7 ]
2019-11-13 08:17:55 -08:00
Siu Chi Chan
c0e052bee4
Merge pull request #152 from scchan/bump_hcc_version_check_32
...
Bump up HCC version for -hc-function-calls switch
[ROCm/rccl commit: 453c735475 ]
2019-11-13 10:45:40 -05:00
Siu Chi Chan
9b19999918
Bump up HCC version for -hc-function-calls switch
...
[ROCm/rccl commit: 08ba92f1b0 ]
2019-11-12 14:16:35 -05:00
Wenkai Du
25b3175e82
rccl_prim_test: Generalize ring topology and duplications
...
Allow user specified ring topology from command line and duplicated
to requested number of workgroups:
./rccl_prim_test -w 12 -p copy -r "0 1 2 3|3 2 1 0|0 2 1 3|3 1 2 0|0 2 3 1|1 3 2 0"
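
The ring string passed with `-r` above can be read as `|`-separated rings, each a space-separated list of GPU indices. The sketch below shows one way such a string could be parsed and then duplicated to fill the requested number of workgroups. Both helpers (`parseRings`, `duplicateRings`) and the round-robin duplication policy are assumptions for illustration, not code taken from rccl_prim_test.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits "0 1 2 3|3 2 1 0" into one GPU-index list per ring.
std::vector<std::vector<int>> parseRings(const std::string& spec) {
  std::vector<std::vector<int>> rings;
  std::stringstream all(spec);
  std::string one;
  while (std::getline(all, one, '|')) {
    std::stringstream fields(one);
    std::vector<int> ring;
    int gpu;
    while (fields >> gpu) ring.push_back(gpu);
    if (!ring.empty()) rings.push_back(ring);
  }
  return rings;
}

// Assumed policy: cycle through the given rings until every requested
// workgroup has been assigned one ring.
std::vector<std::vector<int>> duplicateRings(
    const std::vector<std::vector<int>>& rings, int workgroups) {
  std::vector<std::vector<int>> out;
  if (rings.empty()) return out;
  for (int w = 0; w < workgroups; ++w) {
    out.push_back(rings[w % rings.size()]);
  }
  return out;
}
```

Under this assumed policy, the six rings in the command above combined with `-w 12` would give each ring to two workgroups.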
[ROCm/rccl commit: 07bb6fce8f ]
2019-11-11 15:42:24 -08:00
Wenkai Du
66e4337c6e
Merge pull request #149 from wenkaidu/rtc
...
Correct RTC frequencies for profiling purpose
[ROCm/rccl commit: 277c72a638 ]
2019-11-06 08:02:58 -08:00
gilbertlee-amd
22cbbb9004
Adding interactive mode for profiling purposes ( #150 )
...
[ROCm/rccl commit: fd94f4fa25 ]
2019-11-05 17:10:16 -07:00
Wenkai Du
62042e47bc
Correct RTC frequencies for profiling purpose
...
[ROCm/rccl commit: 8995047830 ]
2019-11-05 11:36:45 -08:00
Wenkai Du
8023c8de0e
Merge pull request #148 from wenkaidu/fine_grain
...
Check for fine grain support using memory allocation
[ROCm/rccl commit: c49de785d2 ]
2019-11-04 10:19:07 -08:00
Wenkai Du
41f6319b33
Check for fine grain support using memory allocation
...
[ROCm/rccl commit: 669f1951a4 ]
2019-11-01 15:58:49 -07:00
Wenkai Du
0d6b476b08
Merge pull request #145 from wenkaidu/prim_test
...
rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup
[ROCm/rccl commit: 90b2921207 ]
2019-11-01 13:30:01 -07:00
gilbertlee-amd
f9ef1553aa
Single Sync Timing mode ( #144 )
...
* Adding single sync timing mode to emulate timing reported by rccl-prim-test / rccl-tests
* Adding duration / overhead info
[ROCm/rccl commit: 2f9edd2432 ]
2019-11-01 10:18:25 -06:00
Jeff Daily
e43e1f1b3d
additional check for fine grain support in p2pCanConnect ( #146 )
...
[ROCm/rccl commit: 5a502955c9 ]
2019-10-31 08:58:38 -07:00
Wenkai Du
91b906cf88
rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup
...
[ROCm/rccl commit: ab91cdd5c9 ]
2019-10-30 13:15:02 -07:00
Gilbert Lee
a99accb2cb
Adding ability to switch between fine/coarse grain destination GPU memory
...
Adding ability to switch between memset/memcpy
[ROCm/rccl commit: 648c1ee7cc ]
2019-10-29 12:00:32 -06:00
Wenkai Du
b4ab922f94
Merge pull request #140 from scchan/rocm210_hc_function_calls
...
add -hc-function-calls switch back for HCC ROCm 2.10
[ROCm/rccl commit: 9be7ae8f0d ]
2019-10-28 09:56:47 -07:00
mhbliao
19940c6aa1
Merge pull request #142 from mhbliao/hliao/master/cmake
...
[cmake] Allow GPU targets to be parameterized with `AMDGPU_TARGETS`.
[ROCm/rccl commit: d89734234a ]
2019-10-28 08:33:30 -04:00
Michael LIAO
4b94f25d08
[cmake] Allow GPU targets to be parameterized with AMDGPU_TARGETS.
...
[ROCm/rccl commit: ec10a5cf14 ]
2019-10-25 13:55:27 -04:00
Wenkai Du
2edd1a1c5d
Merge pull request #141 from wenkaidu/hdp
...
Disable HDP flush for RDMA
[ROCm/rccl commit: b98d334114 ]
2019-10-24 16:26:01 -07:00
Wenkai Du
d3f399f619
Disable HDP flush for RDMA
...
[ROCm/rccl commit: 296176a4fd ]
2019-10-23 14:40:17 -07:00
Siu Chi Chan
8d2018d372
add -hc-function-calls switch back for HCC ROCm 2.10
...
[ROCm/rccl commit: d779eae1d0 ]
2019-10-21 18:00:02 -04:00
Wenkai Du
fc33ee4f44
Merge pull request #138 from wenkaidu/slice_steps
...
Revert collective chunk and slice steps to avoid drop in throughput
[ROCm/rccl commit: 998ab83675 ]
2019-10-18 13:30:27 -07:00
Wenkai Du
21bc1ef493
Revert collective chunk and slice steps to avoid drop in throughput
...
[ROCm/rccl commit: df74d12946 ]
2019-10-18 12:54:00 -07:00
saadrahim
0f4d4d63ec
CI Re-enabled for Ubuntu ( #135 )
...
[ROCm/rccl commit: a95529a6e2 ]
2019-10-18 11:38:51 -06:00
gilbertlee-amd
6732daf67d
Merge pull request #137 from gilbertlee-amd/GenericOpFix
...
Fix for GenericOp device primitive bug
[ROCm/rccl commit: 60279867b3 ]
2019-10-11 10:46:29 -06:00
Gilbert Lee
7560929bd7
Reverting GenericOp bug workaround modifications to slice/chunk steps
...
[ROCm/rccl commit: 37603ae6cb ]
2019-10-11 09:20:10 -07:00
Gilbert Lee
cf597ff257
Performing __threadfence_system() with only first thread
...
[ROCm/rccl commit: 1392dd2997 ]
2019-10-11 09:16:19 -07:00
Gilbert Lee
d257970ad1
Fix for GenericOp device primitive bug
...
[ROCm/rccl commit: 8ae1bce3bb ]
2019-10-10 22:39:45 -07:00
Wenkai Du
fbcdfd8348
Merge pull request #136 from wenkaidu/tree
...
Enable tree kernels in build
[ROCm/rccl commit: 062c798c86 ]
2019-10-09 10:58:52 -07:00
Wenkai Du
c4ed3d2e08
Merge pull request #134 from changpeng/master
...
Tuning the inline and unroll to reduce the scratch usage
[ROCm/rccl commit: 662281e599 ]
2019-10-09 10:58:38 -07:00
Wenkai Du
f86ee41415
Enable tree kernels in build
...
Need to tune and specify NCCL_TREE_THRESHOLD to allow usage
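
As a rough illustration of what a size threshold like this implies, the sketch below reads a byte count from the `NCCL_TREE_THRESHOLD` environment variable and picks the tree algorithm only for messages at or below it. The helper names, the default of 0, and the exact selection rule are assumptions for illustration and are not taken from the RCCL source.

```cpp
#include <cstdlib>

// Hypothetical helper: reads a byte threshold from NCCL_TREE_THRESHOLD.
// Returns 0 (assumed here to mean "tree disabled") when unset or invalid.
long long treeThresholdFromEnv() {
  const char* s = std::getenv("NCCL_TREE_THRESHOLD");
  if (s == nullptr) return 0;
  char* end = nullptr;
  long long v = std::strtoll(s, &end, 10);
  return (end == s || v < 0) ? 0 : v;
}

// Assumed selection rule: use tree kernels only when a positive threshold
// is set and the message is no larger than that threshold.
bool useTree(long long messageBytes, long long threshold) {
  return threshold > 0 && messageBytes <= threshold;
}
```

With this rule, leaving the variable unset keeps every message on the non-tree path, which matches the note that the threshold must be specified before tree kernels are used.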
[ROCm/rccl commit: 76976c9e2e ]
2019-10-08 23:20:11 +00:00
Changpeng Fang
d8a06589c9
Tuning the inline and unroll to reduce the scratch usage
...
Summary:
1. remove the noinline attribute for AllReduceTreeKernel;
2. change AUTOUNROLL for tree functions to 1 or 2;
Combining 1 and 2 will reduce the scratch usage from 1256 to 952
[ROCm/rccl commit: eec319038e ]
2019-10-08 14:02:25 -07:00
Wenkai Du
de13a48f7b
Merge pull request #133 from scchan/hcc_version_detect
...
detect the hcc version and conditionally add -hc-function-calls
[ROCm/rccl commit: de25e4cb7c ]
2019-10-03 10:53:45 -07:00