Eiden Yoshida
eb823a7621
Refactor Jenkinsfiles to allow use of new docker containers ( #170 )
...
[ROCm/rccl commit: edb863de62 ]
2020-02-18 11:25:29 -07:00
Wenkai Du
ded8d0d389
Bump up HCC version for -hc-function-calls switch
...
[ROCm/rccl commit: 3d092f32b8 ]
2020-02-11 19:37:13 +00:00
Wenkai Du
6b2d7de200
Add ring bandwidth correction factor
...
[ROCm/rccl commit: d1dae2721d ]
2020-01-30 09:52:27 -08:00
Stanley Tsang
e5419407c4
Updating copyright notices for 2020.
...
[ROCm/rccl commit: 20fa04d9b6 ]
2020-01-29 15:28:08 -08:00
Wenkai Du
e6b5933d7e
Merge remote-tracking branch 'remotes/rccl/master' into rccl_2.5.6_cleanup
...
[ROCm/rccl commit: fe6d012eb0 ]
2020-01-29 15:28:03 -08:00
Wenkai Du
622b49e80a
Split primitive class to smaller structures
...
[ROCm/rccl commit: 486fd436af ]
2020-01-29 15:27:23 -08:00
Wenkai Du
d2fbcfea02
Misc fixes and improvements for 2.5.6
...
1. Fix RCCL unit test
2. Add ROME detection and tuning
3. Change default P2P level
4. Fix search algorithm for XGMI
5. Replace explicit channel duplication with implicit duplication by using half of the link speed
6. Add collective trace support
7. Correct Intel Skylake CPU detection and bandwidth
8. Fix topo connect function
9. Disable GDR read and remove unreachable code
10. Disable LL128 kernels
11. Add tuning parameters
12. Use original clock64() implementation which returns RTC counter value
13. Print out timestamp of collective trace
14. Do not use struct ncclColl in kernel launch parameter
15. Fix abort handling and add tracing
17. Add __launch_bounds__ to kernel functions
18. Remove unused abortCount
19. Unset default MIN_NRINGS and MIN_NCHANNELS
20. Do not allocate shared memory when not using LL128 kernels
21. Correct time print out in tuning log
[ROCm/rccl commit: 1e55645d97 ]
2020-01-29 15:27:05 -08:00
paulfreddy
bbb0c59cd4
Changes for multiple ROCm installation ( #164 )
...
* Changes for multiple ROCm installation
1. Set version to 2.10.1
2. Add CMAKE_INSTALL_PREFIX to necessary places
3. Cleanup, fix rpath, use prefix in install.sh
* Changes for multiple ROCm installation
1. Set soversion to match release version
2. Add CMAKE_INSTALL_PREFIX to necessary places
3. Cleanup, fix rpath, use prefix in install.sh
* Changes for multiple ROCm installation
1. Set soversion to match release version
2. Add CMAKE_INSTALL_PREFIX to necessary places
3. Cleanup, fix rpath, use prefix in install.sh
[ROCm/rccl commit: 15c917244d ]
2020-01-08 21:28:16 -08:00
Gilbert Lee
5783917a75
Changing single sync mode to time all iterations instead of just the last
...
[ROCm/rccl commit: e5074ce94d ]
2019-12-20 17:08:39 -08:00
gilbertlee-amd
71635198b8
Removing OpenMP from unit tests ( #163 )
...
[ROCm/rccl commit: 000bce6f27 ]
2019-12-20 11:41:56 -07:00
gilbertlee-amd
a461b6d139
Adding new sleep after sync capability for data fabric profiling ( #162 )
...
Fixing missing header include for ROCM 3.0 changes
[ROCm/rccl commit: 2f4269d06d ]
2019-12-12 15:20:54 -07:00
saadrahim
26e161a7a2
Package fix ( #161 )
...
* Fixing RHEL dependency on rocm-dev
[ROCm/rccl commit: 0092b35132 ]
2019-12-06 16:06:50 -07:00
saadrahim
13de181fbc
Changing package dependency to rocm-dev ( #160 )
...
[ROCm/rccl commit: bd59b6f880 ]
2019-12-06 14:00:25 -07:00
Wenkai Du
35ad901dfe
Merge pull request #158 from wenkaidu/p2p
...
Change default P2P level
[ROCm/rccl commit: 9e10cde644 ]
2019-12-04 16:30:58 -08:00
Wenkai Du
272d22fbe3
Change default P2P level
...
[ROCm/rccl commit: 90e928bcd5 ]
2019-12-04 21:05:10 +00:00
Wenkai Du
d7d4175df0
Merge remote-tracking branch 'remotes/nccl/master' into rccl_2.5.6
...
[ROCm/rccl commit: 6648c81dc6 ]
2019-12-03 15:42:04 -08:00
Wenkai Du
fedce64117
Change manual build instructions to fit most common usage
...
[ROCm/rccl commit: 00a910c2da ]
2019-11-26 12:40:26 -08:00
Wenkai Du
4cb52294d5
Disable direct buffers to reduce scratch memory size
...
[ROCm/rccl commit: a0be2b8812 ]
2019-11-20 13:03:16 -08:00
Sylvain Jeaugey
71560fd67b
2.5.6-1 ( #255 )
...
Add LL128 Protocol.
Rewrite the topology detection and tree/ring creation (#179 ). Improve
tree performance by sending/receiving from different GPUs. Add
model-based tuning to switch between the different algorithms and
protocols.
Rework P2P/SHM detection in containers (#155 , #248 ).
Detect duplicated devices and return an error (#231 ).
Add tuning for GCP
[ROCm/rccl commit: 299c554dcc ]
2019-11-19 14:57:39 -08:00
Wenkai Du
55ad3c801e
Support bfloat16 on rest of the unit tests
...
[ROCm/rccl commit: 4ca05c1297 ]
2019-11-18 14:18:34 -08:00
Wenkai Du
7dc39b8928
Add bfloat16 all reduce unit test
...
[ROCm/rccl commit: bdac0256a5 ]
2019-11-18 13:50:29 -08:00
Wenkai Du
1e182391ad
Add bfloat16 support in RCCL
...
Preprocessor symbol RCCL_BFLOAT16 is used as feature indicator
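A minimal sketch of how client code might test this feature indicator at compile time. Only the symbol name RCCL_BFLOAT16 comes from the commit message; the helper preferredHalfType() is hypothetical, and the sketch assumes the symbol would be defined by the RCCL headers, which are deliberately not included here.

```cpp
#include <cassert>
#include <string>

// Assumption: in a real build, <rccl.h> is included before this point and
// defines RCCL_BFLOAT16 when bfloat16 support is present. The include is
// omitted here so the sketch compiles without ROCm installed.

// Hypothetical helper (not part of RCCL): names the element type a caller
// would pick for reduced-precision buffers.
inline std::string preferredHalfType() {
#ifdef RCCL_BFLOAT16
  return "bfloat16";  // headers advertise bfloat16 support
#else
  return "float16";   // fall back when the symbol is absent
#endif
}
```

Guarding on the symbol keeps the same source buildable against RCCL versions with and without bfloat16 support.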
[ROCm/rccl commit: 5e109ed400 ]
2019-11-18 13:45:53 -08:00
Wenkai Du
8b1ce44c2a
Temporarily disable 0x803 target due to build error
...
[ROCm/rccl commit: cd7ab1425b ]
2019-11-14 11:17:41 -08:00
Wenkai Du
8e60479385
Merge pull request #151 from wenkaidu/prim_test
...
rccl_prim_test: Generalize ring topology and duplications
[ROCm/rccl commit: 55c07e4fb7 ]
2019-11-13 08:17:55 -08:00
Siu Chi Chan
9b19999918
Bump up HCC version for -hc-function-calls switch
...
[ROCm/rccl commit: 08ba92f1b0 ]
2019-11-12 14:16:35 -05:00
Wenkai Du
25b3175e82
rccl_prim_test: Generalize ring topology and duplications
...
Allow user specified ring topology from command line and duplicated
to requested number of workgroups:
./rccl_prim_test -w 12 -p copy -r "0 1 2 3|3 2 1 0|0 2 1 3|3 1 2 0|0 2 3 1|1 3 2 0"
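The -r argument above can be read as a list of rings separated by '|', each ring being a space-separated list of GPU indices. The following is a sketch of that reading, not the rccl_prim_test source: parseRings and duplicateRings are hypothetical names, and the cyclic order in which rings are repeated to reach the workgroup count is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

using Ring = std::vector<int>;

// Split a spec such as "0 1 2 3|3 2 1 0" into rings of GPU indices.
std::vector<Ring> parseRings(const std::string& spec) {
  std::vector<Ring> rings;
  std::istringstream all(spec);
  std::string one;
  while (std::getline(all, one, '|')) {
    std::istringstream fields(one);
    Ring ring;
    int gpu;
    while (fields >> gpu) ring.push_back(gpu);
    if (!ring.empty()) rings.push_back(ring);
  }
  return rings;
}

// Repeat the parsed rings cyclically until there is one per workgroup
// (assumed ordering; the real tool may assign duplicates differently).
std::vector<Ring> duplicateRings(const std::vector<Ring>& rings,
                                 std::size_t workgroups) {
  std::vector<Ring> out;
  if (rings.empty()) return out;
  out.reserve(workgroups);
  for (std::size_t i = 0; i < workgroups; ++i) {
    out.push_back(rings[i % rings.size()]);
  }
  return out;
}
```

Under this reading, the six rings in the command above combined with -w 12 would give each ring two workgroups.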
[ROCm/rccl commit: 07bb6fce8f ]
2019-11-11 15:42:24 -08:00
Wenkai Du
66e4337c6e
Merge pull request #149 from wenkaidu/rtc
...
Correct RTC frequencies for profiling purpose
[ROCm/rccl commit: 277c72a638 ]
2019-11-06 08:02:58 -08:00
gilbertlee-amd
22cbbb9004
Adding interactive mode for profiling purposes ( #150 )
...
[ROCm/rccl commit: fd94f4fa25 ]
2019-11-05 17:10:16 -07:00
Wenkai Du
62042e47bc
Correct RTC frequencies for profiling purpose
...
[ROCm/rccl commit: 8995047830 ]
2019-11-05 11:36:45 -08:00
Wenkai Du
41f6319b33
Check for fine grain support using memory allocation
...
[ROCm/rccl commit: 669f1951a4 ]
2019-11-01 15:58:49 -07:00
Wenkai Du
0d6b476b08
Merge pull request #145 from wenkaidu/prim_test
...
rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup
[ROCm/rccl commit: 90b2921207 ]
2019-11-01 13:30:01 -07:00
gilbertlee-amd
f9ef1553aa
Single Sync Timing mode ( #144 )
...
* Adding single sync timing mode to emulate timing reported by rccl-prim-test / rccl-tests
* Adding duration / overhead info
[ROCm/rccl commit: 2f9edd2432 ]
2019-11-01 10:18:25 -06:00
Jeff Daily
e43e1f1b3d
additional check for fine grain support in p2pCanConnect ( #146 )
...
[ROCm/rccl commit: 5a502955c9 ]
2019-10-31 08:58:38 -07:00
Wenkai Du
91b906cf88
rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup
...
[ROCm/rccl commit: ab91cdd5c9 ]
2019-10-30 13:15:02 -07:00
Gilbert Lee
a99accb2cb
Adding ability to switch between fine/coarse grain destination GPU memory
...
Adding ability to switch between memset/memcpy
[ROCm/rccl commit: 648c1ee7cc ]
2019-10-29 12:00:32 -06:00
Wenkai Du
b4ab922f94
Merge pull request #140 from scchan/rocm210_hc_function_calls
...
add -hc-function-calls switch back for HCC ROCm 2.10
[ROCm/rccl commit: 9be7ae8f0d ]
2019-10-28 09:56:47 -07:00
Michael LIAO
4b94f25d08
[cmake] Allow GPU targets to be parameterized with AMDGPU_TARGETS.
...
[ROCm/rccl commit: ec10a5cf14 ]
2019-10-25 13:55:27 -04:00
Wenkai Du
d3f399f619
Disable HDP flush for RDMA
...
[ROCm/rccl commit: 296176a4fd ]
2019-10-23 14:40:17 -07:00
Siu Chi Chan
8d2018d372
add -hc-function-calls switch back for HCC ROCm 2.10
...
[ROCm/rccl commit: d779eae1d0 ]
2019-10-21 18:00:02 -04:00
Wenkai Du
21bc1ef493
Revert collective chunk and slice steps to avoid drop in throughput
...
[ROCm/rccl commit: df74d12946 ]
2019-10-18 12:54:00 -07:00
saadrahim
0f4d4d63ec
CI Re-enabled for Ubuntu ( #135 )
...
[ROCm/rccl commit: a95529a6e2 ]
2019-10-18 11:38:51 -06:00
Gilbert Lee
7560929bd7
Reverting GenericOp bug workaround modifications to slice/chunk steps
...
[ROCm/rccl commit: 37603ae6cb ]
2019-10-11 09:20:10 -07:00
Gilbert Lee
cf597ff257
Performing __threadfence_system() with only first thread
...
[ROCm/rccl commit: 1392dd2997 ]
2019-10-11 09:16:19 -07:00
Gilbert Lee
d257970ad1
Fix for GenericOp device primitive bug
...
[ROCm/rccl commit: 8ae1bce3bb ]
2019-10-10 22:39:45 -07:00
Wenkai Du
fbcdfd8348
Merge pull request #136 from wenkaidu/tree
...
Enable tree kernels in build
[ROCm/rccl commit: 062c798c86 ]
2019-10-09 10:58:52 -07:00
Wenkai Du
f86ee41415
Enable tree kernels in build
...
Need to tune and specify NCCL_TREE_THRESHOLD to allow usage
[ROCm/rccl commit: 76976c9e2e ]
2019-10-08 23:20:11 +00:00
Changpeng Fang
d8a06589c9
Tuning the inline and unroll to reduce the scratch usage
...
Summary:
1. remove the noinline attribute for AllReduceTreeKernel;
2. change AUTOUNROLL for tree functions to 1 or 2;
Combining 1 and 2 will reduce the scratch usage from 1256 to 952
[ROCm/rccl commit: eec319038e ]
2019-10-08 14:02:25 -07:00
Siu Chi Chan
0af7f5268f
detect the hcc version and conditionally add the -hc-function-calls switch
...
[ROCm/rccl commit: b87ef4f152 ]
2019-10-03 13:25:25 -04:00
Wenkai Du
57dcac6afe
Only generate kernels for sum and copy
...
[ROCm/rccl commit: 61ef1dcad5 ]
2019-09-24 17:01:12 -07:00
Gilbert Lee
a401a91fd7
Re-adding gfx908 target
...
[ROCm/rccl commit: 6232985e34 ]
2019-09-13 16:57:34 +00:00