Wenkai Du
8b1ce44c2a
Temporarily disable 0x803 target due to build error
...
[ROCm/rccl commit: cd7ab1425b ]
2019-11-14 11:17:41 -08:00
Wenkai Du
8e60479385
Merge pull request #151 from wenkaidu/prim_test
...
rccl_prim_test: Generalize ring topology and duplications
[ROCm/rccl commit: 55c07e4fb7 ]
2019-11-13 08:17:55 -08:00
Siu Chi Chan
9b19999918
Bump up HCC version for -hc-function-calls switch
...
[ROCm/rccl commit: 08ba92f1b0 ]
2019-11-12 14:16:35 -05:00
Wenkai Du
25b3175e82
rccl_prim_test: Generalize ring topology and duplications
...
Allow a user-specified ring topology from the command line, duplicated
to the requested number of workgroups:
./rccl_prim_test -w 12 -p copy -r "0 1 2 3|3 2 1 0|0 2 1 3|3 1 2 0|0 2 3 1|1 3 2 0"
[ROCm/rccl commit: 07bb6fce8f ]
2019-11-11 15:42:24 -08:00
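The ring specification passed with -r in the rccl_prim_test commit above can be read as a list of rings separated by "|", each ring being a space-separated sequence of GPU indices. The following is a minimal sketch of how such a string might be parsed and spread over the requested number of workgroups. It is not taken from rccl_prim_test: the function names are hypothetical, and the round-robin reuse of rings is an assumption about what "duplicated" means.

```python
def parse_rings(spec):
    """Split a '|'-separated ring spec into lists of GPU indices."""
    rings = [[int(tok) for tok in ring.split()] for ring in spec.split("|")]
    if any(len(ring) == 0 for ring in rings):
        raise ValueError("empty ring in spec")
    return rings


def assign_rings(rings, workgroups):
    """Give every workgroup a ring.

    Assumption: rings are reused round-robin until each of the
    requested workgroups has one.
    """
    return [rings[i % len(rings)] for i in range(workgroups)]


def ring_links(ring):
    """Return the (source, destination) GPU pairs of one ring, wrapping around."""
    return [(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))]


if __name__ == "__main__":
    spec = "0 1 2 3|3 2 1 0|0 2 1 3|3 1 2 0|0 2 3 1|1 3 2 0"
    for wg, ring in enumerate(assign_rings(parse_rings(spec), 12)):
        print(wg, ring_links(ring))
```

Under this reading, the six rings in the example command with -w 12 would each be used by two workgroups.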
Wenkai Du
66e4337c6e
Merge pull request #149 from wenkaidu/rtc
...
Correct RTC frequencies for profiling purpose
[ROCm/rccl commit: 277c72a638 ]
2019-11-06 08:02:58 -08:00
gilbertlee-amd
22cbbb9004
Adding interactive mode for profiling purposes (#150)
...
[ROCm/rccl commit: fd94f4fa25 ]
2019-11-05 17:10:16 -07:00
Wenkai Du
62042e47bc
Correct RTC frequencies for profiling purpose
...
[ROCm/rccl commit: 8995047830 ]
2019-11-05 11:36:45 -08:00
Wenkai Du
41f6319b33
Check for fine grain support using memory allocation
...
[ROCm/rccl commit: 669f1951a4 ]
2019-11-01 15:58:49 -07:00
Wenkai Du
0d6b476b08
Merge pull request #145 from wenkaidu/prim_test
...
rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup
[ROCm/rccl commit: 90b2921207 ]
2019-11-01 13:30:01 -07:00
gilbertlee-amd
f9ef1553aa
Single Sync Timing mode (#144)
...
* Adding single sync timing mode to emulate timing reported by rccl-prim-test / rccl-tests
* Adding duration / overhead info
[ROCm/rccl commit: 2f9edd2432 ]
2019-11-01 10:18:25 -06:00
Jeff Daily
e43e1f1b3d
additional check for fine grain support in p2pCanConnect (#146)
...
[ROCm/rccl commit: 5a502955c9 ]
2019-10-31 08:58:38 -07:00
Wenkai Du
91b906cf88
rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup
...
[ROCm/rccl commit: ab91cdd5c9 ]
2019-10-30 13:15:02 -07:00
Gilbert Lee
a99accb2cb
Adding ability to switch between fine/coarse grain destination GPU memory
...
Adding ability to switch between memset/memcpy
[ROCm/rccl commit: 648c1ee7cc ]
2019-10-29 12:00:32 -06:00
Wenkai Du
b4ab922f94
Merge pull request #140 from scchan/rocm210_hc_function_calls
...
add -hc-function-calls switch back for HCC ROCm 2.10
[ROCm/rccl commit: 9be7ae8f0d ]
2019-10-28 09:56:47 -07:00
Michael LIAO
4b94f25d08
[cmake] Allow GPU targets to be parameterized with AMDGPU_TARGETS.
...
[ROCm/rccl commit: ec10a5cf14 ]
2019-10-25 13:55:27 -04:00
Wenkai Du
d3f399f619
Disable HDP flush for RDMA
...
[ROCm/rccl commit: 296176a4fd ]
2019-10-23 14:40:17 -07:00
Siu Chi Chan
8d2018d372
add -hc-function-calls switch back for HCC ROCm 2.10
...
[ROCm/rccl commit: d779eae1d0 ]
2019-10-21 18:00:02 -04:00
Wenkai Du
21bc1ef493
Revert collective chunk and slice steps to avoid drop in throughput
...
[ROCm/rccl commit: df74d12946 ]
2019-10-18 12:54:00 -07:00
saadrahim
0f4d4d63ec
CI Re-enabled for Ubuntu (#135)
...
[ROCm/rccl commit: a95529a6e2 ]
2019-10-18 11:38:51 -06:00
Gilbert Lee
7560929bd7
Reverting GenericOp bug workaround modifications to slice/chunk steps
...
[ROCm/rccl commit: 37603ae6cb ]
2019-10-11 09:20:10 -07:00
Gilbert Lee
cf597ff257
Performing __threadfence_system() with only first thread
...
[ROCm/rccl commit: 1392dd2997 ]
2019-10-11 09:16:19 -07:00
Gilbert Lee
d257970ad1
Fix for GenericOp device primitive bug
...
[ROCm/rccl commit: 8ae1bce3bb ]
2019-10-10 22:39:45 -07:00
Wenkai Du
fbcdfd8348
Merge pull request #136 from wenkaidu/tree
...
Enable tree kernels in build
[ROCm/rccl commit: 062c798c86 ]
2019-10-09 10:58:52 -07:00
Wenkai Du
f86ee41415
Enable tree kernels in build
...
Need to tune and specify NCCL_TREE_THRESHOLD to allow usage
[ROCm/rccl commit: 76976c9e2e ]
2019-10-08 23:20:11 +00:00
Changpeng Fang
d8a06589c9
Tuning the inline and unroll to reduce the scratch usage
...
Summary:
1. Remove the noinline attribute for AllReduceThreeKernel.
2. Change AUTOUNROLL for tree functions to 1 or 2.
Combining 1 and 2 reduces the scratch usage from 1256 to 952.
[ROCm/rccl commit: eec319038e ]
2019-10-08 14:02:25 -07:00
Siu Chi Chan
0af7f5268f
detect the hcc version and conditionally add the -hc-function-calls switch
...
[ROCm/rccl commit: b87ef4f152 ]
2019-10-03 13:25:25 -04:00
Wenkai Du
57dcac6afe
Only generate kernels for sum and copy
...
[ROCm/rccl commit: 61ef1dcad5 ]
2019-09-24 17:01:12 -07:00
Gilbert Lee
a401a91fd7
Re-adding gfx908 target
...
[ROCm/rccl commit: 6232985e34 ]
2019-09-13 16:57:34 +00:00
Gilbert Lee
dbb7500fd1
RDMA HDP flush fix
...
[ROCm/rccl commit: 86ce0a93b5 ]
2019-09-06 16:35:55 +00:00
Gilbert Lee
2847ad9576
Revert "Set RDMA default to off state"
...
This reverts commit 4afd6818ba.
[ROCm/rccl commit: 3e6b326a19 ]
2019-09-05 18:16:53 +00:00
rohit pathania
31670133a1
Read operation throughput
...
[ROCm/rccl commit: a270ee080e ]
2019-09-03 14:58:40 +05:30
rohit pathania
bc51b5bc28
Display each workgroup, links, and directions with throughputs
...
[ROCm/rccl commit: e5b13d69e5 ]
2019-08-30 13:28:23 +05:30
Wenkai Du
daf2c4b200
Allocate opCount in pinned host memory for P2P transport
...
To avoid remote P2P read access when checking remote GPU's opCount
[ROCm/rccl commit: 8c975353ed ]
2019-08-29 10:22:09 -07:00
amdkila
ea0ce5c064
Merge pull request #128 from amdkila/hip-clang
...
Added hip-clang options to install script, and openmp/pthread flags
[ROCm/rccl commit: 259583cde6 ]
2019-08-27 16:23:40 -06:00
Wenkai Du
4afd6818ba
Set RDMA default to off state
...
[ROCm/rccl commit: 0f16ad966a ]
2019-08-26 10:59:33 -07:00
saadrahim
e433b21b23
Updating versioning to follow rocm-cmake standard (#126)
...
[ROCm/rccl commit: 544d4fb704 ]
2019-08-23 16:33:38 -06:00
Akila Premachandra
94b33a7550
Added hip-clang options to install script, and openmp/pthread options to CMakeLists.txt
...
[ROCm/rccl commit: f48ae5c98d ]
2019-08-23 22:02:42 +00:00
Wenkai Du
4df1defc3b
Merge pull request #125 from wenkaidu/fix_nvml_id
...
Assign unused nvmlDev to avoid random number
[ROCm/rccl commit: 6759660529 ]
2019-08-19 09:08:13 -07:00
Wenkai Du
54608abf5c
Merge pull request #117 from rpathani/xgmi_bench
...
Modified the code to use RTC clock frequency based on gpu gcn id
[ROCm/rccl commit: ee5dec4467 ]
2019-08-19 08:59:34 -07:00
rpathani
c441f2ff9b
Update rccl_prim_test.cpp
...
[ROCm/rccl commit: 40e30b5168 ]
2019-08-19 12:44:11 +05:30
Wenkai Du
04cd446d89
Assign unused nvmlDev to avoid random number
...
[ROCm/rccl commit: 86efdfc3b5 ]
2019-08-16 16:34:14 -07:00
Wenkai Du
60989a3fc9
Merge remote-tracking branch 'remotes/nccl/master' into HEAD
...
[ROCm/rccl commit: 7c38da0939 ]
2019-08-16 16:13:34 -07:00
Wenkai Du
7396d5c3ba
Tune AUTOUNROLL for better performance
...
Also remove all unused UNROLL defines
[ROCm/rccl commit: 1faededc03 ]
2019-08-16 10:34:53 -07:00
rpathani
eaa1cdb48c
Merge branch 'master' into xgmi_bench
...
[ROCm/rccl commit: deea20d49c ]
2019-08-16 10:56:56 +05:30
Michael LIAO
f4a240065f
Fix build with hip-clang.
...
- Add necessary function attribute for HIP programming model.
- Explicitly include hsa headers.
[ROCm/rccl commit: 9369f8d75d ]
2019-08-15 14:56:04 -04:00
Wenkai Du
d4862fa605
Tune LL threshold for VEGA
...
Also move abort check after SPINS_BEFORE_CHECK_ABORT, as in NCCL
[ROCm/rccl commit: 2223cccf15 ]
2019-08-15 09:16:11 -07:00
Wenkai Du
93c44e96cb
Default to a minimum of 2 rings and improve LL loop
...
[ROCm/rccl commit: 4b77a16f3f ]
2019-08-14 14:12:56 -07:00
Wenkai Du
1feef99e7d
Remove duplicate line
...
[ROCm/rccl commit: 5782a8d857 ]
2019-08-14 13:22:43 -07:00
Wenkai Du
6047487815
RCCL 2.4 update
...
[ROCm/rccl commit: f11c8f60cd ]
2019-08-14 10:42:35 -07:00
David Addison
d57c0b0f92
Updated PR#196 to use a common hash function
...
[ROCm/rccl commit: fad079a8ae ]
2019-08-14 10:08:39 -07:00