gilbertlee-amd
fd94f4fa25
Adding interactive mode for profiling purposes (#150)
2019-11-05 17:10:16 -07:00
Wenkai Du
c49de785d2
Merge pull request #148 from wenkaidu/fine_grain
...
Check for fine grain support using memory allocation
2019-11-04 10:19:07 -08:00
Wenkai Du
669f1951a4
Check for fine grain support using memory allocation
2019-11-01 15:58:49 -07:00
Wenkai Du
90b2921207
Merge pull request #145 from wenkaidu/prim_test
...
rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup
2019-11-01 13:30:01 -07:00
gilbertlee-amd
2f9edd2432
Single Sync Timing mode (#144)
...
* Adding single sync timing mode to emulate timing reported by rccl-prim-test / rccl-tests
* Adding duration / overhead info
2019-11-01 10:18:25 -06:00
Jeff Daily
5a502955c9
additional check for fine grain support in p2pCanConnect (#146)
2019-10-31 08:58:38 -07:00
Wenkai Du
ab91cdd5c9
rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup
2019-10-30 13:15:02 -07:00
Gilbert Lee
648c1ee7cc
Adding ability to switch between fine/coarse grain destination GPU memory
...
Adding ability to switch between memset/memcpy
2019-10-29 12:00:32 -06:00
Wenkai Du
9be7ae8f0d
Merge pull request #140 from scchan/rocm210_hc_function_calls
...
add -hc-function-calls switch back for HCC ROCm 2.10
2019-10-28 09:56:47 -07:00
mhbliao
d89734234a
Merge pull request #142 from mhbliao/hliao/master/cmake
...
[cmake] Allow GPU targets to be parameterized with `AMDGPU_TARGETS`.
2019-10-28 08:33:30 -04:00
Michael LIAO
ec10a5cf14
[cmake] Allow GPU targets to be parameterized with AMDGPU_TARGETS.
2019-10-25 13:55:27 -04:00
Wenkai Du
b98d334114
Merge pull request #141 from wenkaidu/hdp
...
Disable HDP flush for RDMA
2019-10-24 16:26:01 -07:00
Wenkai Du
296176a4fd
Disable HDP flush for RDMA
2019-10-23 14:40:17 -07:00
Siu Chi Chan
d779eae1d0
add -hc-function-calls switch back for HCC ROCm 2.10
2019-10-21 18:00:02 -04:00
Wenkai Du
998ab83675
Merge pull request #138 from wenkaidu/slice_steps
...
Revert collective chunk and slice steps to avoid drop in throughput
2019-10-18 13:30:27 -07:00
Wenkai Du
df74d12946
Revert collective chunk and slice steps to avoid drop in throughput
2019-10-18 12:54:00 -07:00
saadrahim
a95529a6e2
CI Re-enabled for Ubuntu (#135)
2019-10-18 11:38:51 -06:00
gilbertlee-amd
60279867b3
Merge pull request #137 from gilbertlee-amd/GenericOpFix
...
Fix for GenericOp device primitive bug
2019-10-11 10:46:29 -06:00
Gilbert Lee
37603ae6cb
Reverting GenericOp bug workaround modifications to slice/chunk steps
2019-10-11 09:20:10 -07:00
Gilbert Lee
1392dd2997
Performing __threadfence_system() with only first thread
2019-10-11 09:16:19 -07:00
Gilbert Lee
8ae1bce3bb
Fix for GenericOp device primitive bug
2019-10-10 22:39:45 -07:00
Wenkai Du
062c798c86
Merge pull request #136 from wenkaidu/tree
...
Enable tree kernels in build
2019-10-09 10:58:52 -07:00
Wenkai Du
662281e599
Merge pull request #134 from changpeng/master
...
Tuning the inline and unroll to reduce the scratch usage
2019-10-09 10:58:38 -07:00
Wenkai Du
76976c9e2e
Enable tree kernels in build
...
Need to tune and specify NCCL_TREE_THRESHOLD to allow usage
2019-10-08 23:20:11 +00:00
Changpeng Fang
eec319038e
Tuning the inline and unroll to reduce the scratch usage
...
Summary:
1. remove the noinline attribute for AllReduceThreeKernel;
2. change AUTOUNROLL for tree functions to 1 or 2;
Combining 1 and 2 will reduce the scratch usage from 1256 to 952
2019-10-08 14:02:25 -07:00
Wenkai Du
de25e4cb7c
Merge pull request #133 from scchan/hcc_version_detect
...
detect the hcc version and conditionally add -hc-function-calls
2019-10-03 10:53:45 -07:00
Siu Chi Chan
b87ef4f152
detect the hcc version and conditionally add the -hc-function-calls switch
2019-10-03 13:25:25 -04:00
Wenkai Du
8ec01dde33
Merge pull request #132 from wenkaidu/reduce_kernels
...
Only generate kernels for sum and copy
2019-09-26 16:14:45 -07:00
Wenkai Du
61ef1dcad5
Only generate kernels for sum and copy
2019-09-24 17:01:12 -07:00
Gilbert Lee
6232985e34
Re-adding gfx908 target
2019-09-13 16:57:34 +00:00
Gilbert Lee
86ce0a93b5
RDMA HDP flush fix
2019-09-06 16:35:55 +00:00
Gilbert Lee
3e6b326a19
Revert "Set RDMA default to off state"
...
This reverts commit 0f16ad966a.
2019-09-05 18:16:53 +00:00
gilbertlee-amd
eaf25ab099
Merge pull request #131 from rpathani/xgmi_bench
...
Read operation throughput
2019-09-04 09:59:13 -06:00
rohit pathania
a270ee080e
Read operation throughput
2019-09-03 14:58:40 +05:30
Wenkai Du
22c9ae0712
Merge pull request #129 from rpathani/xgmi_bench
...
display each workgroup, links and directions with throughputs
2019-08-30 09:06:21 -07:00
rohit pathania
e5b13d69e5
display each workgroup, links and directions with throughputs
2019-08-30 13:28:23 +05:30
Wenkai Du
9c501fb8fb
Merge pull request #130 from wenkaidu/p2p_fix
...
Allocate opCount in pinned host memory for P2P transport
2019-08-29 14:12:03 -07:00
Wenkai Du
8c975353ed
Allocate opCount in pinned host memory for P2P transport
...
To avoid remote P2P read access when checking remote GPU's opCount
2019-08-29 10:22:09 -07:00
amdkila
259583cde6
Merge pull request #128 from amdkila/hip-clang
...
Added hip-clang options to install script, and openmp/pthread flags
2019-08-27 16:23:40 -06:00
Wenkai Du
a4ef5a3dd4
Merge pull request #127 from wenkaidu/rdma
...
Set RDMA default to off state
2019-08-26 11:46:10 -07:00
Wenkai Du
0f16ad966a
Set RDMA default to off state
2019-08-26 10:59:33 -07:00
saadrahim
544d4fb704
Updating versioning to follow rocm-cmake standard (#126)
2019-08-23 16:33:38 -06:00
Akila Premachandra
f48ae5c98d
Added hip-clang options to install script, and openmp/pthread options to CMakeLists.txt
2019-08-23 22:02:42 +00:00
Wenkai Du
6759660529
Merge pull request #125 from wenkaidu/fix_nvml_id
...
Assign unused nvmlDev to avoid random number
2019-08-19 09:08:13 -07:00
Wenkai Du
ee5dec4467
Merge pull request #117 from rpathani/xgmi_bench
...
Modified the code to use RTC clock frequency based on gpu gcn id
2019-08-19 08:59:34 -07:00
rpathani
40e30b5168
Update rccl_prim_test.cpp
2019-08-19 12:44:11 +05:30
Wenkai Du
a67ae11ce4
Merge pull request #124 from wenkaidu/upstream_sync
...
Upstream sync
2019-08-16 16:41:55 -07:00
Wenkai Du
86efdfc3b5
Assign unused nvmlDev to avoid random number
2019-08-16 16:34:14 -07:00
Wenkai Du
7c38da0939
Merge remote-tracking branch 'remotes/nccl/master' into HEAD
2019-08-16 16:13:34 -07:00
Wenkai Du
72a64e27f3
Merge pull request #123 from wenkaidu/tune_unroll
...
Tune AUTOUNROLL for better performance
2019-08-16 11:15:49 -07:00