280 Commits

Author SHA1 Message Date
gilbertlee-amd fd94f4fa25 Adding interactive mode for profiling purposes (#150) 2019-11-05 17:10:16 -07:00
Wenkai Du c49de785d2 Merge pull request #148 from wenkaidu/fine_grain
Check for fine grain support using memory allocation
2019-11-04 10:19:07 -08:00
Wenkai Du 669f1951a4 Check for fine grain support using memory allocation 2019-11-01 15:58:49 -07:00
Wenkai Du 90b2921207 Merge pull request #145 from wenkaidu/prim_test
rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup
2019-11-01 13:30:01 -07:00
gilbertlee-amd 2f9edd2432 Single Sync Timing mode (#144)
* Adding single sync timing mode to emulate timing reported by rccl-prim-test / rccl-tests
* Adding duration / overhead info
2019-11-01 10:18:25 -06:00
Jeff Daily 5a502955c9 additional check for fine grain support in p2pCanConnect (#146) 2019-10-31 08:58:38 -07:00
Wenkai Du ab91cdd5c9 rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup 2019-10-30 13:15:02 -07:00
Gilbert Lee 648c1ee7cc Adding ability to switch between fine/coarse grain destination GPU memory
Adding ability to switch between memset/memcpy
2019-10-29 12:00:32 -06:00
Wenkai Du 9be7ae8f0d Merge pull request #140 from scchan/rocm210_hc_function_calls
add -hc-function-calls switch back for HCC ROCm 2.10
2019-10-28 09:56:47 -07:00
mhbliao d89734234a Merge pull request #142 from mhbliao/hliao/master/cmake
[cmake] Allow GPU targets to be parameterized with `AMDGPU_TARGETS`.
2019-10-28 08:33:30 -04:00
Michael LIAO ec10a5cf14 [cmake] Allow GPU targets to be parameterized with AMDGPU_TARGETS. 2019-10-25 13:55:27 -04:00
Wenkai Du b98d334114 Merge pull request #141 from wenkaidu/hdp
Disable HDP flush for RDMA
2019-10-24 16:26:01 -07:00
Wenkai Du 296176a4fd Disable HDP flush for RDMA 2019-10-23 14:40:17 -07:00
Siu Chi Chan d779eae1d0 add -hc-function-calls switch back for HCC ROCm 2.10 2019-10-21 18:00:02 -04:00
Wenkai Du 998ab83675 Merge pull request #138 from wenkaidu/slice_steps
Revert collective chunk and slice steps to avoid drop in throughput
2019-10-18 13:30:27 -07:00
Wenkai Du df74d12946 Revert collective chunk and slice steps to avoid drop in throughput 2019-10-18 12:54:00 -07:00
saadrahim a95529a6e2 CI Re-enabled for Ubuntu (#135) 2019-10-18 11:38:51 -06:00
gilbertlee-amd 60279867b3 Merge pull request #137 from gilbertlee-amd/GenericOpFix
Fix for GenericOp device primitive bug
2019-10-11 10:46:29 -06:00
Gilbert Lee 37603ae6cb Reverting GenericOp bug workaround modifications to slice/chunk steps 2019-10-11 09:20:10 -07:00
Gilbert Lee 1392dd2997 Performing __threadfence_system() with only first thread 2019-10-11 09:16:19 -07:00
Gilbert Lee 8ae1bce3bb Fix for GenericOp device primitive bug 2019-10-10 22:39:45 -07:00
Wenkai Du 062c798c86 Merge pull request #136 from wenkaidu/tree
Enable tree kernels in build
2019-10-09 10:58:52 -07:00
Wenkai Du 662281e599 Merge pull request #134 from changpeng/master
Tuning the inline and unroll to reduce the scratch usage
2019-10-09 10:58:38 -07:00
Wenkai Du 76976c9e2e Enable tree kernels in build
Need to tune and specify NCCL_TREE_THRESHOLD to allow usage
2019-10-08 23:20:11 +00:00
Changpeng Fang eec319038e Tuning the inline and unroll to reduce the scratch usage
Summary:
 1. remove the noinline attribute for AllReduceThreeKernel;
 2. change AUTOUNROLL for tree functions to 1 or 2;
 Combining 1 and 2 will reduce the scratch usage from 1256 to 952
2019-10-08 14:02:25 -07:00
Wenkai Du de25e4cb7c Merge pull request #133 from scchan/hcc_version_detect
detect the hcc version and conditionally add -hc-function-calls
2019-10-03 10:53:45 -07:00
Siu Chi Chan b87ef4f152 detect the hcc version and conditionally add the -hc-function-calls switch 2019-10-03 13:25:25 -04:00
Wenkai Du 8ec01dde33 Merge pull request #132 from wenkaidu/reduce_kernels
Only generate kernels for sum and copy
2019-09-26 16:14:45 -07:00
Wenkai Du 61ef1dcad5 Only generate kernels for sum and copy 2019-09-24 17:01:12 -07:00
Gilbert Lee 6232985e34 Re-adding gfx908 target 2019-09-13 16:57:34 +00:00
Gilbert Lee 86ce0a93b5 RDMA HDP flush fix 2019-09-06 16:35:55 +00:00
Gilbert Lee 3e6b326a19 Revert "Set RDMA default to off state"
This reverts commit 0f16ad966a.
2019-09-05 18:16:53 +00:00
gilbertlee-amd eaf25ab099 Merge pull request #131 from rpathani/xgmi_bench
Read operation throughput
2019-09-04 09:59:13 -06:00
rohit pathania a270ee080e Read operation throughput 2019-09-03 14:58:40 +05:30
Wenkai Du 22c9ae0712 Merge pull request #129 from rpathani/xgmi_bench
display each workgroup ,links and directions with throughputs
2019-08-30 09:06:21 -07:00
rohit pathania e5b13d69e5 display each workgroup ,links and directions with throughputs 2019-08-30 13:28:23 +05:30
Wenkai Du 9c501fb8fb Merge pull request #130 from wenkaidu/p2p_fix
Allocate opCount in pinned host memory for P2P transport
2019-08-29 14:12:03 -07:00
Wenkai Du 8c975353ed Allocate opCount in pinned host memory for P2P transport
To avoid remote P2P read access when checking remote GPU's opCount
2019-08-29 10:22:09 -07:00
amdkila 259583cde6 Merge pull request #128 from amdkila/hip-clang
Added hip-clang options to install script, and openmp/pthread flags
2019-08-27 16:23:40 -06:00
Wenkai Du a4ef5a3dd4 Merge pull request #127 from wenkaidu/rdma
Set RDMA default to off state
2019-08-26 11:46:10 -07:00
Wenkai Du 0f16ad966a Set RDMA default to off state 2019-08-26 10:59:33 -07:00
saadrahim 544d4fb704 Updating versioning to follow rocm-cmake standard (#126) 2019-08-23 16:33:38 -06:00
Akila Premachandra f48ae5c98d Added hip-clang options to install script, and openmp/pthread options to CMakeLists.txt 2019-08-23 22:02:42 +00:00
Wenkai Du 6759660529 Merge pull request #125 from wenkaidu/fix_nvml_id
Assign unused nmvlDev to avoid random number
2019-08-19 09:08:13 -07:00
Wenkai Du ee5dec4467 Merge pull request #117 from rpathani/xgmi_bench
Modified the code to use RTC clock frequency based on gpu gcn id
2019-08-19 08:59:34 -07:00
rpathani 40e30b5168 Update rccl_prim_test.cpp 2019-08-19 12:44:11 +05:30
Wenkai Du a67ae11ce4 Merge pull request #124 from wenkaidu/upstream_sync
Upstream sync
2019-08-16 16:41:55 -07:00
Wenkai Du 86efdfc3b5 Assign unused nmvlDev to avoid random number 2019-08-16 16:34:14 -07:00
Wenkai Du 7c38da0939 Merge remote-tracking branch 'remotes/nccl/master' into HEAD 2019-08-16 16:13:34 -07:00
Wenkai Du 72a64e27f3 Merge pull request #123 from wenkaidu/tune_unroll
Tune AUTOUNROLL for better performance
2019-08-16 11:15:49 -07:00