Commit graph

288 commits

Author SHA1 Message Date
Wenkai Du 4de763fe00 Merge pull request #153 from wenkaidu/fuji
Temporary disable 0x803 target due to build error

[ROCm/rccl commit: 58a6e535f6]
2019-11-14 11:46:21 -08:00
Wenkai Du 8b1ce44c2a Temporary disable 0x803 target due to build error
[ROCm/rccl commit: cd7ab1425b]
2019-11-14 11:17:41 -08:00
Wenkai Du 8e60479385 Merge pull request #151 from wenkaidu/prim_test
rccl_prim_test: Generalize ring topology and duplications

[ROCm/rccl commit: 55c07e4fb7]
2019-11-13 08:17:55 -08:00
Siu Chi Chan c0e052bee4 Merge pull request #152 from scchan/bump_hcc_version_check_32
Bump up HCC version for -hc-function-calls switch

[ROCm/rccl commit: 453c735475]
2019-11-13 10:45:40 -05:00
Siu Chi Chan 9b19999918 Bump up HCC version for -hc-function-calls switch
[ROCm/rccl commit: 08ba92f1b0]
2019-11-12 14:16:35 -05:00
Wenkai Du 25b3175e82 rccl_prim_test: Generalize ring topology and duplications
Allow user specified ring topology from command line and duplicated
to requested number of workgroups:
./rccl_prim_test -w 12 -p copy -r "0 1 2 3|3 2 1 0|0 2 1 3|3 1 2 0|0 2 3 1|1 3 2 0"


[ROCm/rccl commit: 07bb6fce8f]
2019-11-11 15:42:24 -08:00
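
The `-r` argument in the rccl_prim_test commit above describes rings as space-separated GPU indices, with `|` between rings. The following is a minimal Python sketch of how such a string could be parsed and then repeated to fill a requested workgroup count. It is not the tool's actual implementation: the function names `parse_rings` and `duplicate_rings` are hypothetical, and the round-robin duplication order is an assumption, since the commit message only says the rings are "duplicated to requested number of workgroups".

```python
def parse_rings(spec):
    """Split a ring spec such as "0 1 2 3|3 2 1 0" into lists of GPU indices."""
    rings = []
    for part in spec.split("|"):
        ring = [int(tok) for tok in part.split()]
        if not ring:
            raise ValueError("empty ring in spec: %r" % spec)
        rings.append(ring)
    return rings


def duplicate_rings(rings, workgroups):
    """Repeat the rings round-robin until there is one per workgroup.

    Assumption: round-robin order; the real tool may assign rings differently.
    """
    if not rings:
        raise ValueError("at least one ring is required")
    return [rings[i % len(rings)] for i in range(workgroups)]


# The ring spec from the commit message, expanded to 12 workgroups (-w 12).
spec = "0 1 2 3|3 2 1 0|0 2 1 3|3 1 2 0|0 2 3 1|1 3 2 0"
rings = parse_rings(spec)
assigned = duplicate_rings(rings, 12)
```

With six rings and twelve workgroups, this sketch assigns each ring to exactly two workgroups.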
Wenkai Du 66e4337c6e Merge pull request #149 from wenkaidu/rtc
Correct RTC frequencies for profiling purpose

[ROCm/rccl commit: 277c72a638]
2019-11-06 08:02:58 -08:00
gilbertlee-amd 22cbbb9004 Adding interactive mode for profiling purposes (#150)
[ROCm/rccl commit: fd94f4fa25]
2019-11-05 17:10:16 -07:00
Wenkai Du 62042e47bc Correct RTC frequencies for profiling purpose
[ROCm/rccl commit: 8995047830]
2019-11-05 11:36:45 -08:00
Wenkai Du 8023c8de0e Merge pull request #148 from wenkaidu/fine_grain
Check for fine grain support using memory allocation

[ROCm/rccl commit: c49de785d2]
2019-11-04 10:19:07 -08:00
Wenkai Du 41f6319b33 Check for fine grain support using memory allocation
[ROCm/rccl commit: 669f1951a4]
2019-11-01 15:58:49 -07:00
Wenkai Du 0d6b476b08 Merge pull request #145 from wenkaidu/prim_test
rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup

[ROCm/rccl commit: 90b2921207]
2019-11-01 13:30:01 -07:00
gilbertlee-amd f9ef1553aa Single Sync Timing mode (#144)
* Adding single sync timing mode to emulate timing reported by rccl-prim-test / rccl-tests
* Adding duration / overhead info


[ROCm/rccl commit: 2f9edd2432]
2019-11-01 10:18:25 -06:00
Jeff Daily e43e1f1b3d additional check for fine grain support in p2pCanConnect (#146)
[ROCm/rccl commit: 5a502955c9]
2019-10-31 08:58:38 -07:00
Wenkai Du 91b906cf88 rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup
[ROCm/rccl commit: ab91cdd5c9]
2019-10-30 13:15:02 -07:00
Gilbert Lee a99accb2cb Adding ability to switch between fine/coarse grain destination GPU memory
Adding ability to switch between memset/memcpy


[ROCm/rccl commit: 648c1ee7cc]
2019-10-29 12:00:32 -06:00
Wenkai Du b4ab922f94 Merge pull request #140 from scchan/rocm210_hc_function_calls
add -hc-function-calls switch back for HCC ROCm 2.10

[ROCm/rccl commit: 9be7ae8f0d]
2019-10-28 09:56:47 -07:00
mhbliao 19940c6aa1 Merge pull request #142 from mhbliao/hliao/master/cmake
[cmake] Allow GPU targets to be parameterized with `AMDGPU_TARGETS`.

[ROCm/rccl commit: d89734234a]
2019-10-28 08:33:30 -04:00
Michael LIAO 4b94f25d08 [cmake] Allow GPU targets to be parameterized with AMDGPU_TARGETS.
[ROCm/rccl commit: ec10a5cf14]
2019-10-25 13:55:27 -04:00
Wenkai Du 2edd1a1c5d Merge pull request #141 from wenkaidu/hdp
Disable HDP flush for RDMA

[ROCm/rccl commit: b98d334114]
2019-10-24 16:26:01 -07:00
Wenkai Du d3f399f619 Disable HDP flush for RDMA
[ROCm/rccl commit: 296176a4fd]
2019-10-23 14:40:17 -07:00
Siu Chi Chan 8d2018d372 add -hc-function-calls switch back for HCC ROCm 2.10
[ROCm/rccl commit: d779eae1d0]
2019-10-21 18:00:02 -04:00
Wenkai Du fc33ee4f44 Merge pull request #138 from wenkaidu/slice_steps
Revert collective chunk and slice steps to avoid drop in throughput

[ROCm/rccl commit: 998ab83675]
2019-10-18 13:30:27 -07:00
Wenkai Du 21bc1ef493 Revert collective chunk and slice steps to avoid drop in throughput
[ROCm/rccl commit: df74d12946]
2019-10-18 12:54:00 -07:00
saadrahim 0f4d4d63ec CI Re-enabled for Ubuntu (#135)
[ROCm/rccl commit: a95529a6e2]
2019-10-18 11:38:51 -06:00
gilbertlee-amd 6732daf67d Merge pull request #137 from gilbertlee-amd/GenericOpFix
Fix for GenericOp device primitive bug

[ROCm/rccl commit: 60279867b3]
2019-10-11 10:46:29 -06:00
Gilbert Lee 7560929bd7 Reverting GenericOp bug workaround modifications to slice/chunk steps
[ROCm/rccl commit: 37603ae6cb]
2019-10-11 09:20:10 -07:00
Gilbert Lee cf597ff257 Performing __threadfence_system() with only first thread
[ROCm/rccl commit: 1392dd2997]
2019-10-11 09:16:19 -07:00
Gilbert Lee d257970ad1 Fix for GenericOp device primitive bug
[ROCm/rccl commit: 8ae1bce3bb]
2019-10-10 22:39:45 -07:00
Wenkai Du fbcdfd8348 Merge pull request #136 from wenkaidu/tree
Enable tree kernels in build

[ROCm/rccl commit: 062c798c86]
2019-10-09 10:58:52 -07:00
Wenkai Du c4ed3d2e08 Merge pull request #134 from changpeng/master
Tuning the inline and unroll to reduce the scratch usage

[ROCm/rccl commit: 662281e599]
2019-10-09 10:58:38 -07:00
Wenkai Du f86ee41415 Enable tree kernels in build
Need to tune and specify NCCL_TREE_THRESHOLD to allow usage


[ROCm/rccl commit: 76976c9e2e]
2019-10-08 23:20:11 +00:00
Changpeng Fang d8a06589c9 Tuning the inline and unroll to reduce the scratch usage
Summary:
 1. remove the noinline attribute for AllReduceThreeKernel;
 2. change AUTPUNROLL for tree functions to 1 or 2;
 Combining 1 and 2 will reduce the scratch usage from 1256 to 952


[ROCm/rccl commit: eec319038e]
2019-10-08 14:02:25 -07:00
Wenkai Du de13a48f7b Merge pull request #133 from scchan/hcc_version_detect
detect the hcc version and conditionally add -hc-function-calls

[ROCm/rccl commit: de25e4cb7c]
2019-10-03 10:53:45 -07:00
Siu Chi Chan 0af7f5268f detect the hcc version and conditionally add the -hc-function-calls switch
[ROCm/rccl commit: b87ef4f152]
2019-10-03 13:25:25 -04:00
Wenkai Du 4164cac99a Merge pull request #132 from wenkaidu/reduce_kernels
Only generate kernels for sum and copy

[ROCm/rccl commit: 8ec01dde33]
2019-09-26 16:14:45 -07:00
Wenkai Du 57dcac6afe Only generate kernels for sum and copy
[ROCm/rccl commit: 61ef1dcad5]
2019-09-24 17:01:12 -07:00
Gilbert Lee a401a91fd7 Re-adding gfx908 target
[ROCm/rccl commit: 6232985e34]
2019-09-13 16:57:34 +00:00
Gilbert Lee dbb7500fd1 RDMA HDP flush fix
[ROCm/rccl commit: 86ce0a93b5]
2019-09-06 16:35:55 +00:00
Gilbert Lee 2847ad9576 Revert "Set RDMA default to off state"
This reverts commit 4afd6818ba.


[ROCm/rccl commit: 3e6b326a19]
2019-09-05 18:16:53 +00:00
gilbertlee-amd e5082c6f61 Merge pull request #131 from rpathani/xgmi_bench
Read operation throughput

[ROCm/rccl commit: eaf25ab099]
2019-09-04 09:59:13 -06:00
rohit pathania 31670133a1 Read operation throughput
[ROCm/rccl commit: a270ee080e]
2019-09-03 14:58:40 +05:30
Wenkai Du 6c762c0a3d Merge pull request #129 from rpathani/xgmi_bench
display each workgroup ,links and directions with throughputs

[ROCm/rccl commit: 22c9ae0712]
2019-08-30 09:06:21 -07:00
rohit pathania bc51b5bc28 display each workgroup ,links and directions with throughputs
[ROCm/rccl commit: e5b13d69e5]
2019-08-30 13:28:23 +05:30
Wenkai Du 04004816ba Merge pull request #130 from wenkaidu/p2p_fix
Allocate opCount in pinned host memory for P2P transport

[ROCm/rccl commit: 9c501fb8fb]
2019-08-29 14:12:03 -07:00
Wenkai Du daf2c4b200 Allocate opCount in pinned host memory for P2P transport
To avoid remote P2P read access when checking remote GPU's opCount


[ROCm/rccl commit: 8c975353ed]
2019-08-29 10:22:09 -07:00
amdkila ea0ce5c064 Merge pull request #128 from amdkila/hip-clang
Added hip-clang options to install script, and openmp/pthread flags

[ROCm/rccl commit: 259583cde6]
2019-08-27 16:23:40 -06:00
Wenkai Du 96cab1f5f5 Merge pull request #127 from wenkaidu/rdma
Set RDMA default to off state

[ROCm/rccl commit: a4ef5a3dd4]
2019-08-26 11:46:10 -07:00
Wenkai Du 4afd6818ba Set RDMA default to off state
[ROCm/rccl commit: 0f16ad966a]
2019-08-26 10:59:33 -07:00
saadrahim e433b21b23 Updating versioning to follow rocm-cmake standard (#126)
[ROCm/rccl commit: 544d4fb704]
2019-08-23 16:33:38 -06:00