Wenkai Du
8ec01dde33
Merge pull request #132 from wenkaidu/reduce_kernels
Only generate kernels for sum and copy
2019-09-26 16:14:45 -07:00
Wenkai Du
61ef1dcad5
Only generate kernels for sum and copy
2019-09-24 17:01:12 -07:00
Gilbert Lee
6232985e34
Re-adding gfx908 target
2019-09-13 16:57:34 +00:00
Gilbert Lee
86ce0a93b5
RDMA HDP flush fix
2019-09-06 16:35:55 +00:00
Gilbert Lee
3e6b326a19
Revert "Set RDMA default to off state"
This reverts commit 0f16ad966a.
2019-09-05 18:16:53 +00:00
gilbertlee-amd
eaf25ab099
Merge pull request #131 from rpathani/xgmi_bench
Read operation throughput
2019-09-04 09:59:13 -06:00
rohit pathania
a270ee080e
Read operation throughput
2019-09-03 14:58:40 +05:30
Wenkai Du
22c9ae0712
Merge pull request #129 from rpathani/xgmi_bench
Display each workgroup, link and direction with throughputs
2019-08-30 09:06:21 -07:00
rohit pathania
e5b13d69e5
Display each workgroup, link and direction with throughputs
2019-08-30 13:28:23 +05:30
Wenkai Du
9c501fb8fb
Merge pull request #130 from wenkaidu/p2p_fix
Allocate opCount in pinned host memory for P2P transport
2019-08-29 14:12:03 -07:00
Wenkai Du
8c975353ed
Allocate opCount in pinned host memory for P2P transport
To avoid remote P2P read access when checking remote GPU's opCount
2019-08-29 10:22:09 -07:00
amdkila
259583cde6
Merge pull request #128 from amdkila/hip-clang
Added hip-clang options to install script, and openmp/pthread flags
2019-08-27 16:23:40 -06:00
Wenkai Du
a4ef5a3dd4
Merge pull request #127 from wenkaidu/rdma
Set RDMA default to off state
2019-08-26 11:46:10 -07:00
Wenkai Du
0f16ad966a
Set RDMA default to off state
2019-08-26 10:59:33 -07:00
saadrahim
544d4fb704
Updating versioning to follow rocm-cmake standard (#126)
2019-08-23 16:33:38 -06:00
Akila Premachandra
f48ae5c98d
Added hip-clang options to install script, and openmp/pthread options to CMakeLists.txt
2019-08-23 22:02:42 +00:00
Wenkai Du
6759660529
Merge pull request #125 from wenkaidu/fix_nvml_id
Assign unused nvmlDev to avoid random number
2019-08-19 09:08:13 -07:00
Wenkai Du
ee5dec4467
Merge pull request #117 from rpathani/xgmi_bench
Modified the code to use RTC clock frequency based on gpu gcn id
2019-08-19 08:59:34 -07:00
rpathani
40e30b5168
Update rccl_prim_test.cpp
2019-08-19 12:44:11 +05:30
Wenkai Du
a67ae11ce4
Merge pull request #124 from wenkaidu/upstream_sync
Upstream sync
2019-08-16 16:41:55 -07:00
Wenkai Du
86efdfc3b5
Assign unused nvmlDev to avoid random number
2019-08-16 16:34:14 -07:00
Wenkai Du
7c38da0939
Merge remote-tracking branch 'remotes/nccl/master' into HEAD
2019-08-16 16:13:34 -07:00
Wenkai Du
72a64e27f3
Merge pull request #123 from wenkaidu/tune_unroll
Tune AUTOUNROLL for better performance
2019-08-16 11:15:49 -07:00
Wenkai Du
1faededc03
Tune AUTOUNROLL for better performance
Also remove all unused UNROLL defines
2019-08-16 10:34:53 -07:00
rpathani
deea20d49c
Merge branch 'master' into xgmi_bench
2019-08-16 10:56:56 +05:30
Wenkai Du
50c2202fe9
Merge pull request #121 from mhbliao/hliao/master/swdev-200061
Fix build with hip-clang.
2019-08-15 12:40:46 -07:00
Michael LIAO
9369f8d75d
Fix build with hip-clang.
- Add necessary function attribute for HIP programming model.
- Explicitly include hsa headers.
2019-08-15 14:56:04 -04:00
Wenkai Du
3f6662f837
Merge pull request #122 from wenkaidu/tune_ll
Tune LL threshold for VEGA
2019-08-15 10:33:17 -07:00
Wenkai Du
2223cccf15
Tune LL threshold for VEGA
Also move abort check after SPINS_BEFORE_CHECK_ABORT, as NCCL does
2019-08-15 09:16:11 -07:00
Wenkai Du
9af66195db
Merge pull request #120 from wenkaidu/rccl_2.4_update
RCCL 2.4 update
2019-08-14 15:21:30 -07:00
Wenkai Du
4b77a16f3f
Default to minimal 2 rings and improve LL loop
2019-08-14 14:12:56 -07:00
Wenkai Du
5782a8d857
Remove duplicate line
2019-08-14 13:22:43 -07:00
Wenkai Du
6827b174c0
Merge remote-tracking branch 'remotes/rccl/master' into rccl_2.4_update
2019-08-14 10:44:18 -07:00
Wenkai Du
f11c8f60cd
RCCL 2.4 update
2019-08-14 10:42:35 -07:00
David Addison
ccb1298148
Merge branch 'lowintelligence-shm'
PR#196
2019-08-14 10:09:53 -07:00
David Addison
fad079a8ae
Updated PR#196 to use a common hash function
2019-08-14 10:08:39 -07:00
David Addison
01d1836668
Merge branch 'shm' of git://github.com/lowintelligence/nccl into lowintelligence-shm
2019-08-14 09:45:45 -07:00
rohit pathania
65e2f5d87b
Modified the code to use RTC clock frequency based on gpu gcn id
2019-08-14 12:55:12 +05:30
David Addison
7f2b337e70
Make use of SO_REUSEPORT conditional
Fixes: #244
SO_REUSEPORT is only available in Linux 3.9 and later.
This change allows NCCL to compile against older releases.
The functionality is only required if the user is specifying
a NCCL bootstrap address via an environment variable.
2019-08-13 16:32:07 -07:00
rpathani
40445c17d8
Adding linkinfo and srcGPU to destGPU info (#114)
* Adding linkinfo and srcGPU to destGPU info
2019-08-13 09:25:03 -07:00
rohit pathania
0f74929dab
Merge branch 'xgmi_bench' of https://github.com/rpathani/rccl into xgmi_bench
# Conflicts:
# tools/rccl-prim-test/rccl_prim_test.cpp
2019-08-13 11:36:56 +05:30
rohit pathania
3bbf924ff8
Adding linkinfo and srcGPU to destGPU info
2019-08-13 11:28:50 +05:30
Stanley Tsang
b3a57dbb33
Merge pull request #116 from stanleytsang-amd/master
Removing unnecessary device collective source files.
2019-08-12 18:26:02 -04:00
Stanley Tsang
3a61907182
Removing unnecessary device collective source files.
2019-08-12 18:23:23 +00:00
rohit pathania
5a2f74b8d0
Adding linkinfo and srcGPU to destGPU info
2019-08-09 12:44:06 +05:30
gilbertlee-amd
b8cf48fc16
Adding TransferBench tool (#113)
* Adding standalone TransferBench tool
2019-08-07 17:21:41 -06:00
Wenkai Du
f1c727d4ce
Merge pull request #112 from wenkaidu/hdp
Get HDP register address from hipDeviceGetAttribute API
2019-08-05 14:27:19 -07:00
Wenkai Du
84d3344796
Get HDP register address from hipDeviceGetAttribute API
2019-08-05 14:14:09 -07:00
Wenkai Du
4a9bdd8539
Merge pull request #108 from wenkaidu/xgmi_finegrain
Remove dependency on HSA_FORCE_FINE_GRAIN_PCIE flag for XGMI link
2019-08-02 10:00:48 -07:00
Wenkai Du
315f792f83
Merge pull request #110 from mhbliao/hliao/master/swdev-198268
Revise the previous fix to use the canonical path to HSA.
2019-08-01 12:46:25 -07:00