303 commits

Author SHA1 Message Date
Wenkai Du 8ec01dde33 Merge pull request #132 from wenkaidu/reduce_kernels
Only generate kernels for sum and copy
2019-09-26 16:14:45 -07:00
Wenkai Du 61ef1dcad5 Only generate kernels for sum and copy 2019-09-24 17:01:12 -07:00
Gilbert Lee 6232985e34 Re-adding gfx908 target 2019-09-13 16:57:34 +00:00
Gilbert Lee 86ce0a93b5 RDMA HDP flush fix 2019-09-06 16:35:55 +00:00
Gilbert Lee 3e6b326a19 Revert "Set RDMA default to off state"
This reverts commit 0f16ad966a.
2019-09-05 18:16:53 +00:00
gilbertlee-amd eaf25ab099 Merge pull request #131 from rpathani/xgmi_bench
Read operation throughput
2019-09-04 09:59:13 -06:00
rohit pathania a270ee080e Read operation throughput 2019-09-03 14:58:40 +05:30
Wenkai Du 22c9ae0712 Merge pull request #129 from rpathani/xgmi_bench
display each workgroup, links and directions with throughputs
2019-08-30 09:06:21 -07:00
rohit pathania e5b13d69e5 display each workgroup, links and directions with throughputs 2019-08-30 13:28:23 +05:30
Wenkai Du 9c501fb8fb Merge pull request #130 from wenkaidu/p2p_fix
Allocate opCount in pinned host memory for P2P transport
2019-08-29 14:12:03 -07:00
Wenkai Du 8c975353ed Allocate opCount in pinned host memory for P2P transport
To avoid remote P2P read access when checking remote GPU's opCount
2019-08-29 10:22:09 -07:00
amdkila 259583cde6 Merge pull request #128 from amdkila/hip-clang
Added hip-clang options to install script, and openmp/pthread flags
2019-08-27 16:23:40 -06:00
Wenkai Du a4ef5a3dd4 Merge pull request #127 from wenkaidu/rdma
Set RDMA default to off state
2019-08-26 11:46:10 -07:00
Wenkai Du 0f16ad966a Set RDMA default to off state 2019-08-26 10:59:33 -07:00
saadrahim 544d4fb704 Updating versioning to follow rocm-cmake standard (#126) 2019-08-23 16:33:38 -06:00
Akila Premachandra f48ae5c98d Added hip-clang options to install script, and openmp/pthread options to CMakeLists.txt 2019-08-23 22:02:42 +00:00
Wenkai Du 6759660529 Merge pull request #125 from wenkaidu/fix_nvml_id
Assign unused nvmlDev to avoid random number
2019-08-19 09:08:13 -07:00
Wenkai Du ee5dec4467 Merge pull request #117 from rpathani/xgmi_bench
Modified the code to use RTC clock frequency based on gpu gcn id
2019-08-19 08:59:34 -07:00
rpathani 40e30b5168 Update rccl_prim_test.cpp 2019-08-19 12:44:11 +05:30
Wenkai Du a67ae11ce4 Merge pull request #124 from wenkaidu/upstream_sync
Upstream sync
2019-08-16 16:41:55 -07:00
Wenkai Du 86efdfc3b5 Assign unused nvmlDev to avoid random number 2019-08-16 16:34:14 -07:00
Wenkai Du 7c38da0939 Merge remote-tracking branch 'remotes/nccl/master' into HEAD 2019-08-16 16:13:34 -07:00
Wenkai Du 72a64e27f3 Merge pull request #123 from wenkaidu/tune_unroll
Tune AUTOUNROLL for better performance
2019-08-16 11:15:49 -07:00
Wenkai Du 1faededc03 Tune AUTOUNROLL for better performance
Also remove all unused UNROLL defines
2019-08-16 10:34:53 -07:00
rpathani deea20d49c Merge branch 'master' into xgmi_bench 2019-08-16 10:56:56 +05:30
Wenkai Du 50c2202fe9 Merge pull request #121 from mhbliao/hliao/master/swdev-200061
Fix build with hip-clang.
2019-08-15 12:40:46 -07:00
Michael LIAO 9369f8d75d Fix build with hip-clang.
- Add necessary function attribute for HIP programming model.
- Explicitly include hsa headers.
2019-08-15 14:56:04 -04:00
Wenkai Du 3f6662f837 Merge pull request #122 from wenkaidu/tune_ll
Tune LL threshold for VEGA
2019-08-15 10:33:17 -07:00
Wenkai Du 2223cccf15 Tune LL threshold for VEGA
Also move abort check after SPINS_BEFORE_CHECK_ABORT, as in NCCL
2019-08-15 09:16:11 -07:00
Wenkai Du 9af66195db Merge pull request #120 from wenkaidu/rccl_2.4_update
RCCL 2.4 update
2019-08-14 15:21:30 -07:00
Wenkai Du 4b77a16f3f Default to minimal 2 rings and improve LL loop 2019-08-14 14:12:56 -07:00
Wenkai Du 5782a8d857 Remove duplicate line 2019-08-14 13:22:43 -07:00
Wenkai Du 6827b174c0 Merge remote-tracking branch 'remotes/rccl/master' into rccl_2.4_update 2019-08-14 10:44:18 -07:00
Wenkai Du f11c8f60cd RCCL 2.4 update 2019-08-14 10:42:35 -07:00
David Addison ccb1298148 Merge branch 'lowintelligence-shm'
PR#196
2019-08-14 10:09:53 -07:00
David Addison fad079a8ae Updated PR#196 to use a common hash function 2019-08-14 10:08:39 -07:00
David Addison 01d1836668 Merge branch 'shm' of git://github.com/lowintelligence/nccl into lowintelligence-shm 2019-08-14 09:45:45 -07:00
rohit pathania 65e2f5d87b Modified the code to use RTC clock frequency based on gpu gcn id 2019-08-14 12:55:12 +05:30
David Addison 7f2b337e70 Make use of SO_REUSEPORT conditional
Fixes: #244

SO_REUSEPORT is only available in Linux 3.9 and later.
This change allows NCCL to compile against older releases.

The functionality is only required if the user is specifying
a NCCL bootstrap address via an environment variable.
2019-08-13 16:32:07 -07:00
rpathani 40445c17d8 Adding linkinfo and srcGPU to destGPU info (#114)
* Adding linkinfo and srcGPU to destGPU info
2019-08-13 09:25:03 -07:00
rohit pathania 0f74929dab Merge branch 'xgmi_bench' of https://github.com/rpathani/rccl into xgmi_bench
# Conflicts:
#	tools/rccl-prim-test/rccl_prim_test.cpp
2019-08-13 11:36:56 +05:30
rohit pathania 3bbf924ff8 Adding linkinfo and srcGPU to destGPU info 2019-08-13 11:28:50 +05:30
Stanley Tsang b3a57dbb33 Merge pull request #116 from stanleytsang-amd/master
Removing unnecessary device collective source files.
2019-08-12 18:26:02 -04:00
Stanley Tsang 3a61907182 Removing unnecessary device collective source files. 2019-08-12 18:23:23 +00:00
rohit pathania 5a2f74b8d0 Adding linkinfo and srcGPU to destGPU info 2019-08-09 12:44:06 +05:30
gilbertlee-amd b8cf48fc16 Adding TransferBench tool (#113)
* Adding standalone TransferBench tool
2019-08-07 17:21:41 -06:00
Wenkai Du f1c727d4ce Merge pull request #112 from wenkaidu/hdp
Get HDP register address from hipDeviceGetAttribute API
2019-08-05 14:27:19 -07:00
Wenkai Du 84d3344796 Get HDP register address from hipDeviceGetAttribute API 2019-08-05 14:14:09 -07:00
Wenkai Du 4a9bdd8539 Merge pull request #108 from wenkaidu/xgmi_finegrain
Remove dependency on HSA_FORCE_FINE_GRAIN_PCIE flag for XGMI link
2019-08-02 10:00:48 -07:00
Wenkai Du 315f792f83 Merge pull request #110 from mhbliao/hliao/master/swdev-198268
Revise the previous fix to use the canonical path to HSA.
2019-08-01 12:46:25 -07:00