Commit graph

214 commits

Author SHA1 Message Date
rohit pathania bc51b5bc28 Display each workgroup, links, and directions with throughputs
[ROCm/rccl commit: e5b13d69e5]
2019-08-30 13:28:23 +05:30
Wenkai Du daf2c4b200 Allocate opCount in pinned host memory for P2P transport
To avoid remote P2P read access when checking remote GPU's opCount


[ROCm/rccl commit: 8c975353ed]
2019-08-29 10:22:09 -07:00
amdkila ea0ce5c064 Merge pull request #128 from amdkila/hip-clang
Added hip-clang options to install script, and openmp/pthread flags

[ROCm/rccl commit: 259583cde6]
2019-08-27 16:23:40 -06:00
Wenkai Du 4afd6818ba Set RDMA default to off state
[ROCm/rccl commit: 0f16ad966a]
2019-08-26 10:59:33 -07:00
saadrahim e433b21b23 Updating versioning to follow rocm-cmake standard (#126)
[ROCm/rccl commit: 544d4fb704]
2019-08-23 16:33:38 -06:00
Akila Premachandra 94b33a7550 Added hip-clang options to install script, and openmp/pthread options to CMakeLists.txt
[ROCm/rccl commit: f48ae5c98d]
2019-08-23 22:02:42 +00:00
Wenkai Du 4df1defc3b Merge pull request #125 from wenkaidu/fix_nvml_id
Assign unused nvmlDev to avoid random number

[ROCm/rccl commit: 6759660529]
2019-08-19 09:08:13 -07:00
Wenkai Du 54608abf5c Merge pull request #117 from rpathani/xgmi_bench
Modified the code to use RTC clock frequency based on gpu gcn id

[ROCm/rccl commit: ee5dec4467]
2019-08-19 08:59:34 -07:00
rpathani c441f2ff9b Update rccl_prim_test.cpp
[ROCm/rccl commit: 40e30b5168]
2019-08-19 12:44:11 +05:30
Wenkai Du 04cd446d89 Assign unused nvmlDev to avoid random number
[ROCm/rccl commit: 86efdfc3b5]
2019-08-16 16:34:14 -07:00
Wenkai Du 60989a3fc9 Merge remote-tracking branch 'remotes/nccl/master' into HEAD
[ROCm/rccl commit: 7c38da0939]
2019-08-16 16:13:34 -07:00
Wenkai Du 7396d5c3ba Tune AUTOUNROLL for better performance
Also remove all unused UNROLL defines


[ROCm/rccl commit: 1faededc03]
2019-08-16 10:34:53 -07:00
rpathani eaa1cdb48c Merge branch 'master' into xgmi_bench
[ROCm/rccl commit: deea20d49c]
2019-08-16 10:56:56 +05:30
Michael LIAO f4a240065f Fix build with hip-clang.
- Add necessary function attribute for HIP programming model.
- Explicitly include hsa headers.


[ROCm/rccl commit: 9369f8d75d]
2019-08-15 14:56:04 -04:00
Wenkai Du d4862fa605 Tune LL threshold for VEGA
Also move the abort check after SPINS_BEFORE_CHECK_ABORT, as in NCCL


[ROCm/rccl commit: 2223cccf15]
2019-08-15 09:16:11 -07:00
Wenkai Du 93c44e96cb Default to minimal 2 rings and improve LL loop
[ROCm/rccl commit: 4b77a16f3f]
2019-08-14 14:12:56 -07:00
Wenkai Du 1feef99e7d Remove duplicate line
[ROCm/rccl commit: 5782a8d857]
2019-08-14 13:22:43 -07:00
Wenkai Du 6047487815 RCCL 2.4 update
[ROCm/rccl commit: f11c8f60cd]
2019-08-14 10:42:35 -07:00
David Addison d57c0b0f92 Updated PR#196 to use a common hash function
[ROCm/rccl commit: fad079a8ae]
2019-08-14 10:08:39 -07:00
David Addison bb5b11fa23 Merge branch 'shm' of git://github.com/lowintelligence/nccl into lowintelligence-shm
[ROCm/rccl commit: 01d1836668]
2019-08-14 09:45:45 -07:00
rohit pathania 2dbcb62caf Modified the code to use RTC clock frequency based on gpu gcn id
[ROCm/rccl commit: 65e2f5d87b]
2019-08-14 12:55:12 +05:30
David Addison c7957daee3 Make use of SO_REUSEPORT conditional
Fixes: #244

SO_REUSEPORT was introduced in Linux 3.9.
This change allows NCCL to compile against older releases.

The functionality is only required if the user is specifying
a NCCL bootstrap address via an environment variable.


[ROCm/rccl commit: 7f2b337e70]
2019-08-13 16:32:07 -07:00
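
A minimal sketch of the idea behind the SO_REUSEPORT commit above (c7957daee3), written in Python rather than the C/C++ of the actual change. The helper name `make_listen_socket` is hypothetical; the point is only that the option is applied when the platform defines it and skipped otherwise.

```python
import socket


def make_listen_socket(reuse_port: bool = True) -> socket.socket:
    """Create a TCP socket, enabling SO_REUSEPORT only where it exists.

    Hypothetical helper for illustration; not taken from NCCL or RCCL.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Older kernels and some platforms do not define SO_REUSEPORT,
    # so the option is guarded instead of assumed.
    if reuse_port and hasattr(socket, "SO_REUSEPORT"):
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        except OSError:
            # The constant is defined but the running kernel rejects it.
            pass
    return sock
```

In C the same guard would typically be a preprocessor check on the `SO_REUSEPORT` macro, which is what lets the code compile against older headers.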
rohit pathania 042261445d Merge branch 'xgmi_bench' of https://github.com/rpathani/rccl into xgmi_bench
# Conflicts:
#	tools/rccl-prim-test/rccl_prim_test.cpp


[ROCm/rccl commit: 0f74929dab]
2019-08-13 11:36:56 +05:30
rohit pathania 86f6d95b06 Adding linkinfo and srcGPU to destGPU info
[ROCm/rccl commit: 3bbf924ff8]
2019-08-13 11:28:50 +05:30
rohit pathania 95162665c7 Adding linkinfo and srcGPU to destGPU info
[ROCm/rccl commit: 5a2f74b8d0]
2019-08-09 12:44:06 +05:30
gilbertlee-amd 8645391260 Adding TransferBench tool (#113)
* Adding standalone TransferBench tool

[ROCm/rccl commit: b8cf48fc16]
2019-08-07 17:21:41 -06:00
Wenkai Du 909e014b51 Get HDP register address from hipDeviceGetAttribute API
[ROCm/rccl commit: 84d3344796]
2019-08-05 14:14:09 -07:00
Wenkai Du b540c55c9b Merge pull request #108 from wenkaidu/xgmi_finegrain
Remove dependency on HSA_FORCE_FINE_GRAIN_PCIE flag for XGMI link

[ROCm/rccl commit: 4a9bdd8539]
2019-08-02 10:00:48 -07:00
Michael LIAO c14ef9f408 Revise the previous fix to use the canonical path to HSA.
- This fixes build failures under certain environments.


[ROCm/rccl commit: 4f2aa06688]
2019-08-01 14:50:44 -04:00
Wenkai Du 2dcb42effd Remove dependency on HSA_FORCE_FINE_GRAIN_PCIE flag for XGMI link
[ROCm/rccl commit: e7022e9196]
2019-08-01 04:26:37 +00:00
Michael LIAO 4b5bf9f227 Fix build with hip-clang
Two minor issues are solved:
+ Enclose the kernel function in parentheses, as hip-clang defines
  `hipLaunchKernelGGL` as a macro.
+ Explicitly include <hsa.h> for hip-clang.


[ROCm/rccl commit: 41310144f6]
2019-07-31 15:07:36 -04:00
Cao Zongyan d45a1180f7 Refine RPM package building spec file.
Add /sbin/ldconfig into RPM package install operations.


[ROCm/rccl commit: bfb3921519]
2019-07-31 10:36:22 -07:00
Wenkai Du 6688279075 Add gfx908 target (#106)
[ROCm/rccl commit: 1969e89003]
2019-07-30 13:56:45 -07:00
Wenkai Du 62e6e67e31 Remove extra "." from version string (#104)
[ROCm/rccl commit: 1fee6f9d50]
2019-07-25 15:25:02 -07:00
saadrahim 596e200499 Changing to rocm-cmake new style versioning (#103)
[ROCm/rccl commit: fdee095dd3]
2019-07-22 23:40:13 +00:00
Wenkai Du d7f25d5be7 Use hipExtLaunchMultiKernelMultiDevice API (#100)
Depends on HIP version with this pull request:
https://github.com/ROCm-Developer-Tools/HIP/pull/1232

[ROCm/rccl commit: 0522041fac]
2019-07-18 09:02:37 -07:00
Ke Wen a66ab68630 Fix NIC distances for 11+ NICs
[ROCm/rccl commit: 4d579e51cc]
2019-07-17 06:32:33 -07:00
Ke Wen 5c5c58c73b Fix #224: prevent number of IB devices from going out of bound
[ROCm/rccl commit: 920ae57c14]
2019-07-17 06:32:33 -07:00
Wenkai Du 25d29e97d1 Increase debug print of ring topology to 64 ranks (#99)
[ROCm/rccl commit: dc1908e944]
2019-07-16 14:54:17 -07:00
Wenkai Du 602292685d Allocate transport memory based on numa node (#97)
[ROCm/rccl commit: 43bd6f5fbf]
2019-07-15 11:45:38 -07:00
Ke Wen 4211da6d29 Size up IPC buffers to multiples of 2MB
Avoid potential CUDA error in concurrent communicator initialization


[ROCm/rccl commit: c8c68fb5f7]
2019-07-12 09:50:17 -07:00
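
The rounding described in the commit above (4211da6d29) amounts to integer round-up arithmetic. The sketch below is an illustration only; `round_up_ipc_size` and `TWO_MB` are hypothetical names, not identifiers from the NCCL source.

```python
TWO_MB = 2 * 1024 * 1024  # granularity named in the commit title


def round_up_ipc_size(size: int, granularity: int = TWO_MB) -> int:
    """Round size up to the next multiple of granularity (hypothetical helper)."""
    if size < 0:
        raise ValueError("size must be non-negative")
    # Adding granularity - 1 before integer division rounds up, not down.
    return ((size + granularity - 1) // granularity) * granularity
```

For example, a request for 2 MB plus one byte would be sized up to 4 MB, while an exact 2 MB request is left unchanged.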
gilbertlee-amd 4310b5b4c1 Adding explicit HDP flush when using RDMA via Infiniband (#95)
* Adding explicit HDP flush when using RDMA via Infiniband



[ROCm/rccl commit: 7b6332d3d0]
2019-07-10 16:29:02 -06:00
Hirochika Asai ee08e8b421 Add the exact matching modifier support "=" to the NCCL_IB_HCA variable (#236)
Perform exact matching when the prefix "=" is specified in the NCCL_IB_HCA variable to exclude HCAs mlx5_X[0-9]+ when mlx5_X is specified.

[ROCm/rccl commit: 0b192d2299]
2019-07-09 14:45:41 -07:00
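
The difference between prefix matching and the "=" exact-match modifier from the commit above (ee08e8b421) can be sketched as follows. This is a simplified illustration: `select_hcas` is a hypothetical function, the comma-separated list handling is an assumption, and other parts of the real `NCCL_IB_HCA` syntax are not modeled.

```python
from typing import List


def select_hcas(spec: str, devices: List[str]) -> List[str]:
    """Return the devices selected by an NCCL_IB_HCA-style spec (sketch).

    A leading "=" switches from prefix matching to exact matching.
    """
    exact = spec.startswith("=")
    body = spec[1:] if exact else spec
    names = [n.strip() for n in body.split(",") if n.strip()]
    selected = []
    for dev in devices:
        for name in names:
            if (dev == name) if exact else dev.startswith(name):
                selected.append(dev)
                break
    return selected
```

With devices `mlx5_1`, `mlx5_10`, and `mlx5_2`, the spec `mlx5_1` selects both `mlx5_1` and `mlx5_10`, whereas `=mlx5_1` selects only `mlx5_1`.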
Wenkai Du b7322c800a Refactor primitive test to support multiple GPUs in rings (#94)
* Refactor primitive test to support multiple GPUs in rings

* Make GPUs sync before transfer optional

* Use same ring format as RCCL

* Extend to 8 GPUs and report errors if there is no P2P access

* Control GPUs sync before ops from command line with "-s" option

* Change buffer size through command line option "-n"

Rename iterations command line option to "-i"


[ROCm/rccl commit: 70804da15b]
2019-07-05 14:29:20 -07:00
Wenkai Du 20975921dd Fix shared memory collision in multi-communicator case. (#93)
The current SHM object name uses only pidHash and ranks as
identification, so names collide when a program runs with
multiple communicators. This change adds commId info into pidHash, so
the pidHash values of different communicators in the same process are
distinct from each other.

Ported from original commit: https://github.com/lowintelligence/nccl/commits/shm

[ROCm/rccl commit: 949d680e49]
2019-07-02 09:27:16 -07:00
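
A rough sketch of the collision described in the commit above (20975921dd) and of why mixing the communicator ID into the hash avoids it. The name format, the use of SHA-1, and the function `shm_name` are all assumptions made for illustration; the real code uses its own hash function and naming scheme.

```python
import hashlib


def shm_name(pid: int, comm_id: bytes, send_rank: int, recv_rank: int,
             include_comm_id: bool = True) -> str:
    """Build a shared-memory object name (hypothetical format).

    With include_comm_id=False the name depends only on the process and
    the ranks, which reproduces the collision the commit describes.
    """
    h = hashlib.sha1()
    h.update(str(pid).encode())
    if include_comm_id:
        h.update(b"|")
        h.update(comm_id)
    return f"example-shm-{h.hexdigest()[:16]}-{send_rank}-{recv_rank}"
```

Two communicators in the same process that connect the same pair of ranks get the same name when `include_comm_id` is false and different names when it is true.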
Wenkai Du 90b7a02944 Match primitives unroll counts with latest RCCL (#91)
[ROCm/rccl commit: e6a0da444f]
2019-06-26 15:09:13 -07:00
Stanley Tsang 6aa817d768 Fixing install script to actually install library when requested (#88)
* Fixing install script to actually install library when requested.  Cleaning up unused code.

Removing unused arguments from install script.

Fixing weird whitespacing

* Fixing install script to install to correct location /opt/rocm, now creates symlink in /opt/rocm/lib

* Updates and corrections to README and install script


[ROCm/rccl commit: 329a62a01f]
2019-06-25 17:25:21 -06:00
Ke Wen 3c13a4d1bb Merge branch 'master' into HEAD
[ROCm/rccl commit: 8e04d80382]
2019-06-25 13:39:08 -07:00
Ke Wen b91d8170f8 2.4.8-1
Fix #209: improve socket transport performance
  Split transfers over multiple sockets
  Launch multiple threads to drive sockets
  Detect AWS NICs and set nsockets/nthreads accordingly


[ROCm/rccl commit: 7c72dee660]
2019-06-25 13:22:47 -07:00
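
The 2.4.8-1 entry above mentions splitting transfers over multiple sockets. One simple way such a split could be computed is shown below, assuming an even division of the buffer; `split_transfer` is a hypothetical helper and does not reflect how NCCL actually schedules its sockets or threads.

```python
from typing import List, Tuple


def split_transfer(nbytes: int, nsockets: int) -> List[Tuple[int, int]]:
    """Divide nbytes into nsockets contiguous (offset, size) chunks.

    Hypothetical helper: the first nbytes % nsockets chunks get one extra byte.
    """
    if nsockets <= 0:
        raise ValueError("nsockets must be positive")
    if nbytes < 0:
        raise ValueError("nbytes must be non-negative")
    base, extra = divmod(nbytes, nsockets)
    chunks = []
    offset = 0
    for i in range(nsockets):
        size = base + (1 if i < extra else 0)
        chunks.append((offset, size))
        offset += size
    return chunks
```

Each chunk could then be handed to a separate socket, with the threads driving those sockets working through their chunks in parallel.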
saadrahim 239c7bdf44 Changing maintainer to no-reply to fix deb generation (#86)
[ROCm/rccl commit: 840f8715ef]
2019-06-24 17:13:57 -06:00