rohit pathania
bc51b5bc28
Display each workgroup, links, and directions with throughputs
...
[ROCm/rccl commit: e5b13d69e5 ]
2019-08-30 13:28:23 +05:30
Wenkai Du
daf2c4b200
Allocate opCount in pinned host memory for P2P transport
...
To avoid remote P2P read access when checking remote GPU's opCount
[ROCm/rccl commit: 8c975353ed ]
2019-08-29 10:22:09 -07:00
amdkila
ea0ce5c064
Merge pull request #128 from amdkila/hip-clang
...
Added hip-clang options to install script, and openmp/pthread flags
[ROCm/rccl commit: 259583cde6 ]
2019-08-27 16:23:40 -06:00
Wenkai Du
4afd6818ba
Set RDMA default to off state
...
[ROCm/rccl commit: 0f16ad966a ]
2019-08-26 10:59:33 -07:00
saadrahim
e433b21b23
Updating versioning to follow rocm-cmake standard (#126)
...
[ROCm/rccl commit: 544d4fb704 ]
2019-08-23 16:33:38 -06:00
Akila Premachandra
94b33a7550
Added hip-clang options to install script, and openmp/pthread options to CMakeLists.txt
...
[ROCm/rccl commit: f48ae5c98d ]
2019-08-23 22:02:42 +00:00
Wenkai Du
4df1defc3b
Merge pull request #125 from wenkaidu/fix_nvml_id
...
Assign unused nvmlDev to avoid random number
[ROCm/rccl commit: 6759660529 ]
2019-08-19 09:08:13 -07:00
Wenkai Du
54608abf5c
Merge pull request #117 from rpathani/xgmi_bench
...
Modified the code to use RTC clock frequency based on gpu gcn id
[ROCm/rccl commit: ee5dec4467 ]
2019-08-19 08:59:34 -07:00
rpathani
c441f2ff9b
Update rccl_prim_test.cpp
...
[ROCm/rccl commit: 40e30b5168 ]
2019-08-19 12:44:11 +05:30
Wenkai Du
04cd446d89
Assign unused nvmlDev to avoid random number
...
[ROCm/rccl commit: 86efdfc3b5 ]
2019-08-16 16:34:14 -07:00
Wenkai Du
60989a3fc9
Merge remote-tracking branch 'remotes/nccl/master' into HEAD
...
[ROCm/rccl commit: 7c38da0939 ]
2019-08-16 16:13:34 -07:00
Wenkai Du
7396d5c3ba
Tune AUTOUNROLL for better performance
...
Also remove all unused UNROLL defines
[ROCm/rccl commit: 1faededc03 ]
2019-08-16 10:34:53 -07:00
rpathani
eaa1cdb48c
Merge branch 'master' into xgmi_bench
...
[ROCm/rccl commit: deea20d49c ]
2019-08-16 10:56:56 +05:30
Michael LIAO
f4a240065f
Fix build with hip-clang.
...
- Add necessary function attribute for HIP programming model.
- Explicitly include hsa headers.
[ROCm/rccl commit: 9369f8d75d ]
2019-08-15 14:56:04 -04:00
Wenkai Du
d4862fa605
Tune LL threshold for VEGA
...
Also move the abort check after SPINS_BEFORE_CHECK_ABORT, as in NCCL
[ROCm/rccl commit: 2223cccf15 ]
2019-08-15 09:16:11 -07:00
Wenkai Du
93c44e96cb
Default to minimal 2 rings and improve LL loop
...
[ROCm/rccl commit: 4b77a16f3f ]
2019-08-14 14:12:56 -07:00
Wenkai Du
1feef99e7d
Remove duplicate line
...
[ROCm/rccl commit: 5782a8d857 ]
2019-08-14 13:22:43 -07:00
Wenkai Du
6047487815
RCCL 2.4 update
...
[ROCm/rccl commit: f11c8f60cd ]
2019-08-14 10:42:35 -07:00
David Addison
d57c0b0f92
Updated PR#196 to use a common hash function
...
[ROCm/rccl commit: fad079a8ae ]
2019-08-14 10:08:39 -07:00
David Addison
bb5b11fa23
Merge branch 'shm' of git://github.com/lowintelligence/nccl into lowintelligence-shm
...
[ROCm/rccl commit: 01d1836668 ]
2019-08-14 09:45:45 -07:00
rohit pathania
2dbcb62caf
Modified the code to use RTC clock frequency based on gpu gcn id
...
[ROCm/rccl commit: 65e2f5d87b ]
2019-08-14 12:55:12 +05:30
David Addison
c7957daee3
Make use of SO_REUSEPORT conditional
...
Fixes: #244
SO_REUSEPORT was introduced in Linux 3.9.
This change allows NCCL to compile against older releases.
The functionality is only required if the user is specifying
a NCCL bootstrap address via an environment variable.
[ROCm/rccl commit: 7f2b337e70 ]
2019-08-13 16:32:07 -07:00
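The conditional use of SO_REUSEPORT described in the entry above can be sketched as follows. This is a minimal illustration, not the code from the commit: the helper name `openReusableSocket` is hypothetical, and the only technique shown is guarding the option with a preprocessor check so the file still compiles against headers that predate SO_REUSEPORT.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not NCCL's actual function): open a TCP socket,
// always enable SO_REUSEADDR, and enable SO_REUSEPORT only when the
// system headers define it. Returns the socket fd, or -1 on failure.
int openReusableSocket() {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;

  int opt = 1;
  if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt)) != 0) {
    close(fd);
    return -1;
  }

#if defined(SO_REUSEPORT)
  // Compiled in only when the headers provide SO_REUSEPORT.
  if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &opt, sizeof(opt)) != 0) {
    close(fd);
    return -1;
  }
#endif

  return fd;
}
```

On headers without SO_REUSEPORT the guarded block disappears entirely, so the build succeeds and only the port-reuse behavior is lost.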
rohit pathania
042261445d
Merge branch 'xgmi_bench' of https://github.com/rpathani/rccl into xgmi_bench
...
# Conflicts:
# tools/rccl-prim-test/rccl_prim_test.cpp
[ROCm/rccl commit: 0f74929dab ]
2019-08-13 11:36:56 +05:30
rohit pathania
86f6d95b06
Adding linkinfo and srcGPU to destGPU info
...
[ROCm/rccl commit: 3bbf924ff8 ]
2019-08-13 11:28:50 +05:30
rohit pathania
95162665c7
Adding linkinfo and srcGPU to destGPU info
...
[ROCm/rccl commit: 5a2f74b8d0 ]
2019-08-09 12:44:06 +05:30
gilbertlee-amd
8645391260
Adding TransferBench tool (#113)
...
* Adding standalone TransferBench tool
[ROCm/rccl commit: b8cf48fc16 ]
2019-08-07 17:21:41 -06:00
Wenkai Du
909e014b51
Get HDP register address from hipDeviceGetAttribute API
...
[ROCm/rccl commit: 84d3344796 ]
2019-08-05 14:14:09 -07:00
Wenkai Du
b540c55c9b
Merge pull request #108 from wenkaidu/xgmi_finegrain
...
Remove dependency on HSA_FORCE_FINE_GRAIN_PCIE flag for XGMI link
[ROCm/rccl commit: 4a9bdd8539 ]
2019-08-02 10:00:48 -07:00
Michael LIAO
c14ef9f408
Revise the previous fix to use the canonical path to HSA.
...
- This fixes build failures under certain environments.
[ROCm/rccl commit: 4f2aa06688 ]
2019-08-01 14:50:44 -04:00
Wenkai Du
2dcb42effd
Remove dependency on HSA_FORCE_FINE_GRAIN_PCIE flag for XGMI link
...
[ROCm/rccl commit: e7022e9196 ]
2019-08-01 04:26:37 +00:00
Michael LIAO
4b5bf9f227
Fix build with hip-clang
...
Two minor issues are solved:
+ Enclose the kernel function in parentheses, because hip-clang defines
`hipLaunchKernelGGL` as a macro.
+ Explicitly include <hsa.h> for hip-clang.
[ROCm/rccl commit: 41310144f6 ]
2019-07-31 15:07:36 -04:00
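Why parentheses matter when a launch function becomes a macro can be sketched in plain C++. This rests on an assumption the commit does not state: that the kernel names involved are template instantiations whose argument lists contain commas, which the preprocessor would otherwise treat as macro-argument separators. `LAUNCH` and `scaleBy` are hypothetical stand-ins, not the real `hipLaunchKernelGGL` or any RCCL kernel.

```cpp
#include <cassert>

// Hypothetical stand-in for a function-like launch macro.
#define LAUNCH(kernel, arg) kernel(arg)

// Hypothetical "kernel" with two template parameters, so its name
// contains a comma when fully specialized.
template <typename T, int Scale>
T scaleBy(T x) {
  return x * Scale;
}

int launchScaled(int x) {
  // LAUNCH(scaleBy<int, 3>, x) would fail to compile: the preprocessor
  // splits on the comma and sees three macro arguments instead of two.
  // Wrapping the name in parentheses keeps it a single argument.
  return LAUNCH((scaleBy<int, 3>), x);
}
```

The parenthesized form expands to `(scaleBy<int, 3>)(x)`, which is an ordinary function call.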
Cao Zongyan
d45a1180f7
Refine RPM package building spec file.
...
Add /sbin/ldconfig into RPM package install operations.
[ROCm/rccl commit: bfb3921519 ]
2019-07-31 10:36:22 -07:00
Wenkai Du
6688279075
Add gfx908 target (#106)
...
[ROCm/rccl commit: 1969e89003 ]
2019-07-30 13:56:45 -07:00
Wenkai Du
62e6e67e31
Remove extra "." from version string (#104)
...
[ROCm/rccl commit: 1fee6f9d50 ]
2019-07-25 15:25:02 -07:00
saadrahim
596e200499
Changing to rocm-cmake new style versioning (#103)
...
[ROCm/rccl commit: fdee095dd3 ]
2019-07-22 23:40:13 +00:00
Wenkai Du
d7f25d5be7
Use hipExtLaunchMultiKernelMultiDevice API (#100)
...
Depends on HIP version with this pull request:
https://github.com/ROCm-Developer-Tools/HIP/pull/1232
[ROCm/rccl commit: 0522041fac ]
2019-07-18 09:02:37 -07:00
Ke Wen
a66ab68630
Fix NIC distances for 11+ NICs
...
[ROCm/rccl commit: 4d579e51cc ]
2019-07-17 06:32:33 -07:00
Ke Wen
5c5c58c73b
Fix #224: prevent number of IB devices from going out of bound
...
[ROCm/rccl commit: 920ae57c14 ]
2019-07-17 06:32:33 -07:00
Wenkai Du
25d29e97d1
Increase debug print of ring topology to 64 ranks (#99)
...
[ROCm/rccl commit: dc1908e944 ]
2019-07-16 14:54:17 -07:00
Wenkai Du
602292685d
Allocate transport memory based on numa node (#97)
...
[ROCm/rccl commit: 43bd6f5fbf ]
2019-07-15 11:45:38 -07:00
Ke Wen
4211da6d29
Size up IPC buffers to multiples of 2MB
...
Avoid potential CUDA error in concurrent communicator initialization
[ROCm/rccl commit: c8c68fb5f7 ]
2019-07-12 09:50:17 -07:00
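Sizing a buffer up to a multiple of 2MB, as the entry above describes, is a round-up computation. The sketch below shows only that arithmetic; the names `kIpcAlign` and `roundUpToMultiple` are hypothetical and are not taken from the commit.

```cpp
#include <cassert>
#include <cstddef>

// 2 MB granularity, as named in the commit subject.
constexpr std::size_t kIpcAlign = 2u * 1024 * 1024;

// Hypothetical helper: smallest multiple of `align` that is >= `size`.
// `align` must be non-zero.
constexpr std::size_t roundUpToMultiple(std::size_t size, std::size_t align) {
  return ((size + align - 1) / align) * align;
}
```

For example, a request of 2097153 bytes (one byte over 2MB) would be sized up to 4194304 bytes, while a request that is already a multiple of 2MB is left unchanged.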
gilbertlee-amd
4310b5b4c1
Adding explicit HDP flush when using RDMA via Infiniband (#95)
...
* Adding explicit HDP flush when using RDMA via Infiniband
[ROCm/rccl commit: 7b6332d3d0 ]
2019-07-10 16:29:02 -06:00
Hirochika Asai
ee08e8b421
Add the exact matching modifier support "=" to the NCCL_IB_HCA variable (#236)
...
Perform exact matching when the prefix "=" is specified in the NCCL_IB_HCA variable, so that specifying mlx5_X no longer also selects HCAs matching mlx5_X[0-9]+.
[ROCm/rccl commit: 0b192d2299 ]
2019-07-09 14:45:41 -07:00
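The difference between prefix matching and the "=" exact-matching modifier can be sketched as below. This is an illustration under stated assumptions, not NCCL's parser: `hcaMatches` is a hypothetical function, the spec is assumed to be a comma-separated list of device names, and any other syntax the real variable accepts is ignored.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical matcher: returns true if device name `dev` is selected by
// `spec`. Without a leading "=", each listed name is treated as a prefix;
// with a leading "=", each listed name must match exactly.
bool hcaMatches(const std::string& spec, const std::string& dev) {
  const bool exact = !spec.empty() && spec[0] == '=';
  std::istringstream list(exact ? spec.substr(1) : spec);
  std::string name;
  while (std::getline(list, name, ',')) {
    if (name.empty()) continue;
    const bool hit = exact ? (dev == name)
                           : (dev.compare(0, name.size(), name) == 0);
    if (hit) return true;
  }
  return false;
}
```

With this sketch, the spec `mlx5_1` selects both `mlx5_1` and `mlx5_10`, while `=mlx5_1` selects only `mlx5_1`.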
Wenkai Du
b7322c800a
Refactor primitive test to support multiple GPUs in rings (#94)
...
* Refactor primitive test to support multiple GPUs in rings
* Make GPUs sync before transfer optional
* Use same ring format as RCCL
* Extend to 8 GPUs and report errors if there is no P2P access
* Control GPUs sync before ops from command line with "-s" option
* Change buffer size through command line option "-n"
Rename iterations command line option to "-i"
[ROCm/rccl commit: 70804da15b ]
2019-07-05 14:29:20 -07:00
Wenkai Du
20975921dd
Fix shared memory collision in multi-communicator case. (#93)
...
The current SHM object name uses only pidHash and ranks for
identification, so names collide when a program runs with multiple
communicators. This change adds commId info into pidHash, so that the
pidHashes of different communicators in the same process are distinct
from each other.
Ported from original commit: https://github.com/lowintelligence/nccl/commits/shm
[ROCm/rccl commit: 949d680e49 ]
2019-07-02 09:27:16 -07:00
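The idea of folding commId into pidHash can be sketched as follows. Everything concrete here is an assumption made for illustration: FNV-1a stands in for whatever hash the library uses, and the function names and the name format string are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// FNV-1a (64-bit), used here only as a stand-in hash. `h` lets a caller
// continue hashing from a previous state.
std::uint64_t fnv1a(const std::string& s,
                    std::uint64_t h = 14695981039346656037ull) {
  for (unsigned char c : s) {
    h ^= c;
    h *= 1099511628211ull;
  }
  return h;
}

// Hypothetical SHM object name: the hash covers both the process id and
// the communicator id, so two communicators in one process get
// different names for the same pair of ranks.
std::string shmName(int pid, const std::string& commId,
                    int sendRank, int recvRank) {
  const std::uint64_t pidHash = fnv1a(commId, fnv1a(std::to_string(pid)));
  char buf[96];
  std::snprintf(buf, sizeof(buf), "shm-%llx-%d-%d",
                static_cast<unsigned long long>(pidHash), sendRank, recvRank);
  return buf;
}
```

If the hash covered only the pid, both communicators in a process would produce the same name for ranks 0 and 1, which is the collision the commit message describes.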
Wenkai Du
90b7a02944
Match primitives unroll counts with latest RCCL (#91)
...
[ROCm/rccl commit: e6a0da444f ]
2019-06-26 15:09:13 -07:00
Stanley Tsang
6aa817d768
Fixing install script to actually install library when requested (#88)
...
* Fixing install script to actually install library when requested. Cleaning up unused code.
Removing unused arguments from install script.
Fixing weird whitespacing
* Fixing install script to install to correct location /opt/rocm, now creates symlink in /opt/rocm/lib
* Updates and corrections to README and install script
[ROCm/rccl commit: 329a62a01f ]
2019-06-25 17:25:21 -06:00
Ke Wen
3c13a4d1bb
Merge branch 'master' into HEAD
...
[ROCm/rccl commit: 8e04d80382 ]
2019-06-25 13:39:08 -07:00
Ke Wen
b91d8170f8
2.4.8-1
...
Fix #209: improve socket transport performance
Split transfers over multiple sockets
Launch multiple threads to drive sockets
Detect AWS NICs and set nsockets/nthreads accordingly
[ROCm/rccl commit: 7c72dee660 ]
2019-06-25 13:22:47 -07:00
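Splitting one transfer over several sockets amounts to partitioning a byte range into per-socket chunks. The sketch below shows one simple way to do that, an even split with the remainder spread over the first chunks; the actual partitioning used in 2.4.8-1 is not described in the entry above, and `Chunk` and `splitTransfer` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous piece of the transfer, assigned to a single socket.
struct Chunk {
  std::size_t offset;
  std::size_t size;
};

// Hypothetical partitioner: divide `bytes` across `nSockets` chunks whose
// sizes differ by at most one byte. Returns an empty list if nSockets is 0.
std::vector<Chunk> splitTransfer(std::size_t bytes, std::size_t nSockets) {
  std::vector<Chunk> chunks;
  if (nSockets == 0) return chunks;
  const std::size_t base = bytes / nSockets;
  const std::size_t extra = bytes % nSockets;
  std::size_t offset = 0;
  for (std::size_t i = 0; i < nSockets; ++i) {
    const std::size_t size = base + (i < extra ? 1 : 0);
    chunks.push_back({offset, size});
    offset += size;
  }
  return chunks;
}
```

Each chunk could then be handed to its own socket, with a pool of threads driving the sockets concurrently as the commit message outlines.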
saadrahim
239c7bdf44
Changing maintainer to no-reply to fix deb generation (#86)
...
[ROCm/rccl commit: 840f8715ef ]
2019-06-24 17:13:57 -06:00