rpathani
eaa1cdb48c
Merge branch 'master' into xgmi_bench
...
[ROCm/rccl commit: deea20d49c ]
2019-08-16 10:56:56 +05:30
Michael LIAO
f4a240065f
Fix build with hip-clang.
...
- Add necessary function attribute for HIP programming model.
- Explicitly include hsa headers.
[ROCm/rccl commit: 9369f8d75d ]
2019-08-15 14:56:04 -04:00
Wenkai Du
d4862fa605
Tune LL threshold for VEGA
...
Also move the abort check after SPINS_BEFORE_CHECK_ABORT, as in NCCL
[ROCm/rccl commit: 2223cccf15 ]
2019-08-15 09:16:11 -07:00
Wenkai Du
93c44e96cb
Default to minimal 2 rings and improve LL loop
...
[ROCm/rccl commit: 4b77a16f3f ]
2019-08-14 14:12:56 -07:00
Wenkai Du
1feef99e7d
Remove duplicate line
...
[ROCm/rccl commit: 5782a8d857 ]
2019-08-14 13:22:43 -07:00
Wenkai Du
6047487815
RCCL 2.4 update
...
[ROCm/rccl commit: f11c8f60cd ]
2019-08-14 10:42:35 -07:00
rohit pathania
2dbcb62caf
Modified the code to use RTC clock frequency based on gpu gcn id
...
[ROCm/rccl commit: 65e2f5d87b ]
2019-08-14 12:55:12 +05:30
rohit pathania
042261445d
Merge branch 'xgmi_bench' of https://github.com/rpathani/rccl into xgmi_bench
...
# Conflicts:
# tools/rccl-prim-test/rccl_prim_test.cpp
[ROCm/rccl commit: 0f74929dab ]
2019-08-13 11:36:56 +05:30
rohit pathania
86f6d95b06
Adding linkinfo and srcGPU to destGPU info
...
[ROCm/rccl commit: 3bbf924ff8 ]
2019-08-13 11:28:50 +05:30
rohit pathania
95162665c7
Adding linkinfo and srcGPU to destGPU info
...
[ROCm/rccl commit: 5a2f74b8d0 ]
2019-08-09 12:44:06 +05:30
gilbertlee-amd
8645391260
Adding TransferBench tool ( #113 )
...
* Adding standalone TransferBench tool
[ROCm/rccl commit: b8cf48fc16 ]
2019-08-07 17:21:41 -06:00
Wenkai Du
909e014b51
Get HDP register address from hipDeviceGetAttribute API
...
[ROCm/rccl commit: 84d3344796 ]
2019-08-05 14:14:09 -07:00
Wenkai Du
b540c55c9b
Merge pull request #108 from wenkaidu/xgmi_finegrain
...
Remove dependency on HSA_FORCE_FINE_GRAIN_PCIE flag for XGMI link
[ROCm/rccl commit: 4a9bdd8539 ]
2019-08-02 10:00:48 -07:00
Michael LIAO
c14ef9f408
Revise the previous fix to use the canonical path to HSA.
...
- This fixes the build failures under certain environments.
[ROCm/rccl commit: 4f2aa06688 ]
2019-08-01 14:50:44 -04:00
Wenkai Du
2dcb42effd
Remove dependency on HSA_FORCE_FINE_GRAIN_PCIE flag for XGMI link
...
[ROCm/rccl commit: e7022e9196 ]
2019-08-01 04:26:37 +00:00
Michael LIAO
4b5bf9f227
Fix build with hip-clang
...
Two minor issues are solved:
+ Enclose the kernel function in parentheses, as hip-clang defines
`hipLaunchKernelGGL` as a macro.
+ Explicitly include <hsa.h> for hip-clang.
[ROCm/rccl commit: 41310144f6 ]
2019-07-31 15:07:36 -04:00
Wenkai Du
6688279075
Add gfx908 target ( #106 )
...
[ROCm/rccl commit: 1969e89003 ]
2019-07-30 13:56:45 -07:00
Wenkai Du
62e6e67e31
Remove extra "." from version string ( #104 )
...
[ROCm/rccl commit: 1fee6f9d50 ]
2019-07-25 15:25:02 -07:00
saadrahim
596e200499
Changing to rocm-cmake new style versioning ( #103 )
...
[ROCm/rccl commit: fdee095dd3 ]
2019-07-22 23:40:13 +00:00
Wenkai Du
d7f25d5be7
Use hipExtLaunchMultiKernelMultiDevice API ( #100 )
...
Depends on HIP version with this pull request:
https://github.com/ROCm-Developer-Tools/HIP/pull/1232
[ROCm/rccl commit: 0522041fac ]
2019-07-18 09:02:37 -07:00
Ke Wen
a66ab68630
Fix NIC distances for 11+ NICs
...
[ROCm/rccl commit: 4d579e51cc ]
2019-07-17 06:32:33 -07:00
Ke Wen
5c5c58c73b
Fix #224 : prevent number of IB devices from going out of bound
...
[ROCm/rccl commit: 920ae57c14 ]
2019-07-17 06:32:33 -07:00
Wenkai Du
25d29e97d1
Increase debug print of ring topology to 64 ranks ( #99 )
...
[ROCm/rccl commit: dc1908e944 ]
2019-07-16 14:54:17 -07:00
Wenkai Du
602292685d
Allocate transport memory based on numa node ( #97 )
...
[ROCm/rccl commit: 43bd6f5fbf ]
2019-07-15 11:45:38 -07:00
Ke Wen
4211da6d29
Size up IPC buffers to multiples of 2MB
...
Avoid potential CUDA error in concurrent communicator initialization
[ROCm/rccl commit: c8c68fb5f7 ]
2019-07-12 09:50:17 -07:00
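Note on the commit above: sizing a buffer up to a multiple of 2MB is a round-up-to-alignment calculation. The following is a minimal sketch of that arithmetic only; the names `IPC_ALIGNMENT` and `round_up` are hypothetical and are not taken from the RCCL or NCCL source.

```python
# Hypothetical illustration of rounding a buffer size up to a 2MB multiple.
# The names below are assumptions for this sketch, not RCCL/NCCL identifiers.

IPC_ALIGNMENT = 2 * 1024 * 1024  # 2MB in bytes


def round_up(size: int, alignment: int = IPC_ALIGNMENT) -> int:
    """Return the smallest multiple of `alignment` that is >= `size`."""
    if size < 0:
        raise ValueError("size must be non-negative")
    return (size + alignment - 1) // alignment * alignment
```

For example, `round_up(1)` yields one full 2MB block, while a size that is already a multiple of 2MB is returned unchanged.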
gilbertlee-amd
4310b5b4c1
Adding explicit HDP flush when using RDMA via Infiniband ( #95 )
...
* Adding explicit HDP flush when using RDMA via Infiniband
[ROCm/rccl commit: 7b6332d3d0 ]
2019-07-10 16:29:02 -06:00
Hirochika Asai
ee08e8b421
Add the exact matching modifier support "=" to the NCCL_IB_HCA variable ( #236 )
...
Perform exact matching when the prefix "=" is specified in the NCCL_IB_HCA variable to exclude HCAs mlx5_X[0-9]+ when mlx5_X is specified.
[ROCm/rccl commit: 0b192d2299 ]
2019-07-09 14:45:41 -07:00
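Note on the commit above: the difference between the default prefix matching and the "=" exact-matching modifier can be sketched as follows. This is a simplified model of the behavior the commit message describes, not the NCCL implementation; the function names are hypothetical, and other parts of the NCCL_IB_HCA syntax (such as port selection) are deliberately ignored.

```python
# Simplified, hypothetical model of NCCL_IB_HCA name matching.
# Only the prefix-vs-exact distinction from the commit message is modeled.


def parse_hca_spec(spec: str):
    """Split a spec into (exact, names); a leading "=" requests exact matching."""
    exact = spec.startswith("=")
    body = spec[1:] if exact else spec
    names = [name for name in body.split(",") if name]
    return exact, names


def hca_matches(spec: str, device: str) -> bool:
    """Return True if `device` is selected by `spec`."""
    exact, names = parse_hca_spec(spec)
    if exact:
        # Exact mode: "=mlx5_1" selects mlx5_1 only.
        return device in names
    # Default mode: "mlx5_1" also selects mlx5_10, mlx5_11, and so on.
    return any(device.startswith(name) for name in names)
```

Under this model, `hca_matches("mlx5_1", "mlx5_10")` is true because of prefix matching, whereas `hca_matches("=mlx5_1", "mlx5_10")` is false.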
Wenkai Du
b7322c800a
Refactor primitive test to support multiple GPUs in rings ( #94 )
...
* Refactor primitive test to support multiple GPUs in rings
* Make GPUs sync before transfer optional
* Use same ring format as RCCL
* Extend to 8 GPUs and report errors if there is no P2P access
* Control GPUs sync before ops from command line with "-s" option
* Change buffer size through command line option "-n"
Rename iterations command line option to "-i"
[ROCm/rccl commit: 70804da15b ]
2019-07-05 14:29:20 -07:00
Wenkai Du
20975921dd
Fix shared memory collision in multi-communicator case. ( #93 )
...
The current SHM object name uses only pidHash and ranks for
identification, so names collide when a program runs with multiple
communicators. This change adds commId info into pidHash, so that the
pidHash values of different communicators in the same process are
distinct from each other.
Ported from original commit: https://github.com/lowintelligence/nccl/commits/shm
[ROCm/rccl commit: 949d680e49 ]
2019-07-02 09:27:16 -07:00
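Note on the commit above: the collision and its fix can be sketched as below. This is an illustration only, assuming a hypothetical name format and using FNV-1a as a stand-in hash; the actual hash function and SHM name layout in RCCL may differ.

```python
# Hypothetical sketch of SHM name collisions across communicators.
# The hash (FNV-1a) and the name format are assumptions, not RCCL's own.


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash, used here only as a stand-in hash function."""
    h = 0xCBF29CE484222325
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


def pid_hash_old(pid: int) -> int:
    """Hash that depends on the process only."""
    return fnv1a_64(str(pid).encode())


def pid_hash_new(pid: int, comm_id: int) -> int:
    """Hash that also mixes in a communicator identifier."""
    return fnv1a_64(f"{pid}:{comm_id}".encode())


def shm_name(pid_hash: int, send_rank: int, recv_rank: int) -> str:
    """Build a shared-memory object name from the hash and the two ranks."""
    return f"shm-{pid_hash:x}-{send_rank}-{recv_rank}"
```

With the process-only hash, two communicators in one process produce the same name for the same rank pair; once the communicator identifier is mixed in, the names differ.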
Wenkai Du
90b7a02944
Match primitives unroll counts with latest RCCL ( #91 )
...
[ROCm/rccl commit: e6a0da444f ]
2019-06-26 15:09:13 -07:00
Stanley Tsang
6aa817d768
Fixing install script to actually install library when requested ( #88 )
...
* Fixing install script to actually install library when requested. Cleaning up unused code.
Removing unused arguments from install script.
Fixing odd whitespace
* Fixing install script to install to correct location /opt/rocm, now creates symlink in /opt/rocm/lib
* Updates and corrections to README and install script
[ROCm/rccl commit: 329a62a01f ]
2019-06-25 17:25:21 -06:00
Ke Wen
3c13a4d1bb
Merge branch 'master' into HEAD
...
[ROCm/rccl commit: 8e04d80382 ]
2019-06-25 13:39:08 -07:00
Ke Wen
b91d8170f8
2.4.8-1
...
Fix #209 : improve socket transport performance
Split transfers over multiple sockets
Launch multiple threads to drive sockets
Detect AWS NICs and set nsockets/nthreads accordingly
[ROCm/rccl commit: 7c72dee660 ]
2019-06-25 13:22:47 -07:00
saadrahim
239c7bdf44
Changing maintainer to no-reply to fix deb generation ( #86 )
...
[ROCm/rccl commit: 840f8715ef ]
2019-06-24 17:13:57 -06:00
saadrahim
f437e903f1
Merge pull request #83 from ROCmSoftwarePlatform/devel
...
Devel to Master
[ROCm/rccl commit: 0de9051ace ]
2019-06-24 14:25:18 -06:00
saadrahim
789c0b828e
Fixing Centos 7 Packaging and package versioning/maintainer ( #82 )
...
- Fixing Centos 7 Packaging
- standardizing version numbers for release to use rocm versioning
- removing maintainer email based on legal's input
[ROCm/rccl commit: 1c7b0bd878 ]
2019-06-24 14:22:16 -06:00
Felix Abecassis
d2f579ba8b
Fix out-of-bounds read in ncclStrToCpuset ( #233 )
...
The affinityStr string was not null-terminated but was passed to strlen(3).
Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
[ROCm/rccl commit: 37e4f8729e ]
2019-06-21 10:25:08 +02:00
Wenkai Du
17530a2a6f
Use different unroll numbers for copy and reduce ( #81 )
...
* Use different unroll numbers for copy and reduce
* use 4 separate unroll factors
[ROCm/rccl commit: bb5e42bac0 ]
2019-06-19 16:36:16 -07:00
Jeff Daily
53b1ca1d7f
do not use internal stream ( #79 )
...
[ROCm/rccl commit: 754ed213cc ]
2019-06-12 16:26:59 -06:00
Wenkai Du
87d5441552
Calculate and print kernel throughput ( #78 )
...
* rccl-prim-test: print GPU info and set iterations
* Calculate and print kernel throughput
[ROCm/rccl commit: ee14676064 ]
2019-06-07 10:39:30 -07:00
Wenkai Du
dcb2801f25
rccl-prim-test: print GPU info and set iterations ( #77 )
...
[ROCm/rccl commit: 42b488507d ]
2019-06-05 15:16:33 -07:00
Wenkai Du
a8fbf5555c
Implement HDP flush when transfer data over PCIe P2P ( #75 )
...
* Implement HDP flush when transfer data over PCIe P2P
* Add some descriptions to HDP flushing
* Fix for review comments
[ROCm/rccl commit: b7a6307371 ]
2019-06-03 16:29:55 -07:00
Yaxun Sam Liu
dff9e760a0
Make ncclFuncs static
...
This is necessary to constant propagate the function pointers
to eliminate the indirect function call.
[ROCm/rccl commit: 5827a4f616 ]
2019-05-29 10:50:13 -04:00
Saad Rahim
a5d9580a99
Adding NVIDIA copyright
...
[ROCm/rccl commit: 0c0a8ed86f ]
2019-05-24 15:05:00 -07:00
Saad Rahim
07d0f15687
Fixing whitespace
...
[ROCm/rccl commit: 02ef2d27e6 ]
2019-05-24 14:49:12 -07:00
Saad Rahim
7d340ae2a2
Adding link to readthedocs
...
[ROCm/rccl commit: fac7ef9370 ]
2019-05-24 14:48:24 -07:00
saadrahim
b90e705679
Readthedocs documentation support ( #71 )
...
[ROCm/rccl commit: bb7542c1d9 ]
2019-05-24 15:03:56 -06:00
Wenkai Du
5fdf2edd39
Increase number of rings with XGMI connection
...
Improve throughput by about 20%. Also remove P2P over PCIe, which was
left enabled in the initial release.
Signed-off-by: Wenkai Du <wenkai.du@amd.com>
[ROCm/rccl commit: f45566a8bd ]
2019-05-24 20:58:51 +00:00
Yaxun (Sam) Liu
7b4b3e2981
Fix build failure for hip-clang ( #69 )
...
[ROCm/rccl commit: b921279a21 ]
2019-05-23 16:53:25 -06:00
Wenkai Du
0ed10b1e4d
Add RCCL primitive testing ( #70 )
...
[ROCm/rccl commit: 1bb6d2104c ]
2019-05-23 16:52:17 -06:00