Michael LIAO
4f2aa06688
Revise the previous fix to use the canonical path to HSA.
...
- This fixes build failures in certain environments.
2019-08-01 14:50:44 -04:00
Wenkai Du
9189279220
Merge pull request #107 from mhbliao/hliao/master/swdev-198268
...
Fix build with hip-clang
2019-08-01 08:58:37 -07:00
Wenkai Du
e7022e9196
Remove dependency on the HSA_FORCE_FINE_GRAIN_PCIE flag for XGMI links
2019-08-01 04:26:37 +00:00
Michael LIAO
41310144f6
Fix build with hip-clang
...
Two minor issues are fixed:
+ Enclose the kernel function in parentheses, because hip-clang defines
`hipLaunchKernelGGL` as a macro.
+ Explicitly include <hsa.h>, which hip-clang requires.
2019-07-31 15:07:36 -04:00
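Note on the parentheses fix in the entry above: a function-like macro splits its arguments at every top-level comma, so a template-id such as `kernel<int, 4>` is seen as two arguments unless it is wrapped in parentheses. The sketch below illustrates this with a hypothetical `LAUNCH` macro and a host-side `scale` function; neither is the real `hipLaunchKernelGGL` or an RCCL kernel, they are assumptions made only to keep the example runnable without a GPU.

```cpp
#include <cstddef>

// Hypothetical stand-in for a launch macro (NOT the real hipLaunchKernelGGL).
#define LAUNCH(fn, n, ...) (fn)(n, __VA_ARGS__)

// Hypothetical "kernel" with two template parameters, run on the host.
template <typename T, int Unroll>
void scale(std::size_t n, T* data, T factor) {
  for (std::size_t i = 0; i < n; ++i) data[i] *= factor;
}

inline void run(std::size_t n, int* data) {
  // LAUNCH(scale<int, 4>, n, data, 2);
  //   would not compile: the preprocessor splits at the comma inside <...>,
  //   so fn becomes "scale<int" and n becomes "4>".
  LAUNCH((scale<int, 4>), n, data, 2);  // parentheses keep it one argument
}
```

Whether a given HIP toolchain needs the parentheses depends on whether it implements the launch call as a macro or as a function template, which is the difference the commit message describes.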
Cao Zongyan
bfb3921519
Refine RPM package building spec file.
...
Add /sbin/ldconfig into RPM package install operations.
2019-07-31 10:36:22 -07:00
Wenkai Du
1969e89003
Add gfx908 target ( #106 )
2019-07-30 13:56:45 -07:00
Wenkai Du
1fee6f9d50
Remove extra "." from version string ( #104 )
2019-07-25 15:25:02 -07:00
saadrahim
fdee095dd3
Changing to rocm-cmake new style versioning ( #103 )
2019-07-22 23:40:13 +00:00
Wenkai Du
0522041fac
Use hipExtLaunchMultiKernelMultiDevice API ( #100 )
...
Depends on HIP version with this pull request:
https://github.com/ROCm-Developer-Tools/HIP/pull/1232
2019-07-18 09:02:37 -07:00
Ke Wen
4d579e51cc
Fix NIC distances for 11+ NICs
2019-07-17 06:32:33 -07:00
Ke Wen
920ae57c14
Fix #224 : prevent number of IB devices from going out of bound
2019-07-17 06:32:33 -07:00
Wenkai Du
dc1908e944
Increase debug print of ring topology to 64 ranks ( #99 )
2019-07-16 14:54:17 -07:00
Wenkai Du
43bd6f5fbf
Allocate transport memory based on numa node ( #97 )
2019-07-15 11:45:38 -07:00
Ke Wen
c8c68fb5f7
Size up IPC buffers to multiples of 2MB
...
Avoid potential CUDA error in concurrent communicator initialization
2019-07-12 09:50:17 -07:00
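Note on "multiples of 2MB" in the entry above: rounding a requested size up to the next multiple of a fixed granularity is plain integer arithmetic. The sketch below shows one common way to write it; the names `kIpcGranularity` and `roundUpIpcSize` are hypothetical and are not taken from the NCCL or RCCL sources.

```cpp
#include <cstddef>

// Assumed granularity from the commit subject: 2 MiB.
constexpr std::size_t kIpcGranularity = std::size_t{2} << 20;

// Hypothetical helper: round size up to the next multiple of kIpcGranularity.
constexpr std::size_t roundUpIpcSize(std::size_t size) {
  return (size + kIpcGranularity - 1) / kIpcGranularity * kIpcGranularity;
}
```

A size that is already a multiple of 2 MiB is returned unchanged, and a size of zero stays zero.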
gilbertlee-amd
7b6332d3d0
Adding explicit HDP flush when using RDMA via Infiniband ( #95 )
...
* Adding explicit HDP flush when using RDMA via Infiniband
2019-07-10 16:29:02 -06:00
Hirochika Asai
0b192d2299
Add the exact matching modifier support "=" to the NCCL_IB_HCA variable ( #236 )
...
Perform exact matching when the prefix "=" is specified in the NCCL_IB_HCA variable to exclude HCAs mlx5_X[0-9]+ when mlx5_X is specified.
2019-07-09 14:45:41 -07:00
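Note on the "=" modifier in the entry above: the difference between prefix matching and exact matching can be sketched as follows. The function `hcaMatches` is a hypothetical helper written for illustration; it is not NCCL's parser and ignores any other syntax that NCCL_IB_HCA may accept.

```cpp
#include <string>

// Hypothetical helper (not NCCL's implementation).
// spec: one entry from the variable, optionally prefixed with '='.
// dev:  a device name such as "mlx5_1" or "mlx5_10".
bool hcaMatches(const std::string& spec, const std::string& dev) {
  if (!spec.empty() && spec[0] == '=') {
    return dev == spec.substr(1);                  // exact match
  }
  return dev.compare(0, spec.size(), spec) == 0;   // prefix match
}
```

With prefix matching, `mlx5_1` also selects `mlx5_10`; with `=mlx5_1`, only `mlx5_1` is selected.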
Wenkai Du
70804da15b
Refactor primitive test to support multiple GPUs in rings ( #94 )
...
* Refactor primitive test to support multiple GPUs in rings
* Make GPUs sync before transfer optional
* Use same ring format as RCCL
* Extend to 8 GPUs and report errors if there is no P2P access
* Control GPUs sync before ops from command line with "-s" option
* Change buffer size through command line option "-n"
* Rename iterations command line option to "-i"
2019-07-05 14:29:20 -07:00
Wenkai Du
949d680e49
Fix shared memory collision in multi-communicator case. ( #93 )
...
The SHM object name previously used only pidHash and the ranks as
identification, so names collided when a program ran with multiple
communicators. This change mixes commId into pidHash so that the
pidHash values of different communicators within the same process
are distinct.
Ported from original commit: https://github.com/lowintelligence/nccl/commits/shm
2019-07-02 09:27:16 -07:00
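Note on the collision fix in the entry above: the idea of folding a communicator ID into the process hash before building a shared memory object name can be sketched as below. The name format, the mixing constant, and the function `shmName` are assumptions made for illustration; the actual change may combine the values differently.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical sketch: derive an SHM object name from a process hash,
// a communicator ID, and the two ranks. Not the real NCCL/RCCL format.
std::string shmName(std::uint64_t pidHash, std::uint64_t commId,
                    int sendRank, int recvRank) {
  // Multiplying by an odd constant is a bijection modulo 2^64, so two
  // different commId values always give different mixed hashes for the
  // same pidHash.
  std::uint64_t mixed = pidHash ^ (commId * 0x9E3779B97F4A7C15ull);
  char buf[64];
  std::snprintf(buf, sizeof buf, "shm-%llx-%d-%d",
                static_cast<unsigned long long>(mixed), sendRank, recvRank);
  return std::string(buf);
}
```

Without the commId term, two communicators in one process that connect the same pair of ranks would produce the same name.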
Wenkai Du
e6a0da444f
Match primitives unroll counts with latest RCCL ( #91 )
2019-06-26 15:09:13 -07:00
Stanley Tsang
329a62a01f
Fixing install script to actually install library when requested ( #88 )
...
* Fixing install script to actually install library when requested. Cleaning up unused code.
Removing unused arguments from install script.
Fixing weird whitespacing
* Fixing install script to install to correct location /opt/rocm, now creates symlink in /opt/rocm/lib
* Updates and corrections to README and install script
2019-06-25 17:25:21 -06:00
Ke Wen
8e04d80382
Merge branch 'master' into HEAD
2019-06-25 13:39:08 -07:00
Ke Wen
7c72dee660
2.4.8-1
...
Fix #209 : improve socket transport performance
Split transfers over multiple sockets
Launch multiple threads to drive sockets
Detect AWS NICs and set nsockets/nthreads accordingly
2019-06-25 13:22:47 -07:00
saadrahim
840f8715ef
Changing maintainer to no-reply to fix deb generation ( #86 )
2019-06-24 17:13:57 -06:00
saadrahim
0de9051ace
Merge pull request #83 from ROCmSoftwarePlatform/devel
...
Devel to Master
2019-06-24 14:25:18 -06:00
saadrahim
1c7b0bd878
Fixing Centos 7 Packaging and package versioning/maintainer ( #82 )
...
- Fixing Centos 7 Packaging
- standardizing version numbers for release to use rocm versioning
- removing maintainer email based on legal's input
2019-06-24 14:22:16 -06:00
Felix Abecassis
37e4f8729e
Fix out-of-bounds read in ncclStrToCpuset ( #233 )
...
The affinityStr string was not null-terminated but was passed to strlen(3).
Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
2019-06-21 10:25:08 +02:00
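Note on the out-of-bounds read in the entry above: `strlen` keeps reading until it finds a zero byte, so a buffer filled from raw bytes must be terminated before it is treated as a C string. The helper below is a hypothetical sketch of that pattern; `copyTerminated` is not a function from the NCCL sources.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy at most n raw bytes into dst, always append
// a terminator, and return the resulting string length.
std::size_t copyTerminated(char* dst, std::size_t dstSize,
                           const char* src, std::size_t n) {
  if (dstSize == 0) return 0;
  std::size_t len = n < dstSize - 1 ? n : dstSize - 1;  // leave room for '\0'
  std::memcpy(dst, src, len);
  dst[len] = '\0';
  return std::strlen(dst);  // safe: dst is now null-terminated
}
```

If the destination is too small, the copy is truncated rather than overrun.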
Wenkai Du
bb5e42bac0
Use different unroll numbers for copy and reduce ( #81 )
...
* Use different unroll numbers for copy and reduce
* use 4 separate unroll factors
2019-06-19 16:36:16 -07:00
Jeff Daily
754ed213cc
do not use internal stream ( #79 )
2019-06-12 16:26:59 -06:00
Wenkai Du
ee14676064
Calculate and print kernel throughput ( #78 )
...
* rccl-prim-test: print GPU info and set iterations
* Calculate and print kernel throughput
2019-06-07 10:39:30 -07:00
Wenkai Du
42b488507d
rccl-prim-test: print GPU info and set iterations ( #77 )
2019-06-05 15:16:33 -07:00
Wenkai Du
b7a6307371
Implement HDP flush when transferring data over PCIe P2P ( #75 )
...
* Implement HDP flush when transferring data over PCIe P2P
* Add some descriptions to HDP flushing
* Fix for review comments
2019-06-03 16:29:55 -07:00
Wenkai Du
8c974f1f50
Merge pull request #76 from ROCmSoftwarePlatform/fix-indirect-call
...
Make ncclFuncs static
2019-05-29 12:04:58 -07:00
Yaxun Sam Liu
5827a4f616
Make ncclFuncs static
...
This is necessary so that the function pointers can be
constant-propagated, eliminating the indirect function call.
2019-05-29 10:50:13 -04:00
Wenkai Du
c85d629355
Merge pull request #74 from saadrahim/readmeUpdate
...
Readme update
2019-05-24 20:27:08 -07:00
Saad Rahim
0c0a8ed86f
Adding NVIDIA copyright
2019-05-24 15:05:00 -07:00
Saad Rahim
02ef2d27e6
Fixing whitespace
2019-05-24 14:49:12 -07:00
Saad Rahim
fac7ef9370
Adding link to readthedocs
2019-05-24 14:48:24 -07:00
Wenkai Du
9a0ac849fa
Merge pull request #72 from wenkaidu/default_rings
...
Increase number of rings with XGMI connection
2019-05-24 14:42:54 -07:00
saadrahim
bb7542c1d9
Readthedocs documentation support ( #71 )
2019-05-24 15:03:56 -06:00
Wenkai Du
f45566a8bd
Increase number of rings with XGMI connection
...
Improves throughput by about 20%. Also removes P2P over PCIe, which was
left enabled in the initial release.
Signed-off-by: Wenkai Du <wenkai.du@amd.com>
2019-05-24 20:58:51 +00:00
Yaxun (Sam) Liu
b921279a21
Fix build failure for hip-clang ( #69 )
2019-05-23 16:53:25 -06:00
Wenkai Du
1bb6d2104c
Add RCCL primitive testing ( #70 )
2019-05-23 16:52:17 -06:00
Rajat Chopra
6d8b2421bc
Update debian dependencies in README ( #228 )
...
'fakeroot' is needed for building deb packages
2019-05-22 21:19:36 -07:00
saadrahim
4c4351673b
Jenkinsfile ( #65 )
...
* Changing Jenkinsfile to support runs without docker
* Updating install file for build options
* Fixing command execution
* Fixing Jenkinsfile
* fixing test execution
* Removing junit search
2019-05-22 15:32:32 -06:00
saadrahim
42c3e4b93d
Updating readme for 2.5 release ( #67 )
2019-05-22 15:31:12 -06:00
gilbertlee-amd
ffe2054ed2
Test combined calls ( #64 )
...
* Adding test for queueing multiple different collectives, 1 device per thread
2019-05-22 15:30:37 -06:00
Aaron Enye Shi
6e8f40eb22
Update README to note install rocm-cmake ( #68 )
2019-05-22 15:29:59 -06:00
gilbertlee-amd
a115f577dd
Adding fix for insufficient devices / better logging for skipped tests ( #63 )
2019-05-21 14:34:20 -06:00
Stanley Tsang
afa945d6e6
Renaming jenkinsfile
2019-05-21 15:54:41 +00:00
Wenkai Du
4bfa506a6b
Remove extra compiler path setting
2019-05-21 00:08:42 +00:00