Stanley Tsang
de09bece99
Removing unnecessary device collective source files.
...
[ROCm/rccl commit: 3a61907182 ]
2019-08-12 18:23:23 +00:00
gilbertlee-amd
8645391260
Adding TransferBench tool ( #113 )
...
* Adding standalone TransferBench tool
[ROCm/rccl commit: b8cf48fc16 ]
2019-08-07 17:21:41 -06:00
Wenkai Du
abab7569f9
Merge pull request #112 from wenkaidu/hdp
...
Get HDP register address from hipDeviceGetAttribute API
[ROCm/rccl commit: f1c727d4ce ]
2019-08-05 14:27:19 -07:00
Wenkai Du
909e014b51
Get HDP register address from hipDeviceGetAttribute API
...
[ROCm/rccl commit: 84d3344796 ]
2019-08-05 14:14:09 -07:00
Wenkai Du
b540c55c9b
Merge pull request #108 from wenkaidu/xgmi_finegrain
...
Remove dependency on HSA_FORCE_FINE_GRAIN_PCIE flag for XGMI link
[ROCm/rccl commit: 4a9bdd8539 ]
2019-08-02 10:00:48 -07:00
Wenkai Du
fe2cb9f4cb
Merge pull request #110 from mhbliao/hliao/master/swdev-198268
...
Revise the previous fix to use the canonical path to HSA.
[ROCm/rccl commit: 315f792f83 ]
2019-08-01 12:46:25 -07:00
Michael LIAO
c14ef9f408
Revise the previous fix to use the canonical path to HSA.
...
- This fixes build failures under certain environments.
[ROCm/rccl commit: 4f2aa06688 ]
2019-08-01 14:50:44 -04:00
Wenkai Du
4d9eb5bd76
Merge pull request #107 from mhbliao/hliao/master/swdev-198268
...
Fix build with hip-clang
[ROCm/rccl commit: 9189279220 ]
2019-08-01 08:58:37 -07:00
Wenkai Du
2dcb42effd
Remove dependency on HSA_FORCE_FINE_GRAIN_PCIE flag for XGMI link
...
[ROCm/rccl commit: e7022e9196 ]
2019-08-01 04:26:37 +00:00
Michael LIAO
4b5bf9f227
Fix build with hip-clang
...
Two minor issues are solved:
+ Enclose the kernel function in parentheses, as hip-clang defines
`hipLaunchKernelGGL` as a macro.
+ Explicitly include <hsa.h> for hip-clang.
[ROCm/rccl commit: 41310144f6 ]
2019-07-31 15:07:36 -04:00
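The parenthesization fix in 4b5bf9f227 can be illustrated without HIP. The sketch below uses a hypothetical `LAUNCH` macro and a hypothetical `scaled` template (neither is from RCCL or HIP) to show why a template-id containing a comma must be wrapped in parentheses when it is passed to a function-like macro: the preprocessor would otherwise split it into two arguments.

```cpp
// Hypothetical stand-in for a macro-style launcher; this is NOT hipLaunchKernelGGL.
#define LAUNCH(kernel, arg) (kernel)(arg)

// Hypothetical "kernel" with two template parameters, so its name contains a comma.
template <typename T, int Unroll>
int scaled(T x) { return static_cast<int>(x) * Unroll; }

int launch_demo() {
    // LAUNCH(scaled<int, 4>, 3);  // would fail: the preprocessor sees three macro arguments
    // The extra parentheses keep "scaled<int, 4>" together as a single macro argument.
    return LAUNCH((scaled<int, 4>), 3);
}
```

When the launcher is a real function template rather than a macro, the extra parentheses are harmless, which is why the same source can build under both definitions.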
Wenkai Du
6688279075
Add gfx908 target ( #106 )
...
[ROCm/rccl commit: 1969e89003 ]
2019-07-30 13:56:45 -07:00
Wenkai Du
62e6e67e31
Remove extra "." from version string ( #104 )
...
[ROCm/rccl commit: 1fee6f9d50 ]
2019-07-25 15:25:02 -07:00
saadrahim
596e200499
Changing to rocm-cmake new style versioning ( #103 )
...
[ROCm/rccl commit: fdee095dd3 ]
2019-07-22 23:40:13 +00:00
Wenkai Du
d7f25d5be7
Use hipExtLaunchMultiKernelMultiDevice API ( #100 )
...
Depends on HIP version with this pull request:
https://github.com/ROCm-Developer-Tools/HIP/pull/1232
[ROCm/rccl commit: 0522041fac ]
2019-07-18 09:02:37 -07:00
Wenkai Du
25d29e97d1
Increase debug print of ring topology to 64 ranks ( #99 )
...
[ROCm/rccl commit: dc1908e944 ]
2019-07-16 14:54:17 -07:00
Wenkai Du
602292685d
Allocate transport memory based on numa node ( #97 )
...
[ROCm/rccl commit: 43bd6f5fbf ]
2019-07-15 11:45:38 -07:00
gilbertlee-amd
4310b5b4c1
Adding explicit HDP flush when using RDMA via Infiniband ( #95 )
...
* Adding explicit HDP flush when using RDMA via Infiniband
[ROCm/rccl commit: 7b6332d3d0 ]
2019-07-10 16:29:02 -06:00
Wenkai Du
b7322c800a
Refactor primitive test to support multiple GPUs in rings ( #94 )
...
* Refactor primitive test to support multiple GPUs in rings
* Make GPUs sync before transfer optional
* Use same ring format as RCCL
* Extend to 8 GPUs and report errors if there is no P2P access
* Control GPUs sync before ops from command line with "-s" option
* Change buffer size through command line option "-n"
* Rename iterations command line option to "-i"
[ROCm/rccl commit: 70804da15b ]
2019-07-05 14:29:20 -07:00
Wenkai Du
20975921dd
Fix shared memory collision in multi-communicator case. ( #93 )
...
The current SHM object name uses only pidHash and the ranks for
identification, so names collide when a program runs with multiple
communicators. This change adds commId information into pidHash, so
that the pidHash values of different communicators within the same
process are distinct from each other.
Ported from original commit: https://github.com/lowintelligence/nccl/commits/shm
[ROCm/rccl commit: 949d680e49 ]
2019-07-02 09:27:16 -07:00
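The naming fix in 20975921dd can be sketched as follows. This is a minimal illustration, not the RCCL code: the helper names `mix_comm_id` and `shm_name`, the name format, and the mixing formula are all assumptions. It only shows the idea that folding a communicator id into the per-process hash yields distinct shared-memory object names for the same pair of ranks.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: fold a communicator id into the per-process hash.
// The mixing step is a generic hash-combine; the actual RCCL change may differ.
std::uint64_t mix_comm_id(std::uint64_t pid_hash, std::uint64_t comm_id) {
    return pid_hash ^ (comm_id + 0x9e3779b97f4a7c15ULL + (pid_hash << 6) + (pid_hash >> 2));
}

// Hypothetical name builder: the format string is illustrative only.
std::string shm_name(std::uint64_t pid_hash, std::uint64_t comm_id,
                     int send_rank, int recv_rank) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "shm-%llx-%d-%d",
                  static_cast<unsigned long long>(mix_comm_id(pid_hash, comm_id)),
                  send_rank, recv_rank);
    return std::string(buf);
}
```

With only the process hash and ranks in the name, two communicators in one process would both ask for the same object; with the communicator id mixed in, `shm_name(h, 1, 0, 1)` and `shm_name(h, 2, 0, 1)` differ.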
Wenkai Du
90b7a02944
Match primitives unroll counts with latest RCCL ( #91 )
...
[ROCm/rccl commit: e6a0da444f ]
2019-06-26 15:09:13 -07:00
Stanley Tsang
6aa817d768
Fixing install script to actually install library when requested ( #88 )
...
* Fixing install script to actually install library when requested. Cleaning up unused code.
Removing unused arguments from install script.
Fixing weird whitespacing
* Fixing install script to install to correct location /opt/rocm, now creates symlink in /opt/rocm/lib
* Updates and corrections to README and install script
[ROCm/rccl commit: 329a62a01f ]
2019-06-25 17:25:21 -06:00
saadrahim
239c7bdf44
Changing maintainer to no-reply to fix deb generation ( #86 )
...
[ROCm/rccl commit: 840f8715ef ]
2019-06-24 17:13:57 -06:00
saadrahim
f437e903f1
Merge pull request #83 from ROCmSoftwarePlatform/devel
...
Devel to Master
[ROCm/rccl commit: 0de9051ace ]
2019-06-24 14:25:18 -06:00
saadrahim
789c0b828e
Fixing Centos 7 Packaging and package versioning/maintainer ( #82 )
...
- Fixing Centos 7 Packaging
- standardizing version numbers for release to use rocm versioning
- removing maintainer email based on legal's input
[ROCm/rccl commit: 1c7b0bd878 ]
2019-06-24 14:22:16 -06:00
Wenkai Du
17530a2a6f
Use different unroll numbers for copy and reduce ( #81 )
...
* Use different unroll numbers for copy and reduce
* use 4 separate unroll factors
[ROCm/rccl commit: bb5e42bac0 ]
2019-06-19 16:36:16 -07:00
Jeff Daily
53b1ca1d7f
do not use internal stream ( #79 )
...
[ROCm/rccl commit: 754ed213cc ]
2019-06-12 16:26:59 -06:00
Wenkai Du
87d5441552
Calculate and print kernel throughput ( #78 )
...
* rccl-prim-test: print GPU info and set iterations
* Calculate and print kernel throughput
[ROCm/rccl commit: ee14676064 ]
2019-06-07 10:39:30 -07:00
Wenkai Du
dcb2801f25
rccl-prim-test: print GPU info and set iterations ( #77 )
...
[ROCm/rccl commit: 42b488507d ]
2019-06-05 15:16:33 -07:00
Wenkai Du
a8fbf5555c
Implement HDP flush when transferring data over PCIe P2P ( #75 )
...
* Implement HDP flush when transferring data over PCIe P2P
* Add some descriptions to HDP flushing
* Fix for review comments
[ROCm/rccl commit: b7a6307371 ]
2019-06-03 16:29:55 -07:00
Wenkai Du
9bd033992f
Merge pull request #76 from ROCmSoftwarePlatform/fix-indirect-call
...
Make ncclFuncs static
[ROCm/rccl commit: 8c974f1f50 ]
2019-05-29 12:04:58 -07:00
Yaxun Sam Liu
dff9e760a0
Make ncclFuncs static
...
This is necessary to constant propagate the function pointers
to eliminate the indirect function call.
[ROCm/rccl commit: 5827a4f616 ]
2019-05-29 10:50:13 -04:00
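The reasoning in dff9e760a0 can be sketched in plain C++. The table `kFuncs` and the functions below are hypothetical stand-ins, not the RCCL `ncclFuncs` table, and whether a given compiler actually removes the indirect call depends on its optimization settings. The point is only that a `static const` table has internal linkage and compile-time-known contents, which is what allows a call through a constant index to be resolved to a direct call.

```cpp
// Hypothetical operation bodies standing in for collective functions.
static int op_sum(int a, int b) { return a + b; }
static int op_max(int a, int b) { return a > b ? a : b; }

using OpFn = int (*)(int, int);

// static const: no other translation unit can see or modify the table,
// so the compiler may treat each entry as a known constant.
static const OpFn kFuncs[] = { op_sum, op_max };

// Constant index: eligible for constant propagation into a direct call to op_sum.
int run_sum(int a, int b) { return kFuncs[0](a, b); }

// Runtime index: remains an indirect call through the table.
int run_op(int index, int a, int b) { return kFuncs[index](a, b); }
```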
Wenkai Du
a647ae9bac
Merge pull request #74 from saadrahim/readmeUpdate
...
Readme update
[ROCm/rccl commit: c85d629355 ]
2019-05-24 20:27:08 -07:00
Saad Rahim
a5d9580a99
Adding NVIDIA copyright
...
[ROCm/rccl commit: 0c0a8ed86f ]
2019-05-24 15:05:00 -07:00
Saad Rahim
07d0f15687
Fixing whitespace
...
[ROCm/rccl commit: 02ef2d27e6 ]
2019-05-24 14:49:12 -07:00
Saad Rahim
7d340ae2a2
Adding link to readthedocs
...
[ROCm/rccl commit: fac7ef9370 ]
2019-05-24 14:48:24 -07:00
Wenkai Du
a804727a7c
Merge pull request #72 from wenkaidu/default_rings
...
Increase number of rings with XGMI connection
[ROCm/rccl commit: 9a0ac849fa ]
2019-05-24 14:42:54 -07:00
saadrahim
b90e705679
Readthedocs documentation support ( #71 )
...
[ROCm/rccl commit: bb7542c1d9 ]
2019-05-24 15:03:56 -06:00
Wenkai Du
5fdf2edd39
Increase number of rings with XGMI connection
...
Improves throughput by about 20%. Also removes P2P over PCIe, which was
left enabled at initial release.
Signed-off-by: Wenkai Du <wenkai.du@amd.com>
[ROCm/rccl commit: f45566a8bd ]
2019-05-24 20:58:51 +00:00
Yaxun (Sam) Liu
7b4b3e2981
Fix build failure for hip-clang ( #69 )
...
[ROCm/rccl commit: b921279a21 ]
2019-05-23 16:53:25 -06:00
Wenkai Du
0ed10b1e4d
Add RCCL primitive testing ( #70 )
...
[ROCm/rccl commit: 1bb6d2104c ]
2019-05-23 16:52:17 -06:00
saadrahim
9d9fd68215
Jenkinsfile ( #65 )
...
* Changing Jenkinsfile to support runs without docker
* Updating install file for build options
* Fixing command execution
* Fixing Jenkinsfile
* fixing test execution
* Removing junit search
[ROCm/rccl commit: 4c4351673b ]
2019-05-22 15:32:32 -06:00
saadrahim
af09015f8d
Updating readme for 2.5 release ( #67 )
...
[ROCm/rccl commit: 42c3e4b93d ]
2019-05-22 15:31:12 -06:00
gilbertlee-amd
336883ef2b
Test combined calls ( #64 )
...
* Adding test for queueing multiple different collectives, 1 device per thread
[ROCm/rccl commit: ffe2054ed2 ]
2019-05-22 15:30:37 -06:00
Aaron Enye Shi
6201fd9645
Update README to note install rocm-cmake ( #68 )
...
[ROCm/rccl commit: 6e8f40eb22 ]
2019-05-22 15:29:59 -06:00
gilbertlee-amd
08a65f2201
Adding fix for insufficient devices / better logging for skipped tests ( #63 )
...
[ROCm/rccl commit: a115f577dd ]
2019-05-21 14:34:20 -06:00
Stanley Tsang
7c60e997e0
Renaming jenkinsfile
...
[ROCm/rccl commit: afa945d6e6 ]
2019-05-21 15:54:41 +00:00
Wenkai Du
b815e21d58
Remove extra compiler path setting
...
[ROCm/rccl commit: 4bfa506a6b ]
2019-05-21 00:08:42 +00:00
Wenkai Du
d42406be17
Do not build the test program by default
...
[ROCm/rccl commit: e517dbed5c ]
2019-05-20 18:37:58 +00:00
gilbertlee-amd
2215ef431d
Merge pull request #62 from gilbertlee-amd/AlignmentTests
...
Adding support for alignment tests via sub-datasets
[ROCm/rccl commit: c57ab960ff ]
2019-05-18 10:54:52 -06:00
Gilbert Lee
57ac9a8a93
Adding support for alignment tests via sub-datasets
...
Added sample alignment test for AllGather
Datasets no longer free memory on destruction so Release() must be used
[ROCm/rccl commit: a50c852851 ]
2019-05-18 00:04:03 +00:00