Commit Graph

193 Commits

Author SHA1 Message Date
Stanley Tsang de09bece99 Removing unnecessary device collective source files.
[ROCm/rccl commit: 3a61907182]
2019-08-12 18:23:23 +00:00
gilbertlee-amd 8645391260 Adding TransferBench tool (#113)
* Adding standalone TransferBench tool

[ROCm/rccl commit: b8cf48fc16]
2019-08-07 17:21:41 -06:00
Wenkai Du abab7569f9 Merge pull request #112 from wenkaidu/hdp
Get HDP register address from hipDeviceGetAttribute API

[ROCm/rccl commit: f1c727d4ce]
2019-08-05 14:27:19 -07:00
Wenkai Du 909e014b51 Get HDP register address from hipDeviceGetAttribute API
[ROCm/rccl commit: 84d3344796]
2019-08-05 14:14:09 -07:00
Wenkai Du b540c55c9b Merge pull request #108 from wenkaidu/xgmi_finegrain
Remove dependency to HSA_FORCE_FINE_GRAIN_PCIE flag for XGMI link

[ROCm/rccl commit: 4a9bdd8539]
2019-08-02 10:00:48 -07:00
Wenkai Du fe2cb9f4cb Merge pull request #110 from mhbliao/hliao/master/swdev-198268
Revise the previous fix to use the canonical path to HSA.

[ROCm/rccl commit: 315f792f83]
2019-08-01 12:46:25 -07:00
Michael LIAO c14ef9f408 Revise the previous fix to use the canonical path to HSA.
- This fixes build failures under certain environments.


[ROCm/rccl commit: 4f2aa06688]
2019-08-01 14:50:44 -04:00
Wenkai Du 4d9eb5bd76 Merge pull request #107 from mhbliao/hliao/master/swdev-198268
Fix build with hip-clang

[ROCm/rccl commit: 9189279220]
2019-08-01 08:58:37 -07:00
Wenkai Du 2dcb42effd Remove dependency to HSA_FORCE_FINE_GRAIN_PCIE flag for XGMI link
[ROCm/rccl commit: e7022e9196]
2019-08-01 04:26:37 +00:00
Michael LIAO 4b5bf9f227 Fix build with hip-clang
Two minor issues are solved:
+ Enclose the kernel function in parentheses, as hip-clang defines
  `hipLaunchKernelGGL` as a macro.
+ Need to explicitly include <hsa.h> for hip-clang.


[ROCm/rccl commit: 41310144f6]
2019-07-31 15:07:36 -04:00
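
The first issue in commit 4b5bf9f227 above can be illustrated without HIP. The sketch below assumes the problem was the usual one with function-like macros: a comma inside a template argument list is read as a macro argument separator unless the whole expression is parenthesized. `LAUNCH_SKETCH`, `scaled_sum`, and `launch_example` are hypothetical names used only for illustration; they are not part of HIP or RCCL.

```cpp
#include <cassert>

// Hypothetical stand-in for a macro-based launcher (NOT the real
// hipLaunchKernelGGL): the first macro argument is the callable,
// the remaining arguments are forwarded to it.
#define LAUNCH_SKETCH(kernel, ...) (kernel)(__VA_ARGS__)

// A "kernel" with two template parameters, so its name contains a comma.
template <typename T, int Unroll>
int scaled_sum(T a, T b) {
    return static_cast<int>(a + b) * Unroll;
}

int launch_example() {
    // Without the extra parentheses, the preprocessor would split
    // `scaled_sum<int, 4>` at the comma into `scaled_sum<int` and `4>`,
    // and the expansion would not compile.
    int result = LAUNCH_SKETCH((scaled_sum<int, 4>), 1, 2);
    assert(result == 12);  // (1 + 2) * 4
    return result;
}
```

When the launcher is an ordinary function or function template, the parentheses are harmless, so the same call site works in both cases.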
Wenkai Du 6688279075 Add gfx908 target (#106)
[ROCm/rccl commit: 1969e89003]
2019-07-30 13:56:45 -07:00
Wenkai Du 62e6e67e31 Remove extra "." from version string (#104)
[ROCm/rccl commit: 1fee6f9d50]
2019-07-25 15:25:02 -07:00
saadrahim 596e200499 Changing to rocm-cmake new style versioning (#103)
[ROCm/rccl commit: fdee095dd3]
2019-07-22 23:40:13 +00:00
Wenkai Du d7f25d5be7 Use hipExtLaunchMultiKernelMultiDevice API (#100)
Depends on HIP version with this pull request:
https://github.com/ROCm-Developer-Tools/HIP/pull/1232

[ROCm/rccl commit: 0522041fac]
2019-07-18 09:02:37 -07:00
Wenkai Du 25d29e97d1 Increase debug print of ring topology to 64 ranks (#99)
[ROCm/rccl commit: dc1908e944]
2019-07-16 14:54:17 -07:00
Wenkai Du 602292685d Allocate transport memory based on numa node (#97)
[ROCm/rccl commit: 43bd6f5fbf]
2019-07-15 11:45:38 -07:00
gilbertlee-amd 4310b5b4c1 Adding explicit HDP flush when using RDMA via Infiniband (#95)
* Adding explicit HDP flush when using RDMA via Infiniband



[ROCm/rccl commit: 7b6332d3d0]
2019-07-10 16:29:02 -06:00
Wenkai Du b7322c800a Refactor primitive test to support multiple GPUs in rings (#94)
* Refactor primitive test to support multiple GPUs in rings

* Make GPUs sync before transfer optional

* Use same ring format as RCCL

* Extend to 8 GPUs and report errors if there is no P2P access

* Control GPUs sync before ops from command line with "-s" option

* Change buffer size through command line option "-n"

Rename iterations command line option to "-i"


[ROCm/rccl commit: 70804da15b]
2019-07-05 14:29:20 -07:00
Wenkai Du 20975921dd Fix shared memory collision in multi-communicator case. (#93)
The current SHM object name uses only pidHash and ranks as
identification, so names collide when a program runs with multiple
communicators. Here we add commId info into pidHash, so that the
pidHash values of different communicators in the same process are
distinct from each other.

Ported from original commit: https://github.com/lowintelligence/nccl/commits/shm

[ROCm/rccl commit: 949d680e49]
2019-07-02 09:27:16 -07:00
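
The collision described in commit 20975921dd above, and the effect of folding commId into pidHash, can be sketched as follows. This is a minimal illustration under stated assumptions: the FNV-1a hash, the `shm-...` name layout, and the function names `fnv1aMix`, `pidHashOld`, `pidHashNew`, `shmName`, and `checkSketch` are all hypothetical and are not taken from the RCCL or NCCL sources.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// 64-bit FNV-1a over the 8 bytes of `value`, continuing from state `h`.
// The hash choice is an assumption for this sketch.
static uint64_t fnv1aMix(uint64_t h, uint64_t value) {
    for (int i = 0; i < 8; ++i) {
        h ^= (value >> (8 * i)) & 0xffu;
        h *= 0x100000001b3ULL;  // FNV prime
    }
    return h;
}

static const uint64_t kFnvOffset = 0xcbf29ce484222325ULL;  // FNV offset basis

// Before the fix: the hash identifies the process only.
uint64_t pidHashOld(uint64_t pid) {
    return fnv1aMix(kFnvOffset, pid);
}

// After the fix: the communicator id is folded into the hash.
uint64_t pidHashNew(uint64_t pid, uint64_t commId) {
    return fnv1aMix(fnv1aMix(kFnvOffset, pid), commId);
}

// Hypothetical SHM object name built from the hash and the two ranks.
std::string shmName(uint64_t pidHash, int sendRank, int recvRank) {
    return "shm-" + std::to_string(pidHash) + "-" +
           std::to_string(sendRank) + "-" + std::to_string(recvRank);
}

void checkSketch() {
    // Two communicators in process 1234, same rank pair: the old scheme
    // yields one name for both, the new scheme yields two distinct names.
    assert(shmName(pidHashOld(1234), 0, 1) == shmName(pidHashOld(1234), 0, 1));
    assert(shmName(pidHashNew(1234, 1), 0, 1) != shmName(pidHashNew(1234, 2), 0, 1));
}
```

Mixing the communicator id into the existing hash, rather than appending it to the name, leaves the name format unchanged, which matches the description in the commit message.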
Wenkai Du 90b7a02944 Match primitives unroll counts with latest RCCL (#91)
[ROCm/rccl commit: e6a0da444f]
2019-06-26 15:09:13 -07:00
Stanley Tsang 6aa817d768 Fixing install script to actually install library when requested (#88)
* Fixing install script to actually install library when requested.  Cleaning up unused code.

Removing unused arguments from install script.

Fixing weird whitespacing

* Fixing install script to install to correct location /opt/rocm, now creates symlink in /opt/rocm/lib

* Updates and corrections to README and install script


[ROCm/rccl commit: 329a62a01f]
2019-06-25 17:25:21 -06:00
saadrahim 239c7bdf44 Changing maintainer to no-reply to fix deb generation (#86)
[ROCm/rccl commit: 840f8715ef]
2019-06-24 17:13:57 -06:00
saadrahim f437e903f1 Merge pull request #83 from ROCmSoftwarePlatform/devel
Devel to Master

[ROCm/rccl commit: 0de9051ace]
2019-06-24 14:25:18 -06:00
saadrahim 789c0b828e Fixing Centos 7 Packaging and package versioning/maintainer (#82)
- Fixing Centos 7 Packaging
- standardizing version numbers for release to use rocm versioning
- removing maintainer email based on legal's input


[ROCm/rccl commit: 1c7b0bd878]
2019-06-24 14:22:16 -06:00
Wenkai Du 17530a2a6f Use different unroll numbers for copy and reduce (#81)
* Use different unroll numbers for copy and reduce

* use 4 separate unroll factors


[ROCm/rccl commit: bb5e42bac0]
2019-06-19 16:36:16 -07:00
Jeff Daily 53b1ca1d7f do not use internal stream (#79)
[ROCm/rccl commit: 754ed213cc]
2019-06-12 16:26:59 -06:00
Wenkai Du 87d5441552 Calculate and print kernel throughput (#78)
* rccl-prim-test: print GPU info and set iterations

* Calculate and print kernel throughput


[ROCm/rccl commit: ee14676064]
2019-06-07 10:39:30 -07:00
Wenkai Du dcb2801f25 rccl-prim-test: print GPU info and set iterations (#77)
[ROCm/rccl commit: 42b488507d]
2019-06-05 15:16:33 -07:00
Wenkai Du a8fbf5555c Implement HDP flush when transferring data over PCIe P2P (#75)
* Implement HDP flush when transferring data over PCIe P2P
* Add some descriptions to HDP flushing
* Fix for review comments


[ROCm/rccl commit: b7a6307371]
2019-06-03 16:29:55 -07:00
Wenkai Du 9bd033992f Merge pull request #76 from ROCmSoftwarePlatform/fix-indirect-call
Make ncclFuncs static

[ROCm/rccl commit: 8c974f1f50]
2019-05-29 12:04:58 -07:00
Yaxun Sam Liu dff9e760a0 Make ncclFuncs static
This is necessary to constant propagate the function pointers
to eliminate the indirect function call.


[ROCm/rccl commit: 5827a4f616]
2019-05-29 10:50:13 -04:00
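
The reasoning in commit dff9e760a0 above can be sketched in plain host C++. The example assumes a small table of function pointers; `Op`, `kOps`, `addOne`, `twice`, `applyOp`, and `applyTwice` are hypothetical names, and the real `ncclFuncs` table is device code with a different shape. With internal linkage and constant contents, the compiler can see every entry of the table in this translation unit, which gives it the opportunity (not a guarantee) to replace an indirect call through a constant index with a direct call.

```cpp
#include <cassert>

using Op = int (*)(int);

static int addOne(int x) { return x + 1; }
static int twice(int x) { return 2 * x; }

// `static` gives the table internal linkage and `constexpr` fixes its
// contents, so no other translation unit can observe or modify it.
static constexpr Op kOps[] = {addOne, twice};

// Runtime index: still an indirect call through the table.
int applyOp(int which, int x) {
    assert(which >= 0 && which < 2);
    return kOps[which](x);
}

// Constant index: an optimizer may propagate kOps[1] to `twice`
// and emit a direct (or inlined) call.
int applyTwice(int x) {
    return kOps[1](x);
}
```

Whether the indirect call is actually eliminated depends on the compiler and optimization level; the sketch only shows the precondition the commit message describes.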
Wenkai Du a647ae9bac Merge pull request #74 from saadrahim/readmeUpdate
Readme update

[ROCm/rccl commit: c85d629355]
2019-05-24 20:27:08 -07:00
Saad Rahim a5d9580a99 Adding NVIDIA copyright
[ROCm/rccl commit: 0c0a8ed86f]
2019-05-24 15:05:00 -07:00
Saad Rahim 07d0f15687 Fixing whitespace
[ROCm/rccl commit: 02ef2d27e6]
2019-05-24 14:49:12 -07:00
Saad Rahim 7d340ae2a2 Adding link to readthedocs
[ROCm/rccl commit: fac7ef9370]
2019-05-24 14:48:24 -07:00
Wenkai Du a804727a7c Merge pull request #72 from wenkaidu/default_rings
Increase number of rings with XGMI connection

[ROCm/rccl commit: 9a0ac849fa]
2019-05-24 14:42:54 -07:00
saadrahim b90e705679 Readthedocs documentation support (#71)
[ROCm/rccl commit: bb7542c1d9]
2019-05-24 15:03:56 -06:00
Wenkai Du 5fdf2edd39 Increase number of rings with XGMI connection
Improves throughput by about 20%. Also removes P2P over PCIe, which was
left enabled in the initial release.

Signed-off-by: Wenkai Du <wenkai.du@amd.com>


[ROCm/rccl commit: f45566a8bd]
2019-05-24 20:58:51 +00:00
Yaxun (Sam) Liu 7b4b3e2981 Fix build failure for hip-clang (#69)
[ROCm/rccl commit: b921279a21]
2019-05-23 16:53:25 -06:00
Wenkai Du 0ed10b1e4d Add RCCL primitive testing (#70)
[ROCm/rccl commit: 1bb6d2104c]
2019-05-23 16:52:17 -06:00
saadrahim 9d9fd68215 Jenkinsfile (#65)
* Changing Jenkinsfile to support runs without docker
* Updating install file for build options
* Fixing command execution
* Fixing Jenkinsfile
* fixing test execution
* Removing junit search


[ROCm/rccl commit: 4c4351673b]
2019-05-22 15:32:32 -06:00
saadrahim af09015f8d Updating readme for 2.5 release (#67)
[ROCm/rccl commit: 42c3e4b93d]
2019-05-22 15:31:12 -06:00
gilbertlee-amd 336883ef2b Test combined calls (#64)
* Adding test for queueing multiple different collectives, 1 device per thread


[ROCm/rccl commit: ffe2054ed2]
2019-05-22 15:30:37 -06:00
Aaron Enye Shi 6201fd9645 Update README to note install rocm-cmake (#68)
[ROCm/rccl commit: 6e8f40eb22]
2019-05-22 15:29:59 -06:00
gilbertlee-amd 08a65f2201 Adding fix for insufficient devices / better logging for skipped tests (#63)
[ROCm/rccl commit: a115f577dd]
2019-05-21 14:34:20 -06:00
Stanley Tsang 7c60e997e0 Renaming jenkinsfile
[ROCm/rccl commit: afa945d6e6]
2019-05-21 15:54:41 +00:00
Wenkai Du b815e21d58 Remove extra compiler path setting
[ROCm/rccl commit: 4bfa506a6b]
2019-05-21 00:08:42 +00:00
Wenkai Du d42406be17 By default will not build test program
[ROCm/rccl commit: e517dbed5c]
2019-05-20 18:37:58 +00:00
gilbertlee-amd 2215ef431d Merge pull request #62 from gilbertlee-amd/AlignmentTests
Adding support for alignment tests via sub-datasets

[ROCm/rccl commit: c57ab960ff]
2019-05-18 10:54:52 -06:00
Gilbert Lee 57ac9a8a93 Adding support for alignment tests via sub-datasets
Added sample alignment test for AllGather
Datasets no longer free memory on destruction so Release() must be used


[ROCm/rccl commit: a50c852851]
2019-05-18 00:04:03 +00:00