197 Commits

Author SHA1 Message Date
rpathani eaa1cdb48c Merge branch 'master' into xgmi_bench
[ROCm/rccl commit: deea20d49c]
2019-08-16 10:56:56 +05:30
Michael LIAO f4a240065f Fix build with hip-clang.
- Add necessary function attribute for HIP programming model.
- Explicitly include hsa headers.


[ROCm/rccl commit: 9369f8d75d]
2019-08-15 14:56:04 -04:00
Wenkai Du d4862fa605 Tune LL threshold for VEGA
Also move abort check after SPINS_BEFORE_CHECK_ABORT, as in NCCL


[ROCm/rccl commit: 2223cccf15]
2019-08-15 09:16:11 -07:00
Wenkai Du 93c44e96cb Default to a minimum of 2 rings and improve LL loop
[ROCm/rccl commit: 4b77a16f3f]
2019-08-14 14:12:56 -07:00
Wenkai Du 1feef99e7d Remove duplicate line
[ROCm/rccl commit: 5782a8d857]
2019-08-14 13:22:43 -07:00
Wenkai Du 6047487815 RCCL 2.4 update
[ROCm/rccl commit: f11c8f60cd]
2019-08-14 10:42:35 -07:00
rohit pathania 2dbcb62caf Modified the code to use RTC clock frequency based on GPU GCN ID
[ROCm/rccl commit: 65e2f5d87b]
2019-08-14 12:55:12 +05:30
rohit pathania 042261445d Merge branch 'xgmi_bench' of https://github.com/rpathani/rccl into xgmi_bench
# Conflicts:
#	tools/rccl-prim-test/rccl_prim_test.cpp


[ROCm/rccl commit: 0f74929dab]
2019-08-13 11:36:56 +05:30
rohit pathania 86f6d95b06 Adding linkinfo and srcGPU to destGPU info
[ROCm/rccl commit: 3bbf924ff8]
2019-08-13 11:28:50 +05:30
rohit pathania 95162665c7 Adding linkinfo and srcGPU to destGPU info
[ROCm/rccl commit: 5a2f74b8d0]
2019-08-09 12:44:06 +05:30
gilbertlee-amd 8645391260 Adding TransferBench tool (#113)
* Adding standalone TransferBench tool

[ROCm/rccl commit: b8cf48fc16]
2019-08-07 17:21:41 -06:00
Wenkai Du 909e014b51 Get HDP register address from hipDeviceGetAttribute API
[ROCm/rccl commit: 84d3344796]
2019-08-05 14:14:09 -07:00
Wenkai Du b540c55c9b Merge pull request #108 from wenkaidu/xgmi_finegrain
Remove dependency on HSA_FORCE_FINE_GRAIN_PCIE flag for XGMI link

[ROCm/rccl commit: 4a9bdd8539]
2019-08-02 10:00:48 -07:00
Michael LIAO c14ef9f408 Revise the previous fix to use the canonical path to HSA.
- This fixes the build failures in certain environments.


[ROCm/rccl commit: 4f2aa06688]
2019-08-01 14:50:44 -04:00
Wenkai Du 2dcb42effd Remove dependency on HSA_FORCE_FINE_GRAIN_PCIE flag for XGMI link
[ROCm/rccl commit: e7022e9196]
2019-08-01 04:26:37 +00:00
Michael LIAO 4b5bf9f227 Fix build with hip-clang
Two minor issues are solved:
+ Enclose the kernel function in parentheses, as hip-clang defines
  `hipLaunchKernelGGL` as a macro.
+ Need to explicitly include <hsa.h> for hip-clang.


[ROCm/rccl commit: 41310144f6]
2019-07-31 15:07:36 -04:00
Wenkai Du 6688279075 Add gfx908 target (#106)
[ROCm/rccl commit: 1969e89003]
2019-07-30 13:56:45 -07:00
Wenkai Du 62e6e67e31 Remove extra "." from version string (#104)
[ROCm/rccl commit: 1fee6f9d50]
2019-07-25 15:25:02 -07:00
saadrahim 596e200499 Changing to rocm-cmake new style versioning (#103)
[ROCm/rccl commit: fdee095dd3]
2019-07-22 23:40:13 +00:00
Wenkai Du d7f25d5be7 Use hipExtLaunchMultiKernelMultiDevice API (#100)
Depends on HIP version with this pull request:
https://github.com/ROCm-Developer-Tools/HIP/pull/1232

[ROCm/rccl commit: 0522041fac]
2019-07-18 09:02:37 -07:00
Ke Wen a66ab68630 Fix NIC distances for 11+ NICs
[ROCm/rccl commit: 4d579e51cc]
2019-07-17 06:32:33 -07:00
Ke Wen 5c5c58c73b Fix #224: prevent number of IB devices from going out of bound
[ROCm/rccl commit: 920ae57c14]
2019-07-17 06:32:33 -07:00
Wenkai Du 25d29e97d1 Increase debug print of ring topology to 64 ranks (#99)
[ROCm/rccl commit: dc1908e944]
2019-07-16 14:54:17 -07:00
Wenkai Du 602292685d Allocate transport memory based on NUMA node (#97)
[ROCm/rccl commit: 43bd6f5fbf]
2019-07-15 11:45:38 -07:00
Ke Wen 4211da6d29 Size up IPC buffers to multiples of 2MB
Avoid potential CUDA error in concurrent communicator initialization


[ROCm/rccl commit: c8c68fb5f7]
2019-07-12 09:50:17 -07:00
gilbertlee-amd 4310b5b4c1 Adding explicit HDP flush when using RDMA via Infiniband (#95)
* Adding explicit HDP flush when using RDMA via Infiniband



[ROCm/rccl commit: 7b6332d3d0]
2019-07-10 16:29:02 -06:00
Hirochika Asai ee08e8b421 Add the exact matching modifier support "=" to the NCCL_IB_HCA variable (#236)
Perform exact matching when the prefix "=" is specified in the NCCL_IB_HCA variable to exclude HCAs mlx5_X[0-9]+ when mlx5_X is specified.

[ROCm/rccl commit: 0b192d2299]
2019-07-09 14:45:41 -07:00
Wenkai Du b7322c800a Refactor primitive test to support multiple GPUs in rings (#94)
* Refactor primitive test to support multiple GPUs in rings

* Make GPUs sync before transfer optional

* Use same ring format as RCCL

* Extend to 8 GPUs and report errors if there is no P2P access

* Control GPUs sync before ops from command line with "-s" option

* Change buffer size through command line option "-n"

Rename iterations command line option to "-i"


[ROCm/rccl commit: 70804da15b]
2019-07-05 14:29:20 -07:00
Wenkai Du 20975921dd Fix shared memory collision in multi-communicator case. (#93)
The current SHM object name uses only pidHash and ranks as
identification, so names collide when a program runs with
multiple communicators. This change adds commId info into pidHash, so
that the pidHashes of different communicators in the same process are
distinct from each other.

Ported from original commit: https://github.com/lowintelligence/nccl/commits/shm

[ROCm/rccl commit: 949d680e49]
2019-07-02 09:27:16 -07:00
Wenkai Du 90b7a02944 Match primitives unroll counts with latest RCCL (#91)
[ROCm/rccl commit: e6a0da444f]
2019-06-26 15:09:13 -07:00
Stanley Tsang 6aa817d768 Fixing install script to actually install library when requested (#88)
* Fixing install script to actually install library when requested.  Cleaning up unused code.

Removing unused arguments from install script.

Fixing weird whitespace

* Fixing install script to install to correct location /opt/rocm, now creates symlink in /opt/rocm/lib

* Updates and corrections to README and install script


[ROCm/rccl commit: 329a62a01f]
2019-06-25 17:25:21 -06:00
Ke Wen 3c13a4d1bb Merge branch 'master' into HEAD
[ROCm/rccl commit: 8e04d80382]
2019-06-25 13:39:08 -07:00
Ke Wen b91d8170f8 2.4.8-1
Fix #209: improve socket transport performance
  Split transfers over multiple sockets
  Launch multiple threads to drive sockets
  Detect AWS NICs and set nsockets/nthreads accordingly


[ROCm/rccl commit: 7c72dee660]
2019-06-25 13:22:47 -07:00
saadrahim 239c7bdf44 Changing maintainer to no-reply to fix deb generation (#86)
[ROCm/rccl commit: 840f8715ef]
2019-06-24 17:13:57 -06:00
saadrahim f437e903f1 Merge pull request #83 from ROCmSoftwarePlatform/devel
Devel to Master

[ROCm/rccl commit: 0de9051ace]
2019-06-24 14:25:18 -06:00
saadrahim 789c0b828e Fixing CentOS 7 Packaging and package versioning/maintainer (#82)
- Fixing CentOS 7 Packaging
- standardizing version numbers for release to use rocm versioning
- removing maintainer email based on legal's input


[ROCm/rccl commit: 1c7b0bd878]
2019-06-24 14:22:16 -06:00
Felix Abecassis d2f579ba8b Fix out-of-bounds read in ncclStrToCpuset (#233)
The affinityStr string was not null-terminated but was passed to strlen(3).

Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>

[ROCm/rccl commit: 37e4f8729e]
2019-06-21 10:25:08 +02:00
Wenkai Du 17530a2a6f Use different unroll numbers for copy and reduce (#81)
* Use different unroll numbers for copy and reduce

* use 4 separate unroll factors


[ROCm/rccl commit: bb5e42bac0]
2019-06-19 16:36:16 -07:00
Jeff Daily 53b1ca1d7f do not use internal stream (#79)
[ROCm/rccl commit: 754ed213cc]
2019-06-12 16:26:59 -06:00
Wenkai Du 87d5441552 Calculate and print kernel throughput (#78)
* rccl-prim-test: print GPU info and set iterations

* Calculate and print kernel throughput


[ROCm/rccl commit: ee14676064]
2019-06-07 10:39:30 -07:00
Wenkai Du dcb2801f25 rccl-prim-test: print GPU info and set iterations (#77)
[ROCm/rccl commit: 42b488507d]
2019-06-05 15:16:33 -07:00
Wenkai Du a8fbf5555c Implement HDP flush when transferring data over PCIe P2P (#75)
* Implement HDP flush when transferring data over PCIe P2P
* Add some descriptions to HDP flushing
* Fix for review comments


[ROCm/rccl commit: b7a6307371]
2019-06-03 16:29:55 -07:00
Yaxun Sam Liu dff9e760a0 Make ncclFuncs static
This is necessary to constant propagate the function pointers
to eliminate the indirect function call.


[ROCm/rccl commit: 5827a4f616]
2019-05-29 10:50:13 -04:00
Saad Rahim a5d9580a99 Adding NVIDIA copyright
[ROCm/rccl commit: 0c0a8ed86f]
2019-05-24 15:05:00 -07:00
Saad Rahim 07d0f15687 Fixing whitespace
[ROCm/rccl commit: 02ef2d27e6]
2019-05-24 14:49:12 -07:00
Saad Rahim 7d340ae2a2 Adding link to readthedocs
[ROCm/rccl commit: fac7ef9370]
2019-05-24 14:48:24 -07:00
saadrahim b90e705679 Readthedocs documentation support (#71)
[ROCm/rccl commit: bb7542c1d9]
2019-05-24 15:03:56 -06:00
Wenkai Du 5fdf2edd39 Increase number of rings with XGMI connection
Improves throughput by about 20%. Also removes P2P over PCIe, which had
been left enabled in the initial release.

Signed-off-by: Wenkai Du <wenkai.du@amd.com>


[ROCm/rccl commit: f45566a8bd]
2019-05-24 20:58:51 +00:00
Yaxun (Sam) Liu 7b4b3e2981 Fix build failure for hip-clang (#69)
[ROCm/rccl commit: b921279a21]
2019-05-23 16:53:25 -06:00
Wenkai Du 0ed10b1e4d Add RCCL primitive testing (#70)
[ROCm/rccl commit: 1bb6d2104c]
2019-05-23 16:52:17 -06:00