303 Commits

Author SHA1 Message Date
Michael LIAO 4f2aa06688 Revise the previous fix to use the canonical path to HSA.
- This fixes the build failures in certain environments.
2019-08-01 14:50:44 -04:00
Wenkai Du 9189279220 Merge pull request #107 from mhbliao/hliao/master/swdev-198268
Fix build with hip-clang
2019-08-01 08:58:37 -07:00
Wenkai Du e7022e9196 Remove dependency on HSA_FORCE_FINE_GRAIN_PCIE flag for XGMI link 2019-08-01 04:26:37 +00:00
Michael LIAO 41310144f6 Fix build with hip-clang
Two minor issues are solved:
+ Enclose the kernel function in parentheses, as hip-clang defines
  `hipLaunchKernelGGL` as a macro.
+ Need to explicitly include <hsa.h> for hip-clang.
2019-07-31 15:07:36 -04:00
Cao Zongyan bfb3921519 Refine RPM package building spec file.
Add /sbin/ldconfig into RPM package install operations.
2019-07-31 10:36:22 -07:00
Wenkai Du 1969e89003 Add gfx908 target (#106) 2019-07-30 13:56:45 -07:00
Wenkai Du 1fee6f9d50 Remove extra "." from version string (#104) 2019-07-25 15:25:02 -07:00
saadrahim fdee095dd3 Changing to rocm-cmake new style versioning (#103) 2019-07-22 23:40:13 +00:00
Wenkai Du 0522041fac Use hipExtLaunchMultiKernelMultiDevice API (#100)
Depends on HIP version with this pull request:
https://github.com/ROCm-Developer-Tools/HIP/pull/1232
2019-07-18 09:02:37 -07:00
Ke Wen 4d579e51cc Fix NIC distances for 11+ NICs 2019-07-17 06:32:33 -07:00
Ke Wen 920ae57c14 Fix #224: prevent number of IB devices from going out of bound 2019-07-17 06:32:33 -07:00
Wenkai Du dc1908e944 Increase debug print of ring topology to 64 ranks (#99) 2019-07-16 14:54:17 -07:00
Wenkai Du 43bd6f5fbf Allocate transport memory based on numa node (#97) 2019-07-15 11:45:38 -07:00
Ke Wen c8c68fb5f7 Size up IPC buffers to multiples of 2MB
Avoid potential CUDA error in concurrent communicator initialization
2019-07-12 09:50:17 -07:00
gilbertlee-amd 7b6332d3d0 Adding explicit HDP flush when using RDMA via Infiniband (#95)
* Adding explicit HDP flush when using RDMA via Infiniband
2019-07-10 16:29:02 -06:00
Hirochika Asai 0b192d2299 Add the exact matching modifier support "=" to the NCCL_IB_HCA variable (#236)
Perform exact matching when the prefix "=" is specified in the NCCL_IB_HCA variable to exclude HCAs mlx5_X[0-9]+ when mlx5_X is specified.
2019-07-09 14:45:41 -07:00
Wenkai Du 70804da15b Refactor primitive test to support multiple GPUs in rings (#94)
* Refactor primitive test to support multiple GPUs in rings

* Make GPUs sync before transfer optional

* Use same ring format as RCCL

* Extend to 8 GPUs and report errors if there is no P2P access

* Control GPUs sync before ops from command line with "-s" option

* Change buffer size through command line option "-n"

Rename iterations command line option to "-i"
2019-07-05 14:29:20 -07:00
Wenkai Du 949d680e49 Fix shared memory collision in multi-communicator case. (#93)
The current SHM object name uses only pidHash and ranks for
identification, so names collide when a program runs with multiple
communicators. Here we add commId info into pidHash, which makes the
pidHashes of different communicators in the same process distinct
from each other.

Ported from original commit: https://github.com/lowintelligence/nccl/commits/shm
2019-07-02 09:27:16 -07:00
Wenkai Du e6a0da444f Match primitives unroll counts with latest RCCL (#91) 2019-06-26 15:09:13 -07:00
Stanley Tsang 329a62a01f Fixing install script to actually install library when requested (#88)
* Fixing install script to actually install library when requested.  Cleaning up unused code.

Removing unused arguments from install script.

Fixing weird whitespacing

* Fixing install script to install to correct location /opt/rocm, now creates symlink in /opt/rocm/lib

* Updates and corrections to README and install script
2019-06-25 17:25:21 -06:00
Ke Wen 8e04d80382 Merge branch 'master' into HEAD 2019-06-25 13:39:08 -07:00
Ke Wen 7c72dee660 2.4.8-1
Fix #209: improve socket transport performance
  Split transfers over multiple sockets
  Launch multiple threads to drive sockets
  Detect AWS NICs and set nsockets/nthreads accordingly
2019-06-25 13:22:47 -07:00
saadrahim 840f8715ef Changing maintainer to no-reply to fix deb generation (#86) 2019-06-24 17:13:57 -06:00
saadrahim 0de9051ace Merge pull request #83 from ROCmSoftwarePlatform/devel
Devel to Master
2019-06-24 14:25:18 -06:00
saadrahim 1c7b0bd878 Fixing Centos 7 Packaging and package versioning/maintainer (#82)
- Fixing Centos 7 Packaging
- standardizing version numbers for release to use rocm versioning
- removing maintainer email based on legal's input
2019-06-24 14:22:16 -06:00
Felix Abecassis 37e4f8729e Fix out-of-bounds read in ncclStrToCpuset (#233)
The affinityStr string was not null-terminated but was passed to strlen(3).

Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
2019-06-21 10:25:08 +02:00
Wenkai Du bb5e42bac0 Use different unroll numbers for copy and reduce (#81)
* Use different unroll numbers for copy and reduce

* use 4 separate unroll factors
2019-06-19 16:36:16 -07:00
Jeff Daily 754ed213cc do not use internal stream (#79) 2019-06-12 16:26:59 -06:00
Wenkai Du ee14676064 Calculate and print kernel throughput (#78)
* rccl-prim-test: print GPU info and set iterations

* Calculate and print kernel throughput
2019-06-07 10:39:30 -07:00
Wenkai Du 42b488507d rccl-prim-test: print GPU info and set iterations (#77) 2019-06-05 15:16:33 -07:00
Wenkai Du b7a6307371 Implement HDP flush when transfer data over PCIe P2P (#75)
* Implement HDP flush when transfer data over PCIe P2P
* Add some descriptions to HDP flushing
* Fix for review comments
2019-06-03 16:29:55 -07:00
Wenkai Du 8c974f1f50 Merge pull request #76 from ROCmSoftwarePlatform/fix-indirect-call
Make ncclFuncs static
2019-05-29 12:04:58 -07:00
Yaxun Sam Liu 5827a4f616 Make ncclFuncs static
This is necessary to constant propagate the function pointers
to eliminate the indirect function call.
2019-05-29 10:50:13 -04:00
Wenkai Du c85d629355 Merge pull request #74 from saadrahim/readmeUpdate
Readme update
2019-05-24 20:27:08 -07:00
Saad Rahim 0c0a8ed86f Adding NVIDIA copyright 2019-05-24 15:05:00 -07:00
Saad Rahim 02ef2d27e6 Fixing whitespace 2019-05-24 14:49:12 -07:00
Saad Rahim fac7ef9370 Adding link to readthedocs 2019-05-24 14:48:24 -07:00
Wenkai Du 9a0ac849fa Merge pull request #72 from wenkaidu/default_rings
Increase number of rings with XGMI connection
2019-05-24 14:42:54 -07:00
saadrahim bb7542c1d9 Readthedocs documentation support (#71) 2019-05-24 15:03:56 -06:00
Wenkai Du f45566a8bd Increase number of rings with XGMI connection
Improve throughput by about 20%. Also remove P2P over PCIe, which was
left enabled in the initial release.

Signed-off-by: Wenkai Du <wenkai.du@amd.com>
2019-05-24 20:58:51 +00:00
Yaxun (Sam) Liu b921279a21 Fix build failure for hip-clang (#69) 2019-05-23 16:53:25 -06:00
Wenkai Du 1bb6d2104c Add RCCL primitive testing (#70) 2019-05-23 16:52:17 -06:00
Rajat Chopra 6d8b2421bc Update debian dependencies in README (#228)
'fakeroot' is needed for building deb packages
2019-05-22 21:19:36 -07:00
saadrahim 4c4351673b Jenkinsfile (#65)
* Changing Jenkinsfile to support runs without docker
* Updating install file for build options
* Fixing command execution
* Fixing Jenkinsfile
* fixing test execution
* Removing junit search
2019-05-22 15:32:32 -06:00
saadrahim 42c3e4b93d Updating readme for 2.5 release (#67) 2019-05-22 15:31:12 -06:00
gilbertlee-amd ffe2054ed2 Test combined calls (#64)
* Adding test for queueing multiple different collectives, 1 device per thread
2019-05-22 15:30:37 -06:00
Aaron Enye Shi 6e8f40eb22 Update README to note install rocm-cmake (#68) 2019-05-22 15:29:59 -06:00
gilbertlee-amd a115f577dd Adding fix for insufficient devices / better logging for skipped tests (#63) 2019-05-21 14:34:20 -06:00
Stanley Tsang afa945d6e6 Renaming jenkinsfile 2019-05-21 15:54:41 +00:00
Wenkai Du 4bfa506a6b Remove extra compiler path setting 2019-05-21 00:08:42 +00:00