Wenkai Du
5f57e6b466
Merge pull request #194 from wenkaidu/search
...
Fix incorrect next device ID in PCI ordered search
[ROCm/rccl commit: cf5070f6c0 ]
2020-04-27 09:54:09 -07:00
Wenkai Du
7b7f781658
Fix incorrect next device ID in PCI ordered search
...
[ROCm/rccl commit: edb49ed2d5 ]
2020-04-25 01:01:13 +00:00
saadrahim
b9acac2db6
Enabling CI Testing Again (#192)
...
Adding CI support based on AMD internal CI refactor.
[ROCm/rccl commit: cc66dd46e9 ]
2020-04-24 10:36:57 -06:00
Wenkai Du
728cf9ee10
Revert "Temporary disable 0x803 target due to build error"
...
This reverts commit 8b1ce44c2a.
[ROCm/rccl commit: 5170bd1c02 ]
2020-04-14 16:58:41 +00:00
Wenkai Du
2de0b24c30
rccl-prim-test: auto-detect rings in 4P and 8P configurations
...
[ROCm/rccl commit: ef7064ba9b ]
2020-04-10 18:17:21 +00:00
Aaron Enye Shi
bfbfe370c3
Fix HIP-Clang build with HSA headers
...
HIP-Clang does not include these HSA headers, and they need to be explicitly added in RCCL.
[ROCm/rccl commit: a95090d981 ]
2020-04-03 17:58:23 -04:00
Wenkai Du
8852e54181
topo_expl: update to 2.6
...
[ROCm/rccl commit: 6f54b23503 ]
2020-04-01 13:37:08 -07:00
Wenkai Du
4aeb7f041e
Merge remote-tracking branch 'nccl/master' into v2.6.4_merge
...
[ROCm/rccl commit: fa36fd9ef9 ]
2020-04-01 13:35:12 -07:00
Sylvain Jeaugey
b996c2ca00
Merge pull request #314 from NVIDIA/v2.6
...
2.6.4-1
[ROCm/rccl commit: 533e3702cf ]
2020-03-26 17:31:24 -07:00
Sylvain Jeaugey
40adc74496
2.6.4-1
...
Add support for network collectives.
Add support for XML topology dump/injection.
Add text values for GDR and P2P Levels, including "NVL".
Add speed detection for PCI, Infiniband and Ethernet cards.
Add CPU detection for ARM and AMD CPUs.
Add support for adaptive routing on Infiniband.
Change NET plugin API to v3: merge PCI path and GPU pointer
capability into a single structure and add other properties.
[ROCm/rccl commit: b221128eca ]
2020-03-20 14:58:36 -07:00
Rashika Kheria
38b445c94f
Check return code for Flush operation
...
Current NCCL code does not abort on Flush operations that fail in the
underlying network. This may compromise data integrity.
Signed-off-by: Rashika Kheria <rashika@amazon.com>
[ROCm/rccl commit: 6c61492eba ]
2020-03-16 20:40:59 -07:00
Wenkai Du
e3e1c6b29c
rccl-prim-test: add all-to-all benchmark (#185)
...
For gfx908, support simple detection of ring topology.
Call ReduceOrCopyMulti directly from kernel.
Also simplify code by removing kernel start synchronization option
which has no effect on throughput measurements.
[ROCm/rccl commit: ebc823e603 ]
2020-03-16 10:00:54 -07:00
amdkila
eef6314001
set hip::host and hip::device and remove some deprecated targets (#184)
...
[ROCm/rccl commit: b9fb0cd808 ]
2020-03-05 13:36:55 -07:00
Wenkai Du
cb19bce4e0
Merge pull request #183 from wenkaidu/dup_rings
...
Remove condition for ring duplication
[ROCm/rccl commit: 0976e47b06 ]
2020-03-02 17:12:42 -08:00
Wenkai Du
dba615366b
Merge pull request #182 from wenkaidu/topo_expl
...
Topo expl
[ROCm/rccl commit: 88752f9173 ]
2020-03-02 15:44:09 -08:00
Wenkai Du
85fd51a06f
Remove condition for ring duplication
...
Fix insufficient number of rings on single node after pull #179
[ROCm/rccl commit: 62dc28bd2e ]
2020-03-02 12:55:06 -08:00
Wenkai Du
7882b2f0c5
topo_expl: add a few more single node models
...
[ROCm/rccl commit: 32388d60a9 ]
2020-03-02 11:43:03 -08:00
Wenkai Du
593d99d9a9
Check fine grained memory before enabling RDMA
...
Adding back the check which was lost from 2.5 merge.
[ROCm/rccl commit: fb59328a7b ]
2020-03-02 11:18:27 -08:00
Wenkai Du
2a66deb694
Merge pull request #179 from wenkaidu/search
...
Use fraction of system maxWidth as steps for searching
[ROCm/rccl commit: 8b5bc8bca2 ]
2020-02-28 11:05:46 -08:00
Wenkai Du
b750defc28
Merge remote-tracking branch 'remotes/nccl/master'
...
[ROCm/rccl commit: 8e73a2ad60 ]
2020-02-27 12:53:03 -08:00
Wenkai Du
a36c2ecbc4
Add topology visualizer tool
...
[ROCm/rccl commit: 498d5029ad ]
2020-02-26 15:23:34 -08:00
Wenkai Du
3886f9bea8
topo_expl: use bandwidth numbers defined in graph in CPU models
...
[ROCm/rccl commit: 934b6de557 ]
2020-02-26 14:17:36 -08:00
Wenkai Du
45a7541582
Revise PCI BW numbers on Rome
...
[ROCm/rccl commit: d2adc61bf6 ]
2020-02-26 13:17:49 -08:00
Wenkai Du
b4be0ff3b8
Use fraction of system maxWidth as steps for searching
...
This reverts previous workaround of deducting only half of width
from paths.
[ROCm/rccl commit: 8391637613 ]
2020-02-26 09:14:35 -08:00
Wenkai Du
5747c3cac1
Fix abort handling in LL primitives
...
[ROCm/rccl commit: 077c3cda74 ]
2020-02-25 13:42:54 -08:00
Wenkai Du
d640f38d56
Fix system maxSpeed and maxWidth calculation
...
[ROCm/rccl commit: 9b80b3633f ]
2020-02-24 15:18:57 -08:00
Wenkai Du
93d448e2fe
Fix incorrect CR8 detection
...
Also change level of ring graph print to help debugging
[ROCm/rccl commit: f54dc58113 ]
2020-02-21 10:09:49 -08:00
Wenkai Du
cf4bce4ad3
Merge pull request #172 from wenkaidu/topo_expl
...
Add topology explorer
[ROCm/rccl commit: 5b3856f2ed ]
2020-02-20 15:16:55 -08:00
Wenkai Du
00f421ccbd
Add topology explorer
...
[ROCm/rccl commit: 55f8e2dec7 ]
2020-02-19 14:42:06 -08:00
Wenkai Du
9dad3e0a90
Merge pull request #167 from wenkaidu/cr8
...
Generate 8G6L chordal ring from reference
[ROCm/rccl commit: 9110820470 ]
2020-02-18 14:59:23 -08:00
Eiden Yoshida
d6d1f700f6
Fix hipclang argument in CI (#171)
...
[ROCm/rccl commit: 428f1f1555 ]
2020-02-18 13:17:52 -07:00
Eiden Yoshida
eb823a7621
Refactor Jenkinsfiles to allow use of new docker containers (#170)
...
[ROCm/rccl commit: edb863de62 ]
2020-02-18 11:25:29 -07:00
Sylvain Jeaugey
6034c27655
Fix Allgather operations above 4G with multiple GPUs per process.
...
Fixes nccl-tests#37.
Direct offsets were still on 32 bits in the low-level primitives.
[ROCm/rccl commit: c38f174bd4 ]
2020-02-12 11:11:55 -08:00
Wenkai Du
8432e8a921
Generate 8G6L chordal ring from reference
...
[ROCm/rccl commit: abcfbf1231 ]
2020-02-11 22:01:12 +00:00
Wenkai Du
ded8d0d389
Bump up HCC version for -hc-function-calls switch
...
[ROCm/rccl commit: 3d092f32b8 ]
2020-02-11 19:37:13 +00:00
Wenkai Du
6b2d7de200
Add ring bandwidth correction factor
...
[ROCm/rccl commit: d1dae2721d ]
2020-01-30 09:52:27 -08:00
Stanley Tsang
e5419407c4
Updating copyright notices for 2020.
...
[ROCm/rccl commit: 20fa04d9b6 ]
2020-01-29 15:28:08 -08:00
Wenkai Du
e6b5933d7e
Merge remote-tracking branch 'remotes/rccl/master' into rccl_2.5.6_cleanup
...
[ROCm/rccl commit: fe6d012eb0 ]
2020-01-29 15:28:03 -08:00
Wenkai Du
622b49e80a
Split primitive class to smaller structures
...
[ROCm/rccl commit: 486fd436af ]
2020-01-29 15:27:23 -08:00
Wenkai Du
d2fbcfea02
Misc fixes and improvements for 2.5.6
...
1. Fix RCCL unit test
2. Add ROME detection and tuning
3. Change default P2P level
4. Fix search algorithm for XGMI
5. Replace explicit channel duplication with implicit duplication by using half of link speed
6. Add collective trace support
7. Correct Intel Skylake CPU detection and bandwidth
8. Fix topo connect function
9. Disable GDR read and remove unreachable code
10. Disable LL128 kernels
11. Add tuning parameters
12. Use original clock64() implementation which returns RTC counter value
13. Print out timestamp of collective trace
14. Do not use struct ncclColl in kernel launch parameter
15. Fix abort handling and add tracing
17. Add __launch_bounds__ to kernel functions
18. Remove unused abortCount
19. Unset default MIN_NRINGS and MIN_NCHANNELS
20. Do not allocate shared memory when not using LL128 kernels
21. Correct time print out in tuning log
[ROCm/rccl commit: 1e55645d97 ]
2020-01-29 15:27:05 -08:00
Sylvain Jeaugey
40958b6445
2.5.7-1
...
[ROCm/rccl commit: 3701130b3c ]
2020-01-16 15:40:57 -08:00
paulfreddy
bbb0c59cd4
Changes for multiple ROCm installation (#164)
...
* Changes for multiple ROCm installation
1. Set version to 2.10.1
2. Add CMAKE_INSTALL_PREFIX to necessary places
3. Cleanup, fix rpath, use prefix in install.sh
* Changes for multiple ROCm installation
1. Set soversion to match release version
2. Add CMAKE_INSTALL_PREFIX to necessary places
3. Cleanup, fix rpath, use prefix in install.sh
[ROCm/rccl commit: 15c917244d ]
2020-01-08 21:28:16 -08:00
Luke Yeager
d91217b16f
[topology] remove NET links when trimming system
...
This fixes a memory leak.
[ROCm/rccl commit: 7a18fe0784 ]
2020-01-07 13:29:57 -08:00
Luke Yeager
91ff39bedb
[build] Allow setting CXXFLAGS on the command line
...
[ROCm/rccl commit: c7ba70ff90 ]
2020-01-07 13:29:42 -08:00
Gilbert Lee
5783917a75
Changing single sync mode to time all iterations instead of just last
...
[ROCm/rccl commit: e5074ce94d ]
2019-12-20 17:08:39 -08:00
gilbertlee-amd
71635198b8
Removing OpenMP from unit tests (#163)
...
[ROCm/rccl commit: 000bce6f27 ]
2019-12-20 11:41:56 -07:00
gilbertlee-amd
a461b6d139
Adding new sleep after sync capability for data fabric profiling (#162)
...
Fixing missing header include for ROCM 3.0 changes
[ROCm/rccl commit: 2f4269d06d ]
2019-12-12 15:20:54 -07:00
Christian Sigg
ff74ebdcea
Fix clang build (#274)
...
The attribute is called `optnone`, not `noopt`.
[ROCm/rccl commit: 3899f6e0f2 ]
2019-12-09 09:31:13 -08:00
Ke Wen
6413a29ce8
Merge branch 'master' into HEAD
...
[ROCm/rccl commit: 44b5652617 ]
2019-12-06 18:28:11 -08:00
Ke Wen
8dc42618e4
2.5.6-2
...
Fix PPC64 Debian packaging
[ROCm/rccl commit: 6bb953d4e6 ]
2019-12-06 18:26:39 -08:00