Commit Graph

275 Commits

Author SHA1 Message Date
Wenkai Du 5747c3cac1 Fix abort handling in LL primitives
[ROCm/rccl commit: 077c3cda74]
2020-02-25 13:42:54 -08:00
Wenkai Du d640f38d56 Fix system maxSpeed and maxWidth calculation
[ROCm/rccl commit: 9b80b3633f]
2020-02-24 15:18:57 -08:00
Wenkai Du 93d448e2fe Fix incorrect CR8 detection
Also change level of ring graph print to help debugging


[ROCm/rccl commit: f54dc58113]
2020-02-21 10:09:49 -08:00
Wenkai Du cf4bce4ad3 Merge pull request #172 from wenkaidu/topo_expl
Add topology explorer

[ROCm/rccl commit: 5b3856f2ed]
2020-02-20 15:16:55 -08:00
Wenkai Du 00f421ccbd Add topology explorer
[ROCm/rccl commit: 55f8e2dec7]
2020-02-19 14:42:06 -08:00
Wenkai Du 9dad3e0a90 Merge pull request #167 from wenkaidu/cr8
Generate 8G6L chordal ring from reference

[ROCm/rccl commit: 9110820470]
2020-02-18 14:59:23 -08:00
Eiden Yoshida d6d1f700f6 Fix hipclang argument in CI (#171)
[ROCm/rccl commit: 428f1f1555]
2020-02-18 13:17:52 -07:00
Eiden Yoshida eb823a7621 Refactor Jenkinsfiles to allow use of new docker containers (#170)
[ROCm/rccl commit: edb863de62]
2020-02-18 11:25:29 -07:00
Wenkai Du 8432e8a921 Generate 8G6L chordal ring from reference
[ROCm/rccl commit: abcfbf1231]
2020-02-11 22:01:12 +00:00
Wenkai Du ded8d0d389 Bump up HCC version for -hc-function-calls switch
[ROCm/rccl commit: 3d092f32b8]
2020-02-11 19:37:13 +00:00
Wenkai Du 6b2d7de200 Add ring bandwidth correction factor
[ROCm/rccl commit: d1dae2721d]
2020-01-30 09:52:27 -08:00
Stanley Tsang e5419407c4 Updating copyright notices for 2020.
[ROCm/rccl commit: 20fa04d9b6]
2020-01-29 15:28:08 -08:00
Wenkai Du e6b5933d7e Merge remote-tracking branch 'remotes/rccl/master' into rccl_2.5.6_cleanup
[ROCm/rccl commit: fe6d012eb0]
2020-01-29 15:28:03 -08:00
Wenkai Du 622b49e80a Split primitive class to smaller structures
[ROCm/rccl commit: 486fd436af]
2020-01-29 15:27:23 -08:00
Wenkai Du d2fbcfea02 Misc fixes and improvements for 2.5.6
1. Fix RCCL unit test
2. Add ROME detection and tuning
3. Change default P2P level
4. Fix search algorithm for XGMI
5. Replace explicit channel duplication with implicit duplication by using half of link speed
6. Add collective trace support
7. Correct Intel Skylake CPU detection and bandwidth
8. Fix topo connect function
9. Disable GDR read and remove unreachable code
10. Disable LL128 kernels
11. Add tuning parameters
12. Use original clock64() implementation which returns RTC counter value
13. Print out timestamp of collective trace
14. Do not use struct ncclColl in kernel launch parameter
15. Fix abort handling and add tracing
17. Add __launch_bounds__ to kernel functions
18. Remove unused abortCount
19. Unset default MIN_NRINGS and MIN_NCHANNELS
20. Do not allocate shared memory when not using LL128 kernels
21. Correct time print out in tuning log


[ROCm/rccl commit: 1e55645d97]
2020-01-29 15:27:05 -08:00
paulfreddy bbb0c59cd4 Changes for multiple ROCm installation (#164)
* Changes for multiple ROCm installation

   1. Set version to 2.10.1
   2. Add CMAKE_INSTALL_PREFIX to necessary places
   3. Cleanup, fix rpath, use prefix in install.sh

* Changes for multiple ROCm installation

   1. Set soversion to match release version
   2. Add CMAKE_INSTALL_PREFIX to necessary places
   3. Cleanup, fix rpath, use prefix in install.sh

* Changes for multiple ROCm installation

   1. Set soversion to match release version
   2. Add CMAKE_INSTALL_PREFIX to necessary places
   3. Cleanup, fix rpath, use prefix in install.sh


[ROCm/rccl commit: 15c917244d]
2020-01-08 21:28:16 -08:00
Gilbert Lee 5783917a75 Changing single sync mode to time all iterations instead of just last
[ROCm/rccl commit: e5074ce94d]
2019-12-20 17:08:39 -08:00
gilbertlee-amd 71635198b8 Removing OpenMP from unit tests (#163)
[ROCm/rccl commit: 000bce6f27]
2019-12-20 11:41:56 -07:00
gilbertlee-amd a461b6d139 Adding new sleep after sync capability for data fabric profiling (#162)
Fixing missing header include for ROCm 3.0 changes

[ROCm/rccl commit: 2f4269d06d]
2019-12-12 15:20:54 -07:00
saadrahim 26e161a7a2 Package fix (#161)
* Fixing RHEL dependency on rocm-dev


[ROCm/rccl commit: 0092b35132]
2019-12-06 16:06:50 -07:00
saadrahim 13de181fbc Changing package dependency to rocm-dev (#160)
[ROCm/rccl commit: bd59b6f880]
2019-12-06 14:00:25 -07:00
Wenkai Du 35ad901dfe Merge pull request #158 from wenkaidu/p2p
Change default P2P level

[ROCm/rccl commit: 9e10cde644]
2019-12-04 16:30:58 -08:00
Wenkai Du 272d22fbe3 Change default P2P level
[ROCm/rccl commit: 90e928bcd5]
2019-12-04 21:05:10 +00:00
Wenkai Du d7d4175df0 Merge remote-tracking branch 'remotes/nccl/master' into rccl_2.5.6
[ROCm/rccl commit: 6648c81dc6]
2019-12-03 15:42:04 -08:00
Wenkai Du fedce64117 Change manual build instructions to fit most common usage
[ROCm/rccl commit: 00a910c2da]
2019-11-26 12:40:26 -08:00
Wenkai Du 4cb52294d5 Disable direct buffers to reduce scratch memory size
[ROCm/rccl commit: a0be2b8812]
2019-11-20 13:03:16 -08:00
Sylvain Jeaugey 71560fd67b 2.5.6-1 (#255)
Add LL128 Protocol.

Rewrite the topology detection and tree/ring creation (#179). Improve
tree performance by sending/receiving from different GPUs. Add
model-based tuning to switch between the different algorithms and
protocols.

Rework P2P/SHM detection in containers (#155, #248).

Detect duplicated devices and return an error (#231).

Add tuning for GCP

[ROCm/rccl commit: 299c554dcc]
2019-11-19 14:57:39 -08:00
Wenkai Du 55ad3c801e Support bfloat16 on rest of the unit tests
[ROCm/rccl commit: 4ca05c1297]
2019-11-18 14:18:34 -08:00
Wenkai Du 7dc39b8928 Add bfloat16 all reduce unit test
[ROCm/rccl commit: bdac0256a5]
2019-11-18 13:50:29 -08:00
Wenkai Du 1e182391ad Add bfloat16 support in RCCL
Preprocessor symbol RCCL_BFLOAT16 is used as feature indicator


[ROCm/rccl commit: 5e109ed400]
2019-11-18 13:45:53 -08:00
Wenkai Du 8b1ce44c2a Temporarily disable 0x803 target due to build error
[ROCm/rccl commit: cd7ab1425b]
2019-11-14 11:17:41 -08:00
Wenkai Du 8e60479385 Merge pull request #151 from wenkaidu/prim_test
rccl_prim_test: Generalize ring topology and duplications

[ROCm/rccl commit: 55c07e4fb7]
2019-11-13 08:17:55 -08:00
Siu Chi Chan 9b19999918 Bump up HCC version for -hc-function-calls switch
[ROCm/rccl commit: 08ba92f1b0]
2019-11-12 14:16:35 -05:00
Wenkai Du 25b3175e82 rccl_prim_test: Generalize ring topology and duplications
Allow a user-specified ring topology from the command line, duplicated
to the requested number of workgroups:
./rccl_prim_test -w 12 -p copy -r "0 1 2 3|3 2 1 0|0 2 1 3|3 1 2 0|0 2 3 1|1 3 2 0"


[ROCm/rccl commit: 07bb6fce8f]
2019-11-11 15:42:24 -08:00
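
The `-r` argument in the commit above packs several rings into one string: rings are separated by `|`, and GPU indices within a ring are separated by spaces. The following is a minimal Python sketch of how such a string could be parsed and repeated to fill a workgroup count. The function names `parse_rings` and `duplicate_rings` are hypothetical, and the round-robin duplication order is an assumption; the actual rccl_prim_test implementation may differ.

```python
def parse_rings(spec):
    """Split a ring spec such as "0 1 2 3|3 2 1 0" into lists of GPU indices."""
    rings = []
    for part in spec.split("|"):
        ring = [int(token) for token in part.split()]
        if not ring:
            raise ValueError("empty ring in spec: %r" % spec)
        rings.append(ring)
    return rings


def duplicate_rings(rings, workgroups):
    """Return one ring per workgroup by cycling through the given rings.

    Assumption: duplicates are assigned round-robin; this ordering is
    illustrative only and is not taken from the rccl_prim_test source.
    """
    if not rings:
        raise ValueError("at least one ring is required")
    if workgroups < 1:
        raise ValueError("workgroups must be positive")
    return [rings[i % len(rings)] for i in range(workgroups)]


if __name__ == "__main__":
    spec = "0 1 2 3|3 2 1 0|0 2 1 3|3 1 2 0|0 2 3 1|1 3 2 0"
    rings = parse_rings(spec)
    for wg, ring in enumerate(duplicate_rings(rings, 12)):
        print(wg, ring)
```

Under this assumption, the six rings from the example command with `-w 12` would each be assigned to two workgroups.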
Wenkai Du 66e4337c6e Merge pull request #149 from wenkaidu/rtc
Correct RTC frequencies for profiling purpose

[ROCm/rccl commit: 277c72a638]
2019-11-06 08:02:58 -08:00
gilbertlee-amd 22cbbb9004 Adding interactive mode for profiling purposes (#150)
[ROCm/rccl commit: fd94f4fa25]
2019-11-05 17:10:16 -07:00
Wenkai Du 62042e47bc Correct RTC frequencies for profiling purpose
[ROCm/rccl commit: 8995047830]
2019-11-05 11:36:45 -08:00
Wenkai Du 41f6319b33 Check for fine grain support using memory allocation
[ROCm/rccl commit: 669f1951a4]
2019-11-01 15:58:49 -07:00
Wenkai Du 0d6b476b08 Merge pull request #145 from wenkaidu/prim_test
rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup

[ROCm/rccl commit: 90b2921207]
2019-11-01 13:30:01 -07:00
gilbertlee-amd f9ef1553aa Single Sync Timing mode (#144)
* Adding single sync timing mode to emulate timing reported by rccl-prim-test / rccl-tests
* Adding duration / overhead info


[ROCm/rccl commit: 2f9edd2432]
2019-11-01 10:18:25 -06:00
Jeff Daily e43e1f1b3d additional check for fine grain support in p2pCanConnect (#146)
[ROCm/rccl commit: 5a502955c9]
2019-10-31 08:58:38 -07:00
Wenkai Du 91b906cf88 rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup
[ROCm/rccl commit: ab91cdd5c9]
2019-10-30 13:15:02 -07:00
Gilbert Lee a99accb2cb Adding ability to switch between fine/coarse grain destination GPU memory
Adding ability to switch between memset/memcpy


[ROCm/rccl commit: 648c1ee7cc]
2019-10-29 12:00:32 -06:00
Wenkai Du b4ab922f94 Merge pull request #140 from scchan/rocm210_hc_function_calls
add -hc-function-calls switch back for HCC ROCm 2.10

[ROCm/rccl commit: 9be7ae8f0d]
2019-10-28 09:56:47 -07:00
Michael LIAO 4b94f25d08 [cmake] Allow GPU targets to be parameterized with AMDGPU_TARGETS.
[ROCm/rccl commit: ec10a5cf14]
2019-10-25 13:55:27 -04:00
Wenkai Du d3f399f619 Disable HDP flush for RDMA
[ROCm/rccl commit: 296176a4fd]
2019-10-23 14:40:17 -07:00
Siu Chi Chan 8d2018d372 add -hc-function-calls switch back for HCC ROCm 2.10
[ROCm/rccl commit: d779eae1d0]
2019-10-21 18:00:02 -04:00
Wenkai Du 21bc1ef493 Revert collective chunk and slice steps to avoid drop in throughput
[ROCm/rccl commit: df74d12946]
2019-10-18 12:54:00 -07:00
saadrahim 0f4d4d63ec CI Re-enabled for Ubuntu (#135)
[ROCm/rccl commit: a95529a6e2]
2019-10-18 11:38:51 -06:00
Gilbert Lee 7560929bd7 Reverting GenericOp bug workaround modifications to slice/chunk steps
[ROCm/rccl commit: 37603ae6cb]
2019-10-11 09:20:10 -07:00