Commit Graph

312 Commits

Author SHA1 Message Date
Wenkai Du 3d092f32b8 Bump up HCC version for -hc-function-calls switch 2020-02-11 19:37:13 +00:00
Wenkai Du d1dae2721d Add ring bandwidth correction factor 2020-01-30 09:52:27 -08:00
Stanley Tsang 20fa04d9b6 Updating copyright notices for 2020. 2020-01-29 15:28:08 -08:00
Wenkai Du fe6d012eb0 Merge remote-tracking branch 'remotes/rccl/master' into rccl_2.5.6_cleanup 2020-01-29 15:28:03 -08:00
Wenkai Du 486fd436af Split primitive class to smaller structures 2020-01-29 15:27:23 -08:00
Wenkai Du 1e55645d97 Misc fixes and improvements for 2.5.6
1. Fix RCCL unit test
2. Add ROME detection and tuning
3. Change default P2P level
4. Fix search algorithm for XGMI
5. Remove explicit channel duplication with implicit by using half of link speed
6. Add collective trace support
7. Correct Intel Skylake CPU detection and bandwidth
8. Fix topo connect function
9. Disable GDR read and remove unreachable code
10. Disable LL128 kernels
11. Add tuning parameters
12. Use original clock64() implementation which returns RTC counter value
13. Print out timestamp of collective trace
14. Do not use struct ncclColl in kernel launch parameter
15. Fix abort handling and add tracing
17. Add __launch_bounds__ to kernel functions
18. Remove unused abortCount
19. Unset default MIN_NRINGS and MIN_NCHANNELS
20. Do not allocate shared memory when not using LL128 kernels
21. Correct time print out in tuning log
2020-01-29 15:27:05 -08:00
paulfreddy 15c917244d Changes for multiple ROCm installation (#164)
* Changes for multiple ROCm installation

   1. Set version to 2.10.1
   2. Add CMAKE_INSTALL_PREFIX to necessary places
   3. Cleanup, fix rpath, use prefix in install.sh

* Changes for multiple ROCm installation

   1. Set soversion to match release version
   2. Add CMAKE_INSTALL_PREFIX to necessary places
   3. Cleanup, fix rpath, use prefix in install.sh

* Changes for multiple ROCm installation

1. Set soversion to match release version
2. Add CMAKE_INSTALL_PREFIX to necessary places
3. Cleanup, fix rpath, use prefix in install.sh
2020-01-08 21:28:16 -08:00
Gilbert Lee e5074ce94d Changing single sync mode to time all iterations instead of just last 2019-12-20 17:08:39 -08:00
gilbertlee-amd 000bce6f27 Removing OpenMP from unit tests (#163) 2019-12-20 11:41:56 -07:00
gilbertlee-amd 2f4269d06d Adding new sleep after sync capability for data fabric profiling (#162)
Fixing missing header include for ROCM 3.0 changes
2019-12-12 15:20:54 -07:00
saadrahim 0092b35132 Package fix (#161)
* Fixing RHEL dependency on rocm-dev
2019-12-06 16:06:50 -07:00
saadrahim bd59b6f880 Changing package dependency to rocm-dev (#160) 2019-12-06 14:00:25 -07:00
Wenkai Du 9e10cde644 Merge pull request #158 from wenkaidu/p2p
Change default P2P level
2019-12-04 16:30:58 -08:00
Wenkai Du e9ca3a8029 Merge pull request #157 from wenkaidu/readme
Change manual build instructions to fit most common usage
2019-12-04 14:50:41 -08:00
Wenkai Du 90e928bcd5 Change default P2P level 2019-12-04 21:05:10 +00:00
Wenkai Du 6648c81dc6 Merge remote-tracking branch 'remotes/nccl/master' into rccl_2.5.6 2019-12-03 15:42:04 -08:00
Wenkai Du 00a910c2da Change manual build instructions to fit most common usage 2019-11-26 12:40:26 -08:00
Wenkai Du b1ed4b7fa8 Merge pull request #155 from wenkaidu/direct
Disable direct buffers to reduce scratch memory size
2019-11-21 09:39:09 -08:00
Wenkai Du a0be2b8812 Disable direct buffers to reduce scratch memory size 2019-11-20 13:03:16 -08:00
Sylvain Jeaugey 299c554dcc 2.5.6-1 (#255)
Add LL128 Protocol.

Rewrite the topology detection and tree/ring creation (#179). Improve
tree performance by sending/receiving from different GPUs. Add
model-based tuning to switch between the different algorithms and
protocols.

Rework P2P/SHM detection in containers (#155, #248).

Detect duplicated devices and return an error (#231).

Add tuning for GCP
2019-11-19 14:57:39 -08:00
Wenkai Du 9a70ee2eb1 Merge pull request #154 from wenkaidu/bf16
Add bfloat16 support in RCCL
2019-11-19 09:07:51 -08:00
Wenkai Du 4ca05c1297 Support bfloat16 on rest of the unit tests 2019-11-18 14:18:34 -08:00
Wenkai Du bdac0256a5 Add bfloat16 all reduce unit test 2019-11-18 13:50:29 -08:00
Wenkai Du 5e109ed400 Add bfloat16 support in RCCL
Preprocessor symbol RCCL_BFLOAT16 is used as feature indicator
2019-11-18 13:45:53 -08:00
Wenkai Du 58a6e535f6 Merge pull request #153 from wenkaidu/fuji
Temporary disable 0x803 target due to build error
2019-11-14 11:46:21 -08:00
Wenkai Du cd7ab1425b Temporary disable 0x803 target due to build error 2019-11-14 11:17:41 -08:00
Wenkai Du 55c07e4fb7 Merge pull request #151 from wenkaidu/prim_test
rccl_prim_test: Generalize ring topology and duplications
2019-11-13 08:17:55 -08:00
Siu Chi Chan 453c735475 Merge pull request #152 from scchan/bump_hcc_version_check_32
Bump up HCC version for -hc-function-calls switch
2019-11-13 10:45:40 -05:00
Siu Chi Chan 08ba92f1b0 Bump up HCC version for -hc-function-calls switch 2019-11-12 14:16:35 -05:00
Wenkai Du 07bb6fce8f rccl_prim_test: Generalize ring topology and duplications
Allow user specified ring topology from command line and duplicated
to requested number of workgroups:
./rccl_prim_test -w 12 -p copy -r "0 1 2 3|3 2 1 0|0 2 1 3|3 1 2 0|0 2 3 1|1 3 2 0"
2019-11-11 15:42:24 -08:00
Wenkai Du 277c72a638 Merge pull request #149 from wenkaidu/rtc
Correct RTC frequencies for profiling purpose
2019-11-06 08:02:58 -08:00
gilbertlee-amd fd94f4fa25 Adding interactive mode for profiling purposes (#150) 2019-11-05 17:10:16 -07:00
Wenkai Du 8995047830 Correct RTC frequencies for profiling purpose 2019-11-05 11:36:45 -08:00
Wenkai Du c49de785d2 Merge pull request #148 from wenkaidu/fine_grain
Check for fine grain support using memory allocation
2019-11-04 10:19:07 -08:00
Wenkai Du 669f1951a4 Check for fine grain support using memory allocation 2019-11-01 15:58:49 -07:00
Wenkai Du 90b2921207 Merge pull request #145 from wenkaidu/prim_test
rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup
2019-11-01 13:30:01 -07:00
gilbertlee-amd 2f9edd2432 Single Sync Timing mode (#144)
* Adding single sync timing mode to emulate timing reported by rccl-prim-test / rccl-tests
* Adding duration / overhead info
2019-11-01 10:18:25 -06:00
Jeff Daily 5a502955c9 additional check for fine grain support in p2pCanConnect (#146) 2019-10-31 08:58:38 -07:00
Wenkai Du ab91cdd5c9 rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup 2019-10-30 13:15:02 -07:00
Gilbert Lee 648c1ee7cc Adding ability to switch between fine/coarse grain destination GPU memory
Adding ability to switch between memset/memcpy
2019-10-29 12:00:32 -06:00
Wenkai Du 9be7ae8f0d Merge pull request #140 from scchan/rocm210_hc_function_calls
add -hc-function-calls switch back for HCC ROCm 2.10
2019-10-28 09:56:47 -07:00
mhbliao d89734234a Merge pull request #142 from mhbliao/hliao/master/cmake
[cmake] Allow GPU targets to be parameterized with `AMDGPU_TARGETS`.
2019-10-28 08:33:30 -04:00
Michael LIAO ec10a5cf14 [cmake] Allow GPU targets to be parameterized with AMDGPU_TARGETS. 2019-10-25 13:55:27 -04:00
Wenkai Du b98d334114 Merge pull request #141 from wenkaidu/hdp
Disable HDP flush for RDMA
2019-10-24 16:26:01 -07:00
Wenkai Du 296176a4fd Disable HDP flush for RDMA 2019-10-23 14:40:17 -07:00
Siu Chi Chan d779eae1d0 add -hc-function-calls switch back for HCC ROCm 2.10 2019-10-21 18:00:02 -04:00
Wenkai Du 998ab83675 Merge pull request #138 from wenkaidu/slice_steps
Revert collective chunk and slice steps to avoid drop in throughput
2019-10-18 13:30:27 -07:00
Wenkai Du df74d12946 Revert collective chunk and slice steps to avoid drop in throughput 2019-10-18 12:54:00 -07:00
saadrahim a95529a6e2 CI Re-enabled for Ubuntu (#135) 2019-10-18 11:38:51 -06:00
gilbertlee-amd 60279867b3 Merge pull request #137 from gilbertlee-amd/GenericOpFix
Fix for GenericOp device primitive bug
2019-10-11 10:46:29 -06:00