Commit graph

40 commits

Author SHA1 Message Date
Saad Rahim 33c23fdcda Merge remote-tracking branch 'upstream/master' into develop 2020-04-29 16:12:37 -07:00
Wenkai Du 5743c6b7d2 topo_expl: fix build error 2020-04-27 17:17:05 +00:00
Gilbert Lee 339bf9ff19 Adding option to re-use streams instead of re-creating per topology 2020-04-23 15:53:40 +00:00
Wenkai Du ef7064ba9b rccl-prim-test: auto-detect rings in 4P and 8P configurations 2020-04-10 18:17:21 +00:00
Aaron Enye Shi a95090d981 Fix HIP-Clang build with HSA headers
HIP-Clang does not include these HSA headers, and they need to be explicitly added in RCCL.
2020-04-03 17:58:23 -04:00
Wenkai Du 6f54b23503 topo_expl: update to 2.6 2020-04-01 13:37:08 -07:00
Wenkai Du ebc823e603 rccl-prim-test: add all-to-all benchmark (#185)
For gfx908, support simple detection of ring topology.
Call ReduceOrCopyMulti directly from kernel.
Also simplify code by removing kernel start synchronization option
which has no effect on throughput measurements.
2020-03-16 10:00:54 -07:00
Wenkai Du 32388d60a9 topo_expl: add a few more single node models 2020-03-02 11:43:03 -08:00
Wenkai Du 498d5029ad Add topology visualizer tool 2020-02-26 15:23:34 -08:00
Wenkai Du 934b6de557 topo_expl: use bandwidth numbers defined in graph in CPU models 2020-02-26 14:17:36 -08:00
Wenkai Du d2adc61bf6 Revise PCI BW numbers on Rome 2020-02-26 13:17:49 -08:00
Wenkai Du 55f8e2dec7 Add topology explorer 2020-02-19 14:42:06 -08:00
Stanley Tsang 20fa04d9b6 Updating copyright notices for 2020. 2020-01-29 15:28:08 -08:00
Wenkai Du fe6d012eb0 Merge remote-tracking branch 'remotes/rccl/master' into rccl_2.5.6_cleanup 2020-01-29 15:28:03 -08:00
Wenkai Du 1e55645d97 Misc fixes and improvements for 2.5.6
1. Fix RCCL unit test
2. Add ROME detection and tuning
3. Change default P2P level
4. Fix search algorithm for XGMI
5. Remove explicit channel duplication with implicit by using half of link speed
6. Add collective trace support
7. Correct Intel Skylake CPU detection and bandwidth
8. Fix topo connect function
9. Disable GDR read and remove unreachable code
10. Disable LL128 kernels
11. Add tuning parameters
12. Use original clock64() implementation which returns RTC counter value
13. Print out timestamp of collective trace
14. Do not use struct ncclColl in kernel launch parameter
15. Fix abort handling and add tracing
17. Add __launch_bounds__ to kernel functions
18. Remove unused abortCount
19. Unset default MIN_NRINGS and MIN_NCHANNELS
20. Do not allocate shared memory when not using LL128 kernels
21. Correct time print out in tuning log
2020-01-29 15:27:05 -08:00
Gilbert Lee e5074ce94d Changing single sync mode to time all iterations instead of just last 2019-12-20 17:08:39 -08:00
gilbertlee-amd 2f4269d06d Adding new sleep after sync capability for data fabric profiling (#162)
Fixing missing header include for ROCM 3.0 changes
2019-12-12 15:20:54 -07:00
Wenkai Du 07bb6fce8f rccl_prim_test: Generalize ring topology and duplications
Allow user specified ring topology from command line and duplicated
to requested number of workgroups:
./rccl_prim_test -w 12 -p copy -r "0 1 2 3|3 2 1 0|0 2 1 3|3 1 2 0|0 2 3 1|1 3 2 0"
2019-11-11 15:42:24 -08:00
Wenkai Du 277c72a638 Merge pull request #149 from wenkaidu/rtc
Correct RTC frequencies for profiling purpose
2019-11-06 08:02:58 -08:00
gilbertlee-amd fd94f4fa25 Adding interactive mode for profiling purposes (#150) 2019-11-05 17:10:16 -07:00
Wenkai Du 8995047830 Correct RTC frequencies for profiling purpose 2019-11-05 11:36:45 -08:00
Wenkai Du 90b2921207 Merge pull request #145 from wenkaidu/prim_test
rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup
2019-11-01 13:30:01 -07:00
gilbertlee-amd 2f9edd2432 Single Sync Timing mode (#144)
* Adding single sync timing mode to emulate timing reported by rccl-prim-test / rccl-tests
* Adding duration / overhead info
2019-11-01 10:18:25 -06:00
Wenkai Du ab91cdd5c9 rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup 2019-10-30 13:15:02 -07:00
Gilbert Lee 648c1ee7cc Adding ability to switch between fine/coarse grain destination GPU memory
Adding ability to switch between memset/memcpy
2019-10-29 12:00:32 -06:00
rohit pathania a270ee080e Read operation throughput 2019-09-03 14:58:40 +05:30
rohit pathania e5b13d69e5 display each workgroup ,links and directions with throughputs 2019-08-30 13:28:23 +05:30
rpathani 40e30b5168 Update rccl_prim_test.cpp 2019-08-19 12:44:11 +05:30
rpathani deea20d49c Merge branch 'master' into xgmi_bench 2019-08-16 10:56:56 +05:30
Wenkai Du f11c8f60cd RCCL 2.4 update 2019-08-14 10:42:35 -07:00
rohit pathania 65e2f5d87b Modified the code to use RTC clock frequency based on gpu gcn id 2019-08-14 12:55:12 +05:30
rohit pathania 0f74929dab Merge branch 'xgmi_bench' of https://github.com/rpathani/rccl into xgmi_bench
# Conflicts:
#	tools/rccl-prim-test/rccl_prim_test.cpp
2019-08-13 11:36:56 +05:30
rohit pathania 3bbf924ff8 Adding linkinfo and srcGPU to destGPU info 2019-08-13 11:28:50 +05:30
rohit pathania 5a2f74b8d0 Adding linkinfo and srcGPU to destGPU info 2019-08-09 12:44:06 +05:30
gilbertlee-amd b8cf48fc16 Adding TransferBench tool (#113)
* Adding standalone TransferBench tool
2019-08-07 17:21:41 -06:00
Wenkai Du 70804da15b Refactor primitive test to support multiple GPUs in rings (#94)
* Refactor primitive test to support multiple GPUs in rings

* Make GPUs sync before transfer optional

* Use same ring format as RCCL

* Extend to 8 GPUs and report errors if there is no P2P access

* Control GPUs sync before ops from command line with "-s" option

* Change buffer size through command line option "-n"

Rename iterations command line option to "-i"
2019-07-05 14:29:20 -07:00
Wenkai Du e6a0da444f Match primitives unroll counts with latest RCCL (#91) 2019-06-26 15:09:13 -07:00
Wenkai Du ee14676064 Calculate and print kernel throughput (#78)
* rccl-prim-test: print GPU info and set iterations

* Calculate and print kernel throughput
2019-06-07 10:39:30 -07:00
Wenkai Du 42b488507d rccl-prim-test: print GPU info and set iterations (#77) 2019-06-05 15:16:33 -07:00
Wenkai Du 1bb6d2104c Add RCCL primitive testing (#70) 2019-05-23 16:52:17 -06:00