Commit Graph

25 Commits

Author SHA1 Message Date
Gilbert Lee e5074ce94d Changing single sync mode to time all iterations instead of just last 2019-12-20 17:08:39 -08:00
gilbertlee-amd 2f4269d06d Adding new sleep after sync capability for data fabric profiling (#162)
Fixing missing header include for ROCM 3.0 changes
2019-12-12 15:20:54 -07:00
Wenkai Du 07bb6fce8f rccl_prim_test: Generalize ring topology and duplications
Allow user specified ring topology from command line and duplicated
to requested number of workgroups:
./rccl_prim_test -w 12 -p copy -r "0 1 2 3|3 2 1 0|0 2 1 3|3 1 2 0|0 2 3 1|1 3 2 0"
2019-11-11 15:42:24 -08:00
Wenkai Du 277c72a638 Merge pull request #149 from wenkaidu/rtc
Correct RTC frequencies for profiling purpose
2019-11-06 08:02:58 -08:00
gilbertlee-amd fd94f4fa25 Adding interactive mode for profiling purposes (#150) 2019-11-05 17:10:16 -07:00
Wenkai Du 8995047830 Correct RTC frequencies for profiling purpose 2019-11-05 11:36:45 -08:00
Wenkai Du 90b2921207 Merge pull request #145 from wenkaidu/prim_test
rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup
2019-11-01 13:30:01 -07:00
gilbertlee-amd 2f9edd2432 Single Sync Timing mode (#144)
* Adding single sync timing mode to emulate timing reported by rccl-prim-test / rccl-tests
* Adding duration / overhead info
2019-11-01 10:18:25 -06:00
Wenkai Du ab91cdd5c9 rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup 2019-10-30 13:15:02 -07:00
Gilbert Lee 648c1ee7cc Adding ability to switch between fine/coarse grain destination GPU memory
Adding ability to switch between memset/memcpy
2019-10-29 12:00:32 -06:00
rohit pathania a270ee080e Read operation throughput 2019-09-03 14:58:40 +05:30
rohit pathania e5b13d69e5 display each workgroup, links and directions with throughputs 2019-08-30 13:28:23 +05:30
rpathani 40e30b5168 Update rccl_prim_test.cpp 2019-08-19 12:44:11 +05:30
rpathani deea20d49c Merge branch 'master' into xgmi_bench 2019-08-16 10:56:56 +05:30
Wenkai Du f11c8f60cd RCCL 2.4 update 2019-08-14 10:42:35 -07:00
rohit pathania 65e2f5d87b Modified the code to use RTC clock frequency based on gpu gcn id 2019-08-14 12:55:12 +05:30
rohit pathania 0f74929dab Merge branch 'xgmi_bench' of https://github.com/rpathani/rccl into xgmi_bench
# Conflicts:
#	tools/rccl-prim-test/rccl_prim_test.cpp
2019-08-13 11:36:56 +05:30
rohit pathania 3bbf924ff8 Adding linkinfo and srcGPU to destGPU info 2019-08-13 11:28:50 +05:30
rohit pathania 5a2f74b8d0 Adding linkinfo and srcGPU to destGPU info 2019-08-09 12:44:06 +05:30
gilbertlee-amd b8cf48fc16 Adding TransferBench tool (#113)
* Adding standalone TransferBench tool
2019-08-07 17:21:41 -06:00
Wenkai Du 70804da15b Refactor primitive test to support multiple GPUs in rings (#94)
* Refactor primitive test to support multiple GPUs in rings

* Make GPUs sync before transfer optional

* Use same ring format as RCCL

* Extend to 8 GPUs and report errors if there is no P2P access

* Control GPUs sync before ops from command line with "-s" option

* Change buffer size through command line option "-n"

Rename iterations command line option to "-i"
2019-07-05 14:29:20 -07:00
Wenkai Du e6a0da444f Match primitives unroll counts with latest RCCL (#91) 2019-06-26 15:09:13 -07:00
Wenkai Du ee14676064 Calculate and print kernel throughput (#78)
* rccl-prim-test: print GPU info and set iterations

* Calculate and print kernel throughput
2019-06-07 10:39:30 -07:00
Wenkai Du 42b488507d rccl-prim-test: print GPU info and set iterations (#77) 2019-06-05 15:16:33 -07:00
Wenkai Du 1bb6d2104c Add RCCL primitive testing (#70) 2019-05-23 16:52:17 -06:00