Gilbert Lee
e5074ce94d
Changing single sync mode to time all iterations instead of just the last
2019-12-20 17:08:39 -08:00
gilbertlee-amd
2f4269d06d
Adding new sleep-after-sync capability for data fabric profiling (#162)
...
Fixing missing header include for ROCm 3.0 changes
2019-12-12 15:20:54 -07:00
Wenkai Du
07bb6fce8f
rccl_prim_test: Generalize ring topology and duplications
...
Allow a user-specified ring topology from the command line, duplicated
to the requested number of workgroups:
./rccl_prim_test -w 12 -p copy -r "0 1 2 3|3 2 1 0|0 2 1 3|3 1 2 0|0 2 3 1|1 3 2 0"
2019-11-11 15:42:24 -08:00
Wenkai Du
277c72a638
Merge pull request #149 from wenkaidu/rtc
...
Correct RTC frequencies for profiling purpose
2019-11-06 08:02:58 -08:00
gilbertlee-amd
fd94f4fa25
Adding interactive mode for profiling purposes (#150)
2019-11-05 17:10:16 -07:00
Wenkai Du
8995047830
Correct RTC frequencies for profiling purpose
2019-11-05 11:36:45 -08:00
Wenkai Du
90b2921207
Merge pull request #145 from wenkaidu/prim_test
...
rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup
2019-11-01 13:30:01 -07:00
gilbertlee-amd
2f9edd2432
Single Sync Timing mode (#144)
...
* Adding single sync timing mode to emulate timing reported by rccl-prim-test / rccl-tests
* Adding duration / overhead info
2019-11-01 10:18:25 -06:00
Wenkai Du
ab91cdd5c9
rccl-prim-test: use hipExtLaunchMultiKernelMultiDevice and minor cleanup
2019-10-30 13:15:02 -07:00
Gilbert Lee
648c1ee7cc
Adding ability to switch between fine/coarse grain destination GPU memory
...
Adding ability to switch between memset/memcpy
2019-10-29 12:00:32 -06:00
rohit pathania
a270ee080e
Read operation throughput
2019-09-03 14:58:40 +05:30
rohit pathania
e5b13d69e5
Display each workgroup, links, and directions with throughputs
2019-08-30 13:28:23 +05:30
rpathani
40e30b5168
Update rccl_prim_test.cpp
2019-08-19 12:44:11 +05:30
rpathani
deea20d49c
Merge branch 'master' into xgmi_bench
2019-08-16 10:56:56 +05:30
Wenkai Du
f11c8f60cd
RCCL 2.4 update
2019-08-14 10:42:35 -07:00
rohit pathania
65e2f5d87b
Modified the code to use RTC clock frequency based on GPU GCN ID
2019-08-14 12:55:12 +05:30
rohit pathania
0f74929dab
Merge branch 'xgmi_bench' of https://github.com/rpathani/rccl into xgmi_bench
...
# Conflicts:
# tools/rccl-prim-test/rccl_prim_test.cpp
2019-08-13 11:36:56 +05:30
rohit pathania
3bbf924ff8
Adding link info and srcGPU-to-destGPU info
2019-08-13 11:28:50 +05:30
rohit pathania
5a2f74b8d0
Adding link info and srcGPU-to-destGPU info
2019-08-09 12:44:06 +05:30
gilbertlee-amd
b8cf48fc16
Adding TransferBench tool (#113)
...
* Adding standalone TransferBench tool
2019-08-07 17:21:41 -06:00
Wenkai Du
70804da15b
Refactor primitive test to support multiple GPUs in rings (#94)
...
* Refactor primitive test to support multiple GPUs in rings
* Make GPU sync before transfer optional
* Use same ring format as RCCL
* Extend to 8 GPUs and report errors if there is no P2P access
* Control GPU sync before ops from the command line with the "-s" option
* Change buffer size through the command line option "-n"
* Rename the iterations command line option to "-i"
2019-07-05 14:29:20 -07:00
Wenkai Du
e6a0da444f
Match primitives unroll counts with latest RCCL (#91)
2019-06-26 15:09:13 -07:00
Wenkai Du
ee14676064
Calculate and print kernel throughput (#78)
...
* rccl-prim-test: print GPU info and set iterations
* Calculate and print kernel throughput
2019-06-07 10:39:30 -07:00
Wenkai Du
42b488507d
rccl-prim-test: print GPU info and set iterations (#77)
2019-06-05 15:16:33 -07:00
Wenkai Du
1bb6d2104c
Add RCCL primitive testing (#70)
2019-05-23 16:52:17 -06:00