ebc823e603
For gfx908, support simple detection of ring topology. Call ReduceOrCopyMulti directly from the kernel. Also simplify the code by removing the kernel start synchronization option, which has no effect on throughput measurements.