6f512e92a5
- Use the reduce_psync buffers, not barrier_psync, for synchronization in allreduce.
- Execute a wwg barrier after the allreduce operation. Internal discussion determined that it is required for correctness.