Add a simple version of an allreduce algorithm as a starting point. [ROCm/rocshmem commit: ba21cb7b85]
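
For readers unfamiliar with the term, the following is a rough sketch of what a "simple" allreduce can look like: reduce everything at one root, then broadcast the result. It is not the code from this commit and uses no rocSHMEM calls; it simulates the PEs in a single process, and the name `naive_allreduce_sum` is hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT part of rocSHMEM: simulates a naive sum-allreduce
// in a single process. pe_buffers[i] stands in for the source buffer of PE i.
// Returns what each PE's destination buffer would hold afterwards.
std::vector<std::vector<int>> naive_allreduce_sum(
    const std::vector<std::vector<int>>& pe_buffers) {
    if (pe_buffers.empty()) return {};
    const std::size_t n_elems = pe_buffers[0].size();

    // Step 1 (reduce): a designated root accumulates every PE's contribution.
    std::vector<int> reduced(n_elems, 0);
    for (const auto& buf : pe_buffers) {
        assert(buf.size() == n_elems);  // all PEs must contribute equal-length buffers
        for (std::size_t i = 0; i < n_elems; ++i) reduced[i] += buf[i];
    }

    // Step 2 (broadcast): every PE receives a copy of the reduced vector.
    return std::vector<std::vector<int>>(pe_buffers.size(), reduced);
}
```

With three simulated PEs holding `{1, 2}`, `{3, 4}`, and `{5, 6}`, every PE ends up with `{9, 12}`. A reduce-then-broadcast scheme like this serializes work at the root, which is why it is usually treated as a baseline to improve on rather than a final design.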