Edgar Gabriel 2836240906 fix odd-case allreduce scenarios
If the number of elements used in the allreduce operation is not an
exact multiple of the work-array buffer size and the number of PEs, we
need to adjust the algorithm to:
 - initially perform a ring_allreduce on n_segments * chunk_size
   elements (where n_segments is the integer division of the number of
   elements by the work-buffer size, so this pass will not cover the
   entire buffer)
 - perform another ring_allreduce where chunk_size is reduced to match
   the remaining elements
 - if the remaining elements from the previous step cannot be evenly
   divided by the number of PEs, perform a direct_allreduce on the
   outstanding elements.
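
The three steps above can be sketched as a small element-count
calculation. This is only an illustration, not the code from this
commit: AllreducePlan and plan_allreduce are hypothetical names, and the
sketch assumes that chunk_size equals the work-buffer size in the first
pass and that the second ring pass covers the largest multiple of the
number of PEs that fits into the remainder.

```cpp
#include <cstddef>

// Hypothetical helper (not part of rocSHMEM): number of elements that
// each of the three phases described above would handle.
struct AllreducePlan {
    std::size_t ring_full;    // phase 1: n_segments * chunk_size
    std::size_t ring_tail;    // phase 2: ring pass with reduced chunk_size
    std::size_t direct_tail;  // phase 3: leftover for direct_allreduce
};

// Assumption: chunk_size > 0 and n_pes > 0.
constexpr AllreducePlan plan_allreduce(std::size_t nelems,
                                       std::size_t chunk_size,
                                       std::size_t n_pes) {
    // Integer division: full chunks only, may not cover the whole buffer.
    std::size_t n_segments = nelems / chunk_size;
    std::size_t ring_full = n_segments * chunk_size;

    // Elements left over after the full-size ring passes.
    std::size_t remaining = nelems - ring_full;

    // Largest part of the remainder that divides evenly across the PEs.
    std::size_t ring_tail = (remaining / n_pes) * n_pes;

    // Whatever is still outstanding goes to the direct allreduce.
    std::size_t direct_tail = remaining - ring_tail;

    return AllreducePlan{ring_full, ring_tail, direct_tail};
}
```

For example, with 1003 elements, a chunk_size of 256 and 4 PEs, this
sketch assigns 768 elements to the first ring pass, 232 to the second,
and 3 to the direct allreduce.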


[ROCm/rocshmem commit: a4b4281f50]
2024-10-24 15:08:32 +00:00