e38f98fad5
* fix reduction for gfx942 and gfx1201
match the synchronization of internal_putmem_wg and internal_getmem_wg
to that of their non-internal counterparts. internal_putmem_wg is used
in the IPC reduction
* move specialization to internal_putmem
[ROCm/rocshmem commit: 8d2504d6c1]