4b04b540bf2745df04162af0dd9ed4643fa18954
* Add host-side rocshmem_alltoallmem_on_stream function
Function signature:
  rocshmem_alltoallmem_on_stream(rocshmem_team_t team, void *dest,
                                 const void *source, size_t size,
                                 hipStream_t stream)
- The function launches rocshmem_alltoallmem_kernel, which calls the
device-side alltoall<char> workgroup collective through the default context.
- The block size is determined dynamically via the occupancy API.
- Implemented for all backends.
* Fix incorrect sync buffer size allocation for alltoall in GDA and IPC backends
When allocating memory for alltoall_pSync_pool in the setup_teams() and
teams_init() functions, the code incorrectly used ROCSHMEM_BCAST_SYNC_SIZE
instead of ROCSHMEM_ALLTOALL_SYNC_SIZE.
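To illustrate why the wrong constant matters, here is a minimal host-only sketch of the sizing arithmetic. It assumes the pool holds one pSync array of `long` entries per team. The `EXAMPLE_*` constants and `sync_pool_bytes` are hypothetical names with placeholder values, not the values or helpers defined by rocSHMEM.

```cpp
#include <cstddef>

// Placeholder values for illustration only. The real
// ROCSHMEM_BCAST_SYNC_SIZE and ROCSHMEM_ALLTOALL_SYNC_SIZE come from the
// rocSHMEM headers and may differ from these.
constexpr std::size_t EXAMPLE_BCAST_SYNC_SIZE = 4;
constexpr std::size_t EXAMPLE_ALLTOALL_SYNC_SIZE = 16;

// Hypothetical helper: bytes needed for a pSync pool, assuming one array of
// `sync_size` long entries per team.
constexpr std::size_t sync_pool_bytes(std::size_t max_teams,
                                      std::size_t sync_size) {
  return max_teams * sync_size * sizeof(long);
}

// Under these placeholder values, sizing the alltoall pool with the
// broadcast constant would reserve less memory than the collective indexes.
static_assert(sync_pool_bytes(8, EXAMPLE_BCAST_SYNC_SIZE) <
                  sync_pool_bytes(8, EXAMPLE_ALLTOALL_SYNC_SIZE),
              "pool sized with the bcast constant is too small");
```

If the two constants happened to be equal, the mix-up would be harmless in practice, but the allocation would still be tied to the wrong collective.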
* Add functional test for team_alltoallmem_on_stream
This commit adds a new functional test to verify the correctness of
the host-side rocshmem_team_alltoallmem_on_stream API.
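A correctness check of this kind needs an expected result to compare against. The sketch below is a host-only reference model of the data movement, not the test added by this commit. It assumes OpenSHMEM-style alltoallmem semantics: each PE's source holds one block of `size` bytes per PE, and block j of PE i lands in block i of PE j's dest. `alltoallmem_reference` and `Buffer` are hypothetical names, and no GPU or rocSHMEM calls are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

using Buffer = std::vector<unsigned char>;

// Hypothetical reference model: sources[i] is the source buffer of PE i and
// must hold n_pes blocks of `size` bytes. Returns the expected dest buffer
// of every PE after the exchange.
std::vector<Buffer> alltoallmem_reference(const std::vector<Buffer>& sources,
                                          std::size_t size) {
  const std::size_t n_pes = sources.size();
  std::vector<Buffer> dests(n_pes, Buffer(n_pes * size));
  if (size == 0) {
    return dests;  // nothing to copy
  }
  for (std::size_t src_pe = 0; src_pe < n_pes; ++src_pe) {
    assert(sources[src_pe].size() == n_pes * size);
    for (std::size_t dst_pe = 0; dst_pe < n_pes; ++dst_pe) {
      // Block dst_pe of src_pe's source goes to block src_pe of dst_pe's dest.
      std::memcpy(dests[dst_pe].data() + src_pe * size,
                  sources[src_pe].data() + dst_pe * size, size);
    }
  }
  return dests;
}
```

A test could fill each PE's source with a pattern derived from its PE index, run the collective on a stream, synchronize the stream, and compare the device result with this model byte by byte.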
* Add documentation for rocshmem_alltoallmem_on_stream
This commit adds API documentation for the host-side
rocshmem_alltoallmem_on_stream function in the collective routines
section. The documentation includes:
[ROCm/rocshmem commit: 5577feb70d]