Add host API for alltoallmem_on_stream collective operation (#333)
* Add host-side rocshmem_alltoallmem_on_stream function
Function signature:
    rocshmem_alltoallmem_on_stream(rocshmem_team_t team, void *dest,
                                   const void *source, size_t size,
                                   hipStream_t stream)
- The function launches rocshmem_alltoallmem_kernel, which calls the
device-side alltoall<char> workgroup collective through the default context.
- Determines the block size dynamically via the occupancy API.
- Implemented for all backends.
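The commit message does not describe the data movement itself. As a rough host-only sketch, assuming the usual OpenSHMEM alltoallmem semantics (each PE holds one block of `size` bytes per peer, and block j of PE i's source lands in block i of PE j's dest), the expected result can be modelled without any GPU or rocSHMEM calls. The names `Buffer` and `alltoallmem_reference` are hypothetical and are not part of rocSHMEM:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-only model; not rocSHMEM code.
using Buffer = std::vector<char>;

// source[i] is PE i's source buffer: npes blocks of `size` bytes each.
// Returns dest, where block i of dest[j] is a copy of block j of source[i]
// (assumed OpenSHMEM-style alltoallmem semantics).
std::vector<Buffer> alltoallmem_reference(const std::vector<Buffer>& source,
                                          std::size_t size) {
  const std::size_t npes = source.size();
  std::vector<Buffer> dest(npes, Buffer(npes * size));
  if (size == 0) {
    return dest;  // nothing to copy
  }
  for (std::size_t i = 0; i < npes; ++i) {
    // Precondition: every PE provides exactly one block per peer.
    assert(source[i].size() == npes * size);
    for (std::size_t j = 0; j < npes; ++j) {
      std::memcpy(dest[j].data() + i * size,
                  source[i].data() + j * size, size);
    }
  }
  return dest;
}
```

A functional test could compare the device result on each PE against a model like this one; whether the test added in this commit does so is not stated here.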
* Fix incorrect sync buffer size allocation for alltoall in GDA and IPC backends
When allocating memory for alltoall_pSync_pool in setup_teams() and
teams_init() functions, the code incorrectly used ROCSHMEM_BCAST_SYNC_SIZE
instead of ROCSHMEM_ALLTOALL_SYNC_SIZE.
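To illustrate why the wrong constant matters, here is a small sketch of the sizing arithmetic. The numeric values, the per-team pool layout, and the helper names `pool_slots` and `shortfall` are all assumptions made for illustration; the real constants are defined by rocSHMEM, and the sketch further assumes the alltoall value is the larger of the two:

```cpp
#include <cstddef>

// Hypothetical values, for illustration only; not the real rocSHMEM constants.
constexpr std::size_t kBcastSyncSize = 2;
constexpr std::size_t kAlltoallSyncSize = 8;

// Sync slots a pool needs if each of max_teams teams owns
// per_team_slots slots (assumed layout).
constexpr std::size_t pool_slots(std::size_t per_team_slots,
                                 std::size_t max_teams) {
  return per_team_slots * max_teams;
}

// Slots missing when a pool was allocated smaller than required.
constexpr std::size_t shortfall(std::size_t allocated, std::size_t required) {
  return required > allocated ? required - allocated : 0;
}

// With 4 teams, sizing the alltoall pool with the bcast constant
// leaves it 24 slots short under these assumed values.
static_assert(shortfall(pool_slots(kBcastSyncSize, 4),
                        pool_slots(kAlltoallSyncSize, 4)) == 24,
              "under-allocation example");
```

Under these assumptions, any collective indexing past the allocated slots would touch memory outside the pool, which is the kind of error the fix avoids.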
* Add functional test for team_alltoallmem_on_stream
This commit adds a new functional test to verify the correctness of
the host-side rocshmem_team_alltoallmem_on_stream API.
* Add documentation for rocshmem_alltoallmem_on_stream
This commit adds API documentation for the host-side
rocshmem_alltoallmem_on_stream function in the collective routines
section. The documentation includes:
[ROCm/rocshmem commit: 5577feb70d]
committed by GitHub
parent 0f32739b52
commit 4b04b540bf
@@ -109,6 +109,7 @@ declare -A TEST_NUMBERS=(
 ["teamctxsingleinfra"]="73"
 ["teamctxblockinfra"]="74"
 ["teamctxoddeveninfra"]="75"
+["alltoallmem_on_stream"]="76"
 )

 ExecTest() {
@@ -428,6 +429,8 @@ TestColl() {
 ExecTest "fcollect" 2 1 1 32768

 ExecTest "teamreduction" 2 1 1 32768
+
+ExecTest "alltoallmem_on_stream" 2 1 1 32768
 }

 TestOther() {