5577feb70d
* Add host-side rocshmem_alltoallmem_on_stream function
Function signature:
  rocshmem_alltoallmem_on_stream(rocshmem_team_t team, void *dest,
                                 const void *source, size_t size,
                                 hipStream_t stream)
- The function launches rocshmem_alltoallmem_kernel, which calls the
device-side alltoall<char> workgroup collective through the default context.
- The block size is determined dynamically via the occupancy API.
- Implemented for all backends.
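For readers unfamiliar with the collective, the data movement can be sketched on the host with no GPU involved. This is a minimal model, assuming the usual OpenSHMEM alltoallmem semantics in which `size` is the number of bytes each PE exchanges with every other PE. `simulate_alltoallmem` is a hypothetical helper written for illustration and is not part of rocSHMEM; the nested vectors merely stand in for each PE's symmetric buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-only model of an all-to-all byte exchange.
// Not part of rocSHMEM: source[i] and dest[i] stand in for PE i's
// buffers, each holding npes blocks of `size` bytes.
void simulate_alltoallmem(std::vector<std::vector<char>> &dest,
                          const std::vector<std::vector<char>> &source,
                          std::size_t size) {
  const std::size_t npes = source.size();
  assert(dest.size() == npes);
  for (std::size_t src_pe = 0; src_pe < npes; ++src_pe) {
    assert(source[src_pe].size() >= npes * size);
    for (std::size_t dst_pe = 0; dst_pe < npes; ++dst_pe) {
      assert(dest[dst_pe].size() >= npes * size);
      // Block dst_pe of src_pe's source lands in block src_pe of
      // dst_pe's dest.
      std::memcpy(dest[dst_pe].data() + src_pe * size,
                  source[src_pe].data() + dst_pe * size, size);
    }
  }
}
```

Under this model every buffer must hold at least npes * size bytes, which is the sizing a caller of the real API would presumably also need for `dest` and `source`.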
* Fix incorrect sync buffer size allocation for alltoall in GDA and IPC backends
When allocating memory for alltoall_pSync_pool in setup_teams() and
teams_init() functions, the code incorrectly used ROCSHMEM_BCAST_SYNC_SIZE
instead of ROCSHMEM_ALLTOALL_SYNC_SIZE.
* Add functional test for team_alltoallmem_on_stream
This commit adds a new functional test to verify the correctness of
the host-side rocshmem_alltoallmem_on_stream API.
* Add documentation for rocshmem_alltoallmem_on_stream
This commit adds API documentation for the host-side
rocshmem_alltoallmem_on_stream function in the collective routines
section. The documentation includes:
Building the rocSHMEM documentation
macOS
To build the HTML documentation locally:
brew install doxygen sphinx-doc
pip3.10 install -r ./requirements.txt
python3.10 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
open _build/html/index.html
Building the PDF documentation requires a LaTeX installation. Once LaTeX is installed, run the following:
pip3.10 install -r ./requirements.txt
sphinx-build -M latexpdf . _build
open _build/latex/rocshmem.pdf