rocm-systems

Author	SHA1	Message	Date
Anatolii Rozanov	d0c8380650	Add host API for _on_stream operations (#340) Add functional test for barrier_all_on_stream * Add rocshmem_barrier_all_on_stream support for GDA and RO backends Implements rocshmem_barrier_all_on_stream operation for GPU Direct Access and Reverse Offload backends. Previously, rocshmem_barrier_all_on_stream was only supported for the IPC backend. * Add functional test for rocshmem_broadcastmem_on_stream * Add host-side rocshmem_broadcastmem_on_stream API Implement stream-based broadcast collective operation - Add rocshmem_broadcastmem_on_stream host API and kernel implementation - Add functional test TeamBroadcastmemOnStreamTester with multi-stream support and correctness verification - Use per-workgroup contexts to avoid contention across parallel streams API: rocshmem_broadcastmem_on_stream(team, dest, source, nelems, pe_root, stream) * Add functional test for rocshmem_getmem_on_stream * Add host-side rocshmem_getmem_on_stream API Implement stream-based point-to-point RMA get operation - Add rocshmem_getmem_on_stream host API and kernel implementation - Support for asynchronous getmem operations on HIP streams - Add backend support for GDA, RO, and IPC contexts - Use work-group collective getmem for efficient memory transfer API: rocshmem_getmem_on_stream(dest, source, nelems, pe, stream) (AI Assist) * Add host-side rocshmem_putmem_on_stream API - Add rocshmem_putmem_on_stream for asynchronous remote writes - Support for concurrent RMA operations on HIP streams - Add backend support for GDA, RO, and IPC contexts - Use work-group device collective operation API: rocshmem_putmem_on_stream(dest, source, bytes, pe, stream) (AI Assist) * Add functional test for rocshmem_putmem_on_stream * Add host-side rocshmem_putmem_signal_on_stream API Enables asynchronous putmem operations with signaling on HIP streams. The implementation includes: - Kernel wrapper rocshmem_putmem_signal_kernel - Host interface putmem_signal_on_stream method - Context layer support across all backends (IPC, GDA, RO) - Public API Function signature: void rocshmem_putmem_signal_on_stream(void *dest, const void *source, size_t bytes, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe, hipStream_t stream); Add functional test for rocshmem_putmem_signal_on_stream * Add host-side rocshmem_signal_wait_until_on_stream API Enables asynchronous signal wait operations on HIP streams. The implementation includes: - Kernel wrapper rocshmem_signal_wait_until_kernel - Host interface signal_wait_until_on_stream method - Context layer support across all backends (IPC, GDA, RO) - Native uint64_t support in wait_until API (generated from P2P_SYNC.py) Function signature: void rocshmem_signal_wait_until_on_stream(uint64_t *sig_addr, int cmp, uint64_t cmp_value, hipStream_t stream); (AI Assist) Add functional test for rocshmem_signal_wait_until_on_stream * Add documentation for stream API functions This commit adds API documentation for the following host-side stream functions: - rocshmem_barrier_all_on_stream (collective routines) - rocshmem_broadcastmem_on_stream (collective routines) - rocshmem_getmem_on_stream (RMA operations) - rocshmem_putmem_on_stream (RMA operations) - rocshmem_putmem_signal_on_stream (signaling operations) - rocshmem_signal_wait_until_on_stream (point-to-point sync) The documentation includes function signatures, parameter descriptions, and detailed explanations of asynchronous behavior and stream handling. (AI Assist) * Rename "bytes" -> "nelems" * Add "_TEST_" to the variables used in tests * Remove incorrect hipStreamDefault usage: hipStreamDefault is a stream-creation flag, not the default stream. If stream == nullptr, pass it directly to the kernel launch, which then runs on the default stream.	2025-12-09 08:55:46 -06:00
Yiltan	ecd4c9f561	Remove unused fence policy (#348)	2025-12-08 14:06:53 -05:00
Anatolii Rozanov	5577feb70d	Add host API for alltoallmem_on_stream collective operation (#333) * Add host-side rocshmem_alltoallmem_on_stream function Function signature: rocshmem_alltoallmem_on_stream(rocshmem_team_t team, void *dest, const void *source, size_t size, hipStream_t stream) - The function launches rocshmem_alltoallmem_kernel, which calls the device-side alltoall<char> workgroup collective through the default context. - Uses dynamic block size determination via the occupancy API. - Implemented for all backends. * Fix incorrect sync buffer size allocation for alltoall in GDA and IPC backends When allocating memory for alltoall_pSync_pool in the setup_teams() and teams_init() functions, the code incorrectly used ROCSHMEM_BCAST_SYNC_SIZE instead of ROCSHMEM_ALLTOALL_SYNC_SIZE. * Add functional test for team_alltoallmem_on_stream This commit adds a new functional test to verify the correctness of the host-side rocshmem_team_alltoallmem_on_stream API. * Add documentation for rocshmem_alltoallmem_on_stream This commit adds API documentation for the host-side rocshmem_alltoallmem_on_stream function in the collective routines section. The documentation includes:	2025-12-03 08:40:24 -05:00
Dimple Prajapati	a44b581997	Add host API for enqueuing barrier on given stream (#274) * add host API for enqueuing barrier on given stream	2025-10-15 14:29:07 -07:00
Edgar Gabriel	a1269e3db5	allow all three backends to co-exist in a single build (#270) * add support for compiling all backends, including the logic to select backends either based on user requests or through heuristics * checkpoint for compiling all backends * final checkpoint: all tests seem to pass when compiling all three backends simultaneously and forcing the use of any of the three backends. * update PR to new envvar system	2025-10-07 10:49:20 -05:00
Dimple Prajapati	87f99e7ec6	Add host APIs for querying device ctx and remote heap pointer (#200) * Add host APIs for querying device ctx and remote heap pointer * Host API to query the device pointer for ROCSHMEM_DEFAULT_CONTEXT; this is needed to support dynamic module initialization via device kernel library bitcode. * Host API to query the remote symmetric heap pointer, which can be used in custom device kernels for RMA operations. * Added rocshmem_ptr implementation within the Host Context class * Enables pointer retrieval functionality for symmetric data objects * Copy IPC pointers to host memory in RO host context --------- Co-authored-by: avinashkethineedi <avinash.kethineedi@amd.com>	2025-07-24 11:03:03 -07:00
Aurelien Bouteiller	63a79892b2	rocshmem_config.h has a different include path when installed and in the build directory (#186) * rocshmem_config.h needs to be in a similar directory structure for includes to work both when building testers in the build directory and when using an installed library * Do not change installed rocshmem.hpp	2025-07-02 16:51:38 -04:00
Avinash Kethineedi	f6ef19f5a9	Add SPDX license identifiers and update copyright headers (#85) * Update copyright information and add SPDX license identifier * Update AUTHORS * Remove `sos_tests`	2025-04-15 15:37:53 -05:00
avinashkethineedi	6486e29078	Rename config.h to roc_shmem_config.h	2024-12-06 01:08:13 +00:00
Brandon Potter	770890a107	Remove SpinEBOBlockMutex usage and unit tests	2024-07-11 10:12:19 -07:00
Brandon Potter	ea8f264a11	Transfer files from RAD repository	2024-07-01 09:57:08 -05:00

11 commits