rocm-systems

Author	SHA1	Message	Date
Avinash Kethineedi	f6ef19f5a9	Add SPDX license identifiers and update copyright headers (#85) * Update copyright information and add SPDX license identifier * Update AUTHORS * Remove `sos_tests`	2025-04-15 15:37:53 -05:00
Avinash Kethineedi	05755847f5	Update backend to use provided MPI communicator during library initialization (#79) * Update backend to use provided MPI communicator during library initialization, default to `MPI_COMM_WORLD` * Update `rocshmem_my_pe` and `rocshmem_n_pes` host APIs - Return values from backend if initialized; otherwise, fall back to MPI_Singleton.	2025-04-14 09:18:57 -05:00
Avinash Kethineedi	dc61bca066	Update `Barrier` and `Sync` APIs (#73) * Add thread, wavefront, and workgroup-level `barrier` APIs in IPC and RO conduits; remove collectives on default context - Implemented `barrier` APIs for thread, wavefront, and workgroup scopes - Added support into both IPC and RO conduits - Added functional tests to cover all `barrier` APIs - Removed collective operations on default context * Add thread, wavefront, and workgroup-level `sync` APIs in IPC and RO conduits. - Implemented `sync` APIs for thread, wavefront, and workgroup scopes - Added support into both IPC and RO conduits - Added functional tests to cover all `sync` APIs * update naming convention for context-based `barrier` APIs	2025-04-08 11:25:31 -05:00
Avinash Kethineedi	c652f58cef	Update `Barrier_All` and `Sync_All` APIs (#72) * Fix deadlock in `rocshmem_ctx_wg_barrier_all` API in IPC conduit by adding per-context pSync buffers and context IDs - Added separate pSync buffers for each device context - Resolved deadlock when invoking barrier API (`rocshmem_ctx_wg_barrier_all`) concurrently from multiple contexts * Update barrier_all functional tests for multi-context support * Add thread, wavefront, and workgroup-level barrier_all APIs in IPC and RO conduits - Implemented barrier_all APIs at thread, wavefront, and workgroup granularity - Added support in both IPC and RO conduits - Updated functional tests to cover all `barrier_all` APIs * Add thread, wavefront, and workgroup-level sync_all APIs in IPC and RO conduits - Implemented sync_all APIs for thread, wavefront, and workgroup scopes - Added support into both IPC and RO conduits - Added functional tests to cover all `sync_all` APIs	2025-04-02 11:58:55 -05:00
Edgar Gabriel	bcbc42e78f	add rocshmem_barrier() (#61) * add team-barrier implementation add a team-barrier API and implementation in the IPC and RO conduit. Clean up some of the logic in the RO Conduit to distinguish between sync, sync_all, barrier, and barrier_all. * add team_barrier_tests to functional tests	2025-03-24 11:23:03 -05:00
Yiltan	658bf2a3b5	Removed GPU_IB (#59)	2025-03-24 09:04:52 -04:00
Avinash Kethineedi	eb5a38e806	Update(`DeviceProxy`): Dynamically Determine Memory Allocation Size & Remove Compile-Time size Calculations (#48) * Update(DeviceProxy): Dynamically Determine Memory Allocation Size & Remove Compile-Time size Calculations - Modified the Device proxy class to determine memory allocation size at runtime. - Updated all classes that include the Device proxy to use dynamic memory allocation. - Removed compile-time memory size calculations. - Ensured the allocated number of backend queue data structures matches the number of RO device contexts.	2025-02-24 15:11:46 -06:00
avinashkethineedi	c5b548c398	Fix `rocshmem_ctx_wg_team_sync` API - Updated `rocshmem_ctx_wg_team_sync` to utilize a team-specific memory buffer for synchronization	2025-02-05 19:09:07 +00:00
Avinash Kethineedi	248972b30b	Merge pull request #31 from avinashkethineedi/rocshmem_g Implement `rocshmem_g` API and optimize memory usage	2025-02-04 11:15:41 -06:00
Yiltan Hassan Temucin	fd3eaa3f69	[IPC] Fix ROCSHMEM_SIGNAL_ADD	2025-02-03 09:59:28 -08:00
avinashkethineedi	757d7e53ca	Implement `rocshmem_g` API and optimize memory usage - Implement `rocshmem_g` API - Free up memory space allocated for `rocshmem_g` and atomic operations' return values	2025-02-02 05:56:46 +00:00
avinashkethineedi	1ef2d3a6b7	Replace raw pointers for `host_interface` with shared_ptr to enable automatic memory handling	2025-01-13 20:58:43 +00:00
Yiltan Temucin	c0e4a32ca2	IPC backend now aborts with rocshmem global_exit()	2024-12-23 11:03:04 -06:00
Yiltan Temucin	fa0858833e	Remove comparisons of signed to unsigned values	2024-12-12 10:21:08 -06:00
avinashkethineedi	6486e29078	Rename config.h to roc_shmem_config.h	2024-12-06 01:08:13 +00:00
avinashkethineedi	d8ce066adc	Merge branch PR #55 into naming_scheme	2024-12-04 21:46:38 +00:00
Brandon Potter	fd8dbc7fb6	Use new naming scheme	2024-11-25 14:25:29 -06:00
Yiltan Temucin	d8f44e4436	Added Signalling Operations	2024-11-22 15:36:17 -06:00
Avinash Kethineedi	2cb5cab038	Merge pull request #52 from avinashkethineedi/IPC_puts/gets Update puts and gets with fence call	2024-11-14 13:19:24 -06:00
avinashkethineedi	d1ee997542	Update puts and gets to include a fence following data movement, ensuring data visibility	2024-11-12 16:52:07 +00:00
avinashkethineedi	5e3d94c705	Update collective APIs to use teams interface * Use team-relative numbering in collective functions * Replace log_stride with stride	2024-11-06 17:50:23 +00:00
Yiltan Hassan Temucin	997eb69b5a	modified team based to_all -> reduce	2024-11-06 09:46:43 -06:00
avinashkethineedi	b2b0d559cb	Merge branch 'ROCm:develop' into active_set_APIs	2024-11-05 23:02:44 +00:00
Yiltan Hassan Temucin	fe767d9abf	remove cooperative groups	2024-10-30 20:10:21 +00:00
avinashkethineedi	5975b8c621	Update broadcast function to use stride calculations instead of log_stride	2024-10-29 19:10:05 +00:00
avinashkethineedi	abec29bd6a	Update all_reduce algorithm to use internal put/get functions for updating pWrk and pSync arrays * Change log_stride calculations to stride calculations * Update all_reduce example code to use team based interface	2024-10-28 22:10:18 +00:00
Edgar Gabriel	11df5427a6	add ascii art for ring allreduce	2024-10-24 15:08:32 +00:00
Edgar Gabriel	a4b4281f50	fix odd-case allreduce scenarios if the number of elements to be used in the allreduce operation is not an exact multiple of the work-array buffer size and number of pe's, we need to adjust the algorithm to: - initially perform a ring_allreduce on n_segments * chunk_size (which is the integer division of the number of elements and the work-buffer size, i.e. will not cover the entire buffer) - perform another ring_allreduce where chunk_size is reduced to match the remaining elements - if the remaining elements from the previous step cannot evenly be divided by the number of pe's, we need to perform a direct_allreduce on the outstanding number of elements.	2024-10-24 15:08:32 +00:00
Edgar Gabriel	87db7f7d38	fix barrier synchronization on gfx90a	2024-10-24 15:08:28 +00:00
Edgar Gabriel	1fbb89bc73	ipc: add ring_allreduce algorithms add the ring allreduce algorithm to the ipc conduit in order to be able to execute slightly larger reductions.	2024-10-24 15:07:17 +00:00
Edgar Gabriel	ba21cb7b85	ipc/to_all: add direct allreduce algorithm add a simple version of an allreduce algorithm as a starting point.	2024-10-24 15:07:14 +00:00
Avinash Kethineedi	8a16968cf2	Merge pull request #41 from avinashkethineedi/collective_routine_buffers Fine grained memory buffers for work/sync arrays	2024-10-23 23:33:48 -05:00
avinashkethineedi	d5ea5868e3	Fix quiet and fence of default context * Update tinfo of default context	2024-10-22 16:18:05 +00:00
avinashkethineedi	6685d0ab60	Add fine grained memory buffers for work/sync arrays * Add internal put_mem/get_mem{_wave, _wg} functions to read/write to work/sync arrays * Add condition check to ensure all MPI processes are on the same compute node for IPC conduit	2024-10-21 15:28:39 +00:00
Yiltan Hassan Temucin	722a5f0731	updated _wait APIs to use int rather than roc_shmem_cmps	2024-10-11 13:34:28 -07:00
Yiltan Hassan Temucin	bcf3fdff10	_wait routines changed parameter from ptr to ivars to match OpenSHMEM	2024-10-11 13:34:28 -07:00
Yiltan Hassan Temucin	509277c034	fixed notifier bug	2024-10-10 06:45:43 -07:00
Yiltan Hassan Temucin	b1134e8633	added notifier->sync() when we are not using cooperative groups updated scope bug	2024-10-09 13:11:28 -07:00
Yiltan Hassan Temucin	63667a3167	Added Cooperative Groups configure option and header	2024-10-09 13:11:12 -07:00
Yiltan Hassan Temucin	1baa071edf	Fix initialization order bug	2024-10-09 13:11:12 -07:00
Yiltan Hassan Temucin	e2f6a65284	fixed barrier issue on MI250X	2024-10-08 13:18:04 -07:00
avinashkethineedi	92fb1abaf2	Add team information to the context * Update roc_shmem_ctx_fence API to use team-relative PE numbering * Update backend to populate team_opaque member of ROC_SHMEM_CTX_DEFAULT (used to store information about the team wrt TEAM_WORLD)	2024-10-04 17:56:15 +00:00
avinashkethineedi	979aed105a	Add fence and quiet functionality * Perform atomic stores to enforce memory ordering	2024-10-03 06:28:12 +00:00
Avinash Kethineedi	e58077e3cf	Merge branch 'ipc_bringup' into ipc_atomics	2024-09-09 14:22:55 -05:00
Edgar Gabriel	dfcacdc4a3	remove pSync from internal_bcast functions remove the pSync arguments from the internal_broadcast functions, they are not used anyway.	2024-09-09 12:06:30 -07:00
avinashkethineedi	7bbf34d334	remove local_pe calculation from puts, gets and atomics functions * All the PEs are assumed to be accessible using IPC backend	2024-09-05 11:52:00 -07:00
Edgar Gabriel	aae6295460	ipc/context_ipc_device.cpp: set barrier_sync set the barrier_sync variable on the context during object creation	2024-08-28 09:41:05 -07:00
avinashkethineedi	e1e1ac6df6	Add atomics * Add atomic_add, atomic_set, atomic_cas, atomic_fetch_add and atomic_fetch_cas to IPC backend	2024-08-28 08:30:46 -07:00
avinashkethineedi	45a8cb3354	Update IPC object * Update the IPC object in the context class with the instance created in the IPC backend	2024-08-28 08:14:38 -07:00
Edgar Gabriel	0de3b5e6fc	first cut on collectives and sync code is based on the GPUIB implementations of the routines, which seem however generic enough to work also for the IPC conduit. Some code is in for broadcast, fcollect, and alltoall.	2024-08-27 15:03:38 -07:00

60 Commits