
# ROCm OpenSHMEM (rocSHMEM)

The ROCm OpenSHMEM (rocSHMEM) runtime is part of an AMD and AMD Research
initiative to provide GPU-centric networking through an OpenSHMEM-like interface.
This intra-kernel networking library simplifies application
code complexity and enables more fine-grained communication/computation
overlap than traditional host-driven networking.
rocSHMEM uses a single symmetric heap that is allocated in GPU memory.
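
To make the symmetric heap idea concrete, here is a minimal host-only sketch in plain C++. It is not the rocSHMEM API: `ToySymmetricHeap` and its methods are hypothetical names, and ordinary host vectors stand in for GPU memory. It only illustrates the OpenSHMEM-style property that one offset names the corresponding object on every processing element (PE).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Toy model of a symmetric heap (hypothetical, not the rocSHMEM API).
// Each PE owns one equally sized buffer; host vectors stand in for GPU memory.
class ToySymmetricHeap {
 public:
  ToySymmetricHeap(int n_pes, std::size_t bytes_per_pe)
      : heaps_(n_pes, std::vector<unsigned char>(bytes_per_pe, 0)) {
    assert(n_pes > 0);
  }

  // Collective allocation: every PE reserves the same range, so the returned
  // offset is valid on all PEs.
  std::size_t alloc(std::size_t bytes) {
    assert(next_ + bytes <= heaps_[0].size());
    std::size_t offset = next_;
    next_ += bytes;
    return offset;
  }

  // One-sided put: copy local data into the target PE's heap at `offset`.
  void put(int target_pe, std::size_t offset, const void* src,
           std::size_t bytes) {
    assert(target_pe >= 0 && target_pe < static_cast<int>(heaps_.size()));
    assert(offset + bytes <= heaps_[target_pe].size());
    std::memcpy(heaps_[target_pe].data() + offset, src, bytes);
  }

  // One-sided get: copy data out of the source PE's heap at `offset`.
  void get(void* dst, int source_pe, std::size_t offset,
           std::size_t bytes) const {
    assert(source_pe >= 0 && source_pe < static_cast<int>(heaps_.size()));
    assert(offset + bytes <= heaps_[source_pe].size());
    std::memcpy(dst, heaps_[source_pe].data() + offset, bytes);
  }

 private:
  std::vector<std::vector<unsigned char>> heaps_;
  std::size_t next_ = 0;  // bump pointer shared by all PEs
};
```

In this model, a value written with `put(1, offset, ...)` is visible only in PE 1's copy of the object; PE 0's copy at the same offset is unchanged.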

There are currently three backends for rocSHMEM:
IPC, Reverse Offload (RO), and GPU Direct Async (GDA).
The backends primarily differ in their implementations of intra-kernel networking.

The IPC backend implements communication primitives using load/store operations issued from the GPU.

The Reverse Offload (RO) backend has the GPU runtime forward rocSHMEM networking operations
to the host-side runtime, which calls into a traditional MPI or OpenSHMEM
implementation. This forwarding of requests is transparent to the
programmer, who only sees the GPU-side interface.
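
The forwarding pattern described above can be sketched with a small host-only C++ model. Everything here is an assumption made for illustration: `ToyReverseOffload`, `device_put`, and `host_progress` are hypothetical names, a `std::queue` stands in for the device-to-host request channel, and a direct array write stands in for the MPI or OpenSHMEM call the host-side runtime would make.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// A request descriptor the "device" hands to the "host" (hypothetical layout).
struct ToyRequest {
  int target_pe;
  std::size_t index;
  int value;
};

// Toy model of reverse offload (not the rocSHMEM implementation).
struct ToyReverseOffload {
  std::queue<ToyRequest> pending;           // device-to-host request channel
  std::vector<std::vector<int>> pe_memory;  // stand-in for each PE's memory

  ToyReverseOffload(int n_pes, std::size_t words)
      : pe_memory(n_pes, std::vector<int>(words, 0)) {}

  // "Device side": the caller only sees this call. It records the request
  // and performs no communication itself.
  void device_put(int target_pe, std::size_t index, int value) {
    pending.push(ToyRequest{target_pe, index, value});
  }

  // "Host side": drain the channel and carry out each request. A real host
  // runtime would call into a networking library here instead of writing
  // to an array. Returns the number of requests serviced.
  std::size_t host_progress() {
    std::size_t serviced = 0;
    while (!pending.empty()) {
      ToyRequest r = pending.front();
      pending.pop();
      assert(r.target_pe >= 0 &&
             r.target_pe < static_cast<int>(pe_memory.size()));
      assert(r.index < pe_memory[r.target_pe].size());
      pe_memory[r.target_pe][r.index] = r.value;
      ++serviced;
    }
    return serviced;
  }
};
```

The point of the sketch is the split of responsibilities: the data does not move until `host_progress()` runs, yet the caller of `device_put()` never interacts with the host side directly.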

The GPU Direct Async (GDA) backend allows rocSHMEM to issue communication operations to the NIC directly from device-side code, without involving a CPU proxy.
During initialization we prepare network resources for each NIC vendor using the vendor-appropriate
Direct Verbs APIs.
When calling the device-side rocSHMEM API, the device threads are used to construct Work Queue Entries (WQEs) and post the communication to the send queues of the NIC directly.
Completion Queues (CQs) are polled from the device-side code as well.
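
The post-then-poll flow can be illustrated with a toy host-only C++ model. This is a sketch under stated assumptions, not the Direct Verbs API or the rocSHMEM implementation: `ToyNic`, `ToyWqe`, `ToyCqe`, and all of their fields are hypothetical, and `process_one()` stands in for work the NIC hardware would perform on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical work queue entry and completion queue entry layouts.
struct ToyWqe {
  std::uint64_t id;
  int target_pe;
  std::size_t index;
  int value;
};
struct ToyCqe {
  std::uint64_t wqe_id;
};

// Toy NIC with one send queue and one completion queue.
class ToyNic {
 public:
  ToyNic(int n_pes, std::size_t words)
      : remote_(n_pes, std::vector<int>(words, 0)) {}

  // Caller ("device thread") builds a WQE and posts it to the send queue.
  std::uint64_t post_put(int target_pe, std::size_t index, int value) {
    ToyWqe wqe{next_id_++, target_pe, index, value};
    send_queue_.push_back(wqe);
    return wqe.id;
  }

  // Stand-in for the NIC hardware: execute one WQE and emit a completion.
  bool process_one() {
    if (send_queue_.empty()) return false;
    ToyWqe w = send_queue_.front();
    send_queue_.pop_front();
    assert(w.target_pe >= 0 && w.target_pe < static_cast<int>(remote_.size()));
    assert(w.index < remote_[w.target_pe].size());
    remote_[w.target_pe][w.index] = w.value;
    completion_queue_.push_back(ToyCqe{w.id});
    return true;
  }

  // Caller polls the CQ; returns true and the finished WQE id if one exists.
  bool poll_cq(std::uint64_t* finished_id) {
    if (completion_queue_.empty()) return false;
    *finished_id = completion_queue_.front().wqe_id;
    completion_queue_.pop_front();
    return true;
  }

  int read(int pe, std::size_t index) const { return remote_[pe][index]; }

 private:
  std::deque<ToyWqe> send_queue_;
  std::deque<ToyCqe> completion_queue_;
  std::vector<std::vector<int>> remote_;  // stand-in for remote PE memory
  std::uint64_t next_id_ = 0;
};
```

In this model the same caller both posts the WQE and polls for its completion, with no intermediate proxy, which mirrors the division of work described above.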

The RO and GDA backends are provided as-is with limited support from AMD or AMD Research.

## Installing and using rocSHMEM

For information on how to install and use rocSHMEM,
[please see our documentation](https://rocm.docs.amd.com/projects/rocSHMEM/en/latest/).