[ROCm/rocshmem commit: 67bff9ca30]
This commit is contained in:
yugang-amd
2025-05-13 16:26:28 -04:00
committed by GitHub
parent 2f82ed9bf0
commit 17cde51fb7
16 changed files with 477 additions and 495 deletions
+100 -101
@@ -5,27 +5,26 @@
.. _rocshmem-api-amo:
---------------------------
Atomic Memory Operations
Atomic memory operations
---------------------------
- These functions can be called from divergent control paths at per-thread
granularity.
You can call these functions from divergent control paths at the per-thread level.
ROSHMEM_ATOMIC_FETCH
--------------------
.. cpp:function:: __device__ TYPE rocshmem_TYPENAME_atomic_fetch(TYPE *source, int pe)
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_fetch(rocshmem_ctx_t ctx, TYPE *source, int pe)
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param pe: PE of the remote process.
:returns: The value of dest
:returns: The value of ``dest``.
**Description:**
Atomically return the value of dest to the calling PE.
This function atomically returns the value of ``dest`` to the calling PE.
Valid ``TYPENAME`` and ``TYPE`` values can be seen in EXTENDED_AMO_TYPES_.
Valid ``TYPENAME`` and ``TYPE`` values are listed in EXTENDED_AMO_TYPES_.
SHMEM_ATOMIC_SET
@@ -33,17 +32,17 @@ SHMEM_ATOMIC_SET
.. cpp:function:: __device__ void rocshmem_TYPENAME_atomic_set(TYPE *dest, TYPE value, int pe);
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_atomic_set(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, int pe);
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param val: The value to be atomically set
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param value: The value to be atomically set.
:param pe: PE of the remote process.
:returns: None
:returns: None.
**Description:**
Atomically set the value val to dest on pe.
This function atomically sets the value ``value`` to ``dest`` on ``pe``.
Valid ``TYPENAME`` and ``TYPE`` values can be seen in EXTENDED_AMO_TYPES_.
Valid ``TYPENAME`` and ``TYPE`` values are listed in EXTENDED_AMO_TYPES_.
SHMEM_ATOMIC_COMPARE_SWAP
-------------------------
@@ -52,20 +51,20 @@ SHMEM_ATOMIC_COMPARE_SWAP
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_compare_swap(rocshmem_ctx_t ctx, TYPE *dest, TYPE cond, TYPE value, TYPE pe);
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param cond: The value to be compare with
:param val: The value to be atomically swapped
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param cond: The value to be compare with.
:param value: The value to be atomically swapped.
:param pe: PE of the remote process.
:return: The old value of dest
:return: The old value of ``dest``.
**Description:**
Atomically compares if the value in dest with cond is equal then put val in dest.
The operation returns the older value of dest to the calling PE.
This function atomically compares the value in ``dest`` with ``cond``. If they are equal, it stores ``value`` in ``dest``.
The operation returns the older value of ``dest`` to the calling PE.
The operation is blocking.
Valid ``TYPENAME`` and ``TYPE`` values can be seen in STANDARD_AMO_TYPES_.
Valid ``TYPENAME`` and ``TYPE`` values are listed in STANDARD_AMO_TYPES_.
SHMEM_ATOMIC_SWAP
-----------------
@@ -74,18 +73,18 @@ SHMEM_ATOMIC_SWAP
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_swap(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, TYPE pe);
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param val: The value to be atomically swapped
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param value: The value to be atomically swapped.
:param pe: PE of the remote process.
:return: The old value of dest
:return: The old value of ``dest``.
**Description:**
Atomically swaps the value val to dest on pe.
This function atomically swaps the value ``val`` with ``dest`` on ``pe``.
The operation is blocking.
Valid ``TYPENAME`` and ``TYPE`` values can be seen in EXTENDED_AMO_TYPES_.
Valid ``TYPENAME`` and ``TYPE`` values are listed in EXTENDED_AMO_TYPES_.
SHMEM_ATOMIC_FETCH_INC
----------------------
@@ -94,17 +93,17 @@ SHMEM_ATOMIC_FETCH_INC
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_fetch_inc(rocshmem_ctx_t ctx, TYPE *dest, TYPE pe);
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param pe: PE of the remote process.
:return: The old value of dest
:return: The old value of ``dest``.
**Description:**
Atomically adds 1 to dest on pe.
This function atomically adds ``1`` to ``dest`` on ``pe``.
The operation is blocking.
Valid ``TYPENAME`` and ``TYPE`` values can be seen in STANDARD_AMO_TYPES_.
Valid ``TYPENAME`` and ``TYPE`` values are listed in STANDARD_AMO_TYPES_.
SHMEM_ATOMIC_INC
----------------
@@ -113,17 +112,17 @@ SHMEM_ATOMIC_INC
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_atomic_inc(rocshmem_ctx_t ctx, TYPE *dest, TYPE pe);
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param pe: PE of the remote process.
:return: None
:return: None.
**Description:**
Atomically adds 1 to dest on pe.
This function atomically adds ``1`` to ``dest`` on ``pe``.
The operation is blocking.
Valid ``TYPENAME`` and ``TYPE`` values can be seen in STANDARD_AMO_TYPES_.
Valid ``TYPENAME`` and ``TYPE`` values are listed in STANDARD_AMO_TYPES_.
SHMEM_ATOMIC_FETCH_ADD
----------------------
@@ -132,18 +131,18 @@ SHMEM_ATOMIC_FETCH_ADD
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_fetch_add(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, TYPE pe);
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param value: The value to be atomically added
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param value: The value to be atomically added.
:param pe: PE of the remote process.
:return: The old value of dest
:return: The old value of ``dest``.
**Description:**
Atomically adds value to dest on pe.
This function atomically adds ``value`` to ``dest`` on ``pe``.
The operation is blocking.
Valid ``TYPENAME`` and ``TYPE`` values can be seen in STANDARD_AMO_TYPES_.
Valid ``TYPENAME`` and ``TYPE`` values are listed in STANDARD_AMO_TYPES_.
SHMEM_ATOMIC_ADD
----------------
@@ -152,15 +151,15 @@ SHMEM_ATOMIC_ADD
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_atomic_add(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, TYPE pe);
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param value: The value to be atomically added
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param value: The value to be atomically added.
:param pe: PE of the remote process.
:return: None
:return: None.
**Description:**
Atomically adds value to dest on pe.
This function atomically adds ``value`` to ``dest`` on ``pe``.
The operation is blocking.
Valid ``TYPENAME`` and ``TYPE`` values can be seen in STANDARD_AMO_TYPES_.
@@ -172,18 +171,18 @@ SHMEM_ATOMIC_FETCH_AND
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_fetch_and(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, TYPE pe);
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param value: The value to be atomically AND
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param value: The value to be atomically ``AND``.
:param pe: PE of the remote process.
:return: The old value of dest.
:return: The old value of ``dest``.
**Description:**
Atomically bitwise-and value to the value at dest on pe.
This function atomically bitwise-and ``value`` to the value at ``dest`` on ``pe``.
The operation is blocking.
Valid ``TYPENAME`` and ``TYPE`` values can be seen in BITWISE_AMO_TYPES_.
Valid ``TYPENAME`` and ``TYPE`` values are listed in BITWISE_AMO_TYPES_.
SHMEM_ATOMIC_AND
----------------
@@ -192,18 +191,18 @@ SHMEM_ATOMIC_AND
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_and(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, TYPE pe);
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param value: The value to be atomically AND
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param value: The value to be atomically ``AND``.
:param pe: PE of the remote process.
:return: None
**Description:**
Atomically bitwise-and value to the value at dest on pe.
This function atomically bitwise-and ``value`` to the value at ``dest`` on ``pe``.
The operation is blocking.
Valid ``TYPENAME`` and ``TYPE`` values can be seen in BITWISE_AMO_TYPES_.
Valid ``TYPENAME`` and ``TYPE`` values are listed in BITWISE_AMO_TYPES_.
SHMEM_ATOMIC_FETCH_OR
----------------------
@@ -212,18 +211,18 @@ SHMEM_ATOMIC_FETCH_OR
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_fetch_or(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, TYPE pe)
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param value: The value to be atomically OR
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param value: The value to be atomically ``OR``.
:param pe: PE of the remote process.
:return: The old value of dest
:return: The old value of ``dest``.
**Description:**
Atomically bitwise-or value to the value at dest on pe.
This function atomically bitwise-or ``value`` to the value at ``dest`` on ``pe``.
The operation is blocking.
Valid ``TYPENAME`` and ``TYPE`` values can be seen in BITWISE_AMO_TYPES_.
Valid ``TYPENAME`` and ``TYPE`` values are listed in BITWISE_AMO_TYPES_.
SHMEM_ATOMIC_OR
---------------
@@ -232,18 +231,18 @@ SHMEM_ATOMIC_OR
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_or(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, TYPE pe)
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param value: The value to be atomically OR
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param value: The value to be atomically ``OR``.
:param pe: PE of the remote process.
:return: None.
**Description:**
Atomically bitwise-or value to the value at dest on pe.
This function atomically bitwise-or ``value`` to the value at ``dest`` on ``pe``.
The operation is blocking.
Valid ``TYPENAME`` and ``TYPE`` values can be seen in BITWISE_AMO_TYPES_.
Valid ``TYPENAME`` and ``TYPE`` values are listed in BITWISE_AMO_TYPES_.
SHMEM_ATOMIC_FETCH_XOR
----------------------
@@ -252,18 +251,18 @@ SHMEM_ATOMIC_FETCH_XOR
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_fetch_xor(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, TYPE pe);
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param value: The value to be atomically XOR
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param value: The value to be atomically ``XOR``.
:param pe: PE of the remote process.
:return: The old value of dest
:return: The old value of ``dest``.
**Description:**
Atomically bitwise-xor value to the value at dest on pe.
This function atomically bitwise-xor ``value`` to the value at ``dest`` on ``pe``.
The operation is blocking.
Valid ``TYPENAME`` and ``TYPE`` values can be seen in BITWISE_AMO_TYPES_.
Valid ``TYPENAME`` and ``TYPE`` values are listed in BITWISE_AMO_TYPES_.
SHMEM_ATOMIC_XOR
----------------
@@ -272,25 +271,25 @@ SHMEM_ATOMIC_XOR
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_xor(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, TYPE pe)
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param value: The value to be atomically XOR
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param value: The value to be atomically ``XOR``.
:param pe: PE of the remote process.
:return: None
:return: None.
**Description:**
Atomically bitwise-xor value to the value at dest on pe.
This function atomically bitwise-xor ``value`` to the value at ``dest`` on ``pe``.
The operation is blocking.
Valid ``TYPENAME`` and ``TYPE`` values can be seen in BITWISE_AMO_TYPES_.
Valid ``TYPENAME`` and ``TYPE`` values are listed in BITWISE_AMO_TYPES_.
SUPPORTED AMO DATA TYPES
Supported AMO data types
------------------------
.. _STANDARD_AMO_TYPES:
.. list-table:: Standard AMO Datatypes
.. list-table:: Standard AMO Data Types
:widths: 10 20 20
:header-rows: 1
@@ -336,7 +335,7 @@ SUPPORTED AMO DATA TYPES
.. _EXTENDED_AMO_TYPES:
.. list-table:: Extended AMO Datatypes
.. list-table:: Extended AMO Data Types
:widths: 10 20 20
:header-rows: 1
@@ -388,7 +387,7 @@ SUPPORTED AMO DATA TYPES
.. _BITWISE_AMO_TYPES:
.. list-table:: Bitwise AMO Datatypes
.. list-table:: Bitwise AMO Data Types
:widths: 10 20 20
:header-rows: 1
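The fetching atomic routines documented in this file all return the old value of ``dest``. A small host-side model may make that contract easier to see. This is only a sketch under stated assumptions: it operates on a local ``std::atomic`` rather than symmetric-heap memory on a remote PE, and the names ``model_atomic_compare_swap`` and ``model_atomic_fetch_add`` are hypothetical and not part of the rocSHMEM API.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical host-side model of the compare-swap semantics described above.
// NOT a rocSHMEM call: `dest` is a local std::atomic, not symmetric-heap memory.
template <typename T>
T model_atomic_compare_swap(std::atomic<T>& dest, T cond, T value) {
    T expected = cond;
    // Stores `value` only if dest == cond. On failure, `expected` is
    // overwritten with the current contents of dest.
    dest.compare_exchange_strong(expected, value);
    // In both cases `expected` now holds the value dest had before the call.
    return expected;
}

// Hypothetical host-side model of fetch-add: adds `value` and returns the
// value dest held before the addition.
template <typename T>
T model_atomic_fetch_add(std::atomic<T>& dest, T value) {
    return dest.fetch_add(value);
}
```

In this model, a compare-swap whose ``cond`` does not match leaves ``dest`` unchanged but still returns its current contents, which is how a caller can tell whether the swap took effect.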
+52 -52
@@ -5,7 +5,7 @@
.. _rocshmem-api-coll:
---------------------------
Collective Routines
Collective routines
---------------------------
ROCSHMEM_BARRIER_ALL
@@ -14,11 +14,11 @@ ROCSHMEM_BARRIER_ALL
.. cpp:function:: __device__ void rocshmem_ctx_wg_barrier_all(rocshmem_ctx_t ctx)
.. cpp:function:: __device__ void rocshmem_wg_barrier_all()
:param ctx: Context with which to perform this operation
:returns: None
:param ctx: Context with which to perform this operation.
:returns: None.
**Description:**
Perform a collective barrier between all PEs in the system.
This routine performs a collective barrier between all PEs in the system.
The caller is blocked until the barrier is resolved.
ROCSHMEM_TEAM_SYNC
@@ -27,17 +27,17 @@ ROCSHMEM_TEAM_SYNC
.. cpp:function:: __device__ void rocshmem_ctx_wg_team_sync(rocshmem_ctx_t ctx, rocshmem_team_t team)
.. cpp:function:: __device__ void rocshmem_wg_team_sync(rocshmem_team_t team)
:param ctx: Context with which to perform this operation
:param team: Team with which to perform this operation
:returns: None
:param ctx: Context with which to perform this operation.
:param team: Team with which to perform this operation.
:returns: None.
**Description:**
Registers the arrival of a PE at a barrier.
This routine registers the arrival of a PE at a barrier.
The caller is blocked until the synchronization is resolved.
In contrast with the shmem_barrier_all routine, shmem_team_sync only ensures
completion and visibility of previously issued memory stores and does not
ensure completion of remote memory updates issued via OpenSHMEM routines.
Unlike the ``shmem_barrier_all`` routine, ``shmem_team_sync`` only ensures the
completion and visibility of previously issued memory stores, but does not
ensure the completion of remote memory updates issued via OpenSHMEM routines.
ROCSHMEM_SYNC_ALL
-----------------
@@ -45,11 +45,11 @@ ROCSHMEM_SYNC_ALL
.. cpp:function:: __device__ void rocshmem_ctx_wg_sync_all(rocshmem_ctx_t ctx)
.. cpp:function:: __device__ void rocshmem_wg_sync_all()
:param ctx: Context with which to perform this operation
:returns: None
:param ctx: Context with which to perform this operation.
:returns: None.
**Description:**
This routine is the same as ``rocshmem_wg_team_sync`` if were to be called on the world team.
This routine behaves the same as ``rocshmem_wg_team_sync`` when called on the world team.
ROSHMEM_ALLTOALL
@@ -57,79 +57,79 @@ ROSHMEM_ALLTOALL
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_wg_alltoall(rocshmem_ctx_t ctx, rocshmem_team_t team, TYPE *dest, const TYPE *source, int nelems)
:param team: The team participating in the collective
:param dest: Destination address; Must be an address on the
symmetric heap
:param source: Source address; Must be an address on the symmetric
heap
:param nelems: Number of data blocks transferred per pair of PEs
:returns: None
:param team: The team participating in the collective.
:param dest: Destination address. Must be an address on the
symmetric heap.
:param source: Source address. Must be an address on the symmetric
heap.
:param nelems: Number of data blocks transferred per pair of PEs.
:returns: None.
**Description:**
Exchanges a fixed amount of contiguous data blocks between all pairs
This routine exchanges a fixed amount of contiguous data blocks between all pairs
of PEs participating in the collective routine.
This function must be called as a work-group collective.
Valid ``TYPENAME`` and ``TYPE`` values can be seen at :ref:`RMA_TYPES`.
Valid ``TYPENAME`` and ``TYPE`` values are listed in :ref:`RMA_TYPES`.
ROCSHMEM_BROADCAST
------------------
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_wg_broadcast(rocshmem_ctx_t ctx, rocshmem_team_t team, TYPE *dest, const TYPE *source, int nelems, int pe_root)
:param ctx: Context with which to perform this collective
:param team: The team participating in the collective
:param dest: Destination address; Must be an address on the
symmetric heap
:param source: Source address; Must be an address on the symmetric
heap
:param nelems: Number of data blocks transferred per pair of PEs
:returns: None
:param ctx: Context with which to perform this collective.
:param team: The team participating in the collective.
:param dest: Destination address. Must be an address on the
symmetric heap.
:param source: Source address. Must be an address on the symmetric
heap.
:param nelems: Number of data blocks transferred per pair of PEs.
:returns: None.
**Description:**
Perform a broadcast between PEs in the team.
This routine performs a broadcast across PEs in the team.
The caller is blocked until the broadcast completes.
Valid ``TYPENAME`` and ``TYPE`` values can be seen at :ref:`RMA_TYPES`.
Valid ``TYPENAME`` and ``TYPE`` values are listed in :ref:`RMA_TYPES`.
ROCSHMEM_FCOLLECT
-----------------
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_wg_fcollect(rocshmem_ctx_t ctx, rocshmem_team_t team, TYPE *dest, const TYPE *source, int nelems)
:param ctx: Context with which to perform this collective
:param team: The team participating in the collective
:param dest: Destination address; Must be an address on the
symmetric heap
:param source: Source address; Must be an address on the symmetric
heap
:param ctx: Context with which to perform this collective.
:param team: The team participating in the collective.
:param dest: Destination address. Must be an address on the
symmetric heap.
:param source: Source address. Must be an address on the symmetric
heap.
:param nelems: Number of data blocks transferred per pair of PEs.
:returns: None
:returns: None.
**Description:**
Concatenates blocks of data from multiple PEs to an array in every
This routine concatenates blocks of data from multiple PEs to an array in every
PE participating in the collective routine.
ROCSHMEM_REDUCTION
------------------
.. cpp:function:: __device__ int rocshmem_ctx_TYPENAME_OPNAME_wg_reduce(rocshmem_ctx_t ctx, rocshmem_team_t team, TYPE *dest, const TYPE *source, int nreduce)
:param ctx: Context with which to perform this collective
:param team: The team participating in the collective
:param dest: Destination address; Must be an address on the
symmetric heap
:param source: Source address; Must be an address on the symmetric
heap
:param nreduce: Number of data blocks transferred per pair of PEs
:returns: Zero on successful local completion; Nonzero otherwise
:param ctx: Context with which to perform this collective.
:param team: The team participating in the collective.
:param dest: Destination address. Must be an address on the
symmetric heap.
:param source: Source address. Must be an address on the symmetric
heap.
:param nreduce: Number of data blocks transferred per pair of PEs.
:returns: Zero on successful local completion. Nonzero otherwise.
**Description:**
Perform an allreduce between PEs in the team.
This routine performs an allreduce operation across PEs in the team.
Valid ``TYPENAME``, ``TYPE``, and ``OPNAME`` values can be seen at :ref:`REDUCE_TYPES`.
Valid ``TYPENAME``, ``TYPE``, and ``OPNAME`` values are listed in :ref:`REDUCE_TYPES`.
SUPPORTED REDUCTION TYPES AND OPERATIONS
Supported reduction types and operations
----------------------------------------
.. _REDUCE_TYPES:
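The data movement of the fcollect and reduction routines documented in this file can be sketched with a host-side model. The assumptions are loud ones: each inner vector stands in for one PE's symmetric buffer, the concatenation is taken in PE order (as in OpenSHMEM), sum is used as one example of an ``OPNAME``, and the names ``Buffers``, ``model_fcollect``, and ``model_sum_allreduce`` are hypothetical and not part of the rocSHMEM API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model: sources[p] stands for the source buffer of PE p.
using Buffers = std::vector<std::vector<int>>;

// fcollect model: every PE ends up with the nelems-sized blocks of all PEs,
// concatenated in PE order (assumed ordering, following OpenSHMEM).
Buffers model_fcollect(const Buffers& sources, std::size_t nelems) {
    std::vector<int> gathered;
    for (const auto& src : sources) {
        assert(src.size() >= nelems);
        gathered.insert(gathered.end(), src.begin(),
                        src.begin() + static_cast<std::ptrdiff_t>(nelems));
    }
    // Each PE receives an identical copy of the concatenated array.
    return Buffers(sources.size(), gathered);
}

// allreduce model with sum as the example operation: every PE ends up with
// the element-wise sum of the first nreduce elements of all source buffers.
Buffers model_sum_allreduce(const Buffers& sources, std::size_t nreduce) {
    std::vector<int> reduced(nreduce, 0);
    for (const auto& src : sources) {
        assert(src.size() >= nreduce);
        for (std::size_t i = 0; i < nreduce; ++i) {
            reduced[i] += src[i];
        }
    }
    return Buffers(sources.size(), reduced);
}
```

The model only captures the result each PE observes. It says nothing about work-group collectivity, contexts, or completion, which the real device-side routines also involve.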
+13 -13
@@ -5,7 +5,7 @@
.. _rocshmem-api-ctx:
-----------------------------------
Context Management Routines
Context management routines
-----------------------------------
ROCSHMEM_CTX_CREATE
@@ -14,27 +14,27 @@ ROCSHMEM_CTX_CREATE
.. cpp:function:: __device__ int rocshmem_wg_ctx_create(int64_t options, rocshmem_ctx_t *ctx)
.. cpp:function:: __device__ int rocshmem_wg_team_create_ctx(rocshmem_team_t team, long options, rocshmem_ctx_t *ctx)
:param team: Team handle to derive the context from
:param options: Options for context creation (Ignored in current design, please use a value of 0)
:param ctx: Context handle
:param team: Team handle to derive the context from.
:param options: Options for context creation. Ignored in current design; use the value ``0``.
:param ctx: Context handle.
:returns: All threads returns 0 if the context was created successfully;
If any thread returns non-zero value, the operation failed and a higher number of
`ROCSHMEM_MAX_NUM_CONTEXTS` is required
:returns: All threads returns ``0`` if the context was created successfully.
If any thread returns non-zero value, the operation fails and a higher number of
``ROCSHMEM_MAX_NUM_CONTEXTS`` is required.
**Description:**
Creates an OpenSHMEM context. By design, the context is private to the calling work-group.
Must be called collectively by all threads in the work-group.
This routine creates an OpenSHMEM context. By design, the context is private to the calling work-group.
It must be called collectively by all threads in the work-group.
ROCSHMEM_CTX_DESTROY
--------------------
.. cpp:function:: __device__ void rocshmem_wg_ctx_destroy(rocshmem_ctx_t *ctx)
:param ctx: Context handle
:param ctx: Context handle.
:returns: None
:returns: None.
**Description:**
Destroys an rocSHMEM context.
Must be called collectively by all threads in the work-group.
This routine destroys an rocSHMEM context.
It must be called collectively by all threads in the work-group.
+33 -33
@@ -5,7 +5,7 @@
.. _rocshmem-api-init:
---------------------------------------
Library Setup, Exit, and Query Routines
Library setup, exit, and query routines
---------------------------------------
ROCSHMEM_INIT
@@ -13,86 +13,86 @@ ROCSHMEM_INIT
.. cpp:function:: __host__ void rocshmem_init(void)
:Parameters: None
:returns: None
:Parameters: None.
:returns: None.
**Description:**
This routine initializes the rocSHMEM runtime and underlying transport layer.
Before ``rocshmem_init`` is called,
a user must select the device that this PE is associated to by calling
you must select the device that this PE is associated to by calling
`hipSetDevice
<https://rocm.docs.amd.com/projects/HIP/en/docs-6.0.0/doxygen/html/group___device.html#ga43c1e7f15925eeb762195ccb5e063eae>`_.
.. cpp:function:: __device__ void rocshmem_wg_init(void)
:Parameters: None
:returns: None
:Parameters: None.
:returns: None.
**Description:**
Initializes device-side rocSHMEM resources.
Must be called before any threads in this work-group invoke other rocSHMEM functions.
Must be called collectively by all threads in the work-group.
This routine initializes device-side rocSHMEM resources.
It must be called before any threads in this work-group invoke other rocSHMEM functions.
It must be called collectively by all threads in the work-group.
ROCSHMEM_FINALIZE
-----------------
.. cpp:function:: __host__ void rocshmem_finalize(void)
:Parameters: None
:returns: None
:Parameters: None.
:returns: None.
**Description:**
Finalize the rocSHMEM runtime.
This routine finalizes the rocSHMEM runtime.
.. cpp:function:: __device__ void rocshmem_wg_finalize(void)
:Parameters: None
:returns: None
:Parameters: None.
:returns: None.
**Description:**
Finalizes device-side rocSHMEM resources.
Must be called before work-group completion if the work-group also called ``rocshmem_wg_init``.
Must be called collectively by all threads in the work-group.
This routine finalizes device-side rocSHMEM resources.
It must be called before work-group completion if the work-group also called ``rocshmem_wg_init``.
It must be called collectively by all threads in the work-group.
ROCSHMEM_N_PES
--------------
.. cpp:function:: __host__ int rocshmem_n_pes(void)
:Parameters: None
:returns: Total number of PEs
:Parameters: None.
:returns: Total number of PEs.
**Description:**
Query the total number of PEs.
This routine can be called before ``rocshmem_init``.
This routine queries the total number of PEs.
It can be called before ``rocshmem_init``.
.. cpp:function:: __device__ int rocshmem_n_pes(void)
.. cpp:function:: __device__ int rocshmem_ctx_n_pes(rocshmem_ctx_t ctx)
:param ctx: GPU side context handle
:returns: Total number of PEs
:param ctx: GPU side context handle.
:returns: Total number of PEs.
**Description:**
Query the total number of PEs for a given context.
Can be called per thread with no performance penalty.
This routine queries the total number of PEs for a given context.
It can be called per thread with no performance penalty.
ROCSHMEM_MY_PE
--------------
.. cpp:function:: __host__ int rocshmem_my_pe(void)
:Parameters: None
:returns: PE ID of the caller
:Parameters: None.
:returns: PE ID of the caller.
**Description:**
Query the PE ID of the caller.
This routine can be called before ``rocshmem_init``.
This routine queries the PE ID of the caller.
It can be called before ``rocshmem_init``.
.. cpp:function:: __device__ int rocshmem_my_pe(void)
.. cpp:function:: __device__ int rocshmem_ctx_my_pe(rocshmem_ctx_t ctx)
:param ctx: GPU side context handle
:returns: PE ID of the caller
:param ctx: GPU side context handle.
:returns: PE ID of the caller.
**Description:**
Query the PE ID of the caller.
Can be called per thread with no performance penalty.
This routine queries the PE ID of the caller.
It can be called per thread with no performance penalty.
+9 -9
@@ -6,7 +6,7 @@
---------------------------
Memory Management Routines
Memory management routines
---------------------------
ROCSHMEM_MALLOC
@@ -14,12 +14,12 @@ ROCSHMEM_MALLOC
.. cpp:function:: __host__ void *rocshmem_malloc(size_t size)
:param size: Memory allocation size in bytes
:returns: A pointer to the allocated memory on the symmetric heap;
If a valid allocation cannot be made, it returns NULL
:param size: Memory allocation size in bytes.
:returns: A pointer to the allocated memory on the symmetric heap.
If a valid allocation cannot be made, it returns ``NULL``.
**Description:**
Allocate memory of ``size`` bytes from the symmetric heap.
This routine allocates memory of ``size`` bytes from the symmetric heap.
This is a collective operation and must be called by all PEs.
ROCSHMEM_FREE
@@ -27,9 +27,9 @@ ROCSHMEM_FREE
.. cpp:function:: __host__ void rocshmem_free(void *ptr)
:param ptr: Pointer to previously allocated memory on the symmetric heap
:returns: None
:param ptr: A pointer to previously allocated memory on the symmetric heap.
:returns: None.
**Description:**
Free a memory allocation from the symmetric heap.
This is a collective operation and must be called by all PEs.
This routine frees a memory allocation from the symmetric heap.
It is a collective operation and must be called by all PEs.
+8 -8
@@ -5,7 +5,7 @@
.. _rocshmem-api-memory-ordering:
---------------------------
Memory Ordering Routines
Memory ordering routines
---------------------------
ROCSHMEM_FENCE
@@ -16,12 +16,12 @@ ROCSHMEM_FENCE
.. cpp:function:: __device__ void rocshmem_ctx_fence(rocshmem_ctx_t ctx)
.. cpp:function:: __device__ void rocshmem_ctx_fence(rocshmem_ctx_t ctx, int pe)
:param ctx: Context with which to perform this operation
:param pe: Destination pe
:returns: None
:param ctx: Context with which to perform this operation.
:param pe: Destination ``pe``.
:returns: None.
**Description:**
Guarantees order between messages in this context in accordance with OpenSHMEM semantics.
This routine ensures order between messages in this context to follow OpenSHMEM semantics.
ROCSHMEM_QUIET
--------------
@@ -29,8 +29,8 @@ ROCSHMEM_QUIET
.. cpp:function:: __device__ void rocshmem_ctx_quiet(rocshmem_ctx_t ctx)
.. cpp:function:: __device__ void rocshmem_quiet()
:param ctx: Context with which to perform this operation
:returns: None
:param ctx: Context with which to perform this operation.
:returns: None.
**Description:**
Completes all previous operations posted to this context.
This routine completes all previous operations posted to this context.
+44 -42
@@ -5,7 +5,7 @@
.. _rocshmem-api-pt2pt-sync:
-----------------------------------------
Point-to-Point Synchronization Routines
Point-to-point synchronization routines
-----------------------------------------
ROCSHMEM_WAIT_UNTIL
@@ -13,95 +13,97 @@ ROCSHMEM_WAIT_UNTIL
.. cpp:function:: __device__ void rocshmem_TYPENAME_wait_until(TYPE *ivars, int cmp, TYPE val)
:param ivars: Pointer to memory on the symmetric heap to wait for
:param cmp: Operation for the comparison
:param val: Value to compare the memory at ivars to
:returns: None
:param ivars: Pointer to memory on the symmetric heap to wait for.
:param cmp: Operation for the comparison.
:param val: Value to compare the memory at ``ivars`` to.
:returns: None.
**Description:**
Block the caller until the condition ``(*ivars cmp val)`` is true.
This routine blocks the caller until the condition ``(*ivars cmp val)`` is true.
Valid ``cmp`` values can be seen at :ref:`CMP_VALUES`.
Valid ``cmp`` values are listed in :ref:`CMP_VALUES`.
Valid ``TYPENAME`` and ``TYPE`` values can be seen at :ref:`STANDARD_AMO_TYPES`.
Valid ``TYPENAME`` and ``TYPE`` values are listed in :ref:`STANDARD_AMO_TYPES`.
ROCSHMEM_WAIT_UNTIL_ALL
-----------------------
.. cpp:function:: __device__ void rocshmem_TYPENAME_wait_until_all(TYPE *ivars, size_t nelems, const int* status, int cmp, TYPE val)
:param ivars: Pointer to memory on the symmetric heap to wait for
:param nelems: Number of elements in the ivars array
:param status: Array of length nelems that is used to exclude elements from wait
:param cmp: Operation for the comparison
:param val: Value to compare
:returns: None
:param ivars: Pointer to memory on the symmetric heap to wait for.
:param nelems: Number of elements in the ``ivars`` array.
:param status: Array of length ``nelems`` to exclude elements from the wait.
:param cmp: Operation for the comparison.
:param val: Value to compare.
:returns: None.
**Description:**
Block the caller until the condition ``(ivars[i] cmp val)`` is true for all ivars
This routine blocks the caller until the condition ``(ivars[i] cmp val)`` is true for all ``ivars``.
Valid ``cmp`` values can be seen at :ref:`CMP_VALUES`.
Valid ``cmp`` values are listed in :ref:`CMP_VALUES`.
Valid ``TYPENAME`` and ``TYPE`` values can be seen at :ref:`STANDARD_AMO_TYPES`.
Valid ``TYPENAME`` and ``TYPE`` values are listed in :ref:`STANDARD_AMO_TYPES`.
ROCSHMEM_WAIT_UNTIL_ANY
-----------------------
.. cpp:function:: __device__ size_t rocshmem_TYPENAME_wait_until_any(TYPE *ivars, size_t nelems, const int* status, int cmp, TYPE val)
:param ivars: Pointer to memory on the symmetric heap to wait for
:param nelems: Number of elements in the ivars array
:param status: Array of length nelems that is used to exclude elements from wait
:param cmp: Operation for the comparison
:param val: Value to compare
:returns: The index of an element in the ivars array that satisfies the wait condition; If the wait set is empty, this routine returns SIZE_MAX
:param ivars: Pointer to memory on the symmetric heap to wait for.
:param nelems: Number of elements in the ``ivars`` array.
:param status: Array of length ``nelems`` to exclude elements from the wait.
:param cmp: Operation for the comparison.
:param val: Value to compare.
:returns: The index of an element in the ``ivars`` array that satisfies the wait condition. If the wait set is empty, this routine returns ``SIZE_MAX``.
**Description:**
Block the caller until any of the condition ``(ivars[i] cmp val)`` is true.
This routine blocks the caller until any of the conditions ``(ivars[i] cmp val)`` is true.
Valid `cmp` values can be seen at :ref:`CMP_VALUES`.
Valid ``cmp`` values are listed in :ref:`CMP_VALUES`.
Valid ``TYPENAME`` and ``TYPE`` values can be seen at :ref:`STANDARD_AMO_TYPES`.
Valid ``TYPENAME`` and ``TYPE`` values are listed in :ref:`STANDARD_AMO_TYPES`.
ROCSHMEM_WAIT_UNTIL_SOME
------------------------
.. cpp:function:: __device__ size_t rocshmem_TYPENAME_wait_until_some(TYPE *ivars, size_t nelems, size_t* indices, const int* status, int cmp, TYPE val)
:param ivars: Pointer to memory on the symmetric heap to wait for
:param nelems: Number of elements in the ivars array
:param indices: List of indices that of at least of length nelems
:param status: Array of length nelems that is used to exclude elements from wait
:param cmp: Operation for the comparison
:param val: Value to compare
:returns: The number of indices returned in the indices array; If the wait set is empty, this routine returns 0
:param ivars: Pointer to memory on the symmetric heap to wait for.
:param nelems: Number of elements in the ``ivars`` array.
:param indices: List of indices with a length of at least ``nelems``.
:param status: Array of length ``nelems`` to exclude elements from the wait.
:param cmp: Operation for the comparison.
:param val: Value to compare.
:returns: The number of indices returned in the indices array. If the wait set is empty, this routine returns ``0``.
**Description:**
Block the caller until any of the conditions ``(ivars[i] cmp val)`` is true.
This routine blocks the caller until any of the conditions ``(ivars[i] cmp val)`` is true.
Valid `cmp` values can be seen at :ref:`CMP_VALUES`.
Valid ``cmp`` values are listed in :ref:`CMP_VALUES`.
Valid ``TYPENAME`` and ``TYPE`` values can be seen at :ref:`STANDARD_AMO_TYPES`.
Valid ``TYPENAME`` and ``TYPE`` values are listed in :ref:`STANDARD_AMO_TYPES`.
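The selection rule behind the ``some`` variant can be illustrated with a host-side model. This is only a sketch in plain C++, not rocSHMEM code: the ``Cmp`` enum and the ``model_*`` names are hypothetical, the function evaluates the conditions once instead of blocking, and it assumes the OpenSHMEM convention that a nonzero ``status[i]`` excludes element ``i`` from the wait set (a null ``status`` is treated as "exclude nothing").

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the comparison constants; the real names
// are the ones listed under CMP_VALUES.
enum class Cmp { EQ, NE, GT, GE, LT, LE };

// Evaluate (lhs cmp rhs) for a single element.
inline bool model_compare(long lhs, Cmp cmp, long rhs) {
  switch (cmp) {
    case Cmp::EQ: return lhs == rhs;
    case Cmp::NE: return lhs != rhs;
    case Cmp::GT: return lhs > rhs;
    case Cmp::GE: return lhs >= rhs;
    case Cmp::LT: return lhs < rhs;
    case Cmp::LE: return lhs <= rhs;
  }
  return false;
}

// One evaluation pass of the "some" rule: write the index of every
// non-excluded element that satisfies the condition into `indices`
// and return how many were written. Returns 0 for an empty wait set.
inline std::size_t model_select_some(const long *ivars, std::size_t nelems,
                                     std::size_t *indices, const int *status,
                                     Cmp cmp, long val) {
  assert(indices != nullptr);
  std::size_t count = 0;
  for (std::size_t i = 0; i < nelems; ++i) {
    // Assumption: nonzero status[i] removes element i from the wait set.
    if (status != nullptr && status[i] != 0) continue;
    if (model_compare(ivars[i], cmp, val)) indices[count++] = i;
  }
  return count;
}
```

The real device routine keeps polling until at least one condition holds; the model only shows which indices would be reported once that happens.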
ROCSHMEM_TEST
-------------
.. cpp:function:: __device__ int rocshmem_TYPENAME_test(TYPE *ivars, int cmp, TYPE val)
:param ivars: Pointer to memory on the symmetric heap to wait for
:param cmp: Operation for the comparison
:param val: Value to compare the memory at ivars to
:param ivars: Pointer to memory on the symmetric heap to wait for.
:param cmp: Operation for the comparison.
:param val: Value to compare the memory at ``ivars`` to.
:returnS: 1 if the evaluation is true, 0 otherwise
:returns: ``1`` if the evaluation is true, ``0`` otherwise.
**Description:**
Test if the condition ``(*ivars cmp val)`` is true.
This routine tests if the condition ``(*ivars cmp val)`` is true.
SUPPORTED COMPARISONS
Supported comparisons
---------------------
.. _CMP_VALUES:
The following table lists the point-to-point comparison constants:
.. list-table:: Point-to-Point Comparison Constants
:widths: 20 20
:header-rows: 1
+48 -47
View file
@@ -5,15 +5,14 @@
.. _rocshmem-api-rma:
-----------------------------------------
Remote Memory Access Routines
Remote memory access routines
-----------------------------------------
- Routines with the ``_wave`` and ``_wg`` suffixes,
require all threads in a wavefront and workgroup, respectively,
to call into the routine with the same parameters.
- Routines with the ``_wave`` and ``_wg`` suffixes require all threads in a wavefront and workgroup, respectively,
to call the routine with the same parameters.
- Routines with the ``_nbi`` substring will return as soon as the request is posted.
- Routines without the ``_nbi`` substring block until the operation completes locally.
- Valid ``TYPENAME`` and ``TYPE`` values can be seen in RMA_TYPES_.
- Routines without the ``_nbi`` substring will block until the operation completes locally.
- Valid ``TYPENAME`` and ``TYPE`` values can be found in RMA_TYPES_.
ROCSHMEM_PUT
------------
@@ -31,15 +30,15 @@ ROCSHMEM_PUT
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_put_nbi_wave(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, int pe)
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_put_nbi_wg(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, int pe)
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param source: Source address; Must be an address on the symmetric heap
:param nelems: The number of elements to transfer
:param pe: PE of the remote process
:returns: None
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param source: Source address. Must be an address on the symmetric heap.
:param nelems: The number of elements to transfer.
:param pe: PE of the remote process.
:returns: None.
**Description:**
Writes contiguous data of nelems elements from source on the calling PE to dest at pe.
This routine writes contiguous data of ``nelems`` elements from source on the calling PE to ``dest`` at ``pe``.
ROCSHMEM_PUTMEM
---------------
@@ -57,16 +56,16 @@ ROCSHMEM_PUTMEM
.. cpp:function:: __device__ void rocshmem_ctx_putmem_nbi_wave(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, int pe)
.. cpp:function:: __device__ void rocshmem_ctx_putmem_nbi_wg(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, int pe)
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param source: Source address; Must be an address on the symmetric heap
:param nelems: Size of the transfer in bytes
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param source: Source address. Must be an address on the symmetric heap.
:param nelems: Size of the transfer in bytes.
:param pe: PE of the remote process.
:returns: None
:returns: None.
**Description:**
Writes contiguous data of nelems bytes from source on the calling PE to dest at pe.
This routine writes contiguous data of ``nelems`` bytes from source on the calling PE to ``dest`` at ``pe``.
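The practical difference between the typed ``put`` routines and ``putmem`` is the unit of ``nelems``: elements in the first case, bytes in the second. The following host-side sketch models only that size arithmetic with ``std::memcpy``; the ``model_*`` functions are hypothetical and involve no remote PE, context, or symmetric heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Model of a typed put: `nelems` counts elements of type T,
// so nelems * sizeof(T) bytes are copied.
template <typename T>
void model_put(T *dest, const T *source, std::size_t nelems) {
  assert(dest != nullptr && source != nullptr);
  std::memcpy(dest, source, nelems * sizeof(T));
}

// Model of putmem: `nelems` already is a byte count.
inline void model_putmem(void *dest, const void *source, std::size_t nelems) {
  assert(dest != nullptr && source != nullptr);
  std::memcpy(dest, source, nelems);
}
```

In this model, ``model_put(a, src, 2)`` on ``int`` buffers moves the same data as ``model_putmem(b, src, 2 * sizeof(int))``.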
ROCSHMEM_P
----------
@@ -74,15 +73,15 @@ ROCSHMEM_P
.. cpp:function:: __device__ void rocshmem_TYPENAME_p(TYPE *dest, TYPE value, int pe)
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_p(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, int pe)
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param value: Value to write to dest at pe
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param value: Value to write to ``dest`` at ``pe``.
:param pe: PE of the remote process.
:returns: None
:returns: None.
**Description:**
Writes a single value to dest at pe PE to dst at pe.
This routine writes a single value to ``dest`` at ``pe``.
ROCSHMEM_GET
------------
@@ -100,16 +99,16 @@ ROCSHMEM_GET
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_get_nbi_wave(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, int pe)
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_get_nbi_wg(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, int pe)
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param source: Source address; Must be an address on the symmetric heap
:param nelems: The number of elements to transfer
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param source: Source address. Must be an address on the symmetric heap.
:param nelems: The number of elements to transfer.
:param pe: PE of the remote process.
:returns: None
:returns: None.
**Description:**
Reads contiguous data of nelems elements from source on pe to dest on the calling PE.
This routine reads contiguous data of ``nelems`` elements from source on ``pe`` to ``dest`` on the calling PE.
ROCSHMEM_GETMEM
---------------
@@ -127,37 +126,39 @@ ROCSHMEM_GETMEM
.. cpp:function:: __device__ void rocshmem_ctx_getmem_nbi_wave(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, int pe)
.. cpp:function:: __device__ void rocshmem_ctx_getmem_nbi_wg(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, int pe)
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param source: Source address; Must be an address on the symmetric heap
:param nelems: Size of the transfer in bytes
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param source: Source address. Must be an address on the symmetric heap.
:param nelems: Size of the transfer in bytes.
:param pe: PE of the remote process.
:returns: None
:returns: None.
**Description:**
Reads contiguous data of nelems bytes from source on pe to dest on the calling PE.
This routine reads contiguous data of ``nelems`` bytes from source on ``pe`` to ``dest`` on the calling PE.
ROCSHMEM_G
----------
.. cpp:function:: __device__ float rocshmem_ctx_float_g(rocshmem_ctx_t ctx, const float *source, int pe)
.. cpp:function:: __device__ float rocshmem_float_g(const float *source, int pe)
:param ctx: Context with which to perform this operation
:param source: Source address; Must be an address on the symmetric heap
:param pe: PE of the remote process
:param ctx: Context with which to perform this operation.
:param source: Source address. Must be an address on the symmetric heap.
:param pe: PE of the remote process.
:returns: The value read from source at pe
:returns: The value read from source at ``pe``.
**Description:**
Reads and returns single value from source at pe.
This routine reads and returns a single value from source at ``pe``.
SUPPORTED RMA DATA TYPES
Supported RMA data types
------------------------
The following table lists the supported RMA data types:
.. _RMA_TYPES:
.. list-table:: RMA Datatypes
.. list-table:: RMA Data Types
:widths: 10 20 20
:header-rows: 1
+32 -32
View file
@@ -5,7 +5,7 @@
.. _rocshmem-api-sigops:
---------------------
Signaling Operations
Signaling operations
---------------------
ROCSHMEM_PUTMEM_SIGNAL
@@ -24,20 +24,20 @@ ROCSHMEM_PUTMEM_SIGNAL
.. cpp:function:: __device__ void rocshmem_ctx_putmem_signal_nbi_wave(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
.. cpp:function:: __device__ void rocshmem_ctx_putmem_signal_nbi_wg(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param source: Source address; Must be an address on the symmetric heap
:param nelems: The number of bytes to transfer
:param sig_addr: Signal address; Must be an address on the symmetric heap
:param signal: Signal value
:param sig_op: Atomic operation to apply the signal value
:param pe: PE of the remote process
:returns: None
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param source: Source address. Must be an address on the symmetric heap.
:param nelems: The number of bytes to transfer.
:param sig_addr: Signal address. Must be an address on the symmetric heap.
:param signal: Signal value.
:param sig_op: Atomic operation to apply the signal value.
:param pe: PE of the remote process.
:returns: None.
**Description:**
Writes contiguous data of nelems bytes from source on the calling PE to dest at pe.
Then applies sig_op at sig_addr using the signal value.
Valid sig_op values can be seen at SIGNAL_OPERATORS_.
This function writes contiguous data of ``nelems`` bytes from source on the calling PE to ``dest`` at ``pe``,
then applies ``sig_op`` at ``sig_addr`` with the signal value.
Valid ``sig_op`` values are listed in SIGNAL_OPERATORS_.
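The "write the data, then update the signal" sequence can be modeled on the host with ``std::atomic``. This is a sketch under stated assumptions, not the library implementation: ``SigOp`` and the ``model_*`` functions are hypothetical stand-ins, the copy is local rather than remote, and release/acquire ordering is used here only to mimic the idea that an observer of the signal also sees the data.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for ROCSHMEM_SIGNAL_SET and ROCSHMEM_SIGNAL_ADD.
enum class SigOp { Set, Add };

// Model of putmem_signal: copy `nelems` bytes, then apply `sig_op`
// to the signal word using `signal` as the operand.
inline void model_putmem_signal(void *dest, const void *source,
                                std::size_t nelems,
                                std::atomic<std::uint64_t> *sig_addr,
                                std::uint64_t signal, SigOp sig_op) {
  assert(sig_addr != nullptr);
  std::memcpy(dest, source, nelems);
  if (sig_op == SigOp::Set) {
    sig_addr->store(signal, std::memory_order_release);
  } else {
    sig_addr->fetch_add(signal, std::memory_order_release);
  }
}

// Model of signal_fetch: atomically read the signal word.
inline std::uint64_t model_signal_fetch(
    const std::atomic<std::uint64_t> *sig_addr) {
  return sig_addr->load(std::memory_order_acquire);
}
```

With the add operator, the signal word accumulates across calls, which suits counting arrivals from several senders; with the set operator, each call overwrites the previous value.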
ROCSHMEM_PUT_SIGNAL
-------------------
@@ -55,21 +55,21 @@ ROCSHMEM_PUT_SIGNAL
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_put_signal_nbi_wave(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_put_signal_nbi_wg(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
:param ctx: Context with which to perform this operation
:param dest: Destination address; Must be an address on the symmetric heap
:param source: Source address; Must be an address on the symmetric heap
:param nelems: The number of elements of size TYPE to transfer
:param sig_addr: Signal address; Must be an address on the symmetric heap
:param signal: Signal value
:param sig_op: Atomic operation to apply the signal value
:param pe: PE of the remote process
:returns: None
:param ctx: Context with which to perform this operation.
:param dest: Destination address. Must be an address on the symmetric heap.
:param source: Source address. Must be an address on the symmetric heap.
:param nelems: The number of elements of size ``TYPE`` to transfer.
:param sig_addr: Signal address. Must be an address on the symmetric heap.
:param signal: Signal value.
:param sig_op: Atomic operation to apply the signal value.
:param pe: PE of the remote process.
:returns: None.
**Description:**
Writes contiguous data of nelems elements of TYPE from source on the calling PE to dest at pe.
Then applies sig_op at sig_addr using the signal value.
Valid sig_op values can be seen at SIGNAL_OPERATORS_.
Valid ``TYPENAME`` and ``TYPE`` values can be seen at :ref:`RMA_TYPES`.
This function writes contiguous data of ``nelems`` elements of ``TYPE`` from source on the calling PE to ``dest`` at ``pe``,
then applies ``sig_op`` at ``sig_addr`` with the signal value.
Valid ``sig_op`` values are listed in SIGNAL_OPERATORS_.
Valid ``TYPENAME`` and ``TYPE`` values are listed in :ref:`RMA_TYPES`.
ROCSHMEM_SIGNAL_FETCH
---------------------
@@ -78,13 +78,13 @@ ROCSHMEM_SIGNAL_FETCH
.. cpp:function:: __device__ uint64_t rocshmem_signal_fetch_wg(const uint64_t *sig_addr)
.. cpp:function:: __device__ uint64_t rocshmem_signal_fetch_wave(const uint64_t *sig_addr)
:param sig_addr: Signal address; Must be an address on the symmetric heap
:returns: Value at sig_addr
:param sig_addr: Signal address. Must be an address on the symmetric heap.
:returns: Value at ``sig_addr``.
**Description:**
Atomically fetches the value stored at sig_addr.
This function atomically fetches the value stored at ``sig_addr``.
SIGNAL OPERATORS
Signal operators
----------------
.. _SIGNAL_OPERATORS:
@@ -95,7 +95,7 @@ SIGNAL OPERATORS
* - Value
- Description
* - ROCSHMEM_SIGNAL_SET
- The signaling operation routines will atomical set our signal value at sig_addr.
- The signaling operation routines will atomically set the signal value at ``sig_addr``.
* - ROCSHMEM_SIGNAL_ADD
- The signaling operation routines will atomical add our signal value at sig_addr.
- The signaling operation routines will atomically add the signal value at ``sig_addr``.
+30 -30
View file
@@ -5,7 +5,7 @@
.. _rocshmem-api-teams:
-------------------------
Team Management Routines
Team management routines
-------------------------
ROCSHMEM_TEAM_MY_PE
@@ -13,78 +13,78 @@ ROCSHMEM_TEAM_MY_PE
.. cpp:function:: __host__ int rocshmem_team_my_pe(rocshmem_team_t team)
:param team: The team to query
:returns: PE ID of the caller in the provided team
:param team: The team to query.
:returns: PE ID of the caller in the provided team.
**Description:**
Query the PE ID of the caller in a team.
This routine queries the PE ID of the caller in a team.
ROCSHMEM_TEAM_N_PES
-------------------
.. cpp:function:: __host__ int rocshmem_team_n_pes(rocshmem_team_t team)
:param team: The team to query
:returns: Number of PEs in the provided team
:param team: The team to query.
:returns: Number of PEs in the provided team.
**Description:**
Query the number of PEs in a team.
This routine queries the number of PEs in a team.
ROCSHMEM_TEAM_TRANSLATE_PE
--------------------------
.. cpp:function:: __host__ int rocshmem_team_translate_pe(rocshmem_team_t src_team, int src_pe, rocshmem_team_t dest_team)
:param src_team: Handle of the team from which to translate
:param src_pe: PE-of-interest's index in src_team
:param dest_team: Handle of the team to which to translate
:returns: PE of src_pe in dest_team;
If any input is invalid or if src_pe is
not in both source and destination teams, a value of -1 is returned
:param src_team: Handle of the team from which to translate.
:param src_pe: PE-of-interest's index in ``src_team``.
:param dest_team: Handle of the team to which to translate.
:returns: PE of ``src_pe`` in ``dest_team``.
If any input is invalid or if ``src_pe`` is
not in both source and destination teams, a value of ``-1`` is returned.
**Description:**
Translate the PE in src_team to that in dest_team.
This routine translates the PE in ``src_team`` to that in ``dest_team``.
ROCSHMEM_TEAM_SPLIT_STRIDED
---------------------------
.. cpp:function:: __host__ int rocshmem_team_split_strided(rocshmem_team_t parent_team, int start, int stride, int size, const rocshmem_team_config_t *config, long config_mask, rocshmem_team_t *new_team)
:param parent_team: The team to split from
:param parent_team: The team to split from.
:param start: The lowest PE number of the subset of the PEs
from the parent team that will form the new
team
team.
:param stride: The stride between team PE members in the
parent team that comprise the subset of PEs
that will form the new team
:param size: The number of PEs in the new team
:param config: Pointer to the config parameters for the new team
:param config_mask: Bitwise mask representing parameters to use from config
:param new_team: Pointer to the newly created team;
that will form the new team.
:param size: The number of PEs in the new team.
:param config: Pointer to the config parameters for the new team.
:param config_mask: Bitwise mask representing parameters to use from config.
:param new_team: Pointer to the newly created team.
If an error occurs during team creation, or if the PE in
the parent team is not in the new team, the value will be
ROCSHMEM_TEAM_INVALID
``ROCSHMEM_TEAM_INVALID``.
:returns: Zero upon successful team creation; non-zero if erroneous
:returns: Zero upon successful team creation; non-zero if erroneous.
**Description:**
Create a new a team of PEs. Must be called by all PEs in the parent team.
This routine creates a new team of PEs. It must be called by all PEs in the parent team.
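The ``start``, ``stride``, and ``size`` arguments determine which parent-team PEs join the new team, and the same arithmetic explains when a PE translation returns ``-1``. The sketch below is a plain C++ model with hypothetical names (``ModelTeam``, ``model_*``); it assumes a positive stride and that both teams were split from the same parent, and it does not call rocSHMEM.

```cpp
#include <cassert>

// A strided subset of a parent team: members are
// start, start + stride, ..., start + (size - 1) * stride.
struct ModelTeam {
  int start;
  int stride;
  int size;
};

// Parent-team PE number of member `idx`, or -1 if `idx` is out of range.
inline int model_member_pe(const ModelTeam &t, int idx) {
  if (idx < 0 || idx >= t.size) return -1;
  return t.start + idx * t.stride;
}

// Index of `parent_pe` inside team `t`, or -1 if it is not a member.
inline int model_index_of(const ModelTeam &t, int parent_pe) {
  assert(t.stride > 0);  // assumption of this model
  const int offset = parent_pe - t.start;
  if (offset < 0 || offset % t.stride != 0) return -1;
  const int idx = offset / t.stride;
  return idx < t.size ? idx : -1;
}

// Model of translating `src_pe` from `src` to `dest`:
// -1 when the PE is not a member of both teams.
inline int model_translate_pe(const ModelTeam &src, int src_pe,
                              const ModelTeam &dest) {
  const int parent_pe = model_member_pe(src, src_pe);
  return parent_pe < 0 ? -1 : model_index_of(dest, parent_pe);
}
```

For example, a team built with ``start = 0``, ``stride = 2``, ``size = 4`` contains parent PEs 0, 2, 4, and 6 in this model.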
ROCSHMEM_TEAM_DESTROY
---------------------
.. cpp:function:: __host__ void rocshmem_team_destroy(rocshmem_team_t team)
:param team: The team to destroy; The behavior is undefined if
the input team is ROCSHMEM_TEAM_WORLD or any other
invalid team; If the input is ROCSHMEM_TEAM_INVALID,
this function will not perform any operation
:param team: The team to destroy. The behavior is undefined if
the input team is ``ROCSHMEM_TEAM_WORLD`` or any other
invalid team. If the input is ``ROCSHMEM_TEAM_INVALID``,
this function will not perform any operation.
:returns: None
**Description:**
Destroy a team. Must be called by all PEs in the team.
The user must destroy all private contexts created in the
This routine destroys a team. It must be called by all PEs in the team.
You must destroy all private contexts created in the
team before destroying this team. Otherwise, the behavior
is undefined. This call will destroy only the shareable contexts
created from the referenced team.
+32 -29
View file
@@ -1,24 +1,26 @@
-------------------------
Running rocSHMEM Programs
-------------------------
.. meta::
:description: Information on how to compile and run rocSHMEM programs.
:keywords: rocSHMEM, ROCm, library, API, compile, link, hipcc
Compiling and Linking with rocSHMEM
.. _running-programs:
--------------------------------------------------
Compiling and running rocSHMEM programs
--------------------------------------------------
This topic explains how to compile and run rocSHMEM programs.
Compiling and linking with rocSHMEM
-----------------------------------
RocSHMEM is built as a library that can be statically
linked to your application during compilation using ``hipcc``.
rocSHMEM is a library that can be statically linked to your application during compilation with ``hipcc``. For more information, see :doc:`HIPCC <hipcc:index>`.
During the compilation of your application, include the rocSHMEM header files
and the rocSHMEM library when using ``hipcc``.
Since rocSHMEM depends on MPI (in version 6.4.0, this requirement may be dropped
in future versions) you will need to link with an MPI library.
The arguments for MPI linkage must be added manually as opposed to using ``mpicc``.
When compiling your application with ``hipcc``, you must include the rocSHMEM header files and the rocSHMEM library.
Because rocSHMEM depends on MPI, you must manually add the arguments for MPI linkage instead of using ``mpicc``.
When using ``hipcc`` directly (as opposed to through a build system), we
recommend performing the compilation and linking steps separately.
When using ``hipcc`` directly without a build system, it's recommended to perform the compilation and linking steps separately.
For example, one can refer to how to compile the examples files (``./examples/*`` in
the source tarball) with the following compile and link commands:
Example compile and link commands are provided at the top of the example files in the ``examples`` directory:
.. code-block:: bash
@@ -34,32 +36,33 @@ the source tarball) with the following compile and link commands:
$OPENMPI_UCX_INSTALL_DIR/lib/libmpi.so \
-L/opt/rocm/lib -lamdhip64 -lhsa-runtime64
If your project uses cmake, you may refer to
If your project uses CMake, see
`Using CMake with AMD ROCm <https://rocmdocs.amd.com/en/latest/conceptual/cmake-packages.html>`_.
Running a rocSHMEM program
--------------------------
Program that use rocSHMEM will typically deploy multiple processes (Typically, one per GPU).
The MPI launcher (e.g., ``mpiexec`` when using Open MPI) is used to start the required number
of processes. As an example, one may launch 2 getmem example processes (available when compiled from source) using the following command line:
Programs using rocSHMEM typically deploy multiple processes, usually one per GPU.
The MPI launcher, for example, ``mpiexec`` with Open MPI, is used to start the required number
of processes. For example, to launch two ``getmem`` example processes (available when compiled from source):
.. code-block:: bash
mpiexec --map-by numa --mca pml ucx --mca osc ucx -np 2 ./build/examples/rocshmem_getmem_test
Please refer to the Open MPI documentation for more information about ``mpiexec`` command line parameters.
See the `Open MPI documentation <https://docs.open-mpi.org/en/main/>`_ for more information about ``mpiexec`` command line parameters.
.. note::
Some systems may have multiple installs of MPI, some of which would not
have GPU support enabled. Make sure you use the ``mpiexec`` from the expected
MPI library, notably when using the MPI you built yourself
Some systems may have multiple MPI installations, some of which do not
have GPU support enabled. You must use the ``mpiexec`` from the expected
MPI library, especially when using an MPI library that you built yourself
as part of :ref:`install-dependencies`.
Environment Variables
Environment variables
---------------------
The behavior of rocSHMEM can be controlled with the following environment variables:
You can control the behavior of rocSHMEM by using the following environment variables:
.. list-table:: Environment Variables
:widths: 30 10 20
@@ -69,12 +72,12 @@ The behavior of rocSHMEM can be controlled with the following environment variab
- Default Value
- Description
* - ROCSHMEM_HEAP_SIZE
- 1 GB
- Defines the size of the rocSHMEM symmetric heap.
- 1
- Defines the size of the rocSHMEM symmetric heap in GB.
Note the heap is on the GPU memory.
* - ROCSHMEM_MAX_NUM_CONTEXTS
- 1024
- Defines the number of contexts an application can use
- Defines the number of contexts an application can use.
* - ROCSHMEM_MAX_NUM_TEAMS
- 40
- Defines the number of teams an application can use
- Defines the number of teams an application can use.
+2 -2
View file
@@ -13,10 +13,10 @@ with open('../include/rocshmem/rocshmem.hpp', encoding='utf-8') as f:
if not match:
raise ValueError("VERSION not found!")
version_number = match[1]
left_nav_title = f"rocSHMEM {version_number} Documentation"
left_nav_title = f"rocSHMEM {version_number} documentation"
# for PDF output on Read the Docs
project = "rocSHMEM Documentation"
project = "rocSHMEM"
author = "Advanced Micro Devices, Inc."
copyright = "Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved."
version = version_number
+16 -21
View file
@@ -1,17 +1,12 @@
.. meta::
:description: rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
:keywords: rocSHMEM, API, ROCm, documentation, HIP, Networking, Communication
:description: rocSHMEM is a runtime that provides GPU-centric networking through an OpenSHMEM-like interface.
:keywords: rocSHMEM, ROCm, OpenSHMEM, library, API, IPC, RO
****************************
rocSHMEM Documentation
rocSHMEM documentation
****************************
The ROCm OpenSHMEM (rocSHMEM) runtime is part of an AMD and AMD Research initiative
to provide GPU-centric networking through an OpenSHMEM-like interface.
This intra-kernel networking library simplifies application code complexity and
enables more fine-grained communication/computation overlap
than traditional host-driven networking.
rocSHMEM uses a single symmetric heap (SHEAP) that is allocated on GPU memories. To learn more, see :doc:`introduction`
The ROCm OpenSHMEM (rocSHMEM) runtime is an intra-kernel networking library that provides GPU-centric networking through an `OpenSHMEM-like <http://www.openshmem.org/site/>`_ interface. It simplifies application code and enables finer-grained overlap of communication and computation than traditional host-driven networking. rocSHMEM uses a single symmetric heap (SHEAP) allocated in GPU memory. For more information, see :doc:`introduction`.
The code is open and hosted at `<https://github.com/ROCm/rocSHMEM>`_.
@@ -24,20 +19,20 @@ The code is open and hosted at `<https://github.com/ROCm/rocSHMEM>`_.
.. grid-item-card:: How to
* :doc:`Compile and Run rocSHMEM Programs <./compile_and_run>`
* :doc:`Compile and run rocSHMEM programs <./compile_and_run>`
.. grid-item-card:: API Reference
.. grid-item-card:: API reference
* :doc:`Library Setup, Exit, and Query Routines <./api/init>`
* :doc:`Memory Management Routines <./api/memory_management>`
* :doc:`Team Management Routines <./api/teams>`
* :doc:`Context Management Routines <./api/ctx>`
* :doc:`Remote Memory Access Routines <./api/rma>`
* :doc:`Atomic Memory Operations <./api/amo>`
* :doc:`Signaling Operations <./api/sigops>`
* :doc:`Collective Routines <./api/coll>`
* :doc:`Point-to-Point Synchronization Routines <./api/pt2pt_sync>`
* :doc:`Memory Ordering Routines <./api/memory_ordering>`
* :doc:`Library setup, exit, and query routines <./api/init>`
* :doc:`Memory management routines <./api/memory_management>`
* :doc:`Team management routines <./api/teams>`
* :doc:`Context management routines <./api/ctx>`
* :doc:`Remote memory access routines <./api/rma>`
* :doc:`Atomic memory operations <./api/amo>`
* :doc:`Signaling operations <./api/sigops>`
* :doc:`Collective routines <./api/coll>`
* :doc:`Point-to-point synchronization routines <./api/pt2pt_sync>`
* :doc:`Memory ordering routines <./api/memory_ordering>`
To contribute to the documentation, refer to
`Contributing to ROCm <https://rocm.docs.amd.com/en/latest/contribute/contributing.html>`_.
+28 -37
View file
@@ -1,6 +1,6 @@
.. meta::
:description: rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
:keywords: rocSHMEM, API, ROCm, documentation, HIP, Networking, Communication
:description: Instructions on how to install rocSHMEM.
:keywords: rocSHMEM, ROCm, install, build, dependencies, MPI, UCX, Open MPI
.. _install-rocshmem:
@@ -10,27 +10,23 @@ Installing rocSHMEM
This topic describes how to install rocSHMEM.
The file `README.md <https://github.com/ROCm/rocSHMEM/blob/rocm-6.4.0/README.md>`_ in the rocSHMEM sources may contain additional information.
Requirements
---------------------------
1. ROCm stack installed on the system (HIP runtime)
* ROCm 6.4.0 or later, including the :doc:`HIP runtime <hip:index>`.
* ROCm v6.4.0 or later
2. AMD GPUs
* AMD GPUs
* MI250X
* MI300X
3. ROCm-aware Open MPI and UCX as described in Building Dependencies
* ROCm-aware Open MPI and UCX. For more information, see :ref:`install-dependencies`.
Installing from a Package Manager
Installing from a package manager
---------------------------------
On Ubuntu, rocSHMEM can be installed with the following command:
On Ubuntu, you can install rocSHMEM by running:
.. code-block:: bash
@@ -38,22 +34,16 @@ On Ubuntu, rocSHMEM can be installed with the following command:
.. note::
This installation method requires ROCm 6.4 or newer. Dependencies
(open MPI and UCX) still need to be built following the instructions
in the next paragraph, as the distribution packaged versions do not
include full accelerator support.
This installation method requires ROCm 6.4 or later. You must manually build dependencies such as Open MPI and UCX, because the distribution-packaged versions don't include full accelerator support. For more information, see :ref:`install-dependencies`.
.. _install-dependencies:
Building Dependencies
Building dependencies
---------------------------
rocSHMEM requires a ROCm-Aware Open MPI and UCX.
Other MPI implementations, such as MPICH,
*should* be compatible, if rocSHMEM is built from source,
but it has not been thoroughly tested.
rocSHMEM requires ROCm-aware Open MPI and UCX. Other MPI implementations, such as MPICH, have not been fully tested.
To build and configure ROCm-Aware UCX (1.17.0 or later), you need to:
To build and configure ROCm-aware UCX 1.17.0 or later, run:
.. code-block:: bash
@@ -64,7 +54,7 @@ To build and configure ROCm-Aware UCX (1.17.0 or later), you need to:
make -j 8
make -j 8 install
Then, you need to build Open MPI (5.0.7 or later) with UCX support.
To build Open MPI 5.0.7 or later with UCX support, run:
.. code-block:: bash
@@ -75,23 +65,24 @@ Then, you need to build Open MPI (5.0.7 or later) with UCX support.
make -j 8
make -j 8 install
Alternatively, we have script to install dependencies.
Configuration options are platform dependent, so please review the script to
check for fitness with your system.
Alternatively, you can use a script to install dependencies:
.. code-block:: bash
export BUILD_DIR=/path/to/not_rocshmem_src_or_build/dependencies
/path/to/rocshmem_src/scripts/install_dependencies.sh
For more information on OpenMPI-UCX support, please visit:
`GPU-enabled Message Passing Interface <https://rocm.docs.amd.com/en/latest/how-to/gpu-enabled-mpi.html>`_
.. note::
Installing rocSHMEM from Source
Configuration options vary by platform. Review the script to ensure it is compatible with your system.
For more information about OpenMPI-UCX support, see
`GPU-enabled Message Passing Interface <https://rocm.docs.amd.com/en/latest/how-to/gpu-enabled-mpi.html>`_.
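The minimum versions named in this topic (ROCm 6.4.0, UCX 1.17.0, and Open MPI 5.0.7) can be compared against what is installed with a small helper. The following is a minimal sketch, not part of rocSHMEM: the function names ``version_ge`` and ``check_min`` are hypothetical, the version strings are typed in by hand rather than queried from the system, and the comparison assumes a ``sort`` that supports ``-V`` (GNU coreutils).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; they are not shipped with rocSHMEM.

# version_ge HAVE NEED: succeed when HAVE >= NEED for dotted numeric versions.
# Assumes GNU sort, whose -V option orders version strings numerically.
version_ge() {
    local have="$1" need="$2"
    # After a version sort the smaller string comes first, so HAVE >= NEED
    # exactly when NEED is the first line.
    [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# check_min NAME HAVE NEED: report whether HAVE meets the minimum NEED.
check_min() {
    local name="$1" have="$2" need="$3"
    if version_ge "$have" "$need"; then
        echo "$name $have: OK (>= $need)"
    else
        echo "$name $have: too old (need >= $need)"
        return 1
    fi
}

# Example inputs only; substitute the versions reported on your system.
check_min "ROCm" "6.4.0" "6.4.0"
check_min "UCX" "1.17.0" "1.17.0"
check_min "Open MPI" "5.0.6" "5.0.7" || true
```

A plain string comparison would rank ``1.9.0`` above ``1.17.0``, which is why the sketch relies on a version-aware sort instead.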
Installing rocSHMEM from source
--------------------------------
The following method can be used to build and install rocSHMEM with the IPC
on-node, GPU-to-GPU backend:
To build and install rocSHMEM with the IPC on-node, GPU-to-GPU backend, run:
.. code-block:: bash
@@ -101,14 +92,14 @@ on-node, GPU-to-GPU backend:
cd build
../scripts/build_configs/ipc_single
The build script passes configuration options to CMake to setup a canonical
build.
There are other scripts for experimental configurations in the
`./scripts/build_configs` directory, but currently, only `ipc_single`
is supported.
The build script passes configuration options to CMake to set up a canonical build.
By default, the library is installed in `~/rocshmem`. You may provide a
custom install path by supplying it as an argument. For example:
.. note::
Other experimental configuration scripts are available in ``./scripts/build_configs``, but only ``ipc_single`` is currently supported.
By default, the library is installed in ``~/rocshmem``. You can customize the installation path by running:
.. code-block:: bash
+18 -27
@@ -1,5 +1,5 @@
.. meta::
:description: rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
:description: rocSHMEM intra-kernel networking runtime for AMD GPUs on the ROCm platform.
:keywords: rocSHMEM, API, ROCm, documentation, HIP, Networking, Communication
.. _rocshmem-introduction:
@@ -8,47 +8,38 @@
What is rocSHMEM?
---------------------------
The ROCm OpenSHMEM (rocSHMEM) runtime is part of an AMD and AMD Research initiative
to provide GPU-centric networking through an OpenSHMEM-like interface.
This intra-kernel networking library simplifies application code complexity and
enables more fine-grained communication/computation overlap
than traditional host-driven networking.
rocSHMEM uses a single symmetric heap (SHEAP) that is allocated on GPU memories.
The ROCm OpenSHMEM (rocSHMEM) runtime is an intra-kernel networking library that provides GPU-centric networking through an OpenSHMEM-like interface. It reduces application code complexity and enables finer-grained overlap of communication and computation than traditional host-driven networking. rocSHMEM uses a single symmetric heap (SHEAP) allocated in GPU memory.
The code is open and hosted at `<https://github.com/ROCm/rocSHMEM>`_.
The rocSHMEM Programming Model
The rocSHMEM programming model
-------------------------------
Defining how OpenSHMEM applications interact with GPUs remains an
ongoing active discussion within the OpenSHMEM community, and the OpenSHMEM
active discussion within the `OpenSHMEM <http://www.openshmem.org/site/>`_ community, and the OpenSHMEM
specification has yet to coalesce on this topic.
rocSHMEM extends beyond the OpenSHMEM specification to add semantic that
support GPU kernel communication, while maintaining close resemblance to
rocSHMEM extends beyond the OpenSHMEM specification to add semantics that
support GPU kernel communication while maintaining close resemblance to
the original OpenSHMEM specification semantics.
Applications that use HIP can be easily interface with rocSHMEM.
As per the HIP programming model,
rocSHMEM has `__host__` APIs which are to be called from host code,
and `__device__` APIs which can be called within GPU Kernels.
Any device APIs which do not have any special suffixes/infixes (e.g. `_wg` or `_wave`)
Applications using :doc:`HIP <hip:index>` can interface with rocSHMEM.
Using the HIP programming model,
rocSHMEM provides ``__host__`` APIs for host code,
and ``__device__`` APIs for GPU kernels.
Device APIs without special suffixes or infixes, such as ``_wg`` or ``_wave``,
must be called by a single thread.
GPU specific `_wg` and `_wave` APIs are expected to be called from multiple GPU threads
and block until the calling scope completes.
These APIs can be called in divergent code paths but this is not recommended.
GPU-specific ``_wg`` and ``_wave`` APIs are designed to be called by multiple GPU threads
and will block until the calling scope completes.
These APIs can be called in divergent code paths, but this is not recommended.
Wavefront APIs
==============
The wavefront APIs are any API calls that have the suffix `_wave`.
Wavefront APIs are those with the ``_wave`` suffix.
The parameters with which these routines are called must be
the same for every thread in the wavefront.
If any thread calls these routines with differing parameters, the behavior is undefined.
These APIs will block until the calling wavefront completes.
The behavior is undefined if any thread calls these routines with different parameters. These APIs will block until the calling wavefront is complete.
Workgroup APIs
==============
The workgroup APIs are any API calls that have the suffix `_wg` or infix `_wg_`.
The workgroup APIs have the ``_wg`` suffix or ``_wg_`` infix.
The parameters with which these routines are called must be
the same for every thread in the workgroup.
If any thread calls these routines with differing parameters, the behavior is undefined.
These APIs will block until the calling workgroup completes.
The behavior is undefined if any thread calls these routines with different parameters. These APIs will block until the calling workgroup is complete.
+12 -12
@@ -16,30 +16,30 @@ subtrees:
- caption: How to
entries:
- file: compile_and_run.rst
title: Compile and Run rocSHMEM Programs
title: Compile and run rocSHMEM programs
- caption: API Reference
- caption: API reference
entries:
- file: api/init.rst
title: Library Setup, Exit, and Query Routines
title: Library setup, exit, and query routines
- file: api/memory_management.rst
title: Memory Management Routines
title: Memory management routines
- file: api/teams.rst
title: Team Management Routines
title: Team management routines
- file: api/ctx.rst
title: Context Management Routines
title: Context management routines
- file: api/rma.rst
title: Remote Memory Access Routines
title: Remote memory access routines
- file: api/amo.rst
title: Atomic Memory Operations
title: Atomic memory operations
- file: api/sigops.rst
title: Signaling Operations
title: Signaling operations
- file: api/coll.rst
title: Collective Routines
title: Collective routines
- file: api/pt2pt_sync.rst
title: Point-to-Point Synchronization Routines
title: Point-to-point synchronization routines
- file: api/memory_ordering.rst
title: Memory Ordering Routines
title: Memory ordering routines
- caption: About
entries: