Initial ROCm-docs (#92)
* Initial ROCm-docs commit Co-authored-by: Aurélien Bouteiller <bouteill@icl.utk.edu> Co-authored-by: Alex Xu <alex.xu@amd.com> Co-authored-by: yugang-amd <yugang.wang@amd.com>
@@ -0,0 +1,5 @@
_build/
_doxygen/
doxygen/html/
doxygen/xml/
sphinx/_toc.yml
@@ -0,0 +1,21 @@
# Building the rocSHMEM documentation

## macOS

To build the HTML documentation locally:

```
brew install doxygen sphinx-doc
pip3.10 install -r ./requirements.txt
python3.10 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
open _build/html/index.html
```

Building the PDF documentation requires a LaTeX installation on your machine.
Once LaTeX is installed, run the following:

```
pip3.10 install -r ./requirements.txt
sphinx-build -M latexpdf . _build
open _build/latex/rocshmem.pdf
```
@@ -0,0 +1,419 @@
|
||||
.. meta::
|
||||
:description: rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
|
||||
:keywords: rocSHMEM, API, ROCm, documentation, HIP, Networking, Communication
|
||||
|
||||
.. _rocshmem-api-amo:
|
||||
|
||||
---------------------------
|
||||
Atomic Memory Operations
|
||||
---------------------------
|
||||
|
||||
- These functions can be called from divergent control paths at per-thread
|
||||
granularity.
|
||||
|
||||
ROCSHMEM_ATOMIC_FETCH
---------------------
|
||||
.. cpp:function:: __device__ TYPE rocshmem_TYPENAME_atomic_fetch(TYPE *source, int pe)
|
||||
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_fetch(rocshmem_ctx_t ctx, TYPE *source, int pe)
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param source: Source address; must be an address on the symmetric heap
:param pe: PE of the remote process

:returns: The value of ``source`` on ``pe``

**Description:**
Atomically fetches the value of ``source`` on ``pe`` and returns it to the calling PE.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen in EXTENDED_AMO_TYPES_.
|
||||
|
||||
|
||||
SHMEM_ATOMIC_SET
|
||||
----------------
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_atomic_set(TYPE *dest, TYPE value, int pe)
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_atomic_set(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, int pe)
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param value: The value to be atomically set
:param pe: PE of the remote process

:returns: None

**Description:**
Atomically sets ``dest`` on ``pe`` to ``value``.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen in EXTENDED_AMO_TYPES_.
|
||||
|
||||
SHMEM_ATOMIC_COMPARE_SWAP
|
||||
-------------------------
|
||||
|
||||
.. cpp:function:: __device__ TYPE rocshmem_TYPENAME_atomic_compare_swap(TYPE *dest, TYPE cond, TYPE value, int pe)
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_compare_swap(rocshmem_ctx_t ctx, TYPE *dest, TYPE cond, TYPE value, int pe)
|
||||
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param cond: The value to compare against
:param value: The value to be atomically swapped in
:param pe: PE of the remote process

:return: The old value of dest

**Description:**
Atomically compares the value at ``dest`` on ``pe`` with ``cond`` and, if the two are equal, writes ``value`` to ``dest``.
The operation returns the old value of ``dest`` to the calling PE.
|
||||
The operation is blocking.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen in STANDARD_AMO_TYPES_.
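
The compare-and-swap rule above can be illustrated with a host-only model. This is a sketch that uses ``std::atomic`` as a stand-in for the remote symmetric variable; ``model_compare_swap`` and ``compare_swap_example`` are hypothetical names and are not part of rocSHMEM.

```cpp
#include <atomic>
#include <cassert>

// Host-only model of the documented semantics (NOT a rocSHMEM call).
// `dest` stands in for the symmetric variable on the target PE.
template <typename T>
T model_compare_swap(std::atomic<T>& dest, T cond, T value) {
    T expected = cond;
    // Success: dest becomes `value`; `expected` still equals the old value.
    // Failure: dest is untouched; `expected` is overwritten with the
    // current value of dest. Either way `expected` holds the old value.
    dest.compare_exchange_strong(expected, value);
    return expected;
}

// Example: a lock word where 0 means "free" and 1 means "held".
inline void compare_swap_example() {
    std::atomic<int> lock{0};
    int old = model_compare_swap(lock, 0, 1);
    assert(old == 0 && lock.load() == 1);  // acquired
    old = model_compare_swap(lock, 0, 1);
    assert(old == 1 && lock.load() == 1);  // already held, unchanged
}
```

The caller can tell whether the swap took place by comparing the returned value with ``cond``.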
|
||||
|
||||
SHMEM_ATOMIC_SWAP
|
||||
-----------------
|
||||
|
||||
.. cpp:function:: __device__ TYPE rocshmem_TYPENAME_atomic_swap(TYPE *dest, TYPE value, int pe)
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_swap(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, int pe)
|
||||
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param value: The value to be atomically swapped in
:param pe: PE of the remote process

:return: The old value of dest

**Description:**
Atomically writes ``value`` to ``dest`` on ``pe`` and returns the old value of ``dest``.
|
||||
The operation is blocking.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen in EXTENDED_AMO_TYPES_.
|
||||
|
||||
SHMEM_ATOMIC_FETCH_INC
|
||||
----------------------
|
||||
|
||||
.. cpp:function:: __device__ TYPE rocshmem_TYPENAME_atomic_fetch_inc(TYPE *dest, int pe)
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_fetch_inc(rocshmem_ctx_t ctx, TYPE *dest, int pe)
|
||||
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param pe: PE of the remote process
|
||||
|
||||
:return: The old value of dest
|
||||
|
||||
**Description:**
|
||||
Atomically adds 1 to ``dest`` on ``pe`` and returns the old value of ``dest``.
|
||||
The operation is blocking.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen in STANDARD_AMO_TYPES_.
|
||||
|
||||
SHMEM_ATOMIC_INC
|
||||
----------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_atomic_inc(TYPE *dest, int pe)
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_atomic_inc(rocshmem_ctx_t ctx, TYPE *dest, int pe)
|
||||
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param pe: PE of the remote process
|
||||
|
||||
:return: None
|
||||
|
||||
**Description:**
|
||||
Atomically adds 1 to dest on pe.
|
||||
The operation is blocking.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen in STANDARD_AMO_TYPES_.
|
||||
|
||||
SHMEM_ATOMIC_FETCH_ADD
|
||||
----------------------
|
||||
|
||||
.. cpp:function:: __device__ TYPE rocshmem_TYPENAME_atomic_fetch_add(TYPE *dest, TYPE value, int pe)
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_fetch_add(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, int pe)
|
||||
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param value: The value to be atomically added
|
||||
:param pe: PE of the remote process
|
||||
|
||||
:return: The old value of dest
|
||||
|
||||
**Description:**
|
||||
Atomically adds ``value`` to ``dest`` on ``pe`` and returns the old value of ``dest``.
|
||||
The operation is blocking.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen in STANDARD_AMO_TYPES_.
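
The difference between the fetching and non-fetching add variants can be sketched on the host. The model below uses ``std::atomic`` in place of a remote symmetric variable; ``model_fetch_add``, ``model_add``, and ``fetch_add_example`` are illustrative names only, not rocSHMEM functions.

```cpp
#include <atomic>
#include <cassert>

// Host-only model (NOT rocSHMEM calls); `dest` stands in for the
// symmetric variable on the target PE.
template <typename T>
T model_fetch_add(std::atomic<T>& dest, T value) {
    return dest.fetch_add(value);  // returns the value before the add
}

template <typename T>
void model_add(std::atomic<T>& dest, T value) {
    dest.fetch_add(value);  // same update, old value discarded
}

// Example: handing out ticket numbers from a shared counter.
inline void fetch_add_example() {
    std::atomic<long> counter{10};
    long ticket = model_fetch_add(counter, 3L);
    assert(ticket == 10 && counter.load() == 13);
    model_add(counter, 2L);
    assert(counter.load() == 15);
}
```

The fetching form is the one to use when the caller needs the pre-update value, for example to claim a unique slot.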
|
||||
|
||||
SHMEM_ATOMIC_ADD
|
||||
----------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_atomic_add(TYPE *dest, TYPE value, int pe)
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_atomic_add(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, int pe)
|
||||
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param value: The value to be atomically added
|
||||
:param pe: PE of the remote process
|
||||
|
||||
:return: None
|
||||
|
||||
**Description:**
|
||||
Atomically adds value to dest on pe.
|
||||
The operation is blocking.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen in STANDARD_AMO_TYPES_.
|
||||
|
||||
SHMEM_ATOMIC_FETCH_AND
|
||||
----------------------
|
||||
|
||||
.. cpp:function:: __device__ TYPE rocshmem_TYPENAME_atomic_fetch_and(TYPE *dest, TYPE value, int pe)
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_fetch_and(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, int pe)
|
||||
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param value: The value to atomically AND with the value at dest
:param pe: PE of the remote process

:return: The old value of dest.

**Description:**
Atomically performs a bitwise AND of ``value`` with the value at ``dest`` on ``pe`` and returns the old value of ``dest``.
|
||||
The operation is blocking.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen in BITWISE_AMO_TYPES_.
|
||||
|
||||
SHMEM_ATOMIC_AND
|
||||
----------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_atomic_and(TYPE *dest, TYPE value, int pe)
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_atomic_and(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, int pe)
|
||||
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param value: The value to atomically AND with the value at dest
:param pe: PE of the remote process

:return: None

**Description:**
Atomically performs a bitwise AND of ``value`` with the value at ``dest`` on ``pe``.
|
||||
The operation is blocking.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen in BITWISE_AMO_TYPES_.
|
||||
|
||||
SHMEM_ATOMIC_FETCH_OR
|
||||
----------------------
|
||||
|
||||
.. cpp:function:: __device__ TYPE rocshmem_TYPENAME_atomic_fetch_or(TYPE *dest, TYPE value, int pe)
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_fetch_or(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, int pe)
|
||||
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param value: The value to atomically OR with the value at dest
:param pe: PE of the remote process

:return: The old value of dest

**Description:**
Atomically performs a bitwise OR of ``value`` with the value at ``dest`` on ``pe`` and returns the old value of ``dest``.
|
||||
The operation is blocking.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen in BITWISE_AMO_TYPES_.
|
||||
|
||||
SHMEM_ATOMIC_OR
|
||||
---------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_atomic_or(TYPE *dest, TYPE value, int pe)
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_atomic_or(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, int pe)
|
||||
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param value: The value to atomically OR with the value at dest
:param pe: PE of the remote process

:return: None.

**Description:**
Atomically performs a bitwise OR of ``value`` with the value at ``dest`` on ``pe``.
|
||||
The operation is blocking.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen in BITWISE_AMO_TYPES_.
|
||||
|
||||
SHMEM_ATOMIC_FETCH_XOR
|
||||
----------------------
|
||||
|
||||
.. cpp:function:: __device__ TYPE rocshmem_TYPENAME_atomic_fetch_xor(TYPE *dest, TYPE value, int pe)
.. cpp:function:: __device__ TYPE rocshmem_ctx_TYPENAME_atomic_fetch_xor(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, int pe)
|
||||
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param value: The value to atomically XOR with the value at dest
:param pe: PE of the remote process

:return: The old value of dest

**Description:**
Atomically performs a bitwise XOR of ``value`` with the value at ``dest`` on ``pe`` and returns the old value of ``dest``.
|
||||
The operation is blocking.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen in BITWISE_AMO_TYPES_.
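
The three fetching bitwise operations share one pattern: update the target in place and hand back its previous contents. A host-only sketch follows, with ``std::atomic`` standing in for the remote symmetric variable; ``BitOp``, ``model_fetch_bitop``, and ``bitop_example`` are hypothetical names introduced for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Host-only model (NOT rocSHMEM calls).
enum class BitOp { And, Or, Xor };

inline std::uint32_t model_fetch_bitop(std::atomic<std::uint32_t>& dest,
                                       BitOp op, std::uint32_t value) {
    switch (op) {
        case BitOp::And: return dest.fetch_and(value);
        case BitOp::Or:  return dest.fetch_or(value);
        case BitOp::Xor: return dest.fetch_xor(value);
    }
    return dest.load();  // not reached for valid BitOp values
}

inline void bitop_example() {
    std::atomic<std::uint32_t> flags{0xCu};
    assert(model_fetch_bitop(flags, BitOp::And, 0xAu) == 0xCu);  // 0xC & 0xA
    assert(flags.load() == 0x8u);
    assert(model_fetch_bitop(flags, BitOp::Or, 0x3u) == 0x8u);   // 0x8 | 0x3
    assert(flags.load() == 0xBu);
    assert(model_fetch_bitop(flags, BitOp::Xor, 0xFu) == 0xBu);  // 0xB ^ 0xF
    assert(flags.load() == 0x4u);
}
```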
|
||||
|
||||
SHMEM_ATOMIC_XOR
|
||||
----------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_atomic_xor(TYPE *dest, TYPE value, int pe)
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_atomic_xor(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, int pe)
|
||||
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param value: The value to atomically XOR with the value at dest
:param pe: PE of the remote process

:return: None

**Description:**
Atomically performs a bitwise XOR of ``value`` with the value at ``dest`` on ``pe``.
|
||||
The operation is blocking.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen in BITWISE_AMO_TYPES_.
|
||||
|
||||
SUPPORTED AMO DATA TYPES
|
||||
------------------------
|
||||
|
||||
.. _STANDARD_AMO_TYPES:
|
||||
|
||||
.. list-table:: Standard AMO Datatypes
|
||||
:widths: 10 20 20
|
||||
:header-rows: 1
|
||||
|
||||
* - TYPE
|
||||
- TYPENAME
|
||||
- Supported
|
||||
* - int
|
||||
- int
|
||||
- Yes
|
||||
* - long
|
||||
- long
|
||||
- Yes
|
||||
* - long long
|
||||
- longlong
|
||||
- Yes
|
||||
* - unsigned int
|
||||
- uint
|
||||
- Yes
|
||||
* - unsigned long
|
||||
- ulong
|
||||
- Yes
|
||||
* - unsigned long long
|
||||
- ulonglong
|
||||
- Yes
|
||||
* - int32_t
|
||||
- int32
|
||||
- Yes
|
||||
* - int64_t
|
||||
- int64
|
||||
- Yes
|
||||
* - uint32_t
|
||||
- uint32
|
||||
- Yes
|
||||
* - uint64_t
|
||||
- uint64
|
||||
- Yes
|
||||
* - size_t
|
||||
- size
|
||||
- Yes
|
||||
* - ptrdiff_t
|
||||
- ptrdiff
|
||||
- Yes
|
||||
|
||||
.. _EXTENDED_AMO_TYPES:
|
||||
|
||||
.. list-table:: Extended AMO Datatypes
|
||||
:widths: 10 20 20
|
||||
:header-rows: 1
|
||||
|
||||
* - TYPE
|
||||
- TYPENAME
|
||||
- Supported
|
||||
* - float
|
||||
- float
|
||||
- Yes
|
||||
* - double
|
||||
- double
|
||||
- Yes
|
||||
* - int
|
||||
- int
|
||||
- Yes
|
||||
* - long
|
||||
- long
|
||||
- Yes
|
||||
* - long long
|
||||
- longlong
|
||||
- Yes
|
||||
* - unsigned int
|
||||
- uint
|
||||
- Yes
|
||||
* - unsigned long
|
||||
- ulong
|
||||
- Yes
|
||||
* - unsigned long long
|
||||
- ulonglong
|
||||
- Yes
|
||||
* - int32_t
|
||||
- int32
|
||||
- Yes
|
||||
* - int64_t
|
||||
- int64
|
||||
- Yes
|
||||
* - uint32_t
|
||||
- uint32
|
||||
- Yes
|
||||
* - uint64_t
|
||||
- uint64
|
||||
- Yes
|
||||
* - size_t
|
||||
- size
|
||||
- Yes
|
||||
* - ptrdiff_t
|
||||
- ptrdiff
|
||||
- Yes
|
||||
|
||||
.. _BITWISE_AMO_TYPES:
|
||||
|
||||
.. list-table:: Bitwise AMO Datatypes
|
||||
:widths: 10 20 20
|
||||
:header-rows: 1
|
||||
|
||||
* - TYPE
|
||||
- TYPENAME
|
||||
- Supported
|
||||
* - unsigned int
|
||||
- uint
|
||||
- Yes
|
||||
* - unsigned long
|
||||
- ulong
|
||||
- Yes
|
||||
* - unsigned long long
|
||||
- ulonglong
|
||||
- Yes
|
||||
* - int32_t
|
||||
- int32
|
||||
- Yes
|
||||
* - int64_t
|
||||
- int64
|
||||
- Yes
|
||||
* - uint32_t
|
||||
- uint32
|
||||
- Yes
|
||||
* - uint64_t
|
||||
- uint64
|
||||
- Yes
|
||||
|
||||
@@ -0,0 +1,248 @@
|
||||
.. meta::
|
||||
:description: rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
|
||||
:keywords: rocSHMEM, API, ROCm, documentation, HIP, Networking, Communication
|
||||
|
||||
.. _rocshmem-api-coll:
|
||||
|
||||
---------------------------
|
||||
Collective Routines
|
||||
---------------------------
|
||||
|
||||
ROCSHMEM_BARRIER_ALL
|
||||
--------------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_wg_barrier_all(rocshmem_ctx_t ctx)
|
||||
.. cpp:function:: __device__ void rocshmem_wg_barrier_all()
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Perform a collective barrier between all PEs in the system.
|
||||
The caller is blocked until the barrier is resolved.
|
||||
|
||||
ROCSHMEM_TEAM_SYNC
|
||||
------------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_wg_team_sync(rocshmem_ctx_t ctx, rocshmem_team_t team)
|
||||
.. cpp:function:: __device__ void rocshmem_wg_team_sync(rocshmem_team_t team)
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param team: Team with which to perform this operation
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Registers the arrival of a PE at a barrier.
|
||||
The caller is blocked until the synchronization is resolved.
|
||||
|
||||
In contrast with the shmem_barrier_all routine, shmem_team_sync only ensures
|
||||
completion and visibility of previously issued memory stores and does not
|
||||
ensure completion of remote memory updates issued via OpenSHMEM routines.
|
||||
|
||||
ROCSHMEM_SYNC_ALL
|
||||
-----------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_wg_sync_all(rocshmem_ctx_t ctx)
|
||||
.. cpp:function:: __device__ void rocshmem_wg_sync_all()
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
This routine is equivalent to calling ``rocshmem_wg_team_sync`` on the world team.
|
||||
|
||||
|
||||
ROCSHMEM_ALLTOALL
-----------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_wg_alltoall(rocshmem_ctx_t ctx, rocshmem_team_t team, TYPE *dest, const TYPE *source, int nelems)
|
||||
|
||||
:param ctx: Context with which to perform this collective
:param team: The team participating in the collective
|
||||
:param dest: Destination address; Must be an address on the
|
||||
symmetric heap
|
||||
:param source: Source address; Must be an address on the symmetric
|
||||
heap
|
||||
:param nelems: Number of data blocks transferred per pair of PEs
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Exchanges a fixed amount of contiguous data blocks between all pairs
|
||||
of PEs participating in the collective routine.
|
||||
This function must be called as a work-group collective.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen at :ref:`RMA_TYPES`.
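
The data movement of an all-to-all exchange can be modelled on the host. The sketch below assumes the OpenSHMEM block layout (PE ``i`` sends block ``j`` of its ``source`` to PE ``j``, which stores it as block ``i`` of ``dest``); ``Buffers`` and ``model_alltoall`` are illustrative names, and no rocSHMEM call is made.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only model: one buffer per PE, indexed by PE number.
using Buffers = std::vector<std::vector<int>>;

// Assumed layout: block j of source on PE i lands in block i of dest on PE j.
inline Buffers model_alltoall(const Buffers& source, std::size_t nelems) {
    const std::size_t npes = source.size();
    Buffers dest(npes, std::vector<int>(npes * nelems));
    for (std::size_t i = 0; i < npes; ++i) {          // sending PE
        assert(source[i].size() == npes * nelems);
        for (std::size_t j = 0; j < npes; ++j) {      // receiving PE
            for (std::size_t k = 0; k < nelems; ++k) {
                dest[j][i * nelems + k] = source[i][j * nelems + k];
            }
        }
    }
    return dest;
}
```

Under this layout, both ``source`` and ``dest`` on each PE need room for ``nelems`` elements per participating PE.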
|
||||
|
||||
ROCSHMEM_BROADCAST
|
||||
------------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_wg_broadcast(rocshmem_ctx_t ctx, rocshmem_team_t team, TYPE *dest, const TYPE *source, int nelems, int pe_root)
|
||||
|
||||
:param ctx: Context with which to perform this collective
|
||||
:param team: The team participating in the collective
|
||||
:param dest: Destination address; Must be an address on the
|
||||
symmetric heap
|
||||
:param source: Source address; Must be an address on the symmetric
|
||||
heap
|
||||
:param nelems: Number of elements to broadcast
:param pe_root: PE from which the data is broadcast
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Perform a broadcast between PEs in the team.
|
||||
The caller is blocked until the broadcast completes.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen at :ref:`RMA_TYPES`.
|
||||
|
||||
ROCSHMEM_FCOLLECT
|
||||
-----------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_wg_fcollect(rocshmem_ctx_t ctx, rocshmem_team_t team, TYPE *dest, const TYPE *source, int nelems)
|
||||
|
||||
:param ctx: Context with which to perform this collective
|
||||
:param team: The team participating in the collective
|
||||
:param dest: Destination address; Must be an address on the
|
||||
symmetric heap
|
||||
:param source: Source address; Must be an address on the symmetric
|
||||
heap
|
||||
:param nelems: Number of elements in the source array contributed by each PE
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Concatenates blocks of data from multiple PEs to an array in every
|
||||
PE participating in the collective routine.
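
As a host-only illustration of the concatenation, the sketch below assumes that contributions are ordered by PE number, as in OpenSHMEM, and that every PE contributes the same ``nelems`` elements. ``model_fcollect`` is a hypothetical helper, not a rocSHMEM function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only model: source[pe] is the contribution of each PE.
// Returns dest for every PE; all entries are identical.
inline std::vector<std::vector<int>> model_fcollect(
        const std::vector<std::vector<int>>& source, std::size_t nelems) {
    std::vector<int> gathered;
    for (const auto& s : source) {                    // PE order (assumed)
        assert(s.size() >= nelems);
        gathered.insert(gathered.end(), s.begin(),
                        s.begin() + static_cast<std::ptrdiff_t>(nelems));
    }
    return std::vector<std::vector<int>>(source.size(), gathered);
}
```

In this model, ``dest`` on each PE must hold ``nelems`` elements for every PE in the team.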
|
||||
|
||||
ROCSHMEM_REDUCTION
|
||||
------------------
|
||||
.. cpp:function:: __device__ int rocshmem_ctx_TYPENAME_OPNAME_wg_reduce(rocshmem_ctx_t ctx, rocshmem_team_t team, TYPE *dest, const TYPE *source, int nreduce)
|
||||
|
||||
:param ctx: Context with which to perform this collective
|
||||
:param team: The team participating in the collective
|
||||
:param dest: Destination address; Must be an address on the
|
||||
symmetric heap
|
||||
:param source: Source address; Must be an address on the symmetric
|
||||
heap
|
||||
:param nreduce: Number of elements in the source and dest arrays to reduce
|
||||
:returns: Zero on successful local completion; Nonzero otherwise
|
||||
|
||||
|
||||
**Description:**
|
||||
Perform an allreduce between PEs in the team.
|
||||
|
||||
Valid ``TYPENAME``, ``TYPE``, and ``OPNAME`` values can be seen at :ref:`REDUCE_TYPES`.
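
An allreduce combines the arrays element by element across PEs and leaves the same result on every PE. The host-only sketch below models that; ``model_allreduce`` is a hypothetical name, and the binary operator passed in stands in for ``OPNAME``.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Host-only model: source[pe] is the input array of each PE.
// The returned vector is what dest would hold on every PE.
template <typename T, typename Op>
std::vector<T> model_allreduce(const std::vector<std::vector<T>>& source,
                               std::size_t nreduce, Op op) {
    assert(!source.empty() && source[0].size() >= nreduce);
    std::vector<T> result(source[0].begin(),
                          source[0].begin() + static_cast<std::ptrdiff_t>(nreduce));
    for (std::size_t pe = 1; pe < source.size(); ++pe) {
        assert(source[pe].size() >= nreduce);
        for (std::size_t k = 0; k < nreduce; ++k) {
            result[k] = op(result[k], source[pe][k]);  // element-wise combine
        }
    }
    return result;
}
```

For example, passing ``std::plus<int>()`` models a ``sum`` reduction, and a lambda returning the larger argument models ``max``.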
|
||||
|
||||
SUPPORTED REDUCTION TYPES AND OPERATIONS
|
||||
----------------------------------------
|
||||
|
||||
.. _REDUCE_TYPES:
|
||||
|
||||
.. list-table:: Reduction Types, Names and Operations
|
||||
:widths: 20 20 20 20
|
||||
:header-rows: 1
|
||||
|
||||
* - TYPE
|
||||
- TYPENAME
|
||||
- OPNAME
|
||||
- Supported
|
||||
* - char
|
||||
- char
|
||||
- max, min, sum, prod
|
||||
- No
|
||||
* - signed char
|
||||
- schar
|
||||
- max, min, sum, prod
|
||||
- No
|
||||
* - short
|
||||
- short
|
||||
- max, min, sum, prod
|
||||
- Yes
|
||||
* - int
|
||||
- int
|
||||
- max, min, sum, prod
|
||||
- Yes
|
||||
* - long
|
||||
- long
|
||||
- max, min, sum, prod
|
||||
- Yes
|
||||
* - long long
|
||||
- longlong
|
||||
- max, min, sum, prod
|
||||
- Yes
|
||||
* - ptrdiff_t
|
||||
- ptrdiff
|
||||
- max, min, sum, prod
|
||||
- No
|
||||
* - unsigned char
|
||||
- uchar
|
||||
- and, or, xor, max, min, sum, prod
|
||||
- No
|
||||
* - unsigned short
|
||||
- ushort
|
||||
- and, or, xor, max, min, sum, prod
|
||||
- No
|
||||
* - unsigned int
|
||||
- uint
|
||||
- and, or, xor, max, min, sum, prod
|
||||
- No
|
||||
* - unsigned long
|
||||
- ulong
|
||||
- and, or, xor, max, min, sum, prod
|
||||
- No
|
||||
* - unsigned long long
|
||||
- ulonglong
|
||||
- and, or, xor, max, min, sum, prod
|
||||
- No
|
||||
* - int8_t
|
||||
- int8
|
||||
- and, or, xor, max, min, sum, prod
|
||||
- No
|
||||
* - int16_t
|
||||
- int16
|
||||
- and, or, xor, max, min, sum, prod
|
||||
- No
|
||||
* - int32_t
|
||||
- int32
|
||||
- and, or, xor, max, min, sum, prod
|
||||
- No
|
||||
* - int64_t
|
||||
- int64
|
||||
- and, or, xor, max, min, sum, prod
|
||||
- No
|
||||
* - uint8_t
|
||||
- uint8
|
||||
- and, or, xor, max, min, sum, prod
|
||||
- No
|
||||
* - uint16_t
|
||||
- uint16
|
||||
- and, or, xor, max, min, sum, prod
|
||||
- No
|
||||
* - uint32_t
|
||||
- uint32
|
||||
- and, or, xor, max, min, sum, prod
|
||||
- No
|
||||
* - uint64_t
|
||||
- uint64
|
||||
- and, or, xor, max, min, sum, prod
|
||||
- No
|
||||
* - size_t
|
||||
- size
|
||||
- and, or, xor, max, min, sum, prod
|
||||
- No
|
||||
* - float
|
||||
- float
|
||||
- max, min, sum, prod
|
||||
- Yes
|
||||
* - double
|
||||
- double
|
||||
- max, min, sum, prod
|
||||
- Yes
|
||||
* - long double
|
||||
- longdouble
|
||||
- max, min, sum, prod
|
||||
- No
|
||||
* - double _Complex
|
||||
- complexd
|
||||
- sum, prod
|
||||
- No
|
||||
* - float _Complex
|
||||
- complexf
|
||||
- sum, prod
|
||||
- No
|
||||
@@ -0,0 +1,40 @@
|
||||
.. meta::
|
||||
:description: rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
|
||||
:keywords: rocSHMEM, API, ROCm, documentation, HIP, Networking, Communication
|
||||
|
||||
.. _rocshmem-api-ctx:
|
||||
|
||||
-----------------------------------
|
||||
Context Management Routines
|
||||
-----------------------------------
|
||||
|
||||
ROCSHMEM_CTX_CREATE
|
||||
-------------------
|
||||
|
||||
.. cpp:function:: __device__ int rocshmem_wg_ctx_create(int64_t options, rocshmem_ctx_t *ctx)
|
||||
.. cpp:function:: __device__ int rocshmem_wg_team_create_ctx(rocshmem_team_t team, long options, rocshmem_ctx_t *ctx)
|
||||
|
||||
:param team: Team handle to derive the context from
|
||||
:param options: Options for context creation (Ignored in current design, please use a value of 0)
|
||||
:param ctx: Context handle
|
||||
|
||||
:returns: All threads return 0 if the context was created successfully.
If any thread returns a nonzero value, the operation failed and a higher value of
``ROCSHMEM_MAX_NUM_CONTEXTS`` is required
|
||||
|
||||
**Description:**
|
||||
Creates an OpenSHMEM context. By design, the context is private to the calling work-group.
|
||||
Must be called collectively by all threads in the work-group.
|
||||
|
||||
ROCSHMEM_CTX_DESTROY
|
||||
--------------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_wg_ctx_destroy(rocshmem_ctx_t *ctx)
|
||||
|
||||
:param ctx: Context handle
|
||||
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Destroys a rocSHMEM context.
|
||||
Must be called collectively by all threads in the work-group.
|
||||
@@ -0,0 +1,98 @@
|
||||
.. meta::
|
||||
:description: rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
|
||||
:keywords: rocSHMEM, API, ROCm, documentation, HIP, Networking, Communication
|
||||
|
||||
.. _rocshmem-api-init:
|
||||
|
||||
---------------------------------------
|
||||
Library Setup, Exit, and Query Routines
|
||||
---------------------------------------
|
||||
|
||||
ROCSHMEM_INIT
|
||||
-------------
|
||||
|
||||
.. cpp:function:: __host__ void rocshmem_init(void)
|
||||
|
||||
:Parameters: None
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
This routine initializes the rocSHMEM runtime and underlying transport layer.
|
||||
Before ``rocshmem_init`` is called,
the user must select the device with which this PE is associated by calling
|
||||
`hipSetDevice
|
||||
<https://rocm.docs.amd.com/projects/HIP/en/docs-6.0.0/doxygen/html/group___device.html#ga43c1e7f15925eeb762195ccb5e063eae>`_.
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_wg_init(void)
|
||||
|
||||
:Parameters: None
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Initializes device-side rocSHMEM resources.
|
||||
Must be called before any threads in this work-group invoke other rocSHMEM functions.
|
||||
Must be called collectively by all threads in the work-group.
|
||||
|
||||
ROCSHMEM_FINALIZE
|
||||
-----------------
|
||||
.. cpp:function:: __host__ void rocshmem_finalize(void)
|
||||
|
||||
:Parameters: None
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Finalize the rocSHMEM runtime.
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_wg_finalize(void)
|
||||
|
||||
:Parameters: None
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Finalizes device-side rocSHMEM resources.
|
||||
Must be called before work-group completion if the work-group also called ``rocshmem_wg_init``.
|
||||
Must be called collectively by all threads in the work-group.
|
||||
|
||||
ROCSHMEM_N_PES
|
||||
--------------
|
||||
|
||||
.. cpp:function:: __host__ int rocshmem_n_pes(void)
|
||||
|
||||
:Parameters: None
|
||||
:returns: Total number of PEs
|
||||
|
||||
**Description:**
|
||||
Query the total number of PEs.
|
||||
This routine can be called before ``rocshmem_init``.
|
||||
|
||||
.. cpp:function:: __device__ int rocshmem_n_pes(void)
|
||||
.. cpp:function:: __device__ int rocshmem_ctx_n_pes(rocshmem_ctx_t ctx)
|
||||
|
||||
:param ctx: GPU side context handle
|
||||
:returns: Total number of PEs
|
||||
|
||||
**Description:**
|
||||
Query the total number of PEs for a given context.
|
||||
Can be called per thread with no performance penalty.
|
||||
|
||||
ROCSHMEM_MY_PE
|
||||
--------------
|
||||
|
||||
.. cpp:function:: __host__ int rocshmem_my_pe(void)
|
||||
|
||||
:Parameters: None
|
||||
:returns: PE ID of the caller
|
||||
|
||||
**Description:**
|
||||
Query the PE ID of the caller.
|
||||
This routine can be called before ``rocshmem_init``.
|
||||
|
||||
.. cpp:function:: __device__ int rocshmem_my_pe(void)
|
||||
.. cpp:function:: __device__ int rocshmem_ctx_my_pe(rocshmem_ctx_t ctx)
|
||||
|
||||
:param ctx: GPU side context handle
|
||||
:returns: PE ID of the caller
|
||||
|
||||
**Description:**
|
||||
Query the PE ID of the caller.
|
||||
Can be called per thread with no performance penalty.
|
||||
@@ -0,0 +1,35 @@
|
||||
.. meta::
|
||||
:description: rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
|
||||
:keywords: rocSHMEM, API, ROCm, documentation, HIP, Networking, Communication
|
||||
|
||||
.. _rocshmem-api-memory-management:
|
||||
|
||||
|
||||
---------------------------
|
||||
Memory Management Routines
|
||||
---------------------------
|
||||
|
||||
ROCSHMEM_MALLOC
|
||||
---------------
|
||||
|
||||
.. cpp:function:: __host__ void *rocshmem_malloc(size_t size)
|
||||
|
||||
:param size: Memory allocation size in bytes
|
||||
:returns: A pointer to the allocated memory on the symmetric heap;
|
||||
If a valid allocation cannot be made, it returns NULL
|
||||
|
||||
**Description:**
|
||||
Allocate memory of ``size`` bytes from the symmetric heap.
|
||||
This is a collective operation and must be called by all PEs.
|
||||
|
||||
ROCSHMEM_FREE
|
||||
-------------
|
||||
|
||||
.. cpp:function:: __host__ void rocshmem_free(void *ptr)
|
||||
|
||||
:param ptr: Pointer to previously allocated memory on the symmetric heap
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Free a memory allocation from the symmetric heap.
|
||||
This is a collective operation and must be called by all PEs.
|
||||
@@ -0,0 +1,36 @@
|
||||
.. meta::
|
||||
:description: rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
|
||||
:keywords: rocSHMEM, API, ROCm, documentation, HIP, Networking, Communication
|
||||
|
||||
.. _rocshmem-api-memory-ordering:
|
||||
|
||||
---------------------------
|
||||
Memory Ordering Routines
|
||||
---------------------------
|
||||
|
||||
ROCSHMEM_FENCE
|
||||
--------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_fence()
|
||||
.. cpp:function:: __device__ void rocshmem_fence(int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_fence(rocshmem_ctx_t ctx)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_fence(rocshmem_ctx_t ctx, int pe)
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param pe: Destination PE
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Guarantees order between messages in this context in accordance with OpenSHMEM semantics.
|
||||
|
||||
ROCSHMEM_QUIET
|
||||
--------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_quiet(rocshmem_ctx_t ctx)
|
||||
.. cpp:function:: __device__ void rocshmem_quiet()
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Completes all previous operations posted to this context.
|
||||
@@ -0,0 +1,122 @@
|
||||
.. meta::
|
||||
:description: rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
|
||||
:keywords: rocSHMEM, API, ROCm, documentation, HIP, Networking, Communication
|
||||
|
||||
.. _rocshmem-api-pt2pt-sync:
|
||||
|
||||
-----------------------------------------
|
||||
Point-to-Point Synchronization Routines
|
||||
-----------------------------------------
|
||||
|
||||
ROCSHMEM_WAIT_UNTIL
|
||||
-------------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_wait_until(TYPE *ivars, int cmp, TYPE val)
|
||||
|
||||
:param ivars: Pointer to memory on the symmetric heap to wait for
|
||||
:param cmp: Operation for the comparison
|
||||
:param val: Value to compare the memory at ivars to
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Block the caller until the condition ``(*ivars cmp val)`` is true.
|
||||
|
||||
Valid ``cmp`` values can be seen at :ref:`CMP_VALUES`.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen at :ref:`STANDARD_AMO_TYPES`.
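
The wait condition can be read as "spin until the comparison holds". A host-only sketch follows. The ``Cmp`` enumerators are stand-ins for the values listed under ``CMP_VALUES`` (their real names are not shown here), and ``condition_holds``, ``model_wait_until``, and ``wait_until_example`` are hypothetical helpers, not rocSHMEM functions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Stand-in comparison operators; the real constants are an assumption.
enum class Cmp { EQ, NE, GT, GE, LT, LE };

// Evaluates (ivar cmp val).
template <typename T>
bool condition_holds(T ivar, Cmp cmp, T val) {
    switch (cmp) {
        case Cmp::EQ: return ivar == val;
        case Cmp::NE: return ivar != val;
        case Cmp::GT: return ivar > val;
        case Cmp::GE: return ivar >= val;
        case Cmp::LT: return ivar < val;
        case Cmp::LE: return ivar <= val;
    }
    return false;  // not reached for valid Cmp values
}

// Host-only model of a blocking wait: poll until the condition is true.
template <typename T>
void model_wait_until(const std::atomic<T>& ivar, Cmp cmp, T val) {
    while (!condition_holds(ivar.load(), cmp, val)) {
        std::this_thread::yield();
    }
}

inline void wait_until_example() {
    std::atomic<int> flag{3};
    model_wait_until(flag, Cmp::GE, 3);  // returns at once: 3 >= 3
    assert(condition_holds(flag.load(), Cmp::GE, 3));
}
```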
|
||||
|
||||
ROCSHMEM_WAIT_UNTIL_ALL
|
||||
-----------------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_wait_until_all(TYPE *ivars, size_t nelems, const int* status, int cmp, TYPE val)
|
||||
|
||||
:param ivars: Pointer to memory on the symmetric heap to wait for
|
||||
:param nelems: Number of elements in the ivars array
|
||||
:param status: Array of length nelems that is used to exclude elements from wait
|
||||
:param cmp: Operation for the comparison
|
||||
:param val: Value to compare
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Block the caller until the condition ``(ivars[i] cmp val)`` is true for every element of ``ivars`` that is not excluded by ``status``.
|
||||
|
||||
Valid ``cmp`` values can be seen at :ref:`CMP_VALUES`.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen at :ref:`STANDARD_AMO_TYPES`.
|
||||
|
||||
ROCSHMEM_WAIT_UNTIL_ANY
|
||||
-----------------------
|
||||
.. cpp:function:: __device__ size_t rocshmem_TYPENAME_wait_until_any(TYPE *ivars, size_t nelems, const int* status, int cmp, TYPE val)
|
||||
|
||||
:param ivars: Pointer to memory on the symmetric heap to wait for
|
||||
:param nelems: Number of elements in the ivars array
|
||||
:param status: Array of length nelems that is used to exclude elements from wait
|
||||
:param cmp: Operation for the comparison
|
||||
:param val: Value to compare
|
||||
:returns: The index of an element in the ivars array that satisfies the wait condition; If the wait set is empty, this routine returns SIZE_MAX
|
||||
|
||||
**Description:**
|
||||
Block the caller until the condition ``(ivars[i] cmp val)`` is true for at least one element of ``ivars``.
|
||||
|
||||
Valid ``cmp`` values can be seen at :ref:`CMP_VALUES`.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen at :ref:`STANDARD_AMO_TYPES`.
|
||||
|
||||
ROCSHMEM_WAIT_UNTIL_SOME
|
||||
------------------------
|
||||
|
||||
.. cpp:function:: __device__ size_t rocshmem_TYPENAME_wait_until_some(TYPE *ivars, size_t nelems, size_t* indices, const int* status, int cmp, TYPE val)
|
||||
|
||||
:param ivars: Pointer to memory on the symmetric heap to wait for
|
||||
:param nelems: Number of elements in the ivars array
|
||||
:param indices: Array of at least nelems entries that receives the indices of the elements satisfying the wait condition
|
||||
:param status: Array of length nelems that is used to exclude elements from wait
|
||||
:param cmp: Operation for the comparison
|
||||
:param val: Value to compare
|
||||
:returns: The number of indices returned in the indices array; If the wait set is empty, this routine returns 0
|
||||
|
||||
**Description:**
|
||||
Block the caller until the condition ``(ivars[i] cmp val)`` is true for at least one element of ``ivars``. The indices of the satisfying elements are stored in ``indices``.
|
||||
|
||||
Valid ``cmp`` values can be seen at :ref:`CMP_VALUES`.
|
||||
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen at :ref:`STANDARD_AMO_TYPES`.
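
The interaction between ``status``, ``indices`` and the return value can be sketched with a small host-only model. This is not rocSHMEM code: ``poll_some`` and its ``satisfied`` predicate are hypothetical names, and the rule that a zero ``status[i]`` (or a null ``status``) keeps element ``i`` in the wait set is assumed from the OpenSHMEM convention rather than stated on this page.

```cpp
#include <cstddef>

// One polling pass of a wait_until_some-style routine (host-only model).
// `satisfied` stands in for the (ivars[i] cmp val) check.
// Assumption (OpenSHMEM convention): status == nullptr or status[i] == 0
// keeps element i in the wait set; a nonzero status[i] excludes it.
template <typename T, typename Pred>
std::size_t poll_some(const T* ivars, std::size_t nelems,
                      std::size_t* indices, const int* status,
                      Pred satisfied) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < nelems; ++i) {
    if (status != nullptr && status[i] != 0) continue;  // excluded element
    if (satisfied(ivars[i])) indices[count++] = i;      // record the index
  }
  // A blocking routine would repeat this pass until count > 0,
  // unless the wait set is empty, in which case 0 is returned.
  return count;
}
```

With ``ivars = {0, 7, 7, 7}``, ``status = {0, 0, 1, 0}`` and a predicate that checks for equality with 7, the model stores indices 1 and 3 and returns 2. Element 2 also holds 7 but is excluded by its nonzero status entry.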
|
||||
|
||||
ROCSHMEM_TEST
|
||||
-------------
|
||||
|
||||
.. cpp:function:: __device__ int rocshmem_TYPENAME_test(TYPE *ivars, int cmp, TYPE val)
|
||||
|
||||
:param ivars: Pointer to memory on the symmetric heap to test
|
||||
:param cmp: Operation for the comparison
|
||||
:param val: Value to compare the memory at ivars to
|
||||
|
||||
:returns: 1 if the evaluation is true, 0 otherwise
|
||||
|
||||
**Description:**
|
||||
Test if the condition ``(*ivars cmp val)`` is true.
|
||||
|
||||
|
||||
SUPPORTED COMPARISONS
|
||||
---------------------
|
||||
|
||||
.. _CMP_VALUES:
|
||||
|
||||
.. list-table:: Point-to-Point Comparison Constants
|
||||
:widths: 20 20
|
||||
:header-rows: 1
|
||||
|
||||
* - Constant
|
||||
- Description
|
||||
* - ROCSHMEM_CMP_EQ
|
||||
- Equal
|
||||
* - ROCSHMEM_CMP_NE
|
||||
- Not equal
|
||||
* - ROCSHMEM_CMP_GT
|
||||
- Greater than
|
||||
* - ROCSHMEM_CMP_GE
|
||||
- Greater than or equal to
|
||||
* - ROCSHMEM_CMP_LT
|
||||
- Less than
|
||||
* - ROCSHMEM_CMP_LE
|
||||
- Less than or equal to
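
As a rough illustration of how the constants in the table map onto ``(*ivars cmp val)``, the following host-only sketch evaluates each comparison. The ``Cmp`` enum and ``compare`` function are hypothetical stand-ins; the actual ``ROCSHMEM_CMP_*`` values come from the rocSHMEM headers and may be defined differently.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for the ROCSHMEM_CMP_* constants; the real
// values come from the rocSHMEM headers and may differ.
enum class Cmp { EQ, NE, GT, GE, LT, LE };

// Evaluates (lhs cmp rhs) following the descriptions in the table.
template <typename T>
bool compare(T lhs, Cmp cmp, T rhs) {
  switch (cmp) {
    case Cmp::EQ: return lhs == rhs;
    case Cmp::NE: return lhs != rhs;
    case Cmp::GT: return lhs > rhs;
    case Cmp::GE: return lhs >= rhs;
    case Cmp::LT: return lhs < rhs;
    case Cmp::LE: return lhs <= rhs;
  }
  throw std::invalid_argument("unknown comparison");
}
```

In this model, ``lhs`` plays the role of the value read from ``ivars`` and ``rhs`` the role of ``val``, so a wait with the greater-than constant returns once the watched value exceeds ``val``.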
|
||||
@@ -0,0 +1,239 @@
|
||||
.. meta::
|
||||
:description: rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
|
||||
:keywords: rocSHMEM, API, ROCm, documentation, HIP, Networking, Communication
|
||||
|
||||
.. _rocshmem-api-rma:
|
||||
|
||||
-----------------------------------------
|
||||
Remote Memory Access Routines
|
||||
-----------------------------------------
|
||||
|
||||
- Routines with the ``_wave`` and ``_wg`` suffixes
|
||||
require all threads in a wavefront and workgroup, respectively,
|
||||
to call into the routine with the same parameters.
|
||||
- Routines with the ``_nbi`` substring will return as soon as the request is posted.
|
||||
- Routines without the ``_nbi`` substring block until the operation completes locally.
|
||||
- Valid ``TYPENAME`` and ``TYPE`` values can be seen in RMA_TYPES_.
|
||||
|
||||
ROCSHMEM_PUT
|
||||
------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_put(TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_put_wave(TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_put_wg(TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_put_nbi(TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_put_nbi_wave(TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_put_nbi_wg(TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_put(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_put_wave(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_put_wg(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_put_nbi(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_put_nbi_wave(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_put_nbi_wg(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param source: Source address; Must be an address on the symmetric heap
|
||||
:param nelems: The number of elements to transfer
|
||||
:param pe: PE of the remote process
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Writes contiguous data of nelems elements from source on the calling PE to dest at pe.
|
||||
|
||||
ROCSHMEM_PUTMEM
|
||||
---------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_putmem(void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_putmem_wave(void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_putmem_wg(void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_putmem_nbi(void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_putmem_nbi_wave(void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_putmem_nbi_wg(void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_putmem(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_putmem_wave(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_putmem_wg(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_putmem_nbi(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_putmem_nbi_wave(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_putmem_nbi_wg(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, int pe)
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param source: Source address; Must be an address on the symmetric heap
|
||||
:param nelems: Size of the transfer in bytes
|
||||
:param pe: PE of the remote process
|
||||
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Writes contiguous data of nelems bytes from source on the calling PE to dest at pe.
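
Because the typed ``put`` routines count ``nelems`` in elements while ``putmem`` counts it in bytes, it can help to make the conversion explicit. The helper below is a hypothetical host-side utility, not part of rocSHMEM, shown only to illustrate the relationship between the two counts.

```cpp
#include <cstddef>

// Byte count moved by a typed put/get of `nelems` elements of type T.
// A putmem/getmem call moving the same data would pass this value as
// its `nelems` argument, because that argument is already in bytes.
template <typename T>
constexpr std::size_t transfer_bytes(std::size_t nelems) {
  return nelems * sizeof(T);
}
```

For example, moving 8 ``float`` values with a typed put uses ``nelems = 8``, whereas the equivalent ``putmem`` call would use ``transfer_bytes<float>(8)``.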
|
||||
|
||||
ROCSHMEM_P
|
||||
----------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_p(TYPE *dest, TYPE value, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_p(rocshmem_ctx_t ctx, TYPE *dest, TYPE value, int pe)
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param value: Value to write to dest at pe
|
||||
:param pe: PE of the remote process
|
||||
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Writes a single value to dest at pe.
|
||||
|
||||
ROCSHMEM_GET
|
||||
------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_get(TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_get_wave(TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_get_wg(TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_get_nbi(TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_get_nbi_wave(TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_get_nbi_wg(TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_get(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_get_wave(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_get_wg(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_get_nbi(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_get_nbi_wave(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_get_nbi_wg(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, int pe)
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param source: Source address; Must be an address on the symmetric heap
|
||||
:param nelems: The number of elements to transfer
|
||||
:param pe: PE of the remote process
|
||||
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Reads contiguous data of nelems elements from source on pe to dest on the calling PE.
|
||||
|
||||
ROCSHMEM_GETMEM
|
||||
---------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_getmem(void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_getmem_wave(void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_getmem_wg(void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_getmem_nbi(void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_getmem_nbi_wave(void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_getmem_nbi_wg(void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_getmem(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_getmem_wave(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_getmem_wg(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_getmem_nbi(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_getmem_nbi_wave(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_getmem_nbi_wg(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, int pe)
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param source: Source address; Must be an address on the symmetric heap
|
||||
:param nelems: Size of the transfer in bytes
|
||||
:param pe: PE of the remote process
|
||||
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Reads contiguous data of nelems bytes from source on pe to dest on the calling PE.
|
||||
|
||||
ROCSHMEM_G
|
||||
----------
|
||||
.. cpp:function:: __device__ float rocshmem_ctx_float_g(rocshmem_ctx_t ctx, const float *source, int pe)
|
||||
.. cpp:function:: __device__ float rocshmem_float_g(const float *source, int pe)
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param source: Source address; Must be an address on the symmetric heap
|
||||
:param pe: PE of the remote process
|
||||
|
||||
:returns: The value read from source at pe
|
||||
|
||||
**Description:**
|
||||
Reads and returns a single value from source at pe.
|
||||
|
||||
SUPPORTED RMA DATA TYPES
|
||||
------------------------
|
||||
|
||||
.. _RMA_TYPES:
|
||||
|
||||
.. list-table:: RMA Datatypes
|
||||
:widths: 10 20 20
|
||||
:header-rows: 1
|
||||
|
||||
* - TYPE
|
||||
- TYPENAME
|
||||
- Supported
|
||||
* - float
|
||||
- float
|
||||
- Yes
|
||||
* - double
|
||||
- double
|
||||
- Yes
|
||||
* - long double
|
||||
- longdouble
|
||||
- No
|
||||
* - char
|
||||
- char
|
||||
- Yes
|
||||
* - signed char
|
||||
- schar
|
||||
- Yes
|
||||
* - short
|
||||
- short
|
||||
- Yes
|
||||
* - int
|
||||
- int
|
||||
- Yes
|
||||
* - long
|
||||
- long
|
||||
- Yes
|
||||
* - long long
|
||||
- longlong
|
||||
- Yes
|
||||
* - unsigned char
|
||||
- uchar
|
||||
- Yes
|
||||
* - unsigned short
|
||||
- ushort
|
||||
- Yes
|
||||
* - unsigned int
|
||||
- uint
|
||||
- Yes
|
||||
* - unsigned long
|
||||
- ulong
|
||||
- Yes
|
||||
* - unsigned long long
|
||||
- ulonglong
|
||||
- Yes
|
||||
* - int8_t
|
||||
- int8
|
||||
- No
|
||||
* - int16_t
|
||||
- int16
|
||||
- No
|
||||
* - int32_t
|
||||
- int32
|
||||
- No
|
||||
* - int64_t
|
||||
- int64
|
||||
- No
|
||||
* - uint8_t
|
||||
- uint8
|
||||
- No
|
||||
* - uint16_t
|
||||
- uint16
|
||||
- No
|
||||
* - uint32_t
|
||||
- uint32
|
||||
- No
|
||||
* - uint64_t
|
||||
- uint64
|
||||
- No
|
||||
* - size_t
|
||||
- size
|
||||
- No
|
||||
* - ptrdiff_t
|
||||
- ptrdiff
|
||||
- No
|
||||
|
||||
@@ -0,0 +1,101 @@
|
||||
.. meta::
|
||||
:description: rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
|
||||
:keywords: rocSHMEM, API, ROCm, documentation, HIP, Networking, Communication
|
||||
|
||||
.. _rocshmem-api-sigops:
|
||||
|
||||
---------------------
|
||||
Signaling Operations
|
||||
---------------------
|
||||
|
||||
ROCSHMEM_PUTMEM_SIGNAL
|
||||
----------------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_putmem_signal(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_putmem_signal_wave(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_putmem_signal_wg(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_putmem_signal_nbi(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_putmem_signal_nbi_wave(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_putmem_signal_nbi_wg(void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_putmem_signal(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_putmem_signal_wave(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_putmem_signal_wg(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_putmem_signal_nbi(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_putmem_signal_nbi_wave(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_putmem_signal_nbi_wg(rocshmem_ctx_t ctx, void *dest, const void *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param source: Source address; Must be an address on the symmetric heap
|
||||
:param nelems: The number of bytes to transfer
|
||||
:param sig_addr: Signal address; Must be an address on the symmetric heap
|
||||
:param signal: Signal value
|
||||
:param sig_op: Atomic operation used to apply the signal value at sig_addr
|
||||
:param pe: PE of the remote process
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Writes contiguous data of nelems bytes from source on the calling PE to dest at pe.
|
||||
Then applies sig_op at sig_addr using the signal value.
|
||||
Valid sig_op values can be seen at SIGNAL_OPERATORS_.
|
||||
|
||||
ROCSHMEM_PUT_SIGNAL
|
||||
-------------------
|
||||
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_put_signal(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_put_signal_wave(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_put_signal_wg(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_put_signal_nbi(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_put_signal_nbi_wave(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_TYPENAME_put_signal_nbi_wg(TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_put_signal(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_put_signal_wave(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_put_signal_wg(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_put_signal_nbi(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_put_signal_nbi_wave(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
.. cpp:function:: __device__ void rocshmem_ctx_TYPENAME_put_signal_nbi_wg(rocshmem_ctx_t ctx, TYPE *dest, const TYPE *source, size_t nelems, uint64_t *sig_addr, uint64_t signal, int sig_op, int pe)
|
||||
|
||||
:param ctx: Context with which to perform this operation
|
||||
:param dest: Destination address; Must be an address on the symmetric heap
|
||||
:param source: Source address; Must be an address on the symmetric heap
|
||||
:param nelems: The number of elements of size TYPE to transfer
|
||||
:param sig_addr: Signal address; Must be an address on the symmetric heap
|
||||
:param signal: Signal value
|
||||
:param sig_op: Atomic operation used to apply the signal value at sig_addr
|
||||
:param pe: PE of the remote process
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Writes contiguous data of nelems elements of TYPE from source on the calling PE to dest at pe.
|
||||
Then applies sig_op at sig_addr using the signal value.
|
||||
Valid sig_op values can be seen at SIGNAL_OPERATORS_.
|
||||
Valid ``TYPENAME`` and ``TYPE`` values can be seen at :ref:`RMA_TYPES`.
|
||||
|
||||
ROCSHMEM_SIGNAL_FETCH
|
||||
---------------------
|
||||
|
||||
.. cpp:function:: __device__ uint64_t rocshmem_signal_fetch(const uint64_t *sig_addr)
|
||||
.. cpp:function:: __device__ uint64_t rocshmem_signal_fetch_wg(const uint64_t *sig_addr)
|
||||
.. cpp:function:: __device__ uint64_t rocshmem_signal_fetch_wave(const uint64_t *sig_addr)
|
||||
|
||||
:param sig_addr: Signal address; Must be an address on the symmetric heap
|
||||
:returns: Value at sig_addr
|
||||
|
||||
**Description:**
|
||||
Atomically fetches the value stored at sig_addr.
|
||||
|
||||
SIGNAL OPERATORS
|
||||
----------------
|
||||
.. _SIGNAL_OPERATORS:
|
||||
|
||||
.. list-table:: Signal Operators
|
||||
:widths: 20 40
|
||||
:header-rows: 1
|
||||
|
||||
* - Value
|
||||
- Description
|
||||
* - ROCSHMEM_SIGNAL_SET
|
||||
- The signaling operation routines atomically set the value at sig_addr to the signal value.
|
||||
* - ROCSHMEM_SIGNAL_ADD
|
||||
- The signaling operation routines atomically add the signal value to the value at sig_addr.
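
The difference between the two operators can be illustrated with a host-only model built on ``std::atomic``. The ``SignalOp`` enum and ``apply_signal`` function are hypothetical names used for illustration; they are not the rocSHMEM constants or an actual library routine.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-ins for ROCSHMEM_SIGNAL_SET / ROCSHMEM_SIGNAL_ADD.
enum class SignalOp { Set, Add };

// Applies `signal` to the signal word following the table's descriptions.
inline void apply_signal(std::atomic<std::uint64_t>& sig_word,
                         std::uint64_t signal, SignalOp op) {
  if (op == SignalOp::Set) {
    sig_word.store(signal);      // overwrite the previous value
  } else {
    sig_word.fetch_add(signal);  // accumulate onto the previous value
  }
}
```

In this model, a set-style signal suits a flag that marks a single arrival, while an add-style signal lets several senders each contribute to a counter that the receiver compares against an expected total.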
|
||||
|
||||
@@ -0,0 +1,90 @@
|
||||
.. meta::
|
||||
:description: rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
|
||||
:keywords: rocSHMEM, API, ROCm, documentation, HIP, Networking, Communication
|
||||
|
||||
.. _rocshmem-api-teams:
|
||||
|
||||
-------------------------
|
||||
Team Management Routines
|
||||
-------------------------
|
||||
|
||||
ROCSHMEM_TEAM_MY_PE
|
||||
-------------------
|
||||
|
||||
.. cpp:function:: __host__ int rocshmem_team_my_pe(rocshmem_team_t team)
|
||||
|
||||
:param team: The team to query
|
||||
:returns: PE ID of the caller in the provided team
|
||||
|
||||
**Description:**
|
||||
Query the PE ID of the caller in a team.
|
||||
|
||||
ROCSHMEM_TEAM_N_PES
|
||||
-------------------
|
||||
|
||||
.. cpp:function:: __host__ int rocshmem_team_n_pes(rocshmem_team_t team)
|
||||
|
||||
:param team: The team to query
|
||||
:returns: Number of PEs in the provided team
|
||||
|
||||
**Description:**
|
||||
Query the number of PEs in a team.
|
||||
|
||||
ROCSHMEM_TEAM_TRANSLATE_PE
|
||||
--------------------------
|
||||
|
||||
.. cpp:function:: __host__ int rocshmem_team_translate_pe(rocshmem_team_t src_team, int src_pe, rocshmem_team_t dest_team)
|
||||
|
||||
:param src_team: Handle of the team from which to translate
|
||||
:param src_pe: PE-of-interest's index in src_team
|
||||
:param dest_team: Handle of the team to which to translate
|
||||
:returns: PE of src_pe in dest_team;
|
||||
If any input is invalid or if src_pe is
|
||||
not in both source and destination teams, a value of -1 is returned
|
||||
|
||||
**Description:**
|
||||
Translate the PE in src_team to that in dest_team.
|
||||
|
||||
ROCSHMEM_TEAM_SPLIT_STRIDED
|
||||
---------------------------
|
||||
|
||||
.. cpp:function:: __host__ int rocshmem_team_split_strided(rocshmem_team_t parent_team, int start, int stride, int size, const rocshmem_team_config_t *config, long config_mask, rocshmem_team_t *new_team)
|
||||
|
||||
:param parent_team: The team to split from
|
||||
:param start: The lowest PE number of the subset of the PEs
|
||||
from the parent team that will form the new
|
||||
team
|
||||
:param stride: The stride between team PE members in the
|
||||
parent team that comprise the subset of PEs
|
||||
that will form the new team
|
||||
:param size: The number of PEs in the new team
|
||||
:param config: Pointer to the config parameters for the new team
|
||||
:param config_mask: Bitwise mask representing parameters to use from config
|
||||
:param new_team: Pointer to the newly created team;
|
||||
If an error occurs during team creation, or if the PE in
|
||||
the parent team is not in the new team, the value will be
|
||||
ROCSHMEM_TEAM_INVALID
|
||||
|
||||
:returns: Zero upon successful team creation; non-zero if erroneous
|
||||
|
||||
**Description:**
|
||||
Create a new team of PEs. Must be called by all PEs in the parent team.
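
The ``start``, ``stride`` and ``size`` parameters select the parent-team PEs ``start``, ``start + stride``, and so on, up to ``size`` members. The host-only sketch below models that selection; ``parent_pe_of`` and ``new_rank_of`` are hypothetical helpers, and the assumption that new-team ranks follow the order of selection is taken from the OpenSHMEM convention rather than from this page.

```cpp
// Parent-team PE that becomes rank `new_rank` in the new team.
inline int parent_pe_of(int start, int stride, int new_rank) {
  return start + new_rank * stride;
}

// Rank of `parent_pe` in the new team, or -1 if it is not a member.
inline int new_rank_of(int start, int stride, int size, int parent_pe) {
  if (size <= 0) return -1;
  // A zero stride can only describe a single-member team.
  if (stride == 0) return (size == 1 && parent_pe == start) ? 0 : -1;
  const int offset = parent_pe - start;
  if (offset % stride != 0) return -1;  // falls between strided members
  const int rank = offset / stride;
  return (rank >= 0 && rank < size) ? rank : -1;
}
```

With ``start = 1``, ``stride = 2`` and ``size = 4``, the new team consists of parent PEs 1, 3, 5 and 7. Parent PE 5 maps to rank 2, and parent PE 4 is not a member, which corresponds to the case where ``new_team`` is set to ``ROCSHMEM_TEAM_INVALID`` on that PE.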
|
||||
|
||||
ROCSHMEM_TEAM_DESTROY
|
||||
---------------------
|
||||
|
||||
.. cpp:function:: __host__ void rocshmem_team_destroy(rocshmem_team_t team)
|
||||
|
||||
:param team: The team to destroy; The behavior is undefined if
|
||||
the input team is ROCSHMEM_TEAM_WORLD or any other
|
||||
invalid team; If the input is ROCSHMEM_TEAM_INVALID,
|
||||
this function will not perform any operation
|
||||
|
||||
:returns: None
|
||||
|
||||
**Description:**
|
||||
Destroy a team. Must be called by all PEs in the team.
|
||||
The user must destroy all private contexts created in the
|
||||
team before destroying this team. Otherwise, the behavior
|
||||
is undefined. This call will destroy only the shareable contexts
|
||||
created from the referenced team.
|
||||
@@ -0,0 +1,80 @@
|
||||
-------------------------
|
||||
Running rocSHMEM Programs
|
||||
-------------------------
|
||||
|
||||
Compiling and Linking with rocSHMEM
|
||||
-----------------------------------
|
||||
|
||||
rocSHMEM is built as a library that can be statically
|
||||
linked to your application during compilation using ``hipcc``.
|
||||
|
||||
During the compilation of your application, include the rocSHMEM header files
|
||||
and the rocSHMEM library when using ``hipcc``.
|
||||
Because rocSHMEM depends on MPI (as of version 6.4.0; this requirement may be dropped
in future versions), you must also link with an MPI library.
|
||||
The MPI link arguments must be added manually rather than by using ``mpicc``.
|
||||
|
||||
When using ``hipcc`` directly (as opposed to through a build system), we
|
||||
recommend performing the compilation and linking steps separately.
|
||||
|
||||
For example, the example files (``./examples/*`` in the source tarball) can be
compiled and linked with the following commands:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# Compile
|
||||
hipcc -c -fgpu-rdc -x hip rocshmem_allreduce_test.cc \
|
||||
-I/opt/rocm/include \
|
||||
-I$ROCSHMEM_INSTALL_DIR/include \
|
||||
-I$OPENMPI_UCX_INSTALL_DIR/include/
|
||||
|
||||
# Link
|
||||
hipcc -fgpu-rdc --hip-link rocshmem_allreduce_test.o -o rocshmem_allreduce_test \
|
||||
$ROCSHMEM_INSTALL_DIR/lib/librocshmem.a \
|
||||
$OPENMPI_UCX_INSTALL_DIR/lib/libmpi.so \
|
||||
-L/opt/rocm/lib -lamdhip64 -lhsa-runtime64
|
||||
|
||||
If your project uses CMake, you may refer to
|
||||
`Using CMake with AMD ROCm <https://rocmdocs.amd.com/en/latest/conceptual/cmake-packages.html>`_.
|
||||
|
||||
Running a rocSHMEM program
|
||||
--------------------------
|
||||
|
||||
Programs that use rocSHMEM typically deploy multiple processes (usually one per GPU).
|
||||
The MPI launcher (e.g., ``mpiexec`` when using Open MPI) is used to start the required number
|
||||
of processes. For example, the following command launches two processes of the getmem example (available when rocSHMEM is built from source):
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
mpiexec --map-by numa --mca pml ucx --mca osc ucx -np 2 ./build/examples/rocshmem_getmem_test
|
||||
|
||||
Please refer to the Open MPI documentation for more information about ``mpiexec`` command line parameters.
|
||||
|
||||
.. note::
|
||||
Some systems have multiple MPI installations, some of which may not
|
||||
have GPU support enabled. Make sure you use the ``mpiexec`` from the expected
|
||||
MPI library, notably when using the MPI you built yourself
|
||||
as part of :ref:`install-dependencies`.
|
||||
|
||||
Environment Variables
|
||||
---------------------
|
||||
|
||||
The behavior of rocSHMEM can be controlled with the following environment variables:
|
||||
|
||||
.. list-table:: Environment Variables
|
||||
:widths: 30 10 20
|
||||
:header-rows: 1
|
||||
|
||||
* - Name
|
||||
- Default Value
|
||||
- Description
|
||||
* - ROCSHMEM_HEAP_SIZE
|
||||
- 1 GB
|
||||
- Defines the size of the rocSHMEM symmetric heap.
|
||||
Note that the heap resides in GPU memory.
|
||||
* - ROCSHMEM_MAX_NUM_CONTEXTS
|
||||
- 1024
|
||||
- Defines the maximum number of contexts an application can use
|
||||
* - ROCSHMEM_MAX_NUM_TEAMS
|
||||
- 40
|
||||
- Defines the maximum number of teams an application can use
|
||||
@@ -0,0 +1,36 @@
|
||||
# Configuration file for the Sphinx documentation builder.
|
||||
#
|
||||
# This file only contains a selection of the most common options. For a full
|
||||
# list see the documentation:
|
||||
# https://www.sphinx-doc.org/en/master/usage/configuration.html
|
||||
|
||||
import re
|
||||
|
||||
from rocm_docs import ROCmDocs
|
||||
|
||||
with open('../include/rocshmem/rocshmem.hpp', encoding='utf-8') as f:
|
||||
match = re.search(r'constexpr char VERSION\[\] = "([0-9.]+)[^0-9.]+', f.read())
|
||||
if not match:
|
||||
raise ValueError("VERSION not found!")
|
||||
version_number = match[1]
|
||||
left_nav_title = f"rocSHMEM {version_number} Documentation"
|
||||
|
||||
# for PDF output on Read the Docs
|
||||
project = "rocSHMEM Documentation"
|
||||
author = "Advanced Micro Devices, Inc."
|
||||
copyright = "Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved."
|
||||
version = version_number
|
||||
release = version_number
|
||||
|
||||
external_toc_path = "./sphinx/_toc.yml"
|
||||
|
||||
docs_core = ROCmDocs(left_nav_title)
|
||||
docs_core.run_doxygen(doxygen_root="doxygen", doxygen_path="doxygen/xml")
|
||||
docs_core.setup()
|
||||
|
||||
external_projects_current_project = "rocshmem"
|
||||
cpp_id_attributes = ["__host__", "__global__", "__device__"]
|
||||
exclude_patterns = ["README.md"]
|
||||
|
||||
for sphinx_var in ROCmDocs.SPHINX_VARS:
|
||||
globals()[sphinx_var] = getattr(docs_core, sphinx_var)
|
||||
@@ -0,0 +1,46 @@
.. meta::
   :description: rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
   :keywords: rocSHMEM, API, ROCm, documentation, HIP, Networking, Communication

****************************
rocSHMEM Documentation
****************************

The ROCm OpenSHMEM (rocSHMEM) runtime is part of an AMD and AMD Research initiative
to provide GPU-centric networking through an OpenSHMEM-like interface.
This intra-kernel networking library reduces application code complexity and
enables more fine-grained communication/computation overlap
than traditional host-driven networking.
rocSHMEM uses a single symmetric heap (SHEAP) that is allocated in GPU memory. To learn more, see :doc:`introduction`.

The code is open and hosted at `<https://github.com/ROCm/rocSHMEM>`_.

.. grid:: 2
   :gutter: 3

   .. grid-item-card:: Install

      * :doc:`Install rocSHMEM <./install>`

   .. grid-item-card:: How to

      * :doc:`Compile and Run rocSHMEM Programs <./compile_and_run>`

   .. grid-item-card:: API Reference

      * :doc:`Library Setup, Exit, and Query Routines <./api/init>`
      * :doc:`Memory Management Routines <./api/memory_management>`
      * :doc:`Team Management Routines <./api/teams>`
      * :doc:`Context Management Routines <./api/ctx>`
      * :doc:`Remote Memory Access Routines <./api/rma>`
      * :doc:`Atomic Memory Operations <./api/amo>`
      * :doc:`Signaling Operations <./api/sigops>`
      * :doc:`Collective Routines <./api/coll>`
      * :doc:`Point-to-Point Synchronization Routines <./api/pt2pt_sync>`
      * :doc:`Memory Ordering Routines <./api/memory_ordering>`

To contribute to the documentation, refer to
`Contributing to ROCm <https://rocm.docs.amd.com/en/latest/contribute/contributing.html>`_.

You can find licensing information on the
`Licensing <https://rocm.docs.amd.com/en/latest/about/license.html>`_ page.
@@ -0,0 +1,116 @@
.. meta::
   :description: rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
   :keywords: rocSHMEM, API, ROCm, documentation, HIP, Networking, Communication

.. _install-rocshmem:

---------------------------
Installing rocSHMEM
---------------------------

This topic describes how to install rocSHMEM.

The file `README.md <https://github.com/ROCm/rocSHMEM/blob/rocm-6.4.0/README.md>`_ in the rocSHMEM sources may contain additional information.

Requirements
---------------------------

1. ROCm stack installed on the system (HIP runtime)

   * ROCm v6.4.0 or later

2. AMD GPUs

   * MI250X

   * MI300X

3. ROCm-aware Open MPI and UCX as described in :ref:`install-dependencies`

Installing from a Package Manager
---------------------------------

On Ubuntu, rocSHMEM can be installed with the following command:

.. code-block:: bash

   apt install rocshmem-dev

.. note::

   This installation method requires ROCm 6.4 or newer. Dependencies
   (Open MPI and UCX) still need to be built following the instructions
   in the next section, as the distribution-packaged versions do not
   include full accelerator support.

.. _install-dependencies:

Building Dependencies
---------------------------

rocSHMEM requires a ROCm-aware Open MPI and UCX.
Other MPI implementations, such as MPICH,
*should* be compatible if rocSHMEM is built from source,
but this has not been thoroughly tested.

To build and configure ROCm-aware UCX (1.17.0 or later), run the following commands:

.. code-block:: bash

   git clone https://github.com/ROCm/ucx.git -b v1.17.x
   cd ucx
   ./autogen.sh
   ./configure --prefix=<prefix_dir> --with-rocm=<rocm_path> --enable-mt
   make -j 8
   make -j 8 install

Then, build Open MPI (5.0.7 or later) with UCX support:

.. code-block:: bash

   git clone --recursive https://github.com/open-mpi/ompi.git -b v5.0.x
   cd ompi
   ./autogen.pl
   ./configure --prefix=<prefix_dir> --with-rocm=<rocm_path> --with-ucx=<ucx_path>
   make -j 8
   make -j 8 install

Alternatively, a script is provided to install the dependencies.
Configuration options are platform dependent, so please review the script to
verify that it suits your system.

.. code-block:: bash

   export BUILD_DIR=/path/to/not_rocshmem_src_or_build/dependencies
   /path/to/rocshmem_src/scripts/install_dependencies.sh

For more information on Open MPI and UCX support, please visit:
`GPU-enabled Message Passing Interface <https://rocm.docs.amd.com/en/latest/how-to/gpu-enabled-mpi.html>`_

Installing rocSHMEM from Source
--------------------------------

The following method can be used to build and install rocSHMEM with the IPC
on-node, GPU-to-GPU backend:

.. code-block:: bash

   git clone git@github.com:ROCm/rocSHMEM.git
   cd rocSHMEM
   mkdir build
   cd build
   ../scripts/build_configs/ipc_single

The build script passes configuration options to CMake to set up a canonical
build.
There are other scripts for experimental configurations in the
`./scripts/build_configs` directory, but currently, only `ipc_single`
is supported.

By default, the library is installed in `~/rocshmem`. You may provide a
custom install path by supplying it as an argument. For example:

.. code-block:: bash

   ../scripts/build_configs/ipc_single /path/to/install
@@ -0,0 +1,54 @@
.. meta::
   :description: rocSHMEM intra-kernel networking runtime for AMD dGPUs on the ROCm platform.
   :keywords: rocSHMEM, API, ROCm, documentation, HIP, Networking, Communication

.. _rocshmem-introduction:

---------------------------
What is rocSHMEM?
---------------------------

The ROCm OpenSHMEM (rocSHMEM) runtime is part of an AMD and AMD Research initiative
to provide GPU-centric networking through an OpenSHMEM-like interface.
This intra-kernel networking library reduces application code complexity and
enables more fine-grained communication/computation overlap
than traditional host-driven networking.
rocSHMEM uses a single symmetric heap (SHEAP) that is allocated in GPU memory.

The code is open and hosted at `<https://github.com/ROCm/rocSHMEM>`_.

The rocSHMEM Programming Model
-------------------------------

How OpenSHMEM applications should interact with GPUs remains an
active discussion within the OpenSHMEM community, and the OpenSHMEM
specification has yet to coalesce on this topic.
rocSHMEM extends beyond the OpenSHMEM specification to add semantics that
support GPU kernel communication, while maintaining close resemblance to
the original OpenSHMEM specification semantics.

Applications that use HIP can easily interface with rocSHMEM.
As per the HIP programming model,
rocSHMEM has `__host__` APIs, which are to be called from host code,
and `__device__` APIs, which can be called within GPU kernels.
Any device APIs that do not have a special suffix or infix (e.g. `_wg` or `_wave`)
must be called by a single thread.
GPU-specific `_wg` and `_wave` APIs are expected to be called from multiple GPU threads
and block until the calling scope completes.
These APIs can be called in divergent code paths, but this is not recommended.

Wavefront APIs
==============
The wavefront APIs are any API calls that have the suffix `_wave`.
The parameters with which these routines are called must be
the same for every thread in the wavefront.
If any thread calls these routines with differing parameters, the behavior is undefined.
These APIs will block until the calling wavefront completes.

Workgroup APIs
==============
The workgroup APIs are any API calls that have the suffix `_wg` or infix `_wg_`.
The parameters with which these routines are called must be
the same for every thread in the workgroup.
If any thread calls these routines with differing parameters, the behavior is undefined.
These APIs will block until the calling workgroup completes.
@@ -0,0 +1,4 @@
# License

```{include} ../LICENSE.md
```
@@ -0,0 +1,93 @@
accessible-pygments==0.0.5
alabaster==1.0.0
appnope==0.1.4
asttokens==3.0.0
attrs==25.1.0
babel==2.17.0
beautifulsoup4==4.13.3
breathe==4.35.0
certifi==2025.1.31
cffi==1.17.1
charset-normalizer==3.4.1
click==8.1.8
comm==0.2.2
cryptography==44.0.1
debugpy==1.8.12
decorator==5.1.1
Deprecated==1.2.18
docutils==0.21.2
exceptiongroup==1.2.2
executing==2.2.0
fastjsonschema==2.21.1
gitdb==4.0.12
GitPython==3.1.44
idna==3.10
imagesize==1.4.1
importlib_metadata==8.6.1
ipykernel==6.29.5
ipython==8.32.0
jedi==0.19.2
Jinja2==3.1.5
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter-cache==1.0.1
jupyter_client==8.6.3
jupyter_core==5.7.2
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib-inline==0.1.7
mdit-py-plugins==0.4.2
mdurl==0.1.2
myst-nb==1.2.0
myst-parser==4.0.1
nbclient==0.10.2
nbformat==5.10.4
nest-asyncio==1.6.0
packaging==24.2
parso==0.8.4
pexpect==4.9.0
platformdirs==4.3.6
prompt_toolkit==3.0.50
psutil==7.0.0
ptyprocess==0.7.0
pure_eval==0.2.3
pycparser==2.22
pydata-sphinx-theme==0.15.4
PyGithub==2.6.1
Pygments==2.19.1
PyJWT==2.10.1
PyNaCl==1.5.0
python-dateutil==2.9.0.post0
PyYAML==6.0.2
pyzmq==26.2.1
referencing==0.36.2
requests==2.32.3
rocm-docs-core==1.17.0
rpds-py==0.23.1
six==1.17.0
smmap==5.0.2
snowballstemmer==2.2.0
soupsieve==2.6
Sphinx==8.1.3
sphinx-book-theme==1.1.4
sphinx-copybutton==0.5.2
sphinx-notfound-page==1.1.0
sphinx_design==0.6.1
sphinx_external_toc==1.0.1
sphinxcontrib-applehelp==2.0.0
sphinxcontrib-devhelp==2.0.0
sphinxcontrib-htmlhelp==2.1.0
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==2.0.0
sphinxcontrib-serializinghtml==2.0.0
SQLAlchemy==2.0.38
stack-data==0.6.3
tabulate==0.9.0
tomli==2.2.1
tornado==6.4.2
traitlets==5.14.3
typing_extensions==4.12.2
urllib3==2.3.0
wcwidth==0.2.13
wrapt==1.17.2
zipp==3.21.0
@@ -0,0 +1,46 @@
defaults:
  numbered: False
root: index
subtrees:
  - caption: Introduction
    entries:
      - file: introduction.rst
        title: What is rocSHMEM?

  - caption: Install
    entries:
      - file: install.rst
        title: Install rocSHMEM


  - caption: How to
    entries:
      - file: compile_and_run.rst
        title: Compile and Run rocSHMEM Programs

  - caption: API Reference
    entries:
      - file: api/init.rst
        title: Library Setup, Exit, and Query Routines
      - file: api/memory_management.rst
        title: Memory Management Routines
      - file: api/teams.rst
        title: Team Management Routines
      - file: api/ctx.rst
        title: Context Management Routines
      - file: api/rma.rst
        title: Remote Memory Access Routines
      - file: api/amo.rst
        title: Atomic Memory Operations
      - file: api/sigops.rst
        title: Signaling Operations
      - file: api/coll.rst
        title: Collective Routines
      - file: api/pt2pt_sync.rst
        title: Point-to-Point Synchronization Routines
      - file: api/memory_ordering.rst
        title: Memory Ordering Routines

  - caption: About
    entries:
      - file: license.rst