refactor RCCL (#1112)
* refactor RCCL
* rccl updates
* Update index.rst
* refactor
* Update what-is-rccl.rst
@@ -1,7 +0,0 @@
=======
All API
=======

.. doxygenindex::
@@ -0,0 +1,11 @@
.. meta::
   :description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
   :keywords: RCCL, ROCm, library, API

.. _api-library:

=============
API library
=============

.. doxygenindex::
+24
-7
@@ -1,11 +1,28 @@
****
RCCL
****
.. meta::
   :description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
   :keywords: RCCL, ROCm, library, API

The ROCm Collective Communication Library (RCCL) is a stand-alone library which provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs.
.. _index:

RCCL (pronounced “Rickel”) implements routines such as all-reduce, all-gather, reduce, broadcast, reduce-scatter, gather, scatter, all-to-allv, and all-to-all as well as direct point-to-point (GPU-to-GPU) send and receive operations.
===========================
RCCL documentation
===========================

The provided collective communication routines are implemented using Ring and Tree algorithms. They are optimized to achieve high bandwidth and low latency by leveraging topology awareness, high-speed interconnects, RDMA based collectives. RCCL utilizes PCIe and xGMI high-speed interconnects for intra-node communication as well as InfiniBand, RoCE, and TCP/IP for inter-node communication.
Welcome to the ROCm Collective Communication Library (RCCL) docs home page! To learn more, see :ref:`what-is-rccl`.

RCCL supports an arbitrary number of GPUs installed in a single-node or multi-node platform. It can be easily integrated into either single- or multi-process (e.g., MPI) applications.
Our documentation is structured as follows:


.. grid:: 2
  :gutter: 3

  .. grid-item-card:: API reference

    * :ref:`Library specification<library-specification>`
    * :ref:`api-library`

To contribute to the documentation refer to
`Contributing to ROCm <https://rocm.docs.amd.com/en/latest/contribute/contributing.html>`_.

Licensing information can be found on the
`Licensing <https://rocm.docs.amd.com/en/latest/about/license.html>`_ page.
@@ -1,14 +1,16 @@
.. toctree::
   :maxdepth: 4
   :caption: Contents:
.. meta::
   :description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
   :keywords: RCCL, ROCm, library, API

===
API
===
.. _library-specification:

This section provides details of the library API
============================
RCCL library specification
============================

Communicator Functions
This document provides details of the API library.

Communicator functions
----------------------

.. doxygenfunction:: ncclGetUniqueId
@@ -27,7 +29,7 @@ Communicator Functions

.. doxygenfunction:: ncclCommUserRank

Collective Communication Operations
Collective communication operations
-----------------------------------

Collective communication operations must be called separately for each communicator in a communicator clique.
@@ -58,7 +60,7 @@ Since they may perform inter-CPU synchronization, each call has to be done from

.. doxygenfunction:: ncclAllToAll

Group Semantics
Group semantics
---------------
When managing multiple GPUs from a single thread, and since NCCL collective
calls may perform inter-CPU synchronization, we need to "group" calls for
@@ -78,7 +80,7 @@ of ncclGroupStart/ncclGroupEnd.

.. doxygenfunction:: ncclGroupEnd

Library Functions
Library functions
-----------------

.. doxygenfunction:: ncclGetVersion
@@ -108,7 +110,3 @@ This section provides all the enumerations used.
.. doxygenenum:: ncclRedOp_t

.. doxygenenum:: ncclDataType_t
@@ -1,10 +1,15 @@
root: index
subtrees:
- entries:
  - file: api
  - file: allapi
  - file: attributions
  - file: what-is-rccl
- caption: API reference
  entries:
  - file: library-specification
    title: Library specification
  - file: api-library

- caption: About
  entries:
  - file: license
  - file: attributions
@@ -0,0 +1,16 @@
.. meta::
   :description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
   :keywords: RCCL, ROCm, library, API

.. _what-is-rccl:

=====================
What is RCCL?
=====================

RCCL (pronounced “Rickel”) is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs.
It implements routines such as `all-reduce`, `all-gather`, `reduce`, `broadcast`, `reduce-scatter`, `gather`, `scatter`, `all-to-allv`, and `all-to-all` as well as direct point-to-point (GPU-to-GPU) send and receive operations.
The provided collective communication routines are implemented using Ring and Tree algorithms. They are optimized to achieve high bandwidth and low latency by leveraging topology awareness, high-speed interconnects, and RDMA based collectives.

RCCL utilizes PCIe and xGMI high-speed interconnects for intra-node communication as well as InfiniBand, RoCE, and TCP/IP for inter-node communication.
It supports an arbitrary number of GPUs installed in a single-node or multi-node platform and can be easily integrated into single- or multi-process (e.g., MPI) applications.
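
The text above names Ring algorithms and the ``all-reduce`` routine without showing how a ring all-reduce moves data. The following is a minimal CPU-only sketch of the textbook ring scheme (a reduce-scatter pass followed by an all-gather pass), written in plain Python. It is not RCCL code and makes no claim about how RCCL implements the routine: the function name ``ring_all_reduce``, the list-of-lists buffer layout, and the restriction that the buffer length divides evenly by the rank count are all assumptions made for illustration.

```python
def ring_all_reduce(rank_buffers):
    """Simulate a sum all-reduce over a logical ring of ranks.

    Hypothetical helper for illustration only; it runs on the CPU and
    does not call RCCL. rank_buffers[r] is the input buffer of rank r.
    Returns one output buffer per rank, each holding the element-wise sum.
    """
    n = len(rank_buffers)
    length = len(rank_buffers[0])
    if any(len(buf) != length for buf in rank_buffers):
        raise ValueError("all ranks must contribute buffers of equal length")
    if length % n != 0:
        # Assumption of this sketch: one equally sized chunk per rank.
        raise ValueError("buffer length must be divisible by the rank count")
    chunk = length // n
    data = [list(buf) for buf in rank_buffers]  # private copy per rank

    def span(c):
        # Element indices that belong to chunk c.
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1 (reduce-scatter): in step s, rank r sends chunk (r - s) mod n
    # to its right-hand neighbour, which adds it to its own copy.
    for s in range(n - 1):
        # Snapshot what every rank sends so the step behaves as if all
        # transfers happened at the same time.
        outgoing = [[data[r][i] for i in span((r - s) % n)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            for k, i in enumerate(span((r - s) % n)):
                data[dst][i] += outgoing[r][k]

    # Rank r now holds the fully reduced chunk (r + 1) mod n.
    # Phase 2 (all-gather): in step s, rank r forwards chunk (r + 1 - s) mod n
    # to its right-hand neighbour, which overwrites its own copy.
    for s in range(n - 1):
        outgoing = [[data[r][i] for i in span((r + 1 - s) % n)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            for k, i in enumerate(span((r + 1 - s) % n)):
                data[dst][i] = outgoing[r][k]

    return data
```

Calling ``ring_all_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])`` returns three copies of ``[111, 222, 333]``, one per simulated rank. Each rank only ever exchanges one chunk with its neighbour per step, which is the property that makes ring schemes attractive for bandwidth-bound collectives.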