refactor RCCL (#1112)

* refactor RCCL

* rccl updates

* Update index.rst

* refactor

* Update what-is-rccl.rst
This commit is contained in:
srawat
2024-03-15 14:14:47 +05:30
committed by GitHub
parent 50f22e8317
commit 45ee5734dd
6 changed files with 72 additions and 32 deletions
-7
@@ -1,7 +0,0 @@
=======
All API
=======
.. doxygenindex::
+11
@@ -0,0 +1,11 @@
.. meta::
:description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
:keywords: RCCL, ROCm, library, API
.. _api-library:
=============
API library
=============
.. doxygenindex::
+24 -7
@@ -1,11 +1,28 @@
****
RCCL
****
.. meta::
:description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
:keywords: RCCL, ROCm, library, API
The ROCm Collective Communication Library (RCCL) is a stand-alone library which provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs.
.. _index:
RCCL (pronounced “Rickel”) implements routines such as all-reduce, all-gather, reduce, broadcast, reduce-scatter, gather, scatter, all-to-allv, and all-to-all as well as direct point-to-point (GPU-to-GPU) send and receive operations.
===========================
RCCL documentation
===========================
The provided collective communication routines are implemented using Ring and Tree algorithms. They are optimized to achieve high bandwidth and low latency by leveraging topology awareness, high-speed interconnects, RDMA based collectives. RCCL utilizes PCIe and xGMI high-speed interconnects for intra-node communication as well as InfiniBand, RoCE, and TCP/IP for inter-node communication.
Welcome to the ROCm Collective Communication Library (RCCL) docs home page! To learn more, see :ref:`what-is-rccl`.
RCCL supports an arbitrary number of GPUs installed in a single-node or multi-node platform. It can be easily integrated into either single- or multi-process (e.g., MPI) applications.
Our documentation is structured as follows:
.. grid:: 2
:gutter: 3
.. grid-item-card:: API reference
* :ref:`Library specification<library-specification>`
* :ref:`api-library`
To contribute to the documentation, refer to
`Contributing to ROCm <https://rocm.docs.amd.com/en/latest/contribute/contributing.html>`_.
Licensing information can be found on the
`Licensing <https://rocm.docs.amd.com/en/latest/about/license.html>`_ page.
+13 -15
@@ -1,14 +1,16 @@
.. toctree::
:maxdepth: 4
:caption: Contents:
.. meta::
:description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
:keywords: RCCL, ROCm, library, API
===
API
===
.. _library-specification:
This section provides details of the library API
============================
RCCL library specification
============================
Communicator Functions
This document provides details of the API library.
Communicator functions
----------------------
.. doxygenfunction:: ncclGetUniqueId
@@ -27,7 +29,7 @@ Communicator Functions
.. doxygenfunction:: ncclCommUserRank
Collective Communication Operations
Collective communication operations
-----------------------------------
Collective communication operations must be called separately for each communicator in a communicator clique.
@@ -58,7 +60,7 @@ Since they may perform inter-CPU synchronization, each call has to be done from
.. doxygenfunction:: ncclAllToAll
Group Semantics
Group semantics
---------------
When managing multiple GPUs from a single thread, and since NCCL collective
calls may perform inter-CPU synchronization, we need to "group" calls for
@@ -78,7 +80,7 @@ of ncclGroupStart/ncclGroupEnd.
.. doxygenfunction:: ncclGroupEnd
Library Functions
Library functions
-----------------
.. doxygenfunction:: ncclGetVersion
@@ -108,7 +110,3 @@ This section provides all the enumerations used.
.. doxygenenum:: ncclRedOp_t
.. doxygenenum:: ncclDataType_t
+8 -3
@@ -1,10 +1,15 @@
root: index
subtrees:
- entries:
- file: api
- file: allapi
- file: attributions
- file: what-is-rccl
- caption: API reference
entries:
- file: library-specification
title: Library specification
- file: api-library
- caption: About
entries:
- file: license
- file: attributions
+16
@@ -0,0 +1,16 @@
.. meta::
:description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
:keywords: RCCL, ROCm, library, API
.. _what-is-rccl:
=====================
What is RCCL?
=====================
RCCL (pronounced “Rickel”) is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs.
It implements routines such as `all-reduce`, `all-gather`, `reduce`, `broadcast`, `reduce-scatter`, `gather`, `scatter`, `all-to-allv`, and `all-to-all` as well as direct point-to-point (GPU-to-GPU) send and receive operations.
The provided collective communication routines are implemented using Ring and Tree algorithms. They are optimized to achieve high bandwidth and low latency by leveraging topology awareness, high-speed interconnects, and RDMA-based collectives.
RCCL utilizes PCIe and xGMI high-speed interconnects for intra-node communication as well as InfiniBand, RoCE, and TCP/IP for inter-node communication.
It supports an arbitrary number of GPUs installed in a single-node or multi-node platform and can be easily integrated into single- or multi-process (e.g., MPI) applications.
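
To make the Ring algorithm mentioned above more concrete, the following is a minimal CPU-only sketch of a ring all-reduce (sum) in plain Python. It is an illustration of the general technique and not RCCL's implementation: the function name ``ring_all_reduce`` is hypothetical, each "rank" is simply a Python list standing in for a GPU buffer, and the sketch assumes the buffer length is divisible by the number of ranks.

```python
def ring_all_reduce(buffers):
    """Simulate a ring all-reduce (sum) over one equal-length buffer per rank.

    Hypothetical helper for illustration only; it does not call RCCL.
    Returns a new list of buffers, each holding the element-wise sum.
    """
    if not buffers:
        raise ValueError("at least one rank is required")
    n = len(buffers)
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("all ranks must hold buffers of the same length")
    if length % n != 0:
        raise ValueError("this sketch assumes length is divisible by rank count")

    chunk = length // n
    data = [list(b) for b in buffers]  # work on copies

    def span(c):
        # Index range covered by chunk c.
        return range(c * chunk, (c + 1) * chunk)

    # Phase 1: reduce-scatter. In each step, every rank sends one chunk to
    # its right-hand neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        # Snapshot all payloads first so the sends in a step are simultaneous.
        sends = []
        for rank in range(n):
            c = (rank - step) % n
            sends.append((c, [data[rank][i] for i in span(c)]))
        for rank in range(n):
            dst = (rank + 1) % n
            c, payload = sends[rank]
            for off, i in enumerate(span(c)):
                data[dst][i] += payload[off]

    # Now rank r holds the fully reduced chunk (r + 1) % n.
    # Phase 2: all-gather. Reduced chunks travel around the ring and
    # overwrite the stale partial sums on each rank.
    for step in range(n - 1):
        sends = []
        for rank in range(n):
            c = (rank + 1 - step) % n
            sends.append((c, [data[rank][i] for i in span(c)]))
        for rank in range(n):
            dst = (rank + 1) % n
            c, payload = sends[rank]
            for off, i in enumerate(span(c)):
                data[dst][i] = payload[off]

    return data


if __name__ == "__main__":
    print(ring_all_reduce([[1, 2], [10, 20]]))  # [[11, 22], [11, 22]]
```

In this sketch, each rank sends and receives only one chunk per step, for a total of ``2 * (n - 1)`` steps. This is why ring-based schemes are commonly associated with good bandwidth utilization; a tree-based scheme trades some of that for fewer steps, and therefore lower latency, on large rank counts.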