refactor RCCL (#1112)

* refactor RCCL
* rccl updates
* Update index.rst
* refactor
* Update what-is-rccl.rst
2024-03-15 14:14:47 +05:30
parent 50f22e8317
commit 45ee5734dd
6 changed files with 72 additions and 32 deletions
@@ -1,7 +0,0 @@
-=======
-All API
-=======
-
-.. doxygenindex::
-
-
@@ -0,0 +1,11 @@
+.. meta::
+   :description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
+   :keywords: RCCL, ROCm, library, API
+
+.. _api-library:
+
+=============
+API library
+=============
+
+.. doxygenindex::
@@ -1,11 +1,28 @@
-****
-RCCL
-****
+.. meta::
+   :description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
+   :keywords: RCCL, ROCm, library, API

-The ROCm Collective Communication Library (RCCL) is a stand-alone library which provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs.
+.. _index:

-RCCL (pronounced “Rickel”) implements routines such as all-reduce, all-gather, reduce, broadcast, reduce-scatter, gather, scatter, all-to-allv, and all-to-all as well as direct point-to-point (GPU-to-GPU) send and receive operations.
+===========================
+RCCL documentation
+===========================

-The provided collective communication routines are implemented using Ring and Tree algorithms. They are optimized to achieve high bandwidth and low latency by leveraging topology awareness, high-speed interconnects, RDMA based collectives. RCCL utilizes PCIe and xGMI high-speed interconnects for intra-node communication as well as InfiniBand, RoCE, and TCP/IP for inter-node communication.
+Welcome to the ROCm Collective Communication Library (RCCL) docs home page! To learn more, see :ref:`what-is-rccl`.

-RCCL supports an arbitrary number of GPUs installed in a single-node or multi-node platform. It can be easily integrated into either single- or multi-process (e.g., MPI) applications.
+Our documentation is structured as follows:
+
+
+.. grid:: 2
+  :gutter: 3
+
+  .. grid-item-card:: API reference
+
+    * :ref:`Library specification<library-specification>`
+    * :ref:`api-library`
+       
+To contribute to the documentation, refer to
+`Contributing to ROCm <https://rocm.docs.amd.com/en/latest/contribute/contributing.html>`_.
+
+Licensing information can be found on the
+`Licensing <https://rocm.docs.amd.com/en/latest/about/license.html>`_ page.
@@ -1,14 +1,16 @@
-.. toctree::
-   :maxdepth: 4
-   :caption: Contents:
+.. meta::
+   :description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
+   :keywords: RCCL, ROCm, library, API

-===
-API
-===
+.. _library-specification:

-This section provides details of the library API
+============================
+RCCL library specification
+============================

-Communicator Functions
+This document provides details of the API library. 
+
+Communicator functions
 ----------------------

 .. doxygenfunction:: ncclGetUniqueId
@@ -27,7 +29,7 @@ Communicator Functions

 .. doxygenfunction:: ncclCommUserRank

-Collective Communication Operations
+Collective communication operations
 -----------------------------------

 Collective communication operations must be called separately for each communicator in a communicator clique.
@@ -58,7 +60,7 @@ Since they may perform inter-CPU synchronization, each call has to be done from

 .. doxygenfunction:: ncclAllToAll

-Group Semantics
+Group semantics
 ---------------
 When managing multiple GPUs from a single thread, and since NCCL collective
 calls may perform inter-CPU synchronization, we need to "group" calls for
@@ -78,7 +80,7 @@ of ncclGroupStart/ncclGroupEnd.

 .. doxygenfunction:: ncclGroupEnd

-Library Functions
+Library functions
 -----------------

 .. doxygenfunction:: ncclGetVersion
@@ -108,7 +110,3 @@ This section provides all the enumerations used.
 .. doxygenenum:: ncclRedOp_t

 .. doxygenenum:: ncclDataType_t
-
-
-
-
@@ -1,10 +1,15 @@
 root: index
 subtrees:
 - entries:
-  - file: api
-  - file: allapi
-  - file: attributions
+  - file: what-is-rccl
+- caption: API reference 
+  entries:
+  - file: library-specification
+    title: Library specification
+  - file: api-library
+  
 - caption: About
  entries:
  - file: license
+  - file: attributions

@@ -0,0 +1,16 @@
+.. meta::
+   :description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
+   :keywords: RCCL, ROCm, library, API
+
+.. _what-is-rccl:
+
+=====================
+What is RCCL?
+=====================
+
+RCCL (pronounced “Rickel”) is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs.
+It implements routines such as ``all-reduce``, ``all-gather``, ``reduce``, ``broadcast``, ``reduce-scatter``, ``gather``, ``scatter``, ``all-to-allv``, and ``all-to-all`` as well as direct point-to-point (GPU-to-GPU) send and receive operations.
+The provided collective communication routines are implemented using Ring and Tree algorithms. They are optimized to achieve high bandwidth and low latency by leveraging topology awareness, high-speed interconnects, and RDMA-based collectives.
+
+RCCL utilizes PCIe and xGMI high-speed interconnects for intra-node communication as well as InfiniBand, RoCE, and TCP/IP for inter-node communication.
+It supports an arbitrary number of GPUs installed in a single-node or multi-node platform and can be easily integrated into single- or multi-process (e.g., MPI) applications.
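
Reviewer note: the new ``what-is-rccl.rst`` text mentions that the collectives use Ring and Tree algorithms. As a rough illustration of what a ring all-reduce does, here is a minimal pure-Python simulation. It is a sketch of the general textbook technique under stated assumptions, not RCCL's implementation: the function name ``ring_all_reduce`` is hypothetical, ranks are modelled as plain lists, the reduction is fixed to a sum, and the buffer length is assumed to be divisible by the number of ranks.

```python
def ring_all_reduce(buffers):
    """Simulate a sum all-reduce over a logical ring of len(buffers) ranks.

    Hypothetical helper for illustration only; it does not call RCCL.
    Each inner list stands in for one rank's send buffer.
    """
    n = len(buffers)
    length = len(buffers[0])
    if any(len(b) != length for b in buffers):
        raise ValueError("all ranks must contribute buffers of equal length")
    if length % n != 0:
        raise ValueError("this sketch assumes length is divisible by rank count")

    chunk = length // n
    data = [list(b) for b in buffers]  # work on copies, leave inputs untouched

    def seg(c):
        # Slice covering chunk index c.
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) % n to
    # its right neighbour, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        sends = [(r, (r - step) % n) for r in range(n)]
        payloads = [data[r][seg(c)] for r, c in sends]  # snapshot before updates
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            data[dst][seg(c)] = [a + b for a, b in zip(data[dst][seg(c)], payload)]

    # Now rank r holds the fully reduced chunk (r + 1) % n.
    # Phase 2, all-gather: in step s, rank r forwards chunk (r + 1 - s) % n,
    # and the right neighbour overwrites its copy with the reduced values.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n) for r in range(n)]
        payloads = [data[r][seg(c)] for r, c in sends]
        for (r, c), payload in zip(sends, payloads):
            dst = (r + 1) % n
            data[dst][seg(c)] = payload

    return data


if __name__ == "__main__":
    # Two simulated ranks, four elements each.
    print(ring_all_reduce([[1, 2, 3, 4], [10, 20, 30, 40]]))
    # prints [[11, 22, 33, 44], [11, 22, 33, 44]]
```

Each rank sends and receives only one chunk per step, which is why ring schemes tend to use link bandwidth well; a real library would overlap these transfers on GPU streams rather than run them in a loop.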