From 9898395fbe28c4a6aca08c08fb32adb257affd08 Mon Sep 17 00:00:00 2001
From: Jeffrey Novotny
Date: Tue, 12 Nov 2024 13:15:27 -0500
Subject: [PATCH] Refactor landing page and move some info to What is RCCL (#1415)

[ROCm/rccl commit: 2d07f1869630636ea18e95b8ac34d0a7832f2dae]
---
 projects/rccl/docs/index.rst          | 27 +++++++++++------------
 projects/rccl/docs/sphinx/_toc.yml.in |  7 +++++-
 projects/rccl/docs/what-is-rccl.rst   | 31 +++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 15 deletions(-)
 create mode 100644 projects/rccl/docs/what-is-rccl.rst

diff --git a/projects/rccl/docs/index.rst b/projects/rccl/docs/index.rst
index ab157789e4..34125e1801 100644
--- a/projects/rccl/docs/index.rst
+++ b/projects/rccl/docs/index.rst
@@ -8,24 +8,23 @@
 RCCL documentation
 ******************
 
-The ROCm Communication Collectives Library (RCCL) is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs.
-It implements routines such as ``all-reduce``, ``all-gather``, ``reduce``, ``broadcast``, ``reduce-scatter``, ``gather``, ``scatter``, ``all-to-allv``, and ``all-to-all`` as well as direct point-to-point (GPU-to-GPU) send and receive operations. It has been optimized to achieve high bandwidth on platforms using PCIe, xGMI as well as networking using InfiniBand Verbs or TCP/IP sockets. RCCL supports an arbitrary number of GPUs installed in a single node or multiple nodes, and can be used in either single- or multi-process (e.g., MPI) applications.
+The ROCm Communication Collectives Library (RCCL) is a stand-alone library
+that provides multi-GPU and multi-node collective communication primitives
+optimized for AMD GPUs. It uses PCIe and xGMI high-speed interconnects.
+To learn more, see :doc:`what-is-rccl`
 
-The collective operations are implemented using Ring and Tree algorithms, and have been optimized for throughput and latency by leveraging topology awareness, high-speed interconnects, and RDMA based collectives. For best performance, small operations can be either batched into larger operations or aggregated through the API.
-
-RCCL utilizes PCIe and xGMI high-speed interconnects for intra-node communication as well as InfiniBand, RoCE, and TCP/IP for inter-node communication. It supports an arbitrary number of GPUs installed in a single-node or multi-node platform and can be easily integrated into single- or multi-process (e.g., MPI) applications.
-
-You can access RCCL code on the `RCCL GitHub repository `_.
-
-The documentation is structured as follows:
+The RCCL public repository is located at ``_.
 
 .. grid:: 2
   :gutter: 3
 
-  .. grid-item-card:: Installation
+  .. grid-item-card:: Install
+
+    * :ref:`RCCL installation guide `
+
+.. grid:: 2
+  :gutter: 3
 
-    * :ref:`install`
-
   .. grid-item-card:: How to
 
     * :ref:`using-nccl`
@@ -35,8 +34,8 @@ The documentation is structured as follows:
     * :ref:`Library specification`
     * :ref:`api-library`
 
-To contribute to the documentation refer to
+To contribute to the documentation, see
 `Contributing to ROCm `_.
 
-Licensing information can be found on the
+You can find licensing information on the
 `Licensing `_ page.
diff --git a/projects/rccl/docs/sphinx/_toc.yml.in b/projects/rccl/docs/sphinx/_toc.yml.in
index cbe87e069f..7d991867e6 100644
--- a/projects/rccl/docs/sphinx/_toc.yml.in
+++ b/projects/rccl/docs/sphinx/_toc.yml.in
@@ -1,9 +1,14 @@
 root: index
 subtrees:
 
-- caption: Installation
+- entries:
+  - file: what-is-rccl.rst
+    title: What is RCCL?
+
+- caption: Install
   entries:
   - file: install/installation
+    title: Installation guide
 
 - caption: How to
   entries:
diff --git a/projects/rccl/docs/what-is-rccl.rst b/projects/rccl/docs/what-is-rccl.rst
new file mode 100644
index 0000000000..f95bed2e26
--- /dev/null
+++ b/projects/rccl/docs/what-is-rccl.rst
@@ -0,0 +1,31 @@
+.. meta::
+  :description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
+  :keywords: RCCL, ROCm, library, API
+
+.. _what-is:
+
+******************
+What is RCCL?
+******************
+
+The ROCm Communication Collectives Library (RCCL) includes multi-GPU and
+multi-node collective communication primitives optimized for AMD GPUs.
+It implements routines such as ``all-reduce``, ``all-gather``, ``reduce``,
+``broadcast``, ``reduce-scatter``, ``gather``, ``scatter``, ``all-to-allv``,
+and ``all-to-all``, as well as direct point-to-point (GPU-to-GPU) send
+and receive operations. It is optimized to achieve high bandwidth
+on platforms using PCIe and xGMI and networking using InfiniBand Verbs or TCP/IP
+sockets. RCCL supports an arbitrary number of GPUs installed in a single node
+or multiple nodes and can be used in either
+single- or multi-process (for example, MPI) applications.
+
+The collective operations are implemented using ring and tree algorithms and have been optimized
+for throughput and latency by leveraging topology awareness, high-speed interconnects,
+and RDMA-based collectives. For best performance, small operations can be either
+batched into larger operations or aggregated through the API.
+
+RCCL uses PCIe and xGMI high-speed interconnects for intra-node communication
+as well as InfiniBand, RoCE, and TCP/IP for inter-node communication.
+It supports an arbitrary number of GPUs installed in a single-node or
+multi-node platform and can easily integrate into
+single- or multi-process (for example, MPI) applications.
\ No newline at end of file
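
Reviewer note: the new "What is RCCL?" page says the collectives use ring and tree algorithms. As a rough illustration of the ring idea only, the sketch below simulates a sum ``all-reduce`` in plain Python. It does not call RCCL and is not taken from the RCCL sources. The function name ``ring_all_reduce`` is hypothetical, and the requirement that the buffer length be divisible by the number of ranks is a simplifying assumption.

```python
def ring_all_reduce(buffers):
    """Simulate a sum all-reduce over a logical ring (illustrative only).

    `buffers` holds one equal-length list of numbers per simulated rank.
    Assumption for simplicity: the length is divisible by the rank count.
    Returns new per-rank buffers; the input is left untouched.
    """
    n = len(buffers)
    size = len(buffers[0])
    if any(len(b) != size for b in buffers) or size % n != 0:
        raise ValueError("buffers must be equal length and divisible by rank count")
    chunk = size // n
    data = [list(b) for b in buffers]

    def span(c):
        # Slice covering chunk index c of a buffer.
        return slice(c * chunk, (c + 1) * chunk)

    # Phase 1, reduce-scatter: after n - 1 steps, rank r holds the fully
    # reduced chunk (r + 1) % n.
    for step in range(n - 1):
        # Snapshot outgoing chunks so all sends in a step are simultaneous.
        outgoing = [data[r][span((r - step) % n)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            s = span((r - step) % n)
            data[dst][s] = [a + b for a, b in zip(data[dst][s], outgoing[r])]

    # Phase 2, all-gather: circulate the reduced chunks around the ring.
    for step in range(n - 1):
        outgoing = [data[r][span((r + 1 - step) % n)] for r in range(n)]
        for r in range(n):
            dst = (r + 1) % n
            data[dst][span((r + 1 - step) % n)] = outgoing[r]

    return data
```

Each simulated rank only ever exchanges one chunk with its ring neighbor per step, which is why ring schemes are usually described as bandwidth-friendly for large buffers. Whether and when RCCL selects a ring or a tree is not stated in this patch.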