From d3e9db9432634ebeb0af485d9e82c8e701e85386 Mon Sep 17 00:00:00 2001
From: Adel Johar
Date: Fri, 13 Jun 2025 12:16:49 +0200
Subject: [PATCH] Docs: Add environment variables reference page

[ROCm/rccl commit: aaf8613b7668748499d5cb4f364f3d9b4a74137c]
---
 projects/rccl/.gitignore                           |   3 +
 .../rccl/docs/api-reference/env-variables.rst      | 165 ++++++++++++++++++
 projects/rccl/docs/index.rst                       |   5 +-
 projects/rccl/docs/sphinx/_toc.yml.in              |  10 +-
 4 files changed, 177 insertions(+), 6 deletions(-)
 create mode 100644 projects/rccl/docs/api-reference/env-variables.rst

diff --git a/projects/rccl/.gitignore b/projects/rccl/.gitignore
index b5c0617484..d3dedfb53d 100644
--- a/projects/rccl/.gitignore
+++ b/projects/rccl/.gitignore
@@ -3,3 +3,6 @@
 /coverage/
 build/
 ext/
+
+# Visual Studio Code
+.vscode
\ No newline at end of file
diff --git a/projects/rccl/docs/api-reference/env-variables.rst b/projects/rccl/docs/api-reference/env-variables.rst
new file mode 100644
index 0000000000..165f3f0816
--- /dev/null
+++ b/projects/rccl/docs/api-reference/env-variables.rst
@@ -0,0 +1,165 @@
+.. meta::
+  :description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
+  :keywords: RCCL, ROCm, library, API, reference, environment variable, environment
+
+.. _env-variables:
+
+********************************************************************
+RCCL environment variables
+********************************************************************
+
+This section describes the most important RCCL environment variables,
+which are grouped by functionality.
+
+Configuration and setup
+========================
+
+The configuration and setup environment variables for RCCL are collected
+in the following table.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 70,30
+
+   * - **Environment variable**
+     - **Value**
+
+   * - | ``NCCL_CONF_FILE``
+       | Specifies the path to the RCCL configuration file.
+     - | String path to configuration file
+       | Default: ``~/.rccl.conf`` or ``/etc/rccl.conf``
+
+   * - | ``NCCL_HOSTID``
+       | Sets the host identifier for multi-node communication.
+     - | String value for host identification
+       | Used for host hash generation
+
+Logging and debugging
+=====================
+
+The logging and debugging environment variables for RCCL are collected
+in the following table.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 70,30
+
+   * - **Environment variable**
+     - **Value**
+
+   * - | ``RCCL_LOG_LEVEL``
+       | Controls RCCL logging verbosity.
+     - | Integer value (default: ``1``)
+       | Higher values increase logging detail
+
+   * - | ``NCCL_DEBUG_SUBSYS``
+       | Controls which subsystems generate debug output.
+     - | Comma-separated list of subsystems (e.g., ``INIT,COLL``)
+       | Prefix with ``^`` to invert selection
+
+Algorithm and protocol control
+==============================
+
+The algorithm and protocol control environment variables for RCCL are
+collected in the following table.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 70,30
+
+   * - **Environment variable**
+     - **Value**
+
+   * - | ``NCCL_ALGO``
+       | Forces specific algorithm selection for collectives.
+     - | Algorithm name string
+       | Used to override automatic algorithm selection
+
+   * - | ``NCCL_PROTO``
+       | Forces specific protocol selection for communication.
+     - | Protocol name string
+       | Used to override automatic protocol selection
+
+Network and topology
+====================
+
+The network and topology environment variables for RCCL are collected
+in the following table.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 70,30
+
+   * - **Environment variable**
+     - **Value**
+
+   * - | ``NCCL_IB_HCA``
+       | Specifies InfiniBand device:port to use.
+     - | Device specification string
+       | Prefix with ``^`` for exclusion, ``=`` for exact match
+
+   * - | ``NCCL_IB_GID_INDEX``
+       | Defines the Global ID index used in RoCE mode.
+     - | Integer value (default: ``-1``)
+       | See InfiniBand ``show_gids`` command for valid values
+
+   * - | ``NCCL_SOCKET_IFNAME``
+       | Specifies which IP interfaces to use for communication.
+     - | Interface prefix string or list
+       | Multiple prefixes separated by ``,``
+       | Prefix with ``^`` for exclusion, ``=`` for exact match
+       | Example: ``eth`` (all eth interfaces), ``=eth0`` (exact match)
+
+   * - | ``NCCL_SOCKET_FAMILY``
+       | Forces IPv4/IPv6 interface selection.
+     - | ``AF_INET``: Force IPv4
+       | ``AF_INET6``: Force IPv6
+       | Unset: Use first available
+
+   * - | ``NCCL_NET_MERGE_LEVEL``
+       | Controls network device merging behavior.
+     - | Integer value specifying merge level
+       | Default: ``PATH_PORT``
+
+   * - | ``NCCL_NET_FORCE_MERGE``
+       | Forces merging of network devices.
+     - | String specifying forced merge configuration
+
+   * - | ``NCCL_RINGS``
+       | Defines custom ring topology.
+     - | Ring topology specification string
+       | Overrides automatic topology detection
+
+   * - | ``RCCL_TREES``
+       | Defines custom tree topology.
+     - | Tree topology specification string
+       | Alternative to ring topology
+
+   * - | ``NCCL_RINGS_REMAP``
+       | Controls ring remapping for specific topologies.
+     - | Remapping specification string
+       | Used with Rome 4P2H topology
+
+Development and testing (advanced)
+==================================
+
+The development and testing environment variables for RCCL are
+collected in the following table. These variables are primarily
+intended for debugging and development purposes.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 70,30
+
+   * - **Environment variable**
+     - **Value**
+
+   * - | ``CUDA_LAUNCH_BLOCKING``
+       | Controls CUDA kernel launch blocking behavior.
+     - | ``0``: Non-blocking launches
+       | ``1`` or non-zero: Blocking launches
+
+   * - | ``NCCL_COMM_ID``
+       | Enables multi-process mode in test applications.
+     - | Any non-empty value enables multi-process mode
+       | Used with test executables for distributed testing
diff --git a/projects/rccl/docs/index.rst b/projects/rccl/docs/index.rst
index 97838d4641..e094e6c900 100644
--- a/projects/rccl/docs/index.rst
+++ b/projects/rccl/docs/index.rst
@@ -36,12 +36,13 @@ The RCCL public repository is located at ``_.

     * `RCCL Tuner plugin examples `_
     * `NCCL Net plugin examples `_
-    
+
   .. grid-item-card:: API reference

     * :ref:`Library specification`
     * :ref:`api-library`
-    
+    * :ref:`Environment variables`
+
 To contribute to the documentation, see `Contributing to ROCm `_.


diff --git a/projects/rccl/docs/sphinx/_toc.yml.in b/projects/rccl/docs/sphinx/_toc.yml.in
index 09205f84ef..d8a8d45435 100644
--- a/projects/rccl/docs/sphinx/_toc.yml.in
+++ b/projects/rccl/docs/sphinx/_toc.yml.in
@@ -5,7 +5,7 @@ subtrees:
   - file: what-is-rccl.rst
     title: What is RCCL?

-- caption: Install 
+- caption: Install
   entries:
   - file: install/installation
     title: Installation guide
@@ -14,7 +14,7 @@ subtrees:
   - file: install/building-installing
     title: Building and installing from source

-- caption: How to 
+- caption: How to
   entries:
   - file: how-to/using-rccl-tuner-plugin-api
     title: Using the RCCL Tuner plugin
@@ -31,12 +31,14 @@ subtrees:
   - url: https://github.com/ROCm/rccl/tree/develop/ext-net/example
     title: NCCL Net plugin examples

-- caption: API reference 
+- caption: API reference
   entries:
   - file: api-reference/library-specification
     title: Library specification
   - file: api-reference/api-library
-  
+  - file: api-reference/env-variables
+    title: Environment variables
+
 - caption: About
   entries:
   - file: license
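The ``^``/``=`` list syntax described for ``NCCL_SOCKET_IFNAME`` (``eth`` matches every interface starting with ``eth``, ``=eth0`` matches only ``eth0``) can be modeled with a short Python sketch. This is not RCCL's implementation. It assumes, as upstream NCCL documents, that a leading ``^`` and ``=`` apply to the whole comma-separated list rather than to individual entries, and the helpers ``parse_if_filter`` and ``select_interfaces`` are hypothetical names used only for illustration.

```python
from typing import List, Tuple


def parse_if_filter(spec: str) -> Tuple[bool, bool, List[str]]:
    """Split a filter such as "^=eth0,ib" into (exclude, exact, names).

    Assumption: "^" (exclude) and "=" (exact match) are leading modifiers
    that apply to every entry in the comma-separated list.
    """
    exclude = spec.startswith("^")
    if exclude:
        spec = spec[1:]
    exact = spec.startswith("=")
    if exact:
        spec = spec[1:]
    # Drop empty entries produced by stray commas.
    names = [name for name in spec.split(",") if name]
    return exclude, exact, names


def select_interfaces(spec: str, interfaces: List[str]) -> List[str]:
    """Return the interfaces that a hypothetical NCCL_SOCKET_IFNAME value keeps."""
    exclude, exact, names = parse_if_filter(spec)

    def matches(iface: str) -> bool:
        if exact:
            return iface in names  # "=" requires the full interface name
        return any(iface.startswith(prefix) for prefix in names)  # prefix match

    # Keep matching interfaces, or non-matching ones when "^" inverts the filter.
    return [iface for iface in interfaces if matches(iface) != exclude]


if __name__ == "__main__":
    ifaces = ["lo", "eth0", "eth1", "ib0", "docker0"]
    print(select_interfaces("eth", ifaces))         # ['eth0', 'eth1']
    print(select_interfaces("=eth0", ifaces))       # ['eth0']
    print(select_interfaces("^docker,lo", ifaces))  # ['eth0', 'eth1', 'ib0']
```

Under this model the two modifiers combine, so ``^=eth0`` keeps every interface except one named exactly ``eth0``.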