Docs: Add environment variables reference page
[ROCm/rccl commit: aaf8613b76]
@@ -3,3 +3,6 @@
/coverage/
build/
ext/

# Visual Studio Code
.vscode
@@ -0,0 +1,165 @@
.. meta::
   :description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
   :keywords: RCCL, ROCm, library, API, reference, environment variable, environment

.. _env-variables:

********************************************************************
RCCL environment variables
********************************************************************

This section describes the most important RCCL environment variables,
which are grouped by functionality.

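The variables described on this page are read from the process environment, so they must be set before the RCCL application starts. The following is a minimal sketch, not part of RCCL, of layering overrides on top of the current environment when launching a child process. The helper name ``run_with_rccl_env`` and the stand-in child command are hypothetical and exist only for illustration.

```python
import os
import subprocess
import sys


def run_with_rccl_env(command, overrides):
    """Run `command` with `overrides` layered on top of the current environment.

    Hypothetical helper for illustration; it is not an RCCL API.
    """
    env = dict(os.environ)   # start from a copy of the inherited environment
    env.update(overrides)    # apply the requested variables on top of it
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )


# Stand-in child process that only echoes one variable back.
# A real launch would run the RCCL application here instead.
child = [sys.executable, "-c", "import os; print(os.environ['NCCL_DEBUG_SUBSYS'])"]
result = run_with_rccl_env(child, {"NCCL_DEBUG_SUBSYS": "INIT,COLL"})
print(result.stdout.strip())
```

Copying ``os.environ`` instead of modifying it keeps the overrides local to the child process.
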
Configuration and setup
========================

The configuration and setup environment variables for RCCL are collected
in the following table.

.. list-table::
    :header-rows: 1
    :widths: 70,30

    * - **Environment variable**
      - **Value**

    * - | ``NCCL_CONF_FILE``
        | Specifies the path to the RCCL configuration file.
      - | String path to configuration file
        | Default: ``~/.rccl.conf`` or ``/etc/rccl.conf``

    * - | ``NCCL_HOSTID``
        | Sets the host identifier for multi-node communication.
      - | String value for host identification
        | Used for host hash generation

Logging and debugging
=====================

The logging and debugging environment variables for RCCL are collected
in the following table.

.. list-table::
    :header-rows: 1
    :widths: 70,30

    * - **Environment variable**
      - **Value**

    * - | ``RCCL_LOG_LEVEL``
        | Controls RCCL logging verbosity.
      - | Integer value (default: ``1``)
        | Higher values increase logging detail

    * - | ``NCCL_DEBUG_SUBSYS``
        | Controls which subsystems generate debug output.
      - | Comma-separated list of subsystems (e.g., ``INIT,COLL``)
        | Prefix with ``^`` to invert selection

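The effect of the ``^`` prefix on ``NCCL_DEBUG_SUBSYS`` can be illustrated with a short sketch. It assumes that a leading ``^`` inverts the whole comma-separated list and uses a hypothetical, incomplete set of subsystem names. It models the behavior described in the table and is not the RCCL implementation.

```python
# Hypothetical subset of subsystem names, for illustration only.
KNOWN_SUBSYSTEMS = ("INIT", "COLL", "P2P", "NET", "GRAPH")


def select_subsystems(spec, known=KNOWN_SUBSYSTEMS):
    """Return the set of subsystems that a spec such as "INIT,COLL" enables.

    Assumption: a leading "^" inverts the selection for the whole list.
    """
    invert = spec.startswith("^")
    if invert:
        spec = spec[1:]
    # Split the list and ignore empty entries and surrounding whitespace.
    names = {name.strip() for name in spec.split(",") if name.strip()}
    named = {name for name in known if name in names}
    # Without "^": only the named subsystems. With "^": everything else.
    return set(known) - named if invert else named


print(sorted(select_subsystems("INIT,COLL")))
print(sorted(select_subsystems("^INIT,COLL")))
```

With the sample names above, ``INIT,COLL`` selects only those two subsystems, while ``^INIT,COLL`` selects the remaining three.
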
Algorithm and protocol control
==============================

The algorithm and protocol control environment variables for RCCL are
collected in the following table.

.. list-table::
    :header-rows: 1
    :widths: 70,30

    * - **Environment variable**
      - **Value**

    * - | ``NCCL_ALGO``
        | Forces specific algorithm selection for collectives.
      - | Algorithm name string
        | Used to override automatic algorithm selection

    * - | ``NCCL_PROTO``
        | Forces specific protocol selection for communication.
      - | Protocol name string
        | Used to override automatic protocol selection

Network and topology
====================

The network and topology environment variables for RCCL are collected
in the following table.

.. list-table::
    :header-rows: 1
    :widths: 70,30

    * - **Environment variable**
      - **Value**

    * - | ``NCCL_IB_HCA``
        | Specifies InfiniBand device:port to use.
      - | Device specification string
        | Prefix with ``^`` for exclusion, ``=`` for exact match

    * - | ``NCCL_IB_GID_INDEX``
        | Defines the Global ID index used in RoCE mode.
      - | Integer value (default: ``-1``)
        | See InfiniBand ``show_gids`` command for valid values

    * - | ``NCCL_SOCKET_IFNAME``
        | Specifies which IP interfaces to use for communication.
      - | Interface prefix string or list
        | Multiple prefixes separated by ``,``
        | Prefix with ``^`` for exclusion, ``=`` for exact match
        | Example: ``eth`` (all eth interfaces), ``=eth0`` (exact match)

    * - | ``NCCL_SOCKET_FAMILY``
        | Forces IPv4/IPv6 interface selection.
      - | ``AF_INET``: Force IPv4
        | ``AF_INET6``: Force IPv6
        | Unset: Use first available

    * - | ``NCCL_NET_MERGE_LEVEL``
        | Controls network device merging behavior.
      - | Integer value specifying merge level
        | Default: ``PATH_PORT``

    * - | ``NCCL_NET_FORCE_MERGE``
        | Forces merging of network devices.
      - | String specifying forced merge configuration

    * - | ``NCCL_RINGS``
        | Defines custom ring topology.
      - | Ring topology specification string
        | Overrides automatic topology detection

    * - | ``RCCL_TREES``
        | Defines custom tree topology.
      - | Tree topology specification string
        | Alternative to ring topology

    * - | ``NCCL_RINGS_REMAP``
        | Controls ring remapping for specific topologies.
      - | Remapping specification string
        | Used with Rome 4P2H topology

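The prefix rules listed for ``NCCL_SOCKET_IFNAME`` can be sketched as a small filter. The sketch assumes that ``^`` and ``=`` appear once at the start of the value and apply to the whole comma-separated list. The function names and the sample interface list are hypothetical, and the code models the rules in the table rather than reproducing the RCCL implementation.

```python
def parse_ifname_spec(spec):
    """Split a spec into (exclude, exact, prefixes).

    Assumption: "^" (exclude) and "=" (exact match) may only appear at the
    start of the value and apply to every entry in the list.
    """
    exclude = spec.startswith("^")
    if exclude:
        spec = spec[1:]
    exact = spec.startswith("=")
    if exact:
        spec = spec[1:]
    prefixes = [p for p in spec.split(",") if p]
    return exclude, exact, prefixes


def select_interfaces(spec, interfaces):
    """Return the interfaces that the spec selects, in their original order."""
    exclude, exact, prefixes = parse_ifname_spec(spec)

    def matches(name):
        if exact:
            return name in prefixes
        return any(name.startswith(p) for p in prefixes)

    # Keep matching names normally, non-matching names when excluding.
    return [name for name in interfaces if matches(name) != exclude]


# Hypothetical interface names, for illustration only.
interfaces = ["lo", "eth0", "eth1", "docker0", "ib0"]
print(select_interfaces("eth", interfaces))
print(select_interfaces("=eth0", interfaces))
print(select_interfaces("^docker,lo", interfaces))
```

With the sample list, ``eth`` keeps ``eth0`` and ``eth1``, ``=eth0`` keeps only ``eth0``, and ``^docker,lo`` keeps everything except ``docker0`` and ``lo``.
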
Development and testing (advanced)
==================================

The development and testing environment variables for RCCL are
collected in the following table. These variables are primarily
intended for debugging and development purposes.

.. list-table::
    :header-rows: 1
    :widths: 70,30

    * - **Environment variable**
      - **Value**

    * - | ``CUDA_LAUNCH_BLOCKING``
        | Controls CUDA kernel launch blocking behavior.
      - | ``0``: Non-blocking launches
        | ``1`` or non-zero: Blocking launches

    * - | ``NCCL_COMM_ID``
        | Enables multi-process mode in test applications.
      - | Any non-empty value enables multi-process mode
        | Used with test executables for distributed testing
@@ -36,12 +36,13 @@ The RCCL public repository is located at `<https://github.com/ROCm/rccl>`_.

    * `RCCL Tuner plugin examples <https://github.com/ROCm/rccl/tree/develop/ext-tuner/example>`_
    * `NCCL Net plugin examples <https://github.com/ROCm/rccl/tree/develop/ext-net/example>`_

  .. grid-item-card:: API reference

    * :ref:`Library specification<library-specification>`
    * :ref:`api-library`
    * :ref:`Environment variables<env-variables>`

To contribute to the documentation, see
`Contributing to ROCm <https://rocm.docs.amd.com/en/latest/contribute/contributing.html>`_.
@@ -5,7 +5,7 @@ subtrees:
  - file: what-is-rccl.rst
    title: What is RCCL?

- caption: Install
  entries:
  - file: install/installation
    title: Installation guide
@@ -14,7 +14,7 @@ subtrees:
  - file: install/building-installing
    title: Building and installing from source

- caption: How to
  entries:
  - file: how-to/using-rccl-tuner-plugin-api
    title: Using the RCCL Tuner plugin
@@ -31,12 +31,14 @@ subtrees:
  - url: https://github.com/ROCm/rccl/tree/develop/ext-net/example
    title: NCCL Net plugin examples

- caption: API reference
  entries:
  - file: api-reference/library-specification
    title: Library specification
  - file: api-reference/api-library
  - file: api-reference/env-variables
    title: Environment variables

- caption: About
  entries:
  - file: license