Файли
rocm-systems/docs/api-reference/env-variables.rst
T
2025-08-14 09:55:28 +02:00

166 рядки
5.0 KiB
ReStructuredText

.. meta::
:description: RCCL is a stand-alone library that provides multi-GPU and multi-node collective communication primitives optimized for AMD GPUs
:keywords: RCCL, ROCm, library, API, reference, environment variable, environment
.. _env-variables:
********************************************************************
RCCL environment variables
********************************************************************
This section describes the most important RCCL environment variables,
which are grouped by functionality.
Configuration and setup
========================
The configuration and setup environment variables for RCCL are collected
in the following table.
.. list-table::
:header-rows: 1
:widths: 70,30
* - **Environment variable**
- **Value**
* - | ``NCCL_CONF_FILE``
| Specifies the path to the RCCL configuration file.
- | String path to configuration file
| Default: ``~/.rccl.conf`` or ``/etc/rccl.conf``
* - | ``NCCL_HOSTID``
| Sets the host identifier for multi-node communication.
- | String value for host identification
| Used for host hash generation
Logging and debugging
=====================
The logging and debugging environment variables for RCCL are collected
in the following table.
.. list-table::
:header-rows: 1
:widths: 70,30
* - **Environment variable**
- **Value**
* - | ``RCCL_LOG_LEVEL``
| Controls RCCL logging verbosity.
- | Integer value (default: ``1``)
| Higher values increase logging detail
* - | ``NCCL_DEBUG_SUBSYS``
| Controls which subsystems generate debug output.
- | Comma-separated list of subsystems (e.g., ``INIT,COLL``)
| Prefix with ``^`` to invert selection
Algorithm and protocol control
==============================
The algorithm and protocol control environment variables for RCCL are
collected in the following table.
.. list-table::
:header-rows: 1
:widths: 70,30
* - **Environment variable**
- **Value**
* - | ``NCCL_ALGO``
| Forces specific algorithm selection for collectives.
- | Algorithm name string
| Used to override automatic algorithm selection
* - | ``NCCL_PROTO``
| Forces specific protocol selection for communication.
- | Protocol name string
| Used to override automatic protocol selection
Network and topology
====================
The network and topology environment variables for RCCL are collected
in the following table.
.. list-table::
:header-rows: 1
:widths: 70,30
* - **Environment variable**
- **Value**
* - | ``NCCL_IB_HCA``
| Specifies InfiniBand device:port to use.
- | Device specification string
| Prefix with ``^`` for exclusion, ``=`` for exact match
* - | ``NCCL_IB_GID_INDEX``
| Defines the Global ID index used in RoCE mode.
- | Integer value (default: ``-1``)
| See InfiniBand ``show_gids`` command for valid values
* - | ``NCCL_SOCKET_IFNAME``
| Specifies which IP interfaces to use for communication.
- | Interface prefix string or list
| Multiple prefixes separated by ``,``
| Prefix with ``^`` for exclusion, ``=`` for exact match
| Example: ``eth`` (all eth interfaces), ``=eth0`` (exact match)
* - | ``NCCL_SOCKET_FAMILY``
| Forces IPv4/IPv6 interface selection.
- | ``AF_INET``: Force IPv4
| ``AF_INET6``: Force IPv6
| Unset: Use first available
* - | ``NCCL_NET_MERGE_LEVEL``
| Controls network device merging behavior.
- | Integer value specifying merge level
| Default: ``PATH_PORT``
* - | ``NCCL_NET_FORCE_MERGE``
| Forces merging of network devices.
- | String specifying forced merge configuration
* - | ``NCCL_RINGS``
| Defines custom ring topology.
- | Ring topology specification string
| Overrides automatic topology detection
* - | ``RCCL_TREES``
| Defines custom tree topology.
- | Tree topology specification string
| Alternative to ring topology
* - | ``NCCL_RINGS_REMAP``
| Controls ring remapping for specific topologies.
- | Remapping specification string
| Used with Rome 4P2H topology
Development and testing (advanced)
==================================
The development and testing environment variables for RCCL are
collected in the following table. These variables are primarily
intended for debugging and development purposes.
.. list-table::
:header-rows: 1
:widths: 70,30
* - **Environment variable**
- **Value**
* - | ``CUDA_LAUNCH_BLOCKING``
| Controls CUDA kernel launch blocking behavior.
- | ``0``: Non-blocking launches
| ``1`` or non-zero: Blocking launches
* - | ``NCCL_COMM_ID``
| Enables multi-process mode in test applications.
- | Any non-empty value enables multi-process mode
| Used with test executables for distributed testing