.. meta::
   :description: Instructions on how to install the RCCL library for collective communication primitives using Docker
   :keywords: RCCL, ROCm, library, API, install, Docker

.. _install-docker:

*****************************************
Running RCCL using Docker
*****************************************

To use Docker to run RCCL, Docker must already be installed on the system.
To build the Docker image and run the container, follow these steps.

#. Build the Docker image

   By default, the Dockerfile uses ``docker.io/rocm/dev-ubuntu-22.04:latest`` as the base Docker image.
   It then installs RCCL and rccl-tests, using the ``develop`` branch of each.

   Use this command to build the Docker image:

   .. code-block:: shell

      docker build -t rccl-tests -f Dockerfile.ubuntu --pull .

   The base Docker image, RCCL repository, rccl-tests repository, and GPU targets can be modified
   by passing ``--build-arg`` options to the ``docker build`` command above. For example, to use a different base Docker image and build for the MI250 GPU (``gfx90a``),
   use this command:

   .. code-block:: shell

      docker build -t rccl-tests -f Dockerfile.ubuntu --build-arg="ROCM_IMAGE_NAME=rocm/dev-ubuntu-20.04" --build-arg="ROCM_IMAGE_TAG=6.2" --build-arg="GPU_TARGETS=gfx90a" --pull .

#. Launch an interactive Docker container on a system with AMD GPUs:

   .. code-block:: shell

      docker run --rm --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --network=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -it rccl-tests /bin/bash

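Before launching the container, it can help to confirm that the host exposes the device nodes passed through ``--device`` in step 2.
The following is a minimal sketch and is not part of the RCCL repository: the ``check_gpu_devices`` function name is hypothetical,
and the paths simply mirror the ``docker run`` command above.

.. code-block:: shell

   #!/bin/sh
   # Hypothetical helper (not part of RCCL): report which of the given
   # device nodes are missing on the host.
   # Returns 0 if every path exists, 1 if at least one is absent.
   check_gpu_devices() {
       cgd_missing=0
       for cgd_path in "$@"; do
           if [ ! -e "$cgd_path" ]; then
               echo "missing: $cgd_path" >&2
               cgd_missing=1
           fi
       done
       return "$cgd_missing"
   }

   # The paths mirror the --device options of the docker run command above.
   if check_gpu_devices /dev/kfd /dev/dri; then
       echo "GPU device nodes found"
   else
       echo "GPU device nodes not found on this host" >&2
   fi

If a path is reported as missing, the ``docker run`` command in step 2 is unlikely to be able to pass that device into the container.
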

For example, to run the ``all_reduce_perf`` test from rccl-tests on 8 AMD GPUs from inside the Docker container, use this command
for ROCm 6.4.1 or earlier:

.. code-block:: shell

   mpirun --allow-run-as-root -np 8 --mca pml ucx --mca btl ^openib -x NCCL_DEBUG=VERSION -x HSA_NO_SCRATCH_RECLAIM=1 /workspace/rccl-tests/build/all_reduce_perf -b 1 -e 16G -f 2 -g 1

For ROCm 6.4.2 or later, use this command:

.. code-block:: shell

   mpirun --allow-run-as-root -np 8 --mca pml ucx --mca btl ^openib -x NCCL_DEBUG=VERSION /workspace/rccl-tests/build/all_reduce_perf -b 1 -e 16G -f 2 -g 1

For more information on the rccl-tests options, see the `Usage guidelines <https://github.com/ROCm/rccl-tests#usage>`_ in the GitHub repository.
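The two ``mpirun`` commands above differ only in the ``-x HSA_NO_SCRATCH_RECLAIM=1`` option.
As a minimal sketch, the choice between them could be scripted as shown below. The function names ``needs_scratch_reclaim_setting``
and ``build_mpirun_command`` and the ``ROCM_VERSION`` variable are hypothetical and must be set by the user. The version comparison
assumes that ``sort -V`` (GNU coreutils version sort) is available in the container.

.. code-block:: shell

   #!/bin/sh
   # Hypothetical helper: succeed (exit 0) when the given ROCm version is
   # 6.4.1 or earlier. Assumes `sort -V` (version sort) is available.
   needs_scratch_reclaim_setting() {
       nsr_newest=$(printf '%s\n' "$1" "6.4.1" | sort -V | tail -n 1)
       [ "$nsr_newest" = "6.4.1" ]
   }

   # Hypothetical helper: print the mpirun command for the given ROCm version.
   # The command is printed rather than executed so it can be inspected first.
   build_mpirun_command() {
       bmc_extra_env=""
       if needs_scratch_reclaim_setting "$1"; then
           bmc_extra_env="-x HSA_NO_SCRATCH_RECLAIM=1"
       fi
       # $bmc_extra_env is intentionally unquoted so it splits into two words
       # or disappears entirely when empty.
       echo mpirun --allow-run-as-root -np 8 --mca pml ucx --mca btl '^openib' \
           -x NCCL_DEBUG=VERSION $bmc_extra_env \
           /workspace/rccl-tests/build/all_reduce_perf -b 1 -e 16G -f 2 -g 1
   }

   # ROCM_VERSION is an assumed input; the 6.4.1 default is only for illustration.
   build_mpirun_command "${ROCM_VERSION:-6.4.1}"

Printing the command keeps the sketch free of side effects. After checking the output, run the printed command inside the container.
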