From a2474846f5ed1565d2a6f479d2952cf389c1a34b Mon Sep 17 00:00:00 2001
From: Nilesh M Negi
Date: Tue, 6 Aug 2024 11:12:09 -0500
Subject: [PATCH] [README] Tips on using less than 8 MI300 GPUs (#1270)

Signed-off-by: nileshnegi
---
 README.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/README.md b/README.md
index abaac66a13..cd9322007f 100644
--- a/README.md
+++ b/README.md
@@ -148,6 +148,17 @@ pip3 install -r sphinx/requirements.txt
 python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html
 ```
 
+### Improving performance on MI300 when using fewer than 8 GPUs
+
+On a system with 8\*MI300X GPUs, each pair of GPUs is connected by dedicated XGMI links in a fully connected topology. Collective operations therefore perform best when all 8 GPUs (and all XGMI links) are in use. With fewer than 8 GPUs, only a fraction of the system's potential bandwidth is achievable.
+
+If your workload requires fewer than 8 MI300 GPUs on a system, you can set the run-time environment variable `NCCL_MIN_NCHANNELS` to increase the number of channels.\
+E.g.: `export NCCL_MIN_NCHANNELS=32`
+
+Increasing the number of channels can improve performance, but it also increases GPU utilization for collective operations.
+
+Additionally, RCCL predefines a higher number of channels when only 2 or 4 GPUs are used on an 8\*MI300 system: **32 channels** for the 2-GPU case and **24 channels** for the 4-GPU case.
+
 ## Copyright
 
 All source code and accompanying documentation is copyright (c) 2015-2022, NVIDIA CORPORATION. All rights reserved.
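
As an illustration of the tip this patch adds, here is a minimal sketch of an environment setup for running on 4 of the 8 GPUs. The device list `0,1,2,3` and the commented-out `./my_workload` command are assumptions made for the example and do not come from the patch. The value `32` is the one from the README example, not a tuned recommendation.

```shell
#!/usr/bin/env bash
# Sketch: prepare the environment for a run on a subset of the GPUs.

# Assumption: devices 0-3 are the ones to use; adjust the list for your system.
export HIP_VISIBLE_DEVICES=0,1,2,3

# Request at least 32 channels (the value used in the README example).
export NCCL_MIN_NCHANNELS=32

# Optional: have RCCL log initialization details so the setup can be inspected.
export NCCL_DEBUG=INFO

echo "GPUs: ${HIP_VISIBLE_DEVICES}  NCCL_MIN_NCHANNELS=${NCCL_MIN_NCHANNELS}"

# Hypothetical launch command; replace with your own application.
# ./my_workload
```

Because a higher channel count also raises GPU utilization for collectives, it may be worth comparing a run with and without the variable before adopting a value.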