From 9da345dadf703e383b545572c62aba02848e8221 Mon Sep 17 00:00:00 2001
From: Alex Breslow
Date: Tue, 8 Apr 2025 11:19:45 -0500
Subject: [PATCH] Add instructions to README regarding benchmarking on pre ROCm 6.4.x versions with HSA_NO_SCRATCH_RECLAIM=1 (#114)

[ROCm/rccl-tests commit: 284ff2ac84d38456ce5ab837edf70d01c48f926c]
---
 projects/rccl-tests/README.md | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/projects/rccl-tests/README.md b/projects/rccl-tests/README.md
index 02a82ee71d..c89c15bb28 100644
--- a/projects/rccl-tests/README.md
+++ b/projects/rccl-tests/README.md
@@ -59,6 +59,18 @@ Running with 1 MPI process per GPU ensures a 1:1 mapping for CPUs and GPUs, whic
 
 See the [Performance](doc/PERFORMANCE.md) page for explanation about numbers, and in particular the "busbw" column.
 
+### Environment variables
+On some older versions of ROCm before 6.4.0, setting `HSA_NO_SCRATCH_RECLAIM=1`
+ as part of the environment might be necessary to achieve better performance. When running without MPI, a command similar to the following one should be sufficient:
+```shell
+HSA_NO_SCRATCH_RECLAIM=1 ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 8
+```
+
+For MPI, you might need to use a command similar to the following:
+```shell
+mpirun.mpich -np 8 -env NCCL_DEBUG=VERSION -env HSA_NO_SCRATCH_RECLAIM=1 ./build/all_reduce_perf -b 8M -e 128M -i 8388608 -g 1 -d bfloat16
+```
+
 ### Arguments
 
 All tests support the same set of arguments :
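
Note on the patch: the new README text leaves it to the reader to decide whether their ROCm install predates 6.4.0. A wrapper script could make that check explicit, as in the minimal sketch below. The helper name `rocm_older_than_6_4` is hypothetical, and the version file location (`${ROCM_PATH:-/opt/rocm}/.info/version`) is an assumption about the local install, not something this patch states. The comparison relies on GNU `sort -V` and expects a full `major.minor.patch` string.

```shell
# Hypothetical helper: succeeds (exit 0) when the given ROCm version is older than 6.4.0.
rocm_older_than_6_4() {
    local version="$1"
    # sort -V orders version strings; if 6.4.0 is not the smaller of the pair,
    # the given version is strictly older than 6.4.0.
    [ "$(printf '%s\n' "$version" "6.4.0" | sort -V | head -n 1)" != "6.4.0" ]
}

# Assumed location of the ROCm version file; adjust for the local install.
ROCM_VERSION_FILE="${ROCM_PATH:-/opt/rocm}/.info/version"

# Export the variable only when the version file is readable and reports an older release.
if [ -r "$ROCM_VERSION_FILE" ] && rocm_older_than_6_4 "$(cat "$ROCM_VERSION_FILE")"; then
    export HSA_NO_SCRATCH_RECLAIM=1
fi
```

Sourcing a script like this before invoking `./build/all_reduce_perf` would leave the environment untouched on ROCm 6.4.0 and later, so the same launch command could be shared across machines.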