* Add an argument to select whether the performance test runs with bias. With bias, the maximum memory usage is recalculated and the data size is reduced to avoid out-of-memory errors; without bias, no bias buffers are allocated.
* Remove the argument option for bias; memory calculation and buffer allocation are now determined by the executable name.
---------
Co-authored-by: Li <jialili@ctr2-alola-ctrl-01.amd.com>
[ROCm/rccl-tests commit: 5272cd16ef]
* Update CODEOWNERS
Adding me as a reviewer
* Update .github/CODEOWNERS
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
* Update CODEOWNERS
Added Alex
---------
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
[ROCm/rccl-tests commit: a4943c512e]
One thing missing from the stdout of each performance test is
the name of the test that is actually being run.
This patch adds 2 new messages to the stdout. At the beginning
of the execution of a test (e.g. sendrecv_perf) we will now
see this message:
Collective test starting: sendrecv_perf
And at the end, we will now see this:
Collective test concluded: sendrecv_perf
This is needed when running several tests consecutively and we're
trying to parse the stdout to collect the results.
For example, using a Python script to parse the stdout, one could
retrieve the results for each test and plot them on a graph. This
patch makes it easier to implement such a script.
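
A minimal sketch of such a parser, assuming the two marker lines appear
in the stdout as quoted above (any prefix before them is tolerated). The
function name `split_by_test` and the sample text are hypothetical and
are not part of this patch:

```python
import re
from typing import Dict, List, Optional

# Marker lines added by this patch; the test name is the last token.
START = re.compile(r"Collective test starting: (\S+)")
END = re.compile(r"Collective test concluded: (\S+)")


def split_by_test(stdout: str) -> Dict[str, List[str]]:
    """Group stdout lines by the test that produced them.

    Lines outside a starting/concluded pair are ignored.
    """
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in stdout.splitlines():
        started = START.search(line)
        if started:
            current = started.group(1)
            sections.setdefault(current, [])
        elif END.search(line):
            current = None
        elif current is not None:
            sections[current].append(line)
    return sections


# Hypothetical sample: the lines between the markers are placeholders,
# not real test output.
sample = """\
Collective test starting: sendrecv_perf
<result lines for sendrecv_perf>
Collective test concluded: sendrecv_perf
Collective test starting: all_reduce_perf
<result lines for all_reduce_perf>
Collective test concluded: all_reduce_perf
"""

sections = split_by_test(sample)
```

Each value in `sections` holds the raw lines of one test, which a script
could then parse further and plot.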
Signed-off-by: Martin Belanger <martin.belanger@dell.com>
[ROCm/rccl-tests commit: dafb70408d]
* addressing hip_fp8 support compatibility issue
* skipping mulsum and avg test for fp8, using hip_fp8 for product
* syncing with nccl-tests
removing the fp8 filter for pre-hopper gpus and resolving the merge conflict
---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
[ROCm/rccl-tests commit: 4b2b635766]
* [BUILD] Add options to install script for compiler and GPU targets
* Fix GPU_TARGETS field and add option for custom ROCm path
* Check for ROCM_PATH
---------
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
[ROCm/rccl-tests commit: 41b383a0d4]
`pthread.h` is included in `src/common.h`, but the pthread library is
not linked, so the build fails at link time with unresolved symbols.
[ROCm/rccl-tests commit: 5b27b961b2]
Previously, the logger reported the number of bytes a node was expected to receive.
This differs from the stdout logging, where the reported message size is the total size of a message.
Signed-off-by: Grant Pinkert <gpinkert@amd.com>
[ROCm/rccl-tests commit: f611dbd49a]
Build option DSO=1 generates libverifiable.so which can be
used to reduce the combined binary size.
Build option NAME_SUFFIX can be used to add a suffix to all
generated binaries, e.g. NAME_SUFFIX=_mpi
Added new make target: clean_intermediates
[ROCm/rccl-tests commit: 1021260ca9]
* skipping the prod test for FP8 types in reduce and reduce-scatter
---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
[ROCm/rccl-tests commit: 5e838ad9df]
`NCCL_TESTS_SPLIT` serves as a new way of computing the color for splitting communicators.
It is overridden by `NCCL_TESTS_SPLIT_MASK` when both are set.
Examples:
NCCL_TESTS_SPLIT_MASK="0x7" # color = rank & 0x7. What we do today to run on a DGX with one GPU per node.
NCCL_TESTS_SPLIT="AND 0x7" # color = rank & 0x7. New way to run on one GPU per node on a DGX, equivalent to NCCL_TESTS_SPLIT_MASK=0x7
NCCL_TESTS_SPLIT="MOD 72" # color = rank % 72. One GPU per NVLink domain on an NVL72 system.
NCCL_TESTS_SPLIT="DIV 72" # color = rank / 72. Intra NVLink domain on NVL72.
You can also use: "%" "&" "|" "/" for short.
Extra spaces in the middle will be automatically ignored.
Not case sensitive.
The following are all equivalent:
NCCL_TESTS_SPLIT="&0x7"
NCCL_TESTS_SPLIT="&0b111"
NCCL_TESTS_SPLIT="AND 7"
NCCL_TESTS_SPLIT="and 0x7"
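
The rules above can be sketched in Python as follows. This is only an
illustration of the described semantics, not the code in this patch: the
helper names `parse_split` and `split_color` are hypothetical, the `OR`
keyword is assumed from the "|" shorthand, and returning color 0 when
neither variable is set is an assumption.

```python
import re
from typing import Mapping, Tuple

# Shorthand symbols and the keyword each one is assumed to stand for.
_ALIASES = {"&": "AND", "|": "OR", "%": "MOD", "/": "DIV"}
_SPEC = re.compile(r"(AND|OR|MOD|DIV|[&|%/])(.+)")


def parse_split(spec: str) -> Tuple[str, int]:
    """Parse e.g. 'and 0x7' into ('AND', 7); spaces and case are ignored."""
    compact = "".join(spec.split()).upper()
    match = _SPEC.fullmatch(compact)
    if match is None:
        raise ValueError(f"unrecognized split spec: {spec!r}")
    op = _ALIASES.get(match.group(1), match.group(1))
    # Base 0 accepts decimal, 0x (hex) and 0b (binary) literals.
    return op, int(match.group(2), 0)


def split_color(rank: int, env: Mapping[str, str]) -> int:
    """Compute the communicator color for a rank from the two variables."""
    mask = env.get("NCCL_TESTS_SPLIT_MASK")
    if mask is not None:
        # The mask variable takes precedence over NCCL_TESTS_SPLIT.
        return rank & int(mask, 0)
    spec = env.get("NCCL_TESTS_SPLIT")
    if spec is None:
        return 0  # assumption: no split requested, single color
    op, value = parse_split(spec)
    if op == "AND":
        return rank & value
    if op == "OR":
        return rank | value
    if op == "MOD":
        return rank % value
    return rank // value  # DIV
```

For instance, with `NCCL_TESTS_SPLIT="DIV 72"` ranks 0 to 71 share one
color and ranks 72 to 143 share the next.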
[ROCm/rccl-tests commit: a89cf07fe8]