257 Commits

Author SHA1 Message Date
Rahul Vaidya 62dab32433 Update AlltoAll and AlltoAllv API for 2.28.3 compatibility (#161)
Signed-off-by: ravaidya <ravaidya@amd.com>

[ROCm/rccl-tests commit: a52452e891]
2026-01-16 11:28:40 -08:00
amd-jiali d5e8f372dc Fix Out of Memory issue when allocating bias buffer (#160)
* Add an argument to select the performance test with or without bias. With bias, the maximum memory usage is recalculated and the data size is reduced to avoid the Out of Memory issue; without bias, no bias buffers need to be allocated

* Remove the bias argument option; memory calculation and buffer allocation are determined by the executable name.

---------

Co-authored-by: Li <jialili@ctr2-alola-ctrl-01.amd.com>

[ROCm/rccl-tests commit: 5272cd16ef]
2025-12-11 14:00:29 -08:00
gilbertlee-amd 555a5f1892 Fixing install script hip_compiler bug and improving logging on fallback (#156)
* Fixing install script hip_compiler bug and improving logging on fallback

[ROCm/rccl-tests commit: 6405c76e68]
2025-10-29 10:57:56 -06:00
mberenjk abf0605823 Fixing the AR_Bias issue for FP8 (#155)
Authored-by: Marzieh Berenjkoub <146776561+mberenjk@users.noreply.github.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

[ROCm/rccl-tests commit: 33cc4df1e4]
2025-10-18 14:46:31 -05:00
Wenkai Du 75a69211a0 Add all_reduce_bias_perf to support All Reduce with Bias (#130)
Use dynamic symbol loading of ncclAllReduceWithBias

Co-authored-by: mberenjk <146776561+mberenjk@users.noreply.github.com>

[ROCm/rccl-tests commit: db6ea5a594]
2025-10-13 16:09:10 -05:00
Nilesh M Negi 28de8ea25f [BUILD] Add link to libdl for RCCL-Tests builds (#153)
[ROCm/rccl-tests commit: d0a99b1847]
2025-10-05 04:12:05 -05:00
David DeBonis 85040cd9de Update CODEOWNERS (#154)
* Update CODEOWNERS

Adding me as a reviewer

* Update .github/CODEOWNERS

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

* Update CODEOWNERS

Added Alex

---------

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

[ROCm/rccl-tests commit: a4943c512e]
2025-10-01 07:07:28 -06:00
Nilesh M Negi 9d300c46f0 [BUILD] Add rccl_compat.h to src/CMakeLists.txt (#152)
[ROCm/rccl-tests commit: a15d1edaa3]
2025-09-28 13:33:33 -05:00
Mustafa Abduljabbar cb4b286d2b Enable viewing algo/proto/channels used in rccl-tests output (#151)
* Enable algo/proto/channel viewing 

* Use dynamic symbol loading to avoid build/runtime issues with non-compatible RCCL versions

* Reduce code duplication

[ROCm/rccl-tests commit: 0c94d4d2b3]
2025-09-26 18:09:01 -04:00
arvindcheru b07376b9ae Dependency removal with hipify_perl symlink (#150)
[ROCm/rccl-tests commit: e1b8a3aefc]
2025-09-15 13:16:09 -05:00
Nilesh M Negi 15bf0f5fd1 Merge remote-tracking branch 'nccl-tests/master' into develop
[ROCm/rccl-tests commit: 6f1b11ad49]
2025-08-16 16:10:04 -04:00
Kajsa Arnold aed68678a4 Standardize output formats (#140)
* remove spaces from csv
* consistently set redop to none when applicable
* write output file after test finishes

[ROCm/rccl-tests commit: a7809b3243]
2025-07-30 17:28:04 -05:00
David Addison 33b74ad124 Merge pull request #316 from martin-belanger/print-program-name
Print the name of the program being executed before and after test output

[ROCm/rccl-tests commit: fae7cb4727]
2025-07-24 14:58:54 -07:00
Bertan Dogancay 7111d2dd99 [Common] Use NCCL API to allocate/free memory (#144)
[ROCm/rccl-tests commit: 645be0eb45]
2025-07-24 11:14:49 -04:00
David Addison 146ecc2212 Add extra reserved space during maxBytes calculation
Also, don't allow minBytes > maxBytes


[ROCm/rccl-tests commit: 6edafa0a9c]
2025-07-23 16:19:37 -07:00
David Addison 57af056dd0 Minor fix to Makefile
Move comments to separate lines


[ROCm/rccl-tests commit: def2d3689c]
2025-07-23 16:04:30 -07:00
BertanDogancay 0010193b64 Merge remote-tracking branch 'nccl-tests/master' into develop
[ROCm/rccl-tests commit: 50a26637fb]
2025-07-23 14:23:22 -05:00
Nilesh M Negi a74d983073 [BUILD] Fix GPU_TARGETS in Makefile for ROCm 7.0 (#136)
[ROCm/rccl-tests commit: 2c255c4763]
2025-07-16 09:38:33 -05:00
Sam Wu c3f93c526d Remove precheckin script (#88)
[ROCm/rccl-tests commit: 66e513c24f]
2025-07-11 13:49:38 -06:00
Sam Wu f0df6fcccb Remove call to junit in math ci (#124)
[ROCm/rccl-tests commit: aac5f2b56c]
2025-07-04 11:54:11 -06:00
Satyanvesh Dittakavi 5fd16bd1c3 Add cstring header explicitly as it was removed from HIP (#132)
[ROCm/rccl-tests commit: 0039629ac5]
2025-06-24 15:09:23 -05:00
David Addison 4ec9c91be3 Add Turing (SM75) support to CUDA 13.0 builds
[ROCm/rccl-tests commit: 97ee098516]
2025-06-04 17:54:58 -07:00
David Addison 0ae7c8cbf4 Wrap ncclCommWindowRegister() calls within ncclGroup
[ROCm/rccl-tests commit: e7c8825b0b]
2025-06-03 10:36:53 -07:00
Martin Belanger ce1a83a0e8 Print the name of the program being executed
One thing missing from the stdout of each performance test is
the name of the test that is actually being run.

This patch adds 2 new messages to the stdout. At the beginning
of the execution of a test (e.g. sendrecv_perf) we will now
see this message:

  Collective test starting: sendrecv_perf

And at the end, we will now see this:

  Collective test concluded: sendrecv_perf

This is needed when running several tests consecutively and we're
trying to parse the stdout to collect the results.

For example, using a Python script to parse the stdout, one could
retrieve the results for each test and plot them on a graph. This
patch makes it easier to implement such a script.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>


[ROCm/rccl-tests commit: dafb70408d]
2025-06-03 11:43:02 -04:00
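As an illustration of the parsing use case described in this commit, the Python sketch below groups stdout lines by the test that produced them, using the "Collective test starting:" and "Collective test concluded:" markers. It is not part of rccl-tests; the function name `split_by_test` is hypothetical, and the regular expressions assume the markers may be preceded by whitespace or a `#` comment prefix.

```python
import re

# Marker lines added by this commit. Tolerating an optional leading "#" or
# whitespace is an assumption about how the lines appear in real output.
_START = re.compile(r"^[#\s]*Collective test starting:\s*(\S+)")
_END = re.compile(r"^[#\s]*Collective test concluded:\s*(\S+)")


def split_by_test(stdout: str) -> dict:
    """Map each test name to the output lines printed between its markers."""
    sections = {}
    current = None  # name of the test whose output is being collected
    for line in stdout.splitlines():
        start = _START.match(line)
        if start:
            current = start.group(1)
            sections[current] = []  # a repeated test name replaces earlier output
        elif _END.match(line):
            current = None  # lines after "concluded" belong to no test
        elif current is not None:
            sections[current].append(line)
    return sections
```

A results or plotting script could call `split_by_test(sys.stdin.read())` on the combined output of several consecutive runs and then parse each test's lines independently.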
David Addison 3b79e1a05c Reinstate Pascal support for CUDA 12.8+ builds
[ROCm/rccl-tests commit: 5290298ab6]
2025-06-02 09:29:52 -07:00
David Addison 07aa6e264d Need to drop Volta (sm_70) support from CUDA 13.0
[ROCm/rccl-tests commit: 8bc16f4e01]
2025-05-30 18:04:25 -07:00
David Addison cc15c84a01 Fix formatting errors in README.md
[ROCm/rccl-tests commit: 0c60e6a8e4]
2025-05-30 17:43:30 -07:00
David Addison 46e09f18c8 Add support for Symmetric Memory Registration
From NCCL 2.27.x we can now use the Symmetric Memory APIs (-R 2)


[ROCm/rccl-tests commit: a5c539e68b]
2025-05-30 17:31:34 -07:00
Nilesh M Negi 2b52453488 [BUILD] Fix logic for rocm-cmake dependency (#129)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl-tests commit: b0a3841b35]
2025-05-22 22:27:09 -05:00
mberenjk db5ab33461 Switching to old version of rccl_float8 for ROCm versions earlier than 6.3 for backward compatibility. (#128)
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl-tests commit: 9076091602]
2025-05-16 09:14:46 -05:00
Rahul Vaidya fa5259894c Ensure backward compatibility for fp8 datatypes (#126)
* Ensure backward compatibility for fp8 datatypes

Signed-off-by: ravaidya <ravaidya@amd.com>

* Update code comments

Signed-off-by: ravaidya <ravaidya@amd.com>

---------

Signed-off-by: ravaidya <ravaidya@amd.com>

[ROCm/rccl-tests commit: 0abe3c80bb]
2025-05-15 13:56:40 -05:00
mberenjk ed6ebb12a7 Switched to using the hip_fp8 header instead of rccl_float8, resolving compatibility issues (#109)
* addressing hip_fp8 support compatibility issue

* skipping mulsum and avg test for fp8, using hip_fp8 for product

* syncing with nccl-tests

removing the fp8 filter for pre-Hopper GPUs and resolving the merge conflict

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl-tests commit: 4b2b635766]
2025-05-14 15:30:07 -05:00
Wenkai Du fe47d3dd77 Automatically set in-place option from out-of-place (#123)
[ROCm/rccl-tests commit: cac33a8c2f]
2025-05-09 16:48:42 -05:00
Nilesh M Negi e3b9d785cc [BUILD] Add options to install script for compiler and GPU targets (#121)
* [BUILD] Add options to install script for compiler and GPU targets
* Fix GPU_TARGETS field and add option for custom ROCm path
* Check for ROCM_PATH

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl-tests commit: 41b383a0d4]
2025-05-07 13:19:10 -05:00
David Addison 173c15f4f4 Re-add sm_70 support for CUDA 12.8+ and 13.0 builds
[ROCm/rccl-tests commit: e041d901e6]
2025-05-07 10:30:59 -07:00
Marius Brehler b0b615091e Link Threads::Threads (#119)
`pthread.h` is included in `src/common.h`, but the threads library was not
linked, causing the build to fail with unresolved symbols at link time.

[ROCm/rccl-tests commit: 5b27b961b2]
2025-04-29 16:18:51 -05:00
Nilesh M Negi 6d2ec88eec [BUILD] Fix rccl-tests version string for packaging (#117)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl-tests commit: c96deb13cd]
2025-04-29 08:51:43 -05:00
Rahul Vaidya 10c31fb05f Fix build issues caused by 2.24.3 sync (#118)
[ROCm/rccl-tests commit: a4fd8f4667]
2025-04-28 10:22:38 -05:00
Grant Pinkert 3f962f5d58 Fix message size logging (#115)
Previously, the logger logged the number of bytes a node was expected to receive.
This differed from the stdout logging, where the reported message size is the total size of a message.

Signed-off-by: Grant Pinkert <gpinkert@amd.com>

[ROCm/rccl-tests commit: f611dbd49a]
2025-04-25 11:05:21 -05:00
David Addison b8dcb4dd83 Make verifiable a DSO and add NAME_SUFFIX support
Build option DSO=1 generates libverifiable.so which can be
used to reduce the combined binary size.

Build option NAME_SUFFIX can be used to add a suffix to all
generated binaries, e.g. NAME_SUFFIX=_mpi

Added new make target: clean_intermediates


[ROCm/rccl-tests commit: 1021260ca9]
2025-04-23 17:07:24 -07:00
Nilesh M Negi ba1adc3316 Merge pull request #116 from nileshnegi/sync/nccl-tests/02-28-2025
[SYNC] NCCL-Tests v2.14.1

[ROCm/rccl-tests commit: 83d38d91b6]
2025-04-21 19:53:35 -05:00
nileshnegi 8d887aad0d Merge remote-tracking branch 'nccl-tests/master' into develop
[ROCm/rccl-tests commit: 5625599dda]
2025-04-21 19:46:10 -05:00
David Addison 8d71063e05 Add support for FP8 datatypes
Added new datatypes: f8e4m3, f8e5m2

Only supported on H100+ architectures and NCCL versions >= 2.24.0


[ROCm/rccl-tests commit: 501a149d57]
2025-04-18 19:20:59 -07:00
mberenjk f3f3158a7e skipping the prod test for FP8 types in reduce and reduce-scatter (#111)
* skipping the prod test for FP8 types in reduce and reduce-scatter
---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl-tests commit: 5e838ad9df]
2025-04-15 09:38:33 -05:00
Alex Breslow 9da345dadf Add instructions to README on benchmarking with HSA_NO_SCRATCH_RECLAIM=1 on ROCm versions earlier than 6.4.x (#114)
[ROCm/rccl-tests commit: 284ff2ac84]
2025-04-08 09:59:57 -07:00
David Addison d516392fac Add PCI domain and device ID for GPU device BDF display
[ROCm/rccl-tests commit: b4300cc79d]
2025-02-28 13:25:51 -08:00
Sylvain Jeaugey b740da9a31 Add NCCL_TESTS_SPLIT documentation in the README
[ROCm/rccl-tests commit: 903918fc54]
2025-02-06 14:10:07 +01:00
Junyu Ma e2a9cbb362 Perftests: Introduce NCCL_TESTS_SPLIT env
`NCCL_TESTS_SPLIT` serves as a new way of computing the color for splitting communicators.

It is overridden by `NCCL_TESTS_SPLIT_MASK`.

Examples:

NCCL_TESTS_SPLIT_MASK="0x7" # color = rank & 0x7. What we do today to run on a DGX with one GPU per node.
NCCL_TESTS_SPLIT="AND 0x7"  # color = rank & 0x7. New way to run on one GPU per node on a DGX, equivalent to NCCL_TESTS_SPLIT_MASK=0x7
NCCL_TESTS_SPLIT="MOD 72"   # color = rank % 72.  One GPU per NVLink domain on an NVL72 system.
NCCL_TESTS_SPLIT="DIV 72"   # color = rank / 72.  Intra NVLink domain on NVL72.

You can also use the short forms "%", "&", "|" and "/".
Extra spaces between the operator and the value are ignored.
Keywords are not case-sensitive.

The following are all equivalent:

NCCL_TESTS_SPLIT="&0x7"
NCCL_TESTS_SPLIT="&0b111"
NCCL_TESTS_SPLIT="AND 7"
NCCL_TESTS_SPLIT="and 0x7"
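A minimal Python sketch of the color computation described above, assuming a value is a single operator (keyword or symbol) followed by one integer literal. This is not the nccl-tests parser: `split_color` is a hypothetical name, and mapping "|" (and an `OR` keyword) to bitwise OR is an assumption, since the message above lists the symbol without naming its operation.

```python
import re

# Operators accepted by this sketch; keywords and symbols share semantics.
# The "or" keyword and the meaning of "|" are assumptions.
_OPS = {
    "and": lambda rank, v: rank & v,
    "&": lambda rank, v: rank & v,
    "or": lambda rank, v: rank | v,
    "|": lambda rank, v: rank | v,
    "mod": lambda rank, v: rank % v,
    "%": lambda rank, v: rank % v,
    "div": lambda rank, v: rank // v,  # integer division, like C's rank / v
    "/": lambda rank, v: rank // v,
}

_SPEC = re.compile(r"(and|or|mod|div|[&|%/])(.+)")


def split_color(spec: str, rank: int) -> int:
    """Return the communicator color for `rank` under an NCCL_TESTS_SPLIT-style spec."""
    # Drop all whitespace and lower-case, so "AND 0x7" and "and0x7" parse the same.
    compact = "".join(spec.split()).lower()
    match = _SPEC.fullmatch(compact)
    if match is None:
        raise ValueError(f"unrecognized split spec: {spec!r}")
    op, literal = match.groups()
    # Base 0 accepts decimal, 0x (hex) and 0b (binary) literals.
    value = int(literal, 0)
    return _OPS[op](rank, value)
```

For example, with `NCCL_TESTS_SPLIT="DIV 72"`, ranks 0 to 71 get color 0 and ranks 72 to 143 get color 1, so each NVLink domain of 72 GPUs forms its own communicator.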


[ROCm/rccl-tests commit: a89cf07fe8]
2025-02-04 15:18:09 -08:00
David Addison 6f2e0f8a21 Update CUDA gencodes
Add support for Blackwell sm100 and sm120 from CUDA 12.8

Add support for Hopper sm90 from CUDA 12.0


[ROCm/rccl-tests commit: cb6a46fdd6]
2025-01-25 17:32:16 -08:00
Nilesh M Negi 590c2b0187 [GIT] Add CODEOWNERS and PR Template (#102)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

[ROCm/rccl-tests commit: 448c4c7269]
2025-01-16 17:05:48 -07:00