295 commits

Author SHA1 Message Date
Rahul Vaidya a52452e891 Update AlltoAll and AlltoAllv API for 2.28.3 compatibility (#161)
Signed-off-by: ravaidya <ravaidya@amd.com>
2026-01-16 11:28:40 -08:00
amd-jiali 5272cd16ef Fix Out of Memory issue when allocating bias buffer (#160)
* Add an argument to select whether the performance test runs with bias. With bias, the maximum memory usage is recalculated and the data size reduced to avoid running out of memory; without bias, no bias buffers are allocated.

* Remove the bias argument; memory calculation and buffer allocation are now determined by the executable name.

---------

Co-authored-by: Li <jialili@ctr2-alola-ctrl-01.amd.com>
2025-12-11 14:00:29 -08:00
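The idea behind this fix, choosing the buffer set from the executable name and sizing the data to fit, can be sketched in Python. This is a hypothetical illustration rather than the rccl-tests code: the helper names, the assumption that bias tests are recognized by `bias` in the binary name (as in `all_reduce_bias_perf`), and the count of three base buffers are all assumptions.

```python
import os

# Hypothetical: number of device buffers a test allocates without bias
# (for example send, receive and expected-result buffers).
BASE_BUFFERS = 3


def uses_bias(exec_name: str) -> bool:
    """Assume bias tests are identified by 'bias' in the binary's file name."""
    return "bias" in os.path.basename(exec_name)


def max_bytes_per_buffer(free_bytes: int, exec_name: str) -> int:
    """Split the memory budget evenly across the buffers this test needs."""
    nbuffers = BASE_BUFFERS + (1 if uses_bias(exec_name) else 0)
    return free_bytes // nbuffers


# A bias test gets a smaller per-buffer size; other tests skip the bias buffer.
print(max_bytes_per_buffer(1200, "./build/all_reduce_bias_perf"))
print(max_bytes_per_buffer(1200, "./build/all_reduce_perf"))
```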
gilbertlee-amd 6405c76e68 Fixing install script hip_compiler bug and improving logging on fallback (#156)
* Fixing install script hip_compiler bug and improving logging on fallback
2025-10-29 10:57:56 -06:00
mberenjk 33cc4df1e4 Fixing the AR_Bias issue for FP8 (#155)
Authored-by: Marzieh Berenjkoub <146776561+mberenjk@users.noreply.github.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
2025-10-18 14:46:31 -05:00
Wenkai Du db6ea5a594 Add all_reduce_bias_perf to support All Reduce with Bias (#130)
Use dynamic symbol loading of ncclAllReduceWithBias

Co-authored-by: mberenjk <146776561+mberenjk@users.noreply.github.com>
2025-10-13 16:09:10 -05:00
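Dynamic symbol loading lets the test binary run against RCCL builds that do not export `ncclAllReduceWithBias`. In C this is typically a `dlsym()` lookup from libdl; the Python `ctypes` sketch below only illustrates the look-up-then-fall-back pattern and is an assumption, not how rccl-tests implements it. The helper names are hypothetical.

```python
import ctypes
import ctypes.util


def find_symbol(lib: ctypes.CDLL, name: str):
    """Return the exported function `name` from `lib`, or None if it is absent."""
    try:
        return getattr(lib, name)  # on POSIX, ctypes resolves the name with dlsym()
    except AttributeError:
        return None


def load_all_reduce_with_bias():
    """Look up ncclAllReduceWithBias in an installed RCCL, if there is one."""
    path = ctypes.util.find_library("rccl")  # None when librccl is not found
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    return find_symbol(lib, "ncclAllReduceWithBias")


fn = load_all_reduce_with_bias()
print("bias variant available" if fn is not None else "falling back to plain all-reduce")
```

The caller decides at runtime which path to take, so the same binary builds and runs whether or not the installed RCCL provides the bias variant.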
Nilesh M Negi d0a99b1847 [BUILD] Add link to libdl for RCCL-Tests builds (#153) 2025-10-05 04:12:05 -05:00
David DeBonis a4943c512e Update CODEOWNERS (#154)
* Update CODEOWNERS

Adding me as a reviewer

* Update .github/CODEOWNERS

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

* Update CODEOWNERS

Added Alex

---------

Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
2025-10-01 07:07:28 -06:00
Nilesh M Negi a15d1edaa3 [BUILD] Add rccl_compat.h to src/CMakeLists.txt (#152) 2025-09-28 13:33:33 -05:00
Mustafa Abduljabbar 0c94d4d2b3 Enable viewing algo/proto/channels used in rccl-tests output (#151)
* Enable algo/proto/channel viewing 

* Use dynamic symbol loading to avoid build/runtime issues with non-compatible RCCL versions

* Reduce code duplication
2025-09-26 18:09:01 -04:00
arvindcheru e1b8a3aefc Dependency removal with hipify_perl symlink (#150) 2025-09-15 13:16:09 -05:00
nileshnegi 690f97c119 Merge pull request #147 from nileshnegi/sync/nccl-tests_v2.16.7
[SYNC] NCCL-Tests v2.16.7
2025-08-18 15:28:34 -04:00
Nilesh M Negi 6f1b11ad49 Merge remote-tracking branch 'nccl-tests/master' into develop 2025-08-16 16:10:04 -04:00
Kajsa Arnold a7809b3243 Standardize output formats (#140)
* remove spaces from csv
* consistently set redop to none when applicable
* write output file after test finishes
2025-07-30 17:28:04 -05:00
David Addison fae7cb4727 Merge pull request #316 from martin-belanger/print-program-name
Print the name of the program being executed before and after test output
2025-07-24 14:58:54 -07:00
Bertan Dogancay 645be0eb45 [Common] Use NCCL API to allocate/free memory (#144) 2025-07-24 11:14:49 -04:00
David Addison 6edafa0a9c Add extra reserved space during maxBytes calculation
Also, don't allow minBytes > maxBytes
2025-07-23 16:19:37 -07:00
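A minimal sketch of the two rules in this commit: hold back some reserve when deriving the largest message size from free memory, and reject a minimum size larger than the maximum. The reserve amount, buffer count and function names are hypothetical, and raising an error (rather than clamping) on minBytes > maxBytes is an assumption of this sketch.

```python
def cap_max_bytes(requested_max: int, free_bytes: int,
                  reserve_bytes: int, nbuffers: int) -> int:
    """Largest per-buffer size that fits after setting aside reserve_bytes."""
    usable = max(free_bytes - reserve_bytes, 0)
    return min(requested_max, usable // nbuffers)


def check_sizes(min_bytes: int, max_bytes: int) -> None:
    """Refuse a size sweep whose lower bound exceeds its upper bound."""
    if min_bytes > max_bytes:
        raise ValueError(f"minBytes ({min_bytes}) > maxBytes ({max_bytes})")


max_bytes = cap_max_bytes(requested_max=1 << 30, free_bytes=1000,
                          reserve_bytes=100, nbuffers=3)
print(max_bytes)
check_sizes(8, max_bytes)  # passes; check_sizes(512, max_bytes) would raise
```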
David Addison def2d3689c Minor fix to Makefile
Move comments to separate lines
2025-07-23 16:04:30 -07:00
Bertan Dogancay a9b1ce0456 Merge pull request #143 from ROCm/v2.16.4
[SYNC] NCCL-Tests v2.16.4
2025-07-23 15:31:31 -04:00
BertanDogancay 50a26637fb Merge remote-tracking branch 'nccl-tests/master' into develop 2025-07-23 14:23:22 -05:00
Nilesh M Negi 2c255c4763 [BUILD] Fix GPU_TARGETS in Makefile for ROCm 7.0 (#136) 2025-07-16 09:38:33 -05:00
Sam Wu 66e513c24f Remove precheckin script (#88) 2025-07-11 13:49:38 -06:00
Sam Wu aac5f2b56c Remove call to junit in math ci (#124) 2025-07-04 11:54:11 -06:00
Satyanvesh Dittakavi 0039629ac5 Add cstring header explicitly as it is removed from HIP (#132) 2025-06-24 15:09:23 -05:00
David Addison 97ee098516 Add Turing (SM75) support to CUDA 13.0 builds 2025-06-04 17:54:58 -07:00
David Addison e7c8825b0b Wrap ncclCommWindowRegister() calls within ncclGroup 2025-06-03 10:36:53 -07:00
Martin Belanger dafb70408d Print the name of the program being executed
One thing missing from the stdout of each performance test is
the name of the test that is actually being run.

This patch adds 2 new messages to the stdout. At the beginning
of the execution of a test (e.g. sendrecv_perf) we will now
see this message:

  Collective test starting: sendrecv_perf

And at the end, we will now see this:

  Collective test concluded: sendrecv_perf

This is needed when running several tests consecutively and we're
trying to parse the stdout to collect the results.

For example, using a Python script to parse the stdout, one could
retrieve the results for each test and plot them on a graph. This
patch makes it easier to implement such a script.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
2025-06-03 11:43:02 -04:00
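As the commit message suggests, the start and end markers make it straightforward to split the combined stdout of several consecutive runs into per-test sections. A minimal Python sketch; it assumes each marker sits on its own line and tolerates an optional leading `#`, since the exact prefix in the printed output is an assumption here. The function name is hypothetical.

```python
import re
from typing import Dict, List

START = re.compile(r"^\s*#?\s*Collective test starting:\s*(\S+)")
END = re.compile(r"^\s*#?\s*Collective test concluded:\s*(\S+)")


def split_by_test(stdout: str) -> Dict[str, List[str]]:
    """Map each test name to the output lines printed between its markers."""
    sections: Dict[str, List[str]] = {}
    current = None
    for line in stdout.splitlines():
        start = START.match(line)
        end = END.match(line)
        if start:
            current = start.group(1)
            sections[current] = []
        elif end and end.group(1) == current:
            current = None
        elif current is not None:
            sections[current].append(line)
    return sections


log = """\
Collective test starting: sendrecv_perf
   8   2   float
Collective test concluded: sendrecv_perf
Collective test starting: all_reduce_perf
  16   4   float
Collective test concluded: all_reduce_perf
"""
for name, lines in split_by_test(log).items():
    print(name, len(lines))
```

Each section can then be handed to whatever parses the result rows and plots them.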
David Addison 5290298ab6 Reinstate Pascal support for CUDA 12.8+ builds 2025-06-02 09:29:52 -07:00
David Addison 8bc16f4e01 Need to drop Volta (sm_70) support from CUDA 13.0 2025-05-30 18:04:25 -07:00
David Addison 0c60e6a8e4 Fix formatting errors in README.md 2025-05-30 17:43:30 -07:00
David Addison a5c539e68b Add support for Symmetric Memory Registration
From NCCL 2.27.x we can now use the Symmetric Memory APIs (-R 2)
2025-05-30 17:31:34 -07:00
Nilesh M Negi b0a3841b35 [BUILD] Fix logic for rocm-cmake dependency (#129)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2025-05-22 22:27:09 -05:00
mberenjk 9076091602 Switching to old version of rccl_float8 for ROCm versions earlier than 6.3 for backward compatibility. (#128)
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-05-16 09:14:46 -05:00
Rahul Vaidya 0abe3c80bb Ensure backward compatibility for fp8 datatypes (#126)
* Ensure backward compatibility for fp8 datatypes

Signed-off-by: ravaidya <ravaidya@amd.com>

* Update code comments

Signed-off-by: ravaidya <ravaidya@amd.com>

---------

Signed-off-by: ravaidya <ravaidya@amd.com>
2025-05-15 13:56:40 -05:00
mberenjk 4b2b635766 Switched to using the hip_fp8 header instead of rccl_float8, resolving compatibility issues (#109)
* addressing hip_fp8 support compatibility issue

* skipping mulsum and avg test for fp8, using hip_fp8 for product

* syncing with nccl-tests

removing the fp8 filter for pre-Hopper GPUs and resolving the merge conflict

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-05-14 15:30:07 -05:00
Wenkai Du cac33a8c2f Automatically set in-place option from out-of-place (#123) 2025-05-09 16:48:42 -05:00
Nilesh M Negi 41b383a0d4 [BUILD] Add options to install script for compiler and GPU targets (#121)
* [BUILD] Add options to install script for compiler and GPU targets
* Fix GPU_TARGETS field and add option for custom ROCm path
* Check for ROCM_PATH

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2025-05-07 13:19:10 -05:00
David Addison e041d901e6 Re-add sm_70 support for CUDA 12.8+ and 13.0 builds 2025-05-07 10:30:59 -07:00
Marius Brehler 5b27b961b2 Link Threads::Threads (#119)
`pthread.h` is included in `src/common.h` but lib is not properly
linked, resulting in the build failing with unresolved symbols when
trying to link.
2025-04-29 16:18:51 -05:00
Nilesh M Negi c96deb13cd [BUILD] Fix rccl-tests version string for packaging (#117)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2025-04-29 08:51:43 -05:00
Rahul Vaidya a4fd8f4667 Fix build issues caused by 2.24.3 sync (#118) 2025-04-28 10:22:38 -05:00
Grant Pinkert f611dbd49a Fix message size logging (#115)
Previously, the logger recorded the number of bytes each node was expected to receive.
This differed from the stdout output, where the reported message size is the total size of a message.

Signed-off-by: Grant Pinkert <gpinkert@amd.com>
2025-04-25 11:05:21 -05:00
David Addison 1021260ca9 Make verifiable a DSO and add NAME_SUFFIX support
Build option DSO=1 generates libverifiable.so which can be
used to reduce the combined binary size.

Build option NAME_SUFFIX can be used to add a suffix to all
generated binaries, e.g. NAME_SUFFIX=_mpi

Added new make target: clean_intermediates
2025-04-23 17:07:24 -07:00
Nilesh M Negi 83d38d91b6 Merge pull request #116 from nileshnegi/sync/nccl-tests/02-28-2025
[SYNC] NCCL-Tests v2.14.1
2025-04-21 19:53:35 -05:00
nileshnegi 5625599dda Merge remote-tracking branch 'nccl-tests/master' into develop 2025-04-21 19:46:10 -05:00
David Addison 501a149d57 Add support for FP8 datatypes
Added new datatypes: f8e4m3, f8e5m2

Only supported on H100+ architectures and NCCL versions >= 2.24.0
2025-04-18 19:20:59 -07:00
mberenjk 5e838ad9df skipping the prod test for FP8 types in reduce and reduce-scatter (#111)
* skipping the prod test for FP8 types in reduce and reduce-scatter
---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-04-15 09:38:33 -05:00
Alex Breslow 284ff2ac84 Add instructions to README regarding benchmarking on pre ROCm 6.4.x versions with HSA_NO_SCRATCH_RECLAIM=1 (#114) 2025-04-08 09:59:57 -07:00
David Addison b4300cc79d Add PCI domain and device ID for GPU device BDF display 2025-02-28 13:25:51 -08:00
Sylvain Jeaugey 903918fc54 Add NCCL_TESTS_SPLIT documentation in the README 2025-02-06 14:10:07 +01:00
Junyu Ma a89cf07fe8 Perftests: Introduce NCCL_TESTS_SPLIT env
`NCCL_TESTS_SPLIT` provides a new way of computing the color used to split communicators.

It is overridden by `NCCL_TESTS_SPLIT_MASK`.

Examples:

NCCL_TESTS_SPLIT_MASK="0x7" # color = rank & 0x7. What we do today to run on a DGX with one GPU per node.
NCCL_TESTS_SPLIT="AND 0x7"  # color = rank & 0x7. New way to run on one GPU per node on a DGX, equivalent to NCCL_TESTS_SPLIT_MASK=0x7
NCCL_TESTS_SPLIT="MOD 72"   # color = rank % 72.  One GPU per NVLink domain on an NVL72 system.
NCCL_TESTS_SPLIT="DIV 72"   # color = rank / 72.  Intra NVLink domain on NVL72.

The short forms "%", "&", "|" and "/" are also accepted.
Extra spaces are ignored.
Matching is case-insensitive.

The following are all equivalent:

NCCL_TESTS_SPLIT="&0x7"
NCCL_TESTS_SPLIT="&0b111"
NCCL_TESTS_SPLIT="AND 7"
NCCL_TESTS_SPLIT="and 0x7"
2025-02-04 15:18:09 -08:00
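The color rules in this commit can be sketched as a small parser. This Python version is an illustration, not the nccl-tests implementation: the function name is hypothetical, the `OR` keyword is inferred from the `|` short form, and `NCCL_TESTS_SPLIT_MASK` precedence is not modeled. Ranks that compute the same color end up in the same split communicator.

```python
import operator

# Keyword and short forms described in the commit message. The "OR" keyword
# is an assumption inferred from the "|" short form.
OPS = {
    "AND": operator.and_, "&": operator.and_,
    "OR": operator.or_, "|": operator.or_,
    "MOD": operator.mod, "%": operator.mod,
    "DIV": operator.floordiv, "/": operator.floordiv,
}


def split_color(spec: str, rank: int) -> int:
    """Compute the communicator color of `rank` from a NCCL_TESTS_SPLIT string."""
    s = "".join(spec.split()).upper()  # drop all spaces; matching is case-insensitive
    for name, op in OPS.items():
        if s.startswith(name):
            # Base 0 accepts decimal, 0x (hex) and 0b (binary) literals.
            return op(rank, int(s[len(name):], 0))
    raise ValueError(f"unrecognized NCCL_TESTS_SPLIT value: {spec!r}")


# On an NVL72 system, DIV groups rank 100 with the rest of NVLink domain 1,
# while MOD groups it with local index 28 of every domain.
print(split_color("DIV 72", 100), split_color("MOD 72", 100))
```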