rocm-systems

Author	SHA1	Message	Date
Rahul Vaidya	62dab32433	Update AlltoAll and AlltoAllv API for 2.28.3 compatibility (#161 ) Signed-off-by: ravaidya <ravaidya@amd.com> [ROCm/rccl-tests commit: `a52452e891`]	2026-01-16 11:28:40 -08:00
amd-jiali	d5e8f372dc	Fix Out of Memory issue when allocating bias buffer (#160 ) * Add argument to select performance test with bias or not; if with bias, the maximum memory usage should be re-calculated and reduce the data size to avoid the Out of Memory issue; if without bias, no need to allocate buffers for bias * Remove argument option for bias; memory calculation and buffer allocation are determined by the exec name. --------- Co-authored-by: Li <jialili@ctr2-alola-ctrl-01.amd.com> [ROCm/rccl-tests commit: `5272cd16ef`]	2025-12-11 14:00:29 -08:00
gilbertlee-amd	555a5f1892	Fixing install script hip_compiler bug and improving logging on fallback (#156 ) * Fixing install script hip_compiler bug and improving logging on fallback [ROCm/rccl-tests commit: `6405c76e68`]	2025-10-29 10:57:56 -06:00
mberenjk	abf0605823	Fixing the AR_Bias issue for FP8 (#155 ) Authored-by: Marzieh Berenjkoub <146776561+mberenjk@users.noreply.github.com> Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com> Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com> [ROCm/rccl-tests commit: `33cc4df1e4`]	2025-10-18 14:46:31 -05:00
Wenkai Du	75a69211a0	Add all_reduce_bias_perf to support All Reduce with Bias (#130 ) Use dynamic symbol loading of ncclAllReduceWithBias Co-authored-by: mberenjk <146776561+mberenjk@users.noreply.github.com> [ROCm/rccl-tests commit: `db6ea5a594`]	2025-10-13 16:09:10 -05:00
Nilesh M Negi	28de8ea25f	[BUILD] Add link to `libdl` for RCCL-Tests builds (#153 ) [ROCm/rccl-tests commit: `d0a99b1847`]	2025-10-05 04:12:05 -05:00
David DeBonis	85040cd9de	Update CODEOWNERS (#154 ) * Update CODEOWNERS Adding me as a reviewer * Update .github/CODEOWNERS Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com> * Update CODEOWNERS Added Alex --------- Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com> [ROCm/rccl-tests commit: `a4943c512e`]	2025-10-01 07:07:28 -06:00
Nilesh M Negi	9d300c46f0	[BUILD] Add rccl_compat.h to src/CMakeLists.txt (#152 ) [ROCm/rccl-tests commit: `a15d1edaa3`]	2025-09-28 13:33:33 -05:00
Mustafa Abduljabbar	cb4b286d2b	Enable viewing algo/proto/channels used in rccl-tests output (#151 ) * Enable algo/proto/channel viewing * Use dynamic symbol loading to avoid build/runtime issues with non-compatible RCCL versions * Reduce code duplication [ROCm/rccl-tests commit: `0c94d4d2b3`]	2025-09-26 18:09:01 -04:00
arvindcheru	b07376b9ae	Dependency removal with hipify_perl symlink (#150 ) [ROCm/rccl-tests commit: `e1b8a3aefc`]	2025-09-15 13:16:09 -05:00
Nilesh M Negi	15bf0f5fd1	Merge remote-tracking branch 'nccl-tests/master' into develop [ROCm/rccl-tests commit: `6f1b11ad49`]	2025-08-16 16:10:04 -04:00
Kajsa Arnold	aed68678a4	Standardize output formats (#140 ) * remove spaces from csv * consistently set redop to none when applicable * write output file after test finishes [ROCm/rccl-tests commit: `a7809b3243`]	2025-07-30 17:28:04 -05:00
David Addison	33b74ad124	Merge pull request #316 from martin-belanger/print-program-name Print the name of the program being executed before and after test output [ROCm/rccl-tests commit: `fae7cb4727`]	2025-07-24 14:58:54 -07:00
Bertan Dogancay	7111d2dd99	[Common] Use NCCL API to allocate/free memory (#144 ) [ROCm/rccl-tests commit: `645be0eb45`]	2025-07-24 11:14:49 -04:00
David Addison	146ecc2212	Add extra reserved space during maxBytes calculation Also, don't allow minBytes > maxBytes [ROCm/rccl-tests commit: `6edafa0a9c`]	2025-07-23 16:19:37 -07:00
David Addison	57af056dd0	Minor fix to Makefile Move comments to separate lines [ROCm/rccl-tests commit: `def2d3689c`]	2025-07-23 16:04:30 -07:00
BertanDogancay	0010193b64	Merge remote-tracking branch 'nccl-tests/master' into develop [ROCm/rccl-tests commit: `50a26637fb`]	2025-07-23 14:23:22 -05:00
Nilesh M Negi	a74d983073	[BUILD] Fix GPU_TARGETS in Makefile for ROCm 7.0 (#136 ) [ROCm/rccl-tests commit: `2c255c4763`]	2025-07-16 09:38:33 -05:00
Sam Wu	c3f93c526d	Remove precheckin script (#88 ) [ROCm/rccl-tests commit: `66e513c24f`]	2025-07-11 13:49:38 -06:00
Sam Wu	f0df6fcccb	Remove call to junit in math ci (#124 ) [ROCm/rccl-tests commit: `aac5f2b56c`]	2025-07-04 11:54:11 -06:00
Satyanvesh Dittakavi	5fd16bd1c3	Add cstring header explicitly as it is removed from HIP (#132 ) [ROCm/rccl-tests commit: `0039629ac5`]	2025-06-24 15:09:23 -05:00
David Addison	4ec9c91be3	Add Turing (SM75) support to CUDA 13.0 builds [ROCm/rccl-tests commit: `97ee098516`]	2025-06-04 17:54:58 -07:00
David Addison	0ae7c8cbf4	Wrap ncclCommWindowRegister() calls within ncclGroup [ROCm/rccl-tests commit: `e7c8825b0b`]	2025-06-03 10:36:53 -07:00
Martin Belanger	ce1a83a0e8	Print the name of the program being executed One thing missing from the stdout of each performance test is the name of the test that is actually being run. This patch adds 2 new messages to the stdout. At the beginning of the execution of a test (e.g. sendrecv_perf) we will now see this message: Collective test starting: sendrecv_perf And at the end, we will now see this: Collective test concluded: sendrecv_perf This is needed when running several tests consecutively and we're trying to parse the stdout to collect the results. For example, using a Python script to parse the stdout, one could retrieve the results for each test and plot them on a graph. This patch makes it easier to implement such a script. Signed-off-by: Martin Belanger <martin.belanger@dell.com> [ROCm/rccl-tests commit: `dafb70408d`]	2025-06-03 11:43:02 -04:00
David Addison	3b79e1a05c	Reinstate Pascal support for CUDA 12.8+ builds [ROCm/rccl-tests commit: `5290298ab6`]	2025-06-02 09:29:52 -07:00
David Addison	07aa6e264d	Need to drop Volta (sm_70) support from CUDA 13.0 [ROCm/rccl-tests commit: `8bc16f4e01`]	2025-05-30 18:04:25 -07:00
David Addison	cc15c84a01	Fix formatting errors in README.md [ROCm/rccl-tests commit: `0c60e6a8e4`]	2025-05-30 17:43:30 -07:00
David Addison	46e09f18c8	Add support for Symmetric Memory Registration From NCCL 2.27.x we can now use the Symmetric Memory APIs (-R 2) [ROCm/rccl-tests commit: `a5c539e68b`]	2025-05-30 17:31:34 -07:00
Nilesh M Negi	2b52453488	[BUILD] Fix logic for rocm-cmake dependency (#129 ) Signed-off-by: nileshnegi <Nilesh.Negi@amd.com> [ROCm/rccl-tests commit: `b0a3841b35`]	2025-05-22 22:27:09 -05:00
mberenjk	db5ab33461	Switching to old version of rccl_float8 for ROCm versions earlier than 6.3 for backward compatibility. (#128 ) Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com> [ROCm/rccl-tests commit: `9076091602`]	2025-05-16 09:14:46 -05:00
Rahul Vaidya	fa5259894c	Ensure backward compatibility for fp8 datatypes (#126 ) * Ensure backward compatibility for fp8 datatypes Signed-off-by: ravaidya <ravaidya@amd.com> * Update code comments Signed-off-by: ravaidya <ravaidya@amd.com> --------- Signed-off-by: ravaidya <ravaidya@amd.com> [ROCm/rccl-tests commit: `0abe3c80bb`]	2025-05-15 13:56:40 -05:00
mberenjk	ed6ebb12a7	Switched to using the hip_fp8 header instead of rccl_float8, resolving compatibility issues.(#109 ) * addressing hip_fp8 support compatibility issue * skipping mulsum and avg test for fp8, using hip_fp8 for product * syncing with nccl-tests removing the fp8 filter for pre-hopper gpus and resolving the merge conflict --------- Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com> [ROCm/rccl-tests commit: `4b2b635766`]	2025-05-14 15:30:07 -05:00
Wenkai Du	fe47d3dd77	Automatically set in-place option from out-of-place (#123 ) [ROCm/rccl-tests commit: `cac33a8c2f`]	2025-05-09 16:48:42 -05:00
Nilesh M Negi	e3b9d785cc	[BUILD] Add options to install script for compiler and GPU targets (#121 ) * [BUILD] Add options to install script for compiler and GPU targets * Fix GPU_TARGETS field and add option for custom ROCm path * Check for ROCM_PATH --------- Signed-off-by: nileshnegi <Nilesh.Negi@amd.com> [ROCm/rccl-tests commit: `41b383a0d4`]	2025-05-07 13:19:10 -05:00
David Addison	173c15f4f4	Re-add sm_70 support for CUDA 12.8+ and 13.0 builds [ROCm/rccl-tests commit: `e041d901e6`]	2025-05-07 10:30:59 -07:00
Marius Brehler	b0b615091e	Link `Threads::Threads` (#119 ) `pthread.h` is included in `src/common.h` but lib is not properly linked, resulting in the build failing with unresolved symbols when trying to link. [ROCm/rccl-tests commit: `5b27b961b2`]	2025-04-29 16:18:51 -05:00
Nilesh M Negi	6d2ec88eec	[BUILD] Fix rccl-tests version string for packaging (#117 ) Signed-off-by: nileshnegi <Nilesh.Negi@amd.com> [ROCm/rccl-tests commit: `c96deb13cd`]	2025-04-29 08:51:43 -05:00
Rahul Vaidya	10c31fb05f	Fix build issues caused by 2.24.3 sync (#118 ) [ROCm/rccl-tests commit: `a4fd8f4667`]	2025-04-28 10:22:38 -05:00
Grant Pinkert	3f962f5d58	Fix message size logging (#115 ) Previously, the logger was logging the number of expected bytes a node was to receive. This differs from the stdout logging, where the reported message size is the total size of a message. Signed-off-by: Grant Pinkert <gpinkert@amd.com> [ROCm/rccl-tests commit: `f611dbd49a`]	2025-04-25 11:05:21 -05:00
David Addison	b8dcb4dd83	Make verifiable a DSO and add NAME_SUFFIX support Build option DSO=1 generates libverifiable.so which can be used to reduce the combined binary size. Build option NAME_SUFFIX can be used to add a suffix to all generated binaries. e.g. NAME_SUFFIX=_mpi Added new make target: clean_intermediates [ROCm/rccl-tests commit: `1021260ca9`]	2025-04-23 17:07:24 -07:00
Nilesh M Negi	ba1adc3316	Merge pull request #116 from nileshnegi/sync/nccl-tests/02-28-2025 [SYNC] NCCL-Tests v2.14.1 [ROCm/rccl-tests commit: `83d38d91b6`]	2025-04-21 19:53:35 -05:00
nileshnegi	8d887aad0d	Merge remote-tracking branch 'nccl-tests/master' into develop [ROCm/rccl-tests commit: `5625599dda`]	2025-04-21 19:46:10 -05:00
David Addison	8d71063e05	Add support for FP8 datatypes Added new datatypes: f8e4m3, f8e5m2 Only supported on H100+ architectures and NCCL versions >= 2.24.0 [ROCm/rccl-tests commit: `501a149d57`]	2025-04-18 19:20:59 -07:00
mberenjk	f3f3158a7e	skipping the prod test for FP8 types in reduce and reduce-scatter (#111 ) * skipping the prod test for FP8 types in reduce and reduce-scatter --------- Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com> [ROCm/rccl-tests commit: `5e838ad9df`]	2025-04-15 09:38:33 -05:00
Alex Breslow	9da345dadf	Add instructions to README regarding benchmarking on pre ROCm 6.4.x versions with HSA_NO_SCRATCH_RECLAIM=1 (#114 ) [ROCm/rccl-tests commit: `284ff2ac84`]	2025-04-08 09:59:57 -07:00
David Addison	d516392fac	Add PCI domain and device ID for GPU device BDF display [ROCm/rccl-tests commit: `b4300cc79d`]	2025-02-28 13:25:51 -08:00
Sylvain Jeaugey	b740da9a31	Add NCCL_TESTS_SPLIT documentation in the README [ROCm/rccl-tests commit: `903918fc54`]	2025-02-06 14:10:07 +01:00
Junyu Ma	e2a9cbb362	Perftests: Introduce NCCL_TESTS_SPLIT env `NCCL_TESTS_SPLIT` serves as a new way of computing the color for splitting communicators. Will be overridden by `NCCL_TESTS_SPLIT_MASK`. Examples: NCCL_TESTS_SPLIT_MASK="0x7" # color = rank & 0x7. What we do today to run on a DGX with one GPU per node. NCCL_TESTS_SPLIT="AND 0x7" # color = rank & 0x7. New way to run on one GPU per node on a DGX, equivalent to NCCL_TESTS_SPLIT_MASK=0x7 NCCL_TESTS_SPLIT="MOD 72" # color = rank % 72. One GPU per NVLink domain on an NVL72 system. NCCL_TESTS_SPLIT="DIV 72" # color = rank / 72. Intra NVLink domain on NVL72. You can also use: "%" "&" "|" "/" for short. Extra spaces in the middle will be automatically ignored. Not case sensitive. The following are all equivalent: NCCL_TESTS_SPLIT="%0x7" NCCL_TESTS_SPLIT="%0b111" NCCL_TESTS_SPLIT="AND 7" NCCL_TESTS_SPLIT="and 0x7" [ROCm/rccl-tests commit: `a89cf07fe8`]	2025-02-04 15:18:09 -08:00
David Addison	6f2e0f8a21	Update CUDA gencodes Add support for Blackwell sm100 and sm120 from CUDA 12.8 Add support for Hopper sm90 from CUDA 12.0 [ROCm/rccl-tests commit: `cb6a46fdd6`]	2025-01-25 17:32:16 -08:00
Nilesh M Negi	590c2b0187	[GIT] Add CODEOWNERS and PR Template (#102 ) Signed-off-by: nileshnegi <Nilesh.Negi@amd.com> [ROCm/rccl-tests commit: `448c4c7269`]	2025-01-16 17:05:48 -07:00
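
Commit e2a9cbb362 above describes how `NCCL_TESTS_SPLIT` turns a rank into a communicator color. The following is a minimal Python sketch of that rule as the commit message states it, not the C implementation in the tests. The function names `parse_split` and `split_color` are hypothetical, and the long-form name `OR` is an assumption inferred from the `|` short form.

```python
import re

# Operator spellings from the commit message. "OR" as a long form is an
# assumption; the message only lists "|" among the short forms.
_OPS = {
    "AND": lambda rank, n: rank & n,
    "&": lambda rank, n: rank & n,
    "MOD": lambda rank, n: rank % n,
    "%": lambda rank, n: rank % n,
    "DIV": lambda rank, n: rank // n,
    "/": lambda rank, n: rank // n,
    "OR": lambda rank, n: rank | n,
    "|": lambda rank, n: rank | n,
}


def parse_split(spec):
    """Split a spec such as "AND 0x7" into (operator, integer operand)."""
    # Whitespace is ignored and matching is case-insensitive, per the commit.
    compact = "".join(spec.split()).upper()
    match = re.fullmatch(r"(AND|MOD|DIV|OR|[&%/|])(\w+)", compact)
    if match is None:
        raise ValueError(f"unrecognized split spec: {spec!r}")
    op, operand = match.groups()
    # Base 0 accepts decimal, 0x... and 0b... literals.
    return op, int(operand, 0)


def split_color(spec, rank):
    """Return the color that the given spec assigns to a rank."""
    op, operand = parse_split(spec)
    return _OPS[op](rank, operand)
```

For example, `split_color("MOD 72", 75)` places rank 75 in the same color as rank 3, which matches the "one GPU per NVLink domain" case in the commit message.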

257 Commits
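
Commit ce1a83a0e8 adds "Collective test starting: <name>" and "Collective test concluded: <name>" messages so that a script can separate the output of consecutive tests. A rough sketch of such a script might look like the following. The function name `split_by_test` is hypothetical, and the sketch assumes each marker appears on its own line; it keeps the lines between markers as raw text rather than guessing at the result-table format.

```python
import re

# Marker strings as quoted in the commit message.
START = re.compile(r"Collective test starting: (\S+)")
END = re.compile(r"Collective test concluded: (\S+)")


def split_by_test(stdout_text):
    """Group stdout lines by the test whose markers enclose them."""
    sections = {}
    current = None
    for line in stdout_text.splitlines():
        start = START.search(line)
        if start:
            current = start.group(1)
            sections.setdefault(current, [])
            continue
        end = END.search(line)
        if end and end.group(1) == current:
            current = None
            continue
        if current is not None:
            sections[current].append(line)
    return sections
```

Lines that appear outside any start/concluded pair are dropped, so banner text printed between tests does not end up attributed to the wrong test.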