19 Commits

Author SHA1 Message Date
mberenjk abf0605823 Fixing the AR_Bias issue for FP8 (#155)
Authored-by: Marzieh Berenjkoub <146776561+mberenjk@users.noreply.github.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>

[ROCm/rccl-tests commit: 33cc4df1e4]
2025-10-18 14:46:31 -05:00
Wenkai Du 75a69211a0 Add all_reduce_bias_perf to support All Reduce with Bias (#130)
Use dynamic symbol loading of ncclAllReduceWithBias

Co-authored-by: mberenjk <146776561+mberenjk@users.noreply.github.com>

[ROCm/rccl-tests commit: db6ea5a594]
2025-10-13 16:09:10 -05:00
arvindcheru b07376b9ae Dependency removal with hipify_perl symlink (#150)
[ROCm/rccl-tests commit: e1b8a3aefc]
2025-09-15 13:16:09 -05:00
BertanDogancay 0010193b64 Merge remote-tracking branch 'nccl-tests/master' into develop
[ROCm/rccl-tests commit: 50a26637fb]
2025-07-23 14:23:22 -05:00
mberenjk db5ab33461 Switching to old version of rccl_float8 for ROCm versions earlier than 6.3 for backward compatibility. (#128)
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl-tests commit: 9076091602]
2025-05-16 09:14:46 -05:00
Rahul Vaidya fa5259894c Ensure backward compatibility for fp8 datatypes (#126)
* Ensure backward compatibility for fp8 datatypes

Signed-off-by: ravaidya <ravaidya@amd.com>

* Update code comments

Signed-off-by: ravaidya <ravaidya@amd.com>

---------

Signed-off-by: ravaidya <ravaidya@amd.com>

[ROCm/rccl-tests commit: 0abe3c80bb]
2025-05-15 13:56:40 -05:00
mberenjk ed6ebb12a7 Switched to using the hip_fp8 header instead of rccl_float8, resolving compatibility issues. (#109)
* addressing hip_fp8 support compatibility issue

* skipping mulsum and avg test for fp8, using hip_fp8 for product

* syncing with nccl-tests

removing the fp8 filter for pre-hopper gpus and resolving the merge conflict

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl-tests commit: 4b2b635766]
2025-05-14 15:30:07 -05:00
Rahul Vaidya 10c31fb05f Fix build issues caused by 2.24.3 sync (#118)
[ROCm/rccl-tests commit: a4fd8f4667]
2025-04-28 10:22:38 -05:00
David Addison b8dcb4dd83 Make verifiable a DSO and add NAME_SUFFIX support
Build option DSO=1 generates libverifiable.so which can be
used to reduce the combined binary size.

Build option NAME_SUFFIX can be used to add a suffix to all
generated binaries, e.g. NAME_SUFFIX=_mpi

Added new make target: clean_intermediates


[ROCm/rccl-tests commit: 1021260ca9]
2025-04-23 17:07:24 -07:00
nileshnegi 8d887aad0d Merge remote-tracking branch 'nccl-tests/master' into develop
[ROCm/rccl-tests commit: 5625599dda]
2025-04-21 19:46:10 -05:00
David Addison 8d71063e05 Add support for FP8 datatypes
Added new datatypes: f8e4m3, f8e5m2

Only supported on H100+ architectures and NCCL versions >= 2.24.0


[ROCm/rccl-tests commit: 501a149d57]
2025-04-18 19:20:59 -07:00
mberenjk ca4ba933a3 adding git version to rccl-tests (#69)
Co-authored-by: mberenjk <mberenjk@amd.com>

[ROCm/rccl-tests commit: 3f7f7859bf]
2024-03-28 14:03:59 -05:00
Andy li aaf1e27af2 update the fp8 header file name (#65)
* update the fp8 header name

[ROCm/rccl-tests commit: e447c17382]
2024-03-08 10:02:40 -08:00
Andy li c128f0422d Enable fp8 support (#63)
* initial checkin

* rename the fp8 datatype name

* update based on cr comments

* resolve the build issue

* resolve fp8 compatibility issue

* fix minor bug and catch up to reflect latest develop branch change

* add fp8 + operator support

* update fp8 header file

* resolve merge issue from develop branch

[ROCm/rccl-tests commit: 21e59fb283]
2024-03-07 16:54:41 -08:00
Bertan Dogancay efbfad7fe5 Revert __nv_bfloat16 back to hip_bfloat16 (#64)
[ROCm/rccl-tests commit: 7a7a5969d0]
2024-03-06 11:11:44 -07:00
Bertan Dogancay 882a96f5cb Add hipify steps prior to build (#62)
* Add hipify steps prior to build

[ROCm/rccl-tests commit: 88cf7dbf45]
2024-03-05 09:47:18 -07:00
Edgar Gabriel 84ee112d50 fix algorithm assigning values in testsuite
avoid a division by zero which seems to only occur for op=prod and
datatype=half, since the maximum exponent is small (15) and can exceed
the number of ranks.


[ROCm/rccl-tests commit: e9f5be184c]
2022-11-30 23:01:46 +00:00
Edgar Gabriel fab39367ae make rccl-test compile again.
all files compile now.
mpi tests also pass


[ROCm/rccl-tests commit: 641e93e99c]
2022-10-21 22:07:33 +00:00
John Bachan b5d746b58e Resync with NCCL 2.13
* Added "verifiable", a suite of kernels for generating and verifying reduction
  input and output arrays in a bit-precise way.
* Data corruption errors now reported in number of wrong elements instead of max
  deviation.
* Use ncclGetLastError.
* Don't run hypercube on non-powers of 2 ranks.
* Fix to hypercube data verification.
* Use "thread local" as the default CUDA capture mode.
* Replaced pthread_yield -> sched_yield()
* Bugfix to the cpu-side barrier/allreduce implementations.


[ROCm/rccl-tests commit: 51af5572bf]
2022-08-22 17:51:06 -07:00