Commit graph

19 commits

Author SHA1 Message Date
mberenjk 33cc4df1e4 Fixing the AR_Bias issue for FP8 (#155)
Authored-by: Marzieh Berenjkoub <146776561+mberenjk@users.noreply.github.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
2025-10-18 14:46:31 -05:00
Wenkai Du db6ea5a594 Add all_reduce_bias_perf to support All Reduce with Bias (#130)
Use dynamic symbol loading of ncclAllReduceWithBias

Co-authored-by: mberenjk <146776561+mberenjk@users.noreply.github.com>
2025-10-13 16:09:10 -05:00
arvindcheru e1b8a3aefc Dependency removal with hipify_perl symlink (#150) 2025-09-15 13:16:09 -05:00
BertanDogancay 50a26637fb Merge remote-tracking branch 'nccl-tests/master' into develop 2025-07-23 14:23:22 -05:00
mberenjk 9076091602 Switching to old version of rccl_float8 for ROCm versions earlier than 6.3 for backward compatibility. (#128)
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-05-16 09:14:46 -05:00
Rahul Vaidya 0abe3c80bb Ensure backward compatibility for fp8 datatypes (#126)
* Ensure backward compatibility for fp8 datatypes

Signed-off-by: ravaidya <ravaidya@amd.com>

* Update code comments

Signed-off-by: ravaidya <ravaidya@amd.com>

---------

Signed-off-by: ravaidya <ravaidya@amd.com>
2025-05-15 13:56:40 -05:00
mberenjk 4b2b635766 Switched to using the hip_fp8 header instead of rccl_float8, resolving compatibility issues. (#109)
* addressing hip_fp8 support compatibility issue

* skipping mulsum and avg test for fp8, using hip_fp8 for product

* syncing with nccl-tests

removing the fp8 filter for pre-hopper gpus and resolving the merge conflict

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-05-14 15:30:07 -05:00
Rahul Vaidya a4fd8f4667 Fix build issues caused by 2.24.3 sync (#118) 2025-04-28 10:22:38 -05:00
David Addison 1021260ca9 Make verifiable a DSO and add NAME_SUFFIX support
Build option DSO=1 generates libverifiable.so which can be
used to reduce the combined binary size.

Build option NAME_SUFFIX can be used to add a suffix to all
generated binaries. e.g. NAME_SUFFIX=_mpi

Added new make target: clean_intermediates
2025-04-23 17:07:24 -07:00
nileshnegi 5625599dda Merge remote-tracking branch 'nccl-tests/master' into develop 2025-04-21 19:46:10 -05:00
David Addison 501a149d57 Add support for FP8 datatypes
Added new datatypes: f8e4m3, f8e5m2

Only supported on H100+ architectures and NCCL versions >= 2.24.0
2025-04-18 19:20:59 -07:00
mberenjk 3f7f7859bf adding git version to rccl-tests (#69)
Co-authored-by: mberenjk <mberenjk@amd.com>
2024-03-28 14:03:59 -05:00
Andy li e447c17382 update the fp8 header file name (#65)
* update the fp8 header name
2024-03-08 10:02:40 -08:00
Andy li 21e59fb283 Enable fp8 support (#63)
* initial checkin

* rename the fp8 datatype name

* update based on cr comments

* resolve the build issue

* resolve fp8 compatibility issue

* fix minor bug and catch up to reflect latest develop branch change

* add fp8 + operator support

* update fp8 header file

* resolve merge issue from develop branch
2024-03-07 16:54:41 -08:00
Bertan Dogancay 7a7a5969d0 Revert __nv_bfloat16 back to hip_bfloat16 (#64) 2024-03-06 11:11:44 -07:00
Bertan Dogancay 88cf7dbf45 Add hipify steps prior to build (#62)
* Add hipify steps prior to build
2024-03-05 09:47:18 -07:00
Edgar Gabriel e9f5be184c fix algorithm assigning values in testsuite
avoid a division by zero which seems to only occur for op=prod and
datatype=half, since the maximum exponent is small (15) and can exceed
the number of ranks.
2022-11-30 23:01:46 +00:00
Edgar Gabriel 641e93e99c make rccl-test compile again.
all files compile now.
mpi tests also pass
2022-10-21 22:07:33 +00:00
John Bachan 51af5572bf Resync with NCCL 2.13
* Added "verifiable", a suite of kernels for generating and verifying reduction
  input and output arrays in a bit-precise way.
* Data corruption errors now reported in number of wrong elements instead of max
  deviation.
* Use ncclGetLastError.
* Don't run hypercube on non-powers of 2 ranks.
* Fix to hypercube data verification.
* Use "thread local" as the default CUDA capture mode.
* Replaced pthread_yield -> sched_yield()
* Bugfix to the cpu-side barrier/allreduce implementations.
2022-08-22 17:51:06 -07:00