rocm-systems

Author	SHA1	Message	Date
mberenjk	33cc4df1e4	Fixing the AR_Bias issue for FP8 (#155) Authored-by: Marzieh Berenjkoub <146776561+mberenjk@users.noreply.github.com> Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com> Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>	2025-10-18 14:46:31 -05:00
Wenkai Du	db6ea5a594	Add all_reduce_bias_perf to support All Reduce with Bias (#130) Use dynamic symbol loading of ncclAllReduceWithBias Co-authored-by: mberenjk <146776561+mberenjk@users.noreply.github.com>	2025-10-13 16:09:10 -05:00
arvindcheru	e1b8a3aefc	Dependency removal with hipify_perl symlink (#150)	2025-09-15 13:16:09 -05:00
BertanDogancay	50a26637fb	Merge remote-tracking branch 'nccl-tests/master' into develop	2025-07-23 14:23:22 -05:00
mberenjk	9076091602	Switching to old version of rccl_float8 for ROCm versions earlier than 6.3 for backward compatibility. (#128) Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>	2025-05-16 09:14:46 -05:00
Rahul Vaidya	0abe3c80bb	Ensure backward compatibility for fp8 datatypes (#126) * Ensure backward compatibility for fp8 datatypes Signed-off-by: ravaidya <ravaidya@amd.com> * Update code comments Signed-off-by: ravaidya <ravaidya@amd.com> --------- Signed-off-by: ravaidya <ravaidya@amd.com>	2025-05-15 13:56:40 -05:00
mberenjk	4b2b635766	Switched to using the hip_fp8 header instead of rccl_float8, resolving compatibility issues. (#109) * addressing hip_fp8 support compatibility issue * skipping mulsum and avg test for fp8, using hip_fp8 for product * syncing with nccl-tests removing the fp8 filter for pre-hopper gpus and resolving the merge conflict --------- Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>	2025-05-14 15:30:07 -05:00
Rahul Vaidya	a4fd8f4667	Fix build issues caused by 2.24.3 sync (#118)	2025-04-28 10:22:38 -05:00
David Addison	1021260ca9	Make verifiable a DSO and add NAME_SUFFIX support. Build option DSO=1 generates libverifiable.so, which can be used to reduce the combined binary size. Build option NAME_SUFFIX can be used to add a suffix to all generated binaries, e.g. NAME_SUFFIX=_mpi. Added new make target: clean_intermediates	2025-04-23 17:07:24 -07:00
nileshnegi	5625599dda	Merge remote-tracking branch 'nccl-tests/master' into develop	2025-04-21 19:46:10 -05:00
David Addison	501a149d57	Add support for FP8 datatypes Added new datatypes: f8e4m3, f8e5m2 Only supported on H100+ architectures and NCCL versions >= 2.24.0	2025-04-18 19:20:59 -07:00
mberenjk	3f7f7859bf	adding git version to rccl-tests (#69) Co-authored-by: mberenjk <mberenjk@amd.com>	2024-03-28 14:03:59 -05:00
Andy li	e447c17382	update the fp8 header file name (#65) * update the fp8 header name	2024-03-08 10:02:40 -08:00
Andy li	21e59fb283	Enable fp8 support (#63) * initial checkin * rename the fp8 datatype name * update based on cr comments * resolve the build issue * resolve fp8 compatibility issue * fix minor bug and catch up to reflect latest develop branch change * add fp8 + operator support * update fp8 header file * resolve merge issue from develop branch	2024-03-07 16:54:41 -08:00
Bertan Dogancay	7a7a5969d0	Revert __nv_bfloat16 back to hip_bfloat16 (#64)	2024-03-06 11:11:44 -07:00
Bertan Dogancay	88cf7dbf45	Add hipify steps prior to build (#62) * Add hipify steps prior to build	2024-03-05 09:47:18 -07:00
Edgar Gabriel	e9f5be184c	fix algorithm assigning values in testsuite. Avoid a division by zero, which seems to occur only for op=prod and datatype=half, since the maximum exponent is small (15) and can exceed the number of ranks.	2022-11-30 23:01:46 +00:00
Edgar Gabriel	641e93e99c	make rccl-test compile again. all files compile now. mpi tests also pass	2022-10-21 22:07:33 +00:00
John Bachan	51af5572bf	Resync with NCCL 2.13 * Added "verifiable", a suite of kernels for generating and verifying reduction input and output arrays in a bit-precise way. * Data corruption errors now reported in number of wrong elements instead of max deviation. * Use ncclGetLastError. * Don't run hypercube on non-powers of 2 ranks. * Fix to hypercube data verification. * Use "thread local" as the default CUDA capture mode. * Replaced pthread_yield -> sched_yield() * Bugfix to the cpu-side barrier/allreduce implementations.	2022-08-22 17:51:06 -07:00

19 commits