nileshnegi
8d887aad0d
Merge remote-tracking branch 'nccl-tests/master' into develop
...
[ROCm/rccl-tests commit: 5625599dda]
2025-04-21 19:46:10 -05:00
mberenjk
efa2d204b2
removing FP8 product from allReduce test cases (#97)
...
* removing FP8 product from allReduce test cases
---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
[ROCm/rccl-tests commit: 77ae744c18]
2025-01-06 14:05:38 -06:00
John Bachan
69b9a05e71
Fixes to all tests that divide buffers by nranks so that they trim buffer sizes to be multiples of 16 bytes.
...
This ensures non-pow2 ranks have buffer addresses aligned suitably for performance.
[ROCm/rccl-tests commit: 29f4114f02]
2024-12-18 11:20:28 -08:00
Bertan Dogancay
882a96f5cb
Add hipify steps prior to build (#62)
...
* Add hipify steps prior to build
[ROCm/rccl-tests commit: 88cf7dbf45]
2024-03-05 09:47:18 -07:00
Wenkai Du
b49f6da1ec
Merge remote-tracking branch 'nccl-tests/master' into HEAD
...
[ROCm/rccl-tests commit: 621dde544d]
2024-03-01 18:34:44 +00:00
Edgar Gabriel
08f9435e5a
Merge remote-tracking branch 'nccl-tests/master' into topic/v2.13.4-sync
...
[ROCm/rccl-tests commit: 3ae371cce7]
2022-10-14 16:02:54 -05:00
Sylvain Jeaugey
fdaa88710b
Update NCCL tests
...
[ROCm/rccl-tests commit: d313d20a26]
2022-09-23 01:13:29 -07:00
John Bachan
b5d746b58e
Resync with NCCL 2.13
...
* Added "verifiable", a suite of kernels for generating and verifying reduction
input and output arrays in a bit-precise way.
* Data corruption errors now reported in number of wrong elements instead of max
deviation.
* Use ncclGetLastError.
* Don't run hypercube on non-powers of 2 ranks.
* Fix to hypercube data verification.
* Use "thread local" as the default CUDA capture mode.
* Replaced pthread_yield() with sched_yield().
* Bugfix to the cpu-side barrier/allreduce implementations.
[ROCm/rccl-tests commit: 51af5572bf]
2022-08-22 17:51:06 -07:00
Edgar
dad6d819d0
implementation of multi-rank support in rccl-tests.
...
[ROCm/rccl-tests commit: 0500f2f132]
2022-06-10 14:54:10 -04:00
Wenkai Du
06f4ccd9d2
Merge remote-tracking branch 'nccl/master' into develop
...
[ROCm/rccl-tests commit: 9f8ddadcdf]
2021-07-13 08:11:44 -07:00
David Addison
20b63cf465
Fixed formatting for bfloat16 support
...
[ROCm/rccl-tests commit: 526eacadf7]
2021-06-28 10:12:34 -07:00
David Addison
a41268e26e
Add support for ncclAvg operation
...
[ROCm/rccl-tests commit: cde7e769c1]
2021-06-28 09:41:58 -07:00
Wenkai Du
8ff34620fb
workaround weak symbol issue
...
hcc prints "error: alias must point to a defined variable or function"
[ROCm/rccl-tests commit: 4474fe168d]
2019-04-18 10:34:55 -07:00
Stanley Tsang
aac7cfb64f
Adding AMD copyright notices
...
[ROCm/rccl-tests commit: 71e663e62d]
2019-04-10 15:28:40 -07:00
Wenkai Du
3c8cfb2d6e
hipify nccl-tests to become rccl-tests
...
[ROCm/rccl-tests commit: a15f771cb2]
2019-04-10 13:43:58 -07:00
David Addison
18902f40a7
Resync all tests with test code from NCCL 2.4
...
Major rework to merge most of the changes from the NCCL internal
tests into the public ones
Added "-m <agg_iters>" operation aggregation option.
Data integrity checking is now much more performant at scale.
Startup times at scale are improved.
Test latency units are now displayed in usec.
[ROCm/rccl-tests commit: cbe7f65400]
2019-04-05 13:42:15 -07:00
Sylvain Jeaugey
4cb47ccb21
Initial commit
...
[ROCm/rccl-tests commit: b188a15299]
2017-08-08 16:18:34 -07:00