8 Commits

Autor SHA1 Mensaje Fecha
BertanDogancay d045d0ca23 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: a6bf9bfc9e]
2025-04-23 20:47:43 -07:00
isaki001 14be1c9a7a fix the size of the recv buffer in AllGather UBR test (#1564)
[ROCm/rccl commit: 59c55842f1]
2025-03-05 11:42:15 -06:00
isaki001 a40d4eb960 non-hipGraph MSCCL++ tests for allReduce and allGather (#1503)
* working tests for a single message size

* move call_RCCL routine StandaloneUtils, create .cpp file for StandaloneUtils so that it can be included in several tests

* simplify test invocation

* remove unecessary logs and exit from ncclCommRegister

* set expected results for allGather

* skip test if nranks doesn't match number of gpus, call getAndDistributeNCCLid only from parent process

* fix improper size of expected-results vector

* Removing unused changes.

* Refactored to create a new file for the forked collectives call, as StandaloneUtils is for the Standalone tests. Renamed the functions to be slightly more accurate and follow existing naming conventions.

* Apply suggestions from code review

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

---------

Co-authored-by: isaki001 <isakioti@banff-pla-r27-38.pla.dcgpu>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
Co-authored-by: Corey Derochie <corey.derochie@amd.com>

[ROCm/rccl commit: 3398fa78fe]
2025-02-04 09:11:32 -06:00
Andy li e373bd44bf Enable fp8 support (#1101)
* initial checkin

* resolve cr comments

* resolve the build issue

* fix the data correctless issue

* update fp8 header file and update the unit test for fp8 support

* remove fp16 from fp8 headers

* fix ut issue and catch up the latest code from develop

* udate according to cr comments

* update ut according to cr comments

* update num floats for each SumPostDiv from 4 to 6

* update fp8 header file name

* fix the typo

[ROCm/rccl commit: 6777e65c1d]
2024-03-08 15:17:53 -08:00
Tim 826d20495f Adding FP16 cases to unit tests(#1093)
Signed-off-by: Tim Hu <timhu102@amd.com>

[ROCm/rccl commit: 0d06b0f1de]
2024-02-26 12:08:04 -05:00
akolliasAMD 59e62c807b Re-enabled graph tests (#736)
* enabled graph tests
* joined multi and single process CI testing

[ROCm/rccl commit: cf8cfa88a8]
2023-06-29 08:08:17 -06:00
Pedram Alizadeh aeffdf872b Disabled hipgraph tests! (#725)
[ROCm/rccl commit: 53c1c38f0e]
2023-04-13 17:42:05 -04:00
gilbertlee-amd ff2c1c5d0f Unit test performance refactor (#700)
* Refactoring unit tests to improve performance
* Spawning child processes during InitComms instead of on TestBed construction
* Temporarily disabling graph unit tests

[ROCm/rccl commit: 27e0cb43c2]
2023-04-06 12:28:53 -06:00