15 Commits

Author SHA1 Message Date
Atul Kulkarni e4aef19511 Added new unit tests for AllReduce with Bias API (#2036)
* Added new unit tests for AllReduce with Bias API

* Address review comments

[ROCm/rccl commit: 7c12b0b76b]
2025-12-03 17:37:34 -06:00
BertanDogancay d045d0ca23 Merge remote-tracking branch 'nccl/master' into develop
[ROCm/rccl commit: a6bf9bfc9e]
2025-04-23 20:47:43 -07:00
gilbertlee-amd 4f67522420 Removing the experimental clique kernel files (#1610)
[ROCm/rccl commit: 626dc50ab5]
2025-03-20 18:10:01 -06:00
isaki001 a40d4eb960 non-hipGraph MSCCL++ tests for allReduce and allGather (#1503)
* working tests for a single message size

* move call_RCCL routine to StandaloneUtils, create .cpp file for StandaloneUtils so that it can be included in several tests

* simplify test invocation

* remove unnecessary logs and exit from ncclCommRegister

* set expected results for allGather

* skip test if nranks doesn't match number of gpus, call getAndDistributeNCCLid only from parent process

* fix improper size of expected-results vector

* Removing unused changes.

* Refactored to create a new file for the forked collectives call, as StandaloneUtils is for the Standalone tests. Renamed the functions to be slightly more accurate and follow existing naming conventions.

* Apply suggestions from code review

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

---------

Co-authored-by: isaki001 <isakioti@banff-pla-r27-38.pla.dcgpu>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
Co-authored-by: Corey Derochie <corey.derochie@amd.com>

[ROCm/rccl commit: 3398fa78fe]
2025-02-04 09:11:32 -06:00
mberenjk 300f954185 Initializing all ranks to the same value to avoid failure of UT AllR… (#1459)
* Initializing all ranks to the same value to avoid failure of UT AllReduce for FP8 type

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>

[ROCm/rccl commit: 39483c55f8]
2025-01-02 11:39:02 -06:00
saurabhAMD 69d976532b GPU allocation for CPX Unit Tests using PCI bus id (#1403)
* mapping devices wrt pci

* Gpu allocation by using pci mapping

* Passing gpuPriorityOrder in as an argument rather than making the functions non-static.

* Removing redundant testBed instance calling

[ROCm/rccl commit: 69b2b712ab]
2024-11-04 10:51:00 -06:00
corey-derochie-amd ad1384bea1 Hide or fix all build warnings (#1331)
* Changing C-strings to be const.

* Changed variable-length arrays to std::vector to avoid warnings. VLA is a compiler extension.

* Changed `#define` inside functions into `constexpr int` to preserve scoping and avoid macro redefinition warnings.

* Disabled warnings for modifying `CMAKE_CXX_FLAGS` caused by `check_symbol_exists`, which temporarily modifies the flag to do a compile check.

* Fixed VLA in rccl UT.

[ROCm/rccl commit: 1c45962273]
2024-11-04 09:46:42 -07:00
saurabhAMD de7ea612d7 Unit Tests for testing channels (#1222)
[ROCm/rccl commit: e170f41ddd]
2024-06-25 10:10:10 -05:00
saurabhAMD 44064a612c enable UT to test with channels greater than 64
[ROCm/rccl commit: 392a73fdef]
2024-06-13 13:54:08 -05:00
Andy li e373bd44bf Enable fp8 support (#1101)
* initial checkin

* resolve cr comments

* resolve the build issue

* fix the data correctness issue

* update fp8 header file and update the unit test for fp8 support

* remove fp16 from fp8 headers

* fix ut issue and catch up the latest code from develop

* update according to cr comments

* update ut according to cr comments

* update num floats for each SumPostDiv from 4 to 6

* update fp8 header file name

* fix the typo

[ROCm/rccl commit: 6777e65c1d]
2024-03-08 15:17:53 -08:00
akolliasAMD 0c1f773021 rearranged how the min and max functions are part of msccl (#1025)
* rearranged how the min and max functions are part of msccl

* added more coverage on in place graph tests

[ROCm/rccl commit: f4858e14b2]
2023-12-21 08:58:33 -07:00
akolliasAMD bc7df769a2 AllReduceTests,fixed the number of roots (#925)
[ROCm/rccl commit: d8dc282eeb]
2023-10-20 10:25:11 -06:00
akolliasAMD 59e62c807b Re-enabled graph tests (#736)
* enabled graph tests
* joined multi and single process CI testing

[ROCm/rccl commit: cf8cfa88a8]
2023-06-29 08:08:17 -06:00
Pedram Alizadeh aeffdf872b Disabled hipgraph tests! (#725)
[ROCm/rccl commit: 53c1c38f0e]
2023-04-13 17:42:05 -04:00
gilbertlee-amd ff2c1c5d0f Unit test performance refactor (#700)
* Refactoring unit tests to improve performance
* Spawning child processes during InitComms instead of on TestBed construction
* Temporarily disabling graph unit tests

[ROCm/rccl commit: 27e0cb43c2]
2023-04-06 12:28:53 -06:00