Commit Graph

74 Commits

Author SHA1 Message Date
mberenjk 697bee4ee8 Improving build time by removing the gfx11xx and host code from rccl_float8.h (#1789)
* removing extra build time by removing the gfx11xx arch from using hip_fp8

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-07-09 14:03:47 -05:00
Rakesh Roy dd3b1d816c Fix chrono build error (#1790) 2025-07-04 08:27:30 -05:00
BertanDogancay aaf023976a Merge remote-tracking branch 'nccl/master' into develop 2025-06-20 07:54:49 -05:00
Arm Patinyasakdikul 6c37ae9470 Added missing copyright message. (#1742)
* Added missing copyright message.

* addressed comments.
2025-06-12 09:58:01 -05:00
Atul Kulkarni 682ed36fe6 Added new ENABLE_CODE_COVERAGE option. (#1664)
Modified install.sh script to add this new option
2025-06-10 12:12:36 -05:00
Arm Patinyasakdikul c07445d5b4 Test: bump max stacksize once again to match current expectation. 2025-05-23 11:18:25 -05:00
Arm Patinyasakdikul 523e0893e4 Test: Change max stack size to 520 to accomodate new ROCm changes. 2025-05-21 20:21:27 -05:00
corey-derochie-amd 170acf3bda Switched to using the hip_fp8 header instead of rccl_float8, resolving compatibility issues. (#1546)
* Revert "Revert "replacing rccl_float8 with hip_fp8 and address compatibility …"

This reverts commit 824b81c034.

* [UT] Modify max stack size to 496

* adding a check for OCP type and replacing ROCM_VERSION with HIP_VERSION

* addressing the ci failure

* Adding the device tag

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-05-14 15:33:03 -05:00
BertanDogancay a6bf9bfc9e Merge remote-tracking branch 'nccl/master' into develop 2025-04-23 20:47:43 -07:00
gilbertlee-amd ee85a70bb4 Adding UT_DEBUG_PAUSE to unit tests (#1653) 2025-04-21 21:15:07 -06:00
Tim 9a55ff60a9 RCCL Replayer update (#1603)
RCCL recorder w/ suggested change and UT
2025-04-19 00:21:27 -04:00
AbandiGa 7a84c5dbb0 added copyright (#1635) 2025-04-14 09:46:18 -05:00
BertanDogancay 0b2062c560 Merge remote-tracking branch 'nccl/master' into develop 2025-03-27 12:53:04 -05:00
Nilesh M Negi d6b987a53f [UT] Increase stack size for StandaloneTests to 480 (#1616)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2025-03-21 21:33:32 -05:00
mberenjk 5f691aaf65 Skipping AllReduce test on more than 8 ranks for FP8 type on Hyabusa (#1598)
* Skipping AllReduce FP8 test on 9 to 16 ranks (gfx90a) as it's using Tree algorithm not RING

---------

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-03-17 10:22:49 -05:00
Wenkai Du 4237caad69 Limit P2P channels per peer to not exceeding max channels (#1594)
* Limit P2P channels per peer to not exceeding max channels

* [UT] test single GPU cases for all collectives

* [UT] fix out of range root value
2025-03-11 09:32:09 -07:00
Nilesh M Negi 4e406acc43 [UT] Include iomanip if not defined (#1510)
* [UT] Include iomanip if not defined

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* Remove include guards

`iomanip.h` has pre-defined include guards. These are not needed.

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
2025-02-11 08:48:47 -07:00
isaki001 3398fa78fe non-hipGraph MSCCL++ tests for allReduce and allGather (#1503)
* working tests for a single message size

* move call_RCCL routine StandaloneUtils, create .cpp file for StandaloneUtils so that it can be included in several tests

* simplify test invocation

* remove unecessary logs and exit from ncclCommRegister

* set expected results for allGather

* skip test if nranks doesn't match number of gpus, call getAndDistributeNCCLid only from parent process

* fix improper size of expected-results vector

* Removing unused changes.

* Refactored to create a new file for the forked collectives call, as StandaloneUtils is for the Standalone tests. Renamed the functions to be slightly more accurate and follow existing naming conventions.

* Apply suggestions from code review

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

---------

Co-authored-by: isaki001 <isakioti@banff-pla-r27-38.pla.dcgpu>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
Co-authored-by: Corey Derochie <corey.derochie@amd.com>
2025-02-04 09:11:32 -06:00
corey-derochie-amd c68b558ed5 Increased gfx90a stack size expectation to 320 to match latest compiler. (#1487) 2025-01-16 17:04:51 -07:00
mberenjk 39483c55f8 Initializing all ranks to the same value to avoid failure of UT AllR… (#1459)
* Initializing all ranks to the same value to avoid failure of  UT AllReduce for FP8 type

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
2025-01-02 11:39:02 -06:00
saurabhAMD 69b2b712ab GPU allocation for CPX Unit Tests using PCI bus id (#1403)
* mapping devices wrt pci

* Gpu allocation by using pci mapping

* Passing gpuPriorityOrder in as an argument rather than making the functions non-static.

* Removing redundant testBed instance calling
2024-11-04 10:51:00 -06:00
Bertan Dogancay 984f1e4343 Increase MAX_STACK_SIZE for UT (#1398) 2024-11-01 13:07:45 -04:00
Tim fd9924cfe7 Adjustment for UT Sendrecv (#1400)
Enabled UT sendrecv to same rank and refactor UBR call
2024-10-30 15:13:53 -04:00
saurabhAMD 4856309413 Making variable names consistent in EnvVars.cpp (#1327)
* Making variable names consistent in EnvVars.cpp
2024-09-11 09:23:31 -05:00
saurabhAMD 289a80c4e9 Enabling Unit Tests for CPX mode (#1324)
* Unit Tests for RCCL in CPX mode

* override pow2gpus set by cpx mode by user argument

* Adding comment for UT_POW2_GPUS

* Additional comment on why using pow2gpus for cpx mode.
2024-09-09 10:12:33 -05:00
Tim 757d1891e9 Update EnvVars.cpp 2024-09-04 16:55:36 -04:00
mberenjk db840f024e adding all nccl apis to api_support to enable rccl tracing by rocprofv3 (#1297)
* adding all nccl apis to api_support to enable rccl tracing by rocprofv3

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-08-22 12:36:07 -05:00
Tim a4793286c7 Adding User Buffer Registration support for Unit test (#1199)
* Adding UBR support for UT SendRecv

Signed-off-by: Tim Hu <timhu102@amd.com>

* Update test/common/TestBedChild.cpp

Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>

---------

Signed-off-by: Tim Hu <timhu102@amd.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
2024-07-30 13:39:25 -04:00
akolliasAMD c246e25f8e gfx12 Disable ll protocol (#1268) 2024-07-26 08:59:55 -06:00
Wenkai Du 89349f2ce4 Template unroll for RCCL kernels (#1250)
* Template unroll for RCCL kernels

* Adding unroll template arg during CMake hipification

* Reduce linking parallel jobs to avoid OOM in CI

* Workaround issues with UT tests

SWDEV-469533: register spill fix is needed for mainline build
LWPCOMMLIBS-369: cannot enable 112 channels with 80 CUs
Use -parallel-jobs=8 for linking

* CI: do not use -j 16 when building

* CI: use -j 8 when building

* Only reduce parallel linking job for CI extended

* Restore original jenkins command. Change parallel linking jobs in cmake

* Disable MSCCLPP

---------

Co-authored-by: gilbertlee-amd <gilbert.lee@amd.com>
2024-07-19 08:15:59 -07:00
corey-derochie-amd 0c36d571ea Enable multi-threading for MSCCL (#1203)
MSCCL can now run in a multi-threaded configuration. To test in the unit tests, added the ENABLE_OPENMP compile definition flag and the --openmp-test-enable flag to the unit test build script. To activate, set the environment variables UT_MULTITHREADED=1 and UT_PROCESS_MASK=1. Set Jenkins to use this mode.
2024-07-04 09:34:38 -06:00
saurabhAMD e170f41ddd Unit Tests for testing channels (#1222) 2024-06-25 10:10:10 -05:00
saurabhAMD 392a73fdef enable UT to test with channels greater than 64 2024-06-13 13:54:08 -05:00
Bertan Dogancay 0ec41f1386 [UT] Start supporting multiple group calls and graphs (#1151)
* Start supporting multiple group calls UT
2024-04-25 11:11:16 -06:00
mberenjk 428837ffe4 replacing rccl_bfloat16 with hip_bfloat16 (#1126)
Co-authored-by: mberenjk <mberenjk@amd.com>
2024-04-11 11:30:37 -05:00
Andy li 6777e65c1d Enable fp8 support (#1101)
* initial checkin

* resolve cr comments

* resolve the build issue

* fix the data correctless issue

* update fp8 header file and update the unit test for fp8 support

* remove fp16 from fp8 headers

* fix ut issue and catch up the latest code from develop

* udate according to cr comments

* update ut according to cr comments

* update num floats for each SumPostDiv from 4 to 6

* update fp8 header file name

* fix the typo
2024-03-08 15:17:53 -08:00
Tim 0d06b0f1de Adding FP16 cases to unit tests(#1093)
Signed-off-by: Tim Hu <timhu102@amd.com>
2024-02-26 12:08:04 -05:00
BertanDogancay b098120c40 Increase max stack size when ll128 enabled 2024-02-15 15:56:59 -08:00
Bertan Dogancay dc2d486ba0 Add stack size UT (#1081)
* Add stack size UT
2024-02-12 17:56:15 -07:00
Shilei Tian ba9f7917ba Add a constructor for PtrUnion in case it is not initialized explicitly (#1064) 2024-01-26 08:00:27 -08:00
Bertan Dogancay 28d9b170c9 [DEV] Configure functions in RCCL (#986)
* configure functions in rccl
2024-01-18 15:07:16 -07:00
Tim 05850e89f2 Relaxing default timeout limit, add error log (#1052)
Signed-off-by: Tim Hu <timhu102@amd.com>
2024-01-18 15:09:08 -05:00
Tim 9c0ef11ac7 Adding timeout functionality/EnvVar to TestBed (#1044)
* Adding timeout functionality/EnvVar to TestBed
* updating timeout unit to microseconds

Signed-off-by: Tim Hu <timhu102@amd.com>
2024-01-17 11:33:01 -05:00
Bertan Dogancay 9d11cd092f Add ncclCommSplit test (#852)
Add ncclSplitCommTest
2023-08-25 16:26:45 -06:00
Wenkai Du ce6a2ffac8 Merge pull request #782 from ROCmSoftwarePlatform/2.18.3
Sync up with NCCL 2.18.3
2023-06-29 15:04:16 -07:00
gilbertlee-amd f7c553edad Report unit test environment variable values as part of output (#789) 2023-06-29 07:13:05 -06:00
Wenkai Du abd0615351 Merge remote-tracking branch 'nccl/master' into develop 2023-06-26 22:51:56 +00:00
akolliasAMD 2ce7d971e5 lessened the amount of child processes to active ones (#720) 2023-04-11 08:59:56 -06:00
gilbertlee-amd 27e0cb43c2 Unit test performance refactor (#700)
* Refactoring unit tests to improve performance
* Spawning child processes during InitComms instead of on TestBed construction
* Temporarily disabling graph unit tests
2023-04-06 12:28:53 -06:00
gilbertlee-amd 00c3d8d850 Adding interactive mode for unit tests (UT_INTERACTIVE) (#715) 2023-03-21 10:58:24 -06:00