vstojilj
2ac44cfe4e
SWDEV-536040 - Include <thread> header ( #1724 )
2025-06-06 10:28:11 -06:00
Arm Patinyasakdikul
c07445d5b4
Test: bump max stacksize once again to match current expectation.
2025-05-23 11:18:25 -05:00
Arm Patinyasakdikul
523e0893e4
Test: Change max stack size to 520 to accommodate new ROCm changes.
2025-05-21 20:21:27 -05:00
corey-derochie-amd
170acf3bda
Switched to using the hip_fp8 header instead of rccl_float8, resolving compatibility issues. ( #1546 )
* Revert "Revert "replacing rccl_float8 with hip_fp8 and address compatibility …"
This reverts commit 824b81c034 .
* [UT] Modify max stack size to 496
* adding a check for OCP type and replacing ROCM_VERSION with HIP_VERSION
* addressing the ci failure
* Adding the device tag
---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com >
2025-05-14 15:33:03 -05:00
mberenjk
e70003736e
Write JSON file to /tmp directory to avoid incorrect write access in recorderTest ( #1680 )
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com >
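A minimal sketch of the idea behind this change, assuming the test only needs a scratch location that is writable regardless of where the test binary is launched. The helper name `writeJsonToTmp` and the file name in the usage note are hypothetical and are not taken from recorderTest.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the RCCL sources): writes `json` to a file
// under /tmp and returns the full path, so the caller never depends on the
// current working directory being writable.
std::string writeJsonToTmp(const std::string& fileName, const std::string& json) {
  const std::string path = "/tmp/" + fileName;
  std::ofstream out(path);
  if (!out) {
    throw std::runtime_error("cannot open " + path + " for writing");
  }
  out << json;
  return path;  // `out` is flushed and closed when it goes out of scope
}
```

A test could call `writeJsonToTmp("example.json", "{}")`, read the file back, and delete it afterwards.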
2025-05-07 13:58:27 -05:00
Siu Chi Chan
9525c5b2ef
rccl-UnitTests - link to dl library ( #1673 )
2025-05-02 21:20:22 -05:00
deeksha-amd
2486838465
Added new tests for improving the code coverage ( #1656 )
Signed-off-by: Deeksha Goplani <deeksha.goplani@amd.com >
2025-04-30 18:01:11 -05:00
BertanDogancay
a6bf9bfc9e
Merge remote-tracking branch 'nccl/master' into develop
2025-04-23 20:47:43 -07:00
gilbertlee-amd
ee85a70bb4
Adding UT_DEBUG_PAUSE to unit tests ( #1653 )
2025-04-21 21:15:07 -06:00
Tim
9a55ff60a9
RCCL Replayer update ( #1603 )
RCCL recorder w/ suggested change and UT
2025-04-19 00:21:27 -04:00
AbandiGa
7a84c5dbb0
added copyright ( #1635 )
2025-04-14 09:46:18 -05:00
BertanDogancay
0b2062c560
Merge remote-tracking branch 'nccl/master' into develop
2025-03-27 12:53:04 -05:00
Nilesh M Negi
d6b987a53f
[UT] Increase stack size for StandaloneTests to 480 ( #1616 )
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
2025-03-21 21:33:32 -05:00
gilbertlee-amd
626dc50ab5
Removing the experimental clique kernel files ( #1610 )
2025-03-20 18:10:01 -06:00
gilbertlee-amd
9a4e49ff1a
Pseudo-randomly adding zero-byte sends in AllToAllv unit test ( #1597 )
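As a rough illustration of what pseudo-randomly inserting zero-byte sends could look like, the sketch below builds a per-peer send-count table and zeroes a random subset of entries. The function name, the seed handling, and the `zeroProbability` parameter are assumptions made for this example; the actual unit test may choose the entries differently.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not from the RCCL test suite): returns a
// numRanks x numRanks table of element counts, stored row-major, where each
// entry is either `baseCount` or 0. A fixed seed keeps runs reproducible.
std::vector<std::size_t> makeSendCounts(int numRanks, std::size_t baseCount,
                                        unsigned seed, double zeroProbability) {
  assert(numRanks > 0);
  assert(zeroProbability >= 0.0 && zeroProbability <= 1.0);

  std::mt19937 gen(seed);
  std::bernoulli_distribution makeZero(zeroProbability);

  const std::size_t n = static_cast<std::size_t>(numRanks);
  std::vector<std::size_t> counts(n * n);
  for (std::size_t& c : counts) {
    c = makeZero(gen) ? 0 : baseCount;  // zero-byte send to this peer
  }
  return counts;
}
```

Seeding from a fixed value means a failing combination of zero-byte sends can be replayed exactly.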
2025-03-20 17:00:48 -06:00
mberenjk
5f691aaf65
Skipping AllReduce test on more than 8 ranks for FP8 type on Hyabusa ( #1598 )
* Skipping AllReduce FP8 test on 9 to 16 ranks (gfx90a) as it's using Tree algorithm not RING
---------
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com >
2025-03-17 10:22:49 -05:00
Wenkai Du
4237caad69
Limit P2P channels per peer to not exceed max channels ( #1594 )
* Limit P2P channels per peer to not exceed max channels
* [UT] test single GPU cases for all collectives
* [UT] fix out of range root value
2025-03-11 09:32:09 -07:00
isaki001
59c55842f1
fix the size of the recv buffer in AllGather UBR test ( #1564 )
2025-03-05 11:42:15 -06:00
Nilesh M Negi
4e406acc43
[UT] Include iomanip if not defined ( #1510 )
* [UT] Include iomanip if not defined
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
* Remove include guards
`<iomanip>` has pre-defined include guards. These are not needed.
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com >
---------
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com >
2025-02-11 08:48:47 -07:00
isaki001
3398fa78fe
non-hipGraph MSCCL++ tests for allReduce and allGather ( #1503 )
* working tests for a single message size
* move call_RCCL routine StandaloneUtils, create .cpp file for StandaloneUtils so that it can be included in several tests
* simplify test invocation
* remove unnecessary logs and exit from ncclCommRegister
* set expected results for allGather
* skip test if nranks doesn't match number of gpus, call getAndDistributeNCCLid only from parent process
* fix improper size of expected-results vector
* Removing unused changes.
* Refactored to create a new file for the forked collectives call, as StandaloneUtils is for the Standalone tests. Renamed the functions to be slightly more accurate and follow existing naming conventions.
* Apply suggestions from code review
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com >
---------
Co-authored-by: isaki001 <isakioti@banff-pla-r27-38.pla.dcgpu >
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com >
Co-authored-by: Corey Derochie <corey.derochie@amd.com >
2025-02-04 09:11:32 -06:00
Bertan Dogancay
5afe900efd
Only look for librccl .co files in StackSize test ( #1499 )
Co-authored-by: BertanDogancay <bertan.dogancay>
2025-01-22 16:48:10 -07:00
corey-derochie-amd
c68b558ed5
Increased gfx90a stack size expectation to 320 to match latest compiler. ( #1487 )
2025-01-16 17:04:51 -07:00
mberenjk
39483c55f8
Initializing all ranks to the same value to avoid failure of UT AllR… ( #1459 )
* Initializing all ranks to the same value to avoid failure of UT AllReduce for FP8 type
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com >
2025-01-02 11:39:02 -06:00
saurabhAMD
69b2b712ab
GPU allocation for CPX Unit Tests using PCI bus id ( #1403 )
* mapping devices wrt pci
* Gpu allocation by using pci mapping
* Passing gpuPriorityOrder in as an argument rather than making the functions non-static.
* Removing redundant testBed instance calling
2024-11-04 10:51:00 -06:00
corey-derochie-amd
1c45962273
Hide or fix all build warnings ( #1331 )
* Changing C-strings to be const.
* Changed variable-length arrays to std::vector to avoid warnings. VLA is a compiler extension.
* Changed `#define` inside functions into `constexpr int` to preserve scoping and avoid macro redefinition warnings.
* Disabled warnings for modifying `CMAKE_CXX_FLAGS` caused by `check_symbol_exists`, which temporarily modifies the flag to do a compile check.
* Fixed VLA in rccl UT.
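Two of the fixes listed above (variable-length arrays replaced by `std::vector`, and function-local `#define` replaced by `constexpr int`) can be sketched as follows. The function and constant names are hypothetical and chosen only for illustration; they do not come from the RCCL sources.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical function showing both patterns side by side.
int sumOfRankIds(int numRanks) {
  assert(numRanks >= 0);

  // Before: `#define FIRST_RANK 0` inside the function body. The macro would
  // stay visible for the rest of the translation unit and could clash with
  // another definition. A constexpr variable is scoped to this function.
  constexpr int kFirstRank = 0;

  // Before: `int ids[numRanks];` which is a variable-length array, a
  // compiler extension in C++. std::vector is the standard equivalent.
  std::vector<int> ids(static_cast<std::size_t>(numRanks));
  std::iota(ids.begin(), ids.end(), kFirstRank);

  return std::accumulate(ids.begin(), ids.end(), 0);
}
```

With four ranks the ids are 0, 1, 2, 3, so `sumOfRankIds(4)` returns 6.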
2024-11-04 09:46:42 -07:00
Bertan Dogancay
984f1e4343
Increase MAX_STACK_SIZE for UT ( #1398 )
2024-11-01 13:07:45 -04:00
Tim
fd9924cfe7
Adjustment for UT Sendrecv ( #1400 )
Enabled UT sendrecv to same rank and refactor UBR call
2024-10-30 15:13:53 -04:00
Sean Karlage
bdf9544c81
static: Enable true rccl static library build ( #1379 )
* static: Enable true rccl static library build
Rccl uses `-fgpu-rdc` to compile, which requires a specialized link command in order to produce a true static library.
When "linking" with `amdclang++`, you need to use `--emit-static-lib` and `--hip-link` to get a static library with all gpu code generated. Subsequent links with binaries do not need any special flags to generate gpu code.
Building a static library:
```
$ cmake -DROCM_PATH=$ROCM_PATH -DCMAKE_PREFIX_PATH=$ROCM_PATH -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=off -DCMAKE_POSITION_INDEPENDENT_CODE=on -DAMDGPU_TARGETS=gfx942 -DCMAKE_CXX_COMPILER=$ROCM_PATH/lib/llvm/bin/amdclang++ -DCMAKE_C_COMPILER=$ROCM_PATH/lib/llvm/bin/amdclang .. 2>&1 | tee -a /tmp/build.txt
-- Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY) (Required is at least version "1.11")
-- Checking for ROCm support for GPU targets: gfx942
-- Compiling for gfx942
-- Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY) (Required is at least version "1.11")
-- ROCM_PATH found: /opt/rocm
-- Compiling with amdclang++
-- HIP compiler: clang
-- HIP runtime: rocclr
-- amdclang++ executable: /opt/rocm/llvm/bin/amdclang++
-- amdclang++ version: 18.0.0git
-- hipconfig executable: /opt/rocm/bin/hipconfig
-- amdclang++ HIP version: 6.2.41133
-- ROCm version: 6.2.0
...
$ make -j 32
[ 0%] Updating git_version.cpp if necessary
-- Updating git_version.cpp
[ 0%] Built target git_version_check
[ 0%] Hipifying src/transport/shm.cc -> /home/skarlage/local/rccl/build/hipify/src/transport/shm.cc
[ 0%] Hipifying src/bootstrap.cc -> /home/skarlage/local/rccl/build/hipify/src/bootstrap.cc
[ 0%] Hipifying src/channel.cc -> /home/skarlage/local/rccl/build/hipify/src/channel.cc
[ 1%] Hipifying src/device/all_reduce.h -> /home/skarlage/local/rccl/build/hipify/src/device/all_reduce.h
[ 1%] Hipifying src/device/broadcast.h -> /home/skarlage/local/rccl/build/hipify/src/device/broadcast.h
[ 1%] Hipifying src/device/all_gather.h -> /home/skarlage/local/rccl/build/hipify/src/device/all_gather.h
[ 1%] Hipifying src/device/common.cu -> /home/skarlage/local/rccl/build/hipify/src/device/common.cu.cpp
[ 1%] Hipifying src/debug.cc -> /home/skarlage/local/rccl/build/hipify/src/debug.cc
[ 1%] Hipifying src/device/alltoall_pivot.h -> /home/skarlage/local/rccl/build/hipify/src/device/alltoall_pivot.h
[ 1%] Hipifying src/device/network/unpack/unpack.h -> /home/skarlage/local/rccl/build/hipify/src/device/network/unpack/unpack.h
[ 4%] Hipifying src/collectives.cc -> /home/skarlage/local/rccl/build/hipify/src/collectives.cc
[ 4%] Hipifying src/device/msccl_kernel_impl.h -> /home/skarlage/local/rccl/build/hipify/src/device/msccl_kernel_impl.h
[ 4%] Hipifying src/device/network/unpack/unpack_defs.h -> /home/skarlage/local/rccl/build/hipify/src/device/network/unpack/unpack_defs.h
[ 4%] Hipifying src/device/op128.h -> /home/skarlage/local/rccl/build/hipify/src/device/op128.h
[ 4%] Hipifying src/device/onerank.cu -> /home/skarlage/local/rccl/build/hipify/src/device/onerank.cu.cpp
[ 4%] Hipifying src/device/common.h -> /home/skarlage/local/rccl/build/hipify/src/device/common.h
[ 6%] Hipifying src/device/prims_ll.h -> /home/skarlage/local/rccl/build/hipify/src/device/prims_ll.h
[ 6%] Hipifying src/device/primitives.h -> /home/skarlage/local/rccl/build/hipify/src/device/primitives.h
[ 6%] Hipifying src/device/prims_ll128.h -> /home/skarlage/local/rccl/build/hipify/src/device/prims_ll128.h
[ 6%] Hipifying src/device/reduce.h -> /home/skarlage/local/rccl/build/hipify/src/device/reduce.h
[ 7%] Hipifying src/device/common_kernel.h -> /home/skarlage/local/rccl/build/hipify/src/device/common_kernel.h
[ 7%] Hipifying src/device/reduce_scatter.h -> /home/skarlage/local/rccl/build/hipify/src/device/reduce_scatter.h
[ 7%] Hipifying src/device/sendrecv.h -> /home/skarlage/local/rccl/build/hipify/src/device/sendrecv.h
[ 7%] Hipifying src/device/prims_simple.h -> /home/skarlage/local/rccl/build/hipify/src/device/prims_simple.h
[ 7%] Hipifying src/enqueue.cc -> /home/skarlage/local/rccl/build/hipify/src/enqueue.cc
[ 7%] Hipifying src/device/reduce_kernel.h -> /home/skarlage/local/rccl/build/hipify/src/device/reduce_kernel.h
[ 7%] Hipifying src/graph/connect.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/connect.cc
[ 7%] Hipifying src/graph/rings.h -> /home/skarlage/local/rccl/build/hipify/src/graph/rings.h
[ 8%] Hipifying src/graph/rings.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/rings.cc
[ 8%] Hipifying src/graph/rome_models.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/rome_models.cc
[ 8%] Hipifying src/graph/rome_models.h -> /home/skarlage/local/rccl/build/hipify/src/graph/rome_models.h
[ 8%] Hipifying src/graph/paths.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/paths.cc
[ 9%] Hipifying src/graph/search.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/search.cc
[ 9%] Hipifying src/graph/topo.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/topo.cc
...
[100%] Linking CXX static library librccl.a
Elapsed time: 270 s. (time), 0.00046 s. (clock)
Elapsed time: 0 s. (time), 0.000342 s. (clock)
[100%] Built target rccl
```
Static rccl exists:
```
$ file librccl.a
librccl.a: current ar archive
```
* Fix up tests Cmake for static builds
We also need to fix up the tests CMakeLists.txt to:
* Remove the unused `BUILD_STATIC` option
* Use `SHARED_LIBS` as a definition of whether we're building static or not.
2024-10-16 06:58:50 -07:00
akolliasAMD
7fb9189760
Regression timing fix ( #1361 )
* Removed testbed initialization on standalone tests
* .jenkins re-enabled all tests
2024-10-03 10:41:26 -06:00
Tim
40e93ebc29
Remove 0 size UBR ( #1346 )
ncclCommRegister, required for UBR, will call IB dmabuf regMr directly which forbids 0 size message
2024-09-24 18:16:51 -04:00
saurabhAMD
4856309413
Making variable names consistent in EnvVars.cpp ( #1327 )
* Making variable names consistent in EnvVars.cpp
2024-09-11 09:23:31 -05:00
saurabhAMD
289a80c4e9
Enabling Unit Tests for CPX mode ( #1324 )
* Unit Tests for RCCL in CPX mode
* override pow2gpus set by cpx mode by user argument
* Adding comment for UT_POW2_GPUS
* Additional comment on why using pow2gpus for cpx mode.
2024-09-09 10:12:33 -05:00
Tim
757d1891e9
Update EnvVars.cpp
2024-09-04 16:55:36 -04:00
mberenjk
db840f024e
adding all nccl apis to api_support to enable rccl tracing by rocprofv3 ( #1297 )
* adding all nccl apis to api_support to enable rccl tracing by rocprofv3
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com >
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com >
2024-08-22 12:36:07 -05:00
Tim
a4793286c7
Adding User Buffer Registration support for Unit test ( #1199 )
* Adding UBR support for UT SendRecv
Signed-off-by: Tim Hu <timhu102@amd.com >
* Update test/common/TestBedChild.cpp
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com >
---------
Signed-off-by: Tim Hu <timhu102@amd.com >
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com >
2024-07-30 13:39:25 -04:00
akolliasAMD
c246e25f8e
gfx12 Disable ll protocol ( #1268 )
2024-07-26 08:59:55 -06:00
Wenkai Du
89349f2ce4
Template unroll for RCCL kernels ( #1250 )
* Template unroll for RCCL kernels
* Adding unroll template arg during CMake hipification
* Reduce linking parallel jobs to avoid OOM in CI
* Workaround issues with UT tests
SWDEV-469533: register spill fix is needed for mainline build
LWPCOMMLIBS-369: cannot enable 112 channels with 80 CUs
Use -parallel-jobs=8 for linking
* CI: do not use -j 16 when building
* CI: use -j 8 when building
* Only reduce parallel linking job for CI extended
* Restore original jenkins command. Change parallel linking jobs in cmake
* Disable MSCCLPP
---------
Co-authored-by: gilbertlee-amd <gilbert.lee@amd.com >
2024-07-19 08:15:59 -07:00
corey-derochie-amd
0c36d571ea
Enable multi-threading for MSCCL ( #1203 )
MSCCL can now run in a multi-threaded configuration. To test in the unit tests, added the ENABLE_OPENMP compile definition flag and the --openmp-test-enable flag to the unit test build script. To activate, set the environment variables UT_MULTITHREADED=1 and UT_PROCESS_MASK=1. Set Jenkins to use this mode.
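The activation condition described above can be sketched as a small environment check. This is only an assumption about how such a switch might be read: the helper names are hypothetical, and the sketch treats both variables as simple flags that must equal `1`, which may not match how the unit tests actually interpret `UT_PROCESS_MASK`.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: true only when the variable is set to exactly "1".
bool envFlagEnabled(const char* name) {
  const char* value = std::getenv(name);
  return value != nullptr && std::string(value) == "1";
}

// Hypothetical check mirroring the commit message: multi-threaded mode is
// requested when UT_MULTITHREADED=1 and UT_PROCESS_MASK=1 are both set.
bool multithreadedModeRequested() {
  return envFlagEnabled("UT_MULTITHREADED") && envFlagEnabled("UT_PROCESS_MASK");
}
```

From a shell this corresponds to exporting `UT_MULTITHREADED=1` and `UT_PROCESS_MASK=1` before launching the unit tests.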
2024-07-04 09:34:38 -06:00
saurabhAMD
e170f41ddd
Unit Tests for testing channels ( #1222 )
2024-06-25 10:10:10 -05:00
saurabhAMD
392a73fdef
enable UT to test with channels greater than 64
2024-06-13 13:54:08 -05:00
Bertan Dogancay
0ec41f1386
[UT] Start supporting multiple group calls and graphs ( #1151 )
* Start supporting multiple group calls UT
2024-04-25 11:11:16 -06:00
BertanDogancay
e1a835910e
Merge remote-tracking branch 'nccl/master' into develop
2024-04-23 13:34:00 -07:00
mberenjk
428837ffe4
replacing rccl_bfloat16 with hip_bfloat16 ( #1126 )
Co-authored-by: mberenjk <mberenjk@amd.com >
2024-04-11 11:30:37 -05:00
arvindcheru
c0a51dc84b
Static Build update - Moved all cmake install() to rocm-cmake APIs, static build update ( #1123 )
2024-03-26 11:11:09 -04:00
Andy li
6777e65c1d
Enable fp8 support ( #1101 )
* initial checkin
* resolve cr comments
* resolve the build issue
* fix the data correctness issue
* update fp8 header file and update the unit test for fp8 support
* remove fp16 from fp8 headers
* fix ut issue and catch up the latest code from develop
* update according to cr comments
* update ut according to cr comments
* update num floats for each SumPostDiv from 4 to 6
* update fp8 header file name
* fix the typo
2024-03-08 15:17:53 -08:00
Tim
0d06b0f1de
Adding FP16 cases to unit tests ( #1093 )
Signed-off-by: Tim Hu <timhu102@amd.com >
2024-02-26 12:08:04 -05:00
BertanDogancay
b098120c40
Increase max stack size when ll128 enabled
2024-02-15 15:56:59 -08:00
BertanDogancay
76f83f95ab
Merge remote-tracking branch 'rccl/develop' into 2.19.4
2024-02-15 13:37:14 -08:00
Bertan Dogancay
dc2d486ba0
Add stack size UT ( #1081 )
* Add stack size UT
2024-02-12 17:56:15 -07:00
Shilei Tian
ba9f7917ba
Add a constructor for PtrUnion in case it is not initialized explicitly ( #1064 )
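To show why a default constructor helps here, the sketch below uses a stand-in union. The real `PtrUnion` members are not shown in this log, so the type name `PtrUnionSketch` and its members are assumptions made for this example.

```cpp
#include <cstdint>

// Hypothetical stand-in for a pointer union. Without a user-provided default
// constructor, a default-initialized local object of this type would hold an
// indeterminate pointer value.
union PtrUnionSketch {
  void* ptr;
  float* f32;
  std::int32_t* i32;

  // Default constructor: start from a well-defined null state.
  PtrUnionSketch() : ptr(nullptr) {}

  // Explicit construction from an existing buffer.
  explicit PtrUnionSketch(void* p) : ptr(p) {}
};

// True when the union still holds the null state set by the default constructor.
bool isUnset(const PtrUnionSketch& u) { return u.ptr == nullptr; }
```

Code that declares `PtrUnionSketch u;` and forgets to assign it then sees a null pointer rather than garbage, which makes the mistake easier to detect.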
2024-01-26 08:00:27 -08:00