Commit graph

192 commits

Author SHA1 Message Date
Nilesh M Negi daaa6e155f [BUILD] MSCCLPP: Fix OS check for CentOS (#1568)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: corey-derochie-amd <corey.derochie@amd.com>
2025-02-25 13:03:04 -06:00
Sohaib Nadeem 2f1c0bb213 Remove COMPILING_TARGETS from CMakeLists.txt (#1533)
COMPILING_TARGETS is not actually used for --offload-arch option,
instead GPU_TARGETS is being used implicitly when we call
find_package(hip REQUIRED) (See hip-config-amd.cmake).
2025-02-16 21:46:37 -06:00
rahulc1984 92ac136db5 Make rccl version detection robust. (#1517)
* Accept an EXPLICIT_ROCM_VERSION and use that vs inspecting the environment if provided.
* Use CMake's built in file reading support vs execute_process (without error checking) to avoid silent but deadly later failures.
* Properly quote some comparisons to avoid syntax errors if they happen to have an empty string.
* Guard against ROCM_PATH being an empty string, avoiding stray path extensions to root directories, etc.

Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
2025-02-11 10:48:22 -07:00
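The four points of the version-detection commit above (#1517) can be sketched outside CMake as follows. This is only an illustration of the same ideas in Python, not the project's CMake code: the function name, the `.info/version` file location, and the `major.minor.patch` format are assumptions made for the example.

```python
import os
import re
from typing import Optional


def detect_rocm_version(explicit_version: Optional[str],
                        rocm_path: Optional[str],
                        version_file: str = os.path.join(".info", "version")) -> str:
    """Return 'major.minor.patch' or raise ValueError (hypothetical helper)."""
    # 1. An explicitly supplied version wins over inspecting the environment.
    if explicit_version:
        return explicit_version
    # 2. Refuse an empty ROCM_PATH instead of building a stray path from it.
    if not rocm_path:
        raise ValueError("ROCM_PATH is empty and no explicit version was given")
    path = os.path.join(rocm_path, version_file)
    # 3. Read the file directly and fail loudly if that is not possible,
    #    rather than shelling out and ignoring the exit status.
    try:
        with open(path, encoding="utf-8") as fh:
            text = fh.read().strip()
    except OSError as exc:
        raise ValueError(f"cannot read {path}: {exc}") from exc
    # 4. Validate the content so an empty string never reaches a comparison.
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string {text!r} in {path}")
    return ".".join(match.groups())
```

Failing early with a clear message is the design point here: a missing or empty version should stop configuration rather than surface later as an unrelated build error.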
corey-derochie-amd 42ab425037 Switched from cmake_host_system_information feature to a manual parse (#1518)
* Switched cmake_host_system_information feature to a manual parse to remain cmake 3.5 compliant.

* Updating minimum cmake to 3.16 to conform with the rest of ROCm. This change still applies.
2025-02-11 08:51:39 -07:00
Bertan Dogancay 5804603632 [BUILD] Fix unsupported arguments in generator (#1519)
* Fix unsupported arguments in generator

* Get ROCM_PATH as env variable
2025-02-03 14:51:55 -05:00
Jeffrey E Erickson 7af21dd996 modify max memory to use free (#1513) 2025-02-03 09:35:02 -06:00
Bertan Dogancay 35fe9e06f3 [Profiler] Enable ROCTX during build by default (#1506)
* Enable ROCTX during build by default

* Check for roctx support in cmake
2025-01-29 11:29:46 -05:00
corey-derochie-amd bd0f5cccbe Disabled MSCCL++ feature except when building on Ubuntu or CentOS host systems (#1505)
* Added condition for MSCCL++ to only build on an Ubuntu host system.

* Added CentOS to the supported OS list
2025-01-29 08:54:09 -07:00
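A host-OS gate like the one described in the commit above (#1505) could look roughly like this. The sketch assumes the host is identified from text in the `os-release` format and that the relevant `ID` values are `ubuntu` and `centos`; the commit message does not say how the build detects the OS, and both function names are hypothetical.

```python
SUPPORTED_IDS = {"ubuntu", "centos"}  # assumption: os-release ID values


def parse_os_release(text):
    """Parse KEY=value lines (os-release format) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in double or single quotes.
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def mscclpp_allowed(os_release_text):
    """True when the host ID is in the supported list."""
    os_id = parse_os_release(os_release_text).get("ID", "")
    return os_id.lower() in SUPPORTED_IDS
```

On a real system the input would typically come from reading `/etc/os-release`; passing the text in keeps the check easy to exercise with sample data.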
BertanDogancay 36343be84f Merge remote-tracking branch 'nccl/master' into develop 2025-01-23 12:08:46 -06:00
Nilesh M Negi fd03b5b6a5 [BUILD] Fix ASAN build if GPU targets has xnack+ (#1474)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-12-26 12:13:36 -06:00
akolliasAMD 45c1c1a781 changed the CMake option from AMDGPU_TARGETS to GPU_TARGETS (#1440) 2024-12-12 12:09:30 -07:00
Shilei Tian 7386fac64a Improve the handling of CMake deduplication (#1450)
Certain CMake functions deduplicate arguments by default. For example, if we
have two `target_link_options` calls, one with `-Xoffload-linker -opt-A` and the
other with `-Xoffload-linker -opt-B`, the final link command would contain
`-Xoffload-linker -opt-A -opt-B`, which is not what we want.
2024-12-11 13:48:18 -08:00
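The effect described in the commit above (#1450) can be reproduced with a small model. This is a Python simulation of first-occurrence deduplication, not CMake itself, and the helper names are made up for the illustration. It contrasts deduplicating individual tokens with deduplicating whole option groups, which is the behaviour CMake documents for options given with the `SHELL:` prefix; whether the commit uses that prefix is not stated in the message.

```python
import shlex


def dedup(items):
    """Keep the first occurrence of each item, preserving order."""
    seen = set()
    kept = []
    for item in items:
        if item not in seen:
            seen.add(item)
            kept.append(item)
    return kept


def link_line(option_groups, grouped):
    """Build the token list for a link command from option groups."""
    if grouped:
        # Each "-Xoffload-linker -opt-X" string is deduplicated as one unit
        # and only split into tokens afterwards.
        units = dedup(option_groups)
        return [tok for unit in units for tok in shlex.split(unit)]
    # Tokens are deduplicated one by one, so the repeated
    # "-Xoffload-linker" prefix is dropped the second time.
    tokens = [tok for unit in option_groups for tok in shlex.split(unit)]
    return dedup(tokens)


groups = ["-Xoffload-linker -opt-A", "-Xoffload-linker -opt-B"]
print(" ".join(link_line(groups, grouped=False)))
print(" ".join(link_line(groups, grouped=True)))
```

The first call loses the second `-Xoffload-linker`, so `-opt-B` is no longer forwarded to the offload linker; the second keeps both prefix/option pairs intact.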
Shilei Tian 8e9fcf111a Check -parallel-jobs before use (#1451)
`-parallel-jobs` is not always available, such as upstream LLVM.
2024-12-11 11:40:49 -06:00
akolliasAMD 2284101624 removing unused gfx targets (#1411) 2024-11-06 08:50:08 -07:00
corey-derochie-amd 1c45962273 Hide or fix all build warnings (#1331)
* Changing C-strings to be const.

* Changed variable-length arrays to std::vector to avoid warnings. VLA is a compiler extension.

* Changed `#define` inside functions into `constexpr int` to preserve scoping and avoid macro redefinition warnings.

* Disabled warnings for modifying `CMAKE_CXX_FLAGS` caused by `check_symbol_exists`, which temporarily modifies the flag to do a compile check.

* Fixed VLA in rccl UT.
2024-11-04 09:46:42 -07:00
corey-derochie-amd 6db2644766 Set minimum ROCm version for MSCCLPP to 6.2 (#1401)
* Added ROCm version check around setting `ENABLE_MSCCLPP` flag.
2024-10-30 16:48:54 -06:00
Bertan Dogancay 373f113524 Dynamically select unroll factor to build for when targeting local arch (#1371)
* Dynamically select unroll factor to build for when targeting local arch only
2024-10-21 10:53:11 -04:00
Sean Karlage bdf9544c81 static: Enable true rccl static library build (#1379)
* static: Enable true rccl static library build

RCCL uses `-fgpu-rdc` to compile, which requires a specialized link command in order to produce a true static library.

When "linking" with `amdclang++`, you need to use `--emit-static-lib` and `--hip-link` to get a static library with all gpu code generated. Subsequent links with binaries do not need any special flags to generate gpu code.

Building a static library:
```
$ cmake -DROCM_PATH=$ROCM_PATH -DCMAKE_PREFIX_PATH=$ROCM_PATH -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=off -DCMAKE_POSITION_INDEPENDENT_CODE=on -DAMDGPU_TARGETS=gfx942 -DCMAKE_CXX_COMPILER=$ROCM_PATH/lib/llvm/bin/amdclang++ -DCMAKE_C_COMPILER=$ROCM_PATH/lib/llvm/bin/amdclang .. 2>&1 | tee -a /tmp/build.txt
-- Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY) (Required is at least version "1.11")
-- Checking for ROCm support for GPU targets: gfx942
-- Compiling for gfx942
-- Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY) (Required is at least version "1.11")
-- ROCM_PATH found: /opt/rocm
-- Compiling with amdclang++
-- HIP compiler:     clang
-- HIP runtime:      rocclr
-- amdclang++ executable: /opt/rocm/llvm/bin/amdclang++
-- amdclang++ version:    18.0.0git
-- hipconfig executable: /opt/rocm/bin/hipconfig
-- amdclang++ HIP version:    6.2.41133
-- ROCm version: 6.2.0
...
$ make -j 32
[  0%] Updating git_version.cpp if necessary
-- Updating git_version.cpp
[  0%] Built target git_version_check
[  0%] Hipifying src/transport/shm.cc -> /home/skarlage/local/rccl/build/hipify/src/transport/shm.cc
[  0%] Hipifying src/bootstrap.cc -> /home/skarlage/local/rccl/build/hipify/src/bootstrap.cc
[  0%] Hipifying src/channel.cc -> /home/skarlage/local/rccl/build/hipify/src/channel.cc
[  1%] Hipifying src/device/all_reduce.h -> /home/skarlage/local/rccl/build/hipify/src/device/all_reduce.h
[  1%] Hipifying src/device/broadcast.h -> /home/skarlage/local/rccl/build/hipify/src/device/broadcast.h
[  1%] Hipifying src/device/all_gather.h -> /home/skarlage/local/rccl/build/hipify/src/device/all_gather.h
[  1%] Hipifying src/device/common.cu -> /home/skarlage/local/rccl/build/hipify/src/device/common.cu.cpp
[  1%] Hipifying src/debug.cc -> /home/skarlage/local/rccl/build/hipify/src/debug.cc
[  1%] Hipifying src/device/alltoall_pivot.h -> /home/skarlage/local/rccl/build/hipify/src/device/alltoall_pivot.h
[  1%] Hipifying src/device/network/unpack/unpack.h -> /home/skarlage/local/rccl/build/hipify/src/device/network/unpack/unpack.h
[  4%] Hipifying src/collectives.cc -> /home/skarlage/local/rccl/build/hipify/src/collectives.cc
[  4%] Hipifying src/device/msccl_kernel_impl.h -> /home/skarlage/local/rccl/build/hipify/src/device/msccl_kernel_impl.h
[  4%] Hipifying src/device/network/unpack/unpack_defs.h -> /home/skarlage/local/rccl/build/hipify/src/device/network/unpack/unpack_defs.h
[  4%] Hipifying src/device/op128.h -> /home/skarlage/local/rccl/build/hipify/src/device/op128.h
[  4%] Hipifying src/device/onerank.cu -> /home/skarlage/local/rccl/build/hipify/src/device/onerank.cu.cpp
[  4%] Hipifying src/device/common.h -> /home/skarlage/local/rccl/build/hipify/src/device/common.h
[  6%] Hipifying src/device/prims_ll.h -> /home/skarlage/local/rccl/build/hipify/src/device/prims_ll.h
[  6%] Hipifying src/device/primitives.h -> /home/skarlage/local/rccl/build/hipify/src/device/primitives.h
[  6%] Hipifying src/device/prims_ll128.h -> /home/skarlage/local/rccl/build/hipify/src/device/prims_ll128.h
[  6%] Hipifying src/device/reduce.h -> /home/skarlage/local/rccl/build/hipify/src/device/reduce.h
[  7%] Hipifying src/device/common_kernel.h -> /home/skarlage/local/rccl/build/hipify/src/device/common_kernel.h
[  7%] Hipifying src/device/reduce_scatter.h -> /home/skarlage/local/rccl/build/hipify/src/device/reduce_scatter.h
[  7%] Hipifying src/device/sendrecv.h -> /home/skarlage/local/rccl/build/hipify/src/device/sendrecv.h
[  7%] Hipifying src/device/prims_simple.h -> /home/skarlage/local/rccl/build/hipify/src/device/prims_simple.h
[  7%] Hipifying src/enqueue.cc -> /home/skarlage/local/rccl/build/hipify/src/enqueue.cc
[  7%] Hipifying src/device/reduce_kernel.h -> /home/skarlage/local/rccl/build/hipify/src/device/reduce_kernel.h
[  7%] Hipifying src/graph/connect.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/connect.cc
[  7%] Hipifying src/graph/rings.h -> /home/skarlage/local/rccl/build/hipify/src/graph/rings.h
[  8%] Hipifying src/graph/rings.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/rings.cc
[  8%] Hipifying src/graph/rome_models.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/rome_models.cc
[  8%] Hipifying src/graph/rome_models.h -> /home/skarlage/local/rccl/build/hipify/src/graph/rome_models.h
[  8%] Hipifying src/graph/paths.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/paths.cc
[  9%] Hipifying src/graph/search.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/search.cc
[  9%] Hipifying src/graph/topo.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/topo.cc
...
[100%] Linking CXX static library librccl.a
Elapsed time: 270 s. (time), 0.00046 s. (clock)
Elapsed time: 0 s. (time), 0.000342 s. (clock)
[100%] Built target rccl
```
Static rccl exists:
```
$ file librccl.a
librccl.a: current ar archive
```

* Fix up tests CMake for static builds

We also need to fix up the tests CMakeLists.txt to:
* Remove the unused `BUILD_STATIC` option
* Use `SHARED_LIBS` as a definition of whether we're building static or
  not.
2024-10-16 06:58:50 -07:00
Nilesh M Negi 41a2c02773 [BUILD] Require use of Python3 interpreter (#1367)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-10-09 22:36:50 -05:00
Bertan Dogancay 2dd10c8f17 [BUILD] Move code generation to python from CMake (#1360)
* Use generate.py for func generation

* Convert AddUnroll.cmake to bash
2024-10-03 10:21:19 -04:00
Nilesh M Negi 3c61e934f2 [BUILD] Enable MSCCL++ for gfx942 variants (#1344)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-09-23 19:05:49 -05:00
corey-derochie-amd 736a705875 Re-enabled MSCCL++ (#1325)
* Added restrictions around calling MSCCL++ collectives (#1281)

* Added restriction to non-zero 32-byte multiple message sizes to MSCCL++ AllGather.

* Renamed and refactored some mscclpp types.

* Only transmit the MSCCL++ unique id for non-split comm init. For splitting comm, it has already been transmitted. Instead, save the MSCCL++ communicator in child communicators when calling `ncclCommSplit`. Only destroy MSCCL++ communicators when no RCCL communicators remain that use it. Also improved trace logging.

* Disable MSCCL++ when using managed memory buffers as it isn't supported.

* Added datatype and op constraints for MSCCL++ AllReduce.

* Added documentation on MSCCL++ restrictions to the README.

* [BUILD] Support custom CMake flags in MSCCLPP (#1275)

* [BUILD] Support custom CMAKE_PREFIX_PATH in MSCCLPP

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* [BUILD] CMake flags to support build-id in MSCCLPP

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* [BUILD] Fix CMake warnings in MSCCLPP build

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* Wrapped all cmake arguments passed to mscclpp to remove empty arguments and properly format them.

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: Corey Derochie <corey.derochie@amd.com>

* Link to libmscclpp_nccl statically (#1282)

* Switched mscclpp_nccl to static linking. Added a build step to rename the NCCL API functions.

* Undid separation of building libmscclpp_nccl from building librccl with MSCCL++ integration. With a static build, it's either fully enabled or fully disabled.

* `nm` isn't always available in docker containers due to being stripped down. Removed use of `nm` in `cmake` and hard-coded the output into mscclpp_nccl_syms.txt.

* Removed IBVerbs dependency for integrating with MSCCL++ (#1313)

* Renamed `RCCL_ENABLE_MSCCLPP` to `RCCL_MSCCLPP_ENABLE` to conform to MSCCL. Set `RCCL_MSCCLPP_ENABLE` to 1 by default if `ENABLE_MSCCLPP` is defined, or 0 otherwise. Added a log warning if `RCCL_MSCCLPP_ENABLE` is set to 1 but `ENABLE_MSCCLPP` is not defined. (#1294)

* Include mscclpp as a git submodule (#1314)

* Added the desired mscclpp commit as a git submodule.

* Added step to automatically checkout the mscclpp submodule if it isn't already present, in case the user forgot to clone recursively.

* Added instruction to README to clone using --recurse-submodules to get the mscclpp submodule.

* Enabled MSCCL++ feature build.

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
2024-09-11 09:55:16 -06:00
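Two of the runtime rules listed in the commit above (#1325) can be sketched as plain predicates. This is an illustrative Python model, not RCCL's C++ code. The function names are hypothetical, treating only the literal value `1` as "enabled" is an assumption, and so is the choice to report the feature as disabled when it was requested but not compiled in (the commit message only says a warning is logged in that case).

```python
from typing import Mapping, Optional, Tuple


def resolve_mscclpp_enable(env: Mapping[str, str],
                           built_with_mscclpp: bool) -> Tuple[bool, Optional[str]]:
    """Return (enabled, warning) for the RCCL_MSCCLPP_ENABLE setting."""
    raw = env.get("RCCL_MSCCLPP_ENABLE")
    if raw is None:
        # Default: 1 if the build defined ENABLE_MSCCLPP, otherwise 0.
        return built_with_mscclpp, None
    requested = raw.strip() == "1"
    if requested and not built_with_mscclpp:
        # Requested at run time but not compiled in: warn and stay disabled.
        return False, ("RCCL_MSCCLPP_ENABLE=1 but the library was built "
                       "without ENABLE_MSCCLPP")
    return requested, None


def allgather_size_allowed(nbytes: int) -> bool:
    """MSCCL++ AllGather restriction: non-zero multiple of 32 bytes."""
    return nbytes > 0 and nbytes % 32 == 0
```

For example, `resolve_mscclpp_enable({}, True)` yields `(True, None)`, while `allgather_size_allowed(48)` is false because 48 is not a multiple of 32.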
Nilesh M Negi d3012d3307 [BUILD] Support clang++ compiler (#1316)
* [BUILD] Support clang++ compiler

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* [BUILD] Enable check_symbol_exists for BFD and clang++

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* [BUILD] Define default C compiler

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-09-05 09:59:58 -05:00
Nilesh M Negi 607e34dd99 [BUILD] Enable RCCL build with amdclang++ (#1128)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-08-25 13:44:22 -04:00
mberenjk db840f024e adding all nccl apis to api_support to enable rccl tracing by rocprofv3 (#1297)
* adding all nccl apis to api_support to enable rccl tracing by rocprofv3

Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-08-22 12:36:07 -05:00
akolliasAMD d6c317d6ae removed hcc mentions (#1291) 2024-08-14 15:04:13 -06:00
Nilesh M Negi 4f31ab85ea [BUILD] Update gfxTargets for ASAN build (#1242)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-08-06 10:53:51 -05:00
Nilesh M Negi cb2e0615d7 [BUILD] Disable MSCCLPP build by default (#1283)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-08-02 23:17:51 -05:00
Wenkai Du ca5341d419 Restore number of parallel linking jobs (#1278)
* Restore number of parallel linking jobs

* Dynamically adjust number of linker jobs with limit of 16 jobs max

* Fix typo

* Add cgroup v1 support
2024-07-30 08:04:14 -07:00
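One way to pick a linker job count with a cap of 16 and cgroup awareness, as the commit above (#1278) describes, is sketched below. The commit message does not give the actual formula, so several things here are assumptions: that the count is derived from the memory limit and CPU count, the 8 GiB budget per link job, and the function names. The two file locations are the conventional cgroup v2 and v1 memory-limit files.

```python
import os
from typing import Optional

MAX_LINK_JOBS = 16  # upper bound named in the commit message


def read_cgroup_memory_limit(root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Return the cgroup memory limit in bytes, or None if none is found."""
    candidates = (
        os.path.join(root, "memory.max"),                       # cgroup v2
        os.path.join(root, "memory", "memory.limit_in_bytes"),  # cgroup v1
    )
    for path in candidates:
        try:
            with open(path, encoding="ascii") as fh:
                raw = fh.read().strip()
        except OSError:
            continue  # file absent: try the next layout
        if raw.isdigit():
            return int(raw)
        return None  # e.g. "max" on cgroup v2 means no limit
    return None


def linker_jobs(mem_limit_bytes: Optional[int],
                cpu_count: int,
                bytes_per_job: int = 8 * 1024**3) -> int:
    """Pick a job count bounded by CPUs, memory budget, and MAX_LINK_JOBS."""
    jobs = min(MAX_LINK_JOBS, max(1, cpu_count))
    if mem_limit_bytes is not None:
        # Assumed budget: one link job per bytes_per_job of allowed memory.
        jobs = min(jobs, mem_limit_bytes // bytes_per_job)
    return max(1, jobs)  # always run at least one job
```

A caller might combine the two as `linker_jobs(read_cgroup_memory_limit(), os.cpu_count() or 1)`. Inside a container limited to 32 GiB this sketch would choose 4 jobs, and on an unconstrained 64-core host it would stop at the cap of 16.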
corey-derochie-amd b31b4082dd Only initialize MSCCL++ when runtime-enabled. (#1266) 2024-07-22 00:41:31 -06:00
Wenkai Du 89349f2ce4 Template unroll for RCCL kernels (#1250)
* Template unroll for RCCL kernels

* Adding unroll template arg during CMake hipification

* Reduce linking parallel jobs to avoid OOM in CI

* Workaround issues with UT tests

SWDEV-469533: register spill fix is needed for mainline build
LWPCOMMLIBS-369: cannot enable 112 channels with 80 CUs
Use -parallel-jobs=8 for linking

* CI: do not use -j 16 when building

* CI: use -j 8 when building

* Only reduce parallel linking job for CI extended

* Restore original jenkins command. Change parallel linking jobs in cmake

* Disable MSCCLPP

---------

Co-authored-by: gilbertlee-amd <gilbert.lee@amd.com>
2024-07-19 08:15:59 -07:00
corey-derochie-amd 6dc47eecd7 Integrated RCCL with MSCCL++ for small message sizes (#1231) 2024-07-12 15:32:58 -06:00
Rahul Vaidya c755b9cf93 Improved version reporting in NCCL_DEBUG=VERSION (#1232)
* Improved version reporting in NCCL_DEBUG=VERSION.

Signed-off-by: rahulvaidya20 <ravaidya@amd.com>

* Version reporting changes

Signed-off-by: rahulvaidya20 <ravaidya@amd.com>

* Versioning changes: Initialized char arrays to null and fixed typo.

---------

Signed-off-by: rahulvaidya20 <ravaidya@amd.com>
2024-07-12 08:14:29 -05:00
akolliasAMD 63e4d76e23 gfx12 initial enablement (#1219) 2024-07-10 13:32:09 -06:00
akolliasAMD 6475da2ed9 fixed typo on BFD linkage (#1192) 2024-06-03 10:05:47 -06:00
Nilesh M Negi 5aaf7121d9 [BUILD] Update install.sh for RCCL build (#1191)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-05-31 17:58:34 -05:00
Edgar Gabriel 9ad913bfa8 add alternative to rocm_smi_lib 2024-05-14 13:51:41 -07:00
Wenkai Du a64aab5f63 Use rocm-smi thread only mutex when available (#1169) 2024-05-08 14:32:24 -07:00
Wenkai Du b18784d8b8 Add compiler warning for uninitialized variable and fix (#1163)
* Add compiler warning for uninitialized variable and fix

* Add -Wsometimes-uninitialized

* Convert warning to error
2024-05-08 07:00:25 -07:00
BertanDogancay e1a835910e Merge remote-tracking branch 'nccl/master' into develop 2024-04-23 13:34:00 -07:00
Bertan Dogancay 3caad91f32 Add unique files to source list (#1144) 2024-04-15 09:46:53 -06:00
mberenjk 428837ffe4 replacing rccl_bfloat16 with hip_bfloat16 (#1126)
Co-authored-by: mberenjk <mberenjk@amd.com>
2024-04-11 11:30:37 -05:00
arvindcheru c1b8eab8e1 Update Depends with correct HIP Runtime package name (#1130) 2024-04-09 19:27:07 -04:00
arvindcheru c0a51dc84b Static Build update - Moved all cmake install() to rocm-cmake APIs, static build update (#1123) 2024-03-26 11:11:09 -04:00
corey-derochie-amd 503a472a25 Replaced ROCmSoftwarePlatform and RadeonOpenCompute links with ROCm links. (#1125) 2024-03-25 16:29:13 -06:00
Wenkai Du 5976f757dd Remove hipEventDisableSystemFence (#1122)
There is no indication that disabling system fence has any latency improvement.
Removing it per recommendation from HIP.
2024-03-25 08:01:57 -07:00
Nilesh M Negi 53fad75001 BUILD: Enable RCCL static build (#1114)
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
2024-03-15 12:18:18 -05:00
Andy li 6777e65c1d Enable fp8 support (#1101)
* initial checkin

* resolve cr comments

* resolve the build issue

* fix the data correctness issue

* update fp8 header file and update the unit test for fp8 support

* remove fp16 from fp8 headers

* fix ut issue and catch up the latest code from develop

* update according to cr comments

* update ut according to cr comments

* update num floats for each SumPostDiv from 4 to 6

* update fp8 header file name

* fix the typo
2024-03-08 15:17:53 -08:00
Wenkai Du cbd955627e Add support for using contiguous memory for GPU direct RDMA (#1096)
Enabled by env var RCCL_NET_CONTIGUOUS_MEM=1
2024-02-29 10:06:43 -08:00
Bertan Dogancay b617aecc31 Implement ROCTX (#1094)
* Implement roctx
2024-02-27 15:46:15 -07:00