Shilei Tian
8e9fcf111a
Check -parallel-jobs before use ( #1451 )
...
`-parallel-jobs` is not always available, such as upstream LLVM.
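A minimal sketch of the "check before use" idea, written in Python rather than the project's CMake: probe the compiler with the flag on a trivial program and only use the flag if the probe succeeds. The helper name `compiler_accepts_flag` and the probe program are illustrative assumptions, not the code from this commit.

```python
import os
import shutil
import subprocess
import tempfile


def compiler_accepts_flag(compiler, flag):
    """Return True if `compiler` builds a trivial program with `flag` (hypothetical helper)."""
    # A compiler that is not installed cannot accept anything.
    if shutil.which(compiler) is None:
        return False
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.cpp")
        with open(src, "w") as handle:
            handle.write("int main() { return 0; }\n")
        # Compile and link, so driver-level and link-time flags are both exercised.
        result = subprocess.run(
            [compiler, flag, src, "-o", os.path.join(tmp, "probe")],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    return result.returncode == 0


def optional_flags(compiler, candidates):
    """Keep only the candidate flags that the compiler accepts."""
    return [flag for flag in candidates if compiler_accepts_flag(compiler, flag)]
```

For example, `optional_flags("amdclang++", ["-parallel-jobs=8"])` would return an empty list on a toolchain that rejects the flag, so the build can proceed without it.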
2024-12-11 11:40:49 -06:00
akolliasAMD
2284101624
removing unused gfx targets ( #1411 )
2024-11-06 08:50:08 -07:00
corey-derochie-amd
1c45962273
Hide or fix all build warnings ( #1331 )
...
* Changing C-strings to be const.
* Changed variable-length arrays to std::vector to avoid warnings. VLA is a compiler extension.
* Changed `#define` inside functions into `constexpr int` to preserve scoping and avoid macro redefinition warnings.
* Disabled warnings for modifying `CMAKE_CXX_FLAGS` caused by `check_symbol_exists`, which temporarily modifies the flag to do a compile check.
* Fixed VLA in rccl UT.
2024-11-04 09:46:42 -07:00
corey-derochie-amd
6db2644766
Set minimum ROCm version for MSCCLPP to 6.2 ( #1401 )
...
* Added ROCm version check around setting `ENABLE_MSCCLPP` flag.
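The version gate can be sketched as a simple tuple comparison. This is an illustration in Python, not the CMake logic from the commit; `parse_version` and `mscclpp_allowed` are hypothetical names, and the parser assumes a purely numeric dotted version such as the `6.2.0` reported in the build output.

```python
MIN_ROCM_FOR_MSCCLPP = (6, 2)  # minimum stated in the commit title


def parse_version(text):
    """Turn '6.2.0' into (6, 2, 0). Non-numeric suffixes are not handled."""
    return tuple(int(part) for part in text.strip().split("."))


def mscclpp_allowed(rocm_version):
    """Return True when the ROCm major.minor version is at least 6.2."""
    return parse_version(rocm_version)[:2] >= MIN_ROCM_FOR_MSCCLPP
```

Comparing integer tuples rather than strings avoids the classic mistake of ranking `6.10` below `6.2`.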
2024-10-30 16:48:54 -06:00
Bertan Dogancay
373f113524
Dynamically select unroll factor to build for when targeting local arch ( #1371 )
...
* Dynamically select unroll factor to build for when targeting local arch only
2024-10-21 10:53:11 -04:00
Sean Karlage
bdf9544c81
static: Enable true rccl static library build ( #1379 )
...
* static: Enable true rccl static library build
RCCL uses `-fgpu-rdc` to compile, which requires a specialized link command in order to produce a true static library.
When "linking" with `amdclang++`, you need to use `--emit-static-lib` and `--hip-link` to get a static library with all GPU code generated. Subsequent links with binaries do not need any special flags to generate GPU code.
Building a static library:
```
$ cmake -DROCM_PATH=$ROCM_PATH -DCMAKE_PREFIX_PATH=$ROCM_PATH -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=off -DCMAKE_POSITION_INDEPENDENT_CODE=on -DAMDGPU_TARGETS=gfx942 -DCMAKE_CXX_COMPILER=$ROCM_PATH/lib/llvm/bin/amdclang++ -DCMAKE_C_COMPILER=$ROCM_PATH/lib/llvm/bin/amdclang .. 2>&1 | tee -a /tmp/build.txt
-- Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY) (Required is at least version "1.11")
-- Checking for ROCm support for GPU targets: gfx942
-- Compiling for gfx942
-- Could NOT find GTest (missing: GTEST_LIBRARY GTEST_INCLUDE_DIR GTEST_MAIN_LIBRARY) (Required is at least version "1.11")
-- ROCM_PATH found: /opt/rocm
-- Compiling with amdclang++
-- HIP compiler: clang
-- HIP runtime: rocclr
-- amdclang++ executable: /opt/rocm/llvm/bin/amdclang++
-- amdclang++ version: 18.0.0git
-- hipconfig executable: /opt/rocm/bin/hipconfig
-- amdclang++ HIP version: 6.2.41133
-- ROCm version: 6.2.0
...
$ make -j 32
[ 0%] Updating git_version.cpp if necessary
-- Updating git_version.cpp
[ 0%] Built target git_version_check
[ 0%] Hipifying src/transport/shm.cc -> /home/skarlage/local/rccl/build/hipify/src/transport/shm.cc
[ 0%] Hipifying src/bootstrap.cc -> /home/skarlage/local/rccl/build/hipify/src/bootstrap.cc
[ 0%] Hipifying src/channel.cc -> /home/skarlage/local/rccl/build/hipify/src/channel.cc
[ 1%] Hipifying src/device/all_reduce.h -> /home/skarlage/local/rccl/build/hipify/src/device/all_reduce.h
[ 1%] Hipifying src/device/broadcast.h -> /home/skarlage/local/rccl/build/hipify/src/device/broadcast.h
[ 1%] Hipifying src/device/all_gather.h -> /home/skarlage/local/rccl/build/hipify/src/device/all_gather.h
[ 1%] Hipifying src/device/common.cu -> /home/skarlage/local/rccl/build/hipify/src/device/common.cu.cpp
[ 1%] Hipifying src/debug.cc -> /home/skarlage/local/rccl/build/hipify/src/debug.cc
[ 1%] Hipifying src/device/alltoall_pivot.h -> /home/skarlage/local/rccl/build/hipify/src/device/alltoall_pivot.h
[ 1%] Hipifying src/device/network/unpack/unpack.h -> /home/skarlage/local/rccl/build/hipify/src/device/network/unpack/unpack.h
[ 4%] Hipifying src/collectives.cc -> /home/skarlage/local/rccl/build/hipify/src/collectives.cc
[ 4%] Hipifying src/device/msccl_kernel_impl.h -> /home/skarlage/local/rccl/build/hipify/src/device/msccl_kernel_impl.h
[ 4%] Hipifying src/device/network/unpack/unpack_defs.h -> /home/skarlage/local/rccl/build/hipify/src/device/network/unpack/unpack_defs.h
[ 4%] Hipifying src/device/op128.h -> /home/skarlage/local/rccl/build/hipify/src/device/op128.h
[ 4%] Hipifying src/device/onerank.cu -> /home/skarlage/local/rccl/build/hipify/src/device/onerank.cu.cpp
[ 4%] Hipifying src/device/common.h -> /home/skarlage/local/rccl/build/hipify/src/device/common.h
[ 6%] Hipifying src/device/prims_ll.h -> /home/skarlage/local/rccl/build/hipify/src/device/prims_ll.h
[ 6%] Hipifying src/device/primitives.h -> /home/skarlage/local/rccl/build/hipify/src/device/primitives.h
[ 6%] Hipifying src/device/prims_ll128.h -> /home/skarlage/local/rccl/build/hipify/src/device/prims_ll128.h
[ 6%] Hipifying src/device/reduce.h -> /home/skarlage/local/rccl/build/hipify/src/device/reduce.h
[ 7%] Hipifying src/device/common_kernel.h -> /home/skarlage/local/rccl/build/hipify/src/device/common_kernel.h
[ 7%] Hipifying src/device/reduce_scatter.h -> /home/skarlage/local/rccl/build/hipify/src/device/reduce_scatter.h
[ 7%] Hipifying src/device/sendrecv.h -> /home/skarlage/local/rccl/build/hipify/src/device/sendrecv.h
[ 7%] Hipifying src/device/prims_simple.h -> /home/skarlage/local/rccl/build/hipify/src/device/prims_simple.h
[ 7%] Hipifying src/enqueue.cc -> /home/skarlage/local/rccl/build/hipify/src/enqueue.cc
[ 7%] Hipifying src/device/reduce_kernel.h -> /home/skarlage/local/rccl/build/hipify/src/device/reduce_kernel.h
[ 7%] Hipifying src/graph/connect.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/connect.cc
[ 7%] Hipifying src/graph/rings.h -> /home/skarlage/local/rccl/build/hipify/src/graph/rings.h
[ 8%] Hipifying src/graph/rings.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/rings.cc
[ 8%] Hipifying src/graph/rome_models.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/rome_models.cc
[ 8%] Hipifying src/graph/rome_models.h -> /home/skarlage/local/rccl/build/hipify/src/graph/rome_models.h
[ 8%] Hipifying src/graph/paths.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/paths.cc
[ 9%] Hipifying src/graph/search.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/search.cc
[ 9%] Hipifying src/graph/topo.cc -> /home/skarlage/local/rccl/build/hipify/src/graph/topo.cc
...
[100%] Linking CXX static library librccl.a
Elapsed time: 270 s. (time), 0.00046 s. (clock)
Elapsed time: 0 s. (time), 0.000342 s. (clock)
[100%] Built target rccl
```
Static rccl exists:
```
$ file librccl.a
librccl.a: current ar archive
```
* Fix up tests CMake for static builds
We also need to fix up the tests CMakeLists.txt to:
* Remove the unused `BUILD_STATIC` option
* Use `SHARED_LIBS` as the indicator of whether we're building static or not.
2024-10-16 06:58:50 -07:00
Nilesh M Negi
41a2c02773
[BUILD] Require use of Python3 interpreter ( #1367 )
...
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
2024-10-09 22:36:50 -05:00
Bertan Dogancay
2dd10c8f17
[BUILD] Move code generation to python from CMake ( #1360 )
...
* Use generate.py for func generation
* Convert AddUnroll.cmake to bash
2024-10-03 10:21:19 -04:00
Nilesh M Negi
3c61e934f2
[BUILD] Enable MSCCL++ for gfx942 variants ( #1344 )
...
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
2024-09-23 19:05:49 -05:00
corey-derochie-amd
736a705875
Re-enabled MSCCL++ ( #1325 )
...
* Added restrictions around calling MSCCL++ collectives (#1281 )
* Added restriction to non-zero 32-byte multiple message sizes to MSCCL++ AllGather.
* Renamed and refactored some mscclpp types.
* Only transmit the MSCCL++ unique id for non-split comm init. For splitting comm, it has already been transmitted. Instead, save the MSCCL++ communicator in child communicators when calling `ncclCommSplit`. Only destroy MSCCL++ communicators when no RCCL communicators remain that use it. Also improved trace logging.
* Disable MSCCL++ when using managed memory buffers as it isn't supported.
* Added datatype and op constraints for MSCCL++ AllReduce.
* Added documentation on MSCCL++ restrictions to the README.
* [BUILD] Support custom CMake flags in MSCCLPP (#1275 )
* [BUILD] Support custom CMAKE_PREFIX_PATH in MSCCLPP
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
* [BUILD] CMake flags to support build-id in MSCCLPP
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
* [BUILD] Fix CMake warnings in MSCCLPP build
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
* Wrapped all cmake arguments passed to mscclpp to remove empty arguments and properly format them.
---------
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
Co-authored-by: Corey Derochie <corey.derochie@amd.com >
* Link to libmscclpp_nccl statically (#1282 )
* Switched mscclpp_nccl to static linking. Added a build step to rename the NCCL API functions.
* Undid separation of building libmscclpp_nccl from building librccl with MSCCL++ integration. With a static build, it's either fully enabled or fully disabled.
* `nm` isn't always available in docker containers due to being stripped down. Removed use of `nm` in `cmake` and hard-coded the output into mscclpp_nccl_syms.txt.
* Removed IBVerbs dependency for integrating with MSCCL++ (#1313 )
* Renamed `RCCL_ENABLE_MSCCLPP` to `RCCL_MSCCLPP_ENABLE` to conform to MSCCL. Set `RCCL_MSCCLPP_ENABLE` to 1 by default if `ENABLE_MSCCLPP` is defined, or 0 otherwise. Added a log warning if `RCCL_MSCCLPP_ENABLE` is set to 1 but `ENABLE_MSCCLPP` is not defined. (#1294 )
* Include mscclpp as a git submodule (#1314 )
* Added the desired mscclpp commit as a git submodule.
* Added step to automatically checkout the mscclpp submodule if it isn't already present, in case the user forgot to clone recursively.
* Added instruction to README to clone using --recurse-submodules to get the mscclpp submodule.
* Enabled MSCCL++ feature build.
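The `RCCL_MSCCLPP_ENABLE` default described above (1 when `ENABLE_MSCCLPP` is defined at build time, 0 otherwise, with a warning when the variable is set to 1 in a build without it) can be sketched as follows. This is a Python illustration with a hypothetical function name. The commit message does not say what the effective setting is after the warning, so treating the feature as disabled in that case is an assumption.

```python
def resolve_mscclpp_enable(environ, built_with_mscclpp):
    """Return (enabled, warning) for the runtime switch (hypothetical helper).

    `built_with_mscclpp` stands in for the compile-time ENABLE_MSCCLPP define.
    """
    default = "1" if built_with_mscclpp else "0"
    requested = environ.get("RCCL_MSCCLPP_ENABLE", default) == "1"
    warning = None
    if requested and not built_with_mscclpp:
        warning = "RCCL_MSCCLPP_ENABLE is set to 1 but ENABLE_MSCCLPP is not defined"
    # Assumption: without build-time support the feature stays off.
    return requested and built_with_mscclpp, warning
```

Passing the environment as a plain mapping (for example `os.environ`) keeps the function easy to exercise without touching the real process environment.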
---------
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com >
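The AllGather restriction listed in this commit (non-zero message sizes that are a multiple of 32 bytes) amounts to a small predicate. The sketch below is in Python with a hypothetical function name. Whether the size is measured per rank or in total is not stated in the commit message and is left open here.

```python
MSCCLPP_ALLGATHER_ALIGNMENT = 32  # bytes, as stated in the commit message


def allgather_size_eligible(nbytes):
    """Return True for non-zero sizes that are a multiple of 32 bytes."""
    return nbytes > 0 and nbytes % MSCCLPP_ALLGATHER_ALIGNMENT == 0
```

Sizes that fail the check would presumably fall back to the regular RCCL path, though the commit message does not spell that out.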
2024-09-11 09:55:16 -06:00
Nilesh M Negi
d3012d3307
[BUILD] Support clang++ compiler ( #1316 )
...
* [BUILD] Support clang++ compiler
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
* [BUILD] Enable check_symbol_exists for BFD and clang++
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
* [BUILD] Define default C compiler
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
---------
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
2024-09-05 09:59:58 -05:00
Nilesh M Negi
607e34dd99
[BUILD] Enable RCCL build with amdclang++ ( #1128 )
...
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
2024-08-25 13:44:22 -04:00
mberenjk
db840f024e
adding all nccl apis to api_support to enable rccl tracing by rocprofv3 ( #1297 )
...
* adding all nccl apis to api_support to enable rccl tracing by rocprofv3
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com >
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com >
2024-08-22 12:36:07 -05:00
akolliasAMD
d6c317d6ae
removed hcc mentions ( #1291 )
2024-08-14 15:04:13 -06:00
Nilesh M Negi
4f31ab85ea
[BUILD] Update gfxTargets for ASAN build ( #1242 )
...
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
2024-08-06 10:53:51 -05:00
Nilesh M Negi
cb2e0615d7
[BUILD] Disable MSCCLPP build by default ( #1283 )
...
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
2024-08-02 23:17:51 -05:00
Wenkai Du
ca5341d419
Restore number of parallel linking jobs ( #1278 )
...
* Restore number of parallel linking jobs
* Dynamically adjust number of linker jobs with limit of 16 jobs max
* Fix typo
* Add cgroup v1 support
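One way to picture the dynamic adjustment is the Python sketch below: read the container memory limit from the cgroup v2 or v1 interface file, then clamp the job count to at most 16. The function names and the 16 GiB-per-job budget are assumptions for illustration only. The commit message gives the cap of 16 and the cgroup v1 support, not the heuristic itself.

```python
CGROUP_V2_LIMIT = "/sys/fs/cgroup/memory.max"
CGROUP_V1_LIMIT = "/sys/fs/cgroup/memory/memory.limit_in_bytes"


def read_cgroup_memory_limit():
    """Return the cgroup memory limit in bytes, or None if unlimited or unavailable."""
    for path in (CGROUP_V2_LIMIT, CGROUP_V1_LIMIT):
        try:
            with open(path) as handle:
                raw = handle.read().strip()
        except OSError:
            continue
        if raw.isdigit():  # cgroup v2 reports the literal string "max" when unlimited
            return int(raw)
    return None


def linker_jobs(mem_limit_bytes, cpu_count,
                mem_per_job_bytes=16 * 1024**3, max_jobs=16):
    """Pick a parallel link job count, never above max_jobs and never below 1."""
    jobs = min(cpu_count, max_jobs)
    if mem_limit_bytes is not None:
        # Assumed budget: each link job may need mem_per_job_bytes of memory.
        jobs = min(jobs, mem_limit_bytes // mem_per_job_bytes)
    return max(1, jobs)
```

A caller could combine the two as `linker_jobs(read_cgroup_memory_limit(), os.cpu_count() or 1)` after importing `os`.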
2024-07-30 08:04:14 -07:00
corey-derochie-amd
b31b4082dd
Only initialize MSCCL++ when runtime-enabled. ( #1266 )
2024-07-22 00:41:31 -06:00
Wenkai Du
89349f2ce4
Template unroll for RCCL kernels ( #1250 )
...
* Template unroll for RCCL kernels
* Adding unroll template arg during CMake hipification
* Reduce linking parallel jobs to avoid OOM in CI
* Workaround issues with UT tests
SWDEV-469533: register spill fix is needed for mainline build
LWPCOMMLIBS-369: cannot enable 112 channels with 80 CUs
Use -parallel-jobs=8 for linking
* CI: do not use -j 16 when building
* CI: use -j 8 when building
* Only reduce parallel linking job for CI extended
* Restore original jenkins command. Change parallel linking jobs in cmake
* Disable MSCCLPP
---------
Co-authored-by: gilbertlee-amd <gilbert.lee@amd.com >
2024-07-19 08:15:59 -07:00
corey-derochie-amd
6dc47eecd7
Integrated RCCL with MSCCL++ for small message sizes ( #1231 )
2024-07-12 15:32:58 -06:00
Rahul Vaidya
c755b9cf93
Improved version reporting in NCCL_DEBUG=VERSION ( #1232 )
...
* Improved version reporting in NCCL_DEBUG=VERSION.
Signed-off-by: rahulvaidya20 <ravaidya@amd.com >
* Version reporting changes
Signed-off-by: rahulvaidya20 <ravaidya@amd.com >
* Versioning changes: Initialized char arrays to null and fixed typo.
---------
Signed-off-by: rahulvaidya20 <ravaidya@amd.com >
2024-07-12 08:14:29 -05:00
akolliasAMD
63e4d76e23
gfx12 initial enablement ( #1219 )
2024-07-10 13:32:09 -06:00
akolliasAMD
6475da2ed9
fixed typo on BFD linkage ( #1192 )
2024-06-03 10:05:47 -06:00
Nilesh M Negi
5aaf7121d9
[BUILD] Update install.sh for RCCL build ( #1191 )
...
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
2024-05-31 17:58:34 -05:00
Edgar Gabriel
9ad913bfa8
add alternative to rocm_smi_lib
2024-05-14 13:51:41 -07:00
Wenkai Du
a64aab5f63
Use rocm-smi thread only mutex when available ( #1169 )
2024-05-08 14:32:24 -07:00
Wenkai Du
b18784d8b8
Add compiler warning for uninitialized variable and fix ( #1163 )
...
* Add compiler warning for uninitialized variable and fix
* Add -Wsometimes-uninitialized
* Convert warning to error
2024-05-08 07:00:25 -07:00
BertanDogancay
e1a835910e
Merge remote-tracking branch 'nccl/master' into develop
2024-04-23 13:34:00 -07:00
Bertan Dogancay
3caad91f32
Add unique files to source list ( #1144 )
2024-04-15 09:46:53 -06:00
mberenjk
428837ffe4
replacing rccl_bfloat16 with hip_bfloat16 ( #1126 )
...
Co-authored-by: mberenjk <mberenjk@amd.com >
2024-04-11 11:30:37 -05:00
arvindcheru
c1b8eab8e1
Update Depends with correct HIP Runtime package name ( #1130 )
2024-04-09 19:27:07 -04:00
arvindcheru
c0a51dc84b
Static Build update - Moved all cmake install() to rocm-cmake APIs, static build update ( #1123 )
2024-03-26 11:11:09 -04:00
corey-derochie-amd
503a472a25
Replaced ROCmSoftwarePlatform and RadeonOpenCompute links with ROCm links. ( #1125 )
2024-03-25 16:29:13 -06:00
Wenkai Du
5976f757dd
Remove hipEventDisableSystemFence ( #1122 )
...
There is no indication that disabling system fence has any latency improvement.
Removing it per recommendation from HIP.
2024-03-25 08:01:57 -07:00
Nilesh M Negi
53fad75001
BUILD: Enable RCCL static build ( #1114 )
...
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
2024-03-15 12:18:18 -05:00
Andy li
6777e65c1d
Enable fp8 support ( #1101 )
...
* initial checkin
* resolve cr comments
* resolve the build issue
* fix the data correctness issue
* update fp8 header file and update the unit test for fp8 support
* remove fp16 from fp8 headers
* fix ut issue and catch up the latest code from develop
* update according to cr comments
* update ut according to cr comments
* update num floats for each SumPostDiv from 4 to 6
* update fp8 header file name
* fix the typo
2024-03-08 15:17:53 -08:00
Wenkai Du
cbd955627e
Add support for using contiguous memory for GPU direct RDMA ( #1096 )
...
Enabled by env var RCCL_NET_CONTIGUOUS_MEM=1
2024-02-29 10:06:43 -08:00
Bertan Dogancay
b617aecc31
Implement ROCTX ( #1094 )
...
* Implement roctx
2024-02-27 15:46:15 -07:00
BertanDogancay
76f83f95ab
Merge remote-tracking branch 'rccl/develop' into 2.19.4
2024-02-15 13:37:14 -08:00
Bertan Dogancay
dc2d486ba0
Add stack size UT ( #1081 )
...
* Add stack size UT
2024-02-12 17:56:15 -07:00
Wenkai Du
d999d9ad21
Merge remote-tracking branch 'rccl/develop' into 2.19.4
2024-02-09 11:31:03 -06:00
Bertan Dogancay
8a442faa12
Nvtx support ( #1076 )
...
* NVTX support
2024-02-08 14:08:24 -07:00
Wenkai Du
e64324a64a
Merge remote-tracking branch 'rccl/develop' into HEAD
2024-02-01 12:17:09 -06:00
Nilesh M Negi
2458f158b1
Enable kernarg preloading for ROCm 6.1 ( #1068 )
...
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com >
2024-02-01 12:14:04 -06:00
Wenkai Du
6afabf0d0b
Remove enhcompat.cc
2024-01-24 17:13:30 -08:00
BertanDogancay
81ddf9de89
Merge remote-tracking branch 'nccl/v2.19' into develop
2024-01-24 15:25:33 -08:00
Wenkai Du
7e25d5bc55
Use new HIP graph API compatible with CUDA 11030 ( #991 )
...
* Use new HIP graph API compatible with CUDA 11030
* Update dependency to ROCm 6.1
* Fix single stream use case
2024-01-21 19:00:50 -08:00
Bertan Dogancay
5f365a9957
Turn IFC off ( #1053 )
2024-01-18 15:29:36 -07:00
Bertan Dogancay
28d9b170c9
[DEV] Configure functions in RCCL ( #986 )
...
* configure functions in rccl
2024-01-18 15:07:16 -07:00
Wenkai Du
5851ae5974
Re-enable LL128 on gfx90a if the compiler supports it ( #1036 )
2024-01-10 08:01:11 -08:00