COMPILING_TARGETS is not actually used for --offload-arch option,
instead GPU_TARGETS is being used implicitly when we call
find_package(hip REQUIRED) (See hip-config-amd.cmake).
* Accept an EXPLICIT_ROCM_VERSION and use that vs inspecting the environment if provided.
* Use CMake's built in file reading support vs execute_process (without error checking) to avoid silent but deadly later failures.
* Properly quote some comparisons to avoid syntax errors if they happen to have an empty string.
* Guard against ROCM_PATH being an empty string, avoiding stray path extensions to root directories, etc.
Co-authored-by: Stella Laurenzo <stellaraccident@gmail.com>
* Switched cmake_host_system_information feature to a manual parse to remain cmake 3.5 compliant.
* Updating minimum cmake to 3.16 to conform with the rest of ROCm. This change still applies.
Certain CMake functions deduplicates arguments by default. For example, if we
have two `target_link_options` with both `-Xoffload-linker -opt-A` and then
`-Xoffload-linker -opt-B`, the final link command would be `-Xoffload-linker
-opt-A -opt-B`, which is not what we want.
* Changing C-strings to be const.
* Changed variable-length arrays to std::vector to avoid warnings. VLA is a compiler extension.
* Changed `#define` inside functions into `constexpr int` to preserve scoping and avoid macro redefinition warnings.
* Disabled warnings for modifying `CMAKE_CXX_FLAGS` caused by `check_symbol_exists`, which temporarily modifies the flag to do a compile check.
* Fixed VLA in rccl UT.
* Added restrictions around calling MSCCL++ collectives (#1281)
* Added restriction to non-zero 32-byte multiple message sizes to MSCCL++ AllGather.
* Renamed and refactored some mscclpp types.
* Only transmit the MSCCL++ unique id for non-split comm init. For splitting comm, it has already been transmitted. Instead, save the MSCCL++ communicator in child communicators when calling `ncclCommSplit`. Only destroy MSCCL++ communicators when no RCCL communicators remain that use it. Also improved trace logging.
* Disable MSCCL++ when using managed memory buffers as it isn't supported.
* Added datatype and op constraints for MSCCL++ AllReduce.
* Added documentation on MSCCL++ restrictions to the README.
* [BUILD] Support custom CMake flags in MSCCLPP (#1275)
* [BUILD] Support custom CMAKE_PREFIX_PATH in MSCCLPP
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
* [BUILD] CMake flags to support build-id in MSCCLPP
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
* [BUILD] Fix CMake warnings in MSCCLPP build
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
* Wrapped all cmake arguments passed to mscclpp to remove empty arguments and properly format them.
---------
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: Corey Derochie <corey.derochie@amd.com>
* Link to libmscclpp_nccl statically (#1282)
* Switched mscclpp_nccl to static linking. Added a build step to rename the NCCL API functions.
* Undid separation of building libmscclpp_nccl from building librccl with MSCCL++ integration. With a static build, it's either fully enabled or fully disabled.
* `nm` isn't always available in docker containers due to being stripped down. Removed use of `nm` in `cmake` and hard-coded the output into mscclpp_nccl_syms.txt.
* Removed IBVerbs dependency for integrating with MSCCL++ (#1313)
* Renamed `RCCL_ENABLE_MSCCLPP` to `RCCL_MSCCLPP_ENABLE` to conform to MSCCL. Set `RCCL_MSCCLPP_ENABLE` to 1 by default if `ENABLE_MSCCLPP` is defined, or 0 otherwise. Added a log warning if `RCCL_MSCCLPP_ENABLE` is set to 1 but `ENABLE_MSCCLPP` is not defined. (#1294)
* Include mscclpp as a git submodule (#1314)
* Added the desired mscclpp commit as a git submodule.
* Added step to automatically checkout the mscclpp submodule if it isn't already present, in case the user forgot to clone recursively.
* Added instruction to README to clone using --recurse-submodules to get the mscclpp submodule.
* Enabled MSCCL++ feature build.
---------
Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
* adding all nccl apis to api_support to enable rccl tracing by rocprofv3
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Template unroll for RCCL kernels
* Adding unroll template arg during CMake hipification
* Reduce linking parallel jobs to avoid OOM in CI
* Workaround issues with UT tests
SWDEV-469533: register spill fix is needed for mainline build
LWPCOMMLIBS-369: cannot enable 112 channels with 80 CUs
Use -parallel-jobs=8 for linking
* CI: do not use -j 16 when building
* CI: use -j 8 when building
* Only reduce parallel linking job for CI extended
* Restore original jenkins command. Change parallel linking jobs in cmake
* Disable MSCCLPP
---------
Co-authored-by: gilbertlee-amd <gilbert.lee@amd.com>
* initial checkin
* resolve cr comments
* resolve the build issue
* fix the data correctless issue
* update fp8 header file and update the unit test for fp8 support
* remove fp16 from fp8 headers
* fix ut issue and catch up the latest code from develop
* udate according to cr comments
* update ut according to cr comments
* update num floats for each SumPostDiv from 4 to 6
* update fp8 header file name
* fix the typo