* Added RCCL env params to control setting the SO_REUSEADDR and SO_LINGER socket options. This can allow control over the number of file descriptors created during bootstrapping.
* Cast the linger value to `int` earlier so it does not pass through a scope with an ambiguous type.
* Added CHANGELOG entry for this feature.
[ROCm/rccl commit: 2e35417fe5]
* ext-src: add MSCCLPP memory registration APIs
* update mem-reg patch with mscclpp helper routine to check if buffer is registered
* RCCL integration of MSCCL++ user-buffer registration APIs
* only include mscclpp_nccl header if ENABLE_MSCCLPP is defined
* ext-src: update mscclpp mem-reg patch
* add helper routine to patch
* check handle before MSCCL++ deregister
* fix typo to replace send buff with recv buff
* in case of no mscclpp registration, during deRegister call, don't fall back to rccl deRegister which will return an error
* Apply suggestions from code review
Whitespace suggestions and reducing diffs to avoid future merge conflicts
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
* rename helper functions and change their return type
* set RCCL user-buffer registration to occur if attempting MSCCL++ registration with a buffer in managed memory
---------
Co-authored-by: isaki001 <Ioannis.Sakiotis@amd.com>
Co-authored-by: isaki001 <36317038+isaki001@users.noreply.github.com>
Co-authored-by: corey-derochie-amd <161367113+corey-derochie-amd@users.noreply.github.com>
[ROCm/rccl commit: e9b6bbca8a]
It looks like the intent here was to check xgmi_node instead. If the node is checked for "nvlink", link_info is verified every time.
If the node is checked for "xgmi", a positive match means the vsmi topo interface does not need to be checked.
[ROCm/rccl commit: f2ee8d9132]
* Switched calls to `cudaMemcpyAsync` to be `cudaMemcpy` in `ncclTransportP2pSetup` to avoid race condition with `cudaIpcOpenMemHandle` inside p2p `connect`. See `ncclP2pImportShareableBuffer`.
* Moved synchronize outside of the loop, as it isn't necessary to sync between every iteration of the loop.
[ROCm/rccl commit: c158d3a9b4]
* Initializing all ranks to the same value to avoid failure of UT AllReduce for FP8 type
Co-authored-by: Marzieh Berenjkoub <mberenjk@amd.com>
[ROCm/rccl commit: 39483c55f8]
Certain CMake functions deduplicate their arguments by default. For example, if
two `target_link_options` calls pass `-Xoffload-linker -opt-A` and
`-Xoffload-linker -opt-B`, the final link command contains
`-Xoffload-linker -opt-A -opt-B`, which is not what we want.
[ROCm/rccl commit: 7386fac64a]
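
The effect described above can be reproduced with a toy model. The sketch below is not CMake code and not the fix made in this commit; `dedupe` and `join` are hypothetical helpers that assume de-duplication keeps the first occurrence of each list entry. It shows why the second `-Xoffload-linker` disappears when every token is its own entry, and why it survives when each flag and its argument travel as one entry (the idea behind the `SHELL:` prefix that CMake documents for option lists).

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Toy model of option de-duplication: keep only the first occurrence of
// each list entry, preserving the original order.
std::vector<std::string> dedupe(const std::vector<std::string>& options) {
  std::unordered_set<std::string> seen;
  std::vector<std::string> kept;
  for (const auto& option : options) {
    if (seen.insert(option).second) kept.push_back(option);
  }
  return kept;
}

// Join entries with single spaces to form a command-line fragment.
std::string join(const std::vector<std::string>& parts) {
  std::string out;
  for (std::size_t i = 0; i < parts.size(); ++i) {
    if (i != 0) out += ' ';
    out += parts[i];
  }
  return out;
}
```

With one token per entry, `join(dedupe({"-Xoffload-linker", "-opt-A", "-Xoffload-linker", "-opt-B"}))` yields `-Xoffload-linker -opt-A -opt-B`. With each pair kept as a single entry, `join(dedupe({"-Xoffload-linker -opt-A", "-Xoffload-linker -opt-B"}))` yields `-Xoffload-linker -opt-A -Xoffload-linker -opt-B`.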
* Add RCCL debugging guide
* Changes from external review
* More edits from internal review
* Additional edits
* Minor correction
* More changes after external review
* Integrate index and ToC changes with incoming merge changes
* Integrate feedback from management review
* Minor edits from the internal review
[ROCm/rccl commit: 6d34fb7632]
* Add Topologies for 16-GPU gfx942 SuperNode
- Add GigaIO topologies to tools/topo_expl for dev and testing
- Add GigaIO Columba 16 GPU romeModel and adjust topology
matching algorithm in rome_models for 16 GPU system
- Fix bug which failed to match Rome Model when using subsets
of system resources (i.e. ROCR_VISIBLE_DEVICES is set)
- Fixes for topo_expl
* Fix bug w/ 1H16P
[ROCm/rccl commit: a05329bd0d]
* Refactor RCCL install guide into several pages
* Changes from code review and new docker guide
* Add missing entries to ToC
* Minor fixes
* Fix help strings
* Edits after review and remove extra white space
[ROCm/rccl commit: bf7c130631]
* Map devices with respect to PCI
* Allocate GPUs using the PCI mapping
* Passing gpuPriorityOrder in as an argument rather than making the functions non-static.
* Removing redundant testBed instance calling
[ROCm/rccl commit: 69b2b712ab]
* Changing C-strings to be const.
* Changed variable-length arrays to std::vector to avoid warnings. VLA is a compiler extension.
* Changed `#define` inside functions into `constexpr int` to preserve scoping and avoid macro redefinition warnings.
* Disabled warnings for modifying `CMAKE_CXX_FLAGS` caused by `check_symbol_exists`, which temporarily modifies the flag to do a compile check.
* Fixed VLA in rccl UT.
[ROCm/rccl commit: 1c45962273]