Add support for intercepting SIGUSR2 in RCCL. Instead of terminating the
application, this signal prints the stack trace of the process that
received it.
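A minimal sketch of such a handler follows. It assumes Linux with glibc's `<execinfo.h>` (`backtrace`, `backtrace_symbols_fd`). The names `dumpStackOnSignal` and `installStackTraceHandler` are hypothetical and do not come from RCCL. The handler writes the trace to stderr and returns, so execution continues after the signal.

```cpp
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

// Upper bound on captured frames; an arbitrary choice for this sketch.
constexpr int kMaxFrames = 64;

// Hypothetical handler: print the current stack to stderr, then return so the
// application keeps running instead of terminating.
extern "C" void dumpStackOnSignal(int /*sig*/) {
  static const char kHeader[] = "SIGUSR2 received, stack trace:\n";
  void* frames[kMaxFrames];
  int depth = backtrace(frames, kMaxFrames);
  // write() and backtrace_symbols_fd() avoid stdio and malloc inside the handler.
  ssize_t unused = write(STDERR_FILENO, kHeader, sizeof(kHeader) - 1);
  (void)unused;
  backtrace_symbols_fd(frames, depth, STDERR_FILENO);
}

// Hypothetical installer: returns 0 on success, -1 if sigaction() fails.
int installStackTraceHandler() {
  // Call backtrace() once up front: its first call may load libgcc and allocate,
  // which is unsafe to do for the first time inside a signal handler.
  void* warmup[1];
  (void)backtrace(warmup, 1);

  struct sigaction sa {};
  sa.sa_handler = dumpStackOnSignal;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;  // resume interrupted system calls after the handler
  return sigaction(SIGUSR2, &sa, nullptr);
}
```

After installation, running `kill -USR2 <pid>` from a shell prints the stack trace of that process. Because only SIGUSR2 is hooked, other signals keep their default behavior.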
* Reduce AlltoAll port usage when connecting proxies
Reuse socket ports when connecting proxies in AlltoAll.
Existing AlltoAll port usage is O(n) for recv and O(n) for send.
Reusing socket ports on either the server or the client side makes
that side O(1); reusing them on both sides reduces total port usage
to O(1) and enables AlltoAll on more than 64 MI200 nodes.
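The complexity claim can be made concrete with a simple counting model. This is a hypothetical sketch, not RCCL's proxy code. It assumes that recv corresponds to the server (listening) side and send to the client (connecting) side. It also assumes that each non-reused side needs one port per peer, while a reused side shares a single port across all peers.

```cpp
#include <cstdint>

// Hypothetical per-rank port count for AlltoAll proxy connections.
struct PortUsage {
  std::int64_t recv;  // assumed server (listening) side
  std::int64_t send;  // assumed client (connecting) side
  std::int64_t total() const { return recv + send; }
};

// Assumes nranks >= 2. Without reuse a side needs one port per peer
// (nranks - 1); with reuse that side shares a single port across all peers.
PortUsage alltoallPortUsage(std::int64_t nranks, bool reuseServer, bool reuseClient) {
  const std::int64_t peers = nranks - 1;
  return PortUsage{reuseServer ? 1 : peers, reuseClient ? 1 : peers};
}
```

For 512 ranks, this model gives 1022 ports per rank with no reuse, 512 when exactly one side reuses its port, and 2 when both sides do.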
* Update changelog accordingly
* Adding the missing roc:: namespace, effectively changing the value of RCCL_LIBRARY from rccl to roc::rccl.
The important difference is that the linker treats rccl as a symbolic "-lrccl" (so linking fails when
the library search path is missing), whereas roc::rccl is a CMake target name that resolves to an
absolute library path.
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
* Adding a changelog entry
* Minor updates to wording
* Add missing period
Co-authored-by: Paul Fultz II <pfultz2@yahoo.com>
Co-authored-by: Saad Rahim <44449863+saadrahim@users.noreply.github.com>
This single commit contains the source code changes required to
introduce support for multiple ranks per device.
Applications must use a new interface (ncclCommRankInitMulti) to
enable this feature.
Improve allreduce performance when there is more than one network interface
per GPU and PXN is needed to close rings.
Add support for PCI Gen5 on Linux 5.4 kernels.
Fix crash when setting NCCL_SET_THREAD_NAME.
Fix random crash in init due to uninitialized struct.
Fix hang on cubemesh topologies.
Add P2P_DIRECT_DISABLE parameter to disable direct access to pointers within a
process.
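NCCL-style parameters are normally supplied through environment variables with an `NCCL_` prefix. Assuming that convention holds here, this one would be set as `NCCL_P2P_DIRECT_DISABLE=1` before the communicator is created. The sketch below is a hypothetical reader (`p2pDirectDisabled` is not NCCL's parameter code) that illustrates the assumed semantics: unset or `0` keeps direct pointer access enabled, and a nonzero value disables it.

```cpp
#include <cstdlib>

// Hypothetical reader for the P2P_DIRECT_DISABLE parameter, assuming it is
// passed as the environment variable NCCL_P2P_DIRECT_DISABLE with default 0.
bool p2pDirectDisabled() {
  const char* value = std::getenv("NCCL_P2P_DIRECT_DISABLE");
  if (value == nullptr || *value == '\0') return false;  // unset: keep direct access
  return std::strtol(value, nullptr, 10) != 0;           // nonzero: disable direct access
}
```

From a shell, the same assumed setting would be applied per run as `NCCL_P2P_DIRECT_DISABLE=1 ./app`.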
Starting with ROCm 5.2, the include directories are reorganized. This PR
allows RCCL to compile against both the old and the new directory layout.
It uses find_package() to identify the correct rocm_smi settings starting
from ROCm 5.2, and keeps the original (manual) settings for all previous
releases.
Tested with ROCm 5.2, 5.1.1, 5.0.2, and 4.5.2.
Tweak the signal handler and force non-release build
Increase ulimit locked memory value
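The commit does not show where the limit is raised. As a hedged illustration, the in-process equivalent of raising `ulimit -l` (the locked-memory limit, `RLIMIT_MEMLOCK`) is to lift the soft limit up to the hard limit with `setrlimit`, which an unprivileged process is allowed to do. `raiseLockedMemoryLimit` is a hypothetical name.

```cpp
#include <sys/resource.h>

// Hypothetical helper: raise the soft RLIMIT_MEMLOCK limit to the hard limit.
// Going beyond the hard limit would require privileges, so this stops there.
// Returns 0 on success, -1 on failure.
int raiseLockedMemoryLimit() {
  struct rlimit rl {};
  if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) return -1;
  if (rl.rlim_cur == rl.rlim_max) return 0;  // already at the ceiling
  rl.rlim_cur = rl.rlim_max;
  return setrlimit(RLIMIT_MEMLOCK, &rl);
}
```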
Update the signal handler to use bfd symbol resolution.
Include configure logic to find bfd functions.
Add optional C++ function name demangling
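Symbol names resolved for a stack trace are mangled for C++ functions. One common way to demangle them is `abi::__cxa_demangle` from `<cxxabi.h>`, which is available with GCC and Clang. The commit does not say which mechanism RCCL uses, so treat this as a sketch. `demangle` is a hypothetical helper that falls back to the raw name when demangling fails, for example for C symbols.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: demangle a C++ symbol name, or return it unchanged if it
// is not a valid mangled name (e.g. a plain C symbol such as "main").
std::string demangle(const char* symbol) {
  int status = 0;
  // With a null output buffer, __cxa_demangle allocates the result with malloc.
  char* out = abi::__cxa_demangle(symbol, nullptr, nullptr, &status);
  if (status != 0 || out == nullptr) return symbol;
  std::string result(out);
  std::free(out);
  return result;
}
```

For example, `demangle("_ZN3foo3barEv")` yields `foo::bar()`. Making demangling optional keeps raw symbol names available when that is preferable.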