* Fix XGMI detection
* Increase stack size
* Temporarily disable signal hangler in CI
[Process: 17281] Inside handler function signal: Segmentation fault (11)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x93ef57 vs 0x530ea0)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x93ef57 vs 0x530ea0)
Add new NVLS algorithm for allreduce using NVLink SHARP (intra-node only).
Add new config options: cgaClusterSize, minCTAs, maxCTAs, netName.
Enable LL128 when we use PXN to close rings.
NVTX3 includes update.
Fix crash when one CollNet (SHARP) rail fails to initialize.
Add support for 400Gbit NDR network adapters (CX7)
Handle EINTR in socket poll() function
Add NCCL_PROGRESS_APPENDOP_FREQ to control op append overhead
Resource cleanup fixes
Fix double free in case of init failure
Fix crash in ncclCommAbort
Revert AMD speed commit
* Add MSCCL support
* Add alignment and message size checking
* Fix nRanks checking, in-place and out-of-place tests and group call handling
* Fix hipGraph unit test
* Change MSCCL init warning to INFO
* Revise license info
Add support for CUDA 12.0, drop Kepler (sm_35).
Support for H100 features.
Make socket code more robust and protected. Solves #555.
Improve performance on large CUDA graphs, reducing dependencies.
Reduce inter-socket bandwidth on AMD CPUs to favor better paths.
Various fixes to ncclCommAbort.
Make service thread polling resistant to EINTR.
Compile with profiling API by default.
Extend NVTX instrumentation with call arguments.
Fix crash with CollnetChain on some node topologies
Fix hang when interleaving the capture of different graphs
Fix hang during init in multi-threaded mode
Fix potential data corruption with LL128 protocol on unaligned buffers.
Fix CPU usage during preconnect
Fixes double-free in the error path for ncclCommInitAll
Workaround hang on H100 with Ring/LL128 on 2 GPUs.