Updated Changelog with 7.1.1 and 7.2.0 stub sections (#2008)

* Missing ROCm 7.0 & 7.1.0 Changelog entries (#1976)
* Update CHANGELOG.md
* Update CHANGELOG.md
* Apply suggestions from code review

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Update CHANGELOG.md
* Update CHANGELOG.md
* Update CHANGELOG.md

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

* Added ROCm 7.2.0 section.
* Update CHANGELOG.md
* Apply suggestion from @corey-derochie-amd

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: 561ad2fe05]
2025-10-28 13:41:22 -06:00
@@ -2,24 +2,35 @@

 Full documentation for RCCL is available at [https://rccl.readthedocs.io](https://rccl.readthedocs.io)

-## Unreleased - RCCL 2.27.7 for ROCm 7.1.0
+## Unreleased - RCCL 2.27.7 for ROCm 7.2.0
+
+## Unreleased - RCCL 2.27.7 for ROCm 7.1.1
+
+### Resolved issues
+
+* Fixed crash when using the librccl-profiler plugin with the all-to-all collective after the 2.27 update.
+
+## RCCL 2.27.7 for ROCm 7.1.0

 ### Added
-* `RCCL_FORCE_ENABLE_DMABUF` added as a debugging feature if the user wants to explicitly enable DMABUF and forego system/kernel checks.
+* Added `RCCL_FORCE_ENABLE_DMABUF` as a debugging feature if the user wants to explicitly enable DMABUF and forgo system/kernel checks.
 * Added `RCCL_P2P_BATCH_THRESHOLD` to set the message size limit for batching P2P operations. This mainly affects small message performance for alltoall at a large scale but also applies to alltoallv.
 * Added `RCCL_P2P_BATCH_ENABLE` to enable batching P2P operations to receive performance gains for smaller messages up to 4MB for alltoall when the workload requires it. This is to avoid performance dips for larger messages.
-* added `RCCL_CHANNEL_TUNING_ENABLE` to enable channel tuning that overrides RCCL's internal adjustments based on threadThreshold.
+* Added `RCCL_CHANNEL_TUNING_ENABLE` to enable channel tuning that overrides RCCL's internal adjustments based on `threadThreshold`.

 ### Changed

 * The MSCCL++ feature is now disabled by default. The `--disable-mscclpp` build flag is replaced with `--enable-mscclpp` in the `rccl/install.sh` script.
-* Compatibility with NCCL 2.27.7
+* Compatibility with NCCL 2.27.7.

-### Resolved issues
-* Improve small message performance for alltoall by enabling and optimizing batched P2P operations. 
+### Optimized
+* Enabled and optimized batched P2P operations to improve small message performance for AllToAll and AllGather.
+* Optimized channel count selection to improve efficiency for small to medium message sizes in ReduceScatter.
+* Changed code inlining to improve latency for small message sizes for AllReduce, AllGather, and ReduceScatter.

 ### Known issues
 * Symmetric memory kernels are currently disabled due to ongoing CUMEM enablement work.
+* When running this version of RCCL using ROCm versions earlier than 6.4.0, the user must set the environment flag `HSA_NO_SCRATCH_RECLAIM=1`.

 ## RCCL 2.26.6 for ROCm 7.0.0

@@ -29,6 +40,7 @@ Full documentation for RCCL is available at [https://rccl.readthedocs.io](https:
 * Fixed unit test failures in tests ending with `ManagedMem` and `ManagedMemGraph` suffixes.
 * Suboptimal algorithmic switching point for AllReduce on MI300x.
 * Fixed the known issue "When splitting a communicator using `ncclCommSplit` in some GPU configurations, MSCCL initialization can cause a segmentation fault." with a design change to use `comm` instead of `rank` for `mscclStatus`. The Global map for `comm` to `mscclStatus` is still not thread safe but should be explicitly handled by mutexes for read writes. This is tested for correctness, but there is a plan to use a thread-safe map data structure in upcoming changes.
+* Fixed broken functionality within the LL protocol on gfx950 by disabling inlining of LLGenericOp kernels.

 ### Added

@@ -47,10 +59,16 @@ Full documentation for RCCL is available at [https://rccl.readthedocs.io](https:

 ### Changed

-* Compatibility with NCCL 2.23.4
-* Compatibility with NCCL 2.24.3
-* Compatibility with NCCL 2.25.1
-* Compatibility with NCCL 2.26.6
+* Compatibility with NCCL 2.23.4.
+* Compatibility with NCCL 2.24.3.
+* Compatibility with NCCL 2.25.1.
+* Compatibility with NCCL 2.26.6.
+
+### Optimized
+* Improved the performance of the `FP8` Sum operation by upcasting to `FP16`.
+
+### Known issues
+* When running this version of RCCL using ROCm versions earlier than 6.4.0, the user must set the environment flag `HSA_NO_SCRATCH_RECLAIM=1`.

 ## RCCL 2.22.3 for ROCm 6.4.2