Update CHANGELOG to match release branches 6.2 and 6.3 (#1391)

* [CHANGELOG] Add Known issues for ROCm 6.2.1

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>

* Updated 6.2.1 known issues to match the content in develop.

* Updated CHANGELOG for ROCm 6.3 release. (#1380)

* Updated CHANGELOG for ROCm 6.3 release.

* Update CHANGELOG to new format.

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

---------

Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

---------

Signed-off-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: nileshnegi <Nilesh.Negi@amd.com>
Co-authored-by: Jeffrey Novotny <jnovotny@amd.com>

[ROCm/rccl commit: 6ed513e1b9]
corey-derochie-amd
2024-10-23 13:49:40 -06:00
committed by GitHub
parent 928414ac06
commit 1c700083b2
+33 -10
@@ -2,19 +2,42 @@
Full documentation for RCCL is available at [https://rccl.readthedocs.io](https://rccl.readthedocs.io)
## RCCL 2.21.5 for ROCm 6.3.0
### Added
* MSCCL++ integration for specific contexts
* Performance collection to rccl_replayer
* Tuner Plugin example for MI300
* Tuning table for large number of nodes
* Support for amdclang++
* New Rome model
### Changed
* Compatibility with NCCL 2.21.5
* Increased channel count for MI300X multi-node
* Enabled MSCCL for single-process multi-threaded contexts
* Enabled gfx12
* Enabled CPX mode for MI300X
* Enabled tracing with rocprof
* Improved version reporting
* Enabled GDRDMA for Linux kernel 6.4.0+
### Resolved issues
* Fixed model matching with PXN enable
## RCCL 2.20.5 for ROCm 6.2.1
### Fixed
- GDR support flag now set with DMABUF
### Known issues
- On systems running Linux kernel 6.8.0, such as Ubuntu 24.04, Direct Memory Access (DMA) transfers between the GPU and NIC are disabled and impacts multi-node RCCL performance.
- This issue was reproduced with RCCL 2.20.5 (ROCm 6.2.0 and 6.2.1) on systems with Broadcom Thor-2 NICs and affects other systems with RoCE networks using Linux 6.8.0 or newer.
- Older RCCL versions are also impacted.
- This issue will be addressed in a future ROCm release.
On systems running Linux kernel 6.8.0, such as Ubuntu 24.04, Direct Memory Access (DMA) transfers between the GPU and NIC are disabled and impacts multi-node RCCL performance.
This issue was reproduced with RCCL 2.20.5 (ROCm 6.2.0 and 6.2.1) on systems with Broadcom Thor-2 NICs and affects other systems with RoCE networks using Linux 6.8.0 or newer.
Older RCCL versions are also impacted.
This issue will be addressed in a future ROCm release.
## Unreleased - RCCL 2.20.5 for ROCm 6.2.0
## RCCL 2.20.5 for ROCm 6.2.0
### Changed
- Compatibility with NCCL 2.20.5
- Compatibility with NCCL 2.19.4