[clr] SWDEV-566950 - Adding changelog for 7.2 (#1891)

* [clr]SWDEV-566950 - Adding changelog for 7.2

* Update CHANGELOG.md

* Update CHANGELOG.md
Julia Jiang
2025-11-19 12:10:14 -05:00
committed by GitHub
parent 56a829995e
commit 78a9d9ff70
+23 -4
@@ -2,6 +2,13 @@
Full documentation for HIP is available at [rocm.docs.amd.com](https://rocm.docs.amd.com/projects/HIP/en/latest/index.html)
## HIP 8.0 for ROCm 8.0
### Added
* New HIP APIs
- `hipKernelGetParamInfo` returns the offset and size of a kernel parameter
## HIP 7.2 for ROCm 7.2
### Added
@@ -17,15 +24,27 @@ Full documentation for HIP is available at [rocm.docs.amd.com](https://rocm.docs
- `hipLibraryGetKernelCount` gets kernel count in library
- `hipStreamCopyAttributes` copies attributes from source stream to destination stream
- `hipOccupancyAvailableDynamicSMemPerBlock` returns dynamic shared memory available per block when launching numBlocks blocks on CU.
- `hipKernelGetParamInfo` returns the offset and size of a kernel parameter
* New HIP flags
- `hipMemLocationTypeHost`, enables handling virtual memory management in host memory location, in addition to device memory.
- `hipHostRegisterIoMemory` is supported in `hipHostRegister`, used to register I/O memory with HIP runtime so it can be accessed by the GPU.
* New HIP flags
- `hipMemLocationTypeHost`, enables handling virtual memory management in host memory location, in addition to device memory.
- Support for flags in `hipGetProcAddress`, enables searching for the per-thread version symbols.
- `HIP_GET_PROC_ADDRESS_DEFAULT`
- `HIP_GET_PROC_ADDRESS_LEGACY_STREAM`
- `HIP_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM`
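To make "the offset and size of a kernel parameter" (as reported by `hipKernelGetParamInfo`) concrete, here is a minimal sketch of how offsets can arise when arguments are packed at their natural alignment. It does not call the HIP runtime. `ParamType`, `ParamInfo`, `alignUp`, and `layoutParams` are hypothetical names introduced for illustration, and the packing rule is an assumption: the real layout is defined by the compiler and runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one kernel parameter type.
struct ParamType {
    std::size_t size;   // size of the parameter in bytes
    std::size_t align;  // required alignment in bytes
};

// Offset and size of one parameter inside the argument buffer.
struct ParamInfo {
    std::size_t offset;
    std::size_t size;
};

// Round `value` up to the next multiple of `align`.
inline std::size_t alignUp(std::size_t value, std::size_t align) {
    assert(align != 0);
    return (value + align - 1) / align * align;
}

// Assumption: parameters are laid out back to back, each at its natural
// alignment (C-struct-like packing). This is an illustration only, not the
// HIP runtime's implementation.
inline std::vector<ParamInfo> layoutParams(const std::vector<ParamType>& params) {
    std::vector<ParamInfo> out;
    std::size_t cursor = 0;
    for (const ParamType& p : params) {
        cursor = alignUp(cursor, p.align);  // pad up to the required alignment
        out.push_back({cursor, p.size});
        cursor += p.size;                   // next parameter starts after this one
    }
    return out;
}
```

Under this assumed rule, a kernel taking `(int, double*, char, float)` on a 64-bit target, described as `{{4, 4}, {8, 8}, {1, 1}, {4, 4}}`, yields offsets 0, 8, 16, and 20. The padding after the `int` is why an offset query is useful: the offsets cannot be derived by summing sizes alone.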
### Resolved issues
* Corrected the calculation of the maximum shared memory per multiprocessor in HIP device properties.
### Optimized
* Graph node scaling:
The HIP runtime implements an optimized doorbell ring mechanism for certain graph execution topologies, which enables efficient batching of graph nodes. This enhancement aligns HIP more closely with CUDA Graph optimizations.
HIP also adds a new performance test for HIP graphs with programmable topologies to measure graph performance across different structures. The test evaluates graph instantiation time, first launch time, repeat launch times, and end-to-end execution for various graph topologies. The test implements comprehensive timing measurements including CPU overhead and device execution time.
* Back memory set (`memset`) optimization:
The HIP runtime now implements a back memory set (`memset`) optimization to improve how `memset` nodes are processed during graph execution. This enhancement specifically handles the varying number of AQL (Architected Queue Language) packets for a `memset` graph node that results from graph node set params in the AQL batch submission approach.
* Async handler performance improvement:
The HIP runtime has removed lock contention in the async handler enqueue path. This enhancement reduces runtime overhead and maximizes GPU throughput for asynchronous kernel execution, especially in multi-threaded applications.
## HIP 7.1.1 for ROCm 7.1.1
### Added