[clr] SWDEV-566950 - Adding changelog for 7.2 (#1891)

* [clr] SWDEV-566950 - Adding changelog for 7.2
* Update CHANGELOG.md
* Update CHANGELOG.md
2025-11-19 12:10:14 -05:00
@@ -2,6 +2,13 @@

 Full documentation for HIP is available at [rocm.docs.amd.com](https://rocm.docs.amd.com/projects/HIP/en/latest/index.html)

+## HIP 8.0 for ROCm 8.0
+
+### Added
+
+* New HIP APIs
+    - `hipKernelGetParamInfo` returns the offset and size of a kernel parameter
+
 ## HIP 7.2 for ROCm 7.2

 ### Added
@@ -17,15 +24,27 @@ Full documentation for HIP is available at [rocm.docs.amd.com](https://rocm.docs
    - `hipLibraryGetKernelCount` gets kernel count in library
    - `hipStreamCopyAttributes` copies attributes from source stream to destination stream
    - `hipOccupancyAvailableDynamicSMemPerBlock` returns dynamic shared memory available per block when launching numBlocks blocks on CU.
-    - `hipKernelGetParamInfo`   returns the offset and size of a kernel parameter
-* New HIP flags  
-    - `hipMemLocationTypeHost`, enables handling virtual memory management in host memory location, in addition to device memory. 
-    - `hipHostRegisterIoMemory` is supported in `hipHostRegister`, used to register I/O memory with HIP runtime so it can be accessed by the GPU.
+* New HIP flags
+    - `hipMemLocationTypeHost` enables virtual memory management in host memory locations, in addition to device memory.
    - Support for flags in `hipGetProcAddress`, enables searching for the per-thread version symbols.
      - `HIP_GET_PROC_ADDRESS_DEFAULT`
      - `HIP_GET_PROC_ADDRESS_LEGACY_STREAM`
      - `HIP_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM`

+### Resolved issues
+
+* Corrected the calculation of the maximum shared memory per multiprocessor reported in HIP device properties.
+
+### Optimized
+
+* Graph node scaling:
+The HIP runtime implements an optimized doorbell ring mechanism for certain graph execution topologies, which enables efficient batching of graph nodes. This enhancement aligns HIP more closely with CUDA Graph optimizations.
+HIP also adds a performance test for HIP graphs with programmable topologies, which measures graph performance across different structures. The test evaluates graph instantiation time, first launch time, repeat launch times, and end-to-end execution time for various graph topologies, and its timing measurements include both CPU overhead and device execution time.
+* Back memory set (`memset`) optimization:
+The HIP runtime now implements a back memory set (`memset`) optimization to improve how `memset` nodes are processed during graph execution. This enhancement specifically handles the varying number of AQL (Architected Queue Language) packets that a `memset` graph node requires when its node parameters are set, under the AQL batch submission approach.
+* Async handler performance improvement:
+The HIP runtime has removed lock contention in the async handler enqueue path. This enhancement reduces runtime overhead and improves GPU throughput for asynchronous kernel execution, especially in multi-threaded applications.
+
 ## HIP 7.1.1 for ROCm 7.1.1

 ### Added