Full documentation for HIP is available at [rocm.docs.amd.com](https://rocm.docs.amd.com/projects/HIP/en/latest/index.html)

## HIP 8.0 for ROCm 8.0

### Added

* New HIP APIs
    - `hipKernelGetParamInfo` returns the offset and size of a kernel parameter
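
As a rough mental model of what `hipKernelGetParamInfo` reports, the following sketch computes an offset and a size for each parameter of a kernel. It assumes that parameters are packed in declaration order at their natural alignment, which is the same rule C uses for struct members. This is an illustrative assumption: the real layout is whatever the compiler emits for the kernel's argument buffer. `ParamType`, `ParamInfo`, and `layoutParams` are hypothetical names and are not part of the HIP API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one kernel parameter: its size and alignment
// in bytes (for example, sizeof(T) and alignof(T)).
struct ParamType {
    std::size_t size;
    std::size_t align;
};

// What the sketch computes for each parameter: the same two values the
// changelog entry says the API returns.
struct ParamInfo {
    std::size_t offset; // byte offset of the parameter in the argument buffer
    std::size_t size;   // size of the parameter in bytes
};

// Lays out parameters in declaration order, placing each one at the next
// offset that is a multiple of its alignment (the C struct-layout rule).
std::vector<ParamInfo> layoutParams(const std::vector<ParamType>& params) {
    std::vector<ParamInfo> infos;
    std::size_t offset = 0;
    for (const ParamType& p : params) {
        // Alignment must be a nonzero power of two for the bit trick below.
        assert(p.align > 0 && (p.align & (p.align - 1)) == 0);
        offset = (offset + p.align - 1) & ~(p.align - 1); // round up
        infos.push_back({offset, p.size});
        offset += p.size;
    }
    return infos;
}
```

Consider a hypothetical kernel declared as `__global__ void k(int n, const float* x, double a, char c)` on a 64-bit target. Under this model, `layoutParams({{4, 4}, {8, 8}, {8, 8}, {1, 1}})` yields offsets 0, 8, 16, and 24.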
## HIP 7.2 for ROCm 7.2

### Added
- `hipLibraryGetKernelCount` returns the number of kernels in a library
- `hipStreamCopyAttributes` copies attributes from a source stream to a destination stream
- `hipOccupancyAvailableDynamicSMemPerBlock` returns the dynamic shared memory available per block when launching `numBlocks` blocks on a CU
* New HIP flags
    - `hipMemLocationTypeHost` enables virtual memory management in host memory locations, in addition to device memory.
    - `hipHostRegisterIoMemory` is supported in `hipHostRegister`. It registers I/O memory with the HIP runtime so that the GPU can access it.
    - Support for flags in `hipGetProcAddress`, which enables searching for the per-thread versions of symbols:
        - `HIP_GET_PROC_ADDRESS_DEFAULT`
        - `HIP_GET_PROC_ADDRESS_LEGACY_STREAM`
        - `HIP_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM`
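
`hipOccupancyAvailableDynamicSMemPerBlock` inverts the usual occupancy question. Instead of asking how many blocks fit, it asks how much dynamic shared memory each block can still use if `numBlocks` blocks are to be resident on one CU. The following sketch is a simplified model that considers only the shared-memory (LDS) capacity of a CU. The actual API may account for additional device limits. `CuLimits`, `availableDynamicSmemPerBlock`, and the example values are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical per-CU limits, in bytes. Only shared memory is modeled.
struct CuLimits {
    std::size_t ldsPerCu;         // total shared memory (LDS) per CU
    std::size_t maxLdsPerBlock;   // shared memory limit for a single block
    std::size_t allocGranularity; // per-block allocations round up to this
};

// Largest dynamic shared memory size per block that still lets `numBlocks`
// blocks, each also using `staticSmem` bytes of static shared memory, be
// resident on one CU at the same time. Returns 0 when no dynamic shared
// memory is left, including when the blocks cannot fit at all.
std::size_t availableDynamicSmemPerBlock(const CuLimits& cu, int numBlocks,
                                         std::size_t staticSmem) {
    assert(cu.allocGranularity > 0);
    if (numBlocks <= 0) return 0;
    // Even share of the CU's LDS, rounded down to the allocation granularity,
    // because each block's total allocation is rounded up to it.
    std::size_t share = cu.ldsPerCu / static_cast<std::size_t>(numBlocks);
    share -= share % cu.allocGranularity;
    // A single block can never exceed the per-block limit.
    share = std::min(share, cu.maxLdsPerBlock);
    return share > staticSmem ? share - staticSmem : 0;
}
```

Take the illustrative limits `CuLimits{65536, 65536, 256}`. Asking for 4 resident blocks that each use 1024 bytes of static shared memory leaves 15360 bytes of dynamic shared memory per block under this model.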
### Resolved issues

* Corrected the calculation of the maximum shared memory per multiprocessor reported in HIP device properties.
### Optimized

* Graph node scaling:

    The HIP runtime implements an optimized doorbell-ring mechanism for certain graph execution topologies, which enables efficient batching of graph nodes. This enhancement brings HIP graph execution into closer alignment with CUDA Graph optimizations.

    HIP also adds a performance test for HIP graphs with programmable topologies, which measures graph performance across different structures. For each topology, the test measures instantiation time, first launch time, repeat launch times, and end-to-end execution, including both CPU overhead and device execution time.

* Back memory set (`memset`) optimization:

    The HIP runtime now implements a back memory set (`memset`) optimization that improves how `memset` nodes are processed during graph execution. Specifically, it handles the varying number of AQL (Architected Queuing Language) packets that a `memset` graph node can need after its parameters are changed through graph node set-params calls, as required by the AQL batch submission approach.

* Async handler performance improvement:

    The HIP runtime removes lock contention from the async handler enqueue path. This reduces runtime overhead and improves GPU throughput for asynchronous kernel execution, especially in multi-threaded applications.
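
The graph performance test described under *Graph node scaling* reports instantiation, first-launch, repeat-launch, and end-to-end timings separately. The following is a minimal host-side sketch of that kind of harness. It is not the HIP test itself. The hypothetical `timeGraph` function takes callbacks that, in a real program, would wrap calls such as `hipGraphInstantiate`, `hipGraphLaunch`, and `hipStreamSynchronize`. It measures wall-clock time only. Device execution time would need GPU-side events, for example `hipEventRecord` and `hipEventElapsedTime`.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

using Clock = std::chrono::steady_clock;

// Wall-clock timings in microseconds for one graph topology.
struct GraphTimings {
    double instantiateUs;     // instantiation only
    double firstLaunchUs;     // first launch, waited on until completion
    double launchOverheadUs;  // mean host-side cost of one repeat launch call
    double avgRepeatUs;       // mean time per repeat launch, including the final wait
    double endToEndUs;        // instantiation plus every launch and wait
};

static double elapsedUs(Clock::time_point start, Clock::time_point end) {
    return std::chrono::duration<double, std::micro>(end - start).count();
}

// Hypothetical harness: the callbacks stand in for instantiating the graph,
// launching it, and waiting for the launched work to finish.
GraphTimings timeGraph(const std::function<void()>& instantiate,
                       const std::function<void()>& launch,
                       const std::function<void()>& synchronize,
                       int repeats) {
    assert(repeats > 0);
    GraphTimings t{};
    const auto begin = Clock::now();

    instantiate();
    const auto instantiated = Clock::now();
    t.instantiateUs = elapsedUs(begin, instantiated);

    // The first launch often includes one-time setup, so time it separately.
    launch();
    synchronize();
    const auto firstDone = Clock::now();
    t.firstLaunchUs = elapsedUs(instantiated, firstDone);

    // Enqueue the repeat launches back to back, then wait once.
    for (int i = 0; i < repeats; ++i) {
        launch();
    }
    const auto enqueued = Clock::now();
    synchronize();
    const auto end = Clock::now();

    t.launchOverheadUs = elapsedUs(firstDone, enqueued) / repeats;
    t.avgRepeatUs = elapsedUs(firstDone, end) / repeats;
    t.endToEndUs = elapsedUs(begin, end);
    return t;
}
```

Separating the time spent in the launch calls from the time until completion lets a harness like this report host-side launch overhead independently of how long the queued work takes to drain.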
## HIP 7.1.1 for ROCm 7.1.1

### Added