[Background] When lazy linking is used for a library that calls the hipExtLaunchMultiKernelMultiDevice API, the API can pick up the wrong program_state object when looking up device kernels, which leads to a "No device code available" error.
To fix this issue, the API was refactored to be inline; it now gets the correct program_state and passes it to an internal HIP API that requests the multi-device kernel launch.
[ROCm/hip commit: 68cc787781]
SWDEV-212749:
o Recent changes to “add support for extended launch” require hip_runtime.h to be included in hip_ext.h
o The order in which external applications include hip_hcc.h/hip_runtime.h causes a compilation failure
[ROCm/hip commit: e60dec51da]
Handled the HCC version check correctly, as a few of the directed tests (SWDEV-212161) were failing when hcc was bumped to 3.0.
[ROCm/hip commit: 6b06911ef1]
This fixes issue #1621. It also adds tests for is_callable with C++11, C++14, and C++17.
The fallback implementation was completely broken, so I rewrote it so that it passes the tests as well. This should be used instead of PR #1631.
[ROCm/hip commit: 8519a1411c]
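For context on the is_callable commit above, a callable-detection trait of this kind can be written in C++11 with expression SFINAE. The following is a minimal sketch, not the HIP implementation: `is_callable_sketch`, `make_void`, `Adder`, and `FnPtr` are hypothetical names, and the trait only checks the plain call syntax `f(args...)` (no member pointers).

```cpp
#include <type_traits>
#include <utility>

// C++11 stand-in for std::void_t (which only arrived in C++17).
template <typename...>
struct make_void { using type = void; };

// Hypothetical trait for illustration; not HIP's actual is_callable.
// Primary template: chosen when the call expression is ill-formed.
template <typename, typename = void>
struct is_callable_sketch : std::false_type {};

// Specialization: chosen only when F can be called with Args...
template <typename F, typename... Args>
struct is_callable_sketch<
    F(Args...),
    typename make_void<
        decltype(std::declval<F>()(std::declval<Args>()...))>::type>
    : std::true_type {};

// Small checks that compile under C++11, C++14, and C++17.
struct Adder {
    int operator()(int a, int b) const { return a + b; }
};
using FnPtr = int (*)(int);

static_assert(is_callable_sketch<Adder(int, int)>::value, "functor, right arity");
static_assert(!is_callable_sketch<Adder(int)>::value, "functor, wrong arity");
static_assert(is_callable_sketch<FnPtr(int)>::value, "function pointer");
static_assert(!is_callable_sketch<int(int)>::value, "int is not callable");
```

The query is spelled as a function type, `F(Args...)`, so a single template parameter carries both the callee and its argument types.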
- Fix 2 runtime API prototypes:
`hipOccupancyMaxActiveBlocksPerMultiprocessor` and
`hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags`
- Add the missing function templates for them in hip-clang.
[ROCm/hip commit: 5c8a7521f4]
This fixes a deadlock introduced by the switch to TTAS loops, and is therefore mildly urgent (to prevent the CI from hoovering in the broken code).
[ROCm/hip commit: a855a13c22]
Adds hipMemcpy2DFromArray and hipMemcpy2DFromArrayAsync, equivalent to cudaMemcpy2DFromArray and cudaMemcpy2DFromArrayAsync.
[ROCm/hip commit: 356765a223]
* Make CAS loops use the TTAS idiom.
* More efficient re-formulation of TTAS.
* Fix typo.
* The typo was not quite a typo
[ROCm/hip commit: 9ba25b42c8]
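For readers unfamiliar with the idiom named in the commit above: test-and-test-and-set (TTAS) spins on a plain load and attempts the atomic read-modify-write only when the lock looks free, which reduces cache-line contention compared with retrying the atomic operation in a tight loop. The following is a minimal sketch using `std::atomic`, not the code from this commit; `TtasLock` is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical spinlock illustrating the TTAS idiom.
class TtasLock {
    std::atomic<bool> locked_{false};

public:
    void lock() {
        for (;;) {
            // Test: spin on a read-only load while the lock is held.
            while (locked_.load(std::memory_order_relaxed)) {
            }
            // Test-and-set: attempt the exchange only once it looked free.
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;
            }
        }
    }

    bool try_lock() {
        // Same two-step check, without spinning.
        return !locked_.load(std::memory_order_relaxed) &&
               !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() {
        // Precondition: the lock is currently held.
        assert(locked_.load(std::memory_order_relaxed));
        locked_.store(false, std::memory_order_release);
    }
};
```

Because the class provides `lock()` and `unlock()`, it can be used with `std::lock_guard<TtasLock>`. Note that the outer `for` loop matters: after the read-only spin exits, another thread may still win the exchange, so the waiter must go back to spinning rather than assume it holds the lock.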
* [HIP] Introduce library_types.h as a common header for libs
[Reason]
Currently, hipFFT, hipBLAS and other HIP libs use their own data types, prefixed with HIPFFT or HIPBLAS, whereas in CUDA those types are common and declared in library_types.h
[TODO]
Switch hipFFT, hipBLAS and other HIP libs to use common library_types.h.
* [HIP] Move include for library_types.h to hip_runtime.h
[Reason]
Mirror CUDA's behaviour, where library_types.h is included in cuda_runtime.h
[ROCm/hip commit: 94eb4155dd]
This fixes the usage of an uninitialized cdattr variable in hipDeviceGetAttribute for the CUDA backend when taking the switch default, as detailed in #1317.
Note that the directed_tests/runtimeApi/device/hipGetDeviceAttribute.tst test fails for me, but it already did before applying this patch. Let's see what CI says!
[ROCm/hip commit: 9ababa4276]
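The class of bug described in the commit above can be shown in isolation: a translation switch that leaves its output unset in the `default` branch, followed by code that reads it. The following is a minimal sketch of one way to guard against it, using hypothetical enums and a stand-in query function (`FrontAttr`, `BackAttr`, `Status`, `backend_query`, `get_attribute`) rather than the real HIP or CUDA types; it is not the patch itself.

```cpp
#include <cassert>

// Hypothetical front-end and back-end attribute enums.
enum class FrontAttr { MaxThreads, WarpSize, Unsupported };
enum class BackAttr { MaxThreads, WarpSize };
enum class Status { Success, InvalidValue };

// Stand-in for the backend query; the values are arbitrary.
int backend_query(BackAttr a) {
    return a == BackAttr::WarpSize ? 32 : 1024;
}

Status get_attribute(int* out, FrontAttr attr) {
    assert(out != nullptr);
    BackAttr mapped;  // set on every path that reaches the query below
    switch (attr) {
        case FrontAttr::MaxThreads:
            mapped = BackAttr::MaxThreads;
            break;
        case FrontAttr::WarpSize:
            mapped = BackAttr::WarpSize;
            break;
        default:
            // Return before `mapped` is read, so it is never used unset.
            return Status::InvalidValue;
    }
    *out = backend_query(mapped);
    return Status::Success;
}
```

With a `break` in the `default` branch instead of the early return, `mapped` would reach `backend_query` without ever having been assigned.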
Added new memory APIs: hipMemAllocPitch, hipMemAllocHost, hipMemsetD16, hipMemsetD16Async, hipMemsetD8Async.
Modified hipMemcpyParam2DAsync and hipMemcpyParam2D to support all scenarios.
[ROCm/hip commit: ba9c6e13e4]