[Background] it was found that if lazy linking used for a library that calls hipExtLaunchMultiKernelMultiDevice API then this API can get the wrong program_state object for looking up device kernels leading to a "No device code available" error in this API.
To fix this issue, the API was refactored to be inline and get and pass the correct program_state to an internal hip API to request a multi-device kernel launch.
SWDEV-212749:
o Recent changes to “add support for extended launch” require hip_runtime.h to be include in hip_ext.h
o Order in which external applications include hip_hcc.h/hip_runtime.h causes compilation failure
This will fix issue #1621. It also adds tests for is_callable with c++11, c++14, and c++17.
The fallback implementation was completely broken so I rewrote it so it pass the tests as well. This should be used instead of PR #1631.
* Add support for extended launch syntax.
* Add unit test.
* Fix typo
* hipExtLaunchKernelGGL lives in hip_ext.h
Change-Id: Ice32dab0d43475fda65c6a910c11416871a8f2ff
* [dtest] remove redundant include from hipModuleGetGlobal dtest
Texture 2D image mapping for pitched arrays:
github issue: Texture Object's Buffer seems to be Misaligned #886
JIRA ticket: SWDEV-199313
SWDEV-151670 : Fixed issue with 3D texture with 4 components
SWDEV-151671 : Issue with 2D layered texture with 4 components
- Fix 2 runtime API prototypes
`hipOccupancyMaxActiveBlocksPerMultiprocessor` and
`hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags`
- Add missing function templates of them in hip-clang.
* [HIP] Introduce library_types.h as a common header for libs
[Reason]
Currently, hipFFT, hipBLAS and other HIP libs use their own data types, prefixed with HIPFFT or HIPBLAS, whereas in CUDA those types are common and declared in library_types.h
[TODO]
Switch hipFFT, hipBLAS and other HIP libs to use common library_types.h.
* [HIP] Move include for library_types.h to hip_runtime.h
[Reason]
Repeat CUDA's behaviour, where library_types.h is included in cuda_runtime.h
This fixes the usage of an uninitialized cdattr variable in hipDeviceGetAttribute for the CUDA backend when taking the switch default, as detailed in #1317.
Note that the directed_tests/runtimeApi/device/hipGetDeviceAttribute.tst test fails for me, but it already did before applying this patch. Let's see what CI says!
Added new memory API's hipMemAllocPitch, hipMemAllocHost, hipMemsetD16, hipMemsetD16Async, hipMemsetD8Async
Modified to support all scenarios hipMemcpyParam2DAsync, hipMemcpyParam2D.