libc++ defines fma as a template function so that mixed-type arguments are
promoted automatically. libc++ does not handle _Float16 because _Float16 is
not a type supported by the C++ standard. As such, it is unlikely that we can
commit our _Float16 fix to libc++ trunk.
Therefore we handle _Float16 with a template specialization of
__numeric_type in the HIP headers.
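The idea can be illustrated with a minimal sketch. Everything here is hypothetical: `numeric_type` is a simplified analogue of libc++'s internal `__numeric_type`, and `half_t` is a stand-in for `_Float16` so the example builds on any host compiler. It only shows how one extra specialization makes an unsupported type visible to a promotion trait, not how the real header is written.

```cpp
#include <type_traits>

// Hypothetical stand-in for _Float16, used so the sketch builds anywhere.
struct half_t { float storage; };

// Simplified, hypothetical analogue of a promotion trait:
// `value` says whether T may take part in promotion, `type` is what it promotes to.
template <class T, bool = std::is_arithmetic<T>::value>
struct numeric_type {
    static constexpr bool value = false;  // not a numeric type by default
};

template <class T>
struct numeric_type<T, true> {
    static constexpr bool value = true;
    // Integers promote to double; floating-point types stay as they are.
    using type = typename std::conditional<std::is_integral<T>::value, double, T>::type;
};

// The extra specialization: teach the trait about the type it does not know.
template <>
struct numeric_type<half_t> {
    static constexpr bool value = true;
    using type = half_t;
};

// A mixed-type template such as fma would only be viable when every
// argument type is known to the trait.
template <class A, class B, class C>
struct all_numeric
    : std::integral_constant<bool, numeric_type<A>::value &&
                                   numeric_type<B>::value &&
                                   numeric_type<C>::value> {};
```

Without the `half_t` specialization, `all_numeric<half_t, float, int>::value` would be false and the mixed-type overload would drop out of overload resolution.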
Change-Id: If01960a657ebf1a7a67463cdcf66fab7458dff3c
* Fix cooperative launch APIs to set hipGetLastError
Previously, the cooperative launch APIs did not properly log their
errors in the global hipGetLastError variable before returning
to the user. As such, the APIs would leave hipSuccess in the
last error even after a failure, which would break some use cases.
This fixes the problem by adding a trampoline function that does
the HIP_INIT_API and ihipLogStatus.
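A minimal sketch of the trampoline pattern, under stated assumptions: `Status`, `launch_impl`, `launch`, `get_last_error`, and `g_last_error` are hypothetical stand-ins, not HIP names, and the real HIP_INIT_API and ihipLogStatus macros do more than this. The point is only that the public entry point records the status before returning it.

```cpp
#include <cassert>

// Hypothetical stand-ins for hipError_t values.
enum Status { kSuccess = 0, kInvalidValue = 1 };

// Stand-in for the per-thread state that a "get last error" call reads.
thread_local Status g_last_error = kSuccess;

// Hypothetical internal implementation: returns a status but records nothing.
Status launch_impl(int grid_dim) {
    return grid_dim > 0 ? kSuccess : kInvalidValue;
}

// Trampoline: the public entry point calls the implementation,
// records the result, and only then returns it to the caller.
Status launch(int grid_dim) {
    Status status = launch_impl(grid_dim);
    g_last_error = status;
    return status;
}

// Returns the recorded error and resets it to success.
Status get_last_error() {
    Status err = g_last_error;
    g_last_error = kSuccess;
    return err;
}

// Usage: a failed launch is now visible through get_last_error().
void usage_example() {
    assert(launch(0) == kInvalidValue);
    assert(get_last_error() == kInvalidValue);
    assert(get_last_error() == kSuccess);  // reading resets the stored error
}
```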
* Add missing flag to the log of multi-GPU launch
HIP-Clang cuda_wrapper headers require the clang include path to come before the standard C++ include path.
However, the libc++ include path must come before the clang include path.
To work around this, we pass -isystem with the parent directory of the clang include
path instead of the clang include path itself.
GCC emits a warning about using static functions like
hipCUDAErrorTohipError inside this function, because it is declared
inline but not static. Add static to this function
to silence the warnings (and prevent potential problems in the future).
NVCC warned when hipOccupancyMaxActiveBlocksPerMultiprocessor was called
with a device function pointer, because "const void* func" was
insufficient to describe it accurately. Add a C++ definition of this
function that is templated on a class type.
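A rough sketch of the shape of such a template, assuming a hypothetical untyped entry point. `occupancy_untyped`, `occupancy`, and `my_kernel` are illustrative names, and the arithmetic inside is a placeholder, not an occupancy model. The template keeps the kernel's real type at the call site and converts to `const void*` in one place.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an untyped, C-style entry point.
int occupancy_untyped(int* num_blocks, const void* func, int block_size,
                      std::size_t dyn_shared_mem) {
    (void)dyn_shared_mem;  // unused in this placeholder
    if (num_blocks == nullptr || func == nullptr || block_size <= 0) return 1;
    *num_blocks = 1024 / block_size;  // placeholder arithmetic only
    return 0;
}

// Templated overload: the caller passes the function with its own type F,
// so no implicit conversion to const void* happens at the call site.
template <class F>
int occupancy(int* num_blocks, F func, int block_size, std::size_t dyn_shared_mem) {
    return occupancy_untyped(num_blocks, reinterpret_cast<const void*>(func),
                             block_size, dyn_shared_mem);
}

// Hypothetical kernel-like function used for the usage example.
void my_kernel(float*, int) {}

void usage_example() {
    int blocks = 0;
    assert(occupancy(&blocks, my_kernel, 256, 0) == 0);
    assert(blocks == 4);
}
```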
- Check the availability of the `__has_attribute` builtin macro
instead of compiler versions. That's more reliable and portable across
various compilers.
- Provide very basic vector support for unknown compilers.
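A minimal sketch of feature detection with a fallback. The choice of the `ext_vector_type` attribute and the names `SKETCH_HAS_EXT_VECTOR`, `float4_t`, and `sum` are assumptions made for illustration. The check is nested so that compilers without `__has_attribute` never see it used as a function-like macro.

```cpp
#include <cassert>

// Detect the attribute itself rather than testing compiler version macros.
#if defined(__has_attribute)
#  if __has_attribute(ext_vector_type)
#    define SKETCH_HAS_EXT_VECTOR 1
#  endif
#endif
#ifndef SKETCH_HAS_EXT_VECTOR
#  define SKETCH_HAS_EXT_VECTOR 0
#endif

#if SKETCH_HAS_EXT_VECTOR
// Native vector type where the compiler offers it.
typedef float float4_t __attribute__((ext_vector_type(4)));
#else
// Very basic fallback for compilers without the attribute.
struct float4_t { float x, y, z, w; };
#endif

// Works with either definition, since both expose x, y, z, w.
inline float sum(float4_t v) { return v.x + v.y + v.z + v.w; }

inline void usage_example() {
    float4_t v = {1.0f, 2.0f, 3.0f, 4.0f};
    assert(sum(v) == 10.0f);
}
```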
There are now two implementations of printf in HIP:
1. The implementation for HCC is controlled by the HC_FEATURE_PRINTF
macro, and it works only with the HCC compiler used in combination
with the HCC runtime.
2. The implementation for hip-clang requires the VDI runtime, and is
always enabled with that combination.
* Device texture functions should not normalize the sampled pixel. This is already done by HW.
* Add support for using the h/w capability for normalized float data conversion in the driver APIs
Co-authored-by: ansurya <50609411+ansurya@users.noreply.github.com>
* Add missing texturePitchAlignment member to the hipDeviceProp_t struct.
* Add missing hipDeviceAttributeTexturePitchAlignment enumerator to the hipDeviceAttribute_t enum.
* Initialize texturePitchAlignment to 256. This works for gfx9+, but is technically overaligned in most cases for pre-gfx9.
* Add the texturePitchAlignment property to the NVCC path.
Fixes SWDEV-218626 and SWDEV-218629
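As an illustration of how a caller might use such an alignment value, here is a small sketch that rounds a row size up to the next multiple of the alignment. `kTexturePitchAlignment` and `align_pitch` are hypothetical names, and the constant simply mirrors the value of 256 mentioned above. Real code would read the property from the device rather than hard-code it.

```cpp
#include <cstddef>

// Assumed alignment in bytes, mirroring the initial value described above.
constexpr std::size_t kTexturePitchAlignment = 256;

// Round a row size in bytes up to the next multiple of `alignment`.
constexpr std::size_t align_pitch(std::size_t row_bytes,
                                  std::size_t alignment = kTexturePitchAlignment) {
    return (row_bytes + alignment - 1) / alignment * alignment;
}
```

For example, a 1000-byte row would be padded to a 1024-byte pitch under this rule.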
Changes:
- Revert "`static inline` in a header, just like excess sugar in a diet, causes bloat (#1692)"
This reverts commit be70b9f7e7.
- Revert "Fix rocFFT build failure (#1777)"
This reverts commit 753277422a.