1. Updated the FAQ (hip_faq.md) to note that shfl*_sync is not supported
2. Corrected some input parameter descriptions in hcc_details/hip_runtime_api.h
3. Redirect shfl*() to shfl_*_sync() for nvcc path where CUDA > 9.0
Change-Id: I3d8184db5fcc622852c9bad96b706348e8dfc16c
This technique should never be used; the functionality should only be
accessed through __builtins.
There's currently no builtin for groupstaticsize. I left ds_swizzle
alone since, for some reason, it switches to the builtin depending on
whether __HCC__ is defined.
Change-Id: If1e1394221dba83ea4add6db5e94d6b715552044
This change is required by AMDMIGraphX.
It was for HCC only. HIP-Clang also needs it for __fp16 since AMDMIGraphX uses it.
Change-Id: Id49322b7b89ef799accdf6b47627a6fce51d1ab5
These might contain garbage causing the runtime to incorrectly parse the state of the texture references.
Change-Id: I93c726fa30b580b3e14c50ac939f3c71b0d1c8d9
* Disable device side malloc
Currently device side malloc is not working and takes excessive
device memory.
Disable it for now until a working malloc is implemented.
Change-Id: I1ad908c1c53a83752383b4be96688a848642c699
This is a cherry-pick of 9ead991784
and https://github.com/ROCm-Developer-Tools/HIP/pull/2009
Fix cmake config file
Removed cmake target files under packaging directory.
Merged cmake config .in files for HIP-Clang and HCC as one.
Use cmake generated target files in both install and packaging.
This makes cmake config file consistent for make install and
make package.
Let device side malloc/free return nullptr and trap
Change-Id: I448f3ea2d4934648089bad371debc203f895cba6
Support hipLaunchCooperativeKernelMultiDevice()
- Add validation logic for MGPU launches to pass a CUDA test
Change-Id: Iccca7fde43493fc3bc6685512d39202271ae3e92
If the code object is embedded in an already mapped file, and the
lifetime of the mapped file exceeds the lifetime of the executable,
we do not need to make a copy of the binary.
This allows the ROCR to present the code object URI as
file:///path/to/file#offset=X&size=Y.
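
As an illustration of the URI shape described above, here is a minimal sketch of
how such a string could be assembled. The helper name makeCodeObjectUriSketch is
hypothetical and is not part of ROCR or HIP; the sketch also assumes an absolute
path that needs no percent-encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not a ROCR/HIP API): builds a URI of the form
// file:///path/to/file#offset=X&size=Y for a code object embedded in a file.
// Assumption: `path` is absolute and contains no characters that would
// need percent-encoding.
inline std::string makeCodeObjectUriSketch(const std::string& path,
                                           std::uint64_t offset,
                                           std::uint64_t size) {
    assert(!path.empty() && path[0] == '/');
    return "file://" + path + "#offset=" + std::to_string(offset) +
           "&size=" + std::to_string(size);
}
```

For example, a code object of 8192 bytes at offset 4096 in /opt/app/libfoo.so
would be described as file:///opt/app/libfoo.so#offset=4096&size=8192.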
libc++ defines fma as a template function for automatic promotion of
mixed-type arguments. libc++ does not handle _Float16 because _Float16 is
not a type supported by the C++ standard. As such, it is unlikely we can
commit our fix for _Float16 to libc++ trunk.
Therefore we handle _Float16 with a template specialization of
__numeric_type in HIP headers.
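
The mechanism can be sketched roughly as follows. This is a simplified stand-in,
not the actual libc++ trait or the HIP header change: NumericTypeSketch,
HalfSketch, PromotedSketch and fmaSketch are hypothetical names, and HalfSketch
replaces _Float16 so that the example builds with any host compiler.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for _Float16.
struct HalfSketch {
    float value;
    operator float() const { return value; }
};

// Simplified promotion trait: only standard arithmetic types qualify,
// and integral types promote to double.
template <class T>
struct NumericTypeSketch {
    static constexpr bool value = std::is_arithmetic<T>::value;
    using type =
        typename std::conditional<std::is_integral<T>::value, double, T>::type;
};

// The extra specialization: teach the trait about the non-standard type.
// Mapping it to float is an assumption made for this sketch only.
template <>
struct NumericTypeSketch<HalfSketch> {
    static constexpr bool value = true;
    using type = float;
};

// Common type of the three promoted argument types.
template <class A, class B, class C>
using PromotedSketch =
    decltype(std::declval<typename NumericTypeSketch<A>::type>() +
             std::declval<typename NumericTypeSketch<B>::type>() +
             std::declval<typename NumericTypeSketch<C>::type>());

// Mixed-type fma: participates in overload resolution only when every
// argument type is known to the trait.
template <class A, class B, class C>
typename std::enable_if<NumericTypeSketch<A>::value &&
                            NumericTypeSketch<B>::value &&
                            NumericTypeSketch<C>::value,
                        PromotedSketch<A, B, C>>::type
fmaSketch(A a, B b, C c) {
    using R = PromotedSketch<A, B, C>;
    return std::fma(static_cast<R>(a), static_cast<R>(b), static_cast<R>(c));
}
```

Without the specialization, a call such as fmaSketch(HalfSketch{2.0f}, 3.0f, 1.0f)
would be rejected because the trait reports the type as unsupported; with it, the
call resolves and the result is computed in float.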
Change-Id: If01960a657ebf1a7a67463cdcf66fab7458dff3c
Even though the runtime and driver texture object API is one to one, the structs used by these APIs are not. See hipResourceDesc vs HIP_RESOURCE_DESC differences.
These differences are not trivial and most likely cannot be handled by hipify, so we need new API entry points.
Change-Id: Id4bcb1ad0ae15378dbdb5a2ed07e5ea30f320082
- Use symbol value as the query key. Compared to the symbol name, the
symbol value is more robust as developers may use unqualified or
qualified identifiers. It also removes the mangling and/or demangling
requirement for the runtime API.
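
A minimal host-side sketch of the idea, assuming the "symbol value" is the
address of the host-side variable. SymbolRegistrySketch, DeviceVarInfoSketch and
app::counter are hypothetical names used only for illustration; this is not the
runtime's actual data structure.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical record kept per registered variable.
struct DeviceVarInfoSketch {
    std::size_t size;
};

// Hypothetical registry keyed by the symbol's address instead of its name.
class SymbolRegistrySketch {
public:
    void registerVar(const void* hostSymbol, std::size_t size) {
        assert(hostSymbol != nullptr);
        table_[hostSymbol] = DeviceVarInfoSketch{size};
    }

    // Returns nullptr when the symbol was never registered.
    const DeviceVarInfoSketch* find(const void* hostSymbol) const {
        auto it = table_.find(hostSymbol);
        return it == table_.end() ? nullptr : &it->second;
    }

private:
    std::unordered_map<const void*, DeviceVarInfoSketch> table_;
};

// A variable that callers may name as `app::counter` or, after a
// using-declaration, simply as `counter`.
namespace app {
int counter = 0;
}
```

Both spellings of the identifier yield the same address, so the lookup succeeds
either way and no name mangling or demangling is involved.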
Change-Id: I9d4259f3842612c7cc98551269fc2092d8b5c19e
This is cherry-picked from PR#1947 that was committed to the
github repo. It allows printf to work with hip-clang and HCC
runtime.
Change-Id: I754753250ea1e694cf3441722e2d4c9d25fa75bc
This also adds declarations of all the missing texture APIs.
hipTexRefSet*() functions need to take a textureReference as a pointer for type erasure to work. The runtime has been modified to accommodate this.
This change only applies to VDI.
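
The type-erasure point can be illustrated with a small host-only sketch. All
names here (TextureReferenceSketch, TextureSketch, texRefSetFilterModeSketch,
FilterModeSketch) are hypothetical stand-ins, not the HIP declarations; the
sketch only assumes that the templated texture type derives from a common,
non-templated reference type.

```cpp
#include <cassert>

enum FilterModeSketch { kPoint = 0, kLinear = 1 };

// Non-templated base holding the state the runtime needs.
struct TextureReferenceSketch {
    FilterModeSketch filterMode = kPoint;
};

// Templated front end; the element type and dimension are erased once the
// object is viewed through a pointer to the base.
template <class T, int Dim>
struct TextureSketch : TextureReferenceSketch {};

// Taking a base pointer lets one function serve every instantiation and
// modify the caller's object in place.
inline bool texRefSetFilterModeSketch(TextureReferenceSketch* ref,
                                      FilterModeSketch mode) {
    if (ref == nullptr) return false;
    ref->filterMode = mode;
    return true;
}
```

Passing the base by value instead would slice the object and update a copy,
which is one reason a pointer parameter is the natural choice here.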
Change-Id: Icf43cc5bd44dfc2c39084b7fe56d5a793bf7319f
* Fix cooperative launch APIs to set hipGetLastError
Previously, the cooperative launch APIs did not properly log their
errors in the global hipGetLastError variable before returning back
to the user. As such, the APIs would leave hipSuccess in the
last error, which would break some use cases.
This fixes that problem by making a trampoline function that does
the HIP_INIT_API and ihipLogStatus.
* Add missing flag to the log of multi-GPU launch
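
The trampoline pattern described above can be sketched as follows. This is a
host-only mock under stated assumptions: StatusSketch, launchImplSketch,
launchApiSketch, logStatusSketch and getLastErrorSketch are hypothetical names
standing in for the real status type, internal implementation, public entry
point, ihipLogStatus and hipGetLastError respectively.

```cpp
#include <cassert>

enum StatusSketch { kSuccess = 0, kInvalidValue = 1 };

// Hypothetical stand-in for the per-thread "last error" state.
thread_local StatusSketch g_lastErrorSketch = kSuccess;

// Records the status of the call that just finished and passes it through.
inline StatusSketch logStatusSketch(StatusSketch status) {
    g_lastErrorSketch = status;
    return status;
}

// Internal implementation: validates and launches, but does not touch the
// last-error state on its own.
inline StatusSketch launchImplSketch(int gridSize) {
    return gridSize > 0 ? kSuccess : kInvalidValue;
}

// Public entry point (the trampoline): a real one would also perform the
// per-call API initialization before forwarding to the implementation.
inline StatusSketch launchApiSketch(int gridSize) {
    return logStatusSketch(launchImplSketch(gridSize));
}

// Returns the recorded status and resets it to success.
inline StatusSketch getLastErrorSketch() {
    StatusSketch last = g_lastErrorSketch;
    g_lastErrorSketch = kSuccess;
    return last;
}
```

With this split, a failed launch is visible both in the return value and in a
later query of the last error, instead of the query still reporting success.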
What CUDA refers to as "linear texture memory" is the equivalent of OpenCL's CL_MEM_OBJECT_IMAGE1D_BUFFER. For these types of allocations we should create a typed buffer instead of an image.
Currently the texture fetch functions do not check what kind of SRD is written into the texture object, so any incorrect programming will cause the TA to hang. Fortunately for us, everyone writes correct code :)
Change-Id: I80dab85a992f2c0754ebf303d40ac6b5e045c7c1
Currently the texture C++ API is forwarded to the ihip*Impl() calls, which are not even a part of CUDA. These should be forwarded to their respective CUDA C APIs instead.
This change also fixes a bug with hipUnbindTexture() creating a dangling pointer.
Change-Id: Ifafc9d106855a11bec84a18ea214b3d89e39990d