This makes hipLaunchKernelGGL take a variable argument list that is
expanded before being passed to hipLaunchKernelGGLInternal.
This is different from 961717879d.
We try to accommodate the case where a kernel template has multiple
type parameters.
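
A minimal sketch of why a variadic macro helps here, assuming plain host
C++ rather than the real HIP headers. The names `two_param_kernel`,
`launch_internal`, `LAUNCH` and `demo` are hypothetical stand-ins and not
the actual HIP definitions. A macro with a fixed `kernel` parameter would
split `two_param_kernel<int, float>` at the comma; collecting everything in
`__VA_ARGS__` leaves the comma for the compiler to parse.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for a kernel template with two type parameters.
template <typename T, typename U>
int two_param_kernel(T a, U b) {
    return static_cast<int>(a) + static_cast<int>(b);
}

// Hypothetical internal launcher: simply calls the kernel with its arguments.
template <typename F, typename... Args>
int launch_internal(F kernel, Args&&... args) {
    return kernel(std::forward<Args>(args)...);
}

// Variadic front end: the whole argument list, including the comma inside
// the template argument list, is pasted unchanged into the function call.
#define LAUNCH(...) launch_internal(__VA_ARGS__)

// Example call with a two-type-parameter kernel (assumed values).
inline int demo() {
    return LAUNCH(two_param_kernel<int, float>, 2, 3.0f);
}
```

In this sketch `demo()` calls the `<int, float>` instantiation with the
arguments 2 and 3.0f.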
Change-Id: I87577d402c92b0f3b51e298f8293f4065e1f6de8
- HIP-Clang follows the standard assert definition by providing
`__assert_fail`. The `assert` macro, however, was added as an HCC-specific
workaround for a missing implementation. Enable it only for HCC
compilation to avoid unexpected behavior under HIP-Clang
compilation.
Change-Id: I1c9a707baff9b85c30faef58c52ebfe07e3fc3fc
This makes hipLaunchKernelGGL take a variable argument list that is
expanded before being passed to hipLaunchKernelGGLInternal.
Change-Id: Id76e2bf91acd5d68f56a24fc39f219f2eeb06d33
Currently std::complex and some other std functions require users to
include hip_runtime.h before any other headers to work, which is not
reliable.
Changes are being made in clang to fix this issue:
https://reviews.llvm.org/D81176
which requires hipcc and HIP headers to make corresponding changes.
This patch will make sure the clang change will not break
HIP/ROCclr during this transition.
After the transition is done, we can stop explicitly setting the
include paths for HIP-Clang and the HIP headers in hipcc and the hip config
cmake files and rely on the clang driver to set them automatically.
Change-Id: I5d226861c2560ffa6c5ab17343a43cc378048061
1. Updated the FAQ in hip_faq.md to note that shfl*sync is not supported
2. Corrected some input parameter descriptions in hcc_details/hip_runtime_api.h
3. Redirect shfl*() to shfl_*_sync() on the nvcc path where CUDA > 9.0
Change-Id: I3d8184db5fcc622852c9bad96b706348e8dfc16c
This technique should never be used; the functionality should only be
accessed through __builtins.
There's currently no builtin for groupstaticsize. I left ds_swizzle
alone since, for some reason, it switches to the builtin depending on
whether __HCC__ is defined.
Change-Id: If1e1394221dba83ea4add6db5e94d6b715552044
This change is required by AMDMIGraphX.
It was for HCC only. HIP-Clang also needs it for __fp16 since AMDMIGraphX uses it.
Change-Id: Id49322b7b89ef799accdf6b47627a6fce51d1ab5
These might contain garbage causing the runtime to incorrectly parse the state of the texture references.
Change-Id: I93c726fa30b580b3e14c50ac939f3c71b0d1c8d9
* Disable device side malloc
Currently device side malloc is not working and takes excessive
device memory.
Disable it for now until a working malloc is implemented.
Change-Id: I1ad908c1c53a83752383b4be96688a848642c699
This is a cherry-pick of 9ead991784
and https://github.com/ROCm-Developer-Tools/HIP/pull/2009
Fix cmake config file
Removed cmake target files under packaging directory.
Merged cmake config .in files for HIP-Clang and HCC as one.
Use cmake generated target files in both install and packaging.
This makes cmake config file consistent for make install and
make package.
Let device side malloc/free return nullptr and trap
Change-Id: I448f3ea2d4934648089bad371debc203f895cba6
Support hipLaunchCooperativeKernelMultiDevice()
- Add validation logic for MGPU launches to pass a CUDA test
Change-Id: Iccca7fde43493fc3bc6685512d39202271ae3e92
If the code object is embedded in an already mapped file, and the
lifetime of the mapped file exceeds the lifetime of the executable,
we do not need to make a copy of the binary.
This allows ROCR to present the code object URI as
file:///path/to/file#offset=X&size=Y.
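
A minimal sketch of how a URI of that shape could be assembled, assuming an
absolute path that needs no percent-encoding. The helper name
`make_code_object_uri` is hypothetical and is not part of ROCR or HIP.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds file:///path/to/file#offset=X&size=Y for a
// code object embedded at [offset, offset + size) in an already mapped file.
// Assumes `path` is absolute (starts with '/') and needs no escaping.
inline std::string make_code_object_uri(const std::string& path,
                                        std::size_t offset,
                                        std::size_t size) {
    return "file://" + path + "#offset=" + std::to_string(offset) +
           "&size=" + std::to_string(size);
}
```

Because the URI names the file plus an offset and size, no separate copy
of the binary has to be identified.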