Currently std::complex and some other std functions require users to
include hip_runtime.h before any other headers to work, which is not
reliable.
Changes are being made in clang to fix this issue:
https://reviews.llvm.org/D81176
which requires hipcc and HIP headers to make corresponding changes.
This patch will make sure the clang change will not break
HIP/ROCclr during this transition.
After the transition is done, we can stop explicitly setting the
include paths for HIP-Clang and the HIP headers in hipcc and the hip
config cmake files, and rely on the clang driver to set them
automatically.
Change-Id: I5d226861c2560ffa6c5ab17343a43cc378048061
1. Added a multithreaded, multi-GPU scenario for hipModuleLaunchKernel.
2. Removed the hipCtxCreate API from the earlier test, as it is deprecated.
SWDEV-238517 for enhancing hip unit tests
Change-Id: Id102d80887b6ff61a59938dbeb9fa2a26a3275b2
Similar to HCC, link with compiler-rt to support __fp16 and _Float16 type conversions in ONNX models. This should resolve SWDEV-238491.
Change-Id: Iad8dcff568831719f501f562a04023326ae8036c
The hipOccupancyMaxPotentialBlockSize API is meant to return the
number of threads for the highest-occupancy workgroup, and the number
of those workgroups. It was previously calculating the number of
maximum-sized workgroups that would fit on a single CU. This is
a mixture of the API we wanted (to calculate max potential block size)
and the MaxBlocksPerMultiprocessor function.
This patch fixes it up so that the internal occupancy calculation
function works for two uses: the traditional function that calculates
the maximum blocks per multiprocessor when a user passes in a fixed
block size (used for hipMaxBlocksPerMultiprocessor style functions)
and a function that calculates the size of a block that would lead
to maximum occupancy, and how many blocks of that size would be
needed to fill the whole GPU (for hipOccupancyMaxPotentialBlockSize
style functions).
This also updates the occupancy calculation function to prepare for
gfx10, which does not have SGPR-based occupancy limits.
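
As a rough illustration of the two uses described above, here is a
minimal host-side C++ sketch. It is not the code in this patch:
DeviceLimits, KernelUsage, LaunchShape, maxBlocksPerCu and
maxPotentialBlockSize are hypothetical names, and the model only
accounts for a per-CU wave cap and LDS usage. It also assumes that
maxBlockSize is a multiple of waveSize.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-device limits; values are supplied by the caller.
struct DeviceLimits {
    int waveSize;       // threads per wavefront
    int maxWavesPerCu;  // resident wavefronts one CU can hold
    int ldsBytesPerCu;  // LDS (shared memory) available on one CU
    int maxBlockSize;   // largest allowed workgroup, in threads
    int numCus;         // compute units on the device
};

// Hypothetical per-kernel resource usage.
struct KernelUsage {
    int wavesPerCuByRegs;  // wave cap per CU implied by register usage
    int ldsBytesPerBlock;  // LDS used by one workgroup (0 if none)
};

struct LaunchShape {
    int blockSize;  // threads per workgroup
    int gridSize;   // workgroups needed to fill the whole device
};

// Use 1: the caller fixes the block size and asks how many such
// blocks can be resident on one CU at the same time.
int maxBlocksPerCu(const DeviceLimits& d, const KernelUsage& k, int blockSize) {
    assert(blockSize > 0 && blockSize <= d.maxBlockSize);
    int wavesPerBlock = (blockSize + d.waveSize - 1) / d.waveSize;
    int waveCap = std::min(d.maxWavesPerCu, k.wavesPerCuByRegs);
    int blocks = waveCap / wavesPerBlock;
    if (k.ldsBytesPerBlock > 0) {
        blocks = std::min(blocks, d.ldsBytesPerCu / k.ldsBytesPerBlock);
    }
    return blocks;
}

// Use 2: search for the block size that keeps the most threads
// resident per CU, then report how many blocks of that size fill
// every CU. Ties go to the larger block size because the search
// runs from large to small and only accepts strict improvements.
LaunchShape maxPotentialBlockSize(const DeviceLimits& d, const KernelUsage& k) {
    LaunchShape best{0, 0};
    int bestThreads = 0;
    for (int bs = d.maxBlockSize; bs >= d.waveSize; bs -= d.waveSize) {
        int blocks = maxBlocksPerCu(d, k, bs);
        int threads = blocks * bs;
        if (threads > bestThreads) {
            bestThreads = threads;
            best = LaunchShape{bs, blocks * d.numCus};
        }
    }
    return best;
}
```

With made-up limits of 64-wide waves, 40 waves per CU, and 60 CUs, and
a kernel that uses no LDS, maxBlocksPerCu for a 256-thread block gives
10, while maxPotentialBlockSize picks 640 threads per block and 240
blocks. A 1024-thread block would leave 8 of the 40 wave slots idle,
which is why the largest block is not always the best choice.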
Change-Id: Ie007b3f9d5ebc4e166b50a3a051498af35650f35
Git may not be available, and the source tree may not be a git
checkout, as is the case with a release tarball. This change does not
attempt to produce nicer version formatting if some of the git
subcommands fail.
Change-Id: Ib568cd1310983a43f2664ded72528d7e41f554c0
SWDEV-237377 - This fixes the time calculation for the case where the
event is recorded on the null stream and work is submitted on other
streams.
Change-Id: Ie36310dea5cee2fed4a514ed01f04db4b47e571c
1. Updated the FAQ in hip_faq.md to note that shfl*sync is not supported.
2. Corrected some input parameter descriptions in hcc_details/hip_runtime_api.h.
3. Redirected shfl*() to shfl_*_sync() on the nvcc path where CUDA > 9.0.
Change-Id: I3d8184db5fcc622852c9bad96b706348e8dfc16c
find_package should now be the only way to import ROCclr. Also update
the build example comment.
The build scripts used two custom variables to manually specify the
build and source directories in which to find VDI. Once renamed to
ROCclr, these conflicted with the variables automatically set by
find_package(ROCclr). These hacks were an intermediate step to work
around commit ordering problems and get through PSDB.
The INSTALL.md documentation should also be updated, but it's
completely missing any mention of ROCclr now, and still gives
directions for hcc.
Change-Id: I6fc94b6cb36241a9d4f22d24e49523367f803461