1. Added a multithreaded, multi-GPU scenario for hipModuleLaunchKernel.
2. Removed the hipCtxCreate API from the earlier test, as it is deprecated.
SWDEV-238517 for enhancing hip unit tests
Change-Id: Id102d80887b6ff61a59938dbeb9fa2a26a3275b2
Similar to HCC, link with compiler-rt to support __fp16 and _Float16 type conversions in ONNX models. This should resolve SWDEV-238491.
Change-Id: Iad8dcff568831719f501f562a04023326ae8036c
The hipOccupancyMaxPotentialBlockSize API is meant to return the
number of threads for the highest-occupancy workgroup, and the number
of those workgroups. It was previously calculating the number of
maximum-sized workgroups that would fit on a single CU. This is
a mixture of the API we wanted (to calculate max potential block size)
and the MaxBlocksPerMultiprocessor function.
This patch fixes it up so that the internal occupancy calculation
function works for two uses: the traditional function that calculates
the maximum blocks per multiprocessor when a user passes in a fixed
block size (used for hipMaxBlocksPerMultiprocessor style functions)
and a function that calculates the size of a block that would lead
to maximum occupancy, and how many blocks of that size would be
needed to fill the whole GPU (for hipOccupancyMaxPotentialBlockSize
style functions).
This also updates the occupancy calculation function to prepare for
gfx10, which does not have SGPR-based occupancy limits.
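
The two uses described above can be sketched roughly as follows. This is a
simplified host-side model, not the actual HIP implementation: the Device and
Kernel structs, their fields, and both function names are hypothetical, and
the model only considers wave slots, VGPRs, and LDS (no SGPR limit).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, simplified device description (not HIP's real structures).
struct Device {
    int numCUs;           // compute units on the GPU
    int simdsPerCU;       // SIMDs per compute unit
    int maxWavesPerSimd;  // hardware wave slots per SIMD
    int wavefrontSize;    // threads per wavefront
    int vgprsPerSimd;     // per-lane VGPR budget of one SIMD
    int ldsPerCU;         // bytes of LDS per compute unit
    int maxBlockSize;     // largest workgroup the device accepts
};

// Hypothetical per-kernel resource usage.
struct Kernel {
    int vgprsPerThread;   // VGPRs used by each thread
    int ldsPerBlock;      // bytes of LDS used by each block
};

// Use 1: the caller fixes the block size; return how many such blocks
// can be resident on one CU at the same time.
int maxBlocksPerCU(const Device& d, const Kernel& k, int blockSize) {
    assert(blockSize > 0 && blockSize <= d.maxBlockSize);
    int wavesPerBlock = (blockSize + d.wavefrontSize - 1) / d.wavefrontSize;
    int wavesPerSimd = d.maxWavesPerSimd;
    if (k.vgprsPerThread > 0)
        wavesPerSimd = std::min(wavesPerSimd, d.vgprsPerSimd / k.vgprsPerThread);
    int blocks = (wavesPerSimd * d.simdsPerCU) / wavesPerBlock;
    if (k.ldsPerBlock > 0)
        blocks = std::min(blocks, d.ldsPerCU / k.ldsPerBlock);
    return blocks;
}

struct Launch {
    int blockSize;  // threads per block giving the highest occupancy
    int gridSize;   // blocks of that size needed to fill the whole GPU
};

// Use 2: search over block sizes for the one that keeps the most threads
// resident per CU, then scale the per-CU block count to the whole device.
// Ties are resolved in favor of the larger block size (an assumption).
Launch maxPotentialBlockSize(const Device& d, const Kernel& k) {
    Launch best{0, 0};
    int bestThreads = 0;
    for (int bs = d.wavefrontSize; bs <= d.maxBlockSize; bs += d.wavefrontSize) {
        int blocks = maxBlocksPerCU(d, k, bs);
        int threads = blocks * bs;
        if (blocks > 0 && threads >= bestThreads) {
            bestThreads = threads;
            best = Launch{bs, blocks * d.numCUs};
        }
    }
    return best;
}
```

The key difference from the previous behavior is that the second function
returns a grid size for the whole GPU rather than a block count for one CU.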
Change-Id: Ie007b3f9d5ebc4e166b50a3a051498af35650f35
Git may not be available, and this may not be a git checkout, as would
happen in a release tarball. This change does not attempt to produce
nicer version formatting if some of the git subcommands fail.
Change-Id: Ib568cd1310983a43f2664ded72528d7e41f554c0
SWDEV-237377 - This fixes the elapsed-time calculation when the event
may be recorded on the null stream and work is submitted on other streams.
Change-Id: Ie36310dea5cee2fed4a514ed01f04db4b47e571c
1. Updated the FAQ in hip_faq.md to note that shfl*sync is not supported.
2. Corrected some input parameter descriptions in hcc_details/hip_runtime_api.h.
3. Redirected shfl*() to shfl_*_sync() on the nvcc path where CUDA > 9.0.
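
The redirect in item 3 amounts to forwarding a legacy shuffle call to its
*_sync counterpart with a full participation mask. A minimal host-side model
of that idea is sketched below. It does not use the CUDA headers: Warp,
shfl_sync_model, and shfl_model are hypothetical stand-ins, and the full
mask value is an assumption about how the redirect is done.

```cpp
#include <array>
#include <cassert>

constexpr int kWarpSize = 32;                // assumed warp width
constexpr unsigned kFullMask = 0xffffffffu;  // every lane participates

// One value per lane, standing in for a register across a warp.
using Warp = std::array<int, kWarpSize>;

// Stand-in for a *_sync shuffle: the calling lane reads the value held by
// srcLane. The calling lane must be named in the mask.
inline int shfl_sync_model(unsigned mask, const Warp& vals, int lane, int srcLane) {
    assert(lane >= 0 && lane < kWarpSize);
    assert(((mask >> lane) & 1u) != 0u);
    // Out-of-range source lanes wrap around the warp width.
    int src = ((srcLane % kWarpSize) + kWarpSize) % kWarpSize;
    return vals[src];
}

// Legacy-style entry point: no mask argument, so forward to the sync
// variant with all lanes enabled.
inline int shfl_model(const Warp& vals, int lane, int srcLane) {
    return shfl_sync_model(kFullMask, vals, lane, srcLane);
}
```

With this shape, existing callers of the legacy form keep their signature
while the sync variant does the work.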
Change-Id: I3d8184db5fcc622852c9bad96b706348e8dfc16c
find_package should now be the only way to import ROCclr. Also update
the build example comment.
The build scripts used 2 custom variables to manually specify the
build and source directories for where to find VDI. Once renamed to
ROCclr, these conflicted with the variables automatically set by
find_package(ROCclr). These hacks were an intermediate step to work
around commit ordering problems and get through PSDB.
The INSTALL.md documentation should also be updated, but it's
completely missing any mention of ROCclr now, and still gives
directions for hcc.
Change-Id: I6fc94b6cb36241a9d4f22d24e49523367f803461
When libamdhip64_static.a is built by Jenkins, the sample square cannot
be built successfully because libamdhip64_static.a is archived in thin
mode. This patch archives it in full mode instead. As a result,
libamdhip64_static_temp.a is no longer needed and is removed.
Change-Id: Ifd3882598ef0dc5e7af8db0e389e786025ceb455
This points to the cmake directory where the find module was found,
not a prefix for where it was found.
Based on the search below looking in roctracer, searching in ROCclr
for the header doesn't make much sense. The header should be provided
by either ROCclr or roctracer, but not both. Having it possibly be provided by
two different dependencies is confusing, and a potential source of
version mismatch problems.
Change-Id: Ic2f6ec03f9a7b86225cf7e5c43f39a1360318a34
If the start and stop events refer to the same command internally,
measure the elapsed time from that command's start to its end.
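
A rough sketch of this rule, using a hypothetical host-side model rather than
the runtime's real types. Command, Event, and elapsedMs are illustrative
names, and the behavior for two distinct commands (end of the start command
to end of the stop command) is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical profiling record for one submitted command.
struct Command {
    uint64_t startNs;  // timestamp when the command began executing
    uint64_t endNs;    // timestamp when the command finished
};

// Hypothetical event: it only remembers which command it is tied to.
struct Event {
    const Command* cmd;
};

// Elapsed time in milliseconds between two events.
double elapsedMs(const Event& start, const Event& stop) {
    assert(start.cmd != nullptr && stop.cmd != nullptr);
    double end = static_cast<double>(stop.cmd->endNs);
    double begin;
    if (start.cmd == stop.cmd) {
        // Same command behind both events: report that command's duration.
        begin = static_cast<double>(start.cmd->startNs);
    } else {
        // Assumed behavior for distinct commands: measure end to end.
        begin = static_cast<double>(start.cmd->endNs);
    }
    return (end - begin) / 1e6;
}
```

Without the same-command branch, both events would resolve to the same end
timestamp and the reported time would be zero.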
Change-Id: Ie70cfa37c06c06573f0ed58dab2bbe4434c1724b