The hipOccupancyMaxPotentialBlockSize API is meant to return the
number of threads for the highest-occupancy workgroup, and the number
of those workgroups. Previously, it calculated the number of
maximum-sized workgroups that would fit on a single CU. This was
a mixture of the API we wanted (calculating the max potential block
size) and the MaxBlocksPerMultiprocessor function.
This patch reworks the internal occupancy calculation function so
that it serves two uses: the traditional function that calculates
the maximum number of blocks per multiprocessor when a user passes
in a fixed block size (used for hipMaxBlocksPerMultiprocessor style
functions), and a function that calculates the block size that leads
to maximum occupancy and how many blocks of that size are needed to
fill the whole GPU (for hipOccupancyMaxPotentialBlockSize style
functions).
This also updates the occupancy calculation function to prepare for
gfx10, which does not have SGPR-based occupancy limits.
Change-Id: Ie007b3f9d5ebc4e166b50a3a051498af35650f35
[ROCm/hip commit: ebe5054e04]
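The two uses of the occupancy calculation described in the commit above can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical resource model: wave slots per SIMD, VGPR and SGPR budgets, LDS size, and CU count, with illustrative numbers. All names here (Device, Kernel, waves_per_cu, blocks_per_cu, max_potential_block_size) are hypothetical. The real HIP calculation reads device properties and kernel metadata and handles allocation granularity, which this sketch ignores. The has_sgpr_limit flag models the gfx10 case, where SGPRs do not limit occupancy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical, simplified per-CU resource model (illustrative numbers)."""
    wavefront_size: int = 64
    simds_per_cu: int = 4
    max_waves_per_simd: int = 10
    vgprs_per_simd_lane: int = 256
    sgprs_per_simd: int = 800
    has_sgpr_limit: bool = True  # False models gfx10: SGPRs do not limit occupancy
    lds_bytes_per_cu: int = 65536
    max_block_size: int = 1024
    num_cus: int = 60


@dataclass
class Kernel:
    """Hypothetical per-kernel resource usage."""
    vgprs_per_thread: int
    sgprs_per_wave: int
    lds_bytes_per_block: int = 0


def waves_per_block(d: Device, block_size: int) -> int:
    return -(-block_size // d.wavefront_size)  # ceiling division


def waves_per_cu(d: Device, k: Kernel) -> int:
    """Waves per CU allowed by the register budgets, independent of block size."""
    per_simd = d.max_waves_per_simd
    if k.vgprs_per_thread > 0:
        per_simd = min(per_simd, d.vgprs_per_simd_lane // k.vgprs_per_thread)
    if d.has_sgpr_limit and k.sgprs_per_wave > 0:
        per_simd = min(per_simd, d.sgprs_per_simd // k.sgprs_per_wave)
    return per_simd * d.simds_per_cu


def blocks_per_cu(d: Device, k: Kernel, block_size: int) -> int:
    """Use 1: how many blocks of a fixed, user-supplied size fit on one CU."""
    blocks = waves_per_cu(d, k) // waves_per_block(d, block_size)
    if k.lds_bytes_per_block > 0:
        blocks = min(blocks, d.lds_bytes_per_cu // k.lds_bytes_per_block)
    return blocks


def max_potential_block_size(d: Device, k: Kernel) -> tuple[int, int]:
    """Use 2: return (grid_size, block_size) for the highest-occupancy block size.

    grid_size is the number of blocks of that size needed to fill every CU.
    """
    best_waves, best_size, best_blocks = 0, 0, 0
    # Scan from the largest size down so ties resolve to the larger block.
    for size in range(d.max_block_size, d.wavefront_size - 1, -d.wavefront_size):
        blocks = blocks_per_cu(d, k, size)
        resident_waves = blocks * waves_per_block(d, size)
        if resident_waves > best_waves:
            best_waves, best_size, best_blocks = resident_waves, size, blocks
    return best_blocks * d.num_cus, best_size
```

Here is a worked example. max_potential_block_size(Device(), Kernel(vgprs_per_thread=40, sgprs_per_wave=100)) returns (120, 768). In that case the register budget allows 24 waves per CU. A 1024-thread block (16 waves) fits only once and leaves 8 wave slots idle, while two 768-thread blocks fill all 24.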
Git may not be available, and this may not be a git checkout, as would
happen in a release tarball. This does not attempt to produce nicer
version formatting if some of the git subcommands fail.
Change-Id: Ib568cd1310983a43f2664ded72528d7e41f554c0
[ROCm/hip commit: 1983d720c2]
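The fallback behavior described in the commit above is sketched below in Python, for illustration only. The actual build script and its version format are not shown here. The function name version_string and the "-<short sha>" suffix are hypothetical. Any failure to run git yields the plain base version. That covers a missing git binary, a missing directory, or a directory that is not a checkout.

```python
import subprocess


def version_string(source_dir: str, base_version: str) -> str:
    """Return base_version, optionally decorated with a git short SHA.

    Falls back to the plain base_version when git is missing, the directory
    does not exist or is not a git checkout (e.g. a release tarball), or the
    git command fails for any other reason.
    """
    try:
        sha = subprocess.run(
            ["git", "rev-parse", "--short", "HEAD"],
            cwd=source_dir, capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        # No git binary, no such directory, or not a checkout: no decoration.
        return base_version
    return f"{base_version}-{sha}" if sha else base_version
```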
SWDEV-237377 - This fixes the time calculation when an event is recorded
on the null stream and work is submitted on other streams.
Change-Id: Ie36310dea5cee2fed4a514ed01f04db4b47e571c
[ROCm/hip commit: fb2d7bcd2b]
1. Updated hip_faq.md to note that shfl*sync functions are not supported.
2. Corrected some input parameter descriptions in hcc_details/hip_runtime_api.h.
3. Redirected shfl*() to shfl_*_sync() on the nvcc path for CUDA > 9.0.
Change-Id: I3d8184db5fcc622852c9bad96b706348e8dfc16c
[ROCm/hip commit: 83b11f9a61]
find_package should now be the only way to import ROCclr. Also update
the build example comment.
The build scripts used two custom variables to manually specify the
build and source directories for where to find VDI. Once VDI was renamed
to ROCclr, these conflicted with the variables automatically set by
find_package(ROCclr). These hacks only existed to work around
commit-ordering problems during this intermediate step, so that the
changes could get through PSDB.
The INSTALL.md documentation should also be updated, but it currently
makes no mention of ROCclr and still gives directions for hcc.
Change-Id: I6fc94b6cb36241a9d4f22d24e49523367f803461
[ROCm/hip commit: a2d2709ec1]
When libamdhip64_static.a is built by Jenkins, the square sample cannot be
built successfully because libamdhip64_static.a is archived in thin
mode. This patch archives it in full mode instead. As a result,
libamdhip64_static_temp.a is no longer needed and is removed.
Change-Id: Ifd3882598ef0dc5e7af8db0e389e786025ceb455
[ROCm/hip commit: 470b89a6bf]
This points to the cmake directory where the find module was found,
not a prefix for where it was found.
Based on the search below, which looks in roctracer, searching in ROCclr
for the header doesn't make much sense. The header should be provided
by either ROCclr or roctracer, not both. Having it possibly provided by
two different dependencies is confusing, and a potential source of
version mismatch problems.
Change-Id: Ic2f6ec03f9a7b86225cf7e5c43f39a1360318a34
[ROCm/hip commit: d6aad8ae91]
If the start and stop events internally share the same command,
measure the elapsed time from that command's start to its end.
Change-Id: Ie70cfa37c06c06573f0ed58dab2bbe4434c1724b
[ROCm/hip commit: 50be95e169]
When the original size is divided across all GPUs, rounding can
occur, causing incorrect validation. Readjust the final value used
for comparison to match the new size.
Change-Id: I9b42149e33dfcb328de7419e546a0202a69a8610
[ROCm/hip commit: 20f0e36041]
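The rounding described in the commit above comes from integer division of the size by the GPU count. A short sketch, using a hypothetical helper split_across_gpus, computes the per-GPU size and the adjusted total that validation should compare against.

```python
def split_across_gpus(total_size: int, num_gpus: int) -> tuple[int, int]:
    """Split total_size evenly; return (per_gpu_size, adjusted_total).

    Integer division drops the remainder, so the amount actually processed
    is per_gpu_size * num_gpus, which can be smaller than total_size.
    """
    per_gpu = total_size // num_gpus
    return per_gpu, per_gpu * num_gpus
```

For example, split_across_gpus(1000, 3) returns (333, 999). Validating against the original 1000 would fail, while comparing against 999 matches what the GPUs actually processed.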
We need this; otherwise ROCr can give us a matching address
for another allocation, and doing "insert" in ROCclr will not
update the map with the newest object. We would then end up
using stale objects (yikes).
SWDEV-234992
Change-Id: I3475adf9781a9309d64a024fae45181d7e5afb04
[ROCm/hip commit: a03fee04fe]
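The stale-object hazard described in the commit above can be shown with a minimal Python sketch. It uses a dict in place of the ROCclr map. dict.setdefault leaves an existing entry untouched, which is how C++ std::map::insert behaves when the key already exists. The function name, address, and object labels are hypothetical. The sketch assumes that the fix removes a freed allocation's entry before its address can be reused.

```python
def lookup_after_reuse(remove_on_free: bool) -> str:
    """Return the object found at an address that ROCr handed out twice."""
    objects: dict[int, str] = {}
    address = 0x1000  # hypothetical address returned for allocation A
    objects.setdefault(address, "alloc A")  # insert-if-absent, like std::map::insert
    # Allocation A is freed; ROCr may now return the same address for B.
    if remove_on_free:
        objects.pop(address, None)
    objects.setdefault(address, "alloc B")  # no effect if A's entry is still there
    return objects[address]
```

Without the removal, lookup_after_reuse(False) still finds "alloc A" (the stale object). With it, lookup_after_reuse(True) finds "alloc B".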
In case hipModule(Un)Load is called from a different thread than hipInit, we need to grab the lock,
as both modify modules_.
Also add some logging for __hipExtractCodeObjectFromFatBinary in case the binary isn't found for the GPU.
SWDEV-236032
Change-Id: Icbd72b412502df80d5066cea42a4fbcd5b0b8a98
[ROCm/hip commit: f100ae3679]
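The locking pattern described in the commit above can be sketched minimally in Python. A threading.Lock stands in for the runtime's lock. A hypothetical ModuleRegistry class stands in for the code that owns modules_. Every path that mutates the shared map takes the same lock, whether it runs during initialization or in load/unload.

```python
from __future__ import annotations

import threading


class ModuleRegistry:
    """Hypothetical stand-in for runtime state that owns a modules_ map."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._modules: dict[str, object] = {}

    def init_builtin(self, names: list[str]) -> None:
        # Initialization path (analogous to hipInit) takes the same lock.
        with self._lock:
            for name in names:
                self._modules[name] = object()

    def load(self, name: str) -> None:
        # Load path (analogous to hipModuleLoad), possibly on another thread.
        with self._lock:
            self._modules[name] = object()

    def unload(self, name: str) -> None:
        # Unload path (analogous to hipModuleUnload).
        with self._lock:
            self._modules.pop(name, None)

    def count(self) -> int:
        with self._lock:
            return len(self._modules)
```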