Reduce the size of the queueLock and lastCmdLock critical sections
to reduce lock contention. The smaller the critical sections are,
the better.
lastCmdLock is still needed to guarantee that getLastEnqueueCommand_
can retain the command before it is swapped out and released.
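A minimal sketch of the idea, assuming a reference-counted command
object. The Command and LastCommandQueue classes and their members are
hypothetical and are not the runtime's actual types. Only the pointer
swap and the retain happen under the lock; the release runs outside it.

```cpp
#include <atomic>
#include <mutex>

// Hypothetical stand-in for a reference-counted command object.
class Command {
 public:
  void retain() { refs_.fetch_add(1, std::memory_order_relaxed); }
  // Drops one reference and deletes the object when the count reaches zero.
  void release() {
    if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1) delete this;
  }
  int refs() const { return refs_.load(std::memory_order_relaxed); }

 private:
  ~Command() = default;  // destroyed only through release()
  std::atomic<int> refs_{1};
};

// Hypothetical queue that tracks only the last enqueued command.
class LastCommandQueue {
 public:
  ~LastCommandQueue() {
    if (last_ != nullptr) last_->release();
  }

  // Swap the pointer under the lock; release the old command outside it.
  void setLast(Command* cmd) {
    cmd->retain();
    Command* old = nullptr;
    {
      std::lock_guard<std::mutex> guard(lastCmdLock_);
      old = last_;
      last_ = cmd;
    }
    if (old != nullptr) old->release();  // may free memory, so keep it unlocked
  }

  // Retain under the lock so a concurrent setLast() cannot free the command
  // between reading the pointer and taking the reference.
  Command* getLast() {
    std::lock_guard<std::mutex> guard(lastCmdLock_);
    if (last_ != nullptr) last_->retain();
    return last_;
  }

 private:
  std::mutex lastCmdLock_;
  Command* last_ = nullptr;
};
```

The caller of getLast() owns the extra reference and must call
release() when it is done with the command.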
Change-Id: Id35d4a77c035b2da0de4c15568b153d49e958bb7
Replace constexpr with const in the kernel source
code because some kernel compilers do not
support constexpr.
Replace scheduler with __amd_rocclr_scheduler
due to name change.
Change-Id: I1ad4ddcdf1df5237b83e1ea2447eb39a19f7dc4a
Root cause: the cooperative queue is not inserted into queuePool_ (HSA queues) of the ROC device class, causing a crash when creating hostcall buffers for printf.
Change-Id: I3f9aceb4e5fe6a7c7a2a549a4bb0a3511fe02799
There is no synchronizes-with relationship between the monitor
microlock and the onDeck microlock, so it is possible for an
onDeck.load to move above a contendersList.store, or a
contendersList.load to move above an onDeck.store.
To fix this issue, a full memory fence (_mm_mfence on x86) is needed
after the last store in the contendersList and onDeck critical regions.
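The store-then-load pattern can be sketched in portable C++ as follows.
This is an illustration only: the two atomics and both function names
are hypothetical stand-ins, and std::atomic_thread_fence with
seq_cst ordering is used in place of the raw intrinsic (on x86 it
typically lowers to mfence or an equivalent locked instruction).

```cpp
#include <atomic>

// Hypothetical stand-ins for the two words guarded by separate microlocks.
std::atomic<int> contendersList{0};
std::atomic<int> onDeck{0};

// Meant to run on one thread: publish a contender, then read onDeck.
// The full fence keeps the load from moving above the store.
int publishContenderThenReadOnDeck() {
  contendersList.store(1, std::memory_order_release);
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return onDeck.load(std::memory_order_acquire);
}

// Meant to run on another thread: publish onDeck, then read the contenders.
int publishOnDeckThenReadContenders() {
  onDeck.store(1, std::memory_order_release);
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return contendersList.load(std::memory_order_acquire);
}
```

With both fences in place, the two functions cannot both return 0 when
they race: at least one thread observes the other's store. Release and
acquire ordering alone does not give that guarantee.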
Change-Id: I5beb7dfe0d21010c5bf00cd65d59b9c7af58e919
Fix a typo in the name of the define used when compilation wasn't enabled.
Force CPU prefetch if system was forced at runtime.
Change-Id: Id4b578f9fa44a45426fdb5d8ecb1da803aa42313
The current implementation creates a default reference on the stack and assigns it to the class member cuMask_, so whenever the contents of the stack change, cuMask_ changes as well.
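One way to avoid this kind of dangling reference is for the object to
own its copy of the mask. The sketch below is an assumption about the
shape of such a fix, not the actual patch; the MaskedQueue class is
hypothetical and only the cuMask_ member name comes from the text above.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical queue class used only to illustrate ownership of the mask.
class MaskedQueue {
 public:
  // Taking the mask by value and moving it into the member gives the object
  // its own copy, so a temporary default argument cannot dangle.
  explicit MaskedQueue(std::vector<uint32_t> cuMask = {})
      : cuMask_(std::move(cuMask)) {}

  const std::vector<uint32_t>& cuMask() const { return cuMask_; }

 private:
  // Owning member. If this were a `const std::vector<uint32_t>&` bound to a
  // defaulted temporary, it would refer to stack memory that is reused later.
  std::vector<uint32_t> cuMask_;
};
```

Because the member is a value, later changes to the caller's vector or
to the stack do not affect the stored mask.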
Change-Id: Iefab63c335d504b83c4ae90bd34ae76c6afb8f3c
An optimization to remove extra syncs uncovered a bug in the cache
coherency layer, where the runtime could lose track of the memory
address if the coherency layer performed a sync.
Change-Id: I25647cfa4a4be9cdbd8577ff076a740bbdac79c8
When HIP_ENABLE_DEFERRED_LOADING=0, many global variables are
referenced before they have been initialized. This patch uses
constexpr to initialize global constant variables at compile
time.
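As a rough illustration of why constexpr helps here, the sketch below
uses hypothetical names and values that are not taken from the patch.
A constexpr global is constant-initialized, so it already holds its
value before any dynamic initializer in the program runs.

```cpp
#include <cstddef>

// Hypothetical global constants; names and values are illustrative only.
struct Limits {
  std::size_t maxStreams;
  std::size_t stagingBytes;
};

// constexpr forces constant initialization: the values are fixed at compile
// time and are valid before any dynamic initializer runs.
constexpr std::size_t kKi = 1024;
constexpr Limits kLimits{4, 4 * kKi};

// The static_assert shows the value is available to the compiler itself.
static_assert(kLimits.stagingBytes == 4096, "evaluated at compile time");

// Hypothetical early-startup code that reads the constant.
inline std::size_t earlyStagingBytes() { return kLimits.stagingBytes; }
```

A plain non-constexpr global with a non-constant initializer gives no
such guarantee, because the order of dynamic initialization across
translation units is unspecified.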
Change-Id: I9d538b7abc6a0ce700ec3332b97fc144db5fc1ef
HIP or any ROCm component above HIP may not be calling
hsa-runtime directly. OpenCL and HIP are the two components
calling ROCclr, and to bring in the transitive dependencies on
thunk, ROCR, and amd_comgr it is better to have the dependency
chain set correctly in the ROCclr CMake target. With this
change, OpenCL and HIP should not set the ROCR dependency
directly.
This helps OpenCL (libamdocl.so) link statically with
comgr, hsa, and thunk.
Change-Id: I0d538b7abc6a0ce700ec3332b97fc144db5fc5ff
If the numa library is available on the build system, define
ROCCLR_NUMA_SUPPORT to enable NUMA support; otherwise, NUMA is not
supported.
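On the source side, a guard on this define might look like the sketch
below. The hostNodeCount helper is hypothetical; numa_available and
numa_num_configured_nodes are libnuma calls, and this code assumes the
build links against libnuma whenever the macro is defined.

```cpp
#ifdef ROCCLR_NUMA_SUPPORT
#include <numa.h>  // libnuma header, only included when the build found it
#endif

// Hypothetical helper: number of host memory nodes the runtime may use.
inline int hostNodeCount() {
#ifdef ROCCLR_NUMA_SUPPORT
  // numa_available() returns -1 when the kernel has no NUMA support.
  if (numa_available() != -1) return numa_num_configured_nodes();
#endif
  return 1;  // no NUMA support: treat host memory as a single node
}
```

Without the define, the NUMA calls are compiled out and the helper
always reports a single node.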
Change-Id: I3848d7fdec5a3813ff1edad9b71ff04372dc0b9a