If we don't create the __amd_rocclr_gwsInit kernel, we still want
to create the rest of the image-related blit kernels.
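The sketch below shows the intended control flow. It assumes a hypothetical `blitKernelsToCreate()` helper and uses hypothetical names for the image blit kernels. The optional GWS kernel is skipped with `continue` rather than by exiting early, so the remaining image-related kernels are still created.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: the image kernel names and the gwsSupported flag are
// illustrative assumptions, not the runtime's actual kernel table.
std::vector<std::string> blitKernelsToCreate(bool gwsSupported) {
  const std::vector<std::string> all = {
      "__amd_rocclr_gwsInit",    // optional: only when GWS is available
      "__amd_rocclr_copyImage",  // hypothetical image blit kernel
      "__amd_rocclr_fillImage",  // hypothetical image blit kernel
  };
  std::vector<std::string> result;
  for (const auto& name : all) {
    // Skip only the GWS kernel; the loop keeps going so the remaining
    // image-related kernels are still created.
    if (name == "__amd_rocclr_gwsInit" && !gwsSupported) {
      continue;
    }
    result.push_back(name);
  }
  return result;
}
```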
Change-Id: I8bc4645f9f9116eeecbb8b22e981ac4d520f3121
[ROCm/clr commit: 55a0cf0b0c]
Debug builds fail with an error due to missing
parentheses when -Werror=parentheses is enabled.
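The expression below is hypothetical and is not the code that was fixed. It illustrates the kind of construct that GCC's `-Werror=parentheses` rejects, followed by the parenthesized form that keeps the same meaning.

```cpp
#include <cassert>

// Hypothetical example. Written as
//   return isDebug || a && b;
// GCC warns "suggest parentheses around '&&' within '||'", which
// -Werror=parentheses turns into an error. Explicit parentheses keep the
// same precedence and silence the warning.
bool shouldValidate(bool isDebug, bool a, bool b) {
  return isDebug || (a && b);
}
```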
Change-Id: I5745a63b5cf2c7a3aeed90ea572081a6fa67e366
[ROCm/clr commit: 5e116c6c99]
ROC_AQL_QUEUE_SIZE controls the size of the AQL queue.
The current default value is 4096.
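The sketch below shows one way such a variable might be read. It assumes a hypothetical `aqlQueueSize()` helper that falls back to 4096 when the value is unset, malformed, or not a power of two, since HSA requires AQL queue sizes to be powers of two. The runtime's actual parsing may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: returns the AQL queue size requested through
// ROC_AQL_QUEUE_SIZE, or the default of 4096 when the value is unset,
// not a plain decimal number, not a power of two, or too large.
uint32_t aqlQueueSize() {
  constexpr uint32_t kDefault = 4096;
  const char* env = std::getenv("ROC_AQL_QUEUE_SIZE");
  if (env == nullptr) return kDefault;
  char* end = nullptr;
  unsigned long value = std::strtoul(env, &end, 10);
  const bool powerOfTwo = value != 0 && (value & (value - 1)) == 0;
  if (end == env || *end != '\0' || !powerOfTwo || value > UINT32_MAX) {
    return kDefault;
  }
  return static_cast<uint32_t>(value);
}
```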
Change-Id: Icd2a4ee3ba554c06aa05b08defd922d2c63e43fd
[ROCm/clr commit: 7fe696b6ef]
The original logic was left over from initial testing, when HMM
couldn't handle xnack properly.
Change-Id: I0abf01805704171e931dfba8b6d95bfe87d5fab1
[ROCm/clr commit: d17108e8d0]
Change the scope of the hostcall buffer access lock during destruction.
Make sure wait() returns the signal value after a timeout. This
matches ROCr behaviour for HSA signal waits.
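The sketch below shows these wait semantics on the host side and assumes a hypothetical `HostSignal` class. Whether the condition is met or the timeout expires, `waitNotEqual()` returns the last observed value rather than a sentinel. This is how `hsa_signal_wait_scacquire` reports its result.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical host-side signal, not the runtime's hostcall signal class.
class HostSignal {
 public:
  explicit HostSignal(int64_t v) : value_(v) {}
  void store(int64_t v) { value_.store(v, std::memory_order_release); }

  // Spins while the value equals `expected`, up to `timeout`. Returns the
  // last observed value in both cases (condition met or timed out), so the
  // caller can tell which one happened by inspecting the result.
  int64_t waitNotEqual(int64_t expected, std::chrono::microseconds timeout) const {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    int64_t current = value_.load(std::memory_order_acquire);
    while (current == expected && std::chrono::steady_clock::now() < deadline) {
      std::this_thread::yield();
      current = value_.load(std::memory_order_acquire);
    }
    return current;
  }

 private:
  std::atomic<int64_t> value_;
};
```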
Change-Id: I3df34207e0c2e21972ec8052777e5742bda1dca0
[ROCm/clr commit: 9a9d10a10b]
Implement hipStreamWait/write with a blit kernel
instead of an AQL packet.
Change-Id: I462671ed5cec37144dfe97ff66439249196117c1
[ROCm/clr commit: cbb8d82bdb]
info_.extensions_ and settings_ are deleted in amd::Device::~Device().
Change-Id: I06f240a42e5c131dbd4e61a759f905bcdf84b45a
[ROCm/clr commit: f212fc91ca]
The cache coherency layer is an OCL feature that supports multiple devices
in a single OCL context.
Change-Id: Ic66df9551fad5b0c4df95ab3e1db1da259919f25
[ROCm/clr commit: 6da9d18140]
The queue may already be destroyed by the time the app requests
the event status, so get the active state from the device instead.
Change-Id: I887ecb0cfe414c2119247228b0d1255b8308da1e
[ROCm/clr commit: f116959b54]
When unsetting, the runtime should use HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE
for the agent, not HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE_IN_PLACE.
Change-Id: I3814802d1fb3b72c54e7566defafafed6b0d5cee
[ROCm/clr commit: d8a86e4870]
The original logic reserved only one slot in the queue for HW processing.
For an unknown reason, there is a race condition when the CPU overwrites
the slot just before the currently active one. As a workaround, avoid
overwriting that preceding slot, since its HW processing may be unfinished.
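The sketch below shows the two slot checks. It assumes a hypothetical ring of `size` packets indexed by monotonically increasing 64-bit read and write indices, where `readIndex` is the slot the HW is currently processing. Writing at `writeIndex` overwrites the packet at `writeIndex - size`. The workaround tightens the bound by one so that the packet at `readIndex - 1` is never overwritten. This is an interpretation of the message, not the runtime's code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model; assumes writeIndex >= readIndex.
// Original check: only the active slot (readIndex) is held back for HW.
bool canWriteOriginal(uint64_t writeIndex, uint64_t readIndex, uint64_t size) {
  return writeIndex - readIndex < size;
}

// Workaround: additionally hold back the slot just before the active one,
// whose HW processing may not be finished yet.
bool canWriteWorkaround(uint64_t writeIndex, uint64_t readIndex, uint64_t size) {
  return writeIndex - readIndex < size - 1;
}
```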
Change-Id: I565495a8feeaedffc9fc8a505edbee5ff5816975
[ROCm/clr commit: 65ddfcc6a8]
std::mem_fun() and std::bind2nd() were removed in C++17. Switch to
simpler logic that does not require those functions.
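The example below illustrates the kind of rewrite involved and uses a hypothetical `Counter` type. The removed adaptors bound a member function and its second argument. A plain range-for does the same work without any adaptor.

```cpp
#include <cassert>
#include <vector>

// Hypothetical type standing in for a runtime object.
struct Counter {
  int value = 0;
  void add(int n) { value += n; }
};

// Pre-C++17 form (the adaptors were removed from the C++17 standard):
//   std::for_each(items.begin(), items.end(),
//                 std::bind2nd(std::mem_fun(&Counter::add), n));
// Equivalent logic without adaptors:
void addToAll(const std::vector<Counter*>& items, int n) {
  for (Counter* c : items) {
    c->add(n);
  }
}
```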
Change-Id: I19a31f076e1813e367615bd377b424046ce144c7
[ROCm/clr commit: d934612948]
CMake does not provide a way to query the NUMA library, hence we need
to find it manually.
Change-Id: I370b286acdee75cbebc21340da3c432c79f8ffa7
[ROCm/clr commit: dd23379ac8]
std::mem_fun() was removed in C++17. Simplify the logic so it does not require it.
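Where a functor is still wanted, `std::mem_fn` (C++11) is a standard replacement for `std::mem_fun`. The sketch below uses a hypothetical `Device` struct rather than the runtime's class.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a runtime object.
struct Device {
  bool released = false;
  void release() { released = true; }
};

// Removed in C++17:
//   std::for_each(devs.begin(), devs.end(), std::mem_fun(&Device::release));
// std::mem_fn works with both pointers and references to the object.
void releaseAll(std::vector<Device*>& devs) {
  std::for_each(devs.begin(), devs.end(), std::mem_fn(&Device::release));
}
```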
Change-Id: Ic9a4753b48dd13fcb20cd5b90ff73c3df3211b9f
[ROCm/clr commit: c68f024b35]
In the fillBuffer shader, if a value is written to an MMIO register as
two 32-bit writes, the write can get dropped; it must be a single 64-bit
write. Optimize fillBuffer to use 64-bit and 16-bit writes.
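The sketch below shows the width selection on the host side and assumes a hypothetical `fillWriteWidth()` helper. The real fillBuffer is a GPU blit shader. The helper picks the widest store that evenly divides both the pattern size and the destination address. With this rule, an 8-byte pattern at an 8-byte-aligned address is written with single 64-bit stores, and a 2-byte pattern with 16-bit stores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: returns the store width in bytes (8, 4, 2 or 1).
// Preferring 8 avoids splitting a 64-bit value into two 32-bit writes.
size_t fillWriteWidth(size_t patternSize, uintptr_t dstAddress) {
  for (size_t width : {size_t{8}, size_t{4}, size_t{2}}) {
    if (patternSize % width == 0 && dstAddress % width == 0) {
      return width;
    }
  }
  return 1;
}
```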
Change-Id: I3aa78e027898f8ae01e9c8f09004615673720c2b
[ROCm/clr commit: 21ba34d0fe]
Add an env var ROC_USE_FGS_KERNARG to toggle kernel argument placement.
By default, kernel args are placed in the fine-grain kernarg segment on supported ASICs.
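The sketch below shows the toggle. It assumes a hypothetical `useFineGrainKernargs()` helper and assumes the variable takes `0` or `1`. The runtime's actual parsing and ASIC detection are not described in this log.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: fine-grain kernarg placement is the default on
// supported ASICs; setting ROC_USE_FGS_KERNARG=0 turns it off. The variable
// cannot enable the placement on an ASIC that does not support it.
bool useFineGrainKernargs(bool asicSupportsFgs) {
  if (!asicSupportsFgs) return false;
  const char* env = std::getenv("ROC_USE_FGS_KERNARG");
  if (env == nullptr) return true;      // default for supported ASICs
  return std::strcmp(env, "0") != 0;    // "0" disables, anything else keeps it
}
```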
Change-Id: I3d57ed69a1a4db2b392b0438ead499f3ddca4716
[ROCm/clr commit: e29b9c00ee]