The queue may already be destroyed by the time the app requests
the event status. Hence just get the active state from the device.
Change-Id: I887ecb0cfe414c2119247228b0d1255b8308da1e
[ROCm/clr commit: f116959b54]
When unsetting, the runtime should use HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE
for the agent, not HSA_AMD_SVM_ATTRIB_AGENT_ACCESSIBLE_IN_PLACE.
Change-Id: I3814802d1fb3b72c54e7566defafafed6b0d5cee
[ROCm/clr commit: d8a86e4870]
The original logic left only one slot in the queue for HW processing.
For an unknown reason, there is a race condition when the CPU overwrites
the slot before the currently active one. As a workaround, also skip the
slot preceding the currently active slot, since HW processing of it may
be unfinished.
Change-Id: I565495a8feeaedffc9fc8a505edbee5ff5816975
[ROCm/clr commit: 65ddfcc6a8]
std::mem_fun() and std::bind2nd() were removed in C++17. Switch to
simpler logic that does not require those functions.
Change-Id: I19a31f076e1813e367615bd377b424046ce144c7
[ROCm/clr commit: d934612948]
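As an illustration of the kind of rewrite described above, here is a minimal sketch that replaces std::mem_fun() and std::bind2nd() with lambdas. The Device type and the helper names countOnline() and findById() are hypothetical and are not taken from the actual change.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in type for illustration only.
struct Device {
  int id;
  bool online;
  bool isOnline() const { return online; }
  bool hasId(int wanted) const { return id == wanted; }
};

// Older style, relying on adaptors removed from the standard in C++17:
//   std::count_if(v.begin(), v.end(), std::mem_fun(&Device::isOnline));
//   std::find_if(v.begin(), v.end(),
//                std::bind2nd(std::mem_fun(&Device::hasId), 3));

// Same predicates expressed with lambdas.
std::size_t countOnline(const std::vector<Device*>& devices) {
  return static_cast<std::size_t>(
      std::count_if(devices.begin(), devices.end(),
                    [](const Device* d) { return d->isOnline(); }));
}

// Returns the first device with the given id, or nullptr if none matches.
const Device* findById(const std::vector<Device*>& devices, int wanted) {
  auto it = std::find_if(devices.begin(), devices.end(),
                         [wanted](const Device* d) { return d->hasId(wanted); });
  return it != devices.end() ? *it : nullptr;
}
```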
CMake does not provide a way to query the NUMA library, hence we need
to find it manually.
Change-Id: I370b286acdee75cbebc21340da3c432c79f8ffa7
[ROCm/clr commit: dd23379ac8]
std::mem_fun() is removed in C++17. Simplify logic to not require it.
Change-Id: Ic9a4753b48dd13fcb20cd5b90ff73c3df3211b9f
[ROCm/clr commit: c68f024b35]
For the fillBuffer shader, if there are two 32-bit writes to an MMIO
register, a write can get dropped. It has to be a single 64-bit write.
Add an optimization to fillBuffer to issue 64-bit and 16-bit writes.
Change-Id: I3aa78e027898f8ae01e9c8f09004615673720c2b
[ROCm/clr commit: 21ba34d0fe]
Add an env var ROC_USE_FGS_KERNARG to toggle kernel arg placement.
By default it is in the fine-grain kernel arg segment on supported ASICs.
Change-Id: I3d57ed69a1a4db2b392b0438ead499f3ddca4716
[ROCm/clr commit: e29b9c00ee]
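A minimal sketch of how such a toggle could be read, assuming that an unset variable keeps the default and that the value "0" disables the feature. The helpers parseBoolFlag() and useFineGrainKernarg() are hypothetical, and the accepted values are an assumption rather than the runtime's documented behaviour.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: unset or empty keeps the default, "0" means off,
// any other value means on (assumed convention).
bool parseBoolFlag(const char* value, bool defaultValue) {
  if (value == nullptr || *value == '\0') return defaultValue;
  return std::strcmp(value, "0") != 0;
}

// Hypothetical helper: fine-grain kernarg placement is used only when the
// ASIC supports it and the env var does not turn it off (default: on).
bool useFineGrainKernarg(bool asicSupported) {
  return asicSupported &&
         parseBoolFlag(std::getenv("ROC_USE_FGS_KERNARG"), true);
}
```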
PCMark10 counts the time spent in clCreateKernel as part of execution
time, so as a workaround for the PAL path, move code object loading
back to clBuildProgram.
Change-Id: I3b9cf1879ece08ab59f447ec165b0525bc8593a4
[ROCm/clr commit: 1d0364e590]
Pass the device agent specified by the user to the ROCr API instead of
the device agent attached to the specified stream.
Change-Id: I86c98935b9dc404eaa6d47ccdd082a8c3678fb36
[ROCm/clr commit: 169cc857fd]
Fix a segfault that occurs when the attribute
hipMemRangeAttributeAccessedBy is queried using hipMemRangeGetAttribute.
Change-Id: I2ceb2267d89bfc31a55d9eae2685610c7ad89b1f
[ROCm/clr commit: 48c1b895c0]
Reuse the FillMemory function, which should fix the cache syncs from the host.
Change-Id: Ieebec5fc3ed3a322b88d5187c8dca4805ec6f84b
[ROCm/clr commit: 24442be35a]
This patch allows substituting the binary for an OpenCL program. It is
intended to be used as follows:
1. Run the OpenCL program with -save-temps.
2. Open the cl temp and find the following text in the program header:
   Hash to override:
   Source: 0xd66bcfa20e69e605
   Source + clang options: 0x656a9dd8aedcbfb6
3. Create a config file (ASCII text) with one or more pairs:
   <hash> <path_to_binary_to_substitute>
   where hash is the hex value from step 2 (without the leading 0x). Use
   either hash depending on what you want to match: only the source text
   of the program, or the source text along with its clang options.
4. Set the env variable AMD_OCL_SUBST_OBJFILE to the path of your config file.
5. Rerun the OpenCL program.
Change-Id: I977c80fe529ea14458194918c6ddfbe2de6a8857
[ROCm/clr commit: 51cc9c2f8c]
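The config file format described in step 3 can be pictured with the following sketch, which parses "<hash> <path>" pairs into a map. The function parseSubstConfig() and the SubstMap alias are hypothetical. The sketch assumes one pair per line and paths without whitespace, and is not the parser used by the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical alias: program hash -> path of the binary to substitute.
using SubstMap = std::unordered_map<std::uint64_t, std::string>;

// Reads "<hex hash> <path>" pairs, one per line (assumed layout).
// Blank lines and lines with an unparsable hash are skipped.
SubstMap parseSubstConfig(std::istream& in) {
  SubstMap result;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string hashText;
    std::string path;
    if (!(fields >> hashText >> path)) continue;
    try {
      std::size_t used = 0;
      const std::uint64_t hash = std::stoull(hashText, &used, 16);
      if (used != hashText.size()) continue;  // trailing non-hex characters
      result[hash] = path;
    } catch (const std::exception&) {
      continue;  // not a hex number, or out of range
    }
  }
  return result;
}
```

A caller would open the file named by AMD_OCL_SUBST_OBJFILE with std::ifstream and pass the stream to this function.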
Current logic when creating a buffer view will end up going into the
allocation block. Even though no memory will be allocated, since
owner()->getSvmPtr() is already allocated, we'll still end up
calling updateFreeMemory().
Checking whether we're creating a view skips the SVM allocation logic
and lets us fall through to the actual view creation logic, which does
not update the free memory counter.
Change-Id: I1c260a9ef57895130b272ea1246e06e812b25b37
[ROCm/clr commit: f167136918]
The new query MemRangeAttribute::CoherencyMode returns the current
coherency mode for the provided memory region. The coherency mode is
one of the following types: FineGrain, CoarseGrain, or Indeterminate.
Change-Id: Ib66feeeb14f57a8b1cc731c65bb3d0276d297ff7
[ROCm/clr commit: 992830bab7]