A CPU read updates L2 with the latest values and requires an
invalidation afterwards, because SDMA doesn't use L2 and the data can
become out of sync.
Change-Id: I98d1c91ca78a103fa5409e638f97485d62d5b11e
When the OCL ROCr backend performs CL_MEM_COPY_HOST_PTR it may attempt
to access the amd::Memory object it is currently creating, which is
not ready yet. The logic creates a temporary dummy object to perform
the copy transfer. The new change makes sure the runtime skips
allocating the same device::Memory object a second time.
Change-Id: I14c6a00a3941fdcaa6aea299e9f096e4c3f5cadf
Use barrier packets for every profile marker that gets submitted
and use the completion signal to get GPU timestamps. This gives the
most accurate dispatch time. Combine cache flushes with the profile
marker if there is a pending dispatch that needs a cache flush. This
optimization saves an extra barrier and helps wall time.
Change-Id: Ib62d6d7aabf4743827b561be6c9c5afa813203da
[PAL to KFD/ROCr][ROCr_Runtime][Vega10] OCLSeparateCompile subtest of
oclcompiler from ocltst test package is encountering clLinkProgram()
failed (chksum 0x00000001) error
If the runtime does not provide a dump file name to the ELF library,
the ELF library uses a temp file in the current folder.
The current folder may not be writable for several reasons:
1. The application's current folder may be a system folder where the
user does not have write permission.
2. The current folder is on a read-only file system. This happens for
embedded customers.
Tested on Vega10; the issue is fixed.
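A minimal sketch of one way to pick a dump file name outside the
current folder. The helper name makeElfDumpPath, the use of TMPDIR and
the /tmp fallback are assumptions for illustration; the commit message
does not say which location the runtime actually chooses.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper (not from the runtime): build a dump file path in a
// temp directory instead of relying on the current working directory.
// tmpDirEnv is the value of an environment variable such as TMPDIR and may
// be null or empty, in which case "/tmp" is assumed.
std::string makeElfDumpPath(const char* tmpDirEnv, const std::string& baseName) {
  std::string dir =
      (tmpDirEnv != nullptr && tmpDirEnv[0] != '\0') ? tmpDirEnv : "/tmp";
  if (dir.back() != '/') {
    dir += '/';  // make sure directory and file name are separated
  }
  return dir + baseName;
}
```

A caller could pass std::getenv("TMPDIR") as the first argument and
hand the resulting path to the ELF library as the dump file name.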
Change-Id: Ic0e9f040b7c7583914301673cce237ab28b0c0cb
PAL doesn't perform chunking for system memory allocations, hence we
should fall back to using pinned memory for mapping large buffers.
Change-Id: I1b472616b72d12ed0105fb65532acacdb98ac7b3
A device's offset in Pal::AsicRevision may change over time, while the current implementation assumes the offset never changes.
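A sketch of the general idea: look a device up by its enum value
instead of using the value as a fixed array offset. The enum below is a
hypothetical stand-in for Pal::AsicRevision with made-up values, and
DeviceInfo, kDevices and findDevice are illustrative names, not the
runtime's own.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for Pal::AsicRevision; the numeric values are
// illustrative only and are exactly what may be renumbered over time.
enum class AsicRevision : uint32_t { Unknown = 0, Tahiti = 5, Vega10 = 20, Navi10 = 31 };

struct DeviceInfo {
  AsicRevision revision;  // key used for lookup
  const char* name;
};

// Each entry carries its own key, so the table does not depend on the
// enum values being contiguous or stable.
static constexpr DeviceInfo kDevices[] = {
    {AsicRevision::Tahiti, "Tahiti"},
    {AsicRevision::Vega10, "Vega10"},
    {AsicRevision::Navi10, "Navi10"},
};

// Search by value rather than indexing with the enum as an offset.
const DeviceInfo* findDevice(AsicRevision rev) {
  for (const DeviceInfo& d : kDevices) {
    if (d.revision == rev) return &d;
  }
  return nullptr;  // unknown or unsupported revision
}
```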
Change-Id: Id993512aa0da6e0b2356f594d5e58f76d1f97f16
OCLTST crashes in oclruntime.OCLKernelBinary on Tahiti
because a pointer is deleted as an array although it
refers to a single object. The fix corrects the
mismatched delete in the TempWrapper destructor.
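A sketch of the class of bug described above: an owner that releases
memory must match delete with new and delete[] with new[]. OwnedPtr and
Counted are hypothetical names for illustration and do not reproduce
the runtime's TempWrapper.

```cpp
#include <cassert>

// Hypothetical owner that remembers how the pointer was allocated, so
// the destructor can pick the matching form of delete.
template <typename T>
class OwnedPtr {
 public:
  OwnedPtr(T* ptr, bool isArray) : ptr_(ptr), isArray_(isArray) {}
  ~OwnedPtr() {
    if (isArray_) {
      delete[] ptr_;  // allocated with new[]
    } else {
      delete ptr_;    // allocated with new
    }
  }
  OwnedPtr(const OwnedPtr&) = delete;
  OwnedPtr& operator=(const OwnedPtr&) = delete;
  T* get() const { return ptr_; }

 private:
  T* ptr_;
  bool isArray_;
};

// Small tracer type that counts live instances, used only to show that
// every object is destroyed exactly once.
struct Counted {
  static int live;
  Counted() { ++live; }
  ~Counted() { --live; }
};
int Counted::live = 0;
```

In new code std::unique_ptr<T> and std::unique_ptr<T[]> encode the same
distinction in the type, which avoids the runtime flag altogether.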
Change-Id: Ic5a1387a426c102b085a4ef8ff8ff05e6a870cba
ROCr now reports the actual HW addressing limits for HIP, so OpenCL has to impose a lower limit.
Change-Id: I60c2ce27ed1d1f45f16fb76438965a236ba872c6
OCL can't distinguish between copy types, but the ROC profiler
expects SDMA transfers to be visible. Add extra code to detect
a transfer involving host memory and substitute the OCL command.
Change-Id: I5290acd0e10bc082e00c1d4ae1474a075de7f165
We unmap memory using a different pointer, so the ROCr runtime
may be confused and silently ignore the unmap request.
Change-Id: Ic5a1387a426cf02a985a4ef8ff8ff05e6a870cbf
PAL may internally align up the allocation size to the page size
reported by KMD. This will cause a mismatch in size between OCL and PAL.
To avoid this, use PAL size when updating the free memory counter on
both alloc and free.
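A sketch of the accounting described above. The counter is updated with
the aligned size on both paths, so allocation and free stay balanced.
FreeMemCounter and alignUp are hypothetical names, and the 4096-byte
page size in the usage note is an assumption, since the real value is
whatever KMD reports.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Round size up to a multiple of alignment. The alignment must be a
// power of two for the mask arithmetic to be valid.
constexpr std::size_t alignUp(std::size_t size, std::size_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical free-memory tracker. Callers pass the size actually
// reserved by the allocator, not the size originally requested.
class FreeMemCounter {
 public:
  explicit FreeMemCounter(std::size_t total) : free_(total) {}
  void onAlloc(std::size_t allocatedSize) { free_ -= allocatedSize; }
  void onFree(std::size_t allocatedSize) { free_ += allocatedSize; }
  std::size_t freeBytes() const { return free_; }

 private:
  std::atomic<std::size_t> free_;
};
```

For example, a 5000-byte request with an assumed 4096-byte page
reserves alignUp(5000, 4096) bytes, and that same value is passed to
both onAlloc and onFree so the counter returns to its starting value.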
Change-Id: Ic6e8c861a52170476474fb70a769eef93be3261f
Enable this optimization when the barrier is disabled, since
reuse requires a signal wait.
Use the size of pending AQL signals as the size of the signal pool.
Change-Id: I2754a0f8b67e19d2601c58945e10fdf0e8be1624
On ReBar systems the invisible heap is not present, so in theory we
should fail to create the suballocation chunk; however, PAL doesn't
report any errors.
To make sure we never fail, allow creating the allocation in the visible
heap and system memory.
Change-Id: Iea9cc68d98b9cb396a2b7a37398b98b66274083b
Now rocm/rocdevice.cpp also includes comgrctx.hpp, and we don't want to statically link against comgr when building shared libs.
Change-Id: Ic330bd860559b3e07b776c951afe6126b0f43f7d
This is helpful when debugging issues on low-end ASICs. Navi14 can be emulated as Navi10, and Navi22 can be emulated as Navi21.
Change-Id: I693ffd45a5b03657822afdc872781901bc69b65c
With the PAL_ALWAYS_RESIDENT flag, memory objects are resident at allocation time, so there is no need to make them resident again before submit.
Also, we should never evict anything with this setting, or we'll generate a VM fault.
Change-Id: Ieacc6af88ab4e09c20efd94100e148b2502e1d70
The change reuses HSA signals for dispatches as a wait signal.
Skipping the barrier requires disabling the L2 cache for sysmem
allocations and extra tracking of HDP access with the large BAR.
ROC_BARRIER_SYNC=0 activates the new logic. Barrier sync is
still used by default.
ROC_ACTIVE_WAIT=1 enables unconditional active wait in ROCr.
The change also consolidates the ROCr wait logic into a single function.
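A sketch of how the two settings could be read with the defaults
described above. parseBoolFlag, WaitSettings and readWaitSettings are
hypothetical names, not the runtime's own flag machinery, and treating
ROC_ACTIVE_WAIT as off by default is an assumption, since only the
barrier sync default is stated.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical parser: an unset or empty value keeps the default, any
// non-zero integer enables the flag, and zero disables it.
bool parseBoolFlag(const char* value, bool defaultValue) {
  if (value == nullptr || value[0] == '\0') return defaultValue;
  return std::strtol(value, nullptr, 10) != 0;
}

struct WaitSettings {
  bool barrierSync;  // ROC_BARRIER_SYNC, on by default per the text above
  bool activeWait;   // ROC_ACTIVE_WAIT, assumed off by default
};

// Takes the raw environment strings so the logic is easy to exercise
// without touching the process environment.
WaitSettings readWaitSettings(const char* barrierSyncEnv, const char* activeWaitEnv) {
  return WaitSettings{parseBoolFlag(barrierSyncEnv, true),
                      parseBoolFlag(activeWaitEnv, false)};
}
```

A caller would typically pass std::getenv("ROC_BARRIER_SYNC") and
std::getenv("ROC_ACTIVE_WAIT") as the two arguments.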
Change-Id: I6bd1be30aa88258da1b1f9de319ef5a45852afd8