Use a barrier packet for every profile marker that gets submitted
and use its completion signal to get the GPU timestamp. This gives the
most accurate dispatch time. Combine the cache flush with the profile
marker if there is a pending dispatch that needs a cache flush. This
optimization saves an extra barrier and improves wall time.
Change-Id: Ib62d6d7aabf4743827b561be6c9c5afa813203da
[ROCm/clr commit: 59c6cb0268]
[PAL to KFD/ROCr][ROCr_Runtime][Vega10] OCLSeparateCompile subtest of
oclcompiler from ocltst test package is encountering clLinkProgram()
failed (chksum 0x00000001) error
If the runtime does not provide a dump file name to the ELF library,
the ELF library uses a temp file in the current folder.
The current folder may not be writable for several reasons:
1. The application's current folder might be a system folder where the
user does not have write permission.
2. The current folder is on a read-only file system. This happens for
embedded customers.
Tested on Vega10. The issue is fixed.
Change-Id: Ic0e9f040b7c7583914301673cce237ab28b0c0cb
[ROCm/clr commit: 6327dbc4cc]
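The message above does not show the fix itself. As a minimal sketch of
one way to avoid depending on the current folder, assuming the caller
is able to pass an explicit dump file name, the path can be built under
the system temp directory. makeDumpFilePath is a hypothetical helper,
not a function from the runtime or the ELF library.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: builds a dump-file path under the system temp
// directory instead of the current working directory.
// Returns an empty path if no temp directory can be determined.
fs::path makeDumpFilePath(const std::string& baseName) {
  assert(!baseName.empty());
  std::error_code ec;
  // Non-throwing overload; ec is set if the temp directory is unusable.
  fs::path dir = fs::temp_directory_path(ec);
  if (ec) {
    return {};
  }
  return dir / baseName;
}
```

This sketch does not make the name unique; a real caller would likely
add a process or thread identifier to baseName.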
PAL doesn't perform chunking for system memory allocations, hence we
should fall back to using pinned memory for mapping large buffers.
Change-Id: I1b472616b72d12ed0105fb65532acacdb98ac7b3
[ROCm/clr commit: b4e212a0f9]
If deferred allocation is disabled, then make sure the image view
is created without a delay. Also reset the allocation state, since
the create() method isn't called for view creation.
Change-Id: I7aa22a62bff18289ade83e56b5d3305ba68c715b
[ROCm/clr commit: 089a5cc4ad]
The hack doesn't really track the command status. It may not be
necessary for HIP, but it will cause early resource release.
Change-Id: I791ad36dd8abd3b6b3d2c9b16a210a555c08ca64
[ROCm/clr commit: 532f0ae951]
A device's offset in Pal::AsicRevision can change over time, but the
current implementation assumes the offset never changes.
Change-Id: Id993512aa0da6e0b2356f594d5e58f76d1f97f16
[ROCm/clr commit: b1d75637bd]
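To illustrate the problem described above, here is a minimal sketch of
looking a device up by its enum value instead of using the value as a
table index. The enum, its numeric values, and the table are
hypothetical stand-ins and are not taken from PAL or the runtime.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical stand-in for an ASIC revision enum. The numeric values
// are made up and may shift whenever new entries are inserted.
enum class AsicRevision : std::uint32_t {
  Unknown = 0,
  Tahiti = 1,
  Vega10 = 7,
  Vega20 = 9,
};

struct DeviceInfo {
  AsicRevision revision;
  std::string_view name;
};

// Each entry carries its own key, so the table order does not matter.
constexpr DeviceInfo kDevices[] = {
    {AsicRevision::Tahiti, "Tahiti"},
    {AsicRevision::Vega10, "Vega10"},
    {AsicRevision::Vega20, "Vega20"},
};

// Searches by value instead of indexing with the enum's numeric offset.
// Returns nullptr when the revision is not in the table.
const DeviceInfo* findDevice(AsicRevision rev) {
  for (const DeviceInfo& d : kDevices) {
    if (d.revision == rev) {
      return &d;
    }
  }
  return nullptr;
}
```

Because the lookup compares keys, renumbering the enum does not
silently select the wrong entry.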
OCLTST crashes at oclruntime.OCLKernelBinary on
Tahiti because a single pointer is deleted as if
it were a pointer vector. The fix corrects the
wrong delete in the TempWrapper destructor.
Change-Id: Ic5a1387a426c102b085a4ef8ff8ff05e6a870cba
[ROCm/clr commit: 6a6faf1d58]
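The class of bug described above can be sketched as follows. OwnedTemp
is a hypothetical wrapper written for illustration and is not the
actual TempWrapper. It records whether the pointer came from new or
new[] so the destructor can use the matching form of delete.

```cpp
#include <cassert>

// Counts destructor calls so the cleanup can be observed.
int g_destroyed = 0;

struct Tracked {
  ~Tracked() { ++g_destroyed; }
};

// Hypothetical owning wrapper (not the runtime's TempWrapper).
template <typename T>
class OwnedTemp {
 public:
  // isArray must be true only if ptr was allocated with new[].
  OwnedTemp(T* ptr, bool isArray) : ptr_(ptr), isArray_(isArray) {
    assert(ptr_ != nullptr);
  }
  OwnedTemp(const OwnedTemp&) = delete;
  OwnedTemp& operator=(const OwnedTemp&) = delete;

  ~OwnedTemp() {
    // Mismatching new/delete[] or new[]/delete is undefined behavior,
    // so pick the form that matches the allocation.
    if (isArray_) {
      delete[] ptr_;
    } else {
      delete ptr_;
    }
  }

 private:
  T* ptr_;
  bool isArray_;
};
```

In new code, std::unique_ptr<T> and std::unique_ptr<T[]> encode the
same distinction in the type and avoid the flag.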
ROCr now reports the actual HW addressing limits for HIP, so OpenCL
has to impose a lower limit.
Change-Id: I60c2ce27ed1d1f45f16fb76438965a236ba872c6
[ROCm/clr commit: 1da0fe4263]
OCL can't distinguish different copy types, but the ROC profiler
expects SDMA transfer visibility. Add extra code to detect
a transfer with host memory and substitute the OCL command.
Change-Id: I5290acd0e10bc082e00c1d4ae1474a075de7f165
[ROCm/clr commit: bd340d8cbf]