The test window quits or shows an error message when running the OpenCL API.
Add a workaround for the race condition with the first page
during pinning.
Change-Id: I9a27b4e173cf94c84aefcb94e255f11169453d94
[ROCm/clr commit: ab85674f8a]
- Check the queue for nullptr, since user events may not have
a queue associated with them.
Change-Id: Ib969a052acc9108ca3fd0c063157fe4d47c5b244
[ROCm/clr commit: 288967eff4]
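The nullptr check described in the commit above can be illustrated with a minimal sketch. The `HostQueue` and `Event` types and the `queueNameOf` helper are hypothetical stand-ins, not the runtime's actual classes; the only point is that a user event carries no queue, so the pointer must be tested before it is dereferenced.

```cpp
#include <string>

// Hypothetical stand-in for a command queue (not the runtime's real type).
struct HostQueue {
  std::string name;
};

// Hypothetical stand-in for an event. A user event is created without a
// queue, so the pointer stays null.
struct Event {
  HostQueue* queue = nullptr;
};

// Returns the name of the queue the event belongs to, or a placeholder
// when the event has no queue (the user-event case).
inline std::string queueNameOf(const Event& event) {
  if (event.queue == nullptr) {
    return "<no queue>";
  }
  return event.queue->name;
}
```

With this guard, a default-constructed `Event` yields the placeholder string instead of dereferencing a null pointer.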
The printf call in the device code is expanded by the compiler into a
series of hostcalls that together form a "message". This change
introduces the following functionality in the runtime:
1. Receive a generic message consisting of a series of hostcalls.
2. Process a printf message.
Change-Id: I9d667d6f91607a907a96e46cc5fca55734339747
[ROCm/clr commit: ce4a34bc71]
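Step 1 of the commit above (receiving a message made of several hostcalls) could look roughly like the sketch below. Everything here is an assumption for illustration: the `Packet` layout, the `messageId` and `last` fields, and the `MessageAssembler` class are hypothetical and do not reflect the real hostcall wire format. The sketch only shows the general idea of buffering payloads until the final packet of a message arrives.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical packet: one hostcall carrying part of a message.
struct Packet {
  std::uint64_t messageId;             // which message this packet belongs to
  bool last;                           // true on the final packet of the message
  std::vector<std::uint64_t> payload;  // raw payload words
};

// Accumulates payloads per message until the final packet is seen.
class MessageAssembler {
 public:
  // Appends the packet's payload to its message. Returns true and fills
  // `out` with the complete message once the final packet has arrived.
  bool receive(const Packet& packet, std::vector<std::uint64_t>& out) {
    std::vector<std::uint64_t>& buffer = pending_[packet.messageId];
    buffer.insert(buffer.end(), packet.payload.begin(), packet.payload.end());
    if (!packet.last) {
      return false;  // message is still incomplete
    }
    out = std::move(buffer);
    pending_.erase(packet.messageId);
    return true;
  }

 private:
  std::map<std::uint64_t, std::vector<std::uint64_t>> pending_;
};
```

Once `receive` returns true, a handler such as a printf processor (step 2) would interpret the assembled words; that decoding is left out because the source does not describe the message contents.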
Each WGP consists of 2 CUs, so the number of available SIMD units is doubled.
Change-Id: I43978a8a9139c33f5f776b344a36bee927cc187d
[ROCm/clr commit: e76d867740]
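The arithmetic behind the commit above can be sketched as follows. The `ComputeInfo` struct, its field names, and the sample values are hypothetical; the sketch assumes only what the message states, namely that a unit reported in WGP mode holds two CUs.

```cpp
#include <cstdint>

// Hypothetical description of the reported compute units.
struct ComputeInfo {
  std::uint32_t numUnits;   // units as reported (CUs, or WGPs in WGP mode)
  std::uint32_t simdPerCU;  // SIMD units per CU
  bool wgpMode;             // true when each reported unit is a WGP
};

// Total SIMD units available. In WGP mode every reported unit contains
// two CUs, so the total doubles.
inline std::uint32_t totalSimds(const ComputeInfo& info) {
  const std::uint32_t cusPerUnit = info.wgpMode ? 2u : 1u;
  return info.numUnits * cusPerUnit * info.simdPerCU;
}
```

For example, 20 reported units with 2 SIMDs per CU give 40 SIMDs when the units are CUs and 80 when they are WGPs.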
~45% to 50% of Performance drop on rocBLAS_int8 test
Add support for active waits without blocking the host thread.
Change-Id: Ie7bb48dcafcb4c93d448bf74749b829b626c3578
[ROCm/clr commit: 0fc433e076]
cl_bool fields need to be stored as uint32_t instead of bool, because cl_bool is a typedef of cl_uint (a 32-bit unsigned integer).
Currently clGetDeviceInfo() reports an incorrect size for the return value, because cl_bool is 4 bytes while a C++ bool is 1 byte.
Change-Id: I647a4b8873627059865c84c8ca27694dbc0916de
[ROCm/clr commit: 243a3c2aa4]
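The size mismatch described in the commit above can be reproduced with a small sketch. The `cl_uint` and `cl_bool` aliases below are local stand-ins that mirror the OpenCL header typedefs so the example builds without the OpenCL headers; `writeInfo` and the two `sizeReported...` functions are hypothetical helpers, not the runtime's code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Local stand-ins mirroring the OpenCL typedefs (assumption: used here
// only so the sketch does not depend on the OpenCL headers).
using cl_uint = std::uint32_t;
using cl_bool = cl_uint;

// Hypothetical helper: copies a value into the caller's buffer when it
// fits and returns the size that would be reported to the caller.
template <typename T>
std::size_t writeInfo(const T& value, void* dst, std::size_t dstSize) {
  if (dst != nullptr && dstSize >= sizeof(T)) {
    std::memcpy(dst, &value, sizeof(T));
  }
  return sizeof(T);
}

// Size reported when the field is stored as a C++ bool.
inline std::size_t sizeReportedForBoolField() {
  const bool field = true;
  return writeInfo(field, nullptr, 0);
}

// Size reported when the field is stored as a 32-bit cl_bool.
inline std::size_t sizeReportedForClBoolField() {
  const cl_bool field = 1;
  return writeInfo(field, nullptr, 0);
}
```

A caller that passes a `cl_bool` buffer expects 4 bytes back, which only the second variant reports; the first reports `sizeof(bool)`, which is 1 on common platforms.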
Add MS HWS support. PAL reports just one compute engine
in that mode, and the runtime needs extra logic to detect RT queues.
Change-Id: I011f1f1b18dec6a7195a4f1fe939f8029bc269ae
[ROCm/clr commit: 622c714165]
~45% to 50% of Performance drop on rocBLAS_int8 test
Use the last command in the queue for a wait.
Add extra print information about processed commands.
Add an option to disable file location printing.
Change-Id: I4187883e1a90e571fde3128af98368108fda8785
[ROCm/clr commit: a66d09f5a3]
When we're aligning rowPitch to imagePitchAlignment, rowPitch is in pixels,
but imagePitchAlignment_ is in bytes, so we end up overaligning the pitch.
Convert imagePitchAlignment_ to pixels before doing any logic.
Change-Id: Ia5ab9d54bed150fe974e86b060dbadc196165b29
[ROCm/clr commit: 696d00e71b]
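The unit mismatch fixed by the commit above can be sketched as follows. The helper names and the `elementSize` parameter (bytes per pixel) are hypothetical, and the sketch assumes the byte alignment is a multiple of the element size; it is an illustration of the idea, not the runtime's code.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `value` up to the next multiple of `alignment`.
inline std::size_t alignUp(std::size_t value, std::size_t alignment) {
  assert(alignment > 0);
  return ((value + alignment - 1) / alignment) * alignment;
}

// Aligns a row pitch given in pixels to an alignment given in bytes.
// The alignment is converted to pixels first, so both operands of the
// rounding step use the same unit.
inline std::size_t alignRowPitchPixels(std::size_t rowPitchPixels,
                                       std::size_t pitchAlignBytes,
                                       std::size_t elementSize) {
  assert(elementSize > 0);
  std::size_t alignPixels = pitchAlignBytes / elementSize;
  if (alignPixels == 0) {
    alignPixels = 1;  // alignment smaller than one pixel: nothing to round
  }
  return alignUp(rowPitchPixels, alignPixels);
}
```

For a 100-pixel row, a 256-byte alignment, and 4 bytes per pixel, the converted alignment is 64 pixels and the pitch becomes 128 pixels. Rounding 100 directly to 256 would instead yield 256 pixels, which is the overalignment the commit removes.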
hip_threadfence_system passes locally with this change. This also fixes
hipHostMalloc() failures when hipHostMallocMapped flag is used.
Change-Id: Id412efe502accc7c6e7676b52c05ccb9d8fbbe67
[ROCm/clr commit: 5de65ba4a0]
Remove the workaround for CS_PARTIAL_FLUSH added in CL#1495187,
since PAL no longer uses CS_PARTIAL_FLUSH.
Change-Id: I03edc7595459e19aad33b2b0901f0ebe4754d310
[ROCm/clr commit: 1d25343af8]
[hipclang-vdi-rocm][perf]~45% to 50% of Performance drop on
rocBLAS_int8 test
- Enable AMD_OPT_FLUSH optimization by default to match HCC
- Disable CPU writes to GPU memory on boards with large BAR,
because it requires HDP flush tracking.
- Enable L2 cache on kernel arguments, because L2 will be
invalidated on memory reuse.
Change-Id: I124cf250bdd4d19c523ce542c163813828f8fbdc
[ROCm/clr commit: 374f612b7c]