EPR #404714 - [CQE OCL][2.0][DTB]Opencl1.2 WF Conf. Math test failedon Pitcairn and Oland due to CL#1065597
- FIx for TC regression after CL#1069020. Move the lock directly to the gsl flush() calls.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#332 edit
EPR #404714 - [CQE OCL][2.0][DTB]Opencl1.2 WF Conf. Math test failedon Pitcairn and Oland due to CL#1065597
- Add VGPU lock to flush() method, because gsl flush for the same context could be called from multiple threads
- Use new scratchAlloc_ monitor for scratch reallocation
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#455 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#130 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#331 edit
ECR #304775 - Correct a typo where I didnt remove the offset from the condition which made the writeRect take pinning path.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/5566/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#330 edit
ECR #304775 - Device enqueuing
- Provide scratch buffer offset for generic address space
- Use single scratch buffer for all available queues. Each queue will have a unique subbuffer in the global buffer
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#454 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#129 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusched.hpp#11 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuschedcl.cpp#24 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#329 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#120 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.cpp#63 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.h#37 edit
ECR #304775 - Use accelerated copy path for read/writeRect if the host memory has offsets. This avoids re-pinning the memory giving nearly a 100% perf boost for such copies.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/5371/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#328 edit
ECR #304775 - Device enqueuing
- Use atomic fetch for enqueue flags
- Switch to a multithreaded scheduler
- Add a workaround for Linux host_multi_queue failures. Linux has only 2 queues, but the test allocates multiple host queues and the same HW ring can be used
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#106 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#449 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#127 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuschedcl.cpp#22 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#325 edit
ECR #304775 - Device enqueuing
- Add L2 cache flush after the scheduler execution. Although CP has to work with L2 cache, it seems some functionality relies on direct memory access and without explicit L2 flush CP can pick old values in the template.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#324 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.cpp#61 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.h#35 edit
ECR #304775 - Device enqueuing
- Add printing of the waiting events
- Add early exit in the scheduler if nothing to launch
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuschedcl.cpp#19 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#321 edit
ECR #304775 - Device enqueuing
- Match the printed value width with the argument size
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#319 edit
ECR #304775 - Device enqueuing
- Added debug print for the generated child kernels. GPU_PRINT_CHILD_KERNEL=N, where N is the number of child kernels for dump.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#318 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#205 edit