EPR #409950 - [IV][OCL] Multiple OCL samples crashed on multiple machines for 32-bit OS.
There are two issues:
1. the SC dll should be dynamically loaded only when it is available. This is to allow apps to run on CPU device without the SC dll. This CL fixes it. It also allows user to use env var AMD_OCL_SC_LIB to provide the name or complete path of SC dll to load.
2. The test fails because amdhsasc.dll is not included in base driver for 32 bit OS. The proper solution should be ask package team to include amdhsasc.dll in the base driver. Also amdhsasc.dll should be renamed amdoclsc.dll since it is not only used for HSAIL but also used by AMDIL. The benefit of separate SC component as a shared library is decreased build time since changes in SC does not require rebuild of amdocl.dll, and ease of debugging and regression analysis by allowing swapping SC comopnent.
However since 15.10 branch is close, there is not enough time to make changes to package. Therefore this CL implements a workaround for this issue without change to the package. We will implement the proper fix in the next relase.
The workaround implemented by this CL embeds SC statically in amdocl.dll. The runtime loads SC dll specified by env var AMD_OCL_SC_LIB only if it is available. If the SC dll is not available, it will use the embeded SC.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/build/Makefile.api#96 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/api/v0_8/acl.cpp#22 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/api/v0_8/aclLoaders.cpp#9 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/Makefile#44 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/sclibdefs.opencl#20 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/include/v0_8/aclStructs.h#13 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/include/v0_8/aclTypes.h#4 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/build/Makefile.aoc2#21 edit
... //depot/stg/opencl/drivers/opencl/opencldefs#148 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#485 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#220 edit
EPR #407215 - reset host memory pointer of a image view based on original image
ReviewBoard #6301
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#480 edit
EPR #408459 - changed the implementation of svmAlloc, so that the first device can create amd::Memory object, and the rest of devices only added gpu memory to it. This is part of changes for mgpu support for svmalloc
code review:
http://ocltc.amd.com/reviews/r/6245/
precheckin testing results:
http://ocltc.amd.com:8111/viewModification.html?modId=43136&personal=true&buildTypeId=&tab=vcsModificationTests
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/cpu/cpudevice.hpp#88 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#233 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#479 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#133 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa/hsadevice.cpp#87 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa/hsadevice.hpp#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsadevice.cpp#19 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsadevice.hpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/context.cpp#34 edit
EPR #408185 - Use pinned memory if directaccess is true and remoteAlloc is used.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#478 edit
EPR #408506 - Extended the reported global memory size(CL_DEVICE_GLOBAL_FREE_MEMORY_AMD) to include a portion of remote memory for APU
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#476 edit
ECR #304775 - add flag to force CL_FP_DENORM on gpu
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#475 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#216 edit
EPR #407469 - disabled the SVM fine grain buffer support for CZ on mainline
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#472 edit
EPR #405824 - On apus, if we run out of local memory to allocate cl_mem objects, ocl runtime will use remote (system) memory. Update maxMemAllocSize_ to include that.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#469 edit
EPR #406328 - changed the device open algorithm so that the we only open the first OpenCL device. This is the OPENCL runtime changes, but this will be removed once we implemented multiple device support for SVM.
the code review and precheckin test:
http://ocltc.amd.com/reviews/r/5942/http://ocltc.amd.com:8111/viewModification.html?modId=40902&personal=true&buildTypeId=&tab=vcsModificationBuilds&show_all_builds=true
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#466 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusettings.cpp#287 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusettings.hpp#87 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/include/cal/cal.h#29 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#100 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.h#40 edit
EPR #406110 - OCL20:Basic subtest fails when running on GPU
- Reduce max prog variable size to 90% of max single allocation
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#465 edit
EPR #405824 - Back out changelist 1079967. It causes regressions in other apus.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#464 edit
EPR #405824 - On apus, if we run out of local memory to allocate cl_mem objects, ocl runtime will use remote (system) memory. Update maxMemAllocSize_ to include that.
Reviewed by: German
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#463 edit
EPR #397491 - disable OpenCL 2.0 for mainline when there are multiple devices in the system, because svm test will fail even test on the first device
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#462 edit
EPR #397491 - only using OpenCL 1.2 onr multiple GPUs on mainline by default.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#460 edit
ECR #304775 - Add batching to the device enqueue for possible asynchronous execution
- Increase the max device queue size to 512KB. That will allow to pass conformance tests that enqueue more jobs than the queue size.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#459 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusched.hpp#13 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuschedcl.cpp#28 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#333 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.cpp#65 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.h#39 edit
EPR #404714 - [CQE OCL][2.0][DTB]Opencl1.2 WF Conf. Math test failedon Pitcairn and Oland due to CL#1065597
- Add a new MapCacheLock monitor to separate the map cache from the global lock
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#456 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#131 edit
EPR #404714 - [CQE OCL][2.0][DTB]Opencl1.2 WF Conf. Math test failedon Pitcairn and Oland due to CL#1065597
- Add VGPU lock to flush() method, because gsl flush for the same context could be called from multiple threads
- Use new scratchAlloc_ monitor for scratch reallocation
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#455 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#130 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#331 edit
ECR #304775 - Device enqueuing
- Provide scratch buffer offset for generic address space
- Use single scratch buffer for all available queues. Each queue will have a unique subbuffer in the global buffer
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#454 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#129 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusched.hpp#11 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuschedcl.cpp#24 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#329 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#120 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.cpp#63 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.h#37 edit
ECR #304775 - Device enqueuing
- Use atomic fetch for enqueue flags
- Switch to a multithreaded scheduler
- Add a workaround for Linux host_multi_queue failures. Linux has only 2 queues, but the test allocates multiple host queues and the same HW ring can be used
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#106 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#449 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#127 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuschedcl.cpp#22 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#325 edit
ECR #304775 - OpenCL version should be 2.0 for only CI+
ReviewBoardURL = http://ocltc.amd.com/reviews/r/5311/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#448 edit
EPR #397491 - enabled the svm fine grain buffer for stg, disabled for mainline
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#446 edit
EPR #304775 - temporarily disable the SVM fine_grained_buffer support for OpenCL 2.0 on discrete GPUs, because the feature is supposed to release in 14.50. After the 14.40 is branched, we will enable it again on stg.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#445 edit