SWDEV-78299 - Back out changelist 1236441 since OCLCreateBuffer fails.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#541 edit
SWDEV-78299 - [Brahma] Setting max single allocation size by comparing cardMemAvailableBytes with cardExtMemAvailableBytes on Brahma.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#540 edit
SWDEV-68792 - [OpenCL][HWS/CWSR] Adding support for Hardware Scheduler and Compute Wave Save restore (CWSR) feature on ORCA
Adding a temporary w/a for a CP uCode bug in HWS mode. Due to this bug, CP uCode loops through a RUNLIST unless there is a submission on all queues in HWS mode. This causes some overhead and performance drop in PCMark8 on CZ in HWS mode. To work around this issue, it was suggested to submit a dummy packet during initialization on all available queues on HWS mode so that CP uCode can break the loop. This w/a should be removed once CP uCode provides a final fix for this issue.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/9616/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#160 edit
SWDEV-86885 - [CQE OCL][2.0][QR][CFX] Few SDK 64 and 32 bit Samples resulting in Soft/hard hangs with faulty cl: 1233743
Unsubmit CL1233743 because of problems with CFX.
I have not been able to figure out how CL1233743 would cause a problem. It sets a flag to disable new code, so I would expect the new code to be the problem, not the disable.
So, in case this unsubmit does not eliminate the problem, CL1233686 is shelved; it uses #ifndef CAL_SUPPORT around the new code in CL1226184.
CL1233743 was submitted for:
SWDEV-86253 - [QR] 6 to 7% performance drop is observed in BasemarkCL test
CL1226184 adds serialization to LHIO, because pxproxy accesses global state, without locks, leading to crash, etc.
To fix OpenCL perf regressions, allow unserialized access to pxproxy. It may be that OpenCL design leads away from racy behavior, so it may be safe. This was checked in to gather information. What software gets perf drops? Does any software get fixed?
The dangerous functions are CreateDevice, CreateContext, because the handles from the OS are saved in global caches, and if those global caches are modified in one thread, while other threads are looking-up from the caches - that is a problem.
CL1226184 was submitted for:
SWDEV-80442 - [QR][Adobe Premier Pro CS6] TDR/App Crash observed while resizing the video window within workspace
TC: http://ocltc:8111/viewModification.html?modId=66278&personal=true
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#159 edit
SWDEV-86253 - [QR] 6 to 7% performance drop is observed in BasemarkCL test
CL1226184 adds serialization to LHIO, because pxproxy accesses global state, without locks, leading to crash, etc.
To fix OpenCL perf regressions, allow unserialized access to pxproxy. It may be that OpenCL design leads away from racy behavior, so it may be safe. This was checked in to gather information. What software gets perf drops? Does any software get fixed?
The dangerous functions are CreateDevice, CreateContext, because the handles from the OS are saved in global caches, and if those global caches are modified in one thread, while other threads are looking-up from the caches - that is a problem.
TC: http://ocltc:8111/viewModification.html?modId=66278&personal=true
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#158 edit
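For reference, a minimal sketch of the kind of race described above and the usual guard against it. This is not the pxproxy code: HandleCache, its key/handle types, and handleCacheExample() are hypothetical names chosen for illustration. The point is only that inserts and look-ups on a shared cache take the same lock.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for a global cache of OS handles (not the real pxproxy types).
class HandleCache {
public:
    // Called from a CreateDevice/CreateContext-like path: modifies the cache.
    void insert(uint32_t key, uint64_t handle) {
        std::lock_guard<std::mutex> lock(mutex_);
        cache_[key] = handle;
    }

    // Called from other threads: must not observe a half-updated table.
    bool lookup(uint32_t key, uint64_t& handle) const {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = cache_.find(key);
        if (it == cache_.end()) return false;
        handle = it->second;
        return true;
    }

private:
    mutable std::mutex mutex_;
    std::unordered_map<uint32_t, uint64_t> cache_;
};

// Usage example with made-up values.
inline void handleCacheExample() {
    HandleCache cache;
    cache.insert(1, 42);
    uint64_t handle = 0;
    assert(cache.lookup(1, handle) && handle == 42);
    assert(!cache.lookup(2, handle));
}
```

Taking the lock on every look-up is what costs performance; the unserialized variant trades that cost for the risk described in the changelist.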
SWDEV-85649 - The return value of owner() needs to be cast to get amd::Image.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpumemory.cpp#125 edit
SWDEV-79308 - Use 64-bit arithmetic to calculate the scratch buffer size for OCL. We observed that the computed scratch buffer size could be > 4G when the compiler optimization option is not used.
Cross branch change - requires CL1231547.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#157 edit
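A minimal sketch of the overflow this change guards against. The function names, the three-factor formula, and the numbers are assumptions for illustration only; they are not taken from GSLDevice.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical formula: total scratch = bytes per wave * waves per CU * number of CUs.
// In 32-bit arithmetic the product wraps modulo 2^32 once it exceeds 4G.
inline uint32_t scratchBytes32(uint32_t bytesPerWave, uint32_t wavesPerCu, uint32_t numCus) {
    return bytesPerWave * wavesPerCu * numCus;
}

// Widening the first operand makes the whole product evaluate in 64 bits.
inline uint64_t scratchBytes64(uint32_t bytesPerWave, uint32_t wavesPerCu, uint32_t numCus) {
    return static_cast<uint64_t>(bytesPerWave) * wavesPerCu * numCus;
}

// Usage example: 4 MiB per wave, 32 waves, 44 CUs (illustrative values).
inline void scratchSizeExample() {
    assert(scratchBytes64(4u << 20, 32u, 44u) == 5905580032ULL);  // about 5.5G, correct
    assert(scratchBytes32(4u << 20, 32u, 44u) == 1610612736u);    // wrapped to 1.5G
}
```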
SWDEV-85602 - rename hsail-64 arch to hsail64
This is to match other existing llvm targets, such as spir64 and amdil64, as well as to match behavior of open source HSAIL BE.
For legacy users there is alias "-hsail-64" provided in the aoc2 only.
Testing: smoke, precheckin
Reviewed by Matthew Arsenault, Evgeny Mankov and Nikolay Haustov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/backends/common/codegen.cpp#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/backends/common/frontend.cpp#4 edit
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/utils/v0_8/target_mappings.h#5 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/codegen.cpp#66 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/frontend.cpp#37 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/target_mappings.h#37 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILTargetMachine.cpp#53 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/TargetInfo/HSAILTargetInfo.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/tools/aacl/aa.h#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/aoc2.cpp#80 edit
... //depot/stg/opencl/drivers/opencl/library/hsa/amp_libm/build/Makefile.amp_libm#4 edit
... //depot/stg/opencl/drivers/opencl/library/hsa/gcn/build/Makefile.gcn#20 edit
... //depot/stg/opencl/drivers/opencl/library/hsa/gcndev/build/Makefile.gcndev#3 edit
... //depot/stg/opencl/drivers/opencl/library/hsa/hsail/build/Makefile.hsail#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpucompiler.cpp#153 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.cpp#222 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsacompiler.cpp#5 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/bin/test_driver.pl#25 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/tlst/complib.tlst#21 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/tlst/ocl_debug.tlst#9 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/tlst/ocl_regression.tlst#25 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/spir/SPIRBase.cpp#3 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/spir/SPIRVBasic.cpp#10 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/spir/SPIRVDropIn.cpp#5 edit
SWDEV-79308 - Resubmit of CL1228064 with a minimum scratch buffer size of 64K enforced whenever a scratch buffer is needed.
Reduce the total scratch buffer size by a factor of 4, which in effect reduces the max. scratch waves from 32 to 8, so that the required total scratch buffer size does not exceed the available local memory.
Made sure the scratch buffer size is aligned to a 64K boundary.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#235 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#156 edit
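The three adjustments described above (factor-of-4 reduction, 64K minimum, 64K alignment) can be sketched as follows. adjustScratchSize is a hypothetical helper, and the order in which the steps are applied is an assumption, not a description of the submitted code.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kScratchGranularity = 64 * 1024;  // 64K, from the description

// 'requested' is the unreduced total scratch size in bytes (hypothetical input).
inline uint64_t adjustScratchSize(uint64_t requested) {
    if (requested == 0) return 0;                 // no scratch buffer needed
    uint64_t size = requested / 4;                // 32 -> 8 scratch waves
    if (size < kScratchGranularity) {
        size = kScratchGranularity;               // enforce the 64K minimum
    }
    // Round up to the next 64K boundary (granularity is a power of two).
    return (size + kScratchGranularity - 1) & ~(kScratchGranularity - 1);
}

// Usage example with illustrative sizes.
inline void adjustScratchSizeExample() {
    assert(adjustScratchSize(0) == 0);
    assert(adjustScratchSize(1000) == 65536);               // clamped to the minimum
    assert(adjustScratchSize(4 * 65536 + 4) == 131072);     // 65537 rounded up
}
```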
SWDEV-77172 - Choose isa handle on CZ based on whether SVM is supported or not as SVM may need additional SC Options to be passed as default.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/9531/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.cpp#220 edit
SWDEV-79308 - Resubmit of CL1226881 with the fix for the SC sanity check issue. Reduce the total scratch buffer size by a factor of 4, which in effect reduces the max. scratch waves from 32 to 8, so that the required total scratch buffer size does not exceed the available local memory.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#233 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#154 edit
SWDEV-77172 - Disable ThreadTrace on SVM, as it is causing a hang, until a solution can be found.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/9502/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusettings.cpp#341 edit
SWDEV-79308 - Back out changelist 1226881
Causes failures in execution model, math and pipes
http://ocltc.amd.com:8111/viewLog.html?buildId=14142599&tab=buildResultsDiv&buildTypeId=TestsOpenCLScSanity_BonaireConformanceWin764bit
Reduce the total scratch buffer size by a factor of 4, which in effect reduces the max. scratch waves from 32 to 8, so that the required total scratch buffer size does not exceed the available local memory.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#153 edit
SWDEV-79308 - Reduce the total scratch buffer size by a factor of 4, which in effect reduces the max. scratch waves from 32 to 8, so that the required total scratch buffer size does not exceed the available local memory.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#152 edit
SWDEV-84309 - Using agpMemAvailableCacheableBytes instead of agpMemAvailableBytes when calculating free memory for viPlus_ apu.
When memory allocation is in system memory, only agpMemAvailableCacheableBytes is changed
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#539 edit
SWDEV-77584 - Fix ocl_conformance compiler failures.
1. If compiling in debug mode, linkImpl wasn't called and kernelNames wasn't set
which led to CL_INVALID_KERNEL_NAME errors in debug configs in TeamCity. Looking at AMDIL
code, there is no reason to skip linkImpl in debug mode.
2. Set types to TYPE_LIBRARY/TYPE_EXECUTABLE. This fixes ocl_conformance compiler program_binary_type.
Reviewed by: Evgeniy Mankov
Testing: smoke, pre-checkin, OCLSeparateCompile.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.cpp#218 edit
SWDEV-66693 - OpenCL Runtime HW Debug support development - use flag, instead of getenv() call, in IOL to indicate the enablement of HW Debug.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#151 edit
SWDEV-83467 - [SPIRV] Add support of SPIRV to CPU
Modifying runtime and compile time to allow SPIRV binaries to run on CPU, since they currently run only on HSAIL GPU.
Added changes to allow conversion of CPU's llvmBinaryIsSpir boolean into compiler library's oclElfSections enum.
cpuprogram.cpp's llvmBinaryIsSpir flag, renamed to elfSectionType, now supports LLVMIR, SPIR, and SPIRV.
Added SPIRV to compiler lib's elf as new oclElfSections enum
cpuprogram.cpp changes also made to gpuprogram.cpp's NullProgram to allow compilation
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/loaders/elf/elf.cpp#33 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/loaders/elf/elf.hpp#22 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/cpu/cpuprogram.cpp#69 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.cpp#191 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#266 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpucompiler.cpp#152 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.cpp#217 edit
SWDEV-82296 - [CQE OCL][2.0][HWSC][16.10]SDK Sample "AtomicCounters" 32/64bit failed with HWSC driver
Disabling the cl_ext_atomic_counters_32 extension since there is no support for this extension on HSAIL and HWS.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/9221/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#265 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#538 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusettings.cpp#338 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsasettings.cpp#11 edit
SWDEV-82256 - Limit the workaround for Win 7 only because KMD has fixed TDR issue on Win 8.1/10
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusettings.cpp#336 edit
SWDEV-82205 - Increased workload to pass this test.
- This is a workaround because KMD doesn't have a solution for the TDR issue yet in 15.30.
- This workaround, including CL#1201765, should be reverted once KMD has a fix.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusettings.cpp#335 edit
SWDEV-80061 - Copy flag HostMemoryDirectAccess from parent to view
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpumemory.cpp#124 edit
SWDEV-80450 - Fix the issue of app context reference count > 0 after app termination by using device context for the mapped buffer/image resource.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#155 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpumemory.cpp#123 edit
SWDEV-80874 - fixed out of bound access to the printf format string
We do not really need two separate induction variables, pos and i, and we had a bug of not incrementing i as needed.
The only reason it used to work is because all strings we used for testing ended with '\n'.
The bug resulted in ignoring this '\n', but the code unconditionally adds '\n', so nobody noticed.
If you try to print anything with any other escape, a '\n' that is not at the end, or a colon, there will be an assertion.
That is fixed, and a newline is now only added if the last symbol in the user's format was not a newline, because otherwise
we would now print 2 newlines. NB, I prefer to use a bool variable rather than addressing the last symbol of the string,
which could be empty.
A side note: why do we run the flex scanner past the last colon? If we did not, we would not need this double encoding at all.
Testing: smoke, precheckin, conformance printf with HSAIL forced, custom test
Reviewed by German Andreev
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#309 edit
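A minimal sketch of the approach described above: one induction variable, a bounds check before peeking at the escaped character, and a bool that tracks whether the output ends with a newline. decodeFormat and the set of escapes it handles are assumptions for illustration; this is not the gpukernel.cpp code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical decoder: turns two-character escapes (backslash + 'n', 't', or
// backslash) into single characters and guarantees exactly one trailing newline.
inline std::string decodeFormat(const std::string& fmt) {
    std::string out;
    bool endsWithNewline = false;  // tracked explicitly, since 'out' may be empty
    for (std::size_t i = 0; i < fmt.size(); ++i) {   // single induction variable
        char c = fmt[i];
        if (c == '\\' && i + 1 < fmt.size()) {       // never read past the end
            const char next = fmt[++i];
            switch (next) {
                case 'n':  c = '\n'; break;
                case 't':  c = '\t'; break;
                case '\\': c = '\\'; break;
                default:   out.push_back('\\'); c = next; break;  // keep unknown escapes verbatim
            }
        }
        out.push_back(c);
        endsWithNewline = (c == '\n');
    }
    if (!endsWithNewline) out.push_back('\n');       // add a newline only when missing
    return out;
}

// Usage example.
inline void decodeFormatExample() {
    assert(decodeFormat("x=%d\\n") == "x=%d\n");     // already ends with newline
    assert(decodeFormat("a\\nb") == "a\nb\n");       // newline in the middle
}
```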
SWDEV-77584 - ORCA RT: Preparations for enabling HSAIL on OpenCL 1.2 by default. Integrate new algorithm for device program choice.
[Reasons]
1. Keep the switching change as small as possible.
2. Give a chance to test HSA_foundation device work on OCL 1.2 beforehand (asked by Nikolay).
Almost already reviewed:
http://ocltc.amd.com/reviews/r/8850/
Additionally:
1. Linking logic was changed: if the target of one of the binaries is hsail-(64), linking goes through HSAIL; otherwise it goes through AMDIL. Previously, -cl-std=CL2.0 in any of the linking binaries was the criterion for HSAIL, which will be wrong for HSAIL 1.2 after switching. The -clang & -edg options are now set to distinguish the path while linking.
2. -cl-std=CL2.0 was restored as a criterion for HSAIL in the isHSAILProgram() method; the -clang & -edg options were also added as criteria.
[ToDo] After enabling HSAIL by default remove -cl-std, -clang & -edg checks from the code.
[Testing] Pre-checkin
http://ocltc.amd.com:8111/viewModification.html?modId=61929&personal=true&buildTypeId=&tab=vcsModificationBuilds&show_all_builds=true
[Reviewers] German Andryeyev, Nikolay Haustov
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_program.cpp#39 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/cpu/cpudevice.cpp#279 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/cpu/cpudevice.hpp#93 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#261 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#534 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#154 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsadevice.cpp#47 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsadevice.hpp#22 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/program.cpp#76 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/program.hpp#38 edit
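The new linking rule can be sketched as below. LinkPath, isHsailTarget, chooseLinkPath, and the exact target strings ("hsail", "hsail-64") are assumptions for illustration; the real check lives in the runtime's program code and may look different.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class LinkPath { AMDIL, HSAIL };

// Hypothetical predicate: target names are assumed from "hsail-(64)" in the description.
inline bool isHsailTarget(const std::string& target) {
    return target == "hsail" || target == "hsail-64";
}

// If any input binary targets HSAIL, link through HSAIL; otherwise through AMDIL.
inline LinkPath chooseLinkPath(const std::vector<std::string>& targets) {
    const bool anyHsail = std::any_of(targets.begin(), targets.end(), isHsailTarget);
    return anyHsail ? LinkPath::HSAIL : LinkPath::AMDIL;
}

// Usage example.
inline void chooseLinkPathExample() {
    assert(chooseLinkPath({"amdil", "hsail-64"}) == LinkPath::HSAIL);
    assert(chooseLinkPath({"amdil", "amdil64"}) == LinkPath::AMDIL);
}
```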
SWDEV-79957 - use system memory to calculate the largest available memory size on Linux APU system.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#533 edit
SWDEV-77172 - IOMMUv2 changes for Windows 10
- Clear passing SVM flag from top level and fix GL interop on SVM
- Add\Remove gpuvmOffset before WDDM calls, as it is added manually for the SUA model
ReviewBoardURL = http://ocltc.amd.com/reviews/r/8914/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#230 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDeviceGL.cpp#25 edit
SWDEV-80874 - fixed staging buffer overflow with HSA printf
The staging buffer is ~2 times smaller than the allocated printf buffer, so if the amount of data in the printf buffer exceeds the size of the staging buffer,
we hit an assertion in the memory copy. Printing 2 integers with 64K workitems is enough to hit the assertion.
Added loop to read printf buffer into staging in portions.
Testing: smoke, precheckin, conformance printf with HSAIL forced, custom tests
Reviewed by German Andreev
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprintf.cpp#41 edit
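A minimal sketch of reading a large buffer through a smaller staging buffer in portions. copyThroughStaging and the use of std::memcpy for both hops are assumptions for illustration; in the runtime the first hop would be a device-to-staging transfer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies 'src' into 'dst' via a staging buffer of 'stagingSize' bytes.
// Returns the number of portions that were needed.
inline std::size_t copyThroughStaging(const std::vector<unsigned char>& src,
                                      std::vector<unsigned char>& dst,
                                      std::size_t stagingSize) {
    assert(stagingSize > 0);
    std::vector<unsigned char> staging(stagingSize);
    dst.resize(src.size());
    std::size_t portions = 0;
    for (std::size_t offset = 0; offset < src.size(); offset += stagingSize) {
        // Never copy more than the staging buffer can hold.
        const std::size_t n = std::min(stagingSize, src.size() - offset);
        std::memcpy(staging.data(), src.data() + offset, n);  // source -> staging
        std::memcpy(dst.data() + offset, staging.data(), n);  // staging -> destination
        ++portions;
    }
    return portions;
}
```

With a 10-byte source and a 4-byte staging buffer, the loop runs three times (4 + 4 + 2 bytes).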
SWDEV-80874 - Fixed ORCA RT HSA printf buffer indexing issues
The format of the buffer is: printf_id, <arg1>, <arg2>, ...
The RT did not advance the index past the printf_id field, so, for example, for a format string "%d" we were printing printf_id instead of the actual argument for every other string.
The other issue is that outputDbgBuffer adjusts its last argument (idx) by the number of consumed DWORD values,
but PrintfDbgHSA::output() also adjusts dbgBufferPtr, so the adjustment was done twice: we printed only half of the actual data and then printed zeroes from the buffer.
The resolution for both is to always pass 1 as index to outputDbgBuffer(). 1 because 0 is printf_id.
Testing: smoke, precheckin, conformance printf with HSAIL forced, custom tests
Reviewed by Brian Sumner and German Andreev
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprintf.cpp#40 edit
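A minimal sketch of walking records laid out as printf_id, <arg1>, <arg2>, ... with arguments read from index 1 and the record pointer advanced exactly once per record. collectArgs, the argCounts table, and the one-DWORD-per-argument simplification are assumptions for illustration, not the gpuprintf.cpp code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 'argCounts[id]' is the number of one-DWORD arguments for that printf_id
// (hypothetical lookup table standing in for the parsed format strings).
inline std::vector<uint32_t> collectArgs(const std::vector<uint32_t>& buffer,
                                         const std::vector<std::size_t>& argCounts) {
    std::vector<uint32_t> args;
    const uint32_t* record = buffer.data();
    const uint32_t* const end = buffer.data() + buffer.size();
    while (record < end) {
        const uint32_t id = record[0];              // index 0 holds printf_id
        assert(id < argCounts.size());
        const std::size_t n = argCounts[id];
        assert(static_cast<std::size_t>(end - record) >= 1 + n);
        for (std::size_t i = 0; i < n; ++i) {
            args.push_back(record[1 + i]);          // arguments start at index 1
        }
        record += 1 + n;                            // advance once: id plus its arguments
    }
    return args;
}
```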
SWDEV-80864 - HSAIL Metadata Workgroup Size Hint and Vec Type Hint added to HSAIL Runtime
Runtime changes required for the use of these two metadata:
- Runtime's gpukernel.cpp requires new aclQueries during HSAILKernel::Init
- One for querying WorkGroupSizeHint's array
- Two for size of VecTypeHint and fetching VecTypeHint's string
- initArgList needs to be moved to end of HSAILKernel::init to allow createSignature to get non empty values
- Compiler lib's workgroup hint (wsh) needs to match runtime's type (size_t)
- In Kernel constructor, instead of using memset which corrupts std::string, specifically set default workGroupInfo struct's variables
Also fixed wavesPerSimdHint to use size_t to match runtime.
Updated CLAssumptionCheck.cpp since aclMetadata structure was modified.
Note: This is the runtime counterpart to submitted CL#1204512. (Post Review#8808, SWDEV-79695)
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/include/v0_8/aclStructs.h#5 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/include/v0_8/aclStructs.h#22 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#260 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#308 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/complib/CLAssumptionCheck.cpp#48 edit
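A minimal sketch of why memset had to go once the struct gained a std::string member. WorkGroupInfo here is a trimmed-down, hypothetical stand-in, and the field names are illustrative rather than copied from device.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the work-group info struct.
struct WorkGroupInfo {
    std::size_t size;
    std::size_t wavesPerSimdHint;
    std::size_t compileSizeHint[3];
    std::string compileVecTypeHint;  // non-trivial member
};

// std::memset(&info, 0, sizeof(info)) would overwrite the std::string's
// internal state, which is undefined behavior. Set each field explicitly.
inline void setDefaults(WorkGroupInfo& info) {
    info.size = 0;
    info.wavesPerSimdHint = 0;
    for (std::size_t& dim : info.compileSizeHint) dim = 0;
    info.compileVecTypeHint.clear();
}

// Usage example: the string member stays usable after defaults are applied.
inline void workGroupInfoExample() {
    WorkGroupInfo info;
    setDefaults(info);
    info.compileVecTypeHint = "float4";
    assert(info.size == 0);
    assert(info.compileVecTypeHint == "float4");
}
```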