SWDEV-116690 - disable passing of -cl-fast-relaxed-math on ORCA path only
This is the w/a for bogus accuracy expectations of flopscl.
Testing: flopscl, precheckin
Reviewed by Brian Sumner and Evgeny Mankov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/options.cpp#40 edit
SWDEV-109533 - AMDIL: increase inline cost threshold from 400 to 14000
This is the w/a to allow Blender to work on SI devices.
Testing: precheckin
Reviewed by Boleslaw Ciesielski
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/utils/OPTIONS.def#6 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#135 edit
SWDEV-107271 - [OpenCL][GFXIP9 Bring up] add support for Raven(gfx901)
- Add a case for gfx901; otherwise the target device is incorrect for the finalizer
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#23 edit
SWDEV-105122 - Changing Baffin, Ellesmere and Lexa ISAtype, it has to be 803
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#21 edit
SWDEV-104875 - ROCm/HSA: use the finalizer from source tree
Right now ROCm/HSA uses the finalizer from the installed HSA RT.
This finalizer version has outdated SC sources.
At the same time the source tree has fresh finalizer sources matching ORCA.
The offline tool amdhsafin is built from those sources.
This change switches from the HSA RT finalizer to the in-tree finalizer.
Testing: precheckin
Reviewed by Laurent Morichetti and Evgeny Mankov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#20 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#39 edit
SWDEV-93545 - HSA HLC: target option interface between complib and BE and denorm control refactoring
Global variables and associated options to control target GPU for optimizations and fp32 denorm support are removed.
Instead standard llvm -mcpu=<cpu> is used to pass chip family name and fp32 denorm is turned into a subtarget feature.
The subtarget feature can be set for llc as the standard -mattr=+fp32-denormals, and corresponding code to pass the feature string
to the BE is added to the compiler lib, mimicking what we used to have for AMDIL.
Device name HSAIL metadata will now reflect an actual GPU family passed to the HSAIL BE instead of "generic".
Denorm support can be switched on as a feature bit in the target mapping. It is on starting from VI. However, just
switching this bit for a family will not produce denorm supporting code. The option -cl-denorms-are-zero can be used
to override this, and the runtime passes it for configs where CL_FP_DENORM is not reported.
Currently CL_FP_DENORM is not reported for any device; however, this can be changed with the AMD_GPU_FORCE_SINGLE_FP_DENORM
environment variable. If set, it will be honored only starting from VI, as set in the target mapping.
Implemented isFMAFasterThanFMulAndFAdd to handle use of v_fma_f32 on GFX9 instead of a direct chip family check.
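The denorm control described above can be sketched roughly as follows. This is an illustration only, not the compiler lib code: the function name backend_flags and its parameters are hypothetical, and omitting the feature when denorms are off is an assumption. Only the -mcpu=<cpu> and -mattr=+fp32-denormals spellings come from the description.

```python
def backend_flags(cpu, family_has_fp32_denorms, denorms_are_zero):
    """Build an llc-style flag list for the BE (hypothetical helper).

    cpu: chip family name passed as -mcpu=<cpu>.
    family_has_fp32_denorms: feature bit from the target mapping.
    denorms_are_zero: True if -cl-denorms-are-zero was passed.
    """
    flags = ["-mcpu=" + cpu]
    # The feature is requested only when the target mapping enables it
    # and -cl-denorms-are-zero does not override it.
    # Assumption: when denorms are off, the feature is simply omitted.
    if family_has_fp32_denorms and not denorms_are_zero:
        flags.append("-mattr=+fp32-denormals")
    return flags
```

With the feature bit set and no override, the list carries both flags; with -cl-denorms-are-zero it carries only -mcpu.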
Testing: smoke, precheckin
Reviewed by Evgeny Mankov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/codegen.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/linker.cpp#146 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/opt_level.cpp#30 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#18 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.h#27 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/target_mappings.h#42 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/target_mappings_hsail.h#32 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/target_mappings_hsail64.h#27 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/BRIGAsmPrinter.cpp#155 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/BRIGAsmPrinter.h#69 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAIL.td#11 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILISelDAGToDAG.cpp#71 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILISelLowering.cpp#116 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILISelLowering.h#29 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILInstrInfo.cpp#42 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILInstrInfo.h#18 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILInstructions.td#22 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILKernelManager.cpp#55 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILSubtarget.cpp#14 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILSubtarget.h#15 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILTargetMachine.cpp#57 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/bin/test_driver.pl#29 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/tlst/llc_opt.tlst#97 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/tlst/ocl_features.tlst#58 edit
SWDEV-93545 - HSA HLC: produce v_fma_f32 instead of v_mad_f32/v_mac_f32 on GFX9 if denorms are supported
1. Added means to know HW target for HSAIL BE.
2. Simplified compiler lib logic in handling target capabilities.
3. Used target info to fuse mul/add into fma_f32 on GFX9.
Previously it was disabled because mad/mac always flush. v_fma_f32 does not flush,
but it is only fast starting with GFX9.
Testing: smoke, precheckin
Reviewed by Nikolay Haustov and Evgeny Mankov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/codegen.cpp#69 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/linker.cpp#142 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#17 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.h#26 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAIL.h#42 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILFusion.td#29 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILISelDAGToDAG.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILISelLowering.cpp#115 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILInstructions.td#21 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILTargetMachine.cpp#56 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/tlst/llc_opt.tlst#95 edit
SWDEV-90709 - Complib: unquote command line arguments for -I and -D before passing to clang
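A minimal sketch of what such unquoting might look like, assuming the quotes wrap the whole value that follows -I or -D. The function name unquote_include_and_define is hypothetical and the actual change in options.cpp may treat more cases.

```python
def unquote_include_and_define(args):
    """Strip one pair of matching quotes around -I/-D values (illustrative)."""
    result = []
    for arg in args:
        if arg[:2] in ("-I", "-D"):
            value = arg[2:]
            # Remove quotes only if the value both starts and ends with
            # the same quote character.
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]
            result.append(arg[:2] + value)
        else:
            # Other options are passed through untouched.
            result.append(arg)
    return result
```

A value such as -DNAME="abc" is left alone here because the quotes do not wrap the whole value.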
Testing: smoke, precheckin
Reviewed by Evgeny Mankov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/options.cpp#36 edit
SWDEV-77584 - HSA HLC: refactoring of min/max processing and folding
1. Fixed correctness bug: if a source contains code like (x > y) ? x : y, HLC was folding
this and similar patterns to min and max instructions. The problem is with NaN handling.
Such a pattern may return NaN if one of the two arguments is a NaN. All our instructions return
a number in this case, except for the gcn instruction, which returns a qNaN if the input is an sNaN.
For a qNaN a number is returned in any case. Therefore such folding is only correct if NaN handling
is disabled. Patterns are predicated to work with -cl-finite-math-only or -cl-fast-relaxed-math,
which includes the former option.
NB: Performance regressions are expected in programs which do not use either of these options.
2. Compiler lib did not handle -cl-finite-math-only. Also added handling of -cl-no-signed-zeros,
even though it does not affect code generation because there is no llvm counterpart for it.
3. Patterns for NaN agnostic comparison codes are added. We get these when finite
only math is requested.
4. Removed patterns for __hsail_min_f* and __hsail_max_f*. Instead these intrinsics are lowered
to fminnum and fmaxnum llvm operations with the same semantics. This reduces the number
of patterns and simplifies handling.
5. For f32 we were only producing gcn versions of min and max from source patterns if gcn is enabled.
Added similar lowering to standard min/max HSAIL operations if gcn is disabled.
6. Added lowering of fmaxnum/fminnum to more efficient gcn operations if gcn is enabled.
Neither OpenCL nor LLVM IR semantics are violated by this.
7. Moved GCN media intrinsics definitions into the GCN directory.
8. Added folding of gcn f32 instructions min(max), min(min), max(max) into corresponding gcn
instructions med3, min3 and max3. This should have been helpful for color clamping.
Performance testing showed these are slow, however. T-Rex test from compubench slowed down
by a factor of 50 for no obvious reason. Therefore folding is disabled by default. The option -enable-gcn-mm3
is added to enable the folding for testing purposes.
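The NaN problem from item 1 can be reproduced with ordinary floating point. The sketch below is illustrative and not HLC code: select_max models the source pattern (x > y) ? x : y, and fmaxnum models a max that returns the number when exactly one input is a NaN. Both function names are chosen here for the example.

```python
import math

def select_max(x, y):
    """Source-level pattern (x > y) ? x : y."""
    return x if x > y else y

def fmaxnum(x, y):
    """Max that returns the non-NaN operand when exactly one input is NaN."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if x > y else y

nan = math.nan
# Any comparison with NaN is false, so the select falls through to y.
print(select_max(1.0, nan))  # y is NaN, so the result is NaN
print(select_max(nan, 1.0))  # y is 1.0
print(fmaxnum(1.0, nan))     # 1.0
print(fmaxnum(nan, 1.0))     # 1.0
```

The two functions disagree for (1.0, NaN), which is why the folding is only valid when NaNs are ruled out.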
Testing: smoke, precheckin, luxmark, compubench, BasemarkCL,
conformance: commonfns, bruteforce -w, relationals, select
Reviewed by Brian Sumner
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/codegen.cpp#68 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/opt_level.cpp#29 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/options.cpp#35 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/GCN/HSAILArithmetic.td#3 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/GCN/HSAILFusion.td#3 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/GCN/HSAILIntrinsics.td#4 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILArithmetic.td#45 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILFusion.td#28 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILISelDAGToDAG.cpp#68 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILISelLowering.cpp#113 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILInstrInfo.td#21 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILIntrinsics.td#70 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/src/llc/opt/minmax/minmaxf3pat.cl#1 add
... //depot/stg/opencl/drivers/opencl/tests/hsa/tlst/llc_opt.tlst#93 edit
SWDEV-86836 - Enhance caching library class to prepare one-stage kernel caching by:
0. Moving cache storage setup into constructor
1. Controlling cache storage size
2. Explicit cache cleanup
a. -kcache-wipe is off by default; when turned on, the caching directory is wiped
b. Here it's just an option. The implementation (the call of wipeCacheFolders()) will be added in the compiler library
3. Enforcing cache miss (actual compilation enforcing and adding a new entry to the cache storage).
a. -kcache-enforce-miss is off by default; when turned on, the real compilation will be enforced
b. Here it's just an option. The implementation will be added in the compiler library
ReviewBoardURL = http://ocltc.amd.com/reviews/r/9726/
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#134 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/caching/cache.cpp#12 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/caching/cache.hpp#7 edit
SWDEV-85602 - rename hsail-64 arch to hsail64
This is to match other existing llvm targets, such as spir64 and amdil64, as well as to match behavior of open source HSAIL BE.
For legacy users there is alias "-hsail-64" provided in the aoc2 only.
Testing: smoke, precheckin
Reviewed by Matthew Arsenault, Evgeny Mankov and Nikolay Haustov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/backends/common/codegen.cpp#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/backends/common/frontend.cpp#4 edit
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/utils/v0_8/target_mappings.h#5 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/codegen.cpp#66 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/frontend.cpp#37 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/target_mappings.h#37 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILTargetMachine.cpp#53 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/TargetInfo/HSAILTargetInfo.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/tools/aacl/aa.h#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/aoc2.cpp#80 edit
... //depot/stg/opencl/drivers/opencl/library/hsa/amp_libm/build/Makefile.amp_libm#4 edit
... //depot/stg/opencl/drivers/opencl/library/hsa/gcn/build/Makefile.gcn#20 edit
... //depot/stg/opencl/drivers/opencl/library/hsa/gcndev/build/Makefile.gcndev#3 edit
... //depot/stg/opencl/drivers/opencl/library/hsa/hsail/build/Makefile.hsail#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpucompiler.cpp#153 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.cpp#222 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsacompiler.cpp#5 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/bin/test_driver.pl#25 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/tlst/complib.tlst#21 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/tlst/ocl_debug.tlst#9 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/tlst/ocl_regression.tlst#25 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/spir/SPIRBase.cpp#3 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/spir/SPIRVBasic.cpp#10 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/spir/SPIRVDropIn.cpp#5 edit
SWDEV-80173 - HSA HLC: disable liveness analysis and jump threading
After investigation I found that liveness analysis never changed code generation in any of the benchmarks or applications.
Its only use is in the LICM and the hoisting limitation was never really triggered.
Since the analysis is very expensive I'm disabling it.
The jump threading is generally bad on the GPU because it creates unstructured control flow.
Even if hsail might become smaller and have fewer branches, it does not help because the finalizer's structurizer will have to clone blocks.
Jump threading is disabled for GPU. This improves compilation speed and just slightly improves performance.
Testing: smoke, precheckin, vray and blender compilation
Reviewed by Daniil Fukalov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#133 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/AMDLLVMContextHook.h#29 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Analysis/AMDLiveAnalysis.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/IPO/AMDPassManagerBuilder.cpp#61 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/tools/opt/amdopt.inc#29 edit
SWDEV-77584 - Compiler Lib: Preparations for enabling HSAIL on OpenCL 1.2 by default. Adding -legacy and -binary_is_spirv.
-legacy option will be used for forcing the AMDIL path after switching to HSAIL by default for OpenCL.
-binary_is_spirv option will be used for indicating that the binary is constructed from SPIRV.
[Testing] pre-checkin:
http://ocltc.amd.com:8111/viewModification.html?modId=61541&personal=true&buildTypeId=&tab=vcsModificationBuilds&show_all_builds=true
[Reviewer] Stanislav Mekhanoshin
http://ocltc.amd.com/reviews/r/8850
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/utils/OPTIONS.def#4 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#132 edit
SWDEV-2 - Change OpenCL version number from 1898 to 1899.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1645 edit
SWDEV-77584 - HSA HLC: fixed reflection metadata generation on HSAIL OCL 1.2 path
We are producing 6 extra arguments, but metadata was produced only for 3.
Removed KE_OCL12_NUM_ARGS define to avoid confusion.
Testing: smoke, precheckin
Reviewed by Yaxun Liu
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/AMDOpenCLKernenv.h#4 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/Scalar/AMDInsertOpenCLKernenv.cpp#10 edit
EPR #425389 - Back out changelist 1181925
Although the compiler library sources are split, the build does not yet use this, so the wrong default value is being used for AMDIL vs. HSAIL
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#130 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/AMDLLVMContextHook.h#28 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/Transforms/IPO/AMDOptOptions.h#8 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/IPO/AMDOptOptions.cpp#10 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/IPO/AMDPassManagerBuilder.cpp#56 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/tools/opt/amdopt.inc#26 edit
ECR #354633 - SPIR-V: Let aoc2 load and save SPIR-V.
E.g.
aoc2 -march=hsail-64 -cl-std=CL2.0 -srctospv testReadf.cl
compiles a .cl file to a SPIR-V binary and saves it as .spv
aoc2 -march=hsail-64 -cl-std=CL2.0 -spirv work_group_any.spv
loads a SPIR-V binary, compiles it to ISA and saves it to ELF in .bin
Changed the option for round-trip translation of SPIR-V to -round-trip-spirv.
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/api/v0_8/acl.cpp#35 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/frontend_clang.cpp#22 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/linker.cpp#133 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/v0_8/if_acl.cpp#74 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/include/v0_8/aclEnums.h#22 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#129 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/aoc2.cpp#76 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/complib/CLEnumCheck.cpp#47 edit
ECR #304775 - Remove HLC_Unroll_* variables.
HLC_Unroll_Scratch_Threshold was unused. The others have equivalent settings in the AMDLLVMContextHook, so consistently use that version. The patches to opt already had a different set of command line flags for the same options.
This changes two of the defaults in the compiler library and the equivalent flags in opt to match the values which were actually in use, so this shouldn't change the current behavior. The unroll threshold and allow partial unrolling defaults were changed to the actually used values. Eventually all of these custom options should be removed, because in current LLVM these can be controlled per loop by the TargetTransformInfo, and all have equivalent cl::opts already.
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#128 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/AMDLLVMContextHook.h#27 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/Transforms/IPO/AMDOptOptions.h#7 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/IPO/AMDOptOptions.cpp#9 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/IPO/AMDPassManagerBuilder.cpp#55 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/tools/opt/amdopt.inc#25 edit
ECR #304775 - Preparation for kernel caching feature
1. Each device has a separate cache directory
2. It logs caching errors, so we can debug the cache and/or detect collisions
3. Implemented cache size tracking, so we can evict old data when cache files are too large
4. Added file/path access permission control on both Windows and Linux
5. Added read/write file lock protection
6. -kcache-disable flag can be used to turn the caching functionality on/off
7. AMD_FORCE_KCACHE_TEST env variable is used for internal testing
8. For the stage we want to cache, call getCacheEntry() followed by makeCacheEntry() if the get fails; otherwise directly return cached data.
- After the compiler library code is refactored, getCacheEntry() and makeCacheEntry() will be wrapped into one function call, so that only one call is needed at the place we want to cache
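The get-then-make usage from item 8 could look roughly like this. It is a sketch under assumptions: KernelCache is an in-memory stand-in for the real file-backed class, the snake_case method names mirror getCacheEntry() and makeCacheEntry(), and compile_with_cache is a hypothetical wrapper of the kind the sub-item anticipates.

```python
class KernelCache:
    """In-memory stand-in for the caching class (illustrative only)."""

    def __init__(self):
        self._entries = {}

    def get_cache_entry(self, key):
        # Returns None on a miss.
        return self._entries.get(key)

    def make_cache_entry(self, key, data):
        self._entries[key] = data


def compile_with_cache(cache, key, compile_fn):
    """Return cached data on a hit; otherwise compile and store the result."""
    cached = cache.get_cache_entry(key)
    if cached is not None:
        return cached
    data = compile_fn()
    cache.make_cache_entry(key, data)
    return data
```

Wrapping both calls in one function keeps the call site at each cached stage down to a single call.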
TO DO:
1. Use it in the compiler library code
- Waiting for the decision on how many stages we want to cache, i) 1-stage caching: source->ISA; or ii) 3-stage caching: source->LLVM IR, LLVM IR->IL, IL->ISA
2. Tracking of timestamps for cache entries
- LRU eviction when cache grows too large
- Suggestion from Laurent: Regarding tracking timestamps for LRU eviction: Random eviction would probably perform as well as LRU and does not require timestamps.
3. Track cache entries per application
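The random eviction suggested in item 2 needs no timestamps. A minimal sketch, assuming the cache can be viewed as a mapping from entry key to size in bytes; the function evict_random and its parameters are hypothetical.

```python
import random

def evict_random(entries, max_total_size, rng=None):
    """Remove random entries until the total size fits the limit.

    entries: dict mapping entry key -> size in bytes (modified in place).
    Returns the list of evicted keys.
    """
    rng = rng or random.Random()
    evicted = []
    total = sum(entries.values())
    while total > max_total_size and entries:
        # Sort the keys so a seeded rng gives a reproducible choice.
        key = rng.choice(sorted(entries))
        total -= entries.pop(key)
        evicted.append(key)
    return evicted
```

Passing a seeded random.Random makes the eviction order repeatable for testing.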
ReviewBoardURL = http://ocltc.amd.com/reviews/r/8194/
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/cache.cpp#3 add
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/cache.hpp#3 add
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#127 edit