SWDEV-116690 - disable passing of -cl-fast-relaxed-math on ORCA path only
This is a workaround for the bogus accuracy expectations of flopscl.
Testing: flopscl, precheckin
Reviewed by Brian Sumner and Evgeny Mankov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/options.cpp#40 edit
SWDEV-90709 - Complib: unquote command line arguments for -I and -D before passing to clang
Testing: smoke, precheckin
Reviewed by Evgeny Mankov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/options.cpp#36 edit
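A minimal sketch of the unquoting described above, assuming that "unquote" means stripping one pair of matching surrounding quotes from the value of -I and -D. The helper names unquote and normalizeOption are hypothetical and are not taken from options.cpp:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not the actual options.cpp code): strips one pair of
// matching surrounding quotes, single or double, from an option value.
std::string unquote(const std::string& value) {
    if (value.size() >= 2) {
        const char first = value.front();
        const char last = value.back();
        if ((first == '"' || first == '\'') && first == last) {
            return value.substr(1, value.size() - 2);
        }
    }
    return value;  // nothing to strip
}

// Hypothetical helper: unquotes the value part of -I and -D arguments and
// leaves every other argument untouched.
std::string normalizeOption(const std::string& arg) {
    if (arg.rfind("-I", 0) == 0 || arg.rfind("-D", 0) == 0) {
        return arg.substr(0, 2) + unquote(arg.substr(2));
    }
    return arg;
}
```

Quotes that do not surround the whole value, as in -DNAME='x', are left alone in this sketch; whether the real change treats them the same way is not stated in the description.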
SWDEV-77584 - HSA HLC: refactoring of min/max processing and folding
1. Fixed correctness bug: if a source contains code like (x > y) ? x : y, HLC was folding
this and similar patterns to min and max instructions. The problem is NaN handling.
Such a pattern may return NaN if one of the two arguments is a NaN. All our instructions return
a number in this case, except for the gcn instruction, which returns a qNaN if the input is an sNaN.
For a qNaN input a number is returned in any case. Therefore such folding is only correct if NaN handling
is disabled. The patterns are now predicated on -cl-finite-math-only or -cl-fast-relaxed-math,
which implies the former option.
NB: Performance regressions are expected in programs which do not use either of these options.
2. Compiler lib did not handle -cl-finite-math-only. Also added handling of -cl-no-signed-zeros,
even though it does not affect code generation because there is no llvm counterpart for it.
3. Added patterns for NaN-agnostic comparison codes. We get these when finite-only
math is requested.
4. Removed patterns for __hsail_min_f* and __hsail_max_f*. Instead these intrinsics are lowered
to fminnum and fmaxnum llvm operations with the same semantics. This reduces the number
of patterns and simplifies handling.
5. For f32 we were producing min and max from source patterns only as gcn versions, and only if gcn is enabled.
Added similar lowering to standard min/max HSAIL operations if gcn is disabled.
6. Added lowering of fmaxnum/fminnum to more efficient gcn operations if gcn is enabled.
Neither OpenCL nor LLVM IR semantics are violated by this.
7. Moved GCN media intrinsics definitions into the GCN directory.
8. Added folding of gcn f32 instructions min(max), min(min), max(max) into corresponding gcn
instructions med3, min3 and max3. This should have been helpful for color clamping.
Performance testing showed these are slow, however. The T-Rex test from compubench slowed down
by a factor of 50 for no obvious reason. Therefore the folding is disabled by default. The option -enable-gcn-mm3
is added to enable the folding for testing purposes.
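The NaN issue from item 1 can be reproduced on the host with a short sketch. It assumes std::fmax as a stand-in for the number-preferring min/max semantics described above; selectMax, numberMax and quietNaN are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Source-level pattern from item 1. The comparison is false whenever either
// operand is NaN, so the second operand is returned in that case.
float selectMax(float x, float y) { return (x > y) ? x : y; }

// Number-preferring maximum: if exactly one operand is NaN, the other
// operand is returned. std::fmax stands in for the instruction here.
float numberMax(float x, float y) { return std::fmax(x, y); }

// Convenience helper for the examples below.
float quietNaN() { return std::numeric_limits<float>::quiet_NaN(); }
```

With these definitions selectMax(1.0f, quietNaN()) yields NaN, while numberMax(1.0f, quietNaN()) yields 1.0f, so replacing the first with the second changes results unless NaNs are excluded by -cl-finite-math-only or -cl-fast-relaxed-math. The sketch must be built without fast-math style compiler flags, otherwise the host compiler may itself assume NaNs away.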
Testing: smoke, precheckin, luxmark, compubench, BasemarkCL,
conformance: commonfns, bruteforce -w, relationals, select
Reviewed by Brian Sumner
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/codegen.cpp#68 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/opt_level.cpp#29 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/options.cpp#35 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/GCN/HSAILArithmetic.td#3 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/GCN/HSAILFusion.td#3 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/GCN/HSAILIntrinsics.td#4 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILArithmetic.td#45 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILFusion.td#28 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILISelDAGToDAG.cpp#68 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILISelLowering.cpp#113 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILInstrInfo.td#21 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILIntrinsics.td#70 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/src/llc/opt/minmax/minmaxf3pat.cl#1 add
... //depot/stg/opencl/drivers/opencl/tests/hsa/tlst/llc_opt.tlst#93 edit
SWDEV-2 - Change OpenCL version number from 1898 to 1899.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1645 edit
SWDEV-77584 - HSA HLC: fixed reflection metadata generation on HSAIL OCL 1.2 path
We are producing 6 extra arguments, but metadata was produced only for 3.
Removed KE_OCL12_NUM_ARGS define to avoid confusion.
Testing: smoke, precheckin
Reviewed by Yaxun Liu
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/AMDOpenCLKernenv.h#4 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/Scalar/AMDInsertOpenCLKernenv.cpp#10 edit
ECR #304775 - Fix failure in TC. Allocation and deallocation cannot be done by different DLLs.
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/frontend.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/options.cpp#31 edit
EPR #405458 - clinfo segfaults when ENABLE_CAL_SHUTDOWN=1.
For the global variables:
std::map <std::string, int> OptionNameMap[2];
std::map <std::string, int> NoneSeparatorOptionMap[2];
std::map <std::string, int> FOptionMap;
std::map <std::string, int> MOptionMap;
We don't need to call the clear() method explicitly, since the std::map destructor cleans things up
(valgrind memcheck doesn't report any leak related to these global variables after this change).
Besides, on Linux amd::option::teardown() is called after the global variables' destructors have run,
so calling clear() there touches already-destroyed objects and causes a segfault.
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/options.cpp#29 edit
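A small sketch of why the explicit clear() is unnecessary. It uses a block scope in place of static storage duration so the effect can be observed before the program exits; the Tracked type and liveAfterScope are hypothetical and only illustrate that the std::map destructor destroys every element on its own:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical value type that counts how many instances are alive.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    Tracked(const Tracked&) { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Fills a map, lets it go out of scope without calling clear(), and
// returns the number of elements that are still alive afterwards.
int liveAfterScope() {
    {
        std::map<std::string, Tracked> options;
        options["-cl-finite-math-only"];  // default-constructs one element
        options["-cl-no-signed-zeros"];   // and another
        // No options.clear() here: the destructor releases all elements.
    }
    return Tracked::live;
}
```

The same destructor runs for a global map during static destruction, which is also why a later teardown call must not touch the map again.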