520cfc439dbf3cc47fe5446d5334e1d3a13bb5de
SWDEV-82596 - HSA HLC: Create AMDInline pass
The generic LLVM inlining heuristics do not work well for GPU.
In particular we have a common problem in several tests:
If a pointer to a private array is passed into a function, the array is not optimized out, which leaves scratch usage.
The pass increases the inline threshold to allow inlining in this case.
It also lets us move at least some AMD inlining customizations from the common code into this file.
Inline hint threshold is moved in this change.
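A minimal sketch of the threshold idea described above, not the code in AMDInline.cpp. Everything in it is an assumption made for illustration: the names ArgInfo, AddrSpace, inlineThreshold and shouldInline are hypothetical, and the two constants are placeholder values rather than the thresholds the pass actually uses. It only shows how a call site that passes a pointer to a private array could be given a larger inlining budget.

```cpp
#include <vector>

// Hypothetical address-space tags; a real pass would inspect LLVM pointer types.
enum class AddrSpace { Global, Local, Private };

// Hypothetical summary of one call-site argument.
struct ArgInfo {
    bool isPointer;
    AddrSpace space;
    bool pointsToArray;  // pointee is an array allocated in the caller
};

// Placeholder values, assumed for illustration only.
constexpr int kDefaultThreshold = 225;
constexpr int kPrivateArrayBonus = 2000;

// Returns the inline threshold for a call site: the default, raised by a
// bonus when any argument is a pointer to a private array.
int inlineThreshold(const std::vector<ArgInfo>& args) {
    for (const ArgInfo& a : args) {
        if (a.isPointer && a.space == AddrSpace::Private && a.pointsToArray)
            return kDefaultThreshold + kPrivateArrayBonus;
    }
    return kDefaultThreshold;
}

// A callee is inlined when its estimated cost is below the threshold.
bool shouldInline(int calleeCost, const std::vector<ArgInfo>& args) {
    return calleeCost < inlineThreshold(args);
}
```

With these placeholder numbers, a callee of cost 1000 is rejected for an ordinary global pointer argument but accepted when the argument points to a private array, so the array can then be optimized out in the caller instead of being spilled to scratch.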
Performance impact on ocltst sha256, 32 bit, Fiji:
                   AMDIL    HSAIL    Diff       HSAIL+Inliner  Diff       Diff
                            before   to AMDIL                  to HSAIL   to AMDIL
OCLPerfSHA256[ 0]  43.843   40.894   0.93       69.910         1.71       1.59
OCLPerfSHA256[ 1]  53.611   51.083   0.95       80.919         1.58       1.51
OCLPerfSHA256[ 2]  52.127   51.528   0.99       80.640         1.56       1.55
OCLPerfSHA256[ 3]  60.952   57.027   0.94       68.615         1.20       1.13
OCLPerfSHA256[ 4]  76.173   70.150   0.92       80.582         1.15       1.06
OCLPerfSHA256[ 5]  75.886   70.264   0.93       81.000         1.15       1.07
Testing: smoke, precheckin, ocltst sha256
Reviewed by Daniil Fukalov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/opt_level.cpp#28 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/InitializePasses.h#93 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/LinkAllPasses.h#49 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/Transforms/IPO.h#32 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/IPO/AMDInline.cpp#1 add
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/IPO/CMakeLists.txt#24 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/IPO/IPO.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/IPO/Inliner.cpp#42 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/tools/opt/amdopt.inc#28 edit
[ROCm/clr commit: 5e3d4f5a01]