SWDEV-132899 - [gfx10][OCL]- Adding support for forcing WaveSize32 from runtime for testing on gfx10 HW emulator
Motivation: While testing ocltst on Windows on the PAL/HSAIL/SC path on the gfx10 HW emulator, it was found that SC uses WaveSize64 by default for compute kernels.
SC also has an interface that can be used for forcing the WaveSize to 32 or 64.
- Adding the "-force-wave-size-32" option to the compiler, to be passed down to Finalizer/SC
- Adding environment variable "GPU_FORCE_WAVE_SIZE_32" that can be used from runtime to force WaveSize32 compilation in HSAIL/SC path
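A minimal sketch of how the runtime side might turn the environment variable into the compiler option. The helper names `envFlagEnabled` and `appendWaveSizeOption`, and the rule that any value other than empty or "0" enables the flag, are assumptions for illustration only; the actual logic lives in palprogram.cpp and flags.hpp and may differ.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: true when the variable is set to something other
// than "" or "0". The real runtime flag parsing may use different rules.
bool envFlagEnabled(const char* name) {
    const char* value = std::getenv(name);
    return value != nullptr && value[0] != '\0' && std::string(value) != "0";
}

// Appends "-force-wave-size-32" to the option string when requested,
// separating it from any existing options with a single space.
std::string appendWaveSizeOption(std::string options, bool forceWave32) {
    if (forceWave32) {
        if (!options.empty()) options += ' ';
        options += "-force-wave-size-32";
    }
    return options;
}
```

Under these assumptions, a caller would use `appendWaveSizeOption(opts, envFlagEnabled("GPU_FORCE_WAVE_SIZE_32"))` before handing the options to the compiler library.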
ReviewBoardURL = http://ocltc.amd.com/reviews/r/14364/
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/hsail_be.cpp#69 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#138 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#55 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#284 edit
SWDEV-109533 - AMDIL: increase inline cost threshold from 400 to 14000
This is a workaround to allow Blender to work on SI devices.
Testing: precheckin
Reviewed by Boleslaw Ciesielski
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/utils/OPTIONS.def#6 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#135 edit
SWDEV-86836 - Enhance caching library class to prepare one-stage kernel caching by:
0. Moving cache storage setup into constructor
1. Controlling cache storage size
2. Explicit cache cleanup
a. -kcache-wipe is off by default; when turned on, the caching directory is wiped
b. Here it's just an option. The implementation (the call of wipeCacheFolders()) will be added in the compiler library
3. Enforcing a cache miss (forcing an actual compilation and adding a new entry to the cache storage).
a. -kcache-enforce-miss is off by default; when turned on, the real compilation will be enforced
b. Here it's just an option. The implementation will be added in the compiler library
ReviewBoardURL = http://ocltc.amd.com/reviews/r/9726/
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#134 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/caching/cache.cpp#12 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/caching/cache.hpp#7 edit
SWDEV-80173 - HSA HLC: disable liveness analysis and jump threading
After investigation, I found that liveness analysis never changed code generation in any of the benchmarks or applications.
Its only use is in LICM, and the hoisting limitation was never really triggered.
Since the analysis is very expensive, I'm disabling it.
Jump threading is generally bad on the GPU because it creates unstructured control flow.
Even if the HSAIL becomes smaller and has fewer branches, it does not help because the finalizer's structurizer will have to clone blocks.
Jump threading is disabled for GPU. This improves compilation speed and just slightly improves performance.
Testing: smoke, precheckin, vray and blender compilation
Reviewed by Daniil Fukalov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#133 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/AMDLLVMContextHook.h#29 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Analysis/AMDLiveAnalysis.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/IPO/AMDPassManagerBuilder.cpp#61 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/tools/opt/amdopt.inc#29 edit
SWDEV-77584 - Compiler Lib: Preparations for enabling HSAIL on OpenCL 1.2 by default. Adding -legacy and -binary_is_spirv.
The -legacy option will be used to force the AMDIL path after HSAIL becomes the default for OpenCL.
The -binary_is_spirv option will be used to indicate that the binary is constructed from SPIR-V.
[Testing] pre-checkin:
http://ocltc.amd.com:8111/viewModification.html?modId=61541&personal=true&buildTypeId=&tab=vcsModificationBuilds&show_all_builds=true
[Reviewer] Stanislav Mekhanoshin
http://ocltc.amd.com/reviews/r/8850
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/utils/OPTIONS.def#4 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#132 edit
EPR #425389 - Back out changelist 1181925
Although the compiler library sources are split, the build does not yet use this, so the wrong default value is being used for AMDIL vs. HSAIL
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#130 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/AMDLLVMContextHook.h#28 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/Transforms/IPO/AMDOptOptions.h#8 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/IPO/AMDOptOptions.cpp#10 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/IPO/AMDPassManagerBuilder.cpp#56 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/tools/opt/amdopt.inc#26 edit
ECR #354633 - SPIR-V: Let aoc2 load and save SPIR-V.
E.g.
aoc2 -march=hsail-64 -cl-std=CL2.0 -srctospv testReadf.cl
compiles a .cl file to a SPIR-V binary and saves it as .spv
aoc2 -march=hsail-64 -cl-std=CL2.0 -spirv work_group_any.spv
loads a SPIR-V binary, compiles it to ISA, and saves it as ELF in .bin
Changed the option for round-trip translation of SPIR-V to -round-trip-spirv.
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/api/v0_8/acl.cpp#35 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/frontend_clang.cpp#22 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/linker.cpp#133 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/v0_8/if_acl.cpp#74 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/include/v0_8/aclEnums.h#22 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#129 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/aoc2.cpp#76 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/complib/CLEnumCheck.cpp#47 edit
ECR #304775 - Remove HLC_Unroll_* variables.
HLC_Unroll_Scratch_Threshold was unused. The others have equivalent settings in the AMDLLVMContextHook, so consistently use that version. The patches to opt already had a different set of command-line flags for the same options.
This changes two of the defaults in the compiler library and the equivalent flags in opt to match the values that were actually in use, so this shouldn't change the current behavior. The unroll threshold default and the allow-partial-unrolling default were changed to the values actually used. Eventually all of these custom options should be removed, because in current LLVM these can be controlled per loop by the TargetTransformInfo, and all have equivalent cl::opts already.
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#128 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/AMDLLVMContextHook.h#27 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/Transforms/IPO/AMDOptOptions.h#7 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/IPO/AMDOptOptions.cpp#9 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/IPO/AMDPassManagerBuilder.cpp#55 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/tools/opt/amdopt.inc#25 edit
ECR #304775 - Preparation for kernel caching feature
1. Each device has a separate cache directory
2. It logs caching errors, so we can debug the cache and/or detect collisions
3. Implemented cache size tracking, so we can evict old data when cache files are too large
4. Added file/path access permission control on both Windows and Linux
5. Has read/write file lock protection
6. -kcache-disable flag can be used to turn on/off the caching functionality
7. AMD_FORCE_KCACHE_TEST env variable is used for internal testing
8. For the stage we want to cache, call getCacheEntry() followed by makeCacheEntry() if the get fails; otherwise directly return cached data.
- After the compiler library code is refactored, getCacheEntry() and makeCacheEntry() will be wrapped into one function call, so that only one call is needed at the place we want to cache
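The get-then-make pattern from item 8, and the single-call wrapper it is planned to become, can be sketched as follows. This is an illustration under stated assumptions, not the code in cache.cpp: the class name `KernelCacheSketch`, the `getOrMake` wrapper, the signatures, and the in-memory map standing in for the on-disk cache directory are all hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for the caching class. An in-memory map replaces
// the on-disk cache directory so the control flow can be shown in isolation.
class KernelCacheSketch {
public:
    // Copies the cached data for `key` into `out` and returns true on a hit;
    // returns false and leaves `out` untouched on a miss.
    bool getCacheEntry(const std::string& key, std::string& out) const {
        std::map<std::string, std::string>::const_iterator it = entries_.find(key);
        if (it == entries_.end()) return false;
        out = it->second;
        return true;
    }

    // Stores `data` under `key`, overwriting any previous entry.
    void makeCacheEntry(const std::string& key, const std::string& data) {
        entries_[key] = data;
    }

    // Single-call wrapper: returns the cached data on a hit; otherwise runs
    // `compile`, stores its result, and returns it.
    std::string getOrMake(const std::string& key,
                          const std::function<std::string()>& compile) {
        assert(compile && "a compile callback is required");
        std::string data;
        if (getCacheEntry(key, data)) return data;
        data = compile();
        makeCacheEntry(key, data);
        return data;
    }

private:
    std::map<std::string, std::string> entries_;
};
```

With a wrapper of this shape, the stage being cached needs only one call site, and the compile callback runs only on a miss.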
TO DO:
1. Use it in the compiler library code
- Waiting for the decision on how many stages we want to cache, i) 1-stage caching: source->ISA; or ii) 3-stage caching: source->LLVM IR, LLVM IR->IL, IL->ISA
2. Tracking of timestamps for cache entries
- LRU eviction when cache grows too large
- Suggestion from Laurent: Regarding tracking timestamps for LRU eviction: Random eviction would probably perform as well as LRU and does not require timestamps.
3. Track cache entries per application
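To illustrate the random-eviction suggestion in TO DO item 2, here is a minimal sketch that evicts randomly chosen entries until the cache fits a size limit, with no timestamps involved. The struct `CacheFileSketch`, the function `evictRandom`, and the in-memory list of files are hypothetical; a real implementation would also delete the files on disk and hold the file lock.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical record for one cache file: its name and its size in bytes.
struct CacheFileSketch {
    std::string name;
    std::size_t bytes;
};

// Removes randomly chosen entries from `files` until the total size is at
// most `limit` bytes. Returns the names of the evicted entries.
std::vector<std::string> evictRandom(std::vector<CacheFileSketch>& files,
                                     std::size_t limit, std::mt19937& rng) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < files.size(); ++i) total += files[i].bytes;

    std::vector<std::string> evicted;
    while (total > limit && !files.empty()) {
        // Pick a victim uniformly among the remaining entries.
        std::uniform_int_distribution<std::size_t> pick(0, files.size() - 1);
        const std::size_t victim = pick(rng);
        total -= files[victim].bytes;
        evicted.push_back(files[victim].name);
        files.erase(files.begin() + static_cast<std::ptrdiff_t>(victim));
    }
    return evicted;
}
```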
ReviewBoardURL = http://ocltc.amd.com/reviews/r/8194/
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/cache.cpp#3 add
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/cache.hpp#3 add
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#127 edit
ECR #304775 - Bug 10752 kernel caching feature (AMDIL and HSAIL path)
1. For the stage we want to cache, call getCacheEntry() followed by makeCacheEntry() if the get fails; otherwise directly return cached data.
a. Each device has a separate cache directory
b. It logs caching errors, so we can debug the cache and/or detect collisions
2. Implemented cache size tracking, so we can evict old data when cache files are too large
3. Added file/path access permission control on both Windows and Linux
4. Has read/write file lock protection
5. -kcache-disable flag can be used to turn on/off the caching functionality
6. AMD_FORCE_KCACHE_TEST env variable is used for internal testing
TO DO:
1. Tracking of timestamps for cache entries
- LRU eviction when cache grows too large
2. Track cache entries per application
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/cache.cpp#1 add
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/cache.hpp#1 add
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/frontend.cpp#34 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/frontend_clang.cpp#20 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/v0_8/if_acl.cpp#68 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/amdil_be.cpp#43 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/hsail_be.cpp#42 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#124 edit
... //depot/stg/opencl/drivers/opencl/tests/kcache/Makefile#1 add
... //depot/stg/opencl/drivers/opencl/tests/kcache/build/Makefile#1 add
... //depot/stg/opencl/drivers/opencl/tests/kcache/build/Makefile.kcache#1 add
... //depot/stg/opencl/drivers/opencl/tests/kcache/kCacheTest_std.txt#1 add
... //depot/stg/opencl/drivers/opencl/tests/kcache/kernel.cl#1 add
... //depot/stg/opencl/drivers/opencl/tests/kcache/main.cpp#1 add
EPR #403782 - IOMMU2/SVM
- Enable SCOption_R1200_ENABLE_XNACK whenever IOMMUv2 is supported.
- Add "-sc-xnack-iommu" option for compile and link and pass this to SCWrapper in the options string.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/7266/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/scwrapper/SI/scStateSI.cpp#30 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#122 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#282 edit
EPR #405889 - Added option to set VGPR/SGPR/LDS usage in ISA to a certain value greater than the actual usage, for debugging purposes. If the given value is smaller than the actual value, this option has no effect.
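The rule above (the override can only raise the reported usage, never lower it) amounts to taking the maximum of the two values. A minimal sketch, assuming a hypothetical helper name `reportedUsage`; the real change is in scCompileSI.cpp and may be structured differently:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: `actual` is the usage computed by the shader
// compiler, `requested` is the debug override. A smaller override has
// no effect, so the reported value is the larger of the two.
unsigned reportedUsage(unsigned actual, unsigned requested) {
    return std::max(actual, requested);
}
```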
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/scwrapper/SI/scCompileSI.cpp#52 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/scwrapper/scHWShaderInfo.h#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#121 edit
EPR #407056, #407061, #406980 - Back out changelist 1083545 since it causes a number of perf degradations. Will add a heuristic for -scras=2 for memory-bound kernels only.
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#118 edit
EPR #402000 - [CQE OCL][Perf][QR] ~6-7% perf drop in CompuCL Benchmark (Graphics: T-Rex subtest).
Add option to disable SC merge memory loads and stores. By default it is disabled. Will decide whether to enable it by default after performance runs.
Cherry-pick CL#1076590 and CL#1077419 from sc stg to add the option in SC.
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/sc/Interface/SCCommon.h#42 integrate
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/sc/Src/CompilerBase.cpp#51 integrate
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/sc/Src/CompilerBase.hpp#35 integrate
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/sc/Src/HwUtils.cpp#36 integrate
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/scwrapper/scState.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#114 edit
EPR #405194 - Change unroll threshold to LLVM default to partially work around Linpack performance problem.
Prior to CL 1058428, which increased the unroll threshold to 200, this was only 100 which is lower than the LLVM default. Linpack's new ISA has increased register usage, but decreasing the unroll threshold to the previous level does not reduce the register count to its previous level. The increased register usage is probably a new SC problem, so this should probably be increased again in the future. There is no change in register usage with 100 vs. 150 on Linpack.
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#113 edit
ECR #333756 - HSA Finalizer: added runtime option to force buffer instructions for global access
This can be used under ORCA RT.
Testing: smoke, smoke_clang, precheckin, clBLAS dgemm
Reviewed by Nikolay Haustov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/scwrapper/SI/scStateSI.cpp#24 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#112 edit
ECR #304775 - Bug 10112 - Raise default unroll threshold. The current default is 100, which is even lower than the LLVM default of 150. Increasing to 200 is a modest increase, and this should probably be even higher.
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#111 edit
EPR #389586 - Add workaround for VI SPI SGPR initialization hardware bug for HSAIL path.
There is a hardware bug in VI (UBTS502672) which requires a workaround. Compute shaders need to tell the shader compiler that the number of available SGPRs is 78 and set the SGPR usage in the compiled ISA to 94. This has been done in the AMDIL path but not in the HSAIL path. This change applies the workaround to the HSAIL path.
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/scwrapper/SI/devStateSI.cpp#16 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/scwrapper/SI/devStateSI.h#11 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/scwrapper/SI/scCompileSI.cpp#41 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#109 edit