SWDEV-116535 - Report CL_FP_DENORM in the single-precision FP config for gfx9 with LC on rocm/pal, and on rocm force single-precision denorms on based on AMD_GPU_FORCE_SINGLE_FP_DENORM.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#23 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#47 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#62 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocsettings.cpp#16 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocsettings.hpp#7 edit
[ROCm/clr commit: c7e04f3222]
SWDEV-116959 - ROCm OpenCL: split -amdgpu-internalize-symbols and -amdgpu-early-inline-all
We should not run the internalize pass in the FE, because it breaks separate compilation.
The two options are therefore now split.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palcompiler.cpp#14 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#38 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/roccompiler.cpp#30 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#61 edit
[ROCm/clr commit: a831af4af0]
SWDEV-103424 - [ROCm CQE][OCL] OCLRuntime - OCLCreateBuffer tests are failing. The failure occurs because AQL cannot support a global size beyond the 32-bit range. Adding dispatch split support for ROCm, similar to that of GSL (CL#1159349), to resolve the issue.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.hpp#13 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#56 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#8 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/runtime/OCLCreateBuffer.cpp#6 edit
[ROCm/clr commit: dadd21fced]
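A minimal sketch of the dispatch-split idea, assuming a 1D range and a 32-bit grid-size field per AQL packet. The names `SubDispatch` and `splitDispatch` are hypothetical and not taken from rocvirtual.cpp; the actual change may split multi-dimensional ranges differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one sub-dispatch of the original range.
struct SubDispatch {
  uint64_t offset;  // global offset of this chunk within the original range
  uint32_t size;    // work-items in this chunk; fits a 32-bit grid field
};

// Splits a 1D global range that may exceed 32 bits into chunks whose sizes
// fit in 32 bits. Every chunk except possibly the last is a multiple of
// localSize, so no work-group straddles two sub-dispatches.
std::vector<SubDispatch> splitDispatch(uint64_t globalSize, uint32_t localSize) {
  assert(localSize > 0 && "work-group size must be nonzero");
  // Largest multiple of localSize that is representable in 32 bits.
  const uint64_t maxChunk = (uint64_t(UINT32_MAX) / localSize) * localSize;
  std::vector<SubDispatch> parts;
  for (uint64_t offset = 0; offset < globalSize; offset += maxChunk) {
    const uint64_t remaining = globalSize - offset;
    const uint64_t size = remaining < maxChunk ? remaining : maxChunk;
    parts.push_back({offset, static_cast<uint32_t>(size)});
  }
  return parts;
}
```

Under this assumption each part would be enqueued as its own packet, with `offset` supplied as the global offset so that `get_global_id()` still returns IDs in the original range.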
SWDEV-105835 - ROCm OpenCL: add -amdgpu-internalize-symbols to BE
The option -amdgpu-internalize-symbols allows unused symbols (functions and global variables)
to be dropped from the program. This saves compile time and object size, significantly so
for a large program.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#33 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#55 edit
[ROCm/clr commit: df22dc2a47]
SWDEV-94644 - Make sure we only process metadata note entries with a supported n_type. Update the build log and fail for an unsupported metadata n_type. Use the constants defined in AMDGPUPTNote.h.
This change is needed for https://reviews.llvm.org/D29115
This change is required for CL 1366203
ReviewBoardURL: http://ocltc.amd.com/reviews/r/12223/
Testing: lightning conformance tests locally
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#54 edit
[ROCm/clr commit: 4ad12d089c]
SWDEV-107568 - [ROCm CQE][OCL][CZ] Basic 2.0 conformance test giving Segmentation fault (core dumped) at "progvar_prog_scope_uninit"
- Detect whether writable program scope variables are present in the program, and if so, insert a barrier on each dispatch of a kernel from this program.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.hpp#11 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#48 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.hpp#16 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#21 edit
[ROCm/clr commit: b705f74166]
SWDEV-102966 - Dump code object disassembly in OpenCL rocm device.
Invoke DumpExecutableAsText from driver library.
Update the build to depend on additional LLVM libraries.
LLVM changes are included, but will come through amd-common.
Driver changes will come through ROCm-OpenCL-driver.
Testing: Run some SDK samples/test_basic with AMD_OCL_BUILD_OPTIONS_APPEND=-save-temps
Reviewed by: Laurent Morichetti, German Andryeyev.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#27 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#47 edit
[ROCm/clr commit: c7d2b188de]
SWDEV-105136 - Use the "execution" view rather than the "linking" view to find the metadata and size of the program scope variables. In the "execution" view the section header table is optional, so we iterate through the segments and add up the sizes of the PT_LOAD segments that have the read flag but not the execute flag. The metadata is found in the PT_NOTE segment.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#24 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#45 edit
[ROCm/clr commit: 66c5d710bc]
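A minimal sketch of the segment walk described above. It uses a hypothetical `ProgramHeader` stand-in for `Elf64_Phdr` with the standard ELF constant values (PT_LOAD = 1, PT_NOTE = 4, PF_X = 0x1, PF_W = 0x2, PF_R = 0x4). Summing `p_memsz` rather than `p_filesz` is an assumption, chosen so that zero-initialized (bss-like) variables are counted.

```cpp
#include <cstdint>
#include <vector>

// Standard ELF program-header values, renamed to avoid clashing with <elf.h>.
constexpr uint32_t kPtLoad = 1;  // PT_LOAD: loadable segment
constexpr uint32_t kPtNote = 4;  // PT_NOTE: note segment (holds the metadata)
constexpr uint32_t kPfX = 0x1;   // PF_X: executable
constexpr uint32_t kPfW = 0x2;   // PF_W: writable
constexpr uint32_t kPfR = 0x4;   // PF_R: readable

// Hypothetical minimal stand-in for Elf64_Phdr with only the fields used here.
struct ProgramHeader {
  uint32_t type;   // p_type
  uint32_t flags;  // p_flags
  uint64_t memsz;  // p_memsz
};

// Total size of readable, non-executable PT_LOAD segments.
uint64_t programScopeVarSize(const std::vector<ProgramHeader>& phdrs) {
  uint64_t total = 0;
  for (const auto& ph : phdrs) {
    if (ph.type == kPtLoad && (ph.flags & kPfR) && !(ph.flags & kPfX)) {
      total += ph.memsz;
    }
  }
  return total;
}

// First PT_NOTE segment, or nullptr if the object has none.
const ProgramHeader* findNoteSegment(const std::vector<ProgramHeader>& phdrs) {
  for (const auto& ph : phdrs) {
    if (ph.type == kPtNote) return &ph;
  }
  return nullptr;
}
```

Because only program headers are consulted, this works even when the section header table has been stripped.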
SWDEV-102510 - Need a way to control cl_khr/cl_amd extension macros
- Use -cl-ext option to enable OpenCL extensions
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palcompiler.cpp#11 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#23 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#44 edit
[ROCm/clr commit: 90d4cf8e78]
SWDEV-104875 - ROCm/HSA: use the finalizer from source tree
Currently ROCm/HSA uses the finalizer from the installed HSA RT.
That finalizer version is built from outdated, stale SC sources.
Meanwhile, the source tree has up-to-date finalizer sources matching ORCA,
and the offline tool amdhsafin is built from those sources.
This change switches from the HSA RT finalizer to the in-tree finalizer.
Testing: precheckin
Reviewed by Laurent Morichetti and Evgeny Mankov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#20 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#39 edit
[ROCm/clr commit: 9385a3778f]
SWDEV-94610 - Remove the padding at the end of the kernargs (it was there for the hidden arguments, but LC now reports the correct size). Set the LLVM triple to amdgcn-amd-amdhsa-opencl when building the built-in library.
Affected files ...
... //depot/stg/opencl/drivers/opencl/opencldefs#186 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/roccompiler.cpp#20 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#37 edit
[ROCm/clr commit: 0fb0fb1d8b]
SWDEV-94610 - Add gfx700 to the list of supported targets in HSAILProgram::linkImpl_LC. When dumping the source (-save-temps), print the options actually sent to clang as well as the options passed to OpenCL.
Affected files ...
... //depot/stg/opencl/drivers/opencl/library/build/Makefile.library#56 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/amdgpu_metadata.cpp#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/roccompiler.cpp#18 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#36 edit
[ROCm/clr commit: 383e97425b]
SWDEV-94610 - Target features are only needed in the CL->IR stage. The attributes remain on the function, so they should not be set again in the IR->ISA stage.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/roccompiler.cpp#16 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#34 edit
[ROCm/clr commit: a1009a5d11]
SWDEV-94610 - Don't pass -cl-denorms-are-zero; instead, set the fp32/fp64 denorms with the target features +fp32-denormals and +fp64-denormals. fp64-denormals is always set; fp32-denormals is only set if the device is >= gfx900 and -cl-denorms-are-zero is not set.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#33 edit
[ROCm/clr commit: 7239172265]
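The feature-selection rule above can be sketched as follows. The function name and the encoding of the GFX IP version as an integer (803, 900, ...) are hypothetical; the real code in rocprogram.cpp may derive the target differently.

```cpp
#include <string>
#include <vector>

// Hypothetical helper implementing the rule from this change:
//  - +fp64-denormals is always requested;
//  - +fp32-denormals is requested only on gfx900 or newer, and only when
//    the user did not build with -cl-denorms-are-zero.
std::vector<std::string> denormTargetFeatures(unsigned gfxIpVersion,
                                              bool denormsAreZero) {
  std::vector<std::string> features{"+fp64-denormals"};
  if (gfxIpVersion >= 900 && !denormsAreZero) {
    features.push_back("+fp32-denormals");
  }
  return features;
}
```

For example, a gfx803 device gets only `+fp64-denormals`, while a gfx900 device without `-cl-denorms-are-zero` gets both features.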
SWDEV-94644 - Run prepare-builtins from the modules build directory, instead of right before generating the include files. Renamed the files to match the opensource build names (except for the .amdgcn suffix). Automatically generate a single include file for all libraries.
Affected files ...
... //depot/stg/opencl/drivers/opencl/library/build/Makefile.library#54 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/irif/build/Makefile.irif#7 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/ockl/build/Makefile.ockl#8 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/oclc/build/Makefile.oclc#10 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/ocml/build/Makefile.ocml#8 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/opencl/build/Makefile.opencl#10 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#30 edit
[ROCm/clr commit: 8bb15b463b]
SWDEV-94610 - Make sure each kernarg segment sits on a different cache line (align the kernargs on cache lines at minimum). Minor misc cleanups.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#13 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.cpp#14 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.hpp#8 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#27 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#13 edit
[ROCm/clr commit: 3a61b24dd5]
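A minimal sketch of cache-line alignment for kernarg segments. The 64-byte line size is an assumption (the real value is device-specific), and `alignUp`/`kernargSlotSize` are hypothetical helpers, not the names used in rockernel.cpp or rocvirtual.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Assumed cache-line size; the real value depends on the device.
constexpr std::size_t kCacheLineSize = 64;

// Rounds value up to a multiple of alignment (which must be a power of two).
std::size_t alignUp(std::size_t value, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0 &&
         "alignment must be a power of two");
  return (value + alignment - 1) & ~(alignment - 1);
}

// Bytes reserved for one kernarg segment: at least cache-line aligned, or
// the kernel's own alignment if that is larger. If the pool base is itself
// cache-line aligned, consecutive segments never share a cache line.
std::size_t kernargSlotSize(std::size_t kernargSize, std::size_t kernargAlign) {
  const std::size_t align =
      kernargAlign > kCacheLineSize ? kernargAlign : kCacheLineSize;
  return alignUp(kernargSize, align);
}
```

Padding each segment to a whole number of lines trades a little memory for avoiding false sharing between the kernargs of back-to-back dispatches.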
SWDEV-94610 - The spec says that the value returned for HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH does not include the NUL terminator. We should add one before using the string.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#25 edit
[ROCm/clr commit: 557d2bfddf]
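A self-contained sketch of the fix pattern. `FakeSymbol`, `queryNameLength`, and `queryName` are hypothetical stand-ins for the runtime's symbol-info queries: like the HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH behavior described above, the length excludes the terminator and the copied name is not NUL-terminated.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical symbol with a name, standing in for a runtime symbol handle.
struct FakeSymbol {
  const char* name;
};

// Stand-in query: reports the length WITHOUT the NUL terminator.
uint32_t queryNameLength(const FakeSymbol& s) {
  return static_cast<uint32_t>(std::strlen(s.name));
}

// Stand-in query: copies exactly the reported length, no terminator.
void queryName(const FakeSymbol& s, char* out) {
  std::memcpy(out, s.name, std::strlen(s.name));
}

std::string symbolName(const FakeSymbol& s) {
  const uint32_t len = queryNameLength(s);
  std::vector<char> buf(len + 1);  // +1 for the NUL the length excludes
  queryName(s, buf.data());
  buf[len] = '\0';  // terminate explicitly before using it as a C string
  return std::string(buf.data());
}
```

Without the extra byte, reading the buffer as a C string would run past the copied name.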
SWDEV-94610 - Fix the argName length issue. The string returned by the ROCR is already NUL-terminated.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#22 edit
[ROCm/clr commit: 52e3652f92]
SWDEV-101678 - Create a new instance of the ROCm-OpenCL-Driver for each call to compileImpl and linkImpl.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/device.cpp#202 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/roccompiler.cpp#11 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#12 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#18 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.hpp#9 edit
[ROCm/clr commit: bac3dbc7a8]
SWDEV-101354 - HSA HLC: fix unify metadata pass
When we link multiple modules, metadata is duplicated, so after linking with our library the bitcode is twice as big as it needs to be.
In addition, we have not unified llvm.ident metadata since the LLVM 3.6 merge.
Fix that:
1. Add llvm.ident to the processing;
2. Do not duplicate strings within unified metadata;
3. Run unification pass post link, not before the link.
Because our library is compiled for OpenCL 2.0, taking the maximum would now always yield OCL version 2.0. That is not correct, and since
the pass did not really work before, it would lead to a regression: we would fail to identify the kernel's correct OpenCL version and
would not perform the simplifications for 1.2. The pass now picks the first version, which should represent the kernel module. That might
not be 100% correct, because there may be several kernel modules, but a proper fix would require correctly identifying the library as
1.2, which is troublesome. In its current state, the change just keeps the status quo.
Testing: smoke, precheckin
Reviewed by Evgeny Mankov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/linker.cpp#152 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/linker/include/AMDFixupKernelModule.h#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/linker/lib/AMDFixupKernelModule.cpp#7 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/linker/tools/opencl-link/opencl-link.cpp#10 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/Scalar/AMDUnifyMetadata.cpp#2 edit
[ROCm/clr commit: 82f13f6ba1]
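A simplified model of the unification rules above, working on plain values instead of LLVM metadata nodes. It assumes the metadata entries arrive in link order with the kernel module first; `unifyIdents`, `unifyOpenCLVersion`, and `Version` are hypothetical names, not the API of AMDUnifyMetadata.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Version = std::pair<int, int>;  // {major, minor}

// llvm.ident: keep each distinct string once, preserving first-seen order.
std::vector<std::string> unifyIdents(const std::vector<std::string>& idents) {
  std::vector<std::string> out;
  for (const auto& s : idents) {
    if (std::find(out.begin(), out.end(), s) == out.end()) out.push_back(s);
  }
  return out;
}

// OpenCL version: keep the first entry (assumed to be the kernel module)
// instead of the maximum, which would always be the library's 2.0.
Version unifyOpenCLVersion(const std::vector<Version>& versions) {
  assert(!versions.empty() && "expected at least one version entry");
  return versions.front();
}
```

With a 1.2 kernel linked ahead of the 2.0 library, this keeps 1.2, whereas taking the maximum would report 2.0.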