SWDEV-103424 - [ROCm CQE][OCL] OCLRuntime - OCLCreateBuffer tests are failing. The failure occurs because AQL cannot support a global size beyond the 32-bit range. Add dispatch split support for ROCm, similar to that of GSL (CL#1159349), to resolve the issue.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.hpp#13 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#56 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#8 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/runtime/OCLCreateBuffer.cpp#6 edit
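To illustrate the dispatch split described in SWDEV-103424, here is a minimal sketch for one dimension. The names DispatchChunk and splitDispatch are hypothetical and not taken from the runtime; the sketch only assumes that the grid size of a single dispatch must fit in 32 bits and that each sub-dispatch carries its own global offset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical description of one sub-dispatch along a single dimension.
struct DispatchChunk {
  std::uint64_t globalOffset;  // index of the first work-item in this chunk
  std::uint32_t globalSize;    // number of work-items; fits a 32-bit grid field
};

// Split a 64-bit global size into chunks whose sizes fit in 32 bits.
// Every chunk except possibly the last is a multiple of workgroupSize, so
// workgroup boundaries match those of an unsplit dispatch.
std::vector<DispatchChunk> splitDispatch(std::uint64_t globalSize,
                                         std::uint32_t workgroupSize) {
  assert(workgroupSize > 0);
  const std::uint64_t limit = std::numeric_limits<std::uint32_t>::max();
  // Largest multiple of workgroupSize that still fits in 32 bits.
  const std::uint64_t maxChunk = (limit / workgroupSize) * workgroupSize;

  std::vector<DispatchChunk> chunks;
  std::uint64_t offset = 0;
  while (offset < globalSize) {
    const std::uint64_t size = std::min(maxChunk, globalSize - offset);
    chunks.push_back({offset, static_cast<std::uint32_t>(size)});
    offset += size;  // never exceeds globalSize, so it cannot overflow
  }
  return chunks;
}
```

A caller would issue one dispatch per chunk and pass globalOffset to the kernel so that global IDs stay continuous across chunks.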
SWDEV-105835 - ROCm OpenCL: add -amdgpu-internalize-symbols to BE
The option -amdgpu-internalize-symbols allows unused symbols (functions and global
variables) to be dropped from the program. This saves compile time and reduces object
size, substantially so for a big program.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#33 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#55 edit
SWDEV-94644 - Make sure we only process metadata note entries with a supported n_type. Update the build log and fail on an unsupported metadata n_type. Use the constants defined in AMDGPUPTNote.h
This change is needed for https://reviews.llvm.org/D29115
This change is required for CL 1366203
ReviewBoardURL: http://ocltc.amd.com/reviews/r/12223/
Testing: lightning conformance tests locally
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#54 edit
SWDEV-107568 - [ROCm CQE][OCL][CZ] Basic 2.0 conformance test giving Segmentation fault (core dumped) at "progvar_prog_scope_uninit"
- Detect whether writable program scope variables are present in the program, and if so, insert a barrier on each dispatch of a kernel from this program.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.hpp#11 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#48 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.hpp#16 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#21 edit
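As a sketch of the barrier insertion in SWDEV-107568: one way to serialize dispatches in AQL is to set the barrier bit in the dispatch packet header. The change description does not say whether it uses the barrier bit or a separate barrier packet, so that choice is an assumption here. ProgramInfo and makeDispatchHeader are hypothetical names, and the constants are redeclared locally to mirror the HSA packet header layout without requiring the HSA headers.

```cpp
#include <cstdint>

// Values mirroring the HSA AQL packet header layout, redeclared here so the
// sketch compiles standalone.
constexpr unsigned kHeaderTypeShift = 0;           // HSA_PACKET_HEADER_TYPE
constexpr unsigned kHeaderBarrierShift = 8;        // HSA_PACKET_HEADER_BARRIER
constexpr unsigned kPacketTypeKernelDispatch = 2;  // HSA_PACKET_TYPE_KERNEL_DISPATCH

// Hypothetical per-program state; the runtime's real field names may differ.
struct ProgramInfo {
  bool hasWritableGlobals;  // true if writable program scope variables exist
};

// Build a dispatch packet header. Fence scopes are left out for brevity.
std::uint16_t makeDispatchHeader(const ProgramInfo& program) {
  unsigned header = kPacketTypeKernelDispatch << kHeaderTypeShift;
  if (program.hasWritableGlobals) {
    // The barrier bit makes the packet processor wait for all earlier
    // packets in the queue to complete before launching this one.
    header |= 1u << kHeaderBarrierShift;
  }
  return static_cast<std::uint16_t>(header);
}
```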
SWDEV-102966 - Dump code object disassembly in OpenCL rocm device.
Invoke DumpExecutableAsText from driver library.
Update build to depend on some more LLVM libraries.
LLVM changes are included, but will come through amd-common.
Driver changes will come through ROCm-OpenCL-driver.
Testing: Run some SDK samples/test_basic with AMD_OCL_BUILD_OPTIONS_APPEND=-save-temps
Reviewed by: Laurent Morichetti, German Andryeyev.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#27 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#47 edit
SWDEV-105136 - Use the "execution" view rather than the "linking" view to find the metadata and size of the program scope variables. In the "execution" view, the section header table is optional, so we should iterate through the segments and add up the sizes of the PT_LOAD segments that have the read flag set but not the execute flag. We will also find the metadata in the PT_NOTE segment.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#24 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#45 edit
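The segment walk described in SWDEV-105136 can be sketched as follows. ProgramHeader is a simplified stand-in for Elf64_Phdr and globalVariableTotalSize is a hypothetical name; using p_memsz rather than p_filesz as "the size" is also an assumption, since the description does not say which one is summed.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for Elf64_Phdr so the sketch does not need <elf.h>.
struct ProgramHeader {
  std::uint32_t p_type;
  std::uint32_t p_flags;
  std::uint64_t p_memsz;
};

constexpr std::uint32_t kPtLoad = 1;  // PT_LOAD
constexpr std::uint32_t kPfX = 0x1;   // PF_X
constexpr std::uint32_t kPfR = 0x4;   // PF_R

// Sum the sizes of loadable segments that are readable but not executable.
std::uint64_t globalVariableTotalSize(const ProgramHeader* headers,
                                      std::size_t count) {
  std::uint64_t total = 0;
  for (std::size_t i = 0; i < count; ++i) {
    const ProgramHeader& ph = headers[i];
    const bool readable = (ph.p_flags & kPfR) != 0;
    const bool executable = (ph.p_flags & kPfX) != 0;
    if (ph.p_type == kPtLoad && readable && !executable) {
      total += ph.p_memsz;
    }
  }
  return total;
}
```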
SWDEV-102510 - Need a way to control cl_khr/cl_amd extension macros
- Use -cl-ext option to enable OpenCL extensions
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palcompiler.cpp#11 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#23 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#44 edit
SWDEV-94610 - Fix the build for OpenCL/LC on Linux.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#43 edit
SWDEV-105136 - [OCL-LC-ROCm] Missing CL_PROGRAM_GLOBAL_VARIABLE_TOTAL_SIZE implementation
- Iterate over the ELF sections and add up the sizes of the sections that have SHF_ALLOC set and SHF_EXECINSTR clear
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#41 edit
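For comparison with the later segment-based approach, the section filter in this change (SHF_ALLOC && !SHF_EXECINSTR) can be sketched like this. SectionHeader is a simplified stand-in for Elf64_Shdr and allocatedDataSize is a hypothetical name.

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-in for Elf64_Shdr, keeping only the fields used here.
struct SectionHeader {
  std::uint64_t sh_flags;
  std::uint64_t sh_size;
};

constexpr std::uint64_t kShfAlloc = 0x2;      // SHF_ALLOC
constexpr std::uint64_t kShfExecInstr = 0x4;  // SHF_EXECINSTR

// Sum the sizes of sections that occupy memory at run time but hold no code.
std::uint64_t allocatedDataSize(const std::vector<SectionHeader>& sections) {
  std::uint64_t total = 0;
  for (const SectionHeader& sh : sections) {
    const bool allocated = (sh.sh_flags & kShfAlloc) != 0;
    const bool code = (sh.sh_flags & kShfExecInstr) != 0;
    if (allocated && !code) {
      total += sh.sh_size;
    }
  }
  return total;
}
```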
SWDEV-104875 - ROCm/HSA: use the finalizer from source tree
Right now ROCm/HSA uses the finalizer from the installed HSA RT.
That finalizer version has stale SC sources.
At the same time, the source tree has fresh finalizer sources matching ORCA.
The offline tool amdhsafin is built from those sources.
This change switches from the HSA RT finalizer to the in-tree finalizer.
Testing: precheckin
Reviewed by Laurent Morichetti and Evgeny Mankov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#20 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#39 edit
SWDEV-94610 - Remove the padding at the end of the kernargs (It was for the hidden arguments, but now, LC reports the correct size). Set the LLVM triple to amdgcn-amd-amdhsa-opencl when building the built-in library.
Affected files ...
... //depot/stg/opencl/drivers/opencl/opencldefs#186 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/roccompiler.cpp#20 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#37 edit
SWDEV-94610 - Add gfx700 to the list of supported targets in HSAILProgram::linkImpl_LC. When dumping the source (-save-temps), print the options actually sent to clang as well as the options passed to OpenCL.
Affected files ...
... //depot/stg/opencl/drivers/opencl/library/build/Makefile.library#56 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/amdgpu_metadata.cpp#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/roccompiler.cpp#18 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#36 edit
SWDEV-94610 - Target features are only needed in the CL->IR stage. The attributes remain on the function, so they should not be set again in the IR->ISA stage.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/roccompiler.cpp#16 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#34 edit
SWDEV-94610 - Don't use -cl-denorms-are-zero; instead, set the fp32/fp64 denorms with the target features +fp32-denormals and +fp64-denormals. fp64-denormals is always set; fp32-denormals is only set if device >= gfx900 and -cl-denorms-are-zero is not set.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#33 edit
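The denormal selection rule above can be sketched as a small helper. The function name denormalFeatures, the numeric gfxipVersion encoding (900 for gfx900), and the comma-separated output format are assumptions for illustration; the change description does not show how the features are actually passed to the compiler.

```cpp
#include <string>

// Build the denormal-related target features for a device.
// gfxipVersion is assumed to encode gfx900 as 900; denormsAreZero reflects
// whether -cl-denorms-are-zero was given in the build options.
std::string denormalFeatures(unsigned gfxipVersion, bool denormsAreZero) {
  // fp64 denormals are always enabled.
  std::string features = "+fp64-denormals";
  // fp32 denormals only on gfx900 and later, and only if not flushed to zero.
  if (gfxipVersion >= 900 && !denormsAreZero) {
    features += ",+fp32-denormals";
  }
  return features;
}
```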
SWDEV-94644 - Run prepare-builtins from the modules build directory, instead of right before generating the include files. Rename the files to match the open-source build names (except for the .amdgcn suffix). Automatically generate a single include file for all libraries.
Affected files ...
... //depot/stg/opencl/drivers/opencl/library/build/Makefile.library#54 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/irif/build/Makefile.irif#7 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/ockl/build/Makefile.ockl#8 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/oclc/build/Makefile.oclc#10 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/ocml/build/Makefile.ocml#8 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/opencl/build/Makefile.opencl#10 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#30 edit
SWDEV-94610 - Make sure each kernarg segment sits on a different cache line (align the kernargs on cache lines at minimum). Minor misc cleanups.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#13 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.cpp#14 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.hpp#8 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#27 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#13 edit
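A minimal sketch of the kernarg placement rule above, assuming a 64-byte cache line and a simple bump allocator over a kernarg pool. kCacheLineSize, alignUp, and reserveKernargs are hypothetical names, not the runtime's.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

constexpr std::size_t kCacheLineSize = 64;  // assumed cache line size in bytes

// Round value up to the next multiple of alignment (a power of two).
constexpr std::size_t alignUp(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Reserve space for one kernarg segment and return its start offset.
// The start is aligned to at least a cache line, so two segments never
// share a cache line.
std::size_t reserveKernargs(std::size_t& poolOffset, std::size_t kernargSize,
                            std::size_t kernargAlign) {
  const std::size_t align = std::max(kernargAlign, kCacheLineSize);
  assert((align & (align - 1)) == 0 && "alignment must be a power of two");
  const std::size_t start = alignUp(poolOffset, align);
  poolOffset = start + kernargSize;
  return start;
}
```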
SWDEV-94610 - The spec says that the value returned for HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH does not include the NUL terminator. We should add one before using the string.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#25 edit
SWDEV-94610 - Fix the argName length issue. The string returned by the ROCR is already NUL-terminated.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#22 edit
SWDEV-94644 - Run prepare-builtins on the control functions.
Affected files ...
... //depot/stg/opencl/drivers/opencl/library/build/Makefile.library#53 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/oclc/build/Makefile.oclc#7 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#19 edit
SWDEV-101678 - Create a new instance of the ROCm-OpenCL-Driver for each call to compileImpl and linkImpl.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/device.cpp#202 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/roccompiler.cpp#11 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#12 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#18 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.hpp#9 edit
SWDEV-101354 - HSA HLC: fix unify metadata pass
When we link multiple modules the metadata is duplicated, so after we link with our library the bitcode is twice as big as it needs to be.
In addition, we have not unified llvm.ident metadata since the llvm 3.6 merge.
Fix that:
1. Add llvm.ident to the processing;
2. Do not duplicate strings within unified metadata;
3. Run unification pass post link, not before the link.
Since our library is compiled for OpenCL 2.0, we would always get OCL version 2.0 as the maximum. That is not really correct, and since
the pass was not really working before, it would lead to a regression, as we would fail to identify the kernel's correct OpenCL version and
perform simplifications for 1.2. Now the pass will pick the first version, which should represent the kernel module. That might not be
100% correct because we may have several kernel modules, but a proper fix would require correctly identifying the library as 1.2, which is
troublesome. In its current state the change just keeps the status quo.
Testing: smoke, precheckin
Reviewed by Evgeny Mankov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/linker.cpp#152 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/linker/include/AMDFixupKernelModule.h#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/linker/lib/AMDFixupKernelModule.cpp#7 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/linker/tools/opencl-link/opencl-link.cpp#10 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/Scalar/AMDUnifyMetadata.cpp#2 edit
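The unification rules in SWDEV-101354 (deduplicate ident strings, take the first OpenCL version) can be sketched on plain data structures. This is not the LLVM pass itself: ModuleMetadata, UnifiedMetadata, and unifyMetadata are hypothetical stand-ins, and the sketch assumes the kernel module comes first in link order.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-module metadata as seen after linking.
struct ModuleMetadata {
  std::vector<std::string> idents;     // llvm.ident strings
  std::pair<int, int> openclVersion;   // (major, minor)
};

struct UnifiedMetadata {
  std::vector<std::string> idents;
  std::pair<int, int> openclVersion{0, 0};
};

// Merge metadata from linked modules: keep each ident string once, in
// first-seen order, and take the OpenCL version of the first module.
UnifiedMetadata unifyMetadata(const std::vector<ModuleMetadata>& modules) {
  UnifiedMetadata out;
  if (!modules.empty()) {
    out.openclVersion = modules.front().openclVersion;
  }
  std::set<std::string> seen;
  for (const ModuleMetadata& m : modules) {
    for (const std::string& id : m.idents) {
      if (seen.insert(id).second) {  // true only on first occurrence
        out.idents.push_back(id);
      }
    }
  }
  return out;
}
```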
SWDEV-1306648 - Add OptimizeLLVMBitcode to the ROCm-OpenCL-Driver. Call the optimizer in roc::HSAILProgram::linkImpl, between linking with the bitcode built-in libraries and code generation. Use the default optimization level, with at a minimum -strip -instcombine -always-inline.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#13 edit
SWDEV-94640 - Set the O# level and cl-std from the command args or default settings.
Select the correct header (CL1.2/CL2.0) from the given cl-std.
Reorder the linked libraries in reverse order of dependency (was failing linking).
Make the built-in library objects depend on the pre-compiled header.
Disable the timestamp on the pre-compiled header.
Disable image support reporting.
Affected files ...
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/headers/build/Makefile.headers#7 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/irif/build/Makefile.irif#4 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/ockl/build/Makefile.ockl#5 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/oclc/build/Makefile.oclc#5 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/ocml/build/Makefile.ocml#5 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/opencl/build/Makefile.opencl#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/roccompiler.cpp#8 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#10 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#11 edit
SWDEV-94637 - Add the logic to select the control function libraries. Add a gfxipVersion field to the DeviceInfo. Generate both CL1.2 and CL2.0 precompiled headers. Set the -mcpu target using the DeviceInfo.machineTarget_.
Affected files ...
... //depot/stg/opencl/drivers/opencl/library/build/Makefile.library#51 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/headers/build/Makefile.headers#2 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/irif/build/Makefile.irif#3 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/ockl/build/Makefile.ockl#4 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/oclc/build/Makefile.oclc#4 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/ocml/build/Makefile.ocml#4 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/opencl/build/Makefile.opencl#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdefs.hpp#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#9 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#10 edit
SWDEV-94610 - Add the control functions to the link (roc::HSAILProgram::linkImpl_LC)
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#9 edit