Init and fini kernels need to be launched when we load and unload a code object. Avoid looping through all kernels within a code object just to run the init and fini kernels. The compiler currently generates only one init kernel and one fini kernel.
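A minimal sketch of the idea, assuming the runtime can look kernels up by name. `CodeObject`, `runIfPresent`, and the two kernel names are hypothetical stand-ins for illustration, not the actual clr types or the names the compiler emits.

```cpp
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a loaded code object: kernel name -> launch callback.
struct CodeObject {
  std::unordered_map<std::string, std::function<void()>> kernels;
};

// Placeholder names (assumption); the real names are chosen by the compiler.
const std::string kInitKernelName = "init_kernel";
const std::string kFiniKernelName = "fini_kernel";

// Launches the named kernel if the code object has one; returns whether it ran.
// A single hash lookup replaces a loop over every kernel in the code object.
bool runIfPresent(const CodeObject& co, const std::string& name) {
  auto it = co.kernels.find(name);
  if (it == co.kernels.end() || !it->second) return false;
  it->second();
  return true;
}

// Called when a code object is loaded and unloaded, respectively.
bool onLoad(const CodeObject& co) { return runIfPresent(co, kInitKernelName); }
bool onUnload(const CodeObject& co) { return runIfPresent(co, kFiniKernelName); }
```

Because the compiler emits at most one init and one fini kernel, a direct lookup per event is enough and no other kernel in the code object is touched.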
[ROCm/clr commit: cd46294b31]
The macro associated with cl_khr_depth_images is defined twice in
the compiler: once in opencl-c.h and once automatically by the compiler,
deduced from the cl-ext list. The two definitions coexist, so there is no
need to remove cl_khr_depth_images from the cl-ext list.
If we remove cl_khr_depth_images from the cl-ext list and do not
include opencl-c.h, the macro is not defined.
This fixes conformance test ./test_compiler compiler_defines_for_extensions
when using Comgr with -include opencl-c-base.h -fdeclare-opencl-builtins
without including opencl-c.h.
Previously, we got the error `ERROR: Supported extension cl_khr_depth_images
not defined in kernel`.
This change is needed to eventually get rid of the opencl-c.pch that is embedded in Comgr, which makes implementing a compilation cache in Comgr hard.
Change-Id: I76497874ebe7163966420d4ac23a0788b93a36fd
[ROCm/clr commit: 8c9e6d0fa5]
This reverts commit e5b6537315ce9b2688ee0269ba0828a703c3e2c9.
The regressions (SWDEV-459556 and SWDEV-460260) caused by the original patch
have been resolved.
Change-Id: I32344492b4ff88bd7e91ea47983ac15636dc77c1
[ROCm/clr commit: b0930263e5]
Previously, we used the following approach and Comgr actions
for device lib linking:
AMD_COMGR_COMPILE_SOURCE_TO_BC (compile with clang driver)
AMD_COMGR_ADD_DEVICE_LIBRARIES (link in device libs with
llvm-link API)
However, the clang driver can link in device libraries as part
of compilation, assuming a --rocm-path is set. In this context,
this is accomplished by using the following Comgr action instead:
AMD_COMGR_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC (compile and
link in device libs with clang driver)
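The difference between the two flows can be sketched as follows. The `Action` enum is a local stand-in that mirrors the action names above; it is not the Comgr header, and `deviceLibPipeline` is a hypothetical helper used only for illustration.

```cpp
#include <vector>

// Local stand-ins for the Comgr action kinds named above (assumption: these
// mirror the names in the text and are not taken from the Comgr header).
enum class Action {
  CompileSourceToBc,                // compile with the clang driver
  AddDeviceLibraries,               // link device libs with the llvm-link API
  CompileSourceWithDeviceLibsToBc,  // compile and link device libs in one step
};

// Returns the sequence of actions needed to produce bitcode that has the
// device libraries linked in.
std::vector<Action> deviceLibPipeline(bool useCombinedAction) {
  if (useCombinedAction) {
    return {Action::CompileSourceWithDeviceLibsToBc};
  }
  return {Action::CompileSourceToBc, Action::AddDeviceLibraries};
}
```

With the combined action, the clang driver handles device library linking itself, so the separate link step disappears from the pipeline.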
Change-Id: Ie0bbee7d9a12672536b6d751056a941128ed58be
[ROCm/clr commit: 6311ed8a8e]
Currently we force inlining of everything for HIP. Now we'd like to enable function
support. The first step is to remove uses of `-amdgpu-early-inline-all` in
various places. This patch removes all of them from clr.
Change-Id: Ib0cad1f586714c9989778b00746aa4c47a4eec95
[ROCm/clr commit: a09204388a]
This is an initial change before we refactor the build/link paths for
kernel launches for HIP. The change is needed because the compiler was
setting a dump file that required file system access, which is slow on
NFS-mounted file systems.
Change-Id: I828f9bb04d789b4f8c05c1ed08767f325efeb47c
[ROCm/clr commit: 3f2f7252aa]
Add trap handler code to the runtime and compile/load it during
device initialization. The current interface for the trap handler in
PAL is obsolete, and a new interface will be provided later.
Change-Id: I1fa702c5d1f2e6731f781369c980d546cf422328
[ROCm/clr commit: e1d34cb24f]
This reverts commit 58e62063f3.
Reason for revert: There are currently some outstanding issues with the COMPILE_SOURCE_WITH_DEVICE_LIBS Comgr action (https://ontrack-internal.amd.com/browse/SWDEV-386072). Once these LLVM issues have been resolved, we can safely re-apply this patch.
Change-Id: I8501967af8496ea50d6e4a97399e45db51bbed1e
[ROCm/clr commit: 19526e46e6]
This is related to SWDEV-410182, but it's not enough to fix it.
Functions from device-libs are precompiled into llvm-ir in a "target agnostic" way
(in reality, it's not 100% target agnostic, which brings us many headaches).
When linking builtins (like device-libs) from the command line, we use the flag
-mlink-builtin-bitcode. The difference between regular linking of bitcode and
this flag is that the latter propagates target-specific attributes. If these
attributes are not propagated, we can end up with inconsistent target attributes.
Comgr provides the action AMD_COMGR_ACTION_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC
for this exact reason. The old action is currently deprecated and this one should
be used.
Change-Id: I518415214debdf4fedf0b1d81456d6e9fb8a3d19
[ROCm/clr commit: f3dc04a50d]
This reverts commit 2e664d2492.
Reason for revert: Performance regressions and failures observed. Need to investigate those before re-applying the patch.
Change-Id: I42ba0605797f9bdcfb5d5102927dd01405cf05e3
[ROCm/clr commit: 8047d8e3e8]
Previously, we used the following approach and Comgr actions
for device lib linking:
AMD_COMGR_COMPILE_SOURCE_TO_BC (compile with clang driver)
AMD_COMGR_ADD_DEVICE_LIBRARIES (link in device libs with
llvm-link API)
However, the clang driver can link in device libraries as part
of compilation, assuming a --rocm-path is set. In this context,
this is accomplished by using the following Comgr action instead:
AMD_COMGR_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC (compile and
link in device libs with clang driver)
Change-Id: I661465865365afecc44aa15d4df91bfab361af8d
[ROCm/clr commit: a4c5c44008]
* Return the result by value rather than through a pointer passed as a
parameter
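The shape of this kind of change can be sketched as follows. `LookupResult`, `lookupOld`, and `lookup` are hypothetical names for illustration only; the commit does not name the function it changed.

```cpp
#include <string>

// Hypothetical result type for illustration.
struct LookupResult {
  bool found;
  std::string value;
};

// Before: the caller supplies storage through a pointer and must check the
// returned flag before reading it.
bool lookupOld(const std::string& key, std::string* out) {
  if (key.empty() || out == nullptr) return false;
  *out = "value-for-" + key;
  return true;
}

// After: the result is returned by value, so there is no null pointer to
// guard against and no partially written output to worry about.
LookupResult lookup(const std::string& key) {
  if (key.empty()) return {false, ""};
  return {true, "value-for-" + key};
}
```

Returning by value keeps the status and the payload together at the call site.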
Change-Id: I8f872c95c4a330bebe299d486fb73f36660b469c
[ROCm/clr commit: 37849a0726]
This reverts commit 8da846db0a.
Reason for revert: Test failures with LuxMark, Blender, and IndigoBench. Need to investigate before re-applying.
Change-Id: I6b08273a8f9c8bcaa4e7a06cd42d15048e52ca2a
[ROCm/clr commit: 5168485d23]
The Comgr ADD_DEVICE_LIBRARIES action has been deprecated. In place
of the previous two-action approach:
AMD_COMGR_COMPILE_SOURCE_TO_BC
AMD_COMGR_ADD_DEVICE_LIBRARIES
We can now use a single combined action:
AMD_COMGR_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC
This new action more closely aligns with how device library
management is done by the clang driver.
Change-Id: Id844e9031a1896dedeacec453440b9babc4b111a
[ROCm/clr commit: 0969056f66]
Weirdly, the `requiredDump` argument to linkLLVMBitcode was used to enable/disable
keeping the temporary bitcode files (those generated by -save-temps=all) after linking.
This patch removes the argument, as there is no obvious benefit to keeping it
(the user would rely only on -save-temps=all to control this).
Change-Id: I0c00486f95eb1d4e296b5247c488407c47f0b2d9
[ROCm/clr commit: 8ab3fd58cf]
The current code generates an _optimized.bc file regardless, so put back the original logic.
Change-Id: I3f84d10934b3e983f5f828af8d0943449a6e1d94
[ROCm/clr commit: 6647882773]
Disable devlib linking when the runtime links multiple objects from
the app. Otherwise devlibs will be linked twice, which may cause
undefined behavior with COv5.
Change-Id: I3b8640c64ff898893225fe3af5b4b4a32d42bf40
[ROCm/clr commit: c275d9b4b3]
Currently Comgr doesn't provide the global variable size, so the runtime
parses the ELF binary directly. Avoid this parsing for HIP, which can save
5% of hipModuleLoad() time.
Change-Id: I47540d1e957bdb0c2406b6b848222de2920b2504
[ROCm/clr commit: 2664d8cf9e]
Metadata in code object version 5 is an extension of CO3 and CO4.
Add detection of the new fields and program them in
the setup of the kernel arguments.
Change-Id: I27e58df77320ad00f4f16d35912668db803826af
[ROCm/clr commit: be6a06384e]
This patch allows substituting a binary for an OpenCL program. It is supposed to be used as follows:
1. Run the OpenCL program with -save-temps.
2. Open the cl temp file and find the following text in the program header:
Hash to override:
Source: 0xd66bcfa20e69e605
Source + clang options: 0x656a9dd8aedcbfb6
3. Create a config file (ASCII text) with one or more pairs:
<hash> <path_to_binary_to_substitute>
where hash is the hex value from step 2 (without the leading 0x). You can use either hash,
depending on what you want to match:
only the source text of the program, or the source together with its clang options.
4. Set the env variable AMD_OCL_SUBST_OBJFILE to the path of your config file.
5. Rerun the OpenCL program.
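To make the config format from step 3 concrete, here is a minimal sketch of how such a file could be read. `parseSubstConfig` is a hypothetical helper, not the runtime's actual parser, and it assumes whitespace-separated fields, so paths containing spaces are not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <exception>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Parses lines of the form "<hash> <path>", where <hash> is a hex value
// without a leading 0x. Blank or malformed lines are skipped.
std::unordered_map<std::uint64_t, std::string> parseSubstConfig(std::istream& in) {
  std::unordered_map<std::uint64_t, std::string> table;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string hashText, path;
    if (!(fields >> hashText >> path)) continue;  // need both fields
    try {
      std::size_t used = 0;
      const std::uint64_t hash = std::stoull(hashText, &used, 16);
      // Accept the entry only if the whole token was valid hex.
      if (used == hashText.size()) table[hash] = path;
    } catch (const std::exception&) {
      // Not a hex number, or out of range: ignore this line.
    }
  }
  return table;
}
```

A config line such as `d66bcfa20e69e605 /tmp/a.bin` (the path is made up for this example) would map the source hash shown in step 2 to that binary.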
Change-Id: I977c80fe529ea14458194918c6ddfbe2de6a8857
[ROCm/clr commit: 51cc9c2f8c]