MI300 does not support image APIs.
Apps should check __HIP_NO_IMAGE_SUPPORT instead of the architecture.
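
A minimal sketch of how an application might branch on the macro rather than on an architecture check. It assumes the HIP headers define __HIP_NO_IMAGE_SUPPORT to a nonzero value on targets without image support; `kHasImageSupport` and `describeImagePath` are hypothetical names used only for illustration.

```cpp
#include <string>

// Assumption: the HIP headers define __HIP_NO_IMAGE_SUPPORT to a nonzero
// value on targets without image support (for example MI300). When the
// macro is not defined at all, image support is assumed to be available.
#if defined(__HIP_NO_IMAGE_SUPPORT) && __HIP_NO_IMAGE_SUPPORT
constexpr bool kHasImageSupport = false;
#else
constexpr bool kHasImageSupport = true;
#endif

// Hypothetical helper: report which code path the application would take.
inline std::string describeImagePath() {
    return kHasImageSupport ? "texture/image path" : "buffer fallback path";
}
```

Keying on the macro keeps the application independent of the list of architectures that lack image support.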
Change-Id: I91178bfd27ea7b7188e7a958a876c0264f4469aa
[ROCm/clr commit: 16c6b365c2]
Using backward compatibility paths now produces a #warning message by default.
Added a compile-time option to enable or disable the #error message.
Updated the backward compatibility message.
Change-Id: I3bab00df26145991b32176d2d76977c2e953bf5f
[ROCm/clr commit: f788150132]
Updating the hip_init lock to use std::call_once fixed performance
drops in TF benchmarks for FP16.
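
A minimal sketch of one-time initialization with std::call_once, not the actual hip_init code. The names `g_init_flag`, `g_init_count`, `do_init`, and `ensure_init` are hypothetical.

```cpp
#include <atomic>
#include <mutex>

namespace sketch {

std::once_flag g_init_flag;        // hypothetical: guards the one-time init
std::atomic<int> g_init_count{0};  // counts how many times init really ran

// Stand-in for the expensive runtime initialization.
inline void do_init() { g_init_count.fetch_add(1); }

// Safe to call from every API entry point: do_init runs exactly once,
// and later calls only check the flag.
inline void ensure_init() { std::call_once(g_init_flag, do_init); }

}  // namespace sketch
```

Calling `sketch::ensure_init()` repeatedly, from one thread or many, leaves `g_init_count` at 1.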
Change-Id: Ib1125ac66806b232057ba183e296ea4d0642d08d
[ROCm/clr commit: 2f83719d12]
With recent upstream changes (D145770), we can now use the
Comgr unbundler without requiring an env field in the supplied
targetID. For users, this is consistent with previous legacy
unbundler behavior.
Change-Id: I5f085b0fa1ad352bbbb282b75367c206b75f279f
[ROCm/clr commit: 443f912c7f]
Not a required change, but it does make dealing with temporary files generated
by Comgr easier.
Change-Id: I9c43138dd2a6c4fea965b57fbce7a087ab2bbd28
[ROCm/clr commit: 1171518b97]
Relates to https://reviews.llvm.org/D150427.
Each printf call populates the buffer with the following data:
1. Control DWord: contains the stream, the format string constness, and the size of the data frame
(see http://gerrit-git.amd.com/c/lightning/ec/device-libs/+/857722 for more info)
2. Hash of the format string if it is constant, otherwise the format string itself
3. Printf arguments (each aligned to an 8-byte boundary)
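
The layout above can be sketched roughly as follows for the constant-format-string case. The bit assignment in `makeControlWord`, the padding of the control word to 8 bytes, and all function names are assumptions made for illustration; the authoritative encoding is in the reviews linked above.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Round a byte count up to the next multiple of 8.
inline std::size_t alignUp8(std::size_t n) {
    return (n + 7) & ~static_cast<std::size_t>(7);
}

// Append raw bytes, then zero-pad so the next item starts 8-byte aligned.
inline void appendAligned(std::vector<std::uint8_t>& buf, const void* data,
                          std::size_t size) {
    const auto* p = static_cast<const std::uint8_t*>(data);
    buf.insert(buf.end(), p, p + size);
    buf.resize(alignUp8(buf.size()), 0);
}

// Hypothetical encoding: bit 0 = stream (0 stdout, 1 stderr),
// bit 1 = format string is constant, bits 2..31 = frame size in bytes.
inline std::uint32_t makeControlWord(bool toStderr, bool constFmt,
                                     std::uint32_t frameBytes) {
    return (toStderr ? 1u : 0u) | (constFmt ? 2u : 0u) | (frameBytes << 2);
}

// Build one frame: control word, 64-bit format-string hash, then arguments.
inline std::vector<std::uint8_t> buildFrame(
    bool toStderr, std::uint64_t fmtHash,
    const std::vector<std::uint32_t>& args) {
    std::vector<std::uint8_t> buf;
    std::uint32_t control = 0;  // placeholder, patched once the size is known
    appendAligned(buf, &control, sizeof(control));
    appendAligned(buf, &fmtHash, sizeof(fmtHash));
    for (std::uint32_t a : args) {
        appendAligned(buf, &a, sizeof(a));
    }
    control = makeControlWord(toStderr, true,
                              static_cast<std::uint32_t>(buf.size()));
    std::memcpy(buf.data(), &control, sizeof(control));
    return buf;
}
```

Under these assumptions, a frame with three 32-bit arguments occupies 40 bytes: 8 for the padded control word, 8 for the hash, and 8 per argument.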
Change-Id: I7e320deb343921b4b4cfaf08a2be2883e0bc1f65
[ROCm/clr commit: 7b6a8f1702]
"FILES" installs files as 644, but we want libraries to be 755, which
we can do with "PROGRAMS".
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I155ed77482839ff6d71f90239a014d239e20f4b8
[ROCm/clr commit: 2cda949920]
Because hipRTC now uses the newer
AMD_COMGR_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC, and now that this
action has been fixed for HIP compilations in Comgr, hipRTC no
longer needs a separate Comgr call to link in the device libs.
Change-Id: Ibf9024cbaaab825584566e8d0b5fce60d7063dd8
[ROCm/clr commit: 283dd8352d]
RUNPATH in libraries will be : $ORIGIN
RUNPATH in binaries will be : $ORIGIN/../lib
Change-Id: I87b6a7d1f58f20499c3a0913d03701ac687d910d
[ROCm/clr commit: 31d1420c54]
HIP_FORCE_DEV_KERNARG=1 creates a device allocation for the kernel argument
segment. The flag is 0 by default.
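
A minimal sketch of how an integer flag like this could be read from the environment with a default of 0. This is not the runtime's own flag handling; `parseIntFlag` and `forceDeviceKernarg` are hypothetical helpers.

```cpp
#include <cstdlib>

// Hypothetical parser: interpret a raw environment string as an integer
// flag, falling back to a default when the variable is unset or empty.
inline int parseIntFlag(const char* raw, int defaultValue) {
    if (raw == nullptr || *raw == '\0') {
        return defaultValue;
    }
    return std::atoi(raw);
}

// Returns true when HIP_FORCE_DEV_KERNARG is set to a nonzero value.
// An unset variable yields the default of 0.
inline bool forceDeviceKernarg() {
    return parseIntFlag(std::getenv("HIP_FORCE_DEV_KERNARG"), 0) != 0;
}
```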
Change-Id: Iaaf5a149f3be8596568878d5d272268baf067c60
[ROCm/clr commit: 5436d362b1]
- Use the regular copy API when the free SDMA engines are exhausted instead
of falling back to compute copy. Falling back to compute hurts performance
for numerous GPU-bound apps.
Change-Id: I75c767eff0b9f5ada324301c5c327fe2c23a9806
[ROCm/clr commit: 60d9a4ebab]
Previously, we used the following approach and Comgr actions
for device lib linking:
AMD_COMGR_COMPILE_SOURCE_TO_BC (compile with clang driver)
AMD_COMGR_ADD_DEVICE_LIBRARIES (link in device libs with
llvm-link API)
However, the clang driver can link in device libraries as part
of compilation, assuming a --rocm-path is set. In this context,
this is accomplished by using the following Comgr action instead:
AMD_COMGR_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC (compile and
link in device libs with clang driver)
Change-Id: I661465865365afecc44aa15d4df91bfab361af8d
[ROCm/clr commit: a4c5c44008]
hipcc and clang++ both have logic to detect the installed hardware
and to automatically select the appropriate AMDGPU target when it is
left unspecified. When the AMDGPU_TARGETS property is initialized with
a set of default values, it results in the addition of an explicit set
of --offload-arch flags being passed. These explicit architecture flags
disable the architecture autodetection in the compiler.
The behaviour that results from fixed defaults makes compiling with CMake
unpleasant: the defaults increase build times for projects unless they are
overridden, because most users do not need to build for all five default
architectures. The fixed defaults are also troublesome for users with
hardware not included in the default set (e.g., gfx1011, gfx1031,
gfx1100).
A possible alternative might be to detect the architecture within
hip-config.cmake rather than running the detection logic on each
compiler invocation. However, this approach is simpler.
Change-Id: I9495d766b7eed03852eb4dc72b0aabe4100bc32c
Signed-off-by: Cordell Bloor <Cordell.Bloor@amd.com>
[ROCm/clr commit: e1bed6f354]
HIPRTC_INIT_API can have nullptr in its arguments, and ClPrint
can crash while printing them.
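
A minimal sketch of null-safe argument formatting for a logging path. It does not reproduce ClPrint or HIPRTC_INIT_API; `safeStr` and `formatApiCall` are hypothetical helpers.

```cpp
#include <string>

// Hypothetical helper: map a possibly-null C string to a printable one.
inline const char* safeStr(const char* s) {
    return s != nullptr ? s : "(null)";
}

// Hypothetical formatter: builds "name(arg)" without dereferencing nullptr.
inline std::string formatApiCall(const char* name, const char* arg) {
    return std::string(safeStr(name)) + "(" + safeStr(arg) + ")";
}
```

For example, `formatApiCall("hiprtcCompileProgram", nullptr)` yields `hiprtcCompileProgram((null))` instead of reading through a null pointer.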
Change-Id: Iecade5c3867196509c8cc0647b9aa24be0960a02
[ROCm/clr commit: c98fad1edc]
Add dstMemory format updating.
Separate format updating for srcMemory and dstMemory.
Change-Id: I1692b92d417bbd742d562679f218ebf8ca532e92
[ROCm/clr commit: 7624a48de9]
The previous implementation using std::copy() resulted in
differences between the in-memory and on-disk representations.
With the updated implementation, we get the same contents.
Change-Id: Iadfae3cd7f7ba99538da2ac4f11f30f5a78260d8
[ROCm/clr commit: b17056cb93]
This change enables VM support in graphs on Windows. It avoids
caching all allocations, at the cost of map/unmap overhead
during memory create/destroy.
Change-Id: I792be00fba099e5e5d3cd44a963e1dfd6976a86d
[ROCm/clr commit: 04b696abee]