Because hipRTC now uses the newer
AMD_COMGR_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC action, and this
action has been fixed for HIP compilations in Comgr, hipRTC no
longer needs a separate Comgr call to link in the device libs.
Change-Id: Ibf9024cbaaab825584566e8d0b5fce60d7063dd8
[ROCm/clr commit: 283dd8352d]
RUNPATH in libraries will be: $ORIGIN
RUNPATH in binaries will be: $ORIGIN/../lib
Change-Id: I87b6a7d1f58f20499c3a0913d03701ac687d910d
[ROCm/clr commit: 31d1420c54]
HIP_FORCE_DEV_KERNARG=1 creates a device allocation for the kernel arg
segment. The flag is 0 by default.
Change-Id: Iaaf5a149f3be8596568878d5d272268baf067c60
[ROCm/clr commit: 5436d362b1]
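A minimal sketch of how a 0/1 environment flag such as HIP_FORCE_DEV_KERNARG could be interpreted, with 0 as the default when the variable is unset. The names parse_bool_flag, KernargPlacement, and choose_kernarg_placement are hypothetical and are not taken from the runtime sources; the sketch only illustrates the "off unless set to 1" behaviour described above.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: interprets a raw environment value as a boolean flag.
// An unset (nullptr) or empty value yields the supplied default.
bool parse_bool_flag(const char* raw, bool default_value) {
  if (raw == nullptr || *raw == '\0') return default_value;
  return std::strtol(raw, nullptr, 10) != 0;
}

// Hypothetical enum: where the kernel argument segment is allocated.
enum class KernargPlacement { Default, DeviceAllocation };

// Hypothetical policy: the flag defaults to 0, so a device allocation is
// chosen only when the named variable is explicitly set to a non-zero value.
KernargPlacement choose_kernarg_placement(const char* flag_name) {
  assert(flag_name != nullptr);
  const bool force_device = parse_bool_flag(std::getenv(flag_name), false);
  return force_device ? KernargPlacement::DeviceAllocation
                      : KernargPlacement::Default;
}
```

Calling choose_kernarg_placement("HIP_FORCE_DEV_KERNARG") in a process started with HIP_FORCE_DEV_KERNARG=1 would select the device allocation in this sketch.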
- Use the regular copy API when the free SDMA engines are exhausted
instead of falling back to compute copy. Falling back to compute hurts
performance for numerous GPU-bound apps.
Change-Id: I75c767eff0b9f5ada324301c5c327fe2c23a9806
[ROCm/clr commit: 60d9a4ebab]
Previously, we used the following approach and Comgr actions
for device lib linking:
  AMD_COMGR_COMPILE_SOURCE_TO_BC (compile with clang driver)
  AMD_COMGR_ADD_DEVICE_LIBRARIES (link in device libs with
                                  llvm-link API)
However, the clang driver can link in device libraries as part
of compilation, assuming a --rocm-path is set. In this context,
this is accomplished by using the following Comgr action instead:
  AMD_COMGR_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC (compile and
                    link in device libs with clang driver)
Change-Id: I661465865365afecc44aa15d4df91bfab361af8d
[ROCm/clr commit: a4c5c44008]
hipcc and clang++ both have logic to detect the installed hardware
and to automatically select the appropriate AMDGPU target when it is
left unspecified. When the AMDGPU_TARGETS property is initialized with
a set of default values, it results in the addition of an explicit set
of --offload-arch flags being passed. These explicit architecture flags
disable the architecture autodetection in the compiler.
The fixed defaults make compiling with CMake unpleasant: unless they
are overridden, they increase build times, because most users do not
need to build for all five default architectures. The fixed defaults
are also troublesome for users with hardware not included in the
default set (e.g., gfx1011, gfx1031, gfx1100).
A possible alternative might be to detect the architecture within
hip-config.cmake rather than running the detection logic on each
compiler invocation. However, this approach is simpler.
Change-Id: I9495d766b7eed03852eb4dc72b0aabe4100bc32c
Signed-off-by: Cordell Bloor <Cordell.Bloor@amd.com>
[ROCm/clr commit: e1bed6f354]
HIPRTC_INIT_API can have nullptr in the arguments, and ClPrint
can crash while printing them.
Change-Id: Iecade5c3867196509c8cc0647b9aa24be0960a02
[ROCm/clr commit: c98fad1edc]
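The nullptr crash described above is a general hazard when logging API arguments. A minimal sketch of a null-safe formatting approach follows; the helpers printable and printable_value are hypothetical and do not reflect how HIPRTC_INIT_API or ClPrint are actually implemented.

```cpp
#include <string>

// Hypothetical helper: returns a loggable string for a possibly-null
// C string instead of handing a null pointer to a "%s" format.
std::string printable(const char* s) {
  return s != nullptr ? std::string(s) : std::string("<null>");
}

// Hypothetical helper: formats the pointed-to arithmetic value, or a
// placeholder when the pointer itself is null, so logging never
// dereferences nullptr.
template <typename T>
std::string printable_value(const T* p) {
  return p != nullptr ? std::to_string(*p) : std::string("<null>");
}
```

With helpers of this kind, a trace line can be built from every argument unconditionally, since a null argument is rendered as a placeholder rather than dereferenced.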
Add dstMemory format updating.
Separate format updating for srcMemory and dstMemory.
Change-Id: I1692b92d417bbd742d562679f218ebf8ca532e92
[ROCm/clr commit: 7624a48de9]
The previous implementation using std::copy() resulted in
differences between the in-memory and on-disk representations.
With the updated implementation, we get the same contents.
Change-Id: Iadfae3cd7f7ba99538da2ac4f11f30f5a78260d8
[ROCm/clr commit: b17056cb93]
This change enables VM support in graphs on Windows. It avoids
caching all allocations, at the cost of map/unmap overhead during
memory create/destroy.
Change-Id: I792be00fba099e5e5d3cd44a963e1dfd6976a86d
[ROCm/clr commit: 04b696abee]
The hipStreamPerThrdCompilerOptn.cc test fails to build with
cudaStreamGetCaptureInfo_v2 in CUDA 12.0.
The fix is to replace the runtime API cudaStreamGetCaptureInfo_v2
with the driver API cuStreamGetCaptureInfo_v2.
Change-Id: I44a0110770d3246f5345092acae301c9a2f6d520
[ROCm/clr commit: 0aa70ee0e1]
- Introduce a state variable that indicates whether HwProfiling is
enabled, to eliminate a possible data race on the signals_ vector.
Change-Id: Id504cc76d7fa9f7e6455587dd232b60ccbbb735b
[ROCm/clr commit: afa28cdf44]
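One way to read the state-variable change above: rather than inspecting a shared vector to infer whether profiling is on, readers consult a dedicated flag. The sketch below assumes that interpretation. The class SignalTracker and its members are hypothetical and are not the runtime's actual types.

```cpp
#include <atomic>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical sketch: an atomic flag records whether profiling is enabled,
// so callers never have to inspect signals_ itself to find out.
class SignalTracker {
 public:
  // Populates the signal list under the lock, then publishes the state.
  void enable_profiling(std::size_t signal_count) {
    std::lock_guard<std::mutex> lock(mutex_);
    signals_.assign(signal_count, 0);
    profiling_enabled_.store(true, std::memory_order_release);
  }

  // Lock-free query that does not touch the vector.
  bool profiling_enabled() const {
    return profiling_enabled_.load(std::memory_order_acquire);
  }

  // Reads the vector only when profiling is on, and only under the lock.
  std::size_t signal_count() const {
    if (!profiling_enabled()) return 0;
    std::lock_guard<std::mutex> lock(mutex_);
    return signals_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::vector<int> signals_;  // stand-in for the real signal objects
  std::atomic<bool> profiling_enabled_{false};
};
```

In this sketch the flag is the only thing checked on the fast path, so a thread that merely wants to know the profiling state never races with a thread that is resizing the vector.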
- Correct the error for hipStreamWaitEvent when the event was recorded
before capture
- Correct hipEventSync when the event is synced during capture
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: I7ecbed5621eaf323846d4ccb20ec112aaa8a5757
[ROCm/clr commit: 544318fffe]