Previously, device lib linking used the following two Comgr actions:
  AMD_COMGR_COMPILE_SOURCE_TO_BC (compile with the clang driver)
  AMD_COMGR_ADD_DEVICE_LIBRARIES (link in device libs with the
  llvm-link API)
However, the clang driver can link in device libraries as part
of compilation, provided a --rocm-path is set. In this context,
that is done with a single Comgr action instead:
  AMD_COMGR_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC (compile and
  link in device libs with the clang driver)
Change-Id: I661465865365afecc44aa15d4df91bfab361af8d
[ROCm/clr commit: a4c5c44008]
hipcc and clang++ both have logic to detect the installed hardware
and to automatically select the appropriate AMDGPU target when it is
left unspecified. When the AMDGPU_TARGETS property is initialized with
a set of default values, an explicit set of --offload-arch flags is
passed to the compiler. These explicit architecture flags disable the
compiler's architecture autodetection.
Fixed defaults make compiling with CMake unpleasant: unless they are
overridden, they increase build times, because most users do not need
to build for all five default architectures. The fixed defaults are
also troublesome for users with hardware not included in the default
set (e.g., gfx1011, gfx1031, gfx1100).
A possible alternative would be to detect the architecture once within
hip-config.cmake rather than running the detection logic on each
compiler invocation. However, the approach taken here is simpler.
Change-Id: I9495d766b7eed03852eb4dc72b0aabe4100bc32c
Signed-off-by: Cordell Bloor <Cordell.Bloor@amd.com>
[ROCm/clr commit: e1bed6f354]
HIPRTC_INIT_API can receive nullptr arguments, and ClPrint
can crash while printing them.
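The failure mode can be illustrated with a minimal sketch of null-safe
argument formatting. The helpers formatString, formatPointer and
traceCall are hypothetical names invented for this illustration; they
are not the clr logging code and do not describe how the actual fix
was implemented.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not part of clr): render a C-string argument for a
// trace message, substituting a placeholder instead of dereferencing null.
std::string formatString(const char* s) {
  return s != nullptr ? std::string(s) : std::string("<null>");
}

// Hypothetical helper: render any pointer argument without dereferencing it.
std::string formatPointer(const void* p) {
  if (p == nullptr) return "nullptr";
  std::ostringstream os;
  os << p;  // address is printed in an implementation-defined format
  return os.str();
}

// Build one trace line for a call taking a name and an output pointer.
std::string traceCall(const char* api, const char* name, const void* out) {
  return formatString(api) + "(" + formatString(name) + ", " +
         formatPointer(out) + ")";
}
```

With these helpers, a call traced with null arguments produces a
readable line rather than passing a null char pointer to a formatter.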
Change-Id: Iecade5c3867196509c8cc0647b9aa24be0960a02
[ROCm/clr commit: c98fad1edc]
Add dstMemory format updating.
Separate format updating for srcMemory and dstMemory.
Change-Id: I1692b92d417bbd742d562679f218ebf8ca532e92
[ROCm/clr commit: 7624a48de9]
The previous implementation using std::copy() resulted in
differences between the in-memory and on-disk representations.
With the updated implementation, we get the same contents.
Change-Id: Iadfae3cd7f7ba99538da2ac4f11f30f5a78260d8
[ROCm/clr commit: b17056cb93]
This change enables VM support in graphs on Windows. That avoids
caching all allocations, at the cost of map/unmap overhead during
memory create/destroy.
Change-Id: I792be00fba099e5e5d3cd44a963e1dfd6976a86d
[ROCm/clr commit: 04b696abee]
The hipStreamPerThrdCompilerOptn.cc test fails to build with
cudaStreamGetCaptureInfo_v2 in CUDA 12.0.
The fix is to replace the runtime API cudaStreamGetCaptureInfo_v2
with the driver API cuStreamGetCaptureInfo_v2.
Change-Id: I44a0110770d3246f5345092acae301c9a2f6d520
[ROCm/clr commit: 0aa70ee0e1]
- Introduce a state variable that indicates whether HwProfiling is
enabled, to eliminate a possible data race on the signals_ vector.
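A minimal sketch of the idea, assuming the race came from one thread
inspecting the vector while another appended to it. The class
ProfilingState and its members are hypothetical and only model the
pattern; they are not the clr implementation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical model: readers ask "is HW profiling enabled?" through an
// atomic flag instead of inspecting the signals_ vector directly.
class ProfilingState {
 public:
  // Writer path: append under the lock, then publish the flag.
  void addSignal(int signal) {
    std::lock_guard<std::mutex> lock(mutex_);
    signals_.push_back(signal);
    hwProfiling_.store(true, std::memory_order_release);
  }

  // Reader path: never touches signals_, so it cannot race with push_back.
  bool hwProfilingEnabled() const {
    return hwProfiling_.load(std::memory_order_acquire);
  }

  // Access to the vector itself still goes through the lock.
  std::size_t signalCount() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return signals_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::vector<int> signals_;
  std::atomic<bool> hwProfiling_{false};
};
```

The design choice here is that the hot-path query reads a single atomic
and needs no lock, while any code that walks the vector still
serializes on the mutex.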
Change-Id: Id504cc76d7fa9f7e6455587dd232b60ccbbb735b
[ROCm/clr commit: afa28cdf44]
- Correct the error returned by hipStreamWaitEvent when the event was
recorded before capture
- Correct hipEventSync when the event is synced during capture
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: I7ecbed5621eaf323846d4ccb20ec112aaa8a5757
[ROCm/clr commit: 544318fffe]
- The mempool graphs implementation requires refcounting the VA object,
so release() must update the map only on actual destruction.
- Add GPU event tracking for paging operations. Otherwise, the runtime
may not always flush the IB.
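The refcounting rule in the first item can be sketched as follows. The
VaRegistry class is a hypothetical stand-in for the runtime's VA
bookkeeping, not the actual clr code; it only shows a release() that
erases the map entry when the last reference is dropped.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical registry mapping a virtual address to its reference count.
class VaRegistry {
 public:
  // Register a new VA object with an initial reference.
  void create(std::uintptr_t va) { refs_[va] = 1; }

  // Take an extra reference; returns false if the VA is unknown.
  bool retain(std::uintptr_t va) {
    auto it = refs_.find(va);
    if (it == refs_.end()) return false;
    ++it->second;
    return true;
  }

  // Drop one reference. The map is updated only on actual destruction,
  // i.e. when the count reaches zero. Returns true in that case.
  bool release(std::uintptr_t va) {
    auto it = refs_.find(va);
    if (it == refs_.end()) return false;
    if (--it->second > 0) return false;  // still referenced: keep the entry
    refs_.erase(it);
    return true;
  }

  bool contains(std::uintptr_t va) const { return refs_.count(va) != 0; }

 private:
  std::unordered_map<std::uintptr_t, int> refs_;
};
```

A VA that has been retained once survives the first release() and
disappears from the map only on the second.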
Change-Id: Idf99ffb894321a38e04b490116a7ca435635918d
[ROCm/clr commit: 7ef2da5aba]
Rename the VK interop object to ExternalMemory, since it should also
handle DX interop.
Change-Id: I536ec46d3e53ece35234a2e29030393ad411b96d
[ROCm/clr commit: 3e5803c4c0]
GraphMemcpyNodeSetParamsFrom/ToSymbol APIs need to check that the device
id of the original src/dst matches the one passed in when setting params.
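A minimal sketch of such a check, under the assumption that a mismatch
is rejected and the node is left unchanged. The types SymbolCopyParams,
MemcpySymbolNode and the Status values are hypothetical; the error code
the runtime actually returns is not stated here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical status codes for this illustration only.
enum class Status { Success, InvalidValue };

// Hypothetical parameter bundle for a symbol memcpy node.
struct SymbolCopyParams {
  const void* symbol;
  std::size_t sizeBytes;
  int deviceId;  // device that owns the src/dst memory
};

// Hypothetical graph node holding the parameters it was created with.
class MemcpySymbolNode {
 public:
  explicit MemcpySymbolNode(const SymbolCopyParams& p) : params_(p) {}

  // Accept new parameters only if they target the same device as the
  // original ones; otherwise reject and keep the existing parameters.
  Status setParams(const SymbolCopyParams& p) {
    if (p.deviceId != params_.deviceId) return Status::InvalidValue;
    params_ = p;
    return Status::Success;
  }

  const SymbolCopyParams& params() const { return params_; }

 private:
  SymbolCopyParams params_;
};
```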
Change-Id: If0b610808223dce9115562bb5e9b31c8eaa2df22
[ROCm/clr commit: b6aa27d4a3]