This is related to SWDEV-410182, but it is not enough to fix it on its own.
Functions from device-libs are precompiled into LLVM IR in a "target agnostic" way
(in reality, it's not 100% target agnostic, which brings us many headaches).
When linking builtins (such as device-libs) from the command line, we use the
-mlink-builtin-bitcode flag. Unlike regular bitcode linking, this flag
propagates target-specific attributes. If these attributes are not propagated,
we can end up with inconsistent target attributes.
Comgr provides the action AMD_COMGR_ACTION_COMPILE_SOURCE_WITH_DEVICE_LIBS_TO_BC
for exactly this reason. The old action is now deprecated, and this one should
be used instead.
Change-Id: I518415214debdf4fedf0b1d81456d6e9fb8a3d19
Use a large signal pool if a profiler is connected or profiling is force-enabled.
This mitigates signal creation overhead during profiling: signals are attached
to every packet, so deeper batches can expose the cost of signal allocation.
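A minimal sketch of the idea, assuming a simple free-list pool: the pool is sized up front from the profiling state, so a deep batch that attaches a signal to every packet rarely falls back to a fresh allocation. `Signal`, `SignalPool`, `chooseSignalPoolSize` and both size constants are hypothetical stand-ins, not the runtime's actual names or values.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sizes: the commit does not state the real pool sizes.
constexpr std::size_t kDefaultSignalPoolSize = 32;
constexpr std::size_t kLargeSignalPoolSize = 4096;

// Stand-in for an HSA completion signal.
struct Signal {
  long value = 0;
};

// Choose the pool size once, when the queue is created.
std::size_t chooseSignalPoolSize(bool profilerConnected, bool profilingForced) {
  return (profilerConnected || profilingForced) ? kLargeSignalPoolSize
                                                : kDefaultSignalPoolSize;
}

// Preallocates `capacity` signals; acquire() falls back to a fresh
// (slow) allocation only when the pool is exhausted.
class SignalPool {
 public:
  explicit SignalPool(std::size_t capacity) {
    free_.reserve(capacity);
    for (std::size_t i = 0; i < capacity; ++i) {
      free_.push_back(std::make_unique<Signal>());
    }
  }

  std::unique_ptr<Signal> acquire() {
    if (free_.empty()) {
      ++slowAllocations_;  // the overhead a larger pool is meant to avoid
      return std::make_unique<Signal>();
    }
    std::unique_ptr<Signal> s = std::move(free_.back());
    free_.pop_back();
    return s;
  }

  void release(std::unique_ptr<Signal> s) { free_.push_back(std::move(s)); }

  std::size_t slowAllocations() const { return slowAllocations_; }

 private:
  std::vector<std::unique_ptr<Signal>> free_;
  std::size_t slowAllocations_ = 0;
};
```

With a batch of 40 packets, the default-sized pool in this sketch falls back to 8 slow allocations, while the large pool serves every packet from preallocated signals.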
Change-Id: I8034b8a20b55328b87d593bf044f59672f9653e8
This reverts commit 44a3935cda.
Implement the correct way to ensure ExternalSemaphores are signalled
only after prior work on the stream has finished.
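The ordering requirement can be sketched with an in-order stream model: instead of signalling on the host immediately, the signal is enqueued as a stream command, so it runs only after everything submitted before it. `Stream`, `ExternalSemaphore` and `signalExternalSemaphore` are hypothetical stand-ins; this illustrates the ordering guarantee, not how the runtime actually implements it.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical in-order stream: commands execute strictly in submission order.
class Stream {
 public:
  void enqueue(std::function<void()> cmd) { pending_.push(std::move(cmd)); }

  // Drain pending commands in FIFO order (stands in for device execution).
  void run() {
    while (!pending_.empty()) {
      std::function<void()> cmd = std::move(pending_.front());
      pending_.pop();
      cmd();
      ++completed_;
    }
  }

  std::size_t completed() const { return completed_; }

 private:
  std::queue<std::function<void()>> pending_;
  std::size_t completed_ = 0;
};

// Hypothetical semaphore that records how much work had finished when it fired.
struct ExternalSemaphore {
  bool signalled = false;
  std::size_t completedBeforeSignal = 0;
};

// Enqueue the signal behind prior work rather than signalling on the host now.
void signalExternalSemaphore(Stream& stream, ExternalSemaphore& sem) {
  stream.enqueue([&stream, &sem] {
    sem.signalled = true;
    sem.completedBeforeSignal = stream.completed();
  });
}
```

If two commands are enqueued before the signal, the semaphore stays unsignalled until the stream runs, and then fires with both commands already complete.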
Change-Id: I9d5974e05d5f229170b928db4566c14e40e3cbaa
- Program a unique AQL index for the debugger. The logic manages an array of AQL packets per HW queue.
- Provide debug state to PAL
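One common way to get an index that stays unique even though ring slots are reused is to keep a monotonically increasing 64-bit write index and derive the slot from it. The sketch below assumes that scheme; `AqlRing`, `AqlPacket::debugIndex` and the power-of-two ring size are hypothetical, and a real queue would also wait on the read index before reusing a slot, which is omitted here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for an AQL packet; only the field the debugger needs is modelled.
struct AqlPacket {
  std::uint64_t debugIndex = 0;
};

// Hypothetical per-HW-queue ring of AQL packets. The 64-bit write index only
// grows, so it is unique per queue even after slots wrap around.
// Assumes sizePow2 is a power of two so the mask selects the slot.
class AqlRing {
 public:
  explicit AqlRing(std::size_t sizePow2) : packets_(sizePow2) {}

  // Reserve the next slot and stamp the packet with its unique index.
  std::uint64_t submit() {
    const std::uint64_t index = writeIndex_++;
    packets_[index & (packets_.size() - 1)].debugIndex = index;
    return index;
  }

  const AqlPacket& slot(std::uint64_t index) const {
    return packets_[index & (packets_.size() - 1)];
  }

 private:
  std::vector<AqlPacket> packets_;
  std::uint64_t writeIndex_ = 0;
};
```

With a 4-entry ring, the fifth submission reuses slot 0 but carries index 4, so the debugger can still tell it apart from the first packet.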
Change-Id: I38fa1f5435fa711fd1d44dc391f2e61eb2a25efa
This patch did not consider the discussions in SWDEV-270908
> "we found that in GeekBench5, forcing Wave64 instead of the default
> Wave32 compute policy yields big gains in every subtest except one"
This reverts commit d6dc82b220.
Change-Id: Ice1728585b9d1b2c1b36a06cfa0b8c47cb2bfa49
Add a view bit to avoid destroying the original resource when no parent
dependency exists with the image view cache.
Change-Id: I8277afd575af8f29951c5d1a9f7d94d784251657
Make sure the parent_ field is cleared for internal image views.
Internal image views don't require dependency tracking.
The issue appeared only when the Navi10 pitch workaround was enabled.
Change-Id: I376d212750085a9391f8c32fc2979dcb5d93c89c
- Enable CUs in adjacent pairs for WGP mode
- In HostQueue::terminate(), do not segfault if the virtual device hasn't been created
Change-Id: I94402ff333308af5824878086cc238b3993d534d
- Rename HIP_USE_SDMA_QUERY to DEBUG_CLR_USE_SDMA_QUERY, since it is
intended as a temporary env var for debug purposes only.
Change-Id: If6ebd52ab87624375a3df24ceccdcc05c60a65af
The blit manager requires image views to reduce the number
of copy kernels. Creating and destroying a view in ROCr is
expensive, so the runtime can cache views for fast access.
Change-Id: Ia67d775b481cc8326d91215ca22d4a73c1dddb59
- Remove the large BAR memcpy path. Since we end up waiting for a barrier,
it defeats the true intent of the copy. Also, memcpy over PCIe/XGMI
introduces performance variability in HPC apps such as GROMACS.
Change-Id: I3b5c9d9ce93333959c39023bf4f703e2ccb6e3af
rocclr/platform/external_memory.hpp:93:30: warning: class with destructor marked 'final' cannot be inherited from [-Wfinal-dtor-non-final-class]
93 | virtual ~ExternalImage() final {}
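The warning fires because a class whose destructor is final cannot usefully be derived from: any derived class would need its own destructor, which would override the final one. A minimal sketch of two ways to silence it, using hypothetical stand-in classes (including a hypothetical base); this is not necessarily the change this commit applied.

```cpp
#include <type_traits>

// Hypothetical base with a virtual destructor.
class ExternalMemoryBase {
 public:
  virtual ~ExternalMemoryBase() = default;
};

// Option A: keep the destructor as written and make the class itself final,
// so the final destructor is consistent with the class.
class ExternalImageA final : public ExternalMemoryBase {
 public:
  ~ExternalImageA() final {}
};

// Option B: keep the class open for inheritance and use override instead
// of final on the destructor.
class ExternalImageB : public ExternalMemoryBase {
 public:
  ~ExternalImageB() override {}
};

// Deriving from B is legal; deriving from A would be a compile error.
class DerivedImage : public ExternalImageB {};

static_assert(std::is_final<ExternalImageA>::value, "A cannot be derived from");
static_assert(!std::is_final<ExternalImageB>::value, "B can be derived from");
```

Option A matches the apparent intent of the original `final` destructor; option B is the choice if subclasses of the image class are expected.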
Change-Id: I56d760fa6c08544100e3bc03d35129bd16d8a428