Changed variable assignments to use std::move() where appropriate.
Changed function headers to pass string arguments by reference where appropriate.
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
ROCr initially had a bug where memory allocations that were not 4K
aligned were internally 4K aligned but ROCr would not keep track
of user-requested size. This would cause some pointer_info queries
to fail, but HIP was already aligning the buffer sizes for IPC
requests. For backward compatibility accross 2 minor versions,
we allowed IPC look-ups to be both aligned and un-aligned.
Removing this check as this 4 minor versions have been released
since then.
gfx9-generic cannot support sramecc- and sramecc+.
sramecc feature is only configurable on gfx906.
The code object produced for gfx9-generic can be loaded on both
gfx906 with any sramecc setting, compiler will produce the isa
that will correctly work on both (EF_AMDGPU_FEATURE_SRAMECC_ANY_V4).
gfx9-generic cannot support sramecc- and sramecc+.
sramecc feature is only configurable on gfx906.
The code object produced for gfx9-generic can be loaded on both
gfx906 with any sramecc setting, compiler will produce the isa
that will correctly work on both (EF_AMDGPU_FEATURE_SRAMECC_ANY_V4).
Invalidate only the address range that covers the newly copied
code-object. This avoids invalidating I$ for old code objects and thus
might increase I$ hit rate.
When compiling with -O0, some compilers generate a xchg instruction for
the __atomic_store(...) built-in. Using xchg on MMIO memory is
undefined-behavior and may be ignored on certain CPUs.
For native and DTIF backends, unify to use HSAKMT_CALL(...) to call
hsaKmt APIs.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
Using HSA_ENABLE_DTIF to control dtif/native thunk code path
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
When copying for inter devices, Currently only XGMI as exposed. Now
SDMA0/1 will be exposed as well for inter device copies especially that
they are one of the recommended engines.
Signed-off-by: Li Ma <li.ma@amd.com>
Adds a bool to the GPU agent and a public member method to
check if the GPU supports large BAR. This is needed so we can
check if large BAR is supported when a user tries to allocate
an AQL queue in device memory on a given GPU agent.
Also adds an exception to the AQL queue if device-side AQL queues
are requested and the GPU owner of the AQL doesn't support large
BAR. Otherwise, ROCr will currently allow device-side queues
that can cause faults when the user tries to touch their ring
buffers and the user will not know why the faults are occuring.
This relies on the fact that the KFD does not exposed any links
from the CPU to the GPU if large BAR is not enabled (though
links from the GPU to the CPU may still be exposed by the KFD).
This builds on a prior change that allowed for allocating
a user-mode queue's packet buffer in device memory to also
allocate the queue struct in device memory. This provides
additional latency benefits particularly for cases where
dispatches are performed from the GPU itself. Flags are
added to support the various use cases.
This patch will adds recommended sdma supports with
limited XGMI SDMA engine. It will use one PCIe SDMA
to do gpu <-> gpu copies which will help improve all
to all copy performance.
Signed-off-by: Shane Xiao <shane.xiao@amd.com>