Adds a bool to the GPU agent and a public member method to
check if the GPU supports large BAR. This is needed so we can
check if large BAR is supported when a user tries to allocate
an AQL queue in device memory on a given GPU agent.
Also adds an exception to the AQL queue if device-side AQL queues
are requested and the GPU owner of the AQL doesn't support large
BAR. Otherwise, ROCr will currently allow device-side queues
that can cause faults when the user tries to touch their ring
buffers and the user will not know why the faults are occuring.
This relies on the fact that the KFD does not exposed any links
from the CPU to the GPU if large BAR is not enabled (though
links from the GPU to the CPU may still be exposed by the KFD).
[ROCm/ROCR-Runtime commit: f2c482d923]
This builds on a prior change that allowed for allocating
a user-mode queue's packet buffer in device memory to also
allocate the queue struct in device memory. This provides
additional latency benefits particularly for cases where
dispatches are performed from the GPU itself. Flags are
added to support the various use cases.
[ROCm/ROCR-Runtime commit: 6e3c375bf1]
- Add a new AMD extension API to return preferred SDMA engine mask.
This can use used in conjunction with copy_on_engine API to get
optimal bandwidth.
[ROCm/ROCR-Runtime commit: 57c0c643ce]
This patch will adds recommended sdma supports with
limited XGMI SDMA engine. It will use one PCIe SDMA
to do gpu <-> gpu copies which will help improve all
to all copy performance.
Signed-off-by: Shane Xiao <shane.xiao@amd.com>
[ROCm/ROCR-Runtime commit: 6a63170b38]
Use spaces consistently to format the trap handler code. This patch
does not introduce any change in the trap handler. Using `git show -w`
on this patch shows an empty diff.
Change-Id: Ic0244dd203347146ffde65460cd87ecbcc43732a
[ROCm/ROCR-Runtime commit: e0359e5d35]
The trap handler should read the PERF_SNAPSHOT_DATA after all of
PERF_SNAPSHOT_DATA, PERF_SNAPSHOT_PC_LO and PERF_SNAPSHOT_PC_HI. This
patch fixes this.
Change-Id: I7f78e16d7a0d8bfebb34906b4dff73c2eaeb5658
[ROCm/ROCR-Runtime commit: 6a4785f650]
Make sure to clear the HOST_TRAP and PERF_SNAPSHOT bits before returning
from the second level trap handler. As those bits are sticky, this
ensures future re-entry to the trap handler (for context save for
example) will not be confused with a sampling trap.
Change-Id: I05e5e58779a650b324ac6e30d574dc6931340f13
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
[ROCm/ROCR-Runtime commit: eece210a5c]
Adding a general stage for agents to release their resources on
shutdown. This avoids a circular dependency during shutdown because
we have to delete allocated resources before deleting memory pools, but
we also have to delete memory pools before destroying agents.
[ROCm/ROCR-Runtime commit: 947391deac]
The initial call to Refresh() in the constructor is
unnecessary as it's handled in Runtime::Load().
Signed-off-by: lyndonli <Lyndon.Li@amd.com>
[ROCm/ROCR-Runtime commit: c34a2798ce]
In PcSamplingCreateFromId, convert number of bytes into number of
dwords because DmaFill expects a count of 32-bit words, not raw bytes.
This prevents OOB writes on large sampling buffers.
[ROCm/ROCR-Runtime commit: 2ae70735e8]
The over arching goal it so provide an API that pre-silicon models can latch into for software bring up.# Please enter the commit message for your changes. Lines starting
[ROCm/ROCR-Runtime commit: d4b85b6bf5]
Poll the dependent signals twice on all gfx9.0 GPUs except gfx90a.
This is needed as a work-around for a rare issue where SDMA_POLL_REGMEM
may return before the memory is actually cleared.
[ROCm/ROCR-Runtime commit: 6903a41b1d]
Resets event_age when signals move. Prior to this PR, event_age
can become unaligned with hsa_event, causing hangs if the event_age
exceeds the true hsa_event age.
[ROCm/ROCR-Runtime commit: d2a89a467b]
Remove hard assertions for signal validation on hsa_amd_signal_wait_* operations, instead ignore 0/NULL/invalid signals in the dependency condition evaluation to align with HSA specs for barrier-AND and barrier-OR packets.
Signed-off-by: zichguan-amd <zichuan.guan@amd.com>
[ROCm/ROCR-Runtime commit: e4d027191c]
The scratch_backing_memory_byte_size is not used by CP, but it is
currently used by rocgdb. Putting the field back, but we need to find a
solution for alt_scratch_backing_memory_byte_size.
Also, completely disabling alternate scratch as we need some changes to
support debugger.
[ROCm/ROCR-Runtime commit: 02b38d0614]
We cannot guarrantee system-scope coherency on systems with only PCIe
connections, so do not expose extended fine-grain memory pool on these
systems.
[ROCm/ROCR-Runtime commit: 6dac90c89a]
Check for RLIMIT_CORE before collecting data for coredump. If the
current limit is 0, then we can return early without spending time
collecting coredump data.
[ROCm/ROCR-Runtime commit: d031af9eb5]
A HSA_IMAGE_ENABLE_3D_SWIZZLE_DEBUG environment flag exists already to
enable/disable this. Default value is false (view3dAs2dArray = 1)
Enabling this flag will enable support for swizzles that do 3D
interleaving on GFX9, GF10 and GFX11. By default support for swizzles that
do 3D interleaving is disabled.
[ROCm/ROCR-Runtime commit: 0984a1f0fd]
Use the core Driver object in the CPU agent to make it OS/driver
agnostic.
Implement the GetMemoryProperties() and GetCacheProperties methods
for the KFD driver.
[ROCm/ROCR-Runtime commit: a9f6bc8d0e]
Add support for these 2 new queries:
- HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_MAX
Maximum amount of scratch memory allowed on this agent
- HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_CURRENT
Current limit for scratch memory on this agent
[ROCm/ROCR-Runtime commit: 107b48fb15]
Updating ROCr code to match new handshake protocol with CP FW for
asynchronous scratch reclaim.
Increase previous limits when scratch reclaim feature is available.
[ROCm/ROCR-Runtime commit: aa2f98e6f9]