1371 Commits

Author SHA1 Message Date
David Yat Sin b48b401a09 rocr: Fix logic for scratch reclaim
Fix logic error that can cause scratch memory to be reclaimed while a
dispatch is still using it.


[ROCm/ROCR-Runtime commit: 4ed5950beb]
2025-04-29 17:23:45 -04:00
Tony Gutierrez ce61e3301b rocr: Add large_bar_enabled var to the GPU agent
Adds a bool to the GPU agent and a public member method to
check if the GPU supports large BAR. This is needed so we can
check if large BAR is supported when a user tries to allocate
an AQL queue in device memory on a given GPU agent.

Also adds an exception to the AQL queue if device-side AQL queues
are requested and the GPU owner of the AQL doesn't support large
BAR. Otherwise, ROCr will currently allow device-side queues
that can cause faults when the user tries to touch their ring
buffers and the user will not know why the faults are occurring.

This relies on the fact that the KFD does not expose any links
from the CPU to the GPU if large BAR is not enabled (though
links from the GPU to the CPU may still be exposed by the KFD).


[ROCm/ROCR-Runtime commit: f2c482d923]
2025-04-23 15:53:29 -04:00
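The guard described in the large BAR commit above can be sketched roughly as follows. This is a minimal illustration, not the ROCr code: `GpuAgent`, `QueueCreationError`, and `create_aql_queue` are hypothetical names, and the real runtime is written in C++.

```python
class QueueCreationError(Exception):
    """Hypothetical error type for this sketch."""


class GpuAgent:
    """Hypothetical stand-in for a GPU agent; not the ROCr class."""

    def __init__(self, name, large_bar_enabled):
        self.name = name
        self._large_bar_enabled = large_bar_enabled

    def large_bar_enabled(self):
        # Public accessor, mirroring the member method the commit describes.
        return self._large_bar_enabled


def create_aql_queue(agent, in_device_memory):
    # Reject device-side queues up front when the CPU cannot reach the
    # ring buffer, instead of letting the user fault on first access.
    if in_device_memory and not agent.large_bar_enabled():
        raise QueueCreationError(
            f"{agent.name}: device-memory AQL queue requires large BAR"
        )
    return {"agent": agent.name, "device_memory": in_device_memory}
```

Failing at queue creation gives the caller an explicit reason, which is the point of the change: the fault no longer appears later with no obvious cause.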
Tony Gutierrez 6f37386eb2 rocr: Flags to alloc queue buf/struct in dev mem
This builds on a prior change that allowed for allocating
a user-mode queue's packet buffer in device memory to also
allocate the queue struct in device memory. This provides
additional latency benefits particularly for cases where
dispatches are performed from the GPU itself. Flags are
added to support the various use cases.


[ROCm/ROCR-Runtime commit: 6e3c375bf1]
2025-04-23 15:53:29 -04:00
Tony Gutierrez 18404ba8a8 rocr: Remove empty shared.cpp
[ROCm/ROCR-Runtime commit: 11d1d2cd25]
2025-04-23 15:53:29 -04:00
Tony Gutierrez 3ebcf3020f rocr/libhsakmt: Add coarse-grain allocator to GPU
[ROCm/ROCR-Runtime commit: adbc0495e2]
2025-04-23 15:53:29 -04:00
Saleel Kudchadker 945d6da90b rocr: return preferred SDMA engine mask
- Add a new AMD extension API to return preferred SDMA engine mask.
This can be used in conjunction with the copy_on_engine API to get
optimal bandwidth.


[ROCm/ROCR-Runtime commit: 57c0c643ce]
2025-04-22 13:28:38 -07:00
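As a rough sketch of how a caller might consume such a mask, assuming it carries one bit per SDMA engine and that the copy call takes a single-bit engine flag. The helper names are hypothetical and are not part of the HSA API.

```python
def engines_in_mask(mask):
    """Return the indices of the set bits in the mask, lowest first."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


def pick_engine_flag(mask):
    """Return a one-bit flag for the lowest set bit, or 0 for an empty mask."""
    # mask & -mask isolates the lowest set bit of a non-negative integer.
    return mask & -mask
```

A caller could pass the result of `pick_engine_flag` to the copy call, or spread several copies across the indices returned by `engines_in_mask`.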
Yiannis Papadopoulos 8246b54f1e rocr/aie: Remove redundant cache flushes for already loaded PDIs
[ROCm/ROCR-Runtime commit: 7c8fa87160]
2025-04-17 09:48:41 -05:00
Shane Xiao 8d34f4e12d rocr: Add rec sdma engines with limited XGMI SDMA engine
This patch adds recommended SDMA engine support for GPUs with
limited XGMI SDMA engines. It uses one PCIe SDMA engine
for gpu <-> gpu copies, which helps improve all-to-all
copy performance.

Signed-off-by: Shane Xiao <shane.xiao@amd.com>


[ROCm/ROCR-Runtime commit: 6a63170b38]
2025-04-11 23:54:15 +08:00
David Yat Sin 309a1354ab rocr: refactor PC Sampling PRED_EXEC op
Refactor PRED_EXEC op command size calculation.
Fix issue when copy size is less than 32MB.


[ROCm/ROCR-Runtime commit: c1b7aa39ed]
2025-04-08 17:26:29 -04:00
Yiannis Papadopoulos 96b7e42776 rocr/aie: Increment write pointer upon packet submission
[ROCm/ROCR-Runtime commit: 2d2c47bdef]
2025-04-08 15:36:40 -05:00
Yiannis Papadopoulos f53a9c72c4 rocr/aie: Using PDI address instead of cu_mask for dispatch. Automatic hw ctx reconfiguration upon new PDI addition.
[ROCm/ROCR-Runtime commit: c63e01724c]
2025-04-03 15:13:20 -05:00
Lancelot SIX c813d2c62d rocr: Replace tabs with spaces in trap handler source codes
Use spaces consistently to format the trap handler code.  This patch
does not introduce any change in the trap handler.  Using `git show -w`
on this patch shows an empty diff.

Change-Id: Ic0244dd203347146ffde65460cd87ecbcc43732a


[ROCm/ROCR-Runtime commit: e0359e5d35]
2025-04-03 09:44:23 +01:00
David Yat Sin f46bc26cff rocr: Fix PC Sampling PRED_EXEC num dwords count
Fix incorrect value for number of dwords in the PRED_EXEC command.


[ROCm/ROCR-Runtime commit: 2a433e2b96]
2025-04-01 15:53:45 -04:00
Lancelot SIX fff4455589 Fix Stochastic sampling trap handler
The trap handler should read the PERF_SNAPSHOT_DATA after all of
PERF_SNAPSHOT_DATA, PERF_SNAPSHOT_PC_LO and PERF_SNAPSHOT_PC_HI.  This
patch fixes this.

Change-Id: I7f78e16d7a0d8bfebb34906b4dff73c2eaeb5658


[ROCm/ROCR-Runtime commit: 6a4785f650]
2025-03-31 10:20:19 +01:00
Lancelot SIX 23254f7a1d trap_handler.s: Clear PERF_SNAPSHOT/HOST_TRAP before returning
Make sure to clear the HOST_TRAP and PERF_SNAPSHOT bits before returning
from the second level trap handler.  As those bits are sticky, this
ensures future re-entry to the trap handler (for context save for
example) will not be confused with a sampling trap.

Change-Id: I05e5e58779a650b324ac6e30d574dc6931340f13
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>


[ROCm/ROCR-Runtime commit: eece210a5c]
2025-03-31 10:20:19 +01:00
Yiannis Papadopoulos 2c731096c6 rocr/aie: Returning error code if query not recognized
[ROCm/ROCR-Runtime commit: 0bd4acb5d4]
2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos c142b04fc1 rocr/aie: Bundling XDNA BOs and addresses, adding cleanup guard in case of error
[ROCm/ROCR-Runtime commit: e55503e7f8]
2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos 4c0e8b5f70 rocr/aie: Avoiding XdnaDriver class in queue API
[ROCm/ROCR-Runtime commit: f4e1c9b0ba]
2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos bd109ec288 rocr/aie: Remove unused struct from HSA API
[ROCm/ROCR-Runtime commit: 8dcbbf31c7]
2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos a11d693a47 rocr: Remove unused lambda
[ROCm/ROCR-Runtime commit: bf8ab493c4]
2025-03-27 10:33:40 -04:00
Yiannis Papadopoulos 13723c6308 rocr/aie: Resolve parentheses warning
[ROCm/ROCR-Runtime commit: b066e0eefa]
2025-03-27 10:33:40 -04:00
David Yat Sin edcc3a1ed5 rocr: Release agent resources before pools
Adding a general stage for agents to release their resources on
shutdown. This avoids a circular dependency during shutdown because
we have to delete allocated resources before deleting memory pools, but
we also have to delete memory pools before destroying agents.


[ROCm/ROCR-Runtime commit: 947391deac]
2025-03-25 14:25:04 -04:00
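The three-stage shutdown order described in the commit above can be sketched as follows. This is a toy model under stated assumptions: `Pool`, `Agent`, and `shutdown` are hypothetical names, and a single counter stands in for agent-owned allocations.

```python
class Pool:
    def __init__(self, log):
        self.log = log
        self.live_allocations = 0

    def destroy(self):
        # A pool must not go away while something allocated from it is alive.
        if self.live_allocations:
            raise RuntimeError("pool destroyed with live allocations")
        self.log.append("pool destroyed")


class Agent:
    def __init__(self, pool, log):
        self.pool = pool
        self.log = log
        pool.live_allocations += 1  # agent-owned buffer taken from the pool

    def release_resources(self):
        self.pool.live_allocations -= 1
        self.log.append("agent resources released")

    def destroy(self):
        self.log.append("agent destroyed")


def shutdown(agents, pools):
    # Stage 1: agents give back what they allocated, but stay alive.
    for agent in agents:
        agent.release_resources()
    # Stage 2: pools are now empty and can be deleted.
    for pool in pools:
        pool.destroy()
    # Stage 3: agents are destroyed last.
    for agent in agents:
        agent.destroy()
```

Splitting "release resources" from "destroy" is what breaks the cycle: neither the pools nor the agents have to be fully torn down first.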
Yiannis Papadopoulos 7a2b25e1ea rocr: Release vmem handles before agent destruction
[ROCm/ROCR-Runtime commit: a66130bc48]
2025-03-25 14:25:04 -04:00
Yiannis Papadopoulos 427962679e rocr: Return success status in IsModelEnabled()
[ROCm/ROCR-Runtime commit: 765563b786]
2025-03-25 10:05:16 -04:00
lyndonli e9c934c116 rocr: Remove redundant Refresh() call
The initial call to Refresh() in the constructor is
unnecessary as it's handled in Runtime::Load().

Signed-off-by: lyndonli <Lyndon.Li@amd.com>


[ROCm/ROCR-Runtime commit: c34a2798ce]
2025-03-25 09:13:59 -04:00
Adel Johar 6195f65f9e Docs: Add more variables to env_variables.rst
[ROCm/ROCR-Runtime commit: d8d27d4fd6]
2025-03-20 11:59:58 -04:00
Shweta Khatri b570f22aca rocr: Fix PcSamplingCreateFromId to pass 32-bit dword count to DmaFill
In PcSamplingCreateFromId, convert number of bytes into number of
dwords because DmaFill expects a count of 32-bit words, not raw bytes.
This prevents OOB writes on large sampling buffers.


[ROCm/ROCR-Runtime commit: 2ae70735e8]
2025-03-19 14:42:41 -04:00
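The unit mix-up fixed by the commit above can be illustrated with a small sketch. `dma_fill` here is a toy stand-in that takes a count of 32-bit words, as the commit says DmaFill does; it is not the ROCr function, and the names are hypothetical.

```python
DWORD_BYTES = 4  # a dword is 32 bits


def bytes_to_dwords(num_bytes):
    """Convert a byte count to a dword count; the size must be dword-aligned."""
    if num_bytes % DWORD_BYTES:
        raise ValueError("size is not a multiple of 4 bytes")
    return num_bytes // DWORD_BYTES


def dma_fill(buffer, value, count_dwords):
    """Toy fill: write count_dwords 32-bit little-endian words into buffer."""
    needed = count_dwords * DWORD_BYTES
    if needed > len(buffer):
        # Real hardware would write past the end; the toy raises instead.
        raise IndexError("fill would run past the end of the buffer")
    buffer[:needed] = value.to_bytes(DWORD_BYTES, "little") * count_dwords
```

Passing the raw byte count as `count_dwords` asks for four times as much data as the buffer holds, which is the out-of-bounds write the commit prevents.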
Lao, Darren c03e4cfe4d rocr: Change ISA grid dimensions
Signed-off-by: Lao, Darren <Darren.Lao@amd.com>

[ROCm/ROCR-Runtime commit: cd4d236185]
2025-03-19 13:44:17 -04:00
randyh62 407704bf61 fix license include path
[ROCm/ROCR-Runtime commit: e2f3e8c0de]
2025-03-18 16:29:10 -04:00
David Yat Sin d94b4becd8 Revert rocr: Only expose ext-fine-grain pool on xgmi-hive systems
This reverts commit 0097218f2b.


[ROCm/ROCR-Runtime commit: ce0244ac03]
2025-03-18 16:28:36 -04:00
jordans 938b34da24 hsakmt: Initial Commit for the HSA KMT Model
The overarching goal is to provide an API that pre-silicon models can latch into for software bring-up.


[ROCm/ROCR-Runtime commit: d4b85b6bf5]
2025-03-18 16:22:17 -04:00
David Yat Sin 9e8859636e rocr: Workaround for SDMA POLL_REGMEM on gfx9.0
Poll the dependent signals twice on all gfx9.0 GPUs except gfx90a.
This is needed as a work-around for a rare issue where SDMA_POLL_REGMEM
may return before the memory is actually cleared.


[ROCm/ROCR-Runtime commit: 6903a41b1d]
2025-03-17 17:59:15 -04:00
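The selection rule in the commit above ("all gfx9.0 GPUs except gfx90a") might look like this sketch. It assumes the ISA version is available as a (major, minor, stepping) triple and that gfx90a corresponds to stepping 0xA; the function name is hypothetical.

```python
def poll_repeat_count(major, minor, stepping):
    """Return how many times to poll the dependent signals."""
    is_gfx90x = major == 9 and minor == 0
    is_gfx90a = is_gfx90x and stepping == 0xA
    # Poll twice only where the workaround applies; once everywhere else.
    return 2 if is_gfx90x and not is_gfx90a else 1
```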
Benjamin Welton e62422520a rocr: Reset event_age when signals move
Resets event_age when signals move. Prior to this PR, event_age
can become unaligned with hsa_event, causing hangs if the event_age
exceeds the true hsa_event age.


[ROCm/ROCR-Runtime commit: d2a89a467b]
2025-03-13 11:32:16 -04:00
Yiannis Papadopoulos 566269e8b7 rocr/aie: Changing variable names
[ROCm/ROCR-Runtime commit: c7936334cf]
2025-03-11 19:35:21 -04:00
Yiannis Papadopoulos 8e111ff2f0 rocr/aie: Handle non-HSA_STATUS_SUCCESS during VisitRegion
[ROCm/ROCR-Runtime commit: fb33e2e724]
2025-03-11 19:35:21 -04:00
Longlong Yao 007795951b rocr: export pointer type for OnlyAddress
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>


[ROCm/ROCR-Runtime commit: a254e35fd6]
2025-03-11 10:16:58 -04:00
zichguan-amd 1d51406e80 Throw exception when runtime not initialized for hsa_amd_signal_wait_*
Signed-off-by: zichguan-amd <zichuan.guan@amd.com>


[ROCm/ROCR-Runtime commit: 3415a500c7]
2025-03-07 15:17:10 -05:00
zichguan-amd b172fbd538 rocr: Allow 0/NULL/invalid signal handles for wait operations to be no-op
Remove hard assertions for signal validation on hsa_amd_signal_wait_* operations, instead ignore 0/NULL/invalid signals in the dependency condition evaluation to align with HSA specs for barrier-AND and barrier-OR packets.

Signed-off-by: zichguan-amd <zichuan.guan@amd.com>


[ROCm/ROCR-Runtime commit: e4d027191c]
2025-03-07 15:17:10 -05:00
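The "ignore instead of assert" behavior described in the commit above can be sketched as a filter that runs before the condition is evaluated. This is an illustration only: the signal table is a plain dict from handle to current value, "satisfied" is taken to mean a value of 0 as for barrier-AND dependencies, and both function names are hypothetical.

```python
def live_dependencies(handles, known_signals):
    """Drop 0, None, and unknown handles instead of asserting on them."""
    return [h for h in handles if h and h in known_signals]


def all_satisfied(handles, known_signals):
    """AND condition over the live dependencies only.

    Skipped handles impose no constraint, so a list made up entirely of
    0/None/invalid handles is treated as already satisfied.
    """
    return all(
        known_signals[h] == 0 for h in live_dependencies(handles, known_signals)
    )
```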
David Yat Sin e130172218 rocr: Put back scratch_backing_memory_byte_size
The scratch_backing_memory_byte_size is not used by CP, but it is
currently used by rocgdb. Putting the field back, but we need to find a
solution for alt_scratch_backing_memory_byte_size.

Also, completely disabling alternate scratch as we need some changes to
support debugger.


[ROCm/ROCR-Runtime commit: 02b38d0614]
2025-03-06 16:23:38 -05:00
David Yat Sin 0097218f2b rocr: Only expose ext-fine-grain pool on xgmi-hive systems
We cannot guarantee system-scope coherency on systems with only PCIe
connections, so do not expose extended fine-grain memory pool on these
systems.


[ROCm/ROCR-Runtime commit: 6dac90c89a]
2025-03-05 10:41:38 -05:00
Lao, Darren de8e56a964 rocr: Change grid dimensions
Signed-off-by: Lao, Darren <Darren.Lao@amd.com>


[ROCm/ROCR-Runtime commit: 0cd46b6582]
2025-03-04 16:19:51 -05:00
David Yat Sin 35faa9783a rocr: Check RLIMIT_CORE before generating coredump
Check for RLIMIT_CORE before collecting data for coredump. If the
current limit is 0, then we can return early without spending time
collecting coredump data.


[ROCm/ROCR-Runtime commit: d031af9eb5]
2025-03-04 10:29:34 -05:00
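The early return described in the commit above can be sketched with the standard `resource` module (Unix only). The function names and the `collect` callback are hypothetical; the optional `soft_limit` parameter exists only so the logic can be exercised without changing the process limits.

```python
import resource


def core_dumps_enabled(soft_limit=None):
    """Return False when the current (soft) RLIMIT_CORE is 0."""
    if soft_limit is None:
        soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_CORE)
    return soft_limit != 0


def generate_coredump(collect, soft_limit=None):
    """Run the expensive collection step only if a core file can be written."""
    if not core_dumps_enabled(soft_limit):
        return None  # limit is 0: skip the collection work entirely
    return collect()
```

With `ulimit -c 0` in effect, `generate_coredump` returns before `collect` is called, so no time is spent gathering data that would be discarded.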
David Yat Sin 0a8ce4b90d rocr: Only set asan flag on GPU agents
[ROCm/ROCR-Runtime commit: 3944da1d76]
2025-03-03 14:51:19 -05:00
David Yat Sin d93d05bcf1 rocr: Temporarily disable alternate scratch memory
Temporarily disable alternate scratch memory usage by default due to
some stability issues.


[ROCm/ROCR-Runtime commit: 9a950ab788]
2025-03-03 09:27:29 -05:00
Khatri, Shweta 9816c2ecd3 rocr: GFX9, GFX10, GFX11: Use view3dAs2dArray flag, for thick/3D swizzle modes. (#58)
An HSA_IMAGE_ENABLE_3D_SWIZZLE_DEBUG environment flag already exists to
enable/disable this. The default value is false (view3dAs2dArray = 1).
Enabling this flag enables support for swizzles that do 3D
interleaving on GFX9, GFX10 and GFX11. By default, support for swizzles that
do 3D interleaving is disabled.

[ROCm/ROCR-Runtime commit: 0984a1f0fd]
2025-02-26 09:38:17 -05:00
Tony Gutierrez 3b30b8a975 rocr: Remove KMT usage from AMD ext
Use the core Driver in AMD's HSA extension API to make it
agnostic to the underlying OS and kernel-mode driver.


[ROCm/ROCR-Runtime commit: d3a4dc9687]
2025-02-25 21:51:52 -05:00
Khatri, Shweta e00c926d27 rocr: Adding support for Stochastic PC Sampling for gfx94x (#47)
Change-Id: Ide4c2e25b88f1f25ea4ce35a619b93963c0355ee

[ROCm/ROCR-Runtime commit: 322a794cf6]
2025-02-22 00:13:08 -05:00
Tony Gutierrez 727159b4db rocr: Remove KMT usage from CPU agent
Use the core Driver object in the CPU agent to make it OS/driver
agnostic.

Implement the GetMemoryProperties() and GetCacheProperties() methods
for the KFD driver.


[ROCm/ROCR-Runtime commit: a9f6bc8d0e]
2025-02-21 10:00:38 -05:00
David Yat Sin 2dcc1989bc rocr: Add queries for async scratch reclaim
Add support for these 2 new queries:
- HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_MAX
  Maximum amount of scratch memory allowed on this agent

- HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_CURRENT
  Current limit for scratch memory on this agent


[ROCm/ROCR-Runtime commit: 107b48fb15]
2025-02-19 21:02:00 -05:00
David Yat Sin 5905b82579 rocr: Update for new async scratch reclaim
Updating ROCr code to match new handshake protocol with CP FW for
asynchronous scratch reclaim.
Increase previous limits when scratch reclaim feature is available.


[ROCm/ROCR-Runtime commit: aa2f98e6f9]
2025-02-19 21:02:00 -05:00