Commit Graph

1171 Commits

Author SHA1 Message Date
David Yat Sin 4ed5950beb rocr: Fix logic for scratch reclaim
Fix logic error that can cause scratch memory to be reclaimed while a
dispatch is still using it.
2025-04-29 17:23:45 -04:00
Tony Gutierrez f2c482d923 rocr: Add large_bar_enabled var to the GPU agent
Adds a bool to the GPU agent and a public member method to
check if the GPU supports large BAR. This is needed so we can
check if large BAR is supported when a user tries to allocate
an AQL queue in device memory on a given GPU agent.

Also adds an exception to the AQL queue if device-side AQL queues
are requested and the GPU owner of the AQL doesn't support large
BAR. Otherwise, ROCr will currently allow device-side queues
that can cause faults when the user tries to touch their ring
buffers and the user will not know why the faults are occurring.

This relies on the fact that the KFD does not expose any links
from the CPU to the GPU if large BAR is not enabled (though
links from the GPU to the CPU may still be exposed by the KFD).
2025-04-23 15:53:29 -04:00
Tony Gutierrez 6e3c375bf1 rocr: Flags to alloc queue buf/struct in dev mem
This builds on a prior change that allowed for allocating
a user-mode queue's packet buffer in device memory to also
allocate the queue struct in device memory. This provides
additional latency benefits particularly for cases where
dispatches are performed from the GPU itself. Flags are
added to support the various use cases.
2025-04-23 15:53:29 -04:00
Tony Gutierrez 11d1d2cd25 rocr: Remove empty shared.cpp 2025-04-23 15:53:29 -04:00
Tony Gutierrez adbc0495e2 rocr/libhsakmt: Add coarse-grain allocator to GPU 2025-04-23 15:53:29 -04:00
Saleel Kudchadker 57c0c643ce rocr: return preferred SDMA engine mask
- Add a new AMD extension API to return preferred SDMA engine mask.
This can be used in conjunction with copy_on_engine API to get
optimal bandwidth.
2025-04-22 13:28:38 -07:00
Yiannis Papadopoulos 7c8fa87160 rocr/aie: Remove redundant cache flushes for already loaded PDIs 2025-04-17 09:48:41 -05:00
Shane Xiao 6a63170b38 rocr: Add rec sdma engines with limited XGMI SDMA engine
This patch adds recommended SDMA support with
limited XGMI SDMA engines. It uses one PCIe SDMA engine
to do GPU <-> GPU copies, which helps improve
all-to-all copy performance.

Signed-off-by: Shane Xiao <shane.xiao@amd.com>
2025-04-11 23:54:15 +08:00
David Yat Sin c1b7aa39ed rocr: refactor PC Sampling PRED_EXEC op
Refactor PRED_EXEC op command size calculation.
Fix issue when copy size is less than 32MB.
2025-04-08 17:26:29 -04:00
Yiannis Papadopoulos 2d2c47bdef rocr/aie: Increment write pointer upon packet submission 2025-04-08 15:36:40 -05:00
Yiannis Papadopoulos c63e01724c rocr/aie: Using PDI address instead of cu_mask for dispatch. Automatic hw ctx reconfiguration upon new PDI addition. 2025-04-03 15:13:20 -05:00
Lancelot SIX e0359e5d35 rocr: Replace tabs with spaces in trap handler source codes
Use spaces consistently to format the trap handler code.  This patch
does not introduce any change in the trap handler.  Using `git show -w`
on this patch shows an empty diff.

Change-Id: Ic0244dd203347146ffde65460cd87ecbcc43732a
2025-04-03 09:44:23 +01:00
David Yat Sin 2a433e2b96 rocr: Fix PC Sampling PRED_EXEC num dwords count
Fix incorrect value for number of dwords in the PRED_EXEC command.
2025-04-01 15:53:45 -04:00
Lancelot SIX 6a4785f650 Fix Stochastic sampling trap handler
The trap handler should read the PERF_SNAPSHOT_DATA after all of
PERF_SNAPSHOT_DATA, PERF_SNAPSHOT_PC_LO and PERF_SNAPSHOT_PC_HI.  This
patch fixes this.

Change-Id: I7f78e16d7a0d8bfebb34906b4dff73c2eaeb5658
2025-03-31 10:20:19 +01:00
Lancelot SIX eece210a5c trap_handler.s: Clear PERF_SNAPSHOT/HOST_TRAP before returning
Make sure to clear the HOST_TRAP and PERF_SNAPSHOT bits before returning
from the second level trap handler.  As those bits are sticky, this
ensures future re-entry to the trap handler (for context save for
example) will not be confused with a sampling trap.

Change-Id: I05e5e58779a650b324ac6e30d574dc6931340f13
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
2025-03-31 10:20:19 +01:00
Yiannis Papadopoulos 0bd4acb5d4 rocr/aie: Returning error code if query not recognized 2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos e55503e7f8 rocr/aie: Bundling XDNA BOs and addresses, adding cleanup guard in case of error 2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos f4e1c9b0ba rocr/aie: Avoiding XdnaDriver class in queue API 2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos 8dcbbf31c7 rocr/aie: Remove unused struct from HSA API 2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos bf8ab493c4 rocr: Remove unused lambda 2025-03-27 10:33:40 -04:00
Yiannis Papadopoulos b066e0eefa rocr/aie: Resolve parentheses warning 2025-03-27 10:33:40 -04:00
David Yat Sin 947391deac rocr: Release agent resources before pools
Adding a general stage for agents to release their resources on
shutdown. This avoids a circular dependency during shutdown because
we have to delete allocated resources before deleting memory pools, but
we also have to delete memory pools before destroying agents.
2025-03-25 14:25:04 -04:00
Yiannis Papadopoulos a66130bc48 rocr: Release vmem handles before agent destruction 2025-03-25 14:25:04 -04:00
Yiannis Papadopoulos 765563b786 rocr: Return success status in IsModelEnabled() 2025-03-25 10:05:16 -04:00
lyndonli c34a2798ce rocr: Remove redundant Refresh() call
The initial call to Refresh() in the constructor is
unnecessary as it's handled in Runtime::Load().

Signed-off-by: lyndonli <Lyndon.Li@amd.com>
2025-03-25 09:13:59 -04:00
Adel Johar d8d27d4fd6 Docs: Add more variables to env_variables.rst 2025-03-20 11:59:58 -04:00
Shweta Khatri 2ae70735e8 rocr: Fix PcSamplingCreateFromId to pass 32-bit dword count to DmaFill
In PcSamplingCreateFromId, convert number of bytes into number of
dwords because DmaFill expects a count of 32-bit words, not raw bytes.
This prevents OOB writes on large sampling buffers.
2025-03-19 14:42:41 -04:00
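
The byte-to-dword conversion described in the commit above can be illustrated with a minimal sketch. It is not the ROCr code: `BytesToDwords`, `FillDwords`, and `ClearBuffer` are hypothetical names, and `FillDwords` merely stands in for a fill routine whose count argument is in 32-bit words rather than bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kDwordBytes = sizeof(std::uint32_t);

// Hypothetical helper: convert a byte count to a count of 32-bit words.
constexpr std::size_t BytesToDwords(std::size_t size_bytes) {
  return size_bytes / kDwordBytes;
}

// Hypothetical stand-in for a fill routine that takes a dword count.
void FillDwords(std::uint32_t* dst, std::uint32_t value,
                std::size_t count_dwords) {
  for (std::size_t i = 0; i < count_dwords; ++i) dst[i] = value;
}

// Clears a buffer whose size is tracked in bytes. Passing size_bytes
// directly as the count would write four times past the end.
void ClearBuffer(std::vector<std::uint32_t>& buf) {
  const std::size_t size_bytes = buf.size() * kDwordBytes;
  assert(size_bytes % kDwordBytes == 0);
  FillDwords(buf.data(), 0u, BytesToDwords(size_bytes));
}
```

Keeping the conversion in one named helper makes the unit of each count explicit at the call site, which is the kind of mismatch the commit message describes.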
Lao, Darren cd4d236185 rocr: Change ISA grid dimensions
Signed-off-by: Lao, Darren <Darren.Lao@amd.com>
2025-03-19 13:44:17 -04:00
randyh62 e2f3e8c0de fix license include path 2025-03-18 16:29:10 -04:00
David Yat Sin ce0244ac03 Revert rocr: Only expose ext-fine-grain pool on xgmi-hive systems
This reverts commit 6dac90c89a.
2025-03-18 16:28:36 -04:00
jordans d4b85b6bf5 hsakmt: Initial Commit for the HSA KMT Model
The overarching goal is to provide an API that pre-silicon models can latch into for software bring-up.
2025-03-18 16:22:17 -04:00
David Yat Sin 6903a41b1d rocr: Workaround for SDMA POLL_REGMEM on gfx9.0
Poll the dependent signals twice on all gfx9.0 GPUs except gfx90a.
This is needed as a work-around for a rare issue where SDMA_POLL_REGMEM
may return before the memory is actually cleared.
2025-03-17 17:59:15 -04:00
Benjamin Welton d2a89a467b rocr: Reset event_age when signals move
Resets event_age when signals move. Prior to this PR, event_age
can become unaligned with hsa_event, causing hangs if the event_age
exceeds the true hsa_event age.
2025-03-13 11:32:16 -04:00
Yiannis Papadopoulos c7936334cf rocr/aie: Changing variable names 2025-03-11 19:35:21 -04:00
Yiannis Papadopoulos fb33e2e724 rocr/aie: Handle non-HSA_STATUS_SUCCESS during VisitRegion 2025-03-11 19:35:21 -04:00
Longlong Yao a254e35fd6 rocr: export pointer type for OnlyAddress
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-03-11 10:16:58 -04:00
zichguan-amd 3415a500c7 Throw exception when runtime not initialized for hsa_amd_signal_wait_*
Signed-off-by: zichguan-amd <zichuan.guan@amd.com>
2025-03-07 15:17:10 -05:00
zichguan-amd e4d027191c rocr: Allow 0/NULL/invalid signal handles for wait operations to be no-op
Remove hard assertions for signal validation on hsa_amd_signal_wait_* operations, instead ignore 0/NULL/invalid signals in the dependency condition evaluation to align with HSA specs for barrier-AND and barrier-OR packets.

Signed-off-by: zichguan-amd <zichuan.guan@amd.com>
2025-03-07 15:17:10 -05:00
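
As a rough illustration of the behavior described in the commit above, the sketch below skips null handles when evaluating a dependency condition instead of asserting on them. `FakeSignal` and `AllDependenciesMet` are hypothetical and do not reflect the ROCr signal types; the sketch also assumes that a dependency counts as met when its value is zero.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a signal; handle 0 means "no signal".
struct FakeSignal {
  std::uint64_t handle;
  std::int64_t value;
};

// Returns true when every non-null dependency has reached zero.
// Null handles are ignored rather than treated as an error.
bool AllDependenciesMet(const std::vector<FakeSignal>& deps) {
  for (const FakeSignal& dep : deps) {
    if (dep.handle == 0) continue;      // no-op entry, skip it
    if (dep.value != 0) return false;   // a real dependency is still pending
  }
  return true;
}
```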
David Yat Sin 02b38d0614 rocr: Put back scratch_backing_memory_byte_size
The scratch_backing_memory_byte_size is not used by CP, but it is
currently used by rocgdb. Putting the field back, but we need to find a
solution for alt_scratch_backing_memory_byte_size.

Also, completely disabling alternate scratch as we need some changes to
support debugger.
2025-03-06 16:23:38 -05:00
David Yat Sin 6dac90c89a rocr: Only expose ext-fine-grain pool on xgmi-hive systems
We cannot guarantee system-scope coherency on systems with only PCIe
connections, so do not expose extended fine-grain memory pool on these
systems.
2025-03-05 10:41:38 -05:00
Lao, Darren 0cd46b6582 rocr: Change grid dimensions
Signed-off-by: Lao, Darren <Darren.Lao@amd.com>
2025-03-04 16:19:51 -05:00
David Yat Sin d031af9eb5 rocr: Check RLIMIT_CORE before generating coredump
Check for RLIMIT_CORE before collecting data for coredump. If the
current limit is 0, then we can return early without spending time
collecting coredump data.
2025-03-04 10:29:34 -05:00
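
The early-return check described in the commit above can be sketched with the POSIX `getrlimit` call. This is an illustration only, not the ROCr implementation: `CoreDumpsEnabled` is a hypothetical name, and treating a failed query as "enabled" is an assumption made here so that a failure never suppresses a dump.

```cpp
#include <sys/resource.h>

// Hypothetical helper: returns false when the soft RLIMIT_CORE limit is 0,
// meaning the kernel would not write a core file, so collecting
// coredump data would be wasted work.
bool CoreDumpsEnabled() {
  struct rlimit limit;
  if (getrlimit(RLIMIT_CORE, &limit) != 0) {
    return true;  // assumption: on query failure, do not skip the dump
  }
  return limit.rlim_cur != 0;
}
```

A caller would check `CoreDumpsEnabled()` first and return immediately when it is false, before gathering any state.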
David Yat Sin 3944da1d76 rocr: Only set asan flag on GPU agents 2025-03-03 14:51:19 -05:00
David Yat Sin 9a950ab788 rocr: Temporarily disable alternate scratch memory
Temporarily disable alternate scratch memory usage by default due to
some stability issues.
2025-03-03 09:27:29 -05:00
Khatri, Shweta 0984a1f0fd rocr: GFX9, GFX10, GFX11: Use view3dAs2dArray flag, for thick/3D swizzle modes. (#58)
A HSA_IMAGE_ENABLE_3D_SWIZZLE_DEBUG environment flag exists already to
enable/disable this. Default value is false (view3dAs2dArray = 1)
Enabling this flag will enable support for swizzles that do 3D
interleaving on GFX9, GFX10 and GFX11. By default support for swizzles that
do 3D interleaving is disabled.
2025-02-26 09:38:17 -05:00
Tony Gutierrez d3a4dc9687 rocr: Remove KMT usage from AMD ext
Use the core Driver in AMD's HSA extension API to make it
agnostic to the underlying OS and kernel-mode driver.
2025-02-25 21:51:52 -05:00
Khatri, Shweta 322a794cf6 rocr: Adding support for Stochastic PC Sampling for gfx94x (#47)
Change-Id: Ide4c2e25b88f1f25ea4ce35a619b93963c0355ee
2025-02-22 00:13:08 -05:00
Tony Gutierrez a9f6bc8d0e rocr: Remove KMT usage from CPU agent
Use the core Driver object in the CPU agent to make it OS/driver
agnostic.

Implement the GetMemoryProperties() and GetCacheProperties methods
for the KFD driver.
2025-02-21 10:00:38 -05:00
David Yat Sin 107b48fb15 rocr: Add queries for async scratch reclaim
Add support for these 2 new queries:
- HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_MAX
  Maximum amount of scratch memory allowed on this agent

- HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_CURRENT
  Current limit for scratch memory on this agent
2025-02-19 21:02:00 -05:00
David Yat Sin aa2f98e6f9 rocr: Update for new async scratch reclaim
Updating ROCr code to match new handshake protocol with CP FW for
asynchronous scratch reclaim.
Increase previous limits when scratch reclaim feature is available.
2025-02-19 21:02:00 -05:00