Commit graph

1261 commits

Author SHA1 Message Date
Yiannis Papadopoulos c63e01724c rocr/aie: Using PDI address instead of cu_mask for dispatch. Automatic hw ctx reconfiguration upon new PDI addition. 2025-04-03 15:13:20 -05:00
Lancelot SIX e0359e5d35 rocr: Replace tabs with spaces in trap handler source codes
Use spaces consistently to format the trap handler code.  This patch
does not introduce any change in the trap handler.  Using `git show -w`
on this patch shows an empty diff.

Change-Id: Ic0244dd203347146ffde65460cd87ecbcc43732a
2025-04-03 09:44:23 +01:00
David Yat Sin 2a433e2b96 rocr: Fix PC Sampling PRED_EXEC num dwords count
Fix incorrect value for number of dwords in the PRED_EXEC command.
2025-04-01 15:53:45 -04:00
Lancelot SIX 6a4785f650 Fix Stochastic sampling trap handler
The trap handler should read the PERF_SNAPSHOT_DATA after all of
PERF_SNAPSHOT_DATA, PERF_SNAPSHOT_PC_LO and PERF_SNAPSHOT_PC_HI.  This
patch fixes this.

Change-Id: I7f78e16d7a0d8bfebb34906b4dff73c2eaeb5658
2025-03-31 10:20:19 +01:00
Lancelot SIX eece210a5c trap_handler.s: Clear PERF_SNAPSHOT/HOST_TRAP before returning
Make sure to clear the HOST_TRAP and PERF_SNAPSHOT bits before returning
from the second level trap handler.  As those bits are sticky, this
ensures future re-entry to the trap handler (for context save for
example) will not be confused with a sampling trap.

Change-Id: I05e5e58779a650b324ac6e30d574dc6931340f13
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
2025-03-31 10:20:19 +01:00
Yiannis Papadopoulos 0bd4acb5d4 rocr/aie: Returning error code if query not recognized 2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos e55503e7f8 rocr/aie: Bundling XDNA BOs and addresses, adding cleanup guard in case of error 2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos f4e1c9b0ba rocr/aie: Avoiding XdnaDriver class in queue API 2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos 8dcbbf31c7 rocr/aie: Remove unused struct from HSA API 2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos bf8ab493c4 rocr: Remove unused lambda 2025-03-27 10:33:40 -04:00
Yiannis Papadopoulos b066e0eefa rocr/aie: Resolve parentheses warning 2025-03-27 10:33:40 -04:00
David Yat Sin 947391deac rocr: Release agent resources before pools
Adding a general stage for agents to release their resources on
shutdown. This avoids a circular dependency during shutdown because
we have to delete allocated resources before deleting memory pools, but
we also have to delete memory pools before destroying agents.
2025-03-25 14:25:04 -04:00
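The three-stage shutdown order described in this commit can be sketched as follows. This is a minimal illustration, not ROCr's code: `Agent`, `MemoryPool`, `ShutdownLog` and `ShutDown` are hypothetical names, and the log exists only to make the ordering visible.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; not ROCr's actual classes.
using ShutdownLog = std::vector<std::string>;

struct Agent {
  bool holds_allocations = true;

  // Stage 1: drop allocations that live inside memory pools.
  void ReleaseResources(ShutdownLog& log) {
    holds_allocations = false;
    log.push_back("agent.release");
  }

  // Stage 3: final teardown, after the pools are gone.
  void Destroy(ShutdownLog& log) { log.push_back("agent.destroy"); }
};

struct MemoryPool {
  // Stage 2: safe only once no agent still holds allocations from the pool.
  void Destroy(ShutdownLog& log) { log.push_back("pool.destroy"); }
};

// Runs the stages in the order the commit message describes:
// release agent resources, delete pools, destroy agents.
ShutdownLog ShutDown(std::vector<Agent>& agents, std::vector<MemoryPool>& pools) {
  ShutdownLog log;
  for (Agent& agent : agents) agent.ReleaseResources(log);
  for (MemoryPool& pool : pools) pool.Destroy(log);
  for (Agent& agent : agents) agent.Destroy(log);
  return log;
}
```

Splitting agent teardown into a release stage and a destroy stage is what breaks the cycle: pools never see live allocations, and agents never outlive a step that still needs them.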
Yiannis Papadopoulos a66130bc48 rocr: Release vmem handles before agent destruction 2025-03-25 14:25:04 -04:00
Yiannis Papadopoulos 765563b786 rocr: Return success status in IsModelEnabled() 2025-03-25 10:05:16 -04:00
lyndonli c34a2798ce rocr: Remove redundant Refresh() call
The initial call to Refresh() in the constructor is
unnecessary as it's handled in Runtime::Load().

Signed-off-by: lyndonli <Lyndon.Li@amd.com>
2025-03-25 09:13:59 -04:00
Adel Johar d8d27d4fd6 Docs: Add more variables to env_variables.rst 2025-03-20 11:59:58 -04:00
Shweta Khatri 2ae70735e8 rocr: Fix PcSamplingCreateFromId to pass 32-bit dword count to DmaFill
In PcSamplingCreateFromId, convert number of bytes into number of
dwords because DmaFill expects a count of 32-bit words, not raw bytes.
This prevents OOB writes on large sampling buffers.
2025-03-19 14:42:41 -04:00
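The byte-to-dword conversion this fix describes can be sketched as below. `FillDwords` is a hypothetical stand-in for a fill routine that, like the DmaFill described above, takes a count of 32-bit words; `BytesToDwords` and `ClearSamplingBuffer` are likewise illustrative names, not ROCr functions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fill routine: the count is in 32-bit words, not bytes.
void FillDwords(uint32_t* dst, uint32_t value, size_t dword_count) {
  for (size_t i = 0; i < dword_count; ++i) dst[i] = value;
}

// Converts a size in bytes to a count of 32-bit words.
// Assumption: the buffer size is a multiple of 4 bytes.
size_t BytesToDwords(size_t bytes) { return bytes / sizeof(uint32_t); }

// Clears a buffer whose size is known in bytes and returns the dword count
// that was passed on. Passing buf_size_bytes directly would write four times
// past the intended range.
size_t ClearSamplingBuffer(uint32_t* buf, size_t buf_size_bytes) {
  const size_t dwords = BytesToDwords(buf_size_bytes);
  FillDwords(buf, 0u, dwords);
  return dwords;
}
```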
Lao, Darren cd4d236185 rocr: Change ISA grid dimensions
Signed-off-by: Lao, Darren <Darren.Lao@amd.com>
2025-03-19 13:44:17 -04:00
randyh62 e2f3e8c0de fix license include path 2025-03-18 16:29:10 -04:00
David Yat Sin ce0244ac03 Revert rocr: Only expose ext-fine-grain pool on xgmi-hive systems
This reverts commit 6dac90c89a.
2025-03-18 16:28:36 -04:00
jordans d4b85b6bf5 hsakmt: Initial Commit for the HSA KMT Model
The overarching goal is to provide an API that pre-silicon models can latch into for software bring-up.
2025-03-18 16:22:17 -04:00
David Yat Sin 6903a41b1d rocr: Workaround for SDMA POLL_REGMEM on gfx9.0
Poll the dependent signals twice on all gfx9.0 GPUs except gfx90a.
This is needed as a work-around for a rare issue where SDMA_POLL_REGMEM
may return before the memory is actually cleared.
2025-03-17 17:59:15 -04:00
Benjamin Welton d2a89a467b rocr: Reset event_age when signals move
Resets event_age when signals move. Prior to this PR, event_age
can become unaligned with hsa_event, causing hangs if the event_age
exceeds the true hsa_event age.
2025-03-13 11:32:16 -04:00
Yiannis Papadopoulos c7936334cf rocr/aie: Changing variable names 2025-03-11 19:35:21 -04:00
Yiannis Papadopoulos fb33e2e724 rocr/aie: Handle non-HSA_STATUS_SUCCESS during VisitRegion 2025-03-11 19:35:21 -04:00
Longlong Yao a254e35fd6 rocr: export pointer type for OnlyAddress
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-03-11 10:16:58 -04:00
zichguan-amd 3415a500c7 Throw exception when runtime not initialized for hsa_amd_signal_wait_*
Signed-off-by: zichguan-amd <zichuan.guan@amd.com>
2025-03-07 15:17:10 -05:00
zichguan-amd e4d027191c rocr: Allow 0/NULL/invalid signal handles for wait operations to be no-op
Remove hard assertions for signal validation on hsa_amd_signal_wait_* operations, instead ignore 0/NULL/invalid signals in the dependency condition evaluation to align with HSA specs for barrier-AND and barrier-OR packets.

Signed-off-by: zichguan-amd <zichuan.guan@amd.com>
2025-03-07 15:17:10 -05:00
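A rough sketch of the "null handles are a no-op" rule for a wait-all style condition follows. It is not the ROCr implementation: `DepSignal` and `AllDependenciesMet` are hypothetical, the condition "value has reached 0" is an assumption borrowed from barrier-packet semantics, and detection of otherwise invalid handles is omitted.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical dependency record: a handle of 0 means "no signal".
struct DepSignal {
  uint64_t handle;
  int64_t value;
};

// Returns true when every non-null dependency is satisfied.
// Null handles are skipped instead of triggering an assertion, so a list
// made up only of null handles counts as already satisfied.
bool AllDependenciesMet(const std::vector<DepSignal>& deps) {
  for (const DepSignal& dep : deps) {
    if (dep.handle == 0) continue;      // null handle: ignore, do not assert
    if (dep.value != 0) return false;   // assumed condition: value reached 0
  }
  return true;
}
```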
David Yat Sin 02b38d0614 rocr: Put back scratch_backing_memory_byte_size
The scratch_backing_memory_byte_size is not used by CP, but it is
currently used by rocgdb. Putting the field back, but we need to find a
solution for alt_scratch_backing_memory_byte_size.

Also, completely disabling alternate scratch as we need some changes to
support debugger.
2025-03-06 16:23:38 -05:00
David Yat Sin 6dac90c89a rocr: Only expose ext-fine-grain pool on xgmi-hive systems
We cannot guarantee system-scope coherency on systems with only PCIe
connections, so do not expose extended fine-grain memory pool on these
systems.
2025-03-05 10:41:38 -05:00
Lao, Darren 0cd46b6582 rocr: Change grid dimensions
Signed-off-by: Lao, Darren <Darren.Lao@amd.com>
2025-03-04 16:19:51 -05:00
David Yat Sin d031af9eb5 rocr: Check RLIMIT_CORE before generating coredump
Check for RLIMIT_CORE before collecting data for coredump. If the
current limit is 0, then we can return early without spending time
collecting coredump data.
2025-03-04 10:29:34 -05:00
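The early-out described here maps onto the POSIX `getrlimit` call. The sketch below shows one way to write it; `CoreDumpAllowed` and `MaybeGenerateCoreDump` are hypothetical names, and treating a failed query as "allowed" is an assumption, not something the commit states.

```cpp
#include <sys/resource.h>

// Pure check, kept separate so the policy can be tested without
// changing process limits.
bool CoreDumpAllowed(const rlimit& limit) { return limit.rlim_cur != 0; }

// Queries the current soft limit for core files.
// Assumption: if the query itself fails, proceed as if dumps were allowed.
bool CoreDumpAllowed() {
  rlimit limit{};
  if (getrlimit(RLIMIT_CORE, &limit) != 0) return true;
  return CoreDumpAllowed(limit);
}

// Hypothetical entry point: returns false without doing any work when the
// soft limit is 0, since the resulting core file could not be written anyway.
bool MaybeGenerateCoreDump() {
  if (!CoreDumpAllowed()) return false;
  // Expensive collection of dump data would go here.
  return true;
}
```

Checking the soft limit (`rlim_cur`) rather than the hard limit matches what decides whether a core file is actually written.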
David Yat Sin 3944da1d76 rocr: Only set asan flag on GPU agents 2025-03-03 14:51:19 -05:00
David Yat Sin 9a950ab788 rocr: Temporarily disable alternate scratch memory
Temporarily disable alternate scratch memory usage by default due to
some stability issues.
2025-03-03 09:27:29 -05:00
Khatri, Shweta 0984a1f0fd rocr: GFX9, GFX10, GFX11: Use view3dAs2dArray flag, for thick/3D swizzle modes. (#58)
An HSA_IMAGE_ENABLE_3D_SWIZZLE_DEBUG environment flag already exists to
enable/disable this. Its default value is false (view3dAs2dArray = 1), so
support for swizzles that do 3D interleaving is disabled by default.
Enabling this flag enables support for such swizzles on GFX9, GFX10 and
GFX11.
2025-02-26 09:38:17 -05:00
Tony Gutierrez d3a4dc9687 rocr: Remove KMT usage from AMD ext
Use the core Driver in AMD's HSA extension API to make it
agnostic to the underlying OS and kernel-mode driver.
2025-02-25 21:51:52 -05:00
Khatri, Shweta 322a794cf6 rocr: Adding support for Stochastic PC Sampling for gfx94x (#47)
Change-Id: Ide4c2e25b88f1f25ea4ce35a619b93963c0355ee
2025-02-22 00:13:08 -05:00
Tony Gutierrez a9f6bc8d0e rocr: Remove KMT usage from CPU agent
Use the core Driver object in the CPU agent to make it OS/driver
agnostic.

Implement the GetMemoryProperties() and GetCacheProperties() methods
for the KFD driver.
2025-02-21 10:00:38 -05:00
David Yat Sin 107b48fb15 rocr: Add queries for async scratch reclaim
Add support for these 2 new queries:
- HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_MAX
  Maximum amount of scratch memory allowed on this agent

- HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_CURRENT
  Current limit for scratch memory on this agent
2025-02-19 21:02:00 -05:00
David Yat Sin aa2f98e6f9 rocr: Update for new async scratch reclaim
Updating ROCr code to match new handshake protocol with CP FW for
asynchronous scratch reclaim.
Increase previous limits when scratch reclaim feature is available.
2025-02-19 21:02:00 -05:00
David Yat Sin 2f8a9b28d0 rocr: Remove unused fields in amd_queue_t
scratch_wave64_lane_byte_size and alt_scratch_wave64_lane_byte_size are
not used by CP FW.
2025-02-19 21:02:00 -05:00
David Yat Sin 13c591d250 rocr: Remove gfx940 and gfx941 support 2025-02-19 12:16:24 -05:00
David Yat Sin fa8be44df9 rocr: Allow IPC signals in hsa_amd_signal_async_handler
Allow IPC signals to be registered with hsa_amd_signal_async_handler.
This forces AsyncEventsLoop to switch to polling instead of interrupts.
2025-02-19 11:19:09 -05:00
Adel Johar b4f8b5c202 Docs: Update environment variables page 2025-02-14 10:15:20 -05:00
Saleel Kudchadker 890399a7cf rocr: Skip uSleep for non-interrupt signals
- When waiting on non-interrupt signals, do not uSleep; sleeping causes
  regressions compared to interrupt signal usage.
- Cleanup code.

Change-Id: I706bda0b13e64ffec0b607c1915d8380a2ce0dea
2025-02-06 23:48:35 -05:00
Luna Nova 166b08346b rocr: set underlying type of hsa_region
Set underlying type of hsa_region_info_t, hsa_amd_region_info_t
to int.

Change-Id: Ibf97a025eec6176d8e28af8009e9bd6795ca061f
2025-02-06 16:25:03 -05:00
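What fixing an enum's underlying type to `int` buys can be sketched in C++ as below. The enum and its enumerators are hypothetical, not the real `hsa_region_info_t` values, and the motivation given here (carrying extension values that are not named in the base enum) is one plausible reading rather than something the commit states.

```cpp
#include <type_traits>

// Hypothetical base enum with a fixed underlying type.
enum RegionInfo : int {
  REGION_INFO_SEGMENT = 0,
  REGION_INFO_SIZE = 2
};

// With a fixed underlying type, the representation is guaranteed to be int.
static_assert(std::is_same<std::underlying_type<RegionInfo>::type, int>::value,
              "RegionInfo is represented as int");

// Every int value is a valid RegionInfo value, so an extension value can be
// cast in and read back unchanged. Without the fixed type, converting a value
// outside the range of the enumerators is not guaranteed to be well defined.
int RoundTrip(int raw) {
  const RegionInfo info = static_cast<RegionInfo>(raw);
  return static_cast<int>(info);
}
```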
sonadeem ff01f62777 cmake: Fix BUILD_SHARED_LIBS option and README for it
BUILD_SHARED_LIBS is a global flag so we don't need to set a default
option for it in both libhsakmt and hsa-runtime, only the top level
CMakeLists file. Also updated README to reflect that libhsakmt is
always built statically and gets linked to libhsa-runtime.

Change-Id: I1511f68a268032bec9758bc731d8074f33ec980f
2025-01-30 14:17:27 -05:00
Sv. Lockal 5d04bd42f3 Fix build issues for musl libc (#267)
Change-Id: Ia31330b0f96669966712b58986abeca754c2cbb9
2025-01-29 14:31:05 +00:00
Yiannis Papadopoulos 03bd4c9508 rocr: Changing to a device SVM flag
Change-Id: Ib085801d23604eeef0a17a05cf2b298170fb3d24
2025-01-28 17:06:16 +00:00
Yiannis Papadopoulos 144e7674d1 rocr: Use SVM information to separate dev heap
Use SVM information from user accessible memory

Change-Id: I8fad37eb1a90dc1f5827a096552130a3fd6187f4
2025-01-28 17:05:52 +00:00