Граф коммитов

1152 Коммитов

Автор SHA1 Сообщение Дата
Yiannis Papadopoulos bf8ab493c4 rocr: Remove unused lambda 2025-03-27 10:33:40 -04:00
Yiannis Papadopoulos b066e0eefa rocr/aie: Resolve parentheses warning 2025-03-27 10:33:40 -04:00
David Yat Sin 947391deac rocr: Release agent resources before pools
Adding a general stage for agents to release their resources on
shutdown. This avoids a circular dependency during shutdown because
we have to delete allocated resources before deleting memory pools, but
we also have to delete memory pools before destroying agents.
2025-03-25 14:25:04 -04:00
Yiannis Papadopoulos a66130bc48 rocr: Release vmem handles before agent destruction 2025-03-25 14:25:04 -04:00
Yiannis Papadopoulos 765563b786 rocr: Return success status in IsModelEnabled() 2025-03-25 10:05:16 -04:00
lyndonli c34a2798ce rocr: Remove redundant Refresh() call
The initial call to Refresh() in the constructor is
unnecessary as it's handled in Runtime::Load().

Signed-off-by: lyndonli <Lyndon.Li@amd.com>
2025-03-25 09:13:59 -04:00
Adel Johar d8d27d4fd6 Docs: Add more variables to env_variables.rst 2025-03-20 11:59:58 -04:00
Shweta Khatri 2ae70735e8 rocr: Fix PcSamplingCreateFromId to pass 32-bit dword count to DmaFill
In PcSamplingCreateFromId, convert number of bytes into number of
dwords because DmaFill expects a count of 32-bit words, not raw bytes.
This prevents OOB writes on large sampling buffers.
2025-03-19 14:42:41 -04:00
Lao, Darren cd4d236185 rocr: Change ISA grid dimensions
Signed-off-by: Lao, Darren <Darren.Lao@amd.com>
2025-03-19 13:44:17 -04:00
randyh62 e2f3e8c0de fix license include path 2025-03-18 16:29:10 -04:00
David Yat Sin ce0244ac03 Revert rocr: Only expose ext-fine-grain pool on xgmi-hive systems
This reverts commit 6dac90c89a.
2025-03-18 16:28:36 -04:00
jordans d4b85b6bf5 hsakmt: Initial Commit for the HSA KMT Model
The over arching goal it so provide an API that pre-silicon models can latch into for software bring up.# Please enter the commit message for your changes. Lines starting
2025-03-18 16:22:17 -04:00
David Yat Sin 6903a41b1d rocr: Workaround for SDMA POLL_REGMEM on gfx9.0
Poll the dependent signals twice on all gfx9.0 GPUs except gfx90a.
This is needed as a work-around for a rare issue where SDMA_POLL_REGMEM
may return before the memory is actually cleared.
2025-03-17 17:59:15 -04:00
Benjamin Welton d2a89a467b rocr: Reset event_age when signals move
Resets event_age when signals move. Prior to this PR, event_age
can become unaligned with hsa_event, causing hangs if the event_age
exceeds the true hsa_event age.
2025-03-13 11:32:16 -04:00
Yiannis Papadopoulos c7936334cf rocr/aie: Changing variable names 2025-03-11 19:35:21 -04:00
Yiannis Papadopoulos fb33e2e724 rocr/aie: Handle non-HSA_STATUS_SUCCESS during VisitRegion 2025-03-11 19:35:21 -04:00
Longlong Yao a254e35fd6 rocr: export pointer type for OnlyAddress
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-03-11 10:16:58 -04:00
zichguan-amd 3415a500c7 Throw exception when runtime not initialized for hsa_amd_signal_wait_*
Signed-off-by: zichguan-amd <zichuan.guan@amd.com>
2025-03-07 15:17:10 -05:00
zichguan-amd e4d027191c rocr: Allow 0/NULL/invalid signal handles for wait operations to be no-op
Remove hard assertions for signal validation on hsa_amd_signal_wait_* operations, instead ignore 0/NULL/invalid signals in the dependency condition evaluation to align with HSA specs for barrier-AND and barrier-OR packets.

Signed-off-by: zichguan-amd <zichuan.guan@amd.com>
2025-03-07 15:17:10 -05:00
David Yat Sin 02b38d0614 rocr: Put back scratch_backing_memory_byte_size
The scratch_backing_memory_byte_size is not used by CP, but it is
currently used by rocgdb. Putting the field back, but we need to find a
solution for alt_scratch_backing_memory_byte_size.

Also, completely disabling alternate scratch as we need some changes to
support debugger.
2025-03-06 16:23:38 -05:00
David Yat Sin 6dac90c89a rocr: Only expose ext-fine-grain pool on xgmi-hive systems
We cannot guarrantee system-scope coherency on systems with only PCIe
connections, so do not expose extended fine-grain memory pool on these
systems.
2025-03-05 10:41:38 -05:00
Lao, Darren 0cd46b6582 rocr: Change grid dimensions
Signed-off-by: Lao, Darren <Darren.Lao@amd.com>
2025-03-04 16:19:51 -05:00
David Yat Sin d031af9eb5 rocr: Check RLIMIT_CORE before generating coredump
Check for RLIMIT_CORE before collecting data for coredump. If the
current limit is 0, then we can return early without spending time
collecting coredump data.
2025-03-04 10:29:34 -05:00
David Yat Sin 3944da1d76 rocr:Only set asan flag on GPU agents 2025-03-03 14:51:19 -05:00
David Yat Sin 9a950ab788 rocr: Temporarily disable alternate scratch memory
Temporarily disable alternate scratch memory usage by default due to
some stability issues.
2025-03-03 09:27:29 -05:00
Khatri, Shweta 0984a1f0fd rocr: GFX9, GFX10, GFX11: Use view3dAs2dArray flag, for thick/3D swizzle modes. (#58)
A HSA_IMAGE_ENABLE_3D_SWIZZLE_DEBUG environment flag exists already to
enable/disable this. Default value is false (view3dAs2dArray = 1)
Enabling this flag will enable support for swizzles that do 3D
interleaving on GFX9, GF10 and GFX11. By default support for swizzles that
do 3D interleaving is disabled.
2025-02-26 09:38:17 -05:00
Tony Gutierrez d3a4dc9687 rocr: Remove KMT usage from AMD ext
Use the core Driver in AMD's HSA extension API to make it
agnostic to the underlying OS and kernel-mode driver.
2025-02-25 21:51:52 -05:00
Khatri, Shweta 322a794cf6 rocr: Adding support for Stochastic PC Sampling for gfx94x (#47)
Change-Id: Ide4c2e25b88f1f25ea4ce35a619b93963c0355ee
2025-02-22 00:13:08 -05:00
Tony Gutierrez a9f6bc8d0e rocr: Remove KMT usage from CPU agent
Use the core Driver object in the CPU agent to make it OS/driver
agnostic.

Implement the GetMemoryProperties() and GetCacheProperties methods
for the KFD driver.
2025-02-21 10:00:38 -05:00
David Yat Sin 107b48fb15 rocr: Add queries for async scratch reclaim
Add support for these 2 new queries:
- HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_MAX
  Maximum amount of scratch memory allowed on this agent

- HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_CURRENT
  Current limit for scratch memory on this agent
2025-02-19 21:02:00 -05:00
David Yat Sin aa2f98e6f9 rocr: Update for new async scratch reclaim
Updating ROCr code to match new handshake protocol with CP FW for
asynchronous scratch reclaim.
Increase previous limits when scratch reclaim feature is available.
2025-02-19 21:02:00 -05:00
David Yat Sin 2f8a9b28d0 rocr: Remove unused fields in amd_queue_t
scratch_wave64_lane_byte_size and alt_scratch_wave64_lane_byte_size are
not used by CP FW.
2025-02-19 21:02:00 -05:00
David Yat Sin 13c591d250 rocr: Remove gfx940 and gfx941 support 2025-02-19 12:16:24 -05:00
David Yat Sin fa8be44df9 rocr: Allow IPC signals in hsa_amd_signal_async_handler
Allow IPC signals to be registered with hsa_amd_signal_async_handler.
This forces AsyncEventsLoop to switch to polling instead of interrupts.
2025-02-19 11:19:09 -05:00
Adel Johar b4f8b5c202 Docs: Update environment variables page 2025-02-14 10:15:20 -05:00
Saleel Kudchadker 890399a7cf rocr: Skip uSleep for non-interrupt signals
- When waiting on non-interrupt signals, do not uSleep. This causes
  regressions compared to interrupt signal usage.
- Cleanup code.

Change-Id: I706bda0b13e64ffec0b607c1915d8380a2ce0dea
2025-02-06 23:48:35 -05:00
Luna Nova 166b08346b rocr: set underlying type of hsa_region
Set underlying type of hsa_region_info_t, hsa_amd_region_info_t
to int.

Change-Id: Ibf97a025eec6176d8e28af8009e9bd6795ca061f
2025-02-06 16:25:03 -05:00
sonadeem ff01f62777 cmake: Fix BUILD_SHARED_LIBS option and README for it
BUILD_SHARED_LIBS is a global flag so we don't need to set a default
option for it in both libhsakmt and hsa-runtime, only the top level
CMakeLists file. Also updated README to reflect that libhsakmt is
always built statically and gets linked to libhsa-runtime.

Change-Id: I1511f68a268032bec9758bc731d8074f33ec980f
2025-01-30 14:17:27 -05:00
Sv. Lockal 5d04bd42f3 Fix build issues for musl libc (#267)
Change-Id: Ia31330b0f96669966712b58986abeca754c2cbb9
2025-01-29 14:31:05 +00:00
Yiannis Papadopoulos 03bd4c9508 rocr: Changing to a device SVM flag
Change-Id: Ib085801d23604eeef0a17a05cf2b298170fb3d24
2025-01-28 17:06:16 +00:00
Yiannis Papadopoulos 144e7674d1 rocr: Use SVM information to separate dev heap
Use SVM information from user accessible memory

Change-Id: I8fad37eb1a90dc1f5827a096552130a3fd6187f4
2025-01-28 17:05:52 +00:00
Min Zhou a82f2f3134 rocr: delete duplicated conditional expression
Change-Id: Idc8b1a8ca2975f33191a448f03cabf3fc4f8f8a6
2025-01-28 10:48:44 -05:00
Yiannis Papadopoulos 1d8a77db34 rocr/aie: AIE agent memory pools correct size and user data pool
Change-Id: I831711a7d1cdc36cbc9ed30bd74d0dc984228ce7
2025-01-28 10:48:16 -05:00
Yiannis Papadopoulos 26bfa0b8f6 rocr/aie: Add dma-buf import support for AIEAgents via the Driver interface
Change-Id: I70f8d8772dda7c06944d75042cb3034ddd89aff4
2025-01-27 15:22:46 -05:00
Shweta Khatri 6361466baa rocr: Use view3dAs2dArray flag, for thick/3D swizzle modes.
Added HSA_IMAGE_ENABLE_3D_SWIZZLE_DEBUG environment flag to
enable/disable this. Default value is false (view3dAs2dArray = 1)
Enabling this flag will enable support for swizzles that do 3D
interleaving. Note that all features of 3D images are supported
with 2D swizzles,it's just that the access patterns are different
and therefore cache hit-rates may be better or worse, depending
on how it's used. Volumetric algorithms do better with 3D and apps
that tend to access a single slice at a time do better with 2D.

Change-Id: Id8574a6710fe4333a1ee331e5ce9195a81434198
2025-01-27 09:28:33 -05:00
Tony Gutierrez 8a38f121ea rocr: Add WaitMultiple to core Signal
Replaces WaitAny with WaitMultiple to more closely align with the
underlying driver API for waiting on multiple events.

WaitMultiple adds a single parameter, wait_on_all, to the WaitAny
interface providing a single function for waiting on multiple
events when we only need AND and OR semantics for the signal
checking logic.

Change-Id: I68a4a45d48151d9d69aef02fd8f7263b9e6c0e75
2025-01-27 09:21:43 -05:00
David Yat Sin dab8f2fc65 rocr: Add support for gfx950
<squashed with patch for gfx950 generic targets>

Signed-off-by: Chris Freehill <Chris.Freehill@amd.com>

Change-Id: Ifec6d93cf46c7fbf736c6572882299e279260af6
2025-01-26 13:04:58 -05:00
Ben Vanik 7d64fe49fa rocr: Fix HostQueue to obey the alignment requirement
Change-Id: I06542e9ff94e826ca0abba0328b301fec50a95ea
2025-01-24 12:08:11 -05:00
David Yat Sin 7ea25ebb85 rocr: Add thread priority for AsyncEventHandler
Set priority to maximum for signal event handler and minimum for
exceptions event handler.

Change-Id: I1b982d3c2e4c880fafc073fe1a542d01692a6fdc
2025-01-24 10:08:12 -05:00
Ben Vanik 9971e7b004 rocr: Fixing non-portable inline attribute on hsa_flag_* utilities.
Change-Id: Ie1c53fef407a71b5ec4c6eaf3a3ed00871184408
2025-01-23 15:09:21 -05:00