rocm-systems

Author	SHA1	Message	Date
David Yat Sin	b48b401a09	rocr: Fix logic for scratch reclaim Fix logic error that can cause scratch memory to be reclaimed while a dispatch is still using it. [ROCm/ROCR-Runtime commit: `4ed5950beb`]	2025-04-29 17:23:45 -04:00
Tony Gutierrez	ce61e3301b	rocr: Add large_bar_enabled var to the GPU agent Adds a bool to the GPU agent and a public member method to check if the GPU supports large BAR. This is needed so we can check if large BAR is supported when a user tries to allocate an AQL queue in device memory on a given GPU agent. Also adds an exception to the AQL queue if device-side AQL queues are requested and the GPU that owns the AQL queue doesn't support large BAR. Otherwise, ROCr would allow device-side queues that fault when the user touches their ring buffers, and the user would not know why the faults are occurring. This relies on the fact that the KFD does not expose any links from the CPU to the GPU if large BAR is not enabled (though links from the GPU to the CPU may still be exposed by the KFD). [ROCm/ROCR-Runtime commit: `f2c482d923`]	2025-04-23 15:53:29 -04:00
Tony Gutierrez	6f37386eb2	rocr: Flags to alloc queue buf/struct in dev mem This builds on a prior change that allowed a user-mode queue's packet buffer to be allocated in device memory, extending it so the queue struct can also be allocated in device memory. This provides additional latency benefits, particularly when dispatches are performed from the GPU itself. Flags are added to support the various use cases. [ROCm/ROCR-Runtime commit: `6e3c375bf1`]	2025-04-23 15:53:29 -04:00
Tony Gutierrez	18404ba8a8	rocr: Remove empty shared.cpp [ROCm/ROCR-Runtime commit: `11d1d2cd25`]	2025-04-23 15:53:29 -04:00
Tony Gutierrez	3ebcf3020f	rocr/libhsakmt: Add coarse-grain allocator to GPU [ROCm/ROCR-Runtime commit: `adbc0495e2`]	2025-04-23 15:53:29 -04:00
Saleel Kudchadker	945d6da90b	rocr: return preferred SDMA engine mask - Add a new AMD extension API that returns the preferred SDMA engine mask. This can be used in conjunction with the copy_on_engine API to get optimal bandwidth. [ROCm/ROCR-Runtime commit: `57c0c643ce`]	2025-04-22 13:28:38 -07:00
Yiannis Papadopoulos	8246b54f1e	rocr/aie: Remove redundant cache flushes for already loaded PDIs [ROCm/ROCR-Runtime commit: `7c8fa87160`]	2025-04-17 09:48:41 -05:00
Shane Xiao	8d34f4e12d	rocr: Add rec sdma engines with limited XGMI SDMA engine This patch adds recommended SDMA engine support for systems with limited XGMI SDMA engines. It uses one PCIe SDMA engine for GPU <-> GPU copies, which helps improve all-to-all copy performance. Signed-off-by: Shane Xiao <shane.xiao@amd.com> [ROCm/ROCR-Runtime commit: `6a63170b38`]	2025-04-11 23:54:15 +08:00
David Yat Sin	309a1354ab	rocr: refactor PC Sampling PRED_EXEC op Refactor PRED_EXEC op command size calculation. Fix issue when copy size is less than 32MB. [ROCm/ROCR-Runtime commit: `c1b7aa39ed`]	2025-04-08 17:26:29 -04:00
Yiannis Papadopoulos	96b7e42776	rocr/aie: Increment write pointer upon packet submission [ROCm/ROCR-Runtime commit: `2d2c47bdef`]	2025-04-08 15:36:40 -05:00
Yiannis Papadopoulos	f53a9c72c4	rocr/aie: Using PDI address instead of cu_mask for dispatch. Automatic hw ctx reconfiguration upon new PDI addition. [ROCm/ROCR-Runtime commit: `c63e01724c`]	2025-04-03 15:13:20 -05:00
Lancelot SIX	c813d2c62d	rocr: Replace tabs with spaces in trap handler source codes Use spaces consistently to format the trap handler code. This patch does not introduce any change in the trap handler. Using `git show -w` on this patch shows an empty diff. Change-Id: Ic0244dd203347146ffde65460cd87ecbcc43732a [ROCm/ROCR-Runtime commit: `e0359e5d35`]	2025-04-03 09:44:23 +01:00
David Yat Sin	f46bc26cff	rocr: Fix PC Sampling PRED_EXEC num dwords count Fix incorrect value for number of dwords in the PRED_EXEC command. [ROCm/ROCR-Runtime commit: `2a433e2b96`]	2025-04-01 15:53:45 -04:00
Lancelot SIX	fff4455589	Fix Stochastic sampling trap handler The trap handler should read the PERF_SNAPSHOT_DATA after all of PERF_SNAPSHOT_DATA, PERF_SNAPSHOT_PC_LO and PERF_SNAPSHOT_PC_HI. This patch fixes this. Change-Id: I7f78e16d7a0d8bfebb34906b4dff73c2eaeb5658 [ROCm/ROCR-Runtime commit: `6a4785f650`]	2025-03-31 10:20:19 +01:00
Lancelot SIX	23254f7a1d	trap_handler.s: Clear PERF_SNAPSHOT/HOST_TRAP before returning Make sure to clear the HOST_TRAP and PERF_SNAPSHOT bits before returning from the second level trap handler. As those bits are sticky, this ensures future re-entry to the trap handler (for context save for example) will not be confused with a sampling trap. Change-Id: I05e5e58779a650b324ac6e30d574dc6931340f13 Signed-off-by: Lancelot SIX <lancelot.six@amd.com> [ROCm/ROCR-Runtime commit: `eece210a5c`]	2025-03-31 10:20:19 +01:00
Yiannis Papadopoulos	2c731096c6	rocr/aie: Returning error code if query not recognized [ROCm/ROCR-Runtime commit: `0bd4acb5d4`]	2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos	c142b04fc1	rocr/aie: Bundling XDNA BOs and addresses, adding cleanup guard in case of error [ROCm/ROCR-Runtime commit: `e55503e7f8`]	2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos	4c0e8b5f70	rocr/aie: Avoiding XdnaDriver class in queue API [ROCm/ROCR-Runtime commit: `f4e1c9b0ba`]	2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos	bd109ec288	rocr/aie: Remove unused struct from HSA API [ROCm/ROCR-Runtime commit: `8dcbbf31c7`]	2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos	a11d693a47	rocr: Remove unused lambda [ROCm/ROCR-Runtime commit: `bf8ab493c4`]	2025-03-27 10:33:40 -04:00
Yiannis Papadopoulos	13723c6308	rocr/aie: Resolve parentheses warning [ROCm/ROCR-Runtime commit: `b066e0eefa`]	2025-03-27 10:33:40 -04:00
David Yat Sin	edcc3a1ed5	rocr: Release agent resources before pools Adding a general stage for agents to release their resources on shutdown. This avoids a circular dependency during shutdown because we have to delete allocated resources before deleting memory pools, but we also have to delete memory pools before destroying agents. [ROCm/ROCR-Runtime commit: `947391deac`]	2025-03-25 14:25:04 -04:00
Yiannis Papadopoulos	7a2b25e1ea	rocr: Release vmem handles before agent destruction [ROCm/ROCR-Runtime commit: `a66130bc48`]	2025-03-25 14:25:04 -04:00
Yiannis Papadopoulos	427962679e	rocr: Return success status in IsModelEnabled() [ROCm/ROCR-Runtime commit: `765563b786`]	2025-03-25 10:05:16 -04:00
lyndonli	e9c934c116	rocr: Remove redundant Refresh() call The initial call to Refresh() in the constructor is unnecessary as it's handled in Runtime::Load(). Signed-off-by: lyndonli <Lyndon.Li@amd.com> [ROCm/ROCR-Runtime commit: `c34a2798ce`]	2025-03-25 09:13:59 -04:00
Adel Johar	6195f65f9e	Docs: Add more variables to env_variables.rst [ROCm/ROCR-Runtime commit: `d8d27d4fd6`]	2025-03-20 11:59:58 -04:00
Shweta Khatri	b570f22aca	rocr: Fix PcSamplingCreateFromId to pass 32-bit dword count to DmaFill In PcSamplingCreateFromId, convert number of bytes into number of dwords because DmaFill expects a count of 32-bit words, not raw bytes. This prevents OOB writes on large sampling buffers. [ROCm/ROCR-Runtime commit: `2ae70735e8`]	2025-03-19 14:42:41 -04:00
Lao, Darren	c03e4cfe4d	rocr: Change ISA grid dimensions Signed-off-by: Lao, Darren <Darren.Lao@amd.com> [ROCm/ROCR-Runtime commit: `cd4d236185`]	2025-03-19 13:44:17 -04:00
randyh62	407704bf61	fix license include path [ROCm/ROCR-Runtime commit: `e2f3e8c0de`]	2025-03-18 16:29:10 -04:00
David Yat Sin	d94b4becd8	Revert rocr: Only expose ext-fine-grain pool on xgmi-hive systems This reverts commit `0097218f2b`. [ROCm/ROCR-Runtime commit: `ce0244ac03`]	2025-03-18 16:28:36 -04:00
jordans	938b34da24	hsakmt: Initial Commit for the HSA KMT Model The overarching goal is to provide an API that pre-silicon models can latch into for software bring-up. [ROCm/ROCR-Runtime commit: `d4b85b6bf5`]	2025-03-18 16:22:17 -04:00
David Yat Sin	9e8859636e	rocr: Workaround for SDMA POLL_REGMEM on gfx9.0 Poll the dependent signals twice on all gfx9.0 GPUs except gfx90a. This is needed as a work-around for a rare issue where SDMA_POLL_REGMEM may return before the memory is actually cleared. [ROCm/ROCR-Runtime commit: `6903a41b1d`]	2025-03-17 17:59:15 -04:00
Benjamin Welton	e62422520a	rocr: Reset event_age when signals move Resets event_age when signals move. Prior to this PR, event_age can become unaligned with hsa_event, causing hangs if the event_age exceeds the true hsa_event age. [ROCm/ROCR-Runtime commit: `d2a89a467b`]	2025-03-13 11:32:16 -04:00
Yiannis Papadopoulos	566269e8b7	rocr/aie: Changing variable names [ROCm/ROCR-Runtime commit: `c7936334cf`]	2025-03-11 19:35:21 -04:00
Yiannis Papadopoulos	8e111ff2f0	rocr/aie: Handle non-HSA_STATUS_SUCCESS during VisitRegion [ROCm/ROCR-Runtime commit: `fb33e2e724`]	2025-03-11 19:35:21 -04:00
Longlong Yao	007795951b	rocr: export pointer type for OnlyAddress Signed-off-by: Longlong Yao <Longlong.Yao@amd.com> [ROCm/ROCR-Runtime commit: `a254e35fd6`]	2025-03-11 10:16:58 -04:00
zichguan-amd	1d51406e80	Throw exception when runtime not initialized for hsa_amd_signal_wait_* Signed-off-by: zichguan-amd <zichuan.guan@amd.com> [ROCm/ROCR-Runtime commit: `3415a500c7`]	2025-03-07 15:17:10 -05:00
zichguan-amd	b172fbd538	rocr: Allow 0/NULL/invalid signal handles for wait operations to be no-op Remove hard assertions for signal validation on hsa_amd_signal_wait_* operations, instead ignore 0/NULL/invalid signals in the dependency condition evaluation to align with HSA specs for barrier-AND and barrier-OR packets. Signed-off-by: zichguan-amd <zichuan.guan@amd.com> [ROCm/ROCR-Runtime commit: `e4d027191c`]	2025-03-07 15:17:10 -05:00
David Yat Sin	e130172218	rocr: Put back scratch_backing_memory_byte_size The scratch_backing_memory_byte_size is not used by CP, but it is currently used by rocgdb. Putting the field back, but we need to find a solution for alt_scratch_backing_memory_byte_size. Also, completely disabling alternate scratch as we need some changes to support debugger. [ROCm/ROCR-Runtime commit: `02b38d0614`]	2025-03-06 16:23:38 -05:00
David Yat Sin	0097218f2b	rocr: Only expose ext-fine-grain pool on xgmi-hive systems We cannot guarantee system-scope coherency on systems with only PCIe connections, so do not expose the extended fine-grain memory pool on these systems. [ROCm/ROCR-Runtime commit: `6dac90c89a`]	2025-03-05 10:41:38 -05:00
Lao, Darren	de8e56a964	rocr: Change grid dimensions Signed-off-by: Lao, Darren <Darren.Lao@amd.com> [ROCm/ROCR-Runtime commit: `0cd46b6582`]	2025-03-04 16:19:51 -05:00
David Yat Sin	35faa9783a	rocr: Check RLIMIT_CORE before generating coredump Check for RLIMIT_CORE before collecting data for coredump. If the current limit is 0, then we can return early without spending time collecting coredump data. [ROCm/ROCR-Runtime commit: `d031af9eb5`]	2025-03-04 10:29:34 -05:00
David Yat Sin	0a8ce4b90d	rocr: Only set asan flag on GPU agents [ROCm/ROCR-Runtime commit: `3944da1d76`]	2025-03-03 14:51:19 -05:00
David Yat Sin	d93d05bcf1	rocr: Temporarily disable alternate scratch memory Temporarily disable alternate scratch memory usage by default due to some stability issues. [ROCm/ROCR-Runtime commit: `9a950ab788`]	2025-03-03 09:27:29 -05:00
Khatri, Shweta	9816c2ecd3	rocr: GFX9, GFX10, GFX11: Use view3dAs2dArray flag for thick/3D swizzle modes. (#58) An HSA_IMAGE_ENABLE_3D_SWIZZLE_DEBUG environment flag already exists to enable/disable this. The default value is false (view3dAs2dArray = 1). Enabling this flag enables support for swizzles that do 3D interleaving on GFX9, GFX10 and GFX11. By default, support for swizzles that do 3D interleaving is disabled. [ROCm/ROCR-Runtime commit: `0984a1f0fd`]	2025-02-26 09:38:17 -05:00
Tony Gutierrez	3b30b8a975	rocr: Remove KMT usage from AMD ext Use the core Driver in AMD's HSA extension API to make it agnostic to the underlying OS and kernel-mode driver. [ROCm/ROCR-Runtime commit: `d3a4dc9687`]	2025-02-25 21:51:52 -05:00
Khatri, Shweta	e00c926d27	rocr: Adding support for Stochastic PC Sampling for gfx94x (#47) Change-Id: Ide4c2e25b88f1f25ea4ce35a619b93963c0355ee [ROCm/ROCR-Runtime commit: `322a794cf6`]	2025-02-22 00:13:08 -05:00
Tony Gutierrez	727159b4db	rocr: Remove KMT usage from CPU agent Use the core Driver object in the CPU agent to make it OS/driver agnostic. Implement the GetMemoryProperties() and GetCacheProperties() methods for the KFD driver. [ROCm/ROCR-Runtime commit: `a9f6bc8d0e`]	2025-02-21 10:00:38 -05:00
David Yat Sin	2dcc1989bc	rocr: Add queries for async scratch reclaim Add support for these 2 new queries: - HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_MAX Maximum amount of scratch memory allowed on this agent - HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_CURRENT Current limit for scratch memory on this agent [ROCm/ROCR-Runtime commit: `107b48fb15`]	2025-02-19 21:02:00 -05:00
David Yat Sin	5905b82579	rocr: Update for new async scratch reclaim Updating ROCr code to match new handshake protocol with CP FW for asynchronous scratch reclaim. Increase previous limits when scratch reclaim feature is available. [ROCm/ROCR-Runtime commit: `aa2f98e6f9`]	2025-02-19 21:02:00 -05:00

1371 Commits