Commit Graph

388 Commits

Author SHA1 Message Date
Flora Cui a765dd7e94 rocr: add specific flag for blit kernel object
so that aql-to-pm4 conversion could verify the validity of the kernel
object.

Signed-off-by: Flora Cui <flora.cui@amd.com>
2025-07-17 21:55:02 +08:00
Honglei Huang 6c87f5b5ce rocr/driver: add memory residency management interface in driver
This commit introduces MakeMemoryResident and MakeMemoryUnresident
functions to KfdDriver and XdnaDriver classes.

- Added implementations in amd_kfd_driver.cpp
- Added stubs in amd_xdna_driver.cpp returning HSA_STATUS_ERROR
- Updated header files amd_kfd_driver.h and amd_xdna_driver.h
- Removed MakeKfdMemoryResident/Unresident from amd_memory_region.cpp

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-16 13:15:45 +08:00
Honglei Huang ab6bda7e96 rocr/driver: add memory registration and deregistration into driver
This commit completes the memory register/deregister interface change.

Removed static RegisterMemory and DeregisterMemory from MemoryRegion class

- Added pure virtual methods to base Driver interface in driver class
- Added implementation in KFD driver
- Modified MemoryRegion Lock and Unlock to use driver interface

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-16 13:15:45 +08:00
Honglei Huang 6c390e32cc rocr/driver: add AvailableMemory API to driver
This commit introduces a new AvailableMemory API to the KfdDriver and
XdnaDriver classes.

- Implemented AvailableMemory in KfdDriver to return the available memory size
  using hsaKmtAvailableMemory.
- Added a stub implementation of AvailableMemory in XdnaDriver that returns an error.
- Updated the GpuAgent class to use the new AvailableMemory API instead of
  directly calling hsaKmtAvailableMemory.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-16 13:15:45 +08:00
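
The pattern repeated across these rocr/driver commits (a virtual method on the driver interface, a working KFD implementation, an XDNA stub that returns an error, and an agent that calls through the interface) can be sketched as follows. This is a minimal illustration, not the ROCr code: the `Status` enum, the method signature, the constructor argument, and the `QueryAvailableOrZero` helper are all assumptions, and the stored byte count stands in for the real `hsaKmtAvailableMemory` call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical status codes; the real runtime reports hsa_status_t values.
enum class Status { kSuccess, kError };

// Simplified stand-in for the driver base class.
class Driver {
 public:
  virtual ~Driver() = default;
  // Writes the free memory of node `node_id` to `*available_bytes`.
  virtual Status AvailableMemory(uint32_t node_id,
                                 uint64_t* available_bytes) const = 0;
};

// Stand-in for the KFD-backed driver. The stored value replaces the
// kernel-mode thunk query that the real implementation performs.
class KfdDriver : public Driver {
 public:
  explicit KfdDriver(uint64_t free_bytes) : free_bytes_(free_bytes) {}
  Status AvailableMemory(uint32_t /*node_id*/,
                         uint64_t* available_bytes) const override {
    assert(available_bytes != nullptr);
    *available_bytes = free_bytes_;
    return Status::kSuccess;
  }

 private:
  uint64_t free_bytes_;
};

// Stub driver: the operation is not supported, so it reports an error.
class XdnaDriver : public Driver {
 public:
  Status AvailableMemory(uint32_t /*node_id*/,
                         uint64_t* /*available_bytes*/) const override {
    return Status::kError;
  }
};

// Agent-side helper (hypothetical): calls through the interface instead of
// calling the thunk directly, and maps an error to zero bytes.
uint64_t QueryAvailableOrZero(const Driver& driver, uint32_t node_id) {
  uint64_t bytes = 0;
  return driver.AvailableMemory(node_id, &bytes) == Status::kSuccess ? bytes
                                                                     : 0;
}
```

With this shape, the caller never needs to know which driver backs the agent; `QueryAvailableOrZero(KfdDriver(4096), 0)` yields 4096, while the XDNA stub yields 0.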
Honglei Huang 9c18618847 rocr: add const version of driver() method to Agent class
This change adds a const-qualified version of the driver() method to the Agent
class, allowing const Agent objects to access their associated driver without
modifying the object's state.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-16 13:15:45 +08:00
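
A const-qualified accessor overload of the kind described above might look like the sketch below. The `Driver` and `Agent` classes here are simplified stand-ins with assumed members (`name()`, `Rename()`, a raw pointer to the driver); only the idea of pairing `driver()` with `driver() const` comes from the commit message.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Simplified stand-in for the driver class.
class Driver {
 public:
  explicit Driver(std::string name) : name_(std::move(name)) {}
  const std::string& name() const { return name_; }
  void Rename(std::string name) { name_ = std::move(name); }

 private:
  std::string name_;
};

// Simplified stand-in for the agent class.
class Agent {
 public:
  explicit Agent(Driver* driver) : driver_(driver) {
    assert(driver_ != nullptr);
  }

  // Non-const access: callers may invoke mutating driver methods.
  Driver& driver() { return *driver_; }
  // Const access: usable on const Agent objects, exposes a read-only view.
  const Driver& driver() const { return *driver_; }

 private:
  Driver* driver_;
};

// Overload resolution picks the accessor from the constness of the object.
static_assert(
    std::is_same<decltype(std::declval<Agent&>().driver()), Driver&>::value,
    "non-const Agent yields a mutable Driver reference");
static_assert(std::is_same<decltype(std::declval<const Agent&>().driver()),
                           const Driver&>::value,
              "const Agent yields a const Driver reference");

// Compiles only because the const overload exists.
std::string DriverNameOf(const Agent& agent) { return agent.driver().name(); }
```

Without the const overload, `DriverNameOf` would fail to compile, because a non-const member function cannot be called through a `const Agent&`.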
Honglei Huang da8dd9e1e3 rocr/driver: add scratch memory allocation into driver interface
Add AllocateScratchMemory interface to Driver base class and implement it
in both KFD and XDNA drivers. This change encapsulates the low-level
scratch memory allocation details within driver implementations, making
the code more maintainable and the interface cleaner.

The main changes include:
- Add AllocateScratchMemory virtual method to Driver interface
- Implement the interface in KfdDriver with existing allocation logic
- Add stub implementation in XdnaDriver
- Update GpuAgent to use the new interface instead of direct KMT calls

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-16 13:15:45 +08:00
Honglei Huang 412e386b50 rocr/driver: move wallclock frequency query to driver layer
Move the wallclock frequency query from GpuAgent to driver layer to improve
code organization and support multiple driver types. This change:

1. Add GetWallclockFrequency API to KFD/XDNA drivers
2. Move libdrm GPU info query from GpuAgent to driver implementation
3. Update GpuAgent to use the new driver API

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-11 16:14:29 +08:00
Honglei Huang 9bc38e2ee6 rocr/driver: add support for getting GPU tile configuration
- Implemented GetTileConfig in KfdDriver to retrieve tile configuration for
a specific node.
- Added a stub implementation of GetTileConfig in XdnaDriver.
- Updated driver.h to include a virtual GetTileConfig method.
- Extended hsa_internal.h with a new hsa_get_tile_config function.
- Integrated hsa_get_tile_config into hsa.cpp to call the driver-specific
  implementation.
- Updated driver headers to declare the new GetTileConfig method.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-11 16:14:29 +08:00
Honglei Huang 8d077dba3b rocr/driver: add GetClockCounters API to driver interface
This commit introduces a new GetClockCounters API to the driver interface.

- Implemented GetClockCounters in KfdDriver to fetch clock counters
  using hsaKmtGetClockCounters.
- Added a stub implementation of GetClockCounters in XdnaDriver that
  returns HSA_STATUS_ERROR.
- Modified GpuAgent to use driver().GetClockCounters instead of
  directly calling hsaKmtGetClockCounters.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-11 16:14:29 +08:00
Honglei Huang 05b83e72d9 rocr/driver: add GetDeviceHandle to driver interface
This commit introduces a new GetDeviceHandle API to the driver
interface, allowing retrieval of the device handle for a
specific node.

- Implemented GetDeviceHandle in KfdDriver to fetch the AMD GPU
  device handle using hsaKmtGetAMDGPUDeviceHandle.
- Added a stub implementation of GetDeviceHandle in XdnaDriver
  that returns HSA_STATUS_ERROR.
- Modified GpuAgent::InitLibDrm to use driver().GetDeviceHandle
  instead of directly calling hsaKmtGetAMDGPUDeviceHandle.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-11 16:14:29 +08:00
Tony Gutierrez cb7b0c8d9f rocr: Remove driver usage from filter device
Slightly refactor the RvdFilter so it doesn't need to call into the driver.
2025-07-10 09:41:34 -07:00
David Yat Sin 4c2dec5bb8 doc: Fix doxygen comments for in-out params 2025-07-10 08:21:01 -04:00
Honglei Huang d874b8003a rocr/driver: add SetTrapHandler API to driver interface
This commit introduces a new SetTrapHandler API to the driver interface

- Implemented SetTrapHandler in KfdDriver to set trap handlers using
  hsaKmtSetTrapHandler.
- Added a stub implementation of SetTrapHandler in XdnaDriver that returns
  HSA_STATUS_ERROR.
- Updated the driver interface in driver.h to include the new SetTrapHandler
  method.
- Modified GpuAgent to use driver().SetTrapHandler instead of directly calling
  hsaKmtSetTrapHandler.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-06-27 23:32:53 +08:00
Tony Gutierrez 1a339feb1f rocr: Move OpenSMI call to Driver 2025-06-25 15:53:02 -07:00
Yiannis Papadopoulos 2ca4d8f6d4 rocr/aie: Remove redundant and unused functions. 2025-06-25 11:32:42 -04:00
Tony Gutierrez e03d44d742 rocr: Update Driver queue-related APIs
Update the user-mode driver queue APIs to leverage KMT types.

Move queue-related calls to the core::Driver API.
2025-06-23 12:21:01 -07:00
David Yat Sin b3c48cc68c rocr: support reserving non-registered VA
Extend hsa_amd_vmem_address_reserve/hsa_amd_vmem_address_reserve_align
to support HSA_AMD_VMEM_ADDRESS_NO_REGISTER flag. This allocation can be
used to reserve virtual address ranges that can later be used by
hsa_amd_svm_attributes_set for SVM based memory allocations.
2025-06-18 18:21:11 -04:00
Sunday Clement 06efa50c09 rocr: Fix Recursive Include in header files
scratch_cache.h includes amd_gpu_agent.h, which in turn includes
scratch_cache.h. This is now fixed by removing the unnecessary
header include.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-06-13 12:29:52 -04:00
David Yat Sin 24ce840732 rocr: Remove support for Kaveri GPUs
Kaveri GPUs are EoL
2025-06-12 10:38:58 -04:00
David Yat Sin 96d0f07b15 rocr: Fix compile warning when using clang 2025-06-12 10:38:58 -04:00
Chris Freehill 3a9d14bb66 rocr: Add hsa_amd_portable_export_dmabuf_v2
The original version of hsa_amd_portable_export_dmabuf() did not
consider the conditions under which a dmabuf could be shared.
In the new version (hsa_amd_portable_export_dmabuf_v2()), the caller
can specify the flag HSA_AMD_DMABUF_MAPPING_TYPE_PCIE, which means they
want to share the dmabuf over PCIe. In that case, the new code checks
whether the GPU is a PCIe GPU that is not in an XGMI hive; if so, and
large BAR is not supported, it returns an error.
2025-06-09 15:42:58 -05:00
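
The condition described in this commit can be written as a small predicate. The sketch below is an assumption-laden illustration: the `GpuTopology` struct, the `MappingType` and `ExportStatus` enums, and `CheckDmabufExport` are hypothetical names, and only the PCIe / XGMI hive / large-BAR logic is taken from the message.

```cpp
#include <cassert>

// Hypothetical description of the exporting GPU.
struct GpuTopology {
  bool is_pcie_gpu;
  bool in_xgmi_hive;
  bool large_bar_supported;
};

// Hypothetical stand-ins for the mapping-type flag and the status result.
enum class MappingType { kNone, kPcie };
enum class ExportStatus { kOk, kError };

// Rejects a PCIe-sharing request only when the GPU is a PCIe GPU outside an
// XGMI hive and large BAR is unavailable; every other case is allowed.
ExportStatus CheckDmabufExport(const GpuTopology& gpu, MappingType type) {
  const bool blocked = type == MappingType::kPcie && gpu.is_pcie_gpu &&
                       !gpu.in_xgmi_hive && !gpu.large_bar_supported;
  return blocked ? ExportStatus::kError : ExportStatus::kOk;
}

// Usage example: the three cases that matter for a PCIe GPU.
void DemoDmabufChecks() {
  const GpuTopology no_bar{true, false, false};
  const GpuTopology with_bar{true, false, true};
  const GpuTopology in_hive{true, true, false};
  assert(CheckDmabufExport(no_bar, MappingType::kPcie) == ExportStatus::kError);
  assert(CheckDmabufExport(with_bar, MappingType::kPcie) == ExportStatus::kOk);
  assert(CheckDmabufExport(in_hive, MappingType::kPcie) == ExportStatus::kOk);
}
```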
Alysa Liu 9b3d15e68d rocr: Remove structurally dead code
Remove unreachable return statement.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-06-09 14:01:39 -04:00
Alysa Liu f6c8cbd293 rocr: Fix inefficient copy operations
Refactor variable assignments to use std::move() where appropriate.
Update function headers to accept parameters by const& where appropriate.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-06-02 11:18:36 -04:00
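
The two techniques named in these copy-operation commits, passing read-only arguments by `const&` and moving values that are no longer needed, can be illustrated with the generic sketch below. None of these names (`TotalLength`, `Symbol`, `AppendSymbol`) come from ROCr; they are placeholders chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Read-only parameter: a const reference avoids copying the whole vector.
std::size_t TotalLength(const std::vector<std::string>& names) {
  std::size_t total = 0;
  for (const std::string& name : names) total += name.size();
  return total;
}

class Symbol {
 public:
  // Sink parameter: taken by value, then moved into the member so the
  // string buffer is transferred rather than copied a second time.
  explicit Symbol(std::string name) : name_(std::move(name)) {}
  const std::string& name() const { return name_; }

 private:
  std::string name_;
};

// Builds a name locally and moves it into the container; `name` is not
// used after the move, so no copy of its contents is required.
void AppendSymbol(std::vector<Symbol>& out, const std::string& prefix,
                  int index) {
  assert(index >= 0);
  std::string name = prefix + std::to_string(index);
  out.emplace_back(std::move(name));
}
```

Calling `AppendSymbol(symbols, "kernel_", 3)` appends a `Symbol` named `kernel_3`, and the only string allocation is the one made when the name is built.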
Alysa Liu ae6851dbb4 rocr: Fixed inefficient copy operations
Changed variable assignments to use std::move() where appropriate.
Changed function headers to pass string arguments by reference where appropriate.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-06-02 11:18:36 -04:00
Alysa Liu 369d89ade3 rocr: Fixed inefficient copy operations
Changed variable assignments to use std::move() where appropriate

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-06-02 11:18:36 -04:00
David Yat Sin 11da1293de rocr: Fix compile warnings 2025-05-28 16:12:02 -04:00
David Yat Sin 0d70045817 rocr: Remove deprecated doorbell type 1 support 2025-05-28 16:12:02 -04:00
David Yat Sin 4bae509296 rocr: Remove deprecated queue doubleMap code 2025-05-28 16:12:02 -04:00
David Yat Sin b8434529a5 rocr: Remove queue_full_workaround code
Remove deprecated queue_full_workaround code as gfx7 and gfx8 GPUs are
EoL.
2025-05-28 16:12:02 -04:00
David Yat Sin 04dbf769f6 rocr: update required CP FW version
Update the CP FW version required for async-scratch memory support
on gfx950.
2025-05-28 13:03:58 -04:00
David Yat Sin 9d38ca0d22 rocr: Fix compile error when using clang 2025-05-27 23:56:28 -04:00
David Yat Sin da2607024b rocr: Perform memcpy for small code-object loads
On large BAR systems, for small-sized code-objects, we get better
performance using direct memcpy because of the latency of the blit-copy.
2025-05-22 18:39:19 -04:00
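
The choice between a direct memcpy and a blit copy could be expressed roughly as below. This is a sketch under stated assumptions: the 64 KiB threshold, the `CopyPath` enum, and both function names are hypothetical, and the blit path is simulated with `std::memcpy` because no GPU is involved here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

enum class CopyPath { kDirectMemcpy, kBlit };

// Hypothetical cutoff; the commit message does not state the real value.
constexpr std::size_t kSmallCodeObjectBytes = 64 * 1024;

// Small objects on large BAR systems take the CPU path; everything else
// goes through the blit engine.
CopyPath ChooseCopyPath(std::size_t size_bytes, bool large_bar) {
  return (large_bar && size_bytes <= kSmallCodeObjectBytes)
             ? CopyPath::kDirectMemcpy
             : CopyPath::kBlit;
}

// Copies `code` into `device_mem` and returns the path that was selected.
// Both paths are simulated with memcpy in this sketch.
CopyPath LoadCodeObject(std::vector<unsigned char>& device_mem,
                        const std::vector<unsigned char>& code,
                        bool large_bar) {
  assert(device_mem.size() >= code.size());
  const CopyPath path = ChooseCopyPath(code.size(), large_bar);
  if (!code.empty()) {
    std::memcpy(device_mem.data(), code.data(), code.size());
  }
  return path;
}
```

Keeping the decision in a separate function makes the threshold easy to tune or to test independently of the copy itself.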
David Yat Sin e969e01f54 rocr: Perform range based cache invalidates
Invalidate only the address range that covers the newly copied
code-object. This avoids invalidating I$ for old code objects and thus
might increase I$ hit rate.
2025-05-22 18:39:19 -04:00
Jiadong Zhu 0f9d2b836c rocr/dtif: use default signal for intercept queue for DTIF
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
2025-05-13 16:44:31 -04:00
Jiadong Zhu e2d767879d rocr/dtif: add hsaKmtQueueRingDoorbell in thunk loader
hsaKmtQueueRingDoorbell is specific to DTIF backend

Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Shane Xiao <shane.xiao@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
2025-05-13 16:44:31 -04:00
Aaron Liu e9088d6e47 rocr/dtif: add CreateThunkInstance/DestroyThunkInstance interfaces
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
2025-05-13 16:44:31 -04:00
Aaron Liu 0cd4ddd62b rocr/dtif: add DRM APIs wrapper in thunk loader
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
2025-05-13 16:44:31 -04:00
Aaron Liu 1b79caa214 rocr/dtif: replace hsakmt interfaces with HSAKMT_CALL(...)
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
2025-05-13 16:44:31 -04:00
Aaron Liu 7ba77fb193 rocr/dtif: add thunk loader to wrap hsaKmt APIs
For native and DTIF backends, unify to use HSAKMT_CALL(...) to call
hsaKmt APIs.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
2025-05-13 16:44:31 -04:00
christian-heusel 5cc61b714d rocr: Add missing cstdint include 2025-05-06 20:52:48 -04:00
Tony Gutierrez f2c482d923 rocr: Add large_bar_enabled var to the GPU agent
Adds a bool to the GPU agent and a public member method to
check if the GPU supports large BAR. This is needed so we can
check if large BAR is supported when a user tries to allocate
an AQL queue in device memory on a given GPU agent.

Also adds an exception to the AQL queue if device-side AQL queues
are requested and the GPU owner of the AQL doesn't support large
BAR. Otherwise, ROCr will currently allow device-side queues
that can cause faults when the user tries to touch their ring
buffers and the user will not know why the faults are occurring.

This relies on the fact that the KFD does not expose any links
from the CPU to the GPU if large BAR is not enabled (though
links from the GPU to the CPU may still be exposed by the KFD).
2025-04-23 15:53:29 -04:00
Tony Gutierrez 6e3c375bf1 rocr: Flags to alloc queue buf/struct in dev mem
This builds on a prior change that allowed for allocating
a user-mode queue's packet buffer in device memory to also
allocate the queue struct in device memory. This provides
additional latency benefits particularly for cases where
dispatches are performed from the GPU itself. Flags are
added to support the various use cases.
2025-04-23 15:53:29 -04:00
Tony Gutierrez adbc0495e2 rocr/libhsakmt: Add coarse-grain allocator to GPU 2025-04-23 15:53:29 -04:00
Saleel Kudchadker 57c0c643ce rocr: return preferred SDMA engine mask
- Add a new AMD extension API to return preferred SDMA engine mask.
This can be used in conjunction with copy_on_engine API to get
optimal bandwidth.
2025-04-22 13:28:38 -07:00
Shane Xiao 6a63170b38 rocr: Add rec sdma engines with limited XGMI SDMA engine
This patch adds recommended SDMA engine support for GPUs with
limited XGMI SDMA engines. It uses one PCIe SDMA engine
for GPU <-> GPU copies, which helps improve all-to-all
copy performance.

Signed-off-by: Shane Xiao <shane.xiao@amd.com>
2025-04-11 23:54:15 +08:00
Yiannis Papadopoulos 2d2c47bdef rocr/aie: Increment write pointer upon packet submission 2025-04-08 15:36:40 -05:00
Yiannis Papadopoulos c63e01724c rocr/aie: Using PDI address instead of cu_mask for dispatch. Automatic hw ctx reconfiguration upon new PDI addition. 2025-04-03 15:13:20 -05:00
Yiannis Papadopoulos e55503e7f8 rocr/aie: Bundling XDNA BOs and addresses, adding cleanup guard in case of error 2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos f4e1c9b0ba rocr/aie: Avoiding XdnaDriver class in queue API 2025-03-27 13:15:13 -04:00
David Yat Sin 947391deac rocr: Release agent resources before pools
Adding a general stage for agents to release their resources on
shutdown. This avoids a circular dependency during shutdown because
we have to delete allocated resources before deleting memory pools, but
we also have to delete memory pools before destroying agents.
2025-03-25 14:25:04 -04:00
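
The shutdown ordering described in this commit (agents release their resources, then memory pools are deleted, then agents are destroyed) can be sketched as a three-stage routine. The `Runtime` struct, its string-based agents and pools, and the log are hypothetical simplifications used only to make the order observable.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical runtime that records each shutdown step in `log`.
struct Runtime {
  std::vector<std::string> agents;
  std::vector<std::string> pools;
  std::vector<std::string> log;

  void Shutdown() {
    // Stage 1: agents free allocations that live in the memory pools.
    for (const std::string& agent : agents) log.push_back("release " + agent);
    // Stage 2: pools can now be deleted, since nothing allocated from them
    // is still held by an agent.
    for (const std::string& pool : pools) log.push_back("delete pool " + pool);
    pools.clear();
    // Stage 3: agents are destroyed last, after the pools are gone.
    for (const std::string& agent : agents) log.push_back("destroy " + agent);
    agents.clear();
    assert(agents.empty() && pools.empty());
  }
};
```

Splitting agent teardown into a release stage and a destroy stage is what breaks the cycle: each stage depends only on the stages before it.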