This commit introduces MakeMemoryResident and MakeMemoryUnresident
functions to the KfdDriver and XdnaDriver classes.
- Added implementations in amd_kfd_driver.cpp
- Added stubs in amd_xdna_driver.cpp returning HSA_STATUS_ERROR
- Updated header files amd_kfd_driver.h and amd_xdna_driver.h
- Removed MakeKfdMemoryResident/Unresident from amd_memory_region.cpp
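A minimal sketch of the resulting shape follows. It assumes simplified signatures, and the real methods may take more arguments. Each backend overrides the same virtual pair, and the XDNA stub reports HSA_STATUS_ERROR. The hsa_status_t enum is a local stand-in for hsa.h, and the std::set bookkeeping replaces the real thunk mapping call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Stand-in for the status codes defined in hsa.h.
enum hsa_status_t { HSA_STATUS_SUCCESS = 0x0, HSA_STATUS_ERROR = 0x1000 };

// Simplified driver interface; these parameter lists are hypothetical.
class Driver {
 public:
  virtual ~Driver() = default;
  virtual hsa_status_t MakeMemoryResident(const void* mem, size_t size,
                                          uint64_t* alternate_va) = 0;
  virtual hsa_status_t MakeMemoryUnresident(const void* mem) = 0;
};

// KFD-style driver. A real implementation would map the range to the GPU
// through the thunk; here a set only records which buffers are resident.
class KfdDriver : public Driver {
 public:
  hsa_status_t MakeMemoryResident(const void* mem, size_t size,
                                  uint64_t* alternate_va) override {
    if (mem == nullptr || size == 0) return HSA_STATUS_ERROR;
    resident_.insert(mem);
    // Identity "mapping": the GPU address equals the host address here.
    if (alternate_va != nullptr)
      *alternate_va = static_cast<uint64_t>(reinterpret_cast<uintptr_t>(mem));
    return HSA_STATUS_SUCCESS;
  }
  hsa_status_t MakeMemoryUnresident(const void* mem) override {
    return resident_.erase(mem) == 1 ? HSA_STATUS_SUCCESS : HSA_STATUS_ERROR;
  }

 private:
  std::set<const void*> resident_;
};

// XDNA stub: residency is not supported yet, so every call fails.
class XdnaDriver : public Driver {
 public:
  hsa_status_t MakeMemoryResident(const void*, size_t, uint64_t*) override {
    return HSA_STATUS_ERROR;
  }
  hsa_status_t MakeMemoryUnresident(const void*) override {
    return HSA_STATUS_ERROR;
  }
};
```

Callers hold a Driver& and never need to know which backend they are talking to.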
Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
This commit completes the memory register/deregister interface change.
- Removed the static RegisterMemory and DeregisterMemory functions from the MemoryRegion class
- Added corresponding pure virtual methods to the base Driver interface
- Added the implementation in KfdDriver
- Modified MemoryRegion::Lock and MemoryRegion::Unlock to use the driver interface
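Here is a rough sketch of the delegation. It assumes the Driver methods are named RegisterMemory/DeregisterMemory and take simplified arguments. MemoryRegion holds a reference to its driver, and Lock/Unlock forward to it instead of calling static helpers. The std::map in the KFD-style class is only a placeholder for the real registration call.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Stand-in for the status codes defined in hsa.h.
enum hsa_status_t { HSA_STATUS_SUCCESS = 0x0, HSA_STATUS_ERROR = 0x1000 };

// Hypothetical signatures; the real methods may take extra arguments.
class Driver {
 public:
  virtual ~Driver() = default;
  virtual hsa_status_t RegisterMemory(void* ptr, size_t size) = 0;
  virtual hsa_status_t DeregisterMemory(void* ptr) = 0;
};

// KFD-style driver that just tracks registered ranges.
class KfdDriver : public Driver {
 public:
  hsa_status_t RegisterMemory(void* ptr, size_t size) override {
    if (ptr == nullptr || size == 0) return HSA_STATUS_ERROR;
    registered_[ptr] = size;
    return HSA_STATUS_SUCCESS;
  }
  hsa_status_t DeregisterMemory(void* ptr) override {
    return registered_.erase(ptr) == 1 ? HSA_STATUS_SUCCESS : HSA_STATUS_ERROR;
  }
  size_t registered_count() const { return registered_.size(); }

 private:
  std::map<void*, size_t> registered_;
};

// MemoryRegion no longer owns static register helpers; it asks its driver.
class MemoryRegion {
 public:
  explicit MemoryRegion(Driver& driver) : driver_(driver) {}
  hsa_status_t Lock(void* host_ptr, size_t size) {
    return driver_.RegisterMemory(host_ptr, size);
  }
  hsa_status_t Unlock(void* host_ptr) { return driver_.DeregisterMemory(host_ptr); }

 private:
  Driver& driver_;
};
```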
Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
This commit introduces a new AvailableMemory API to the KfdDriver and
XdnaDriver classes.
- Implemented AvailableMemory in KfdDriver to return the available memory size
using hsaKmtAvailableMemory.
- Added a stub implementation of AvailableMemory in XdnaDriver that returns an error.
- Updated the GpuAgent class to use the new AvailableMemory API instead of
directly calling hsaKmtAvailableMemory.
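The call-site change can be sketched as follows, assuming a simplified AvailableMemory(node_id, &bytes) signature. FakeKfdDriver is a hypothetical stand-in that returns a fixed byte count where the real KfdDriver would call hsaKmtAvailableMemory. Returning 0 on failure is a choice made only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the status codes defined in hsa.h.
enum hsa_status_t { HSA_STATUS_SUCCESS = 0x0, HSA_STATUS_ERROR = 0x1000 };

class Driver {
 public:
  virtual ~Driver() = default;
  // Hypothetical signature: bytes still available on a node.
  virtual hsa_status_t AvailableMemory(uint32_t node_id, uint64_t* available) const = 0;
};

// Reports a fixed amount instead of querying the thunk.
class FakeKfdDriver : public Driver {
 public:
  explicit FakeKfdDriver(uint64_t bytes) : bytes_(bytes) {}
  hsa_status_t AvailableMemory(uint32_t, uint64_t* available) const override {
    if (available == nullptr) return HSA_STATUS_ERROR;
    *available = bytes_;
    return HSA_STATUS_SUCCESS;
  }

 private:
  uint64_t bytes_;
};

class GpuAgent {
 public:
  GpuAgent(const Driver& driver, uint32_t node_id) : driver_(driver), node_id_(node_id) {}
  const Driver& driver() const { return driver_; }
  // Goes through driver() rather than calling KMT directly.
  uint64_t QueryAvailableMemory() const {
    uint64_t available = 0;
    if (driver().AvailableMemory(node_id_, &available) != HSA_STATUS_SUCCESS) return 0;
    return available;
  }

 private:
  const Driver& driver_;
  uint32_t node_id_;
};
```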
Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
This change adds a const-qualified version of the driver() method to the Agent
class, allowing const Agent objects to access their associated driver without
modifying the object's state.
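The overload pair looks roughly like this. The Agent here is a hypothetical, stripped-down class that stores a Driver pointer, and Driver::id() is a placeholder accessor. Without the const overload, DriverIdOf would not compile, because a const Agent& cannot call a non-const member function.

```cpp
#include <cassert>

class Driver {
 public:
  int id() const { return 7; }  // placeholder accessor
};

class Agent {
 public:
  explicit Agent(Driver& driver) : driver_(&driver) {}
  Driver& driver() { return *driver_; }
  // Const overload: usable through a const Agent& without a const_cast.
  const Driver& driver() const { return *driver_; }

 private:
  Driver* driver_;
};

// Compiles only because the const overload exists.
int DriverIdOf(const Agent& agent) { return agent.driver().id(); }
```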
Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
Add AllocateScratchMemory interface to Driver base class and implement it
in both KFD and XDNA drivers. This change encapsulates the low-level
scratch memory allocation details within driver implementations, making
the code more maintainable and the interface cleaner.
The main changes include:
- Add AllocateScratchMemory virtual method to Driver interface
- Implement the interface in KfdDriver with existing allocation logic
- Add stub implementation in XdnaDriver
- Update GpuAgent to use the new interface instead of direct KMT calls
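Below is a sketch of how the encapsulation could look, under an assumed AllocateScratchMemory(node_id, size, &mem) signature. FakeKfdDriver is hypothetical: a bump allocator over a host buffer takes the place of the existing KMT allocation logic. AcquireScratch stands in for the GpuAgent call site, which now sees only the driver interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the status codes defined in hsa.h.
enum hsa_status_t { HSA_STATUS_SUCCESS = 0x0, HSA_STATUS_ERROR = 0x1000 };

class Driver {
 public:
  virtual ~Driver() = default;
  // Hypothetical signature.
  virtual hsa_status_t AllocateScratchMemory(uint32_t node_id, uint64_t size, void** mem) = 0;
};

// Bump allocator standing in for the real scratch allocation path.
class FakeKfdDriver : public Driver {
 public:
  explicit FakeKfdDriver(size_t capacity) : pool_(capacity) {}
  hsa_status_t AllocateScratchMemory(uint32_t, uint64_t size, void** mem) override {
    if (mem == nullptr || size == 0 || size > pool_.size() - used_) return HSA_STATUS_ERROR;
    *mem = pool_.data() + used_;
    used_ += size;
    return HSA_STATUS_SUCCESS;
  }

 private:
  std::vector<unsigned char> pool_;
  size_t used_ = 0;
};

// XDNA stub: scratch allocation is not supported.
class XdnaDriver : public Driver {
 public:
  hsa_status_t AllocateScratchMemory(uint32_t, uint64_t, void**) override {
    return HSA_STATUS_ERROR;
  }
};

// GpuAgent-style call site: no direct KMT calls, only the driver interface.
void* AcquireScratch(Driver& driver, uint32_t node_id, uint64_t size) {
  void* mem = nullptr;
  return driver.AllocateScratchMemory(node_id, size, &mem) == HSA_STATUS_SUCCESS ? mem : nullptr;
}
```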
Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
Move the wallclock frequency query from GpuAgent to the driver layer to improve
code organization and support multiple driver types. This change:
1. Adds a GetWallclockFrequency API to the KFD and XDNA drivers
2. Moves the libdrm GPU info query from GpuAgent into the driver implementation
3. Updates GpuAgent to use the new driver API
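A sketch of the new flow follows, assuming the driver reports the frequency in Hz through GetWallclockFrequency(node_id, &hz). FakeKfdDriver and its 100 MHz value are illustrative, and the real driver queries libdrm. GpuAgent reads the frequency once at construction. TicksToNs, a hypothetical helper, shows one way such a value is typically consumed: converting counter deltas to nanoseconds while splitting the division to avoid 64-bit overflow.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the status codes defined in hsa.h.
enum hsa_status_t { HSA_STATUS_SUCCESS = 0x0, HSA_STATUS_ERROR = 0x1000 };

class Driver {
 public:
  virtual ~Driver() = default;
  // Hypothetical signature: wallclock counter frequency in Hz.
  virtual hsa_status_t GetWallclockFrequency(uint32_t node_id, uint64_t* frequency) const = 0;
};

class FakeKfdDriver : public Driver {
 public:
  hsa_status_t GetWallclockFrequency(uint32_t, uint64_t* frequency) const override {
    if (frequency == nullptr) return HSA_STATUS_ERROR;
    *frequency = 100000000;  // 100 MHz, an illustrative value only
    return HSA_STATUS_SUCCESS;
  }
};

class GpuAgent {
 public:
  explicit GpuAgent(const Driver& driver) {
    // Queried through the driver instead of calling libdrm here.
    if (driver.GetWallclockFrequency(0, &wallclock_hz_) != HSA_STATUS_SUCCESS)
      wallclock_hz_ = 0;
  }
  // Whole seconds and the remainder are scaled separately to avoid overflow.
  uint64_t TicksToNs(uint64_t ticks) const {
    assert(wallclock_hz_ != 0);
    return (ticks / wallclock_hz_) * 1000000000ull +
           (ticks % wallclock_hz_) * 1000000000ull / wallclock_hz_;
  }

 private:
  uint64_t wallclock_hz_ = 0;
};
```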
Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
- Implemented GetTileConfig in KfdDriver to retrieve tile configuration for
a specific node.
- Added a stub implementation of GetTileConfig in XdnaDriver.
- Updated driver.h to include a virtual GetTileConfig method.
- Extended hsa_internal.h with a new hsa_get_tile_config function.
- Integrated hsa_get_tile_config into hsa.cpp to call the driver-specific
implementation.
- Updated driver headers to declare the new GetTileConfig method.
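In the sketch below, the internal hsa_get_tile_config entry point only forwards to the agent's driver, so each backend decides whether it can answer. The TileConfig fields, the Agent struct, and the hsa_get_tile_config parameter list are all hypothetical simplifications. The values filled in by FakeKfdDriver are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the status codes defined in hsa.h.
enum hsa_status_t { HSA_STATUS_SUCCESS = 0x0, HSA_STATUS_ERROR = 0x1000 };

// Hypothetical, trimmed-down tile configuration.
struct TileConfig {
  uint32_t num_tile_configs = 0;
  uint32_t num_macro_tile_configs = 0;
};

class Driver {
 public:
  virtual ~Driver() = default;
  virtual hsa_status_t GetTileConfig(uint32_t node_id, TileConfig* config) const = 0;
};

class FakeKfdDriver : public Driver {
 public:
  hsa_status_t GetTileConfig(uint32_t, TileConfig* config) const override {
    if (config == nullptr) return HSA_STATUS_ERROR;
    config->num_tile_configs = 32;  // illustrative values
    config->num_macro_tile_configs = 16;
    return HSA_STATUS_SUCCESS;
  }
};

// XDNA stub.
class XdnaDriver : public Driver {
 public:
  hsa_status_t GetTileConfig(uint32_t, TileConfig*) const override { return HSA_STATUS_ERROR; }
};

struct Agent {
  const Driver* driver;
  uint32_t node_id;
};

// Hypothetical internal entry point: a thin forwarder to the driver.
hsa_status_t hsa_get_tile_config(const Agent* agent, TileConfig* config) {
  if (agent == nullptr || agent->driver == nullptr) return HSA_STATUS_ERROR;
  return agent->driver->GetTileConfig(agent->node_id, config);
}
```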
Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
This commit introduces a new GetClockCounters API to the driver interface.
- Implemented GetClockCounters in KfdDriver to fetch clock counters
using hsaKmtGetClockCounters.
- Added a stub implementation of GetClockCounters in XdnaDriver that
returns HSA_STATUS_ERROR.
- Modified GpuAgent to use driver().GetClockCounters instead of
directly calling hsaKmtGetClockCounters.
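One way to picture the GpuAgent side is a helper that samples the clocks through the driver and returns std::optional, so the XDNA stub's HSA_STATUS_ERROR surfaces as "no sample." ClockCounters loosely mirrors the counters the thunk reports but is a hypothetical struct. FakeKfdDriver advances fake counters on every call in place of hsaKmtGetClockCounters.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Stand-in for the status codes defined in hsa.h.
enum hsa_status_t { HSA_STATUS_SUCCESS = 0x0, HSA_STATUS_ERROR = 0x1000 };

// Hypothetical counter snapshot.
struct ClockCounters {
  uint64_t gpu_clock = 0;
  uint64_t cpu_clock = 0;
  uint64_t system_clock = 0;
  uint64_t system_clock_frequency_hz = 0;
};

class Driver {
 public:
  virtual ~Driver() = default;
  virtual hsa_status_t GetClockCounters(uint32_t node_id, ClockCounters* counters) = 0;
};

// Each call advances the fake counters so consecutive samples differ.
class FakeKfdDriver : public Driver {
 public:
  hsa_status_t GetClockCounters(uint32_t, ClockCounters* counters) override {
    if (counters == nullptr) return HSA_STATUS_ERROR;
    tick_ += 1000;
    *counters = {tick_, tick_ * 2, tick_ * 10, 1000000000};
    return HSA_STATUS_SUCCESS;
  }

 private:
  uint64_t tick_ = 0;
};

// XDNA stub.
class XdnaDriver : public Driver {
 public:
  hsa_status_t GetClockCounters(uint32_t, ClockCounters*) override { return HSA_STATUS_ERROR; }
};

// GpuAgent-side helper that goes through the driver, not the thunk.
std::optional<ClockCounters> SampleClocks(Driver& driver, uint32_t node_id) {
  ClockCounters counters;
  if (driver.GetClockCounters(node_id, &counters) != HSA_STATUS_SUCCESS) return std::nullopt;
  return counters;
}
```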
Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
This commit introduces a new GetDeviceHandle API to the driver
interface, allowing retrieval of the device handle for a
specific node.
- Implemented GetDeviceHandle in KfdDriver to fetch the AMD GPU
device handle using hsaKmtGetAMDGPUDeviceHandle.
- Added a stub implementation of GetDeviceHandle in XdnaDriver
that returns HSA_STATUS_ERROR.
- Modified GpuAgent::InitLibDrm to use driver().GetDeviceHandle
instead of directly calling hsaKmtGetAMDGPUDeviceHandle.
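Here is a small sketch of InitLibDrm obtaining its handle through the driver. It treats the device handle as an opaque void* and assumes a GetDeviceHandle(node_id, &handle) signature. The static int in FakeKfdDriver is a placeholder for the real device object, and the real InitLibDrm would then pass the handle on to libdrm.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the status codes defined in hsa.h.
enum hsa_status_t { HSA_STATUS_SUCCESS = 0x0, HSA_STATUS_ERROR = 0x1000 };

// Opaque device handle, treated as a pointer in this sketch.
using DeviceHandle = void*;

class Driver {
 public:
  virtual ~Driver() = default;
  virtual hsa_status_t GetDeviceHandle(uint32_t node_id, DeviceHandle* handle) const = 0;
};

class FakeKfdDriver : public Driver {
 public:
  hsa_status_t GetDeviceHandle(uint32_t, DeviceHandle* handle) const override {
    if (handle == nullptr) return HSA_STATUS_ERROR;
    static int fake_device = 0;  // stands in for the real device object
    *handle = &fake_device;
    return HSA_STATUS_SUCCESS;
  }
};

// XDNA stub.
class XdnaDriver : public Driver {
 public:
  hsa_status_t GetDeviceHandle(uint32_t, DeviceHandle*) const override {
    return HSA_STATUS_ERROR;
  }
};

class GpuAgent {
 public:
  GpuAgent(const Driver& driver, uint32_t node_id) : driver_(driver), node_id_(node_id) {}
  // Same shape as InitLibDrm: fetch the handle via the driver interface.
  hsa_status_t InitLibDrm() { return driver_.GetDeviceHandle(node_id_, &device_handle_); }
  DeviceHandle device_handle() const { return device_handle_; }

 private:
  const Driver& driver_;
  uint32_t node_id_;
  DeviceHandle device_handle_ = nullptr;
};
```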
Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
This commit introduces a new SetTrapHandler API to the driver interface.
- Implemented SetTrapHandler in KfdDriver to set trap handlers using
hsaKmtSetTrapHandler.
- Added a stub implementation of SetTrapHandler in XdnaDriver that returns
HSA_STATUS_ERROR.
- Updated the driver interface in driver.h to include the new SetTrapHandler
method.
- Modified GpuAgent to use driver().SetTrapHandler instead of directly calling
hsaKmtSetTrapHandler.
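This sketch assumes a simplified SetTrapHandler that takes the handler code address and size plus an optional trap buffer. FakeKfdDriver is a hypothetical stand-in that records what was installed instead of calling hsaKmtSetTrapHandler. InstallTrapHandler plays the role of the GpuAgent call site.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the status codes defined in hsa.h.
enum hsa_status_t { HSA_STATUS_SUCCESS = 0x0, HSA_STATUS_ERROR = 0x1000 };

class Driver {
 public:
  virtual ~Driver() = default;
  // Hypothetical signature: handler code plus an optional trap buffer.
  virtual hsa_status_t SetTrapHandler(uint32_t node_id, const void* handler,
                                      uint64_t handler_size, const void* buffer,
                                      uint64_t buffer_size) = 0;
};

// Records the installed handler instead of programming hardware.
class FakeKfdDriver : public Driver {
 public:
  hsa_status_t SetTrapHandler(uint32_t, const void* handler, uint64_t handler_size,
                              const void*, uint64_t) override {
    if (handler == nullptr || handler_size == 0) return HSA_STATUS_ERROR;
    installed_ = handler;
    return HSA_STATUS_SUCCESS;
  }
  const void* installed() const { return installed_; }

 private:
  const void* installed_ = nullptr;
};

// XDNA stub.
class XdnaDriver : public Driver {
 public:
  hsa_status_t SetTrapHandler(uint32_t, const void*, uint64_t, const void*,
                              uint64_t) override {
    return HSA_STATUS_ERROR;
  }
};

// GpuAgent-side call site: installs its handler via the driver interface.
hsa_status_t InstallTrapHandler(Driver& driver, uint32_t node_id,
                                const unsigned char* code, uint64_t code_size) {
  return driver.SetTrapHandler(node_id, code, code_size, nullptr, 0);
}
```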
Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
Extend hsa_amd_vmem_address_reserve/hsa_amd_vmem_address_reserve_align
to support the HSA_AMD_VMEM_ADDRESS_NO_REGISTER flag. A virtual address
range reserved with this flag can later be used by
hsa_amd_svm_attributes_set for SVM-based memory allocations.
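The flag's effect can be modeled as follows. This assumes, as the flag's name suggests, that a plain reservation is registered with the driver and a NO_REGISTER reservation is not. kNoRegisterFlag, FakeDriver, and ReserveAddress are hypothetical, and the bit value does not come from hsa_ext_amd.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical bit value standing in for HSA_AMD_VMEM_ADDRESS_NO_REGISTER.
constexpr uint64_t kNoRegisterFlag = 1ull << 0;

// Records which reserved ranges were registered.
struct FakeDriver {
  std::set<uintptr_t> registered;
  void Register(uintptr_t va) { registered.insert(va); }
};

// Reserve a VA range. Without the flag the range is also registered; with
// it, the range stays a bare reservation that SVM attributes can claim later.
uintptr_t ReserveAddress(FakeDriver& driver, uintptr_t& next_va, size_t size, uint64_t flags) {
  uintptr_t va = next_va;
  next_va += size;
  if ((flags & kNoRegisterFlag) == 0) driver.Register(va);
  return va;
}
```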
scratch_cache.h includes amd_gpu_agent.h, which in turn includes
scratch_cache.h. This circular include has been fixed by removing the
unnecessary header include.
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
The original version of hsa_amd_portable_export_dmabuf() did not
consider the conditions under which a dmabuf could be shared.
In the new version (hsa_amd_portable_export_dmabuf_v2()), the caller
can specify the flag HSA_AMD_DMABUF_MAPPING_TYPE_PCIE to indicate that
the dmabuf will be shared over PCIe. In that case, if the GPU is a PCIe
device that is not part of an XGMI hive and large BAR is not supported,
the function returns an error.
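The export check reduces to one boolean condition. Here it is as a standalone function over hypothetical agent properties. The GpuProperties struct and the kMappingTypePcie bit are assumptions standing in for the runtime's agent state and the real HSA_AMD_DMABUF_MAPPING_TYPE_PCIE value.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the status codes defined in hsa.h.
enum hsa_status_t { HSA_STATUS_SUCCESS = 0x0, HSA_STATUS_ERROR = 0x1000 };

// Hypothetical snapshot of the agent properties the check needs.
struct GpuProperties {
  bool is_pcie;       // GPU attached over PCIe
  bool in_xgmi_hive;  // GPU is part of an XGMI hive
  bool large_bar;     // large BAR is supported
};

// Hypothetical bit standing in for HSA_AMD_DMABUF_MAPPING_TYPE_PCIE.
constexpr uint64_t kMappingTypePcie = 1ull << 0;

hsa_status_t CheckDmabufExport(const GpuProperties& gpu, uint64_t flags) {
  const bool want_pcie = (flags & kMappingTypePcie) != 0;
  // Reject only when PCIe sharing is requested for a standalone PCIe GPU
  // without large BAR; every other combination passes this check.
  if (want_pcie && gpu.is_pcie && !gpu.in_xgmi_hive && !gpu.large_bar)
    return HSA_STATUS_ERROR;
  return HSA_STATUS_SUCCESS;
}
```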
Refactor variable assignments to use std::move() where appropriate.
Update function signatures to accept parameters by const& where appropriate.
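The two idioms can be shown side by side on a hypothetical KernelRegistry class. A read-only string parameter is taken by const& so the call copies nothing. A parameter that will be stored is taken by value and then std::move'd into place, so a caller passing a temporary or a moved-from string avoids a deep copy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class KernelRegistry {
 public:
  // Read-only parameter: const& avoids copying the string.
  bool Contains(const std::string& name) const {
    return std::find(names_.begin(), names_.end(), name) != names_.end();
  }
  // Sink parameter: take by value, then move into storage.
  void Add(std::string name) { names_.push_back(std::move(name)); }
  std::size_t size() const { return names_.size(); }

 private:
  std::vector<std::string> names_;
};
```

After `r.Add(std::move(k))`, `k` is left in a valid but unspecified state and should not be read again without reassignment.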
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
Changed variable assignments to use std::move() where appropriate.
Changed function signatures to pass string arguments by reference where appropriate.
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
Invalidate only the address range that covers the newly copied
code object. This avoids invalidating instruction cache (I$) lines that
belong to previously loaded code objects and may therefore improve the
I$ hit rate.
For the native and DTIF backends, consistently use HSAKMT_CALL(...) to
call hsaKmt APIs.
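Covering the copied code object comes down to rounding its start down and its end up to whole cache lines. The sketch below computes that half-open range. InvalidationRange is a hypothetical helper, and the 64-byte line size used in the example is an assumption rather than a property of any particular GPU.

```cpp
#include <cassert>
#include <cstdint>

struct Range {
  uint64_t begin;  // inclusive
  uint64_t end;    // exclusive
};

// Round [code_addr, code_addr + code_size) out to whole cache lines.
// line_size must be a power of two.
Range InvalidationRange(uint64_t code_addr, uint64_t code_size, uint64_t line_size) {
  assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
  const uint64_t mask = ~(line_size - 1);
  const uint64_t begin = code_addr & mask;
  const uint64_t end = (code_addr + code_size + line_size - 1) & mask;
  return {begin, end};
}
```

For example, a 0x100-byte object copied to 0x1010 with 64-byte lines gives the range [0x1000, 0x1140).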
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: David Yat Sin <David.YatSin@amd.com>
Adds a bool to the GPU agent and a public member method to
check if the GPU supports large BAR. This is needed so we can
check if large BAR is supported when a user tries to allocate
an AQL queue in device memory on a given GPU agent.
Also makes the AQL queue throw an exception if a device-side AQL queue
is requested and the GPU that owns the queue does not support large
BAR. Without this check, ROCr allows device-side queues whose ring
buffers fault when the user accesses them, and the user has no way
of knowing why the faults occur.
This relies on the fact that the KFD does not expose any links
from the CPU to the GPU if large BAR is not enabled (although
links from the GPU to the CPU may still be exposed by the KFD).
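Below is a sketch of both pieces: detecting large BAR from the link topology, and rejecting a device-memory queue up front. IoLink, DetectLargeBar, and the constructor shapes are hypothetical. std::runtime_error stands in for whatever exception type the runtime actually throws. Only the CPU-to-GPU direction is checked, because GPU-to-CPU links may exist even without large BAR.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical directed link as reported by the KFD topology.
struct IoLink {
  int from_node;
  int to_node;
};

// Large BAR is inferred from the presence of a CPU->GPU link.
bool DetectLargeBar(const std::vector<IoLink>& links, int cpu_node, int gpu_node) {
  for (const IoLink& l : links)
    if (l.from_node == cpu_node && l.to_node == gpu_node) return true;
  return false;
}

class GpuAgent {
 public:
  explicit GpuAgent(bool large_bar) : large_bar_(large_bar) {}
  bool LargeBarEnabled() const { return large_bar_; }

 private:
  bool large_bar_;
};

class AqlQueue {
 public:
  AqlQueue(const GpuAgent& agent, bool ring_in_device_memory) {
    // Fail at creation instead of faulting later on host ring accesses.
    if (ring_in_device_memory && !agent.LargeBarEnabled())
      throw std::runtime_error("device-side AQL queue requires large BAR");
  }
};
```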
This builds on a prior change that allowed a user-mode queue's
packet buffer to be allocated in device memory: the queue struct
itself can now also be allocated in device memory. This provides
additional latency benefits, particularly when dispatches are
issued from the GPU itself. Flags are added to support the
various use cases.
This patch adds recommended SDMA engine support for GPUs with a
limited number of XGMI SDMA engines. It uses one PCIe SDMA engine
for GPU <-> GPU copies, which helps improve all-to-all copy
performance.
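A hedged sketch of the selection idea: when only a few XGMI SDMA engines are available, a single PCIe engine joins the recommended set for peer copies. The bitmask representation, the SdmaEngines struct, and the min_xgmi_engines threshold for "limited" are all assumptions made for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical engine masks: bit i set means SDMA engine i is available.
struct SdmaEngines {
  uint32_t pcie_mask;  // engines that copy over PCIe
  uint32_t xgmi_mask;  // engines used for XGMI peer copies
};

// Lowest set bit of a mask, or 0 if the mask is empty.
uint32_t LowestBit(uint32_t mask) { return mask & (~mask + 1); }

// Recommended engines for GPU<->GPU copies: all XGMI engines, plus one
// PCIe engine when fewer than min_xgmi_engines XGMI engines exist.
uint32_t RecommendedPeerEngines(const SdmaEngines& e, std::size_t min_xgmi_engines) {
  uint32_t mask = e.xgmi_mask;
  if (std::bitset<32>(e.xgmi_mask).count() < min_xgmi_engines)
    mask |= LowestBit(e.pcie_mask);
  return mask;
}
```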
Signed-off-by: Shane Xiao <shane.xiao@amd.com>
Add a general shutdown stage in which agents release their resources.
This breaks a circular dependency during shutdown: allocated resources
must be freed before memory pools are deleted, but memory pools must
also be deleted before agents are destroyed.
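The resulting three-step order can be sketched with hypothetical Agent and MemoryPool types that log their teardown. ReleaseResources models the new stage: it frees what agents allocated from the pools while the pools still exist, so the pools can then be destroyed before the agents.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Records the teardown order so it can be inspected.
std::vector<std::string> g_log;

struct MemoryPool {
  ~MemoryPool() { g_log.push_back("pool destroyed"); }
};

struct Agent {
  bool holds_allocation = false;  // e.g. queue or scratch memory from a pool
  // New shutdown stage: free pool-backed resources while pools still exist.
  void ReleaseResources() {
    if (holds_allocation) {
      holds_allocation = false;
      g_log.push_back("agent released");
    }
  }
  ~Agent() { g_log.push_back("agent destroyed"); }
};

void Shutdown(std::vector<std::unique_ptr<Agent>>& agents,
              std::vector<std::unique_ptr<MemoryPool>>& pools) {
  for (auto& a : agents) a->ReleaseResources();  // 1. resources before pools
  pools.clear();                                 // 2. pools before agents
  agents.clear();                                // 3. agents last
}
```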