Commit Graph

923 Commits

Author SHA1 Message Date
Flora Cui a765dd7e94 rocr: add specific flag for blit kernel object
so that aql-to-pm4 conversion could verify the validity of the kernel
object.

Signed-off-by: Flora Cui <flora.cui@amd.com>
2025-07-17 21:55:02 +08:00
Honglei Huang 6c87f5b5ce rocr/driver: add memory residency management interface in driver
This commit introduces MakeMemoryResident and MakeMemoryUnresident
functions to KfdDriver and XdnaDriver classes.

- Added implementations in amd_kfd_driver.cpp
- Added stubs in amd_xdna_driver.cpp returning HSA_STATUS_ERROR
- Updated header files amd_kfd_driver.h and amd_xdna_driver.h
- Removed MakeKfdMemoryResident/Unresident from amd_memory_region.cpp

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-16 13:15:45 +08:00
Honglei Huang ab6bda7e96 rocr/driver: add memory registration and deregistration into driver
This commit completes the memory register/deregister interface change.

Removed static RegisterMemory and DeregisterMemory from MemoryRegion class

- Added pure virtual methods to base Driver interface in driver class
- Added implementation in KFD driver
- Modified MemoryRegion Lock and Unlock to use driver interface

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-16 13:15:45 +08:00
Honglei Huang 6c390e32cc rocr/driver: add AvailableMemory API to driver
This commit introduces a new AvailableMemory API to the KfdDriver and
XdnaDriver classes.

- Implemented AvailableMemory in KfdDriver to return the available memory size
  using hsaKmtAvailableMemory.
- Added a stub implementation of AvailableMemory in XdnaDriver that returns an error.
- Updated the GpuAgent class to use the new AvailableMemory API instead of
  directly calling hsaKmtAvailableMemory.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-16 13:15:45 +08:00
Honglei Huang 9c18618847 rocr: add const version of driver() method to Agent class
This change adds a const-qualified version of the driver() method to the Agent
class, allowing const Agent objects to access their associated driver without
modifying the object's state.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-16 13:15:45 +08:00
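
The accessor pair described in this commit can be sketched as follows. This is a minimal illustration, not the ROCr source: the `Driver` and `Agent` classes are simplified stand-ins, and `Version()`, `Touch()`, `touches()`, and `QueryVersion()` are hypothetical names introduced only for the example.

```cpp
#include <cassert>

// Simplified stand-in for a driver; not the real ROCr driver class.
class Driver {
 public:
  int Version() const { return 1; }  // hypothetical read-only query
  void Touch() { ++touches_; }       // hypothetical mutating call
  int touches() const { return touches_; }

 private:
  int touches_ = 0;
};

// Simplified stand-in for an agent that refers to its driver.
class Agent {
 public:
  explicit Agent(Driver* driver) : driver_(driver) { assert(driver_ != nullptr); }

  // Non-const overload: callers may invoke mutating driver methods.
  Driver& driver() { return *driver_; }
  // Const overload: usable through a const Agent, exposes a read-only view.
  const Driver& driver() const { return *driver_; }

 private:
  Driver* driver_;
};

// Compiles only because the const overload of driver() exists.
int QueryVersion(const Agent& agent) { return agent.driver().Version(); }
```

Without the const overload, `QueryVersion` would fail to compile because `agent` is a const reference.
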
Honglei Huang 8216787d4c rocr: use driver interface for scratch memory deallocation
Replace direct hsaKmtFreeMemory call with driver's FreeMemory interface
in GpuAgent::ReleaseResources(). This change improves code abstraction
by handling memory deallocation through the unified driver interface.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-16 13:15:45 +08:00
Honglei Huang da8dd9e1e3 rocr/driver: add scratch memory allocation into driver interface
Add AllocateScratchMemory interface to Driver base class and implement it
in both KFD and XDNA drivers. This change encapsulates the low-level
scratch memory allocation details within driver implementations, making
the code more maintainable and the interface cleaner.

The main changes include:
- Add AllocateScratchMemory virtual method to Driver interface
- Implement the interface in KfdDriver with existing allocation logic
- Add stub implementation in XdnaDriver
- Update GpuAgent to use the new interface instead of direct KMT calls

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-16 13:15:45 +08:00
Yiannis Papadopoulos bfe76cf94e rocr: Using MemoryRegion::GetInfo(HSA_REGION_INFO_ALLOC_MAX_SIZE) for HSA_AMD_MEMORY_POOL_INFO_ALLOC_MAX_SIZE 2025-07-15 14:22:56 -05:00
zichguan-amd 7946ddb647 rocr: check _SC_LEVEL1_DCACHE_LINESIZE before use
Support musl
Fixes ROCm/ROCR-Runtime#318

Signed-off-by: zichguan-amd <zichuan.guan@amd.com>
2025-07-14 14:44:31 -04:00
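
One way to guard the query is sketched below, assuming the `_SC_LEVEL1_DCACHE_LINESIZE` macro is simply not defined by C libraries such as musl. The function name `CacheLineSize` and the 64-byte fallback are assumptions made for the example, not values taken from the commit.

```cpp
#include <unistd.h>

#include <cstddef>

// Returns the L1 data cache line size in bytes, or a fallback when the
// C library cannot report it.
std::size_t CacheLineSize() {
  constexpr std::size_t kFallback = 64;  // assumed default, not from the commit
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
  // sysconf may return -1 (unsupported) or 0 (unknown), so validate the result.
  const long size = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
  if (size > 0) return static_cast<std::size_t>(size);
#endif
  return kFallback;
}
```
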
Chris Freehill c065d9a7e2 rocr: Ensure AqlQueue can exit on memory error
A hang would occur when a memory error occurs because the
AQLQueue destructor would be waiting for a signal that
wouldn't come. This change allows it to break out of the
wait loop.
2025-07-11 12:58:21 -05:00
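
The idea of letting a wait loop break out on error can be sketched as below. This is not the AqlQueue code: `QueueState`, its two flags, and `WaitForCompletion` are hypothetical names, and the real runtime waits on HSA signals rather than polling atomics.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical shared state between a queue and its completion handler.
struct QueueState {
  std::atomic<bool> done{false};          // set when the expected signal arrives
  std::atomic<bool> memory_error{false};  // set when a memory error is reported
};

// Returns true if the completion signal arrived, false if the wait was
// abandoned because an error means the signal will never come.
bool WaitForCompletion(const QueueState& state) {
  while (!state.done.load(std::memory_order_acquire)) {
    if (state.memory_error.load(std::memory_order_acquire)) {
      return false;  // break out instead of hanging forever
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return true;
}
```
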
Honglei Huang 412e386b50 rocr/driver: move wallclock frequency query to driver layer
Move the wallclock frequency query from GpuAgent to driver layer to improve
code organization and support multiple driver types. This change:

1. Add GetWallclockFrequency API to KFD/XDNA drivers
2. Move libdrm GPU info query from GpuAgent to driver implementation
3. Update GpuAgent to use the new driver API

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-11 16:14:29 +08:00
Honglei Huang 9bc38e2ee6 rocr/driver: add support for getting GPU tile configuration
- Implemented GetTileConfig in KfdDriver to retrieve tile configuration for
a specific node.
- Added a stub implementation of GetTileConfig in XdnaDriver.
- Updated driver.h to include a virtual GetTileConfig method.
- Extended hsa_internal.h with a new hsa_get_tile_config function.
- Integrated hsa_get_tile_config into hsa.cpp to call the driver-specific
  implementation.
- Updated driver headers to declare the new GetTileConfig method.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-11 16:14:29 +08:00
Honglei Huang 8d077dba3b rocr/driver: add GetClockCounters API to driver interface
This commit introduces a new GetClockCounters API to the driver interface.

- Implemented GetClockCounters in KfdDriver to fetch clock counters
  using hsaKmtGetClockCounters.
- Added a stub implementation of GetClockCounters in XdnaDriver that
  returns HSA_STATUS_ERROR.
- Modified GpuAgent to use driver().GetClockCounters instead of
  directly calling hsaKmtGetClockCounters.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-11 16:14:29 +08:00
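
The pattern that this series of driver commits follows (a virtual method on the driver interface, a real implementation for KFD, a stub for XDNA, and the agent calling through `driver()`) can be sketched as below. All types here are simplified, hypothetical stand-ins: `Status`, `ClockCounters`, `FakeKfdDriver`, `FakeXdnaDriver`, and `ReadClocks` are not the ROCr definitions, and the fake KFD driver returns made-up values where the real one would call `hsaKmtGetClockCounters`.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

enum class Status { kSuccess, kError };  // stand-in for hsa_status_t

struct ClockCounters {  // stand-in for the real counters structure
  std::uint64_t gpu_clock = 0;
  std::uint64_t system_clock = 0;
};

// Driver interface: the agent depends only on this abstraction.
class Driver {
 public:
  virtual ~Driver() = default;
  virtual Status GetClockCounters(std::uint32_t node_id, ClockCounters* out) const = 0;
};

class FakeKfdDriver final : public Driver {
 public:
  Status GetClockCounters(std::uint32_t node_id, ClockCounters* out) const override {
    if (out == nullptr) return Status::kError;
    // A real driver would query the kernel here; these values are made up.
    out->gpu_clock = 1000u + node_id;
    out->system_clock = 2000u + node_id;
    return Status::kSuccess;
  }
};

class FakeXdnaDriver final : public Driver {
 public:
  Status GetClockCounters(std::uint32_t, ClockCounters*) const override {
    return Status::kError;  // stub: operation not supported by this driver
  }
};

class Agent {
 public:
  Agent(std::unique_ptr<Driver> driver, std::uint32_t node_id)
      : driver_(std::move(driver)), node_id_(node_id) {
    assert(driver_ != nullptr);
  }
  const Driver& driver() const { return *driver_; }
  // The agent no longer calls the kernel interface directly.
  Status ReadClocks(ClockCounters* out) const {
    return driver().GetClockCounters(node_id_, out);
  }

 private:
  std::unique_ptr<Driver> driver_;
  std::uint32_t node_id_;
};
```
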
Honglei Huang 05b83e72d9 rocr/driver: add GetDeviceHandle to driver interface
This commit introduces a new GetDeviceHandle API to the driver
interface, allowing retrieval of the device handle for a
specific node.

- Implemented GetDeviceHandle in KfdDriver to fetch the AMD GPU
  device handle using hsaKmtGetAMDGPUDeviceHandle.
- Added a stub implementation of GetDeviceHandle in XdnaDriver
  that returns HSA_STATUS_ERROR.
- Modified GpuAgent::InitLibDrm to use driver().GetDeviceHandle
  instead of directly calling hsaKmtGetAMDGPUDeviceHandle.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-11 16:14:29 +08:00
Honglei Huang 837fd044d0 rocr: replace DMABuf export paths by driver interface
This change improves code maintainability and error handling by
centralizing DMABuf export functionality in the driver interface.

- Replace direct hsaKmtExportDMABufHandle calls with driver's ExportDMABuf method
- Improve error handling with more specific error status returns
- Add explicit invalid parameter checks and assertions
- Consolidate DMABuf export logic in IPC and VMemory paths
- Propagate detailed error status from driver layer

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-07-11 13:36:45 +08:00
Tony Gutierrez cb7b0c8d9f rocr: Remove driver usage from filter device
Slightly refactor the RvdFilter so it doesn't need to call into the driver.
2025-07-10 09:41:34 -07:00
David Yat Sin 4c2dec5bb8 doc: Fix doxygen comments for in-out params 2025-07-10 08:21:01 -04:00
Chris Freehill 12430fe25a rocr: Fix isa entries for gfx906/sramecc
Some of the entries for gfx906 in the ISA table in isa.cpp
had "any" for "sramecc-" instead of "disabled". This fixes
that.
2025-07-02 08:40:30 -05:00
Honglei Huang d874b8003a rocr/driver: add SetTrapHandler API to driver interface
This commit introduces a new SetTrapHandler API to the driver interface

- Implemented SetTrapHandler in KfdDriver to set trap handlers using
  hsaKmtSetTrapHandler.
- Added a stub implementation of SetTrapHandler in XdnaDriver that returns
  HSA_STATUS_ERROR.
- Updated the driver interface in driver.h to include the new SetTrapHandler
  method.
- Modified GpuAgent to use driver().SetTrapHandler instead of directly calling
  hsaKmtSetTrapHandler.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-06-27 23:32:53 +08:00
Honglei Huang dee5bdc679 rocr: replace direct libhsakmt calls with driver interfaces
Replace direct hsakmt API calls with calls through the driver abstraction layer
in queue management related functions. This includes:
- CreateQueue/DestroyQueue operations
- Queue update and GWS allocation
- CU masking configuration

Also update the corresponding error status types from HSAKMT_STATUS to
hsa_status_t and adjust error handling accordingly.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-06-26 15:53:01 +08:00
Honglei Huang 046591419f rocr: use driver interface for memory and cache properties query
Replace direct libhsakmt calls with driver interface methods
in GpuAgent initialization:
- Replace hsaKmtGetNodeMemoryProperties with driver().GetMemoryProperties
- Replace hsaKmtGetNodeCacheProperties with driver().GetCacheProperties

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
2025-06-26 15:53:01 +08:00
Honglei Huang ffa07e28e7 rocr: remove unused agent properties reference in scratch initialization
The agent properties variable `agent_props` was declared but never used
in the `InitScratchSRD()` function, which caused a compile warning:

runtime/core/runtime/amd_aql_queue.cpp:1880:15: warning:
unused variable ‘agent_props’ [-Wunused-variable]
 1880 |   const auto& agent_props = agent_->properties();

No functional changes, purely a code cleanup commit.
2025-06-26 13:05:40 +08:00
Tony Gutierrez 1a339feb1f rocr: Move OpenSMI call to Driver 2025-06-25 15:53:02 -07:00
Yiannis Papadopoulos 2ca4d8f6d4 rocr/aie: Remove redundant and unused functions. 2025-06-25 11:32:42 -04:00
Yiannis Papadopoulos e5125c9d5e rocr/aie: Correct calculation of neural cores and avoid error on invalid queue ID. 2025-06-25 11:32:42 -04:00
Ken O'Brien 7b8a6f8ca2 rocr: Fixes memory allocation issue
Fixes a bug in memory allocation in which dmabuf export only works on
GPU 0 in a multi-GPU environment.
2025-06-24 14:53:14 -04:00
Sunday Clement e97d06530e rocr: Add hsa-agent Queries for Clock Counters
Add support for querying the HSA_AMD_INFO_GET_CLOCK_COUNTERS agent info
through the HSA API in rocr, so that the user no longer has to make a
direct IOCTL call through the kernel driver.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-06-23 18:45:09 -04:00
Tony Gutierrez e03d44d742 rocr: Update Driver queue-related APIs
Update the user-mode driver queue APIs to leverage KMT types.

Move queue-related calls to the core::Driver API.
2025-06-23 12:21:01 -07:00
David Yat Sin b3c48cc68c rocr: support reserving non-registered VA
Extend hsa_amd_vmem_address_reserve/hsa_amd_vmem_address_reserve_align
to support HSA_AMD_VMEM_ADDRESS_NO_REGISTER flag. This allocation can be
used to reserve virtual address ranges that can later be used by
hsa_amd_svm_attributes_set for SVM based memory allocations.
2025-06-18 18:21:11 -04:00
Chris Freehill 24f36de037 rocr: Add missing close of dmabuf after import 2025-06-17 20:22:34 -04:00
David Yat Sin 488cfd467c rocr: Always send free scratch notifications
Always send notification to profiler tools when scratch memory is freed.
2025-06-16 17:39:33 -04:00
Alysa Liu 3b450397d6 rocr: Fix wrong sizeof argument
Update size calculation from 2 * sizeof(void*) to 2 * sizeof(uint64_t)

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-06-16 13:11:07 -04:00
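
Why the element type matters for the size can be illustrated with a small sketch. The names `kPayloadBytes` and `PackPair` are hypothetical, and the buffer layout is assumed for the example; the commit does not show the surrounding code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// The payload holds two 64-bit values, so its size must be derived from
// uint64_t. sizeof(void*) matches only on platforms with 64-bit pointers.
constexpr std::size_t kPayloadBytes = 2 * sizeof(std::uint64_t);
static_assert(kPayloadBytes == 16, "two 64-bit values occupy 16 bytes");

// Copies the pair into a caller-provided buffer of at least kPayloadBytes.
void PackPair(std::uint64_t a, std::uint64_t b, unsigned char* out) {
  const std::uint64_t pair[2] = {a, b};
  std::memcpy(out, pair, kPayloadBytes);
}
```
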
Sunday Clement 31b6474801 rocr: Remove Recursive Include
Removed an unnecessary header include to prevent a circular include.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-06-13 12:29:52 -04:00
Sunday Clement 06efa50c09 rocr: Fix Recursive Include in header files
scratch_cache.h includes amd_gpu_agent.h, which in turn includes
scratch_cache.h. This has been fixed by removing the unnecessary
header include.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-06-13 12:29:52 -04:00
David Yat Sin 3c0af843e3 rocr: Remove scratch_backing_memory_byte_size
scratch_backing_memory_byte_size was originally removed, and then put
back in 02b38d0614. This was because it
was used by rocgdb. rocgdb code has been updated to not use this field.
Bumped _amdgpu_r_debug for the ABI change.
2025-06-12 15:33:47 -04:00
David Yat Sin 24ce840732 rocr: Remove support for Kaveri GPUs
Kaveri GPUs are EoL
2025-06-12 10:38:58 -04:00
David Yat Sin 96d0f07b15 rocr: Fix compile warning when using clang 2025-06-12 10:38:58 -04:00
Alysa Liu 77b86ca908 rocr: Prevent int overflow in arithmetic operation
Cast range->x and range->y to uint64_t before performing multiplication

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-06-11 19:36:36 -04:00
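
The effect of widening before multiplying can be sketched as follows. The `Range` struct with 32-bit `x` and `y` fields is an assumption for the example; the commit does not show the actual field types.

```cpp
#include <cstdint>

// Hypothetical range type with 32-bit extents.
struct Range {
  std::uint32_t x;
  std::uint32_t y;
};

// Multiplies in 32 bits: the product wraps once it exceeds 2^32 - 1.
std::uint32_t WrappedCount(const Range& r) { return r.x * r.y; }

// Widens both operands first, so the multiplication happens in 64 bits.
std::uint64_t ElementCount(const Range& r) {
  return static_cast<std::uint64_t>(r.x) * static_cast<std::uint64_t>(r.y);
}
```

Casting only the result (for example `static_cast<std::uint64_t>(r.x * r.y)`) would not help, because the overflow has already happened by the time the cast is applied.
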
David Yat Sin df5d66eae5 rocr: document pseudo-code for scratch reclaim
Document CP FW and ROCr pseudo-code for asynchronous reclaim.
No code change.
2025-06-11 16:19:59 -04:00
Chris Freehill 3a9d14bb66 rocr: Add hsa_amd_portable_export_dmabuf_v2
The original version of hsa_amd_portable_export_dmabuf() did not
consider the conditions under which a dmabuf could be shared.
In the new version (hsa_amd_portable_export_dmabuf_v2()), the caller
can specify the flag HSA_AMD_DMABUF_MAPPING_TYPE_PCIE, which means they
want to share the dmabuf over PCIe. In that case, if the GPU is a PCIe GPU
that is not in an XGMI hive and large-BAR is not supported, the new code
returns an error.
2025-06-09 15:42:58 -05:00
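
The check described in this message can be written as a small predicate. This is a sketch of the stated condition only: `GpuInfo`, `kMappingTypePcie`, and `CanExportDmabuf` are hypothetical names, and the flag value is not the real value of HSA_AMD_DMABUF_MAPPING_TYPE_PCIE.

```cpp
#include <cstdint>

// Hypothetical flag bit standing in for the PCIe mapping type.
constexpr std::uint32_t kMappingTypePcie = 1u << 0;

// Hypothetical description of the exporting GPU.
struct GpuInfo {
  bool is_pcie;       // device is attached over PCIe
  bool in_xgmi_hive;  // device is part of an XGMI hive
  bool large_bar;     // large-BAR is supported
};

// Returns true if the export request may proceed.
bool CanExportDmabuf(const GpuInfo& gpu, std::uint32_t flags) {
  const bool wants_pcie = (flags & kMappingTypePcie) != 0;
  // Sharing over PCIe from a PCIe GPU outside an XGMI hive needs large-BAR.
  if (wants_pcie && gpu.is_pcie && !gpu.in_xgmi_hive && !gpu.large_bar) {
    return false;
  }
  return true;
}
```
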
Sunday Clement dce52be686 rocr: Fix Unintentional Integer Overflow
It's safer to have the integer literal explicitly be an unsigned long
in this expression as that's what the type of the errorCode variable
resolves to, preventing any overflow errors.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-06-09 15:16:10 -04:00
Alysa Liu 9b3d15e68d rocr: Remove structurally dead code
Remove unreachable return statement.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-06-09 14:01:39 -04:00
Sunday Clement 1635746a9c rocr: Fix Potential Deadlock
Moved the call to pthread_mutex_lock into an else branch for better
code readability.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-06-04 10:18:09 -04:00
Sunday Clement a97b7df4b9 rocr: Fix Potential Deadlock
Because eventDescrp->mutex is a non-recursive lock, attempting to
acquire it with pthread_mutex_lock can cause the system to hang
indefinitely if the lock was already acquired by the
preceding call to pthread_mutex_trylock.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-06-04 10:18:09 -04:00
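
The hazard and the else-branch fix from the two "Fix Potential Deadlock" commits can be sketched as follows. The function `LockCountingContention` and its contention counter are hypothetical; only the `pthread_mutex_trylock` and `pthread_mutex_lock` calls come from the commit messages.

```cpp
#include <pthread.h>

// Acquires the mutex, counting how often the fast path failed.
// Returns 0 on success; the mutex is then held exactly once.
int LockCountingContention(pthread_mutex_t* mutex, int* contended) {
  if (pthread_mutex_trylock(mutex) == 0) {
    // The lock is already held here. Calling pthread_mutex_lock now on a
    // non-recursive mutex would make this thread wait on itself.
    return 0;
  } else {
    // Only block when the non-blocking attempt did not succeed.
    ++*contended;
    return pthread_mutex_lock(mutex);
  }
}
```
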
Alysa Liu f6c8cbd293 rocr: Fix inefficient copy operations
Refactor variable assignments to use std::move() where appropriate.
Update function headers to accept parameters by const& where appropriate.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-06-02 11:18:36 -04:00
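
The two techniques named in these copy-related commits can be illustrated with a small sketch. The `Registry` class and its methods are hypothetical and are not taken from the ROCr sources.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container used only to illustrate the two techniques.
class Registry {
 public:
  // Read-only parameter: const& avoids copying the caller's string.
  bool Contains(const std::string& name) const {
    return std::find(names_.begin(), names_.end(), name) != names_.end();
  }

  // Sink parameter: take by value, then move into storage so the buffer
  // is transferred instead of copied a second time.
  void Add(std::string name) { names_.push_back(std::move(name)); }

  std::size_t size() const { return names_.size(); }

 private:
  std::vector<std::string> names_;
};
```

A caller that no longer needs its string can write `registry.Add(std::move(s))` to avoid the copy into the parameter as well.
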
Alysa Liu ae6851dbb4 rocr: Fixed inefficient copy operations
Changed variable assignments to use std::move() where appropriate.
Changed function headers to pass string arguments by reference where appropriate.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-06-02 11:18:36 -04:00
Alysa Liu a945b5d493 rocr: Fixed inefficient copy operations
Changed variable assignments to use std::move() where appropriate.
Revert change in amd_kfd_driver.cpp.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-06-02 11:18:36 -04:00
Alysa Liu 369d89ade3 rocr: Fixed inefficient copy operations
Changed variable assignments to use std::move() where appropriate

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-06-02 11:18:36 -04:00
Sunday Clement 293092f32f rocr: Fix Resource Leak
Allocated memory was previously not freed in the event of an error
with rwlock initialization.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-05-30 09:16:26 -04:00
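
The shape of such a fix can be sketched as below: every allocation made before `pthread_rwlock_init` is released on the failure path. The `Table` struct and the `CreateTable` and `DestroyTable` functions are hypothetical and do not reflect the actual code touched by the commit.

```cpp
#include <pthread.h>

#include <cstddef>
#include <cstdlib>

// Hypothetical object that owns a buffer and a read-write lock.
struct Table {
  pthread_rwlock_t lock;
  int* entries;
};

// Returns a fully initialized table, or nullptr with nothing leaked.
Table* CreateTable(std::size_t count) {
  Table* table = static_cast<Table*>(std::calloc(1, sizeof(Table)));
  if (table == nullptr) return nullptr;

  table->entries = static_cast<int*>(std::calloc(count, sizeof(int)));
  if (table->entries == nullptr) {
    std::free(table);
    return nullptr;
  }

  if (pthread_rwlock_init(&table->lock, nullptr) != 0) {
    // Free everything allocated so far before reporting the error.
    std::free(table->entries);
    std::free(table);
    return nullptr;
  }
  return table;
}

void DestroyTable(Table* table) {
  if (table == nullptr) return;
  pthread_rwlock_destroy(&table->lock);
  std::free(table->entries);
  std::free(table);
}
```
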
David Yat Sin fc561ff37a rocr: Add all sysfs entries for L2 Cache
For L2 Cache and above, we report the total amount of cache for the
whole partition, so we add up the L2 Cache entry for each partition.
2025-05-29 19:02:38 -04:00