Commit graph

1298 commits

Author SHA1 Message Date
hkasivis 5e7210980e Users/hkasivis/add ais support v2.1 (#928)
* libhsakmt: Update hsakmt_fmm_get_handle to support address range

Currently, hsakmt_fmm_get_handle works only if the address is the
starting address of an allocation. Update it so it can find the handle if
the address falls within a valid allocated range. This is useful for the
AMD Infinity Storage feature, where data needs to be transferred to any
memory within the allocated range.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

* libhsakmt: Introduce AMD Infinity Storage (AIS) API

Add hsaKmtAisReadWriteFile() API to support AMD Infinity Storage. The
API moves data directly from GPU VRAM to a file.

v2: Add in/out ioctl arguments to provide more status information to
user space. Modify hsaKmt API also accordingly.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

* rocr: Initial implementation of AMD Infinity Storage (AIS)

Implement the first two APIs: hsa_amd_ais_file_write and hsa_amd_ais_file_read

v2: Change API from hsa_amd_ to hsa_amd_ais_
    Change API to take in handle instead of fd for compatibility across
    different platforms

Original Author: Chris Freehill <Chris.Freehill@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

---------

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2025-09-20 11:30:05 -04:00
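
The range lookup described for hsakmt_fmm_get_handle in the commit above can be illustrated with an ordered map keyed by start address. This is only a sketch of the general technique: the `Allocation` record, `AllocationMap`, and `FindHandle` are hypothetical names, and libhsakmt may use a different data structure internally.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical record for one allocation; not the libhsakmt data structure.
struct Allocation {
  std::size_t size;      // length of the allocation in bytes
  std::uint64_t handle;  // opaque handle associated with the allocation
};

// Allocations keyed by their starting address.
using AllocationMap = std::map<std::uintptr_t, Allocation>;

// Looks up the allocation that contains `addr`. Returns true and writes the
// handle to `handle_out` when `addr` lies anywhere inside an allocation,
// not only at its starting address.
bool FindHandle(const AllocationMap& allocs, std::uintptr_t addr,
                std::uint64_t* handle_out) {
  // upper_bound yields the first allocation that starts after addr, so the
  // only candidate that can contain addr is the entry just before it.
  AllocationMap::const_iterator it = allocs.upper_bound(addr);
  if (it == allocs.begin()) return false;
  --it;
  const std::uintptr_t start = it->first;
  if (addr - start >= it->second.size) return false;  // past the end
  *handle_out = it->second.handle;
  return true;
}
```

With this layout a lookup costs O(log n), and an address one byte past the end of an allocation is correctly rejected.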
Tony G c34c9826c3 rocr: Remove QueueProxy (#700)
Because the base QueueWrapper class copies the wrapped queue's
amd_queue_v2_t queue descriptor struct the QueueProxy seems
superfluous as it will have the same effect as calling the
underlying methods on the wrapped queue itself.

Additionally, because the QueueProxy needs to access the wrapped
queue's queue descriptor it breaks the Queue API which is meant
to abstract the underlying agent's queue implementation.

This makes it easier to generalize the core::Queue as well as
the InterceptQueue.

Signed-off-by: Tony Gutierrez <anthony.gutierrez@amd.com>
2025-09-19 09:07:28 -07:00
German Andryeyev 913743d433 Add windows build support into ROCr (#912)
Make sure ROCr can be compiled under Windows. Extra setup for the Windows build environment is required. The change should not cause any functional changes under Linux.
2025-09-19 10:10:17 -04:00
David Yat Sin 96a0d16eda rocr: Fix hsa_amd_pointer_info regression (#719)
Fix for hsa_amd_pointer_info returning only
HSA_EXT_POINTER_TYPE_RESERVED_ADDR for SVM allocations.
2025-09-19 10:09:22 -04:00
Sunday Clement 7c8e575f5d Fix Undefined behavior from signed bit shifts (#871)
* libhsakmt: fix UB due to signed integer literal in 1 << 31

Bit shift operations on signed numbers should not shift into or beyond
the sign bit as this results in Undefined Behaviour.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* libhsakmt: Fix UB due to signed integer literal in 1 << x

Bit-shifting a signed integer into or beyond the sign bit is undefined behavior.

BUG: SWDEV-532853

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Fix UB in various places due to signed integer in bit shift

Bit shifting signed integers into or beyond the sign bit is undefined.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Change signed integer literals to unsigned

Change the signed integer literals in the macro expressions throughout the
file to unsigned to avoid overflow.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

---------

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Co-authored-by: Flora Cui <flora.cui@amd.com>
2025-09-18 09:09:30 -04:00
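
The fix described in the commit above comes down to making the shifted operand unsigned. The helper below is a minimal sketch of that pattern; `Bit` is a hypothetical name and does not come from the libhsakmt or ROCr sources.

```cpp
#include <cstdint>

// Builds a 32-bit mask with only bit `n` set (n must be in [0, 31]).
// The shifted operand is unsigned, so setting bit 31 is well defined.
// With a plain `1` (a signed int) the same shift would move a bit into
// the sign position, which is the case the commits above avoid.
constexpr std::uint32_t Bit(unsigned n) {
  return static_cast<std::uint32_t>(1) << n;
}

static_assert(Bit(0) == 0x1u, "lowest bit");
static_assert(Bit(31) == 0x80000000u, "highest bit fits in uint32_t");
```

In macro definitions the equivalent change is writing `1U << x` (or `1UL`, `1ULL` for wider masks) instead of `1 << x`.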
systems-assistant[bot] f1fabcfd64 rocr: Error Handling Issues (#264)
* rocr: Fix Incorrect Assertion Check

The wrong variable is used in the assertion statement; it should check
the value of paramEndLoc after it is modified by the call to find().

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Fix Potential Undefined Behaviour

In the event that the SvmProfileControl destructor is called and
event == -1 is true then the call to close(event) is effectively
close(-1) which is undefined behaviour. This has been changed to only
call close() on valid file descriptors.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Add Error Check on Bytes Read

In the case that there is an incomplete read the call to copyTo() will
now return an error.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Fix Exception Error

Destructors are implicitly noexcept(true) by default, so unless a
destructor is explicitly marked noexcept(false), any exception that
escapes it, including one thrown by a function it calls, will cause the
program to crash.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

---------

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Co-authored-by: Sunday Clement <Sunday.Clement@amd.com>
2025-09-16 09:43:45 -04:00
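
The "Fix Exception Error" item in the commit above relies on a standard C++ rule: destructors are noexcept unless declared otherwise. The sketch below shows the rule and one common way to keep a destructor safe by catching inside it. All type and function names here are hypothetical and are not taken from ROCr; the commit itself may have resolved the issue differently.

```cpp
#include <stdexcept>
#include <type_traits>

struct DefaultDtor {
  ~DefaultDtor() {}  // implicitly noexcept(true)
};

struct ThrowingDtor {
  ~ThrowingDtor() noexcept(false) {}  // explicitly allowed to throw
};

static_assert(std::is_nothrow_destructible<DefaultDtor>::value,
              "destructors are noexcept by default");
static_assert(!std::is_nothrow_destructible<ThrowingDtor>::value,
              "noexcept(false) opts out");

// Hypothetical cleanup step that may fail.
void MayThrow(bool fail) {
  if (fail) throw std::runtime_error("cleanup failed");
}

// Keeps the destructor noexcept by handling the failure locally; an
// exception escaping here would otherwise call std::terminate.
struct SafeCleanup {
  bool fail;        // whether the cleanup step should fail
  bool* swallowed;  // set to true when an exception was caught
  ~SafeCleanup() {
    try {
      MayThrow(fail);
    } catch (...) {
      if (swallowed) *swallowed = true;
    }
  }
};
```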
Alysa Liu 2b2b8329b5 rocr: Add copyright for new files (#886)
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-09-11 10:56:31 -04:00
Benjamin Welton ed5b2ac165 Fix deadlock in InterceptQueue::Submit when packet count exceeds queue capacity (#855)
InterceptQueue::Submit had an "all-or-nothing" packet submission policy that
could cause infinite retry loops when the number of packets to submit exceeded
the available queue slots. When 504+ packets needed submission to a ~500-slot
queue, the system would:
1. Set submitted_count=0 (submit nothing)
2. Add retry barrier packet
3. Trigger async handler via StoreRelaxed
4. Attempt to submit overflow packets
5. Fail again due to same space constraints
6. Repeat

Solution:
Added partial packet submission capability during overflow processing while
preserving the original "all-or-nothing" behavior for normal operations.
When processing overflow packets and insufficient space exists for all packets,
the system now submits as many packets as possible rather than none.

The fix:
- Detects overflow processing via !overflow_.empty()
- Allows partial submission: submitted_count = free_slots - barrier_reservation
- Maintains atomicity guarantees for normal packet rewrites
- Prevents infinite retry loops by ensuring forward progress

This resolves deadlocks in high-throughput scenarios while maintaining
backward compatibility and the original design intent for packet rewrite
atomicity.
2025-09-09 14:06:29 -07:00
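
The submission policy described in the commit above can be summarized as a small decision function. This is a sketch under stated assumptions: `PacketsToSubmit` and its parameters are hypothetical, and the real InterceptQueue::Submit tracks more state than is modeled here.

```cpp
#include <cstddef>

// Decides how many pending packets to submit.
//   pending              packets waiting to be written to the queue
//   free_slots           slots currently free in the queue
//   barrier_reservation  slots held back for the retry barrier packet
//   processing_overflow  true when draining previously deferred packets
std::size_t PacketsToSubmit(std::size_t pending, std::size_t free_slots,
                            std::size_t barrier_reservation,
                            bool processing_overflow) {
  const std::size_t usable =
      free_slots > barrier_reservation ? free_slots - barrier_reservation : 0;
  if (pending <= usable) return pending;  // everything fits
  if (!processing_overflow) return 0;     // normal path: all-or-nothing
  return usable;                          // overflow path: submit what fits
}
```

For the scenario in the commit message (504 packets, roughly 500 slots), the overflow path now submits a partial batch and makes forward progress instead of retrying forever with zero packets.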
Flora Cui e7cb108a5e [rocr-runtime] Add support for WSL DXG devices (#854)
* rocr/rocdxg: add rocdxg support

* rocr/dxg: set flags for dxg env

* rocr: ring doorbell for dtif/dxg

* rocr/dxg: sdma changes

1. align command size to 64
2. call hsaKmtQueueRingDoorbell
3. disable gcr && hdp flush


Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Signed-off-by: tiancyin <tianci.yin@amd.com>
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-09-09 10:16:57 +08:00
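
Item 1 of the SDMA changes above ("align command size to 64") is an instance of rounding a size up to a power-of-two boundary. A minimal sketch follows; `AlignUp` is a hypothetical helper, and the unit of the 64 (bytes or dwords) is not stated in the commit.

```cpp
#include <cstddef>

// Rounds `value` up to the next multiple of `alignment`.
// `alignment` must be a power of two.
constexpr std::size_t AlignUp(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

static_assert(AlignUp(0, 64) == 0, "zero stays zero");
static_assert(AlignUp(1, 64) == 64, "rounds up");
static_assert(AlignUp(64, 64) == 64, "already aligned");
static_assert(AlignUp(65, 64) == 128, "next boundary");
```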
estewart08 bc35beafbf rocr: Remove extra LibElf find_package (#767)
This should have been removed when the libelf config search
was added.
2025-09-03 20:04:05 -04:00
systems-assistant[bot] 83a10986a4 SWDEV-539130 - Log blit copy duration (#258)
Co-authored-by: Pengda Xie <pengda.xie@amd.com>
2025-09-03 10:01:47 -07:00
SaleelK 230a22b395 rocr: Workaround for peak SDMA b/w on gfx94x (#626)
* Ideally SDMA0/1/2 are the engines to use for H2D/D2H due to physical
  PCIE proximity
* Allow using same src/dst agent for SDMA query apis
2025-09-03 09:33:29 -04:00
shwetakhatri-amd 79400a1f23 rocr: GFX12+ - Fix trap handler to process SW trap ID correctly (#736)
When stochastic sampling is not active, the trap handler is incorrectly
branching to .check_exceptions, bypassing the software trap ID checks
and in turn not advancing the PC. Fixed the issue to always check software
traps regardless of PC sampling state.

Co-authored-by: Shweta Khatri <shweta.khatri@amd.com>
2025-08-25 19:20:37 -04:00
cfreeamd a013e141b7 Revert "rocr: river interface changes" (#724)
This commit reverts the following related commits which cause
test failures:

6d15779b3e rocr/driver: add PC sampling support to driver interface
56cb9390ff rocr/driver: add PC sampling support to driver interface
76bf829f09 rocr/driver: add ASAN header page management to Driver class
a47c060d6a rocr/driver: add ASAN header page management to Driver class
02d7eaf3b7 rocr: add memory sharing call to Driver interface
9312468655 rocr: add memory sharing call to Driver interface
2025-08-25 12:44:26 +05:30
David Yat Sin a1597a358a rocr: Expose flag to allocate uncached memory (#674)
Add new flag for clients to directly request uncached memory
2025-08-22 09:52:39 -04:00
David Yat Sin 87b348c51d rocr: Fix hsa_amd_pointer_info regression (#638)
Fix regression when hsa_amd_pointer_info is called on a pointer that was
allocated using non-VMM APIs. The helper function VMemoryPtrInfo should
return error when the address is not found so that PtrInfo does the
lookup via Thunk.
2025-08-21 10:25:50 -04:00
jokim-amd 700afd2d17 Re-Enable IPC DMA Bufs by default
Let ROCr use the new IPC-DMA bufs path.
2025-08-14 18:49:09 -04:00
systems-assistant[bot] 3fd8af5974 rocr: SvmPrefetch to a particular node (#294)
Previously, regardless of the hsa_agent passed, the prefetch was always
directed to node 0. Now the prefetch targets the agent of interest.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Co-authored-by: Sunday Clement <Sunday.Clement@amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
2025-08-14 09:52:45 -04:00
David Yat Sin 875fb40a03 Dayatsin/develop vmm pointer info (#305)
* rocr: hsa_amd_pointer_info to support VMEM pointers

Extend hsa_amd_pointer_info to support virtual memory addresses.

If hsa_amd_pointer_info is called on an address that is reserved but not
mapped to memory, then the pointer type will be reported as
HSA_EXT_POINTER_TYPE_RESERVED_ADDR.

If hsa_amd_pointer_info is called on an address that is mapped, then the
pointer type will be reported as HSA_EXT_POINTER_TYPE_HSA_VMEM

* rocrtst: VirtMemory_Basic_Test test for pointer info

Extend rocrtstFunc.VirtMemory_Basic_Test to test for
hsa_amd_pointer_info

* rocrtst: Add SVM Memory Test
2025-08-13 14:21:47 -04:00
mat3ix c41050d01f rocr: SDMA improvements (#326)
- When the SDMA queue gets full while copying 2 GB or more, it blocks the
async copy API
- Improve/format logging
2025-08-13 10:25:29 -04:00
systems-assistant[bot] d0a18e0eb9 [cmake] - Update search for LibElf (#256)
There is an issue with TheRock build currently. They have
a local source build of elfutils they want to use instead
of a system package. Currently, rocr uses its own
FindLibElf.cmake module and this is inhibiting the build
from finding the libelf config built by TheRock.

Now we will first search in config mode and fall back to
module mode if nothing is found.

Authored-by: Ethan Stewart <ethan.stewart@amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
2025-08-13 09:13:45 -04:00
Alysa Liu cd5cd88d0d rocr: Fix type mismatch in printf
Format packet.workgroup_size_x correctly as a size_t.
Format packet.workgroup_size_y correctly as a size_t.
Format packet.workgroup_size_z correctly as a size_t.

Format packet.grid_size_x correctly as a size_t.
Format packet.grid_size_y correctly as a size_t.
Format packet.grid_size_z correctly as a size_t.

Format packet.group_segment_size correctly as a size_t.
Format packet.private_segment_size correctly as a size_t.

Format barrier_packet.completion_signal correctly as an address using %zx.
Format barrier_packet.dep_signal[0] correctly as an address using %zx.
Format barrier_packet.dep_signal[1] correctly as an address using %zx.
Format barrier_packet.dep_signal[2] correctly as an address using %zx.
Format barrier_packet.dep_signal[3] correctly as an address using %zx.
Format barrier_packet.dep_signal[4] correctly as an address using %zx.
Format packet.kernarg_address correctly as an address using %zx.
Format completion_signal correctly as an address using %zx.

Format this->queue_->public_handle()->id correctly as an unsigned long.
Format this->queue_->LoadReadIndexRelaxed() correctly as an unsigned long.
Format write_index correctly as an unsigned long.
Format index correctly as an unsigned long.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>


[ROCm/ROCR-Runtime commit: 53873e32f3]
2025-08-08 11:59:48 -04:00
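
The format fixes listed in the commit above follow the usual printf rule that the conversion specifier must match the argument type. The sketch below shows two matching pairs. `FormatPacketFields` is a hypothetical helper; the commit uses `%zx` for addresses, whereas this sketch uses the `PRIx64` macro for a fixed-width 64-bit value as an alternative.

```cpp
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a size_t with %zu and a 64-bit handle with PRIx64 so that each
// specifier matches its argument type on every platform.
std::string FormatPacketFields(std::size_t grid_size_x,
                               std::uint64_t signal_handle) {
  char buf[96];
  std::snprintf(buf, sizeof(buf), "grid_size_x=%zu signal=0x%" PRIx64,
                grid_size_x, signal_handle);
  return std::string(buf);
}
```

Calling `FormatPacketFields(256, 0xabc)` yields the string `grid_size_x=256 signal=0xabc`.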
Honglei Huang 6d15779b3e rocr/driver: add PC sampling support to driver interface
Add PC sampling functionality to the driver interface:

1. Add new PC sampling methods to Driver base class:
   - PcSamplingQueryCapabilities
   - PcSamplingCreate
   - PcSamplingDestroy
   - PcSamplingStart
   - PcSamplingStop

2. Implement PC sampling methods in KfdDriver using HSAKMT APIs:
   - Map HSAKMT status codes to HSA status codes
   - Handle resource busy conditions
   - Proper error handling for all operations

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 56cb9390ff]
2025-07-31 21:48:25 +08:00
Honglei Huang 76bf829f09 rocr/driver: add ASAN header page management to Driver class
Add ASAN header page management to Driver

- Add ReplaceAsanHeaderPage and ReturnAsanHeaderPage to Driver interface
- Implement ASAN functions in KfdDriver using hsaKmt calls

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: a47c060d6a]
2025-07-31 21:48:25 +08:00
Honglei Huang 02d7eaf3b7 rocr: add memory sharing call to Driver interface
This change improves the abstraction of memory sharing operations by moving them
to the driver layer and adds safety checks for cross-driver operations.

- Add ShareMemory and RegisterSharedHandle methods to support memory sharing
  between processes
- Add IsDifferentDriver utility methods to check driver compatibility across
  agents/nodes
- Refactor IPC memory handling to use driver-based memory sharing instead of
  direct HSAKMT calls
- Improve error handling for memory sharing operations across different drivers

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 9312468655]
2025-07-31 21:48:25 +08:00
Shweta Khatri ec6ed9586b rocr: Remove ISA check to disable stochastic support for GFX12.0 in ROCR
Feature support should be determined by KFD via the query-capabilities
IOCTL, not in ROCR.


[ROCm/ROCR-Runtime commit: a5de07d1b8]
2025-07-29 11:18:36 -04:00
Yiannis Papadopoulos 9fd770ac78 rocr: Adding conversion function from hsa_amd_vmem_alloc_handle_t to ThunkHandle
[ROCm/ROCR-Runtime commit: b7cd5cc7f1]
2025-07-26 00:55:21 -04:00
Yiannis Papadopoulos 54933a3db2 rocr: DmaBufExport support for other agent types
[ROCm/ROCR-Runtime commit: f5120bfe68]
2025-07-25 21:49:35 -04:00
Yiannis Papadopoulos 91895208f8 rocr/aie: XdnaDriver::ExportDMABuf implementation
[ROCm/ROCR-Runtime commit: ccaac9045b]
2025-07-25 21:49:35 -04:00
Yat Sin, David 90153e90e1 Update runtime/hsa-runtime/core/runtime/amd_blit_sdma.cpp
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Yat Sin, David <David.YatSin@amd.com>

[ROCm/ROCR-Runtime commit: 0dec2ab43b]
2025-07-25 14:50:40 -04:00
David Yat Sin 7114e098d6 rocr: Remove SDMA code for gfx7 and gfx8
Remove deprecated SDMA code for gfx7 and gfx8 asics


[ROCm/ROCR-Runtime commit: d3f70910e1]
2025-07-25 14:50:40 -04:00
Tony Gutierrez 36072821a8 rocr: Remove unused member of GPUAgent
The ape1_size_ member was leftover after the removal
of KV and is no longer used.

Remove it to remove some compiler warnings.

Signed-off-by: Tony Gutierrez <anthony.gutierrez@amd.com>


[ROCm/ROCR-Runtime commit: 5285c24657]
2025-07-25 10:43:28 -04:00
Honglei Huang 8f91cd2b03 rocr: support multiple driver types in agent initialization
Modify agent initialization to support different driver types,
enabling the KFD_VIRTIO driver for CPU and GPU agents.

1. Add driver_type parameter to CpuAgent and GpuAgent constructors
2. Update topology discovery to handle multiple driver types
3. Fix MakeMemoryResident return value check in VirtioDriver
4. Add helper function IsGPUDriver to check driver types
5. Update agent discovery to iterate through all available drivers

This change makes the runtime more flexible by removing hardcoded KFD
driver assumptions and properly handling different driver backends.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 20806577ce]
2025-07-24 23:20:36 +08:00
Honglei Huang 8d7d06a867 rocr/driver: add virtio driver support for ROCm runtime
This commit adds virtio driver support to the ROCm runtime by:

1. Implementing KfdVirtioDriver class that inherits from core::Driver
2. Adding KFD_VIRTIO to DriverType enum
3. Registering virtio driver discovery function in topology
4. Adding virtio driver source files to CMake build

The virtio driver implementation provides basic memory management and
queue operations for virtualized GPU environments. Some advanced features
like PC sampling and SMI are currently not supported.

Key changes:
- Add new files: amd_kfd_virtio_driver.h/cpp
- Update CMakeLists.txt to include virtio driver
- Add VIRTIO to DriverType enum in driver.h
- Register virtio driver in amd_topology.cpp

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: d36cb195da]
2025-07-24 23:20:36 +08:00
Honglei Huang 5e68bd163a libhsakmt/virtio: add virtio support for libhsakmt
This patch adds VirtIO support to the libhsakmt library, enabling communication
with AMD GPUs via VirtIO.

Details
- CMakeLists.txt: Added a new CMakeLists.txt file for the VirtIO component
of libhsakmt.
- hsakmt_virtio.c/h: Implemented the core VirtIO functionality, including
VirtIO GPU device initialization, command execution, and memory management.
- virtio_gpu.c/h: Contains the implementation of the VirtIO GPU device,
including ioctl handling, shared memory management, and command execution.
- hsakmt_virtio_events.c: Implements event handling for VirtIO, such as event
creation, destruction, setting, resetting, and querying event states.
- hsakmt_virtio_memory.c: Manages memory operations for VirtIO, including memory
allocation, freeing, mapping, and unmapping.
- hsakmt_virtio_queues.c: Implements queue management for VirtIO, including
queue creation, destruction, and updating.
- hsakmt_virtio_topology.c: Handles system and node properties for VirtIO.
- hsakmt_virtio_vm.c: Manages VM-related operations for VirtIO, such as
reserving and dereserving VA space.
- include/linux/virtgpu_drm.h: Contains DRM definitions for VirtIO GPU.

Key Features
- VirtIO GPU Initialization: The library can now initialize a VirtIO GPU device
and communicate with it.
- Command Execution: Supports executing commands on the VirtIO GPU device.
- Memory Management: Provides functions for allocating, freeing, mapping, and
unmapping memory for VirtIO operations.
- Event Handling: Implements a comprehensive event system for VirtIO.
- Queue Management: Allows for creating, destroying, and updating queues
on the VirtIO GPU device.
- System and Node Properties: Retrieves and manages system and node
properties for VirtIO.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 48d3719dba]
2025-07-24 23:20:36 +08:00
Shweta Khatri 2bd0f85f80 rocr: GFX12 - Enable host trap PC Sampling
[ROCm/ROCR-Runtime commit: 6015ad1016]
2025-07-23 06:51:53 -04:00
Yiannis Papadopoulos 6ebc5bd4e4 rocr: Fix warnings
[ROCm/ROCR-Runtime commit: eb3d45d300]
2025-07-21 12:55:46 -04:00
Flora Cui 6bb53e88c5 rocr: add specific flag for blit kernel object
so that aql-to-pm4 conversion could verify the validity of the kernel
object.

Signed-off-by: Flora Cui <flora.cui@amd.com>


[ROCm/ROCR-Runtime commit: a765dd7e94]
2025-07-17 21:55:02 +08:00
Honglei Huang 7c29c36f7e rocr/driver: add memory residency management interface in driver
This commit introduces MakeMemoryResident and MakeMemoryUnresident
functions to KfdDriver and XdnaDriver classes.

- Added implementations in amd_kfd_driver.cpp
- Added stubs in amd_xdna_driver.cpp returning HSA_STATUS_ERROR
- Updated header files amd_kfd_driver.h and amd_xdna_driver.h
- Removed MakeKfdMemoryResident/Unresident from amd_memory_region.cpp

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 6c87f5b5ce]
2025-07-16 13:15:45 +08:00
Honglei Huang b61df004ff rocr/driver: add memory registration and deregistration into driver
This commit completes the memory register/deregister interface change.

Removed static RegisterMemory and DeregisterMemory from MemoryRegion class

- Added pure virtual methods to base Driver interface in driver class
- Added implementation in KFD driver
- Modified MemoryRegion Lock and Unlock to use driver interface

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: ab6bda7e96]
2025-07-16 13:15:45 +08:00
Honglei Huang 724c9b9803 rocr/driver: add AvailableMemory API to driver
This commit introduces a new AvailableMemory API to the KfdDriver and
XdnaDriver classes.

- Implemented AvailableMemory in KfdDriver to return the available memory size
  using hsaKmtAvailableMemory.
- Added a stub implementation of AvailableMemory in XdnaDriver that returns an error.
- Updated the GpuAgent class to use the new AvailableMemory API instead of
  directly calling hsaKmtAvailableMemory.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 6c390e32cc]
2025-07-16 13:15:45 +08:00
Honglei Huang a509e93393 rocr: add const version of driver() method to Agent class
This change adds a const-qualified version of the driver() method to the Agent
class, allowing const Agent objects to access their associated driver without
modifying the object's state.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 9c18618847]
2025-07-16 13:15:45 +08:00
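
The const-qualified accessor added in the commit above is the standard C++ const-overload pattern. The sketch below uses simplified, hypothetical `Driver` and `Agent` types that only stand in for the ROCr classes of the same name.

```cpp
#include <type_traits>
#include <utility>

// Simplified stand-in for the driver type.
struct Driver {
  int id;
};

// Simplified stand-in for the agent type.
class Agent {
 public:
  explicit Agent(Driver& driver) : driver_(driver) {}

  // Non-const access for callers that may modify the driver.
  Driver& driver() { return driver_; }
  // Const access so that a const Agent can still reach its driver.
  const Driver& driver() const { return driver_; }

 private:
  Driver& driver_;
};

// Compiles only because the const overload exists.
int DriverIdOf(const Agent& agent) { return agent.driver().id; }

static_assert(
    std::is_same<decltype(std::declval<const Agent&>().driver()),
                 const Driver&>::value,
    "const Agent exposes a read-only driver");
```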
Honglei Huang e81be86a31 rocr: use driver interface for scratch memory deallocation
Replace direct hsaKmtFreeMemory call with driver's FreeMemory interface
in GpuAgent::ReleaseResources(). This change improves code abstraction
by handling memory deallocation through the unified driver interface.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 8216787d4c]
2025-07-16 13:15:45 +08:00
Honglei Huang 1ad79f04d0 rocr/driver: add scratch memory allocation into driver interface
Add AllocateScratchMemory interface to Driver base class and implement it
in both KFD and XDNA drivers. This change encapsulates the low-level
scratch memory allocation details within driver implementations, making
the code more maintainable and the interface cleaner.

The main changes include:
- Add AllocateScratchMemory virtual method to Driver interface
- Implement the interface in KfdDriver with existing allocation logic
- Add stub implementation in XdnaDriver
- Update GpuAgent to use the new interface instead of direct KMT calls

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: da8dd9e1e3]
2025-07-16 13:15:45 +08:00
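
The driver-interface commits in this series share one shape: a virtual method on the Driver base class, a real implementation in one backend, and a stub that returns an error in another. The sketch below illustrates that shape only. The `Status` enum, the method signatures, and the `HostBackedDriver` and `StubDriver` classes are hypothetical and do not reproduce the ROCr declarations.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for hsa_status_t.
enum class Status { kSuccess, kError };

// Base interface: callers allocate through the driver instead of calling
// a backend-specific API directly.
class Driver {
 public:
  virtual ~Driver() = default;
  virtual Status AllocateScratchMemory(std::size_t size, void** mem) = 0;
  virtual Status FreeMemory(void* mem) = 0;
};

// Backend with a real implementation (host memory used for illustration).
class HostBackedDriver : public Driver {
 public:
  Status AllocateScratchMemory(std::size_t size, void** mem) override {
    if (mem == nullptr || size == 0) return Status::kError;
    *mem = std::malloc(size);
    return *mem != nullptr ? Status::kSuccess : Status::kError;
  }
  Status FreeMemory(void* mem) override {
    std::free(mem);
    return Status::kSuccess;
  }
};

// Backend that does not support the operation yet.
class StubDriver : public Driver {
 public:
  Status AllocateScratchMemory(std::size_t, void**) override {
    return Status::kError;
  }
  Status FreeMemory(void*) override { return Status::kError; }
};
```

Code that holds a `Driver&` stays the same whichever backend is behind it, which is the abstraction benefit the commit messages describe.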
Yiannis Papadopoulos a0ef6f6473 rocr: Using MemoryRegion::GetInfo(HSA_REGION_INFO_ALLOC_MAX_SIZE) for HSA_AMD_MEMORY_POOL_INFO_ALLOC_MAX_SIZE
[ROCm/ROCR-Runtime commit: bfe76cf94e]
2025-07-15 14:22:56 -05:00
zichguan-amd 0c698557a0 rocr: check _SC_LEVEL1_DCACHE_LINESIZE before use
Support musl
Fixes ROCm/ROCR-Runtime#318

Signed-off-by: zichguan-amd <zichuan.guan@amd.com>


[ROCm/ROCR-Runtime commit: 7946ddb647]
2025-07-14 14:44:31 -04:00
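
The musl fix in the commit above guards the use of `_SC_LEVEL1_DCACHE_LINESIZE`, which not every C library defines. A minimal sketch of that guard follows. `CacheLineSize` and the fallback of 64 bytes are assumptions made for illustration and are not taken from the ROCr sources.

```cpp
#include <unistd.h>

#include <cstddef>

// Returns the L1 data cache line size in bytes, or an assumed default when
// the C library does not expose it or reports no usable value.
std::size_t CacheLineSize() {
  const std::size_t kFallback = 64;  // assumed default, not from the commit
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
  const long value = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
  if (value > 0) return static_cast<std::size_t>(value);
#endif
  return kFallback;
}
```

The `#ifdef` handles libraries that lack the constant, and the `value > 0` check handles systems where the query is defined but returns 0 or -1.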
Chris Freehill c5faafeb25 rocr: Ensure AqlQueue can exit on memory error
A hang would occur when a memory error occurs because the
AQLQueue destructor would be waiting for a signal that
wouldn't come. This change allows it to break out of the
wait loop.


[ROCm/ROCR-Runtime commit: c065d9a7e2]
2025-07-11 12:58:21 -05:00
Honglei Huang 3fb4c8d3d7 rocr/driver: move wallclock frequency query to driver layer
Move the wallclock frequency query from GpuAgent to driver layer to improve
code organization and support multiple driver types. This change:

1. Add GetWallclockFrequency API to KFD/XDNA drivers
2. Move libdrm GPU info query from GpuAgent to driver implementation
3. Update GpuAgent to use the new driver API

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 412e386b50]
2025-07-11 16:14:29 +08:00
Honglei Huang 309e8b1a9f rocr/driver: add support for getting GPU tile configuration
- Implemented GetTileConfig in KfdDriver to retrieve tile configuration for
a specific node.
- Added a stub implementation of GetTileConfig in XdnaDriver.
- Updated driver.h to include a virtual GetTileConfig method.
- Extended hsa_internal.h with a new hsa_get_tile_config function.
- Integrated hsa_get_tile_config into hsa.cpp to call the driver-specific
  implementation.
- Updated driver headers to declare the new GetTileConfig method.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 9bc38e2ee6]
2025-07-11 16:14:29 +08:00
Honglei Huang e459cc0c3b rocr/driver: add GetClockCounters API to driver interface
This commit introduces a new GetClockCounters API to the driver interface.

- Implemented GetClockCounters in KfdDriver to fetch clock counters
  using hsaKmtGetClockCounters.
- Added a stub implementation of GetClockCounters in XdnaDriver that
  returns HSA_STATUS_ERROR.
- Modified GpuAgent to use driver().GetClockCounters instead of
  directly calling hsaKmtGetClockCounters.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 8d077dba3b]
2025-07-11 16:14:29 +08:00