* SWDEV-558848 - Move DRM calls to thunk for better abstraction
* Use thunk device handle instead of drm inside agent
* Update IPC functions with new thunk calls
* create hsaKmtHandleImport interface to support ipc
* Reset metadata inside hsaKmtMemHandleFree
* remove whitespaces and NULL usage
* Add thunk apis to libhsakmt.ver
* Add comments to new structs in thunk
* Minor fixes to declarations
* resolve merge conflicts in amd_kfd_driver
---------
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
## Motivation
ROCR on Windows uses the WSL implementation as its codebase. We want to make
sure Windows changes can continue to work with WSL and share the same
core implementation. Hence, it's easier to maintain the code under the
same rocm-system infrastructure and automate all builds/tests in the
future.
## Technical Details
The new files are a copy of https://github.com/ROCm/librocdxg/ with
preserved history. Native Windows support and clean-ups will be added in
the following check-ins.
The same command lines can be used to build for WSL under the libhsakmt
folder for now.
```
# Set the Windows SDK path (adjust version number if different)
export win_sdk='/mnt/c/Program Files (x86)/Windows Kits/10/Include/10.0.26100.0/'
# Build the library
mkdir -p build
cd build
cmake .. -DWIN_SDK="${win_sdk}/shared"
make
sudo make install
```
## JIRA ID
SWDEV-558849
## Test Plan
N/A
## Test Result
N/A
## Submission Checklist
In order for the hipMemPrefetchAsync_v2() API to work, we need rocr to
migrate the requested ranges of pages to the particular NUMA node in
question, via move_pages().
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
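A minimal sketch of the migration step described above, assuming Linux and calling the move_pages() system call directly through syscall() so that libnuma is not needed at link time. The helper names `PagesInRange` and `MigrateRangeToNode` are hypothetical and are not taken from rocr.

```cpp
#include <linux/mempolicy.h>  // MPOL_MF_MOVE
#include <sys/syscall.h>      // SYS_move_pages
#include <unistd.h>           // syscall, sysconf

#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: start address of every page overlapping [base, base + len).
std::vector<void*> PagesInRange(void* base, std::size_t len, std::size_t page_size) {
  std::vector<void*> pages;
  if (len == 0) return pages;
  const auto mask = ~(static_cast<std::uintptr_t>(page_size) - 1);
  const auto start = reinterpret_cast<std::uintptr_t>(base) & mask;
  const auto end = reinterpret_cast<std::uintptr_t>(base) + len;
  for (std::uintptr_t p = start; p < end; p += page_size) {
    pages.push_back(reinterpret_cast<void*>(p));
  }
  return pages;
}

// Hypothetical helper: ask the kernel to move the calling process's pages in
// the range to NUMA node `node`. Returns the raw move_pages() result:
// 0 on success, -1 with errno set on error, or a positive count of pages
// that could not be migrated.
long MigrateRangeToNode(void* base, std::size_t len, int node) {
  const auto page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  std::vector<void*> pages = PagesInRange(base, len, page_size);
  if (pages.empty()) return 0;
  std::vector<int> nodes(pages.size(), node);  // same target node for every page
  std::vector<int> status(pages.size(), 0);    // per-page result filled in by the kernel
  return syscall(SYS_move_pages, 0L /* pid 0 = calling process */,
                 static_cast<unsigned long>(pages.size()), pages.data(),
                 nodes.data(), status.data(), static_cast<long>(MPOL_MF_MOVE));
}
```

The call may fail in sandboxed environments that filter this system call, so a caller would normally inspect errno and the per-page status array rather than assume success.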
* SWDEV-561708 Initial shared queue pool apis
* Validate params; some fixes in callback function (but still needs to be checked)
* Dtor cleanup
* minor
* Enable profiling; remove callback since aql_queue takes care of it
* setPriority and setCuMask APIs updated for counted queues
* Increasing step and minor version for rocprofiler
* Tests for CountedQueueManager
* tests
* Code refactored to make pool manager part of GpuAgent only (incomplete); unique handles issue pending
* Refactored code to support CQM inside GpuAgent and unique handles; multithreaded test added
* Changed to ASSERT_SUCCESS macros for all tests
* Ring buffer overflow test added
* tests fixed; cleanup added at hsa_shutdown
* priority conversion table changes
* Compiler warnings fixed
* Rewrite 1 test; add desc and improve SetUp() code
* Improvement
* Unified getinfo for both counted and non-counted queues
* Address PR feedback
* Addressing feedback: memleak, data type mismatch, documentation
* improve comment
* format
* Missing HSA_API macros for roctracer
* Revert "Addressing feedback: memleak, data type mismatch, documentation"
This reverts commit 5e498a55fb3640e00d06cec63dcec79293fb23de.
* Improving acquire api doc
* release api doc improved
* error codes for release api doc
* SWDEV-555889 - Support mipmap on rocr
Support mipmap in hip-rt on rocr backend.
Enable all mipmap tests in Windows.
Some other minor improvements.
Add some SRD logs that will eventually be removed.
* Add sampler.mipFilter to fix sampler issues on mipmap in rocr.
Fix format issues of view of leveled image and mipmap image in blit kernel in rocr.
Enabled disabled mipmap tests.
* Rewrite view logic
* Set word4.f.PITCH = 0 for mipmap SRD on navi31 to fix unstable test issues.
Reset last error in negative tests.
* Remove SRD dump log from hip-rt
Make Rocr mipmap logging conditional.
* minor format change
* Exclude mipmap tests for mi200+ which don't support mipmap.
* Fix set/get access failure for VMM on windows
* separate code paths for linux and windows to avoid using import/export calls in windows
---------
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
* Add hipDeviceAttributeExpertSchedMode
---------
Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
* Update hipDeviceAttributeExpertSchedMode unit test
* Move check to ROCr from thunk interface
* Revert unrelated whitespace changes
* Revert version bump
---------
Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
Add explicit test cases to verify ROCr functionality with interrupts
disabled (HSA_ENABLE_INTERRUPT=0). This ensures compatibility with
virtio, dtif, and WSL configurations which require interrupt-disabled
mode.
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
* rocr: Add ProtectMemory API and use it in RemoveAccess
Replace munmap + mmap with mprotect when removing memory access.
This improves performance by 5-10x, ensures atomicity (no race
condition window), and prepares for WSL/DXG compatibility fixes.
Suggested-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
* rocr: Skip CPU mapping operations on WSL
On WSL, CPU cannot access GPU VRAM due to platform restrictions.
CPU access would fault-in system RAM instead, causing data corruption
and memory leaks. Return HSA_STATUS_ERROR to fail fast rather than
silently creating broken mappings. GPU-to-GPU mappings remain functional.
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
* rocr: reduce ifdef linux
v2: Fix IsDXG check logic
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
---------
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
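The ProtectMemory change above swaps an unmap-and-remap sequence for a single mprotect() call. A minimal sketch of that idea on an anonymous mapping follows; the function names are hypothetical, and the use of PROT_NONE is an assumption about how access is removed, since the commit message does not name the flags.

```cpp
#include <sys/mman.h>

#include <cstddef>

// Hypothetical stand-in for removing CPU access: the mapping stays in place,
// only its protection changes, so the address range is never left unmapped.
bool RemoveAccessInPlace(void* addr, std::size_t len) {
  return mprotect(addr, len, PROT_NONE) == 0;
}

// Small demonstration on an anonymous mapping. Returns true if access could
// be removed and later restored with the original contents intact.
bool DemoRemoveAccess() {
  const std::size_t len = 4 * 4096;
  void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return false;
  static_cast<char*>(p)[0] = 42;

  bool ok = RemoveAccessInPlace(p, len);
  // Restoring read access shows the pages were kept, not discarded.
  ok = ok && mprotect(p, len, PROT_READ) == 0;
  ok = ok && static_cast<char*>(p)[0] == 42;

  munmap(p, len);
  return ok;
}
```

With munmap followed by mmap there is a moment in which another thread's allocation could claim the freed range; changing the protection in place avoids that window.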
* libhsakmt/virtio: Add alloc memory align api
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Rename CLGL BO to AMDGPU BO
Rename VHSA_BO_CLGL to VHSA_BO_AMDGPU to support generic AMDGPU buffer objects, not just CL/GL interop.
* libhsakmt/virtio: Add atomic helpers and node lookup
Add vhsakmt_atomic_inc/dec macros and vhsakmt_get_node_by_id helper function.
* libhsakmt/virtio: Add AMDGPU device initialization support
Add vamdgpu_device_initialize and vamdgpu_device_deinitialize functions.
* libhsakmt/virtio: Add AMDGPU device handle and DRM command support
Add vamdgpu_device_get_fd, vdrmCommandWriteRead and update vhsaKmtGetAMDGPUDeviceHandle.
* libhsakmt/virtio: Add AMDGPU BO free and CPU map support
Add vamdgpu_bo_free and vamdgpu_bo_cpu_map functions.
* libhsakmt/virtio: Add AMDGPU BO import and export support
Add vamdgpu_bo_import, vamdgpu_bo_export and vhsakmt_bo_from_resid functions.
* libhsakmt/virtio: Add AMDGPU BO VA operation support
Add vamdgpu_bo_va_op function.
* libhsakmt/virtio: Add dma buf export support
Add vhsaKmtExportDMABufHandle API in virtio driver to support export
feature.
* libhsakmt/virtio: Fix potential deadlock in userptr deregistration
Refactor vhsakmt_deregister_userptr_non_svm to avoid calling
vhsakmt_destroy_userptr while holding the bo_handles_mutex lock.
Previously, destroying userptrs directly while iterating the tree
could cause deadlock issues due to nested locking.
- Move interval tree removal from vhsakmt_destroy_userptr to caller
- Collect BOs to free in a temporary array during tree traversal
- Destroy BOs after releasing the mutex to avoid lock contention
- Use dynamic array with realloc to handle arbitrary number of BOs
Signed-off-by: Honglei Huang <honghuan@amd.com>
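The pattern in the userptr deregistration fix above (collect under the lock, destroy after releasing it) can be sketched as follows. This is an illustration in C++ rather than the C code of the virtio library: `BoRegistry`, `Bo`, and their members are hypothetical, a `std::map` stands in for the interval tree, and a `std::vector` plays the role of the realloc-grown array.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

struct Bo {
  std::uint64_t addr;
  std::uint64_t size;
};

class BoRegistry {
 public:
  void Register(std::uint64_t addr, std::uint64_t size) {
    std::lock_guard<std::mutex> lock(mutex_);
    bos_[addr] = std::make_unique<Bo>(Bo{addr, size});
  }

  // Removes every BO overlapping [addr, addr + size) and returns how many
  // were destroyed.
  std::size_t Deregister(std::uint64_t addr, std::uint64_t size) {
    std::vector<std::unique_ptr<Bo>> doomed;
    {
      // Phase 1: under the lock, only unlink matching entries and collect them.
      std::lock_guard<std::mutex> lock(mutex_);
      for (auto it = bos_.begin(); it != bos_.end();) {
        const Bo& bo = *it->second;
        if (bo.addr < addr + size && addr < bo.addr + bo.size) {
          doomed.push_back(std::move(it->second));
          it = bos_.erase(it);
        } else {
          ++it;
        }
      }
    }
    // Phase 2: the lock is released, so Destroy() may safely take it again.
    const std::size_t count = doomed.size();
    for (auto& bo : doomed) Destroy(std::move(bo));
    return count;
  }

  std::size_t Size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return bos_.size();
  }

  std::size_t Destroyed() {
    std::lock_guard<std::mutex> lock(mutex_);
    return destroyed_;
  }

 private:
  // Takes the same mutex; calling this while iterating under the lock would
  // self-deadlock on a non-recursive mutex.
  void Destroy(std::unique_ptr<Bo> bo) {
    std::lock_guard<std::mutex> lock(mutex_);
    ++destroyed_;
    // `bo` is freed when it goes out of scope here.
  }

  std::mutex mutex_;
  std::map<std::uint64_t, std::unique_ptr<Bo>> bos_;
  std::size_t destroyed_ = 0;
};
```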
* rocr: driver/virtio: Implement DMA-BUF import/export and memory mapping APIs
Implement the missing DMA-BUF handling and memory mapping functions
in the virtio KFD driver to enable cross-process memory sharing:
- ExportDMABuf: Export HSA memory as DMA-BUF file descriptor
- ImportDMABuf: Import DMA-BUF fd as shareable buffer object
- Map: Map imported buffer into virtual address space with permissions
- Unmap: Unmap buffer from virtual address space
- ReleaseShareableHandle: Free imported buffer object
Also add drm_perm() helper to convert HSA access permissions to
AMDGPU VM page flags (READABLE/WRITEABLE).
These APIs enable IPC memory sharing between HSA processes through
DMA-BUF mechanism in virtualized environments.
Signed-off-by: Honglei Huang <honghuan@amd.com>
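As an illustration of what a permission-conversion helper like drm_perm() might look like, here is a sketch that maps an access permission to readable/writeable page flags. All names and constants below are local stand-ins so the example compiles without the HSA or DRM headers; the real helper's signature and the real flag values should be taken from those headers, not from this sketch.

```cpp
#include <cstdint>

// Stand-in for the HSA access permission enum (assumed shape).
enum class AccessPermission : std::uint32_t {
  kNone = 0,
  kReadOnly = 1,
  kWriteOnly = 2,
  kReadWrite = 3,
};

// Stand-ins for the AMDGPU VM page flags (assumed values, illustration only).
constexpr std::uint64_t kVmPageReadable = 1u << 1;
constexpr std::uint64_t kVmPageWriteable = 1u << 2;

// Hypothetical equivalent of drm_perm(): translate a permission into flags.
constexpr std::uint64_t DrmPermSketch(AccessPermission perm) {
  switch (perm) {
    case AccessPermission::kReadOnly:
      return kVmPageReadable;
    case AccessPermission::kWriteOnly:
      return kVmPageWriteable;
    case AccessPermission::kReadWrite:
      return kVmPageReadable | kVmPageWriteable;
    case AccessPermission::kNone:
      break;
  }
  return 0;  // no access requested
}
```

Keeping the translation in one small function means the Map path does not need to know about DRM flag encodings.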
* libhsakmt/virtio: Add register memory APIs
Add two new memory registration functions to the virtio HSA KMT library:
1. vhsaKmtRegisterMemory: A simplified wrapper for vhsaKmtRegisterMemoryWithFlags
that uses default CoarseGrain memory flags.
2. vhsaKmtRegisterMemoryToNodes: A stub implementation for registering memory
to specific nodes. Returns HSAKMT_STATUS_NOT_IMPLEMENTED as it's currently
not used in ROCR.
Changes:
- Added function declarations in hsakmt_virtio.h
- Implemented functions in hsakmt_virtio_memory.c
- Exported symbols in libhsakmt_virtio.ver
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Add graphics handle registration and mapping APIs
- Add vhsaKmtRegisterGraphicsHandleToNodesExt() with flags support
- Add vhsaKmtMapGraphicHandle() and vhsaKmtUnmapGraphicHandle() stubs
- Refactor existing registration API to use extended version
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Add virtio support for queue APIs
Implement vhsaKmtUpdateQueue, vhsaKmtSetQueueCUMask,
vhsaKmtAllocQueueGWS and vhsaKmtGetQueueInfo functions
with virtio protocol extensions and symbol exports.
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Add new virtio API support for model, SMI, and XNACK mode
Add three new API functions to the virtio backend:
- vhsaKmtModelEnabled: Check if pre-silicon model is enabled (returns false for virtio)
- vhsaKmtOpenSMI: Open SMI interface for a node (not yet supported in virtio)
- vhsaKmtSetXNACKMode: Set XNACK mode via virtio control command
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Add shared memory support for virtio backend
Implement shared memory APIs for the virtio backend to enable
memory sharing between processes:
- Add vhsaKmtShareMemory() to share memory regions and create
shared memory handles
- Add vhsaKmtRegisterSharedHandle() to register shared memory
handles in the current process
- Add vhsaKmtRegisterSharedHandleToNodes() for node-specific
shared memory registration
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Add memory management APIs for virtio
Add the following new memory management APIs to virtio implementation:
- vhsaKmtSetMemoryUserData: Set user data for memory pointer
- vhsaKmtSetMemoryPolicy: Configure memory policy for nodes
- vhsaKmtSVMGetAttr: Get SVM (Shared Virtual Memory) attributes
- vhsaKmtSVMSetAttr: Set SVM attributes
- vhsaKmtReplaceAsanHeaderPage: ASAN header page replacement (stub)
- vhsaKmtReturnAsanHeaderPage: ASAN header page return (stub)
Changes include:
- Added API declarations in hsakmt_virtio.h
- Implemented functions in hsakmt_virtio_memory.c
- Extended protocol definitions in hsakmt_virtio_proto.h
- Added user_data field to vhsakmt_bo structure
- Exported new symbols in libhsakmt_virtio.ver
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Add SPM APIs
Add three new SPM-related APIs to the virtio interface:
- vhsaKmtSPMAcquire: Acquire SPM resources on a preferred node
- vhsaKmtSPMRelease: Release SPM resources on a preferred node
- vhsaKmtSPMSetDestBuffer: Set destination buffer for SPM data with
optional userptr support and data loss detection
These APIs extend the virtio command protocol with new query types:
- VHSAKMT_CCMD_QUERY_SPM_ACQUIRE
- VHSAKMT_CCMD_QUERY_SPM_RELEASE
- VHSAKMT_CCMD_QUERY_SPM_SET_DST_BUFFER
The implementation includes proper buffer management for both
direct BO access and userptr fallback for smaller buffers.
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Add virtio stub for hsaKmtAisReadWriteFile API
Add vhsaKmtAisReadWriteFile stub implementation for the virtio backend
to support AIS (Accelerated I/O Service) file read/write operations.
This stub currently returns HSAKMT_STATUS_NOT_IMPLEMENTED.
Changes include:
- Add vhsaKmtAisReadWriteFile declaration in hsakmt_virtio.h
- Add stub implementation in hsakmt_virtio_memory.c
- Export the symbol in libhsakmt_virtio.ver
Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>
* libhsakmt/virtio: Add vamdgpu_bo_query_info and vamdgpu_bo_set_metadata APIs
Implement two new virtio wrapper functions for AMDGPU buffer object operations:
1. vamdgpu_bo_query_info: Query buffer object information including
allocation parameters, memory usage, and metadata.
2. vamdgpu_bo_set_metadata: Set metadata for a buffer object, allowing
applications to attach custom data to GPU memory allocations.
Signed-off-by: Honglei Huang <honghuan@amd.com>
* libhsakmt/virtio: Add ProcessVMRead/Write stub implementations for virtio
Add vhsaKmtProcessVMRead and vhsaKmtProcessVMWrite stub functions
to the virtio interface. These APIs return HSAKMT_STATUS_NOT_IMPLEMENTED
since they are not supported in the baremetal implementation, matching
the behavior of the deprecated hsaKmtProcessVMRead/Write APIs.
Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>
---------
Signed-off-by: Honglei Huang <honghuan@amd.com>
Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>
Co-authored-by: energystoryhhl <energystoryhhl@users.noreply.github.com>
SWDEV-539526 - Add support for Mipmapped Array in Rocr
Add support for Mipmapped Array functionality in Rocr Runtime, enabling GPU applications to work with multi-level texture mipmaps. The implementation introduces new public APIs for creating, querying, and managing mipmapped arrays across different GPU architectures.
Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
Co-authored-by: Shweta Khatri <shweta.khatri@amd.com>
Co-authored-by: taosang2 <tao.sang@amd.com>
* SWDEV-569319 Replace ScopedAcquire with stdcpp wrappers
* Replace KernelMutex and KernelSharedMutex abstractions with std::mutex and std::shared_mutex
* Replaced unique_locks with lock_guards
* More changes
* Replace new and deletes with smart pointers
* Replaced some more with shared ptrs
* Replacements with smart pointers - pt 2
* missed change
* Add HasExpertSchedMode device prop
* Add unit tests for HasExpertSchedMode
* Add gfx12 check for HasExpertSchedMode prop
* Update gfx major version check and test for ExpertSchedMode
* Minor fix and ROCr version bump
* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h
* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h
* Apply suggestion from @dayatsin-amd
* Apply suggestion from @dayatsin-amd
---------
Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
Co-authored-by: David Yat Sin <77975354+dayatsin-amd@users.noreply.github.com>
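For the ScopedAcquire replacement listed above (SWDEV-569319), the standard-library equivalents look roughly like the sketch below: `std::lock_guard` for exclusive sections and `std::shared_lock` for readers of a `std::shared_mutex`. The `AgentTable` class is hypothetical and only serves to show the locking style; it requires C++17.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

class AgentTable {
 public:
  // Writers take the mutex exclusively for the duration of the scope.
  void Set(const std::string& name, int id) {
    std::lock_guard<std::shared_mutex> lock(mutex_);
    ids_[name] = id;
  }

  // Readers share the mutex, so concurrent lookups do not block each other.
  // Returns -1 when the name is unknown.
  int Get(const std::string& name) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    const auto it = ids_.find(name);
    return it == ids_.end() ? -1 : it->second;
  }

 private:
  mutable std::shared_mutex mutex_;
  std::map<std::string, int> ids_;
};
```

`std::lock_guard` is enough wherever the lock is held for the whole scope; `std::unique_lock` is only needed when the lock must be released early, moved, or used with a condition variable.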
* hsakmt: Expose CWSR and Control stack sizes
This is better than hardcoding values and hoping that they align with
KFD's definitions
Signed-off-by: Kent Russell <kent.russell@amd.com>
* hsakmt: Use CwsrSize and CtlStackSize if available
If KFD is providing the CwsrSize and CtlStackSize, use the maximum
of those and the old calculations for the ctx_save_restore_size
and ctl_stack_size defined in the queue
Signed-off-by: Kent Russell <kent.russell@amd.com>
* hsakmt: Add warning when ABI<1.20 on GFX1151
CwsrSize and CtlStackSize are reported by KFD ABI 1.20. GFX1151
specifically may have some issues if these regions are misaligned, so
report a strong warning during topology initialization if the system is
GFX1151 but is using KFD ABI < 1.20
Signed-off-by: Kent Russell <kent.russell@amd.com>
---------
Signed-off-by: Kent Russell <kent.russell@amd.com>
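The size selection and the ABI check described in the hsakmt commits above can be sketched as below. The struct and function names are hypothetical, and the sketch assumes that a value of 0 means KFD did not report the size; the actual libhsakmt code may represent that differently.

```cpp
#include <algorithm>
#include <cstdint>

// Sizes reported by KFD for a node; 0 is assumed to mean "not reported".
struct KfdReportedSizes {
  std::uint32_t cwsr_size;
  std::uint32_t ctl_stack_size;
};

// Sizes stored in the queue.
struct QueueSizes {
  std::uint32_t ctx_save_restore_size;
  std::uint32_t ctl_stack_size;
};

// Take the larger of the KFD-reported value and the legacy calculation.
QueueSizes PickQueueSizes(const KfdReportedSizes& kfd, const QueueSizes& legacy) {
  return QueueSizes{
      std::max(kfd.cwsr_size, legacy.ctx_save_restore_size),
      std::max(kfd.ctl_stack_size, legacy.ctl_stack_size),
  };
}

// True when a GFX1151 system runs a KFD ABI older than 1.20 and a warning
// should be printed during topology initialization.
bool NeedsAbiWarning(bool is_gfx1151, std::uint32_t abi_major, std::uint32_t abi_minor) {
  const bool abi_below_1_20 = abi_major < 1 || (abi_major == 1 && abi_minor < 20);
  return is_gfx1151 && abi_below_1_20;
}
```

Using the maximum keeps older kernels working unchanged, because an unreported size never shrinks the legacy value.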
Although the value is correct, there is no source of truth between
kernel and userspace. This leads to problems if the kernel has strict
restrictions (such as kernel 6.17 or earlier). The restrictions were
lifted in 6.17.9 and 6.18, but there is no guarantee that users are
running those kernels.
So short term this value will be wrong. But on newer kernels the kernel
will communicate the right size and rocr-runtime will be adjusted to
use that.
Link: https://github.com/ROCm/TheRock/pull/2505
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* rocrtst: Update CMake files to use find_package instead of hardcoded paths
This is to support TheRock build environment
* rocrtst: Fix CMake to use find_package() instead of hardcoded ENV paths
Fixed CMake style issues from the first commit's code review
* rocrtst: Fix rocrtst NUMA dependency detection to use find_package
Also added handling of missing headers
* rocrtst: Fix NUMA and hwloc detection for cross-platform builds
---------
Co-authored-by: Shweta Khatri <shweta.khatri@amd.com>
* SWDEV-558848 - vmm api support for rocr on windows
* Fixes to VMM handle Map/Unmap Set/Get Access
* Fix GetShareableHandle to use pointer for shareable handle
* Update os specific map/unmap memory calls
* clang format update
* Minor syntax fixes from code review
Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>
---------
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>
* Run pre-commit's whitespace related hooks on projects/rocr-runtime
In order for pre-commit to be useful, everything needs to meet a common
baseline.
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Add missing semicolon which would block compilation on big endian CPUs
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
---------
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
* Fixing the copy back to the original buffer malformed packets
* Addressing Copilot Comments
* Addressing Review comments
* Adjust staging buffer size allocation
Change staging buffer size to match the number of packets.
This patch enhances compatibility for DXG environments by introducing conditional
checks for DRM operations, particularly around buffer object metadata handling
in IPC scenarios. These changes improve robustness in DXG IPC memory management
without impacting existing functionality in standard Linux environments.
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
WSL uses the call just for the thread wake-up, however under Windows
KMD needs the actual value (SWDEV-568592). The interface is changed
to avoid programming of a modified write_ptr value, which somewhat
changes the client's logic.
Changed ipc_sock_server_conns_ map's value type to size_t. Previous
type of int caused allocations of sizes greater than 2GB to overflow,
causing the message len to be stored as a negative value, preventing the
IPC server from exporting dmabuf file descriptors, which led to hangs.
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
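The overflow described above is easy to reproduce in isolation. The sketch below narrows a 3 GiB length to int, as the old map value type did, and then stores it in a map with a size_t value. The function names are hypothetical, and keying the map by a socket descriptor is an assumption; the commit message only states the value type.

```cpp
#include <cstddef>
#include <map>

constexpr std::size_t kThreeGiB = std::size_t{3} << 30;

// Old behaviour: the message length is narrowed to int before being stored.
// On common platforms with a 32-bit int, values above 2 GiB wrap to negative.
int LengthAsInt(std::size_t len) { return static_cast<int>(len); }

// New behaviour: the map value is size_t, so the full length is preserved.
void RecordLength(std::map<int, std::size_t>& conns, int conn_fd, std::size_t len) {
  conns[conn_fd] = len;
}
```

A negative length is what prevented the server from exporting the dmabuf descriptor; with size_t the stored value matches the allocation size.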
* kfdtest: Replace pthread with std::thread
Modify concurrent kfdtest to use std::thread
instead of pthread, eventually modify KFDTestLaunch
to take in a member function of test instance
instead of static function.
Convert KFDQMTest to pass in member function for
multi-gpu kfdtest.
* kfdtest: Convert KFDPerfCountersTest to use std::thread
Convert KFDPerfCountersTest to use std::thread for
multi-gpu kfdtest.
* kfdtest: Convert KFDGraphicsInterop to use std::thread
Convert KFDGraphicsInterop to use std::thread for
multi-gpu kfdtest.
* kfdtest: Convert KFDGWSTest to use std::thread
Convert KFDGWSTest to use std::thread for
multi-gpu kfdtest.
* kfdtest: Convert KFDCWSRTest to use std::thread
Convert KFDCWSRTest to use std::thread for
multi-gpu kfdtest.
* kfdtest: Convert KFDEventTest to use std::thread
Convert KFDEventTest to use std::thread for
multi-gpu kfdtest.
* kfdtest: Convert KFDExceptionTest to use std::thread
Convert KFDExceptionTest to use std::thread for
multi-gpu kfdtest.
* kfdtest: Convert KFDLocalMemoryTest to use std::thread
Convert KFDLocalMemoryTest to use std::thread for
multi-gpu kfdtest.
* kfdtest: Convert KFDMemoryTest to use std::thread
Convert KFDMemoryTest to use std::thread for
multi-gpu kfdtest.
* kfdtest: Convert KFDSVMRangeTest to use std::thread
Convert KFDSVMRangeTest to use std::thread for
multi-gpu kfdtest.
* kfdtest: Convert KFDHWSTest to use std::thread
Convert KFDHWSTest to use std::thread for
multi-gpu kfdtest.
* kfdtest: Remove pthread multigpu test structure
Remove older multi-gpu test framework which
uses pthread.
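The conversion described in the kfdtest series above relies on std::thread accepting a member function pointer plus the object to call it on. A minimal sketch of that launch pattern follows; `FakeTest` and `LaunchOnAllNodes` are hypothetical names and do not reproduce the real KFDTestLaunch signature.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Stand-in for a test fixture with a per-GPU test body.
class FakeTest {
 public:
  void BasicTest(int gpu_node) {
    visited_.fetch_add(1);
    node_sum_.fetch_add(gpu_node);
  }
  int Visited() const { return visited_.load(); }
  int NodeSum() const { return node_sum_.load(); }

 private:
  std::atomic<int> visited_{0};
  std::atomic<int> node_sum_{0};
};

// Runs `fn` on `test` once per GPU node, each in its own thread, and waits
// for all of them to finish.
template <typename T>
void LaunchOnAllNodes(T* test, void (T::*fn)(int), const std::vector<int>& nodes) {
  std::vector<std::thread> threads;
  threads.reserve(nodes.size());
  for (int node : nodes) {
    // std::thread invokes (test->*fn)(node) on the new thread.
    threads.emplace_back(fn, test, node);
  }
  for (auto& t : threads) t.join();
}
```

Compared with pthread_create, no static trampoline function or void* argument struct is needed, and the per-test state stays in the fixture instance.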
Core dumps are not supported for gfx110x, but should be possible for
gfx115x. The current code disables core dumps completely for all gfx11xx
agents; relax this to allow gfx115x.