220 Commits

Author SHA1 Message Date
David Yat Sin 00e8a67165 rocr: Restore mmap flags back to MAP_PRIVATE (#2886)
Change mmap flags back to MAP_PRIVATE as MAP_SHARED increases allocation
time. Transparent huge pages are disabled for MAP_SHARED by default.
2026-01-30 08:36:05 -08:00
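For illustration only, the flag choice described in this commit can be sketched as an anonymous mapping requested with MAP_PRIVATE. This is not the ROCr allocation path; the helper names `alloc_private_anon` and `free_anon` are hypothetical, and the mapping size is whatever the caller passes in.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* expose MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: reserve `size` bytes of anonymous memory.
 * MAP_PRIVATE is used rather than MAP_SHARED, mirroring the flag
 * choice described in the commit message. Returns NULL on failure. */
void *alloc_private_anon(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: release a mapping made by alloc_private_anon(). */
int free_anon(void *p, size_t size)
{
    return munmap(p, size);
}
```

A caller would pair `alloc_private_anon(len)` with `free_anon(ptr, len)` using the same length.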
Alysa Liu 13091e18ad libhsakmt: Add THEROCK_SANITIZER support for ASAN builds (#2978)
Add THEROCK_SANITIZER support for ASAN builds.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2026-01-30 10:02:10 -05:00
Junhua Shen 0d98c3bdd5 libhsakmt: Implement per-context topology for multi-context KFD support (#2405)
This enhances libhsakmt's capabilities for multi-context KFD support by implementing per-context topology management.

Changes:
* Add hsaKmtGetClockCountersCtx for multi-context support
  - Add context-aware version of hsaKmtGetClockCounters
  - Original API is retained as a wrapper calling the ctx-version with primary context

* Enable independent debug sessions across multiple KFD contexts
  - Create hsa_kfd_debug_context, introduce context-aware debug APIs, shift debug state to per-context

* Add perf sub-context for per-context performance counter management
  - Introduce hsa_kfd_perf_context, move counter properties, add context-aware perf APIs, and update initialization

* Refactor FMM for per-context resource management
  - Move multiple FMM-related global variables, including the GPU ID arrays,
    svm, cpuvm_aperture, and mem_handle_aperture, into hsa_kfd_fmm_context

* Implement per-context topology for complete context isolation
  - Migrate global topology data (g_system, g_props, map_user_to_sysfs_node_id)
     to per-context hsa_kfd_topology_context structure
  - Update all topology functions to accept HsaKFDContext parameter for
     context-aware operations (validate_nodeid, get_node_props, get_iolink_props, etc.)
  - Refactor topology snapshot management for per-context isolation
  - Add context-aware PMC trace access APIs

Signed-off-by: Junhua Shen <Junhua.Shen@amd.com>
2026-01-30 09:42:25 +08:00
Rahul Manocha 324a864bc4 SWDEV-558848 - Move DRM calls to thunk for better abstraction (#1912)
* SWDEV-558848 - Move DRM calls to thunk for better abstraction

* Use thunk device handle instead of drm inside agent

* Update IPC functions with new thunk calls

* create hsaKmtHandleImport interface to support ipc

* Reset metadata inside hsaKmtMemHandleFree

* remove whitespaces and NULL usage

* Add thunk apis to libhsakmt.ver

* Add comments to new structs in thunk

* Minor fixes to declarations

* resolve merge conflicts in amd_kfd_driver

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2026-01-27 08:56:57 -08:00
German Andryeyev e438308541 rocr/libhskamt: Add wsl build in thunk 2026-01-15 17:29:50 -05:00
German Andryeyev 5c5b9729ff Add 'projects/rocr-runtime/libhsakmt/include/hsakmt/drm/' from commit '8c47e25315e70f9c8cdd57a5790d3e080938c969'
git-subtree-dir: projects/rocr-runtime/libhsakmt/include/hsakmt/drm
git-subtree-mainline: 5319163521
git-subtree-split: 8c47e25315
2026-01-15 16:06:07 -05:00
German Andryeyev 5319163521 Add 'projects/rocr-runtime/libhsakmt/include/impl/' from commit 'c34ec1e52fcb52da248c00207ebe646197ea9d3e'
git-subtree-dir: projects/rocr-runtime/libhsakmt/include/impl
git-subtree-mainline: 55f7d39fa5
git-subtree-split: c34ec1e52f
2026-01-15 15:54:37 -05:00
German Andryeyev 55f7d39fa5 Add 'projects/rocr-runtime/libhsakmt/src/dxg/' from commit '029690f0a4f62fefefbb67305a066a72e99f8c0b'
git-subtree-dir: projects/rocr-runtime/libhsakmt/src/dxg
git-subtree-mainline: 8760fb4976
git-subtree-split: 029690f0a4
2026-01-15 15:51:21 -05:00
Jin Jung d4758bc29e SWDEV-570501 - Add Windows support for hipGraphicsGLRegisterBuffer (#2323) 2026-01-12 13:10:46 -06:00
Honglei Huang 054bf836f1 [rocr/libhskamt/virtio] Add some apis into libhsakmt virtio (#2457)
* libhsakmt/virtio: Add alloc memory align api

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Rename CLGL BO to AMDGPU BO

Rename VHSA_BO_CLGL to VHSA_BO_AMDGPU to support generic AMDGPU buffer objects, not just CL/GL interop.

* libhsakmt/virtio: Add atomic helpers and node lookup

Add vhsakmt_atomic_inc/dec macros and vhsakmt_get_node_by_id helper function.

* libhsakmt/virtio: Add AMDGPU device initialization support

Add vamdgpu_device_initialize and vamdgpu_device_deinitialize functions.

* libhsakmt/virtio: Add AMDGPU device handle and DRM command support

Add vamdgpu_device_get_fd, vdrmCommandWriteRead and update vhsaKmtGetAMDGPUDeviceHandle.

* libhsakmt/virtio: Add AMDGPU BO free and CPU map support

Add vamdgpu_bo_free and vamdgpu_bo_cpu_map functions.

* libhsakmt/virtio: Add AMDGPU BO import and export support

Add vamdgpu_bo_import, vamdgpu_bo_export and vhsakmt_bo_from_resid functions.

* libhsakmt/virtio: Add AMDGPU BO VA operation support

Add vamdgpu_bo_va_op function.

* libhsakmt/virtio: Add dma buf export support

Add vhsaKmtExportDMABufHandle API in virtio driver to support export
feature.

* libhsakmt/virtio: Fix potential deadlock in userptr deregistration

Refactor vhsakmt_deregister_userptr_non_svm to avoid calling
vhsakmt_destroy_userptr while holding the bo_handles_mutex lock.
Previously, destroying userptrs directly while iterating the tree
could cause deadlock issues due to nested locking.

- Move interval tree removal from vhsakmt_destroy_userptr to caller
- Collect BOs to free in a temporary array during tree traversal
- Destroy BOs after releasing the mutex to avoid lock contention
- Use dynamic array with realloc to handle arbitrary number of BOs

Signed-off-by: Honglei Huang <honghuan@amd.com>
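
The collect-then-destroy pattern in the deadlock fix above can be sketched roughly as follows. This is a simplified illustration, not the virtio driver code: a plain linked list stands in for the interval tree, and `struct bo`, `track_bo`, `destroy_bo`, and `deregister_matching` are hypothetical names.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tracked buffer object. */
struct bo {
    struct bo *next;
    int in_range;            /* 1 if this BO should be deregistered */
};

static pthread_mutex_t bo_lock = PTHREAD_MUTEX_INITIALIZER;
static struct bo *bo_list;   /* a plain list stands in for the interval tree */

/* Add a BO to the tracked set. Returns 0 on success, -1 on failure. */
int track_bo(int in_range)
{
    struct bo *b = malloc(sizeof(*b));
    if (b == NULL)
        return -1;
    b->in_range = in_range;
    pthread_mutex_lock(&bo_lock);
    b->next = bo_list;
    bo_list = b;
    pthread_mutex_unlock(&bo_lock);
    return 0;
}

/* Stand-in for a destroy routine that takes bo_lock itself; calling it
 * while bo_lock is already held would self-deadlock. */
static void destroy_bo(struct bo *b)
{
    pthread_mutex_lock(&bo_lock);
    /* per-BO teardown that needs the lock would go here */
    pthread_mutex_unlock(&bo_lock);
    free(b);
}

/* Unlink matching BOs under the lock, destroy them after unlocking.
 * Returns the number destroyed, or -1 if growing the array failed. */
int deregister_matching(void)
{
    struct bo **victims = NULL;
    size_t count = 0, cap = 0;
    int failed = 0;

    pthread_mutex_lock(&bo_lock);
    struct bo **link = &bo_list;
    while (*link != NULL) {
        struct bo *b = *link;
        if (!b->in_range) {
            link = &b->next;
            continue;
        }
        if (count == cap) {              /* grow the temporary array */
            size_t new_cap = cap ? cap * 2 : 4;
            struct bo **tmp = realloc(victims, new_cap * sizeof(*tmp));
            if (tmp == NULL) {
                failed = 1;              /* leave the rest tracked */
                break;
            }
            victims = tmp;
            cap = new_cap;
        }
        *link = b->next;                 /* remove from the container first */
        victims[count++] = b;
    }
    pthread_mutex_unlock(&bo_lock);

    for (size_t i = 0; i < count; i++)
        destroy_bo(victims[i]);          /* safe: bo_lock is no longer held */
    free(victims);
    return failed ? -1 : (int)count;
}
```

The key design point is that the container is only mutated while the lock is held, and the potentially lock-taking teardown runs strictly afterwards.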

* rocr: driver/virtio: Implement DMA-BUF import/export and memory mapping APIs

Implement the missing DMA-BUF handling and memory mapping functions
in the virtio KFD driver to enable cross-process memory sharing:

- ExportDMABuf: Export HSA memory as DMA-BUF file descriptor
- ImportDMABuf: Import DMA-BUF fd as shareable buffer object
- Map: Map imported buffer into virtual address space with permissions
- Unmap: Unmap buffer from virtual address space
- ReleaseShareableHandle: Free imported buffer object

Also add drm_perm() helper to convert HSA access permissions to
AMDGPU VM page flags (READABLE/WRITEABLE).

These APIs enable IPC memory sharing between HSA processes through
DMA-BUF mechanism in virtualized environments.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add register memory APIs

Add two new memory registration functions to the virtio HSA KMT library:

1. vhsaKmtRegisterMemory: A simplified wrapper for vhsaKmtRegisterMemoryWithFlags
   that uses default CoarseGrain memory flags.

2. vhsaKmtRegisterMemoryToNodes: A stub implementation for registering memory
   to specific nodes. Returns HSAKMT_STATUS_NOT_IMPLEMENTED as it's currently
   not used in ROCR.

Changes:
- Added function declarations in hsakmt_virtio.h
- Implemented functions in hsakmt_virtio_memory.c
- Exported symbols in libhsakmt_virtio.ver

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add graphics handle registration and mapping APIs

- Add vhsaKmtRegisterGraphicsHandleToNodesExt() with flags support
- Add vhsaKmtMapGraphicHandle() and vhsaKmtUnmapGraphicHandle() stubs
- Refactor existing registration API to use extended version

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add virtio support for queue APIs

Implement vhsaKmtUpdateQueue, vhsaKmtSetQueueCUMask,
vhsaKmtAllocQueueGWS and vhsaKmtGetQueueInfo functions
with virtio protocol extensions and symbol exports.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add new virtio API support for model, SMI, and XNACK mode

Add three new API functions to the virtio backend:
- vhsaKmtModelEnabled: Check if pre-silicon model is enabled (returns false for virtio)
- vhsaKmtOpenSMI: Open SMI interface for a node (not yet supported in virtio)
- vhsaKmtSetXNACKMode: Set XNACK mode via virtio control command

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add shared memory support for virtio backend

Implement shared memory APIs for the virtio backend to enable
memory sharing between processes:

- Add vhsaKmtShareMemory() to share memory regions and create
  shared memory handles
- Add vhsaKmtRegisterSharedHandle() to register shared memory
  handles in the current process
- Add vhsaKmtRegisterSharedHandleToNodes() for node-specific
  shared memory registration

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add memory management APIs for virtio

Add the following new memory management APIs to virtio implementation:
- vhsaKmtSetMemoryUserData: Set user data for memory pointer
- vhsaKmtSetMemoryPolicy: Configure memory policy for nodes
- vhsaKmtSVMGetAttr: Get SVM (Shared Virtual Memory) attributes
- vhsaKmtSVMSetAttr: Set SVM attributes
- vhsaKmtReplaceAsanHeaderPage: ASAN header page replacement (stub)
- vhsaKmtReturnAsanHeaderPage: ASAN header page return (stub)

Changes include:
- Added API declarations in hsakmt_virtio.h
- Implemented functions in hsakmt_virtio_memory.c
- Extended protocol definitions in hsakmt_virtio_proto.h
- Added user_data field to vhsakmt_bo structure
- Exported new symbols in libhsakmt_virtio.ver

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add SPM APIs

Add three new SPM-related APIs to the virtio interface:
- vhsaKmtSPMAcquire: Acquire SPM resources on a preferred node
- vhsaKmtSPMRelease: Release SPM resources on a preferred node
- vhsaKmtSPMSetDestBuffer: Set destination buffer for SPM data with
  optional userptr support and data loss detection

These APIs extend the virtio command protocol with new query types:
- VHSAKMT_CCMD_QUERY_SPM_ACQUIRE
- VHSAKMT_CCMD_QUERY_SPM_RELEASE
- VHSAKMT_CCMD_QUERY_SPM_SET_DST_BUFFER

The implementation includes proper buffer management for both
direct BO access and userptr fallback for smaller buffers.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add virtio stub for hsaKmtAisReadWriteFile API

Add vhsaKmtAisReadWriteFile stub implementation for the virtio backend
to support AIS (Accelerated I/O Service) file read/write operations.
This stub currently returns HSAKMT_STATUS_NOT_IMPLEMENTED.

Changes include:
- Add vhsaKmtAisReadWriteFile declaration in hsakmt_virtio.h
- Add stub implementation in hsakmt_virtio_memory.c
- Export the symbol in libhsakmt_virtio.ver

Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>

* libhsakmt/virtio: Add vamdgpu_bo_query_info and vamdgpu_bo_set_metadata APIs

Implement two new virtio wrapper functions for AMDGPU buffer object operations:

1. vamdgpu_bo_query_info: Query buffer object information including
   allocation parameters, memory usage, and metadata.

2. vamdgpu_bo_set_metadata: Set metadata for a buffer object, allowing
   applications to attach custom data to GPU memory allocations.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add ProcessVMRead/Write stub implementations for virtio

Add vhsaKmtProcessVMRead and vhsaKmtProcessVMWrite stub functions
to the virtio interface. These APIs return HSAKMT_STATUS_NOT_IMPLEMENTED
since they are not supported in the baremetal implementation, matching
the behavior of the deprecated hsaKmtProcessVMRead/Write APIs.

Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>

---------

Signed-off-by: Honglei Huang <honghuan@amd.com>
Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>
Co-authored-by: energystoryhhl <energystoryhhl@users.noreply.github.com>
2026-01-09 18:18:53 -08:00
Flora Cui be04fa8250 rocr: reorder HsaNodeProperties to improve compatibility (#2447)
Signed-off-by: Flora Cui <flora.cui@amd.com>
2026-01-08 09:56:39 +08:00
Alysa Liu 5be4fddf06 kfdtest: Support blit kernel copy (#677)
Add support for blit kernel copy.
Add GpuMemCopyTest test for KFDQMTest.
2026-01-07 16:48:11 -05:00
Maneesh Gupta 4a9833e70e Revert "Add HasExpertSchedMode device prop (#2241)" (#2371)
This reverts commit c0b4aef5ad.
2025-12-17 21:26:44 -08:00
Filip Jankovic c0b4aef5ad Add HasExpertSchedMode device prop (#2241)
* Add HasExpertSchedMode device prop

* Add unit tests for HasExpertSchedMode

* Add gfx12 check for HasExpertSchedMode prop

* Update gfx major version check and test for ExpertSchedMode

* Minor fix and ROCr version bump

* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h

* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h

* Apply suggestion from @dayatsin-amd

* Apply suggestion from @dayatsin-amd

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
Co-authored-by: David Yat Sin <77975354+dayatsin-amd@users.noreply.github.com>
2025-12-17 17:06:08 +01:00
Kent Russell 0a2ea9ef55 hsakmt: Expose and use CWSR and Control stack sizes (#2200)
* hsakmt: Expose CWSR and Control stack sizes

This is better than hardcoding values and hoping that they align with
KFD's definitions

Signed-off-by: Kent Russell <kent.russell@amd.com>

* hsakmt: Use CwsrSize and CtlStackSize if available

If KFD is providing the CwsrSize and CtlStackSize, use the maximum
of those and the old calculations for the ctx_save_restore_size
and ctl_stack_size defined in the queue

Signed-off-by: Kent Russell <kent.russell@amd.com>

* hsakmt: Add warning when ABI<1.20 on GFX1151

CwsrSize and CtlStackSize are reported by KFD ABI 1.20. GFX1151
specifically may have some issues if these regions are misaligned, so
report a strong warning during topology initialization if the system is
GFX1151 but is using KFD ABI < 1.20

Signed-off-by: Kent Russell <kent.russell@amd.com>

---------

Signed-off-by: Kent Russell <kent.russell@amd.com>
2025-12-16 06:26:14 -06:00
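The "use the maximum" rule from the second change above can be sketched as below. The function name `pick_save_area_size` is hypothetical, and treating a reported value of 0 as "not provided by KFD" is an assumption made for this sketch, not something the commit message states.

```c
#include <stdint.h>

/* Pick the size to use for a queue save area.
 * `reported` is the value supplied by the kernel driver (assumed to be
 * 0 when the driver does not report one); `computed` is the legacy
 * user-space calculation. The larger of the two is used. */
uint32_t pick_save_area_size(uint32_t reported, uint32_t computed)
{
    return reported > computed ? reported : computed;
}
```

Under that assumption an older kernel falls back to the computed size automatically, because 0 never exceeds it.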
Rahul Manocha dd4bee33ff SWDEV-558848 - Update thunk interface signature for vmm enablement (#2259)
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2025-12-11 08:43:28 -08:00
Mario Limonciello 0c4d08f38d Revert correcting the VGPR size for GFX 11.5.1 (#2268)
Although the value is correct, there is no source of truth between
kernel and userspace. This leads to problems if the kernel has strict
restrictions (such as kernel 6.17 or earlier). The restrictions were
lifted in 6.17.9 and 6.18, but there is no guarantee userspace is
using this.

So short term this value will be wrong.  But on newer kernels the kernel
will communicate the right size and rocr-runtime will be adjusted to
use that.

Link: https://github.com/ROCm/TheRock/pull/2505

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-12-11 07:59:19 -06:00
Rahul Manocha 0c1f87a7f6 SWDEV-558848 - vmm api support for rocr on windows (#1761)
* SWDEV-558848 - vmm api support for rocr on windows

* Fixes to VMM handle Map/Unmap Set/Get Access

* Fix GetShareableHandle to use pointer for shareable handle

* Update os specific map/unmap memory calls

* clang format update

* Minor syntax fixes from code review

Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>
2025-12-10 08:39:51 -08:00
Jin Jung deaf8ab38a SWDEV-567119 - Windows GL Interop Support (#1892) 2025-12-08 11:03:59 -05:00
Alysa Liu 3a7b5571c0 kfdtest: Replace pthread with std::thread (#1448)
* kfdtest: Replace pthread with std::thread

Modify concurrent kfdtest to use std::thread
instead of pthread, eventually modify KFDTestLaunch
to take in a member function of test instance
instead of static function.

Convert KFDQMTest to pass in member function for
multi-gpu kfdtest.

* kfdtest: Convert KFDPerfCountersTest to use std::thread

Convert KFDPerfCountersTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDGraphicsInterop to use std::thread

Convert KFDGraphicsInterop to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDGWSTest to use std::thread

Convert KFDGWSTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDCWSRTest to use std::thread

Convert KFDCWSRTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDEventTest to use std::thread

Convert KFDEventTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDExceptionTest to use std::thread

Convert KFDExceptionTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDLocalMemoryTest to use std::thread

Convert KFDLocalMemoryTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDMemoryTest to use std::thread

Convert KFDMemoryTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDSVMRangeTest to use std::thread

Convert KFDSVMRangeTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Convert KFDHWSTest to use std::thread

Convert KFDHWSTest to use std::thread for
multi-gpu kfdtest.

* kfdtest: Remove pthread multigpu test structure

Remove older multi-gpu test framework which
uses pthread.
2025-12-02 10:25:21 -05:00
Honglei Huang aaa06e1609 libhsakmt/virtio: add non SVM mode in libhsakmt virtio driver and many fixes (#1756)
* libhsakmt/virtio: change shmem size to 80

Some DGPU props have a lot of information,
so it is necessary to increase the size of shmem.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: use BO handle instead of pointer in memory registration

Change vhsakmt_map_to_gpu() return type from void* to vhsakmt_bo_handle
to properly handle buffer object information. This allows access to
both the host address and resource ID needed for memory registration.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Improve memory mapping logic

- Update vhsakmt_mappable() to check NoAddress flag and require HostAccess
- Remove mappable checks in cpu_map/unmap to allow all BOs to be mapped
- Set BO flags properly in vhsakmt_alloc_memory and scratch memory creation
- Ensure scratch memory is correctly flagged for proper handling

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: add no svm mode for libhsakmt virtio

Add a non-SVM mode to the libhsakmt virtio driver. In non-SVM mode,
userptrs must be managed by the UMD, so add an interval tree to track them.

New Features:
- Add augmented red-black tree based interval tree implementation
  * Implement RB-tree insertion, deletion, and color balancing
  * Provide interval query for fast overlapping range lookup
  * Based on Linux kernel's augmented rbtree implementation

- Improve userptr memory management
  * Use interval tree to efficiently track userptr memory regions
  * Support finding registered memory within given address ranges
  * Optimize memory mapping and unmapping performance

Signed-off-by: Honglei Huang <honghuan@amd.com>

---------

Signed-off-by: Honglei Huang <honghuan@amd.com>
2025-11-28 09:20:43 +08:00
jokim-amd 770f30bc4c hsakmt: bump vgpr count for gfx1151 (#1807)
GFX1151 has 1.5x VGPR memory compared to the rest of GFX11.
2025-11-21 09:53:32 -08:00
andmar-amd da6e939c6c Disable PCSampling on upstream branches (#1421)
- PC Sampling ioctls/tests are not up-streamed. They should be skipped
   for any and all upstream branches.
2025-11-19 14:15:40 -08:00
andmar-amd 70fc774ad0 Disable KFDDBGTest.HitMemoryViolation for navi 10 (#1423)
- Filter out KFDDBGTest.HitMemoryViolation for navi10, which is
   currently failing
2025-11-19 14:15:05 -08:00
andmar-amd 2b4d17078a Improve test script logic and error handling (#1424)
- Fix exclude+gtest_filter logic
- Improve error handling when detecting upstream branches
2025-11-19 14:14:40 -08:00
Junhua Shen 9da1572c42 libhsakmt: Refactor for Multi-KFD Context Support (Multiple KFD FDs per Process) (#1701)
* Introduce HsaKFDContext structure and infrastructure for multiple KFD contexts, enabling
   independent contexts within a single process.
* Refactor core components (queue, event, FMM, topology) to be context-aware,
   using explicit HsaKFDContext parameters instead of global state.
* Replace global hsakmt_kfd_fd with context-specific file descriptors, ensuring full context isolation.
* Maintain backward compatibility by redirecting legacy APIs to use the primary context.

This refactoring establishes a foundation for multi-context support while preserving existing functionality.

Signed-off-by: Junhua Shen <Junhua.Shen@amd.com>
2025-11-10 11:19:58 +08:00
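The backward-compatibility approach described above (legacy APIs forwarding to a primary context) might look roughly like this. The real HsaKFDContext layout and API signatures are not shown in the commit message, so `struct kfd_ctx`, `create_queue_ctx`, `create_queue`, and the other names here are hypothetical.

```c
#include <stddef.h>

/* Hypothetical context type standing in for HsaKFDContext. */
struct kfd_ctx {
    int fd;                  /* per-context device file descriptor */
    unsigned queue_count;    /* per-context state, formerly global */
};

static struct kfd_ctx primary_ctx = { .fd = -1, .queue_count = 0 };

/* Accessor for the process-wide primary context. */
struct kfd_ctx *primary_context(void)
{
    return &primary_ctx;
}

/* Context-aware variant: all state comes from the explicit parameter. */
int create_queue_ctx(struct kfd_ctx *ctx)
{
    if (ctx == NULL)
        return -1;
    ctx->queue_count++;
    return 0;
}

/* Legacy entry point: unchanged signature, forwards to the
 * context-aware variant using the primary context. */
int create_queue(void)
{
    return create_queue_ctx(primary_context());
}

/* Read back per-context state. */
unsigned queue_count_ctx(const struct kfd_ctx *ctx)
{
    return ctx->queue_count;
}
```

Existing callers keep using `create_queue()`, while new code can create a second `struct kfd_ctx` and operate on it without touching the primary context's state.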
David Yat Sin de3b7322f2 rocr/hsakmt: Fix asan compile errors - KFDQMTest (#1638)
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2025-11-07 14:52:36 -05:00
systems-assistant[bot] 740b27528f kfdtest: Enable GPU selection via CLI for multi-GPU tests (#245)
* kfdtest: Enable GPU selection via CLI for multi-GPU tests

Replaced environment variable-based GPU selection with
GPU selection via command-line parameter --concurrentnodes (-c)
Modified g_TestGPUsNum to be passed in via command-line
parameter --testnodenum (t)

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>

---------

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
Co-authored-by: Alysa Liu <Alysa.Liu@amd.com>
2025-11-03 09:27:38 -05:00
David Yat Sin e2f3bd2429 Changes for RDMA with VMM (#801)
* rocr: Add support for VMM and RDMA

Add extra CPU mapping so that kernel-mode drivers can look up the memory
mapping by virtual address.

* Update projects/rocr-runtime/runtime/hsa-runtime/core/runtime/runtime.cpp

Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>

* Update projects/rocr-runtime/runtime/hsa-runtime/core/inc/runtime.h

Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>

* rocr: Honor uncache flag in memory_lock_to_pool()

Also, combined several flag options used in apis into a
single integer.

Signed-off-by: Chris Freehill <cfreehil@amd.com>

* rocr: Fix hsa_amd_pointer_info on CPU agents

Fix hsa_amd_pointer_info query returning allowed on VMM pointers for CPU
agents when the CPU mapping was mapped with PROT_NONE.

---------

Signed-off-by: Chris Freehill <cfreehil@amd.com>
Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>
Co-authored-by: Chris Freehill <cfreehil@amd.com>
Co-authored-by: cfreeamd <166262151+cfreeamd@users.noreply.github.com>
2025-10-21 12:19:02 -04:00
David Bélanger 02294e3852 kfdtest: Fix ExtendedCuMasking on GPUs with inactive CUs (#726)
Modify the code that computes the adjusted CU mask array to take
into account additional cases for inactive CUs.

Signed-off-by: David Belanger <david.belanger@amd.com>
2025-10-17 08:26:12 -07:00
cfreeamd 9df655088f thunk: Correct kfd_ioctl_create_queue_args comment (#1235) 2025-10-17 08:25:51 -07:00
Alysa Liu 4342579645 libhsakmt: Fix memory leak for events_page metadata (#807) 2025-10-15 14:52:40 -04:00
German Andryeyev 7ca2497378 rocr: Add AQL queue support under Windows (#1211)
Add 2 extra caps into the thunk interface to indicate
the queue object creation and PM4 emulation
2025-10-07 17:55:08 -04:00
David Yat Sin cd48105282 rocr: Fix ext-fine-grain flag on host memory (#1067)
Fix for extended-fine-grain flag not set in thunk when
allocating host memory.
2025-09-25 11:10:43 -04:00
hkasivis 5e7210980e Users/hkasivis/add ais support v2.1 (#928)
* libhsakmt: Update hsakmt_fmm_get_handle to support address range

Currently, hsakmt_fmm_get_handle works only if the address is the
starting value of an allocation. Update it so it can find the handle if
the address falls anywhere in a valid allocated range. This is useful for
the AMD Infinity Storage feature, where data needs to be transferred to
any memory within the allocated range.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
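
A range-based lookup of the kind described above can be sketched as follows. This is not the hsakmt_fmm_get_handle implementation; `struct alloc_rec` and `find_by_address` are hypothetical, and a linear scan is used purely for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one allocation. */
struct alloc_rec {
    uintptr_t start;   /* first address of the allocation */
    size_t size;       /* length in bytes */
    uint64_t handle;   /* handle associated with the allocation */
};

/* Return the record whose range [start, start + size) contains addr,
 * or NULL if no record matches. */
const struct alloc_rec *find_by_address(const struct alloc_rec *recs,
                                        size_t n, uintptr_t addr)
{
    for (size_t i = 0; i < n; i++) {
        /* Comparing addr - start against size avoids overflow in
         * start + size for ranges near the top of the address space. */
        if (addr >= recs[i].start && addr - recs[i].start < recs[i].size)
            return &recs[i];
    }
    return NULL;
}
```

A start-only lookup would match just `addr == start`; the range test also accepts any interior address and rejects the one-past-the-end address.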

* libhsakmt: Introduce AMD Infinity Storage (AIS) API

Add hsaKmtAisReadWriteFile() API to support AMD Infinity Storage. The
API moves data directly from GPU VRAM to a file.

v2: Add in/out ioctl arguments to provide more status information to
user space. Modify hsaKmt API also accordingly.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

* rocr: Initial implementation of AMD Infinity Storage (AIS)

Implement the first two APIs: hsa_amd_ais_file_write and hsa_amd_ais_file_read

v2: Change API prefix from hsa_amd_ to hsa_amd_ais_
    Change API to take in a handle instead of an fd for compatibility across
     different platforms

Original Author: Chris Freehill <Chris.Freehill@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

---------

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2025-09-20 11:30:05 -04:00
Sunday Clement 7c8e575f5d Fix Undefined behavior from signed bit shifts (#871)
* libhsakmt: fix UB due to signed integer literal in 1 << 31

Bit shift operations on signed numbers should not shift into or beyond
the sign bit, as this results in undefined behavior.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* libhsakmt: Fix UB due to signed integer literal in 1 << x

Bit shifting a signed integer into or beyond the sign bit is undefined behavior.

BUG: SWDEV-532853

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Fix UB in various places due to signed integer in bit shift

Bit shifting signed integers into or beyond the sign bit is undefined.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

* rocr: Change signed integer literals to unsigned

Change the signed integer literals in the macro expressions throughout
the file to unsigned to avoid overflow.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>

---------

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Co-authored-by: Flora Cui <flora.cui@amd.com>
2025-09-18 09:09:30 -04:00
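The class of fix described in this commit can be illustrated with a small sketch. The `BIT32` macro and `bit_range` function are hypothetical names, not identifiers from libhsakmt or ROCr; the point is only that the shifted literal is unsigned.

```c
#include <stdint.h>

/* With a 32-bit int, `1 << 31` shifts a signed literal into the sign
 * bit, which is undefined behavior in C. Shifting an unsigned 32-bit
 * value is well defined for shift counts 0..31. */
#define BIT32(n) (UINT32_C(1) << (n))

/* Build a mask with bits [lo, hi] set, for 0 <= lo <= hi <= 31. */
uint32_t bit_range(unsigned lo, unsigned hi)
{
    /* hi == 31 is special-cased because a shift by 32 would itself be
     * undefined for a 32-bit operand. */
    uint32_t upper = (hi == 31) ? UINT32_MAX : (BIT32(hi + 1) - 1u);
    return upper & ~(BIT32(lo) - 1u);
}
```

Note that an unsigned literal removes the sign-bit problem but not the shift-count limit, which is why the top bit gets its own branch.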
Sunday Clement db63d4c38b hsakmt: Update udmabuf.h License Identifier Header (#873)
Fix typos, and update the license header to include SPDX license
identifier.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-09-16 10:36:02 -04:00
Alysa Liu 2b2b8329b5 rocr: Add copyright for new files (#886)
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-09-11 10:56:31 -04:00
hkasivis a5713c85bb Users/hkasivis/sync kfd ioctl header (#848)
* libhsakmt: Update ioctl version to 1.18

Sync with kernel ioctl version.

Also explicitly set the ioctl flag to KFD_PROC_FLAG_MFMA_HIGH_PRECISION

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

* libhsakmt: Sync ioctl header by adding kfd_ioctl_profiler

Sync with kernel ioctl version. Add kfd_ioctl_profiler.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

---------

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2025-09-07 20:04:31 -04:00
hkasivis 53ba025a2e libhsakmt: Don't use MADV_DONTFORK for paged memory (#356)
Also, the advice parameter of the madvise() system call is not a bitmask,
so fix that as well.

v2: Use MAP_SHARED instead of MAP_PRIVATE. This avoids MMU notifiers and
    evictions.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2025-08-15 09:22:20 -04:00
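Because the madvise() advice argument is an enumerated value rather than a bitmask, several hints have to be applied with separate calls. A minimal sketch, assuming Linux and using the hypothetical helper name `apply_advice` (this is not the libhsakmt code):

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* expose madvise() and MADV_* on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Apply several advice values to one range. Each value is a separate
 * request, so madvise() is called once per value; OR-ing values
 * together would yield a different (or invalid) advice number.
 * Returns 0 on success, -1 on the first failure (errno is set). */
int apply_advice(void *addr, size_t len, const int *advice, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (madvise(addr, len, advice[i]) != 0)
            return -1;
    }
    return 0;
}
```

For example, passing `{ MADV_SEQUENTIAL, MADV_WILLNEED }` issues two calls; on Linux, `MADV_SEQUENTIAL | MADV_WILLNEED` evaluates to the same number as `MADV_WILLNEED` alone, so the first hint would be silently lost.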
Yiannis Papadopoulos b14bb0f942 libhsakmt: Use numa_node_size64 with long long
[ROCm/ROCR-Runtime commit: ea0a3e8da4]
2025-07-31 18:17:52 -05:00
Xiaogang Chen 9603606d80 hsakmt: Use udmabuf to allocate system memory
This patch uses the udmabuf driver instead of the amdgpu driver to allocate
system memory on APUs. With this change, an application's system memory
consumption can be accounted through the cgroup mechanism. The feature is
enabled by the HSA_USE_UDMABUF environment variable.

Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>


[ROCm/ROCR-Runtime commit: 996e8bbfb7]
2025-07-28 14:11:17 -07:00
Honglei Huang 5e68bd163a libhsakmt/virtio: add virtio support for libhsakmt
This patch adds VirtIO support to the libhsakmt library, enabling communication
with AMD GPUs via VirtIO.

Details
- CMakeLists.txt: Added a new CMakeLists.txt file for the VirtIO component
of libhsakmt.
- hsakmt_virtio.c/h: Implemented the core VirtIO functionality, including
VirtIO GPU device initialization, command execution, and memory management.
- virtio_gpu.c/h: Contains the implementation of the VirtIO GPU device,
including ioctl handling, shared memory management, and command execution.
- hsakmt_virtio_events.c: Implements event handling for VirtIO, such as event
creation, destruction, setting, resetting, and querying event states.
- hsakmt_virtio_memory.c: Manages memory operations for VirtIO, including memory
allocation, freeing, mapping, and unmapping.
- hsakmt_virtio_queues.c: Implements queue management for VirtIO, including
queue creation, destruction, and updating.
- hsakmt_virtio_topology.c: Handles system and node properties for VirtIO.
- hsakmt_virtio_vm.c: Manages VM-related operations for VirtIO, such as
reserving and dereserving VA space.
- include/linux/virtgpu_drm.h: Contains DRM definitions for VirtIO GPU.

Key Features
- VirtIO GPU Initialization: The library can now initialize a VirtIO GPU device
and communicate with it.
- Command Execution: Supports executing commands on the VirtIO GPU device.
- Memory Management: Provides functions for allocating, freeing, mapping, and
unmapping memory for VirtIO operations.
- Event Handling: Implements a comprehensive event system for VirtIO.
- Queue Management: Allows for creating, destroying, and updating queues
on the VirtIO GPU device.
- System and Node Properties: Retrieves and manages system and node
properties for VirtIO.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 48d3719dba]
2025-07-24 23:20:36 +08:00
Kent Russell 991f72bb9f kfdtest: Remove gfx940/941 references
Support was removed for these engineering samples, so remove them from the
blacklist, and make sure that we're using 942 for the shader store.


[ROCm/ROCR-Runtime commit: f755981f03]
2025-07-22 08:47:34 -04:00
Honglei Huang b0866264b4 libhsakmt: modify is scratch memory helper
- Refactored scratch memory handling by introducing fmm_is_scratch_aperture to
replace repeated for-loops.
- Simplified code paths in hsakmt_fmm_release, hsakmt_fmm_map_to_gpu, and
hsakmt_fmm_unmap_from_gpu by using the new helper.

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 72061a9024]
2025-07-18 14:25:55 +08:00
Flora Cui 6bb53e88c5 rocr: add specific flag for blit kernel object
so that aql-to-pm4 conversion could verify the validity of the kernel
object.

Signed-off-by: Flora Cui <flora.cui@amd.com>


[ROCm/ROCR-Runtime commit: a765dd7e94]
2025-07-17 21:55:02 +08:00
Honglei Huang a8e7d69b18 libhsakmt: use uint32_t for loop index variables
This patch changes the type of several loop index variables from int to
uint32_t in fmm.c. The affected functions are:
- __fmm_release
- _fmm_map_to_gpu
- _fmm_unmap_from_gpu

To fix compile warning:

warning: comparison of integer expressions of different signedness:
'int' and 'uint32_t' {aka 'unsigned int'} [-Wsign-compare]
 2009 |         for (i = 0; i < object->handle_num; i++) {

Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>


[ROCm/ROCR-Runtime commit: 45af009c5d]
2025-07-09 13:15:42 +08:00
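The warning fixed by this commit arises when a signed loop index is compared with an unsigned count. A minimal sketch of the corrected pattern, using a hypothetical `struct object` and `sum_handles` function rather than the fmm.c code:

```c
#include <stdint.h>

/* Hypothetical object with an unsigned element count, loosely modeled
 * on the `object->handle_num` field quoted in the warning. */
struct object {
    uint32_t handle_num;
    uint32_t handles[8];
};

/* Sum the handles. The index has the same type as handle_num, so the
 * loop condition compares two unsigned values and -Wsign-compare has
 * nothing to report. */
uint64_t sum_handles(const struct object *obj)
{
    uint64_t total = 0;
    for (uint32_t i = 0; i < obj->handle_num; i++)
        total += obj->handles[i];
    return total;
}
```

Declaring the index as `int` would compile, but the comparison would convert it to unsigned and trigger the warning quoted in the commit message.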
Apurv Mishra 6c89d61cef kfdtest: Temporarily blacklist KFDEvictTest suite
Blacklist the KFDEvictTest suite until the defects
SWDEV 535386 and 537002, where these test cases fail
inconsistently, are fixed.

Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>


[ROCm/ROCR-Runtime commit: 3115384874]
2025-07-04 11:47:20 -04:00
David Yat Sin 8982f2c2c6 rocr: Fix compile warning when using clang
[ROCm/ROCR-Runtime commit: 96d0f07b15]
2025-06-12 10:38:58 -04:00
Apurv Mishra 226d8126c9 kfdtest: Disable KFD RAS test case
Disable the KFD RAS test case, as the tests cause a GPU reset
that affects the active kfdtest. The tests can only be
run successfully as separate processes.

Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>


[ROCm/ROCR-Runtime commit: d9a95605cc]
2025-05-27 19:04:04 -04:00