3115 İşleme

Yazar SHA1 Mesaj Tarih
cfreeamd 5172701708 rocr: Correct gpu dumped core contents (#2851)
Includes several tests (rocrtst) for this capability.
2026-01-30 09:38:09 -08:00
David Yat Sin 00e8a67165 rocr: Restore mmap flags back to MAP_PRIVATE (#2886)
Change mmap flags back to MAP_PRIVATE as MAP_SHARED increases allocation
time. Transparent huge pages are disabled for MAP_SHARED by default.
2026-01-30 08:36:05 -08:00
Alysa Liu 13091e18ad libhsakmt: Add THEROCK_SANITIZER support for ASAN builds (#2978)
Add THEROCK_SANITIZER support for ASAN builds.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2026-01-30 10:02:10 -05:00
Junhua Shen 0d98c3bdd5 libhsakmt: Implement per-context topology for multi-context KFD support (#2405)
This enhances libhsakmt's capabilities for multi-context KFD support by implementing per-context topology management.

Changes:
* Add hsaKmtGetClockCountersCtx for multi-context support
  - Add context-aware version of hsaKmtGetClockCounters
  - Original API is retained as a wrapper calling the ctx-version with primary context

* Enable independent debug sessions across multiple KFD contexts
  -Create hsa_kfd_debug_context, introduce context-aware debug APIs, shift debug state to per-context

* Add perf sub-context for per-context performance counter management
  - Introduce hsa_kfd_perf_context, move counter properties, add context - aware perf APIs, and update initialization

* Refactor FMM for per-context resource management
  - Refactor multiple global variables related to FMM, including 
    GPU ID arrays , svm, cpuvm_aperture, and mem_handle_aperture to hsa_kfd_fmm_context

* Implement per-context topology for complete context isolation
  - Migrate global topology data (g_system, g_props, map_user_to_sysfs_node_id)
     to per-context hsa_kfd_topology_context structure
  - Update all topology functions to accept HsaKFDContext parameter for
     context-aware operations (validate_nodeid, get_node_props, get_iolink_props, etc.)
  - Refactor topology snapshot management for per-context isolation
  - Add context-aware PMC trace access APIs

Signed-off-by: Junhua Shen <Junhua.Shen@amd.com>
2026-01-30 09:42:25 +08:00
pghoshamd adaaa883b2 SWDEV-576090 Fix mem leaks and double free of signals (#2817) 2026-01-29 16:53:27 -05:00
Yiannis Papadopoulos 66ee941fea rocr/aie: Throw exception for malformed packets and packet submission errors (#2528) 2026-01-29 10:52:42 -05:00
pghoshamd bc20b51f40 SWDEV-561708 Counted queue size from env var (#2844)
* SWDEV-561708 Counted queue size from env var

* use counted_queue_size for test

* remove rocrtst changes; add a const for default queue size

* Remove env var from test; use queue->size

* Improve env var documentation

* Correct type
2026-01-29 10:00:37 -05:00
Tao Sang 66a1e38387 SWDEV-577011 Fix missing ais symbols in Windows (#2871)
Fix missing ais symbols in rocr in Windows
2026-01-28 22:29:30 -05:00
Yiannis Papadopoulos fdb19e5a4c rocr: Format script skips non-existing files in sparse checkouts (#2360) 2026-01-27 15:58:53 -05:00
Benjamin Welton 1517a398bf [rocprofiler-sdk] Buffer finalization fixes and HSA ABI 0x09 support (#2318)
* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements

Buffer Pool Design
------------------
Replace the fixed array-based double buffer with a dynamic pool design to
fix race conditions that caused "internal correlation id was retired
prematurely" errors.

The original design had a race where flush callbacks could be delivered
out-of-order: when buffer 0 fills and begins flushing, writes go to
buffer 1. If buffer 1 fills before buffer 0's flush completes, the
buffer index wraps back to 0 (which may still be flushing). Independent
flush tasks submitted to the thread pool can complete out of order.

The new pool design:
- Uses a std::deque of buffer instances that grows as needed
- Allocates buffers from the pool when the current buffer needs to flush
- Serializes flushes with a mutex to ensure FIFO callback ordering
- Returns buffers to the pool after flush completion
- Eliminates the race between buffer selection and write operations

New Unit Tests
--------------
- buffer_correlation_ordering.cpp: Tests that API records are always
  delivered before their corresponding retirement records
- buffer_ordering_stress.cpp: Stress tests buffer flush ordering under
  high contention with multiple threads rapidly filling buffers

HSA Tool Hooks
--------------
Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that
waits for pending flush tasks before tool finalization, preventing
"retired prematurely" errors during HSA shutdown.

Sanitizer Improvements
----------------------
- LSAN: Set fast_unwind_on_malloc=1 to prevent deadlock in libgcc unwinder
- LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup)
- TSAN: Added suppression for false positive on C++11 thread-safe static
  initialization in create_write_functor
- ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto
- Disabled attachment tests for sanitizers due to library preloading issues

Other Fixes
-----------
- Thread-trace agent test: Use heap-allocated callback state
- Correlation ID: Refactored reference counting and finalization ordering

* [rocprofiler-sdk] Revert buffer pool design changes

Revert buffer.cpp and buffer.hpp to the original double-buffer
design from develop branch. The pool-based redesign introduced
concerns about:
- Signal safety (mutex vs atomic_flag)
- API changes (flush() return type)
- Complexity of the new design

This revert removes:
- Dynamic buffer pool with std::deque
- std::mutex/condition_variable synchronization
- buffer_correlation_ordering.cpp test
- buffer_ordering_stress.cpp test

The underlying buffer flush ordering issue will need to be
addressed with a different approach that preserves the original
API and synchronization characteristics.

* [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization

- Revert TOCTOU CAS loop change in sub_ref_count() - not needed with consistent checks
- Add fini_status check in correlation_tracing_service::construct() with ROCP_CI_LOG warning
- Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp, memory_allocation.cpp)
- Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior:
  - hsa/queue.cpp (lines 105, 210)
  - hsa/async_copy.cpp (line 344)
  - hsa/hsa_barrier.cpp (line 43)
  - buffer.cpp (lines 107, 138, 185)

This ensures no correlation IDs are created once finalization starts (fini_status != 0),
preventing races between finalization and ongoing tracing operations.

* [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation

Buffer records are not guaranteed to arrive in any specific order. Tests and
samples should use timestamps for temporal ordering validation instead.

Changes:
- samples/external_correlation_id_request: Replace 'retired prematurely' arrival
  order check with timestamp-based validation that retirement timestamp >=
  max(end_timestamps) for records with the same correlation ID
- tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check

Correlation IDs are not guaranteed to be monotonically increasing when records
are sorted by timestamp. Temporal ordering should be validated using the
timestamp fields in each record.

* [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal

Restore the SYSTEM keyword to target_include_directories for
rocprofiler-sdk-fmt to match develop branch.

* [rccl] Remove orphaned rocSHMEM gitlink

Remove orphaned submodule reference that was introduced during a merge
but never had a corresponding .gitmodules entry, causing CI failures
with "fatal: no submodule mapping found in .gitmodules".

* [rocprofiler-sdk] Add HSA ABI version 0x09 support

Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09 which
introduces hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release
functions (added in rocr-runtime SWDEV-561708).

* [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations

This commit consolidates fixes for handling the finalization status during
buffer flush operations across the SDK.

Changes:
- Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully
  when flushing buffers, as this indicates buffers were already flushed
  during finalization (not an error condition)
- HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use > 0 check
  for fini_status to allow operations during finalization process
- buffer.cpp: Revert fini_status checks to use > 0 for consistency
- correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging
  to prevent correlation ID creation after finalization starts

Files modified:
- source/lib/rocprofiler-sdk-tool/tool.cpp
- tests/tools/json-tool.cpp
- source/lib/rocprofiler-sdk/tests/registration.cpp
- source/lib/rocprofiler-sdk/tests/roctx.cpp
- samples/api_buffered_tracing/client.cpp
- samples/counter_collection/buffered_client.cpp
- samples/counter_collection/device_counting_async_client.cpp
- samples/external_correlation_id_request/client.cpp
- samples/pc_sampling/client.cpp
- source/lib/rocprofiler-sdk/buffer.cpp
- source/lib/rocprofiler-sdk/context/correlation_id.cpp
- source/lib/rocprofiler-sdk/hsa/queue.cpp
- source/lib/rocprofiler-sdk/hsa/async_copy.cpp
- source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp

* [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling

Remove the hsa_tool_hooks infrastructure and simplify buffer flush calls
in samples and tools. The ERROR_FINALIZED handling was overly complex
and the hsa_tool_hooks OnUnload synchronization is no longer needed.

Changes:
- Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code
- Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL
- Simplify buffer flush in tool.cpp and json-tool.cpp
- Remove ERROR_FINALIZED special handling from test files

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Fix output_stream move semantics to null source pointers

The default move constructor and move assignment operator for
output_stream did not null out the source's pointers after the move.
This caused double-close when the moved-from temporary was destroyed,
leading to use-after-free crashes (SIGSEGV in std::ostream::sentry).

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration

- generatePerfetto.cpp: Move output_stream into shared_state to prevent
  use-after-free race conditions during Perfetto callback execution
- run-ci.py: Simplify and consolidate sanitizer environment variable
  configuration for better maintainability

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions

The previous changes removed MEMCHECK_SANITIZER_OPTIONS which is required
for CTest to properly pass suppression files to the sanitizers during
memcheck runs.

Co-Authored-By: Claude <noreply@anthropic.com>

* Revert "[rccl] Remove orphaned rocSHMEM gitlink"

This reverts commit 1ad21003941355658fff8114fa27768f11a948f7.

* [rocprofiler-sdk] Revert registration.cpp changes

Revert changes to registration.cpp to match develop branch.

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Remove suppression file content printing from run-ci.py

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix output_stream move ctor/assignment operator

* Fix erroneous revert of registration.cpp

* Fix handling of fini status in correlation ID construction

* [rocprofiler-sdk] Fix OMPT segfault during finalization

Add nullptr checks in OMPT tracing code to handle the case where
correlation_tracing_service::construct() returns nullptr during
finalization. This fixes segfaults in openmp-target-sample and
tests.integration.execute.openmp-tools.

The correlation ID construction now returns nullptr when fini_status > 0,
but the OMPT callbacks were not checking for this, causing crashes when
dereferencing the null pointer during OpenMP runtime shutdown.

Changes:
- event_common(): Return nullptr early if correlation ID is null
- event(): Check for nullptr before calling sub_ref_count()
- ompt_task_create_callback(): Return early if correlation ID is null
- ompt_task_schedule_callback(): Return early if correlation ID is null

* [rocprofiler-sdk] Fix HSA API tracing segfault during finalization

Add nullptr check in hsa_api_impl::functor after correlation ID
construction. During finalization, correlation_service::construct()
returns nullptr, and without this check the code would dereference
the null pointer when accessing corr_id->internal.

This fixes the SEGV at address 0x000000000008 (null + 8 byte offset)
that occurs when HSA async event threads call hsa_signal_destroy
during runtime shutdown after finalization has started.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2026-01-27 13:27:54 -05:00
Rahul Manocha 324a864bc4 SWDEV-558848 - Move DRM calls to thunk for better abstraction (#1912)
* SWDEV-558848 - Move DRM calls to thunk for better abstraction

* Use thunk device handle instead of drm inside agent

* Update IPC functions with new thunk calls

* create hsaKmtHandleImport interface to support ipc

* Reset metadata inside hsaKmtMemHandleFree

* remove whitespaces and NULL usage

* Add thunk apis to libhsakmt.ver

* Add comments to new structs in thunk

* Minor fixes to declarations

* resolve merge conflicts in amd_kfd_driver

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2026-01-27 08:56:57 -08:00
yugang-amd 576fbaef74 Fix link for global enum and defines (#2802) 2026-01-22 16:17:25 -05:00
Danylo Lytovchenko 7f906ced10 Revert "rocr: SVMPrefetch to a particular numa node (#1063)" (#2792)
This reverts commit 0ba5a01baa.
2026-01-22 17:26:19 +01:00
David Yat Sin 5267cd334b rocr: Refactor SDMA object creation (#2629)
Refactor SDMA object creation and add comment to clarify why GCR is not
needed on DXG.
2026-01-21 21:09:56 -05:00
German Andryeyev d902429f1f rocr/hsakmt/wsl Move WSL under ROCR hsakmt. (#2638)
## Motivation
ROCR on Windows uses WSL implementation as the codebase. We want to make
sure Windows changes can continue to work with WSL and share the same
core implementation. Hence, it's easier to maintain the code under the
same rocm-system infrastructure and automate all builds/tests in the
future.

## Technical Details
The new files is the copy of https://github.com/ROCm/librocdxg/ with
preserved history. Native windows support and clean-ups will be added in
the following check-ins.

The same command lines can be used to build WSL under libhsakmt folder
for now.
```
# Set the Windows SDK path (adjust version number if different)
export win_sdk='/mnt/c/Program Files (x86)/Windows Kits/10/Include/10.0.26100.0/'
 
# Build the library
mkdir -p build
cd build
cmake .. -DWIN_SDK="${win_sdk}/shared"
make
sudo make install

```
## JIRA ID
SWDEV-558849

## Test Plan
N/A

## Test Result
N/A

## Submission Checklist
2026-01-21 20:00:33 -05:00
German Andryeyev 196baa4321 rocr: Fix static build in Windows (#2660) 2026-01-21 18:44:51 -05:00
Sunday Clement 0ba5a01baa rocr: SVMPrefetch to a particular numa node (#1063)
In order for hipMemPrefetchAysnc_v2() api to work, we need rocr to
migrates the ranges of pages requested to the particular NUMA node in
question, via move_pages().

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2026-01-21 16:52:15 -05:00
pghoshamd 8d73201a35 AILIROCR-4 Fix double free of tma_region (#2678) 2026-01-21 15:31:10 -05:00
pghoshamd 793755532f SWDEV-561708 Initial shared queue pool apis (#1614)
* SWDEV-561708 Initial shared queue pool apis

* Validate params; some fixes in callback function (but still needs to be checked)

* Dtor cleanup

* minor

* Enable profiling; remove callback since aql_queue takes care of it

* setPriority and setCuMask APIs updated for counted queues

* Increasing step and minor version for rocprofiler

* Tests for CountedQueueManager

* tests

* Code refactored to make pool manager part of GpuAgent only (incomplete); unique handles issue pending

* Refactored code to support CQM inside GpuAgent and unique handles; multithreaded test added

* Changed to ASSERT_SUCCESS macros for all tests

* RIng buffer overflow test added

* tests fixed; cleanup added at hsa_shutdown

* priority conversion table changes

* Compiler warnings fixed

* Rewrite 1 test; add desc and improve SetUp() code

* Improvement

* Unififed getinfo for both counted and non-counted queues

* Address PR feedback

* Addressing feedback: memleak, data type mismatch, documentation

* improve comment

* format

* Missing HSA_API macros for roctracer

* Revert "Addressing feedback: memleak, data type mismatch, documentation"

This reverts commit 5e498a55fb3640e00d06cec63dcec79293fb23de.

* Improving acquire api doc

* release api doc improved

* error codes for release api doc
2026-01-21 15:30:04 -05:00
Tao Sang 163e44d0a8 SWDEV-555889 - Support mipmap on rocr (#2082)
* SWDEV-555889 - Support mipmap on rocr

Support mipmap in hip-rt on rocr backend.
Enable all mipmap tests in Windows.
Some other minor improvement.

Add some SRD logs that will be removed finally.

* Add sampler.mipFilter to fix sampler issues on mipmap in rocr.
Fix format issues of view of leveled image and  mipmap image in blit kernel in rocr.
Enabled disabled mipmap tests.

* Rewrite view logic

* Set word4.f.PITCH = 0 for mipmap SRD on navi31 to fix unstable test issues.
Reset last error in nagative tests.

* Remove SRD dump log from hip-rt
Let Rocr mipmap log be in condition.

* minor format chang

* Exclude mipmap tests for mi200+ which don't support mipmap.
2026-01-21 09:10:29 -08:00
hongkzha-amd d94185c5b2 rocrtst: set HSA_ENABLE_INTERRUPT after TestExample creation (#2687)
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Co-authored-by: cfreeamd <166262151+cfreeamd@users.noreply.github.com>
2026-01-21 10:39:50 +08:00
Alysa Liu 9139f5a241 Revert "rocr: Switch back to legacy IPC (#1744)" (#2676)
This reverts commit 7e4b62290c.
2026-01-20 14:34:10 -05:00
German Andryeyev 3af2bf4952 Merge branch 'develop' into amd/dev/gandryey/SWDEV-558849 2026-01-20 12:04:53 -05:00
Rahul Manocha 249a947dc7 Fix set/get access failure for VMM on windows (#2280)
* Fix set/get access failure for VMM on windows

* seperate code paths for linux and windows to avoid using import/export calls in windows

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2026-01-16 08:34:21 -08:00
German Andryeyev 07a6b45535 rocr: restore the original line 2026-01-16 11:05:24 -05:00
German Andryeyev e438308541 rocr/libhskamt: Add wsl build in thunk 2026-01-15 17:29:50 -05:00
German Andryeyev 5c5b9729ff Add 'projects/rocr-runtime/libhsakmt/include/hsakmt/drm/' from commit '8c47e25315e70f9c8cdd57a5790d3e080938c969'
git-subtree-dir: projects/rocr-runtime/libhsakmt/include/hsakmt/drm
git-subtree-mainline: 5319163521
git-subtree-split: 8c47e25315
2026-01-15 16:06:07 -05:00
German Andryeyev 5319163521 Add 'projects/rocr-runtime/libhsakmt/include/impl/' from commit 'c34ec1e52fcb52da248c00207ebe646197ea9d3e'
git-subtree-dir: projects/rocr-runtime/libhsakmt/include/impl
git-subtree-mainline: 55f7d39fa5
git-subtree-split: c34ec1e52f
2026-01-15 15:54:37 -05:00
German Andryeyev 55f7d39fa5 Add 'projects/rocr-runtime/libhsakmt/src/dxg/' from commit '029690f0a4f62fefefbb67305a066a72e99f8c0b'
git-subtree-dir: projects/rocr-runtime/libhsakmt/src/dxg
git-subtree-mainline: 8760fb4976
git-subtree-split: 029690f0a4
2026-01-15 15:51:21 -05:00
Filip Jankovic 29cd25df66 Add hipDeviceAttributeExpertSchedMode (#2435)
* Add hipDeviceAttributeExpertSchedMode

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>

* Update hipDeviceAttributeExpertSchedMode unit test

* Move check to ROCr from thunk interface

* Revert unrelated whitespace changes

* Revert version bump

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
2026-01-15 08:41:39 -08:00
pghoshamd d2a1fc945e SWDEV-569319 Fix dangling reference warning (#2509)
* SWDEV-569319 Fix dangling reference warning

* fix nullptr warning

* use emplace

* return regular pointer
2026-01-13 15:39:03 -06:00
hongkzha-amd 9dc2488b6b rocrtst: Add test cases for interrupt disabled mode (#2385)
Add explicit test cases to verify ROCr functionality with interrupts
disabled (HSA_ENABLE_INTERRUPT=0). This ensures compatibility with
virtio, dtif, and WSL configurations which require interrupt-disabled
mode.

Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
2026-01-13 12:10:11 -06:00
hongkzha-amd b3c4e94e70 rocr: Improve memory protection and WSL compatibility (#2274)
* rocr: Add ProtectMemory API and use it in RemoveAccess
Replace munmap + mmap with mprotect when removing memory access.
This improves performance by 5-10x, ensures atomicity (no race
condition window), and prepares for WSL/DXG compatibility fixes.

Suggested-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>

* rocr: Skip CPU mapping operations on WSL
On WSL, CPU cannot access GPU VRAM due to platform restrictions.
CPU access would fault-in system RAM instead, causing data corruption
and memory leaks. Return HSA_STATUS_ERROR to fail fast rather than
silently creating broken mappings. GPU-to-GPU mappings remain functional.

Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>

* rocr: reduce ifdef linux
v2: Fix IsDXG check logic

Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>

---------
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
2026-01-13 12:08:20 -06:00
Jin Jung d4758bc29e SWDEV-570501 - Add Windows support for hipGraphicsGLRegisterBuffer (#2323) 2026-01-12 13:10:46 -06:00
Honglei Huang 054bf836f1 [rocr/libhskamt/virtio] Add some apis into libhsakmt virtio (#2457)
* libhsakmt/virtio: Add alloc memory align api

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Rename CLGL BO to AMDGPU BO

Rename VHSA_BO_CLGL to VHSA_BO_AMDGPU to support generic AMDGPU buffer objects, not just CL/GL interop.

* libhsakmt/virtio: Add atomic helpers and node lookup

Add vhsakmt_atomic_inc/dec macros and vhsakmt_get_node_by_id helper function.

* libhsakmt/virtio: Add AMDGPU device initialization support

Add vamdgpu_device_initialize and vamdgpu_device_deinitialize functions.

* libhsakmt/virtio: Add AMDGPU device handle and DRM command support

Add vamdgpu_device_get_fd, vdrmCommandWriteRead and update vhsaKmtGetAMDGPUDeviceHandle.

* libhsakmt/virtio: Add AMDGPU BO free and CPU map support

Add vamdgpu_bo_free and vamdgpu_bo_cpu_map functions.

* libhsakmt/virtio: Add AMDGPU BO import and export support

Add vamdgpu_bo_import, vamdgpu_bo_export and vhsakmt_bo_from_resid functions.

* libhsakmt/virtio: Add AMDGPU BO VA operation support

Add vamdgpu_bo_va_op function.

* libhsakmt/virtio: Add dma buf export support

Add vhsaKmtExportDMABufHandle API in virtio driver to support export
feature.

* libhsakmt/virtio: Fix potential deadlock in userptr deregistration

Refactor vhsakmt_deregister_userptr_non_svm to avoid calling
vhsakmt_destroy_userptr while holding the bo_handles_mutex lock.
Previously, destroying userptrs directly while iterating the tree
could cause deadlock issues due to nested locking.

- Move interval tree removal from vhsakmt_destroy_userptr to caller
- Collect BOs to free in a temporary array during tree traversal
- Destroy BOs after releasing the mutex to avoid lock contention
- Use dynamic array with realloc to handle arbitrary number of BOs

Signed-off-by: Honglei Huang <honghuan@amd.com>

* rocr: driver/virtio: Implement DMA-BUF import/export and memory mapping APIs

Implement the missing DMA-BUF handling and memory mapping functions
in the virtio KFD driver to enable cross-process memory sharing:

- ExportDMABuf: Export HSA memory as DMA-BUF file descriptor
- ImportDMABuf: Import DMA-BUF fd as shareable buffer object
- Map: Map imported buffer into virtual address space with permissions
- Unmap: Unmap buffer from virtual address space
- ReleaseShareableHandle: Free imported buffer object

Also add drm_perm() helper to convert HSA access permissions to
AMDGPU VM page flags (READABLE/WRITEABLE).

These APIs enable IPC memory sharing between HSA processes through
DMA-BUF mechanism in virtualized environments.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add register memory APIs

Add two new memory registration functions to the virtio HSA KMT library:

1. vhsaKmtRegisterMemory: A simplified wrapper for vhsaKmtRegisterMemoryWithFlags
   that uses default CoarseGrain memory flags.

2. vhsaKmtRegisterMemoryToNodes: A stub implementation for registering memory
   to specific nodes. Returns HSAKMT_STATUS_NOT_IMPLEMENTED as it's currently
   not used in ROCR.

Changes:
- Added function declarations in hsakmt_virtio.h
- Implemented functions in hsakmt_virtio_memory.c
- Exported symbols in libhsakmt_virtio.ver

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add graphics handle registration and mapping APIs

- Add vhsaKmtRegisterGraphicsHandleToNodesExt() with flags support
- Add vhsaKmtMapGraphicHandle() and vhsaKmtUnmapGraphicHandle() stubs
- Refactor existing registration API to use extended version

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add virtio support for queue APIs

Implement vhsaKmtUpdateQueue, vhsaKmtSetQueueCUMask,
vhsaKmtAllocQueueGWS and vhsaKmtGetQueueInfo functions
with virtio protocol extensions and symbol exports.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add new virtio API support for model, SMI, and XNACK mode

Add three new API functions to the virtio backend:
- vhsaKmtModelEnabled: Check if pre-silicon model is enabled (returns false for virtio)
- vhsaKmtOpenSMI: Open SMI interface for a node (not yet supported in virtio)
- vhsaKmtSetXNACKMode: Set XNACK mode via virtio control command

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add shared memory support for virtio backend

Implement shared memory APIs for the virtio backend to enable
memory sharing between processes:

- Add vhsaKmtShareMemory() to share memory regions and create
  shared memory handles
- Add vhsaKmtRegisterSharedHandle() to register shared memory
  handles in the current process
- Add vhsaKmtRegisterSharedHandleToNodes() for node-specific
  shared memory registration

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add memory management APIs for virtio

Add the following new memory management APIs to virtio implementation:
- vhsaKmtSetMemoryUserData: Set user data for memory pointer
- vhsaKmtSetMemoryPolicy: Configure memory policy for nodes
- vhsaKmtSVMGetAttr: Get SVM (Shared Virtual Memory) attributes
- vhsaKmtSVMSetAttr: Set SVM attributes
- vhsaKmtReplaceAsanHeaderPage: ASAN header page replacement (stub)
- vhsaKmtReturnAsanHeaderPage: ASAN header page return (stub)

Changes include:
- Added API declarations in hsakmt_virtio.h
- Implemented functions in hsakmt_virtio_memory.c
- Extended protocol definitions in hsakmt_virtio_proto.h
- Added user_data field to vhsakmt_bo structure
- Exported new symbols in libhsakmt_virtio.ver

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add SPM APIs

Add three new SPM-related APIs to the virtio interface:
- vhsaKmtSPMAcquire: Acquire SPM resources on a preferred node
- vhsaKmtSPMRelease: Release SPM resources on a preferred node
- vhsaKmtSPMSetDestBuffer: Set destination buffer for SPM data with
  optional userptr support and data loss detection

These APIs extend the virtio command protocol with new query types:
- VHSAKMT_CCMD_QUERY_SPM_ACQUIRE
- VHSAKMT_CCMD_QUERY_SPM_RELEASE
- VHSAKMT_CCMD_QUERY_SPM_SET_DST_BUFFER

The implementation includes proper buffer management for both
direct BO access and userptr fallback for smaller buffers.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add virtio stub for hsaKmtAisReadWriteFile API

Add vhsaKmtAisReadWriteFile stub implementation for the virtio backend
to support AIS (Accelerated I/O Service) file read/write operations.
This stub currently returns HSAKMT_STATUS_NOT_IMPLEMENTED.

Changes include:
- Add vhsaKmtAisReadWriteFile declaration in hsakmt_virtio.h
- Add stub implementation in hsakmt_virtio_memory.c
- Export the symbol in libhsakmt_virtio.ver

Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>

* libhsakmt/virtio: Add vamdgpu_bo_query_info and vamdgpu_bo_set_metadata APIs

Implement two new virtio wrapper functions for AMDGPU buffer object operations:

1. vamdgpu_bo_query_info: Query buffer object information including
   allocation parameters, memory usage, and metadata.

2. vamdgpu_bo_set_metadata: Set metadata for a buffer object, allowing
   applications to attach custom data to GPU memory allocations.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add ProcessVMRead/Write stub implementations for virtio

Add vhsaKmtProcessVMRead and vhsaKmtProcessVMWrite stub functions
to the virtio interface. These APIs return HSAKMT_STATUS_NOT_IMPLEMENTED
since they are not supported in the baremetal implementation, matching
the behavior of the deprecated hsaKmtProcessVMRead/Write APIs.

Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>

---------

Signed-off-by: Honglei Huang <honghuan@amd.com>
Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>
Co-authored-by: energystoryhhl <energystoryhhl@users.noreply.github.com>
2026-01-09 18:18:53 -08:00
Apurv Mishra be375c2dbf rocr: Add support for Mipmapped Array (#1847)
SWDEV-539526 - Add support for Mipmapped Array in Rocr

Add support for Mipmapped Array functionality in Rocr Runtimeenabling GPU applications to work with multi-level texture mipmaps. The implementation introduces new public APIs for creating, querying, and managing mipmapped arrays across different GPU architectures.

Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
Co-authored-by: Shweta Khatri <shweta.khatri@amd.com>
Co-authored-by: taosang2 <tao.sang@amd.com>
2026-01-08 17:14:39 -06:00
Mario Limonciello 8b529e7b29 Run pre-commit's whitespace related hooks on projects/rocr-runtime/samples (#2126)
In order for pre-commit to be useful, everything needs to meet a common
baseline.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2026-01-08 15:36:57 -05:00
Yiannis Papadopoulos e8fef02e5a rocr/aie: Use util/os to get system memory (#2520) 2026-01-08 12:40:22 -06:00
Flora Cui be04fa8250 rocr: reorder HsaNodeProperties to improve compatibility (#2447)
Signed-off-by: Flora Cui <flora.cui@amd.com>
2026-01-08 09:56:39 +08:00
Alysa Liu 5be4fddf06 kfdtest: Support blit kernel copy (#677)
Add support for blit kernel copy.
Add GpuMemCopyTest test for KFDQMTest.
2026-01-07 16:48:11 -05:00
pghoshamd 637b0d71f0 SWDEV-569319 Replace ScopedAcquire with stdcpp wrappers (#2146)
* SWDEV-569319 Replace ScopedAcquire with stdcpp wrappers

* Remove KernelMutex and KernelSharedMutex abstractions with std::mutex and std::shared_mutex

* Replaced unique_locks with lock_guards

* More changes

* Replace new and deletes with smart pointers

* Replaced some more with shared ptrs

* Replacements with smart pointers - pt 2

* missed change
2026-01-06 10:59:34 -05:00
Maneesh Gupta 4a9833e70e Revert "Add HasExpertSchedMode device prop (#2241)" (#2371)
This reverts commit c0b4aef5ad.
2025-12-17 21:26:44 -08:00
David Yat Sin 5ebd50c0b4 rocr: Fix asyncHandler segfault (#2261)
Fix initialization order for the async events handler. The polling
thread would launch before the wake signal is initialized.
2025-12-17 20:52:20 -08:00
Filip Jankovic c0b4aef5ad Add HasExpertSchedMode device prop (#2241)
* Add HasExpertSchedMode device prop

* Add unit tests for HasExpertSchedMode

* Add gfx12 check for HasExpertSchedMode prop

* Update gfx major version check and test for ExpertSchedMode

* Minor fix and ROCr version bump

* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h

* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h

* Apply suggestion from @dayatsin-amd

* Apply suggestion from @dayatsin-amd

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
Co-authored-by: David Yat Sin <77975354+dayatsin-amd@users.noreply.github.com>
2025-12-17 17:06:08 +01:00
Kent Russell 0a2ea9ef55 hsakmt: Expose and use CWSR and Control stack sizes (#2200)
* hsakmt: Expose CWSR and Control stack sizes

This is better than hardcoding values and hoping that they align with
KFD's definitions

Signed-off-by: Kent Russell <kent.russell@amd.com>

* hsakmt: Use CwsrSize and CtlStackSize if available

If KFD is providing the CwsrSize and CtlStackSize, use the maximum
of those and the old calculations for the ctx_save_restore_size
and ctl_stack_size defined in the queue

Signed-off-by: Kent Russell <kent.russell@amd.com>

* hsakmt: Add warning when ABI<1.20 on GFX1151

CwsrSize and CtlStackSize are reported by KFD ABI 1.20. GFX1151
specifically may have some issues if these regions are misaligned, so
report a strong warning during topology initialization if the system is
GFX1151 but is using KFD ABI < 1.20

Signed-off-by: Kent Russell <kent.russell@amd.com>

---------

Signed-off-by: Kent Russell <kent.russell@amd.com>
2025-12-16 06:26:14 -06:00
randyh62 1240b592a5 Git url fix (#2285)
* Update README-doc.md

Correct GitHub URL for components moved into rocm-systems

* Update amd_clr.rst

Update github.com URLs

* Update Dockerfile

Update rocm-systems paths

* Update CONTRIBUTING.md

update for rocm-systems

* Update CONTRIBUTING.md

minor change

* Update CONTRIBUTING.md

* Update CONTRIBUTING.md

* Update hip_runtime_api.rst

Update for rocm-systems

* Update installation.rst

update URL to libhsakmt

* Update what_is_hip.rst

* Update projects/clr/CONTRIBUTING.md

Co-authored-by: Dominic Widdows <dwiddows@gmail.com>

* Update projects/clr/README-doc.md

Co-authored-by: Dominic Widdows <dwiddows@gmail.com>

* Update Dockerfile

Update git clone for sparse checkout

* Update projects/hip/CONTRIBUTING.md

* Update projects/clr/CONTRIBUTING.md

* Update projects/hipother/CONTRIBUTING.md

---------

Co-authored-by: Dominic Widdows <dwiddows@gmail.com>
2025-12-15 11:57:18 -08:00
Rahul Manocha dd4bee33ff SWDEV-558848 - Update thunk interface signature for vmm enablement (#2259)
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2025-12-11 08:43:28 -08:00
Mario Limonciello 0c4d08f38d Revert correcting the VGPR size for GFX 11.5.1 (#2268)
Although the value is correct; there is no source of truth between
kernel and userspace.  This leads to problems if the kernel has strict
restrictions (such as kernel 6.17 or earlier). The restrictions were
lifted in 6.17.9 and and 6.18, but there is no guarantee userspace is
using this.

So short term this value will be wrong.  But on newer kernels the kernel
will communicate the right size and rocr-runtime will be adjusted to
use that.

Link: https://github.com/ROCm/TheRock/pull/2505

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-12-11 07:59:19 -06:00
shwetakhatri-amd 0835f2e75a rocrtst: Updated CMakeFiles to find_package instead of hardcoded (#2095)
* rocrtst: Updated CMakeFiles to find_package instead of hardcoded

This is to support TheROCK build environment

* rocrtst: Fix CMake to use find_package() instead of hardcoded ENV paths

Fixed CMake style issues from previos first commit's code review

* rocrtst: Fix rocrtst NUMA dependency detection to use find_package

Also added handling of missing headers

* rocrtst: Fix NUMA and hwloc detection for cross-platform builds

---------

Co-authored-by: Shweta Khatri <shweta.khatri@amd.com>
2025-12-10 16:16:25 -05:00
Rahul Manocha 0c1f87a7f6 SWDEV-558848 - vmm api support for rocr on windows (#1761)
* SWDEV-558848 - vmm api support for rocr on windows

* Fixes to VMM handle Map/Unmap Set/Get Access

* Fix GetShareableHandle to use pointer for shareable handle

* Update os specific map/unmap memory calls

* clang format update

* Minor syntax fixes from code review

Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>
2025-12-10 08:39:51 -08:00