1371 Commit

Autore SHA1 Messaggio Data
cfreeamd 5172701708 rocr: Correct gpu dumped core contents (#2851)
Includes several tests (rocrtst) for this capability.
2026-01-30 09:38:09 -08:00
pghoshamd adaaa883b2 SWDEV-576090 Fix mem leaks and double free of signals (#2817) 2026-01-29 16:53:27 -05:00
Yiannis Papadopoulos 66ee941fea rocr/aie: Throw exception for malformed packets and packet submission errors (#2528) 2026-01-29 10:52:42 -05:00
pghoshamd bc20b51f40 SWDEV-561708 Counted queue size from env var (#2844)
* SWDEV-561708 Counted queue size from env var

* use counted_queue_size for test

* remove rocrtst changes; add a const for default queue size

* Remove env var from test; use queue->size

* Improve env var documentation

* Correct type
2026-01-29 10:00:37 -05:00
Tao Sang 66a1e38387 SWDEV-577011 Fix missing ais symbols in Windows (#2871)
Fix missing ais symbols in rocr in Windows
2026-01-28 22:29:30 -05:00
Benjamin Welton 1517a398bf [rocprofiler-sdk] Buffer finalization fixes and HSA ABI 0x09 support (#2318)
* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements

Buffer Pool Design
------------------
Replace the fixed array-based double buffer with a dynamic pool design to
fix race conditions that caused "internal correlation id was retired
prematurely" errors.

The original design had a race where flush callbacks could be delivered
out-of-order: when buffer 0 fills and begins flushing, writes go to
buffer 1. If buffer 1 fills before buffer 0's flush completes, the
buffer index wraps back to 0 (which may still be flushing). Independent
flush tasks submitted to the thread pool can complete out of order.

The new pool design:
- Uses a std::deque of buffer instances that grows as needed
- Allocates buffers from the pool when the current buffer needs to flush
- Serializes flushes with a mutex to ensure FIFO callback ordering
- Returns buffers to the pool after flush completion
- Eliminates the race between buffer selection and write operations

New Unit Tests
--------------
- buffer_correlation_ordering.cpp: Tests that API records are always
  delivered before their corresponding retirement records
- buffer_ordering_stress.cpp: Stress tests buffer flush ordering under
  high contention with multiple threads rapidly filling buffers

HSA Tool Hooks
--------------
Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that
waits for pending flush tasks before tool finalization, preventing
"retired prematurely" errors during HSA shutdown.

Sanitizer Improvements
----------------------
- LSAN: Set fast_unwind_on_malloc=1 to prevent deadlock in libgcc unwinder
- LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup)
- TSAN: Added suppression for false positive on C++11 thread-safe static
  initialization in create_write_functor
- ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto
- Disabled attachment tests for sanitizers due to library preloading issues

Other Fixes
-----------
- Thread-trace agent test: Use heap-allocated callback state
- Correlation ID: Refactored reference counting and finalization ordering

* [rocprofiler-sdk] Revert buffer pool design changes

Revert buffer.cpp and buffer.hpp to the original double-buffer
design from develop branch. The pool-based redesign introduced
concerns about:
- Signal safety (mutex vs atomic_flag)
- API changes (flush() return type)
- Complexity of the new design

This revert removes:
- Dynamic buffer pool with std::deque
- std::mutex/condition_variable synchronization
- buffer_correlation_ordering.cpp test
- buffer_ordering_stress.cpp test

The underlying buffer flush ordering issue will need to be
addressed with a different approach that preserves the original
API and synchronization characteristics.

* [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization

- Revert TOCTOU CAS loop change in sub_ref_count() - not needed with consistent checks
- Add fini_status check in correlation_tracing_service::construct() with ROCP_CI_LOG warning
- Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp, memory_allocation.cpp)
- Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior:
  - hsa/queue.cpp (lines 105, 210)
  - hsa/async_copy.cpp (line 344)
  - hsa/hsa_barrier.cpp (line 43)
  - buffer.cpp (lines 107, 138, 185)

This ensures no correlation IDs are created once finalization starts (fini_status != 0),
preventing races between finalization and ongoing tracing operations.

* [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation

Buffer records are not guaranteed to arrive in any specific order. Tests and
samples should use timestamps for temporal ordering validation instead.

Changes:
- samples/external_correlation_id_request: Replace 'retired prematurely' arrival
  order check with timestamp-based validation that retirement timestamp >=
  max(end_timestamps) for records with the same correlation ID
- tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check

Correlation IDs are not guaranteed to be monotonically increasing when records
are sorted by timestamp. Temporal ordering should be validated using the
timestamp fields in each record.

* [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal

Restore the SYSTEM keyword to target_include_directories for
rocprofiler-sdk-fmt to match develop branch.

* [rccl] Remove orphaned rocSHMEM gitlink

Remove orphaned submodule reference that was introduced during a merge
but never had a corresponding .gitmodules entry, causing CI failures
with "fatal: no submodule mapping found in .gitmodules".

* [rocprofiler-sdk] Add HSA ABI version 0x09 support

Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09 which
introduces hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release
functions (added in rocr-runtime SWDEV-561708).

* [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations

This commit consolidates fixes for handling the finalization status during
buffer flush operations across the SDK.

Changes:
- Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully
  when flushing buffers, as this indicates buffers were already flushed
  during finalization (not an error condition)
- HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use > 0 check
  for fini_status to allow operations during finalization process
- buffer.cpp: Revert fini_status checks to use > 0 for consistency
- correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging
  to prevent correlation ID creation after finalization starts

Files modified:
- source/lib/rocprofiler-sdk-tool/tool.cpp
- tests/tools/json-tool.cpp
- source/lib/rocprofiler-sdk/tests/registration.cpp
- source/lib/rocprofiler-sdk/tests/roctx.cpp
- samples/api_buffered_tracing/client.cpp
- samples/counter_collection/buffered_client.cpp
- samples/counter_collection/device_counting_async_client.cpp
- samples/external_correlation_id_request/client.cpp
- samples/pc_sampling/client.cpp
- source/lib/rocprofiler-sdk/buffer.cpp
- source/lib/rocprofiler-sdk/context/correlation_id.cpp
- source/lib/rocprofiler-sdk/hsa/queue.cpp
- source/lib/rocprofiler-sdk/hsa/async_copy.cpp
- source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp

* [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling

Remove the hsa_tool_hooks infrastructure and simplify buffer flush calls
in samples and tools. The ERROR_FINALIZED handling was overly complex
and the hsa_tool_hooks OnUnload synchronization is no longer needed.

Changes:
- Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code
- Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL
- Simplify buffer flush in tool.cpp and json-tool.cpp
- Remove ERROR_FINALIZED special handling from test files

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Fix output_stream move semantics to null source pointers

The default move constructor and move assignment operator for
output_stream did not null out the source's pointers after the move.
This caused double-close when the moved-from temporary was destroyed,
leading to use-after-free crashes (SIGSEGV in std::ostream::sentry).

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration

- generatePerfetto.cpp: Move output_stream into shared_state to prevent
  use-after-free race conditions during Perfetto callback execution
- run-ci.py: Simplify and consolidate sanitizer environment variable
  configuration for better maintainability

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions

The previous changes removed MEMCHECK_SANITIZER_OPTIONS which is required
for CTest to properly pass suppression files to the sanitizers during
memcheck runs.

Co-Authored-By: Claude <noreply@anthropic.com>

* Revert "[rccl] Remove orphaned rocSHMEM gitlink"

This reverts commit 1ad21003941355658fff8114fa27768f11a948f7.

* [rocprofiler-sdk] Revert registration.cpp changes

Revert changes to registration.cpp to match develop branch.

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Remove suppression file content printing from run-ci.py

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix output_stream move ctor/assignment operator

* Fix erroneous revert of registration.cpp

* Fix handling of fini status in correlation ID construction

* [rocprofiler-sdk] Fix OMPT segfault during finalization

Add nullptr checks in OMPT tracing code to handle the case where
correlation_tracing_service::construct() returns nullptr during
finalization. This fixes segfaults in openmp-target-sample and
tests.integration.execute.openmp-tools.

The correlation ID construction now returns nullptr when fini_status > 0,
but the OMPT callbacks were not checking for this, causing crashes when
dereferencing the null pointer during OpenMP runtime shutdown.

Changes:
- event_common(): Return nullptr early if correlation ID is null
- event(): Check for nullptr before calling sub_ref_count()
- ompt_task_create_callback(): Return early if correlation ID is null
- ompt_task_schedule_callback(): Return early if correlation ID is null

* [rocprofiler-sdk] Fix HSA API tracing segfault during finalization

Add nullptr check in hsa_api_impl::functor after correlation ID
construction. During finalization, correlation_service::construct()
returns nullptr, and without this check the code would dereference
the null pointer when accessing corr_id->internal.

This fixes the SEGV at address 0x000000000008 (null + 8 byte offset)
that occurs when HSA async event threads call hsa_signal_destroy
during runtime shutdown after finalization has started.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2026-01-27 13:27:54 -05:00
Rahul Manocha 324a864bc4 SWDEV-558848 - Move DRM calls to thunk for better abstraction (#1912)
* SWDEV-558848 - Move DRM calls to thunk for better abstraction

* Use thunk device handle instead of drm inside agent

* Update IPC functions with new thunk calls

* create hsaKmtHandleImport interface to support ipc

* Reset metadata inside hsaKmtMemHandleFree

* remove whitespaces and NULL usage

* Add thunk apis to libhsakmt.ver

* Add comments to new structs in thunk

* Minor fixes to declarations

* resolve merge conflicts in amd_kfd_driver

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2026-01-27 08:56:57 -08:00
yugang-amd 576fbaef74 Fix link for global enum and defines (#2802) 2026-01-22 16:17:25 -05:00
Danylo Lytovchenko 7f906ced10 Revert "rocr: SVMPrefetch to a particular numa node (#1063)" (#2792)
This reverts commit 0ba5a01baa.
2026-01-22 17:26:19 +01:00
David Yat Sin 5267cd334b rocr: Refactor SDMA object creation (#2629)
Refactor SDMA object creation and add comment to clarify why GCR is not
needed on DXG.
2026-01-21 21:09:56 -05:00
German Andryeyev 196baa4321 rocr: Fix static build in Windows (#2660) 2026-01-21 18:44:51 -05:00
Sunday Clement 0ba5a01baa rocr: SVMPrefetch to a particular numa node (#1063)
In order for hipMemPrefetchAysnc_v2() api to work, we need rocr to
migrates the ranges of pages requested to the particular NUMA node in
question, via move_pages().

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2026-01-21 16:52:15 -05:00
pghoshamd 8d73201a35 AILIROCR-4 Fix double free of tma_region (#2678) 2026-01-21 15:31:10 -05:00
pghoshamd 793755532f SWDEV-561708 Initial shared queue pool apis (#1614)
* SWDEV-561708 Initial shared queue pool apis

* Validate params; some fixes in callback function (but still needs to be checked)

* Dtor cleanup

* minor

* Enable profiling; remove callback since aql_queue takes care of it

* setPriority and setCuMask APIs updated for counted queues

* Increasing step and minor version for rocprofiler

* Tests for CountedQueueManager

* tests

* Code refactored to make pool manager part of GpuAgent only (incomplete); unique handles issue pending

* Refactored code to support CQM inside GpuAgent and unique handles; multithreaded test added

* Changed to ASSERT_SUCCESS macros for all tests

* RIng buffer overflow test added

* tests fixed; cleanup added at hsa_shutdown

* priority conversion table changes

* Compiler warnings fixed

* Rewrite 1 test; add desc and improve SetUp() code

* Improvement

* Unififed getinfo for both counted and non-counted queues

* Address PR feedback

* Addressing feedback: memleak, data type mismatch, documentation

* improve comment

* format

* Missing HSA_API macros for roctracer

* Revert "Addressing feedback: memleak, data type mismatch, documentation"

This reverts commit 5e498a55fb3640e00d06cec63dcec79293fb23de.

* Improving acquire api doc

* release api doc improved

* error codes for release api doc
2026-01-21 15:30:04 -05:00
Tao Sang 163e44d0a8 SWDEV-555889 - Support mipmap on rocr (#2082)
* SWDEV-555889 - Support mipmap on rocr

Support mipmap in hip-rt on rocr backend.
Enable all mipmap tests in Windows.
Some other minor improvement.

Add some SRD logs that will be removed finally.

* Add sampler.mipFilter to fix sampler issues on mipmap in rocr.
Fix format issues of view of leveled image and  mipmap image in blit kernel in rocr.
Enabled disabled mipmap tests.

* Rewrite view logic

* Set word4.f.PITCH = 0 for mipmap SRD on navi31 to fix unstable test issues.
Reset last error in nagative tests.

* Remove SRD dump log from hip-rt
Let Rocr mipmap log be in condition.

* minor format chang

* Exclude mipmap tests for mi200+ which don't support mipmap.
2026-01-21 09:10:29 -08:00
Alysa Liu 9139f5a241 Revert "rocr: Switch back to legacy IPC (#1744)" (#2676)
This reverts commit 7e4b62290c.
2026-01-20 14:34:10 -05:00
Rahul Manocha 249a947dc7 Fix set/get access failure for VMM on windows (#2280)
* Fix set/get access failure for VMM on windows

* seperate code paths for linux and windows to avoid using import/export calls in windows

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2026-01-16 08:34:21 -08:00
Filip Jankovic 29cd25df66 Add hipDeviceAttributeExpertSchedMode (#2435)
* Add hipDeviceAttributeExpertSchedMode

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>

* Update hipDeviceAttributeExpertSchedMode unit test

* Move check to ROCr from thunk interface

* Revert unrelated whitespace changes

* Revert version bump

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
2026-01-15 08:41:39 -08:00
pghoshamd d2a1fc945e SWDEV-569319 Fix dangling reference warning (#2509)
* SWDEV-569319 Fix dangling reference warning

* fix nullptr warning

* use emplace

* return regular pointer
2026-01-13 15:39:03 -06:00
hongkzha-amd b3c4e94e70 rocr: Improve memory protection and WSL compatibility (#2274)
* rocr: Add ProtectMemory API and use it in RemoveAccess
Replace munmap + mmap with mprotect when removing memory access.
This improves performance by 5-10x, ensures atomicity (no race
condition window), and prepares for WSL/DXG compatibility fixes.

Suggested-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>

* rocr: Skip CPU mapping operations on WSL
On WSL, CPU cannot access GPU VRAM due to platform restrictions.
CPU access would fault-in system RAM instead, causing data corruption
and memory leaks. Return HSA_STATUS_ERROR to fail fast rather than
silently creating broken mappings. GPU-to-GPU mappings remain functional.

Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>

* rocr: reduce ifdef linux
v2: Fix IsDXG check logic

Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>

---------
Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Signed-off-by: Flora Cui <flora.cui@amd.com>
2026-01-13 12:08:20 -06:00
Jin Jung d4758bc29e SWDEV-570501 - Add Windows support for hipGraphicsGLRegisterBuffer (#2323) 2026-01-12 13:10:46 -06:00
Honglei Huang 054bf836f1 [rocr/libhskamt/virtio] Add some apis into libhsakmt virtio (#2457)
* libhsakmt/virtio: Add alloc memory align api

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Rename CLGL BO to AMDGPU BO

Rename VHSA_BO_CLGL to VHSA_BO_AMDGPU to support generic AMDGPU buffer objects, not just CL/GL interop.

* libhsakmt/virtio: Add atomic helpers and node lookup

Add vhsakmt_atomic_inc/dec macros and vhsakmt_get_node_by_id helper function.

* libhsakmt/virtio: Add AMDGPU device initialization support

Add vamdgpu_device_initialize and vamdgpu_device_deinitialize functions.

* libhsakmt/virtio: Add AMDGPU device handle and DRM command support

Add vamdgpu_device_get_fd, vdrmCommandWriteRead and update vhsaKmtGetAMDGPUDeviceHandle.

* libhsakmt/virtio: Add AMDGPU BO free and CPU map support

Add vamdgpu_bo_free and vamdgpu_bo_cpu_map functions.

* libhsakmt/virtio: Add AMDGPU BO import and export support

Add vamdgpu_bo_import, vamdgpu_bo_export and vhsakmt_bo_from_resid functions.

* libhsakmt/virtio: Add AMDGPU BO VA operation support

Add vamdgpu_bo_va_op function.

* libhsakmt/virtio: Add dma buf export support

Add vhsaKmtExportDMABufHandle API in virtio driver to support export
feature.

* libhsakmt/virtio: Fix potential deadlock in userptr deregistration

Refactor vhsakmt_deregister_userptr_non_svm to avoid calling
vhsakmt_destroy_userptr while holding the bo_handles_mutex lock.
Previously, destroying userptrs directly while iterating the tree
could cause deadlock issues due to nested locking.

- Move interval tree removal from vhsakmt_destroy_userptr to caller
- Collect BOs to free in a temporary array during tree traversal
- Destroy BOs after releasing the mutex to avoid lock contention
- Use dynamic array with realloc to handle arbitrary number of BOs

Signed-off-by: Honglei Huang <honghuan@amd.com>

* rocr: driver/virtio: Implement DMA-BUF import/export and memory mapping APIs

Implement the missing DMA-BUF handling and memory mapping functions
in the virtio KFD driver to enable cross-process memory sharing:

- ExportDMABuf: Export HSA memory as DMA-BUF file descriptor
- ImportDMABuf: Import DMA-BUF fd as shareable buffer object
- Map: Map imported buffer into virtual address space with permissions
- Unmap: Unmap buffer from virtual address space
- ReleaseShareableHandle: Free imported buffer object

Also add drm_perm() helper to convert HSA access permissions to
AMDGPU VM page flags (READABLE/WRITEABLE).

These APIs enable IPC memory sharing between HSA processes through
DMA-BUF mechanism in virtualized environments.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add register memory APIs

Add two new memory registration functions to the virtio HSA KMT library:

1. vhsaKmtRegisterMemory: A simplified wrapper for vhsaKmtRegisterMemoryWithFlags
   that uses default CoarseGrain memory flags.

2. vhsaKmtRegisterMemoryToNodes: A stub implementation for registering memory
   to specific nodes. Returns HSAKMT_STATUS_NOT_IMPLEMENTED as it's currently
   not used in ROCR.

Changes:
- Added function declarations in hsakmt_virtio.h
- Implemented functions in hsakmt_virtio_memory.c
- Exported symbols in libhsakmt_virtio.ver

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add graphics handle registration and mapping APIs

- Add vhsaKmtRegisterGraphicsHandleToNodesExt() with flags support
- Add vhsaKmtMapGraphicHandle() and vhsaKmtUnmapGraphicHandle() stubs
- Refactor existing registration API to use extended version

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add virtio support for queue APIs

Implement vhsaKmtUpdateQueue, vhsaKmtSetQueueCUMask,
vhsaKmtAllocQueueGWS and vhsaKmtGetQueueInfo functions
with virtio protocol extensions and symbol exports.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add new virtio API support for model, SMI, and XNACK mode

Add three new API functions to the virtio backend:
- vhsaKmtModelEnabled: Check if pre-silicon model is enabled (returns false for virtio)
- vhsaKmtOpenSMI: Open SMI interface for a node (not yet supported in virtio)
- vhsaKmtSetXNACKMode: Set XNACK mode via virtio control command

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add shared memory support for virtio backend

Implement shared memory APIs for the virtio backend to enable
memory sharing between processes:

- Add vhsaKmtShareMemory() to share memory regions and create
  shared memory handles
- Add vhsaKmtRegisterSharedHandle() to register shared memory
  handles in the current process
- Add vhsaKmtRegisterSharedHandleToNodes() for node-specific
  shared memory registration

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add memory management APIs for virtio

Add the following new memory management APIs to virtio implementation:
- vhsaKmtSetMemoryUserData: Set user data for memory pointer
- vhsaKmtSetMemoryPolicy: Configure memory policy for nodes
- vhsaKmtSVMGetAttr: Get SVM (Shared Virtual Memory) attributes
- vhsaKmtSVMSetAttr: Set SVM attributes
- vhsaKmtReplaceAsanHeaderPage: ASAN header page replacement (stub)
- vhsaKmtReturnAsanHeaderPage: ASAN header page return (stub)

Changes include:
- Added API declarations in hsakmt_virtio.h
- Implemented functions in hsakmt_virtio_memory.c
- Extended protocol definitions in hsakmt_virtio_proto.h
- Added user_data field to vhsakmt_bo structure
- Exported new symbols in libhsakmt_virtio.ver

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add SPM APIs

Add three new SPM-related APIs to the virtio interface:
- vhsaKmtSPMAcquire: Acquire SPM resources on a preferred node
- vhsaKmtSPMRelease: Release SPM resources on a preferred node
- vhsaKmtSPMSetDestBuffer: Set destination buffer for SPM data with
  optional userptr support and data loss detection

These APIs extend the virtio command protocol with new query types:
- VHSAKMT_CCMD_QUERY_SPM_ACQUIRE
- VHSAKMT_CCMD_QUERY_SPM_RELEASE
- VHSAKMT_CCMD_QUERY_SPM_SET_DST_BUFFER

The implementation includes proper buffer management for both
direct BO access and userptr fallback for smaller buffers.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add virtio stub for hsaKmtAisReadWriteFile API

Add vhsaKmtAisReadWriteFile stub implementation for the virtio backend
to support AIS (Accelerated I/O Service) file read/write operations.
This stub currently returns HSAKMT_STATUS_NOT_IMPLEMENTED.

Changes include:
- Add vhsaKmtAisReadWriteFile declaration in hsakmt_virtio.h
- Add stub implementation in hsakmt_virtio_memory.c
- Export the symbol in libhsakmt_virtio.ver

Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>

* libhsakmt/virtio: Add vamdgpu_bo_query_info and vamdgpu_bo_set_metadata APIs

Implement two new virtio wrapper functions for AMDGPU buffer object operations:

1. vamdgpu_bo_query_info: Query buffer object information including
   allocation parameters, memory usage, and metadata.

2. vamdgpu_bo_set_metadata: Set metadata for a buffer object, allowing
   applications to attach custom data to GPU memory allocations.

Signed-off-by: Honglei Huang <honghuan@amd.com>

* libhsakmt/virtio: Add ProcessVMRead/Write stub implementations for virtio

Add vhsaKmtProcessVMRead and vhsaKmtProcessVMWrite stub functions
to the virtio interface. These APIs return HSAKMT_STATUS_NOT_IMPLEMENTED
since they are not supported in the baremetal implementation, matching
the behavior of the deprecated hsaKmtProcessVMRead/Write APIs.

Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>

---------

Signed-off-by: Honglei Huang <honghuan@amd.com>
Signed-off-by: energystoryhhl <energystoryhhl@users.noreply.github.com>
Co-authored-by: energystoryhhl <energystoryhhl@users.noreply.github.com>
2026-01-09 18:18:53 -08:00
Apurv Mishra be375c2dbf rocr: Add support for Mipmapped Array (#1847)
SWDEV-539526 - Add support for Mipmapped Array in Rocr

Add support for Mipmapped Array functionality in Rocr Runtimeenabling GPU applications to work with multi-level texture mipmaps. The implementation introduces new public APIs for creating, querying, and managing mipmapped arrays across different GPU architectures.

Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
Co-authored-by: Shweta Khatri <shweta.khatri@amd.com>
Co-authored-by: taosang2 <tao.sang@amd.com>
2026-01-08 17:14:39 -06:00
Yiannis Papadopoulos e8fef02e5a rocr/aie: Use util/os to get system memory (#2520) 2026-01-08 12:40:22 -06:00
pghoshamd 637b0d71f0 SWDEV-569319 Replace ScopedAcquire with stdcpp wrappers (#2146)
* SWDEV-569319 Replace ScopedAcquire with stdcpp wrappers

* Remove KernelMutex and KernelSharedMutex abstractions with std::mutex and std::shared_mutex

* Replaced unique_locks with lock_guards

* More changes

* Replace new and deletes with smart pointers

* Replaced some more with shared ptrs

* Replacements with smart pointers - pt 2

* missed change
2026-01-06 10:59:34 -05:00
Maneesh Gupta 4a9833e70e Revert "Add HasExpertSchedMode device prop (#2241)" (#2371)
This reverts commit c0b4aef5ad.
2025-12-17 21:26:44 -08:00
David Yat Sin 5ebd50c0b4 rocr: Fix asyncHandler segfault (#2261)
Fix initialization order for the async events handler. The polling
thread would launch before the wake signal is initialized.
2025-12-17 20:52:20 -08:00
Filip Jankovic c0b4aef5ad Add HasExpertSchedMode device prop (#2241)
* Add HasExpertSchedMode device prop

* Add unit tests for HasExpertSchedMode

* Add gfx12 check for HasExpertSchedMode prop

* Update gfx major version check and test for ExpertSchedMode

* Minor fix and ROCr version bump

* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h

* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h

* Apply suggestion from @dayatsin-amd

* Apply suggestion from @dayatsin-amd

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
Co-authored-by: David Yat Sin <77975354+dayatsin-amd@users.noreply.github.com>
2025-12-17 17:06:08 +01:00
randyh62 1240b592a5 Git url fix (#2285)
* Update README-doc.md

Correct GitHub URL for components moved into rocm-systems

* Update amd_clr.rst

Update github.com URLs

* Update Dockerfile

Update rocm-systems paths

* Update CONTRIBUTING.md

update for rocm-systems

* Update CONTRIBUTING.md

minor change

* Update CONTRIBUTING.md

* Update CONTRIBUTING.md

* Update hip_runtime_api.rst

Update for rocm-systems

* Update installation.rst

update URL to libhsakmt

* Update what_is_hip.rst

* Update projects/clr/CONTRIBUTING.md

Co-authored-by: Dominic Widdows <dwiddows@gmail.com>

* Update projects/clr/README-doc.md

Co-authored-by: Dominic Widdows <dwiddows@gmail.com>

* Update Dockerfile

Update git clone for sparse checkout

* Update projects/hip/CONTRIBUTING.md

* Update projects/clr/CONTRIBUTING.md

* Update projects/hipother/CONTRIBUTING.md

---------

Co-authored-by: Dominic Widdows <dwiddows@gmail.com>
2025-12-15 11:57:18 -08:00
Rahul Manocha dd4bee33ff SWDEV-558848 - Update thunk interface signature for vmm enablement (#2259)
Co-authored-by: Rahul Manocha <rmanocha@amd.com>
2025-12-11 08:43:28 -08:00
Rahul Manocha 0c1f87a7f6 SWDEV-558848 - vmm api support for rocr on windows (#1761)
* SWDEV-558848 - vmm api support for rocr on windows

* Fixes to VMM handle Map/Unmap Set/Get Access

* Fix GetShareableHandle to use pointer for shareable handle

* Update os specific map/unmap memory calls

* clang format update

* Minor syntax fixes from code review

Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>

---------

Co-authored-by: Rahul Manocha <rmanocha@amd.com>
Co-authored-by: Yiannis Papadopoulos <102817138+ypapadop-amd@users.noreply.github.com>
2025-12-10 08:39:51 -08:00
Jin Jung deaf8ab38a SWDEV-567119 - Windows GL Interop Support (#1892) 2025-12-08 11:03:59 -05:00
Mario Limonciello bc5d48e76c Run pre-commit's whitespace related hooks on projects/rocr-runtime (#2130)
* Run pre-commit's whitespace related hooks on projects/rocr-runtime

In order for pre-commit to be useful, everything needs to meet a common
baseline.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Add missing semicolon which would block compilation on big endian CPUs

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

---------

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-12-08 07:56:50 -06:00
Ammar ELWazir 9baa65a8b7 [ROCR-Runtime] [ROCProfiler-SDK] Fixing the copy back to the original buffer malformed packets (#2185)
* Fixing the copy back to the original buffer malformed packets

* Addressing Copilot Comments

* Addressing Review comments

* Adjust staging buffer size allocation

Change staging buffer size to match the number of packets.
2025-12-05 10:58:01 -06:00
hongkzha-amd 4bd1b90e62 rocr: Add WSL support by conditionally handling DRM operations (#2081)
This patch enhances compatibility for DXG environments by introducing conditional
checks for DRM operations, particularly around buffer object metadata handling
in IPC scenarios. These changes improve robustness in DXG IPC memory management
without impacting existing functionality in standard Linux environments.

Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com>
2025-12-05 07:50:48 +08:00
German Andryeyev f3ffd7070c rocr: Change hsaKmtQueueRingDoorbell interface (#2068)
WSL uses the call just for the thread wake-up, however under Windows
KMD needs the actual value (SWDEV-568592). The interface is changed
to avoid programming of a modified write_ptr value, which somewhat
changes the client's logic.
2025-12-03 11:49:40 -05:00
Sunday Clement 3f3260ffd2 rocr: Fix IPC dmabuf hang with large allocations (#1945)
Changed ipc_sock_server_conns_ map's value type to size_t. Previous
type of int caused allocations of sizes greater than 2GB to overflow,
causing the message len to be stored as a negative value, preventing the
IPC server from exporting dmabuf file descriptors, which lead to hangs.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
2025-12-03 11:20:30 -05:00
lancesix 7ba0f1bdcf ROCr-runtime: Allow core dumps for gfx115x (#1827)
Core dumps are not supporetd for gfx110x, but should be possible for
gfx115x.  The current code disables core dumps completly for all gfx11xx
agents, relax this to allow gfx115x.
2025-12-02 11:51:29 +00:00
Jaydeep cf91f5f77a SWDEV-568926 - Destroy queue_inactive_signal directly. (#2059) 2025-12-02 09:55:58 +05:30
Alysa Liu 5b75ec6a09 rocr: Fix error when internal signal is destroyed (#1845)
Fix error when we destroy internal signals during shutdown.
Fix init dependency on uninitialized value.
2025-11-26 16:22:57 -08:00
cfreeamd 24c2a84e3f rocr: GPU core file location support (#1732)
* rocr: WIP Support dump of GPU core file

* WIP new core dump tests compile

* WIP: anony namespaces, test updates, progress

Added disabled Fault test. Other non-disabled coredump tests don't work.

* WIP: address code review feedback

* WIP: gpu core dump rocrtst works; combined

* WIP: remove rocrtst changes for this commit
2025-11-20 18:50:51 -08:00
jonatluu 6b8aae3796 Enable Lintian Support rocm-systems (#1578)
* draft testing fix for no copyright file and no changelog

* test fix no-changelog no-copyright

* changelog copyright fixt

* remove utils.cmake

* rocr lintian

* lintian overrides, copyright, changelog install

* fix lintian overrides install

* comp_type static fix and remove debug logs

* syntax error

* update static build check

* update file permissions to 0755 to fix error control-file-has-bad-permissions 0664 != 0755

* fix lintian errors in rdc and remove logs from roctracer

* lintian error fix rocprofiler

* fix lintian error

* mmove lintian overrides install

* lintian errors fix

* move lintian overrides install

* use changelog already provided by rdc

* fix formatting use existing changelog if provided

* fix formatting use changelog in rocprofiler

* draft testing fix for no copyright file and no changelog

* test fix no-changelog no-copyright

* changelog copyright fixt

* lintian overrides, copyright, changelog install

* fix lintian overrides install

* comp_type static fix and remove debug logs

* fix lintian errors in rdc and remove logs from roctracer

* lintian error fix rocprofiler

* fix lintian error

* mmove lintian overrides install

* lintian errors fix

* move lintian overrides install

* use changelog already provided by rdc

* fix formatting use existing changelog if provided

* fix formatting use changelog in rocprofiler

* remove overrides. Use existing changelog and copyright

* resolve merge conflict

* update license for hsa-rocr. Use NCSA license

* install license

* install license
2025-11-20 11:38:39 -05:00
German Andryeyev 919642f721 rocr: Expose PM4 emulation in the agent info (#1869)
Native AQL path in Windows requires extra logic, which has a conflict
with the implementation of Pm4 emulation and needs a detection in the client.
2025-11-19 18:26:23 -05:00
JeniferC99 09010eb68a Revert "rocr: Fix VMM cpu mapping clean up (#1831)" (#1923)
This reverts commit 2327cd35c8.
2025-11-19 10:33:50 -08:00
David Yat Sin 9535b7fcbe rocr: Fix exception on AsyncEventControl init (#1852)
* rocr: Fix exception on AsyncEventControl init

Fix exception on init when compiling with in release mode.

* rocr: Fix crash when interrupts are disabled

Fix segfault due to assert for signal->EopEvent() being false when
HSA_ENABLE_INTERRUPT=0. Use Signal::WaitMultiple(..) when interrupt is
disabled.

---------

Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2025-11-14 12:45:34 -08:00
Alysa Liu 2327cd35c8 rocr: Fix VMM cpu mapping clean up (#1831)
Remove CPU mapping before calling RemoveAccess().
2025-11-14 13:52:45 -05:00
David Yat Sin 7e4b62290c rocr: Switch back to legacy IPC (#1744)
Switch back to legacy IPC Implementation while we fix some race
conditions.
2025-11-13 09:41:55 -05:00
Tim Huang e2d83014cf rocr/dtif: Add ring doorbell for sdma user queue (#1619)
Signed-off-by: Tim Huang <tim.huang@amd.com>
2025-11-13 15:08:08 +08:00
David Yat Sin 7b097599c4 rocr: Fix race condition in SetAsyncSignalHandler (#1642)
Fix race condition when SetAsyncSignalHandler for the first time because
async_events_thread_ could be null and launched twice.

Refactored async-events to use lazy_pointer.
2025-11-12 13:54:26 -05:00
Jin Jung 291ff6c468 SWDEV-558855 - Enable Interop Map Buffer on Windows (#1748)
* Support Windows HANDLE in interop_map_buffer

* Refactored Windows HANDLE in interop_map_buffer

* ROCr System Dependent Handle Type

* Fix for ROCr Handle Conversion Bug

* Remove Windows Header
2025-11-07 12:47:01 -08:00