269 Commits

Author SHA1 Message Date
pghoshamd bc20b51f40 SWDEV-561708 Counted queue size from env var (#2844)
* SWDEV-561708 Counted queue size from env var

* use counted_queue_size for test

* remove rocrtst changes; add a const for default queue size

* Remove env var from test; use queue->size

* Improve env var documentation

* Correct type
2026-01-29 10:00:37 -05:00
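A minimal sketch of the idea in this commit, reading a queue size from an environment variable and falling back to a default constant. The constant value, the function names and the variable name passed by the caller are all hypothetical; the commit does not state them here.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical default; the commit only says a default-size constant was added.
constexpr std::uint32_t kDefaultCountedQueueSize = 4096;

// Parses a queue size from text. Returns `fallback` when the text is missing,
// empty, not a plain decimal number, zero, or too large for 32 bits.
inline std::uint32_t ParseQueueSize(const char* text, std::uint32_t fallback) {
  if (text == nullptr || *text == '\0') return fallback;
  char* end = nullptr;
  const unsigned long long value = std::strtoull(text, &end, 10);
  if (end == text || *end != '\0') return fallback;  // no digits or trailing junk
  if (value == 0 || value > UINT32_MAX) return fallback;
  return static_cast<std::uint32_t>(value);
}

// Reads the size from the environment variable named by the caller
// (the real variable name is not given in this log).
inline std::uint32_t CountedQueueSizeFromEnv(const char* var_name) {
  return ParseQueueSize(std::getenv(var_name), kDefaultCountedQueueSize);
}
```

Keeping the parsing separate from the std::getenv call lets a test exercise the fallback rules without touching the process environment.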
pghoshamd 793755532f SWDEV-561708 Initial shared queue pool apis (#1614)
* SWDEV-561708 Initial shared queue pool apis

* Validate params; some fixes in callback function (but still needs to be checked)

* Dtor cleanup

* minor

* Enable profiling; remove callback since aql_queue takes care of it

* setPriority and setCuMask APIs updated for counted queues

* Increasing step and minor version for rocprofiler

* Tests for CountedQueueManager

* tests

* Code refactored to make pool manager part of GpuAgent only (incomplete); unique handles issue pending

* Refactored code to support CQM inside GpuAgent and unique handles; multithreaded test added

* Changed to ASSERT_SUCCESS macros for all tests

* Ring buffer overflow test added

* tests fixed; cleanup added at hsa_shutdown

* priority conversion table changes

* Compiler warnings fixed

* Rewrite 1 test; add desc and improve SetUp() code

* Improvement

* Unified getinfo for both counted and non-counted queues

* Address PR feedback

* Addressing feedback: memleak, data type mismatch, documentation

* improve comment

* format

* Missing HSA_API macros for roctracer

* Revert "Addressing feedback: memleak, data type mismatch, documentation"

This reverts commit 5e498a55fb3640e00d06cec63dcec79293fb23de.

* Improving acquire api doc

* release api doc improved

* error codes for release api doc
2026-01-21 15:30:04 -05:00
Tao Sang 163e44d0a8 SWDEV-555889 - Support mipmap on rocr (#2082)
* SWDEV-555889 - Support mipmap on rocr

Support mipmap in hip-rt on rocr backend.
Enable all mipmap tests in Windows.
Some other minor improvement.

Add some SRD logs that will eventually be removed.

* Add sampler.mipFilter to fix sampler issues on mipmap in rocr.
Fix format issues of view of leveled image and mipmap image in blit kernel in rocr.
Enabled disabled mipmap tests.

* Rewrite view logic

* Set word4.f.PITCH = 0 for mipmap SRD on navi31 to fix unstable test issues.
Reset last error in negative tests.

* Remove SRD dump log from hip-rt
Let Rocr mipmap log be in condition.

* minor format change

* Exclude mipmap tests for mi200+ which don't support mipmap.
2026-01-21 09:10:29 -08:00
Filip Jankovic 29cd25df66 Add hipDeviceAttributeExpertSchedMode (#2435)
* Add hipDeviceAttributeExpertSchedMode

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>

* Update hipDeviceAttributeExpertSchedMode unit test

* Move check to ROCr from thunk interface

* Revert unrelated whitespace changes

* Revert version bump

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
2026-01-15 08:41:39 -08:00
Jin Jung d4758bc29e SWDEV-570501 - Add Windows support for hipGraphicsGLRegisterBuffer (#2323) 2026-01-12 13:10:46 -06:00
Apurv Mishra be375c2dbf rocr: Add support for Mipmapped Array (#1847)
SWDEV-539526 - Add support for Mipmapped Array in Rocr

Add support for Mipmapped Array functionality in Rocr Runtime, enabling GPU applications to work with multi-level texture mipmaps. The implementation introduces new public APIs for creating, querying, and managing mipmapped arrays across different GPU architectures.

Signed-off-by: Apurv Mishra <Apurv.Mishra@amd.com>
Co-authored-by: Shweta Khatri <shweta.khatri@amd.com>
Co-authored-by: taosang2 <tao.sang@amd.com>
2026-01-08 17:14:39 -06:00
Maneesh Gupta 4a9833e70e Revert "Add HasExpertSchedMode device prop (#2241)" (#2371)
This reverts commit c0b4aef5ad.
2025-12-17 21:26:44 -08:00
Filip Jankovic c0b4aef5ad Add HasExpertSchedMode device prop (#2241)
* Add HasExpertSchedMode device prop

* Add unit tests for HasExpertSchedMode

* Add gfx12 check for HasExpertSchedMode prop

* Update gfx major version check and test for ExpertSchedMode

* Minor fix and ROCr version bump

* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h

* Update projects/rocr-runtime/runtime/hsa-runtime/inc/hsa_ext_amd.h

* Apply suggestion from @dayatsin-amd

* Apply suggestion from @dayatsin-amd

---------

Co-authored-by: Stefan Sokolovic <stefan.sokolovic2@amd.com>
Co-authored-by: David Yat Sin <77975354+dayatsin-amd@users.noreply.github.com>
2025-12-17 17:06:08 +01:00
Jin Jung deaf8ab38a SWDEV-567119 - Windows GL Interop Support (#1892) 2025-12-08 11:03:59 -05:00
Mario Limonciello bc5d48e76c Run pre-commit's whitespace related hooks on projects/rocr-runtime (#2130)
* Run pre-commit's whitespace related hooks on projects/rocr-runtime

In order for pre-commit to be useful, everything needs to meet a common
baseline.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

* Add missing semicolon which would block compilation on big endian CPUs

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>

---------

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-12-08 07:56:50 -06:00
German Andryeyev 919642f721 rocr: Expose PM4 emulation in the agent info (#1869)
Native AQL path in Windows requires extra logic, which has a conflict
with the implementation of Pm4 emulation and needs a detection in the client.
2025-11-19 18:26:23 -05:00
Jin Jung 291ff6c468 SWDEV-558855 - Enable Interop Map Buffer on Windows (#1748)
* Support Windows HANDLE in interop_map_buffer

* Refactored Windows HANDLE in interop_map_buffer

* ROCr System Dependent Handle Type

* Fix for ROCr Handle Conversion Bug

* Remove Windows Header
2025-11-07 12:47:01 -08:00
David Yat Sin db01d95ebc Users/dayatsin/swdev 519413 hsa amd pointer info return err shutdown (#1509)
* rocr: hsa_amd_pointer_info return err on shutdown

Decrement ref count before starting to unload to make sure API
calls during shutdown return error.

Delete blit objects during agent destructor.

* Add support for HSA_AMD_SYSTEM_SHUTDOWN_EVENT

Add support for new event to indicate shut down within the
hsa_amd_register_system_event_handler API.
2025-10-27 09:32:52 -04:00
hkasivis 5e7210980e Users/hkasivis/add ais support v2.1 (#928)
* libhsakmt: Update hsakmt_fmm_get_handle to support address range

Currently, hsakmt_fmm_get_handle works only if the address is the
allocated (starting) value. Update it so it can find the handle if the
address falls in the valid allocated range. This is useful for the AMD
Infinity Storage feature, where data needs to be transferred to any
memory within the allocated range.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

* libhsakmt: Introduce AMD Infinity Storage (AIS) API

Add hsaKmtAisReadWriteFile() API to support AMD Infinity Storage. The
API moves data directly from GPU VRAM to a file.

v2: Add in/out ioctl arguments to provide more status information to
user space. Modify hsaKmt API also accordingly.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

* rocr: Initial implementation of AMD Infinity Storage (AIS)

Implement the first two APIs: hsa_amd_ais_file_write and hsa_amd_ais_file_read

v2: Change API from hsa_amd_ to hsa_amd_ais_
    Change API to take in handle instead of fd for compatibility across
     different platforms

Original Author: Chris Freehill <Chris.Freehill@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

---------

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2025-09-20 11:30:05 -04:00
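The range lookup described for hsakmt_fmm_get_handle can be illustrated with an ordered map keyed by allocation start address. This is only a sketch of the technique: the `Allocation` struct, `AllocationMap` and `FindHandle` are hypothetical names, and libhsakmt's actual bookkeeping is not shown in this log.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical bookkeeping: allocation start address -> size and handle.
struct Allocation {
  std::size_t size;
  std::uint64_t handle;
};
using AllocationMap = std::map<std::uintptr_t, Allocation>;

// Finds the handle of the allocation containing `address`, whether the
// address is the start of the allocation or falls anywhere inside it.
// Returns false when no allocation covers the address.
inline bool FindHandle(const AllocationMap& allocations, std::uintptr_t address,
                       std::uint64_t* handle_out) {
  auto it = allocations.upper_bound(address);  // first start beyond address
  if (it == allocations.begin()) return false; // every allocation starts later
  --it;                                        // last start at or before address
  if (address - it->first >= it->second.size) return false;  // past its end
  *handle_out = it->second.handle;
  return true;
}
```

A start-only lookup would use an exact-match find; stepping back from upper_bound is what lets an interior address resolve to its owning allocation.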
David Yat Sin 96a0d16eda rocr: Fix hsa_amd_pointer_info regression (#719)
Fix for hsa_amd_pointer_info returning only
HSA_EXT_POINTER_TYPE_RESERVED_ADDR for SVM allocations.
2025-09-19 10:09:22 -04:00
Alysa Liu 2b2b8329b5 rocr: Add copyright for new files (#886)
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
2025-09-11 10:56:31 -04:00
SaleelK 230a22b395 rocr: Workaround for peak SDMA b/w on gfx94x (#626)
* Ideally SDMA0/1/2 are the engines to use for H2D/D2H due to physical
  PCIE proximity
* Allow using same src/dst agent for SDMA query apis
2025-09-03 09:33:29 -04:00
David Yat Sin a1597a358a rocr: Expose flag to allocate uncached memory (#674)
Add new flag for clients to directly request uncached memory
2025-08-22 09:52:39 -04:00
David Yat Sin 875fb40a03 Dayatsin/develop vmm pointer info (#305)
* rocr: hsa_amd_pointer_info to support VMEM pointers

Extend hsa_amd_pointer_info to support virtual memory addresses.

If hsa_amd_pointer_info is called on an address that is reserved but not
mapped to memory, then the pointer type will be reported as
HSA_EXT_POINTER_TYPE_RESERVED_ADDR.

If hsa_amd_pointer_info is called on an address that is mapped, then the
pointer type will be reported as HSA_EXT_POINTER_TYPE_HSA_VMEM

* rocrtst: VirtMemory_Basic_Test test for pointer info

Extend rocrtstFunc.VirtMemory_Basic_Test to test for
hsa_amd_pointer_info

* rocrtst: Add SVM Memory Test
2025-08-13 14:21:47 -04:00
mat3ix c41050d01f rocr: SDMA improvements (#326)
- When the SDMA queue gets full while copying 2GB or more, it blocks the
async copy api
- Improve/format logging
2025-08-13 10:25:29 -04:00
David Yat Sin 4e069fe72b doc: Fix doxygen comments for in-out params
[ROCm/ROCR-Runtime commit: 4c2dec5bb8]
2025-07-10 08:21:01 -04:00
Sunday Clement 315b1abaf9 rocr: Add hsa-agent Queries for Clock Counters
Add support for querying the HSA_AMD_INFO_GET_CLOCK_COUNTERS agent info
through the hsa api in rocr, so that the user no longer has to make a
direct IOCTL call through the kernel driver.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>


[ROCm/ROCR-Runtime commit: e97d06530e]
2025-06-23 18:45:09 -04:00
David Yat Sin 39bddd8b9d rocr: support reserving non-registered VA
Extend hsa_amd_vmem_address_reserve/hsa_amd_vmem_address_reserve_align
to support HSA_AMD_VMEM_ADDRESS_NO_REGISTER flag. This allocation can be
used to reserve virtual address ranges that can later be used by
hsa_amd_svm_attributes_set for SVM based memory allocations.


[ROCm/ROCR-Runtime commit: b3c48cc68c]
2025-06-18 18:21:11 -04:00
David Yat Sin b66b6991b0 rocr: Remove scratch_backing_memory_byte_size
scratch_backing_memory_byte_size was originally removed, and then put
back in e130172218. This was because it
was used by rocgdb. rocgdb code has been updated to not use this field.
Bumped _amdgpu_r_debug for the ABI change.


[ROCm/ROCR-Runtime commit: 3c0af843e3]
2025-06-12 15:33:47 -04:00
Chris Freehill 91268a6be9 rocr: Add hsa_amd_portable_export_dmabuf_v2
The original version of hsa_amd_portable_export_dmabuf() did not
consider the conditions under which a dmabuf could be shared.
In the new version (hsa_amd_portable_export_dmabuf_v2()), the caller
can specify the flag HSA_AMD_DMABUF_MAPPING_TYPE_PCIE, which means they
want to share the dmabuf over PCIe. In that case, if the device is a PCIe
GPU that is not in an XGMI hive and large-BAR is not supported, the new
code returns an error.


[ROCm/ROCR-Runtime commit: a34604bddb]
2025-06-09 15:42:58 -05:00
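The PCIe sharing check described in this commit message reduces to a small predicate. The sketch below uses a hypothetical `DeviceTraits` struct and function name; it mirrors only the condition stated in the message, not the runtime's actual code.

```cpp
// Hypothetical summary of the device properties the message refers to.
struct DeviceTraits {
  bool is_pcie_gpu;   // device is attached over PCIe
  bool in_xgmi_hive;  // device is part of an XGMI hive
  bool large_bar;     // large-BAR is supported
};

// Returns false for the one case the message describes as an error when
// PCIe sharing is requested: a PCIe GPU outside an XGMI hive without
// large-BAR support. Every other combination is allowed in this sketch.
inline bool CanExportOverPcie(const DeviceTraits& device) {
  if (device.is_pcie_gpu && !device.in_xgmi_hive && !device.large_bar) {
    return false;
  }
  return true;
}
```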
Chris Freehill 287986ab65 rocr: Add hsa_amd_portable_export_dmabuf_v2
The original version of hsa_amd_portable_export_dmabuf() did not
consider the conditions under which a dmabuf could be shared.
In the new version (hsa_amd_portable_export_dmabuf_v2()), the caller
can specify the flag HSA_AMD_DMABUF_MAPPING_TYPE_PCIE, which means they
want to share the dmabuf over PCIe. In that case, if the device is a PCIe
GPU that is not in an XGMI hive and large-BAR is not supported, the new
code returns an error.


[ROCm/ROCR-Runtime commit: 3a9d14bb66]
2025-06-09 15:42:58 -05:00
David Yat Sin 4515a48355 rocr: Update async-scratch reclaim API doc
[ROCm/ROCR-Runtime commit: c3978d03a4]
2025-05-28 20:08:52 -04:00
David Yat Sin 39ecc88315 rocr: Remove deprecated doorbell type 1 support
[ROCm/ROCR-Runtime commit: 0d70045817]
2025-05-28 16:12:02 -04:00
David Yat Sin 38ea4370c1 rocr: Fix doorbell ring
When compiling with -O0, some compilers generate an xchg instruction for
the __atomic_store(...) built-in. Using xchg on MMIO memory is
undefined behavior and may be ignored on certain CPUs.


[ROCm/ROCR-Runtime commit: f011a9506d]
2025-05-20 09:19:10 -04:00
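One common way to avoid an exchange instruction on a doorbell write is to separate the ordering from the store: issue a release fence, then perform a plain volatile store. The sketch below shows that pattern only as an illustration. `RingDoorbell` is a hypothetical name, the log does not say this is the change the commit made, and the exact instructions emitted depend on the compiler and target.

```cpp
#include <atomic>
#include <cstdint>

// Writes `value` to a doorbell location without using an atomic store
// built-in for the write itself.
inline void RingDoorbell(volatile std::uint64_t* doorbell, std::uint64_t value) {
  // Make earlier writes (for example, queue packets) visible before the
  // doorbell write is issued.
  std::atomic_thread_fence(std::memory_order_release);
  // Volatile access: the compiler must emit this store and may not merge or
  // drop it. On x86-64 an aligned 64-bit store like this is typically a
  // single mov rather than an xchg.
  *doorbell = value;
}
```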
Tony Gutierrez 6f37386eb2 rocr: Flags to alloc queue buf/struct in dev mem
This builds on a prior change that allowed for allocating
a user-mode queue's packet buffer in device memory to also
allocate the queue struct in device memory. This provides
additional latency benefits particularly for cases where
dispatches are performed from the GPU itself. Flags are
added to support the various use cases.


[ROCm/ROCR-Runtime commit: 6e3c375bf1]
2025-04-23 15:53:29 -04:00
Saleel Kudchadker 945d6da90b rocr: return preferred SDMA engine mask
- Add a new AMD extension API to return preferred SDMA engine mask.
This can be used in conjunction with copy_on_engine API to get
optimal bandwidth.


[ROCm/ROCR-Runtime commit: 57c0c643ce]
2025-04-22 13:28:38 -07:00
Yiannis Papadopoulos f53a9c72c4 rocr/aie: Using PDI address instead of cu_mask for dispatch. Automatic hw ctx reconfiguration upon new PDI addition.
[ROCm/ROCR-Runtime commit: c63e01724c]
2025-04-03 15:13:20 -05:00
Yiannis Papadopoulos bd109ec288 rocr/aie: Remove unused struct from HSA API
[ROCm/ROCR-Runtime commit: 8dcbbf31c7]
2025-03-27 13:15:13 -04:00
Longlong Yao 007795951b rocr: export pointer type for OnlyAddress
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>


[ROCm/ROCR-Runtime commit: a254e35fd6]
2025-03-11 10:16:58 -04:00
zichguan-amd b172fbd538 rocr: Allow 0/NULL/invalid signal handles for wait operations to be no-op
Remove hard assertions for signal validation on hsa_amd_signal_wait_* operations, instead ignore 0/NULL/invalid signals in the dependency condition evaluation to align with HSA specs for barrier-AND and barrier-OR packets.

Signed-off-by: zichguan-amd <zichuan.guan@amd.com>


[ROCm/ROCR-Runtime commit: e4d027191c]
2025-03-07 15:17:10 -05:00
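The "ignore null handles" rule can be sketched for the wait-all (barrier-AND style) case, where a dependency counts as met once its signal value reaches 0. `SignalView` and `AllDependenciesMet` are hypothetical names for illustration, and treating only handle 0 as null is an assumption of this sketch; the runtime's own validity check is not shown in this log.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical snapshot of one dependency: its handle and current value.
struct SignalView {
  std::uint64_t handle;  // 0 means "no signal"
  std::int64_t value;    // dependency is met when this is 0
};

// Barrier-AND style evaluation: every non-null signal must have reached 0.
// Null handles are skipped instead of being treated as an error, so a list
// containing only null handles is trivially satisfied.
inline bool AllDependenciesMet(const std::vector<SignalView>& signals) {
  for (const SignalView& signal : signals) {
    if (signal.handle == 0) continue;      // null handle: no-op
    if (signal.value != 0) return false;   // still pending
  }
  return true;
}
```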
David Yat Sin e130172218 rocr: Put back scratch_backing_memory_byte_size
The scratch_backing_memory_byte_size is not used by CP, but it is
currently used by rocgdb. Putting the field back, but we need to find a
solution for alt_scratch_backing_memory_byte_size.

Also, completely disabling alternate scratch as we need some changes to
support debugger.


[ROCm/ROCR-Runtime commit: 02b38d0614]
2025-03-06 16:23:38 -05:00
David Yat Sin 2dcc1989bc rocr: Add queries for async scratch reclaim
Add support for these 2 new queries:
- HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_MAX
  Maximum amount of scratch memory allowed on this agent

- HSA_AMD_AGENT_INFO_SCRATCH_LIMIT_CURRENT
  Current limit for scratch memory on this agent


[ROCm/ROCR-Runtime commit: 107b48fb15]
2025-02-19 21:02:00 -05:00
David Yat Sin 5905b82579 rocr: Update for new async scratch reclaim
Updating ROCr code to match new handshake protocol with CP FW for
asynchronous scratch reclaim.
Increase previous limits when scratch reclaim feature is available.


[ROCm/ROCR-Runtime commit: aa2f98e6f9]
2025-02-19 21:02:00 -05:00
Luna Nova 9a0f0858fa rocr: set underlying type of hsa_region
Set underlying type of hsa_region_info_t, hsa_amd_region_info_t
to int.

Change-Id: Ibf97a025eec6176d8e28af8009e9bd6795ca061f


[ROCm/ROCR-Runtime commit: 166b08346b]
2025-02-06 16:25:03 -05:00
Tony Gutierrez ff52d6fc13 rocr: Add WaitMultiple to core Signal
Replaces WaitAny with WaitMultiple to more closely align with the
underlying driver API for waiting on multiple events.

WaitMultiple adds a single parameter, wait_on_all, to the WaitAny
interface providing a single function for waiting on multiple
events when we only need AND and OR semantics for the signal
checking logic.

Change-Id: I68a4a45d48151d9d69aef02fd8f7263b9e6c0e75


[ROCm/ROCR-Runtime commit: 8a38f121ea]
2025-01-27 09:21:43 -05:00
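The effect of the wait_on_all parameter can be sketched as a single predicate that switches between AND and OR semantics. Only the parameter name wait_on_all comes from the commit message; `WaitConditionMet` and the vector-of-bool input are hypothetical stand-ins for the per-signal condition checks.

```cpp
#include <algorithm>
#include <vector>

// `satisfied[i]` says whether signal i currently meets its wait condition.
// With wait_on_all set, every signal must be satisfied (AND semantics);
// otherwise one satisfied signal is enough (OR semantics).
// For an empty list, AND yields true and OR yields false.
inline bool WaitConditionMet(const std::vector<bool>& satisfied, bool wait_on_all) {
  const auto is_set = [](bool flag) { return flag; };
  if (wait_on_all) {
    return std::all_of(satisfied.begin(), satisfied.end(), is_set);
  }
  return std::any_of(satisfied.begin(), satisfied.end(), is_set);
}
```

Folding both modes into one function is what lets a WaitAny-style caller pass wait_on_all = false and keep its previous behavior.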
David Yat Sin d0ae8b2eb5 rocr: Add support for gfx950
<squashed with patch for gfx950 generic targets>

Signed-off-by: Chris Freehill <Chris.Freehill@amd.com>

Change-Id: Ifec6d93cf46c7fbf736c6572882299e279260af6


[ROCm/ROCR-Runtime commit: dab8f2fc65]
2025-01-26 13:04:58 -05:00
Ben Vanik 15cc61baf4 rocr: Fixing non-portable inline attribute on hsa_flag_* utilities.
Change-Id: Ie1c53fef407a71b5ec4c6eaf3a3ed00871184408


[ROCm/ROCR-Runtime commit: 9971e7b004]
2025-01-23 15:09:21 -05:00
Longlong Yao 0b1dc71200 rocr: add AMD_KERNEL_CODE_PROPERTIES_ENABLE_WAVEFRONT_SIZE32
Change-Id: I158705499f4ab0b1231d698d66902eb4ab1ececa
Signed-off-by: LonglongYao <Longlong.Yao@amd.com>


[ROCm/ROCR-Runtime commit: 5d8fba133d]
2025-01-22 13:02:31 -05:00
taosang2 a5de0f048d rocr: Support different address modes
Support different address modes in X, Y, Z directions

Change-Id: If1db5a8af33c92ddc4b48968c3d8eceb97daea6a


[ROCm/ROCR-Runtime commit: df250a49a5]
2024-12-02 09:07:56 -05:00
Konstantin Zhuravlyov bee079fc24 loader: add gfx9-4-generic support
Change-Id: Icb148f7a78a4ce0fc661e35d0df605e05db2de3d


[ROCm/ROCR-Runtime commit: 4c7a9a0f67]
2024-11-14 12:47:46 -05:00
Konstantin Zhuravlyov 45c824a387 amd_hsa_elf.h: bring EF_AMDGPU_MACH_* in sync with llvm-project
- formatting
  - add EF_AMDGPU_MACH_AMDGCN_RESERVED_0X56
  - add EF_AMDGPU_MACH_AMDGCN_RESERVED_0X57
  - add EF_AMDGPU_MACH_AMDGCN_GFX1153
  - add EF_AMDGPU_MACH_AMDGCN_GFX12_GENERIC

Change-Id: Ibad464c659137c0c98fa9fa9d1f293ea62684ee6


[ROCm/ROCR-Runtime commit: d9404a52ed]
2024-11-07 18:03:27 -05:00
David Yat Sin 35187a00df rocr: Add executable flag for memory allocations
Change-Id: I8307cd3562c3ab9c12fef8c457a59916e33b7923


[ROCm/ROCR-Runtime commit: d58c9dea0a]
2024-10-15 16:52:00 +00:00
David Yat Sin 298ec3d840 rocr/vmm: Only modify permissions for specified agents
When hsa_amd_vmem_set_access is called, do not remove permissions for
unspecified agents. Also updating documentation in header to clarify
this.

Change-Id: I3bb4cf08ba399f85cc67b17fd13a4a40d862415f


[ROCm/ROCR-Runtime commit: 73f6bfa747]
2024-09-30 17:41:58 -04:00
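The behavior described here, updating only the agents named in the call and leaving all others as they were, can be sketched with a per-agent permission table. The `Access` enum, `AccessDesc`, `AccessTable` and `SetAccess` are hypothetical names; the real hsa_amd_vmem_set_access types are not reproduced in this log.

```cpp
#include <cstdint>
#include <map>
#include <vector>

enum class Access { kNone, kReadOnly, kReadWrite };

// Hypothetical request entry: which agent, and the permission to apply.
struct AccessDesc {
  std::uint64_t agent;
  Access access;
};

// Hypothetical per-mapping state: agent handle -> current permission.
using AccessTable = std::map<std::uint64_t, Access>;

// Applies the requested permissions. Agents that are not listed in `descs`
// keep whatever permission they already had.
inline void SetAccess(AccessTable& table, const std::vector<AccessDesc>& descs) {
  for (const AccessDesc& desc : descs) {
    table[desc.agent] = desc.access;
  }
}
```

The behavior the commit moves away from would correspond to clearing the table before applying `descs`, which drops access for every agent the caller did not mention.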
Tony Gutierrez f096174cfc rocr/aie: Add AMD AIE Embedded Runtime vendor packets
Adds support for the packet interface for interacting with
the Embedded Runtime (ERT) on AIE agents. The ERT is what
interprets command packets sent to the AIE agent work
queues.

Change-Id: Id28fb98056b2c046354c446bdc9568d74385bea1


[ROCm/ROCR-Runtime commit: 6abb993f65]
2024-09-19 19:44:53 +00:00
Tony Gutierrez 64e2d37be8 rocr/aie: Add initial support for AIE agents
This change adds the initial classes for the AIE agent and AIE AQL
queue.

An AIE agent list is added to the core runtime object.

Change-Id: I84b02f52171b80726dfb2c8431582a3ea2986eb3


[ROCm/ROCR-Runtime commit: 8ea62f1cea]
2024-08-27 14:47:05 -07:00