Current DMABUF implementation is unstable. Switch back to legacy
support for now.
Change-Id: I3be871f38c6524b0bcc9225bab61de4e57771efb
[ROCm/ROCR-Runtime commit: ea646cf958]
Currently, the only error type is HSA_AMD_MEMORY_ERROR_MEMORY_IN_USE,
which happens when a user application incorrectly tries to free memory
that is currently being used by underlying device hardware.
Change-Id: I8ce352eb9719694135fba1fa56d62368036b2e5e
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 2853bf03f0]
New API to accept a file stream for logging
Co-authored-by: David Yat Sin <David.YatSin@amd.com>
Change-Id: Ie09c35ae14ca86a97eb25f61251be287c55d7169
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 26e105d9ab]
Fallback to old userptr registration in case SVM method fails.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I70c3ec74a8b4f762713e6a0619453642f3fca8e5
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 626eb4bfaf]
This lets you run two unsupported-but-really-supported cards of different architecture together in the same program. Works great w/ llama.cpp on my 7900XT + 6600.
Example usage (device 0 is RDNA3, device 1 is RDNA2):
HSA_OVERRIDE_GFX_VERSION_1="11.0.0" HSA_OVERRIDE_GFX_VERSION_2="10.3.0" ollama serve
Change-Id: Ic63ef462f698dee722d360f7fc3ef72789c277b7
Signed-off-by: AdamNiederer
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 84567b6416]
Fix register COMPUTE_PGM_RSRC2 in Dispatch code.
Bit 6 (called TRAP_PRESENT on pre-GFX12) should not be set on GFX12
as it has a different meaning (DYNAMIC_VGPR).
Minor instructions changes for CopyOnSignalIsa and WriteAndSignalIsa
shaders.
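A minimal C++ sketch of the bit 6 handling described above. The helper
and parameter names are hypothetical and are not the actual ROCr dispatch
code; only the bit position and its two meanings come from this message.
The sketch also assumes the caller never wants DYNAMIC_VGPR enabled.

```cpp
#include <cassert>
#include <cstdint>

// Bit 6 of COMPUTE_PGM_RSRC2: TRAP_PRESENT before GFX12, DYNAMIC_VGPR on GFX12.
constexpr std::uint32_t kPgmRsrc2Bit6 = std::uint32_t{1} << 6;

// Hypothetical helper: returns the register value to program for a dispatch.
// `gfx_major` is the major ISA version (for example 11 or 12).
inline std::uint32_t BuildPgmRsrc2(std::uint32_t base, unsigned gfx_major,
                                   bool trap_handler_present) {
  assert(gfx_major > 0);
  std::uint32_t value = base & ~kPgmRsrc2Bit6;  // start with bit 6 clear
  if (gfx_major < 12 && trap_handler_present) {
    value |= kPgmRsrc2Bit6;  // the bit means TRAP_PRESENT only before GFX12
  }
  return value;
}
```

With this shape, a GFX12 dispatch never has bit 6 set as a side effect of a
trap handler being installed.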
Change-Id: Ib4e75e3c92f220210bc45778738d81b91efb9d5e
Signed-off-by: David Belanger <david.belanger@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 611911020c]
A function call was refactored out of CommandLine.h, so add the header
to include it
Change-Id: If5594e3abc2fdfdd59f108c4379802cedab127ee
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: bd2d9770f7]
RDMATest.ContiguousVRAMAllocation test uses 4GB buffer, skip the test if
total VRAM size is less than 5GB, considering page table and other
reserved VRAM usage.
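The skip condition can be sketched as follows. This is an illustration
only: the function names are hypothetical, and the 1GB headroom is simply
the difference between the 4GB buffer and the 5GB threshold stated above.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kGiB = std::uint64_t{1} << 30;

// Hypothetical helper: true if the device has room for the buffer plus
// headroom for page tables and other reserved VRAM.
inline bool HasEnoughVram(std::uint64_t total_vram_bytes,
                          std::uint64_t buffer_bytes,
                          std::uint64_t headroom_bytes) {
  assert(buffer_bytes <= UINT64_MAX - headroom_bytes);  // the sum must not wrap
  return total_vram_bytes >= buffer_bytes + headroom_bytes;
}

// The test needs a 4GB buffer and is skipped below 5GB of total VRAM.
inline bool ShouldSkipContiguousVramTest(std::uint64_t total_vram_bytes) {
  return !HasEnoughVram(total_vram_bytes, 4 * kGiB, 1 * kGiB);
}
```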
Change-Id: I0342417501cdd3477c2bf1b2f7d1e6bef61d1871
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 42df8b2b34]
This commit was ported from old repo.
Original author: Ranjith Ramakrishnan <Ranjith.Ramakrishnan@amd.com>
For static builds use drm and drm_amdgpu static libraries for linking.
Created a separate cmake target file and static library for the static use case.
The config file will include the respective target file based on
BUILD_SHARED_LIBS.
Default target file will be the one where drm and libdrm_amdgpu shared libraries
are linked.
Applications using statically linked cmake targets of hsakmt should install the
required static libraries before building.
Change-Id: Idf4e1a2b5f18b344f5a9927803756d50c2b33702
[ROCm/ROCR-Runtime commit: 9e8477e1c9]
This will link the static libraries of drm and libdrm_amdgpu.
This commit was ported from old repo and originally authored by:
Ranjith Ramakrishnan <Ranjith.Ramakrishnan@amd.com>
Date: Thu, 20 Jun 2024 08:29:03 -0700
Change-Id: I8b06811516335317d4fb3d7c98b001a12776a808
[ROCm/ROCR-Runtime commit: 2a5e433393]
Change max_slice type to uint64_t and calculation to 64-bit, otherwise
value overflows to 0.
Problem triggered only on GFX12 as field size was increased.
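The failure mode can be illustrated with a small sketch. The field widths,
the granule size and the function names are assumptions chosen for
illustration, not the real register layout: once the field grows by one
bit, the 32-bit product reaches 2^32 and wraps to 0, while the 64-bit
calculation keeps the full value.

```cpp
#include <cassert>
#include <cstdint>

// 32-bit calculation: (number of encodable units) * (bytes per unit).
// Unsigned arithmetic wraps modulo 2^32, so a product of 2^32 becomes 0.
inline std::uint32_t MaxSlice32(unsigned field_bits, std::uint32_t granule_bytes) {
  assert(field_bits < 32);
  const std::uint32_t max_units = std::uint32_t{1} << field_bits;
  return static_cast<std::uint32_t>(max_units * granule_bytes);
}

// Same calculation carried out in 64 bits, as in the fix described above.
inline std::uint64_t MaxSlice64(unsigned field_bits, std::uint32_t granule_bytes) {
  assert(field_bits < 32);
  const std::uint64_t max_units = std::uint64_t{1} << field_bits;
  return max_units * granule_bytes;
}
```

With a hypothetical 19-bit field and a 4096-byte granule both versions
agree; with a 20-bit field only the 64-bit version returns a non-zero
result.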
Change-Id: If26451224538743dabc41bdc1b327c6ef021bc24
Signed-off-by: David Belanger <david.belanger@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 13c3f06dfe]
Since "libhsakmt: Prevent hsaKmtRegisterMemory* from registering non-userptr",
non-userptr is not allowed to be pinned any more.
Use hsa_amd_agents_allow_access to map host memory.
Change-Id: I898d2f83222907de58cafc1a2b18a636634d1b20
Signed-off-by: Lang Yu <lang.yu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 7e6c3d1bfa]
Fix encoding of pitch in SRD (1 bit missing).
Issue affects images with pitch > 8192.
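A sketch of how a field that is one bit too narrow produces this symptom.
Everything here is an assumption made for illustration: the sketch supposes
the descriptor stores pitch - 1, that the field was 13 bits wide before the
fix and 14 bits after, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoder: stores (pitch - 1) truncated to `field_bits` bits.
inline std::uint32_t EncodePitch(std::uint32_t pitch, unsigned field_bits) {
  assert(pitch >= 1 && field_bits > 0 && field_bits < 32);
  const std::uint32_t mask = (std::uint32_t{1} << field_bits) - 1u;
  return (pitch - 1u) & mask;  // upper bits are silently dropped
}

// Hypothetical decoder matching the encoder above.
inline std::uint32_t DecodePitch(std::uint32_t encoded) { return encoded + 1u; }
```

Under these assumptions a pitch of 8192 survives a 13-bit field, while 8193
only round-trips once the field is 14 bits wide.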
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Id0b431f51ab3984d1a47d3e8c13d35e28a6009cf
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 4f453f3bd4]
This variable is now in a sub-project, but needs to be visible
in the super-project.
Change-Id: I14d307646253df8f0a8a50d01b8ca677b904234c
[ROCm/ROCR-Runtime commit: 5820fa37d7]
New API to support alignment parameter when reserving virtual addresses.
If the alignment is 0, then the default size is used. Otherwise the
alignment needs to be a power of 2 and greater than or equal to page
size.
Existing hsa_amd_vmem_address_reserve marked for future deprecation.
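The alignment rule above can be sketched as a validity check. This is not
the runtime's implementation: the helper names are hypothetical and the
page size is fixed at 4096 bytes for illustration, whereas real code would
query it from the system.

```cpp
#include <cassert>
#include <cstdint>

// Assumed page size for this sketch only.
constexpr std::uint64_t kPageSize = 4096;

inline bool IsPowerOfTwo(std::uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

// True if `alignment` is acceptable: 0 selects the default, otherwise it
// must be a power of 2 and at least one page.
inline bool IsValidReserveAlignment(std::uint64_t alignment) {
  if (alignment == 0) return true;
  return IsPowerOfTwo(alignment) && alignment >= kPageSize;
}

// Rounds `value` up to the next multiple of a power-of-2 `alignment`.
inline std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment) {
  assert(IsPowerOfTwo(alignment));
  return (value + alignment - 1) & ~(alignment - 1);
}
```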
Change-Id: I17cee75420183dea5842fc1ecc2514cdcd760bac
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 08c44fbda6]
The elf libraries are installed in /usr/lib64 in RHEL.
Removed invalid paths
Change-Id: I8c2b5525c1e3b62a2bd4e31a442d9931005c2f30
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 14ed20e0cc]
This is to fix the situation where libhsa-runtime exists in
/usr/lib. The preinstall script will check for this and ask
the user if they want to delete the old version, or else
abandon install.
Change-Id: I0976b6ec95b9752c95031f1a73fc49a150b02b23
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 4474dcff2c]
The current trap handler defined:
.set SQ_WAVE_EXCP_FLAG_USER_MATH_EXCP_SHIFT , 0
.set SQ_WAVE_EXCP_FLAG_USER_MATH_EXCP_SIZE , 6
.set SQ_WAVE_TRAP_CTRL_MATH_EXCP_SHIFT , 0
.set SQ_WAVE_TRAP_CTRL_MATH_EXCP_SIZE , 6
However, the ALU exceptions in EXCP_FLAG_USER go from bit 0 (alu_invalid)
to bit 6 (alu_int_div0), making it a total of 7 bits, not 6. Similarly,
the corresponding bits in TRAP_CTRL go from bit 0 to 6 as well.
Fix the incorrect size to be sure to properly detect the int_div0
exception.
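The effect of the wrong size is easy to show with a generic shift/size
field extraction, written here in C++ rather than trap handler assembly.
The helper name is hypothetical; the bit positions are the ones given
above.

```cpp
#include <cassert>
#include <cstdint>

// Extracts `size` bits starting at bit `shift` from `reg`.
inline std::uint32_t ExtractField(std::uint32_t reg, unsigned shift, unsigned size) {
  assert(size > 0 && size < 32 && shift + size <= 32);
  const std::uint32_t mask = (std::uint32_t{1} << size) - 1u;
  return (reg >> shift) & mask;
}

// alu_int_div0 is bit 6 of the math exception field.
constexpr std::uint32_t kAluIntDiv0 = std::uint32_t{1} << 6;
```

With a size of 6 the mask is 0x3f and a register holding only alu_int_div0
extracts as 0, so the exception is missed; with a size of 7 the mask is
0x7f and the bit is preserved.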
Change-Id: I60c2d94a447b71ca0ce26a87b7f55b055b9aef8e
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: cb8705627f]
This patch is to remove duplicated definition of GFX1150.
Change-Id: I4a8b8bce5c2721748c4d64e1da13b59feae2139a
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 1d1a32d725]
This avoids conflicts in case application is loading another copy of
addrlib.
Change-Id: Ifb4a10270c867366d5eed0a8c015257b415189a5
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: f1a13b6d87]
This reverts commit b156e906d9c192bd487d10a8900e3eb6090ef547.
Reason for revert: Memory violation test causing a timeout in subsequent test.
Change-Id: If3a217575af545a47d6d67bebba4a2c640a43b81
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 2e1f363d2f]
The value of STATE_PRIV is captured by the 1st level trap handler, and
passed on to the second level trap handler. The value is to be restored
before exit. However, it is possible for the value of
STATE_PRIV.BARRIER_COMPLETE to change while the wave is in the trap
handler (all the other waves in the workgroup have signaled the
workgroup barrier), and in this case restoring STATE_PRIV in full would
result in STATE_PRIV.BARRIER_COMPLETE being cleared.
Restore every bit of STATE_PRIV except for BARRIER_COMPLETE before
returning to prevent this race.
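The masked restore can be sketched in C++ as below. The bit position used
for BARRIER_COMPLETE and the function name are assumptions for
illustration; the real trap handler is written in assembly and defines its
own constants.

```cpp
#include <cassert>
#include <cstdint>

// Assumed position of BARRIER_COMPLETE within STATE_PRIV (illustrative only).
constexpr std::uint32_t kBarrierCompleteMask = std::uint32_t{1} << 2;

// Returns the value to write back on trap handler exit: every bit comes from
// the saved copy except BARRIER_COMPLETE, which keeps its live value.
inline std::uint32_t RestoreStatePriv(std::uint32_t current, std::uint32_t saved) {
  const std::uint32_t result =
      (saved & ~kBarrierCompleteMask) | (current & kBarrierCompleteMask);
  assert((result & kBarrierCompleteMask) == (current & kBarrierCompleteMask));
  return result;
}
```

If the barrier completed while the wave was in the handler, the live bit is
set and survives the restore even though the saved copy has it clear.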
Change-Id: I76c875bced7d23c58670b28f257d22c933f99fc5
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 9e625307d2]
GFX94x runs into a performance regression when doing large packet
enqueues.
Drop back to legacy packet sizes for now.
Change-Id: I595838ebada66c6c5143bfdb2f56c83ee71654a9
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: b8aae52404]
Removing extra bits set in forbiddenBlock that seemed to be set for
debugging and are causing unexpected image formats to be used.
Change-Id: I29c9e319907027a2b0b6bf7c1c0c8558eb6a36f4
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: e721eb509b]
Packet for GFX12 is incompatible with pre-GFX12 as some fields changed
location. Implement code path and packet specific to GFX12.
This fixes some issues with SDMA blits and 3D images.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I56c204aaa12160e563ec960bd3b226cfa94e142d
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 6d147dd3b1]
Add new files image_manager_gfx12.{h,cpp}.
Implement BUF/IMG/SAMP desc changes for GFX12.
Implement compute surface info code using AddrLib3 API (new starting
from GFX12).
Implement algorithm for choosing the "best" swizzle mode (starting
from AddrLib3/GFX12, AddrLib provides only a list of suitable swizzle modes;
it is up to the client, ROCr, to choose the best). The algorithm follows the
behaviour in GFX11 and the behaviour for GFX12 on other platforms.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Ib344c86228a98bbac5acdab421ee2ef9b1e84eef
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: f8a015f53e]
Added GFX12 implementation for InitScratchSRD and for compute_tmpring.
Implementation for compute_tmpring could be combined with GFX11 with some
refactoring as a possible future improvement.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I8013cbe4438786bf41bbfd03f6a5d3b9ef51e7bf
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: def4a6c326]
Updated struct definitions, field size changes and new fields in
registers.h.
Added resource_gfx12.h and updated fields in BUF/IMG/SAMP descriptor
structs based on documentation.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I08f05ba30f54c40e7b823a6a105829a1e8590b3d
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 8165da63cc]
Do not allow extended-scope fine-grain memory on gfx120x devices.
Change-Id: I1e6e6c1860de00160cca9d8137b129c7e32c0526
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 7dd90f8361]
There is an issue in the gfx12 trap handler where the EXCP_FLAG_PRIV
is only fetched under certain conditions (trap_id != 0) while it should
have been fetched unconditionally. As a consequence, the interrupt
payload might contain invalid data, leading to incorrect exceptions
being reported by the runtime. The debugger is mostly unaffected as it
will inspect the wave's state to figure out what exception(s) have been
reported for each wave.
Also, it is not necessary to check for the host trap bit if trap_id
!= 0 on gfx12, as there is no trap ID for host trap anymore.
This patch implements those fixes.
Co-Authored-By: Laurent Morichetti <laurent.morichetti@amd.com>
Change-Id: Ib72cd8cc5d935ca643e241da7fccd3f96201b09d
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 7a3bf30769]
The constant declarations in trap_handler_gfx12.s have been sorted
alphabetically, which causes inconsistencies. Fix the order of
declarations where it makes sense.
Change-Id: I5b05d87a5afbe1ff3362746801a1c9373537b49e
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: ff9b11fd89]
Given the differences between previous architectures and gfx12, this
patch implements the gfx12 2nd level trap handler in a separate source
file, and adjusts the build system.
Change-Id: I65192ffbbcd66a4f78d2d0c3fb1739a92cac95d4
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 855015377c]
For GFX12, the workgroup id is passed in ttmp9 (trap temp register) instead of the scalar register.
Normal shader code (i.e. not priv, not trap handler) can only read the ttmp registers.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I42404d8c8c0ee9c746e23879fd30b2d16cfa1787
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 40cc6559f1]
Add timeout to AQLQueue destructor signal wait to prevent indefinite hang
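A generic sketch of a bounded wait, to illustrate the idea. It is not the
AQLQueue code: the function name and the polling predicate are
hypothetical, and a real runtime would wait on its signal object rather
than busy-poll.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Polls `is_signaled` until it returns true or `timeout` elapses.
// Returns true if the condition was observed, false on timeout.
inline bool WaitWithTimeout(const std::function<bool()>& is_signaled,
                            std::chrono::milliseconds timeout) {
  assert(timeout.count() >= 0);
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (!is_signaled()) {
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;  // give up instead of hanging forever
    }
  }
  return true;
}
```

A destructor using such a helper can log the timeout and continue teardown
when the wait returns false.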
Change-Id: I6c6c98a7bdd27d39569af1d667aa9aa7e9596535
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 4e9647704d]