This change adds the initial classes for the AIE agent and AIE AQL
queue.
An AIE agent list is added to the core runtime object.
Change-Id: I84b02f52171b80726dfb2c8431582a3ea2986eb3
Rewriting logic to fix issue where pthread_create would return errors
other than EINVAL, and these errors would be ignored.
Change-Id: I573958724dcf886c20e8c14e6a9182303b3ffa06
A recent patch introduced a build failure when building with Clang:
[ 65%] Building CXX object runtime/hsa-runtime/CMakeFiles/hsa-runtime64.dir/libamdhsacode/amd_core_dump.cpp.o
[…]/runtime/hsa-runtime/libamdhsacode/amd_core_dump.cpp:271:29: error: arithmetic on a pointer to void
271 | read = pread(fd_, buf + done, buf_size - done,
| ~~~ ^
1 error generated.
This patch fixes this by making sure the "void *" pointer is converting
to "char *" before doing arithmetic on it.
Change-Id: Ib1663ed30abce76e05f06d042975eccd7d729823
Recommended SDMA engines for DMA copies are now exposed for better
GPU-GPU performance. ROCr can now select those DMA engines.
Also lock-in host-device copies to SDMA0 and device-host copies to
SDMA1 for better stability and performance.
Change-Id: Ideff2e13daf537104efecb8b837bd49ee5096cb5
When HSA_OVERRIDE_GFX_VERSION is used, save the overrided GFX
version to OverrideEngineId instead of original EngineId. There
are places where real GFX properties still needed, e.g. CWSR size
calculation.
Change-Id: I9d9149bae465b7cfe55604fc19e7ca34e48b7b1c
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
The current trap handler has 2 limitations:
1) If it receives a HOST_TRAP, it clears the corresponding bit
and notifies the host, when it should not.
2) When it is entered because of a debug trap (s_trap 3) and the
debugger is not attached, it returns unconditionally. However,
if another exception is reported at the same time as the trap
handler is entered for the debug trap (a memory violation for
example), that other exception ends-up being ignored.
This patch addresses both of those issues. It makes it so host traps
and debug traps are ignored when necessary. If any other exception is
reported to the wave, we halt the wave and notify the host, and if no
other exception is reported (i.e. we entered the trap handler because of
host trap or debug trap), we return to shader code.
Other minor defects are also fixed during this refactor:
- Fixed SQ_WAVE_EXCP_FLAG_PRIV_XNACK_ERROR_SHIFT which had an incorrect
value
- Host traps can be sent at any time, including after we have halted a
wave. In such case, the old approach would have:
1) cleared the trap ID saved in ttmp6
2) clobbered ttmp10 where part of the actual wave's PC is saved.
Change-Id: I9ecd341f4967e686233dec182b3e5b0388ef19bd
This fixes an issue for missing HW events when out of HW events.
We cannot determine whether a HW event has occurred unless we call the
underlying drivers with hsaKmtWaitOnMultipleEvents_Ext. Previous logic
in Signal::WaitAny would switch to ACTIVE_WAIT state if we run out of
hardware events (signal->EopEvent() == NULL) and this would cause the
hsaKmtWaitOnMultipleEvents_Ext call to be skipped. But also, when we
have some signals without hardware events, calling
hsaKmtWaitOnMultipleEvents_Ext with a timeout of 0 so that we can poll
for remaining signals adds overhead with an IOCTL call and may cause
extra delay. Separating AsyncEventLoop into two separate threads so
that:
1. We can have a new Signal::WaitAnyExceptions to wait for HW events
This function can be simpler as it does not have to perform all the
timer calculations because it is expected to be always waiting on
hsaKmtWaitOnMultipleEvents_Ext through the lifetime of a process.
2. Signal::WaitAny does not need to have extra code to check for HW
exceptions as it only needs to handle HSA_EVENTTYPE_SIGNAL events. It
can also skip the calls to hsaKmtWaitOnMultipleEvents_Ext if needed.
Change-Id: I52ba99fd6e483e0cb477b7931a0dcc03520aa523
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Delete queues used internally in agent destructor to make sure any
memory allocated by the queue objects are freed before the agent memory
regions are destroyed.
Change-Id: I4768c9cf66f77ac00a5a355f373f7f22dc266e47
If user application tries to free memory that is currently being used by
the underlying HW device, the hsaKmtFreeMemory function call will fail.
This would be caused by an incorrect call by the user application. A
system memory error is raised and the user application is expected to
abort when this happens.
Note: This leaves the allocation_map_ table in an inconsistent state as
this address entry is removed from it while the pointer is not actually
free'd. But re-organising the FreeMemory() function would require the
memory_lock_ to be held for much longer and may affect performance.
Since this is a very unlikely and invalid use case, we prefer to leave
the FreeMemory() function as is.
Change-Id: I24279eb98620c32d34f4c5ad1b7a0a30cb65835d
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Skip coredump generation when receiving HSA_STATUS_ERROR_MEMORY_FAULT.
We also receive a system error of type HSA_EVENTTYPE_MEMORY and generate
the coredump there. Trying to generate coredump from 2 places sometimes
causes unnecessary error message because both places try to create a
coredump file with the same name.
Change-Id: If3f03bab2c24ad71dfeff39ab411bb9ac08b337e
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
This patch adds output (to stderr) to indicate step in the core dump
creation failed to improve debuggability.
Change-Id: I349692e278c2d744136d7fba7f7c2e5a7ada0c06
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
It is possible for the runtime to receive an interrupt while trying to
access VRAM data using /proc/self/mem. In such case, pread(2) would
return -1 and set errno to -EINTR. This is not an error case, the
pread(2) call just need to be restarted, however current implementation
would tread it as an error.
This patch changes the the implementation to correctly retry on EINTR.
While at it, this patch also handles cases where pread(2) reads less
data than originally requested.
Change-Id: I6a72fc5eda4afd90319f0d24b35c9eac6d1ff41c
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Force mem_flags to be explicit passed in then calling Queue constructor
to avoid ambiguity with calls to Queue constructor trying to only pass
the agent_node_id.
Change-Id: Ib6fedcb9e52d6c9f35f9051dfa989343456ca368
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Currently, the only error type is HSA_AMD_MEMORY_ERROR_MEMORY_IN_USE,
which happens when a user application incorrectly tries to free memory
that is currently being used by underlying device hardware.
Change-Id: I8ce352eb9719694135fba1fa56d62368036b2e5e
Signed-off-by: Chris Freehill <cfreehil@amd.com>
New API to accept a file stream for logging
Co-authored-by: David Yat Sin <David.YatSin@amd.com>
Change-Id: Ie09c35ae14ca86a97eb25f61251be287c55d7169
Signed-off-by: Chris Freehill <cfreehil@amd.com>
This will link static libraries of drm and libdrm_amdgpu libraries
This commit was ported from old repo and originally authored by:
Ranjith Ramakrishnan <Ranjith.Ramakrishnan@amd.com>
Date: Thu, 20 Jun 2024 08:29:03 -0700
Change-Id: I8b06811516335317d4fb3d7c98b001a12776a808
Change max_slice type to uint64_t and calculation to 64-bit, otherwise
value overflows to 0.
Problem triggered only on GFX12 as field size was increased.
Change-Id: If26451224538743dabc41bdc1b327c6ef021bc24
Signed-off-by: David Belanger <david.belanger@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
Fix encoding of pitch in SRD (1 bit missing).
Issue affects images with pitch > 8192.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Id0b431f51ab3984d1a47d3e8c13d35e28a6009cf
Signed-off-by: Chris Freehill <cfreehil@amd.com>
New API to support alignment parameter when reserving virtual addresses.
If the alignment is 0, then the default size is used. Otherwise the
alignment needs to be a power of 2 and greater than or equal to page
size.
Existing hsa_amd_vmem_address_reserve marked for future deprecation.
Change-Id: I17cee75420183dea5842fc1ecc2514cdcd760bac
Signed-off-by: Chris Freehill <cfreehil@amd.com>
The elf libraries are installed in /usr/lib64 in RHEL.
Removed invalid paths
Change-Id: I8c2b5525c1e3b62a2bd4e31a442d9931005c2f30
Signed-off-by: Chris Freehill <cfreehil@amd.com>
The current trap handler defined:
.set SQ_WAVE_EXCP_FLAG_USER_MATH_EXCP_SHIFT , 0
.set SQ_WAVE_EXCP_FLAG_USER_MATH_EXCP_SIZE , 6
.set SQ_WAVE_TRAP_CTRL_MATH_EXCP_SHIFT , 0
.set SQ_WAVE_TRAP_CTRL_MATH_EXCP_SIZE , 6
However, the ALU exception in EXCP_FLAG_USER go from bit 0 (alu_invalid)
to bit 6 (alu_int_div0), making it a total of 7 bits, not 6. Similarly,
the corresponding bits in TRAP_CTRL go from bit 0 to 6 as well.
Fix the incorrect size to be sure to properly detect the int_div0
exception.
Change-Id: I60c2d94a447b71ca0ce26a87b7f55b055b9aef8e
Signed-off-by: Chris Freehill <cfreehil@amd.com>
This patch is to remove duplicated definition of GFX1150.
Change-Id: I4a8b8bce5c2721748c4d64e1da13b59feae2139a
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
This avoids conflicts in case application is loading another copy of
addrlib.
Change-Id: Ifb4a10270c867366d5eed0a8c015257b415189a5
Signed-off-by: Chris Freehill <cfreehil@amd.com>
The value of STATE_PRIV is captured by the 1st level trap handler, and
passed on to the second level trap handler. The value is to be restored
before exit. However it is possible for the value of
STATE_PRIV.BARRIER_COMPLETE to change while the wave is in the trap
handler (all the other waves in the workgroup has signaled the
work-gropu barrier), and in this case restoring STATE_PRIV in full would
result in STATE_PRIV.BARRIER_COMPLETE to be cleared.
Restore every bits of STATE_PRIV except for BARRIER_COMPLETE before
return to prevent this race.
Change-Id: I76c875bced7d23c58670b28f257d22c933f99fc5
Signed-off-by: Chris Freehill <cfreehil@amd.com>
GFX94x runs into performance regression when doing large packet
enqueues.
Drop back to legacy packet sizes for now.
Change-Id: I595838ebada66c6c5143bfdb2f56c83ee71654a9
Signed-off-by: Chris Freehill <cfreehil@amd.com>
Removing extra bits set in forbiddenBlock that seemed to be set for
debugging and are causing unexpected image formats to be used.
Change-Id: I29c9e319907027a2b0b6bf7c1c0c8558eb6a36f4
Signed-off-by: Chris Freehill <cfreehil@amd.com>
Packet for GFX12 is incompatible with pre-GFX12 as some fields changed
location. Implement code path and packet specific to GFX12.
This fixes some issues with SDMA blits and 3D images.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I56c204aaa12160e563ec960bd3b226cfa94e142d
Signed-off-by: Chris Freehill <cfreehil@amd.com>
Add new files image_manager_gfx12.{h,cpp}.
Implement BUF/IMG/SAMP desc changes for GFX12.
Implement compute surface info code using AddrLib3 API (new starting
from GFX12).
Implement algorithm for choosing "best" swizzle mode (starting
from AddrLib3/GFX12, AddrLib provides only list of suitable swizzle mode,
up to client, ROCr, to choose the best). Algorithm implemented follows
behaviour in GFX11 and behaviour for GFX12 on other platforms.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Ib344c86228a98bbac5acdab421ee2ef9b1e84eef
Signed-off-by: Chris Freehill <cfreehil@amd.com>
Added GFX12 implementation for InitScratchSRD and for compute_tmpring.
Implementation for compute_tmpring could be combined with GFX11 with some
refactoring as a possible future improvement.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I8013cbe4438786bf41bbfd03f6a5d3b9ef51e7bf
Signed-off-by: Chris Freehill <cfreehil@amd.com>
Updated struct definitions, field size changes and new fields in
registers.h.
Added resource_gfx12.h and updated fields in BUF/IMG/SAMP descriptor
structs based on documentation.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I08f05ba30f54c40e7b823a6a105829a1e8590b3d
Signed-off-by: Chris Freehill <cfreehil@amd.com>
Do not allow extended-scope fine-grain memory on gfx120x devices.
Change-Id: I1e6e6c1860de00160cca9d8137b129c7e32c0526
Signed-off-by: Chris Freehill <cfreehil@amd.com>
Added GFX12 and AddrLib3 files, updated include paths.
Change-Id: I4880eadfd627b79ebcf2fe26b91649642911b050
Signed-off-by: David Belanger <david.belanger@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
There is an issue in the gfx12 trap handler where the EXCP_FLAG_PRIV
is only fetched under certain conditions (trap_id != 0) while it should
have been fetched unconditionally. As a consequence, the interrupt
payload might contain invalid data, leading to incorrect exceptions
being reported by the runtime. Debugger is mostly un-affected as it
will inspect the wave's state to figure out what exception(s) have been
reported for each wave.
Also, it is not necessary to check for the host trap bit if trap_id is
!= 0 in gfx12, there is on trap ID anymore for host trap.
This patch implements those fixes.
Co-Authored-By: Laurent Morichetti <laurent.morichetti@amd.com>
Change-Id: Ib72cd8cc5d935ca643e241da7fccd3f96201b09d
Signed-off-by: Chris Freehill <cfreehil@amd.com>
The constant declarations in trap_handler_gfx12.s have been sorted
alphabetically, which causes inconsistencies. Fix the order of
declarations where it makes sense.
Change-Id: I5b05d87a5afbe1ff3362746801a1c9373537b49e
Signed-off-by: Chris Freehill <cfreehil@amd.com>