Wykres commitów

1036 Commity

Autor SHA1 Wiadomość Data
Tony Gutierrez a851f73da5 rocr/aie: Init mem regions for AIE agents
Change-Id: If180bdbcb3eb659f0d05a710526864494316d7a9
2024-09-19 19:44:53 +00:00
Tony Gutierrez 6abb993f65 rocr/aie: Add AMD AIE Embedded Runtime vendor packets
Adds support for the packet interface for interacting with
the Embedded Runtime (ERT) on AIE agents. The ERT is what
interprets command packets send to the AIE agent work
queues.

Change-Id: Id28fb98056b2c046354c446bdc9568d74385bea1
2024-09-19 19:44:53 +00:00
Tony Gutierrez 931733d51a rocr/aie: Add support for creating AIE queue context
Adds support for initialzing the XDNA driver so that
a hardware context can be created for an AIE queue.

Right now this initializes the device heap in the driver,
gets the relevant tile parameters for the AIE agent,
and creates a hardware context that backs the AIE queue.

Change-Id: Ib90e1bc67a8637f6db3ff2bebe34677843796417
2024-09-19 19:44:53 +00:00
David Yat Sin d6ec7b6489 rocr: Remove unnecessary function declarations
Change-Id: Ia2613ce74cac808f9239fc24049b57b7b1abaed9
2024-09-19 19:44:53 +00:00
David Yat Sin 12e299e8d4 rocr: Fix compile error
Change-Id: Iae6bf08e834a426f6f97cbc51d2a1a38199015bd
2024-09-19 19:44:53 +00:00
David Yat Sin 924e11ba7f rocr: Increase queue size for co-op queues
Increase queue-size for co-op queues to 16K to improve performance on
some workloads

Change-Id: I4d3bf0ecbd30ebb648b68d9c5fdabadc670a386c
2024-09-19 19:44:53 +00:00
Jonathan Kim 509e8d863a rocr: Reverse host-device copy engines on GFX94x
GFX 9.4.x has better performance for CPU-GPU copies when using
engines in reverse order from other devices.

Change-Id: I1eaebf0e837bb7f44712f40d5115df618f6a73d7
2024-09-06 19:02:59 -04:00
Jonathan Kim 24b25003b0 rocr: Fix backwards compatible host-device copies on target engines
If the KFD doesn't support targeting SDMA engines, ensure that ROCr
selects the correct downstream queue type by using an invalid engine.

Change-Id: Ia6848126f67f3d35ab37248633e8e0e6e2d77fff
2024-09-06 19:02:51 -04:00
Saleel Kudchadker 3baaa6e9c0 rocr: Allocate AQL queue on device memory
- Use HSA_ALLOCATE_QUEUE_DEV_MEM=1 to create AQL queue in device
memory.
- Before writing AQL packet header to the queue use an SFENCE to ensure
that there is no reodering of the writes over PCIE

Change-Id: I5eacdc35108c4a1e245c75ae349b7495451aa60d
2024-09-05 17:48:02 -04:00
Chris Freehill 7e13b9e62f hsakmt: Fix ROCr static lib build in new layout
Change-Id: Idc71524924b96a44d63be9b1d0fccbe0e328d96e
2024-09-05 10:26:06 -04:00
Tony Gutierrez 68669f4e1a rocr: Generalize AMD::MemoryRegion Allocate and Free
Remove KFD-specific Allocate/Free calls from the AMD::MemoryRegion.
The KFD-driver-specific Allocate/Free calls are now implemented in
the KfdDriver. Future changes will migrate the remaining KFD-specific
calls out of AMD::MemoryRegion.

This allows the MemoryRegion to be used across AMD drivers like the
XDNA driver.

Change-Id: Ib6a2a9e5e1a15e61644d2592beb3a8e6578c3010
2024-08-28 14:35:07 -07:00
Tony Gutierrez c42ff44a6a rocr/kfd-driver: Add initial KFD driver interface
Adds the initial KFD driver interface and use it to open the
KFD from amd_topology.cpp.

This change is to show the direction of the Driver interface for
initially supporting the KFD and to get feedback on the approach.
For now we wrap relevant ROCt calls behind this generic driver
interface so that we can generalize core ROCr components like
MemoryRegion, Runtime, etc.

Now that ROCt is incorporated into ROCr, we can more fully integrate
ROCt into the Driver interface. Ideally, we get to a point where
the generic Driver interface can support KFD, XDNA, and potential
future drivers.

Change-Id: I4573fd6af1f8398233ee9d3814d9f3139dd0279c
2024-08-28 14:34:54 -07:00
Tony Gutierrez 86f40ae489 rocr/xdna-driver: Initial support for amdxdna driver
Change-Id: I319b55d89dc644e7151228cb6c19d1a633171295
2024-08-28 14:34:39 -07:00
Shweta Khatri da69ffff0f Set internal cache for rocprofiler-register dependency
Change-Id: I8a661818c11c4de0df9743dacb78b7c5163b6da9
2024-08-28 14:48:51 -04:00
Tony Gutierrez 8ea62f1cea rocr/aie: Add initial support for AIE agents
This change adds the initial classes for the AIE agent and AIE AQL
queue.

An AIE agent list is added to the core runtime object.

Change-Id: I84b02f52171b80726dfb2c8431582a3ea2986eb3
2024-08-27 14:47:05 -07:00
David Yat Sin cb672ebcd1 Set ELF_GETSHDRSTRNDX when cxx compiler is not loaded
Change-Id: Ia26b8999909f688ce78d9bbe4cb2a7262df2ee02
2024-08-22 17:20:37 -04:00
David Yat Sin c8dd4d2b3b rocr: Handle pthread_create returning errors
Rewriting logic to fix issue where pthread_create would return errors
other than EINVAL, and these errors would be ignored.

Change-Id: I573958724dcf886c20e8c14e6a9182303b3ffa06
2024-08-22 12:15:10 -04:00
Lancelot SIX 3475a45137 rocr/amd_core_dump: Fix "arithmetic on a pointer to void"
A recent patch introduced a build failure when building with Clang:

    [ 65%] Building CXX object runtime/hsa-runtime/CMakeFiles/hsa-runtime64.dir/libamdhsacode/amd_core_dump.cpp.o
    […]/runtime/hsa-runtime/libamdhsacode/amd_core_dump.cpp:271:29: error: arithmetic on a pointer to void
      271 |       read = pread(fd_, buf + done, buf_size - done,
          |                         ~~~ ^
    1 error generated.

This patch fixes this by making sure the "void *" pointer is converting
to "char *" before doing arithmetic on it.

Change-Id: Ib1663ed30abce76e05f06d042975eccd7d729823
2024-08-21 17:19:28 -04:00
Jonathan Kim eb30a5bbc7 rocr: Memory copy based on recommended SDMA engines
Recommended SDMA engines for DMA copies are now exposed for better
GPU-GPU performance. ROCr can now select those DMA engines.

Also lock-in host-device copies to SDMA0 and device-host copies to
SDMA1 for better stability and performance.

Change-Id: Ideff2e13daf537104efecb8b837bd49ee5096cb5
2024-08-20 16:22:32 -04:00
Yifan Zhang 3f1f68c8cb libhsakmt: add OverrideEngineId property
When HSA_OVERRIDE_GFX_VERSION is used, save the overrided GFX
version to OverrideEngineId instead of original EngineId. There
are places where real GFX properties still needed, e.g. CWSR size
calculation.

Change-Id: I9d9149bae465b7cfe55604fc19e7ca34e48b7b1c
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
2024-08-20 09:10:52 -04:00
Lancelot Six 123b2c080a rocr/trap_handler_gfx12: Properly ignore HOST_TRAP and debug_trap
The current trap handler has 2 limitations:
1) If it receives a HOST_TRAP, it clears the corresponding bit
   and notifies the host, when it should not.
2) When it is entered because of a debug trap (s_trap 3) and the
   debugger is not attached, it returns unconditionally.  However,
   if another exception is reported at the same time as the trap
   handler is entered for the debug trap (a memory violation for
   example), that other exception ends-up being ignored.

This patch addresses both of those issues.  It makes it so host traps
and debug traps are ignored when necessary.  If any other exception is
reported to the wave, we halt the wave and notify the host, and if no
other exception is reported (i.e. we entered the trap handler because of
host trap or debug trap), we return to shader code.

Other minor defects are also fixed during this refactor:
- Fixed SQ_WAVE_EXCP_FLAG_PRIV_XNACK_ERROR_SHIFT which had an incorrect
  value
- Host traps can be sent at any time, including after we have halted a
  wave.  In such case, the old approach would have:
  1) cleared the trap ID saved in ttmp6
  2) clobbered ttmp10 where part of the actual wave's PC is saved.

Change-Id: I9ecd341f4967e686233dec182b3e5b0388ef19bd
2024-08-19 21:22:13 -04:00
David Yat Sin 88eaa834d0 Separate AsyncEventsLoop into two separate threads
This fixes an issue for missing HW events when out of HW events.

We cannot determine whether a HW event has occurred unless we call the
underlying drivers with hsaKmtWaitOnMultipleEvents_Ext. Previous logic
in Signal::WaitAny would switch to ACTIVE_WAIT state if we run out of
hardware events (signal->EopEvent() == NULL) and this would cause the
hsaKmtWaitOnMultipleEvents_Ext call to be skipped. But also, when we
have some signals without hardware events, calling
hsaKmtWaitOnMultipleEvents_Ext with a timeout of 0 so that we can poll
for remaining signals adds overhead with an IOCTL call and may cause
extra delay. Separating AsyncEventLoop into two separate threads so
that:

1. We can have a new Signal::WaitAnyExceptions to wait for HW events
This function can be simpler as it does not have to perform all the
timer calculations because it is expected to be always waiting on
hsaKmtWaitOnMultipleEvents_Ext through the lifetime of a process.

2. Signal::WaitAny does not need to have extra code to check for HW
exceptions as it only needs to handle HSA_EVENTTYPE_SIGNAL events. It
can also skip the calls to hsaKmtWaitOnMultipleEvents_Ext if needed.

Change-Id: I52ba99fd6e483e0cb477b7931a0dcc03520aa523
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
2024-08-19 17:54:11 -04:00
David Yat Sin 56ba584a22 rocr: Delete internal CP queues in GPU agent destructor
Delete queues used internally in agent destructor to make sure any
memory allocated by the queue objects are freed before the agent memory
regions are destroyed.

Change-Id: I4768c9cf66f77ac00a5a355f373f7f22dc266e47
2024-08-19 17:16:46 -04:00
David Yat Sin 921471bd94 Raise system error when memory free is denied
If user application tries to free memory that is currently being used by
the underlying HW device, the hsaKmtFreeMemory function call will fail.

This would be caused by an incorrect call by the user application. A
system memory error is raised and the user application is expected to
abort when this happens.

Note: This leaves the allocation_map_ table in an inconsistent state as
this address entry is removed from it while the pointer is not actually
free'd. But re-organising the FreeMemory() function would require the
memory_lock_ to be held for much longer and may affect performance.
Since this is a very unlikely and invalid use case, we prefer to leave
the FreeMemory() function as is.

Change-Id: I24279eb98620c32d34f4c5ad1b7a0a30cb65835d
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
2024-08-19 14:03:12 -04:00
David Yat Sin aae4dab88e Do not generate coredump on VM fault signal event
Skip coredump generation when receiving HSA_STATUS_ERROR_MEMORY_FAULT.
We also receive a system error of type HSA_EVENTTYPE_MEMORY and generate
the coredump there. Trying to generate coredump from 2 places sometimes
causes unnecessary error message because both places try to create a
coredump file with the same name.

Change-Id: If3f03bab2c24ad71dfeff39ab411bb9ac08b337e
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
2024-08-19 13:21:26 -04:00
David Yat Sin 5f943dc44e Fix compile warnings
Removing unused variables

Change-Id: I3a9811e40c9bc735d13a0330b2015576ed112026
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
2024-08-19 13:21:08 -04:00
Shweta.Khatri fda2a395a3 PC-Sampling - gfx94x Hosttrap method support
Supports PC-Sampling on gfx94x in both CPX and SPX mode

Change-Id: Ife1e50ab08155678ea4aa2b80475b9974812c40e
2024-08-19 13:20:42 -04:00
Lancelot Six 3646064a0e coredump: Print diagnostic in stderr when errors are detected
This patch adds output (to stderr) to indicate step in the core dump
creation failed to improve debuggability.

Change-Id: I349692e278c2d744136d7fba7f7c2e5a7ada0c06
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
2024-08-19 13:20:20 -04:00
Lancelot Six 3e0d3d6d61 coredump: Improve error handling when reading VRAM
It is possible for the runtime to receive an interrupt while trying to
access VRAM data using /proc/self/mem.  In such case, pread(2) would
return -1 and set errno to -EINTR.  This is not an error case, the
pread(2) call just need to be restarted, however current implementation
would tread it as an error.

This patch changes the the implementation to correctly retry on EINTR.
While at it, this patch also handles cases where pread(2) reads less
data than originally requested.

Change-Id: I6a72fc5eda4afd90319f0d24b35c9eac6d1ff41c
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
2024-08-19 12:20:22 -04:00
David Yat Sin 1d1d402dcc Do not allow default mem_flags
Force mem_flags to be explicit passed in then calling Queue constructor
to avoid ambiguity with calls to Queue constructor trying to only pass
the agent_node_id.

Change-Id: Ib6fedcb9e52d6c9f35f9051dfa989343456ca368
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
2024-08-19 12:19:32 -04:00
Swati Rawat 4cb5c509f9 Tagging APIs from hsa_ext_amd.h for Doxygen
Change-Id: I2ab2358985442647cedbd99eca5b1140cb0b0680
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
2024-08-19 12:17:20 -04:00
Shweta.Khatri 8176a8830f Adjusted indentation with tabs
No functional change

Change-Id: Ibe97b03f62c4affcb60d3469312c8a0b6eb11391
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
2024-08-19 12:16:58 -04:00
James Xu a621bca303 Fix compile errors with musl>=1.2.3
Patch submitted on behalf of user AngryLoki:

The fix repeats common pattern, used for musl, 
e.g: https://github.com/void-linux/void-packages/blob/5ccf1c66a1df2d644e1a0db0a68fca321469c57e/srcpkgs/MangoHud/patches/0001-elfhacks-d_un.d_ptr-is-relative-on-non-glibc-systems.patch#L90.

Quoting:
d_un.d_ptr is relative on non glibc systems

elf(5) documents it this way, glibc diverts from this documentation

Change-Id: I815f88f127ef00c88ae827a8ad48df0d33c92467
2024-08-19 11:02:29 -04:00
Jonathan Kim ea646cf958 Disable DMABUF IPC iplementation
Current DMABUF implemenation is unstable.  Switch back to legacy
support for now.

Change-Id: I3be871f38c6524b0bcc9225bab61de4e57771efb
2024-08-12 13:14:14 -04:00
David Yat Sin 2853bf03f0 Add new system event for memory errors
Currently, the only error type is HSA_AMD_MEMORY_ERROR_MEMORY_IN_USE,
which happens when a user application incorrectly tries to free memory
that is currently being used by underlying device hardware.

Change-Id: I8ce352eb9719694135fba1fa56d62368036b2e5e
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-08-07 02:59:00 +00:00
Saleel Kudchadker 26e105d9ab Initial external logging API
New API to accept a file stream for logging

Co-authored-by: David Yat Sin <David.YatSin@amd.com>

Change-Id: Ie09c35ae14ca86a97eb25f61251be287c55d7169
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-08-07 02:59:00 +00:00
Jonathan R. Madsen 64af2d71ef Fix hsa_amd_vmem_address_reserve_align_fn addition
- https://gerrit-git.amd.com/c/hsa/ec/hsa-runtime/+/1058280 erroneously placed the new function pointer in the middle of the struct instead of the end

Change-Id: I49d1fa86a86764138250cd0471df1915a756d1ca
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-08-07 02:59:00 +00:00
Chris Freehill c48b858093 Update documentation
Sync with the latest changes from upstream repo

Change-Id: I309880f5c7f77c58a8b186db320bbc0f0e634089
2024-08-07 02:58:34 +00:00
Chris Freehill 2a5e433393 Use staticdrm target of hsakmt for static build
This will link static libraries of drm and libdrm_amdgpu libraries

This commit was ported from old repo and originally authored by:
Ranjith Ramakrishnan <Ranjith.Ramakrishnan@amd.com>
Date: Thu, 20 Jun 2024 08:29:03 -0700

Change-Id: I8b06811516335317d4fb3d7c98b001a12776a808
2024-07-17 22:45:50 -05:00
Tim Huang 1278ac25c0 Fix last AMDGCN-based processors enumerator error
Change-Id: Idd0659a327585b30b0f7d4dcb9e2212b55239941
Signed-off-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-07-17 22:32:23 -05:00
David Belanger 13c3f06dfe Fix overflow in max_slice variable for GFX12
Change max_slice type to uint64_t and calculation to 64-bit, otherwise
value overflows to 0.

Problem triggered only on GFX12 as field size was increased.

Change-Id: If26451224538743dabc41bdc1b327c6ef021bc24
Signed-off-by: David Belanger <david.belanger@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-07-17 22:32:23 -05:00
David Belanger 4f453f3bd4 Fix image issue on GFX12
Fix encoding of pitch in SRD (1 bit missing).
Issue affects images with pitch > 8192.

Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Id0b431f51ab3984d1a47d3e8c13d35e28a6009cf
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-07-17 22:32:23 -05:00
Chris Freehill 5820fa37d7 Set PARENT_SCOPE for HSA_DEP_ROCPROFILER_REG
This variable is now in a sub-project, but needs to be visible
in the super-project.

Change-Id: I14d307646253df8f0a8a50d01b8ca677b904234c
2024-07-17 17:52:59 -05:00
David Yat Sin 08c44fbda6 Add hsa_amd_vmem_address_reserve_align API
New API to support alignment parameter when reserving virtual addresses.
If the alignment is 0, then the default size is used. Otherwise the
alignment needs to be a power of 2 and greater than or equal to page
size.

Existing hsa_amd_vmem_address_reserve marked for future deprecation.

Change-Id: I17cee75420183dea5842fc1ecc2514cdcd760bac
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:57:22 -05:00
Yifan Zhang 71494a920b Add support for GC 11.5.2
Change-Id: Iad8604881dc66108933ac2155fef3b74bca9ac3f
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:50:03 -05:00
Ranjith Ramakrishnan 14ed20e0cc Update elf library search path with lib64 path as well
The elf libraries are installed in /usr/lib64 in RHEL.
Removed invalid paths

Change-Id: I8c2b5525c1e3b62a2bd4e31a442d9931005c2f30
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:50:03 -05:00
Vladimir Indic c15e5d0e9d PC Sampling: Add s_nop prior to s_sendmeg
Add s_nop before s_sendmsg. This is required because the HW does not
check for dependencies for SALU writes to M0.

Section 4.5: Manually Inserted Wait States (NOPs)
"AMD Instinct MI200" Instruction Set Architecture
https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/instinct-mi200-cdna2-instruction-set-architecture.pdf

Change-Id: I90f503e3cc80cd29eab8bafa2565699461654055
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:50:03 -05:00
Lancelot SIX cb8705627f trap_handler_gfx12: fix-math-excp-size
The current trap handler defined:

    .set SQ_WAVE_EXCP_FLAG_USER_MATH_EXCP_SHIFT    , 0
    .set SQ_WAVE_EXCP_FLAG_USER_MATH_EXCP_SIZE     , 6
    .set SQ_WAVE_TRAP_CTRL_MATH_EXCP_SHIFT         , 0
    .set SQ_WAVE_TRAP_CTRL_MATH_EXCP_SIZE          , 6

However, the ALU exception in EXCP_FLAG_USER go from bit 0 (alu_invalid)
to bit 6 (alu_int_div0), making it a total of 7 bits, not 6.  Similarly,
the corresponding bits in TRAP_CTRL go from bit 0 to 6 as well.

Fix the incorrect size to be sure to properly detect the int_div0
exception.

Change-Id: I60c2d94a447b71ca0ce26a87b7f55b055b9aef8e
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:41:53 -05:00
Yifan Zhang 1d1a32d725 GFX1150: remove dupilcated definition of GFX1150
This patch is to remove duplicated definition of GFX1150.

Change-Id: I4a8b8bce5c2721748c4d64e1da13b59feae2139a
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:41:53 -05:00
David Yat Sin f1a13b6d87 Move addrlib into rocr namespace
This avoids conflicts in case application is loading another copy of
addrlib.

Change-Id: Ifb4a10270c867366d5eed0a8c015257b415189a5
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:41:53 -05:00