Commit graph

2589 Commits

Author SHA1 Message Date
Ramesh Errabolu 71fa3fa19b Do not check default value of SVM attribute Granularity
Change-Id: I3cf97fc551259c873351cfd22fc83e8615cc3e56
2024-09-19 19:44:53 +00:00
Ranjith Ramakrishnan d60f56ab32 Remove license file from hsa-rocr-devel package
The license file is already included in the hsa-rocr package, so the devel package does not need it.

Change-Id: I08cceeb169d0c061078cd495342f78c089087f0d
2024-09-19 19:44:53 +00:00
Tony Gutierrez a851f73da5 rocr/aie: Init mem regions for AIE agents
Change-Id: If180bdbcb3eb659f0d05a710526864494316d7a9
2024-09-19 19:44:53 +00:00
Tony Gutierrez 6abb993f65 rocr/aie: Add AMD AIE Embedded Runtime vendor packets
Adds support for the packet interface for interacting with
the Embedded Runtime (ERT) on AIE agents. The ERT is what
interprets command packets sent to the AIE agent work
queues.

Change-Id: Id28fb98056b2c046354c446bdc9568d74385bea1
2024-09-19 19:44:53 +00:00
Tony Gutierrez 931733d51a rocr/aie: Add support for creating AIE queue context
Adds support for initializing the XDNA driver so that
a hardware context can be created for an AIE queue.

Right now this initializes the device heap in the driver,
gets the relevant tile parameters for the AIE agent,
and creates a hardware context that backs the AIE queue.

Change-Id: Ib90e1bc67a8637f6db3ff2bebe34677843796417
2024-09-19 19:44:53 +00:00
Chris Freehill f8d63e2fb4 hsakmt: Update amdp2ptest.c license to MIT
Change-Id: I1eb814dbb4b420840d9877fc6a4806708754ac69
2024-09-19 19:44:53 +00:00
David Yat Sin d6ec7b6489 rocr: Remove unnecessary function declarations
Change-Id: Ia2613ce74cac808f9239fc24049b57b7b1abaed9
2024-09-19 19:44:53 +00:00
David Yat Sin 12e299e8d4 rocr: Fix compile error
Change-Id: Iae6bf08e834a426f6f97cbc51d2a1a38199015bd
2024-09-19 19:44:53 +00:00
Shweta Khatri c30ff893a6 Add rocprofiler-register dependency to build
Ensure rocprofiler-register is linked and added to DEB and RPM package dependencies.
Github ticket - https://github.com/ROCm/ROCm/issues/3654

Change-Id: Iaaaca8bfa81ca33da147673ef1be798109b70aa5
2024-09-19 19:44:53 +00:00
David Yat Sin 924e11ba7f rocr: Increase queue size for co-op queues
Increase queue-size for co-op queues to 16K to improve performance on
some workloads

Change-Id: I4d3bf0ecbd30ebb648b68d9c5fdabadc670a386c
2024-09-19 19:44:53 +00:00
David Yat Sin 3cb25e5236 rocrtst: Add negative test for invalid buffer free
Add a negative test to try to free the ring buffer of a queue and
confirm that a memory error is generated.

Change-Id: I4afd95c69c62f7c3e1138d5d6c4a5fd237631e43
2024-09-19 19:44:53 +00:00
Wang, Yanyao c064218637 Remove hard-coded llvm-project folder for rocrtst
Signed-off-by: Wang, Yanyao <yanyao.wang@amd.com>
Change-Id: I9ba81c1182da812596d7d314f3a6dae7cbcd0c2d
2024-09-19 19:44:53 +00:00
Jonathan Kim 509e8d863a rocr: Reverse host-device copy engines on GFX94x
GFX 9.4.x has better performance for CPU-GPU copies when using
engines in reverse order from other devices.

Change-Id: I1eaebf0e837bb7f44712f40d5115df618f6a73d7
2024-09-06 19:02:59 -04:00
Jonathan Kim 24b25003b0 rocr: Fix backwards compatible host-device copies on target engines
If the KFD doesn't support targeting SDMA engines, ensure that ROCr
selects the correct downstream queue type by using an invalid engine.

Change-Id: Ia6848126f67f3d35ab37248633e8e0e6e2d77fff
2024-09-06 19:02:51 -04:00
Kent Russell 3da42a0847 libhsakmt: Prefix global symbols with hsakmt
To support fully-static library ROCm builds, ensure that all global
symbols are prefixed with something meaningful to avoid collisions with
other libraries

A script was made using "objdump -C -t" to get a list of symbols,
then checking if the global symbols have a meaningful prefix (for thunk:
hsakmt or kmt in various cases)

Change-Id: Ifd353f64a3344eb60d1f6c4e041aa20967b38a59
Signed-off-by: Kent Russell <kent.russell@amd.com>
2024-09-06 09:56:07 -04:00
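
The prefix check described in commit 3da42a0847 above could look roughly like the following. This is a minimal sketch, not the script from the commit: the function names are hypothetical, and the case-insensitive match is an assumption based on the phrase "in various cases".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h> /* strncasecmp (POSIX) */

/* Hypothetical helper: case-insensitive "does s start with prefix?". */
static bool starts_with_nocase(const char *s, const char *prefix) {
    return strncasecmp(s, prefix, strlen(prefix)) == 0;
}

/* Hypothetical check: does a global symbol carry one of the thunk
 * prefixes named in the commit message (hsakmt or kmt)? */
bool has_thunk_prefix(const char *symbol) {
    assert(symbol != NULL);
    return starts_with_nocase(symbol, "hsakmt") ||
           starts_with_nocase(symbol, "kmt");
}
```

Feeding each global symbol name from the objdump listing through such a check would flag the ones that still need renaming.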
Chris Freehill a676d8639c Support for not building ROCr
Add cmake variable BUILD_ROCR so that user can elect to not
build ROCr.

Change-Id: I73bd28cde9430ba86aed50fb88ec2e42b3443dbb
2024-09-05 23:27:54 -04:00
Saleel Kudchadker 3baaa6e9c0 rocr: Allocate AQL queue on device memory
- Use HSA_ALLOCATE_QUEUE_DEV_MEM=1 to create AQL queue in device
memory.
- Before writing the AQL packet header to the queue, use an SFENCE to ensure
that there is no reordering of the writes over PCIe

Change-Id: I5eacdc35108c4a1e245c75ae349b7495451aa60d
2024-09-05 17:48:02 -04:00
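
The write ordering described in commit 3baaa6e9c0 above (packet body first, store fence, header last) can be sketched as follows. The packet layout and function names are hypothetical and are not the real AQL definitions. On targets without SSE the sketch falls back to a C11 release fence, which is only a stand-in and not equivalent to SFENCE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h> /* _mm_sfence */
#endif

/* Hypothetical 64-byte packet layout, not the real AQL packet. */
typedef struct {
    uint16_t header;
    uint8_t body[62];
} demo_packet_t;

/* Order all earlier stores before any later store. */
static void store_fence(void) {
#if defined(__SSE__)
    _mm_sfence();
#else
    atomic_thread_fence(memory_order_release); /* stand-in, see lead-in */
#endif
}

/* Write the body, fence, then write the header so that a consumer
 * never sees a valid header in front of a partially written body. */
void publish_packet(volatile demo_packet_t *slot, const uint8_t *body,
                    uint16_t header) {
    assert(slot != NULL && body != NULL);
    for (size_t i = 0; i < sizeof slot->body; ++i) {
        slot->body[i] = body[i];
    }
    store_fence();
    slot->header = header;
}
```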
David Yat Sin fe8d8c15f1 kfdtest: Fix ISA buffers not executable
Fix for some places where the ISA buffers are not declared as
executable. Previous code in Thunk was blindly setting exec bit on all
memory allocations so this issue was masked.

Change-Id: Ic7a1169c69fb85ff9e8ea7bcc49a1845b37c08ff
2024-09-05 16:57:34 -04:00
Kent Russell 545467be04 kfdtest: Check for NULL at MCABackend creation
The function can return NULL if it fails to create the backend, so check
for NULL before using it.

Change-Id: I4d6501bffd6dd0fc0d0f2224720f7d6dca1646f3
Signed-off-by: Kent Russell <kent.russell@amd.com>
2024-09-05 12:07:11 -04:00
Chris Freehill 7e13b9e62f hsakmt: Fix ROCr static lib build in new layout
Change-Id: Idc71524924b96a44d63be9b1d0fccbe0e328d96e
2024-09-05 10:26:06 -04:00
Kent Russell 4dc9d49aa6 hsakmt: Free alloc'd memory
trace is calloc'd but never freed. Free it.

Change-Id: I5795cbe5738f25a9621d24be86abb35c263fa8b7
Signed-off-by: Kent Russell <kent.russell@amd.com>
2024-09-05 10:20:09 -04:00
Shane Xiao 821f6e58f9 Revert "gfx11 is able to perform atomic ops even PCI reports no atomic support."
This reverts commit 9f0f7741de.
For APU, the PCIe atomic is supported by default. However, the PCIe
atomic feature needs to be checked for dGPU. The kfd driver has already
set PCIe atomic support for APUs, so this patch can be reverted.

Change-Id: I131d5b8e095c1104e1695e7cf8b1ed178bccddde
Signed-off-by: Shane Xiao <shane.xiao@amd.com>
2024-09-05 01:44:16 -04:00
Jeremy Newton c574c81835 kfdtest: Drop sp3 licensing comments
This is obsolete and can be dropped.

Change-Id: I4ed7d22567043f9cca39879a82e5ea945c27efc1
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2024-09-04 11:20:38 -04:00
Xuanteng Huang 7a52a45824 hsakmt: fix spelling error
This was pulled in from:
https://github.com/ROCm/ROCT-Thunk-Interface/pull/107

Change-Id: Ic30e4552a94a212a9cd138f9311b1c85b0c13867
2024-09-04 10:46:39 -04:00
Tom Rix b9c6144f23 kfdtest: Improve finding rocm-smi
On Fedora, rocm-smi is a standard package and is installed to /usr/bin,
so when run_kfdtest.sh is run this error is produced:

find: ‘/opt/rocm*’: No such file or directory

First redirect stderr to /dev/null on the original search.
Then fall back to either looking for rocm-smi in BIN_DIR or
look for it in the PATH.

Change-Id: I389ed0b9a4a4507263c9eb19894b25326c9a4222
Signed-off-by: Tom Rix <Tom.Rix@amd.com>
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2024-09-04 09:20:32 -04:00
Joseph Greathouse 75143555fa hsakmt: Only set exec flag when requested
Previous code would blindly set executable bit on all allocations.

Change-Id: Id1154f08f6ba21c633905fd46b06053994d6f3cc
2024-09-03 15:13:56 -04:00
Jeremy Newton 8fd1b14a42 Fix permissions on kfdtest
Using "PROGRAMS" and "FILES" without specifying permissions will
automatically select the right permissions.

PROGRAMS is used for executables, FILES is used for data files

Change-Id: I0fb6eff257a8f936848bd648cf877da6dc0b6906
2024-09-03 14:11:01 -04:00
David Yat Sin 4ba4867fa5 rocrtst: Fix segfault on p2p copies
Fix segfault on p2p copies when 2 agents cannot access each other's
memory (usually because the PCI BARs are out of range). The
AcquireAsyncCopyAccess function should return NULL in that case, so that
the test can be skipped.

Change-Id: If018f3609dd21a01c56eaec94de3bca52c385c4d
2024-09-03 14:06:48 -04:00
David Yat Sin f505444aaf Adding clang format files and strings
Adding files to auto-format code using:
bash format

Change-Id: I6a9edf3ff4d1e6102a44c4106e646ff9d63340cc
2024-08-29 15:43:41 -04:00
Tony Gutierrez 68669f4e1a rocr: Generalize AMD::MemoryRegion Allocate and Free
Remove KFD-specific Allocate/Free calls from the AMD::MemoryRegion.
The KFD-driver-specific Allocate/Free calls are now implemented in
the KfdDriver. Future changes will migrate the remaining KFD-specific
calls out of AMD::MemoryRegion.

This allows the MemoryRegion to be used across AMD drivers like the
XDNA driver.

Change-Id: Ib6a2a9e5e1a15e61644d2592beb3a8e6578c3010
2024-08-28 14:35:07 -07:00
Tony Gutierrez c42ff44a6a rocr/kfd-driver: Add initial KFD driver interface
Adds the initial KFD driver interface and uses it to open the
KFD from amd_topology.cpp.

This change is to show the direction of the Driver interface for
initially supporting the KFD and to get feedback on the approach.
For now we wrap relevant ROCt calls behind this generic driver
interface so that we can generalize core ROCr components like
MemoryRegion, Runtime, etc.

Now that ROCt is incorporated into ROCr, we can more fully integrate
ROCt into the Driver interface. Ideally, we get to a point where
the generic Driver interface can support KFD, XDNA, and potential
future drivers.

Change-Id: I4573fd6af1f8398233ee9d3814d9f3139dd0279c
2024-08-28 14:34:54 -07:00
Tony Gutierrez 86f40ae489 rocr/xdna-driver: Initial support for amdxdna driver
Change-Id: I319b55d89dc644e7151228cb6c19d1a633171295
2024-08-28 14:34:39 -07:00
David Yat Sin 2360253b3b rocrtst: Skip inaccessible agents when importing dmabuf
If some agents cannot access the memory buffer directly, this will cause
the hsa_amd_interop_map_buffer API call to fail

Change-Id: If2f0e1735c2926440d657831de50775d7f304c8e
2024-08-28 15:58:02 -04:00
Shweta Khatri da69ffff0f Set internal cache for rocprofiler-register dependency
Change-Id: I8a661818c11c4de0df9743dacb78b7c5163b6da9
2024-08-28 14:48:51 -04:00
Jonathan Kim ae99effb29 libhsakmt: Fix improper type range check in legacy queue creation
The enum type for compute AQL is defined as larger than the targeted SDMA
enum types.  We should only deny legacy calls for SDMA queues that
require targeted engines.

Change-Id: I6386a8700b3b18af825b6f0d2be27052cc8de0f5
2024-08-28 13:55:41 -04:00
Tony Gutierrez 8ea62f1cea rocr/aie: Add initial support for AIE agents
This change adds the initial classes for the AIE agent and AIE AQL
queue.

An AIE agent list is added to the core runtime object.

Change-Id: I84b02f52171b80726dfb2c8431582a3ea2986eb3
2024-08-27 14:47:05 -07:00
David Yat Sin cb672ebcd1 Set ELF_GETSHDRSTRNDX when cxx compiler is not loaded
Change-Id: Ia26b8999909f688ce78d9bbe4cb2a7262df2ee02
2024-08-22 17:20:37 -04:00
Joseph Macaranas be31cca4df External CI: Add support for ROCR+Thunk combined repo.
Change-Id: Ib2305d6ed81f29d146c73a4063e08671c8a8273a
2024-08-22 12:40:28 -04:00
David Yat Sin c8dd4d2b3b rocr: Handle pthread_create returning errors
Rewriting logic to fix issue where pthread_create would return errors
other than EINVAL, and these errors would be ignored.

Change-Id: I573958724dcf886c20e8c14e6a9182303b3ffa06
2024-08-22 12:15:10 -04:00
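
The point of commit c8dd4d2b3b above is that every non-zero return from pthread_create is a failure, not only EINVAL. A minimal sketch of that pattern, with hypothetical names (start_thread, noop_entry) that do not come from ROCr:

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Trivial thread body used only to exercise the wrapper below. */
void *noop_entry(void *arg) {
    return arg;
}

/* Hypothetical wrapper: start a thread and report any failure.
 * Returns 0 on success, otherwise the error number that
 * pthread_create returned (EAGAIN, EPERM, EINVAL, ...). */
int start_thread(pthread_t *thread, void *(*entry)(void *), void *arg) {
    assert(thread != NULL && entry != NULL);
    int err = pthread_create(thread, NULL, entry, arg);
    if (err != 0) {
        /* The error number is the return value itself. */
        fprintf(stderr, "pthread_create failed: %s\n", strerror(err));
    }
    return err;
}
```

Callers can then decide whether to retry, fall back, or abort instead of silently continuing without the thread.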
Lancelot SIX d5acab2b39 libhsakmt: Check for KFD 1.13 for debug ioctl interface
Core dump support relies on debugger related KFD ioctls which have been
introduced in version 1.13 of the interface.  However, the code checks
for KFD_IOCTL_MINOR_VERSION (currently 17), making it impossible to
produce core dumps when using some drivers that should support it.

Update the CHECK_KFD_MINOR_VERSION calls in the debugger related ioctl
wrappers and look for KFD 1.13 or above.

Change-Id: I10a7fd03bf8f678b6318d7c25d6a7ded804dac67
2024-08-21 23:45:25 +01:00
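
The version test described in commit d5acab2b39 above compares the running driver's interface version against the version that introduced the feature (1.13), not against the newest version known at build time. A minimal sketch, in which the constant and function names are hypothetical and the handling of major versions above 1 is an assumption:

```c
#include <assert.h>
#include <stdbool.h>

/* From the commit message: the debug ioctls appeared in KFD 1.13.
 * The names are hypothetical. */
enum { DEBUG_IOCTL_MIN_MAJOR = 1, DEBUG_IOCTL_MIN_MINOR = 13 };

/* Returns true if the reported KFD interface version is at least 1.13.
 * Treating any later major version as supported is an assumption. */
bool kfd_supports_debug_ioctls(int major, int minor) {
    assert(major >= 0 && minor >= 0);
    if (major != DEBUG_IOCTL_MIN_MAJOR) {
        return major > DEBUG_IOCTL_MIN_MAJOR;
    }
    return minor >= DEBUG_IOCTL_MIN_MINOR;
}
```

With this shape, a driver reporting 1.13 through 1.16 passes the check even though the headers used at build time describe 1.17.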
Lancelot SIX 3475a45137 rocr/amd_core_dump: Fix "arithmetic on a pointer to void"
A recent patch introduced a build failure when building with Clang:

    [ 65%] Building CXX object runtime/hsa-runtime/CMakeFiles/hsa-runtime64.dir/libamdhsacode/amd_core_dump.cpp.o
    […]/runtime/hsa-runtime/libamdhsacode/amd_core_dump.cpp:271:29: error: arithmetic on a pointer to void
      271 |       read = pread(fd_, buf + done, buf_size - done,
          |                         ~~~ ^
    1 error generated.

This patch fixes this by making sure the "void *" pointer is converted
to "char *" before doing arithmetic on it.

Change-Id: Ib1663ed30abce76e05f06d042975eccd7d729823
2024-08-21 17:19:28 -04:00
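
The fix described in commit 3475a45137 above amounts to doing the offset arithmetic on a "char *" instead of a "void *". A minimal sketch of a read loop written that way; read_fully is a hypothetical helper and not the code from amd_core_dump.cpp:

```c
#define _XOPEN_SOURCE 700 /* for pread */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to buf_size bytes from fd, starting at
 * offset, into buf. Returns the number of bytes read (short only at
 * end of file) or -1 on error. */
ssize_t read_fully(int fd, void *buf, size_t buf_size, off_t offset) {
    assert(buf != NULL);
    /* ISO C and C++ do not define arithmetic on void *, so convert
     * once and index through a char pointer. */
    char *bytes = (char *)buf;
    size_t done = 0;
    while (done < buf_size) {
        ssize_t n = pread(fd, bytes + done, buf_size - done,
                          offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted, retry the same chunk */
            }
            return -1;
        }
        if (n == 0) {
            break; /* end of file */
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

GCC accepts arithmetic on "void *" as an extension, which is why such code can build with one compiler and fail with Clang in C++ mode.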
Jonathan Kim eb30a5bbc7 rocr: Memory copy based on recommended SDMA engines
Recommended SDMA engines for DMA copies are now exposed for better
GPU-GPU performance. ROCr can now select those DMA engines.

Also lock-in host-device copies to SDMA0 and device-host copies to
SDMA1 for better stability and performance.

Change-Id: Ideff2e13daf537104efecb8b837bd49ee5096cb5
2024-08-20 16:22:32 -04:00
Jonathan Kim 2f588a2406 libhsakmt: Extend thunk queue creation with recommended sdma engines
Extend the current Thunk implementation of queue creation to target
specific SDMA engine IDs.

Also expose the new recommended SDMA engines per IO link from the KFD
sysfs.

Change-Id: I51f9a0d83c0f1fc4d5dc837f879a7ae332e7d7e9
2024-08-20 11:13:57 -04:00
Yifan Zhang 3f1f68c8cb libhsakmt: add OverrideEngineId property
When HSA_OVERRIDE_GFX_VERSION is used, save the overridden GFX
version to OverrideEngineId instead of original EngineId. There
are places where real GFX properties are still needed, e.g. CWSR size
calculation.

Change-Id: I9d9149bae465b7cfe55604fc19e7ca34e48b7b1c
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
2024-08-20 09:10:52 -04:00
Lancelot Six 123b2c080a rocr/trap_handler_gfx12: Properly ignore HOST_TRAP and debug_trap
The current trap handler has 2 limitations:
1) If it receives a HOST_TRAP, it clears the corresponding bit
   and notifies the host, when it should not.
2) When it is entered because of a debug trap (s_trap 3) and the
   debugger is not attached, it returns unconditionally.  However,
   if another exception is reported at the same time as the trap
   handler is entered for the debug trap (a memory violation for
   example), that other exception ends up being ignored.

This patch addresses both of those issues.  It makes it so host traps
and debug traps are ignored when necessary.  If any other exception is
reported to the wave, we halt the wave and notify the host, and if no
other exception is reported (i.e. we entered the trap handler because of
host trap or debug trap), we return to shader code.

Other minor defects are also fixed during this refactor:
- Fixed SQ_WAVE_EXCP_FLAG_PRIV_XNACK_ERROR_SHIFT which had an incorrect
  value
- Host traps can be sent at any time, including after we have halted a
  wave.  In such a case, the old approach would have:
  1) cleared the trap ID saved in ttmp6
  2) clobbered ttmp10 where part of the actual wave's PC is saved.

Change-Id: I9ecd341f4967e686233dec182b3e5b0388ef19bd
2024-08-19 21:22:13 -04:00
David Yat Sin 88eaa834d0 Separate AsyncEventsLoop into two separate threads
This fixes an issue for missing HW events when out of HW events.

We cannot determine whether a HW event has occurred unless we call the
underlying drivers with hsaKmtWaitOnMultipleEvents_Ext. Previous logic
in Signal::WaitAny would switch to ACTIVE_WAIT state if we run out of
hardware events (signal->EopEvent() == NULL) and this would cause the
hsaKmtWaitOnMultipleEvents_Ext call to be skipped. But also, when we
have some signals without hardware events, calling
hsaKmtWaitOnMultipleEvents_Ext with a timeout of 0 so that we can poll
for remaining signals adds overhead with an IOCTL call and may cause
extra delay. Separating AsyncEventLoop into two separate threads so
that:

1. We can have a new Signal::WaitAnyExceptions to wait for HW events.
This function can be simpler as it does not have to perform all the
timer calculations because it is expected to be always waiting on
hsaKmtWaitOnMultipleEvents_Ext through the lifetime of a process.

2. Signal::WaitAny does not need to have extra code to check for HW
exceptions as it only needs to handle HSA_EVENTTYPE_SIGNAL events. It
can also skip the calls to hsaKmtWaitOnMultipleEvents_Ext if needed.

Change-Id: I52ba99fd6e483e0cb477b7931a0dcc03520aa523
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
2024-08-19 17:54:11 -04:00
David Yat Sin 56ba584a22 rocr: Delete internal CP queues in GPU agent destructor
Delete queues used internally in agent destructor to make sure any
memory allocated by the queue objects is freed before the agent memory
regions are destroyed.

Change-Id: I4768c9cf66f77ac00a5a355f373f7f22dc266e47
2024-08-19 17:16:46 -04:00
David Yat Sin 4ffa325c08 libhsakmt: Add two symbols to global symbols
For users still using non-static hsakmt

Change-Id: I12b1c25f0d952ed9178529cadc518c57c1aeb06d
2024-08-19 14:56:00 -04:00
David Yat Sin 921471bd94 Raise system error when memory free is denied
If a user application tries to free memory that is currently being used by
the underlying HW device, the hsaKmtFreeMemory function call will fail.

This would be caused by an incorrect call by the user application. A
system memory error is raised and the user application is expected to
abort when this happens.

Note: This leaves the allocation_map_ table in an inconsistent state as
this address entry is removed from it while the pointer is not actually
free'd. But re-organising the FreeMemory() function would require the
memory_lock_ to be held for much longer and may affect performance.
Since this is a very unlikely and invalid use case, we prefer to leave
the FreeMemory() function as is.

Change-Id: I24279eb98620c32d34f4c5ad1b7a0a30cb65835d
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
2024-08-19 14:03:12 -04:00
David Yat Sin aae4dab88e Do not generate coredump on VM fault signal event
Skip coredump generation when receiving HSA_STATUS_ERROR_MEMORY_FAULT.
We also receive a system error of type HSA_EVENTTYPE_MEMORY and generate
the coredump there. Trying to generate a coredump from 2 places sometimes
causes an unnecessary error message because both places try to create a
coredump file with the same name.

Change-Id: If3f03bab2c24ad71dfeff39ab411bb9ac08b337e
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
2024-08-19 13:21:26 -04:00