GFX 9.4.x has better performance for CPU-GPU copies when using
engines in reverse order from other devices.
Change-Id: I1eaebf0e837bb7f44712f40d5115df618f6a73d7
[ROCm/ROCR-Runtime commit: 509e8d863a]
If the KFD doesn't support targeting SDMA engines, ensure that ROCr
selects the correct downstream queue type by using an invalid engine.
Change-Id: Ia6848126f67f3d35ab37248633e8e0e6e2d77fff
[ROCm/ROCR-Runtime commit: 24b25003b0]
To support fully static-library ROCm builds, ensure that all global
symbols are prefixed with something meaningful to avoid collisions with
other libraries.
A script using "objdump -C -t" was used to get a list of symbols and
then check whether the global symbols have a meaningful prefix (for the
thunk: hsakmt or kmt, in various letter cases).
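The prefix check can be sketched in C++ as below. This is a minimal sketch, not the actual script: it assumes the global symbol names have already been extracted from the "objdump -C -t" output, and the helpers has_allowed_prefix and find_unprefixed are hypothetical names. Matching is case-insensitive so that hsaKmt, HSAKMT and hsakmt are all accepted.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Lower-case a copy of the symbol so that hsaKmt, HSAKMT and hsakmt all match.
std::string to_lower(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return s;
}

// Hypothetical helper: true if the symbol starts with one of the allowed
// prefixes. The prefixes are expected to be given in lower case.
bool has_allowed_prefix(const std::string& symbol,
                        const std::vector<std::string>& prefixes) {
  const std::string lower = to_lower(symbol);
  for (const auto& p : prefixes) {
    if (lower.compare(0, p.size(), p) == 0) return true;
  }
  return false;
}

// Hypothetical helper: collect the global symbols that lack an allowed prefix.
std::vector<std::string> find_unprefixed(const std::vector<std::string>& globals,
                                         const std::vector<std::string>& prefixes) {
  std::vector<std::string> offenders;
  for (const auto& sym : globals) {
    if (!has_allowed_prefix(sym, prefixes)) offenders.push_back(sym);
  }
  return offenders;
}
```

With prefixes {"hsakmt", "kmt"}, a symbol such as hsaKmtOpenKFD passes, while an unprefixed global would be reported as a potential collision.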
Change-Id: Ifd353f64a3344eb60d1f6c4e041aa20967b38a59
Signed-off-by: Kent Russell <kent.russell@amd.com>
[ROCm/ROCR-Runtime commit: 3da42a0847]
Add the CMake variable BUILD_ROCR so that users can choose not to
build ROCr.
Change-Id: I73bd28cde9430ba86aed50fb88ec2e42b3443dbb
[ROCm/ROCR-Runtime commit: a676d8639c]
- Use HSA_ALLOCATE_QUEUE_DEV_MEM=1 to create the AQL queue in device
memory.
- Before writing the AQL packet header to the queue, use an SFENCE to
ensure that the writes are not reordered over PCIe.
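A minimal sketch of the ordering, assuming a hypothetical 64-byte packet whose first 16 bits are the header (real AQL packets differ in detail, and publish_packet is not a ROCr function). The packet body is written first, _mm_sfence() orders those stores (including write-combined stores to device memory) ahead of everything that follows, and the header is stored last so the consumer never observes a valid header with a stale body. _mm_sfence is an x86 intrinsic, so this sketch only builds on x86.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <xmmintrin.h>  // _mm_sfence (x86 only)

// Hypothetical packet layout: 16-bit header first, then the rest of the packet.
struct Packet {
  uint16_t header;
  uint16_t setup;
  uint8_t body[60];
};

// Write everything except the header, fence, then publish the header.
void publish_packet(Packet* slot, const Packet& pkt) {
  const size_t off = offsetof(Packet, setup);
  std::memcpy(reinterpret_cast<uint8_t*>(slot) + off,
              reinterpret_cast<const uint8_t*>(&pkt) + off,
              sizeof(Packet) - off);
  // Make sure the body stores are ordered before the header store.
  _mm_sfence();
  // The header store is what makes the packet valid to the consumer.
  __atomic_store_n(&slot->header, pkt.header, __ATOMIC_RELEASE);
}
```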
Change-Id: I5eacdc35108c4a1e245c75ae349b7495451aa60d
[ROCm/ROCR-Runtime commit: 3baaa6e9c0]
Fix some places where ISA buffers were not declared as executable.
Previous code in the Thunk blindly set the exec bit on all memory
allocations, so this issue was masked.
Change-Id: Ic7a1169c69fb85ff9e8ea7bcc49a1845b37c08ff
[ROCm/ROCR-Runtime commit: fe8d8c15f1]
The function can return NULL if it fails to create the backend, so check
for NULL before using it.
Change-Id: I4d6501bffd6dd0fc0d0f2224720f7d6dca1646f3
Signed-off-by: Kent Russell <kent.russell@amd.com>
[ROCm/ROCR-Runtime commit: 545467be04]
trace is calloc'd but never freed. Free it.
Change-Id: I5795cbe5738f25a9621d24be86abb35c263fa8b7
Signed-off-by: Kent Russell <kent.russell@amd.com>
[ROCm/ROCR-Runtime commit: 4dc9d49aa6]
This reverts commit 7daedd5eef.
For APUs, PCIe atomics are supported by default. However, PCIe atomic
support needs to be checked for dGPUs. The KFD driver already sets
PCIe atomic support for APUs, so this patch can be reverted.
Change-Id: I131d5b8e095c1104e1695e7cf8b1ed178bccddde
Signed-off-by: Shane Xiao <shane.xiao@amd.com>
[ROCm/ROCR-Runtime commit: 821f6e58f9]
This is obsolete and can be dropped.
Change-Id: I4ed7d22567043f9cca39879a82e5ea945c27efc1
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
[ROCm/ROCR-Runtime commit: c574c81835]
On Fedora, rocm-smi is a standard package and is installed to /usr/bin.
As a result, running run_kfdtest.sh produces this error:
find: ‘/opt/rocm*’: No such file or directory
First, redirect stderr to /dev/null on the original search.
Then fall back to looking for rocm-smi either in BIN_DIR or
in the PATH.
Change-Id: I389ed0b9a4a4507263c9eb19894b25326c9a4222
Signed-off-by: Tom Rix <Tom.Rix@amd.com>
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
[ROCm/ROCR-Runtime commit: b9c6144f23]
Previous code would blindly set executable bit on all allocations.
Change-Id: Id1154f08f6ba21c633905fd46b06053994d6f3cc
[ROCm/ROCR-Runtime commit: 75143555fa]
Using "PROGRAMS" and "FILES" without specifying permissions will
automatically select the right permissions.
PROGRAMS is used for executables; FILES is used for data files.
Change-Id: I0fb6eff257a8f936848bd648cf877da6dc0b6906
[ROCm/ROCR-Runtime commit: 8fd1b14a42]
Fix segfault on p2p copies when 2 agents cannot access each other's
memory (usually because the PCI BARs are out of range). The
AcquireAsyncCopyAccess function should return NULL in that case, so that
the test can be skipped.
Change-Id: If018f3609dd21a01c56eaec94de3bca52c385c4d
[ROCm/ROCR-Runtime commit: 4ba4867fa5]
Remove KFD-specific Allocate/Free calls from the AMD::MemoryRegion.
The KFD-driver-specific Allocate/Free calls are now implemented in
the KfdDriver. Future changes will migrate the remaining KFD-specific
calls out of AMD::MemoryRegion.
This allows the MemoryRegion to be used across AMD drivers like the
XDNA driver.
Change-Id: Ib6a2a9e5e1a15e61644d2592beb3a8e6578c3010
[ROCm/ROCR-Runtime commit: 68669f4e1a]
Add the initial KFD driver interface and use it to open the
KFD from amd_topology.cpp.
This change is to show the direction of the Driver interface for
initially supporting the KFD and to get feedback on the approach.
For now we wrap relevant ROCt calls behind this generic driver
interface so that we can generalize core ROCr components like
MemoryRegion, Runtime, etc.
Now that ROCt is incorporated into ROCr, we can more fully integrate
ROCt into the Driver interface. Ideally, we get to a point where
the generic Driver interface can support KFD, XDNA, and potential
future drivers.
Change-Id: I4573fd6af1f8398233ee9d3814d9f3139dd0279c
[ROCm/ROCR-Runtime commit: c42ff44a6a]
If some agents cannot access the memory buffer directly, this will cause
the hsa_amd_interop_map_buffer API call to fail.
Change-Id: If2f0e1735c2926440d657831de50775d7f304c8e
[ROCm/ROCR-Runtime commit: 2360253b3b]
The enum value for compute AQL queues is defined as larger than the
targeted SDMA enum values. We should only deny legacy calls for SDMA
queues that require targeted engines.
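One way this kind of bug can arise is a range comparison on the queue-type enum. The sketch below uses hypothetical enum names and values, not the real Thunk definitions, to show why a ">=" check also rejects compute AQL queues and why an exact match on the targeted-SDMA type avoids it.

```cpp
// Hypothetical queue types; the values are chosen only to show that the
// compute AQL type sorts above the targeted-SDMA type.
enum QueueType {
  QUEUE_SDMA = 2,
  QUEUE_SDMA_BY_ENG_ID = 14,  // SDMA queue that requires a targeted engine
  QUEUE_COMPUTE_AQL = 21,
};

// Problematic: a range check also denies QUEUE_COMPUTE_AQL.
bool legacy_call_denied_range(QueueType t) { return t >= QUEUE_SDMA_BY_ENG_ID; }

// Intended: deny only the SDMA queue type that needs a targeted engine.
bool legacy_call_denied(QueueType t) { return t == QUEUE_SDMA_BY_ENG_ID; }
```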
Change-Id: I6386a8700b3b18af825b6f0d2be27052cc8de0f5
[ROCm/ROCR-Runtime commit: ae99effb29]
This change adds the initial classes for the AIE agent and AIE AQL
queue.
An AIE agent list is added to the core runtime object.
Change-Id: I84b02f52171b80726dfb2c8431582a3ea2986eb3
[ROCm/ROCR-Runtime commit: 8ea62f1cea]
Rewrite the logic to fix an issue where errors returned by
pthread_create other than EINVAL were ignored.
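A minimal sketch of the intended behaviour, under the assumption that EINVAL comes from thread attributes the system rejects (such as a stack size below PTHREAD_STACK_MIN) and can be retried with default attributes, while every other pthread_create error is returned to the caller. create_thread is a hypothetical helper, not the ROCr function.

```cpp
#include <cerrno>
#include <cstddef>
#include <pthread.h>

// Create a thread with the requested stack size. If the attributes are
// rejected with EINVAL, retry once with default attributes. Any other error
// (EAGAIN, EPERM, ...) is returned instead of being ignored.
int create_thread(pthread_t* tid, void* (*fn)(void*), void* arg,
                  size_t stack_size) {
  pthread_attr_t attr;
  int err = pthread_attr_init(&attr);
  if (err != 0) return err;

  err = pthread_attr_setstacksize(&attr, stack_size);
  if (err == 0) err = pthread_create(tid, &attr, fn, arg);
  pthread_attr_destroy(&attr);

  if (err == EINVAL) {
    // Attributes not accepted: fall back to the system defaults.
    err = pthread_create(tid, nullptr, fn, arg);
  }
  return err;  // 0 on success, otherwise the pthread error code
}
```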
Change-Id: I573958724dcf886c20e8c14e6a9182303b3ffa06
[ROCm/ROCR-Runtime commit: c8dd4d2b3b]
Core dump support relies on debugger-related KFD ioctls, which were
introduced in version 1.13 of the interface. However, the code checks
for KFD_IOCTL_MINOR_VERSION (currently 17), making it impossible to
produce core dumps with some drivers that should support it.
Update the CHECK_KFD_MINOR_VERSION calls in the debugger-related ioctl
wrappers to look for KFD 1.13 or above.
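The corrected check can be sketched as a comparison against the minimum version that introduced the debugger ioctls, rather than against the newest minor version in the header. The constant and function names below are hypothetical and do not reproduce the CHECK_KFD_MINOR_VERSION macro.

```cpp
#include <cstdint>

// Hypothetical minimum interface version for the debugger ioctls (1.13).
constexpr uint32_t kDebugIoctlMajor = 1;
constexpr uint32_t kDebugIoctlMinor = 13;

// True if the interface version reported by the driver supports the debugger
// ioctls. Comparing against the header's latest minor version (e.g. 17)
// would wrongly reject drivers that report 1.13 through 1.16.
bool supports_debug_ioctls(uint32_t major, uint32_t minor) {
  return major == kDebugIoctlMajor && minor >= kDebugIoctlMinor;
}
```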
Change-Id: I10a7fd03bf8f678b6318d7c25d6a7ded804dac67
[ROCm/ROCR-Runtime commit: d5acab2b39]
A recent patch introduced a build failure when building with Clang:
[ 65%] Building CXX object runtime/hsa-runtime/CMakeFiles/hsa-runtime64.dir/libamdhsacode/amd_core_dump.cpp.o
[…]/runtime/hsa-runtime/libamdhsacode/amd_core_dump.cpp:271:29: error: arithmetic on a pointer to void
271 | read = pread(fd_, buf + done, buf_size - done,
| ~~~ ^
1 error generated.
This patch fixes this by making sure the "void *" pointer is converted
to "char *" before doing arithmetic on it.
Change-Id: Ib1663ed30abce76e05f06d042975eccd7d729823
[ROCm/ROCR-Runtime commit: 3475a45137]
Recommended SDMA engines for DMA copies are now exposed for better
GPU-GPU performance, and ROCr can now select those DMA engines.
Also lock host-to-device copies to SDMA0 and device-to-host copies to
SDMA1 for better stability and performance.
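The copy-direction policy described above can be sketched as follows. CopyDir and select_sdma_engine are hypothetical names, and the recommended engine for GPU-GPU copies is assumed to be supplied by the caller (for example, from the per-IO-link recommendation exposed by the KFD).

```cpp
#include <cstdint>

enum class CopyDir { HostToDevice, DeviceToHost, DeviceToDevice };

// Hypothetical selection: host-to-device copies are pinned to SDMA0,
// device-to-host copies to SDMA1, and GPU-GPU copies use the engine
// recommended for the IO link between the two devices.
uint32_t select_sdma_engine(CopyDir dir, uint32_t recommended_engine) {
  switch (dir) {
    case CopyDir::HostToDevice:
      return 0;
    case CopyDir::DeviceToHost:
      return 1;
    case CopyDir::DeviceToDevice:
      return recommended_engine;
  }
  return recommended_engine;
}
```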
Change-Id: Ideff2e13daf537104efecb8b837bd49ee5096cb5
[ROCm/ROCR-Runtime commit: eb30a5bbc7]
Extend the current Thunk implementation of queue creation to target
specific SDMA engine IDs.
Also expose the new recommended SDMA engines per IO link from the KFD
sysfs.
Change-Id: I51f9a0d83c0f1fc4d5dc837f879a7ae332e7d7e9
[ROCm/ROCR-Runtime commit: 2f588a2406]
When HSA_OVERRIDE_GFX_VERSION is used, save the overridden GFX
version to OverrideEngineId instead of the original EngineId. There
are places where the real GFX properties are still needed, e.g. CWSR
size calculation.
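A minimal sketch of keeping both values, with hypothetical field names that mirror EngineId and OverrideEngineId; the numeric version encoding in the usage is illustrative only. User-visible version queries use the override when one is set, while hardware-specific calculations such as the CWSR size keep using the real version.

```cpp
#include <cstdint>

// Hypothetical node properties holding both the real and overridden versions.
struct NodeProps {
  uint32_t engine_id;           // real GFX version of the hardware
  uint32_t override_engine_id;  // from HSA_OVERRIDE_GFX_VERSION, 0 if unset
};

// Version reported to users (the override wins when present).
uint32_t reported_gfx_version(const NodeProps& p) {
  return p.override_engine_id != 0 ? p.override_engine_id : p.engine_id;
}

// Version used for hardware-specific calculations such as CWSR size.
uint32_t hw_gfx_version(const NodeProps& p) { return p.engine_id; }
```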
Change-Id: I9d9149bae465b7cfe55604fc19e7ca34e48b7b1c
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
[ROCm/ROCR-Runtime commit: 3f1f68c8cb]
The current trap handler has 2 limitations:
1) If it receives a HOST_TRAP, it clears the corresponding bit
and notifies the host, when it should not.
2) When it is entered because of a debug trap (s_trap 3) and the
debugger is not attached, it returns unconditionally. However,
if another exception is reported at the same time as the trap
handler is entered for the debug trap (a memory violation for
example), that other exception ends up being ignored.
This patch addresses both issues. It ensures that host traps
and debug traps are ignored when necessary. If any other exception is
reported to the wave, we halt the wave and notify the host, and if no
other exception is reported (i.e. we entered the trap handler because of
host trap or debug trap), we return to shader code.
Other minor defects are also fixed during this refactor:
- Fixed SQ_WAVE_EXCP_FLAG_PRIV_XNACK_ERROR_SHIFT which had an incorrect
value
- Host traps can be sent at any time, including after we have halted a
wave. In such a case, the old approach would have:
1) cleared the trap ID saved in ttmp6
2) clobbered ttmp10 where part of the actual wave's PC is saved.
Change-Id: I9ecd341f4967e686233dec182b3e5b0388ef19bd
[ROCm/ROCR-Runtime commit: 123b2c080a]
This fixes an issue where HW events could be missed when the runtime runs
out of HW events.
We cannot determine whether a HW event has occurred unless we call the
underlying drivers with hsaKmtWaitOnMultipleEvents_Ext. Previous logic
in Signal::WaitAny would switch to ACTIVE_WAIT state if we run out of
hardware events (signal->EopEvent() == NULL) and this would cause the
hsaKmtWaitOnMultipleEvents_Ext call to be skipped. But also, when we
have some signals without hardware events, calling
hsaKmtWaitOnMultipleEvents_Ext with a timeout of 0 so that we can poll
for remaining signals adds overhead with an IOCTL call and may cause
extra delay. Split AsyncEventLoop into two separate threads so
that:
1. We can have a new Signal::WaitAnyExceptions to wait for HW events.
This function can be simpler as it does not have to perform all the
timer calculations because it is expected to be always waiting on
hsaKmtWaitOnMultipleEvents_Ext through the lifetime of a process.
2. Signal::WaitAny does not need to have extra code to check for HW
exceptions as it only needs to handle HSA_EVENTTYPE_SIGNAL events. It
can also skip the calls to hsaKmtWaitOnMultipleEvents_Ext if needed.
Change-Id: I52ba99fd6e483e0cb477b7931a0dcc03520aa523
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
[ROCm/ROCR-Runtime commit: 88eaa834d0]
Delete queues used internally in agent destructor to make sure any
memory allocated by the queue objects is freed before the agent memory
regions are destroyed.
Change-Id: I4768c9cf66f77ac00a5a355f373f7f22dc266e47
[ROCm/ROCR-Runtime commit: 56ba584a22]
If user application tries to free memory that is currently being used by
the underlying HW device, the hsaKmtFreeMemory function call will fail.
This would be caused by an incorrect call by the user application. A
system memory error is raised and the user application is expected to
abort when this happens.
Note: This leaves the allocation_map_ table in an inconsistent state as
this address entry is removed from it while the pointer is not actually
freed. But re-organising the FreeMemory() function would require the
memory_lock_ to be held for much longer and may affect performance.
Since this is a very unlikely and invalid use case, we prefer to leave
the FreeMemory() function as is.
Change-Id: I24279eb98620c32d34f4c5ad1b7a0a30cb65835d
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
[ROCm/ROCR-Runtime commit: 921471bd94]
Skip coredump generation when receiving HSA_STATUS_ERROR_MEMORY_FAULT.
We also receive a system error of type HSA_EVENTTYPE_MEMORY and generate
the coredump there. Trying to generate coredump from 2 places sometimes
causes unnecessary error messages because both places try to create a
coredump file with the same name.
Change-Id: If3f03bab2c24ad71dfeff39ab411bb9ac08b337e
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
[ROCm/ROCR-Runtime commit: aae4dab88e]
This patch adds output (to stderr) indicating which step of core dump
creation failed, to improve debuggability.
Change-Id: I349692e278c2d744136d7fba7f7c2e5a7ada0c06
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
[ROCm/ROCR-Runtime commit: 3646064a0e]
It is possible for the runtime to receive an interrupt while trying to
access VRAM data using /proc/self/mem. In such a case, pread(2) returns
-1 and sets errno to EINTR. This is not an error case; the pread(2)
call just needs to be restarted. However, the current implementation
treats it as an error.
This patch changes the implementation to correctly retry on EINTR.
While at it, this patch also handles cases where pread(2) reads less
data than originally requested.
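The retry loop can be sketched as below, assuming a hypothetical pread_full helper that returns 0 on success or an errno value on failure, and that reaching end of file before the buffer is full is reported as EIO. Casting the buffer to char * before adding the offset also avoids the void-pointer arithmetic that Clang rejects.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdio>  // std::tmpfile/fileno, used in the usage example
#include <sys/types.h>
#include <unistd.h>

// Read exactly buf_size bytes starting at offset, retrying on EINTR and
// continuing after short reads. Returns 0 on success or an errno value.
int pread_full(int fd, void* buf, size_t buf_size, off_t offset) {
  char* out = static_cast<char*>(buf);  // arithmetic on void * is not valid
  size_t done = 0;
  while (done < buf_size) {
    ssize_t n = pread(fd, out + done, buf_size - done,
                      offset + static_cast<off_t>(done));
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted: restart the read
      return errno;
    }
    if (n == 0) return EIO;  // end of file before buf_size bytes were read
    done += static_cast<size_t>(n);  // short read: read the remainder
  }
  return 0;
}
```

For example, after writing "hello world" to a file from std::tmpfile(), pread_full(fileno(f), buf, 5, 6) fills buf with "world".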
Change-Id: I6a72fc5eda4afd90319f0d24b35c9eac6d1ff41c
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
[ROCm/ROCR-Runtime commit: 3e0d3d6d61]
Force mem_flags to be explicitly passed in when calling the Queue
constructor to avoid ambiguity with calls to the Queue constructor that
try to pass only the agent_node_id.
Change-Id: Ib6fedcb9e52d6c9f35f9051dfa989343456ca368
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
[ROCm/ROCR-Runtime commit: 1d1d402dcc]
No functional change
Change-Id: Ibe97b03f62c4affcb60d3469312c8a0b6eb11391
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
[ROCm/ROCR-Runtime commit: 8176a8830f]
In rocrtst helper_funcs.h, a function argument that gets
written to was previously incorrectly marked as const.
Change-Id: If8cc6555ebfa974b9665d9d5b93de01bb45fde2c
[ROCm/ROCR-Runtime commit: 1c6a4a55f1]
The current DMABUF implementation is unstable. Switch back to legacy
support for now.
Change-Id: I3be871f38c6524b0bcc9225bab61de4e57771efb
[ROCm/ROCR-Runtime commit: ea646cf958]
Currently, the only error type is HSA_AMD_MEMORY_ERROR_MEMORY_IN_USE,
which happens when a user application incorrectly tries to free memory
that is currently being used by underlying device hardware.
Change-Id: I8ce352eb9719694135fba1fa56d62368036b2e5e
Signed-off-by: Chris Freehill <cfreehil@amd.com>
[ROCm/ROCR-Runtime commit: 2853bf03f0]