Test that querying userptr pointer info returns the correct alloc flags,
CoarseGrain by default.
Test that querying hsaKmtAllocMemory pointer info returns the correct
CoarseGrain alloc flags.
Change-Id: If3a1175645717e5d7c475d6ff35b02d6876a1f7c
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: c3c1618db7]
hsaKmtQueryPointerInfo returns vm_obj flags for all of the registered
memory types below, other than hsaKmtAllocMemory, and sets the
CoarseGrain flag correctly for:
Graphics: always coarse grain.
Shared: hsaKmtShareMemory passes mflags with the export handle to KFD,
which stores them in KFD objs; hsaKmtRegisterSharedHandle gets mflags
from KFD with the import handle.
Userptr: coarse grain by default, or based on the mflags provided in
the hsaKmtRegisterMemoryWithFlags call.
Change-Id: Idc23e8b0cf599b02580737639da2f9ef4ccd0c0d
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: fa7b4a6268]
Query pointer info returns KFD_IOC_ALLOC_MEM_FLAGS_* flags, but it
should return HsaMemFlags. Fix it by renaming vm_obj->flags to mflags
and always saving HsaMemFlags.
Use consistent function parameter and variable names to avoid confusion:
mflags for HsaMemFlags and ioc_flags for KFD_IOC_ALLOC_MEM_FLAGS_*
flags.
AMDKFD_IOC_GET_DMABUF_INFO returns ioc_flags; translate them to mflags
using the new helper fmm_translate_ioc_to_hsa_flags.
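
A minimal sketch of this kind of translation, for illustration only. The
flag names, bit values, and struct below are hypothetical stand-ins, not
the real KFD_IOC_ALLOC_MEM_FLAGS_* bits or the real HsaMemFlags layout,
and the mapping rules are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ioctl-style bit flags; the values are illustrative only. */
#define EX_IOC_FLAG_WRITABLE (1u << 0)
#define EX_IOC_FLAG_COHERENT (1u << 1)

/* Hypothetical user-facing flags, standing in for HsaMemFlags. */
struct ex_mem_flags {
    bool read_only;
    bool coarse_grain;
};

/* Translate ioctl-style bits (ioc_flags) into user-facing flags (mflags). */
static struct ex_mem_flags ex_translate_ioc_to_mflags(uint32_t ioc_flags)
{
    struct ex_mem_flags mflags;

    mflags.read_only = !(ioc_flags & EX_IOC_FLAG_WRITABLE);
    /* Assumption: memory that is not coherent is reported as coarse grain. */
    mflags.coarse_grain = !(ioc_flags & EX_IOC_FLAG_COHERENT);
    return mflags;
}
```

Keeping one representation in the stored object and translating only at
the ioctl boundary means a query never has to guess which flag encoding
it is holding.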
Change-Id: If9e117c507139c0166abb1ab0df8c233ef7e48a1
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 2c796e62be]
The 0x73E3 DID was missing; add it.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Id1ae2f268e0e8b5cfec5ae2065153fe73854b93a
[ROCm/ROCR-Runtime commit: ed62c7aa1c]
Sync with KFD ioctl version 1.6:
1.6 - Query clear flags in SVM get_attr API
Change the pad field of the import/export handle args to flags, to pass
memory alloc flags from the allocating process to the importing process.
Change-Id: I69360b244651947e885c4a8da9f64a1163101d20
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: dee9c023a2]
kernel-headers provides the drm/drm.h path, while libdrm-dev[el]
provides the libdrm/drm.h path, which is what we want to use. Fix the
path so we use the newer drm.h header, as well as fixing SLES, which
doesn't provide drm.h in their kernel-headers.
Change-Id: Icb2b6643698d356169e3baeef17527a1b4e05483
[ROCm/ROCR-Runtime commit: 4f3440a8ac]
An update to the Thunk API introduced a dependency on drm.h in commit
1001f27cb5 ("libhsakmt: update thunk api for exception handling"),
so update the dependency list in SLES builds.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I6d987fac07612e3eca7b6087205d76df50dc13d9
[ROCm/ROCR-Runtime commit: 303c0748ce]
Under xnack we can now identify the queue which generated a vm fault.
This allows users to identify which queue, and therefore which
dispatch, a vm fault came from.
Change-Id: If72ff3de05800f2b811aa7842a15eedff8b5e45a
[ROCm/ROCR-Runtime commit: 59ee761f81]
ttmp6.packet_index is reported as 0 for all waves, regardless of the
dispatch packet position in the queue, due to an issue in the clearing
of the previous trap_id and saved status.halt bit.
Fixed TTMP6_SAVED_STATUS_HALT_MASK to only be one bit, 1<<29.
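
As a hedged illustration of why the mask width matters: the layout below
is hypothetical (a one-bit flag at bit 29 with an unrelated field in the
bits just below it) and is not the actual ttmp6 layout. It only shows
how clearing with a mask that is one bit too wide damages a neighboring
field.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
#define EX_SAVED_HALT_MASK      (1u << 29)    /* exactly one bit */
#define EX_SAVED_HALT_MASK_WIDE (3u << 28)    /* too wide: bits 28 and 29 */
#define EX_NEIGHBOR_SHIFT       25
#define EX_NEIGHBOR_MASK        (0xFu << EX_NEIGHBOR_SHIFT) /* bits 25..28 */

/* Clear the bits selected by mask and return the new register value. */
static uint32_t ex_clear_bits(uint32_t reg, uint32_t mask)
{
    return reg & ~mask;
}

/* Extract the neighboring field that sits below the halt bit. */
static uint32_t ex_neighbor(uint32_t reg)
{
    return (reg & EX_NEIGHBOR_MASK) >> EX_NEIGHBOR_SHIFT;
}
```

With the neighbor field set to 0xF and the halt bit set, clearing with
EX_SAVED_HALT_MASK leaves the field intact, while clearing with
EX_SAVED_HALT_MASK_WIDE also drops its top bit.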
Change-Id: Ia4934e51123a40d71de658efc387a1f3a6344f05
[ROCm/ROCR-Runtime commit: ef1955ad42]
If left non-zero, the event loop will keep reinvoking the callback,
preventing AqlQueue::ExceptionHandler from running.
Change-Id: If85fbaf62f04ffd327ecf9d649aa23afad4442ce
[ROCm/ROCR-Runtime commit: 8d4608ed0e]
Also fix hsaKmtRuntimeEnable error handling. Continue if ioctl fails.
Change-Id: I754ccba5910ccfef6f1ada1415593ef89ce33aba
[ROCm/ROCR-Runtime commit: 7e4088309d]
Add hsaKmtRuntimeEnable and disable.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Change-Id: I083f9293948e975546a1b3c1334cb41499b9ab1f
[ROCm/ROCR-Runtime commit: 1ce548829b]
The debugger and debug agent no longer use the Thunk API.
Remove all deprecated functions and keep commented
references for future KFD tests.
Update and keep the version checks for future use
and for hsaKmtRuntimeEnable/Disable.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Change-Id: Ia2f10d82f5ac36d0bd1bda233810f26e8a154d55
[ROCm/ROCR-Runtime commit: 31ac82617c]
Update hsaKmtCreateQueue to initialize the new save area header with the
exception payload and event ID.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Sean Keely <sean.keely@amd.com>
Change-Id: Icd38062dc982cb29b30644699014eeb0b3e26d00
[ROCm/ROCR-Runtime commit: 96c7a5c9dc]
__fmm_release is sometimes called with the aperture lock, and sometimes
without. Consistently call it with the aperture lock held and remove the
lock/unlock calls from this function.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I80dddc64cc0703e5eed8e9f1eb65b75a2c7ae2eb
[ROCm/ROCR-Runtime commit: 5fac7dcc3b]
Unlock mutex if MMIO mapping fails. This happens on all GFXv8 GPUs.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I1dee1cbddefd9185c24ea79377f49f8ae2c5ff57
[ROCm/ROCR-Runtime commit: 19536080a8]
If the devices aren't peer-accessible, we shouldn't try to run a test
that requires that the devices be peer-accessible. Thus, add a check in
MapVramToGPUNodesTest to check for peer accessibility before executing
the peer mappings.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ib79b141f8c1ac6d85f5ab49d62af62ec10b988b7
[ROCm/ROCR-Runtime commit: bdfe3a12a8]
Test the Thunk race condition where multiple threads register and
deregister the same userptr, to emulate an application registering the
same userptr to multiple GPUs using multiple threads.
Use a thread barrier to sync the threads so that they start
registering the userptr at the same time.
Change-Id: I6723dc39f75908026fa14a490e39e1fe49a13a1b
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 92076f6f1b]
Add Yellow Carp support to the Thunk.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Change-Id: Icfecc3fd1f472c9924f934c6a5352448356d83df
[ROCm/ROCR-Runtime commit: a55551309c]
Limit test buffer size to 3/4 total VRAM size, and max 1GB.
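
A minimal sketch of this clamp, assuming the limit is the smaller of 3/4
of VRAM and 1GB. The function and macro names are hypothetical and are
not taken from the test source.

```c
#include <stdint.h>

/* Hypothetical upper bound for the test buffer: 1 GiB. */
#define EX_MAX_TEST_BUF_SIZE (1ULL << 30)

/* Return the test buffer size for a device with vram_size bytes of VRAM. */
static uint64_t ex_test_buf_size(uint64_t vram_size)
{
    /* Divide before multiplying so that very large sizes cannot overflow. */
    uint64_t size = vram_size / 4 * 3;

    return size < EX_MAX_TEST_BUF_SIZE ? size : EX_MAX_TEST_BUF_SIZE;
}
```

For example, a device with 512MB of VRAM would get a 384MB buffer, and
a device with 8GB would be capped at 1GB.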
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Change-Id: I937e10b0a6bd8215e3865b50f22ce75b3982a6f7
[ROCm/ROCR-Runtime commit: fd131e875e]
Add a blacklist for gfx1xxx12, using the same list as gfx1012
Change-Id: I7e620dba8a36f6f89152a48066234884150a15dd
[ROCm/ROCR-Runtime commit: b2fb2a3470]
Warn that HSA_FORCE_ASIC_TYPE may be needed if the engine major id
assertion fails.
Change-Id: I67e01e99c3d1bdc84630ccfae489dce5e77961b5
[ROCm/ROCR-Runtime commit: 408fca0278]
Aperture locking is too fine-grained: there is a race between finding
a userptr and allocating a userptr object.
Change _fmm_allocate_device and fmm_allocate_memory_object to not take
the aperture lock; the callers take it instead. This implements an
atomic find-userptr-or-allocate-a-new-one operation.
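
The locking convention can be sketched as follows. This is a simplified
illustration with hypothetical types and names, not the fmm code: the
helpers assume the lock is already held, and the single caller holds it
across both the lookup and the allocation.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical object and aperture types, for illustration only. */
struct ex_obj {
    void *addr;
    int refcount;
    struct ex_obj *next;
};

struct ex_aperture {
    pthread_mutex_t lock;
    struct ex_obj *objs;
};

/* Caller must hold ap->lock. Returns NULL if addr is not registered. */
static struct ex_obj *ex_find_locked(struct ex_aperture *ap, void *addr)
{
    for (struct ex_obj *o = ap->objs; o; o = o->next)
        if (o->addr == addr)
            return o;
    return NULL;
}

/* Caller must hold ap->lock. Returns NULL if allocation fails. */
static struct ex_obj *ex_alloc_locked(struct ex_aperture *ap, void *addr)
{
    struct ex_obj *o = malloc(sizeof(*o));

    if (!o)
        return NULL;
    o->addr = addr;
    o->refcount = 1;
    o->next = ap->objs;
    ap->objs = o;
    return o;
}

/* Find an existing object or allocate a new one, under a single lock hold. */
static struct ex_obj *ex_find_or_alloc(struct ex_aperture *ap, void *addr)
{
    struct ex_obj *o;

    pthread_mutex_lock(&ap->lock);
    o = ex_find_locked(ap, addr);
    if (o)
        o->refcount++;
    else
        o = ex_alloc_locked(ap, addr);
    pthread_mutex_unlock(&ap->lock);
    return o;
}
```

Because no other thread can run between the lookup and the insertion,
two threads registering the same address end up sharing one object
instead of each creating their own.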
Change-Id: I6773404e22c1f4382a211c5a9817df23c5534a2a
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: c4d5ee28f0]
Certain special signals do not carry their updates via their signal
value. These signals are wrappers around special KFD events, of
which the only current instance informs about VM faults. We either
need to check each signal for this special event type or rely on
the checking done in hsa_amd_signal_wait_any. Since there will always
be a small number of these signals, it doesn't make much sense to
penalize the performance path with this check. Additionally, we know
that the signal indicated by hsa_amd_signal_wait_any is satisfied, so
we don't need to recheck its conditions.
Change-Id: I9fc6298300ad543d823ecd28ca8fab4ad26c23ef
[ROCm/ROCR-Runtime commit: 3d6a18b67c]
Clang now warns about set but unused variables. It also now
recognizes -Wno-error=unused-but-set-variable so this patch moves
that option back to the general options list.
Change-Id: Id800e87eb688b9441b14380e2246ad586179f31a
[ROCm/ROCR-Runtime commit: 26808295f8]
This is causing PSDB/OSDB failures so disable it until investigation is
done
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I666cd45fdf8ae585486adc7cf43eacd1700704bb
[ROCm/ROCR-Runtime commit: 5796225011]
Allows determining if the host can directly access HMM memory that
is physically resident in vram.
Change-Id: Ie452eedd0e27fe1b511afd416f5a1cd01b3d84e8
[ROCm/ROCR-Runtime commit: 9e53cab613]
To test ACCESS_IN_PLACE GPU mapping update to system memory.
Change-Id: I5b990215f39692e829128d848125e1ae0d571e03
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 351a41ac76]
The CoherentHostAccess flag member moved from the HSA_MEMORYPROPERTY
struct to the HSA_CAPABILITY struct. It is now reported to the
topology as a capability of the device instead of a device
memory property.
Change-Id: I48e43e4b4a0635b711b62933734587facdfbf88b
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
[ROCm/ROCR-Runtime commit: f85b428265]
Enables the fragment allocator to handle >2MB allocations, maintaining
good TLB alignment. Prior code contained a bug that caused the effective
API granule for vram allocations >2MB to be bumped to 2MB.
Also adjusts the block cache's block retention heuristic to not
count discarded blocks as in use. This will reduce block retention
when a significant amount of large blocks or IPC is in use.
Change-Id: I30bd85eb87951df822211f799d9cfe579ab109c6
[ROCm/ROCR-Runtime commit: 8adbda1c18]
Add macro debug_warning_n to stop printing a message after
N instances.
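
One possible shape for such a macro is sketched below. The names are
hypothetical and the real debug_warning_n may differ. This sketch keeps
a separate counter per call site and is not thread-safe.

```c
#include <stdbool.h>
#include <stdio.h>

/* Return true and bump *count while fewer than limit messages were emitted. */
static bool ex_should_warn(int *count, int limit)
{
    if (*count >= limit)
        return false;
    (*count)++;
    return true;
}

/* Print a warning at most limit times from each place the macro is used. */
#define ex_warning_n(limit, ...)                      \
    do {                                              \
        static int ex_count_;                         \
        if (ex_should_warn(&ex_count_, (limit)))      \
            fprintf(stderr, __VA_ARGS__);             \
    } while (0)
```

A call such as `ex_warning_n(3, "retrying ioctl\n");` inside a loop would
then print only during the first three iterations.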
Change-Id: Id5f84b11eb63b3a20bd2bcb2ea8f10a066b457ef
[ROCm/ROCR-Runtime commit: ca8387768e]
Under high async handler load, signal retention and event sorting
become bottlenecks. This change processes more handlers in a
single pass to amortize wait_any overheads.
Change-Id: I8b276e102db647e3858e120547aa0c6fca85ab4c
[ROCm/ROCR-Runtime commit: 6b398eb72c]
Optimize memory allocation latency by changing the
alignment from 2MB to 1GB.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Change-Id: I7818e9f13b17e2c0992e75b17f978dc03a018a57
[ROCm/ROCR-Runtime commit: 973b35bc06]