Add hsaKmtRuntimeEnable and hsaKmtRuntimeDisable.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Change-Id: I083f9293948e975546a1b3c1334cb41499b9ab1f
The debugger and debug agent no longer use the Thunk API.
Remove all deprecated functions and keep commented-out
references for future KFD tests.
Update and keep the version checks for future use
and for hsaKmtRuntimeEnable/Disable.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Change-Id: Ia2f10d82f5ac36d0bd1bda233810f26e8a154d55
Update hsaKmtCreateQueue to initialize the new save area header with the
exception payload and event ID.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Sean Keely <sean.keely@amd.com>
Change-Id: Icd38062dc982cb29b30644699014eeb0b3e26d00
__fmm_release is sometimes called with the aperture lock, and sometimes
without. Consistently call it with the aperture lock held and remove the
lock/unlock calls from this function.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I80dddc64cc0703e5eed8e9f1eb65b75a2c7ae2eb
Unlock mutex if MMIO mapping fails. This happens on all GFXv8 GPUs.
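A minimal sketch of the error-path pattern this fix relies on, not the actual Thunk code: every exit from the locked region goes through a single unlock label. The names map_lock, try_map_mmio and init_mmio are hypothetical stand-ins.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical lock protecting the mapping state. */
static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for the MMIO mapping step; returns NULL when unsupported. */
static void *try_map_mmio(int supported)
{
    static char fake_mmio_page[4096];
    return supported ? fake_mmio_page : NULL;
}

/* Returns 0 on success, -1 on failure. The mutex is released on both paths. */
static int init_mmio(int supported, void **out)
{
    int ret = 0;

    assert(out != NULL);
    pthread_mutex_lock(&map_lock);

    *out = try_map_mmio(supported);
    if (*out == NULL) {
        ret = -1;
        goto unlock; /* do not return with the lock still held */
    }

    /* Further setup that needs the lock would go here. */

unlock:
    pthread_mutex_unlock(&map_lock);
    return ret;
}
```

With a default mutex, a leaked lock would make the next call to init_mmio block forever, which is why the failure path must unlock too.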
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I1dee1cbddefd9185c24ea79377f49f8ae2c5ff57
If the devices aren't peer-accessible, we shouldn't try to run a test
that requires that the devices be peer-accessible. Thus, add a check in
MapVramToGPUNodesTest to check for peer accessibility before executing
the peer mappings.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ib79b141f8c1ac6d85f5ab49d62af62ec10b988b7
Test the race condition where multiple Thunk threads register and
deregister the same userptr, emulating an application that registers
the same userptr to multiple GPUs from multiple threads.
Use a thread barrier to synchronize the threads so that they start
registering the userptr at the same time.
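The barrier-based start can be sketched roughly as follows, assuming POSIX threads. This is not the test's actual code: register_worker, run_barrier_test and the counter are hypothetical, and the counter increment stands in for the real register/deregister calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NUM_THREADS 4

static pthread_barrier_t start_barrier;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int completed_registrations;

static void *register_worker(void *arg)
{
    int rc;

    (void)arg;
    /* Block until all NUM_THREADS workers arrive, then release together. */
    rc = pthread_barrier_wait(&start_barrier);
    assert(rc == 0 || rc == PTHREAD_BARRIER_SERIAL_THREAD);
    (void)rc;

    /* Stand-in for registering and deregistering the same userptr. */
    pthread_mutex_lock(&count_lock);
    completed_registrations++;
    pthread_mutex_unlock(&count_lock);
    return NULL;
}

/* Returns the number of workers that ran, or -1 if the barrier setup fails. */
static int run_barrier_test(void)
{
    pthread_t threads[NUM_THREADS];
    int i;

    completed_registrations = 0;
    if (pthread_barrier_init(&start_barrier, NULL, NUM_THREADS) != 0)
        return -1;

    for (i = 0; i < NUM_THREADS; i++) {
        /* A missing thread would leave the others stuck at the barrier. */
        if (pthread_create(&threads[i], NULL, register_worker, NULL) != 0)
            abort();
    }
    for (i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    pthread_barrier_destroy(&start_barrier);
    return completed_registrations;
}
```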
Change-Id: I6723dc39f75908026fa14a490e39e1fe49a13a1b
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Limit the test buffer size to 3/4 of the total VRAM size, with a maximum of 1GB.
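For illustration only, the cap could be computed like this. The helper name capped_test_buffer_size is hypothetical, and 1GB is taken to mean 2^30 bytes, which is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed upper bound: 1GB interpreted as 2^30 bytes. */
#define MAX_TEST_BUFFER_BYTES (1ULL << 30)

/* Hypothetical helper: 3/4 of VRAM, but never more than the upper bound. */
static uint64_t capped_test_buffer_size(uint64_t vram_bytes)
{
    /* Divide before multiplying so huge VRAM sizes cannot overflow. */
    uint64_t limit = vram_bytes / 4 * 3;

    assert(limit <= vram_bytes);
    return limit < MAX_TEST_BUFFER_BYTES ? limit : MAX_TEST_BUFFER_BYTES;
}
```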
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Change-Id: I937e10b0a6bd8215e3865b50f22ce75b3982a6f7
Aperture locking is too fine-grained: there is a race between finding a
userptr and allocating a userptr object.
Change _fmm_allocate_device and fmm_allocate_memory_object to not take
the aperture lock; the callers take it instead. This implements an atomic
find-or-allocate for userptr objects.
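A simplified sketch of the find-or-allocate idea, assuming a single lock and a linked list. It is not the fmm implementation: struct userptr_obj, find_locked, allocate_locked and register_userptr are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct userptr_obj {
    void *addr;
    int refcount;
    struct userptr_obj *next;
};

static pthread_mutex_t aperture_lock = PTHREAD_MUTEX_INITIALIZER;
static struct userptr_obj *objects;

/* Caller must hold aperture_lock. */
static struct userptr_obj *find_locked(void *addr)
{
    struct userptr_obj *obj;

    for (obj = objects; obj != NULL; obj = obj->next)
        if (obj->addr == addr)
            return obj;
    return NULL;
}

/* Caller must hold aperture_lock. Returns NULL if out of memory. */
static struct userptr_obj *allocate_locked(void *addr)
{
    struct userptr_obj *obj = malloc(sizeof(*obj));

    if (obj == NULL)
        return NULL;
    obj->addr = addr;
    obj->refcount = 1;
    obj->next = objects;
    objects = obj;
    return obj;
}

/* Find and allocate happen under one lock hold, so two threads
 * registering the same address cannot both miss and both allocate. */
static struct userptr_obj *register_userptr(void *addr)
{
    struct userptr_obj *obj;

    assert(addr != NULL);
    pthread_mutex_lock(&aperture_lock);
    obj = find_locked(addr);
    if (obj != NULL)
        obj->refcount++;
    else
        obj = allocate_locked(addr);
    pthread_mutex_unlock(&aperture_lock);
    return obj;
}
```

If the helpers took and dropped the lock themselves, another thread could insert the same address between the lookup and the allocation.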
Change-Id: I6773404e22c1f4382a211c5a9817df23c5534a2a
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Certain special signals do not carry their updates via their signal
value. These signals are wrappers around special KFD events, of
which the only current instance informs about VM faults. We either
need to check each signal for this special event type or rely on
the checking done in hsa_amd_signal_wait_any. Since there will always
be a small number of these signals, it doesn't make much sense to
penalize the performance path with this check. Additionally, we know
that the signal indicated by hsa_amd_signal_wait_any is satisfied, so
we don't need to recheck its conditions.
Change-Id: I9fc6298300ad543d823ecd28ca8fab4ad26c23ef
Clang now warns about set but unused variables. It also now
recognizes -Wno-error=unused-but-set-variable so this patch moves
that option back to the general options list.
Change-Id: Id800e87eb688b9441b14380e2246ad586179f31a
This is causing PSDB/OSDB failures, so disable it until the
investigation is done.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I666cd45fdf8ae585486adc7cf43eacd1700704bb
Allows determining if the host can directly access HMM memory that
is physically resident in vram.
Change-Id: Ie452eedd0e27fe1b511afd416f5a1cd01b3d84e8
Test the ACCESS_IN_PLACE GPU mapping update to system memory.
Change-Id: I5b990215f39692e829128d848125e1ae0d571e03
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
The CoherentHostAccess flag member moved from the HSA_MEMORYPROPERTY
struct to the HSA_CAPABILITY struct. It is now reported to the
topology as a capability of the device instead of a device
memory property.
Change-Id: I48e43e4b4a0635b711b62933734587facdfbf88b
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Enables the fragment allocator to handle >2MB allocations, maintaining
good TLB alignment. Prior code contained a bug that caused the effective
API granule for vram allocations >2MB to be bumped to 2MB.
Also adjusts the block cache's block retention heuristic to not
count discarded blocks as in use. This will reduce block retention
when a significant amount of large blocks or IPC is in use.
Change-Id: I30bd85eb87951df822211f799d9cfe579ab109c6
Move the blacklisted test case from the gfx902 iommuv2 path to the dgpu path.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: I8b101226ca8dcd0c12c484f5f6ce12fe73a75bdc
(cherry picked from commit 9cf4377572321396225950b9a58beb549120c2a3)
Under high async handler load, signal retention and event sorting
become bottlenecks. This change processes more handlers in a
single pass to amortize wait_any overheads.
Change-Id: I8b276e102db647e3858e120547aa0c6fca85ab4c
This optimizes memory allocation latency by changing the
alignment from 2MB to 1GB.
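As a generic illustration of what the two alignment values mean for a size or address, here is a power-of-two align-up helper. The macro and function names are hypothetical, and the sketch does not claim to show where the Thunk applies the alignment.

```c
#include <assert.h>
#include <stdint.h>

#define ALIGN_2MB (2ULL << 20)
#define ALIGN_1GB (1ULL << 30)

/* Round value up to the next multiple of a power-of-two alignment. */
static uint64_t align_up(uint64_t value, uint64_t alignment)
{
    /* The mask trick below is only valid for powers of two. */
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

A 3MB request rounds up to 4MB with ALIGN_2MB and to 1GB with ALIGN_1GB, so the larger alignment trades address space for fewer, larger aligned regions.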
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Change-Id: I7818e9f13b17e2c0992e75b17f978dc03a018a57
Device cgroup can limit accessible devices. Handle the cases where
p2p_links are not accessible.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I513dc75ad14e4f2d426cf2fbd301bcba12b4ee54
Blacklist some SVM-related test cases until they are resolved.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: I05e2d965d89bcbf3d43bed2873297e98ad0738ef
Skip LocalMemoryTest on the non-dgpu path, because local memory is not
supported there.
Change-Id: Iedb6f6deba55e239b21747d933cf2d7005623106
Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
The updated sp3 compiler temporarily does not support GFX10.
Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Change-Id: Idd9336663814b7925d9742eee0bd310d00945d3e
Fixes an assembler error. The SP3 backend is already set to FamilyId.
Change-Id: I7721a555b05688b16993a03242a765694594825a
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Increasing the timeout will avoid some test failures. This shouldn't
mask any issues as any incomplete shaders should still hang and would
just time out at 180 sec instead of 120 sec.
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: If4e893ab80d9d159bd0b8b112aa7574abc5e4f44
The old memory properties info name was still used after removing branches.
This caused the CPU coarse grain pool to initialize with random
bits.
Change-Id: I397bc5ecf09fab69bdf1d7fafadcf54d71b64070
Prevents poorly written tools which throw in tools interface
callbacks from causing ROCr to catch and return a generic error
code.
Change-Id: I2f5bf7104dc7d4ee688eb48423c7ffdb06bd7702
amdgpu_cs_submit can fail intermittently if another process has too much
memory reserved at the time. Allow a small percentage of command
submissions to fail to make the test more robust.
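One way such a tolerance check might look is sketched below. The 5 percent threshold is an assumption, since the message only says "a small percentage", and failures_within_tolerance is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed threshold; the actual value used by the test is not stated. */
#define MAX_FAILURE_PERCENT 5

/* True if failed/total does not exceed MAX_FAILURE_PERCENT. */
static bool failures_within_tolerance(unsigned failed, unsigned total)
{
    assert(failed <= total);
    if (total == 0)
        return true;
    /* Integer form of failed / total <= MAX_FAILURE_PERCENT / 100. */
    return (unsigned long long)failed * 100 <=
           (unsigned long long)total * MAX_FAILURE_PERCENT;
}
```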
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: If9f62b2b6f67be71420016d4e38d4dd6b6bca9a5
Delayed page faults from a terminated process can be attributed to the
next process with the same PASID. Work around that by adding a delay
after the Exception tests to allow the kernel to clean up any fault
storms before the next test.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: Id310c13ea9eb92b04d37b95d91a0dd60bd9954e5
If the signal arrives too late, it interrupts waitpid. Handle this
situation gracefully.
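A common way to tolerate an interrupted waitpid is to retry on EINTR, sketched here. This may differ from what the test actually does, and waitpid_retry is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Wait for the child, restarting the call if a signal interrupts it. */
static pid_t waitpid_retry(pid_t pid, int *status)
{
    pid_t ret;

    assert(status != NULL);
    do {
        ret = waitpid(pid, status, 0);
    } while (ret == -1 && errno == EINTR);
    return ret;
}
```

Any error other than EINTR is still returned to the caller as -1 with errno set.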
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: If4925c352c81ba7fef8a940460b91f5e720b451e