Commit Graph

879 Commits

Author SHA1 Message Date
Kent Russell ed62c7aa1c libhsakmt: Add gfx1032 DID
0x73E3 DID was missing, add it.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Id1ae2f268e0e8b5cfec5ae2065153fe73854b93a
2021-07-22 07:37:19 -04:00
Philip Yang dee9c023a2 libhsakmt: update to KFD ioctl version 1.6
sync with KFD ioctl version 1.6:

1.6 - Query clear flags in SVM get_attr API

Change import export handle args pad field to flags, to pass memory
alloc flags from alloc process to import process.

Change-Id: I69360b244651947e885c4a8da9f64a1163101d20
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2021-07-21 21:48:18 -04:00
Icarus Sparry b63dde24d0 Add dependency on rocm-core
Signed-off-by: Icarus Sparry <icarus.sparry@amd.com>
Change-Id: I5f99114e9186679585862f05db8a508663b74b0d
2021-07-21 15:57:54 -04:00
Kent Russell 4f3440a8ac Fix drm.h include path
kernel-headers provides the drm/drm.h path, while libdrm-dev[el]
provides the libdrm/drm.h path, which is what we want to use. Fix the
path so we use the newer drm.h header, as well as fixing SLES, which
doesn't provide drm.h in their kernel-headers.

Change-Id: Icb2b6643698d356169e3baeef17527a1b4e05483
2021-07-20 12:49:15 -04:00
Jonathan Kim 303c0748ce libhsakmt: add drm.h header dependency for sles
Update to thunk API introduced dependency on drm.h in commit
31ac82617c libhsakmt: update thunk api for exception handling
so update dependency list in SLES builds.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I6d987fac07612e3eca7b6087205d76df50dc13d9
2021-07-19 12:48:13 -04:00
Jonathan Kim 1ce548829b libhsakmt: add runtime enable and disable calls
Add hsaKmtRuntimeEnable and disable.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Change-Id: I083f9293948e975546a1b3c1334cb41499b9ab1f
2021-07-16 18:37:41 -04:00
Jonathan Kim 31ac82617c libhsakmt: update thunk api for exception handling
The debugger and debug agent no longer use the Thunk API.
Remove all deprecated functions and keep commented
references for future KFD tests.

Update and keep the version checks for future use
and hsaKmtRuntimeEnable/Disable.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Laurent Morichetti <laurent.morichetti@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Change-Id: Ia2f10d82f5ac36d0bd1bda233810f26e8a154d55
2021-07-16 18:36:18 -04:00
Jonathan Kim 96c7a5c9dc libhsakmt: update create queue for exception handling
Update hsaKmtCreateQueue to initialize the new save area header with the
exception payload and event ID.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Sean Keely <sean.keely@amd.com>
Change-Id: Icd38062dc982cb29b30644699014eeb0b3e26d00
2021-07-16 18:34:35 -04:00
Felix Kuehling 5fac7dcc3b libhsakmt: Fix deadlocks in __fmm_release
__fmm_release is sometimes called with the aperture lock, and sometimes
without. Consistently call it with the aperture lock held and remove the
lock/unlock calls from this function.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I80dddc64cc0703e5eed8e9f1eb65b75a2c7ae2eb
2021-07-12 18:27:55 -04:00
Felix Kuehling 19536080a8 libhsakmt: Fix deadlock in map_mmio
Unlock mutex if MMIO mapping fails. This happens on all GFXv8 GPUs.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I1dee1cbddefd9185c24ea79377f49f8ae2c5ff57
2021-07-09 17:07:42 -04:00
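The map_mmio fix above (19536080a8) follows a common pattern: every exit from a locked region, including the failure path, must release the mutex. The following is a minimal sketch of that pattern, not the libhsakmt code; `demo_lock`, `demo_map` and `demo_map_locked` are hypothetical names invented for illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical lock standing in for the mutex mentioned in the commit. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical mapping step that can fail (returns NULL on failure). */
static void *demo_map(int should_fail)
{
    static int region;

    return should_fail ? NULL : &region;
}

/* Map under the lock and release the lock on every path. */
static void *demo_map_locked(int should_fail)
{
    void *ret;

    pthread_mutex_lock(&demo_lock);

    ret = demo_map(should_fail);
    if (!ret)
        goto out_unlock;    /* failure path: must not return with the lock held */

    /* ... bookkeeping that needs the lock would go here ... */

out_unlock:
    pthread_mutex_unlock(&demo_lock);
    return ret;
}
```

If the failure path returned without unlocking, the next call to `demo_map_locked` would block forever on `demo_lock`, which is the kind of deadlock the commit describes.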
Kent Russell bdfe3a12a8 kfdtest: Ensure devices are peer-accessible for peer mapping
If the devices aren't peer-accessible, we shouldn't try to run a test
that requires that the devices be peer-accessible. Thus, add a check in
MapVramToGPUNodesTest to check for peer accessibility before executing
the peer mappings.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: Ib79b141f8c1ac6d85f5ab49d62af62ec10b988b7
2021-07-09 15:45:01 -04:00
Philip Yang 92076f6f1b kfdtest: add KFDMemoryTest MultiThreadRegisterUserptrTest
Test the race condition where multiple Thunk threads register and
deregister the same userptr, to emulate an application registering the
same userptr to multiple GPUs using multiple threads.

Use a thread barrier to sync the threads so that they start registering
the userptr at the same time.

Change-Id: I6723dc39f75908026fa14a490e39e1fe49a13a1b
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2021-07-07 17:52:31 -04:00
Aaron Liu ef9c532187 kfdtest: add yellow_carp blacklist
Signed-off-by: Chen Gong <curry.gong@amd.com>
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Change-Id: Ib3a3172b0ac40109acbe42b9dc92517b3fedc84c
2021-07-07 09:47:05 +08:00
Aaron Liu a55551309c libhsakmt: add yellow carp support
This patch is to add yellow carp support on thunk.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Change-Id: Icfecc3fd1f472c9924f934c6a5352448356d83df
2021-07-06 21:46:28 -04:00
Aaron Liu fd131e875e kfdtest: MigrateLargeBufTest support APU
Limit test buffer size to 3/4 total VRAM size, and max 1GB.

Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Change-Id: I937e10b0a6bd8215e3865b50f22ce75b3982a6f7
2021-07-06 21:44:23 -04:00
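The size limit stated in the commit above (fd131e875e), 3/4 of total VRAM capped at 1GB, amounts to a small calculation. The sketch below is an assumption about how such a limit could be computed, not the kfdtest code; `demo_migrate_buf_size` and `DEMO_MAX_BUF_SIZE` are hypothetical names.

```c
#include <stdint.h>

#define DEMO_MAX_BUF_SIZE (1ULL << 30)  /* 1GB cap */

/* Return 3/4 of the VRAM size in bytes, but never more than 1GB. */
static uint64_t demo_migrate_buf_size(uint64_t vram_size)
{
    /* Divide before multiplying so large sizes cannot overflow;
     * this rounds down slightly when vram_size is not a multiple of 4. */
    uint64_t size = vram_size / 4 * 3;

    return size < DEMO_MAX_BUF_SIZE ? size : DEMO_MAX_BUF_SIZE;
}
```

With 512MB of VRAM, as on a small APU carve-out, the 3/4 rule applies and the result is 384MB; with 8GB of VRAM the 1GB cap applies instead.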
Kent Russell b2fb2a3470 kfdtest.exclude: Add NV12 blacklist
Add a blacklist for gfx1xxx12, using the same list as gfx1012

Change-Id: I7e620dba8a36f6f89152a48066234884150a15dd
2021-07-06 11:58:53 -04:00
Sean Keely 408fca0278 Add error message to assertion.
Warn that HSA_FORCE_ASIC_TYPE may be needed if the engine major id
assertion fails.

Change-Id: I67e01e99c3d1bdc84630ccfae489dce5e77961b5
2021-06-28 23:18:43 -04:00
changzhu 1a9604ad57 kfdtest: skip KFDSVMRangeTest.MigrateAccessInPlaceTest for gfx902 and gfx90c
Change-Id: I671440c212a07fdfdb1c4245b4551c6344eaedc6
Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
2021-06-28 13:37:52 +08:00
Philip Yang c4d5ee28f0 libhsakmt: fix multiple threads register userptr race
Aperture locking is too fine-grained: there is a race between finding a
userptr and allocating a userptr object.

Change _fmm_allocate_device and fmm_allocate_memory_object to not take
the aperture lock; the callers take it instead. This makes "find the
userptr or allocate a new one" a single atomic operation.

Change-Id: I6773404e22c1f4382a211c5a9817df23c5534a2a
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2021-06-25 14:16:20 -04:00
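The locking change in the commit above (c4d5ee28f0) can be illustrated with a generic find-or-allocate routine in which the caller holds one lock across both steps. This is a minimal sketch under assumptions, not the libhsakmt implementation: the list, the reference count and every `demo_*` name are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

struct demo_obj {
    void *addr;
    int refcount;
    struct demo_obj *next;
};

/* Hypothetical stand-in for the aperture lock and its object list. */
static pthread_mutex_t demo_aperture_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_obj *demo_objs;

/* Caller must hold demo_aperture_lock. */
static struct demo_obj *demo_find(void *addr)
{
    struct demo_obj *obj;

    for (obj = demo_objs; obj; obj = obj->next)
        if (obj->addr == addr)
            return obj;
    return NULL;
}

/* Caller must hold demo_aperture_lock. */
static struct demo_obj *demo_alloc(void *addr)
{
    struct demo_obj *obj = calloc(1, sizeof(*obj));

    if (!obj)
        return NULL;
    obj->addr = addr;
    obj->refcount = 1;
    obj->next = demo_objs;
    demo_objs = obj;
    return obj;
}

/* Find and allocate happen under one lock hold, so two threads that
 * register the same address cannot both miss and both allocate. */
static struct demo_obj *demo_register(void *addr)
{
    struct demo_obj *obj;

    pthread_mutex_lock(&demo_aperture_lock);
    obj = demo_find(addr);
    if (obj)
        obj->refcount++;
    else
        obj = demo_alloc(addr);
    pthread_mutex_unlock(&demo_aperture_lock);

    return obj;
}
```

Had `demo_find` and `demo_alloc` each taken and dropped the lock themselves, a second thread could run between the two calls and create a duplicate object for the same address.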
Kent Russell 5796225011 kfdtest: Remove EvictTest.BasicTest from gfx906
This is causing PSDB/OSDB failures so disable it until investigation is
done

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I666cd45fdf8ae585486adc7cf43eacd1700704bb
2021-06-17 17:07:22 -04:00
Philip Yang 351a41ac76 kfdtest: add KFDSVMRangeTest MigrateAccessInPlaceTest
To test ACCESS_IN_PLACE GPU mapping update to system memory.

Change-Id: I5b990215f39692e829128d848125e1ae0d571e03
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2021-06-14 11:03:09 -04:00
Alex Sierra f85b428265 libhsakmt: move CoherentHostAccess prop to HSA_CAPABILITY
CoherentHostAccess flag member moved from HSA_MEMORYPROPERTY
to HSA_CAPABILITY struct. Now this is reported to the
topology as a capability of the device instead of a device
memory property.

Change-Id: I48e43e4b4a0635b711b62933734587facdfbf88b
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
2021-06-10 22:21:17 -05:00
Yifan Zhang c24ed10dfa libhsakmt: add colon after KFDQMTest.SdmaConcurrentCopies
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: Ief14e513e4b09db0607f5533a55f80d3b0be017e
2021-06-07 18:21:59 +08:00
Yifan Zhang e72be0e54d kfdtest: Temporarily blacklist some svm related test cases for gfx902.
move blacklisted test case from gfx902 iommuv2 to dgpu path.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: I8b101226ca8dcd0c12c484f5f6ce12fe73a75bdc
(cherry picked from commit 9cf4377572321396225950b9a58beb549120c2a3)
2021-06-06 23:07:29 -04:00
Alex Sierra 973b35bc06 libhsakmt: change memory allocation alignment
Change the alignment from 2MB to 1GB to optimize memory allocation
latency.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Change-Id: I7818e9f13b17e2c0992e75b17f978dc03a018a57
2021-06-01 11:33:16 -04:00
Harish Kasiviswanathan e28b3fe8b3 libhsakmt: Handle unaccessible p2p_links
Device cgroup can limit accessible devices. Handle the cases where
p2p_links are not accessible.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I513dc75ad14e4f2d426cf2fbd301bcba12b4ee54
2021-05-25 12:01:44 -04:00
Yifan Zhang 9e0fc7f3c6 kfdtest: Temporarily blacklist some svm related test cases.
blacklist some svm related test cases until they are solved.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: I05e2d965d89bcbf3d43bed2873297e98ad0738ef
2021-05-24 22:06:53 -04:00
changzhu 55cb03dbae kfdtest: skip KFDLocalMemoryTest.AccessLocalMem if not on dgpu path
Skip LocalMemoryTest when not on the dgpu path, because local memory is
not supported there.

Change-Id: Iedb6f6deba55e239b21747d933cf2d7005623106
Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
2021-05-19 11:33:08 -04:00
Chengming Gui b8ef20e35c kfdtest: Temp disable all shader test related cases due to sp3 compiler update
The updated sp3 compiler does not support GFX10 temporarily.

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Change-Id: Idd9336663814b7925d9742eee0bd310d00945d3e
2021-05-18 02:04:55 -04:00
Chengming Gui f28dbdf7bf kfdtest: Add Beige_Goby support
Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Change-Id: I3c9d4f8af1dbb4fd7ce7ff238426a4af61fd771f
2021-05-18 02:04:25 -04:00
Chengming Gui ce995fe48d libhsakmt: Prepare Beige_goby support
PCI IDs are yet to be added.

Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Change-Id: Ia0cbda17469b13fca807ce4eb74deae6f0d1eeac
2021-05-18 02:04:06 -04:00
Philip Yang 86a68b2774 kfdtest: Remove KFDSVMEvictTest.QueueTest GFX9 assembler meta
Fixes assembler error. The SP3 backend is already set to FamilyId.

Change-Id: I7721a555b05688b16993a03242a765694594825a
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2021-05-14 10:44:30 -04:00
Kent Russell 9168dfe041 kfdtest: Increase timeout in EvictTest
Increasing the timeout will avoid some test failures. This shouldn't
mask any issues as any incomplete shaders should still hang and would
just time out at 180 sec instead of 120 sec.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: If4e893ab80d9d159bd0b8b112aa7574abc5e4f44
2021-05-12 14:06:03 -04:00
Mike Li 47ccc6604d Add Size of VGPR and SGPR to HsaNodeProperties
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Change-Id: I7e6c0c5b9fd90c0bb5f3b7d35362a073afdcf9b8
2021-05-03 15:16:15 -04:00
Felix Kuehling 8baf02e80b kfdtest: Allow some CS to fail in EvictTest
amdgpu_cs_submit can fail intermittently if another process has too much
memory reserved at the time. Allow a small percentage of command
submissions to fail to make the test more robust.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: If9f62b2b6f67be71420016d4e38d4dd6b6bca9a5
2021-05-03 11:01:35 -04:00
Felix Kuehling bd68646772 kfdtest: Workaround delayed page faults
Delayed page faults from a terminated process can be attributed to the
next process with the same PASID. Work around that by adding a delay
after the Exception tests to allow the kernel to clean up any fault
storms before the next test.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: Id310c13ea9eb92b04d37b95d91a0dd60bd9954e5
2021-05-03 11:01:24 -04:00
Felix Kuehling 25288e07dc kfdtest: Handle EINTR in waitpid
If the signal arrives too late, it interrupts waitpid. Handle this
situation gracefully.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: If4925c352c81ba7fef8a940460b91f5e720b451e
2021-05-03 11:01:11 -04:00
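Handling EINTR in waitpid, as in the commit above (25288e07dc), is usually done by retrying the call when a signal interrupts it. The sketch below shows that standard POSIX idiom; it is not the kfdtest code, and `demo_waitpid_retry` and `demo_run_child` are hypothetical helpers.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Call waitpid again whenever a signal interrupts it (errno == EINTR). */
static pid_t demo_waitpid_retry(pid_t pid, int *status, int options)
{
    pid_t ret;

    do {
        ret = waitpid(pid, status, options);
    } while (ret == -1 && errno == EINTR);

    return ret;
}

/* Fork a child that exits with `code` and reap it with the retry loop.
 * Returns the child's exit code, or -1 on failure. */
static int demo_run_child(int code)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);    /* child: exit immediately */

    if (demo_waitpid_retry(pid, &status, 0) != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Without the loop, a signal delivered while the parent is blocked makes waitpid return -1 and the child's exit status is never collected by that call.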
Felix Kuehling d8d8e3ddd6 libhsakmt: Add a new device ID for gfx90a
It is gfx90a VF device ID, for virtualization support.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I9e51d6b58c702d185e6758a9c511e9b8bc72c2f5
2021-04-30 13:42:27 -04:00
Alex Sierra 0a2d7d8319 kfdtest: SetGetAttributes default access attr returned based on xnack
After unregistered memory is added, now default access attribute
is returned based on xnack configuration.

Change-Id: I8ef44fe1e165ba009622e8112436c1f7a683f6cb
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
2021-04-27 14:18:15 -04:00
Harish Kasiviswanathan 9b95185a61 libhsakmt: Add DIDs for gfx1032
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: I61e938db3763bc2cdb4e0ea74f9aaae810b5d27b
2021-04-27 09:43:32 -04:00
Eric Huang a6703395f6 kfdtest: remove scc bit for cache coherence tests
It is to address gfx90a HW memory model changes.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Change-Id: Ie5c5c5ee5ddfb75c0b4f625baf59ce37b4cc7c31
2021-04-26 19:55:49 -04:00
Philip Yang 7d53e94750 kfdtest: skip KFDSVMEvictTest.QueueTest on gfx10
KFDSVMEvictTest.QueueTest shader asm code needs an update to support
gfx10 and gfx9; skip the test to unblock CI testing.

Change-Id: Id2842127cf5fc98a652afa82035a4b3603bf5c33
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2021-04-26 10:34:03 -04:00
Harish Kasiviswanathan e06d549337 kfdtest: Remove GFX9 assembler meta information
Fixes assembler error. The SP3 backend is already set to FamilyId.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: If127a71693b293e2748b06efb668a359b939cd14
2021-04-21 14:27:16 -04:00
Joseph Greathouse c1c46d9c97 Update GWS tests for gfx1030
gfx10 GPUs such as gfx1030 need new assembly code to test
the GWS. Removed scalar stores and added proper usage of DLC and
VSCNT waits. Removed gfx9-specific assembler meta-values.

Change-Id: I2bbdb77692ace2dba10997f721ba9decaa9be82a
2021-04-19 10:21:21 -04:00
Mike Li 77f1bfa277 Add cache information for GPU
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Change-Id: I93606e676ae944fa3d72886654566c75ab8f9806
2021-04-19 09:55:30 -04:00
Felix Kuehling e8990cf830 kfdtest: add SVM tests
KFD changes are ready, all SVM tests should pass now. Skip SVM tests if
the SVM API is not supported.

Change-Id: I5e358565a0458eea45eae0aaf4969ce3a36574a7
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2021-04-16 00:12:48 -04:00
Philip Yang e8f369b385 libhsakmt: dynamic HMM and xnack detection
New property SVMAPISupported added to Thunk spec HSA_CAPABILITY, read
from sysfs in the KFD topology.

New local memory property flag CoherentHostAccess added to Thunk
HSA_MEMORYPROPERTY, read from sysfs from KFD topology.

Change-Id: I83933f0e5a61508508168873209dba4af0b77295
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2021-04-16 00:10:56 -04:00
Felix Kuehling bb441d0bdd libhsakmt: add XNACK API set/get mode
Add an XNACK API for GPUs that support this mode. The API calls into
the amdgpu driver to configure the xnack mode. It supports setting the
xnack mode and querying the mode currently in use.

Change-Id: If865fd0e3f900f008243dc49504e1a0694e1791a
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
2021-04-16 00:10:41 -04:00
Felix Kuehling dd72f236c1 libhsakmt: add SVM thunk implementation
Implement SVM (Shared Virtual Memory) in the thunk.

Change-Id: I0380150d1d3da48070f9389a06f416d6059d6948
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Sean Keely <Sean.Keely@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
2021-04-16 00:10:25 -04:00
Felix Kuehling c44a4be776 libhsakmt: add API to support svm and xnack
Add function definitions to support SVM (shared virtual memory)
and xnack set.

Change-Id: Ia97ad9d0c449d8d500d799f702e1a58e87d65a56
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2021-04-16 00:09:49 -04:00