- Use new buffer resource descriptor layout
- Handle wave32 scratch allocation error from CP
- Make wavefront size a property of scratch allocation requests
- Repurpose wave64-specific amd_queue_t.scratch_workitem_byte_size field
- Clear index_stride field in V# on gfx10, calculated per-dispatch by CP
Change-Id: If2acdf6430772abd4d6a8c792fc8c11260764dda
Docker seccomp by default blocks mbind system call, so mbind return
failed on docker. thunk should not fail this otherwise application
cannot allocate system memory on docker.
Use pr_warn_once and pr_err_once to avoid duplicate same error messages
Change-Id: I61a7c0e4abaa3dcfe7abf2ea48db90f669f9638a
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
This will emilinate the need of updating the run_kfdtest.sh every time
a new platform is added.
Change-Id: I584d65b462de36a685fa2d29d43962078ba511dc
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
doorbell_queue_map should always be allocated or we will need to
add branches around all accesses.
Change-Id: I994c0eaf4be62c1a4a37bd06894272dba1fc1da6
Force all the GPUs to a certain type, use the below command:
HSA_FORCE_ASIC_TYPE="10.1.0 1 gfx1010 14"
meaning major.minor.step dgpu asic_name asic_id
This will faciliate the cooperation across the teams for bringing up
ASICs which reuse existing device IDs.
Change-Id: I40fe4c9b46d3ccb3e38ea52250e80e82fb50fb0f
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Those tests are actually did not function up to its expectation because
some underlying functions such as suspend/resume and disable/enable KFD
were not implemented. Those interfaces would never be implemented, so
delete them.
Change-Id: Ib5872ba2f35e307221e43791cda1782c6b6bb4d1
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
A number of tests are no longer broken on gfx802.
Change-Id: If70c77423f8f14de59490ab8ca156b0c4e7b5cf1
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
The memory size alignment workaround for a TLB bug on gfx802 was
breaking userptrs because it would attempt to get_user_pages beyond
the end of a VMA. Refine this workaround based on our understanding
of the HW bug. It only affects L2 cacheline allocation, which is
decided by the last page in the cache line (8 entries = 32KB of
address space). Thus aligning memory allocation so that the last
page falls on the end of a 8 entry TLB cache line allows caching
to work correctly.
Imported images require specific alignments. If their size is not
naturally aligned with 8 cache lines, it may have bad TLB cache
performance.
This patch will only have the desired effect if redundant size
padding in KFD is also removed.
Change-Id: I984cbe7fa61fec04d70fa387aaf9aab370eabeb9
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
sdma end ts must be 256 bit aligned in oss 3.0 and prior. Using
the ts pool requires copying into the signal and is a significant
performance penalty for small copies.
SharedSignal is 128 bytes due to alignment so can host the end ts.
Move sdma end ts into SharedSignal and remove ts pool and ts copy.
Change-Id: I7899bda36ebc9adcaad1d3a3d2b7a489857cc9e8
hsakmt-dev should not install include/linux/* (currently just kfd_ioctl.h)
as those are linux kernel headers provided by the linux kernel header
packages (`linux-headers-*` on Debian/Ubuntu or `kernel-headers-*` on
Red Hat / Fedora)
Change-Id: Ib6e62ca2f3582c5ad7351225f5827081bf8e05c0
Signed-off-by: Craig Andrews <candrews@integralblue.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
pkgconfig files should be installed to /usr/share/pkgconfig/, not /usr/libhsakmt/
Change-Id: Ifd08f612addb375de1d00282ee9e7c257528bf74
Signed-off-by: Craig Andrews <candrews@integralblue.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
LICENSE.md should be installed to the DOCDIR, not /usr/libhsakmt
Change-Id: I2020547b3174b9d91c1f800d9db2d73f627a6ce3
Signed-off-by: Craig Andrews <candrews@integralblue.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Instead of the installed hsakmt header files, use the ones from the
source tree, since they are in the same git repository. This allows
using kfd_ioctl.h even when we don't install this file with an
upcoming change.
Change-Id: I9a30abd5445806d2141bdb1ccd88d3794a74ed20
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Made changes in the CMakeLists.txt
1) Detects the OS of the system
2) Accordingly setsup the runtime dependencies
Change-Id: Ief9a0217caae77d4be4850167e2a9b8387f639e4
Signed-off-by: Lad, Aditya <aditya.lad@amd.com>
Impacts GPU_ONLY signal type latency when waiting for small operations.
Using this type improves total SDMA small copy performance by ~40% if
the signal is allowed to spin freely.
Change-Id: I27aa128c63a1bacb3f51fb08f166e4e1d6fef651
Remove agent lookup in time stamp translation for IPC signals. The copy
agent handle is not shared so does not need to be checked for cross
process use. Cross process copy-timestamp read is illegal and continues
to deliver garbage.
Store the copy agent properly when doing CPU-CPU copies.
Change-Id: Ib4008f66ff866922047749dd556c84a32021c1fd
ucode versions are per asic so not valid for feature enablement outside
of bringup/dev. Feature is older than the latest ioctl change that
the thunk depends on so use of this patch with kernel packages that
don't contain the feature is not possible in a supported environment.
Change-Id: I36b14176a7d642017ef1518aeade454b0f3dc749
snprintf throws a warning from -Wformat-truncation where the string
could be truncated. We address this by referencing the maximum size that
can be returned from a file according to MAXNAMLEN . This should safely
guard us from truncating the path value.
Change-Id: If1d208990d8775e9494835b0deb890d2616fd15b
Signed-off-by: Kent Russell <kent.russell@amd.com>
Replace cpuid call with sysfs data to get CPU cache information. With this
change, x86 check is also removed since sysfs applies to other platforms.
CPU cache information can be retrieved from
/sys/devices/system/node/nodeX/cpuY/cache where Y is processor number
represented in /proc/cpuinfo at "processor" entry.
Change-Id: Ic47df6d5dafaf1aae5b46b1fdee42691c697e49e
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
PCI domain has moved to 32-bits to accommodate virtualization,
so a 32-bit integer is exposed for domain to reflect this change.
Change-Id: I0d767acadcdc8e4277db203b5865dd67dd001cef
Signed-off-by: Ori Messinger <ori.messinger@amd.com>
enable thunk query api to report if queue is newly created
test new queue bit test on clear events.
also fixup cleanup to disable debug trap.
Change-Id: I3ebe2d85da66f28b8c82f0e68461ee7d32ec0b0d
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Reviewed-by: Philip Cox <Philip.Cox@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Originally reserved 100 SGPRs per wave. Pre-gfx10 needs 102 SGPRs
and gfx10 needs 128 SGPRs. Reserve 128 SGPRs per wave for all ASICs
to simplify calculation.
Also double VGPR register size for gfx908 family
Change-Id: I98b741cbfa051f49ed37ff25d99f851f124be7b6
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
This saves us from maintaining device ID to Asic mapping in the scripts.
Moreover, stop using abbrevation asic names to avoid confusion.
Change-Id: I7ce583b26b09b627c142aae41932483b28c545d8
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
By using user-allocated pages instead of kernel-allocated pages
from TTM, we're not subject to TTM's self imposed limits on kernel
memory usage. This also paves the way for for more wide-spread use
of HMM on upstream kernels.
Change-Id: Iac82964c98a441e29b7f1986d1be1bb5ccb1e569
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
child process clone vm objects from svm->apertures if parent process
doesn't free memory before fork. fmm_clear_all_mem suppose to clear the
apertures in forked child process but this only works if gpu_vm is not
NULL. parent process call hsaKmtCloseKFD reset gpu_vm to NULL and then
fork, then child process will not clear svm->apertures.
As a result, the child process will allocate vm object with same address
and add to aperture, there are duplicate vm objects with same address
in aperture. Then mapping to GPU will find the wrong vm object and
create incorrect GPU mapping cause rocrtst IPC test VM fault. The issue
happened with HSA_USERPTR_FOR_PAGED_MEM=1.
The fix is to clear vm objects in all apertures in clear_after_fork.
Change-Id: I92e42a967075a634a3f475b915c8242d82077ecb
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
This reverts commit 632ad3a749.
This change causes KFDTest failed on gfx803. The first hsaKmtCreateEvent
call allocate system memory for events_page because global variable
events_page is NULL. And this events page vm address should not be freed
until the process exit.
The change to destrory objects in hsaKmtCloseKFD removes events page. As
a result, KFDTest call hsaKmtOpenKFD again and then allocate memory will
get same events_page vm address on gfx803, and map this vm failed because
the vm conflict with events_page mapping.
KFDTest passed on VG10, gfx906 because allocate memory get different vm
address. hsaKmtCreateEvent still works fine as the driver keeps the
events page mapping of the process.
We should only destroy objects in fork cloned child process regardless
if gpu_vm is NULL or not.
Change-Id: I174ef65321cbd6074c855c2021318fe961c8c72c
Signed-off-by: Philip Yang <Philip.Yang@amd.com>