For GC 9.4.0, modifications were made to various shaders since certain
flat_ instructions no longer support glc/slc modifiers (replaced with
nt/sc1/sc0). Instead of repeating conditionals inside various shader
bodies, we can make use of LLVM AMDGCN macros.
This patch modularizes the shader macros into seperated defines. Prior
to the core raw-string literal, each shader now starts with the
SHADER_START literal (".text\n") plus any number of SHADER_MACRO_*
literals. This allows us to seperate the macro definitions logically and
use the pre-processor to only include the required macro groups on a
per-shader basis.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I19eb3fd14252a0601bb7509249051b68e7fdb02a
Previously, KFDEvictTest.QueueTest and KFDSVMEvictTest.QueueTest
would create a variable number of wavefronts, one for each 64MB
of memory under test. This ran into limits on the buffers used
by the wavefronts, and may at some point have exceeded the
wavefront limit.
Restrict the number of wavefronts to 512, and adjust the shader
to accomodate a variable buffer size
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I2ec292e2900e2efa62a08313bca3d2f4bdabca8b
GC 9.4.3 to set gfx target version to 9.4.x dependent on revision and
capabilities. Due to this, where applicable, mask off the gfx target
stepping version and only check major/minor version (9.4). There are no
collisions due to this change since GC 9.4.3 is the only ASIC that uses
gfx target version 9.4.x.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I72803e594c421f054d18ccfa7e92c507128fa5be
KFDMemoryTest.DeviceHdpFlush requires device node 0 is large bar to
check VRAM content from CPU, run the test only if device 0 is large
bar GPU.
Change-Id: I874b153219550c50b724625e971e3ed3a84dc652
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Nodes with XGMI have no HDP, so DriverHDPFlush should skip.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: If5a87e660712e51d03e750d8e044786036b2e603
Even with the restriction to only compile on gfx90a, this
shader still fails CompileShaders test.
There don't seem to be any systems that actually use it.
Leave it in the shader store, but remove it otherwise
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I41bec6ba10363d42b163ac101c3a92edaad6d6df
A gfx940 code path was erroneously added to this shader.
It's unneccesary; without this path, the shader uses
the scalar store, which works just fine on gfx940 without changes.
Remove it.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I825cbbebbdb25c4a7c2f16e228c2bea6a6bcc30c
gfx940 changed the semantics of the glc and slc coherency options
on vector stores and loads. This means that shaders that use
those bits no longer compile on gfx940.
Add precompilation if statements to those shaders to use the
new coherency bits.
Also add gfx940 to ASMTest so that compilation is tested.
Note: One of the tests enabled by this patch on gfx940,
KFDEvictTest.QueueTest, does not pass on gfx940 emulators.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I942f9d2536e9eb5510c4d5af30df6ff1a95c8cf7
Use q->total_mem_alloc_size for munmap in SVM codepath of free_queue.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I2fecaa1ddb337b1fe71f9cbba45a0c9467eff0c0
Currently, on queue destroy, context save restore memory is freed
only for a single XCC. Instead, we need to free the entire context
save restore memory, which was allocated for all XCCs.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Change-Id: I51ebb12fa8d5ebed41979d68e74f7c5392dca062
Do not allocate the EOP buffer when not required.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I1664a3f0a882219a72278174006cdb8d46fd4f5e
Program ACCUM_OFFSET to match the number of VGPRS used
by the shader as part of Dispatch setup.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Change-Id: Icfa1fbe4de2a62f00743de567f3ed382d3378b17
In multi-partition modes, e.g. CPX, we want to create new file
descriptor despite using the same render node. Update
open_drm_render_device to use a gpu_id to fd map partitioned by render
node. Different gpu_id's requesting the same render node will be added
to that render node's map list for fetching its fd. Different gpu_id's
requesting different render nodes as well as the same gpu_id's
requesting the same render node will behave as they did previously.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Ie153d42355d4d75b1c6ba6ff40fac3295bc87009
Allocate debug area big enough for all XCCs in the partition. Also, fix
the cu_num calculations as driver now reports cu_num as the total number
of CUs in the partition.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Change-Id: I6e80d57196b770bb3c2506bc58cb366c0046084b
Add gfx version for VGPR size per CU calc, add FAMILY_AV to KfdFamilyId,
add blacklist filter to kfdtest.exclude.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I9b8072e45f4d497e0a8fd3f8f97f1425238e8b42
Instead of hard-coding lib64 and other include locations, just prepend
the DRM_DIR to the beginning of the CMake prefix path. Then let
pkgconfig find the package, the same way that it would if DRM_DIR wasn't
set. DRM_DIR takes precedence, but the default paths will be used if
DRM_DIR isn't set, or doesn't point to where libdrm is housed
Note that /lib and /lib/$ARCH aren't required for DRM_DIR, just the
path to the root folder for the package (e.g. /opt/amdgpu instead of
/opt/amdgpu/lib or /opt/amdgpu/lib64 or /opt/amdgpu/lib/x86_64-linux-gnu
etc)
Change-Id: I56767db28476d14e3fa77be1089c3904e2a32450
See description of previous revert.
This reverts commit 564913526a.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I969dc6469e62b50cd7ba0595918538602afa7516
This patch and the previous made it such that the queue ring buffer was
allocated as non-paged for GFX11+. The queue ring buffer should not be
mapped as non-paged; the non-paged requirement on GFX11 is only needed
for the queue wptr.
This patch was causing issues on various tests, such as intermittent
CP_INTSRC_BAD_OPCODE interrupts.
This reverts commit e40ae8481e.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I55b64aed73dc3b792f0756ae00daf6e10d93ce10
Test is inconsistent across ASICs. Add to blacklist to unblock QA.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I31e5aa2450165227107536bef8402db2c0dc6d7f
Get more debug information about user pointers that were registered
through SVM API, and triggered by memory exception events.
A new kfdtest with this use case was also included inside
KFDExceptionTest.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I0ef4929afe0625b9b5cbbbebef11ede66dda60ab
Register and map userptrs through Shared Virtual Memory(SVM) API at
the Kernel level when available. Using this approach, performance
will be improve as register/unregister memory will not trigger any
system call to KFD driver.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I3726b4b5e1c6a52a83786fbe0af6322eb29ae7c9
Using backward compatibility paths will provide an #error message. Compile time option added to enable/disable the #error message.
Disabling the same will provide a #warning message
Change-Id: Ibb84241ba35aefb7a8450d68231e52242a634ed3
The MemoryAllocAll test in kfdtests exercises the new KFD memory
availability API by trying to allocate a single buffer object that
exactly fills all of vram. Desired object size is determined using the
memory availility KFD ioctl via libhsakmt, then an object is allocated
slightly larger than that size. If the allocation attempt fails then
the test tries to allocate a slightly smaller object, and continues
trying with smaller sizes until the allocation succeeds. The test
succeeds if the successfully allocated object is within some specified
tolerance of the available memory reported.
There are a number of known issues that can cause the successfully
allocated object to be significantly smaller than reported availability.
Until these issues are addressed, we should not fail the test, but just
log the actual divergence between the size of the object we thought we
could allocate, and what was actually possible.
Signed-off-by: Daniel Phillips <daniel.phillips@amd.com>
Change-Id: I165a30865ffbb2353286dcc896ad8e24af124615
Since KFD counts svm allocation as system memory usage,
KFDSVMEvictTest will fail on the case of small system
memory, adding check is to skip test.
Signed-off-by: Eric Huang <jinhuieric.Huang@amd.com>
Change-Id: I040f16f2dd0d4092d069a632cfba9c28293f781b
Implement hsaKmtExportDMABufHandle, which can be used for a new
upstreamable RDMA solution. It exports a DMABuf handle for an arbitrary
virtual address along with the offset of the address within the
allocation. It also checks that the size of the intended export does
not exceed the allocation.
This uses the new AMDKFD_IOC_EXPORT_DMABUF, which requires KFD ioctl
API version 1.12.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: Ie5fdb1f73ab3c7fa36c315ce326b1fb89eacc8b6
Remove BLACKLIST_GFX10_NV2X from GFX11 blacklists, update
BLACKLIST_GFX11 as needed.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I84bd91ba20a5d3df27478fb4c97afa12f8a3e76a
The Shader Engines number should be shadder array_count divided by simd_arrays_per_engine
not array_count.
Signed-off-by: Xiaogang Chen<Xiaogang.Chen@amd.com>
Change-Id: I808d1fedd6b9843500719e902ecf759f5668a7d1
KFDTopologyTest.BasicTest duplicates Thunk logic to calculate VGPR size,
meaning it will always be the same, and SGPR size is a constant. Since
no benefit, remove comparisons.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I99e7ff6fb69ed07bc0716fdf43946b19c67b9268
Fixed VGPR memory size, size was too small for some GPU, causing a memory overflow.
Refactored macro code into a function.
Thanks to Jay Cornwall for locating the problem and proposing the fix.
Change-Id: Iffedea1c4f341967f02c56d810ff048225b02c16
Signed-off-by: David Belanger <david.belanger@amd.com>
This is a temporary work around for GPU hang issues observed on GFX11.
Change-Id: I98fbedbbd1c51fe402c2116b35ca548931a390c9
Signed-off-by: David Belanger <david.belanger@amd.com>
This reverts commit 7787a039bd.
It causes a regression in pytorch benchmark.
Change-Id: I96173dbd061cf38d6f451c02cb181ae51b7f625e
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
This reverts commit 178a619b80.
There are some openMP issues that were introduced after SVM userptr
feature was added.
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I7ef87c5232a3bcbe594c743fa4b4958601845ba5
This reverts commit 45fad29752.
There are some openMP issues that were introduced after SVM userptr
feature was added.
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I6566c9f0d39d05ecb92f38159880763f432939a5
This reverts commit 8a746bdaed.
There are some openMP issues that were introduced after SVM userptr
feature was added.
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: Ib01046571d2c84fa0fd228ecba0dee0eae3f994d
Track Test Status in syslog, it will help understand
sys log assoicated with test cases.
Change-Id: I7c0749102db9bc73d6ae3a237ec347a8fefb12e9
Signed-off-by: James Zhu <James.Zhu@amd.com>
This is handled by __fmm_release calling aperture_release_area.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: Ib8ed300e1734f03aeb9dfc8074897ece310b8af9