AppAPU VRAM is part of system memory managed by Linux kernel, no
VRAM eviction and restore is needed between VRAM and system memory.
Those Evict test failed on AppAPU now, skip those tests on AppAPU.
No page migration between VRAM and system on AppAPU, HMMProfilingEvent
depends on migration event, skip it on AppAPU.
Change-Id: I4c809b97c947e809d136c1f88db2278cf74f5b47
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 21abaef3f8]
If there is connection between GPU and CPU with weight 13,
KFD_CRAT_INTRA_SOCKET_WEIGHT, then this is AppAPU.
This will be used to skip tests not suitable for AppAPU.
Change-Id: If6fad81528b52afd4ac4cefa508d787b0f6637ca
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: e2df2c21af]
We should check compute core instead of cpu core,
in order to exclude the case of APU.
Signed-off-by: Jesse zhang <jesse.zhang@amd.com>
Change-Id: I2ec2a6807f51f49f80e0e500f5d9af81c2efae37
[ROCm/ROCR-Runtime commit: 4d54d6e706]
For GC 9.4.0, modifications were made to various shaders since certain
flat_ instructions no longer support glc/slc modifiers (replaced with
nt/sc1/sc0). Instead of repeating conditionals inside various shader
bodies, we can make use of LLVM AMDGCN macros.
This patch modularizes the shader macros into seperated defines. Prior
to the core raw-string literal, each shader now starts with the
SHADER_START literal (".text\n") plus any number of SHADER_MACRO_*
literals. This allows us to seperate the macro definitions logically and
use the pre-processor to only include the required macro groups on a
per-shader basis.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I19eb3fd14252a0601bb7509249051b68e7fdb02a
[ROCm/ROCR-Runtime commit: e2435d9e93]
Previously, KFDEvictTest.QueueTest and KFDSVMEvictTest.QueueTest
would create a variable number of wavefronts, one for each 64MB
of memory under test. This ran into limits on the buffers used
by the wavefronts, and may at some point have exceeded the
wavefront limit.
Restrict the number of wavefronts to 512, and adjust the shader
to accomodate a variable buffer size
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I2ec292e2900e2efa62a08313bca3d2f4bdabca8b
[ROCm/ROCR-Runtime commit: 680c8ca5a9]
GC 9.4.3 to set gfx target version to 9.4.x dependent on revision and
capabilities. Due to this, where applicable, mask off the gfx target
stepping version and only check major/minor version (9.4). There are no
collisions due to this change since GC 9.4.3 is the only ASIC that uses
gfx target version 9.4.x.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I72803e594c421f054d18ccfa7e92c507128fa5be
[ROCm/ROCR-Runtime commit: 831d1ad352]
KFDMemoryTest.DeviceHdpFlush requires device node 0 is large bar to
check VRAM content from CPU, run the test only if device 0 is large
bar GPU.
Change-Id: I874b153219550c50b724625e971e3ed3a84dc652
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 598e3e8d86]
Nodes with XGMI have no HDP, so DriverHDPFlush should skip.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: If5a87e660712e51d03e750d8e044786036b2e603
[ROCm/ROCR-Runtime commit: e32278a612]
Even with the restriction to only compile on gfx90a, this
shader still fails CompileShaders test.
There don't seem to be any systems that actually use it.
Leave it in the shader store, but remove it otherwise
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I41bec6ba10363d42b163ac101c3a92edaad6d6df
[ROCm/ROCR-Runtime commit: 16c6530330]
A gfx940 code path was erroneously added to this shader.
It's unneccesary; without this path, the shader uses
the scalar store, which works just fine on gfx940 without changes.
Remove it.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I825cbbebbdb25c4a7c2f16e228c2bea6a6bcc30c
[ROCm/ROCR-Runtime commit: 2a01e5c33b]
gfx940 changed the semantics of the glc and slc coherency options
on vector stores and loads. This means that shaders that use
those bits no longer compile on gfx940.
Add precompilation if statements to those shaders to use the
new coherency bits.
Also add gfx940 to ASMTest so that compilation is tested.
Note: One of the tests enabled by this patch on gfx940,
KFDEvictTest.QueueTest, does not pass on gfx940 emulators.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I942f9d2536e9eb5510c4d5af30df6ff1a95c8cf7
[ROCm/ROCR-Runtime commit: 30da9a3cf9]
Use q->total_mem_alloc_size for munmap in SVM codepath of free_queue.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I2fecaa1ddb337b1fe71f9cbba45a0c9467eff0c0
[ROCm/ROCR-Runtime commit: ae659e5427]
Currently, on queue destroy, context save restore memory is freed
only for a single XCC. Instead, we need to free the entire context
save restore memory, which was allocated for all XCCs.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Change-Id: I51ebb12fa8d5ebed41979d68e74f7c5392dca062
[ROCm/ROCR-Runtime commit: a713fb766e]
Do not allocate the EOP buffer when not required.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I1664a3f0a882219a72278174006cdb8d46fd4f5e
[ROCm/ROCR-Runtime commit: 252a2cf959]
Program ACCUM_OFFSET to match the number of VGPRS used
by the shader as part of Dispatch setup.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Change-Id: Icfa1fbe4de2a62f00743de567f3ed382d3378b17
[ROCm/ROCR-Runtime commit: 8994c3ba0e]
In multi-partition modes, e.g. CPX, we want to create new file
descriptor despite using the same render node. Update
open_drm_render_device to use a gpu_id to fd map partitioned by render
node. Different gpu_id's requesting the same render node will be added
to that render node's map list for fetching its fd. Different gpu_id's
requesting different render nodes as well as the same gpu_id's
requesting the same render node will behave as they did previously.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Ie153d42355d4d75b1c6ba6ff40fac3295bc87009
[ROCm/ROCR-Runtime commit: fd48f14ceb]
Allocate debug area big enough for all XCCs in the partition. Also, fix
the cu_num calculations as driver now reports cu_num as the total number
of CUs in the partition.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Change-Id: I6e80d57196b770bb3c2506bc58cb366c0046084b
[ROCm/ROCR-Runtime commit: 97a669a979]
Add gfx version for VGPR size per CU calc, add FAMILY_AV to KfdFamilyId,
add blacklist filter to kfdtest.exclude.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I9b8072e45f4d497e0a8fd3f8f97f1425238e8b42
[ROCm/ROCR-Runtime commit: 6be4461a0d]
Instead of hard-coding lib64 and other include locations, just prepend
the DRM_DIR to the beginning of the CMake prefix path. Then let
pkgconfig find the package, the same way that it would if DRM_DIR wasn't
set. DRM_DIR takes precedence, but the default paths will be used if
DRM_DIR isn't set, or doesn't point to where libdrm is housed
Note that /lib and /lib/$ARCH aren't required for DRM_DIR, just the
path to the root folder for the package (e.g. /opt/amdgpu instead of
/opt/amdgpu/lib or /opt/amdgpu/lib64 or /opt/amdgpu/lib/x86_64-linux-gnu
etc)
Change-Id: I56767db28476d14e3fa77be1089c3904e2a32450
[ROCm/ROCR-Runtime commit: d0c2770cde]
See description of previous revert.
This reverts commit 8554f0df14.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I969dc6469e62b50cd7ba0595918538602afa7516
[ROCm/ROCR-Runtime commit: 287cb29340]
This patch and the previous made it such that the queue ring buffer was
allocated as non-paged for GFX11+. The queue ring buffer should not be
mapped as non-paged; the non-paged requirement on GFX11 is only needed
for the queue wptr.
This patch was causing issues on various tests, such as intermittent
CP_INTSRC_BAD_OPCODE interrupts.
This reverts commit 92a336d485.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I55b64aed73dc3b792f0756ae00daf6e10d93ce10
[ROCm/ROCR-Runtime commit: 0750856d4a]
Test is inconsistent across ASICs. Add to blacklist to unblock QA.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I31e5aa2450165227107536bef8402db2c0dc6d7f
[ROCm/ROCR-Runtime commit: 5d80a4d214]
Get more debug information about user pointers that were registered
through SVM API, and triggered by memory exception events.
A new kfdtest with this use case was also included inside
KFDExceptionTest.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I0ef4929afe0625b9b5cbbbebef11ede66dda60ab
[ROCm/ROCR-Runtime commit: 2a1d6ee8b5]
Register and map userptrs through Shared Virtual Memory(SVM) API at
the Kernel level when available. Using this approach, performance
will be improve as register/unregister memory will not trigger any
system call to KFD driver.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I3726b4b5e1c6a52a83786fbe0af6322eb29ae7c9
[ROCm/ROCR-Runtime commit: 63c8cf115a]
Using backward compatibility paths will provide an #error message. Compile time option added to enable/disable the #error message.
Disabling the same will provide a #warning message
Change-Id: Ibb84241ba35aefb7a8450d68231e52242a634ed3
[ROCm/ROCR-Runtime commit: c911848242]
The MemoryAllocAll test in kfdtests exercises the new KFD memory
availability API by trying to allocate a single buffer object that
exactly fills all of vram. Desired object size is determined using the
memory availility KFD ioctl via libhsakmt, then an object is allocated
slightly larger than that size. If the allocation attempt fails then
the test tries to allocate a slightly smaller object, and continues
trying with smaller sizes until the allocation succeeds. The test
succeeds if the successfully allocated object is within some specified
tolerance of the available memory reported.
There are a number of known issues that can cause the successfully
allocated object to be significantly smaller than reported availability.
Until these issues are addressed, we should not fail the test, but just
log the actual divergence between the size of the object we thought we
could allocate, and what was actually possible.
Signed-off-by: Daniel Phillips <daniel.phillips@amd.com>
Change-Id: I165a30865ffbb2353286dcc896ad8e24af124615
[ROCm/ROCR-Runtime commit: d3bb1ca4af]
Since KFD counts svm allocation as system memory usage,
KFDSVMEvictTest will fail on the case of small system
memory, adding check is to skip test.
Signed-off-by: Eric Huang <jinhuieric.Huang@amd.com>
Change-Id: I040f16f2dd0d4092d069a632cfba9c28293f781b
[ROCm/ROCR-Runtime commit: 3f55ba9fb8]
Implement hsaKmtExportDMABufHandle, which can be used for a new
upstreamable RDMA solution. It exports a DMABuf handle for an arbitrary
virtual address along with the offset of the address within the
allocation. It also checks that the size of the intended export does
not exceed the allocation.
This uses the new AMDKFD_IOC_EXPORT_DMABUF, which requires KFD ioctl
API version 1.12.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: Ie5fdb1f73ab3c7fa36c315ce326b1fb89eacc8b6
[ROCm/ROCR-Runtime commit: 332f59eb2a]
If the KFD IOCTL version doesn't support available_memory, don't run the
test. Just skip the test
Change-Id: Iebf526d4563ab9f3c054bbfb38c214a1b893fcb5
[ROCm/ROCR-Runtime commit: 64aa9009e1]
The Shader Engines number should be shadder array_count divided by simd_arrays_per_engine
not array_count.
Signed-off-by: Xiaogang Chen<Xiaogang.Chen@amd.com>
Change-Id: I808d1fedd6b9843500719e902ecf759f5668a7d1
[ROCm/ROCR-Runtime commit: efcc9b275b]
KFDTopologyTest.BasicTest duplicates Thunk logic to calculate VGPR size,
meaning it will always be the same, and SGPR size is a constant. Since
no benefit, remove comparisons.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I99e7ff6fb69ed07bc0716fdf43946b19c67b9268
[ROCm/ROCR-Runtime commit: 3fb1496fb3]
Fixed VGPR memory size, size was too small for some GPU, causing a memory overflow.
Refactored macro code into a function.
Thanks to Jay Cornwall for locating the problem and proposing the fix.
Change-Id: Iffedea1c4f341967f02c56d810ff048225b02c16
Signed-off-by: David Belanger <david.belanger@amd.com>
[ROCm/ROCR-Runtime commit: a847a7b80e]
This is a temporary work around for GPU hang issues observed on GFX11.
Change-Id: I98fbedbbd1c51fe402c2116b35ca548931a390c9
Signed-off-by: David Belanger <david.belanger@amd.com>
[ROCm/ROCR-Runtime commit: b25867c4b8]
This reverts commit 1bb6d872ac.
It causes a regression in pytorch benchmark.
Change-Id: I96173dbd061cf38d6f451c02cb181ae51b7f625e
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
[ROCm/ROCR-Runtime commit: 505287412f]
This reverts commit ea19fbb646.
There are some openMP issues that were introduced after SVM userptr
feature was added.
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I7ef87c5232a3bcbe594c743fa4b4958601845ba5
[ROCm/ROCR-Runtime commit: f2bda56d04]
This reverts commit a89bcd0518.
There are some openMP issues that were introduced after SVM userptr
feature was added.
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I6566c9f0d39d05ecb92f38159880763f432939a5
[ROCm/ROCR-Runtime commit: d9f86ae02b]