Alloc vram by kfd, then map by GEM api to GPU VM and map to CPU VM.
Signed-off-by: Xiaogang Chen<Xiaogang.Chen@amd.com>
Change-Id: Ib5b2f35662cd5473f622f6ffc9b62925fe57ae42
[ROCm/ROCR-Runtime commit: 108c0e5f92]
This new manageable_aperture_t is used for VRAM allocation-only and
VA allocation-only.
Signed-off-by: Xiaogang Chen<Xiaogang.Chen@amd.com>
Change-Id: I3866ef9d35386d6aef7b6934ac8d4a89ef843b50
[ROCm/ROCR-Runtime commit: 0a2989083b]
This reverts commit 89ce41694f.
Current amdgpu exposes one render node for one gpu node/partition,
revert to previous way to open render node at Thunk.
Signed-off-by: Xiaogang Chen<Xiaogang.Chen@amd.com>
Change-Id: I436be74f8e872a7ab5c4a1420b4ea884f5a00e57
[ROCm/ROCR-Runtime commit: cc4fb2d1a9]
Add parameterization for KFDSVM tests so that we test with both XNACK
enabled and XNACK disabled. This will be overridden by HSA_XNACK, if set
Change-Id: Ie96eb61c03115f947e08cfa076ac459f7440f5d8
[ROCm/ROCR-Runtime commit: 478a68d49c]
For aqua_vanjaram APU mode, KFDEvictTest and KFDSVMEvictTest are
skipped. Those tests passed on dGPU mode with memory reporting partition
support on GFX 9.4.3.
Change-Id: I56357843c6743b01b807359dbb37b32391fd9a25
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 5df82e3d14]
Add support functions to remap the first page of device memory (GPU/GTT)
to share host ASAN logic.
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Change-Id: I4c27d5417ba80a172dccb0a079a597c5dc1c8f85
[ROCm/ROCR-Runtime commit: 1e6d728730]
When we merge thunk into ROCr, kfdtest will be in a different folder
structure. Add the new location to ensure that we can build now and in
the future with no disruptions
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I6517e061cb0da7137d903abbc380bfc7126f40d4
[ROCm/ROCR-Runtime commit: d966243783]
Starting with GFX11, wptr BOs must be mapped to GART for MES to determine work
on unmapped queues for usermode queue oversubscription (no aggregated doorbell)
Change-Id: I10e30fdc2bec587cef9427faa4874957988c34b3
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
[ROCm/ROCR-Runtime commit: d319660838]
If MES is enabled, wptr has to be non paged memory,
Add an API to check this condition.
Change-Id: I53af1f6687d5332d102e7062c3d760e33b96e722
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
[ROCm/ROCR-Runtime commit: 53ed978c3d]
Using wrapper header files will result in #warning message by default
Change-Id: I8301e433d39f3e5d39384ede6f0e4464d0eb20a6
[ROCm/ROCR-Runtime commit: b487f87363]
If Dev0 and Dev1 are not the same gfx, we should temporarily
set the target ASIC for compiling Shader code.
Signed-off-by: Shane Xiao <shane.xiao@amd.com>
Signed-off-by: Shikai Guo <shikai.guo@amd.com>
Change-Id: I5836beb16ade519f5a148d3d2b9c2875554f0c35
[ROCm/ROCR-Runtime commit: 5d6f900353]
Overload Assembler::RunAssembleBuf to take in an extra Gfxv parameter.
Using this overload will temporarily set the target ASIC to Gfxv before
calling RunAssemble, and copy back the original MCPU literal upon
completion. The copy to reset the original MCPU in this case is safe as
the MCPU length is always known.
This will be useful in multi-device test cases whereby the devices are
not necessarily the same gfx version. The overload is explicitly for the
RunAssembleBuf wrapper rather than RunAssemble to ensure the default
MCPU is always reset independent of errors in RunAssemble.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I7fe5a962876314b6df32e4b7160174949d98f9e3
[ROCm/ROCR-Runtime commit: 54136f60a0]
LLVM MC does not seem to accept multi-line conditionals. This may be
fixable in the future with macros. The Aqua Vanjaram shader spec states
that while buffer_invl2 has been replaced by buffer_inv, the former may
still be used for compatibility. However, this does not seem to be
implemented. For now, fix conditional.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I7f8b64c96055371d7e0090b758d2cfd2a37ecd3c
[ROCm/ROCR-Runtime commit: 92f3d4a458]
Previous code might fail to get the correct ln node. And trigger extra
walk through of the tree. Fix it.
While walking through the tree, better to search from right to left as
the node->start likely close to *address*.
Change-Id: If86ddf73e59a1eb88225d1ea90797818e8165488
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
[ROCm/ROCR-Runtime commit: 77761836ae]
These tests should also pass on Aqua Vanjaram, so enable them
Signed-off-by: David Francis <David.Francis@amd.com>
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Ibbb9cd43d653c63b08c39efd1d7326cfac1f8411
[ROCm/ROCR-Runtime commit: eed5518e4c]
Aqua Vanjaram is intended to have fine-grained coherency
from anywhere to anywhere else using read-acquire and
write-release primitives.
Add a test that writes to memory covered by five
different cache lines, then write-releases, while
another thread read-acquires, then reads those
five locations in memory.
There are nine variations of the test to cover
CPU-GPU, same-GPU and across-GPU, vector instructions and
scalar instructions, and data local to the
acquirer or receiver.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I20d2db5c53bd280e971479aad7e61df6ed5d3623
[ROCm/ROCR-Runtime commit: 30b1f23f7a]
For vector iterator loop access current node directly, don't need
gpuNodesAll.at(i), which also causes out of range access.
Change vector index loop to iterator loop to simplify the code.
Change-Id: I2627ef8d13b5d2c9cd8c51cf4dacc3e8a97fcfb0
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 0696f06c16]
AppAPU VRAM is part of system memory managed by Linux kernel, no
VRAM eviction and restore is needed between VRAM and system memory.
Those Evict test failed on AppAPU now, skip those tests on AppAPU.
No page migration between VRAM and system on AppAPU, HMMProfilingEvent
depends on migration event, skip it on AppAPU.
Change-Id: I4c809b97c947e809d136c1f88db2278cf74f5b47
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 21abaef3f8]
If there is connection between GPU and CPU with weight 13,
KFD_CRAT_INTRA_SOCKET_WEIGHT, then this is AppAPU.
This will be used to skip tests not suitable for AppAPU.
Change-Id: If6fad81528b52afd4ac4cefa508d787b0f6637ca
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: e2df2c21af]
We should check compute core instead of cpu core,
in order to exclude the case of APU.
Signed-off-by: Jesse zhang <jesse.zhang@amd.com>
Change-Id: I2ec2a6807f51f49f80e0e500f5d9af81c2efae37
[ROCm/ROCR-Runtime commit: 4d54d6e706]
For GC 9.4.0, modifications were made to various shaders since certain
flat_ instructions no longer support glc/slc modifiers (replaced with
nt/sc1/sc0). Instead of repeating conditionals inside various shader
bodies, we can make use of LLVM AMDGCN macros.
This patch modularizes the shader macros into seperated defines. Prior
to the core raw-string literal, each shader now starts with the
SHADER_START literal (".text\n") plus any number of SHADER_MACRO_*
literals. This allows us to seperate the macro definitions logically and
use the pre-processor to only include the required macro groups on a
per-shader basis.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I19eb3fd14252a0601bb7509249051b68e7fdb02a
[ROCm/ROCR-Runtime commit: e2435d9e93]
Previously, KFDEvictTest.QueueTest and KFDSVMEvictTest.QueueTest
would create a variable number of wavefronts, one for each 64MB
of memory under test. This ran into limits on the buffers used
by the wavefronts, and may at some point have exceeded the
wavefront limit.
Restrict the number of wavefronts to 512, and adjust the shader
to accomodate a variable buffer size
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I2ec292e2900e2efa62a08313bca3d2f4bdabca8b
[ROCm/ROCR-Runtime commit: 680c8ca5a9]
GC 9.4.3 to set gfx target version to 9.4.x dependent on revision and
capabilities. Due to this, where applicable, mask off the gfx target
stepping version and only check major/minor version (9.4). There are no
collisions due to this change since GC 9.4.3 is the only ASIC that uses
gfx target version 9.4.x.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I72803e594c421f054d18ccfa7e92c507128fa5be
[ROCm/ROCR-Runtime commit: 831d1ad352]
KFDMemoryTest.DeviceHdpFlush requires device node 0 is large bar to
check VRAM content from CPU, run the test only if device 0 is large
bar GPU.
Change-Id: I874b153219550c50b724625e971e3ed3a84dc652
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: 598e3e8d86]
Nodes with XGMI have no HDP, so DriverHDPFlush should skip.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: If5a87e660712e51d03e750d8e044786036b2e603
[ROCm/ROCR-Runtime commit: e32278a612]
Even with the restriction to only compile on gfx90a, this
shader still fails CompileShaders test.
There don't seem to be any systems that actually use it.
Leave it in the shader store, but remove it otherwise
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I41bec6ba10363d42b163ac101c3a92edaad6d6df
[ROCm/ROCR-Runtime commit: 16c6530330]
A gfx940 code path was erroneously added to this shader.
It's unneccesary; without this path, the shader uses
the scalar store, which works just fine on gfx940 without changes.
Remove it.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I825cbbebbdb25c4a7c2f16e228c2bea6a6bcc30c
[ROCm/ROCR-Runtime commit: 2a01e5c33b]
gfx940 changed the semantics of the glc and slc coherency options
on vector stores and loads. This means that shaders that use
those bits no longer compile on gfx940.
Add precompilation if statements to those shaders to use the
new coherency bits.
Also add gfx940 to ASMTest so that compilation is tested.
Note: One of the tests enabled by this patch on gfx940,
KFDEvictTest.QueueTest, does not pass on gfx940 emulators.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I942f9d2536e9eb5510c4d5af30df6ff1a95c8cf7
[ROCm/ROCR-Runtime commit: 30da9a3cf9]
Use q->total_mem_alloc_size for munmap in SVM codepath of free_queue.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I2fecaa1ddb337b1fe71f9cbba45a0c9467eff0c0
[ROCm/ROCR-Runtime commit: ae659e5427]
Currently, on queue destroy, context save restore memory is freed
only for a single XCC. Instead, we need to free the entire context
save restore memory, which was allocated for all XCCs.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Change-Id: I51ebb12fa8d5ebed41979d68e74f7c5392dca062
[ROCm/ROCR-Runtime commit: a713fb766e]
Do not allocate the EOP buffer when not required.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I1664a3f0a882219a72278174006cdb8d46fd4f5e
[ROCm/ROCR-Runtime commit: 252a2cf959]
Program ACCUM_OFFSET to match the number of VGPRS used
by the shader as part of Dispatch setup.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Change-Id: Icfa1fbe4de2a62f00743de567f3ed382d3378b17
[ROCm/ROCR-Runtime commit: 8994c3ba0e]
In multi-partition modes, e.g. CPX, we want to create new file
descriptor despite using the same render node. Update
open_drm_render_device to use a gpu_id to fd map partitioned by render
node. Different gpu_id's requesting the same render node will be added
to that render node's map list for fetching its fd. Different gpu_id's
requesting different render nodes as well as the same gpu_id's
requesting the same render node will behave as they did previously.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Ie153d42355d4d75b1c6ba6ff40fac3295bc87009
[ROCm/ROCR-Runtime commit: fd48f14ceb]
Allocate debug area big enough for all XCCs in the partition. Also, fix
the cu_num calculations as driver now reports cu_num as the total number
of CUs in the partition.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Change-Id: I6e80d57196b770bb3c2506bc58cb366c0046084b
[ROCm/ROCR-Runtime commit: 97a669a979]
Add gfx version for VGPR size per CU calc, add FAMILY_AV to KfdFamilyId,
add blacklist filter to kfdtest.exclude.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I9b8072e45f4d497e0a8fd3f8f97f1425238e8b42
[ROCm/ROCR-Runtime commit: 6be4461a0d]
Instead of hard-coding lib64 and other include locations, just prepend
the DRM_DIR to the beginning of the CMake prefix path. Then let
pkgconfig find the package, the same way that it would if DRM_DIR wasn't
set. DRM_DIR takes precedence, but the default paths will be used if
DRM_DIR isn't set, or doesn't point to where libdrm is housed
Note that /lib and /lib/$ARCH aren't required for DRM_DIR, just the
path to the root folder for the package (e.g. /opt/amdgpu instead of
/opt/amdgpu/lib or /opt/amdgpu/lib64 or /opt/amdgpu/lib/x86_64-linux-gnu
etc)
Change-Id: I56767db28476d14e3fa77be1089c3904e2a32450
[ROCm/ROCR-Runtime commit: d0c2770cde]
See description of previous revert.
This reverts commit 8554f0df14.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I969dc6469e62b50cd7ba0595918538602afa7516
[ROCm/ROCR-Runtime commit: 287cb29340]
This patch and the previous made it such that the queue ring buffer was
allocated as non-paged for GFX11+. The queue ring buffer should not be
mapped as non-paged; the non-paged requirement on GFX11 is only needed
for the queue wptr.
This patch was causing issues on various tests, such as intermittent
CP_INTSRC_BAD_OPCODE interrupts.
This reverts commit 92a336d485.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I55b64aed73dc3b792f0756ae00daf6e10d93ce10
[ROCm/ROCR-Runtime commit: 0750856d4a]
Test is inconsistent across ASICs. Add to blacklist to unblock QA.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I31e5aa2450165227107536bef8402db2c0dc6d7f
[ROCm/ROCR-Runtime commit: 5d80a4d214]
Get more debug information about user pointers that were registered
through SVM API, and triggered by memory exception events.
A new kfdtest with this use case was also included inside
KFDExceptionTest.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I0ef4929afe0625b9b5cbbbebef11ede66dda60ab
[ROCm/ROCR-Runtime commit: 2a1d6ee8b5]
Register and map userptrs through Shared Virtual Memory(SVM) API at
the Kernel level when available. Using this approach, performance
will be improve as register/unregister memory will not trigger any
system call to KFD driver.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I3726b4b5e1c6a52a83786fbe0af6322eb29ae7c9
[ROCm/ROCR-Runtime commit: 63c8cf115a]
Using backward compatibility paths will provide an #error message. Compile time option added to enable/disable the #error message.
Disabling the same will provide a #warning message
Change-Id: Ibb84241ba35aefb7a8450d68231e52242a634ed3
[ROCm/ROCR-Runtime commit: c911848242]
The MemoryAllocAll test in kfdtests exercises the new KFD memory
availability API by trying to allocate a single buffer object that
exactly fills all of vram. Desired object size is determined using the
memory availility KFD ioctl via libhsakmt, then an object is allocated
slightly larger than that size. If the allocation attempt fails then
the test tries to allocate a slightly smaller object, and continues
trying with smaller sizes until the allocation succeeds. The test
succeeds if the successfully allocated object is within some specified
tolerance of the available memory reported.
There are a number of known issues that can cause the successfully
allocated object to be significantly smaller than reported availability.
Until these issues are addressed, we should not fail the test, but just
log the actual divergence between the size of the object we thought we
could allocate, and what was actually possible.
Signed-off-by: Daniel Phillips <daniel.phillips@amd.com>
Change-Id: I165a30865ffbb2353286dcc896ad8e24af124615
[ROCm/ROCR-Runtime commit: d3bb1ca4af]
Since KFD counts svm allocation as system memory usage,
KFDSVMEvictTest will fail on the case of small system
memory, adding check is to skip test.
Signed-off-by: Eric Huang <jinhuieric.Huang@amd.com>
Change-Id: I040f16f2dd0d4092d069a632cfba9c28293f781b
[ROCm/ROCR-Runtime commit: 3f55ba9fb8]