gfx940 changed the semantics of the glc and slc coherency options
on vector stores and loads. This means that shaders that use
those bits no longer compile on gfx940.
Add compile-time if statements to those shaders so that they use the
new coherency bits on gfx940.
Also add gfx940 to ASMTest so that compilation is tested.
Note: One of the tests enabled by this patch on gfx940,
KFDEvictTest.QueueTest, does not pass on gfx940 emulators.
Signed-off-by: David Francis <David.Francis@amd.com>
Change-Id: I942f9d2536e9eb5510c4d5af30df6ff1a95c8cf7
[ROCm/ROCR-Runtime commit: 30da9a3cf9]
Use q->total_mem_alloc_size for munmap in the SVM code path of free_queue.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I2fecaa1ddb337b1fe71f9cbba45a0c9467eff0c0
[ROCm/ROCR-Runtime commit: ae659e5427]
Currently, on queue destruction, context save/restore memory is freed
for only a single XCC. Instead, we need to free the entire context
save/restore memory, which was allocated for all XCCs.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Change-Id: I51ebb12fa8d5ebed41979d68e74f7c5392dca062
[ROCm/ROCR-Runtime commit: a713fb766e]
Do not allocate the EOP buffer when not required.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: I1664a3f0a882219a72278174006cdb8d46fd4f5e
[ROCm/ROCR-Runtime commit: 252a2cf959]
Program ACCUM_OFFSET to match the number of VGPRs used
by the shader as part of dispatch setup.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Change-Id: Icfa1fbe4de2a62f00743de567f3ed382d3378b17
[ROCm/ROCR-Runtime commit: 8994c3ba0e]
We used to report HSA_STATUS_ERROR_INVALID_ISA when receiving error code
128, but there are several other reasons why the number of VGPRs could
be exceeded, so update the error code.
Change-Id: I6a6980d5b07b09c93d00dee5207a0d52399bc77e
[ROCm/ROCR-Runtime commit: f43a284b8e]
In multi-partition modes, e.g. CPX, we want to create a new file
descriptor even when the same render node is used. Update
open_drm_render_device to use a gpu_id-to-fd map partitioned by render
node. Different gpu_ids requesting the same render node are added
to that render node's map list for fetching their fd. Different gpu_ids
requesting different render nodes, as well as the same gpu_id
requesting the same render node, behave as they did previously.
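A minimal C++ sketch of the lookup described above, assuming a two-level map (render node minor number -> gpu_id -> fd). RenderFdCache, Get and the injected OpenFn are hypothetical names, not the libhsakmt implementation; OpenFn stands in for opening /dev/dri/renderD<minor> so the caching logic can be shown without real devices.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <utility>

// Hypothetical stand-in for open("/dev/dri/renderD<minor>", O_RDWR | O_CLOEXEC).
using OpenFn = std::function<int(int render_minor)>;

// Hypothetical sketch: fds cached per render node, then per gpu_id.
class RenderFdCache {
 public:
  explicit RenderFdCache(OpenFn open_fn) : open_fn_(std::move(open_fn)) {}

  // Returns the fd for (render_minor, gpu_id). A gpu_id already seen on this
  // render node reuses its fd; a new gpu_id on the same node opens a new fd.
  int Get(int render_minor, uint32_t gpu_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto& per_gpu = fds_[render_minor];  // this render node's map list
    auto it = per_gpu.find(gpu_id);
    if (it != per_gpu.end()) return it->second;
    int fd = open_fn_(render_minor);
    if (fd >= 0) per_gpu.emplace(gpu_id, fd);  // cache only successful opens
    return fd;
  }

 private:
  OpenFn open_fn_;
  std::mutex mutex_;
  std::map<int, std::map<uint32_t, int>> fds_;  // render minor -> (gpu_id -> fd)
};
```

With this layout, a second gpu_id on an already-open render node gets its own fd, while a repeated (render node, gpu_id) pair reuses the cached one.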
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Ie153d42355d4d75b1c6ba6ff40fac3295bc87009
[ROCm/ROCR-Runtime commit: fd48f14ceb]
Allocate a debug area big enough for all XCCs in the partition. Also, fix
the cu_num calculations, as the driver now reports cu_num as the total
number of CUs in the partition.
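A rough sketch of the sizing arithmetic, assuming the per-XCC debug area size is known and that cu_num divides evenly across the XCCs. PartitionInfo, CusPerXcc, DebugAreaSize and per_xcc_bytes are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical partition description.
struct PartitionInfo {
  uint32_t cu_num;   // total CUs in the partition, as the driver now reports it
  uint32_t num_xcc;  // number of XCCs in the partition
};

// CUs belonging to one XCC (assumes an even split across XCCs).
uint32_t CusPerXcc(const PartitionInfo& p) {
  if (p.num_xcc == 0) throw std::invalid_argument("num_xcc must be non-zero");
  return p.cu_num / p.num_xcc;
}

// Debug area sized for every XCC rather than just one.
uint64_t DebugAreaSize(const PartitionInfo& p, uint64_t per_xcc_bytes) {
  return per_xcc_bytes * p.num_xcc;
}
```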
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Change-Id: I6e80d57196b770bb3c2506bc58cb366c0046084b
[ROCm/ROCR-Runtime commit: 97a669a979]
Add gfx version for VGPR size per CU calc, add FAMILY_AV to KfdFamilyId,
add blacklist filter to kfdtest.exclude.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I9b8072e45f4d497e0a8fd3f8f97f1425238e8b42
[ROCm/ROCR-Runtime commit: 6be4461a0d]
On some platforms, e.g. Arch Linux, the -D_GLIBCXX_ASSERTIONS compile
flag is enabled by default, causing a runtime assertion.
Avoid the assertion by using the std::vector accessor function data().
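For illustration, a small sketch of the kind of pattern this addresses, assuming the original code took &v[0] to hand a buffer to a C-style API. Under _GLIBCXX_ASSERTIONS, libstdc++ bounds-checks operator[], so &v[0] on an empty vector aborts, whereas data() is valid for any size. CopyOut and Forward are hypothetical helpers.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical C-style consumer taking (pointer, count); tolerates count == 0.
std::size_t CopyOut(const int* src, std::size_t count, int* dst) {
  if (count != 0) std::memcpy(dst, src, count * sizeof(int));
  return count;
}

std::size_t Forward(const std::vector<int>& v, int* dst) {
  // &v[0] would call operator[](0), which is out of range for an empty
  // vector and fails the _GLIBCXX_ASSERTIONS check. data() is always valid.
  return CopyOut(v.data(), v.size(), dst);
}
```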
Change-Id: I118cdf102c3e353f32c618823e363ee1059f3453
[ROCm/ROCR-Runtime commit: 511855d344]
Fix overwriting of the pointer info size provided by the caller of
hsa_amd_pointer_info.
Change-Id: I2e5d73ab9ba1a32bc9b4d112bc29b4a99fd8b3b5
[ROCm/ROCR-Runtime commit: c5bf7eb112]
Some applications will keep trying to allocate device memory until the
allocation fails. This causes all device memory to be used up and we are
then unable to allocate scratch memory for dispatches. Reserve enough
memory for 1 small scratch allocation.
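A minimal sketch of such a reservation, assuming a simple free-bytes check. kScratchReserve, CanAllocateUser and CanAllocateScratch are hypothetical, and the 2 MiB value is an assumption rather than the runtime's actual reserve size.

```cpp
#include <cstdint>

// Assumed reserve size, for illustration only.
constexpr uint64_t kScratchReserve = 2ull << 20;  // 2 MiB

// User allocations may not dip into the scratch reservation.
bool CanAllocateUser(uint64_t free_bytes, uint64_t size) {
  if (free_bytes <= kScratchReserve) return false;
  return size <= free_bytes - kScratchReserve;
}

// Scratch allocations are allowed to use the reserved space.
bool CanAllocateScratch(uint64_t free_bytes, uint64_t size) {
  return size <= free_bytes;
}
```

An application that keeps allocating until failure then stops at kScratchReserve bytes of free memory, which a later dispatch can still use for scratch.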
Change-Id: I968400d41540ba1aca8f28581f229693eec02225
[ROCm/ROCR-Runtime commit: 8ebf5f9c48]
Instead of hard-coding lib64 and other include locations, just prepend
the DRM_DIR to the beginning of the CMake prefix path. Then let
pkgconfig find the package, the same way that it would if DRM_DIR wasn't
set. DRM_DIR takes precedence, but the default paths will be used if
DRM_DIR isn't set or doesn't point to where libdrm is housed.
Note that DRM_DIR should not include /lib or /lib/$ARCH; it only needs
the path to the root folder of the package (e.g. /opt/amdgpu instead of
/opt/amdgpu/lib, /opt/amdgpu/lib64, /opt/amdgpu/lib/x86_64-linux-gnu,
etc.).
Change-Id: I56767db28476d14e3fa77be1089c3904e2a32450
[ROCm/ROCR-Runtime commit: d0c2770cde]
See description of previous revert.
This reverts commit 8554f0df14.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I969dc6469e62b50cd7ba0595918538602afa7516
[ROCm/ROCR-Runtime commit: 287cb29340]
This patch and the previous one caused the queue ring buffer to be
allocated as non-paged for GFX11+. The queue ring buffer should not be
mapped as non-paged; the non-paged requirement on GFX11 is only needed
for the queue wptr.
This patch was causing issues on various tests, such as intermittent
CP_INTSRC_BAD_OPCODE interrupts.
This reverts commit 92a336d485.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I55b64aed73dc3b792f0756ae00daf6e10d93ce10
[ROCm/ROCR-Runtime commit: 0750856d4a]
Test is inconsistent across ASICs. Add to blacklist to unblock QA.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I31e5aa2450165227107536bef8402db2c0dc6d7f
[ROCm/ROCR-Runtime commit: 5d80a4d214]
Get more debug information about user pointers that were registered
through the SVM API and that triggered memory exception events.
A new kfdtest for this use case was also added to
KFDExceptionTest.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I0ef4929afe0625b9b5cbbbebef11ede66dda60ab
[ROCm/ROCR-Runtime commit: 2a1d6ee8b5]
Register and map userptrs through the Shared Virtual Memory (SVM) API at
the kernel level when available. With this approach, performance
improves, as registering/unregistering memory does not trigger any
system call to the KFD driver.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I3726b4b5e1c6a52a83786fbe0af6322eb29ae7c9
[ROCm/ROCR-Runtime commit: 63c8cf115a]
Wait on completion signal for amd_aql_pm4_ib processing
on ASICs with gfx version >= 9.
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Change-Id: Ia704d9cc5b2535dcf8564a30f694262b113f77a2
[ROCm/ROCR-Runtime commit: aec7200cb2]
An engine offset equal to the maximum number of engines is still valid,
as offset enum 0 is occupied by blit copies, so raise the limit by 1.
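A sketch of the corrected bounds check, assuming slot 0 holds the blit (DevToDev) engine and SDMA engines occupy slots 1..num_sdma. IsValidEngineOffset and SdmaToBlitSlot are hypothetical names; the point is that a check of the form offset < num_sdma would wrongly reject the last engine.

```cpp
#include <cstdint>

// Slot 0 is the blit copy engine, so the largest valid offset equals num_sdma
// and the limit is num_sdma + 1.
bool IsValidEngineOffset(uint32_t offset, uint32_t num_sdma) {
  return offset < num_sdma + 1;
}

// 0-based SDMA engine index -> slot, shifted past the blit engine at slot 0.
uint32_t SdmaToBlitSlot(uint32_t sdma_index) { return sdma_index + 1; }
```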
Change-Id: I6fcab106290e6647702efe297a4281861da4e0b8
[ROCm/ROCR-Runtime commit: fc8f3f9fd5]
Package ASAN libraries and the license file.
Add the suffix "asan" to the package name.
Change-Id: I2af416d86a9068a41e3880836a21c9005e45271b
[ROCm/ROCR-Runtime commit: dd9b7b3b3a]
Using backward compatibility paths will produce an #error message. A compile-time option is added to enable or disable the #error message.
Disabling it will produce a #warning message instead.
Change-Id: Ibb84241ba35aefb7a8450d68231e52242a634ed3
[ROCm/ROCR-Runtime commit: c911848242]
Using backward compatibility paths will produce an #error message. A compile-time option is added to enable or disable the #error message.
Disabling it will produce a #warning message instead.
Change-Id: Ib48e361b72176e2845c8f74f980f0234e7eb4a7d
[ROCm/ROCR-Runtime commit: 629ddde072]
Adds hsa_amd_portable_export_dmabuf and hsa_amd_portable_close_dmabuf
which allow obtaining dmabuf handles to rocr allocations. These handles
may be shared with other APIs to support cross-vendor and cross-device
memory sharing.
Adds a query that returns whether dmabuf export is supported.
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Change-Id: I7f98501087d9563d07fc2cb428cc886b1e518b1e
[ROCm/ROCR-Runtime commit: 42243c1e8f]
A couple of places forgot that SDMA blit engine indices are offset by
the DevToDev engine at position 0.
Change-Id: Ie811d8281bc812738ed0107694f3dffde5e93685
[ROCm/ROCR-Runtime commit: 7364a93b98]
The MemoryAllocAll test in kfdtests exercises the new KFD memory
availability API by trying to allocate a single buffer object that
exactly fills all of VRAM. The desired object size is determined using the
memory availability KFD ioctl via libhsakmt, then an object is allocated
slightly larger than that size. If the allocation attempt fails then
the test tries to allocate a slightly smaller object, and continues
trying with smaller sizes until the allocation succeeds. The test
succeeds if the successfully allocated object is within some specified
tolerance of the available memory reported.
There are a number of known issues that can cause the successfully
allocated object to be significantly smaller than reported availability.
Until these issues are addressed, we should not fail the test, but just
log the actual divergence between the size of the object we thought we
could allocate, and what was actually possible.
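A hedged sketch of the search and the log-instead-of-fail behaviour described above. try_alloc stands in for allocating and freeing a VRAM buffer object through libhsakmt; AllocAll, AllocAllResult, step and tolerance are hypothetical names and parameters, not the kfdtest source.

```cpp
#include <cstdint>
#include <functional>

struct AllocAllResult {
  uint64_t allocated;     // largest size that succeeded (0 if none did)
  uint64_t divergence;    // |available - allocated|, to be logged
  bool within_tolerance;  // the original pass/fail criterion
};

// try_alloc(size) stands in for allocating (and freeing) one VRAM BO.
AllocAllResult AllocAll(uint64_t available, uint64_t step, uint64_t tolerance,
                        const std::function<bool(uint64_t)>& try_alloc) {
  if (step == 0) step = 1;           // avoid looping forever
  uint64_t size = available + step;  // start slightly above reported availability
  // Shrink by `step` until an allocation succeeds or nothing is left to try.
  while (size > 0 && !try_alloc(size)) {
    size = size > step ? size - step : 0;
  }
  uint64_t divergence = available > size ? available - size : size - available;
  return {size, divergence, divergence <= tolerance};
}
```

Per the note above, the caller would log divergence rather than fail the test when within_tolerance is false.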
Signed-off-by: Daniel Phillips <daniel.phillips@amd.com>
Change-Id: I165a30865ffbb2353286dcc896ad8e24af124615
[ROCm/ROCR-Runtime commit: d3bb1ca4af]
Since KFD counts SVM allocations as system memory usage,
KFDSVMEvictTest fails on systems with little system memory.
Add a check to skip the test in that case.
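A possible shape for the skip check, assuming the test knows roughly how much memory it needs. SystemMemoryBytes uses POSIX sysconf (Linux/glibc); ShouldSkipSvmEvictTest and required_bytes are hypothetical.

```cpp
#include <cstdint>
#include <unistd.h>

// Total physical RAM in bytes, or 0 if sysconf cannot report it.
uint64_t SystemMemoryBytes() {
  long pages = sysconf(_SC_PHYS_PAGES);
  long page_size = sysconf(_SC_PAGE_SIZE);
  if (pages <= 0 || page_size <= 0) return 0;
  return static_cast<uint64_t>(pages) * static_cast<uint64_t>(page_size);
}

// SVM allocations count against system memory, so skip when the
// test's footprint (an assumed, test-specific value) would not fit.
bool ShouldSkipSvmEvictTest(uint64_t system_bytes, uint64_t required_bytes) {
  return system_bytes < required_bytes;
}
```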
Signed-off-by: Eric Huang <jinhuieric.Huang@amd.com>
Change-Id: I040f16f2dd0d4092d069a632cfba9c28293f781b
[ROCm/ROCR-Runtime commit: 3f55ba9fb8]
Implement hsaKmtExportDMABufHandle, which can be used for a new
upstreamable RDMA solution. It exports a DMABuf handle for an arbitrary
virtual address along with the offset of the address within the
allocation. It also checks that the size of the intended export does
not exceed the allocation.
This uses the new AMDKFD_IOC_EXPORT_DMABUF, which requires KFD ioctl
API version 1.12.
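A sketch of the range check and version gate described above. Allocation, ExportRange, CheckExport and KfdSupportsExportDmabuf are hypothetical names, not the hsaKmtExportDMABufHandle signature; locating the allocation that contains the address is assumed to have been done already.

```cpp
#include <cstdint>
#include <optional>

struct Allocation {
  uint64_t base;  // start address of the allocation
  uint64_t size;  // size of the allocation in bytes
};

struct ExportRange {
  uint64_t offset;  // offset of the requested address within the allocation
};

// AMDKFD_IOC_EXPORT_DMABUF needs KFD ioctl API version 1.12 or newer.
bool KfdSupportsExportDmabuf(uint32_t major, uint32_t minor) {
  return major > 1 || (major == 1 && minor >= 12);
}

// Returns the offset of addr within the allocation, or nullopt if addr is
// outside it or the export of `size` bytes would run past its end.
std::optional<ExportRange> CheckExport(const Allocation& a, uint64_t addr,
                                       uint64_t size) {
  if (addr < a.base || addr - a.base >= a.size) return std::nullopt;
  uint64_t offset = addr - a.base;
  if (size > a.size - offset) return std::nullopt;
  return ExportRange{offset};
}
```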
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: Ie5fdb1f73ab3c7fa36c315ce326b1fb89eacc8b6
[ROCm/ROCR-Runtime commit: 332f59eb2a]
If the KFD ioctl version doesn't support available_memory, skip the
test instead of running it.
Change-Id: Iebf526d4563ab9f3c054bbfb38c214a1b893fcb5
[ROCm/ROCR-Runtime commit: 64aa9009e1]
Use mwaitx instructions when busy waiting for signals to reduce CPU
energy usage.
This can be disabled by setting HSA_ENABLE_MWAITX=0
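A minimal sketch of the opt-out, assuming only the documented rule that HSA_ENABLE_MWAITX=0 disables mwaitx waiting. MwaitxEnabled is a hypothetical helper; CPU support detection and the mwaitx wait loop itself are not modeled.

```cpp
#include <cstdlib>
#include <cstring>

// mwaitx busy-wait stays on unless HSA_ENABLE_MWAITX is exactly "0".
bool MwaitxEnabled() {
  const char* v = std::getenv("HSA_ENABLE_MWAITX");
  return !(v != nullptr && std::strcmp(v, "0") == 0);
}
```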
Change-Id: Ic207895a491b2bf6dacba47ef0921df3faad5b5a
[ROCm/ROCR-Runtime commit: cc48dfdbff]