Gfx9 requires a monotonic write pointer and doorbell value.
Count fields are 1-based on Gfx9, compared with 0-based pre-Gfx9.
- Restructure implementation to use monotonic ring indices
- Remove redundant submission size checks (handled by AcquireWriteAddress)
- Unify copy/fill per-command limit (documentation is unclear)
Change-Id: I57c1675221d2e63aa319fee700d9951671e1bd65
[ROCm/ROCR-Runtime commit: 1cd46afe6d]
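Below is a minimal C sketch of the two ideas in this commit. It assumes a power-of-two ring measured in dwords. It also assumes that "1-based" means the Gfx9 count field holds the count minus one. `ring_t`, `ring_slot`, `ring_free_dw` and `encode_count` are hypothetical names and are not taken from the ROCr sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring state. Both indices only ever increase; they are
 * masked solely when locating a slot, so the value written to the
 * write pointer and doorbell stays monotonic. */
typedef struct {
  uint64_t write_index; /* dwords reserved so far */
  uint64_t read_index;  /* dwords consumed by the engine so far */
  uint32_t size_dw;     /* ring size in dwords, a power of two */
} ring_t;

/* Slot in the ring buffer that holds monotonic index `index`. */
static uint32_t ring_slot(const ring_t *r, uint64_t index) {
  assert(r->size_dw != 0 && (r->size_dw & (r->size_dw - 1)) == 0);
  return (uint32_t)(index & (r->size_dw - 1));
}

/* With monotonic indices, occupancy is a plain subtraction. */
static uint64_t ring_free_dw(const ring_t *r) {
  return r->size_dw - (r->write_index - r->read_index);
}

/* Encode a count for a packet count field. Assumption: Gfx9 stores
 * count - 1, earlier generations store the count itself. */
static uint32_t encode_count(uint32_t count, int is_gfx9) {
  assert(count > 0);
  return is_gfx9 ? count - 1 : count;
}
```

Keeping the indices unwrapped removes the special case where the write and read indices are equal but the ring is full, because the difference of two monotonic counters is always the true occupancy.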
Note: The implementation is the same as the 1.0 APIs for now.
A follow-up change will provide the complete implementation.
Change-Id: Ife633f74ff27eee0bb9b0c46952cf5233b0114e8
[ROCm/ROCR-Runtime commit: a324f21a46]
Initial work to import the latest (1.1) hsa_ext_image extension.
Change-Id: I51d70ef26f97250c884b3def2088be0d7eb04eb3
[ROCm/ROCR-Runtime commit: 31d379c821]
If fork() is called, clear all duplicated data that is invalid in the
child process.
Change-Id: I4e27198060db593c630c6337b7071dfbd0d80b83
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
[ROCm/ROCR-Runtime commit: f1f62d863c]
The HSA HW Profiler currently fails to build with this patch.
This reverts commit 91866d4bbb.
Change-Id: Iabb2b958f33ba614a24b61bb370905b3b7362708
[ROCm/ROCR-Runtime commit: 5162a76616]
Initial work to import the latest (1.1) hsa_ext_image extension.
Change-Id: I4d55adb09ba4d4dbd43d47a4bc54077d4bc531d2
[ROCm/ROCR-Runtime commit: e0ce8855dd]
CWSR buffers can be large on dGPUs (~21MB on gfx803). Allocating them
in VRAM unnecessarily limits the number of queues that can be created.
Also make freeing of per-queue buffers symmetric with allocation. All
buffers are now allocated with allocate_exec_aligned_memory on dGPUs
and APUs, so use free_exec_aligned_memory to free them.
Change-Id: I45e8cb1801857d0268750202cdd422426611e457
[ROCm/ROCR-Runtime commit: 4181b408fc]
Also emit error messages to stderr if no async queue error callback was registered and queue fault messages are enabled (on by default).
Queue fault messages are controlled by the environment variable HSA_ENABLE_QUEUE_FAULT_MESSAGE.
Change-Id: I496487b8d048b83aa95b9784e92928211f167b17
[ROCm/ROCR-Runtime commit: 0e17cc2887]
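Below is a sketch of the reporting rule. The commit says only that the messages are on by default. This sketch assumes that the value "0" is the only one that disables them. `queue_error_cb` and `report_queue_error` are hypothetical names. The real HSA queue error callback takes different arguments.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type; the real HSA callback signature differs. */
typedef void (*queue_error_cb)(int status, void *data);

/* Messages are on by default. Assumption: only "0" turns them off. */
static int queue_fault_messages_enabled(void) {
  const char *v = getenv("HSA_ENABLE_QUEUE_FAULT_MESSAGE");
  return !(v != NULL && strcmp(v, "0") == 0);
}

/* Deliver a queue error to the registered callback if there is one,
 * otherwise to stderr when fault messages are enabled.
 * Returns 1 if a message was printed, 0 otherwise. */
static int report_queue_error(queue_error_cb cb, void *data, int status) {
  if (cb != NULL) {
    cb(status, data);
    return 0;
  }
  if (queue_fault_messages_enabled()) {
    fprintf(stderr, "Queue error: status 0x%x\n", (unsigned)status);
    return 1;
  }
  return 0;
}
```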
Uncommented the HSA IPC code.
Changed hsa_amd_ipc_memory_t to hold 8 uint32_t values instead of 9 to
match the spec.
Change-Id: Id1523125e9b876a23c3743df1be29c98b47f6725
[ROCm/ROCR-Runtime commit: 160f8c5880]
Implement three new APIs for IPC buffer sharing:
- hsaKmtShareMemory()
- hsaKmtRegisterSharedHandle()
- hsaKmtRegisterSharedHandleToNodes()
Add the new ioctls needed by the above APIs.
Change-Id: Ia2b4d0dc91ec64bff959395d11c0536467404792
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
[ROCm/ROCR-Runtime commit: 559e31d6ff]
A memory region is allowed to be registered multiple times when the memory
is specified by a user pointer. If it's registered with the same user
pointer but with different sizes, it's treated as different instances and
multiple VM objects are created with different GPU addresses.
Change-Id: I49627111bb5db36d18f1133b252fb62a611f06a4
[ROCm/ROCR-Runtime commit: 2a50ebba98]
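Below is a sketch of the lookup rule. It assumes that registering an identical (pointer, size) pair again adds a reference instead of creating a new mapping. The fixed-size table, the bump allocator for GPU virtual addresses and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_REGS 16      /* hypothetical table size */
#define GPU_PAGE 4096u   /* hypothetical GPU page size */

typedef struct {
  const void *user_ptr;
  size_t size;
  uint64_t gpu_va;
  unsigned refcount;
} user_reg_t;

static user_reg_t regs[MAX_REGS];
static size_t nregs;
static uint64_t next_va = 0x100000000ull; /* hypothetical VA bump allocator */

/* The registration key is (pointer, size), not the pointer alone: the
 * same pointer with a different size becomes a separate VM object with
 * its own GPU address. Returns 0 if the table is full. */
static uint64_t register_user_ptr(const void *p, size_t size) {
  for (size_t i = 0; i < nregs; i++) {
    if (regs[i].user_ptr == p && regs[i].size == size) {
      regs[i].refcount++;
      return regs[i].gpu_va;
    }
  }
  if (nregs == MAX_REGS)
    return 0;
  user_reg_t *r = &regs[nregs++];
  r->user_ptr = p;
  r->size = size;
  r->gpu_va = next_va;
  r->refcount = 1;
  /* Round up so each instance starts on its own page. */
  next_va += (size + GPU_PAGE - 1) / GPU_PAGE * GPU_PAGE;
  return r->gpu_va;
}
```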
Currently, if a process' parent called hsaKmtOpen, the child will be
unable to open a connection to KFD, since kfd_open_count will be > 0.
When forking, the refcount should be reset, in order to allow the child
to re-open /dev/kfd.
Change-Id: Ia4b78f6bacc4f82e8ac724e5f488a3eff5084007
[ROCm/ROCR-Runtime commit: 0de39b6724]
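One way to get this behaviour is a pthread_atfork() child handler that resets the count. The child's first open then starts from zero. This is a sketch and not necessarily how the thunk does it. `kfd_open_count` reuses the name from the commit message. `sketch_open` and `child_after_fork` are hypothetical. The actual open of /dev/kfd is omitted.

```c
#include <pthread.h>
#include <stddef.h>

/* Open refcount, mirroring kfd_open_count from the commit message. */
static int kfd_open_count = 0;
static int atfork_registered = 0;

/* Runs in the child after fork(). The inherited count describes the
 * parent's connection, which the child cannot use, so reset it to let
 * the child open /dev/kfd again. */
static void child_after_fork(void) {
  kfd_open_count = 0;
}

/* Hypothetical open path: register the handler once, then take a
 * reference. A real implementation would open /dev/kfd when the count
 * goes from 0 to 1 and report failures. */
static int sketch_open(void) {
  if (!atfork_registered) {
    if (pthread_atfork(NULL, NULL, child_after_fork) != 0)
      return -1;
    atfork_registered = 1;
  }
  kfd_open_count++;
  return 0;
}
```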
Ensure that the write index and ring buffer contents are visible
to the HW before ringing the doorbell. The doorbell is a write-combined
MMIO store and must be ordered after prior cacheable non-MMIO stores.
Also be more explicit about memory semantics for doorbell stores.
Change-Id: Ie4d96a7ee2a507237a8dbe7705fdf234d62ce9ba
[ROCm/ROCR-Runtime commit: d5b4078072]
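Below is a sketch of the ordering using C11 atomics. It assumes a hypothetical `queue_t` whose doorbell pointer maps the write-combined MMIO page. The C11 memory model does not cover MMIO or write combining. The full fence here therefore stands in for whatever platform barrier the runtime actually uses. It shows where the barriers go and is not a portable guarantee. Free-space checks against the read index are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-mode queue. */
typedef struct {
  uint32_t *ring;                /* ring buffer, power-of-two dwords */
  uint32_t ring_mask;            /* ring size in dwords minus one */
  _Atomic uint64_t write_index;  /* monotonic, read by the hardware */
  volatile uint64_t *doorbell;   /* write-combined MMIO doorbell */
} queue_t;

static void submit(queue_t *q, const uint32_t *pkt, uint32_t ndw) {
  assert(ndw <= q->ring_mask + 1u);
  uint64_t wi = atomic_load_explicit(&q->write_index, memory_order_relaxed);

  /* 1. Copy the packet into the (cacheable) ring buffer. */
  for (uint32_t i = 0; i < ndw; i++)
    q->ring[(wi + i) & q->ring_mask] = pkt[i];

  /* 2. Release store: the packet writes are ordered before the new
   *    write index becomes visible. */
  atomic_store_explicit(&q->write_index, wi + ndw, memory_order_release);

  /* 3. Full fence so the cacheable stores above are ordered before the
   *    doorbell store; stands in for a platform-specific barrier. */
  atomic_thread_fence(memory_order_seq_cst);

  /* 4. Ring the doorbell with the monotonic write index. */
  *q->doorbell = wi + ndw;
}
```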
gfx802 requires a workaround for a VM TLB bug in which lookups use
the ACTIVE bit of the 8th PTE within any aligned group of 8 PTEs.
Until this is fixed in amdgpu, the GPUVM doorbell logic will fail.
Change-Id: I5ec7b1fcd8b7677011a141d27cfc486c45d9a415
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
[ROCm/ROCR-Runtime commit: 5493ae420b]
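The arithmetic behind the bug is simple. A lookup of PTE `i` consults the ACTIVE bit of PTE `i | 7`, which is the last entry of its aligned group of 8. The sketch below shows that mapping. It also shows one hypothetical mitigation: growing a mapping to whole groups so that every consulted PTE is valid. The commit does not say this is the fix, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* PTE whose ACTIVE bit the buggy gfx802 lookup consults for pte_index:
 * the 8th (last) entry of the aligned group of 8 containing it. */
static uint64_t pte_consulted(uint64_t pte_index) {
  return pte_index | 7u;
}

/* Hypothetical mitigation: widen [first, first + count) to whole aligned
 * groups of 8 so every consulted PTE belongs to the mapping. */
static void pte_cover_groups(uint64_t first, uint64_t count,
                             uint64_t *out_first, uint64_t *out_count) {
  assert(count > 0);
  uint64_t start = first & ~(uint64_t)7;
  uint64_t end = (first + count + 7) & ~(uint64_t)7; /* exclusive */
  *out_first = start;
  *out_count = end - start;
}
```

For example, a single-page doorbell mapping at PTE 9 depends on PTE 15. Widening that mapping covers PTEs 8 through 15.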
If too many copy commands are issued without synchronization and the
kernel argument storage wraps around, wait for the in-flight blits to
complete before moving forward; otherwise their kernel args would be overwritten.
Change-Id: I9a21e31ce07f8e8157ca38e96dc264ff47fd3639
[ROCm/ROCR-Runtime commit: 5519c96b74]
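Below is a sketch of the wait-before-reuse rule for a small kernel argument pool. It assumes that blits complete in submission order and that a monotonic count of completed blits is available. The pool layout, `KERNARG_SLOTS` and the function names are hypothetical. A real runtime would block on a completion signal instead of spinning.

```c
#include <stdatomic.h>
#include <stdint.h>

#define KERNARG_SLOTS 4 /* hypothetical pool size, tiny to show wrapping */

typedef struct {
  uint64_t owner[KERNARG_SLOTS]; /* 1 + index of the blit that last used
                                    the slot; 0 = never used */
  uint64_t next;                 /* monotonic slot counter */
} kernarg_pool_t;

/* True if the next slot still belongs to a blit that has not completed.
 * `completed` is the number of blits finished so far. */
static int kernarg_next_busy(const kernarg_pool_t *p, uint64_t completed) {
  uint64_t owner = p->owner[p->next % KERNARG_SLOTS];
  return owner != 0 && completed < owner;
}

/* Take the next slot for blit `blit_index`. After the pool wraps, spin
 * until the blit that previously used the slot has completed, so its
 * kernel arguments are not overwritten while it may still read them. */
static uint32_t kernarg_acquire(kernarg_pool_t *p, uint64_t blit_index,
                                _Atomic uint64_t *completed) {
  while (kernarg_next_busy(
      p, atomic_load_explicit(completed, memory_order_acquire))) {
    /* spin */
  }
  uint32_t slot = (uint32_t)(p->next++ % KERNARG_SLOTS);
  p->owner[slot] = blit_index + 1;
  return slot;
}
```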
Allocate doorbells for dGPUs in the SVM aperture and map them for
GPU access. This is necessary to allow GPU-initiated submissions to
user mode queues.
Depends on the new doorbell BO allocation flag in KFD.
Change-Id: I0737bef4a4764bb4a66c43846707ead2108f6601
[ROCm/ROCR-Runtime commit: 2e0a6eb371]
CPU cache information reported by Thunk topology is obtained from cpuid
instruction. This instruction exists only on x86, so using it causes
compile errors on non-x86 platforms. This patch temporarily disables the CPU
cache functions in topology on non-x86 platforms so that the code compiles.
Change-Id: If86671817b0d036cb324eebf3f354682bfb75856
[ROCm/ROCR-Runtime commit: 660a6ebbd4]
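Below is a sketch of the compile-time guard. On x86 it uses the GCC/Clang `<cpuid.h>` helper `__get_cpuid_max`. On other platforms it reports that no cache information is available. `HAVE_CPUID`, `max_cpuid_leaf` and `cpu_cache_info_available` are hypothetical names.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#define HAVE_CPUID 1
#else
#define HAVE_CPUID 0
#endif

/* Highest basic cpuid leaf, or 0 where the cpuid instruction does not
 * exist. Only the x86 branch references cpuid at all. */
static unsigned int max_cpuid_leaf(void) {
#if HAVE_CPUID
  return __get_cpuid_max(0, NULL);
#else
  return 0;
#endif
}

/* Topology code would skip CPU cache reporting when this returns 0. */
static int cpu_cache_info_available(void) {
  return max_cpuid_leaf() > 0;
}
```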