Calculate the actual scratch memory size required based on the
packet information for kernel dispatch.
If the required size exceeds the total allocated memory, scratch
memory must be reallocated. Otherwise, no action is needed.
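A minimal sketch of the check described above, not the code from this change. The DispatchInfo struct and both function names are hypothetical, and the sizing rule (every work-item resident at once) is an assumption; a real runtime would likely bound it by the number of waves that can run concurrently.

```cpp
#include <cstdint>

// Hypothetical, simplified view of the dispatch packet fields used here.
struct DispatchInfo {
  uint32_t private_segment_size;  // scratch bytes per work-item
  uint32_t grid_size[3];          // total work-items per dimension
};

// Scratch bytes needed if every work-item of the dispatch is resident at once.
// Assumption: the product fits in 64 bits for the grids being dispatched.
inline uint64_t RequiredScratchBytes(const DispatchInfo& d) {
  uint64_t items = uint64_t(d.grid_size[0]) * d.grid_size[1] * d.grid_size[2];
  return items * d.private_segment_size;
}

// True only when the current allocation is too small for this dispatch.
inline bool NeedsScratchRealloc(const DispatchInfo& d, uint64_t allocated_bytes) {
  return RequiredScratchBytes(d) > allocated_bytes;
}
```

When NeedsScratchRealloc returns false, the existing allocation is reused and nothing else happens.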
miopen_gtest: Full/GPU_MIOpenDriverRegressionTest_FP16.MIOpenDriverRegressionHalf/0
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
Reviewed-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Horatio Zhang <Hongkun.Zhang@amd.com>
Replace direct D3DKMT API calls with the DXCORE_CALL macro in the
WDDM thunk layer. This enables dynamic loading of DXCore functions
while keeping the same API interface.
Updated thunk functions:
- MapGpuVirtualAddress, CreateAllocation, DestroyAllocation
- ReserveGpuVirtualAddress, FreeGpuVirtualAddress
- MakeResident, Evict, ShareObjects
- QueryResourceInfoFromNtHandle, OpenResourceFromNtHandle
All existing functionality is preserved while adding flexibility
for runtime DXCore availability detection.
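The pattern can be sketched as follows. This is an illustration only: DXCORE_CALL_SKETCH, DxcoreTable, Status, and EvictArgs are hypothetical stand-ins, not the real DXCORE_CALL macro, NTSTATUS, or D3DKMT structures. On Windows the table entries would presumably be resolved with GetProcAddress after loading the DXCore library.

```cpp
#include <cstdint>

// Hypothetical status codes and argument type (stand-ins for NTSTATUS and
// the D3DKMT_* argument structures).
using Status = int32_t;
constexpr Status kOk = 0;
constexpr Status kNotAvailable = -1;
struct EvictArgs { uint32_t num_allocations; };

// Table of entry points resolved at runtime. A null entry means the
// function could not be resolved on this system.
struct DxcoreTable {
  Status (*Evict)(EvictArgs*) = nullptr;
};

inline DxcoreTable& Dxcore() {
  static DxcoreTable table;
  return table;
}

// Call through the table, or report that the entry point is unavailable.
#define DXCORE_CALL_SKETCH(fn, ...) \
  (Dxcore().fn ? Dxcore().fn(__VA_ARGS__) : kNotAvailable)

// The thunk keeps its signature; only the call site changes.
inline Status ThunkEvict(EvictArgs* args) {
  return DXCORE_CALL_SKETCH(Evict, args);
}
```

Routing every call through one macro keeps the availability check in a single place instead of repeating it in each thunk function.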
Signed-off-by: Chengjun Yao <Chengjun.Yao@amd.com>
Signed-off-by: Yang Su <Yang.Su2@amd.com>
Reviewed-by: Shi.Leslie <Yuliang.Shi@amd.com>
To support multi-GPU, the local heap and system heap managers are
implemented in the thunk runtime, so the heap allocation function
ReserveGpuVirtualAddress should move to the runtime as well.
Reviewed-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: tiancyin <tianci.yin@amd.com>
In multi-GPU, the handle aperture is shared between all GPUs rather than
belonging to one specific GPU. Move it from the wddm device (which represents
a specific GPU) to the thunk runtime, which has a global view and can manage
the handle aperture for all GPUs.
Reviewed-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: tiancyin <tianci.yin@amd.com>
In multi-GPU, the system heap space is shared between all GPUs rather than
belonging to one specific GPU. Move it from the wddm device (which represents
a specific GPU) to the thunk runtime, which has a global view and can manage
the system heap for all GPUs.
Introduce a new va_Mgr instance to manage the system heap, since the local
heap and the system heap both comply with SVM (Shared Virtual Memory).
Without this new manager, every allocation has to call KMD at least once
(one call per GPU) to allocate GPU VA. The new manager manages the space
itself and no longer calls KMD.
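A minimal sketch of a manager that hands out VA ranges without a per-allocation KMD call. VaRangeManager is a hypothetical class, not the va_Mgr from this change, and the first-fit free list is an assumed policy. It also assumes a non-zero aperture base, because 0 is used as the failure value.

```cpp
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical VA range manager over a fixed aperture [base, base + size).
class VaRangeManager {
 public:
  VaRangeManager(uint64_t base, uint64_t size) { free_[base] = size; }

  // Returns the start address, or 0 when no free range is large enough.
  // `align` must be a power of two.
  uint64_t Alloc(uint64_t size, uint64_t align) {
    if (size == 0) return 0;
    for (auto it = free_.begin(); it != free_.end(); ++it) {
      uint64_t range_base = it->first;
      uint64_t end = range_base + it->second;
      uint64_t start = (range_base + align - 1) & ~(align - 1);
      if (start >= end || end - start < size) continue;
      free_.erase(it);
      if (start > range_base) free_[range_base] = start - range_base;  // head gap
      if (end > start + size) free_[start + size] = end - (start + size);  // tail
      used_[start] = size;
      return start;
    }
    return 0;
  }

  // Returns a range to the free list, merging with adjacent free ranges.
  void Free(uint64_t addr) {
    auto u = used_.find(addr);
    if (u == used_.end()) return;
    uint64_t size = u->second;
    used_.erase(u);
    auto next = free_.lower_bound(addr);
    if (next != free_.end() && addr + size == next->first) {
      size += next->second;  // absorb the following free range
      next = free_.erase(next);
    }
    if (next != free_.begin()) {
      auto prev = std::prev(next);
      if (prev->first + prev->second == addr) {
        prev->second += size;  // extend the preceding free range
        return;
      }
    }
    free_[addr] = size;
  }

 private:
  std::map<uint64_t, uint64_t> free_;  // start -> size of free ranges
  std::map<uint64_t, uint64_t> used_;  // start -> size of live allocations
};
```

Because the address is chosen in user space, one allocation yields a single VA that every GPU can map, instead of one KMD round trip per GPU.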
Reviewed-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: tiancyin <tianci.yin@amd.com>
In multi-GPU, the local heap space is shared between all GPUs rather than
belonging to one specific GPU. Move it from the wddm device (which represents
a specific GPU) to the thunk runtime, which has a global view and can manage
the local heap for all GPUs.
Reviewed-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: tiancyin <tianci.yin@amd.com>
IPC Signal only supports a system RAM backend accessible to both CPU and GPU;
IPC Memory only supports a VRAM backend accessible to the GPU only.
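The two rules can be written as a single predicate. The enum and struct names below are hypothetical, chosen only for illustration, and do not come from the runtime.

```cpp
// Hypothetical types describing what is being shared and where it lives.
enum class IpcObject { kSignal, kMemory };
enum class Backend { kSystemRam, kVram };

struct Placement {
  Backend backend;
  bool cpu_accessible;
  bool gpu_accessible;
};

// Returns true when the placement matches the restriction stated above.
inline bool IpcPlacementSupported(IpcObject obj, const Placement& p) {
  if (obj == IpcObject::kSignal) {
    // Signals: system RAM, visible to both CPU and GPU.
    return p.backend == Backend::kSystemRam && p.cpu_accessible &&
           p.gpu_accessible;
  }
  // Memory: VRAM, visible to the GPU only.
  return p.backend == Backend::kVram && !p.cpu_accessible && p.gpu_accessible;
}
```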
Reviewed-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: tiancyin <tianci.yin@amd.com>
Legacy mode means buffer sharing through KFD: KFD provides a buffer ID
to the exporter, the exporter passes it to the importer, and the importer
passes the buffer ID to KFD to query and import the buffer.
Non-legacy mode relies on a socket to pass the dmabuf fd between processes.
In hsa-runtime, legacy mode is the default; setting the environment
variable HSA_ENABLE_IPC_MODE_LEGACY to 0 forces hsa-runtime onto the new
mode code path.
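A minimal sketch of the mode selection, assuming only the exact value "0" selects the new path; how the runtime treats other values is not stated here. Both function names are hypothetical.

```cpp
#include <cstdlib>
#include <cstring>

// Legacy mode is the default: unset, or any value other than "0", keeps it.
inline bool IpcLegacyModeFromValue(const char* value) {
  return value == nullptr || std::strcmp(value, "0") != 0;
}

// Reads the environment variable named in this commit message.
inline bool IpcLegacyModeEnabled() {
  return IpcLegacyModeFromValue(std::getenv("HSA_ENABLE_IPC_MODE_LEGACY"));
}
```

Splitting the parsing from the getenv call keeps the decision easy to test without touching the process environment.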
Reviewed-by: Flora Cui <flora.cui@amd.com>
Reviewed-by: Longlong Yao <Longlong.Yao@amd.com>
Signed-off-by: tiancyin <tianci.yin@amd.com>