Added an application and a driver to serve as the starting point for an RDMA
unit test utility.
v2: Added initial mmap support
v3: Fixed logic to find correct ioctl handler
v4: Fixed logic in mmap to find the correct page table
Change-Id: Iaf97c0eb2acef2160d542c71afed58cf400414f7
Signed-off-by: Serguei Sagalovitch <Serguei.Sagalovitch@amd.com>
[ROCm/ROCR-Runtime commit: 47cef87a34]
Stop using NUM_OF_SUPPORTED_GPUS. For now the definition itself cannot
be removed because the ioctl code is in the upstream kernel.
Change-Id: If846625a8ad5062d5483e762850c793d3c00b9d0
[ROCm/ROCR-Runtime commit: ce83dc623f]
Fix hsaKmtRegisterMemory to be a no-op for now and move the multi-GPU
implementation to hsaKmtRegisterMemoryToNodes. Make GPU memory mappings
of host memory visible to all GPUs by default. Device memory is still
visible to the allocating GPU only by default (but can be overridden
with hsaKmtRegisterMemoryToNodes for experimenting with P2P).
Change-Id: I73408afbe3b10c8dad2ab3a780f58413249692e6
[ROCm/ROCR-Runtime commit: 063ad3ad9e]
This is required during a debug session.
Change-Id: If9d6d2d23a9016b6ca9562e02a91fc16e0354ee4
Signed-off-by: Yair Shachar <Yair.Shachar@amd.com>
[ROCm/ROCR-Runtime commit: 681f4dcecc]
When the Thunk is initialized multiple times in the lifetime of a single
process, some global resources are leaked. This can happen when dlopen and
dlclose are used to load the library at runtime, rather than linking the
runtime against the Thunk. This patch adds a destructor that releases the
global resources when dlclose is called.
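A minimal sketch of the destructor pattern, assuming GCC/Clang (`__attribute__((destructor))` is a compiler extension). `g_node_table`, `thunk_open` and `thunk_destroy` are hypothetical stand-ins for the Thunk's real globals and entry points. The destructor resets the pointer so that a later dlopen/init cycle starts from a clean state.

```c
#include <stdlib.h>

/* Hypothetical global resource, standing in for the Thunk's real globals. */
static int *g_node_table;

/* Hypothetical init entry point: allocates the global state on first use. */
static int thunk_open(size_t num_nodes)
{
    if (!g_node_table)
        g_node_table = calloc(num_nodes, sizeof(*g_node_table));
    return g_node_table ? 0 : -1;
}

/* Runs when the shared object is unloaded with dlclose() or when the
 * process exits. Safe to run more than once: free(NULL) is a no-op and
 * the pointer is reset after freeing. */
static void __attribute__((destructor)) thunk_destroy(void)
{
    free(g_node_table);
    g_node_table = NULL;
}
```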
Change-Id: Ia00da0d41f095d0b2706f98c0e75effedd596f49
[ROCm/ROCR-Runtime commit: 582b70f9c3]
This also fixes out-of-bounds accesses in the functions
fmm_get_aperture_base_and_limit and fmm_release.
Change-Id: Icf064c46647e69a069126171dbacdf3d5b27f972
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
[ROCm/ROCR-Runtime commit: a4cf02d797]
dgpu_aperture and dgpu_alt_aperture will be shared by all dGPUs.
Change-Id: I814495e43b51acabdc6266cfa8d83db5a062e20d
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
[ROCm/ROCR-Runtime commit: 2903a610e1]
Break out of the for-loop once the dGPU VM range is found; otherwise the
length is halved again.
Change-Id: Ie602054c16ea69ea1cbb75e804ead551bc3615c0
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
[ROCm/ROCR-Runtime commit: 5a55383baf]
The previous code only worked on systems where shared_cpu_map lists 32 or
fewer bits. Some systems list more than 32 bits and express them as
comma-separated 32-bit hex words, XXXXXXXX,XXXXXXXX,.... This patch handles
that format. Also increase MAX_CPU_CORES and MAX_CACHES to accommodate
larger systems.
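One way to decode such a mask, shown as a sketch: split on commas and treat the leftmost word as the most significant. `parse_cpu_mask`, `cpu_in_mask` and the 8-word limit are hypothetical and not taken from the Thunk source.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MASK_WORDS 8 /* hypothetical limit: 256 CPUs */

/* Parse a sysfs mask such as "00000003,00000001" into 32-bit words where
 * mask[0] holds CPUs 0..31. Returns the number of words, or -1 on error. */
static int parse_cpu_mask(const char *str, uint32_t mask[MAX_MASK_WORDS])
{
    char buf[256];
    char *words[MAX_MASK_WORDS];
    int n = 0;

    if (strlen(str) >= sizeof(buf))
        return -1;
    strcpy(buf, str);
    buf[strcspn(buf, "\n")] = '\0';      /* sysfs values end with '\n' */

    for (char *tok = strtok(buf, ","); tok; tok = strtok(NULL, ",")) {
        if (n == MAX_MASK_WORDS)
            return -1;
        words[n++] = tok;
    }
    memset(mask, 0, MAX_MASK_WORDS * sizeof(uint32_t));
    /* The leftmost word holds the highest-numbered CPUs. */
    for (int i = 0; i < n; i++) {
        char *end;
        unsigned long v = strtoul(words[i], &end, 16);
        if (*end != '\0' || v > 0xFFFFFFFFul)
            return -1;
        mask[n - 1 - i] = (uint32_t)v;
    }
    return n;
}

/* Returns 1 if `cpu` (0 .. 32*MAX_MASK_WORDS-1) is set in the mask. */
static int cpu_in_mask(const uint32_t mask[MAX_MASK_WORDS], int cpu)
{
    return (int)((mask[cpu / 32] >> (cpu % 32)) & 1u);
}
```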
Change-Id: Ia5c7041866456a6aa3b66f8f0f951022d7c51028
[ROCm/ROCR-Runtime commit: a5bc8360e8]
Access to reserved address space that has not been allocated should
result in a segfault. Use PROT_NONE to ensure that.
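A sketch of a PROT_NONE reservation on Linux. `reserve_va` and `commit_va` are hypothetical helper names. The reserved range faults on any access until a sub-range is given read/write permission with mprotect.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS / MAP_NORESERVE */
#include <stddef.h>
#include <sys/mman.h>

/* Reserve size bytes of address space with no access rights. Any load or
 * store into the range raises SIGSEGV until part of it is committed. */
static void *reserve_va(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Make a page-aligned sub-range usable once memory is allocated there. */
static int commit_va(void *addr, size_t size)
{
    return mprotect(addr, size, PROT_READ | PROT_WRITE);
}
```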
Change-Id: Ic5da9392fabbe78c9ec14f98e8b7b47e5267a98a
[ROCm/ROCR-Runtime commit: 62337b6c0a]
Pick up the thunk from the correct location. It is no longer inside
THUNK_ROOT, but instead part of the OUT folder.
Change-Id: I41dd7dae243e66270d0ea7182f1ba119b18a1cfb
[ROCm/ROCR-Runtime commit: 3786e18d99]
Certain versions of rpmbuild need the variable to be outside of curly
braces. This change handles that case.
Change-Id: Iff7200b332b9d8e41a4d7676ca14c5a32c075beb
[ROCm/ROCR-Runtime commit: 4e4d4a81e1]
Fill in the cache properties of the CPU node by reading data from /proc/cpuinfo
and /sys/devices/system/cpu/cpuX/cache/indexY.
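A sketch of reading one cache property from sysfs, assuming the kernel reports the `size` file as "<number>K" (for example "32K"). `parse_cache_size_kb` and `read_cache_size_kb` are hypothetical helpers, and /proc/cpuinfo parsing is not shown.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse a sysfs cache "size" string such as "32K\n" into KiB.
 * Returns -1 if the value is not of the assumed "<number>K" form. */
static long parse_cache_size_kb(const char *s)
{
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || v < 0 || *end != 'K')
        return -1;
    return v;
}

/* Read /sys/devices/system/cpu/cpu<cpu>/cache/index<idx>/size.
 * Returns -1 if the file is missing or malformed. */
static long read_cache_size_kb(int cpu, int idx)
{
    char path[128], buf[32];
    FILE *f;
    long kb = -1;

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cache/index%d/size", cpu, idx);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fgets(buf, sizeof(buf), f))
        kb = parse_cache_size_kb(buf);
    fclose(f);
    return kb;
}
```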
Change-Id: I0a96760575e504e38962554f192c3fe66bea3c15
[ROCm/ROCR-Runtime commit: b6f65f9849]
By adding REL=1 to the make command line (e.g. make REL=1 deb), we can
create a release build of the Thunk. This will not affect existing
functionality, and will only have an effect if REL=1 is specified on the
command line, or in the build_thunk.sh script.
Change-Id: Iedc3b6094e70a4ebd726499eda56013cc254b83d
[ROCm/ROCR-Runtime commit: cb3a664065]
Passing in the wrong aperture resulted in failure to unmap scratch.
Change-Id: Icd7423abfb1bcc773b33becffcbefc233f4ff340
[ROCm/ROCR-Runtime commit: bd93eecc64]
Add an option to libhsakmt to allow the thunk to be packaged as an RPM.
The default build is unchanged, but the thunk can now be packaged
as an RPM by using "make src rpm". build_thunk.sh will be updated to
reflect this new option.
Change-Id: I38e03d10cfb5035bdf0a87635a784c47a709a5b6
[ROCm/ROCR-Runtime commit: 6ceed7def3]
hsaKmtGetNodeMemoryProperties:
- Return only HSA_HEAPTYPE_SYSTEM memory for a CPU-only node.
- For dGPUs, remove the redundant HSA_HEAPTYPE_FRAME_BUFFER_PRIVATE
  entry.
Change-Id: I0349be39b8409a0fd64a038b8b2956191356d937
[ROCm/ROCR-Runtime commit: f885e551aa]
Unify fmm_get_aperture_base and fmm_get_aperture_limit into one
function, and make it return HSAKMT_STATUS.
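A sketch of the combined-getter shape: one call returns both values through out-parameters and reports errors through a status code, which two separate getters returning raw addresses cannot do. `status_t`, `struct aperture_range` and the table layout are hypothetical stand-ins for HSAKMT_STATUS and the Thunk's aperture data. The index check also illustrates the kind of bounds check that prevents out-of-bounds reads.

```c
#include <stdint.h>

/* Hypothetical stand-ins for HSAKMT_STATUS values. */
typedef enum { STATUS_SUCCESS = 0, STATUS_INVALID_PARAMETER = 1 } status_t;

/* Hypothetical aperture record: limit is inclusive. */
struct aperture_range {
    uint64_t base, limit;
};

/* Return both base and limit of aperture `idx` in one call. */
static status_t get_aperture_base_and_limit(const struct aperture_range *tbl,
                                            unsigned count, unsigned idx,
                                            uint64_t *base, uint64_t *limit)
{
    if (!tbl || idx >= count || !base || !limit)  /* reject bad input */
        return STATUS_INVALID_PARAMETER;
    *base = tbl[idx].base;
    *limit = tbl[idx].limit;
    return STATUS_SUCCESS;
}
```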
Change-Id: I0b3f563ffb268947ab891f4935f61788d0af0e01
[ROCm/ROCR-Runtime commit: cb53548c89]
hsaKmtAllocMemory only allocates aligned address space and sets up
the scratch_physical aperture to match the allocated address space.
Actual allocation of backing memory happens in hsaKmtMapMemoryToGPU.
Change-Id: Ie709815ab9bedb3d682e096b4005fdfb5e94d3a7
[ROCm/ROCR-Runtime commit: 5131ab4e64]
This is a hack to allow the Runtime to allocate system memory with
PreferredNode=0 on a dGPU system. We allocate it from node 1
instead so that the node 1 GPU can map the memory. A proper fix
will be implemented together with multi-GPU support.
Change-Id: Ieb52599e5275781c04ee34405ea850bf782c523a
[ROCm/ROCR-Runtime commit: 590c8e522c]
Try to reserve as much SVM address space as GPUVM can address.
Implement a fallback scheme to smaller sizes if larger allocations
fail or are not addressable by the GPU, down to an (arbitrary)
minimum of 4GB.
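A sketch of the fallback loop, assuming plain mmap reservations. `reserve_svm` and its parameters are hypothetical: the sizes are arguments here rather than the GPUVM size and the 4GB floor. Returning as soon as a usable range is found plays the role of the loop break described in the earlier dGPU VM range fix; without it, the length would be halved again. The addressability test treats `gpuvm_limit` as an inclusive limit.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS / MAP_NORESERVE */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Try to reserve max_size bytes of PROT_NONE address space, halving the
 * size after each failure, down to min_size. A reservation is accepted
 * only if its last byte is <= gpuvm_limit (inclusive). */
static void *reserve_svm(size_t max_size, size_t min_size,
                         uintptr_t gpuvm_limit, size_t *out_size)
{
    if (min_size == 0)
        return NULL;
    for (size_t len = max_size; len >= min_size; len /= 2) {
        void *p = mmap(NULL, len, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED)
            continue;                      /* too big: try half the size */
        if ((uintptr_t)p + (len - 1) <= gpuvm_limit) {
            *out_size = len;
            return p;                      /* found: stop, don't halve again */
        }
        munmap(p, len);                    /* not GPU-addressable */
    }
    return NULL;
}
```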
Change-Id: I770177834cc9e6ddd6ef4f20d789eab63c8055cb
[ROCm/ROCR-Runtime commit: 39bde26c9b]
When 'make deb' is run, create a libhsakmt.deb archive that installs
libhsakmt into the appropriate folder on the target where the dynamic
linker can find it.
Change-Id: I32de7198975f7831e509a67371e78456982b5c42
[ROCm/ROCR-Runtime commit: 0df346aaf9]
The kernel ioctl AMDKFD_IOC_GET_PROCESS_APERTURES returns process apertures
only for GPU nodes. The current implementation assumed that the list of
GPU nodes returned by the ioctl has a one-to-one correspondence to the sysfs
topology nodes. This fails when non-GPU nodes exist in the topology, as in
the case of Intel + gfx802.
Fix this by using gpu_id (./sys/.../kfd/topology/nodes/1/gpu_id) to map
the information obtained from the kernel ioctl call.
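A sketch of matching by gpu_id instead of by position. `struct gpu_aperture` is a hypothetical, simplified stand-in for the ioctl's per-GPU record, and the sketch assumes that CPU-only topology nodes report gpu_id 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified per-GPU aperture record from the ioctl. */
struct gpu_aperture {
    uint32_t gpu_id;
    uint64_t gpuvm_base, gpuvm_limit;
};

/* For each topology node, store the index of the ioctl entry with the same
 * gpu_id in node_to_ap (-1 if none). Nodes with gpu_id 0 are treated as
 * CPU-only. Returns the number of GPU nodes left unmatched. */
static int map_apertures_to_nodes(const uint32_t *node_gpu_id, size_t num_nodes,
                                  const struct gpu_aperture *ap, size_t num_ap,
                                  int *node_to_ap)
{
    int unmatched = 0;

    for (size_t n = 0; n < num_nodes; n++) {
        node_to_ap[n] = -1;
        if (node_gpu_id[n] == 0)
            continue;                      /* CPU node: no aperture */
        for (size_t i = 0; i < num_ap; i++) {
            if (ap[i].gpu_id == node_gpu_id[n]) {
                node_to_ap[n] = (int)i;
                break;
            }
        }
        if (node_to_ap[n] < 0)
            unmatched++;
    }
    return unmatched;
}
```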
Change-Id: I4ab8ae5354f12cf0b6609fc4b24182b82eb3677f
[ROCm/ROCR-Runtime commit: 5cc56a2647]
Fix the TONGA_PAGE_SIZE value and move it to libhsakmt.h so that it is used
consistently in all places that require the same alignment for the
same reason. Create a generic alignment helper macro to replace some
incorrect hand-coded size alignments.
Move virtual address and size alignments down into aperture management
functions. Alignment is a per-aperture property that is set during
fmm_init_process_apertures. Doing the alignment there ensures that
all allocations in the same aperture are aligned the same way. Finding
objects by size and address can take the alignment into account.
Also align the size of physical allocations to back aligned virtual
address allocations. CPU mappings do not need to be aligned.
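A sketch of per-aperture alignment, assuming power-of-two alignments. `align_up`, `struct aperture`, `aperture_init` and `aperture_alloc` are hypothetical, and the bump allocator is only for illustration. Because the alignment is stored once per aperture, every address and size handed out from that aperture is rounded the same way.

```c
#include <stdint.h>

/* Round x up to a multiple of align; align must be a power of two. */
static inline uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Hypothetical aperture with one alignment shared by all its objects. */
struct aperture {
    uint64_t base, limit;   /* limit is inclusive */
    uint64_t align;         /* set once at init */
    uint64_t next;          /* bump pointer, for illustration only */
};

static void aperture_init(struct aperture *ap, uint64_t base,
                          uint64_t limit, uint64_t align)
{
    ap->base = align_up(base, align);
    ap->limit = limit;
    ap->align = align;
    ap->next = ap->base;
}

/* Allocate address space whose start and size are both aligned to the
 * aperture's alignment. Returns 0 on failure. */
static uint64_t aperture_alloc(struct aperture *ap, uint64_t size,
                               uint64_t *aligned_size)
{
    uint64_t sz = align_up(size, ap->align);
    uint64_t addr = ap->next;

    if (sz == 0 || addr > ap->limit || sz - 1 > ap->limit - addr)
        return 0;
    ap->next = addr + sz;
    *aligned_size = sz;
    return addr;
}
```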
Map anonymous pages over released memory mappings to allow the
backing pages to be released, while keeping the address space
reserved.
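A sketch of releasing backing pages while keeping the range reserved, assuming Linux mmap semantics. `release_keep_reserved` is a hypothetical helper. MAP_FIXED replaces the old pages with fresh anonymous PROT_NONE pages in one step, so the address range never becomes free for another mapping to take.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS / MAP_NORESERVE */
#include <stddef.h>
#include <sys/mman.h>

/* Drop the backing pages of [addr, addr + size) but keep the range
 * reserved. addr and size must be page aligned.
 * Returns 0 on success, -1 on error. */
static int release_keep_reserved(void *addr, size_t size)
{
    void *p = mmap(addr, size, PROT_NONE,
                   MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE,
                   -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}
```

After the release, the old contents are gone: if the range is later made accessible again, it reads as zero-filled anonymous memory.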
Add an alignment parameter to free_exec_aligned_memory_gpu to match the
interface of allocate_exec_aligned_memory_cpu. It doesn't make sense
to allow an alignment parameter in one but assume a specific
alignment in the other.
Change-Id: I74226ca6938f4948f643e5aee1d474720cd89e78
[ROCm/ROCR-Runtime commit: 6a5ca4bc5a]
Create a new device_info entry and add the device ID. Add helper macros to
identify chip families (VI, discrete). For now gfx803 behaves like
gfx802, but if necessary we can add gfx802- or gfx803-specific
code paths or workarounds in the future.
Change-Id: I61b4ffef7dd7796bb34cb01fbff0089bd49507bb
[ROCm/ROCR-Runtime commit: 0fc0a5b526]
hsa_gfxip_table lists only supported GPUs, so fail the assertion only when an
unsupported GPU is detected.
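A sketch of the lookup rule: CPU-only nodes are never looked up, so only a GPU missing from the table counts as a failure. `struct gfxip_entry`, `find_gfxip` and `check_node` are hypothetical. The sketch assumes CPU-only nodes report device ID 0, and the IDs in the test are placeholders, not real hardware IDs. Returning a code instead of asserting keeps the sketch testable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry for a supported GPU. */
struct gfxip_entry {
    uint16_t device_id;
    const char *name;
};

/* Returns the entry for device_id, or NULL if it is not in the table. */
static const struct gfxip_entry *find_gfxip(const struct gfxip_entry *tbl,
                                            size_t n, uint16_t device_id)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].device_id == device_id)
            return &tbl[i];
    return NULL;
}

/* 0 = CPU-only node (not expected in the table, no failure),
 * 1 = supported GPU, -1 = unsupported GPU (the only failure case). */
static int check_node(const struct gfxip_entry *tbl, size_t n,
                      uint16_t device_id)
{
    if (device_id == 0)
        return 0;
    return find_gfxip(tbl, n, device_id) ? 1 : -1;
}
```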
Change-Id: I6207dc7cd55860c8b3348b6a4ca6102131975722
[ROCm/ROCR-Runtime commit: 758824db17]
The default is non-coherent access for better performance on dGPU.
Disable the hsaKmtSetMemoryPolicy function on dGPU to prevent applications
from overriding the APE1 settings at runtime.
Fix the dGPU VM aperture limit to be inclusive.
Change-Id: I378ff74a654f533572775c0c97c19779a56bc6d9
[ROCm/ROCR-Runtime commit: 8e836f8183]
1. Add IOCTL defines to set the trap handler.
2. Add control stack size information to the create queue arguments.
3. Increase the total save/restore area size for Carrizo to include the control stack size.
Signed-off-by: Shaoyun Liu <Shaoyun.liu@amd.com>
Change-Id: Iccf15e073b7db2519e96e7f7b46a89d57ab9a4df
[ROCm/ROCR-Runtime commit: 2d63ee7b8f]