When the Thunk is initialized multiple times during the lifetime of a single
process, some global resources are leaked. This can happen when dlopen and
dlclose are used to load the library at runtime, rather than linking the
runtime against the Thunk. This patch adds a destructor that releases global
resources when dlclose is called.
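A minimal sketch of the destructor approach, assuming GCC/Clang's `__attribute__((destructor))`, which runs when a shared object is unloaded by dlclose (or at process exit if it is still loaded). `global_buffer`, `thunk_init`, `thunk_get_buffer` and `thunk_fini` are hypothetical names, not the Thunk's real symbols. Resetting the pointer to NULL is what lets a later dlopen re-initialize cleanly, the same concern raised for the static events_page in the patch notes further down.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical global resource, allocated once per library load. */
static void *global_buffer = NULL;

/* Hypothetical init entry point; safe to call more than once. */
int thunk_init(void)
{
    if (global_buffer != NULL)
        return 0;                       /* already initialized */
    global_buffer = malloc(4096);
    return global_buffer != NULL ? 0 : -1;
}

void *thunk_get_buffer(void)
{
    assert(global_buffer != NULL && "thunk_init() was not called");
    return global_buffer;
}

/* Runs when the library is unloaded by dlclose(), or at process exit if
 * the library is still loaded. Resetting the pointer lets the next
 * dlopen() start from a clean state instead of reusing a stale value. */
__attribute__((destructor))
static void thunk_fini(void)
{
    free(global_buffer);
    global_buffer = NULL;
}
```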
Change-Id: Ia00da0d41f095d0b2706f98c0e75effedd596f49
This also fixes out-of-bounds accesses in the functions
fmm_get_aperture_base_and_limit and fmm_release.
Change-Id: Icf064c46647e69a069126171dbacdf3d5b27f972
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
dgpu_aperture and dgpu_alt_aperture will be shared by all dGPUs.
Change-Id: I814495e43b51acabdc6266cfa8d83db5a062e20d
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Break out of the for-loop once the dGPU VM range is found; otherwise the
length keeps being reduced by half.
Change-Id: Ie602054c16ea69ea1cbb75e804ead551bc3615c0
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
The previous code only worked on systems where shared_cpu_map lists 32 or
fewer bits. Some systems list more than 32 bits and express them as
comma-separated 32-bit groups: XXXXXXXX,XXXXXXXX,.... This patch adds that
calculation. It also increases MAX_CPU_CORES and MAX_CACHES to accommodate
larger systems.
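A minimal parser sketch for the multi-group format, assuming the standard Linux sysfs cpumask layout: 32-bit hex words separated by commas, with the rightmost word holding CPUs 0-31. `cpu_in_mask` is a hypothetical helper, not the Thunk's actual function. Walking from the right and resetting the digit counter at each comma keeps the bit numbering correct even when the leftmost group is shorter than 8 digits.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return 1 if `cpu` is set in a sysfs cpumask string such as
 * "00000001,00000000\n", 0 if it is not, or -1 if a malformed digit is
 * encountered before `cpu` is reached. The rightmost group holds CPUs
 * 0-31, the next group to the left CPUs 32-63, and so on. */
static int cpu_in_mask(const char *mask, unsigned int cpu)
{
    size_t len;
    unsigned int group = 0;   /* 32-bit group index, counted from the right */
    unsigned int digit = 0;   /* hex digit index within the current group */

    assert(mask != NULL);
    len = strcspn(mask, "\n");          /* ignore the trailing newline */

    /* Walk from the least significant (rightmost) hex digit leftwards. */
    for (size_t i = len; i-- > 0;) {
        unsigned char c = (unsigned char)mask[i];
        unsigned int bit, nibble;

        if (c == ',') {                 /* move on to the next 32-bit group */
            group++;
            digit = 0;
            continue;
        }
        if (!isxdigit(c) || digit >= 8)
            return -1;                  /* bad character or group > 32 bits */

        nibble = isdigit(c) ? (unsigned int)(c - '0')
                            : (unsigned int)(tolower(c) - 'a' + 10);
        bit = group * 32 + digit * 4;   /* CPU number of this digit's bit 0 */
        if (cpu >= bit && cpu < bit + 4)
            return (int)((nibble >> (cpu - bit)) & 1u);
        digit++;
    }
    return 0;                           /* CPU lies beyond the mask */
}
```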
Change-Id: Ia5c7041866456a6aa3b66f8f0f951022d7c51028
Access to reserved address space that has not been allocated should
result in a segfault. Use PROT_NONE to ensure that.
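A minimal sketch of reserving address space with PROT_NONE on Linux, using POSIX mmap/mprotect. `reserve_va` and `commit_va` are hypothetical helper names, and MAP_NORESERVE is an assumption about how one would avoid committing swap for the reservation; the Thunk's real allocator may differ. Until `commit_va` is called on a sub-range, any access to the reservation raises SIGSEGV.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve `size` bytes of address space without making it accessible.
 * PROT_NONE makes every access fault until a range is committed. */
static void *reserve_va(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* "Allocate" a page-aligned sub-range of a reservation by enabling access. */
static int commit_va(void *addr, size_t size)
{
    assert((uintptr_t)addr % (uintptr_t)sysconf(_SC_PAGESIZE) == 0);
    return mprotect(addr, size, PROT_READ | PROT_WRITE);
}
```

Touching `base[0]` after only committing the second page would still segfault, which is exactly the behavior the commit asks for.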
Change-Id: Ic5da9392fabbe78c9ec14f98e8b7b47e5267a98a
Pick up the thunk from the correct location. It is no longer inside
THUNK_ROOT, but instead part of the OUT folder.
Change-Id: I41dd7dae243e66270d0ea7182f1ba119b18a1cfb
Certain versions of rpmbuild require the variable to be outside of curly
braces. This change addresses that issue for those versions.
Change-Id: Iff7200b332b9d8e41a4d7676ca14c5a32c075beb
Fill in the cache properties of the CPU node by reading data from
/proc/cpuinfo and /sys/devices/system/cpu/cpuX/cache/indexY.
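A minimal sketch of the sysfs half of this, assuming the Linux `size` attribute under cache/indexY holds a string such as "32K\n". The helpers `parse_cache_size_kb` and `read_cache_size_kb` are hypothetical, and the /proc/cpuinfo part is not shown.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a sysfs cache size string such as "32K\n" or "8M" into KiB.
 * Returns -1 on malformed input. */
static long parse_cache_size_kb(const char *s)
{
    char *end;
    long value;

    assert(s != NULL);
    value = strtol(s, &end, 10);
    if (end == s || value < 0)
        return -1;                      /* no digits, or negative */
    switch (*end) {
    case 'K': return value;
    case 'M': return value * 1024;
    default:  return -1;                /* unknown or missing unit */
    }
}

/* Read .../cpu<cpu>/cache/index<index>/size; returns -1 if unavailable. */
static long read_cache_size_kb(int cpu, int index)
{
    char path[128], buf[32];
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/cache/index%d/size", cpu, index);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_cache_size_kb(buf);
}
```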
Change-Id: I0a96760575e504e38962554f192c3fe66bea3c15
By adding REL=1 to the make command line (e.g. make REL=1 deb), we can
create a release build of the Thunk. This does not affect existing
functionality and only takes effect if REL=1 is specified on the
command line or in the build_thunk.sh script.
Change-Id: Iedc3b6094e70a4ebd726499eda56013cc254b83d
Add an option to libhsakmt to allow the thunk to be packaged as an RPM.
The default build remains unchanged, but the thunk can now be packaged
as an RPM by running "make src rpm". build_thunk.sh will be modified to
reflect this new option.
Change-Id: I38e03d10cfb5035bdf0a87635a784c47a709a5b6
hsaKmtGetNodeMemoryProperties:
- Return only HSA_HEAPTYPE_SYSTEM memory for a CPU-only node.
- For dGPU, remove the redundant HSA_HEAPTYPE_FRAME_BUFFER_PRIVATE
entry.
Change-Id: I0349be39b8409a0fd64a038b8b2956191356d937
Unify fmm_get_aperture_base and fmm_get_aperture_limit into one
function. Make it return HSAKMT_STATUS.
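A minimal sketch of what a unified lookup can look like, also showing the kind of index check behind the out-of-bounds fix mentioned earlier. `status_t`, `aperture_t`, `NUM_NODES` and `get_aperture_base_and_limit` are hypothetical local stand-ins, not the Thunk's HSAKMT_STATUS codes or its real aperture table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Thunk's status codes and aperture table. */
typedef enum {
    STATUS_SUCCESS = 0,
    STATUS_INVALID_PARAMETER,
    STATUS_INVALID_NODE_UNIT
} status_t;

typedef struct {
    uint64_t base;
    uint64_t limit;     /* inclusive: last valid address */
} aperture_t;

#define NUM_NODES 4
static aperture_t apertures[NUM_NODES];

/* One call returns both values plus a status, so callers check a single
 * result; the node check prevents reading past the end of the table. */
static status_t get_aperture_base_and_limit(uint32_t node, uint64_t *base,
                                            uint64_t *limit)
{
    if (base == NULL || limit == NULL)
        return STATUS_INVALID_PARAMETER;
    if (node >= NUM_NODES)
        return STATUS_INVALID_NODE_UNIT;
    *base = apertures[node].base;
    *limit = apertures[node].limit;
    return STATUS_SUCCESS;
}
```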
Change-Id: I0b3f563ffb268947ab891f4935f61788d0af0e01
hsaKmtAllocMemory only allocates aligned address space and sets up
the scratch_physical aperture to match the allocated address space.
Actual allocation of backing memory happens in hsaKmtMapMemoryToGPU.
Change-Id: Ie709815ab9bedb3d682e096b4005fdfb5e94d3a7
This is a hack to allow the Runtime to allocate system memory with
PreferredNode=0 on a dGPU system. We allocate it from node 1
instead so that the node 1 GPU can map the memory. A proper fix
will be implemented together with multi-GPU support.
Change-Id: Ieb52599e5275781c04ee34405ea850bf782c523a
Try to reserve as much SVM address space as GPUVM can address.
Implement a fallback scheme to smaller sizes if larger allocations
fail or are not addressable by the GPU, down to an (arbitrary)
minimum of 4GB.
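A minimal sketch of the fallback scheme, assuming a 64-bit Linux process and an inclusive GPU VA limit. `reserve_svm` and `MIN_SVM_SIZE` are hypothetical names. It also shows the point of the earlier "break from the for-loop" fix: once a usable range is found the function returns immediately instead of continuing to halve the length.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define MIN_SVM_SIZE (4ULL << 30)   /* the (arbitrary) 4GB floor */

/* Try to reserve `max_size` bytes whose last byte is at or below
 * `gpu_va_limit` (inclusive). If mmap fails or the range is not fully
 * GPU-addressable, halve the size and retry, down to MIN_SVM_SIZE.
 * Returns the base and stores the reserved size, or NULL and 0. */
static void *reserve_svm(uint64_t max_size, uint64_t gpu_va_limit,
                         uint64_t *reserved)
{
    assert(reserved != NULL);
    for (uint64_t len = max_size; len >= MIN_SVM_SIZE; len /= 2) {
        void *p = mmap(NULL, (size_t)len, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED)
            continue;                          /* try a smaller size */
        if ((uint64_t)(uintptr_t)p + len - 1 > gpu_va_limit) {
            munmap(p, (size_t)len);            /* not GPU-addressable */
            continue;
        }
        *reserved = len;
        return p;                              /* found: stop halving */
    }
    *reserved = 0;
    return NULL;
}
```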
Change-Id: I770177834cc9e6ddd6ef4f20d789eab63c8055cb
When 'make deb' is run, create a libhsakmt.deb archive that installs
libhsakmt into the appropriate folder on the target where the dynamic
linker can find it.
Change-Id: I32de7198975f7831e509a67371e78456982b5c42
Kernel ioctl AMDKFD_IOC_GET_PROCESS_APERTURES returns process apertures
only for GPU nodes. The current implementation assumed that the list of
GPU nodes returned by the ioctl corresponds one-to-one to the sysfs
topology nodes. This fails when non-GPU nodes exist in the topology, as in
the case of Intel + gfx802.
Fix this by using gpu_id (./sys/.../kfd/topology/nodes/1/gpu_id) to map
the information obtained from the kernel ioctl call.
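A minimal sketch of mapping by gpu_id instead of by position. The structs `ioctl_aperture` and `topo_node` and the function `map_apertures_by_gpu_id` are hypothetical shapes, not the kernel's or the Thunk's real types; the sketch assumes CPU-only topology nodes report gpu_id 0. With a CPU node at index 0 and the GPU at index 1, entry 0 of the ioctl result lands on node 1, which positional matching would get wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shapes: the ioctl returns one entry per GPU tagged with its
 * gpu_id; the topology has one entry per node, and CPU-only nodes are
 * assumed to report gpu_id == 0. */
struct ioctl_aperture { uint32_t gpu_id; uint64_t lds_base; };
struct topo_node { uint32_t gpu_id; const struct ioctl_aperture *aperture; };

/* Attach each ioctl entry to the topology node with the same gpu_id rather
 * than assuming entry i belongs to node i. Returns the number matched. */
static size_t map_apertures_by_gpu_id(struct topo_node *nodes, size_t num_nodes,
                                      const struct ioctl_aperture *ap,
                                      size_t num_ap)
{
    size_t matched = 0;

    assert(nodes != NULL || num_nodes == 0);
    for (size_t i = 0; i < num_ap; i++) {
        for (size_t n = 0; n < num_nodes; n++) {
            if (nodes[n].gpu_id != 0 && nodes[n].gpu_id == ap[i].gpu_id) {
                nodes[n].aperture = &ap[i];
                matched++;
                break;
            }
        }
    }
    return matched;
}
```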
Change-Id: I4ab8ae5354f12cf0b6609fc4b24182b82eb3677f
Fix the TONGA_PAGE_SIZE value and move it to libhsakmt.h so that it is used
consistently in all places that require the same alignment for the
same reason. Create a generic alignment helper macro to replace some
incorrect hand-coded size alignments.
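A minimal sketch of a generic power-of-two alignment macro and of a size-aware object lookup built on it. `ALIGN_UP`, `struct vm_object` and `object_matches` are hypothetical names, and the alignment values are illustrative only, not the actual TONGA_PAGE_SIZE. Comparing against the aligned size is one way a lookup can tolerate the padding described in item 3 of the patch notes below, where fmm_release could not find an object whose size had been rounded up.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of `align`, which must be a power of two. */
#define ALIGN_UP(x, align) (((x) + ((align) - 1)) & ~((uint64_t)(align) - 1))

/* Hypothetical object record: `size` is stored already aligned. */
struct vm_object { uint64_t start; uint64_t size; };

/* Match by address and by the caller's size after applying the aperture
 * alignment, so an unaligned size passed at release time still finds the
 * object that was allocated with the padded size. */
static int object_matches(const struct vm_object *obj, uint64_t addr,
                          uint64_t size, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return obj->start == addr && obj->size == ALIGN_UP(size, align);
}
```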
Move virtual address and size alignments down into aperture management
functions. Alignment is a per-aperture property that is set during
fmm_init_process_apertures. Doing the alignment there ensures that
all allocations in the same aperture are aligned the same way. Finding
objects by size and address can take the alignment into account.
Also align the size of physical allocations to back aligned virtual
address allocations. CPU mappings do not need to be aligned.
Map anonymous pages over released memory mappings to allow the
backing pages to be released, while keeping the address space
reserved.
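A minimal sketch of this technique on Linux: mapping a fresh anonymous PROT_NONE region over the old mapping with MAP_FIXED replaces it in one step, so the old backing pages can be freed while no other mmap call can claim the addresses. `release_keep_reserved` is a hypothetical name, and MAP_NORESERVE is an assumption rather than something the commit specifies.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Drop the backing pages of [addr, addr + size) but keep the range
 * reserved: MAP_FIXED replaces the existing mapping in place with an
 * inaccessible anonymous one. Returns 0 on success, -1 on failure. */
static int release_keep_reserved(void *addr, size_t size)
{
    void *p;

    assert(addr != NULL && size != 0);
    p = mmap(addr, size, PROT_NONE,
             MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}
```

If the range is later re-enabled with mprotect, it reads back as zero-filled memory, since the old pages are gone.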
Add an alignment parameter to free_exec_aligned_memory_gpu to match the
interface of allocate_exec_aligned_memory_cpu. It doesn't make sense
to allow an alignment parameter in one but assume a specific
alignment in the other.
Change-Id: I74226ca6938f4948f643e5aee1d474720cd89e78
Create a new device_info and add the device ID. Add helper macros to
identify chip families (VI, discrete). For now, gfx803 behaves like
gfx802, but if necessary we can add gfx802- or gfx803-specific
code paths or workarounds in the future.
Change-Id: I61b4ffef7dd7796bb34cb01fbff0089bd49507bb
hsa_gfxip_table lists only supported GPUs, so the assertion should fail
only when an unsupported GPU is detected.
Change-Id: I6207dc7cd55860c8b3348b6a4ca6102131975722
The default is non-coherent access for better performance on dGPU.
Disabled the hsaKmtSetMemoryPolicy function on dGPU to prevent applications
from overriding the APE1 settings at runtime.
Fixed the dGPU VM aperture limit to be inclusive.
Change-Id: I378ff74a654f533572775c0c97c19779a56bc6d9
1. Add IOCTL defines to set the trap handler.
2. Add control stack size information to the create queue arguments.
3. Increase the total save & restore area size for Carrizo to include the control stack size.
Signed-off-by: Shaoyun Liu <Shaoyun.liu@amd.com>
Change-Id: Iccf15e073b7db2519e96e7f7b46a89d57ab9a4df
HSA_ENGINE_ID in Perforce has ui32 added to the typedef, while the Git version
does not. This causes conflicts for RT applications. The decision is to change
Git to match Perforce.
Change-Id: I7e9c6437b023bb23ec9578737f8534e9453589b9
Currently, the kernel imposes a limit on the VM range, and the Thunk should be
aware of it. This fix is required until the kernel VM limit is resolved.
For now, both "Host Access" memory and "Local Memory" share the same VM range.
Change-Id: I5a9220face20df9ede2b78bd6201a01dd2ea70e0
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Memory size is 64-bit, so use HSAuint64 instead of uint32.
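A short worked example of why the width matters (the same bug appears as item 1 in the patch notes below): 8 GiB is 2^33 bytes, so keeping only the low 32 bits leaves 0, and 4 GiB + 4 KiB becomes just 4 KiB. The `HSAuint64` typedef here is a local stand-in for the Thunk header's definition, and `keep_low_32_bits` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t HSAuint64;   /* local stand-in for the header's typedef */

/* What a 32-bit field would retain of a 64-bit memory size. */
static uint32_t keep_low_32_bits(HSAuint64 size)
{
    return (uint32_t)size;    /* the high 32 bits are silently discarded */
}
```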
Change-Id: Iaa607dec9c1a1c5ac46ea442fd482210ea550b45
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Patch submitted by Besar Wicaksono
1. Bug in detecting the local memory size: it is interpreted as a 32-bit
value instead of a 64-bit one. The bug causes the thunk to go into an
infinite loop trying to reserve a virtual address range for dGPU system memory.
2. The SIMD count in the node properties is 0. The runtime uses this
attribute to find a GPU device.
Regarding other attributes of the Intel+Tonga topology, Harish started a
discussion in August, IIRC; could you please share an update?
This would help me progress with more tests such as scratch memory,
which require the scratch aperture information in order to construct a
buffer SRD in GPUVM space.
3. Bug in releasing memory via fmm_release, where no actual release is
done. The vm_object can't be found because the memory size does
not match: the allocation padded the size by 32KB.
4. Pointer arithmetic on vm_area allocation/release. The value of
vm_area_t::end seems to be interpreted inconsistently, sometimes as
(start + size - 1) and sometimes as (start + size).
One potential issue I see is that the logic could report a
larger hole size in the vm area list.
5. Resource cleanup on multiple library load/unload cycles within a single
process.
- Any memory allocation after a subsequent library load results in
the error "va above limit". To my understanding this is because
the address space reserved for system memory is not released on unload.
- The static variable events_page needs to be invalidated
appropriately on library unload so that the next load can
reinitialize it.
6. Could you please give an update on whether the AQL queue is ready to
test with the stg kfd/kmt?
7. The system memory allocation with size larger than 32KB seems to be
padded by an extra 32KB. I was wondering if we could remove this
overhead.
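A minimal sketch of item 4 using one convention consistently: an inclusive `end` (the last valid byte, start + size - 1), matching the inclusive aperture limit adopted elsewhere in this log. `struct vm_area`, `area_size` and `hole_size` are hypothetical names, not the Thunk's vm_area_t API. If one routine treats `end` as exclusive while another treats it as inclusive, the computed hole is off by one byte, which is the kind of inconsistency item 4 describes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical vm_area record with an inclusive end: the last valid byte. */
struct vm_area { uint64_t start; uint64_t end; };

static uint64_t area_size(const struct vm_area *a)
{
    return a->end - a->start + 1;
}

/* Free bytes between two adjacent areas, `prev` below `next`. With an
 * inclusive end the first free byte is prev->end + 1, so the hole spans
 * [prev->end + 1, next->start - 1]. */
static uint64_t hole_size(const struct vm_area *prev, const struct vm_area *next)
{
    assert(prev->end < next->start);
    return next->start - prev->end - 1;
}
```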
Change-Id: I039988d36637525089c7569dc3b77e58750e2121
The Makefile currently sends build output to a default location.
Allow the build output location to be chosen through a variable,
if so desired.
Signed-off-by: David Ogbeide <davidboyowa.ogbeide@amd.com>