İşleme Grafiği

103 İşleme

Yazar SHA1 Mesaj Tarih
Felix Kuehling cc9fc386bd Remove gfx802 page size workaround on gfx803
All tonga page size alignment is done in the memory management
functions in fmm.c. All other code only specifies the minimum
alignment it needs and lets fmm.c handle the HW-specific
alignment.

Clean up aligned-exec memory allocation in queue.c to remove
hard-coded TONGA_PAGE_SIZE alignments and remove code duplication.
Make sure alignments are consistent between allocate and free.

Change-Id: Ia8923448173d1cef315af24cebff12adef385cb0
2016-01-28 16:05:18 -05:00
David Ogbeide 4e4a881940 libhsakmt: Add marketing names for GPU nodes
HSA thunk API returns null when querying for GPU node marketing
names due to empty system topology file.

- Add marketing names to device GFX IP data structs.
- Modify name retrieval to pull from data structs instead of file.



Signed-off by: David Ogbeide <davidboyowa.ogbeide@amd.com>

Change-Id: I30ea04111be7e0df2e93894f801fbeb414ffa790
2016-01-25 11:03:54 -05:00
Felix Kuehling cc7491ec71 Link libhsakmt with -z nodelete
This prevents the library from being unloaded at runtime, even when
dlclose is called. This preserves global variables, such as state
about the SVM address space and avoids catastrophic leaks on dlclose.

Change-Id: I34f1d19a450835200e9d4815458e8d1b3045053c
2016-01-22 18:08:19 -05:00
Amber Lin 7416805a44 Revert "Free resources when dlclose is called"
This reverts commit 582b70f9c3.

Conflicts:
	src/fmm.c
	src/perfctr.c

Change-Id: Ib6113c2dd3962c72100c7f74cdef6897e1df40b3
2016-01-22 17:58:33 -05:00
Harish Kasiviswanathan 5e53205b9e Don't limit number of supported HSA Nodes
Remove #define MAX_NODES 8

Change-Id: I756cadc652543dd17ea48a1c956adc08c3d2631a
2016-01-15 17:27:43 -05:00
Harish Kasiviswanathan ce83dc623f Don't limit number of supported GPUs
Stop using NUM_OF_SUPPORTED_GPUS. For now the definitions itself cannot
be removed as ioctl code is in upstream Kernel.

Change-Id: If846625a8ad5062d5483e762850c793d3c00b9d0
2016-01-15 11:44:42 -05:00
Harish Kasiviswanathan e7e1361c3d Use new ioctl for getting process apertures
Change-Id: I73678744ad73942edec442ad9c6d38637f7e1235
2016-01-12 12:09:25 -05:00
Felix Kuehling 063ad3ad9e Implement hsaKmtRegisterMemoryToNodes
Fix hsaKmtRegisterMemory to be a no-op for now and move the multi-GPU
implementation to hsaKmtRegisterMemoryToNodes. Make GPU memory mappings
of host memory visible to all GPUs by default. Device memory is still
visible to the allocating GPU only by default (but can be overridden
with hsaKmtRegisterMemoryToNodes for experimenting with P2P).

Change-Id: I73408afbe3b10c8dad2ab3a780f58413249692e6
2016-01-08 16:00:23 -05:00
Ben Goz ea0f9d2a0b Adding support for mGPU
Change-Id: I5ed184e6a58b38d9dde48867f14513d161cf41a9
Signed-off-by: Ben Goz <ben.goz@amd.com>
2016-01-04 15:35:15 +02:00
Ben Goz 53b208adf2 Fix AQL Double buffer allocation mode
Change-Id: I5162ffd89416d317fd0ca0fc51da523298488922
Signed-off-by: Ben Goz <ben.goz@amd.com>
2016-01-04 15:34:53 +02:00
Yair Shachar 681f4dcecc Add support for scratch GPUVM on host memory
This is required when we have a debug session



Change-Id: If9d6d2d23a9016b6ca9562e02a91fc16e0354ee4
Signed-off-by: Yair Shachar <Yair.Shachar@amd.com>
2015-12-20 15:50:50 +02:00
Harish Kasiviswanathan 39bf9c6611 Fix node_id in gpu_mem[] array
Change-Id: I4897623612e1749e275fb97ce1603dc5130fc9ce
2015-12-14 16:25:18 -05:00
Amber Lin 582b70f9c3 Free resources when dlclose is called
When the Thunk is initialized multiple times in the lifetime of a single process
, some global resources are leaked. This can happen when dlopen and dlclose are
 used to load the library at runtime, rather than linking the runtime against
the Thunk. This patch adds the destructor to release global resources when
dlclose is called.


Change-Id: Ia00da0d41f095d0b2706f98c0e75effedd596f49
2015-12-11 16:32:41 -05:00
Yair Shachar 8f529e3c72 Add support for per device debug register state tracking
Change-Id: I8d51670f5de8d379ead898d484f668a8034f9878
Signed-off-by: Yair Shachar <Yair.Shachar@amd.com>
2015-12-07 21:11:21 +02:00
Harish Kasiviswanathan a4cf02d797 Remove unused parameter gpu_id from few functions
This will also fix out of bound access in functions
fmm_get_aperture_base_and_limit and fmm_release



Change-Id: Icf064c46647e69a069126171dbacdf3d5b27f972
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2015-11-30 11:51:44 -05:00
Harish Kasiviswanathan 2903a610e1 Use same VM range for all dGPUs
dgpu_aperture and dgpu_alt_aperture will be shared by all dGPUs.



Change-Id: I814495e43b51acabdc6266cfa8d83db5a062e20d
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2015-11-26 15:07:29 -05:00
Harish Kasiviswanathan 5a55383baf Fix dgpu_vm_limit
Break from the for-loop once dgpu VM range is found, otherwise the
length is reduced by half

Change-Id: Ie602054c16ea69ea1cbb75e804ead551bc3615c0
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2015-11-23 11:51:39 -05:00
Amber Lin a5bc8360e8 Fix sibling map in CPU cache properties
Previous code only works for systems where shared_cpu_map lists 32 or less
bits. Some systems list more than 32 bits and express them as
XXXXXXXX,XXXXXXXX,.... This patch adds that calculation. Also increase
MAX_CPU_CORES and MAX_CACHES to accommodate more advanced systems.

Change-Id: Ia5c7041866456a6aa3b66f8f0f951022d7c51028
2015-11-12 08:31:51 -05:00
Felix Kuehling 62337b6c0a Reserve address space with PROT_NONE
Access to reserved address space that has not been allocated should
result in a segfault. Use PROT_NONE to ensure that.


Change-Id: Ic5da9392fabbe78c9ec14f98e8b7b47e5267a98a
2015-11-10 18:19:56 -05:00
Amber Lin b6f65f9849 Add CPU cache information
Fill up cache properties of CPU node by reading data from /proc/cpuinfo
and /sys/devices/system/cpu/cpuX/cache/indexY



Change-Id: I0a96760575e504e38962554f192c3fe66bea3c15
2015-11-09 07:16:24 -05:00
Kent Russell cb3a664065 Add option to create release build for Thunk
By adding REL=1 to the make command line (e.g. make REL=1 deb), we can
create a release build of the Thunk. This will not affect existing
functionality, and will only have an effect if REL=1 is specified on the
command line, or in the build_thunk.sh script.

Change-Id: Iedc3b6094e70a4ebd726499eda56013cc254b83d
2015-10-30 14:05:40 -04:00
Felix Kuehling bd93eecc64 Use correct aperture for _fmm_unmap_from_gpu_scratch
Passing in the wrong aperture resulted in failure to unmap scratch.


Change-Id: Icd7423abfb1bcc773b33becffcbefc233f4ff340
2015-10-29 18:26:15 -04:00
Philip Cox 0c234c7ef3 Add SDMA IOCTL type to Create Queue function.
Change-Id: I7e31507b761ca388b2cac93f994f6106de962f17
2015-10-29 10:25:41 -04:00
Kent Russell 6ceed7def3 libhsakmt - Add make option to package thunk as RPM
Add an option to libhsakmt to allow the thunk to be packaged as an RPM.
The default will remain being built as-is, but this can now be packaged
as an RPM by using "make src rpm" . build_thunk.sh will be modified to
reflect this new option.

Change-Id: I38e03d10cfb5035bdf0a87635a784c47a709a5b6
2015-10-29 07:49:13 -04:00
Harish Kasiviswanathan f885e551aa Remove erroneous and redundant memory banks reported
hsaKmtGetNodeMemoryProperties -
	- Return only HSA_HEAPTYPE_SYSTEM memory for CPU only node.
	- For dGPU remove redundant HSA_HEAPTYPE_FRAME_BUFFER_PRIVATE
	  entry.

Change-Id: I0349be39b8409a0fd64a038b8b2956191356d937
2015-10-23 18:43:46 -04:00
Harish Kasiviswanathan 69662da3dc Correct parameter name for topology_is_dgpu()
The function expects device_id and not gpu_id.

Change-Id: I79794fd4e58e6e6adb26659da30f3e4d8e108434
2015-10-23 18:43:45 -04:00
Harish Kasiviswanathan cb53548c89 Unify fmm_get_aperture_xxx functions
Unify fmm_get_aperture_base and fmm_get_aperture_limit into one
function. Make the return value to HSAKMT_STATUS.

Change-Id: I0b3f563ffb268947ab891f4935f61788d0af0e01
2015-10-23 18:43:34 -04:00
Felix Kuehling 5131ab4e64 Implement flat scratch support for dGPU
hsaKmtAllocMemory only allocates aligned address space and sets up
the scratch_physical aperture to match the allocated address space.

Actual allocation of backing memory happens in hsaKmtMapMemoryToGPU.

Change-Id: Ie709815ab9bedb3d682e096b4005fdfb5e94d3a7
2015-10-22 20:40:22 -04:00
Felix Kuehling 149261ba09 Allow address space allocations with specific alignment
Change-Id: I4bf7f7ac53c3921dd330b9dc7a40582611f88b69
2015-10-22 20:27:49 -04:00
Ben Goz 55b1a5dc43 Casting local memory size to uint64_t
Change-Id: I5c2010056b84ac01bb65361210d2a693e437050a
Signed-off-by: Ben Goz <ben.goz@amd.com>
2015-10-22 09:05:34 -04:00
Ben Goz e61500c46e Adding support for new AQL Queue Memory allocation
Change-Id: If84fc4b961627dbdd0b77b1c509a3c9a4c709b9f
Signed-off-by: Ben Goz <ben.goz@amd.com>
2015-10-22 13:13:54 +03:00
Felix Kuehling 590c8e522c Fix node 0 system memory allocation for dGPU
This is a hack to allow the Runtime to allocate system memory with
PreferredNode=0 on a dGPU system. We allocated it from Node 1
instead so that the node 1 GPU can map the memory. A proper fix
will be implemented together with multi-GPU support.

Change-Id: Ieb52599e5275781c04ee34405ea850bf782c523a
2015-10-21 20:00:01 -04:00
Felix Kuehling 39bde26c9b Reserve more SVM process address space
Try to reserve as much SVM address space as GPUVM can address.
Implement a fallback scheme to smaller sizes if larger allocations
fail or are not addressable by the GPU, down to an (arbitrary)
minimum of 4GB.

Change-Id: I770177834cc9e6ddd6ef4f20d789eab63c8055cb
2015-10-19 17:44:23 -04:00
Andres Rodriguez 0df346aaf9 make: add 'deb' target for creating deb packages
When 'make deb' is run create a libhsakmt.deb archive that installs
libhsakmt into the appropriate folder on the target where the dymanic
linker can find it.

Change-Id: I32de7198975f7831e509a67371e78456982b5c42
2015-10-16 19:13:51 -04:00
Harish Kasiviswanathan 5cc56a2647 Fix init process apertures
Kernel ioctl AMDKFD_IOC_GET_PROCESS_APERTURES returns process apertures
only for GPU nodes. The current implementation assumed that this list of
GPU nodes returned by the ioctl has one to one correpondence to sysfs
topology nodes. This fails when non-GPU nodes exist in topology as in
case of Intel + gfx802

Fix this by using gpu_id (./sys/.../kfd/topology/nodes/1/gpu_id) to map
information obtained from kernel ioctl call.

Change-Id: I4ab8ae5354f12cf0b6609fc4b24182b82eb3677f
2015-10-15 15:38:14 -04:00
Harish Kasiviswanathan b6c6f79143 Fix hard-coded usage of Node 0
Use appropriate NodeId instead

Change-Id: I46af93b76978fea7bedb34457fcc0864ed4fe2d4
2015-10-14 17:27:38 -04:00
Felix Kuehling 6a5ca4bc5a Fix various dgpu memory management issues
Fix TONGA_PAGE_SIZE value and move it to libhsakmt.h for usiing it
consistently in all places that require the same alignment for the
same reason. Create a generic alignment helper macro to replace some
incorrect hand-coded size alignments.

Move virtual address and size alignments down into aperture management
functions. Alignment is a per-aperture property that is set during
fmm_init_process_apertures. Doing the alignment there ensures that
all allocations in the same aperture are aligned the same way. Finding
objects by size and address can take the alignment into account.

Also align the size of physical allocations to back aligned virtual
address allocations. CPU mappings do not need to be aligned.

Map anonymous pages over released memory mappings to allow the
backing pages to be released, while keeping the address space
reserved.

Add alignment parameter to free_exec_aligned_memory_gpu to match the
interface of allocate_exec_aligned_memory_cpu. It doesn't make sense
to allow an alignment parameter in one but assume a specific
alignment in the other.

Change-Id: I74226ca6938f4948f643e5aee1d474720cd89e78
2015-10-13 19:14:56 -04:00
Felix Kuehling 0fc0a5b526 Add support for gfx803
Create new device_info and add device ID. Add helper macros to
identify chip families (VI, discrete). For now gfx803 behaves like
gfx802. But if necessary we can have gfx802 or gfx803-specific
code paths or workarounds in the future.

Change-Id: I61b4ffef7dd7796bb34cb01fbff0089bd49507bb
2015-10-09 17:40:54 -04:00
Harish Kasiviswanathan 758824db17 Fix assert failure for CPU only node
hsa_gfxip_table lists only (supported) GPUs. So assert fail only when a
non-supported GPU is detected.

Change-Id: I6207dc7cd55860c8b3348b6a4ca6102131975722
2015-10-08 11:52:59 -04:00
Harish Kasiviswanathan f2a46101d3 Refactor hsa_gfxip_table lookup
Also fix some formatting

Change-Id: Ia04d7a9cd3972cc4d283c576161de639027aac6d
2015-10-08 11:52:59 -04:00
Felix Kuehling 8e836f8183 Setup APE1 on dGPU for coherent access
The default is non-coherent access for better performance on dGPU.
Disabled hsaKmtSetMemoryPolicy function on dGPU to prevent app from
overriding the APE1 settings at runtime.
Fixed dGPU VM aperture limit to be inclusive.

Change-Id: I378ff74a654f533572775c0c97c19779a56bc6d9
2015-10-02 17:20:33 -04:00
Felix Kuehling 7505893cc7 Add all gfx802 device IDs to supported_devices
Without this, queue creation segfaults on unknown devices.

Change-Id: Ieea0bc4783e7313b3dcdabf03ab1269e3670b217
2015-10-02 15:33:37 -04:00
Felix Kuehling f3aaba0621 Fix returning of base and limit on dgpu_mem_init reinitialization
Change-Id: I1d1500ee57c3b85fc39c224d233a62097f981719
2015-09-30 18:07:04 -04:00
shaoyunl 2d63ee7b8f Initiali support for CWSR on thunk
1. Add IOCTL defines to set trap handler
2. Add control stack size information on create queue argument.
3. Increase the total save&restore area size for carrizo to include the control stack size.

Signed-off-by: Shaoyun Liu <Shaoyun.liu@amd.com>

Change-Id: Iccf15e073b7db2519e96e7f7b46a89d57ab9a4df
2015-09-25 15:12:25 -04:00
Harish Kasiviswanathan 1897acd78e Merge "Sync up HSA_ENGINE_ID type with Windows/Perforce" into amd-staging 2015-09-24 11:03:23 -04:00
Amber Lin 082f8314c4 Sync up HSA_ENGINE_ID type with Windows/Perforce
HSA_ENGINE_ID in Perforce added ui32 to the typedef while in Git it doesn't.
This causes conflicts to RT applications. Decision being made is to change Git
to match Perforce.

Change-Id: I7e9c6437b023bb23ec9578737f8534e9453589b9
2015-09-24 00:10:52 -04:00
Harish Kasiviswanathan 1438f15fd0 Fix VM range for dGPU local memory
Currently, Kernel imposes a limit on VM. Thunk should be aware of it.
This fix is required till Kernel VM limit is sorted.

For now both "Host Access" memory and "Local Memory" share the same VM range.

Change-Id: I5a9220face20df9ede2b78bd6201a01dd2ea70e0
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2015-09-23 16:18:50 -04:00
Harish Kasiviswanathan 4b768872c0 Fix mem size variable type
Memory size is 64-bit. So use HSAuint64 instead of uint32.

Change-Id: Iaa607dec9c1a1c5ac46ea442fd482210ea550b45
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2015-09-23 15:33:54 -04:00
Amber Lin f7fffdc2be Enable GFXIP version info for dGPU
Add GFXIP version 8.0.2(major.minor.stepping) for gfx802 and 8.0.3 for gfx803.


Change-Id: Icc7cac6b2e8a78d9cff4105aeb2bfcd2c7759027
2015-09-22 15:04:43 -04:00
Ben Goz 6170080cf6 Adding support for local memory on dGPU
Change-Id: I1a926b11730ba295605eeb37c9b1fc438bed8a64
Signed-off-by: Ben Goz <ben.goz@amd.com>
2015-09-21 14:13:15 -04:00