rocm-systems

Author	SHA1	Message	Date
Harish Kasiviswanathan	39bf9c6611	Fix node_id in gpu_mem[] array Change-Id: I4897623612e1749e275fb97ce1603dc5130fc9ce	2015-12-14 16:25:18 -05:00
Amber Lin	582b70f9c3	Free resources when dlclose is called When the Thunk is initialized multiple times in the lifetime of a single process, some global resources are leaked. This can happen when dlopen and dlclose are used to load the library at runtime, rather than linking the runtime against the Thunk. This patch adds the destructor to release global resources when dlclose is called. Change-Id: Ia00da0d41f095d0b2706f98c0e75effedd596f49	2015-12-11 16:32:41 -05:00
Yair Shachar	8f529e3c72	Add support for per device debug register state tracking Change-Id: I8d51670f5de8d379ead898d484f668a8034f9878 Signed-off-by: Yair Shachar <Yair.Shachar@amd.com>	2015-12-07 21:11:21 +02:00
Harish Kasiviswanathan	a4cf02d797	Remove unused parameter gpu_id from few functions This will also fix out of bound access in functions fmm_get_aperture_base_and_limit and fmm_release Change-Id: Icf064c46647e69a069126171dbacdf3d5b27f972 Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>	2015-11-30 11:51:44 -05:00
Harish Kasiviswanathan	2903a610e1	Use same VM range for all dGPUs dgpu_aperture and dgpu_alt_aperture will be shared by all dGPUs. Change-Id: I814495e43b51acabdc6266cfa8d83db5a062e20d Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>	2015-11-26 15:07:29 -05:00
Harish Kasiviswanathan	5a55383baf	Fix dgpu_vm_limit Break from the for-loop once dgpu VM range is found, otherwise the length is reduced by half Change-Id: Ie602054c16ea69ea1cbb75e804ead551bc3615c0 Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>	2015-11-23 11:51:39 -05:00
Amber Lin	a5bc8360e8	Fix sibling map in CPU cache properties Previous code only works for systems where shared_cpu_map lists 32 or fewer bits. Some systems list more than 32 bits and express them as XXXXXXXX,XXXXXXXX,.... This patch adds that calculation. Also increase MAX_CPU_CORES and MAX_CACHES to accommodate more advanced systems. Change-Id: Ia5c7041866456a6aa3b66f8f0f951022d7c51028	2015-11-12 08:31:51 -05:00
Felix Kuehling	62337b6c0a	Reserve address space with PROT_NONE Access to reserved address space that has not been allocated should result in a segfault. Use PROT_NONE to ensure that. Change-Id: Ic5da9392fabbe78c9ec14f98e8b7b47e5267a98a	2015-11-10 18:19:56 -05:00
Kent Russell	3786e18d99	Use OUT_DIR for thunkroot variable Pick up the thunk from the correct location. It is no longer inside THUNK_ROOT, but instead part of the OUT folder. Change-Id: I41dd7dae243e66270d0ea7182f1ba119b18a1cfb	2015-11-09 16:21:49 -05:00
Kent Russell	4e4d4a81e1	Fix variable for RPM build Certain versions of rpmbuild need the variable to be outside of curly braces. This addresses that issue in that situation. Change-Id: Iff7200b332b9d8e41a4d7676ca14c5a32c075beb	2015-11-09 11:05:32 -05:00
Amber Lin	b6f65f9849	Add CPU cache information Fill up cache properties of CPU node by reading data from /proc/cpuinfo and /sys/devices/system/cpu/cpuX/cache/indexY Change-Id: I0a96760575e504e38962554f192c3fe66bea3c15	2015-11-09 07:16:24 -05:00
Kent Russell	cb3a664065	Add option to create release build for Thunk By adding REL=1 to the make command line (e.g. make REL=1 deb), we can create a release build of the Thunk. This will not affect existing functionality, and will only have an effect if REL=1 is specified on the command line, or in the build_thunk.sh script. Change-Id: Iedc3b6094e70a4ebd726499eda56013cc254b83d	2015-10-30 14:05:40 -04:00
Kent Russell	cabbcbabff	Cleanup RPM build of thunk Change-Id: Ib437a3ec7be9f5aa7d3ef9e53c13e3c5e7b7382e	2015-10-30 08:42:16 -04:00
Felix Kuehling	bd93eecc64	Use correct aperture for _fmm_unmap_from_gpu_scratch Passing in the wrong aperture resulted in failure to unmap scratch. Change-Id: Icd7423abfb1bcc773b33becffcbefc233f4ff340	2015-10-29 18:26:15 -04:00
Philip Cox	0c234c7ef3	Add SDMA IOCTL type to Create Queue function. Change-Id: I7e31507b761ca388b2cac93f994f6106de962f17	2015-10-29 10:25:41 -04:00
Kent Russell	6ceed7def3	libhsakmt - Add make option to package thunk as RPM Add an option to libhsakmt to allow the thunk to be packaged as an RPM. The default will remain being built as-is, but this can now be packaged as an RPM by using "make src rpm". build_thunk.sh will be modified to reflect this new option. Change-Id: I38e03d10cfb5035bdf0a87635a784c47a709a5b6	2015-10-29 07:49:13 -04:00
Harish Kasiviswanathan	f885e551aa	Remove erroneous and redundant memory banks reported hsaKmtGetNodeMemoryProperties: (1) Return only HSA_HEAPTYPE_SYSTEM memory for a CPU-only node. (2) For dGPU, remove the redundant HSA_HEAPTYPE_FRAME_BUFFER_PRIVATE entry. Change-Id: I0349be39b8409a0fd64a038b8b2956191356d937	2015-10-23 18:43:46 -04:00
Harish Kasiviswanathan	69662da3dc	Correct parameter name for topology_is_dgpu() The function expects device_id and not gpu_id. Change-Id: I79794fd4e58e6e6adb26659da30f3e4d8e108434	2015-10-23 18:43:45 -04:00
Harish Kasiviswanathan	cb53548c89	Unify fmm_get_aperture_xxx functions Unify fmm_get_aperture_base and fmm_get_aperture_limit into one function. Change the return type to HSAKMT_STATUS. Change-Id: I0b3f563ffb268947ab891f4935f61788d0af0e01	2015-10-23 18:43:34 -04:00
Felix Kuehling	5131ab4e64	Implement flat scratch support for dGPU hsaKmtAllocMemory only allocates aligned address space and sets up the scratch_physical aperture to match the allocated address space. Actual allocation of backing memory happens in hsaKmtMapMemoryToGPU. Change-Id: Ie709815ab9bedb3d682e096b4005fdfb5e94d3a7	2015-10-22 20:40:22 -04:00
Felix Kuehling	149261ba09	Allow address space allocations with specific alignment Change-Id: I4bf7f7ac53c3921dd330b9dc7a40582611f88b69	2015-10-22 20:27:49 -04:00
Ben Goz	55b1a5dc43	Casting local memory size to uint64_t Change-Id: I5c2010056b84ac01bb65361210d2a693e437050a Signed-off-by: Ben Goz <ben.goz@amd.com>	2015-10-22 09:05:34 -04:00
Ben Goz	e61500c46e	Adding support for new AQL Queue Memory allocation Change-Id: If84fc4b961627dbdd0b77b1c509a3c9a4c709b9f Signed-off-by: Ben Goz <ben.goz@amd.com>	2015-10-22 13:13:54 +03:00
Felix Kuehling	590c8e522c	Fix node 0 system memory allocation for dGPU This is a hack to allow the Runtime to allocate system memory with PreferredNode=0 on a dGPU system. We allocated it from Node 1 instead so that the node 1 GPU can map the memory. A proper fix will be implemented together with multi-GPU support. Change-Id: Ieb52599e5275781c04ee34405ea850bf782c523a	2015-10-21 20:00:01 -04:00
Felix Kuehling	39bde26c9b	Reserve more SVM process address space Try to reserve as much SVM address space as GPUVM can address. Implement a fallback scheme to smaller sizes if larger allocations fail or are not addressable by the GPU, down to an (arbitrary) minimum of 4GB. Change-Id: I770177834cc9e6ddd6ef4f20d789eab63c8055cb	2015-10-19 17:44:23 -04:00
Andres Rodriguez	0df346aaf9	make: add 'deb' target for creating deb packages When 'make deb' is run, create a libhsakmt.deb archive that installs libhsakmt into the appropriate folder on the target where the dynamic linker can find it. Change-Id: I32de7198975f7831e509a67371e78456982b5c42	2015-10-16 19:13:51 -04:00
Harish Kasiviswanathan	5cc56a2647	Fix init process apertures Kernel ioctl AMDKFD_IOC_GET_PROCESS_APERTURES returns process apertures only for GPU nodes. The current implementation assumed that this list of GPU nodes returned by the ioctl has a one-to-one correspondence to sysfs topology nodes. This fails when non-GPU nodes exist in topology, as in the case of Intel + gfx802. Fix this by using gpu_id (./sys/.../kfd/topology/nodes/1/gpu_id) to map information obtained from kernel ioctl call. Change-Id: I4ab8ae5354f12cf0b6609fc4b24182b82eb3677f	2015-10-15 15:38:14 -04:00
Harish Kasiviswanathan	b6c6f79143	Fix hard-coded usage of Node 0 Use appropriate NodeId instead Change-Id: I46af93b76978fea7bedb34457fcc0864ed4fe2d4	2015-10-14 17:27:38 -04:00
Felix Kuehling	6a5ca4bc5a	Fix various dgpu memory management issues Fix TONGA_PAGE_SIZE value and move it to libhsakmt.h for using it consistently in all places that require the same alignment for the same reason. Create a generic alignment helper macro to replace some incorrect hand-coded size alignments. Move virtual address and size alignments down into aperture management functions. Alignment is a per-aperture property that is set during fmm_init_process_apertures. Doing the alignment there ensures that all allocations in the same aperture are aligned the same way. Finding objects by size and address can take the alignment into account. Also align the size of physical allocations to back aligned virtual address allocations. CPU mappings do not need to be aligned. Map anonymous pages over released memory mappings to allow the backing pages to be released, while keeping the address space reserved. Add alignment parameter to free_exec_aligned_memory_gpu to match the interface of allocate_exec_aligned_memory_cpu. It doesn't make sense to allow an alignment parameter in one but assume a specific alignment in the other. Change-Id: I74226ca6938f4948f643e5aee1d474720cd89e78	2015-10-13 19:14:56 -04:00
Felix Kuehling	0fc0a5b526	Add support for gfx803 Create new device_info and add device ID. Add helper macros to identify chip families (VI, discrete). For now gfx803 behaves like gfx802. But if necessary we can have gfx802 or gfx803-specific code paths or workarounds in the future. Change-Id: I61b4ffef7dd7796bb34cb01fbff0089bd49507bb	2015-10-09 17:40:54 -04:00
Harish Kasiviswanathan	758824db17	Fix assert failure for CPU only node hsa_gfxip_table lists only (supported) GPUs. So assert fail only when a non-supported GPU is detected. Change-Id: I6207dc7cd55860c8b3348b6a4ca6102131975722	2015-10-08 11:52:59 -04:00
Harish Kasiviswanathan	f2a46101d3	Refactor hsa_gfxip_table lookup Also fix some formatting Change-Id: Ia04d7a9cd3972cc4d283c576161de639027aac6d	2015-10-08 11:52:59 -04:00
Felix Kuehling	b94ae66c62	Update HsaMemFlags.ui32.CoarseGrain comment As advised by Paul Blinzer Change-Id: Icabf4acd94866ddbbe53faf48a71e1113f0c76b6	2015-10-05 16:48:50 -04:00
Felix Kuehling	8e836f8183	Setup APE1 on dGPU for coherent access The default is non-coherent access for better performance on dGPU. Disabled hsaKmtSetMemoryPolicy function on dGPU to prevent app from overriding the APE1 settings at runtime. Fixed dGPU VM aperture limit to be inclusive. Change-Id: I378ff74a654f533572775c0c97c19779a56bc6d9	2015-10-02 17:20:33 -04:00
Felix Kuehling	7505893cc7	Add all gfx802 device IDs to supported_devices Without this, queue creation segfaults on unknown devices. Change-Id: Ieea0bc4783e7313b3dcdabf03ab1269e3670b217	2015-10-02 15:33:37 -04:00
Felix Kuehling	f3aaba0621	Fix returning of base and limit on dgpu_mem_init reinitialization Change-Id: I1d1500ee57c3b85fc39c224d233a62097f981719	2015-09-30 18:07:04 -04:00
Felix Kuehling	f2f45cc0e4	Add CoarseGrain memory flag Change-Id: If8ac0339ae8c809c6e6a4f56592a4061d110ea94	2015-09-30 18:07:04 -04:00
shaoyunl	2d63ee7b8f	Initial support for CWSR on thunk 1. Add IOCTL defines to set trap handler. 2. Add control stack size information on create queue argument. 3. Increase the total save&restore area size for carrizo to include the control stack size. Signed-off-by: Shaoyun Liu <Shaoyun.liu@amd.com> Change-Id: Iccf15e073b7db2519e96e7f7b46a89d57ab9a4df	2015-09-25 15:12:25 -04:00
Harish Kasiviswanathan	1897acd78e	Merge "Sync up HSA_ENGINE_ID type with Windows/Perforce" into amd-staging	2015-09-24 11:03:23 -04:00
Amber Lin	082f8314c4	Sync up HSA_ENGINE_ID type with Windows/Perforce HSA_ENGINE_ID in Perforce added ui32 to the typedef while in Git it doesn't. This causes conflicts to RT applications. Decision being made is to change Git to match Perforce. Change-Id: I7e9c6437b023bb23ec9578737f8534e9453589b9	2015-09-24 00:10:52 -04:00
Harish Kasiviswanathan	1438f15fd0	Fix VM range for dGPU local memory Currently, Kernel imposes a limit on VM. Thunk should be aware of it. This fix is required till Kernel VM limit is sorted. For now both "Host Access" memory and "Local Memory" share the same VM range. Change-Id: I5a9220face20df9ede2b78bd6201a01dd2ea70e0 Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>	2015-09-23 16:18:50 -04:00
Harish Kasiviswanathan	4b768872c0	Fix mem size variable type Memory size is 64-bit. So use HSAuint64 instead of uint32. Change-Id: Iaa607dec9c1a1c5ac46ea442fd482210ea550b45 Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>	2015-09-23 15:33:54 -04:00
Amber Lin	f7fffdc2be	Enable GFXIP version info for dGPU Add GFXIP version 8.0.2 (major.minor.stepping) for gfx802 and 8.0.3 for gfx803. Change-Id: Icc7cac6b2e8a78d9cff4105aeb2bfcd2c7759027	2015-09-22 15:04:43 -04:00
Ben Goz	6170080cf6	Adding support for local memory on dGPU Change-Id: I1a926b11730ba295605eeb37c9b1fc438bed8a64 Signed-off-by: Ben Goz <ben.goz@amd.com>	2015-09-21 14:13:15 -04:00
Ben Goz	692e004047	Adding new memory allocation IOCTL Change-Id: I0eb1924811a2e1e436296ebe632d8f112a61637d Signed-off-by: Ben Goz <ben.goz@amd.com>	2015-09-21 13:58:32 -04:00
Harish Kasiviswanathan	3e9773ff2c	Revert "Topology, memory allocation, cleanup issue for gGPU" This reverts commit `ee08f537a7`. Change-Id: I92a4ed91bf566259916d1a96207e1fe9a6099c31 Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>	2015-09-21 10:47:30 -04:00
Harish Kasiviswanathan	ee08f537a7	Topology, memory allocation, cleanup issue for gGPU Patch submitted by Besar Wicaksono 1. Bug on detecting local memory size interpreted as 32 bit value instead of 64. The bug causes thunk to go into an infinite loop trying to reserve virtual address range for dgpu system memory. 2. SIMD count in the node property is 0. Runtime use this attribute to find a gpu device. Regarding other attributes of intel+tonga topology, Harish started a discussion on August iirc, could you please share an update ? This would help me progress with more tests such as scratch memory, which require the scratch aperture information in order to construct a buffer srd in gpuvm space. 3. Bug on releasing memory via fmm_release, where no actual release is being done. The vm_object can't be found because the memory size does not match due to the allocation padded the size with 32KB. 4. Pointer arithmetic on vm_area allocation/release. The value of vm_area_t::end seems to be interpreted inconsistently whether it is (start + size -1) or (start + size). One example of potential issue I see is the logic could report larger size of the hole in the vm area list. 5. Resource cleanup on multiple library load/unload within a single process. - Any memory allocation on subsequent library load will result an error "va above limit". To my understanding this is due to the reserved memory for the system memory not being released on unload. - The static variable events_page needs to be invalidated appropriately on library unload so the next load could reinitialize it. 6. Could you please update if AQL queue is ready to test with the stg kfd/kmt ? 7. The system memory allocation with size larger than 32KB seems to be padded by an extra 32KB. I was wondering if we could remove this overhead. Change-Id: I039988d36637525089c7569dc3b77e58750e2121	2015-09-15 13:15:04 -04:00
David Ogbeide	8a01cd1212	libhsakmt: specify build output via variable Makefile currently sends build output to a default location. Allow choice of build output location if so desired using a variable. Signed-off-by: David Ogbeide <davidboyowa.ogbeide@amd.com>	2015-09-01 14:30:53 -04:00
Ben Goz	fb8378a18b	Support gfx802 dGPU Signed-off-by: Ben Goz <ben.goz@amd.com>	2015-08-30 14:13:53 +03:00
shaoyunl	2dff5cabfa	Minor fix in libhsathunk for KFDMemory test Signed-off-by: shaoyun liu(shaoyun.liu@amd.com) Reviewed-by: Ben Goz(Ben.Goz@amd.com)	2015-08-05 17:32:00 -04:00


113 Commits
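
Commit a5bc8360e8 above mentions that shared_cpu_map can list more than 32 bits, written as comma-separated groups (XXXXXXXX,XXXXXXXX,...). The following is a minimal sketch of how such a string could be parsed. It is not the Thunk's actual code: the names parse_cpu_map, cpu_in_map and MAP_WORDS are hypothetical, and it assumes each group is a 32-bit hexadecimal word with the most significant group first, as Linux sysfs prints CPU masks.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAP_WORDS 8 /* hypothetical capacity: 8 x 32 = 256 CPUs */

/* Parse a mask such as "00000003,0000000f" into 32-bit words.
 * On success words[0] covers CPUs 0-31, words[1] CPUs 32-63, and so on.
 * Returns the number of groups parsed, or -1 on malformed input. */
int parse_cpu_map(const char *text, uint32_t words[MAP_WORDS])
{
    uint32_t groups[MAP_WORDS];
    const char *p = text;
    int n = 0;
    int i;

    memset(words, 0, MAP_WORDS * sizeof(words[0]));
    while (*p != '\0' && *p != '\n') {
        char *end;
        unsigned long value;

        /* Reject too many groups and anything that is not a hex digit. */
        if (n == MAP_WORDS || !isxdigit((unsigned char)*p))
            return -1;
        value = strtoul(p, &end, 16);
        if (value > 0xffffffffUL)
            return -1;
        groups[n++] = (uint32_t)value;
        p = end;
        if (*p == ',')
            p++;
    }
    if (n == 0)
        return -1;
    /* Groups arrive most significant first; store least significant first. */
    for (i = 0; i < n; i++)
        words[i] = groups[n - 1 - i];
    return n;
}

/* Return 1 if the given CPU index is set in the parsed map, else 0. */
int cpu_in_map(const uint32_t words[MAP_WORDS], unsigned cpu)
{
    if (cpu >= MAP_WORDS * 32u)
        return 0;
    return (int)((words[cpu / 32] >> (cpu % 32)) & 1u);
}
```

Storing the least significant word first means a CPU index maps directly to words[cpu / 32], so a sibling lookup needs no knowledge of how many groups the original string contained.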