rocm-systems

Author	SHA1	Message	Date
xinhui pan	41bf449e99	thunk: fix size overflow Some test case alloc >4gb memory. Use HSAuint64 in bytes and HSAuint32 in pages. Change-Id: I0d5e6c299903b5898cfea024178a7a26b9ba3c90 Signed-off-by: xinhui pan <xinhui.pan@amd.com>	2019-01-28 10:49:45 -05:00
Jay Cornwall	b764991982	Set EOP buffer TC policy to non-coherent Restores regression in dispatch latency. Change-Id: I17869d3d515d8c1fa055a57afec2531903b88b16 Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>	2019-01-23 12:21:01 -05:00
Oak Zeng	124f77775c	Revert "Create SDMA queue on specific engine" This reverts commit `58b95e0a9d`. Change-Id: Idc0decd86364ed3441e9037b83be8be9953f0b3e Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>	2019-01-21 10:37:46 -06:00
Philip Cox	37858f2311	Initial gfx9 debugger node suspend/resume Change-Id: I2a5dac3d02265c11f5b6985ab457e2d1caa0a033 Signed-off-by: Philip Cox <Philip.Cox@amd.com>	2019-01-11 09:00:54 -05:00
Eric Huang	8ee93b3187	libhsakmt: add RAS support v2 RAS feature enabling bit and errors return are implemented in existed topology and event mechanism. v2: change library interface. Change-Id: I75807c080b5b26e8115240b05b3d7016cb05a31a Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>	2018-12-13 10:17:12 -05:00
Kent Russell	53439669d9	libhsakmt: Add new gfx900 and gfx906 GPU IDs Change-Id: I93b2b845c3edb2da55235a56516a851145745988	2018-12-12 08:36:40 -05:00
Eric Huang	29d11d02e8	Revert "libhsakmt: add RAS support" This reverts commit `1fbe010354`. Change-Id: I739b17e057f2a8a0f4375741955209d2477c704a Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>	2018-12-08 19:42:33 -05:00
Eric Huang	1fbe010354	libhsakmt: add RAS support RAS feature enabling bit and errors return are implemented in existed topology and event mechanism. Change-Id: I9b018bba80cf4a6998e42a7bff64318c689b1d2a Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>	2018-11-23 11:42:34 -05:00
shaoyunl	29b45b8c0a	Thunk: make scratch memory only map to its own GPU Map scratch memory to the GPU that specified when allocate the memory Change-Id: I788f9ef0dccb63b894a75e75cac5f94a60d7ec48 Signed-off-by: shaoyunl <shaoyun.liu@amd.com>	2018-11-19 10:26:31 -05:00
Oak Zeng	58b95e0a9d	Create SDMA queue on specific engine Change-Id: Id651ececda55b81b45e991bd8e6616674be48d8e Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>	2018-11-13 14:52:17 -05:00
Felix Kuehling	5e4e19d47b	libhsakmt: Distinguish EPERM and EACCES EPERM means "operation not permitted" and is returned when CGroup access checks fail. EACCES means "permission denied" and is returned when the device file permission bits or access control list don't allow access. EPERM can fail silently, since we assume the administrator disabled a device on purpose in the CGroup. EACCESS should produce an error message and an info message to check the device file permissions. Change-Id: Iee4c5584c5fdc4e113c3d760dede6661097b4341 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2018-11-12 17:06:18 -05:00
Gang Ba	c54c1dbdcb	Add code to support packet capture and replay in the Thunk This feature only support dgpu for now. Change-Id: Ic766ec06892c955dd605ecc335a776335edc0df2 Signed-off-by: Gang Ba <gaba@amd.com>	2018-10-31 16:53:46 -04:00
Harish Kasiviswanathan	c1994e28f0	libhsakmt: Support device controller cgroup Device whiltelist controller cgroup allows to track and enforce open and mknod restrictions on device files. Tasks should works with /dev/dri/renderN devices that are whitelisted for its cgroup. If a certain node is not whitelisted it is not an error condition. Change-Id: I0b997423ccdc00aee98df5b6f04ed6794549604e Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>	2018-10-30 11:31:53 -04:00
Philip Cox	105edd4bb4	Fix Debug Thunk spec mismatch Move debug trap support capabilities to their own structure to fix thunk spec vs header mismatch. Change-Id: I6694601bfa36097502c8ab932e082d7a4645d5b2 Signed-off-by: Philip Cox <Philip.Cox@amd.com>	2018-10-24 11:32:12 -04:00
Gang Ba	52ec7f805e	drm/amdkfd: Added gfx904 and gfx803 for KFD. Change-Id: I4406dc70c776926feaecca3f2146d65259a80517 Signed-off-by: Gang Ba <gaba@amd.com>	2018-09-25 08:17:44 -04:00
Mike Li	3144a84b9a	all_gpu_id_array: Handle GPU resource management GPU Resource management can disable some of the GPU nodes. The Kernel driver could be not aware of this. Get from Kernel driver information of all the nodes and then filter it. Change-Id: I4eeb126a5efce2192c35f5d2b72be1811e9ded32 Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>	2018-09-24 11:38:11 -04:00
Mike Li	f9bd960344	Output a error message only when open_drm_render_device failed unexpectedly. Change-Id: I5b9587a8d5c7a900e9ab8611a25d0c49d34b4cef Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>	2018-09-24 11:36:11 -04:00
Harish Kasiviswanathan	fb79a0efe2	Topology: Use processors available to the process The existing call sysconf (_SC_NPROCESSORS_ONLN) provides the number of processors available to the scheduler. When a KFD process is run under a container environment, only a subset (cpuset) of processors are available to the current process. For getting CPU cache information use sched_getaffinity() to get the number of processors available to the current process. Change-Id: Ieac02f1f61c17e24ac34ba502968c69d3bc631cb Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>	2018-09-21 10:31:59 -04:00
Felix Kuehling	be574169c1	libhsakmt: Fix segfault on gfx801 Handle the case that svm.dgpu_aperture does not exist in vm_find_object. Change-Id: Ic0983d4f321f1b6248514f2fa25162976e90bd75 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2018-09-10 14:39:05 -04:00
Harish Kasiviswanathan	7876bb70a9	Add cgroup support Some nodes are unavailable based on the task's cgroup hierarchy. Handle this situation by ignoring those nodes Change-Id: I72f9e822d2ec8cf15732df95e427d5549a75b55d Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>	2018-09-06 16:56:32 -04:00
Harish Kasiviswanathan	866ef20054	iolinks: Handle GPU resource management With GPU resource management, some nodes are unavailable based on the cgroup hierarchy of the task. Kernel via sysfs specifies all the iolinks. Skip the links which are not accessible. Also iolinks specified by the kernel refer to sysfs Node IDs. Map it to relevant user Node IDs v2: NodeFrom mapped from sysfs Node to User Node Change-Id: I95312ee6ca51b89fe9e6ca2a9185c2ea1e94afc4 Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>	2018-09-06 16:56:07 -04:00
Harish Kasiviswanathan	f84a99e953	Replace global variable _system with g_system Change-Id: I452090473a5b46b32204f7f916bdcfdd3e8a47bd Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>	2018-09-06 16:56:07 -04:00
Felix Kuehling	a9bd6e6f8b	Revert "libhsakmt: Try to use CPU addr as GPU addr for userptrs" This reverts commit `ab181c46c0`. This fixes ambiguity when looking up GPU addresses with hsaKmtQueryPointerInfo. hsa_amd_agents_allow_access uses hsaKmtQueryPointerInfo, and depends on finding the correct object from a GPU address. Finding the wrong userptr object based on its CPU address leads to incorrect GPU mappings and results in VM faults. Change-Id: I7c5f571ee6e1f9d32687aa3eab6d96944ad032be Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2018-08-31 15:04:50 -04:00
Felix Kuehling	855f1a32a9	libhsakmt: Fix and deduplicate object lookup code Added a helper vm_find_object that can be used everywhere we need to lookup objects by their address and optionally size. This unifies all subtly different, partially incomplete, or broken ways of doing this in various functions: * map * unmap * register * deregister * free * get_mem_info * set_mem_user_data At the same time fix some subtle problems for userptr lookup that got a bit more complex when the userptr address can match the GPU address. Change-Id: I98572d1734fc7688a1d68f6a784e02c8dee90af5 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2018-08-31 15:04:47 -04:00
shaoyunl	30a4ab39f3	thunk: Avoid create PCIe indirect link on none large bar target PCIe P2P (indirect) IOLinks should only be created if the remote GPU is large-BAR Change-Id: I55cbb5e37c5d41267583e07aca6bdcc708403029 Signed-off-by: shaoyunl <Shaoyun.Liu@amd.com>	2018-08-29 16:31:55 -04:00
Shaoyun Liu	7796994f46	Thunk: Avoid add indirect link for the GPUS with xGMI link Change-Id: I06f511c55e28919512fda79b504566818dc2a5ab Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>	2018-08-29 13:22:58 -04:00
Mike Li	3437a356c7	Decouple user NodeID and sysfs NodeID Currently, all HSA nodes are exposed to user. So the existing implementation assumes a one to one mapping between user NodeId and sysfs nodeId. GPU Resource Management will provide control over the exposed HSA nodes. This means not all HSA nodes will be exposed to the user. Decouple it. The mapping from user NodeId to sysfs NodeId will be local to topology.c and topology helper functions. For others NodeId should be sequential from 0 to Number of Nodes exposed to user. v1: initial implementation v2: map node id within the topology_* functions v3: remove two static globals v4: add bounds check got node id Change-Id: Id12147ece41d682430f398944bbb339ca906eb1b Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>	2018-08-23 16:01:32 -04:00
Philip Cox	db92d5af23	Add GFX debug trap control code Add initial support for the kfd debugger trap support for GFX9 chips. - Adding support for Enable/Disable trap support - Setting debug trap support data - Setting wave launch trap override - Setting wave launch mode Change-Id: If39f2395c4b6cf56249cf76f1c44cfcbdcef891c Signed-off-by: Philip Cox <Philip.Cox@amd.com>	2018-08-22 14:40:15 -04:00
Felix Kuehling	9271e69ddf	libhsakmt: Fix processing of memory fault events AMDKFD_IOC_WAIT_EVENTS with multiple events and wait_for_all = 0 returns success after any of the events have signaled. So we can't blindly assume that a memory fault event that was in the list has actually signaled. Check the gpu_id as an indicator whether there really was a memory fault before processing it further. Change-Id: I6cc311bfc184c631beaf684027176a6ca42e05c1 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2018-08-17 16:06:45 -04:00
Felix Kuehling	ab181c46c0	libhsakmt: Try to use CPU addr as GPU addr for userptrs If the CPU addr of a userptr is accessible by the GPU, try to use it instead of allocating a different GPU address. If something else is already registered with an overlapping address range, we still need to allocate a GPU address, because KFD does not support overlapping GPUVM mappings. Change-Id: I452963ee45a454f735755a0b43122b9aee5d55be Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>	2018-08-17 16:06:45 -04:00
Felix Kuehling	80f2cc644c	libhsakmt: Add mmap-based aperture management for GFXv9 and later If the GPU virtual address space is >= 47 bits, don't reserve virtual address space at startup and use mmap to allocate virtual addresses. Change-Id: Ic935b03c8e78271829fc8e6cfd0e543184aff818 Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>	2018-08-17 16:06:45 -04:00
Felix Kuehling	40c46cc6cb	libhsakmt: Fix assumptions about userptrs relative to apertures So far we have assumed that userptrs are always memory outside reserved SVM apertures that are mapped into the SVM aperture for GPU access. With an unreserved SVM aperture that covers the entire virtual address range, this distinction will no longer be true. Userptrs will generally be inside the unreserved SVM aperture. Take that into consideration when registering, mapping and unmapping virtual addresses. We now need a retry logic when looking up buffers from addresses. If it is not found by its GPU address, try it as a userptr. We also need to consider the new possibility that a userptr is registered at the same address for CPU and GPU access. So a buffer found by its GPU address may also turn out to be a userptr. In that case use a stricter lookup using the userptr and size (if the size is known), to identify the correct one of multiple overlapping objects. Change-Id: Ia43633aaa40f9fd2a74918ae969a631d2ff68419 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2018-08-15 16:07:54 -04:00
Felix Kuehling	d79b9c1a29	libhsakmt: Make VA management scheme configurable per aperture Change-Id: Ib70b038b4ef6465b03545317c6494a4e4950c107 Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>	2018-08-15 14:22:19 -04:00
Felix Kuehling	d57026f447	libhsakmt: Allow dgpu and dgpu_alt aperture to be the same Make dgpu_aperture and dgpu_alt_aperture pointers that can point to the same actual aperture. This will be useful on GFXv9 and later, where the MType is not defined by the aperture and we want to have a single aperture covering the entire virtual address space. aperture->is_coherent can no longer be a reliable indicator of coherency. Replace it with different conditions based on mem flags and svm.disable_cache (from HSA_DISABLE_CACHE environment). Change-Id: Iefc415b87b8abd96e3916586485a0a55d9b27c19 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2018-08-15 14:22:19 -04:00
Felix Kuehling	2d2181b478	libhsakmt: Move unmapping into aperture_release_area This prepares the code for an alternative aperture management method that needs to unmap memory differently. Change-Id: I5494aa5420f85edb8f7857f00c17e1d2e6479a51 Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>	2018-08-15 14:22:19 -04:00
Felix Kuehling	9d96af0150	libhsakmt: scratch is not a manageable aperture Only scratch_physical, for scratch-backing memory is managed by the Thunk. Change-Id: I4716981aa908d9569584dc35f40ffd270a2f9014 Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>	2018-08-15 14:22:19 -04:00
Felix Kuehling	842359a826	libhsakmt: Remove aperture offset parameter This parameter was used for non-canonical GPUVM allocations on GFX7/8 APUs only, to prevent getting NULL pointers from valid allocation after subtracting the aperture base. The same can be achieved less intrusively by reserving address space at the start of the aperture during initialization. Change-Id: I0aae773f069c2b228824ba464b0612a4d8b489ce Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>	2018-08-15 14:22:19 -04:00
xinhui pan	eb5539fb10	thunk: fix a memory leak Hit queue create failure when do kfdtest with --gtest_repet=-1 fix: 4bb90d04("Remove the use of IS_DGPU()") Change-Id: I04fa73f90cef13a5517dbaceb89c41dc0f821a79 Signed-off-by: xinhui pan <xinhui.pan@amd.com>	2018-08-10 15:51:32 +08:00
Yong Zhao	110e754f64	Differentiate gfx700 and improve the logic by introducing is_gfx700() Because gfx700 has local memory but other APUs don't, we should reflect that in the code. Meanwhile, fix a bug that on gfx902 svm aperture is not added when calling hsaKmtGetNodeMemoryProperties(). Change-Id: Id840f2db0b14fda9ee713b219a9474c15f8a9771 Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>	2018-08-09 21:39:37 -04:00
xinhui pan	8fbf4a26ec	thunk: fix a vm area release issue On some asics, like tonga, the memory alignment size is as big as 0x8000. fmm_allocate* alloc vm area with size passed in which is not aligned mostly. But __fmm_release free vm area with vm_object_t->size which is aligned. That might cause aperture_release_area fail to free the vm area as the size might be bigger than zone itself or it just free another vm area nearby unexpected. This patch somehow will alloc more space than it needed on tonga. gfx900+ is not affected. Change-Id: I5a88c92b08c4e6f6bc05881798f769b55d6debe9 Signed-off-by: xinhui pan <xinhui.pan@amd.com>	2018-08-09 06:08:15 -04:00
Yong Zhao	fe04dd6890	Calculate and store the first gpu mem during initializaiton Previously we used the first dgpu mem, but after careful examination, we found it only needs to be a GPU, so we modify the code to reflect that as well. Change-Id: I069d9b8e247aed55c1f885b79f743ea8e03ddf93 Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>	2018-08-08 13:54:09 -04:00
Yong Zhao	4bb90d048c	Remove the use of IS_DGPU() The information can be obtained directly from node id. Also improve the whole logic for future compatibility. Change-Id: I130733be4e7930d5953d5e81409905e60c2ec35e Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>	2018-08-07 18:07:04 -04:00
Felix Kuehling	c21927f425	libhsakmt: Fix problems init_svm_apertures Unset ret_addr when unmapping the address space reservation. Otherwise it may try to unmap it again later. Remember the actual map_size and use it instead of len outside the reservation loops. Change-Id: I1a6b3fecfb59e22a713e5ed49c3ed37914cb6fb5 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2018-08-03 22:09:52 -04:00
Yong Zhao	08b6685dd5	Change the confusing type and name in topology node is used repeatedly and excessively, which caused unnecessary confusion. Change-Id: I4ae4171887df5e5b85209a5af8a636e6d72e5e82 Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>	2018-08-03 12:00:17 -04:00
xinhui pan	ab9017715f	use rbtree instead of vm_objects list simple test of mapping many system memory to gpu. before [ RUN ] KFDMemoryTest.MMap [ ] Using ISA for GFXIP 9.0 [ ] successfully register/map 32GB system memory to gpu [ OK ] KFDMemoryTest.MMap (36932 ms) after [ RUN ] KFDMemoryTest.MMap [ ] Using ISA for GFXIP 9.0 [ ] successfully register/map 32GB system memory to gpu [ OK ] KFDMemoryTest.MMap (11441 ms) So there is 11s VS 36s improvement. Looks like we can do something similar with vm_area too. Change-Id: I0349aacdeddec3534016d28176f0fabf632c61fc Signed-off-by: xinhui pan <xinhui.pan@amd.com>	2018-07-08 22:38:22 -04:00
Felix Kuehling	d3228f363e	Fix wrong loop termination condition Compare with gpu_mem_count instead of deprecated NUM_OF_SUPPORTED_GPUS to prevent overflows in case no dGPUs are present. Change-Id: I71fcb7503ba4c20bffadbdb04cefc4e4027a7df7 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2018-07-05 17:04:40 -04:00
Yong Zhao	4839882fc8	Set the write permission according to the flag when allocating host cpu mem Change-Id: I758c2b5b1799e968fa852646e1494fabb68c782d Signed-off-by: Yong Zhao <yong.zhao@amd.com>	2018-07-03 20:39:01 -04:00
Slava Grigorev	89e35574e3	Fix 'strncpy' truncating warnings when compiling with gcc 8 Change-Id: Ib145bab9450281da05f70dea34433b83438a756b Signed-off-by: Slava Grigorev <slava.grigorev@amd.com>	2018-06-29 17:06:08 -04:00
Yong Zhao	4eaaf9694d	Simplify if else logic for hsaKmtAllocMemory() The new logic is easier to follow. Change-Id: I69759a45c5dedaefeff831a2367253d3a4486bd3 Signed-off-by: Yong Zhao <yong.zhao@amd.com>	2018-06-29 14:39:52 -04:00
Yong Zhao	5972fac417	Rename two variable names in doorbells structure There were two doorbells, one embedded in another, which are very confusing. Change the member variable name to mapping to differentiate them. Also, rename doorbells_mutex to just mutext for brevity. Change-Id: Iaa14a1a3ee09449a9089fc1fb39c916fdf32fb44 Signed-off-by: Yong Zhao <yong.zhao@amd.com>	2018-06-28 16:04:35 -04:00
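
Commit fb79a0efe2 above contrasts `sysconf(_SC_NPROCESSORS_ONLN)` with `sched_getaffinity()` when counting processors inside a container. The following is a minimal sketch of that distinction on Linux with glibc; it is not the libhsakmt code, and the function names `cpus_available_to_process` and `cpus_online` are hypothetical, chosen only for illustration.

```c
#define _GNU_SOURCE          /* needed for sched_getaffinity() and CPU_COUNT() in glibc */
#include <assert.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper: number of CPUs the calling process may run on,
 * taken from its affinity mask (which reflects a cpuset restriction).
 * Returns -1 if the mask cannot be read. */
int cpus_available_to_process(void)
{
    cpu_set_t mask;
    int n;

    CPU_ZERO(&mask);
    /* pid 0 means "the calling thread" */
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;

    n = CPU_COUNT(&mask);
    assert(n > 0);           /* a readable mask contains at least one CPU */
    return n;
}

/* Hypothetical helper: number of CPUs online system-wide, which ignores
 * any cpuset restriction placed on the current process. */
long cpus_online(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

Under a restricted cpuset, `cpus_available_to_process()` would be expected to return a smaller value than `cpus_online()`, which is the case the commit message describes for CPU cache enumeration.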

343 Commits