rocm-systems

Author	SHA1	Message	Date
Huang Rui	06464b917d	libhsakmt: add NumCpQueues and NumSdmaQueuesPerEngine data field (v3) NumCpQueues and NumSdmaQueuesPerEngine should be got by kfd driver not hardcode. So add two data fields in HsaNodeProperties then thunk is able to get it from sysfs that exposed by kfd. v2: change NumCpQueues/NumSdmaQueuesPerEngine to one byte. v3: merge two commits as one to avoid ABI update two times. Change-Id: Ie386e4685f13493e22db6e207a399db6a4c5b9dc Signed-off-by: Huang Rui <ray.huang@amd.com>	2020-01-03 23:27:42 -05:00
Yong Zhao	22e9ef7303	libhsakmt: Add the perf counter support for gfx1012 Change-Id: I55d68a77928617edaabd33ae0807bf23f739c8de Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>	2019-12-18 20:49:36 -05:00
Jonathan Kim	8b01a1c4c5	add queue snapshot test adds api and test to get newly create queue snapshot per ptraced process. Change-Id: Ife97123a5b930e837ccaa386801145ef23c2cc2c Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>	2019-12-02 11:56:04 -05:00
Huang Rui	fdba74c2fb	libhsakmt: add gfx90c support for thunk This patch adds the support for gfx90c apu. So far we treat it as "dgpu" and gfx900. Will update hsa gfxip table while the isa/llvm is implemented on gfx90c. Change-Id: I6ef164bf3e751fe6dd6287cac212a500dce84b1a Signed-off-by: Huang Rui <ray.huang@amd.com>	2019-11-14 20:02:53 -05:00
Philip Yang	59c857476f	libhsakmt: use the closest NUMA node to allocate queue ctx area On NUMA system, allocate queue ctx save restore area on the closest NUMA node to the GPU which the queue is going to run. This will improve performance on NUMA system generally by reducing schedule latency and fix the multi-node rccl-tests unstable performance issue. If the closest NUMA node has no memory available, set flags NoNUMABind=1 to bypass mbind, to use default NUMA memory policy to allocate system memory. Change-Id: Ic62bfa5bb2efbf4f6ae79ff403e9610ddf18d45c Signed-off-by: Philip Yang <Philip.Yang@amd.com>	2019-11-06 17:33:26 -05:00
Ori Messinger	e7f45fae8a	Add non-priv PMC blocks to GFX10 This patch adds the non-privileged PMC blocks for GFX10/gfx1010. Change-Id: I4b98cb2159d71113c12920ca7fd10e45096b4e2c Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>	2019-11-05 13:07:13 -05:00
Oak Zeng	fa0cb9ebeb	Handle IOCTL failure in fmm_release FREE_MEMORY_OF_GPU ioctl could fail, e.g., if memory is still mapped to GPU. Handle this failure by return error in fmm_release/HsaKmtFreeMemory Change-Id: I5461db39964f733cf97376d50e44906a9b4c0f13 Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>	2019-11-01 08:59:05 -04:00
Yong Zhao	ab2daf6538	libhsakmt: Add a message when a device is not supported This helps to quickly triage problems. Change-Id: Iad2b4b74209ab972be0c2f6311eeb3aaf098d29f Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>	2019-10-20 12:44:13 -04:00
Yong Zhao	1c7755d2da	libhsakmt: Add gfx1012 device IDs Now the gfx1012 device IDs are okay to reveal. Change-Id: I9da2a036b74ec7b6b8b1fb7587597a5847f02205 Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>	2019-10-17 12:35:22 -04:00
Yong Zhao	16fa78b134	libhsakmt: Print an error message when map_mmio fails Without this change, the failure was hard to notice when it happened. Change-Id: I99c3e8cea0d0cbd3bcfe79069410e6e870e225bf Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>	2019-10-17 12:32:32 -04:00
Amber Lin	23541e0289	libhsakmt: handle CPU cache info on non-NUMA sys When CONFIG_NUMA is not enabled in the kernel config, only one CPU node presents on the system and /sys/devices/system/node/nodeX directories don't exist. Read CPU cache information from /sys/devices/system/cpu in this situation. Change-Id: I017ff17dd72678a0551edcc77446664501aa42ca Signed-off-by: Amber Lin <Amber.Lin@amd.com>	2019-10-17 11:53:19 -04:00
Philip Cox	6933540c81	Remove debugger data reg accesses The debug trap accesses the data0/data1 registers, so we do not want the userspace to write values to it. We remove the calls to set the data0/data1 register values. Change-Id: Iaba842a4c445f339f16a39fe1994526ff78a2f3c Signed-off-by: Philip Cox <Philip.Cox@amd.com>	2019-10-10 14:32:54 -04:00
Philip Cox	dbbd189b33	Add functions to get the kfd debugger version info To support adding new features to the kfd debugger, and not break functionality, we need to be able to check the kfd debugger support version info from the kernel. Change-Id: Icd88e4edab8430c35eaed588e62d892c1b5c62ec Signed-off-by: Philip Cox <Philip.Cox@amd.com>	2019-10-10 14:32:54 -04:00
Amber Lin	5a09880620	libhsakmt: fix typo in error message When fail to get CPU dirs from //sys/devices/system/node/nodeX directory, the error message should print node_dir, not path. Change-Id: If76a51918c8dd55fa6605a62f3d29f9efc6fadb3 Signed-off-by: Amber Lin <Amber.Lin@amd.com>	2019-09-30 14:29:39 -04:00
shaoyunl	a1e399a3ff	Thunk : Add gfx1011 support from thunk side Change-Id: I6b202b75fc1ad0e69576a35a6a3e499818137e04 Signed-off-by: shaoyunl <shaoyun.liu@amd.com>	2019-09-25 11:02:33 -04:00
Philip Yang	71cf3cf5d3	libhsakmt: correct number of NUMA nodes calculation numa_max_node() return the highest node number available on the current system, number of NUMA nodes should be numa_max_node() + 1. Change-Id: I20a6c17af071e73e853cb5ea6d0304c8aca52681 Signed-off-by: Philip Yang <Philip.Yang@amd.com>	2019-09-16 16:25:57 -04:00
Philip Yang	42392f093f	libhsakmt: handle NUMA system with no memory on node 0 on NUMA system, node 0 may have no memory, application pass node id 0 to hsaKmtAllocMemory will fail because mbind to specify the allocation from node 0 return EINVAL. Add new flag NoNUMABind for application to pass it to hsaKmtAllocMemory to skip mbind. hsaKmtCreateEvent and hsaKmtCreateQueue specify the new flag NoNUMABind to allocate system memory for event page and CWSR area, don't bind the system memory to a specific NUMA node. Change-Id: I854e5a57502c7807c4c5ff2e441d499ae515c309 Signed-off-by: Philip Yang <Philip.Yang@amd.com>	2019-09-16 11:30:24 -04:00
Philip Yang	4da09813a3	libhsakmt: fix mbind failed on docker Docker seccomp by default blocks mbind system call, so mbind return failed on docker. thunk should not fail this otherwise application cannot allocate system memory on docker. Use pr_warn_once and pr_err_once to avoid duplicate same error messages Change-Id: I61a7c0e4abaa3dcfe7abf2ea48db90f669f9638a Signed-off-by: Philip Yang <Philip.Yang@amd.com>	2019-09-13 15:01:47 -04:00
Yong Zhao	3ecd83e52d	libhsakmt: Support gfx1012 The gfx version item is yet to be added. Change-Id: Ia6c487447e5a5df80c0c12fe150939175068024b Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>	2019-09-06 14:42:32 -04:00
Yong Zhao	d6539ddc24	libhsakmt: Implement HSA_FORCE_ASIC_TYPE to overwrite asic type Force all the GPUs to a certain type, use the below command: HSA_FORCE_ASIC_TYPE="10.1.0 1 gfx1010 14" meaning major.minor.step dgpu asic_name asic_id This will facilitate the cooperation across the teams for bringing up ASICs which reuse existing device IDs. Change-Id: I40fe4c9b46d3ccb3e38ea52250e80e82fb50fb0f Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>	2019-09-06 12:12:42 -04:00
Felix Kuehling	e320913e9e	libhsakmt: Fix userptr mappings on gfx802 The memory size alignment workaround for a TLB bug on gfx802 was breaking userptrs because it would attempt to get_user_pages beyond the end of a VMA. Refine this workaround based on our understanding of the HW bug. It only affects L2 cacheline allocation, which is decided by the last page in the cache line (8 entries = 32KB of address space). Thus aligning memory allocation so that the last page falls on the end of a 8 entry TLB cache line allows caching to work correctly. Imported images require specific alignments. If their size is not naturally aligned with 8 cache lines, it may have bad TLB cache performance. This patch will only have the desired effect if redundant size padding in KFD is also removed. Change-Id: I984cbe7fa61fec04d70fa387aaf9aab370eabeb9 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2019-08-30 19:06:24 -04:00
Jack Zhang	545ca6263f	add device ID for gfx908 VF SRIOV: Fix issue that kfdtest cannot detect gfx908 VF inside VM. Change-Id: Ie05fd66d4e14b47818fddbe404df1059567b76a2 Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>	2019-08-29 15:29:34 +08:00
Yong Zhao	dbe9af7777	libhsakmt: Improve the confusing code The code tends to confuse readers. Improve it. Change-Id: I5c6cbf7a114b6e7d26ce3b9f54350a153032267d Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>	2019-08-28 18:43:24 -04:00
Kent Russell	ccac07cb14	topology: Fix compile warnings regarding snprintf and path snprintf throws a warning from -Wformat-truncation where the string could be truncated. We address this by referencing the maximum size that can be returned from a file according to MAXNAMLEN . This should safely guard us from truncating the path value. Change-Id: If1d208990d8775e9494835b0deb890d2616fd15b Signed-off-by: Kent Russell <kent.russell@amd.com>	2019-08-27 11:06:06 -04:00
Sean Keely	8ab8b14902	Initialize dirp in topology_create_temp_cpu_cache_list to NULL. Avoids uninitialized use in early exit (error) paths. Change-Id: I5fb24863f0a5da48776608d47f25e1c8d8aafe35	2019-08-27 00:37:05 -05:00
Amber Lin	4fa930af5a	libhsakmt: get cpu cache info from sysfs Replace cpuid call with sysfs data to get CPU cache information. With this change, x86 check is also removed since sysfs applies to other platforms. CPU cache information can be retrieved from /sys/devices/system/node/nodeX/cpuY/cache where Y is processor number represented in /proc/cpuinfo at "processor" entry. Change-Id: Ic47df6d5dafaf1aae5b46b1fdee42691c697e49e Signed-off-by: Amber Lin <Amber.Lin@amd.com>	2019-08-26 14:52:01 -04:00
Ori Messinger	f2173254e4	Report domain with HsaNodeProperties PCI domain has moved to 32-bits to accommodate virtualization, so a 32-bit integer is exposed for domain to reflect this change. Change-Id: I0d767acadcdc8e4277db203b5865dd67dd001cef Signed-off-by: Ori Messinger <ori.messinger@amd.com>	2019-08-23 11:59:19 -04:00
Jonathan Kim	1ff5cb33b2	add new queue bit test on clear event enable thunk query api to report if queue is newly created test new queue bit test on clear events. also fixup cleanup to disable debug trap. Change-Id: I3ebe2d85da66f28b8c82f0e68461ee7d32ec0b0d Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com> Reviewed-by: Philip Cox <Philip.Cox@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>	2019-08-23 11:25:32 -04:00
Jay Cornwall	814e0f0bdc	Reserve 128 SGPRs per wave in context save area Originally reserved 100 SGPRs per wave. Pre-gfx10 needs 102 SGPRs and gfx10 needs 128 SGPRs. Reserve 128 SGPRs per wave for all ASICs to simplify calculation. Also double VGPR register size for gfx908 family Change-Id: I98b741cbfa051f49ed37ff25d99f851f124be7b6 Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com> Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>	2019-08-22 23:33:26 -04:00
Felix Kuehling	626957a263	libhsakmt: Enable HSA_USERPTR_FOR_PAGED_MEM by default By using user-allocated pages instead of kernel-allocated pages from TTM, we're not subject to TTM's self imposed limits on kernel memory usage. This also paves the way for more wide-spread use of HMM on upstream kernels. Change-Id: Iac82964c98a441e29b7f1986d1be1bb5ccb1e569 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2019-08-20 13:54:04 -04:00
Philip Yang	a11cb2a633	libhsakmt: child process destroy vm objects in all apertures child process clone vm objects from svm->apertures if parent process doesn't free memory before fork. fmm_clear_all_mem suppose to clear the apertures in forked child process but this only works if gpu_vm is not NULL. parent process call hsaKmtCloseKFD reset gpu_vm to NULL and then fork, then child process will not clear svm->apertures. As a result, the child process will allocate vm object with same address and add to aperture, there are duplicate vm objects with same address in aperture. Then mapping to GPU will find the wrong vm object and create incorrect GPU mapping cause rocrtst IPC test VM fault. The issue happened with HSA_USERPTR_FOR_PAGED_MEM=1. The fix is to clear vm objects in all apertures in clear_after_fork. Change-Id: I92e42a967075a634a3f475b915c8242d82077ecb Signed-off-by: Philip Yang <Philip.Yang@amd.com>	2019-08-20 13:31:37 -04:00
Philip Yang	7fc6a9f7c2	Revert "hsaKmtCloseKFD destroy objects in all apertures" This reverts commit `632ad3a749`. This change causes KFDTest failed on gfx803. The first hsaKmtCreateEvent call allocate system memory for events_page because global variable events_page is NULL. And this events page vm address should not be freed until the process exit. The change to destroy objects in hsaKmtCloseKFD removes events page. As a result, KFDTest call hsaKmtOpenKFD again and then allocate memory will get same events_page vm address on gfx803, and map this vm failed because the vm conflict with events_page mapping. KFDTest passed on VG10, gfx906 because allocate memory get different vm address. hsaKmtCreateEvent still works fine as the driver keeps the events page mapping of the process. We should only destroy objects in fork cloned child process regardless if gpu_vm is NULL or not. Change-Id: I174ef65321cbd6074c855c2021318fe961c8c72c Signed-off-by: Philip Yang <Philip.Yang@amd.com>	2019-08-20 10:52:14 -04:00
Amber Lin	1fddfd316a	libhsakmt: reduce /proc/cpuinfo opens /proc/cpuinfo are opened, read, and closed multiple times. Once for vendor name and multiple times for model name -- each node opens once. For example in a 2 CPUs + 4 GPUs system, it'll be opened 7 times. This patch reads it one time and stores it in a cpuinfo buffer. This cpuinfo buffer is freed when the snapshot is done. Also replace returns with gotos inside the snapshot to avoid possible memory leak. Change-Id: Iaf26a6c7e7323a8651d137c3706179449b9e3c80 Signed-off-by: Amber Lin <Amber.Lin@amd.com>	2019-08-15 12:37:26 -04:00
Jonathan Kim	836dfd0752	libhsakmt: update dbg enable trap and add query debug events Add data out for enable trap to return poll fd to user space. Add query debug events interface. Change-Id: Ia4afde1cf167e6aa61d502380a8b329ee89d5f44 Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>	2019-08-07 10:04:33 -04:00
Philip Yang	632ad3a749	hsaKmtCloseKFD destroy objects in all apertures Otherwise the parent call hsaKmtCloseKFD and then fork child process, child process will duplicate the vm_objects from the parent. Change-Id: Ia6ffc51cbae983b6a7cdc58ccf3b11ebe4087d97 Signed-off-by: Philip Yang <Philip.Yang@amd.com>	2019-08-07 09:34:06 -04:00
Oak Zeng	f40f166e20	Remove NodeId parameter from hsaKmtAllocQueueGWS The NodeId parameter is redundant and can be retrieved from QueueId parameter. Change-Id: I12853849b868b304bd27633fa7653ba644d69026 Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>	2019-08-06 20:20:30 -04:00
Yong Zhao	23db2c658d	libhsakmt: Add gfx908 support Change-Id: Icced5ca4c68eb6cc3978e0d8e836d0ccfc8c980d Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>	2019-08-01 23:10:28 -04:00
Felix Kuehling	4d7b0990e4	libhsakmt: Sanity check node_id for NUMA binding Ignore requests to bind to invalid NUMA nodes. This affects only legacy applications (such as KFDTest) that allocate system memory as paged memory with a GPU node ID. Change-Id: I81e514af6d0c1ab2ed5229adeeca1fa0ab2a0685 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2019-07-31 11:29:52 -04:00
shaoyunl	02ccb9eb57	Thunk: Add gfx1010 initial support Add gfx1010 basic support on Thunk Change-Id: Ie4c0922158c7f5e2951f8694f4b204f371f1aa23 Signed-off-by: shaoyunl <shaoyun.liu@amd.com>	2019-07-11 17:08:11 -04:00
Philip Yang	67f366243d	fix mbind on NUMA system mbind walks through pages to setup vma memory policy. So we need do mmap to create vma mappings first, then call mbind. mbind will do nothing if vma does not exist. And add numa available check before executing mbind, and return NULL to hsaKmtAllocMemory if mbind failed. Change-Id: I28ab661885d807ca51ef90e87230669dc80f10ec Signed-off-by: Philip Yang <Philip.Yang@amd.com>	2019-07-09 17:53:30 -04:00
Oak Zeng	888e1a7ae7	Use kfd fd to mmap mmio Change-Id: Iadd2e1ea46d0951aaa5a6cefbc7d42d1b2c1f653 Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>	2019-06-10 21:07:45 -05:00
Oak Zeng	65d554f5e4	Thunk API to allocate queue GWS Change-Id: I6c5b109e2567cb71aed9245923cfcbeee6295ab2 Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>	2019-06-10 21:07:45 -05:00
Oak Zeng	45d717d860	Add node property to report number of GWS Change-Id: I81263ca7ebfa3c0f9f1be78acfa0920e47d551b1 Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>	2019-06-10 21:07:45 -05:00
Felix Kuehling	64b90261d9	libhsakmt: Enable invisible debug VRAM mappings by default Remove the HSA_DEBUG environment variable that controlled the creation of these mappings. This should allow the debugger to attach to a running process and access VRAM buffers through ptrace without having to do anything special. On processes that create many small VRAM mappings, this may cause regressions due to the per-process mmap limit. However, the sub-allocator in ROCr should consolidate most small allocations into 2MB blocks nowadays, for good TLB efficiency. So this is unlikely to cause problems. Change-Id: I929da1be0f6cb51ec00a02f3f241d16083e4d95f Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2019-05-17 18:28:14 -04:00
Philip Cox	608bc7c3a0	Fix type mismatch passed to queue suspend/resume The queue IDs passed over to the kernel via kfd_ioctl_dbg_trap_args->ptr should be a list of uint32_t's. Need to convert from the passed in 64 bit HSA_QUEUEID to 32 bit uint32_t's. Change-Id: I8718566d9f9ffc90ce0b2ecc129b10c49d73186a Signed-off-by: Philip Cox <Philip.Cox@amd.com>	2019-05-15 07:33:47 -04:00
Kent Russell	54e042eee1	Add missing gfx803 ID Change-Id: I9eca81f0f149ea924c3b81bd80680d7fd1ad7a6c	2019-05-13 09:03:06 -04:00
Philip Cox	b0d23aee16	fix suspend/resume logic in debug_trap code There was a mistake and RESUME was used when it should have been suspend in two places in the suspend resume code. This fixes that error. Change-Id: I69be733d7ae7c14ce5ee8af57a307976e4212d62	2019-05-07 06:56:00 -04:00
Philip Cox	c2c1385e29	libhsakmt: Update wave suspend/resume API This is updating to the new suspend and resume API for the KFD and the thunk. We now support passing in a list of queues to suspend, and not just all of the queues for the process. The kfdtest testcase was also updated so it still compiles. Change-Id: I71d1b178476bd9df0c311bdedaa6a891528cebcf Signed-off-by: Philip Cox <Philip.Cox@amd.com>	2019-05-03 10:32:47 -04:00
Philip Cox	d21e9d5bbd	libhsakmt: Update HsaQueueInfo for GetQueueInfo hsaKmtGetQueueInfo needs to return the control stack size, and the wave state size for the debugger. These changes are needed to support returning the new values. Change-Id: Ib4c60e0ea34446c06aef4a86996250989f348a69 Signed-off-by: Philip Cox <Philip.Cox@amd.com>	2019-05-03 10:32:47 -04:00
Oak Zeng	804aa90a22	Add MMIO_REMAP heap type Add a MMIO_REMAP heap type and expose mmio virtual address through HsaKmtGetNodeMemoryProperties Change-Id: I1e585e6dfbec8fa7c85f1dda7b89b763a8e2c439 Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>	2019-04-30 15:40:50 -05:00

406 Commits