rocm-systems

Author	SHA1	Message	Date
Felix Kuehling	5e4e19d47b	libhsakmt: Distinguish EPERM and EACCES EPERM means "operation not permitted" and is returned when CGroup access checks fail. EACCES means "permission denied" and is returned when the device file permission bits or access control list don't allow access. EPERM can fail silently, since we assume the administrator disabled a device on purpose in the CGroup. EACCES should produce an error message and an info message to check the device file permissions. Change-Id: Iee4c5584c5fdc4e113c3d760dede6661097b4341 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2018-11-12 17:06:18 -05:00
Mike Li	3afce42b57	Changed scripts to include running kfdtest in docker container Change-Id: I822ff4869610df6abad846542d7c290b7a5aae79	2018-11-07 16:09:12 -05:00
Gang Ba	c54c1dbdcb	Add code to support packet capture and replay in the Thunk This feature only supports dgpu for now. Change-Id: Ic766ec06892c955dd605ecc335a776335edc0df2 Signed-off-by: Gang Ba <gaba@amd.com>	2018-10-31 16:53:46 -04:00
Harish Kasiviswanathan	c1994e28f0	libhsakmt: Support device controller cgroup The device whitelist controller cgroup allows tracking and enforcing open and mknod restrictions on device files. Tasks should work with the /dev/dri/renderN devices that are whitelisted for their cgroup. If a certain node is not whitelisted, it is not an error condition. Change-Id: I0b997423ccdc00aee98df5b6f04ed6794549604e Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>	2018-10-30 11:31:53 -04:00
Kent Russell	10edccb912	Specify requirement of NUMA libs for Thunk Add the numa libs to the thunk specs for DEB/RPM, so we can remove the manual installation requirement Change-Id: I5aadcf581b64e9a20aee9c1e1204af4715d1e990	2018-10-25 07:37:07 -04:00
Philip Cox	105edd4bb4	Fix Debug Thunk spec mismatch Move debug trap support capabilities to their own structure to fix thunk spec vs header mismatch. Change-Id: I6694601bfa36097502c8ab932e082d7a4645d5b2 Signed-off-by: Philip Cox <Philip.Cox@amd.com>	2018-10-24 11:32:12 -04:00
xinhui pan	7a13bb4d66	kfdtest: blacklist KFDQMTest.SdmaEventInterrupt On gfx900+, the test sometimes timeout due to cp fw bug. Blacklist it until we address the root cause and have a fix. Change-Id: Iff600a6f6dbd86c56e034f530484205520bced32 Signed-off-by: xinhui pan <xinhui.pan@amd.com>	2018-10-19 15:29:54 -04:00
xinhui pan	ab4610cff7	kfdtest: Add more debug information of sdma event interrupt test We observe this test fails on gfx900+. Looks like the sdma packets are not executed at all after we submit sometimes. Run it with timeout 2s on gfx900. [ RUN ] KFDQMTest.SdmaEventInterrupt [----------] SDMACopyData FAIL! 1485262707170 VS 1485262747814 [----------] Event On Queue 1:0 Timeout, try to resubmit packets! [----------] The timeout event is signaled! [ ] Time Consumption (ns) [ ] 1: 1859427148 [ ] 2: 680148 [ ] 3: 6370 [ ] 4: 5481 /home/pp/code/compute/libhsakmt/tests/kfdtest/src/KFDQMTest.cpp:1670: Failure Value of: (ret) Actual: 31 Expected: HSAKMT_STATUS_SUCCESS Which is: 0 [----------] SDMACopyData FAIL! 1485367669958 VS 1485367750022 [----------] Event On Queue 2:1 Timeout, try to resubmit packets! [----------] The timeout event is signaled! [ ] Time Consumption (ns) [ ] 1: 1881615148 [ ] 2: 673629 [ ] 3: 6074 [ ] 4: 5481 /home/pp/code/compute/libhsakmt/tests/kfdtest/src/KFDQMTest.cpp:1670: Failure Value of: (ret) Actual: 31 Expected: HSAKMT_STATUS_SUCCESS Which is: 0 [----------] SDMACopyData FAIL! 1485427671250 VS 1485427751238 [----------] Event On Queue 2:1 Timeout, try to resubmit packets! [----------] The timeout event is signaled! [ ] Time Consumption (ns) [ ] 1: 1881508777 [ ] 2: 741629 [ ] 3: 6074 [ ] 4: 5481 /home/pp/code/compute/libhsakmt/tests/kfdtest/src/KFDQMTest.cpp:1670: Failure Value of: (ret) Actual: 31 Expected: HSAKMT_STATUS_SUCCESS Which is: 0 [ FAILED ] KFDQMTest.SdmaEventInterrupt (23675 ms) Change-Id: I7c1b752537d89782570df20838bf976578614f75 Signed-off-by: xinhui pan <xinhui.pan@amd.com>	2018-10-19 15:29:54 -04:00
Yong Zhao	d7e6d4706c	kfdtest: Clean up the indentations in PM4ReleaseMemoryPacket::InitPacket() Change-Id: I7f6b08697f6a68bf8c4a388c9f1cf3c3c8e6c81f Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>	2018-10-17 14:28:15 -04:00
Yong Zhao	77bab8596f	kfdtest: Improve the SignalEvent test Create an extra event so that the event id to test is non zero. That way we can be sure the context id received in kernel ISR is non zero, which is different from the default value 0 when context id is not set at all. Change-Id: I7e261d1bbb783d5afd15558c7ac00493b1218cef Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>	2018-10-17 14:27:54 -04:00
Gang Ba	52ec7f805e	drm/amdkfd: Added gfx904 and gfx803 for KFD. Change-Id: I4406dc70c776926feaecca3f2146d65259a80517 Signed-off-by: Gang Ba <gaba@amd.com>	2018-09-25 08:17:44 -04:00
Mike Li	3144a84b9a	all_gpu_id_array: Handle GPU resource management GPU resource management can disable some of the GPU nodes. The kernel driver may not be aware of this. Get information about all the nodes from the kernel driver and then filter it. Change-Id: I4eeb126a5efce2192c35f5d2b72be1811e9ded32 Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>	2018-09-24 11:38:11 -04:00
Mike Li	c3b47c0959	kfdtest: Handle GPU resource management Currently the FindDRMRenderNode function will access the sysfs directly to find the render node. It doesn't work with the GPU management changes. Have changed code to call hsaKmtGetNodeProperties instead. Change-Id: I3bb537a323bc1e8c49f38d8aabc60c13e268aecd Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>	2018-09-24 11:38:11 -04:00
Mike Li	f9bd960344	Output an error message only when open_drm_render_device failed unexpectedly. Change-Id: I5b9587a8d5c7a900e9ab8611a25d0c49d34b4cef Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>	2018-09-24 11:36:11 -04:00
xinhui pan	918a45a430	kfdtest: add P2POverheadTest This is to measure the laterncy + overhead of sdma packet consumption on p2p. It is Similar with QueueLatency test. What's more, the queue's overhead with different workload show more details. test result on two gfx900. [ RUN ] KFDPerformanceTest.P2POverheadTest [ ] Test (avg. ns) \| Size 4 8 16 64 256 1024 [ ] ----------------------------------------------------------------------- [ ] [push] [1 -> 0] 333 148 185 111 148 148 [ ] [push] [1 -> 1] 370 222 333 74 148 111 [ ] [push] [1 -> 2] 333 148 148 148 148 148 [ ] [push] [2 -> 0] 111 333 259 148 148 148 [ ] [push] [2 -> 1] 222 148 185 148 148 148 [ ] [push] [2 -> 2] 222 111 370 111 74 148 [ ] [pull] [1 -> 0] 370 296 296 148 185 148 [ ] [pull] [1 -> 1] 185 333 222 148 222 148 [ ] [pull] [1 -> 2] 222 444 259 148 185 111 [ ] [pull] [2 -> 0] 148 148 148 148 148 148 [ ] [pull] [2 -> 1] 148 148 148 148 148 148 [ ] [pull] [2 -> 2] 185 148 148 74 222 296 [ ] [push\|pull][1 -> 0] 1259 1222 1259 1074 1037 962 [ ] [push\|pull][1 -> 1] 1037 1037 1037 740 740 1000 [ ] [push\|pull][1 -> 2] 1259 1259 1296 1037 1000 1074 [ ] [push\|pull][2 -> 0] 1037 1037 1037 1074 1037 1148 [ ] [push\|pull][2 -> 1] 1037 1037 1037 1037 925 1074 [ ] [push\|pull][2 -> 2] 666 666 740 740 703 925 [ OK ] KFDPerformanceTest.P2POverheadTest (459 ms) Change-Id: I422263cb52f7ce184f6f1ff4466d04c239fbe9c9 Signed-off-by: xinhui pan <xinhui.pan@amd.com>	2018-09-24 09:28:00 -04:00
Harish Kasiviswanathan	fb79a0efe2	Topology: Use processors available to the process The existing call sysconf (_SC_NPROCESSORS_ONLN) provides the number of processors available to the scheduler. When a KFD process is run under a container environment, only a subset (cpuset) of processors are available to the current process. For getting CPU cache information use sched_getaffinity() to get the number of processors available to the current process. Change-Id: Ieac02f1f61c17e24ac34ba502968c69d3bc631cb Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>	2018-09-21 10:31:59 -04:00
xinhui pan	e5a541eaf2	kfdtest: Add P2P bandwidth test The test measures the bandwidth between GPUs. Currently we do not care numa topology as some products really support across PCI-e root complex p2p. test result on two gfx900 system. [ RUN ] KFDPerformanceTest.P2PBandWidthTest [ ] Copy from node to node by [push, NONE] [ ] [1 -> 0] 6.13477 - 6.12695 GB/s [ ] [1 -> 2] 3.77734 - 3.76855 GB/s [ ] [2 -> 0] 6.67676 - 6.6543 GB/s [ ] [2 -> 1] 6.14453 - 6.12793 GB/s [ ] Copy from node to node by [pull, NONE] [ ] [1 -> 0] 6.10547 - 6.08105 GB/s [ ] [1 -> 2] 9.65527 - 9.65039 GB/s [ ] [2 -> 0] 6.49805 - 6.4873 GB/s [ ] [2 -> 1] 8.95508 - 8.85254 GB/s [ ] Full duplex copy from node to node by [push\|pull, NONE] [ ] [1 -> 0] 11.0986 - 11.0986 GB/s [ ] [1 -> 2] 7.54297 - 7.54297 GB/s [ ] [2 -> 0] 12.0264 - 11.9639 GB/s [ ] [2 -> 1] 12.0469 - 12.0371 GB/s [ ] Full duplex copy from node to node by [push, push] [ ] [1 <-> 2] 11.7324 - 11.4541 GB/s [ ] Full duplex copy from node to node by [pull, pull] [ ] [1 <-> 2] 11.4824 - 11.0508 GB/s [ ] Copy from node to multiple nodes by [push, NONE] [ ] [1 -> [0...2]] 5.625 - 5.73633 GB/s [ ] [2 -> [0...2]] 6.45801 - 6.4707 GB/s [ ] Copy from multiple nodes to node by [push, NONE] [ ] [[1...2] -> 0] 12.8379 - 12.2578 GB/s Now we can get more timestamp info like below. 
Copy from node to node by [push, NONE] [1 -> 0] [1 : 0] #-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-############################### [1 : 1] #################################################################################################### [1 -> 2] [1 : 0] #--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#-#--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#-#--#-#-#-#-#--#-#-###################################### [1 : 1] ##################################################################################################-# [2 -> 0] [2 : 0] ##-###-##-###-###-##-###-##-###-###-##-###-###-##-###-###-##-###-##-###-###-##-###-###-##-###-###-################# [2 : 1] ###############################################################################-#############-###-## [2 -> 1] [2 : 0] ##-##-##-##-##-###-##-##-##-##-##-###-##-##-##-##-###-##-##-##-##-##-###-##-##-##-##-###-##-##-##-#################### [2 : 1] ################################################################################-###-############-## [snip] Full duplex copy from node to node by [push, push] [1 <-> 2] [1 : 0] #-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#################################### [1 : 1] ################-###################################################-############-####-############# [2 : 2] #-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-##-#-##-##-##-##-#-##-##-##-##-##-#-################## [2 : 3] #####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-#####-## Full duplex copy from node to node by [pull, pull] [1 <-> 2] [1 : 0] ######################################################################-##-#-###############-####-### [1 : 1] #-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-############################ [2 : 2] 
##-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-############ [2 : 3] #-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#########-############# Copy from node to multiple nodes by [push, NONE] [1 -> [0...2]] [1 : 0] #-#--#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-############################### [1 : 1] ########################################################################################-###-###-### [2 -> [0...2]] [2 : 0] ##-##-##-###-##-###-##-##-###-##-###-##-##-###-##-###-##-###-##-##-###-##-###-##-##-###-##-###-##-################## [2 : 1] -################################################################################################-## Copy from multiple nodes to node by [push, NONE] [[1...2] -> 0] [1 : 0] #-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-############################### [1 : 1] ################################################################################################-#-# [2 : 2] ##-##-##-###-##-##-###-##-##-##-###-##-##-###-##-##-###-##-##-###-##-##-##-###-##-##-###-##-##-###-##-################## [2 : 3] #########################-#########################-#########################-######################### [ OK ] KFDPerformanceTest.P2PBandWidthTest (15982 ms) Change-Id: Ia90044191d51650ccb220476d31fb317aa3ad6ce Signed-off-by: xinhui pan <xinhui.pan@amd.com>	2018-09-19 12:03:05 +08:00
xinhui pan	f618b3f075	kfdtest: add KFDTestUtilQueue Add some infrastructure: Implement SdmaTimePacket, which records the global GPU timestamp. Introduce the classes AsyncMPSQ and AsyncMPMQ. AsyncMPSQ stands for async multiple packet single queue. It takes a set of packets on creation and submits them to a GPU to run. AsyncMPMQ stands for async multiple packet multiple queue. It manages a set of AsyncMPSQ objects and uses a for loop to operate on each of them. Implement sdma_multicopy helper functions. Change-Id: I47e1d2ca9630113b2a1d85a0055f3f8ee629fb5f Signed-off-by: xinhui pan <xinhui.pan@amd.com>	2018-09-19 12:03:05 +08:00
Xiaojie Yuan	247fa9f1e0	Use 'RecordProperty' to record performance scores For following test cases: - KFDQMTest.QueueLatency - KFDQMTest.BasicCuMaskingLinear - KFDQMTest.BasicCuMaskingEven - KFDMemoryTest.MMBandWidth - KFDMemoryTest.MMapLarge - KFDMemoryTest.MMBench v2: xml element cannot start with a number, so change the key name of MMBandWidth and MMBench accordingly xml element cannot contain whitespaces, so trim whitespaces in "VRAM " v3: introduce KFDLog-like way to use KFDRecord Change-Id: Ifc3ed5657621252a7b39dccf1ef4f50a92593f77 Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>	2018-09-18 17:41:14 +08:00
xinhui pan	a6287ba919	kfdtest: Do not set GTEST_FLAG throw_on_failure This change is from commit 62f7dc2a("kfdtest: Do not set GTEST_FLAG throw_on_failure"). But it was unexpectedly reverted by commit 414042ab("kfdtest: Clean up comments"). So add this change back. Fix: `414042ab` Change-Id: Ia9e99c9ca17b99aab62b4db55017018ddae43dfb Signed-off-by: xinhui pan <xinhui.pan@amd.com>	2018-09-11 10:25:56 +08:00
xinhui pan	07bd97a864	kfdtest: Fix queuelatency fail issue The timestamp written by releaseMemory packet might still not be visible when we fetch it. To fix this bug, use event-based wait. Change-Id: If2324eb3b3a632c711ee4dff4d03a93d5306c289 Signed-off-by: xinhui pan <xinhui.pan@amd.com>	2018-09-10 21:17:29 -04:00
Felix Kuehling	be574169c1	libhsakmt: Fix segfault on gfx801 Handle the case that svm.dgpu_aperture does not exist in vm_find_object. Change-Id: Ic0983d4f321f1b6248514f2fa25162976e90bd75 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2018-09-10 14:39:05 -04:00
Harish Kasiviswanathan	1fda429726	kfdtest: GetNodeIoLinkProperties: Display NodeFrom Use the NodeFrom returned by hsaKmtGetNodeIoLinkProperties() to check its correctness. Change-Id: I6ce436dc7c5d5b192bee21156292bd3eff77f916 Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>	2018-09-10 09:44:24 -04:00
Harish Kasiviswanathan	7876bb70a9	Add cgroup support Some nodes are unavailable based on the task's cgroup hierarchy. Handle this situation by ignoring those nodes Change-Id: I72f9e822d2ec8cf15732df95e427d5549a75b55d Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>	2018-09-06 16:56:32 -04:00
Harish Kasiviswanathan	866ef20054	iolinks: Handle GPU resource management With GPU resource management, some nodes are unavailable based on the cgroup hierarchy of the task. Kernel via sysfs specifies all the iolinks. Skip the links which are not accessible. Also iolinks specified by the kernel refer to sysfs Node IDs. Map it to relevant user Node IDs v2: NodeFrom mapped from sysfs Node to User Node Change-Id: I95312ee6ca51b89fe9e6ca2a9185c2ea1e94afc4 Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>	2018-09-06 16:56:07 -04:00
Harish Kasiviswanathan	f84a99e953	Replace global variable _system with g_system Change-Id: I452090473a5b46b32204f7f916bdcfdd3e8a47bd Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>	2018-09-06 16:56:07 -04:00
xinhui pan	9c7cfc0df2	kfdtest: Add event-based synchronization mechanism to queues Wait4PacketConsumption can now accept an event to wait for all submitted packets to be processed. Change-Id: I1497b7704e892b04d05811b8d3e4742237c1be57 Signed-off-by: xinhui pan <xinhui.pan@amd.com>	2018-09-04 21:21:19 -04:00
Felix Kuehling	a9bd6e6f8b	Revert "libhsakmt: Try to use CPU addr as GPU addr for userptrs" This reverts commit `ab181c46c0`. This fixes ambiguity when looking up GPU addresses with hsaKmtQueryPointerInfo. hsa_amd_agents_allow_access uses hsaKmtQueryPointerInfo, and depends on finding the correct object from a GPU address. Finding the wrong userptr object based on its CPU address leads to incorrect GPU mappings and results in VM faults. Change-Id: I7c5f571ee6e1f9d32687aa3eab6d96944ad032be Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2018-08-31 15:04:50 -04:00
Felix Kuehling	608dddbe9d	kfdtest: Fix gfx902 blacklist Removed some tests from the blacklist that are now passing. Added two new tests that hang the GPU. Change-Id: I09e729590e5181311375058be492d387342ba2fe Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2018-08-31 15:04:50 -04:00
Felix Kuehling	855f1a32a9	libhsakmt: Fix and deduplicate object lookup code Added a helper vm_find_object that can be used everywhere we need to lookup objects by their address and optionally size. This unifies all subtly different, partially incomplete, or broken ways of doing this in various functions: * map * unmap * register * deregister * free * get_mem_info * set_mem_user_data At the same time fix some subtle problems for userptr lookup that got a bit more complex when the userptr address can match the GPU address. Change-Id: I98572d1734fc7688a1d68f6a784e02c8dee90af5 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2018-08-31 15:04:47 -04:00
shaoyunl	30a4ab39f3	thunk: Avoid create PCIe indirect link on none large bar target PCIe P2P (indirect) IOLinks should only be created if the remote GPU is large-BAR Change-Id: I55cbb5e37c5d41267583e07aca6bdcc708403029 Signed-off-by: shaoyunl <Shaoyun.Liu@amd.com>	2018-08-29 16:31:55 -04:00
Shaoyun Liu	7796994f46	Thunk: Avoid add indirect link for the GPUS with xGMI link Change-Id: I06f511c55e28919512fda79b504566818dc2a5ab Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>	2018-08-29 13:22:58 -04:00
xinhui pan	a040a24243	kfdtest: Let BigBufferStressTest detect memory leak As it will alloc as much as small system memory to reach the allocation limit. We can try to alloc memory several times to see if any allocation in the previous step cause memory leak. Also we test if GPU can access these memory correctly or not. Change-Id: I309f9821b6bc99c212a6bfbc21fe3086ab589fd3 Signed-off-by: xinhui pan <xinhui.pan@amd.com>	2018-08-28 22:50:42 -04:00
Shaoyun Liu	f9faf05fd9	Thunk: Add xgmi thunk interface definition Add XGMI related defines in thunk according to the document HSAKMT library interface specification v1.16 Change-Id: Ib25ff0ddf7380c97d06bd76fb730915e7c634270 Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>	2018-08-27 13:13:37 -04:00
xinhui pan	3e527bc7e8	kfdtest: add PM4EventInterrupt test Similar with SdmaEventInterrupt, verify event interrupt on pm4 queue. Change-Id: I0e43f26fd0d965126985820704215d2ef5e52c1a Signed-off-by: xinhui pan <xinhui.pan@amd.com>	2018-08-24 13:21:01 +08:00
xinhui pan	bdb1f8a066	kfdtest: Let SdmaEventInterrupt test more meaningful Simulate some workload there to verify the sDMA event interrupt. Change-Id: Ib5ad0c238cc66898f7835e765df50427ef106b04 Signed-off-by: xinhui pan <xinhui.pan@amd.com>	2018-08-24 11:27:34 +08:00
xinhui pan	1076075a1c	kfdtest: Add some asserts in BigBufferStressTest It should have PASS/FAIL report for the vram allocated size. Change-Id: I546c02c2ed02f1cfb5278e0dfd7b18ade39faafb Signed-off-by: xinhui pan <xinhui.pan@amd.com>	2018-08-23 23:01:20 -04:00
Mike Li	3437a356c7	Decouple user NodeID and sysfs NodeID Currently, all HSA nodes are exposed to user. So the existing implementation assumes a one to one mapping between user NodeId and sysfs nodeId. GPU Resource Management will provide control over the exposed HSA nodes. This means not all HSA nodes will be exposed to the user. Decouple it. The mapping from user NodeId to sysfs NodeId will be local to topology.c and topology helper functions. For others NodeId should be sequential from 0 to Number of Nodes exposed to user. v1: initial implementation v2: map node id within the topology_* functions v3: remove two static globals v4: add bounds check got node id Change-Id: Id12147ece41d682430f398944bbb339ca906eb1b Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>	2018-08-23 16:01:32 -04:00
Kent Russell	fe33461622	kfdtest: Consolidate logic for ASSERT vs EXPECT ASSERT failures result in immediate termination of the test. EXPECT returns a failure but continues execution. Reserve ASSERT for required functionality (node initialization, queue creation, etc) where the rest of the test cannot run if that call fails. Use EXPECT everywhere else Change-Id: I1c11326fc3ae22b50fa83b07b3b49af1e1f4e69e	2018-08-23 06:20:18 -04:00
Kent Russell	414042abf7	kfdtest: Clean up comments Consolidate style (use /* */ for multi-line), fix typos, use dword instead of DWORD/DWord Change-Id: I620e45c1687550db41127e45641b7d79d28223a1	2018-08-23 06:20:17 -04:00
Philip Cox	db92d5af23	Add GFX debug trap control code Add initial support for the kfd debugger trap support for GFX9 chips. - Adding support for Enable/Disable trap support - Setting debug trap support data - Setting wave launch trap override - Setting wave launch mode Change-Id: If39f2395c4b6cf56249cf76f1c44cfcbdcef891c Signed-off-by: Philip Cox <Philip.Cox@amd.com>	2018-08-22 14:40:15 -04:00
Felix Kuehling	9271e69ddf	libhsakmt: Fix processing of memory fault events AMDKFD_IOC_WAIT_EVENTS with multiple events and wait_for_all = 0 returns success after any of the events have signaled. So we can't blindly assume that a memory fault event that was in the list has actually signaled. Check the gpu_id as an indicator whether there really was a memory fault before processing it further. Change-Id: I6cc311bfc184c631beaf684027176a6ca42e05c1 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2018-08-17 16:06:45 -04:00
Felix Kuehling	ab181c46c0	libhsakmt: Try to use CPU addr as GPU addr for userptrs If the CPU addr of a userptr is accessible by the GPU, try to use it instead of allocating a different GPU address. If something else is already registered with an overlapping address range, we still need to allocate a GPU address, because KFD does not support overlapping GPUVM mappings. Change-Id: I452963ee45a454f735755a0b43122b9aee5d55be Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>	2018-08-17 16:06:45 -04:00
Felix Kuehling	80f2cc644c	libhsakmt: Add mmap-based aperture management for GFXv9 and later If the GPU virtual address space is >= 47 bits, don't reserve virtual address space at startup and use mmap to allocate virtual addresses. Change-Id: Ic935b03c8e78271829fc8e6cfd0e543184aff818 Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>	2018-08-17 16:06:45 -04:00
xinhui pan	163fa2f3aa	kfdtest: use HSAuint64 instead of unsigned HSAint64 This should fix gtest compile errors. code like below has trouble, typedef char char8; typedef unsigned char uchar8; ASSERT_NE((uchar8)1, 0); ASSERT_NE((unsigned char8)1, 0); // compile error here or ASSERT_NE((unsigned char8)1, 0); ASSERT_NE((uchar8)1, 0); // compile error here HSA[u]int64 are alias. So ASSERT_XX((unsigned HSAint64)..) with ASSERT_XX((HSAuint64)..) fail to compile. Change-Id: I4c24bc699a69bd4f37c4bc8aaaa9f1a92a24a33e Signed-off-by: xinhui pan <xinhui.pan@amd.com>	2018-08-16 16:03:52 +08:00
Yong Zhao	62f7dc2a48	kfdtest: Do not set GTEST_FLAG throw_on_failure The flag makes EXPECT_* to behave like ASSERT_*, which actually work against our favor, so disable the flag. Change-Id: I2ea1dfeaf916b396593a504d081148abdac0fc70 Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>	2018-08-15 18:08:39 -04:00
Felix Kuehling	40c46cc6cb	libhsakmt: Fix assumptions about userptrs relative to apertures So far we have assumed that userptrs are always memory outside reserved SVM apertures that are mapped into the SVM aperture for GPU access. With an unreserved SVM aperture that covers the entire virtual address range, this distinction will no longer be true. Userptrs will generally be inside the unreserved SVM aperture. Take that into consideration when registering, mapping and unmapping virtual addresses. We now need a retry logic when looking up buffers from addresses. If it is not found by its GPU address, try it as a userptr. We also need to consider the new possibility that a userptr is registered at the same address for CPU and GPU access. So a buffer found by its GPU address may also turn out to be a userptr. In that case use a stricter lookup using the userptr and size (if the size is known), to identify the correct one of multiple overlapping objects. Change-Id: Ia43633aaa40f9fd2a74918ae969a631d2ff68419 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2018-08-15 16:07:54 -04:00
Felix Kuehling	d79b9c1a29	libhsakmt: Make VA management scheme configurable per aperture Change-Id: Ib70b038b4ef6465b03545317c6494a4e4950c107 Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>	2018-08-15 14:22:19 -04:00
Felix Kuehling	d57026f447	libhsakmt: Allow dgpu and dgpu_alt aperture to be the same Make dgpu_aperture and dgpu_alt_aperture pointers that can point to the same actual aperture. This will be useful on GFXv9 and later, where the MType is not defined by the aperture and we want to have a single aperture covering the entire virtual address space. aperture->is_coherent can no longer be a reliable indicator of coherency. Replace it with different conditions based on mem flags and svm.disable_cache (from HSA_DISABLE_CACHE environment). Change-Id: Iefc415b87b8abd96e3916586485a0a55d9b27c19 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>	2018-08-15 14:22:19 -04:00
Felix Kuehling	2d2181b478	libhsakmt: Move unmapping into aperture_release_area This prepares the code for an alternative aperture management method that needs to unmap memory differently. Change-Id: I5494aa5420f85edb8f7857f00c17e1d2e6479a51 Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>	2018-08-15 14:22:19 -04:00
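
Commit 5e4e19d47b above describes treating EPERM and EACCES differently when a device file cannot be opened. The following is a minimal sketch of that policy, not the libhsakmt implementation: the enum, the function names `classify_open_errno` and `try_open_device`, and the message wording are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical outcome codes for this sketch; these are not libhsakmt names. */
enum open_outcome {
    OPEN_OK,            /* device opened successfully */
    OPEN_SKIP_SILENT,   /* EPERM: assume the device was disabled on purpose via cgroup */
    OPEN_REPORT_PERMS,  /* EACCES: file permission bits or ACL deny access */
    OPEN_REPORT_OTHER   /* any other errno value */
};

/* Map an errno value (0 meaning success) to the handling policy. */
enum open_outcome classify_open_errno(int err)
{
    switch (err) {
    case 0:      return OPEN_OK;
    case EPERM:  return OPEN_SKIP_SILENT;
    case EACCES: return OPEN_REPORT_PERMS;
    default:     return OPEN_REPORT_OTHER;
    }
}

/* Try to open a device file; stay quiet on EPERM, print a hint on EACCES.
 * On failure *fd_out is set to -1. */
enum open_outcome try_open_device(const char *path, int *fd_out)
{
    assert(path != NULL && fd_out != NULL);

    int fd = open(path, O_RDWR);
    int err = (fd >= 0) ? 0 : errno;
    enum open_outcome outcome = classify_open_errno(err);

    *fd_out = fd;
    if (outcome == OPEN_REPORT_PERMS) {
        fprintf(stderr, "Failed to open %s: permission denied\n", path);
        fprintf(stderr, "Check the permissions of device file %s\n", path);
    } else if (outcome == OPEN_REPORT_OTHER) {
        fprintf(stderr, "Failed to open %s: errno %d\n", path, err);
    }
    return outcome;
}
```

A caller would pass a render node path such as /dev/dri/renderN (with N filled in) and skip the node without logging when the result is `OPEN_SKIP_SILENT`.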


435 Commits
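
Commit fb79a0efe2 above contrasts `sysconf(_SC_NPROCESSORS_ONLN)` with `sched_getaffinity()` for counting processors inside a container. The sketch below shows the two calls side by side on Linux with glibc. The helper names `cpus_online` and `cpus_available_to_process` are hypothetical, and this is an illustration of the idea rather than the code in topology.c.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for cpu_set_t and sched_getaffinity in glibc */
#endif
#include <assert.h>
#include <sched.h>
#include <unistd.h>

/* Number of processors currently online in the whole system. */
long cpus_online(void)
{
    return sysconf(_SC_NPROCESSORS_ONLN);
}

/* Number of processors the calling process may run on (its affinity mask).
 * Returns -1 if the affinity mask cannot be read, for example on systems
 * with more CPUs than a fixed-size cpu_set_t can hold. */
int cpus_available_to_process(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;

    int n = CPU_COUNT(&set);
    assert(n > 0);              /* a running thread is allowed on at least one CPU */
    return n;
}
```

Under a restricted cpuset the second value is typically smaller than the first, which is the case the commit message describes for container environments.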