Commit Graph

2959 Commits

Author SHA1 Message Date
Sean Keely dda9c17b45 Move VM fault handler init to after all devices are registered.
During registration we must not call any function that depends on registered
data as the lists are not yet complete.  This includes signal allocation since
allocating shared GPU mapped memory depends on the list of GPUs.

Change-Id: I94d59e847802c546c2a5a0d9f55fe5ac3fd1d878
2018-11-09 03:10:08 -06:00
Mike Li 3afce42b57 Changed scripts to include running kfdtest in docker container
Change-Id: I822ff4869610df6abad846542d7c290b7a5aae79
2018-11-07 16:09:12 -05:00
Gang Ba c54c1dbdcb Add code to support packet capture and replay in the Thunk
This feature only supports dGPU for now.

Change-Id: Ic766ec06892c955dd605ecc335a776335edc0df2
Signed-off-by: Gang Ba <gaba@amd.com>
2018-10-31 16:53:46 -04:00
Harish Kasiviswanathan c1994e28f0 libhsakmt: Support device controller cgroup
The device whitelist controller cgroup tracks and enforces open and
mknod restrictions on device files. Tasks should work with the
/dev/dri/renderN devices that are whitelisted for their cgroup. If a
certain node is not whitelisted, it is not an error condition.

Change-Id: I0b997423ccdc00aee98df5b6f04ed6794549604e
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2018-10-30 11:31:53 -04:00
Kent Russell 10edccb912 Specify requirement of NUMA libs for Thunk
Add the numa libs to the thunk specs for DEB/RPM, so we can remove the
manual installation requirement.

Change-Id: I5aadcf581b64e9a20aee9c1e1204af4715d1e990
2018-10-25 07:37:07 -04:00
Philip Cox 105edd4bb4 Fix Debug Thunk spec mismatch
Move debug trap support capabilities to their own
structure to fix thunk spec vs header mismatch.



Change-Id: I6694601bfa36097502c8ab932e082d7a4645d5b2
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
2018-10-24 11:32:12 -04:00
Sean Keely 9ec37b5103 Ensure runtime cleanup when hsa_init ref count reaches 0.
Delete the runtime object when the last hsa_shut_down occurs.

Change-Id: I2005d52d06702eaef166714fd5e471cc277924db
2018-10-22 19:32:00 -05:00
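The reference counting described in commit 9ec37b5103 can be sketched as follows. This is a minimal Python illustration of the idea, not the ROCr implementation: the class name RuntimeRefCount, its init and shut_down methods, and the placeholder runtime object are all hypothetical.

```python
import threading


class RuntimeRefCount:
    """Hypothetical stand-in for a reference-counted runtime singleton."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0
        self.runtime = None  # placeholder for the real runtime object

    def init(self):
        with self._lock:
            if self._count == 0:
                # The first init call creates the runtime.
                self.runtime = object()
            self._count += 1

    def shut_down(self):
        with self._lock:
            if self._count == 0:
                raise RuntimeError("shut_down called without a matching init")
            self._count -= 1
            if self._count == 0:
                # The last shut_down call releases the runtime.
                self.runtime = None
```

Each init call must be balanced by one shut_down call; the runtime is released only when the count returns to zero.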
Evgeny d788a53972 aqlprofile extension version check
Change-Id: If824764f199eca15a0341cdf6177d8d6353e29f3
2018-10-22 15:36:57 -04:00
Sean Keely 757502ccd6 Report internal queue creation to tools.
Debug agent requires handles to internal queues for single step debugging.
Added tools only API hsa_amd_runtime_queue_create_register for reporting.

hsa_amd_runtime_queue_create_register sets a callback which is invoked
when internal queues are created.

Change-Id: Ia5190ae724fadba686c15f25b2cd085350eeff0e
2018-10-20 23:12:27 -04:00
Sean Keely 5975c465ad Fully initialize GPU agents before loading tools.
The debug agent requires the copy API and trap handler to be initialized prior
to loading.  Existing tools do not make use of the internal queue or scratch
memory intercept, which is what PostToolsInit allows.

PostToolsInit() will be removed in a following cleanup change.

Change-Id: If43377843808e3eff0defd9204910a67a852902f
2018-10-20 23:12:14 -04:00
Sean Keely 6852282a07 Refactor of Runtime::CopyMemory()
Change-Id: I32a7cb24d00660ff4471d121ef7b3c2eec8fced2
2018-10-20 14:38:50 -04:00
xinhui pan 7a13bb4d66 kfdtest: blacklist KFDQMTest.SdmaEventInterrupt
On gfx900+, the test sometimes times out due to a CP firmware bug.
Blacklist it until we address the root cause and have a fix.

Change-Id: Iff600a6f6dbd86c56e034f530484205520bced32
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-10-19 15:29:54 -04:00
xinhui pan ab4610cff7 kfdtest: Add more debug information of sdma event interrupt test
We observe that this test fails on gfx900+. It looks like the SDMA packets are
sometimes not executed at all after we submit them.

Run it with a 2s timeout on gfx900:
[ RUN      ] KFDQMTest.SdmaEventInterrupt
[----------] SDMACopyData FAIL! 1485262707170 VS 1485262747814
[----------] Event On Queue 1:0 Timeout, try to resubmit packets!
[----------] The timeout event is signaled!
[          ] Time Consumption (ns)
[          ] 1: 1859427148
[          ] 2: 680148
[          ] 3: 6370
[          ] 4: 5481
/home/pp/code/compute/libhsakmt/tests/kfdtest/src/KFDQMTest.cpp:1670: Failure
Value of: (ret)
  Actual: 31
Expected: HSAKMT_STATUS_SUCCESS
Which is: 0
[----------] SDMACopyData FAIL! 1485367669958 VS 1485367750022
[----------] Event On Queue 2:1 Timeout, try to resubmit packets!
[----------] The timeout event is signaled!
[          ] Time Consumption (ns)
[          ] 1: 1881615148
[          ] 2: 673629
[          ] 3: 6074
[          ] 4: 5481
/home/pp/code/compute/libhsakmt/tests/kfdtest/src/KFDQMTest.cpp:1670: Failure
Value of: (ret)
  Actual: 31
Expected: HSAKMT_STATUS_SUCCESS
Which is: 0
[----------] SDMACopyData FAIL! 1485427671250 VS 1485427751238
[----------] Event On Queue 2:1 Timeout, try to resubmit packets!
[----------] The timeout event is signaled!
[          ] Time Consumption (ns)
[          ] 1: 1881508777
[          ] 2: 741629
[          ] 3: 6074
[          ] 4: 5481
/home/pp/code/compute/libhsakmt/tests/kfdtest/src/KFDQMTest.cpp:1670: Failure
Value of: (ret)
  Actual: 31
Expected: HSAKMT_STATUS_SUCCESS
Which is: 0
[  FAILED  ] KFDQMTest.SdmaEventInterrupt (23675 ms)

Change-Id: I7c1b752537d89782570df20838bf976578614f75
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-10-19 15:29:54 -04:00
Yong Zhao d7e6d4706c kfdtest: Clean up the indentations in PM4ReleaseMemoryPacket::InitPacket()
Change-Id: I7f6b08697f6a68bf8c4a388c9f1cf3c3c8e6c81f
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2018-10-17 14:28:15 -04:00
Yong Zhao 77bab8596f kfdtest: Improve the SignalEvent test
Create an extra event so that the event id to test is non zero. That
way we can be sure the context id received in kernel ISR is non zero, which
is different from the default value 0 when context id is not set at all.

Change-Id: I7e261d1bbb783d5afd15558c7ac00493b1218cef
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2018-10-17 14:27:54 -04:00
Konstantin Zhuravlyov 509bb777e0 Loader: Update license for AMDHSAKernelDescriptor.h
Change-Id: I3a48b595ba089ca8a25f878c056b04a417a2364f
2018-10-12 14:51:05 -04:00
Sean Keely 1e0d690948 Use ptrinfo rather than apertures in hsa_memory_copy
Apertures now overlap with the change to 48-bit addressing, which
precludes using aperture checks to discover buffer ownership.
Switches to ptrinfo to decide which device owns a buffer.

This corrects faults in the legacy hsa_memory_copy api.

Change-Id: I5c7ce0216e1cdc96f836fc6fec9c3defdf4b9d90
2018-10-11 13:34:53 -04:00
Konstantin Zhuravlyov 386874da55 Loader: Add support for v3 object code.
Change-Id: I7215bd0c1277c2036bf0fadf5b23cb57fdf7f665
2018-10-06 14:01:59 -04:00
Jay Cornwall f1ffbc3286 Revert "Extend SDMA disable list until firmware stability resolved"
This reverts commit 5e1ccdc4a9.

Change-Id: I17b379e4d0e49a79dc8d4a60f01ea424fda24f02
2018-10-05 15:17:27 -04:00
Kent Russell ed9baefd75 Only remove ldconf on uninstall
On update, the removal will occur AFTER the new package is installed,
due to some stupidity with how yum/rpm does things. Only remove it if
we're doing a pure uninstall

Change-Id: I4982610828d8bc1f2d8691b1e4ee1718c89413cc
2018-10-03 08:10:06 -04:00
Evgeny fdbe277f2a hsa_ven_amd_aqlprofile_pfn_t alias
Change-Id: Ia4a67ef0d2f8975f0e541e85c215afec76e9de5f
2018-09-26 14:10:21 -04:00
Gang Ba 52ec7f805e drm/amdkfd: Added gfx904 and gfx803 for KFD.
Change-Id: I4406dc70c776926feaecca3f2146d65259a80517
Signed-off-by: Gang Ba <gaba@amd.com>
2018-09-25 08:17:44 -04:00
Mike Li 3144a84b9a all_gpu_id_array: Handle GPU resource management
GPU resource management can disable some of the GPU nodes.
The kernel driver may not be aware of this.
Get information about all the nodes from the kernel driver and then filter it.

Change-Id: I4eeb126a5efce2192c35f5d2b72be1811e9ded32
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
2018-09-24 11:38:11 -04:00
Mike Li c3b47c0959 kfdtest: Handle GPU resource management
Currently the FindDRMRenderNode function accesses sysfs
directly to find the render node. This doesn't work with the
GPU resource management changes. Changed the code to call
hsaKmtGetNodeProperties instead.

Change-Id: I3bb537a323bc1e8c49f38d8aabc60c13e268aecd
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
2018-09-24 11:38:11 -04:00
Mike Li f9bd960344 Output an error message only when open_drm_render_device failed unexpectedly.
Change-Id: I5b9587a8d5c7a900e9ab8611a25d0c49d34b4cef
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
2018-09-24 11:36:11 -04:00
xinhui pan 918a45a430 kfdtest: add P2POverheadTest
This measures the latency + overhead of SDMA packet
consumption on P2P.
It is similar to the QueueLatency test. In addition, the queue's overhead
under different workloads shows more detail.

Test result on two gfx900:
[ RUN      ] KFDPerformanceTest.P2POverheadTest
[          ] Test (avg. ns) | Size	4	8	16	64	256	1024
[          ] -----------------------------------------------------------------------
[          ] [push]     [1 -> 0]	333	148	185	111	148	148
[          ] [push]     [1 -> 1]	370	222	333	74	148	111
[          ] [push]     [1 -> 2]	333	148	148	148	148	148
[          ] [push]     [2 -> 0]	111	333	259	148	148	148
[          ] [push]     [2 -> 1]	222	148	185	148	148	148
[          ] [push]     [2 -> 2]	222	111	370	111	74	148
[          ] [pull]     [1 -> 0]	370	296	296	148	185	148
[          ] [pull]     [1 -> 1]	185	333	222	148	222	148
[          ] [pull]     [1 -> 2]	222	444	259	148	185	111
[          ] [pull]     [2 -> 0]	148	148	148	148	148	148
[          ] [pull]     [2 -> 1]	148	148	148	148	148	148
[          ] [pull]     [2 -> 2]	185	148	148	74	222	296
[          ] [push|pull][1 -> 0]	1259	1222	1259	1074	1037	962
[          ] [push|pull][1 -> 1]	1037	1037	1037	740	740	1000
[          ] [push|pull][1 -> 2]	1259	1259	1296	1037	1000	1074
[          ] [push|pull][2 -> 0]	1037	1037	1037	1074	1037	1148
[          ] [push|pull][2 -> 1]	1037	1037	1037	1037	925	1074
[          ] [push|pull][2 -> 2]	666	666	740	740	703	925
[       OK ] KFDPerformanceTest.P2POverheadTest (459 ms)

Change-Id: I422263cb52f7ce184f6f1ff4466d04c239fbe9c9
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-09-24 09:28:00 -04:00
Scott Linder 47f0e6f7d3 Apply dynamic relocations for STT_FUNC symbols
Required to support function calls through GOT table.

Change-Id: I174a0269fdd67369d38fe41855b7bd01f350b839
2018-09-23 21:42:32 -04:00
Harish Kasiviswanathan fb79a0efe2 Topology: Use processors available to the process
The existing call sysconf (_SC_NPROCESSORS_ONLN) provides the number of
processors available to the scheduler. When a KFD process is run under a
container environment, only a subset (cpuset) of processors are
available to the current process.

For getting CPU cache information use sched_getaffinity() to get the
number of processors available to the current process.

Change-Id: Ieac02f1f61c17e24ac34ba502968c69d3bc631cb
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2018-09-21 10:31:59 -04:00
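The difference between the two processor counts in commit fb79a0efe2 can be illustrated with a short Python sketch. It is Linux-only and is not the libhsakmt code; the helper names online_cpus and available_cpus are hypothetical. os.sysconf and os.sched_getaffinity are the Python wrappers for the calls named in the commit message.

```python
import os


def online_cpus():
    # Processors currently online system-wide, as reported by sysconf.
    return os.sysconf("SC_NPROCESSORS_ONLN")


def available_cpus():
    # Processors the calling process is allowed to run on.
    # A pid of 0 means "the current process".
    return len(os.sched_getaffinity(0))


if __name__ == "__main__":
    print("online:", online_cpus(), "available:", available_cpus())
```

Inside a container restricted by a cpuset, the second value can be smaller than the first, which is the case the commit addresses.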
xinhui pan e5a541eaf2 kfdtest: Add P2P bandwidth test
The test measures the bandwidth between GPUs. Currently we do not
take NUMA topology into account, as some products do support P2P
across PCIe root complexes.

Test result on a system with two gfx900:
[ RUN      ] KFDPerformanceTest.P2PBandWidthTest
[          ] Copy from node to node by [push, NONE]
[          ] [1 -> 0] 6.13477 - 6.12695 GB/s
[          ] [1 -> 2] 3.77734 - 3.76855 GB/s
[          ] [2 -> 0] 6.67676 - 6.6543 GB/s
[          ] [2 -> 1] 6.14453 - 6.12793 GB/s
[          ] Copy from node to node by [pull, NONE]
[          ] [1 -> 0] 6.10547 - 6.08105 GB/s
[          ] [1 -> 2] 9.65527 - 9.65039 GB/s
[          ] [2 -> 0] 6.49805 - 6.4873 GB/s
[          ] [2 -> 1] 8.95508 - 8.85254 GB/s
[          ] Full duplex copy from node to node by [push|pull, NONE]
[          ] [1 -> 0] 11.0986 - 11.0986 GB/s
[          ] [1 -> 2] 7.54297 - 7.54297 GB/s
[          ] [2 -> 0] 12.0264 - 11.9639 GB/s
[          ] [2 -> 1] 12.0469 - 12.0371 GB/s
[          ] Full duplex copy from node to node by [push, push]
[          ] [1 <-> 2] 11.7324 - 11.4541 GB/s
[          ] Full duplex copy from node to node by [pull, pull]
[          ] [1 <-> 2] 11.4824 - 11.0508 GB/s
[          ] Copy from node to multiple nodes by [push, NONE]
[          ] [1 -> [0...2]] 5.625 - 5.73633 GB/s
[          ] [2 -> [0...2]] 6.45801 - 6.4707 GB/s
[          ] Copy from multiple nodes to node by [push, NONE]
[          ] [[1...2] -> 0] 12.8379 - 12.2578 GB/s

Now we can get more timestamp info like below.

Copy from node to node by [push, NONE]
[1 -> 0]
[1 : 0] #-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-###############################
[1 : 1] ####################################################################################################
[1 -> 2]
[1 : 0] #--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#-#--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#-#--#-#-#-#-#--#-#-######################################
[1 : 1] ##################################################################################################-#
[2 -> 0]
[2 : 0] ##-###-##-###-###-##-###-##-###-###-##-###-###-##-###-###-##-###-##-###-###-##-###-###-##-###-###-#################
[2 : 1] ###############################################################################-#############-###-##
[2 -> 1]
[2 : 0] ##-##-##-##-##-###-##-##-##-##-##-###-##-##-##-##-###-##-##-##-##-##-###-##-##-##-##-###-##-##-##-####################
[2 : 1] ################################################################################-###-############-##

[snip]

Full duplex copy from node to node by [push, push]
[1 <-> 2]
[1 : 0] #-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-####################################
[1 : 1] ################-###################################################-############-####-#############
[2 : 2] #-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-##-#-##-##-##-##-#-##-##-##-##-##-#-##################
[2 : 3] #####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-#####-##
Full duplex copy from node to node by [pull, pull]
[1 <-> 2]
[1 : 0] ######################################################################-##-#-###############-####-###
[1 : 1] #-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-############################
[2 : 2] ##-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-############
[2 : 3] #-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#########-#############
Copy from node to multiple nodes by [push, NONE]
[1 -> [0...2]]
[1 : 0] #-#--#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-###############################
[1 : 1] ########################################################################################-###-###-###
[2 -> [0...2]]
[2 : 0] ##-##-##-###-##-###-##-##-###-##-###-##-##-###-##-###-##-###-##-##-###-##-###-##-##-###-##-###-##-##################
[2 : 1] -################################################################################################-##
Copy from multiple nodes to node by [push, NONE]
[[1...2] -> 0]
[1 : 0] #-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-###############################
[1 : 1] ################################################################################################-#-#
[2 : 2] ##-##-##-###-##-##-###-##-##-##-###-##-##-###-##-##-###-##-##-###-##-##-##-###-##-##-###-##-##-###-##-##################
[2 : 3] #########################-#########################-#########################-#########################
[       OK ] KFDPerformanceTest.P2PBandWidthTest (15982 ms)

Change-Id: Ia90044191d51650ccb220476d31fb317aa3ad6ce
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-09-19 12:03:05 +08:00
xinhui pan f618b3f075 kfdtest: add KFDTestUtilQueue
This adds some infrastructure, listed below.
Implement SdmaTimePacket, which records the global GPU timestamp.

Introduce the classes AsyncMPSQ and AsyncMPMQ.
AsyncMPSQ stands for async multiple packet single queue. It takes a set of
packets at creation and submits them to a GPU to run. AsyncMPMQ stands for
async multiple packet multiple queue. It manages a set of AsyncMPSQ objects
and uses a for loop to operate on them.

Implement sdma_multicopy helper functions.

Change-Id: I47e1d2ca9630113b2a1d85a0055f3f8ee629fb5f
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-09-19 12:03:05 +08:00
Ramesh Errabolu 01eea21d6c Capture number of Numa Nodes present on system
Change-Id: Ic789a6b9da8e316cb483e50b0fe9faa03798f97c
2018-09-18 16:27:30 -05:00
Xiaojie Yuan 247fa9f1e0 Use 'RecordProperty' to record performance scores
For the following test cases:
- KFDQMTest.QueueLatency
- KFDQMTest.BasicCuMaskingLinear
- KFDQMTest.BasicCuMaskingEven
- KFDMemoryTest.MMBandWidth
- KFDMemoryTest.MMapLarge
- KFDMemoryTest.MMBench

v2: An XML element name cannot start with a number, so change the key names of
    MMBandWidth and MMBench accordingly.
    An XML element name cannot contain whitespace, so trim whitespace in "VRAM  "
v3: introduce KFDLog-like way to use KFDRecord

Change-Id: Ifc3ed5657621252a7b39dccf1ef4f50a92593f77
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
2018-09-18 17:41:14 +08:00
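The key-name constraints noted in v2 of commit 247fa9f1e0 can be sketched as a small sanitizer. This is a hypothetical Python helper, not the kfdtest code: the name sanitize_key and the choice of an underscore prefix are assumptions, and it removes all whitespace rather than only trimming the ends.

```python
import re


def sanitize_key(key):
    # Remove all whitespace, e.g. "VRAM  " becomes "VRAM".
    key = re.sub(r"\s+", "", key)
    # XML names must not start with a digit, so prefix such keys.
    # The underscore prefix is an assumption made for illustration.
    if key and key[0].isdigit():
        key = "_" + key
    return key
```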
Ramesh Errabolu f007870792 ROCr changes to enable small BAR P2P over xGMI
Change-Id: I6aaa3fe2565cdf7e15d58a7484d6bd5916ffff64
2018-09-17 22:54:40 -04:00
Evgeny 81532bb6f5 VERSION_MINOR macro typo fix
Add the aqlprofile info ENABLE_CMD enum.

Change-Id: I7b19082144d2bd0bf7af7ddc282358168b225759
2018-09-17 20:49:47 -04:00
xinhui pan a6287ba919 kfdtest: Do not set GTEST_FLAG throw_on_failure
This change is from commit 62f7dc2a("kfdtest: Do not set GTEST_FLAG
throw_on_failure").
But it was unexpectedly reverted by commit 414042ab("kfdtest: Clean up
comments"). So add this change back.

Fix: 414042ab

Change-Id: Ia9e99c9ca17b99aab62b4db55017018ddae43dfb
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-09-11 10:25:56 +08:00
xinhui pan 07bd97a864 kfdtest: Fix queuelatency fail issue
The timestamp written by releaseMemory packet might still not be visible
when we fetch it.
To fix this bug, use event-based wait.

Change-Id: If2324eb3b3a632c711ee4dff4d03a93d5306c289
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-09-10 21:17:29 -04:00
Felix Kuehling be574169c1 libhsakmt: Fix segfault on gfx801
Handle the case that svm.dgpu_aperture does not exist in vm_find_object.

Change-Id: Ic0983d4f321f1b6248514f2fa25162976e90bd75
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-09-10 14:39:05 -04:00
Harish Kasiviswanathan 1fda429726 kfdtest: GetNodeIoLinkProperties: Display NodeFrom
Use the NodeFrom returned by hsaKmtGetNodeIoLinkProperties() to check
its correctness.

Change-Id: I6ce436dc7c5d5b192bee21156292bd3eff77f916
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2018-09-10 09:44:24 -04:00
Harish Kasiviswanathan 7876bb70a9 Add cgroup support
Some nodes are unavailable based on the task's cgroup hierarchy. Handle
this situation by ignoring those nodes

Change-Id: I72f9e822d2ec8cf15732df95e427d5549a75b55d
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2018-09-06 16:56:32 -04:00
Harish Kasiviswanathan 866ef20054 iolinks: Handle GPU resource management
With GPU resource management, some nodes are unavailable based on the
cgroup hierarchy of the task. The kernel specifies all the iolinks
via sysfs. Skip the links that are not accessible.

Also, iolinks specified by the kernel refer to sysfs node IDs. Map them to
the relevant user node IDs.

v2: NodeFrom mapped from sysfs Node to User Node

Change-Id: I95312ee6ca51b89fe9e6ca2a9185c2ea1e94afc4
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2018-09-06 16:56:07 -04:00
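The filtering and remapping described in commit 866ef20054 can be sketched as follows. This is an illustrative Python version under stated assumptions, not the libhsakmt code: the function name map_io_links, the tuple representation of a link, and the dictionary from sysfs node IDs to user node IDs are all hypothetical.

```python
def map_io_links(sysfs_links, sysfs_to_user):
    """Translate io links from sysfs node IDs to user node IDs.

    sysfs_links:   list of (node_from, node_to) pairs using sysfs node IDs.
    sysfs_to_user: dict mapping each accessible sysfs node ID to its user
                   node ID; inaccessible nodes are absent from the dict.
    """
    user_links = []
    for node_from, node_to in sysfs_links:
        if node_from not in sysfs_to_user or node_to not in sysfs_to_user:
            # Skip links that touch a node the task cannot access.
            continue
        user_links.append((sysfs_to_user[node_from], sysfs_to_user[node_to]))
    return user_links
```

For example, if sysfs node 1 is hidden from the task, the remaining nodes 0 and 2 become user nodes 0 and 1, and every link that involves node 1 is dropped.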
Harish Kasiviswanathan f84a99e953 Replace global variable _system with g_system
Change-Id: I452090473a5b46b32204f7f916bdcfdd3e8a47bd
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2018-09-06 16:56:07 -04:00
Sean Keely 3357cadeec Check fill addresses for alignment.
Check was documented but missing.

Change-Id: I97951635d794fd22e20c25d20e9d0e35035254af
2018-09-05 16:34:19 -04:00
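An alignment check of the kind added in commit 3357cadeec can be sketched in a few lines. This Python version is illustrative only; the helper name is_aligned and the default 4-byte alignment are assumptions, since the commit message does not state the required alignment.

```python
def is_aligned(address, alignment=4):
    # The bit-mask test below is only valid for power-of-two alignments.
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # An address is aligned when its low bits are all zero.
    return address & (alignment - 1) == 0
```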
xinhui pan 9c7cfc0df2 kfdtest: Add event-based synchronization mechanism to queues
Wait4PacketConsumption can now accept an event to wait for all submitted
packets to be processed.

Change-Id: I1497b7704e892b04d05811b8d3e4742237c1be57
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-09-04 21:21:19 -04:00
Felix Kuehling a9bd6e6f8b Revert "libhsakmt: Try to use CPU addr as GPU addr for userptrs"
This reverts commit ab181c46c0.

This fixes ambiguity when looking up GPU addresses with
hsaKmtQueryPointerInfo.

hsa_amd_agents_allow_access uses hsaKmtQueryPointerInfo, and
depends on finding the correct object from a GPU address. Finding
the wrong userptr object based on its CPU address leads to
incorrect GPU mappings and results in VM faults.


Change-Id: I7c5f571ee6e1f9d32687aa3eab6d96944ad032be
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-31 15:04:50 -04:00
Felix Kuehling 608dddbe9d kfdtest: Fix gfx902 blacklist
Removed some tests from the blacklist that are now passing. Added two
new tests that hang the GPU.

Change-Id: I09e729590e5181311375058be492d387342ba2fe
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-31 15:04:50 -04:00
Felix Kuehling 855f1a32a9 libhsakmt: Fix and deduplicate object lookup code
Added a helper vm_find_object that can be used everywhere we need to
look up objects by their address and optionally size. This unifies
all subtly different, partially incomplete, or broken ways of doing
this in various functions:

* map
* unmap
* register
* deregister
* free
* get_mem_info
* set_mem_user_data

At the same time fix some subtle problems for userptr lookup that
got a bit more complex when the userptr address can match the GPU
address.


Change-Id: I98572d1734fc7688a1d68f6a784e02c8dee90af5
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-31 15:04:47 -04:00
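The idea of one shared lookup by address and optional size, as described in commit 855f1a32a9, can be sketched as follows. This is a Python illustration and not the vm_find_object implementation: the ObjectTable class, its methods, and the rule that a supplied size must match the object's start and size exactly are assumptions. Objects are assumed not to overlap.

```python
import bisect


class ObjectTable:
    """Hypothetical table of non-overlapping memory objects."""

    def __init__(self):
        self._starts = []   # sorted start addresses
        self._objects = {}  # start address -> (size, payload)

    def insert(self, start, size, payload):
        bisect.insort(self._starts, start)
        self._objects[start] = (size, payload)

    def find(self, address, size=None):
        # Locate the last object that starts at or before the address.
        i = bisect.bisect_right(self._starts, address) - 1
        if i < 0:
            return None
        start = self._starts[i]
        obj_size, payload = self._objects[start]
        if address >= start + obj_size:
            return None  # the address falls in a gap after this object
        if size is not None and (address != start or size != obj_size):
            return None  # assumed rule: a given size needs an exact match
        return payload
```

Routing every caller through a single find method is what removes the subtly different lookup variants the commit message lists.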
Sean Keely 2843988dd7 Remove redundant initialization.
LinkInfo is already initialized to zero in its default constructor.

Change-Id: Ifa4fb886cce9b474c6879c9c82744044ab394082
2018-08-29 19:36:07 -04:00
Sean Keely 56ed5c8904 Refactor blocking sdma commands.
Remove fence pool and use two signals.  Two signals allows overlapped
submission and copy while reducing thread busy polling.

Change-Id: Idb5f8e4c7f482a596ffce9e7799191fdd785a216
2018-08-29 19:13:23 -04:00
Sean Keely e0839ab27e Implement SDMA copy rect for gfx9.
Fix pitch overflow due to small element detection.
Add wide pitch 2D copy handling.
Cleanup code duplication.

Change-Id: I93b1584aba8e5964957eb7ab3544df806ca3e2f9
2018-08-29 19:13:07 -04:00
shaoyunl 30a4ab39f3 thunk: Avoid creating PCIe indirect link on non-large-BAR target
PCIe P2P (indirect) IOLinks should only be created if the remote GPU
is large-BAR.

Change-Id: I55cbb5e37c5d41267583e07aca6bdcc708403029
Signed-off-by: shaoyunl <Shaoyun.Liu@amd.com>
2018-08-29 16:31:55 -04:00