Commit graph

437 commits

Author SHA1 Message Date
Oak Zeng b87f8459f4 Add more SDMA queue type
These new types are used to create an SDMA queue on a specific engine.

Change-Id: I91c3bcc14fef7404cf42b256a18651432e171091
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>


[ROCm/ROCR-Runtime commit: 5173e71810]
2018-11-13 14:52:01 -05:00
Oak Zeng 49dbd130f5 Use latest kfd_ioctl.h file
Change-Id: Icd7da4a305581c6857e17d59fbd0c3bd5101df3b
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>


[ROCm/ROCR-Runtime commit: 055f7c9c2c]
2018-11-13 14:51:46 -05:00
Felix Kuehling 6819730ea3 libhsakmt: Distinguish EPERM and EACCES
EPERM means "operation not permitted" and is returned when CGroup
access checks fail. EACCES means "permission denied" and is returned
when the device file permission bits or access control list don't
allow access.

EPERM can fail silently, since we assume the administrator disabled
a device in the CGroup on purpose. EACCES should produce an error
message and an info message to check the device file permissions.
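The policy above can be sketched as a small errno classifier. This is a minimal illustration only. The enum, the helper names and the message wording are hypothetical and are not libhsakmt's actual code. It assumes the caller passes the errno value saved right after a failed open() of a device file.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical policy enum for this sketch; not part of libhsakmt. */
enum open_err_policy {
    OPEN_ERR_SILENT, /* EPERM: device disabled in the CGroup on purpose */
    OPEN_ERR_REPORT, /* EACCES: permission bits or ACL deny access */
    OPEN_ERR_OTHER   /* anything else: report as a generic failure */
};

static enum open_err_policy classify_open_error(int err)
{
    switch (err) {
    case EPERM:
        return OPEN_ERR_SILENT;
    case EACCES:
        return OPEN_ERR_REPORT;
    default:
        return OPEN_ERR_OTHER;
    }
}

/* Print diagnostics according to the policy and return the policy used.
 * Nothing is printed for EPERM. */
static enum open_err_policy report_open_error(const char *path, int err)
{
    enum open_err_policy p = classify_open_error(err);

    if (p == OPEN_ERR_REPORT) {
        fprintf(stderr, "Failed to open %s: %s\n", path, strerror(err));
        fprintf(stderr, "Check the permissions of %s\n", path);
    } else if (p == OPEN_ERR_OTHER) {
        fprintf(stderr, "Failed to open %s: %s\n", path, strerror(err));
    }
    return p;
}
```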

Change-Id: Iee4c5584c5fdc4e113c3d760dede6661097b4341
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>


[ROCm/ROCR-Runtime commit: 5e4e19d47b]
2018-11-12 17:06:18 -05:00
Mike Li b3fdcfe3b9 Changed scripts to include running kfdtest in docker container
Change-Id: I822ff4869610df6abad846542d7c290b7a5aae79


[ROCm/ROCR-Runtime commit: 3afce42b57]
2018-11-07 16:09:12 -05:00
Gang Ba 9147adc1d5 Add code to support packet capture and replay in the Thunk
This feature only supports dGPU for now.

Change-Id: Ic766ec06892c955dd605ecc335a776335edc0df2
Signed-off-by: Gang Ba <gaba@amd.com>


[ROCm/ROCR-Runtime commit: c54c1dbdcb]
2018-10-31 16:53:46 -04:00
Harish Kasiviswanathan 278287f045 libhsakmt: Support device controller cgroup
The device whitelist controller cgroup tracks and enforces open and
mknod restrictions on device files. Tasks should work with the
/dev/dri/renderN devices that are whitelisted for their cgroup. A node
that is not whitelisted is not an error condition.

Change-Id: I0b997423ccdc00aee98df5b6f04ed6794549604e
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: c1994e28f0]
2018-10-30 11:31:53 -04:00
Kent Russell c3aacd8463 Specify requirement of NUMA libs for Thunk
Add the NUMA libs to the thunk specs for DEB/RPM, so we can remove the
manual installation requirement.


Change-Id: I5aadcf581b64e9a20aee9c1e1204af4715d1e990


[ROCm/ROCR-Runtime commit: 10edccb912]
2018-10-25 07:37:07 -04:00
Philip Cox 84b9ffbbbd Fix Debug Thunk spec mismatch
Move debug trap support capabilities to their own
structure to fix thunk spec vs header mismatch.



Change-Id: I6694601bfa36097502c8ab932e082d7a4645d5b2
Signed-off-by: Philip Cox <Philip.Cox@amd.com>


[ROCm/ROCR-Runtime commit: 105edd4bb4]
2018-10-24 11:32:12 -04:00
xinhui pan 11106ed72f kfdtest: blacklist KFDQMTest.SdmaEventInterrupt
On gfx900+, the test sometimes times out due to a CP firmware bug.
Blacklist it until we address the root cause and have a fix.

Change-Id: Iff600a6f6dbd86c56e034f530484205520bced32
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: 7a13bb4d66]
2018-10-19 15:29:54 -04:00
xinhui pan 4bf0f9f43c kfdtest: Add more debug information of sdma event interrupt test
We observe that this test fails on gfx900+. It looks like the SDMA packets
are sometimes not executed at all after we submit them.

Output when run with a 2 s timeout on gfx900:
[ RUN      ] KFDQMTest.SdmaEventInterrupt
[----------] SDMACopyData FAIL! 1485262707170 VS 1485262747814
[----------] Event On Queue 1:0 Timeout, try to resubmit packets!
[----------] The timeout event is signaled!
[          ] Time Consumption (ns)
[          ] 1: 1859427148
[          ] 2: 680148
[          ] 3: 6370
[          ] 4: 5481
/home/pp/code/compute/libhsakmt/tests/kfdtest/src/KFDQMTest.cpp:1670: Failure
Value of: (ret)
  Actual: 31
Expected: HSAKMT_STATUS_SUCCESS
Which is: 0
[----------] SDMACopyData FAIL! 1485367669958 VS 1485367750022
[----------] Event On Queue 2:1 Timeout, try to resubmit packets!
[----------] The timeout event is signaled!
[          ] Time Consumption (ns)
[          ] 1: 1881615148
[          ] 2: 673629
[          ] 3: 6074
[          ] 4: 5481
/home/pp/code/compute/libhsakmt/tests/kfdtest/src/KFDQMTest.cpp:1670: Failure
Value of: (ret)
  Actual: 31
Expected: HSAKMT_STATUS_SUCCESS
Which is: 0
[----------] SDMACopyData FAIL! 1485427671250 VS 1485427751238
[----------] Event On Queue 2:1 Timeout, try to resubmit packets!
[----------] The timeout event is signaled!
[          ] Time Consumption (ns)
[          ] 1: 1881508777
[          ] 2: 741629
[          ] 3: 6074
[          ] 4: 5481
/home/pp/code/compute/libhsakmt/tests/kfdtest/src/KFDQMTest.cpp:1670: Failure
Value of: (ret)
  Actual: 31
Expected: HSAKMT_STATUS_SUCCESS
Which is: 0
[  FAILED  ] KFDQMTest.SdmaEventInterrupt (23675 ms)

Change-Id: I7c1b752537d89782570df20838bf976578614f75
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: ab4610cff7]
2018-10-19 15:29:54 -04:00
Yong Zhao e3f00a21ad kfdtest: Clean up the indentations in PM4ReleaseMemoryPacket::InitPacket()
Change-Id: I7f6b08697f6a68bf8c4a388c9f1cf3c3c8e6c81f
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>


[ROCm/ROCR-Runtime commit: d7e6d4706c]
2018-10-17 14:28:15 -04:00
Yong Zhao 569bdf3c84 kfdtest: Improve the SignalEvent test
Create an extra event so that the event ID under test is non-zero. That
way we can be sure the context ID received in the kernel ISR is non-zero,
which differs from the default value 0 used when no context ID is set at all.

Change-Id: I7e261d1bbb783d5afd15558c7ac00493b1218cef
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>


[ROCm/ROCR-Runtime commit: 77bab8596f]
2018-10-17 14:27:54 -04:00
Gang Ba 197f731fbc drm/amdkfd: Added gfx904 and gfx803 for KFD.
Change-Id: I4406dc70c776926feaecca3f2146d65259a80517
Signed-off-by: Gang Ba <gaba@amd.com>


[ROCm/ROCR-Runtime commit: 52ec7f805e]
2018-09-25 08:17:44 -04:00
Mike Li 7cd87a5590 all_gpu_id_array: Handle GPU resource management
GPU resource management can disable some of the GPU nodes, and the
kernel driver may not be aware of this. Get the information for all
nodes from the kernel driver and then filter it.

Change-Id: I4eeb126a5efce2192c35f5d2b72be1811e9ded32
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>


[ROCm/ROCR-Runtime commit: 3144a84b9a]
2018-09-24 11:38:11 -04:00
Mike Li 150eaea0af kfdtest: Handle GPU resource management
Currently the FindDRMRenderNode function accesses sysfs directly to
find the render node. This doesn't work with the GPU resource management
changes. Change the code to call hsaKmtGetNodeProperties instead.

Change-Id: I3bb537a323bc1e8c49f38d8aabc60c13e268aecd
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>


[ROCm/ROCR-Runtime commit: c3b47c0959]
2018-09-24 11:38:11 -04:00
Mike Li 3feaa41dd7 Output a error message only when open_drm_render_device failed unexpectedly.
Change-Id: I5b9587a8d5c7a900e9ab8611a25d0c49d34b4cef
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>


[ROCm/ROCR-Runtime commit: f9bd960344]
2018-09-24 11:36:11 -04:00
xinhui pan 5b7d3a16c5 kfdtest: add P2POverheadTest
This measures the latency plus overhead of SDMA packet consumption
for P2P. It is similar to the QueueLatency test, but it additionally
shows the queue's overhead under different workloads in more detail.

Test result on two gfx900 GPUs:
[ RUN      ] KFDPerformanceTest.P2POverheadTest
[          ] Test (avg. ns) | Size	4	8	16	64	256	1024
[          ] -----------------------------------------------------------------------
[          ] [push]     [1 -> 0]	333	148	185	111	148	148
[          ] [push]     [1 -> 1]	370	222	333	74	148	111
[          ] [push]     [1 -> 2]	333	148	148	148	148	148
[          ] [push]     [2 -> 0]	111	333	259	148	148	148
[          ] [push]     [2 -> 1]	222	148	185	148	148	148
[          ] [push]     [2 -> 2]	222	111	370	111	74	148
[          ] [pull]     [1 -> 0]	370	296	296	148	185	148
[          ] [pull]     [1 -> 1]	185	333	222	148	222	148
[          ] [pull]     [1 -> 2]	222	444	259	148	185	111
[          ] [pull]     [2 -> 0]	148	148	148	148	148	148
[          ] [pull]     [2 -> 1]	148	148	148	148	148	148
[          ] [pull]     [2 -> 2]	185	148	148	74	222	296
[          ] [push|pull][1 -> 0]	1259	1222	1259	1074	1037	962
[          ] [push|pull][1 -> 1]	1037	1037	1037	740	740	1000
[          ] [push|pull][1 -> 2]	1259	1259	1296	1037	1000	1074
[          ] [push|pull][2 -> 0]	1037	1037	1037	1074	1037	1148
[          ] [push|pull][2 -> 1]	1037	1037	1037	1037	925	1074
[          ] [push|pull][2 -> 2]	666	666	740	740	703	925
[       OK ] KFDPerformanceTest.P2POverheadTest (459 ms)

Change-Id: I422263cb52f7ce184f6f1ff4466d04c239fbe9c9
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: 918a45a430]
2018-09-24 09:28:00 -04:00
Harish Kasiviswanathan f709e5f94d Topology: Use processors available to the process
The existing call sysconf(_SC_NPROCESSORS_ONLN) provides the number of
processors available to the scheduler. When a KFD process runs in a
container environment, only a subset (cpuset) of the processors is
available to the current process.

To get CPU cache information, use sched_getaffinity() to get the
number of processors available to the current process.
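The difference between the two calls can be shown with a short sketch. It counts the CPUs in the calling process's affinity mask with sched_getaffinity() and CPU_COUNT() (glibc, which requires _GNU_SOURCE), and falls back to sysconf(_SC_NPROCESSORS_ONLN) if that call fails. The function name is hypothetical. This is an illustration, not the thunk's topology code.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Number of CPUs the calling process may run on (its cpuset/affinity
 * mask). Falls back to the online-CPU count if the affinity query fails. */
static long num_available_cpus(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) == 0)
        return CPU_COUNT(&set);

    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

Inside a container restricted with a cpuset (for example, `docker run --cpuset-cpus=0-1`), this returns 2 even when sysconf() reports every online CPU on the host.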

Change-Id: Ieac02f1f61c17e24ac34ba502968c69d3bc631cb
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: fb79a0efe2]
2018-09-21 10:31:59 -04:00
xinhui pan c61fffa876 kfdtest: Add P2P bandwidth test
The test measures the bandwidth between GPUs. Currently it does not
take the NUMA topology into account, because some products do support
P2P across PCIe root complexes.

Test result on a system with two gfx900 GPUs:
[ RUN      ] KFDPerformanceTest.P2PBandWidthTest
[          ] Copy from node to node by [push, NONE]
[          ] [1 -> 0] 6.13477 - 6.12695 GB/s
[          ] [1 -> 2] 3.77734 - 3.76855 GB/s
[          ] [2 -> 0] 6.67676 - 6.6543 GB/s
[          ] [2 -> 1] 6.14453 - 6.12793 GB/s
[          ] Copy from node to node by [pull, NONE]
[          ] [1 -> 0] 6.10547 - 6.08105 GB/s
[          ] [1 -> 2] 9.65527 - 9.65039 GB/s
[          ] [2 -> 0] 6.49805 - 6.4873 GB/s
[          ] [2 -> 1] 8.95508 - 8.85254 GB/s
[          ] Full duplex copy from node to node by [push|pull, NONE]
[          ] [1 -> 0] 11.0986 - 11.0986 GB/s
[          ] [1 -> 2] 7.54297 - 7.54297 GB/s
[          ] [2 -> 0] 12.0264 - 11.9639 GB/s
[          ] [2 -> 1] 12.0469 - 12.0371 GB/s
[          ] Full duplex copy from node to node by [push, push]
[          ] [1 <-> 2] 11.7324 - 11.4541 GB/s
[          ] Full duplex copy from node to node by [pull, pull]
[          ] [1 <-> 2] 11.4824 - 11.0508 GB/s
[          ] Copy from node to multiple nodes by [push, NONE]
[          ] [1 -> [0...2]] 5.625 - 5.73633 GB/s
[          ] [2 -> [0...2]] 6.45801 - 6.4707 GB/s
[          ] Copy from multiple nodes to node by [push, NONE]
[          ] [[1...2] -> 0] 12.8379 - 12.2578 GB/s

More detailed timestamp information is now available, as shown below.

Copy from node to node by [push, NONE]
[1 -> 0]
[1 : 0] #-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-###############################
[1 : 1] ####################################################################################################
[1 -> 2]
[1 : 0] #--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#-#--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#-#--#-#-#-#-#--#-#-######################################
[1 : 1] ##################################################################################################-#
[2 -> 0]
[2 : 0] ##-###-##-###-###-##-###-##-###-###-##-###-###-##-###-###-##-###-##-###-###-##-###-###-##-###-###-#################
[2 : 1] ###############################################################################-#############-###-##
[2 -> 1]
[2 : 0] ##-##-##-##-##-###-##-##-##-##-##-###-##-##-##-##-###-##-##-##-##-##-###-##-##-##-##-###-##-##-##-####################
[2 : 1] ################################################################################-###-############-##

[snip]

Full duplex copy from node to node by [push, push]
[1 <-> 2]
[1 : 0] #-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-####################################
[1 : 1] ################-###################################################-############-####-#############
[2 : 2] #-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-##-#-##-##-##-##-#-##-##-##-##-##-#-##################
[2 : 3] #####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-#####-##
Full duplex copy from node to node by [pull, pull]
[1 <-> 2]
[1 : 0] ######################################################################-##-#-###############-####-###
[1 : 1] #-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-############################
[2 : 2] ##-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-############
[2 : 3] #-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#########-#############
Copy from node to multiple nodes by [push, NONE]
[1 -> [0...2]]
[1 : 0] #-#--#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-###############################
[1 : 1] ########################################################################################-###-###-###
[2 -> [0...2]]
[2 : 0] ##-##-##-###-##-###-##-##-###-##-###-##-##-###-##-###-##-###-##-##-###-##-###-##-##-###-##-###-##-##################
[2 : 1] -################################################################################################-##
Copy from multiple nodes to node by [push, NONE]
[[1...2] -> 0]
[1 : 0] #-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-###############################
[1 : 1] ################################################################################################-#-#
[2 : 2] ##-##-##-###-##-##-###-##-##-##-###-##-##-###-##-##-###-##-##-###-##-##-##-###-##-##-###-##-##-###-##-##################
[2 : 3] #########################-#########################-#########################-#########################
[       OK ] KFDPerformanceTest.P2PBandWidthTest (15982 ms)

Change-Id: Ia90044191d51650ccb220476d31fb317aa3ad6ce
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: e5a541eaf2]
2018-09-19 12:03:05 +08:00
xinhui pan 6b357be502 kfdtest: add KFDTestUtilQueue
Add some infrastructure:
Implement SdmaTimePacket, which records the global GPU timestamp.

Introduce the classes AsyncMPSQ and AsyncMPMQ.
AsyncMPSQ (async multiple packet, single queue) takes a set of packets
at creation and submits them to a GPU to run. AsyncMPMQ (async multiple
packet, multiple queue) manages a set of AsyncMPSQ objects and applies
each operation to all of them in a for loop.

Implement sdma_multicopy helper functions.

Change-Id: I47e1d2ca9630113b2a1d85a0055f3f8ee629fb5f
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: f618b3f075]
2018-09-19 12:03:05 +08:00
Xiaojie Yuan ca0873a234 Use 'RecordProperty' to record performance scores
For following test cases:
- KFDQMTest.QueueLatency
- KFDQMTest.BasicCuMaskingLinear
- KFDQMTest.BasicCuMaskingEven
- KFDMemoryTest.MMBandWidth
- KFDMemoryTest.MMapLarge
- KFDMemoryTest.MMBench

v2: XML element names cannot start with a digit, so change the key names of
    MMBandWidth and MMBench accordingly.
    XML element names cannot contain whitespace, so trim the whitespace in "VRAM  ".
v3: introduce KFDLog-like way to use KFDRecord
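The two XML constraints from v2 can be illustrated with a pair of small C helpers. The kfdtest change itself is C++ and uses gtest's RecordProperty. The helper names below are hypothetical, and the check covers only the two rules named in the message, not the full XML name grammar.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Remove trailing whitespace in place, e.g. "VRAM  " -> "VRAM". */
static void trim_trailing_ws(char *s)
{
    size_t len = strlen(s);

    while (len > 0 && isspace((unsigned char)s[len - 1]))
        s[--len] = '\0';
}

/* Checks only the two rules from the commit message: the key must not
 * be empty, must not start with a digit, and must not contain whitespace.
 * Real XML name rules are stricter. */
static bool key_is_xml_safe(const char *s)
{
    if (*s == '\0' || isdigit((unsigned char)*s))
        return false;
    for (; *s; s++)
        if (isspace((unsigned char)*s))
            return false;
    return true;
}
```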

Change-Id: Ifc3ed5657621252a7b39dccf1ef4f50a92593f77
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>


[ROCm/ROCR-Runtime commit: 247fa9f1e0]
2018-09-18 17:41:14 +08:00
xinhui pan 175bd1ed3d kfdtest: Do not set GTEST_FLAG throw_on_failure
This change was originally made in commit a505c9bb ("kfdtest: Do not set
GTEST_FLAG throw_on_failure"), but it was unintentionally reverted by
commit b86f1456 ("kfdtest: Clean up comments"). Add it back.

Fix: b86f1456

Change-Id: Ia9e99c9ca17b99aab62b4db55017018ddae43dfb
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: a6287ba919]
2018-09-11 10:25:56 +08:00
xinhui pan 501d3878ae kfdtest: Fix queuelatency fail issue
The timestamp written by the releaseMemory packet might not yet be visible
when we fetch it.
To fix this bug, use an event-based wait.

Change-Id: If2324eb3b3a632c711ee4dff4d03a93d5306c289
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: 07bd97a864]
2018-09-10 21:17:29 -04:00
Felix Kuehling c08dca02d7 libhsakmt: Fix segfault on gfx801
Handle the case where svm.dgpu_aperture does not exist in vm_find_object.

Change-Id: Ic0983d4f321f1b6248514f2fa25162976e90bd75
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>


[ROCm/ROCR-Runtime commit: be574169c1]
2018-09-10 14:39:05 -04:00
Harish Kasiviswanathan af0eadcee6 kfdtest: GetNodeIoLinkProperties: Display NodeFrom
Use the NodeFrom returned by hsaKmtGetNodeIoLinkProperties() to check
its correctness.

Change-Id: I6ce436dc7c5d5b192bee21156292bd3eff77f916
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: 1fda429726]
2018-09-10 09:44:24 -04:00
Harish Kasiviswanathan a0cee77f82 Add cgroup support
Some nodes are unavailable depending on the task's cgroup hierarchy. Handle
this situation by ignoring those nodes.

Change-Id: I72f9e822d2ec8cf15732df95e427d5549a75b55d
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: 7876bb70a9]
2018-09-06 16:56:32 -04:00
Harish Kasiviswanathan ef0dad6679 iolinks: Handle GPU resource management
With GPU resource management, some nodes are unavailable depending on
the cgroup hierarchy of the task. The kernel exposes all iolinks via
sysfs. Skip the links that are not accessible.

Also, the iolinks exposed by the kernel refer to sysfs Node IDs. Map them
to the corresponding user Node IDs.

v2: NodeFrom mapped from sysfs Node to User Node
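The filtering and remapping described above can be sketched as a single in-place pass over the links. This is a simplified illustration. The `struct iolink` record, the `sysfs_to_user` table (with -1 marking an inaccessible node) and the function name are all hypothetical and do not match the thunk's real topology structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified iolink record; the real thunk structures differ. */
struct iolink {
    int32_t node_from; /* sysfs node ID on input, user node ID on output */
    int32_t node_to;
};

/* sysfs_to_user[i] is the user node ID for sysfs node i, or -1 if node i
 * is not accessible to this task (e.g. hidden by its cgroup).
 * Compacts links[] in place, dropping links with an inaccessible or
 * out-of-range endpoint and rewriting both endpoints to user node IDs.
 * Returns the number of links kept. */
static size_t filter_and_map_iolinks(struct iolink *links, size_t n,
                                     const int32_t *sysfs_to_user,
                                     size_t num_sysfs_nodes)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        int32_t from = links[i].node_from;
        int32_t to = links[i].node_to;

        if (from < 0 || to < 0 ||
            (size_t)from >= num_sysfs_nodes || (size_t)to >= num_sysfs_nodes)
            continue; /* malformed link: skip it */
        if (sysfs_to_user[from] < 0 || sysfs_to_user[to] < 0)
            continue; /* an endpoint is not accessible */

        links[out].node_from = sysfs_to_user[from];
        links[out].node_to = sysfs_to_user[to];
        out++;
    }
    return out;
}
```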

Change-Id: I95312ee6ca51b89fe9e6ca2a9185c2ea1e94afc4
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: 866ef20054]
2018-09-06 16:56:07 -04:00
Harish Kasiviswanathan b3329ec72d Replace global variable _system with g_system
Change-Id: I452090473a5b46b32204f7f916bdcfdd3e8a47bd
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: f84a99e953]
2018-09-06 16:56:07 -04:00
xinhui pan 718af9febc kfdtest: Add event-based synchronization mechanism to queues
Wait4PacketConsumption can now accept an event, which it uses to wait until
all submitted packets have been processed.

Change-Id: I1497b7704e892b04d05811b8d3e4742237c1be57
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: 9c7cfc0df2]
2018-09-04 21:21:19 -04:00
Felix Kuehling 094073ff74 Revert "libhsakmt: Try to use CPU addr as GPU addr for userptrs"
This reverts commit 84bb9072c0.

This fixes ambiguity when looking up GPU addresses with
hsaKmtQueryPointerInfo.

hsa_amd_agents_allow_access uses hsaKmtQueryPointerInfo, and
depends on finding the correct object from a GPU address. Finding
the wrong userptr object based on its CPU address leads to
incorrect GPU mappings and results in VM faults.


Change-Id: I7c5f571ee6e1f9d32687aa3eab6d96944ad032be
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>


[ROCm/ROCR-Runtime commit: a9bd6e6f8b]
2018-08-31 15:04:50 -04:00
Felix Kuehling cc6d8bbbdc kfdtest: Fix gfx902 blacklist
Removed some tests from the blacklist that are now passing. Added two
new tests that hang the GPU.

Change-Id: I09e729590e5181311375058be492d387342ba2fe
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>


[ROCm/ROCR-Runtime commit: 608dddbe9d]
2018-08-31 15:04:50 -04:00
Felix Kuehling 2069d0f8e0 libhsakmt: Fix and deduplicate object lookup code
Added a helper vm_find_object that can be used everywhere we need to
look up objects by their address and, optionally, size. This unifies
all the subtly different, partially incomplete, or broken ways of doing
this in various functions:

* map
* unmap
* register
* deregister
* free
* get_mem_info
* set_mem_user_data

At the same time, fix some subtle problems with userptr lookup that
became more complex now that the userptr address can match the GPU
address.
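A lookup "by address and optionally size" can be sketched as follows. This is an assumption-laden illustration, not the real vm_find_object. `struct vm_obj` and `find_object` are hypothetical. The sketch assumes that size 0 means "any object containing the address" and that a non-zero size asks for an exact start/size match, which is one way to tell apart overlapping objects.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified object record; the real vm_object_t differs. */
struct vm_obj {
    uint64_t start; /* start address of the allocation */
    uint64_t size;  /* size in bytes */
};

/* size == 0: return the first object whose range contains addr.
 * size != 0: require an exact match on start address and size, which
 * disambiguates overlapping objects. Returns NULL if nothing matches.
 * Linear scan for clarity; a real allocator would use a tree. */
static const struct vm_obj *find_object(const struct vm_obj *objs, size_t n,
                                        uint64_t addr, uint64_t size)
{
    for (size_t i = 0; i < n; i++) {
        const struct vm_obj *o = &objs[i];

        if (size) {
            if (o->start == addr && o->size == size)
                return o;
        } else if (addr >= o->start && addr - o->start < o->size) {
            return o; /* subtraction form avoids overflow at the top of VA */
        }
    }
    return NULL;
}
```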


Change-Id: I98572d1734fc7688a1d68f6a784e02c8dee90af5
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>


[ROCm/ROCR-Runtime commit: 855f1a32a9]
2018-08-31 15:04:47 -04:00
shaoyunl 38a722523c thunk: Avoid create PCIe indirect link on none large bar target
PCIe P2P (indirect) IOLinks should only be created if the remote GPU
is large-BAR.

Change-Id: I55cbb5e37c5d41267583e07aca6bdcc708403029
Signed-off-by: shaoyunl <Shaoyun.Liu@amd.com>


[ROCm/ROCR-Runtime commit: 30a4ab39f3]
2018-08-29 16:31:55 -04:00
Shaoyun Liu 71b392fdf8 Thunk: Avoid add indirect link for the GPUS with xGMI link
Change-Id: I06f511c55e28919512fda79b504566818dc2a5ab
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>


[ROCm/ROCR-Runtime commit: 7796994f46]
2018-08-29 13:22:58 -04:00
xinhui pan 4dce73a17c kfdtest: Let BigBufferStressTest detect memory leak
Since the test allocates as many small system-memory buffers as possible
to reach the allocation limit, we can repeat the allocation several times
to see whether any allocation in a previous round leaked memory.

Also test whether the GPU can access this memory correctly.

Change-Id: I309f9821b6bc99c212a6bfbc21fe3086ab589fd3
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: a040a24243]
2018-08-28 22:50:42 -04:00
Shaoyun Liu a354382f0c Thunk: Add xgmi thunk interface definition
Add XGMI-related defines to the thunk according to the document
HSAKMT library interface specification v1.16.

Change-Id: Ib25ff0ddf7380c97d06bd76fb730915e7c634270
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>


[ROCm/ROCR-Runtime commit: f9faf05fd9]
2018-08-27 13:13:37 -04:00
xinhui pan 3cd03c7f5e kfdtest: add PM4EventInterrupt test
Similar to SdmaEventInterrupt, verify the event interrupt on a PM4 queue.

Change-Id: I0e43f26fd0d965126985820704215d2ef5e52c1a
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: 3e527bc7e8]
2018-08-24 13:21:01 +08:00
xinhui pan 973510c41f kfdtest: Let SdmaEventInterrupt test more meaningful
Simulate some workload to verify the SDMA event interrupt.

Change-Id: Ib5ad0c238cc66898f7835e765df50427ef106b04
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: bdb1f8a066]
2018-08-24 11:27:34 +08:00
xinhui pan 3e0f1d695a kfdtest: Add some asserts in BigBufferStressTest
The test should report PASS/FAIL for the allocated VRAM size.

Change-Id: I546c02c2ed02f1cfb5278e0dfd7b18ade39faafb
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: 1076075a1c]
2018-08-23 23:01:20 -04:00
Mike Li b5cc397824 Decouple user NodeID and sysfs NodeID
Currently, all HSA nodes are exposed to the user, so the existing
implementation assumes a one-to-one mapping between user NodeId and
sysfs NodeId.
GPU Resource Management will provide control over the exposed HSA
nodes, which means not all HSA nodes will be exposed to the user.
Decouple the two IDs.
The mapping from user NodeId to sysfs NodeId will be local to
topology.c and the topology helper functions. Everywhere else, NodeIds
are sequential, from 0 to the number of nodes exposed to the user.

v1: initial implementation
v2: map node id within the topology_* functions
v3: remove two static globals
v4: add bounds check for node id
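The decoupling can be sketched as a table that maps sequential user IDs to sysfs IDs, plus a bounds-checked translation in the spirit of v4. This is a minimal sketch. The function names and the `available[]` input (one flag per sysfs node) are hypothetical, and the real topology.c code differs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Build the user -> sysfs node ID map. available[s] says whether sysfs
 * node s is exposed to this process. User IDs are assigned sequentially
 * from 0, in sysfs order. Returns the number of user-visible nodes. */
static uint32_t build_user_node_map(const bool *available, uint32_t num_sysfs,
                                    uint32_t *user_to_sysfs)
{
    uint32_t num_user = 0;

    for (uint32_t s = 0; s < num_sysfs; s++)
        if (available[s])
            user_to_sysfs[num_user++] = s;
    return num_user;
}

/* Bounds-checked translation; returns false for an out-of-range user ID. */
static bool user_to_sysfs_id(const uint32_t *user_to_sysfs, uint32_t num_user,
                             uint32_t user_id, uint32_t *sysfs_id)
{
    if (user_id >= num_user)
        return false;
    *sysfs_id = user_to_sysfs[user_id];
    return true;
}
```

With sysfs nodes 0 to 3 and node 1 hidden, the user sees nodes 0, 1 and 2, which map to sysfs nodes 0, 2 and 3.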

Change-Id: Id12147ece41d682430f398944bbb339ca906eb1b
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>


[ROCm/ROCR-Runtime commit: 3437a356c7]
2018-08-23 16:01:32 -04:00
Kent Russell 52536ba23b kfdtest: Consolidate logic for ASSERT vs EXPECT
ASSERT failures result in immediate termination of the test. EXPECT
returns a failure but continues execution. Reserve ASSERT for required
functionality (node initialization, queue creation, etc.) where the rest
of the test cannot run if that call fails. Use EXPECT everywhere else.

Change-Id: I1c11326fc3ae22b50fa83b07b3b49af1e1f4e69e


[ROCm/ROCR-Runtime commit: fe33461622]
2018-08-23 06:20:18 -04:00
Kent Russell b86f145610 kfdtest: Clean up comments
Consolidate style (use /* */ for multi-line), fix typos,
use dword instead of DWORD/DWord

Change-Id: I620e45c1687550db41127e45641b7d79d28223a1


[ROCm/ROCR-Runtime commit: 414042abf7]
2018-08-23 06:20:17 -04:00
Philip Cox f85a629639 Add GFX debug trap control code
Add initial KFD debugger trap support for GFX9 chips:

   - Enable/disable trap support
   - Set debug trap support data
   - Set wave launch trap override
   - Set wave launch mode

Change-Id: If39f2395c4b6cf56249cf76f1c44cfcbdcef891c
Signed-off-by: Philip Cox <Philip.Cox@amd.com>


[ROCm/ROCR-Runtime commit: db92d5af23]
2018-08-22 14:40:15 -04:00
Felix Kuehling 8289fa25a1 libhsakmt: Fix processing of memory fault events
AMDKFD_IOC_WAIT_EVENTS with multiple events and wait_for_all = 0
returns success as soon as any of the events has signaled. So we can't
blindly assume that a memory fault event in the list has actually
signaled. Check the gpu_id as an indicator of whether there really
was a memory fault before processing it further.
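The check can be sketched as follows. `struct mem_fault_info` and `process_memory_faults` are hypothetical stand-ins, not the real KFD ioctl structures. The sketch assumes that the caller zeroes gpu_id before waiting and that the kernel fills it in only when a fault is actually reported.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-event memory-fault payload. Assumption: gpu_id is
 * zeroed before the wait and set only when a fault is reported. */
struct mem_fault_info {
    uint32_t gpu_id;        /* 0 = no fault recorded */
    uint64_t fault_address;
};

/* With wait_for_all == 0 the wait returns once ANY event signals, so a
 * memory-fault event in the wait list may not have fired. Only treat
 * entries with a non-zero gpu_id as real faults. Returns how many were
 * handled. */
static size_t process_memory_faults(const struct mem_fault_info *faults,
                                    size_t n)
{
    size_t handled = 0;

    for (size_t i = 0; i < n; i++) {
        if (faults[i].gpu_id == 0)
            continue; /* in the wait list, but did not signal */
        /* real handling (logging, reporting) would go here */
        handled++;
    }
    return handled;
}
```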

Change-Id: I6cc311bfc184c631beaf684027176a6ca42e05c1
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>


[ROCm/ROCR-Runtime commit: 9271e69ddf]
2018-08-17 16:06:45 -04:00
Felix Kuehling 84bb9072c0 libhsakmt: Try to use CPU addr as GPU addr for userptrs
If the CPU addr of a userptr is accessible by the GPU, try to use it
instead of allocating a different GPU address. If something else is
already registered with an overlapping address range, we still need to
allocate a GPU address, because KFD does not support overlapping GPUVM
mappings.
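The decision above amounts to a half-open interval overlap test against the already-registered ranges. This approach was later reverted by commit 094073ff74 in this log. The sketch below is only an illustration. `struct va_range` and both helpers are hypothetical, and GPU accessibility is passed in as a flag instead of being derived from the aperture.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registered range [start, start + size). */
struct va_range {
    uint64_t start;
    uint64_t size;
};

/* Half-open intervals [a, a+as) and [b, b+bs) overlap iff each one
 * starts before the other ends. */
static bool ranges_overlap(uint64_t a, uint64_t as, uint64_t b, uint64_t bs)
{
    return a < b + bs && b < a + as;
}

/* A userptr's CPU address can double as its GPU address only if it is
 * GPU-accessible and does not overlap any registered range. Otherwise a
 * separate GPU VA must be allocated, since overlapping GPUVM mappings
 * are not supported. */
static bool can_reuse_cpu_addr(uint64_t cpu_addr, uint64_t size,
                               bool gpu_accessible,
                               const struct va_range *regs, size_t n)
{
    if (!gpu_accessible)
        return false;
    for (size_t i = 0; i < n; i++)
        if (ranges_overlap(cpu_addr, size, regs[i].start, regs[i].size))
            return false;
    return true;
}
```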

Change-Id: I452963ee45a454f735755a0b43122b9aee5d55be
Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>


[ROCm/ROCR-Runtime commit: ab181c46c0]
2018-08-17 16:06:45 -04:00
Felix Kuehling 3dfb956bd5 libhsakmt: Add mmap-based aperture management for GFXv9 and later
If the GPU virtual address space is >= 47 bits, don't reserve virtual
address space at startup and use mmap to allocate virtual addresses.
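Allocating a virtual address range on demand with mmap could look roughly like the sketch below. It is an illustration under stated assumptions, not the thunk's aperture code. The helper name is hypothetical, and the 47-bit check is omitted. It maps PROT_NONE with MAP_NORESERVE, which reserves address space without committing memory. It over-allocates by `align` and unmaps the unaligned head and the unused tail. `align` must be a power of two no smaller than the page size, and `size` must be a multiple of the page size.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Reserve `size` bytes of virtual address space aligned to `align`.
 * Returns NULL on failure. The caller releases it with munmap(p, size). */
static void *reserve_aligned_va(size_t size, size_t align)
{
    size_t len = size + align;
    uint8_t *base = mmap(NULL, len, PROT_NONE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    uintptr_t aligned = ((uintptr_t)base + align - 1) & ~(uintptr_t)(align - 1);
    size_t head = aligned - (uintptr_t)base;
    size_t tail = len - head - size;

    /* Give back the unaligned head and the unused tail. */
    if (head)
        munmap(base, head);
    if (tail)
        munmap((uint8_t *)aligned + size, tail);
    return (void *)aligned;
}
```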

Change-Id: Ic935b03c8e78271829fc8e6cfd0e543184aff818
Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>


[ROCm/ROCR-Runtime commit: 80f2cc644c]
2018-08-17 16:06:45 -04:00
xinhui pan 7cd22e8785 kfdtest: use HSAuint64 instead of unsigned HSAint64
This should fix gtest compile errors.

Code like the following has trouble:

typedef char char8;
typedef unsigned char uchar8;

ASSERT_NE((uchar8)1, 0);
ASSERT_NE((unsigned char8)1, 0); // compile error here
or
ASSERT_NE((unsigned char8)1, 0);
ASSERT_NE((uchar8)1, 0); // compile error here

HSAint64 and HSAuint64 are typedef aliases, so mixing ASSERT_XX((unsigned HSAint64)...)
with ASSERT_XX((HSAuint64)...) fails to compile.

Change-Id: I4c24bc699a69bd4f37c4bc8aaaa9f1a92a24a33e
Signed-off-by: xinhui pan <xinhui.pan@amd.com>


[ROCm/ROCR-Runtime commit: 163fa2f3aa]
2018-08-16 16:03:52 +08:00
Yong Zhao a505c9bb05 kfdtest: Do not set GTEST_FLAG throw_on_failure
The flag makes EXPECT_* behave like ASSERT_*, which actually works against
us, so disable the flag.

Change-Id: I2ea1dfeaf916b396593a504d081148abdac0fc70
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>


[ROCm/ROCR-Runtime commit: 62f7dc2a48]
2018-08-15 18:08:39 -04:00
Felix Kuehling 98c65aeaa7 libhsakmt: Fix assumptions about userptrs relative to apertures
So far we have assumed that userptrs are always memory outside
reserved SVM apertures that are mapped into the SVM aperture for
GPU access.

With an unreserved SVM aperture that covers the entire virtual
address range, this distinction will no longer be true. Userptrs
will generally be inside the unreserved SVM aperture. Take that
into consideration when registering, mapping and unmapping virtual
addresses.

We now need retry logic when looking up buffers from addresses.
If it is not found by its GPU address, try it as a userptr.

We also need to consider the new possibility that a userptr is
registered at the same address for CPU and GPU access. So a buffer
found by its GPU address may also turn out to be a userptr. In
that case use a stricter lookup using the userptr and size (if
the size is known), to identify the correct one of multiple
overlapping objects.

Change-Id: Ia43633aaa40f9fd2a74918ae969a631d2ff68419
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>


[ROCm/ROCR-Runtime commit: 40c46cc6cb]
2018-08-15 16:07:54 -04:00
Felix Kuehling fd7827a8a1 libhsakmt: Make VA management scheme configurable per aperture
Change-Id: Ib70b038b4ef6465b03545317c6494a4e4950c107
Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>


[ROCm/ROCR-Runtime commit: d79b9c1a29]
2018-08-15 14:22:19 -04:00