419 commits

Author SHA1 Message Date
xinhui pan e5a541eaf2 kfdtest: Add P2P bandwidth test
The test measures the bandwidth between GPUs. Currently we do not take
NUMA topology into account, as some products do support P2P across PCIe
root complexes.
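A minimal sketch of how a GB/s figure can be derived from a byte count and a pair of GPU timestamps. The function name, the tick-frequency parameter and the choice of 1 GB = 2^30 bytes are assumptions for illustration; this commit does not state how kfdtest computes the numbers below.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bandwidth in GB/s for a copy of `bytes` bytes that
// started at GPU timestamp `start_tick` and ended at `end_tick`.
// `ticks_per_second` is the GPU timestamp frequency, supplied by the caller
// (an assumption, not taken from kfdtest). 1 GB is assumed to be 2^30 bytes.
double BandwidthGBps(uint64_t bytes, uint64_t start_tick, uint64_t end_tick,
                     double ticks_per_second) {
    assert(end_tick > start_tick && ticks_per_second > 0.0);
    const double seconds =
        static_cast<double>(end_tick - start_tick) / ticks_per_second;
    return static_cast<double>(bytes) / seconds / (1024.0 * 1024.0 * 1024.0);
}
```

For example, 2^31 bytes copied in 50 ticks of a 100 Hz timestamp counter gives 4.0 GB/s under these assumptions.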

Test result on a system with two gfx900 GPUs:
[ RUN      ] KFDPerformanceTest.P2PBandWidthTest
[          ] Copy from node to node by [push, NONE]
[          ] [1 -> 0] 6.13477 - 6.12695 GB/s
[          ] [1 -> 2] 3.77734 - 3.76855 GB/s
[          ] [2 -> 0] 6.67676 - 6.6543 GB/s
[          ] [2 -> 1] 6.14453 - 6.12793 GB/s
[          ] Copy from node to node by [pull, NONE]
[          ] [1 -> 0] 6.10547 - 6.08105 GB/s
[          ] [1 -> 2] 9.65527 - 9.65039 GB/s
[          ] [2 -> 0] 6.49805 - 6.4873 GB/s
[          ] [2 -> 1] 8.95508 - 8.85254 GB/s
[          ] Full duplex copy from node to node by [push|pull, NONE]
[          ] [1 -> 0] 11.0986 - 11.0986 GB/s
[          ] [1 -> 2] 7.54297 - 7.54297 GB/s
[          ] [2 -> 0] 12.0264 - 11.9639 GB/s
[          ] [2 -> 1] 12.0469 - 12.0371 GB/s
[          ] Full duplex copy from node to node by [push, push]
[          ] [1 <-> 2] 11.7324 - 11.4541 GB/s
[          ] Full duplex copy from node to node by [pull, pull]
[          ] [1 <-> 2] 11.4824 - 11.0508 GB/s
[          ] Copy from node to multiple nodes by [push, NONE]
[          ] [1 -> [0...2]] 5.625 - 5.73633 GB/s
[          ] [2 -> [0...2]] 6.45801 - 6.4707 GB/s
[          ] Copy from multiple nodes to node by [push, NONE]
[          ] [[1...2] -> 0] 12.8379 - 12.2578 GB/s

We can now also get more detailed timestamp information, as shown below.

Copy from node to node by [push, NONE]
[1 -> 0]
[1 : 0] #-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-###############################
[1 : 1] ####################################################################################################
[1 -> 2]
[1 : 0] #--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#-#--#-#-#-#-#--#-#-#-#-#--#-#-#-#-#-#--#-#-#-#-#--#-#-######################################
[1 : 1] ##################################################################################################-#
[2 -> 0]
[2 : 0] ##-###-##-###-###-##-###-##-###-###-##-###-###-##-###-###-##-###-##-###-###-##-###-###-##-###-###-#################
[2 : 1] ###############################################################################-#############-###-##
[2 -> 1]
[2 : 0] ##-##-##-##-##-###-##-##-##-##-##-###-##-##-##-##-###-##-##-##-##-##-###-##-##-##-##-###-##-##-##-####################
[2 : 1] ################################################################################-###-############-##

[snip]

Full duplex copy from node to node by [push, push]
[1 <-> 2]
[1 : 0] #-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-####################################
[1 : 1] ################-###################################################-############-####-#############
[2 : 2] #-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-#-##-##-##-##-##-#-##-##-##-##-#-##-##-##-##-##-#-##################
[2 : 3] #####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-######-#####-#####-##
Full duplex copy from node to node by [pull, pull]
[1 <-> 2]
[1 : 0] ######################################################################-##-#-###############-####-###
[1 : 1] #-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-##-#-#-############################
[2 : 2] ##-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-##-##-###-##-##-##-##-###-##-##-##-###-##-##-############
[2 : 3] #-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#########-#############
Copy from node to multiple nodes by [push, NONE]
[1 -> [0...2]]
[1 : 0] #-#--#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-#-###############################
[1 : 1] ########################################################################################-###-###-###
[2 -> [0...2]]
[2 : 0] ##-##-##-###-##-###-##-##-###-##-###-##-##-###-##-###-##-###-##-##-###-##-###-##-##-###-##-###-##-##################
[2 : 1] -################################################################################################-##
Copy from multiple nodes to node by [push, NONE]
[[1...2] -> 0]
[1 : 0] #-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-#-#-#-#-#-##-#-#-#-###############################
[1 : 1] ################################################################################################-#-#
[2 : 2] ##-##-##-###-##-##-###-##-##-##-###-##-##-###-##-##-###-##-##-###-##-##-##-###-##-##-###-##-##-###-##-##################
[2 : 3] #########################-#########################-#########################-#########################
[       OK ] KFDPerformanceTest.P2PBandWidthTest (15982 ms)

Change-Id: Ia90044191d51650ccb220476d31fb317aa3ad6ce
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-09-19 12:03:05 +08:00
xinhui pan f618b3f075 kfdtest: add KFDTestUtilQueue
This adds the following infrastructure:
Implement SdmaTimePacket, which records the global GPU timestamp.

Introduce classes AsyncMPSQ and AsyncMPMQ.
AsyncMPSQ (async multiple-packet, single-queue) takes a set of packets
at creation time and submits them to a GPU to run. AsyncMPMQ (async
multiple-packet, multiple-queue) manages a set of AsyncMPSQ objects and
uses a for loop to perform operations on each of them.

Implement sdma_multicopy helper functions.
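A simplified C++ sketch of how the two classes relate: AsyncMPSQ owns the packets for one queue, and AsyncMPMQ forwards each operation to every AsyncMPSQ in a loop. The Packet type, the method names and the counter that stands in for real GPU submission are hypothetical; the actual classes submit packets to GPU queues, which the sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU packet.
struct Packet {
    unsigned size_dwords;
};

// Sketch of "async multiple packet, single queue": it is given a set of
// packets at construction time and submits all of them to one queue.
class AsyncMPSQ {
public:
    explicit AsyncMPSQ(std::vector<Packet> packets) : packets_(std::move(packets)) {}

    // Placeholder for writing the packets into a queue; here we only count them.
    void Submit() { submitted_ += packets_.size(); }

    std::size_t Submitted() const { return submitted_; }

private:
    std::vector<Packet> packets_;
    std::size_t submitted_ = 0;
};

// Sketch of "async multiple packet, multiple queue": it manages several
// AsyncMPSQ objects and applies each operation to all of them in a loop.
class AsyncMPMQ {
public:
    void Add(AsyncMPSQ q) { queues_.push_back(std::move(q)); }

    void Submit() {
        for (auto &q : queues_)
            q.Submit();
    }

    std::size_t TotalSubmitted() const {
        std::size_t total = 0;
        for (const auto &q : queues_)
            total += q.Submitted();
        return total;
    }

private:
    std::vector<AsyncMPSQ> queues_;
};
```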

Change-Id: I47e1d2ca9630113b2a1d85a0055f3f8ee629fb5f
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-09-19 12:03:05 +08:00
Xiaojie Yuan 247fa9f1e0 Use 'RecordProperty' to record performance scores
For the following test cases:
- KFDQMTest.QueueLatency
- KFDQMTest.BasicCuMaskingLinear
- KFDQMTest.BasicCuMaskingEven
- KFDMemoryTest.MMBandWidth
- KFDMemoryTest.MMapLarge
- KFDMemoryTest.MMBench

v2: XML element names cannot start with a digit, so change the key names
    used by MMBandWidth and MMBench accordingly.
    XML element names cannot contain whitespace, so trim the whitespace in "VRAM  ".
v3: Introduce a KFDLog-like way to use KFDRecord.
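A small sketch of the two v2 rules applied to a property key. Both helper names are hypothetical, and IsUsableKey only checks the two rules named above (no leading digit, no whitespace), not the complete XML name grammar.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: strip leading and trailing whitespace,
// e.g. "VRAM  " -> "VRAM".
std::string TrimWhitespace(const std::string &s) {
    const char *ws = " \t\r\n";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos)
        return "";
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical check for the two rules above: the key must not be empty,
// must not start with a digit, and must not contain whitespace.
bool IsUsableKey(const std::string &key) {
    if (key.empty() || std::isdigit(static_cast<unsigned char>(key[0])))
        return false;
    for (char c : key)
        if (std::isspace(static_cast<unsigned char>(c)))
            return false;
    return true;
}
```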

Change-Id: Ifc3ed5657621252a7b39dccf1ef4f50a92593f77
Signed-off-by: Xiaojie Yuan <xiaojie.yuan@amd.com>
2018-09-18 17:41:14 +08:00
xinhui pan a6287ba919 kfdtest: Do not set GTEST_FLAG throw_on_failure
This change was originally made in commit 62f7dc2a ("kfdtest: Do not set
GTEST_FLAG throw_on_failure"), but it was unintentionally reverted by
commit 414042ab ("kfdtest: Clean up comments"). Add this change back.

Fix: 414042ab

Change-Id: Ia9e99c9ca17b99aab62b4db55017018ddae43dfb
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-09-11 10:25:56 +08:00
xinhui pan 07bd97a864 kfdtest: Fix queuelatency fail issue
The timestamp written by the releaseMemory packet might not yet be
visible when we fetch it.
To fix this bug, use an event-based wait.

Change-Id: If2324eb3b3a632c711ee4dff4d03a93d5306c289
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-09-10 21:17:29 -04:00
Felix Kuehling be574169c1 libhsakmt: Fix segfault on gfx801
Handle the case where svm.dgpu_aperture does not exist in vm_find_object.

Change-Id: Ic0983d4f321f1b6248514f2fa25162976e90bd75
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-09-10 14:39:05 -04:00
Harish Kasiviswanathan 1fda429726 kfdtest: GetNodeIoLinkProperties: Display NodeFrom
Use the NodeFrom returned by hsaKmtGetNodeIoLinkProperties() to check
its correctness.

Change-Id: I6ce436dc7c5d5b192bee21156292bd3eff77f916
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2018-09-10 09:44:24 -04:00
Harish Kasiviswanathan 7876bb70a9 Add cgroup support
Depending on the task's cgroup hierarchy, some nodes may be unavailable.
Handle this situation by ignoring those nodes.

Change-Id: I72f9e822d2ec8cf15732df95e427d5549a75b55d
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2018-09-06 16:56:32 -04:00
Harish Kasiviswanathan 866ef20054 iolinks: Handle GPU resource management
With GPU resource management, some nodes are unavailable depending on the
cgroup hierarchy of the task, but the kernel reports all iolinks via
sysfs. Skip the links that are not accessible.

Also, the iolinks reported by the kernel refer to sysfs node IDs. Map
them to the corresponding user node IDs.

v2: Map NodeFrom from the sysfs node ID to the user node ID
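A rough sketch of the filtering and remapping described above. The IoLink struct, the function name, and the representation of the mapping as a vector indexed by sysfs node ID (with -1 marking a node hidden from the task) are assumptions for illustration, not the Thunk's data structures.

```cpp
#include <cassert>
#include <vector>

// Hypothetical iolink record; both endpoints are node IDs.
struct IoLink {
    int node_from;
    int node_to;
};

// sysfs_to_user[i] is the user node ID for sysfs node i, or -1 if that
// node is not accessible to this task (e.g. hidden by its cgroup).
std::vector<IoLink> MapIoLinks(const std::vector<IoLink> &sysfs_links,
                               const std::vector<int> &sysfs_to_user) {
    std::vector<IoLink> user_links;
    for (const IoLink &l : sysfs_links) {
        const int from = sysfs_to_user.at(l.node_from);
        const int to = sysfs_to_user.at(l.node_to);
        if (from < 0 || to < 0)
            continue;  // skip links to or from inaccessible nodes
        user_links.push_back({from, to});  // report user node IDs
    }
    return user_links;
}
```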

Change-Id: I95312ee6ca51b89fe9e6ca2a9185c2ea1e94afc4
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2018-09-06 16:56:07 -04:00
Harish Kasiviswanathan f84a99e953 Replace global variable _system with g_system
Change-Id: I452090473a5b46b32204f7f916bdcfdd3e8a47bd
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2018-09-06 16:56:07 -04:00
xinhui pan 9c7cfc0df2 kfdtest: Add event-based synchronization mechanism to queues
Wait4PacketConsumption can now accept an event, which is used to wait for
all submitted packets to be processed.

Change-Id: I1497b7704e892b04d05811b8d3e4742237c1be57
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-09-04 21:21:19 -04:00
Felix Kuehling a9bd6e6f8b Revert "libhsakmt: Try to use CPU addr as GPU addr for userptrs"
This reverts commit ab181c46c0.

This fixes ambiguity when looking up GPU addresses with
hsaKmtQueryPointerInfo.

hsa_amd_agents_allow_access uses hsaKmtQueryPointerInfo, and
depends on finding the correct object from a GPU address. Finding
the wrong userptr object based on its CPU address leads to
incorrect GPU mappings and results in VM faults.


Change-Id: I7c5f571ee6e1f9d32687aa3eab6d96944ad032be
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-31 15:04:50 -04:00
Felix Kuehling 608dddbe9d kfdtest: Fix gfx902 blacklist
Removed some tests that are now passing from the blacklist, and added
two tests that hang the GPU.

Change-Id: I09e729590e5181311375058be492d387342ba2fe
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-31 15:04:50 -04:00
Felix Kuehling 855f1a32a9 libhsakmt: Fix and deduplicate object lookup code
Added a helper, vm_find_object, that can be used everywhere we need to
look up objects by their address and, optionally, size. This unifies the
subtly different, partially incomplete, or broken ways of doing this in
various functions:

* map
* unmap
* register
* deregister
* free
* get_mem_info
* set_mem_user_data

At the same time, fix some subtle problems in userptr lookup, which
became more complex now that a userptr address can match its GPU
address.
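A minimal sketch of a unified lookup by address and optional size. This is not the libhsakmt vm_find_object: the VmObject type, the linear scan, and the convention that size 0 means "size unknown" are assumptions. Requiring an exact size match when a size is given is one way to pick the right object among overlapping ranges.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical object record covering [start, start + size).
struct VmObject {
    uint64_t start;
    uint64_t size;
};

// Return the first object whose range contains addr, or nullptr.
// If size != 0, only an object with exactly that size matches.
const VmObject *FindObject(const std::vector<VmObject> &objs,
                           uint64_t addr, uint64_t size) {
    for (const VmObject &o : objs) {
        if (addr < o.start || addr - o.start >= o.size)
            continue;  // addr is outside this object
        if (size != 0 && o.size != size)
            continue;  // size given but does not match
        return &o;
    }
    return nullptr;
}
```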


Change-Id: I98572d1734fc7688a1d68f6a784e02c8dee90af5
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-31 15:04:47 -04:00
shaoyunl 30a4ab39f3 thunk: Avoid creating PCIe indirect links on non-large-BAR targets
PCIe P2P (indirect) IOLinks should only be created if the remote GPU
is large-BAR.

Change-Id: I55cbb5e37c5d41267583e07aca6bdcc708403029
Signed-off-by: shaoyunl <Shaoyun.Liu@amd.com>
2018-08-29 16:31:55 -04:00
Shaoyun Liu 7796994f46 Thunk: Avoid adding indirect links for GPUs with an xGMI link
Change-Id: I06f511c55e28919512fda79b504566818dc2a5ab
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
2018-08-29 13:22:58 -04:00
xinhui pan a040a24243 kfdtest: Let BigBufferStressTest detect memory leak
The test allocates small system-memory buffers until it reaches the
allocation limit, so we can repeat the allocation several times to see
whether any allocation in a previous pass caused a memory leak.

Also test whether the GPU can access this memory correctly.

Change-Id: I309f9821b6bc99c212a6bfbc21fe3086ab589fd3
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-08-28 22:50:42 -04:00
Shaoyun Liu f9faf05fd9 Thunk: Add xgmi thunk interface definition
Add XGMI-related definitions to the Thunk according to the document
HSAKMT library interface specification v1.16.

Change-Id: Ib25ff0ddf7380c97d06bd76fb730915e7c634270
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
2018-08-27 13:13:37 -04:00
xinhui pan 3e527bc7e8 kfdtest: add PM4EventInterrupt test
Similar to SdmaEventInterrupt, verify event interrupts on a PM4 queue.

Change-Id: I0e43f26fd0d965126985820704215d2ef5e52c1a
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-08-24 13:21:01 +08:00
xinhui pan bdb1f8a066 kfdtest: Make SdmaEventInterrupt test more meaningful
Simulate some workload to verify the sDMA event interrupt.

Change-Id: Ib5ad0c238cc66898f7835e765df50427ef106b04
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-08-24 11:27:34 +08:00
xinhui pan 1076075a1c kfdtest: Add some asserts in BigBufferStressTest
The test should report PASS/FAIL for the allocated VRAM size.

Change-Id: I546c02c2ed02f1cfb5278e0dfd7b18ade39faafb
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-08-23 23:01:20 -04:00
Mike Li 3437a356c7 Decouple user NodeID and sysfs NodeID
Currently, all HSA nodes are exposed to the user, so the existing
implementation assumes a one-to-one mapping between user NodeId and
sysfs NodeId.
GPU Resource Management will provide control over the exposed HSA
nodes, which means not all HSA nodes will be exposed to the user.
Decouple the two.
The mapping from user NodeId to sysfs NodeId will be local to
topology.c and the topology helper functions. Everywhere else, NodeIds
should be sequential, from 0 to the number of nodes exposed to the
user minus one.

v1: initial implementation
v2: map node id within the topology_* functions
v3: remove two static globals
v4: add bounds check for node id
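A hypothetical sketch of the decoupling: the topology code keeps a table of exposed sysfs node IDs, user node IDs are indices into that table (so they are sequential from 0), and the v4 bounds check rejects out-of-range user IDs. The class and method names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mapping, kept private to the topology code.
class NodeIdMap {
public:
    // Build the table from per-sysfs-node availability flags; available
    // nodes get user IDs 0, 1, 2, ... in sysfs order.
    explicit NodeIdMap(const std::vector<bool> &sysfs_available) {
        for (uint32_t id = 0; id < sysfs_available.size(); ++id)
            if (sysfs_available[id])
                user_to_sysfs_.push_back(id);
    }

    uint32_t NumUserNodes() const {
        return static_cast<uint32_t>(user_to_sysfs_.size());
    }

    // Bounds check: returns false for user IDs outside [0, NumUserNodes()).
    bool ToSysfs(uint32_t user_id, uint32_t *sysfs_id) const {
        if (user_id >= user_to_sysfs_.size())
            return false;
        *sysfs_id = user_to_sysfs_[user_id];
        return true;
    }

private:
    std::vector<uint32_t> user_to_sysfs_;
};
```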

Change-Id: Id12147ece41d682430f398944bbb339ca906eb1b
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
2018-08-23 16:01:32 -04:00
Kent Russell fe33461622 kfdtest: Consolidate logic for ASSERT vs EXPECT
ASSERT failures result in immediate termination of the test. EXPECT
records a failure but continues execution. Reserve ASSERT for required
functionality (node initialization, queue creation, etc.) where the rest
of the test cannot run if that call fails. Use EXPECT everywhere else.

Change-Id: I1c11326fc3ae22b50fa83b07b3b49af1e1f4e69e
2018-08-23 06:20:18 -04:00
Kent Russell 414042abf7 kfdtest: Clean up comments
Consolidate style (use /* */ for multi-line comments), fix typos,
use dword instead of DWORD/DWord

Change-Id: I620e45c1687550db41127e45641b7d79d28223a1
2018-08-23 06:20:17 -04:00
Philip Cox db92d5af23 Add GFX debug trap control code
Add initial support for KFD debugger traps on GFX9 chips.

   - Adding support for enabling/disabling traps
   - Setting debug trap support data
   - Setting wave launch trap override
   - Setting wave launch mode

Change-Id: If39f2395c4b6cf56249cf76f1c44cfcbdcef891c
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
2018-08-22 14:40:15 -04:00
Felix Kuehling 9271e69ddf libhsakmt: Fix processing of memory fault events
AMDKFD_IOC_WAIT_EVENTS with multiple events and wait_for_all = 0
returns success as soon as any of the events has signaled, so we can't
blindly assume that a memory fault event in the list has actually
signaled. Check the gpu_id as an indicator of whether there really was
a memory fault before processing it further.
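The check can be pictured as below. The MemoryFaultData struct and the assumption that gpu_id stays 0 when no fault was reported are illustrative, not a copy of the KFD ioctl ABI.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of the data attached to a memory fault event.
struct MemoryFaultData {
    uint32_t gpu_id;  // assumed to remain 0 unless a fault was reported
    uint64_t va;      // faulting virtual address
};

// With wait_for_all == 0 the wait succeeds once *any* event fires, so the
// memory fault event in the list may not have signaled. Only treat it as
// a real fault if a gpu_id was filled in.
bool MemoryFaultOccurred(const MemoryFaultData &d) {
    return d.gpu_id != 0;
}
```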

Change-Id: I6cc311bfc184c631beaf684027176a6ca42e05c1
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-17 16:06:45 -04:00
Felix Kuehling ab181c46c0 libhsakmt: Try to use CPU addr as GPU addr for userptrs
If the CPU addr of a userptr is accessible by the GPU, try to use it
instead of allocating a different GPU address. If something else is
already registered with an overlapping address range, we still need to
allocate a GPU address, because KFD does not support overlapping GPUVM
mappings.

Change-Id: I452963ee45a454f735755a0b43122b9aee5d55be
Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>
2018-08-17 16:06:45 -04:00
Felix Kuehling 80f2cc644c libhsakmt: Add mmap-based aperture management for GFXv9 and later
If the GPU virtual address space is >= 47 bits, don't reserve virtual
address space at startup and use mmap to allocate virtual addresses.
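On Linux, managing virtual addresses with mmap commonly means reserving an inaccessible anonymous range and remapping parts of it later. The sketch below shows that reservation step plus the 47-bit condition from the text; the helper names and the exact mmap flags are assumptions, and libhsakmt may do this differently.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Per the text: use mmap-based VA management only when the GPU virtual
// address space is at least 47 bits wide.
bool UseMmapApertures(unsigned gpu_va_bits) {
    return gpu_va_bits >= 47;
}

// Hypothetical helper: reserve `size` bytes of virtual address space
// without committing memory. PROT_NONE makes any access fault until the
// range is remapped. Returns nullptr on failure.
void *ReserveVa(std::size_t size) {
    void *p = mmap(nullptr, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Hypothetical helper: give the reservation back to the kernel.
void ReleaseVa(void *p, std::size_t size) {
    if (p != nullptr)
        munmap(p, size);
}
```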

Change-Id: Ic935b03c8e78271829fc8e6cfd0e543184aff818
Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>
2018-08-17 16:06:45 -04:00
xinhui pan 163fa2f3aa kfdtest: use HSAuint64 instead of unsigned HSAint64
This should fix gtest compile errors.

Code like the following fails to compile:

typedef char char8;
typedef unsigned char uchar8;

ASSERT_NE((uchar8)1, 0);
ASSERT_NE((unsigned char8)1, 0); // compile error here
or
ASSERT_NE((unsigned char8)1, 0);
ASSERT_NE((uchar8)1, 0); // compile error here

HSA[u]int64 are aliases, so mixing ASSERT_XX((unsigned HSAint64)...)
with ASSERT_XX((HSAuint64)...) fails to compile.

Change-Id: I4c24bc699a69bd4f37c4bc8aaaa9f1a92a24a33e
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-08-16 16:03:52 +08:00
Yong Zhao 62f7dc2a48 kfdtest: Do not set GTEST_FLAG throw_on_failure
The flag makes EXPECT_* behave like ASSERT_*, which works against us,
so disable the flag.

Change-Id: I2ea1dfeaf916b396593a504d081148abdac0fc70
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2018-08-15 18:08:39 -04:00
Felix Kuehling 40c46cc6cb libhsakmt: Fix assumptions about userptrs relative to apertures
So far we have assumed that userptrs are always memory outside
reserved SVM apertures that are mapped into the SVM aperture for
GPU access.

With an unreserved SVM aperture that covers the entire virtual
address range, this distinction will no longer be true. Userptrs
will generally be inside the unreserved SVM aperture. Take that
into consideration when registering, mapping and unmapping virtual
addresses.

We now need retry logic when looking up buffers by address. If a
buffer is not found by its GPU address, try looking it up as a userptr.

We also need to consider the new possibility that a userptr is
registered at the same address for CPU and GPU access. So a buffer
found by its GPU address may also turn out to be a userptr. In
that case use a stricter lookup using the userptr and size (if
the size is known), to identify the correct one of multiple
overlapping objects.

Change-Id: Ia43633aaa40f9fd2a74918ae969a631d2ff68419
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-15 16:07:54 -04:00
Felix Kuehling d79b9c1a29 libhsakmt: Make VA management scheme configurable per aperture
Change-Id: Ib70b038b4ef6465b03545317c6494a4e4950c107
Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>
2018-08-15 14:22:19 -04:00
Felix Kuehling d57026f447 libhsakmt: Allow dgpu and dgpu_alt aperture to be the same
Make dgpu_aperture and dgpu_alt_aperture pointers that can point to
the same actual aperture. This will be useful on GFXv9 and later,
where the MType is not defined by the aperture and we want to have
a single aperture covering the entire virtual address space.

aperture->is_coherent can no longer be a reliable indicator of
coherency. Replace it with different conditions based on mem flags
and svm.disable_cache (from the HSA_DISABLE_CACHE environment variable).

Change-Id: Iefc415b87b8abd96e3916586485a0a55d9b27c19
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-15 14:22:19 -04:00
Felix Kuehling 2d2181b478 libhsakmt: Move unmapping into aperture_release_area
This prepares the code for an alternative aperture management method
that needs to unmap memory differently.

Change-Id: I5494aa5420f85edb8f7857f00c17e1d2e6479a51
Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>
2018-08-15 14:22:19 -04:00
Felix Kuehling 9d96af0150 libhsakmt: scratch is not a manageable aperture
Only scratch_physical, used for scratch backing memory, is managed by the Thunk.

Change-Id: I4716981aa908d9569584dc35f40ffd270a2f9014
Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>
2018-08-15 14:22:19 -04:00
Felix Kuehling 842359a826 libhsakmt: Remove aperture offset parameter
This parameter was used for non-canonical GPUVM allocations on GFX7/8 APUs
only, to prevent getting NULL pointers from valid allocations after
subtracting the aperture base. The same can be achieved less intrusively
by reserving address space at the start of the aperture during
initialization.
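The reasoning can be illustrated with a toy bump allocator: if some space is reserved at the start of the aperture during initialization, no allocation can begin exactly at the aperture base, so subtracting the base never yields 0 (NULL). The Aperture struct, the allocator and the reservation size used in the example are hypothetical simplifications.

```cpp
#include <cassert>
#include <cstdint>

// Toy aperture with a bump allocator (hypothetical; a real VA manager
// tracks free ranges rather than a single pointer).
struct Aperture {
    uint64_t base;
    uint64_t limit;  // exclusive upper bound
    uint64_t next;   // next free address
};

// Initialize the aperture and reserve `reserved` bytes at its start.
Aperture InitAperture(uint64_t base, uint64_t limit, uint64_t reserved) {
    assert(base + reserved <= limit);
    return Aperture{base, limit, base + reserved};
}

// Returns the allocated address, or 0 if the aperture is exhausted.
uint64_t Allocate(Aperture *ap, uint64_t size) {
    if (size > ap->limit - ap->next)
        return 0;
    const uint64_t addr = ap->next;
    ap->next += size;
    return addr;
}
```

With 0x1000 bytes reserved, the first allocation starts at offset 0x1000 from the base rather than at offset 0.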

Change-Id: I0aae773f069c2b228824ba464b0612a4d8b489ce
Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>
2018-08-15 14:22:19 -04:00
Felix Kuehling d3fdaaca3a kfdtest: Enable more tests for gfx900
A lot of tests were disabled on gfx900 for historical reasons that
are no longer valid. The only remaining one that won't work on
gfx900 is BasicAddressWatch.

Change-Id: I11507de0dfd31262713127d6cb15cc09c14b8b9f
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-15 14:22:19 -04:00
Kent Russell f2bd7e1d52 kfdtest: Consolidate log messages for skipped tests
When skipping a test, the output should be:
Skipping test: <reason>.

This allows for easier identification, automation, and general readability.

Change-Id: I98bda1c068f9dbc83aeea74f642b6101121f234d
2018-08-14 10:11:50 -04:00
Kent Russell cb019f00cd kfdtest: Consolidate indentation of multi-line function calls
Make indentation consistent: subsequent lines are aligned with the
variables declared above.

Change-Id: I590f7768d93565145b986ad1fb6ac8e82f9c0d58
2018-08-14 08:18:07 -04:00
Kent Russell dffac0a97e kfdtest: Style cleanup
Clean up the KFDTest style via cpplint. Some warnings remain regarding
volatile variables being cast to void*. This is the command used:
cpplint.py --linelength=120 \
    --filter=-readability/multiline_string,-readability/todo,-build/include,-runtime/references

multiline_string is filtered because of the ISA code.
todo is filtered to avoid errors about using TODO instead of TODO(username).
include is about including the directory name in header includes.
references is about non-const references ('&'), which cpplint wants to be
const or replaced by pointers. That can be addressed later.

Change-Id: I3c6622da0a13dd33ab29b2bfff48be25e763b750
2018-08-14 08:17:57 -04:00
xinhui pan 3f7b6356fd kfdtest: fix a memory leak issue in MMapLarge test
When mapMemoryToGpu fails, we need to unregister the memory using the
user address, as the GPU address is not available.

Change-Id: I4418eeaa7aa37008f5bffa144e2c2171f0d238fd
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-08-10 05:26:06 -04:00
xinhui pan eb5539fb10 thunk: fix a memory leak
Queue creation failed when running kfdtest with --gtest_repeat=-1.

fix: 4bb90d04("Remove the use of IS_DGPU()")

Change-Id: I04fa73f90cef13a5517dbaceb89c41dc0f821a79
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-08-10 15:51:32 +08:00
Yong Zhao 110e754f64 Differentiate gfx700 and improve the logic by introducing is_gfx700()
Because gfx700 has local memory but other APUs don't, we should reflect
that in the code. Meanwhile, fix a bug where the SVM aperture was not
added on gfx902 when calling hsaKmtGetNodeMemoryProperties().

Change-Id: Id840f2db0b14fda9ee713b219a9474c15f8a9771
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2018-08-09 21:39:37 -04:00
xinhui pan 8fbf4a26ec thunk: fix a vm area release issue
On some ASICs, like Tonga, the memory alignment size is as large as 0x8000.

fmm_allocate* allocates the VM area with the size passed in, which is
usually not aligned, but __fmm_release frees the VM area using
vm_object_t->size, which is aligned.

That might cause aperture_release_area to fail to free the VM area, as
the size might be bigger than the area itself, or to unexpectedly free a
neighboring VM area as well.

With this patch, somewhat more space than needed will be allocated on
Tonga. gfx900+ is not affected.
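The arithmetic behind the fix, as a sketch: rounding the requested size up to the alignment before allocating makes the allocated size match the aligned size that is later released. AlignUp is a hypothetical helper using the standard power-of-two rounding trick, and 0x8000 is the Tonga alignment mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Round size up to a multiple of align; align must be a power of two.
uint64_t AlignUp(uint64_t size, uint64_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

For example, a 0x1234-byte request rounds up to 0x8000 bytes, so with rounding at allocation time the allocate and release sizes agree.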

Change-Id: I5a88c92b08c4e6f6bc05881798f769b55d6debe9
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-08-09 06:08:15 -04:00
Yong Zhao fe04dd6890 Calculate and store the first gpu mem during initialization
Previously we used the first dgpu mem, but after careful examination, we
found it only needs to be a GPU, so modify the code to reflect that as
well.

Change-Id: I069d9b8e247aed55c1f885b79f743ea8e03ddf93
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2018-08-08 13:54:09 -04:00
xinhui pan 9d6d0911e4 kfdtest: make p2ptest go through all gpus
Implement sDMA copy packet broadcast.

Each time, sDMA copies its local VRAM to sysbuf and to the next GPU's
VRAM. This shows which P2P link is broken.
Currently only P2P push is tested.

Test result on a NUMA-enabled system with 2 CPUs and 4 GPUs:
[ RUN      ] KFDQMTest.P2PTest
[          ] Test 2 -> 3
[          ] PASS 2 -> 3
[          ] Test 3 -> 4
[          ] PASS 3 -> 4
[          ] Test 4 -> 5
[          ] PASS 4 -> 5
[          ] Test 5 -> 0
[          ] PASS 5 -> 0
[       OK ] KFDQMTest.P2PTest (190 ms)

Change-Id: Ie6fb2604109e39465b8a873b3bb42abc6259825a
2018-08-07 21:13:37 -04:00
Yong Zhao 4bb90d048c Remove the use of IS_DGPU()
The information can be obtained directly from node id. Also improve the
whole logic for future compatibility.

Change-Id: I130733be4e7930d5953d5e81409905e60c2ec35e
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2018-08-07 18:07:04 -04:00
Felix Kuehling c21927f425 libhsakmt: Fix problems in init_svm_apertures
Unset ret_addr when unmapping the address space reservation. Otherwise
it may try to unmap it again later.

Remember the actual map_size and use it instead of len outside the
reservation loops.
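The first fix follows a common pattern, sketched here with hypothetical names: after unmapping a reservation, clear the pointer so that a later cleanup path does not unmap it a second time, and keep the size that was actually mapped for the munmap call. Only mmap and munmap are real POSIX calls.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Hypothetical record of an address-space reservation.
struct Reservation {
    void *ret_addr = nullptr;
    std::size_t map_size = 0;  // size actually mapped (equals len in this sketch)
};

bool Reserve(Reservation *r, std::size_t len) {
    void *p = mmap(nullptr, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return false;
    r->ret_addr = p;
    r->map_size = len;
    return true;
}

// Safe to call more than once: the pointer is cleared after unmapping,
// so a later call does not unmap the same range again.
void Unreserve(Reservation *r) {
    if (r->ret_addr == nullptr)
        return;
    munmap(r->ret_addr, r->map_size);
    r->ret_addr = nullptr;
    r->map_size = 0;
}
```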

Change-Id: I1a6b3fecfb59e22a713e5ed49c3ed37914cb6fb5
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-03 22:09:52 -04:00
Felix Kuehling dd6f34b7f5 libhsakmt: Fix pkg-config file paths
Both the include and libpath were incorrect after recent build
system changes. Use the proper GNUInstallDirs definitions in
libhsakmt.pc.in to write the proper locations.

This is needed for end users building KFDTest, which depends on
correct pkg-config information.

Change-Id: Ia45f36f054c2a607a77e7ecbcbd9eb7edd067348
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-03 20:17:09 -04:00
Felix Kuehling 5c742f3e5e kfdtest: Blacklist Fragmentation test on all chips
This test has been intermittently failing for various reasons and
was already disabled on all chips except Ellesmere. It stresses
memory management in unusual ways by having lots of memory allocated
but not mapped, which is not relevant to compute applications over
ROCr.



Change-Id: I6b791ca7e2e0fcfe93fc720063b4b56acfded751
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-03 20:14:46 -04:00