Commit Graph

352 Commits

Author SHA1 Message Date
Sean Keely c7f1277013 Add hsaKmtRegisterMemoryWithFlags.
API follows hsaKmtRegisterMemory but allows passing HsaMemFlags.

Change-Id: I66a230a87c8b085f27c769bdf2cb4d0d96a5d6dd
2019-03-28 17:17:40 -04:00
Kent Russell 1304a92e17 libhsakmt: Add Vega M support
While this may not be supported in the runtime, the kernel/firmware
support it

Change-Id: I7fe4536a6b3055f39e25f453060e899938645d91
Signed-off-by: Kent Russell <kent.russell@amd.com>
2019-03-21 08:05:47 -04:00
Kent Russell c7439a0039 libhsakmt: Add another gfx902 GPU ID
Change-Id: I967f16bf548171df73d2e721f16c1aac52e99852
Signed-off-by: Kent Russell <kent.russell@amd.com>
2019-03-21 08:05:47 -04:00
Amber Lin cfa47ac1f9 libhsakmt: Fix missing apicid in topology
While adding x2APIC support, apicid for non-x2apic was missing out by
mistake.



Change-Id: I25eed362c035c0e9fb9ea948899c49f70311f269
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2019-03-11 13:05:37 -04:00
Oak Zeng 1046a1fd72 Introduce XGMI SDMA queue type
Change-Id: I8c6ff04f92c2bbea0bab94ddb8cc4cceb5d74d02
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2019-03-07 19:42:20 -05:00
Oak Zeng 414a3508d6 Thunk interface to get SDMA engine info
Add SDMA engine info fields to node properties and
modify get node properties API to read SDMA engine
info from sysfs

Change-Id: Iea877b5bc008cc9df9405daf564a359535f1bc9f
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2019-03-07 14:09:43 -05:00
Amber Lin f8028a40fd libhsakmt: Support x2APIC in topology
Current processor/cache topology code implements xAPIC architecture, which
is 8 bits addressability. This is not enough for a system having more than
255 processors. x2APIC is the extension of xAPIC architecture to support
32 bit addressability of processors. This patch detects the x2APIC
enablement and uses the extension leaf to get apicid when detected.

Change-Id: I0826585d02f696a46cd5efb9a6630c60af01e2d8
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2019-03-01 16:42:30 -05:00
Yong Zhao 776077fe65 libhsakmt: Use a better name doorbell_mmap_offset
The previous name doorbell_offset is used too extensively throughout
the code and did not reflect the true usage.

Change-Id: I50d33f5c00e82c46cdf4264a78b8f925705bed6a
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2019-01-31 14:50:04 -05:00
Yong Zhao 51ee5c324a libhsakmt: Introduce HSA_ZFB environment variable
This variable is 0 by default. When set to 1, it means there is no frame
buffer, so all memory allocation is routed to system memory. This mode
is mainly used during emulation.

Use CoarseGrain for VRAM under ZFB mode

Change-Id: I29e8e98be56935e3ceb94782d70771cc45700749
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2019-01-29 19:46:26 -05:00
xinhui pan 41bf449e99 thunk: fix size overflow
Some test case alloc >4gb memory.
Use HSAuint64 in bytes and HSAuint32 in pages.

Change-Id: I0d5e6c299903b5898cfea024178a7a26b9ba3c90
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2019-01-28 10:49:45 -05:00
Jay Cornwall b764991982 Set EOP buffer TC policy to non-coherent
Restores regression in dispatch latency.

Change-Id: I17869d3d515d8c1fa055a57afec2531903b88b16
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
2019-01-23 12:21:01 -05:00
Oak Zeng 124f77775c Revert "Create SDMA queue on specific engine"
This reverts commit 58b95e0a9d.

Change-Id: Idc0decd86364ed3441e9037b83be8be9953f0b3e
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2019-01-21 10:37:46 -06:00
Philip Cox 37858f2311 Initial gfx9 debugger node suspend/resume
Change-Id: I2a5dac3d02265c11f5b6985ab457e2d1caa0a033
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
2019-01-11 09:00:54 -05:00
Eric Huang 8ee93b3187 libhsakmt: add RAS support v2
RAS feature enabling bit and errors return are implemented in
existed topology and event mechanism.

v2: change library interface.

Change-Id: I75807c080b5b26e8115240b05b3d7016cb05a31a
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
2018-12-13 10:17:12 -05:00
Kent Russell 53439669d9 libhsakmt: Add new gfx900 and gfx906 GPU IDs
Change-Id: I93b2b845c3edb2da55235a56516a851145745988
2018-12-12 08:36:40 -05:00
Eric Huang 29d11d02e8 Revert "libhsakmt: add RAS support"
This reverts commit 1fbe010354.

Change-Id: I739b17e057f2a8a0f4375741955209d2477c704a
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
2018-12-08 19:42:33 -05:00
Eric Huang 1fbe010354 libhsakmt: add RAS support
RAS feature enabling bit and errors return are implemented in
existed topology and event mechanism.

Change-Id: I9b018bba80cf4a6998e42a7bff64318c689b1d2a
Signed-off-by: Eric Huang <JinhuiEric.Huang@amd.com>
2018-11-23 11:42:34 -05:00
shaoyunl 29b45b8c0a Thunk: make scratch memory only map to its own GPU
Map scratch memory to the GPU that specified when allocate the memory

Change-Id: I788f9ef0dccb63b894a75e75cac5f94a60d7ec48
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
2018-11-19 10:26:31 -05:00
Oak Zeng 58b95e0a9d Create SDMA queue on specific engine
Change-Id: Id651ececda55b81b45e991bd8e6616674be48d8e
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2018-11-13 14:52:17 -05:00
Felix Kuehling 5e4e19d47b libhsakmt: Distinguish EPERM and EACCES
EPERM means "operation not permitted" and is returned when CGroup
access checks fail. EACCES means "permission denied" and is returned
when the device file permission bits or access control list don't
allow access.

EPERM can fail silently, since we assume the administrator disabled
a device on purpose in the CGroup. EACCESS should produce an error
message and an info message to check the device file permissions.

Change-Id: Iee4c5584c5fdc4e113c3d760dede6661097b4341
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-11-12 17:06:18 -05:00
Gang Ba c54c1dbdcb Add code to support packet capture and replay in the Thunk
This feature only support dgpu for now.

Change-Id: Ic766ec06892c955dd605ecc335a776335edc0df2
Signed-off-by: Gang Ba <gaba@amd.com>
2018-10-31 16:53:46 -04:00
Harish Kasiviswanathan c1994e28f0 libhsakmt: Support device controller cgroup
Device whiltelist controller cgroup allows to track and enforce open and
mknod restrictions on device files. Tasks should works with
/dev/dri/renderN devices that are whitelisted for its cgroup. If a
certain node is not whitelisted it is not an error condition.

Change-Id: I0b997423ccdc00aee98df5b6f04ed6794549604e
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2018-10-30 11:31:53 -04:00
Philip Cox 105edd4bb4 Fix Debug Thunk spec mismatch
Move debug trap support capabilities to their own
structure to fix thunk spec vs header mismatch.



Change-Id: I6694601bfa36097502c8ab932e082d7a4645d5b2
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
2018-10-24 11:32:12 -04:00
Gang Ba 52ec7f805e drm/amdkfd: Added gfx904 and gfx803 for KFD.
Change-Id: I4406dc70c776926feaecca3f2146d65259a80517
Signed-off-by: Gang Ba <gaba@amd.com>
2018-09-25 08:17:44 -04:00
Mike Li 3144a84b9a all_gpu_id_array: Handle GPU resource management
GPU Resource management can disable some of the GPU nodes.
The Kernel driver could be not aware of this.
Get from Kernel driver information of all the nodes and then filter it.

Change-Id: I4eeb126a5efce2192c35f5d2b72be1811e9ded32
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
2018-09-24 11:38:11 -04:00
Mike Li f9bd960344 Output a error message only when open_drm_render_device failed unexpectedly.
Change-Id: I5b9587a8d5c7a900e9ab8611a25d0c49d34b4cef
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
2018-09-24 11:36:11 -04:00
Harish Kasiviswanathan fb79a0efe2 Topology: Use processors available to the process
The existing call sysconf (_SC_NPROCESSORS_ONLN) provides the number of
processors available to the scheduler. When a KFD process is run under a
container environment, only a subset (cpuset) of processors are
available to the current process.

For getting CPU cache information use sched_getaffinity() to get the
number of processors available to the current process.

Change-Id: Ieac02f1f61c17e24ac34ba502968c69d3bc631cb
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2018-09-21 10:31:59 -04:00
Felix Kuehling be574169c1 libhsakmt: Fix segfault on gfx801
Handle the case that svm.dgpu_aperture does not exist in vm_find_object.

Change-Id: Ic0983d4f321f1b6248514f2fa25162976e90bd75
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-09-10 14:39:05 -04:00
Harish Kasiviswanathan 7876bb70a9 Add cgroup support
Some nodes are unavailable based on the task's cgroup hierarchy. Handle
this situation by ignoring those nodes

Change-Id: I72f9e822d2ec8cf15732df95e427d5549a75b55d
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2018-09-06 16:56:32 -04:00
Harish Kasiviswanathan 866ef20054 iolinks: Handle GPU resource management
With GPU resource management, some nodes are unavailable based on the
cgroup hierarchy of the task. Kernel via sysfs specifies all the
iolinks. Skip the links which are not accessible.

Also iolinks specified by the kernel refer to sysfs Node IDs. Map it to
relevant user Node IDs

v2: NodeFrom mapped from sysfs Node to User Node

Change-Id: I95312ee6ca51b89fe9e6ca2a9185c2ea1e94afc4
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2018-09-06 16:56:07 -04:00
Harish Kasiviswanathan f84a99e953 Replace global variable _system with g_system
Change-Id: I452090473a5b46b32204f7f916bdcfdd3e8a47bd
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2018-09-06 16:56:07 -04:00
Felix Kuehling a9bd6e6f8b Revert "libhsakmt: Try to use CPU addr as GPU addr for userptrs"
This reverts commit ab181c46c0.

This fixes ambiguity when looking up GPU addresses with
hsaKmtQueryPointerInfo.

hsa_amd_agents_allow_access uses hsaKmtQueryPointerInfo, and
depends on finding the correct object from a GPU address. Finding
the wrong userptr object based on its CPU address leads to
incorrect GPU mappings and results in VM faults.


Change-Id: I7c5f571ee6e1f9d32687aa3eab6d96944ad032be
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-31 15:04:50 -04:00
Felix Kuehling 855f1a32a9 libhsakmt: Fix and deduplicate object lookup code
Added a helper vm_find_object that can be used everywhere we need to
lookup objects by their address and optionally size. This unifies
all subtly different, partially incomplete, or broken ways of doing
this in various functions:

* map
* unmap
* register
* deregister
* free
* get_mem_info
* set_mem_user_data

At the same time fix some subtle problems for userptr lookup that
got a bit more complex when the userptr address can match the GPU
address.


Change-Id: I98572d1734fc7688a1d68f6a784e02c8dee90af5
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-31 15:04:47 -04:00
shaoyunl 30a4ab39f3 thunk: Avoid create PCIe indirect link on none large bar target
PCIe P2P (indirect) IOLinks should only be created if the remote GPU
is large-BAR

Change-Id: I55cbb5e37c5d41267583e07aca6bdcc708403029
Signed-off-by: shaoyunl <Shaoyun.Liu@amd.com>
2018-08-29 16:31:55 -04:00
Shaoyun Liu 7796994f46 Thunk: Avoid add indirect link for the GPUS with xGMI link
Change-Id: I06f511c55e28919512fda79b504566818dc2a5ab
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
2018-08-29 13:22:58 -04:00
Mike Li 3437a356c7 Decouple user NodeID and sysfs NodeID
Currently, all HSA nodes are exposed to user. So the existing
implementation assumes a one to one mapping between user
NodeId and sysfs nodeId.
GPU Resource Management will provide control over the exposed
HSA nodes. This means not all HSA nodes will be exposed to the user.
Decouple it.
The mapping from user NodeId to sysfs NodeId will be local
to topology.c and topology helper functions. For others NodeId
should be sequential from 0 to Number of Nodes exposed to user.

v1: initial implementation
v2: map node id within the topology_* functions
v3: remove two static globals
v4: add bounds check got node id

Change-Id: Id12147ece41d682430f398944bbb339ca906eb1b
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
2018-08-23 16:01:32 -04:00
Philip Cox db92d5af23 Add GFX debug trap control code
Add initial support for the kfd debugger trap support
for GFX9 chips.

   - Adding support for Enable/Disable trap support
   - Setting debug trap support data
   - Setting wave launch trap override
   - Setting wave launch mode

Change-Id: If39f2395c4b6cf56249cf76f1c44cfcbdcef891c
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
2018-08-22 14:40:15 -04:00
Felix Kuehling 9271e69ddf libhsakmt: Fix processing of memory fault events
AMDKFD_IOC_WAIT_EVENTS with multiple events and wait_for_all = 0
returns success after any of the events have signaled. So we can't
blindly assume that a memory fault event that was in the list has
actually signaled. Check the gpu_id as an indicator whether there
really was a memory fault before processing it further.

Change-Id: I6cc311bfc184c631beaf684027176a6ca42e05c1
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-17 16:06:45 -04:00
Felix Kuehling ab181c46c0 libhsakmt: Try to use CPU addr as GPU addr for userptrs
If the CPU addr of a userptr is accessible by the GPU, try to use it
instead of allocating a different GPU address. If something else is
already registered with an overlapping address range, we still need to
allocate a GPU address, because KFD does not support overlapping GPUVM
mappings.

Change-Id: I452963ee45a454f735755a0b43122b9aee5d55be
Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>
2018-08-17 16:06:45 -04:00
Felix Kuehling 80f2cc644c libhsakmt: Add mmap-based aperture management for GFXv9 and later
If the GPU virtual address space is >= 47 bits, don't reserve virtual
address space at startup and use mmap to allocate virtual addresses.

Change-Id: Ic935b03c8e78271829fc8e6cfd0e543184aff818
Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>
2018-08-17 16:06:45 -04:00
Felix Kuehling 40c46cc6cb libhsakmt: Fix assumptions about userptrs relative to apertures
So far we have assumed that userptrs are always memory outside
reserved SVM apertures that are mapped into the SVM aperture for
GPU access.

With an unreserved SVM aperture that covers the entire virtual
address range, this distinction will no longer be true. Userptrs
will generally be inside the unreserved SVM aperture. Take that
into consideration when registering, mapping and unmapping virtual
addresses.

We now need a retry logic when looking up buffers from addresses.
If it is not found by its GPU address, try it as a userptr.

We also need to consider the new possibility that a userptr is
registered at the same address for CPU and GPU access. So a buffer
found by its GPU address may also turn out to be a userptr. In
that case use a stricter lookup using the userptr and size (if
the size is known), to identify the correct one of multiple
overlapping objects.

Change-Id: Ia43633aaa40f9fd2a74918ae969a631d2ff68419
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-15 16:07:54 -04:00
Felix Kuehling d79b9c1a29 libhsakmt: Make VA management scheme configurable per aperture
Change-Id: Ib70b038b4ef6465b03545317c6494a4e4950c107
Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>
2018-08-15 14:22:19 -04:00
Felix Kuehling d57026f447 libhsakmt: Allow dgpu and dgpu_alt aperture to be the same
Make dgpu_aperture and dgpu_alt_aperture pointers that can point to
the same actual aperture. This will be useful on GFXv9 and later,
where the MType is not defined by the aperture and we want to have
a single aperture covering the entire virtual address space.

aperture->is_coherent can no longer be a reliable indicator of
coherency. Replace it with different conditions based on mem flags
and svm.disable_cache (from HSA_DISABLE_CACHE environment).

Change-Id: Iefc415b87b8abd96e3916586485a0a55d9b27c19
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2018-08-15 14:22:19 -04:00
Felix Kuehling 2d2181b478 libhsakmt: Move unmapping into aperture_release_area
This prepares the code for an alternative aperture management method
that needs to unmap memory differently.

Change-Id: I5494aa5420f85edb8f7857f00c17e1d2e6479a51
Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>
2018-08-15 14:22:19 -04:00
Felix Kuehling 9d96af0150 libhsakmt: scratch is not a manageable aperture
Only scratch_physical, for scratch-backing memory is managed by the Thunk.

Change-Id: I4716981aa908d9569584dc35f40ffd270a2f9014
Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>
2018-08-15 14:22:19 -04:00
Felix Kuehling 842359a826 libhsakmt: Remove aperture offset parameter
This parameter was used for non-canonical GPUVM allocations on GFX7/8 APUs
only, to prevent getting NULL pointers from valid allocation after
subtracting the aperture base. The same can be achieved less intrusively
by reserving address space at the start of the aperture during
initialization.

Change-Id: I0aae773f069c2b228824ba464b0612a4d8b489ce
Signed-off-by: Felix Kuehling <felix.kuehling@gmail.com>
2018-08-15 14:22:19 -04:00
xinhui pan eb5539fb10 thunk: fix a memory leak
Hit queue create failure when do kfdtest with --gtest_repet=-1

fix: 4bb90d04("Remove the use of IS_DGPU()")

Change-Id: I04fa73f90cef13a5517dbaceb89c41dc0f821a79
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-08-10 15:51:32 +08:00
Yong Zhao 110e754f64 Differentiate gfx700 and improve the logic by introducing is_gfx700()
Because gfx700 has local memory but other APUs don't, we should reflect
that in the code. Meanwhile, fix a bug that on gfx902 svm aperture is not
added when calling hsaKmtGetNodeMemoryProperties().

Change-Id: Id840f2db0b14fda9ee713b219a9474c15f8a9771
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2018-08-09 21:39:37 -04:00
xinhui pan 8fbf4a26ec thunk: fix a vm area release issue
On some asics, like tonga, the memory alignment size is as big as 0x8000.

fmm_allocate* alloc vm area with size passed in which is not aligned mostly.
But __fmm_release free vm area with vm_object_t->size which is aligned.

That might cause aperture_release_area fail to free the vm area as the
size might be bigger than zone itself or it just free another vm area
nearby unexpected.

This patch somehow will alloc more space than it needed on tonga.
gfx900+ is not affected.

Change-Id: I5a88c92b08c4e6f6bc05881798f769b55d6debe9
Signed-off-by: xinhui pan <xinhui.pan@amd.com>
2018-08-09 06:08:15 -04:00
Yong Zhao fe04dd6890 Calculate and store the first gpu mem during initializaiton
Previously we used the first dgpu mem, but after careful examination, we
found it only needs to be a GPU, so we modify the code to reflect that as
well.

Change-Id: I069d9b8e247aed55c1f885b79f743ea8e03ddf93
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2018-08-08 13:54:09 -04:00