Commit Graph

435 Commits

Author SHA1 Message Date
Huang Rui 8fc816affe libhsakmt: fix to update the param number after remove to dgpu input
This patch is a hot fix for the parameter count check after the dgpu
input was removed.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Change-Id: Ic980588f78616f99076de742af580afb4273fb2f
2020-09-11 10:25:37 -04:00
Huang Rui 8ea0d49337 libhsakmt: update gfx90c isa version
gfx90c should use GFX902, the same as gfx902.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Change-Id: Id24dc2c85c9f49f36b00889c3b8b1b19cce34e09
2020-09-09 22:10:58 -04:00
Huang Rui ad87f38dad libhsakmt: remove is_dgpu flag in the hsa_gfxip_table
Whether to use the dgpu path is now decided by checking the props
exposed by the kernel, so it no longer needs to be hard-coded in the
ASIC table.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Change-Id: I0c018a26b219914a41197ff36dbec7a75945d452
2020-09-09 20:56:50 +08:00
Huang Rui 12813691a2 libhsakmt: implement the method that using flag which exposed by kfd to configure is_dgpu
KFD already implements the fallback path for APUs. Thunk now uses the
flag exposed by kfd to configure is_dgpu instead of the previous
hard-coded value.

Signed-off-by: Huang Rui <ray.huang@amd.com>
Change-Id: I445f6cf668f9484dd06cd9ae1bb3cfe7428ec7eb
2020-09-09 20:56:39 +08:00
Oak Zeng 3d3b28b670 CWSR control stack size calculation for gfx10
Gfx10 needs 12 bytes/wave of control stack.

Change-Id: I6c6f2819572e6b43aa3140d4dbe79d930e4c1c9c
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
2020-09-01 21:34:00 -07:00
Philip Cox f7a3427c99 libhsakmt: call madvise() from fmm_allocate_device
This is needed to avoid additional references to mapped BOs in child
processes that can prevent freeing memory in the parent process and lead
to out-of-memory conditions.

Change-Id: I25c90510a14dde515cc23ea5dc1f68e8d7e37a66
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
2020-08-19 13:33:47 -04:00
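The idea in the commit above can be sketched in isolation. The following is a minimal illustration, not the code from fmm_allocate_device: the helper name alloc_no_fork is hypothetical, and the choice of MADV_DONTFORK is an assumption, since the commit message does not say which advice value is passed to madvise().

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and MADV_DONTFORK on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `size` bytes and mark the range so that a
 * forked child does not inherit it. Returns NULL on failure. */
void *alloc_no_fork(size_t size)
{
    assert(size > 0);

    void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    /* Assumption: MADV_DONTFORK is the advice in question. It keeps the
     * range out of the child's address space after fork(), so the child
     * holds no extra reference to the underlying memory. */
    if (madvise(mem, size, MADV_DONTFORK) != 0) {
        munmap(mem, size);
        return NULL;
    }
    return mem;
}
```

With this advice applied, freeing the range in the parent is not delayed by a child process that still maps it.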
Kent Russell 04f6b9e16b Fix GCC warning regarding strncpy in CPU info
strlen(src) should not be used as the length in strncpy. Use memcpy
since we know the length of the string, and ensure that we
NULL-terminate regardless of length

Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I21cc6d106510c69464e7ac9d3fc7da3a1e6d1a68
2020-08-14 07:10:19 -04:00
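The pattern described in the commit above (bounded memcpy plus unconditional termination) might look roughly like this. The function name copy_string and its signature are hypothetical and are not taken from the CPU info code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `src` into `dst` (capacity `dst_size` bytes),
 * truncating if necessary, and always NUL-terminate the result. */
void copy_string(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    /* Bound the length by the destination, not by the source. */
    size_t len = strlen(src);
    if (len > dst_size - 1)
        len = dst_size - 1;

    memcpy(dst, src, len);
    dst[len] = '\0'; /* terminate regardless of length */
}
```

Passing strlen(src) as the bound to strncpy defeats the purpose of the bound, which is why GCC warns about it; sizing the copy by the destination avoids both the warning and the overflow.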
Kent Russell 6085baa2dc Fix typo lager->larger
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I188d629b6441e5ebb14f104869e871d003c78c9d
2020-08-13 06:34:42 -04:00
Philip Yang 9e9771a7d9 libhsakmt: always use render fd to create CPU mapping
The option to use kfd_fd for CPU mapping exists only for very old,
broken KFD versions; it is not used in the upstreaming process. It
causes issues when multiple processes use shared system memory because
the GTT address is over 40 bits.

Change to always use render node fd to create CPU mapping.

Change-Id: Id7e7b2a2e2f13c6e62c5de170589abfff4d456b0
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2020-08-04 12:54:57 -04:00
Chengming Gui bf1a7acea3 libhsakmt: Add gfx1031 support
Signed-off-by: Chengming Gui <Jack.Gui@amd.com>
Change-Id: Ic1e78e5c3a453eb01f725612cf9ecc702ce2e132
2020-07-28 15:01:00 -05:00
Yong Zhao 7c74069d6a libhsakmt: Prepare for gfx1030 support
PCI IDs are yet to be added.

Change-Id: Iac303fc1346f4ed5c4da5300b1e311c1c6938ee2
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2020-07-07 11:07:14 -04:00
Gang Ba fec3780c1a Revert "libhsakmt: add Streaming Performance Monitors APIs"
This reverts commit d675d1cce1.

Reason for revert: Change was submitted by accident

Change-Id: If05c705e22296fd3ca789f269737d379a933361d
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2020-06-29 10:54:54 -04:00
Yong Zhao 6a762ec717 libhsakmt: Improve the comment regarding queue doorbells
The comment failed to convey the fact.

Change-Id: Ia9b1d1c2583e288a6308d2bc81d42055064a5f4f
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2020-06-16 15:16:03 -04:00
Gang Ba d675d1cce1 libhsakmt: add Streaming Performance Monitors APIs
Signed-off-by: Gang Ba <gaba@amd.com>
Change-Id: I5c23a8dacf9bc50c740908aabe391432f2c7112e
Signed-off-by: Gang Ba <gaba@amd.com>
2020-05-29 09:34:31 -04:00
Philip.Cox@amd.com 0a55f31463 Initial kfd debugger address watch support
Code for the new kfd debugger address watch, adding support for:
   -- add address watch
   -- clear address watch

Change-Id: I9b014e7cee03897157b997b9e5b39b6ed403b8e1
Signed-off-by: Philip.Cox@amd.com <Philip.Cox@amd.com>
2020-05-21 13:41:55 -04:00
Felix Kuehling df16950a0c libhsakmt: use _Static_assert instead of static_assert
Using static_assert breaks in "Many Linux" build environment. It is not
supported by that libc version. _Static_assert is a compiler built-in
and does not depend on the libc version.

Change-Id: I37cf0ad10de94d8f6fc8cefc4fdda55c9520d599
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2020-05-15 01:34:10 -04:00
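As a brief illustration of the distinction in the commit above: in C11, static_assert is a macro supplied by <assert.h>, while _Static_assert is a language keyword and needs no header. The structure below is hypothetical and exists only to give the assertion something to check.

```c
#include <stdint.h>

/* Hypothetical structure, used only for illustration. */
struct queue_info {
    uint32_t queue_id;
    uint32_t flags;
    uint64_t ring_base;
};

/* Compile-time check using the keyword form. No <assert.h> is needed,
 * so this does not depend on what the C library's headers provide. */
_Static_assert(sizeof(struct queue_info) == 16,
               "struct queue_info must be exactly 16 bytes");
```

If the size ever changes, compilation stops with the given message instead of failing at run time.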
Ranieri Althoff aa185380f9 Avoid calculating strlen multiple times
Change-Id: Iec66c7d35e5d6cd2deb02c94ee070d0fa1335147
Signed-off-by: Ranieri Althoff <ranisalt@gmail.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
2020-05-13 00:38:26 -04:00
Jonathan Kim 93c333711a libhsakmt: queue suspend/resume can return non-zero positive values
New queue suspend/resume update can now return the number of successful
queue requests so return success if IOCTL return is non-negative.

This should be backwards compatible since old queue suspend/resume returns
0 on success.

Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Change-Id: I06b70d95d203b2bfc19a0cc1b88c5719c695159a
2020-05-08 22:10:41 -04:00
Felix Kuehling e5062c4383 libhsakmt: Add PROT_NONE CPU mapping for scratch mappings
This is needed to allow gdb to access the memory.

Change-Id: I96c084b714e952d7b7000f0dd41e1c530fdd092f
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2020-05-02 15:52:30 -04:00
Srinivasan Subramanian 5e35364838 libhsakmt: check ret and errno for EBADF
Change-Id: I9fcbf955d8b7b01ff1025534a8c2eaa8e6790565
Signed-off-by: Srinivasan Subramanian <srinivasan.subramanian@amd.com>
2020-04-15 20:55:40 -04:00
Sean Keely 884fed4f04 Correct initial kfd_open_count increment.
Don't set kfd_open_count=1 unless hsaKmtOpenKFD actually succeeds.
This prevents returning HSAKMT_STATUS_KERNEL_ALREADY_OPENED in
subsequent calls when KFD is actually closed.

Signed-off-by: Sean Keely <Sean.Keely@amd.com>
Change-Id: Ia870b5faa8626826a6c8795aa10784d376cf2e80
2020-04-03 21:05:07 -04:00
Jon Chesterfield 0a1718b753 Replace libpci with new parser.
libpci was only used to find a marketing string for a device.
This patch looks for a pci.ids on disk and parses it to extract the
same string, using 'Device xxxx' as the fallback on file i/o error
or missing data from the text file. Tested by checking every vendor/
device pair against the values returned from libpci.

Change-Id: I21af3157472c1824d57fcee31393c6ee8ce07330
Signed-off-by: Jon Chesterfield <Jonathan.Chesterfield@amd.com>
2020-03-20 17:50:47 +00:00
Sean Keely 9efefe6d52 Handle EBADF when KFD file handle is still open.
Signed-off-by: Sean Keely <Sean.Keely@amd.com>
Change-Id: I23d6c87d5729f57c261030c6baeff4c977eef934
2020-03-11 18:52:19 -05:00
Divya Shikre ebe7de1f99 libhsakmt: Expose device Unique Id
Read the device unique id from sysfs and expose it in HsaNodeProperties.
For unsupported devices the value will be 0.

Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I97b8689dfa090971c6876de6feaa97652e28c03d
2020-03-10 10:06:11 -05:00
Sean Keely e66818e4d3 Update analysis_memory_exception to recognize shared memory.
Add type HSA_POINTER_REGISTERED_SHARED printing.

Change-Id: Ic0400a097ebabde4f035b57fbca4cca12428fc97
2020-02-12 21:51:53 -05:00
Harish Kasiviswanathan 31530da7c6 libhsakmt: Child process can reacquire system props
If a child process explicitly calls hsaKmtReleaseSystemProperties(), it
fails. Allow the child process to release and acquire system properties.

Change-Id: I649a4600212711b2ad4474f605f3ca39a4003d03
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
2020-02-06 15:09:39 -05:00
Sean Keely 6957202df8 Update pointer info to include IPC memory.
IPC memory was previously returned as HSA_POINTER_ALLOCATED and
had garbage in the node_id field.  Due to ROCR_VISIBLE_DEVICES
we need to be able to distinguish between imported memory and
regular memory because imported memory may not be owned by an
agent that is visible in the process.  Differentiating these flags
allows the users to expect null agent for the owning agent.

Fixes 

Change-Id: Ide3489cec1ee2072dc9697fa5cb71ddb17999d14
2020-02-05 01:55:39 -06:00
Philip Yang 5858aa17a9 libhsakmt: Ignore mbind failure if flag NoSubstitute = 0
According to the Thunk spec, when flag NoSubstitute = 0 and the specific
memory type is not available on a node, the allocation may fall back to
other memory that can replace it on that node. mbind returns failure if
no memory is available on the specific node, so the mbind failure should
be ignored in this case.

Change-Id: I651a1bedf1852330604e56965cc17862403ebf87
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2020-01-27 13:27:37 -05:00
Felix Kuehling 87e10cd0b4 libhsakmt: Improve error handling in child process
Check for errno == EBADF in kmtIoctl to detect misuse of the kfd_fd
in a forked child process.

Detect being in a forked child process pro-actively by implementing
a pthread_atfork callback.

Make sure all mutexes get reinitialized in the child process to avoid
deadlocks.

Check for being in a forked child process in CHECK_KFD_OPENED so that
all hsaKmt functions will return the appropriate status
HSAKMT_STATUS_KERNEL_IO_CHANNEL_NOT_OPENED.

Update InvalidKFDHandleTest to expect that error code.

Change-Id: I0238e5fba344dcaa454e97a35db2e2dcc8d1f607
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2020-01-20 18:01:21 -05:00
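The fork-detection approach in the commit above can be sketched with pthread_atfork. All names below (is_forked_child, lib_mutex, install_fork_handler, child_sees_fork_flag) are hypothetical; this is an outline of the technique under those assumptions, not the library's implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static bool is_forked_child = false;
static pthread_mutex_t lib_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the child immediately after fork(). */
static void child_after_fork(void)
{
    is_forked_child = true;
    /* The mutex may have been held by another thread at fork time;
     * reinitialize it so the child cannot deadlock on it. */
    pthread_mutex_init(&lib_mutex, NULL);
}

/* Register the handler once, e.g. at library initialization. */
int install_fork_handler(void)
{
    return pthread_atfork(NULL, NULL, child_after_fork);
}

bool in_forked_child(void)
{
    return is_forked_child;
}

/* Demonstration: fork and report whether the child observed the flag.
 * Returns 1 if it did, 0 if not, -1 on error. */
int child_sees_fork_flag(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(in_forked_child() ? 0 : 1);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

An entry-point check could then consult in_forked_child() and return an error status instead of issuing ioctls on a file descriptor inherited from the parent.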
Huang Rui 06464b917d libhsakmt: add NumCpQueues and NumSdmaQueuesPerEngine data field (v3)
NumCpQueues and NumSdmaQueuesPerEngine should be obtained from the kfd
driver, not hard-coded. Add two data fields to HsaNodeProperties so that
thunk can read them from the sysfs entries exposed by kfd.

v2: change NumCpQueues/NumSdmaQueuesPerEngine to one byte.
v3: merge two commits into one to avoid updating the ABI twice.

Change-Id: Ie386e4685f13493e22db6e207a399db6a4c5b9dc
Signed-off-by: Huang Rui <ray.huang@amd.com>
2020-01-03 23:27:42 -05:00
Yong Zhao 22e9ef7303 libhsakmt: Add the perf counter support for gfx1012
Change-Id: I55d68a77928617edaabd33ae0807bf23f739c8de
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2019-12-18 20:49:36 -05:00
Jonathan Kim 8b01a1c4c5 add queue snapshot test
Adds an API and a test to get a snapshot of newly created queues per
ptraced process.

Change-Id: Ife97123a5b930e837ccaa386801145ef23c2cc2c
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
2019-12-02 11:56:04 -05:00
Huang Rui fdba74c2fb libhsakmt: add gfx90c support for thunk
This patch adds support for the gfx90c APU. So far it is treated as
"dgpu" and gfx900. The hsa gfxip table will be updated once the ISA/LLVM
support for gfx90c is implemented.

Change-Id: I6ef164bf3e751fe6dd6287cac212a500dce84b1a
Signed-off-by: Huang Rui <ray.huang@amd.com>
2019-11-14 20:02:53 -05:00
Philip Yang 59c857476f libhsakmt: use the closest NUMA node to allocate queue ctx area
On a NUMA system, allocate the queue ctx save/restore area on the NUMA
node closest to the GPU on which the queue is going to run. This
generally improves performance on NUMA systems by reducing scheduling
latency, and fixes the unstable performance of the multi-node
rccl-tests.

If the closest NUMA node has no memory available, set the flag
NoNUMABind=1 to bypass mbind and allocate system memory with the default
NUMA memory policy.

Change-Id: Ic62bfa5bb2efbf4f6ae79ff403e9610ddf18d45c
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2019-11-06 17:33:26 -05:00
Ori Messinger e7f45fae8a Add non-priv PMC blocks to GFX10
This patch adds the non-privileged PMC blocks for GFX10/gfx1010.

Change-Id: I4b98cb2159d71113c12920ca7fd10e45096b4e2c
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
2019-11-05 13:07:13 -05:00
Oak Zeng fa0cb9ebeb Handle IOCTL failure in fmm_release
The FREE_MEMORY_OF_GPU ioctl could fail, e.g., if memory is still mapped
to the GPU. Handle this failure by returning an error from
fmm_release/HsaKmtFreeMemory.

Change-Id: I5461db39964f733cf97376d50e44906a9b4c0f13
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
2019-11-01 08:59:05 -04:00
Yong Zhao ab2daf6538 libhsakmt: Add a message when a device is not supported
This helps to quickly triage problems.

Change-Id: Iad2b4b74209ab972be0c2f6311eeb3aaf098d29f
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2019-10-20 12:44:13 -04:00
Yong Zhao 1c7755d2da libhsakmt: Add gfx1012 device IDs
Now the gfx1012 device IDs are okay to reveal.

Change-Id: I9da2a036b74ec7b6b8b1fb7587597a5847f02205
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2019-10-17 12:35:22 -04:00
Yong Zhao 16fa78b134 libhsakmt: Print an error message when map_mmio failes
Without this change, the failure was hard to notice when it happened.

Change-Id: I99c3e8cea0d0cbd3bcfe79069410e6e870e225bf
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2019-10-17 12:32:32 -04:00
Amber Lin 23541e0289 libhsakmt: handle CPU cache info on non-NUMA sys
When CONFIG_NUMA is not enabled in the kernel config, only one CPU node
is present on the system and the /sys/devices/system/node/nodeX
directories don't exist. Read CPU cache information from
/sys/devices/system/cpu in this situation.

Change-Id: I017ff17dd72678a0551edcc77446664501aa42ca
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2019-10-17 11:53:19 -04:00
Philip Cox 6933540c81 Remove debugger data reg accesses
The debug trap accesses the data0/data1 registers, so we do not
want the userspace to write values to it.  We remove the calls to
set the data0/data1 register values.

Change-Id: Iaba842a4c445f339f16a39fe1994526ff78a2f3c
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
2019-10-10 14:32:54 -04:00
Philip Cox dbbd189b33 Add functions to get the kfd debugger version info
To support adding new features to the kfd debugger, and not break
functionality, we need to be able to check the kfd debugger support
version info from the kernel.

Change-Id: Icd88e4edab8430c35eaed588e62d892c1b5c62ec
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
2019-10-10 14:32:54 -04:00
Amber Lin 5a09880620 libhsakmt: fix typo in error message
When getting the CPU dirs from the /sys/devices/system/node/nodeX
directory fails, the error message should print node_dir, not path.

Change-Id: If76a51918c8dd55fa6605a62f3d29f9efc6fadb3
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2019-09-30 14:29:39 -04:00
shaoyunl a1e399a3ff Thunk : Add gfx1011 support from thunk side
Change-Id: I6b202b75fc1ad0e69576a35a6a3e499818137e04
Signed-off-by: shaoyunl <shaoyun.liu@amd.com>
2019-09-25 11:02:33 -04:00
Philip Yang 71cf3cf5d3 libhsakmt: correct number of NUMA nodes calculation
numa_max_node() returns the highest node number available on the current
system, so the number of NUMA nodes should be numa_max_node() + 1.

Change-Id: I20a6c17af071e73e853cb5ea6d0304c8aca52681
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2019-09-16 16:25:57 -04:00
Philip Yang 42392f093f libhsakmt: handle NUMA system with no memory on node 0
On a NUMA system, node 0 may have no memory. An application that passes
node id 0 to hsaKmtAllocMemory will then fail, because the mbind call
that requests allocation from node 0 returns EINVAL.

Add a new flag NoNUMABind that an application can pass to
hsaKmtAllocMemory to skip mbind.

hsaKmtCreateEvent and hsaKmtCreateQueue specify the new flag NoNUMABind
when allocating system memory for the event page and the CWSR area, so
that this memory is not bound to a specific NUMA node.

Change-Id: I854e5a57502c7807c4c5ff2e441d499ae515c309
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2019-09-16 11:30:24 -04:00
Philip Yang 4da09813a3 libhsakmt: fix mbind failed on docker
Docker's default seccomp profile blocks the mbind system call, so mbind
fails in Docker. Thunk should not treat this as a failure; otherwise
applications cannot allocate system memory in Docker.

Use pr_warn_once and pr_err_once to avoid repeating the same error
messages.

Change-Id: I61a7c0e4abaa3dcfe7abf2ea48db90f669f9638a
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2019-09-13 15:01:47 -04:00
Yong Zhao 3ecd83e52d libhsakmt: Support gfx1012
The gfx version item is yet to be added.

Change-Id: Ia6c487447e5a5df80c0c12fe150939175068024b
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2019-09-06 14:42:32 -04:00
Yong Zhao d6539ddc24 libhsakmt: Implement HSA_FORCE_ASIC_TYPE to overwrite asic type
To force all the GPUs to a certain type, use the command below:
HSA_FORCE_ASIC_TYPE="10.1.0 1 gfx1010 14"
meaning major.minor.step dgpu asic_name asic_id

This will facilitate cooperation across teams when bringing up
ASICs which reuse existing device IDs.

Change-Id: I40fe4c9b46d3ccb3e38ea52250e80e82fb50fb0f
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
2019-09-06 12:12:42 -04:00
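The value format given in the commit above ("major.minor.step dgpu asic_name asic_id") lends itself to a single sscanf call. The following is a hedged sketch: the struct, its field names, the 32-byte name buffer, and the function name are all hypothetical, and the library's actual parsing may differ.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical container for the parsed override. */
struct forced_asic {
    unsigned int major, minor, step;
    unsigned int is_dgpu;
    char name[32];
    unsigned int asic_id;
};

/* Parse a string such as "10.1.0 1 gfx1010 14".
 * Returns 0 on success, -1 if the string does not match the format. */
int parse_force_asic_type(const char *value, struct forced_asic *out)
{
    assert(value != NULL && out != NULL);

    /* %31s leaves room for the terminating NUL in name[32]. */
    int n = sscanf(value, "%u.%u.%u %u %31s %u",
                   &out->major, &out->minor, &out->step,
                   &out->is_dgpu, out->name, &out->asic_id);
    return n == 6 ? 0 : -1;
}
```

In practice the input string would come from getenv("HSA_FORCE_ASIC_TYPE"), with the override applied only when that call returns a non-NULL value and parsing succeeds.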
Felix Kuehling e320913e9e libhsakmt: Fix userptr mappings on gfx802
The memory size alignment workaround for a TLB bug on gfx802 was
breaking userptrs because it would attempt to get_user_pages beyond
the end of a VMA. Refine this workaround based on our understanding
of the HW bug. It only affects L2 cacheline allocation, which is
decided by the last page in the cache line (8 entries = 32KB of
address space). Thus, aligning the memory allocation so that the last
page falls on the end of an 8-entry TLB cache line allows caching
to work correctly.

Imported images require specific alignments. If their size is not
naturally aligned with 8 cache lines, it may have bad TLB cache
performance.

This patch will only have the desired effect if redundant size
padding in KFD is also removed.

Change-Id: I984cbe7fa61fec04d70fa387aaf9aab370eabeb9
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2019-08-30 19:06:24 -04:00
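The arithmetic behind the commit above (8 entries = 32KB of address space) can be illustrated with a small padding helper. This is a sketch under assumptions: a 4KB page size, a hypothetical function name, and the simplification that padding the size alone is enough to move the end of the range. It is not the workaround as implemented in the library.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES  4096u /* assumed page size */
#define TLB_LINE_ENTRIES 8u
#define TLB_LINE_BYTES   (PAGE_SIZE_BYTES * TLB_LINE_ENTRIES) /* 32KB */

/* Hypothetical helper: return the size, padded so that the range
 * [addr, addr + size) ends on a 32KB boundary, i.e. the last page
 * is the final entry of an 8-entry TLB cache line. */
uint64_t pad_size_to_tlb_line(uint64_t addr, uint64_t size)
{
    assert(size > 0);

    uint64_t mask = (uint64_t)(TLB_LINE_BYTES - 1);
    uint64_t end = addr + size;
    uint64_t aligned_end = (end + mask) & ~mask; /* round end up to 32KB */
    return aligned_end - addr;
}
```

For example, a one-page range starting at address 0 is padded to 32KB, while a 32KB range that already ends on a 32KB boundary is left unchanged.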