This patch is the hot fix to fix the param number checking after remove
dgpu input.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Change-Id: Ic980588f78616f99076de742af580afb4273fb2f
gfx90c should use GFX902 which is the same with gfx902.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Change-Id: Id24dc2c85c9f49f36b00889c3b8b1b19cce34e09
Whether use dgpu path will check the props which exposed from kernel.
We won't need hard code in the ASIC table.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Change-Id: I0c018a26b219914a41197ff36dbec7a75945d452
KFD already implemented the fallback path for APU. Thunk will use flag
which exposed by kfd to configure is_dgpu instead of hardcode before.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Change-Id: I445f6cf668f9484dd06cd9ae1bb3cfe7428ec7eb
Gfx10 need 12bytes/wave control stack
Change-Id: I6c6f2819572e6b43aa3140d4dbe79d930e4c1c9c
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
This is needed to avoid additional references to mapped BOs in child
processes that can prevent freeing memory in the parent process and lead
to out-of-memory conditions.
Change-Id: I25c90510a14dde515cc23ea5dc1f68e8d7e37a66
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
strlen(src) should not be used as the length in strncpy. Use memcpy
since we know the length of the string, and ensure that we
NULL-terminate regardless of length
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I21cc6d106510c69464e7ac9d3fc7da3a1e6d1a68
The option to use kfd_fd for cpu mapping is for very old broken KFD
version, it is not used in upstreaming process. This causes issue when
multiple process uses shared system memory because the GTT address is
over 40 bits.
Change to always use render node fd to create CPU mapping.
Change-Id: Id7e7b2a2e2f13c6e62c5de170589abfff4d456b0
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
This reverts commit d675d1cce1.
Reason for revert: Change was submitted by accident
Change-Id: If05c705e22296fd3ca789f269737d379a933361d
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Using static_assert breaks in "Many Linux" build environment. It is not
supported by that libc version. _Static_assert is a compiler built-in
and does not depend on the libc version.
Change-Id: I37cf0ad10de94d8f6fc8cefc4fdda55c9520d599
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
New queue suspend/resume update can now return the number of successful
queue requests so return success if IOCTL return is non-negative.
This should be backwards compatible since old queue suspend/resume returns
0 on success.
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Change-Id: I06b70d95d203b2bfc19a0cc1b88c5719c695159a
This is needed to allow gdb to access the memory.
Change-Id: I96c084b714e952d7b7000f0dd41e1c530fdd092f
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Don't set kfd_open_count=1 unless hsaKmtOpenKFD actually succeeds.
This prevents returning HSAKMT_STATUS_KERNEL_ALREADY_OPENED in
subsequent calls when KFD is actually closed.
Signed-off-by: Sean Keely <Sean.Keely@amd.com>
Change-Id: Ia870b5faa8626826a6c8795aa10784d376cf2e80
libpci was only used to find a marketing string for a device.
This patch looks for a pci.ids on disk and parses it to extract the
same string, using 'Device xxxx' as the fallback on file i/o error
or missing data from the text file. Tested by checking every vendor/
device pair against the values returned from libpci.
Change-Id: I21af3157472c1824d57fcee31393c6ee8ce07330
Signed-off-by: Jon Chesterfield <Jonathan.Chesterfield@amd.com>
Read device unique id from sysfs and expose it in HsaNodeProperties.
For devices not supported the value will be 0
Signed-off-by: Divya Shikre <DivyaUday.Shikre@amd.com>
Change-Id: I97b8689dfa090971c6876de6feaa97652e28c03d
If child process explicitly calls hsaKmtReleaseSystemProperties(), it
fails. Allow child process to release and acquire system properties.
Change-Id: I649a4600212711b2ad4474f605f3ca39a4003d03
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
IPC memory was previously returned as HSA_POINTER_ALLOCATED and
had garbage in the node_id field. Due to ROCR_VISIBLE_DEVICES
we need to be able to distinguish between imported memory and
regular memory because imported memory may not be owned by an
agent that is visible in the process. Differentiating these flags
allows the users to expect null agent for the owning agent.
Fixes
Change-Id: Ide3489cec1ee2072dc9697fa5cb71ddb17999d14
From Thunk spec, flag NoSubstitute = 0, if specific memory type is not
available on node, allocation may fall back to other memory that can
replace it on that node. mbind return failure if no memory available on
the specific node, we should ignore the mbind failure for this case.
Change-Id: I651a1bedf1852330604e56965cc17862403ebf87
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Check for errno == EBADF in kmtIoctl to detect misuse of the kfd_fd
in a forked child process.
Detect being in a forked child process pro-actively by implementing
a pthread_atfork callback.
Make sure all mutexes get reinitialized in the child process to avoid
deadlocks.
Check for being in a forked child process in CHECK_KFD_OPENED so that
all hsaKmt functions will return the appropriate status
HSAKMT_STATUS_KERNEL_IO_CHANNEL_NOT_OPENED.
Update InvalidKFDHandleTest to expect that error code.
Change-Id: I0238e5fba344dcaa454e97a35db2e2dcc8d1f607
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
NumCpQueues and NumSdmaQueuesPerEngine should be got by kfd driver not hardcode.
So add two data fields in HsaNodeProperties then thunk is able to get it from sysfs
that exposed by kfd.
v2: change NumCpQueues/NumSdmaQueuesPerEngine to one byte.
v3: merge two commits as one to avoid ABI update two times.
Change-Id: Ie386e4685f13493e22db6e207a399db6a4c5b9dc
Signed-off-by: Huang Rui <ray.huang@amd.com>
adds api and test to get newly create queue snapshot per ptraced process.
Change-Id: Ife97123a5b930e837ccaa386801145ef23c2cc2c
Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
This patch adds the support for gfx90c apu. So far we treat it as "dgpu" and
gfx900. Will update hsa gfxip table while the isa/llvm is implemented on gfx90c.
Change-Id: I6ef164bf3e751fe6dd6287cac212a500dce84b1a
Signed-off-by: Huang Rui <ray.huang@amd.com>
On NUMA system, allocate queue ctx save restore area on the closest NUMA
node to the GPU which the queue is going to run. This will improve
performance on NUMA system generally by reducing schedule latency and
fix the multi-node rccl-tests unstable performance issue.
If the closest NUMA node has no memory available, set flags NoNUMABind=1
to bypass mbind, to use default NUMA memory policy to allocate system
memory.
Change-Id: Ic62bfa5bb2efbf4f6ae79ff403e9610ddf18d45c
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
This patch adds the non-privileged PMC blocks for GFX10/gfx1010.
Change-Id: I4b98cb2159d71113c12920ca7fd10e45096b4e2c
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
FREE_MEMORY_OF_GPU ioctl could fail, e.g., if memory is still mapped
to GPU. Handle this failure by return error in fmm_release/HsaKmtFreeMemory
Change-Id: I5461db39964f733cf97376d50e44906a9b4c0f13
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Without this change, the failure was hard to notice when it happened.
Change-Id: I99c3e8cea0d0cbd3bcfe79069410e6e870e225bf
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
When CONFIG_NUMA is not enabled in the kernel config, only one CPU node
presents on the system and /sys/devices/system/node/nodeX directories
don't exist. Read CPU cache information from /sys/devices/system/cpu in
this situation.
Change-Id: I017ff17dd72678a0551edcc77446664501aa42ca
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
The debug trap accesses the data0/data1 registers, so we do not
want the userspace to write values to it. We remove the calls to
set the data0/data1 register values.
Change-Id: Iaba842a4c445f339f16a39fe1994526ff78a2f3c
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
To support adding new features to the kfd debugger, and not break
functionality, we need to be able to check the kfd debugger support
version info from the kernel.
Change-Id: Icd88e4edab8430c35eaed588e62d892c1b5c62ec
Signed-off-by: Philip Cox <Philip.Cox@amd.com>
When fail to get CPU dirs from //sys/devices/system/node/nodeX directory,
the error message should print node_dir, not path.
Change-Id: If76a51918c8dd55fa6605a62f3d29f9efc6fadb3
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
numa_max_node() return the highest node number available on the current
system, number of NUMA nodes should be numa_max_node() + 1.
Change-Id: I20a6c17af071e73e853cb5ea6d0304c8aca52681
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
on NUMA system, node 0 may have no memory, application pass node id
0 to hsaKmtAllocMemory will fail because mbind to specify the allocation
from node 0 return EINVAL.
Add new flag NoNUMABind for application to pass it to hsaKmtAllocMemory
to skip mbind.
hsaKmtCreateEvent and hsaKmtCreateQueue specify the new flag NoNUMABind
to allocate system memory for event page and CWSR area, don't bind the
system memory to a specific NUMA node.
Change-Id: I854e5a57502c7807c4c5ff2e441d499ae515c309
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Docker seccomp by default blocks mbind system call, so mbind return
failed on docker. thunk should not fail this otherwise application
cannot allocate system memory on docker.
Use pr_warn_once and pr_err_once to avoid duplicate same error messages
Change-Id: I61a7c0e4abaa3dcfe7abf2ea48db90f669f9638a
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Force all the GPUs to a certain type, use the below command:
HSA_FORCE_ASIC_TYPE="10.1.0 1 gfx1010 14"
meaning major.minor.step dgpu asic_name asic_id
This will faciliate the cooperation across the teams for bringing up
ASICs which reuse existing device IDs.
Change-Id: I40fe4c9b46d3ccb3e38ea52250e80e82fb50fb0f
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
The memory size alignment workaround for a TLB bug on gfx802 was
breaking userptrs because it would attempt to get_user_pages beyond
the end of a VMA. Refine this workaround based on our understanding
of the HW bug. It only affects L2 cacheline allocation, which is
decided by the last page in the cache line (8 entries = 32KB of
address space). Thus aligning memory allocation so that the last
page falls on the end of a 8 entry TLB cache line allows caching
to work correctly.
Imported images require specific alignments. If their size is not
naturally aligned with 8 cache lines, it may have bad TLB cache
performance.
This patch will only have the desired effect if redundant size
padding in KFD is also removed.
Change-Id: I984cbe7fa61fec04d70fa387aaf9aab370eabeb9
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>