hsaKmtMapMemoryToGPU should not try to map VRAM on peer GPUs that don't
have an IO-Link to the memory. The new P2P mapping code in KFD will
fail otherwise.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I6d59b55651b98756865a0f69eafef3e386372cf3
This allows init_process_apertures to use the whole consistent topoology
instead of taking its own partial snapshot.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: Ia13e7aa7fcd090ea8d6cacd4babb29a27c20207f
Kernel debug IOCTL got version bumped to v11.
Updated runtime enable but missed runtime disable check update.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I71a8970ccfe7dc517abe7b4ad962369aea6a0496
To improve performance on queue preemption, allocate ctx s/r
area in VRAM instead of system memory, and migrate it back
to system memory when VRAM is full.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Change-Id: If775782027188dbe84b6868260e429373675434c
The KFD no longer allow debug ops that modify HW state prior to
trap activation so permit bump in major version.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I072d3998b7b043df9a67f0f6762b0afdfa9382c6
Non-paged allocation for queue memory necessary for binding wptr to
GART. Required to support usermode queue oversubscription with MES on
GFX11.
Change ensures queue memory does not specify ATS.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I10b23b0205c90dad902c711a88cfb5e9b4979617
System Management Interface event is read from anonymous file handle,
this helper wrap the ioctl interface to get anonymous file handle for
GPU nodeid.
Define SMI event IDs, event triggers, copy the same value from
kfd_ioctl.h to avoid translation.
Change-Id: I5c8ba5301473bb3b80bb4e2aa33a9f675bedb001
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Add KFDGpuID to HsaNodeProperties to return gpu_id to upper layer,
gpu_id is hash ID generated by KFD to distinguish GPUs on the system.
ROCr and ROCProfiler will use gpu_id to analyze SMI event message.
Change-Id: I6eabe6849230e04120674f5bc55e6ea254a532d6
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
This env variable sets the max VA alignment order size as
"PAGE_SIZE * 2^alignment order" during mapping. By default the order
size is set to 9(2MB).
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I01ae4e0963f4d21c7c367464e60f865bc58d7fac
New API function to report available memory per GPU
Signed-off-by: Daniel Phillips <Daniel.Phillips@amd.com>
Change-Id: I63c1e4ca0020c657977ab3635947ab0ed0a81440
Tested on Talos II with Vega 64
POWER systems allocate NUMA nodes on multiples of 8 to allow CPU
onlining / offlining
Set the correct NUMA mask bits when requesting node-bound memory
allocations
This is a cleanup/squash/rebase of:
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/pull/47
Change-Id: Id4af6dff7e66e9d464d6b17a1e99087eb3ac8e51
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Import the latest version from the kernel tree.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: If5f998ad55085ebd5020adaa382181204d834e3e
Currently, context save area size passed to KFD includes the
size of the debug area. Change this to report the actual size
of the context save area to KFD.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Change-Id: I5d440ae802255a97ade046775f6a000bae79d5d5
Mapped memory areas become invalid after fork, and the child process is
required to remap the memory areas after a fork. So we mark these device
memory mappings with MADV_DONTFORK so that they are removed from the
child process after fork.
This was causing some issues when doing CRIU checkpoint/restore because
CRIU and amdgpu_plugin were not able to handle these mappings.
Change-Id: I50eb334aecea6dab7522d94da0273adcf4fb1ce0
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
The AMD compiler team has confirmed that they expect gfx90c
to be gfx90c, with a major/minor/stepping of 9, 0, and 12
respectively. It appears that there is a typo in the libhsakmt
topology information that lists this part as gfx902. This patch
fixes the issue.
Signed-off-by: Joseph Greathouse <Joseph.Greathouse@amd.com>
Change-Id: I6f907a7aa6f190b12aba8bb4210c7b341b3c720b
The return code is just -1 if any error occurs. To detect debugger
unavailable we need to check the actual ioctl error code.
Change-Id: I8a294c754196aec916809497ec8e810da2f072b8
Signed-off-by: Sean Keely <Sean.Keely@amd.com>
For userptr, after taking aperture lock, decrease registration_count and
ensure object registration_count equal to 0 to release KFD and thunk
object.
Move decrementing of registration count from fmm_deregister_memory into
__fmm_release to avoid a race condition when dropping the aperture lock
in fmm_deregister_memory.
Change-Id: I5381fa6b8a77a1516af2554e5174e91969c338c4
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
to faciliate debugging, print errno when queue destroy fail
current log give very little information when fail:
[ RUN ] KFDQMTest.AllSdmaQueues
[ ] Regular SDMA engines number: 1 SDMA queues per engine: 2
[ OK ] KFDQMTest.AllSdmaQueues (11 ms)
[ RUN ] KFDQMTest.AllXgmiSdmaQueues
[ ] XGMI SDMA engines number: 0 SDMA queues per engine: 2
[ OK ] KFDQMTest.AllXgmiSdmaQueues (6 ms)
[ RUN ] KFDQMTest.AllQueues
/home/foreman/build/hsakmt-roct-amdgpu-1.0.9.40500/sources/libhsakmt/tests/kfdtest/src/KFDQMTest.cpp:381: Failure
Value of: (cpQueues[i].Destroy())
Actual: 1
Expected: HSAKMT_STATUS_SUCCESS
Which is: 0
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Change-Id: I5b1b5b616a5fd7ff198360c893a7aeed685022bd
After calling ioctl to create userptr obj, take aperture lock, check if
there is same userptr obj created after finding object, to catch the
race that multiple threads register same userptr to multiple GPUs.
If same userptr obj exist, then increase userptr registeration_count,
and free the newly create obj.
Change-Id: I63ae3a4f54da8aedd11c124d8d53ebe727b8203a
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
g_props is intialized in topology_take_snapshot which needs to call
hsaKmtAcquireSystemProperties. In hsaKmtOpenKFD, it doesn't call hsaKmtAcquireSystemProperties.
So it needs to change parameter of topology_is_svm_needed from node_id
to EngineId to avoid Segmentation fault on gfx902.
Signed-off-by: changzhu <Changfeng.Zhu@amd.com>
Change-Id: Iba8d20a9142510c70927454d26bdcaf579ad5574
Use the database provided by libdrm-amdgpu,as this is maintained by AMD
Signed-off-by: Mike Li <Tianxinmike.Li@amd.com>
Change-Id: I319a46d833cb23173a77fdff1deae69ea58588b8
If lookup is successful, device info from find_hsa_gfxip_device() is
used to populate props->EngineId.ui32 gfx version fields and node CAL
name (props->AMDName).
If unsuccessful, uses gfx_target_version sysfs field if present
(otherwise results in error), and sets the node CAL name to
"GFX<GFX_VERSION>" (e.g. 9.0.8 -> GFX090008).
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Id39c521b769aebbe40c934f19e03150a66884cce
Some g_system and Node ID checks already performed by validate_nodeid().
Fixes bug where hsaKmtGetNodeProperties() would return on
validate_nodeid() error without releasing lock. Switches some
conditionals to return INVALID_NODE_UNIT instead of INVALID_PARAMETER if
NodeId >= NumNodes. Removes some g_system assertions to return error
code rather than trigger a hard fault.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Id1604b20c2cef8808b98cdad61bd47aa7ea3d229
All conditional logic previously dependent on asic_family now dependent
on gfx version. As such, HSA_FORCE_ASIC_TYPE is now deprecated and
HSA_OVERRIDE_GFX_VERSION should be used instead.
device_info structs containing asic_family, eop_buffer_size, and
doorbell_size removed (alongside dev_lookup_table[]). eop_buffer_size
and doorbell_size now determined via macro, with eop_buffer_size and
full gfx version field being added to queue struct.
Certain funcs previously dependent on DID or GPU ID now dependent on
Node ID (due to use of get_gfxv_by_node_id()).
Fixes sleeper bug where ctl_stack_size and debug_memory_size were being
incorrectly programmed on gfx90a.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Iaabdc5c73920ad83b4b379cd6086992357b8ff10
get_block_properties() switch statement now based on major GFX version
instead of asic_family. Addition for NPI bringup only needed on major
GFX revisions unless differing perf_counter_block assignment necessary.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: Ia02676d2a3a8f968b67feeafbab534ef4c01c0dd
Add defines for converting between full gfx version (in decimal) to
major/minor/stepping values, and major/minor/stepping values to full gfx
version (in hex). The first is to be used when grabbing the gfx version
from sysfs, the second when returning the gfx version in
get_gfxv_by_node_id() for comparisions in Thunk.
Add enum for defines of full gfx versions for various cards. These are
to be used in conditionals to replace CHIP_*/asic_family comparisons. On
NPI bringup, addition to this enum will not be required for
functionality unless explicitly needed for a conditional. Most ASICs in
this enum simply for completeness.
Add get_gfxv_by_node_id(): returns full gfx version in hex.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I9d1dba7fdd4a587a000eea2f4141a1d301bf4776
Expand check control stack size for Beige Goby, Van Gogh and Yellow Carp.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Change-Id: I6c8345bbb751eb9179fc740fe1674504b4295a5f
Put CHIP_gfx90a next to CHIP_gfx908 to keep all the GFXv9
chips together.
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Change-Id: I32b4b7f7746ae4bdaff7e178f13307c1bcfa93e1
hsaKmtQueryPointerInfo return vm_obj flags for all below registered
memory types other than hsaKmtAllocMemory, and set the CoarseGrain flag
correctly for:
Graphics: always coarse grain.
Shared: hsaKmtShareMemory pass mflags with export handle to KFD to store
in KFD objs, hsaKmtRegisterSharedHandle get mflags from KFD with import
handle.
Userptr: it is already coarse-grain by default, or based on mflags
provided in hsaKmtRegisterMemoryWithFlags call.
Change-Id: Idc23e8b0cf599b02580737639da2f9ef4ccd0c0d
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Query pointer info returns KFD_IOC_ALLOC_MEM_FLAGS_* flags, it should
return HsaMemFlags, fix it by renaming vm_obj->flags to mflags and
always saving HsaMemFlags.
Use consistent function parameter and variable name to avoid confusion:
mflags for HsaMemFlags and ioc_flags for KFD_IOC_ALLOC_MEM_FLAGS_*
flags.
AMDKFD_IOC_GET_DMABUF_INFO return ioc_flags, translate it to mflags
using new helper fmm_translate_ioc_to_hsa_flags.
Change-Id: If9e117c507139c0166abb1ab0df8c233ef7e48a1
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Add hsaKmtRuntimeEnable and disable.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Change-Id: I083f9293948e975546a1b3c1334cb41499b9ab1f