Add two new agent info fields:
HSA_AMD_AGENT_INFO_UCODE_VERSION
HSA_AMD_AGENT_INFO_SDMA_UCODE_VERSION
Change-Id: I51cb853724b23a26e945e5c1ac32c16d0cb3bc31
Modified the if-condition checks in GElfImage::pullElf() in amd_elf_image.cpp to
check section types instead of doing a string comparison.
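As an illustration of the idea only, here is a minimal sketch of classifying a section by its type rather than by its name. The SectionInfo struct, the SectionKind enum and Classify() are hypothetical names, not the actual code in amd_elf_image.cpp; only the numeric section type values are taken from the ELF specification.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Standard ELF section type values (from the ELF specification).
constexpr uint32_t kShtSymtab = 2;  // SHT_SYMTAB
constexpr uint32_t kShtStrtab = 3;  // SHT_STRTAB
constexpr uint32_t kShtNote = 7;    // SHT_NOTE

// Hypothetical, simplified stand-in for a parsed section header.
struct SectionInfo {
  std::string name;  // e.g. ".symtab"
  uint32_t type;     // sh_type from the section header
};

enum class SectionKind { Symtab, Strtab, Note, Other };

// Classify by sh_type instead of comparing the section name, so a
// section is still recognized when it carries a non-standard name.
SectionKind Classify(const SectionInfo& s) {
  switch (s.type) {
    case kShtSymtab: return SectionKind::Symtab;
    case kShtStrtab: return SectionKind::Strtab;
    case kShtNote:   return SectionKind::Note;
    default:         return SectionKind::Other;
  }
}
```

With this shape, a note section named something other than ".note" is still classified as SectionKind::Note.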
Change-Id: I1ab92f0a9118fb2382652a1cc900a3150cbee2da
Thunk keeps an internal cache of system topology that can be used to
speed up subsequent calls to hsaKmtAcquireSystemProperties(). This cache
is cleared by calling hsaKmtReleaseSystemProperties() at the beginning
of BuildTopology().
hsaKmtRuntimeEnable() also calls hsaKmtAcquireSystemProperties() inside
Thunk. Move call to hsaKmtRuntimeEnable() after BuildTopology() so that
we can re-use Thunk's internal cache.
Parsing of topology can take ~150 ms on systems with a large number of
nodes.
Change-Id: I741709d49d67d244f5fbd707fe8f01ab923bb153
Simplified the callback method. Also fixed the way loaded shared objects
were appended to a string vector, which was not being passed to this
callback method.
Change-Id: I68661dd73f61a11c42fa92f670e8e7b6ffcb5711
The file reorganization feature was implemented with backward compatibility.
The backward compatibility support will be deprecated in a future release.
Changed the #pragma message to #warning for a smooth transition.
Change-Id: Ibaedc1873bc764d25f74d9ca9416077d084e332d
Previous versions of HIP will call hsa_amd_ipc_memory_create with the
len aligned to granularity. Temporarily allow this so that we do not
break backward compatibility. Will remove this after 2 releases.
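The temporary allowance can be sketched roughly as follows. This is a hedged illustration, not the runtime's actual check: AlignUp() and IsAcceptableIpcLen() are hypothetical helpers, and the granularity is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>

// Round `value` up to the next multiple of `granularity`.
// Assumes `granularity` is a non-zero power of two.
constexpr std::size_t AlignUp(std::size_t value, std::size_t granularity) {
  return (value + granularity - 1) & ~(granularity - 1);
}

// Hypothetical check: accept the exact user-requested size and, for
// backward compatibility, that size rounded up to the granularity.
bool IsAcceptableIpcLen(std::size_t len, std::size_t user_size,
                        std::size_t granularity) {
  assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
  return len == user_size || len == AlignUp(user_size, granularity);
}
```

For a 5000-byte allocation with a 4096-byte granularity, this accepts a len of 5000 or 8192 and rejects anything else.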
Change-Id: I6b5ac2cad5d32d62c803637cf1a2c6deebc03169
MES devices need GART mappings and therefore need non-paged memory. But
using non-paged memory introduces a performance regression where it can
take over 80 ms to see the signal changes if the memory is in the wrong
NUMA node. Currently, we cannot control NUMA affinity when allocating
non-paged memory. Use non-paged memory allocation only on devices that
have the MES scheduler.
Change-Id: Ib27fb01d75247aa4f2bb2aa4503c6af5a98afda0
Using the previous method of std::thread for the SVM profiler task was
causing segfaults on thread launch on RHEL 8 if the libhsa-runtime library
is loaded using dlopen.
Change-Id: Ic010cd6ae9bc6e6ed0605de02b93f6aae8ed3e97
Transient exec usage is not required for GFX11 and will result in a NULL
return of s_sendmsg_rtn if directly returned to exec_lo.
Directly fetch and mask the doorbell ID to ttmp3 for GFX11 instead.
Change-Id: Ie17ed69d68d84ab18869b1c7871a0ed0482cd661
If hsa_amd_agents_allow_access is called for an imported IPC handle,
ignore the request, as this pointer will already have been mapped to
the other GPUs during IPCAttach().
Change-Id: I4bf33ed57e93b5a3ead749d4f87ab6f2750bed58
If a user queries the pointer info on an invalid pointer,
hsaKmtQueryPointerInfo will return error or unknown pointer. The other
fields in HsaPointerInfo are invalid, so we do not return them to the
user.
Also removing the assert and returning unknown pointer instead, as the
assert will not trigger in release builds.
hsaKmtQueryPointerInfo may also return unknown pointer for userptrs as
they are not always tracked by thunk. Adjusting code to still treat
these pointers as valid in this case.
Change-Id: Idf5cd8b61cd532d31b072f449839d223369bb138
Since all public interface libraries are present in the
same folder, RUNPATH/RPATH is not required in the library itself.
Application shall provide the required RPATH/RUNPATH to load all
libraries.
Change-Id: I1d1ba920bf291eb89bd1f4c0fd0cfd80c7d739bd
The amount of memory requested by the user may be aligned up internally
to the memory pool granularity. The extra padded memory should not be
considered when validating pointers from the user. Also return the
user-requested size when the user queries pointer information.
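A minimal sketch of validating a user range against the requested size rather than the padded size is shown below. The Allocation struct and IsValidUserRange() are hypothetical names used only for illustration; the runtime's real bookkeeping is not shown in this message.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record of one allocation.
struct Allocation {
  std::uintptr_t base;         // start address returned to the user
  std::size_t requested_size;  // size the user asked for
  std::size_t padded_size;     // size after aligning up to the granularity
};

// A range is valid only if it lies inside [base, base + requested_size).
// The padding between requested_size and padded_size is deliberately
// not treated as user-accessible memory.
bool IsValidUserRange(const Allocation& a, std::uintptr_t ptr,
                      std::size_t len) {
  if (ptr < a.base) return false;
  const std::size_t offset = ptr - a.base;
  if (offset > a.requested_size) return false;
  return len <= a.requested_size - offset;
}
```

The offset is checked before the subtraction so the comparison cannot wrap around for pointers past the end of the allocation.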
Change-Id: I28b25448ea03c836b44fafdb34b7330cf6887424
What we want for libdrm-amdgpu is for it to be a recommended package.
Either libdrm or libdrm-amdgpu can be used, but we recommend the latter.
Using "SUGGESTS" does not seem like a strong enough requirement, but
CPACK does not support RPM recommends. It does, however, allow
customizing the RPM SPEC file template. A template can be generated
by setting:
-DCPACK_RPM_GENERATE_USER_BINARY_SPECFILE_TEMPLATE=1
This template file can be trivially modified to allow adding a line to
implement CPACK_RPM_PACKAGE_RECOMMENDS.
Fixes
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
Change-Id: I34467b1ba878827ced9b8db74977967815732552
Add agent info query HSA_AMD_AGENT_INFO_ASIC_FAMILY_ID.
This lets us remove the code that parses the family ID.
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Change-Id: I3ac4746d3015e89b32322ebc0f8a3084f98677a4
This reverts commit c904cc5856.
The change from using RUNPATH to RPATH was not approved formally.
Reverting this patch until this gets approved.
Change-Id: Ibc1a8f9d5dfa6694adacccfd9e3b0d053660e848
The allocation logic of the SPI does not take into account compute
user thread management settings for masking CUs with the exception of
skipping fully disabled SEs. This means that occupancy-limited
dispatches such as cooperative launch may over-allocate onto hardware
resources that are not immediately available, resulting in a potential
barrier logic hang as occupying work groups are waiting on enqueued
work groups to reach the barrier.
Further work will have to be done to get the per-SA CU enablement count
from the KFD in order to correctly clip the cooperative CU limit based
on the CU mask, which will require breaking the current ABI.
For now, report that cooperative launch is not supported while a CU
mask has been applied to prevent potential shader hangs.
Change-Id: I8be4bb47d65ceb62d805f36ef6ef3996d756021f
New environment variable HSA_OVERRIDE_CPU_AFFINITY_DEBUG to
enable/disable overriding CPU affinity.
Default value is enabled (1).
This is a temporary variable and may be removed in the future.
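One way such a toggle could be read is sketched below. The parsing rule (unset or empty means the default, "0" means disabled, anything else means enabled) is an assumption made for illustration, and ParseBoolFlag() and OverrideCpuAffinityEnabled() are hypothetical names; only the variable name and its default of 1 come from this message.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: interpret an environment value as a boolean.
// Unset or empty falls back to the default; "0" disables; any other
// value enables. This rule is an assumption, not the runtime's parser.
bool ParseBoolFlag(const char* value, bool default_value) {
  if (value == nullptr || *value == '\0') return default_value;
  return std::strcmp(value, "0") != 0;
}

// Reads HSA_OVERRIDE_CPU_AFFINITY_DEBUG with a default of enabled (1).
bool OverrideCpuAffinityEnabled() {
  return ParseBoolFlag(std::getenv("HSA_OVERRIDE_CPU_AFFINITY_DEBUG"), true);
}
```

Under these assumptions, exporting HSA_OVERRIDE_CPU_AFFINITY_DEBUG=0 before launching the application would turn the override off.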
Change-Id: Id6a7c611730471ddc276ca333fde1e57046bf32a
For gfx11, the image type table has some different values compared to
previous ASIC families (e.g. TYPE_SRGB). Creating a new LUT class to
use these new values.
Change-Id: Ifdfc6cd29bfd5f4ec2643c848fcb9986eb874f9e
This library was taken from public MESA library:
https://gitlab.freedesktop.org/mesa/mesa/-/tree/main/src/amd/addrlib
with top commit:
2866ae32da0348caf71ad2d11c353321df626ff4
Removing macros.h as it is no longer used by addrlib.
Change-Id: I0fdabfe48b74c259b4d29d81beae89604bbc141a
Non-paged allocation for queue memory is necessary for binding wptr to
GART. It is required to support usermode queue oversubscription with MES
for GFX11.
Adds AllocateNonPaged entry to MemoryRegion::AllocateEnum for clarity;
aliases AllocateIPC.
Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I1a97a1820da26cf2433d9c237b2e6d2b0b8628b4
Adding new ImageManager class for GFX11 GPUs
ImageManagerGfx11 functions copied from ImageManagerNv.
Register descriptions in resource_gfx11.h updated for gfx11.
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Change-Id: I48b39f6a633aef14aa829f7240a43fe0feb1c290
GPUs excluded by RVD are not expected to have scratch, memory, trap
handling nor memory regions set up. Now that these GPUs are added to
a new list, early return on agent destruction to prevent bad function
calls on destroy.
Also fix up broken memory releases between the gpu lists and ugly braces.
Change-Id: I52fc6e86ceba0a0383cedc63310eb409515eaf9f
In its current state, hsa-rocr does NOT require the thunk lib as a
dependency, so pulling in the thunk package while installing rocr is
unnecessary. This patch removes that dependency.
Change-Id: Id98ede8b66ffd9aaf4a47da96ba2f981f4c3da73
Maintainer distribution list field had wrong information.
Adding the newly formed DL by the component team.
Change-Id: I61651e429375cdc512d0fe4b0768f917506b5392
A work group processor (WGP) requires both of its CUs to be enabled
in order to be enabled.
The KFD will round-robin distribute by even-indexed pairs, so
enforce this requirement for runtime set mask calls.
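The pairing requirement can be illustrated with a small check that each even-indexed pair of CU bits is either fully set or fully clear. IsWgpAlignedMask() is a hypothetical helper, and whether the runtime rejects or adjusts a non-conforming mask is not stated here; the sketch only detects the condition.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true when, in every 32-bit mask word, bits (2k, 2k+1) are
// either both set or both clear, i.e. no WGP is half enabled.
bool IsWgpAlignedMask(const std::vector<uint32_t>& mask) {
  for (uint32_t word : mask) {
    const uint32_t even = word & 0x55555555u;        // bits 0, 2, 4, ...
    const uint32_t odd = (word >> 1) & 0x55555555u;  // bits 1, 3, 5, ... shifted down
    if (even != odd) return false;
  }
  return true;
}
```

Because 32 is even, a CU pair never straddles two mask words, so each word can be checked independently.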
Change-Id: Ic46661b01f398aa1fe24d96b5c9c31f122f967a3
Discovered agent handles should only apply to copy routing, not to
copy device selection. The user may not have mapped all allocations
to all GPUs so we must ensure that the copying device is one passed
by the user.
Change-Id: I2532e66d30e6842624e594f235dd144a186220d4