Add compile time asserts to force incrementing API table STEP versions
each time a new function is added to each table. This is required for
profiler team to be able to add preprocessor macros to determine which
versions contain the new APIs.
Also incrementing the major versions to 2 to indicate new numbering
scheme.
Change-Id: I148a436a5ceab6be3906f8263b40ea9b07841577
Some GFX9 devices will drop commands if ring buffer submission is less
than 64 DWORDs. Pad submission with a NOP head an trailing null
DWORDs in this case.
Change-Id: I850af490fb699f7efe8aef96d97c600a8e76516b
Also changed enum value to leave gap between enums that only exist in
hsa_region_info_t and enums that exist in both hsa_amd_memory_pool_info_t
Change-Id: I8f9f31200de66648e9328e4203ab283068c993f0
We don't need to keep track of specific blit engines in gang for
submission anymore as ganging early exits on pending bytes.
So tidy up the fluff.
Change-Id: I77e80bf1ad8f561a03fff77bce33aa09d02760c6
When oversubscribing SDMA gangs, a circular deadlock can occur since
gang enqueue is staggered with respect to SDMA engine leader based
on source to destination.
As a result, an enqueued leader may be waiting on a gang item that is
waiting on another enqueued leader or gang item and so on.
To prevent this, first lock the submission to ensure dma status query
and submissions are atomic. Once this is in place, be more stringent
with ganging in that all SDMA engines must be available in order to gang.
Finally, re-enable SDMA ganging by default.
Change-Id: I4511e3487db9d26475b5aece4897f10168cc5322
xGMI for compute partitioning in non-SPX modes does not have
a reported bandwith.
Fix it to at most 2 since each partition is either bounded
by the number of xGMI links or the number of available
SDMA contexts.
Change-Id: I09094bd7548d9eee6f039b0efe849838e5de166e
SDMA ganging is causing some regressions with some applications hanging.
Temporarily disabling SDMA ganging by default until issue is fixed.
Change-Id: I65e172923a53a967df27b30d969ad5d215c4fa09
Use all available SDMA engines capped by xGMI bandwith for
all D2D copies within a hive.
By default, set the latency boundary copy size as 4KB and below.
Any copy size in within this boundary will not gang.
Avoid oversubscribing engines by not ganging on engines with
pending non-ganged work.
An enviroment variable HSA_ENABLE_SDMA_GANG has been provided
to override default ganging behaviour.
Change-Id: Iccde76aa1af1d47ea2a151789432c9db4f0ffa8d
Reverting this as current mainline compiler branch does not support
gfx1150/gfx1151 yet. Will bring back later.
This reverts commit e877840197.
Change-Id: I31ff4fb2d5817538094a7ffaeba96dd6a7d660c7
Add agent info query to return nearest CPU agent. This can be used to
determine which CPU agent is in the same NUMA region as the GPU agent.
Change-Id: I5400b4347ffbf4d2a836df31c4de443a38b0ecd1
Aqlpacket:IsValid() function: Replaced bitwise AND operator (&) with the logical
AND operator (&&) when evaluating AQL packet type
Change-Id: I59980bc206cc7eff424023fff0bb92b618aa8c70
Support function to retain allocation handle for memory mappings.
The get allocation properties function will return the current
allocation properties for existing memory mappings.
This is part of patch series for Virtual Memory API.
Change-Id: I0a53a11b6efc2b5bf9d463512a489a2abd812551
Support exporting and importing dmabuf file descriptors for memory
mappings. The exported dmabuf file descriptors are shareable posix
file descriptors that can be used for cross-vendor, cross-device
and cross-process memory sharing.
This is part of patch series for Virtual Memory API.
Change-Id: I3673fc009f7e73bc26be8349e19f66e20d0607c5
Mapping memory handles to virtual memory addresses do not make them
accessible. The set access function is needed to make the memory
mappings accessible to specific agents. The get access function
returns current access properties for individual agents.
This is part of patch series for Virtual Memory API.
Change-Id: I152ba0557fd2a802eb9d840568b68cdd1911b72c
Add support for mapping and unmapping memory handles to virtual
address ranges.
This is part of patch series for Virtual Memory API.
Change-Id: If512d49ff4211e68f2064249add607a3200e458a
Add support for creating and releasing memory handles. Memory
handles are memory allocations on device memory without a virtual
address.
This is part of patch series for Virtual Memory API.
Change-Id: I5dfb162eb1661621cce171b2870a3c93b24d840e
Add support for reserving virtual address ranges. Virtual address
ranges are addresses without any memory backing. These address ranges
need to be mapped to memory handles later.
This is part of patch series for Virtual Memory API.
Change-Id: I5d066e7421d6896f933f524312afc230a13d594e
Change initialize libdrm device and file descriptor initialization
to use new APIs from Thunk. Libdrm recommends that we re-use the same
file descriptor thoughout the life of a process instead of re-creating
new one each time.
This is part of patch series for Virtual Memory API.
Change-Id: I1c0b8d1bd660cd25478b5f94c84071b90d93fc6c
Checks whether version of libdrm library installed on current
system supports the amdgpu_device_get_fd API. This API is
required to support the virtual memory API functions. The
amdgpu_device_get_fd function was introduced in libdrm-2.4.109.
Using a runtime check test instead of static dependency to be
able to support previous APIs on older versions of libdrm.
Add query for virtual memory API support.
This is part of patch series for Virtual Memory API.
Change-Id: Iec831eb24b5d1689c392e50ae86f4d52d4870ac4
Add new query for recommended granularity size. This is the
internal blocksize used. While the existing query for granularity
size returns the minimum size possible, it is recommended that
allocations and mappings are multiple of the recommended granularity
size to minimise internal memory fragmentation.
This is part of patch series for Virtual Memory API.
Change-Id: Ia82c8f073b2a2c47ecd26fbb0aba27b8b7cd965f
The access type for extended scope fine grained memory was being returned as never
allowed by default
Change-Id: I0167ea0e5931053f22f2d2755bf426d43d2bb8e5
On gfx11, with a sequence such as
s_trap 2
s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
s_endpgm
the s_sendmsg does deallocate registers while the wave is supposed to be
stopped. As a result, the wave cannot do the expected context save
operations, and cannot context save.
To avoid this problem, park the wave in the trap handler for gfx11.
Note that gfx11 has implemented an instruction cache prefetch. When
parked, the prefetch tries to access memory past the end of trap handler
which causes memory violation exceptions to be reported. To avoid this,
we need to add padding at the end of the trap handler. The padding
consists of `s_code_end` instructions Given that the trap handler is
loaded at a 0x1000 aligned address the maximum prefetch amount (in
bytes) is given by `256 - (trap_handler_size % 64)`.
Change-Id: I5446da54a965a64f21cb0fd3ce3caa4b6137a933
Thread yield doesn't drop the scoped acquired mutex so drop it around
yield to prevent a multithread deadlock.
Change-Id: Ie21f3bff89f6f9e4c57e5b3ccf17968f253fa23a
Fix a condition where we can get a divide-by-zero in the
TranslateTime(tick) function if the GPU tick predates HSA
startup and we did not do a SyncClocks since initialization.
Change-Id: I0dcec8553ccb8f01211928991f4b3ed3cb4a1ebb
In ASAN builds, the compiler used is clang. The initialization of
variable sized array using assignment operator is causing compilation
failure in ASAN builds. Used memset to fix the same.
Change-Id: Ifc748291a41a9886243e0fb1ba576d2760f5e15e
I've just reverted some code what it was in 5.5 by wrapping new x86
specific bits with #if's, e.g.:
- CPUID is x86 specific
- mwait is x86 specific
Change-Id: I6cefae34282c777c7340daf3f934d2a11742502e
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
This fixes a segfault error in cases where the linking order of
compilation unit varies. Reason behind the segfault is that one
global variable in one compilation unit depends on another global
variable in another compilation unit, but there is no guarantee that
this other compilation unit is initialized first. The fix forces a
reinitialization at the first invocation of the library.
Change-Id: I1428592c6898bca13a330c4588941de260ff0370
Unless SDMA blits have actually been used for copies, prevent the DMA
copy status from querying the blit's pending byte status to avoid
creating an unnecessary HW queue.
Change-Id: Ied1fbed73c08f0408f0e3583f9b56f2768c71708
Querying pending bytes on a blit kernel is unnecessary when runtime
runs out of SDMA resource since we are returning an SDMA availabilty
mask.
Change-Id: I347efba0c85b70ea3ba8749d76a499afc23909e8
Added HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_EXT_SCOPE_FINE_GRAINED flag to enable extended scope memory region
where the device-scope atomics act as system-scope atomics
Change-Id: I79fc3207cb630dfc68bed2f8aabd75f35fe80b12
Enable sleep for all waiters with event age tracking support kernel.
Change-Id: Icd4e1e8d83b4a54e9f6aaa99691a6573211b3337
Signed-off-by: James Zhu <James.Zhu@amd.com>
KFD kernel version 1.13 starts to support event age
tracking which help elimating unncessary busy wait.
Change-Id: Ib447ed6e0350f3110a4d6b9b80a0388000dd0e72
Signed-off-by: James Zhu <James.Zhu@amd.com>
Compiler behavior is undefined if the right operand is negative,
or greater than or equal to the width of the promoted left operand.
For release builds with address sanitizer enabled, this compiler
optimization behavior leads to unsupported queue size value since
current method shifts till 128 bits on a 64 bit value.
Change-Id: Iddcc15b43d2331bc8bf5fc3aa4725f76844655ec
Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com>
Earlier, hsa-runtime was unable to find symbols from a stripped ELF-image becasue
no support to find symbols from ".dynsym" section.
Looking for symbols in .dynsym is enabled by LOADER_USE_DYNSYM=1
environment variable
Change-Id: I4f0e8dd0eb053a6066d4d49b670c52e51149531a