Commit Graph

1052 Commits

Author SHA1 Message Date
David Yat Sin 08fc87ecba Use scope guards to release ref counts
Some negative tests can trigger C++ exceptions to be thrown, which
causes code to leave the ref counts in inconsistent state.

Change-Id: Ifa6d8be986941efcdf20d7ac8b86eb15a8fe9932


[ROCm/ROCR-Runtime commit: 06eefdeb1b]
2023-09-20 15:08:52 -04:00
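
The scope-guard idea in this commit can be sketched as follows. This is a minimal illustration, not the runtime's code: `RefCounted`, `ScopeGuard`, `UseObject`, and `RefsAfterCall` are hypothetical names. The guard's destructor runs during stack unwinding, so the reference is released even when an exception is thrown.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical ref-counted object; not the runtime's real type.
struct RefCounted {
  int refs = 0;
  void Retain() { ++refs; }
  void Release() { assert(refs > 0); --refs; }
};

// Minimal scope guard: runs the stored callable when it goes out of scope,
// including while the stack unwinds because of an exception.
class ScopeGuard {
 public:
  explicit ScopeGuard(std::function<void()> f) : f_(std::move(f)) {}
  ~ScopeGuard() { f_(); }
  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;

 private:
  std::function<void()> f_;
};

// Takes a reference, then optionally throws. The guard releases it either way.
inline void UseObject(RefCounted& obj, bool fail) {
  obj.Retain();
  ScopeGuard guard([&obj] { obj.Release(); });
  if (fail) throw std::runtime_error("simulated failure");
}

// Returns the ref count left behind after a call that may throw.
inline int RefsAfterCall(bool fail) {
  RefCounted obj;
  try {
    UseObject(obj, fail);
  } catch (const std::runtime_error&) {
    // Swallowed on purpose: only the resulting ref count matters here.
  }
  return obj.refs;
}
```

Without the guard, the throwing path would skip `Release()` and leave the count at 1.
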
David Yat Sin b060204498 Fix hsa_amd_vmem_get_access to accept offset pointers
Modify hsa_amd_vmem_get_access to handle pointers that are within VA
range of an existing memory mapping

Change-Id: I9f806ec39f6e9a33da8d86dd65d9a472438fa8ed


[ROCm/ROCR-Runtime commit: dd61f54171]
2023-09-20 14:03:37 -04:00
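
One way to resolve a pointer that falls inside an existing mapping, rather than exactly at its base, is an ordered lookup keyed by base address. The sketch below assumes a hypothetical `MappingTable` (base address to size) and a hypothetical `FindMappingBase` helper; the commit does not say how the runtime stores its mappings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical bookkeeping: base address -> size in bytes of each mapping.
using MappingTable = std::map<std::uintptr_t, std::size_t>;

// Returns the base of the mapping that contains addr, or 0 if there is none.
inline std::uintptr_t FindMappingBase(const MappingTable& table,
                                      std::uintptr_t addr) {
  auto it = table.upper_bound(addr);  // first mapping that starts after addr
  if (it == table.begin()) return 0;  // addr lies below every mapping
  --it;                               // last mapping starting at or before addr
  assert(it->first <= addr);
  // addr is inside the mapping only if its offset is smaller than the size.
  return (addr - it->first < it->second) ? it->first : 0;
}
```
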
David Yat Sin 48cb2f5a9e Add query for Xnack enabled
Add system query for whether Xnack is enabled on a system.

Change-Id: I2832110e4f33f6a951d13acd06636442debf27ae


[ROCm/ROCR-Runtime commit: 22becfb1e8]
2023-09-19 00:25:30 +00:00
Jonathan Kim d04acccc26 Set correct overrides settings for GangLeader functions
Silence warnings on more stringent compile checks for lack of override
declaration.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: Iaa54dfc3dd74f5ee55763cafbbcf2db73493bb21


[ROCm/ROCR-Runtime commit: 6b4365ae4c]
2023-09-12 15:56:34 -04:00
David Yat Sin 2052be1d1d Pre-allocate memory for 16K signals
On busy systems, the memory allocation can take a long time and
increase the duration of calls to
hsa_signal_create/hsa_amd_signal_create. Pre-allocating the memory
mitigates this issue.

Change-Id: Ib7640273262ebc3dbf1f07049ce5da10b1d6b158


[ROCm/ROCR-Runtime commit: 9a127193a8]
2023-09-11 13:08:28 -04:00
David Yat Sin 2a2555dd52 Update blit shaders for gfx94x
Change-Id: Ic8def71aa0c6ab9a9a758877a65ca6b5625e8f1e


[ROCm/ROCR-Runtime commit: 6ce1586def]
2023-09-08 09:43:31 -04:00
Shweta Khatri e2c5ecb8dc Use LLVM compiler to build blit shaders
Generates shader bytecode stream in amd_blit_shaders_v2.h at build time

Change-Id: I5228ec5442a78d074fd85ca9cd7f7a156dd84da3


[ROCm/ROCR-Runtime commit: 4e675ce730]
2023-09-08 09:42:29 -04:00
David Yat Sin 590cac0321 Fix clang compile warnings
Change-Id: Iea9afc3d998a6c5db28af6c7b54939960b11ae95


[ROCm/ROCR-Runtime commit: 3ee6c9b0e2]
2023-09-07 12:00:02 -04:00
David Yat Sin 3e286607ca Fix for always returning 64 for cacheline size
Change-Id: I0e31d306a2e051ecb9ac019c4e6f5efa25eabba0


[ROCm/ROCR-Runtime commit: 4770b210f6]
2023-08-31 13:50:49 +00:00
David Yat Sin 5b9dcfd0d8 Update interface version for virtual memory APIs
Change-Id: Ifbf1af08ee7aa4d55387ff9786f6a61b89b56f88


[ROCm/ROCR-Runtime commit: 1e7b078628]
2023-08-30 17:01:13 -04:00
David Yat Sin 4e46eded66 Increment HSA API table stepping on new APIs
Add compile time asserts to force incrementing API table STEP versions
each time a new function is added to each table. This is required for
profiler team to be able to add preprocessor macros to determine which
versions contain the new APIs.

Also incrementing the major versions to 2 to indicate new numbering
scheme.

Change-Id: I148a436a5ceab6be3906f8263b40ea9b07841577


[ROCm/ROCR-Runtime commit: 03f2f69d16]
2023-08-29 21:59:36 +00:00
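
A compile-time check of this kind might look like the sketch below. All names here (`DemoApiTable`, `DEMO_API_TABLE_STEP_VERSION`, `ExpectedTableSize`) are hypothetical and are not the identifiers used in the HSA headers. The idea is that adding a function pointer changes `sizeof` the table, so the build fails until the step version and the recorded size are updated together.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical API table; a real table holds many more entries.
struct DemoApiTable {
  void (*fn_a)();
  void (*fn_b)();
};

// Hypothetical step version, bumped each time a function is added.
#define DEMO_API_TABLE_STEP_VERSION 1

// Size the table is expected to have at a given step version.
// Unknown versions return 0, so the static_assert below fails
// until a new size is recorded here.
constexpr std::size_t ExpectedTableSize(int step) {
  return step == 1 ? 2 * sizeof(void (*)()) : 0;
}

static_assert(
    sizeof(DemoApiTable) == ExpectedTableSize(DEMO_API_TABLE_STEP_VERSION),
    "DemoApiTable changed: bump DEMO_API_TABLE_STEP_VERSION and record the "
    "new size");
```
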
Jonathan Kim 9e533f6664 Submit a minimum of 64 DWORDs for SDMA submissions for some GFX9 devices
Some GFX9 devices will drop commands if ring buffer submission is less
than 64 DWORDs.  Pad submission with a NOP header and trailing null
DWORDs in this case.

Change-Id: I850af490fb699f7efe8aef96d97c600a8e76516b


[ROCm/ROCR-Runtime commit: cdd0728d9b]
2023-08-23 13:36:29 -04:00
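
The padding rule can be sketched as below. This is an illustration under assumptions: `PadSubmission` is a hypothetical helper, and the NOP header value is passed in by the caller because the commit message does not give the hardware encoding. Only the 64-DWORD minimum comes from the commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimum submission length in DWORDs, taken from the commit message.
constexpr std::size_t kMinSubmissionDwords = 64;

// Pads a command stream up to the minimum length with one NOP header
// followed by zero DWORDs. Streams that are already long enough are
// returned unchanged. nop_header is supplied by the caller (assumption:
// the real encoding is device specific and not shown here).
inline std::vector<std::uint32_t> PadSubmission(std::vector<std::uint32_t> cmd,
                                                std::uint32_t nop_header) {
  if (cmd.size() >= kMinSubmissionDwords) return cmd;
  cmd.push_back(nop_header);             // NOP header starts the padding
  cmd.resize(kMinSubmissionDwords, 0u);  // trailing null DWORDs
  assert(cmd.size() == kMinSubmissionDwords);
  return cmd;
}
```
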
David Yat Sin 0637810752 Fix memory pool ALLOC_REC_GRANULE query
Also changed enum value to leave gap between enums that only exist in
hsa_region_info_t and enums that exist in both hsa_amd_memory_pool_info_t

Change-Id: I8f9f31200de66648e9328e4203ab283068c993f0


[ROCm/ROCR-Runtime commit: 4317f8dece]
2023-08-22 17:46:48 -04:00
David Yat Sin 777df5c6dc Fix flags passed to thunk for address reserve
Fix flags passed to thunk when reserving address only

Change-Id: Ic91d4c3393cc6a2b98e6bc5ed3575d40fa5e1424


[ROCm/ROCR-Runtime commit: 7be305b83c]
2023-08-22 14:01:49 -04:00
Jonathan Kim ad613e1644 Clean up SDMA ganging
We don't need to keep track of specific blit engines in gang for
submission anymore as ganging early exits on pending bytes.
So tidy up the fluff.

Change-Id: I77e80bf1ad8f561a03fff77bce33aa09d02760c6


[ROCm/ROCR-Runtime commit: 132815bcfb]
2023-08-22 05:57:04 -04:00
Jonathan Kim 704d9c5e19 Fix SDMA ganging circular deadlock in oversubscription
When oversubscribing SDMA gangs, a circular deadlock can occur since
gang enqueue is staggered with respect to SDMA engine leader based
on source to destination.
As a result, an enqueued leader may be waiting on a gang item that is
waiting on another enqueued leader or gang item and so on.

To prevent this, first lock the submission to ensure dma status query
and submissions are atomic.  Once this is in place, be more stringent
with ganging in that all SDMA engines must be available in order to gang.

Finally, re-enable SDMA ganging by default.

Change-Id: I4511e3487db9d26475b5aece4897f10168cc5322


[ROCm/ROCR-Runtime commit: 8f21793a3e]
2023-08-17 08:49:09 -04:00
Jonathan Kim 58d5f7354f Update D2D SDMA ganging for non-SPX modes
xGMI for compute partitioning in non-SPX modes does not have
a reported bandwidth.
Fix it to at most 2 since each partition is either bounded
by the number of xGMI links or the number of available
SDMA contexts.

Change-Id: I09094bd7548d9eee6f039b0efe849838e5de166e


[ROCm/ROCR-Runtime commit: 4c74e47e91]
2023-08-17 07:25:08 -04:00
Jonathan Kim 2994cfa875 Bump the number of SDMA engines for gfx940
GFX940 can support up to 16 SDMA engines so bump it.

Change-Id: I41a95e66383036735712e317a57b239d84fcb78d


[ROCm/ROCR-Runtime commit: 30982ff6aa]
2023-08-17 07:25:08 -04:00
Jonathan Kim 64e0037743 Break when finding ganged agent
There's no need to keep looking in the list once we find a ganged agent.

Change-Id: Ia0b9b484c88221a7966a814456942c19b1741978


[ROCm/ROCR-Runtime commit: f8664e88e0]
2023-08-17 07:25:08 -04:00
David Yat Sin 38bec00960 Temporarily disable SDMA ganging by default
SDMA ganging is causing some regressions with some applications hanging.
Temporarily disabling SDMA ganging by default until issue is fixed.

Change-Id: I65e172923a53a967df27b30d969ad5d215c4fa09


[ROCm/ROCR-Runtime commit: a20a0a5bac]
2023-08-15 23:17:34 +00:00
David Yat Sin 5564790017 Revert "Adding documentation for SDMA environment var"
This reverts commit 56ccf828bc.

Replaced by commit 3b3f14c06e8a2fab717f0b82aba3c72d74bb9574.

Environment variables documented in: docs/environment_variables.md

Change-Id: I8da0d971eb98554b4bd1b884617a439f1b20ed5b


[ROCm/ROCR-Runtime commit: 93401e3c8c]
2023-08-10 09:55:42 -04:00
Ranjith Ramakrishnan 670217a201 Disable file reorg backward compatibility support by default
Change-Id: Ib53a4d0476ec598025d4f1f98414e0e425bb0e49


[ROCm/ROCR-Runtime commit: bb4756d2e0]
2023-08-07 09:38:12 -07:00
David Yat Sin 47b40068f6 Fix compile error when using clang
Fix compile error due to arithmetic on void*
Fix some compile warnings

Change-Id: I03ded438c5af77ba61c0a7017be5d4fe1e16c16c


[ROCm/ROCR-Runtime commit: 93aff0b439]
2023-07-31 18:29:19 +00:00
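
Arithmetic on `void*` is not valid in standard C++ (GCC accepts it as an extension), which is the kind of error described above. A common fix, sketched here with a hypothetical `OffsetPointer` helper, is to cast to a byte pointer before adding the offset. The commit does not show which form the runtime adopted.

```cpp
#include <cassert>
#include <cstddef>

// Advances a void* by a byte count without doing arithmetic on void*.
inline void* OffsetPointer(void* base, std::size_t bytes) {
  assert(base != nullptr);
  return static_cast<char*>(base) + bytes;  // char* arithmetic is in bytes
}
```
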
Jonathan Kim ae3b48d227 Enable D2D SDMA Ganging over xGMI
Use all available SDMA engines capped by xGMI bandwidth for
all D2D copies within a hive.

By default, set the latency boundary copy size as 4KB and below.
Any copy size within this boundary will not gang.

Avoid oversubscribing engines by not ganging on engines with
pending non-ganged work.

An environment variable HSA_ENABLE_SDMA_GANG has been provided
to override default ganging behaviour.

Change-Id: Iccde76aa1af1d47ea2a151789432c9db4f0ffa8d


[ROCm/ROCR-Runtime commit: 7df0167821]
2023-07-27 08:58:26 -04:00
Jonathan Kim 8d0be0c17f Silence parenthesis warnings in mem API
Fix KFD version checking parenthesis warnings on compile.

Change-Id: I89c46ea84a8d75b761d8c40ff62d008c7afbef2d


[ROCm/ROCR-Runtime commit: c5dbb93e59]
2023-07-26 16:14:40 -04:00
David Yat Sin a8e34eaec8 Revert "Add support for GC 11.5.0 and 11.5.1"
Reverting this as current mainline compiler branch does not support
gfx1150/gfx1151 yet. Will bring back later.

This reverts commit 75ce1848cf.

Change-Id: I31ff4fb2d5817538094a7ffaeba96dd6a7d660c7


[ROCm/ROCR-Runtime commit: ebc51dd0eb]
2023-07-26 15:03:54 +00:00
David Yat Sin 2cfb82a1a6 Add agent query for nearest CPU agent
Add agent info query to return nearest CPU agent. This can be used to
determine which CPU agent is in the same NUMA region as the GPU agent.

Change-Id: I5400b4347ffbf4d2a836df31c4de443a38b0ecd1


[ROCm/ROCR-Runtime commit: 469defa78a]
2023-07-24 13:59:13 -04:00
Jonathan Kim 8e16b26347 Silence implicit conversion warnings in exception handling
Silence unnamed enum warning in error code comparison

Change-Id: I008b269c106bbad83a1f7588e7b4ec89ec17d37d


[ROCm/ROCR-Runtime commit: 0d14144e3a]
2023-07-24 10:06:55 -04:00
Jonathan Kim a29b3ab868 Fix out of order initializer for memory region
Silence out of order initializer compile warnings during memory region
initialization.

Change-Id: Idbbdd93d3ea8cda289d25a473b3882b920b2e8d8


[ROCm/ROCR-Runtime commit: 42274cfc59]
2023-07-24 09:58:37 -04:00
Lang Yu 75ce1848cf Add support for GC 11.5.0 and 11.5.1
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Change-Id: I3c4116e78a5c1ddac2389f5fece57485bdb17f68


[ROCm/ROCR-Runtime commit: e877840197]
2023-07-22 16:06:22 +08:00
Shweta Khatri 46643a8ec7 Correct evaluating condition to use logical AND
Aqlpacket:IsValid() function: Replaced bitwise AND operator (&) with the logical
AND operator (&&) when evaluating AQL packet type

Change-Id: I59980bc206cc7eff424023fff0bb92b618aa8c70


[ROCm/ROCR-Runtime commit: a2d0adf9be]
2023-07-21 15:36:48 -04:00
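
The difference between the two operators shows up whenever the operands are integers rather than booleans. The sketch below uses hypothetical helper names and is not the `IsValid()` code: with `2` and `1`, both operands are non-zero, yet they share no set bits, so the bitwise form reports false.

```cpp
#include <cassert>

// Bitwise AND: true only if the operands share at least one set bit.
inline bool BothNonZeroBitwise(int a, int b) { return (a & b) != 0; }

// Logical AND: true if both operands are non-zero, and b is not
// evaluated when a is zero (short-circuit).
inline bool BothNonZeroLogical(int a, int b) { return a && b; }
```
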
David Yat Sin bf41567189 Add retain handle and get allocation properties
Support function to retain allocation handle for memory mappings.
The get allocation properties function will return the current
allocation properties for existing memory mappings.

This is part of patch series for Virtual Memory API.

Change-Id: I0a53a11b6efc2b5bf9d463512a489a2abd812551


[ROCm/ROCR-Runtime commit: 687eb043d4]
2023-07-21 15:17:01 -04:00
David Yat Sin 0bcc573ed7 Support exporting and importing memory mappings
Support exporting and importing dmabuf file descriptors for memory
mappings. The exported dmabuf file descriptors are shareable POSIX
file descriptors that can be used for cross-vendor, cross-device
and cross-process memory sharing.

This is part of patch series for Virtual Memory API.

Change-Id: I3673fc009f7e73bc26be8349e19f66e20d0607c5


[ROCm/ROCR-Runtime commit: b03c96c264]
2023-07-21 15:17:01 -04:00
David Yat Sin 933aac4cda Support Get and Set access for memory mappings
Mapping memory handles to virtual memory addresses does not make them
accessible. The set access function is needed to make the memory
mappings accessible to specific agents. The get access function
returns current access properties for individual agents.

This is part of patch series for Virtual Memory API.

Change-Id: I152ba0557fd2a802eb9d840568b68cdd1911b72c


[ROCm/ROCR-Runtime commit: 13fbd8a232]
2023-07-21 15:17:01 -04:00
David Yat Sin 203934445a Support mapping and unmapping memory handles
Add support for mapping and unmapping memory handles to virtual
address ranges.

This is part of patch series for Virtual Memory API.

Change-Id: If512d49ff4211e68f2064249add607a3200e458a


[ROCm/ROCR-Runtime commit: 179dcf1c77]
2023-07-21 15:17:01 -04:00
David Yat Sin 15aa42edb5 Support memory handles
Add support for creating and releasing memory handles. Memory
handles are memory allocations on device memory without a virtual
address.

This is part of patch series for Virtual Memory API.

Change-Id: I5dfb162eb1661621cce171b2870a3c93b24d840e


[ROCm/ROCR-Runtime commit: e4a84c4a9c]
2023-07-21 15:17:01 -04:00
David Yat Sin b219d0224d Support Virtual Address reservations
Add support for reserving virtual address ranges. Virtual address
ranges are addresses without any memory backing. These address ranges
need to be mapped to memory handles later.

This is part of patch series for Virtual Memory API.

Change-Id: I5d066e7421d6896f933f524312afc230a13d594e


[ROCm/ROCR-Runtime commit: 1085311f1a]
2023-07-21 15:17:01 -04:00
David Yat Sin 667ed434fb Change libdrm initialization
Change libdrm device and file descriptor initialization
to use new APIs from Thunk. Libdrm recommends that we re-use the same
file descriptor throughout the life of a process instead of re-creating
a new one each time.

This is part of patch series for Virtual Memory API.

Change-Id: I1c0b8d1bd660cd25478b5f94c84071b90d93fc6c


[ROCm/ROCR-Runtime commit: a55f11025b]
2023-07-21 15:17:01 -04:00
David Yat Sin 2a5f4263a8 Add check/query for virtual memory API support
Checks whether version of libdrm library installed on current
system supports the amdgpu_device_get_fd API. This API is
required to support the virtual memory API functions. The
amdgpu_device_get_fd function was introduced in libdrm-2.4.109.
Using a runtime check test instead of static dependency to be
able to support previous APIs on older versions of libdrm.
Add query for virtual memory API support.

This is part of patch series for Virtual Memory API.

Change-Id: Iec831eb24b5d1689c392e50ae86f4d52d4870ac4


[ROCm/ROCR-Runtime commit: e65edb35fc]
2023-07-21 15:17:01 -04:00
David Yat Sin 2be68e2502 Add query for recommended granularity size
Add new query for recommended granularity size. This is the
internal block size used. While the existing query for granularity
size returns the minimum size possible, it is recommended that
allocations and mappings be multiples of the recommended granularity
size to minimise internal memory fragmentation.

This is part of patch series for Virtual Memory API.

Change-Id: Ia82c8f073b2a2c47ecd26fbb0aba27b8b7cd965f


[ROCm/ROCR-Runtime commit: 3ebe1fdff9]
2023-07-21 15:17:01 -04:00
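
Rounding a requested size up to a multiple of the recommended granularity can be sketched as below. `RoundUpToGranule` is a hypothetical helper, and the granule is whatever value the query returns at run time; no specific value is implied by the commit.

```cpp
#include <cassert>
#include <cstddef>

// Rounds size up to the next multiple of granule (granule must be non-zero).
// Assumes size + granule - 1 does not overflow std::size_t.
inline std::size_t RoundUpToGranule(std::size_t size, std::size_t granule) {
  assert(granule != 0);
  return ((size + granule - 1) / granule) * granule;
}
```
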
David Yat Sin 56ccf828bc Adding documentation for SDMA environment var
Adding documentation for modifiers for SDMA copy

Change-Id: I2425672c3ba1f1617d29b8f4b49776775d78a376


[ROCm/ROCR-Runtime commit: a7ffddb265]
2023-07-20 15:15:04 +00:00
Shweta Khatri 9fda38f0ba Fixes a bug that led to setting wrong access type for device local memory
The access type for extended scope fine grained memory was being returned as never
allowed by default

Change-Id: I0167ea0e5931053f22f2d2755bf426d43d2bb8e5


[ROCm/ROCR-Runtime commit: 82e7979c61]
2023-07-17 14:52:01 -04:00
Lancelot SIX 09589e5929 Park waves for gfx11 and bump abi version to 9
On gfx11, with a sequence such as

  s_trap 2
  s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
  s_endpgm

the s_sendmsg deallocates registers while the wave is supposed to be
stopped.  As a result, the wave cannot perform the expected context
save operations.

To avoid this problem, park the wave in the trap handler for gfx11.

Note that gfx11 has implemented an instruction cache prefetch.  When
parked, the prefetch tries to access memory past the end of trap handler
which causes memory violation exceptions to be reported.  To avoid this,
we need to add padding at the end of the trap handler.  The padding
consists of `s_code_end` instructions.  Given that the trap handler is
loaded at a 0x1000 aligned address, the maximum prefetch amount (in
bytes) is given by `256 - (trap_handler_size % 64)`.

Change-Id: I5446da54a965a64f21cb0fd3ce3caa4b6137a933


[ROCm/ROCR-Runtime commit: 2f2ba050f6]
2023-07-15 09:44:50 -04:00
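
The prefetch bound quoted in this commit is plain arithmetic and can be checked with a small helper. `MaxPrefetchBytes` is a hypothetical name; the formula itself is copied from the commit message.

```cpp
#include <cassert>
#include <cstddef>

// Maximum prefetch amount in bytes past the end of the trap handler,
// using the formula from the commit: 256 - (trap_handler_size % 64).
inline std::size_t MaxPrefetchBytes(std::size_t trap_handler_size) {
  const std::size_t bytes = 256 - (trap_handler_size % 64);
  assert(bytes >= 193 && bytes <= 256);  // the remainder is always 0..63
  return bytes;
}
```

For example, a 100-byte handler gives `256 - 36 = 220` bytes.
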
Jonathan Kim babe58eb24 Release lock on thread yield during blit ops
Thread yield doesn't drop the scoped acquired mutex so drop it around
yield to prevent a multithread deadlock.

Change-Id: Ie21f3bff89f6f9e4c57e5b3ccf17968f253fa23a


[ROCm/ROCR-Runtime commit: 70f0a44910]
2023-07-14 10:44:56 -04:00
David Yat Sin b434d15a27 Force clock sync on profiling enablement
Fix a condition where we can get a divide-by-zero in the
TranslateTime(tick) function if the GPU tick predates HSA
startup and we did not do a SyncClocks since initialization.

Change-Id: I0dcec8553ccb8f01211928991f4b3ed3cb4a1ebb


[ROCm/ROCR-Runtime commit: bc585bd8de]
2023-07-07 10:08:54 -04:00
Ranjith Ramakrishnan a4d9fa592d Use memset for initializing variable sized array
In ASAN builds, the compiler used is clang. The initialization of
variable sized array using assignment operator is causing compilation
failure in ASAN builds. Used memset to fix the same.

Change-Id: Ifc748291a41a9886243e0fb1ba576d2760f5e15e


[ROCm/ROCR-Runtime commit: cd4632ccbc]
2023-07-07 12:54:54 +00:00
Jeremy Newton b3f22fef0a Fix non-x86 builds
I've just reverted some code to what it was in 5.5 by wrapping new x86
specific bits with #if's, e.g.:
- CPUID is x86 specific
- mwait is x86 specific

Change-Id: I6cefae34282c777c7340daf3f934d2a11742502e
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>


[ROCm/ROCR-Runtime commit: 132a19e9c3]
2023-06-30 01:04:04 -04:00
Jeremy Newton e80bd7f5b0 Only install asan license when enabled
Change-Id: I7b2aad1042846401d7422ca499ef6912f49f6b50
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>


[ROCm/ROCR-Runtime commit: d1f025bff6]
2023-06-29 10:20:16 -04:00
Philipp Knechtges 4a7c3a2607 fix link-time ordering condition
This fixes a segfault error in cases where the linking order of
compilation units varies. Reason behind the segfault is that one
global variable in one compilation unit depends on another global
variable in another compilation unit, but there is no guarantee that
this other compilation unit is initialized first. The fix forces a
reinitialization at the first invocation of the library.

Change-Id: I1428592c6898bca13a330c4588941de260ff0370


[ROCm/ROCR-Runtime commit: d220e16000]
2023-06-29 10:08:29 -04:00
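
The underlying problem is that C++ does not define the initialization order of globals across compilation units. The commit fixes it by forcing a reinitialization on first use of the library. A related and commonly used technique, shown here instead because the commit's code is not reproduced, is the construct-on-first-use idiom with a function-local static. All names below are hypothetical.

```cpp
#include <cassert>
#include <string>

// Function-local static: initialized the first time the function runs,
// so callers never observe it uninitialized, whatever the link order.
inline const std::string& RegistryName() {
  static const std::string name = "registry";
  return name;
}

// Stand-in for a global in another compilation unit that depends on the
// value above during its own construction.
struct Client {
  std::string label;
  Client() : label(RegistryName() + "-client") {}
};

// Safe: RegistryName() initializes its static on demand.
static const Client g_client;
```
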
David Yat Sin 175265aef4 Add query for driver gpu_id
Add query for OS driver node ID (gpu_id)

Change-Id: I72ebc54d8ae5dbcd1346535912160a642b1065ae


[ROCm/ROCR-Runtime commit: 60a0fd64c4]
2023-06-23 15:02:48 +00:00