When oversubscribing SDMA gangs, a circular deadlock can occur since
gang enqueue is staggered with respect to SDMA engine leader based
on source to destination.
As a result, an enqueued leader may be waiting on a gang item that is
waiting on another enqueued leader or gang item and so on.
To prevent this, first lock the submission so that the DMA status query
and the submission are atomic. Once this is in place, be more stringent
with ganging: all SDMA engines must be available in order to gang.
Finally, re-enable SDMA ganging by default.
Change-Id: I4511e3487db9d26475b5aece4897f10168cc5322
[ROCm/ROCR-Runtime commit: 8f21793a3e]
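A minimal sketch of the check-then-submit pattern described above, not the
runtime's actual code. Engine, GangSubmitter, Submit and CompleteAll are
hypothetical names introduced only for illustration. The point is that the
availability query and the submission share one lock, and that ganging only
happens when every engine is idle.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for an SDMA engine; it only tracks pending work.
struct Engine {
  bool busy = false;
};

class GangSubmitter {
 public:
  explicit GangSubmitter(std::size_t engine_count) : engines_(engine_count) {}

  // Returns true if the copy was ganged across every engine, false if it
  // fell back to a single engine. The status query and the submission run
  // under the same lock, so no other thread can claim an engine in between.
  bool Submit(std::size_t preferred_engine) {
    assert(preferred_engine < engines_.size());
    std::lock_guard<std::mutex> guard(submit_lock_);
    bool all_idle = true;
    for (const Engine& e : engines_) all_idle = all_idle && !e.busy;
    if (all_idle) {
      for (Engine& e : engines_) e.busy = true;  // gang on all engines
      return true;
    }
    engines_[preferred_engine].busy = true;  // non-ganged fallback
    return false;
  }

  // Marks every engine idle again (stands in for copy completion).
  void CompleteAll() {
    std::lock_guard<std::mutex> guard(submit_lock_);
    for (Engine& e : engines_) e.busy = false;
  }

 private:
  std::mutex submit_lock_;
  std::vector<Engine> engines_;
};
```

With four engines, a first Submit gangs, a second one issued before
completion does not, and ganging becomes possible again after CompleteAll.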
xGMI for compute partitioning in non-SPX modes does not have
a reported bandwidth.
Fix it to at most 2 since each partition is either bounded
by the number of xGMI links or the number of available
SDMA contexts.
Change-Id: I09094bd7548d9eee6f039b0efe849838e5de166e
[ROCm/ROCR-Runtime commit: 4c74e47e91]
There's no need to keep looking in the list once we find a ganged agent.
Change-Id: Ia0b9b484c88221a7966a814456942c19b1741978
[ROCm/ROCR-Runtime commit: f8664e88e0]
SDMA ganging is causing some regressions with some applications hanging.
Temporarily disabling SDMA ganging by default until issue is fixed.
Change-Id: I65e172923a53a967df27b30d969ad5d215c4fa09
[ROCm/ROCR-Runtime commit: a20a0a5bac]
Add queue suspend and resume test.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I2ade721026cbb458a3597b7858a164e70fe05f4f
[ROCm/ROCR-Runtime commit: d20f0bbb90]
Add queue and devices snapshot operations.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I836884c9f3b65dd9e5e444d554d3eb87938e1634
[ROCm/ROCR-Runtime commit: b0e84183c1]
Add base debug operations to suspend and resume queues.
Routine will return the number of queues successfully
suspended or resumed.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I8f18317f70464b04231c5cf822e11d545ebfa02a
[ROCm/ROCR-Runtime commit: 5a675921ea]
Check that a jump to trap event can be picked up by the debugger.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: Iad5f87092f2b82d5018013bba548979122a9bd02
[ROCm/ROCR-Runtime commit: b77189cf83]
Add set exceptions enabled debug option
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I6ee1769bbbb90a74074d8100974c4bfeabaf7f2c
[ROCm/ROCR-Runtime commit: 97fc25bb8d]
Add debug attach and runtime enable test for attaching to a spawned and
running process.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I72302ff73494d9dae0c79a299508085d7ca0552b
[ROCm/ROCR-Runtime commit: 097ee967d1]
Add base debug class and attach/detach operations.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I60f3c166646f05838fec208ac2f59bba998c63f8
[ROCm/ROCR-Runtime commit: dd56b38c2f]
Even if the version of libdrm is older and does not support the
amdgpu_device_get_fd function, the device_handle stored in
amdgpu_handle[] is still valid and can be returned via
hsaKmtGetAMDGPUDeviceHandle.
Change-Id: I024a3e82e6cfebac5577aefe359b067746c4023e
[ROCm/ROCR-Runtime commit: 66b66e42cd]
Fix compile error due to arithmetic on void*
Fix some compile warnings
Change-Id: I03ded438c5af77ba61c0a7017be5d4fe1e16c16c
[ROCm/ROCR-Runtime commit: 93aff0b439]
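For reference, arithmetic on void* is a GNU extension and is rejected by
standard C++. The usual fix is to cast to a byte pointer first. The helper
name AdvanceBytes below is hypothetical and only illustrates the idiom; it
is not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Advances a type-erased pointer by a byte offset. The cast to a byte
// pointer makes the addition well-formed in standard C++.
inline void* AdvanceBytes(void* base, std::size_t offset) {
  assert(base != nullptr);
  return static_cast<std::uint8_t*>(base) + offset;
}
```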
Use all available SDMA engines, capped by xGMI bandwidth, for
all D2D copies within a hive.
By default, set the latency boundary copy size to 4KB and below.
Any copy size within this boundary will not gang.
Avoid oversubscribing engines by not ganging on engines with
pending non-ganged work.
An environment variable, HSA_ENABLE_SDMA_GANG, has been provided
to override the default ganging behaviour.
Change-Id: Iccde76aa1af1d47ea2a151789432c9db4f0ffa8d
[ROCm/ROCR-Runtime commit: 7df0167821]
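The ganging rules above can be sketched roughly as follows. This is an
illustration under assumptions, not the runtime's implementation:
EngineState, GangFactor and the xgmi_cap parameter are hypothetical, and the
gang_enabled flag merely stands in for whatever HSA_ENABLE_SDMA_GANG
resolves to.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copies at or below this size are latency bound and never gang (4KB).
constexpr std::size_t kLatencyBoundaryBytes = 4 * 1024;

// Hypothetical per-engine state.
struct EngineState {
  bool has_pending_non_ganged_work;
};

// Returns how many engines a copy would be spread across.
inline std::size_t GangFactor(std::size_t copy_bytes,
                              const std::vector<EngineState>& engines,
                              std::size_t xgmi_cap, bool gang_enabled) {
  assert(!engines.empty());
  if (!gang_enabled || copy_bytes <= kLatencyBoundaryBytes) return 1;
  std::size_t usable = 0;
  for (const EngineState& e : engines) {
    if (!e.has_pending_non_ganged_work) ++usable;  // skip oversubscribed engines
  }
  // Cap by the (assumed) xGMI limit, but always use at least one engine.
  return std::max<std::size_t>(1, std::min(usable, xgmi_cap));
}
```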
Reverting this as current mainline compiler branch does not support
gfx1150/gfx1151 yet. Will bring back later.
This reverts commit 75ce1848cf.
Change-Id: I31ff4fb2d5817538094a7ffaeba96dd6a7d660c7
[ROCm/ROCR-Runtime commit: ebc51dd0eb]
Remove all unused material from KFDDBGTest.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I13ed68656efadef7bbaf8bb737ce5a04829eca9b
[ROCm/ROCR-Runtime commit: 98c6784cc1]
Current debugger uses KFD version directly.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I212a53560a94dd24c599addce72f59c527c8af25
[ROCm/ROCR-Runtime commit: 8471f80bac]
For xnack off, skip SVM evict tests if the memory allocation size is larger
than 15/16 of total system memory, because the test may fail to allocate the
CWSR SVM range to create a queue after allocating test memory.
Limit the eviction size from total VRAM size to 1/2 of total VRAM size,
because with 192GB of VRAM, evicting 192GB may take more than 120 seconds
and cause the test to time out.
Change-Id: Ib1483b9aab580a8539187b2943cadea0fd5a7c71
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
[ROCm/ROCR-Runtime commit: a395dd7306]
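A rough sketch of the sizing rules above. EvictPlan and PlanEvictTest are
hypothetical names, and treating the eviction size as the allocation size
that is compared against system memory is an assumption made for
illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result of the sizing decision.
struct EvictPlan {
  bool skip;                  // true if the test should be skipped
  std::uint64_t evict_bytes;  // amount of memory the test would evict
};

inline EvictPlan PlanEvictTest(std::uint64_t vram_bytes,
                               std::uint64_t system_bytes, bool xnack_on) {
  assert(vram_bytes > 0 && system_bytes > 0);
  // Evict half of VRAM rather than all of it to keep the run time bounded.
  const std::uint64_t evict_bytes = vram_bytes / 2;
  // 15/16 of system memory; divide first to avoid overflow.
  const std::uint64_t system_limit = system_bytes / 16 * 15;
  // Only the xnack off case is skipped (assumption: the eviction size is
  // the allocation size being checked).
  const bool skip = !xnack_on && evict_bytes > system_limit;
  return EvictPlan{skip, evict_bytes};
}
```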
Add agent info query to return nearest CPU agent. This can be used to
determine which CPU agent is in the same NUMA region as the GPU agent.
Change-Id: I5400b4347ffbf4d2a836df31c4de443a38b0ecd1
[ROCm/ROCR-Runtime commit: 469defa78a]
Silence out of order initializer compile warnings during memory region
initialization.
Change-Id: Idbbdd93d3ea8cda289d25a473b3882b920b2e8d8
[ROCm/ROCR-Runtime commit: 42274cfc59]
Aqlpacket:IsValid() function: Replaced the bitwise AND operator (&) with the
logical AND operator (&&) when evaluating the AQL packet type.
Change-Id: I59980bc206cc7eff424023fff0bb92b618aa8c70
[ROCm/ROCR-Runtime commit: a2d0adf9be]
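The difference between the two operators matters whenever the operands are
not already 0 or 1. The sketch below is a generic illustration and does not
reproduce the actual expression in IsValid(); both function names are
hypothetical.

```cpp
#include <cassert>

// Bitwise AND combines the operands bit by bit, so two non-zero values
// with no bits in common yield zero.
inline bool BothNonZeroBitwise(unsigned a, unsigned b) {
  return (a & b) != 0;
}

// Logical AND converts each operand to bool first and short-circuits.
inline bool BothNonZeroLogical(unsigned a, unsigned b) {
  return a && b;
}
```

For a = 2 and b = 1 the bitwise form reports false while the logical form
reports true.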
Support function to retain allocation handle for memory mappings.
The get allocation properties function will return the current
allocation properties for existing memory mappings.
This is part of patch series for Virtual Memory API.
Change-Id: I0a53a11b6efc2b5bf9d463512a489a2abd812551
[ROCm/ROCR-Runtime commit: 687eb043d4]
Support exporting and importing dmabuf file descriptors for memory
mappings. The exported dmabuf file descriptors are shareable posix
file descriptors that can be used for cross-vendor, cross-device
and cross-process memory sharing.
This is part of patch series for Virtual Memory API.
Change-Id: I3673fc009f7e73bc26be8349e19f66e20d0607c5
[ROCm/ROCR-Runtime commit: b03c96c264]
Mapping memory handles to virtual memory addresses does not make them
accessible. The set access function is needed to make the memory
mappings accessible to specific agents. The get access function
returns current access properties for individual agents.
This is part of patch series for Virtual Memory API.
Change-Id: I152ba0557fd2a802eb9d840568b68cdd1911b72c
[ROCm/ROCR-Runtime commit: 13fbd8a232]
Add support for mapping and unmapping memory handles to virtual
address ranges.
This is part of patch series for Virtual Memory API.
Change-Id: If512d49ff4211e68f2064249add607a3200e458a
[ROCm/ROCR-Runtime commit: 179dcf1c77]
Add support for creating and releasing memory handles. Memory
handles are memory allocations on device memory without a virtual
address.
This is part of patch series for Virtual Memory API.
Change-Id: I5dfb162eb1661621cce171b2870a3c93b24d840e
[ROCm/ROCR-Runtime commit: e4a84c4a9c]
Add support for reserving virtual address ranges. Virtual address
ranges are addresses without any memory backing. These address ranges
need to be mapped to memory handles later.
This is part of patch series for Virtual Memory API.
Change-Id: I5d066e7421d6896f933f524312afc230a13d594e
[ROCm/ROCR-Runtime commit: 1085311f1a]
Change libdrm device and file descriptor initialization
to use new APIs from Thunk. Libdrm recommends that we re-use the same
file descriptor throughout the life of a process instead of re-creating
a new one each time.
This is part of patch series for Virtual Memory API.
Change-Id: I1c0b8d1bd660cd25478b5f94c84071b90d93fc6c
[ROCm/ROCR-Runtime commit: a55f11025b]
Check whether the version of the libdrm library installed on the current
system supports the amdgpu_device_get_fd API. This API is
required to support the virtual memory API functions. The
amdgpu_device_get_fd function was introduced in libdrm-2.4.109.
Use a runtime check instead of a static dependency to be
able to support previous APIs on older versions of libdrm.
Add query for virtual memory API support.
This is part of patch series for Virtual Memory API.
Change-Id: Iec831eb24b5d1689c392e50ae86f4d52d4870ac4
[ROCm/ROCR-Runtime commit: e65edb35fc]
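One common way to do such a runtime check is to look the symbol up with
dlopen/dlsym. The sketch below assumes that approach; the patch text does
not say how the check is actually implemented, and LibraryHasSymbol is a
hypothetical helper.

```cpp
#include <cassert>
#include <dlfcn.h>

// Returns true if `library` can be loaded and exports `symbol`.
inline bool LibraryHasSymbol(const char* library, const char* symbol) {
  assert(library != nullptr && symbol != nullptr);
  void* handle = dlopen(library, RTLD_LAZY | RTLD_LOCAL);
  if (handle == nullptr) return false;  // library not installed
  const bool found = dlsym(handle, symbol) != nullptr;
  dlclose(handle);
  return found;
}
```

A caller could then gate the virtual memory API on something like
LibraryHasSymbol("libdrm_amdgpu.so", "amdgpu_device_get_fd"), where the
library file name is an assumption and may differ per distribution.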
Add new query for recommended granularity size. This is the
internal blocksize used. While the existing query for granularity
size returns the minimum size possible, it is recommended that
allocations and mappings are multiples of the recommended granularity
size to minimise internal memory fragmentation.
This is part of patch series for Virtual Memory API.
Change-Id: Ia82c8f073b2a2c47ecd26fbb0aba27b8b7cd965f
[ROCm/ROCR-Runtime commit: 3ebe1fdff9]
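Callers that want to follow this recommendation can round requested sizes
up to the next multiple of the recommended granularity. A small sketch,
where RoundUpToGranule is a hypothetical helper and the granule value is
whatever the query returns:

```cpp
#include <cassert>
#include <cstddef>

// Rounds `size` up to the next multiple of `granule`.
inline std::size_t RoundUpToGranule(std::size_t size, std::size_t granule) {
  assert(granule != 0);
  return ((size + granule - 1) / granule) * granule;
}
```

For example, with an assumed 2MB granule, a 1-byte request rounds up to 2MB
and a request of 2MB plus one byte rounds up to 4MB.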
The --node and --exclude flags take arguments, but usage was
unclear. This led to attempts like --node=1, which will not work
appropriately. Add examples for flags that take parameters, as well as
the requirements for those parameters. Also change --exclude parsing to
match --node parsing, for consistency.
Change-Id: I563ba9b370a24d9a84b9c39093f3cb1a5d723cef
[ROCm/ROCR-Runtime commit: 1958224379]
GFX11 will no longer use GWS for cooperative launch so disable the test.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I8611c8158e1654782150ad10f1f65edb578e6435
[ROCm/ROCR-Runtime commit: 2d3a09cbd6]
The access type for extended scope fine grained memory was being returned as never
allowed by default.
Change-Id: I0167ea0e5931053f22f2d2755bf426d43d2bb8e5
[ROCm/ROCR-Runtime commit: 82e7979c61]
On gfx11, with a sequence such as
s_trap 2
s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
s_endpgm
the s_sendmsg deallocates registers while the wave is supposed to be
stopped. As a result, the wave cannot perform the expected context save
operations.
To avoid this problem, park the wave in the trap handler for gfx11.
Note that gfx11 has implemented an instruction cache prefetch. When
parked, the prefetch tries to access memory past the end of the trap handler,
which causes memory violation exceptions to be reported. To avoid this,
we need to add padding at the end of the trap handler. The padding
consists of `s_code_end` instructions. Given that the trap handler is
loaded at a 0x1000 aligned address, the maximum prefetch amount (in
bytes) is given by `256 - (trap_handler_size % 64)`.
Change-Id: I5446da54a965a64f21cb0fd3ce3caa4b6137a933
[ROCm/ROCR-Runtime commit: 2f2ba050f6]
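The padding bound quoted above is simple to evaluate. A small sketch, where
MaxPrefetchBytes is a hypothetical helper name and the formula is taken
verbatim from the message:

```cpp
#include <cassert>
#include <cstddef>

// Maximum number of bytes the instruction prefetch may read past the end
// of a trap handler of the given size, per 256 - (trap_handler_size % 64).
inline std::size_t MaxPrefetchBytes(std::size_t trap_handler_size) {
  assert(trap_handler_size > 0);
  return 256 - (trap_handler_size % 64);
}
```

A 100-byte handler gives 256 - 36 = 220 bytes of padding, and a handler
whose size is a multiple of 64 needs the full 256 bytes.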
Thread yield doesn't drop the scoped acquired mutex so drop it around
yield to prevent a multithread deadlock.
Change-Id: Ie21f3bff89f6f9e4c57e5b3ccf17968f253fa23a
[ROCm/ROCR-Runtime commit: 70f0a44910]
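A minimal sketch of the unlock-around-yield pattern, using standard library
types rather than the runtime's own mutex and scoped-lock classes.
WaitUntilReady and the ready flag are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Spins until `ready` is set. The lock is released around each yield so
// that another thread needing the same mutex can make progress; holding it
// across the yield could deadlock if that thread is the one setting `ready`.
inline void WaitUntilReady(std::unique_lock<std::mutex>& lock,
                           const std::atomic<bool>& ready) {
  assert(lock.owns_lock());
  while (!ready.load(std::memory_order_acquire)) {
    lock.unlock();
    std::this_thread::yield();
    lock.lock();
  }
}
```

The lock is held again when the function returns, so callers see the same
locking state as before the wait.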
Fix a condition where we can get a divide-by-zero in the
TranslateTime(tick) function if the GPU tick predates HSA
startup and we did not do a SyncClocks since initialization.
Change-Id: I0dcec8553ccb8f01211928991f4b3ed3cb4a1ebb
[ROCm/ROCR-Runtime commit: bc585bd8de]
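The sketch below shows how a divide-by-zero of this kind can arise and be
guarded against. It assumes the translation interpolates between two clock
samples, which the message does not state; ClockSample and TranslateTick are
hypothetical, and the fallback is a placeholder for whatever the real fix
does (for example, taking a fresh sample).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical paired reading of the GPU and system clocks.
struct ClockSample {
  std::uint64_t gpu_tick;
  std::uint64_t sys_tick;
};

// Translates a GPU tick to system time by linear extrapolation through two
// samples. If no second sample was taken since initialization, both samples
// are identical and the slope's denominator would be zero.
inline std::uint64_t TranslateTick(std::uint64_t tick, ClockSample t0,
                                   ClockSample t1) {
  assert(t1.gpu_tick >= t0.gpu_tick && t1.sys_tick >= t0.sys_tick);
  const std::uint64_t gpu_delta = t1.gpu_tick - t0.gpu_tick;
  if (gpu_delta == 0) return t0.sys_tick;  // placeholder fallback, no divide
  const double ratio =
      static_cast<double>(t1.sys_tick - t0.sys_tick) / static_cast<double>(gpu_delta);
  // Signed offset so that ticks older than the first sample also work.
  const std::int64_t offset =
      static_cast<std::int64_t>(tick) - static_cast<std::int64_t>(t0.gpu_tick);
  return static_cast<std::uint64_t>(
      static_cast<std::int64_t>(t0.sys_tick) +
      static_cast<std::int64_t>(static_cast<double>(offset) * ratio));
}
```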
In ASAN builds, the compiler used is clang. Initializing a
variable-sized array with the assignment operator causes a compilation
failure in ASAN builds. Use memset instead.
Change-Id: Ifc748291a41a9886243e0fb1ba576d2760f5e15e
[ROCm/ROCR-Runtime commit: cd4632ccbc]
I've just reverted some code to what it was in 5.5 by wrapping new
x86-specific bits with #if's, e.g.:
- CPUID is x86 specific
- mwait is x86 specific
Change-Id: I6cefae34282c777c7340daf3f934d2a11742502e
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
[ROCm/ROCR-Runtime commit: 132a19e9c3]
The purpose of this patch is to fix an issue in kfdtest.exclude's
blacklist for KFDSVMRangeTest.ReadOnlyRangeTest.
Excluding "KFDSVMRangeTest.ReadOnlyRangeTest" without adding a "*"
to the end causes the test to still run, since after a recent patch
the test actually runs these two variants instead:
-"KFDSVMRangeTest.ReadOnlyRangeTest/0"
-"KFDSVMRangeTest.ReadOnlyRangeTest/1"
(For XNACK OFF/ON)
Now, the test is excluded as "KFDSVMRangeTest.ReadOnlyRangeTest*"
to cover those two XNACK ON/OFF variants.
Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I067c4c99fe839ce6cec5d134bd605e8cb41b8291
[ROCm/ROCR-Runtime commit: 7cc3ffc115]
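The reason the trailing "*" is needed is that the filter pattern has to
match the whole test name. The matcher below is a simplified stand-in for
that wildcard matching ('*' for any run of characters, '?' for a single
one), written only to illustrate the point; GlobMatch is a hypothetical
helper, not the test framework's code.

```cpp
#include <cassert>

// Returns true if `text` matches `pattern` in full.
inline bool GlobMatch(const char* pattern, const char* text) {
  assert(pattern != nullptr && text != nullptr);
  if (*pattern == '\0') return *text == '\0';
  if (*pattern == '*') {
    // '*' matches zero characters, or consumes one and tries again.
    return GlobMatch(pattern + 1, text) ||
           (*text != '\0' && GlobMatch(pattern, text + 1));
  }
  if (*text == '\0') return false;
  if (*pattern == '?' || *pattern == *text) {
    return GlobMatch(pattern + 1, text + 1);
  }
  return false;
}
```

Without the wildcard, the pattern does not match the "/0" and "/1" variants;
with it, both are covered.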