2959 Commits

Author SHA1 Message Date
David Yat Sin ebc51dd0eb Revert "Add support for GC 11.5.0 and 11.5.1"
Reverting this as current mainline compiler branch does not support
gfx1150/gfx1151 yet. Will bring back later.

This reverts commit e877840197.

Change-Id: I31ff4fb2d5817538094a7ffaeba96dd6a7d660c7
2023-07-26 15:03:54 +00:00
Jonathan Kim aaab019960 libhsakmt: add debug trap thunk call for testing
Add generic thunk call for debug testing that assumes
the caller populates trap arguments correctly.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I33a0bc66ca77e29f5b663d4bfe73f8684df8bfb6
2023-07-26 10:29:27 -04:00
Jonathan Kim 98c6784cc1 kfdtest: remove deprecated debug references
Remove all unused material from KFDDBGTest.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I13ed68656efadef7bbaf8bb737ce5a04829eca9b
2023-07-26 10:29:23 -04:00
Jonathan Kim 8471f80bac libhsakmt: remove old debugger versioning
Current debugger uses KFD version directly.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I212a53560a94dd24c599addce72f59c527c8af25
2023-07-26 09:41:38 -04:00
Philip Yang a395dd7306 kfdtest: KFDSVMEvictTest support large VRAM or small system memory
For xnack off, skip SVM evict tests if memory allocation size is larger
than 15/16 total system memory, because the test may fail to allocate
CWSR svm range to create queue after allocating test memory.

Limit eviction size from total VRAM size to 1/2 total VRAM size,
because for 192GB VRAM, evicting 192GB may take more than 120 seconds
and cause the test to fail with a timeout.

Change-Id: Ib1483b9aab580a8539187b2943cadea0fd5a7c71
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2023-07-25 11:11:55 -04:00
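The two limits described in a395dd7306 can be sketched as plain arithmetic. This is a minimal sketch only: the helper names and the `xnack_on` parameter are hypothetical and are not taken from kfdtest.

```cpp
#include <cstdint>

// Hypothetical helper: with XNACK off, skip the test when the requested
// allocation exceeds 15/16 of total system memory.
bool ShouldSkipEvictTest(bool xnack_on, std::uint64_t alloc_bytes,
                         std::uint64_t system_bytes) {
  // Divide before multiplying so very large sizes cannot overflow.
  return !xnack_on && alloc_bytes > (system_bytes / 16) * 15;
}

// Hypothetical helper: the eviction size is limited to half of total VRAM.
std::uint64_t EvictionSizeBytes(std::uint64_t vram_bytes) {
  return vram_bytes / 2;
}
```

With 192GB of VRAM, `EvictionSizeBytes` yields 96GB, which is the reduction the commit message relies on to stay under the test timeout.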
David Yat Sin 469defa78a Add agent query for nearest CPU agent
Add agent info query to return nearest CPU agent. This can be used to
determine which CPU agent is in the same NUMA region as the GPU agent.

Change-Id: I5400b4347ffbf4d2a836df31c4de443a38b0ecd1
2023-07-24 13:59:13 -04:00
Jonathan Kim 0d14144e3a Silence implicit conversion warnings in exception handling
Silence unnamed enum warning in error code comparison

Change-Id: I008b269c106bbad83a1f7588e7b4ec89ec17d37d
2023-07-24 10:06:55 -04:00
Jonathan Kim 42274cfc59 Fix out of order initializer for memory region
Silence out of order initializer compile warnings during memory region
initialization.

Change-Id: Idbbdd93d3ea8cda289d25a473b3882b920b2e8d8
2023-07-24 09:58:37 -04:00
Lang Yu e877840197 Add support for GC 11.5.0 and 11.5.1
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Change-Id: I3c4116e78a5c1ddac2389f5fece57485bdb17f68
2023-07-22 16:06:22 +08:00
Shweta Khatri a2d0adf9be Correct evaluating condition to use logical AND
In the Aqlpacket:IsValid() function, replace the bitwise AND operator (&) with the logical
AND operator (&&) when evaluating the AQL packet type.

Change-Id: I59980bc206cc7eff424023fff0bb92b618aa8c70
2023-07-21 15:36:48 -04:00
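Why the operator choice in a2d0adf9be matters can be shown with a small contrived case. This is a sketch under assumptions: the type range and all function names below are hypothetical and do not reproduce the runtime's actual check.

```cpp
#include <cstdint>

// Hypothetical packet type range, chosen only for this illustration.
constexpr std::uint8_t kMinType = 1;
constexpr std::uint8_t kMaxType = 5;

// Both helpers return a "truthy" integer, but AtLeastMin returns 2, not 1.
int AtLeastMin(std::uint8_t type) { return type >= kMinType ? 2 : 0; }
int AtMostMax(std::uint8_t type) { return type <= kMaxType ? 1 : 0; }

// Bitwise AND combines the integer bit patterns: 2 & 1 == 0, so a type
// that satisfies both conditions is still reported as invalid.
bool IsValidBitwise(std::uint8_t type) {
  return (AtLeastMin(type) & AtMostMax(type)) != 0;
}

// Logical AND converts each operand to bool first (and short-circuits),
// so two non-zero operands give true.
bool IsValidLogical(std::uint8_t type) {
  return AtLeastMin(type) && AtMostMax(type);
}
```

For a type of 3, `IsValidBitwise` returns false while `IsValidLogical` returns true. When both operands are already `bool` the two operators agree on the result, and the difference is then limited to short-circuit evaluation.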
David Yat Sin 687eb043d4 Add retain handle and get allocation properties
Support function to retain allocation handle for memory mappings.
The get allocation properties function will return the current
allocation properties for existing memory mappings.

This is part of patch series for Virtual Memory API.

Change-Id: I0a53a11b6efc2b5bf9d463512a489a2abd812551
2023-07-21 15:17:01 -04:00
David Yat Sin b03c96c264 Support exporting and importing memory mappings
Support exporting and importing dmabuf file descriptors for memory
mappings. The exported dmabuf file descriptors are shareable posix
file descriptors that can be used for cross-vendor, cross-device
and cross-process memory sharing.

This is part of patch series for Virtual Memory API.

Change-Id: I3673fc009f7e73bc26be8349e19f66e20d0607c5
2023-07-21 15:17:01 -04:00
David Yat Sin 13fbd8a232 Support Get and Set access for memory mappings
Mapping memory handles to virtual memory addresses does not make them
accessible. The set access function is needed to make the memory
mappings accessible to specific agents. The get access function
returns current access properties for individual agents.

This is part of patch series for Virtual Memory API.

Change-Id: I152ba0557fd2a802eb9d840568b68cdd1911b72c
2023-07-21 15:17:01 -04:00
David Yat Sin 179dcf1c77 Support mapping and unmapping memory handles
Add support for mapping and unmapping memory handles to virtual
address ranges.

This is part of patch series for Virtual Memory API.

Change-Id: If512d49ff4211e68f2064249add607a3200e458a
2023-07-21 15:17:01 -04:00
David Yat Sin e4a84c4a9c Support memory handles
Add support for creating and releasing memory handles. Memory
handles are memory allocations on device memory without a virtual
address.

This is part of patch series for Virtual Memory API.

Change-Id: I5dfb162eb1661621cce171b2870a3c93b24d840e
2023-07-21 15:17:01 -04:00
David Yat Sin 1085311f1a Support Virtual Address reservations
Add support for reserving virtual address ranges. Virtual address
ranges are addresses without any memory backing. These address ranges
need to be mapped to memory handles later.

This is part of patch series for Virtual Memory API.

Change-Id: I5d066e7421d6896f933f524312afc230a13d594e
2023-07-21 15:17:01 -04:00
David Yat Sin a55f11025b Change libdrm initialization
Change libdrm device and file descriptor initialization
to use new APIs from Thunk. Libdrm recommends that we re-use the same
file descriptor throughout the life of a process instead of re-creating
a new one each time.

This is part of patch series for Virtual Memory API.

Change-Id: I1c0b8d1bd660cd25478b5f94c84071b90d93fc6c
2023-07-21 15:17:01 -04:00
David Yat Sin e65edb35fc Add check/query for virtual memory API support
Check whether the version of the libdrm library installed on the current
system supports the amdgpu_device_get_fd API. This API is
required to support the virtual memory API functions. The
amdgpu_device_get_fd function was introduced in libdrm-2.4.109.
Use a runtime check instead of a static dependency to be
able to support previous APIs on older versions of libdrm.
Add query for virtual memory API support.

This is part of patch series for Virtual Memory API.

Change-Id: Iec831eb24b5d1689c392e50ae86f4d52d4870ac4
2023-07-21 15:17:01 -04:00
David Yat Sin 3ebe1fdff9 Add query for recommended granularity size
Add new query for recommended granularity size. This is the
internal blocksize used. While the existing query for granularity
size returns the minimum size possible, it is recommended that
allocations and mappings be multiples of the recommended granularity
size to minimise internal memory fragmentation.

This is part of patch series for Virtual Memory API.

Change-Id: Ia82c8f073b2a2c47ecd26fbb0aba27b8b7cd965f
2023-07-21 15:17:01 -04:00
Kent Russell 1958224379 run_kfdtest.sh: Clarify parameters taking arguments
The --node and --exclude flags take arguments, but usage was
unclear. This led to attempts like --node=1, which will not work
appropriately. Add examples for flags that take parameters, as well as
the requirements for those parameters. Also change --exclude parsing to
match --node parsing, for consistency.

Change-Id: I563ba9b370a24d9a84b9c39093f3cb1a5d723cef
2023-07-21 10:53:31 -04:00
Jonathan Kim 2d3a09cbd6 kfdtest: disable gws tests for gfx11
GFX11 will no longer use GWS for cooperative launch so disable the test.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: I8611c8158e1654782150ad10f1f65edb578e6435
2023-07-20 11:22:56 -04:00
David Yat Sin a7ffddb265 Adding documentation for SDMA environment var
Adding documentation for modifiers for SDMA copy

Change-Id: I2425672c3ba1f1617d29b8f4b49776775d78a376
2023-07-20 15:15:04 +00:00
Shweta Khatri 82e7979c61 Fixes a bug that led to setting wrong access type for device local memory
The access type for extended scope fine grained memory was being returned as never
allowed by default

Change-Id: I0167ea0e5931053f22f2d2755bf426d43d2bb8e5
2023-07-17 14:52:01 -04:00
Lancelot SIX 2f2ba050f6 Park waves for gfx11 and bump abi version to 9
On gfx11, with a sequence such as

  s_trap 2
  s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
  s_endpgm

the s_sendmsg deallocates registers while the wave is supposed to be
stopped.  As a result, the wave cannot perform the expected context save
operations.

To avoid this problem, park the wave in the trap handler for gfx11.

Note that gfx11 has implemented an instruction cache prefetch.  When
parked, the prefetch tries to access memory past the end of trap handler
which causes memory violation exceptions to be reported.  To avoid this,
we need to add padding at the end of the trap handler.  The padding
consists of `s_code_end` instructions.  Given that the trap handler is
loaded at a 0x1000 aligned address, the maximum prefetch amount (in
bytes) is given by `256 - (trap_handler_size % 64)`.

Change-Id: I5446da54a965a64f21cb0fd3ce3caa4b6137a933
2023-07-15 09:44:50 -04:00
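The prefetch bound quoted in 2f2ba050f6 is a one-line formula. The sketch below only evaluates that formula; the function name is hypothetical, and how the runtime actually sizes the padding is not shown in the commit message.

```cpp
#include <cstddef>

// Evaluates the bound from the commit message:
//   256 - (trap_handler_size % 64)
// trap_handler_size is the handler size in bytes; the handler is assumed to
// be loaded at a 0x1000-aligned address, as the message states.
constexpr std::size_t MaxPrefetchBytes(std::size_t trap_handler_size) {
  return 256 - (trap_handler_size % 64);
}
```

A handler whose size is a multiple of 64 bytes gives the largest bound, 256 bytes. A 100-byte handler gives `256 - 36 = 220` bytes.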
Jonathan Kim 70f0a44910 Release lock on thread yield during blit ops
Thread yield doesn't drop the scoped acquired mutex so drop it around
yield to prevent a multithread deadlock.

Change-Id: Ie21f3bff89f6f9e4c57e5b3ccf17968f253fa23a
2023-07-14 10:44:56 -04:00
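The pattern behind 70f0a44910 can be sketched with the standard library. This is an illustration under assumptions: `WaitWithYield` and its predicate parameter are hypothetical and are not the runtime's blit code.

```cpp
#include <mutex>
#include <thread>

// Hypothetical wait loop. The caller holds `lock` on entry. Yielding does
// not release a held mutex, so the lock is dropped explicitly around the
// yield to let other threads acquire it, then retaken before re-checking.
template <typename Predicate>
void WaitWithYield(std::unique_lock<std::mutex>& lock, Predicate done) {
  while (!done()) {
    lock.unlock();
    std::this_thread::yield();
    lock.lock();
  }
}
```

The lock is held again when the function returns, so the caller's critical section continues as before. Without the `unlock()`/`lock()` pair, a thread that must take the same mutex to make `done()` true could never run its update.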
David Yat Sin bc585bd8de Force clock sync on profiling enablement
Fix a condition where we can get a divide-by-zero in the
TranslateTime(tick) function if the GPU tick predates HSA
startup and we did not do a SyncClocks since initialization.

Change-Id: I0dcec8553ccb8f01211928991f4b3ed3cb4a1ebb
2023-07-07 10:08:54 -04:00
Ranjith Ramakrishnan cd4632ccbc Use memset for initializing variable sized array
In ASAN builds, the compiler used is clang. The initialization of
variable sized array using assignment operator is causing compilation
failure in ASAN builds. Used memset to fix the same.

Change-Id: Ifc748291a41a9886243e0fb1ba576d2760f5e15e
2023-07-07 12:54:54 +00:00
Jeremy Newton 132a19e9c3 Fix non-x86 builds
I've just reverted some code to what it was in 5.5 by wrapping new x86
specific bits with #if's, e.g.:
- CPUID is x86 specific
- mwait is x86 specific

Change-Id: I6cefae34282c777c7340daf3f934d2a11742502e
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2023-06-30 01:04:04 -04:00
Ori Messinger 7cc3ffc115 kfdtest: Fix kfdtest.exclude "ReadOnlyRangeTest" Issue
The purpose of this patch is to fix an issue in kfdtest.exclude's
blacklist for KFDSVMRangeTest.ReadOnlyRangeTest.

Excluding "KFDSVMRangeTest.ReadOnlyRangeTest" without adding a "*"
to the end causes the test to still run, since after a recent patch
the test actually runs these two variants instead:
   -"KFDSVMRangeTest.ReadOnlyRangeTest/0"
   -"KFDSVMRangeTest.ReadOnlyRangeTest/1"
(For XNACK OFF/ON)

Now, the test is excluded as "KFDSVMRangeTest.ReadOnlyRangeTest*"
to cover those two XNACK ON/OFF variants.

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I067c4c99fe839ce6cec5d134bd605e8cb41b8291
2023-06-29 23:14:30 -04:00
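The effect of the trailing "*" in 7cc3ffc115 can be reproduced with a tiny wildcard matcher. This is a sketch, not the test framework's own filter code: `WildcardMatch` is a hypothetical helper that supports only `*` (any run of characters) and `?` (any single character).

```cpp
// Returns true when `text` matches `pattern` in full.
bool WildcardMatch(const char* pattern, const char* text) {
  if (*pattern == '\0') return *text == '\0';
  if (*pattern == '*') {
    // '*' matches nothing, or consumes one character and tries again.
    return WildcardMatch(pattern + 1, text) ||
           (*text != '\0' && WildcardMatch(pattern, text + 1));
  }
  return *text != '\0' && (*pattern == '?' || *pattern == *text) &&
         WildcardMatch(pattern + 1, text + 1);
}
```

The pattern `KFDSVMRangeTest.ReadOnlyRangeTest` does not match `KFDSVMRangeTest.ReadOnlyRangeTest/0` because the `/0` suffix is left over, whereas `KFDSVMRangeTest.ReadOnlyRangeTest*` matches both the `/0` and `/1` variants.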
Jeremy Newton d1f025bff6 Only install asan license when enabled
Change-Id: I7b2aad1042846401d7422ca499ef6912f49f6b50
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2023-06-29 10:20:16 -04:00
Philipp Knechtges d220e16000 fix link-time ordering condition
This fixes a segfault error in cases where the linking order of
compilation units varies. The reason behind the segfault is that one
global variable in one compilation unit depends on another global
variable in another compilation unit, but there is no guarantee that
this other compilation unit is initialized first. The fix forces a
reinitialization at the first invocation of the library.

Change-Id: I1428592c6898bca13a330c4588941de260ff0370
2023-06-29 10:08:29 -04:00
Jeremy Newton 473a66d115 Don't install asan license if disabled
Change-Id: I8bffe5ec8496ff11e6d66995dd470cddb13f3c0d
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2023-06-29 09:34:49 -04:00
Alex Sierra 0cbf26c148 src: add debug API to support GPU core dump
Add API functions to extract the following information from KFD:
runtime information, device info and queue snapshots.

Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: If995ecc54497ab61189bb0f209c64af0bbb0f56f
2023-06-26 18:58:15 +00:00
Alex Sierra 5e0a32d7b3 add hsaKmtGetRuntimeCapabilities API
Queries for runtime capabilities after it has been enabled.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I098c0e9862c0c1d5e304b111cdc281c0ccd09691
2023-06-26 18:58:15 +00:00
Ori Messinger 3447b795df kfdtest: Fix gfx_target_version Parsing Issue
The purpose of this patch is to fix an issue in the run_kfdtest.sh
script's gfx_target_version parsing.

When the character length of the "gfx_target_version" value is
equal to 5 instead of 6, it will now be zero padded on the left to
allow each Major/Minor/Stepping value to be parsed correctly.

Also, kfdtest.exclude file now replaces the default filter for
aqua_vanjaram with the following 3 gfx filters:
gfx940, gfx941, & gfx942

Signed-off-by: Ori Messinger <Ori.Messinger@amd.com>
Change-Id: I1f0264d3705803f24ad3c458e6bd367fbbec62be
2023-06-23 13:18:05 -04:00
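The left-padding fix in 3447b795df is done in a shell script, but the idea can be sketched in C++. The layout assumed here, two decimal digits each for major, minor and stepping, is inferred from the six-character length mentioned in the commit message; the struct and function names are hypothetical.

```cpp
#include <string>

// Hypothetical holder for the three parsed fields.
struct GfxVersion {
  int major_ver;
  int minor_ver;
  int stepping;
};

// Left-pads the value with zeros to six characters, then reads two decimal
// digits per field (assumed layout: MMmmss).
GfxVersion ParseGfxTargetVersion(std::string value) {
  if (value.size() < 6) {
    value = std::string(6 - value.size(), '0') + value;
  }
  return GfxVersion{std::stoi(value.substr(0, 2)),
                    std::stoi(value.substr(2, 2)),
                    std::stoi(value.substr(4, 2))};
}
```

Without the padding, a five-character value such as `90402` would be split as `90`, `40`, `2`. With it, the value becomes `090402` and parses as 9, 4, 2.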
David Yat Sin 60a0fd64c4 Add query for driver gpu_id
Add query for OS driver node ID (gpu_id).

Change-Id: I72ebc54d8ae5dbcd1346535912160a642b1065ae
2023-06-23 15:02:48 +00:00
Konstantin Zhuravlyov 8a6edb07d9 Cache referenced symbol table when pulling data in relocation section
Change-Id: I6ef21cedde1aca6fd1ec5e5d5634563f030eaab8
2023-06-21 16:35:45 -04:00
Jonathan Kim 92467fd282 Prevent unnecessary SDMA queue creation on copy on status
Unless SDMA blits have actually been used for copies, prevent the DMA
copy status from querying the blit's pending byte status to avoid
creating an unnecessary HW queue.

Change-Id: Ied1fbed73c08f0408f0e3583f9b56f2768c71708
2023-06-21 03:10:53 -04:00
Jonathan Kim 8c60f04a99 Prevent blit copy pending bytes query when out of SDMA resources
Querying pending bytes on a blit kernel is unnecessary when the runtime
runs out of SDMA resources since we are returning an SDMA availability
mask.

Change-Id: I347efba0c85b70ea3ba8749d76a499afc23909e8
2023-06-21 03:10:52 -04:00
Shweta Khatri 77bf357647 Defined a new extended scope memory region
Added HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_EXT_SCOPE_FINE_GRAINED flag to enable extended scope memory region
where the device-scope atomics act as system-scope atomics

Change-Id: I79fc3207cb630dfc68bed2f8aabd75f35fe80b12
2023-06-20 11:00:05 -04:00
James Zhu 36666f5895 Enable sleep for all waiters
Enable sleep for all waiters when the kernel supports event age tracking.

Change-Id: Icd4e1e8d83b4a54e9f6aaa99691a6573211b3337
Signed-off-by: James Zhu <James.Zhu@amd.com>
2023-06-20 09:32:16 -04:00
James Zhu 5871b28503 Add kernel version flag supports event age
KFD kernel version 1.13 starts to support event age
tracking, which helps eliminate unnecessary busy waits.

Change-Id: Ib447ed6e0350f3110a4d6b9b80a0388000dd0e72
Signed-off-by: James Zhu <James.Zhu@amd.com>
2023-06-20 09:32:03 -04:00
Sreekant Somasekharan ea2f832a43 rocrtst: Fix RoundToPowerOf2 function
The behavior of a shift is undefined if the right operand is negative,
or greater than or equal to the width of the promoted left operand.
For release builds with address sanitizer enabled, the resulting compiler
optimization leads to an unsupported queue size value, since the
current method shifts up to 128 bits on a 64-bit value.

Change-Id: Iddcc15b43d2331bc8bf5fc3aa4725f76844655ec
Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com>
2023-06-19 19:17:49 -04:00
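One way to round to a power of two without an out-of-range shift is sketched below. This is not the rocrtst implementation: the name `RoundUpToPowerOf2` and the choice to round up are assumptions made for the illustration.

```cpp
#include <cstdint>

// Rounds `value` up to the next power of two (values 0 and 1 give 1).
// Every shift count is a constant from 1 to 32, strictly less than the
// 64-bit operand width, so no shift has undefined behavior.
std::uint64_t RoundUpToPowerOf2(std::uint64_t value) {
  if (value <= 1) return 1;
  std::uint64_t v = value - 1;
  // Smear the highest set bit into every lower bit position.
  v |= v >> 1;
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  v |= v >> 32;
  // Unsigned arithmetic wraps to 0 if the result would not fit in 64 bits.
  return v + 1;
}
```

Stopping at a shift of 32 is the point: a further step with a count of 64 or more on a 64-bit operand is the undefined case the commit message describes.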
Jonathan Kim 3e3e11bc5a Ensure HSA_ENABLE_SDMA=0 persists on new copy on engine API
Copy on engine API still needs to respect HSA_ENABLE_SDMA settings.

Change-Id: I26038b1e3082d62687c2e279615557583d20f229
2023-06-19 13:48:59 -04:00
raghavmedicherla 4142a77375 [hsa-runtime] Add support to hsa-runtime to find symbols from ".dynsym" section.
Earlier, hsa-runtime was unable to find symbols from a stripped ELF image because
there was no support for finding symbols in the ".dynsym" section.

Looking for symbols in .dynsym is enabled by the LOADER_USE_DYNSYM=1
environment variable.

Change-Id: I4f0e8dd0eb053a6066d4d49b670c52e51149531a
2023-06-16 14:40:50 -04:00
Kent Russell 9a22bade89 kfdtest.exclude: Blacklist CuMaskingEven on all ASICs
This has slowly become less and less reliable on more and more ASICs,
so just blacklist it altogether. Using wall clock for performance
is not a reliable method for testing performance, so skip it to avoid
more failure reports on various systems.

Change-Id: I1a5744604e4620bc7675a629d146ba4ffba669d2
2023-06-15 11:24:04 -04:00
David Yat Sin 5e4490f180 Update documentation for IPC handles
Explicitly mention that IPC handles can only be created on GPU agents.

Change-Id: I19bc3578d6e5243c795bf6fbf981ea4bd3bfc2e8
2023-06-14 16:21:26 -04:00
Ruili Ji 9bf1cbe4ed kfdtest: Update COMPUTE_PGM_RSRC1 for software trap
For ASICs in the GFX11 domain that do not need software traps,
testing with COMPUTE_PGM_RSRC1.PRIV = 1 makes the system hang.

Change-Id: I00cf8eb6d6b07856885c77bd343ca3c41cc3cad5
Signed-off-by: Ruili Ji <ruiliji2@amd.com>
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
2023-06-14 07:46:51 -04:00
Philip Yang 29b04c2534 kfdtest: Fix KFDSVMEvictTest.QueueTest OOM
Fix a typo in the calculation of bufferSize from vramBufSizeInPages. The OOM shows up
only with HSA_XNACK=1 because HSA_XNACK=0 doesn't support VRAM
oversubscription. We changed to run SVM tests with both XNACK off and
on.

Change-Id: I3949959288fd92f4e7f4a87115a5f1547e225042
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
2023-06-13 21:15:31 -04:00
Jonathan Kim bfb94b3b6e Soften trap handler loading failure when exception handling not supported
GFX11 and up, including some GFX9 devices, will not support
old trap handling without the new exception handling.

Rather than failing a hard assert that ends in a core dump,
let ROCr initialization continue.

Change-Id: I309becdc72ef4fb2fafd118c1faf0801407e658e
2023-06-13 13:05:47 -04:00