Commit Graph

2930 Commits

Author SHA1 Message Date
Stella Laurenzo 4fbec4e774 Link CMAKE_DL_LIBS.
It was failing to link on AlmaLinux 8.

Change-Id: Id7df245f1063c2bebd0f07efc352f1b9017eda0e
Signed-off-by: Stella Laurenzo <stellaraccident@gmail.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>


[ROCm/ROCR-Runtime commit: 7c10e1e4f5]
2024-03-13 09:18:03 -04:00
pvanhout 8e43aaab04 [libamdhsacode] Support COV6/Generic Targets
Change-Id: I4680577eb56dc436fbc134b169f172dd476bff37


[ROCm/ROCR-Runtime commit: a93c18dc90]
2024-03-12 07:37:32 -04:00
Jonathan R. Madsen 64d380d125 Add hsa_api_trace_version.h
- hsa_api_trace.h contains C++
- rocprofiler-sdk needs to include the table version number defines (*_MAJOR_VERSION and *_STEP_VERSION) for the HSA API in its public headers
- rocprofiler-sdk needs its public headers to be C-compatible, so hsa_api_trace_version.h was created

Change-Id: Ieece990b3b7775cb0446b545c9e3391c5f691c61


[ROCm/ROCR-Runtime commit: 5402842d5f]
2024-03-12 01:17:34 -04:00
Sv. Lockal 760647dd19 Fix compilation on musl-based systems
This makes it possible to build ROCT-Thunk-Interface for Alpine Linux, Gentoo with a musl profile, and similar systems.

List of changes:
* Fix redefinition of PAGE_SIZE from limits.h
* Use NAME_MAX from limits.h

Closes #65

Change-Id: Ibdb0ef5668a07b7b403fcc4a44cd2658e00a584a
Signed-off-by: Sv. Lockal <lockalsash@gmail.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>


[ROCm/ROCR-Runtime commit: 9a89997b5f]
2024-03-11 10:46:55 -04:00
Jonathan Kim 5cfa60e03e Fix deferred dmabuf export on IPC due to GEM object loss
When deferring a dmabuf export on an import call, there may be a
failure to export as the GEM object is not referenced by the kernel
mode driver.  To get around this, do a non-deferred export and
immediately close the dmabuf FD to keep FD creation to a minimum.
This way, the GEM object will have a kernel mode driver reference
when a deferred export is done.

Also, a bad dmabuf FD sent over a socket may never be received by the
importing reader, which can cause a hang.
Set a 10-second timeout so that the importer does not block indefinitely.
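A minimal sketch of a bounded wait on the importer side, assuming the timeout is implemented by polling the socket before reading. The helper name `read_with_timeout` and the use of `poll()` are illustrative assumptions, not the ROCr code:

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

#define IMPORT_TIMEOUT_MS 10000  /* 10 s, as in the commit message */

/* Hypothetical helper: read up to len bytes from fd, but give up if nothing
 * arrives within timeout_ms. Returns the byte count from read(), or -1 with
 * errno set (ETIMEDOUT on timeout). For simplicity, the full timeout
 * restarts if poll() is interrupted by a signal. */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms) {
  struct pollfd p = { .fd = fd, .events = POLLIN };
  int r;
  do {
    r = poll(&p, 1, timeout_ms);
  } while (r < 0 && errno == EINTR);
  if (r < 0) return -1;
  if (r == 0) {          /* nothing arrived: do not block forever */
    errno = ETIMEDOUT;
    return -1;
  }
  return read(fd, buf, len);
}
```

A real importer would call this with `IMPORT_TIMEOUT_MS` on its connected socket and treat `ETIMEDOUT` as a failed import.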

Change-Id: I11a9b5ec64aa2e16fd6aecdf46c34e4eb56ccfd0


[ROCm/ROCR-Runtime commit: eb2100daad]
2024-03-07 12:12:06 -05:00
Alex Sierra 7721aadf66 core dump: Generates a core dump from a fault event
Extracts and creates a core dump ELF file from a fault event, using the
core dump front end. GFX11 is not supported.

Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I5ae154e886f39ab3ce7bbae5803efb27a96c7e2e


[ROCm/ROCR-Runtime commit: cbeddf9eb6]
2024-03-05 09:28:44 -05:00
Lancelot SIX 7f763d499a trap_handler: Set status.skip_export when halting a wave
When inspecting waves on architectures where SPI may not initialize TTMP
registers, the debugger cannot reliably know if the trap handler was
entered and if it saved valuable information in TTMP registers.

This patch uses the status.skip_export bit (unused by compute
shaders) to indicate that the trap handler was executed before halting a wave.
This is done except for gfx940, where ttmp11[31] can be used (as
TTMP registers are always initialized by SPI on this architecture).  It
would be possible to be more selective, as architectures that always
initialize TTMP registers do not require this step, but always doing
it makes maintenance simpler.

Change-Id: I5c4148c78062f7ffa049ac7856c2edc82dbc77d1


[ROCm/ROCR-Runtime commit: 5d3f6a63f1]
2024-03-05 09:28:33 -05:00
Jonathan Kim f9a6578b6b Disable SDMA ganging on non-APU multi-partition modes
Work around an SDMA hang in non-SPX modes for non-APU devices by disabling
ganging.
The root cause of the hang has not been found.
Non-APU xGMI modes have only one link between socket devices anyway, so
there is likely no real system-level gain from ganging intra-socket.

Change-Id: Ia4eda2f85cbf25151d3dbcf50cc45b8b775c60e2


[ROCm/ROCR-Runtime commit: ed462035fa]
2024-02-28 14:52:01 -05:00
Jonathan Kim 1e46d28ee6 Fix gang item wait on dependency signals
Gang items, not just the gang leader, have to wait on dependency signals.
Copies should not start while shaders are still operating on the memory
to be copied.

Change-Id: I99703b420045ebcba2c9da39ec64678129dc140f


[ROCm/ROCR-Runtime commit: ed260ea970]
2024-02-27 12:45:41 -05:00
Harish Kasiviswanathan 2f05682f29 libhsakmt: Associate correct GPU with queue memory
Pass the correct gpu_id to KFD for the system memory that is allocated for
the queue and EOP buffer.

Change-Id: I43bb6333560a7d9d38293c191303161ab1443b5d
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: 341ecaf1d9]
2024-02-26 14:20:16 -05:00
Shweta Khatri a0ebf06e6e Record interop mapped object in allocation_map_
This allows the VA to be recorded in ROCr so that it is not
treated as an invalid pointer in future API calls.

Change-Id: I8d1d8ef9816a984c89d30a2179b0ce8940fef1da


[ROCm/ROCR-Runtime commit: f2006d6899]
2024-02-26 13:40:55 -05:00
Harish Kasiviswanathan de1c7daba7 libhsakmt: Move global zfb_support to globals.c
zfb_support needs to be accessed from multiple places, so move it to the
globals.c file.

Change-Id: I40b487c26a13e7cc6fc01b671d6166e7114e02d2
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: 191caf46ac]
2024-02-26 09:08:11 -05:00
Harish Kasiviswanathan aeddcb8156 libhsakmt: Use correct gpu_id for GPU system memory
For GTT memory allocations, honour the GPU if one is provided.

Change-Id: Iea9a26bc44cd3daa2337845f53dc430787b0643b
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>


[ROCm/ROCR-Runtime commit: 858dfd364f]
2024-02-25 10:45:36 -05:00
Jonathan R. Madsen c85e1dc4cd Update rocprofiler-register support
- add rocprofiler-register to CPACK_DEBIAN_BINARY_PACKAGE_DEPENDS when found
- add rocprofiler-register to CPACK_RPM_BINARY_PACKAGE_REQUIRES when found
- remove report_tool_load_failures_explicit_
- add HSA_TOOLS_DISABLE_REGISTER flag
- add HSA_TOOLS_REPORT_REGISTER_FAILURE
- use HSA_TOOLS_REPORT_REGISTER_FAILURE instead of HSA_TOOLS_REPORT_LOAD_FAILURE
- change the rocprofiler-register message to not include the word "error"

Change-Id: Ib7fd7f14c42758a54c347874018281bb1b5477a6


[ROCm/ROCR-Runtime commit: 7ce263b0e4]
2024-02-22 11:55:25 -05:00
Shweta Khatri 7b3dbd6d9e Avoid releasing scratch for blit queues
At hsa_shutdown(), scratch_lock_ may already have been destroyed. Blit queues don't need it.

Change-Id: Ic132ac8a6be31fb2f0623137115608b0b222f077


[ROCm/ROCR-Runtime commit: 24633c7a85]
2024-02-22 14:12:05 +00:00
Jonathan Kim 18d556cba7 Fix export-close race during IPC attach request
If two attach requests for the same piece of shared memory occur concurrently,
a double export or a premature dmabuf FD close can occur, since the on-demand
export and close calls are not atomic.

Use a reference counter on shared memory dmabuf FDs that have
already been opened to avoid this problem.
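The reference-counting idea can be sketched in C as follows. The `shared_fd` struct, the export/close callbacks, and the stub functions are hypothetical; they only show how a counter guarded by a lock makes "export on first attach" and "close on last detach" atomic with respect to each other:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical record for one shared allocation: the exported dmabuf FD
 * and the number of attach requests currently using it. */
struct shared_fd {
  pthread_mutex_t lock;
  int fd;         /* -1 while not exported */
  unsigned refs;  /* active attachers */
};

typedef int (*export_fn)(void *ctx);  /* stands in for the real export */
typedef void (*close_fn)(int fd);     /* stands in for close() */

/* First attacher exports; later attachers reuse the same FD. */
static int shared_fd_acquire(struct shared_fd *s, export_fn do_export, void *ctx) {
  pthread_mutex_lock(&s->lock);
  if (s->refs == 0) {
    int new_fd = do_export(ctx);
    if (new_fd < 0) {
      pthread_mutex_unlock(&s->lock);
      return -1;
    }
    s->fd = new_fd;
  }
  s->refs++;
  int fd = s->fd;
  pthread_mutex_unlock(&s->lock);
  return fd;
}

/* Last detacher closes; earlier detachers only drop their reference. */
static void shared_fd_release(struct shared_fd *s, close_fn do_close) {
  pthread_mutex_lock(&s->lock);
  if (s->refs > 0 && --s->refs == 0) {
    do_close(s->fd);
    s->fd = -1;
  }
  pthread_mutex_unlock(&s->lock);
}

/* Stubs for illustration only: count calls instead of touching a driver. */
static int export_count, close_count;
static int fake_export(void *ctx) { (void)ctx; export_count++; return 42; }
static void fake_close(int fd) { (void)fd; close_count++; }
```

With two overlapping attaches, the export runs once and the close runs only after both have released.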

Change-Id: I14a59209c0385e32582af42a57b33b1c6838a9b1


[ROCm/ROCR-Runtime commit: 1f63ea3476]
2024-02-22 14:12:05 +00:00
David Yat Sin 5288e97de6 rocrtst: Add non-contiguous VMM map tests
Add rocrtst tests for mapping non-contiguous memory into a
single VA range.

Change-Id: Id2e57f83512f8b482456b2b1925586951ada7400


[ROCm/ROCR-Runtime commit: b77ade9c64]
2024-02-22 14:12:05 +00:00
David Yat Sin a4003bb849 rocrtst: Add test for GPU access to memory
Add a test to verify whether GPU shaders can read memory created using VMM
APIs.
Split the VMM rocrtst tests into two separate groups: Basic and Access tests.

Change-Id: Iead8d46125580c71ccd582e967c8e2e891e75c5e


[ROCm/ROCR-Runtime commit: 99e31e43aa]
2024-02-22 14:12:05 +00:00
David Yat Sin 1e44ddc349 Fix compile error when using clang
Change-Id: Ibacf094934a9b489c052a18eeb6b26639aba3032


[ROCm/ROCR-Runtime commit: 1f50219634]
2024-02-22 14:12:05 +00:00
David Yat Sin 7a6c962b36 Fix compile error on certain gcc versions
Change-Id: I8a4fab76d1dcc576eb7706ab45fc786c0cab274a


[ROCm/ROCR-Runtime commit: 5b28a1bc17]
2024-02-13 15:25:34 -05:00
David Yat Sin 65497f8f2c Use sysconf pagesize for system pagesize
Provided by user huanggyizhi on GitHub
https://github.com/RadeonOpenCompute/ROCR-Runtime/pull/124
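For reference, a minimal POSIX query of the system page size. The function name and the 4096-byte fallback on error are assumptions of this sketch, not taken from the patch:

```c
#include <stddef.h>
#include <unistd.h>

/* Ask the OS for the page size at run time instead of assuming 4 KiB.
 * sysconf() returns -1 on error, so fall back to 4096 (an assumption). */
static size_t system_page_size(void) {
  long ps = sysconf(_SC_PAGESIZE);
  return ps > 0 ? (size_t)ps : 4096;
}
```

This matters on systems configured with larger pages (for example 16 KiB or 64 KiB), where a hardcoded 4 KiB value would be wrong.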

Change-Id: Ia03c45f7a869ae2c804accf8163f8ae36c20dd5a


[ROCm/ROCR-Runtime commit: ae16b3e14e]
2024-02-13 14:28:10 -05:00
Joseph Huber 3f872d8c97 Add executable symbol info for the wavefront size
The wavefront size is currently only exposed as an agent-level
attribute. This is not correct, because while the agent has a default
wavefront size that is usually correct, it can easily be overridden via
options like -mwavefrontsize64 on various ISAs. The wavefront size
attribute is actually more of a calling convention that is consistent
within a call graph. Because the root of each call graph is a kernel in
this architecture, we need to be able to query this on a per-kernel
basis. This information is already available in the kernel descriptor
packet, but it wasn't exported.

This patch adds HSA_CODE_SYMBOL_INFO_KERNEL_WAVEFRONT_SIZE as a new
option to query on the executable symbol.

Change-Id: I744815c89cc9d4c82f25479bdd48ae1f32e859ff


[ROCm/ROCR-Runtime commit: 9e26cbac14]
2024-02-09 15:55:30 +00:00
Jonathan Kim 8ac93cff2e Minimize FD creation on IPC Create
Instead of caching shared memory FDs for export on the exporter side,
only export the FD in the async handler when requested.
The importer should request closure of the exported FD once the import is done.

Change-Id: I469e0cd1749beeb9c506c8a6461745fb039d9c3b


[ROCm/ROCR-Runtime commit: e911335cee]
2024-02-07 18:50:54 -05:00
Mythreya 8c4d3fc62f Fix ToolsApiTable versioning
ToolsApiTable's version was incorrectly default-initialized to 0.
Fixes an error introduced in commit fc889669.

Change-Id: I41e9301a9c33b119ee50f6164d21ddf11dc188c4


[ROCm/ROCR-Runtime commit: 8e312471dc]
2024-02-07 17:02:32 -05:00
Shweta Khatri 7a310f2e91 Set max_alloc to 95%, reduce by 1% on fail
Prevents triggering the OOM killer if all physical and swap memory gets fully used.
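A sketch of the back-off described above, assuming "reduce by 1%" means stepping down one percentage point per failed attempt. `reserve_with_backoff`, the `reserve_fn` callback, and the `reserve_below` stub are hypothetical names, not ROCr's code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reservation callback: returns true if `bytes` could be
 * reserved without exhausting physical and swap memory. */
typedef bool (*reserve_fn)(uint64_t bytes, void *ctx);

/* Start at 95% of `total` and drop one percentage point after each failed
 * attempt. Returns the reserved size (0 if every attempt failed) and
 * optionally reports the percentage that succeeded. */
static uint64_t reserve_with_backoff(uint64_t total, reserve_fn reserve,
                                     void *ctx, unsigned *pct_out) {
  for (unsigned pct = 95; pct > 0; --pct) {
    uint64_t bytes = total / 100 * pct;
    if (reserve(bytes, ctx)) {
      if (pct_out) *pct_out = pct;
      return bytes;
    }
  }
  if (pct_out) *pct_out = 0;
  return 0;
}

/* Stub for illustration: succeeds when bytes <= *(uint64_t *)ctx. */
static bool reserve_below(uint64_t bytes, void *ctx) {
  return bytes <= *(const uint64_t *)ctx;
}
```

Starting below 100% leaves headroom for the rest of the system, and the step-down avoids failing outright when 95% is still too much.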

Change-Id: I70d558fa9c06fe6217e62d57e11aec6a089aa0bb


[ROCm/ROCR-Runtime commit: 13800cc6d5]
2024-02-07 14:46:58 -05:00
David Yat Sin 1f38927b0d VMM: Allow non-contiguous memory maps
Adjust the code to allow non-contiguous chunks of memory to be
mapped within a single VA range.

Change-Id: Ida21ba202927229347b3a32d9b7106df10819cf5


[ROCm/ROCR-Runtime commit: f7de85082e]
2024-02-07 16:56:52 +00:00
David Yat Sin 2ab36a8f08 rocrtst: Add some tests for hsa_amd_pointer_info
Add tests to catch whether ROCr breaks ABI compatibility with the
hsa_amd_pointer_info API in case the hsa_amd_pointer_info struct is
extended.

Change-Id: I4e69bf30db9791e59f895b2798b87985c41242e5


[ROCm/ROCR-Runtime commit: 776da1a3f7]
2024-02-07 16:56:52 +00:00
David Yat Sin 1d9eaae944 Improve documentation for set_async_scratch_limit API
Change-Id: I03ca986cdd468c7b167e119bd2f25d5c79ff2142


[ROCm/ROCR-Runtime commit: 0f30da58a7]
2024-02-07 16:56:52 +00:00
Mythreya e5d4513c7b Initial support for scratch allocation tracking
Add a new tools table and functions to notify tools when a scratch allocation event occurs.

Change-Id: I47f0c2f3c8e02d7bcb74d649903eb4f86721c154


[ROCm/ROCR-Runtime commit: a67af3807f]
2024-02-07 16:56:52 +00:00
Joseph Greathouse 30a05ed11e Fix undefined behavior in definition of hsa_amd_memory_fault_reason_t
Currently, the definition of hsa_amd_memory_fault_reason_t tries to
set a constant of 0x8000_0000 by using the definition "1 << 31".

However, the 1 in this definition is a signed integer by C++ rules.
On our architectures, shifting a signed integer by 31 results in
signed integer overflow. Signed integer overflow results in
undefined behavior.

Forcing the 1 to be unsigned avoids this.
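A minimal illustration of the fix. The macro and function names are hypothetical and do not reproduce the actual header; the point is only that the literal must be unsigned before shifting into bit 31:

```c
#include <stdint.h>

/* "1 << 31" shifts a signed int into its sign bit: undefined behavior in C
 * and in C++ before C++20. "1u << 31" is well defined and equals 0x80000000.
 * (Hypothetical name; not the real hsa_amd_memory_fault_reason_t entry.) */
#define EXAMPLE_FAULT_REASON_HIGH_BIT (1u << 31)

static uint32_t example_high_bit(void) {
  return EXAMPLE_FAULT_REASON_HIGH_BIT;
}
```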

Change-Id: I860431eeede4eff29598f646abf3c1337b048d71


[ROCm/ROCR-Runtime commit: 1d6691e06b]
2024-02-07 16:56:52 +00:00
Jonathan Kim d9d10761f5 Fix copy logic on devices with no xgmi SDMAs
Fix the gang factor being overwritten to 0 when there are no xGMI SDMAs
on the device and the gang factor is 1.

Change-Id: I041d4b4ae87fb68f224ee4dedb758c6f06c022a9


[ROCm/ROCR-Runtime commit: 1dd4a7dc18]
2024-02-07 16:56:52 +00:00
David Belanger 9345569ac7 kfdtest: Updated CWSR test for emulation
Added the global flag g_IsEmuMode and set it when running under an emulator.
Adjusted delays in KFDCWSRTest for the emulator.

Change-Id: Ia5c0be40816ac2219add943e306ee16438f5b852
Signed-off-by: David Belanger <david.belanger@amd.com>


[ROCm/ROCR-Runtime commit: 3dd98d075f]
2024-02-06 20:56:32 -05:00
Jonathan Kim 39e3120f44 Fix IPC import on device memory with no requested nodes
Users can import device memory without specifying the target node.
DMA buf imports return a Thunk handle that's not useful for
GPU mapping calls.

Fix this by using the import node information to re-import and
map with the correct target GPU.

Also fix IPC detach calls by deregistering the Thunk handle
import immediately during attach, instead of failing to do it later
on detach, since Thunk handles aren't placed in the ROCr allocation
map.

Finally, refactor the IPC attach function for a cleaner logic flow.

Change-Id: Ib2bf178110b2be98bd6917c765f724e4e613f5f2


[ROCm/ROCR-Runtime commit: a3efd13a2f]
2024-02-06 23:15:29 +00:00
Jonathan Kim 976cf45b0c Fix DMABuf FD closure for IPC attach client
We should also close the client-side dmabuf FD after importing for target
nodes.

Change-Id: I74f61dd65bebb03dc002f5df7301efd1ef8d9603


[ROCm/ROCR-Runtime commit: 15691ae460]
2024-02-06 23:15:29 +00:00
Jonathan Kim 15127c6f85 Optimize and fix SDMA gang copies
Optimizations include:
- Greedy gang by placing gang leaders on the first D2D SDMA blit context
to avoid deadlocking with other gang leaders and items. Note that
this is fine since we can't avoid an oversubscription problem when
there is only one xGMI link anyway, so treat all xGMI links as a single
pipe for ganging.
- Non-leader gang items don't have to poll on dependency signals, so this
opens up more non-blocking SDMA channels.
- Unlock the gang lock when gangs are not needed.
- Change the gang factor lookup from a vector of pairs to a map and register all
GPUs in the gang factor lookup regardless of link type, so that we can take
advantage of the O(log N) key/value lookup time.

Fixes include:
- HSA_PAGE_SIZE_4KB was the wrong macro to use for the gang size limit.
As a result, small copies ended up ganging and hitting the latency limit.
Use a hardcoded 4096 bytes instead.
- Cap the auxiliary gang factor to the number of non-xGMI SDMA engines.

Change-Id: Ic23fde131502906a807134a04599aa6d012e8cbb


[ROCm/ROCR-Runtime commit: 62f3f250ce]
2024-01-25 10:42:27 -05:00
James Zhu 72775ad4e4 kfdtest: change Largest Buffer search algorithm
The old Largest Buffer search used a binary search to find the last
successful memory allocation. However, each successful allocation takes
time, while an unsuccessful allocation returns very quickly. Changing the
search to find the first successful allocation, starting from MAX and
stepping down by the granularity interval at each test step, speeds up
this test.
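A sketch of the downward scan, with the allocation call abstracted behind a hypothetical callback so the search logic stands alone. `largest_buffer_from_max`, `try_alloc_fn`, and the `probe_below_limit` stub are illustrative names, not the test's actual code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical probe: returns true if a buffer of `size` bytes can be
 * allocated (and is then freed). In the real test this is a driver call. */
typedef bool (*try_alloc_fn)(uint64_t size, void *ctx);

/* Scan downward from `max` in steps of `granularity` and return the first
 * size that succeeds, or 0 if none does. Failures are assumed cheap and
 * successes expensive, so this performs exactly one successful allocation,
 * whereas a binary search may perform many. */
static uint64_t largest_buffer_from_max(uint64_t max, uint64_t granularity,
                                        try_alloc_fn try_alloc, void *ctx) {
  if (granularity == 0) return 0;
  uint64_t size = max - (max % granularity);  /* align down */
  for (; size >= granularity; size -= granularity) {
    if (try_alloc(size, ctx)) return size;
  }
  return 0;
}

/* Stub for illustration: succeeds when size <= *(uint64_t *)ctx. */
static bool probe_below_limit(uint64_t size, void *ctx) {
  return size <= *(const uint64_t *)ctx;
}
```

The trade-off is more probes overall when the answer is far below MAX, in exchange for avoiding repeated slow successful allocations.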

Change-Id: I07daea05423c33e72a483f0013e8ea1b5dabf989
Signed-off-by: James Zhu <James.Zhu@amd.com>


[ROCm/ROCR-Runtime commit: f75fddb9bd]
2024-01-19 10:42:40 -05:00
James Zhu 399abc66f7 rocrtst: change max memory search algorithm.
The old max memory search used a binary search to find the last
successful memory allocation. However, each successful allocation takes
time, while an unsuccessful allocation returns very quickly. Changing the
search to find the first successful allocation, starting from MAX and
stepping down by the granularity interval at each test step, speeds up
this test.

Change-Id: Idada3c6f750c94f3bb223f4f3bff4e4ebd3e98f7
Signed-off-by: James Zhu <James.Zhu@amd.com>


[ROCm/ROCR-Runtime commit: caedadcc6f]
2024-01-18 13:46:44 -05:00
Sam Wu b9a5201cbc Apply doc standards for ReadtheDocs builds
Applies the following changes:
- add version number to documentation left navigation bar and page title
- add an "About" section with a license page
- enable htmlzip, pdf, epub formats when publishing on Read the Docs
- set pdf title, author, copyright, and version
- rename .sphinx/.doxygen to sphinx/doxygen
- remove docBin from URL
- update rocm-docs-core dependency

Change-Id: I947cf32cd42d9f4e55b1ddd324ad4a7e4ba3f3e3


[ROCm/ROCR-Runtime commit: 1c6ad56dc6]
2024-01-18 12:07:27 -05:00
David Yat Sin 8450291bac VMM: rocrtst for exporting/importing dmabuf
This is part of the patch series for the Virtual Memory API.

Change-Id: I1f1357a39b48b0d0611967ce9dd0b83b6a8db864


[ROCm/ROCR-Runtime commit: 84c30dd735]
2024-01-17 10:25:20 -05:00
David Yat Sin 99ced0140e VMM: rocrtst for basic virtual memory APIs
This is part of the patch series for the Virtual Memory API.

Change-Id: Ic3b44435cb09ad17d833b4a4b2551bd211b494e9


[ROCm/ROCR-Runtime commit: a69c1e9f39]
2024-01-17 10:25:09 -05:00
David Yat Sin 6d86fe02f5 VMM: Use emplace when adding entries
Use emplace to prevent copying the MappedHandle objects when inserting
entries into mapped_handle_map_.

Change-Id: Id3f40f1eb73ce30e62da53c5aea4dd715e83ac59


[ROCm/ROCR-Runtime commit: 32b3a3c299]
2024-01-17 10:25:04 -05:00
David Yat Sin 7c35d797d9 VMM: Fix flags when allocating memory handle
When allocating a memory handle, the NoAddress thunk flag should be set
so that this allocation does not have a virtual address range.
Also, skip mapping the memory when allocating a memory handle.

Change-Id: I1c168bc00ddbc158d447197c4dc25f96bad02b19


[ROCm/ROCR-Runtime commit: 29efd8eccd]
2024-01-17 10:24:58 -05:00
David Yat Sin a8664a7471 VMM: Default access should be none
After a memory handle is created, hsa_amd_vmem_get_access should return
HSA_ACCESS_PERMISSION_NONE instead of reporting the allocation as
invalid.

Change-Id: I1a09d15c220d48497d09c89059493e538f82aeb9


[ROCm/ROCR-Runtime commit: 2f97049da5]
2024-01-17 10:24:51 -05:00
David Yat Sin 01ae507bf2 VMM: Fix access for multi-GPU
When using multiple GPUs, a new dmabuf_fd needs to be imported into
libdrm for each BO.

Change-Id: Iaa2415c8f655a1ce8e92b0878517a11ff014a1d5


[ROCm/ROCR-Runtime commit: 8b85f9e668]
2024-01-17 10:24:35 -05:00
Jonathan R. Madsen 4f7dfe87d2 Suppress reporting no tools were found with rocprofiler-register
Change-Id: If853517d40e073202d12e2a6b16fb54be5529650


[ROCm/ROCR-Runtime commit: 8f0ea44c09]
2024-01-17 01:01:19 -05:00
David Yat Sin 68522c65a9 HSA_USE_SVM to override SVMAPISupported node prop
When HSA_USE_SVM is 0, the thunk uses the non-SVM path, but upper layers still
use the SVM path, which is not the intended behavior.

Suggested-by: Lang Yu <Lang.Yu@amd.com>
Change-Id: I1ae0b4faa2f8af5ec69a81cfeb7661bd47d739d4


[ROCm/ROCR-Runtime commit: 0accd17b6e]
2024-01-16 22:44:38 -05:00
Jonathan Kim 7ac11c41af Enable IPC DMA buf
Set HSA_ENABLE_IPC_MODE_LEGACY off (i.e. use DMA bufs implementation
by default).

Change-Id: I7b1c6cb7d19310adf6f0bfe060736f4adbf7adc2


[ROCm/ROCR-Runtime commit: e20f41df62]
2024-01-16 22:43:27 -05:00
Jonathan Kim 3c63cf521b Change IPC implementation to use DMA Bufs
As the KFD IPC IOCTLs will not be upstreamed, change the runtime
implementation to use DMA bufs.

DMA buf fds will be passed over abstract unix domain sockets.
The exporter spins up a thread that creates a socket server.
The importer connects to the server to fetch the fd.
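Passing an FD between processes over a Unix domain socket is done with an `SCM_RIGHTS` control message. The sketch below shows only that step over an already-connected socket; the abstract-namespace addressing (a `sun_path` that starts with a NUL byte) and the server thread are not shown, and the helper names `send_fd`/`recv_fd` are assumptions rather than ROCr functions:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one FD over a connected AF_UNIX socket. At least one data byte must
 * accompany the control message. Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd) {
  char byte = 0;
  struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
  union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } u;
  memset(&u, 0, sizeof u);
  struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                        .msg_control = u.buf, .msg_controllen = sizeof u.buf };
  struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
  c->cmsg_level = SOL_SOCKET;
  c->cmsg_type = SCM_RIGHTS;
  c->cmsg_len = CMSG_LEN(sizeof(int));
  memcpy(CMSG_DATA(c), &fd, sizeof(int));
  return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one FD sent by send_fd. The kernel installs a new descriptor in
 * this process that refers to the same open file. Returns it, or -1. */
static int recv_fd(int sock) {
  char byte;
  struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
  union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } u;
  struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                        .msg_control = u.buf, .msg_controllen = sizeof u.buf };
  if (recvmsg(sock, &msg, 0) != 1) return -1;
  struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
  if (!c || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS) return -1;
  int fd;
  memcpy(&fd, CMSG_DATA(c), sizeof(int));
  return fd;
}
```

The same pair works for a dmabuf FD; a pipe or `socketpair()` is enough to try it within a single process.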

libDRM will be required to do a manual import and GPU map for
memory that is not already imported and mapped.

For now, use the legacy IPC implementation by default, as a
follow-on patch will disable the HSA_ENABLE_IPC_MODE_LEGACY
environment variable.

Change-Id: Ifd8469e9adfc81f8a1ea78d6010fb10b515ba1b4


[ROCm/ROCR-Runtime commit: 5dfebdbca9]
2024-01-16 22:43:00 -05:00
David Yat Sin ba59a36d8a Use HybridMutex for IPC locks
Change-Id: I24ab4a96237612a7d32beda06cc20b25cb1f0b37


[ROCm/ROCR-Runtime commit: 0e3f668e2c]
2024-01-16 21:29:39 +00:00
David Yat Sin cbe9337918 Use HybridMutex for signal mutexes
Implement HybridMutex to improve latencies compared to KernelMutex when
there is contention between several threads calling hsa_signal_create
and hsa_amd_signal_async_handler.
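A generic spin-then-block lock can be sketched on top of pthreads as below. This illustrates the general hybrid technique only; the retry count, the use of `sched_yield()` between attempts, and the names are assumptions and do not describe ROCr's HybridMutex implementation:

```c
#include <pthread.h>
#include <sched.h>

#define HYBRID_SPIN_TRIES 100  /* arbitrary bound chosen for this sketch */

struct hybrid_mutex { pthread_mutex_t m; };

/* Try to take the mutex without sleeping a bounded number of times (cheap
 * when the holder releases quickly), then fall back to a blocking lock so
 * that long waits do not burn CPU. */
static void hybrid_lock(struct hybrid_mutex *h) {
  for (int i = 0; i < HYBRID_SPIN_TRIES; ++i) {
    if (pthread_mutex_trylock(&h->m) == 0) return;
    sched_yield();  /* give the current holder a chance to finish */
  }
  pthread_mutex_lock(&h->m);
}

static void hybrid_unlock(struct hybrid_mutex *h) {
  pthread_mutex_unlock(&h->m);
}
```

The design bet is that short critical sections (such as signal creation) usually finish within the spin phase, avoiding the cost of a kernel-level sleep and wake-up.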

Change-Id: If53377033e749b0050727964c9303f09b02527cc


[ROCm/ROCR-Runtime commit: 8d3fee5095]
2024-01-16 21:29:39 +00:00