Commit graph

899 commits

Author SHA1 Message Date
David Yat Sin 32b3a3c299 VMM: Use emplace when adding entries
Use emplace to prevent copying the MappedHandle objects when inserting
entries into mapped_handle_map_.

Change-Id: Id3f40f1eb73ce30e62da53c5aea4dd715e83ac59
2024-01-17 10:25:04 -05:00
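
The effect of switching to emplace can be illustrated with a minimal sketch. `HandleSketch` is a hypothetical stand-in for the runtime's MappedHandle (whose real definition is not shown in this log); it counts copy constructions so the two insertion styles can be compared.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <utility>

// Hypothetical stand-in for MappedHandle; counts how often it is copied.
struct HandleSketch {
  static int copies;
  std::size_t size;
  explicit HandleSketch(std::size_t s) : size(s) {}
  HandleSketch(const HandleSketch& other) : size(other.size) { ++copies; }
};
int HandleSketch::copies = 0;

// Inserting a ready-made pair copies the handle at least once.
inline int CopiesWithInsert() {
  HandleSketch::copies = 0;
  std::map<int, HandleSketch> handles;
  HandleSketch h(4096);
  handles.insert(std::make_pair(1, h));
  assert(handles.size() == 1);
  return HandleSketch::copies;
}

// emplace with piecewise_construct builds the handle directly in the map node.
inline int CopiesWithEmplace() {
  HandleSketch::copies = 0;
  std::map<int, HandleSketch> handles;
  handles.emplace(std::piecewise_construct, std::forward_as_tuple(1),
                  std::forward_as_tuple(std::size_t{4096}));
  assert(handles.size() == 1);
  return HandleSketch::copies;
}
```

With this sketch, `CopiesWithEmplace()` reports no copies, while `CopiesWithInsert()` reports at least one.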
David Yat Sin 29efd8eccd VMM: Fix flags when allocating memory handle
When allocating a memory handle, the NoAddress thunk flag should be set
so that this allocation does not have a virtual address range.
Also, skip mapping the memory when allocating a memory handle

Change-Id: I1c168bc00ddbc158d447197c4dc25f96bad02b19
2024-01-17 10:24:58 -05:00
David Yat Sin 2f97049da5 VMM: Default access should be none
After a memory handle is created, hsa_amd_vmem_get_access should return
HSA_ACCESS_PERMISSION_NONE instead of reporting the allocation as
invalid.

Change-Id: I1a09d15c220d48497d09c89059493e538f82aeb9
2024-01-17 10:24:51 -05:00
David Yat Sin 8b85f9e668 VMM: Fix access for multi-GPU
When using multi-GPU, a new dmabuf_fd needs to be imported into libdrm
for each BO.

Change-Id: Iaa2415c8f655a1ce8e92b0878517a11ff014a1d5
2024-01-17 10:24:35 -05:00
Jonathan R. Madsen 8f0ea44c09 Suppress reporting no tools were found with rocprofiler-register
Change-Id: If853517d40e073202d12e2a6b16fb54be5529650
2024-01-17 01:01:19 -05:00
Jonathan Kim e20f41df62 Enable IPC DMA buf
Set HSA_ENABLE_IPC_MODE_LEGACY off (i.e. use DMA bufs implementation
by default).

Change-Id: I7b1c6cb7d19310adf6f0bfe060736f4adbf7adc2
2024-01-16 22:43:27 -05:00
Jonathan Kim 5dfebdbca9 Change IPC implementation to use DMA Bufs
As the KFD IPC IOCTLs will not be upstreamed, change runtime
implementation to use DMA bufs.

DMA buf fds will be passed over abstract unix domain sockets.
The exporter spins a thread that creates a socket server.
The importer connects to the server to fetch the fd.

libDRM will be required to do a manual import and GPU map for
memory that is not already imported and mapped.

For now, use the legacy IPC implementation by default, as a
follow-on patch will disable the HSA_ENABLE_IPC_MODE_LEGACY
environment variable.

Change-Id: Ifd8469e9adfc81f8a1ea78d6010fb10b515ba1b4
2024-01-16 22:43:00 -05:00
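
Passing a file descriptor over a Unix domain socket, as described above, relies on SCM_RIGHTS ancillary data. The following is a minimal sketch of that mechanism only: `SendFd` and `RecvFd` are hypothetical helper names, and the sketch works on any already-connected Unix domain socket (for example one from socketpair) instead of reproducing the runtime's abstract-socket server thread.

```cpp
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>
#include <cassert>
#include <cstring>

// Sends `fd` over a connected Unix domain socket as SCM_RIGHTS ancillary data.
inline bool SendFd(int sock, int fd) {
  char byte = 'F';  // one byte of ordinary data accompanies the descriptor
  iovec iov{&byte, 1};
  alignas(cmsghdr) char control[CMSG_SPACE(sizeof(int))] = {};
  msghdr msg{};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = sizeof(control);
  cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(int));
  std::memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
  return sendmsg(sock, &msg, 0) == 1;
}

// Receives a descriptor sent by SendFd; returns -1 on failure.
inline int RecvFd(int sock) {
  char byte = 0;
  iovec iov{&byte, 1};
  alignas(cmsghdr) char control[CMSG_SPACE(sizeof(int))] = {};
  msghdr msg{};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = sizeof(control);
  if (recvmsg(sock, &msg, 0) != 1) return -1;
  cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  if (cmsg == nullptr || cmsg->cmsg_level != SOL_SOCKET ||
      cmsg->cmsg_type != SCM_RIGHTS) {
    return -1;
  }
  int fd = -1;
  std::memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
  return fd;
}
```

The received descriptor is a new descriptor in the receiving process that refers to the same underlying object, which is what allows a DMA buf exported by one process to be imported by another.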
David Yat Sin 0e3f668e2c Use HybridMutex for IPC locks
Change-Id: I24ab4a96237612a7d32beda06cc20b25cb1f0b37
2024-01-16 21:29:39 +00:00
David Yat Sin 8d3fee5095 Use HybridMutex for signal mutexes
Implement HybridMutex to improve latencies compared to KernelMutex when
there is contention between several threads calling hsa_signal_create
and hsa_amd_signal_async_handler.

Change-Id: If53377033e749b0050727964c9303f09b02527cc
2024-01-16 21:29:39 +00:00
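
The commit message does not describe how HybridMutex is built. One common reading of "hybrid" is a lock that spins briefly before falling back to a blocking wait; the sketch below shows that general idea under that assumption. `HybridMutexSketch` and `kSpinLimit` are hypothetical names, not the runtime's implementation.

```cpp
#include <cassert>
#include <mutex>

// Assumed design: try a bounded number of non-blocking acquisitions first,
// then fall back to a blocking lock if the mutex stays contended.
class HybridMutexSketch {
 public:
  void lock() {
    for (int i = 0; i < kSpinLimit; ++i) {
      if (mutex_.try_lock()) return;  // fast path: acquired without sleeping
    }
    mutex_.lock();  // slow path: block until the owner releases the lock
  }
  void unlock() { mutex_.unlock(); }

 private:
  static constexpr int kSpinLimit = 1000;  // arbitrary illustrative bound
  std::mutex mutex_;
};

// Works with std::lock_guard because it provides lock() and unlock().
inline int GuardedIncrement(HybridMutexSketch& m, int& counter) {
  std::lock_guard<HybridMutexSketch> guard(m);
  return ++counter;
}
```

Short critical sections, such as those around signal creation, are the case where a brief spin can avoid the cost of putting a thread to sleep.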
David Yat Sin 3d1563ee68 Force t1_ update when profiling is enabled
Fixes issue where t1_ counters may not be updated when doing dispatch
profiling, causing a divide by 0.

Change-Id: I91060ac3f9fd2183d277e6e7cd810398a453a87f
2024-01-16 21:29:39 +00:00
David Yat Sin d16c6db2ee Increase min KFD version for Virtual mem support
KFD had some fixes for handling of virtual memory APIs. These fixes are
included in interface version 1.15.

Change-Id: Ie701eccf6e032f9ec0a1f4e8a43718964eebddc6
2024-01-16 21:29:39 +00:00
Joseph Huber 4971150576 Improve endianness check
Update the `hsa.h` header to use the gcc / clang `__BYTE_ORDER__`
macros where available to more accurately autodetect endianness for
the target.

Change-Id: I7312f3badcba9287a30eb14882b91e2a247acc5f
2024-01-16 21:29:39 +00:00
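
A compile-time check built on the gcc / clang `__BYTE_ORDER__` macros could look roughly like the sketch below. The `SKETCH_*` macro names and the little-endian fallback are assumptions for illustration; they are not the definitions used in `hsa.h`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    defined(__ORDER_BIG_ENDIAN__)
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define SKETCH_LITTLE_ENDIAN 1
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define SKETCH_BIG_ENDIAN 1
#else
#error "Unsupported byte order"
#endif
#else
// Assumed fallback when the compiler does not provide the macros.
#define SKETCH_LITTLE_ENDIAN 1
#endif

// Result of the preprocessor-based detection.
inline bool CompileTimeLittleEndian() {
#ifdef SKETCH_LITTLE_ENDIAN
  return true;
#else
  return false;
#endif
}

// Runtime probe used only to cross-check the compile-time answer.
inline bool RuntimeLittleEndian() {
  const std::uint32_t probe = 1;
  unsigned char first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}
```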
Lancelot SIX 6f828d8609 Revert "trap_handler: Set status.skip_export when halting a wave"
This reverts commit c5db063b2f.  This
change is required for the runtime to generate reliable core dump files,
but this feature has been disabled for now by
5e3be9c28a.  Until it is needed, revert
the ABI change in the trap handler to maintain compatibility with older
debuggers.

Change-Id: I77a1562dc7962befe2bf88442df858e2d2b1c5ab
2024-01-16 15:55:59 +00:00
Ruili Ji 4b69351394 To fix sdma segment fault for error address
pad_size address shall start from command_addr not
(command_addr + total_command_size)

Change-Id: I3d8491986caf2d4d5dc41b1d90286c21e7c0a457
2023-12-25 09:31:13 +08:00
Alex Sierra 5e3be9c28a Revert "core dump: Generates a core dump from a fault event"
This reverts commit 803e37ded5.
This commit disables the core dump feature. Apparently, gfx1101 SA1 waves
cannot enter the trap handler because they receive an invalid
address. However, core dump at the debugger has been moved to ROCm
6.2.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I7915caf58118658e5e7f435f91a0a6216d2fdb42
2023-12-18 17:30:13 -06:00
David Yat Sin 6333fdecf3 Use pthread_setaffinity_np
On some systems, pthread_attr_setaffinity_np does not exist, so we need
to use pthread_setaffinity_np on the thread after pthread_create.

Provided by Julian Samaroo on github

https://github.com/RadeonOpenCompute/ROCR-Runtime/pull/143
Change-Id: I4649f94333f2d7b0a5993b370a4bfc48d92acecb
2023-12-18 17:41:49 -05:00
David Yat Sin 9b2ed66609 Fix README for invalid command
`-DCMAKE_INSTALL_PATH` is not valid; use `-DCMAKE_INSTALL_PREFIX` instead.

https://github.com/RadeonOpenCompute/ROCR-Runtime/pull/171/

Suggested-by: fjh1997 on github
Change-Id: Ibb85da7fe755b662fa9a836d6fbe3394d34a0337
2023-12-18 09:15:05 -05:00
David Yat Sin c86837d8d6 Add query for agent memory and aql ext properties
Add query to return flags for GPU agent memory properties and AQL
extensions.

Implement flag to determine that GPU agent is an APU

Change-Id: Ic04c51290b2b9763e14989c117f35a2e22297453
2023-12-07 14:41:37 -05:00
Lancelot SIX c5db063b2f trap_handler: Set status.skip_export when halting a wave
When inspecting waves on architectures where SPI may not initialize TTMP
registers, the debugger cannot reliably know if the trap handler was
entered and if it saved valuable information in TTMP registers.

This patch uses the status.skip_export bit (unused by the compute
shaders) to indicate that it got executed before halting a wave.
This is done except for gfx940, where ttmp11[31] can be used (as long as
TTMP registers are always initialized by SPI for this architecture).  It
could be possible to be more selective as architectures always
initializing TTMP registers do not require this step, but always doing
it makes maintenance simpler.

Change-Id: I314db6b37772f7daa8bd405e6662a86658d3f5e0
2023-12-06 21:20:03 -05:00
David Yat Sin ed1b0b9b1a Add queries for HSA Ext interface version
Change-Id: I26860fb1364cd3a33cdc9b284ac807b2702bb241
2023-12-06 13:58:52 -05:00
Alex Sierra 803e37ded5 core dump: Generates a core dump from a fault event
Extracts and creates a core dump ELF file from a fault event, using
core dump front end.

Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: Ibbbe41b3d13dd3fcb90161e927d48c329cf513a9
2023-12-05 23:19:14 -05:00
Alex Sierra 54604654bd reports KFD core dump support through hsakmt API
Member added to KFDVersion to report if KFD supports core dump
mechanism. This is done through hsaKmtRuntimeEnable API call while
the topology is being built. It also dictates if core dump will be
generated by either KFD or hsa-runtime.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I2e9d4166563402f78613d728446feb692c52d9d1
2023-12-05 23:19:14 -05:00
Alex Sierra 91f2a70817 core dump: ulimit check mechanism added
Core dump generation considers ulimit to generate the proper size
file.

Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I61d991fc003b173f9075b66bff6a931447720695
2023-12-05 23:19:14 -05:00
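
How the runtime applies the limit is not spelled out in the commit message. A minimal sketch of one plausible approach is to read RLIMIT_CORE with getrlimit and cap the requested size; `ClampToLimit` and `AllowedCoreSize` are hypothetical helper names.

```cpp
#include <sys/resource.h>
#include <cassert>
#include <cstdint>

// Caps `wanted` bytes at `limit`, treating RLIM_INFINITY as "no cap".
inline std::uint64_t ClampToLimit(std::uint64_t wanted, rlim_t limit) {
  if (limit == RLIM_INFINITY) return wanted;
  const std::uint64_t cap = static_cast<std::uint64_t>(limit);
  return wanted < cap ? wanted : cap;
}

// Returns how many bytes of core file the current soft limit allows.
inline std::uint64_t AllowedCoreSize(std::uint64_t wanted) {
  rlimit lim{};
  if (getrlimit(RLIMIT_CORE, &lim) != 0) return 0;  // be conservative on error
  return ClampToLimit(wanted, lim.rlim_cur);
}
```

With `ulimit -c 0`, `AllowedCoreSize` returns 0 in this sketch, which corresponds to writing no core file at all.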
Alex Sierra 514b222368 core dump: Front end core dump API
This API consists of one function to be called from a fault event at the
hsa-runtime to generate a core dump.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: Ib1b90d5beb13f93c4e8ebd21fd61705ebb12ca5d
2023-12-05 23:19:14 -05:00
Alex Sierra 1083d5c35f core dump: SegmentBuilder classes added
SegmentBuilder classes are used to get core dump data from the GPUs.
So far, it uses thunk API calls and smaps to collect all data from
the hardware.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I2ad70ca5a951885181d3142653b186b0f6be739e
2023-12-05 23:19:14 -05:00
Giovanni LB 71bc875ccd Adding coordinate query to aqlprofile
Change-Id: I9f2fee62a24cf2a4784ba9e8c813b7b7296d034b
2023-12-05 13:25:30 -05:00
Giovanni LB e8920cacc8 : Adding ATT API extension to aqlprofile
Change-Id: Ic511cf871d5d98638d7041ca277f945ae8ced3a5
2023-12-05 13:25:10 -05:00
Jonathan R. Madsen 27eb0516bb rocprofiler-register updates
- fix logic for using HSA_TOOLS_LIB when rocprofiler-register support is enabled
- report tool load failure for rocprofiler-register

Change-Id: Ife23aa3e6ed19174376cd694764583b73f8976cd
2023-12-04 11:44:58 -06:00
David Yat Sin 251601b20b Add RISC-V support
Patch provided by user Xeonacid via github:
https://github.com/RadeonOpenCompute/ROCR-Runtime/pull/172/

Change-Id: I5f9086b536383093e7995b9cfdc19dab213f0265
2023-12-04 15:05:22 +00:00
David Yat Sin f07b8f2250 Use CPU_SET_S instead of CPU_SET
Fix incorrect use of CPU_SET on variable size cpu_set_t

Suggested by Christopher E. Moore on github
https://github.com/RadeonOpenCompute/ROCR-Runtime/issues/130

Change-Id: I710b56683ba07c08dcd83c851bf72e4f127a0ad4
2023-12-04 15:05:22 +00:00
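
The difference matters because a fixed cpu_set_t only covers CPU_SETSIZE CPUs, while a set obtained from CPU_ALLOC has its own size that must be passed to the `_S` macros. A minimal sketch using the glibc macros (the helper name `SetCpuInDynamicSet` is hypothetical, and a Linux/glibc target is assumed):

```cpp
#include <sched.h>
#include <cassert>
#include <cstddef>

// Allocates a CPU set sized for `num_cpus`, sets `cpu` in it, and reports
// whether exactly that one CPU ended up in the set.
inline bool SetCpuInDynamicSet(int cpu, int num_cpus) {
  cpu_set_t* set = CPU_ALLOC(num_cpus);
  if (set == nullptr) return false;
  const std::size_t bytes = CPU_ALLOC_SIZE(num_cpus);
  CPU_ZERO_S(bytes, set);
  CPU_SET_S(cpu, bytes, set);  // silently ignored if cpu is out of range
  const bool is_set = CPU_ISSET_S(cpu, bytes, set) != 0;
  const int count = CPU_COUNT_S(bytes, set);
  CPU_FREE(set);
  return is_set && count == 1;
}
```

For example, CPU 1500 fits in a set allocated for 2048 CPUs even though it is beyond the range of a fixed-size cpu_set_t.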
Giovanni LB e0c6c5e5bf Extending AQLprofile API to include counter dimensions
Change-Id: If59489a085959f3f765a30e3e445df5151e30350
2023-12-04 15:05:22 +00:00
David Yat Sin a7a3358067 Implement alternate scratch
The alternate scratch memory is used for dispatches that have a low
number of waves but relatively large wave size.
This allows us to keep the tmpring_size.bits.WAVES field of the main
scratch to full occupancy.

Change-Id: I32d240fac4b7d38200d1eebc1b0fdc8a823920d3
2023-12-04 15:05:22 +00:00
David Yat Sin dca8f3a21d Implement async scratch reclaim
For devices where the CP FW supports asynchronous scratch reclaim, ROCr
is able to claw-back scratch memory that was assigned to an AQL queue.
With that ability, ROCr does not have to rely on using USO
(use-scratch-once) when assigning large amounts of memory to a queue.
If we reach a situation where we are running low on device memory, ROCr
will attempt to claw-back the scratch memory.

Change-Id: Iddf8ec84e37ab8b9fdc58bafbe2b61fe2acb6eb7
2023-12-04 15:05:22 +00:00
David Yat Sin 64070a9acc Refactor scratch handler function
Separate the event handler and scratch handler portions of the code into
separate functions.

Change-Id: Ifdb7461e816b0f2d3c1c0a74d6f020b4d6fc736c
2023-12-04 15:05:22 +00:00
David Yat Sin fa317f8c41 Re-arrange and rename scratch elements that are used with main scratch
Change-Id: I4c1ff8cf4121a06b586fe49c70400226506bf95e
2023-12-04 15:05:22 +00:00
David Yat Sin 0344c8c0b6 Update queue structure to support async reclaim
Update queue structure to add members required for asynchronous reclaim
mechanism and dual-scratch. CP will set the AMD_QUEUE_CAPS_ASYNC_RECLAIM
bit on queue-connect to indicate whether the new features are supported.

The new members are ignored by previous versions of CP FW

Change-Id: Ic8e9ef41c5b1d04f09b43bc9b44b31527863d10f
2023-12-04 15:05:22 +00:00
Shweta Khatri acf9e95027 Revert "Restore default code object version usage for ROCr and ROCr Test"
This reverts commit 6ef7fcedd1290b59190f81df1d25142ecb05d282.

Change-Id: Icc0300c25a89fcb99287d013863a00ace7e12129
2023-12-04 15:03:31 +00:00
Lancelot SIX 6916ce358a trap_handler: Fix handling of debugtrap for gfx11
For gfx11, the trap_handler fails to recognize a trap id 3 and report
the exception to the debugger if the debugger is attached.

This is because the 2nd level trap handler looks for the DEBUG_ENABLED
bit in ttmp13 instead of ttmp11.  This bit is set by the 1st level trap
handler and is part of the 1st/2nd level trap handler ABI.

Change-Id: Ib36361f53d9bcbbed52320d8c3a9ab2c0b28c7cd
2023-12-04 15:03:31 +00:00
Lang Yu 991bbdcf24 Revert "Revert "Add support for GC 11.5.0 and 11.5.1""
This reverts commit ebc51dd0eb.

gfx1150/1151 is merged into mainline now.

Change-Id: Id179949318a37888c74abb5a8610d95bc2f22906
2023-12-04 15:03:31 +00:00
David Yat Sin 642165b1bc Increase scratch aperture size to 4GB per XCC
Change-Id: Ia02cea45ce8b782527f44fec539b0ab7cc453200
2023-12-04 15:03:31 +00:00
Jonathan Kim 81c64228e0 Increase SDMA copy size
SDMA4.4 and SDMA5.2+ have increased their available copy size to 2^30 bytes
represented by exponent as bits set in the COUNT field of the
linear copy.

Also note that the full 2^22 byte limit is available from SDMA4 onwards
as it has corrected the 0x3fffe0 HW limitation from SDMA3.

As the copy limit has increased, this can change system performance,
so provide env var HSA_ENABLE_SDMA_COPY_SIZE_OVERRIDE=0 to fall
back to the original 0x3fffe0 limit for debugging purposes.

Change-Id: I0fb6e5378f68e5b8a00ff559271691a943ee06ee
2023-12-04 15:03:31 +00:00
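
To give a feel for the effect of the larger limit, the sketch below counts how many linear-copy commands a transfer would need under each per-command limit. The limits come from the commit message; the function and constant names are hypothetical, and the real command builder may split copies differently.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kLegacyMaxCopy = 0x3fffe0;      // original limit
constexpr std::uint64_t kExtendedMaxCopy = 1ull << 30;  // SDMA4.4 / SDMA5.2+

// Picks the per-command limit; `override_enabled` models
// HSA_ENABLE_SDMA_COPY_SIZE_OVERRIDE being left at its default.
constexpr std::uint64_t SelectMaxCopy(bool hw_supports_extended,
                                      bool override_enabled) {
  return (hw_supports_extended && override_enabled) ? kExtendedMaxCopy
                                                    : kLegacyMaxCopy;
}

// Number of linear-copy commands needed to move `bytes` (ceiling division).
constexpr std::uint64_t NumCopyCommands(std::uint64_t bytes,
                                        std::uint64_t max_per_cmd) {
  return (bytes + max_per_cmd - 1) / max_per_cmd;
}
```

Under these assumptions a 1 GiB copy needs a single command with the 2^30 limit, compared with 257 commands under the 0x3fffe0 limit.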
Youssef Aly ae1da390bd Enabled profiling for CPU agents for memcpy activities
To trace memcpy asynchronously, both the dst and src agents need to have
profiling enabled, but the API for enabling profiling only enabled it for
GPU agents. CPU agents did not have profiling enabled, so the signal owner
could not be known, and hsa_amd_profiling_get_async_copy_time would fail
with an HSA status error because it could not read the agent for the given
signal.

Change-Id: Ie165e0e39b8fcd6992a55695b9ffcead10a8e812
2023-12-04 15:01:59 +00:00
Jonathan R. Madsen f9cf1852e5 rocprofiler-register support
- Update CMakeLists.txt
  - find_package for rocprofiler-register
    - this is an optional package until rocprofiler-register is added to the CI
  - define HSA_VERSION_{MAJOR,MINOR,PATCH} ppdefs
- Update runtime.cpp
  - include <rocprofiler-register/rocprofiler-register.h>
  - if rocprofiler-register succeeds, do not support v1 unless explicitly requested

Change-Id: I8f48bbf3f6b52fb91ddade2f198491a1256035fe
2023-12-04 15:01:59 +00:00
Jonathan Kim 2f847cf05f Restore default code object version usage for ROCr and ROCr Test
Remove override that forces ROCr image blit source and ROCr test to use
code object version 4 now that mainline has been updated to version 5.

Change-Id: I94681e86835c0e382475306ead4cd4132a2ee78f
2023-12-04 15:01:44 +00:00
David Yat Sin 750212e50e Handle HW_EXCEPTION events
Add handler to handle HW exception events reported by underlying
drivers. These events are generally caused by GPU resets and need the
application to abort.
As an improvement, in the future, we can provide additional information
about the exception (e.g. mode-reset level).

Change-Id: If3fb5f19f9fce181a9d3b5e34a5506725856e7b0
2023-11-20 14:49:26 +00:00
David Yat Sin 1a7de9588e Add LoongArch64 Support
Patch submitted by user Xinmudotmoe on github

Change-Id: I58fd035b4ec4856f20d63747ababd49fa9764348
2023-10-26 11:36:16 -04:00
Tony Tye 7955fb01ec Make AqlPacket::string more robust
AqlPacket::string should check the packet type is in range of the array
used to print its name.

Change-Id: I33dabbd941d086929526d842c9dbc0bd7305acd5
2023-10-18 12:54:36 -04:00
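
The kind of range check described above can be sketched as follows. `PacketTypeName` is a hypothetical helper, not the actual AqlPacket::string code; the table order follows the standard HSA packet type values (0 to 5).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Maps a packet type to a printable name, guarding against values that fall
// outside the name table instead of indexing past its end.
inline const char* PacketTypeName(std::uint8_t type) {
  static const char* const kNames[] = {
      "VENDOR_SPECIFIC", "INVALID",        "KERNEL_DISPATCH",
      "BARRIER_AND",     "AGENT_DISPATCH", "BARRIER_OR"};
  constexpr std::size_t kCount = sizeof(kNames) / sizeof(kNames[0]);
  return type < kCount ? kNames[type] : "UNKNOWN";
}
```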
Tony Tye 395ad3b77b AQL packet header may need to be loaded atomically
An AQL packet header field is stored using an atomic release, and needs
to be read using atomic acquire if it may be written by another thread.

Change-Id: I1d75587fd93f9c6216deebffc9a627b404a7e749
2023-10-18 12:54:36 -04:00
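
The release/acquire pairing described above can be sketched with std::atomic. `PacketSketch`, `Publish`, and `TryConsume` are hypothetical names, and the payload is reduced to a single field; real AQL packets are 64 bytes with a 16-bit header.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Simplified packet: the header is written last by the producer.
struct PacketSketch {
  std::atomic<std::uint16_t> header{1};  // 1 stands for "invalid" here
  std::uint32_t payload = 0;
};

// Producer: fill in the body, then publish it with a release store.
inline void Publish(PacketSketch& p, std::uint32_t payload,
                    std::uint16_t header) {
  p.payload = payload;
  p.header.store(header, std::memory_order_release);
}

// Consumer: an acquire load of the header makes the body written before the
// matching release store visible; a still-invalid header means "not ready".
inline bool TryConsume(const PacketSketch& p, std::uint16_t invalid_header,
                       std::uint32_t* out) {
  const std::uint16_t h = p.header.load(std::memory_order_acquire);
  if (h == invalid_header) return false;
  *out = p.payload;
  return true;
}
```

Reading the header with a plain load instead would give no guarantee that the rest of the packet is visible when another thread wrote it.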
Tony Tye 23b4ce501d Add AMD_AQL_FORMAT_INTERCEPT_MARKER vendor packet
Define AMD_AQL_FORMAT_INTERCEPT_MARKER AMD vendor AQL packet. Add
support to intercept queue to invoke a callback for these packets.

Change-Id: Ia58d5fe2171f563632b4edd6343e02585f49d149
2023-10-18 12:54:36 -04:00
Tony Tye b020f66d39 Prevent accessing packets outside intercept queue
When the intercept queue copies packets from the proxy queue to the
wrapped queue, it should not attempt to copy packets that are outside
the proxy queue. This could happen if the user of the proxy queue
advances the write pointer beyond the number of free slots and the
packet rewriter reduces the number of packets.

Change-Id: Id02f5df8aee0ed7269f4de813731d507cf2126b3
2023-10-18 12:54:36 -04:00