Commit Graph

2959 Commits

Author SHA1 Message Date
Joseph Huber 9e26cbac14 Add executable symbol info for the wavefront size
The wavefront size is currently only exposed as an agent level
attribute. This is not correct, because while the agent has a default
wavefront size that is usually correct, it can easily be overridden via
options like -mwavefrontsize64 on various ISAs. The wavefrontsize
attribute is actually more of a calling convention that is consistent
within a callgraph. Because the root of each call graph is a kernel in
this architecture, we need to be able to query this on a per-kernel
basis. This information is already available in the kernel descriptor
packet, but it wasn't exported.

This patch adds HSA_CODE_SYMBOL_INFO_KERNEL_WAVEFRONT_SIZE as a new
option to query on the executable symbol.

Change-Id: I744815c89cc9d4c82f25479bdd48ae1f32e859ff
2024-02-09 15:55:30 +00:00
Jonathan Kim e911335cee Minimize FD creation on IPC Create
Instead of caching shared memory fds for export on the exporter side,
only export the FD in the async handler when requested.
The importer should request export fd closure once import is done.

Change-Id: I469e0cd1749beeb9c506c8a6461745fb039d9c3b
2024-02-07 18:50:54 -05:00
Mythreya 8e312471dc Fix ToolsApiTable versioning
ToolsApiTable's version was incorrectly default initialized to 0.
Fixes error in commit fc889669

Change-Id: I41e9301a9c33b119ee50f6164d21ddf11dc188c4
2024-02-07 17:02:32 -05:00
Shweta Khatri 13800cc6d5 Set max_alloc to 95%, reduce by 1% on fail
Prevents triggering the OOM killer if all physical and swap memory gets fully used.

Change-Id: I70d558fa9c06fe6217e62d57e11aec6a089aa0bb
2024-02-07 14:46:58 -05:00
David Yat Sin f7de85082e VMM: Allow non-contiguous memory maps
Adjust code to allow the use of non-contiguous chunks of memory to be
mapped within a single VA range.

Change-Id: Ida21ba202927229347b3a32d9b7106df10819cf5
2024-02-07 16:56:52 +00:00
David Yat Sin 776da1a3f7 rocrtst: Add some tests for hsa_amd_pointer_info
Add tests to catch whether ROCr breaks ABI compatibility with the
hsa_amd_pointer_info API in case the hsa_amd_pointer_info struct is
extended.

Change-Id: I4e69bf30db9791e59f895b2798b87985c41242e5
2024-02-07 16:56:52 +00:00
David Yat Sin 0f30da58a7 Improve documentation for set_async_scratch_limit API
Change-Id: I03ca986cdd468c7b167e119bd2f25d5c79ff2142
2024-02-07 16:56:52 +00:00
Mythreya a67af3807f Initial support for scratch allocation tracking
Add new tools table and functions to notify in case of an event

Change-Id: I47f0c2f3c8e02d7bcb74d649903eb4f86721c154
2024-02-07 16:56:52 +00:00
Joseph Greathouse 1d6691e06b Fix undefined behavior in definition of hsa_amd_memory_fault_reason_t
Currently, the definition of hsa_amd_memory_fault_reason_t tries to
set a constant of 0x8000_0000 by using the definition "1 << 31".

However, the 1 in this definition is a signed integer by C++ rules.
On our architectures, shifting a signed integer by 31 results in
signed integer overflow. Signed integer overflow results in
undefined behavior.

Forcing the 1 to be unsigned avoids this.

Change-Id: I860431eeede4eff29598f646abf3c1337b048d71
2024-02-07 16:56:52 +00:00
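
The fix described in commit 1d6691e06b can be sketched as follows. This is a minimal illustration, assuming a hypothetical enum: `example_fault_reason_t` and its enumerators are made up for the example and are not the actual header contents. With a 32-bit `int`, `1 << 31` shifts a signed value into its sign bit, whereas `1u << 31` is an unsigned shift whose result is well defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enum for illustration only; it is NOT the real
// hsa_amd_memory_fault_reason_t definition.
enum example_fault_reason_t : uint32_t {
  EXAMPLE_FAULT_PAGE_NOT_PRESENT = 1u << 0,
  EXAMPLE_FAULT_READ_ONLY = 1u << 1,
  // Writing `1 << 31` here would shift a signed int into its sign bit.
  // The unsigned literal `1u` keeps the whole shift in unsigned arithmetic.
  EXAMPLE_FAULT_OTHER = 1u << 31
};

// The top bit is set and the value matches the intended constant.
static_assert(EXAMPLE_FAULT_OTHER == 0x80000000u, "bit 31 must be set");
```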
Jonathan Kim 1dd4a7dc18 Fix copy logic on devices with no xgmi SDMAs
Fix gang factor overwrite of 0 if there are no xGMI SDMAs
on the device and gang factor is 1.

Change-Id: I041d4b4ae87fb68f224ee4dedb758c6f06c022a9
2024-02-07 16:56:52 +00:00
David Belanger 3dd98d075f kfdtest: Updated CWSR test for emulation
Added global flag g_IsEmuMode and set it when running under emulator.
Adjusted delays in KFDCWSRTest for emulator.

Change-Id: Ia5c0be40816ac2219add943e306ee16438f5b852
Signed-off-by: David Belanger <david.belanger@amd.com>
2024-02-06 20:56:32 -05:00
Jonathan Kim a3efd13a2f Fix IPC import on device memory with no requested nodes
Users can import device memory without specifying the target node.
DMA buf imports return a Thunk handle that's not useful for
gpu mapping calls.

Fix this by using the import node information to re-import and
map with the correct target GPU.

Also fix IPC detach calls by deregistering the Thunk handle
import immediately during attach instead of failing to do it later
on detach since Thunk handles aren't placed into ROCr allocation
map.

Finally refactor the IPC attach function for cleaner logic flow.

Change-Id: Ib2bf178110b2be98bd6917c765f724e4e613f5f2
2024-02-06 23:15:29 +00:00
Jonathan Kim 15691ae460 Fix DMABuf FD closure for IPC attach client
We should also close the client side dmabuf fd after importing for target
nodes.

Change-Id: I74f61dd65bebb03dc002f5df7301efd1ef8d9603
2024-02-06 23:15:29 +00:00
Jonathan Kim 62f3f250ce Optimize and fix SDMA gang copies
Optimizations include:
- Greedy gang by placing gang leaders on first D2D sdma blit context
to avoid deadlocking with other gang leaders and items.  Note that
this is fine since we can't avoid an oversubscription problem when
there is only 1 xGMI link anyway, so treat all xGMI links as a single
pipe for ganging.
- Non-leader gang items don't have to poll on dependency signals so this
opens up more non-blocking SDMA channels.
- Unlock gang lock when gangs are not needed.
- Change gang factor lookup from vector pair to map and register all
gpus in gang factor lookup regardless of link type so that we can take
advantage of the O(logN) direct key/value lookup time.

Fixes include:
- HSA_PAGE_SIZE_4KB was an incorrect macro to use for gang size limit.
As a result, small copies ended up ganging and hitting latency limit.
Use hardcoded 4096 bytes instead.
- Cap auxiliary gang factor to the number of non-XGMI SDMA engines.

Change-Id: Ic23fde131502906a807134a04599aa6d012e8cbb
2024-01-25 10:42:27 -05:00
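
Two of the items in commit 62f3f250ce, the switch to a map for gang factor lookup and the cap on the auxiliary gang factor, might look roughly like the sketch below. All names here (`GangFactorMap`, `lookup_gang_factor`, `cap_gang_factor`) and the default factor of 1 are assumptions made for illustration; the runtime's actual types and defaults are not shown in the commit message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical table: GPU node id -> gang factor.
using GangFactorMap = std::map<uint32_t, uint32_t>;

// Direct key/value lookup, O(log N) in the number of registered GPUs.
// Returning 1 (no ganging) for unknown nodes is an assumed default.
inline uint32_t lookup_gang_factor(const GangFactorMap& factors, uint32_t node_id) {
  auto it = factors.find(node_id);
  return it != factors.end() ? it->second : 1u;
}

// Never request more gang members than there are non-xGMI SDMA engines.
inline uint32_t cap_gang_factor(uint32_t requested, uint32_t non_xgmi_engines) {
  return std::min(requested, non_xgmi_engines);
}
```

Compared with scanning a vector of pairs, the map avoids a linear walk on every copy, at the cost of registering every GPU up front.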
James Zhu f75fddb9bd kfdtest: change Largest Buffer search algorithm
The old Largest Buffer search algorithm used binary search to find the
last successful memory allocation. However, each successful memory
allocation takes time, whereas an unsuccessful one returns very
quickly. Searching downward from MAX in steps of the granularity
interval until the first successful allocation speeds up this test.

Change-Id: I07daea05423c33e72a483f0013e8ea1b5dabf989
Signed-off-by: James Zhu <James.Zhu@amd.com>
2024-01-19 10:42:40 -05:00
James Zhu caedadcc6f rocrtst: change max memory search algorithm.
The old max memory search algorithm used binary search to find the
last successful memory allocation. However, each successful memory
allocation takes time, whereas an unsuccessful one returns very
quickly. Searching downward from MAX in steps of the granularity
interval until the first successful allocation speeds up this test.

Change-Id: Idada3c6f750c94f3bb223f4f3bff4e4ebd3e98f7
Signed-off-by: James Zhu <James.Zhu@amd.com>
2024-01-18 13:46:44 -05:00
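
The search strategy described in commits f75fddb9bd and caedadcc6f can be sketched as a downward scan. This is only an illustration under assumptions: `find_largest_alloc` and the `try_alloc` callback are hypothetical names, and the real tests call the allocator directly rather than through a callback.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Scan downward from max_size in steps of `granularity` and return the
// first size for which try_alloc succeeds, or 0 if none does.
// Failed attempts are assumed to be cheap, so at most one expensive
// (successful) allocation is performed.
inline std::size_t find_largest_alloc(
    std::size_t max_size, std::size_t granularity,
    const std::function<bool(std::size_t)>& try_alloc) {
  if (granularity == 0) return 0;  // avoid an endless loop
  for (std::size_t size = max_size; size >= granularity; size -= granularity) {
    if (try_alloc(size)) return size;
  }
  return 0;
}
```

A binary search would instead hit several successful (slow) allocations on its way to the boundary, which is the cost these commits avoid.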
Sam Wu 1c6ad56dc6 Apply doc standards for ReadtheDocs builds
Applies the following changes:
- Add version number to documentation left navigation bar and page title
- Add an "About" section with a license page
- Enable htmlzip, pdf, epub formats when publishing on Read the Docs
- Set pdf title, author, copyright, and version
- Rename .sphinx/.doxygen to sphinx/doxygen
- Remove docBin from URL
- Update rocm-docs-core dependency

Change-Id: I947cf32cd42d9f4e55b1ddd324ad4a7e4ba3f3e3
2024-01-18 12:07:27 -05:00
David Yat Sin 84c30dd735 VMM: rocrtst for exporting/importing dmabuf
This is part of patch series for Virtual Memory API.

Change-Id: I1f1357a39b48b0d0611967ce9dd0b83b6a8db864
2024-01-17 10:25:20 -05:00
David Yat Sin a69c1e9f39 VMM: rocrtst for basic virtual memory APIs
This is part of patch series for Virtual Memory API.

Change-Id: Ic3b44435cb09ad17d833b4a4b2551bd211b494e9
2024-01-17 10:25:09 -05:00
David Yat Sin 32b3a3c299 VMM: Use emplace when adding entries
Use emplace to prevent copying the MappedHandle objects when inserting
entries into mapped_handle_map_.

Change-Id: Id3f40f1eb73ce30e62da53c5aea4dd715e83ac59
2024-01-17 10:25:04 -05:00
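
The effect of commit 32b3a3c299 can be illustrated with a small copy-counting type. `ExampleHandle` is a hypothetical stand-in for `MappedHandle`, and the helper names are made up for this sketch; only the use of `emplace` comes from the commit message.

```cpp
#include <cassert>
#include <map>
#include <tuple>
#include <utility>

// Hypothetical stand-in for MappedHandle that counts copy constructions.
struct ExampleHandle {
  static int copies;
  int id;
  explicit ExampleHandle(int i) : id(i) {}
  ExampleHandle(const ExampleHandle& other) : id(other.id) { ++copies; }
};
int ExampleHandle::copies = 0;

// Builds the value directly inside the map node: no copy is made.
inline void insert_in_place(std::map<int, ExampleHandle>& m, int key) {
  m.emplace(std::piecewise_construct, std::forward_as_tuple(key),
            std::forward_as_tuple(key));
}

// Builds a temporary pair first, which then has to be copied into the map.
inline void insert_by_copy(std::map<int, ExampleHandle>& m, int key) {
  m.insert(std::make_pair(key, ExampleHandle(key)));
}
```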
David Yat Sin 29efd8eccd VMM: Fix flags when allocating memory handle
When allocating a memory handle, the NoAddress thunk flag should be set
so that this allocation does not have a virtual address range.
Also, skip mapping the memory when allocating a memory handle.

Change-Id: I1c168bc00ddbc158d447197c4dc25f96bad02b19
2024-01-17 10:24:58 -05:00
David Yat Sin 2f97049da5 VMM: Default access should be none
After a memory handle is created, hsa_amd_vmem_get_access should return
HSA_ACCESS_PERMISSION_NONE instead of reporting the allocation as
invalid.

Change-Id: I1a09d15c220d48497d09c89059493e538f82aeb9
2024-01-17 10:24:51 -05:00
David Yat Sin 8b85f9e668 VMM: Fix access for multi-GPU
When using multi-GPU for each BO, a new dmabuf_fd needs to be imported
into libdrm.

Change-Id: Iaa2415c8f655a1ce8e92b0878517a11ff014a1d5
2024-01-17 10:24:35 -05:00
Jonathan R. Madsen 8f0ea44c09 Suppress reporting no tools were found with rocprofiler-register
Change-Id: If853517d40e073202d12e2a6b16fb54be5529650
2024-01-17 01:01:19 -05:00
David Yat Sin 0accd17b6e HSA_USE_SVM to override SVMAPISupported node prop
When HSA_USE_SVM is 0, thunk uses non-SVM path, but upper layers still
use SVM path. That is not as expected.

Suggested-by: Lang Yu <Lang.Yu@amd.com>
Change-Id: I1ae0b4faa2f8af5ec69a81cfeb7661bd47d739d4
2024-01-16 22:44:38 -05:00
Jonathan Kim e20f41df62 Enable IPC DMA buf
Set HSA_ENABLE_IPC_MODE_LEGACY off (i.e. use DMA bufs implementation
by default).

Change-Id: I7b1c6cb7d19310adf6f0bfe060736f4adbf7adc2
2024-01-16 22:43:27 -05:00
Jonathan Kim 5dfebdbca9 Change IPC implementation to use DMA Bufs
As the KFD IPC IOCTLs will not be upstreamed, change runtime
implementation to use DMA bufs.

DMA buf fds will be passed over abstract unix domain sockets.
The exporter spins a thread that creates a socket server.
The importer connects to the server to fetch the fd.

libDRM will be required to do a manual import and GPU map for
memory that is not already imported and mapped.

For now, use the legacy IPC implementation by default as a
follow on patch will disable the HSA_ENABLE_IPC_MODE_LEGACY
environment variable.

Change-Id: Ifd8469e9adfc81f8a1ea78d6010fb10b515ba1b4
2024-01-16 22:43:00 -05:00
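
Commit 5dfebdbca9 passes DMA buf fds between processes over Unix domain sockets. The standard mechanism for that is an `SCM_RIGHTS` control message, sketched below. This is an illustration under assumptions: `send_fd` and `recv_fd` are hypothetical helper names, any connected `AF_UNIX` socket is accepted (the runtime's abstract-socket server setup is not reproduced), and any open fd can stand in for a DMA buf fd.

```cpp
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#include <cassert>
#include <cstring>

// Send one file descriptor over a connected Unix domain socket.
inline bool send_fd(int sock, int fd) {
  char payload = 'F';  // at least one byte of regular data must be sent
  iovec iov{};
  iov.iov_base = &payload;
  iov.iov_len = 1;

  alignas(cmsghdr) char control[CMSG_SPACE(sizeof(int))];
  std::memset(control, 0, sizeof(control));

  msghdr msg{};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = sizeof(control);

  // The fd travels in the ancillary data, not in the byte stream.
  cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(int));
  std::memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

  return sendmsg(sock, &msg, 0) == 1;
}

// Receive one file descriptor; returns -1 on failure.
// The returned fd is a new descriptor owned by the caller.
inline int recv_fd(int sock) {
  char payload = 0;
  iovec iov{};
  iov.iov_base = &payload;
  iov.iov_len = 1;

  alignas(cmsghdr) char control[CMSG_SPACE(sizeof(int))];
  std::memset(control, 0, sizeof(control));

  msghdr msg{};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = control;
  msg.msg_controllen = sizeof(control);

  if (recvmsg(sock, &msg, 0) != 1) return -1;
  cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
  if (cmsg == nullptr || cmsg->cmsg_level != SOL_SOCKET ||
      cmsg->cmsg_type != SCM_RIGHTS) {
    return -1;
  }
  int fd = -1;
  std::memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
  return fd;
}
```

Because the receiver gets its own descriptor, each side is responsible for closing its copy once it is no longer needed, which is the concern the later fd-closure commits in this log address.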
David Yat Sin 0e3f668e2c Use HybridMutex for IPC locks
Change-Id: I24ab4a96237612a7d32beda06cc20b25cb1f0b37
2024-01-16 21:29:39 +00:00
David Yat Sin 8d3fee5095 Use HybridMutex for signal mutexes
Implement HybridMutex to improve latencies compared to KernelMutex when
there is contention between several threads calling hsa_signal_create
and hsa_amd_signal_async_handler.

Change-Id: If53377033e749b0050727964c9303f09b02527cc
2024-01-16 21:29:39 +00:00
David Yat Sin 3d1563ee68 Force t1_ update when profiling is enabled
Fixes issue where t1_ counters may not be updated when doing dispatch
profiling, causing a divide by 0.

Change-Id: I91060ac3f9fd2183d277e6e7cd810398a453a87f
2024-01-16 21:29:39 +00:00
David Yat Sin d16c6db2ee Increase min KFD version for Virtual mem support
KFD had some fixes for handling of virtual memory APIs. These fixes are
included in interface version 1.15.

Change-Id: Ie701eccf6e032f9ec0a1f4e8a43718964eebddc6
2024-01-16 21:29:39 +00:00
Joseph Huber 4971150576 Improve endianness check
Update the `hsa.h` header to use the gcc / clang `__BYTE_ORDER__`
macros where available to more accurately autodetect endianness for
the target.

Change-Id: I7312f3badcba9287a30eb14882b91e2a247acc5f
2024-01-16 21:29:39 +00:00
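
The kind of check described in commit 4971150576 can be sketched with the predefined gcc / clang macros. `EXAMPLE_LITTLE_ENDIAN` is an illustrative macro name, not the one used in `hsa.h`, and the little-endian fallback for compilers without `__BYTE_ORDER__` is an assumption of this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Prefer the compiler-provided byte order macros where they exist.
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    defined(__ORDER_BIG_ENDIAN__)
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define EXAMPLE_LITTLE_ENDIAN 1
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define EXAMPLE_LITTLE_ENDIAN 0
#else
#error "unsupported byte order"
#endif
#else
// Assumed fallback for this sketch when the macros are unavailable.
#define EXAMPLE_LITTLE_ENDIAN 1
#endif

constexpr bool kCompileTimeLittleEndian = (EXAMPLE_LITTLE_ENDIAN == 1);

// Runtime cross-check: inspect the first byte of a known 32-bit value.
inline bool runtime_is_little_endian() {
  const uint32_t probe = 1;
  unsigned char first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}
```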
Lancelot SIX 6f828d8609 Revert "trap_handler: Set status.skip_export when halting a wave"
This reverts commit c5db063b2f. That change is required for the runtime
to generate reliable core dump files, but the feature has been disabled
for now by 5e3be9c28a. Until it is needed, revert the ABI change in the
trap handler to maintain compatibility with older debuggers.

Change-Id: I77a1562dc7962befe2bf88442df858e2d2b1c5ab
2024-01-16 15:55:59 +00:00
Ranjith Ramakrishnan a7d8b1c287 Use relative path rather than hard coded path in package config file
Change-Id: Ia35fadeead69f84f4f4d32ab0c04f2f391aba4f4
2024-01-07 20:51:59 -08:00
Jeremy Newton 42581d4172 Fix missing global symbol
If using hsakmt as a shared library

Change-Id: I66a1849a46bd7009813d49824d0d059e8a511038
Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>
2024-01-04 11:14:39 -05:00
Yifan Zhang 808a4428b6 kfdtest: add APUs judgement in LargestVramBufferTest criteria
This patch adds an APU check to the LargestVramBufferTest criteria.

Change-Id: Ic69093f8ebed8be0b1c58787e2a294d86fb49bb0
2023-12-28 17:17:31 +08:00
Yifan Zhang 92c0015787 kfdtest: finetune MMBench subtest criteria
Change the proportion to 6/10 on APUs.

Change-Id: I3576cb23d0f14ff6d576a5db4bdeef9446aa10d2
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
2023-12-28 17:12:44 +08:00
Yifan Zhang 72324444cc kfdtest: finetune LargestVramBuffer subtest criteria
Change the proportion to 3/5 on APUs.

Change-Id: I0dc37fb1c309605811551b88259473603c81f9ae
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
2023-12-27 16:23:17 +08:00
Ruili Ji 4b69351394 To fix sdma segment fault for error address
pad_size address shall start from command_addr not
(command_addr + total_command_size)

Change-Id: I3d8491986caf2d4d5dc41b1d90286c21e7c0a457
2023-12-25 09:31:13 +08:00
Alex Sierra 5e3be9c28a Revert "core dump: Generates a core dump from a fault event"
This reverts commit 803e37ded5.
This commit disables the core dump feature. Apparently, gfx1101 SA1 waves
cannot enter the trap handler because they receive an invalid
address. However, core dump at the debugger has been moved to ROCm
6.2.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I7915caf58118658e5e7f435f91a0a6216d2fdb42
2023-12-18 17:30:13 -06:00
David Yat Sin 6333fdecf3 Use pthread_setaffinity_np
On some systems, pthread_attr_setaffinity_np does not exist, so we need
to use pthread_setaffinity_np on the thread after pthread_create.

Provided by Julian Samaroo on github

https://github.com/RadeonOpenCompute/ROCR-Runtime/pull/143
Change-Id: I4649f94333f2d7b0a5993b370a4bfc48d92acecb
2023-12-18 17:41:49 -05:00
David Yat Sin 9b2ed66609 Fix README for invalid command
`-DCMAKE_INSTALL_PATH` is not valid; use `-DCMAKE_INSTALL_PREFIX` instead.

https://github.com/RadeonOpenCompute/ROCR-Runtime/pull/171/

Suggested-by: fjh1997 on github
Change-Id: Ibb85da7fe755b662fa9a836d6fbe3394d34a0337
2023-12-18 09:15:05 -05:00
Xiaogang Chen 1a7162731e kfdtest: use svm range granularity in KFDSVMEvictTest.QueueTest/1
When xnack is on, shader code in this test triggers GPU page faults that migrate
data from system RAM to VRAM. Use the SVM range granularity to move all data from
the system buffer to VRAM, reducing system RAM pressure and avoiding system RAM
OOM on systems that have less system RAM.

Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Change-Id: I219472210756be319491f7827f7209fe32726f81
2023-12-15 13:27:22 -06:00
David Yat Sin c81c18bcc5 Temporarily disabling Queue_Validation_InvalidWorkGroupSize
Change-Id: I88c5900151b9f572f956cd7b428119b614331be9
2023-12-14 16:46:19 -05:00
gaba 4bf73f521b libhsakmt: Fix CPU cache issue
For "Intel Meteor Lake Mobile", the cache info is not in sysfs.
That means /sys/devices/system/node/node%d/%s/cache does not exist,
but the system works fine.

Change-Id: Ie7c04426791a84c2288ff21df093226828a5f629
Signed-off-by: Gang Ba <Gang.Ba@amd.com>
2023-12-08 15:29:19 -05:00
David Yat Sin c86837d8d6 Add query for agent memory and aql ext properties
Add query to return flags for GPU agent memory properties and AQL
extensions.

Implement flag to determine that GPU agent is an APU.

Change-Id: Ic04c51290b2b9763e14989c117f35a2e22297453
2023-12-07 14:41:37 -05:00
Lancelot SIX c5db063b2f trap_handler: Set status.skip_export when halting a wave
When inspecting waves on architectures where SPI may not initialize TTMP
registers, the debugger cannot reliably know if the trap handler was
entered and if it saved valuable information in TTMP registers.

This patch uses the status.skip_export bit (unused by the compute
shaders) to indicate that it got executed before halting a wave.
This is done except for gfx940, where ttmp11[31] can be used (as long as
TTMP registers are always initialized by SPI for this architecture).  It
could be possible to be more selective as architectures always
initializing TTMP registers do not require this step, but always doing
it makes maintenance simpler.

Change-Id: I314db6b37772f7daa8bd405e6662a86658d3f5e0
2023-12-06 21:20:03 -05:00
David Yat Sin ed1b0b9b1a Add queries for HSA Ext interface version
Change-Id: I26860fb1364cd3a33cdc9b284ac807b2702bb241
2023-12-06 13:58:52 -05:00
Alex Sierra 803e37ded5 core dump: Generates a core dump from a fault event
Extracts and creates a core dump ELF file from a fault event, using
core dump front end.

Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: Ibbbe41b3d13dd3fcb90161e927d48c329cf513a9
2023-12-05 23:19:14 -05:00
Alex Sierra 54604654bd reports KFD core dump support through hsakmt API
Member added to KFDVersion to report if KFD supports core dump
mechanism. This is done through hsaKmtRuntimeEnable API call while
the topology is being built. It also dictates if core dump will be
generated by either KFD or hsa-runtime.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I2e9d4166563402f78613d728446feb692c52d9d1
2023-12-05 23:19:14 -05:00