Commit Graph

58 Commits

Author SHA1 Message Date
Chris Freehill 3a9d14bb66 rocr: Add hsa_amd_portable_export_dmabuf_v2
The original version of hsa_amd_portable_export_dmabuf() did not
consider the conditions under which a dmabuf could be shared.
In the new version (hsa_amd_portable_export_dmabuf_v2()), the caller
can specify the flag HSA_AMD_DMABUF_MAPPING_TYPE_PCIE, which means they
want to share the dmabuf over PCIe. In that case, if the GPU is a PCIe
GPU that is not in an XGMI hive and large-BAR is not supported, the new
code returns an error.
2025-06-09 15:42:58 -05:00
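The condition described in the hsa_amd_portable_export_dmabuf_v2 commit above can be sketched as a small predicate. This is a minimal illustration only, not the ROCr implementation: `GpuInfo`, `ExportStatus`, `CheckDmabufExport`, and the numeric value of `kMappingTypePcie` are hypothetical names and values introduced for the example.

```cpp
#include <cstdint>

// Hypothetical description of the exporting GPU; the field names are
// illustrative and do not come from ROCr.
struct GpuInfo {
  bool is_pcie_gpu;          // GPU is attached over PCIe
  bool in_xgmi_hive;         // GPU is a member of an XGMI hive
  bool large_bar_supported;  // large-BAR is available
};

enum class ExportStatus { kSuccess, kError };

// Hypothetical bit standing in for HSA_AMD_DMABUF_MAPPING_TYPE_PCIE;
// the real value is defined by the ROCr headers.
constexpr std::uint64_t kMappingTypePcie = 1u << 0;

// Rejects a PCIe sharing request when the GPU is a PCIe GPU outside an
// XGMI hive and large-BAR is not supported; accepts everything else.
ExportStatus CheckDmabufExport(const GpuInfo& gpu, std::uint64_t flags) {
  const bool wants_pcie = (flags & kMappingTypePcie) != 0;
  if (wants_pcie && gpu.is_pcie_gpu && !gpu.in_xgmi_hive &&
      !gpu.large_bar_supported) {
    return ExportStatus::kError;
  }
  return ExportStatus::kSuccess;
}
```

Without the PCIe flag the sketch never rejects a request, which mirrors the commit's description that the check applies only when the caller asks for PCIe sharing.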
Tony Gutierrez 11d1d2cd25 rocr: Remove empty shared.cpp 2025-04-23 15:53:29 -04:00
Saleel Kudchadker 57c0c643ce rocr: return preferred SDMA engine mask
- Add a new AMD extension API to return preferred SDMA engine mask.
This can be used in conjunction with copy_on_engine API to get
optimal bandwidth.
2025-04-22 13:28:38 -07:00
Tony Gutierrez 8a38f121ea rocr: Add WaitMultiple to core Signal
Replaces WaitAny with WaitMultiple to more closely align with the
underlying driver API for waiting on multiple events.

WaitMultiple adds a single parameter, wait_on_all, to the WaitAny
interface providing a single function for waiting on multiple
events when we only need AND and OR semantics for the signal
checking logic.

Change-Id: I68a4a45d48151d9d69aef02fd8f7263b9e6c0e75
2025-01-27 09:21:43 -05:00
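The AND/OR semantics that the WaitMultiple commit above attributes to the wait_on_all parameter can be illustrated as follows. This is a sketch under assumptions: `WaitConditionMet` is a hypothetical helper that models only the decision logic, not the blocking wait or the real Signal interface.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: satisfied[i] says whether signal i currently meets
// its wait condition. Only the AND/OR decision is modeled here.
bool WaitConditionMet(const std::vector<bool>& satisfied, bool wait_on_all) {
  const auto is_set = [](bool s) { return s; };
  return wait_on_all
             ? std::all_of(satisfied.begin(), satisfied.end(), is_set)   // AND
             : std::any_of(satisfied.begin(), satisfied.end(), is_set);  // OR
}
```

With wait_on_all set to false the helper behaves like a WaitAny-style check, so a single function covers both cases.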
Chris Freehill 9b13bcd0ac rocr: Ensure globals are initialized at first use
When ROCr is built as a static library, global variables
were often not initialized to valid values at their first
use. This change addresses that problem.

Change-Id: I550fa41feb3bc04b9cc686bcfb4acf2a7b651a88
2024-10-16 23:19:48 -04:00
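One common way to guarantee that a global holds a valid value at its first use is the construct-on-first-use idiom, sketched below. The commit above does not say which technique it applies, so this is an assumption for illustration; `RuntimeConfig` and `GetConfig` are hypothetical names.

```cpp
#include <string>

// Hypothetical settings object standing in for a library global.
struct RuntimeConfig {
  std::string log_prefix = "rocr";
  int verbosity = 0;
};

// The function-local static is constructed the first time GetConfig() runs,
// so callers never observe an uninitialized object, regardless of the order
// in which translation units are initialized. Since C++11 this
// initialization is also thread-safe.
RuntimeConfig& GetConfig() {
  static RuntimeConfig config;
  return config;
}
```

Every caller receives a reference to the same object, so the accessor can replace a namespace-scope variable without changing how the value is shared.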
Saleel Kudchadker 26e105d9ab Initial external logging API
New API to accept a file stream for logging

Co-authored-by: David Yat Sin <David.YatSin@amd.com>

Change-Id: Ie09c35ae14ca86a97eb25f61251be287c55d7169
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-08-07 02:59:00 +00:00
David Yat Sin 08c44fbda6 Add hsa_amd_vmem_address_reserve_align API
New API to support alignment parameter when reserving virtual addresses.
If the alignment is 0, then the default size is used. Otherwise the
alignment needs to be a power of 2 and greater than or equal to page
size.

Existing hsa_amd_vmem_address_reserve marked for future deprecation.

Change-Id: I17cee75420183dea5842fc1ecc2514cdcd760bac
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-25 12:57:22 -05:00
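The alignment rule stated for hsa_amd_vmem_address_reserve_align above (0 selects the default, otherwise a power of 2 that is at least the page size) can be sketched as a validity check. `IsPowerOfTwo` and `IsValidReserveAlignment` are hypothetical helpers, and the page size is passed in as a parameter rather than queried from the system.

```cpp
#include <cstddef>

// True when x has exactly one bit set.
inline bool IsPowerOfTwo(std::size_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// An alignment of 0 requests the default. Any other value must be a power
// of two and greater than or equal to the page size.
inline bool IsValidReserveAlignment(std::size_t alignment,
                                    std::size_t page_size) {
  if (alignment == 0) return true;
  return IsPowerOfTwo(alignment) && alignment >= page_size;
}
```

For a 4096-byte page, 4096 and 2097152 pass the check, while 2048 (too small) and 12288 (not a power of 2) do not.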
David Yat Sin d6d5786051 Adding queue information queries
New hsa_amd_queue_get_info API to support:

- HSA_AMD_QUEUE_INFO_AGENT: Agent that owns the underlying HW queue

- HSA_AMD_QUEUE_INFO_DOORBELL_ID: KFD doorbell ID of the queue
completion signal.

Change-Id: I98842131bcbdd08552649791a5d43e578a615808
2024-04-11 12:53:48 -04:00
David Yat Sin efe455c2fa Temporary: Set AllocateGTTAccess and node_id for MES
Temporary change to set the AllocateGTTAccess flag and node_id
on MES devices.

Change-Id: I22385d11b17b76cfb44278fa0d8a09bc8721cea6
2024-03-29 19:38:19 +00:00
Mythreya a67af3807f Initial support for scratch allocation tracking
Add new tools table and functions to notify in case of an event

Change-Id: I47f0c2f3c8e02d7bcb74d649903eb4f86721c154
2024-02-07 16:56:52 +00:00
David Yat Sin dca8f3a21d Implement async scratch reclaim
For devices where the CP FW supports asynchronous scratch reclaim, ROCr
is able to claw-back scratch memory that was assigned to an AQL queue.
With that ability, ROCr does not have to rely on using USO
(use-scratch-once) when assigning large amounts of memory to a queue.
If we reach a situation where we are running low on device memory, ROCr
will attempt to claw-back the scratch memory.

Change-Id: Iddf8ec84e37ab8b9fdc58bafbe2b61fe2acb6eb7
2023-12-04 15:05:22 +00:00
David Yat Sin 687eb043d4 Add retain handle and get allocation properties
Support function to retain allocation handle for memory mappings.
The get allocation properties function will return the current
allocation properties for existing memory mappings.

This is part of patch series for Virtual Memory API.

Change-Id: I0a53a11b6efc2b5bf9d463512a489a2abd812551
2023-07-21 15:17:01 -04:00
David Yat Sin b03c96c264 Support exporting and importing memory mappings
Support exporting and importing dmabuf file descriptors for memory
mappings. The exported dmabuf file descriptors are shareable posix
file descriptors that can be used for cross-vendor, cross-device
and cross-process memory sharing.

This is part of patch series for Virtual Memory API.

Change-Id: I3673fc009f7e73bc26be8349e19f66e20d0607c5
2023-07-21 15:17:01 -04:00
David Yat Sin 13fbd8a232 Support Get and Set access for memory mappings
Mapping memory handles to virtual memory addresses does not make them
accessible. The set access function is needed to make the memory
mappings accessible to specific agents. The get access function
returns current access properties for individual agents.

This is part of patch series for Virtual Memory API.

Change-Id: I152ba0557fd2a802eb9d840568b68cdd1911b72c
2023-07-21 15:17:01 -04:00
David Yat Sin 179dcf1c77 Support mapping and unmapping memory handles
Add support for mapping and unmapping memory handles to virtual
address ranges.

This is part of patch series for Virtual Memory API.

Change-Id: If512d49ff4211e68f2064249add607a3200e458a
2023-07-21 15:17:01 -04:00
David Yat Sin e4a84c4a9c Support memory handles
Add support for creating and releasing memory handles. Memory
handles are memory allocations on device memory without a virtual
address.

This is part of patch series for Virtual Memory API.

Change-Id: I5dfb162eb1661621cce171b2870a3c93b24d840e
2023-07-21 15:17:01 -04:00
David Yat Sin 1085311f1a Support Virtual Address reservations
Add support for reserving virtual address ranges. Virtual address
ranges are addresses without any memory backing. These address ranges
need to be mapped to memory handles later.

This is part of patch series for Virtual Memory API.

Change-Id: I5d066e7421d6896f933f524312afc230a13d594e
2023-07-21 15:17:01 -04:00
Philipp Knechtges d220e16000 fix link-time ordering condition
This fixes a segfault error in cases where the linking order of
compilation units varies. The reason behind the segfault is that one
global variable in one compilation unit depends on another global
variable in another compilation unit, but there is no guarantee that
this other compilation unit is initialized first. The fix forces a
reinitialization at the first invocation of the library.

Change-Id: I1428592c6898bca13a330c4588941de260ff0370
2023-06-29 10:08:29 -04:00
Sean Keely 42243c1e8f Add support for exporting portable handles to GPU allocations.
Adds hsa_amd_portable_export_dmabuf and hsa_amd_portable_close_dmabuf
which allow obtaining dmabuf handles to rocr allocations.  These handles
may be shared with other APIs to support cross vendor & cross device
memory sharing.
Adds query to return whether dmabuf export is supported

Signed-off-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: David Yat Sin <David.YatSin@amd.com>

Change-Id: I7f98501087d9563d07fc2cb428cc886b1e518b1e
2023-03-06 12:39:01 -05:00
Jonathan Kim 30920fc94d Add interface to DMA copy directly to a target engine.
Change-Id: Ic87cfeabb11c1a465f98f3f444d39955f5300525
2023-02-13 13:50:49 -05:00
Jonathan Kim 8f27f495c6 Make SDMA engine availability status queryable.
Report the availability of SDMA engines for memory copies.

Change-Id: Ie31b02d6b65355122bb8c98bc73700a59bee166e
2023-02-13 13:50:49 -05:00
Cordell Bloor 5873a78d58 Fix static initialization order
Change-Id: I1d51e150b526d050b988fe5a422644667a561cd7
2023-02-09 13:51:08 -05:00
David Yat Sin 6bfe57aeb2 Add Stream Performance Monitor(SPM) APIs
Change-Id: I0d48782887814ef245b7e0182e2d5570aa8c3f50
2022-12-08 13:56:29 -05:00
Graham Sider 061aa04147 Make queue memory allocation non-paged
Non-paged allocation for queue memory necessary for binding wptr to
GART. Required to support usermode queue oversubscription with MES for
GFX11.

Adds AllocateNonPaged entry to MemoryRegion::AllocateEnum for clarity;
aliases AllocateIPC.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Change-Id: I1a97a1820da26cf2433d9c237b2e6d2b0b8628b4
2022-08-04 11:21:00 -04:00
Sean Keely 270d042ef8 Minor interface improvement to pointer info.
Take in const void* rather than void*.  This does not break the
abi or existing code.  Existing code would need to cast away any
const which is unnecessary and annoying.

Change-Id: I28787e8fab1b600bf6871ea82835e10a4f475c5b
2021-08-04 16:43:23 -04:00
Sean Keely 4455250be1 Add HSA_CU_MASK
New environment variable HSA_CU_MASK allows users to
specify a cu mask to every queue allocated from any
GPU.  hsa_amd_queue_cu_set_mask is restricted from
escaping this mask.

A new API hsa_amd_queue_cu_get_mask is added to query
the current cu mask.

Change-Id: I846c03a5faaca9b95067c31db84b59cc9fce2f03
2021-07-29 02:23:34 -05:00
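The restriction described in the HSA_CU_MASK commit above, where a per-queue mask cannot escape the global mask, can be sketched as a word-by-word intersection. This is an assumption about the behavior for illustration only: `ClampCuMask` is a hypothetical helper, and both masks are assumed to be arrays of 32-bit words of equal length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Intersects the requested per-queue mask with the global mask so that the
// result never enables a CU that the global mask disables.
std::vector<std::uint32_t> ClampCuMask(
    const std::vector<std::uint32_t>& requested,
    const std::vector<std::uint32_t>& global) {
  // Assumption of this sketch: both masks cover the same number of words.
  assert(requested.size() == global.size());
  std::vector<std::uint32_t> effective(requested.size());
  for (std::size_t i = 0; i < requested.size(); ++i) {
    effective[i] = requested[i] & global[i];
  }
  return effective;
}
```

A query in the style of hsa_amd_queue_cu_get_mask would then report the clamped value rather than the value that was originally requested, under the same assumption.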
Sean Keely 77046a1aaa Revert "Revert SVM and XNACK support."
This reverts commit 5bd153974d.

Conflicts:
	opensrc/hsa-runtime/core/util/flag.h

Change-Id: I16daf41588e6139126d66af54b0693de2e7e39f3
2021-04-21 14:49:43 -05:00
Sean Keely 5bd153974d Revert SVM and XNACK support.
KFD is not ready yet.

Change-Id: I61deb292ddb92185d33504c2115169888d56e211
2021-04-02 02:10:59 -04:00
Sean Keely 7333c77e22 Squash merge of cfreehil/amd-temp-gfx90a onto amd-staging.
Includes some workarounds and HMM.
Conflicts:
	opensrc/hsa-runtime/core/runtime/amd_topology.cpp
	opensrc/hsa-runtime/core/util/flag.h

Change-Id: I22976f07964a43dbb228a6231777dbd599112b8d
2021-04-02 02:10:15 -04:00
Sean Keely 01f42dbe46 Add hsa_amd_signal_value_pointer.
Enables partial signal interop with non-HSA devices.

Change-Id: Ic39bca84ed1709cbd2cc24b1eb0f4fc6cccb39cf
2021-02-10 18:47:54 -05:00
Sean Keely f4fe7ddf47 Make explicit reference between init modules.
Make explicit reference to hsa_api_trace.cpp from
initialization of hsa_table_interface.cpp.  Breaks
the ability to use hsa_table_interface.cpp in plugins.

Change-Id: I22a42d3a132512b0d9ec7a1ca629b169e7f8eba7
2020-07-15 16:02:15 -04:00
Sean Keely bd51c61af8 Move tools only table interfaces into namespace rocr.
Change-Id: Ic0b8d958c2d27c921c6955a56110c6cdf5ba5e8e
2020-06-19 22:35:15 -04:00
Ramesh Errabolu fa13208698 Add rocr namespace to core header and impl files
Change-Id: I1e1b33f9bba1078d049bc19797889988c3e43360
2020-06-19 22:34:21 -04:00
Sean Keely ce19721c88 Update copyright date.
Change-Id: If4bf4c20cf051878bfe759080bb7345d884dd53d
2020-06-19 22:34:01 -04:00
Ramesh Errabolu 627991b1c1 Update how code references publicly available ROCr headers
Change-Id: I357c51eb713a23704d4fee71081be46a73a71806
2020-02-21 20:01:11 -05:00
Sean Keely 299874f17d Initial support for deallocation callbacks.
Adds hsa_amd_register_deallocation_callback and hsa_amd_deregister_deallocation_callback
to notify when HSA memory has been released.

Change-Id: I1f33cee250ca890e5c2e7fddfa4479aa5874651d
2019-06-26 04:12:17 -05:00
Felix Kuehling 0c6b9532d4 Use non-paged memory for IPC signals
Non-paged memory can be IPC-shared even when HSA_USERPTR_FOR_PAGED_MEM
is enabled.

Change-Id: I8b1fa6d7a4a9327c78a77b3679697fbf55397093
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
2019-04-29 09:20:11 -04:00
Sean Keely a535e18cc1 Add hsa_amd_memory_lock_to_pool.
Makes malloc memory accessible to GPUs so that the memory has the
capabilities of the pool it is locked to.
This admits fine grained locked memory and reserves API space for any future
special CPU pools.

Change-Id: If8c3dd8582a43f19d3d36b3763c1a688cc419ef0
2019-03-29 01:09:21 -05:00
Sean Keely 8323b2e1d7 Add pooling for Signal ABI blocks (SharedSignal).
Makes better use of memory and greatly reduces mmap count.

Change-Id: Ib444cd1ccd144986adbcc7cec297a966e2c08bc7
2018-11-12 22:37:28 -06:00
Sean Keely e0839ab27e Implement SDMA copy rect for gfx9.
Fix pitch overflow due to small element detection.
Add wide pitch 2D copy handling.
Cleanup code duplication.

Change-Id: I93b1584aba8e5964957eb7ab3544df806ca3e2f9
2018-08-29 19:13:07 -04:00
Jay Cornwall e388a23344 Add hsa_amd_queue_set_priority extension function
Controls dispatch and wavefront scheduling arbitration across queues.

Change-Id: I498f4898b544f79b8fb8514bf7e789ca9da29462
2018-06-19 19:41:28 -05:00
Qingchuan Shi 49d2175c74 debug support for queue error.
1/ Revised debug event handler to handle different events.
2/ Added queue error handler using the callback in queue create, which will print out wave info when the queue is in an error state.
3/ Preempt the queue instead of destroying it when the queue is in an error state.

Change-Id: Ib727d208de9caf1c72c76d42268483b24aaebde8
2018-04-20 14:25:16 -04:00
Sean Keely f312a7386e Exception support for Queue.
Remove "zombie" queue state and report queue creation failure via
exceptions.  Make Shared object a final container and support array
objects with Shared.  Add message printing to hsa_exception in
debug builds.

Change-Id: I459f38c80846018acbf45538874e95f91dd6b195
2017-11-08 15:50:02 -05:00
Sean Keely 0c7dde2d1f Add queue intercept support to the runtime.
Queue intercept is exposed as two tools-only APIs via the API
intercept table.

Change-Id: Iac9602ed3143974d85c3569e9092295ad18037f8
2017-11-08 15:50:01 -05:00
Qingchuan Shi ce6aee01ed Add APIs to support debugging vm fault
1. Add hsa ext api hsa_amd_register_vmfault_handler for debugger to register callback in case of VM fault.
2. Extend hsa_ven_amd_loader API to:
   (1) iterate loaded code objects in executable:
       hsa_ven_amd_loader_executable_iterate_loaded_code_objects
   (2) get loaded code object info:
       hsa_ven_amd_loader_loaded_code_object_get_info
3. Make the id of hsa_queue the same as the one used in communication with thunk (for amd_aql_queue)

Change-Id: I68910809e59e24297350d262606f00e96c14bcbd
2017-10-28 21:48:26 -04:00
Sean Keely c9642cf7af Initial IPC signal support.
Added an API for creating signals with attributes.
Added two APIs for IPC operations on signals.
Initial use of exceptions for error handling.

Add ref counting to signals.
Removed spin loops from signal destructors.
Signals are no longer to be destroyed with delete, use DeleteSignal instead.
Added delete safety to doorbells.
Added secondary hsa_signal_t -> Signal* translation path for IPC enabled signals.

Change-Id: Id59065d002f0c2566b0a9425694da2ed27cb7d7f
2017-08-11 18:41:34 -05:00
Sean Keely 2732b18092 Initial exception support for signals.
Also separate signal ABI block allocations from the runtime interface object.

Change-Id: If16763338db664f29163a1348f8f4c38cf0597b2
2017-08-11 18:41:34 -05:00
Sean Keely bc43f97964 Use fixed size type for queue type arguments.
Change-Id: I81b605c9cc9b18bcef043a4f0292212241ce5987
2017-02-07 01:22:30 -05:00
Sean Keely 8081758a55 Add InterProcess memory sharing support.
Support is disabled pending KFD / Thunk readiness.

Change-Id: I55def748e3d56cbfcfa6e24983a0ab78567aa81d
2016-11-15 18:58:29 -06:00
Sean Keely 9dd76dbeda Add pointer info support.
Change-Id: I3edcc0bfddbf12465065c9bc3b6565288faff1b8
2016-11-11 18:40:16 -06:00