To correctly map to all GPUs after an import, use the new extended
registration call that can import a virtual address without having to
specify a target node.
Change-Id: Ifca8f6f6ee24fa99b2af357dcc3ea1de3ab234f7
When hsa_amd_vmem_set_access is called, do not remove permissions for
unspecified agents. Also updating documentation in header to clarify
this.
Change-Id: I3bb4cf08ba399f85cc67b17fd13a4a40d862415f
Socket server accept calls do not guarantee synchronous actions
post-accept. This can result in a race condition.
To resolve this, first limit the socket server's listen backlog to a
single connection. This will force competing clients to busy-retry
until timeout.
Second, make the DMABUF IPC file descriptor send-receive and import
calls into an atomic routine per connection.
By doing these fixes, not only to we resolve potential races but
we guarantee that any exporter process will create at most one
file descriptor that will only last for the duration of the import
transaction. This alleviates any concern on running into system
limits for the number of open file descriptors per process.
Change-Id: I6d8b14795a680d89a2707e082fa027d525792e05
Discarding blocks for reallocation on IPC export for better memory
performance trigger memory violations with DMA BUF exports so bypass
this for now as application performance drops haven't been observed
with the bypass.
The raw fragment should be passed to the DMA Buf export call as well
since offsets will be implicitly applied in the Thunk/KFD for
export/import calls.
Also, use the agent information directly from the pointer
information so that the export call doesn't have to scan memory to find
this. Pass the node ID in the handle so that the import call doesn't
have to make two thunk imports to fetch the node ID for GPU memory
imports.
Finally, allow the user to use DMA Buf IPC via
HSA_ENABLE_IPC_MODE_LEGACY=0 for developer testing as legacy mode will
be applied by default.
Change-Id: Ie8fe267f8768fa5df37126078406f7065f69ff4e
Adds support for AllocateMemoryOnly inside XDNA driver.
Move the IsLocalMemory() check inside the KFD driver
since the XDNA driver can, and needs to, create handles
on system memory buffer objects.
Changed handle variable name from thunk_handle to user_mode_driver_handle,
which is more representative if we support non-GPU drivers.
Change-Id: I95db9d575afd1ab0ff2de74cea5175d9a12a721b
This change adds the initial classes for the AIE agent and AIE AQL
queue.
An AIE agent list is added to the core runtime object.
Change-Id: I84b02f52171b80726dfb2c8431582a3ea2986eb3
Recommended SDMA engines for DMA copies are now exposed for better
GPU-GPU performance. ROCr can now select those DMA engines.
Also lock-in host-device copies to SDMA0 and device-host copies to
SDMA1 for better stability and performance.
Change-Id: Ideff2e13daf537104efecb8b837bd49ee5096cb5
This fixes an issue for missing HW events when out of HW events.
We cannot determine whether a HW event has occurred unless we call the
underlying drivers with hsaKmtWaitOnMultipleEvents_Ext. Previous logic
in Signal::WaitAny would switch to ACTIVE_WAIT state if we run out of
hardware events (signal->EopEvent() == NULL) and this would cause the
hsaKmtWaitOnMultipleEvents_Ext call to be skipped. But also, when we
have some signals without hardware events, calling
hsaKmtWaitOnMultipleEvents_Ext with a timeout of 0 so that we can poll
for remaining signals adds overhead with an IOCTL call and may cause
extra delay. Separating AsyncEventLoop into two separate threads so
that:
1. We can have a new Signal::WaitAnyExceptions to wait for HW events
This function can be simpler as it does not have to perform all the
timer calculations because it is expected to be always waiting on
hsaKmtWaitOnMultipleEvents_Ext through the lifetime of a process.
2. Signal::WaitAny does not need to have extra code to check for HW
exceptions as it only needs to handle HSA_EVENTTYPE_SIGNAL events. It
can also skip the calls to hsaKmtWaitOnMultipleEvents_Ext if needed.
Change-Id: I52ba99fd6e483e0cb477b7931a0dcc03520aa523
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
If user application tries to free memory that is currently being used by
the underlying HW device, the hsaKmtFreeMemory function call will fail.
This would be caused by an incorrect call by the user application. A
system memory error is raised and the user application is expected to
abort when this happens.
Note: This leaves the allocation_map_ table in an inconsistent state as
this address entry is removed from it while the pointer is not actually
free'd. But re-organising the FreeMemory() function would require the
memory_lock_ to be held for much longer and may affect performance.
Since this is a very unlikely and invalid use case, we prefer to leave
the FreeMemory() function as is.
Change-Id: I24279eb98620c32d34f4c5ad1b7a0a30cb65835d
Signed-off-by: David Yat Sin <David.YatSin@amd.com>
New API to accept a file stream for logging
Co-authored-by: David Yat Sin <David.YatSin@amd.com>
Change-Id: Ie09c35ae14ca86a97eb25f61251be287c55d7169
Signed-off-by: Chris Freehill <cfreehil@amd.com>
New API to support alignment parameter when reserving virtual addresses.
If the alignment is 0, then the default size is used. Otherwise the
alignment needs to be a power of 2 and greater than or equal to page
size.
Existing hsa_amd_vmem_address_reserve marked for future deprecation.
Change-Id: I17cee75420183dea5842fc1ecc2514cdcd760bac
Signed-off-by: Chris Freehill <cfreehil@amd.com>
When doing a coredump, we try to park the wave and save its PC in
ttmp7/ttmp11, but these registers will be overwritten by PC Sampling
requests.
Change-Id: I60fb734eb3bed4ee3cc8d8bba9ec4a527fff9671
When deferring a dmabuf export on an import call, there may be a
failure to export as the GEM object is not referenced by the kernel
mode driver. To get around this, do a non-deferred export and
immediately close the dmabuf FD to keep FD creation to a minimum.
This way, the GEM object will have a kernel mode driver reference
when a deferred export is done.
Also a bad dmabuf FD sent over a socket may not be received by an import
reader and this can cause a hang.
Set a 10 second timer so that importer is not blocking indefinitely.
Change-Id: I11a9b5ec64aa2e16fd6aecdf46c34e4eb56ccfd0
Extracts and creates a core dump ELF file from a fault event, using
core dump front end. GFX11 is not supported.
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: I5ae154e886f39ab3ce7bbae5803efb27a96c7e2e
This allows the VA to be recorded in ROCr so that they are not
treated as an invalid pointer in future API calls.
Change-Id: I8d1d8ef9816a984c89d30a2179b0ce8940fef1da
- add rocprofiler-register to CPACK_DEBIAN_BINARY_PACKAGE_DEPENDS when found
- add rocprofiler-register to CPACK_RPM_BINARY_PACKAGE_REQUIRES when found
- remove report_tool_load_failures_explicit_
- add HSA_TOOLS_DISABLE_REGISTER flag
- add HSA_TOOLS_REPORT_REGISTER_FAILURE
- use HSA_TOOLS_REPORT_REGISTER_FAILURE instead of HSA_TOOLS_REPORT_LOAD_FAILURE
- changed rocprofiler-register message to not include the word "error"
Change-Id: Ib7fd7f14c42758a54c347874018281bb1b5477a6
If two attach requests to the same piece of shared memory occur,
a double export or premature dmabuf fd close can occur since the export
and close on demand calls are not atomic.
Use a reference counter on shared memory dmabuf FDs that have
already been opened to avoid this problem.
Change-Id: I14a59209c0385e32582af42a57b33b1c6838a9b1
Instead of caching shared memory fds for export on the exporter side,
only export the FD in the async handler when requested.
The importer should request export fd closure once import is done.
Change-Id: I469e0cd1749beeb9c506c8a6461745fb039d9c3b
Adjust code to allow the use of non-contiguous chunks of memory to be
mapped within a single VA range.
Change-Id: Ida21ba202927229347b3a32d9b7106df10819cf5
Users can import device memory without specifying the target node.
DMA buf imports return a Thunk handle that's not useful for
gpu mapping calls.
Fix this by using the import node information to re-import and
map with the correct target GPU.
Also fix IPC detach calls by deregistering the Thunk handle
import immediately during attach instead of failing to do it later
on detach since Thunk handles aren't placed into ROCr allocation
map.
Finally refactor the IPC attach function for cleaner logic flow.
Change-Id: Ib2bf178110b2be98bd6917c765f724e4e613f5f2
Use emplace to prevent copying the MappedHandle objects when inserting
entries into mapped_handle_map_.
Change-Id: Id3f40f1eb73ce30e62da53c5aea4dd715e83ac59
After a memory handle is created. hsa_amd_vmem_get_access should return
HSA_ACCESS_PERMISSION_NONE insread of reporting the allocation as
invalid.
Change-Id: I1a09d15c220d48497d09c89059493e538f82aeb9
As the KFD IPC IOCTLs will not be upstreamed, change runtime
implementation to use DMA bufs.
DMA buf fds will be passed over abstract unix domain sockets.
The exporter spins a thread that creates a socket server.
The importer connects to the server to fetch the fd.
libDRM will be required to do a manual import and GPU map for
memory that is not already imported and mapped.
For now, use the legacy IPC implementation by default as a
follow on patch will disable the HSA_ENABLE_IPC_MODE_LEGACY
environment variable.
Change-Id: Ifd8469e9adfc81f8a1ea78d6010fb10b515ba1b4
Implement HybridMutex to improve latencies compared to KernelMutex when
there is contention between several threads calling hsa_signal_create
and hsa_amd_signal_async_handler.
Change-Id: If53377033e749b0050727964c9303f09b02527cc
KFD had some fixes for handling of virtual memory APIs. These fixes are
included in interface version 1.15.
Change-Id: Ie701eccf6e032f9ec0a1f4e8a43718964eebddc6
This reverts commit 803e37ded5.
This commit disables core dump feature. Apparently, gfx1101 SA1 waves
can not enter the trap handler because they receive an invalid
address. However, core dump at the debugger has been moved to rocm
6.2.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I7915caf58118658e5e7f435f91a0a6216d2fdb42
Extracts and creates a core dump ELF file from a fault event, using
core dump front end.
Signed-off-by: Alex Sierra <Alex.Sierra@amd.com>
Change-Id: Ibbbe41b3d13dd3fcb90161e927d48c329cf513a9
- fix logic for using HSA_TOOLS_LIB when rocprofiler-register support is enabled
- report tool load failure for rocprofiler-register
Change-Id: Ife23aa3e6ed19174376cd694764583b73f8976cd
- Update CMakeLists.txt
- find_package for rocprofiler-register
- this is an optional package until rocprofiler-register is added to the CI
- define HSA_VERSION_{MAJOR,MINOR,PATCH} ppdefs
- Update runtime.cpp
- include <rocprofiler-register/rocprofiler-register.h>
- if rocprofiler-register succeeds, do not support v1 unless explicitly requested
Change-Id: I8f48bbf3f6b52fb91ddade2f198491a1256035fe
Add handler to handle HW exception events reported by underlying
drivers. These events are generally caused by GPU resets and need the
application to abort.
As an improvement, in the future, we can provide additional information
about the exception (e.g mode-reset level)
Change-Id: If3fb5f19f9fce181a9d3b5e34a5506725856e7b0
Modify hsa_amd_vmem_get_access to handle pointers that are within VA
range of an existing memory mapping
Change-Id: I9f806ec39f6e9a33da8d86dd65d9a472438fa8ed
Support function to retain allocation handle for memory mappings.
The get allocation properties function will return the current
allocation properties for existing memory mappings.
This is part of patch series for Virtual Memory API.
Change-Id: I0a53a11b6efc2b5bf9d463512a489a2abd812551
Support exporting and importing dmabuf file descriptors for memory
mappings. The exported dmabuf file descriptors are shareable posix
file descriptors that can be used for cross-vendor, cross-device
and cross-process memory sharing.
This is part of patch series for Virtual Memory API.
Change-Id: I3673fc009f7e73bc26be8349e19f66e20d0607c5
Mapping memory handles to virtual memory addresses do not make them
accessible. The set access function is needed to make the memory
mappings accessible to specific agents. The get access function
returns current access properties for individual agents.
This is part of patch series for Virtual Memory API.
Change-Id: I152ba0557fd2a802eb9d840568b68cdd1911b72c
Add support for mapping and unmapping memory handles to virtual
address ranges.
This is part of patch series for Virtual Memory API.
Change-Id: If512d49ff4211e68f2064249add607a3200e458a
Add support for creating and releasing memory handles. Memory
handles are memory allocations on device memory without a virtual
address.
This is part of patch series for Virtual Memory API.
Change-Id: I5dfb162eb1661621cce171b2870a3c93b24d840e