* libhsakmt: Update hsakmt_fmm_get_handle to support address range
Currently, hsakmt_fmm_get_handle works only if the address is the starting
value of an allocation. Update it so it can find the handle if the address
falls anywhere in the valid allocated range. This is useful for the AMD
Infinity Storage feature, where data needs to be transferred to any memory
within the allocated range.
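A minimal sketch of the range-based lookup idea, for illustration only. The
struct alloc_entry type and find_handle_in_range() are hypothetical names and
do not reflect the real libhsakmt data structures, which are not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one allocation; not the real libhsakmt object. */
struct alloc_entry {
    uintptr_t start;   /* first byte of the allocation */
    size_t    size;    /* length in bytes */
    uint64_t  handle;  /* opaque handle for the allocation */
};

/*
 * Store the handle and return 0 if addr lies in [start, start + size)
 * of any entry; return -1 if no entry contains addr.
 */
static int find_handle_in_range(const struct alloc_entry *entries, size_t count,
                                uintptr_t addr, uint64_t *handle_out)
{
    for (size_t i = 0; i < count; i++) {
        const struct alloc_entry *e = &entries[i];

        /* Comparing the offset avoids overflow in start + size. */
        if (addr >= e->start && addr - e->start < e->size) {
            *handle_out = e->handle;
            return 0;
        }
    }
    return -1;
}
```

A lookup that only compared addr against start would miss every address past
the first byte, which is the limitation the commit describes.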
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
* libhsakmt: Introduce AMD Infinity Storage (AIS) API
Add hsaKmtAisReadWriteFile() API to support AMD Infinity Storage. The
API moves data directly from GPU VRAM to a file.
v2: Add in/out ioctl arguments to provide more status information to
user space. Modify hsaKmt API also accordingly.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
* rocr: Initial implementation of AMD Infinity Storage (AIS)
Implement the first two APIs: hsa_amd_ais_file_write and hsa_amd_ais_file_read
v2: Change API from hsa_amd_ to hsa_amd_ais_
Change API to take in handle instead of fd for compatibility across
different platforms
Original Author: Chris Freehill <Chris.Freehill@amd.com>
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
---------
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
* libhsakmt: fix UB due to signed integer literal in 1 << 31
Bit shift operations on signed numbers should not shift into or beyond
the sign bit, as this results in Undefined Behaviour.
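As an illustration of the fix pattern, the sketch below builds a single-bit
mask from an unsigned literal. The helper name bit32() is hypothetical and is
not taken from the patch.

```c
#include <stdint.h>

/*
 * Mask for bit n (valid for n in 0..31) of a 32-bit value.
 * UINT32_C(1) is unsigned, so shifting into bit 31 is well defined.
 * The signed form (1 << 31) overflows int, which is undefined behaviour.
 */
static inline uint32_t bit32(unsigned int n)
{
    return UINT32_C(1) << n;
}
```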
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
* libhsakmt: Fix UB due to signed integer literal in 1 << x
Bit shifting a signed integer into or beyond the sign bit is undefined behavior.
BUG: SWDEV-532853
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
* rocr: Fix UB in various places due to signed integer in bit shift
Bit shifting signed integers into or beyond the sign bit is undefined.
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
* rocr: Change signed integer literals to unsigned
Change the signed integer literals in the macro expressions throughout the
file to unsigned to avoid overflow.
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
---------
Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Co-authored-by: Flora Cui <flora.cui@amd.com>
* libhsakmt: Update ioctl version to 1.18
Sync with kernel ioctl version.
Also explicitly set the ioctl flag to KFD_PROC_FLAG_MFMA_HIGH_PRECISION
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
* libhsakmt: Sync ioctl header by adding kfd_ioctl_profiler
Sync with kernel ioctl version. Add kfd_ioctl_profiler.
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
---------
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Also, the advice parameter of the madvise() system call is not a bitmask, so
fix that as well.
v2: Use MAP_SHARED instead of MAP_PRIVATE. This avoids MMU notifiers and
evictions.
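To illustrate the madvise() point: advice values are enumerated constants, so
they should be compared for equality rather than tested with a bitwise AND.
Both helpers below are hypothetical and only contrast the two checks. The
comment about overlapping values relies on the Linux definitions, where
MADV_RANDOM is 1 and MADV_WILLNEED is 3.

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <sys/mman.h>

/* Incorrect: treats the advice value as a set of flag bits. */
static bool advice_matches_as_bitmask(int advice, int wanted)
{
    return (advice & wanted) != 0;
}

/* Correct: an advice value names exactly one behaviour. */
static bool advice_matches(int advice, int wanted)
{
    return advice == wanted;
}

/*
 * On Linux, advice_matches_as_bitmask(MADV_WILLNEED, MADV_RANDOM) is true
 * because 3 & 1 is non-zero, even though the two advice values differ.
 */
```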
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
This patch uses the udmabuf driver to allocate system memory on APUs instead of
the amdgpu driver. With this, an app's consumed system memory can be accounted
for by the cgroup mechanism. This is enabled by env variable HSA_USE_UDMABUF.
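A small sketch of gating a feature on an environment variable such as
HSA_USE_UDMABUF. The helper name env_flag_enabled() is hypothetical, and
treating only the value "1" as enabled is an assumption, since the accepted
values are not stated in this message.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return true when the named environment variable is set to "1".
 * Assumption: any other value, or an unset variable, means disabled.
 */
static bool env_flag_enabled(const char *name)
{
    const char *value = getenv(name);

    return value != NULL && strcmp(value, "1") == 0;
}
```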
Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
[ROCm/ROCR-Runtime commit: 996e8bbfb7]
This patch adds VirtIO support to the libhsakmt library, enabling communication
with AMD GPUs via VirtIO.
Details
- CMakeLists.txt: Added a new CMakeLists.txt file for the VirtIO component
of libhsakmt.
- hsakmt_virtio.c/h: Implemented the core VirtIO functionality, including
VirtIO GPU device initialization, command execution, and memory management.
- virtio_gpu.c/h: Contains the implementation of the VirtIO GPU device,
including ioctl handling, shared memory management, and command execution.
- hsakmt_virtio_events.c: Implements event handling for VirtIO, such as event
creation, destruction, setting, resetting, and querying event states.
- hsakmt_virtio_memory.c: Manages memory operations for VirtIO, including memory
allocation, freeing, mapping, and unmapping.
- hsakmt_virtio_queues.c: Implements queue management for VirtIO, including
queue creation, destruction, and updating.
- hsakmt_virtio_topology.c: Handles system and node properties for VirtIO.
- hsakmt_virtio_vm.c: Manages VM-related operations for VirtIO, such as
reserving and dereserving VA space.
- include/linux/virtgpu_drm.h: Contains DRM definitions for VirtIO GPU.
Key Features
- VirtIO GPU Initialization: The library can now initialize a VirtIO GPU device
and communicate with it.
- Command Execution: Supports executing commands on the VirtIO GPU device.
- Memory Management: Provides functions for allocating, freeing, mapping, and
unmapping memory for VirtIO operations.
- Event Handling: Implements a comprehensive event system for VirtIO.
- Queue Management: Allows for creating, destroying, and updating queues
on the VirtIO GPU device.
- System and Node Properties: Retrieves and manages system and node
properties for VirtIO.
Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
[ROCm/ROCR-Runtime commit: 48d3719dba]
- Refactored scratch memory handling by introducing fmm_is_scratch_aperture to
replace repeated for-loops.
- Simplified code paths in hsakmt_fmm_release, hsakmt_fmm_map_to_gpu, and
hsakmt_fmm_unmap_from_gpu by using the new helper.
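The refactoring pattern described above, a repeated per-GPU loop folded into
one predicate, could look roughly like the sketch below. The types and the
name is_scratch_aperture_sketch() are hypothetical. The real helper is
fmm_is_scratch_aperture, whose signature is not shown in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical aperture description with an inclusive upper bound. */
struct aperture_sketch {
    uintptr_t base;
    uintptr_t limit;
};

/* Hypothetical per-GPU state holding that GPU's scratch aperture. */
struct gpu_mem_sketch {
    struct aperture_sketch scratch;
};

/* Return true if address falls inside any GPU's scratch aperture. */
static bool is_scratch_aperture_sketch(const struct gpu_mem_sketch *gpus,
                                       uint32_t gpu_count, const void *address)
{
    uintptr_t a = (uintptr_t)address;

    for (uint32_t i = 0; i < gpu_count; i++) {
        if (a >= gpus[i].scratch.base && a <= gpus[i].scratch.limit)
            return true;
    }
    return false;
}
```

Callers such as release, map, and unmap paths can then branch on one call
instead of each carrying its own copy of the loop.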
Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
[ROCm/ROCR-Runtime commit: 72061a9024]
This patch changes the type of several loop index variables from int to
uint32_t in fmm.c. The affected functions are:
- __fmm_release
- _fmm_map_to_gpu
- _fmm_unmap_from_gpu
To fix compile warning:
warning: comparison of integer expressions of different signedness:
'int' and 'uint32_t' {aka 'unsigned int'} [-Wsign-compare]
2009 | for (i = 0; i < object->handle_num; i++) {
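A reduced example of the same fix, assuming a hypothetical struct with a
uint32_t count field. Declaring the index with the same unsigned type as the
bound removes the mixed-signedness comparison.

```c
#include <stdint.h>

/* Hypothetical stand-in for an object that tracks several handles. */
struct object_sketch {
    uint32_t        handle_num;
    const uint64_t *handles;
};

static uint64_t sum_handles(const struct object_sketch *object)
{
    uint64_t total = 0;
    uint32_t i; /* same type as handle_num, so -Wsign-compare stays quiet */

    for (i = 0; i < object->handle_num; i++)
        total += object->handles[i];
    return total;
}
```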
Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>
[ROCm/ROCR-Runtime commit: 45af009c5d]
Change the biggest single buffer to be huge page aligned,
along with other optimizations.
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
[ROCm/ROCR-Runtime commit: afe7965796]
When allocating a userptr buffer in system RAM with a size greater
than or equal to 512G, TTM hits a limit and returns an error. Splitting
one big buffer into multiple small buffers in vm_object solves
this issue.
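A sketch of the splitting arithmetic only. The per-chunk cap CHUNK_MAX_SKETCH
(256 GiB) and the function name are assumptions for illustration. The chunk
size actually used by the patch is not stated here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed per-chunk cap, chosen to stay below the 512G limit. */
#define CHUNK_MAX_SKETCH (UINT64_C(256) << 30)

/*
 * Split total bytes into chunks of at most CHUNK_MAX_SKETCH bytes.
 * Writes the chunk sizes to sizes[] and returns how many were written.
 * Returns 0 if total is 0 or if capacity is too small for all chunks.
 */
static size_t split_buffer_sketch(uint64_t total, uint64_t *sizes, size_t capacity)
{
    size_t n = 0;

    while (total > 0) {
        uint64_t this_size = total < CHUNK_MAX_SKETCH ? total : CHUNK_MAX_SKETCH;

        if (n == capacity)
            return 0;
        sizes[n++] = this_size;
        total -= this_size;
    }
    return n;
}
```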
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
[ROCm/ROCR-Runtime commit: 8887d25304]
If unmapping from the GPU fails, for example when unmapping a user queue
buffer while the queue is active, we should not free
obj->mapped_node_id_array. Otherwise, a later unmap of the user queue buffer,
after the queue is destroyed, still fails.
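The ordering described above can be sketched as follows. Everything here is a
hypothetical stand-in: the struct, the callback type, and the two fake unmap
functions exist only to show that the array is released after a successful
unmap and kept after a failed one.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical object that remembers which nodes it is mapped to. */
struct vm_obj_sketch {
    uint32_t *mapped_node_id_array;
    uint32_t  mapped_node_count;
};

typedef int (*unmap_fn)(struct vm_obj_sketch *obj);

/* Stand-ins for the real unmap call: one fails, one succeeds. */
static int fake_unmap_busy(struct vm_obj_sketch *obj) { (void)obj; return -1; }
static int fake_unmap_ok(struct vm_obj_sketch *obj)   { (void)obj; return 0; }

static int unmap_and_release(struct vm_obj_sketch *obj, unmap_fn do_unmap)
{
    int ret = do_unmap(obj);

    if (ret != 0)
        return ret; /* keep the node list so a later retry can still use it */

    free(obj->mapped_node_id_array);
    obj->mapped_node_id_array = NULL;
    obj->mapped_node_count = 0;
    return 0;
}
```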
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Change-Id: I32aeb18871c2e971d01900d92916c54680f5c9fa
[ROCm/ROCR-Runtime commit: 3e6f51b715]
This builds on a prior change that allowed for allocating
a user-mode queue's packet buffer in device memory to also
allocate the queue struct in device memory. This provides
additional latency benefits particularly for cases where
dispatches are performed from the GPU itself. Flags are
added to support the various use cases.
[ROCm/ROCR-Runtime commit: 6e3c375bf1]
The overarching goal is to provide an API that pre-silicon models can latch into for software bring-up.
[ROCm/ROCR-Runtime commit: d4b85b6bf5]
Environment variable HSA_HIGH_PRECISION_MODE can be used to control MFMA
precision
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: Ib78dd9dd8867025e090a3cca96ab6db4f65dea12
[ROCm/ROCR-Runtime commit: 2a64fa5e06]
GFX950 can support page table fragments up to 18 without
performance loss, so set the GFX950 default svm.alignment_order to 18.
Change-Id: Ibcdb7f041fb07a38e924c471beec261ea227ca1d
Signed-off-by: James Zhu <James.Zhu@amd.com>
[ROCm/ROCR-Runtime commit: 9509af4b98]
Make sure to allocate the same size for VGPR data in
gfx950 as is done for gfx940.
Change-Id: I6a0820996389627ccbdfef856e5150c46fac92a1
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
[ROCm/ROCR-Runtime commit: 76052ba028]
The CWSR area size needs to take into account the size of LDS each
active workgroup can have. The current implementation uses a constant
for that. This patch refactors this to use the HsaNodeProperties of the
device the CWSR area is for to figure out the size of LDS.
Change-Id: Ib8585b2b7140ec5c99e7b7d62e67f785697c028a
Signed-off-by: Lancelot Six <Lancelot.Six@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
[ROCm/ROCR-Runtime commit: c51aa0d155]
This reverts commit 5a8092bccf.
Reason for revert: This puts the change with ID Id1154f08f6ba21c633905fd46b06053994d6f3cc back into the ROCR repo, which prevents memory allocations from being automatically granted the 'executable' flag, addressing previously incorrect and unsafe behavior in the ROCm driver.
Change-Id: I3d45c45859929a80f7791681b411251e099a1901
[ROCm/ROCR-Runtime commit: 2d4a578020]
The local variable 'counter_id' exceeded the maximum stack size for a
single object, so move it to the heap to prevent overflow.
Also use a contiguous memory block for the 2D array
to reduce space complexity, add error messages for
NO_MEMORY exits, and check the MAX_COUNTER limit for IDs.
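A sketch of the heap-backed, contiguous 2D layout described above. The limit
MAX_COUNTER_SKETCH, the element type, and both function names are assumptions
for illustration and are not the patch's actual definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed upper bound on counters per row; the real limit is not shown. */
#define MAX_COUNTER_SKETCH 512

/*
 * Allocate rows x cols zeroed entries as one contiguous heap block.
 * Returns NULL, with a message on stderr, on a bad size or no memory.
 */
static uint64_t *alloc_counter_table(size_t rows, size_t cols)
{
    uint64_t *table;

    if (rows == 0 || cols == 0 || cols > MAX_COUNTER_SKETCH) {
        fprintf(stderr, "counter table: invalid dimensions\n");
        return NULL;
    }
    /* cols is bounded above, and calloc checks the rows multiplication. */
    table = calloc(rows, cols * sizeof(*table));
    if (table == NULL)
        fprintf(stderr, "counter table: out of memory\n");
    return table;
}

/* Address of element [row][col] inside the flat block. */
static inline uint64_t *counter_at(uint64_t *table, size_t cols,
                                   size_t row, size_t col)
{
    return &table[row * cols + col];
}
```

One block means one allocation and one free(), instead of a separate
allocation per row.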
Change-Id: Id0249ca767a336b31c759c693a82d3f5c950a2fa
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
[ROCm/ROCR-Runtime commit: ecf57310ca]
Add free() for 'all_gpu_id_array' in
hsakmt_fmm_destroy_process_apertures() and
remove it from 'hsakmt_fmm_clear_all_mem()'
Change-Id: I32d2d22e7152f62a3f2e7da4f601f0db7cebd534
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
[ROCm/ROCR-Runtime commit: c066ec13dd]
This reverts commit cfb1ab45ac.
Reason for revert:
This is currently breaking some tools. Will put it back as soon as tools update their code.
Change-Id: I05c82d443f3a274a618d05e6dc5a87943f5dc7a4
[ROCm/ROCR-Runtime commit: 80da7d5ee4]
Fixed multiple issues related to memory management, atomicity,
and error handling across various functions: handle null checks,
use-after-free, unchecked returns, and memory leaks.
Change-Id: Ia7c76320cc20e24001052fbba2dd0600bd412140
[ROCm/ROCR-Runtime commit: c9454794b6]
Currently registering graphics memory without specifying a target
node will return a memory handle that's not a virtual address.
As a result, ROCr is forced to register with a target node for
IPC usage.
Mapping memory without specifying a target node afterwards will
result in mapping to the target node that was imported, because the
previous import call marks that node as the target for future mappings.
For ROCr IPC usage, ROCr wants to map to all GPU nodes if the target node
is not specified.
Allow the caller to register graphics handles that return a virtual
address without having to specify the target node so that the caller
can make a subsequent map call to all GPUs.
Change-Id: I5a935092b885cc3568e4f3a5dd951c7ec6c84fca
[ROCm/ROCR-Runtime commit: 03463ed2c0]
Fix data race by protecting events_page access with mutex in event create
Fix potential NULL dereference in hsaKmtWaitOnMultipleEvents_Ext
Fix unchecked return value in hsaKmtCreateEvent function
Change-Id: I434bef43666e5205a8b061259569c1d99a952752
[ROCm/ROCR-Runtime commit: 857200e28c]
We had skipped doing it for PAGE_SIZE, but it should be left as the
regular PAGE_SHIFT name, especially for users who are using different
headers. We want PAGE_SHIFT and PAGE_SIZE to be consistent with one
another, so set them both explicitly to the same value if either
of them is undefined
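One way to keep the two macros consistent is sketched below. The value 12
(4 KiB pages) is an assumption for illustration, and pages_needed() is a
hypothetical helper that shows why the pair must agree.

```c
/* If either macro is missing, define both from one assumed shift value. */
#if !defined(PAGE_SHIFT) || !defined(PAGE_SIZE)
#undef PAGE_SHIFT
#undef PAGE_SIZE
#define PAGE_SHIFT 12                  /* assumed: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#endif

/* Number of pages needed to hold bytes; uses both macros together. */
static inline unsigned long pages_needed(unsigned long bytes)
{
    return (bytes + PAGE_SIZE - 1) >> PAGE_SHIFT;
}
```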
Change-Id: I121d81c48409dd77351b59a192d824e2419a2410
Signed-off-by: Kent Russell <kent.russell@amd.com>
[ROCm/ROCR-Runtime commit: daad183bf8]
Add check before close to prevent closing invalid file descriptors
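A minimal sketch of the guard, using a hypothetical helper name. It assumes
that a negative value marks a descriptor as not open.

```c
#include <unistd.h>

/*
 * Close *fd only if it looks valid, then mark it closed.
 * Returns the result of close(), or 0 if there was nothing to close.
 */
static int close_if_valid(int *fd)
{
    int ret = 0;

    if (*fd >= 0) {
        ret = close(*fd);
        *fd = -1; /* prevents a second close of the same descriptor */
    }
    return ret;
}
```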
Change-Id: Ie1d50e0d55159512a14a70c1e4be058218aae668
[ROCm/ROCR-Runtime commit: ff6e1b44bf]
The fmm_node_[added|removed] functions were added in the initial FMM
support, but weren't used. Remove them now since no one's referencing
them
Change-Id: I1e46e57294a72012227b38f46c7099de0b9263be
Signed-off-by: Kent Russell <kent.russell@amd.com>
[ROCm/ROCR-Runtime commit: 3b61f75f49]
To support fully-static library ROCm builds, ensure that all global
symbols are prefixed with something meaningful to avoid collisions with
other libraries
A script was made using "objdump -C -t" to get a list of symbols,
then checking if the global symbols have a meaningful prefix (for thunk:
hsakmt or kmt in various cases)
Change-Id: Ifd353f64a3344eb60d1f6c4e041aa20967b38a59
Signed-off-by: Kent Russell <kent.russell@amd.com>
[ROCm/ROCR-Runtime commit: 3da42a0847]
trace is calloc'd but never freed. Free it.
Change-Id: I5795cbe5738f25a9621d24be86abb35c263fa8b7
Signed-off-by: Kent Russell <kent.russell@amd.com>
[ROCm/ROCR-Runtime commit: 4dc9d49aa6]
Previous code would blindly set executable bit on all allocations.
Change-Id: Id1154f08f6ba21c633905fd46b06053994d6f3cc
[ROCm/ROCR-Runtime commit: 75143555fa]
The enum type for compute AQL is defined as larger than the targeted SDMA
enum types. We should only deny legacy calls for SDMA queues that
require targeted engines.
Change-Id: I6386a8700b3b18af825b6f0d2be27052cc8de0f5
[ROCm/ROCR-Runtime commit: ae99effb29]
Core dump support relies on debugger-related KFD ioctls that were
introduced in version 1.13 of the interface. However, the code checks
for KFD_IOCTL_MINOR_VERSION (currently 17), making it impossible to
produce core dumps when using some drivers that should support it.
Update the CHECK_KFD_MINOR_VERSION calls in the debugger related ioctl
wrappers and look for KFD 1.13 or above.
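The idea is to compare the running driver's version against the version that
introduced the feature, not against the newest version the headers know about.
The sketch below is a hypothetical helper, not the CHECK_KFD_MINOR_VERSION
macro itself, whose definition is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Version that introduced the debugger ioctls, per the message above. */
#define DEBUG_IOCTL_MIN_MAJOR 1u
#define DEBUG_IOCTL_MIN_MINOR 13u

/* True if the running interface (major.minor) is at least req_major.req_minor. */
static bool kfd_version_at_least(uint32_t major, uint32_t minor,
                                 uint32_t req_major, uint32_t req_minor)
{
    return major > req_major ||
           (major == req_major && minor >= req_minor);
}
```

With this check, a driver reporting 1.13 through 1.16 passes, where a
comparison against a header constant of 17 would reject it.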
Change-Id: I10a7fd03bf8f678b6318d7c25d6a7ded804dac67
[ROCm/ROCR-Runtime commit: d5acab2b39]
Extend the current Thunk implementation of queue creation to target
specific SDMA engine IDs.
Also expose the new recommended SDMA engines per IO link from the KFD
sysfs.
Change-Id: I51f9a0d83c0f1fc4d5dc837f879a7ae332e7d7e9
[ROCm/ROCR-Runtime commit: 2f588a2406]
When HSA_OVERRIDE_GFX_VERSION is used, save the overridden GFX
version to OverrideEngineId instead of the original EngineId. There
are places where the real GFX properties are still needed, e.g. CWSR size
calculation.
Change-Id: I9d9149bae465b7cfe55604fc19e7ca34e48b7b1c
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
[ROCm/ROCR-Runtime commit: 3f1f68c8cb]