Commit Graph

49 Commits

Author SHA1 Message Date
jordans d4b85b6bf5 hsakmt: Initial Commit for the HSA KMT Model
The overarching goal is to provide an API that pre-silicon models can latch into for software bring-up.
2025-03-18 16:22:17 -04:00
Longlong Yao 5916467552 libhsakmt: set node_id to 0 for OnlyAddress
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-03-11 10:16:58 -04:00
Jonathan Kim e3d09e30dc hsakmt: Expose per-SDMA queue reset capabilities
Expose new capabilities field that flags per-sdma queue reset
support.
2025-03-06 14:04:42 -05:00
Longlong Yao 26f001d3cb libhsakmt: allocate va in host path
Change-Id: I40a4395aca99ea8dfd8ff0ecde64eb2c3840d867
Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>
2025-02-15 07:56:45 -05:00
Harish Kasiviswanathan 2a64fa5e06 libhsakmt: gfx950: Add option to enable HIGH_PRECISION
Environment variable HSA_HIGH_PRECISION_MODE can be used to control MFMA
precision

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: Ib78dd9dd8867025e090a3cca96ab6db4f65dea12
2025-02-10 16:05:25 -05:00
Sv. Lockal 5d04bd42f3 Fix build issues for musl libc (#267)
Change-Id: Ia31330b0f96669966712b58986abeca754c2cbb9
2025-01-29 14:31:05 +00:00
James Zhu 9509af4b98 libhsakmt: increase default svm.alignment_order
GFX950 can support page table fragments up to 18 without
performance loss, so set the GFX950 default svm.alignment_order to 18.

Change-Id: Ibcdb7f041fb07a38e924c471beec261ea227ca1d
Signed-off-by: James Zhu <James.Zhu@amd.com>
2025-01-28 08:27:19 -05:00
Lancelot Six 76052ba028 libhsakmt: gfx950 uses same VGPR block size as gfx940
Make sure to allocate the same amount of space for VGPR data in
gfx950 as is done for gfx940.

Change-Id: I6a0820996389627ccbdfef856e5150c46fac92a1
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
2025-01-27 14:06:42 -05:00
Lancelot Six c51aa0d155 libhsakmt: Use the node info to determine LDS size
The CWSR area size needs to take into account the size of LDS each
active workgroup can have.  The current implementation uses a constant
for that.  This patch refactors this to use the HsaNodeProperties of the
device the CWSR area is for to figure out the size of LDS.

Change-Id: Ib8585b2b7140ec5c99e7b7d62e67f785697c028a
Signed-off-by: Lancelot Six <Lancelot.Six@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
2025-01-26 21:46:32 -05:00
Shweta Khatri 2d4a578020 Revert "Revert "hsakmt: Only set exec flag when requested""
This reverts commit 80da7d5ee4.

Reason for revert: This puts the change with Change-Id Id1154f08f6ba21c633905fd46b06053994d6f3cc back into the ROCR repo. It prevents memory allocations from being automatically granted the 'executable' flag, addressing the previously incorrect and unsafe behavior in the ROCm driver.

Change-Id: I3d45c45859929a80f7791681b411251e099a1901
2025-01-23 09:08:25 -05:00
Apurv Mishra ecf57310ca hsakmt: move 'counter_id' array to heap
local variable 'counter_id' exceeded the max single
use of stack, thus move to heap to prevent overflow

also, use of a contiguous memory block for 2D array
to reduce space complexity, add error messages for
NO_MEMORY exits and check MAX_COUNTER limit for IDs

Change-Id: Id0249ca767a336b31c759c693a82d3f5c950a2fa
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
2025-01-13 16:29:16 -05:00
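The contiguous-block layout mentioned in commit ecf57310ca can be illustrated with a minimal sketch. This is not the code from the commit: `table_alloc` and `table_at` are hypothetical names, and the element type is an assumption. It shows one heap allocation standing in for a 2D array, with row-major indexing and an overflow check before allocating.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not from the commit): allocate a rows x cols table
 * of uint32_t as ONE contiguous, zero-initialised heap block.
 * Returns NULL on a zero dimension, size overflow, or allocation failure. */
static uint32_t *table_alloc(size_t rows, size_t cols)
{
    if (rows == 0 || cols == 0)
        return NULL;
    if (rows > SIZE_MAX / cols)   /* rows * cols would overflow size_t */
        return NULL;
    return calloc(rows * cols, sizeof(uint32_t));
}

/* Row-major access into the flat block: element (row, col). */
static uint32_t *table_at(uint32_t *table, size_t cols, size_t row, size_t col)
{
    return &table[row * cols + col];
}
```

A single `free()` on the returned pointer releases the whole table, which avoids the per-row bookkeeping of an array of pointers.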
Apurv Mishra c066ec13dd hsakmt: modified to free all_gpu_id_array in fmm.c
Add free() for 'all_gpu_id_array' in
hsakmt_fmm_destroy_process_apertures() and
removed it from 'hsakmt_fmm_clear_all_mem()'

Change-Id: I32d2d22e7152f62a3f2e7da4f601f0db7cebd534
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
2024-11-28 13:08:03 -05:00
Apurv Mishra 79f0ac2534 hsakmt: minor code cleanup and refactor topology.c
removed unused value assignment for HSAKMT_STATUS,
restructured 'topology_sysfs_check_node_supported'

Change-Id: I21cdccb3e3c5e42981f10597426de479d0f4ee6a
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
2024-11-28 13:06:23 -05:00
David Yat Sin 80da7d5ee4 Revert "hsakmt: Only set exec flag when requested"
This reverts commit 75143555fa.

Reason for revert: 
This is currently breaking some tools. Will put it back as soon as tools update their code.

Change-Id: I05c82d443f3a274a618d05e6dc5a87943f5dc7a4
2024-10-16 20:31:27 -04:00
Shweta Khatri 8bc4efc8ca hsakmt: pmc_table.c: Fix Coverity reported warnings
Eliminate out-of-bounds access in get_block_properties

Change-Id: I3abee1e36fafdda053d4bc4a611698d676b01d5c
2024-10-07 14:15:26 -04:00
Shweta Khatri 52e7fd1480 hsakmt: debug.c: Fix Coverity reported warnings
Fix potential memory leak reported by Coverity

Change-Id: Iacbaa99be3f4fe7fae5fb6a10bd41dfc34b96059
2024-10-07 14:14:26 -04:00
Shweta Khatri c9454794b6 hsakmt: fmm.c: Fix Coverity reported warnings
Fixed multiple issues related to memory management, atomicity,
and error handling across various functions: handle null checks,
use-after-free, unchecked returns, and memory leaks.

Change-Id: Ia7c76320cc20e24001052fbba2dd0600bd412140
2024-10-07 13:54:03 -04:00
Jonathan Kim 03463ed2c0 hsakmt: Enable graphics handle registration with a virtual address
Currently registering graphics memory without specifying a target
node will return a memory handle that's not a virtual address.

As a result, ROCr is forced to register with a target node for
IPC usage.

Mapping memory without specifying a target node afterwards will
result in mapping to the target node that was imported because the
previous import call flags this node targeting action to future mapping.

For ROCr IPC usage, ROCr wants to map to all GPU nodes if the target node
is not specified.

Allow the caller to register graphics handles that return a virtual
address without having to specify the target node so that the caller
can make a subsequent map call to all GPUs.

Change-Id: I5a935092b885cc3568e4f3a5dd951c7ec6c84fca
2024-10-03 14:06:31 -04:00
Shweta Khatri 9f43c9fd51 hsakmt: spm.c: Fix Coverity reported warnings
Fix unused ret value and initialize gpu_id

Change-Id: Ib3acc7db4bbab519318d0970786a5dc641dcc9eb
2024-09-30 19:46:51 -04:00
Shweta Khatri 681610937a hsakmt: queues.c: Fix Coverity reported warnings
Move variable declarations inline and add NULL checks to prevent errors

Change-Id: Ia5bf5e245bcc0f756a15bc799b55c5e2a8459f89
2024-09-23 15:07:28 -04:00
Shweta Khatri 857200e28c hsakmt: events.c: Fix Coverity reported warnings
Fix data race by protecting events_page access with mutex in event create
Fix potential NULL dereference in hsaKmtWaitOnMultipleEvents_Ext
Fix unchecked return value in hsaKmtCreateEvent function

Change-Id: I434bef43666e5205a8b061259569c1d99a952752
2024-09-23 11:35:02 -04:00
Shweta Khatri 659fa04d8c hsakmt: topology.c: Fix Coverity reported warnings
Refactor fscanf_str to use fgets for safer string handling, remove unused code

Change-Id: Ibf4b4b485f99bf2fabfe48e9609ca99111feaf1e
2024-09-23 11:34:28 -04:00
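The fgets-based approach named in commit 659fa04d8c can be sketched as follows. The helper name `read_line_str` is hypothetical and this is not the library's `fscanf_str`; it only illustrates why `fgets` is the safer primitive: the read is bounded by the buffer capacity, and the result is always NUL-terminated.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line from `f` into `buf` (capacity `len`)
 * and strip the trailing newline. At most len - 1 characters are stored.
 * Returns 0 on success, -1 on bad arguments, read error, or EOF. */
static int read_line_str(FILE *f, char *buf, size_t len)
{
    if (!f || !buf || len == 0 || len > INT_MAX)
        return -1;
    if (!fgets(buf, (int)len, f))
        return -1;
    buf[strcspn(buf, "\r\n")] = '\0';   /* cut at first CR or LF, if any */
    return 0;
}
```

An unbounded `fscanf(f, "%s", buf)` has no way to know the buffer size, which is the class of issue static analyzers typically flag.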
Kent Russell daad183bf8 hsakmt: Undo HSAKMT prefix for PAGE_SHIFT
We had skipped doing it for PAGE_SIZE, but it should be left as the
regular PAGE_SHIFT name, especially for users who are using different
headers. We want PAGE_SHIFT and PAGE_SIZE to be consistent with one
another, so set them both explicitly to the same value if either
of them is undefined

Change-Id: I121d81c48409dd77351b59a192d824e2419a2410
Signed-off-by: Kent Russell <kent.russell@amd.com>
2024-09-20 11:04:34 -04:00
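One way to keep PAGE_SHIFT and PAGE_SIZE consistent, as commit daad183bf8 describes, is sketched below. The 4 KiB default (shift of 12) is an assumption made for illustration; the value the library actually uses is not stated in the commit message.

```c
/* Sketch only: if either macro is missing, (re)define BOTH so that they
 * can never disagree. The shift of 12 (4 KiB pages) is an assumed default. */
#if !defined(PAGE_SHIFT) || !defined(PAGE_SIZE)
#undef PAGE_SHIFT
#undef PAGE_SIZE
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#endif
```

Deriving PAGE_SIZE from PAGE_SHIFT, rather than writing two literals, means a later change to one cannot silently desynchronize the other.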
Shweta Khatri ff6e1b44bf hsakmt: openclose.c: Fix Coverity reported warnings
Add check before close to prevent closing invalid file descriptors

Change-Id: Ie1d50e0d55159512a14a70c1e4be058218aae668
2024-09-19 19:44:53 +00:00
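The guard described in commit ff6e1b44bf, checking a descriptor before closing it, might look like the sketch below. `close_if_valid` is a hypothetical name and not a function from openclose.c.

```c
#include <unistd.h>

/* Hypothetical helper: close *fd only if it holds a valid descriptor,
 * then mark it invalid so a second call becomes a no-op.
 * Returns 0 if nothing needed closing, otherwise the result of close(). */
static int close_if_valid(int *fd)
{
    if (!fd || *fd < 0)
        return 0;
    int ret = close(*fd);
    *fd = -1;   /* prevent an accidental double close */
    return ret;
}
```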
Kent Russell 3b61f75f49 hsakmt: Remove unused functions
The fmm_node_[added|removed] functions were added in the initial FMM
support, but weren't used. Remove them now since no one's referencing
them

Change-Id: I1e46e57294a72012227b38f46c7099de0b9263be
Signed-off-by: Kent Russell <kent.russell@amd.com>
2024-09-19 19:44:53 +00:00
David Yat Sin 0f241d4061 hsakmt: Add debug prints to trace mem allocations
Add extra debug prints to trace memory alloc and register

Change-Id: I03d8d7d415565916a8336db6e7063bb7d4cb9102
2024-09-19 19:44:53 +00:00
Kent Russell 3da42a0847 libhsakmt: Prefix global symbols with hsakmt
To support fully-static library ROCm builds, ensure that all global
symbols are prefixed with something meaningful to avoid collisions with
other libraries

A script was made using "objdump -C -t" to get a list of symbols,
then checking if the global symbols have a meaningful prefix (for thunk:
hsakmt or kmt in various cases)

Change-Id: Ifd353f64a3344eb60d1f6c4e041aa20967b38a59
Signed-off-by: Kent Russell <kent.russell@amd.com>
2024-09-06 09:56:07 -04:00
Kent Russell 4dc9d49aa6 hsakmt: Free alloc'd memory
trace is calloc'd but never freed. Free it.

Change-Id: I5795cbe5738f25a9621d24be86abb35c263fa8b7
Signed-off-by: Kent Russell <kent.russell@amd.com>
2024-09-05 10:20:09 -04:00
Xuanteng Huang 7a52a45824 hsakmt: fix spelling error
This was pulled in from:
https://github.com/ROCm/ROCT-Thunk-Interface/pull/107

Change-Id: Ic30e4552a94a212a9cd138f9311b1c85b0c13867
2024-09-04 10:46:39 -04:00
Joseph Greathouse 75143555fa hsakmt: Only set exec flag when requested
Previous code would blindly set executable bit on all allocations.

Change-Id: Id1154f08f6ba21c633905fd46b06053994d6f3cc
2024-09-03 15:13:56 -04:00
Jonathan Kim ae99effb29 libhsakmt: Fix improper type range check in legacy queue creation
Enum type for compute AQL is defined as larger than the targeted SDMA
enum types.  We should only deny legacy calls for SDMA queues that
require targeted engines.

Change-Id: I6386a8700b3b18af825b6f0d2be27052cc8de0f5
2024-08-28 13:55:41 -04:00
Lancelot SIX d5acab2b39 libhsakmt: Check for KFD 1.13 for debug ioctl interface
Core dump support relies on debugger-related KFD ioctls which were
introduced in version 1.13 of the interface.  However, the code checks
for KFD_IOCTL_MINOR_VERSION (currently 17), making it impossible to
produce core dumps when using some drivers that should support it.

Update the CHECK_KFD_MINOR_VERSION calls in the debugger related ioctl
wrappers and look for KFD 1.13 or above.

Change-Id: I10a7fd03bf8f678b6318d7c25d6a7ded804dac67
2024-08-21 23:45:25 +01:00
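The idea behind commit d5acab2b39 is that a feature should be gated on the interface version that introduced it (1.13 for the debug ioctls), not on the newest version known at build time. A minimal sketch follows; `struct kfd_version` and `kfd_at_least` are hypothetical names, not the library's CHECK_KFD_MINOR_VERSION macro, and the lexicographic comparison across major versions is an assumption.

```c
#include <stdbool.h>

/* Hypothetical representation of the version reported by the driver. */
struct kfd_version {
    unsigned major;
    unsigned minor;
};

/* True if the reported version is at least major.minor.
 * Assumption: versions compare lexicographically (major first). */
static bool kfd_at_least(struct kfd_version have, unsigned major, unsigned minor)
{
    if (have.major != major)
        return have.major > major;
    return have.minor >= minor;
}
```

With this shape, a driver reporting 1.13 passes a check for the debug interface even when the headers used for the build already know about 1.17.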
Jonathan Kim 2f588a2406 libhsakmt: Extend thunk queue creation with recommended sdma engines
Extend the current Thunk implementation of queue creation to target
specific SDMA engine IDs.

Also expose the new recommended SDMA engines per IO link from the KFD
sysfs.

Change-Id: I51f9a0d83c0f1fc4d5dc837f879a7ae332e7d7e9
2024-08-20 11:13:57 -04:00
Yifan Zhang 3f1f68c8cb libhsakmt: add OverrideEngineId property
When HSA_OVERRIDE_GFX_VERSION is used, save the overridden GFX
version to OverrideEngineId instead of the original EngineId. There
are places where the real GFX properties are still needed, e.g. CWSR size
calculation.

Change-Id: I9d9149bae465b7cfe55604fc19e7ca34e48b7b1c
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
2024-08-20 09:10:52 -04:00
David Yat Sin 4ffa325c08 libhsakmt: Add two symbols to global symbols
For users still using non-static hsakmt

Change-Id: I12b1c25f0d952ed9178529cadc518c57c1aeb06d
2024-08-19 14:56:00 -04:00
Alex Sierra 626eb4bfaf src/fmm.c: fallback to old userptr reg if SVM fails
Fallback to old userptr registration in case SVM method fails.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I70c3ec74a8b4f762713e6a0619453642f3fca8e5
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-07-18 10:20:05 -05:00
Adam Niederer 84567b6416 Allow overriding gfx version per-node
This lets you run two unsupported-but-really-supported cards of different architecture together in the same program. Works great w/ llama.cpp on my 7900XT + 6600.

Example usage (device 0 is RDNA3, device 1 is RDNA2):

HSA_OVERRIDE_GFX_VERSION_1="11.0.0" HSA_OVERRIDE_GFX_VERSION_2="10.3.0" ollama serve

Change-Id: Ic63ef462f698dee722d360f7fc3ef72789c277b7
Signed-off-by: AdamNiederer
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-07-18 10:20:05 -05:00
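The per-node override in commit 84567b6416 uses environment variables of the form HSA_OVERRIDE_GFX_VERSION_<n> holding a "major.minor.step" string. The sketch below shows one way such a lookup could be written. The names `struct gfx_version`, `parse_gfx_version`, and `node_gfx_override` are hypothetical, and how the library maps the numeric suffix to a device is not something this sketch assumes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical container for a parsed "major.minor.step" version. */
struct gfx_version {
    unsigned major, minor, step;
};

/* Parse a string such as "11.0.0". Returns 0 on success, -1 if the
 * string is NULL, incomplete, or has trailing characters. */
static int parse_gfx_version(const char *s, struct gfx_version *out)
{
    unsigned major, minor, step;
    char trailing;

    if (!s || !out)
        return -1;
    /* Exactly three conversions must succeed; a fourth means junk follows. */
    if (sscanf(s, "%u.%u.%u%c", &major, &minor, &step, &trailing) != 3)
        return -1;
    out->major = major;
    out->minor = minor;
    out->step = step;
    return 0;
}

/* Look up HSA_OVERRIDE_GFX_VERSION_<node> and parse it.
 * Returns -1 if the variable is unset or malformed. */
static int node_gfx_override(unsigned node, struct gfx_version *out)
{
    char name[64];

    snprintf(name, sizeof(name), "HSA_OVERRIDE_GFX_VERSION_%u", node);
    return parse_gfx_version(getenv(name), out);
}
```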
James Zhu 338721c24a PC Sampling: Temporarily check KFD_IOCTL_MINOR_VERSION 16
Since PC Sampling is still experimental, we can't
bump KFD_IOCTL_MINOR_VERSION to enable PC sampling.
KFD_IOCTL_MINOR_VERSION 16 already includes all PC sampling
code, so use version 16 to enable PC sampling implicitly for
customers to try out this new feature.
The version needs to be updated accordingly when PC sampling is upstreamed.

Change-Id: I65840128f94e8f347c0617971c0aa4b7e478691a
Signed-off-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-24 14:26:21 -05:00
Philip Yang 6e6f445f75 libhsakmt: Update contiguous memory support ioctl version
KFD ioctl version is 1.16 on upstream for contiguous memory support.

Remove the pc_sampling version; it should be added after pc_sampling is upstreamed.

Change-Id: I6e6c3340bc8e371d68dd7741b02578be2fdef801
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-24 14:26:21 -05:00
Philip Yang c98a8dc179 libhsakmt: Add missing CHECK_KFD_OPEN in APIs
The application may use parent process KFD handle or invalid KFD handle,
add CHECK_KFD_OPEN in all APIs to catch this application bug earlier
without calling to KFD.

Change-Id: I0391e91eeca8e6752fc9c23f0742445b823ea9b0
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-24 14:26:21 -05:00
David Yat Sin a31e84eaef libhsakmt: Add alignment for memory allocations
New API to support optional alignment parameter for memory allocations.
The alignment should be larger than or equal to page size and a power
of 2.

Change-Id: Ic3fec43b3c4281f74dd33a57ab4143dcf76e1186
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-24 14:26:21 -05:00
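The alignment rule stated in commit a31e84eaef (at least the page size, and a power of 2) can be checked with a short predicate. `alignment_is_valid` is a hypothetical name for illustration and is not the new API the commit adds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical validity check for an optional allocation alignment:
 * it must be a power of two and no smaller than the page size. */
static bool alignment_is_valid(uint64_t align, uint64_t page_size)
{
    if (align == 0 || align < page_size)
        return false;
    /* A power of two has exactly one bit set, so align & (align - 1) is 0. */
    return (align & (align - 1)) == 0;
}
```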
Lang Yu 4844a70d94 libhsakmt: Prevent hsaKmtRegisterMemory* from registering non-userptr
hsaKmtRegisterMemory* can only register OS allocated userptr.

v2: Apply changes to all hsaKmtRegisterMemory* stuff.(Philip)

v3: Unlock aperture->fmm_mutex to avoid deadlock.

Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I1045af7edb4da8206cb878f64c0176ba4fc59f60
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-24 14:26:21 -05:00
Lang Yu a7a712fb36 libhsakmt: Fix improper usage of hsaKmtRegisterMemoryToNodes
It's unnecessary to register non-userptr.

Change-Id: Iefd329578365e036e2fe7e4d5c9c0c3d0976f67c
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-24 14:26:21 -05:00
Lang Yu ae3ede062f libhsakmt: add Integrated property
To differentiate discrete and integrated GPU more flexibly in runtime,
this will aid in querying HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU
and hipDeviceAttributeIntegrated.

Change-Id: Ic8a6c9aea3b4bd19c4d5f6729af7e64c328fc61d
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-24 14:26:21 -05:00
Yiyang Wu 9316b6e4e4 kfdtest: hsaKmtCheckRuntimeDebugSupport should be visible
Change-Id: I03a379ede1c990bd275a4d2a8cb379f228381d03
Signed-off-by: Yiyang Wu <xgreenlandforwyy@gmail.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-24 14:26:21 -05:00
David Belanger 259a724e21 libhsakmt: Fix VGPR size for GFX12/GFX12.1
Set max size needed for VGPR when doing a CWSR for GFX12 and GFX12.1.

Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Iddefc62f1ad419c6f5ab6a872048457a1dc24037
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-24 14:26:21 -05:00
James Zhu 1087dea925 kfdtest: skip test when PC Sampling is not supported by ASIC
Skip test when PC Sampling is not supported by ASIC.

Change-Id: I6f9be0bdaed66e51052723b6df6908079470cefb
Signed-off-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-24 14:26:21 -05:00
Jonathan Kim 206db80a56 libhsakmt: fix pc sampling return of functions
C error returns are positive in user space and should be checked against errno
instead.
Fix declaration of return to type HSAKMT_STATUS.
KFD IOCTL should handle size return when querying capabilities so return
size to caller unconditionally.
Clean up error translations per function so that it's stylistically
clear.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: Ic37390425f370c7ad88f9ed014444decf19383a3
Signed-off-by: Chris Freehill <cfreehil@amd.com>
2024-06-24 14:26:21 -05:00
Chris Freehill 11fd5c2562 Prepare for integration into rocr
Change-Id: I6102b9910dbb9d09e09bb262a03c5c0ad4ce66f4
2024-04-30 09:01:09 -05:00