The over arching goal it so provide an API that pre-silicon models can latch into for software bring up.# Please enter the commit message for your changes. Lines starting
Environment variable HSA_HIGH_PRECISION_MODE can be used to control MFMA
precision
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Change-Id: Ib78dd9dd8867025e090a3cca96ab6db4f65dea12
Since GFX950 can support page table fragment up to 18 without
performance loss. So set GFX950 default svm.alignment_order to 18.
Change-Id: Ibcdb7f041fb07a38e924c471beec261ea227ca1d
Signed-off-by: James Zhu <James.Zhu@amd.com>
Make sure to use allocate the same amount of size for VGPR data in
gfx950 as it is done for gfx940.
Change-Id: I6a0820996389627ccbdfef856e5150c46fac92a1
Signed-off-by: Lancelot SIX <lancelot.six@amd.com>
The CWSR area size needs to take into account the size of LDS each
active workgroup can have. The current implementation uses a constant
for that. This patch refactors this to use the HsaNodeProperties of the
device's the CWSR area is for to figure out the size of LDS.
Change-Id: Ib8585b2b7140ec5c99e7b7d62e67f785697c028a
Signed-off-by: Lancelot Six <Lancelot.Six@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
This reverts commit 80da7d5ee4.
Reason for revert: This will put back the change ID - Id1154f08f6ba21c633905fd46b06053994d6f3cc to ROCR repo, which will prevent memory allocations from being automatically granted the 'executable' flag, addressing previously - incorrect and unsafe behavior in ROCm driver.
Change-Id: I3d45c45859929a80f7791681b411251e099a1901
local variable 'counter_id' exceeded the max single
use of stack, thus move to heap to prevent overflow
also, use of a contiguous memory block for 2D array
to reduce space complexity, add error messages for
NO_MEMORY exits and check MAX_COUNTER limit for IDs
Change-Id: Id0249ca767a336b31c759c693a82d3f5c950a2fa
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
Add free() for 'all_gpu_id_array' in
hsakmt_fmm_destroy_process_apertures() and
removed it from 'hsakmt_fmm_clear_all_mem()'
Change-Id: I32d2d22e7152f62a3f2e7da4f601f0db7cebd534
Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>
This reverts commit 75143555fa.
Reason for revert:
This is currently breaking some tools. Will put it back as soon as tools update their code.
Change-Id: I05c82d443f3a274a618d05e6dc5a87943f5dc7a4
Fixed multiple issues related to memory management, atomicity,
and error handling across various functions: handle null checks,
use-after-free, unchecked returns, and memory leaks.
Change-Id: Ia7c76320cc20e24001052fbba2dd0600bd412140
Currently registering graphics memory without specifying a target
node will return a memory handle that's not a virtual address.
As a result, ROCr is forced to register with a target node for
IPC usage.
Mapping memory without specifying a target node afterwards will
result in mapping to the target node that was imported because the
previous import call flags this node targeting action to future mapping.
For ROCr IPC usage, ROCr wants to map to all GPU nodes if the target node
is not specified.
Allow the caller to register graphics handles that returns a virtual
address without having to specify the target node so that the caller
can make a subsequent map call to all GPUs.
Change-Id: I5a935092b885cc3568e4f3a5dd951c7ec6c84fca
Fix data race by protecting events_page access with mutex in event create
Fix potential NULL dereference in hsaKmtWaitOnMultipleEvents_Ext
Fix unchecked return value in hsaKmtCreateEvent function
Change-Id: I434bef43666e5205a8b061259569c1d99a952752
We had skipped doing it for PAGE_SIZE, but it should be left as the
regular PAGE_SHIFT name, especially for users who are using different
headers. We want PAGE_SHIFT and PAGE_SIZE to be consistent with one
another, so set them both explicitly to the same value if either
of them is undefined
Change-Id: I121d81c48409dd77351b59a192d824e2419a2410
Signed-off-by: Kent Russell <kent.russell@amd.com>
The fmm_node_[added|removed] functions were added in the initial FMM
support, but weren't used. Remove them now since no one's referencing
them
Change-Id: I1e46e57294a72012227b38f46c7099de0b9263be
Signed-off-by: Kent Russell <kent.russell@amd.com>
To support fully-static library ROCm builds, ensure that all global
symbols are prefixed with something meaningful to avoid collisions with
other libraries
A script was made using" objdump -C -t" to get a list of symbols,
then checking if the global symbols have a meaningful prefix (for thunk:
hsakmt or kmt in various cases)
Change-Id: Ifd353f64a3344eb60d1f6c4e041aa20967b38a59
Signed-off-by: Kent Russell <kent.russell@amd.com>
Enum type for compute AQL is defined as larger then targeted SDMAs
enum types. We should only deny legacy calls for SDMA queues that
require targeted engines.
Change-Id: I6386a8700b3b18af825b6f0d2be27052cc8de0f5
Core dump support relies on debugger related KFD ioctl which have been
introduced in version 1.13 of the interface. However, the code checks
for KFD_IOCTL_MINOR_VERSION (currently 17), making it impossible to
produce core dumps when using some drivers that should support it.
Update the CHECK_KFD_MINOR_VERSION calls in the debugger related ioctl
wrappers and look for KFD 1.13 or above.
Change-Id: I10a7fd03bf8f678b6318d7c25d6a7ded804dac67
Extend the current Thunk implementation of queue creation to target
specific SDMA engine IDs.
Also expose the new recommend SDMA engines per IO link from the KFD
sysfs.
Change-Id: I51f9a0d83c0f1fc4d5dc837f879a7ae332e7d7e9
When HSA_OVERRIDE_GFX_VERSION is used, save the overrided GFX
version to OverrideEngineId instead of original EngineId. There
are places where real GFX properties still needed, e.g. CWSR size
calculation.
Change-Id: I9d9149bae465b7cfe55604fc19e7ca34e48b7b1c
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Fallback to old userptr registration in case SVM method fails.
Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Change-Id: I70c3ec74a8b4f762713e6a0619453642f3fca8e5
Signed-off-by: Chris Freehill <cfreehil@amd.com>
This lets you run two unsupported-but-really-supported cards of different architecture together in the same program. Works great w/ llama.cpp on my 7900XT + 6600.
Example usage (device 0 is RDNA3, device 1 is RDNA2):
HSA_OVERRIDE_GFX_VERSION_1="11.0.0" HSA_OVERRIDE_GFX_VERSION_2="10.3.0" ollama serve
Change-Id: Ic63ef462f698dee722d360f7fc3ef72789c277b7
Signed-off-by: AdamNiederer
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
Since PC Sampling is still under experiment, we can't
bump KFD_IOCTL_MINOR_VERSION to enable pc sampling.
KFD_IOCTL_MINOR_VERSION 16 already includes all pc sampling
code, so use version 16 to enable pc sampling implicitly for
customer to try-out this new feature.
Need update the version accordingly when pc sampling upstream.
Change-Id: I65840128f94e8f347c0617971c0aa4b7e478691a
Signed-off-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
KFD ioctl version is 1.16 on upstream for contiguous memory support.
Remove pc_sampling version, should be added after pc_sample upstream.
Change-Id: I6e6c3340bc8e371d68dd7741b02578be2fdef801
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
The application may use parent process KFD handle or invalid KFD handle,
add CHECK_KFD_OPEN in all APIs to catch this application bug earlier
without calling to KFD.
Change-Id: I0391e91eeca8e6752fc9c23f0742445b823ea9b0
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
New API to support optional alignment parameter for memory allocations.
The alignment should be larger than or equal to page size and a power
of 2.
Change-Id: Ic3fec43b3c4281f74dd33a57ab4143dcf76e1186
Signed-off-by: Chris Freehill <cfreehil@amd.com>
hsaKmtRegisterMemory* can only register OS allocated userptr.
v2: Apply changes to all hsaKmtRegisterMemory* stuff.(Philip)
v3: Unlock aperture->fmm_mutex to aviod deadlock.
Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Change-Id: I1045af7edb4da8206cb878f64c0176ba4fc59f60
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
It's unnecessary to register non-userptr.
Change-Id: Iefd329578365e036e2fe7e4d5c9c0c3d0976f67c
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
To differentiate discrete and integrated GPU more flexibly in runtime,
this will aid in querying HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU
and hipDeviceAttributeIntegrated.
Change-Id: Ic8a6c9aea3b4bd19c4d5f6729af7e64c328fc61d
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
Set max size needed for VGPR when doing a CWSR for GFX12 and GFX12.1.
Signed-off-by: David Belanger <david.belanger@amd.com>
Change-Id: Iddefc62f1ad419c6f5ab6a872048457a1dc24037
Signed-off-by: Chris Freehill <cfreehil@amd.com>
Skip test when PC Sampling is not supported by ASIC.
Change-Id: I6f9be0bdaed66e51052723b6df6908079470cefb
Signed-off-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Chris Freehill <cfreehil@amd.com>
C Error returns are positive in user space and should check against errno
instead.
Fix declaration of return to type HSAKMT_STATUS.
KFD IOCTL should handle size return when querying capabilities so return
size to caller unconditionally.
Clean up error translations per function so that it's stylistically
clear.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Change-Id: Ic37390425f370c7ad88f9ed014444decf19383a3
Signed-off-by: Chris Freehill <cfreehil@amd.com>