rocm-systems

Author	SHA1	Message	Date
jordans	d4b85b6bf5	hsakmt: Initial Commit for the HSA KMT Model The overarching goal is to provide an API that pre-silicon models can latch into for software bring-up.	2025-03-18 16:22:17 -04:00
Longlong Yao	5916467552	libhsakmt: set node_id to 0 for OnlyAddress Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>	2025-03-11 10:16:58 -04:00
Jonathan Kim	e3d09e30dc	hsakmt: Expose per-SDMA queue reset capabilities Expose new capabilities field that flags per-sdma queue reset support.	2025-03-06 14:04:42 -05:00
Longlong Yao	26f001d3cb	libhsakmt: allocate va in host path Change-Id: I40a4395aca99ea8dfd8ff0ecde64eb2c3840d867 Signed-off-by: Longlong Yao <Longlong.Yao@amd.com>	2025-02-15 07:56:45 -05:00
Harish Kasiviswanathan	2a64fa5e06	libhsakmt: gfx950: Add option to enable HIGH_PRECISION Environment variable HSA_HIGH_PRECISION_MODE can be used to control MFMA precision Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Change-Id: Ib78dd9dd8867025e090a3cca96ab6db4f65dea12	2025-02-10 16:05:25 -05:00
Sv. Lockal	5d04bd42f3	Fix build issues for musl libc (#267) Change-Id: Ia31330b0f96669966712b58986abeca754c2cbb9	2025-01-29 14:31:05 +00:00
James Zhu	9509af4b98	libhsakmt: increase default svm.alignment_order GFX950 can support page table fragments up to 18 without performance loss, so set the GFX950 default svm.alignment_order to 18. Change-Id: Ibcdb7f041fb07a38e924c471beec261ea227ca1d Signed-off-by: James Zhu <James.Zhu@amd.com>	2025-01-28 08:27:19 -05:00
Lancelot Six	76052ba028	libhsakmt: gfx950 uses same VGPR block size as gfx940 Make sure to allocate the same amount of space for VGPR data in gfx950 as is done for gfx940. Change-Id: I6a0820996389627ccbdfef856e5150c46fac92a1 Signed-off-by: Lancelot SIX <lancelot.six@amd.com>	2025-01-27 14:06:42 -05:00
Lancelot Six	c51aa0d155	libhsakmt: Use the node info to determine LDS size The CWSR area size needs to take into account the size of LDS each active workgroup can have. The current implementation uses a constant for that. This patch refactors this to use the HsaNodeProperties of the device the CWSR area is for to figure out the size of LDS. Change-Id: Ib8585b2b7140ec5c99e7b7d62e67f785697c028a Signed-off-by: Lancelot Six <Lancelot.Six@amd.com> Signed-off-by: Amber Lin <Amber.Lin@amd.com>	2025-01-26 21:46:32 -05:00
Shweta Khatri	2d4a578020	Revert "Revert "hsakmt: Only set exec flag when requested"" This reverts commit `80da7d5ee4`. Reason for revert: This will put back the change ID - Id1154f08f6ba21c633905fd46b06053994d6f3cc to ROCR repo, which will prevent memory allocations from being automatically granted the 'executable' flag, addressing previously - incorrect and unsafe behavior in ROCm driver. Change-Id: I3d45c45859929a80f7791681b411251e099a1901	2025-01-23 09:08:25 -05:00
Apurv Mishra	ecf57310ca	hsakmt: move 'counter_id' array to heap Local variable 'counter_id' exceeded the max single use of stack, so move it to the heap to prevent overflow. Also use a contiguous memory block for the 2D array to reduce space complexity, add error messages for NO_MEMORY exits, and check the MAX_COUNTER limit for IDs. Change-Id: Id0249ca767a336b31c759c693a82d3f5c950a2fa Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>	2025-01-13 16:29:16 -05:00
Apurv Mishra	c066ec13dd	hsakmt: modified to free all_gpu_id_array in fmm.c Add free() for 'all_gpu_id_array' in hsakmt_fmm_destroy_process_apertures() and removed it from 'hsakmt_fmm_clear_all_mem()' Change-Id: I32d2d22e7152f62a3f2e7da4f601f0db7cebd534 Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>	2024-11-28 13:08:03 -05:00
Apurv Mishra	79f0ac2534	hsakmt: minor code cleanup and refactor topology.c removed unused value assignment for HSAKMT_STATUS, restructured 'topology_sysfs_check_node_supported' Change-Id: I21cdccb3e3c5e42981f10597426de479d0f4ee6a Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>	2024-11-28 13:06:23 -05:00
David Yat Sin	80da7d5ee4	Revert "hsakmt: Only set exec flag when requested" This reverts commit `75143555fa`. Reason for revert: This is currently breaking some tools. Will put it back as soon as tools update their code. Change-Id: I05c82d443f3a274a618d05e6dc5a87943f5dc7a4	2024-10-16 20:31:27 -04:00
Shweta Khatri	8bc4efc8ca	hsakmt: pmc_table.c: Fix Coverity reported warnings Eliminate out-of-bounds access in get_block_properties Change-Id: I3abee1e36fafdda053d4bc4a611698d676b01d5c	2024-10-07 14:15:26 -04:00
Shweta Khatri	52e7fd1480	hsakmt: debug.c: Fix Coverity reported warnings Fix potential memory leak reported by Coverity warnings Change-Id: Iacbaa99be3f4fe7fae5fb6a10bd41dfc34b96059	2024-10-07 14:14:26 -04:00
Shweta Khatri	c9454794b6	hsakmt: fmm.c: Fix Coverity reported warnings Fixed multiple issues related to memory management, atomicity, and error handling across various functions: handle null checks, use-after-free, unchecked returns, and memory leaks. Change-Id: Ia7c76320cc20e24001052fbba2dd0600bd412140	2024-10-07 13:54:03 -04:00
Jonathan Kim	03463ed2c0	hsakmt: Enable graphics handle registration with a virtual address Currently registering graphics memory without specifying a target node will return a memory handle that's not a virtual address. As a result, ROCr is forced to register with a target node for IPC usage. Mapping memory without specifying a target node afterwards will result in mapping to the target node that was imported because the previous import call flags this node targeting action to future mapping. For ROCr IPC usage, ROCr wants to map to all GPU nodes if the target node is not specified. Allow the caller to register graphics handles that returns a virtual address without having to specify the target node so that the caller can make a subsequent map call to all GPUs. Change-Id: I5a935092b885cc3568e4f3a5dd951c7ec6c84fca	2024-10-03 14:06:31 -04:00
Shweta Khatri	9f43c9fd51	hsakmt: spm.c: Fix Coverity reported warnings Fix unused ret value and initialize gpu_id Change-Id: Ib3acc7db4bbab519318d0970786a5dc641dcc9eb	2024-09-30 19:46:51 -04:00
Shweta Khatri	681610937a	hsakmt: queues.c: Fix Coverity reported warnings Move variable declarations inline and add NULL checks to prevent errors Change-Id: Ia5bf5e245bcc0f756a15bc799b55c5e2a8459f89	2024-09-23 15:07:28 -04:00
Shweta Khatri	857200e28c	hsakmt: events.c: Fix Coverity reported warnings Fix data race by protecting events_page access with mutex in event create Fix potential NULL dereference in hsaKmtWaitOnMultipleEvents_Ext Fix unchecked return value in hsaKmtCreateEvent function Change-Id: I434bef43666e5205a8b061259569c1d99a952752	2024-09-23 11:35:02 -04:00
Shweta Khatri	659fa04d8c	hsakmt: topology.c: Fix Coverity reported warnings Refactor fscanf_str to use fgets for safer string handling, remove unused code Change-Id: Ibf4b4b485f99bf2fabfe48e9609ca99111feaf1e	2024-09-23 11:34:28 -04:00
Kent Russell	daad183bf8	hsakmt: Undo HSAKMT prefix for PAGE_SHIFT We had skipped doing it for PAGE_SIZE, but it should be left as the regular PAGE_SHIFT name, especially for users who are using different headers. We want PAGE_SHIFT and PAGE_SIZE to be consistent with one another, so set them both explicitly to the same value if either of them is undefined Change-Id: I121d81c48409dd77351b59a192d824e2419a2410 Signed-off-by: Kent Russell <kent.russell@amd.com>	2024-09-20 11:04:34 -04:00
Shweta Khatri	ff6e1b44bf	hsakmt: openclose.c: Fix Coverity reported warnings Add check before close to prevent closing invalid file descriptors Change-Id: Ie1d50e0d55159512a14a70c1e4be058218aae668	2024-09-19 19:44:53 +00:00
Kent Russell	3b61f75f49	hsakmt: Remove unused functions The fmm_node_[added|removed] functions were added in the initial FMM support, but weren't used. Remove them now since no one's referencing them Change-Id: I1e46e57294a72012227b38f46c7099de0b9263be Signed-off-by: Kent Russell <kent.russell@amd.com>	2024-09-19 19:44:53 +00:00
David Yat Sin	0f241d4061	hsakmt: Add debug prints to trace mem allocations Add extra debug prints to trace memory alloc and register Change-Id: I03d8d7d415565916a8336db6e7063bb7d4cb9102	2024-09-19 19:44:53 +00:00
Kent Russell	3da42a0847	libhsakmt: Prefix global symbols with hsakmt To support fully-static library ROCm builds, ensure that all global symbols are prefixed with something meaningful to avoid collisions with other libraries. A script was made using "objdump -C -t" to get a list of symbols, then checking if the global symbols have a meaningful prefix (for thunk: hsakmt or kmt in various cases) Change-Id: Ifd353f64a3344eb60d1f6c4e041aa20967b38a59 Signed-off-by: Kent Russell <kent.russell@amd.com>	2024-09-06 09:56:07 -04:00
Kent Russell	4dc9d49aa6	hsakmt: Free alloc'd memory trace is calloc'd but never freed. Free it. Change-Id: I5795cbe5738f25a9621d24be86abb35c263fa8b7 Signed-off-by: Kent Russell <kent.russell@amd.com>	2024-09-05 10:20:09 -04:00
Xuanteng Huang	7a52a45824	hsakmt: fix spelling error This was pulled in from: https://github.com/ROCm/ROCT-Thunk-Interface/pull/107 Change-Id: Ic30e4552a94a212a9cd138f9311b1c85b0c13867	2024-09-04 10:46:39 -04:00
Joseph Greathouse	75143555fa	hsakmt: Only set exec flag when requested Previous code would blindly set executable bit on all allocations. Change-Id: Id1154f08f6ba21c633905fd46b06053994d6f3cc	2024-09-03 15:13:56 -04:00
Jonathan Kim	ae99effb29	libhsakmt: Fix improper type range check in legacy queue creation Enum type for compute AQL is defined as larger than targeted SDMAs enum types. We should only deny legacy calls for SDMA queues that require targeted engines. Change-Id: I6386a8700b3b18af825b6f0d2be27052cc8de0f5	2024-08-28 13:55:41 -04:00
Lancelot SIX	d5acab2b39	libhsakmt: Check for KFD 1.13 for debug ioctl interface Core dump support relies on debugger related KFD ioctl which have been introduced in version 1.13 of the interface. However, the code checks for KFD_IOCTL_MINOR_VERSION (currently 17), making it impossible to produce core dumps when using some drivers that should support it. Update the CHECK_KFD_MINOR_VERSION calls in the debugger related ioctl wrappers and look for KFD 1.13 or above. Change-Id: I10a7fd03bf8f678b6318d7c25d6a7ded804dac67	2024-08-21 23:45:25 +01:00
Jonathan Kim	2f588a2406	libhsakmt: Extend thunk queue creation with recommended sdma engines Extend the current Thunk implementation of queue creation to target specific SDMA engine IDs. Also expose the new recommend SDMA engines per IO link from the KFD sysfs. Change-Id: I51f9a0d83c0f1fc4d5dc837f879a7ae332e7d7e9	2024-08-20 11:13:57 -04:00
Yifan Zhang	3f1f68c8cb	libhsakmt: add OverrideEngineId property When HSA_OVERRIDE_GFX_VERSION is used, save the overridden GFX version to OverrideEngineId instead of original EngineId. There are places where real GFX properties are still needed, e.g. CWSR size calculation. Change-Id: I9d9149bae465b7cfe55604fc19e7ca34e48b7b1c Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>	2024-08-20 09:10:52 -04:00
David Yat Sin	4ffa325c08	libhsakmt: Add two symbols to global symbols For users still using non-static hsakmt Change-Id: I12b1c25f0d952ed9178529cadc518c57c1aeb06d	2024-08-19 14:56:00 -04:00
Alex Sierra	626eb4bfaf	src/fmm.c: fallback to old userptr reg if SVM fails Fallback to old userptr registration in case SVM method fails. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Change-Id: I70c3ec74a8b4f762713e6a0619453642f3fca8e5 Signed-off-by: Chris Freehill <cfreehil@amd.com>	2024-07-18 10:20:05 -05:00
Adam Niederer	84567b6416	Allow overriding gfx version per-node This lets you run two unsupported-but-really-supported cards of different architecture together in the same program. Works great w/ llama.cpp on my 7900XT + 6600. Example usage (device 0 is RDNA3, device 1 is RDNA2): HSA_OVERRIDE_GFX_VERSION_1="11.0.0" HSA_OVERRIDE_GFX_VERSION_2="10.3.0" ollama serve Change-Id: Ic63ef462f698dee722d360f7fc3ef72789c277b7 Signed-off-by: AdamNiederer Signed-off-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Chris Freehill <cfreehil@amd.com>	2024-07-18 10:20:05 -05:00
James Zhu	338721c24a	PC Sampling: Temporarily check KFD_IOCTL_MINOR_VERSION 16 Since PC Sampling is still under experiment, we can't bump KFD_IOCTL_MINOR_VERSION to enable pc sampling. KFD_IOCTL_MINOR_VERSION 16 already includes all pc sampling code, so use version 16 to enable pc sampling implicitly for customer to try-out this new feature. Need update the version accordingly when pc sampling upstream. Change-Id: I65840128f94e8f347c0617971c0aa4b7e478691a Signed-off-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Chris Freehill <cfreehil@amd.com>	2024-06-24 14:26:21 -05:00
Philip Yang	6e6f445f75	libhsakmt: Update contiguous memory support ioctl version KFD ioctl version is 1.16 on upstream for contiguous memory support. Remove pc_sampling version, should be added after pc_sample upstream. Change-Id: I6e6c3340bc8e371d68dd7741b02578be2fdef801 Signed-off-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Chris Freehill <cfreehil@amd.com>	2024-06-24 14:26:21 -05:00
Philip Yang	c98a8dc179	libhsakmt: Add missing CHECK_KFD_OPEN in APIs The application may use parent process KFD handle or invalid KFD handle, add CHECK_KFD_OPEN in all APIs to catch this application bug earlier without calling to KFD. Change-Id: I0391e91eeca8e6752fc9c23f0742445b823ea9b0 Signed-off-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Chris Freehill <cfreehil@amd.com>	2024-06-24 14:26:21 -05:00
David Yat Sin	a31e84eaef	libhsakmt: Add alignment for memory allocations New API to support optional alignment parameter for memory allocations. The alignment should be larger than or equal to page size and a power of 2. Change-Id: Ic3fec43b3c4281f74dd33a57ab4143dcf76e1186 Signed-off-by: Chris Freehill <cfreehil@amd.com>	2024-06-24 14:26:21 -05:00
Lang Yu	4844a70d94	libhsakmt: Prevent hsaKmtRegisterMemory* from registering non-userptr hsaKmtRegisterMemory* can only register OS allocated userptr. v2: Apply changes to all hsaKmtRegisterMemory* stuff.(Philip) v3: Unlock aperture->fmm_mutex to avoid deadlock. Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com> Change-Id: I1045af7edb4da8206cb878f64c0176ba4fc59f60 Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Chris Freehill <cfreehil@amd.com>	2024-06-24 14:26:21 -05:00
Lang Yu	a7a712fb36	libhsakmt: Fix improper usage of hsaKmtRegisterMemoryToNodes It's unnecessary to register non-userptr. Change-Id: Iefd329578365e036e2fe7e4d5c9c0c3d0976f67c Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Chris Freehill <cfreehil@amd.com>	2024-06-24 14:26:21 -05:00
Lang Yu	ae3ede062f	libhsakmt: add Integrated property To differentiate discrete and integrated GPU more flexibly in runtime, this will aid in querying HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU and hipDeviceAttributeIntegrated. Change-Id: Ic8a6c9aea3b4bd19c4d5f6729af7e64c328fc61d Signed-off-by: Lang Yu <Lang.Yu@amd.com> Signed-off-by: Chris Freehill <cfreehil@amd.com>	2024-06-24 14:26:21 -05:00
Yiyang Wu	9316b6e4e4	kfdtest: hsaKmtCheckRuntimeDebugSupport should be visible Change-Id: I03a379ede1c990bd275a4d2a8cb379f228381d03 Signed-off-by: Yiyang Wu <xgreenlandforwyy@gmail.com> Signed-off-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Chris Freehill <cfreehil@amd.com>	2024-06-24 14:26:21 -05:00
David Belanger	259a724e21	libhsakmt: Fix VGPR size for GFX12/GFX12.1 Set max size needed for VGPR when doing a CWSR for GFX12 and GFX12.1. Signed-off-by: David Belanger <david.belanger@amd.com> Change-Id: Iddefc62f1ad419c6f5ab6a872048457a1dc24037 Signed-off-by: Chris Freehill <cfreehil@amd.com>	2024-06-24 14:26:21 -05:00
James Zhu	1087dea925	kfdtest: skip test when PC Sampling is not supported by ASIC Skip test when PC Sampling is not supported by ASIC. Change-Id: I6f9be0bdaed66e51052723b6df6908079470cefb Signed-off-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Chris Freehill <cfreehil@amd.com>	2024-06-24 14:26:21 -05:00
Jonathan Kim	206db80a56	libhsakmt: fix pc sampling return of functions C Error returns are positive in user space and should check against errno instead. Fix declaration of return to type HSAKMT_STATUS. KFD IOCTL should handle size return when querying capabilities so return size to caller unconditionally. Clean up error translations per function so that it's stylistically clear. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Change-Id: Ic37390425f370c7ad88f9ed014444decf19383a3 Signed-off-by: Chris Freehill <cfreehil@amd.com>	2024-06-24 14:26:21 -05:00
Chris Freehill	11fd5c2562	Prepare for integration into rocr Change-Id: I6102b9910dbb9d09e09bb262a03c5c0ad4ce66f4	2024-04-30 09:01:09 -05:00

49 Commits