rocm-systems

Author	SHA1	Message	Date
German Andryeyev	816af44b05	rocr: Add logic to track the age of events Some KFD versions can return from hsaKmtWaitOnMultipleEvents_Ext without waiting and require a second call without initializing the age array. Change-Id: I8358c33080084d47c273c2a2827085d0570c8201	2024-11-25 14:55:22 -05:00
Apurv Mishra	6f6ee9679c	rocr: uninitialized pointer read in InitScratchPool Initialized 'scratch_base' as a nullptr to avoid uninitialized read in hsaKmtAllocMemory() Change-Id: I3b0e67f3fd3b591e1d21d691f0777b1d1a059b73 Signed-off-by: Apurv Mishra <apurv.mishra@amd.com>	2024-11-25 14:02:37 -05:00
Apurv Mishra	610f8a1e0f	rocr: Uninitialized scalar variables and pointer Added check and initialized parameters for PtrInfo(). v1: Checking if PtrInfo() returns success. v2: Initialization for variables being passed to PtrInfo(). Change-Id: If3ec4608c8e58be259b4fd51ad681b9bc34ddff6 Signed-off-by: Apurv Mishra <apurv.mishra@amd.com> Reviewed-by: David Yat Sin <david.yatsin@amd.com>	2024-11-22 16:23:29 -05:00
Jonathan Kim	0f02ed6ffb	kfdtest: exclude negative testing from gfx908 GFX 9.0.8 may not properly support pipe reset capabilities, so disable the test for now. Change-Id: I3061cdad87eb979ba884c194f4229c0cbb144ee2	2024-11-20 12:23:09 -05:00
Jonathan Kim	26d338df12	kfdtest: fix dispatch pointers and event leaks tests KFDDBGTest and KFDNegative test can eat into the memory and event resources of subsequent test iterations if their allocations are not freed. Change-Id: Iea170c20df8d487703441181b6c152b61f02d3db	2024-11-19 11:25:24 -05:00
Emily Deng	f047f96161	kfdtest: Fix InterruptRestore randomly hang Queue 2's wave blocked queue 1's wave save, which causes unmap queue preemption to fail. Add a nop, as SQ suggested. Signed-off-by: Emily Deng <Emily.Deng@amd.com> Change-Id: Iea7f280e35487059c4499ea999b9e0cdf841d1e1	2024-11-14 23:36:06 -05:00
Konstantin Zhuravlyov	4c7a9a0f67	loader: add gfx9-4-generic support Change-Id: Icb148f7a78a4ce0fc661e35d0df605e05db2de3d	2024-11-14 12:47:46 -05:00
David Yat Sin	f58aff630c	rocr: Fix sem_post overflow errors WaitSemaphore and PostSemaphore are used in the HybridMutex implementation. If HybridMutex did not have to call WaitSemaphore when acquired, then calling PostSemaphore would cause the internal count inside sem_t to slowly grow to large values and eventually cause overflow. Change-Id: I173fc17c874b49926e56991405e9086ea8c138fc	2024-11-13 21:57:26 -05:00
David Yat Sin	4ec730f1dc	rocr: Add HSA_SIGNAL_WAIT_ABORT_TIMEOUT Add support for an abort timeout when hsa_signal_wait_relaxed is called and the signal does not clear within the timeout. The timeout is in seconds. Change-Id: If1db5a8af33c82ddc4b48968c3d8eceb97d0ea6d	2024-11-13 21:57:02 -05:00
Jonathan Kim	865e32baf4	kfdtest: add per-pipe reset negative test Add basic KFD per-pipe reset support. Change-Id: I0f68c4d33e6d043de0b5cbda1d494640ba8175f1	2024-11-13 13:34:44 -05:00
Jonathan Kim	1a4adaf7bc	hsakmt: Update HSA capabilites with per-queue reset Per-queue reset is now supported and flagged in HSA capabilities. Change-Id: I21e2421da73b9fafae19c903dc3eeeab1f84968d	2024-11-13 13:34:35 -05:00
Konstantin Zhuravlyov	ec3d4aa5e9	loader: add gfx12-generic support Change-Id: I0bf5d48ec357278bdb7a9c4eae61a7b7995411f0	2024-11-11 16:27:47 -05:00
Konstantin Zhuravlyov	cf9c2efbbd	loader: add gfx1153 support Change-Id: Ie3f0ecf1c6631d95cbff5e14ddc48e751f4c356d	2024-11-11 16:27:39 -05:00
Konstantin Zhuravlyov	7d9a51e22a	loader/nfc: reorder cases when switching on targets, specific first, generic second Change-Id: I47f38c1691b9b6ff589f7ff445143997b0801dc6	2024-11-11 16:27:34 -05:00
Konstantin Zhuravlyov	4344f012b6	loader: add missing support for gfx700 Change-Id: Ia08e93b0e2d300a183a7a5fb92604cd801b2d52a	2024-11-11 16:27:27 -05:00
Ranjith Ramakrishnan	2970545ded	Correct the provides field of hsa-rocr and has-rocr-devel package Both the runtime and devel packages were providing the hsakmt packages; only the devel package needs to provide them. Change the package replaces/obsoletes fields accordingly. Change-Id: Ia1a4f128a1f6928faf57faee5f301a77c21acca2	2024-11-08 13:51:10 -05:00
Konstantin Zhuravlyov	d9404a52ed	amd_hsa_elf.h: bring EF_AMDGPU_MACH_* in sync with llvm-project - formatting - add EF_AMDGPU_MACH_AMDGCN_RESERVED_0X56 - add EF_AMDGPU_MACH_AMDGCN_RESERVED_0X57 - add EF_AMDGPU_MACH_AMDGCN_GFX1153 - add EF_AMDGPU_MACH_AMDGCN_GFX12_GENERIC Change-Id: Ibad464c659137c0c98fa9fa9d1f293ea62684ee6	2024-11-07 18:03:27 -05:00
Chris Freehill	0878deda17	rocr: Dynamically allocate static global memory To allow non-POD global variables to last until the last thread has exited, use "new" to allocate the memory instead of static allocation. Change-Id: Ica571b61ff8068a52e472c49cb1c44917e60c8c8	2024-11-07 09:53:31 -05:00
Jaydeep Patel	700f1d9abd	rocr: Decrement counter only if event is popped Also restore dead signals cleanup for old path when HSA_WAIT_ANY_DEBUG is used. Change-Id: I51a7404991443c9f6cbf57b4b9e9faa694b9538c	2024-11-07 01:03:09 -05:00
AravindanC	1a0de862aa	Update static package dependency of rocrtst Change-Id: Ic12a6f2ec3bd03d871815810cc79488e7d5c57ab	2024-11-06 07:06:37 -08:00
Yiannis Papadopoulos	2837825b14	rocr: Adding pointer to the owner driver in Agent class Change-Id: If913d7c7e4caf6d6e6eee3a858a27c6027c2923f	2024-10-31 12:29:10 -04:00
Chris Freehill	c7521a5f2a	rocr: Fix supported_isas transient memory issue An ASAN run of the release build revealed some elements of the supported_isas static map were still using stack data. This change makes it use heap data so it will persist. Change-Id: Ie51887e88b9e2dec27acfc97ea45a6219fea971c	2024-10-31 11:59:29 -04:00
Chris Freehill	4256630fd0	rocr: Fix several rocrtst memory errors Change-Id: I9049a3905fb26cf9b8ad0839684a70771a49f616	2024-10-30 20:36:25 -04:00
Jonathan Kim	7f8676e177	rocr: revert back to old copy behaviour with no xgmi sdma engines SDMA queue resources are limited when all SDMA copies are bottlenecked into 2 engines. Callers will not be able to make the best decisions to allocate queue resources fairly, so have ROCr fall back to the old round-robin behaviour dictated by KFD. Change-Id: I93d52297976d74e20129c5eb1dcfbfa5aa5067a7	2024-10-29 16:01:01 -04:00
Chris Freehill	0c18ff22e1	rocr: Generic ISA targets support Change-Id: I6a0341ec9c1ec1e710143676b80a8a3c1a78f725	2024-10-28 08:54:06 -05:00
Chris Freehill	08699069d6	rocr: Quiet some ROCr compile warnings These are mostly AIE related, but there are a couple of others. Change-Id: I549e004772160ca282d4c94dc9d94dd2ccae8b1c	2024-10-28 09:08:14 -04:00
German Andryeyev	0fc7369ba5	rocr: Disable WaitAny() in AsyncEventsLoop() - Add a new path that avoids WaitAny() calls in AsyncEventsLoop(), controlled by the HSA_WAIT_ANY_DEBUG key. The new path is selected by default. The optimization combines all of the WaitAny() logic into a single processing loop and avoids extra memory allocations and reference counting. It also does not spin on the CPU if all events are busy. Change-Id: I197ce60d0d023fbb672f700d6e87702686f1f55a	2024-10-25 14:37:02 -04:00
David Yat Sin	d90fbee9c4	rocr: find first dispatch pkt that needs scratch On GPUs where EOP is handled in the ASIC, read_dispatch_id is not always updated after each packet. Look for the first dispatch packet that needs scratch memory before allocating scratch. Change-Id: Ibf4b4b485f99bf2fabfe48e9609ca99111fdafbe	2024-10-25 14:36:40 -04:00
Philip Yang	e6d4a32c42	kfdtest: Update KFDSVMEvictTest.QueueTest for CPX mode The current test has 4 processes, each of which allocates and accesses 512 buffers; this requires 2048 waves to access 2048 buffers at the same time to finish the test. In CPX compute partition mode, each compute node has fewer waves, which causes random test failures. Change the test to use 2 processes, so that 1024 waves access 1024 buffers, with an increased buffer size. Add a waves_num check to avoid test failures on new ASICs or the simulator: skip the test if fewer than 1024 waves are available. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Change-Id: I64b5f9172b62cf38f62fbb0b48a801b8a11401c0	2024-10-24 12:57:30 -04:00
Yiannis Papadopoulos	c7785a6da1	rocr/aie: Remove unused set container and error when using AIE agents in MemoryRegion Change-Id: Icf1e56412c840810a679f376293a616068841b8c	2024-10-23 09:42:32 -04:00
Chris Freehill	fd99b74287	rocr: supported_isas map elements should persist The supported_isas static unordered_map was adding stack allocated Isa objects. Instead, make the objects statically allocated, as supported_isas itself is. Change-Id: I23405e218290d48deea6f984f76c57e7b43e314e	2024-10-22 18:09:03 -05:00
David Yat Sin	e1865f7b16	kfdtest: Inherit CXX flags Change-Id: I2e902ec3e6fd582c53a6d95cd49fe2b18f56b8ca	2024-10-17 14:17:08 -04:00
Chris Freehill	9b13bcd0ac	rocr: Ensure globals are initialized at first use When ROCr is built as a static library, global variables were often not initialized to valid values at their first use. This change addresses that problem. Change-Id: I550fa41feb3bc04b9cc686bcfb4acf2a7b651a88	2024-10-16 23:19:48 -04:00
David Yat Sin	80da7d5ee4	Revert "hsakmt: Only set exec flag when requested" This reverts commit `75143555fa`. Reason for revert: This is currently breaking some tools. Will put it back as soon as tools update their code. Change-Id: I05c82d443f3a274a618d05e6dc5a87943f5dc7a4	2024-10-16 20:31:27 -04:00
David Yat Sin	d58c9dea0a	rocr: Add executable flag for memory allocations Change-Id: I8307cd3562c3ab9c12fef8c457a59916e33b7923	2024-10-15 16:52:00 +00:00
David Yat Sin	ead3aafcda	rocrtst: Fix VirtMemory_Basic_Test permissions Fix VirtMemory_Basic_Test permissions to account for the earlier change in hsa_amd_vmem_set_access behavior made by this patch: rocr/vmm: Only modify permisions for specified agents Change-Id: I97230600b9b9144459b08ca3da3a5bfbdbb98231	2024-10-11 10:41:11 -04:00
jokim	1d6ff45673	rocr: Workaround segfault on GFX9 devices older than GFX90a Devices older than GFX90a hit a segfault on queue unmap when an SDMA queue has been assigned a fixed engine. Bypass fixing the engine for these devices for now. Change-Id: I7d2f882d2377f004a7bb65f3b397396db07ce6d3	2024-10-10 14:41:10 -04:00
Kent Russell	ccd80d19ba	kfdtest: Fix in-tree scriptless build If you build the thunk following the instructions in its README, there is no /lib folder in the build folder. Adjust the include path, and clean up the docs to reflect that. The header include path is already defined in the CMake file as ../../include, so LIBHSAKMT_PATH is used only for the library location, not for the headers. Change-Id: I73435d59adb9d01f527a28b1935086260e9d3d70 Signed-off-by: Kent Russell <kent.russell@amd.com>	2024-10-08 14:42:33 -04:00
Shweta Khatri	8bc4efc8ca	hsakmt: pmc_table.c: Fix Coverity reported warnings Eliminate out-of-bounds access in get_block_properties Change-Id: I3abee1e36fafdda053d4bc4a611698d676b01d5c	2024-10-07 14:15:26 -04:00
Shweta Khatri	52e7fd1480	hsakmt: debug.c: Fix Coverity reported warnings Fix potential memory leak reported by Coverity warnings Change-Id: Iacbaa99be3f4fe7fae5fb6a10bd41dfc34b96059	2024-10-07 14:14:26 -04:00
Shweta Khatri	c9454794b6	hsakmt: fmm.c: Fix Coverity reported warnings Fixed multiple issues related to memory management, atomicity, and error handling across various functions: handle null checks, use-after-free, unchecked returns, and memory leaks. Change-Id: Ia7c76320cc20e24001052fbba2dd0600bd412140	2024-10-07 13:54:03 -04:00
David Yat Sin	dbae8da515	rocr: Fix memory leak on non-visible GPUs Fix a memory leak of memory region objects when a GPU is masked using ROCR_VISIBLE_DEVICES. Change-Id: I610842a18adbc3cdc854b12650844e271bc00592	2024-10-04 17:40:47 -04:00
Jonathan Kim	0ae064fe2d	rocr: Use new extended graphics handle registration call on IPC import To correctly map to all GPUs after an import, use the new extended registration call that can import a virtual address without having to specify a target node. Change-Id: Ifca8f6f6ee24fa99b2af357dcc3ea1de3ab234f7	2024-10-03 14:06:37 -04:00
Jonathan Kim	03463ed2c0	hsakmt: Enable graphics handle registration with a virtual address Currently, registering graphics memory without specifying a target node returns a memory handle that is not a virtual address. As a result, ROCr is forced to register with a target node for IPC usage. Mapping memory afterwards without specifying a target node then maps only to the imported target node, because the import call applies its node targeting to future mappings. For ROCr IPC usage, ROCr wants to map to all GPU nodes if the target node is not specified. Allow the caller to register graphics handles that return a virtual address without having to specify the target node, so that the caller can make a subsequent map call to all GPUs. Change-Id: I5a935092b885cc3568e4f3a5dd951c7ec6c84fca	2024-10-03 14:06:31 -04:00
Ranjith Ramakrishnan	f27ae44b8c	cmake component grouping should not be ignored In a static build, the dev and binary components are grouped to generate the static package. Remove the line that was ignoring the component grouping. Change-Id: Ie0ca9db109f2002891260985634f2e6b1ea7f236	2024-10-01 14:22:21 -07:00
Shweta Khatri	9f43c9fd51	hsakmt: spm.c: Fix Coverity reported warnings Fix unused ret value and initialize gpu_id Change-Id: Ib3acc7db4bbab519318d0970786a5dc641dcc9eb	2024-09-30 19:46:51 -04:00
David Yat Sin	73f6bfa747	rocr/vmm: Only modify permisions for specified agents When hsa_amd_vmem_set_access is called, do not remove permissions for unspecified agents. Also updating documentation in header to clarify this. Change-Id: I3bb4cf08ba399f85cc67b17fd13a4a40d862415f	2024-09-30 17:41:58 -04:00
Mukul Joshi	b81e45f03c	kfdtest: Update KFDPerformanceTest.P2PBandWidthTest for CPX mode Currently, KFDPerformanceTest.P2PBandWidthTest cannot work if there are more than 16 KFD nodes in the system. This limit was put in to match the number of SDMA queues supported on a single node. This patch updates the test to make it run on systems with more than 16 KFD nodes. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Change-Id: I561d0cdef664cae84fb9c13a801052e2001256e5	2024-09-30 11:28:33 -04:00
Jonathan Kim	909b82d463	rocr: Fix race condition in IPC DMABUF socket server Socket server accept calls do not guarantee synchronous actions post-accept. This can result in a race condition. To resolve this, first limit the socket server's listen backlog to a single connection. This forces competing clients to busy-retry until timeout. Second, make the DMABUF IPC file descriptor send-receive and import calls into an atomic routine per connection. With these fixes, not only do we resolve potential races, but we also guarantee that any exporter process will create at most one file descriptor that lasts only for the duration of the import transaction. This alleviates any concern about running into system limits on the number of open file descriptors per process. Change-Id: I6d8b14795a680d89a2707e082fa027d525792e05	2024-09-27 14:40:59 -04:00
Jonathan Kim	32bb0764b7	rocr: Fix IPC DMA Buf fragment handling and enable for development Discarding blocks for reallocation on IPC export (done for better memory performance) triggers memory violations with DMA BUF exports, so bypass it for now; no application performance drops have been observed with the bypass. The raw fragment should also be passed to the DMA Buf export call, since offsets will be implicitly applied in the Thunk/KFD for export/import calls. Also, use the agent information directly from the pointer information so that the export call doesn't have to scan memory to find it. Pass the node ID in the handle so that the import call doesn't have to make two thunk imports to fetch the node ID for GPU memory imports. Finally, allow the user to enable DMA Buf IPC via HSA_ENABLE_IPC_MODE_LEGACY=0 for developer testing, as legacy mode is applied by default. Change-Id: Ie8fe267f8768fa5df37126078406f7065f69ff4e	2024-09-27 14:40:42 -04:00
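Commit f58aff630c describes a semaphore count that grows without bound when the release path calls PostSemaphore even though the acquire path never called WaitSemaphore. A minimal sketch of one common way to keep the count balanced, assuming a benaphore-style lock (an atomic counter for the uncontended path plus a POSIX `sem_t` used only under contention). The `BenaphoreMutex` class and `ContendedIncrement` helper are hypothetical illustrations, not ROCr's HybridMutex code:

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <semaphore.h>
#include <thread>
#include <vector>

// Hypothetical benaphore-style mutex (illustrative sketch only).
// count_ = number of threads holding the lock or waiting for it.
class BenaphoreMutex {
 public:
  BenaphoreMutex() {
    int rc = sem_init(&sem_, /*pshared=*/0, /*value=*/0);
    assert(rc == 0 && "sem_init failed");
    (void)rc;  // avoid an unused-variable warning when NDEBUG is set
  }
  ~BenaphoreMutex() { sem_destroy(&sem_); }
  BenaphoreMutex(const BenaphoreMutex&) = delete;
  BenaphoreMutex& operator=(const BenaphoreMutex&) = delete;

  void lock() {
    // A previous value > 0 means another thread holds the lock: block.
    if (count_.fetch_add(1, std::memory_order_acq_rel) > 0) {
      while (sem_wait(&sem_) == -1 && errno == EINTR) {
        // Retry if the wait was interrupted by a signal handler.
      }
    }
  }

  void unlock() {
    // A previous value > 1 means at least one thread is waiting (or is
    // about to wait), so post exactly once for it. When nobody waited,
    // skip sem_post so the semaphore count cannot creep upward.
    if (count_.fetch_sub(1, std::memory_order_acq_rel) > 1) {
      sem_post(&sem_);
    }
  }

  // Current semaphore value, exposed only for illustration and testing.
  int SemaphoreValue() {
    int value = 0;
    sem_getvalue(&sem_, &value);
    return value;
  }

 private:
  std::atomic<int> count_{0};
  sem_t sem_;
};

// Demo helper: several threads increment a shared counter under the lock.
long ContendedIncrement(BenaphoreMutex& m, int num_threads, int iters) {
  long counter = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < iters; ++i) {
        m.lock();
        ++counter;
        m.unlock();
      }
    });
  }
  for (auto& w : workers) w.join();
  return counter;
}
```

Because `unlock()` posts only when its `fetch_sub` observes a waiter, every `sem_post` is matched by exactly one `sem_wait`, so the semaphore value returns to zero once all threads have finished, no matter how many uncontended lock/unlock pairs ran.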


2665 Commits
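Several commits in this log (0878deda17, 9b13bcd0ac, fd99b74287, c7521a5f2a) concern global state that must be initialized at first use, persist until the last thread has exited, and not reference stack data. The sketch below shows the construct-on-first-use idiom that combines those properties. The `IsaInfo` record, `IsaRegistry` alias and `SupportedIsas()` accessor are hypothetical simplifications, not ROCr's actual code:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an ISA descriptor.
struct IsaInfo {
  std::string name;
  unsigned major, minor, stepping;
};

// The map owns its entries on the heap, so they never point at stack data.
using IsaRegistry = std::unordered_map<std::string, std::unique_ptr<IsaInfo>>;

// Construct-on-first-use: the registry is built the first time this function
// runs (thread-safe since C++11), independent of static initialization order
// across translation units. It is allocated with `new` and intentionally
// never deleted, so it stays valid for threads still running at exit.
IsaRegistry& SupportedIsas() {
  static IsaRegistry* registry = [] {
    auto* r = new IsaRegistry();
    auto add = [r](const std::string& name, unsigned major, unsigned minor,
                   unsigned stepping) {
      assert(r->count(name) == 0 && "duplicate ISA entry");
      (*r)[name] = std::make_unique<IsaInfo>(IsaInfo{name, major, minor, stepping});
    };
    add("gfx900", 9, 0, 0);
    add("gfx90a", 9, 0, 10);
    add("gfx1100", 11, 0, 0);
    return r;
  }();
  return *registry;
}
```

The deliberate leak trades a few bytes at process exit for the guarantee that no thread can observe the registry after its destructor has run.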