rocm-systems

Author	SHA1	Message	Date
Flora Cui	a765dd7e94	rocr: add specific flag for blit kernel object so that aql-to-pm4 conversion could verify the validity of the kernel object. Signed-off-by: Flora Cui <flora.cui@amd.com>	2025-07-17 21:55:02 +08:00
Honglei Huang	6c87f5b5ce	rocr/driver: add memory residency management interface in driver This commit introduces MakeMemoryResident and MakeMemoryUnresident functions to KfdDriver and XdnaDriver classes. - Added implementations in amd_kfd_driver.cpp - Added stubs in amd_xdna_driver.cpp returning HSA_STATUS_ERROR - Updated header files amd_kfd_driver.h and amd_xdna_driver.h - Removed MakeKfdMemoryResident/Unresident from amd_memory_region.cpp Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>	2025-07-16 13:15:45 +08:00
Honglei Huang	ab6bda7e96	rocr/driver: add memory registration and deregistration into driver This commit completes the memory register/deregister interface change. - Removed static RegisterMemory and DeregisterMemory from MemoryRegion class - Added pure virtual methods to base Driver interface in driver class - Added implementation in KFD driver - Modified MemoryRegion Lock and Unlock to use driver interface Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>	2025-07-16 13:15:45 +08:00
Honglei Huang	6c390e32cc	rocr/driver: add AvailableMemory API to driver This commit introduces a new AvailableMemory API to the KfdDriver and XdnaDriver classes. - Implemented AvailableMemory in KfdDriver to return the available memory size using hsaKmtAvailableMemory. - Added a stub implementation of AvailableMemory in XdnaDriver that returns an error. - Updated the GpuAgent class to use the new AvailableMemory API instead of directly calling hsaKmtAvailableMemory. Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>	2025-07-16 13:15:45 +08:00
Honglei Huang	9c18618847	rocr: add const version of driver() method to Agent class This change adds a const-qualified version of the driver() method to the Agent class, allowing const Agent objects to access their associated driver without modifying the object's state. Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>	2025-07-16 13:15:45 +08:00
Honglei Huang	da8dd9e1e3	rocr/driver: add scratch memory allocation into driver interface Add AllocateScratchMemory interface to Driver base class and implement it in both KFD and XDNA drivers. This change encapsulates the low-level scratch memory allocation details within driver implementations, making the code more maintainable and the interface cleaner. The main changes include: - Add AllocateScratchMemory virtual method to Driver interface - Implement the interface in KfdDriver with existing allocation logic - Add stub implementation in XdnaDriver - Update GpuAgent to use the new interface instead of direct KMT calls Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>	2025-07-16 13:15:45 +08:00
Honglei Huang	412e386b50	rocr/driver: move wallclock frequency query to driver layer Move the wallclock frequency query from GpuAgent to driver layer to improve code organization and support multiple driver types. This change: 1. Add GetWallclockFrequency API to KFD/XDNA drivers 2. Move libdrm GPU info query from GpuAgent to driver implementation 3. Update GpuAgent to use the new driver API Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>	2025-07-11 16:14:29 +08:00
Honglei Huang	9bc38e2ee6	rocr/driver: add support for getting GPU tile configuration - Implemented GetTileConfig in KfdDriver to retrieve tile configuration for a specific node. - Added a stub implementation of GetTileConfig in XdnaDriver. - Updated driver.h to include a virtual GetTileConfig method. - Extended hsa_internal.h with a new hsa_get_tile_config function. - Integrated hsa_get_tile_config into hsa.cpp to call the driver-specific implementation. - Updated driver headers to declare the new GetTileConfig method. Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>	2025-07-11 16:14:29 +08:00
Honglei Huang	8d077dba3b	rocr/driver: add GetClockCounters API to driver interface This commit introduces a new GetClockCounters API to the driver interface. - Implemented GetClockCounters in KfdDriver to fetch clock counters using hsaKmtGetClockCounters. - Added a stub implementation of GetClockCounters in XdnaDriver that returns HSA_STATUS_ERROR. - Modified GpuAgent to use driver().GetClockCounters instead of directly calling hsaKmtGetClockCounters. Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>	2025-07-11 16:14:29 +08:00
Honglei Huang	05b83e72d9	rocr/driver: add GetDeviceHandle to driver interface This commit introduces a new GetDeviceHandle API to the driver interface, allowing retrieval of the device handle for a specific node. - Implemented GetDeviceHandle in KfdDriver to fetch the AMD GPU device handle using hsaKmtGetAMDGPUDeviceHandle. - Added a stub implementation of GetDeviceHandle in XdnaDriver that returns HSA_STATUS_ERROR. - Modified GpuAgent::InitLibDrm to use driver().GetDeviceHandle instead of directly calling hsaKmtGetAMDGPUDeviceHandle. Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>	2025-07-11 16:14:29 +08:00
Tony Gutierrez	cb7b0c8d9f	rocr: Remove driver usage from filter device Slightly refactor the RvdFilter so it doesn't need to call into the driver.	2025-07-10 09:41:34 -07:00
David Yat Sin	4c2dec5bb8	doc: Fix doxygen comments for in-out params	2025-07-10 08:21:01 -04:00
Honglei Huang	d874b8003a	rocr/driver: add SetTrapHandler API to driver interface This commit introduces a new SetTrapHandler API to the driver interface - Implemented SetTrapHandler in KfdDriver to set trap handlers using hsaKmtSetTrapHandler. - Added a stub implementation of SetTrapHandler in XdnaDriver that returns HSA_STATUS_ERROR. - Updated the driver interface in driver.h to include the new SetTrapHandler method. - Modified GpuAgent to use driver().SetTrapHandler instead of directly calling hsaKmtSetTrapHandler. Signed-off-by: Honglei Huang <Honglei1.Huang@amd.com>	2025-06-27 23:32:53 +08:00
Tony Gutierrez	1a339feb1f	rocr: Move OpenSMI call to Driver	2025-06-25 15:53:02 -07:00
Yiannis Papadopoulos	2ca4d8f6d4	rocr/aie: Remove redundant and unused functions.	2025-06-25 11:32:42 -04:00
Tony Gutierrez	e03d44d742	rocr: Update Driver queue-related APIs Update the user-mode driver queue APIs to leverage KMT types. Move queue-related calls to the core::Driver API.	2025-06-23 12:21:01 -07:00
David Yat Sin	b3c48cc68c	rocr: support reserving non-registered VA Extend hsa_amd_vmem_address_reserve/hsa_amd_vmem_address_reserve_align to support HSA_AMD_VMEM_ADDRESS_NO_REGISTER flag. This allocation can be used to reserve virtual address ranges that can later be used by hsa_amd_svm_attributes_set for SVM based memory allocations.	2025-06-18 18:21:11 -04:00
Sunday Clement	06efa50c09	rocr: Fix Recursive Include in header files scratch_cache.h includes amd_gpu_agent.h, which in turn includes scratch_cache.h. This has now been fixed by removing the unnecessary header include. Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>	2025-06-13 12:29:52 -04:00
David Yat Sin	24ce840732	rocr: Remove support for Kaveri GPUs Kaveri GPUs are EoL	2025-06-12 10:38:58 -04:00
David Yat Sin	96d0f07b15	rocr: Fix compile warning when using clang	2025-06-12 10:38:58 -04:00
Chris Freehill	3a9d14bb66	rocr: Add hsa_amd_portable_export_dmabuf_v2 The original version of hsa_amd_portable_export_dmabuf() did not consider the conditions under which a dmabuf could be shared. In the new version (hsa_amd_portable_export_dmabuf_v2()), the caller can specify the flag HSA_AMD_DMABUF_MAPPING_TYPE_PCIE, which means they want to share the dmabuf over PCIe. In that case, if the GPU is a PCIe GPU that is not in an XGMI hive and large BAR is not supported, the new code returns an error.	2025-06-09 15:42:58 -05:00
Alysa Liu	9b3d15e68d	rocr: Remove structurally dead code Remove unreachable return statement. Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>	2025-06-09 14:01:39 -04:00
Alysa Liu	f6c8cbd293	rocr: Fix inefficient copy operations Refactor variable assignments to use std::move() where appropriate. Update function headers to accept parameters by const& where appropriate. Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>	2025-06-02 11:18:36 -04:00
Alysa Liu	ae6851dbb4	rocr: Fixed inefficient copy operations Changed variable assignments to use std::move() where appropriate. Changed function headers to pass string arguments by reference where appropriate. Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>	2025-06-02 11:18:36 -04:00
Alysa Liu	369d89ade3	rocr: Fixed inefficient copy operations Changed variable assignments to use std::move() where appropriate Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>	2025-06-02 11:18:36 -04:00
David Yat Sin	11da1293de	rocr: Fix compile warnings	2025-05-28 16:12:02 -04:00
David Yat Sin	0d70045817	rocr: Remove deprecated doorbell type 1 support	2025-05-28 16:12:02 -04:00
David Yat Sin	4bae509296	rocr: Remove deprecated queue doubleMap code	2025-05-28 16:12:02 -04:00
David Yat Sin	b8434529a5	rocr: Remove queue_full_workaround code Remove deprecated queue_full_workaround code as gfx7 and gfx8 GPUs are EoL.	2025-05-28 16:12:02 -04:00
David Yat Sin	04dbf769f6	rocr: update required CP FW version Update the CP FW version required for async-scratch memory support on gfx950.	2025-05-28 13:03:58 -04:00
David Yat Sin	9d38ca0d22	rocr: Fix compile error when using clang	2025-05-27 23:56:28 -04:00
David Yat Sin	da2607024b	rocr: Perform memcpy for small code-object loads On large BAR systems, a direct memcpy performs better for small code objects because of the latency of the blit-copy.	2025-05-22 18:39:19 -04:00
David Yat Sin	e969e01f54	rocr: Perform range based cache invalidates Invalidate only the address range that covers the newly copied code-object. This avoids invalidating I$ for old code objects and thus might increase I$ hit rate.	2025-05-22 18:39:19 -04:00
Jiadong Zhu	0f9d2b836c	rocr/dtif: use default signal for intercept queue for DTIF Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com>	2025-05-13 16:44:31 -04:00
Jiadong Zhu	e2d767879d	rocr/dtif: add hsaKmtQueueRingDoorbell in thunk loader hsaKmtQueueRingDoorbell is specific to DTIF backend Signed-off-by: Flora Cui <flora.cui@amd.com> Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com>	2025-05-13 16:44:31 -04:00
Aaron Liu	e9088d6e47	rocr/dtif: add CreateThunkInstance/DestroyThunkInstance interfaces Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com>	2025-05-13 16:44:31 -04:00
Aaron Liu	0cd4ddd62b	rocr/dtif: add DRM APIs wrapper in thunk loader Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com>	2025-05-13 16:44:31 -04:00
Aaron Liu	1b79caa214	rocr/dtif: replace hsakmt interfaces with HSAKMT_CALL(...) Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com>	2025-05-13 16:44:31 -04:00
Aaron Liu	7ba77fb193	rocr/dtif: add thunk loader to wrap hsaKmt APIs For native and DTIF backends, unify to use HSAKMT_CALL(...) to call hsaKmt APIs. Signed-off-by: Aaron Liu <aaron.liu@amd.com> Reviewed-by: David Yat Sin <David.YatSin@amd.com>	2025-05-13 16:44:31 -04:00
christian-heusel	5cc61b714d	rocr: Add missing cstdint include	2025-05-06 20:52:48 -04:00
Tony Gutierrez	f2c482d923	rocr: Add large_bar_enabled var to the GPU agent Adds a bool to the GPU agent and a public member method to check if the GPU supports large BAR. This is needed so we can check if large BAR is supported when a user tries to allocate an AQL queue in device memory on a given GPU agent. Also adds an exception to the AQL queue if device-side AQL queues are requested and the GPU owner of the AQL doesn't support large BAR. Otherwise, ROCr will currently allow device-side queues that can cause faults when the user tries to touch their ring buffers and the user will not know why the faults are occurring. This relies on the fact that the KFD does not expose any links from the CPU to the GPU if large BAR is not enabled (though links from the GPU to the CPU may still be exposed by the KFD).	2025-04-23 15:53:29 -04:00
Tony Gutierrez	6e3c375bf1	rocr: Flags to alloc queue buf/struct in dev mem This builds on a prior change that allowed for allocating a user-mode queue's packet buffer in device memory to also allocate the queue struct in device memory. This provides additional latency benefits particularly for cases where dispatches are performed from the GPU itself. Flags are added to support the various use cases.	2025-04-23 15:53:29 -04:00
Tony Gutierrez	adbc0495e2	rocr/libhsakmt: Add coarse-grain allocator to GPU	2025-04-23 15:53:29 -04:00
Saleel Kudchadker	57c0c643ce	rocr: return preferred SDMA engine mask - Add a new AMD extension API to return preferred SDMA engine mask. This can be used in conjunction with copy_on_engine API to get optimal bandwidth.	2025-04-22 13:28:38 -07:00
Shane Xiao	6a63170b38	rocr: Add rec sdma engines with limited XGMI SDMA engine This patch adds recommended SDMA engine support with limited XGMI SDMA engines. It uses one PCIe SDMA engine for gpu <-> gpu copies, which helps improve all-to-all copy performance. Signed-off-by: Shane Xiao <shane.xiao@amd.com>	2025-04-11 23:54:15 +08:00
Yiannis Papadopoulos	2d2c47bdef	rocr/aie: Increment write pointer upon packet submission	2025-04-08 15:36:40 -05:00
Yiannis Papadopoulos	c63e01724c	rocr/aie: Using PDI address instead of cu_mask for dispatch. Automatic hw ctx reconfiguration upon new PDI addition.	2025-04-03 15:13:20 -05:00
Yiannis Papadopoulos	e55503e7f8	rocr/aie: Bundling XDNA BOs and addresses, adding cleanup guard in case of error	2025-03-27 13:15:13 -04:00
Yiannis Papadopoulos	f4e1c9b0ba	rocr/aie: Avoiding XdnaDriver class in queue API	2025-03-27 13:15:13 -04:00
David Yat Sin	947391deac	rocr: Release agent resources before pools Adding a general stage for agents to release their resources on shutdown. This avoids a circular dependency during shutdown because we have to delete allocated resources before deleting memory pools, but we also have to delete memory pools before destroying agents.	2025-03-25 14:25:04 -04:00
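
Several of the rocr/driver commits listed above follow one pattern: a virtual method is added to the base Driver interface, the KFD driver implements it, the XDNA driver gets a stub that returns an error, and the agent calls through driver() instead of calling the thunk directly. The C++ sketch below illustrates that pattern only. The `Status` enum, the `FakeKmtAvailableMemory` helper, the `QueryAvailable` function and every signature here are hypothetical stand-ins and are not taken from the ROCr sources.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical status codes standing in for hsa_status_t.
enum class Status { kSuccess, kError };

// Hypothetical stand-in for a thunk call such as hsaKmtAvailableMemory.
// It reports a made-up size so the sketch has something to return.
inline Status FakeKmtAvailableMemory(uint32_t node_id, uint64_t* bytes) {
  *bytes = 1024ull * 1024ull * (node_id + 1);
  return Status::kSuccess;
}

// Base interface: each driver-specific operation is a pure virtual method.
class Driver {
 public:
  virtual ~Driver() = default;
  virtual Status AvailableMemory(uint32_t node_id, uint64_t* bytes) const = 0;
};

// KFD-style driver: forwards to the (fake) kernel-mode thunk.
class KfdDriver : public Driver {
 public:
  Status AvailableMemory(uint32_t node_id, uint64_t* bytes) const override {
    return FakeKmtAvailableMemory(node_id, bytes);
  }
};

// XDNA-style driver: the operation is not supported, so the stub returns an error.
class XdnaDriver : public Driver {
 public:
  Status AvailableMemory(uint32_t, uint64_t*) const override {
    return Status::kError;
  }
};

// Agent with both a non-const and a const driver() accessor, so that
// const agents can still reach their driver.
class Agent {
 public:
  Agent(uint32_t node_id, std::unique_ptr<Driver> driver)
      : node_id_(node_id), driver_(std::move(driver)) {
    assert(driver_ != nullptr);
  }
  Driver& driver() { return *driver_; }
  const Driver& driver() const { return *driver_; }
  uint32_t node_id() const { return node_id_; }

 private:
  uint32_t node_id_;
  std::unique_ptr<Driver> driver_;
};

// Caller code goes through the driver interface and never touches the thunk.
// Returns 0 when the driver reports an error.
inline uint64_t QueryAvailable(const Agent& agent) {
  uint64_t bytes = 0;
  if (agent.driver().AvailableMemory(agent.node_id(), &bytes) != Status::kSuccess) {
    return 0;
  }
  return bytes;
}
```

With this shape, an agent built as `Agent(1, std::make_unique<KfdDriver>())` reports the fake size through `QueryAvailable`, while one built with `XdnaDriver` reports 0 because its stub returns an error.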


388 commits