rocm-systems

Author	SHA1	Message	Date
David Yat Sin	469defa78a	Add agent query for nearest CPU agent Add agent info query to return nearest CPU agent. This can be used to determine which CPU agent is in the same NUMA region as the GPU agent. Change-Id: I5400b4347ffbf4d2a836df31c4de443a38b0ecd1	2023-07-24 13:59:13 -04:00
Jonathan Kim	0d14144e3a	Silence implicit conversion warnings in exception handling Silence unnamed enum warning in error code comparison Change-Id: I008b269c106bbad83a1f7588e7b4ec89ec17d37d	2023-07-24 10:06:55 -04:00
Jonathan Kim	42274cfc59	Fix out of order initializer for memory region Silence out of order initializer compile warnings during memory region initialization. Change-Id: Idbbdd93d3ea8cda289d25a473b3882b920b2e8d8	2023-07-24 09:58:37 -04:00
Lang Yu	e877840197	Add support for GC 11.5.0 and 11.5.1 Signed-off-by: Lang Yu <Lang.Yu@amd.com> Change-Id: I3c4116e78a5c1ddac2389f5fece57485bdb17f68	2023-07-22 16:06:22 +08:00
Shweta Khatri	a2d0adf9be	Correct evaluating condition to use logical AND Aqlpacket:IsValid() function: Replaced bitwise AND operator (&) with the logical AND operator (&&) when evaluating AQL packet type Change-Id: I59980bc206cc7eff424023fff0bb92b618aa8c70	2023-07-21 15:36:48 -04:00
David Yat Sin	687eb043d4	Add retain handle and get allocation properties Support function to retain allocation handle for memory mappings. The get allocation properties function will return the current allocation properties for existing memory mappings. This is part of patch series for Virtual Memory API. Change-Id: I0a53a11b6efc2b5bf9d463512a489a2abd812551	2023-07-21 15:17:01 -04:00
David Yat Sin	b03c96c264	Support exporting and importing memory mappings Support exporting and importing dmabuf file descriptors for memory mappings. The exported dmabuf file descriptors are shareable posix file descriptors that can be used for cross-vendor, cross-device and cross-process memory sharing. This is part of patch series for Virtual Memory API. Change-Id: I3673fc009f7e73bc26be8349e19f66e20d0607c5	2023-07-21 15:17:01 -04:00
David Yat Sin	13fbd8a232	Support Get and Set access for memory mappings Mapping memory handles to virtual memory addresses does not make them accessible. The set access function is needed to make the memory mappings accessible to specific agents. The get access function returns current access properties for individual agents. This is part of patch series for Virtual Memory API. Change-Id: I152ba0557fd2a802eb9d840568b68cdd1911b72c	2023-07-21 15:17:01 -04:00
David Yat Sin	179dcf1c77	Support mapping and unmapping memory handles Add support for mapping and unmapping memory handles to virtual address ranges. This is part of patch series for Virtual Memory API. Change-Id: If512d49ff4211e68f2064249add607a3200e458a	2023-07-21 15:17:01 -04:00
David Yat Sin	e4a84c4a9c	Support memory handles Add support for creating and releasing memory handles. Memory handles are memory allocations on device memory without a virtual address. This is part of patch series for Virtual Memory API. Change-Id: I5dfb162eb1661621cce171b2870a3c93b24d840e	2023-07-21 15:17:01 -04:00
David Yat Sin	1085311f1a	Support Virtual Address reservations Add support for reserving virtual address ranges. Virtual address ranges are addresses without any memory backing. These address ranges need to be mapped to memory handles later. This is part of patch series for Virtual Memory API. Change-Id: I5d066e7421d6896f933f524312afc230a13d594e	2023-07-21 15:17:01 -04:00
David Yat Sin	a55f11025b	Change libdrm initialization Change libdrm device and file descriptor initialization to use new APIs from Thunk. Libdrm recommends that we re-use the same file descriptor throughout the life of a process instead of re-creating a new one each time. This is part of patch series for Virtual Memory API. Change-Id: I1c0b8d1bd660cd25478b5f94c84071b90d93fc6c	2023-07-21 15:17:01 -04:00
David Yat Sin	e65edb35fc	Add check/query for virtual memory API support Checks whether version of libdrm library installed on current system supports the amdgpu_device_get_fd API. This API is required to support the virtual memory API functions. The amdgpu_device_get_fd function was introduced in libdrm-2.4.109. Using a runtime check test instead of static dependency to be able to support previous APIs on older versions of libdrm. Add query for virtual memory API support. This is part of patch series for Virtual Memory API. Change-Id: Iec831eb24b5d1689c392e50ae86f4d52d4870ac4	2023-07-21 15:17:01 -04:00
David Yat Sin	3ebe1fdff9	Add query for recommended granularity size Add new query for recommended granularity size. This is the internal blocksize used. While the existing query for granularity size returns the minimum size possible, it is recommended that allocations and mappings are multiple of the recommended granularity size to minimise internal memory fragmentation. This is part of patch series for Virtual Memory API. Change-Id: Ia82c8f073b2a2c47ecd26fbb0aba27b8b7cd965f	2023-07-21 15:17:01 -04:00
David Yat Sin	a7ffddb265	Adding documentation for SDMA environment var Adding documentation for modifiers for SDMA copy Change-Id: I2425672c3ba1f1617d29b8f4b49776775d78a376	2023-07-20 15:15:04 +00:00
Shweta Khatri	82e7979c61	Fixes a bug that led to setting wrong access type for device local memory The access type for extended scope fine grained memory was being returned as never allowed by default Change-Id: I0167ea0e5931053f22f2d2755bf426d43d2bb8e5	2023-07-17 14:52:01 -04:00
Lancelot SIX	2f2ba050f6	Park waves for gfx11 and bump abi version to 9 On gfx11, with a sequence such as s_trap 2 s_sendmsg sendmsg(MSG_DEALLOC_VGPRS) s_endpgm the s_sendmsg does deallocate registers while the wave is supposed to be stopped. As a result, the wave cannot perform the expected context save operations. To avoid this problem, park the wave in the trap handler for gfx11. Note that gfx11 has implemented an instruction cache prefetch. When parked, the prefetch tries to access memory past the end of the trap handler, which causes memory violation exceptions to be reported. To avoid this, we need to add padding at the end of the trap handler. The padding consists of `s_code_end` instructions. Given that the trap handler is loaded at a 0x1000 aligned address, the maximum prefetch amount (in bytes) is given by `256 - (trap_handler_size % 64)`. Change-Id: I5446da54a965a64f21cb0fd3ce3caa4b6137a933	2023-07-15 09:44:50 -04:00
Jonathan Kim	70f0a44910	Release lock on thread yield during blit ops Thread yield doesn't drop the scoped acquired mutex so drop it around yield to prevent a multithread deadlock. Change-Id: Ie21f3bff89f6f9e4c57e5b3ccf17968f253fa23a	2023-07-14 10:44:56 -04:00
David Yat Sin	bc585bd8de	Force clock sync on profiling enablement Fix a condition where we can get a divide-by-zero in the TranslateTime(tick) function if the GPU tick predates HSA startup and we did not do a SyncClocks since initialization. Change-Id: I0dcec8553ccb8f01211928991f4b3ed3cb4a1ebb	2023-07-07 10:08:54 -04:00
Ranjith Ramakrishnan	cd4632ccbc	Use memset for initializing variable sized array In ASAN builds, the compiler used is clang. The initialization of variable sized array using assignment operator is causing compilation failure in ASAN builds. Used memset to fix the same. Change-Id: Ifc748291a41a9886243e0fb1ba576d2760f5e15e	2023-07-07 12:54:54 +00:00
Jeremy Newton	132a19e9c3	Fix non-x86 builds I've just reverted some code to what it was in 5.5 by wrapping new x86 specific bits with #if's, e.g.: - CPUID is x86 specific - mwait is x86 specific Change-Id: I6cefae34282c777c7340daf3f934d2a11742502e Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>	2023-06-30 01:04:04 -04:00
Jeremy Newton	d1f025bff6	Only install asan license when enabled Change-Id: I7b2aad1042846401d7422ca499ef6912f49f6b50 Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com>	2023-06-29 10:20:16 -04:00
Philipp Knechtges	d220e16000	fix link-time ordering condition This fixes a segfault error in cases where the linking order of compilation unit varies. Reason behind the segfault is that one global variable in one compilation unit depends on another global variable in another compilation unit, but there is no guarantee that this other compilation unit is initialized first. The fix forces a reinitialization at the first invocation of the library. Change-Id: I1428592c6898bca13a330c4588941de260ff0370	2023-06-29 10:08:29 -04:00
David Yat Sin	60a0fd64c4	Add query for driver gpu_id Add query OS driver node ID (gpu_id) Change-Id: I72ebc54d8ae5dbcd1346535912160a642b1065ae	2023-06-23 15:02:48 +00:00
Konstantin Zhuravlyov	8a6edb07d9	Cache referenced symbol table when pulling data in relocation section Change-Id: I6ef21cedde1aca6fd1ec5e5d5634563f030eaab8	2023-06-21 16:35:45 -04:00
Jonathan Kim	92467fd282	Prevent unnecessary SDMA queue creation on copy on status Unless SDMA blits have actually been used for copies, prevent the DMA copy status from querying the blit's pending byte status to avoid creating an unnecessary HW queue. Change-Id: Ied1fbed73c08f0408f0e3583f9b56f2768c71708	2023-06-21 03:10:53 -04:00
Jonathan Kim	8c60f04a99	Prevent blit copy pending bytes query when out of SDMA resources Querying pending bytes on a blit kernel is unnecessary when runtime runs out of SDMA resource since we are returning an SDMA availability mask. Change-Id: I347efba0c85b70ea3ba8749d76a499afc23909e8	2023-06-21 03:10:52 -04:00
Shweta Khatri	77bf357647	Defined a new extended scope memory region Added HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_EXT_SCOPE_FINE_GRAINED flag to enable extended scope memory region where the device-scope atomics act as system-scope atomics Change-Id: I79fc3207cb630dfc68bed2f8aabd75f35fe80b12	2023-06-20 11:00:05 -04:00
James Zhu	36666f5895	Enable sleep for all waiters Enable sleep for all waiters with event age tracking support kernel. Change-Id: Icd4e1e8d83b4a54e9f6aaa99691a6573211b3337 Signed-off-by: James Zhu <James.Zhu@amd.com>	2023-06-20 09:32:16 -04:00
James Zhu	5871b28503	Add kernel version flag supports event age KFD kernel version 1.13 starts to support event age tracking, which helps eliminate unnecessary busy waits. Change-Id: Ib447ed6e0350f3110a4d6b9b80a0388000dd0e72 Signed-off-by: James Zhu <James.Zhu@amd.com>	2023-06-20 09:32:03 -04:00
Jonathan Kim	3e3e11bc5a	Ensure HSA_ENABLE_SDMA=0 persists on new copy on engine API Copy on engine API still needs to respect HSA_ENABLE_SDMA settings. Change-Id: I26038b1e3082d62687c2e279615557583d20f229	2023-06-19 13:48:59 -04:00
raghavmedicherla	4142a77375	[hsa-runtime] Add support to hsa-runtime to find symbols from ".dynsym" section. Earlier, hsa-runtime was unable to find symbols from a stripped ELF image because there was no support for finding symbols in the ".dynsym" section. Looking for symbols in .dynsym is enabled by the LOADER_USE_DYNSYM=1 environment variable Change-Id: I4f0e8dd0eb053a6066d4d49b670c52e51149531a	2023-06-16 14:40:50 -04:00
David Yat Sin	5e4490f180	Update documentation for IPC handles Explicitly mention that IPC handles can only be created on GPU agents. Change-Id: I19bc3578d6e5243c795bf6fbf981ea4bd3bfc2e8	2023-06-14 16:21:26 -04:00
Jonathan Kim	bfb94b3b6e	Soften trap handler loading failure when exception handling not supported GFX11 and up including some GFX9 devices will not support old trap handling without the new exception handling. Instead of a hard assert failure that runs into a core dump, let ROCr initialization continue instead. Change-Id: I309becdc72ef4fb2fafd118c1faf0801407e658e	2023-06-13 13:05:47 -04:00
Laurent Morichetti	6a82b0a038	Fix a race condition in the trap handler status.priv may be read after returning from the trap handler, which causes sq_interrupt_word_wave.priv to be 0 even though the s_sendmsg instruction was initiated when status.priv was 1. To work around this, added a s_waitcnt lgkmcnt(0) after s_sendmsg to make sure the message is sent before continuing. Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com> Signed-off-by: Laurent Morichetti <Laurent.Morichetti@amd.com> Change-Id: Ieb75005ca1559ef03d0efac80e966f521e41fcb7	2023-06-09 10:03:55 -04:00
Ammar ELWazir	fc603d58d2	: Adding support to UMC & MMEA System Blocks Change-Id: I92601f37757e0cff3f1fdc10f2e5e0db51c1ee2d	2023-06-08 21:22:19 +00:00
Jonathan Kim	233413eb08	Remove Tab Indent on SDMA Status Fix Use spaces not tabs. Change-Id: Icaeb16158ebaddd8e5ac518103d285d55fe976f3	2023-06-07 16:47:04 -04:00
Xiaomeng Hou	389cd3564b	Do not reserve scratch memory on asic with finite vram resource Change-Id: I0a2207cb01f464ed3e73331637cfa9bd62f03d97	2023-06-06 22:01:31 +08:00
David Yat Sin	e4fffa140a	Removing __linux__ definition in CMake Removing this definition as this should already be defined by compiler. This is causing compile errors on newer versions of llvm because the macro is being redefined. Change-Id: Ica6a06f46a14e16d3f52e83b9b5ee8cfd7359510	2023-06-05 12:23:56 -04:00
Xiaomeng Hou	557da77c4e	Correct the SDMA engine mask reported on apu There is only one SDMA instance on small APUs. Change-Id: I9d4dda511c40fc78f002be720e5f1909dc5b91e4	2023-06-02 19:10:08 +08:00
David Yat Sin	fc3b554121	Change failure to parse CPUID to warning Change-Id: If42dbcd11ac1be09597e43a8f11caa91cf37903e	2023-05-31 11:46:52 -04:00
David Yat Sin	b290d65ec9	Bump interface versions due to hsa_amd_memory_async_copy_on_engine added Change-Id: Iff36719e800280d58217647bb70d3b5d5fcc91fe	2023-05-26 12:04:06 +00:00
David Yat Sin	41f6d0426d	Adding gfx941 and gfx942 Adding support for gfx941 and gfx942 ISAs. gfx940 ISA will use sc0:1 sc1:1 on load/store operations gfx942 ISA will use default load/store operations Change-Id: If1efbef86f59e2cf2d48fe359cd4166405a0a579	2023-05-23 11:13:16 -04:00
David Yat Sin	50e754d08b	ASAN: Remap first page of allocations to host mem When compiling in ASAN mode, remap the first page of device allocations to system memory. ASAN's memory allocator uses a small amount of extra memory to store data for housekeeping purpose. But because this memory is from the GPU memory pool, it might have uncommon memory type for host to access. Mapping this section of memory to the host makes this memory accessible to ASAN. Change-Id: I36f659d616a4d15558372592439a8723c5c84a69 Signed-off-by: Bing Ma <Bing.Ma@amd.com>	2023-05-22 20:58:54 -04:00
David Yat Sin	a1f3b619a7	Add mutex when reserving scratch This prevents race condition when creating queues concurrently. Change-Id: I5ea9714926fe06e1719fcb2559cb485063355e4f	2023-05-19 11:05:13 -04:00
David Yat Sin	a397373cea	Add HSA_ENABLE_PEER_SDMA env variable Add support for HSA_ENABLE_PEER_SDMA env variable that can be used to disable use of SDMA engines for device-to-device transfers. Note that setting HSA_ENABLE_SDMA=0 will disable all SDMA transfers and override HSA_ENABLE_PEER_SDMA values. Change-Id: I737b3c2b2efcf3ff237f98bc748f49b8252ed24a	2023-05-18 00:10:20 +00:00
Ranjith Ramakrishnan	ad002f1e7b	Use the RUNPATH provided by build scripts RUNPATH in libraries will be : $ORIGIN RUNPATH in binaries will be : $ORIGIN/../lib Change-Id: Iafa66a8e02cc8c5783903d40927b63652042d2f1	2023-05-17 09:10:50 -04:00
David Yat Sin	39feb83b88	Update documentation for hsa_amd_pointer_info Update documentation for hsa_amd_pointer_info to clarify which fields are invalid when the allocation type is HSA_EXT_POINTER_TYPE_UNKNOWN. Change-Id: Idaed985962c4a98d281ebe01bef8ec2459da3985	2023-05-16 18:36:54 -04:00
David Yat Sin	38e832a682	Reserve scratch on first queue allocation Some workloads running on multi-GPU create 1 process per GPU. So each process creates a GPU agent on every GPU, but will only create queues on one GPU. This would cause un-necessary scratch reservation. Change-Id: I50a216f0bcc0b5f707f3943147390b0ecec1ac22	2023-05-15 17:10:57 -04:00
Graham Sider	bd63e5045c	Fix scratch allocation occupancy reduction loop If the required scratch allocation is too large, ROCr will attempt to reduce it by lowering the dispatch's targeted occupancy. The reduction loop however was prone to overflow if waves_per_cu was not a multiple of waves_per_group. Ensure no overflow by aligning waves_per_cu to waves_per_group. On GC 9.4.3 dGPU, dispatches with a large grid size and a waves_per_group of e.g. 16 may require to reduce occupancy such that waves_per_cu is less than waves_per_group to ensure the allocation size is small enough. Allow this while also ensuring the tmpring scratch wave count is kept divisible by the number of SEs per XCC. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Change-Id: Ie4016dcd8166a9ae69e9decc26a3eec882b49480	2023-05-15 14:55:42 +00:00
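
As an illustration of the padding bound quoted in commit 2f2ba050f6 (a maximum prefetch of `256 - (trap_handler_size % 64)` bytes for a trap handler loaded at a 0x1000-aligned address), here is a minimal C++ sketch. The helper name `MaxPrefetchBytes` is hypothetical and does not come from the ROCr source; only the formula itself is taken from the commit message.

```cpp
#include <cstddef>

// Hypothetical helper (name is an assumption, not from ROCr).
// Returns the maximum number of bytes the gfx11 instruction prefetch may read
// past the end of the trap handler, per the formula in commit 2f2ba050f6:
//   256 - (trap_handler_size % 64)
// Assumes the trap handler is loaded at a 0x1000-aligned address.
constexpr std::size_t MaxPrefetchBytes(std::size_t trap_handler_size) {
  return 256 - (trap_handler_size % 64);
}

// A handler whose size is a multiple of 64 bytes gives the full 256-byte bound.
static_assert(MaxPrefetchBytes(0x400) == 256, "size is a multiple of 64");
// A handler that is 16 bytes past a 64-byte boundary gives 256 - 16 = 240.
static_assert(MaxPrefetchBytes(0x410) == 240, "16 bytes past a 64-byte boundary");
```

Under this reading, the `s_code_end` padding appended to the handler would need to cover at least the returned number of bytes.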

814 commits