rocm-systems

Author	SHA1	Message	Date
David Yat Sin	56ccf828bc	Adding documentation for SDMA environment var Adding documentation for modifiers for SDMA copy Change-Id: I2425672c3ba1f1617d29b8f4b49776775d78a376 [ROCm/ROCR-Runtime commit: `a7ffddb265`]	2023-07-20 15:15:04 +00:00
Shweta Khatri	9fda38f0ba	Fixes a bug that led to setting wrong access type for device local memory The access type for extended scope fine grained memory was being returned as never allowed by default Change-Id: I0167ea0e5931053f22f2d2755bf426d43d2bb8e5 [ROCm/ROCR-Runtime commit: `82e7979c61`]	2023-07-17 14:52:01 -04:00
Lancelot SIX	09589e5929	Park waves for gfx11 and bump abi version to 9 On gfx11, with a sequence such as `s_trap 2`, `s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)`, `s_endpgm`, the s_sendmsg still deallocates registers while the wave is supposed to be stopped. As a result, the wave cannot perform the expected context save operations. To avoid this problem, park the wave in the trap handler for gfx11. Note that gfx11 implements an instruction cache prefetch. When the wave is parked, the prefetch tries to access memory past the end of the trap handler, which causes memory violation exceptions to be reported. To avoid this, we need to add padding at the end of the trap handler. The padding consists of `s_code_end` instructions. Given that the trap handler is loaded at a 0x1000-aligned address, the maximum prefetch amount (in bytes) is given by `256 - (trap_handler_size % 64)`. Change-Id: I5446da54a965a64f21cb0fd3ce3caa4b6137a933 [ROCm/ROCR-Runtime commit: `2f2ba050f6`]	2023-07-15 09:44:50 -04:00
Jonathan Kim	babe58eb24	Release lock on thread yield during blit ops Thread yield doesn't drop the scoped acquired mutex so drop it around yield to prevent a multithread deadlock. Change-Id: Ie21f3bff89f6f9e4c57e5b3ccf17968f253fa23a [ROCm/ROCR-Runtime commit: `70f0a44910`]	2023-07-14 10:44:56 -04:00
David Yat Sin	b434d15a27	Force clock sync on profiling enablement Fix a condition where we can get a divide-by-zero in the TranslateTime(tick) function if the GPU tick predates HSA startup and we did not do a SyncClocks since initialization. Change-Id: I0dcec8553ccb8f01211928991f4b3ed3cb4a1ebb [ROCm/ROCR-Runtime commit: `bc585bd8de`]	2023-07-07 10:08:54 -04:00
Ranjith Ramakrishnan	a4d9fa592d	Use memset for initializing variable sized array In ASAN builds, the compiler used is clang. Initializing a variable-sized array with the assignment operator causes a compilation failure in ASAN builds. Use memset instead. Change-Id: Ifc748291a41a9886243e0fb1ba576d2760f5e15e [ROCm/ROCR-Runtime commit: `cd4632ccbc`]	2023-07-07 12:54:54 +00:00
Jeremy Newton	b3f22fef0a	Fix non-x86 builds I've reverted some code to what it was in 5.5 by wrapping new x86-specific bits with #if's, e.g.: - CPUID is x86-specific - mwait is x86-specific Change-Id: I6cefae34282c777c7340daf3f934d2a11742502e Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com> [ROCm/ROCR-Runtime commit: `132a19e9c3`]	2023-06-30 01:04:04 -04:00
Jeremy Newton	e80bd7f5b0	Only install asan license when enabled Change-Id: I7b2aad1042846401d7422ca499ef6912f49f6b50 Signed-off-by: Jeremy Newton <Jeremy.Newton@amd.com> [ROCm/ROCR-Runtime commit: `d1f025bff6`]	2023-06-29 10:20:16 -04:00
Philipp Knechtges	4a7c3a2607	fix link-time ordering condition This fixes a segfault in cases where the linking order of compilation units varies. The reason behind the segfault is that a global variable in one compilation unit depends on a global variable in another compilation unit, but there is no guarantee that the other compilation unit is initialized first. The fix forces a reinitialization at the first invocation of the library. Change-Id: I1428592c6898bca13a330c4588941de260ff0370 [ROCm/ROCR-Runtime commit: `d220e16000`]	2023-06-29 10:08:29 -04:00
David Yat Sin	175265aef4	Add query for driver gpu_id Add query for OS driver node ID (gpu_id) Change-Id: I72ebc54d8ae5dbcd1346535912160a642b1065ae [ROCm/ROCR-Runtime commit: `60a0fd64c4`]	2023-06-23 15:02:48 +00:00
Konstantin Zhuravlyov	e126b5a054	Cache referenced symbol table when pulling data in relocation section Change-Id: I6ef21cedde1aca6fd1ec5e5d5634563f030eaab8 [ROCm/ROCR-Runtime commit: `8a6edb07d9`]	2023-06-21 16:35:45 -04:00
Jonathan Kim	dbf125b5cf	Prevent unnecessary SDMA queue creation on copy on status Unless SDMA blits have actually been used for copies, prevent the DMA copy status from querying the blit's pending byte status to avoid creating an unnecessary HW queue. Change-Id: Ied1fbed73c08f0408f0e3583f9b56f2768c71708 [ROCm/ROCR-Runtime commit: `92467fd282`]	2023-06-21 03:10:53 -04:00
Jonathan Kim	2147e8ccbf	Prevent blit copy pending bytes query when out of SDMA resources Querying pending bytes on a blit kernel is unnecessary when the runtime runs out of SDMA resources, since we are returning an SDMA availability mask. Change-Id: I347efba0c85b70ea3ba8749d76a499afc23909e8 [ROCm/ROCR-Runtime commit: `8c60f04a99`]	2023-06-21 03:10:52 -04:00
Shweta Khatri	76cc9034ff	Defined a new extended scope memory region Added the HSA_AMD_MEMORY_POOL_GLOBAL_FLAG_EXT_SCOPE_FINE_GRAINED flag to enable an extended scope memory region, where device-scope atomics act as system-scope atomics. Change-Id: I79fc3207cb630dfc68bed2f8aabd75f35fe80b12 [ROCm/ROCR-Runtime commit: `77bf357647`]	2023-06-20 11:00:05 -04:00
James Zhu	3ac5245f3b	Enable sleep for all waiters Enable sleep for all waiters when the kernel supports event age tracking. Change-Id: Icd4e1e8d83b4a54e9f6aaa99691a6573211b3337 Signed-off-by: James Zhu <James.Zhu@amd.com> [ROCm/ROCR-Runtime commit: `36666f5895`]	2023-06-20 09:32:16 -04:00
James Zhu	2cf7c88b34	Add kernel version flag supports event age KFD kernel version 1.13 adds support for event age tracking, which helps eliminate unnecessary busy waiting. Change-Id: Ib447ed6e0350f3110a4d6b9b80a0388000dd0e72 Signed-off-by: James Zhu <James.Zhu@amd.com> [ROCm/ROCR-Runtime commit: `5871b28503`]	2023-06-20 09:32:03 -04:00
Sreekant Somasekharan	bf22d10ceb	rocrtst: Fix RoundToPowerOf2 function The behavior of a shift is undefined if the right operand is negative, or greater than or equal to the width of the promoted left operand. For release builds with address sanitizer enabled, the resulting compiler optimization produces an unsupported queue size value, since the current method shifts up to 128 bits on a 64-bit value. Change-Id: Iddcc15b43d2331bc8bf5fc3aa4725f76844655ec Signed-off-by: Sreekant Somasekharan <sreekant.somasekharan@amd.com> [ROCm/ROCR-Runtime commit: `ea2f832a43`]	2023-06-19 19:17:49 -04:00
Jonathan Kim	63463b14c3	Ensure HSA_ENABLE_SDMA=0 persists on new copy on engine API Copy on engine API still needs to respect HSA_ENABLE_SDMA settings. Change-Id: I26038b1e3082d62687c2e279615557583d20f229 [ROCm/ROCR-Runtime commit: `3e3e11bc5a`]	2023-06-19 13:48:59 -04:00
raghavmedicherla	2758da98cd	[hsa-runtime] Add support to hsa-runtime to find symbols from ".dynsym" section. Previously, hsa-runtime was unable to find symbols in a stripped ELF image because it had no support for finding symbols in the ".dynsym" section. Looking for symbols in .dynsym is enabled by the LOADER_USE_DYNSYM=1 environment variable. Change-Id: I4f0e8dd0eb053a6066d4d49b670c52e51149531a [ROCm/ROCR-Runtime commit: `4142a77375`]	2023-06-16 14:40:50 -04:00
David Yat Sin	8c3acb3974	Update documentation for IPC handles Explicitly mention that IPC handles can only be created on GPU agents. Change-Id: I19bc3578d6e5243c795bf6fbf981ea4bd3bfc2e8 [ROCm/ROCR-Runtime commit: `5e4490f180`]	2023-06-14 16:21:26 -04:00
Jonathan Kim	1772d866c9	Soften trap handler loading failure when exception handling not supported GFX11 and up, including some GFX9 devices, do not support the old trap handling without the new exception handling. Instead of a hard assert failure that results in a core dump, let ROCr initialization continue. Change-Id: I309becdc72ef4fb2fafd118c1faf0801407e658e [ROCm/ROCR-Runtime commit: `bfb94b3b6e`]	2023-06-13 13:05:47 -04:00
Laurent Morichetti	3736a0ffeb	Fix a race condition in the trap handler status.priv may be read after returning from the trap handler, which causes sq_interrupt_word_wave.priv to be 0 even though the s_sendmsg instruction was initiated when status.priv was 1. To work around this, added an s_waitcnt lgkmcnt(0) after s_sendmsg to make sure the message is sent before continuing. Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com> Signed-off-by: Laurent Morichetti <Laurent.Morichetti@amd.com> Change-Id: Ieb75005ca1559ef03d0efac80e966f521e41fcb7 [ROCm/ROCR-Runtime commit: `6a82b0a038`]	2023-06-09 10:03:55 -04:00
Ammar ELWazir	5675ed837a	Adding support for UMC & MMEA System Blocks Change-Id: I92601f37757e0cff3f1fdc10f2e5e0db51c1ee2d [ROCm/ROCR-Runtime commit: `fc603d58d2`]	2023-06-08 21:22:19 +00:00
Jonathan Kim	21f24c1348	Remove Tab Indent on SDMA Status Fix Use spaces not tabs. Change-Id: Icaeb16158ebaddd8e5ac518103d285d55fe976f3 [ROCm/ROCR-Runtime commit: `233413eb08`]	2023-06-07 16:47:04 -04:00
Xiaomeng Hou	99d3d2afbd	Do not reserve scratch memory on asic with finite vram resource Change-Id: I0a2207cb01f464ed3e73331637cfa9bd62f03d97 [ROCm/ROCR-Runtime commit: `389cd3564b`]	2023-06-06 22:01:31 +08:00
David Yat Sin	c83eee3f2b	Removing __linux__ definition in CMake Removing this definition as this should already be defined by the compiler. This is causing compile errors on newer versions of llvm because the macro is being redefined. Change-Id: Ica6a06f46a14e16d3f52e83b9b5ee8cfd7359510 [ROCm/ROCR-Runtime commit: `e4fffa140a`]	2023-06-05 12:23:56 -04:00
Graham Sider	5ec7dcd4c4	Revert "Disable Queue_Validation_InvalidGroupMemory" This reverts commit `7a157d8e55`. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Change-Id: I8424c96d5e5c3c9a9e7711ecff7c5372190b0d2d [ROCm/ROCR-Runtime commit: `e2c3c3e510`]	2023-06-05 09:41:02 -04:00
Graham Sider	74f9ba24e0	rocrtst: Remove extra clear_code_object() calls A patch was made in gfx940 npi branch to move the kernel object file loading to outside the rocrtstNeg.Queue_Validation_* main queue creation and submission loops, and added a clear_code_object() after the loop. Another patch was made to the non-npi branch which adds a clear_code_object() inside the loop. When the npi branch patch was merged, this was causing the code object to be cleared at the end of the first loop. Remove these clear_code_object() calls. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Change-Id: Id4188e78411e81c5071bf715c1f02491f571ab79 [ROCm/ROCR-Runtime commit: `dbe2a82e35`]	2023-06-05 09:41:02 -04:00
Xiaomeng Hou	381ea164ba	Correct the SDMA engine mask reported on apu There is only one SDMA instance on small APUs. Change-Id: I9d4dda511c40fc78f002be720e5f1909dc5b91e4 [ROCm/ROCR-Runtime commit: `557da77c4e`]	2023-06-02 19:10:08 +08:00
David Yat Sin	9c54cdaaf1	Change failure to parse CPUID to warning Change-Id: If42dbcd11ac1be09597e43a8f11caa91cf37903e [ROCm/ROCR-Runtime commit: `fc3b554121`]	2023-05-31 11:46:52 -04:00
David Yat Sin	3661d76c74	Bump interface versions due to hsa_amd_memory_async_copy_on_engine being added Change-Id: Iff36719e800280d58217647bb70d3b5d5fcc91fe [ROCm/ROCR-Runtime commit: `b290d65ec9`]	2023-05-26 12:04:06 +00:00
Graham Sider	f0eeb60222	rocrtst: Throw on LocateKernelFile open() failures Throw runtime error instead of returning empty string when open() fails in LocateKernelFile() Signed-off-by: Graham Sider <Graham.Sider@amd.com> Change-Id: Iafa360fbc2d3c9b01b9fe7ea4c11d70bd254ccce [ROCm/ROCR-Runtime commit: `0772e8d618`]	2023-05-24 14:31:26 -04:00
David Yat Sin	3345ada378	Adding gfx941 and gfx942 Adding support for gfx941 and gfx942 ISAs. gfx940 ISA will use sc0:1 sc1:1 on load/store operations. gfx942 ISA will use default load/store operations. Change-Id: If1efbef86f59e2cf2d48fe359cd4166405a0a579 [ROCm/ROCR-Runtime commit: `41f6d0426d`]	2023-05-23 11:13:16 -04:00
David Yat Sin	959c897604	ASAN: Remap first page of allocations to host mem When compiling in ASAN mode, remap the first page of device allocations to system memory. ASAN's memory allocator uses a small amount of extra memory to store data for housekeeping purposes. Because this memory comes from the GPU memory pool, it might have a memory type that is uncommon for the host to access. Mapping this section of memory to the host makes it accessible to ASAN. Change-Id: I36f659d616a4d15558372592439a8723c5c84a69 Signed-off-by: Bing Ma <Bing.Ma@amd.com> [ROCm/ROCR-Runtime commit: `50e754d08b`]	2023-05-22 20:58:54 -04:00
David Yat Sin	255a645c3b	Add mutex when reserving scratch This prevents a race condition when creating queues concurrently. Change-Id: I5ea9714926fe06e1719fcb2559cb485063355e4f [ROCm/ROCR-Runtime commit: `a1f3b619a7`]	2023-05-19 11:05:13 -04:00
David Yat Sin	14052ab9d0	Add HSA_ENABLE_PEER_SDMA env variable Add support for HSA_ENABLE_PEER_SDMA env variable that can be used to disable use of SDMA engines for device-to-device transfers. Note that setting HSA_ENABLE_SDMA=0 will disable all SDMA transfers and override HSA_ENABLE_PEER_SDMA values. Change-Id: I737b3c2b2efcf3ff237f98bc748f49b8252ed24a [ROCm/ROCR-Runtime commit: `a397373cea`]	2023-05-18 00:10:20 +00:00
Ranjith Ramakrishnan	82b4216e40	Use the RUNPATH provided by build scripts RUNPATH in libraries will be `$ORIGIN`; RUNPATH in binaries will be `$ORIGIN/../lib`. Change-Id: Iafa66a8e02cc8c5783903d40927b63652042d2f1 [ROCm/ROCR-Runtime commit: `ad002f1e7b`]	2023-05-17 09:10:50 -04:00
David Yat Sin	b8e97a8d1b	Update documentation for hsa_amd_pointer_info Update documentation for hsa_amd_pointer_info to clarify which fields are invalid when the allocation type is HSA_EXT_POINTER_TYPE_UNKNOWN. Change-Id: Idaed985962c4a98d281ebe01bef8ec2459da3985 [ROCm/ROCR-Runtime commit: `39feb83b88`]	2023-05-16 18:36:54 -04:00
David Yat Sin	7ecdefb7ca	Reserve scratch on first queue allocation Some workloads running on multi-GPU systems create one process per GPU. Each process creates a GPU agent on every GPU but only creates queues on one GPU, which would cause unnecessary scratch reservation. Change-Id: I50a216f0bcc0b5f707f3943147390b0ecec1ac22 [ROCm/ROCR-Runtime commit: `38e832a682`]	2023-05-15 17:10:57 -04:00
Graham Sider	53b5692d07	Fix scratch allocation occupancy reduction loop If the required scratch allocation is too large, ROCr will attempt to reduce it by lowering the dispatch's targeted occupancy. The reduction loop, however, was prone to overflow if waves_per_cu was not a multiple of waves_per_group. Ensure no overflow by aligning waves_per_cu to waves_per_group. On GC 9.4.3 dGPU, dispatches with a large grid size and a waves_per_group of e.g. 16 may require reducing occupancy such that waves_per_cu is less than waves_per_group to ensure the allocation size is small enough. Allow this while also ensuring the tmpring scratch wave count is kept divisible by the number of SEs per XCC. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Change-Id: Ie4016dcd8166a9ae69e9decc26a3eec882b49480 [ROCm/ROCR-Runtime commit: `bd63e5045c`]	2023-05-15 14:55:42 +00:00
David Yat Sin	2d924e337d	Do not report reserved scratch cache as available Scratch cache reserved memory is only available for scratch memory use, so do not report this memory as available to the user via the HSA_AMD_AGENT_INFO_MEMORY_AVAIL api. Change-Id: I52f96e62536458bcaa52b9f4be5de856d5680dc4 [ROCm/ROCR-Runtime commit: `3477fbc661`]	2023-05-15 09:45:31 -04:00
David Yat Sin	e1ded285a9	Removing invalid gfx entries Change-Id: I1a9a9a064f5f65ecc3e124c5dd7d6baf6b5ccb5c [ROCm/ROCR-Runtime commit: `f0000da7b3`]	2023-05-12 11:59:27 -04:00
David Yat Sin	7a157d8e55	Disable Queue_Validation_InvalidGroupMemory Temporarily disabling rocrtstNeg.Queue_Validation_InvalidGroupMemory until it is fixed. Change-Id: Ifc1973a960c8d0bae27e2628e4bfddc60f70325d [ROCm/ROCR-Runtime commit: `7b74271d5e`]	2023-05-12 11:03:26 -04:00
Saleel Kudchadker	5630103f4a	Report XGMI SDMA upon query Report XGMI SDMA engines when queried for H2D/D2H. Change-Id: I4fb7b24bc15d1745b3844485bdeab71282a787a5 [ROCm/ROCR-Runtime commit: `adf6512dad`]	2023-05-11 12:20:41 -04:00
David Yat Sin	35e72e3d97	Fix incorrect check for image support Change-Id: I77476204d40c245c9d9091853264a4e9fbb80725 [ROCm/ROCR-Runtime commit: `9b35ce5b3b`]	2023-05-10 20:13:54 +00:00
Ranjith Ramakrishnan	dd9fdba22c	Set the default value of ROCM_HEADER_WRAPPER_WERROR to OFF Using wrapper header files will result in a #warning message by default Change-Id: I87739cabb365b9370b1182cf23ca9b54d99149c3 [ROCm/ROCR-Runtime commit: `fbcbcd9e73`]	2023-05-10 00:47:33 -04:00
Sam Wu	56ec0e6412	add sphinx configurations Change-Id: I1a66a02b18fb699415a87a6473eb72c097a13b5f [ROCm/ROCR-Runtime commit: `57b3fcde51`]	2023-05-08 15:58:01 -06:00
Graham Sider	e2fc46c189	rocrtst: Move kernel object loading outside of loops Negative queue validation tests were doing many redundant from-file kernel object loads in a loop. This was creating many simultaneous open file handles within many dynamically allocated CodeObject objects. While the CodeObject class implements RAII on the file handles to clean up on destruction, clear_code_object() only gets called on the destruction of the TestBase-derived test objects (these being a suite abstraction). Due to this, we were hitting file open() EMFILE errors (too many open files) in gfx94x CPX mode. Move LoadKernelFromObjFile outside of the test loops and call clear_code_object() for each test on each agent. Signed-off-by: Graham Sider <Graham.Sider@amd.com> Change-Id: I6f9d23fd122720c49a58c22698f097906d2fc97c [ROCm/ROCR-Runtime commit: `7a4c9273d7`]	2023-04-27 16:16:12 -04:00
David Yat Sin	11541cc283	Add env var to override SRAM ECC Add HSA_ENABLE_SRAMECC environment variable that can be used to override SRAM ECC mode reported by KFD Change-Id: I2b95511820a2d3d146a76b03070659c0695b61fd [ROCm/ROCR-Runtime commit: `a180c9ee78`]	2023-04-27 16:16:05 -04:00
David Yat Sin	101755c207	Add query for number of XCCs per agent Change-Id: I4b694b4904ba0326c998356388a62c19a972a7ff [ROCm/ROCR-Runtime commit: `f024d21e3d`]	2023-04-27 16:15:59 -04:00
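The rocrtst fix in commit `bf22d10ceb` exists because shifting a 64-bit value by 64 or more bits is undefined behavior. The following is a minimal sketch of a round-up-to-power-of-two helper that avoids the problem by never shifting by more than 32 bits. The name `RoundUpToPowerOf2` and its exact contract are assumptions for illustration. This is not the rocrtst code.

```cpp
#include <cstdint>

// Hypothetical helper (not the rocrtst implementation): returns the smallest
// power of two >= v for v in [1, 2^63]; v == 0 yields 1, and inputs above
// 2^63 wrap to 0 (unsigned wraparound is well defined). Every shift amount
// is below 64, the width of std::uint64_t, so no shift is undefined.
constexpr std::uint64_t RoundUpToPowerOf2(std::uint64_t v) {
  if (v <= 1) return 1;
  v--;            // so exact powers of two map to themselves
  v |= v >> 1;    // smear the highest set bit into every lower bit
  v |= v >> 2;
  v |= v >> 4;
  v |= v >> 8;
  v |= v >> 16;
  v |= v >> 32;   // largest shift used: 32 < 64
  return v + 1;   // all-ones below the top bit, plus one, is a power of two
}

// Usage: a requested queue size of 1000 rounds up to 1024.
static_assert(RoundUpToPowerOf2(1000) == 1024, "1000 rounds up to 1024");
static_assert(RoundUpToPowerOf2(4096) == 4096, "powers of two are unchanged");
```

The bit-smearing approach uses a fixed sequence of six shifts with constant amounts. No shift count depends on the input, so the out-of-range shift described in the commit cannot occur.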


1012 commits
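The arithmetic below works through the padding bound quoted in commit `09589e5929`. The sketch evaluates `256 - (trap_handler_size % 64)` and converts the result into a count of `s_code_end` instructions. Two things are assumptions made for illustration rather than details from the ROCR-Runtime source: the 4-byte size of `s_code_end` and the helper names. This is not the runtime's implementation.

```cpp
#include <cstddef>

// Assumption: s_code_end is a single 32-bit (4-byte) instruction.
constexpr std::size_t kCodeEndBytes = 4;

// Maximum number of bytes the gfx11 instruction prefetch may read past the
// end of the trap handler, using the bound quoted in the commit message
// (the handler is assumed to start at a 0x1000-aligned address).
constexpr std::size_t MaxPrefetchBytes(std::size_t trap_handler_size) {
  return 256 - (trap_handler_size % 64);
}

// Hypothetical helper: number of s_code_end instructions needed to cover the
// prefetch range, rounding up in case the byte count is not a multiple of 4.
constexpr std::size_t CodeEndPaddingCount(std::size_t trap_handler_size) {
  return (MaxPrefetchBytes(trap_handler_size) + kCodeEndBytes - 1) /
         kCodeEndBytes;
}

// Usage: a 1000-byte handler ends 40 bytes past a 64-byte boundary
// (1000 % 64 == 40), so up to 256 - 40 = 216 bytes may be prefetched,
// which takes 216 / 4 = 54 s_code_end instructions to cover.
static_assert(MaxPrefetchBytes(1000) == 216, "1000-byte handler");
static_assert(CodeEndPaddingCount(1000) == 54, "54 s_code_end instructions");
```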