rocm-systems

Author	SHA1	Message	Date
Tony Tye	9cdf39a706	AQL packet header may need to be loaded atomically An AQL packet header field is stored using an atomic release, and needs to be read using atomic acquire if it may be written by another thread. Change-Id: I1d75587fd93f9c6216deebffc9a627b404a7e749 [ROCm/ROCR-Runtime commit: `395ad3b77b`]	2023-10-18 12:54:36 -04:00
Tony Tye	fd757292fb	Add AMD_AQL_FORMAT_INTERCEPT_MARKER vendor packet Define AMD_AQL_FORMAT_INTERCEPT_MARKER AMD vendor AQL packet. Add support to intercept queue to invoke a callback for these packets. Change-Id: Ia58d5fe2171f563632b4edd6343e02585f49d149 [ROCm/ROCR-Runtime commit: `23b4ce501d`]	2023-10-18 12:54:36 -04:00
Tony Tye	52d6235a1d	Prevent accessing packets outside intercept queue When the intercept queue copies packets from the proxy queue to the wrapped queue, it should not attempt to copy packets that are outside the proxy queue. This could happen if the user of the proxy queue advances the write pointer beyond the number of free slots and the packet rewriter reduces the number of packets. Change-Id: Id02f5df8aee0ed7269f4de813731d507cf2126b3 [ROCm/ROCR-Runtime commit: `b020f66d39`]	2023-10-18 12:54:36 -04:00
Tony Tye	63c3bafab7	Support intercept queue with multiple packet rewriters If an intercept queue is created and multiple packet rewriters are registered, and if one of the rewriters invokes the packet writer multiple times, then on returning from the packet writer the packet rewriter index needs to be restored. Otherwise the next packet writer call will start with an index of 0 which will be decremented and result in out of bounds vector access. Change-Id: Icb3f6a81ea04f1f7b91551b974a1f48c4f32db60 [ROCm/ROCR-Runtime commit: `b64a845105`]	2023-10-18 12:54:36 -04:00
Tony Tye	0f1e43f6f3	Intercept queue handling for large rewrites It is possible that packet rewriting an initial packet for the intercept queue produces more packets than the size of the wrapped queue. The code would never submit such a set of packets as it attempted to submit all or none. This can result in an infinite loop. This is corrected to submit what will fit if the rewrite is larger than the wrapped queue. Change-Id: I8f03228c2e15151287e25de46eaee998f829c62a [ROCm/ROCR-Runtime commit: `9f4d651d14`]	2023-10-18 12:54:36 -04:00
Tony Tye	9cd957942d	Make intercept queue submission obstruction free The intercept queue submit needs to be obstruction free as it can be invoked by the runtime async handler helper thread. The code had a busy wait loop waiting for a free slot to be available to add the retry barrier packet. Blocking that thread prevents it servicing other async handlers which may need to execute in order to allow packets on the hardware queue to be processed to free up a slot. Change the code to always leave one free slot unless there is a retry barrier packet already on the queue. Change-Id: If901c865550258b790b995d58037b0f99f1968cc [ROCm/ROCR-Runtime commit: `d16c392338`]	2023-10-18 12:54:36 -04:00
Tony Tye	d1a017311d	Clarify intercept queue retry packet detection Describe the assumption being made when checking if there is a retry barrier packet on the queue. Also enforce the consequential requirement of the minimum queue size. Change-Id: I0efaffc5a79b9e2fdab3655b8b74270118a5c2ff [ROCm/ROCR-Runtime commit: `ca99795c58`]	2023-10-18 12:54:36 -04:00
Tony Tye	5bb0cc60f5	Correct intercept queue handling of the overflow queue The intercept queue was processing all the packets on the proxy queue. This could result in the rewrite of more than one packet being put on the overflow queue. If there are a lot of packets on the intercept queue this could result in the overflow queue having more packets than the size of the hardware queue. The code to submit the overflow queue fails if it is unable to put all the packets of the overflow on the hardware queue. This resulted in an infinite loop. It also resulted in an assert being reported that packets are being added to the overflow queue when it is not empty. Correct this by checking if the overflow queue is non-empty after rewriting each packet. If it is non-empty then stop processing additional packets. The additional packets will be processed when the barrier packet added to the hardware queue is executed due to its async handler. This barrier packet is added to the hardware queue whenever packets are saved on the overflow queue. Change-Id: I2537911d3c3ba1aac61a0a35f1ab97426a66b5a2 [ROCm/ROCR-Runtime commit: `be6b8bb055`]	2023-10-18 12:54:36 -04:00
Jonathan Kim	9a0c51ae92	Use user requested engine ID when forcing SDMA copies When forcing SDMA copies, engine ID specified by the requester should still be used since the requester has hint of engine availability. Change-Id: Idefa9494e407e31da510aa4c7c1fa283c85a4f6e [ROCm/ROCR-Runtime commit: `a36856b02a`]	2023-10-18 10:45:02 -04:00
David Yat Sin	dc0cfa8a54	Fix escape-to-IB packet definition The Vendor specific header is only 8-bits and this would break the behavior on big-endian machines. Renaming field to amd_format to match name in spec sheets. Change-Id: I65559757657565d3d3ff489d2663a0be42cf8ba5 [ROCm/ROCR-Runtime commit: `22be526230`]	2023-10-13 13:37:49 +00:00
Rajneesh Bhardwaj	bb813a14d4	libhsakmt: Use MADV_HUGEPAGE for large allocations For large memory allocations (>2MB) the thunk should use the MADV_HUGEPAGE flag for madvise call to optimize allocation performance on certain operating systems that rely on madvise hint when Transparent Huge Pages is not set to always. Suggested-by: Joseph Greathouse <joseph.greathouse@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Change-Id: Ic0c753f89a177b0f715942d6e2a7108b08a85f20 [ROCm/ROCR-Runtime commit: `5047eb161f`]	2023-10-12 17:01:34 -04:00
Tomasz Kłoczko	3c5a5397fe	install .pc files in libdir Provided pkgconfig file contains interface description which is arch dependent. In such cases .pc files should be installed in libdir. Signed-off-by: Tomasz Kłoczko <kloczek@github.com> Change-Id: Ibbc85ad4aee1ef014c409dfa63313873b590464b [ROCm/ROCR-Runtime commit: `a226542fc3`]	2023-10-11 15:50:38 -04:00
Torsten Keßler	7d30e82f41	Mark new symbols in ROCm 5.7.0 as global Change-Id: Ia0391cac7f432f019dea94f98a145dbf8120817d [ROCm/ROCR-Runtime commit: `b44cca813d`]	2023-10-11 15:50:27 -04:00
Philip Yang	cf6745a36c	libhsakmt: Set CWSR range granularity Set CWSR svm range granularity to 0xff, then KFD will migrate the entire CWSR range from VRAM back to system memory when recovering the CPU page fault if rocgdb accesses CWSR area, this avoids the partial CWSR range migration and stall CWSR GPU mapping issue. This is a temporary workaround, it should be reverted once the KFD is fixed. Change-Id: I80a7248244574edba25b13858b7ebcf1c77b8930 Signed-off-by: Philip Yang <Philip.Yang@amd.com> [ROCm/ROCR-Runtime commit: `85a47fa66b`]	2023-10-06 10:46:40 -04:00
David Yat Sin	254e8219b3	libhsakmt: Fix incorrect flags for ext coherence Change-Id: I89c838b9fbdb85589691f29806ae15884b25592f [ROCm/ROCR-Runtime commit: `73efd3a14e`]	2023-10-04 15:00:58 +00:00
David Yat Sin	dbdb3af4fc	rocrtst:Add tag for extended-scope fine grain Change-Id: I2a64cf3fb476271b0a5d025fb6989feb40d676bb [ROCm/ROCR-Runtime commit: `d021055ada`]	2023-10-03 15:36:20 -04:00
David Yat Sin	9049e53b91	Allow CPU cache info to be empty Some new CPUs have different cache reporting structure causing thunk to leave the cache information empty. Allow the cache information for CPU agents to be empty as they are not used by language-runtimes Change-Id: Ic5e880171ab20aa114b4b62bdb4479eb54066f7b [ROCm/ROCR-Runtime commit: `96b3c4a0aa`]	2023-10-03 13:44:10 +00:00
James Zhu	cc54135848	kfdtest: remove IOMMUv2 performance monitor support IOMMUv2 is removed from AMDGPU/KFD. Change-Id: Ia00f9aa879a5f32a42bec914936d105d6845bc60 Signed-off-by: James Zhu <James.Zhu@amd.com> [ROCm/ROCR-Runtime commit: `693e686c4d`]	2023-09-30 09:16:49 -04:00
James Zhu	0379e077b4	libhsakmt: remove share resource in performance counter This share resource is for IOMMUv2 which is removed from AMDGPU/KFD. Change-Id: Ia6e9311f1adc56fac2c9e8fa05b24c5ec8c272a5 Signed-off-by: James Zhu <James.Zhu@amd.com> [ROCm/ROCR-Runtime commit: `d195deeec4`]	2023-09-30 08:54:10 -04:00
James Zhu	6d52c1f332	libhsakmt: remove iommu_block which supports IOMMUv2 performance IOMMUv2 is removed from AMDGPU/KFD. Change-Id: I9fcf20ae9288cb40bb4b696284fc70534fb6484b Signed-off-by: James Zhu <James.Zhu@amd.com> [ROCm/ROCR-Runtime commit: `277d5e27ff`]	2023-09-30 08:54:10 -04:00
James Zhu	9533c318bb	libhsakmt: remove IOMMUv2 performance monitor support IOMMUv2 is removed from AMDGPU/KFD. Change-Id: Ib87f501c07d9de90e6b83b98f98daacd5913e98a Signed-off-by: James Zhu <James.Zhu@amd.com> [ROCm/ROCR-Runtime commit: `274b5b51ca`]	2023-09-30 08:54:10 -04:00
Shweta Khatri	ecde4153d8	Using new KFD HSA extended coherent memory flag Using new ExtendedCoherent KFD HSA memory flag to achieve system scope coherence on atomic instructions. Non-compliant systems may have the need to perform explicit HDP flushes to achieve system scope coherence using this flag. Change-Id: Ic6b47c0e97285086fa1f52bbfa4597b81cadafeb [ROCm/ROCR-Runtime commit: `4eb6ed7799`]	2023-09-25 10:36:04 -04:00
David Yat Sin	351cbe9dc7	Add extended coherence memory flag Add support for new flag for memory allocation that will provide system-scope coherent atomics Change-Id: I426d66223e8d2b570f69b4c0e61145ce9b2290d2 [ROCm/ROCR-Runtime commit: `8e06dce573`]	2023-09-22 11:03:00 -04:00
David Yat Sin	08fc87ecba	Use scope guards to release ref counts Some negative tests can trigger C++ exceptions to be thrown, which causes code to leave the ref counts in inconsistent state. Change-Id: Ifa6d8be986941efcdf20d7ac8b86eb15a8fe9932 [ROCm/ROCR-Runtime commit: `06eefdeb1b`]	2023-09-20 15:08:52 -04:00
David Yat Sin	b060204498	Fix hsa_amd_vmem_get_access to accept offset pointers Modify hsa_amd_vmem_get_access to handle pointers that are within VA range of an existing memory mapping Change-Id: I9f806ec39f6e9a33da8d86dd65d9a472438fa8ed [ROCm/ROCR-Runtime commit: `dd61f54171`]	2023-09-20 14:03:37 -04:00
David Yat Sin	48cb2f5a9e	Add query for Xnack enabled Add system query for whether Xnack is enabled on a system. Change-Id: I2832110e4f33f6a951d13acd06636442debf27ae [ROCm/ROCR-Runtime commit: `22becfb1e8`]	2023-09-19 00:25:30 +00:00
Jonathan Kim	abc017f83b	kfdtest: temporarily exclude address watch testing The debug address watch test will hang when running with the entire KFD test. Disable it for now. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Change-Id: I1d0479fa2717d2f398cc32e0605ca6dcc17ebcd5 [ROCm/ROCR-Runtime commit: `986e82d677`]	2023-09-14 09:07:20 -04:00
Jonathan Kim	d04acccc26	Set correct overrides settings for GangLeader functions Silence warnings on more stringent compile checks for lack of override declaration. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Change-Id: Iaa54dfc3dd74f5ee55763cafbbcf2db73493bb21 [ROCm/ROCR-Runtime commit: `6b4365ae4c`]	2023-09-12 15:56:34 -04:00
Jonathan Kim	8283807bca	Use camel case for KFDDBGTest shaders Debug test shaders should use camel case and suffix *Isa to match other test shader naming convention. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Change-Id: I64e14183ba1c7c9664b13a742a0e5683866e8223 [ROCm/ROCR-Runtime commit: `fcec22716a`]	2023-09-12 15:38:12 -04:00
David Yat Sin	2052be1d1d	Pre-allocate memory for 16K signals On busy systems, the memory allocation can take long duration and increase calls to hsa_signal_create/hsa_amd_signal_create. This mitigates this issue. Change-Id: Ib7640273262ebc3dbf1f07049ce5da10b1d6b158 [ROCm/ROCR-Runtime commit: `9a127193a8`]	2023-09-11 13:08:28 -04:00
Ori Messinger	805eeffa32	kfdtest: Fix String NULL Check MCPU const char * always returns true, so check the value instead. Before: if (!MCPU) { After: if (!*MCPU) { Signed-off-by: Ori Messinger <Ori.Messinger@amd.com> Change-Id: I414e091ca764095937311648c534351d6abf30e6 [ROCm/ROCR-Runtime commit: `5f117f7608`]	2023-09-08 16:36:01 -04:00
Jonathan Kim	6746de9752	kfdtest: temporarily exclude debug suspend queues test For some reason, non-Ubuntu builds have some sort of memory corruption when running this test, which affect subsequent running tests. Disable it for now. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Change-Id: I5f54ee4c63286a33c6948bc818aa1501c4a6751e [ROCm/ROCR-Runtime commit: `6ec529fe68`]	2023-09-08 12:12:13 -04:00
David Yat Sin	2a2555dd52	Update blit shaders for gfx94x Change-Id: Ic8def71aa0c6ab9a9a758877a65ca6b5625e8f1e [ROCm/ROCR-Runtime commit: `6ce1586def`]	2023-09-08 09:43:31 -04:00
Shweta Khatri	e2c5ecb8dc	Use LLVM compiler to build blit shaders Generates shader bytecode stream in amd_blit_shaders_v2.h at build time Change-Id: I5228ec5442a78d074fd85ca9cd7f7a156dd84da3 [ROCm/ROCR-Runtime commit: `4e675ce730`]	2023-09-08 09:42:29 -04:00
David Yat Sin	590cac0321	Fix clang compile warnings Change-Id: Iea9afc3d998a6c5db28af6c7b54939960b11ae95 [ROCm/ROCR-Runtime commit: `3ee6c9b0e2`]	2023-09-07 12:00:02 -04:00
David Yat Sin	3e286607ca	Fix for always returning 64 for cacheline size Change-Id: I0e31d306a2e051ecb9ac019c4e6f5efa25eabba0 [ROCm/ROCR-Runtime commit: `4770b210f6`]	2023-08-31 13:50:49 +00:00
David Yat Sin	5b9dcfd0d8	Update interface version for virtual memory APIs Change-Id: Ifbf1af08ee7aa4d55387ff9786f6a61b89b56f88 [ROCm/ROCR-Runtime commit: `1e7b078628`]	2023-08-30 17:01:13 -04:00
David Yat Sin	4e46eded66	Increment HSA API table stepping on new APIs Add compile time asserts to force incrementing API table STEP versions each time a new function is added to each table. This is required for profiler team to be able to add preprocessor macros to determine which versions contain the new APIs. Also incrementing the major versions to 2 to indicate new numbering scheme. Change-Id: I148a436a5ceab6be3906f8263b40ea9b07841577 [ROCm/ROCR-Runtime commit: `03f2f69d16`]	2023-08-29 21:59:36 +00:00
Jonathan Kim	c7942fd93f	kfdtest: replace 0 initialized dbg structs with memset Use memset to avoid general 0 set padding issues and ASAN compile issues for debug tests. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Change-Id: I0a5aca5b7b631083599573b47f1ae87d5d0d5d71 [ROCm/ROCR-Runtime commit: `f9e20c8a93`]	2023-08-29 11:25:56 -04:00
Lang Yu	9bfa5c5737	kfdtest: add blacklist for gfx1150 and gfx1151 Change-Id: If78840e57c2523696c620d28f4c4ffb004128c0c Signed-off-by: Lang Yu <Lang.Yu@amd.com> [ROCm/ROCR-Runtime commit: `65ca3317f2`]	2023-08-24 17:27:04 +08:00
Jonathan Kim	9e533f6664	Submit a minimum of 64 DWORDs for SDMA submissions for some GFX9 devices Some GFX9 devices will drop commands if ring buffer submission is less than 64 DWORDs. Pad submission with a NOP head and trailing null DWORDs in this case. Change-Id: I850af490fb699f7efe8aef96d97c600a8e76516b [ROCm/ROCR-Runtime commit: `cdd0728d9b`]	2023-08-23 13:36:29 -04:00
David Yat Sin	0637810752	Fix memory pool ALLOC_REC_GRANULE query Also changed enum value to leave gap between enums that only exist in hsa_region_info_t and enums that exist in both hsa_amd_memory_pool_info_t Change-Id: I8f9f31200de66648e9328e4203ab283068c993f0 [ROCm/ROCR-Runtime commit: `4317f8dece`]	2023-08-22 17:46:48 -04:00
David Yat Sin	777df5c6dc	Fix flags passed to thunk for address reserve Fix flags passed to thunk when reserving address only Change-Id: Ic91d4c3393cc6a2b98e6bc5ed3575d40fa5e1424 [ROCm/ROCR-Runtime commit: `7be305b83c`]	2023-08-22 14:01:49 -04:00
Jonathan Kim	ad613e1644	Clean up SDMA ganging We don't need to keep track of specific blit engines in gang for submission anymore as ganging early exits on pending bytes. So tidy up the fluff. Change-Id: I77e80bf1ad8f561a03fff77bce33aa09d02760c6 [ROCm/ROCR-Runtime commit: `132815bcfb`]	2023-08-22 05:57:04 -04:00
Ranjith Ramakrishnan	245c5e2a40	Use memset for initializing variable sized array In ASAN builds, the compiler used is clang. The initialization of variable sized array using assignment operator is causing compilation failure in ASAN builds. Used memset to fix the same. Change-Id: I02aef3b99a6cad0cce3a378210a48732e07a88fb [ROCm/ROCR-Runtime commit: `65911e8368`]	2023-08-21 12:01:58 -07:00
Jonathan Kim	9bcc4bafdf	kfdtest: add trap on wave start and end test Add test to catch trap on wave start or end override event. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Change-Id: Icb57af64475fbd2d8a6c0af9a2ee5db5d1a169c6 [ROCm/ROCR-Runtime commit: `a3f8085025`]	2023-08-18 12:15:08 -04:00
Jonathan Kim	1cd6019517	kfdtest: add address watch test Address watch test will test read and write operations. Test will also check if operation is precise if precise address watch is available. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Change-Id: I7ef835790e26bf6345682755d7dd26a35853bcd5 [ROCm/ROCR-Runtime commit: `8311ca5bfa`]	2023-08-18 12:15:07 -04:00
Jonathan Kim	8d3881c268	kfdtest: Add ops for address watch test Add wave launch override, set/clear address watch and precise memops test. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Change-Id: Ib405d5570cd304e02c2e76eca3593cbd9a5937d9 [ROCm/ROCR-Runtime commit: `431dc8d403`]	2023-08-18 12:11:48 -04:00
Jonathan Kim	616f5ed0af	kfdtest: add memory violation test Add memory violation detection test. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Change-Id: I1b56f684682836fc84fbec713bd81c53bdd6d413 [ROCm/ROCR-Runtime commit: `d4029a9492`]	2023-08-18 12:11:48 -04:00
Jonathan Kim	e8ea199d97	kfdtest: allow toggle of dispatch privilege For GFX11 debugger testing, waves require to start in non-priv mode for some test cases, so allow tester to set this. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Change-Id: Iee93fda926bfd336d51c79c086f1f75bc35b70e5 [ROCm/ROCR-Runtime commit: `6c5121faff`]	2023-08-18 12:09:07 -04:00

2930 Commits
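
The first entry in the log (9cdf39a706) says that an AQL packet header is stored with an atomic release and must be read with an atomic acquire when another thread may have written it. The pattern can be sketched in standalone C++ as below. This is a minimal illustration, not ROCR-Runtime code: `PacketSlot`, `kNotReady`, `publish`, and `try_consume` are hypothetical names, and treating header value 0 as "not ready" is a convention of this sketch only, not the HSA packet encoding.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a queue slot; not the real AQL packet layout.
constexpr uint16_t kNotReady = 0;  // sketch-only convention for "not yet published"

struct PacketSlot {
  std::atomic<uint16_t> header{kNotReady};
  uint32_t payload = 0;  // stands in for the rest of the packet body
};

// Writer: fill in the body first, then publish the header with release
// ordering so the body write is visible to any thread that observes the header.
inline void publish(PacketSlot& slot, uint16_t header, uint32_t payload) {
  slot.payload = payload;
  slot.header.store(header, std::memory_order_release);
}

// Reader: load the header with acquire ordering. Only after seeing a
// published header is it safe to read the body.
inline bool try_consume(PacketSlot& slot, uint32_t* out) {
  if (slot.header.load(std::memory_order_acquire) == kNotReady) return false;
  *out = slot.payload;
  return true;
}
```

A plain (non-atomic) read of the header in `try_consume` would be a data race when the writer runs on another thread, which is the situation the commit message describes.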