rocm-systems

Author	SHA1	Message	Date
Lang Yu	991bbdcf24	Revert "Revert "Add support for GC 11.5.0 and 11.5.1"" This reverts commit `ebc51dd0eb`. gfx1150/1151 is merged into mainline now. Change-Id: Id179949318a37888c74abb5a8610d95bc2f22906	2023-12-04 15:03:31 +00:00
David Yat Sin	642165b1bc	Increase scratch aperture size to 4GB per XCC Change-Id: Ia02cea45ce8b782527f44fec539b0ab7cc453200	2023-12-04 15:03:31 +00:00
Jonathan Kim	81c64228e0	Increase SDMA copy size SDMA4.4 and SDMA5.2+ have increased their available copy size to 2^30 bytes, represented by exponent as bits set in the COUNT field of the linear copy. Also note that the full 2^22 byte limit is available from SDMA4 onwards as it has corrected the 0x3fffe0 HW limitation from SDMA3. As the copy limit has increased, this can change system performance, so provide env var HSA_ENABLE_SDMA_COPY_SIZE_OVERRIDE=0 to fall back to the original 0x3fffe0 limit for debugging purposes. Change-Id: I0fb6e5378f68e5b8a00ff559271691a943ee06ee	2023-12-04 15:03:31 +00:00
Youssef Aly	ae1da390bd	Enabled profiling for CPU agents for memcpy activities To be able to trace memcpy asynchronously, both dst and src agents need to have profiling enabled and the api for enabling profiling was only enabling for gpu agents. CPU agents didn't have profiling enabled so the signal owner could not be known. hsa_amd_profiling_get_async_copy_time will fail with an HSA status error because it can't read the agent for the given signal. Change-Id: Ie165e0e39b8fcd6992a55695b9ffcead10a8e812	2023-12-04 15:01:59 +00:00
Jonathan R. Madsen	f9cf1852e5	rocprofiler-register support - Update CMakeLists.txt - find_package for rocprofiler-register - this is an optional package until rocprofiler-register is added to the CI - define HSA_VERSION_{MAJOR,MINOR,PATCH} ppdefs - Update runtime.cpp - include <rocprofiler-register/rocprofiler-register.h> - if rocprofiler-register succeeds, do not support v1 unless explicitly requested Change-Id: I8f48bbf3f6b52fb91ddade2f198491a1256035fe	2023-12-04 15:01:59 +00:00
Jonathan Kim	2f847cf05f	Restore default code object version usage for ROCr and ROCr Test Remove override that forces ROCr image blit source and ROCr test to use code object version 4 now that mainline has been updated to version 5. Change-Id: I94681e86835c0e382475306ead4cd4132a2ee78f	2023-12-04 15:01:44 +00:00
David Yat Sin	750212e50e	Handle HW_EXCEPTION events Add handler to handle HW exception events reported by underlying drivers. These events are generally caused by GPU resets and need the application to abort. As an improvement, in the future, we can provide additional information about the exception (e.g. mode-reset level) Change-Id: If3fb5f19f9fce181a9d3b5e34a5506725856e7b0	2023-11-20 14:49:26 +00:00
David Yat Sin	1a7de9588e	Add LoongArch64 Support Patch submitted by user Xinmudotmoe on github Change-Id: I58fd035b4ec4856f20d63747ababd49fa9764348	2023-10-26 11:36:16 -04:00
Tony Tye	7955fb01ec	Make AqlPacket::string more robust AqlPacket::string should check the packet type is in range of the array used to print its name. Change-Id: I33dabbd941d086929526d842c9dbc0bd7305acd5	2023-10-18 12:54:36 -04:00
Tony Tye	395ad3b77b	AQL packet header may need to be loaded atomically An AQL packet header field is stored using an atomic release, and needs to be read using atomic acquire if it may be written by another thread. Change-Id: I1d75587fd93f9c6216deebffc9a627b404a7e749	2023-10-18 12:54:36 -04:00
Tony Tye	23b4ce501d	Add AMD_AQL_FORMAT_INTERCEPT_MARKER vendor packet Define AMD_AQL_FORMAT_INTERCEPT_MARKER AMD vendor AQL packet. Add support to intercept queue to invoke a callback for these packets. Change-Id: Ia58d5fe2171f563632b4edd6343e02585f49d149	2023-10-18 12:54:36 -04:00
Tony Tye	b020f66d39	Prevent accessing packets outside intercept queue When the intercept queue copies packets from the proxy queue to the wrapped queue, it should not attempt to copy packets that are outside the proxy queue. This could happen if the user of the proxy queue advances the write pointer beyond the number of free slots and the packet rewriter reduces the number of packets. Change-Id: Id02f5df8aee0ed7269f4de813731d507cf2126b3	2023-10-18 12:54:36 -04:00
Tony Tye	b64a845105	Support intercept queue with multiple packet rewriters If an intercept queue is created and multiple packet rewriters are registered, and if one of the rewriters invokes the packet writer multiple times, then on returning from the packet writer the packet rewriter index needs to be restored. Otherwise the next packet writer call will start with an index of 0 which will be decremented and result in out of bounds vector access. Change-Id: Icb3f6a81ea04f1f7b91551b974a1f48c4f32db60	2023-10-18 12:54:36 -04:00
Tony Tye	9f4d651d14	Intercept queue handling for large rewrites It is possible that packet rewriting an initial packet for the intercept queue produces more packets than the size of the wrapped queue. The code would never submit such a set of packets as it attempted to submit all or none. This can result in an infinite loop. This is corrected to submit what will fit if the rewrite is larger than the wrapped queue. Change-Id: I8f03228c2e15151287e25de46eaee998f829c62a	2023-10-18 12:54:36 -04:00
Tony Tye	d16c392338	Make intercept queue submission obstruction free The intercept queue submit needs to be obstruction free as it can be invoked by the runtime async handler helper thread. The code had a busy wait loop waiting for a free slot to be available to add the retry barrier packet. Blocking that thread prevents it servicing other async handlers which may need to execute in order to allow packets on the hardware queue to be processed to free up a slot. Change the code to always leave one free slot unless there is a retry barrier packet already on the queue. Change-Id: If901c865550258b790b995d58037b0f99f1968cc	2023-10-18 12:54:36 -04:00
Tony Tye	ca99795c58	Clarify intercept queue retry packet detection Describe the assumption being made when checking if there is a retry barrier packet on the queue. Also enforce the consequential requirement of the minimum queue size. Change-Id: I0efaffc5a79b9e2fdab3655b8b74270118a5c2ff	2023-10-18 12:54:36 -04:00
Tony Tye	be6b8bb055	Correct intercept queue handling of the overflow queue The intercept queue was processing all the packets on the proxy queue. This could result in the rewrite of more than one packet being put on the overflow queue. If there are a lot of packets on the intercept queue this could result in the overflow queue having more packets than the size of the hardware queue. The code to submit the overflow queue fails if it is unable to put all the packets of the overflow on the hardware queue. This resulted in an infinite loop. It also resulted in an assert being reported that packets are being added to the overflow queue when it is not empty. Correct this by checking if the overflow queue is non-empty after rewriting each packet. If it is non-empty then stop processing additional packets. The additional packets will be processed when the barrier packet added to the hardware queue is executed due to its async handler. This barrier packet is added to the hardware queue whenever packets are saved on the overflow queue. Change-Id: I2537911d3c3ba1aac61a0a35f1ab97426a66b5a2	2023-10-18 12:54:36 -04:00
Jonathan Kim	a36856b02a	Use user requested engine ID when forcing SDMA copies When forcing SDMA copies, engine ID specified by the requester should still be used since the requester has hint of engine availability. Change-Id: Idefa9494e407e31da510aa4c7c1fa283c85a4f6e	2023-10-18 10:45:02 -04:00
David Yat Sin	22be526230	Fix escape-to-IB packet definition The Vendor specific header is only 8-bits and this would break the behavior on big-endian machines. Renaming field to amd_format to match name in spec sheets. Change-Id: I65559757657565d3d3ff489d2663a0be42cf8ba5	2023-10-13 13:37:49 +00:00
David Yat Sin	96b3c4a0aa	Allow CPU cache info to be empty Some new CPUs have different cache reporting structure causing thunk to leave the cache information empty. Allow the cache information for CPU agents to be empty as they are not used by language-runtimes Change-Id: Ic5e880171ab20aa114b4b62bdb4479eb54066f7b	2023-10-03 13:44:10 +00:00
Shweta Khatri	4eb6ed7799	Using new KFD HSA extended coherent memory flag Using new ExtendedCoherent KFD HSA memory flag to achieve system scope coherence on atomic instructions. Non-compliant systems may have the need to perform explicit HDP flushes to achieve system scope coherence using this flag. Change-Id: Ic6b47c0e97285086fa1f52bbfa4597b81cadafeb	2023-09-25 10:36:04 -04:00
David Yat Sin	06eefdeb1b	Use scope guards to release ref counts Some negative tests can trigger C++ exceptions to be thrown, which causes code to leave the ref counts in inconsistent state. Change-Id: Ifa6d8be986941efcdf20d7ac8b86eb15a8fe9932	2023-09-20 15:08:52 -04:00
David Yat Sin	dd61f54171	Fix hsa_amd_vmem_get_access to accept offset pointers Modify hsa_amd_vmem_get_access to handle pointers that are within VA range of an existing memory mapping Change-Id: I9f806ec39f6e9a33da8d86dd65d9a472438fa8ed	2023-09-20 14:03:37 -04:00
David Yat Sin	22becfb1e8	Add query for Xnack enabled Add system query for whether Xnack is enabled on a system. Change-Id: I2832110e4f33f6a951d13acd06636442debf27ae	2023-09-19 00:25:30 +00:00
Jonathan Kim	6b4365ae4c	Set correct overrides settings for GangLeader functions Silence warnings on more stringent compile checks for lack of override declaration. Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Change-Id: Iaa54dfc3dd74f5ee55763cafbbcf2db73493bb21	2023-09-12 15:56:34 -04:00
David Yat Sin	9a127193a8	Pre-allocate memory for 16K signals On busy systems, memory allocation can take a long time and increase the duration of calls to hsa_signal_create/hsa_amd_signal_create. Pre-allocating mitigates this issue. Change-Id: Ib7640273262ebc3dbf1f07049ce5da10b1d6b158	2023-09-11 13:08:28 -04:00
David Yat Sin	6ce1586def	Update blit shaders for gfx94x Change-Id: Ic8def71aa0c6ab9a9a758877a65ca6b5625e8f1e	2023-09-08 09:43:31 -04:00
Shweta Khatri	4e675ce730	Use LLVM compiler to build blit shaders Generates shader bytecode stream in amd_blit_shaders_v2.h at build time Change-Id: I5228ec5442a78d074fd85ca9cd7f7a156dd84da3	2023-09-08 09:42:29 -04:00
David Yat Sin	3ee6c9b0e2	Fix clang compile warnings Change-Id: Iea9afc3d998a6c5db28af6c7b54939960b11ae95	2023-09-07 12:00:02 -04:00
David Yat Sin	4770b210f6	Fix for always returning 64 for cacheline size Change-Id: I0e31d306a2e051ecb9ac019c4e6f5efa25eabba0	2023-08-31 13:50:49 +00:00
David Yat Sin	1e7b078628	Update interface version for virtual memory APIs Change-Id: Ifbf1af08ee7aa4d55387ff9786f6a61b89b56f88	2023-08-30 17:01:13 -04:00
David Yat Sin	03f2f69d16	Increment HSA API table stepping on new APIs Add compile time asserts to force incrementing API table STEP versions each time a new function is added to each table. This is required for profiler team to be able to add preprocessor macros to determine which versions contain the new APIs. Also incrementing the major versions to 2 to indicate new numbering scheme. Change-Id: I148a436a5ceab6be3906f8263b40ea9b07841577	2023-08-29 21:59:36 +00:00
Jonathan Kim	cdd0728d9b	Submit a minimum of 64 DWORDs for SDMA submissions for some GFX9 devices Some GFX9 devices will drop commands if ring buffer submission is less than 64 DWORDs. Pad submission with a NOP head and trailing null DWORDs in this case. Change-Id: I850af490fb699f7efe8aef96d97c600a8e76516b	2023-08-23 13:36:29 -04:00
David Yat Sin	4317f8dece	Fix memory pool ALLOC_REC_GRANULE query Also changed enum value to leave gap between enums that only exist in hsa_region_info_t and enums that exist in both hsa_amd_memory_pool_info_t Change-Id: I8f9f31200de66648e9328e4203ab283068c993f0	2023-08-22 17:46:48 -04:00
David Yat Sin	7be305b83c	Fix flags passed to thunk for address reserve Fix flags passed to thunk when reserving address only Change-Id: Ic91d4c3393cc6a2b98e6bc5ed3575d40fa5e1424	2023-08-22 14:01:49 -04:00
Jonathan Kim	132815bcfb	Clean up SDMA ganging We don't need to keep track of specific blit engines in gang for submission anymore as ganging early exits on pending bytes. So tidy up the fluff. Change-Id: I77e80bf1ad8f561a03fff77bce33aa09d02760c6	2023-08-22 05:57:04 -04:00
Jonathan Kim	8f21793a3e	Fix SDMA ganging circular deadlock in oversubscription When oversubscribing SDMA gangs, a circular deadlock can occur since gang enqueue is staggered with respect to SDMA engine leader based on source to destination. As a result, an enqueued leader may be waiting on a gang item that is waiting on another enqueued leader or gang item and so on. To prevent this, first lock the submission to ensure dma status query and submissions are atomic. Once this is in place, be more stringent with ganging in that all SDMA engines must be available in order to gang. Finally, re-enable SDMA ganging by default. Change-Id: I4511e3487db9d26475b5aece4897f10168cc5322	2023-08-17 08:49:09 -04:00
Jonathan Kim	4c74e47e91	Update D2D SDMA ganging for non-SPX modes xGMI for compute partitioning in non-SPX modes does not have a reported bandwidth. Fix it to at most 2 since each partition is either bounded by the number of xGMI links or the number of available SDMA contexts. Change-Id: I09094bd7548d9eee6f039b0efe849838e5de166e	2023-08-17 07:25:08 -04:00
Jonathan Kim	30982ff6aa	Bump the number of SDMA engines for gfx940 GFX940 can support up to 16 SDMA engines so bump it. Change-Id: I41a95e66383036735712e317a57b239d84fcb78d	2023-08-17 07:25:08 -04:00
Jonathan Kim	f8664e88e0	Break when finding ganged agent There's no need to keep looking in the list once we find a ganged agent. Change-Id: Ia0b9b484c88221a7966a814456942c19b1741978	2023-08-17 07:25:08 -04:00
David Yat Sin	a20a0a5bac	Temporarily disable SDMA ganging by default SDMA ganging is causing some regressions with some applications hanging. Temporarily disabling SDMA ganging by default until issue is fixed. Change-Id: I65e172923a53a967df27b30d969ad5d215c4fa09	2023-08-15 23:17:34 +00:00
David Yat Sin	93401e3c8c	Revert "Adding documentation for SDMA environment var" This reverts commit `a7ffddb265`. Replaced by commit 3b3f14c06e8a2fab717f0b82aba3c72d74bb9574. Environment variables documented in: docs/environment_variables.md Change-Id: I8da0d971eb98554b4bd1b884617a439f1b20ed5b	2023-08-10 09:55:42 -04:00
Ranjith Ramakrishnan	bb4756d2e0	Disable file reorg backward compatibility support by default Change-Id: Ib53a4d0476ec598025d4f1f98414e0e425bb0e49	2023-08-07 09:38:12 -07:00
David Yat Sin	93aff0b439	Fix compile error when using clang Fix compile error due to arithmetic on void* Fix some compile warnings Change-Id: I03ded438c5af77ba61c0a7017be5d4fe1e16c16c	2023-07-31 18:29:19 +00:00
Jonathan Kim	7df0167821	Enable D2D SDMA Ganging over xGMI Use all available SDMA engines capped by xGMI bandwidth for all D2D copies within a hive. By default, set the latency boundary copy size as 4KB and below. Any copy size within this boundary will not gang. Avoid oversubscribing engines by not ganging on engines with pending non-ganged work. An environment variable HSA_ENABLE_SDMA_GANG has been provided to override default ganging behaviour. Change-Id: Iccde76aa1af1d47ea2a151789432c9db4f0ffa8d	2023-07-27 08:58:26 -04:00
Jonathan Kim	c5dbb93e59	Silence parenthesis warnings in mem API Fix KFD version checking parenthesis warnings on compile. Change-Id: I89c46ea84a8d75b761d8c40ff62d008c7afbef2d	2023-07-26 16:14:40 -04:00
David Yat Sin	ebc51dd0eb	Revert "Add support for GC 11.5.0 and 11.5.1" Reverting this as current mainline compiler branch does not support gfx1150/gfx1151 yet. Will bring back later. This reverts commit `e877840197`. Change-Id: I31ff4fb2d5817538094a7ffaeba96dd6a7d660c7	2023-07-26 15:03:54 +00:00
David Yat Sin	469defa78a	Add agent query for nearest CPU agent Add agent info query to return nearest CPU agent. This can be used to determine which CPU agent is in the same NUMA region as the GPU agent. Change-Id: I5400b4347ffbf4d2a836df31c4de443a38b0ecd1	2023-07-24 13:59:13 -04:00
Jonathan Kim	0d14144e3a	Silence implicit conversion warnings in exception handling Silence unnamed enum warning in error code comparison Change-Id: I008b269c106bbad83a1f7588e7b4ec89ec17d37d	2023-07-24 10:06:55 -04:00
Jonathan Kim	42274cfc59	Fix out of order initializer for memory region Silence out of order initializer compile warnings during memory region initialization. Change-Id: Idbbdd93d3ea8cda289d25a473b3882b920b2e8d8	2023-07-24 09:58:37 -04:00

1261 commits