rocm-systems

Author	SHA1	Message	Date
Anusha GodavarthySurya	742b0210d3	SWDEV-477324 - Capture Memcpy1D pinned H2D D2H Change-Id: I1f4744f20a9caeed005ec68da44e5fde737e09f7	2024-09-30 01:01:30 -04:00
German Andryeyev	29cc678d8d	SWDEV-483586 - Unblock staging H2D transfers Although unpinned copies require synchronizations in HIP, runtime can avoid syncs for H2D copies with a staging buffer Change-Id: If2203c6bc0cbd89742823688dc8e89e9acd873b2	2024-09-21 10:25:27 -04:00
kjayapra-amd	12a39fbf22	SWDEV-480772 - Remove name variable from amd::Monitor class. Change-Id: Ie2a4fa44f485786227230f8a892e090e718aa30e	2024-09-19 11:55:01 -04:00
Saleel Kudchadker	d379f4efd0	SWDEV-301667 - Refactor Blit force env var Change-Id: I5344ac2e6442cd8f526118e688f1b1412cc5b45a	2024-07-25 15:15:10 -04:00
Anusha GodavarthySurya	57156c524d	SWDEV-467102 - Hidden heap init for graph capture If the graph has kernels that do device-side allocation, during packet capture, heap is allocated because heap pointer has to be added to the AQL packet, and initialized during graph launch. Handle race with wait when 2 kernels with device heap are enqueued on multiple streams. Change-Id: I45933b77fcaf7bc8fdf1bc906462e32b5d8d3688	2024-06-17 02:07:25 -04:00
Ioannis Assiouras	775dc204aa	SWDEV-463865 - changed device,roc and pal namespaces to be nested under amd Change-Id: Icad342843c039c634e249a13a7aa31400730b1dd	2024-06-07 12:23:06 -04:00
Sourabh Betigeri	a9f05e22db	Revert "SWDEV-301667 - Disable HostBlit copy for HIP" This reverts commit `5447cf8872`. Reason for revert: SWDEV-455075, SWDEV-461507 - This change forces to use ROCr's copy path. Reintroducing hostBlit copy path for host-to-host copies. Change-Id: Ic3c45b49e481c9dcdaa7611f61071778790b7e6c	2024-05-28 06:31:10 +00:00
German Andryeyev	7de7da4016	SWDEV-455254 - Reduce blit kernels signature Remove offset from blit kernels, since it can be applied in setup. Change-Id: I06b585068d68a0ee8e125ddf46a36fccb372f30d	2024-04-12 14:45:55 -04:00
Saleel Kudchadker	3f0bcf7834	SWDEV-301667 - Fix SDMA mask reuse If we are using the mask returned by getLastUsedSdmaEngine() then we need to apply the SDMA Read/Write mask to it before using with HSA copy_on_engine API. Change-Id: I6e5dc6c187eeb3c61ee159e9d2a0fa7b4737c06e	2024-04-08 15:42:52 -04:00
Anusha GodavarthySurya	7f84df9f74	SWDEV-301667 - Disable HostBlit copy for HIP correct if check Change-Id: I33d1359d5e4c871f63350d8300f726e039664d86	2024-03-26 02:18:51 -04:00
Satyanvesh Dittakavi	1b25484f0f	SWDEV-447405 - Reset the last SDMA engine after every few copies The copies can get blocked if the last SDMA engine is used by another copy and this can lead to perf drop in some of the tests like Gromacs. Resetting the last engine by checking the engine status and fetching the new mask after few copies can avoid this. Change-Id: I8fe8ea678db508d291c6242f3741fa9215e99921	2024-03-12 02:10:27 -04:00
Ajay	e643406caa	SWDEV-347670 - StreamWait and StreamWrite on Windows __amd_streamOpsWrite blitkernel in device-libs has only 3 args. so getting rid of the 4th unused arg (sizeBytes) Change-Id: I81cc1107f8b424bf58558c93a2495a1b878aef91	2024-02-26 22:45:10 -05:00
German Andryeyev	842eda5e7f	SWDEV-440746 - Remove pending dispatch check for SDMA P2P Change-Id: I7290345cfc60cd878fb39a06b03105441793c27b	2024-02-05 05:08:11 +00:00
German	31101c6219	SWDEV-440746 - Limit WG for compute P2P Use only 16 workgroups for compute P2P copies. That should be enough to utilize XGMI bandwidth. Change-Id: I60dfe019279bb95f93c8874244c1738aad1896d8	2024-01-12 14:56:29 -05:00
Jaydeep Patel	3daf8b334a	SWDEV-430911 - Force SDMA only if explicitly specified & pick appropriate engine. Change-Id: Ib34fa6b2782f74b753899fa8fddff646dc60c8ce	2023-12-04 21:58:47 -05:00
German Andryeyev	44761fe89b	SWDEV-434298 - Add destination offset The end pointer in copy buffer requires destination offset Change-Id: I01f2967144f741761fd2ce3244fd8d04564d986f	2023-11-29 16:33:29 -05:00
German Andryeyev	ed4e1fec98	SWDEV-434298 - Change copy buffer kernel The new copy kernel can limit the number of launched workgroups. It can copy in chunks of 16 bytes or 4 bytes. Workgroup size is increased to 512 or 1024 Change-Id: Ic3fefa2d5bda6afebd1acc4d41ad310b138af6df	2023-11-28 16:56:30 -05:00
German Andryeyev	f1dc81f427	SWDEV-432174 - Change the fillBuffer kernel - Add the new fillBuffer kernel, which allows to launch a limited number of workgroups for memory fill operation - Switch fill memory to 16 bytes write by default - Allow to limit the workgroups with DEBUG_CLR_LIMIT_BLIT_WG Change-Id: Ibad1822f2d42b2fc71bcfc1917c31409c0623e8e	2023-11-16 14:25:55 -04:00
Ioannis Assiouras	b0c9fb84fd	SWDEV-428408 - Add waitingSignal for hsa_amd_memory_async_copy calls in hsaCopyStaged Change-Id: I3c42ef1ef3ed2f0b00f0a50d402a32106e5978ba	2023-11-01 19:43:08 -04:00
Saleel Kudchadker	f316a30e5d	SWDEV-408180 - Add a new hipMemcpyKind Add hipMemcpyDeviceToDeviceNoCU to force a non blit copy path. This helps in cases where an app may determine that CU may be busy and copies with SDMA may be quicker. Change-Id: I59b415dd8f6022c244e8d75f265464d5c635df1e	2023-10-20 13:18:10 -04:00
Saleel Kudchadker	bf8baeecb3	SWDEV-301667 - Track last used SDMA engine per queue - Track last SDMA engine per queue, this results in better scheduling - Reset last SDMA engine upon batch completion. That ensures we don't get blocked if the same engine is used by another concurrent copy Change-Id: Id53111980da7ee41d5c932fb44e4aab5b1e065a3	2023-10-10 12:13:11 -04:00
kjayapra-amd	6a8bc3c718	SWDEV-419688 - Do not run GWS init kernel for targets > gfx12 and MI300. Change-Id: I8e7441268978be71ab8a5a33e7f8bcf69660e500 (cherry picked from commit 36d37ef614909c0f215512aac0c133408d787080)	2023-10-05 14:57:56 -04:00
kjayapra-amd	6f5277c701	SWDEV-408473 - Add wait time of 10 us if the waiting signal copy was < 24K. Change-Id: I438ec9eb07e5034042a4a9a5e6e51d74daba2c83	2023-08-23 10:46:33 -04:00
victzhan	cb426df1bd	SWDEV-416580 - Add condition when memory has direct access, only use host fill if image is small Change-Id: I3509c4aa21f6413adad3b46273ec650f5c577ddd	2023-08-17 17:23:49 -04:00
Jaydeep Patel	289535e805	SWDEV-412393 - Force alloc memory to avoid another hsa image creation. Change-Id: Ia3cd99eb736231e6dfe013ebae6c41fd4cc657bc	2023-08-17 05:18:43 +00:00
Saleel Kudchadker	aa6eb555e2	SWDEV-384557 - Enable SDMA query Change-Id: Ibb0a8d131f799985a4d4adbf753261e58c04157f	2023-08-01 18:41:23 -04:00
Saleel Kudchadker	5447cf8872	SWDEV-301667 - Disable HostBlit copy for HIP Change-Id: I46333ff42e8c1d402ece97e3ead7b539a27c3f82	2023-07-17 17:49:11 -04:00
Saleel Kudchadker	770b2a4711	SWDEV-384557 - Rename env var - Rename HIP_USE_SDMA_QUERY to DEBUG_CLR_USE_SDMA_QUERY as this is supposed to be a temporary env var for debug purposes only. Change-Id: If6ebd52ab87624375a3df24ceccdcc05c60a65af	2023-06-29 13:54:55 -04:00
German Andryeyev	d29755452b	SWDEV-396088 - Add image view cache Blit manager requires an image view to reduce the amount of copy kernels. Creation/destruction of a view in ROCr is an expensive operation. Thus, runtime can cache views for fast access. Change-Id: Ia67d775b481cc8326d91215ca22d4a73c1dddb59	2023-06-28 09:44:05 -04:00
Saleel Kudchadker	0a3d4bd4d4	SWDEV-408180 - Remove largeBar memcpy - Remove large bar memcpy path. Since we end up waiting for a barrier, it's defeating the true intent of the copy. Also, memcpy over PCIE\XGMI is introducing variability in perf for HPC apps like GROMACS Change-Id: I3b5c9d9ce93333959c39023bf4f703e2ccb6e3af	2023-06-27 18:15:26 -04:00
Saleel Kudchadker	8d193c32bb	SWDEV-384557 - Use toggle for SDMA query - Use HIP_USE_SDMA_QUERY env var toggle for new API use. Env var is 0 by default Change-Id: If725a0c41e15f78a1a6c3f47942954fe9240b4db	2023-06-15 01:02:24 -04:00
Saleel Kudchadker	60d9a4ebab	SWDEV-384557 - Do not fall back to compute - Use regular copy API if we exhaust free SDMA engines and not fall back to compute copy. Falling to compute is affecting performance for numerous apps that are GPU bound Change-Id: I75c767eff0b9f5ada324301c5c327fe2c23a9806	2023-05-22 11:23:23 -04:00
Saleel Kudchadker	0b475284e9	SWDEV-398151 - Partly relax static engine allocation Change-Id: I4903b51a34b597a2e84d771b52cf629f877dba05	2023-05-11 00:52:18 -04:00
taosang2	7624a48de9	SWDEV-366528 – Fix image memory format updating issue Add dstMemory format updating. Separate format updating for srcMemory and dstMemory. Change-Id: I1692b92d417bbd742d562679f218ebf8ca532e92	2023-05-08 21:43:42 -04:00
Saleel Kudchadker	5865c642d4	SWDEV-384557 - Fix engine status query - Maintain a map of SDMA engine# to stream allocated following a greedy approach - Anything past that will query SDMA engine status always and go with a SDMA or Blit copy path Change-Id: Ibfaed7f951ab84d80cb0430596a4d11b5aec9202	2023-04-21 00:57:26 -04:00
Saleel Kudchadker	20ca8b8116	SWDEV-384557 - Leverage SDMA engine status query Change-Id: I5f386f2965de24a229ea43b6c4da82099692f91f	2023-04-05 07:50:53 +00:00
Jaydeep Patel	ad78c5c4a5	SWDEV-382553 - Remove use of useCopyHint. Change-Id: I82eb5d7569a2a78d7709af9397d4f29c8274d781	2023-03-01 23:20:02 -05:00
jatang	b798c85272	SWDEV-380792 - Fix floating point exception when maxEngineClockFrequency_ is 0 Change-Id: Ic443ceae586c4c84995ed2abef9bd7f32f8b60f9	2023-02-07 11:43:10 -05:00
German Andryeyev	b23c759746	SWDEV-372790 - Copy AQL packet from runtime setup Scheduler in device queue requires relaunching itself. Make sure scheduler uses exactly the same AQL packet as the host launch. Change-Id: I4eb03c4c91bf2408a6d4607731f081a2e2c2c8ae	2023-01-24 10:25:45 -05:00
Jaydeep Patel	1e4a4162ff	SWDEV-378157 - Correct log message Change-Id: I6297693f67ae78a8874b976ac03353a81b728b1d	2023-01-23 12:06:18 -05:00
Saleel Kudchadker	033d4c0463	SWDEV-345213 - Fix staged line-by-line copy path - Address an old bug in offset calculation that was causing out of bound access. - Improve logging Change-Id: Iebdf34dddaa5e987cc72184a2152918adc6a96e0	2023-01-16 11:04:30 -05:00
Anusha GodavarthySurya	274f2de391	SWDEV-364576 - initialize device malloc heap state using blit kernel Change-Id: I5d0172aff7d2c04b322a4d828b8a2b438158b80f	2023-01-07 06:53:53 +00:00
Jaydeep Patel	070ae4e6d4	SWDEV-374370 - Propagate element size to blit kernel. Change-Id: I06d1ae6feebd238e9a63c617eb4c4dcf485d9ee0	2022-12-26 09:33:50 +00:00
Saleel Kudchadker	e0384f9f6b	SWDEV-373334 - Use copyMetadata for blit decisions - Check isAsync flag for small host copies on large bar as it synchronizes - Use CopyEngine Preference hint if HMM is enabled. Change-Id: I1ffc4b2604ed03cf5979cdc454178648c5ae5cba	2022-12-15 17:09:02 -05:00
Ioannis Assiouras	72b45e2a1f	SWDEV-369581 - Convey copy API metadata to ROCclr Change-Id: I569462d6d268700d419510255e201bf7d80d6714	2022-12-09 00:27:15 -05:00
Saleel Kudchadker	feca11d5e3	SWDEV-301667 - Improve logging Change-Id: Ifa6da876b85cb503967cf09aac6d477b10db8e63	2022-11-04 18:23:18 -04:00
Saleel Kudchadker	175ad024d3	SWDEV-260345 - Manage constant buffer for blit - Leverage managed buffer that would use chunks for fill pattern. Use a different chunk for the next fill to avoid wait Change-Id: I254483c867e112f66564ffd8f55e0a605d8896c9	2022-07-12 12:41:02 -04:00
Saleel Kudchadker	faaa41aab8	SWDEV-335626 - Use ROCr copy for IPC Detect IPC buffer and use ROCr copy api instead of blit Change-Id: Ie6bdd6fc45dbd7457611011d81570b53d5fd5276	2022-07-08 13:32:19 -04:00
Ajay	d2f837d25f	SWDEV-332522 - streamOpsWrite & streamOpsWait to accept memory offset Change-Id: I4b6ecb4d80c093d038d86616a637c4bb465ae24e	2022-04-25 14:59:36 -04:00
Jason Tang	ed7737564e	SWDEV-324411 - Use blit kernel for copyBufferRect if atomic is not supported Change-Id: I2e110fd3418117ee9c7ede379244d2c6c4f248b7	2022-04-24 11:41:16 -04:00

103 Commits