
103 Commits

Author SHA1 Message Date
Anusha GodavarthySurya 742b0210d3 SWDEV-477324 - Capture Memcpy1D pinned H2D D2H
Change-Id: I1f4744f20a9caeed005ec68da44e5fde737e09f7
2024-09-30 01:01:30 -04:00
German Andryeyev 29cc678d8d SWDEV-483586 - Unblock staging H2D transfers
Although unpinned copies require synchronization
in HIP, the runtime can avoid syncs for H2D copies by using
a staging buffer

Change-Id: If2203c6bc0cbd89742823688dc8e89e9acd873b2
2024-09-21 10:25:27 -04:00
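The staging idea in SWDEV-483586 can be illustrated with a minimal host-only sketch. Once a chunk of pageable source data has been copied into the staging buffer, the caller's source no longer needs to stay valid, so the caller does not have to wait for the device copy itself. `stagedH2D` and `DeviceCopyFn` are hypothetical names, and the "device copy" is simulated with `memcpy`. This is not the ROCclr implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the device DMA that drains the staging buffer.
using DeviceCopyFn = void (*)(uint8_t* dst, const uint8_t* src, size_t n);

// Copies `size` bytes from pageable host memory `src` to `dst` through a
// staging buffer, one chunk at a time. Returns the number of chunks issued.
size_t stagedH2D(uint8_t* dst, const uint8_t* src, size_t size,
                 std::vector<uint8_t>& staging, DeviceCopyFn deviceCopy) {
  assert(!staging.empty());
  size_t chunks = 0;
  for (size_t off = 0; off < size; off += staging.size()) {
    const size_t n = std::min(staging.size(), size - off);
    // CPU copy: after this line, `src + off` is no longer needed.
    std::memcpy(staging.data(), src + off, n);
    // This sketch reuses one staging buffer, so deviceCopy must finish before
    // the next chunk overwrites it; a real runtime could rotate buffers.
    deviceCopy(dst + off, staging.data(), n);
    ++chunks;
  }
  return chunks;
}
```

With a 4-byte staging buffer, a 10-byte copy is issued as three chunks (4 + 4 + 2 bytes).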
kjayapra-amd 12a39fbf22 SWDEV-480772 - Remove name variable from amd::Monitor class.
Change-Id: Ie2a4fa44f485786227230f8a892e090e718aa30e
2024-09-19 11:55:01 -04:00
Saleel Kudchadker d379f4efd0 SWDEV-301667 - Refactor Blit force env var
Change-Id: I5344ac2e6442cd8f526118e688f1b1412cc5b45a
2024-07-25 15:15:10 -04:00
Anusha GodavarthySurya 57156c524d SWDEV-467102 - Hidden heap init for graph capture
If the graph has kernels that do device-side allocation, the heap is allocated during packet
capture, because the heap pointer has to be added to the AQL packet, and it is initialized during
graph launch.

Handle a race with the wait when two kernels that use the device heap are enqueued on multiple streams.

Change-Id: I45933b77fcaf7bc8fdf1bc906462e32b5d8d3688
2024-06-17 02:07:25 -04:00
Ioannis Assiouras 775dc204aa SWDEV-463865 - changed device, roc and pal namespaces to be nested under amd
Change-Id: Icad342843c039c634e249a13a7aa31400730b1dd
2024-06-07 12:23:06 -04:00
Sourabh Betigeri a9f05e22db Revert "SWDEV-301667 - Disable HostBlit copy for HIP"
This reverts commit 5447cf8872.

Reason for revert: SWDEV-455075, SWDEV-461507 - This change forced the
use of ROCr's copy path. Reintroducing the hostBlit copy path for
host-to-host copies.

Change-Id: Ic3c45b49e481c9dcdaa7611f61071778790b7e6c
2024-05-28 06:31:10 +00:00
German Andryeyev 7de7da4016 SWDEV-455254 - Reduce blit kernels signature
Remove offset from blit kernels, since it can be applied in setup.

Change-Id: I06b585068d68a0ee8e125ddf46a36fccb372f30d
2024-04-12 14:45:55 -04:00
Saleel Kudchadker 3f0bcf7834 SWDEV-301667 - Fix SDMA mask reuse
If we are using the mask returned by getLastUsedSdmaEngine() then we
need to apply the SDMA Read/Write mask to it before using with HSA
copy_on_engine API.

Change-Id: I6e5dc6c187eeb3c61ee159e9d2a0fa7b4737c06e
2024-04-08 15:42:52 -04:00
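The SDMA mask fix in SWDEV-301667 amounts to intersecting a cached engine mask with the allowed read/write mask before using it. The following is a minimal sketch under stated assumptions. The bit-per-engine encoding, the function name `selectEngineMask`, and the fallback to the full allowed set are all hypothetical. The sketch does not call the HSA copy_on_engine API named in the commit.

```cpp
#include <cassert>
#include <cstdint>

// Bit i set means SDMA engine i (hypothetical encoding for this sketch).
// Restrict the cached "last used" engine mask to the engines allowed by the
// read/write mask before handing it to the copy API.
uint32_t selectEngineMask(uint32_t lastUsedMask, uint32_t readWriteMask) {
  assert(readWriteMask != 0 && "at least one engine must be allowed");
  const uint32_t candidate = lastUsedMask & readWriteMask;
  // Assumed fallback: if the cached engine is not allowed, use the allowed set.
  return candidate != 0 ? candidate : readWriteMask;
}
```

For example, a cached mask of `0x4` survives an allowed mask of `0x6`, but a cached `0x1` does not, so the sketch returns `0x6` in that case.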
Anusha GodavarthySurya 7f84df9f74 SWDEV-301667 - Disable HostBlit copy for HIP: correct if check
Change-Id: I33d1359d5e4c871f63350d8300f726e039664d86
2024-03-26 02:18:51 -04:00
Satyanvesh Dittakavi 1b25484f0f SWDEV-447405 - Reset the last SDMA engine after every few copies
The copies can get blocked if the last SDMA engine is used by another
copy, and this can lead to a perf drop in some tests such as GROMACS.
Resetting the last engine by checking the engine status and fetching a
new mask after a few copies avoids this.

Change-Id: I8fe8ea678db508d291c6242f3741fa9215e99921
2024-03-12 02:10:27 -04:00
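The periodic reset described in SWDEV-447405 can be sketched as a counter that forces a fresh engine query every few copies. This is a minimal, hypothetical illustration. `LastEngineCache`, its fields, and the `queryFreeMask` callback (which stands in for an engine-status query) are invented for this sketch, and the default interval of 4 is an arbitrary assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cache of the last SDMA engine mask. The mask is refreshed on
// first use and then every `resetInterval` copies, so a busy engine is not
// reused indefinitely.
struct LastEngineCache {
  uint32_t lastMask = 0;            // 0 means "no cached engine"
  uint32_t copiesSinceRefresh = 0;
  uint32_t resetInterval = 4;       // arbitrary default for the sketch

  // `queryFreeMask` stands in for an engine-status query (assumption).
  template <typename Query>
  uint32_t pick(Query queryFreeMask) {
    assert(resetInterval > 0);
    if (lastMask == 0 || ++copiesSinceRefresh >= resetInterval) {
      lastMask = queryFreeMask();
      copiesSinceRefresh = 0;
    }
    return lastMask;
  }
};
```

With `resetInterval = 3`, seven consecutive copies trigger a query on the 1st, 4th, and 7th copy.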
Ajay e643406caa SWDEV-347670 - StreamWait and StreamWrite on Windows
The __amd_streamOpsWrite blit kernel in device-libs has only 3 args,
so get rid of the 4th, unused arg (sizeBytes)

Change-Id: I81cc1107f8b424bf58558c93a2495a1b878aef91
2024-02-26 22:45:10 -05:00
German Andryeyev 842eda5e7f SWDEV-440746 - Remove pending dispatch check for SDMA P2P
Change-Id: I7290345cfc60cd878fb39a06b03105441793c27b
2024-02-05 05:08:11 +00:00
German 31101c6219 SWDEV-440746 - Limit WG for compute P2P
Use only 16 workgroups for compute P2P copies.
That should be enough to utilize XGMI bandwidth.

Change-Id: I60dfe019279bb95f93c8874244c1738aad1896d8
2024-01-12 14:56:29 -05:00
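Capping compute P2P copies at 16 workgroups, as in SWDEV-440746, means the launch size is the smaller of "workgroups needed to cover the data" and the cap. The sketch below illustrates that arithmetic. The helper name `p2pWorkgroups` is hypothetical, and the 512-thread workgroup and 16 bytes per thread used in the example are taken from the copy-kernel commits below, not from this one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Number of workgroups for a compute P2P copy: enough to cover `bytes`,
// but never more than `maxWg` (16 per the commit message). Each workgroup
// would then loop over the remaining data.
size_t p2pWorkgroups(size_t bytes, size_t wgSize, size_t bytesPerThread,
                     size_t maxWg = 16) {
  assert(wgSize > 0 && bytesPerThread > 0);
  const size_t bytesPerWg = wgSize * bytesPerThread;
  const size_t needed = (bytes + bytesPerWg - 1) / bytesPerWg;  // ceil division
  return std::min(needed, maxWg);
}
```

With 512 threads per workgroup at 16 bytes per thread, one workgroup covers 8 KiB per pass, so a 1 GiB copy is still launched with only 16 workgroups.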
Jaydeep Patel 3daf8b334a SWDEV-430911 - Force SDMA only if explicitly specified & pick appropriate engine.
Change-Id: Ib34fa6b2782f74b753899fa8fddff646dc60c8ce
2023-12-04 21:58:47 -05:00
German Andryeyev 44761fe89b SWDEV-434298 - Add destination offset
The end pointer in copy buffer requires the destination offset

Change-Id: I01f2967144f741761fd2ce3244fd8d04564d986f
2023-11-29 16:33:29 -05:00
German Andryeyev ed4e1fec98 SWDEV-434298 - Change copy buffer kernel
The new copy kernel can limit the number of launched workgroups.
It can copy in chunks of 16 bytes or 4 bytes.
Workgroup size is increased to 512 or 1024

Change-Id: Ic3fefa2d5bda6afebd1acc4d41ad310b138af6df
2023-11-28 16:56:30 -05:00
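A copy kernel with a limited number of workgroups typically uses a grid-stride loop, where each thread handles every stride-th chunk. The sketch below simulates that pattern on the host. The rule for choosing between 16-byte and 4-byte chunks (both pointers and the size 16-byte aligned) is an assumption, and `chooseChunk` and `gridStrideCopy` are hypothetical names, not the actual blit kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed rule: use 16-byte chunks when both pointers and the size are
// 16-byte aligned, otherwise fall back to 4-byte chunks.
size_t chooseChunk(const void* dst, const void* src, size_t size) {
  const auto d = reinterpret_cast<uintptr_t>(dst);
  const auto s = reinterpret_cast<uintptr_t>(src);
  return (d % 16 == 0 && s % 16 == 0 && size % 16 == 0) ? 16 : 4;
}

// Host simulation of a copy kernel launched with a limited number of
// workgroups: thread `tid` copies chunks tid, tid + stride, tid + 2*stride, ...
void gridStrideCopy(uint8_t* dst, const uint8_t* src, size_t size,
                    size_t numWg, size_t wgSize) {
  const size_t chunk = chooseChunk(dst, src, size);
  assert(size % chunk == 0 && "sketch requires size to be a multiple of 4");
  assert(numWg > 0 && wgSize > 0);
  const size_t nChunks = size / chunk;
  const size_t stride = numWg * wgSize;  // total threads in the launch
  for (size_t tid = 0; tid < stride; ++tid)
    for (size_t c = tid; c < nChunks; c += stride)
      std::memcpy(dst + c * chunk, src + c * chunk, chunk);
}
```

The grid-stride loop keeps the result correct however few workgroups are launched. Only the amount of parallelism changes.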
German Andryeyev f1dc81f427 SWDEV-432174 - Change the fillBuffer kernel
- Add the new fillBuffer kernel, which allows launching a limited
number of workgroups for memory fill operations
- Switch memory fill to 16-byte writes by default
- Allow limiting the workgroups with DEBUG_CLR_LIMIT_BLIT_WG

Change-Id: Ibad1822f2d42b2fc71bcfc1917c31409c0623e8e
2023-11-16 14:25:55 -04:00
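The fill changes in SWDEV-432174 combine an environment-variable cap on workgroups with 16-byte writes. Below is a minimal sketch under stated assumptions. Treating DEBUG_CLR_LIMIT_BLIT_WG as a plain workgroup count (with 0 or an invalid value meaning "keep the default") is assumed, not confirmed by the commit. `parseWgLimit`, `blitWgLimit`, and `fill16` are hypothetical helpers that simulate the kernel on the host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Parse a workgroup cap; missing, malformed, or zero values keep the default.
// Treating the variable's value as a plain workgroup count is an assumption.
size_t parseWgLimit(const char* value, size_t defaultWg) {
  if (value == nullptr) return defaultWg;
  char* end = nullptr;
  const unsigned long v = std::strtoul(value, &end, 10);
  if (end == value || *end != '\0' || v == 0) return defaultWg;
  return static_cast<size_t>(v);
}

// Hypothetical lookup of the debug variable named in the commit.
size_t blitWgLimit(size_t defaultWg) {
  return parseWgLimit(std::getenv("DEBUG_CLR_LIMIT_BLIT_WG"), defaultWg);
}

// Host simulation: fill `size` bytes with a 16-byte pattern using a limited
// number of "threads"; each thread writes every stride-th 16-byte slot.
void fill16(uint8_t* dst, size_t size, const uint8_t* pattern16,
            size_t numWg, size_t wgSize) {
  assert(size % 16 == 0 && numWg > 0 && wgSize > 0);
  const size_t slots = size / 16;
  const size_t stride = numWg * wgSize;
  for (size_t tid = 0; tid < stride; ++tid)
    for (size_t s = tid; s < slots; s += stride)
      std::memcpy(dst + s * 16, pattern16, 16);
}
```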
Ioannis Assiouras b0c9fb84fd SWDEV-428408 - Add waitingSignal for hsa_amd_memory_async_copy calls in hsaCopyStaged
Change-Id: I3c42ef1ef3ed2f0b00f0a50d402a32106e5978ba
2023-11-01 19:43:08 -04:00
Saleel Kudchadker f316a30e5d SWDEV-408180 - Add a new hipMemcpyKind
Add hipMemcpyDeviceToDeviceNoCU to force a non-blit copy path. This
helps in cases where an app may determine that the CUs are busy and copies
with SDMA may be quicker.

Change-Id: I59b415dd8f6022c244e8d75f265464d5c635df1e
2023-10-20 13:18:10 -04:00
Saleel Kudchadker bf8baeecb3 SWDEV-301667 - Track last used SDMA engine per queue
- Track the last SDMA engine per queue; this results in better scheduling
- Reset the last SDMA engine upon batch completion. That ensures we don't get
blocked if the same engine is used by another concurrent copy

Change-Id: Id53111980da7ee41d5c932fb44e4aab5b1e065a3
2023-10-10 12:13:11 -04:00
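Per-queue tracking with a reset on batch completion, as described in the SWDEV-301667 commit above, can be modeled as a small map from queue to engine mask. This is a hypothetical sketch. `PerQueueSdmaTracker` and its methods are invented names, and integer queue IDs and bit-per-engine masks are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical per-queue record of the last SDMA engine mask used.
class PerQueueSdmaTracker {
 public:
  // Returns the last engine mask used by `queueId`, or 0 if none is cached.
  uint32_t lastEngine(int queueId) const {
    const auto it = last_.find(queueId);
    return it == last_.end() ? 0u : it->second;
  }
  void recordCopy(int queueId, uint32_t engineMask) {
    assert(engineMask != 0);
    last_[queueId] = engineMask;
  }
  // On batch completion, forget the engine so the next copy re-selects one
  // instead of queueing behind another copy on the same engine.
  void onBatchComplete(int queueId) { last_.erase(queueId); }

 private:
  std::unordered_map<int, uint32_t> last_;
};
```

Keeping the state per queue means one queue's reset does not disturb another queue's cached engine.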
kjayapra-amd 6a8bc3c718 SWDEV-419688 - Do not run GWS init kernel for targets > gfx12 and MI300.
Change-Id: I8e7441268978be71ab8a5a33e7f8bcf69660e500
(cherry picked from commit 36d37ef614909c0f215512aac0c133408d787080)
2023-10-05 14:57:56 -04:00
kjayapra-amd 6f5277c701 SWDEV-408473 - Add wait time of 10 us if the waiting signal copy was < 24K.
Change-Id: I438ec9eb07e5034042a4a9a5e6e51d74daba2c83
2023-08-23 10:46:33 -04:00
victzhan cb426df1bd SWDEV-416580 - Add condition: when memory has direct access, only use host fill if the image is small
Change-Id: I3509c4aa21f6413adad3b46273ec650f5c577ddd
2023-08-17 17:23:49 -04:00
Jaydeep Patel 289535e805 SWDEV-412393 - Force alloc memory to avoid another hsa image creation.
Change-Id: Ia3cd99eb736231e6dfe013ebae6c41fd4cc657bc
2023-08-17 05:18:43 +00:00
Saleel Kudchadker aa6eb555e2 SWDEV-384557 - Enable SDMA query
Change-Id: Ibb0a8d131f799985a4d4adbf753261e58c04157f
2023-08-01 18:41:23 -04:00
Saleel Kudchadker 5447cf8872 SWDEV-301667 - Disable HostBlit copy for HIP
Change-Id: I46333ff42e8c1d402ece97e3ead7b539a27c3f82
2023-07-17 17:49:11 -04:00
Saleel Kudchadker 770b2a4711 SWDEV-384557 - Rename env var
- Rename HIP_USE_SDMA_QUERY to DEBUG_CLR_USE_SDMA_QUERY as this is
supposed to be a temporary env var for debug purposes only.

Change-Id: If6ebd52ab87624375a3df24ceccdcc05c60a65af
2023-06-29 13:54:55 -04:00
German Andryeyev d29755452b SWDEV-396088 - Add image view cache
The blit manager requires an image view to reduce the number
of copy kernels. Creating or destroying a view in ROCr is
an expensive operation, so the runtime can cache views for fast access.

Change-Id: Ia67d775b481cc8326d91215ca22d4a73c1dddb59
2023-06-28 09:44:05 -04:00
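The view cache in SWDEV-396088 can be pictured as a map keyed by image and view format, which pays the creation cost only on a miss. This is a minimal sketch. `ImageView`, `ImageViewCache`, and the `(imageId, format)` key are hypothetical stand-ins for the runtime's objects, and no ROCr image API is called.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>

// Stand-in for an expensive runtime image view (hypothetical type).
struct ImageView {
  int imageId;
  int format;
};

// Caches views keyed by (image, format) so repeated blits reuse them.
class ImageViewCache {
 public:
  std::shared_ptr<ImageView> get(int imageId, int format) {
    const auto key = std::make_pair(imageId, format);
    if (auto it = cache_.find(key); it != cache_.end()) return it->second;
    ++creations;  // cache miss: pay the creation cost once
    auto view = std::make_shared<ImageView>(ImageView{imageId, format});
    cache_.emplace(key, view);
    assert(cache_.count(key) == 1);
    return view;
  }
  // Drop every view of an image, e.g. when the image itself is destroyed.
  void evictImage(int imageId) {
    for (auto it = cache_.begin(); it != cache_.end();) {
      if (it->first.first == imageId) it = cache_.erase(it);
      else ++it;
    }
  }
  int creations = 0;

 private:
  std::map<std::pair<int, int>, std::shared_ptr<ImageView>> cache_;
};
```

Evicting views when their image is destroyed matters because a stale view would otherwise outlive its image.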
Saleel Kudchadker 0a3d4bd4d4 SWDEV-408180 - Remove largeBar memcpy
- Remove the large BAR memcpy path. Since we end up waiting for a barrier,
it defeats the true intent of the copy. Also, memcpy over PCIe/XGMI
introduces variability in perf for HPC apps like GROMACS

Change-Id: I3b5c9d9ce93333959c39023bf4f703e2ccb6e3af
2023-06-27 18:15:26 -04:00
Saleel Kudchadker 8d193c32bb SWDEV-384557 - Use toggle for SDMA query
- Use the HIP_USE_SDMA_QUERY env var as a toggle for the new API. The env var is 0 by
default

Change-Id: If725a0c41e15f78a1a6c3f47942954fe9240b4db
2023-06-15 01:02:24 -04:00
Saleel Kudchadker 60d9a4ebab SWDEV-384557 - Do not fall back to compute
- Use the regular copy API when free SDMA engines are exhausted, instead of falling
back to a compute copy. Falling back to compute hurts performance for
numerous GPU-bound apps

Change-Id: I75c767eff0b9f5ada324301c5c327fe2c23a9806
2023-05-22 11:23:23 -04:00
Saleel Kudchadker 0b475284e9 SWDEV-398151 - Partly relax static engine allocation
Change-Id: I4903b51a34b597a2e84d771b52cf629f877dba05
2023-05-11 00:52:18 -04:00
taosang2 7624a48de9 SWDEV-366528 - Fix image memory format updating issue
Add dstMemory format updating.
Separate format updating for srcMemory and dstMemory.

Change-Id: I1692b92d417bbd742d562679f218ebf8ca532e92
2023-05-08 21:43:42 -04:00
Saleel Kudchadker 5865c642d4 SWDEV-384557 - Fix engine status query
- Maintain a map of SDMA engine number to stream, allocated following a greedy
approach
- Anything beyond that always queries the SDMA engine status and takes either an
SDMA or a blit copy path

Change-Id: Ibfaed7f951ab84d80cb0430596a4d11b5aec9202
2023-04-21 00:57:26 -04:00
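The greedy engine-to-stream mapping in SWDEV-384557 can be sketched as handing out one engine per new stream until engines run out. After that, the caller falls back to a status query. `GreedySdmaAssigner` is a hypothetical name. The sketch keys the map by stream, and the status query and the SDMA-or-blit decision are left to the caller, represented by `std::nullopt`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical greedy assignment of dedicated SDMA engines to streams.
class GreedySdmaAssigner {
 public:
  explicit GreedySdmaAssigner(uint32_t numEngines) : numEngines_(numEngines) {
    assert(numEngines > 0 && numEngines <= 32);  // one bit per engine
  }
  // Returns the stream's dedicated engine mask, assigning the next free
  // engine on first use. Returns nullopt once all engines are taken; the
  // caller would then query engine status and pick an SDMA or blit path.
  std::optional<uint32_t> assign(int streamId) {
    if (auto it = owner_.find(streamId); it != owner_.end()) return it->second;
    if (next_ >= numEngines_) return std::nullopt;
    const uint32_t mask = 1u << next_++;
    owner_.emplace(streamId, mask);
    return mask;
  }

 private:
  uint32_t numEngines_;
  uint32_t next_ = 0;
  std::unordered_map<int, uint32_t> owner_;
};
```

With two engines, the first two streams each get a dedicated engine, and a third stream gets `nullopt`.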
Saleel Kudchadker 20ca8b8116 SWDEV-384557 - Leverage SDMA engine status query
Change-Id: I5f386f2965de24a229ea43b6c4da82099692f91f
2023-04-05 07:50:53 +00:00
Jaydeep Patel ad78c5c4a5 SWDEV-382553 - Remove use of useCopyHint.
Change-Id: I82eb5d7569a2a78d7709af9397d4f29c8274d781
2023-03-01 23:20:02 -05:00
jatang b798c85272 SWDEV-380792 - Fix floating point exception when maxEngineClockFrequency_ is 0
Change-Id: Ic443ceae586c4c84995ed2abef9bd7f32f8b60f9
2023-02-07 11:43:10 -05:00
German Andryeyev b23c759746 SWDEV-372790 - Copy AQL packet from runtime setup
Scheduler in device queue requires relaunching itself. Make sure
scheduler uses exactly the same AQL packet as the host launch.

Change-Id: I4eb03c4c91bf2408a6d4607731f081a2e2c2c8ae
2023-01-24 10:25:45 -05:00
Jaydeep Patel 1e4a4162ff SWDEV-378157 - Correct log message
Change-Id: I6297693f67ae78a8874b976ac03353a81b728b1d
2023-01-23 12:06:18 -05:00
Saleel Kudchadker 033d4c0463 SWDEV-345213 - Fix staged line-by-line copy path
- Address an old bug in offset calculation that was causing out-of-bounds
access.
- Improve logging

Change-Id: Iebdf34dddaa5e987cc72184a2152918adc6a96e0
2023-01-16 11:04:30 -05:00
Anusha GodavarthySurya 274f2de391 SWDEV-364576 - initialize device malloc heap state using blit kernel
Change-Id: I5d0172aff7d2c04b322a4d828b8a2b438158b80f
2023-01-07 06:53:53 +00:00
Jaydeep Patel 070ae4e6d4 SWDEV-374370 - Propagate element size to blit kernel.
Change-Id: I06d1ae6feebd238e9a63c617eb4c4dcf485d9ee0
2022-12-26 09:33:50 +00:00
Saleel Kudchadker e0384f9f6b SWDEV-373334 - Use copyMetadata for blit decisions
- Check the isAsync flag for small host copies on large BAR, as it synchronizes
- Use the CopyEngine preference hint if HMM is enabled.

Change-Id: I1ffc4b2604ed03cf5979cdc454178648c5ae5cba
2022-12-15 17:09:02 -05:00
Ioannis Assiouras 72b45e2a1f SWDEV-369581 - Convey copy API metadata to ROCclr
Change-Id: I569462d6d268700d419510255e201bf7d80d6714
2022-12-09 00:27:15 -05:00
Saleel Kudchadker feca11d5e3 SWDEV-301667 - Improve logging
Change-Id: Ifa6da876b85cb503967cf09aac6d477b10db8e63
2022-11-04 18:23:18 -04:00
Saleel Kudchadker 175ad024d3 SWDEV-260345 - Manage constant buffer for blit
- Leverage a managed buffer that uses chunks for the fill pattern. Use a
different chunk for the next fill to avoid waiting

Change-Id: I254483c867e112f66564ffd8f55e0a605d8896c9
2022-07-12 12:41:02 -04:00
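Rotating through chunks of a managed constant buffer, as in SWDEV-260345, lets a new fill pattern be written without waiting for the previous fill to consume its chunk. This is a minimal ring-buffer sketch. `ChunkedPatternBuffer` is a hypothetical name, and a real runtime would still need to wait when it wraps around onto a chunk that is still in use, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical ring of fixed-size chunks for fill patterns.
class ChunkedPatternBuffer {
 public:
  ChunkedPatternBuffer(size_t numChunks, size_t chunkSize)
      : storage_(numChunks * chunkSize),
        numChunks_(numChunks),
        chunkSize_(chunkSize) {
    assert(numChunks > 0 && chunkSize > 0);
  }
  // Writes `pattern` into the next chunk and returns that chunk's offset.
  // Consecutive fills therefore never overwrite each other's pattern until
  // the ring wraps around.
  size_t stage(const void* pattern, size_t size) {
    assert(size <= chunkSize_);
    const size_t offset = current_ * chunkSize_;
    std::memcpy(storage_.data() + offset, pattern, size);
    current_ = (current_ + 1) % numChunks_;
    return offset;
  }
  const uint8_t* data() const { return storage_.data(); }

 private:
  std::vector<uint8_t> storage_;
  size_t numChunks_;
  size_t chunkSize_;
  size_t current_ = 0;
};
```

With three 16-byte chunks, successive fills land at offsets 0, 16, 32, and then 0 again.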
Saleel Kudchadker faaa41aab8 SWDEV-335626 - Use ROCr copy for IPC
Detect IPC buffers and use the ROCr copy API instead of blit

Change-Id: Ie6bdd6fc45dbd7457611011d81570b53d5fd5276
2022-07-08 13:32:19 -04:00
Ajay d2f837d25f SWDEV-332522 - streamOpsWrite & streamOpsWait to accept memory offset
Change-Id: I4b6ecb4d80c093d038d86616a637c4bb465ae24e
2022-04-25 14:59:36 -04:00
Jason Tang ed7737564e SWDEV-324411 - Use blit kernel for copyBufferRect if atomic is not supported
Change-Id: I2e110fd3418117ee9c7ede379244d2c6c4f248b7
2022-04-24 11:41:16 -04:00