85 Commits

Author SHA1 Message Date
Todd tiantuo Li 41dc4545fc SWDEV-472357 - support Rect copy with staging buffer for 2D & 3D memcpy in PAL
Change-Id: Ie32f3e5a6fa077f6b2db20fc1ab1e2e0da8344cb
2024-10-10 18:00:19 -04:00
Daniel Livingston e550032d25 SWDEV-77148 - Add UberTrace support to PAL device
This PR adds UberTrace-based tracing support to ROCclr's PAL device class.
Legacy RGP-based tracing is still available and is the default.
If UberTrace support is enabled tool-side, this new code path will activate.

Change-Id: I268b2dcef70e850a50e2caef8355f38bf51d4641
2024-09-17 16:06:37 -04:00
Jaydeep Patel 9c90bc43a5 SWDEV-475938 - Update dynamic stack in submit kernel internal.
Change-Id: I816bf9cfe8aaac5486ff3b719dbdc4f4d6134e01
2024-09-11 00:59:45 -04:00
kjayapra-amd 6211037f63 SWDEV-439234 - Access check before memcpy and kernel operations.
Change-Id: I7057125c03460db205409e19980145298c190fe2
2024-09-06 14:30:00 -04:00
kjayapra-amd 2a9cb89228 SWDEV-478099 - Fix multiple mapping case on PAL/Windows backend.
Change-Id: Id1fe7939fbf90649cda1848890b3b4ca9a1fcd00
2024-08-27 11:19:39 -04:00
Rahul Manocha 432bdd7bf2 SWDEV-474617 SWDEV-464679 - Fix segfault in palvirtual due to peer memory access
Change-Id: Ib8b641712d78acf8bc073ca5705dea97af6f944a
2024-08-21 11:34:15 -04:00
Ioannis Assiouras 775dc204aa SWDEV-463865 - changed device,roc and pal namespaces to be nested under amd
Change-Id: Icad342843c039c634e249a13a7aa31400730b1dd
2024-06-07 12:23:06 -04:00
Ioannis Assiouras b8c2ac4de4 SWDEV-463865 - symbol renamings to prevent conflicts in static build
Change-Id: Id7fbb638c1088c23df52fee877cd790d637b1ffb
2024-06-06 04:05:55 -04:00
Payam Ghafari f268b48a2d SWDEV-447691 - added error reporting on semaphore
Change-Id: Id903806d122c0594d6549d5e8b7201512eff9850
2024-05-28 06:31:10 +00:00
German Andryeyev 5c1804aa14 SWDEV-353281 - Correct VA unmap
Make sure graph mempool unmaps VA on release

Change-Id: Id3f1bd8d0115b533ae60aa5ba3676b8bf7e5b961
2024-04-26 09:37:01 -04:00
German Andryeyev 7448113cfc SWDEV-440746 - Remove obsolete code
The "optimized" version of memcpy is outdated and
was used in win32 only.

Change-Id: I7f2e0e9051e37cec95438266824b5b0025c324c6
2024-04-22 09:56:42 -04:00
Ioannis Assiouras bf74ef4025 SWDEV-451594 - Implement Readback and Avoid HDP Flush workaround for device kernel args
Change-Id: I6d41a089a17f55306e7ff402588a1e831b20a7a7
2024-04-19 09:29:20 -04:00
German Andryeyev 62559a6e5a SWDEV-440746 - Fix the hostcall buffer creation
Avoid a deadlock on the host call buffer creation. Since the buffer is
allocated in the queue thread, use direct device memory allocation,
skipping the global context lock.

Change-Id: I09b55ee03bb42ab5d320c152b52a8c842c5fdcc1
2024-04-17 12:37:23 -04:00
Saleel Kudchadker c157bfb202 SWDEV-301667 - Create TS for each node recorded in graph
- Create a vector to allow multiple TS to be stored in Command.
- This means we don't wait for the entire batch in the Accumulate command
to finish when we exhaust signals.
- Reduce the number of signals created at init to 64. This min value
may still need to be tuned but the KFD allows max of 4094 interrupt
signals per device.
- Store kernel names whenever they are available and not just when
profiling. If we dynamically enable profiling like for Torch, a crash
can happen if hipGraphInstantiate wasn't included in the Torch profile scope
because we previously entered kernel names only when the profiler is
attached.

Change-Id: I34e7881a25bbc763f82fdeb3408a8ea58e1ec006
2024-03-26 14:47:24 -04:00
German Andryeyev 1239309c90 SWDEV-449558 - Update barrier's logic
PAL optimized the logic for the barriers, which caused failures with CP DMA on Navi4x.
Change barrier's code to match the most recent PAL optimizations.

Change-Id: I55eeab20f51eb8e920bcbb4b55fbe3c7f77fd3fa
2024-03-18 10:52:32 -04:00
kjayapra-amd f5ca620baa SWDEV-423835 - Fixing kernel launch issues on Virtual Memory Management path.
Change-Id: I9f5e8a3d83af3809b2c50b21a10697e26113dd23
2024-03-12 17:22:07 -04:00
Ajay e643406caa SWDEV-347670 - StreamWait and StreamWrite on Windows
__amd_streamOpsWrite blitkernel in device-libs has only 3 args.
so getting rid of the 4th unused arg (sizeBytes)

Change-Id: I81cc1107f8b424bf58558c93a2495a1b878aef91
2024-02-26 22:45:10 -05:00
German 13c6f56ca9 SWDEV-440746 - Enable hipExtAnyOrderLaunch extension for PAL
The extension allows kernels to execute without a wait barrier and L1
invalidation.

Change-Id: I96c485204303f54a0240b93134f4560673e4bd17
2024-01-16 15:20:39 -05:00
German dec1158d04 SWDEV-438532 - Enable wave limit for HSAIL
Luxmark still uses HSAIL path and one subtest can benefit from the wave limit.

Change-Id: I16c94e09cd6e2afd6341cb76bf2e9ab7b7713214
2024-01-09 17:00:50 -05:00
German 7d661bc7df SWDEV-404889 - Enable debugger interface in PAL
Add GPU_DEBUG_ENABLE to control ttpm behavior. If enabled,
then HW will collect more debug info at some perf cost

Change-Id: Icee0686b903a7b1bd483710b9d611877cd43c6aa
2024-01-02 11:51:42 -05:00
German cfc07c88ee SWDEV-436796 - Enable device memory for kernel arguments
Extra CPU read back will be performed before every submission to make sure
previous writes over PCIE reached GPU. HDP flush is done by CP.

Change-Id: I402d28ca26c8cee4a3920feb3599af8c285d0889
2023-12-15 13:11:50 -05:00
German Andryeyev f1dc81f427 SWDEV-432174 - Change the fillBuffer kernel
- Add the new fillBuffer kernel, which allows launching a limited
number of workgroups for the memory fill operation
- Switch fill memory to 16 bytes write by default
- Allow to limit the workgroups with DEBUG_CLR_LIMIT_BLIT_WG

Change-Id: Ibad1822f2d42b2fc71bcfc1917c31409c0623e8e
2023-11-16 14:25:55 -04:00
German 5f297d75d9 SWDEV-430256 - Expose HIP_FORCE_DEV_KERNARG under PAL
Add support of HIP_FORCE_DEV_KERNARG under PAL.
Fix persistent memory detection for a resource view.

Change-Id: Ifb7db2db14e0c2205a9661cfa53887ec61ab26a4
2023-11-08 10:01:22 -05:00
Saleel Kudchadker 40f41f4d0b SWDEV-422207 - Track commands for capture
- Track all captured commands under a new AccumulateCommand
- Add begin() and end() methods to capture commands
- Explicit TS object now passed to certain methods because
profilingBegin() and profilingEnd() now happen separately and thus can
run into threading issues

Change-Id: I171106bdcad72b057836cb2f3fc398db3533119f
2023-11-03 05:09:04 +00:00
German 7be3a5e33e SWDEV-407533 - [ABI Break]Remove Wavelimiter
Change-Id: I6a2f6fb5a0c3acea93fa0200a69679783e76f5bd
2023-09-07 09:58:41 -04:00
German bd00826446 SWDEV-3 - Move PAL to version 818
Restore PAL platform destruction.
Update CmdAllocatorCreateInfo::AllocInfo for the new interface.

Change-Id: Iea418eed7ee26166039a4a9cc1999438856e9097
2023-08-29 12:46:28 -04:00
German 077311153a SWDEV-407533 - [ABI Break]Purge unused env vars
Change-Id: I627950e8ebb6299affc602754a20d442dbe42b14
2023-08-24 14:11:40 -04:00
Tao Sang d433df4761 SWDEV-417727 - Fix hipSignalExternalSemaphoresAsync()
This reverts commit 44a3935cda.

Implement the right way to make ExternalSemaphores be signalled
only after prior work on the stream has finished.

Change-Id: I9d5974e05d5f229170b928db4566c14e40e3cbaa
2023-08-23 22:31:27 -04:00
German d97cc0abbd SWDEV-404889 - Initial change for debugger support
- Program unique AQL index for debugger. The logic manages AQL array of packets per HW queue.
- Provide debug state to PAL

Change-Id: I38fa1f5435fa711fd1d44dc391f2e61eb2a25efa
2023-08-23 13:21:58 -04:00
Jaydeep Patel ff1a999f66 SWDEV-408283 - Sync scratchRegs_, privateMemSize_ and workitemPrivateSegmentSize.
Change-Id: I623a7140810ff9867f8816bf4c8621a1fe921744
2023-07-27 00:31:54 -04:00
German 1a0c3e4dc4 SWDEV-311270 - Add IPC support for memory pools
Initial implementation for hipMemPoolExportToShareableHandle,
hipMemPoolImportFromShareableHandle,
hipMemPoolExportPointer and hipMemPoolImportPointer

Change-Id: I0ebdc48e9163b394ded560adca6c38bbc5aee7d1
2023-06-15 11:36:52 -04:00
Jaydeep Patel 0eb96cbc59 SWDEV-397168 - Enable dynamic call stack size for PAL.
Change-Id: I8be51ffb48e6a742117491a4bf6f12f152e4a0b3
2023-05-07 23:26:28 -04:00
German 04b696abee SWDEV-353281 - VM support in mempool for graphs
The change enables VM support in graphs on Windows. This avoids
caching all allocations at the cost of map/unmap
overhead during memory create/destroy.

Change-Id: I792be00fba099e5e5d3cd44a963e1dfd6976a86d
2023-05-05 15:31:26 -04:00
German 7ef2da5aba SWDEV-353281 - Move VirtualMem map update to memobj
- The implementation in mempool graphs requires refcounting the VA object.
That requires release() to update the map only on the actual destruction.
- Add GPU event tracking for paging operations. Otherwise, the runtime
may not always flush the IB.

Change-Id: Idf99ffb894321a38e04b490116a7ca435635918d
2023-04-28 17:22:11 -04:00
Ajay 88736010fb SWDEV-381627 - adding cl interop files to vdi
Change-Id: Ic40363587a2bc56f977a148eba386dfb73d6286e
2023-04-05 07:48:49 +00:00
Maneesh Gupta 5dc104b3ea SWDEV-368235 - Revert "Remove obsolete env variables"
This reverts commit 7b50c935f8.

Reason for revert: Deferred to a future release.

Change-Id: Ia66c37f0ab9734dee73c930d10d7469d5fd57254
2023-02-15 07:25:00 +00:00
German 7b50c935f8 SWDEV-368235 - Remove obsolete env variables
Change-Id: I7e14d53297e79e2f68b3a6cc40251ad7db9eb5ab
2023-02-03 13:44:24 -05:00
German ad33a021cb SWDEV-352197 - Destroy virtual device in thread destructor
Windows kills threads on exit without any notification. However,
runtime can still destroy VirtualGPU object from the host thread with
HostQueue destruction.
This change also forces RGP trace transfer on the last capture without
any delays.

Change-Id: I768e87e99e1d23a021e63c12f36e450817743759
2023-01-31 10:53:48 -05:00
German 53a10c9039 SWDEV-377991 - Remove liquidflash support
Change-Id: Iba6455e5c0210c3223a06fec332404cd9f489154
2023-01-20 09:57:06 -05:00
Xie,AlexBin 0703b8380b SWDEV-365305 - Same time is observed for CU Med-Priority tests
OCL runtime uses WGP mode and total CU count reported in WGP.
Realtime values are still in CUs. That can produce misleading test results.
Report realtime in WGP values and convert to CUs for KMD.

Change-Id: I90b82615640734dd655be2b613ccac3cb8483239
2023-01-19 11:36:34 -05:00
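
The unit conversion described in 0703b8380b above can be sketched as follows. This is a minimal illustration, not the code from the commit: the names `kCusPerWgp`, `wgpToCu` and `cuToWgp` are hypothetical, and the sketch assumes a WGP groups two CUs, as on RDNA hardware.

```cpp
#include <cstdint>

// Assumption: one WGP (workgroup processor) contains two CUs.
constexpr std::uint32_t kCusPerWgp = 2;

// Hypothetical helper: convert a count reported in WGPs into CUs,
// the unit the commit message says the KMD expects.
constexpr std::uint32_t wgpToCu(std::uint32_t wgps) {
  return wgps * kCusPerWgp;
}

// Hypothetical helper: convert a CU count into whole WGPs (rounds down),
// so realtime values can be reported in the same unit as the total count.
constexpr std::uint32_t cuToWgp(std::uint32_t cus) {
  return cus / kCusPerWgp;
}
```

Keeping both the total and the realtime counts in one unit at the API surface, and converting only at the driver boundary, avoids comparing a WGP total against a CU reservation.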
German c8927cd84e SWDEV-377991 - Remove Liquidflash extension
Initial check-in to untie dependencies with HIP and OCL repos

Change-Id: I363b63954c3f118f40a6ed893545d6a4ac44144c
2023-01-18 13:16:20 -05:00
Jaydeep Patel 9076d9a518 SWDEV-366087 - Pass pitch and slice pitch to blit kernel from rect struct.
Change-Id: I1ffe54929db59a40e2a1ae19c125f8d8e81b07ec
2022-12-20 16:43:49 +00:00
Todd tiantuo Li 9168415ca2 SWDEV-354868 - Queue::Create() for RT queue should fail when the number of CUs reserved for the RT queue is zero (most likely due to being aligned down by dedicatedCuGranularity).
Change-Id: I234e7ff83cb312bf44f5ad4b1a897c079f5106a9
2022-12-09 16:45:10 -05:00
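
The failure condition described in 9168415ca2 above can be sketched as follows. This is only an illustration under stated assumptions: `alignDown` and `canCreateRtQueue` are hypothetical names, and only `dedicatedCuGranularity` comes from the commit message. The actual Queue::Create() logic is not shown in this log.

```cpp
#include <cstdint>

// Hypothetical helper: round `value` down to a multiple of `granularity`.
// A granularity of zero is treated as "no alignment required".
constexpr std::uint32_t alignDown(std::uint32_t value,
                                  std::uint32_t granularity) {
  return granularity == 0 ? value : (value / granularity) * granularity;
}

// Hypothetical check: the RT queue may be created only if at least one CU
// remains reserved after aligning the request down to the granularity.
constexpr bool canCreateRtQueue(std::uint32_t requestedCus,
                                std::uint32_t dedicatedCuGranularity) {
  return alignDown(requestedCus, dedicatedCuGranularity) != 0;
}
```

For example, a request for 3 CUs with a granularity of 4 aligns down to 0, so creation would be rejected, while a request for 8 CUs keeps all 8 and would be accepted.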
Ioannis Assiouras 72b45e2a1f SWDEV-369581 - Convey copy API metadata to ROCclr
Change-Id: I569462d6d268700d419510255e201bf7d80d6714
2022-12-09 00:27:15 -05:00
German e5a36ab1ad SWDEV-368308 - Remove HW debug extension
Change-Id: If0c68023c09f0dac9111d52ecc0ad63719aa4e70
2022-11-18 10:29:44 -05:00
German ff6b4db70b SWDEV-363074 - Clean-up sync between SDMA and compute
HIP can't rely on the resource tracking used in OCL and requires different, explicit sync.
Make sure ROCCLR syncs compute only when SDMA is used and vice versa.
The new logic will allow enabling CPDMA without unnecessary waits.

Change-Id: Ib9d1788cfd5afa5ea2fec4c96a37d8b9c4d0059d
2022-10-31 10:02:01 -04:00
Ajay 373a7d1195 SWDEV-347670 - GPU StreamWait and StreamWrite support in Windows PAL backend
Change-Id: Ic4881305b6332e217f3d3127dce7e9d9d0a7df11
2022-09-15 13:57:40 -04:00
Rakesh Roy f097cda948 SWDEV-353941 - Fix hipMemset latency issue for hipMallocManaged
- In case of HMM, use blit kernel instead of CPU memcpy for hipMemset

Change-Id: I89bfc96ff01a2375ed8df1b1c6bc05357dea84f7
2022-09-07 03:20:58 -04:00
Christophe Paquot 905088e4e7 SWDEV-322620 - Virtual Memory Management
Introduce a VirtualMemObj map, as it is needed to differentiate
between virtual address ranges and actual physical memory.
This is because a whole VA range can be backed by several physical memory
chunks.

Change-Id: Ie2a972b4faf3f7d552cfa53e77898f80ad75740a
2022-06-06 11:32:22 -07:00
Christophe Paquot 67657d6099 SWDEV-322620 - Virtual Memory Management
Implement map/unmap for PAL backend
Create commands since PAL uses the IQueue to map/unmap

Change-Id: I97e26a7d28ae5e10774c9ca65307153100945621
2022-04-22 18:09:26 -04:00