Commit graph

97 Commits

Author SHA1 Message Date
Pengda Xie a4bbd73dc6 SWDEV-556684 - Remove HSAIL support (#1183) 2025-10-23 11:21:49 -07:00
systems-assistant[bot] 1ae36dd856 SWDEV-538181 - Fix 1D buffered image copy (#441)
* SWDEV-538181 - Fix 1D buffered image copy

Fix incorrect logic for copying to/from 1D buffered images
in the PAL path.

---------

Authored-by: taosang2 <tao.sang@amd.com>
2025-10-09 09:47:11 -04:00
Ioannis Assiouras 35629e433d SWDEV-546146 - Added support for hipMemLocationTypeHost in hipMemSetAccess (#682) 2025-09-10 23:06:20 +01:00
Danylo Lytovchenko 2ff2316227 Adjust clang format to the new versions, revert broken macro layout (#714) 2025-08-22 17:23:22 +02:00
Danylo Lytovchenko f7338717ae SWDEV-470698 - fix formatting, add format check workflow (#657) 2025-08-20 19:58:06 +05:30
GunaShekar, Ajay bfcf0ef4e8 SWDEV-543366 - Bump PAL_CLIENT_INTERFACE_MAJOR_VERSION 916 --> 932 (#725)
Co-authored-by: Lin, Qun <Quentin.Lin@amd.com>
Co-authored-by: Lin,Qun <qlin@amd.com>

[ROCm/clr commit: ed903e8889]
2025-08-08 08:45:42 -07:00
Andryeyev, German b9669ea266 SWDEV-531678 - Remove split path from the dispatch (#283)
The split path for blit kernels is no longer necessary, since the new blit kernels
don't use the copy size as the global workload.

[ROCm/clr commit: da198ac5b2]
2025-05-12 12:50:32 -04:00
GunaShekar, Ajay c4567a9188 SWDEV-523028 - print PAL failure return values in logs (#81)
* print PAL failure return values in logs
* dump kernel info in case of PAL failure

[ROCm/clr commit: 99ef573399]
2025-04-29 11:23:43 +05:30
Patel, Jaydeepkumar 2f3bc7f01c SWDEV-521011 - Allow max stack size as per ISA. (#73)
[ROCm/clr commit: 9e7248aa36]
2025-04-08 10:15:38 +05:30
Pengda Xie 021ca96766 SWDEV-497619 - Ensure suballocSize is integer multiple of 4096
Change-Id: Iefc452d73566f58cfb63391a68c836f30d77dd6c


[ROCm/clr commit: b02b1858c0]
2025-03-07 15:36:57 -05:00
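
The commit above requires suballocSize to be an integer multiple of 4096. A minimal sketch of one way to round a size up to that granularity follows; the helper name `alignUpTo4K` and the constant `kSuballocGranularity` are hypothetical and not taken from the ROCclr source.

```cpp
#include <cstddef>

// Hypothetical constant for illustration: the 4096-byte granularity named in
// the commit subject.
constexpr std::size_t kSuballocGranularity = 4096;

// Hypothetical helper, not part of ROCclr: rounds `size` up to the next
// multiple of 4096. Assumes size + 4095 does not overflow std::size_t.
constexpr std::size_t alignUpTo4K(std::size_t size) {
  return (size + kSuballocGranularity - 1) / kSuballocGranularity *
         kSuballocGranularity;
}
```

A size that is already a multiple of 4096 is returned unchanged, so applying the helper twice gives the same result as applying it once.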
German Andryeyev 6f2a603277 SWDEV-497619 - Allocate extra space in CB
Compute doesn't support IB chaining, but RGP may collect
perf counters, which require more space in CB.
Increase CB size if RGP is enabled.

Change-Id: Iaa0a620ead8541a679b0dfe5e5711af5afdba545


[ROCm/clr commit: 63cf3057ba]
2025-02-20 10:40:09 -05:00
Aidan Belton-Schure 4b4a35b86b SWDEV-508279 - Improve HIP event profiling
There are 2 functional changes in this patch:
* Use GPU timing for internal markers for HIP.
* Measure CPU time closer to GPU timer, to reduce delta between GPU/CPU timestamp measurements.

There are some smaller non-functional updates:
* waifForFence -> waitForFence typo
* Remove unused drmProfiling

Change-Id: I4c5fa600a842ab60e454888779edcac8449a902a


[ROCm/clr commit: 179801a750]
2025-02-13 04:15:40 -05:00
Todd tiantuo Li 170e45b879 SWDEV-472357 - support Rect copy with staging buffer for 2D & 3D memcpy in PAL
Change-Id: Ie32f3e5a6fa077f6b2db20fc1ab1e2e0da8344cb


[ROCm/clr commit: 41dc4545fc]
2024-10-10 18:00:19 -04:00
Daniel Livingston 7c0ff614a2 SWDEV-77148 - Add UberTrace support to PAL device
This PR adds UberTrace-based tracing support to ROCclr's PAL device class.
Legacy RGP-based tracing is still available and is the default.
If UberTrace support is enabled tool-side, this new code path will activate.

Change-Id: I268b2dcef70e850a50e2caef8355f38bf51d4641


[ROCm/clr commit: e550032d25]
2024-09-17 16:06:37 -04:00
Jaydeep Patel 7fa7a7cae5 SWDEV-475938 - Update dynamic stack in submit kernel internal.
Change-Id: I816bf9cfe8aaac5486ff3b719dbdc4f4d6134e01


[ROCm/clr commit: 9c90bc43a5]
2024-09-11 00:59:45 -04:00
kjayapra-amd eecbcddaf3 SWDEV-439234 - Access check before memcpy and kernel operations.
Change-Id: I7057125c03460db205409e19980145298c190fe2


[ROCm/clr commit: 6211037f63]
2024-09-06 14:30:00 -04:00
kjayapra-amd afd72f9ad0 SWDEV-478099 - Fix multiple mapping case on PAL/Windows backend.
Change-Id: Id1fe7939fbf90649cda1848890b3b4ca9a1fcd00


[ROCm/clr commit: 2a9cb89228]
2024-08-27 11:19:39 -04:00
Rahul Manocha 1b14058283 SWDEV-474617 SWDEV-464679 - Fix segfault in palvirtual due to peer memory access
Change-Id: Ib8b641712d78acf8bc073ca5705dea97af6f944a


[ROCm/clr commit: 432bdd7bf2]
2024-08-21 11:34:15 -04:00
Ioannis Assiouras 407d1346f2 SWDEV-463865 - changed device,roc and pal namespaces to be nested under amd
Change-Id: Icad342843c039c634e249a13a7aa31400730b1dd


[ROCm/clr commit: 775dc204aa]
2024-06-07 12:23:06 -04:00
Ioannis Assiouras 0e023d1a0a SWDEV-463865 - symbol renamings to prevent conflicts in static build
Change-Id: Id7fbb638c1088c23df52fee877cd790d637b1ffb


[ROCm/clr commit: b8c2ac4de4]
2024-06-06 04:05:55 -04:00
Payam Ghafari f8d4cca28b SWDEV-447691 - added error reporting on semaphore
Change-Id: Id903806d122c0594d6549d5e8b7201512eff9850


[ROCm/clr commit: f268b48a2d]
2024-05-28 06:31:10 +00:00
German Andryeyev b1c0f73229 SWDEV-353281 - Correct VA unmap
Make sure graph mempool unmaps VA on release

Change-Id: Id3f1bd8d0115b533ae60aa5ba3676b8bf7e5b961


[ROCm/clr commit: 5c1804aa14]
2024-04-26 09:37:01 -04:00
German Andryeyev 74d80fb509 SWDEV-440746 - Remove obsolete code
The "optimized" version of memcpy is outdated and
was used in win32 only.

Change-Id: I7f2e0e9051e37cec95438266824b5b0025c324c6


[ROCm/clr commit: 7448113cfc]
2024-04-22 09:56:42 -04:00
Ioannis Assiouras 2f430138c5 SWDEV-451594 - Implement Readback and Avoid HDP Flush workaround for device kernel args
Change-Id: I6d41a089a17f55306e7ff402588a1e831b20a7a7


[ROCm/clr commit: bf74ef4025]
2024-04-19 09:29:20 -04:00
German Andryeyev 562f3ef098 SWDEV-440746 - Fix the hostcall buffer creation
Avoid a deadlock on the host call buffer creation. Since the buffer is
allocated in the queue thread, use direct device memory allocation,
skipping the global context lock.

Change-Id: I09b55ee03bb42ab5d320c152b52a8c842c5fdcc1


[ROCm/clr commit: 62559a6e5a]
2024-04-17 12:37:23 -04:00
Saleel Kudchadker f3aedfbec0 SWDEV-301667 - Create TS for each node recorded in graph
- Create a vector to allow multiple TS to be stored in Command.
- This would mean we don't wait for entire batch in Accumulate command
to finish when we exhaust signals.
- Reduce the number of signals created at init to 64. This min value
may still need to be tuned but the KFD allows max of 4094 interrupt
signals per device.
- Store kernel names whenever they are available and not just when
profiling. If we dynamically enable profiling, as for Torch, a crash
can happen if hipGraphInstantiate wasn't included in the Torch profile scope,
because we previously entered kernel names only when a profiler was
attached.

Change-Id: I34e7881a25bbc763f82fdeb3408a8ea58e1ec006


[ROCm/clr commit: c157bfb202]
2024-03-26 14:47:24 -04:00
German Andryeyev eb355d0159 SWDEV-449558 - Update barrier's logic
PAL optimized the logic for the barriers, which caused failures with CP DMA on Navi4x.
Change barrier's code to match the most recent PAL optimizations.

Change-Id: I55eeab20f51eb8e920bcbb4b55fbe3c7f77fd3fa


[ROCm/clr commit: 1239309c90]
2024-03-18 10:52:32 -04:00
kjayapra-amd 8947420e41 SWDEV-423835 - Fixing kernel launch issues on Virtual Memory Management path.
Change-Id: I9f5e8a3d83af3809b2c50b21a10697e26113dd23


[ROCm/clr commit: f5ca620baa]
2024-03-12 17:22:07 -04:00
Ajay e8a077dc68 SWDEV-347670 - StreamWait and StreamWrite on Windows
__amd_streamOpsWrite blitkernel in device-libs has only 3 args.
so getting rid of the 4th unused arg (sizeBytes)

Change-Id: I81cc1107f8b424bf58558c93a2495a1b878aef91


[ROCm/clr commit: e643406caa]
2024-02-26 22:45:10 -05:00
German 37ed51c99c SWDEV-440746 - Enable hipExtAnyOrderLaunch extension for PAL
The extension allows kernels to execute without a wait barrier and L1
invalidation.

Change-Id: I96c485204303f54a0240b93134f4560673e4bd17


[ROCm/clr commit: 13c6f56ca9]
2024-01-16 15:20:39 -05:00
German 7461a5b46f SWDEV-438532 - Enable wave limit for HSAIL
Luxmark still uses HSAIL path and one subtest can benefit from the wave limit.

Change-Id: I16c94e09cd6e2afd6341cb76bf2e9ab7b7713214


[ROCm/clr commit: dec1158d04]
2024-01-09 17:00:50 -05:00
German 4750a76899 SWDEV-404889 - Enable debugger interface in PAL
Add GPU_DEBUG_ENABLE to control ttpm behavior. If enabled,
then HW will collect more debug info at some perf cost

Change-Id: Icee0686b903a7b1bd483710b9d611877cd43c6aa


[ROCm/clr commit: 7d661bc7df]
2024-01-02 11:51:42 -05:00
German cba839f38d SWDEV-436796 - Enable device memory for kernel arguments
Extra CPU read back will be performed before every submission to make sure
previous writes over PCIE reached GPU. HDP flush is done by CP.

Change-Id: I402d28ca26c8cee4a3920feb3599af8c285d0889


[ROCm/clr commit: cfc07c88ee]
2023-12-15 13:11:50 -05:00
German Andryeyev e390ec044f SWDEV-432174 - Change the fillBuffer kernel
- Add the new fillBuffer kernel, which allows launching a limited
number of workgroups for memory fill operations
- Switch fill memory to 16-byte writes by default
- Allow to limit the workgroups with DEBUG_CLR_LIMIT_BLIT_WG

Change-Id: Ibad1822f2d42b2fc71bcfc1917c31409c0623e8e


[ROCm/clr commit: f1dc81f427]
2023-11-16 14:25:55 -04:00
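
The commit above describes a fill that runs with a limited number of workgroups and writes 16 bytes at a time. The sketch below is a CPU-side model of that idea only, not the actual blit kernel: the names `Chunk16` and `fillLimited` are hypothetical, and the strided loop is an assumed way for a capped number of work-items to cover the whole buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 16-byte element, standing in for one 16-byte write.
struct Chunk16 {
  std::uint32_t w[4];
};

// Hypothetical CPU model, not ROCclr code: `maxItems` simulated work-items
// each write every maxItems-th chunk, so the whole buffer is filled even
// when the item count is smaller than the number of chunks.
void fillLimited(std::vector<Chunk16>& buf, Chunk16 pattern,
                 std::size_t maxItems) {
  if (maxItems == 0) return;  // nothing can be launched with zero items
  for (std::size_t item = 0; item < maxItems; ++item) {
    for (std::size_t i = item; i < buf.size(); i += maxItems) {
      buf[i] = pattern;
    }
  }
}
```

On a GPU the outer loop would correspond to the parallel work-items, with only the inner strided loop running in each one.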
German 32e02383ba SWDEV-430256 - Expose HIP_FORCE_DEV_KERNARG under PAL
Add support for HIP_FORCE_DEV_KERNARG under PAL.
Fix persistent memory detection for a resource view.

Change-Id: Ifb7db2db14e0c2205a9661cfa53887ec61ab26a4


[ROCm/clr commit: 5f297d75d9]
2023-11-08 10:01:22 -05:00
Saleel Kudchadker 5f009b7cb1 SWDEV-422207 - Track commands for capture
- Track all captured commands under a new AccumulateCommand
- Add begin() and end() methods to capture commands
- Explicit TS object now passed to certain methods because
profilingBegin() and profilingEnd() now happen separately and thus can
run into threading issues

Change-Id: I171106bdcad72b057836cb2f3fc398db3533119f


[ROCm/clr commit: 40f41f4d0b]
2023-11-03 05:09:04 +00:00
German 5d9912f48b SWDEV-407533 - [ABI Break]Remove Wavelimiter
Change-Id: I6a2f6fb5a0c3acea93fa0200a69679783e76f5bd


[ROCm/clr commit: 7be3a5e33e]
2023-09-07 09:58:41 -04:00
German db1e03f276 SWDEV-3 - Move PAL to version 818
Restore PAL platform destruction.
Update CmdAllocatorCreateInfo::AllocInfo for the new interface.

Change-Id: Iea418eed7ee26166039a4a9cc1999438856e9097


[ROCm/clr commit: bd00826446]
2023-08-29 12:46:28 -04:00
German 3f4bbcfdba SWDEV-407533 - [ABI Break]Purge unused env vars
Change-Id: I627950e8ebb6299affc602754a20d442dbe42b14


[ROCm/clr commit: 077311153a]
2023-08-24 14:11:40 -04:00
Tao Sang 3fdd346cf2 SWDEV-417727 - Fix hipSignalExternalSemaphoresAsync()
This reverts commit cab71e6e00.

Implement the correct way to signal ExternalSemaphores
only after prior work on the stream has finished.

Change-Id: I9d5974e05d5f229170b928db4566c14e40e3cbaa


[ROCm/clr commit: d433df4761]
2023-08-23 22:31:27 -04:00
German 85d075fa82 SWDEV-404889 - Initial change for debugger support
- Program unique AQL index for debugger. The logic manages AQL array of packets per HW queue.
- Provide debug state to PAL

Change-Id: I38fa1f5435fa711fd1d44dc391f2e61eb2a25efa


[ROCm/clr commit: d97cc0abbd]
2023-08-23 13:21:58 -04:00
Jaydeep Patel 1d785911ed SWDEV-408283 - Sync scratchRegs_, privateMemSize_ and workitemPrivateSegmentSize.
Change-Id: I623a7140810ff9867f8816bf4c8621a1fe921744


[ROCm/clr commit: ff1a999f66]
2023-07-27 00:31:54 -04:00
German af5944dc71 SWDEV-311270 - Add IPC support for memory pools
Initial implementation for hipMemPoolExportToShareableHandle,
hipMemPoolImportFromShareableHandle,
hipMemPoolExportPointer and hipMemPoolImportPointer

Change-Id: I0ebdc48e9163b394ded560adca6c38bbc5aee7d1


[ROCm/clr commit: 1a0c3e4dc4]
2023-06-15 11:36:52 -04:00
Jaydeep Patel 34f9de0f7e SWDEV-397168 - Enable dynamic call stack size for PAL.
Change-Id: I8be51ffb48e6a742117491a4bf6f12f152e4a0b3


[ROCm/clr commit: 0eb96cbc59]
2023-05-07 23:26:28 -04:00
German 8d97827417 SWDEV-353281 - VM support in mempool for graphs
The change enables VM support in graphs on Windows. That avoids
caching all allocations at the cost of map/unmap
overhead during memory create/destroy.

Change-Id: I792be00fba099e5e5d3cd44a963e1dfd6976a86d


[ROCm/clr commit: 04b696abee]
2023-05-05 15:31:26 -04:00
German 5af1af9c57 SWDEV-353281 - Move VirtualMem map update to memobj
- The implementation in mempool graphs requires refcounting the VA object.
That requires release() to update the map only on the actual destruction.
- Add GPU event tracking for paging operations. Otherwise, the runtime
may not always flush the IB.

Change-Id: Idf99ffb894321a38e04b490116a7ca435635918d


[ROCm/clr commit: 7ef2da5aba]
2023-04-28 17:22:11 -04:00
Ajay db10db5d99 SWDEV-381627 - adding cl interop files to vdi
Change-Id: Ic40363587a2bc56f977a148eba386dfb73d6286e


[ROCm/clr commit: 88736010fb]
2023-04-05 07:48:49 +00:00
Maneesh Gupta d7fdd9fcb8 SWDEV-368235 - Revert "Remove obsolete env variables"
This reverts commit dfa7790030.

Reason for revert: Deferred to a future release.

Change-Id: Ia66c37f0ab9734dee73c930d10d7469d5fd57254


[ROCm/clr commit: 5dc104b3ea]
2023-02-15 07:25:00 +00:00
German dfa7790030 SWDEV-368235 - Remove obsolete env variables
Change-Id: I7e14d53297e79e2f68b3a6cc40251ad7db9eb5ab


[ROCm/clr commit: 7b50c935f8]
2023-02-03 13:44:24 -05:00
German f857dcc48d SWDEV-352197 - Destroy virtual device in thread destructor
Windows kills threads on exit without any notification. However,
the runtime can still destroy the VirtualGPU object from the host thread
during HostQueue destruction.
This change also forces RGP trace transfer on the last capture without
any delays.

Change-Id: I768e87e99e1d23a021e63c12f36e450817743759


[ROCm/clr commit: ad33a021cb]
2023-01-31 10:53:48 -05:00