69 Commits

Author SHA1 Message Date
SaleelK 340f3aa887 clr: Implement dynamic stream to HWq logic (#1958)
* clr: Implement dynamic stream to HW queue assignment

This change implements dynamic stream to hardware queue (HWq) mapping
with the following features:

* Queue depth heuristics with weights for optimal HWq assignment
* Make last used queue sticky for better locality
* Use HWq-to-pipe mapping: gfx9 follows a round-robin queue-to-pipe
  mapping based on creation order (single process per device only,
  as the pipe ID is statically assigned by the runtime)
* More aggressive heuristic usage for better queue distribution
* Extend dynamic queues support for all stream priorities

Environment variables:
* DEBUG_HIP_DYNAMIC_QUEUE: 0 - disabled, 1 - depth heuristics,
  2 - depth+pipe heuristics
* DEBUG_HIP_IGNORE_STREAM_PRIORITY=1: ignore the priority requested at
  stream creation

* clr: Clean up last_used_queue_
2026-01-23 10:40:54 -08:00
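The #1958 commit describes its queue selection only in outline. A minimal Python sketch of that selection logic might look like the following. Several parts are hypothetical: the weights, the `HwQueue` type, the depth metric, and the `pick_queue`/`pipe_of` helpers. Only the following come from the commit message:

* the three `DEBUG_HIP_DYNAMIC_QUEUE` modes;
* the sticky last-used queue;
* the round-robin queue-to-pipe mapping by creation order.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical weights: the commit does not publish the real values.
DEPTH_WEIGHT = 1.0
PIPE_WEIGHT = 0.5
STICKY_BONUS = 0.25


@dataclass(eq=False)
class HwQueue:
    index: int      # creation order of the hardware queue
    depth: int = 0  # hypothetical load metric, e.g. outstanding packets


def pipe_of(queue: HwQueue, num_pipes: int) -> int:
    """Round-robin queue-to-pipe mapping by creation order (gfx9, per the commit)."""
    return queue.index % num_pipes


def pick_queue(queues: List[HwQueue], mode: int,
               last_used: Optional[HwQueue] = None,
               num_pipes: int = 4) -> Optional[HwQueue]:
    """Pick the queue with the lowest weighted score.

    mode mirrors DEBUG_HIP_DYNAMIC_QUEUE: 0 = disabled (the caller keeps its
    static assignment, signalled here by None), 1 = depth, 2 = depth + pipe.
    """
    if mode == 0:
        return None
    # Sum the depth of all queues that share each pipe.
    pipe_load = {}
    for q in queues:
        p = pipe_of(q, num_pipes)
        pipe_load[p] = pipe_load.get(p, 0) + q.depth

    def score(q: HwQueue) -> float:
        s = DEPTH_WEIGHT * q.depth
        if mode == 2:
            s += PIPE_WEIGHT * pipe_load[pipe_of(q, num_pipes)]
        if q is last_used:
            s -= STICKY_BONUS  # favor the last used queue for locality
        return s

    # min() returns the first queue on ties, i.e. the earliest created.
    return min(queues, key=score)
```

In this sketch, the sticky bonus only wins when the last used queue is nearly as idle as the best alternative. That is one plausible reading of "sticky for better locality".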
Tao Sang 163e44d0a8 SWDEV-555889 - Support mipmap on rocr (#2082)
* SWDEV-555889 - Support mipmap on rocr

Support mipmap in hip-rt on rocr backend.
Enable all mipmap tests in Windows.
Some other minor improvements.

Add some SRD logs that will eventually be removed.

* Add sampler.mipFilter to fix sampler issues with mipmaps in rocr.
Fix format issues with views of leveled images and mipmap images in the blit kernel in rocr.
Enable the previously disabled mipmap tests.

* Rewrite view logic

* Set word4.f.PITCH = 0 for mipmap SRD on navi31 to fix unstable test issues.
Reset the last error in negative tests.

* Remove SRD dump log from hip-rt
Make the rocr mipmap log conditional.

* minor format change

* Exclude mipmap tests for mi200+ which don't support mipmap.
2026-01-21 09:10:29 -08:00
Jin Jung deaf8ab38a SWDEV-567119 - Windows GL Interop Support (#1892) 2025-12-08 11:03:59 -05:00
Pengda Xie a4bbd73dc6 SWDEV-556684 - Remove HSAIL support (#1183) 2025-10-23 11:21:49 -07:00
Ajay GunaShekar f2ad8d6d5e SWDEV-553099 - remove WITHOUT_HSA_BACKEND usage (#831) 2025-09-03 08:40:25 -07:00
Danylo Lytovchenko 2ff2316227 Adjust clang format to the new versions, revert broken macro layout (#714) 2025-08-22 17:23:22 +02:00
Danylo Lytovchenko f7338717ae SWDEV-470698 - fix formatting, add format check workflow (#657) 2025-08-20 19:58:06 +05:30
Andryeyev, German 6df9a49437 SWDEV-465041 - Add support for user events with DD (#321)
* SWDEV-465041 - Add support for user events with DD

User events can be replaced with HSA signals. Add the interface
to allocate HSA signal for user events and update the status on
CL_COMPLETE.
Force pinned path with DD to avoid blocking calls. Pinned memory
can be released only when the command is complete.
Simplify the device enqueue path to use the generic kernel arg buffer and
signals.

* Fix notifyCmdQueue() logic for OCL

* Avoid blocking calls in OCL with DD

* Add event destruction in case of failure.

[ROCm/clr commit: 2305f8ae56]
2025-08-12 19:04:36 -04:00
Xie, Jiabao(Jimbo) e1d2194b75 SWDEV-528913 - support gfx950 in rocsetting (#217)
* SWDEV-528913 - support gfx950 in rocsetting

---------

Co-authored-by: Jimbo Xie <jiabaxie@amd.com>

[ROCm/clr commit: a320a3f214]
2025-05-07 15:44:49 -04:00
Andryeyev, German 5c7c86f66d SWDEV-517481 - Add dynamic queue management (#37)
Enabled by default. DEBUG_HIP_DYNAMIC_QUEUES controls the feature.

[ROCm/clr commit: 28967982b2]
2025-03-19 11:22:50 -04:00
Saleel Kudchadker d0a7ae02cf SWDEV-513197 - Unify getBuffer implementation
- Use getBuffer/releaseBuffer in BlitManager
- Cleanup XferBuffer as we use ManagedBuffer for both reads/writes

Change-Id: I2661b85dd012763b17a38a743fec1b1d79125f67


[ROCm/clr commit: 37d606d193]
2025-02-28 12:47:51 -05:00
Rahul Manocha 90337103ac SWDEV-510849 - Restore pinned memory copy path
1) Create getBuffer method to return pinned host memory or staging buffer
2) for D2H path use managed buffer instead of static buffer
3) use staging buffer copy for 16KB < size < 1MB
4) use pinned memory copy for size > 1MB

Change-Id: I13d4d6ab60691bc6c7724239db1e11e23f0f3dc2


[ROCm/clr commit: 4bf634dfca]
2025-02-26 11:25:02 -05:00
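The size thresholds in SWDEV-510849 can be read as a simple dispatch rule. This sketch makes a few assumptions:

* KB/MB mean KiB/MiB.
* The commit does not say which path handles copies of 16 KB or less, or exactly 1 MB. The hypothetical `select_copy_path` returns `"other"` for those sizes.

```python
KiB = 1024
MiB = 1024 * KiB

# Thresholds from the commit message (assumed to be binary units).
STAGING_MIN = 16 * KiB
PINNED_MIN = 1 * MiB


def select_copy_path(size_bytes: int) -> str:
    """Hypothetical dispatch rule for host<->device copies."""
    if size_bytes > PINNED_MIN:
        return "pinned"   # size > 1 MB: pin host memory and copy directly
    if STAGING_MIN < size_bytes < PINNED_MIN:
        return "staging"  # 16 KB < size < 1 MB: bounce through a staging buffer
    return "other"        # not specified by the commit message
```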
taosang2 40df900647 SWDEV-501963 - Add missing codes for gfx950
Cherry-pick https://gerrit-git.amd.com/c/compute/ec/clr/+/1162997

Change-Id: I6b3c6bf55c61cffd43cd6f17b75998f751b75723


[ROCm/clr commit: 32daa8f384]
2025-01-31 14:34:49 -05:00
German Andryeyev 584c9c1ee1 SWDEV-440746 - Fix a typo with GPU_PINNED_XFER_SIZE
Change-Id: I8fdbfb4e1c6b1274206c28a529eee9ebeaaa26fb


[ROCm/clr commit: dceb320ba7]
2024-10-24 18:33:14 -04:00
Saleel Kudchadker 343bdf3187 SWDEV-478624 - Use readback workaround to ensure kernel arg coherence
Use env var DEBUG_CLR_KERNARG_HDP_FLUSH_WA=1 to fall back to HDP flush
workaround. The default is 0

Change-Id: I7bdb9be61da60c30d15ac9991b7cd27351e1831c


[ROCm/clr commit: 9de6d4d46c]
2024-09-11 14:53:15 -04:00
Ioannis Assiouras 75104df3b2 SWDEV-464648 - code and comment cleanups
Change-Id: I5ba3f1bff500b3cd5903c2f441017735e688f83f


[ROCm/clr commit: 8f42ad6aa3]
2024-06-07 22:38:09 +01:00
Ioannis Assiouras 407d1346f2 SWDEV-463865 - changed device,roc and pal namespaces to be nested under amd
Change-Id: Icad342843c039c634e249a13a7aa31400730b1dd


[ROCm/clr commit: 775dc204aa]
2024-06-07 12:23:06 -04:00
German Andryeyev ad24101e5e SWDEV-451594 - Correct preMI100 detection
Change-Id: I4f1570a64cebf1ff73b4d189c17b7d7db095009c


[ROCm/clr commit: a4dbc97bd7]
2024-05-28 06:31:10 +00:00
kjayapra-amd 27bc1632f1 SWDEV-417091 - Disable GWS Init for PAL/Windows side.
Change-Id: Ib6295f063daa835c1f33f21f50c083241a9026ff


[ROCm/clr commit: 931431fc38]
2024-05-28 06:31:10 +00:00
Ioannis Assiouras 6a0f554fa6 SWDEV-451594 - Fallback to host kernel args on older devices
On gfx8, gfx9 devices before MI100 and gfx10.0 or gfx10.1
none of the memory ordering workarounds for device kernel arguments
can be applied. Use host kernel arguments on these devices.

Change-Id: I9be6fbfe4b3986eb7d9f83998334df5f03fd4124


[ROCm/clr commit: 2b746de6de]
2024-05-28 06:28:17 +00:00
Ioannis Assiouras a21913a0bd SWDEV-451594 - Change device kernel args to use HDP flush by default
The Readback and Avoid HDP Flush memory ordering workaround is
used as a fallback solution only when the HDP flush register is invalid.

Change-Id: Ic284eba1f95ed22b0270d3abeb904fb902015b1a


[ROCm/clr commit: 6cb7b6ec6b]
2024-05-02 19:35:13 +00:00
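Taken together, the two May 2024 SWDEV-451594 commits above describe a small decision tree:

* "Fallback to host kernel args on older devices";
* "Change device kernel args to use HDP flush by default".

The hypothetical sketch below assumes the caller already knows three things:

* the gfx major/minor version;
* whether a gfx9 part predates MI100;
* whether the HDP flush register is valid.

`KernargMode` and `choose_kernarg_mode` are illustrative names only. The later SWDEV-478624 commit in this log makes the readback workaround the default, which this sketch does not model.

```python
from enum import Enum


class KernargMode(Enum):
    HOST = "host kernel args"
    DEVICE_HDP_FLUSH = "device kernel args, HDP flush"
    DEVICE_READBACK = "device kernel args, readback without HDP flush"


def choose_kernarg_mode(gfx_major: int, gfx_minor: int,
                        pre_mi100: bool, hdp_flush_reg_valid: bool) -> KernargMode:
    """Hypothetical decision tree for the May 2024 SWDEV-451594 behavior."""
    legacy = (
        gfx_major == 8
        or (gfx_major == 9 and pre_mi100)
        or (gfx_major == 10 and gfx_minor in (0, 1))
    )
    if legacy:
        # No memory ordering workaround applies: keep kernel args in host memory.
        return KernargMode.HOST
    if hdp_flush_reg_valid:
        return KernargMode.DEVICE_HDP_FLUSH  # default
    return KernargMode.DEVICE_READBACK       # fallback when the register is invalid
```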
Ioannis Assiouras 2f430138c5 SWDEV-451594 - Implement Readback and Avoid HDP Flush workaround for device kernel args
Change-Id: I6d41a089a17f55306e7ff402588a1e831b20a7a7


[ROCm/clr commit: bf74ef4025]
2024-04-19 09:29:20 -04:00
German Andryeyev f29d608ca3 SWDEV-455254 - Add kernel arg optimization
Add kernel arguments optimization into blit path.
Enabled by default on MI300.

Change-Id: I2694a81b90d48ad07d86dfe4c0c64fe187bada8e


[ROCm/clr commit: f0c7ecf617]
2024-04-10 18:08:37 -04:00
Ioannis Assiouras b46d3c0f8d SWDEV-451166 - Disable kernel args for non-XGMI if HDP flush register is invalid
Change-Id: I227e046e2b9cb25476a50240f5d070adbd558f21


[ROCm/clr commit: 96f5c44851]
2024-03-15 05:27:52 -04:00
Saleel Kudchadker ce7b62d15c SWDEV-443760 - Enable device kernel args for MI300
- Enable Device kernel args for MI300* for now.
- Fix a perf issue which impacts graph instantiate when dev kernel args
are enabled.

Change-Id: I962e58fd9d8dd1a8db95e601cb03a8e9c7bac97f


[ROCm/clr commit: 68f40f78dd]
2024-02-28 19:10:04 -05:00
Saleel Kudchadker ec59b1bc3e SWDEV-443760 - Enable device kern args
- Implement workaround to ensure HDP writes are done by writing and
reading the HDP MMIO register.
- Implement the same workaround for graphs, we no longer need sentinel
write/readback

Change-Id: I0d3027b46a1f61131ec62e3c8c669ff5184fa6b2


[ROCm/clr commit: f138e0d113]
2024-02-20 02:03:14 -05:00
German 339523c475 SWDEV-440746 - Limit WG for compute P2P
Use only 16 workgroups for compute P2P copies.
That should be enough to utilize XGMI bandwidth.

Change-Id: I60dfe019279bb95f93c8874244c1738aad1896d8


[ROCm/clr commit: 31101c6219]
2024-01-12 14:56:29 -05:00
German Andryeyev e390ec044f SWDEV-432174 - Change the fillBuffer kernel
- Add the new fillBuffer kernel, which allows launching a limited
number of workgroups for memory fill operations
- Switch fill memory to 16 bytes write by default
- Allow to limit the workgroups with DEBUG_CLR_LIMIT_BLIT_WG

Change-Id: Ibad1822f2d42b2fc71bcfc1917c31409c0623e8e


[ROCm/clr commit: f1dc81f427]
2023-11-16 14:25:55 -04:00
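The SWDEV-432174 fillBuffer kernel launches a limited number of workgroups and issues 16-byte writes. That combination suggests a grid-stride loop. The Python simulation below is a hypothetical model of that pattern, not the actual kernel:

* `num_workgroups` stands in for whatever limit DEBUG_CLR_LIMIT_BLIT_WG imposes. This is similar in spirit to the 16-workgroup cap on compute P2P copies in SWDEV-440746.
* `wg_size` is an assumed workgroup size.

```python
WRITE_BYTES = 16  # bytes written per work-item per iteration, per the commit


def fill_buffer(buf: bytearray, pattern: bytes,
                num_workgroups: int, wg_size: int = 64) -> None:
    """Simulate a grid-stride fill with a fixed number of workgroups."""
    if not pattern or WRITE_BYTES % len(pattern) != 0:
        raise ValueError("pattern length must divide 16")
    chunk = pattern * (WRITE_BYTES // len(pattern))  # one 16-byte write
    total_items = num_workgroups * wg_size
    num_chunks = (len(buf) + WRITE_BYTES - 1) // WRITE_BYTES
    for item in range(total_items):  # each simulated work-item
        # Grid-stride loop: a work-item handles every total_items-th chunk.
        for c in range(item, num_chunks, total_items):
            start = c * WRITE_BYTES
            end = min(start + WRITE_BYTES, len(buf))
            buf[start:end] = chunk[: end - start]  # last chunk may be partial
```

Capping the workgroups keeps the launch size fixed regardless of buffer size. Each work-item simply loops over more chunks.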
Saleel Kudchadker 153bb15f46 SWDEV-301667 - Support device kernel args for PCIE
Change-Id: I5e51602bea5a68734227fd62e11ab68eb1ad81c1


[ROCm/clr commit: 5c591b5877]
2023-11-15 14:37:41 -05:00
kjayapra-amd 96580585c3 SWDEV-419688 - Do not run GWS init kernel for targets > gfx12 and MI300.
Change-Id: I8e7441268978be71ab8a5a33e7f8bcf69660e500
(cherry picked from commit 36d37ef614909c0f215512aac0c133408d787080)


[ROCm/clr commit: 6a8bc3c718]
2023-10-05 14:57:56 -04:00
Sourabh Betigeri 22f367a172 SWDEV-418855 - Limits the 'no GWS' approach to gfx940, gfx11 and gfx12
Change-Id: Iab2d34d3142902517124cec7ef3461cf7aa4b98c


[ROCm/clr commit: 7dc78d234d]
2023-08-30 23:48:02 -04:00
German 3f4bbcfdba SWDEV-407533 - [ABI Break] Purge unused env vars
Change-Id: I627950e8ebb6299affc602754a20d442dbe42b14


[ROCm/clr commit: 077311153a]
2023-08-24 14:11:40 -04:00
Maneesh Gupta d7fdd9fcb8 SWDEV-368235 - Revert "Remove obsolete env variables"
This reverts commit dfa7790030.

Reason for revert: Deferred to a future release.

Change-Id: Ia66c37f0ab9734dee73c930d10d7469d5fd57254


[ROCm/clr commit: 5dc104b3ea]
2023-02-15 07:25:00 +00:00
German dfa7790030 SWDEV-368235 - Remove obsolete env variables
Change-Id: I7e14d53297e79e2f68b3a6cc40251ad7db9eb5ab


[ROCm/clr commit: 7b50c935f8]
2023-02-03 13:44:24 -05:00
Saleel Kudchadker 7ba49616e9 SWDEV-371123 - Use barrier value packet for event records
Change-Id: I5e5e5e89e0d96a2430b4682d168b76848fa5b94e


[ROCm/clr commit: 4f64d89026]
2022-12-07 17:57:36 -05:00
Sourabh Betigeri 7aa958a8f7 SWDEV-305894 - Cooperative groups grid and multi grid sync support for gfx940+
Change-Id: I35d72f1cb50c3a96eee56a612b72d641852b145f


[ROCm/clr commit: 5d7f3f9f3c]
2022-12-05 16:30:30 -05:00
German 4b6a6ba8e8 SWDEV-363074 - Adjust staging copy limits in Windows
Pinned copy can cause big performance drops because pinning is slow under Windows.
Use up to 128MB for staging transfers. Change staging buffer size to 4MB.
Linux path should still have the old defaults.

Change-Id: I954edceb3ec89e8e670be116aa2d0a9564c8b11c


[ROCm/clr commit: 79d12df147]
2022-11-17 14:48:16 -05:00
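Under the Windows defaults from SWDEV-363074, transfers of up to 128 MB go through the 4 MB staging buffer in chunks. The hypothetical planner below assumes three things:

* MB means MiB;
* the 128 MB limit is inclusive;
* larger transfers fall back to the pinned path.

```python
from typing import List, Tuple

MiB = 1024 * 1024
STAGING_LIMIT = 128 * MiB  # Windows: largest transfer sent through staging
STAGING_BUF = 4 * MiB      # Windows staging buffer size


def plan_windows_copy(size: int) -> Tuple[str, List[Tuple[int, int]]]:
    """Return the path and the (offset, length) chunks for a staged copy."""
    if size > STAGING_LIMIT:
        return "pinned", []  # assumed: larger transfers use the pinned path
    chunks = []
    offset = 0
    while offset < size:
        n = min(STAGING_BUF, size - offset)
        chunks.append((offset, n))
        offset += n
    return "staging", chunks
```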
German Andryeyev 34ed734a66 SWDEV-344280 - Use coarse grain sysmem for kernel arg on MI200
Change-Id: I9596f0e8b88699538ec271b3a4345e5f75b968e3


[ROCm/clr commit: d8e4a289b3]
2022-06-29 13:04:46 -04:00
German Andryeyev 3c4f97f66c SWDEV-286150 - Remove GSL backend
Change-Id: Iba9a997ee7d5ff6ac00d5888ff189a4514958fe9


[ROCm/clr commit: 525a1bbf1a]
2022-02-09 17:16:39 -05:00
Satyanvesh Dittakavi 85c2cac111 SWDEV-306939 - Fix vdi errors/warnings by CppCheck
Change-Id: I56d910f8363787f1050d5d7e8064ed553c5827fd


[ROCm/clr commit: e20dd61932]
2022-01-12 00:22:16 -05:00
Saleel Kudchadker 97456a157b SWDEV-308843 - Increase MaxPinnedXferSize to 128
This allows experimenting with env var GPU_PINNED_XFER_SIZE which is
still at a default of 32MB

Change-Id: I85ade700ed58d498eba29d1737601dc74d4c26a4


[ROCm/clr commit: 3f82b99f5d]
2021-12-01 20:37:56 -05:00
Saleel Kudchadker 1bf9b39cf8 SWDEV-301667 - Kern arg placement
Add an env var ROC_USE_FGS_KERNARG to toggle kernel arg placement.
By default, it's in the fine-grain kernel arg segment for supported ASICs.

Change-Id: I3d57ed69a1a4db2b392b0438ead499f3ddca4716


[ROCm/clr commit: e29b9c00ee]
2021-09-02 12:36:49 -04:00
Jason Tang 8235cb4462 SWDEV-296911 - Enable clgl interop for both MesaGL and OrcaGL
Change-Id: Ie3ad85a8335b1fc751812c09bb0cd30aad38dcae


[ROCm/clr commit: f165737096]
2021-08-22 23:56:08 -07:00
agunashe 49f0546637 SWDEV-293742 - Update copyright end year VDI repo
Change-Id: I69d2fea4a7a43adf96ccea794270e4af991c5261


[ROCm/clr commit: d96481fb36]
2021-08-22 23:56:07 -07:00
German Andryeyev 5e70450a24 SWDEV-240804 - Enable HMM build by default
Change-Id: Ia6175dff8eda8c18b7a7bb4ca87a90c1f3e4e6fb


[ROCm/clr commit: ea3dba0832]
2021-04-26 17:36:53 -04:00
Saleel Kudchadker 6c304e4027 SWDEV-276120 - Remove support for barrier sync
ROC_BARRIER_SYNC will not work with direct dispatch.
Remove and cleanup.

Change-Id: I81368b2e65039477bd0343bb92708dab48867db6


[ROCm/clr commit: aa38af8c96]
2021-04-07 17:08:39 -04:00
German Andryeyev e8b1e484f5 SWDEV-274199 - Enable SVM tracking
ROCr/KFD doesn't validate memory pointers. Enable validation inside
ROCclr, using SVM tracking mechanism.

Change-Id: I581e32ff37187f9ed8d9a302e8fd9f6ca935bdd7


[ROCm/clr commit: fbde61de7f]
2021-03-03 13:18:56 -05:00
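One way to picture the SWDEV-274199 SVM tracking is an ordered map of allocation ranges that a pointer lookup can search. This is a hypothetical Python sketch that assumes allocations never overlap. `AllocationTracker` and its methods are illustrative, not ROCclr's API.

```python
import bisect
from typing import Dict, List, Optional


class AllocationTracker:
    """Track [base, base + size) ranges and map pointers back to them."""

    def __init__(self) -> None:
        self._bases: List[int] = []       # sorted allocation base addresses
        self._sizes: Dict[int, int] = {}  # base -> size in bytes

    def add(self, base: int, size: int) -> None:
        bisect.insort(self._bases, base)
        self._sizes[base] = size

    def remove(self, base: int) -> None:
        i = bisect.bisect_left(self._bases, base)
        if i == len(self._bases) or self._bases[i] != base:
            raise KeyError(hex(base))
        self._bases.pop(i)
        del self._sizes[base]

    def find(self, ptr: int) -> Optional[int]:
        """Return the base of the allocation containing ptr, or None."""
        # The candidate is the last allocation whose base is <= ptr.
        i = bisect.bisect_right(self._bases, ptr) - 1
        if i >= 0:
            base = self._bases[i]
            if ptr < base + self._sizes[base]:
                return base
        return None
```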
Jason Tang 09259cd49f SWDEV-198364 - Only enable clgl sharing in ROCm path when building LinuxPro
Change-Id: Ie4d87e252519d090a62b930f7ebb315d3477b690


[ROCm/clr commit: 54a7170e40]
2021-02-23 14:15:04 -05:00
German Andryeyev f96e973378 SWDEV-257787 - Add engine tracking per signal
- The logic traces compute and SDMA read/write operations and
applies signals when necessary
- ROC_CPU_WAIT_FOR_SIGNAL, ROC_SYSTEM_SCOPE_SIGNAL
and ROC_SKIP_COPY_SYNC were added to control the tracking

Change-Id: I9e8e6174c63bf7784f7ab00964e2918c8667d364


[ROCm/clr commit: dbc7abaecf]
2021-01-25 12:34:45 -05:00
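The SWDEV-257787 engine tracking can be pictured as remembering which engine last touched a resource. A signal wait is inserted only when the next operation runs on a different engine. This is a hypothetical sketch, with these limits:

* `Engine` and `EngineTracker` are illustrative names.
* It assumes same-engine operations are already ordered by their queue.
* It does not model the ROC_* controls.

```python
from enum import Enum
from typing import Optional


class Engine(Enum):
    COMPUTE = "compute"
    SDMA_READ = "sdma_read"
    SDMA_WRITE = "sdma_write"


class EngineTracker:
    """Remember the last engine that operated on a resource."""

    def __init__(self) -> None:
        self.last_engine: Optional[Engine] = None

    def record(self, engine: Engine) -> bool:
        """Record a new operation and return True if it must wait on a signal
        from the previous engine (assumed: same-engine work is already ordered)."""
        needs_wait = self.last_engine is not None and self.last_engine != engine
        self.last_engine = engine
        return needs_wait
```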
Tony Tye 902cf1a239 Update code object handling for GSL, PAL and ROCm
- Correct GSL path to report targets using the TargetID syntax.

- Correct GSL path to check compatibility of code objects when
  loading.

- Add the concept of a device ISA and create a registry used by ROCm,
  PAL and GSL.

- Support XNACK and SRAMECC target features consistently for PAL and ROCm.

- Correct logic for NullDevices and asserts to avoid memory corruption.

- Allow all NullDevices to be created for HIP.

- Numerous other code improvements.

Change-Id: I40abf3d2b22249c1492d1af5919665f8184f4e0e


[ROCm/clr commit: c7e8d91e14]
2021-01-14 11:11:51 -05:00
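The TargetID syntax referenced in this commit writes a target as a processor name followed by optional `:feature+` or `:feature-` settings, for example `gfx90a:sramecc+:xnack-`. A code object that omits a feature works with either device setting. Below is a minimal Python sketch of that compatibility check. `parse_target_id` and `is_compatible` are hypothetical helpers, not ROCclr functions.

```python
from typing import Dict, Tuple


def parse_target_id(target_id: str) -> Tuple[str, Dict[str, bool]]:
    """Split 'gfx90a:sramecc+:xnack-' into ('gfx90a', {'sramecc': True, 'xnack': False})."""
    processor, *features = target_id.split(":")
    settings: Dict[str, bool] = {}
    for f in features:
        if not f or f[-1] not in "+-":
            raise ValueError(f"bad feature {f!r} in {target_id!r}")
        settings[f[:-1]] = f[-1] == "+"
    return processor, settings


def is_compatible(code_object_id: str, device_id: str) -> bool:
    co_proc, co_feats = parse_target_id(code_object_id)
    dev_proc, dev_feats = parse_target_id(device_id)
    if co_proc != dev_proc:
        return False
    # A feature the code object leaves unspecified ("any") matches either
    # device setting; a specified feature must match the device exactly.
    return all(dev_feats.get(name) == on for name, on in co_feats.items())
```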