140 Commits

Author SHA1 Message Date
Xie, AlexBin e22c9b457e SWDEV-576718 - provide option to limit memory cache usage (#2810)
* SWDEV-576718 - provide option to limit memory cache usage

* SWDEV-576718 - Use MiB instead of MB in description
2026-01-26 11:35:01 -05:00
SaleelK 340f3aa887 clr: Implement dynamic stream to HWq logic (#1958)
* clr: Implement dynamic stream to HW queue assignment

This change implements dynamic stream to hardware queue (HWq) mapping
with the following features:

* Queue depth heuristics with weights for optimal HWq assignment
* Make last used queue sticky for better locality
* Use pipe HWq to pipe mapping - gfx9 follows a round-robin queue to
  pipe mapping based on creation order (single process per device only,
  as pipe ID is statically assigned by runtime)
* More aggressive heuristic usage for better queue distribution
* Extend dynamic queues support for all stream priorities

Environment variables:
* DEBUG_HIP_DYNAMIC_QUEUE: 0 - disabled, 1 - Depth heuristics, 2 -
  Depth+Pipe heuristics
* DEBUG_HIP_IGNORE_STREAM_PRIORITY=1: ignore priority stream creation

* clr: Clean up last_used_queue_
2026-01-23 10:40:54 -08:00
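The DEBUG_HIP_DYNAMIC_QUEUE values listed in commit 340f3aa887 (0, 1, 2) could be decoded roughly as follows. This is only a sketch: the enum, the function names, and the fallback to "disabled" for unset or unrecognized values are assumptions made for illustration, not the runtime's actual flag handling.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical enum: the names are illustrative; only the numeric
// values 0/1/2 come from the commit message.
enum class DynamicQueueMode { Disabled = 0, Depth = 1, DepthAndPipe = 2 };

// Maps the textual value of DEBUG_HIP_DYNAMIC_QUEUE to a mode.
// Assumption: unset or unrecognized values fall back to Disabled.
inline DynamicQueueMode parseDynamicQueueMode(const char* value) {
  if (value == nullptr) return DynamicQueueMode::Disabled;
  const std::string text(value);
  if (text == "1") return DynamicQueueMode::Depth;
  if (text == "2") return DynamicQueueMode::DepthAndPipe;
  return DynamicQueueMode::Disabled;
}

// Reads the variable from the process environment (hypothetical helper).
inline DynamicQueueMode dynamicQueueModeFromEnv() {
  return parseDynamicQueueMode(std::getenv("DEBUG_HIP_DYNAMIC_QUEUE"));
}
```

Keeping the parsing separate from the environment lookup makes the mapping easy to check without touching the process environment.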
Godavarthy Surya, Anusha 1ef6a86ee3 SWDEV-549711 - Improve graph DEBUG dot print for segments (#2205)
Co-authored-by: Anusha GodavarthySurya <agodavar@amd.com>
2026-01-07 14:07:49 +05:30
SaleelK c105dcd05b clr: Use graph segment scheduling to process HIP Graphs (#1372)
* clr: Use graph segment scheduling to process HIP Graphs

* Add a broader path to use packet capture for all topologies
* Refactor code
* Use DEBUG_HIP_GRAPH_SEGMENT_SCHEDULING to toggle new vs classic path;
  enabled by default

* clr: Few fixes and improvements

* clr: Detect complex graphs to take classic path

* Use DEBUG_HIP_GRAPH_SEGMENT_SCHEDULING=2 to force segment scheduling
  path

* clr: Fix a corner-case stack corruption

* clr: Track commands of segments instead of snapshots

* clr: Fix Batch dispatch logic

* Track fence_dirty_ flag for commands of other streams
* Dependency resolution markers can now accommodate a dirty fence on cross
  streams

---------

Co-authored-by: Ioannis Assiouras <Ioannis.Assiouras@amd.com>
Co-authored-by: Godavarthy Surya, Anusha <agodavar@amd.com>
2025-12-01 12:49:26 -08:00
Todd tiantuo Li ee48f6221d SWDEV-562708 - change default maximum SVM size to 256GB (#1731) 2025-11-25 23:59:39 -08:00
German Andryeyev 2c5754844f SWDEV-465041 - Enable direct dispatch under Linux by default. (#1934) 2025-11-25 11:30:32 -05:00
SaleelK 738bb19835 clr: Increase kernelArg/managedBuffer size (#1586)
* Increase the buffer to 4MB. That can help kernel launches limited by a deep kernel pipeline

Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2025-11-08 18:32:43 -08:00
Pengda Xie a4bbd73dc6 SWDEV-556684 - Remove HSAIL support (#1183) 2025-10-23 11:21:49 -07:00
SaleelK 149dc17c90 clr: Optimize doorbell ring (#1030)
* Lay foundation to batch packets efficiently for graphs
* Dynamically copy packets with max threshold set with
  DEBUG_HIP_GRAPH_BATCH_SIZE; if not, stagger packet copy with pow2
* Default threshold for DEBUG_HIP_GRAPH_BATCH_SIZE is 256
* If TS are not collected for a signal for reuse, create a new signal.
  This can potentially increase signal footprint if the handler doesn't run
  fast enough.
2025-09-18 15:02:10 -07:00
Danylo Lytovchenko 2ff2316227 Adjust clang format to the new versions, revert broken macro layout (#714) 2025-08-22 17:23:22 +02:00
Danylo Lytovchenko f7338717ae SWDEV-470698 - fix formatting, add format check workflow (#657) 2025-08-20 19:58:06 +05:30
Kudchadker, Saleel 3a849c6962 SWDEV-538195 - Introduce threshold for handler submission (#723)
- When doing device/stream sync, we can submit a handler which may
  introduce some host side delays. Use DEBUG_CLR_BATCH_CPU_SYNC_SIZE to
  batch commands for host wait. Default for HIP is 8 commands.
- Investigation is underway in ROCr but need to address this for now in
  HIP runtime.

[ROCm/clr commit: 9b045922a8]
2025-08-06 20:34:42 -07:00
Xie, Pengda b7d8cb56d1 SWDEV-505833 - Remove DEBUG_CLR_SKIP_RELEASE_SCOPE flag (#735)
Cleanup debug flag DEBUG_CLR_SKIP_RELEASE_SCOPE

[ROCm/clr commit: 4121a860bf]
2025-08-05 08:31:55 -07:00
Belton-Schure, Aidan 88c1717658 SWDEV-515426 - Remove HIP_USE_RUNTIME_UNBUNDLER (#205)
* remove HIP_USE_RUNTIME_UNBUNDLER
* clang-format
* Generic to use comgr
* Remove HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION flag
* Removes runtime unbundling unused and debug Code
* Removes stale functions

[ROCm/clr commit: 81238db679]
2025-07-08 21:45:31 +05:30
Harrymanoharan, Jessey 1868e4e595 SWDEV-531711 - Enable skipping host side abort when GPU crashes. (#380)
Co-authored-by: kjayapra-amd <karthik.jayaprakash@amd.com>

[ROCm/clr commit: 3930ae2524]
2025-05-26 17:52:02 +05:30
Dittakavi, Satyanvesh 1cc35da9be SWDEV-438790 - Remove DEBUG_HIP_7_PREVIEW env var keeping the hipGetLastError changes by default (#337)
[ROCm/clr commit: 664bf232dd]
2025-05-21 22:12:45 +05:30
Andryeyev, German c512258e45 SWDEV-528808 - Disable dynamic queue by default (#256)
Dynamic queue management will be disabled by default and
the original sort logic is restored

[ROCm/clr commit: 9b018165ce]
2025-05-05 10:56:35 -04:00
Sang, Tao 68deb3d10a SWDEV-520352 - Remove HostThread and legacy monitor (#230)
* SWDEV-520352 - Remove HostThread and legacy monitor

Remove HostThread, semaphore and legacy monitor.
Make the original logic of thread and command queue stricter.
Add more comments to make the logic clearer.
Some other minor improvements.

Also part of SWDEV-458943.

[ROCm/clr commit: 96cadbc9e9]
2025-04-29 09:55:24 -04:00
Jayaprakash, Karthik 49a527c826 SWDEV-506467 - Skip Abort in case of crash from the device. (#60)
Change-Id: I964b2f2647d068202e9c38fcddb1337da754df8d

[ROCm/clr commit: b2388dfb88]
2025-04-29 11:19:02 +05:30
Andryeyev, German f8344154a0 SWDEV-497841 - Enable memory manager by default (#149)
[ROCm/clr commit: a5c860f3b0]
2025-04-22 21:20:37 +05:30
Chaudhary, Jatin Jaikishan e9e207d7b0 SWDEV-517941 - use device bitcode before spirv (#95)
Also add flag HIP_FORCE_SPIRV_CODEOBJECT to allow an override that forces
the use of SPIRV.

* use cache for already compiled code objects

* address review comments and use the two spirv isa names

[ROCm/clr commit: 07e57a1f0d]
2025-04-14 23:40:52 +01:00
Andryeyev, German 5c7c86f66d SWDEV-517481 - Add dynamic queue management (#37)
Enabled by default. DEBUG_HIP_DYNAMIC_QUEUES controls the feature

[ROCm/clr commit: 28967982b2]
2025-03-19 11:22:50 -04:00
German Andryeyev f9d9b2c441 SWDEV-497841 - Add virtual memory heap
Add initial implementation of virtual memory heap with
dynamic virtual memory mapping support for memory pools.
DEBUG_HIP_MEM_POOL_VMHEAP controls the new method.

Change-Id: I8dc5be2e0f34ab472f1800f43bb6243639a5e500


[ROCm/clr commit: 296dce5570]
2025-02-20 10:55:49 -05:00
Tao Sang 7803594aea SWDEV-458943 - Add fast path in wait()
wait() is redesigned with two paths:
fast path: Use spinlock to wait for notify signal. If the
 signal hasn't been received for some loops, go to slow path.
slow path: Use condition_variable's wait().

Improve monitor wrapper for better performance.

Fix some bugs left from name removing patch.

Change-Id: I893a8353121a25d11e37c8e631caf31cc1fc1f24


[ROCm/clr commit: f2ff56af9c]
2025-01-28 12:19:55 -05:00
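The two-path wait() described in commit 7803594aea (spin first, then fall back to a condition variable) can be sketched as below. The class name, the spin budget, and the return value are assumptions made for illustration; this is not the amd::Monitor implementation.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical class showing a spin-then-block wait.
class TwoPathEvent {
 public:
  // Returns true if the fast path observed the signal,
  // false if the slow path had to be used.
  bool wait(int max_spins = 1000) {
    // Fast path: poll the flag for a bounded number of iterations.
    for (int i = 0; i < max_spins; ++i) {
      if (signaled_.load(std::memory_order_acquire)) return true;
      std::this_thread::yield();
    }
    // Slow path: block on the condition variable until notified.
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return signaled_.load(std::memory_order_acquire); });
    return false;
  }

  void notify() {
    {
      // Set the flag under the mutex so a waiter entering the slow path
      // cannot miss the wakeup.
      std::lock_guard<std::mutex> lock(mutex_);
      signaled_.store(true, std::memory_order_release);
    }
    cv_.notify_all();
  }

 private:
  std::atomic<bool> signaled_{false};
  std::mutex mutex_;
  std::condition_variable cv_;
};
```

The spin budget trades CPU time for wakeup latency: a short wait is served without a kernel-level block, while a long wait does not burn a core indefinitely.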
Saleel Kudchadker d4594531ef SWDEV-506251 - Disable blit copy threshold for OpenCL
Change-Id: Id0ca43b13d5792791a42da263f6aa4496382cea6


[ROCm/clr commit: 39801b5750]
2025-01-08 02:46:01 +00:00
Pengda Xie 612ae28524 SWDEV-505833 - Provide functionality to avoid L2 flush for CPX mode for dispatch packets
- Added DEBUG_CLR_SKIP_RELEASE_SCOPE flag to force release scope to
   SCOPE_NONE in AQL packet header

Change-Id: Ife02cddb9d5cd4749103ce585d3d5fe9024c6868


[ROCm/clr commit: 8155943c5f]
2025-01-03 17:28:21 -05:00
Ioannis Assiouras 2c8805e536 SWDEV-483134 - Remove hipExtHostAlloc API
Change-Id: I60777ef5c56b60dd8100d0d794ca10fb3b96a555


[ROCm/clr commit: e8b2fdab96]
2024-12-16 17:13:49 -05:00
Saleel Kudchadker 7d7aa8b69c SWDEV-497145 - Use rocr copyOnEngine API for staged copies
- Refactor blit code and clean ASAN instrumentation
- Use unified function for rocr copy
- Enable shader copy path for unpinned writeBuffer/readBuffer paths
- Set GPU_FORCE_BLIT_COPY_SIZE=16 which means we will use BLIT copy for
  pinned copies or unpinned H2D/D2H copies < 16KB

Change-Id: I42045cca79234b340dbf53dafb93044199736ae4


[ROCm/clr commit: 7863eb92dc]
2024-12-04 13:38:13 -05:00
Satyanvesh Dittakavi 5a16db0cd5 SWDEV-477584 - Match hipGetLastError behavior with CUDA using env var
Change-Id: I4c5acff180ae904028f7c5fdf4e109ffd1f0c4ef


[ROCm/clr commit: e3b8754448]
2024-11-28 01:33:52 -05:00
German Andryeyev a9daa4c8f4 SWDEV-486602 - Disable sysmem pool
Currently amd::Monitor can work in FILO mode for the active waits
and cause a delay in wakeup of some threads. That may have a problem
with the current sysmem pool design.

Change-Id: I145081478d1e0b282d8838855c5718f09cf54b69


[ROCm/clr commit: 9473f143c2]
2024-11-20 11:35:28 -05:00
Saleel Kudchadker 672e4fa835 SWDEV-446123 - Revert "Match hipGetLastError behavior with CUDA using env var"
This reverts commit 941cfd5b36.

Change-Id: I11a456655393bcf4b82d749ce7259bc1b78d1424


[ROCm/clr commit: 582dc7dd6d]
2024-11-08 20:35:13 -05:00
Satyanvesh Dittakavi 941cfd5b36 SWDEV-446123 - Match hipGetLastError behavior with CUDA using env var
Change-Id: Iaec697c1304d746376ecf2bfe2ad683b15ee189f


[ROCm/clr commit: 5f477900a3]
2024-11-07 12:02:34 -05:00
German Andryeyev 4a2687a450 SWDEV-486602 - Fix Windows 32 bit build
Windows aligns fields to 8 bytes even with 32-bit builds.
Add BUG_CLR_SYSMEM_POOL to control sysmempool.

Change-Id: I8622aabc9f7391ed7dd8583b252ce9eb41d62293


[ROCm/clr commit: 6bb7d1afdc]
2024-10-18 11:35:54 -04:00
German Andryeyev 0a03665a3f SWDEV-491375 - Limit the SW batch size
Applications may submit commands without waiting
for the GPU. That causes growth of unreleased SW commands.
Make sure the runtime flushes the SW queue if it grows over some
threshold, controlled by DEBUG_CLR_MAX_BATCH_SIZE.

Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396


[ROCm/clr commit: 8657a77029]
2024-10-17 10:53:57 -04:00
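The flush-on-threshold behaviour described in commit 0a03665a3f can be illustrated with a small sketch. The class, its methods, and the use of plain integers as commands are hypothetical; only the idea of flushing once the pending count grows over a limit comes from the commit message.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical software queue that flushes itself once the number of
// pending commands grows over a configured limit.
class SoftwareQueue {
 public:
  explicit SoftwareQueue(std::size_t max_batch_size)
      : max_batch_size_(max_batch_size) {}

  // Queues a command; returns true if this submission triggered a flush.
  bool submit(int command) {
    pending_.push_back(command);
    if (pending_.size() > max_batch_size_) {
      flush();
      return true;
    }
    return false;
  }

  // Stand-in for handing the batch to the device and releasing it.
  void flush() {
    flushed_count_ += pending_.size();
    pending_.clear();
  }

  std::size_t pending() const { return pending_.size(); }
  std::size_t flushed() const { return flushed_count_; }

 private:
  std::size_t max_batch_size_;
  std::vector<int> pending_;
  std::size_t flushed_count_ = 0;
};
```

With a limit of 3, the fourth submission is the one that forces the flush, so the pending list stays bounded even if the caller never waits.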
German Andryeyev faea40cbb3 SWDEV-486602 - Optimize HSA callback performance
- Don't generate callbacks for HIP events
- Don't process profiling info in the callback for HIP events
- Wait for CPU status update of the submitted commands
every 50 calls. That will allow to drain the commands and
destroy HSA signals.

Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9


[ROCm/clr commit: 364dfb0ed1]
2024-10-11 14:50:25 -04:00
Saleel Kudchadker 5296c77138 SWDEV-301667 - Logging upgrades
- Use AMD_LOG_LEVEL_SIZE (in MB) to set the log file truncation size; the default is 2048 MB

Change-Id: Ia2f87e8c6b94148e30edfb602b279f93630817c3


[ROCm/clr commit: 35e03ea0d0]
2024-10-04 13:26:25 -04:00
Saleel Kudchadker 343bdf3187 SWDEV-478624 - Use readback workaround to ensure kernel arg coherence
Use env var DEBUG_CLR_KERNARG_HDP_FLUSH_WA=1 to fall back to HDP flush
workaround. The default is 0

Change-Id: I7bdb9be61da60c30d15ac9991b7cd27351e1831c


[ROCm/clr commit: 9de6d4d46c]
2024-09-11 14:53:15 -04:00
victzhan fde29b7c06 Revert "SWDEV-458943 - make new AMD_MONITOR on"
This reverts commit 47dcfbae6b.

Change-Id: I2a7ddb2d4340224f43749a2ea91a894a8a95b83b


[ROCm/clr commit: 7a01db98e9]
2024-09-05 10:10:50 -04:00
Ioannis Assiouras a00f071579 SWDEV-470372 - Added hipExtHostAlloc API
This change adds a new HIP API `hipExtHostAlloc` which preserves
the functionality of `hipHostMalloc`.

Change-Id: I13504c6fc13465ddd7aed329795bb4f2fef1baff


[ROCm/clr commit: 2c84211b58]
2024-08-27 08:26:03 -04:00
German Andryeyev 35c7a87014 SWDEV-470612 - Add the optimized multistream path
- Added the optimized multi-stream path in graph execution. It uses a fixed number of async streams in the execution
- Optimize the launch latency, where command
creation and execution are done at the same time
- Optimize the scheduling to use fewer barriers and waiting signals if
the same queue can be detected
- The new path is controlled by the DEBUG_HIP_FORCE_GRAPH_QUEUES
environment variable, where 0 will use the original path and any other
value will force the number of asynchronous queues for execution
- DEBUG_HIP_FORCE_ASYNC_QUEUE can force single queue async
execution in graphs (applicable for Navi families only)

Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e


[ROCm/clr commit: 9db52f9a46]
2024-08-02 14:19:44 -04:00
Saleel Kudchadker 16920809d7 SWDEV-301667 - Refactor Blit force env var
Change-Id: I5344ac2e6442cd8f526118e688f1b1412cc5b45a


[ROCm/clr commit: d379f4efd0]
2024-07-25 15:15:10 -04:00
taosang2 47dcfbae6b SWDEV-458943 - make new AMD_MONITOR on
make DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR be true

Change-Id: I1d21378ff462478d3238d71e4e2a1a7d6b9167ac


[ROCm/clr commit: f8598dabb0]
2024-07-24 14:29:27 -04:00
Tao Sang b8cf863eaa SWDEV-458943 - Implement std::mutex based monitor
Implement std::mutex based monitor that has much
simpler logic than the legacy monitor.
Create DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR to
toggle them.
If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = false
  (by default), use legacy monitor;
If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = true,
  use std::mutex based monitor.
If there is no perf drop with the std::mutex based monitor,
the legacy one will be removed later.

Change-Id: I1d21368ff462477d3238d71e4e2a1a7d6b9167ad


[ROCm/clr commit: 73c02041e1]
2024-07-04 11:50:46 -04:00
Ioannis Assiouras a774b89a43 SWDEV-470787 - Fixed undefined symbols for flags in static build
Change-Id: I7812c8924396d0df9ab331f9a1844aabbf5a9211


[ROCm/clr commit: fa07c33cba]
2024-07-04 02:57:22 -04:00
Ioannis Assiouras af089a2171 SWDEV-463865 - namespace changes to prevent symbol conflicts in static builds
Change-Id: I09ceb5962b7aa19156909f47167c87d6887c9cd1


[ROCm/clr commit: 3edf1501cc]
2024-06-12 16:22:27 -04:00
kjayapra-amd 41cb6dadf9 SWDEV-460948 - Changes to alloc, set, capture under single function.
Change-Id: I7b2d40e99e812b97c53535c5e63c41ad64a8f543


[ROCm/clr commit: 892071aeb2]
2024-06-06 16:57:53 -04:00
Tao Sang 7bf8d102fc SWDEV-433371 - Support new comgr unbundling action
Support new comgr unbundling action API to extract code objects
in compressed and uncompressed modes.

Create HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION ENV to
toggle new path and old path.
If HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION=false(default),
   uncompressed codeobject will go old path for better perf,
   compressed   codeobject will go new path.
If HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION=true,
   both uncompressed and compressed codeobjects will go new
   path.

Add comgr wrapper for
   amd_comgr_action_info_set_bundle_entry_ids()

Change-Id: I79952f132fe21249296685ee12cae05a4f9aec32


[ROCm/clr commit: d0050ce309]
2024-05-28 06:31:10 +00:00
Tao Sang 5bf67d7da7 Revert "SWDEV-433371 - use comgr to unbundle code objects"
This reverts commit c0ee0ffa1c.

Reason for revert: The new comgr unbundling action leads to a perf drop for uncompressed code objects. Will create a new patch to use the old path for uncompressed and the new unbundling API for compressed code objects.

Change-Id: I41ef53b71fc9f7aaa8cf231d4d70945f1117db52


[ROCm/clr commit: a1350fe8c1]
2024-05-28 06:31:10 +00:00
taosang2 c0ee0ffa1c SWDEV-433371 - use comgr to unbundle code objects
1. Make runtime use comgr to unbundle code objects
2. Support compressed/uncompressed modes
3. Remove HIP_USE_RUNTIME_UNBUNDLER and
   HIPRTC_USE_RUNTIME_UNBUNDLER to simplify logic
4. Add comgr wrapper for
   amd_comgr_action_info_set_bundle_entry_ids()

Change-Id: Ic41b1ad1b64cca1e31986437983a5146d52a7329


[ROCm/clr commit: e53df57ffe]
2024-05-01 16:09:12 -04:00
German Andryeyev daceede8a7 SWDEV-311271 - Enable mempools under Linux
Change-Id: I7fda94e61121f9d3a30f4ad185b8a97712922f3c


[ROCm/clr commit: 7a371503b2]
2024-04-29 18:06:34 -04:00