668 Commits

Author SHA1 Message Date
Xie, AlexBin e22c9b457e SWDEV-576718 - provide option to limit memory cache usage (#2810)
* SWDEV-576718 - provide option to limit memory cache usage

* SWDEV-576718 - Use MiB instead of MB in description
2026-01-26 11:35:01 -05:00
SaleelK 340f3aa887 clr: Implement dynamic stream to HWq logic (#1958)
* clr: Implement dynamic stream to HW queue assignment

This change implements dynamic stream to hardware queue (HWq) mapping
with the following features:

* Queue depth heuristics with weights for optimal HWq assignment
* Make last used queue sticky for better locality
* Use pipe HWq to pipe mapping - gfx9 follows a round-robin queue to
  pipe mapping based on creation order (single process per device only,
  as pipe ID is statically assigned by runtime)
* More aggressive heuristic usage for better queue distribution
* Extend dynamic queues support for all stream priorities

Environment variables:
* DEBUG_HIP_DYNAMIC_QUEUE: 0 - disabled, 1 - Depth heuristics, 2 -
  Depth+Pipe heuristics
* DEBUG_HIP_IGNORE_STREAM_PRIORITY=1: ignore priority stream creation

* clr: Clean up last_used_queue_
2026-01-23 10:40:54 -08:00
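The queue-depth heuristic and the "sticky" last-used queue from commit 340f3aa887 can be illustrated with a minimal sketch. This is not the runtime's code: the function name `pick_queue`, the `depths` list, and the `sticky_bonus` tolerance are hypothetical, and the real weights are not given in the commit message.

```python
def pick_queue(depths, last_used=None, sticky_bonus=1):
    """Return the index of the hardware queue for the next submission.

    depths: outstanding-command count per queue (hypothetical input).
    last_used: index of the queue this stream used previously, or None.
    sticky_bonus: extra depth tolerated in order to stay on last_used
                  (an assumed weight, not a value from the commit).
    """
    # Depth heuristic: the shallowest queue is the default choice.
    best = min(range(len(depths)), key=lambda i: depths[i])
    # Stickiness: keep the previous queue unless it is clearly deeper.
    if last_used is not None and depths[last_used] - depths[best] <= sticky_bonus:
        return last_used
    return best


print(pick_queue([3, 1, 2]))               # no history: shallowest queue
print(pick_queue([2, 1, 5], last_used=0))  # stays on queue 0
print(pick_queue([4, 1, 5], last_used=0))  # queue 0 too deep, moves
```

Staying on the previous queue trades a slightly longer wait for better locality, which is the motivation the commit message gives for the sticky behavior.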
Fábio Mestre 61325db1c8 Fix AMD_LOG_LEVEL_SIZE env variable (#2463)
AMD_LOG_LEVEL_SIZE is being used in a global variable.
This always uses the default value of 2048 because the
HIP runtime doesn't have the opportunity to load
environment variables at the point where global variables
are initialized.

The solution is to use AMD_LOG_LEVEL_SIZE inside
truncate_log_file() function.
2026-01-13 09:57:49 +00:00
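The initialization-order problem fixed in commit 61325db1c8 can be reproduced in a small Python analogy. The `flags` table, `load_flags()`, and `truncate_limit_mb()` are hypothetical stand-ins for the runtime's flag handling; only the variable name AMD_LOG_LEVEL_SIZE and the 2048 default come from the commit messages.

```python
import os

# Hypothetical stand-in for the runtime's flag table; the default applies
# until load_flags() has run.
flags = {"AMD_LOG_LEVEL_SIZE": 2048}


def load_flags(env=os.environ):
    """Copy overrides from the environment into the flag table."""
    if "AMD_LOG_LEVEL_SIZE" in env:
        flags["AMD_LOG_LEVEL_SIZE"] = int(env["AMD_LOG_LEVEL_SIZE"])


# Buggy pattern: evaluated at module initialization, before load_flags(),
# so it always captures the default.
MAX_LOG_MB_AT_INIT = flags["AMD_LOG_LEVEL_SIZE"]


def truncate_limit_mb():
    """Fixed pattern: look the flag up when the function is called."""
    return flags["AMD_LOG_LEVEL_SIZE"]


# Simulate the runtime loading its environment after globals were set up.
load_flags({"AMD_LOG_LEVEL_SIZE": "64"})
print(MAX_LOG_MB_AT_INIT, truncate_limit_mb())
```

The value captured at initialization stays at the default, while the call-time lookup sees the override, which mirrors moving the read into truncate_log_file().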
Godavarthy Surya, Anusha 1ef6a86ee3 SWDEV-549711 - Improve graph DEBUG dot print for segments (#2205)
Co-authored-by: Anusha GodavarthySurya<agodavar@amd.com>
2026-01-07 14:07:49 +05:30
SaleelK c105dcd05b clr: Use graph segment scheduling to process HIP Graphs (#1372)
* clr: Use graph segment scheduling to process HIP Graphs

* Add a broader path to use packet capture for all topologies
* Refactor code
* Use DEBUG_HIP_GRAPH_SEGMENT_SCHEDULING to toggle new vs classic path;
  enabled by default

* clr: Few fixes and improvements

* clr: Detect complex graphs to take classic path

* Use DEBUG_HIP_GRAPH_SEGMENT_SCHEDULING=2 to force segment scheduling
  path

* clr: Fix a corner-case stack corruption

* clr: Track commands of segments instead of snapshots

* clr: Fix Batch dispatch logic

* Track fence_dirty_ flag for command of other streams
* Dependency resolution markers can now accommodate dirty fence on cross
  streams

---------

Co-authored-by: Ioannis Assiouras <Ioannis.Assiouras@amd.com>
Co-authored-by: Godavarthy Surya, Anusha <agodavar@amd.com>
2025-12-01 12:49:26 -08:00
Todd tiantuo Li ee48f6221d SWDEV-562708 - change default maximum SVM size to 256GB (#1731) 2025-11-25 23:59:39 -08:00
Karthik Jayaprakash 740a06d567 SWDEV-559267 - Use CLPrint to DevLogPrintf with Log Level - detail debug. (#1160) 2025-11-25 19:25:32 -05:00
German Andryeyev 2c5754844f SWDEV-465041 - Enable direct dispatch under Linux by default. (#1934) 2025-11-25 11:30:32 -05:00
SaleelK 738bb19835 clr: Increase kernelArg/managedBuffer size (#1586)
* Increase the buffer to 4MB. That can help kernel launches limited by a deep kernel pipeline

Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2025-11-08 18:32:43 -08:00
SaleelK f301053740 clr: Improve logging (#1457) 2025-10-25 15:55:27 -07:00
Pengda Xie a4bbd73dc6 SWDEV-556684 - Remove HSAIL support (#1183) 2025-10-23 11:21:49 -07:00
SaleelK 149dc17c90 clr: Optimize doorbell ring (#1030)
* Lay foundation to batch packets efficiently for graphs
* Dynamically copy packets with max threshold set with
  DEBUG_HIP_GRAPH_BATCH_SIZE, if not stagger packet copy with pow2
* Default threshold for DEBUG_HIP_GRAPH_BATCH_SIZE is 256
* If TS are not collected for a signal for reuse, create a new signal.
  This can potentially increase signal footprint if the handler doesn't run
  fast enough.
2025-09-18 15:02:10 -07:00
SaleelK c4537e8050 SWDEV-553126 - Improve logging (#835)
* Ability to mask COPY api usage in logs
* Show total graph nodes in logs
* Add another log level for detailed debug
2025-09-04 10:08:41 -07:00
Danylo Lytovchenko 2ff2316227 Adjust clang format to the new versions, revert broken macro layout (#714) 2025-08-22 17:23:22 +02:00
Danylo Lytovchenko f7338717ae SWDEV-470698 - fix formatting, add format check workflow (#657) 2025-08-20 19:58:06 +05:30
GunaShekar, Ajay 5c412edcd1 SWDEV-532576 - clr_logs_<pid>.txt default AMD_LOG_LEVEL_FILE (#480)
avoids app crash and uses default AMD_LOG_LEVEL_FILE if invalid name is passed

[ROCm/clr commit: 76637d7ebe]
2025-08-13 20:27:42 -07:00
Kudchadker, Saleel 3a849c6962 SWDEV-538195 - Introduce threshold for handler submission (#723)
- When doing device/stream sync, we can submit a handler which may
  introduce some host side delays. Use DEBUG_CLR_BATCH_CPU_SYNC_SIZE to
  batch commands for host wait. Default for HIP is 8 commands.
- Investigation is underway in ROCr but need to address this for now in
  HIP runtime.

[ROCm/clr commit: 9b045922a8]
2025-08-06 20:34:42 -07:00
Xie, Pengda b7d8cb56d1 SWDEV-505833 - Remove DEBUG_CLR_SKIP_RELEASE_SCOPE flag (#735)
Cleanup debug flag DEBUG_CLR_SKIP_RELEASE_SCOPE

[ROCm/clr commit: 4121a860bf]
2025-08-05 08:31:55 -07:00
Belton-Schure, Aidan 88c1717658 SWDEV-515426 - Remove HIP_USE_RUNTIME_UNBUNDLER (#205)
* remove HIP_USE_RUNTIME_UNBUNDLER
* clang-format
* Generic to use comgr
* Remove HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION flag
* Removes runtime unbundling unused and debug Code
* Removes stale functions

[ROCm/clr commit: 81238db679]
2025-07-08 21:45:31 +05:30
Lin, Qun 3b44884a57 SWDEV-508869 - Fix Linux build error for HIP on PAL (#176)
[ROCm/clr commit: 9699cc3864]
2025-06-27 07:51:22 +08:00
Harrymanoharan, Jessey 1868e4e595 SWDEV-531711 - Enable skipping host side abort when GPU crashes. (#380)
Co-authored-by: kjayapra-amd <karthik.jayaprakash@amd.com>

[ROCm/clr commit: 3930ae2524]
2025-05-26 17:52:02 +05:30
Dittakavi, Satyanvesh 1cc35da9be SWDEV-438790 - Remove DEBUG_HIP_7_PREVIEW env var keeping the hipGetLastError changes by default (#337)
[ROCm/clr commit: 664bf232dd]
2025-05-21 22:12:45 +05:30
Andryeyev, German c512258e45 SWDEV-528808 - Disable dynamic queue by default (#256)
Dynamic queue management will be disabled by default and
the original sort logic is restored

[ROCm/clr commit: 9b018165ce]
2025-05-05 10:56:35 -04:00
Sang, Tao 68deb3d10a SWDEV-520352 - Remove HostThread and legacy monitor (#230)
* SWDEV-520352 - Remove HostThread and legacy monitor

Remove HostThread, semaphore and legacy monitor.
Make the original thread and command queue logic stricter.
Add more comments to make the logic clearer.
Some other minor improvements.

Also part of SWDEV-458943.

[ROCm/clr commit: 96cadbc9e9]
2025-04-29 09:55:24 -04:00
Jayaprakash, Karthik 49a527c826 SWDEV-506467 - Skip Abort in case of crash from the device. (#60)
Change-Id: I964b2f2647d068202e9c38fcddb1337da754df8d

[ROCm/clr commit: b2388dfb88]
2025-04-29 11:19:02 +05:30
Kudchadker, Saleel 1b1d6b841e SWDEV-510186 - Improve logging (#220)
- Print all arguments for logs, this is useful for debug

[ROCm/clr commit: ce24936970]
2025-04-25 08:40:31 -07:00
Andryeyev, German f8344154a0 SWDEV-497841 - Enable memory manager by default (#149)
[ROCm/clr commit: a5c860f3b0]
2025-04-22 21:20:37 +05:30
Chaudhary, Jatin Jaikishan e9e207d7b0 SWDEV-517941 - use device bitcode before spirv (#95)
Also add flag: HIP_FORCE_SPIRV_CODEOBJECT to allow override to force use
SPIRV.

* use cache for already compiled code objects

* address review comments and use the two spirv isa names

[ROCm/clr commit: 07e57a1f0d]
2025-04-14 23:40:52 +01:00
Andryeyev, German 5c7c86f66d SWDEV-517481 - Add dynamic queue management (#37)
Enabled by default. DEBUG_HIP_DYNAMIC_QUEUES controls the feature

[ROCm/clr commit: 28967982b2]
2025-03-19 11:22:50 -04:00
German Andryeyev 77840f1cb9 SWDEV-518474 - Add comgr debug mask
Move prints from CO processing under COMGR debug mask.

Change-Id: I2a417e42a1f4e2922a34eb104c69e4db10b5f1c6


[ROCm/clr commit: cece301fd4]
2025-03-04 14:37:08 -05:00
German Andryeyev f9d9b2c441 SWDEV-497841 - Add virtual memory heap
Add initial implementation of virtual memory heap with
dynamic virtual memory mapping support for memory pools.
DEBUG_HIP_MEM_POOL_VMHEAP controls the new method.

Change-Id: I8dc5be2e0f34ab472f1800f43bb6243639a5e500


[ROCm/clr commit: 296dce5570]
2025-02-20 10:55:49 -05:00
Tao Sang 7803594aea SWDEV-458943 - Add fast path in wait()
wait() is redesigned with two paths:
fast path: Use spinlock to wait for notify signal. If the
signal hasn't been received for some loops, go to slow path.
slow path: Use condition_variable's wait().

Improve monitor wrapper for better performance.

Fix some bugs left from name removing patch.

Change-Id: I893a8353121a25d11e37c8e631caf31cc1fc1f24


[ROCm/clr commit: f2ff56af9c]
2025-01-28 12:19:55 -05:00
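The two-path wait() described in commit 7803594aea can be sketched as follows. This is an illustration under assumptions, not the amd::Monitor implementation: the class name `TwoPathWaiter` and the `spin_limit` parameter are hypothetical, and bounded polling of a flag stands in for the spinlock mentioned in the message.

```python
import threading


class TwoPathWaiter:
    """Wait for a notification: spin briefly first, then block."""

    def __init__(self, spin_limit=1000):
        self._cond = threading.Condition()
        self._signaled = False
        self._spin_limit = spin_limit  # assumed loop budget for the fast path

    def notify(self):
        with self._cond:
            self._signaled = True
            self._cond.notify_all()

    def wait(self):
        # Fast path: poll the flag a bounded number of times without blocking.
        for _ in range(self._spin_limit):
            if self._signaled:
                return "fast"
        # Slow path: block on the condition variable until notified.
        with self._cond:
            while not self._signaled:
                self._cond.wait()
        return "slow"


waiter = TwoPathWaiter()
waiter.notify()
print(waiter.wait())  # already signaled, so the fast path returns
```

The return value only reports which path was taken, to make the behavior visible. Spinning avoids the cost of blocking when the signal arrives quickly, while the condition variable keeps long waits from burning CPU.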
Branislav Brzak 05057b2a88 SWDEV-508743 - [6.4 Preview] Add ROCm 7.0 breaking change fields
Change-Id: I07bff42731e74a4c409505cf8981342e22ce26be


[ROCm/clr commit: 3fd46a3783]
2025-01-17 06:25:27 -05:00
Saleel Kudchadker d4594531ef SWDEV-506251 - Disable blit copy threshold for OpenCL
Change-Id: Id0ca43b13d5792791a42da263f6aa4496382cea6


[ROCm/clr commit: 39801b5750]
2025-01-08 02:46:01 +00:00
Pengda Xie 612ae28524 SWDEV-505833 - Provide functionality to avoid L2 flush for CPX mode for dispatch packets
- Added DEBUG_CLR_SKIP_RELEASE_SCOPE flag to force release scope to
   SCOPE_NONE in AQL packet header

Change-Id: Ife02cddb9d5cd4749103ce585d3d5fe9024c6868


[ROCm/clr commit: 8155943c5f]
2025-01-03 17:28:21 -05:00
Ioannis Assiouras 2c8805e536 SWDEV-483134 - Remove hipExtHostAlloc API
Change-Id: I60777ef5c56b60dd8100d0d794ca10fb3b96a555


[ROCm/clr commit: e8b2fdab96]
2024-12-16 17:13:49 -05:00
Saleel Kudchadker 7d7aa8b69c SWDEV-497145 - Use rocr copyOnEngine API for staged copies
- Refactor blit code and clean ASAN instrumentation
- Use unified function for rocr copy
- Enable shader copy path for unpinned writeBuffer/readBuffer paths
- Set GPU_FORCE_BLIT_COPY_SIZE=16 which means we will use BLIT copy for
  pinned copies or unpinned H2D/D2H copies < 16KB

Change-Id: I42045cca79234b340dbf53dafb93044199736ae4


[ROCm/clr commit: 7863eb92dc]
2024-12-04 13:38:13 -05:00
Satyanvesh Dittakavi 5a16db0cd5 SWDEV-477584 - Match hipGetLastError behavior with CUDA using env var
Change-Id: I4c5acff180ae904028f7c5fdf4e109ffd1f0c4ef


[ROCm/clr commit: e3b8754448]
2024-11-28 01:33:52 -05:00
German Andryeyev a9daa4c8f4 SWDEV-486602 - Disable sysmem pool
Currently amd::Monitor can work in FILO mode for the active waits
and cause a delay in wakeup of some threads. That may cause a problem
with the current sysmem pool design.

Change-Id: I145081478d1e0b282d8838855c5718f09cf54b69


[ROCm/clr commit: 9473f143c2]
2024-11-20 11:35:28 -05:00
taosang2 7169a92488 SWDEV-487356 - Fix AMD LOG compiling warning
Change-Id: I757185f9c7c12f736e266219b67daf5836d2a125


[ROCm/clr commit: cc25c5d646]
2024-11-09 12:57:22 -05:00
Saleel Kudchadker 672e4fa835 SWDEV-446123 - Revert "Match hipGetLastError behavior with CUDA using env var"
This reverts commit 941cfd5b36.

Reason for revert: <INSERT REASONING HERE>

Change-Id: I11a456655393bcf4b82d749ce7259bc1b78d1424


[ROCm/clr commit: 582dc7dd6d]
2024-11-08 20:35:13 -05:00
Satyanvesh Dittakavi 941cfd5b36 SWDEV-446123 - Match hipGetLastError behavior with CUDA using env var
Change-Id: Iaec697c1304d746376ecf2bfe2ad683b15ee189f


[ROCm/clr commit: 5f477900a3]
2024-11-07 12:02:34 -05:00
Tao Sang 5fe3dc5bf9 SWDEV-487356 - Fix AMD LOG issue in Win32
Change-Id: Ia1c19cf4ea24188cdb2d374b01f975f794e02dbf


[ROCm/clr commit: 802cacf3e9]
2024-11-01 08:26:25 -04:00
German Andryeyev 4a2687a450 SWDEV-486602 - Fix Windows 32 bit build
Windows aligns fields to 8 bytes even with 32bit builds.
Add BUG_CLR_SYSMEM_POOL to control sysmempool.

Change-Id: I8622aabc9f7391ed7dd8583b252ce9eb41d62293


[ROCm/clr commit: 6bb7d1afdc]
2024-10-18 11:35:54 -04:00
German Andryeyev 0a03665a3f SWDEV-491375 - Limit the SW batch size
Applications may submit commands without waits
for GPU. That causes a growth of SW unreleased commands.
Make sure runtime flushes SW queue, if it grows over some
threshold, controlled by DEBUG_CLR_MAX_BATCH_SIZE.

Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396


[ROCm/clr commit: 8657a77029]
2024-10-17 10:53:57 -04:00
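The threshold-based flush described in commit 0a03665a3f can be sketched as below. The class `SoftwareQueue` and its callback are hypothetical; in the runtime the limit comes from DEBUG_CLR_MAX_BATCH_SIZE, and whether the comparison is strict or inclusive is an assumption made here.

```python
class SoftwareQueue:
    """Accumulate commands and flush once the pending batch gets too large."""

    def __init__(self, max_batch_size, on_flush):
        self.max_batch_size = max_batch_size  # stands in for DEBUG_CLR_MAX_BATCH_SIZE
        self._on_flush = on_flush             # receives each flushed batch
        self._pending = []

    def submit(self, command):
        self._pending.append(command)
        # Assumed policy: flush as soon as the batch reaches the limit.
        if len(self._pending) >= self.max_batch_size:
            self.flush()

    def flush(self):
        if self._pending:
            batch, self._pending = self._pending, []
            self._on_flush(batch)


flushed = []
queue = SoftwareQueue(max_batch_size=3, on_flush=flushed.append)
for command in range(7):
    queue.submit(command)
print(flushed)  # two full batches; one command is still pending
```

Capping the batch bounds the number of unreleased commands even when the application never waits on the GPU.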
German Andryeyev faea40cbb3 SWDEV-486602 - Optimize HSA callback performance
- Don't generate callbacks for HIP events
- Don't process profiling info in the callback for HIP events
- Wait for CPU status update of the submitted commands
every 50 calls. That will allow to drain the commands and
destroy HSA signals.

Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9


[ROCm/clr commit: 364dfb0ed1]
2024-10-11 14:50:25 -04:00
Saleel Kudchadker b9497ea70e SWDEV-301667 - Enable ROCr logging
- Use AMD_LOG_LEVEL=5 to dump AQL packets in ROCr

Change-Id: I2c044a5304c4eaf3d3af20e62d1f54c98d4fbaa4


[ROCm/clr commit: e36666e536]
2024-10-04 19:22:12 -04:00
Saleel Kudchadker 5296c77138 SWDEV-301667 - Logging upgrades
- Use AMD_LOG_LEVEL_SIZE in MBs to set log file size truncation, by default it is 2048 MB

Change-Id: Ia2f87e8c6b94148e30edfb602b279f93630817c3


[ROCm/clr commit: 35e03ea0d0]
2024-10-04 13:26:25 -04:00
pghafari 3fc58e93b3 SWDEV-444447 - Fix regression for verbose printing for AMD_LOG_LEVEL=4
Change-Id: Id245caef711b7ccdf4e999e934993beb43d7c3d5


[ROCm/clr commit: 365ffd4805]
2024-09-18 13:08:10 -04:00
Saleel Kudchadker 343bdf3187 SWDEV-478624 - Use readback workaround to ensure kernel arg coherence
Use env var DEBUG_CLR_KERNARG_HDP_FLUSH_WA=1 to fall back to HDP flush
workaround. The default is 0

Change-Id: I7bdb9be61da60c30d15ac9991b7cd27351e1831c


[ROCm/clr commit: 9de6d4d46c]
2024-09-11 14:53:15 -04:00