rocm-systems

Author	SHA1	Message	Date
Xie, AlexBin	e22c9b457e	SWDEV-576718 - provide option to limit memory cache usage (#2810 ) * SWDEV-576718 - provide option to limit memory cache usage * SWDEV-576718 - Use MiB instead of MB in description	2026-01-26 11:35:01 -05:00
SaleelK	340f3aa887	clr: Implement dynamic stream to HWq logic (#1958 ) * clr: Implement dynamic stream to HW queue assignment This change implements dynamic stream to hardware queue (HWq) mapping with the following features: * Queue depth heuristics with weights for optimal HWq assignment * Make last used queue sticky for better locality * Use pipe HWq to pipe mapping - gfx9 follows a round-robin queue to pipe mapping based on creation order (single process per device only, as pipe ID is statically assigned by runtime) * More aggressive heuristic usage for better queue distribution * Extend dynamic queues support for all stream priorities Environment variables: * DEBUG_HIP_DYNAMIC_QUEUE: 0 - disabled, 1 - Depth heuristics 2 - Depth+Pipe heuristics * DEBUG_HIP_IGNORE_STREAM_PRIORITY=1: ignore priority stream creation * clr: Clean up last_used_queue_	2026-01-23 10:40:54 -08:00
Godavarthy Surya, Anusha	1ef6a86ee3	SWDEV-549711 - Improve graph DEBUG dot print for segments (#2205 ) Co-authored-by: Anusha GodavarthySurya<agodavar@amd.com>	2026-01-07 14:07:49 +05:30
SaleelK	c105dcd05b	clr: Use graph segment scheduling to process HIP Graphs (#1372 ) * clr: Use graph segment scheduling to process HIP Graphs * Add a broader path to use capture packet capture for all topologies * Refactor code * Use DEBUG_HIP_GRAPH_SEGMENT_SCHEDULING to toggle new vs classic path, Enabled by default * clr: Few fixes and improvements * clr: Detect complex graphs to take classic path * Use DEBUG_HIP_GRAPH_SEGMENT_SCHEDULING=2 to force segment scheduling path * clr: Fix a corner-case stack corruption * clr: Track commands of segments instead of snapshots * clr: Fix Batch dispatch logic * Track fence_dirty_ flag for command of other streams * Dependency resolution markers can now accommodate dirty fence on cross streams --------- Co-authored-by: Ioannis Assiouras <Ioannis.Assiouras@amd.com> Co-authored-by: Godavarthy Surya, Anusha <agodavar@amd.com>	2025-12-01 12:49:26 -08:00
Todd tiantuo Li	ee48f6221d	SWDEV-562708 - change default maximum SVM size to 256GB (#1731 )	2025-11-25 23:59:39 -08:00
German Andryeyev	2c5754844f	SWDEV-465041 - Enable direct dispatch under Linux by default. (#1934 )	2025-11-25 11:30:32 -05:00
SaleelK	738bb19835	clr: Increase kernelArg/managedBuffer size (#1586 ) * Increase the buffer to 4MB. That can help kernel launches limited by a deep kernel pipeline Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>	2025-11-08 18:32:43 -08:00
Pengda Xie	a4bbd73dc6	SWDEV-556684 - Remove HSAIL support (#1183 )	2025-10-23 11:21:49 -07:00
SaleelK	149dc17c90	clr: Optimize doorbell ring (#1030 ) Lay foundation to batch packets efficiently for graphs Dynamically copy packets with max threshold set with DEBUG_HIP_GRAPH_BATCH_SIZE, if not stagger packet copy with pow2 Default threshold for DEBUG_HIP_GRAPH_BATCH_SIZE is 256 If TS are not collected for a signal for reuse, create a new signal. This can potentially increase signal footprint if the handler doesn't run fast enough.	2025-09-18 15:02:10 -07:00
Danylo Lytovchenko	2ff2316227	Adjust clang format to the new versions, revert broken macro layout (#714 )	2025-08-22 17:23:22 +02:00
Danylo Lytovchenko	f7338717ae	SWDEV-470698 - fix formatting, add format check workflow (#657 )	2025-08-20 19:58:06 +05:30
Kudchadker, Saleel	3a849c6962	SWDEV-538195 - Introduce threshold for handler submission (#723 ) - When doing device/stream sync, we can submit a handler which may introduce some host side delays. Use DEBUG_CLR_BATCH_CPU_SYNC_SIZE to batch commands for host wait. Default for HIP is 8 commands. - Investigation is underway in ROCr but need to address this for now in HIP runtime. [ROCm/clr commit: `9b045922a8`]	2025-08-06 20:34:42 -07:00
Xie, Pengda	b7d8cb56d1	SWDEV-505833 - Remove DEBUG_CLR_SKIP_RELEASE_SCOPE flag (#735 ) Cleanup debug flag DEBUG_CLR_SKIP_RELEASE_SCOPE [ROCm/clr commit: `4121a860bf`]	2025-08-05 08:31:55 -07:00
Belton-Schure, Aidan	88c1717658	SWDEV-515426 - Remove HIP_USE_RUNTIME_UNBUNDLER (#205 ) * remove HIP_USE_RUNTIME_UNBUNDLER * clang-format * Generic to use comgr * Remove HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION flag * Removes runtime unbundling unused and debug Code * Removes stale functions [ROCm/clr commit: `81238db679`]	2025-07-08 21:45:31 +05:30
Harrymanoharan, Jessey	1868e4e595	SWDEV-531711 - Enable skipping host side abort when GPU crashes. (#380 ) Co-authored-by: kjayapra-amd <karthik.jayaprakash@amd.com> [ROCm/clr commit: `3930ae2524`]	2025-05-26 17:52:02 +05:30
Dittakavi, Satyanvesh	1cc35da9be	SWDEV-438790 - Remove DEBUG_HIP_7_PREVIEW env var keeping the hipGetLastError changes by default (#337 ) [ROCm/clr commit: `664bf232dd`]	2025-05-21 22:12:45 +05:30
Andryeyev, German	c512258e45	SWDEV-528808 - Disable dynamic queue by default (#256 ) Dynamic queue management will be disabled by default and the original sort logic is restored [ROCm/clr commit: `9b018165ce`]	2025-05-05 10:56:35 -04:00
Sang, Tao	68deb3d10a	SWDEV-520352 - Remove HostThread and legacy monitor (#230 ) * SWDEV-520352 - Remove HostThread and legacy monitor Remove HostThread, semaphore and legacy monitor. Make the original thread and command queue logic stricter. Add more comments to make the logic clearer. Some other minor improvements. Also part of SWDEV-458943. [ROCm/clr commit: `96cadbc9e9`]	2025-04-29 09:55:24 -04:00
Jayaprakash, Karthik	49a527c826	SWDEV-506467 - Skip Abort in case of crash from the device. (#60 ) Change-Id: I964b2f2647d068202e9c38fcddb1337da754df8d [ROCm/clr commit: `b2388dfb88`]	2025-04-29 11:19:02 +05:30
Andryeyev, German	f8344154a0	SWDEV-497841 - Enable memory manager by default (#149 ) [ROCm/clr commit: `a5c860f3b0`]	2025-04-22 21:20:37 +05:30
Chaudhary, Jatin Jaikishan	e9e207d7b0	SWDEV-517941 - use device bitcode before spirv (#95 ) Also add flag: HIP_FORCE_SPIRV_CODEOBJECT to allow override to force use SPIRV. * use cache for already compiled code objects * address review comments and use the two spirv isa names [ROCm/clr commit: `07e57a1f0d`]	2025-04-14 23:40:52 +01:00
Andryeyev, German	5c7c86f66d	SWDEV-517481 - Add dynamic queue management (#37 ) Enabled by default. DEBUG_HIP_DYNAMIC_QUEUES controls the feature [ROCm/clr commit: `28967982b2`]	2025-03-19 11:22:50 -04:00
German Andryeyev	f9d9b2c441	SWDEV-497841 - Add virtual memory heap Add initial implementation of virtual memory heap with dynamic virtual memory mapping support for memory pools. DEBUG_HIP_MEM_POOL_VMHEAP controls the new method. Change-Id: I8dc5be2e0f34ab472f1800f43bb6243639a5e500 [ROCm/clr commit: `296dce5570`]	2025-02-20 10:55:49 -05:00
Tao Sang	7803594aea	SWDEV-458943 - Add fast path in wait() wait() is redesigned with two paths: fast path: Use spinlock to wait for notify signal. If the signal hasn't been received for some loops, go to slow path. slow path: Use condition_variable's wait(). Improve monitor wrapper for better performance. Fix some bugs left from name removing patch. Change-Id: I893a8353121a25d11e37c8e631caf31cc1fc1f24 [ROCm/clr commit: `f2ff56af9c`]	2025-01-28 12:19:55 -05:00
Saleel Kudchadker	d4594531ef	SWDEV-506251 - Disable blit copy threshold for OpenCL Change-Id: Id0ca43b13d5792791a42da263f6aa4496382cea6 [ROCm/clr commit: `39801b5750`]	2025-01-08 02:46:01 +00:00
Pengda Xie	612ae28524	SWDEV-505833 - Provide functionality to avoid L2 flush for CPX mode for dispatch packets - Added DEBUG_CLR_SKIP_RELEASE_SCOPE flag to force release scope to SCOPE_NONE in AQL packet header Change-Id: Ife02cddb9d5cd4749103ce585d3d5fe9024c6868 [ROCm/clr commit: `8155943c5f`]	2025-01-03 17:28:21 -05:00
Ioannis Assiouras	2c8805e536	SWDEV-483134 - Remove hipExtHostAlloc API Change-Id: I60777ef5c56b60dd8100d0d794ca10fb3b96a555 [ROCm/clr commit: `e8b2fdab96`]	2024-12-16 17:13:49 -05:00
Saleel Kudchadker	7d7aa8b69c	SWDEV-497145 - Use rocr copyOnEngine API for staged copies - Refactor blit code and clean ASAN instrumentation - Use unified function for rocr copy - Enable shader copy path for unpinned writeBuffer/readBuffer paths - Set GPU_FORCE_BLIT_COPY_SIZE=16 which means we will use BLIT copy for pinned copies or unpinned H2D/D2H copies < 16KB Change-Id: I42045cca79234b340dbf53dafb93044199736ae4 [ROCm/clr commit: `7863eb92dc`]	2024-12-04 13:38:13 -05:00
Satyanvesh Dittakavi	5a16db0cd5	SWDEV-477584 - Match hipGetLastError behavior with CUDA using env var Change-Id: I4c5acff180ae904028f7c5fdf4e109ffd1f0c4ef [ROCm/clr commit: `e3b8754448`]	2024-11-28 01:33:52 -05:00
German Andryeyev	a9daa4c8f4	SWDEV-486602 - Disable sysmem pool Currently amd::Monitor can work in FILO mode for the active waits and cause a delay in wakeup of some threads. That may have a problem with the current sysmem pool design. Change-Id: I145081478d1e0b282d8838855c5718f09cf54b69 [ROCm/clr commit: `9473f143c2`]	2024-11-20 11:35:28 -05:00
Saleel Kudchadker	672e4fa835	SWDEV-446123 - Revert "Match hipGetLastError behavior with CUDA using env var" This reverts commit `941cfd5b36`. Reason for revert: <INSERT REASONING HERE> Change-Id: I11a456655393bcf4b82d749ce7259bc1b78d1424 [ROCm/clr commit: `582dc7dd6d`]	2024-11-08 20:35:13 -05:00
Satyanvesh Dittakavi	941cfd5b36	SWDEV-446123 - Match hipGetLastError behavior with CUDA using env var Change-Id: Iaec697c1304d746376ecf2bfe2ad683b15ee189f [ROCm/clr commit: `5f477900a3`]	2024-11-07 12:02:34 -05:00
German Andryeyev	4a2687a450	SWDEV-486602 - Fix Windows 32 bit build Windows aligns fields to 8 bytes even with 32bit builds. Add BUG_CLR_SYSMEM_POOL to control sysmempool. Change-Id: I8622aabc9f7391ed7dd8583b252ce9eb41d62293 [ROCm/clr commit: `6bb7d1afdc`]	2024-10-18 11:35:54 -04:00
German Andryeyev	0a03665a3f	SWDEV-491375 - Limit the SW batch size Applications may submit commands without waits for GPU. That causes a growth of SW unreleased commands. Make sure runtime flushes SW queue, if it grows over some threshold, controlled by DEBUG_CLR_MAX_BATCH_SIZE. Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396 [ROCm/clr commit: `8657a77029`]	2024-10-17 10:53:57 -04:00
German Andryeyev	faea40cbb3	SWDEV-486602 - Optimize HSA callback performance - Don't generate callbacks for HIP events - Don't process profiling info in the callback for HIP events - Wait for CPU status update of the submitted commands every 50 calls. That will allow to drain the commands and destroy HSA signals. Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9 [ROCm/clr commit: `364dfb0ed1`]	2024-10-11 14:50:25 -04:00
Saleel Kudchadker	5296c77138	SWDEV-301667 - Logging upgrades - Use AMD_LOG_LEVEL_SIZE in MBs to set log file size truncation, by default it's 2048 MB Change-Id: Ia2f87e8c6b94148e30edfb602b279f93630817c3 [ROCm/clr commit: `35e03ea0d0`]	2024-10-04 13:26:25 -04:00
Saleel Kudchadker	343bdf3187	SWDEV-478624 - Use readback workaround to ensure kernel arg coherence Use env var DEBUG_CLR_KERNARG_HDP_FLUSH_WA=1 to fall back to HDP flush workaround. The default is 0 Change-Id: I7bdb9be61da60c30d15ac9991b7cd27351e1831c [ROCm/clr commit: `9de6d4d46c`]	2024-09-11 14:53:15 -04:00
victzhan	fde29b7c06	Revert "SWDEV-458943 - make new AMD_MONITOR on" This reverts commit `47dcfbae6b`. Change-Id: I2a7ddb2d4340224f43749a2ea91a894a8a95b83b [ROCm/clr commit: `7a01db98e9`]	2024-09-05 10:10:50 -04:00
Ioannis Assiouras	a00f071579	SWDEV-470372 - Added hipExtHostAlloc API This change adds a new HIP API `hipExtHostAlloc` which preserves the functionality of `hipHostMalloc`. Change-Id: I13504c6fc13465ddd7aed329795bb4f2fef1baff [ROCm/clr commit: `2c84211b58`]	2024-08-27 08:26:03 -04:00
German Andryeyev	35c7a87014	SWDEV-470612 - Add the optimized multistream path - Added the optimized multi stream path in graph execution. It uses a fixed number of async streams in the execution - Optimize the launch latency, where commands creation and execution is done at the same time - Optimize the scheduling to use less barriers and waiting signals if the same queue can be detected - The new path is controlled by DEBUG_HIP_FORCE_GRAPH_QUEUES environment variable, where 0 will use the original path and any other value will force the number of asynchronous queues for execution - DEBUG_HIP_FORCE_ASYNC_QUEUE can force single queue async execution in graphs(applicable for Navi families only) Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e [ROCm/clr commit: `9db52f9a46`]	2024-08-02 14:19:44 -04:00
Saleel Kudchadker	16920809d7	SWDEV-301667 - Refactor Blit force env var Change-Id: I5344ac2e6442cd8f526118e688f1b1412cc5b45a [ROCm/clr commit: `d379f4efd0`]	2024-07-25 15:15:10 -04:00
taosang2	47dcfbae6b	SWDEV-458943 - make new AMD_MONITOR on make DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR be true Change-Id: I1d21378ff462478d3238d71e4e2a1a7d6b9167ac [ROCm/clr commit: `f8598dabb0`]	2024-07-24 14:29:27 -04:00
Tao Sang	b8cf863eaa	SWDEV-458943 - Implement std::mutex based monitor Implement std::mutex based monitor that has much simpler logics than legacy monitor. Create DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR to toggle them. If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = false (by default), use legacy monitor; If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = true, use std::mutex based monitor. If no perf drop of stl::mutex based monitor, legacy one will be removed later. Change-Id: I1d21368ff462477d3238d71e4e2a1a7d6b9167ad [ROCm/clr commit: `73c02041e1`]	2024-07-04 11:50:46 -04:00
Ioannis Assiouras	a774b89a43	SWDEV-470787 - Fixed undefined symbols for flags in static build Change-Id: I7812c8924396d0df9ab331f9a1844aabbf5a9211 [ROCm/clr commit: `fa07c33cba`]	2024-07-04 02:57:22 -04:00
Ioannis Assiouras	af089a2171	SWDEV-463865 - namespace changes to prevent symbol conflicts in static builds Change-Id: I09ceb5962b7aa19156909f47167c87d6887c9cd1 [ROCm/clr commit: `3edf1501cc`]	2024-06-12 16:22:27 -04:00
kjayapra-amd	41cb6dadf9	SWDEV-460948 - Changes to alloc, set, capture under single function. Change-Id: I7b2d40e99e812b97c53535c5e63c41ad64a8f543 [ROCm/clr commit: `892071aeb2`]	2024-06-06 16:57:53 -04:00
Tao Sang	7bf8d102fc	SWDEV-433371 - Support new comgr unbundling action Support new comgr unbundling action api to extract code objects in compressed and uncompressed modes. Create HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION ENV to toggle new path and old path. If HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION=false(default), uncompressed codeobject will go old path for better perf, compressed codeobject will go new path. If HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION=true, both uncompressed and compressed codeobjects will go new path. Add comgr wrapper for amd_comgr_action_info_set_bundle_entry_ids() Change-Id: I79952f132fe21249296685ee12cae05a4f9aec32 [ROCm/clr commit: `d0050ce309`]	2024-05-28 06:31:10 +00:00
Tao Sang	5bf67d7da7	Revert "SWDEV-433371 - use comgr to unbundle code objects" This reverts commit `c0ee0ffa1c`. Reason for revert: New comgr unbundling action leads to perf drop for uncompressed code object. Will create a new patch to use old path for uncompressed, new unbundling api for compressed. Change-Id: I41ef53b71fc9f7aaa8cf231d4d70945f1117db52 [ROCm/clr commit: `a1350fe8c1`]	2024-05-28 06:31:10 +00:00
taosang2	c0ee0ffa1c	SWDEV-433371 - use comgr to unbundle code objects 1.Make runtime use comgr to unbundle code objects 2.Support compressed/uncompressed modes 3.Remove HIP_USE_RUNTIME_UNBUNDLER and HIPRTC_USE_RUNTIME_UNBUNDLER to simplify logics 4.Add comgr wrapper for amd_comgr_action_info_set_bundle_entry_ids() Change-Id: Ic41b1ad1b64cca1e31986437983a5146d52a7329 [ROCm/clr commit: `e53df57ffe`]	2024-05-01 16:09:12 -04:00
German Andryeyev	daceede8a7	SWDEV-311271 - Enable mempools under Linux Change-Id: I7fda94e61121f9d3a30f4ad185b8a97712922f3c [ROCm/clr commit: `7a371503b2`]	2024-04-29 18:06:34 -04:00


140 Commits