rocm-systems

Author	SHA1	Message	Date
Chaudhary, Jatin Jaikishan	07e57a1f0d	SWDEV-517941 - use device bitcode before spirv (#95) Also add flag: HIP_FORCE_SPIRV_CODEOBJECT to allow override to force use SPIRV. * use cache for already compiled code objects * address review comments and use the two spirv isa names	2025-04-14 23:40:52 +01:00
Andryeyev, German	28967982b2	SWDEV-517481 - Add dynamic queue management (#37) Enabled by default. DEBUG_HIP_DYNAMIC_QUEUES controls the feature	2025-03-19 11:22:50 -04:00
German Andryeyev	cece301fd4	SWDEV-518474 - Add comgr debug mask Move prints from CO processing under COMGR debug mask. Change-Id: I2a417e42a1f4e2922a34eb104c69e4db10b5f1c6	2025-03-04 14:37:08 -05:00
German Andryeyev	296dce5570	SWDEV-497841 - Add virtual memory heap Add initial implementation of virtual memory heap with dynamic virtual memory mapping support for memory pools. DEBUG_HIP_MEM_POOL_VMHEAP controls the new method. Change-Id: I8dc5be2e0f34ab472f1800f43bb6243639a5e500	2025-02-20 10:55:49 -05:00
Tao Sang	f2ff56af9c	SWDEV-458943 - Add fast path in wait() wait() is redesigned with two paths: fast path: Use spinlock to wait for notify signal. If the signal hasn't been received for some loops, go to slow path. slow path: Use condition_variable's wait(). Improve monitor wrapper for better performance. Fix some bugs left from name removing patch. Change-Id: I893a8353121a25d11e37c8e631caf31cc1fc1f24	2025-01-28 12:19:55 -05:00
Branislav Brzak	3fd46a3783	SWDEV-508743 - [6.4 Preview] Add ROCm 7.0 breaking change fields Change-Id: I07bff42731e74a4c409505cf8981342e22ce26be	2025-01-17 06:25:27 -05:00
Saleel Kudchadker	39801b5750	SWDEV-506251 - Disable blit copy threshold for OpenCL Change-Id: Id0ca43b13d5792791a42da263f6aa4496382cea6	2025-01-08 02:46:01 +00:00
Pengda Xie	8155943c5f	SWDEV-505833 - Provide functionality to avoid L2 flush for CPX mode for dispatch packets - Added DEBUG_CLR_SKIP_RELEASE_SCOPE flag to force release scope to SCOPE_NONE in AQL packet header Change-Id: Ife02cddb9d5cd4749103ce585d3d5fe9024c6868	2025-01-03 17:28:21 -05:00
Ioannis Assiouras	e8b2fdab96	SWDEV-483134 - Remove hipExtHostAlloc API Change-Id: I60777ef5c56b60dd8100d0d794ca10fb3b96a555	2024-12-16 17:13:49 -05:00
Saleel Kudchadker	7863eb92dc	SWDEV-497145 - Use rocr copyOnEngine API for staged copies - Refactor blit code and clean ASAN instrumentation - Use unified function for rocr copy - Enable shader copy path for unpinned writeBuffer/readBuffer paths - Set GPU_FORCE_BLIT_COPY_SIZE=16 which means we will use BLIT copy for pinned copies or unpinned H2D/D2H copies < 16KB Change-Id: I42045cca79234b340dbf53dafb93044199736ae4	2024-12-04 13:38:13 -05:00
Satyanvesh Dittakavi	e3b8754448	SWDEV-477584 - Match hipGetLastError behavior with CUDA using env var Change-Id: I4c5acff180ae904028f7c5fdf4e109ffd1f0c4ef	2024-11-28 01:33:52 -05:00
German Andryeyev	9473f143c2	SWDEV-486602 - Disable sysmem pool Currently amd::Monitor can work in FILO mode for the active waits and cause a delay in wakeup of some threads. That may have a problem with the current sysmem pool design. Change-Id: I145081478d1e0b282d8838855c5718f09cf54b69	2024-11-20 11:35:28 -05:00
taosang2	cc25c5d646	SWDEV-487356 - Fix AMD LOG compiling warning Change-Id: I757185f9c7c12f736e266219b67daf5836d2a125	2024-11-09 12:57:22 -05:00
Saleel Kudchadker	582dc7dd6d	SWDEV-446123 - Revert "Match hipGetLastError behavior with CUDA using env var" This reverts commit `5f477900a3`. Reason for revert: <INSERT REASONING HERE> Change-Id: I11a456655393bcf4b82d749ce7259bc1b78d1424	2024-11-08 20:35:13 -05:00
Satyanvesh Dittakavi	5f477900a3	SWDEV-446123 - Match hipGetLastError behavior with CUDA using env var Change-Id: Iaec697c1304d746376ecf2bfe2ad683b15ee189f	2024-11-07 12:02:34 -05:00
Tao Sang	802cacf3e9	SWDEV-487356 - Fix AMD LOG issue in Win32 Change-Id: Ia1c19cf4ea24188cdb2d374b01f975f794e02dbf	2024-11-01 08:26:25 -04:00
German Andryeyev	6bb7d1afdc	SWDEV-486602 - Fix Windows 32 bit build Windows aligns fields to 8 bytes even with 32bit builds. Add BUG_CLR_SYSMEM_POOL to control sysmempool. Change-Id: I8622aabc9f7391ed7dd8583b252ce9eb41d62293	2024-10-18 11:35:54 -04:00
German Andryeyev	8657a77029	SWDEV-491375 - Limit the SW batch size Applications may submit commands without waits for GPU. That causes a growth of SW unreleased commands. Make sure runtime flushes SW queue, if it grows over some threshold, controlled by DEBUG_CLR_MAX_BATCH_SIZE. Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396	2024-10-17 10:53:57 -04:00
German Andryeyev	364dfb0ed1	SWDEV-486602 - Optimize HSA callback performance - Don't generate callbacks for HIP events - Don't process profiling info in the callback for HIP events - Wait for CPU status update of the submitted commands every 50 calls. That will allow to drain the commands and destroy HSA signals. Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9	2024-10-11 14:50:25 -04:00
Saleel Kudchadker	e36666e536	SWDEV-301667 - Enable ROCr logging - Use AMD_LOG_LEVEL=5 to dump AQL packets in ROCr Change-Id: I2c044a5304c4eaf3d3af20e62d1f54c98d4fbaa4	2024-10-04 19:22:12 -04:00
Saleel Kudchadker	35e03ea0d0	SWDEV-301667 - Logging upgrades - Use AMD_LOG_LEVEL_SIZE in MBs to set log file size truncation, by default it's 2048 MB Change-Id: Ia2f87e8c6b94148e30edfb602b279f93630817c3	2024-10-04 13:26:25 -04:00
pghafari	365ffd4805	SWDEV-444447 - Fix regression for verbose printing for AMD_LOG_LEVEL=4 Change-Id: Id245caef711b7ccdf4e999e934993beb43d7c3d5	2024-09-18 13:08:10 -04:00
Saleel Kudchadker	9de6d4d46c	SWDEV-478624 - Use readback workaround to ensure kernel arg coherence Use env var DEBUG_CLR_KERNARG_HDP_FLUSH_WA=1 to fall back to HDP flush workaround. The default is 0 Change-Id: I7bdb9be61da60c30d15ac9991b7cd27351e1831c	2024-09-11 14:53:15 -04:00
victzhan	7a01db98e9	Revert "SWDEV-458943 - make new AMD_MONITOR on" This reverts commit `f8598dabb0`. Change-Id: I2a7ddb2d4340224f43749a2ea91a894a8a95b83b	2024-09-05 10:10:50 -04:00
Ioannis Assiouras	2c84211b58	SWDEV-470372 - Added hipExtHostAlloc API This change adds a new HIP API `hipExtHostAlloc` which preserves the functionality of `hipHostMalloc`. Change-Id: I13504c6fc13465ddd7aed329795bb4f2fef1baff	2024-08-27 08:26:03 -04:00
Ajay	e07172ff57	SWDEV-478881 - Fix log AMD_LOG file corruption hiprtc and hip APIs use the same file. Append to file instead of start of file Change-Id: I2703f9bb67f0c51b557a058daab129679a0b5dd9	2024-08-23 11:19:48 -04:00
German Andryeyev	9db52f9a46	SWDEV-470612 - Add the optimized multistream path - Added the optimized multi stream path in graph execution. It uses a fixed number of async streams in the execution - Optimize the launch latency, where commands creation and execution is done at the same time - Optimize the scheduling to use less barriers and waiting signals if the same queue can be detected - The new path is controlled by DEBUG_HIP_FORCE_GRAPH_QUEUES environment variable, where 0 will use the original path and any other value will force the number of asynchronous queues for execution - DEBUG_HIP_FORCE_ASYNC_QUEUE can force single queue async execution in graphs(applicable for Navi families only) Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e	2024-08-02 14:19:44 -04:00
Saleel Kudchadker	d379f4efd0	SWDEV-301667 - Refactor Blit force env var Change-Id: I5344ac2e6442cd8f526118e688f1b1412cc5b45a	2024-07-25 15:15:10 -04:00
taosang2	f8598dabb0	SWDEV-458943 - make new AMD_MONITOR on make DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR be true Change-Id: I1d21378ff462478d3238d71e4e2a1a7d6b9167ac	2024-07-24 14:29:27 -04:00
pghafari	9e6e77b7dd	SWDEV-444447 - log print pid/tid only in verbose mode Change-Id: I2bbe9085d607e9d8d5acda1ed43e3245335d239f	2024-07-11 15:39:13 -04:00
Tao Sang	73c02041e1	SWDEV-458943 - Implement std::mutex based monitor Implement std::mutex based monitor that has much simpler logic than legacy monitor. Create DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR to toggle them. If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = false (by default), use legacy monitor; If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = true, use std::mutex based monitor. If no perf drop of std::mutex based monitor, legacy one will be removed later. Change-Id: I1d21368ff462477d3238d71e4e2a1a7d6b9167ad	2024-07-04 11:50:46 -04:00
Ioannis Assiouras	fa07c33cba	SWDEV-470787 - Fixed undefined symbols for flags in static build Change-Id: I7812c8924396d0df9ab331f9a1844aabbf5a9211	2024-07-04 02:57:22 -04:00
Ioannis Assiouras	3edf1501cc	SWDEV-463865 - namespace changes to prevent symbol conflicts in static builds Change-Id: I09ceb5962b7aa19156909f47167c87d6887c9cd1	2024-06-12 16:22:27 -04:00
kjayapra-amd	892071aeb2	SWDEV-460948 - Changes to alloc, set, capture under single function. Change-Id: I7b2d40e99e812b97c53535c5e63c41ad64a8f543	2024-06-06 16:57:53 -04:00
Ioannis Assiouras	b8c2ac4de4	SWDEV-463865 - symbol renamings to prevent conflicts in static build Change-Id: Id7fbb638c1088c23df52fee877cd790d637b1ffb	2024-06-06 04:05:55 -04:00
Tao Sang	d0050ce309	SWDEV-433371 - Support new comgr unbundling action Support new comgr unbundling action api to extract code objects in compressed and uncompressed modes. Create HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION ENV to toggle new path and old path. If HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION=false(default), uncompressed codeobject will go old path for better perf, compressed codeobject will go new path. If HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION=true, both uncompressed and compressed codeobjects will go new path. Add comgr wrapper for amd_comgr_action_info_set_bundle_entry_ids() Change-Id: I79952f132fe21249296685ee12cae05a4f9aec32	2024-05-28 06:31:10 +00:00
Tao Sang	a1350fe8c1	Revert "SWDEV-433371 - use comgr to unbundle code objects" This reverts commit `e53df57ffe`. Reason for revert: <INSERT REASONING HERE> New comgr unbundling action leads to perf drop for uncompressed code object. Will create a new patch to use old path for uncompressed , new unbundling api for compressed . Change-Id: I41ef53b71fc9f7aaa8cf231d4d70945f1117db52	2024-05-28 06:31:10 +00:00
Alex Xie	2eb30376ba	SWDEV-451945 - Remove ShouldLoadPlatform function Change-Id: Iabb4071bb77201576bc2c0488a04f4fa188815df	2024-05-06 10:42:59 -04:00
taosang2	e53df57ffe	SWDEV-433371 - use comgr to unbundle code objects 1.Make runtime use comgr to unbundle code objects 2.Support compressed/uncompressed modes 3.Remove HIP_USE_RUNTIME_UNBUNDLER and HIPRTC_USE_RUNTIME_UNBUNDLER to simplify logics 4.Add comgr wrapper for amd_comgr_action_info_set_bundle_entry_ids() Change-Id: Ic41b1ad1b64cca1e31986437983a5146d52a7329	2024-05-01 16:09:12 -04:00
Saleel Kudchadker	948ca5a931	SWDEV-301667 - Add LOG_TS mask - Add LOG_TS mask for printing signal times - Read raw ticks from signals Change-Id: Ibdd0bf06c790729f6c65083a4784c97a3c3219e0	2024-04-30 12:24:48 -04:00
German Andryeyev	7a371503b2	SWDEV-311271 - Enable mempools under Linux Change-Id: I7fda94e61121f9d3a30f4ad185b8a97712922f3c	2024-04-29 18:06:34 -04:00
taosang2	35c80dd482	SWDEV-424956 - Fix half vector printf issue Refactor PrintfDbg::outputArgument() to remove potential risk. Fix half vector printf issue on all devices. Fix FEAT-56794 as well. Change-Id: Iae39359d2128588def2e43d77fe58e868b8e71ff	2024-04-12 14:25:44 -04:00
German Andryeyev	f0c7ecf617	SWDEV-455254 - Add kernel arg optimization Add kernel arguments optimization into blit path. Enabled by default on MI300. Change-Id: I2694a81b90d48ad07d86dfe4c0c64fe187bada8e	2024-04-10 18:08:37 -04:00
Saleel Kudchadker	c157bfb202	SWDEV-301667 - Create TS for each node recorded in graph - Create a vector to allow multiple TS to be stored in Command. - This would mean we don't wait for entire batch in Accumulate command to finish when we exhaust signals. - Reduce the number of signals created at init to 64. This min value may still need to be tuned but the KFD allows max of 4094 interrupt signals per device. - Store kernel names whenever they are available and not just when profiling. If we dynamically enable profiling like for Torch, a crash can happen if hipGraphInstantiate wasn't included in Torch profile scope because we previously entered kernel names only when profiler is attached. Change-Id: I34e7881a25bbc763f82fdeb3408a8ea58e1ec006	2024-03-26 14:47:24 -04:00
German Andryeyev	0f3391b93e	SWDEV-311271 - Enable mempool under Windows Change-Id: Ifa4cac4a8d52e031d63f62515439ca09efe7b4cb	2024-03-11 10:45:51 -04:00
Vikram	6f390f5af9	SWDEV-424956 - Fix OpenCL printf bug while printing vectors of half type OpenCL printf handling did not process vector of half precision floats properly (mainly because compiler packs 2 halfs into a dword and runtime failed to extract the individual parts). This patch fixes the issue. Change-Id: Ia1f15ccfb5db52b71c43cfd588dd38f551ee5277	2024-03-04 03:53:18 -05:00
Saleel Kudchadker	94c7004df8	SWDEV-301667 - Increase default signal pool to 4096 Change-Id: I4ab23b0f87e295b40ab76ad6e96249d11b8ad04d	2024-02-29 22:52:02 +00:00
Saleel Kudchadker	68f40f78dd	SWDEV-443760 - Enable device kernel args for MI300 - Enable Device kernel args for MI300* for now. - Fix a perf issue which impacts graph instantiate when dev kernel args are enabled. Change-Id: I962e58fd9d8dd1a8db95e601cb03a8e9c7bac97f	2024-02-28 19:10:04 -05:00
Rahul Garg	b954d0d6e0	SWDEV-443760 - Disable HIP_FORCE_DEV_KERNARG by default Change-Id: I8c3d8e65aa954bd28499eebefbc532d1177445dc	2024-02-22 04:37:51 -05:00
Todd tiantuo Li	7bfee3481b	SWDEV-333557 - Enable PAL_HIP_IPC_FLAG by default Change-Id: Ibb2ca0b9521aff4eca190e4817dcc5f8d697b172	2024-02-20 18:45:25 -05:00

641 commits