Commit graph

646 commits

Author SHA1 Message Date
Andryeyev, German 9b018165ce SWDEV-528808 - Disable dynamic queue by default (#256)
Dynamic queue management will be disabled by default and
the original sort logic is restored
2025-05-05 10:56:35 -04:00
Sang, Tao 96cadbc9e9 SWDEV-520352 - Remove HostThread and legacy monitor (#230)
* SWDEV-520352 - Remove HostThread and legacy monitor

Remove HostThread, semaphore and legacy monitor.
Make the original thread and command queue logic stricter.
Add more comments to make the logic clearer.
Some other minor improvements.

Also part of SWDEV-458943.
2025-04-29 09:55:24 -04:00
Jayaprakash, Karthik b2388dfb88 SWDEV-506467 - Skip Abort in case of crash from the device. (#60)
Change-Id: I964b2f2647d068202e9c38fcddb1337da754df8d
2025-04-29 11:19:02 +05:30
Kudchadker, Saleel ce24936970 SWDEV-510186 - Improve logging (#220)
- Print all arguments in logs; this is useful for debugging
2025-04-25 08:40:31 -07:00
Andryeyev, German a5c860f3b0 SWDEV-497841 - Enable memory manager by default (#149) 2025-04-22 21:20:37 +05:30
Chaudhary, Jatin Jaikishan 07e57a1f0d SWDEV-517941 - use device bitcode before spirv (#95)
Also add flag: HIP_FORCE_SPIRV_CODEOBJECT to allow override to force use
SPIRV.

* use cache for already compiled code objects

* address review comments and use the two spirv isa names
2025-04-14 23:40:52 +01:00
Andryeyev, German 28967982b2 SWDEV-517481 - Add dynamic queue management (#37)
Enabled by default. DEBUG_HIP_DYNAMIC_QUEUES controls the feature
2025-03-19 11:22:50 -04:00
German Andryeyev cece301fd4 SWDEV-518474 - Add comgr debug mask
Move prints from CO processing under COMGR debug mask.

Change-Id: I2a417e42a1f4e2922a34eb104c69e4db10b5f1c6
2025-03-04 14:37:08 -05:00
German Andryeyev 296dce5570 SWDEV-497841 - Add virtual memory heap
Add initial implementation of virtual memory heap with
dynamic virtual memory mapping support for memory pools.
DEBUG_HIP_MEM_POOL_VMHEAP controls the new method.

Change-Id: I8dc5be2e0f34ab472f1800f43bb6243639a5e500
2025-02-20 10:55:49 -05:00
Tao Sang f2ff56af9c SWDEV-458943 - Add fast path in wait()
wait() is redesigned with two paths:
fast path: Use spinlock to wait for notify signal. If the
 signal hasn't been received for some loops, go to slow path.
slow path: Use condition_variable's wait().

Improve monitor wrapper for better performance.

Fix some bugs left from name removing patch.

Change-Id: I893a8353121a25d11e37c8e631caf31cc1fc1f24
2025-01-28 12:19:55 -05:00
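
The two-path wait described in the commit above can be illustrated with a minimal sketch. This is not the runtime's monitor code: the class name `TwoPathWaiter`, the `spin_limit` parameter, and the single-waiter assumption are all hypothetical and chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical two-path waiter (illustration only, assumes a single waiter).
class TwoPathWaiter {
 public:
  explicit TwoPathWaiter(int spin_limit = 1000) : spin_limit_(spin_limit) {
    assert(spin_limit >= 0);
  }

  // Returns true if the signal was consumed on the fast path (spinning),
  // false if the caller went through the condition-variable slow path.
  bool wait() {
    // Fast path: poll the flag for a bounded number of iterations.
    for (int i = 0; i < spin_limit_; ++i) {
      if (signaled_.exchange(false, std::memory_order_acquire)) return true;
      std::this_thread::yield();
    }
    // Slow path: block on the condition variable until the flag is set.
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return signaled_.load(std::memory_order_acquire); });
    signaled_.store(false, std::memory_order_relaxed);
    return false;
  }

  // Sets the flag under the mutex so a slow-path waiter cannot miss it.
  void notify() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      signaled_.store(true, std::memory_order_release);
    }
    cv_.notify_one();
  }

 private:
  int spin_limit_;
  std::atomic<bool> signaled_{false};
  std::mutex mutex_;
  std::condition_variable cv_;
};
```

Setting the flag while holding the mutex is what keeps the slow path free of lost wakeups; the spin limit trades CPU time for lower wake-up latency.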
Branislav Brzak 3fd46a3783 SWDEV-508743 - [6.4 Preview] Add ROCm 7.0 breaking change fields
Change-Id: I07bff42731e74a4c409505cf8981342e22ce26be
2025-01-17 06:25:27 -05:00
Saleel Kudchadker 39801b5750 SWDEV-506251 - Disable blit copy threshold for OpenCL
Change-Id: Id0ca43b13d5792791a42da263f6aa4496382cea6
2025-01-08 02:46:01 +00:00
Pengda Xie 8155943c5f SWDEV-505833 - Provide functionality to avoid L2 flush for CPX mode for dispatch packets
- Added DEBUG_CLR_SKIP_RELEASE_SCOPE flag to force release scope to
   SCOPE_NONE in AQL packet header

Change-Id: Ife02cddb9d5cd4749103ce585d3d5fe9024c6868
2025-01-03 17:28:21 -05:00
Ioannis Assiouras e8b2fdab96 SWDEV-483134 - Remove hipExtHostAlloc API
Change-Id: I60777ef5c56b60dd8100d0d794ca10fb3b96a555
2024-12-16 17:13:49 -05:00
Saleel Kudchadker 7863eb92dc SWDEV-497145 - Use rocr copyOnEngine API for staged copies
- Refactor blit code and clean ASAN instrumentation
- Use unified function for rocr copy
- Enable shader copy path for unpinned writeBuffer/readBuffer paths
- Set GPU_FORCE_BLIT_COPY_SIZE=16 which means we will use BLIT copy for
  pinned copies or unpinned H2D/D2H copies < 16KB

Change-Id: I42045cca79234b340dbf53dafb93044199736ae4
2024-12-04 13:38:13 -05:00
Satyanvesh Dittakavi e3b8754448 SWDEV-477584 - Match hipGetLastError behavior with CUDA using env var
Change-Id: I4c5acff180ae904028f7c5fdf4e109ffd1f0c4ef
2024-11-28 01:33:52 -05:00
German Andryeyev 9473f143c2 SWDEV-486602 - Disable sysmem pool
Currently amd::Monitor can work in FILO mode for the active waits
and cause a delay in the wakeup of some threads. That may cause problems
with the current sysmem pool design.

Change-Id: I145081478d1e0b282d8838855c5718f09cf54b69
2024-11-20 11:35:28 -05:00
taosang2 cc25c5d646 SWDEV-487356 - Fix AMD LOG compile warning
Change-Id: I757185f9c7c12f736e266219b67daf5836d2a125
2024-11-09 12:57:22 -05:00
Saleel Kudchadker 582dc7dd6d SWDEV-446123 - Revert "Match hipGetLastError behavior with CUDA using env var"
This reverts commit 5f477900a3.

Reason for revert: <INSERT REASONING HERE>

Change-Id: I11a456655393bcf4b82d749ce7259bc1b78d1424
2024-11-08 20:35:13 -05:00
Satyanvesh Dittakavi 5f477900a3 SWDEV-446123 - Match hipGetLastError behavior with CUDA using env var
Change-Id: Iaec697c1304d746376ecf2bfe2ad683b15ee189f
2024-11-07 12:02:34 -05:00
Tao Sang 802cacf3e9 SWDEV-487356 - Fix AMD LOG issue in Win32
Change-Id: Ia1c19cf4ea24188cdb2d374b01f975f794e02dbf
2024-11-01 08:26:25 -04:00
German Andryeyev 6bb7d1afdc SWDEV-486602 - Fix Windows 32 bit build
Windows aligns fields to 8 bytes even with 32-bit builds.
Add BUG_CLR_SYSMEM_POOL to control sysmempool.

Change-Id: I8622aabc9f7391ed7dd8583b252ce9eb41d62293
2024-10-18 11:35:54 -04:00
German Andryeyev 8657a77029 SWDEV-491375 - Limit the SW batch size
Applications may submit commands without waiting
for the GPU. That causes growth of unreleased SW commands.
Make sure the runtime flushes the SW queue if it grows over some
threshold, controlled by DEBUG_CLR_MAX_BATCH_SIZE.

Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396
2024-10-17 10:53:57 -04:00
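
The batch-size limit described in the commit above can be sketched as follows. This is a minimal illustration, not the runtime's queue: the class `BatchedQueue`, its methods, and the choice to flush as soon as the pending count reaches the limit are assumptions; in the runtime the threshold comes from DEBUG_CLR_MAX_BATCH_SIZE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical software queue that flushes itself once too many
// commands are pending (illustration only).
class BatchedQueue {
 public:
  explicit BatchedQueue(std::size_t max_batch_size)
      : max_batch_size_(max_batch_size) {
    assert(max_batch_size > 0);
  }

  // Queues a command; flushes when the pending count reaches the limit
  // (assumed policy), even if the application never waits.
  void submit(int command) {
    pending_.push_back(command);
    if (pending_.size() >= max_batch_size_) flush();
  }

  // Stand-in for handing the batch to the device and releasing the commands.
  void flush() {
    if (pending_.empty()) return;
    pending_.clear();
    ++flush_count_;
  }

  std::size_t pending() const { return pending_.size(); }
  std::size_t flush_count() const { return flush_count_; }

 private:
  std::size_t max_batch_size_;
  std::size_t flush_count_ = 0;
  std::vector<int> pending_;
};
```

With a limit of 4, ten submissions without any explicit wait trigger two automatic flushes and leave two commands pending, so the number of unreleased commands stays bounded.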
German Andryeyev 364dfb0ed1 SWDEV-486602 - Optimize HSA callback performance
- Don't generate callbacks for HIP events
- Don't process profiling info in the callback for HIP events
- Wait for the CPU status update of the submitted commands
every 50 calls. That allows the runtime to drain the commands and
destroy HSA signals.

Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9
2024-10-11 14:50:25 -04:00
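
The "wait every 50 calls" policy from the commit above amounts to a simple call counter. The sketch below is hypothetical: `DrainPacer` and `on_submit` are illustrative names, and the actual wait for the CPU status update is left to the caller.

```cpp
#include <cassert>

// Hypothetical pacer: tells the caller when it is time to wait for the
// CPU status update and drain completed commands (illustration only).
class DrainPacer {
 public:
  explicit DrainPacer(unsigned interval) : interval_(interval) {
    assert(interval > 0);
  }

  // Call once per submission. Returns true on every interval-th call,
  // which is when the caller should wait and release finished commands.
  bool on_submit() {
    ++calls_;
    if (calls_ == interval_) {
      calls_ = 0;
      return true;
    }
    return false;
  }

 private:
  unsigned interval_;
  unsigned calls_ = 0;
};
```

Waiting only on every N-th submission keeps the common path cheap while still bounding how many completed commands and signals can accumulate.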
Saleel Kudchadker e36666e536 SWDEV-301667 - Enable ROCr logging
- Use AMD_LOG_LEVEL=5 to dump AQL packets in ROCr

Change-Id: I2c044a5304c4eaf3d3af20e62d1f54c98d4fbaa4
2024-10-04 19:22:12 -04:00
Saleel Kudchadker 35e03ea0d0 SWDEV-301667 - Logging upgrades
- Use AMD_LOG_LEVEL_SIZE in MBs to set log file size truncation; by default it's 2048 MB

Change-Id: Ia2f87e8c6b94148e30edfb602b279f93630817c3
2024-10-04 13:26:25 -04:00
pghafari 365ffd4805 SWDEV-444447 - Fix regression for verbose printing for AMD_LOG_LEVEL=4
Change-Id: Id245caef711b7ccdf4e999e934993beb43d7c3d5
2024-09-18 13:08:10 -04:00
Saleel Kudchadker 9de6d4d46c SWDEV-478624 - Use readback workaround to ensure kernel arg coherence
Use env var DEBUG_CLR_KERNARG_HDP_FLUSH_WA=1 to fall back to HDP flush
workaround. The default is 0

Change-Id: I7bdb9be61da60c30d15ac9991b7cd27351e1831c
2024-09-11 14:53:15 -04:00
victzhan 7a01db98e9 Revert "SWDEV-458943 - make new AMD_MONITOR on"
This reverts commit f8598dabb0.

Change-Id: I2a7ddb2d4340224f43749a2ea91a894a8a95b83b
2024-09-05 10:10:50 -04:00
Ioannis Assiouras 2c84211b58 SWDEV-470372 - Added hipExtHostAlloc API
This change adds a new HIP API `hipExtHostAlloc` which preserves
the functionality of `hipHostMalloc`.

Change-Id: I13504c6fc13465ddd7aed329795bb4f2fef1baff
2024-08-27 08:26:03 -04:00
Ajay e07172ff57 SWDEV-478881 - Fix log AMD_LOG file corruption
hiprtc and hip APIs use the same file.
Append to the file instead of writing from the start of the file

Change-Id: I2703f9bb67f0c51b557a058daab129679a0b5dd9
2024-08-23 11:19:48 -04:00
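
The fix above relies on opening a shared log file in append mode so that a second writer does not overwrite what the first one wrote. A minimal sketch using the standard library follows; the helper names `append_log` and `read_lines` and the file name in the usage note are hypothetical, and this is not the runtime's logging code.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Appends one line to the log file. std::ios::app positions every write
// at the end of the file, so earlier content is preserved.
void append_log(const std::string& path, const std::string& line) {
  std::ofstream out(path, std::ios::app);
  assert(out.is_open());
  out << line << '\n';
}

// Reads the file back line by line (used here only to check the result).
std::vector<std::string> read_lines(const std::string& path) {
  std::vector<std::string> lines;
  std::ifstream in(path);
  std::string line;
  while (std::getline(in, line)) lines.push_back(line);
  return lines;
}
```

Calling `append_log("amd.log", "hip: first")` and then `append_log("amd.log", "hiprtc: second")` leaves both lines in the file, whereas opening the file without `std::ios::app` would truncate it on each open.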
German Andryeyev 9db52f9a46 SWDEV-470612 - Add the optimized multistream path
- Added the optimized multistream path in graph execution. It uses a fixed number of async streams in the execution
- Optimize the launch latency, where command
creation and execution are done at the same time
- Optimize the scheduling to use fewer barriers and waiting signals if
the same queue can be detected
- The new path is controlled by the DEBUG_HIP_FORCE_GRAPH_QUEUES
environment variable, where 0 will use the original path and any other
value will force the number of asynchronous queues for execution
- DEBUG_HIP_FORCE_ASYNC_QUEUE can force single queue async
execution in graphs (applicable for Navi families only)

Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e
2024-08-02 14:19:44 -04:00
Saleel Kudchadker d379f4efd0 SWDEV-301667 - Refactor Blit force env var
Change-Id: I5344ac2e6442cd8f526118e688f1b1412cc5b45a
2024-07-25 15:15:10 -04:00
taosang2 f8598dabb0 SWDEV-458943 - make new AMD_MONITOR on
make DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR be true

Change-Id: I1d21378ff462478d3238d71e4e2a1a7d6b9167ac
2024-07-24 14:29:27 -04:00
pghafari 9e6e77b7dd SWDEV-444447 - log print pid/tid only in verbose mode
Change-Id: I2bbe9085d607e9d8d5acda1ed43e3245335d239f
2024-07-11 15:39:13 -04:00
Tao Sang 73c02041e1 SWDEV-458943 - Implement std::mutex based monitor
Implement a std::mutex based monitor that has much
simpler logic than the legacy monitor.
Create DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR to
toggle between them.
If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = false
  (by default), use the legacy monitor;
If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = true,
  use the std::mutex based monitor.
If the std::mutex based monitor shows no perf drop,
the legacy one will be removed later.

Change-Id: I1d21368ff462477d3238d71e4e2a1a7d6b9167ad
2024-07-04 11:50:46 -04:00
Ioannis Assiouras fa07c33cba SWDEV-470787 - Fixed undefined symbols for flags in static build
Change-Id: I7812c8924396d0df9ab331f9a1844aabbf5a9211
2024-07-04 02:57:22 -04:00
Ioannis Assiouras 3edf1501cc SWDEV-463865 - namespace changes to prevent symbol conflicts in static builds
Change-Id: I09ceb5962b7aa19156909f47167c87d6887c9cd1
2024-06-12 16:22:27 -04:00
kjayapra-amd 892071aeb2 SWDEV-460948 - Changes to alloc, set, capture under single function.
Change-Id: I7b2d40e99e812b97c53535c5e63c41ad64a8f543
2024-06-06 16:57:53 -04:00
Ioannis Assiouras b8c2ac4de4 SWDEV-463865 - symbol renamings to prevent conflicts in static build
Change-Id: Id7fbb638c1088c23df52fee877cd790d637b1ffb
2024-06-06 04:05:55 -04:00
Tao Sang d0050ce309 SWDEV-433371 - Support new comgr unbundling action
Support the new comgr unbundling action API to extract code objects
in compressed and uncompressed modes.

Create the HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION env var to
toggle between the new path and the old path.
If HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION=false (default),
   uncompressed code objects will go the old path for better perf,
   compressed code objects will go the new path.
If HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION=true,
   both uncompressed and compressed code objects will go the new
   path.

Add comgr wrapper for
   amd_comgr_action_info_set_bundle_entry_ids()

Change-Id: I79952f132fe21249296685ee12cae05a4f9aec32
2024-05-28 06:31:10 +00:00
Tao Sang a1350fe8c1 Revert "SWDEV-433371 - use comgr to unbundle code objects"
This reverts commit e53df57ffe.

Reason for revert:
The new comgr unbundling action leads to a perf drop for uncompressed code objects. Will create a new patch to use the old path for uncompressed code objects and the new unbundling API for compressed ones.

Change-Id: I41ef53b71fc9f7aaa8cf231d4d70945f1117db52
2024-05-28 06:31:10 +00:00
Alex Xie 2eb30376ba SWDEV-451945 - Remove ShouldLoadPlatform function
Change-Id: Iabb4071bb77201576bc2c0488a04f4fa188815df
2024-05-06 10:42:59 -04:00
taosang2 e53df57ffe SWDEV-433371 - use comgr to unbundle code objects
1. Make runtime use comgr to unbundle code objects
2. Support compressed/uncompressed modes
3. Remove HIP_USE_RUNTIME_UNBUNDLER and
   HIPRTC_USE_RUNTIME_UNBUNDLER to simplify logic
4. Add comgr wrapper for
   amd_comgr_action_info_set_bundle_entry_ids()

Change-Id: Ic41b1ad1b64cca1e31986437983a5146d52a7329
2024-05-01 16:09:12 -04:00
Saleel Kudchadker 948ca5a931 SWDEV-301667 - Add LOG_TS mask
- Add LOG_TS mask for printing signal times
- Read raw ticks from signals

Change-Id: Ibdd0bf06c790729f6c65083a4784c97a3c3219e0
2024-04-30 12:24:48 -04:00
German Andryeyev 7a371503b2 SWDEV-311271 - Enable mempools under Linux
Change-Id: I7fda94e61121f9d3a30f4ad185b8a97712922f3c
2024-04-29 18:06:34 -04:00
taosang2 35c80dd482 SWDEV-424956 - Fix half vector printf issue
Refactor PrintfDbg::outputArgument() to remove potential risk.
Fix half vector printf issue on all devices.
Fix FEAT-56794 as well.

Change-Id: Iae39359d2128588def2e43d77fe58e868b8e71ff
2024-04-12 14:25:44 -04:00
German Andryeyev f0c7ecf617 SWDEV-455254 - Add kernel arg optimization
Add kernel arguments optimization into blit path.
Enabled by default on MI300.

Change-Id: I2694a81b90d48ad07d86dfe4c0c64fe187bada8e
2024-04-10 18:08:37 -04:00
Saleel Kudchadker c157bfb202 SWDEV-301667 - Create TS for each node recorded in graph
- Create a vector to allow multiple TS to be stored in Command.
- This means we don't wait for the entire batch in the Accumulate command
to finish when we exhaust signals.
- Reduce the number of signals created at init to 64. This min value
may still need to be tuned but the KFD allows max of 4094 interrupt
signals per device.
- Store kernel names whenever they are available and not just when
profiling. If we dynamically enable profiling like for Torch, a crash
can happen if hipGraphInstantiate wasn't included in Torch profile scope
because we previously entered kernel names only when the profiler is
attached.

Change-Id: I34e7881a25bbc763f82fdeb3408a8ea58e1ec006
2024-03-26 14:47:24 -04:00
German Andryeyev 0f3391b93e SWDEV-311271 - Enable mempool under Windows
Change-Id: Ifa4cac4a8d52e031d63f62515439ca09efe7b4cb
2024-03-11 10:45:51 -04:00