Commit graph

641 commits

Author SHA1 Message Date
Chaudhary, Jatin Jaikishan 07e57a1f0d SWDEV-517941 - use device bitcode before spirv (#95)
Also add the flag HIP_FORCE_SPIRV_CODEOBJECT, which overrides this order
and forces the use of SPIRV.

* use cache for already compiled code objects

* address review comments and use the two spirv isa names
2025-04-14 23:40:52 +01:00
Andryeyev, German 28967982b2 SWDEV-517481 - Add dynamic queue management (#37)
Enabled by default. DEBUG_HIP_DYNAMIC_QUEUES controls the feature.
2025-03-19 11:22:50 -04:00
German Andryeyev cece301fd4 SWDEV-518474 - Add comgr debug mask
Move prints from CO processing under COMGR debug mask.

Change-Id: I2a417e42a1f4e2922a34eb104c69e4db10b5f1c6
2025-03-04 14:37:08 -05:00
German Andryeyev 296dce5570 SWDEV-497841 - Add virtual memory heap
Add initial implementation of virtual memory heap with
dynamic virtual memory mapping support for memory pools.
DEBUG_HIP_MEM_POOL_VMHEAP controls the new method.

Change-Id: I8dc5be2e0f34ab472f1800f43bb6243639a5e500
2025-02-20 10:55:49 -05:00
Tao Sang f2ff56af9c SWDEV-458943 - Add fast path in wait()
wait() is redesigned with two paths:
fast path: Use spinlock to wait for notify signal. If the
 signal hasn't been received for some loops, go to slow path.
slow path: Use condition_variable's wait().

Improve monitor wrapper for better performance.

Fix some bugs left from name removing patch.

Change-Id: I893a8353121a25d11e37c8e631caf31cc1fc1f24
2025-01-28 12:19:55 -05:00
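
The two-path wait() described in the commit above can be sketched roughly as follows. This is a minimal illustration only, not the amd::Monitor code: the class name TwoPathSignal, the spin_limit parameter, and the use of an atomic flag for the spinning phase are all assumptions made for the example.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical illustration of a spin-then-block wait; not the CLR implementation.
class TwoPathSignal {
 public:
  // Sets the flag under the mutex so a waiter on the slow path cannot miss it.
  void notify() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      signaled_.store(true, std::memory_order_release);
    }
    cv_.notify_one();
  }

  // Returns true if the signal was consumed on the fast (spinning) path,
  // false if the caller had to fall back to the slow (blocking) path.
  bool wait(int spin_limit = 1000) {
    assert(spin_limit >= 0);
    // Fast path: poll the flag for a bounded number of iterations.
    for (int i = 0; i < spin_limit; ++i) {
      bool expected = true;
      if (signaled_.compare_exchange_strong(expected, false,
                                            std::memory_order_acquire)) {
        return true;
      }
      std::this_thread::yield();
    }
    // Slow path: block on the condition variable until the flag is set.
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return signaled_.load(std::memory_order_acquire); });
    signaled_.store(false, std::memory_order_relaxed);
    return false;
  }

 private:
  std::atomic<bool> signaled_{false};
  std::mutex mutex_;
  std::condition_variable cv_;
};
```

A short spin avoids the cost of a kernel-level block when the notification arrives quickly, while the bounded loop keeps a long wait from burning a CPU core.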
Branislav Brzak 3fd46a3783 SWDEV-508743 - [6.4 Preview] Add ROCm 7.0 breaking change fields
Change-Id: I07bff42731e74a4c409505cf8981342e22ce26be
2025-01-17 06:25:27 -05:00
Saleel Kudchadker 39801b5750 SWDEV-506251 - Disable blit copy threshold for OpenCL
Change-Id: Id0ca43b13d5792791a42da263f6aa4496382cea6
2025-01-08 02:46:01 +00:00
Pengda Xie 8155943c5f SWDEV-505833 - Provide functionality to avoid L2 flush for CPX mode for dispatch packets
- Added DEBUG_CLR_SKIP_RELEASE_SCOPE flag to force release scope to
   SCOPE_NONE in AQL packet header

Change-Id: Ife02cddb9d5cd4749103ce585d3d5fe9024c6868
2025-01-03 17:28:21 -05:00
Ioannis Assiouras e8b2fdab96 SWDEV-483134 - Remove hipExtHostAlloc API
Change-Id: I60777ef5c56b60dd8100d0d794ca10fb3b96a555
2024-12-16 17:13:49 -05:00
Saleel Kudchadker 7863eb92dc SWDEV-497145 - Use rocr copyOnEngine API for staged copies
- Refactor blit code and clean ASAN instrumentation
- Use unified function for rocr copy
- Enable shader copy path for unpinned writeBuffer/readBuffer paths
- Set GPU_FORCE_BLIT_COPY_SIZE=16 which means we will use BLIT copy for
  pinned copies or unpinned H2D/D2H copies < 16KB

Change-Id: I42045cca79234b340dbf53dafb93044199736ae4
2024-12-04 13:38:13 -05:00
Satyanvesh Dittakavi e3b8754448 SWDEV-477584 - Match hipGetLastError behavior with CUDA using env var
Change-Id: I4c5acff180ae904028f7c5fdf4e109ffd1f0c4ef
2024-11-28 01:33:52 -05:00
German Andryeyev 9473f143c2 SWDEV-486602 - Disable sysmem pool
Currently amd::Monitor can work in FILO mode for the active waits
and cause a delay in wakeup of some threads. That may have a problem
with the current sysmem pool design.

Change-Id: I145081478d1e0b282d8838855c5718f09cf54b69
2024-11-20 11:35:28 -05:00
taosang2 cc25c5d646 SWDEV-487356 - Fix AMD LOG compiling warning
Change-Id: I757185f9c7c12f736e266219b67daf5836d2a125
2024-11-09 12:57:22 -05:00
Saleel Kudchadker 582dc7dd6d SWDEV-446123 - Revert "Match hipGetLastError behavior with CUDA using env var"
This reverts commit 5f477900a3.

Reason for revert: <INSERT REASONING HERE>

Change-Id: I11a456655393bcf4b82d749ce7259bc1b78d1424
2024-11-08 20:35:13 -05:00
Satyanvesh Dittakavi 5f477900a3 SWDEV-446123 - Match hipGetLastError behavior with CUDA using env var
Change-Id: Iaec697c1304d746376ecf2bfe2ad683b15ee189f
2024-11-07 12:02:34 -05:00
Tao Sang 802cacf3e9 SWDEV-487356 - Fix AMD LOG issue in Win32
Change-Id: Ia1c19cf4ea24188cdb2d374b01f975f794e02dbf
2024-11-01 08:26:25 -04:00
German Andryeyev 6bb7d1afdc SWDEV-486602 - Fix Windows 32 bit build
Windows aligns fields to 8 bytes even in 32-bit builds.
Add BUG_CLR_SYSMEM_POOL to control sysmempool.

Change-Id: I8622aabc9f7391ed7dd8583b252ce9eb41d62293
2024-10-18 11:35:54 -04:00
German Andryeyev 8657a77029 SWDEV-491375 - Limit the SW batch size
Applications may submit commands without waiting
for the GPU. That causes the number of unreleased SW commands to grow.
Make sure runtime flushes SW queue, if it grows over some
threshold, controlled by DEBUG_CLR_MAX_BATCH_SIZE.

Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396
2024-10-17 10:53:57 -04:00
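
The batch-size limit described in the commit above can be pictured with a small sketch. Everything here is hypothetical: the BatchedQueue class, its members, and the integer commands are stand-ins for illustration, and the threshold is passed in directly rather than read from DEBUG_CLR_MAX_BATCH_SIZE.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical software queue; names and structure are illustrative only.
class BatchedQueue {
 public:
  explicit BatchedQueue(std::size_t max_batch_size)
      : max_batch_size_(max_batch_size) {
    assert(max_batch_size > 0);
  }

  // Records a command and flushes once the pending batch grows over the threshold.
  void submit(int command) {
    pending_.push_back(command);
    if (pending_.size() > max_batch_size_) {
      flush();
    }
  }

  // Stand-in for handing the batch to the GPU and releasing the commands.
  void flush() {
    if (pending_.empty()) return;
    ++flush_count_;
    pending_.clear();
  }

  std::size_t pending() const { return pending_.size(); }
  std::size_t flush_count() const { return flush_count_; }

 private:
  std::size_t max_batch_size_;
  std::vector<int> pending_;
  std::size_t flush_count_ = 0;
};
```

With a threshold of 4, ten back-to-back submissions trigger two flushes in this sketch, so the pending list never holds more than five commands even if the application never waits.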
German Andryeyev 364dfb0ed1 SWDEV-486602 - Optimize HSA callback performance
- Don't generate callbacks for HIP events
- Don't process profiling info in the callback for HIP events
- Wait for CPU status update of the submitted commands
every 50 calls. That allows the runtime to drain the commands and
destroy HSA signals.

Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9
2024-10-11 14:50:25 -04:00
Saleel Kudchadker e36666e536 SWDEV-301667 - Enable ROCr logging
- Use AMD_LOG_LEVEL=5 to dump AQL packets in ROCr

Change-Id: I2c044a5304c4eaf3d3af20e62d1f54c98d4fbaa4
2024-10-04 19:22:12 -04:00
Saleel Kudchadker 35e03ea0d0 SWDEV-301667 - Logging upgrades
- Use AMD_LOG_LEVEL_SIZE in MBs to set log file size truncation; by default it's 2048 MB

Change-Id: Ia2f87e8c6b94148e30edfb602b279f93630817c3
2024-10-04 13:26:25 -04:00
pghafari 365ffd4805 SWDEV-444447 - Fix regression for verbose printing for AMD_LOG_LEVEL=4
Change-Id: Id245caef711b7ccdf4e999e934993beb43d7c3d5
2024-09-18 13:08:10 -04:00
Saleel Kudchadker 9de6d4d46c SWDEV-478624 - Use readback workaround to ensure kernel arg coherence
Use env var DEBUG_CLR_KERNARG_HDP_FLUSH_WA=1 to fall back to HDP flush
workaround. The default is 0

Change-Id: I7bdb9be61da60c30d15ac9991b7cd27351e1831c
2024-09-11 14:53:15 -04:00
victzhan 7a01db98e9 Revert "SWDEV-458943 - make new AMD_MONITOR on"
This reverts commit f8598dabb0.

Change-Id: I2a7ddb2d4340224f43749a2ea91a894a8a95b83b
2024-09-05 10:10:50 -04:00
Ioannis Assiouras 2c84211b58 SWDEV-470372 - Added hipExtHostAlloc API
This change adds a new HIP API `hipExtHostAlloc` which preserves
the functionality of `hipHostMalloc`.

Change-Id: I13504c6fc13465ddd7aed329795bb4f2fef1baff
2024-08-27 08:26:03 -04:00
Ajay e07172ff57 SWDEV-478881 - Fix log AMD_LOG file corruption
hiprtc and hip APIs use the same file.
Append to file instead of start of file

Change-Id: I2703f9bb67f0c51b557a058daab129679a0b5dd9
2024-08-23 11:19:48 -04:00
German Andryeyev 9db52f9a46 SWDEV-470612 - Add the optimized multistream path
- Added the optimized multi stream path in graph execution. It uses a fixed number of async streams in the execution
- Optimize the launch latency, where command
creation and execution are done at the same time
- Optimize the scheduling to use fewer barriers and waiting signals if
the same queue can be detected
- The new path is controlled by the DEBUG_HIP_FORCE_GRAPH_QUEUES
environment variable, where 0 will use the original path and any other
value will force the number of asynchronous queues for execution
- DEBUG_HIP_FORCE_ASYNC_QUEUE can force single queue async
execution in graphs (applicable for Navi families only)

Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e
2024-08-02 14:19:44 -04:00
Saleel Kudchadker d379f4efd0 SWDEV-301667 - Refactor Blit force env var
Change-Id: I5344ac2e6442cd8f526118e688f1b1412cc5b45a
2024-07-25 15:15:10 -04:00
taosang2 f8598dabb0 SWDEV-458943 - make new AMD_MONITOR on
make DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR be true

Change-Id: I1d21378ff462478d3238d71e4e2a1a7d6b9167ac
2024-07-24 14:29:27 -04:00
pghafari 9e6e77b7dd SWDEV-444447 - log print pid/tid only in verbose mode
Change-Id: I2bbe9085d607e9d8d5acda1ed43e3245335d239f
2024-07-11 15:39:13 -04:00
Tao Sang 73c02041e1 SWDEV-458943 - Implement std::mutex based monitor
Implement std::mutex based monitor that has much
simpler logic than the legacy monitor.
Create DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR to
toggle them.
If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = false
  (by default), use legacy monitor;
If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = true,
  use std::mutex based monitor.
If the std::mutex based monitor shows no perf drop,
the legacy one will be removed later.

Change-Id: I1d21368ff462477d3238d71e4e2a1a7d6b9167ad
2024-07-04 11:50:46 -04:00
Ioannis Assiouras fa07c33cba SWDEV-470787 - Fixed undefined symbols for flags in static build
Change-Id: I7812c8924396d0df9ab331f9a1844aabbf5a9211
2024-07-04 02:57:22 -04:00
Ioannis Assiouras 3edf1501cc SWDEV-463865 - namespace changes to prevent symbol conflicts in static builds
Change-Id: I09ceb5962b7aa19156909f47167c87d6887c9cd1
2024-06-12 16:22:27 -04:00
kjayapra-amd 892071aeb2 SWDEV-460948 - Changes to alloc, set, capture under single function.
Change-Id: I7b2d40e99e812b97c53535c5e63c41ad64a8f543
2024-06-06 16:57:53 -04:00
Ioannis Assiouras b8c2ac4de4 SWDEV-463865 - symbol renamings to prevent conflicts in static build
Change-Id: Id7fbb638c1088c23df52fee877cd790d637b1ffb
2024-06-06 04:05:55 -04:00
Tao Sang d0050ce309 SWDEV-433371 - Support new comgr unbundling action
Support new comgr unbundling action API to extract code objects
in compressed and uncompressed modes.

Create the HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION env var to
toggle between the new path and the old path.
If HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION=false (default),
   uncompressed code objects go through the old path for better perf,
   compressed code objects go through the new path.
If HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION=true,
   both uncompressed and compressed code objects go through the new
   path.

Add comgr wrapper for
   amd_comgr_action_info_set_bundle_entry_ids()

Change-Id: I79952f132fe21249296685ee12cae05a4f9aec32
2024-05-28 06:31:10 +00:00
Tao Sang a1350fe8c1 Revert "SWDEV-433371 - use comgr to unbundle code objects"
This reverts commit e53df57ffe.

Reason for revert: The new comgr unbundling action leads to a perf drop for uncompressed code objects. A new patch will use the old path for uncompressed code objects and the new unbundling API for compressed ones.

Change-Id: I41ef53b71fc9f7aaa8cf231d4d70945f1117db52
2024-05-28 06:31:10 +00:00
Alex Xie 2eb30376ba SWDEV-451945 - Remove ShouldLoadPlatform function
Change-Id: Iabb4071bb77201576bc2c0488a04f4fa188815df
2024-05-06 10:42:59 -04:00
taosang2 e53df57ffe SWDEV-433371 - use comgr to unbundle code objects
1. Make runtime use comgr to unbundle code objects
2. Support compressed/uncompressed modes
3. Remove HIP_USE_RUNTIME_UNBUNDLER and
   HIPRTC_USE_RUNTIME_UNBUNDLER to simplify logic
4. Add comgr wrapper for
   amd_comgr_action_info_set_bundle_entry_ids()

Change-Id: Ic41b1ad1b64cca1e31986437983a5146d52a7329
2024-05-01 16:09:12 -04:00
Saleel Kudchadker 948ca5a931 SWDEV-301667 - Add LOG_TS mask
- Add LOG_TS mask for printing signal times
- Read raw ticks from signals

Change-Id: Ibdd0bf06c790729f6c65083a4784c97a3c3219e0
2024-04-30 12:24:48 -04:00
German Andryeyev 7a371503b2 SWDEV-311271 - Enable mempools under Linux
Change-Id: I7fda94e61121f9d3a30f4ad185b8a97712922f3c
2024-04-29 18:06:34 -04:00
taosang2 35c80dd482 SWDEV-424956 - Fix half vector printf issue
Refactor PrintfDbg::outputArgument() to remove potential risk.
Fix half vector printf issue on all devices.
Fix FEAT-56794 as well.

Change-Id: Iae39359d2128588def2e43d77fe58e868b8e71ff
2024-04-12 14:25:44 -04:00
German Andryeyev f0c7ecf617 SWDEV-455254 - Add kernel arg optimization
Add kernel arguments optimization into blit path.
Enabled by default on MI300.

Change-Id: I2694a81b90d48ad07d86dfe4c0c64fe187bada8e
2024-04-10 18:08:37 -04:00
Saleel Kudchadker c157bfb202 SWDEV-301667 - Create TS for each node recorded in graph
- Create a vector to allow multiple TS to be stored in Command.
- This would mean we don't wait for the entire batch in Accumulate command
to finish when we exhaust signals.
- Reduce the number of signals created at init to 64. This min value
may still need to be tuned but the KFD allows max of 4094 interrupt
signals per device.
- Store kernel names whenever they are available and not just when
profiling. If we dynamically enable profiling like for Torch, a crash
can happen if hipGraphInstantiate wasn't included in Torch profile scope
because we previously entered kernel names only when profiler is
attached.

Change-Id: I34e7881a25bbc763f82fdeb3408a8ea58e1ec006
2024-03-26 14:47:24 -04:00
German Andryeyev 0f3391b93e SWDEV-311271 - Enable mempool under Windows
Change-Id: Ifa4cac4a8d52e031d63f62515439ca09efe7b4cb
2024-03-11 10:45:51 -04:00
Vikram 6f390f5af9 SWDEV-424956 - Fix OpenCL printf bug while printing vectors of half type
OpenCL printf handling did not process vectors of half-precision floats properly
(mainly because the compiler packs two halves into a dword and the runtime failed
to extract the individual parts).

This patch fixes the issue.

Change-Id: Ia1f15ccfb5db52b71c43cfd588dd38f551ee5277
2024-03-04 03:53:18 -05:00
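
To illustrate the packing issue described in the commit above, here is a minimal sketch of splitting one 32-bit dword into two half-precision values and widening each to float. It is not the runtime's PrintfDbg code: the function names are hypothetical, and the sketch assumes the lower-indexed vector element sits in the low 16 bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <utility>

// Converts an IEEE 754 binary16 bit pattern to float.
float HalfBitsToFloat(std::uint16_t h) {
  const int sign = (h >> 15) & 0x1;
  const int exponent = (h >> 10) & 0x1F;
  const int mantissa = h & 0x3FF;
  float value;
  if (exponent == 0) {
    // Zero or subnormal: mantissa * 2^-24.
    value = std::ldexp(static_cast<float>(mantissa), -24);
  } else if (exponent == 31) {
    // Infinity (mantissa == 0) or NaN (mantissa != 0).
    value = mantissa ? std::numeric_limits<float>::quiet_NaN()
                     : std::numeric_limits<float>::infinity();
  } else {
    // Normal: (1024 + mantissa) * 2^(exponent - 25).
    value = std::ldexp(static_cast<float>(mantissa + 1024), exponent - 25);
  }
  return sign ? -value : value;
}

// Splits a dword into two halves. Assumption: element 0 is in the low 16 bits.
std::pair<float, float> UnpackHalf2(std::uint32_t dword) {
  const auto low = static_cast<std::uint16_t>(dword & 0xFFFFu);
  const auto high = static_cast<std::uint16_t>(dword >> 16);
  return {HalfBitsToFloat(low), HalfBitsToFloat(high)};
}
```

Treating the whole dword as a single value would print only one number per pair of elements, which is consistent with the kind of failure the commit message describes.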
Saleel Kudchadker 94c7004df8 SWDEV-301667 - Increase default signal pool to 4096
Change-Id: I4ab23b0f87e295b40ab76ad6e96249d11b8ad04d
2024-02-29 22:52:02 +00:00
Saleel Kudchadker 68f40f78dd SWDEV-443760 - Enable device kernel args for MI300
- Enable Device kernel args for MI300* for now.
- Fix a perf issue which impacts graph instantiate when dev kernel args
are enabled.

Change-Id: I962e58fd9d8dd1a8db95e601cb03a8e9c7bac97f
2024-02-28 19:10:04 -05:00
Rahul Garg b954d0d6e0 SWDEV-443760 - Disable HIP_FORCE_DEV_KERNARG by default
Change-Id: I8c3d8e65aa954bd28499eebefbc532d1177445dc
2024-02-22 04:37:51 -05:00
Todd tiantuo Li 7bfee3481b SWDEV-333557 - Enable PAL_HIP_IPC_FLAG by default
Change-Id: Ibb2ca0b9521aff4eca190e4817dcc5f8d697b172
2024-02-20 18:45:25 -05:00