Commit Graph

625 Commits

Author SHA1 Message Date
German Andryeyev 6bb7d1afdc SWDEV-486602 - Fix Windows 32 bit build
Windows aligns fields to 8 bytes even with 32-bit builds.
Add BUG_CLR_SYSMEM_POOL to control sysmempool.

Change-Id: I8622aabc9f7391ed7dd8583b252ce9eb41d62293
2024-10-18 11:35:54 -04:00
German Andryeyev 8657a77029 SWDEV-491375 - Limit the SW batch size
Applications may submit commands without waiting
for the GPU, which causes unreleased SW commands to accumulate.
Make sure the runtime flushes the SW queue if it grows over a
threshold controlled by DEBUG_CLR_MAX_BATCH_SIZE.

Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396
2024-10-17 10:53:57 -04:00
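The batch-size limit described in commit 8657a77029 can be pictured with a small model. This is only a sketch of the general idea, not the runtime's code: the class `SoftwareQueue`, its methods, and the "flush when the count exceeds the threshold" comparison are all hypothetical names and assumptions made for illustration.

```python
class SoftwareQueue:
    """Toy model of a software command queue with a batch-size cap.

    Hypothetical illustration; not the actual runtime implementation.
    """

    def __init__(self, max_batch_size):
        self.max_batch_size = max_batch_size  # stand-in for DEBUG_CLR_MAX_BATCH_SIZE
        self.pending = []                     # submitted but not yet released commands
        self.flush_count = 0

    def submit(self, command):
        self.pending.append(command)
        # Assumed policy: flush once the backlog grows over the threshold.
        if len(self.pending) > self.max_batch_size:
            self.flush()

    def flush(self):
        # A real runtime would hand the batch to the device here.
        self.pending.clear()
        self.flush_count += 1


queue = SoftwareQueue(max_batch_size=3)
for command_id in range(10):
    queue.submit(command_id)
print(queue.flush_count, len(queue.pending))
```

Without the cap, an application that never waits for the GPU would let `pending` grow without bound; with it, the backlog stays near the threshold.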
German Andryeyev 364dfb0ed1 SWDEV-486602 - Optimize HSA callback performance
- Don't generate callbacks for HIP events
- Don't process profiling info in the callback for HIP events
- Wait for the CPU status update of the submitted commands
every 50 calls. That allows the runtime to drain the commands and
destroy HSA signals.

Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9
2024-10-11 14:50:25 -04:00
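The "every 50 calls" policy from commit 364dfb0ed1 amounts to a periodic drain. A minimal sketch, assuming a simple call counter; the class `DrainThrottle` and its method names are hypothetical and do not come from the runtime.

```python
class DrainThrottle:
    """Signals a drain once per fixed number of submissions (hypothetical helper)."""

    def __init__(self, interval=50):
        self.interval = interval
        self.calls = 0
        self.drains = 0

    def on_submit(self):
        """Return True when the caller should wait for the CPU status update."""
        self.calls += 1
        if self.calls % self.interval == 0:
            self.drains += 1
            return True
        return False


throttle = DrainThrottle()
should_wait = [throttle.on_submit() for _ in range(120)]
print(throttle.drains)
```

Waiting only occasionally keeps the common path cheap while still bounding how many completed commands and signals stay alive.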
Saleel Kudchadker e36666e536 SWDEV-301667 - Enable ROCr logging
- Use AMD_LOG_LEVEL=5 to dump AQL packets in ROCr

Change-Id: I2c044a5304c4eaf3d3af20e62d1f54c98d4fbaa4
2024-10-04 19:22:12 -04:00
Saleel Kudchadker 35e03ea0d0 SWDEV-301667 - Logging upgrades
- Use AMD_LOG_LEVEL_SIZE (in MB) to set the size at which the log file is truncated; the default is 2048 MB

Change-Id: Ia2f87e8c6b94148e30edfb602b279f93630817c3
2024-10-04 13:26:25 -04:00
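As a rough illustration of the AMD_LOG_LEVEL_SIZE setting from commit 35e03ea0d0, the sketch below converts the value to a byte limit. The function name, the 1 MB = 1024 * 1024 bytes conversion, and the fallback to the default on an unparsable value are assumptions for this example, not behavior confirmed by the commit.

```python
import os

DEFAULT_LOG_SIZE_MB = 2048  # default stated in the commit message


def log_size_limit_bytes(env=None):
    """Return the log truncation limit in bytes (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("AMD_LOG_LEVEL_SIZE")
    try:
        size_mb = int(raw) if raw is not None else DEFAULT_LOG_SIZE_MB
    except ValueError:
        # Assumed fallback: ignore values that are not integers.
        size_mb = DEFAULT_LOG_SIZE_MB
    return size_mb * 1024 * 1024


print(log_size_limit_bytes({"AMD_LOG_LEVEL_SIZE": "16"}))
```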
pghafari 365ffd4805 SWDEV-444447 - Fix regression for verbose printing for AMD_LOG_LEVEL=4
Change-Id: Id245caef711b7ccdf4e999e934993beb43d7c3d5
2024-09-18 13:08:10 -04:00
Saleel Kudchadker 9de6d4d46c SWDEV-478624 - Use readback workaround to ensure kernel arg coherence
Use env var DEBUG_CLR_KERNARG_HDP_FLUSH_WA=1 to fall back to HDP flush
workaround. The default is 0

Change-Id: I7bdb9be61da60c30d15ac9991b7cd27351e1831c
2024-09-11 14:53:15 -04:00
victzhan 7a01db98e9 Revert "SWDEV-458943 - make new AMD_MONITOR on"
This reverts commit f8598dabb0.

Change-Id: I2a7ddb2d4340224f43749a2ea91a894a8a95b83b
2024-09-05 10:10:50 -04:00
Ioannis Assiouras 2c84211b58 SWDEV-470372 - Added hipExtHostAlloc API
This change adds a new HIP API `hipExtHostAlloc` which preserves
the functionality of `hipHostMalloc`.

Change-Id: I13504c6fc13465ddd7aed329795bb4f2fef1baff
2024-08-27 08:26:03 -04:00
Ajay e07172ff57 SWDEV-478881 - Fix log AMD_LOG file corruption
hiprtc and hip APIs use the same file.
Append to the file instead of writing from the start of the file.

Change-Id: I2703f9bb67f0c51b557a058daab129679a0b5dd9
2024-08-23 11:19:48 -04:00
German Andryeyev 9db52f9a46 SWDEV-470612 - Add the optimized multistream path
- Add an optimized multi-stream path for graph execution. It uses a fixed number of async streams during execution
- Optimize the launch latency by creating and executing commands
at the same time
- Optimize the scheduling to use fewer barriers and waiting signals if
the same queue can be detected
- The new path is controlled by the DEBUG_HIP_FORCE_GRAPH_QUEUES
environment variable, where 0 uses the original path and any other
value forces the number of asynchronous queues for execution
- DEBUG_HIP_FORCE_ASYNC_QUEUE can force single-queue async
execution in graphs (applicable to Navi families only)

Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e
2024-08-02 14:19:44 -04:00
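The DEBUG_HIP_FORCE_GRAPH_QUEUES semantics in commit 9db52f9a46 (0 selects the original path, any other value sets the number of asynchronous queues) can be written down as a small decision function. The function and the returned dictionary keys are hypothetical; the sketch deliberately takes the value as an argument because the commit message does not state the variable's default.

```python
def graph_queue_config(force_graph_queues):
    """Map a DEBUG_HIP_FORCE_GRAPH_QUEUES value to an execution path.

    Hypothetical helper; only the 0 / non-zero rule comes from the commit message.
    """
    if force_graph_queues == 0:
        return {"path": "original", "async_queues": None}
    return {"path": "optimized", "async_queues": force_graph_queues}


print(graph_queue_config(0))
print(graph_queue_config(4))
```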
Saleel Kudchadker d379f4efd0 SWDEV-301667 - Refactor Blit force env var
Change-Id: I5344ac2e6442cd8f526118e688f1b1412cc5b45a
2024-07-25 15:15:10 -04:00
taosang2 f8598dabb0 SWDEV-458943 - make new AMD_MONITOR on
Set DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR to true.

Change-Id: I1d21378ff462478d3238d71e4e2a1a7d6b9167ac
2024-07-24 14:29:27 -04:00
pghafari 9e6e77b7dd SWDEV-444447 - log print pid/tid only in verbose mode
Change-Id: I2bbe9085d607e9d8d5acda1ed43e3245335d239f
2024-07-11 15:39:13 -04:00
Tao Sang 73c02041e1 SWDEV-458943 - Implement std::mutex based monitor
Implement a std::mutex based monitor that has much
simpler logic than the legacy monitor.
Create DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR to
toggle between them.
If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = false
  (the default), use the legacy monitor;
If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = true,
  use the std::mutex based monitor.
If the std::mutex based monitor shows no perf drop,
the legacy one will be removed later.

Change-Id: I1d21368ff462477d3238d71e4e2a1a7d6b9167ad
2024-07-04 11:50:46 -04:00
Ioannis Assiouras fa07c33cba SWDEV-470787 - Fixed undefined symbols for flags in static build
Change-Id: I7812c8924396d0df9ab331f9a1844aabbf5a9211
2024-07-04 02:57:22 -04:00
Ioannis Assiouras 3edf1501cc SWDEV-463865 - namespace changes to prevent symbol conflicts in static builds
Change-Id: I09ceb5962b7aa19156909f47167c87d6887c9cd1
2024-06-12 16:22:27 -04:00
kjayapra-amd 892071aeb2 SWDEV-460948 - Changes to alloc, set, capture under single function.
Change-Id: I7b2d40e99e812b97c53535c5e63c41ad64a8f543
2024-06-06 16:57:53 -04:00
Ioannis Assiouras b8c2ac4de4 SWDEV-463865 - symbol renamings to prevent conflicts in static build
Change-Id: Id7fbb638c1088c23df52fee877cd790d637b1ffb
2024-06-06 04:05:55 -04:00
Tao Sang d0050ce309 SWDEV-433371 - Support new comgr unbundling action
Support the new comgr unbundling action API to extract code objects
in compressed and uncompressed modes.

Create the HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION env var to
toggle between the new path and the old path.
If HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION=false (default),
   uncompressed code objects go through the old path for better perf,
   compressed code objects go through the new path.
If HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION=true,
   both uncompressed and compressed code objects go through the new
   path.

Add comgr wrapper for
   amd_comgr_action_info_set_bundle_entry_ids()

Change-Id: I79952f132fe21249296685ee12cae05a4f9aec32
2024-05-28 06:31:10 +00:00
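The path selection described in commit d0050ce309 reduces to a two-input rule. A minimal sketch, assuming the choice depends only on whether the code object is compressed and on the HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION setting; the function name and the "old"/"new" labels are hypothetical.

```python
def unbundling_path(is_compressed, always_use_new_action=False):
    """Pick the unbundling path for a code object (hypothetical helper).

    always_use_new_action stands in for
    HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION (false by default).
    """
    if always_use_new_action or is_compressed:
        return "new"
    # Uncompressed code objects keep the old path for better perf.
    return "old"


for compressed in (False, True):
    for forced in (False, True):
        print(compressed, forced, unbundling_path(compressed, forced))
```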
Tao Sang a1350fe8c1 Revert "SWDEV-433371 - use comgr to unbundle code objects"
This reverts commit e53df57ffe.

Reason for revert: The new comgr unbundling action leads to a perf drop for uncompressed code objects. A new patch will use the old path for uncompressed code objects and the new unbundling API for compressed ones.

Change-Id: I41ef53b71fc9f7aaa8cf231d4d70945f1117db52
2024-05-28 06:31:10 +00:00
Alex Xie 2eb30376ba SWDEV-451945 - Remove ShouldLoadPlatform function
Change-Id: Iabb4071bb77201576bc2c0488a04f4fa188815df
2024-05-06 10:42:59 -04:00
taosang2 e53df57ffe SWDEV-433371 - use comgr to unbundle code objects
1. Make the runtime use comgr to unbundle code objects
2. Support compressed/uncompressed modes
3. Remove HIP_USE_RUNTIME_UNBUNDLER and
   HIPRTC_USE_RUNTIME_UNBUNDLER to simplify the logic
4. Add a comgr wrapper for
   amd_comgr_action_info_set_bundle_entry_ids()

Change-Id: Ic41b1ad1b64cca1e31986437983a5146d52a7329
2024-05-01 16:09:12 -04:00
Saleel Kudchadker 948ca5a931 SWDEV-301667 - Add LOG_TS mask
- Add LOG_TS mask for printing signal times
- Read raw ticks from signals

Change-Id: Ibdd0bf06c790729f6c65083a4784c97a3c3219e0
2024-04-30 12:24:48 -04:00
German Andryeyev 7a371503b2 SWDEV-311271 - Enable mempools under Linux
Change-Id: I7fda94e61121f9d3a30f4ad185b8a97712922f3c
2024-04-29 18:06:34 -04:00
taosang2 35c80dd482 SWDEV-424956 - Fix half vector printf issue
Refactor PrintfDbg::outputArgument() to remove potential risk.
Fix half vector printf issue on all devices.
Fix FEAT-56794 as well.

Change-Id: Iae39359d2128588def2e43d77fe58e868b8e71ff
2024-04-12 14:25:44 -04:00
German Andryeyev f0c7ecf617 SWDEV-455254 - Add kernel arg optimization
Add kernel arguments optimization into blit path.
Enabled by default on MI300.

Change-Id: I2694a81b90d48ad07d86dfe4c0c64fe187bada8e
2024-04-10 18:08:37 -04:00
Saleel Kudchadker c157bfb202 SWDEV-301667 - Create TS for each node recorded in graph
- Create a vector to allow multiple TS to be stored in Command.
- This means we don't wait for the entire batch in the Accumulate command
to finish when we exhaust signals.
- Reduce the number of signals created at init to 64. This minimum value
may still need to be tuned, but the KFD allows a maximum of 4094 interrupt
signals per device.
- Store kernel names whenever they are available and not just when
profiling. If profiling is enabled dynamically, as with Torch, a crash
can happen if hipGraphInstantiate wasn't included in the Torch profile scope,
because we previously stored kernel names only when a profiler was
attached.

Change-Id: I34e7881a25bbc763f82fdeb3408a8ea58e1ec006
2024-03-26 14:47:24 -04:00
German Andryeyev 0f3391b93e SWDEV-311271 - Enable mempool under Windows
Change-Id: Ifa4cac4a8d52e031d63f62515439ca09efe7b4cb
2024-03-11 10:45:51 -04:00
Vikram 6f390f5af9 SWDEV-424956 - Fix OpenCL printf bug while printing vectors of half type
OpenCL printf handling did not process vectors of half-precision floats properly
(mainly because the compiler packs two halves into a dword and the runtime failed to
extract the individual parts).

This patch fixes the issue.

Change-Id: Ia1f15ccfb5db52b71c43cfd588dd38f551ee5277
2024-03-04 03:53:18 -05:00
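The packing issue behind commit 6f390f5af9 can be shown with plain bit-level unpacking. This is an illustrative sketch, not the runtime's printf code: the function name is hypothetical, and treating the low 16 bits as the first vector element is an assumption. It relies on the `e` (IEEE 754 binary16) format of Python's `struct` module.

```python
import struct


def unpack_half_pair(dword):
    """Split a 32-bit word into its two half-precision floats.

    Assumption: the low 16 bits hold the first element, the high 16 bits the second.
    """
    raw = struct.pack("<I", dword & 0xFFFFFFFF)  # little-endian 4 bytes
    low, high = struct.unpack("<2e", raw)        # two binary16 values
    return low, high


# 0x3C00 encodes 1.0 and 0x4000 encodes 2.0 in binary16.
print(unpack_half_pair(0x40003C00))
```

Reading the dword as a single 32-bit float or integer would print one meaningless value instead of the two elements.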
Saleel Kudchadker 94c7004df8 SWDEV-301667 - Increase default signal pool to 4096
Change-Id: I4ab23b0f87e295b40ab76ad6e96249d11b8ad04d
2024-02-29 22:52:02 +00:00
Saleel Kudchadker 68f40f78dd SWDEV-443760 - Enable device kernel args for MI300
- Enable Device kernel args for MI300* for now.
- Fix a perf issue which impacts graph instantiate when dev kernel args
are enabled.

Change-Id: I962e58fd9d8dd1a8db95e601cb03a8e9c7bac97f
2024-02-28 19:10:04 -05:00
Rahul Garg b954d0d6e0 SWDEV-443760 - Disable HIP_FORCE_DEV_KERNARG by default
Change-Id: I8c3d8e65aa954bd28499eebefbc532d1177445dc
2024-02-22 04:37:51 -05:00
Todd tiantuo Li 7bfee3481b SWDEV-333557 - Enable PAL_HIP_IPC_FLAG by default
Change-Id: Ibb2ca0b9521aff4eca190e4817dcc5f8d697b172
2024-02-20 18:45:25 -05:00
Saleel Kudchadker f138e0d113 SWDEV-443760 - Enable device kern args
- Implement workaround to ensure HDP writes are done by writing and
reading the HDP MMIO register.
- Implement the same workaround for graphs, we no longer need sentinel
write/readback

Change-Id: I0d3027b46a1f61131ec62e3c8c669ff5184fa6b2
2024-02-20 02:03:14 -05:00
Anusha GodavarthySurya ae0368d12d SWDEV-422207 - Enable DEBUG_CLR_GRAPH_PACKET_CAPTURE environment variable
Change-Id: I9bf72b9c1a56980352109bd4d42b54ecb2d1b8f9
2024-02-05 05:08:11 +00:00
Anusha GodavarthySurya 0a055f874b SWDEV-422207 - Added debug env to dump graph during Instantiation
Change-Id: Ibde2ae5b8d240f3986bcd168facc513a319c0f17
2024-02-05 05:08:11 +00:00
pghafari 0cff14c9e1 SWDEV-441258 - remove full path for HIP LOG windows
Change-Id: Ibad6e9542c0cede38f5a114dcd352356ddedf019
2024-01-16 15:26:06 -05:00
German 7d661bc7df SWDEV-404889 - Enable debugger interface in PAL
Add GPU_DEBUG_ENABLE to control ttmp behavior. If enabled,
the HW will collect more debug info at some perf cost.

Change-Id: Icee0686b903a7b1bd483710b9d611877cd43c6aa
2024-01-02 11:51:42 -05:00
kjayapra-amd e05923b139 SWDEV-413997 - Enable Virtual Mem support by default.
Change-Id: Ia3db3919701708cf95574692e1d47375ca99d7fd
2023-12-20 12:49:16 -05:00
German Andryeyev f1dc81f427 SWDEV-432174 - Change the fillBuffer kernel
- Add the new fillBuffer kernel, which allows launching a limited
number of workgroups for a memory fill operation
- Switch fill memory to 16-byte writes by default
- Allow limiting the workgroups with DEBUG_CLR_LIMIT_BLIT_WG

Change-Id: Ibad1822f2d42b2fc71bcfc1917c31409c0623e8e
2023-11-16 14:25:55 -04:00
Ioannis Assiouras 7868876db7 SWDEV-428244 - Set PARAMETERS_MIN_ALIGNMENT to the native alignment
Change-Id: I14d8a0db4e575d6fa816754c52df405de88d9200
2023-10-21 17:26:46 -04:00
kjayapra-amd 3ef829939a SWDEV-413997 - Initial VMM changes for ROCm path.
Change-Id: I4405fd7b53182eb4c4622835c811c0dc08461537
2023-10-16 11:29:16 -04:00
jiabaxie 28f0daa34f SWDEV-405983 - adding in HIP_LAUNCH_BLOCKING
Change-Id: I3f9c8a745099aab05155ebe910e727693961a02f
2023-10-10 21:11:13 -04:00
Anusha GodavarthySurya e63c280d4d SWDEV-422207 - Capture AQL packets for graph kernel nodes during graph instantiation and enqueue the AQL packets during launch
Change-Id: I1e5f7f9e2a70bd500d190193cb6ba0867f5a63e7
2023-10-05 00:34:29 -04:00
German 7be3a5e33e SWDEV-407533 - [ABI Break]Remove Wavelimiter
Change-Id: I6a2f6fb5a0c3acea93fa0200a69679783e76f5bd
2023-09-07 09:58:41 -04:00
Ioannis Assiouras 1302d6f119 SWDEV-420328 - Initialize AMD_LOG_MASK with decimals instead of hex
Change-Id: Id25510863c51206bca2e50fc93d6e1e1c5cbbfea
2023-09-07 03:04:37 -04:00
kjayapra-amd 6a0f80a03d SWDEV-381625 - Parse compiler and linker options from environment variable.
Change-Id: Id5a012b678e5973c4b64dff84444a909aefae006
2023-08-29 20:24:27 -04:00
German 077311153a SWDEV-407533 - [ABI Break]Purge unused env vars
Change-Id: I627950e8ebb6299affc602754a20d442dbe42b14
2023-08-24 14:11:40 -04:00
Saleel Kudchadker aa6eb555e2 SWDEV-384557 - Enable SDMA query
Change-Id: Ibb0a8d131f799985a4d4adbf753261e58c04157f
2023-08-01 18:41:23 -04:00