Commit Graph

108 Commits

Author SHA1 Message Date
German Andryeyev 6bb7d1afdc SWDEV-486602 - Fix Windows 32 bit build
Windows aligns fields to 8 bytes even with 32-bit builds.
Add BUG_CLR_SYSMEM_POOL to control sysmempool.

Change-Id: I8622aabc9f7391ed7dd8583b252ce9eb41d62293
2024-10-18 11:35:54 -04:00
German Andryeyev 8657a77029 SWDEV-491375 - Limit the SW batch size
Applications may submit commands without waiting
for the GPU. That causes the number of unreleased SW commands to grow.
Make sure the runtime flushes the SW queue if it grows over some
threshold, controlled by DEBUG_CLR_MAX_BATCH_SIZE.

Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396
2024-10-17 10:53:57 -04:00
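
The batch limit described in the commit above can be pictured with a minimal sketch. Everything here is an assumption for illustration: `SoftwareQueue`, `ReadBatchLimit`, and the flush callback are hypothetical names, not the runtime's actual classes, and the log does not say how DEBUG_CLR_MAX_BATCH_SIZE is parsed or whether the comparison is inclusive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: read a positive batch limit from the environment,
// falling back to a default when the variable is unset or not a number.
inline std::size_t ReadBatchLimit(const char* name, std::size_t fallback) {
  const char* value = std::getenv(name);
  if (value == nullptr) return fallback;
  char* end = nullptr;
  const unsigned long long parsed = std::strtoull(value, &end, 10);
  const bool valid = (end != value) && (*end == '\0') && (parsed > 0);
  return valid ? static_cast<std::size_t>(parsed) : fallback;
}

// Hypothetical software queue: commands accumulate until the batch reaches
// the limit, then the whole batch is handed to the flush callback.
class SoftwareQueue {
 public:
  SoftwareQueue(std::size_t max_batch, std::function<void(std::size_t)> flush)
      : max_batch_(max_batch), flush_(std::move(flush)) {
    assert(max_batch_ > 0);
  }

  // Queue one command; flush as soon as the batch hits the limit
  // (an inclusive comparison is assumed here).
  void Submit(int command) {
    pending_.push_back(command);
    if (pending_.size() >= max_batch_) Flush();
  }

  // Hand all pending commands to the callback and release them.
  void Flush() {
    if (pending_.empty()) return;
    flush_(pending_.size());
    pending_.clear();
  }

  std::size_t pending() const { return pending_.size(); }

 private:
  std::size_t max_batch_;
  std::function<void(std::size_t)> flush_;
  std::vector<int> pending_;
};
```

With a limit of 3, seven submissions would trigger two flushes of three commands each and leave one command pending until the next explicit flush.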
German Andryeyev 364dfb0ed1 SWDEV-486602 - Optimize HSA callback performance
- Don't generate callbacks for HIP events
- Don't process profiling info in the callback for HIP events
- Wait for the CPU status update of the submitted commands
every 50 calls. This allows the runtime to drain the commands and
destroy HSA signals.

Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9
2024-10-11 14:50:25 -04:00
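
The "every 50 calls" drain from the commit above could look roughly like the following sketch. `SubmissionTracker` is a hypothetical name, and the blocking wait and signal destruction are only indicated by a comment. The log states just the interval and the goal, so the rest is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical tracker: counts submissions and drains all outstanding
// commands once every drain_interval calls.
class SubmissionTracker {
 public:
  explicit SubmissionTracker(std::size_t drain_interval)
      : drain_interval_(drain_interval) {
    assert(drain_interval_ > 0);
  }

  // Record one submitted command. Returns how many commands were released
  // by this call (0 unless the interval was reached).
  std::size_t OnSubmit(int command_id) {
    outstanding_.push_back(command_id);
    if (++calls_since_drain_ < drain_interval_) return 0;
    calls_since_drain_ = 0;
    return Drain();
  }

  std::size_t outstanding() const { return outstanding_.size(); }

 private:
  std::size_t Drain() {
    const std::size_t released = outstanding_.size();
    // A real runtime would block here until the CPU-visible status of each
    // command is updated, then destroy the associated signal.
    outstanding_.clear();
    return released;
  }

  std::size_t drain_interval_;
  std::size_t calls_since_drain_ = 0;
  std::deque<int> outstanding_;
};
```

Draining on a fixed interval rather than on every call keeps the common submission path cheap while still bounding how many signals stay alive.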
Saleel Kudchadker 35e03ea0d0 SWDEV-301667 - Logging upgrades
- Use AMD_LOG_LEVEL_SIZE in MBs to set log file size truncation; by default it is 2048 MB

Change-Id: Ia2f87e8c6b94148e30edfb602b279f93630817c3
2024-10-04 13:26:25 -04:00
Saleel Kudchadker 9de6d4d46c SWDEV-478624 - Use readback workaround to ensure kernel arg coherence
Use env var DEBUG_CLR_KERNARG_HDP_FLUSH_WA=1 to fall back to HDP flush
workaround. The default is 0

Change-Id: I7bdb9be61da60c30d15ac9991b7cd27351e1831c
2024-09-11 14:53:15 -04:00
victzhan 7a01db98e9 Revert "SWDEV-458943 - make new AMD_MONITOR on"
This reverts commit f8598dabb0.

Change-Id: I2a7ddb2d4340224f43749a2ea91a894a8a95b83b
2024-09-05 10:10:50 -04:00
Ioannis Assiouras 2c84211b58 SWDEV-470372 - Added hipExtHostAlloc API
This change adds a new HIP API `hipExtHostAlloc` which preserves
the functionality of `hipHostMalloc`.

Change-Id: I13504c6fc13465ddd7aed329795bb4f2fef1baff
2024-08-27 08:26:03 -04:00
German Andryeyev 9db52f9a46 SWDEV-470612 - Add the optimized multistream path
- Added the optimized multistream path in graph execution. It uses a fixed number of async streams in the execution
- Optimize the launch latency, where command
creation and execution are done at the same time
- Optimize the scheduling to use fewer barriers and waiting signals if
the same queue can be detected
- The new path is controlled by the DEBUG_HIP_FORCE_GRAPH_QUEUES
environment variable, where 0 will use the original path and any other
value will force the number of asynchronous queues for execution
- DEBUG_HIP_FORCE_ASYNC_QUEUE can force single-queue async
execution in graphs (applicable for Navi families only)

Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e
2024-08-02 14:19:44 -04:00
Saleel Kudchadker d379f4efd0 SWDEV-301667 - Refactor Blit force env var
Change-Id: I5344ac2e6442cd8f526118e688f1b1412cc5b45a
2024-07-25 15:15:10 -04:00
taosang2 f8598dabb0 SWDEV-458943 - make new AMD_MONITOR on
make DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR be true

Change-Id: I1d21378ff462478d3238d71e4e2a1a7d6b9167ac
2024-07-24 14:29:27 -04:00
Tao Sang 73c02041e1 SWDEV-458943 - Implement std::mutex based monitor
Implement std::mutex based monitor that has much
simpler logic than the legacy monitor.
Create DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR to
toggle them.
If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = false
  (by default), use legacy monitor;
If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = true,
  use std::mutex based monitor.
If there is no perf drop with the std::mutex based monitor,
the legacy one will be removed later.

Change-Id: I1d21368ff462477d3238d71e4e2a1a7d6b9167ad
2024-07-04 11:50:46 -04:00
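
For readers unfamiliar with the idea, a monitor built on `std::mutex` can be sketched as below. This is an illustrative assumption, not the runtime's actual monitor: `SimpleMonitor` and its method names are hypothetical, and the real class may differ (for example in recursion support or timed waits).

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical minimal monitor: a mutex paired with a condition variable.
class SimpleMonitor {
 public:
  void lock() { mutex_.lock(); }
  void unlock() { mutex_.unlock(); }
  bool try_lock() { return mutex_.try_lock(); }

  // Block until ready() returns true. The caller must already hold the
  // lock; it is released while waiting and held again on return.
  template <typename Predicate>
  void wait(Predicate ready) {
    std::unique_lock<std::mutex> guard(mutex_, std::adopt_lock);
    cond_.wait(guard, ready);
    guard.release();  // Leave the mutex locked for the caller.
  }

  void notify() { cond_.notify_one(); }
  void notifyAll() { cond_.notify_all(); }

 private:
  std::mutex mutex_;
  std::condition_variable cond_;
};
```

Because the class exposes `lock()` and `unlock()`, it can be used with `std::lock_guard<SimpleMonitor>`, so a waiter holds the guard, calls `wait(...)` with its condition, and a notifier updates the shared state under the same lock before calling `notify()`.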
Ioannis Assiouras fa07c33cba SWDEV-470787 - Fixed undefined symbols for flags in static build
Change-Id: I7812c8924396d0df9ab331f9a1844aabbf5a9211
2024-07-04 02:57:22 -04:00
Ioannis Assiouras 3edf1501cc SWDEV-463865 - namespace changes to prevent symbol conflicts in static builds
Change-Id: I09ceb5962b7aa19156909f47167c87d6887c9cd1
2024-06-12 16:22:27 -04:00
kjayapra-amd 892071aeb2 SWDEV-460948 - Changes to alloc, set, capture under single function.
Change-Id: I7b2d40e99e812b97c53535c5e63c41ad64a8f543
2024-06-06 16:57:53 -04:00
Tao Sang d0050ce309 SWDEV-433371 - Support new comgr unbundling action
Support new comgr unbundling action API to extract code objects
in compressed and uncompressed modes.

Create HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION ENV to
toggle new path and old path.
If HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION=false(default),
   uncompressed codeobject will go old path for better perf,
   compressed   codeobject will go new path.
If HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION=true,
   both uncompressed and compressed codeobjects will go new
   path.

Add comgr wrapper for
   amd_comgr_action_info_set_bundle_entry_ids()

Change-Id: I79952f132fe21249296685ee12cae05a4f9aec32
2024-05-28 06:31:10 +00:00
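
The path selection described in the commit above reduces to a small decision table, sketched here under stated assumptions. `UnbundlePath`, `EnvFlagEnabled`, and `SelectUnbundlePath` are hypothetical names, and the accepted spellings of the environment value ("1" or "true") are a guess, since the log does not say how the flag is parsed.

```cpp
#include <cstdlib>
#include <cstring>

enum class UnbundlePath { kLegacy, kNewComgrAction };

// Hypothetical flag reader: treats "1" and "true" as enabled and anything
// else, including an unset variable, as disabled.
inline bool EnvFlagEnabled(const char* name) {
  const char* value = std::getenv(name);
  return value != nullptr &&
         (std::strcmp(value, "1") == 0 || std::strcmp(value, "true") == 0);
}

// Compressed code objects always take the new comgr action. Uncompressed
// ones take it only when the toggle is set; otherwise they keep the old
// path, which the commit message describes as faster for that case.
inline UnbundlePath SelectUnbundlePath(bool is_compressed, bool always_use_new) {
  if (is_compressed || always_use_new) return UnbundlePath::kNewComgrAction;
  return UnbundlePath::kLegacy;
}
```

A caller would typically evaluate `EnvFlagEnabled("HIP_ALWAYS_USE_NEW_COMGR_UNBUNDLING_ACTION")` once at startup and pass the result in, rather than reading the environment on every unbundle.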
Tao Sang a1350fe8c1 Revert "SWDEV-433371 - use comgr to unbundle code objects"
This reverts commit e53df57ffe.

Reason for revert: The new comgr unbundling action leads to a perf drop for uncompressed code objects. A new patch will use the old path for uncompressed code objects and the new unbundling API for compressed ones.

Change-Id: I41ef53b71fc9f7aaa8cf231d4d70945f1117db52
2024-05-28 06:31:10 +00:00
taosang2 e53df57ffe SWDEV-433371 - use comgr to unbundle code objects
1. Make runtime use comgr to unbundle code objects
2. Support compressed/uncompressed modes
3. Remove HIP_USE_RUNTIME_UNBUNDLER and
   HIPRTC_USE_RUNTIME_UNBUNDLER to simplify logic
4. Add comgr wrapper for
   amd_comgr_action_info_set_bundle_entry_ids()

Change-Id: Ic41b1ad1b64cca1e31986437983a5146d52a7329
2024-05-01 16:09:12 -04:00
German Andryeyev 7a371503b2 SWDEV-311271 - Enable mempools under Linux
Change-Id: I7fda94e61121f9d3a30f4ad185b8a97712922f3c
2024-04-29 18:06:34 -04:00
German Andryeyev f0c7ecf617 SWDEV-455254 - Add kernel arg optimization
Add kernel arguments optimization into blit path.
Enabled by default on MI300.

Change-Id: I2694a81b90d48ad07d86dfe4c0c64fe187bada8e
2024-04-10 18:08:37 -04:00
Saleel Kudchadker c157bfb202 SWDEV-301667 - Create TS for each node recorded in graph
- Create a vector to allow multiple TS to be stored in Command.
- This means we don't wait for the entire batch in the Accumulate command
to finish when we exhaust signals.
- Reduce the number of signals created at init to 64. This min value
may still need to be tuned, but the KFD allows a max of 4094 interrupt
signals per device.
- Store kernel names whenever they are available and not just when
profiling. If we dynamically enable profiling, as for Torch, a crash
can happen if hipGraphInstantiate wasn't included in the Torch profile scope,
because we previously entered kernel names only when the profiler was
attached.

Change-Id: I34e7881a25bbc763f82fdeb3408a8ea58e1ec006
2024-03-26 14:47:24 -04:00
German Andryeyev 0f3391b93e SWDEV-311271 - Enable mempool under Windows
Change-Id: Ifa4cac4a8d52e031d63f62515439ca09efe7b4cb
2024-03-11 10:45:51 -04:00
Saleel Kudchadker 94c7004df8 SWDEV-301667 - Increase default signal pool to 4096
Change-Id: I4ab23b0f87e295b40ab76ad6e96249d11b8ad04d
2024-02-29 22:52:02 +00:00
Saleel Kudchadker 68f40f78dd SWDEV-443760 - Enable device kernel args for MI300
- Enable Device kernel args for MI300* for now.
- Fix a perf issue which impacts graph instantiate when dev kernel args
are enabled.

Change-Id: I962e58fd9d8dd1a8db95e601cb03a8e9c7bac97f
2024-02-28 19:10:04 -05:00
Rahul Garg b954d0d6e0 SWDEV-443760 - Disable HIP_FORCE_DEV_KERNARG by default
Change-Id: I8c3d8e65aa954bd28499eebefbc532d1177445dc
2024-02-22 04:37:51 -05:00
Todd tiantuo Li 7bfee3481b SWDEV-333557 - Enable PAL_HIP_IPC_FLAG by default
Change-Id: Ibb2ca0b9521aff4eca190e4817dcc5f8d697b172
2024-02-20 18:45:25 -05:00
Saleel Kudchadker f138e0d113 SWDEV-443760 - Enable device kern args
- Implement workaround to ensure HDP writes are done by writing and
reading the HDP MMIO register.
- Implement the same workaround for graphs, we no longer need sentinel
write/readback

Change-Id: I0d3027b46a1f61131ec62e3c8c669ff5184fa6b2
2024-02-20 02:03:14 -05:00
Anusha GodavarthySurya ae0368d12d SWDEV-422207 - Enable DEBUG_CLR_GRAPH_PACKET_CAPTURE environment variable
Change-Id: I9bf72b9c1a56980352109bd4d42b54ecb2d1b8f9
2024-02-05 05:08:11 +00:00
Anusha GodavarthySurya 0a055f874b SWDEV-422207 - Added debug env to dump graph during Instantiation
Change-Id: Ibde2ae5b8d240f3986bcd168facc513a319c0f17
2024-02-05 05:08:11 +00:00
German 7d661bc7df SWDEV-404889 - Enable debugger interface in PAL
Add GPU_DEBUG_ENABLE to control ttpm behavior. If enabled,
then HW will collect more debug info at some perf cost

Change-Id: Icee0686b903a7b1bd483710b9d611877cd43c6aa
2024-01-02 11:51:42 -05:00
kjayapra-amd e05923b139 SWDEV-413997 - Enable Virtual Mem support by default.
Change-Id: Ia3db3919701708cf95574692e1d47375ca99d7fd
2023-12-20 12:49:16 -05:00
German Andryeyev f1dc81f427 SWDEV-432174 - Change the fillBuffer kernel
- Add the new fillBuffer kernel, which allows launching a limited
number of workgroups for the memory fill operation
- Switch fill memory to 16-byte writes by default
- Allow limiting the workgroups with DEBUG_CLR_LIMIT_BLIT_WG

Change-Id: Ibad1822f2d42b2fc71bcfc1917c31409c0623e8e
2023-11-16 14:25:55 -04:00
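
One way a fill can cover a whole buffer with only a limited number of workgroups is a strided loop, simulated on the CPU in the sketch below. This is an assumption about the general technique, not the actual fillBuffer kernel: `Pattern16` and `FillBuffer16` are hypothetical names, and the real kernel's indexing scheme is not described in the log.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// A 16-byte fill pattern, matching the 16-byte write granularity.
struct Pattern16 {
  std::uint32_t words[4];
};

// CPU simulation of a fill with a capped launch size: workgroups *
// workgroup_size work items exist in total, and each one writes 16-byte
// elements at a stride equal to the total number of work items.
inline void FillBuffer16(std::vector<unsigned char>& buffer,
                         const Pattern16& pattern, std::size_t workgroups,
                         std::size_t workgroup_size) {
  assert(buffer.size() % sizeof(Pattern16) == 0);
  const std::size_t elements = buffer.size() / sizeof(Pattern16);
  const std::size_t stride = workgroups * workgroup_size;
  assert(stride > 0);
  for (std::size_t item = 0; item < stride; ++item) {
    // On a GPU each value of `item` would be one work item running in
    // parallel; here the work items simply run one after another.
    for (std::size_t i = item; i < elements; i += stride) {
      std::memcpy(buffer.data() + i * sizeof(Pattern16), &pattern,
                  sizeof(Pattern16));
    }
  }
}
```

The launch size stays fixed regardless of buffer size, so a large fill costs more loop iterations per work item instead of more workgroups.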
Ioannis Assiouras 7868876db7 SWDEV-428244 - Set PARAMETERS_MIN_ALIGNMENT to the native alignment
Change-Id: I14d8a0db4e575d6fa816754c52df405de88d9200
2023-10-21 17:26:46 -04:00
kjayapra-amd 3ef829939a SWDEV-413997 - Initial VMM changes for ROCm path.
Change-Id: I4405fd7b53182eb4c4622835c811c0dc08461537
2023-10-16 11:29:16 -04:00
jiabaxie 28f0daa34f SWDEV-405983 - adding in HIP_LAUNCH_BLOCKING
Change-Id: I3f9c8a745099aab05155ebe910e727693961a02f
2023-10-10 21:11:13 -04:00
Anusha GodavarthySurya e63c280d4d SWDEV-422207 - Capture AQL packets for graph kernel nodes during graph instantiation and enqueue the AQL packets during launch
Change-Id: I1e5f7f9e2a70bd500d190193cb6ba0867f5a63e7
2023-10-05 00:34:29 -04:00
German 7be3a5e33e SWDEV-407533 - [ABI Break]Remove Wavelimiter
Change-Id: I6a2f6fb5a0c3acea93fa0200a69679783e76f5bd
2023-09-07 09:58:41 -04:00
kjayapra-amd 6a0f80a03d SWDEV-381625 - Parse compiler and linker options from environment variable.
Change-Id: Id5a012b678e5973c4b64dff84444a909aefae006
2023-08-29 20:24:27 -04:00
German 077311153a SWDEV-407533 - [ABI Break]Purge unused env vars
Change-Id: I627950e8ebb6299affc602754a20d442dbe42b14
2023-08-24 14:11:40 -04:00
Saleel Kudchadker aa6eb555e2 SWDEV-384557 - Enable SDMA query
Change-Id: Ibb0a8d131f799985a4d4adbf753261e58c04157f
2023-08-01 18:41:23 -04:00
Todd tiantuo Li 04b9ab49eb SWDEV-333557 - add PAL_HIP_IPC_FLAG for PAL HIP device allocations
Change-Id: I9017f4e3b03d4817bf233c788e30775fb2297589
2023-07-17 08:10:25 -04:00
Anusha GodavarthySurya b0e6f99ad7 SWDEV-392732 - Initial commit for graph doorbell optimization(AQL Buffering)
Change-Id: I451725006c54c249dc530c55d2af2a31594bf49b
2023-07-16 07:56:00 -04:00
Saleel Kudchadker 770b2a4711 SWDEV-384557 - Rename env var
- Rename HIP_USE_SDMA_QUERY to DEBUG_CLR_USE_SDMA_QUERY as this is
supposed to be a temporary env var for debug purposes only.

Change-Id: If6ebd52ab87624375a3df24ceccdcc05c60a65af
2023-06-29 13:54:55 -04:00
Ioannis Assiouras 4add0e6563 SWDEV-405182 - Revert min alignment for abstract parameters stack to 16 bytes
Change-Id: I9e6ace281468e8ef11b011c58f5971ce8907f3c6
2023-06-23 04:39:51 -04:00
Saleel Kudchadker 8d193c32bb SWDEV-384557 - Use toggle for SDMA query
- Use HIP_USE_SDMA_QUERY env var toggle for new API use. Env var is 0 by
default

Change-Id: If725a0c41e15f78a1a6c3f47942954fe9240b4db
2023-06-15 01:02:24 -04:00
Jacob Lambert 443f912c7f SWDEV-375055 - Re-enable Comgr unbundler
With recent upstream changes (D145770), we can now use the
Comgr unbundler without requiring an env field in the supplied
targetID. For users, this is consistent with previous legacy
unbundler behavior.

Change-Id: I5f085b0fa1ad352bbbb282b75367c206b75f279f
2023-05-31 16:14:08 -04:00
Saleel Kudchadker 5436d362b1 SWDEV-301667 - Add a flag for gpuvm kernargs
HIP_FORCE_DEV_KERNARG=1 will create a device allocation for kernel arg
segment. Flag is 0 by default.

Change-Id: Iaaf5a149f3be8596568878d5d272268baf067c60
2023-05-22 11:23:48 -04:00
Alex Voicu 06df9e2efd SWDEV-301667 - Kernelarg gpuvm
Add aligned, nontemporal `memcpy` for kernarg.

Change-Id: I5d8ac76904feaf793b45ec2ea5fbd1069be20068
2023-05-22 11:21:14 -04:00
German 04b696abee SWDEV-353281 - VM support in mempool for graphs
The change enables VM support in graphs on Windows. This avoids
caching all allocations, at the cost of map/unmap
overhead during memory create/destroy.

Change-Id: I792be00fba099e5e5d3cd44a963e1dfd6976a86d
2023-05-05 15:31:26 -04:00
Maneesh Gupta 5dc104b3ea SWDEV-368235 - Revert "Remove obsolete env variables"
This reverts commit 7b50c935f8.

Reason for revert: Deferred to a future release.

Change-Id: Ia66c37f0ab9734dee73c930d10d7469d5fd57254
2023-02-15 07:25:00 +00:00
German 7b50c935f8 SWDEV-368235 - Remove obsolete env variables
Change-Id: I7e14d53297e79e2f68b3a6cc40251ad7db9eb5ab
2023-02-03 13:44:24 -05:00