Commit Graph

80 Commits

Author SHA1 Message Date
German Andryeyev 364dfb0ed1 SWDEV-486602 - Optimize HSA callback performance
- Don't generate callbacks for HIP events
- Don't process profiling info in the callback for HIP events
- Wait for CPU status update of the submitted commands
every 50 calls. That allows the commands to be drained and
HSA signals to be destroyed.

Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9
2024-10-11 14:50:25 -04:00
Todd tiantuo Li 41dc4545fc SWDEV-472357 - support Rect copy with staging buffer for 2D & 3D memcpy in PAL
Change-Id: Ie32f3e5a6fa077f6b2db20fc1ab1e2e0da8344cb
2024-10-10 18:00:19 -04:00
Anusha GodavarthySurya 742b0210d3 SWDEV-477324 - Capture Memcpy1D pinned H2D D2H
Change-Id: I1f4744f20a9caeed005ec68da44e5fde737e09f7
2024-09-30 01:01:30 -04:00
Vladana Stojiljkovic da5f1a6146 SWDEV-482086 - Fix hipGraphInstantiate leak
* In a scenario where kernel is launched with hipExtLaunchKernelGGL and stop event is used, hipGraphInstantiate leaks. Since stop event is used, profiling is enabled and Timestamp (ReferencedCountedObject) is created, but it doesn't get released.
* The idea behind this solution is that profiling should be disabled when command is captured, hence the timestamp should not be created. Because information about capturing isn't available when kernel command is created, packet capturing state is used to determine whether to create a timestamp or not.

Change-Id: Ia23adac4592ded4fb5e236acf99e12e729f63692
2024-09-29 11:36:53 -04:00
Chong Li e6a5c81221 SWDEV-478929 - Benchmark ReallyQuickPureX Failed
Ensure the member functions Alloc() and Free() of command_pool_ will not be
accessed after command_pool_ is destructed.

Signed-off-by: Chong Li <chongli2@amd.com>
Change-Id: Ic2d36423302518a030bd61fa399290ebe2ed8194
2024-09-10 22:08:18 -04:00
German Andryeyev 9db52f9a46 SWDEV-470612 - Add the optimized multistream path
- Added the optimized multi stream path in graph execution. It uses a fixed number of async streams in the execution
- Optimize the launch latency, where command
creation and execution are done at the same time
- Optimize the scheduling to use fewer barriers and waiting signals if
the same queue can be detected
- The new path is controlled by the DEBUG_HIP_FORCE_GRAPH_QUEUES
environment variable, where 0 will use the original path and any other
value will force the number of asynchronous queues for execution
- DEBUG_HIP_FORCE_ASYNC_QUEUE can force single queue async
execution in graphs (applicable for Navi families only)

Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e
2024-08-02 14:19:44 -04:00
Anusha GodavarthySurya bd3a35bde1 SWDEV-468424 - Add support to capture multiple AQL Packets
=> Added support to capture multiple AQL Packets.
=> Added Interface to callback to hip runtime from rocclr to allocate
kernel args from the graph kernel arg pool.
=> Enabled Support to capture memset node.

Change-Id: I7e1c2ba06927459e024653058af142bd82192c43
2024-08-01 23:55:51 -04:00
Anusha GodavarthySurya 346da4bb40 SWDEV-468424 - hipgraph capture memset node
Capture AQL packets during GraphInstantiation and enqueue AQL packets during graph launch.

Added support to capture single graph memset node.
Capture support for memset node is currently disabled.
Memset capture will be enabled when capture for multiple packets is supported.

Change-Id: I14dfbc41731025cc3a548a730558915def3fa384
2024-07-19 23:52:50 -04:00
Tao Sang 73c02041e1 SWDEV-458943 - Implement std::mutex based monitor
Implement std::mutex based monitor that has much
simpler logic than legacy monitor.
Create DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR to
toggle them.
If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = false
  (by default), use legacy monitor;
If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = true,
  use std::mutex based monitor.
If the std::mutex based monitor shows no perf drop,
the legacy one will be removed later.

Change-Id: I1d21368ff462477d3238d71e4e2a1a7d6b9167ad
2024-07-04 11:50:46 -04:00
kjayapra-amd 892071aeb2 SWDEV-460948 - Changes to alloc, set, capture under single function.
Change-Id: I7b2d40e99e812b97c53535c5e63c41ad64a8f543
2024-06-06 16:57:53 -04:00
Ioannis Assiouras b8c2ac4de4 SWDEV-463865 - symbol renamings to prevent conflicts in static build
Change-Id: Id7fbb638c1088c23df52fee877cd790d637b1ffb
2024-06-06 04:05:55 -04:00
Saleel Kudchadker badf2b0880 SWDEV-301667 - Refactor graph code
- Remove Last graph node optimization and instead submit a barrier NOP
packet always. This simplifies the code.

Change-Id: Ied443173ba47a08b6df148ac7e3ead712acda11c
2024-05-28 06:28:17 +00:00
German Andryeyev 5b0bfdcbad SWDEV-460242 - Add system memory suballocator
Switch commands creation to the new suballocator to avoid
frequent expensive OS calls

Change-Id: I3597c811820e577c15708bad8b8a41aa53acc400
2024-05-28 06:28:17 +00:00
Saleel Kudchadker 4a9d24a211 SWDEV-301667 - Pass reference to kernel name
Change-Id: I21abe109ddfabfe7640bf78a96c81a1317d31952
2024-05-05 16:38:20 -04:00
Saleel Kudchadker c157bfb202 SWDEV-301667 - Create TS for each node recorded in graph
- Create a vector to allow multiple TS to be stored in Command.
- This would mean we don't wait for the entire batch in Accumulate command
to finish when we exhaust signals.
- Reduce the number of signals created at init to 64. This min value
may still need to be tuned but the KFD allows max of 4094 interrupt
signals per device.
- Store kernel names whenever they are available and not just when
profiling. If we dynamically enable profiling like for Torch, a crash
can happen if hipGraphInstantiate wasn't included in Torch profile scope
because we previously entered kernel names only when profiler is
attached.

Change-Id: I34e7881a25bbc763f82fdeb3408a8ea58e1ec006
2024-03-26 14:47:24 -04:00
Saleel Kudchadker 9a6ddae7b2 SWDEV-301667 - Reset profiler correlation_id_
- The correlation_id had random junk values which we were inserting in
the dispatch AQL packet even when no profiler was attached but if we had
a valid timestamp.
- Also make sure we don't even write the reserved2 field in the AQL
packet if no profiler is attached.

Change-Id: Icdb7493198c1bb5e2d786a97e027288660854cd7
2024-02-05 05:08:11 +00:00
Saleel Kudchadker dfd4635f91 SWDEV-422207 - Tag captured kernel names for graphs
Change-Id: I9540daa4abf9c340541a681037e2dca4eec821ed
2024-01-03 11:50:05 -05:00
Saleel Kudchadker b056686607 SWDEV-422207 - Report kernel names for activity profiling
- Report kernel names for optimized graph path
- Refactor code so that we store profiling info in Accumulate command

Change-Id: Ib97735a0239aeb9fc3a50a4bb7126dd0bcadc8af
2023-11-15 14:38:07 -05:00
Saleel Kudchadker c3bd229f4f SWDEV-422207 - Optimize graph end detection
- Do not use extra barrier to detect graph end. If it's a kernel node we
can use a completion signal for the last packet. Saves roughly 6us for
Phantom testcase per graph launch.

Change-Id: I5e0c2479d9964fbeda86ed97533f6718f49a7f91
2023-11-10 11:57:02 -05:00
Saleel Kudchadker f5c6fc4dfa SWDEV-422207 - Report TS for Accumulate command
Change-Id: Iba193a6068c1a2d25c2136643faee2c1e2591a07
2023-11-07 18:19:40 +00:00
Saleel Kudchadker 40f41f4d0b SWDEV-422207 - Track commands for capture
- Track all captured commands under a new AccumulateCommand
- Add begin() and end() methods to capture commands
- Explicit TS object now passed to certain methods because
profilingBegin() and profilingEnd() now happen separately and thus can
run into threading issues

Change-Id: I171106bdcad72b057836cb2f3fc398db3533119f
2023-11-03 05:09:04 +00:00
Anusha GodavarthySurya e63c280d4d SWDEV-422207 - Capture AQL Packets for graph Kernel nodes during graph Inst. And enqueue AQL packet during launch
Change-Id: I1e5f7f9e2a70bd500d190193cb6ba0867f5a63e7
2023-10-05 00:34:29 -04:00
German 7be3a5e33e SWDEV-407533 - [ABI Break]Remove Wavelimiter
Change-Id: I6a2f6fb5a0c3acea93fa0200a69679783e76f5bd
2023-09-07 09:58:41 -04:00
Tao Sang d433df4761 SWDEV-417727 - Fix hipSignalExternalSemaphoresAsync()
This reverts commit 44a3935cda.

Implement the right way to make ExternalSemaphores be signalled
only after prior work on the stream has finished.

Change-Id: I9d5974e05d5f229170b928db4566c14e40e3cbaa
2023-08-23 22:31:27 -04:00
taosang2 44a3935cda SWDEV-417727 - Fix hipSignalExternalSemaphoresAsync()
Let ExternalSemaphores be signalled only after prior work on the
stream has finished.

Change-Id: I856917db905f68f55fdf484f5267f7fe8ea3117f
2023-08-23 14:58:37 -04:00
Anusha GodavarthySurya b0e6f99ad7 SWDEV-392732 - Initial commit for graph doorbell optimization (AQL Buffering)
Change-Id: I451725006c54c249dc530c55d2af2a31594bf49b
2023-07-16 07:56:00 -04:00
sdashmiz 38a67df312 SWDEV-403638 - Fix warnings
- disable deprecated function use warning
- disable size_t to 'type' warning
- disable conversion from 'type1' to 'type2' warning

Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: I64161fd37cf56de3d132102103267ae8da40193a
2023-06-15 12:17:22 -04:00
German 04b696abee SWDEV-353281 - VM support in mempool for graphs
The change enables VM support in graphs on Windows. That allows
to avoid caching of all allocations at the cost of map/unmap
overhead during memory create/destroy.

Change-Id: I792be00fba099e5e5d3cd44a963e1dfd6976a86d
2023-05-05 15:31:26 -04:00
Saleel Kudchadker 20ca8b8116 SWDEV-384557 - Leverage SDMA engine status query
Change-Id: I5f386f2965de24a229ea43b6c4da82099692f91f
2023-04-05 07:50:53 +00:00
German 53a10c9039 SWDEV-377991 - Remove liquidflash support
Change-Id: Iba6455e5c0210c3223a06fec332404cd9f489154
2023-01-20 09:57:06 -05:00
Ioannis Assiouras 72b45e2a1f SWDEV-369581 - Convey copy API metadata to ROCclr
Change-Id: I569462d6d268700d419510255e201bf7d80d6714
2022-12-09 00:27:15 -05:00
Sourabh Betigeri 5d7f3f9f3c SWDEV-305894 - Cooperative groups grid and multi grid sync support for gfx940+
Change-Id: I35d72f1cb50c3a96eee56a612b72d641852b145f
2022-12-05 16:30:30 -05:00
Laurent Morichetti e713b5c7d0 SWDEV-351980 - Enable profiling for commands reporting activities
Profiling should be enabled for any command reporting activities as the
activity record captures the profilingInfo's start and end timestamps.

Since IS_PROFILER_ON is only used to determine whether API tracing is
enabled, there is no need to expose it globally, it should be a property
of the activity_prof::CallbacksTable.

Change-Id: I44a0d19ed2862606cfbc9a98c1a07a336ab7e26c
2022-09-21 05:53:59 -04:00
Laurent Morichetti 4fbae91468 SWDEV-351980 - Move activity_ to the ProfilingInfo
The activity_ is only instantiated if profiling is enabled.

Remove the HIP private global record ID. Instead, use the correlation ID
stored in the hip_api_data_t by the profiler while the last HIP function
is in scope.

For NDRange and Copy commands, store the kernel name and byte size
(respectively) in the record.

General cleanups to improve the code's readability.

Change-Id: I01907484b0d9611eb9440c3a7c4865479dc42289
2022-09-21 05:53:47 -04:00
Christophe Paquot 67657d6099 SWDEV-322620 - Virtual Memory Management
Implement map/unmap for PAL backend
Create commands since PAL uses the IQueue to map/unmap

Change-Id: I97e26a7d28ae5e10774c9ca65307153100945621
2022-04-22 18:09:26 -04:00
Saleel Kudchadker b6cbfaf499 SWDEV-301667 - Separate scope from marker_ts_
Change-Id: I19f4d394e898bfb8c9d9a2c2edf9d5bf5def3b08
2022-04-16 19:26:31 -04:00
Saleel Kudchadker 8eeaa998c0 SWDEV-301667 - Add cache state for a device
- Add a global cache state for a device to indicate scopes of submitted
AQL packets
- Remove scopes for TS marker if hipEventReleaseToDevice is passed. Set
env ROC_EVENT_NO_FLUSH=1 to use NOP AQL for event records.
It would flush caches by default with system scope release.
- Calling finish() should check whether caches are flushed and, if not, queue a
marker

Change-Id: Ibbbdbb1cd7ac61cb35649169212142545be159e0
2022-04-12 12:27:31 -04:00
anusha GodavarthySurya ef1ec6ffde SWDEV-240806 - hipGraph performance create new graph commands for every launch
Change-Id: Ifd4a373d6a76118ae0946238b29accfacbe32937
2021-11-19 00:09:47 -05:00
Jason Tang f61dc18681 SWDEV-292525 - Fix -Werror=parentheses build failure
Change-Id: I2650413914914392df68a9fbf669af216a132640
2021-10-22 09:43:15 -04:00
Sourabh Betigeri 641b1d3968 SWDEV-292525 - Add more parentheses to fix debug build failures
Change-Id: I91bb7e1f0f40b85dd908a532a77b11c9e7406019
2021-10-09 00:04:01 +00:00
anusha GodavarthySurya f4bdb5c6ff SWDEV-24806 - Fix compilation warning
Change-Id: I6d015b0349e01047f8f26a8d73365e2963990eb0
2021-10-06 22:09:17 -07:00
Sourabh Betigeri 5e116c6c99 SWDEV-292525 - Adds parentheses to fix regression
Debug builds fail with an error due to missing
parentheses when -Werror=parentheses is enabled

Change-Id: I5745a63b5cf2c7a3aeed90ea572081a6fa67e366
2021-10-06 13:38:55 -04:00
anusha GodavarthySurya 34e86bf0c3 SWDEV-24806 - Added support to update memory command params
Change-Id: Ib518eaedeeb820023a05278a017a9716e5601dca
2021-10-05 10:51:14 -04:00
Sarbojit Sarkar 22a847f3ce SWDEV-301823 - Optimize hipMemset2D/3D
Change-Id: Ibe560149a263c2ac6b08e4eb1a1d331d2aeac78c
2021-09-27 14:10:06 -04:00
Sourabh cbb8d82bdb SWDEV-292525 - [vdi] Path to streamOps shaders
Implementation to use a blit kernel to perform
a hipStreamWait/write instead of an AQL packet.

Change-Id: I462671ed5cec37144dfe97ff66439249196117c1
2021-09-27 13:59:35 -04:00
Vladislav Sytchenko c68f024b35 SWDEV-1 - Fix Windows build
std::mem_fun() is removed in C++17. Simplify logic to not require it.
Change-Id: Ic9a4753b48dd13fcb20cd5b90ff73c3df3211b9f
2021-09-08 12:59:48 -04:00
Vladislav Sytchenko 215853fd54 SWDEV-298985 - Calm down build warnings
This resolves -Wreorder warning.

Change-Id: I28851d66e19a70c4851ac056819d2daadbdc7113
2021-08-29 13:58:48 -04:00
Satyanvesh Dittakavi 169cc857fd SWDEV-298985 - hipMemPrefetchAsync should prefetch the data to the specified destination device
Pass the device agent specified by the user to the ROCr api instead of passing the device agent attached to the specified stream

Change-Id: I86c98935b9dc404eaa6d47ccdd082a8c3678fb36
2021-08-27 05:12:07 -04:00
Vladislav Sytchenko de53cd1903 SWDEV-240806 - Fix Windows build
Fixes error "All control paths should return a value".

Change-Id: I4718688b55b24862465e15ea0d64b32fa44b3299
2021-08-22 23:56:08 -07:00
anusha GodavarthySurya 050d54b503 SWDEV-240806 - Add methods to update kernel command parameters
Change-Id: Iba90a31f9c5d6d4f2b60b7ccf903325c03d4d245
2021-08-22 23:56:08 -07:00