Commit Graph

321 Commits

Author SHA1 Message Date
German Andryeyev 403f624bf8 SWDEV-486602 - Add tracking of HSA handlers
Add an atomic counter to track the outstanding HSA handlers.
Wait on the CPU for the callbacks if the number exceeds the value
in the DEBUG_HIP_BLOCK_SYNC env variable.

Change-Id: I95dc8c4bf0258c7e59411b7504220709ed6898c5
2024-10-25 15:20:50 -04:00
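A minimal sketch of the throttling described in commit 403f624bf8, assuming a plain threshold compared against an atomic counter. The class and method names (HandlerTracker, OnSubmit, OnHandlerDone) are hypothetical, and `threshold` only stands in for DEBUG_HIP_BLOCK_SYNC. The submitting thread increments the counter and, if it exceeds the threshold, blocks on a condition variable until completion handlers bring it back down.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical tracker; `threshold` stands in for DEBUG_HIP_BLOCK_SYNC.
class HandlerTracker {
 public:
  explicit HandlerTracker(uint32_t threshold) : threshold_(threshold) {}

  // Call when a command with a completion handler is submitted.
  // Blocks the calling CPU thread while too many handlers are pending;
  // returns true if it had to wait.
  bool OnSubmit() {
    uint32_t pending = outstanding_.fetch_add(1, std::memory_order_acq_rel) + 1;
    if (pending <= threshold_) return false;
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] {
      return outstanding_.load(std::memory_order_acquire) <= threshold_;
    });
    return true;
  }

  // Call at the end of each completion handler.
  void OnHandlerDone() {
    uint32_t prev = outstanding_.fetch_sub(1, std::memory_order_acq_rel);
    assert(prev > 0 && "more handlers finished than were submitted");
    (void)prev;
    std::lock_guard<std::mutex> lock(mutex_);
    cv_.notify_all();
  }

  uint32_t Outstanding() const {
    return outstanding_.load(std::memory_order_acquire);
  }

 private:
  const uint32_t threshold_;
  std::atomic<uint32_t> outstanding_{0};
  std::mutex mutex_;
  std::condition_variable cv_;
};
```

Decrementing the counter before taking the mutex is safe here: the waiter re-checks the predicate while holding the same mutex, so a notification cannot slip in between its check and its wait.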
Julia Jiang 9f2f6a8aa7 SWDEV-488396,489257 - Fix the regression that caused the CTS pipes sub-test failure
Change-Id: Id4004f0d6da5754b12c9a21038de50472cb1fee5
2024-10-25 05:58:46 -04:00
German Andryeyev 6bb7d1afdc SWDEV-486602 - Fix Windows 32-bit build
Windows aligns fields to 8 bytes even in 32-bit builds.
Add BUG_CLR_SYSMEM_POOL to control the SysmemPool.

Change-Id: I8622aabc9f7391ed7dd8583b252ce9eb41d62293
2024-10-18 11:35:54 -04:00
German Andryeyev ad18146d8f SWDEV-486602 - Change SysmemPool implementation
- Remove the list of all chunks and use embedded chunk
information in each allocation. That simplifies the Free() logic,
avoiding an expensive loop if, for some reason, the number of
outstanding allocations grows significantly.

Change-Id: I9ea84d314320ce356ed24dd3180f262e2116c59b
2024-10-17 12:39:39 -04:00
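Commit ad18146d8f replaces the list of all chunks with chunk information embedded in each allocation, and 6bb7d1afdc notes that Windows aligns fields to 8 bytes even in 32-bit builds. The sketch below illustrates the general idea under stated assumptions: a small header in front of each allocation records its owning chunk, so freeing is O(1) with no search. Chunk, AllocHeader, PoolAlloc and PoolFree are hypothetical names, not the actual SysmemPool code; the chunk is a simple bump allocator reset when its last allocation is freed, and alignas(8) is an illustrative choice that gives the header the same 8-byte size on 32-bit and 64-bit builds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical chunk descriptor (not the actual SysmemPool layout).
struct Chunk {
  unsigned char* base = nullptr;  // start of the chunk's memory
  size_t capacity = 0;            // total bytes in the chunk
  size_t used = 0;                // bump-pointer offset
  size_t live = 0;                // allocations not yet freed
};

// Header embedded in front of every allocation. alignas(8) pads it to
// 8 bytes on both 32-bit and 64-bit builds, keeping user pointers aligned.
struct alignas(8) AllocHeader {
  Chunk* chunk;  // owning chunk, so Free() needs no search
};
static_assert(sizeof(AllocHeader) % 8 == 0, "header must keep 8-byte alignment");

// Bump-allocates `size` bytes from `c`; returns nullptr if the chunk is full.
// Assumes c.base is at least 8-byte aligned (true for std::malloc).
void* PoolAlloc(Chunk& c, size_t size) {
  size_t need = sizeof(AllocHeader) + ((size + 7) & ~size_t{7});
  if (need > c.capacity - c.used) return nullptr;
  AllocHeader* hdr = new (c.base + c.used) AllocHeader{&c};
  c.used += need;
  ++c.live;
  return hdr + 1;
}

// O(1) free: the owning chunk is read from the embedded header instead of
// looping over a list of all chunks.
void PoolFree(void* p) {
  Chunk* c = (static_cast<AllocHeader*>(p) - 1)->chunk;
  assert(c->live > 0);
  if (--c->live == 0) c->used = 0;  // chunk is empty again; reuse it from the start
}
```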
German Andryeyev 8657a77029 SWDEV-491375 - Limit the SW batch size
Applications may submit commands without waiting
for the GPU. That causes the number of unreleased SW commands to grow.
Make sure the runtime flushes the SW queue if it grows over a
threshold controlled by DEBUG_CLR_MAX_BATCH_SIZE.

Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396
2024-10-17 10:53:57 -04:00
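A hedged sketch of the batch limit from commit 8657a77029: enqueued commands accumulate in a software queue and are flushed automatically once the batch reaches a limit, so an application that never waits cannot grow the queue without bound. SwQueue and `max_batch` are hypothetical; `max_batch` only stands in for DEBUG_CLR_MAX_BATCH_SIZE, and the log does not say whether the real check triggers on reaching or on exceeding the limit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical software queue; commands are plain ints for illustration.
class SwQueue {
 public:
  explicit SwQueue(size_t max_batch) : max_batch_(max_batch) {
    assert(max_batch > 0);
  }

  void Enqueue(int cmd) {
    pending_.push_back(cmd);
    // Flush once the unsubmitted batch reaches the limit, even if the
    // application never waits on the GPU.
    if (pending_.size() >= max_batch_) Flush();
  }

  void Flush() {
    if (pending_.empty()) return;
    submitted_ += pending_.size();  // real code would hand the batch to the HW queue
    pending_.clear();
    ++flushes_;
  }

  size_t Pending() const { return pending_.size(); }
  size_t Submitted() const { return submitted_; }
  size_t Flushes() const { return flushes_; }

 private:
  size_t max_batch_;
  std::vector<int> pending_;
  size_t submitted_ = 0;
  size_t flushes_ = 0;
};
```

With a limit of 4, ten enqueues trigger two automatic flushes (8 commands submitted) and leave 2 pending.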
German Andryeyev 364dfb0ed1 SWDEV-486602 - Optimize HSA callback performance
- Don't generate callbacks for HIP events
- Don't process profiling info in the callback for HIP events
- Wait for the CPU status update of the submitted commands
every 50 calls. This allows the commands to drain and the
HSA signals to be destroyed.

Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9
2024-10-11 14:50:25 -04:00
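Commit 364dfb0ed1 waits for the CPU status of submitted commands every 50 calls so that finished commands drain and their HSA signals can be destroyed. The sketch below models only that periodic drain; Submitter and FakeSignal are hypothetical, and completion is simulated instead of being read from the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

constexpr unsigned kDrainInterval = 50;  // "every 50 calls" from the commit message

// Hypothetical stand-in for an HSA completion signal.
struct FakeSignal {
  bool done = false;
};

class Submitter {
 public:
  void Submit() {
    inflight_.push_back(FakeSignal{});
    if (++calls_ % kDrainInterval == 0) Drain();
  }

  // Real code would block until the GPU reports each command complete;
  // completion is simulated here so the sketch runs on its own.
  void Drain() {
    for (FakeSignal& s : inflight_) s.done = true;
    inflight_.clear();  // signals of completed commands can now be destroyed
    ++drains_;
  }

  size_t InFlight() const { return inflight_.size(); }
  unsigned Drains() const { return drains_; }

 private:
  std::deque<FakeSignal> inflight_;
  unsigned calls_ = 0;
  unsigned drains_ = 0;
};
```

After 120 submissions, the sketch has drained twice (at calls 50 and 100) and holds 20 in-flight signals.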
Todd tiantuo Li 41dc4545fc SWDEV-472357 - support Rect copy with staging buffer for 2D & 3D memcpy in PAL
Change-Id: Ie32f3e5a6fa077f6b2db20fc1ab1e2e0da8344cb
2024-10-10 18:00:19 -04:00
kjayapra-amd 74ebbe17e9 SWDEV-486573 - Check the return type of commit memory.
Change-Id: Id158cd7a0dff37b382b858cf7113aa4cf326300a
2024-10-09 05:10:03 -04:00
Branislav Brzak 43fcac1739 SWDEV-482130 - Fix release of virtual mem obj
Change-Id: I893a8353aa1a25d00e36c8e601caf31cc0fc1f22
2024-10-08 01:37:39 -04:00
Ioannis Assiouras 07bcc283f9 SWDEV-488851 - Correctly remove the queue from the active set on Windows
Change-Id: I4d21743ecf7a44636121f85566f898e62ff61e97
2024-10-02 12:06:59 +01:00
Anusha GodavarthySurya 742b0210d3 SWDEV-477324 - Capture Memcpy1D pinned H2D D2H
Change-Id: I1f4744f20a9caeed005ec68da44e5fde737e09f7
2024-09-30 01:01:30 -04:00
Vladana Stojiljkovic da5f1a6146 SWDEV-482086 - Fix hipGraphInstantiate leak
* When a kernel is launched with hipExtLaunchKernelGGL and a stop event is used, hipGraphInstantiate leaks. Because a stop event is used, profiling is enabled and a Timestamp (ReferenceCountedObject) is created, but it is never released.
* The fix is based on the idea that profiling should be disabled when a command is captured, so the timestamp should not be created. Because capture information isn't available when the kernel command is created, the packet capturing state is used to decide whether to create a timestamp.

Change-Id: Ia23adac4592ded4fb5e236acf99e12e729f63692
2024-09-29 11:36:53 -04:00
Anusha GodavarthySurya 870842201d SWDEV-485904 - Fix virtual and physical mem obj leaks
Change-Id: Ie0456b5dcfec206ae54a6aabfc2a15a620cac693
2024-09-19 23:04:20 -04:00
kjayapra-amd 12a39fbf22 SWDEV-480772 - Remove name variable from amd::Monitor class.
Change-Id: Ie2a4fa44f485786227230f8a892e090e718aa30e
2024-09-19 11:55:01 -04:00
Ioannis Assiouras bcc545e6b8 SWDEV-476929 - Introduce an activeQueues set
The new set tracks only the queues that have a command
submitted to them. This allows for fast iteration
in waitActiveStreams.

Change-Id: I2c832eefa01280d9a87a5f57874d36d2e9441de7
2024-09-16 15:53:49 -04:00
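A sketch of the activeQueues idea from commit bcc545e6b8 (and the removal fix in 07bcc283f9), assuming queues are tracked by pointer in a mutex-protected std::unordered_set. Device, Queue, MarkActive, RemoveQueue and WaitActiveStreams here are hypothetical stand-ins; the runtime's actual bookkeeping is not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_set>

// Hypothetical stand-in for a HIP stream / hardware queue.
struct Queue {
  int id;
};

class Device {
 public:
  // Called on every submission: remember that this queue has work.
  void MarkActive(Queue* q) {
    std::lock_guard<std::mutex> lock(mutex_);
    active_.insert(q);
  }

  // Called when a queue is destroyed, so it never lingers in the set.
  void RemoveQueue(Queue* q) {
    std::lock_guard<std::mutex> lock(mutex_);
    active_.erase(q);
  }

  // Only queues that received commands are visited; idle queues cost nothing.
  // Returns how many queues were waited on.
  size_t WaitActiveStreams() {
    std::unordered_set<Queue*> to_wait;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      to_wait.swap(active_);  // waited-on queues leave the active set
    }
    for (Queue* q : to_wait) {
      (void)q;  // real code would block here until q finishes its commands
    }
    return to_wait.size();
  }

 private:
  std::mutex mutex_;
  std::unordered_set<Queue*> active_;
};
```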
kjayapra-amd 4ecd77df5e SWDEV-484188 - Moving std::maps into struct const and into amd::Kernel class.
Change-Id: Ie4d5a64511412fdb498b045aaffb52c3a1286de6
2024-09-15 09:14:51 -04:00
Chong Li e6a5c81221 SWDEV-478929 - Benchmark ReallyQuickPureX Failed
Ensure the member functions Alloc() and Free() of command_pool_ are not
called after command_pool_ has been destroyed.

Signed-off-by: Chong Li <chongli2@amd.com>
Change-Id: Ic2d36423302518a030bd61fa399290ebe2ed8194
2024-09-10 22:08:18 -04:00
Saleel Kudchadker abc80fcc2f SWDEV-301667 - Improve kernel logging
Change-Id: I4b2b1950e3ab7124fd41af9a92a677c48d6da5eb
2024-09-10 13:43:58 -04:00
kjayapra-amd 6211037f63 SWDEV-439234 - Access check before memcpy and kernel operations.
Change-Id: I7057125c03460db205409e19980145298c190fe2
2024-09-06 14:30:00 -04:00
taosang2 857d0d60b9 SWDEV-475144 - Fix random language string
Fix a random language string that caused the trap handler to fail
to compile and hipMemset() to trigger a TDR on VMs in release
mode of hip-rt

Change-Id: Ie1d874742b804f62ceda68064fa54f5d39c092b8
2024-08-20 17:42:31 -04:00
German Andryeyev 9db52f9a46 SWDEV-470612 - Add the optimized multistream path
- Added the optimized multistream path in graph execution. It uses a fixed number of async streams during execution
- Optimize launch latency by creating and executing commands
at the same time
- Optimize scheduling to use fewer barriers and wait signals when
the same queue can be detected
- The new path is controlled by the DEBUG_HIP_FORCE_GRAPH_QUEUES
environment variable: 0 uses the original path, and any other
value sets the number of asynchronous queues used for execution
- DEBUG_HIP_FORCE_ASYNC_QUEUE can force single-queue async
execution in graphs (applicable to Navi families only)

Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e
2024-08-02 14:19:44 -04:00
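One way to picture the "fewer barriers when the same queue can be detected" point of commit 9db52f9a46: nodes are spread over a fixed number of queues, a node inherits its first parent's queue when it can, and only dependencies that cross queues need a wait. This is an illustrative greedy heuristic with hypothetical names (Node, Schedule, AssignQueues), not the runtime's actual scheduler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical graph node: indices of the nodes it depends on.
// Nodes are assumed to be in topological order.
struct Node {
  std::vector<int> deps;
};

struct Schedule {
  std::vector<int> queue_of;  // queue index chosen for each node
  int cross_queue_waits = 0;  // dependencies that need a wait/signal
};

// The first child of a node inherits its parent's queue, so in-order
// execution on that queue already satisfies the edge and no barrier is
// needed. Other nodes are assigned round-robin; only cross-queue edges
// cost a wait.
Schedule AssignQueues(const std::vector<Node>& nodes, int num_queues) {
  assert(num_queues > 0);
  Schedule s;
  s.queue_of.resize(nodes.size());
  std::vector<bool> inherited(nodes.size(), false);
  int next = 0;
  for (size_t i = 0; i < nodes.size(); ++i) {
    const std::vector<int>& deps = nodes[i].deps;
    int q;
    if (!deps.empty() && !inherited[deps[0]]) {
      q = s.queue_of[deps[0]];
      inherited[deps[0]] = true;
    } else {
      q = next++ % num_queues;
    }
    s.queue_of[i] = q;
    for (int d : deps) {
      if (s.queue_of[d] != q) ++s.cross_queue_waits;
    }
  }
  return s;
}
```

For a diamond graph (0 → 1, 0 → 2, {1, 2} → 3) on two queues, nodes 0, 1 and 3 share queue 0, node 2 runs on queue 1, and two cross-queue waits remain (0 → 2 and 2 → 3). With a single queue, in-order execution satisfies every edge and no waits are needed.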
Anusha GodavarthySurya bd3a35bde1 SWDEV-468424 - Add support to capture multiple AQL Packets
=> Added support to capture multiple AQL Packets.
=> Added an interface for rocclr to call back into the HIP runtime to
allocate kernel args from the graph kernel arg pool.
=> Enabled support for capturing memset nodes.

Change-Id: I7e1c2ba06927459e024653058af142bd82192c43
2024-08-01 23:55:51 -04:00
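A minimal sketch of the callback interface in commit bd3a35bde1, assuming the lower layer receives an allocation function from the HIP layer and copies each packet's kernel arguments into memory taken from a per-graph pool. KernArgAllocFn, GraphKernArgPool and CaptureKernArgs are hypothetical names, and the 16-byte alignment is an assumption for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical callback the lower layer (rocclr) uses to ask the upper
// layer (HIP) for kernel-argument storage.
using KernArgAllocFn = std::function<void*(size_t size, size_t alignment)>;

// Hypothetical per-graph pool owned by the HIP layer: a bump allocator.
class GraphKernArgPool {
 public:
  explicit GraphKernArgPool(size_t bytes) : storage_(bytes) {}

  // `alignment` must be a power of two; returns nullptr when the pool is full.
  void* Alloc(size_t size, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uintptr_t base = reinterpret_cast<uintptr_t>(storage_.data());
    uintptr_t mask = static_cast<uintptr_t>(alignment) - 1;
    uintptr_t cur = (base + offset_ + mask) & ~mask;
    if (cur + size > base + storage_.size()) return nullptr;
    offset_ = static_cast<size_t>(cur + size - base);
    return reinterpret_cast<void*>(cur);
  }

 private:
  std::vector<unsigned char> storage_;
  size_t offset_ = 0;
};

// Lower-layer capture of one packet's arguments through the callback.
void* CaptureKernArgs(const KernArgAllocFn& alloc, const void* args, size_t size) {
  void* dst = alloc(size, 16);
  if (dst != nullptr) std::memcpy(dst, args, size);
  return dst;
}
```

Passing a std::function keeps the lower layer unaware of how the HIP layer manages the pool; the HIP side can register a lambda that forwards to its own allocator.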
Ioannis Assiouras 8e137e8702 SWDEV-476460 - Fix for a race condition in SysmemPool::Alloc
Change-Id: Ia94709e68b236c9460589963c0f09ec1f481c306
2024-08-01 04:22:26 -04:00
Ioannis Assiouras 9b33db9b24 SWDEV-472309 - Ensure static maps are destroyed after __hipUnregisterFatBinary
hipDeviceSynchronize called from __hipUnregisterFatBinary
accesses static maps and monitors. This change ensures these objects
are not destroyed before __hipUnregisterFatBinary is called.
Additionally, it disables the teardown process for static builds.

Change-Id: I46b58641d60efcf6637a8e99cdd786ffe9e2c77d
2024-07-30 10:26:59 -04:00
Anusha GodavarthySurya 346da4bb40 SWDEV-468424 - hipgraph capture memset node
Capture AQL packets during GraphInstantiation and enqueue AQL packets during graph launch.

Added support for capturing a single graph memset node.
Capture support for memset nodes is currently disabled.
Memset capture will be enabled once capturing multiple packets is supported.

Change-Id: I14dfbc41731025cc3a548a730558915def3fa384
2024-07-19 23:52:50 -04:00
Tao Sang 73c02041e1 SWDEV-458943 - Implement std::mutex based monitor
Implement a std::mutex-based monitor whose logic is much
simpler than the legacy monitor's.
Create DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR to
toggle between them.
If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = false
  (default), use the legacy monitor;
If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = true,
  use the std::mutex-based monitor.
If the std::mutex-based monitor shows no performance drop,
the legacy one will be removed later.

Change-Id: I1d21368ff462477d3238d71e4e2a1a7d6b9167ad
2024-07-04 11:50:46 -04:00
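A sketch of the toggle in commit 73c02041e1, assuming the choice is made once per monitor at construction. The std::mutex path is straightforward; the other path is only a placeholder spin lock, because the legacy amd::Monitor algorithm is not described here. Monitor and `use_std_mutex` are hypothetical, with `use_std_mutex` standing in for DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR. Because the class provides lock() and unlock(), it works with std::lock_guard.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical monitor; `use_std_mutex` stands in for
// DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR (false by default).
class Monitor {
 public:
  explicit Monitor(bool use_std_mutex = false) : use_std_mutex_(use_std_mutex) {}

  void lock() {
    if (use_std_mutex_) {
      mutex_.lock();
      return;
    }
    // Placeholder for the legacy path: a simple spin lock,
    // NOT the real legacy amd::Monitor algorithm.
    while (spin_.test_and_set(std::memory_order_acquire)) {
      std::this_thread::yield();
    }
  }

  void unlock() {
    if (use_std_mutex_) {
      mutex_.unlock();
      return;
    }
    spin_.clear(std::memory_order_release);
  }

 private:
  const bool use_std_mutex_;
  std::mutex mutex_;
  std::atomic_flag spin_ = ATOMIC_FLAG_INIT;
};
```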
Saleel Kudchadker 561fb8a459 SWDEV-470008 - Fix AMD_SERIALIZE_KERNEL
- awaitCompletion() may spin-wait endlessly in cases where we
don't submit a handler. One such case is the hipExt*Launch API, which
takes a stop event. There we optimize the stop event by attaching
a signal to the dispatch packet, but we don't submit a handler when we
attach the signal. So if awaitCompletion() is called afterwards, it
keeps waiting on the command status on the host rather than simply
checking the signal value.

Change-Id: Ie8bf175aeefa3f9e4299b1ae7ae9108dad67e283
2024-07-02 19:05:05 -04:00
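The failure mode in commit 561fb8a459 can be sketched as follows, assuming a command carries both a host-side status (updated only by a completion handler) and an optional signal attached to the dispatch packet. Command, `handler_submitted` and AwaitCompletion are hypothetical names; the signal is modeled as an atomic counter that drops to 0 on completion, and every command is assumed to have either a handler or a signal.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical command state; field names are illustrative only.
struct Command {
  bool handler_submitted = false;          // was an async completion handler registered?
  std::atomic<int> status{1};              // 1 = pending, 0 = done; only the handler updates it
  std::atomic<int64_t>* signal = nullptr;  // signal attached to the dispatch packet, if any
};

// Buggy shape: always polls the host-side status. Without a handler,
// nothing ever sets status to 0, so this would spin forever:
//   while (cmd.status.load() != 0) {}

// Fixed shape: if no handler was submitted, check the signal instead.
void AwaitCompletion(Command& cmd) {
  if (!cmd.handler_submitted) {
    assert(cmd.signal != nullptr);
    while (cmd.signal->load(std::memory_order_acquire) != 0) {
      // spin; real code would use a blocking wait on the signal
    }
    return;
  }
  while (cmd.status.load(std::memory_order_acquire) != 0) {
    // spin until the completion handler marks the command done
  }
}
```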
taosang2 749385155a SWDEV-467540 - Get lastCommand safely
The last command must be obtained in a protected way when calling
awaitCompletion(), where lastCommand will be released and
possibly destroyed.
This addresses the scoped lock(notify_lock_) crash in
Event::notifyCmdQueue() with AMD_DIRECT_DISPATCH = true.

Change-Id: I4297166f912a71112f4a8945d993160ba9afdc34
2024-06-28 21:18:22 -04:00
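A sketch of obtaining the last command safely, as in commit 749385155a: the reference is taken while the queue's lock is held, so a concurrent release cannot destroy the command while the caller is still using it. std::shared_ptr stands in here for the runtime's own reference counting, and Queue, Command and the method names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical command type.
struct Command {
  int id;
};

class Queue {
 public:
  void SetLastCommand(std::shared_ptr<Command> cmd) {
    std::lock_guard<std::mutex> lock(mutex_);
    last_ = std::move(cmd);
  }

  // Take a reference while holding the lock. The caller can then wait on
  // the command even if another thread drops the queue's reference.
  std::shared_ptr<Command> GetLastCommand() {
    std::lock_guard<std::mutex> lock(mutex_);
    return last_;
  }

  void ReleaseLastCommand() {
    std::lock_guard<std::mutex> lock(mutex_);
    last_.reset();
  }

 private:
  std::mutex mutex_;
  std::shared_ptr<Command> last_;
};
```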
taosang2 544c45364f SWDEV-467540 - Fix reference of freed locks
1. Move the global amd::Monitor listenerLock before the global
runtime_tear_down object, as it is referenced in
~RuntimeTearDown() after main(). It must be destroyed
later than runtime_tear_down.

2. Update Device::~Device() to SVM-free coopHostcallBuffer_
before context_ is released and freed.

Change-Id: I1d21378ff463477d3238d71e5e2a1a7d6b9147ad
2024-06-18 13:58:36 -04:00
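The first fix in commit 544c45364f relies on the C++ rule that objects with static storage duration defined in one translation unit are constructed in definition order and destroyed in reverse order (the order across translation units is unspecified). The sketch below is a hypothetical, self-checking illustration: ListenerLock stands in for the amd::Monitor listenerLock, and the assert in ~RuntimeTearDown() fires at program exit if the two definitions are swapped.

```cpp
#include <cassert>
#include <mutex>

// Trivially destructible flag: constant-initialized and still readable
// while other globals are being destroyed at exit.
static bool g_listener_lock_alive = false;

// Hypothetical stand-in for the global amd::Monitor listenerLock.
struct ListenerLock {
  std::mutex m;
  ListenerLock() { g_listener_lock_alive = true; }
  ~ListenerLock() { g_listener_lock_alive = false; }
};

// Hypothetical teardown object whose destructor runs after main().
struct RuntimeTearDown {
  ~RuntimeTearDown();
};

// Same translation unit: constructed top to bottom, destroyed bottom to top.
// Defining listenerLock first means it outlives runtime_tear_down.
ListenerLock listenerLock;
RuntimeTearDown runtime_tear_down;

RuntimeTearDown::~RuntimeTearDown() {
  // Fires at exit if the two definitions above are swapped.
  assert(g_listener_lock_alive && "listenerLock destroyed before teardown");
  std::lock_guard<std::mutex> guard(listenerLock.m);
  // ... release runtime resources while holding the lock ...
}
```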
Ioannis Assiouras d44f44a5b1 SWDEV-467069 - Added safety check in activity prof for accumulate command
Adding a safety check prevents an invalid memory access
if the timestamps and kernelNames vectors differ in size.

The patch also moves the addKernelNames call for the accumulate command
into the dispatchAqlPacket function.

Change-Id: Iea0927e1253800403a1ae3f3d72de1e7d96476c3
2024-06-12 21:53:03 +01:00
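A hedged sketch of the safety check in commit d44f44a5b1: when timestamps and kernel names come from separate vectors, pairing them only up to the shorter length avoids reading past the end. Timestamp, KernelRecord and PairKernelTimestamps are hypothetical, and the real patch may handle a size mismatch differently (for example, by skipping the report entirely).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical profiling types.
struct Timestamp {
  uint64_t start;
  uint64_t end;
};

struct KernelRecord {
  std::string name;
  uint64_t start;
  uint64_t end;
};

// Pair each timestamp with a kernel name. If the vectors differ in size,
// only the common prefix is reported instead of indexing past the end.
std::vector<KernelRecord> PairKernelTimestamps(const std::vector<Timestamp>& ts,
                                               const std::vector<std::string>& names) {
  size_t n = ts.size() < names.size() ? ts.size() : names.size();
  std::vector<KernelRecord> out;
  out.reserve(n);
  for (size_t i = 0; i < n; ++i) {
    out.push_back({names[i], ts[i].start, ts[i].end});
  }
  return out;
}
```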
Ioannis Assiouras 3edf1501cc SWDEV-463865 - namespace changes to prevent symbol conflicts in static builds
Change-Id: I09ceb5962b7aa19156909f47167c87d6887c9cd1
2024-06-12 16:22:27 -04:00
German Andryeyev ae2992ea43 SWDEV-459610 - Skip destruction for the child process
fork() duplicates all system memory resources, but the runtime can't duplicate
GPU resources. Thus, avoid tearDown() calls in child processes.

Change-Id: Id6b12bacd5112b9ad3747c218e09cba98ea1b42c
2024-06-12 11:12:39 -04:00
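One common way to detect the situation in commit ae2992ea43 on POSIX systems is to record the PID at initialization and compare it with getpid() at teardown, since a forked child sees a different PID. This is an assumed technique with a hypothetical Runtime class, not necessarily how the runtime detects the child process.

```cpp
#include <cassert>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical runtime object that remembers which process created it.
class Runtime {
 public:
  Runtime() : owner_pid_(getpid()) {}

  bool IsChildProcess() const { return getpid() != owner_pid_; }

  // Skip GPU teardown in a forked child: the GPU resources were never
  // duplicated, so only the parent may release them.
  // Returns true if teardown actually ran.
  bool TearDown() {
    if (IsChildProcess()) return false;
    // ... release queues, memory and devices here ...
    return true;
  }

 private:
  pid_t owner_pid_;
};
```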
Ioannis Assiouras 775dc204aa SWDEV-463865 - changed device,roc and pal namespaces to be nested under amd
Change-Id: Icad342843c039c634e249a13a7aa31400730b1dd
2024-06-07 12:23:06 -04:00
kjayapra-amd 892071aeb2 SWDEV-460948 - Changes to alloc, set, capture under single function.
Change-Id: I7b2d40e99e812b97c53535c5e63c41ad64a8f543
2024-06-06 16:57:53 -04:00
Ioannis Assiouras b8c2ac4de4 SWDEV-463865 - symbol renamings to prevent conflicts in static build
Change-Id: Id7fbb638c1088c23df52fee877cd790d637b1ffb
2024-06-06 04:05:55 -04:00
Saleel Kudchadker ecff928284 SWDEV-463428 - Acquire correlation ID after clear
Change-Id: I472085178d5751f5e2c8a6dfe190b6b3249317f0
2024-06-06 03:49:01 -04:00
Gu, Wangfeng 0c6a952a90 SWDEV-460019 - [OGLP][Nv2x] DaVinci Resolve Studio: Crash observed when editing in color tab
When CL-GL interop is used, a GL context is used by two or more threads at the same time, which causes a race condition.

Solution:
Add a lock around GL function calls during CL-GL interop.

Change-Id: I3a34da3cbdf74c401111cc4e3a04ad84cc52709e
2024-06-04 16:35:44 -04:00
Ioannis Assiouras d6eaf49033 SWDEV-460925 - Do awaitCompletion before releasing the lastEnqueueCommand
Change-Id: I210399dd1bced13c0923fdb1c215e044920c5a4b
2024-05-28 06:31:10 +00:00
Saleel Kudchadker badf2b0880 SWDEV-301667 - Refactor graph code
- Remove the last graph node optimization and always submit a barrier NOP
packet instead. This simplifies the code.

Change-Id: Ied443173ba47a08b6df148ac7e3ead712acda11c
2024-05-28 06:28:17 +00:00
Satyanvesh Dittakavi 3d540ec113 SWDEV-457755 - Add TS only for kernel packets in the Accumulate command
Change-Id: I1b2f01c5763761808f49802fa117abc6306a22aa
2024-05-28 06:28:17 +00:00
Saleel Kudchadker 51e4368723 SWDEV-459778 - Remove CPU wait for profiler
- No CPU wait is needed when a profiler is attached; doing this changes
the application profile when roctracer is attached.

Change-Id: I2b9cfc48d697cf5ed54bb6a240d8c12bdb079171
2024-05-28 06:28:17 +00:00
German Andryeyev 5b0bfdcbad SWDEV-460242 - Add system memory suballocator
Switch command creation to the new suballocator to avoid
frequent, expensive OS calls.

Change-Id: I3597c811820e577c15708bad8b8a41aa53acc400
2024-05-28 06:28:17 +00:00
Saleel Kudchadker 4a9d24a211 SWDEV-301667 - Pass reference to kernel name
Change-Id: I21abe109ddfabfe7640bf78a96c81a1317d31952
2024-05-05 16:38:20 -04:00
Jaydeep Patel 1d48f2a1ab SWDEV-456279 - Adding new hip flag to access contiguous memory and pass the flag to HSA API.
Change-Id: I1bafeaa3096395c729723af958d609bc41e7845c
2024-04-30 05:25:38 -04:00
kjayapra-amd 0e1a0572e6 SWDEV-413997 - Changes to use GlobalContext in views.
Change-Id: I1f8411eae9ed49632667e244a25f223fed92c720
2024-04-29 16:41:39 -04:00
German Andryeyev 5c1804aa14 SWDEV-353281 - Correct VA unmap
Make sure graph mempool unmaps VA on release

Change-Id: Id3f1bd8d0115b533ae60aa5ba3676b8bf7e5b961
2024-04-26 09:37:01 -04:00
German Andryeyev 0ccdb3e160 SWDEV-440746 - Release last command on terminate
Change-Id: Ib6a9b8fc9a8692eb17b39b854cefd92c6b59733f
2024-04-22 09:57:38 -04:00
German Andryeyev 7448113cfc SWDEV-440746 - Remove obsolete code
The "optimized" version of memcpy is outdated and
was used in win32 only.

Change-Id: I7f2e0e9051e37cec95438266824b5b0025c324c6
2024-04-22 09:56:42 -04:00
German Andryeyev fd81490bb8 SWDEV-440746 - Don't set CL_SUBMITTED twice
Change-Id: I9ba34454f7487d6bc0d398b322a147cbac6c6443
2024-04-19 17:36:51 -04:00
German Andryeyev c95a75a2bf SWDEV-444670 - Enable teardown class
Force implicit runtime teardown with a global destructor.

Change-Id: Iabe63dedf5b94fefc98668585c45a61607120669
2024-04-16 12:00:06 -04:00