rocm-systems

Author	SHA1	Message	Date
German Andryeyev	403f624bf8	SWDEV-486602 - Add tracking of HSA handlers Add an atomic counter to track the outstanding HSA handlers. Wait on CPU for the callbacks if the number exceeds the value in DEBUG_HIP_BLOCK_SYNC env variable. Change-Id: I95dc8c4bf0258c7e59411b7504220709ed6898c5	2024-10-25 15:20:50 -04:00
Julia Jiang	9f2f6a8aa7	SWDEV-488396,489257 - Fixed the regression in CTS pipes sub-test failure Change-Id: Id4004f0d6da5754b12c9a21038de50472cb1fee5	2024-10-25 05:58:46 -04:00
German Andryeyev	6bb7d1afdc	SWDEV-486602 - Fix Windows 32 bit build Windows aligns fields to 8 bytes even with 32bit builds. Add BUG_CLR_SYSMEM_POOL to control sysmempool. Change-Id: I8622aabc9f7391ed7dd8583b252ce9eb41d62293	2024-10-18 11:35:54 -04:00
German Andryeyev	ad18146d8f	SWDEV-486602 - Change SysmemPool implementation - Remove the list of all chunks and use embedded chunk information in each allocation. That simplifies Free() logic, avoiding expensive loop if for some reason the number of outstanding allocations significantly grew. Change-Id: I9ea84d314320ce356ed24dd3180f262e2116c59b	2024-10-17 12:39:39 -04:00
German Andryeyev	8657a77029	SWDEV-491375 - Limit the SW batch size Applications may submit commands without waits for GPU. That causes a growth of SW unreleased commands. Make sure runtime flushes SW queue, if it grows over some threshold, controlled by DEBUG_CLR_MAX_BATCH_SIZE. Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396	2024-10-17 10:53:57 -04:00
German Andryeyev	364dfb0ed1	SWDEV-486602 - Optimize HSA callback performance - Don't generate callbacks for HIP events - Don't process profiling info in the callback for HIP events - Wait for CPU status update of the submitted commands every 50 calls. That will allow to drain the commands and destroy HSA signals. Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9	2024-10-11 14:50:25 -04:00
Todd tiantuo Li	41dc4545fc	SWDEV-472357 - support Rect copy with staging buffer for 2D & 3D memcpy in PAL Change-Id: Ie32f3e5a6fa077f6b2db20fc1ab1e2e0da8344cb	2024-10-10 18:00:19 -04:00
kjayapra-amd	74ebbe17e9	SWDEV-486573 - Check the return type of commit memory. Change-Id: Id158cd7a0dff37b382b858cf7113aa4cf326300a	2024-10-09 05:10:03 -04:00
Branislav Brzak	43fcac1739	SWDEV-482130 - Fix release of virtual mem obj Change-Id: I893a8353aa1a25d00e36c8e601caf31cc0fc1f22	2024-10-08 01:37:39 -04:00
Ioannis Assiouras	07bcc283f9	SWDEV-488851 - Correctly remove the queue from the active set on windows Change-Id: I4d21743ecf7a44636121f85566f898e62ff61e97	2024-10-02 12:06:59 +01:00
Anusha GodavarthySurya	742b0210d3	SWDEV-477324 - Capture Memcpy1D pinned H2D D2H Change-Id: I1f4744f20a9caeed005ec68da44e5fde737e09f7	2024-09-30 01:01:30 -04:00
Vladana Stojiljkovic	da5f1a6146	SWDEV-482086 - Fix hipGraphInstantiate leak * In a scenario where kernel is launched with hipExtLaunchKernelGGL and stop event is used, hipGraphInstantiate leaks. Since stop event is used, profiling is enabled and Timestamp (ReferencedCountedObject) is created, but it doesn't get released. * The idea behind this solution is that profiling should be disabled when command is captured, hence the timestamp should not be created. Because information about capturing isn't available when kernel command is created, packet capturing state is used to determine whether to create a timestamp or not. Change-Id: Ia23adac4592ded4fb5e236acf99e12e729f63692	2024-09-29 11:36:53 -04:00
Anusha GodavarthySurya	870842201d	SWDEV-485904 - Fix virtual,physical mem obj leaks Change-Id: Ie0456b5dcfec206ae54a6aabfc2a15a620cac693	2024-09-19 23:04:20 -04:00
kjayapra-amd	12a39fbf22	SWDEV-480772 - Remove name variable from amd::Monitor class. Change-Id: Ie2a4fa44f485786227230f8a892e090e718aa30e	2024-09-19 11:55:01 -04:00
Ioannis Assiouras	bcc545e6b8	SWDEV-476929 - Introduce an activeQueues set The new set tracks only the queues that have a command submitted to them. This allows for fast iteration in waitActiveStreams. Change-Id: I2c832eefa01280d9a87a5f57874d36d2e9441de7	2024-09-16 15:53:49 -04:00
kjayapra-amd	4ecd77df5e	SWDEV-484188 - Moving std::maps into struct const and into amd::Kernel class. Change-Id: Ie4d5a64511412fdb498b045aaffb52c3a1286de6	2024-09-15 09:14:51 -04:00
Chong Li	e6a5c81221	SWDEV-478929 - Benchmark ReallyQuickPureX Failed Ensure the member functions Alloc() and Free() of command_pool_ will not be accessed after command_pool_ is destructed. Signed-off-by: Chong Li <chongli2@amd.com> Change-Id: Ic2d36423302518a030bd61fa399290ebe2ed8194	2024-09-10 22:08:18 -04:00
Saleel Kudchadker	abc80fcc2f	SWDEV-301667 - Improve kernel logging Change-Id: I4b2b1950e3ab7124fd41af9a92a677c48d6da5eb	2024-09-10 13:43:58 -04:00
kjayapra-amd	6211037f63	SWDEV-439234 - Access check before memcpy and kernel operations. Change-Id: I7057125c03460db205409e19980145298c190fe2	2024-09-06 14:30:00 -04:00
taosang2	857d0d60b9	SWDEV-475144 - Fix random language string Fix random language string that leads to compiling failure of trap handler and TDR of hipMemset() on VM in release mode of hip-rt Change-Id: Ie1d874742b804f62ceda68064fa54f5d39c092b8	2024-08-20 17:42:31 -04:00
German Andryeyev	9db52f9a46	SWDEV-470612 - Add the optimized multistream path - Added the optimized multi stream path in graph execution. It uses a fixed number of async streams in the execution - Optimize the launch latency, where commands creation and execution is done at the same time - Optimize the scheduling to use less barriers and waiting signals if the same queue can be detected - The new path is controlled by DEBUG_HIP_FORCE_GRAPH_QUEUES environment variable, where 0 will use the original path and any other value will force the number of asynchronous queues for execution - DEBUG_HIP_FORCE_ASYNC_QUEUE can force single queue async execution in graphs(applicable for Navi families only) Change-Id: I7eb40bc15c45f508d6911868a6f6d4c3598d380e	2024-08-02 14:19:44 -04:00
Anusha GodavarthySurya	bd3a35bde1	SWDEV-468424 - Add support to capture multiple AQL Packets => Added support to capture multiple AQL Packets. => Added Interface to callback to hip runtime from rocclr to allocate kernel args from the graph kernel arg pool. => Enabled Support to capture memset node. Change-Id: I7e1c2ba06927459e024653058af142bd82192c43	2024-08-01 23:55:51 -04:00
Ioannis Assiouras	8e137e8702	SWDEV-476460 - Fix for a race condition in SysmemPool::Alloc Change-Id: Ia94709e68b236c9460589963c0f09ec1f481c306	2024-08-01 04:22:26 -04:00
Ioannis Assiouras	9b33db9b24	SWDEV-472309 - Ensure static maps are destroyed after __hipUnregisterFatBinary hipDeviceSynchronize called from __hipUnregisterFatBinary accesses static maps and monitors. This change ensures these objects are not destroyed before __hipUnregisterFatBinary is called. Additionally it disables the teardown process for static build. Change-Id: I46b58641d60efcf6637a8e99cdd786ffe9e2c77d	2024-07-30 10:26:59 -04:00
Anusha GodavarthySurya	346da4bb40	SWDEV-468424 - hipgraph capture memset node Capture AQL packets during GraphInstantiation and enqueue AQL packets during graph launch. Added support to capture single graph memset node. Capture support for memset node is currently disabled. Memset capture will be enabled when capture for multiple packets is supported. Change-Id: I14dfbc41731025cc3a548a730558915def3fa384	2024-07-19 23:52:50 -04:00
Tao Sang	73c02041e1	SWDEV-458943 - Implement std::mutex based monitor Implement std::mutex based monitor that has much simpler logic than legacy monitor. Create DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR to toggle them. If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = false (by default), use legacy monitor; If DEBUG_CLR_USE_STDMUTEX_IN_AMD_MONITOR = true, use std::mutex based monitor. If no perf drop of std::mutex based monitor, legacy one will be removed later. Change-Id: I1d21368ff462477d3238d71e4e2a1a7d6b9167ad	2024-07-04 11:50:46 -04:00
Saleel Kudchadker	561fb8a459	SWDEV-470008 - Fix AMD_SERIALIZE_KERNEL - awaitCompletion code may do an endless spin wait for cases where we don't submit a handler. One such case can be the hipExt*Launch API which takes a stop event. In that case we optimize the stop event by attaching a signal to the dispatch packet but don't submit a handler when we attach the signal. That means if awaitCompletion() is called after that, we would keep on waiting on command status on the host rather than simply checking signal value. Change-Id: Ie8bf175aeefa3f9e4299b1ae7ae9108dad67e283	2024-07-02 19:05:05 -04:00
taosang2	749385155a	SWDEV-467540 - Get lastCommand safely We must get the last command in a protected way when calling awaitCompletion(), where lastCommand will be released and possibly destroyed. This can solve scope lock(notify_lock_) crash in Event::notifyCmdQueue() with AMD_DIRECT_DISPATCH = true. Change-Id: I4297166f912a71112f4a8945d993160ba9afdc34	2024-06-28 21:18:22 -04:00
taosang2	544c45364f	SWDEV-467540 - Fix reference of freed locks 1.Move global amd::monitor listenerLock before global class runtime_tear_down as it will be referenced in ~RuntimeTearDown() after main(). It should be freed later than runtime_tear_down. 2.Update Device::~Device() to SVM free coopHostcallBuffer_ before context_ is released and freed. Change-Id: I1d21378ff463477d3238d71e5e2a1a7d6b9147ad	2024-06-18 13:58:36 -04:00
Ioannis Assiouras	d44f44a5b1	SWDEV-467069 - Added safety check in activity prof for accumulate command Adding a safety check prevents an invalid memory access if timestamps and kernelNames vectors are of different size. The patch also moves the addKernelNames for the accumulate command into dispatchAqlPacket function. Change-Id: Iea0927e1253800403a1ae3f3d72de1e7d96476c3	2024-06-12 21:53:03 +01:00
Ioannis Assiouras	3edf1501cc	SWDEV-463865 - namespace changes to prevent symbol conflicts in static builds Change-Id: I09ceb5962b7aa19156909f47167c87d6887c9cd1	2024-06-12 16:22:27 -04:00
German Andryeyev	ae2992ea43	SWDEV-459610 - Skip destruction for the child process Fork() duplicates all system memory resources, but runtime can't duplicate GPU resources. Thus, avoid tearDown() calls for the child process(s). Change-Id: Id6b12bacd5112b9ad3747c218e09cba98ea1b42c	2024-06-12 11:12:39 -04:00
Ioannis Assiouras	775dc204aa	SWDEV-463865 - changed device,roc and pal namespaces to be nested under amd Change-Id: Icad342843c039c634e249a13a7aa31400730b1dd	2024-06-07 12:23:06 -04:00
kjayapra-amd	892071aeb2	SWDEV-460948 - Changes to alloc, set, capture under single function. Change-Id: I7b2d40e99e812b97c53535c5e63c41ad64a8f543	2024-06-06 16:57:53 -04:00
Ioannis Assiouras	b8c2ac4de4	SWDEV-463865 - symbol renamings to prevent conflicts in static build Change-Id: Id7fbb638c1088c23df52fee877cd790d637b1ffb	2024-06-06 04:05:55 -04:00
Saleel Kudchadker	ecff928284	SWDEV-463428 - Acquire correlation ID after clear Change-Id: I472085178d5751f5e2c8a6dfe190b6b3249317f0	2024-06-06 03:49:01 -04:00
Gu, Wangfeng	0c6a952a90	SWDEV-460019 - [OGLP][Nv2x] DaVinci Resolve Studio: Crash observed when editing in color tab When CL-GL interop is used, a GL context is used by two or more threads at the same time, which causes a race condition. Solution: Add lock when accessing GL functions during CL-GL interop. Change-Id: I3a34da3cbdf74c401111cc4e3a04ad84cc52709e	2024-06-04 16:35:44 -04:00
Ioannis Assiouras	d6eaf49033	SWDEV-460925 - Do awaitCompletion before releasing the lastEnqueueCommand Change-Id: I210399dd1bced13c0923fdb1c215e044920c5a4b	2024-05-28 06:31:10 +00:00
Saleel Kudchadker	badf2b0880	SWDEV-301667 - Refactor graph code - Remove Last graph node optimization and instead submit a barrier NOP packet always. This simplifies the code. Change-Id: Ied443173ba47a08b6df148ac7e3ead712acda11c	2024-05-28 06:28:17 +00:00
Satyanvesh Dittakavi	3d540ec113	SWDEV-457755 - Add TS only for kernel packets in the Accumulate command Change-Id: I1b2f01c5763761808f49802fa117abc6306a22aa	2024-05-28 06:28:17 +00:00
Saleel Kudchadker	51e4368723	SWDEV-459778 - Remove CPU wait for profiler - No cpu wait is needed when profiler is attached, Doing this changes the application profile when roctracer is attached. Change-Id: I2b9cfc48d697cf5ed54bb6a240d8c12bdb079171	2024-05-28 06:28:17 +00:00
German Andryeyev	5b0bfdcbad	SWDEV-460242 - Add system memory suballocator Switch commands creation to the new suballocator to avoid frequent expensive OS calls Change-Id: I3597c811820e577c15708bad8b8a41aa53acc400	2024-05-28 06:28:17 +00:00
Saleel Kudchadker	4a9d24a211	SWDEV-301667 - Pass reference to kernel name Change-Id: I21abe109ddfabfe7640bf78a96c81a1317d31952	2024-05-05 16:38:20 -04:00
Jaydeep Patel	1d48f2a1ab	SWDEV-456279 - Adding new hip flag to access contiguous memory and pass the flag to HSA API. Change-Id: I1bafeaa3096395c729723af958d609bc41e7845c	2024-04-30 05:25:38 -04:00
kjayapra-amd	0e1a0572e6	SWDEV-413997 - Changes to use GlobalContext in views. Change-Id: I1f8411eae9ed49632667e244a25f223fed92c720	2024-04-29 16:41:39 -04:00
German Andryeyev	5c1804aa14	SWDEV-353281 - Correct VA unmap Make sure graph mempool unmaps VA on release Change-Id: Id3f1bd8d0115b533ae60aa5ba3676b8bf7e5b961	2024-04-26 09:37:01 -04:00
German Andryeyev	0ccdb3e160	SWDEV-440746 - Release last command on terminate Change-Id: Ib6a9b8fc9a8692eb17b39b854cefd92c6b59733f	2024-04-22 09:57:38 -04:00
German Andryeyev	7448113cfc	SWDEV-440746 - Remove obsolete code The "optimized" version of memcpy is outdated and was used in win32 only. Change-Id: I7f2e0e9051e37cec95438266824b5b0025c324c6	2024-04-22 09:56:42 -04:00
German Andryeyev	fd81490bb8	SWDEV-440746 - Don't set CL_SUBMITTED twice Change-Id: I9ba34454f7487d6bc0d398b322a147cbac6c6443	2024-04-19 17:36:51 -04:00
German Andryeyev	c95a75a2bf	SWDEV-444670 - Enable teardown class Force implicit runtime teardown with a global destructor. Change-Id: Iabe63dedf5b94fefc98668585c45a61607120669	2024-04-16 12:00:06 -04:00


321 Commits