rocm-systems

Author	SHA1	Message	Date
German Andryeyev	8657a77029	SWDEV-491375 - Limit the SW batch size Applications may submit commands without waits for GPU. That causes a growth of SW unreleased commands. Make sure runtime flushes SW queue, if it grows over some threshold, controlled by DEBUG_CLR_MAX_BATCH_SIZE. Change-Id: Ia4d85c24210ef91c394f638ab6b53b14323a0396	2024-10-17 10:53:57 -04:00
German Andryeyev	364dfb0ed1	SWDEV-486602 - Optimize HSA callback performance - Don't generate callbacks for HIP events - Don't process profiling info in the callback for HIP events - Wait for CPU status update of the submitted commands every 50 calls. That will allow to drain the commands and destroy HSA signals. Change-Id: Ib601a350e7e7c2b6c6209a172385389baccf73a9	2024-10-11 14:50:25 -04:00
Chong Li	e6a5c81221	SWDEV-478929 - Benchmark ReallyQuickPureX Failed Ensure the member functions Alloc() and Free() of command_pool_ are not accessed after command_pool_ is destructed. Signed-off-by: Chong Li <chongli2@amd.com> Change-Id: Ic2d36423302518a030bd61fa399290ebe2ed8194	2024-09-10 22:08:18 -04:00
Ioannis Assiouras	9b33db9b24	SWDEV-472309 - Ensure static maps are destroyed after __hipUnregisterFatBinary hipDeviceSynchronize called from __hipUnregisterFatBinary accesses static maps and monitors. This change ensures these objects are not destroyed before __hipUnregisterFatBinary is called. Additionally it disables the teardown process for static build. Change-Id: I46b58641d60efcf6637a8e99cdd786ffe9e2c77d	2024-07-30 10:26:59 -04:00
Saleel Kudchadker	561fb8a459	SWDEV-470008 - Fix AMD_SERIALIZE_KERNEL - awaitCompletion code may do an endless spin wait for cases where we don't submit a handler. One such case can be the hipExt*Launch API which takes a stop event. In that case we optimize the stop event by attaching a signal to the dispatch packet but don't submit a handler when we attach the signal. That means if awaitCompletion() is called after that, we would keep on waiting on command status on the host rather than simply checking signal value. Change-Id: Ie8bf175aeefa3f9e4299b1ae7ae9108dad67e283	2024-07-02 19:05:05 -04:00
Ioannis Assiouras	3edf1501cc	SWDEV-463865 - namespace changes to prevent symbol conflicts in static builds Change-Id: I09ceb5962b7aa19156909f47167c87d6887c9cd1	2024-06-12 16:22:27 -04:00
kjayapra-amd	892071aeb2	SWDEV-460948 - Changes to alloc, set, capture under single function. Change-Id: I7b2d40e99e812b97c53535c5e63c41ad64a8f543	2024-06-06 16:57:53 -04:00
Ioannis Assiouras	b8c2ac4de4	SWDEV-463865 - symbol renamings to prevent conflicts in static build Change-Id: Id7fbb638c1088c23df52fee877cd790d637b1ffb	2024-06-06 04:05:55 -04:00
Saleel Kudchadker	ecff928284	SWDEV-463428 - Acquire correlation ID after clear Change-Id: I472085178d5751f5e2c8a6dfe190b6b3249317f0	2024-06-06 03:49:01 -04:00
German Andryeyev	5b0bfdcbad	SWDEV-460242 - Add system memory suballocator Switch commands creation to the new suballocator to avoid frequent expensive OS calls Change-Id: I3597c811820e577c15708bad8b8a41aa53acc400	2024-05-28 06:28:17 +00:00
German Andryeyev	fd81490bb8	SWDEV-440746 - Don't set CL_SUBMITTED twice Change-Id: I9ba34454f7487d6bc0d398b322a147cbac6c6443	2024-04-19 17:36:51 -04:00
Saleel Kudchadker	c157bfb202	SWDEV-301667 - Create TS for each node recorded in graph - Create a vector to allow multiple TS to be stored in Command. - This would mean we don't wait for entire batch in Accumulate command to finish when we exhaust signals. - Reduce the number of signals created at init to 64. This min value may still need to be tuned but the KFD allows max of 4094 interrupt signals per device. - Store kernel names whenever they are available and not just when profiling. If we dynamically enable profiling like for Torch, a crash can happen if hipGraphInstantiate wasn't included in Torch profile scope because we previously entered kernel names only when profiler is attached. Change-Id: I34e7881a25bbc763f82fdeb3408a8ea58e1ec006	2024-03-26 14:47:24 -04:00
Saleel Kudchadker	9a6ddae7b2	SWDEV-301667 - Reset profiler correlation_id_ - The correlation_id had random junk values which we were inserting in the dispatch AQL packet even when no profiler was attached but if we had a valid timestamp. - Also make sure we don't even write the reserved2 field in the AQL packet if no profiler attached. Change-Id: Icdb7493198c1bb5e2d786a97e027288660854cd7	2024-02-05 05:08:11 +00:00
Saleel Kudchadker	f5c6fc4dfa	SWDEV-422207 - Report TS for Accumulate command Change-Id: Iba193a6068c1a2d25c2136643faee2c1e2591a07	2023-11-07 18:19:40 +00:00
Saleel Kudchadker	40f41f4d0b	SWDEV-422207 - Track commands for capture - Track all captured commands under a new AccumulateCommand - Add begin() and end() methods to capture commands - Explicit TS object now passed to certain methods because profilingBegin() and profilingEnd() now happen separately and thus can run into threading issues Change-Id: I171106bdcad72b057836cb2f3fc398db3533119f	2023-11-03 05:09:04 +00:00
Saleel Kudchadker	1338ff37e8	SWDEV-301667 - Cleanup unused paths - Refactor code and cleanup logic for callback saving for event records Change-Id: I5c56aa8e9c968a5bca70fb07ad1796da318e9e89	2023-11-02 11:43:41 -04:00
jiabaxie	28f0daa34f	SWDEV-405983 - adding in HIP_LAUNCH_BLOCKING Change-Id: I3f9c8a745099aab05155ebe910e727693961a02f	2023-10-10 21:11:13 -04:00
Anusha GodavarthySurya	e63c280d4d	SWDEV-422207 - Capture AQL Packets for graph Kernel nodes during graph Inst. And enqueue AQL packet during launch Change-Id: I1e5f7f9e2a70bd500d190193cb6ba0867f5a63e7	2023-10-05 00:34:29 -04:00
German	7be3a5e33e	SWDEV-407533 - [ABI Break]Remove Wavelimiter Change-Id: I6a2f6fb5a0c3acea93fa0200a69679783e76f5bd	2023-09-07 09:58:41 -04:00
Anusha GodavarthySurya	b0e6f99ad7	SWDEV-392732 - Initial commit for graph doorbell optimization(AQL Buffering) Change-Id: I451725006c54c249dc530c55d2af2a31594bf49b	2023-07-16 07:56:00 -04:00
Ioannis Assiouras	2e9f6fb49b	SWDEV-385050 - Fixed possible invalid queue access from kernelCommand::releaseResources Change-Id: I7c5d99987cb7ab4fa0aa634f2bb6a4d60331b3af	2023-02-23 16:39:27 +00:00
Saleel Kudchadker	3e603d986a	SWDEV-364604 - Add ROCclr support for hipEventDisableSystemFence Change-Id: I6127b432a8759359359a1890fda85bc401be6a56	2023-02-21 19:07:35 -05:00
German	53a10c9039	SWDEV-377991 - Remove liquidflash support Change-Id: Iba6455e5c0210c3223a06fec332404cd9f489154	2023-01-20 09:57:06 -05:00
German	c8927cd84e	SWDEV-377991 - Remove Liquidflash extension Initial check-in to untie dependencies with HIP and OCL repos Change-Id: I363b63954c3f118f40a6ed893545d6a4ac44144c	2023-01-18 13:16:20 -05:00
Sourabh Betigeri	5d7f3f9f3c	SWDEV-305894 - Cooperative groups grid and multi grid sync support for gfx940+ Change-Id: I35d72f1cb50c3a96eee56a612b72d641852b145f	2022-12-05 16:30:30 -05:00
Laurent Morichetti	52eb28930a	SWDEV-351980 - Consolidate registration tables in the roctracer library Remove the activity_prof::CallbacksTable. The table was redundant with the information already stored in the roctracer library. Instead use a single callback into the roctracer library to query whether the activity is enabled, and to report it. Change-Id: I2e05b0881bb4a1953c14361d00ea310d02eb6e0c	2022-09-21 05:54:09 -04:00
Laurent Morichetti	e713b5c7d0	SWDEV-351980 - Enable profiling for commands reporting activities Profiling should be enabled for any command reporting activities as the activity record captures the profilingInfo's start and end timestamps. Since IS_PROFILER_ON is only used to determine whether API tracing is enabled, there is no need to expose it globally, it should be a property of the activity_prof::CallbacksTable. Change-Id: I44a0d19ed2862606cfbc9a98c1a07a336ab7e26c	2022-09-21 05:53:59 -04:00
Laurent Morichetti	4fbae91468	SWDEV-351980 - Move activity_ to the ProfilingInfo The activity_ is only instantiated if profiling is enabled. Remove the HIP private global record ID. Instead, use the correlation ID stored in the hip_api_data_t by the profiler while the last HIP function is in scope. For NDRange and Copy commands, store the kernel name and byte size (respectively) in the record. General cleanups to improve the code's readability. Change-Id: I01907484b0d9611eb9440c3a7c4865479dc42289	2022-09-21 05:53:47 -04:00
Anusha Godavarthy Surya	7b1c6d06d5	SWDEV-345683 - Fix HIP out of memory If for every eventRecord handler is not submitted, memory is not getting released during hipFree and leads to OOM. Change-Id: I19b61a0c523502e9e1a3564ce8b791f3e2cea02c	2022-07-28 07:36:38 -04:00
Ajay	236178d0d4	SWDEV-337331 - command queue logs for debugging option Change-Id: I198aecc5fd12369d87d4acc9910acc9435c1967a	2022-06-22 19:41:38 +00:00
Sarbojit Sarkar	356e22f910	SWDEV-325379 - Fix for remote copy crash Change-Id: I22152c0b3538cf7cfc80f82505bc255c01d98f7b	2022-06-16 23:59:11 -04:00
Saleel Kudchadker	02566677cf	SWDEV-334152 - Set release as systemscope Set release scope as system for dispatch AQL when events are passed to hipLaunchKernelGGL Change-Id: I93b91591e0ab023f1ecc5247f7905eca26147358	2022-04-29 13:19:29 -04:00
Saleel Kudchadker	fa76f03654	SWDEV-334150 - Force callback to cycle commands Enqueue a handler callback for hipEventRecords(aka marker_ts_) for every 64 submits. This recycles the memory if we don't end up calling synchronize for the longest time. Change-Id: I3d39fe76d52a5d81387927edd85b5663b563682c	2022-04-28 12:30:23 -04:00
Saleel Kudchadker	b6cbfaf499	SWDEV-301667 - Separate scope from marker_ts_ Change-Id: I19f4d394e898bfb8c9d9a2c2edf9d5bf5def3b08	2022-04-16 19:26:31 -04:00
Saleel Kudchadker	8eeaa998c0	SWDEV-301667 - Add cache state for a device - Add a global cache state for a device to indicate scopes of submitted AQL packets - Remove scopes for TS marker if hipEventReleaseToDevice is passed. Set env ROC_EVENT_NO_FLUSH=1 to use NOP AQL for event records. It would flush caches by default with system scope release. - Calling finish() should ensure if caches are flushed, if not queue a marker Change-Id: Ibbbdbb1cd7ac61cb35649169212142545be159e0	2022-04-12 12:27:31 -04:00
Saleel Kudchadker	3c3c0ca4c5	SWDEV-301667 - Selectively queue handler - Queue handler for hipEventRecord(aka marker_ts_) only if there is a callback associated with it. Change-Id: I8a9877ae0e342556053abbaacc9510744a8e772a	2022-03-24 19:46:28 -04:00
haoyuan2	439af94dd9	SWDEV-290298 - add a flag to indicate the primary context active status Change-Id: Ia31790706d3f855bc1eedf5ef874e471	2021-12-09 23:28:54 -05:00
Sarbojit Sarkar	aedbad0109	SWDEV-314254 - Fix for hipMemcpy3D test crash Change-Id: Iac70bfe0d351cfb5b56fefc9a6487d3f26f2b4ef	2021-12-09 11:46:52 -05:00
Sarbojit Sarkar	2afeacc858	SWDEV-310181 - Fix for mGPU dtest failure Change-Id: Id0898bd45e23f2d637bef25a3e69f26d9dc40785	2021-11-22 01:01:47 -05:00
German Andryeyev	7e12cf6318	SWDEV-257789 - Initial change to skip kernel arg copy The optimization is controlled with ROCR_SKIP_KERNEL_ARG_COPY. This is initial check-in for experiments. Extra changes are necessary for full support: - handle graph capture with the original sysmem alloc - avoid memobject references, otherwise there is a race condition with reusage of the arg buffer - Remove arg setup from hip Change-Id: Ib0af710f93e79834711fa4049a7c66093711e68b	2021-10-28 20:35:35 -04:00
Vladislav Sytchenko	d934612948	SWDEV-1 - Prepare for c++17 switch std::mem_fun() and std::bind2nd() are removed in c++17. Switch to simpler logic that does not require those functions. Change-Id: I19a31f076e1813e367615bd377b424046ce144c7	2021-09-08 16:18:33 -04:00
German Andryeyev	ff15c0893e	SWDEV-292018 - Switch to internal signals for markers Add ref counting to ProfilingSignal class to track the last release. If a signal was used in the marker, then don't reuse it, but create a new one for internal usage. Don't rely on HSA callback for the command status update if there are no pending dispatches. Change-Id: I19f14ed9d80acfe79993b343b2187635f8428a20	2021-08-22 23:56:07 -07:00
German Andryeyev	f34c1b9ff8	SWDEV-292820 - Add a new notify lock HSA signal callback may occur during the actual marker submit. That may cause a deadlock, because of the shared lock_ object. Create the new notify_lock_ field to protect the notification. Change-Id: I9752af84e59895530620fac3932c6fc276de8658	2021-08-22 23:56:07 -07:00
agunashe	d96481fb36	SWDEV-293742 - Update copyright end year VDI repo Change-Id: I69d2fea4a7a43adf96ccea794270e4af991c5261	2021-08-22 23:56:07 -07:00
German Andryeyev	ce8dad2ecc	SWDEV-290160 - Switch to global HSA signals Runtime can't assign internal HSA signals for HIP events, because HIP application can destroy the HIP stream or signal reuse may occur internally. Switch to global HSA signals for HIP events. Change-Id: Ieaea2d6b039e492b2e7c5112782a8f4e601e50a1	2021-08-22 23:56:07 -07:00
Christophe Paquot	133287f31f	SWDEV-240806 - Release resources in Command::terminate for HIP We do not want to release resources during setStatus in HIP because of Graphs Change-Id: Idc7b188ab5f8be6975ea91005dd2bbf177401f8c	2021-08-22 23:56:07 -07:00
German Andryeyev	c49f1069ab	SWDEV-290160 - Don't send notification for batch markers Batch marker already has a barrier with HSA signal callback Change-Id: I69fc63d72320c2e9cc2d2e59ebd3f07c0bd0e3b5	2021-08-22 23:56:07 -07:00
German Andryeyev	85c70a7495	SWDEV-284671 - Add HW event wait to improve hipDeviceSynchronize If AMD event contains a reference to a HW event, then runtime could check/wait for HW event. CPU status update will occur later after HSA signal callback, but it's not important for the result. Change-Id: I591391a953bbdba6a25ac07e2cd98aeb17cd4596	2021-08-22 23:56:07 -07:00
Saleel Kudchadker	9d0846e732	SWDEV-286092 - Enable handler for marker always For DD, send a NOP packet so that we leverage the handler to indicate completion. Change-Id: Ie57ea0124a8497d39cc49da1c4575c2cd86b9319	2021-08-22 23:56:07 -07:00
German Andryeyev	fa2e154a8b	SWDEV-278894 - Use GPU waits for HIP events Save HW events in amd::Event. Use HW events for synchronization Change-Id: I98cf9c2d0ec3c7fcaf254b749ac6c568d7270ae0	2021-05-25 13:41:15 -04:00

91 Commits