Use a large signal pool if a profiler is connected or profiling is
force-enabled. This mitigates signal creation overhead when profiling:
signals are attached to every packet, so a deeper batch may expose the
cost of signal allocation.
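A minimal sketch of the selection logic described above. The pool sizes
and the names SignalPoolSize, profilerConnected and profilingForced are
hypothetical; the actual values and identifiers are not stated in this
message.

```cpp
#include <cstddef>

// Hypothetical pool sizes, chosen only for illustration.
constexpr std::size_t kDefaultSignalPoolSize = 64;
constexpr std::size_t kLargeSignalPoolSize = 4096;

// Returns how many signals to preallocate. The large pool is chosen when
// a profiler is attached or profiling is force-enabled, because every
// packet then carries its own signal.
constexpr std::size_t SignalPoolSize(bool profilerConnected, bool profilingForced) {
  return (profilerConnected || profilingForced) ? kLargeSignalPoolSize
                                                : kDefaultSignalPoolSize;
}
```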
Change-Id: I8034b8a20b55328b87d593bf044f59672f9653e8
HIP_FORCE_DEV_KERNARG=1 creates a device allocation for the kernel
argument segment. The flag is 0 by default.
Change-Id: Iaaf5a149f3be8596568878d5d272268baf067c60
- Introduce a state variable to indicate whether HwProfiling is enabled,
eliminating a possible data race on vector<> signals_.
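One way such a state variable can remove the race, sketched under
assumptions: readers test an atomic flag instead of inspecting the
vector. The class ProfilingSignals, its methods and the element type are
hypothetical and do not reflect the actual runtime code.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the object that owns signals_.
class ProfilingSignals {
 public:
  // Writer: populate the vector under the lock, then publish the state.
  void EnableHwProfiling(std::size_t count) {
    std::lock_guard<std::mutex> lock(mutex_);
    signals_.assign(count, 0);
    hwProfilingEnabled_.store(true, std::memory_order_release);
  }

  // Readers test the flag and never touch signals_ without the lock.
  bool IsHwProfilingEnabled() const {
    return hwProfilingEnabled_.load(std::memory_order_acquire);
  }

  std::size_t SignalCount() const {
    if (!IsHwProfilingEnabled()) return 0;
    std::lock_guard<std::mutex> lock(mutex_);
    return signals_.size();
  }

 private:
  mutable std::mutex mutex_;
  std::vector<std::uint64_t> signals_;  // placeholder element type
  std::atomic<bool> hwProfilingEnabled_{false};
};
```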
Change-Id: Id504cc76d7fa9f7e6455587dd232b60ccbbb735b
Heap initialization used the device queue, but that queue should be used
for cooperative launches only. Heap initialization must use the same
queue as the current dispatch.
Change-Id: I856621bf82bbdeb1c2d0fbc4970e90d09af805cb
The scheduler in the device queue has to relaunch itself. Make sure the
scheduler uses exactly the same AQL packet as the host launch.
Change-Id: I4eb03c4c91bf2408a6d4607731f081a2e2c2c8ae
- Use correct header for vendor packet
- Pass one dependent signal when submitting a marker if there is one
Change-Id: I4efc70dd5204b559de26f899d0637f50421c8834
- Use a dirty flag to determine fence optimization
- If the fence is dirty, submit a marker at top level to sync
Change-Id: I53fb19b5bb05b7c7b37c41637a6c7aaf870b639a
- Store the last fence scopes and use the last value to determine
whether another cache flush is needed. This helps cases where the
hipExtLaunchKernel API is used.
- Purge code for ROC_EVENT_NO_FLUSH
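A rough sketch of the idea of remembering the last fence scope, under
stated assumptions. FenceScope, FenceState, RecordPacket and NeedsFlush
are hypothetical names, and a single release scope stands in for the
stored scopes; the real bookkeeping is not shown in this message.

```cpp
#include <cstdint>

// Hypothetical ordering of release scopes, narrowest to widest.
enum class FenceScope : std::uint8_t { None = 0, Agent = 1, System = 2 };

class FenceState {
 public:
  // Remember the release scope of the most recently submitted packet.
  void RecordPacket(FenceScope release) { lastRelease_ = release; }

  // A further flush is needed only if the last packet released at a
  // narrower scope than the one now required.
  bool NeedsFlush(FenceScope wanted) const {
    return static_cast<std::uint8_t>(lastRelease_) <
           static_cast<std::uint8_t>(wanted);
  }

 private:
  FenceScope lastRelease_ = FenceScope::None;
};
```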
Change-Id: I531cf9c9c60d5e2b3a9e265d0f52f79ed2fa8a8c
Remove the activity_prof::CallbacksTable. The table was redundant with
the information already stored in the roctracer library. Instead use a
single callback into the roctracer library to query whether the activity
is enabled, and to report it.
Change-Id: I2e05b0881bb4a1953c14361d00ea310d02eb6e0c
If an execution command was split into multiple HW operations, the
runtime has to accumulate the time for all operations.
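The accumulation can be sketched as a sum over the per-operation
durations. HwOpTimestamps and AccumulateDuration are hypothetical names,
and the sketch assumes each operation reports a start and an end
timestamp in the same units.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-operation timestamps for one split command.
struct HwOpTimestamps {
  std::uint64_t start;
  std::uint64_t end;
};

// Sums the duration of every HW operation that belongs to one command,
// so gaps between operations are not counted as execution time.
std::uint64_t AccumulateDuration(const std::vector<HwOpTimestamps>& ops) {
  std::uint64_t total = 0;
  for (const auto& op : ops) {
    total += op.end - op.start;
  }
  return total;
}
```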
Change-Id: Iaba31e96250918d8190bf63adb4c07730fdfefbf
Maintain the status of the handler callback. For event records we no
longer submit callbacks, to reduce the load on the async handler thread.
However, without a callback we leak command memory and never decrement
refcounts. Track the status of the handler so that a callback can be
queued when finish is called.
Change-Id: I89fd02f3d047a0e8162664ee17581a14795f1928
Move hidden heap creation to the kernel launch to make sure it's
allocated on the actual first usage.
Change-Id: I1b65a82fc06d9129ed45a69765bf14ea3d945b04
Disable the hostcall buffer in OCL for now. COv5 can add hostcall buffer
metadata for an unknown reason, and OCL may then fail the buffer
allocation and the kernel launch.
Change-Id: I34a6a45bac86c57422b764c0d69760c96920d6c5