Heap initialization used device queue, but it shoudl be used for
cooperative launches only. Heap initialization must use the same queue
as the current dispatch.
Change-Id: I856621bf82bbdeb1c2d0fbc4970e90d09af805cb
Scheduler in device queue requires relaunching itself. Make sure
scheduler uses exactly the same AQL packet as the host launch.
Change-Id: I4eb03c4c91bf2408a6d4607731f081a2e2c2c8ae
- Use correct header for vendor packet
- Pass one dependent signal when submitting a marker if there is one
Change-Id: I4efc70dd5204b559de26f899d0637f50421c8834
- Use a dirty flag to determine fence optimization
- If fence is dirty submit a marker at top level to sync.
Change-Id: I53fb19b5bb05b7c7b37c41637a6c7aaf870b639a
- Store last fence scopes and use the last value to determine if we need a cache flush again. This helps cases where hipExtLaunchKernel API is
used.
- Purge code for ROC_EVENT_NO_FLUSH
Change-Id: I531cf9c9c60d5e2b3a9e265d0f52f79ed2fa8a8c
Remove the activity_prof::CallbacksTable. The table was redundant with
the information already stored in the roctracer library. Instead use a
single callback into the roctracer library to query whether the activity
is enabled, and to report it.
Change-Id: I2e05b0881bb4a1953c14361d00ea310d02eb6e0c
If the execution command had a split into multiple HW operations, then runtime has to accumulate time for all operations
Change-Id: Iaba31e96250918d8190bf63adb4c07730fdfefbf
Maintain status of handler callback. For event records we no longer
submit callbacks to reduce the load on the async handler thread. However
without a callback we leak command memory/decrement refcounts. Indicate
status of the handler which we can use to queue a callback when
finish is called.
Change-Id: I89fd02f3d047a0e8162664ee17581a14795f1928
Move hidden heap creation to the kernel launch to make sure it's
allocated on the actual first usage.
Change-Id: I1b65a82fc06d9129ed45a69765bf14ea3d945b04
Disable hostcall buffer in OCL for now. COv5 can add hostcallbuffer
metadata for unknown reason. OCL may fail the buffer allocation
and kernel launch.
Change-Id: I34a6a45bac86c57422b764c0d69760c96920d6c5
- check pcie atomci support for printf functionality
- if not enabled printf wont work
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: Ib366e8e71772b02210c4a830bca4bd8cc7a11664
- Add a global cache state for a device to indicate scopes of submitted
AQL packets
- Remove scopes for TS marker if hipEventReleaseToDevice is passed. Set
env ROC_EVENT_NO_FLUSH=1 to use NOP AQL for event records.
It would flush caches by default with system scope release.
- Calling finish() should ensure if caches are flushed, if not queue a
marker
Change-Id: Ibbbdbb1cd7ac61cb35649169212142545be159e0
Remove assert for kernel arg size, because COv5 reports a value
bigger than the actual usage in the most of cases
Change-Id: I8e15bc45a9e21b58a5894f9977511ca84408ce61
- Fix a crash with AMD_CPU_AFFINITY=1 as numa_bitmask_alloc isnt the
right api to allocate bitmask
- Do not set affinity for ROCr thread. It worsens performance rather
than any improvement.
- Fix regression from my previous change for event handler.
Change-Id: I3ea75adc2a6333f29752283eddd5b555e9b58cc5
- Queue handler for hipEventRecord(aka marker_ts_) only if there is a
callback associated with it.
Change-Id: I8a9877ae0e342556053abbaacc9510744a8e772a