- Store the last fence scopes and use the last value to determine whether another cache flush is needed. This helps in cases where the
hipExtLaunchKernel API is used.
- Purge code for ROC_EVENT_NO_FLUSH
Change-Id: I531cf9c9c60d5e2b3a9e265d0f52f79ed2fa8a8c
[ROCm/clr commit: 9b5cbd37a2]
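The fence-scope change above can be pictured with a minimal sketch. It is not the runtime's actual code: FenceScope, FenceTracker, recordSubmit and needsFlush are hypothetical names, and the ordering of scopes (None < Agent < System) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fence scopes, ordered from narrowest to widest (assumption).
enum class FenceScope : std::uint8_t { None = 0, Agent = 1, System = 2 };

// Remembers the release scope of the most recent submission on a queue.
class FenceTracker {
 public:
  // Record the release scope used by the packet that was just submitted.
  void recordSubmit(FenceScope release) { last_release_ = release; }

  // A further cache flush is needed only if the last submission released
  // at a narrower scope than the one now required.
  bool needsFlush(FenceScope required) const {
    assert(required != FenceScope::None);  // asking for "no scope" is a caller error
    return last_release_ < required;
  }

 private:
  FenceScope last_release_ = FenceScope::None;
};
```

With this shape, a kernel launched with a narrower release scope (as hipExtLaunchKernel may allow) leaves the tracker at that scope, so a later synchronization knows a system-scope flush is still outstanding, while a launch that already released at system scope does not trigger a second flush.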
Remove the activity_prof::CallbacksTable. The table was redundant with
the information already stored in the roctracer library. Instead use a
single callback into the roctracer library to query whether the activity
is enabled, and to report it.
Change-Id: I2e05b0881bb4a1953c14361d00ea310d02eb6e0c
[ROCm/clr commit: 52eb28930a]
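A rough sketch of the single-callback idea described above, under stated assumptions: ActivityRecord, ReportActivityFn and ActivityReporter are hypothetical names, and the convention that a null record means "query only" and that 0 means "enabled" is assumed for illustration, not taken from the roctracer API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical activity record; a real record carries more fields.
struct ActivityRecord {
  std::uint32_t op;
  std::uint64_t begin_ns;
  std::uint64_t end_ns;
};

// Single entry point into the tracer (assumed convention):
//   record == nullptr : query whether `op` is enabled
//   record != nullptr : report the record
// Returns 0 when `op` is enabled.
using ReportActivityFn =
    std::function<int(std::uint32_t op, const ActivityRecord* record)>;

class ActivityReporter {
 public:
  void setCallback(ReportActivityFn fn) { callback_ = std::move(fn); }

  // No local table: the tracer is the only source of truth.
  bool isEnabled(std::uint32_t op) const {
    return callback_ && callback_(op, nullptr) == 0;
  }

  // Reports the record if its operation is enabled; returns true on success.
  bool report(const ActivityRecord& rec) const {
    assert(rec.end_ns >= rec.begin_ns);  // timestamps must be ordered
    if (!isEnabled(rec.op)) return false;
    return callback_(rec.op, &rec) == 0;
  }

 private:
  ReportActivityFn callback_;
};
```

The design point is that the runtime keeps no copy of the enabled state, so it cannot drift from what the tracing library holds.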
Profiling should be enabled for any command reporting activities as the
activity record captures the profilingInfo's start and end timestamps.
Since IS_PROFILER_ON is only used to determine whether API tracing is
enabled, there is no need to expose it globally; it should be a property
of the activity_prof::CallbacksTable.
Change-Id: I44a0d19ed2862606cfbc9a98c1a07a336ab7e26c
[ROCm/clr commit: e713b5c7d0]
The activity_ is only instantiated if profiling is enabled.
Remove the HIP private global record ID. Instead, use the correlation ID
stored in the hip_api_data_t by the profiler while the last HIP function
is in scope.
For NDRange and Copy commands, store the kernel name and byte size
(respectively) in the record.
General cleanups to improve the code's readability.
Change-Id: I01907484b0d9611eb9440c3a7c4865479dc42289
[ROCm/clr commit: 4fbae91468]
- In case of HMM, use blit kernel instead of CPU memcpy for hipMemset
Change-Id: I89bfc96ff01a2375ed8df1b1c6bc05357dea84f7
[ROCm/clr commit: f097cda948]
The fix for SWDEV-329789 moved down the last use of a
command object pointer in order to prevent a race condition.
However, the previous patch did not move down the release of
that command. By releasing the command early, another thread
could get a command with the same pointer. That second thread
could later submit work to the queue using that new command.
The first thread could then perform a comparison against the
queue's last command using its own now-stale pointer. This
could eventually allow the second thread to skip synchronizing
on the queue. This would result in host synchronizations
completing before their device work was actually complete.
Change-Id: I292b7b369743251ceafe453a4c5cae14a6d01046
[ROCm/clr commit: 6b956f7627]
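The ordering issue described above can be sketched as follows. This is an illustration only: Command, Queue and matchesLastCommand are hypothetical stand-ins, and the sketch never frees memory; it only shows that the reference is dropped after the last use of the pointer, so the address cannot be recycled for another command while it is still being compared.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference-counted command (the sketch never deletes it).
class Command {
 public:
  void retain() { refs_.fetch_add(1, std::memory_order_relaxed); }
  void release() { refs_.fetch_sub(1, std::memory_order_acq_rel); }
  int refs() const { return refs_.load(std::memory_order_acquire); }

 private:
  std::atomic<int> refs_{1};
};

// Hypothetical queue that remembers the last command submitted to it.
struct Queue {
  std::atomic<Command*> last{nullptr};
};

// Compares the caller's command against the queue's last command and only
// then drops the caller's reference. Releasing before the comparison would
// let another thread obtain a new command at the same address, making the
// pointer comparison meaningless.
bool matchesLastCommand(Queue& queue, Command* cmd) {
  assert(cmd->refs() > 0);  // we still hold our reference here
  const bool same = (queue.last.load(std::memory_order_acquire) == cmd);
  cmd->release();  // release only after the last use of the pointer
  return same;
}
```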
Assert statements were hit silently in Release builds, as in SWDEV-353548.
The issue is only seen in Debug builds.
Change-Id: I9f7177806c854d64fcf986e9f6092076d8a05f23
[ROCm/clr commit: 040c416cb1]
HIP_FORCE_QUEUE_PROFILING has been replaced by GPU_FORCE_QUEUE_PROFILING.
Change-Id: Ic32ecdf829a2725ace84e76abab8a81c8790e13f
[ROCm/clr commit: dd49cf0fa0]
If an execution command was split into multiple HW operations, the runtime has to accumulate the time of all operations.
Change-Id: Iaba31e96250918d8190bf63adb4c07730fdfefbf
[ROCm/clr commit: 24f5362296]
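A minimal sketch of the accumulation described above, assuming that "accumulate" means summing the per-operation durations. HwOpTimestamps and accumulatedDurationNs are hypothetical names, and nanosecond timestamps are an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical start/end timestamps of one HW operation, in nanoseconds.
struct HwOpTimestamps {
  std::uint64_t start_ns;
  std::uint64_t end_ns;
};

// Sums the durations of all HW operations that one command was split into.
// Gaps between operations are not counted.
std::uint64_t accumulatedDurationNs(const std::vector<HwOpTimestamps>& ops) {
  std::uint64_t total = 0;
  for (const auto& op : ops) {
    assert(op.end_ns >= op.start_ns);  // each operation must end after it starts
    total += op.end_ns - op.start_ns;
  }
  return total;
}
```

Summing durations, rather than subtracting the first start from the last end, keeps idle time between the split operations out of the reported figure; whether the runtime makes the same choice is not stated in the commit message.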
Added to support both HIP and OpenCL. HIP_FORCE_QUEUE_PROFILING will be replaced with this later on.
Change-Id: I6d3514b1568ff049584ed9fd74bbdb3e4f4bf0c3
[ROCm/clr commit: d92b3a2d90]
- rocr attribute needs to be updated after each iteration
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: I3afb2d7954ef3de37f5f5f9d3cc7757fdacffcec
[ROCm/clr commit: 50e0ddb055]