KMD can enable the hipHostRegister optimization on the HMM path.
That makes the CPU and GPU pointers match.
Change-Id: Iad96ceada5cfa3bada20452b906f744f9dbaebbe
Fix a crash in hipGraphKernelNodeSetParams when the parameters
are taken from hipGraphKernelNodeGetParams.
Change-Id: I2216f72f4d4de6dd3766343b0d821cb3d35d7853
With opaque pointers, the suffix of those intrinsics changed. This caused
build failures, which should be resolved by using the corresponding
Clang builtins instead of calling the intrinsics directly.
See SWDEV-356581
Change-Id: Icd1d9b9438cac4bef0f7c52d4cd341ac76500890
Remove the extra barrier, since the ROCR backend in DD mode now blocks the HW queue when a callback is injected.
Add a notification for MT mode about a possible waiter for a callback.
Change-Id: Ifd70ce5597e1ba868e4197ad1850ace11a4f90ae
When printing HIP API function parameters, use the integer format to
print 'char' arguments to avoid printing invalid ASCII characters
(value > 127).
Make sure the roctracer::hip_support::detail operator<< overloads are
used when printing arguments.
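
A minimal sketch of the idea, not the roctracer code itself: the namespace
sketch_detail and the helper FormatCharArg are hypothetical stand-ins for
roctracer::hip_support::detail and the argument printer. The qualified call
shows one way to make sure the integer-printing overload is selected rather
than the standard character overload.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical namespace standing in for roctracer::hip_support::detail.
namespace sketch_detail {

// Print 'char' arguments as integers so that values outside the printable
// ASCII range never reach the output as raw bytes.
inline std::ostream& operator<<(std::ostream& out, char value) {
  return out << static_cast<int>(value);
}

inline std::ostream& operator<<(std::ostream& out, unsigned char value) {
  return out << static_cast<unsigned int>(value);
}

}  // namespace sketch_detail

// Hypothetical helper: formats one 'char' argument as "name=value".
inline std::string FormatCharArg(const std::string& name, char value) {
  std::ostringstream out;
  out << name << "=";  // standard string overloads
  // The qualified call selects the integer-printing overload explicitly.
  sketch_detail::operator<<(out, value);
  return out.str();
}
```

With this sketch, FormatCharArg("flag", 'A') yields "flag=65" instead of
"flag=A".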
Change-Id: Id072c2ed19b1b4166108599e393d1cae6c54b6b0
With the file reorganization changes, HIP is installed in /opt/rocm-ver.
Using /opt/rocm-ver/hip as the HIP path generates warnings.
Use the real path with symlinks resolved, so that the HIP path always points to the installed path.
Remove redundant code for finding hsa header files; find_dependency on hsa-runtime should handle the hsa dependency.
Change-Id: Iccea3c1c7297c705244bf752f38fbff71929d64c
Fix the following error:
hip_intercept.cpp:52:7: error: reinterpret_cast from 'const void *' to 'decltype(activity_prof::report_activity.load())' (aka 'int (*)(activity_domain_t, unsigned int, void *)') casts away qualifiers
reinterpret_cast<decltype(activity_prof::report_activity.load())>(function),
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
by replacing the 'const void *function' argument with the correct type.
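
A rough sketch of the shape of the fix, assuming a registration function
that stores the callback in an atomic. ActivityDomain, ReportActivityFn,
activity_prof_sketch and RegisterReportActivity are hypothetical names used
for illustration; only the callback signature is taken from the error
message above.

```cpp
#include <atomic>

// Hypothetical stand-in for the real activity_domain_t.
enum ActivityDomain { kDomainHipApi = 0 };

// The callback type from the error message, spelled out once.
using ReportActivityFn = int (*)(ActivityDomain, unsigned int, void*);

namespace activity_prof_sketch {
// Hypothetical counterpart of activity_prof::report_activity.
std::atomic<ReportActivityFn> report_activity{nullptr};
}  // namespace activity_prof_sketch

// Taking the typed function pointer directly means no reinterpret_cast is
// needed, so there are no qualifiers to cast away.
inline void RegisterReportActivity(ReportActivityFn function) {
  activity_prof_sketch::report_activity.store(function);
}
```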
Change-Id: I859d45ee01b7aaa1e46563cdc37de57b4159d330
- Aggregate all TLS (Thread Local Storage) variables into a single class
- This improves cache access per thread
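
A minimal sketch of the pattern, assuming three per-thread fields. The
struct ThreadState, its members, and the functions below are hypothetical
and do not reflect the actual HIP class. Grouping the fields in one object
places them next to each other in memory and needs a single thread-local
lookup per API call.

```cpp
#include <cstdint>

// Hypothetical aggregate; the member names are illustrative only.
struct ThreadState {
  int last_error = 0;
  int current_device = 0;
  std::uint64_t api_call_count = 0;
};

// One thread_local object instead of several separate thread_local variables.
inline ThreadState& GetThreadState() {
  thread_local ThreadState state;
  return state;
}

// Hypothetical API entry point that touches several per-thread fields
// through a single thread-local lookup. Returns the per-thread call count.
inline std::uint64_t RecordCall(int device, int error) {
  ThreadState& tls = GetThreadState();
  tls.current_device = device;
  tls.last_error = error;
  return ++tls.api_call_count;
}
```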
Change-Id: Ic8361eaeae290fff00254684e309471958365eb9
For hipGraphKernelNode, remove func_
and reorganize functions to naturally support mGPU.
For hipGraphMemcpyNode, make EnqueueCommands() support syncing
different queues.
Change-Id: I22708923f454adf4456ff99d25559daffed8c20d
To avoid using the thread local std::stack to remember the phase enter
timestamp, the tracer tool uses the phase data to store the timestamp.
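
A simplified sketch of the approach, assuming the callback receives a
pointer to a per-call scratch slot. Phase, ApiCallData and OnApiPhase are
hypothetical names, and the timestamp is passed in as a parameter to keep
the example deterministic.

```cpp
#include <cstdint>

enum class Phase { kEnter, kExit };

// Hypothetical per-call data handed to the tracer callback.
struct ApiCallData {
  Phase phase;
  std::uint64_t* phase_data;  // scratch slot that lives as long as the API call
};

// On enter, store the timestamp in the call's own slot. On exit, read it
// back and report the elapsed time. No thread-local stack is involved.
// Returns true when a duration was written to duration_out.
inline bool OnApiPhase(const ApiCallData& data, std::uint64_t now,
                       std::uint64_t* duration_out) {
  if (data.phase == Phase::kEnter) {
    *data.phase_data = now;
    return false;
  }
  *duration_out = now - *data.phase_data;
  return true;
}
```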
Change-Id: I9e95637b41d6f0b2bd61016062ca07d6ba897652
Remove the api_callbacks_table_t that was holding the API activities and
user callbacks. Instead, use a single roctracer callback (TracerCallback)
to report both API activities and callbacks.
Remove hipInitActivityCallback, which set the ROCtracer
callback and memory pool for asynchronous activities, because it did not
allow distinct pools to be used for each activity. Instead, use
hipRegisterTracerCallback to set the single roctracer callback.
Change-Id: I4c10f04f29a6e4cce8caf15db3016c3f72c86b04
CallbacksTable::is_enabled() can be implemented simply by checking
whether enabled_api_count is greater than 0. ROCclr does not use
IS_PROFILER_ON to report asynchronous activities.
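
A small sketch of the counting scheme, not the actual CallbacksTable. The
class CallbacksTableSketch, its Enable/Disable methods, and the table size
are assumptions made for illustration; only is_enabled() and the idea of an
enabled_api_count come from the description above.

```cpp
#include <atomic>
#include <cstddef>

class CallbacksTableSketch {
 public:
  // Count an API only on its transition from disabled to enabled.
  void Enable(std::size_t api_id) {
    if (api_id < kApiCount && !enabled_[api_id].exchange(true)) {
      ++enabled_api_count_;
    }
  }

  // Uncount an API only on its transition from enabled to disabled.
  void Disable(std::size_t api_id) {
    if (api_id < kApiCount && enabled_[api_id].exchange(false)) {
      --enabled_api_count_;
    }
  }

  // The table is enabled as soon as at least one API is enabled.
  bool is_enabled() const { return enabled_api_count_.load() > 0; }

 private:
  static constexpr std::size_t kApiCount = 8;  // hypothetical table size
  std::atomic<bool> enabled_[kApiCount] = {};
  std::atomic<int> enabled_api_count_{0};
};
```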
Change-Id: Iab3d034357e51282bf2c453b2ac5c9726786b9eb
Since the hip_api_data and record are only needed within the HIP function's
scope, there is no need to allocate/free them in the ROCtracer activity
callback; they can reside on the HIP function's stack frame.
This solves an issue where the thread-local stacks of records maintained by
the tracer are destroyed first (before any global destructor) on
process exit, which made it impossible to use HIP functions in global
destructors when the profiler is enabled.
Change-Id: Ib1d70124d009a44dc1f08d41edff95e5f9f84369