[SWDEV-418917] reported that roctracer introduces timing skew. Most of
the problem seems to stem from outrunning the double-buffering scheme
used in memory_pool (partly because file writing is slow). A semi-quick
fix that may last until RocProf v2 is complete is to allow the buffer
size to be adjusted. This change introduces the ROCTRACER_BUFFER_SIZE
environment variable, which sets the buffer size of the tracer tool.
Increasing the buffer size gave an ~8% reduction in execution time when
timing on the program side. It should also reduce the frequency of large
delays when we outrun the buffer. Note: increasing the size dramatically
(i.e. above 50MB) can cause slow startups.
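A minimal sketch of how a tool could read such a variable is shown here.
Only the name ROCTRACER_BUFFER_SIZE comes from this commit; the default
value, the helper names, and the assumption that the value is a plain
decimal byte count are hypothetical and may not match the tracer tool.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>

// Hypothetical default; the tracer tool's real default may differ.
constexpr std::size_t kDefaultBufferSize = 0x200000;

// Parse a decimal byte count. Falls back to the default when the text is
// missing, empty, not a plain number, or zero.
std::size_t ParseBufferSize(const char* text) {
  if (text == nullptr || !std::isdigit(static_cast<unsigned char>(*text))) {
    return kDefaultBufferSize;
  }
  char* end = nullptr;
  const unsigned long long value = std::strtoull(text, &end, 10);
  if (*end != '\0' || value == 0) return kDefaultBufferSize;
  return static_cast<std::size_t>(value);
}

// Read the size from the environment variable named in this commit.
std::size_t GetTracerBufferSize() {
  return ParseBufferSize(std::getenv("ROCTRACER_BUFFER_SIZE"));
}
```

Falling back to the default on malformed input keeps a typo in the
environment from turning into a zero-sized or huge buffer.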
Change-Id: I98c4316cfe93a043623ae2669cfe1a5abb55c990
[ROCm/roctracer commit: 38ba63030d]
The RPATH of libraries installed in /opt/rocm-ver/lib/roctracer should be: $ORIGIN:$ORIGIN/..
The cmake shared linker flags provide the rpath $ORIGIN.
This patch appends the rpath $ORIGIN/.. to the component-specific libraries.
Change-Id: Ied2bcb57bf0dd38ee3d1a946a5afc1bb182ff619
[ROCm/roctracer commit: 6fbf7673aa]
Using wrapper header files results in a #warning message by default.
Change-Id: Ib8a05d11f2391dfcdac8601da26e1096821cd555
[ROCm/roctracer commit: 245eafea4c]
Using backward compatibility paths produces an #error message. A compile-time option was added to enable or disable the #error message.
Disabling it produces a #warning message instead.
Change-Id: I6abc236e810ccc38d3636074e0e8f5a9657c2e9a
[ROCm/roctracer commit: ea061be2d1]
SWDEV-356024 - Development package name will have suffix dev or devel based on OS
Devel package contents - Header files, name link of public library files, html files and roctracer manual file
Runtime package contents - Versioned public library files, private library files and license file
Change-Id: I8ced3eab5d8824a66be39b9e777368506516b155
[ROCm/roctracer commit: 9acba8b4a1]
When multiple ranks are used, each rank's first logical device always
has GPU ID 0, regardless of which physical device is selected with
CUDA_VISIBLE_DEVICES. Because of this, when merging trace files from
multiple ranks, GPU IDs from different processes may overlap.
The long-term solution is to use the KFD's gpu_id, which is stable
across APIs and processes. Unfortunately, the gpu_id is not yet exposed
by ROCr, so for now use the driver's node id.
Change-Id: I2f5af8d2a7e8a89efeb5e0a1b86bdfa547b25fc8
[ROCm/roctracer commit: 799f0323cd]
Using a thread_local object is problematic because thread-local
destructors run before any global destructor, which makes the object
invalid while the process is being torn down.
rocblas uses a global destructor to clean up the loaded HIP modules
and ends up calling hip_executable_destroy after the timestamp stack
has been destroyed. As a result, the begin timestamp for that API
function is 0.
The solution is to store the phase_enter timestamp in the phase_data.
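The idea can be sketched as follows. The types and names here are
hypothetical stand-ins, not the HIP or roctracer definitions; the point
is only that the begin timestamp lives in per-call storage supplied by
the caller rather than in a thread_local object.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the callback phases and per-call data.
enum Phase { kPhaseEnter, kPhaseExit };

struct ApiData {
  Phase phase;
  uint64_t* phase_data;  // caller-owned slot, valid for both phases of a call
};

struct Record {
  uint64_t begin_ns;
  uint64_t end_ns;
};

// On enter, stash the begin timestamp in the per-call slot. On exit, read
// it back and fill the record. Returns true when a record was produced.
bool ApiCallback(const ApiData& data, uint64_t now_ns, Record* out) {
  if (data.phase == kPhaseEnter) {
    *data.phase_data = now_ns;
    return false;
  }
  out->begin_ns = *data.phase_data;
  out->end_ns = now_ns;
  return true;
}
```

Because the slot is owned by the caller, it stays valid even when the
call happens from a global destructor after thread-local objects are gone.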
Change-Id: If143f4d123dfb111c72fb20365431d07e73fc570
[ROCm/roctracer commit: 8a575d8d6e]
The timestamps coming from the HIP runtime for asynchronous memory
copies are corrupted (begin > end) because the HSA setting to record
timestamps is turned off by the tracer's HSA intercept.
The solution is to intercept hsa_amd_profiling_async_copy_enable and
remember the application/runtime's request so that it can be ORed with
IsEnabled(ACTIVITY_DOMAIN_HSA_OPS, HSA_OP_ID_COPY).
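A minimal sketch of the bookkeeping, assuming a hypothetical
AsyncCopyProfiling helper (not a class from the roctracer sources): the
intercept records what the application asked for, and the value
forwarded to the runtime is the OR of that request and the tracer's own
need for copy timestamps.

```cpp
#include <atomic>

// Hypothetical helper; the names are illustrative only.
class AsyncCopyProfiling {
 public:
  // Called from the intercepted hsa_amd_profiling_async_copy_enable to
  // remember what the application or runtime requested.
  void SetApplicationRequest(bool enable) { app_request_.store(enable); }

  // Called when tracing of HSA copy operations is enabled or disabled.
  void SetTracerRequest(bool enable) { tracer_request_.store(enable); }

  // Value to forward to the real runtime function: profiling stays on as
  // long as either side needs it.
  bool EffectiveSetting() const {
    return app_request_.load() || tracer_request_.load();
  }

 private:
  std::atomic<bool> app_request_{false};
  std::atomic<bool> tracer_request_{false};
};
```

With this arrangement, disabling the tracer's copy activity no longer
turns off timestamps that the HIP runtime itself asked for.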
Change-Id: Ib687cbf36711563e86c2bb8bc934c7c51572bfde
[ROCm/roctracer commit: 329c0467cb]
The tracer tool needs to remember the begin timestamps for API
callbacks, and uses a thread_local std::stack for that purpose.
The issue with thread_local objects is that they are destructed
before anything else when the main thread exits. To work around
that issue, we use a "safe" stack in the roctracer API.
Use the same "safe" stack in the tracer tool.
Change-Id: I0d69d4eb44f0205f4102d0d5ef9803a1ec1800a5
[ROCm/roctracer commit: b664937ebd]
rocprof errors out with the following message:
symbol lookup 'KernelNameRef' failed: libamdhip64.so.5: undefined \
symbol: KernelNameRef
The HipLoader is incorrectly looking for a KernelNameRef symbol
instead of hipKernelNameRef.
Fixed the typo: KernelNameRef -> hipKernelNameRef.
Change-Id: Ia4860e1669707b0c83d67e71b78d362b07a6aaa7
[ROCm/roctracer commit: a287f20961]
Starting with gcc-11 (verified with gcc-12 as well), an array
out-of-bounds subscript error is reported for accessing the registration
table element at the operation ID index. Validating the index in the
function calling Register/Unregister does not quiet the warning/error
in release builds, so, for gcc-11 and gcc-12, we disable that warning
just for the RegistrationTable class.
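The scoping of the suppression might look roughly like this. The class
body is a simplified, hypothetical stand-in for the real
RegistrationTable; only the push/ignored/pop pattern around the class
and the gcc-11/gcc-12 version check reflect the description above.

```cpp
#include <array>
#include <cstddef>

// Suppress -Warray-bounds only for gcc-11 and gcc-12, and only around
// the table class.
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ == 11 || __GNUC__ == 12)
#define SKETCH_SUPPRESS_ARRAY_BOUNDS 1
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Warray-bounds"
#endif

// Simplified stand-in: a fixed-size table indexed by operation ID.
template <typename T, std::size_t N>
class RegistrationTable {
 public:
  bool Register(std::size_t id, const T& value) {
    if (id >= N) return false;  // the index is still validated at runtime
    entries_[id] = value;
    registered_[id] = true;
    return true;
  }

  bool Unregister(std::size_t id) {
    if (id >= N || !registered_[id]) return false;
    registered_[id] = false;
    return true;
  }

  const T* Get(std::size_t id) const {
    return (id < N && registered_[id]) ? &entries_[id] : nullptr;
  }

 private:
  std::array<T, N> entries_{};
  std::array<bool, N> registered_{};
};

#ifdef SKETCH_SUPPRESS_ARRAY_BOUNDS
#pragma GCC diagnostic pop
#undef SKETCH_SUPPRESS_ARRAY_BOUNDS
#endif
```

The push/pop pair keeps the warning active for the rest of the
translation unit.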
Change-Id: I6bc4a02aa072cfa8905ecde5e3960aebf32fc912
[ROCm/roctracer commit: 67ce5fae13]
The post-processing script cannot handle HIP ops without a correlation
ID. The correlation ID is needed to connect the record to a HIP stream
and originating thread.
This issue was exposed by a change to the tracer API to report
asynchronous activities even if their originating synchronous API
activity (callback) is not enabled. This was a flaw in the API.
Also fix an issue with the API filtering. Undefined API names should
not cause an exception; they should be ignored.
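The filtering behaviour described above can be illustrated with a short
sketch. ResolveApiFilter and the name-to-ID map are hypothetical and are
not taken from the tool or the post-processing script; the sketch only
shows unknown names being skipped instead of raising an error.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Translate requested API names into operation IDs, skipping any name
// that is not in the table of known APIs.
std::vector<uint32_t> ResolveApiFilter(
    const std::unordered_map<std::string, uint32_t>& known_apis,
    const std::vector<std::string>& requested) {
  std::vector<uint32_t> ids;
  for (const std::string& name : requested) {
    const auto it = known_apis.find(name);
    if (it == known_apis.end()) continue;  // undefined name: ignore it
    ids.push_back(it->second);
  }
  return ids;
}
```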
Change-Id: Iab2221af6180ade2b9c2eb10c256c3a73d872e9f
[ROCm/roctracer commit: 4856d33959]
Default to the HSA runtime's hsa_system_get_info if the saved HSA
functions table is not yet initialized.
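As a sketch of the fallback, assuming hypothetical stand-in types (the
function pointer signature below is simplified and is not the HSA
prototype): use the saved table entry when it is set, otherwise call the
runtime's function directly.

```cpp
// Simplified, hypothetical signature standing in for hsa_system_get_info.
using SystemGetInfoFn = int (*)(int attribute, void* value);

// Hypothetical stand-in for the saved HSA functions table.
struct SavedTable {
  SystemGetInfoFn system_get_info = nullptr;  // null until initialized
};

// Pick the saved entry if the table has been initialized, otherwise fall
// back to the function exported by the runtime.
SystemGetInfoFn SelectSystemGetInfo(const SavedTable& saved,
                                    SystemGetInfoFn runtime_fn) {
  return saved.system_get_info != nullptr ? saved.system_get_info
                                          : runtime_fn;
}
```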
Change-Id: I3659095a5ad662f7ca8b0d92bd035901c6d66bb0
[ROCm/roctracer commit: 87ffbd27f4]
Instead of dlopen'ing a library (for example libamdhip64.so) with
RTLD_NOLOAD and relying on the dynamic linker search path, search through
the already loaded shared objects for a library with a matching name.
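One way to do such a search on Linux is dl_iterate_phdr. The sketch
below is an assumption about the approach, not the code from this
commit; FindLoadedLibrary and its prefix-matching rule are hypothetical.

```cpp
#include <link.h>

#include <cstddef>
#include <cstring>
#include <string>

namespace {

struct SearchState {
  const char* wanted;  // file name prefix to look for
  std::string found;   // full path of the first match, if any
};

// Called once per loaded shared object. Returning non-zero stops the walk.
int VisitObject(struct dl_phdr_info* info, std::size_t /*size*/, void* data) {
  auto* state = static_cast<SearchState*>(data);
  const char* path = info->dlpi_name;
  if (path == nullptr || *path == '\0') return 0;  // e.g. main executable
  const char* slash = std::strrchr(path, '/');
  const char* base = (slash != nullptr) ? slash + 1 : path;
  if (std::strncmp(base, state->wanted, std::strlen(state->wanted)) == 0) {
    state->found = path;
    return 1;
  }
  return 0;
}

}  // namespace

// Return the path of an already loaded shared object whose file name
// starts with name_prefix, or an empty string if none is loaded.
std::string FindLoadedLibrary(const char* name_prefix) {
  SearchState state{name_prefix, std::string()};
  dl_iterate_phdr(VisitObject, &state);
  return state.found;
}
```

The returned path can then be passed to dlopen, so the handle refers to
the copy that is actually loaded rather than whatever the search path
would resolve to.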
Change-Id: I3e74d432bd7ca68df8927ca435b290e86aaaf9e9
[ROCm/roctracer commit: db69cc1c9f]
Remove hipInitActivityCallback and use the new hipRegister/
RemoveActivityCallback, which allow distinct memory pools to be used
for HIP_OPS activities.
Enable the multi_pool_activities test.
Change-Id: I6f6feaedecc9c36285bea975caf24dbf8f5f624b
[ROCm/roctracer commit: 340c7cb553]
The code is easier to read when HIPActivityCallbackTracker
enable/disable_check are called directly. Both return the new mask,
which makes the check for an already installed callback clearer.
Change-Id: Ic90d34489b5b4d9929dc08b4d9e93cc974b136b1
[ROCm/roctracer commit: f0e082feb1]
The HIP runtime is now allocating the hip_api_data and record on its
stack so we don't need the thread local record_data_pair stack anymore.
Refactor the API callback function to handle both the case where
synchronous user callbacks are requested and the case where asynchronous
records are requested (enable_callback & enable_activity respectively).
If the callback argument (memory pool) is not null, then activity
records are requested.
Remove CorrelationIdRegister and CorrelationIdLookup. These were used
by the HIP runtime to associate a HIP record id to a ROCtracer
correlation id. Instead, the HIP runtime is now using the correlation
ID returned in the hip_api_data_t.
Added a test to check enabling/disabling concurrent callbacks and
activities.
Change-Id: I5850cfead9861eb3602a3e8fcb7b22580d5fc979
[ROCm/roctracer commit: 88c6e0a700]
These functions have little value as it is very unlikely an application
would want to enable all the domains.
Change-Id: I4743e8ddf6743e60c95c7ba5240950d2ef734301
[ROCm/roctracer commit: ad01ba513a]
This function has been deprecated since ROCm-2.9. Use ROCTX's
roctxMark(const char* message) as a replacement for roctracer_mark.
Change-Id: Ie4aeae1db238453fc4451746cc9a338032ba817f
[ROCm/roctracer commit: bddb9850de]
- Multithreaded Applications and plugin destruction
- Fixing Async-copy trace in file plugin
- Adding assert checks to every trace buffer flush function
Change-Id: I96e096fd7ee2604931200a0b446edb5ce49959dd
[ROCm/roctracer commit: 4cd7497a87]
- Added File plugin as the default plugin
- Moved the flush functions to the plugins
- Improved the flush to file implementation
Change-Id: I80dd448eb8147a8ea4aa63b39bd1d0a4baf7252b
[ROCm/roctracer commit: b7e1f74054]
Intercept the first call to hsa_iterate_agents in order to number them.
The index assigned to agents will be used by a future commit.
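The numbering itself can be sketched as below. AgentIndexer is a
hypothetical helper that takes opaque 64-bit handles; it is not tied to
the HSA types and is not thread-safe as written.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical helper: assigns consecutive indices to agent handles in
// the order they are first seen during enumeration.
class AgentIndexer {
 public:
  // Returns the index for this handle, assigning the next free index on
  // first sight. Seeing the same handle again returns the same index.
  uint32_t Visit(uint64_t handle) {
    const uint32_t next = static_cast<uint32_t>(index_.size());
    return index_.emplace(handle, next).first->second;
  }

  // Looks up a previously assigned index. Returns false if unknown.
  bool Lookup(uint64_t handle, uint32_t* index) const {
    const auto it = index_.find(handle);
    if (it == index_.end()) return false;
    *index = it->second;
    return true;
  }

 private:
  std::unordered_map<uint64_t, uint32_t> index_;
};
```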
Change-Id: I8db365f8fe913b6cde16a4dccb9bf09600846521
[ROCm/roctracer commit: 84ad727c38]
Remove declarations that are not meant to be part of the public API.
Change-Id: I47d9e83bf41bdb2f7ac25a1507200b51c616049b
[ROCm/roctracer commit: 05d3cf3529]