Add custom_commands to generate the HSA code objects
Remove the configure time file generation and add custom commands to
generate them at build time.
Change-Id: I167dd9befc6c73f32224935eaab74510922b26f4
[ROCm/roctracer commit: 3773384af8]
Report an error in CMake if CppHeaderParser and argparse are not installed on the system
Change-Id: I7617f662bc061fde45ce9f72c08d80a5108766d9
[ROCm/roctracer commit: b88bbe155f]
ROCtracer does not rely on the ld.so search path to load the tracer
tool library.
Change-Id: I19f69add4777c8c1b274db61906d4497997171ff
[ROCm/roctracer commit: c74b1fa8ff]
This should be enabled at the command line during the cmake configure
step (-DCMAKE_VERBOSE_MAKEFILE=True).
Verbose output can also be enabled during the build by setting the
VERBOSE=1 GNU make variable, or using the -v Ninja option.
Change-Id: Ie842c900c83c8f9f1c3ab4119e3bbc7931d371f5
[ROCm/roctracer commit: 2b3dc8f20b]
Check if a default pool is defined when enabling activities.
Set default pool to undefined if it is deleted.
Disable activities associated with the pool when it is deleted.
Document restrictions on deleting pools.
Change-Id: Ide466b55cab12ca2dd67d9f26b578f421e45a376
[ROCm/roctracer commit: feb652e45d]
Work around a global destructor issue by using atexit to run tool_unload once when the tracer tool exits.
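A minimal sketch of the atexit approach, not the tool's actual code. The names tool_load, tool_unload and unload_count are hypothetical stand-ins; only the idea of using atexit to run the unload step once comes from this change.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>

// Hypothetical counter so the effect of the guard can be observed.
static std::atomic<int> unload_count{0};

// Hypothetical stand-in for the tool's unload routine. The guard makes the
// body run at most once, whether it is called explicitly or through atexit.
void tool_unload() {
  static std::atomic<bool> done{false};
  if (done.exchange(true)) return;
  ++unload_count;  // real code would flush and release resources here
}

// Hypothetical load hook: registers tool_unload with atexit exactly once,
// instead of relying on the destructor of a global object.
void tool_load() {
  static const int rc = std::atexit(tool_unload);
  assert(rc == 0 && "atexit registration failed");
  (void)rc;  // silence unused warnings when assertions are compiled out
}
```

Calling tool_load more than once registers the handler only once, and a second call to tool_unload is a no-op.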
Change-Id: I276f6d240cd312ba1eacaf52c38ef8fd1f607268
[ROCm/roctracer commit: ae1091d816]
Adding roctracer_hcc.h for backward compatibility with components that still use it, such as TensorFlow
Change-Id: Idfcdda9207277866e629e7bb9bfc0da835481217
[ROCm/roctracer commit: 1db8cdf99a]
Optimizing the trace period: use std::thread and std::chrono-based sleeps instead of sleep and usleep, handle the corner case where the run ends before the trace period has elapsed, and apply some cosmetic cleanup
Change-Id: Ia99f346bf71a3faad5dfdfc8d7a08f6c2b2cc0b9
[ROCm/roctracer commit: 1f9efecd4a]
The test (MatrixTranspose) and the tracer tool both write to stdout,
which sometimes corrupts the trace.
Change the test to emit info messages to stderr instead of stdout,
leaving stdout for the tracer tool's exclusive use.
Change-Id: I18047dbcd9039b70dd24ef6e7e8e9d89b40bedd2
[ROCm/roctracer commit: bbe1db3810]
Removing unused definitions and compile options
Using cmake variables to set the options needed
Changing the visibility to make it specific for the targets
Change-Id: I80cf0997cd28897d5a06a58c7225ba40dfc51e2d
[ROCm/roctracer commit: 2f5313a0c7]
Using std::thread instead of pthreads, and an atomic_bool to signal the end of the flush function so that unload_tool can wait for it
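A minimal sketch of this pattern, assuming a periodic flush loop. All names (start_flush, flush_loop, stop_requested, flush_finished, flush_count) are hypothetical; only the use of std::thread and an atomic flag that unload_tool waits on comes from this change.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical state shared between the flush thread and unload_tool.
static std::atomic<bool> stop_requested{false};  // asks the loop to exit
static std::atomic<bool> flush_finished{true};   // set once the loop has ended
static std::atomic<int> flush_count{0};          // stand-in for flushed data
static std::thread flush_thread;

// Periodically "flushes" until a stop is requested, then reports completion.
static void flush_loop(std::chrono::milliseconds period) {
  while (!stop_requested.load(std::memory_order_acquire)) {
    ++flush_count;  // real code would flush the trace buffers here
    std::this_thread::sleep_for(period);
  }
  flush_finished.store(true, std::memory_order_release);
}

void start_flush(std::chrono::milliseconds period) {
  assert(!flush_thread.joinable() && "flush thread is already running");
  stop_requested.store(false);
  flush_finished.store(false);
  flush_thread = std::thread(flush_loop, period);
}

// Requests a stop, waits for the flush function to finish, then joins.
void unload_tool() {
  stop_requested.store(true, std::memory_order_release);
  while (!flush_finished.load(std::memory_order_acquire)) {
    std::this_thread::yield();
  }
  if (flush_thread.joinable()) flush_thread.join();
}
```

In this sketch the join alone would be enough to wait for the thread; the flag is kept to mirror the description above.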
Change-Id: Iea00d7e16c65d51db2d222e8b42f03f9caeb2067
[ROCm/roctracer commit: 80464525c7]
Removing the backward compatibility file and making sure to use the right paths
Change-Id: I518d52c82e0c5878bd334713e7b1758bba79762d
[ROCm/roctracer commit: 6b16d37d65]
Each thread has a thread-local record_pair_stack. The stack is
dynamically allocated on first use, but is not destroyed when the
thread exits.
Replaced record_pair_stack pointers with record_pair_stack instances.
The instances are constructed on first odr-use and destructed when the
thread exits.
Also, converted the cb_journal and act_journal to instances.
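The difference between the pointer and the instance can be sketched as follows. RecordPairStack, ThreadStack, UseStackInNewThread and live_stacks are hypothetical names used for illustration only. The sketch uses a function-local thread_local so that construction happens on the first call in each thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical counter of currently constructed stacks, for observation only.
static std::atomic<int> live_stacks{0};

// Hypothetical stand-in for record_pair_stack.
struct RecordPairStack {
  RecordPairStack() { ++live_stacks; }
  ~RecordPairStack() { --live_stacks; }
  std::vector<int> records;
};

// Before: a thread_local pointer allocated with new on first use and never
// deleted. After: a thread_local instance, constructed on first use in each
// thread and destructed automatically when that thread exits.
RecordPairStack& ThreadStack() {
  thread_local RecordPairStack stack;
  return stack;
}

// Uses the stack in a new thread and returns the number of live stacks
// observed from inside that thread.
int UseStackInNewThread() {
  int seen = 0;
  std::thread worker([&seen] {
    ThreadStack().records.push_back(42);
    assert(ThreadStack().records.size() == 1);
    seen = live_stacks.load();
  });
  worker.join();
  return seen;
}
```

Once the worker thread has been joined, its stack has been destructed and the live count returns to its previous value.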
Change-Id: I186ac29da477f194880a1ab599f4be5715a23063
[ROCm/roctracer commit: 67481bd295]
The range message stack is mirrored in case ranges are pushed or popped
while tracing is stopped (by the tracer tool?). When a stop event is
reported, the tracer tool emits RangePop events by unwinding the stack,
then when the start event is reported, it emits RangePush events again
by unwinding the stack. The issue is that the RangePush events should
be emitted in reverse order.
For example:
RangePush(M1); RangePush(M2); \
TracerStop; RangePop; RangePop; \
...; \
TracerStart; RangePush(M2); RangePush(M1); \ <- In the wrong order
RangePop; RangePop;
It could be fixed by reversing the stack in RangeStackIterate but is it
worth it? The roctx range markers are supposed to be unintrusive so that
they can be left in the application even when it isn't being traced.
Simplifying the roctx API and reducing its added latency by removing
the range message stack mirroring seems like the better choice.
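The ordering problem can be reproduced with a small sketch. RangeStack, ReplayByUnwinding, ReplayInPushOrder and DemonstrateOrdering are hypothetical names and do not correspond to the actual roctx implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the range message stack; back() is the top.
using RangeStack = std::vector<std::string>;

// Replays the messages by unwinding the stack from top to bottom, which is
// the behavior described above.
std::vector<std::string> ReplayByUnwinding(const RangeStack& stack) {
  return std::vector<std::string>(stack.rbegin(), stack.rend());
}

// Replays the messages from bottom to top, i.e. in the original push order.
std::vector<std::string> ReplayInPushOrder(const RangeStack& stack) {
  return std::vector<std::string>(stack.begin(), stack.end());
}

// Usage example: after RangePush(M1); RangePush(M2), unwinding yields M2
// before M1, while the original push order is M1 then M2.
void DemonstrateOrdering() {
  const RangeStack stack{"M1", "M2"};
  const std::vector<std::string> unwound{"M2", "M1"};
  assert(ReplayByUnwinding(stack) == unwound);
  assert(ReplayInPushOrder(stack) == stack);
  (void)unwound;  // silence unused warnings when assertions are compiled out
}
```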
TODO: A future change should make roctx events immune to tracer start
and tracer stop requests. Or simply remove roctracer_start/stop.
Change-Id: Ie4d76afb5ce8d263848dcf1b599af394db56ddab
[ROCm/roctracer commit: 3d0198c395]
Remove thread_data_init. The C++ standard guarantees that the thread
local variable is initialized before its first odr-use and destructed
when the thread exits. Use a global initializer to set the reference
from the message stack instance in the map.
Remove roctracer_error_string. This does not belong to this library.
ROCTX does not expose errors to the application. The only functions
returning errors are returning -1 (Push/Pop).
Remove memory leaks due to strdup on the ranges messages. The memory
for the messages is guaranteed to be valid for the duration of the
callback, and it is the application's responsibility to strdup the
strings if it needs to extend the message's lifetime.
Add a lock to the RegisterApiCallback implementation. Iterating the
message stack map must be synchronized as a new thread could be adding
a new value to the map.
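A minimal sketch of the locking described in the last paragraph, assuming a map from a thread identifier to that thread's message stack. MessageStackMap, Register, ForEach and CountMessages are hypothetical names. The sketch only protects the map structure; the contents of each stack would need their own synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical map of per-thread message stacks.
class MessageStackMap {
 public:
  // Called by a thread to obtain its stack; inserts an entry on first use.
  // References into a std::map stay valid when other entries are inserted.
  std::vector<std::string>& Register(int thread_id) {
    assert(thread_id >= 0 && "thread ids are non-negative in this sketch");
    std::lock_guard<std::mutex> lock(mutex_);
    return stacks_[thread_id];
  }

  // Visits every stack while holding the lock, so that a new thread cannot
  // insert into the map during the iteration.
  template <typename Visitor>
  void ForEach(Visitor&& visit) {
    std::lock_guard<std::mutex> lock(mutex_);
    for (auto& [thread_id, stack] : stacks_) visit(thread_id, stack);
  }

 private:
  std::mutex mutex_;
  std::map<int, std::vector<std::string>> stacks_;
};

// Usage example: counts the messages across all registered stacks.
std::size_t CountMessages(MessageStackMap& map) {
  std::size_t total = 0;
  map.ForEach([&total](int, const std::vector<std::string>& stack) {
    total += stack.size();
  });
  return total;
}
```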
Change-Id: Iaf5b07ebc9efe4061cb01327d4c7034888727816
[ROCm/roctracer commit: 713db1fce5]
Make CallbackTable::Get return the callback_function/user_arg pair
as an actual return value instead of returning it through arguments
pointers.
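The shape of the change can be sketched as follows. The table layout, the callback signature and the template parameter are assumptions made for illustration; only the idea of returning the callback/user_arg pair by value comes from this change.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <utility>

// Hypothetical callback signature.
using callback_t = void (*)(uint32_t domain, uint32_t op, const void* data,
                            void* user_arg);

// Hypothetical fixed-size callback table indexed by operation id.
template <std::size_t N>
class CallbackTable {
 public:
  void Set(uint32_t op, callback_t callback, void* user_arg) {
    assert(op < N && "operation id out of range");
    std::lock_guard<std::mutex> lock(mutex_);
    entries_[op] = {callback, user_arg};
  }

  // Before: void Get(uint32_t op, callback_t* callback, void** user_arg);
  // After: the pair is the return value.
  std::pair<callback_t, void*> Get(uint32_t op) const {
    assert(op < N && "operation id out of range");
    std::lock_guard<std::mutex> lock(mutex_);
    return entries_[op];
  }

 private:
  mutable std::mutex mutex_;
  std::array<std::pair<callback_t, void*>, N> entries_{};  // all null
};
```

A caller can then unpack the result with a structured binding, for example `auto [callback, user_arg] = table.Get(op);`.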
Change-Id: Ia2dfcdad8c237a09620518ad67af94add47220da
[ROCm/roctracer commit: 4aeb76f7a8]
At the end of the test, the tracer tool is unloaded and the active
memory pools are flushed. In the flush callback, to get the activity
operation string, the RocpLoader instance is needed, and if the
RocpLoader is not already loaded, it attempts to dlopen the rocprofiler
library.
Calling dlopen from a global destructor hangs because the dynamic
loader lock is already owned (e.g. by dlclose).
To temporarily work around the issue, instantiate the RocpLoader when
the activities needing it are enabled.
Change-Id: I712c66d88c43694fe53a95d6a61d7b22abb75262
[ROCm/roctracer commit: 11887f596a]
System clock timestamps should only come from a single source:
util::timestamp_ns(). Externally, this function is exposed as
roctracer_get_timestamp() (used by the tracer tool).
Removed the now unused HSA Runtime Utilities which were never part
of the ROCtracer API.
Change-Id: I044b7f4da60fd8fdb771b0c877622a3143f0e815
[ROCm/roctracer commit: f8462b8637]
hsa_rsrc_factory was only used to enumerate the agents types and pools.
The pools don't seem to be used by bin/mem_manager.py, so I only
ported the agent enumeration using hsa_iterate_agents.
Change-Id: Idd586aa13db303cf92962a6392771b7bf38b758f
[ROCm/roctracer commit: 6d6017249a]
Compilers don't count an assert as a use of a variable once assertions are compiled out (NDEBUG), so variables used only in asserts trigger unused-variable warnings. Added [[maybe_unused]] to those variables so that the compiler skips them in the check. Note: [[maybe_unused]] was introduced in C++17.
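A minimal example of the pattern, using a hypothetical helper (PopLast) that is not part of roctracer:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: removes and returns the last element of a vector.
int PopLast(std::vector<int>& values) {
  // had_values is only read by the assert. When NDEBUG is defined the assert
  // expands to nothing, and without [[maybe_unused]] the compiler may warn
  // that the variable is unused.
  [[maybe_unused]] const bool had_values = !values.empty();
  assert(had_values && "PopLast called on an empty vector");
  const int last = values.back();
  values.pop_back();
  return last;
}
```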
Change-Id: I96bb53cb2ab55ee7120681c2d279271c0075095d
[ROCm/roctracer commit: 78869032ad]
The else branch was not used, as the code only uses hipApiString to format the data as a string
Change-Id: I376721c478cffba0890436ca8895dfe2a7641570
[ROCm/roctracer commit: 5e012541c5]
1) The Entry's state was published after making the record available,
so a thread flushing the records could see an uninitialized record.
2) data_ and write_pointer_ could become out of sync. write_pointer_
could be indexing into another buffer than what data_ was pointing
to.
3) GetEntry could get a nullptr free_buffer_ because multiple threads
could acquire the work_mutex_ before the work_thread_ could wake up,
or between allocate_worker's loop iterations.
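As an illustration of the ordering concern in item 1, here is a minimal sketch of a writer that completes the record before publishing the entry's state, and a flusher that reads the record only after observing that state. Record, Entry, Publish and TryConsume are hypothetical names and do not reflect the actual trace buffer code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical record payload.
struct Record {
  uint64_t begin_ns;
  uint64_t end_ns;
};

// Hypothetical buffer entry: a record plus a state flag.
struct Entry {
  enum State : int { kFree, kReady };
  Record record{};
  std::atomic<int> state{kFree};
};

// Writer: fill in the record first, then publish the state with release
// semantics so the record contents are visible to whoever observes kReady.
void Publish(Entry& entry, const Record& record) {
  assert(entry.state.load(std::memory_order_relaxed) == Entry::kFree);
  entry.record = record;
  entry.state.store(Entry::kReady, std::memory_order_release);
}

// Flusher: read the record only after observing kReady with acquire
// semantics, then hand the entry back.
bool TryConsume(Entry& entry, Record* out) {
  if (entry.state.load(std::memory_order_acquire) != Entry::kReady) {
    return false;
  }
  *out = entry.record;
  entry.state.store(Entry::kFree, std::memory_order_release);
  return true;
}
```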
Change-Id: I6f0a015557888eeeaa75a8bce7fde8de276d11dd
[ROCm/roctracer commit: 046df32729]
A trace buffer is used to efficiently store synchronous event records
so that they can be processed later, possibly in a different thread,
when the buffer is flushed. This helps reduce the latency added by
tracing API calls.
The API does not need to use trace buffers as synchronous events are
directly reported to the client with callbacks, and asynchronous events
(activities) are saved in memory pools.
The implementation of HSA asynchronous memory copy activities was using
a trace buffer shared with the tracer tool to write the records to a
file (async_copy_trace.txt), instead of using a memory pool and
reporting the activity to the client.
Removed the asynchronous memory copies trace buffer, and updated
hsa_async_copy_handler to use the pool specified when the activity
was enabled.
Updated the tracer tool to read HSA_OP_ID_COPY records out of the
default memory pool and write them to async_copy_trace.txt.
Move trace_buffer.h to test/tool as tracer_tool.cpp is now the only
file using it.
Change-Id: Ida95aba2eaf3c3f2a979ed6c2b060374017b7424
[ROCm/roctracer commit: 61f35b0204]
This test stresses the concurrent writing of trace buffer records while
frequently allocating new storage to hold the records.
Due to race conditions, this test fails with the current trace buffer
implementation.
Change-Id: I0b77c64005e776319bf21f1ee1e6d7c99ddccfff
[ROCm/roctracer commit: 200e27f12d]
Replace EXC_ABORT() checks with assertions.
Rewrite the exception class to use std::runtime_error (as it
already handles the std::string/char* message argument).
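A minimal sketch of such an exception class, assuming it carries a status code alongside the message. ApiError, status() and ParseDomain are hypothetical names; the only facts taken from this change are the std::runtime_error base class and the use of assertions for internal checks.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception type. std::runtime_error already provides
// constructors for both std::string and const char* messages, so the class
// only needs to forward the message and store the status.
class ApiError : public std::runtime_error {
 public:
  ApiError(int status, const std::string& message)
      : std::runtime_error(message), status_(status) {}
  ApiError(int status, const char* message)
      : std::runtime_error(message), status_(status) {}

  int status() const noexcept { return status_; }

 private:
  int status_;
};

// Hypothetical usage: internal invariants are checked with assert, while
// failures caused by the caller's input are reported with an exception.
int ParseDomain(const char* name) {
  assert(name != nullptr && "internal invariant: name must not be null");
  if (std::string(name) == "hip") return 1;
  throw ApiError(-1, std::string("unknown domain: ") + name);
}
```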
Change-Id: I48e31924f3aea1328e6562ab6bb06ec373fd5d5e
[ROCm/roctracer commit: 5963363484]
There's a typo in RegisterApiCallback: roctx::cb_table.Get should be
roctx::cb_table.Set.
Change-Id: I47ec8ac666f783ff4e03f35d13e375e645899900
[ROCm/roctracer commit: 0d7d56eea5]