Removing unused definitions and compile options
Using cmake variables to set the options needed
Changing the visibility to make it specific for the targets
Change-Id: I80cf0997cd28897d5a06a58c7225ba40dfc51e2d
Each thread has a thread-local record_pair_stack. The stack is
dynamically allocated on first use, but is not detroyed when the
thread exits.
Replaced record_pair_stack pointers with record_pair_stack instances,
the intances are constructed on first odr-use, and destructed when the
thread exits.
Also, converted the cb_journal and act_journal to instances.
Change-Id: I186ac29da477f194880a1ab599f4be5715a23063
Improve the roctx markers performance when the tracer is not engaged
(the application is not running with rocprof).
The performance of roctx push/pop, measured with:
-----------------------------------------------------------------------
auto start = std::chrono::steady_clock::now();
for (int i = 0; i < 10000000; ++i) {
roctxRangePush ("A");
roctxRangePop ();
}
auto end = std::chrono::steady_clock::now();
std::cout << "ns = " << std::chrono::nanoseconds(end - start).count()
/ 10000000 << std::endl;
-----------------------------------------------------------------------
w/o rocprof | with rocprof | commit
92ns | 770ns | 0d6e132: Cleanup CallbackTable::Get
28ns | 712ns | 6421bd5: Cleanup ROCTX's implementation
20ns | 664ns | 7f0e5e5: Remove the roctx range message...
6ns | 665ns | this commit
Change-Id: Id679dcbd0fb190a3179be98a9b2c1db151efee3d
The range message stack is mirrored in case ranges are pushed or popped
while tracing is stopped (by the tracer tool?). When a stop event is
reported, the tracer tool emits RangePop events by unwinding the stack,
then when the start event is reported, it emits RangePush events again
by unwinding the stack. The issue is that the RangePush events should
be emitted in reverse order.
For example:
RangePush(M1); RangePush(M2); \
TracerStop; RangePop; RangePop; \
...; \
TracerStart; RangePush(M2); RangePush(M1); \ <- In the wrong order
RangePop; RangePop;
It could be fixed by reversing the stack in RangeStackIterate but is it
worth it? The roctx range markers are supposed to be unintrusive so that
they can be left in the application even when it isn't being traced.
Simplifying the roctx API and reducing its added latency by removing
the range message stack mirroring seems like the better choise.
TODO: A future change should make roctx events immune to tracer start
and tracer stop requests. Or simply remove roctracer_start/stop.
Change-Id: Ie4d76afb5ce8d263848dcf1b599af394db56ddab
Remove thread_data_init. The C++ standard guarantees that the thread
local variable is initialized before its first odr-use and destructed
when the thread exits. Use a global initializer to set the reference
from the message stack instance in the map.
Remove roctracer_error_string. This does not belong to this library.
ROCTX does not expose errors to the application. The only functions
returning errors are returning -1 (Push/Pop).
Remove memory leaks due to strdup on the ranges messages. The memory
for the messages is guaranteed to be valid for the duration of the
callback, and it is the application's responsibility to strdup the
strings if it needs to extend the message's lifetime.
Add a lock to the RegisterApiCallback implementation. Iterating the
message stack map must be synchronized as a new thread could be adding
a new value to the map.
Change-Id: Iaf5b07ebc9efe4061cb01327d4c7034888727816
Make CallbackTable::Get return the callback_function/user_arg pair
as an actual return value instead of returning it through arguments
pointers.
Change-Id: Ia2dfcdad8c237a09620518ad67af94add47220da
At the end of the test, the tracer tool is unloaded and the active
memory pools are flushed. In the flush callback, to get the activity
operation string, the RocpLoader instance is neeeded, and if the
RocpLoader is not already loaded, it attempts to dlopen the rocprofiler
library.
Calling dlopen from a global destructor hangs because the dynamic
loader lock is already owned (e.g. by dlclose).
To temporarily work around the issue, instanciate the RocpLoader when
the activities needing it are enabled.
Change-Id: I712c66d88c43694fe53a95d6a61d7b22abb75262
System clock timestamps should only come from a single source:
util::timestamp_ns(). Externally, this function is exposed as
roctracer_get_timestamp() (used by the tracer tool).
Removed the now unused HSA Runtime Utilities which were never part
of the ROCtracer API.
Change-Id: I044b7f4da60fd8fdb771b0c877622a3143f0e815
Compilers doesn't see assert as a usage of the variables, I added [[maybe_unused]] to the variables that are used only in assert to make sure that the compiler is skipping them in the check. Note: [[maybe_unused]] is introduced in C++17
Change-Id: I96bb53cb2ab55ee7120681c2d279271c0075095d
A trace buffer is used to efficiently store synchronous event records
so that they can be processed later, possibly in a different thread,
when the buffer is flushed. This helps reduce the latency added by
tracing API calls.
The API does not need to use trace buffers as synchronous events are
directly reported to the client with callbacks, and asynchronous events
(activities) are saved in memory pools.
The implentation of HSA asynchronous memory copy activities was using
a trace buffer shared with the tracer tool to write the records to a
file (async_copy_trace.txt), instead of using a memory pool and
reporting the activity to the client.
Removed the asynchronous memory copies trace buffer, and updated
hsa_async_copy_handler to use the pool specified when the activity
was enabled.
Updated the tracer tool to read HSA_OP_ID_COPY records out of the
default memory pool and write them to async_copy_trace.txt.
Move trace_buffer.h to test/tool as tracer_tool.cpp is now the only
file using it.
Change-Id: Ida95aba2eaf3c3f2a979ed6c2b060374017b7424
Replace EXC_ABORT() checks with assertions.
Rewrite the exception class to use std::runtime_error (as it
already handles the std::string/char* message argument).
Change-Id: I48e31924f3aea1328e6562ab6bb06ec373fd5d5e
The proxy queue implements packet interception to enable timestamps
collection. As it is, the roctracer is not intercepting packets, and
instead relies on the rocprofiler tool to collect the timestamps for
kernel dispatches.
This is an issue as the roctracer API does not implement HSA_OPS
activities for kernel dispatches. This will be addressed in a future
commit.
Change-Id: Ib6a778a513410bec4579f223a9d9e9fd9b6054df
Building with -DLIBRARY_TYPE=STATIC fails with 3 undefined symbols.
Add weak symbols to satisfy the linker (mirror what is done for the
other Loader symbols).
Change-Id: I8a2878def21d5f500b0764ceacb4e5255e1111c5
Removing DEBUG_TRACES and the unnecessary use of roctracer_op_string, made the MS app reporting 78 to 81 stable samples per second, depending on the type of the trace, while the main app without rocprof reports 100 to 106. More detailed numbers will be posted in the ticket.
Change-Id: Ifbc529278cea54dd23e6086aa9b9ea2df952d5dd
Before this change, when a producer was blocked by a flush operation,
no other producer could write to the memory pool. This change allows
other producer threads to continue to write by releasing the producer
lock before waiting on the consumer condition variable.
Change-Id: Idc1c07173d2edb18fbe1a61961f10c02e7ca8c20
HCC_EXC_RAISING and HIP_EXC_RAISING don't add much value, so to
simplify, only keep EXC_RAISING and EXC_ABORT.
Change-Id: Ifdc54981bb682fe68b418cdc95ecebe668e3dcf6
The HCC runtime is no longer used, so move all the remaining
activities in the HipApi loader and remove the HccLoader.
Change-Id: I845c04ca275a474526840315bae0ad1a4ce02257
Use the standard concurrent support library (std::thread, std::mutex,
st::condition_variable) instead of pthread.
Fix a mismatched memory allocation/deallocation when a custom allocator
is provided. The MemoryPool destructor was always using the default
allocator (using malloc/realloc/free) even if the pool memory was
allocated with the custom allocator.
Fix various thread safety issues and inefficiencies (spin loops).
Change-Id: I97592caa947f63463041bf43e00af9ebb5ff5886
Move roctracer_cb_table.h to the src/core directory, as it should not
be exposed as a public header, and rename it callback_table.h
Change-Id: Ib448cbd32a275df0268d53bd8d1da0bdc9201470
Removing DEBUG_TRACES and the unnecessary use of roctracer_op_string, made the MS app reporting 78 to 81 stable samples per second, depending on the type of the trace, while the main app without rocprof reports 100 to 106. More detailed numbers will be posted in the ticket.
Change-Id: Ida25d3bfc72047afaa27326d697be76d97564334
Changing correlation_id_map to static instance instead of being a pointer and fixing the corresponding references
Change-Id: Id8a481a90b46831f91985a7e0523fd2869991aeb
Use HIP_API_ID_NONE to detect unsupported API instead of
HIP_API_ID_NUMBER which can grow with a new version of the API.
This HIP_API_ID_NONE enum has a fixed value of 0 so the
HIP_API_IDs really start at FIRST.
Change-Id: I760aa50ddf6fa6d46bf20555ad7d429335a53f97