Clean up the code and optimize the kernel name search in the API
callback, so that the kernel name is reported accurately for the HIP
functions that have a kernel name.
Change-Id: Ie9ab917c895748bfb8eee9ddfcbcad81a0b9a9fa
When ROCP_TRUNCATE_NAMES is not set, getenv returns NULL and std::atoi
crashes. Check that getenv returns a non-NULL string before calling
std::atoi.
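
The check can be sketched as follows. This is a minimal illustration,
not the code from this change: the helper name env_to_int and its
fallback parameter are assumptions made for the example.

```cpp
#include <cstdlib>

// Hypothetical helper (not part of roctracer): returns the integer
// value of an environment variable, or `fallback` when it is not set.
int env_to_int(const char* name, int fallback = 0) {
  const char* value = std::getenv(name);
  // Passing a null pointer to std::atoi is undefined behavior, so the
  // pointer is tested before the conversion.
  return value != nullptr ? std::atoi(value) : fallback;
}
```

A caller would then write env_to_int("ROCP_TRUNCATE_NAMES") and get 0
when the variable is not set.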
Change-Id: Ie479a481f8d23f034b425d14e3cfefb3d62c84e8
Split the public and private HSA profiler/tracer interfaces. Only the
public interface should be exposed in include/roctracer.
Change-Id: I7e4424cd90023693350c31e6b02caca8c984ba84
Use GNUInstallDirs variables to determine the location of BINDIR,
LIBDIR, INCLUDEDIR, DATADIR, DOCDIR, LIBEXECDIR.
Depends-On: Id11f862fb4bdb2425d68f455074172c38814ec92
Change-Id: I6459a4531ef899321a5e2d8050cf8b553e89a968
The roctracer-tests package contains all the roctracer test binaries
and scripts needed to run the testsuite outside of the build directory.
Change-Id: Id11f862fb4bdb2425d68f455074172c38814ec92
The ROCR now detects already loaded tool libraries and calls OnLoad/
OnUnload in the order specified with HSA_AMD_TOOL_ORDER.
It is no longer necessary to set the HSA_TOOLS_LIB environment variable
to load the roctracer API. The roctracer tool library should be
pre-loaded with LD_PRELOAD.
Change-Id: I6de1b1bd4f93caa08d3554aad2376d242c74fb7e
Enable the new ROCP_STATS_OPT method of collecting HIP activities
while the application is running.
Change-Id: I94b3311b0740db804643dba0e4f77c1f9de0319b
In file included from roctracer/src/roctracer/tracker.h:24,
from roctracer/src/roctracer/roctracer.cpp:44:
/opt/rocm/hsa/include/hsa/amd_hsa_signal.h:26:246: note: ‘#pragma message: amd_hsa_signal.h has moved to ...’
26 | ssage("amd_hsa_signal.h has moved to ...")
| ^
Change-Id: I38d151d836688083a4fdb0e86a04fc40923a369f
The same information can be generated from the hcc_ops_trace.txt file,
so a later commit will add a stage to the tblextr.py script to
generate the .csv files when ROCP_STATS_OPT=1.
Change-Id: I3d1575e096bedf98c66068d9a4ca141421e5bb9d
Some records may need to point to data with the same lifetime as the
records themselves. One solution is to store the data at the end of
the memory pool buffer. Records in the buffer grow up, and the data
grows down. When the buffer is flushed both records and data are
recycled.
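
A rough sketch of such a double-ended buffer, under stated assumptions:
the Record layout, the class name DoubleEndedBuffer, and its methods are
hypothetical and only illustrate the idea, they are not the memory pool
implementation.

```cpp
#include <cstddef>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical record: `message` points into the same buffer.
struct Record {
  int id;
  const char* message;
};

class DoubleEndedBuffer {
 public:
  explicit DoubleEndedBuffer(std::size_t size)
      : storage_(size), record_end_(0), data_begin_(size) {}

  // Returns nullptr when the record and its data do not both fit; the
  // caller is then expected to flush and retry.
  const Record* Push(int id, const char* message) {
    const std::size_t length = std::strlen(message) + 1;
    if (data_begin_ - record_end_ < sizeof(Record) + length) return nullptr;
    // The data grows down from the end of the buffer.
    data_begin_ -= length;
    char* copy = reinterpret_cast<char*>(storage_.data() + data_begin_);
    std::memcpy(copy, message, length);
    // The records grow up from the start of the buffer.
    Record* record = new (storage_.data() + record_end_) Record{id, copy};
    record_end_ += sizeof(Record);
    return record;
  }

  std::size_t RecordCount() const { return record_end_ / sizeof(Record); }

  // Recycles the records and the data together.
  void Flush() {
    record_end_ = 0;
    data_begin_ = storage_.size();
  }

 private:
  std::vector<unsigned char> storage_;
  std::size_t record_end_;  // offset one past the last record
  std::size_t data_begin_;  // offset of the lowest data byte
};
```

Because both ends are reset in Flush, a record never outlives the data
it points to.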
Change-Id: I278fa84478236bf895f7c2d152d47d4256987392
The roctracer_load, roctracer_unload, and roctracer_flush_buf functions
are not part of the ROCtracer API, and should not be exposed in the API
header file, but keep the functions in the library for backward
compatibility.
Add src/roctracer/backward_compat.cpp to implement retired functions.
Add test/app/backward_compat_test.cpp to test that the retired functions
are still accessible in the latest roctracer library.
Change-Id: I4c94310a7bfccfeae9384dac5db18fc79b4c5b17
Make error codes more informative and give them negative values. This
is an ABI break, but no known tools appear to rely on the exact error
codes.
Use logging for all errors so that roctracer_error_string can return
the last error message.
Make internal errors fatal and abort.
Do not use the tracer API exceptions in the tracer tool.
Change-Id: Ie8ed3d50e5ad26625ac9d1263f7e048edb5584c0
Add symbol versioning to the roctracer64 and roctx64 libraries, and
only expose the OnLoad and OnUnload tracer_tool symbols.
Change-Id: I7f160fc3e568567fd1146ff5b9c0aef3bdcccf53
Add custom_commands to generate the HSA code objects
Remove the configure time file generation and add custom commands to
generate them at build time.
Change-Id: I167dd9befc6c73f32224935eaab74510922b26f4
Check if a default pool is defined when enabling activities.
Set default pool to undefined if it is deleted.
Disable activities associated with the pool when it is deleted.
Document restrictions on deleting pools.
Change-Id: Ide466b55cab12ca2dd67d9f26b578f421e45a376
Removing unused definitions and compile options
Using cmake variables to set the options needed
Changing the visibility to make it specific for the targets
Change-Id: I80cf0997cd28897d5a06a58c7225ba40dfc51e2d
Each thread has a thread-local record_pair_stack. The stack is
dynamically allocated on first use, but is not destroyed when the
thread exits.
Replaced record_pair_stack pointers with record_pair_stack instances.
The instances are constructed on first odr-use and destructed when the
thread exits.
Also, converted the cb_journal and act_journal to instances.
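
The difference can be illustrated with a small sketch. The names
RecordPairStack, ThreadStack, and DestroyedAfterWorkerExit are
hypothetical stand-ins, and the destructor counter exists only to make
the behavior observable.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Counts destroyed stacks so the effect can be observed (illustration
// only).
std::atomic<int> destroyed_stacks{0};

// Hypothetical stand-in for record_pair_stack.
struct RecordPairStack {
  std::vector<int> entries;
  ~RecordPairStack() { ++destroyed_stacks; }
};

// A thread_local pointer set with `new` on first use would never be
// deleted. A thread_local instance is constructed the first time a
// thread calls this function and destructed when that thread exits.
RecordPairStack& ThreadStack() {
  thread_local RecordPairStack stack;
  return stack;
}

// Runs a worker that touches its stack, then reports how many stacks
// have been destroyed once the worker has exited.
int DestroyedAfterWorkerExit() {
  std::thread worker([] { ThreadStack().entries.push_back(1); });
  worker.join();
  return destroyed_stacks.load();
}
```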
Change-Id: I186ac29da477f194880a1ab599f4be5715a23063
Improve the roctx markers performance when the tracer is not engaged
(the application is not running with rocprof).
The performance of roctx push/pop, measured with:
-----------------------------------------------------------------------
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < 10000000; ++i) {
    roctxRangePush ("A");
    roctxRangePop ();
  }
  auto end = std::chrono::steady_clock::now();
  std::cout << "ns = " << std::chrono::nanoseconds(end - start).count()
                            / 10000000 << std::endl;
-----------------------------------------------------------------------
w/o rocprof | with rocprof | commit
       92ns |        770ns | 0d6e132: Cleanup CallbackTable::Get
       28ns |        712ns | 6421bd5: Cleanup ROCTX's implementation
       20ns |        664ns | 7f0e5e5: Remove the roctx range message...
        6ns |        665ns | this commit
Change-Id: Id679dcbd0fb190a3179be98a9b2c1db151efee3d
The range message stack is mirrored in case ranges are pushed or popped
while tracing is stopped (by the tracer tool?). When a stop event is
reported, the tracer tool emits RangePop events by unwinding the stack,
then when the start event is reported, it emits RangePush events again
by unwinding the stack. The issue is that the RangePush events should
be emitted in reverse order.
For example:
RangePush(M1); RangePush(M2); \
TracerStop; RangePop; RangePop; \
...; \
TracerStart; RangePush(M2); RangePush(M1); \ <- In the wrong order
RangePop; RangePop;
It could be fixed by reversing the stack in RangeStackIterate but is it
worth it? The roctx range markers are supposed to be unintrusive so that
they can be left in the application even when it isn't being traced.
Simplifying the roctx API and reducing its added latency by removing
the range message stack mirroring seems like the better choice.
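
The ordering problem can be reproduced with a short sketch. It models
the mirrored stack as a vector whose last element is the top. The
function names are hypothetical and do not correspond to
RangeStackIterate or any other roctx function.

```cpp
#include <string>
#include <vector>

using Messages = std::vector<std::string>;

// Emits RangePush events by unwinding the saved stack from the top,
// which yields the most recently pushed message first.
Messages ReplayByUnwinding(const Messages& stack) {
  return Messages(stack.rbegin(), stack.rend());
}

// Emits RangePush events from the bottom of the stack up, which
// restores the order in which the application pushed the ranges.
Messages ReplayInPushOrder(const Messages& stack) {
  return stack;
}
```

With M1 pushed before M2, unwinding replays M2 and then M1, while the
bottom-up iteration replays M1 and then M2.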
TODO: A future change should make roctx events immune to tracer start
and tracer stop requests. Or simply remove roctracer_start/stop.
Change-Id: Ie4d76afb5ce8d263848dcf1b599af394db56ddab
Remove thread_data_init. The C++ standard guarantees that the thread
local variable is initialized before its first odr-use and destructed
when the thread exits. Use a global initializer to set the reference
from the message stack instance in the map.
Remove roctracer_error_string. This does not belong to this library.
ROCTX does not expose errors to the application. The only functions
returning errors are returning -1 (Push/Pop).
Remove memory leaks due to strdup on the ranges messages. The memory
for the messages is guaranteed to be valid for the duration of the
callback, and it is the application's responsibility to strdup the
strings if it needs to extend the message's lifetime.
Add a lock to the RegisterApiCallback implementation. Iterating the
message stack map must be synchronized as a new thread could be adding
a new value to the map.
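
A minimal sketch of the synchronization, assuming a map from a thread
identifier to that thread's message stack. The class MessageStackMap
and its methods are hypothetical, they are not the ROCTX
implementation.

```cpp
#include <map>
#include <mutex>
#include <string>
#include <vector>

using MessageStack = std::vector<std::string>;

class MessageStackMap {
 public:
  // Called by a thread when its message stack is created.
  void Add(int thread_id, MessageStack* stack) {
    std::lock_guard<std::mutex> lock(mutex_);
    stacks_[thread_id] = stack;
  }

  // Visits every registered stack while holding the lock, so that a
  // concurrent Add cannot modify the map during the iteration.
  template <typename Visitor>
  void ForEach(Visitor visit) const {
    std::lock_guard<std::mutex> lock(mutex_);
    for (const auto& entry : stacks_) visit(entry.first, *entry.second);
  }

 private:
  mutable std::mutex mutex_;
  std::map<int, MessageStack*> stacks_;
};
```

The lock only protects the map. Access to the contents of each stack
would need its own consideration.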
Change-Id: Iaf5b07ebc9efe4061cb01327d4c7034888727816
Make CallbackTable::Get return the callback_function/user_arg pair
as an actual return value instead of returning it through pointer
arguments.
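
The shape of the change can be sketched as below. This is a simplified,
single-entry table: the Callback signature and the Set method are
assumptions made for the example, only the idea of returning the pair
from Get comes from this change.

```cpp
#include <utility>

// Hypothetical callback signature.
using Callback = void (*)(int id, void* user_arg);

class CallbackTable {
 public:
  void Set(Callback callback, void* user_arg) {
    entry_ = {callback, user_arg};
  }

  // Before: void Get(Callback* callback, void** user_arg) const;
  // After: the pair is the return value.
  std::pair<Callback, void*> Get() const { return entry_; }

 private:
  std::pair<Callback, void*> entry_{nullptr, nullptr};
};
```

A C++17 caller can then write auto [callback, user_arg] = table.Get();
and test callback before invoking it.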
Change-Id: Ia2dfcdad8c237a09620518ad67af94add47220da
At the end of the test, the tracer tool is unloaded and the active
memory pools are flushed. In the flush callback, to get the activity
operation string, the RocpLoader instance is needed, and if the
RocpLoader is not already loaded, it attempts to dlopen the rocprofiler
library.
Calling dlopen from a global destructor hangs because the dynamic
loader lock is already owned (e.g. by dlclose).
To temporarily work around the issue, instantiate the RocpLoader when
the activities needing it are enabled.
Change-Id: I712c66d88c43694fe53a95d6a61d7b22abb75262
System clock timestamps should only come from a single source:
util::timestamp_ns(). Externally, this function is exposed as
roctracer_get_timestamp() (used by the tracer tool).
Removed the now unused HSA Runtime Utilities which were never part
of the ROCtracer API.
Change-Id: I044b7f4da60fd8fdb771b0c877622a3143f0e815
Compilers do not count a use inside assert() as a use of the variable
when assertions are compiled out (NDEBUG), so variables that are only
used in asserts trigger unused-variable warnings. Add [[maybe_unused]]
to those variables so that the compiler skips them in the check.
Note: [[maybe_unused]] was introduced in C++17.
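
For illustration, a minimal sketch of the pattern. StoreValue and
Update are hypothetical functions written for this example.

```cpp
#include <cassert>

// Hypothetical function that returns 0 on success.
int StoreValue(int* slot, int value) {
  *slot = value;
  return 0;
}

void Update(int* slot, int value) {
  // With NDEBUG defined, assert() expands to nothing and `status`
  // would otherwise be reported as an unused variable.
  [[maybe_unused]] int status = StoreValue(slot, value);
  assert(status == 0);
}
```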
Change-Id: I96bb53cb2ab55ee7120681c2d279271c0075095d