In the generated header file hsa_prof_str.h , the header file hsa_ostream_ops.h was included using angle brackets
This results in compilation with include path /opt/rocm-ver/include. Corrected the usage by using double quotes
Change-Id: Ie9f1fff78d16a6953a2c99056b2acef42e577204
Strings ([const] char *, [const] char[]) passed as arguments to API
functions may not always contain printable characters. All string
arguments should be quoted and escaped in the trace logs.
Change-Id: Ie39058f2190048b1a0090df16d9ac6bc6507e28a
Using a thread_local object is problematic as the thread local
destructors are called first before any global destructor, making
the object invalid while tearing down the process.
rocblas uses a global destructor to clean up the loaded HIP modules
and ends up calling hip_executable_destroy after the timestamp stack
is destructed. As a result the begin timestamp for that API function
is 0.
The solution is to store the phase_enter timestamp in the phase_data.
Change-Id: If143f4d123dfb111c72fb20365431d07e73fc570
Using rocprof with ROCP_MCOPY_DATA=1 while tracing HSA produces the
following error:
tblextr.py: Memcpy args "(0x7feb16a00000, 123handle=28593376125, 0x7feb12a00010, 123handle=27558560125, 4194304, 0, 0, 123handle=140661639440000125) = 1" cannot be identified
Profiling data corrupted: ' ./out/rpl_data_220930_143009_1826700/input_results_220930_143009/results.txt'
There are two issues:
1) The hsa_agent_t handle argument is misprinted: "123handle=...125"
Instead of printing '{' and '}', it prints '123' and '125'. The wrong
operator<<(unsigned char) is used and an integer value is printed
instead of a char.
Use std::operator<< instead of hsa_support::detail::operator<< to
print '{' and '}'
2) The result value is unitialized and in some cases printed as a
negative integer value. The leading '-' is not matched by the
mem_manager regular expresion for HSA api calls.
Correctly capture the HSA function's return value.
Change-Id: If13a1e62eeb4e598447c4b90d53d1b2e3b408696
Use #include "header" instead of #include <header> so that the header
files are found when the application #includes <roctracer/roctracer.h>
with -I /opt/rocm/include.
Change-Id: I24feac9a5030d3600aee98084340e246c3990db5
Move the HSA intercept to the OnLoad function, so that it is available
as soon as the ROCR is loaded.
Layer the HSA API wrappers on top of the basic HSA activity intercept.
Change-Id: Ie636d59755543cda181e76ec29f0b55081136b63
Making sure not to count duplicates for load_unload_reload_trace and
fixed the ignore-count option in check_trace.py.
Change-Id: I9e674aa624ec3b473bb7c6cc95260e240204627f
Split the public and private HSA profiler/tracer interfaces. Only the
public interface should be exposed in include/roctracer.
Change-Id: I7e4424cd90023693350c31e6b02caca8c984ba84
The roctracer-tests package contains all the roctracer test binaries
and scripts needed to run the testsuite outside of the build directory.
Change-Id: Id11f862fb4bdb2425d68f455074172c38814ec92
Improve the roctx markers performance when the tracer is not engaged
(the application is not running with rocprof).
The performance of roctx push/pop, measured with:
-----------------------------------------------------------------------
auto start = std::chrono::steady_clock::now();
for (int i = 0; i < 10000000; ++i) {
roctxRangePush ("A");
roctxRangePop ();
}
auto end = std::chrono::steady_clock::now();
std::cout << "ns = " << std::chrono::nanoseconds(end - start).count()
/ 10000000 << std::endl;
-----------------------------------------------------------------------
w/o rocprof | with rocprof | commit
92ns | 770ns | 0d6e132: Cleanup CallbackTable::Get
28ns | 712ns | 6421bd5: Cleanup ROCTX's implementation
20ns | 664ns | 7f0e5e5: Remove the roctx range message...
6ns | 665ns | this commit
Change-Id: Id679dcbd0fb190a3179be98a9b2c1db151efee3d
Make CallbackTable::Get return the callback_function/user_arg pair
as an actual return value instead of returning it through arguments
pointers.
Change-Id: Ia2dfcdad8c237a09620518ad67af94add47220da
Move roctracer_cb_table.h to the src/core directory, as it should not
be exposed as a public header, and rename it callback_table.h
Change-Id: Ib448cbd32a275df0268d53bd8d1da0bdc9201470
CppHeaderParser has limited support for unnamed structs. It leaves the
name empty so this results in classes (a.k.a structs) having trailing '::'
characters, also giving no way to distingush two unnamed structs at the
same level of nesting. An example are the inner structs of
hipExternalSemaphoreSignalParams. The workaround consists in skipping
over these, so they are not generated in the output header file
which lists all ostream ops<<. Only the inner unnamed structs are skipped,
the rest is processed as it should.
Change-Id: I17439c46095469b7adb7aee0b0f0b3d234aabc11