Граф коммитов

203 Коммитов

Автор SHA1 Сообщение Дата
Laurent Morichetti 37ab921f02 Cleanup roctracer.cpp
Minor cosmetic changes.

Change-Id: Ie5a904c757aa933d83ca6e496726e47fe7032620
2022-05-12 20:15:54 -04:00
Ammar ELWazir 2f5313a0c7 Fixing cmake_modules
Removing unused definitions and compile options
Using cmake variables to set the options needed
Changing the visibility to make it specific for the targets

Change-Id: I80cf0997cd28897d5a06a58c7225ba40dfc51e2d
2022-05-11 19:25:43 -04:00
Laurent Morichetti 67481bd295 Fix memory leaks in roctracer
Each thread has a thread-local record_pair_stack. The stack is
dynamically allocated on first use, but is not detroyed when the
thread exits.

Replaced record_pair_stack pointers with record_pair_stack instances,
the intances are constructed on first odr-use, and destructed when the
thread exits.

Also, converted the cb_journal and act_journal to instances.

Change-Id: I186ac29da477f194880a1ab599f4be5715a23063
2022-05-10 12:08:06 -07:00
Laurent Morichetti a794247c55 Optimize rotcx markers
Improve the roctx markers performance when the tracer is not engaged
(the application is not running with rocprof).

The performance of roctx push/pop, measured with:

-----------------------------------------------------------------------
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < 10000000; ++i) {
    roctxRangePush ("A");
    roctxRangePop ();
  }
  auto end = std::chrono::steady_clock::now();
  std::cout << "ns = " << std::chrono::nanoseconds(end - start).count()
      / 10000000 << std::endl;
-----------------------------------------------------------------------

w/o rocprof | with rocprof | commit
       92ns |       770ns  | 0d6e132: Cleanup CallbackTable::Get
       28ns |       712ns  | 6421bd5: Cleanup ROCTX's implementation
       20ns |       664ns  | 7f0e5e5: Remove the roctx range message...
        6ns |       665ns  | this commit

Change-Id: Id679dcbd0fb190a3179be98a9b2c1db151efee3d
2022-05-10 12:08:06 -07:00
Laurent Morichetti 3d0198c395 Remove the roctx range message stack
The range message stack is mirrored in case ranges are pushed or popped
while tracing is stopped (by the tracer tool?). When a stop event is
reported, the tracer tool emits RangePop events by unwinding the stack,
then when the start event is reported, it emits RangePush events again
by unwinding the stack. The issue is that the RangePush events should
be emitted in reverse order.

For example:

RangePush(M1); RangePush(M2); \
  TracerStop; RangePop; RangePop; \
...; \
  TracerStart; RangePush(M2); RangePush(M1); \ <- In the wrong order
RangePop; RangePop;

It could be fixed by reversing the stack in RangeStackIterate but is it
worth it? The roctx range markers are supposed to be unintrusive so that
they can be left in the application even when it isn't being traced.

Simplifying the roctx API and reducing its added latency by removing
the range message stack mirroring seems like the better choise.

TODO: A future change should make roctx events immune to tracer start
and tracer stop requests. Or simply remove roctracer_start/stop.

Change-Id: Ie4d76afb5ce8d263848dcf1b599af394db56ddab
2022-05-10 12:08:06 -07:00
Laurent Morichetti 713db1fce5 Cleanup ROCTX's implementation
Remove thread_data_init. The C++ standard guarantees that the thread
local variable is initialized before its first odr-use and destructed
when the thread exits. Use a global initializer to set the reference
from the message stack instance in the map.

Remove roctracer_error_string. This does not belong to this library.
ROCTX does not expose errors to the application. The only functions
returning errors are returning -1 (Push/Pop).

Remove memory leaks due to strdup on the ranges messages. The memory
for the messages is guaranteed to be valid for the duration of the
callback, and it is the application's responsibility to strdup the
strings if it needs to extend the message's lifetime.

Add a lock to the RegisterApiCallback implementation. Iterating the
message stack map must be synchronized as a new thread could be adding
a new value to the map.

Change-Id: Iaf5b07ebc9efe4061cb01327d4c7034888727816
2022-05-10 12:08:06 -07:00
Laurent Morichetti 6e4055503c Merge "Cleanup CallbackTable::Get" into amd-staging 2022-05-10 14:55:20 -04:00
Laurent Morichetti e8909158b3 Merge "Remove unused open_output_file/close_output_file" into amd-staging 2022-05-10 14:55:10 -04:00
Laurent Morichetti 9cecf30131 Merge "Fix a hang in './test/hsa/ctrl ctrl_hsa_input_trace'" into amd-staging 2022-05-10 14:54:11 -04:00
Laurent Morichetti fe0adfd37b Merge "Remove now unused hsa_rsrc_factory" into amd-staging 2022-05-10 14:54:01 -04:00
Laurent Morichetti 7c4f7625b1 Merge "Consolidate all sources of timestamps" into amd-staging 2022-05-10 14:53:36 -04:00
Laurent Morichetti 4aeb76f7a8 Cleanup CallbackTable::Get
Make CallbackTable::Get return the callback_function/user_arg pair
as an actual return value instead of returning it through arguments
pointers.

Change-Id: Ia2dfcdad8c237a09620518ad67af94add47220da
2022-05-10 08:13:18 -07:00
Laurent Morichetti cb040b7def Remove unused open_output_file/close_output_file
Change-Id: I0e5118b814617cb605949c99e5f0dc235f6edac0
2022-05-10 08:13:18 -07:00
Laurent Morichetti 11887f596a Fix a hang in './test/hsa/ctrl ctrl_hsa_input_trace'
At the end of the test, the tracer tool is unloaded and the active
memory pools are flushed. In the flush callback, to get the activity
operation string, the RocpLoader instance is neeeded, and if the
RocpLoader is not already loaded, it attempts to dlopen the rocprofiler
library.

Calling dlopen from a global destructor hangs because the dynamic
loader lock is already owned (e.g. by dlclose).

To temporarily work around the issue, instanciate the RocpLoader when
the activities needing it are enabled.

Change-Id: I712c66d88c43694fe53a95d6a61d7b22abb75262
2022-05-10 08:13:18 -07:00
Laurent Morichetti 4ced94b9a2 Remove now unused hsa_rsrc_factory
Change-Id: I66175eb9fae2e7e61400af77a0c89be9c39e770e
2022-05-10 08:13:18 -07:00
Laurent Morichetti f8462b8637 Consolidate all sources of timestamps
System clock timestamps should only come from a single source:
util::timestamp_ns(). Externally, this function is exposed as
roctracer_get_timestamp() (used by the tracer tool).

Removed the now unused HSA Runtime Utilities which were never part
of the ROCtracer API.

Change-Id: I044b7f4da60fd8fdb771b0c877622a3143f0e815
2022-05-10 08:13:09 -07:00
Ammar ELWazir 502ea835b9 Solving issue with using clang as the compiler
Change-Id: I4fa7b24af7008a30b0300b57ccbf1bc82dbfd66e
2022-05-09 17:41:33 -05:00
Ammar ELWazir 78869032ad SWDEV-335490: Unused variables
Compilers doesn't see assert as a usage of the variables, I added [[maybe_unused]] to the variables that are used only in assert to make sure that the compiler is skipping them in the check. Note: [[maybe_unused]] is introduced in C++17

Change-Id: I96bb53cb2ab55ee7120681c2d279271c0075095d
2022-05-04 11:24:28 -04:00
Laurent Morichetti 61f35b0204 Move trace_buffer.h to the tool directory
A trace buffer is used to efficiently store synchronous event records
so that they can be processed later, possibly in a different thread,
when the buffer is flushed. This helps reduce the latency added by
tracing API calls.

The API does not need to use trace buffers as synchronous events are
directly reported to the client with callbacks, and asynchronous events
(activities) are saved in memory pools.

The implentation of HSA asynchronous memory copy activities was using
a trace buffer shared with the tracer tool to write the records to a
file (async_copy_trace.txt), instead of using a memory pool and
reporting the activity to the client.

Removed the asynchronous memory copies trace buffer, and updated
hsa_async_copy_handler to use the pool specified when the activity
was enabled.

Updated the tracer tool to read HSA_OP_ID_COPY records out of the
default memory pool and write them to async_copy_trace.txt.

Move trace_buffer.h to test/tool as tracer_tool.cpp is now the only
file using it.

Change-Id: Ida95aba2eaf3c3f2a979ed6c2b060374017b7424
2022-05-03 21:56:28 -04:00
Laurent Morichetti 5963363484 Fix assertions
Replace EXC_ABORT() checks with assertions.

Rewrite the exception class to use std::runtime_error (as it
already handles the std::string/char* message argument).

Change-Id: I48e31924f3aea1328e6562ab6bb06ec373fd5d5e
2022-04-27 11:24:26 -07:00
Laurent Morichetti 0d7d56eea5 Fix a SEGV when running --roctx-trace
There's a typo in RegisterApiCallback, roctx::cb_table.Get should be
roctx::cb_table.Set.

Change-Id: I47ec8ac666f783ff4e03f35d13e375e645899900
2022-04-27 12:14:32 -04:00
Laurent Morichetti 18f60efe05 Fix typos/spelling errors
Change-Id: Idec1cb8fab91c30f99563bc7dd4db1faeb2db954
2022-04-26 12:39:38 -07:00
Laurent Morichetti 6b06322578 Remove unused proxy utilities
The proxy queue implements packet interception to enable timestamps
collection. As it is, the roctracer is not intercepting packets, and
instead relies on the rocprofiler tool to collect the timestamps for
kernel dispatches.

This is an issue as the roctracer API does not implement HSA_OPS
activities for kernel dispatches. This will be addressed in a future
commit.

Change-Id: Ib6a778a513410bec4579f223a9d9e9fd9b6054df
2022-04-26 15:26:26 -04:00
Laurent Morichetti b352eedac6 Fix the static library build
Building with -DLIBRARY_TYPE=STATIC fails with 3 undefined symbols.
Add weak symbols to satisfy the linker (mirror what is done for the
other Loader symbols).

Change-Id: I8a2878def21d5f500b0764ceacb4e5255e1111c5
2022-04-26 15:26:10 -04:00
Ammar ELWazir e4569c41fe SWDEV-295522: Fixing Performance Issue
Removing DEBUG_TRACES and the unnecessary use of roctracer_op_string, made the MS app reporting 78 to 81 stable samples per second, depending on the type of the trace, while the main app without rocprof reports 100 to 106. More detailed numbers will be posted in the ticket.

Change-Id: Ifbc529278cea54dd23e6086aa9b9ea2df952d5dd
2022-04-22 18:51:49 -04:00
Laurent Morichetti dc8717a6b5 Allow MemoryPool::Write while Flushing
Before this change, when a producer was blocked by a flush operation,
no other producer could write to the memory pool.  This change allows
other producer threads to continue to write by releasing the producer
lock before waiting on the consumer condition variable.

Change-Id: Idc1c07173d2edb18fbe1a61961f10c02e7ca8c20
2022-04-22 11:22:23 -07:00
Laurent Morichetti 121a84b449 Remove HCC_EXC_RAISING and HIP_EXC_RAISING
HCC_EXC_RAISING and HIP_EXC_RAISING don't add much value, so to
simplify, only keep EXC_RAISING and EXC_ABORT.

Change-Id: Ifdc54981bb682fe68b418cdc95ecebe668e3dcf6
2022-04-22 11:22:23 -07:00
Laurent Morichetti 85552ea3a0 Move the HccLoader activities into the HipLoader
The HCC runtime is no longer used, so move all the remaining
activities in the HipApi loader and remove the HccLoader.

Change-Id: I845c04ca275a474526840315bae0ad1a4ce02257
2022-04-22 11:22:07 -07:00
Laurent Morichetti abf1b90017 Use ACTIVITY_DOMAIN_HIP_OPS instead of ACTIVITY_DOMAIN_HCC_OPS
Change-Id: I43fbac3d02011f74bf7b597519148ed0bd68ff98
2022-04-20 22:00:59 -07:00
Laurent Morichetti d3b166cf01 Remove roctracer_hcc.h
roctracer_hip.h now contains the definitions for the HCC_OPS domain.

Change-Id: I132c993110254050aaa68828f3ca80f368ad24bc
2022-04-20 22:00:59 -07:00
Laurent Morichetti c009df3327 Remove hip_act_cb_tracker.h
It only defines one class (hip_act_cb_tracker_t) that is only used
by roctracer.cpp.

Change-Id: I375a25bd363770d70a7b3b713223484a498cc3d1
2022-04-20 19:48:24 -07:00
Laurent Morichetti 9d728f74a1 Simplify memory_pool.h
Use the standard concurrent support library (std::thread, std::mutex,
st::condition_variable) instead of pthread.

Fix a mismatched memory allocation/deallocation when a custom allocator
is provided. The MemoryPool destructor was always using the default
allocator (using malloc/realloc/free) even if the pool memory was
allocated with the custom allocator.

Fix various thread safety issues and inefficiencies (spin loops).

Change-Id: I97592caa947f63463041bf43e00af9ebb5ff5886
2022-04-20 19:48:24 -07:00
Laurent Morichetti cd62d841fa Make roctracer_cb_table.h a private header
Move roctracer_cb_table.h to the src/core directory, as it should not
be exposed as a public header, and rename it callback_table.h

Change-Id: Ib448cbd32a275df0268d53bd8d1da0bdc9201470
2022-04-20 19:47:43 -07:00
Laurent Morichetti dc22139977 Address review comments from previous commit
Change-Id: I6629dd911de0d7fd08d7a863c172ec73f35fa3d1
2022-04-20 22:46:15 -04:00
Laurent Morichetti 15ab5d9cda Run clang-format on all source files
Change-Id: Ifb52ca306286b6b2d473821bed9db28e9f616d50
2022-04-20 22:45:54 -04:00
Laurent Morichetti 89f6880371 Simplify journal.h
Simplify implementation of journal.h.

Change-Id: I9e2e93fd3cd3391fdf182249f5c4c5ef3debae03
2022-04-20 19:43:16 -07:00
Laurent Morichetti 06a3da7c63 Fix copyright headers
Change-Id: I380d867fa5fb04e68b5b332e9abf33fbeb1e9418
2022-04-19 09:30:45 -07:00
Ammar Elwazir 57add1a6fa Revert "SWDEV-295522: Fixing Performance Issue"
This reverts commit e7327aaac7.

Reason for revert: Merged by mistake

Change-Id: I8c39c823d92cc20a238ca6120dde4b2fa9121e85
2022-04-07 06:55:34 -04:00
Ammar ELWazir e7327aaac7 SWDEV-295522: Fixing Performance Issue
Removing DEBUG_TRACES and the unnecessary use of roctracer_op_string, made the MS app reporting 78 to 81 stable samples per second, depending on the type of the trace, while the main app without rocprof reports 100 to 106. More detailed numbers will be posted in the ticket.

Change-Id: Ida25d3bfc72047afaa27326d697be76d97564334
2022-04-07 00:07:24 +00:00
Ammar ELWazir 43a36f8dd5 SWDEV-328300: supporting centos-9 for roctracer
Changing pthread_yield() to sched_yield() as pthread_yield() is deprecated in CentOS 9

Change-Id: I2961b61374e36995d0835f0e65b26c35f5eb8715
2022-03-18 01:26:39 +00:00
Ranjith Ramakrishnan ebda880c4a SWDEV-291455: Prefer rocm include path to hip include path
Change-Id: I1fa96e72169fac689a3a2ed38e988d7f5d18bf04
2022-02-14 14:21:32 -08:00
Saurabh Verma a7cd80b716 SWDEV-295878 Fix for seg fault when using --trace-start off
Change-Id: Ic76d814b3591f72db18319d78f34596dae1ddfee
2021-08-31 16:46:59 -05:00
Ammar ELWazir 513460bd41 SWDEV-294248 (Fixing Race Conditions):
Fixing race conditions that happened when enabling trace-period feature on the following code:
#include <hip/hip_runtime.h>
__global__ void
kernel ()
{
}
int
main (int argc, char **argv)
{
  for (size_t i = 0; i < 10000; ++i) {
    hipLaunchKernelGGL (kernel, 1, 1, 0, 0);
    hipDeviceSynchronize ();
  }
  return 0;
}

Change-Id: I4eb88a4a71efbad0f6483e7fb6e8e0c6a662860b
2021-08-11 22:48:55 -04:00
Ammar Elwazir 53fa06ad66 Merge "Fixing correlation_id_map" into amd-staging 2021-08-11 21:44:59 -04:00
Ammar Elwazir 682ebba1c2 Merge "Cosmetic change" into amd-staging 2021-08-11 20:47:53 -04:00
Ammar ELWazir b02586c587 Cosmetic change
Fixing variable names and removing un-needed references

Change-Id: I3ed2cee89e7dc599caf1726fe1eab1a913e5a93d
2021-08-09 14:13:57 -04:00
Ammar ELWazir 1e3ed06a9a Fixing correlation_id_map
Changing correlation_id_map to static instance instead of being a pointer and fixing the corresponding references

Change-Id: Id8a481a90b46831f91985a7e0523fd2869991aeb
2021-08-09 14:05:52 -04:00
Christophe Paquot e5e1258ef8 SWDEV-295205 - Remove KFD domain from roctracer
Change-Id: I2771cf43aa115bb466531bf887f7cc75e187f2ef
2021-07-22 10:12:45 -07:00
Christophe Paquot b04dfd5fdf SWDEV-281658 - Preserve the callback IDs enum ordering
Use HIP_API_ID_NONE to detect unsupported API instead of
HIP_API_ID_NUMBER which can grow with a new version of the API.
This HIP_API_ID_NONE enum has a fixed value of 0 so the
HIP_API_IDs really start at FIRST.

Change-Id: I760aa50ddf6fa6d46bf20555ad7d429335a53f97
2021-06-30 09:52:20 -04:00
Rachida Kebichi e5dd1e9b85 SWDEV-284863 Fixed several issues preventing memcpy info dump in csv
Change-Id: I4c6323311ce8314296e81a9b9b5d8adb485e0aa8
2021-05-25 11:10:52 -04:00