Commit graph

586 commits

Author SHA1 Message Date
Ammar ELWazir 1db8cdf99a Adding backward compatibility
Adding roctracer_hcc.h for backward compatibility with components that still use it, such as TensorFlow

Change-Id: Idfcdda9207277866e629e7bb9bfc0da835481217
2022-05-13 09:35:28 -05:00
Ammar ELWazir 1f9efecd4a Trace Period Optimization
Optimizing the trace period to use std::thread and std::chrono-based sleeps instead of sleep and usleep, handling corner cases of ending before the trace period duration, and some cosmetic cleanup

Change-Id: Ia99f346bf71a3faad5dfdfc8d7a08f6c2b2cc0b9
2022-05-13 00:11:02 -05:00
Laurent Morichetti e1fa2cb5d5 run.sh: In case of error, also print the stderr log
Change-Id: I9a20bf2d755749b036788d7e2fce044a7f36eb2e
2022-05-12 20:16:09 -04:00
Laurent Morichetti 37ab921f02 Cleanup roctracer.cpp
Minor cosmetic changes.

Change-Id: Ie5a904c757aa933d83ca6e496726e47fe7032620
2022-05-12 20:15:54 -04:00
Laurent Morichetti bbe1db3810 Fix an intermittent failure in "tool flushing test"
The test (MatrixTranspose) and the tracer tool both write to stdout
which sometimes causes a trace corruption.

Change the test to emit info messages to stderr instead of stdout,
leaving stdout for the tracer tool's exclusive use.

Change-Id: I18047dbcd9039b70dd24ef6e7e8e9d89b40bedd2
2022-05-12 20:15:37 -04:00
Ammar ELWazir 24f8a50b20 Removing missed backward compatibility files
Change-Id: I4fdc69d508063e4ee3abdfa2d65ad5d3d64e68ca
2022-05-12 10:03:17 -05:00
Ammar ELWazir 2f5313a0c7 Fixing cmake_modules
Removing unused definitions and compile options
Using cmake variables to set the options needed
Changing the visibility to make it specific for the targets

Change-Id: I80cf0997cd28897d5a06a58c7225ba40dfc51e2d
2022-05-11 19:25:43 -04:00
Ammar Elwazir 3882091c71 Merge "Flush function fix" into amd-staging 2022-05-11 19:13:43 -04:00
Ammar ELWazir 80464525c7 Flush function fix
Using std::thread instead of pthreads, and an atomic_bool to signal the end of the flush function so that unload_tool can wait for it

Change-Id: Iea00d7e16c65d51db2d222e8b42f03f9caeb2067
2022-05-11 17:20:39 -04:00
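
The "Flush function fix" entry above describes a flush loop that runs on a std::thread and uses an atomic flag to report that it has ended. A minimal sketch of that pattern follows. All names here (PeriodicFlusher, Stop, finished) are hypothetical and are not taken from the tracer tool, and the "flush" is just a counter.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical sketch, not the tracer tool's code.
class PeriodicFlusher {
 public:
  explicit PeriodicFlusher(std::chrono::milliseconds period)
      : worker_([this, period] { Run(period); }) {}

  ~PeriodicFlusher() { Stop(); }

  // Ask the flush loop to end, then wait until it has actually ended.
  void Stop() {
    stop_requested_.store(true);
    if (worker_.joinable()) worker_.join();
  }

  bool finished() const { return finished_.load(); }
  int flush_count() const { return flush_count_.load(); }

 private:
  void Run(std::chrono::milliseconds period) {
    while (!stop_requested_.load()) {
      flush_count_.fetch_add(1);  // stand-in for the real flush work
      std::this_thread::sleep_for(period);
    }
    finished_.store(true);  // lets the unloading code see that the loop ended
  }

  std::atomic<bool> stop_requested_{false};
  std::atomic<bool> finished_{false};
  std::atomic<int> flush_count_{0};
  std::thread worker_;  // declared last so the flags exist before it starts
};
```

Declaring the thread member last matters in this sketch: members are initialized in declaration order, so the flags are ready before the worker starts reading them.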
Ammar ELWazir 6b16d37d65 Removing backward compatibility
Removing the backward compatibility file and making sure to use the right paths

Change-Id: I518d52c82e0c5878bd334713e7b1758bba79762d
2022-05-11 14:43:35 -04:00
Ammar ELWazir ed0e1f5cb8 Changing Installation docs
Using build.sh rather than cmake in the README

Change-Id: If3b80641497c0c967ec3340cb9ef546bf44824c3
2022-05-11 01:31:52 -04:00
Ammar ELWazir 7060b76927 Setting fPIC the idiomatic CMake way instead of through CMAKE_CXX_FLAGS
Change-Id: I898de3d05feffee2d7d37cf62ac33afe2ecde85a
2022-05-10 22:38:13 -05:00
Laurent Morichetti a98476fe11 Fix the roctracer tests
14/15 tests pass, 1/15 intermittent failure (tool flushing test).

Change-Id: I36ed2900a1c51e584718993badeaefd48ad450a2
2022-05-10 14:58:08 -07:00
Laurent Morichetti 3f402eb6e9 Disallow copying or moving trace buffers
Change-Id: I104b8240a76c6d96ae176b0b26bdc2e4e5e3c180
2022-05-10 12:08:06 -07:00
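
For readers unfamiliar with the idiom behind "Disallow copying or moving trace buffers", the usual C++ approach is to delete the copy and move operations. The sketch below uses a hypothetical Buffer class, not the roctracer trace buffer, and checks the result at compile time.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a trace buffer.
class Buffer {
 public:
  explicit Buffer(std::size_t capacity) : storage_(capacity) {}

  // Deleting the copy operations already suppresses the implicit moves;
  // all four are spelled out so the intent is explicit.
  Buffer(const Buffer&) = delete;
  Buffer& operator=(const Buffer&) = delete;
  Buffer(Buffer&&) = delete;
  Buffer& operator=(Buffer&&) = delete;

  std::size_t capacity() const { return storage_.size(); }

 private:
  std::vector<char> storage_;
};

// Compile-time checks that the type can be neither copied nor moved.
static_assert(!std::is_copy_constructible_v<Buffer>);
static_assert(!std::is_move_constructible_v<Buffer>);
static_assert(!std::is_copy_assignable_v<Buffer>);
static_assert(!std::is_move_assignable_v<Buffer>);
```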
Laurent Morichetti 67481bd295 Fix memory leaks in roctracer
Each thread has a thread-local record_pair_stack. The stack is
dynamically allocated on first use, but is not destroyed when the
thread exits.

Replaced record_pair_stack pointers with record_pair_stack instances,
the instances are constructed on first odr-use, and destructed when the
thread exits.

Also, converted the cb_journal and act_journal to instances.

Change-Id: I186ac29da477f194880a1ab599f4be5715a23063
2022-05-10 12:08:06 -07:00
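
The "Fix memory leaks in roctracer" entry above relies on thread-local instances being destroyed at thread exit, which a heap-allocated object behind a thread-local pointer is not. The sketch below illustrates that lifetime rule with a hypothetical RecordStack type that counts its live instances. It is not the roctracer record_pair_stack.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical record stack; the counter makes destruction observable.
struct RecordStack {
  static inline std::atomic<int> live{0};
  std::vector<int> records;
  RecordStack() { ++live; }
  ~RecordStack() { --live; }
};

// One instance per thread: constructed before its first use in that thread
// and destroyed automatically when that thread exits.
void PushRecord(int value) {
  thread_local RecordStack stack;
  stack.records.push_back(value);
}

// Runs a worker that uses its thread-local stack, then reports how many
// stacks are still alive after the worker has exited.
int LiveStacksAfterWorkerExit() {
  std::thread worker([] {
    PushRecord(1);
    PushRecord(2);
  });
  worker.join();
  return RecordStack::live.load();
}
```

With a `thread_local RecordStack*` allocated by `new` on first use, the counter would stay at 1 after the worker exits unless some code explicitly deleted the object.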
Laurent Morichetti a794247c55 Optimize roctx markers
Improve the roctx markers performance when the tracer is not engaged
(the application is not running with rocprof).

The performance of roctx push/pop, measured with:

-----------------------------------------------------------------------
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < 10000000; ++i) {
    roctxRangePush ("A");
    roctxRangePop ();
  }
  auto end = std::chrono::steady_clock::now();
  std::cout << "ns = " << std::chrono::nanoseconds(end - start).count()
      / 10000000 << std::endl;
-----------------------------------------------------------------------

w/o rocprof | with rocprof | commit
       92ns |       770ns  | 0d6e132: Cleanup CallbackTable::Get
       28ns |       712ns  | 6421bd5: Cleanup ROCTX's implementation
       20ns |       664ns  | 7f0e5e5: Remove the roctx range message...
        6ns |       665ns  | this commit

Change-Id: Id679dcbd0fb190a3179be98a9b2c1db151efee3d
2022-05-10 12:08:06 -07:00
Laurent Morichetti 3d0198c395 Remove the roctx range message stack
The range message stack is mirrored in case ranges are pushed or popped
while tracing is stopped (by the tracer tool?). When a stop event is
reported, the tracer tool emits RangePop events by unwinding the stack,
then when the start event is reported, it emits RangePush events again
by unwinding the stack. The issue is that the RangePush events should
be emitted in reverse order.

For example:

RangePush(M1); RangePush(M2); \
  TracerStop; RangePop; RangePop; \
...; \
  TracerStart; RangePush(M2); RangePush(M1); \ <- In the wrong order
RangePop; RangePop;

It could be fixed by reversing the stack in RangeStackIterate but is it
worth it? The roctx range markers are supposed to be unintrusive so that
they can be left in the application even when it isn't being traced.

Simplifying the roctx API and reducing its added latency by removing
the range message stack mirroring seems like the better choice.

TODO: A future change should make roctx events immune to tracer start
and tracer stop requests. Or simply remove roctracer_start/stop.

Change-Id: Ie4d76afb5ce8d263848dcf1b599af394db56ddab
2022-05-10 12:08:06 -07:00
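
The ordering problem described in "Remove the roctx range message stack" can be reproduced with a plain vector used as a stack. In this sketch the helper names are hypothetical and the back of the vector is the top of the stack: replaying from the top yields M2 before M1, while replaying from the bottom restores the original push order.

```cpp
#include <string>
#include <vector>

// Hypothetical helpers; stack.back() is the most recently pushed range.

// Unwinding from the top is the right order for emitting RangePop events,
// but the wrong order for re-emitting RangePush events.
std::vector<std::string> ReplayTopDown(const std::vector<std::string>& stack) {
  return std::vector<std::string>(stack.rbegin(), stack.rend());
}

// Walking from the bottom re-emits the pushes in their original order.
std::vector<std::string> ReplayBottomUp(const std::vector<std::string>& stack) {
  return stack;
}
```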
Laurent Morichetti 713db1fce5 Cleanup ROCTX's implementation
Remove thread_data_init. The C++ standard guarantees that the thread
local variable is initialized before its first odr-use and destructed
when the thread exits. Use a global initializer to set the reference
from the message stack instance in the map.

Remove roctracer_error_string. This does not belong to this library.
ROCTX does not expose errors to the application. The only functions
returning errors are returning -1 (Push/Pop).

Remove memory leaks due to strdup on the ranges messages. The memory
for the messages is guaranteed to be valid for the duration of the
callback, and it is the application's responsibility to strdup the
strings if it needs to extend the message's lifetime.

Add a lock to the RegisterApiCallback implementation. Iterating the
message stack map must be synchronized as a new thread could be adding
a new value to the map.

Change-Id: Iaf5b07ebc9efe4061cb01327d4c7034888727816
2022-05-10 12:08:06 -07:00
Laurent Morichetti 6e4055503c Merge "Cleanup CallbackTable::Get" into amd-staging 2022-05-10 14:55:20 -04:00
Laurent Morichetti e8909158b3 Merge "Remove unused open_output_file/close_output_file" into amd-staging 2022-05-10 14:55:10 -04:00
Laurent Morichetti 9cecf30131 Merge "Fix a hang in './test/hsa/ctrl ctrl_hsa_input_trace'" into amd-staging 2022-05-10 14:54:11 -04:00
Laurent Morichetti fe0adfd37b Merge "Remove now unused hsa_rsrc_factory" into amd-staging 2022-05-10 14:54:01 -04:00
Laurent Morichetti 7c4f7625b1 Merge "Consolidate all sources of timestamps" into amd-staging 2022-05-10 14:53:36 -04:00
Laurent Morichetti 4aeb76f7a8 Cleanup CallbackTable::Get
Make CallbackTable::Get return the callback_function/user_arg pair
as an actual return value instead of returning it through arguments
pointers.

Change-Id: Ia2dfcdad8c237a09620518ad67af94add47220da
2022-05-10 08:13:18 -07:00
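
As an illustration of the interface change in "Cleanup CallbackTable::Get", the sketch below returns the callback and its user argument as a std::pair instead of writing them through out-pointers. The Callback signature, the table size and the locking are assumptions made for the example. The real roctracer class may differ.

```cpp
#include <array>
#include <cstdint>
#include <mutex>
#include <utility>

// Hypothetical callback signature.
using Callback = void (*)(std::uint32_t op, const void* data, void* user_arg);

class CallbackTable {
 public:
  void Set(std::uint32_t op, Callback callback, void* user_arg) {
    std::lock_guard<std::mutex> lock(mutex_);
    entries_.at(op) = {callback, user_arg};
  }

  // Returns the pair by value; the caller no longer passes out-pointers.
  std::pair<Callback, void*> Get(std::uint32_t op) const {
    std::lock_guard<std::mutex> lock(mutex_);
    return entries_.at(op);
  }

 private:
  mutable std::mutex mutex_;
  std::array<std::pair<Callback, void*>, 8> entries_{};  // all null at start
};

// Example callback: treats user_arg as a call counter.
void CountCalls(std::uint32_t, const void*, void* user_arg) {
  ++*static_cast<int*>(user_arg);
}

int DispatchOnce() {
  int calls = 0;
  CallbackTable table;
  table.Set(3, &CountCalls, &calls);
  // Structured bindings (C++17) unpack the returned pair at the call site.
  auto [callback, user_arg] = table.Get(3);
  if (callback != nullptr) callback(3, nullptr, user_arg);
  return calls;
}
```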
Laurent Morichetti cb040b7def Remove unused open_output_file/close_output_file
Change-Id: I0e5118b814617cb605949c99e5f0dc235f6edac0
2022-05-10 08:13:18 -07:00
Laurent Morichetti 11887f596a Fix a hang in './test/hsa/ctrl ctrl_hsa_input_trace'
At the end of the test, the tracer tool is unloaded and the active
memory pools are flushed. In the flush callback, to get the activity
operation string, the RocpLoader instance is needed, and if the
RocpLoader is not already loaded, it attempts to dlopen the rocprofiler
library.

Calling dlopen from a global destructor hangs because the dynamic
loader lock is already owned (e.g. by dlclose).

To temporarily work around the issue, instantiate the RocpLoader when
the activities needing it are enabled.

Change-Id: I712c66d88c43694fe53a95d6a61d7b22abb75262
2022-05-10 08:13:18 -07:00
Laurent Morichetti 4ced94b9a2 Remove now unused hsa_rsrc_factory
Change-Id: I66175eb9fae2e7e61400af77a0c89be9c39e770e
2022-05-10 08:13:18 -07:00
Laurent Morichetti f8462b8637 Consolidate all sources of timestamps
System clock timestamps should only come from a single source:
util::timestamp_ns(). Externally, this function is exposed as
roctracer_get_timestamp() (used by the tracer tool).

Removed the now unused HSA Runtime Utilities which were never part
of the ROCtracer API.

Change-Id: I044b7f4da60fd8fdb771b0c877622a3143f0e815
2022-05-10 08:13:09 -07:00
Ammar ELWazir 502ea835b9 Solving issue with using clang as the compiler
Change-Id: I4fa7b24af7008a30b0300b57ccbf1bc82dbfd66e
2022-05-09 17:41:33 -05:00
Laurent Morichetti f46d1717cc Remove unused ROCTX_CLOCK_TIME
Change-Id: I9696bb2892fe6fe21089462d624643b7a782fb71
2022-05-04 19:30:37 -04:00
Laurent Morichetti 6d6017249a Remove the tracer tool's dependency on hsa_rsrc_factory
hsa_rsrc_factory was only used to enumerate the agents types and pools.
The pools don't seem to be used by bin/mem_manager.py, so I only
ported the agent enumeration using hsa_iterate_agents.

Change-Id: Idd586aa13db303cf92962a6392771b7bf38b758f
2022-05-04 19:28:53 -04:00
Ammar ELWazir 78869032ad SWDEV-335490: Unused variables
Compilers don't treat assert as a use of a variable, because the assertion expands to nothing when NDEBUG is defined. I added [[maybe_unused]] to the variables that are used only in asserts so that the compiler skips them in the unused-variable check. Note: [[maybe_unused]] was introduced in C++17

Change-Id: I96bb53cb2ab55ee7120681c2d279271c0075095d
2022-05-04 11:24:28 -04:00
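
A small example of the situation described in SWDEV-335490, with a hypothetical function: the flagged variable is read only inside assert, so a build with NDEBUG defined would otherwise report it as unused.

```cpp
#include <cassert>
#include <vector>

// Hypothetical function used only to illustrate the attribute.
int SumChecked(const std::vector<int>& values) {
  int sum = 0;
  for (int v : values) sum += v;

  // Read only by assert(). With NDEBUG defined the assert expands to nothing,
  // so without the attribute the compiler may warn about an unused variable.
  [[maybe_unused]] const bool non_negative = sum >= 0;
  assert(non_negative);

  return sum;
}
```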
Ammar ELWazir 5e012541c5 Removing HIP_API_PROF_STRING from the tracer_tool
The else part was not used as it was only using the hipApiString to format the data to string

Change-Id: I376721c478cffba0890436ca8895dfe2a7641570
2022-05-04 09:46:56 -04:00
Laurent Morichetti 046df32729 Fix race conditions in TraceBuffer
1) The Entry's state was published after making the record available,
   so a thread flushing the records could see an uninitialized record.
2) data_ and write_pointer_ could become out of sync. write_pointer_
   could be indexing into another buffer than what data_ was pointing
   to.
3) GetEntry could get a nullptr free_buffer_ because multiple threads
   could acquire the work_mutex_ before the work_thread_ could wake up,
   or between allocate_worker's loop iterations.

Change-Id: I6f0a015557888eeeaa75a8bce7fde8de276d11dd
2022-05-03 21:56:46 -04:00
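
Item 1 in "Fix race conditions in TraceBuffer" concerns the order in which a record and its state become visible to the flushing thread. The sketch below shows one common way to get that ordering right: write the record first, then publish it with a release store that the reader pairs with an acquire load. The Entry layout and function names are hypothetical and much simpler than the actual TraceBuffer.

```cpp
#include <atomic>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical entry: a payload plus a flag saying the payload is complete.
struct Entry {
  std::uint64_t record = 0;
  std::atomic<bool> valid{false};
};

void Produce(Entry& entry, std::uint64_t value) {
  entry.record = value;                                // 1) fill in the record
  entry.valid.store(true, std::memory_order_release);  // 2) then publish it
}

// Copies the record out only if it has been published.
bool TryConsume(const Entry& entry, std::uint64_t& out) {
  if (!entry.valid.load(std::memory_order_acquire)) return false;
  out = entry.record;  // safe: the acquire load saw the release store
  return true;
}

// Writes a value on one thread and reads it back on another.
std::uint64_t RoundTrip(std::uint64_t value) {
  Entry entry;
  std::thread producer(Produce, std::ref(entry), value);
  std::uint64_t seen = 0;
  while (!TryConsume(entry, seen)) std::this_thread::yield();
  producer.join();
  return seen;
}
```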
Laurent Morichetti 61f35b0204 Move trace_buffer.h to the tool directory
A trace buffer is used to efficiently store synchronous event records
so that they can be processed later, possibly in a different thread,
when the buffer is flushed. This helps reduce the latency added by
tracing API calls.

The API does not need to use trace buffers as synchronous events are
directly reported to the client with callbacks, and asynchronous events
(activities) are saved in memory pools.

The implementation of HSA asynchronous memory copy activities was using
a trace buffer shared with the tracer tool to write the records to a
file (async_copy_trace.txt), instead of using a memory pool and
reporting the activity to the client.

Removed the asynchronous memory copies trace buffer, and updated
hsa_async_copy_handler to use the pool specified when the activity
was enabled.

Updated the tracer tool to read HSA_OP_ID_COPY records out of the
default memory pool and write them to async_copy_trace.txt.

Move trace_buffer.h to test/tool as tracer_tool.cpp is now the only
file using it.

Change-Id: Ida95aba2eaf3c3f2a979ed6c2b060374017b7424
2022-05-03 21:56:28 -04:00
Tony Tye 48f4c82685 Merge "Add doxygen to roctracer.h" into amd-staging 2022-05-03 20:00:10 -04:00
Tony Tye 1f630a9291 Add doxygen to roctracer.h
Change-Id: Ie542399e990e02482ed740d99c6afe4b95b1f6f4
2022-04-30 00:33:05 +00:00
Laurent Morichetti 200e27f12d Add a trace_buffer directed test
This test stresses the concurrent writing of trace buffer records while
frequently allocating new storage to hold the records.

Due to race conditions, this test fails with the current trace buffer
implementation.

Change-Id: I0b77c64005e776319bf21f1ee1e6d7c99ddccfff
2022-04-29 08:52:13 -07:00
Laurent Morichetti 5963363484 Fix assertions
Replace EXC_ABORT() checks with assertions.

Rewrite the exception class to use std::runtime_error (as it
already handles the std::string/char* message argument).

Change-Id: I48e31924f3aea1328e6562ab6bb06ec373fd5d5e
2022-04-27 11:24:26 -07:00
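
The "Fix assertions" entry above mentions basing the exception class on std::runtime_error, which already stores a message given as either std::string or char*. A minimal sketch of such a class follows. The ApiError name and the status field are assumptions for illustration, not the roctracer definition.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical exception type; std::runtime_error owns the message storage.
class ApiError : public std::runtime_error {
 public:
  ApiError(int status, const std::string& message)
      : std::runtime_error(message), status_(status) {}
  ApiError(int status, const char* message)
      : std::runtime_error(message), status_(status) {}

  int status() const noexcept { return status_; }

 private:
  int status_;
};

// Throws and catches through the std::exception base to show that what()
// returns the message stored by std::runtime_error.
std::string Describe(int status) {
  try {
    throw ApiError(status, "invalid argument");
  } catch (const std::exception& e) {
    return e.what();
  }
}
```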
Laurent Morichetti 0d7d56eea5 Fix a SEGV when running --roctx-trace
There's a typo in RegisterApiCallback, roctx::cb_table.Get should be
roctx::cb_table.Set.

Change-Id: I47ec8ac666f783ff4e03f35d13e375e645899900
2022-04-27 12:14:32 -04:00
Ranjith Ramakrishnan 7f05496a87 Merge "Populate roctracer.h wrapper file with original file contents as dead code" into amd-staging 2022-04-27 02:20:26 -04:00
Laurent Morichetti 18f60efe05 Fix typos/spelling errors
Change-Id: Idec1cb8fab91c30f99563bc7dd4db1faeb2db954
2022-04-26 12:39:38 -07:00
Laurent Morichetti 6b06322578 Remove unused proxy utilities
The proxy queue implements packet interception to enable timestamps
collection. As it is, the roctracer is not intercepting packets, and
instead relies on the rocprofiler tool to collect the timestamps for
kernel dispatches.

This is an issue as the roctracer API does not implement HSA_OPS
activities for kernel dispatches. This will be addressed in a future
commit.

Change-Id: Ib6a778a513410bec4579f223a9d9e9fd9b6054df
2022-04-26 15:26:26 -04:00
Laurent Morichetti b352eedac6 Fix the static library build
Building with -DLIBRARY_TYPE=STATIC fails with 3 undefined symbols.
Add weak symbols to satisfy the linker (mirror what is done for the
other Loader symbols).

Change-Id: I8a2878def21d5f500b0764ceacb4e5255e1111c5
2022-04-26 15:26:10 -04:00
Ranjith Ramakrishnan 8ca752ce2c Populate roctracer.h wrapper file with original file contents as dead code
Backward compatibility for components that search for contents in roctracer.h
Improvements: Removed redundant code for setting and unsetting variables
Added header template file in source code instead of generating it at build time

Change-Id: I96aeb7f2a6d53d45eb5aeb5300024cd22dad1324
2022-04-26 03:09:35 -07:00
Ammar ELWazir e4569c41fe SWDEV-295522: Fixing Performance Issue
Removing DEBUG_TRACES and the unnecessary use of roctracer_op_string made the MS app report a stable 78 to 81 samples per second, depending on the type of trace, while the main app without rocprof reports 100 to 106. More detailed numbers will be posted in the ticket.

Change-Id: Ifbc529278cea54dd23e6086aa9b9ea2df952d5dd
2022-04-22 18:51:49 -04:00
Laurent Morichetti dc8717a6b5 Allow MemoryPool::Write while Flushing
Before this change, when a producer was blocked by a flush operation,
no other producer could write to the memory pool.  This change allows
other producer threads to continue to write by releasing the producer
lock before waiting on the consumer condition variable.

Change-Id: Idc1c07173d2edb18fbe1a61961f10c02e7ca8c20
2022-04-22 11:22:23 -07:00
Laurent Morichetti 121a84b449 Remove HCC_EXC_RAISING and HIP_EXC_RAISING
HCC_EXC_RAISING and HIP_EXC_RAISING don't add much value, so to
simplify, only keep EXC_RAISING and EXC_ABORT.

Change-Id: Ifdc54981bb682fe68b418cdc95ecebe668e3dcf6
2022-04-22 11:22:23 -07:00
Laurent Morichetti 85552ea3a0 Move the HccLoader activities into the HipLoader
The HCC runtime is no longer used, so move all the remaining
activities in the HipApi loader and remove the HccLoader.

Change-Id: I845c04ca275a474526840315bae0ad1a4ce02257
2022-04-22 11:22:07 -07:00
Laurent Morichetti abf1b90017 Use ACTIVITY_DOMAIN_HIP_OPS instead of ACTIVITY_DOMAIN_HCC_OPS
Change-Id: I43fbac3d02011f74bf7b597519148ed0bd68ff98
2022-04-20 22:00:59 -07:00