Optimizing the trace period to use std::thread and std::chrono-based sleeps instead of sleep and usleep, handling the corner cases where execution ends before the trace period duration has elapsed, and some cosmetic cleanup
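A minimal sketch of the idea, not the tool's actual code: a worker thread waits out the trace period using a std::chrono duration and can be woken early if the application ends first. The class name TracePeriod and all of its members are hypothetical, and a condition variable stands in for a plain sleep so that the early-exit corner case can be shown.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: waits for at most `period`, or until Stop() is called.
class TracePeriod {
 public:
  explicit TracePeriod(std::chrono::milliseconds period)
      : thread_([this, period] { Run(period); }) {}

  ~TracePeriod() { Stop(); }

  // Called when the application ends before the period has elapsed.
  // Safe to call more than once.
  void Stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    cv_.notify_one();
    if (thread_.joinable()) thread_.join();
  }

  // True if the full period elapsed before any stop request was seen.
  bool expired() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return expired_;
  }

 private:
  void Run(std::chrono::milliseconds period) {
    std::unique_lock<std::mutex> lock(mutex_);
    // wait_for returns the predicate's value: false means the period
    // elapsed without a stop request.
    expired_ = !cv_.wait_for(lock, period, [this] { return stop_; });
  }

  mutable std::mutex mutex_;
  std::condition_variable cv_;
  bool stop_ = false;
  bool expired_ = false;
  std::thread thread_;  // declared last so the members above exist first
};
```

Calling Stop() from the exit path returns as soon as the worker wakes up, instead of blocking for the remainder of the period as a plain sleep would.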
Change-Id: Ia99f346bf71a3faad5dfdfc8d7a08f6c2b2cc0b9
The test (MatrixTranspose) and the tracer tool both write to stdout,
which sometimes corrupts the trace.
Change the test to emit info messages to stderr instead of stdout,
leaving stdout for the tracer tool's exclusive use.
Change-Id: I18047dbcd9039b70dd24ef6e7e8e9d89b40bedd2
Removing unused definitions and compile options
Using cmake variables to set the options needed
Changing the visibility to make it specific for the targets
Change-Id: I80cf0997cd28897d5a06a58c7225ba40dfc51e2d
Using std::thread instead of pthreads, and an atomic_bool to signal the end of the flush function so that unload_tool can wait for it
Change-Id: Iea00d7e16c65d51db2d222e8b42f03f9caeb2067
The range message stack is mirrored in case ranges are pushed or popped
while tracing is stopped (by the tracer tool?). When a stop event is
reported, the tracer tool emits RangePop events by unwinding the stack,
then when the start event is reported, it emits RangePush events again
by unwinding the stack. The issue is that the RangePush events should
be emitted in reverse order.
For example:
RangePush(M1); RangePush(M2); \
TracerStop; RangePop; RangePop; \
...; \
TracerStart; RangePush(M2); RangePush(M1); \ <- In the wrong order
RangePop; RangePop;
It could be fixed by reversing the stack in RangeStackIterate but is it
worth it? The roctx range markers are supposed to be unintrusive so that
they can be left in the application even when it isn't being traced.
Simplifying the roctx API and reducing its added latency by removing
the range message stack mirroring seems like the better choice.
TODO: A future change should make roctx events immune to tracer start
and tracer stop requests. Or simply remove roctracer_start/stop.
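For illustration only, the ordering problem described above can be sketched as follows. The type RangeStack and the functions PopOrder and PushOrder are hypothetical and are not part of roctx; the stack is assumed to be stored bottom first.

```cpp
#include <string>
#include <vector>

// Hypothetical mirror of the range stack: the last element is the top.
using RangeStack = std::vector<std::string>;

// Order in which RangePop events are emitted on TracerStop:
// unwinding the stack, top first.
std::vector<std::string> PopOrder(const RangeStack& stack) {
  return std::vector<std::string>(stack.rbegin(), stack.rend());
}

// Order in which RangePush events would have to be re-emitted on
// TracerStart: bottom first, i.e. the reverse of the unwinding order.
std::vector<std::string> PushOrder(const RangeStack& stack) {
  return stack;
}
```

With M1 pushed before M2, PopOrder yields M2, M1 while PushOrder yields M1, M2; reusing the unwinding order for the pushes produces the wrong sequence shown in the example.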
Change-Id: Ie4d76afb5ce8d263848dcf1b599af394db56ddab
System clock timestamps should only come from a single source:
util::timestamp_ns(). Externally, this function is exposed as
roctracer_get_timestamp() (used by the tracer tool).
Removed the now unused HSA Runtime Utilities which were never part
of the ROCtracer API.
Change-Id: I044b7f4da60fd8fdb771b0c877622a3143f0e815
hsa_rsrc_factory was only used to enumerate the agent types and pools.
The pools don't seem to be used by bin/mem_manager.py, so I only
ported the agent enumeration using hsa_iterate_agents.
Change-Id: Idd586aa13db303cf92962a6392771b7bf38b758f
1) The Entry's state was published after making the record available,
so a thread flushing the records could see an uninitialized record.
2) data_ and write_pointer_ could become out of sync. write_pointer_
could be indexing into another buffer than what data_ was pointing
to.
3) GetEntry could get a nullptr free_buffer_ because multiple threads
could acquire the work_mutex_ before the work_thread_ could wake up,
or between allocate_worker's loop iterations.
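As a general illustration of the ordering issue in item 1, and not the trace buffer's actual code, a record can be written in full before a flag makes it visible to readers. The struct Entry below and the functions Write and TryRead are hypothetical names chosen for this sketch.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical entry: `valid` is the published state, `record` the payload.
struct Entry {
  std::atomic<bool> valid{false};
  uint64_t record = 0;
};

// Writer: fill in the payload first, then publish it with a release store.
void Write(Entry& entry, uint64_t value) {
  entry.record = value;
  entry.valid.store(true, std::memory_order_release);
}

// Reader (e.g. a flushing thread): only touch the payload after an
// acquire load has observed the published state.
bool TryRead(const Entry& entry, uint64_t& out) {
  if (!entry.valid.load(std::memory_order_acquire)) return false;
  out = entry.record;
  return true;
}
```

The release/acquire pair guarantees that a reader which sees valid == true also sees the completed write to record.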
Change-Id: I6f0a015557888eeeaa75a8bce7fde8de276d11dd
A trace buffer is used to efficiently store synchronous event records
so that they can be processed later, possibly in a different thread,
when the buffer is flushed. This helps reduce the latency added by
tracing API calls.
The API does not need to use trace buffers as synchronous events are
directly reported to the client with callbacks, and asynchronous events
(activities) are saved in memory pools.
The implementation of HSA asynchronous memory copy activities was using
a trace buffer shared with the tracer tool to write the records to a
file (async_copy_trace.txt), instead of using a memory pool and
reporting the activity to the client.
Removed the asynchronous memory copies trace buffer, and updated
hsa_async_copy_handler to use the pool specified when the activity
was enabled.
Updated the tracer tool to read HSA_OP_ID_COPY records out of the
default memory pool and write them to async_copy_trace.txt.
Move trace_buffer.h to test/tool as tracer_tool.cpp is now the only
file using it.
Change-Id: Ida95aba2eaf3c3f2a979ed6c2b060374017b7424
This test stresses the concurrent writing of trace buffer records while
frequently allocating new storage to hold the records.
Due to race conditions, this test fails with the current trace buffer
implementation.
Change-Id: I0b77c64005e776319bf21f1ee1e6d7c99ddccfff
Removing DEBUG_TRACES and the unnecessary use of roctracer_op_string makes the MS app report a stable 78 to 81 samples per second, depending on the type of trace, while the main app without rocprof reports 100 to 106. More detailed numbers will be posted in the ticket.
Change-Id: Ifbc529278cea54dd23e6086aa9b9ea2df952d5dd
Removing DEBUG_TRACES and the unnecessary use of roctracer_op_string makes the MS app report a stable 78 to 81 samples per second, depending on the type of trace, while the main app without rocprof reports 100 to 106. More detailed numbers will be posted in the ticket.
Change-Id: Ida25d3bfc72047afaa27326d697be76d97564334
Replacing the git clone of hsa-class with a locally downloaded version pushed to the roctracer repo
Change-Id: Id45a38b2d355102c2e0dee1e4bfde50398369047
Package installed in /opt/rocm
Soft links and wrapper header files installed in /opt/rocm/roctracer for backward compatibility
tracer_tool library renamed to roctracer_tool and installed in /opt/rocm/lib/roctracer
Change-Id: Ica7518c5ef2e591715121cbc942b69dff29233d3
The manually written Makefiles in the test directory are not safe to
use by more than one job. For example we see things like
all: clean $(EXECUTABLE)
which says that the 'all' target depends on the 'clean' and
'$(EXECUTABLE)' targets. If make is invoked with -j2 then the 'clean'
and '$(EXECUTABLE)' targets can be built in parallel, so 'clean' can
delete things whilst they are being built!
Change-Id: I9c56db4c629081b8d812dad45dfd4afde10e481f
Signed-off-by: Icarus Sparry <icarus.sparry@amd.com>
on_exit() was used to register the exit_handler function, but for OpenMP applications exit_handler was called after the dlclose of the library. Removed exit_handler from roctracer, as the exit case is already handled in both rocprofiler and the rocprof script.
Change-Id: I7c3d42e6ccc282e713b48b4a7faec4935e7a2600
Use HIP_API_ID_NONE to detect unsupported API instead of
HIP_API_ID_NUMBER which can grow with a new version of the API.
This HIP_API_ID_NONE enum has a fixed value of 0 so the
HIP_API_IDs really start at FIRST.
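A small sketch of why a fixed sentinel is the more stable check. The enum and functions below are hypothetical stand-ins, not the real HIP definitions.

```cpp
#include <string>

// Hypothetical API ID enum modeled on the description above.
enum SketchApiId {
  SKETCH_API_ID_NONE = 0,  // fixed sentinel: unsupported or unknown API
  SKETCH_API_ID_FIRST = 1,
  SKETCH_API_ID_MALLOC = SKETCH_API_ID_FIRST,
  SKETCH_API_ID_FREE,
  SKETCH_API_ID_NUMBER  // changes whenever a new API is appended
};

// Hypothetical lookup: returns the sentinel for names it does not know.
SketchApiId LookupApiId(const std::string& name) {
  if (name == "malloc") return SKETCH_API_ID_MALLOC;
  if (name == "free") return SKETCH_API_ID_FREE;
  return SKETCH_API_ID_NONE;
}

// Comparing against the sentinel stays valid when the enum grows,
// unlike a comparison against SKETCH_API_ID_NUMBER.
bool IsSupported(SketchApiId id) { return id != SKETCH_API_ID_NONE; }
```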
Change-Id: I760aa50ddf6fa6d46bf20555ad7d429335a53f97
CMake performs the post-processing required for RPATH, or any other
processing the libraries need, only if they are installed as libraries
rather than as regular files
FIX: SWDEV-287893
Change-Id: I9cf478fcd23b9f2e8b3bdd81aa566cad3ec2a5e3
As this snippet shows, HCC is no longer supported by roctracer:
#if HIP_VDI
...
#else
#error HCC support dropped
#endif
Removed HIP_VDI from the CMakeLists.txt, and the source code.
Change-Id: Ib273da2a5af6d67fa1b021a7eca3ff785c8b9c73
Add numa lib as this will be required with a static thunk
Look for static thunk if shared thunk cannot be found
Signed-off-by: Kent Russell <kent.russell@amd.com>
Change-Id: I5de63e0a56a8946132ccbb7140a19a82a70b951d