Baraldi, Giovanni 4ca156e572 Thread trace and Trace Decoder API tests and samples (#416)
* Adding test and samples to decoder

* Fix sample

* Formatting

* Fix multi test

* Disable sample

* Fix tests

* Format

* Version fix

* Locking the decoder

* Add atomic

* Review comments

* Format

* Adding readme

* merge conflict and adding PCS+ATT test

* Review comments

* Properly disable PCS test

* Update tests/rocprofv3/advanced-thread-trace/CMakeLists.txt

* Adding back env var test

* Name fix

* Preload sample

* Addressing review comments

* Update docs

---------

Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>

[ROCm/rocprofiler-sdk commit: e898079a13]
2025-07-22 20:08:12 -05:00


# Thread Trace and ROCprof Trace Decoder
## Services
- Thread trace in device profiling mode
- ROCprof Trace Decoder decodes the received thread trace data
- Thread trace start/stop using roctx
## Properties
### [agent.cpp](agent.cpp):
- Configures thread trace on every GPU agent found, using `rocprofiler_configure_device_thread_trace_service`
- Waits until `roctxProfilerResume` is called to start thread trace
- Stops tracing at `roctxProfilerPause`
- Receives the trace data in `shader_data_callback` and calls `rocprofiler_trace_decode` to decode the data
- `rocprofiler_trace_decode` calls `parse`, a lambda
- `parse` receives the decoded data and, for each PC address, increments the hit count and accumulates latency
- At application end, `tool_fini` calls `gen_output_stream` to write the top hotspots into `thread_trace.log`
### [main.cpp](main.cpp):
- Defines several kernels and runs them in a loop
- The first loop iteration warms up the kernels
- The second iteration calls `roctxProfilerResume` to start thread trace
- After the loop ends, `roctxProfilerPause` is called to stop tracing
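The warm-up-then-trace control flow in `main.cpp` can be modeled without a GPU. In the following sketch, kernels are plain callables and the real `roctxProfilerResume`/`roctxProfilerPause` calls are replaced by a hypothetical `TraceGate` whose `resume()`/`pause()` just toggle a flag, standing in for the tool starting and stopping thread trace. `TraceGate`, `run_workload` and `traced_launches` are illustrative names, not part of the sample or of roctx.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for the roctx resume/pause pair: it only records
// whether tracing would be active, as the tool in agent.cpp would see it.
struct TraceGate {
    bool active = false;
    int traced_launches = 0;  // launches that happened while tracing was on
    void resume() { active = true; }   // stands in for roctxProfilerResume
    void pause() { active = false; }   // stands in for roctxProfilerPause
};

// Runs every kernel twice: iteration 0 warms up, iteration 1 is traced.
void run_workload(const std::vector<std::function<void()>>& kernels, TraceGate& gate) {
    for (int iter = 0; iter < 2; ++iter) {
        if (iter == 1) gate.resume();  // start tracing only after the warm-up pass
        for (const auto& launch : kernels) {
            launch();
            if (gate.active) ++gate.traced_launches;
        }
    }
    gate.pause();  // stop tracing once the loop has finished
}
```

With three kernels, each one runs twice, but only the three launches in the second iteration fall inside the traced window. Skipping the first iteration keeps one-time costs such as code loading and cache warm-up out of the collected trace.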