Baraldi, Giovanni 4ca156e572 Thread trace and Trace Decoder API tests and samples (#416)
Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>

[ROCm/rocprofiler-sdk commit: e898079a13]
2025-07-22 20:08:12 -05:00


Thread Trace and ROCprof Trace Decoder

Services

  • Thread trace in device profiling mode
  • ROCprof Trace Decoder decodes the received thread trace data
  • Thread trace start/stop using roctx

Properties

agent.cpp:

  • Configures thread trace on every GPU agent found, using rocprofiler_configure_device_thread_trace_service
  • Waits until roctxProfilerResume is called to start thread trace
  • Stops tracing when roctxProfilerPause is called
  • Receives the trace data in shader_data_callback and calls rocprofiler_trace_decode to decode the data
  • rocprofiler_trace_decode calls parse, a lambda, with the decoded data
  • parse receives the decoded data and accumulates hit counts and latencies per PC address
  • At application exit, tool_fini calls gen_output_stream to write the top hotspots to thread_trace.log
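The per-PC aggregation done by parse and the hotspot ranking done by gen_output_stream can be sketched in plain C++. This is a minimal, hypothetical sketch: DecodedInst, PcStats, and HotspotTable are illustrative names, not rocprofiler-sdk types, and ranking hotspots by accumulated latency is an assumption about what "top hotspots" means in the sample.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one decoded instruction record.
// The real decoder output types in rocprofiler-sdk differ.
struct DecodedInst
{
    uint64_t pc;       // instruction address
    uint64_t latency;  // cycles attributed to this instruction
};

// Running totals kept for each PC address.
struct PcStats
{
    uint64_t hitcount = 0;
    uint64_t latency  = 0;
};

// Accumulates hit counts and latencies keyed by PC address.
class HotspotTable
{
public:
    // Called once per decoded record (the role parse plays in the sample).
    void add(const DecodedInst& inst)
    {
        PcStats& s = table_[inst.pc];
        s.hitcount += 1;
        s.latency += inst.latency;
    }

    // Returns up to n (pc, stats) pairs, highest total latency first.
    // Ties are broken by ascending PC so the order is deterministic.
    std::vector<std::pair<uint64_t, PcStats>> top(std::size_t n) const
    {
        std::vector<std::pair<uint64_t, PcStats>> v(table_.begin(), table_.end());
        std::sort(v.begin(), v.end(), [](const auto& a, const auto& b) {
            if(a.second.latency != b.second.latency) return a.second.latency > b.second.latency;
            return a.first < b.first;
        });
        if(v.size() > n) v.resize(n);
        return v;
    }

private:
    std::unordered_map<uint64_t, PcStats> table_;
};
```

In the sample, the equivalent of add() would run inside the parse lambda for each decoded record, and the equivalent of top() would run in tool_fini before the results are written to thread_trace.log.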

main.cpp:

  • Defines several kernels and launches them in a loop
  • The first loop iteration warms up the kernels
  • The second iteration calls roctxProfilerResume to start thread trace
  • After the loop ends, roctxProfilerPause is called to stop tracing
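The warm-up and trace-window ordering described for main.cpp can be sketched as a small driver. This is a hypothetical sketch: run_sample is an illustrative name, and the resume, pause, and launch_kernels callables stand in for roctxProfilerResume, roctxProfilerPause, and the sample's kernel launches rather than calling them directly.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical driver mirroring the control flow described for main.cpp.
// resume/pause stand in for roctxProfilerResume/roctxProfilerPause;
// launch_kernels stands in for launching the sample's kernels.
void run_sample(int iterations,
                const std::function<void()>& launch_kernels,
                const std::function<void()>& resume,
                const std::function<void()>& pause)
{
    for(int i = 0; i < iterations; ++i)
    {
        // Iteration 0 only warms up the kernels; tracing starts on iteration 1.
        if(i == 1) resume();
        launch_kernels();
    }
    // Stop tracing once the loop has finished.
    pause();
}
```

Keeping the warm-up launch outside the traced window means one-time costs of the first launch are not mixed into the hotspot statistics.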