Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
[ROCm/rocprofiler-sdk commit: e898079a13]
# Thread Trace and ROCprof Trace Decoder

## Services

- Thread trace in device profiling mode
- ROCprof Trace Decoder decodes the received thread trace data
- Thread trace start/stop using roctx

## Properties

### [agent.cpp](agent.cpp):

- Configures thread trace on every GPU agent found, using `rocprofiler_configure_device_thread_trace_service`
- Waits until `roctxProfilerResume` is called to start thread trace
- Stops tracing at `roctxProfilerPause`
- Receives the trace data in `shader_data_callback` and calls `rocprofiler_trace_decode` to decode the data
- `rocprofiler_trace_decode` calls `parse` (a lambda)
- `parse` receives the decoded data and increments hit counts and latencies per PC address
- At application end, `tool_fini` calls `gen_output_stream` to write the top hotspots into `thread_trace.log`

### [main.cpp](main.cpp):

- Defines a few different kernels and runs them
- The first loop iteration warms up the kernels
- The second iteration calls `roctxProfilerResume` to start thread trace
- After the loop ends, `roctxProfilerPause` is called to stop tracing