Tracing Documentation (#997)
* Update callback_services.md
* Callback tracing services
* Intercept table
* Buffer tracing
[ROCm/rocprofiler-sdk commit: cfbac19640]
committed by
GitHub
parent
6c6adddb5c
commit
8cea0cae2e
@@ -2,12 +2,250 @@
|
||||
|
||||
For the buffered approach, supported buffer record categories are enumerated in `rocprofiler_buffer_category_t` category field.
|
||||
|
||||
## Buffered Tracing Services
|
||||
|
||||
## Overview
|
||||
|
||||
In the buffered approach, callbacks are received for batches of records from an internal (background) thread. Supported buffered tracing services are enumerated in `rocprofiler_buffer_tracing_kind_t`.
|
||||
In the buffered approach, callbacks are received for batches of records from an internal (background) thread.
Supported buffered tracing services are enumerated in `rocprofiler_buffer_tracing_kind_t`. Configuring
a buffer tracing service requires the creation of a buffer. When the buffer is "flushed", either implicitly
or explicitly, a callback to the tool is invoked with an array of one or more buffer records.
A buffer can be explicitly flushed via the `rocprofiler_flush_buffer` function.
|
||||
|
||||
## HSA API Tracing
|
||||
## Subscribing to Buffer Tracing Services
|
||||
|
||||
## Kernel Tracing
|
||||
During tool initialization, tools configure buffer tracing via the `rocprofiler_configure_buffer_tracing_service`
function. However, before invoking `rocprofiler_configure_buffer_tracing_service`, the tool must create a buffer
for the tracing records.
|
||||
|
||||
### Creating a Buffer
|
||||
|
||||
```cpp
|
||||
rocprofiler_status_t
|
||||
rocprofiler_create_buffer(rocprofiler_context_id_t context,
|
||||
size_t size,
|
||||
size_t watermark,
|
||||
rocprofiler_buffer_policy_t policy,
|
||||
rocprofiler_buffer_tracing_cb_t callback,
|
||||
void* callback_data,
|
||||
rocprofiler_buffer_id_t* buffer_id);
|
||||
```
|
||||
|
||||
The `size` parameter is the size of the buffer in bytes and will be rounded up to the nearest
|
||||
memory page size (defined by `sysconf(_SC_PAGESIZE)`); the default memory page size on Linux
|
||||
is 4096 bytes (4 KB).
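
As a rough illustration of the rounding described above, the following sketch computes the size a request would occupy once rounded up to a whole number of memory pages. The helper names `round_up_to_page` and `system_page_size` are hypothetical and are not part of rocprofiler-sdk; the library's internal rounding may differ in detail.

```cpp
#include <cstddef>
#include <unistd.h>

// Hypothetical helper (not a rocprofiler-sdk function): rounds "size" up to
// the next multiple of "page_size". Assumes page_size is greater than zero.
inline std::size_t
round_up_to_page(std::size_t size, std::size_t page_size)
{
    return ((size + page_size - 1) / page_size) * page_size;
}

// Queries the memory page size of the running system.
inline std::size_t
system_page_size()
{
    return static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
}
```

With a 4096-byte page, a request of 4097 bytes would occupy 8192 bytes under this model, while a request of exactly 4096 bytes would stay at 4096 bytes.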
|
||||
|
||||
The `watermark` parameter specifies the number of bytes at which
|
||||
the buffer should be "flushed", i.e. when the records in the buffer should invoke the
|
||||
`callback` parameter to deliver the records to the tool. For example, if a buffer has a size
|
||||
of 4096 bytes and the watermark is set to 48 bytes, six 8-byte records can be placed in the
|
||||
buffer before `callback` is invoked. However, every 64-byte record that is placed in the
|
||||
buffer will trigger a flush. It is safe to set the `watermark` to any value between
|
||||
zero and the buffer size.
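
The watermark arithmetic in the example above can be checked with a small standalone model. This is only a sketch: `watermark_model` is a hypothetical type, not a rocprofiler-sdk data structure, and it assumes a flush is triggered as soon as the number of occupied bytes reaches or exceeds the watermark.

```cpp
#include <cstddef>

// Simplified, hypothetical model of the watermark behavior. It only counts
// bytes and flushes; it does not store records.
struct watermark_model
{
    explicit watermark_model(std::size_t wm)
    : watermark{wm}
    {}

    // Adds one record of "record_size" bytes. Assumption: a flush happens
    // once the occupied bytes reach or exceed the watermark.
    void emplace(std::size_t record_size)
    {
        used += record_size;
        if(used >= watermark)
        {
            ++flushes;
            used = 0;
        }
    }

    std::size_t watermark = 0;
    std::size_t used      = 0;
    std::size_t flushes   = 0;
};
```

Under this model, a 48-byte watermark sees its first flush when the sixth 8-byte record arrives, whereas each 64-byte record causes a flush on its own.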
|
||||
|
||||
The `policy` parameter specifies the behavior for when a record is larger than the
|
||||
amount of free space in the current buffer. For example, if a buffer has a size of
|
||||
4000 bytes with a watermark set to 4000 bytes and 3998 of the bytes in the buffer
|
||||
have been populated with records, the `policy` dictates how to handle an incoming record larger than
2 bytes. The `ROCPROFILER_BUFFER_POLICY_DISCARD` policy dictates that all records larger
than 2 bytes should be dropped until the tool _explicitly_ flushes the buffer via
a `rocprofiler_flush_buffer` function call, whereas the `ROCPROFILER_BUFFER_POLICY_LOSSLESS`
policy dictates that the current buffer should be swapped out for an empty buffer, the record placed
in that new buffer, and the former (full) buffer _implicitly_ flushed.
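
The difference between the two policies can be sketched with a standalone model. The names `buffer_policy` and `policy_model` are hypothetical stand-ins and not rocprofiler-sdk types; the model tracks byte counts only and assumes no single record is larger than the buffer itself.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the two policies (not rocprofiler-sdk enumerators).
enum class buffer_policy
{
    discard,
    lossless
};

struct policy_model
{
    policy_model(std::size_t cap, buffer_policy pol)
    : capacity{cap}
    , policy{pol}
    {}

    // Returns true if the record was stored and false if it was dropped.
    // Assumption: record_size never exceeds capacity.
    bool emplace(std::size_t record_size)
    {
        if(record_size <= capacity - used)
        {
            used += record_size;
            return true;
        }
        if(policy == buffer_policy::discard)
        {
            ++dropped;  // record is lost until the tool flushes explicitly
            return false;
        }
        // lossless: the full buffer is handed off (implicit flush) and the
        // record is placed in a fresh, empty buffer
        ++implicit_flushes;
        used = record_size;
        return true;
    }

    // Models an explicit flush requested by the tool.
    void flush() { used = 0; }

    std::size_t   capacity         = 0;
    buffer_policy policy           = buffer_policy::discard;
    std::size_t   used             = 0;
    std::size_t   dropped          = 0;
    std::size_t   implicit_flushes = 0;
};
```

With 3998 of 4000 bytes occupied, an 8-byte record is dropped under the discard model until `flush()` is called, while the lossless model counts one implicit flush and stores the record in the new buffer.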
|
||||
|
||||
The `callback` parameter is the function that rocprofiler-sdk should invoke when flushing
|
||||
the buffer; the value of the `callback_data` parameter will be passed as one of the arguments
|
||||
to the `callback` function.
|
||||
|
||||
The `buffer_id` parameter is an output parameter for the function call and will have a
|
||||
non-zero handle field after successful buffer creation.
|
||||
|
||||
### Creating a Dedicated Thread for Buffer Callbacks
|
||||
|
||||
By default, all buffers will use the same (default) background thread created by rocprofiler-sdk to
|
||||
invoke their callback. However, rocprofiler-sdk provides an interface for tools to specify the
|
||||
creation of an additional background thread for one or more of their buffers.
|
||||
|
||||
Callback threads for buffers are created via the `rocprofiler_create_callback_thread` function:
|
||||
|
||||
```cpp
|
||||
rocprofiler_status_t
|
||||
rocprofiler_create_callback_thread(rocprofiler_callback_thread_t* cb_thread_id);
|
||||
```
|
||||
|
||||
Buffers are assigned to that callback thread via the `rocprofiler_assign_callback_thread` function:
|
||||
|
||||
```cpp
|
||||
rocprofiler_status_t
|
||||
rocprofiler_assign_callback_thread(rocprofiler_buffer_id_t buffer_id,
|
||||
rocprofiler_callback_thread_t cb_thread_id);
|
||||
```
|
||||
|
||||
#### Buffer Callback Thread Creation and Assignment Example
|
||||
|
||||
```cpp
|
||||
{
|
||||
// create a context
|
||||
auto context_id = rocprofiler_context_id_t{};
|
||||
rocprofiler_create_context(&context_id);
|
||||
|
||||
// create a buffer associated with the context
|
||||
auto buffer_id = rocprofiler_buffer_id_t{};
|
||||
rocprofiler_create_buffer(context_id, ..., &buffer_id);
|
||||
|
||||
// specify that a new callback thread should be created and
// store its identifier in the "thr_id" variable
|
||||
auto thr_id = rocprofiler_callback_thread_t{};
|
||||
rocprofiler_create_callback_thread(&thr_id);
|
||||
|
||||
// assign the buffer callback to be delivered on this thread
|
||||
rocprofiler_assign_callback_thread(buffer_id, thr_id);
|
||||
}
|
||||
```
|
||||
|
||||
### Configuring Buffer Tracing Services
|
||||
|
||||
```cpp
|
||||
rocprofiler_status_t
|
||||
rocprofiler_configure_buffer_tracing_service(rocprofiler_context_id_t context_id,
|
||||
rocprofiler_buffer_tracing_kind_t kind,
|
||||
rocprofiler_tracing_operation_t* operations,
|
||||
size_t operations_count,
|
||||
rocprofiler_buffer_id_t buffer_id);
|
||||
```
|
||||
|
||||
The `kind` parameter is a high-level specifier of which service to trace (also known as a "domain").
|
||||
Domain examples include, but are not limited to, the HIP API, the HSA API, and kernel dispatches.
|
||||
For each domain, there are (often) various "operations", which can be used to restrict the callbacks
|
||||
to a subset within the domain. For domains which correspond to APIs, the "operations" are the functions
|
||||
which compose the API. If all operations in a domain should be traced, the `operations` and `operations_count`
|
||||
parameters can be set to `nullptr` and `0`, respectively. If the tracing domain should be restricted to a subset
|
||||
of operations, the tool library should specify a C-array of type `rocprofiler_tracing_operation_t` and the
|
||||
size of the array for the `operations` and `operations_count` parameters, respectively.
|
||||
|
||||
Similar to `rocprofiler_configure_callback_tracing_service`,
|
||||
`rocprofiler_configure_buffer_tracing_service` will return an error if a buffer service for a given context
and domain is configured more than once.
|
||||
|
||||
#### Example
|
||||
|
||||
```cpp
|
||||
{
|
||||
auto ctx = rocprofiler_context_id_t{};
|
||||
// ... creation of context, etc. ...
|
||||
|
||||
// buffer parameters
|
||||
constexpr auto KB = 1024; // 1024 bytes
|
||||
constexpr auto buffer_size = 16 * KB;
|
||||
constexpr auto watermark = 15 * KB;
|
||||
constexpr auto policy = ROCPROFILER_BUFFER_POLICY_LOSSLESS;
|
||||
|
||||
// buffer handle
|
||||
auto buffer_id = rocprofiler_buffer_id_t{};
|
||||
|
||||
// create a buffer associated with the context
|
||||
rocprofiler_create_buffer(
|
||||
ctx, buffer_size, watermark, policy, callback_func, nullptr, &buffer_id);
|
||||
|
||||
// configure HIP runtime API function records to be placed in buffer
|
||||
rocprofiler_configure_buffer_tracing_service(
|
||||
ctx, ROCPROFILER_BUFFER_TRACING_HIP_RUNTIME_API, nullptr, 0, buffer_id);
|
||||
|
||||
// configure kernel dispatch records to be placed in buffer
|
||||
// (more than one service can use the same buffer)
|
||||
rocprofiler_configure_buffer_tracing_service(
|
||||
ctx, ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH, nullptr, 0, buffer_id);
|
||||
|
||||
// ... etc. ...
|
||||
}
|
||||
```
|
||||
|
||||
## Buffer Tracing Callback Function
|
||||
|
||||
Rocprofiler-sdk buffer tracing callback functions have the signature:
|
||||
|
||||
```cpp
|
||||
typedef void (*rocprofiler_buffer_tracing_cb_t)(rocprofiler_context_id_t context,
|
||||
rocprofiler_buffer_id_t buffer_id,
|
||||
rocprofiler_record_header_t** headers,
|
||||
size_t num_headers,
|
||||
void* data,
|
||||
uint64_t drop_count);
|
||||
```
|
||||
|
||||
The `rocprofiler_record_header_t` data type provides three pieces of information:
|
||||
|
||||
1. Category (`rocprofiler_buffer_category_t`)
|
||||
2. Kind
|
||||
3. Payload
|
||||
|
||||
The category is used to distinguish the classification of the buffer record. For all
|
||||
services configured via `rocprofiler_configure_buffer_tracing_service`, the category will
|
||||
be equal to the value of `ROCPROFILER_BUFFER_CATEGORY_TRACING`. The meaning of the kind
|
||||
field is dependent on the category but when the category is `ROCPROFILER_BUFFER_CATEGORY_TRACING`,
the kind value will be equivalent to the `rocprofiler_buffer_tracing_kind_t` value passed to
`rocprofiler_configure_buffer_tracing_service`, e.g. `ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH`.
Once the category and kind have been determined, the payload can be cast:
|
||||
|
||||
```cpp
|
||||
{
|
||||
if(header->category == ROCPROFILER_BUFFER_CATEGORY_TRACING &&
|
||||
header->kind == ROCPROFILER_BUFFER_TRACING_HIP_RUNTIME_API)
|
||||
{
|
||||
auto* record =
|
||||
static_cast<rocprofiler_buffer_tracing_hip_api_record_t*>(header->payload);
|
||||
|
||||
// ... etc. ...
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Buffer Tracing Callback Function Example
|
||||
|
||||
```cpp
|
||||
void
|
||||
buffer_callback_func(rocprofiler_context_id_t context,
|
||||
rocprofiler_buffer_id_t buffer_id,
|
||||
rocprofiler_record_header_t** headers,
|
||||
size_t num_headers,
|
||||
void* user_data,
|
||||
uint64_t drop_count)
|
||||
{
|
||||
for(size_t i = 0; i < num_headers; ++i)
|
||||
{
|
||||
auto* header = headers[i];
|
||||
|
||||
if(header->category == ROCPROFILER_BUFFER_CATEGORY_TRACING &&
|
||||
header->kind == ROCPROFILER_BUFFER_TRACING_HIP_RUNTIME_API)
|
||||
{
|
||||
auto* record =
|
||||
static_cast<rocprofiler_buffer_tracing_hip_api_record_t*>(header->payload);
|
||||
|
||||
// ... etc. ...
|
||||
}
|
||||
else if(header->category == ROCPROFILER_BUFFER_CATEGORY_TRACING &&
|
||||
header->kind == ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH)
|
||||
{
|
||||
auto* record =
|
||||
static_cast<rocprofiler_buffer_tracing_kernel_dispatch_record_t*>(header->payload);
|
||||
|
||||
// ... etc. ...
|
||||
}
|
||||
else
|
||||
{
|
||||
throw std::runtime_error{"unhandled record header category + kind"};
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Buffer Tracing Record
|
||||
|
||||
Unlike callback tracing records, there is no common set of data for each buffer tracing record. However,
|
||||
many buffer tracing records contain a `kind` field and an `operation` field.
|
||||
The name of a tracing kind can be obtained via the `rocprofiler_query_buffer_tracing_kind_name` function.
|
||||
The name of an operation specific to a tracing kind can be obtained via the `rocprofiler_query_buffer_tracing_kind_operation_name`
|
||||
function. One can also iterate over all the buffer tracing kinds and operations for each tracing kind via the
|
||||
`rocprofiler_iterate_buffer_tracing_kinds` and `rocprofiler_iterate_buffer_tracing_kind_operations` functions.
|
||||
|
||||
The buffer tracing record data types can be found in the `rocprofiler-sdk/buffer_tracing.h` header
|
||||
(`source/include/rocprofiler-sdk/buffer_tracing.h` in the [rocprofiler-sdk GitHub repository](https://github.com/ROCm/rocprofiler-sdk)).
|
||||
|
||||
@@ -2,6 +2,336 @@
|
||||
|
||||
## Overview
|
||||
|
||||
Callback tracing services provide immediate callbacks to a tool on the current CPU thread when a given event occurs.
|
||||
For example, when tracing an API function, e.g. `hipSetDevice`, callback tracing invokes a user-specified callback
|
||||
before and after the traced function executes on the thread which is invoking the API function.
|
||||
|
||||
## Subscribing to Callback Tracing Services
|
||||
|
||||
During tool initialization, tools configure callback tracing via the `rocprofiler_configure_callback_tracing_service`
|
||||
function:
|
||||
|
||||
```cpp
|
||||
rocprofiler_status_t
|
||||
rocprofiler_configure_callback_tracing_service(rocprofiler_context_id_t context_id,
|
||||
rocprofiler_callback_tracing_kind_t kind,
|
||||
rocprofiler_tracing_operation_t* operations,
|
||||
size_t operations_count,
|
||||
rocprofiler_callback_tracing_cb_t callback,
|
||||
void* callback_args);
|
||||
```
|
||||
|
||||
The `kind` parameter is a high-level specifier of which service to trace (also known as a "domain").
|
||||
Domain examples include, but are not limited to, the HIP API, the HSA API, and kernel dispatches.
|
||||
For each domain, there are (often) various "operations", which can be used to restrict the callbacks
|
||||
to a subset within the domain. For domains which correspond to APIs, the "operations" are the functions
|
||||
which compose the API. If all operations in a domain should be traced, the `operations` and `operations_count`
|
||||
parameters can be set to `nullptr` and `0`, respectively. If the tracing domain should be restricted to a subset
|
||||
of operations, the tool library should specify a C-array of type `rocprofiler_tracing_operation_t` and the
|
||||
size of the array for the `operations` and `operations_count` parameters, respectively.
|
||||
|
||||
`rocprofiler_configure_callback_tracing_service` will return an error if a callback service for a given context
and domain is configured more than once. For example, if one only wanted to trace two functions within
|
||||
the HIP runtime API, `hipGetDevice` and `hipSetDevice`, the following code would accomplish this objective:
|
||||
|
||||
```cpp
|
||||
{
|
||||
auto ctx = rocprofiler_context_id_t{};
|
||||
// ... creation of context, etc. ...
|
||||
|
||||
// array of operations (i.e. API functions)
|
||||
auto operations = std::array<rocprofiler_tracing_operation_t, 2>{
|
||||
ROCPROFILER_HIP_RUNTIME_API_ID_hipSetDevice,
|
||||
ROCPROFILER_HIP_RUNTIME_API_ID_hipGetDevice
|
||||
};
|
||||
|
||||
rocprofiler_configure_callback_tracing_service(ctx,
|
||||
ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API,
|
||||
operations.data(),
|
||||
operations.size(),
|
||||
callback_func,
|
||||
nullptr);
|
||||
// ... etc. ...
|
||||
}
|
||||
```
|
||||
|
||||
But the following code would be invalid:
|
||||
|
||||
```cpp
|
||||
{
|
||||
auto ctx = rocprofiler_context_id_t{};
|
||||
// ... creation of context, etc. ...
|
||||
|
||||
// array of operations (i.e. API functions)
|
||||
auto operations = std::array<rocprofiler_tracing_operation_t, 2>{
|
||||
ROCPROFILER_HIP_RUNTIME_API_ID_hipSetDevice,
|
||||
ROCPROFILER_HIP_RUNTIME_API_ID_hipGetDevice
|
||||
};
|
||||
|
||||
for(auto op : operations)
|
||||
{
|
||||
// after the first iteration, will return ROCPROFILER_STATUS_ERROR_SERVICE_ALREADY_CONFIGURED
|
||||
rocprofiler_configure_callback_tracing_service(ctx,
|
||||
ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API,
|
||||
&op,
|
||||
1,
|
||||
callback_func,
|
||||
nullptr);
|
||||
}
|
||||
|
||||
// ... etc. ...
|
||||
}
|
||||
```
|
||||
|
||||
## Callback Tracing Callback Function
|
||||
|
||||
Rocprofiler-sdk callback tracing callback functions have the signature:
|
||||
|
||||
```cpp
|
||||
typedef void (*rocprofiler_callback_tracing_cb_t)(rocprofiler_callback_tracing_record_t record,
|
||||
rocprofiler_user_data_t* user_data,
|
||||
void* callback_data);
|
||||
```
|
||||
|
||||
The `record` parameter contains the information to uniquely identify a tracing record type and has the
|
||||
following definition:
|
||||
|
||||
```cpp
|
||||
typedef struct rocprofiler_callback_tracing_record_t
|
||||
{
|
||||
rocprofiler_context_id_t context_id;
|
||||
rocprofiler_thread_id_t thread_id;
|
||||
rocprofiler_correlation_id_t correlation_id;
|
||||
rocprofiler_callback_tracing_kind_t kind;
|
||||
uint32_t operation;
|
||||
rocprofiler_callback_phase_t phase;
|
||||
void* payload;
|
||||
} rocprofiler_callback_tracing_record_t;
|
||||
```
|
||||
|
||||
The underlying type of the `payload` field above is typically unique to a domain and, less frequently, an operation.
For example, for the `ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API` and `ROCPROFILER_CALLBACK_TRACING_HIP_COMPILER_API` kinds,
the payload should be cast to `rocprofiler_callback_tracing_hip_api_data_t*` -- which will contain the arguments
to the function and (in the exit phase) the return value of the function. The payload field will only be a valid
pointer during the invocation of the callback function(s).
|
||||
|
||||
The `user_data` parameter can be used to store data in between callback phases. It is unique for every
|
||||
instance of an operation. For example, if the tool library wishes to store the timestamp of the
|
||||
`ROCPROFILER_CALLBACK_PHASE_ENTER` phase for the ensuing `ROCPROFILER_CALLBACK_PHASE_EXIT` callback,
|
||||
this data can be stored in a method similar to below:
|
||||
|
||||
```cpp
|
||||
void
|
||||
callback_func(rocprofiler_callback_tracing_record_t record,
|
||||
rocprofiler_user_data_t* user_data,
|
||||
void* cb_data)
|
||||
{
|
||||
auto ts = rocprofiler_timestamp_t{};
|
||||
rocprofiler_get_timestamp(&ts);
|
||||
|
||||
if(record.phase == ROCPROFILER_CALLBACK_PHASE_ENTER)
|
||||
{
|
||||
user_data->value = ts;
|
||||
}
|
||||
else if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT)
|
||||
{
|
||||
auto delta_ts = (ts - user_data->value);
|
||||
// ... etc. ...
|
||||
}
|
||||
else
|
||||
{
|
||||
// ... etc. ...
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
The `callback_data` argument will be the value of `callback_args` passed to `rocprofiler_configure_callback_tracing_service`
|
||||
in [the previous section](#subscribing-to-callback-tracing-services).
|
||||
|
||||
## Callback Tracing Record
|
||||
|
||||
The name of a tracing kind can be obtained via the `rocprofiler_query_callback_tracing_kind_name` function.
|
||||
The name of an operation specific to a tracing kind can be obtained via the `rocprofiler_query_callback_tracing_kind_operation_name`
|
||||
function. One can also iterate over all the callback tracing kinds and operations for each tracing kind via the
|
||||
`rocprofiler_iterate_callback_tracing_kinds` and `rocprofiler_iterate_callback_tracing_kind_operations` functions.
|
||||
Lastly, for a given `rocprofiler_callback_tracing_record_t` object, rocprofiler-sdk supports generically iterating over
|
||||
the arguments of the payload field for many domains.
|
||||
|
||||
As mentioned above, within the `rocprofiler_callback_tracing_record_t` object,
|
||||
an opaque `void* payload` is provided for accessing domain specific information.
|
||||
The data types generally follow the naming convention of `rocprofiler_callback_tracing_<DOMAIN>_data_t`,
|
||||
e.g., for the tracing kinds `ROCPROFILER_CALLBACK_TRACING_HSA_{CORE,AMD_EXT,IMAGE_EXT,FINALIZE_EXT}_API`,
the payload should be cast to `rocprofiler_callback_tracing_hsa_api_data_t*`:
|
||||
|
||||
```cpp
|
||||
void
callback_func(rocprofiler_callback_tracing_record_t record,
              rocprofiler_user_data_t*              user_data,
              void*                                 cb_data)
{
    // callback tracing kinds which share the HSA API payload type
    static auto hsa_domains = std::unordered_set<rocprofiler_callback_tracing_kind_t>{
        ROCPROFILER_CALLBACK_TRACING_HSA_CORE_API,
        ROCPROFILER_CALLBACK_TRACING_HSA_AMD_EXT_API,
        ROCPROFILER_CALLBACK_TRACING_HSA_IMAGE_EXT_API,
        ROCPROFILER_CALLBACK_TRACING_HSA_FINALIZE_EXT_API};

    if(hsa_domains.count(record.kind) > 0)
    {
        auto* payload = static_cast<rocprofiler_callback_tracing_hsa_api_data_t*>(record.payload);

        // the return value is only meaningful in the exit phase
        hsa_status_t status = payload->retval.hsa_status_t_retval;
        if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT && status != HSA_STATUS_SUCCESS)
        {
            const char* _kind      = nullptr;
            const char* _operation = nullptr;

            rocprofiler_query_callback_tracing_kind_name(record.kind, &_kind, nullptr);
            rocprofiler_query_callback_tracing_kind_operation_name(
                record.kind, record.operation, &_operation, nullptr);

            // report that the HSA function did not return HSA_STATUS_SUCCESS
            fprintf(stderr, "[domain=%s] %s returned a non-zero exit code: %i\n", _kind, _operation, status);
        }
    }
    else if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT)
    {
        // ... etc. ...
    }
    else
    {
        // ... etc. ...
    }
}
|
||||
```
|
||||
|
||||
### Sample `rocprofiler_iterate_callback_tracing_kind_operation_args`
|
||||
|
||||
```cpp
|
||||
int
|
||||
print_args(rocprofiler_callback_tracing_kind_t domain_idx,
|
||||
uint32_t op_idx,
|
||||
uint32_t arg_num,
|
||||
const void* const arg_value_addr,
|
||||
int32_t arg_indirection_count,
|
||||
const char* arg_type,
|
||||
const char* arg_name,
|
||||
const char* arg_value_str,
|
||||
int32_t arg_dereference_count,
|
||||
void* data)
|
||||
{
|
||||
if(arg_num == 0)
|
||||
{
|
||||
const char* _kind = nullptr;
|
||||
const char* _operation = nullptr;
|
||||
|
||||
rocprofiler_query_callback_tracing_kind_name(domain_idx, &_kind, nullptr);
|
||||
rocprofiler_query_callback_tracing_kind_operation_name(
|
||||
domain_idx, op_idx, &_operation, nullptr);
|
||||
|
||||
fprintf(stderr, "\n[%s] %s\n", _kind, _operation);
|
||||
}
|
||||
|
||||
char* _arg_type = abi::__cxa_demangle(arg_type, nullptr, nullptr, nullptr);
|
||||
|
||||
fprintf(stderr, " %u: %-18s %-16s = %s\n", arg_num, _arg_type, arg_name, arg_value_str);
|
||||
|
||||
free(_arg_type);
|
||||
|
||||
// unused in example
|
||||
(void) arg_value_addr;
|
||||
(void) arg_indirection_count;
|
||||
(void) arg_dereference_count;
|
||||
(void) data;
|
||||
|
||||
return 0;
|
||||
}
|
||||
|
||||
void
|
||||
callback_func(rocprofiler_callback_tracing_record_t record,
|
||||
rocprofiler_user_data_t* user_data,
|
||||
void* cb_data)
|
||||
{
|
||||
if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT &&
|
||||
record.kind == ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API &&
|
||||
(record.operation == ROCPROFILER_HIP_RUNTIME_API_ID_hipLaunchKernel ||
|
||||
record.operation == ROCPROFILER_HIP_RUNTIME_API_ID_hipMemcpyAsync))
|
||||
{
|
||||
rocprofiler_iterate_callback_tracing_kind_operation_args(
|
||||
record, print_args, record.phase, nullptr);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Sample Output:
|
||||
|
||||
```console
|
||||
|
||||
[HIP_RUNTIME_API] hipLaunchKernel
|
||||
0: void const* function_address = 0x219308
|
||||
1: rocprofiler_dim3_t numBlocks = {z=1, y=310, x=310}
|
||||
2: rocprofiler_dim3_t dimBlocks = {z=1, y=32, x=32}
|
||||
3: void** args = 0x7ffe6d8dd3c0
|
||||
4: unsigned long sharedMemBytes = 0
|
||||
5: ihipStream_t* stream = 0x17b40c0
|
||||
|
||||
[HIP_RUNTIME_API] hipMemcpyAsync
|
||||
0: void* dst = 0x7f06c7bbb010
|
||||
1: void const* src = 0x7f0698800000
|
||||
2: unsigned long sizeBytes = 393625600
|
||||
3: hipMemcpyKind kind = DeviceToHost
|
||||
4: ihipStream_t* stream = 0x25dfcf0
|
||||
```
|
||||
|
||||
## Code Object Tracing
|
||||
|
||||
## HSA API Tracing
|
||||
The code object tracing service is a critical component for obtaining information regarding
|
||||
asynchronous activity on the GPU. The `rocprofiler_callback_tracing_code_object_load_data_t`
|
||||
payload (kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`, operation=`ROCPROFILER_CODE_OBJECT_LOAD`)
|
||||
provides a unique identifier for a bundle of one or more GPU kernel symbols which have been loaded
|
||||
for a specific GPU agent. For example, if your application is leveraging a multi-GPU system
containing 4 Vega20 GPUs and 4 MI100 GPUs, there will be at least 8 code objects loaded: one code
|
||||
object for each GPU. Each code object will be associated with a set of kernel symbols:
|
||||
the `rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t` payload
|
||||
(kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`, operation=`ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER`)
|
||||
provides a globally unique identifier for the specific kernel symbol along with the kernel name and
|
||||
several other static properties of the kernel (e.g. scratch size, scalar general purpose register count, etc.).
|
||||
Note: two otherwise identical kernel symbols (same kernel name, scratch size, etc.) which are part of
|
||||
otherwise identical code objects but the code objects are loaded for different GPU agents ***will*** have unique
|
||||
kernel identifiers. Furthermore, if the same code object (and its kernel symbols) is unloaded and then
re-loaded, that code object and all of its kernel symbols ***will*** be given new unique identifiers.
|
||||
|
||||
In general, when a code object is loaded and unloaded, here is the sequence of events:
|
||||
|
||||
1. Callback: code object load
|
||||
- kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`
|
||||
- operation=`ROCPROFILER_CODE_OBJECT_LOAD`
|
||||
- phase=`ROCPROFILER_CALLBACK_PHASE_LOAD`
|
||||
2. Callback: kernel symbol load
|
||||
- kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`
|
||||
- operation=`ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER`
|
||||
- phase=`ROCPROFILER_CALLBACK_PHASE_LOAD`
|
||||
- Repeats for each kernel symbol in code object
|
||||
3. Application Execution
|
||||
4. Callback: kernel symbol unload
|
||||
- kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`
|
||||
- operation=`ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER`
|
||||
- phase=`ROCPROFILER_CALLBACK_PHASE_UNLOAD`
|
||||
- Repeats for each kernel symbol in code object
|
||||
5. Callback: code object unload
|
||||
- kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`
|
||||
- operation=`ROCPROFILER_CODE_OBJECT_LOAD`
|
||||
- phase=`ROCPROFILER_CALLBACK_PHASE_UNLOAD`
|
||||
|
||||
Note: rocprofiler-sdk does not provide an interface to query this information outside of the
code object tracing service. If a tool wishes to associate kernel names with kernel tracing records,
the tool is responsible for making a copy of the relevant information when the code objects and
kernel symbols are loaded (however, any constant string fields, like the `const char* kernel_name` field,
need not be copied; these are guaranteed to be valid pointers until after rocprofiler-sdk finalization).
|
||||
If a tool decides to delete their copy of the data associated with a given code object or kernel symbol
|
||||
identifier when the code object and kernel symbols are unloaded, it is highly recommended to flush
|
||||
any/all buffers which might contain references to that code object or kernel symbol identifiers before
|
||||
deleting the associated data.
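
The bookkeeping described above could be handled by a small registry that a tool fills from its code object callbacks. The sketch below is an assumption-laden illustration: `kernel_name_registry` and its member functions are hypothetical names, the identifier is modeled as a plain `uint64_t`, and the name is copied into a `std::string` for simplicity even though the text above notes that copying constant strings is not strictly required.

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical tool-side registry (not part of rocprofiler-sdk) which maps
// kernel identifiers to kernel names.
class kernel_name_registry
{
public:
    // To be called by the tool when a kernel symbol load callback arrives.
    void on_symbol_load(std::uint64_t kernel_id, const char* kernel_name)
    {
        auto lk = std::lock_guard<std::mutex>{m_mutex};
        m_names[kernel_id] = kernel_name;
    }

    // To be called by the tool when a kernel symbol unload callback arrives,
    // ideally after flushing buffers which may still reference kernel_id.
    void on_symbol_unload(std::uint64_t kernel_id)
    {
        auto lk = std::lock_guard<std::mutex>{m_mutex};
        m_names.erase(kernel_id);
    }

    // Returns the stored name or "<unknown>" if the identifier is not known.
    std::string lookup(std::uint64_t kernel_id) const
    {
        auto lk  = std::lock_guard<std::mutex>{m_mutex};
        auto itr = m_names.find(kernel_id);
        return (itr != m_names.end()) ? itr->second : std::string{"<unknown>"};
    }

private:
    mutable std::mutex                             m_mutex;
    std::unordered_map<std::uint64_t, std::string> m_names;
};
```

The mutex is included because load and unload callbacks arrive on application threads while lookups would typically happen on a buffer callback thread.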
|
||||
|
||||
For a sample of code object tracing, please see the `samples/code_object_tracing` example in the
|
||||
[rocprofiler-sdk GitHub repository](https://github.com/ROCm/rocprofiler-sdk).
|
||||
|
||||
@@ -1,3 +1,96 @@
|
||||
# Runtime Intercept Tables
|
||||
|
||||
Discussion on how to access the raw runtime intercept tables of HSA and HIP (i.e. ExaTracer requirements by LTTng).
|
||||
Although most tools will want to leverage the callback or buffer tracing services for tracing the HIP, HSA, and ROCTx
|
||||
APIs, rocprofiler-sdk does provide access to the raw API dispatch tables. Each of the aforementioned APIs is
designed similarly to the following sample.
|
||||
|
||||
## Dispatch Table Overview
|
||||
|
||||
### Forward Declaration of public C API function
|
||||
|
||||
```cpp
|
||||
extern "C"
|
||||
{
|
||||
// forward declaration of public C API function
|
||||
int
|
||||
foo(int) __attribute__((visibility("default")));
|
||||
}
|
||||
```
|
||||
|
||||
### Internal Implementation of API function
|
||||
|
||||
```cpp
|
||||
namespace impl
|
||||
{
|
||||
int
|
||||
foo(int val)
|
||||
{
|
||||
// real implementation
|
||||
return (2 * val);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Dispatch Table Implementation
|
||||
|
||||
```cpp
|
||||
namespace impl
|
||||
{
|
||||
struct dispatch_table
|
||||
{
|
||||
int (*foo_fn)(int) = nullptr;
|
||||
};
|
||||
|
||||
// invoked once: populates the dispatch_table with function pointers to implementation
|
||||
dispatch_table*&
|
||||
construct_dispatch_table()
|
||||
{
|
||||
static dispatch_table* tbl = new dispatch_table{};
|
||||
tbl->foo_fn = impl::foo;
|
||||
|
||||
// in between above and below, rocprofiler-sdk gets passed the pointer
|
||||
// to the dispatch table and has the opportunity to wrap the function
|
||||
// pointers for interception
|
||||
|
||||
return tbl;
|
||||
}
|
||||
|
||||
// constructs dispatch table and stores it in static variable
|
||||
dispatch_table*
|
||||
get_dispatch_table()
|
||||
{
|
||||
static dispatch_table*& tbl = construct_dispatch_table();
|
||||
return tbl;
|
||||
}
|
||||
} // namespace impl
|
||||
```
|
||||
|
||||
### Implementation of public C API function
|
||||
|
||||
```cpp
|
||||
extern "C"
|
||||
{
|
||||
// implementation of public C API function
|
||||
int
|
||||
foo(int val)
|
||||
{
|
||||
return impl::get_dispatch_table()->foo_fn(val);
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Dispatch Table Chaining
|
||||
|
||||
rocprofiler-sdk is given an opportunity within `impl::construct_dispatch_table()` to
|
||||
save the original value(s) of the function pointers such as `foo_fn` and install
its own function pointers in their place -- this results in the public C API function `foo`
|
||||
calling into the rocprofiler-sdk function pointer, which then in turn, calls the original
|
||||
function pointer to `impl::foo` (this is called "chaining"). Once rocprofiler-sdk
|
||||
has made any necessary modifications to the dispatch table, tools which indicated
|
||||
they also want access to the raw dispatch table via `rocprofiler_at_intercept_table_registration`
|
||||
will be passed the pointer to the dispatch table.
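
Chaining on the toy `foo` dispatch table can be sketched as follows. This is a standalone illustration reusing the `impl::foo` and `impl::dispatch_table` definitions from the sample; the `tool` namespace, `foo_wrapper`, `install`, and `call_count` are hypothetical names, and no rocprofiler-sdk function is called.

```cpp
namespace impl
{
// real implementation (same as in the sample above)
int
foo(int val)
{
    return (2 * val);
}

struct dispatch_table
{
    int (*foo_fn)(int) = nullptr;
};
}  // namespace impl

namespace tool
{
// hypothetical tool-side state: the saved original pointer and a call counter
int (*original_foo)(int) = nullptr;
int call_count           = 0;

// wrapper which is installed into the table and chains to the original
int
foo_wrapper(int val)
{
    ++call_count;
    return original_foo(val);
}

// saves the current function pointer and installs the wrapper in its place
void
install(impl::dispatch_table* tbl)
{
    if(tbl->foo_fn == foo_wrapper) return;  // already installed
    original_foo = tbl->foo_fn;
    tbl->foo_fn  = foo_wrapper;
}
}  // namespace tool
```

After `tool::install` is applied to a table whose `foo_fn` points to `impl::foo`, a call through `foo_fn` first runs `tool::foo_wrapper` and then the original implementation. The guard against installing twice matters because a wrapper that saved itself as the "original" would recurse indefinitely.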
|
||||
|
||||
## Sample
|
||||
|
||||
For a demo of dispatch table chaining, please see the `samples/intercept_table` example in the
|
||||
[rocprofiler-sdk GitHub repository](https://github.com/ROCm/rocprofiler-sdk).
|
||||
|
||||