# Callback tracing services
## Overview
Callback tracing services provide immediate callbacks to a tool on the current CPU thread when a given event occurs.
For example, when tracing an API function such as `hipSetDevice`, callback tracing invokes a user-specified callback
before and after the traced function executes, on the thread that is invoking the API function.
## Subscribing to Callback Tracing Services
During tool initialization, tools configure callback tracing via the `rocprofiler_configure_callback_tracing_service`
function:
```cpp
rocprofiler_status_t
rocprofiler_configure_callback_tracing_service(rocprofiler_context_id_t            context_id,
                                               rocprofiler_callback_tracing_kind_t kind,
                                               rocprofiler_tracing_operation_t*    operations,
                                               size_t                              operations_count,
                                               rocprofiler_callback_tracing_cb_t   callback,
                                               void*                               callback_args);
```
The `kind` parameter is a high-level specifier of which service to trace (also known as a "domain").
Domain examples include, but are not limited to, the HIP API, the HSA API, and kernel dispatches.
For each domain, there are (often) various "operations", which can be used to restrict the callbacks
to a subset within the domain. For domains which correspond to APIs, the "operations" are the functions
which compose the API. If all operations in a domain should be traced, the `operations` and `operations_count`
parameters can be set to `nullptr` and `0`, respectively. If the tracing domain should be restricted to a subset
of operations, the tool library should specify a C-array of type `rocprofiler_tracing_operation_t` and the
size of the array for the `operations` and `operations_count` parameters.
`rocprofiler_configure_callback_tracing_service` returns an error if a callback service for a given context
and domain is configured more than once. Consequently, all operations of interest within a domain must be passed
in a single call. For example, to trace only two functions within the HIP runtime API, `hipGetDevice` and
`hipSetDevice`, the following code would accomplish this objective:
```cpp
{
    auto ctx = rocprofiler_context_id_t{};
    // ... creation of context, etc. ...
    // array of operations (i.e. API functions)
    auto operations = std::array<rocprofiler_tracing_operation_t, 2>{
        ROCPROFILER_HIP_RUNTIME_API_ID_hipSetDevice,
        ROCPROFILER_HIP_RUNTIME_API_ID_hipGetDevice
    };
    rocprofiler_configure_callback_tracing_service(ctx,
                                                   ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API,
                                                   operations.data(),
                                                   operations.size(),
                                                   callback_func,
                                                   nullptr);
    // ... etc. ...
}
```
The following code, however, is invalid because it configures the same context and domain once per operation:
```cpp
{
    auto ctx = rocprofiler_context_id_t{};
    // ... creation of context, etc. ...
    // array of operations (i.e. API functions)
    auto operations = std::array<rocprofiler_tracing_operation_t, 2>{
        ROCPROFILER_HIP_RUNTIME_API_ID_hipSetDevice,
        ROCPROFILER_HIP_RUNTIME_API_ID_hipGetDevice
    };
    for(auto op : operations)
    {
        // after the first iteration, will return ROCPROFILER_STATUS_ERROR_SERVICE_ALREADY_CONFIGURED
        rocprofiler_configure_callback_tracing_service(ctx,
                                                       ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API,
                                                       &op,
                                                       1,
                                                       callback_func,
                                                       nullptr);
    }
    // ... etc. ...
}
```
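When several parts of a tool each want different operations from the same domain, one way to respect the single-call rule is to merge their requests before configuring the service. The following is a minimal sketch of that merge step only. `operation_id_t` and `merge_operations` are hypothetical names introduced for illustration and are not part of rocprofiler-sdk; the sketch assumes operation identifiers behave like plain integers.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for rocprofiler_tracing_operation_t
// (assumed here to behave like an integer identifier).
using operation_id_t = std::int32_t;

// Merge the operation lists requested by several tool components into one
// sorted, duplicate-free array that can be handed to a single configure call.
std::vector<operation_id_t>
merge_operations(const std::vector<std::vector<operation_id_t>>& requests)
{
    auto merged = std::vector<operation_id_t>{};
    for(const auto& req : requests)
        merged.insert(merged.end(), req.begin(), req.end());

    std::sort(merged.begin(), merged.end());
    merged.erase(std::unique(merged.begin(), merged.end()), merged.end());
    return merged;
}
```

The pointer and size of the merged vector (`data()` and `size()`) would then play the role of the `operations` and `operations_count` arguments in one call per context and domain.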
## Callback Tracing Callback Function
Rocprofiler-sdk callback tracing callback functions have the signature:
```cpp
typedef void (*rocprofiler_callback_tracing_cb_t)(rocprofiler_callback_tracing_record_t record,
                                                  rocprofiler_user_data_t*              user_data,
                                                  void*                                 callback_data);
```
The `record` parameter contains the information to uniquely identify a tracing record type and has the
following definition:
```cpp
typedef struct rocprofiler_callback_tracing_record_t
{
    rocprofiler_context_id_t            context_id;
    rocprofiler_thread_id_t             thread_id;
    rocprofiler_correlation_id_t        correlation_id;
    rocprofiler_callback_tracing_kind_t kind;
    uint32_t                            operation;
    rocprofiler_callback_phase_t        phase;
    void*                               payload;
} rocprofiler_callback_tracing_record_t;
```
The underlying type of the `payload` field above is typically unique to a domain and, less frequently, an operation.
For example, for the `ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API` and `ROCPROFILER_CALLBACK_TRACING_HIP_COMPILER_API` domains,
the payload should be cast to `rocprofiler_callback_tracing_hip_api_data_t*`, which contains the arguments
to the function and (in the exit phase) the return value of the function. The payload field is only a valid
pointer during the invocation of the callback function(s).
The `user_data` parameter can be used to store data in between callback phases. It is unique for every
instance of an operation. For example, if the tool library wishes to store the timestamp of the
`ROCPROFILER_CALLBACK_PHASE_ENTER` phase for the ensuing `ROCPROFILER_CALLBACK_PHASE_EXIT` callback,
this data can be stored in a manner similar to the following:
```cpp
void
callback_func(rocprofiler_callback_tracing_record_t record,
              rocprofiler_user_data_t*              user_data,
              void*                                 cb_data)
{
    auto ts = rocprofiler_timestamp_t{};
    rocprofiler_get_timestamp(&ts);
    if(record.phase == ROCPROFILER_CALLBACK_PHASE_ENTER)
    {
        user_data->value = ts;
    }
    else if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT)
    {
        auto delta_ts = (ts - user_data->value);
        // ... etc. ...
    }
    else
    {
        // ... etc. ...
    }
}
```
The `callback_data` argument will be the value of `callback_args` passed to `rocprofiler_configure_callback_tracing_service`
in [the previous section](#subscribing-to-callback-tracing-services).
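As an illustration of how the `callback_data` pointer might be used, the sketch below passes tool-owned state through a `void*` and casts it back inside the callback. `tool_state`, `phase_t`, and `count_phase` are hypothetical stand-ins written for this example, not rocprofiler-sdk types or functions. Atomics are used because callbacks run on whichever thread triggered the event.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical tool-owned state; not a rocprofiler-sdk type.
struct tool_state
{
    std::atomic<std::uint64_t> enter_count{0};
    std::atomic<std::uint64_t> exit_count{0};
};

// Hypothetical stand-in for the callback phase enumeration.
enum class phase_t
{
    enter,
    exit
};

// Plays the role of a tracing callback: the tool receives back the same
// pointer it registered and casts it to its own type.
void
count_phase(phase_t phase, void* callback_data)
{
    auto* state = static_cast<tool_state*>(callback_data);
    if(!state) return;  // nullptr was registered, nothing to update

    if(phase == phase_t::enter)
        state->enter_count.fetch_add(1, std::memory_order_relaxed);
    else
        state->exit_count.fetch_add(1, std::memory_order_relaxed);
}
```

In a real tool, the address of such a state object would be supplied as `callback_args` at configuration time and must outlive every callback invocation.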
## Callback Tracing Record
The name of a tracing kind can be obtained via the `rocprofiler_query_callback_tracing_kind_name` function.
The name of an operation specific to a tracing kind can be obtained via the `rocprofiler_query_callback_tracing_kind_operation_name`
function. One can also iterate over all the callback tracing kinds and operations for each tracing kind via the
`rocprofiler_iterate_callback_tracing_kinds` and `rocprofiler_iterate_callback_tracing_kind_operations` functions.
Lastly, for a given `rocprofiler_callback_tracing_record_t` object, rocprofiler-sdk supports generically iterating over
the arguments of the payload field for many domains.
As mentioned above, within the `rocprofiler_callback_tracing_record_t` object,
an opaque `void* payload` is provided for accessing domain-specific information.
The data types generally follow the naming convention of `rocprofiler_callback_tracing_<DOMAIN>_data_t`,
e.g., for the tracing kinds `ROCPROFILER_CALLBACK_TRACING_HSA_{CORE,AMD_EXT,IMAGE_EXT,FINALIZE_EXT}_API`,
the payload should be cast to `rocprofiler_callback_tracing_hsa_api_data_t*`:
```cpp
void
callback_func(rocprofiler_callback_tracing_record_t record,
              rocprofiler_user_data_t*              user_data,
              void*                                 cb_data)
{
    static auto hsa_domains = std::unordered_set<rocprofiler_callback_tracing_kind_t>{
        ROCPROFILER_CALLBACK_TRACING_HSA_CORE_API,
        ROCPROFILER_CALLBACK_TRACING_HSA_AMD_EXT_API,
        ROCPROFILER_CALLBACK_TRACING_HSA_IMAGE_EXT_API,
        ROCPROFILER_CALLBACK_TRACING_HSA_FINALIZE_EXT_API};
    if(hsa_domains.count(record.kind) > 0)
    {
        auto* payload = static_cast<rocprofiler_callback_tracing_hsa_api_data_t*>(record.payload);
        hsa_status_t status = payload->retval.hsa_status_t_retval;
        if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT && status != HSA_STATUS_SUCCESS)
        {
            const char* _kind      = nullptr;
            const char* _operation = nullptr;
            rocprofiler_query_callback_tracing_kind_name(record.kind, &_kind, nullptr);
            rocprofiler_query_callback_tracing_kind_operation_name(
                record.kind, record.operation, &_operation, nullptr);
            // report that the HSA function returned an error status
            fprintf(stderr, "[domain=%s] %s returned a non-zero exit code: %i\n", _kind, _operation, status);
        }
    }
    else
    {
        // ... handle other domains ...
    }
}
```
### Sample `rocprofiler_iterate_callback_tracing_kind_operation_args`
```cpp
#include <cstdio>
#include <cstdlib>
#include <cxxabi.h>  // abi::__cxa_demangle

int
print_args(rocprofiler_callback_tracing_kind_t domain_idx,
           uint32_t                            op_idx,
           uint32_t                            arg_num,
           const void* const                   arg_value_addr,
           int32_t                             arg_indirection_count,
           const char*                         arg_type,
           const char*                         arg_name,
           const char*                         arg_value_str,
           int32_t                             arg_dereference_count,
           void*                               data)
{
    if(arg_num == 0)
    {
        const char* _kind      = nullptr;
        const char* _operation = nullptr;
        rocprofiler_query_callback_tracing_kind_name(domain_idx, &_kind, nullptr);
        rocprofiler_query_callback_tracing_kind_operation_name(
            domain_idx, op_idx, &_operation, nullptr);
        fprintf(stderr, "\n[%s] %s\n", _kind, _operation);
    }
    // demangle the type name; fall back to the raw string if demangling fails
    char* _arg_type = abi::__cxa_demangle(arg_type, nullptr, nullptr, nullptr);
    fprintf(stderr,
            " %u: %-18s %-16s = %s\n",
            arg_num,
            _arg_type ? _arg_type : arg_type,
            arg_name,
            arg_value_str);
    free(_arg_type);
    // unused in example
    (void) arg_value_addr;
    (void) arg_indirection_count;
    (void) arg_dereference_count;
    (void) data;
    return 0;
}

void
callback_func(rocprofiler_callback_tracing_record_t record,
              rocprofiler_user_data_t*              user_data,
              void*                                 cb_data)
{
    if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT &&
       record.kind == ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API &&
       (record.operation == ROCPROFILER_HIP_RUNTIME_API_ID_hipLaunchKernel ||
        record.operation == ROCPROFILER_HIP_RUNTIME_API_ID_hipMemcpyAsync))
    {
        rocprofiler_iterate_callback_tracing_kind_operation_args(
            record, print_args, record.phase, nullptr);
    }
}
```
Sample Output:
```console
[HIP_RUNTIME_API] hipLaunchKernel
0: void const* function_address = 0x219308
1: rocprofiler_dim3_t numBlocks = {z=1, y=310, x=310}
2: rocprofiler_dim3_t dimBlocks = {z=1, y=32, x=32}
3: void** args = 0x7ffe6d8dd3c0
4: unsigned long sharedMemBytes = 0
5: hipStream_t* stream = 0x17b40c0
[HIP_RUNTIME_API] hipMemcpyAsync
0: void* dst = 0x7f06c7bbb010
1: void const* src = 0x7f0698800000
2: unsigned long sizeBytes = 393625600
3: hipMemcpyKind kind = DeviceToHost
4: hipStream_t* stream = 0x25dfcf0
```
## Code Object Tracing
The code object tracing service is a critical component for obtaining information regarding
asynchronous activity on the GPU. The `rocprofiler_callback_tracing_code_object_load_data_t`
payload (kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`, operation=`ROCPROFILER_CODE_OBJECT_LOAD`)
provides a unique identifier for a bundle of one or more GPU kernel symbols which have been loaded
for a specific GPU agent. For example, if your application is leveraging a multi-GPU system
containing 4 Vega20 GPUs and 4 MI100 GPUs, there will be at least 8 code objects loaded: one code
object for each GPU. Each code object will be associated with a set of kernel symbols:
the `rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t` payload
(kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`, operation=`ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER`)
provides a globally unique identifier for the specific kernel symbol along with the kernel name and
several other static properties of the kernel (e.g. scratch size, scalar general purpose register count, etc.).
Note: two otherwise identical kernel symbols (same kernel name, scratch size, etc.) which are part of
otherwise identical code objects that are loaded for different GPU agents ***will*** have unique
kernel identifiers. Furthermore, if the same code object (and its kernel symbols) is unloaded and then
re-loaded, that code object and all of its kernel symbols ***will*** be given new unique identifiers.
In general, when a code object is loaded and unloaded, here is the sequence of events:
1. Callback: code object load
    - kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`
    - operation=`ROCPROFILER_CODE_OBJECT_LOAD`
    - phase=`ROCPROFILER_CALLBACK_PHASE_LOAD`
2. Callback: kernel symbol load
    - kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`
    - operation=`ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER`
    - phase=`ROCPROFILER_CALLBACK_PHASE_LOAD`
    - Repeats for each kernel symbol in code object
3. Application Execution
4. Callback: kernel symbol unload
    - kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`
    - operation=`ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER`
    - phase=`ROCPROFILER_CALLBACK_PHASE_UNLOAD`
    - Repeats for each kernel symbol in code object
5. Callback: code object unload
    - kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`
    - operation=`ROCPROFILER_CODE_OBJECT_LOAD`
    - phase=`ROCPROFILER_CALLBACK_PHASE_UNLOAD`
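Note that in the sequence above, load and unload events share the same operation and are distinguished only by the phase. The sketch below shows one way a callback could classify the four events. The enumerations `code_object_op` and `event_phase` and the function `describe_event` are hypothetical stand-ins for the operation and phase values listed above, not rocprofiler-sdk definitions.

```cpp
#include <string>

// Hypothetical stand-ins for the operation and phase values in the list above.
enum class code_object_op
{
    load,                    // models ROCPROFILER_CODE_OBJECT_LOAD
    kernel_symbol_register   // models ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER
};

enum class event_phase
{
    load,    // models ROCPROFILER_CALLBACK_PHASE_LOAD
    unload   // models ROCPROFILER_CALLBACK_PHASE_UNLOAD
};

// Classify an event: the operation selects the object type,
// the phase selects load versus unload.
std::string
describe_event(code_object_op op, event_phase phase)
{
    const bool is_load = (phase == event_phase::load);
    switch(op)
    {
        case code_object_op::load:
            return is_load ? "code object load" : "code object unload";
        case code_object_op::kernel_symbol_register:
            return is_load ? "kernel symbol load" : "kernel symbol unload";
    }
    return "unknown";
}
```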
Note: rocprofiler-sdk does not provide an interface to query this information outside of the
code object tracing service. To associate kernel names with kernel tracing records,
a tool is responsible for making a copy of the relevant information when the code objects and
kernel symbols are loaded. However, constant string fields such as `const char* kernel_name`
need not be copied: these are guaranteed to be valid pointers until after rocprofiler-sdk finalization.
If a tool decides to delete its copy of the data associated with a given code object or kernel symbol
identifier when the code object and kernel symbols are unloaded, it is highly recommended to flush
any buffers which might contain references to that code object or kernel symbol identifier before
deleting the associated data.
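The bookkeeping described above could be sketched as a small thread-safe registry keyed by kernel identifier. This is a minimal illustration under stated assumptions, not the SDK's own mechanism: `kernel_symbol_info` is a simplified, hypothetical stand-in for the kernel symbol payload, and `kernel_symbol_registry` with its member functions is invented for this example.

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-in for the kernel symbol payload.
struct kernel_symbol_info
{
    std::uint64_t kernel_id;
    std::uint64_t code_object_id;
    const char*   kernel_name;  // per the text above, this pointer need not be deep-copied
    std::uint64_t scratch_size;
};

class kernel_symbol_registry
{
public:
    // Intended for the kernel symbol load callback: keep a copy of the record.
    void on_load(const kernel_symbol_info& info)
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        m_symbols[info.kernel_id] = info;
    }

    // Intended for the kernel symbol unload callback. The caller should flush
    // any buffers that may still reference this kernel id before calling.
    // Returns true if an entry was removed.
    bool on_unload(std::uint64_t kernel_id)
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        return m_symbols.erase(kernel_id) > 0;
    }

    // Resolve a kernel id found in a tracing record to its name,
    // or "<unknown>" if the id is not registered.
    std::string name_of(std::uint64_t kernel_id) const
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        auto itr = m_symbols.find(kernel_id);
        if(itr == m_symbols.end() || !itr->second.kernel_name) return "<unknown>";
        return itr->second.kernel_name;
    }

private:
    mutable std::mutex                                    m_mutex;
    std::unordered_map<std::uint64_t, kernel_symbol_info> m_symbols;
};
```

Because re-loaded code objects receive new identifiers, erasing an entry on unload does not risk colliding with a later load of the same kernel.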
For a sample of code object tracing, please see the `samples/code_object_tracing` example in the
[rocprofiler-sdk GitHub repository](https://github.com/ROCm/rocprofiler-sdk).