Tracing Documentation (#997)

* Update callback_services.md

* Callback tracing services

* Intercept table

* Buffer tracing

[ROCm/rocprofiler-sdk commit: cfbac19640]
This commit is contained in:
Jonathan R. Madsen
2024-08-01 02:59:35 -05:00
committed by GitHub
parent 6c6adddb5c
commit 8cea0cae2e
3 changed files with 668 additions and 7 deletions
@@ -2,12 +2,250 @@
For the buffered approach, supported buffer record categories are enumerated in `rocprofiler_buffer_category_t` category field.
## Buffered Tracing Services
## Overview
In the buffered approach, callbacks are received for batches of records from an internal (background) thread. Supported buffered tracing services are enumerated in `rocprofiler_buffer_tracing_kind_t`.
In the buffered approach, callbacks are received for batches of records from an internal (background) thread.
Supported buffered tracing services are enumerated in `rocprofiler_buffer_tracing_kind_t`. Configuring
a buffer tracing service requires the creation of a buffer. When the buffer is "flushed", either implicitly
or explicitly, a callback to the tool will be invoked which provides an array of one or more buffer records.
A buffer can be explicitly flushed via the `rocprofiler_flush_buffer` function.
## HSA API Tracing
## Subscribing to Buffer Tracing Services
## Kernel Tracing
During tool initialization, tools configure buffer tracing via the `rocprofiler_configure_buffer_tracing_service`
function. However, before invoking `rocprofiler_configure_buffer_tracing_service`, the tool must create a buffer
for the tracing records.
### Creating a Buffer
```cpp
rocprofiler_status_t
rocprofiler_create_buffer(rocprofiler_context_id_t context,
size_t size,
size_t watermark,
rocprofiler_buffer_policy_t policy,
rocprofiler_buffer_tracing_cb_t callback,
void* callback_data,
rocprofiler_buffer_id_t* buffer_id);
```
The `size` parameter is the size of the buffer in bytes and will be rounded up to the nearest
multiple of the memory page size (defined by `sysconf(_SC_PAGESIZE)`); the default memory page size on Linux
is 4096 bytes (4 KB).
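The rounding described above can be sketched as follows. This is only an illustration of the arithmetic: `round_up_to_page` and `system_page_size` are hypothetical helper names, not part of the rocprofiler-sdk API, and the 4096-byte fallback is an assumption for systems where the query fails.

```cpp
#include <cstddef>
#include <unistd.h>

// Round `size` up to the nearest multiple of `page_size`.
// NOTE: hypothetical helper for illustration; not a rocprofiler-sdk function.
inline std::size_t
round_up_to_page(std::size_t size, std::size_t page_size)
{
    if(page_size == 0) return size;
    return ((size + page_size - 1) / page_size) * page_size;
}

// Query the memory page size of the system (commonly 4096 bytes on Linux).
// Falls back to 4096 if sysconf reports an error (assumption for this sketch).
inline std::size_t
system_page_size()
{
    long value = sysconf(_SC_PAGESIZE);
    return (value > 0) ? static_cast<std::size_t>(value) : 4096;
}
```

For instance, with a 4096-byte page, a requested size of 4097 bytes would occupy 8192 bytes, whereas a request of exactly 16 KB would be left unchanged.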
The `watermark` parameter specifies the number of bytes at which
the buffer should be "flushed", i.e. when the records in the buffer should invoke the
`callback` parameter to deliver the records to the tool. For example, if a buffer has a size
of 4096 bytes and the watermark is set to 48 bytes, six 8-byte records can be placed in the
buffer before `callback` is invoked. However, every 64-byte record that is placed in the
buffer will trigger a flush. It is safe to set the `watermark` to any value between
zero and the buffer size.
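To make the watermark arithmetic concrete, here is a toy model of the flush trigger. It is a sketch under stated assumptions, not the rocprofiler-sdk implementation: `watermark_model` and `count_flushes` are hypothetical names, and the model assumes a flush happens as soon as the buffered byte count reaches or exceeds the watermark, which matches the example in the paragraph above.

```cpp
#include <cstddef>

// Toy model of a watermark-triggered buffer (hypothetical; not an SDK type).
struct watermark_model
{
    std::size_t watermark   = 0;  // flush threshold in bytes
    std::size_t used        = 0;  // bytes currently buffered
    std::size_t flush_count = 0;  // number of simulated callback invocations

    // Append one record; simulate a flush once `used` reaches the watermark.
    void push(std::size_t record_size)
    {
        used += record_size;
        if(used >= watermark)
        {
            ++flush_count;
            used = 0;
        }
    }
};

// Number of simulated flushes after pushing `count` records of `record_size` bytes.
inline std::size_t
count_flushes(std::size_t watermark, std::size_t record_size, std::size_t count)
{
    watermark_model model;
    model.watermark = watermark;
    for(std::size_t i = 0; i < count; ++i)
        model.push(record_size);
    return model.flush_count;
}
```

In this model, `count_flushes(48, 8, 6)` yields one flush (the sixth 8-byte record reaches the 48-byte watermark), while `count_flushes(48, 64, 3)` yields three, since each 64-byte record exceeds the watermark on its own.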
The `policy` parameter specifies the behavior when a record is larger than the
amount of free space in the current buffer. For example, if a buffer has a size of
4000 bytes with a watermark set to 4000 bytes and 3998 of the bytes in the buffer
have been populated with records, the `policy` dictates how to handle an incoming record
larger than 2 bytes. The `ROCPROFILER_BUFFER_POLICY_DISCARD` policy dictates that all records larger
than 2 bytes should be dropped until the tool _explicitly_ flushes the buffer via
a `rocprofiler_flush_buffer` function call, whereas the `ROCPROFILER_BUFFER_POLICY_LOSSLESS`
policy dictates that the current buffer should be swapped out for an empty buffer, the record should be placed
in that new buffer, and the former (full) buffer should be _implicitly_ flushed.
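The difference between the two policies can be illustrated with a small simulation of the 4000-byte example. Everything here is hypothetical (`policy_model`, `buffer_model`, and their members are not SDK types), and records larger than the whole buffer are simply counted as dropped because the text does not describe that case.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the two buffer policies (not the SDK enum).
enum class policy_model
{
    discard,
    lossless
};

// Toy model of a buffer that applies a policy when a record does not fit.
struct buffer_model
{
    std::size_t  capacity         = 0;
    policy_model policy           = policy_model::discard;
    std::size_t  used             = 0;  // bytes currently buffered
    std::size_t  dropped          = 0;  // records discarded
    std::size_t  implicit_flushes = 0;  // buffer swaps under the lossless policy

    // Returns true if the record was stored, false if it was dropped.
    bool push(std::size_t record_size)
    {
        // out of scope for this sketch: a record larger than the entire buffer
        if(record_size > capacity)
        {
            ++dropped;
            return false;
        }
        if(record_size <= capacity - used)
        {
            used += record_size;
            return true;
        }
        if(policy == policy_model::discard)
        {
            // dropped until the tool explicitly flushes the buffer
            ++dropped;
            return false;
        }
        // lossless: swap in an empty buffer, store the record there,
        // and implicitly flush the former (full) buffer
        ++implicit_flushes;
        used = record_size;
        return true;
    }

    // Models an explicit flush requested by the tool.
    void flush() { used = 0; }
};
```

With 3998 of 4000 bytes used, an 8-byte record is dropped under the discard model but stored (after one implicit flush) under the lossless model; a 2-byte record fits in both cases.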
The `callback` parameter is the function that rocprofiler-sdk should invoke when flushing
the buffer; the value of the `callback_data` parameter will be passed as one of the arguments
to the `callback` function.
The `buffer_id` parameter is an output parameter for the function call and will have a
non-zero handle field after successful buffer creation.
### Creating a Dedicated Thread for Buffer Callbacks
By default, all buffers will use the same (default) background thread created by rocprofiler-sdk to
invoke their callback. However, rocprofiler-sdk provides an interface for tools to specify the
creation of an additional background thread for one or more of their buffers.
Callback threads for buffers are created via the `rocprofiler_create_callback_thread` function:
```cpp
rocprofiler_status_t
rocprofiler_create_callback_thread(rocprofiler_callback_thread_t* cb_thread_id);
```
Buffers are assigned to that callback thread via the `rocprofiler_assign_callback_thread` function:
```cpp
rocprofiler_status_t
rocprofiler_assign_callback_thread(rocprofiler_buffer_id_t buffer_id,
rocprofiler_callback_thread_t cb_thread_id);
```
#### Buffer Callback Thread Creation and Assignment Example
```cpp
{
// create a context
auto context_id = rocprofiler_context_id_t{};
rocprofiler_create_context(&context_id);
// create a buffer associated with the context
auto buffer_id = rocprofiler_buffer_id_t{};
rocprofiler_create_buffer(context_id, ..., &buffer_id);
// specify that a new callback thread should be created and
// assign its identifier to the "thr_id" variable
auto thr_id = rocprofiler_callback_thread_t{};
rocprofiler_create_callback_thread(&thr_id);
// assign the buffer callback to be delivered on this thread
rocprofiler_assign_callback_thread(buffer_id, thr_id);
}
```
### Configuring Buffer Tracing Services
```cpp
rocprofiler_status_t
rocprofiler_configure_buffer_tracing_service(rocprofiler_context_id_t context_id,
rocprofiler_buffer_tracing_kind_t kind,
rocprofiler_tracing_operation_t* operations,
size_t operations_count,
rocprofiler_buffer_id_t buffer_id);
```
The `kind` parameter is a high-level specifier of which service to trace (also known as a "domain").
Domain examples include, but are not limited to, the HIP API, the HSA API, and kernel dispatches.
For each domain, there are (often) various "operations", which can be used to restrict the callbacks
to a subset within the domain. For domains which correspond to APIs, the "operations" are the functions
which compose the API. If all operations in a domain should be traced, the `operations` and `operations_count`
parameters can be set to `nullptr` and `0`, respectively. If the tracing domain should be restricted to a subset
of operations, the tool library should specify a C-array of type `rocprofiler_tracing_operation_t` and the
size of the array for the `operations` and `operations_count` parameters, respectively.
Similar to `rocprofiler_configure_callback_tracing_service`,
`rocprofiler_configure_buffer_tracing_service` will return an error if a buffer service for given context
and given domain is configured more than once.
#### Example
```cpp
{
auto ctx = rocprofiler_context_id_t{};
// ... creation of context, etc. ...
// buffer parameters
constexpr auto KB = 1024; // 1024 bytes
constexpr auto buffer_size = 16 * KB;
constexpr auto watermark = 15 * KB;
constexpr auto policy = ROCPROFILER_BUFFER_POLICY_LOSSLESS;
// buffer handle
auto buffer_id = rocprofiler_buffer_id_t{};
// create a buffer associated with the context
rocprofiler_create_buffer(
ctx, buffer_size, watermark, policy, callback_func, nullptr, &buffer_id);
// configure HIP runtime API function records to be placed in buffer
rocprofiler_configure_buffer_tracing_service(
ctx, ROCPROFILER_BUFFER_TRACING_HIP_RUNTIME_API, nullptr, 0, buffer_id);
// configure kernel dispatch records to be placed in buffer
// (more than one service can use the same buffer)
rocprofiler_configure_buffer_tracing_service(
ctx, ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH, nullptr, 0, buffer_id);
// ... etc. ...
}
```
## Buffer Tracing Callback Function
Rocprofiler-sdk buffer tracing callback functions have the signature:
```cpp
typedef void (*rocprofiler_buffer_tracing_cb_t)(rocprofiler_context_id_t context,
rocprofiler_buffer_id_t buffer_id,
rocprofiler_record_header_t** headers,
size_t num_headers,
void* data,
uint64_t drop_count);
```
The `rocprofiler_record_header_t` data type provides three pieces of information:
1. Category (`rocprofiler_buffer_category_t`)
2. Kind
3. Payload
The category is used to distinguish the classification of the buffer record. For all
services configured via `rocprofiler_configure_buffer_tracing_service`, the category will
be equal to the value of `ROCPROFILER_BUFFER_CATEGORY_TRACING`. The meaning of the kind
field is dependent on the category but, when the category is `ROCPROFILER_BUFFER_CATEGORY_TRACING`,
the kind value will be equivalent to the `rocprofiler_buffer_tracing_kind_t` value passed to
`rocprofiler_configure_buffer_tracing_service`, e.g. `ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH`.
Once the category and kind have been determined, the payload can be cast:
```cpp
{
if(header->category == ROCPROFILER_BUFFER_CATEGORY_TRACING &&
header->kind == ROCPROFILER_BUFFER_TRACING_HIP_RUNTIME_API)
{
auto* record =
static_cast<rocprofiler_buffer_tracing_hip_api_record_t*>(header->payload);
// ... etc. ...
}
}
```
### Buffer Tracing Callback Function Example
```cpp
void
buffer_callback_func(rocprofiler_context_id_t context,
rocprofiler_buffer_id_t buffer_id,
rocprofiler_record_header_t** headers,
size_t num_headers,
void* user_data,
uint64_t drop_count)
{
for(size_t i = 0; i < num_headers; ++i)
{
auto* header = headers[i];
if(header->category == ROCPROFILER_BUFFER_CATEGORY_TRACING &&
header->kind == ROCPROFILER_BUFFER_TRACING_HIP_RUNTIME_API)
{
auto* record =
static_cast<rocprofiler_buffer_tracing_hip_api_record_t*>(header->payload);
// ... etc. ...
}
else if(header->category == ROCPROFILER_BUFFER_CATEGORY_TRACING &&
header->kind == ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH)
{
auto* record =
static_cast<rocprofiler_buffer_tracing_kernel_dispatch_record_t*>(header->payload);
// ... etc. ...
}
else
{
throw std::runtime_error{"unhandled record header category + kind"};
}
}
}
```
## Buffer Tracing Record
Unlike callback tracing records, there is no common set of data for each buffer tracing record. However,
many buffer tracing records contain a `kind` field and an `operation` field.
The name of a tracing kind can be obtained via the `rocprofiler_query_buffer_tracing_kind_name` function.
The name of an operation specific to a tracing kind can be obtained via the `rocprofiler_query_buffer_tracing_kind_operation_name`
function. One can also iterate over all the buffer tracing kinds and operations for each tracing kind via the
`rocprofiler_iterate_buffer_tracing_kinds` and `rocprofiler_iterate_buffer_tracing_kind_operations` functions.
The buffer tracing record data types can be found in the `rocprofiler-sdk/buffer_tracing.h` header
(`source/include/rocprofiler-sdk/buffer_tracing.h` in the [rocprofiler-sdk GitHub repository](https://github.com/ROCm/rocprofiler-sdk)).
@@ -2,6 +2,336 @@
## Overview
Callback tracing services provide immediate callbacks to a tool on the current CPU thread when a given event occurs.
For example, when tracing an API function, e.g. `hipSetDevice`, callback tracing invokes a user-specified callback
before and after the traced function executes on the thread which is invoking the API function.
## Subscribing to Callback Tracing Services
During tool initialization, tools configure callback tracing via the `rocprofiler_configure_callback_tracing_service`
function:
```cpp
rocprofiler_status_t
rocprofiler_configure_callback_tracing_service(rocprofiler_context_id_t context_id,
rocprofiler_callback_tracing_kind_t kind,
rocprofiler_tracing_operation_t* operations,
size_t operations_count,
rocprofiler_callback_tracing_cb_t callback,
void* callback_args);
```
The `kind` parameter is a high-level specifier of which service to trace (also known as a "domain").
Domain examples include, but are not limited to, the HIP API, the HSA API, and kernel dispatches.
For each domain, there are (often) various "operations", which can be used to restrict the callbacks
to a subset within the domain. For domains which correspond to APIs, the "operations" are the functions
which compose the API. If all operations in a domain should be traced, the `operations` and `operations_count`
parameters can be set to `nullptr` and `0`, respectively. If the tracing domain should be restricted to a subset
of operations, the tool library should specify a C-array of type `rocprofiler_tracing_operation_t` and the
size of the array for the `operations` and `operations_count` parameter.
`rocprofiler_configure_callback_tracing_service` will return an error if a callback service for given context
and given domain is configured more than once. For example, if one only wanted to trace two functions within
the HIP runtime API, `hipGetDevice` and `hipSetDevice`, the following code would accomplish this objective:
```cpp
{
auto ctx = rocprofiler_context_id_t{};
// ... creation of context, etc. ...
// array of operations (i.e. API functions)
auto operations = std::array<rocprofiler_tracing_operation_t, 2>{
ROCPROFILER_HIP_RUNTIME_API_ID_hipSetDevice,
ROCPROFILER_HIP_RUNTIME_API_ID_hipGetDevice
};
rocprofiler_configure_callback_tracing_service(ctx,
ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API,
operations.data(),
operations.size(),
callback_func,
nullptr);
// ... etc. ...
}
```
But the following code would be invalid:
```cpp
{
auto ctx = rocprofiler_context_id_t{};
// ... creation of context, etc. ...
// array of operations (i.e. API functions)
auto operations = std::array<rocprofiler_tracing_operation_t, 2>{
ROCPROFILER_HIP_RUNTIME_API_ID_hipSetDevice,
ROCPROFILER_HIP_RUNTIME_API_ID_hipGetDevice
};
for(auto op : operations)
{
// after the first iteration, will return ROCPROFILER_STATUS_ERROR_SERVICE_ALREADY_CONFIGURED
rocprofiler_configure_callback_tracing_service(ctx,
ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API,
&op,
1,
callback_func,
nullptr);
}
// ... etc. ...
}
```
## Callback Tracing Callback Function
Rocprofiler-sdk callback tracing callback functions have the signature:
```cpp
typedef void (*rocprofiler_callback_tracing_cb_t)(rocprofiler_callback_tracing_record_t record,
rocprofiler_user_data_t* user_data,
void* callback_data);
```
The `record` parameter contains the information to uniquely identify a tracing record type and has the
following definition:
```cpp
typedef struct rocprofiler_callback_tracing_record_t
{
rocprofiler_context_id_t context_id;
rocprofiler_thread_id_t thread_id;
rocprofiler_correlation_id_t correlation_id;
rocprofiler_callback_tracing_kind_t kind;
uint32_t operation;
rocprofiler_callback_phase_t phase;
void* payload;
} rocprofiler_callback_tracing_record_t;
```
The underlying type of the `payload` field above is typically unique to a domain and, less frequently, an operation.
For example, for the `ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API` and `ROCPROFILER_CALLBACK_TRACING_HIP_COMPILER_API` domains,
the payload should be cast to `rocprofiler_callback_tracing_hip_api_data_t*` -- which will contain the arguments
to the function and (in the exit phase) the return value of the function. The payload field will only be a valid
pointer during the invocation of the callback function(s).
The `user_data` parameter can be used to store data in between callback phases. It is unique for every
instance of an operation. For example, if the tool library wishes to store the timestamp of the
`ROCPROFILER_CALLBACK_PHASE_ENTER` phase for the ensuing `ROCPROFILER_CALLBACK_PHASE_EXIT` callback,
this data can be stored in a method similar to below:
```cpp
void
callback_func(rocprofiler_callback_tracing_record_t record,
rocprofiler_user_data_t* user_data,
void* cb_data)
{
auto ts = rocprofiler_timestamp_t{};
rocprofiler_get_timestamp(&ts);
if(record.phase == ROCPROFILER_CALLBACK_PHASE_ENTER)
{
user_data->value = ts;
}
else if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT)
{
auto delta_ts = (ts - user_data->value);
// ... etc. ...
}
else
{
// ... etc. ...
}
}
```
The `callback_data` argument will be the value of `callback_args` passed to `rocprofiler_configure_callback_tracing_service`
in [the previous section](#subscribing-to-callback-tracing-services).
## Callback Tracing Record
The name of a tracing kind can be obtained via the `rocprofiler_query_callback_tracing_kind_name` function.
The name of an operation specific to a tracing kind can be obtained via the `rocprofiler_query_callback_tracing_kind_operation_name`
function. One can also iterate over all the callback tracing kinds and operations for each tracing kind via the
`rocprofiler_iterate_callback_tracing_kinds` and `rocprofiler_iterate_callback_tracing_kind_operations` functions.
Lastly, for a given `rocprofiler_callback_tracing_record_t` object, rocprofiler-sdk supports generically iterating over
the arguments of the payload field for many domains.
As mentioned above, within the `rocprofiler_callback_tracing_record_t` object,
an opaque `void* payload` is provided for accessing domain specific information.
The data types generally follow the naming convention of `rocprofiler_callback_tracing_<DOMAIN>_data_t`,
e.g., for the tracing kinds `ROCPROFILER_CALLBACK_TRACING_HSA_{CORE,AMD_EXT,IMAGE_EXT,FINALIZE_EXT}_API`,
the payload should be cast to `rocprofiler_callback_tracing_hsa_api_data_t*`:
```cpp
void
callback_func(rocprofiler_callback_tracing_record_t record,
              rocprofiler_user_data_t*              user_data,
              void*                                 cb_data)
{
    // callback tracing kinds whose payload is rocprofiler_callback_tracing_hsa_api_data_t
    static auto hsa_domains = std::unordered_set<rocprofiler_callback_tracing_kind_t>{
        ROCPROFILER_CALLBACK_TRACING_HSA_CORE_API,
        ROCPROFILER_CALLBACK_TRACING_HSA_AMD_EXT_API,
        ROCPROFILER_CALLBACK_TRACING_HSA_IMAGE_EXT_API,
        ROCPROFILER_CALLBACK_TRACING_HSA_FINALIZE_EXT_API};
    if(hsa_domains.count(record.kind) > 0)
    {
        auto* payload = static_cast<rocprofiler_callback_tracing_hsa_api_data_t*>(record.payload);
        hsa_status_t status = payload->retval.hsa_status_t_retval;
        if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT && status != HSA_STATUS_SUCCESS)
        {
            const char* _kind      = nullptr;
            const char* _operation = nullptr;
            rocprofiler_query_callback_tracing_kind_name(record.kind, &_kind, nullptr);
            rocprofiler_query_callback_tracing_kind_operation_name(
                record.kind, record.operation, &_operation, nullptr);
            // report that the HSA function returned a non-zero status
            fprintf(stderr, "[domain=%s] %s returned a non-zero exit code: %i\n", _kind, _operation, status);
        }
    }
    else if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT)
    {
        // ... etc. ...
    }
    else
    {
        // ... etc. ...
    }
}
```
### Sample `rocprofiler_iterate_callback_tracing_kind_operation_args`
```cpp
int
print_args(rocprofiler_callback_tracing_kind_t domain_idx,
uint32_t op_idx,
uint32_t arg_num,
const void* const arg_value_addr,
int32_t arg_indirection_count,
const char* arg_type,
const char* arg_name,
const char* arg_value_str,
int32_t arg_dereference_count,
void* data)
{
if(arg_num == 0)
{
const char* _kind = nullptr;
const char* _operation = nullptr;
rocprofiler_query_callback_tracing_kind_name(domain_idx, &_kind, nullptr);
rocprofiler_query_callback_tracing_kind_operation_name(
domain_idx, op_idx, &_operation, nullptr);
fprintf(stderr, "\n[%s] %s\n", _kind, _operation);
}
char* _arg_type = abi::__cxa_demangle(arg_type, nullptr, nullptr, nullptr);
fprintf(stderr, " %u: %-18s %-16s = %s\n", arg_num, _arg_type, arg_name, arg_value_str);
free(_arg_type);
// unused in example
(void) arg_value_addr;
(void) arg_indirection_count;
(void) arg_dereference_count;
(void) data;
return 0;
}
void
callback_func(rocprofiler_callback_tracing_record_t record,
rocprofiler_user_data_t* user_data,
void* cb_data)
{
if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT &&
record.kind == ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API &&
(record.operation == ROCPROFILER_HIP_RUNTIME_API_ID_hipLaunchKernel ||
record.operation == ROCPROFILER_HIP_RUNTIME_API_ID_hipMemcpyAsync))
{
rocprofiler_iterate_callback_tracing_kind_operation_args(
record, print_args, record.phase, nullptr);
}
}
```
Sample Output:
```console
[HIP_RUNTIME_API] hipLaunchKernel
0: void const* function_address = 0x219308
1: rocprofiler_dim3_t numBlocks = {z=1, y=310, x=310}
2: rocprofiler_dim3_t dimBlocks = {z=1, y=32, x=32}
3: void** args = 0x7ffe6d8dd3c0
4: unsigned long sharedMemBytes = 0
5: ihipStream_t* stream = 0x17b40c0
[HIP_RUNTIME_API] hipMemcpyAsync
0: void* dst = 0x7f06c7bbb010
1: void const* src = 0x7f0698800000
2: unsigned long sizeBytes = 393625600
3: hipMemcpyKind kind = DeviceToHost
4: ihipStream_t* stream = 0x25dfcf0
```
## Code Object Tracing
## HSA API Tracing
The code object tracing service is a critical component for obtaining information regarding
asynchronous activity on the GPU. The `rocprofiler_callback_tracing_code_object_load_data_t`
payload (kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`, operation=`ROCPROFILER_CODE_OBJECT_LOAD`)
provides a unique identifier for a bundle of one or more GPU kernel symbols which have been loaded
for a specific GPU agent. For example, if your application is leveraging a multi-GPU system
containing 4 Vega20 GPUs and 4 MI100 GPUs, there will be at least 8 code objects loaded: one code
object for each GPU. Each code object will be associated with a set of kernel symbols:
the `rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t` payload
(kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`, operation=`ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER`)
provides a globally unique identifier for the specific kernel symbol along with the kernel name and
several other static properties of the kernel (e.g. scratch size, scalar general purpose register count, etc.).
Note: two otherwise identical kernel symbols (same kernel name, scratch size, etc.) which are part of
otherwise identical code objects but the code objects are loaded for different GPU agents ***will*** have unique
kernel identifiers. Furthermore, if the same code object (and its kernel symbols) is unloaded and then
re-loaded, that code object and all of its kernel symbols ***will*** be given new unique identifiers.
In general, when a code object is loaded and unloaded, here is the sequence of events:
1. Callback: code object load
- kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`
- operation=`ROCPROFILER_CODE_OBJECT_LOAD`
- phase=`ROCPROFILER_CALLBACK_PHASE_LOAD`
2. Callback: kernel symbol load
- kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`
- operation=`ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER`
- phase=`ROCPROFILER_CALLBACK_PHASE_LOAD`
- Repeats for each kernel symbol in code object
3. Application Execution
4. Callback: kernel symbol unload
- kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`
- operation=`ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER`
- phase=`ROCPROFILER_CALLBACK_PHASE_UNLOAD`
- Repeats for each kernel symbol in code object
5. Callback: code object unload
- kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`
- operation=`ROCPROFILER_CODE_OBJECT_LOAD`
- phase=`ROCPROFILER_CALLBACK_PHASE_UNLOAD`
Note: rocprofiler-sdk does not provide an interface to query this information outside of the
code object tracing service. If a tool needs to associate kernel names with kernel tracing records,
the tool is responsible for making a copy of the relevant information when the code objects and
kernel symbols are loaded. However, any constant string fields (such as the `const char* kernel_name` field)
do not need to be copied; these are guaranteed to be valid pointers until after rocprofiler-sdk finalization.
If a tool decides to delete their copy of the data associated with a given code object or kernel symbol
identifier when the code object and kernel symbols are unloaded, it is highly recommended to flush
any/all buffers which might contain references to that code object or kernel symbol identifiers before
deleting the associated data.
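One way a tool might keep its own copy of kernel symbol information is sketched below. This is a minimal illustration, not the SDK's data model: `kernel_symbol_info` and `kernel_symbol_registry` are hypothetical types, the fields are a simplified subset chosen for the example, and the kernel name is copied into a `std::string` purely for simplicity even though the text above notes the original pointer remains valid.

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical, simplified copy of the data a tool might retain per kernel symbol.
struct kernel_symbol_info
{
    std::uint64_t kernel_id      = 0;
    std::uint64_t code_object_id = 0;
    std::string   kernel_name    = {};
};

// Thread-safe lookup table keyed by kernel identifier (illustrative only).
class kernel_symbol_registry
{
public:
    // Call from the kernel symbol "load" callback.
    void on_load(const kernel_symbol_info& info)
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        m_symbols[info.kernel_id] = info;
    }

    // Call from the kernel symbol "unload" callback, ideally only after
    // flushing any buffers that may still reference this identifier.
    void on_unload(std::uint64_t kernel_id)
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        m_symbols.erase(kernel_id);
    }

    // Returns the kernel name, or "<unknown>" if the identifier is not registered.
    std::string name_of(std::uint64_t kernel_id) const
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        auto itr = m_symbols.find(kernel_id);
        return (itr != m_symbols.end()) ? itr->second.kernel_name : std::string{"<unknown>"};
    }

private:
    mutable std::mutex                                    m_mutex;
    std::unordered_map<std::uint64_t, kernel_symbol_info> m_symbols;
};
```

A kernel dispatch record handler could then call `name_of` with the kernel identifier from the record. The mutex is there because load callbacks and buffer callbacks may arrive on different threads.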
For a sample of code object tracing, please see the `samples/code_object_tracing` example in the
[rocprofiler-sdk GitHub repository](https://github.com/ROCm/rocprofiler-sdk).
@@ -1,3 +1,96 @@
# Runtime Intercept Tables
Discussion on how to access the raw runtime intercept tables of HSA and HIP (i.e. ExaTracer requirements by LTTng).
Although most tools will want to leverage the callback or buffer tracing services for tracing the HIP, HSA, and ROCTx
APIs, rocprofiler-sdk does provide access to the raw API dispatch tables. Each of the aforementioned APIs is
designed similarly to the following sample.
## Dispatch Table Overview
### Forward Declaration of public C API function
```cpp
extern "C"
{
// forward declaration of public C API function
int
foo(int) __attribute__((visibility("default")));
}
```
### Internal Implementation of API function
```cpp
namespace impl
{
int
foo(int val)
{
// real implementation
return (2 * val);
}
}
```
### Dispatch Table Implementation
```cpp
namespace impl
{
struct dispatch_table
{
int (*foo_fn)(int) = nullptr;
};
// invoked once: populates the dispatch_table with function pointers to implementation
dispatch_table*&
construct_dispatch_table()
{
static dispatch_table* tbl = new dispatch_table{};
tbl->foo_fn = impl::foo;
// in between above and below, rocprofiler-sdk gets passed the pointer
// to the dispatch table and has the opportunity to wrap the function
// pointers for interception
return tbl;
}
// constructs dispatch table and stores it in static variable
dispatch_table*
get_dispatch_table()
{
static dispatch_table*& tbl = construct_dispatch_table();
return tbl;
}
} // namespace impl
```
### Implementation of public C API function
```cpp
extern "C"
{
// implementation of public C API function
int
foo(int val)
{
return impl::get_dispatch_table()->foo_fn(val);
}
}
```
### Dispatch Table Chaining
rocprofiler-sdk is given an opportunity within `impl::construct_dispatch_table()` to
save the original value(s) of the function pointers such as `foo_fn` and install
its own function pointers in their place -- this results in the public C API function `foo`
calling into the rocprofiler-sdk function pointer, which then in turn, calls the original
function pointer to `impl::foo` (this is called "chaining"). Once rocprofiler-sdk
has made any necessary modifications to the dispatch table, tools which indicated
they also want access to the raw dispatch table via `rocprofiler_at_intercept_table_registration`
will be passed the pointer to the dispatch table.
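The chaining step can be sketched with the same `foo` example. This is a simplified illustration rather than what rocprofiler-sdk actually does internally: the `demo` namespace, `foo_wrapper`, `install_wrapper`, `saved_foo_fn`, and `intercepted_calls` are all hypothetical names introduced for this sketch.

```cpp
namespace demo
{
struct dispatch_table
{
    int (*foo_fn)(int) = nullptr;
};

// the "real" implementation, as in the sample above
int
foo_impl(int val)
{
    return (2 * val);
}

// state owned by the interceptor (hypothetical)
int (*saved_foo_fn)(int) = nullptr;  // original function pointer
int intercepted_calls    = 0;        // number of calls seen by the wrapper

// wrapper with the same signature as the original function
int
foo_wrapper(int val)
{
    ++intercepted_calls;       // work done before the original call
    return saved_foo_fn(val);  // chain to the original function pointer
}

// save the original pointer and install the wrapper in its place
void
install_wrapper(dispatch_table& tbl)
{
    saved_foo_fn = tbl.foo_fn;
    tbl.foo_fn   = foo_wrapper;
}
}  // namespace demo
```

After `install_wrapper` runs, a call through `tbl.foo_fn` first reaches `foo_wrapper`, which then forwards to `foo_impl`. A second interceptor could repeat the same save-and-replace step on top of the first, which is why the order in which tools receive the table matters.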
## Sample
For a demo of dispatch table chaining, please see the `samples/intercept_table` example in the
[rocprofiler-sdk GitHub repository](https://github.com/ROCm/rocprofiler-sdk).