Tracing Documentation (#997)

* Update callback_services.md
* Callback tracing services
* Intercept table
* Buffer tracing

[ROCm/rocprofiler-sdk commit: cfbac19640]
2024-08-01 02:59:35 -05:00
parent 6c6adddb5c
commit 8cea0cae2e
3 changed files with 668 additions and 7 deletions
@@ -2,12 +2,250 @@

 For the buffered approach, supported buffer record categories are enumerated in `rocprofiler_buffer_category_t` category field.

-## Buffered Tracing Services
-
 ## Overview

-In buffered approach, callbacks are receieved for batches of records from an internal (background) thread. Supported buffered tracing services are enumerated in  `rocprofiler_buffer_tracing_kind_t`.
+In the buffered approach, callbacks are received for batches of records from an internal (background) thread.
+Supported buffered tracing services are enumerated in `rocprofiler_buffer_tracing_kind_t`. Configuring
+a buffer tracing service requires the creation of a buffer. When the buffer is "flushed", either implicitly
+or explicitly, a callback to the tool will be invoked which provides an array of one or more buffer records.
+A buffer can be explicitly flushed via the `rocprofiler_flush_buffer` function.

-## HSA API Tracing
+## Subscribing to Buffer Tracing Services

-## Kernel Tracing
+During tool initialization, tools configure buffer tracing via the `rocprofiler_configure_buffer_tracing_service`
+function. However, before invoking `rocprofiler_configure_buffer_tracing_service`, the tool must create a buffer
+for the tracing records.
+
+### Creating a Buffer
+
+```cpp
+rocprofiler_status_t
+rocprofiler_create_buffer(rocprofiler_context_id_t        context,
+                          size_t                          size,
+                          size_t                          watermark,
+                          rocprofiler_buffer_policy_t     policy,
+                          rocprofiler_buffer_tracing_cb_t callback,
+                          void*                           callback_data,
+                          rocprofiler_buffer_id_t*        buffer_id);
+```
+
+The `size` parameter is the size of the buffer in bytes and will be rounded up to the nearest
+multiple of the memory page size (as reported by `sysconf(_SC_PAGESIZE)`); the memory page size
+on most Linux systems is 4096 bytes (4 KB).
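+
The rounding described above can be sketched as follows. This is only an illustration of the arithmetic: `round_up_to_page` and `effective_buffer_size` are hypothetical helper names, not rocprofiler-sdk functions, and the sketch assumes a POSIX system where `sysconf(_SC_PAGESIZE)` is available.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>

// Round `requested` up to the next multiple of `page_size`.
// Illustrative helper only; not part of the rocprofiler-sdk API.
constexpr std::size_t
round_up_to_page(std::size_t requested, std::size_t page_size)
{
    return ((requested + page_size - 1) / page_size) * page_size;
}

// Same computation using the page size reported by the operating system.
inline std::size_t
effective_buffer_size(std::size_t requested)
{
    const auto page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    return round_up_to_page(requested, page_size);
}
```

With a 4096-byte page, a requested size of 5000 bytes would therefore occupy 8192 bytes, while a requested size of exactly 4096 bytes is unchanged.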
+
+The `watermark` parameter specifies the number of bytes at which
+the buffer should be "flushed", i.e. when the records in the buffer should invoke the
+`callback` parameter to deliver the records to the tool. For example, if a buffer has a size
+of 4096 bytes and the watermark is set to 48 bytes, six 8-byte records can be placed in the
+buffer before `callback` is invoked. However, every 64-byte record that is placed in the
+buffer will trigger a flush. It is safe to set the `watermark` to any value between
+zero and the buffer size.
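+
The watermark arithmetic in the example above can be reproduced with a small model. This is a simplified sketch, not the rocprofiler-sdk implementation: it assumes a flush happens as soon as the bytes in use reach the watermark, and it ignores record headers, alignment, and buffer swapping. The names `watermark_model` and `count_flushes` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model of watermark-triggered flushing (an assumption made for
// illustration; the real buffer also accounts for headers and alignment).
struct watermark_model
{
    std::size_t watermark   = 0;  // flush threshold in bytes
    std::size_t used        = 0;  // bytes currently held in the buffer
    std::size_t flush_count = 0;  // number of times the callback would run

    // Place a record, then flush once the bytes in use reach the watermark.
    void push(std::size_t record_size)
    {
        used += record_size;
        if(used >= watermark)
        {
            ++flush_count;  // stands in for invoking `callback`
            used = 0;
        }
    }
};

// Number of flushes after pushing `count` records of `record_size` bytes each.
inline std::size_t
count_flushes(std::size_t watermark, std::size_t record_size, std::size_t count)
{
    watermark_model model;
    model.watermark = watermark;
    for(std::size_t i = 0; i < count; ++i)
        model.push(record_size);
    return model.flush_count;
}
```

Under this model, `count_flushes(48, 8, 6)` yields one flush (six 8-byte records fill the 48-byte watermark), and `count_flushes(48, 64, 3)` yields three flushes because every 64-byte record exceeds the watermark on its own.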
+
+The `policy` parameter specifies the behavior when a record is larger than the
+amount of free space in the current buffer. For example, if a buffer has a size of
+4000 bytes with a watermark set to 4000 bytes and 3998 of the bytes in the buffer
+have been populated with records, the `policy` dictates how to handle an incoming record
+larger than 2 bytes. The `ROCPROFILER_BUFFER_POLICY_DISCARD` policy dictates that all records
+larger than 2 bytes should be dropped until the tool _explicitly_ flushes the buffer via
+a `rocprofiler_flush_buffer` function call, whereas the `ROCPROFILER_BUFFER_POLICY_LOSSLESS`
+policy dictates that the current buffer should be swapped out for an empty buffer, the record
+should be placed in that new buffer, and the former (full) buffer should be _implicitly_ flushed.
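+
The difference between the two policies can be illustrated with a toy model of a single buffer. This is a sketch under stated assumptions, not the SDK's implementation: `overflow_policy` and `policy_model` are hypothetical stand-ins for `rocprofiler_buffer_policy_t` and the internal buffer, and the model assumes no single record is larger than the whole buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for rocprofiler_buffer_policy_t.
enum class overflow_policy
{
    discard,
    lossless
};

// Toy model of how a full buffer reacts to an incoming record.
class policy_model
{
public:
    policy_model(std::size_t capacity, overflow_policy policy)
    : m_capacity{capacity}
    , m_policy{policy}
    {}

    // Returns true if the record was stored, false if it was dropped.
    bool push(std::size_t record_size)
    {
        if(record_size > m_capacity - m_used)
        {
            if(m_policy == overflow_policy::discard)
            {
                ++m_dropped;  // record is lost until the tool flushes explicitly
                return false;
            }
            ++m_implicit_flushes;  // full buffer is delivered to the callback
            m_used = 0;            // an empty buffer takes its place
        }
        m_used += record_size;
        return true;
    }

    // Models the tool flushing the buffer explicitly.
    void explicit_flush() { m_used = 0; }

    std::size_t used() const { return m_used; }
    std::size_t dropped() const { return m_dropped; }
    std::size_t implicit_flushes() const { return m_implicit_flushes; }

private:
    std::size_t     m_capacity;
    overflow_policy m_policy;
    std::size_t     m_used             = 0;
    std::size_t     m_dropped          = 0;
    std::size_t     m_implicit_flushes = 0;
};
```

With a capacity of 4000 bytes and 3998 bytes in use, an 8-byte record is dropped in the `discard` model until `explicit_flush()` is called, whereas the `lossless` model records one implicit flush and stores the record in the fresh buffer.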
+
+The `callback` parameter is the function that rocprofiler-sdk should invoke when flushing
+the buffer; the value of the `callback_data` parameter will be passed as one of the arguments
+to the `callback` function.
+
+The `buffer_id` parameter is an output parameter for the function call and will have a
+non-zero handle field after successful buffer creation.
+
+### Creating a Dedicated Thread for Buffer Callbacks
+
+By default, all buffers will use the same (default) background thread created by rocprofiler-sdk to
+invoke their callback. However, rocprofiler-sdk provides an interface for tools to specify the
+creation of an additional background thread for one or more of their buffers.
+
+Callback threads for buffers are created via the `rocprofiler_create_callback_thread` function:
+
+```cpp
+rocprofiler_status_t
+rocprofiler_create_callback_thread(rocprofiler_callback_thread_t* cb_thread_id);
+```
+
+Buffers are assigned to that callback thread via the `rocprofiler_assign_callback_thread` function:
+
+```cpp
+rocprofiler_status_t
+rocprofiler_assign_callback_thread(rocprofiler_buffer_id_t       buffer_id,
+                                   rocprofiler_callback_thread_t cb_thread_id);
+```
+
+#### Buffer Callback Thread Creation and Assignment Example
+
+```cpp
+{
+    // create a context
+    auto context_id = rocprofiler_context_id_t{};
+    rocprofiler_create_context(&context_id);
+
+    // create a buffer associated with the context
+    auto buffer_id  = rocprofiler_buffer_id_t{};
+    rocprofiler_create_buffer(context_id, ..., &buffer_id);
+
+    // specify that a new callback thread should be created and
+    // store its identifier in the "thr_id" variable
+    auto thr_id = rocprofiler_callback_thread_t{};
+    rocprofiler_create_callback_thread(&thr_id);
+
+    // assign the buffer callback to be delivered on this thread
+    rocprofiler_assign_callback_thread(buffer_id, thr_id);
+}
+```
+
+### Configuring Buffer Tracing Services
+
+```cpp
+rocprofiler_status_t
+rocprofiler_configure_buffer_tracing_service(rocprofiler_context_id_t          context_id,
+                                             rocprofiler_buffer_tracing_kind_t kind,
+                                             rocprofiler_tracing_operation_t*  operations,
+                                             size_t                            operations_count,
+                                             rocprofiler_buffer_id_t           buffer_id);
+```
+
+The `kind` parameter is a high-level specifier of which service to trace (also known as a "domain").
+Domain examples include, but are not limited to, the HIP API, the HSA API, and kernel dispatches.
+For each domain, there are (often) various "operations", which can be used to restrict the callbacks
+to a subset within the domain. For domains which correspond to APIs, the "operations" are the functions
+which compose the API. If all operations in a domain should be traced, the `operations` and `operations_count`
+parameters can be set to `nullptr` and `0`, respectively. If the tracing domain should be restricted to a subset
+of operations, the tool library should specify a C-array of type `rocprofiler_tracing_operation_t` and the
+size of the array for the `operations` and `operations_count` parameters.
+
+Similar to `rocprofiler_configure_callback_tracing_service`,
+`rocprofiler_configure_buffer_tracing_service` will return an error if a buffer service for a given context
+and domain is configured more than once.
+
+#### Example
+
+```cpp
+{
+    auto ctx = rocprofiler_context_id_t{};
+    // ... creation of context, etc. ...
+
+    // buffer parameters
+    constexpr auto KB          = 1024;  // 1024 bytes
+    constexpr auto buffer_size = 16 * KB;
+    constexpr auto watermark   = 15 * KB;
+    constexpr auto policy      = ROCPROFILER_BUFFER_POLICY_LOSSLESS;
+
+    // buffer handle
+    auto buffer_id = rocprofiler_buffer_id_t{};
+
+    // create a buffer associated with the context
+    rocprofiler_create_buffer(
+        ctx, buffer_size, watermark, policy, callback_func, nullptr, &buffer_id);
+
+    // configure HIP runtime API function records to be placed in buffer
+    rocprofiler_configure_buffer_tracing_service(
+        ctx, ROCPROFILER_BUFFER_TRACING_HIP_RUNTIME_API, nullptr, 0, buffer_id);
+
+    // configure kernel dispatch records to be placed in buffer
+    // (more than one service can use the same buffer)
+    rocprofiler_configure_buffer_tracing_service(
+        ctx, ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH, nullptr, 0, buffer_id);
+
+    // ... etc. ...
+}
+```
+
+## Buffer Tracing Callback Function
+
+Rocprofiler-sdk buffer tracing callback functions have the signature:
+
+```cpp
+typedef void (*rocprofiler_buffer_tracing_cb_t)(rocprofiler_context_id_t      context,
+                                                rocprofiler_buffer_id_t       buffer_id,
+                                                rocprofiler_record_header_t** headers,
+                                                size_t                        num_headers,
+                                                void*                         data,
+                                                uint64_t                      drop_count);
+```
+
+The `rocprofiler_record_header_t` data type provides three pieces of information:
+
+1. Category (`rocprofiler_buffer_category_t`)
+2. Kind
+3. Payload
+
+The category is used to distinguish the classification of the buffer record. For all
+services configured via `rocprofiler_configure_buffer_tracing_service`, the category will
+be equal to the value of `ROCPROFILER_BUFFER_CATEGORY_TRACING`. The meaning of the kind
+field depends on the category, but when the category is `ROCPROFILER_BUFFER_CATEGORY_TRACING`,
+the kind value will be equal to the `rocprofiler_buffer_tracing_kind_t` value passed to
+`rocprofiler_configure_buffer_tracing_service`, e.g. `ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH`.
+Once the category and kind have been determined, the payload can be cast to the corresponding record type:
+
+```cpp
+{
+    if(header->category == ROCPROFILER_BUFFER_CATEGORY_TRACING &&
+       header->kind == ROCPROFILER_BUFFER_TRACING_HIP_RUNTIME_API)
+    {
+        auto* record =
+            static_cast<rocprofiler_buffer_tracing_hip_api_record_t*>(header->payload);
+
+        // ... etc. ...
+    }
+}
+```
+
+### Buffer Tracing Callback Function Example
+
+```cpp
+void
+buffer_callback_func(rocprofiler_context_id_t      context,
+                     rocprofiler_buffer_id_t       buffer_id,
+                     rocprofiler_record_header_t** headers,
+                     size_t                        num_headers,
+                     void*                         user_data,
+                     uint64_t                      drop_count)
+{
+    for(size_t i = 0; i < num_headers; ++i)
+    {
+        auto* header = headers[i];
+
+        if(header->category == ROCPROFILER_BUFFER_CATEGORY_TRACING &&
+           header->kind == ROCPROFILER_BUFFER_TRACING_HIP_RUNTIME_API)
+        {
+            auto* record =
+                static_cast<rocprofiler_buffer_tracing_hip_api_record_t*>(header->payload);
+
+            // ... etc. ...
+        }
+        else if(header->category == ROCPROFILER_BUFFER_CATEGORY_TRACING &&
+                header->kind == ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH)
+        {
+            auto* record =
+                static_cast<rocprofiler_buffer_tracing_kernel_dispatch_record_t*>(header->payload);
+
+            // ... etc. ...
+        }
+        else
+        {
+            throw std::runtime_error{"unhandled record header category + kind"};
+        }
+    }
+}
+```
+
+## Buffer Tracing Record
+
+Unlike callback tracing records, there is no common set of data for each buffer tracing record. However,
+many buffer tracing records contain a `kind` field and an `operation` field.
+The name of a tracing kind can be obtained via the `rocprofiler_query_buffer_tracing_kind_name` function.
+The name of an operation specific to a tracing kind can be obtained via the `rocprofiler_query_buffer_tracing_kind_operation_name`
+function. One can also iterate over all the buffer tracing kinds and operations for each tracing kind via the
+`rocprofiler_iterate_buffer_tracing_kinds` and `rocprofiler_iterate_buffer_tracing_kind_operations` functions.
+
+The buffer tracing record data types can be found in the `rocprofiler-sdk/buffer_tracing.h` header
+(`source/include/rocprofiler-sdk/buffer_tracing.h` in the [rocprofiler-sdk GitHub repository](https://github.com/ROCm/rocprofiler-sdk)).
@@ -2,6 +2,336 @@

 ## Overview

+Callback tracing services provide immediate callbacks to a tool on the current CPU thread when a given event occurs.
+For example, when tracing an API function, e.g. `hipSetDevice`, callback tracing invokes a user-specified callback
+before and after the traced function executes on the thread which is invoking the API function.
+
+## Subscribing to Callback Tracing Services
+
+During tool initialization, tools configure callback tracing via the `rocprofiler_configure_callback_tracing_service`
+function:
+
+```cpp
+rocprofiler_status_t
+rocprofiler_configure_callback_tracing_service(rocprofiler_context_id_t            context_id,
+                                               rocprofiler_callback_tracing_kind_t kind,
+                                               rocprofiler_tracing_operation_t*    operations,
+                                               size_t                              operations_count,
+                                               rocprofiler_callback_tracing_cb_t   callback,
+                                               void*                               callback_args);
+```
+
+The `kind` parameter is a high-level specifier of which service to trace (also known as a "domain").
+Domain examples include, but are not limited to, the HIP API, the HSA API, and kernel dispatches.
+For each domain, there are (often) various "operations", which can be used to restrict the callbacks
+to a subset within the domain. For domains which correspond to APIs, the "operations" are the functions
+which compose the API. If all operations in a domain should be traced, the `operations` and `operations_count`
+parameters can be set to `nullptr` and `0`, respectively. If the tracing domain should be restricted to a subset
+of operations, the tool library should specify a C-array of type `rocprofiler_tracing_operation_t` and the
+size of the array for the `operations` and `operations_count` parameters.
+
+`rocprofiler_configure_callback_tracing_service` will return an error if a callback service for a given context
+and domain is configured more than once. All operations of interest within a domain must therefore be
+specified in a single call. For example, if one only wanted to trace two functions within
+the HIP runtime API, `hipGetDevice` and `hipSetDevice`, the following code would accomplish this objective:
+
+```cpp
+{
+    auto ctx = rocprofiler_context_id_t{};
+    // ... creation of context, etc. ...
+
+    // array of operations (i.e. API functions)
+    auto operations = std::array<rocprofiler_tracing_operation_t, 2>{
+        ROCPROFILER_HIP_RUNTIME_API_ID_hipSetDevice,
+        ROCPROFILER_HIP_RUNTIME_API_ID_hipGetDevice
+    };
+
+    rocprofiler_configure_callback_tracing_service(ctx,
+                                                   ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API,
+                                                   operations.data(),
+                                                   operations.size(),
+                                                   callback_func,
+                                                   nullptr);
+    // ... etc. ...
+}
+```
+
+But the following code would be invalid because it configures the same domain for the same context more than once:
+
+```cpp
+{
+    auto ctx = rocprofiler_context_id_t{};
+    // ... creation of context, etc. ...
+
+    // array of operations (i.e. API functions)
+    auto operations = std::array<rocprofiler_tracing_operation_t, 2>{
+        ROCPROFILER_HIP_RUNTIME_API_ID_hipSetDevice,
+        ROCPROFILER_HIP_RUNTIME_API_ID_hipGetDevice
+    };
+
+    for(auto op : operations)
+    {
+        // after the first iteration, will return ROCPROFILER_STATUS_ERROR_SERVICE_ALREADY_CONFIGURED
+        rocprofiler_configure_callback_tracing_service(ctx,
+                                                       ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API,
+                                                       &op,
+                                                       1,
+                                                       callback_func,
+                                                       nullptr);
+    }
+
+    // ... etc. ...
+}
+```
+
+## Callback Tracing Callback Function
+
+Rocprofiler-sdk callback tracing callback functions have the signature:
+
+```cpp
+typedef void (*rocprofiler_callback_tracing_cb_t)(rocprofiler_callback_tracing_record_t record,
+                                                  rocprofiler_user_data_t*              user_data,
+                                                  void*                                 callback_data);
+```
+
+The `record` parameter contains the information to uniquely identify a tracing record type and has the
+following definition:
+
+```cpp
+typedef struct rocprofiler_callback_tracing_record_t
+{
+    rocprofiler_context_id_t            context_id;
+    rocprofiler_thread_id_t             thread_id;
+    rocprofiler_correlation_id_t        correlation_id;
+    rocprofiler_callback_tracing_kind_t kind;
+    uint32_t                            operation;
+    rocprofiler_callback_phase_t        phase;
+    void*                               payload;
+} rocprofiler_callback_tracing_record_t;
+```
+
+The underlying type of the `payload` field above is typically unique to a domain and, less frequently, an operation.
+For example, for the `ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API` and `ROCPROFILER_CALLBACK_TRACING_HIP_COMPILER_API` domains,
+the payload should be cast to `rocprofiler_callback_tracing_hip_api_data_t*`, which will contain the arguments
+to the function and (in the exit phase) the return value of the function. The payload field will only be a valid
+pointer during the invocation of the callback function(s).
+
+The `user_data` parameter can be used to store data in between callback phases. It is unique for every
+instance of an operation. For example, if the tool library wishes to store the timestamp of the
+`ROCPROFILER_CALLBACK_PHASE_ENTER` phase for the ensuing `ROCPROFILER_CALLBACK_PHASE_EXIT` callback,
+this data can be stored in a manner similar to the one below:
+
+```cpp
+void
+callback_func(rocprofiler_callback_tracing_record_t record,
+              rocprofiler_user_data_t*              user_data,
+              void*                                 cb_data)
+{
+    auto ts = rocprofiler_timestamp_t{};
+    rocprofiler_get_timestamp(&ts);
+
+    if(record.phase == ROCPROFILER_CALLBACK_PHASE_ENTER)
+    {
+        user_data->value = ts;
+    }
+    else if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT)
+    {
+        auto delta_ts = (ts - user_data->value);
+        // ... etc. ...
+    }
+    else
+    {
+        // ... etc. ...
+    }
+}
+```
+
+The `callback_data` argument will be the value of `callback_args` passed to `rocprofiler_configure_callback_tracing_service`
+in [the previous section](#subscribing-to-callback-tracing-services).
+
+## Callback Tracing Record
+
+The name of a tracing kind can be obtained via the `rocprofiler_query_callback_tracing_kind_name` function.
+The name of an operation specific to a tracing kind can be obtained via the `rocprofiler_query_callback_tracing_kind_operation_name`
+function. One can also iterate over all the callback tracing kinds and operations for each tracing kind via the
+`rocprofiler_iterate_callback_tracing_kinds` and `rocprofiler_iterate_callback_tracing_kind_operations` functions.
+Lastly, for a given `rocprofiler_callback_tracing_record_t` object, rocprofiler-sdk supports generically iterating over
+the arguments of the payload field for many domains.
+
+As mentioned above, within the `rocprofiler_callback_tracing_record_t` object,
+an opaque `void* payload` is provided for accessing domain-specific information.
+The data types generally follow the naming convention of `rocprofiler_callback_tracing_<DOMAIN>_data_t`,
+e.g., for the tracing kinds `ROCPROFILER_CALLBACK_TRACING_HSA_{CORE,AMD_EXT,IMAGE_EXT,FINALIZE_EXT}_API`,
+the payload should be cast to `rocprofiler_callback_tracing_hsa_api_data_t*`:
+
+```cpp
+void
+callback_func(rocprofiler_callback_tracing_record_t record,
+              rocprofiler_user_data_t*              user_data,
+              void*                                 cb_data)
+{
+    static auto hsa_domains = std::unordered_set<rocprofiler_callback_tracing_kind_t>{
+        ROCPROFILER_CALLBACK_TRACING_HSA_CORE_API,
+        ROCPROFILER_CALLBACK_TRACING_HSA_AMD_EXT_API,
+        ROCPROFILER_CALLBACK_TRACING_HSA_IMAGE_EXT_API,
+        ROCPROFILER_CALLBACK_TRACING_HSA_FINALIZE_EXT_API};
+
+    if(hsa_domains.count(record.kind) > 0)
+    {
+        auto* payload = static_cast<rocprofiler_callback_tracing_hsa_api_data_t*>(record.payload);
+
+        hsa_status_t status = payload->retval.hsa_status_t_retval;
+        if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT && status != HSA_STATUS_SUCCESS)
+        {
+            const char* _kind = nullptr;
+            const char* _operation = nullptr;
+
+            rocprofiler_query_callback_tracing_kind_name(record.kind, &_kind, nullptr);
+            rocprofiler_query_callback_tracing_kind_operation_name(
+                record.kind, record.operation, &_operation, nullptr);
+
+            // report that the HSA function returned a non-success status
+            fprintf(stderr, "[domain=%s] %s returned a non-zero exit code: %i\n", _kind, _operation, status);
+        }
+    }
+    else if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT)
+    {
+        // ... handle the exit phase of other domains ...
+    }
+    else
+    {
+        // ... etc. ...
+    }
+}
+```
+
+### Sample `rocprofiler_iterate_callback_tracing_kind_operation_args`
+
+```cpp
+int
+print_args(rocprofiler_callback_tracing_kind_t domain_idx,
+           uint32_t                            op_idx,
+           uint32_t                            arg_num,
+           const void* const                   arg_value_addr,
+           int32_t                             arg_indirection_count,
+           const char*                         arg_type,
+           const char*                         arg_name,
+           const char*                         arg_value_str,
+           int32_t                             arg_dereference_count,
+           void*                               data)
+{
+    if(arg_num == 0)
+    {
+        const char* _kind      = nullptr;
+        const char* _operation = nullptr;
+
+        rocprofiler_query_callback_tracing_kind_name(domain_idx, &_kind, nullptr);
+        rocprofiler_query_callback_tracing_kind_operation_name(
+            domain_idx, op_idx, &_operation, nullptr);
+
+        fprintf(stderr, "\n[%s] %s\n", _kind, _operation);
+    }
+
+    char* _arg_type = abi::__cxa_demangle(arg_type, nullptr, nullptr, nullptr);
+
+    fprintf(stderr, "    %u: %-18s %-16s = %s\n", arg_num, _arg_type, arg_name, arg_value_str);
+
+    free(_arg_type);
+
+    // unused in example
+    (void) arg_value_addr;
+    (void) arg_indirection_count;
+    (void) arg_dereference_count;
+    (void) data;
+
+    return 0;
+}
+
+void
+callback_func(rocprofiler_callback_tracing_record_t record,
+              rocprofiler_user_data_t*              user_data,
+              void*                                 cb_data)
+{
+    if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT &&
+       record.kind == ROCPROFILER_CALLBACK_TRACING_HIP_RUNTIME_API &&
+       (record.operation == ROCPROFILER_HIP_RUNTIME_API_ID_hipLaunchKernel ||
+        record.operation == ROCPROFILER_HIP_RUNTIME_API_ID_hipMemcpyAsync))
+    {
+        rocprofiler_iterate_callback_tracing_kind_operation_args(
+            record, print_args, record.phase, nullptr);
+    }
+}
+```
+
+Sample Output:
+
+```console
+
+[HIP_RUNTIME_API] hipLaunchKernel
+    0: void const*        function_address = 0x219308
+    1: rocprofiler_dim3_t numBlocks        = {z=1, y=310, x=310}
+    2: rocprofiler_dim3_t dimBlocks        = {z=1, y=32, x=32}
+    3: void**             args             = 0x7ffe6d8dd3c0
+    4: unsigned long      sharedMemBytes   = 0
+    5: ihipStream_t*      stream           = 0x17b40c0
+
+[HIP_RUNTIME_API] hipMemcpyAsync
+    0: void*              dst              = 0x7f06c7bbb010
+    1: void const*        src              = 0x7f0698800000
+    2: unsigned long      sizeBytes        = 393625600
+    3: hipMemcpyKind      kind             = DeviceToHost
+    4: ihipStream_t*      stream           = 0x25dfcf0
+```
+
 ## Code Object Tracing

-## HSA API Tracing
+The code object tracing service is a critical component for obtaining information regarding
+asynchronous activity on the GPU. The `rocprofiler_callback_tracing_code_object_load_data_t`
+payload (kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`, operation=`ROCPROFILER_CODE_OBJECT_LOAD`)
+provides a unique identifier for a bundle of one or more GPU kernel symbols which have been loaded
+for a specific GPU agent. For example, if your application is leveraging a multi-GPU system
+containing 4 Vega20 GPUs and 4 MI100 GPUs, there will be at least 8 code objects loaded: one code
+object for each GPU. Each code object will be associated with a set of kernel symbols:
+the `rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t` payload
+(kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`, operation=`ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER`)
+provides a globally unique identifier for the specific kernel symbol along with the kernel name and
+several other static properties of the kernel (e.g. scratch size, scalar general purpose register count, etc.).
+Note: two otherwise identical kernel symbols (same kernel name, scratch size, etc.) which are part of
+otherwise identical code objects that are loaded for different GPU agents ***will*** have unique
+kernel identifiers. Furthermore, if the same code object (and its kernel symbols) is unloaded and then
+re-loaded, that code object and all of its kernel symbols ***will*** be given new unique identifiers.
+
+In general, when a code object is loaded and unloaded, here is the sequence of events:
+
+1. Callback: code object load
+    - kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`
+    - operation=`ROCPROFILER_CODE_OBJECT_LOAD`
+    - phase=`ROCPROFILER_CALLBACK_PHASE_LOAD`
+2. Callback: kernel symbol load
+    - kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`
+    - operation=`ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER`
+    - phase=`ROCPROFILER_CALLBACK_PHASE_LOAD`
+    - Repeats for each kernel symbol in code object
+3. Application Execution
+4. Callback: kernel symbol unload
+    - kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`
+    - operation=`ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER`
+    - phase=`ROCPROFILER_CALLBACK_PHASE_UNLOAD`
+    - Repeats for each kernel symbol in code object
+5. Callback: code object unload
+    - kind=`ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT`
+    - operation=`ROCPROFILER_CODE_OBJECT_LOAD`
+    - phase=`ROCPROFILER_CALLBACK_PHASE_UNLOAD`
+
+Note: rocprofiler-sdk does not provide an interface to query this information outside of the
+code object tracing service. If a tool wishes to associate kernel names with kernel tracing records,
+the tool is responsible for making a copy of the relevant information when the code objects and
+kernel symbols are loaded (however, any constant string fields, such as the `const char* kernel_name` field,
+need not be copied; these are guaranteed to be valid pointers until after rocprofiler-sdk finalization).
+If a tool decides to delete its copy of the data associated with a given code object or kernel symbol
+identifier when the code object and kernel symbols are unloaded, it is highly recommended to flush
+any/all buffers which might contain references to that code object or kernel symbol identifier before
+deleting the associated data.
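+
One way a tool might keep its own copy of kernel symbol information is sketched below. Everything in this sketch is hypothetical: `kernel_symbol_info` only mirrors a few of the fields mentioned above and is not an SDK type, and `kernel_symbol_registry` is an illustrative name. A mutex is used on the assumption that code object callbacks arrive on application threads while buffer callbacks arrive on a background thread.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical tool-side copy of the kernel symbol fields a tool cares about.
struct kernel_symbol_info
{
    std::uint64_t kernel_id      = 0;
    std::uint64_t code_object_id = 0;
    // Pointer is stored as-is: per the note above, constant string fields
    // remain valid until after rocprofiler-sdk finalization.
    const char*   kernel_name    = nullptr;
    std::uint32_t scratch_size   = 0;
};

// Thread-safe lookup table from kernel identifier to the copied information.
class kernel_symbol_registry
{
public:
    // Call from the kernel symbol "load" callback.
    void on_load(const kernel_symbol_info& info)
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        m_symbols[info.kernel_id] = info;
    }

    // Call from the kernel symbol "unload" callback, ideally only after
    // flushing any buffers that may still reference this identifier.
    void on_unload(std::uint64_t kernel_id)
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        m_symbols.erase(kernel_id);
    }

    // Resolve a kernel identifier found in a kernel tracing record.
    std::string name_of(std::uint64_t kernel_id) const
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        auto itr = m_symbols.find(kernel_id);
        if(itr == m_symbols.end() || itr->second.kernel_name == nullptr)
            return "<unknown>";
        return itr->second.kernel_name;
    }

private:
    mutable std::mutex                                    m_mutex;
    std::unordered_map<std::uint64_t, kernel_symbol_info> m_symbols;
};
```

A buffer callback handling kernel dispatch records could then call `name_of` with the kernel identifier from the record, which is why the registry entry should outlive any unflushed records that refer to it.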
+
+For a sample of code object tracing, please see the `samples/code_object_tracing` example in the
+[rocprofiler-sdk GitHub repository](https://github.com/ROCm/rocprofiler-sdk).
@@ -1,3 +1,96 @@
 # Runtime Intercept Tables

-Discussion on how access the raw runtime intercept tables of HSA and HIP (i.e. ExaTracer requirements by LTTng).
+Although most tools will want to leverage the callback or buffer tracing services for tracing the HIP, HSA, and ROCTx
+APIs, rocprofiler-sdk does provide access to the raw API dispatch tables. Each of the aforementioned APIs is
+designed similarly to the following sample.
+
+## Dispatch Table Overview
+
+### Forward Declaration of public C API function
+
+```cpp
+extern "C"
+{
+// forward declaration of public C API function
+int
+foo(int) __attribute__((visibility("default")));
+}
+```
+
+### Internal Implementation of API function
+
+```cpp
+namespace impl
+{
+int
+foo(int val)
+{
+    // real implementation
+    return (2 * val);
+}
+}
+```
+
+### Dispatch Table Implementation
+
+```cpp
+namespace impl
+{
+struct dispatch_table
+{
+    int (*foo_fn)(int) = nullptr;
+};
+
+// invoked once: populates the dispatch_table with function pointers to implementation
+dispatch_table*&
+construct_dispatch_table()
+{
+    static dispatch_table* tbl = new dispatch_table{};
+    tbl->foo_fn                = impl::foo;
+
+    // in between above and below, rocprofiler-sdk gets passed the pointer
+    // to the dispatch table and has the opportunity to wrap the function
+    // pointers for interception
+
+    return tbl;
+}
+
+// constructs dispatch table and stores it in static variable
+dispatch_table*
+get_dispatch_table()
+{
+    static dispatch_table*& tbl = construct_dispatch_table();
+    return tbl;
+}
+}  // namespace impl
+```
+
+### Implementation of public C API function
+
+```cpp
+extern "C"
+{
+// implementation of public C API function
+int
+foo(int val)
+{
+    return impl::get_dispatch_table()->foo_fn(val);
+}
+}
+```
+
+### Dispatch Table Chaining
+
+rocprofiler-sdk is given an opportunity within `impl::construct_dispatch_table()` to
+save the original value(s) of the function pointers such as `foo_fn` and install
+its own function pointers in their place. This results in the public C API function `foo`
+calling into the rocprofiler-sdk function pointer, which in turn calls the original
+function pointer to `impl::foo` (this is called "chaining"). Once rocprofiler-sdk
+has made any necessary modifications to the dispatch table, tools which indicated
+they also want access to the raw dispatch table via `rocprofiler_at_intercept_table_registration`
+will be passed the pointer to the dispatch table.
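+
The chaining step itself can be sketched against the toy `impl::dispatch_table` from the overview. This is a minimal illustration, not rocprofiler-sdk code: the `tool` namespace, `foo_wrapper`, `chain`, and the call counter are all hypothetical, and the counter is not thread-safe.

```cpp
#include <cassert>

namespace impl
{
// Same shape as the dispatch table in the overview above.
struct dispatch_table
{
    int (*foo_fn)(int) = nullptr;
};

// Real implementation of the API function.
inline int
foo(int val)
{
    return (2 * val);
}
}  // namespace impl

namespace tool
{
// Saved pointer to whatever was in the table before this tool wrapped it.
int (*original_foo)(int) = nullptr;
// Number of intercepted calls (illustrative only; not thread-safe).
int call_count = 0;

// Wrapper installed in the table: does the tool's work, then forwards.
int
foo_wrapper(int val)
{
    ++call_count;
    return original_foo(val);
}

// Save the current function pointer and install the wrapper in its place.
void
chain(impl::dispatch_table* tbl)
{
    original_foo = tbl->foo_fn;
    tbl->foo_fn  = foo_wrapper;
}
}  // namespace tool
```

After `tool::chain(&tbl)` runs on a table whose `foo_fn` points at `impl::foo`, a call through `tbl.foo_fn(3)` increments the counter and still returns 6, because the wrapper forwards to the saved original pointer.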
+
+## Sample
+
+For a demo of dispatch table chaining, please see the `samples/intercept_table` example in the
+[rocprofiler-sdk GitHub repository](https://github.com/ROCm/rocprofiler-sdk).