1517a398bf
* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements Buffer Pool Design ------------------ Replace the fixed array-based double buffer with a dynamic pool design to fix race conditions that caused "internal correlation id was retired prematurely" errors. The original design had a race where flush callbacks could be delivered out-of-order: when buffer 0 fills and begins flushing, writes go to buffer 1. If buffer 1 fills before buffer 0's flush completes, the buffer index wraps back to 0 (which may still be flushing). Independent flush tasks submitted to the thread pool can complete out of order. The new pool design: - Uses a std::deque of buffer instances that grows as needed - Allocates buffers from the pool when the current buffer needs to flush - Serializes flushes with a mutex to ensure FIFO callback ordering - Returns buffers to the pool after flush completion - Eliminates the race between buffer selection and write operations New Unit Tests -------------- - buffer_correlation_ordering.cpp: Tests that API records are always delivered before their corresponding retirement records - buffer_ordering_stress.cpp: Stress tests buffer flush ordering under high contention with multiple threads rapidly filling buffers HSA Tool Hooks -------------- Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that waits for pending flush tasks before tool finalization, preventing "retired prematurely" errors during HSA shutdown. Sanitizer Improvements ---------------------- - LSAN: Set fast_unwind_on_malloc=1 to prevent deadlock in libgcc unwinder - LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup) - TSAN: Added suppression for false positive on C++11 thread-safe static initialization in create_write_functor - ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto - Disabled attachment tests for sanitizers due to library preloading issues Other Fixes ----------- - Thread-trace agent test: Use heap-allocated callback state - Correlation ID: Refactored reference counting and finalization ordering * [rocprofiler-sdk] Revert buffer pool design changes Revert buffer.cpp and buffer.hpp to the original double-buffer design from develop branch. The pool-based redesign introduced concerns about: - Signal safety (mutex vs atomic_flag) - API changes (flush() return type) - Complexity of the new design This revert removes: - Dynamic buffer pool with std::deque - std::mutex/condition_variable synchronization - buffer_correlation_ordering.cpp test - buffer_ordering_stress.cpp test The underlying buffer flush ordering issue will need to be addressed with a different approach that preserves the original API and synchronization characteristics. * [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization - Revert TOCTOU CAS loop change in sub_ref_count() - not needed with consistent checks - Add fini_status check in correlation_tracing_service::construct() with ROCP_CI_LOG warning - Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp, memory_allocation.cpp) - Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior: - hsa/queue.cpp (lines 105, 210) - hsa/async_copy.cpp (line 344) - hsa/hsa_barrier.cpp (line 43) - buffer.cpp (lines 107, 138, 185) This ensures no correlation IDs are created once finalization starts (fini_status != 0), preventing races between finalization and ongoing tracing operations. * [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation Buffer records are not guaranteed to arrive in any specific order. Tests and samples should use timestamps for temporal ordering validation instead. Changes: - samples/external_correlation_id_request: Replace 'retired prematurely' arrival order check with timestamp-based validation that retirement timestamp >= max(end_timestamps) for records with the same correlation ID - tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check - tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check - tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check Correlation IDs are not guaranteed to be monotonically increasing when records are sorted by timestamp. Temporal ordering should be validated using the timestamp fields in each record. * [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal Restore the SYSTEM keyword to target_include_directories for rocprofiler-sdk-fmt to match develop branch. * [rccl] Remove orphaned rocSHMEM gitlink Remove orphaned submodule reference that was introduced during a merge but never had a corresponding .gitmodules entry, causing CI failures with "fatal: no submodule mapping found in .gitmodules". * [rocprofiler-sdk] Add HSA ABI version 0x09 support Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09 which introduces hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release functions (added in rocr-runtime SWDEV-561708). * [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations This commit consolidates fixes for handling the finalization status during buffer flush operations across the SDK. Changes: - Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully when flushing buffers, as this indicates buffers were already flushed during finalization (not an error condition) - HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use > 0 check for fini_status to allow operations during finalization process - buffer.cpp: Revert fini_status checks to use > 0 for consistency - correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging to prevent correlation ID creation after finalization starts Files modified: - source/lib/rocprofiler-sdk-tool/tool.cpp - tests/tools/json-tool.cpp - source/lib/rocprofiler-sdk/tests/registration.cpp - source/lib/rocprofiler-sdk/tests/roctx.cpp - samples/api_buffered_tracing/client.cpp - samples/counter_collection/buffered_client.cpp - samples/counter_collection/device_counting_async_client.cpp - samples/external_correlation_id_request/client.cpp - samples/pc_sampling/client.cpp - source/lib/rocprofiler-sdk/buffer.cpp - source/lib/rocprofiler-sdk/context/correlation_id.cpp - source/lib/rocprofiler-sdk/hsa/queue.cpp - source/lib/rocprofiler-sdk/hsa/async_copy.cpp - source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp * [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling Remove the hsa_tool_hooks infrastructure and simplify buffer flush calls in samples and tools. The ERROR_FINALIZED handling was overly complex and the hsa_tool_hooks OnUnload synchronization is no longer needed. Changes: - Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code - Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL - Simplify buffer flush in tool.cpp and json-tool.cpp - Remove ERROR_FINALIZED special handling from test files Co-Authored-By: Claude <noreply@anthropic.com> * [rocprofiler-sdk] Fix output_stream move semantics to null source pointers The default move constructor and move assignment operator for output_stream did not null out the source's pointers after the move. This caused double-close when the moved-from temporary was destroyed, leading to use-after-free crashes (SIGSEGV in std::ostream::sentry). Co-Authored-By: Claude <noreply@anthropic.com> * [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration - generatePerfetto.cpp: Move output_stream into shared_state to prevent use-after-free race conditions during Perfetto callback execution - run-ci.py: Simplify and consolidate sanitizer environment variable configuration for better maintainability Co-Authored-By: Claude <noreply@anthropic.com> * [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions The previous changes removed MEMCHECK_SANITIZER_OPTIONS which is required for CTest to properly pass suppression files to the sanitizers during memcheck runs. Co-Authored-By: Claude <noreply@anthropic.com> * Revert "[rccl] Remove orphaned rocSHMEM gitlink" This reverts commit 1ad21003941355658fff8114fa27768f11a948f7. * [rocprofiler-sdk] Revert registration.cpp changes Revert changes to registration.cpp to match develop branch. Co-Authored-By: Claude <noreply@anthropic.com> * [rocprofiler-sdk] Remove suppression file content printing from run-ci.py Co-Authored-By: Claude <noreply@anthropic.com> * Fix output_stream move ctor/assignment operator * Fix erroneous revert of registration.cpp * Fix handling of fini status in correlation ID construction * [rocprofiler-sdk] Fix OMPT segfault during finalization Add nullptr checks in OMPT tracing code to handle the case where correlation_tracing_service::construct() returns nullptr during finalization. This fixes segfaults in openmp-target-sample and tests.integration.execute.openmp-tools. The correlation ID construction now returns nullptr when fini_status > 0, but the OMPT callbacks were not checking for this, causing crashes when dereferencing the null pointer during OpenMP runtime shutdown. Changes: - event_common(): Return nullptr early if correlation ID is null - event(): Check for nullptr before calling sub_ref_count() - ompt_task_create_callback(): Return early if correlation ID is null - ompt_task_schedule_callback(): Return early if correlation ID is null * [rocprofiler-sdk] Fix HSA API tracing segfault during finalization Add nullptr check in hsa_api_impl::functor after correlation ID construction. During finalization, correlation_service::construct() returns nullptr, and without this check the code would dereference the null pointer when accessing corr_id->internal. This fixes the SEGV at address 0x000000000008 (null + 8 byte offset) that occurs when HSA async event threads call hsa_signal_destroy during runtime shutdown after finalization has started. --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
1186 satır
46 KiB
C++
1186 satır
46 KiB
C++
// MIT License
|
|
//
|
|
// Copyright (c) 2023-2025 Advanced Micro Devices, Inc. All rights reserved.
|
|
//
|
|
// Permission is hereby granted, free of charge, to any person obtaining a copy
|
|
// of this software and associated documentation files (the "Software"), to deal
|
|
// in the Software without restriction, including without limitation the rights
|
|
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
|
// copies of the Software, and to permit persons to whom the Software is
|
|
// furnished to do so, subject to the following conditions:
|
|
//
|
|
// The above copyright notice and this permission notice shall be included in
|
|
// all copies or substantial portions of the Software.
|
|
//
|
|
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
|
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
|
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
|
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
|
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
|
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
|
// THE SOFTWARE.
|
|
|
|
#include "lib/rocprofiler-sdk/ompt/ompt.hpp"
|
|
#include "lib/common/logging.hpp"
|
|
#include "lib/common/string_entry.hpp"
|
|
#include "lib/common/utility.hpp"
|
|
#include "lib/rocprofiler-sdk/context/correlation_id.hpp"
|
|
#include "lib/rocprofiler-sdk/tracing/fwd.hpp"
|
|
#include "lib/rocprofiler-sdk/tracing/tracing.hpp"
|
|
|
|
#include <rocprofiler-sdk/buffer.h>
|
|
#include <rocprofiler-sdk/buffer_tracing.h>
|
|
#include <rocprofiler-sdk/callback_tracing.h>
|
|
#include <rocprofiler-sdk/external_correlation.h>
|
|
#include <rocprofiler-sdk/fwd.h>
|
|
#include <rocprofiler-sdk/ompt.h>
|
|
#include <rocprofiler-sdk/ompt/api_args.h>
|
|
#include <rocprofiler-sdk/ompt/omp-tools.h>
|
|
|
|
#include <glog/logging.h>
|
|
|
|
#include <cstddef>
|
|
#include <cstdint>
|
|
#include <unordered_map>
|
|
#include <utility>
|
|
|
|
namespace rocprofiler
|
|
{
|
|
namespace ompt
|
|
{
|
|
namespace
|
|
{
|
|
ompt_table&
|
|
get_table();
|
|
|
|
struct ompt_table_lookup
|
|
{
|
|
using type = ompt_table;
|
|
auto& operator()(type& _v) const { return _v; }
|
|
auto& operator()(type* _v) const { return *_v; }
|
|
auto& operator()() const { return (*this)(get_table()); }
|
|
};
|
|
} // namespace
|
|
} // namespace ompt
|
|
} // namespace rocprofiler
|
|
|
|
#define ROCPROFILER_LIB_ROCPROFILER_OMPT_OMPT_CPP_IMPL 1
|
|
#include "ompt.def.cpp"
|
|
#undef ROCPROFILER_LIB_ROCPROFILER_OMPT_OMPT_CPP_IMPL
|
|
|
|
namespace rocprofiler
|
|
{
|
|
namespace ompt
|
|
{
|
|
namespace
|
|
{
|
|
auto&
|
|
get_ompt_state_stack()
|
|
{
|
|
// for callbacks that don't have a place to stash context, we assume
|
|
// a per-thread stack. otherwise we stash the saved state in the ompt_data_t field.
|
|
static thread_local auto _v = tracing::small_vector_t<ompt_save_state*, 8>{};
|
|
return _v;
|
|
}
|
|
|
|
auto*
|
|
get_ompt_data_proxy()
|
|
{
|
|
static auto*& _v = common::static_object<ompt_data_proxy>::construct();
|
|
return _v;
|
|
}
|
|
|
|
// Macros for access to appropriate ompt_data_t* proxy
|
|
#define CLIENT(name) (CHECK_NOTNULL(get_ompt_data_proxy())->get_client_ptr(name))
|
|
#define INTERNAL(name) (CHECK_NOTNULL(get_ompt_data_proxy())->get_internal_ptr(name))
|
|
|
|
void
|
|
ompt_thread_begin_callback(ompt_thread_t thread_type, ompt_data_t* thread_data)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_thread_begin>::event(thread_type, CLIENT(thread_data));
|
|
}
|
|
|
|
void
|
|
ompt_thread_end_callback(ompt_data_t* thread_data)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_thread_end>::event(CLIENT(thread_data));
|
|
}
|
|
|
|
void
|
|
ompt_parallel_begin_callback(ompt_data_t* encountering_task_data,
|
|
const ompt_frame_t* encountering_task_frame,
|
|
ompt_data_t* parallel_data,
|
|
unsigned int requested_parallelism,
|
|
int flags,
|
|
const void* codeptr_ra)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_parallel_begin>::event(CLIENT(encountering_task_data),
|
|
encountering_task_frame,
|
|
CLIENT(parallel_data),
|
|
requested_parallelism,
|
|
flags,
|
|
codeptr_ra);
|
|
}
|
|
|
|
void
|
|
ompt_parallel_end_callback(ompt_data_t* parallel_data,
|
|
ompt_data_t* encountering_task_data,
|
|
int flags,
|
|
const void* codeptr_ra)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_parallel_end>::event(
|
|
CLIENT(parallel_data), CLIENT(encountering_task_data), flags, codeptr_ra);
|
|
}
|
|
|
|
void
|
|
ompt_task_create_callback(ompt_data_t* encountering_task_data,
|
|
const ompt_frame_t* encountering_task_frame,
|
|
ompt_data_t* new_task_data,
|
|
int flags,
|
|
int has_dependences,
|
|
const void* codeptr_ra)
|
|
{
|
|
auto* corr_id =
|
|
ompt_impl<ROCPROFILER_OMPT_ID_task_create>::event_common(CLIENT(encountering_task_data),
|
|
encountering_task_frame,
|
|
CLIENT(new_task_data),
|
|
flags,
|
|
has_dependences,
|
|
codeptr_ra);
|
|
if(!corr_id) return; // During finalization
|
|
|
|
auto* state = new ompt_task_save_state{corr_id, flags};
|
|
INTERNAL(new_task_data)->ptr = state;
|
|
|
|
context::pop_latest_correlation_id(corr_id);
|
|
}
|
|
|
|
void
|
|
ompt_task_schedule_callback(ompt_data_t* prior_task_data,
|
|
ompt_task_status_t prior_task_status,
|
|
ompt_data_t* next_task_data)
|
|
{
|
|
auto* corr_id = ompt_impl<ROCPROFILER_OMPT_ID_task_schedule>::event_common(
|
|
CLIENT(prior_task_data), prior_task_status, CLIENT(next_task_data));
|
|
if(!corr_id) return; // During finalization
|
|
context::pop_latest_correlation_id(corr_id);
|
|
corr_id->sub_ref_count();
|
|
|
|
/* Warning: some tasks like early_fulfill may be scheduled
|
|
* out twice. The ordering between the early_fulfill and the complete
|
|
* (for example) is not specified. In this case the prior_task_state
|
|
* needs to be added to the early return if condition below.
|
|
*/
|
|
auto* pprior = INTERNAL(prior_task_data);
|
|
auto* pnext = INTERNAL(next_task_data);
|
|
assert(pprior != nullptr);
|
|
auto* state_prior = reinterpret_cast<ompt_task_save_state*>(pprior->ptr);
|
|
if(state_prior == nullptr)
|
|
ROCP_FATAL << "state_prior == nullptr prior_task_status: " << prior_task_status << ".";
|
|
|
|
auto* state_next = pnext ? reinterpret_cast<ompt_task_save_state*>(pnext->ptr) : nullptr;
|
|
auto* prior_corrid = context::get_latest_correlation_id();
|
|
if(state_prior->corr_id == prior_corrid && state_prior->task_flags != 0)
|
|
{
|
|
// pop the current correlation ID (for the prior_task)
|
|
assert((state_prior->task_flags & 0xFF) == ompt_task_explicit);
|
|
context::pop_latest_correlation_id(prior_corrid);
|
|
}
|
|
if(state_next && (state_next->task_flags & 0xFF) == ompt_task_explicit)
|
|
{
|
|
// push the next correlation ID (for the next_task)
|
|
context::push_correlation_id(state_next->corr_id);
|
|
}
|
|
if(prior_task_status == ompt_task_yield || prior_task_status == ompt_task_detach ||
|
|
prior_task_status == ompt_task_switch || prior_task_status == ompt_task_early_fulfill)
|
|
return;
|
|
// the prior task is done
|
|
assert(state_prior != nullptr);
|
|
assert(state_prior->task_flags != 0);
|
|
if(prior_task_status == ompt_task_complete)
|
|
{
|
|
// FIXME? do we need to decrement the ref count
|
|
// state_prior->corr_id->sub_ref_count();
|
|
delete state_prior;
|
|
pprior->ptr = nullptr;
|
|
}
|
|
}
|
|
|
|
void
|
|
ompt_implicit_task_callback(ompt_scope_endpoint_t endpoint,
|
|
ompt_data_t* parallel_data,
|
|
ompt_data_t* task_data,
|
|
unsigned int actual_parallelism,
|
|
unsigned int index,
|
|
int flags)
|
|
{
|
|
if(endpoint == ompt_scope_begin)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_implicit_task>::begin(INTERNAL(task_data),
|
|
endpoint,
|
|
CLIENT(parallel_data),
|
|
CLIENT(task_data),
|
|
actual_parallelism,
|
|
index,
|
|
flags);
|
|
}
|
|
else if(endpoint == ompt_scope_end)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_implicit_task>::end(INTERNAL(task_data),
|
|
endpoint,
|
|
CLIENT(parallel_data),
|
|
CLIENT(task_data),
|
|
actual_parallelism,
|
|
index,
|
|
flags);
|
|
}
|
|
else
|
|
{
|
|
ROCP_FATAL << "endpoint in implicit_task is not begin or end: " << endpoint;
|
|
}
|
|
}
|
|
|
|
void
|
|
ompt_device_initialize_callback(int device_num,
|
|
const char* type,
|
|
ompt_device_t* device,
|
|
ompt_function_lookup_t lookup,
|
|
const char* documentation)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_device_initialize>::event(
|
|
device_num, type, device, lookup, documentation);
|
|
}
|
|
|
|
void
|
|
ompt_device_finalize_callback(int device_num)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_device_finalize>::event(device_num);
|
|
}
|
|
|
|
void
|
|
ompt_device_load_callback(int device_num,
|
|
const char* filename,
|
|
int64_t offset_in_file,
|
|
void* vma_in_file,
|
|
size_t bytes,
|
|
void* host_addr,
|
|
void* device_addr,
|
|
uint64_t module_id)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_device_load>::event(device_num,
|
|
filename,
|
|
offset_in_file,
|
|
vma_in_file,
|
|
bytes,
|
|
host_addr,
|
|
device_addr,
|
|
module_id);
|
|
}
|
|
|
|
// void
|
|
// ompt_device_unload_callback(int device_num, uint64_t module_id)
|
|
// {
|
|
// ompt_impl<ROCPROFILER_OMPT_ID_device_unload>::event(device_num, module_id);
|
|
// }
|
|
|
|
void
|
|
ompt_sync_region_wait_callback(ompt_sync_region_t kind,
|
|
ompt_scope_endpoint_t endpoint,
|
|
ompt_data_t* parallel_data,
|
|
ompt_data_t* task_data,
|
|
const void* codeptr_ra)
|
|
{
|
|
if(endpoint == ompt_scope_begin)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_sync_region_wait>::begin(
|
|
nullptr, kind, endpoint, CLIENT(parallel_data), CLIENT(task_data), codeptr_ra);
|
|
}
|
|
else if(endpoint == ompt_scope_end)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_sync_region_wait>::end(
|
|
nullptr, kind, endpoint, CLIENT(parallel_data), CLIENT(task_data), codeptr_ra);
|
|
}
|
|
else
|
|
{
|
|
ROCP_FATAL << "endpoint in sync_region_wait is not begin or end: " << endpoint;
|
|
}
|
|
}
|
|
|
|
void
|
|
ompt_mutex_released_callback(ompt_mutex_t kind, ompt_wait_id_t wait_id, const void* codeptr_ra)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_mutex_released>::event(kind, wait_id, codeptr_ra);
|
|
}
|
|
|
|
void
|
|
ompt_dependences_callback(ompt_data_t* task_data, const ompt_dependence_t* deps, int ndeps)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_dependences>::event(CLIENT(task_data), deps, ndeps);
|
|
}
|
|
|
|
void
|
|
ompt_task_dependence_callback(ompt_data_t* src_task_data, ompt_data_t* sink_task_data)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_task_dependence>::event(CLIENT(src_task_data),
|
|
CLIENT(sink_task_data));
|
|
}
|
|
|
|
void
|
|
ompt_work_callback(ompt_work_t work_type,
|
|
ompt_scope_endpoint_t endpoint,
|
|
ompt_data_t* parallel_data,
|
|
ompt_data_t* task_data,
|
|
uint64_t count,
|
|
const void* codeptr_ra)
|
|
{
|
|
if(endpoint == ompt_scope_begin)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_work>::begin(nullptr,
|
|
work_type,
|
|
endpoint,
|
|
CLIENT(parallel_data),
|
|
CLIENT(task_data),
|
|
count,
|
|
codeptr_ra);
|
|
}
|
|
else if(endpoint == ompt_scope_end)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_work>::end(nullptr,
|
|
work_type,
|
|
endpoint,
|
|
CLIENT(parallel_data),
|
|
CLIENT(task_data),
|
|
count,
|
|
codeptr_ra);
|
|
}
|
|
else
|
|
{
|
|
ROCP_FATAL << "endpoint in work is not begin or end: " << endpoint;
|
|
}
|
|
}
|
|
|
|
void
|
|
ompt_masked_callback(ompt_scope_endpoint_t endpoint,
|
|
ompt_data_t* parallel_data,
|
|
ompt_data_t* task_data,
|
|
const void* codeptr_ra)
|
|
{
|
|
if(endpoint == ompt_scope_begin)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_masked>::begin(
|
|
nullptr, endpoint, CLIENT(parallel_data), CLIENT(task_data), codeptr_ra);
|
|
}
|
|
else if(endpoint == ompt_scope_end)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_masked>::end(
|
|
nullptr, endpoint, CLIENT(parallel_data), CLIENT(task_data), codeptr_ra);
|
|
}
|
|
else
|
|
{
|
|
ROCP_FATAL << "endpoint in masked is not begin or end: " << endpoint;
|
|
}
|
|
}
|
|
|
|
void
|
|
ompt_target_map_callback(ompt_id_t target_id,
|
|
unsigned int nitems,
|
|
void** host_addr,
|
|
void** device_addr,
|
|
size_t* bytes,
|
|
unsigned int* mapping_flags,
|
|
const void* codeptr_ra)
|
|
{
|
|
common::consume_args(
|
|
target_id, nitems, host_addr, device_addr, bytes, mapping_flags, codeptr_ra);
|
|
}
|
|
|
|
void
|
|
ompt_sync_region_callback(ompt_sync_region_t kind,
|
|
ompt_scope_endpoint_t endpoint,
|
|
ompt_data_t* parallel_data,
|
|
ompt_data_t* task_data,
|
|
const void* codeptr_ra)
|
|
{
|
|
if(endpoint == ompt_scope_begin)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_sync_region>::begin(
|
|
nullptr, kind, endpoint, CLIENT(parallel_data), CLIENT(task_data), codeptr_ra);
|
|
}
|
|
else if(endpoint == ompt_scope_end)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_sync_region>::end(
|
|
nullptr, kind, endpoint, CLIENT(parallel_data), CLIENT(task_data), codeptr_ra);
|
|
}
|
|
else
|
|
{
|
|
ROCP_FATAL << "endpoint in sync_region is not begin or end: " << endpoint;
|
|
}
|
|
}
|
|
|
|
void
|
|
ompt_lock_init_callback(ompt_mutex_t kind,
|
|
unsigned int hint,
|
|
unsigned int impl,
|
|
ompt_wait_id_t wait_id,
|
|
const void* codeptr_ra)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_lock_init>::event(kind, hint, impl, wait_id, codeptr_ra);
|
|
}
|
|
|
|
void
|
|
ompt_lock_destroy_callback(ompt_mutex_t kind, ompt_wait_id_t wait_id, const void* codeptr_ra)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_lock_destroy>::event(kind, wait_id, codeptr_ra);
|
|
}
|
|
|
|
void
|
|
ompt_mutex_acquire_callback(ompt_mutex_t kind,
|
|
unsigned int hint,
|
|
unsigned int impl,
|
|
ompt_wait_id_t wait_id,
|
|
const void* codeptr_ra)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_mutex_acquire>::event(kind, hint, impl, wait_id, codeptr_ra);
|
|
}
|
|
|
|
void
|
|
ompt_mutex_acquired_callback(ompt_mutex_t kind, ompt_wait_id_t wait_id, const void* codeptr_ra)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_mutex_acquired>::event(kind, wait_id, codeptr_ra);
|
|
}
|
|
|
|
void
|
|
ompt_nest_lock_callback(ompt_scope_endpoint_t endpoint,
|
|
ompt_wait_id_t wait_id,
|
|
const void* codeptr_ra)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_nest_lock>::event(endpoint, wait_id, codeptr_ra);
|
|
}
|
|
|
|
void
|
|
ompt_flush_callback(ompt_data_t* thread_data, const void* codeptr_ra)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_flush>::event(CLIENT(thread_data), codeptr_ra);
|
|
}
|
|
|
|
void
|
|
ompt_cancel_callback(ompt_data_t* task_data, int flags, const void* codeptr_ra)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_cancel>::event(CLIENT(task_data), flags, codeptr_ra);
|
|
}
|
|
|
|
void
|
|
ompt_reduction_callback(ompt_sync_region_t kind,
|
|
ompt_scope_endpoint_t endpoint,
|
|
ompt_data_t* parallel_data,
|
|
ompt_data_t* task_data,
|
|
const void* codeptr_ra)
|
|
{
|
|
if(endpoint == ompt_scope_begin)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_reduction>::begin(
|
|
nullptr, kind, endpoint, CLIENT(parallel_data), CLIENT(task_data), codeptr_ra);
|
|
}
|
|
else if(endpoint == ompt_scope_end)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_reduction>::end(
|
|
nullptr, kind, endpoint, CLIENT(parallel_data), CLIENT(task_data), codeptr_ra);
|
|
}
|
|
else
|
|
{
|
|
ROCP_FATAL << "endpoint in reduction is not begin or end: " << endpoint;
|
|
}
|
|
}
|
|
|
|
void
|
|
ompt_dispatch_callback(ompt_data_t* parallel_data,
|
|
ompt_data_t* task_data,
|
|
ompt_dispatch_t kind,
|
|
ompt_data_t instance)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_dispatch>::event(
|
|
CLIENT(parallel_data), CLIENT(task_data), kind, instance);
|
|
}
|
|
|
|
void
|
|
ompt_target_emi_callback(ompt_target_t kind,
|
|
ompt_scope_endpoint_t endpoint,
|
|
int device_num,
|
|
ompt_data_t* task_data,
|
|
ompt_data_t* target_task_data,
|
|
ompt_data_t* target_data,
|
|
const void* codeptr_ra)
|
|
{
|
|
if(endpoint == ompt_scope_begin)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_target_emi>::begin(INTERNAL(target_data),
|
|
kind,
|
|
endpoint,
|
|
device_num,
|
|
CLIENT(task_data),
|
|
CLIENT(target_task_data),
|
|
CLIENT(target_data),
|
|
codeptr_ra);
|
|
}
|
|
else if(endpoint == ompt_scope_end)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_target_emi>::end(INTERNAL(target_data),
|
|
kind,
|
|
endpoint,
|
|
device_num,
|
|
CLIENT(task_data),
|
|
CLIENT(target_task_data),
|
|
CLIENT(target_data),
|
|
codeptr_ra);
|
|
}
|
|
else
|
|
{
|
|
ROCP_FATAL << "endpoint in target_emi is not begin or end: " << endpoint;
|
|
}
|
|
}
|
|
|
|
void
|
|
ompt_target_data_op_emi_callback(ompt_scope_endpoint_t endpoint,
|
|
ompt_data_t* target_task_data,
|
|
ompt_data_t* target_data,
|
|
ompt_id_t* host_op_id,
|
|
ompt_target_data_op_t optype,
|
|
void* src_address,
|
|
int src_device_num,
|
|
void* dst_address,
|
|
int dst_device_num,
|
|
size_t bytes,
|
|
const void* codeptr_ra)
|
|
{
|
|
auto* _host_op_data = reinterpret_cast<ompt_data_t*>(host_op_id);
|
|
if(endpoint == ompt_scope_begin)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_target_data_op_emi>::begin(INTERNAL(_host_op_data),
|
|
endpoint,
|
|
CLIENT(target_task_data),
|
|
CLIENT(target_data),
|
|
CLIENT(_host_op_data),
|
|
optype,
|
|
src_address,
|
|
src_device_num,
|
|
dst_address,
|
|
dst_device_num,
|
|
bytes,
|
|
codeptr_ra);
|
|
}
|
|
else if(endpoint == ompt_scope_end)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_target_data_op_emi>::end(INTERNAL(_host_op_data),
|
|
endpoint,
|
|
CLIENT(target_task_data),
|
|
CLIENT(target_data),
|
|
CLIENT(_host_op_data),
|
|
optype,
|
|
src_address,
|
|
src_device_num,
|
|
dst_address,
|
|
dst_device_num,
|
|
bytes,
|
|
codeptr_ra);
|
|
}
|
|
else
|
|
{
|
|
ROCP_FATAL << "endpoint in target_data_op_emi is not begin or end: " << endpoint;
|
|
}
|
|
}
|
|
|
|
void
|
|
ompt_target_submit_emi_callback(ompt_scope_endpoint_t endpoint,
|
|
ompt_data_t* target_data,
|
|
ompt_id_t* host_op_id,
|
|
unsigned int requested_num_teams)
|
|
{
|
|
auto* _host_op_data = reinterpret_cast<ompt_data_t*>(host_op_id);
|
|
if(endpoint == ompt_scope_begin)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_target_submit_emi>::begin(INTERNAL(_host_op_data),
|
|
endpoint,
|
|
CLIENT(target_data),
|
|
CLIENT(_host_op_data),
|
|
requested_num_teams);
|
|
}
|
|
else if(endpoint == ompt_scope_end)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_target_submit_emi>::end(INTERNAL(_host_op_data),
|
|
endpoint,
|
|
CLIENT(target_data),
|
|
CLIENT(_host_op_data),
|
|
requested_num_teams);
|
|
}
|
|
else
|
|
{
|
|
ROCP_FATAL << "endpoint in target_submit_emi is not begin or end: " << endpoint;
|
|
}
|
|
(void) target_data;
|
|
}
|
|
|
|
// void
|
|
// ompt_target_map_emi_callback(ompt_data_t* target_data,
|
|
// unsigned int nitems,
|
|
// void** host_addr,
|
|
// void** device_addr,
|
|
// size_t* bytes,
|
|
// unsigned int* mapping_flags,
|
|
// const void* codeptr_ra)
|
|
// {
|
|
// common::consume_args(
|
|
// target_data, nitems, host_addr, device_addr, bytes, mapping_flags, codeptr_ra);
|
|
// }
|
|
|
|
void
|
|
ompt_error_callback(ompt_severity_t severity,
|
|
const char* message,
|
|
size_t length,
|
|
const void* codeptr_ra)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_error>::event(severity, message, length, codeptr_ra);
|
|
}
|
|
#undef CLIENT
|
|
#undef INTERNAL
|
|
|
|
// The ompt callback table
|
|
ompt_table ompt_callback_table = {
|
|
ompt_thread_begin_callback,
|
|
ompt_thread_end_callback,
|
|
ompt_parallel_begin_callback,
|
|
ompt_parallel_end_callback,
|
|
ompt_task_create_callback,
|
|
ompt_task_schedule_callback,
|
|
ompt_implicit_task_callback,
|
|
ompt_device_initialize_callback,
|
|
ompt_device_finalize_callback,
|
|
ompt_device_load_callback,
|
|
// ompt_device_unload_callback,
|
|
ompt_sync_region_wait_callback,
|
|
ompt_mutex_released_callback,
|
|
ompt_dependences_callback,
|
|
ompt_task_dependence_callback,
|
|
ompt_work_callback,
|
|
ompt_masked_callback,
|
|
ompt_target_map_callback,
|
|
ompt_sync_region_callback,
|
|
ompt_lock_init_callback,
|
|
ompt_lock_destroy_callback,
|
|
ompt_mutex_acquire_callback,
|
|
ompt_mutex_acquired_callback,
|
|
ompt_nest_lock_callback,
|
|
ompt_flush_callback,
|
|
ompt_cancel_callback,
|
|
ompt_reduction_callback,
|
|
ompt_dispatch_callback,
|
|
ompt_target_emi_callback,
|
|
ompt_target_data_op_emi_callback,
|
|
ompt_target_submit_emi_callback,
|
|
// ompt_target_map_emi_callback,
|
|
ompt_error_callback,
|
|
};
|
|
|
|
ompt_table&
|
|
get_table()
|
|
{
|
|
return ompt_callback_table;
|
|
}
|
|
|
|
void
|
|
rocprof_ompt_cb_interface(rocprofiler_ompt_callback_functions_t& cb_functions)
|
|
{
|
|
ompt_impl<ROCPROFILER_OMPT_ID_callback_functions>::event(cb_functions);
|
|
}
|
|
} // namespace
|
|
|
|
ompt_data_t*
|
|
proxy_data_ptr(ompt_data_t* realptr)
|
|
{
|
|
return (get_ompt_data_proxy())->get_client_ptr(realptr);
|
|
}
|
|
|
|
// special case fake callback to send the ompt cb function pointers
|
|
template <>
|
|
struct ompt_info<ROCPROFILER_OMPT_ID_callback_functions>
|
|
{
|
|
static constexpr auto callback_domain_idx = ompt_domain_info::callback_domain_idx;
|
|
static constexpr auto buffered_domain_idx = ompt_domain_info::buffered_domain_idx;
|
|
static constexpr auto operation_idx = ROCPROFILER_OMPT_ID_callback_functions;
|
|
static constexpr auto name = "omp_callback_functions";
|
|
static constexpr bool unsupported = false;
|
|
static constexpr auto begin = -1;
|
|
|
|
using this_type = ompt_info<ROCPROFILER_OMPT_ID_callback_functions>;
|
|
using base_type = ompt_impl<ROCPROFILER_OMPT_ID_callback_functions>;
|
|
|
|
static constexpr auto offset() { return -1; }
|
|
|
|
template <typename DataT>
|
|
static auto& get_api_data_args(DataT& _data)
|
|
{
|
|
return _data.callback_functions;
|
|
}
|
|
};
|
|
|
|
// These implement the callbacks for OMPT
|
|
template <size_t OpIdx>
|
|
template <typename... Args>
|
|
void
|
|
ompt_impl<OpIdx>::begin(ompt_data_t* data, Args... args)
|
|
{
|
|
using info_type = ompt_info<OpIdx>;
|
|
|
|
ROCP_TRACE << __FUNCTION__ << " :: " << info_type::name;
|
|
|
|
constexpr auto external_corr_id_domain_idx =
|
|
ompt_domain_info::external_correlation_id_domain_idx;
|
|
|
|
constexpr auto ref_count = 2;
|
|
auto thr_id = common::get_tid();
|
|
auto callback_contexts = tracing::callback_context_data_vec_t{};
|
|
auto buffered_contexts = tracing::buffered_context_data_vec_t{};
|
|
auto external_corr_ids = tracing::external_correlation_id_map_t{};
|
|
|
|
tracing::populate_contexts(info_type::callback_domain_idx,
|
|
info_type::buffered_domain_idx,
|
|
info_type::operation_idx,
|
|
callback_contexts,
|
|
buffered_contexts,
|
|
external_corr_ids);
|
|
|
|
auto* corr_id = tracing::correlation_service::construct(ref_count);
|
|
auto internal_corr_id = corr_id->internal;
|
|
auto ancestor_corr_id = corr_id->ancestor;
|
|
|
|
tracing::populate_external_correlation_ids(external_corr_ids,
|
|
thr_id,
|
|
external_corr_id_domain_idx,
|
|
info_type::operation_idx,
|
|
internal_corr_id);
|
|
|
|
// invoke the callbacks
|
|
if(!callback_contexts.empty())
|
|
{
|
|
auto tracer_data = common::init_public_api_struct(callback_ompt_data_t{});
|
|
set_data_args(info_type::get_api_data_args(tracer_data.args), std::forward<Args>(args)...);
|
|
|
|
tracing::execute_phase_enter_callbacks(callback_contexts,
|
|
thr_id,
|
|
internal_corr_id,
|
|
external_corr_ids,
|
|
ancestor_corr_id,
|
|
info_type::callback_domain_idx,
|
|
info_type::operation_idx,
|
|
tracer_data);
|
|
}
|
|
|
|
// enter callback may update the external correlation id field
|
|
tracing::update_external_correlation_ids(
|
|
external_corr_ids, thr_id, external_corr_id_domain_idx);
|
|
|
|
// stash the state
|
|
ompt_save_state* state = new ompt_save_state{.thr_id = thr_id,
|
|
.start_timestamp = 0,
|
|
.operation_idx = info_type::operation_idx,
|
|
.corr_id = corr_id,
|
|
.external_corr_ids = external_corr_ids,
|
|
.callback_contexts = callback_contexts,
|
|
.buffered_contexts = buffered_contexts};
|
|
|
|
if(data)
|
|
data->ptr = state;
|
|
else
|
|
get_ompt_state_stack().emplace_back(state);
|
|
|
|
// decrement the reference count before returning
|
|
corr_id->sub_ref_count();
|
|
state->start_timestamp = common::timestamp_ns();
|
|
}
|
|
|
|
template <size_t OpIdx>
|
|
template <typename... Args>
|
|
void
|
|
ompt_impl<OpIdx>::end(ompt_data_t* data, Args... args)
|
|
{
|
|
using info_type = ompt_info<OpIdx>;
|
|
|
|
ROCP_TRACE << __FUNCTION__ << " :: " << info_type::name;
|
|
|
|
// END PART OF OMPT CALLBACK
|
|
auto end_timestamp = common::timestamp_ns();
|
|
|
|
ompt_save_state* state = nullptr;
|
|
if(data != nullptr)
|
|
state = static_cast<ompt_save_state*>(data->ptr);
|
|
else
|
|
state = get_ompt_state_stack().pop_back_val();
|
|
assert(state != nullptr);
|
|
|
|
ROCP_FATAL_IF(state->operation_idx != info_type::operation_idx)
|
|
<< "Mismatch of OMPT operation: begin=" << state->operation_idx
|
|
<< ", end=" << info_type::operation_idx;
|
|
|
|
auto& callback_contexts = state->callback_contexts;
|
|
auto& buffered_contexts = state->buffered_contexts;
|
|
auto external_corr_ids = state->external_corr_ids;
|
|
|
|
auto* corr_id = state->corr_id;
|
|
auto internal_corr_id = corr_id->internal;
|
|
auto ancestor_corr_id = corr_id->ancestor;
|
|
|
|
ROCP_FATAL_IF(common::get_tid() != state->thr_id)
|
|
<< "MIsmatch of OMPT begin/end thread id: "
|
|
<< " current=" << common::get_tid() << ", expected= " << state->thr_id;
|
|
|
|
if(!callback_contexts.empty())
|
|
{
|
|
auto tracer_data = common::init_public_api_struct(callback_ompt_data_t{});
|
|
set_data_args(info_type::get_api_data_args(tracer_data.args), std::forward<Args>(args)...);
|
|
|
|
tracing::execute_phase_exit_callbacks(callback_contexts,
|
|
external_corr_ids,
|
|
info_type::callback_domain_idx,
|
|
info_type::operation_idx,
|
|
tracer_data);
|
|
}
|
|
|
|
if(!buffered_contexts.empty())
|
|
{
|
|
auto buffer_record = common::init_public_api_struct(buffer_ompt_record_t{});
|
|
if constexpr(OpIdx == ROCPROFILER_OMPT_ID_target_emi ||
|
|
OpIdx == ROCPROFILER_OMPT_ID_target_data_op_emi ||
|
|
OpIdx == ROCPROFILER_OMPT_ID_target_submit_emi)
|
|
{
|
|
auto tracer_data = common::init_public_api_struct(callback_ompt_data_t{});
|
|
set_data_args(info_type::get_api_data_args(tracer_data.args),
|
|
std::forward<Args>(args)...);
|
|
if constexpr(OpIdx == ROCPROFILER_OMPT_ID_target_emi)
|
|
{
|
|
buffer_record.target.kind = tracer_data.args.target_emi.kind;
|
|
buffer_record.target.device_num = tracer_data.args.target_emi.device_num;
|
|
buffer_record.target.task_id = tracer_data.args.target_emi.task_data->value;
|
|
buffer_record.target.target_id = tracer_data.args.target_emi.target_data->value;
|
|
buffer_record.target.codeptr_ra = tracer_data.args.target_emi.codeptr_ra;
|
|
}
|
|
else if constexpr(OpIdx == ROCPROFILER_OMPT_ID_target_data_op_emi)
|
|
{
|
|
buffer_record.target_data_op.host_op_id =
|
|
tracer_data.args.target_data_op_emi.host_op_id->value;
|
|
buffer_record.target_data_op.optype = tracer_data.args.target_data_op_emi.optype;
|
|
buffer_record.target_data_op.src_device_num =
|
|
tracer_data.args.target_data_op_emi.src_device_num;
|
|
buffer_record.target_data_op.dst_device_num =
|
|
tracer_data.args.target_data_op_emi.dst_device_num;
|
|
buffer_record.target_data_op.reserved = 0;
|
|
buffer_record.target_data_op.bytes = tracer_data.args.target_data_op_emi.bytes;
|
|
buffer_record.target_data_op.codeptr_ra =
|
|
tracer_data.args.target_data_op_emi.codeptr_ra;
|
|
}
|
|
else if constexpr(OpIdx == ROCPROFILER_OMPT_ID_target_submit_emi)
|
|
{
|
|
buffer_record.target_kernel.device_num = 0; // FIXME
|
|
buffer_record.target_kernel.requested_num_teams =
|
|
tracer_data.args.target_submit_emi.requested_num_teams;
|
|
buffer_record.target_kernel.host_op_id =
|
|
tracer_data.args.target_submit_emi.host_op_id->value;
|
|
}
|
|
}
|
|
|
|
buffer_record.start_timestamp = state->start_timestamp;
|
|
buffer_record.end_timestamp = end_timestamp;
|
|
tracing::execute_buffer_record_emplace(buffered_contexts,
|
|
state->thr_id,
|
|
internal_corr_id,
|
|
external_corr_ids,
|
|
ancestor_corr_id,
|
|
info_type::buffered_domain_idx,
|
|
info_type::operation_idx,
|
|
buffer_record);
|
|
}
|
|
|
|
// decrement the reference count after usage in the callback/buffers
|
|
state->corr_id->sub_ref_count();
|
|
context::pop_latest_correlation_id(state->corr_id);
|
|
delete state;
|
|
if(data) data->ptr = nullptr;
|
|
}
|
|
|
|
template <size_t OpIdx>
|
|
template <typename... Args>
|
|
context::correlation_id*
|
|
ompt_impl<OpIdx>::event_common(Args... args)
|
|
{
|
|
using info_type = ompt_info<OpIdx>;
|
|
|
|
ROCP_TRACE << __FUNCTION__ << " :: " << info_type::name;
|
|
|
|
constexpr auto external_corr_id_domain_idx =
|
|
ompt_domain_info::external_correlation_id_domain_idx;
|
|
|
|
constexpr auto ref_count = 1;
|
|
auto thr_id = common::get_tid();
|
|
auto callback_contexts = tracing::callback_context_data_vec_t{};
|
|
auto buffered_contexts = tracing::buffered_context_data_vec_t{};
|
|
auto external_corr_ids = tracing::external_correlation_id_map_t{};
|
|
|
|
tracing::populate_contexts(info_type::callback_domain_idx,
|
|
info_type::buffered_domain_idx,
|
|
info_type::operation_idx,
|
|
callback_contexts,
|
|
buffered_contexts,
|
|
external_corr_ids);
|
|
|
|
auto buffer_record = common::init_public_api_struct(buffer_ompt_record_t{});
|
|
auto tracer_data = common::init_public_api_struct(callback_ompt_data_t{});
|
|
auto* corr_id = tracing::correlation_service::construct(ref_count);
|
|
|
|
// During finalization, correlation ID construction may return nullptr
|
|
if(!corr_id) return nullptr;
|
|
|
|
uint64_t internal_corr_id = corr_id->internal;
|
|
uint64_t ancestor_corr_id = corr_id->ancestor;
|
|
|
|
tracing::populate_external_correlation_ids(external_corr_ids,
|
|
thr_id,
|
|
external_corr_id_domain_idx,
|
|
info_type::operation_idx,
|
|
internal_corr_id);
|
|
|
|
// invoke the callbacks
|
|
if(!callback_contexts.empty())
|
|
{
|
|
set_data_args(info_type::get_api_data_args(tracer_data.args), std::forward<Args>(args)...);
|
|
|
|
tracing::execute_phase_none_callbacks(callback_contexts,
|
|
thr_id,
|
|
internal_corr_id,
|
|
external_corr_ids,
|
|
ancestor_corr_id,
|
|
info_type::callback_domain_idx,
|
|
info_type::operation_idx,
|
|
tracer_data);
|
|
}
|
|
|
|
tracing::update_external_correlation_ids(
|
|
external_corr_ids, thr_id, external_corr_id_domain_idx);
|
|
|
|
if(!buffered_contexts.empty())
|
|
{
|
|
buffer_record.start_timestamp = common::timestamp_ns();
|
|
buffer_record.end_timestamp = buffer_record.start_timestamp;
|
|
tracing::execute_buffer_record_emplace(buffered_contexts,
|
|
thr_id,
|
|
internal_corr_id,
|
|
external_corr_ids,
|
|
ancestor_corr_id,
|
|
info_type::buffered_domain_idx,
|
|
info_type::operation_idx,
|
|
buffer_record);
|
|
}
|
|
|
|
return corr_id;
|
|
}
|
|
|
|
template <size_t OpIdx>
|
|
template <typename... Args>
|
|
void
|
|
ompt_impl<OpIdx>::event(Args&&... args)
|
|
{
|
|
auto corr_id = ompt_impl<OpIdx>::event_common(std::forward<Args>(args)...);
|
|
if(!corr_id) return; // During finalization
|
|
context::pop_latest_correlation_id(corr_id);
|
|
corr_id->sub_ref_count();
|
|
}
|
|
|
|
namespace
|
|
{
|
|
template <typename Tp>
|
|
decltype(auto)
|
|
convert_arg(Tp&& _arg)
|
|
{
|
|
using type = common::mpl::unqualified_type_t<Tp>;
|
|
if constexpr(common::mpl::is_string_type<type>::value)
|
|
{
|
|
if(!_arg) return std::remove_reference_t<Tp>(_arg);
|
|
return common::get_string_entry(std::string_view{_arg})->c_str();
|
|
}
|
|
else
|
|
return std::forward<Tp>(_arg);
|
|
}
|
|
|
|
template <size_t OpIdx, size_t... OpIdxTail>
|
|
void
|
|
get_ids(std::vector<uint32_t>& _id_list, std::index_sequence<OpIdx, OpIdxTail...>)
|
|
{
|
|
auto _idx = ompt_info<OpIdx>::operation_idx;
|
|
if(_idx < ompt_domain_info::last) _id_list.emplace_back(_idx);
|
|
|
|
if constexpr(sizeof...(OpIdxTail) > 0) get_ids(_id_list, std::index_sequence<OpIdxTail...>{});
|
|
}
|
|
|
|
template <size_t OpIdx, size_t... OpIdxTail>
|
|
const char*
|
|
name_by_id(const uint32_t id, std::index_sequence<OpIdx, OpIdxTail...>)
|
|
{
|
|
if(OpIdx == id) return ompt_info<OpIdx>::name;
|
|
|
|
if constexpr(sizeof...(OpIdxTail) > 0)
|
|
return name_by_id(id, std::index_sequence<OpIdxTail...>{});
|
|
else
|
|
return nullptr;
|
|
}
|
|
|
|
bool
|
|
should_enable_callback(rocprofiler_callback_tracing_kind_t _callback_domain,
|
|
rocprofiler_buffer_tracing_kind_t _buffered_domain,
|
|
int _operation)
|
|
{
|
|
// we loop over all the *registered* contexts and see if any of them, at any point in time,
|
|
// might require callback or buffered API tracing
|
|
for(const auto& itr : context::get_registered_contexts())
|
|
{
|
|
if(!itr) continue;
|
|
|
|
// if there is a callback tracer enabled for the given domain and op, we need to wrap
|
|
if(itr->callback_tracer && itr->callback_tracer->domains(_callback_domain) &&
|
|
itr->callback_tracer->domains(_callback_domain, _operation))
|
|
return true;
|
|
|
|
// if there is a buffered tracer enabled for the given domain and op, we need to wrap
|
|
if(itr->buffered_tracer && itr->buffered_tracer->domains(_buffered_domain) &&
|
|
itr->buffered_tracer->domains(_buffered_domain, _operation))
|
|
return true;
|
|
}
|
|
return false;
|
|
}
|
|
|
|
template <size_t OpIdx>
|
|
void
|
|
update_table(ompt_update_func f, std::integral_constant<size_t, OpIdx>)
|
|
{
|
|
auto _info = ompt_info<OpIdx>();
|
|
|
|
if(_info.unsupported)
|
|
{
|
|
ROCP_INFO << "OMPT operation not supported: " << _info.name;
|
|
return;
|
|
}
|
|
// check to see if there are any contexts which enable this operation in the OMPT API domain
|
|
if(!should_enable_callback(
|
|
_info.callback_domain_idx, _info.buffered_domain_idx, _info.operation_idx))
|
|
return;
|
|
|
|
ROCP_TRACE << "updating table entry for " << _info.name;
|
|
|
|
// Register this callback for OMPT at init time.
|
|
auto& _func = _info.get_table_func();
|
|
auto* _ompt_cb = reinterpret_cast<ompt_callback_t*>(&_func);
|
|
f(_info.name, _ompt_cb, _info.ompt_idx);
|
|
}
|
|
|
|
template <size_t OpIdx, size_t... OpIdxTail>
|
|
void
|
|
update_table(ompt_update_func f, std::index_sequence<OpIdx, OpIdxTail...>)
|
|
{
|
|
update_table(f, std::integral_constant<size_t, OpIdx>{});
|
|
if constexpr(sizeof...(OpIdxTail) > 0) update_table(f, std::index_sequence<OpIdxTail...>{});
|
|
}
|
|
} // namespace
|
|
|
|
template <size_t OpIdx>
|
|
template <typename DataArgsT, typename... Args>
|
|
void
|
|
ompt_impl<OpIdx>::set_data_args(DataArgsT& _data_args, Args... args)
|
|
{
|
|
if constexpr(sizeof...(Args) == 0)
|
|
_data_args.no_args.empty = '\0';
|
|
else
|
|
_data_args = DataArgsT{convert_arg(args)...};
|
|
}
|
|
|
|
// check out the assembly here... this compiles to a switch statement
|
|
const char*
|
|
name_by_id(uint32_t id)
|
|
{
|
|
return name_by_id(id, std::make_index_sequence<ompt_domain_info::last>{});
|
|
}
|
|
|
|
std::vector<uint32_t>
|
|
get_ids()
|
|
{
|
|
constexpr auto last_api_id = ompt_domain_info::last;
|
|
auto _data = std::vector<uint32_t>{};
|
|
_data.reserve(last_api_id);
|
|
get_ids(_data, std::make_index_sequence<last_api_id>{});
|
|
return _data;
|
|
}
|
|
|
|
template <typename DataT, size_t OpIdx, size_t... OpIdxTail>
|
|
void
|
|
iterate_args(const uint32_t id,
|
|
const DataT& data,
|
|
rocprofiler_callback_tracing_operation_args_cb_t func,
|
|
int32_t max_deref,
|
|
void* user_data,
|
|
std::index_sequence<OpIdx, OpIdxTail...>)
|
|
{
|
|
if(OpIdx == id)
|
|
{
|
|
using info_type = ompt_info<OpIdx>;
|
|
auto&& arg_list = info_type::as_arg_list(data, max_deref);
|
|
auto&& arg_addr = info_type::as_arg_addr(data);
|
|
for(size_t i = 0; i < std::min(arg_list.size(), arg_addr.size()); ++i)
|
|
{
|
|
auto ret = func(info_type::callback_domain_idx, // kind
|
|
id, // operation
|
|
i, // arg_number
|
|
arg_addr.at(i), // arg_value_addr
|
|
arg_list.at(i).indirection_level, // indirection
|
|
arg_list.at(i).type, // arg_type
|
|
arg_list.at(i).name, // arg_name
|
|
arg_list.at(i).value.c_str(), // arg_value_str
|
|
arg_list.at(i).dereference_count, // num deref in str
|
|
user_data);
|
|
if(ret != 0) break;
|
|
}
|
|
return;
|
|
}
|
|
if constexpr(sizeof...(OpIdxTail) > 0)
|
|
iterate_args(id, data, func, max_deref, user_data, std::index_sequence<OpIdxTail...>{});
|
|
}
|
|
|
|
void
|
|
update_callback(rocprofiler_ompt_callback_functions_t& cb_functions)
|
|
{
|
|
auto _info = ompt_info<ROCPROFILER_OMPT_ID_callback_functions>();
|
|
|
|
if(should_enable_callback(
|
|
_info.callback_domain_idx, _info.buffered_domain_idx, _info.operation_idx))
|
|
rocprof_ompt_cb_interface(cb_functions);
|
|
}
|
|
|
|
void
|
|
iterate_args(uint32_t id,
|
|
const rocprofiler_callback_tracing_ompt_data_t& data,
|
|
rocprofiler_callback_tracing_operation_args_cb_t callback,
|
|
int32_t max_deref,
|
|
void* user_data)
|
|
{
|
|
if(callback)
|
|
iterate_args(id,
|
|
data,
|
|
callback,
|
|
max_deref,
|
|
user_data,
|
|
std::make_index_sequence<ompt::ompt_domain_info::ompt_last>{});
|
|
}
|
|
|
|
void
|
|
update_table(ompt_update_func f)
|
|
{
|
|
update_table(f, std::make_index_sequence<ompt::ompt_domain_info::ompt_last>{});
|
|
}
|
|
|
|
} // namespace ompt
|
|
} // namespace rocprofiler
|