// MIT License
//
// Copyright (c) 2023-2025 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.

#pragma once

#include "domain_type.hpp"
#include "output_config.hpp"
#include "tmp_file.hpp"

#include "lib/common/container/ring_buffer.hpp"
#include "lib/common/logging.hpp"
#include "lib/common/units.hpp"

#include <fmt/format.h>

#include <cstdint>     // uint64_t
#include <deque>
#include <functional>  // std::function
#include <mutex>
#include <new>         // placement new
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

namespace rocprofiler
{
namespace tool
{
template <typename Tp>
using ring_buffer_t = rocprofiler::common::container::ring_buffer<Tp>;

// callback that returns the temporary file name for a given domain
using tmp_file_name_callback_t = std::function<std::string(domain_type)>;

// composes the temporary file name for `buffer_type` from the output configuration
std::string
compose_tmp_file_name(const output_config& cfg, domain_type buffer_type);

// returns a mutable reference to the callback that file_buffer uses to name its temporary file
tmp_file_name_callback_t&
get_tmp_file_name_callback();

// Per-domain staging area: records are accumulated in an in-memory ring buffer
// (constructed with 16 * page size) and offloaded to a temporary file when the
// ring buffer fills up or is flushed.
template <typename Tp>
struct file_buffer
{
    file_buffer() = delete;
    file_buffer(domain_type _domain)
    : domain{_domain}
    , buffer{16 * static_cast<uint64_t>(::rocprofiler::common::units::get_page_size())}
    , file{get_tmp_file_name_callback()(_domain)}
    {}

    ~file_buffer() = default;
    file_buffer(const file_buffer&) = delete;
    file_buffer(file_buffer&&) noexcept = default;
    file_buffer& operator=(const file_buffer&) = delete;
    file_buffer& operator=(file_buffer&&) noexcept = default;

    void reset();

    domain_type       domain = {};
    uint64_t          nbytes = 0;  // total bytes offloaded to the temporary file
    ring_buffer_t<Tp> buffer = {};
    tmp_file          file;
};

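// Discards all staged data so the buffer can be reused: closes and deletes the
// temporary file, forgets the recorded chunk offsets, and empties the ring buffer.
template <typename Tp>
void
file_buffer<Tp>::reset()
{
    auto _lk = std::lock_guard<std::mutex>{file.file_mutex};
    file.close();
    file.remove();  // delete the old temporary file
    file.file_pos.clear();
    nbytes = 0;
    buffer.clear();
}
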
// Catches accidental instantiation with the ring buffer type instead of the record type.
template <typename Tp>
struct file_buffer<ring_buffer_t<Tp>>
{
    static_assert(std::is_void<Tp>::value && std::is_empty<Tp>::value,
                  "error! instantiated with ring_buffer_t<Tp> instead of Tp");
};

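// Returns a reference to the lazily allocated, per-type buffer pointer. The buffer is
// created on the first call, so only the `type` passed on that call is used.
template <typename Tp>
file_buffer<Tp>*&
get_tmp_file_buffer(domain_type type)
{
    static file_buffer<Tp>* val = new file_buffer<Tp>{type};
    return val;
}
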
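// Teardown sketch (illustrative only; `my_record_t` is a hypothetical record type and
// `domain_type::KERNEL_DISPATCH` is assumed to be an enumerator of domain_type). Because
// get_tmp_file_buffer() hands out a reference to the static pointer, a caller can free the
// buffer and publish that fact by nulling the pointer; the functions below check for
// nullptr before dereferencing it (most of them log a warning and return).
//
//     auto*& fb = get_tmp_file_buffer<my_record_t>(domain_type::KERNEL_DISPATCH);
//     delete fb;
//     fb = nullptr;  // later write_ring_buffer/offload_buffer/read_tmp_file calls warn and return
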
// Writes the contents of the ring buffer to the temporary file, recording the file
// offset at which this chunk starts, then empties the ring buffer.
template <typename Tp>
void
offload_buffer(domain_type type)
{
    auto* filebuf = get_tmp_file_buffer<Tp>(type);

    if(!filebuf)
    {
        ROCP_CI_LOG(WARNING) << "rocprofv3 cannot offload buffer for "
                             << get_domain_column_name(type) << ". Buffer has been destroyed.";
        return;
    }

    auto _lk = std::lock_guard<std::mutex>(filebuf->file.file_mutex);
    [[maybe_unused]] auto _success = filebuf->file.open();
    auto& _fs = filebuf->file.stream;

    // the get and put positions are expected to match; warn if they do not
    ROCP_CI_LOG_IF(WARNING, _fs.tellg() != _fs.tellp())
        << "tellg=" << _fs.tellg() << ", tellp=" << _fs.tellp();

    auto _nbytes = (filebuf->buffer.count() * filebuf->buffer.data_size());

    ROCP_TRACE << fmt::format(
        "offloading {} B from {} buffer to tmp file", _nbytes, get_domain_column_name(type));

    filebuf->file.file_pos.emplace(_fs.tellp());
    filebuf->nbytes += _nbytes;
    filebuf->buffer.save(_fs);
    filebuf->buffer.clear();

    ROCP_CI_LOG_IF(ERROR, !filebuf->buffer.is_empty())
        << "buffer is not empty after offload: count=" << filebuf->buffer.count();
}

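// Bookkeeping illustration (not compilable; p0..p2 and n0..n2 are placeholders): after three
// offloads of a domain's ring buffer, the temporary file holds three chunks written by
// ring_buffer::save(), starting at the offsets recorded in file_pos, and
//
//     file.file_pos == { p0, p1, p2 }   // tellp() recorded before each save()
//     nbytes        == n0 + n1 + n2     // count() * data_size() at each offload
//
// Because nbytes is computed from count() * data_size(), it may differ from the number
// of bytes save() actually writes.
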
// Appends a record to the domain's ring buffer, offloading the ring buffer to the
// temporary file when it is full. The record is dropped (with a warning) if the buffer
// has been destroyed, has zero capacity, or still has no space after offloading.
template <typename Tp>
void
write_ring_buffer(Tp _v, domain_type type)
{
    auto* filebuf = get_tmp_file_buffer<Tp>(type);

    if(!filebuf)
    {
        ROCP_CI_LOG(WARNING) << "rocprofv3 is dropping record from domain "
                             << get_domain_column_name(type) << ". Buffer has been destroyed.";
        return;
    }
    else if(filebuf->buffer.capacity() == 0)
    {
        ROCP_CI_LOG(WARNING) << "rocprofv3 is dropping record from domain "
                             << get_domain_column_name(type) << ". Buffer has a capacity of zero.";
        return;
    }

    auto* ptr = filebuf->buffer.request(false);
    if(ptr == nullptr)
    {
        offload_buffer<Tp>(type);
        ptr = filebuf->buffer.request(false);

        // if failed, try again
        if(!ptr) ptr = filebuf->buffer.request(false);

        // after second failure, emit warning message
        ROCP_CI_LOG_IF(WARNING, !ptr)
            << "rocprofv3 is dropping record from domain " << get_domain_column_name(type)
            << ". No space in buffer: "
            << fmt::format(
                   "capacity={}, record_size={}, used_count={}, free_count={} | raw_info=[{}]",
                   filebuf->buffer.capacity(),
                   filebuf->buffer.data_size(),
                   filebuf->buffer.count(),
                   filebuf->buffer.free(),
                   filebuf->buffer.as_string());
    }

    if(ptr)
    {
        // store the record in the reserved slot, preferring move over copy and
        // construction over assignment
        if constexpr(std::is_move_constructible<Tp>::value)
        {
            new(ptr) Tp{std::move(_v)};
        }
        else if constexpr(std::is_move_assignable<Tp>::value)
        {
            *ptr = std::move(_v);
        }
        else if constexpr(std::is_copy_constructible<Tp>::value)
        {
            new(ptr) Tp{_v};
        }
        else if constexpr(std::is_copy_assignable<Tp>::value)
        {
            *ptr = _v;
        }
        else
        {
            static_assert(std::is_void<Tp>::value,
                          "data type is neither move/copy constructible nor move/copy assignable");
        }
    }
}

// Offloads any records still held in the ring buffer to the temporary file.
template <typename Tp>
void
flush_tmp_buffer(domain_type type)
{
    auto* filebuf = get_tmp_file_buffer<Tp>(type);
    if(filebuf && !filebuf->buffer.is_empty()) offload_buffer<Tp>(type);
}

// Reopens the temporary file (if it exists) in binary read mode so that the
// offloaded records can be read back.
template <typename Tp>
void
read_tmp_file(domain_type type)
{
    auto* filebuf = get_tmp_file_buffer<Tp>(type);

    if(!filebuf)
    {
        ROCP_CI_LOG(WARNING) << "rocprofv3 cannot read tmp file for "
                             << get_domain_column_name(type) << ". Buffer has been destroyed.";
        return;
    }

    auto _lk = std::lock_guard<std::mutex>{filebuf->file.file_mutex};
    if(filebuf->file.exists())
    {
        auto& _fs = filebuf->file.stream;
        if(_fs.is_open()) _fs.close();
        filebuf->file.open(std::ios::binary | std::ios::in);
    }
}
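
// Lifecycle sketch (illustrative only; `my_record_t` is a hypothetical record type, `rec` a
// hypothetical instance of it, and `domain_type::MEMORY_COPY` is assumed to be an
// enumerator of domain_type):
//
//     // while profiling: stage each record; a full ring buffer is offloaded to the tmp file
//     write_ring_buffer(rec, domain_type::MEMORY_COPY);  // Tp deduced as my_record_t
//
//     // at finalization: offload whatever is still held in memory
//     flush_tmp_buffer<my_record_t>(domain_type::MEMORY_COPY);
//
//     // reopen the tmp file in binary read mode, then read the chunks back starting at
//     // the offsets in file.file_pos (reading code not shown here)
//     read_tmp_file<my_record_t>(domain_type::MEMORY_COPY);
//
//     // optionally discard everything and reuse the same buffer
//     if(auto* fb = get_tmp_file_buffer<my_record_t>(domain_type::MEMORY_COPY)) fb->reset();
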
} // namespace tool
} // namespace rocprofiler