Mark Meserve bf49039005 [rocprofiler-sdk][rocprofiler-register] Initial Attachment Support (#316)
* attach: milestone: API tracing

- This pairs with another commit in rocprofiler-sdk to fully
  function
- Add ptrace entry points for tool attachment
- API tracing works at this commit
- Queue tracing not supported yet

* attach: cleanup

- Remove hardcode for loading of tool library
- Make invoke registration functions public again

* attach: proxy queue first draft

- Adds ability to trace with queues during attachment
- Must be paired with updated rocprofiler-sdk

* attach: prestore overhaul

- Must be paired with commit in rocprofiler-sdk

* attach: add dispatch table rework

- Register will load the prestore library and provide entrypoints to sdk

* attach: formatting and cleanup

* attach: revise dispatch table scheme

* attach: formatting

* attach: milestone: API tracing

- This change must be paired with a change in rocprofiler-register to
  fully function.
- API tracing works at this commit
- Queue tracing not supported yet

* attach: cleanup and comments

* attach: Formatting and crash fixes

* attach: add attach duration

- Add option attach-duration-msec for attachment

* Formatting + sglang hang fix via signal handling

* Changed FATAL_IF to DFATAL_IF for scratch_memory due to persistent crash when iterating queues

* attach: proxy queue first draft

- Adds ability to trace with queues during attachment
- Must be paired with updated rocprofiler-register

* Allow null agents for scratch output

* attach: improve queue library interface

- Significant changes to force exported interfaces back to C
- Fixes bug with unknown agents at attachment
- Code objects' names may still be incorrect

* attach: add code_object support

- Kernel traces will now have names and all other information for launches
- Add capture of hsa_executable to the queue library
- Various logging improvements

* attach: rename queue library to prestore

* attach: prestore overhaul

- Must be paired with commit from rocprofiler-register
- Massive overhaul of code organization in prestore library
  - Separates registrations for different object types
  - Sets up future changes for initialization

* attach: add prestore dispatch table

- Removes linkage to prestore library from sdk

* attach: cleanup

* attach: formatting

* attach: fix input prompt not appearing

* attach: fix component name in cmake

* attach: revert change to export level

* Make prestore API public

* attach: update sdk attachment library WIP

- This commit is NONFUNCTIONAL

- Changes around structure to remove classes
- Separate C linkage where needed
- Still needs updates to register for correct usage

* attach: update register with dispatch table WIP
- This commit is NONFUNCTIONAL

- Changes rocprofiler_register to handle dispatch table from attach
  library.
- Still needs changes in SDK with dispatch table usage

* attach: dispatch table wip
- This commit is NONFUNCTIONAL

* attach: move attach component into core

* attach: rename to rocprofv3-attach

* attach: add callbacks for new queues and code objects

* attach: finish dispatch table implementation

- Fixes kernel tracing

* attach: add cmake variable for attachment support

* feat: Add --attach alias for rocprofv3 with comprehensive attachment tests

- Add `--attach` as an alias to existing `-p/--pid` functionality in rocprofv3.py
- Create comprehensive attachment test suite with CSV and JSON output validation:
- New attachment-test application for testing dynamic profiling scenarios
- Unified test script supporting both CSV and JSON output formats
- Pytest-based validation for kernel traces, memory copies, HSA API calls, and agent info
- Add CMake integration for automated attachment testing
- Support parameterized output directory and filename specification
- Implement proper environment setup for attachment queue registration

Tests verify successful attachment to running processes and capture of:
- Kernel dispatch traces with workgroup/grid dimensions
- Memory copy operations (H2D/D2H) with size validation
- HSA API call traces across multiple domains
- GPU/CPU agent information and capabilities

* Documentation Update

* attach: make attach script callable

* Added ROCPROFILER_REGISTER_ATTACHMENT_TOOL_LIB to remove hardcoded name

* attach: revert metrics library path changes

* Generic Attachment in Register (#942)

Remove tool references in register

* Add second param to attach call in rocprof register

* Add experimental reattachment support for ROCprofiler-SDK

This commit introduces experimental reattachment functionality allowing tools
to dynamically reattach to running processes with comprehensive design changes
to support multiple attach/detach cycles:

**Core Reattachment API:**
- Add rocprofiler_tool_configure_result_experimental_t with tool_reattach/tool_detach callbacks
- Add rocprofiler_call_client_reattach and rocprofiler_call_client_detach C exports
- Implement reattachment tracking in rocprofiler_register_attach to differentiate
initial attachment from reattachment cycles
- Add rocprofiler_register_invoke_reattach for handling reattachment requests

**Design Changes - Registration System Flow:**
The registration system now supports a dual-path initialization:

1. Initial Attachment Flow:
    - rocprofiler_register_attach() -> rocprofiler_register_invoke_all_registrations()
    - Full tool initialization with complete context setup
    - Sets prev_attached atomic flag to track state

2. Reattachment Flow:
    - rocprofiler_register_attach() detects prev_attached=true -> rocprofiler_register_invoke_reattach()
    - Bypasses full re-initialization, calls client reattach callbacks instead
    - Preserves existing contexts and buffers, only reactivates profiling services
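
The dual-path dispatch above can be sketched with a single atomic flag. This is a minimal illustration, not the rocprofiler-register implementation: `attach_state`, `attach_path`, and `attach` are hypothetical names, and only the `prev_attached` flag name is taken from the description above.

```cpp
#include <atomic>

// Hypothetical result type: which of the two flows an attach request took.
enum class attach_path
{
    initial,
    reattach
};

// Hypothetical state holder; the counters stand in for the real work
// (full registration vs. client reattach callbacks).
struct attach_state
{
    std::atomic<bool> prev_attached{false};
    int               full_inits = 0;
    int               reattaches = 0;
};

inline attach_path
attach(attach_state& state)
{
    // exchange() sets the flag and returns its previous value in one atomic step,
    // so at most one caller can ever observe "not previously attached"
    if(!state.prev_attached.exchange(true))
    {
        ++state.full_inits;  // stand-in for invoking all registrations
        return attach_path::initial;
    }
    ++state.reattaches;  // stand-in for invoking the reattach callbacks
    return attach_path::reattach;
}
```

Calling `attach` three times on one `attach_state` takes the initial path once and the reattach path twice, which is the property that prevents duplicate initialization.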

**Design Changes - Tool Library Loading:**
Enhanced rocprofiler-register library loading with function pointer resolution:
- Extended rocp_set_api_table_data_t tuple to include reattach/detach function pointers
- Automatic symbol resolution for rocprofiler_call_client_reattach/detach functions
- Support for both LD_PRELOAD and dlopen scenarios with consistent callback availability

**Design Changes - Context Management:**
Introduced dual context systems for attachment scenarios:
- get_contexts() - Original contexts for standard tool initialization
- get_attach_contexts() - Separate context map for attachment-specific lifecycle
- attach_init() - Creates contexts for ALL buffer tracing services using existing buffers
- attach_start() - Selectively starts contexts based on configuration options
- attach_detach() - Cleanly stops and destroys attachment contexts

**Design Changes - Buffer Management:**
Added reset_tmp_file_buffer() template for clean reattachment state:
- Properly closes and removes old temporary files
- Deletes existing file_buffer instances to prevent stale file position tracking
- Creates fresh file_buffer instances for clean reattachment cycles
- Addresses core issue where file position metadata becomes stale between cycles
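
A rough sketch of the reset idea, assuming a buffer object that owns a temporary file and tracks how much it has written. `file_buffer_sketch` and `reset_buffer` are hypothetical names for illustration; the real `file_buffer` type and `reset_tmp_file_buffer()` template may differ in both shape and behavior.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <memory>
#include <string>
#include <system_error>
#include <utility>

// Hypothetical buffer: owns one temporary file and its position metadata.
struct file_buffer_sketch
{
    std::filesystem::path path;
    std::ofstream         stream;
    std::size_t           bytes_written = 0;

    explicit file_buffer_sketch(std::filesystem::path p)
    : path{std::move(p)}
    , stream{path, std::ios::binary | std::ios::trunc}
    {}

    void write(const std::string& data)
    {
        stream << data;
        stream.flush();
        bytes_written += data.size();
    }
};

// Close and remove the old file, then replace the buffer object entirely so no
// position metadata from the previous attachment cycle survives.
inline void
reset_buffer(std::unique_ptr<file_buffer_sketch>& buf, const std::filesystem::path& next_path)
{
    if(buf)
    {
        buf->stream.close();
        std::error_code ec;
        std::filesystem::remove(buf->path, ec);  // non-throwing overload
    }
    buf = std::make_unique<file_buffer_sketch>(next_path);
}
```

Replacing the whole object, rather than rewinding it, is what keeps stale offsets from leaking into the next cycle.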

**Design Changes - Environment Variable Injection:**
Added ROCP_REGISTERED_TOOL_ATTACH environment variable:
- Distinguishes attachment-loaded tools from LD_PRELOAD scenarios
- Enables registration system to apply attachment-specific logic
- Helps tools adapt behavior for attachment vs standard initialization
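
A tool could branch on this variable roughly as follows. The commit message does not state what value the variable carries, so this sketch assumes that any non-empty value means "loaded via attachment"; `loaded_via_attach` is a hypothetical helper name.

```cpp
#include <cstdlib>
#include <string_view>

// Assumption: the variable being set to any non-empty value marks an
// attachment-loaded tool; the actual value semantics are not documented here.
inline bool
loaded_via_attach()
{
    const char* value = std::getenv("ROCP_REGISTERED_TOOL_ATTACH");
    return value != nullptr && !std::string_view{value}.empty();
}
```

A tool might call this once during configuration to choose between its standard initialization and its attachment-specific setup.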

**Attachment Context Management:**
- Add attach_init/attach_start/attach_detach functions for dynamic context lifecycle
- Add reset_tmp_file_buffer template for clean reattachment state management
- Implement get_attach_contexts() for tracking active attachment contexts

**Test Infrastructure:**
- Add projects/rocprofiler-sdk/tests/rocprofv3/reattach/ comprehensive test suite
- Include reattachment test scripts with unified attachment/detachment cycles
- Add validate.py with trace data validation for kernel, memory copy, HSA API, and agent info
- Add conftest.py for JSON and CSV data loading utilities

**Configuration Updates:**
- Update CMakeLists.txt to include reattachment tests in build system
- Add environment variable ROCP_REGISTERED_TOOL_ATTACH for attachment state tracking
- Enhance rocprofiler-register library loading with reattach/detach function resolution

**Flow Impact Analysis:**
This design enables robust multi-cycle attachment by:
1. Preventing duplicate initialization on reattachment
2. Maintaining separate context lifecycles for attachment vs standard operation
3. Ensuring clean temporary file state between attachment cycles
4. Providing tools with explicit reattach/detach callback hooks
5. Supporting both programmatic and environment-based tool configuration

The experimental nature allows for iteration on the API while establishing
the foundation for production-ready dynamic profiling capabilities.

* Fix misc clang-tidy warnings/errors

* CMake Option and Environment Variable Updates

- CMake: ROCPROFILER_REGISTER_ALWAYS_SUPPORT_ATTACH -> ROCPROFILER_REGISTER_BUILD_DEFAULT_ATTACHMENT
- Env: ROCPROFILER_REGISTER_ATTACHMENT_ENABLED ->

* Source reorganization

* Formatting + new lines at EOF

* Fix flake8 F841: local variable is assigned to but never used

* Update attachment test

- get rid of 5 second start delay
- add roctx

* Rework implementation

- Remove rocprofiler_tool_configure_result_experimental_t in lieu of rocprofiler_configure_attach
- Add <rocprofiler-sdk/experimental/registration.h>
- TODO: Update process_attachment.rst

* Handle re-attachment options

- inherit options from previous attachment
- check previous options do not modify data collection services

* Fix support for tools w/o rocprofiler_configure_attach

- fix segfault when rocprofiler_configure_attach does not exist
- fix naming convention for functions accepting attach dispatch table
- cleanup rocprofiler_configure_attach implementation in rocprofv3 tool

* attach: remove unknown agent handling

- Change was from earlier commit, no longer needed

* attach: add error for attaching without library loaded

* attach: revise version numbering

* attach: register header revisions

* attach: clang format register

* attach: formatting

* attach: fix build failure

- Remove cross dependency into rocprofiler-sdk, fixes build on some systems

* attach: revise register library detection

* Update rocprofiler-register and attach library

- formatting
- proper signature of register_functor for rocprofiler-sdk-attach library callback
- remove get_dispatch_registration_table()

* Bump rocprofiler-register version to 0.6.0 + AnyNewerVersion

* Fix output support for rocprofiler-sdk-tool

* Fix formatting

* Fix clang tidy errors

* Misc rocprofiler-sdk-attach fixes

* attach: add sigint handling to attach python

* tool README.md formatting

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

* Fix buffered output issue

* attach: add errors for tool attach

* CI Fixes

* Rework tests

* attach: improve library loading in rocprofv3 attach

* formatting

* Update tests to use pytest framework

* Fix test_attachment_hsa_api_trace

* attach: catch ctypes exceptions

* attach: fix leak in registration

* attach: fix sanitizer tests

* attach: fix sanitizer tests further

* attach: disable attach asan tests

* attach: disable ubsan test

* attach: fix permissions in installed test package

* attach: formatting

---------

Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>
Co-authored-by: Tim Gu <Tim.Gu@amd.com>
Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Benjamin Welton <bwelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-09-18 18:10:45 -05:00

269 lines
10 KiB
C++

// MIT License
//
// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
#include "queue_registration.h"
#include "queue_registration.hpp"
#include "table.hpp"

#include "lib/common/static_object.hpp"

#include <mutex>
#include <string_view>
#include <unordered_map>

namespace
{
using callback_t = void (*)(hsa_status_t status, hsa_queue_t* source, void* data);

struct queue_entry_t
{
    hsa_agent_t agent = hsa_agent_t{};
    write_interceptor_t user_write_interceptor_func = nullptr;
    void* user_write_interceptor_data = nullptr;
};

using queue_collection_t = std::unordered_map<hsa_queue_t*, queue_entry_t>;

struct queue_registration_t
{
    // guards access to the queues collection
    std::mutex queues_mutex;
    queue_collection_t queues;

    decltype(AmdExtTable::hsa_amd_queue_intercept_create_fn) hsa_amd_queue_intercept_create_fn =
        nullptr;
    decltype(AmdExtTable::hsa_amd_profiling_set_profiler_enabled_fn)
        hsa_amd_profiling_set_profiler_enabled_fn = nullptr;
    decltype(AmdExtTable::hsa_amd_queue_intercept_register_fn) hsa_amd_queue_intercept_register_fn =
        nullptr;
    decltype(CoreApiTable::hsa_status_string_fn) hsa_status_string_fn = nullptr;
};

queue_registration_t*
get_queue_registration()
{
    static auto*& registration =
        rocprofiler::common::static_object<queue_registration_t>::construct();
    return registration;
}

std::string_view
get_hsa_status_string(hsa_status_t _status)
{
    auto* registration = CHECK_NOTNULL(get_queue_registration());
    const char* _status_msg = nullptr;
    return (CHECK_NOTNULL(registration->hsa_status_string_fn)(_status, &_status_msg) ==
                HSA_STATUS_SUCCESS &&
            _status_msg)
               ? std::string_view{_status_msg}
               : std::string_view{"(unknown HSA error)"};
}

#define ROCP_ATTACH_HSA_TABLE_CALL(SEVERITY, EXPR) \
    auto ROCPROFILER_VARIABLE(rocp_hsa_table_call_, __LINE__) = (EXPR); \
    ROCP_##SEVERITY##_IF(ROCPROFILER_VARIABLE(rocp_hsa_table_call_, __LINE__) != \
                         HSA_STATUS_SUCCESS) \
        << #EXPR << " returned non-zero status code " \
        << ROCPROFILER_VARIABLE(rocp_hsa_table_call_, __LINE__) \
        << " :: " << get_hsa_status_string(ROCPROFILER_VARIABLE(rocp_hsa_table_call_, __LINE__)) \
        << " "

// This is the attach library's WriteInterceptor that is provided to HSA.
// Since the interceptor function cannot be changed later, this shim is registered immediately upon
// queue creation. This shim's user data is a pointer to the queue_entry_t for this queue, which
// is then cast and used to call the user write interceptor if it is non-null.
void
write_interceptor(const void* packets,
                  uint64_t pkt_count,
                  uint64_t unused,
                  void* data,
                  hsa_amd_queue_intercept_packet_writer_t writer)
{
    ROCP_FATAL_IF(data == nullptr) << "WriteInterceptor was not passed a valid pointer";
    const auto* entry = static_cast<const queue_entry_t*>(data);
    if(entry->user_write_interceptor_func)
    {
        entry->user_write_interceptor_func(
            packets, pkt_count, unused, entry->user_write_interceptor_data, writer);
    }
    else
    {
        // No user interceptor installed yet: forward the packets unchanged
        writer(packets, pkt_count);
    }
}

// HSA Intercept Functions (create_queue/destroy_queue)
hsa_status_t
create_queue(hsa_agent_t agent,
             uint32_t size,
             hsa_queue_type32_t type,
             callback_t callback,
             void* data,
             uint32_t private_segment_size,
             uint32_t group_segment_size,
             hsa_queue_t** queue)
{
    auto* registration = CHECK_NOTNULL(get_queue_registration());

    // Create new queue in HSA
    hsa_queue_t* new_queue = nullptr;
    ROCP_FATAL_IF(!registration->hsa_amd_queue_intercept_create_fn ||
                  !registration->hsa_amd_profiling_set_profiler_enabled_fn ||
                  !registration->hsa_amd_queue_intercept_register_fn ||
                  !registration->hsa_status_string_fn)
        << "Queue registration was not initialized before create queue was called!";

    ROCP_ATTACH_HSA_TABLE_CALL(FATAL,
                               registration->hsa_amd_queue_intercept_create_fn(agent,
                                                                               size,
                                                                               type,
                                                                               callback,
                                                                               data,
                                                                               private_segment_size,
                                                                               group_segment_size,
                                                                               &new_queue))
        << "Could not create intercept queue";

    ROCP_ATTACH_HSA_TABLE_CALL(
        FATAL, registration->hsa_amd_profiling_set_profiler_enabled_fn(new_queue, true))
        << "Could not setup intercept profiler";

    // Create and insert our queue's data entry now, as we need to provide a reference to it for the
    // write_interceptor
    queue_entry_t entry{};
    entry.agent = agent;
    queue_entry_t* write_interceptor_data = nullptr;
    {
        std::lock_guard lg(registration->queues_mutex);
        ROCP_FATAL_IF(registration->queues.count(new_queue) > 0)
            << "Queue registration already contains an entry for new queue handle " << new_queue;
        // Take the entry's address while the lock is held. std::unordered_map keeps element
        // addresses stable across rehashing, so the pointer remains valid until the entry is erased
        write_interceptor_data = &registration->queues.insert({new_queue, entry}).first->second;
    }

    // Pass queue_entry_t* as user data, used to directly call the user's write interceptor
    ROCP_ATTACH_HSA_TABLE_CALL(FATAL,
                               registration->hsa_amd_queue_intercept_register_fn(
                                   new_queue, write_interceptor, write_interceptor_data))
        << "Could not register interceptor";

    *queue = new_queue;
    ROCP_INFO << "created attach queue for HSA agent handle " << agent.handle;

    auto* attach_table = rocprofiler::attach::get_dispatch_table();
    if(attach_table->rocprofiler_attach_notify_new_queue)
    {
        attach_table->rocprofiler_attach_notify_new_queue(new_queue, agent, nullptr);
    }
    return HSA_STATUS_SUCCESS;
}

hsa_status_t
destroy_queue(hsa_queue_t* hsa_queue)
{
    auto* registration = get_queue_registration();
    if(registration)
    {
        std::lock_guard lg(registration->queues_mutex);
        size_t erase_count = registration->queues.erase(hsa_queue);
        ROCP_WARNING_IF(erase_count == 0)
            << "Destroy queue was called for a handle that was not in queues: " << hsa_queue;
    }
    return HSA_STATUS_SUCCESS;
}

int
iterate_all_queues(rocprof_attach_queue_iterator_t func, void* user_data)
{
    auto* registration = CHECK_NOTNULL(get_queue_registration());
    std::lock_guard lg(registration->queues_mutex);
    for(const auto& qr_pair : registration->queues)
    {
        func(qr_pair.first, qr_pair.second.agent, user_data);
    }
    return ROCPROFILER_STATUS_SUCCESS;
}

int
set_write_interceptor(hsa_queue_t* queue, write_interceptor_t func, void* data)
{
    auto* registration = CHECK_NOTNULL(get_queue_registration());
    auto qr_pair = registration->queues.find(queue);
    if(qr_pair == registration->queues.end())
    {
        ROCP_ERROR << "couldn't find registration to set write interceptor for queue " << queue;
        return ROCPROFILER_STATUS_ERROR_INVALID_ARGUMENT;
    }
    qr_pair->second.user_write_interceptor_func = func;
    qr_pair->second.user_write_interceptor_data = data;
    return ROCPROFILER_STATUS_SUCCESS;
}
}  // namespace

namespace rocprofiler
{
namespace attach
{
void
queue_registration_init(HsaApiTable* table)
{
    ROCP_TRACE << "Initializing Queue Registration";
    auto* registration = CHECK_NOTNULL(get_queue_registration());

    // Route queue creation and destruction through the attach library
    CoreApiTable& core_table = *table->core_;
    core_table.hsa_queue_create_fn = create_queue;
    core_table.hsa_queue_destroy_fn = destroy_queue;

    // Save the HSA entry points that create_queue depends on
    registration->hsa_amd_queue_intercept_create_fn =
        *table->amd_ext_->hsa_amd_queue_intercept_create_fn;
    registration->hsa_amd_profiling_set_profiler_enabled_fn =
        *table->amd_ext_->hsa_amd_profiling_set_profiler_enabled_fn;
    registration->hsa_amd_queue_intercept_register_fn =
        *table->amd_ext_->hsa_amd_queue_intercept_register_fn;
    registration->hsa_status_string_fn = *table->core_->hsa_status_string_fn;
}
}  // namespace attach
}  // namespace rocprofiler

ROCPROFILER_EXTERN_C_INIT

int
rocprofiler_attach_iterate_all_queues(rocprof_attach_queue_iterator_t func, void* data)
{
    return iterate_all_queues(func, data);
}

int
rocprofiler_attach_set_write_interceptor(hsa_queue_t* queue, write_interceptor_t func, void* data)
{
    return set_write_interceptor(queue, func, data);
}

ROCPROFILER_EXTERN_C_FINI