rocm-systems/projects/rocprofiler-sdk/source/docs/api-reference/process_attachment.rst
Mark Meserve bf49039005 [rocprofiler-sdk][rocprofiler-register] Initial Attachment Support (#316)
* attach: milestone: API tracing

- This pairs with another commit in rocprofiler-sdk to fully
  function
- Add ptrace entry points for tool attachment
- API tracing works at this commit
- Queue tracing not supported yet

* attach: cleanup

- Remove hardcode for loading of tool library
- Make invoke registration functions public again

* attach: proxy queue first draft

- Adds ability to trace with queues during attachment
- Must be paired with updated rocprofiler-sdk

* attach: prestore overhaul

- Must be paired with commit in rocprofiler-sdk

* attach: add dispatch table rework

- Register will load the prestore library and provide entrypoints to sdk

* attach: formatting and cleanup

* attach: revise dispatch table scheme

* attach: formatting

* attach: milestone: API tracing

- This change must be paired with a change in rocprofiler-register to
  fully function.
- API tracing works at this commit
- Queue tracing not supported yet

* attach: cleanup and comments

* attach: Formatting and crash fixes

* attach: add attach duration

- Add option attach-duration-msec for attachment

* Formatting + sglang hang fix via signal handling

* Changed FATAL_IF to DFATAL_IF for scratch_memory due to persistent crash when iterating queues

* attach: proxy queue first draft

- Adds ability to trace with queues during attachment
- Must be paired with updated rocprofiler-register

* Allow null agents for scratch output

* attach: improve queue library interface

- Significant changes to force exported interfaces back to C
- Fixes bug with unknown agents at attachment
- Code objects' names may still be incorrect

* attach: add code_object support

- Kernel traces will now have names and all other information for launches
- Add capture of hsa_executable to the queue library
- Various logging improvements

* attach: rename queue library to prestore

* attach: prestore overhaul

- Must be paired with commit from rocprofiler-register
- Massive overhaul of code organization in prestore library
  - Separates registrations for different object types
  - Sets up future changes for initialization

* attach: add prestore dispatch table

- Removes linkage to prestore library from sdk

* attach: cleanup

* attach: formatting

* attach: fix input prompt not appearing

* attach: fix component name in cmake

* attach: revert change to export level

* Make prestore API public

* attach: update sdk attachment library WIP

- This commit is NONFUNCTIONAL

- Changes around structure to remove classes
- Separate C linkage where needed
- Still needs updates to register for correct usage

* attach: update register with dispatch table WIP
- This commit is NONFUNCTIONAL

- Changes rocprofiler_register to handle dispatch table from attach
  library.
- Still needs changes in SDK with dispatch table usage

* attach: dispatch table wip
- This commit is NONFUNCTIONAL

* attach: move attach component into core

* attach: rename to rocprofv3-attach

* attach: add callbacks for new queues and code objects

* attach: finish dispatch table implementation

- Fixes kernel tracing

* attach: add cmake variable for attachment support

* feat: Add --attach alias for rocprofv3 with comprehensive attachment tests

- Add `--attach` as an alias to existing `-p/--pid` functionality in rocprofv3.py
- Create comprehensive attachment test suite with CSV and JSON output validation:
- New attachment-test application for testing dynamic profiling scenarios
- Unified test script supporting both CSV and JSON output formats
- Pytest-based validation for kernel traces, memory copies, HSA API calls, and agent info
- Add CMake integration for automated attachment testing
- Support parameterized output directory and filename specification
- Implement proper environment setup for attachment queue registration

Tests verify successful attachment to running processes and capture of:
- Kernel dispatch traces with workgroup/grid dimensions
- Memory copy operations (H2D/D2H) with size validation
- HSA API call traces across multiple domains
- GPU/CPU agent information and capabilities

* Documentation Update

* attach: make attach script callable

* Added ROCPROFILER_REGISTER_ATTACHMENT_TOOL_LIB to remove hardcoded name

* attach: revert metrics library path changes

* Generic Attachment in Register (#942)

Remove tool references in register

* Add second param to attach call in rocprof register

* Add experimental reattachment support for ROCprofiler-SDK

This commit introduces experimental reattachment functionality allowing tools
to dynamically reattach to running processes with comprehensive design changes
to support multiple attach/detach cycles:

**Core Reattachment API:**
- Add rocprofiler_tool_configure_result_experimental_t with tool_reattach/tool_detach callbacks
- Add rocprofiler_call_client_reattach and rocprofiler_call_client_detach C exports
- Implement reattachment tracking in rocprofiler_register_attach to differentiate
initial attachment from reattachment cycles
- Add rocprofiler_register_invoke_reattach for handling reattachment requests

**Design Changes - Registration System Flow:**
The registration system now supports a dual-path initialization:

1. Initial Attachment Flow:
    - rocprofiler_register_attach() -> rocprofiler_register_invoke_all_registrations()
    - Full tool initialization with complete context setup
    - Sets prev_attached atomic flag to track state

2. Reattachment Flow:
    - rocprofiler_register_attach() detects prev_attached=true -> rocprofiler_register_invoke_reattach()
    - Bypasses full re-initialization, calls client reattach callbacks instead
    - Preserves existing contexts and buffers, only reactivates profiling services

**Design Changes - Tool Library Loading:**
Enhanced rocprofiler-register library loading with function pointer resolution:
- Extended rocp_set_api_table_data_t tuple to include reattach/detach function pointers
- Automatic symbol resolution for rocprofiler_call_client_reattach/detach functions
- Support for both LD_PRELOAD and dlopen scenarios with consistent callback availability

**Design Changes - Context Management:**
Introduced dual context systems for attachment scenarios:
- get_contexts() - Original contexts for standard tool initialization
- get_attach_contexts() - Separate context map for attachment-specific lifecycle
- attach_init() - Creates contexts for ALL buffer tracing services using existing buffers
- attach_start() - Selectively starts contexts based on configuration options
- attach_detach() - Cleanly stops and destroys attachment contexts

**Design Changes - Buffer Management:**
Added reset_tmp_file_buffer() template for clean reattachment state:
- Properly closes and removes old temporary files
- Deletes existing file_buffer instances to prevent stale file position tracking
- Creates fresh file_buffer instances for clean reattachment cycles
- Addresses core issue where file position metadata becomes stale between cycles

**Design Changes - Environment Variable Injection:**
Added ROCP_REGISTERED_TOOL_ATTACH environment variable:
- Distinguishes attachment-loaded tools from LD_PRELOAD scenarios
- Enables registration system to apply attachment-specific logic
- Helps tools adapt behavior for attachment vs standard initialization

**Attachment Context Management:**
- Add attach_init/attach_start/attach_detach functions for dynamic context lifecycle
- Add reset_tmp_file_buffer template for clean reattachment state management
- Implement get_attach_contexts() for tracking active attachment contexts

**Test Infrastructure:**
- Add projects/rocprofiler-sdk/tests/rocprofv3/reattach/ comprehensive test suite
- Include reattachment test scripts with unified attachment/detachment cycles
- Add validate.py with trace data validation for kernel, memory copy, HSA API, and agent info
- Add conftest.py for JSON and CSV data loading utilities

**Configuration Updates:**
- Update CMakeLists.txt to include reattachment tests in build system
- Add environment variable ROCP_REGISTERED_TOOL_ATTACH for attachment state tracking
- Enhance rocprofiler-register library loading with reattach/detach function resolution

**Flow Impact Analysis:**
This design enables robust multi-cycle attachment by:
1. Preventing duplicate initialization on reattachment
2. Maintaining separate context lifecycles for attachment vs standard operation
3. Ensuring clean temporary file state between attachment cycles
4. Providing tools with explicit reattach/detach callback hooks
5. Supporting both programmatic and environment-based tool configuration

The experimental nature allows for iteration on the API while establishing
the foundation for production-ready dynamic profiling capabilities.

* Fix misc clang-tidy warnings/errors

* CMake Option and Environment Variable Updates

- CMake: ROCPROFILER_REGISTER_ALWAYS_SUPPORT_ATTACH -> ROCPROFILER_REGISTER_BUILD_DEFAULT_ATTACHMENT
- Env: ROCPROFILER_REGISTER_ATTACHMENT_ENABLED ->

* Source reorganization

* Formatting + new lines at EOF

* Fix flake8 F841: local variable is assigned to but never used

* Update attachment test

- get rid of 5 second start delay
- add roctx

* Rework implementation

- Remove rocprofiler_tool_configure_result_experimental_t in lieu of rocprofiler_configure_attach
- Add <rocprofiler-sdk/experimental/registration.h>
- TODO: Update process_attachment.rst

* Handle re-attachment options

- inherit options from previous attachment
- check previous options do not modify data collection services

* Fix support for tools w/o rocprofiler_configure_attach

- fix segfault when rocprofiler_configure_attach does not exist
- fix naming convention for functions accepting attach dispatch table
- cleanup rocprofiler_configure_attach implementation in rocprofv3 tool

* attach: remove unknown agent handling

- Change was from earlier commit, no longer needed

* attach: add error for attaching without library loaded

* attach: revise version numbering

* attach: register header revisions

* attach: clang format register

* attach: formatting

* attach: fix build failure

- Remove cross dependency into rocprofiler-sdk, fixes build on some systems

* attach: revise register library detection

* Update rocprofiler-register and attach library

- formatting
- proper signature of register_functor for rocprofiler-sdk-attach library callback
- remove get_dispatch_registration_table()

* Bump rocprofiler-register version to 0.6.0 + AnyNewerVersion

* Fix output support for rocprofiler-sdk-tool

* Fix formatting

* Fix clang tidy errors

* Misc rocprofiler-sdk-attach fixes

* attach: add sigint handling to attach python

* tool README.md formatting

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

* Fix buffered output issue

* attach: add errors for tool attach

* CI Fixes

* Rework tests

* attach: improve library loading in rocprofv3 attach

* formatting

* Update tests to use pytest framework

* Fix test_attachment_hsa_api_trace

* attach: catch ctypes exceptions

* attach: fix leak in registration

* attach: fix sanitizer tests

* attach: fix sanitizer tests further

* attach: disable attach asan tests

* attach: disable ubsan test

* attach: fix permissions in installed test package

* attach: formatting

---------

Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>
Co-authored-by: Tim Gu <Tim.Gu@amd.com>
Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Benjamin Welton <bwelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-09-18 18:10:45 -05:00


.. meta::
   :description: Technical guide for implementing ROCprofiler-SDK process attachment
   :keywords: ROCprofiler-SDK, process attachment, ptrace, dynamic profiling, tool development

.. _process_attachment_implementation:

********************************************************************************
Implementing Process Attachment Tools
********************************************************************************
Overview
========
This document provides the technical details needed to implement a process attachment tool similar to ``rocprofv3 --attach``. Process attachment allows profiling tools to dynamically attach to running GPU applications without requiring application restart.
The implementation uses specific exported C functions and involves low-level process manipulation using ptrace, environment variable injection, library loading, and coordination with the ROCprofiler-SDK registration system.
Exported C Functions for Attachment
===================================
The attachment functionality provides the following exported C functions that tools can use:
ROCprofiler-Attach Functions
-----------------------------
These functions are exported from the ``rocprofiler-attach`` binary:

.. code-block:: cpp

   extern "C" {
   // Start attachment to a target process
   void attach(uint32_t pid) ROCPROFILER_EXPORT;

   // Detach from target process and cleanup
   void detach() ROCPROFILER_EXPORT;
   }

**Function Details:**

- ``attach(uint32_t pid)``: Main entry point for starting attachment to a process

  - Takes the target process ID as parameter
  - Initiates ptrace-based attachment sequence
  - Spawns background thread for ptrace operations

- ``detach()``: Entry point for detaching from the target process

  - Cleans up attachment resources and terminates profiling
  - Joins ptrace thread and releases resources

ROCprofiler-Register Functions
------------------------------
These functions are exported from the ``librocprofiler-register.so`` library and are called via ptrace:

.. code-block:: cpp

   extern "C" {
   // Activate profiling in target process (called via ptrace)
   rocprofiler_register_error_code_t
   rocprofiler_register_attach(const char* environment_buffer, const char* tool_lib_path)
       ROCPROFILER_REGISTER_PUBLIC_API;

   // Deactivate profiling in target process (called via ptrace)
   rocprofiler_register_error_code_t
   rocprofiler_register_detach()
       ROCPROFILER_REGISTER_PUBLIC_API;

   // Reattach to previously attached process (experimental)
   rocprofiler_register_error_code_t
   rocprofiler_register_invoke_reattach()
       ROCPROFILER_REGISTER_PUBLIC_API;

   // Client callback functions for reattachment support
   void rocprofiler_call_client_reattach(void)
       ROCPROFILER_REGISTER_PUBLIC_API;

   void rocprofiler_call_client_detach(void)
       ROCPROFILER_REGISTER_PUBLIC_API;
   }

**Function Details:**

- ``rocprofiler_register_attach(const char* environment_buffer, const char* tool_lib_path)``:

  - Called via ptrace from the attachment system
  - Receives serialized environment variables for profiling configuration
  - Receives the tool library path to load (defaults to ``librocprofiler-sdk-tool.so`` if NULL)
  - Loads the specified tool library and activates profiling services
  - Returns ``rocprofiler_register_error_code_t`` status

- ``rocprofiler_register_detach()``:

  - Called via ptrace to stop profiling in the target process
  - Calls the tool's detach function and cleans up resources
  - Returns ``rocprofiler_register_error_code_t`` status

- ``rocprofiler_register_invoke_reattach()``: (EXPERIMENTAL)

  - Called to reattach profiling to a previously attached process
  - Invokes client reattach callbacks without full re-initialization
  - Used for resuming profiling after temporary detachment
  - Returns ``rocprofiler_register_error_code_t`` status

- ``rocprofiler_call_client_reattach()`` and ``rocprofiler_call_client_detach()``:

  - C wrapper functions for client tool reattachment callbacks
  - Automatically resolved and called by the registration system
  - Enable tools to handle dynamic attach/detach cycles

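
The NULL fallback described for ``tool_lib_path`` can be pictured with a small helper. This is only a sketch of the documented behavior: ``resolve_tool_library`` is a hypothetical name and is not part of rocprofiler-register, whose actual implementation may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a rocprofiler-register API): returns the library
// name that would be loaded for a given tool_lib_path argument, mirroring the
// documented default of "librocprofiler-sdk-tool.so" when the path is NULL.
inline std::string resolve_tool_library(const char* tool_lib_path)
{
    constexpr const char* default_tool_lib = "librocprofiler-sdk-tool.so";
    return (tool_lib_path != nullptr) ? std::string{tool_lib_path}
                                      : std::string{default_tool_lib};
}
```

A caller that wants the default tool simply passes ``nullptr``; any other value is used verbatim.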
Function Call Sequence
======================
Initial Attachment Sequence
---------------------------
The initial attachment process follows this sequence:

.. code-block:: text

   Tool Implementation
           |
           v
   attach(pid)                                           ← Your tool calls this
           |
           v
   Ptrace attachment & environment setup
           |
           v
   rocprofiler_register_attach(env_buffer, tool_lib_path) ← Called via ptrace in target
           |
           v
   Profiling active in target process
           |
           v
   [Profiling data collection...]
           |
           v
   rocprofiler_register_detach()                         ← Called via ptrace in target
           |
           v
   detach()                                              ← Your tool calls this
           |
           v
   Cleanup complete

Reattachment Sequence (Experimental)
------------------------------------
For reattachment to a previously attached process:

.. code-block:: text

   Tool Implementation
           |
           v
   attach(pid)                                           ← Your tool calls this again
           |
           v
   Ptrace attachment & environment setup
           |
           v
   rocprofiler_register_attach(env_buffer, tool_lib_path) ← Detects previous attachment
           |
           v
   rocprofiler_register_invoke_reattach()                ← Calls client reattach callbacks
           |
           v
   Profiling resumed in target process
           |
           v
   [Continued profiling data collection...]
           |
           v
   rocprofiler_register_detach()                         ← Called via ptrace in target
           |
           v
   detach()                                              ← Your tool calls this
           |
           v
   Cleanup complete

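
The choice between the two sequences comes down to remembering whether an attachment has already happened. The following is a minimal sketch of that decision, assuming a single atomic flag; ``AttachStateTracker`` and ``AttachPath`` are illustrative names, not types from rocprofiler-register.

```cpp
#include <atomic>
#include <cassert>

// Which of the two documented sequences an attach request should follow.
enum class AttachPath { InitialAttach, Reattach };

// Illustrative only: tracks whether a previous attachment occurred.
class AttachStateTracker {
public:
    // The first call reports InitialAttach; every later call reports Reattach.
    // exchange() returns the previous flag value and sets it in one atomic step,
    // so two racing callers cannot both observe "not yet attached".
    AttachPath on_attach()
    {
        const bool was_attached = prev_attached_.exchange(true);
        return was_attached ? AttachPath::Reattach : AttachPath::InitialAttach;
    }

private:
    std::atomic<bool> prev_attached_{false};
};
```

With this shape, full initialization runs only on the ``InitialAttach`` path, and the ``Reattach`` path is free to invoke the client reattach callbacks instead.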
Using the Attachment Functions
==============================
Here's how to use these functions in your own attachment tool:
Basic Attachment Tool Implementation
------------------------------------

.. code-block:: cpp

   #include <chrono>
   #include <cstdint>
   #include <iostream>
   #include <thread>

   #include <dlfcn.h>      // dlopen, dlsym, dlclose
   #include <signal.h>     // kill
   #include <sys/types.h>  // pid_t

   class ROCprofilerAttachmentTool {
   private:
       void* attach_lib_handle = nullptr;
       void (*attach_func)(uint32_t) = nullptr;
       void (*detach_func)() = nullptr;

   public:
       bool initialize() {
           // Load the rocprofiler-attach library/binary
           attach_lib_handle = dlopen("librocprofiler-attach.so", RTLD_NOW);
           if (!attach_lib_handle) {
               std::cerr << "Failed to load rocprofiler-attach: " << dlerror() << std::endl;
               return false;
           }

           // Get the attachment function pointers
           attach_func = reinterpret_cast<void (*)(uint32_t)>(dlsym(attach_lib_handle, "attach"));
           detach_func = reinterpret_cast<void (*)()>(dlsym(attach_lib_handle, "detach"));
           if (!attach_func || !detach_func) {
               std::cerr << "Failed to find attachment functions" << std::endl;
               return false;
           }
           return true;
       }

       bool attach_to_process(pid_t pid, uint32_t duration_ms = 0) {
           // Validate the target process (signal 0 only checks existence and permission)
           if (kill(pid, 0) != 0) {
               std::cerr << "Target process " << pid << " is not accessible" << std::endl;
               return false;
           }

           std::cout << "Attaching to process " << pid << std::endl;

           // Start attachment - this will handle all ptrace operations
           attach_func(static_cast<uint32_t>(pid));

           if (duration_ms > 0) {
               // Profile for specified duration
               std::cout << "Profiling for " << duration_ms << " milliseconds..." << std::endl;
               std::this_thread::sleep_for(std::chrono::milliseconds(duration_ms));

               // Stop profiling
               detach_func();
           } else {
               std::cout << "Profiling until process ends or manual detach..." << std::endl;

               // Monitor process or wait for external signal to detach
               while (kill(pid, 0) == 0) {
                   std::this_thread::sleep_for(std::chrono::seconds(1));
               }
               detach_func();
           }

           std::cout << "Profiling completed" << std::endl;
           return true;
       }

       ~ROCprofilerAttachmentTool() {
           if (attach_lib_handle) {
               dlclose(attach_lib_handle);
           }
       }
   };

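
Because every successful ``attach`` must be paired with a ``detach``, even when an error or exception interrupts profiling, one option is to tie the pair to an object's lifetime. The sketch below assumes the two function pointers were already resolved with ``dlsym``; ``ScopedAttachment`` is a hypothetical helper, not something exported by the attach library.

```cpp
#include <cassert>
#include <cstdint>

using attach_fn_t = void (*)(uint32_t);
using detach_fn_t = void (*)();

// Hypothetical RAII wrapper: calls attach on construction and detach on
// destruction, so detach runs on every exit path from the enclosing scope.
class ScopedAttachment {
public:
    ScopedAttachment(attach_fn_t attach_fn, detach_fn_t detach_fn, uint32_t pid)
    : detach_fn_(detach_fn)
    {
        attach_fn(pid);
    }

    ~ScopedAttachment()
    {
        if (detach_fn_ != nullptr) detach_fn_();
    }

    // Copying would lead to detach being called twice.
    ScopedAttachment(const ScopedAttachment&) = delete;
    ScopedAttachment& operator=(const ScopedAttachment&) = delete;

private:
    detach_fn_t detach_fn_;
};
```

A tool would create a ``ScopedAttachment`` on the stack, profile for as long as the object is alive, and rely on scope exit to detach.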
Complete Tool Example
---------------------

.. code-block:: cpp

   #include <cstdint>
   #include <cstdlib>
   #include <iostream>
   #include <string>
   #include <vector>

   #include <sys/types.h>  // pid_t

   // Uses the ROCprofilerAttachmentTool class from the previous section

   int main(int argc, char* argv[]) {
       if (argc < 2) {
           std::cerr << "Usage: " << argv[0] << " <PID> [duration_ms]" << std::endl;
           std::cerr << "  PID: Process ID to attach to" << std::endl;
           std::cerr << "  duration_ms: Optional profiling duration in milliseconds" << std::endl;
           return 1;
       }

       // Note: std::stoi throws on non-numeric input
       pid_t target_pid = std::stoi(argv[1]);
       uint32_t duration = (argc > 2) ? std::stoi(argv[2]) : 0;

       // Set up profiling environment variables before attachment
       setenv("ROCP_TOOL_ATTACH", "1", 1);

       // Note: when no tool library path is supplied, the attachment system loads the
       // default tool library "librocprofiler-sdk-tool.so"; no environment variable
       // is used for tool selection
       setenv("ROCPROF_HIP_API_TRACE", "1", 1);
       setenv("ROCPROF_KERNEL_TRACE", "1", 1);
       setenv("ROCPROF_MEMORY_COPY_TRACE", "1", 1);
       setenv("ROCPROF_OUTPUT_PATH", "./attachment-output", 1);
       setenv("ROCPROF_OUTPUT_FILE_NAME", "attached_profile", 1);

       // Initialize and run attachment tool
       ROCprofilerAttachmentTool tool;
       if (!tool.initialize()) {
           std::cerr << "Failed to initialize attachment tool" << std::endl;
           return 1;
       }

       if (!tool.attach_to_process(target_pid, duration)) {
           std::cerr << "Attachment failed" << std::endl;
           return 1;
       }

       std::cout << "Attachment completed successfully" << std::endl;
       return 0;
   }

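
``std::stoi`` throws on malformed input and silently accepts trailing characters, so a tool that takes a PID from the command line may prefer explicit validation. The following is one possible sketch; ``parse_pid`` is a hypothetical helper introduced here for illustration only.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <limits>

#include <sys/types.h>  // pid_t

// Hypothetical helper: parses a decimal PID without throwing.
// Returns true and stores the value in `out` only for a positive number that
// fits in pid_t and is not followed by trailing characters.
inline bool parse_pid(const char* text, pid_t& out)
{
    if (text == nullptr || *text == '\0') return false;

    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);

    if (errno != 0 || *end != '\0') return false;  // overflow or trailing junk
    if (value <= 0 || value > std::numeric_limits<pid_t>::max()) return false;

    out = static_cast<pid_t>(value);
    return true;
}
```

In ``main`` this would replace the direct ``std::stoi(argv[1])`` call, with the usage message printed when ``parse_pid`` returns false.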
Experimental Reattachment API
=============================
ROCprofiler-SDK now provides experimental support for reattachment, allowing tools to handle dynamic attach/detach cycles more efficiently.
Tool Configuration for Reattachment
-----------------------------------
Tools that support reattachment should implement the experimental configuration structure:

.. code-block:: cpp

   #include <rocprofiler-sdk/registration.h>

   // tool_init and tool_fini are the tool's regular initialize/finalize
   // callbacks, assumed to be defined elsewhere in the tool

   // Experimental reattachment callbacks
   void tool_reattach(void* tool_data) {
       // Reinitialize contexts and resume profiling
       // This is called when reattaching to a previously profiled process
   }

   void tool_detach(void* tool_data) {
       // Suspend profiling operations temporarily
       // This is called during detachment, but contexts may be preserved
   }

   extern "C" rocprofiler_tool_configure_result_experimental_t*
   rocprofiler_configure_experimental(uint32_t                 version,
                                      const char*              runtime_version,
                                      uint32_t                 prio,
                                      rocprofiler_client_id_t* client_id)
   {
       static auto cfg = rocprofiler_tool_configure_result_experimental_t {
           .size = sizeof(rocprofiler_tool_configure_result_experimental_t),
           .initialize = &tool_init,
           .finalize = &tool_fini,
           .tool_data = nullptr,
           .tool_reattach = &tool_reattach,  // Experimental reattachment support
           .tool_detach = &tool_detach       // Experimental detachment support
       };
       return &cfg;
   }

Client Callback Functions
-------------------------
The registration system automatically provides C wrapper functions:

.. code-block:: cpp

   // These are automatically generated and called by rocprofiler-register
   extern "C" void rocprofiler_call_client_reattach(void) {
       // Calls the tool's reattach callback with stored tool_data
   }

   extern "C" void rocprofiler_call_client_detach(void) {
       // Calls the tool's detach callback with stored tool_data
   }

Reattachment Environment Variables
----------------------------------

When using reattachment, set this additional environment variable:

.. code-block:: cpp

   // Indicates that the tool was loaded via attachment (not LD_PRELOAD)
   setenv("ROCPROFILER_REGISTER_TOOL_ATTACHED", "1", 1);

This helps the registration system differentiate between initial attachment and reattachment cycles.

Environment Variable Configuration
==================================

Before calling the attachment functions, set up environment variables that will be injected into the target process:

Required Variables
------------------

.. code-block:: cpp

   // Essential for attachment functionality
   setenv("ROCP_TOOL_ATTACH", "1", 1);

Tool Library Configuration
--------------------------
When no tool library path is passed to ``rocprofiler_register_attach``, the attachment system loads a default tool library:

.. code-block:: cpp

   // With a NULL tool_lib_path, "librocprofiler-sdk-tool.so" is loaded
   // No environment variable configuration is needed or supported

Tracing Options
---------------

.. code-block:: cpp

   // Enable different types of tracing
   setenv("ROCPROF_HIP_API_TRACE", "1", 1);            // HIP API calls
   setenv("ROCPROF_HSA_API_TRACE", "1", 1);            // HSA API calls
   setenv("ROCPROF_KERNEL_TRACE", "1", 1);             // Kernel dispatches
   setenv("ROCPROF_MEMORY_COPY_TRACE", "1", 1);        // Memory operations
   setenv("ROCPROF_MEMORY_ALLOCATION_TRACE", "1", 1);  // Memory allocations
   setenv("ROCPROF_SCRATCH_MEMORY_TRACE", "1", 1);     // Scratch memory
   setenv("ROCPROF_MARKER_TRACE", "1", 1);             // ROCTx markers

Output Configuration
--------------------

.. code-block:: cpp

   // Control output location and format
   setenv("ROCPROF_OUTPUT_PATH", "/path/to/output", 1);
   setenv("ROCPROF_OUTPUT_FILE_NAME", "profile_name", 1);
   setenv("ROCPROF_OUTPUT_FORMAT", "csv", 1);  // or "json", "pftrace", etc.

Build Configuration
===================

To build a tool using the attachment functions:

CMakeLists.txt
--------------

.. code-block:: cmake

   cmake_minimum_required(VERSION 3.16)
   project(my_rocprofiler_attach_tool)

   set(CMAKE_CXX_STANDARD 17)

   # Find ROCprofiler SDK (for headers and linking)
   find_package(rocprofiler-sdk REQUIRED)

   add_executable(my_attach_tool
       main.cpp
       attachment_tool.cpp
   )

   # Link with required libraries
   target_link_libraries(my_attach_tool
       rocprofiler-sdk::rocprofiler-sdk
       ${CMAKE_DL_LIBS}  # for dlopen/dlsym operations
   )

   # Set capabilities for ptrace operations
   add_custom_command(TARGET my_attach_tool POST_BUILD
       COMMAND sudo setcap cap_sys_ptrace+ep $<TARGET_FILE:my_attach_tool>
       COMMENT "Setting ptrace capability"
   )

Error Handling
==============

When using the attachment functions, handle these common error conditions:

.. code-block:: cpp

   #include <cstdlib>
   #include <fstream>
   #include <iostream>
   #include <string>

   #include <signal.h>     // kill
   #include <sys/types.h>  // pid_t
   #include <unistd.h>     // geteuid

   class AttachmentErrorHandler {
   public:
       static bool validate_target_process(pid_t pid) {
           // Check if process exists
           if (kill(pid, 0) != 0) {
               std::cerr << "Process " << pid << " not found or not accessible" << std::endl;
               return false;
           }

           // Check if it's a GPU application
           std::string maps_path = "/proc/" + std::to_string(pid) + "/maps";
           std::ifstream maps(maps_path);
           std::string line;
           bool has_gpu_libs = false;
           while (std::getline(maps, line)) {
               if (line.find("libamdhip64.so") != std::string::npos ||
                   line.find("libhsa-runtime64.so") != std::string::npos) {
                   has_gpu_libs = true;
                   break;
               }
           }

           if (!has_gpu_libs) {
               std::cerr << "Process " << pid << " does not appear to use GPU APIs" << std::endl;
               return false;
           }
           return true;
       }

       static void handle_attachment_errors() {
           // Check for common permission issues
           if (geteuid() != 0) {
               std::cerr << "Warning: Not running as root. Ensure CAP_SYS_PTRACE capability is set." << std::endl;
           }

           // Check if rocprofiler libraries are available
           const char* ld_library_path = getenv("LD_LIBRARY_PATH");
           if (ld_library_path == nullptr ||
               std::string(ld_library_path).find("/opt/rocm/lib") == std::string::npos) {
               std::cerr << "Warning: /opt/rocm/lib may not be in LD_LIBRARY_PATH" << std::endl;
           }
       }
   };

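
The ``LD_LIBRARY_PATH`` check above uses a substring search, which also matches entries such as ``/opt/rocm/lib64``. A stricter variant compares whole colon-separated entries. This is a sketch only; ``path_list_contains`` is a hypothetical helper name.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns true when `dir` appears as a complete entry in a
// colon-separated search path such as LD_LIBRARY_PATH. Trailing slashes on an
// entry are ignored, so "/opt/rocm/lib/" matches "/opt/rocm/lib".
inline bool path_list_contains(const char* path_list, const std::string& dir)
{
    if (path_list == nullptr) return false;

    std::istringstream entries{path_list};
    std::string entry;
    while (std::getline(entries, entry, ':')) {
        while (entry.size() > 1 && entry.back() == '/') entry.pop_back();
        if (entry == dir) return true;
    }
    return false;
}
```

It can be called as ``path_list_contains(getenv("LD_LIBRARY_PATH"), "/opt/rocm/lib")``; an unset variable is handled by the null check.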
Architecture Overview
=====================
Process attachment consists of several cooperating components:

.. code-block:: text

   Attachment Tool (your implementation)
           |
           v
   1. Process Discovery & Validation
           |
           v
   2. Ptrace Attachment & Control
           |
           v
   3. Environment Variable Injection
           |
           v
   4. Library Loading (rocprofiler-register)
           |
           v
   5. Profiling Service Activation
           |
           v
   6. Data Collection & Management
           |
           v
   7. Detachment & Cleanup

Theoretical Implementation Details
==================================
Core Implementation Components
==============================
1. Process Discovery and Validation
-----------------------------------
**Target Process Requirements:**

.. code-block:: cpp

   #include <fstream>
   #include <string>

   #include <signal.h>
   #include <sys/types.h>
   #include <unistd.h>

   bool validate_target_process(pid_t pid) {
       // Check if process exists and is accessible
       if (kill(pid, 0) != 0) {
           return false;  // Process doesn't exist or no permission
       }

       // Verify it's a GPU application by checking loaded libraries
       std::string maps_path = "/proc/" + std::to_string(pid) + "/maps";
       std::ifstream maps(maps_path);
       std::string line;
       bool has_hip = false, has_hsa = false;
       while (std::getline(maps, line)) {
           if (line.find("libamdhip64.so") != std::string::npos) has_hip = true;
           if (line.find("libhsa-runtime64.so") != std::string::npos) has_hsa = true;
       }

       return has_hip || has_hsa;  // Must use HIP or HSA
   }

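
Searching the whole ``/proc/<pid>/maps`` line can produce false positives when a directory or unrelated file name happens to contain the library name. A slightly more careful sketch looks only at the file name of the mapped path. The helper names below are hypothetical; the field layout (address, permissions, offset, device, inode, pathname) is the standard Linux ``maps`` format.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns the file name of the mapped path on one
// /proc/<pid>/maps line, or an empty string for anonymous mappings.
inline std::string maps_line_basename(const std::string& line)
{
    std::istringstream fields{line};
    std::string address, perms, offset, dev, inode, path;
    if (!(fields >> address >> perms >> offset >> dev >> inode)) return {};

    // The pathname is the remainder of the line and may contain spaces.
    std::getline(fields >> std::ws, path);

    const auto slash = path.find_last_of('/');
    return (slash == std::string::npos) ? path : path.substr(slash + 1);
}

// True when the mapped file name starts with `soname`, so "libamdhip64.so"
// also matches versioned names such as "libamdhip64.so.6".
inline bool maps_line_has_library(const std::string& line, const std::string& soname)
{
    return maps_line_basename(line).rfind(soname, 0) == 0;
}
```

Inside the loop over ``maps`` lines, ``maps_line_has_library(line, "libamdhip64.so")`` would take the place of the ``line.find(...)`` test.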
2. Ptrace-Based Process Control
-------------------------------

**Core Ptrace Operations:**

.. code-block:: cpp

   #include <cstdio>

   #include <sys/ptrace.h>
   #include <sys/types.h>
   #include <sys/user.h>
   #include <sys/wait.h>

   class ProcessAttachment {
   private:
       pid_t target_pid = -1;
       bool attached = false;

   public:
       bool attach(pid_t pid) {
           target_pid = pid;

           // Attach to the target process
           if (ptrace(PTRACE_ATTACH, target_pid, nullptr, nullptr) == -1) {
               perror("ptrace PTRACE_ATTACH failed");
               return false;
           }

           // Mark as attached right away so that detach() on the failure
           // paths below actually issues PTRACE_DETACH
           attached = true;

           // Wait for the process to stop
           int status;
           if (waitpid(target_pid, &status, 0) == -1) {
               perror("waitpid failed");
               detach();
               return false;
           }

           if (!WIFSTOPPED(status)) {
               fprintf(stderr, "Process did not stop after attach\n");
               detach();
               return false;
           }

           return true;
       }

       bool detach() {
           if (!attached) return true;

           // Detach and allow process to continue
           if (ptrace(PTRACE_DETACH, target_pid, nullptr, nullptr) == -1) {
               perror("ptrace PTRACE_DETACH failed");
               return false;
           }

           attached = false;
           return true;
       }
   };

3. Environment Variable Injection
---------------------------------
**Environment Variable Management:**

.. code-block:: cpp

   #include <cstdint>
   #include <string>
   #include <vector>

   class EnvironmentInjector {
   public:
       struct EnvironmentVar {
           std::string name;
           std::string value;
       };

       // Prepare environment variables for profiling
       std::vector<EnvironmentVar> prepare_profiling_env(
           const std::vector<std::string>& trace_options,
           const std::string& output_path,
           const std::string& output_file) {
           std::vector<EnvironmentVar> env_vars;

           // Essential attachment variable
           env_vars.push_back({"ROCP_TOOL_ATTACH", "1"});

           // Configure tracing based on options
           for (const auto& option : trace_options) {
               if (option == "hip-trace") {
                   env_vars.push_back({"ROCPROF_HIP_API_TRACE", "1"});
               }
               if (option == "kernel-trace") {
                   env_vars.push_back({"ROCPROF_KERNEL_TRACE", "1"});
               }
               if (option == "hsa-trace") {
                   env_vars.push_back({"ROCPROF_HSA_API_TRACE", "1"});
               }
               if (option == "memory-copy-trace") {
                   env_vars.push_back({"ROCPROF_MEMORY_COPY_TRACE", "1"});
               }
           }

           // Output configuration
           env_vars.push_back({"ROCPROF_OUTPUT_PATH", output_path});
           env_vars.push_back({"ROCPROF_OUTPUT_FILE_NAME", output_file});
           return env_vars;
       }

       // Serialize environment for injection
       std::vector<uint8_t> serialize_environment(const std::vector<EnvironmentVar>& vars) {
           std::vector<uint8_t> buffer(4);  // Start with count
           uint32_t count = static_cast<uint32_t>(vars.size());

           // Store count in first 4 bytes (little-endian)
           buffer[0] = count & 0xFF;
           buffer[1] = (count >> 8) & 0xFF;
           buffer[2] = (count >> 16) & 0xFF;
           buffer[3] = (count >> 24) & 0xFF;

           // Add each variable as null-terminated name and value
           for (const auto& var : vars) {
               // Add variable name
               buffer.insert(buffer.end(), var.name.begin(), var.name.end());
               buffer.push_back(0);  // Null terminate name

               // Add variable value
               buffer.insert(buffer.end(), var.value.begin(), var.value.end());
               buffer.push_back(0);  // Null terminate value
           }
           return buffer;
       }
   };

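
To make the buffer layout concrete, here is the inverse of ``serialize_environment`` as sketched above: a 4-byte little-endian count followed by null-terminated name/value pairs. This decodes only the illustrative format shown in this section; it is an assumption that the real wire format consumed by ``rocprofiler_register_attach`` looks like this, and ``deserialize_environment`` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using EnvPairs = std::vector<std::pair<std::string, std::string>>;

// Hypothetical decoder for the sketch format: [count: u32 LE][name\0value\0]...
// Returns false on a truncated or over-long buffer; `out` may then hold the
// pairs decoded before the error was found.
inline bool deserialize_environment(const std::vector<uint8_t>& buffer, EnvPairs& out)
{
    out.clear();
    if (buffer.size() < 4) return false;

    const uint32_t count = static_cast<uint32_t>(buffer[0]) |
                           (static_cast<uint32_t>(buffer[1]) << 8) |
                           (static_cast<uint32_t>(buffer[2]) << 16) |
                           (static_cast<uint32_t>(buffer[3]) << 24);
    std::size_t pos = 4;

    // Reads one null-terminated string starting at pos.
    auto read_string = [&](std::string& text) {
        text.clear();
        while (pos < buffer.size() && buffer[pos] != 0) {
            text.push_back(static_cast<char>(buffer[pos++]));
        }
        if (pos >= buffer.size()) return false;  // terminator missing
        ++pos;                                   // skip the terminator
        return true;
    };

    for (uint32_t i = 0; i < count; ++i) {
        std::string name, value;
        if (!read_string(name) || !read_string(value)) return false;
        out.emplace_back(std::move(name), std::move(value));
    }
    return pos == buffer.size();  // reject trailing bytes
}
```

Round-tripping a buffer through the serializer and this decoder is a quick way to check that both sides agree on the layout before injecting it into a target process.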
4. Memory Manipulation and Library Loading
------------------------------------------
**Remote Memory Operations:**

.. code-block:: cpp

    #include <sys/mman.h>
    #include <sys/ptrace.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <sys/user.h>
    #include <sys/wait.h>
    #include <algorithm>
    #include <cerrno>
    #include <cstdint>
    #include <cstring>

    // x86-64 Linux only; the target must already be attached and ptrace-stopped
    class RemoteMemoryManager {
    private:
        pid_t target_pid;

    public:
        explicit RemoteMemoryManager(pid_t pid) : target_pid(pid) {}

        // Allocate memory in remote process
        void* remote_mmap(size_t length, int prot, int flags) {
            // Save original registers
            struct user_regs_struct orig_regs;
            if (ptrace(PTRACE_GETREGS, target_pid, nullptr, &orig_regs) == -1) {
                return nullptr;
            }
            // Setting registers alone does not run a syscall: temporarily
            // overwrite the code at rip with a `syscall` instruction (0x0f 0x05)
            errno = 0;
            long orig_code = ptrace(PTRACE_PEEKTEXT, target_pid, (void*)orig_regs.rip, nullptr);
            if (errno != 0) {
                return nullptr;
            }
            long patched = (orig_code & ~0xFFFFL) | 0x050FL;
            if (ptrace(PTRACE_POKETEXT, target_pid, (void*)orig_regs.rip, patched) == -1) {
                return nullptr;
            }
            // Set up mmap syscall
            struct user_regs_struct regs = orig_regs;
            regs.rax = SYS_mmap;  // 9 on x86-64
            regs.rdi = 0;         // addr (let kernel choose)
            regs.rsi = length;
            regs.rdx = prot;
            regs.r10 = flags;
            regs.r8  = static_cast<unsigned long long>(-1);  // fd
            regs.r9  = 0;                                    // offset
            // Execute exactly one instruction (the injected syscall) and read the result
            void* result = nullptr;
            int status = 0;
            if (ptrace(PTRACE_SETREGS, target_pid, nullptr, &regs) != -1 &&
                ptrace(PTRACE_SINGLESTEP, target_pid, nullptr, nullptr) != -1 &&
                waitpid(target_pid, &status, 0) == target_pid && WIFSTOPPED(status) &&
                ptrace(PTRACE_GETREGS, target_pid, nullptr, &regs) != -1) {
                // A raw syscall reports failure as -errno, a value in [-4095, -1]
                if (regs.rax < static_cast<unsigned long long>(-4095)) {
                    result = (void*)regs.rax;
                }
            }
            // Restore original code and registers, even on failure
            ptrace(PTRACE_POKETEXT, target_pid, (void*)orig_regs.rip, orig_code);
            ptrace(PTRACE_SETREGS, target_pid, nullptr, &orig_regs);
            return result;
        }

        // Write data to remote process memory
        bool write_memory(void* addr, const void* data, size_t size) {
            const uint8_t* bytes = static_cast<const uint8_t*>(data);
            size_t written = 0;
            while (written < size) {
                long word = 0;
                size_t to_copy = std::min(sizeof(long), size - written);
                // For partial words, read existing content first
                if (to_copy < sizeof(long)) {
                    errno = 0;
                    word = ptrace(PTRACE_PEEKDATA, target_pid,
                                  (uint8_t*)addr + written, nullptr);
                    if (errno != 0) return false;
                }
                // Copy new data into word
                memcpy(&word, bytes + written, to_copy);
                // Write word to remote process
                if (ptrace(PTRACE_POKEDATA, target_pid,
                           (uint8_t*)addr + written, word) == -1) {
                    return false;
                }
                written += to_copy;
            }
            return true;
        }
    };

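
``PTRACE_POKEDATA`` always writes a full machine word, which is why ``write_memory`` reads the existing word before a partial write. The byte merge itself can be isolated and checked without a target process. This is a minimal sketch; ``merge_partial_word`` is a hypothetical helper introduced here for illustration only.

.. code-block:: cpp

    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    // Overlay the first `count` bytes of `src` onto `existing` and leave the
    // remaining bytes of the word unchanged. `count` is clamped to sizeof(long).
    long merge_partial_word(long existing, const uint8_t* src, std::size_t count) {
        if (count > sizeof(long)) {
            count = sizeof(long);
        }
        long word = existing;
        std::memcpy(&word, src, count);  // byte-wise copy, independent of endianness
        return word;
    }

Skipping the read step would overwrite the bytes that follow a short write with zeros, which silently corrupts neighbouring data in the target.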
5. Library Injection and Symbol Resolution
------------------------------------------
**Dynamic Library Loading:**

.. code-block:: cpp

    #include <dlfcn.h>
    #include <link.h>
    #include <sys/mman.h>
    #include <cstdint>
    #include <cstdio>
    #include <cstring>
    #include <fstream>
    #include <string>
    #include <vector>

    class LibraryInjector {
    private:
        pid_t target_pid;
        RemoteMemoryManager memory_manager;

        // Not shown in this guide: must load the arguments into registers per
        // the x86-64 calling convention, run the target until the call returns,
        // and restore the saved registers
        bool call_remote_function(void* func_addr, const std::vector<uint64_t>& args);

    public:
        explicit LibraryInjector(pid_t pid) : target_pid(pid), memory_manager(pid) {}

        // Inject rocprofiler-register library
        bool inject_register_library() {
            const char* lib_path = "/opt/rocm/lib/librocprofiler-register.so";
            // Find dlopen in target process
            void* dlopen_addr = find_function_address("dlopen");
            if (!dlopen_addr) {
                fprintf(stderr, "Could not find dlopen in target process\n");
                return false;
            }
            // Allocate memory for library path
            void* path_addr = memory_manager.remote_mmap(
                strlen(lib_path) + 1,
                PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS);
            if (!path_addr) return false;
            // Write library path to remote memory
            if (!memory_manager.write_memory(path_addr, lib_path, strlen(lib_path) + 1)) {
                return false;
            }
            // Call dlopen in target process
            return call_remote_function(dlopen_addr,
                                        {(uint64_t)path_addr, RTLD_NOW | RTLD_GLOBAL});
        }

        void* find_function_address(const char* function_name) {
            // Resolve the function in this process and identify the library that
            // provides it (dlopen lives in libc on glibc 2.34 and later, in libdl before)
            void* local_addr = dlsym(RTLD_DEFAULT, function_name);
            Dl_info info{};
            if (!local_addr || dladdr(local_addr, &info) == 0 || !info.dli_fname) {
                return nullptr;
            }
            const char* slash = strrchr(info.dli_fname, '/');
            std::string lib_name = slash ? slash : info.dli_fname;  // e.g. "/libc.so.6"
            // Offset of the function from the library's load base in this process
            uintptr_t offset = (uintptr_t)local_addr - (uintptr_t)info.dli_fbase;

            // Parse /proc/PID/maps; entries are sorted by address, so the first
            // mapping of the library is its load base in the target
            std::ifstream maps("/proc/" + std::to_string(target_pid) + "/maps");
            std::string line;
            while (std::getline(maps, line)) {
                size_t dash = line.find('-');
                if (dash == std::string::npos || line.find(lib_name) == std::string::npos) {
                    continue;
                }
                uintptr_t remote_base = std::stoull(line.substr(0, dash), nullptr, 16);
                // Valid only if both processes map the same library file
                return (void*)(remote_base + offset);
            }
            return nullptr;
        }
    };

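
Cross-process symbol resolution rests on one piece of arithmetic: a function sits at the same offset from its library's load base in every process that maps the same file. The sketch below isolates that step. ``parse_mapping_start`` and ``translate_address`` are hypothetical names chosen for illustration, and the approach assumes the tool and the target load an identical copy of the library.

.. code-block:: cpp

    #include <cstddef>
    #include <cstdint>
    #include <exception>
    #include <optional>
    #include <string>

    // Parse the start address of one /proc/<pid>/maps line, for example
    // "7f1c2a000000-7f1c2a028000 r--p 00000000 08:01 1234 /usr/lib/libc.so.6".
    std::optional<uintptr_t> parse_mapping_start(const std::string& line) {
        std::size_t dash = line.find('-');
        if (dash == std::string::npos || dash == 0) {
            return std::nullopt;
        }
        try {
            std::size_t consumed = 0;
            unsigned long long value = std::stoull(line.substr(0, dash), &consumed, 16);
            if (consumed != dash) {
                return std::nullopt;  // non-hex characters before the dash
            }
            return static_cast<uintptr_t>(value);
        } catch (const std::exception&) {
            return std::nullopt;  // not a number, or out of range
        }
    }

    // remote address = remote load base + (local address - local load base)
    uintptr_t translate_address(uintptr_t local_addr, uintptr_t local_base,
                                uintptr_t remote_base) {
        return remote_base + (local_addr - local_base);
    }

Subtracting an unrelated address, such as the address of ``main``, yields a meaningless offset; both the local address and the local base must belong to the same loaded library.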
6. ROCprofiler-Register Communication Protocol
----------------------------------------------
**Attachment Protocol Implementation:**

.. code-block:: cpp

    #include <dlfcn.h>
    #include <sys/types.h>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    extern "C" {
    // Function signatures from rocprofiler-register
    typedef void (*attach_func_t)(uint32_t pid);
    typedef void (*detach_func_t)();
    }

    class ROCprofilerAttachment {
    private:
        pid_t target_pid = -1;
        void* register_handle = nullptr;
        attach_func_t attach_func = nullptr;
        detach_func_t detach_func = nullptr;

    public:
        bool initialize() {
            // Load rocprofiler-register library
            register_handle = dlopen("/opt/rocm/lib/librocprofiler-register.so", RTLD_NOW);
            if (!register_handle) {
                fprintf(stderr, "Failed to load rocprofiler-register: %s\n", dlerror());
                return false;
            }
            // Get attachment functions
            attach_func = (attach_func_t)dlsym(register_handle, "attach");
            detach_func = (detach_func_t)dlsym(register_handle, "detach");
            if (!attach_func || !detach_func) {
                fprintf(stderr, "Failed to find attachment functions\n");
                return false;
            }
            return true;
        }

        bool attach_to_process(pid_t pid, const std::vector<uint8_t>& env_buffer) {
            // initialize() must have succeeded before attaching
            if (!attach_func) return false;
            target_pid = pid;
            // Set up environment for rocprofiler-register
            // This involves injecting the environment buffer into the target process
            // (the injection itself is not shown here, so env_buffer is unused)
            (void)env_buffer;
            // Call the attach function
            attach_func(pid);
            return true;
        }

        void detach_from_process() {
            if (detach_func) {
                detach_func();
            }
        }
    };

Complete Attachment Tool Implementation
=======================================
**Main Attachment Tool Structure:**

.. code-block:: cpp

    #include <signal.h>
    #include <sys/types.h>
    #include <chrono>
    #include <exception>
    #include <iostream>
    #include <string>
    #include <thread>
    #include <vector>

    class ROCprofilerAttachTool {
    private:
        ProcessAttachment process_control;
        EnvironmentInjector env_injector;
        ROCprofilerAttachment rocprof_attachment;

        // Minimal check: the PID is positive, the process exists, and this
        // user is allowed to signal it
        bool validate_target_process(pid_t pid) {
            return pid > 0 && kill(pid, 0) == 0;
        }

    public:
        struct AttachmentConfig {
            pid_t target_pid = 0;
            std::vector<std::string> trace_options;
            std::string output_path = "./rocprof-attachment-output";
            std::string output_filename = "attached_profile";
            uint32_t duration_msec = 0;  // 0 = until process ends
        };

        bool attach_and_profile(const AttachmentConfig& config) {
            // 1. Validate target process
            if (!validate_target_process(config.target_pid)) {
                std::cerr << "Invalid or inaccessible target process: " << config.target_pid << std::endl;
                return false;
            }
            // 2. Initialize rocprofiler attachment system
            if (!rocprof_attachment.initialize()) {
                std::cerr << "Failed to initialize rocprofiler attachment system" << std::endl;
                return false;
            }
            // 3. Attach to target process
            if (!process_control.attach(config.target_pid)) {
                std::cerr << "Failed to attach to process " << config.target_pid << std::endl;
                return false;
            }
            // 4. Prepare environment variables
            auto env_vars = env_injector.prepare_profiling_env(
                config.trace_options,
                config.output_path,
                config.output_filename);
            auto env_buffer = env_injector.serialize_environment(env_vars);
            // 5. Inject rocprofiler-register library
            //    (LibraryInjector needs the PID, so it is created here rather than as a member)
            LibraryInjector injector(config.target_pid);
            if (!injector.inject_register_library()) {
                std::cerr << "Failed to inject rocprofiler-register library" << std::endl;
                process_control.detach();
                return false;
            }
            // 6. Activate profiling
            if (!rocprof_attachment.attach_to_process(config.target_pid, env_buffer)) {
                std::cerr << "Failed to activate profiling" << std::endl;
                process_control.detach();
                return false;
            }
            // 7. Allow process to continue with profiling active
            if (!process_control.detach()) {
                std::cerr << "Warning: Failed to detach cleanly" << std::endl;
            }
            // 8. Wait for specified duration or until process ends
            if (config.duration_msec > 0) {
                std::cout << "Profiling for " << config.duration_msec << " milliseconds..." << std::endl;
                std::this_thread::sleep_for(std::chrono::milliseconds(config.duration_msec));
                // Stop profiling through rocprofiler-register
                rocprof_attachment.detach_from_process();
            } else {
                std::cout << "Profiling until process ends..." << std::endl;
                // Monitor process and wait for it to end
                while (kill(config.target_pid, 0) == 0) {
                    std::this_thread::sleep_for(std::chrono::seconds(1));
                }
            }
            std::cout << "Profiling completed. Output saved to: "
                      << config.output_path << "/" << config.output_filename << std::endl;
            return true;
        }
    };

    // Example usage
    int main(int argc, char* argv[]) {
        if (argc < 2) {
            std::cerr << "Usage: " << argv[0] << " <PID> [options]" << std::endl;
            return 1;
        }
        ROCprofilerAttachTool::AttachmentConfig config;
        try {
            config.target_pid = std::stoi(argv[1]);
        } catch (const std::exception&) {
            // std::stoi throws on non-numeric or out-of-range input
            std::cerr << "Invalid PID: " << argv[1] << std::endl;
            return 1;
        }
        config.trace_options = {"hip-trace", "kernel-trace", "memory-copy-trace"};
        config.duration_msec = 5000;  // 5 seconds
        ROCprofilerAttachTool tool;
        if (!tool.attach_and_profile(config)) {
            std::cerr << "Attachment and profiling failed" << std::endl;
            return 1;
        }
        return 0;
    }

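
The first step of the flow above, validating the target, is worth doing before any ``ptrace`` call. The following is a minimal sketch of that check under stated assumptions: ``parse_pid`` and ``process_exists`` are hypothetical helpers, not functions provided by ROCprofiler-SDK.

.. code-block:: cpp

    #include <signal.h>
    #include <sys/types.h>
    #include <cerrno>
    #include <climits>
    #include <cstdlib>
    #include <optional>

    // Parse a decimal PID. Rejects empty input, trailing characters,
    // non-positive values, and values that do not fit in an int.
    std::optional<pid_t> parse_pid(const char* text) {
        if (text == nullptr || *text == '\0') {
            return std::nullopt;
        }
        errno = 0;
        char* end = nullptr;
        long value = std::strtol(text, &end, 10);
        if (errno != 0 || *end != '\0' || value <= 0 || value > INT_MAX) {
            return std::nullopt;
        }
        return static_cast<pid_t>(value);
    }

    // kill(pid, 0) delivers no signal; it only performs the existence and
    // permission checks. EPERM means the process exists but belongs to
    // another user, so it exists even though it cannot be signalled.
    bool process_exists(pid_t pid) {
        if (kill(pid, 0) == 0) {
            return true;
        }
        return errno == EPERM;
    }

A process that exists but returns ``EPERM`` here will most likely also refuse ``PTRACE_ATTACH`` unless the tool holds ``CAP_SYS_PTRACE``, so the two cases deserve different error messages.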
Required System Permissions and Setup
=====================================
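**Permission Requirements:**

.. code-block:: bash

    # Your attachment tool will need:

    # 1. Ptrace permissions (may require root or capabilities)
    sudo setcap cap_sys_ptrace+ep your_attachment_tool
    #    On kernels with the Yama security module, this setting also limits
    #    which processes may be attached to
    cat /proc/sys/kernel/yama/ptrace_scope

    # 2. Access to /proc filesystem
    #    Usually available by default

    # 3. Ability to load shared libraries
    #    Ensure ROCm libraries are in LD_LIBRARY_PATH
    #    Note: the dynamic loader ignores LD_LIBRARY_PATH for executables that
    #    carry file capabilities (see step 1); use an RPATH or an ld.so.conf
    #    entry for such a binary
    export LD_LIBRARY_PATH=/opt/rocm/lib:$LD_LIBRARY_PATH
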
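
An attachment tool can check the Yama ``ptrace_scope`` setting itself and explain a likely permission failure before it calls ``ptrace``. This is a minimal sketch; ``read_ptrace_scope`` and ``describe_ptrace_scope`` are hypothetical helper names, and the descriptions paraphrase the kernel's Yama documentation.

.. code-block:: cpp

    #include <fstream>
    #include <optional>
    #include <string>

    // Read the Yama ptrace_scope value. Returns std::nullopt when the file is
    // missing (Yama not enabled) or cannot be parsed.
    std::optional<int> read_ptrace_scope(
        const std::string& path = "/proc/sys/kernel/yama/ptrace_scope") {
        std::ifstream file(path);
        int value = 0;
        if (!(file >> value)) {
            return std::nullopt;
        }
        return value;
    }

    // Short explanation of each documented ptrace_scope level
    const char* describe_ptrace_scope(int scope) {
        switch (scope) {
            case 0: return "classic: processes of the same user may be attached";
            case 1: return "restricted: only descendants, unless the tracer has CAP_SYS_PTRACE";
            case 2: return "admin-only: attaching requires CAP_SYS_PTRACE";
            case 3: return "disabled: no process may attach";
            default: return "unknown ptrace_scope value";
        }
    }

On many distributions the default is ``1``, which blocks attaching to an already running, unrelated process unless the tool has the capability shown above or runs as root.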
**Build Requirements:**

.. code-block:: cmake

    # CMakeLists.txt for your attachment tool
    cmake_minimum_required(VERSION 3.16)
    project(rocprofiler_attach_tool LANGUAGES CXX)

    set(CMAKE_CXX_STANDARD 17)
    set(CMAKE_CXX_STANDARD_REQUIRED ON)

    find_package(rocprofiler-sdk REQUIRED)

    add_executable(rocprofiler_attach_tool
        main.cpp
        process_attachment.cpp
        environment_injection.cpp
        library_injection.cpp
    )

    target_link_libraries(rocprofiler_attach_tool PRIVATE
        rocprofiler-sdk::rocprofiler-sdk
        ${CMAKE_DL_LIBS}  # for dlopen/dlsym
    )

Error Handling and Debugging
============================
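**Common Issues and Solutions:**

1. **Ptrace Permissions**: Run the attachment tool under ``strace`` to see which ``ptrace`` request fails and with which ``errno``
2. **Library Loading**: Check ``/proc/PID/maps`` to verify that the injected library is mapped in the target
3. **Environment Variables**: Validate the environment buffer format (each name and each value must be null-terminated)
4. **Process State**: Monitor the target process status during attachment, for example the ``State`` field in ``/proc/PID/status``
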
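
For the first item in the list above, translating the ``errno`` left by a failed ``ptrace`` call into a hint usually saves a debugging round trip. A minimal sketch, assuming the error meanings documented in the ``ptrace(2)`` manual page; ``ptrace_error_hint`` is a hypothetical helper.

.. code-block:: cpp

    #include <cerrno>
    #include <cstring>
    #include <string>

    // Map an errno value from a failed ptrace() call to a troubleshooting hint
    std::string ptrace_error_hint(int err) {
        switch (err) {
            case EPERM:
                return "EPERM: not permitted; check CAP_SYS_PTRACE, the Yama "
                       "ptrace_scope setting, and whether another tracer is attached";
            case ESRCH:
                return "ESRCH: no such process, or the target is not traced by "
                       "this tool, or it is not stopped";
            case EIO:
                return "EIO: invalid request, or an invalid or misaligned "
                       "address in the tracer or the target";
            case EFAULT:
                return "EFAULT: invalid memory area in the tracer or the target";
            case EINVAL:
                return "EINVAL: invalid option for this request";
            default:
                return std::string("unexpected error: ") + std::strerror(err);
        }
    }

Call it immediately after the failing ``ptrace`` request, before any other library call can overwrite ``errno``.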
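**Debugging Techniques:**

.. code-block:: cpp

    #include <sys/types.h>
    #include <cstdlib>
    #include <iostream>
    #include <string>

    // Enable debug logging. setenv() changes only the calling process's
    // environment; it does not affect the already running target.
    void enable_debug_logging() {
        setenv("ROCPROF_LOGGING_LEVEL", "trace", 1);
    }

    // Monitor attachment progress; returns false if either dump command fails
    bool debug_attachment(pid_t pid) {
        std::cout << "Target process memory maps:" << std::endl;
        std::string cmd = "cat /proc/" + std::to_string(pid) + "/maps";
        bool ok = (system(cmd.c_str()) == 0);
        std::cout << "Target process environment:" << std::endl;
        cmd = "cat /proc/" + std::to_string(pid) + "/environ | tr '\\0' '\\n'";
        ok = (system(cmd.c_str()) == 0) && ok;
        return ok;
    }
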
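
Instead of printing the whole memory map through a shell command, a tool can check directly whether the injected library is mapped in the target. This is a minimal sketch; ``library_is_mapped`` and ``read_maps`` are hypothetical helpers, and the match is a prefix match on the file name so that versioned names such as ``librocprofiler-register.so.0`` are also found.

.. code-block:: cpp

    #include <sys/types.h>
    #include <cstddef>
    #include <fstream>
    #include <sstream>
    #include <string>

    // Return true if any mapping in the given /proc/<pid>/maps text refers to a
    // file whose name starts with library_name.
    bool library_is_mapped(const std::string& maps_text, const std::string& library_name) {
        std::istringstream maps(maps_text);
        std::string line;
        while (std::getline(maps, line)) {
            // File-backed mappings end with an absolute path; anonymous
            // mappings and entries such as [heap] contain no '/'
            std::size_t path_start = line.find('/');
            if (path_start == std::string::npos) {
                continue;
            }
            std::string path = line.substr(path_start);
            std::size_t name_start = path.rfind('/') + 1;
            if (path.compare(name_start, library_name.size(), library_name) == 0) {
                return true;
            }
        }
        return false;
    }

    // Read /proc/<pid>/maps into a string; empty if the file cannot be opened
    std::string read_maps(pid_t pid) {
        std::ifstream file("/proc/" + std::to_string(pid) + "/maps");
        std::ostringstream contents;
        contents << file.rdbuf();
        return contents.str();
    }

A call such as ``library_is_mapped(read_maps(pid), "librocprofiler-register.so")`` after the injection step confirms that ``dlopen`` actually succeeded in the target.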
This implementation guide provides the foundation needed to build a complete process attachment tool for ROCprofiler-SDK. The actual rocprofv3 implementation uses similar techniques with additional optimizations and error handling.