rocm-systems/projects/rocprofiler-sdk/source/lib/rocprofv3-attach/ptrace_session.cpp
Mark Meserve bf49039005 [rocprofiler-sdk][rocprofiler-register] Initial Attachment Support (#316)
* attach: milestone: API tracing

- This pairs with another commit in rocprofiler-sdk to fully
  function
- Add ptrace entry points for tool attachment
- API tracing works at this commit
- Queue tracing not supported yet

* attach: cleanup

- Remove hardcode for loading of tool library
- Make invoke registration functions public again

* attach: proxy queue first draft

- Adds ability to trace with queues during attachment
- Must be paired with updated rocprofiler-sdk

* attach: prestore overhaul

- Must be paired with commit in rocprofiler-sdk

* attach: add dispatch table rework

- Register will load the prestore library and provide entrypoints to sdk

* attach: formatting and cleanup

* attach: revise dispatch table scheme

* attach: formatting

* attach: milestone: API tracing

- This change must be paired with a change in rocprofiler-register to
  fully function.
- API tracing works at this commit
- Queue tracing not supported yet

* attach: cleanup and comments

* attach: Formatting and crash fixes

* attach: add attach duration

- Add option attach-duration-msec for attachment

* Formatting + sglang hang fix via signal handling

* Changed FATAL_IF to DFATAL_IF for scratch_memory due to persistent crash when iterating queues

* attach: proxy queue first draft

- Adds ability to trace with queues during attachment
- Must be paired with updated rocprofiler-register

* Allow null agents for scratch output

* attach: improve queue library interface

- Significant changes to force exported interfaces back to C
- Fixes bug with unknown agents at attachment
- Code objects' names may still be incorrect

* attach: add code_object support

- Kernel traces will now have names and all other information for launches
- Add capture of hsa_executable to the queue library
- Various logging improvements

* attach: rename queue library to prestore

* attach: prestore overhaul

- Must be paired with commit from rocprofiler-register
- Massive overhaul of code organization in prestore library
  - Separates registrations for different object types
  - Sets up future changes for initialization

* attach: add prestore dispatch table

- Removes linkage to prestore library from sdk

* attach: cleanup

* attach: formatting

* attach: fix input prompt not appearing

* attach: fix component name in cmake

* attach: revert change to export level

* Make prestore API public

* attach: update sdk attachment library WIP

- This commit is NONFUNCTIONAL

- Changes around structure to remove classes
- Separate C linkage where needed
- Still needs updates to register for correct usage

* attach: update register with dispatch table WIP
- This commit is NONFUNCTIONAL

- Changes rocprofiler_register to handle dispatch table from attach
  library.
- Still needs changes in SDK with dispatch table usage

* attach: dispatch table wip
- This commit is NONFUNCTIONAL

* attach: move attach component into core

* attach: rename to rocprofv3-attach

* attach: add callbacks for new queues and code objects

* attach: finish dispatch table implementation

- Fixes kernel tracing

* attach: add cmake variable for attachment support

* feat: Add --attach alias for rocprofv3 with comprehensive attachment tests

- Add `--attach` as an alias to existing `-p/--pid` functionality in rocprofv3.py
- Create comprehensive attachment test suite with CSV and JSON output validation:
- New attachment-test application for testing dynamic profiling scenarios
- Unified test script supporting both CSV and JSON output formats
- Pytest-based validation for kernel traces, memory copies, HSA API calls, and agent info
- Add CMake integration for automated attachment testing
- Support parameterized output directory and filename specification
- Implement proper environment setup for attachment queue registration

Tests verify successful attachment to running processes and capture of:
- Kernel dispatch traces with workgroup/grid dimensions
- Memory copy operations (H2D/D2H) with size validation
- HSA API call traces across multiple domains
- GPU/CPU agent information and capabilities

* Documentation Update

* attach: make attach script callable

* Added ROCPROFILER_REGISTER_ATTACHMENT_TOOL_LIB to remove hardcoded name

* attach: revert metrics library path changes

* Generic Attachment in Register (#942)

Remove tool references in register

* Add second param to attach call in rocprof register

* Add experimental reattachment support for ROCprofiler-SDK

This commit introduces experimental reattachment functionality allowing tools
to dynamically reattach to running processes with comprehensive design changes
to support multiple attach/detach cycles:

**Core Reattachment API:**
- Add rocprofiler_tool_configure_result_experimental_t with tool_reattach/tool_detach callbacks
- Add rocprofiler_call_client_reattach and rocprofiler_call_client_detach C exports
- Implement reattachment tracking in rocprofiler_register_attach to differentiate
initial attachment from reattachment cycles
- Add rocprofiler_register_invoke_reattach for handling reattachment requests

**Design Changes - Registration System Flow:**
The registration system now supports a dual-path initialization:

1. Initial Attachment Flow:
    - rocprofiler_register_attach() -> rocprofiler_register_invoke_all_registrations()
    - Full tool initialization with complete context setup
    - Sets prev_attached atomic flag to track state

2. Reattachment Flow:
    - rocprofiler_register_attach() detects prev_attached=true -> rocprofiler_register_invoke_reattach()
    - Bypasses full re-initialization, calls client reattach callbacks instead
    - Preserves existing contexts and buffers, only reactivates profiling services
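
The dual-path dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code in rocprofiler-register: the `sketch` namespace, the `attach_path` enum, and the body of `register_attach()` are hypothetical stand-ins, and only the idea of an atomic `prev_attached` flag comes from the description.

```cpp
#include <atomic>

namespace sketch
{
// Hypothetical stand-in for the "prev_attached" state described above.
std::atomic<bool> prev_attached{false};

// Which of the two flows an attach request would take.
enum class attach_path
{
    initial,   // full tool initialization
    reattach,  // only the client reattach callbacks are invoked
};

// exchange() stores true and returns the previous value in one atomic step,
// so exactly one caller ever observes "not previously attached".
inline attach_path
register_attach()
{
    return prev_attached.exchange(true) ? attach_path::reattach : attach_path::initial;
}
}  // namespace sketch
```

Using a single atomic exchange instead of a separate load and store means two concurrent attach requests cannot both take the initial path.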

**Design Changes - Tool Library Loading:**
Enhanced rocprofiler-register library loading with function pointer resolution:
- Extended rocp_set_api_table_data_t tuple to include reattach/detach function pointers
- Automatic symbol resolution for rocprofiler_call_client_reattach/detach functions
- Support for both LD_PRELOAD and dlopen scenarios with consistent callback availability

**Design Changes - Context Management:**
Introduced dual context systems for attachment scenarios:
- get_contexts() - Original contexts for standard tool initialization
- get_attach_contexts() - Separate context map for attachment-specific lifecycle
- attach_init() - Creates contexts for ALL buffer tracing services using existing buffers
- attach_start() - Selectively starts contexts based on configuration options
- attach_detach() - Cleanly stops and destroys attachment contexts

**Design Changes - Buffer Management:**
Added reset_tmp_file_buffer() template for clean reattachment state:
- Properly closes and removes old temporary files
- Deletes existing file_buffer instances to prevent stale file position tracking
- Creates fresh file_buffer instances for clean reattachment cycles
- Addresses core issue where file position metadata becomes stale between cycles
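
A rough sketch of the reset pattern described above, assuming a buffer that owns one temporary file. The `file_buffer` struct and `reset_file_buffer()` here are hypothetical stand-ins written for illustration; they are not the SDK's `file_buffer` or `reset_tmp_file_buffer()`.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <memory>
#include <string>
#include <utility>

namespace sketch
{
// Hypothetical stand-in for a temporary-file-backed buffer.
struct file_buffer
{
    explicit file_buffer(std::string file_path)
    : path{std::move(file_path)}
    , stream{path, std::ios::binary | std::ios::trunc}
    {}

    std::string   path;
    std::ofstream stream;
    std::size_t   bytes_written = 0;  // position metadata that would otherwise go stale
};

// Close and remove the old temporary file, then replace the buffer object
// so no position metadata survives into the next attachment cycle.
inline void
reset_file_buffer(std::unique_ptr<file_buffer>& buffer, const std::string& new_path)
{
    if(buffer)
    {
        buffer->stream.close();
        std::remove(buffer->path.c_str());
    }
    buffer = std::make_unique<file_buffer>(new_path);
}
}  // namespace sketch
```

Replacing the whole object, rather than rewinding the existing one, is what guarantees that every field starts from its default value.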

**Design Changes - Environment Variable Injection:**
Added ROCP_REGISTERED_TOOL_ATTACH environment variable:
- Distinguishes attachment-loaded tools from LD_PRELOAD scenarios
- Enables registration system to apply attachment-specific logic
- Helps tools adapt behavior for attachment vs standard initialization
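
As an illustration, a tool could branch on this variable along the following lines. The helper name `loaded_via_attach()` is hypothetical, and the sketch assumes that only the presence of the variable matters; the value it is actually set to is not stated here.

```cpp
#include <cstdlib>

namespace sketch
{
// Assumption: the variable being set at all marks an attachment-loaded tool.
inline bool
loaded_via_attach()
{
    return std::getenv("ROCP_REGISTERED_TOOL_ATTACH") != nullptr;
}
}  // namespace sketch
```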

**Attachment Context Management:**
- Add attach_init/attach_start/attach_detach functions for dynamic context lifecycle
- Add reset_tmp_file_buffer template for clean reattachment state management
- Implement get_attach_contexts() for tracking active attachment contexts

**Test Infrastructure:**
- Add projects/rocprofiler-sdk/tests/rocprofv3/reattach/ comprehensive test suite
- Include reattachment test scripts with unified attachment/detachment cycles
- Add validate.py with trace data validation for kernel, memory copy, HSA API, and agent info
- Add conftest.py for JSON and CSV data loading utilities

**Configuration Updates:**
- Update CMakeLists.txt to include reattachment tests in build system
- Add environment variable ROCP_REGISTERED_TOOL_ATTACH for attachment state tracking
- Enhance rocprofiler-register library loading with reattach/detach function resolution

**Flow Impact Analysis:**
This design enables robust multi-cycle attachment by:
1. Preventing duplicate initialization on reattachment
2. Maintaining separate context lifecycles for attachment vs standard operation
3. Ensuring clean temporary file state between attachment cycles
4. Providing tools with explicit reattach/detach callback hooks
5. Supporting both programmatic and environment-based tool configuration

The experimental nature allows for iteration on the API while establishing
the foundation for production-ready dynamic profiling capabilities.

* Fix misc clang-tidy warnings/errors

* CMake Option and Environment Variable Updates

- CMake: ROCPROFILER_REGISTER_ALWAYS_SUPPORT_ATTACH -> ROCPROFILER_REGISTER_BUILD_DEFAULT_ATTACHMENT
- Env: ROCPROFILER_REGISTER_ATTACHMENT_ENABLED ->

* Source reorganization

* Formatting + new lines at EOF

* Fix flake8 F841: local variable is assigned to but never used

* Update attachment test

- get rid of 5 second start delay
- add roctx

* Rework implementation

- Remove rocprofiler_tool_configure_result_experimental_t in lieu of rocprofiler_configure_attach
- Add <rocprofiler-sdk/experimental/registration.h>
- TODO: Update process_attachment.rst

* Handle re-attachment options

- inherit options from previous attachment
- check previous options do not modify data collection services

* Fix support for tools w/o rocprofiler_configure_attach

- fix segfault when rocprofiler_configure_attach does not exist
- fix naming convention for functions accepting attach dispatch table
- cleanup rocprofiler_configure_attach implementation in rocprofv3 tool

* attach: remove unknown agent handling

- Change was from earlier commit, no longer needed

* attach: add error for attaching without library loaded

* attach: revise version numbering

* attach: register header revisions

* attach: clang format register

* attach: formatting

* attach: fix build failure

- Remove cross dependency into rocprofiler-sdk, fixes build on some systems

* attach: revise register library detection

* Update rocprofiler-register and attach library

- formatting
- proper signature of register_functor for rocprofiler-sdk-attach library callback
- remove get_dispatch_registration_table()

* Bump rocprofiler-register version to 0.6.0 + AnyNewerVersion

* Fix output support for rocprofiler-sdk-tool

* Fix formatting

* Fix clang tidy errors

* Misc rocprofiler-sdk-attach fixes

* attach: add sigint handling to attach python

* tool README.md formatting

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

* Fix buffered output issue

* attach: add errors for tool attach

* CI Fixes

* Rework tests

* attach: improve library loading in rocprofv3 attach

* formatting

* Update tests to use pytest framework

* Fix test_attachment_hsa_api_trace

* attach: catch ctypes exceptions

* attach: fix leak in registration

* attach: fix sanitizer tests

* attach: fix sanitizer tests further

* attach: disable attach asan tests

* attach: disable ubsan test

* attach: fix permissions in installed test package

* attach: formatting

---------

Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>
Co-authored-by: Tim Gu <Tim.Gu@amd.com>
Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Benjamin Welton <bwelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-09-18 18:10:45 -05:00

886 lines
26 KiB
C++

// MIT License
//
// Copyright (c) 2025 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
#include "ptrace_session.hpp"
#include "details/filesystem.hpp"
#include "lib/common/logging.hpp"
#include <dlfcn.h>
#include <fcntl.h>
#include <link.h>
#include <sys/mman.h>
#include <sys/ptrace.h>
#include <sys/stat.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>
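#include <cerrno>
#include <climits>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>
#include <string_view>
#include <thread>
#include <type_traits>
#include <vector>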
#define AT_ENTRY 9 /* Entry point of program */
// ptrace memory operations use "word length" which is dependent on system architecture.
static_assert(sizeof(void*) == 8);
// In addition, this file uses x64 assembly which is inherently platform dependent.
#ifndef __x86_64__
static_assert(false);
#endif
namespace
{
/* Copied from glibc's elf.h. */
typedef struct
{
uint64_t a_type; /* Entry type */
union
{
uint64_t a_val; /* Integer value */
/* We use to have pointer elements added here. We cannot do that,
though, since it does not work when using 32-bit definitions
on 64-bit platforms and vice versa. */
} a_un;
} Elf64_auxv_t;
// Very limited list of operations for logging only.
constexpr const char*
ptrace_op_name(__ptrace_request op)
{
switch(op)
{
case PTRACE_SEIZE: return "PTRACE_SEIZE";
case PTRACE_DETACH: return "PTRACE_DETACH";
case PTRACE_POKEDATA: return "PTRACE_POKEDATA";
case PTRACE_PEEKDATA: return "PTRACE_PEEKDATA";
case PTRACE_INTERRUPT: return "PTRACE_INTERRUPT";
case PTRACE_GETREGS: return "PTRACE_GETREGS";
case PTRACE_SETREGS: return "PTRACE_SETREGS";
case PTRACE_CONT: return "PTRACE_CONT";
default: return "unknown op";
}
}
// Boilerplate around ptrace calls.
// If an error occurs, logs the error and returns false.
#define PTRACE_CALL(op, pid, addr, data) \
ROCP_TRACE << "ptrace call params(" << ptrace_op_name(op) << "(" << op << "), " << pid << ", " \
<< (uint64_t) addr << ", " << (uint64_t) data << ")"; \
if(errno = 0, ptrace(op, pid, addr, data); errno != 0) \
{ \
ROCP_ERROR << "ptrace call failed. errno: " << errno << " - " << strerror(errno) \
<< " params(" << ptrace_op_name(op) << "(" << op << "), " << pid << ", " \
<< (uint64_t) addr << ", " << (uint64_t) data << ")"; \
return false; \
}
// Changes the order of parameters for PEEKDATA so it can be used like other operations.
// value should be uint64_t
#define PTRACE_PEEK(pid, addr, read_value) \
static_assert(std::is_same<decltype(read_value), uint64_t>::value); \
ROCP_TRACE << "ptrace call params(PTRACE_PEEKDATA(2), " << pid << ", " << (uint64_t) addr \
<< ", 0)"; \
if(errno = 0, read_value = ptrace(PTRACE_PEEKDATA, pid, addr, NULL); errno != 0) \
{ \
ROCP_ERROR << "ptrace call failed. errno: " << errno << " params(PTRACE_PEEKDATA(2), " \
<< pid << ", " << (uint64_t) addr << ", 0)"; \
return false; \
}
using open_modes_vec_t = std::vector<int>;
void
get_auxv_entry(int pid, size_t& entry_addr)
{
char filename[PATH_MAX];
int fd{};
const int auxv_size = sizeof(Elf64_auxv_t);
char buf[sizeof(Elf64_auxv_t)]; /* The larger of the two. */
snprintf(filename, sizeof filename, "/proc/%d/auxv", pid);
fd = open(filename, O_RDONLY);
entry_addr = 0;
if(fd < 0)
{
// Do not read from or close an invalid descriptor
ROCP_ERROR << "Unable to open auxv file " << filename;
return;
}
while(read(fd, buf, auxv_size) == auxv_size && entry_addr == 0)
{
Elf64_auxv_t* const aux = (Elf64_auxv_t*) buf;
if(aux->a_type == AT_ENTRY)
{
entry_addr = aux->a_un.a_val;
}
}
close(fd);
if(entry_addr == 0)
{
ROCP_ERROR << "Unexpected missing AT_ENTRY for " << filename;
return;
}
ROCP_TRACE << "Entry address found to be " << entry_addr << " from " << filename;
}
std::optional<std::string>
get_linked_path(std::string_view _name, open_modes_vec_t&& _open_modes)
{
const open_modes_vec_t default_link_open_modes = {(RTLD_LAZY | RTLD_NOLOAD)};
if(_name.empty()) return fs::current_path().string();
if(_open_modes.empty()) _open_modes = default_link_open_modes;
void* _handle = nullptr;
bool _noload = false;
for(auto _mode : _open_modes)
{
_handle = dlopen(_name.data(), _mode);
_noload = (_mode & RTLD_NOLOAD) == RTLD_NOLOAD;
if(_handle) break;
}
if(_handle)
{
struct link_map* _link_map = nullptr;
dlinfo(_handle, RTLD_DI_LINKMAP, &_link_map);
if(_link_map != nullptr && !std::string_view{_link_map->l_name}.empty())
{
return fs::absolute(fs::path{_link_map->l_name}).string();
}
if(_noload == false) dlclose(_handle);
}
return std::nullopt;
}
auto
get_this_library_path()
{
auto _this_lib_path = get_linked_path("librocprofv3-attach.so.1", {RTLD_NOLOAD | RTLD_LAZY});
LOG_IF(FATAL, !_this_lib_path) << "librocprofv3-attach.so.1"
<< " could not locate itself in the list of loaded libraries";
return fs::path{*_this_lib_path}.parent_path().string();
}
void*
get_library_handle(std::string_view _lib_name)
{
void* _lib_handle = nullptr;
if(_lib_name.empty()) return nullptr;
auto _lib_path = fs::path{_lib_name};
auto _lib_path_fname = _lib_path.filename();
auto _lib_path_abs =
(_lib_path.is_absolute()) ? _lib_path : (fs::path{get_this_library_path()} / _lib_path);
// check to see if the rocprofiler library is already loaded
_lib_handle = dlopen(_lib_path.c_str(), RTLD_NOLOAD | RTLD_LAZY);
if(_lib_handle)
{
LOG(INFO) << "loaded " << _lib_name << " library at " << _lib_path.string()
<< " (handle=" << _lib_handle << ") via RTLD_NOLOAD | RTLD_LAZY";
}
// try to load with the given path
if(!_lib_handle)
{
_lib_handle = dlopen(_lib_path.c_str(), RTLD_GLOBAL | RTLD_LAZY);
if(_lib_handle)
{
LOG(INFO) << "loaded " << _lib_name << " library at " << _lib_path.string()
<< " (handle=" << _lib_handle << ") via RTLD_GLOBAL | RTLD_LAZY";
}
}
// try to load with the absolute path
if(!_lib_handle)
{
_lib_path = _lib_path_abs;
_lib_handle = dlopen(_lib_path.c_str(), RTLD_GLOBAL | RTLD_LAZY);
}
// try to load with the basename path
if(!_lib_handle)
{
_lib_path = _lib_path_fname;
_lib_handle = dlopen(_lib_path.c_str(), RTLD_GLOBAL | RTLD_LAZY);
}
LOG_IF(INFO, _lib_handle != nullptr)
<< "loaded " << _lib_name << " library at " << _lib_path.string()
<< " (handle=" << _lib_handle << ")";
LOG_IF(WARNING, _lib_handle == nullptr) << _lib_name << " failed to load\n";
return _lib_handle;
}
} // namespace
namespace rocprofiler
{
namespace attach
{
PTraceSession::PTraceSession(int _pid)
: m_pid{_pid}
{}
PTraceSession::~PTraceSession()
{
if(m_attached)
{
detach();
}
}
bool
PTraceSession::attach()
{
PTRACE_CALL(PTRACE_SEIZE, m_pid, NULL, NULL);
ROCP_INFO << "Successfully attached to pid " << m_pid;
m_attached = true;
return true;
}
bool
PTraceSession::detach()
{
m_attached = false;
PTRACE_CALL(PTRACE_DETACH, m_pid, NULL, NULL);
ROCP_INFO << "Detached from pid " << m_pid;
return true;
}
// pre-cond: process must be stopped
bool
PTraceSession::write(size_t addr, const std::vector<uint8_t>& data, size_t size) const
{
constexpr size_t word_size = sizeof(void*);
size_t word_iter = 0;
for(word_iter = 0; word_iter < (size / word_size); ++word_iter)
{
const size_t offset = (word_iter * word_size);
uint64_t word;
std::memcpy(&word, data.data() + offset, word_size);
PTRACE_CALL(PTRACE_POKEDATA, m_pid, addr + offset, word);
}
// If not divisible, get the last word to do a partial write correctly.
size_t remainder = size % word_size;
if(remainder != 0u)
{
const size_t offset = (word_iter * word_size);
uint64_t last_word = 0;
PTRACE_PEEK(m_pid, addr + offset, last_word);
std::memcpy(&last_word, data.data() + offset, remainder);
PTRACE_CALL(PTRACE_POKEDATA, m_pid, addr + offset, last_word);
}
ROCP_TRACE << "ptrace wrote " << size << " bytes at " << addr;
return true;
}
// pre-cond: process must be stopped
bool
PTraceSession::read(size_t addr, std::vector<uint8_t>& data, size_t size) const
{
data.clear();
data.resize(size);
constexpr size_t word_size = sizeof(void*);
size_t word_iter = 0;
for(word_iter = 0; word_iter < (size / word_size); ++word_iter)
{
const size_t offset = (word_iter * word_size);
uint64_t word = 0;
PTRACE_PEEK(m_pid, addr + offset, word);
std::memcpy(data.data() + offset, &word, word_size);
}
size_t remainder = size % word_size;
if(remainder != 0u)
{
const size_t offset = (word_iter * word_size);
uint64_t last_word = 0;
PTRACE_PEEK(m_pid, addr + offset, last_word);
std::memcpy(data.data() + offset, &last_word, remainder);
}
ROCP_TRACE << "ptrace read " << size << " bytes at " << addr;
return true;
}
// pre-cond: process must be stopped
bool
PTraceSession::swap(size_t addr,
const std::vector<uint8_t>& in_data,
std::vector<uint8_t>& out_data,
size_t size) const
{
if(!read(addr, out_data, size))
{
return false;
}
return write(addr, in_data, size);
}
bool
PTraceSession::simple_mmap(void*& addr, size_t length) const
{
if(!m_attached)
{
ROCP_ERROR << "simple_mmap called while not attached";
return false;
}
if(!stop())
{
return false;
}
// Create a system call to mmap:
// mmap(NULL, length, prot, flags, -1, 0);
// Get entry address for safe injection of op codes
size_t entry_addr{0};
get_auxv_entry(m_pid, entry_addr);
// Save current register file
struct user_regs_struct oldregs;
PTRACE_CALL(PTRACE_GETREGS, m_pid, NULL, &oldregs);
// Set register file for call
struct user_regs_struct newregs = oldregs;
newregs.rax = 9; // calling convention: syscall ID for mmap
newregs.rdi = 0; // addr
newregs.rsi = length; // length
newregs.rdx = PROT_READ | PROT_WRITE; // prot
newregs.r10 = MAP_PRIVATE | MAP_ANONYMOUS; // flags
newregs.r8 = -1; // fd (unused)
newregs.r9 = 0; // offset
newregs.rip = entry_addr;
newregs.rsp = oldregs.rsp - 128; // skip the 128-byte red zone below rsp so in-use locals are not clobbered
newregs.rsp -= (newregs.rsp % 16); // align the stack pointer down to 16 bytes
// Set syscall registers
PTRACE_CALL(PTRACE_SETREGS, m_pid, NULL, &newregs);
// x64 assembly to perform a syscall and breakpoint when done
// 0f 05 syscall
// cc int3
std::vector<uint8_t> new_code({0x0f, 0x05, 0xcc});
std::vector<uint8_t> old_code;
// Write in new opcodes
if(!swap(entry_addr, new_code, old_code, 3))
{
return false;
}
ROCP_TRACE << "Attempting to execute mmap syscall";
// Resume execution
if(!cont())
{
return false;
}
// Wait for int3 breakpoint to be hit
int status;
if(waitpid(m_pid, &status, WUNTRACED) == -1)
{
return false;
}
// Get registers to see mmap's return values
struct user_regs_struct returnregs;
PTRACE_CALL(PTRACE_GETREGS, m_pid, NULL, &returnregs);
// Write in old opcodes
if(!write(entry_addr, old_code, 3))
{
return false;
}
// Restore register file
PTRACE_CALL(PTRACE_SETREGS, m_pid, NULL, &oldregs);
// Restart execution
if(!cont())
{
return false;
}
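// A raw syscall reports failure by returning -errno, i.e. a value in [-4095, -1], in rax
const auto mmap_ret = static_cast<int64_t>(returnregs.rax);
if(mmap_ret < 0 && mmap_ret >= -4095)
{
ROCP_ERROR << "mmap syscall in pid " << m_pid << " failed with errno " << -mmap_ret;
return false;
}
addr = reinterpret_cast<void*>(returnregs.rax); // NOLINT(performance-no-int-to-ptr)
return true;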
}
bool
PTraceSession::simple_munmap(void*& addr, size_t length) const
{
if(!m_attached)
{
ROCP_ERROR << "simple_munmap called while not attached";
return false;
}
// Stop the process
if(!stop())
{
return false;
}
// Create a system call to munmap:
// munmap(addr, length);
// Get entry address for safe injection of op codes
size_t entry_addr{0};
get_auxv_entry(m_pid, entry_addr);
// Save current register file
struct user_regs_struct oldregs;
PTRACE_CALL(PTRACE_GETREGS, m_pid, NULL, &oldregs);
// Set register file for call
struct user_regs_struct newregs = oldregs;
newregs.rax = 11; // calling convention: syscall ID for munmap
newregs.rdi = reinterpret_cast<size_t>(addr); // addr
newregs.rsi = length; // length
newregs.rip = entry_addr;
newregs.rsp = oldregs.rsp - 128; // skip the 128-byte red zone below rsp so in-use locals are not clobbered
newregs.rsp -= (newregs.rsp % 16); // align the stack pointer down to 16 bytes
// Set syscall registers
PTRACE_CALL(PTRACE_SETREGS, m_pid, NULL, &newregs);
// x64 assembly to perform a syscall and breakpoint when done
// 0f 05 syscall
// cc int3
std::vector<uint8_t> new_code({0x0f, 0x05, 0xcc});
std::vector<uint8_t> old_code;
// Write in new opcodes
if(!swap(entry_addr, new_code, old_code, 3))
{
return false;
}
ROCP_TRACE << "Attempting to execute munmap syscall";
// Restart execution
if(!cont())
{
return false;
}
// Wait for int3 breakpoint to be hit
int status;
if(waitpid(m_pid, &status, WUNTRACED) == -1)
{
return false;
}
// Get registers to see munmap's return values
struct user_regs_struct returnregs;
PTRACE_CALL(PTRACE_GETREGS, m_pid, NULL, &returnregs);
// Write in old opcodes
if(!write(entry_addr, old_code, 3))
{
return false;
}
// Restore register file
PTRACE_CALL(PTRACE_SETREGS, m_pid, NULL, &oldregs);
// Restart execution
if(!cont())
{
return false;
}
return true;
}
bool
PTraceSession::call_function(const std::string& library, const std::string& symbol)
{
return call_function(library, symbol, nullptr);
}
// This supports calling a dynamically loaded function with at most 1 parameter.
// More parameters could be supported, but this is good enough for now.
// Correctly implementing this would require duplicating the x64 calling convention. Probably not
// worth it.
bool
PTraceSession::call_function(const std::string& library,
const std::string& symbol,
void* first_param)
{
if(!m_attached)
{
ROCP_ERROR << "call_function called while not attached";
return false;
}
// Stop the process
if(!stop())
{
return false;
}
void* target_addr = nullptr;
if(!find_symbol(target_addr, library, symbol))
{
return false;
}
// Get entry address for safe injection of op codes
size_t entry_addr{0};
get_auxv_entry(m_pid, entry_addr);
// Save current register file
struct user_regs_struct oldregs;
PTRACE_CALL(PTRACE_GETREGS, m_pid, NULL, &oldregs);
// Construct registers to call a function with 1 parameter
// symbol(first_param)
struct user_regs_struct newregs = oldregs;
newregs.rax = reinterpret_cast<size_t>(target_addr); // target function
newregs.rdi = reinterpret_cast<size_t>(first_param); // first parameter
newregs.rip = entry_addr;
newregs.rsp = oldregs.rsp - 128; // skip the 128-byte red zone below rsp so in-use locals are not clobbered
newregs.rsp -= (newregs.rsp % 16); // align the stack pointer down to 16 bytes
// x64 assembly to call a function by register and breakpoint when done
// ff d0 call rax
// cc int3
std::vector<uint8_t> new_code({0xff, 0xd0, 0xcc});
std::vector<uint8_t> old_code;
// Write in new opcodes
if(!swap(entry_addr, new_code, old_code, 3))
{
return false;
}
// Set registers for the function call
PTRACE_CALL(PTRACE_SETREGS, m_pid, NULL, &newregs);
ROCP_TRACE << "Attempting to execute " << library << "::" << symbol << "(" << first_param
<< ")";
// Restart execution
if(!cont())
{
return false;
}
// Wait for int3 to be hit
if(waitpid(m_pid, nullptr, WSTOPPED) == -1)
{
return false;
}
// Get registers to see return values
struct user_regs_struct returnregs;
PTRACE_CALL(PTRACE_GETREGS, m_pid, NULL, &returnregs);
// Write in old opcodes
if(!write(entry_addr, old_code, 3))
{
return false;
}
// Restore register file
PTRACE_CALL(PTRACE_SETREGS, m_pid, NULL, &oldregs);
// Restart execution
if(!cont())
{
return false;
}
return true;
}
// This supports calling a dynamically loaded function with at most 2 parameters.
// Uses x64 calling convention: RDI for first param, RSI for second param
bool
PTraceSession::call_function(const std::string& library,
const std::string& symbol,
void* first_param,
void* second_param)
{
if(!m_attached)
{
ROCP_ERROR << "call_function called while not attached";
return false;
}
// Stop the process
if(!stop())
{
return false;
}
void* target_addr = nullptr;
if(!find_symbol(target_addr, library, symbol))
{
return false;
}
// Get entry address for safe injection of op codes
size_t entry_addr{0};
get_auxv_entry(m_pid, entry_addr);
// Save current register file
struct user_regs_struct oldregs;
PTRACE_CALL(PTRACE_GETREGS, m_pid, NULL, &oldregs);
// Construct registers to call a function with 2 parameters
// symbol(first_param, second_param)
struct user_regs_struct newregs = oldregs;
newregs.rax = reinterpret_cast<size_t>(target_addr); // target function
newregs.rdi = reinterpret_cast<size_t>(first_param); // first parameter
newregs.rsi = reinterpret_cast<size_t>(second_param); // second parameter
newregs.rip = entry_addr;
newregs.rsp = oldregs.rsp - 128; // skip the 128-byte red zone below rsp so in-use locals are not clobbered
newregs.rsp -= (newregs.rsp % 16); // align the stack pointer down to 16 bytes
// x64 assembly to call a function by register and breakpoint when done
// ff d0 call rax
// cc int3
std::vector<uint8_t> new_code({0xff, 0xd0, 0xcc});
std::vector<uint8_t> old_code;
// Write in new opcodes
if(!swap(entry_addr, new_code, old_code, 3))
{
return false;
}
// Set registers for the function call
PTRACE_CALL(PTRACE_SETREGS, m_pid, NULL, &newregs);
ROCP_TRACE << "Attempting to execute " << library << "::" << symbol << "(" << first_param
<< ", " << second_param << ")";
// Restart execution
if(!cont())
{
return false;
}
// Wait for int3 to be hit
if(waitpid(m_pid, nullptr, WSTOPPED) == -1)
{
return false;
}
// Get registers to see return values
struct user_regs_struct returnregs;
PTRACE_CALL(PTRACE_GETREGS, m_pid, NULL, &returnregs);
// Write in old opcodes
if(!write(entry_addr, old_code, 3))
{
return false;
}
// Restore register file
PTRACE_CALL(PTRACE_SETREGS, m_pid, NULL, &oldregs);
// Restart execution
if(!cont())
{
return false;
}
return true;
}
bool
PTraceSession::find_library(void*& addr, int inpid, const std::string& library)
{
std::stringstream searchname;
searchname << inpid << "::" << library;
// TODO: add this back
// if (target_library_addrs.find(searchname.str()) != target_library_addrs.end())
//{
// return target_library_addrs[searchname.str()];
//}
// uses "maps" file to find where library has been loaded in target process
// does not require this process to be attached
std::stringstream filename;
filename << "/proc/" << inpid << "/maps";
std::ifstream maps(filename.str().c_str());
if(!maps)
{
ROCP_ERROR << "Couldn't open " << filename.str();
return false;
}
std::string line;
while(std::getline(maps, line))
{
if(line.find(library) != std::string::npos)
{
ROCP_TRACE << "entry in pid " << inpid << " maps file is: " << line;
break;
}
}
if(!maps)
{
ROCP_ERROR << "Couldn't find library " << library << " in " << filename.str();
return false;
}
// NOLINTNEXTLINE(performance-no-int-to-ptr)
addr = reinterpret_cast<void*>(std::stoull(line, nullptr, 16));
// target_library_addrs[searchname.str()] = addr;
return true;
}
bool
PTraceSession::find_symbol(void*& addr, const std::string& library, const std::string& symbol)
{
auto searchname = std::stringstream{};
searchname << library << "::" << symbol;
if(auto itr = m_target_symbol_addrs.find(searchname.str()); itr != m_target_symbol_addrs.end())
{
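ROCP_TRACE << "found symbol for " << searchname.str() << " at " << itr->second;
// Hand the cached address back to the caller; without this, addr is left unset on a cache hit
addr = itr->second;
return itr->second != nullptr;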
}
void* libraryaddr = nullptr;
void* symboladdr = nullptr;
// Load the library in our process to determine the offset of the requested symbol from the
// start address of the library
addr = nullptr;
libraryaddr = get_library_handle(library);
if(!libraryaddr)
{
ROCP_ERROR << "host couldn't dlopen " << library;
return false;
}
symboladdr = dlsym(libraryaddr, symbol.c_str());
if(!symboladdr)
{
ROCP_ERROR << "host couldn't dlsym " << symbol;
return false;
}
// Find the start address of the library in our process
void* hostlibraryaddr;
if(!find_library(hostlibraryaddr, getpid(), library))
{
ROCP_ERROR << "couldn't determine where " << library << " was loaded for host";
return false;
}
// Calculate the offset
size_t offset =
reinterpret_cast<size_t>(symboladdr) - reinterpret_cast<size_t>(hostlibraryaddr);
ROCP_TRACE << "offset of " << symbol << " into " << library << " calculated as " << offset;
// Find the start address of the library in the target process
void* targetlibraryaddr;
if(!find_library(targetlibraryaddr, m_pid, library))
{
ROCP_ERROR << "couldn't determine where " << library << " was loaded for target";
return false;
}
// Calculate address of symbol in the target process using the offset
// NOLINTNEXTLINE(performance-no-int-to-ptr)
addr = reinterpret_cast<void*>(reinterpret_cast<size_t>(targetlibraryaddr) + offset);
m_target_symbol_addrs[searchname.str()] = addr;
ROCP_TRACE << "found symbol for " << searchname.str() << " at " << addr;
return true;
}
bool
PTraceSession::stop() const
{
if(!m_attached)
{
ROCP_ERROR << "stop called while not attached";
return false;
}
// Stop the process
PTRACE_CALL(PTRACE_INTERRUPT, m_pid, NULL, NULL);
// Wait for the stop
if(waitpid(m_pid, nullptr, WSTOPPED) == -1)
{
return false;
}
ROCP_TRACE << "ptrace stopped pid " << m_pid;
return true;
}
bool
PTraceSession::cont() const
{
if(!m_attached)
{
ROCP_ERROR << "cont called while not attached";
return false;
}
PTRACE_CALL(PTRACE_CONT, m_pid, NULL, NULL);
ROCP_TRACE << "ptrace resumed pid " << m_pid;
return true;
}
bool
PTraceSession::handle_signals() const
{
while(!m_detaching_ptrace_session.load())
{
int status{0};
if(waitpid(m_pid, &status, WNOHANG) == -1)
{
ROCP_ERROR << "waitpid failed in handle_signals for pid " << m_pid;
return false;
}
if(status != 0 && WIFEXITED(status))
{
ROCP_ERROR << "process " << m_pid << " exited, status=" << WEXITSTATUS(status);
return false;
}
else if(status != 0 && WIFSIGNALED(status))
{
ROCP_ERROR << "process " << m_pid << " killed by signal " << WTERMSIG(status);
return false;
}
else if(status != 0 && WIFSTOPPED(status))
{
auto sig = WSTOPSIG(status);
ROCP_TRACE << "process " << m_pid << " stopped by signal " << sig;
PTRACE_CALL(PTRACE_CONT, m_pid, NULL, sig);
}
std::this_thread::yield();
}
return true;
}
void
PTraceSession::detach_ptrace_session()
{
m_detaching_ptrace_session.store(true);
}
} // namespace attach
} // namespace rocprofiler
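
The address translation performed by `find_library` and `find_symbol` above can be shown in isolation. The sketch below is a simplified illustration under stated assumptions: the helper names `maps_line_base` and `translate_symbol` are hypothetical, and the example `maps` line in the comment is made up. It relies on the same property the file uses, namely that `std::stoull` with base 16 stops at the first non-hex character, which in a `/proc/<pid>/maps` line is the `-` after the start address.

```cpp
#include <cstdint>
#include <string>

namespace sketch
{
// Parse the mapping start address from one /proc/<pid>/maps line, e.g.
// "7f12a4000000-7f12a4021000 r-xp 00000000 08:01 1234 /usr/lib/libexample.so".
inline std::uint64_t
maps_line_base(const std::string& line)
{
    return std::stoull(line, nullptr, 16);
}

// A symbol sits at the same offset from the library base in every process
// that maps the same library file, so only the base address differs.
inline std::uint64_t
translate_symbol(std::uint64_t host_symbol, std::uint64_t host_base, std::uint64_t target_base)
{
    const std::uint64_t offset = host_symbol - host_base;
    return target_base + offset;
}
}  // namespace sketch
```

This only holds when both processes map the same build of the library. If the tracer resolves a different copy than the target has loaded, the computed address is meaningless.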