* attach: milestone: API tracing
- This pairs with another commit in rocprofiler-sdk to fully
function
- Add ptrace entry points for tool attachment
- API tracing works at this commit
- Queue tracing not supported yet
* attach: cleanup
- Remove hardcode for loading of tool library
- Make invoke registration functions public again
* attach: proxy queue first draft
- Adds ability to trace with queues during attachment
- Must be paired with updated rocprofiler-sdk
* attach: prestore overhaul
- Must be paired with commit in rocprofiler-sdk
* attach: add dispatch table rework
- Register will load the prestore library and provide entrypoints to sdk
* attach: formatting and cleanup
* attach: revise dispatch table scheme
* attach: formatting
* attach: milestone: API tracing
- This change must be paired with a change in rocprofiler-register to
fully function.
- API tracing works at this commit
- Queue tracing not supported yet
* attach: cleanup and comments
* attach: Formatting and crash fixes
* attach: add attach duration
- Add option attach-duration-msec for attachment
* Formatting + sglang hang fix via signal handling
* Changed FATAL_IF to DFATAL_IF for scratch_memory due to persistent crash when iterating queues
* attach: proxy queue first draft
- Adds ability to trace with queues during attachment
- Must be paired with updated rocprofiler-register
* Allow null agents for scratch output
* attach: improve queue library interface
- Significant changes to force exported interfaces back to C
- Fixes bug with unknown agents at attachment
- Code objects' names may still be incorrect
* attach: add code_object support
- Kernel traces will now have names and all other information for launches
- Add capture of hsa_executable to the queue library
- Various logging improvements
* attach: rename queue library to prestore
* attach: prestore overhaul
- Must be paired with commit from rocprofiler-register
- Massive overhaul of code organization in prestore library
- Separates registrations for different object types
- Sets up future changes for initialization
* attach: add prestore dispatch table
- Removes linkage to prestore library from sdk
* attach: cleanup
* attach: formatting
* attach: fix input prompt not appearing
* attach: fix component name in cmake
* attach: revert change to export level
* Make prestore API public
* attach: update sdk attachment library WIP
- This commit is NONFUNCTIONAL
- Changes around structure to remove classes
- Seperate C linkage where needed
- Still needs updates to register for correct usage
* attach: update register with dispatch table WIP
- This commit is NONFUNCTIONAL
- Changes rocprofiler_register to handle dispatch table from attach
library.
- Still needs changes in SDK with dispatch table usage
* attach: dispatch table wip
- This commit is NONFUNCTIONAL
* attach: move attach component into core
* attach: rename to rocprofv3-attach
* attach: add callbacks for new queues and code objects
* attach: finish dispatch table implementation
- Fixes kernel tracing
* attach: add cmake variable for attachment support
* feat: Add --attach alias for rocprofv3 with comprehensive attachment tests
- Add `--attach` as an alias to existing `-p/--pid` functionality in rocprofv3.py
- Create comprehensive attachment test suite with CSV and JSON output validation:
- New attachment-test application for testing dynamic profiling scenarios
- Unified test script supporting both CSV and JSON output formats
- Pytest-based validation for kernel traces, memory copies, HSA API calls, and agent info
- Add CMake integration for automated attachment testing
- Support parameterized output directory and filename specification
- Implement proper environment setup for attachment queue registration
Tests verify successful attachment to running processes and capture of:
- Kernel dispatch traces with workgroup/grid dimensions
- Memory copy operations (H2D/D2H) with size validation
- HSA API call traces across multiple domains
- GPU/CPU agent information and capabilities
* Documentation Update
* attach: make attach script callable
* Added ROCPROFILER_REGISTER_ATTACHMENT_TOOL_LIB to remove hardcoded name
* attach: revert metrics library path changes
* Generic Attachment in Register (#942)
Remove tool references in register
* Add second param to attach call in rocprof register
* Add experimental reattachment support for ROCprofiler-SDK
This commit introduces experimental reattachment functionality allowing tools
to dynamically reattach to running processes with comprehensive design changes
to support multiple attach/detach cycles:
**Core Reattachment API:**
- Add rocprofiler_tool_configure_result_experimental_t with tool_reattach/tool_detach callbacks
- Add rocprofiler_call_client_reattach and rocprofiler_call_client_detach C exports
- Implement reattachment tracking in rocprofiler_register_attach to differentiate
initial attachment from reattachment cycles
- Add rocprofiler_register_invoke_reattach for handling reattachment requests
**Design Changes - Registration System Flow:**
The registration system now supports a dual-path initialization:
1. Initial Attachment Flow:
- rocprofiler_register_attach() -> rocprofiler_register_invoke_all_registrations()
- Full tool initialization with complete context setup
- Sets prev_attached atomic flag to track state
2. Reattachment Flow:
- rocprofiler_register_attach() detects prev_attached=true -> rocprofiler_register_invoke_reattach()
- Bypasses full re-initialization, calls client reattach callbacks instead
- Preserves existing contexts and buffers, only reactivates profiling services
**Design Changes - Tool Library Loading:**
Enhanced rocprofiler-register library loading with function pointer resolution:
- Extended rocp_set_api_table_data_t tuple to include reattach/detach function pointers
- Automatic symbol resolution for rocprofiler_call_client_reattach/detach functions
- Support for both LD_PRELOAD and dlopen scenarios with consistent callback availability
**Design Changes - Context Management:**
Introduced dual context systems for attachment scenarios:
- get_contexts() - Original contexts for standard tool initialization
- get_attach_contexts() - Separate context map for attachment-specific lifecycle
- attach_init() - Creates contexts for ALL buffer tracing services using existing buffers
- attach_start() - Selectively starts contexts based on configuration options
- attach_detach() - Cleanly stops and destroys attachment contexts
**Design Changes - Buffer Management:**
Added reset_tmp_file_buffer() template for clean reattachment state:
- Properly closes and removes old temporary files
- Deletes existing file_buffer instances to prevent stale file position tracking
- Creates fresh file_buffer instances for clean reattachment cycles
- Addresses core issue where file position metadata becomes stale between cycles
**Design Changes - Environment Variable Injection:**
Added ROCP_REGISTERED_TOOL_ATTACH environment variable:
- Distinguishes attachment-loaded tools from LD_PRELOAD scenarios
- Enables registration system to apply attachment-specific logic
- Helps tools adapt behavior for attachment vs standard initialization
**Attachment Context Management:**
- Add attach_init/attach_start/attach_detach functions for dynamic context lifecycle
- Add reset_tmp_file_buffer template for clean reattachment state management
- Implement get_attach_contexts() for tracking active attachment contexts
**Test Infrastructure:**
- Add projects/rocprofiler-sdk/tests/rocprofv3/reattach/ comprehensive test suite
- Include reattachment test scripts with unified attachment/detachment cycles
- Add validate.py with trace data validation for kernel, memory copy, HSA API, and agent info
- Add conftest.py for JSON and CSV data loading utilities
**Configuration Updates:**
- Update CMakeLists.txt to include reattachment tests in build system
- Add environment variable ROCP_REGISTERED_TOOL_ATTACH for attachment state tracking
- Enhance rocprofiler-register library loading with reattach/detach function resolution
**Flow Impact Analysis:**
This design enables robust multi-cycle attachment by:
1. Preventing duplicate initialization on reattachment
2. Maintaining separate context lifecycles for attachment vs standard operation
3. Ensuring clean temporary file state between attachment cycles
4. Providing tools with explicit reattach/detach callback hooks
5. Supporting both programmatic and environment-based tool configuration
The experimental nature allows for iteration on the API while establishing
the foundation for production-ready dynamic profiling capabilities.
* Fix misc clang-tidy warnings/errors
* CMake Option and Environment Variable Updates
- CMake: ROCPROFILER_REGISTER_ALWAYS_SUPPORT_ATTACH -> ROCPROFILER_REGISTER_BUILD_DEFAULT_ATTACHMENT
- Env: ROCPROFILER_REGISTER_ATTACHMENT_ENABLED ->
* Source reorganization
* Formatting + new lines at EOF
* Fix flake8 F841: local variable is assigned to but never used
* Update attachment test
- get rid of 5 second start delay
- add roctx
* Rework implementation
- Remove rocprofiler_tool_configure_result_experimental_t in lieu of rocprofiler_configure_attach
- Add <rocprofiler-sdk/experimental/registration.h>
- TODO: Update process_attachment.rst
* Handle re-attachment options
- inherit options from previous attachment
- check previous options do not modify data collection services
* Fix support for tools w/o rocprofiler_configure_attach
- fix segfault when rocprofiler_configure_attach does not exist
- fix naming convention for functions accepting attach dispatch table
- cleanup rocprofiler_configure_attach implementation in rocprofv3 tool
* attach: remove unknown agent handling
- Change was from earlier commit, no longer needed
* attach: add error for attaching without library loaded
* attach: revise version numbering
* attach: register header revisions
* attach: clang format register
* attach: formatting
* attach: fix build failure
- Remove cross dependency into rocprofiler-sdk, fixes build on some systems
* attach: revise register library detection
* Update rocprofiler-register and attach library
- formatting
- proper signature of register_functor for rocprofiler-sdk-attach library callback
- remove get_dispatch_registration_table()
* Bump rocprofiler-register version to 0.6.0 + AnyNewerVersion
* Fix output support for rocprofiler-sdk-tool
* Fix formatting
* Fix clang tidy errors
* Misc rocprofiler-sdk-attach fixes
* attach: add sigint handling to attach python
* tool README.md formatting
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
* Fix buffered output issue
* attach: add errors for tool attach
* CI Fixes
* Rework tests
* attach: improve library loading in rocprofv3 attach
* formatting
* Update tests to use pytest framework
* Fix test_attachment_hsa_api_trace
* attach: catch ctypes exceptions
* attach: fix leak in registration
* attach: fix sanitizer tests
* attach: fix sanitizer tests further
* attach: disable attach asan tests
* attach: disable ubsan test
* attach: fix permissions in installed test package
* attach: formatting
---------
Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>
Co-authored-by: Tim Gu <Tim.Gu@amd.com>
Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Benjamin Welton <bwelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
* Using semaphore to sync with all peer processes in finalization stage
[rocprofv3] Implement synchronization using POSIX semaphore in finalization
* clang format code
* clang 11 format code
* Add process sync option for rocprofv3
* Default value of process sync is false
* Update source/lib/rocprofiler-sdk-tool/tool.cpp
Apply suggestion by Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* update according to comments
* add new line to helper.hpp
---------
Co-authored-by: Huanran Wang <huanrwan@amd.com>
Co-authored-by: Huanran Wang <huanran.wang@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Adding GPU index as a parameter for ATT
* Tidy fix
* Using tokenize
* Update tests/rocprofv3/advanced-thread-trace/CMakeLists.txt
Co-authored-by: Indic, Vladimir <Vladimir.Indic@amd.com>
* Update tests/rocprofv3/advanced-thread-trace/CMakeLists.txt
* Adding error logging. Using idx instead of id.
---------
Co-authored-by: Giovanni <gbaraldi@amd.com>
Co-authored-by: Indic, Vladimir <Vladimir.Indic@amd.com>
[ROCm/rocprofiler-sdk commit: fd6f96ffb5]
* Reverted header and field location for csv memory allocation and updated tests
* Updated example csv file and made small update
[ROCm/rocprofiler-sdk commit: 533a8329d8]
* Add sample data for avail and remove color code for non terminal output
* review comments
* review comments
* add documentation
* test fix
[ROCm/rocprofiler-sdk commit: 2447a85215]
* Arbitrary host-trap sampling skid (doc)
The host-trap PC sampling might introduce a skid of [0, 2]
instructions. We documented this information and provides
some advice to application developers how to find
hot-spots in the profiles generated by host-trap sampling.
[ROCm/rocprofiler-sdk commit: 650d35bdaa]
* addressing issues
* doc fix
* test fix
* fix
* fix formatting issue and doc update
* fix column size
* fix
* fix formatting in output
* tests fix
* test fix
* add new line
* add new line
* fix new line
* fixing typo in using-rocprofv3-avail.rst
[ROCm/rocprofiler-sdk commit: 3aaffc42da]
* Update docs with new info in kernel_trace csv output and add flag for csv in docs.
* Misc.
---------
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
Co-authored-by: Vaddireddy, Sushma <Sushma.Vaddireddy@amd.com>
Co-authored-by: Bhardwaj, Gopesh <Gopesh.Bhardwaj@amd.com>
[ROCm/rocprofiler-sdk commit: 1c91774c6a]
* Convert YAML Format
Convert YAML format and reader to properly read the YAML.
Comparison between output's from the YAML show only changes in ordering
of architectures (and ids).
* Test fixes
* Add script for converting the YAML schema to source/scripts
* Update documentation
* Change the extra counter code block to YAML
* Add missing new line at EOF
* remove name issues
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 33e43e66d3]
* Adding Benchmarking Stg1
* config fix
* reset
* add jpeg and decode traces in iteration
* address comments benchmark config files.
* address comments.
* address comments.
* address comments: revert cntrl ctx.
* address comments: revert csv output.
* resolve merge conflits.
* format.
* build fix.
* fix hip runtime api traces.
* loop cb services.
* format.
* bug fix.
* Fix operator>
- public C++ comparison operator
* Update configuration options
- support selected regions (--selected-regions)
- support writing output config json (--output-config)
- update serialization data
* rocprofv3 tool library misc updates
- lambda for starting context
- support for writing config json
* Tool library updates
- Finished support for all benchmarking modes
- Added build spec support to config json
* Fix ROCPROFILER_SOVERSION
- this value should not be multiplied by 10,000
* Minor tweak to rocprofv3
* Benchmarking scripts
* formatting
* Fix duplicate include
* Add reproducible-dispatch-count test app
- used in benchmarking
* registration logging
- report number of registered contexts and active contexts after client initialization
* Serialize environment in rocprofv3 output config
* ROCPROFILER_BUILD_BENCHMARK CMake option
* Update benchmark SQL schema
- hash_id is text
- add md5sum to benchmarked_app
- remove app_id from benchmarked_sdk
- add sdk_id to benchmark_config
- separate hip_trace into hip_runtime_trace and hip_compiler_trace
- use INT instead of INTEGER for MySQL compatibility
- add count column in benchmark_statistics
- allow std_dev to be NULL in benchmark_statistics
* Update rocprofv3-benchmark.py
- use md5 instead of python hash (which includes random seed)
- use args.mysql_database
- compute md5sum of executable
- fix insert_benchmark_config
- marker trace fixes
- memory allocation fixes
- split hip_trace into hip_{runtime,compiler}_trace
- remove app_id from benchmarked_sdk
- support warmup runs
- count field in benchmark_statistics
* Support launcher and environment in YAML
* Update reproducible-dispatch-count.cpp
- support mode which doesn't use hip event timing
* Misc rocprofv3-benchmark.py updates
- fix some MySQL support
- remove some unnecessary logging
* support mysql db.
* Format.
* Updated SQL input files
- moved benchmark_schema.sql to benchmark_table.sql
- added benchmark_views.sql
- uses {{metric}} syntax for variable substitution
* cmake formatting
* update rocprofv3-benchmark.py
- benchmark config labels
- overhead views
* Encode rocprofv3-benchmark PID in rocprofv3 and timem output files
* Minor tweak to benchmark_views.sql
- include count
- reorder fields for readability
* split statements and use IS if values is NONE.
* use backtick instead of double quotes and add IS before NOT NULL.:
* Adding Mandelbrot Benchmark App
* Adding Dockerfile example
* Update dockerfile
* Update dockerfile
* [SDK] rocprofiler_query_external_correlation_id_request_kind_name
* Execution-profile benchmark mode
* Execution profile SQL support
* Rename mandlebrot folder + misc clang-tidy
* [rocprofv3-benchmark] Execution profile support
* Update installation
* add work dir when setting git revision, useful when building outside src.
* Set FULL_VERSION_STRING and ROCPROFILER_SDK_GIT_REVISION
- when benchmark folder is top-level
* Remove unused python packages from requirements.txt
* Use ldd/pyelftools to include linked libs for md5sum
- also add --filter-benchmark and --filter-rocprofv3 options
- support labeling the rocprofv3 options
- use more argparse groups
- more generic application of filters
- support variable substitution in environment, e.g. PATH=/some/path:$PATH
* Environment improvements
- improve reproducibility when env set via input file vs. shell
- support "environment-ignore" to remove environment variables
* Misc formatting
* Misc. fix
* use backticks for defining new columns name
* Support shuffling the order of benchmark modes/rocprofv3 args
* Address review comments
* Update Dockerfile
- rename to Dockerfile
- reduce to one layer
* Support docker build arg BRANCH
---------
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 6f17da7ade]
* Fix stream duplication and fixed tests
* Added comments to explain stream.cpp code, change stream nullptr check to occur in update table to prevent readding null stream, simplified hip-streams bin file code, add destroyStreams to hip-streams bin file code
* Removed roctx from CMakeLists.txt
* Updated documentation
* Fix documentation
* Removed update_table for HIP compiler table and updated stream.cpp to remove support for HIP compiler table
* Added runtime initialization check for HIP
* Changed tool name, working on fixing memory management
* Added context for counter collection kernel rename combination
* Changed name from map to set and changed description
* Fix documentation description for group-by-queue
* Merged memory copy and kernel operations onto a single track when on the same stream
* Updated perfetto output to remove hardware information from track name to merge all memory copy and kernel operations on the same stream to the same track:
* Most pr comments addressed
* Added filter for counter collection and removed kernel buffer tracing hack
* Added PR comment fixes
---------
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
[ROCm/rocprofiler-sdk commit: e626df43eb]
* [rocprofv3] Use -P for collection period option
- Reserve -p for profiler attachment
* Update changelog
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: d2bde3ce27]
* rocprofv3: LD_PRELOAD for signal and sigaction
- wrappers around `signal` and `sigaction` to prevent applications which install signal handlers to replace the rocprofv3 signal handlers
- minor tweaks to buffer sizes (use page_size instead of
KiB)
* [DO NOT COMMIT] extra logging
* Switch git submodule url for perfetto
- use GitHub URL as this is more accessible
* Update ring_buffer<Tp>
- account for alignment padding
* Update buffered_output
- track number of bytes stored
- add nullptr checks
* Update tmp_file_buffer
- track number of bytes
- read_tmp_file does not create tmp file if it does not already exist
* Update tmp_file
- add exists member function for checking whether temporary file already exists
- tweak remove() implementation
* Update config.hpp
- add option to enable/disable signal handlers
- add option for minimum_output_bytes
* Make signal, sigaction functions visible
* rocprofv3 tool updates
- chained signals
- override the signal handler(s) installed by the application
- improve cleanup of temporary files
- support minimum output bytes
* Add commandline support
* fixing test
* minor fix
* minor fix
* fix clang issue
* fix
* Adding docs
* review comments
* review changes
* review
* YUV pulldown additions to rocdecode
* More rocdecode changes
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <Jonathan.Madsen@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: 87badfbd15]
* MI300 Stochastic PC Sampling Documentation
* Stochastic PC sampling title renaming
---------
Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>
[ROCm/rocprofiler-sdk commit: 96a0ef244f]
* Added buffer tracing support for rocdecode and updated tests to work with buffer tracing
* Updated perfetto to output args individually rather than as a string list
* Updated docstrings and operation type, changed OTF2 code to remove warning due to change in operation type
* Updated tests for review comments
* Test args exist and return value
* Updated to use string entry
* Change function name
* Updated PR to reflect review comments
* Updated for PR review comments
* Change function name
[ROCm/rocprofiler-sdk commit: 077723337a]
- Re-ran pip-compile on source/docs/sphinx/requirements.in
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 2061c52817]
* Make sure all structs/enums can be forward declared
* Updates to counter collection
- consistency updates and cleanup
* Conversion of dimension information to info struct
* Added deprecated folder
* Testing changes
* merge changes
* Fix shadowed variable
* Source code formatting
* Fix shadowed variable
* Update rocprofiler_counter_info_v1_t member names
* Split version.h into version.h and ext_version.h
- ext_version.h contains external version info, e.g. ROCPROFILER_HSA_API_TABLE_MAJOR_VERSION, ROCPROFILER_HSA_RUNTIME_VERSION
- this reduces amount of recompilation after a commit since version.h gets updated with the git revision
* profile_config -> counter_config
* EOF new line
* [Samples] Reduce header includes + reorg counter collection samples
* Misc compilation fixes
- shadowed variables
- use of [[deprecated("...")]] in C code
- unused variables
* Minor misc modifications
- use common:: instead of rocprofiler::common:: when inside rocprofiler namespace
- counters.cpp
- move local anon namespace functions into rocprofiler::counters:: anon namespace
- use std::string_view for get_static_string
- const ref for get_static_ptr
- misc namespace shortening
* [Public API] rocprofiler_get_version_triplet + rocprofiler_version_triplet_t
- struct rocprofiler_version_triplet_t containing fields for the major, minor, and patch version
- public API function: rocprofiler_get_version_triplet
- define C++ operators for rocprofiler_version_triplet_t
- C++ function compute_version_triplet
* [Tests] Improve async-copy-testing test
- relax constraints
- improve logging
* Update counter_config.h doxygen docs
* ROCPROFILER_SDK_BETA_COMPAT
- ppdef which helps with renaming when set to 1
* Remove spurious include
* Fix includes for cxx/version.hpp
* Doxygen fixes for rocprofiler_get_version and rocprofiler_get_version_triplet
* Public API Experimental Designation
- ROCPROFILER_SDK_EXPERIMENTAL added to experimental function
- "(experimental)" added to doxygen @brief entries
* Fix use of assert instead of static_assert in hip/stream.cpp
* Use typedef instead of define for rocprofiler_profile_config_id_t
* Use inline rocprofiler_{create,destroy}_profile_config instead of ppdef
- added <rocprofiler-sdk/deprecated/profile_config.h>
* Doxygen for rocprofiler_{create,destroy}_profile_config
* ROCPROFILER_SDK_DEPRECATED_WARNINGS
* Temporarily comment out ROCPROFILER_SDK_DEPRECATED_WARNINGS=1
* cmake formatting
* Misc variable renaming in samples and tests
* Fix declarations of types
* Fix hip stream tracing service struct name
- rocprofiler_callback_tracing_stream_handle_data_t renamed to rocprofiler_callback_tracing_hip_stream_api_data_t
* Rename "HIP_STREAM_API" to "HIP_STREAM"
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: 4cd121e27b]