* [rocprofiler-sdk] Fix fmt::join build errors
- remedy use of fmt::join without include <fmt/ranges.h>
* include memory header
* Disable FMT build for SDK CI
* Add -DROCPROFILER_BUILD_FMT=OFF to sanitizer steps
* Add temporary workaround for rccl.h issue
* Add ROCPROFILER_INTERNAL_RCCL_API_TRACE to SDK CI builds
* disable clang-tidy for vendored includes
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: jbonnell-amd <jason.bonnell@amd.com>
* Fix dimension mismatch for multi-GPU systems with identical architectures
This change addresses an issue where counter dimensions were incorrectly
shared across all GPU agents with the same architecture name, even when
those agents had different hardware configurations (e.g., different CU counts).
Changes:
- Updated getBlockDimensions() to accept agent ID instead of architecture name
- Made dimension cache agent-specific instead of architecture-specific
- Updated set_dimensions() in AST evaluation to use specific agent ID
- Modified all API functions to handle agent-specific dimension lookups
- Updated tests to work with agent-specific dimensions
This fix ensures that dimensions accurately reflect the actual hardware
configuration of each individual GPU agent, preventing dimension mismatches
in multi-GPU systems where GPUs share the same architecture but have
different physical configurations.
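A minimal sketch of why the cache key matters. It assumes a hypothetical `agent_info` record and a hypothetical `dimension_cache` class that caches only a CU-count dimension; neither is an SDK type. Keying by agent ID gives each agent its own entry even when several agents report the same architecture name.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical agent description: only the fields this illustration needs.
struct agent_info
{
    uint64_t    id;        // unique per GPU agent
    std::string arch;      // architecture name; may be shared by several agents
    uint32_t    cu_count;  // hardware configuration that can differ per agent
};

// Hypothetical cache of a per-agent dimension, keyed by agent ID (not arch name).
class dimension_cache
{
public:
    uint32_t cu_dimension(const agent_info& agent)
    {
        auto itr = m_cache.find(agent.id);
        if(itr == m_cache.end()) itr = m_cache.emplace(agent.id, agent.cu_count).first;
        return itr->second;
    }

private:
    std::map<uint64_t, uint32_t> m_cache = {};
};
```

With a map keyed by `arch` instead, the second agent of a given architecture would be handed the first agent's CU count, which is the mismatch this change removes.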
Counter ID Representation Changes:
- Modified counter_id encoding to include agent information in bits 37-32
- Agent logical_node_id is encoded as (value + 1) to ensure agent 0 is detectable
- Counter records internally store only 16-bit base metric IDs (bits 15-0)
- Tool reconstructs agent-encoded counter IDs from base metric ID & agent info
- Instance record counter_id field uses a bitwise AND mask (counter_id.handle & 0xFFFF) to extract the base metric ID so it fits in 16-bit storage
- Output generators (CSV, JSON, Perfetto) use agent-encoded IDs for consistency
- Updated counter_config.cpp and metrics.cpp to extract base metric ID when needed
- All counter lookups now properly handle agent-encoded vs base metric IDs
This ensures counter IDs are consistent between metadata and output records while
maintaining compact storage in instance records.
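The encoding above can be sketched as follows. This is a minimal illustration assuming a 64-bit handle and hypothetical helper names; it follows the described layout (base metric ID in bits 15-0, agent `logical_node_id + 1` in bits 37-32) but is not the SDK's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical helpers mirroring the bit layout described above; names are
// illustrative and are not part of the rocprofiler-sdk API.
constexpr uint64_t base_id_mask = 0xFFFFull;  // bits 15-0: base metric ID
constexpr unsigned agent_shift  = 32;         // agent field starts at bit 32
constexpr uint64_t agent_mask   = 0x3Full;    // bits 37-32: 6-bit agent field

// Encode (logical_node_id + 1) so that agent 0 still produces a non-zero field.
// Assumes logical_node_id + 1 fits in the 6-bit field (i.e. at most 62).
constexpr uint64_t encode_counter_id(uint16_t base_metric_id, uint32_t logical_node_id)
{
    const uint64_t agent_field = (static_cast<uint64_t>(logical_node_id) + 1) & agent_mask;
    return (agent_field << agent_shift) | base_metric_id;
}

// What instance records keep: only the low 16 bits.
constexpr uint16_t extract_base_metric_id(uint64_t handle)
{
    return static_cast<uint16_t>(handle & base_id_mask);
}

// Returns the agent's logical_node_id, or std::nullopt if no agent is encoded.
constexpr std::optional<uint32_t> extract_logical_node_id(uint64_t handle)
{
    const uint64_t agent_field = (handle >> agent_shift) & agent_mask;
    if(agent_field == 0) return std::nullopt;
    return static_cast<uint32_t>(agent_field - 1);
}
```

Storing `logical_node_id + 1` is what lets agent 0 be told apart from "no agent": a bare base ID such as `42` decodes to `std::nullopt`, while `encode_counter_id(42, 0)` decodes to agent 0.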
* Initial consecutive kernel WIP
* Updated logic after discussion, create context only when needed, change set of captured ids to dispatch_id_t type
* Updated to fix concurrency issues and revert kernel_iterations
* Add captured id in first lock capture
* Updated code to use wlock, added comments, removed some unnecessary atomics
* Cleaned up, need to add test
* Add test to check that generated stats csv file is not empty
* Updated test to check if vector-ops kernels are being used
* Fix phase bug
* Updated for comments
* Flattened ATT logic a bit
* Fix incorrect if-statement
* Fix merge conflict
* attach: milestone: API tracing
- This pairs with another commit in rocprofiler-sdk to function fully
- Add ptrace entry points for tool attachment
- API tracing works at this commit
- Queue tracing not supported yet
* attach: cleanup
- Remove hardcode for loading of tool library
- Make invoke registration functions public again
* attach: proxy queue first draft
- Adds ability to trace with queues during attachment
- Must be paired with updated rocprofiler-sdk
* attach: prestore overhaul
- Must be paired with commit in rocprofiler-sdk
* attach: add dispatch table rework
- Register will load the prestore library and provide entrypoints to sdk
* attach: formatting and cleanup
* attach: revise dispatch table scheme
* attach: formatting
* attach: milestone: API tracing
- This change must be paired with a change in rocprofiler-register to function fully.
- API tracing works at this commit
- Queue tracing not supported yet
* attach: cleanup and comments
* attach: Formatting and crash fixes
* attach: add attach duration
- Add option attach-duration-msec for attachment
* Formatting + sglang hang fix via signal handling
* Changed FATAL_IF to DFATAL_IF for scratch_memory due to persistent crash when iterating queues
* attach: proxy queue first draft
- Adds ability to trace with queues during attachment
- Must be paired with updated rocprofiler-register
* Allow null agents for scratch output
* attach: improve queue library interface
- Significant changes to force exported interfaces back to C
- Fixes bug with unknown agents at attachment
- Code objects' names may still be incorrect
* attach: add code_object support
- Kernel traces will now have names and all other information for launches
- Add capture of hsa_executable to the queue library
- Various logging improvements
* attach: rename queue library to prestore
* attach: prestore overhaul
- Must be paired with commit from rocprofiler-register
- Massive overhaul of code organization in prestore library
- Separates registrations for different object types
- Sets up future changes for initialization
* attach: add prestore dispatch table
- Removes linkage to prestore library from sdk
* attach: cleanup
* attach: formatting
* attach: fix input prompt not appearing
* attach: fix component name in cmake
* attach: revert change to export level
* Make prestore API public
* attach: update sdk attachment library WIP
- This commit is NONFUNCTIONAL
- Changes around structure to remove classes
- Separate C linkage where needed
- Still needs updates to register for correct usage
* attach: update register with dispatch table WIP
- This commit is NONFUNCTIONAL
- Changes rocprofiler_register to handle the dispatch table from the attach library.
- Still needs changes in SDK with dispatch table usage
* attach: dispatch table wip
- This commit is NONFUNCTIONAL
* attach: move attach component into core
* attach: rename to rocprofv3-attach
* attach: add callbacks for new queues and code objects
* attach: finish dispatch table implementation
- Fixes kernel tracing
* attach: add cmake variable for attachment support
* feat: Add --attach alias for rocprofv3 with comprehensive attachment tests
- Add `--attach` as an alias to existing `-p/--pid` functionality in rocprofv3.py
- Create comprehensive attachment test suite with CSV and JSON output validation:
- New attachment-test application for testing dynamic profiling scenarios
- Unified test script supporting both CSV and JSON output formats
- Pytest-based validation for kernel traces, memory copies, HSA API calls, and agent info
- Add CMake integration for automated attachment testing
- Support parameterized output directory and filename specification
- Implement proper environment setup for attachment queue registration
Tests verify successful attachment to running processes and capture of:
- Kernel dispatch traces with workgroup/grid dimensions
- Memory copy operations (H2D/D2H) with size validation
- HSA API call traces across multiple domains
- GPU/CPU agent information and capabilities
* Documentation Update
* attach: make attach script callable
* Added ROCPROFILER_REGISTER_ATTACHMENT_TOOL_LIB to remove hardcoded name
* attach: revert metrics library path changes
* Generic Attachment in Register (#942)
Remove tool references in register
* Add second param to attach call in rocprof register
* Add experimental reattachment support for ROCprofiler-SDK
This commit introduces experimental reattachment functionality allowing tools
to dynamically reattach to running processes with comprehensive design changes
to support multiple attach/detach cycles:
**Core Reattachment API:**
- Add rocprofiler_tool_configure_result_experimental_t with tool_reattach/tool_detach callbacks
- Add rocprofiler_call_client_reattach and rocprofiler_call_client_detach C exports
- Implement reattachment tracking in rocprofiler_register_attach to differentiate initial attachment from reattachment cycles
- Add rocprofiler_register_invoke_reattach for handling reattachment requests
**Design Changes - Registration System Flow:**
The registration system now supports a dual-path initialization:
1. Initial Attachment Flow:
- rocprofiler_register_attach() -> rocprofiler_register_invoke_all_registrations()
- Full tool initialization with complete context setup
- Sets prev_attached atomic flag to track state
2. Reattachment Flow:
- rocprofiler_register_attach() detects prev_attached=true -> rocprofiler_register_invoke_reattach()
- Bypasses full re-initialization, calls client reattach callbacks instead
- Preserves existing contexts and buffers, only reactivates profiling services
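The dual-path selection can be sketched as below, assuming the `prev_attached` flag is a `std::atomic<bool>` updated with `exchange()`. The enum and function name are hypothetical, and the real `rocprofiler_register_attach()` does far more than pick a path.

```cpp
#include <atomic>
#include <cassert>

enum class attach_path
{
    initial,   // full initialization (invoke_all_registrations in the flow above)
    reattach,  // client reattach callbacks only; contexts and buffers are kept
};

// Hypothetical stand-in for the prev_attached flag described above.
std::atomic<bool> prev_attached{false};

attach_path select_attach_path()
{
    // exchange() sets the flag and returns its old value in one atomic step,
    // so only the first attachment ever observes `false`.
    if(!prev_attached.exchange(true)) return attach_path::initial;
    return attach_path::reattach;
}
```

Using `exchange()` instead of a separate load followed by a store closes the window in which two concurrent attach requests could both see `false` and both run full initialization.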
**Design Changes - Tool Library Loading:**
Enhanced rocprofiler-register library loading with function pointer resolution:
- Extended rocp_set_api_table_data_t tuple to include reattach/detach function pointers
- Automatic symbol resolution for rocprofiler_call_client_reattach/detach functions
- Support for both LD_PRELOAD and dlopen scenarios with consistent callback availability
**Design Changes - Context Management:**
Introduced dual context systems for attachment scenarios:
- get_contexts() - Original contexts for standard tool initialization
- get_attach_contexts() - Separate context map for attachment-specific lifecycle
- attach_init() - Creates contexts for ALL buffer tracing services using existing buffers
- attach_start() - Selectively starts contexts based on configuration options
- attach_detach() - Cleanly stops and destroys attachment contexts
**Design Changes - Buffer Management:**
Added reset_tmp_file_buffer() template for clean reattachment state:
- Properly closes and removes old temporary files
- Deletes existing file_buffer instances to prevent stale file position tracking
- Creates fresh file_buffer instances for clean reattachment cycles
- Addresses core issue where file position metadata becomes stale between cycles
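The reset pattern can be sketched as below. `tmp_file_buffer` here is a hypothetical stand-in (a path plus an `std::ofstream`), not the SDK's `file_buffer` type; the template only shows the close, remove, and recreate order described above.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <memory>
#include <utility>

// Hypothetical temporary-file buffer: a path and the stream writing to it.
struct tmp_file_buffer
{
    explicit tmp_file_buffer(std::filesystem::path p)
    : path{std::move(p)}
    , stream{path, std::ios::binary | std::ios::trunc}
    {}

    std::filesystem::path path;    // declared before stream, so initialized first
    std::ofstream         stream;
};

// Close and delete the previous cycle's file, then replace the buffer object
// itself so no write position or other per-file state carries over.
template <typename BufferT>
void reset_tmp_file_buffer(std::unique_ptr<BufferT>& buffer)
{
    if(!buffer) return;
    const auto path = buffer->path;
    buffer->stream.close();
    std::filesystem::remove(path);
    buffer = std::make_unique<BufferT>(path);  // previous instance is destroyed here
}
```

Replacing the whole object, rather than only reopening the stream, guarantees that any bookkeeping stored alongside the stream starts from zero in the next cycle.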
**Design Changes - Environment Variable Injection:**
Added ROCP_REGISTERED_TOOL_ATTACH environment variable:
- Distinguishes attachment-loaded tools from LD_PRELOAD scenarios
- Enables registration system to apply attachment-specific logic
- Helps tools adapt behavior for attachment vs standard initialization
**Attachment Context Management:**
- Add attach_init/attach_start/attach_detach functions for dynamic context lifecycle
- Add reset_tmp_file_buffer template for clean reattachment state management
- Implement get_attach_contexts() for tracking active attachment contexts
**Test Infrastructure:**
- Add projects/rocprofiler-sdk/tests/rocprofv3/reattach/ comprehensive test suite
- Include reattachment test scripts with unified attachment/detachment cycles
- Add validate.py with trace data validation for kernel, memory copy, HSA API, and agent info
- Add conftest.py for JSON and CSV data loading utilities
**Configuration Updates:**
- Update CMakeLists.txt to include reattachment tests in build system
- Add environment variable ROCP_REGISTERED_TOOL_ATTACH for attachment state tracking
- Enhance rocprofiler-register library loading with reattach/detach function resolution
**Flow Impact Analysis:**
This design enables robust multi-cycle attachment by:
1. Preventing duplicate initialization on reattachment
2. Maintaining separate context lifecycles for attachment vs standard operation
3. Ensuring clean temporary file state between attachment cycles
4. Providing tools with explicit reattach/detach callback hooks
5. Supporting both programmatic and environment-based tool configuration
The experimental nature allows for iteration on the API while establishing
the foundation for production-ready dynamic profiling capabilities.
* Fix misc clang-tidy warnings/errors
* CMake Option and Environment Variable Updates
- CMake: ROCPROFILER_REGISTER_ALWAYS_SUPPORT_ATTACH -> ROCPROFILER_REGISTER_BUILD_DEFAULT_ATTACHMENT
- Env: ROCPROFILER_REGISTER_ATTACHMENT_ENABLED ->
* Source reorganization
* Formatting + new lines at EOF
* Fix flake8 F841: local variable is assigned to but never used
* Update attachment test
- get rid of 5 second start delay
- add roctx
* Rework implementation
- Remove rocprofiler_tool_configure_result_experimental_t in favor of rocprofiler_configure_attach
- Add <rocprofiler-sdk/experimental/registration.h>
- TODO: Update process_attachment.rst
* Handle re-attachment options
- inherit options from previous attachment
- check previous options do not modify data collection services
* Fix support for tools w/o rocprofiler_configure_attach
- fix segfault when rocprofiler_configure_attach does not exist
- fix naming convention for functions accepting attach dispatch table
- cleanup rocprofiler_configure_attach implementation in rocprofv3 tool
* attach: remove unknown agent handling
- Change was from earlier commit, no longer needed
* attach: add error for attaching without library loaded
* attach: revise version numbering
* attach: register header revisions
* attach: clang format register
* attach: formatting
* attach: fix build failure
- Remove cross dependency into rocprofiler-sdk, fixes build on some systems
* attach: revise register library detection
* Update rocprofiler-register and attach library
- formatting
- proper signature of register_functor for rocprofiler-sdk-attach library callback
- remove get_dispatch_registration_table()
* Bump rocprofiler-register version to 0.6.0 + AnyNewerVersion
* Fix output support for rocprofiler-sdk-tool
* Fix formatting
* Fix clang tidy errors
* Misc rocprofiler-sdk-attach fixes
* attach: add sigint handling to attach python
* tool README.md formatting
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
* Fix buffered output issue
* attach: add errors for tool attach
* CI Fixes
* Rework tests
* attach: improve library loading in rocprofv3 attach
* formatting
* Update tests to use pytest framework
* Fix test_attachment_hsa_api_trace
* attach: catch ctypes exceptions
* attach: fix leak in registration
* attach: fix sanitizer tests
* attach: fix sanitizer tests further
* attach: disable attach asan tests
* attach: disable ubsan test
* attach: fix permissions in installed test package
* attach: formatting
---------
Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>
Co-authored-by: Tim Gu <Tim.Gu@amd.com>
Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Benjamin Welton <bwelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
* Remove config checks for stream and kernel rename data collection
* Updated csv generation to check if kernel rename is on before calling get_kernel_name
* Update metadata to use kernel_rename bool argument
* Formatting + unconditionally store kernel name in rocpd
* Readded kernel rename parameter after rebase
* Fixed rebase conflicts
* Updated comment in line with github comments
* Added check in rocpd csv.cpp to output kernel name if region name is empty
* Add test for kernel rename
---------
Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>
* Removing regex from the tool
* Adding alternative for regex regarding handling
* Adding ROCpd
* Removing regex include
* Apply suggestion from @jomadsen_amdeng
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Apply suggestion from @jomadsen_amdeng
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Apply suggestion from @jomadsen_amdeng
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Adding Standalone Regex Header File
* Fixing Regex to handle grouping and
* Fixing Regex to handle grouping and
* Fixing Regex to handle grouping and
* Formatting Fix
* Update rocprofiler-sdk-restrictions.yml
* Separating regex.hpp to source and header & Adding Tests for parity with std::regex
* Update regex.cpp
* Using snake_case for naming and addressing some comments
* Adding more tests & README for regex implementation
* Updating rocprofiler sdk restrictions workflow
* Updating more tests & README for regex implementation
* Update README_regex.md
* Rename README_regex.md to README.md
---------
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
* Using semaphore to sync with all peer processes in finalization stage
[rocprofv3] Implement synchronization using POSIX semaphore in finalization
* clang format code
* clang 11 format code
* Add process sync option for rocprofv3
* Default value of process sync is false
* Update source/lib/rocprofiler-sdk-tool/tool.cpp
Apply suggestion by Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* update according to comments
* add new line to helper.hpp
---------
Co-authored-by: Huanran Wang <huanrwan@amd.com>
Co-authored-by: Huanran Wang <huanran.wang@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Adding GPU index as a parameter for ATT
* Tidy fix
* Using tokenize
* Update tests/rocprofv3/advanced-thread-trace/CMakeLists.txt
Co-authored-by: Indic, Vladimir <Vladimir.Indic@amd.com>
* Update tests/rocprofv3/advanced-thread-trace/CMakeLists.txt
* Adding error logging. Using idx instead of id.
---------
Co-authored-by: Giovanni <gbaraldi@amd.com>
Co-authored-by: Indic, Vladimir <Vladimir.Indic@amd.com>
[ROCm/rocprofiler-sdk commit: fd6f96ffb5]
* expose dimensional info in rocprofiler_counter_info_v1_t.
* add counter_id in dim info.
* address review comments
* format.
* address comments.
* use array of pointers for dimensions_instances.
* format and comments.
* address comments.
* new line.
* Update counter_defs.yaml
* Update counter_defs.yaml
* Update counter_defs.yaml
* counter_defs.
* format counter defs.
* format counter defs.
* format counter defs.
* show only counters being profiled in metadata.
* Format.
* use config for counters and fix warnings.
* add version for rocprofiler_counter_dimension_info_v1_t struct.
* rename rocprofiler_counter_record_dimension_instance_v1_info_t.
* account device id from pmc for counters metadata.
* move dim structs to counters.h.
* address comments to compare value.
* fix tests.
* Address comments. use pointer of arrays for ABI.
* rebase.
* fix build error.
* use separate metadata::init() for rocprofv3.
* also print not found counters.
* precompute all the perf counters needed to be in metadata.
* Misc.
* format
* Format.
* rocprofiler::sdk::container::c_array
* Address comments.
* source/lib/output/metadata.cpp
* lint.
* add unit test for c_array.
* add unit test and serialization support for c_array container.
* Misc.
* Clean files.
* Format.
* clang-tidy.
* add more checks to c_array.
* misc. typo
* Addr comments.
---------
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
Co-authored-by: Jonathan R. Madsen <Jonathan.Madsen@amd.com>
[ROCm/rocprofiler-sdk commit: bf0fad1d54]
* Add sample data for avail and remove color code for non terminal output
* review comments
* review comments
* add documentation
* test fix
[ROCm/rocprofiler-sdk commit: 2447a85215]
* [SWDEV-516561][1/2] Add MARKER_RANGE_EXTENT to capture ROCTX ranges
Add a range extent that captures all work between roctxpush/pop operations. The entry callback takes place during roctxpush and the exit callback takes place during roctxpop. This is primarily so that an ancestor id can be kept on the ancestor stack, allowing all operations that take place within the push/pop context to be annotated as part of this range. With the current setup, where push and pop are two separate operations that need to be combined externally, we cannot keep an ancestor id on the stack and thus cannot tie tracing events to particular ranges.
Correlation id information is inherited from the push operation. Ancestor id needs to be added in a future commit that also outputs this ancestor to CSV.
Output:
```
[ctest] {'size': 64, 'kind': 7, 'operation': 1, 'correlation_id': {'internal': 1525, 'external': 0, 'ancestor': 1524}, 'start_timestamp': 2932551479402642, 'end_timestamp': 2932551491178449, 'thread_id': 3254861}
[ctest] {'size': 64, 'kind': 8, 'operation': 2, 'correlation_id': {'internal': 1525, 'external': 0, 'ancestor': 1524}, 'start_timestamp': 2932551479405878, 'end_timestamp': 2932551491181214, 'thread_id': 3254861}
```
Note: Kind 8 = range extent op.
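The ancestor-stack idea can be modeled as below. This is a hypothetical, single-threaded sketch (the `range_tracker` type and its members are illustrative, and ancestor IDs are not emitted by this commit): the push callback records the range and pushes its ID, every traced operation takes the innermost open range as its ancestor, and the pop callback removes it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model; a real implementation would keep one stack per thread.
class range_tracker
{
public:
    struct record
    {
        uint64_t id;        // correlation id of this record
        uint64_t ancestor;  // id of the innermost open range, 0 if none
    };

    record push()  // entry callback: range push
    {
        const record rec{m_next_id++, current_ancestor()};
        m_stack.push_back(rec.id);
        return rec;
    }

    void pop()  // exit callback: range pop
    {
        if(!m_stack.empty()) m_stack.pop_back();
    }

    record trace_op() { return {m_next_id++, current_ancestor()}; }  // any traced call

    uint64_t current_ancestor() const { return m_stack.empty() ? 0 : m_stack.back(); }

private:
    std::vector<uint64_t> m_stack   = {};
    uint64_t              m_next_id = 1;
};
```

Nested ranges fall out naturally: an inner push records the outer range as its ancestor, and once it is popped, subsequent operations are attributed to the outer range again.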
* Merge fix
Revert several changes
source/lib/rocprofiler-sdk/marker/range_marker.*
- separate out range marker implementation from standard marker implementation
Update public API with marker core range
Support marker core range in sdk (source/lib/rocprofiler-sdk)
Transition rocprofiler-sdk-tool and output lib to use marker core range
Misc fixes for tests
Fix logic in lib/output/generate{CSV,Stats}.cpp
Update tests/rocprofv3/tracing-hip-in-libraries (marker validation)
Fix test_otf2_data
* Test fixes
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: 2c4e20b951]
* Added null check for stream_stack before get_stream_id is called
* Rename function and add check for stream stack before pop
* Removed empty check for stream stack and adding error log for get_stream_id in stream.cpp
[ROCm/rocprofiler-sdk commit: 0904b6e34d]
* [CI] Testing Stability
- CMake option ROCPROFILER_DISABLE_UNSTABLE_CTESTS
- used for tests which periodically fail around 1 out of every 10 runs
- set to ON while instability remains; this needs to be set to OFF in ROCm 7.1 or, ideally, ROCm 7.0.1
- Use FIXTURES_SETUP and FIXTURES_REQUIRED for some tests
- replace "threw an exception" with "${ROCPROFILER_DEFAULT_FAIL_REGEX}" for misc FAIL_REGULAR_EXPRESSIONS
* Remove contents of all EXCLUDE_{TESTS,LABEL}_REGEX from CI workflow
* Disable patch git step in code-coverage run
* Tweak spin time of reproducible runtime
* Removed patch git step in code-coverage run
* Update ROCPROFILER_DEFAULT_FAIL_REGEX
* Mark test-counter-collection tests as unstable
- add fixtures setup/required
* Remove ATTACHED_FILES_ON_FAIL
- CDash doesn't store these or enable downloading them properly anyway
* Relax collection-period fuzzing window
* Disable unstable collection-period test
- too unstable
* formatting
* Disable unstable device_counting_service_test.async_counters
* Suppress perfetto internal data race errors
* Switch code-coverage CI jobs to mi300 runner
* Timeout increases
* rocprofv3-test-rocpd updates
- add fixtures
- switch executable
- redefine input/output paths
* Revert code-coverage job to mi300a runner
* Update rocprofv3-test-rocpd-execute-multiproc
- reduce problem size
* disable multiproc rocpd
* Split code-coverage into separate workflow
- network issues cause this job to fail frequently
- when in a separate workflow, it can be restarted easily
* Fixtures for rocprofv3-test-trace-hip-in-libraries
* Disable unstable device_counting_service_test.sync_counters
* Potential fix for code scanning alert no. 171: Workflow does not contain permissions
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* Switch code-coverage to run on rocprof-azure
- mi300a EMU runner set is unstable (network issues)
* tests/rocprofv3/pc-sampling SKIP_REGULAR_EXPRESSION
* Update rocprofv3-test-list-avail-trace-execute
- reduce log level and increase timeout
* rocprofv3: Prevent recursive call to rocprofv3_error_signal_handler + log chaining
* rocprofv3: Use ROCP_ERROR + std::exit instead of ROCP_FATAL
- should help with SKIP_REGULAR_EXPRESSION
---------
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
[ROCm/rocprofiler-sdk commit: 640ca55ac0]
* Modified perfetto output for HIP stream display
* Moved stream_map file location and changed perfetto output names Private_Segment_Size and Group_Segment_Size to Scratch_Size and LDS_Block_Size respectively
* Used const_cast to remove const modifier on void*
* Reverted stream_map changes, now using tool_metadata map to track mapping between stream ptrs and stream IDs
* Removed buffer tracing args in perfetto, added tool_...hip buffer record struct that stores the HIP stream ID for display purposes
* Updated rocpd perfetto.cpp to reflect stream changes. Still need to add vgpr values and stream ID for HIP API
* Changes pass-by const reference to pass-by const value
[ROCm/rocprofiler-sdk commit: 1f8b8c5e9f]
* Updated kernel rename service to use internal correlation IDs for external correlation IDs and kernel rename values
* Updated for review comments
* Changed if-condition in generateJSON.cpp to check if string view is empty before calling get_entry
[ROCm/rocprofiler-sdk commit: afb27f3f1a]
* [rocprofv3] Add rocpd output support (part 1: prelude)
- git submodules for sqlite3, GOTCHA, and pybind11
- HIP stream data
- rocprofiler_query_intercept_table_name(...)
- serialization load
- rocprofiler::sdk::get_perfetto_category(KindT)
- rocprofiler::sdk::parse::strip
- common library updates
- md5sum
- hasher
- simple_timer
- static_tl_object
- get_process_start_time_ns(pid_t)
- output library updates
- node_info
- file_generator (generator is now virtual base class)
- stream info updates
* Added submodules
* Code review updates
* Minor unused-but-set-X warning fixes
* Update CI
- install libsqlite3-dev package
* Update CI
- install libsqlite3-dev package
* Fix static thread-local object memory leak
- also fix signal handler chaining
* Remove URL from comment
* Remove page migration exception
* Enable ROCPROFILER_BUILD_SQLITE3 by default
- try find_package(SQLite3) first and then build when ROCPROFILER_BUILD_SQLITE3=ON
* Fix gotcha installation
- make install of target optional
* Validate tracing + counter collection dispatch data
- i.e. correlation ids, thread ids, timestamps
* Make find_package(SQLite3) optional
- ROCm CI does not have SQLite3 dev package installed and cannot build from source (missing tclsh)
* Fixes to tracing + counter collection test
* get_process_start_time_ns update
- original implementation did not work
* Fix pytest-packages test_perfetto_data for counter collection
- erroneous failure when used with same PMC + multiple agents
* cmake policy: option() honors normal variables
- for GOTCHA submodule
* Improve samples/api_buffered_tracing stability
- reduce likelihood of sporadic exception throw
* Update gotcha submodule
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 7166b1ab58]
* Add trace decoder to API.
* Cleanup and activity
* Rename
* Minor fix
* Replace tt/TT with thread_trace/THREAD_TRACE
- public API types are not abbreviated
* Fix aliases
* Build system updates
- activate clang-tidy for all subfolders in lib
- fix addition of sources for att-tool
* Fix clang-tidy issues with lib/att-tool/counters.{hpp,cpp}
* Delete counters.cpp
* Formatting
---------
Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 65786f619d]
* Adding Benchmarking Stg1
* config fix
* reset
* add jpeg and decode traces in iteration
* address comments benchmark config files.
* address comments.
* address comments.
* address comments: revert cntrl ctx.
* address comments: revert csv output.
* resolve merge conflicts.
* format.
* build fix.
* fix hip runtime api traces.
* loop cb services.
* format.
* bug fix.
* Fix operator>
- public C++ comparison operator
* Update configuration options
- support selected regions (--selected-regions)
- support writing output config json (--output-config)
- update serialization data
* rocprofv3 tool library misc updates
- lambda for starting context
- support for writing config json
* Tool library updates
- Finished support for all benchmarking modes
- Added build spec support to config json
* Fix ROCPROFILER_SOVERSION
- this value should not be multiplied by 10,000
* Minor tweak to rocprofv3
* Benchmarking scripts
* formatting
* Fix duplicate include
* Add reproducible-dispatch-count test app
- used in benchmarking
* registration logging
- report number of registered contexts and active contexts after client initialization
* Serialize environment in rocprofv3 output config
* ROCPROFILER_BUILD_BENCHMARK CMake option
* Update benchmark SQL schema
- hash_id is text
- add md5sum to benchmarked_app
- remove app_id from benchmarked_sdk
- add sdk_id to benchmark_config
- separate hip_trace into hip_runtime_trace and hip_compiler_trace
- use INT instead of INTEGER for MySQL compatibility
- add count column in benchmark_statistics
- allow std_dev to be NULL in benchmark_statistics
* Update rocprofv3-benchmark.py
- use md5 instead of python hash (which includes random seed)
- use args.mysql_database
- compute md5sum of executable
- fix insert_benchmark_config
- marker trace fixes
- memory allocation fixes
- split hip_trace into hip_{runtime,compiler}_trace
- remove app_id from benchmarked_sdk
- support warmup runs
- count field in benchmark_statistics
* Support launcher and environment in YAML
* Update reproducible-dispatch-count.cpp
- support mode which doesn't use hip event timing
* Misc rocprofv3-benchmark.py updates
- fix some MySQL support
- remove some unnecessary logging
* support mysql db.
* Format.
* Updated SQL input files
- moved benchmark_schema.sql to benchmark_table.sql
- added benchmark_views.sql
- uses {{metric}} syntax for variable substitution
* cmake formatting
* update rocprofv3-benchmark.py
- benchmark config labels
- overhead views
* Encode rocprofv3-benchmark PID in rocprofv3 and timem output files
* Minor tweak to benchmark_views.sql
- include count
- reorder fields for readability
* split statements and use IS if value is NONE.
* use backticks instead of double quotes and add IS before NOT NULL.
* Adding Mandelbrot Benchmark App
* Adding Dockerfile example
* Update dockerfile
* Update dockerfile
* [SDK] rocprofiler_query_external_correlation_id_request_kind_name
* Execution-profile benchmark mode
* Execution profile SQL support
* Rename mandlebrot folder + misc clang-tidy
* [rocprofv3-benchmark] Execution profile support
* Update installation
* add work dir when setting git revision, useful when building outside src.
* Set FULL_VERSION_STRING and ROCPROFILER_SDK_GIT_REVISION
- when benchmark folder is top-level
* Remove unused python packages from requirements.txt
* Use ldd/pyelftools to include linked libs for md5sum
- also add --filter-benchmark and --filter-rocprofv3 options
- support labeling the rocprofv3 options
- use more argparse groups
- more generic application of filters
- support variable substitution in environment, e.g. PATH=/some/path:$PATH
* Environment improvements
- improve reproducibility when env set via input file vs. shell
- support "environment-ignore" to remove environment variables
* Misc formatting
* Misc. fix
* use backticks for defining new columns name
* Support shuffling the order of benchmark modes/rocprofv3 args
* Address review comments
* Update Dockerfile
- rename to Dockerfile
- reduce to one layer
* Support docker build arg BRANCH
---------
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 6f17da7ade]
* Fix stream duplication and fixed tests
* Added comments to explain stream.cpp code, changed stream nullptr check to occur in update table to prevent re-adding null stream, simplified hip-streams bin file code, added destroyStreams to hip-streams bin file code
* Removed roctx from CMakeLists.txt
* Updated documentation
* Fix documentation
* Removed update_table for HIP compiler table and updated stream.cpp to remove support for HIP compiler table
* Added runtime initialization check for HIP
* Changed tool name, working on fixing memory management
* Added context for counter collection kernel rename combination
* Changed name from map to set and changed description
* Fix documentation description for group-by-queue
* Merged memory copy and kernel operations onto a single track when on the same stream
* Updated perfetto output to remove hardware information from the track name so that all memory copy and kernel operations on the same stream are merged onto the same track
* Most PR comments addressed
* Added filter for counter collection and removed kernel buffer tracing hack
* Added PR comment fixes
---------
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
[ROCm/rocprofiler-sdk commit: e626df43eb]
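Merging memory copies and kernels onto one track per stream amounts to deriving the Perfetto track key from the stream alone. The sketch below assumes a hypothetical `trace_op` record and a `HIP Stream <id>` naming scheme; neither is taken from the rocprofv3 output code.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical trace record; field names are illustrative only.
struct trace_op
{
    std::string kind;       // e.g. "kernel" or "memory_copy"
    uint64_t    agent_id;   // hardware the operation ran on (not part of the track key)
    uint64_t    stream_id;  // stream the operation was enqueued on
};

// The track name is derived from the stream only, so kernels and copies on the
// same stream land on the same track even if they involve different agents.
std::string track_name(const trace_op& op)
{
    return "HIP Stream " + std::to_string(op.stream_id);
}

// Groups operations by track name, preserving their input order within each track.
std::map<std::string, std::vector<trace_op>> group_into_tracks(const std::vector<trace_op>& ops)
{
    std::map<std::string, std::vector<trace_op>> tracks;
    for(const auto& op : ops)
        tracks[track_name(op)].push_back(op);
    return tracks;
}
```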
* Fix code coverage workflow
* Relocate rocprofv3 conversion test script + rename tests
- these are rocprofv3 tests and were neither properly located nor properly named
* Fix thread sanitizer
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 4f03ebc360]
* rocprofv3: LD_PRELOAD for signal and sigaction
- wrappers around `signal` and `sigaction` to prevent applications that install signal handlers from replacing the rocprofv3 signal handlers
- minor tweaks to buffer sizes (use page_size instead of KiB)
* [DO NOT COMMIT] extra logging
* Switch git submodule url for perfetto
- use GitHub URL as this is more accessible
* Update ring_buffer<Tp>
- account for alignment padding
* Update buffered_output
- track number of bytes stored
- add nullptr checks
* Update tmp_file_buffer
- track number of bytes
- read_tmp_file does not create tmp file if it does not already exist
* Update tmp_file
- add exists member function for checking whether temporary file already exists
- tweak remove() implementation
* Update config.hpp
- add option to enable/disable signal handlers
- add option for minimum_output_bytes
* Make signal, sigaction functions visible
* rocprofv3 tool updates
- chained signals
- override the signal handler(s) installed by the application
- improve cleanup of temporary files
- support minimum output bytes
* Add commandline support
* fixing test
* minor fix
* minor fix
* fix clang issue
* fix
* Adding docs
* review comments
* review changes
* review
* YUV pulldown additions to rocdecode
* More rocdecode changes
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <Jonathan.Madsen@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: 87badfbd15]
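The chained-signal behavior described above can be sketched without the LD_PRELOAD machinery. The tool installs its own handler. A hypothetical `record_app_sigaction` (standing in for the body of an interposed `sigaction`) stores the application's handler instead of installing it, and the tool handler calls that stored handler after doing its own work. All names here are illustrative assumptions; the actual rocprofv3 wrappers are not shown in this log.

```cpp
#include <signal.h>

namespace
{
// The handler the application asked for; an interposed sigaction() would record it here
// instead of letting it replace the tool's handler.
struct sigaction g_app_action = {};
volatile sig_atomic_t g_tool_ran = 0;

// Hypothetical stand-in for the body of an interposed sigaction().
int record_app_sigaction(const struct sigaction* act, struct sigaction* oldact)
{
    if(oldact != nullptr) *oldact = g_app_action;
    if(act != nullptr) g_app_action = *act;
    return 0;
}

// Tool handler: do the tool's work first, then chain to the application's handler.
void tool_handler(int signo, siginfo_t* info, void* ucontext)
{
    g_tool_ran = 1;  // e.g. flush buffered output (must stay async-signal-safe)
    if((g_app_action.sa_flags & SA_SIGINFO) != 0)
    {
        if(g_app_action.sa_sigaction != nullptr) g_app_action.sa_sigaction(signo, info, ucontext);
    }
    else if(g_app_action.sa_handler != SIG_DFL && g_app_action.sa_handler != SIG_IGN)
    {
        g_app_action.sa_handler(signo);
    }
    // SIG_DFL emulation (restore the default action and re-raise) is omitted here.
}

int install_tool_handler(int signo)
{
    struct sigaction sa = {};
    sa.sa_sigaction     = tool_handler;
    sa.sa_flags         = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, nullptr);
}
}  // namespace
```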
* Added buffer tracing support for rocdecode and updated tests to work with buffer tracing
* Updated perfetto to output args individually rather than as a string list
* Updated docstrings and operation type, changed OTF2 code to remove warning due to change in operation type
* Updated tests for review comments
* Test args exist and return value
* Updated to use string entry
* Change function name
* Updated PR to reflect review comments
* Updated for PR review comments
* Change function name
[ROCm/rocprofiler-sdk commit: 077723337a]
* Make sure all structs/enums can be forward declared
* Updates to counter collection
- consistency updates and cleanup
* Conversion of dimension information to info struct
* Added deprecated folder
* Testing changes
* merge changes
* Fix shadowed variable
* Source code formatting
* Fix shadowed variable
* Update rocprofiler_counter_info_v1_t member names
* Split version.h into version.h and ext_version.h
- ext_version.h contains external version info, e.g. ROCPROFILER_HSA_API_TABLE_MAJOR_VERSION, ROCPROFILER_HSA_RUNTIME_VERSION
- this reduces amount of recompilation after a commit since version.h gets updated with the git revision
* profile_config -> counter_config
* EOF new line
* [Samples] Reduce header includes + reorg counter collection samples
* Misc compilation fixes
- shadowed variables
- use of [[deprecated("...")]] in C code
- unused variables
* Minor misc modifications
- use common:: instead of rocprofiler::common:: when inside rocprofiler namespace
- counters.cpp
- move local anon namespace functions into rocprofiler::counters:: anon namespace
- use std::string_view for get_static_string
- const ref for get_static_ptr
- misc namespace shortening
* [Public API] rocprofiler_get_version_triplet + rocprofiler_version_triplet_t
- struct rocprofiler_version_triplet_t containing fields for the major, minor, and patch version
- public API function: rocprofiler_get_version_triplet
- define C++ operators for rocprofiler_version_triplet_t
- C++ function compute_version_triplet
* [Tests] Improve async-copy-testing test
- relax constraints
- improve logging
* Update counter_config.h doxygen docs
* ROCPROFILER_SDK_BETA_COMPAT
- ppdef which helps with renaming when set to 1
* Remove spurious include
* Fix includes for cxx/version.hpp
* Doxygen fixes for rocprofiler_get_version and rocprofiler_get_version_triplet
* Public API Experimental Designation
- ROCPROFILER_SDK_EXPERIMENTAL added to experimental function
- "(experimental)" added to doxygen @brief entries
* Fix use of assert instead of static_assert in hip/stream.cpp
* Use typedef instead of define for rocprofiler_profile_config_id_t
* Use inline rocprofiler_{create,destroy}_profile_config instead of ppdef
- added <rocprofiler-sdk/deprecated/profile_config.h>
* Doxygen for rocprofiler_{create,destroy}_profile_config
* ROCPROFILER_SDK_DEPRECATED_WARNINGS
* Temporarily comment out ROCPROFILER_SDK_DEPRECATED_WARNINGS=1
* cmake formatting
* Misc variable renaming in samples and tests
* Fix declarations of types
* Fix hip stream tracing service struct name
- rocprofiler_callback_tracing_stream_handle_data_t renamed to rocprofiler_callback_tracing_hip_stream_api_data_t
* Rename "HIP_STREAM_API" to "HIP_STREAM"
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: 4cd121e27b]
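A version triplet with C++ comparison operators could look like the sketch below. It assumes the packed integer encoding `10000 * major + 100 * minor + patch`. The `version_triplet` struct and `make_version_triplet` function are hypothetical stand-ins for `rocprofiler_version_triplet_t` and `compute_version_triplet`, whose exact definitions are not given here.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical mirror of a major/minor/patch version struct.
struct version_triplet
{
    uint32_t major;
    uint32_t minor;
    uint32_t patch;
};

// Assumes the packed encoding 10000 * major + 100 * minor + patch.
inline version_triplet make_version_triplet(uint32_t value)
{
    return version_triplet{value / 10000, (value / 100) % 100, value % 100};
}

// Lexicographic comparison: major first, then minor, then patch.
inline auto as_tuple(const version_triplet& v) { return std::tie(v.major, v.minor, v.patch); }

inline bool operator==(const version_triplet& a, const version_triplet& b) { return as_tuple(a) == as_tuple(b); }
inline bool operator!=(const version_triplet& a, const version_triplet& b) { return !(a == b); }
inline bool operator<(const version_triplet& a, const version_triplet& b) { return as_tuple(a) < as_tuple(b); }
inline bool operator>(const version_triplet& a, const version_triplet& b) { return b < a; }
inline bool operator<=(const version_triplet& a, const version_triplet& b) { return !(b < a); }
inline bool operator>=(const version_triplet& a, const version_triplet& b) { return !(a < b); }
```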
* [SDK][rocprofv3] HIP API buffer records with args (ext)
- New buffer tracing domain(s) for HIP APIs which include the arguments and the return value in the buffer records
- Update HIP stream support for extended HIP buffer tracing
- Update rocprofv3 tool library and output library to use extended HIP buffer tracing records
* Update stream.cpp
- handle hipStream_t address being reused for a new stream
* Update doxygen docs for rocprofiler_iterate_buffer_tracing_record_args
* Update rocprofv3 tool.cpp
- configure buffer tracing services with HIP_*_API_EXT variants
- tweak logging level for hip_stream_display_callback
* Fix validation tests
- add HIP_RUNTIME_API_EXT and HIP_COMPILER_API_EXT to valid domain names
* Serialization support for buffer tracing args
* Disable stream service for __hipPopCallConfiguration
- this is interpreted as a stream create but it doesn't create a stream
* Fix execute_buffer_record_emplace for HIP extended contexts
* Add uint64_t_retval to rocprofiler_hip_api_retval_t union
- reading in hipError_t_retval during serialization of pointer return value causes undefined behavior
* Fix compilation warning about unused but set parameter
- in hip/stream.cpp
* Add synchronization for async_copy_data
* Fix compilation error
* Fix compilation error
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: e33dff7ad0]
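Because a destroyed `hipStream_t` address can be handed out again for a new stream, keying stream IDs on the address alone would merge two distinct streams. Below is a minimal sketch of the idea, assuming a hypothetical `stream_registry` that assigns a fresh, never-reused ID on every create:

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical registry mapping a stream's address to an opaque, never-reused ID.
class stream_registry
{
public:
    // Called on stream creation; a reused address gets a new ID, replacing any stale mapping.
    uint64_t on_create(const void* addr)
    {
        const uint64_t id = m_next_id++;
        m_ids[addr]       = id;
        return id;
    }

    void on_destroy(const void* addr) { m_ids.erase(addr); }

    // Returns 0 for unknown addresses (e.g. the null/default stream).
    uint64_t lookup(const void* addr) const
    {
        auto it = m_ids.find(addr);
        return it == m_ids.end() ? 0 : it->second;
    }

private:
    uint64_t                                  m_next_id = 1;
    std::unordered_map<const void*, uint64_t> m_ids;
};
```

The sketch is not thread-safe. A real implementation would need to guard the map, since streams can be created and destroyed from multiple threads.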
* MI300 Stochastic PC sampling SDK API implementation
* ROCProfV3: Stochastic PC sampling Support (#94)
* ROCProfV3: MI300 Stochastic PC sampling initial draft
* ROCProfV3: Initial Stochastic PC sampling Tests (#95)
ROCProfV3: Initial Stochastic PC sampling tests
* Update rocprofiler_pc_sampling_record_stochastic_v0_t
- update doxygen docs for members
- replace rocprofiler_correlation_id_t with rocprofiler_async_correlation_id_t
* Relax the check in JSON tests
* drain PC sampling buffer during finalize_rocprofv3
* Increase timeout for "Test Install Build" step
- 10 minutes -> 20 minutes
- "Test Installed Packages" has 20 minutes so "Test Install Build" should also
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 49ce79a5b5]
* Add Stack IDs
* Add memcpy test
* Add async corr id record
* Async events use `rocprofiler_async_correlation_id_t`
* Sync events use `rocprofiler_correlation_id_t`
* Update ATT to use async IDs
* Review comments
[ROCm/rocprofiler-sdk commit: f27f76716e]
* rocprofiler_stream_id_t: opaque handle for a stream
- e.g. HIP stream
- the same HIP stream may map to different HSA queues at different points in the application
- added to:
- rocprofiler_buffer_tracing_hip_api_record_t
- rocprofiler_buffer_tracing_memory_copy_record_t
- rocprofiler_callback_tracing_hip_api_data_t
- rocprofiler_callback_tracing_memory_copy_data_t
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Mark Meserve <mark.meserve@amd.com>
Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Jakaraddi, Manjunath <Manjunath.Jakaraddi@amd.com>
Co-authored-by: Bhardwaj, Gopesh <Gopesh.Bhardwaj@amd.com>
Co-authored-by: Nagaraj, Sriraksha <Sriraksha.Nagaraj@amd.com>
Co-authored-by: U, Srihari <Srihari.U@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>
Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Indic, Vladimir <Vladimir.Indic@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: ccd1e54293]
Adds iteration-based multiplexing to counter collection. Counter groups can now be specified. Each counter group is collected on a device by itself until a specified interval is reached; at that point, the next counter group is collected on subsequent kernel executions.
Supplies two new argument types that can be included in YAML/JSON inputs:
pmc_groups: an array of arrays containing the counter groups to run (e.g. [["SQ_WAVES", "GRBM_COUNT"], ["GRBM_GUI_ACTIVE"]])
pmc_group_interval: the number of kernel invocations on a GPU for which a group is collected before rotating to the next group
Note: a random_seed_generator was originally proposed in the linked ticket but was not implemented, since there are very few cases where you would want the groups to be selected randomly (and if you do, you can randomly generate the pattern and supply it as a large list of groups in pmc_groups).
All existing counter functionality should be preserved (selection of counters on specific devices only, profiling of only specific kernels, etc).
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit: aa88dd44c7]
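The rotation rule for `pmc_groups` and `pmc_group_interval` can be sketched as a per-device counter: each device collects its current group for `pmc_group_interval` kernel dispatches, then advances to the next group, wrapping around at the end of the list. `group_rotator` is a hypothetical class written for illustration, not the rocprofv3 implementation.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using counter_group = std::vector<std::string>;

// Hypothetical per-device rotation over pmc_groups with a pmc_group_interval.
class group_rotator
{
public:
    group_rotator(std::vector<counter_group> groups, std::size_t interval)
    : m_groups(std::move(groups))
    , m_interval(interval)
    {
        if(m_groups.empty() || m_interval == 0)
            throw std::invalid_argument("need at least one group and an interval >= 1");
    }

    // Returns the group to collect for the next kernel dispatch on `device`, then
    // advances that device to the next group once `interval` dispatches have used it.
    const counter_group& on_dispatch(int device)
    {
        auto&                st      = m_state[device];
        const counter_group& current = m_groups[st.group];
        if(++st.dispatches == m_interval)
        {
            st.dispatches = 0;
            st.group      = (st.group + 1) % m_groups.size();
        }
        return current;
    }

private:
    struct device_state
    {
        std::size_t group      = 0;
        std::size_t dispatches = 0;
    };

    std::vector<counter_group>  m_groups;
    std::size_t                 m_interval;
    std::map<int, device_state> m_state;  // each device rotates independently
};
```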
* rocDecode API Tracing support
* Test bin file added to rocdecode. Need to add validate python methods
* Added option to not make rocDecode tests
* Added rocdecode and rocprofv3 tests
* Added csv test
* Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependency. Changed cmake option to disable tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI
* Add option to avoid building rocdecode tests
* Added option to avoid building rocdecode bin file
* Support for rocJPEG API Trace
* Added newline to rocjpeg_version.h
* json-tool code added, initial test/bin commit
* Formatting
* Resolved rocjpeg bin test compilation errors
* Tests implemented. Perfetto module currently resulting in errors, so need to retest whenever it is fixed
* Formatting and compilation errors
* Minor fixes
* Copyright year update and minor fixes
* Doc update fix
* Added rocjpeg csv file in data
* Addresses review comments: Updated fixed Findroc.. and uses root directory as a hint, fixed documentation error, changed tables to use _CORE, minor style fixes
* Added rocdecode and rocjpeg to CI
* Removed rocdecode and rocjpeg from CI and added back build tests option
* Updated Cmake Files
* Added rocDecode and rocJPEG to CI
* Remove cmake line added in error
* Temporarily modified tests to pass if rocdecode or rocjpeg tracing are not supported for CI, cmake changes
* Added find_package for test
* Added back use of system rocDecode and rocJPEG, modified system files to include prefix path
* Updated no-link to include INCLUDE_DIR/roc(decode|jpeg), added comments for tests
* Resolve merge conflicts and formatting
* Added regex find and replace instead of include for CI
* VAAPI package causing errors on Vega20
* Removed system rocjpeg and rocdecode use temporarily until cmake issues resolved
* Removed workflows regex
* Formatting and minor test modification
* Modified test for vega20
* Update rocDecode and rocJPEG cmake and tests
* Changelog
* Fix merge conflict
* Added back if-statements around add-tests since cmake-generator-expressions are resulting in errors when the packages are missing
* Removed if-found statements, replaced with TARGET_EXISTS checks
* Skip json file for rocjpeg and rocdecode tests if not supported
* Add os import
---------
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 31fe8858d1]
* rocprofv3: suppress agent info when no data collected
* Update output config serialization
- full serialization of output configuration
* Update rocprofiler-sdk-att/tests
- add version and soversion
- change output directory
- generate libatt_decoder_summary
- disable tests instead of removing them
* Update rocprofv3 command-line
- make --att-library-path hidden by default
- simplify check_att_capability
- reorder pc sampling options
- add hidden --echo option
- remove ROCPROF_LIST_AVAIL_TOOL_LIBRARY from preload
* Add new rocprofv3 tests for specifying the ATT library path
* Tweak to rocprofv3-test-hsa-multiqueue-att tests
* Update rocprofv3 tool to enable output with att
* Fix standalone test installation
* Revert fetchcontent_makeavailable to fetchcontent_populate
* Revert tests/common/CMakeLists.txt
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit: 59b41ab5aa]
* rocDecode API Tracing support
* Test bin file added to rocdecode. Need to add validate python methods
* Added option to not make rocDecode tests
* Added rocdecode and rocprofv3 tests
* Added csv test
* Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependency. Changed cmake option to disable tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI
* Add option to avoid building rocdecode tests
* Added option to avoid building rocdecode bin file
* Merge conflict error
* CMake files changed in response to review comments. Attempting to implement callbacks.
* Turned off test building for rocdecode
* Minor fixes for review comments
* Review comments
* Updated formatting
* Document changes and format.hpp reversion. Need to remove iterate args support for now for later update.
* Remove iterate args support
* Remove iterate-args
* enforce abi versioning in macro if
* Fix doc error
* removed spaces to fix indentation error
---------
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
[ROCm/rocprofiler-sdk commit: e307b89ca4]
If a user requests PC sampling on a system that does not support this feature,
report a fatal error message and stop executing the program.
[ROCm/rocprofiler-sdk commit: 0ce75c1043]