* Enable queue interception with scratch reporting
Scratch reporting includes the agent ID in buffer and callback records, but
the HSA runtime provides only the queue ID in the scratch callback.
This change enables queue interception when scratch reporting is requested
so that the queue ID can be mapped back to its agent.
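
A minimal sketch of why interception helps, assuming (this is not taken from the SDK sources) that an intercepted queue-create call can record which agent owns each queue. The names `QueueRegistry`, `ScratchRecord`, and their methods are hypothetical and only model the bookkeeping:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScratchRecord:
    """Hypothetical record shape: the callback gives queue_id, we add agent_id."""
    queue_id: int
    agent_id: int
    size: int


class QueueRegistry:
    """Hypothetical bookkeeping filled in by intercepted queue create/destroy."""

    def __init__(self):
        self._agent_by_queue = {}

    def on_queue_create(self, queue_id, agent_id):
        # Interception is the only point where both IDs are visible together.
        self._agent_by_queue[queue_id] = agent_id

    def on_queue_destroy(self, queue_id):
        self._agent_by_queue.pop(queue_id, None)

    def make_scratch_record(self, queue_id, size):
        # The scratch callback only knows the queue ID; resolve the agent here.
        if queue_id not in self._agent_by_queue:
            raise KeyError(f"queue {queue_id} was never intercepted")
        return ScratchRecord(queue_id, self._agent_by_queue[queue_id], size)
```

Without the interception step the lookup table would be empty, which is why scratch reporting has to turn interception on.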
* Validation test for rocprofv3 + scratch-memory-trace
* Simplify checks for whether context is tracing a domain
* Update changelog
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* [Draft]: Add support for RCCL tracing
Address comments
* [Draft]: Add support for RCCL tracing
Address PR comments, changes from RCCL upstream
* Add RCCL library table registration
Working on adding support to rocprofiler-register
* Support compilation w/o <rccl/amd_detail/api_trace.h>
- dummy api_trace.h header
- return ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED when RCCL does not have api_trace.h header
* RCCL API tracing tool support
- add to rocprofv3
- add to json-tool
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* LD_PRELOAD librocprofiler-sdk-roctx.so when marker-trace enabled
- this enables apps to link against old ROCTx (libroctx64.so) but get marker tracing in rocprofv3
* Update CHANGELOG
* Validation test for app linked to old (roctracer) ROCTx library
* Tweak scope of tool_counter_info
- causing "signal-unsafe call inside of a signal" error for ThreadSanitizer on mi200
* Fix handling of missing transpose-roctracer-roctx
* Disable rocprofv3 aborted-app test (ThreadSanitizer)
- ThreadSanitizer + mi200/mi300 + aborted-app results in a signal-unsafe call inside a signal that cannot be specifically suppressed as usual via rocprofv3_error_signal_handler for some unknown reason
* Add UndefinedBehaviorSanitizer job
* Fix -d option broken by hostname
* Fix rocprofv3 output filename containing directory
* Fix TID handling in Perfetto and OTF2 output
* Revert changes which removed hostname
* Revise tests/rocprofv3/tracing output filenames
- specify an output filename for tests which include a subdirectory
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
* Move include/rocprofiler-sdk/cxx/details/delimit.hpp to tokenize.hpp
* Update docs/how-to/using-rocprofv3.rst
- fix code block indents
- reorder rocprofv3 options, limit them to important options
- add docs for `--runtime-trace`
* Update rocprofv3.py
- parser argument groups
- new `--runtime-trace` option
- new `--summary` option
- new `--summary-per-domain` option
- new `--summary-groups` option
- new `--summary-output-file` option
- new `--summary-units` option
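
For illustration, the option layout above could look roughly like the following `argparse` sketch. The option names come from this log; the group titles, argument types, defaults, and the unit choices are assumptions and may differ from the real rocprofv3.py:

```python
import argparse


def build_parser():
    """Sketch of grouped options; types and choices are assumed, not copied."""
    parser = argparse.ArgumentParser(prog="rocprofv3-sketch")

    tracing = parser.add_argument_group("Tracing options")
    tracing.add_argument("--runtime-trace", action="store_true",
                         help="Enable a bundle of runtime tracing domains")

    summary = parser.add_argument_group("Summary options")
    summary.add_argument("--summary", action="store_true",
                         help="Emit a summary of the collected traces")
    summary.add_argument("--summary-per-domain", action="store_true",
                         help="Emit one summary per tracing domain")
    summary.add_argument("--summary-groups", nargs="+", default=[],
                         metavar="REGEX",
                         help="Regexes that group domains into one summary")
    summary.add_argument("--summary-output-file", default=None,
                         help="Where to write the summary")
    summary.add_argument("--summary-units", default="nsec",
                         choices=("sec", "msec", "usec", "nsec"),
                         help="Time units used in the summary (assumed set)")
    return parser
```

Grouping mainly changes how `--help` is rendered; the parsed attribute names (`args.summary_groups`, and so on) are unaffected.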
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- fix async copy operation names: add "MEMORY_COPY_" prefix
* lib/rocprofiler-sdk-tool: update statistics.{hpp,cpp}
- statistics<>::get_percent function
- stats_entry_t struct
- stats_formatter struct
- percentage struct
- std::to_string(::rocprofiler::tool::percentage)
* lib/rocprofiler-sdk-tool: update domain_type.{hpp,cpp}
- reorder domain_type enum values
* lib/rocprofiler-sdk-tool: update generateCSV.{hpp,cpp}
- separate writing CSV from accumulating statistics
- a lot of functionality was moved to statistics.{hpp,cpp}
* lib/rocprofiler-sdk-tool: update output_file.{hpp,cpp}
- output_stream_t struct
- get_output_stream(...) returns output_stream_t instance
* lib/rocprofiler-sdk-tool: update generateJSON.cpp
- update get_output_stream usage to output_stream_t
* lib/rocprofiler-sdk-tool: update generateOTF2.cpp
- header include order tweak
* lib/rocprofiler-sdk-tool: update buffered_output.hpp
- stats_data_t was renamed to stats_entry_t
* lib/rocprofiler-sdk-tool: update generatePerfetto.cpp
- header include tweak
* lib/rocprofiler-sdk-tool: update tmp_file_buffer.hpp
- emit warning message if write_ring_buffer fails after offloading instead of aborting
- prefer placement new instead of assignment in write_ring_buffer
* lib/rocprofiler-sdk-tool: add generateStats.{hpp,cpp}
- functions for accumulating statistics
* Update tests/rocprofv3/tracing-hip-in-libraries/CMakeLists.txt
- accommodate tweak to CSV output file name for HIP and HSA traces
* lib/rocprofiler-sdk-tool: update config.{hpp,cpp}
- new config variables
- stats_summary
- stats_summary_per_domain
- summary_output
- stats_summary_unit_value
- stats_summary_unit
- stats_summary_file
- stats_summary_groups
- support output keys for hostname: %hostname% / %h
* lib/rocprofiler-sdk-tool: update tool.cpp
- support summary output
* Documentation fixes
* Test for summary output
* Update tests/bin/transpose to use more ROCTx
- also support building with the roctracer ROCTx
* Remove roctxMark from OTF2 + fix kernel-rename tests
- following more ROCTx calls in transpose, kernel-rename validation had to be updated
* JSON metadata + JSON summary
- add serialization support for config
- add serialization support for statistics
- additions to json spec
- rocprofiler-sdk-tool/metadata/config
- rocprofiler-sdk-tool/metadata/command
- rocprofiler-sdk-tool/summary
- config output_keys support for NVIDIA %q{<ENV-VAR>} syntax
- config output_keys support keys within keys
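
A rough sketch of how the output keys mentioned above (`%hostname%`, `%h`, and `%q{<ENV-VAR>}`) might be expanded. The function name, the empty-string fallback for unset variables, and the reading of "keys within keys" as repeated expansion passes are all assumptions, not the tool's actual implementation:

```python
import re

# %hostname% is listed before %h so the longer key wins at the same position.
_KEY = re.compile(r"%hostname%|%h|%q\{(\w+)\}")


def expand_output_keys(template, hostname, env, max_passes=4):
    """Expand output keys; re-run so keys produced by a key are expanded too."""

    def substitute(match):
        if match.group(1) is not None:          # %q{VAR} form
            return env.get(match.group(1), "")  # assumed: unset -> empty
        return hostname                          # %hostname% or %h

    result = template
    for _ in range(max_passes):
        expanded = _KEY.sub(substitute, result)
        if expanded == result:
            break
        result = expanded
    return result
```

For example, with `env = {"DIR": "%h-run"}` the template `%q{DIR}` first becomes `%h-run` and then `node1-run` for a host named `node1`.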
* rocprofv3 --summary-groups warning if no domain matches
- emit warning if a regex for summary groups did not match any domain names
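
The warning behaviour can be sketched as below. The domain labels, the use of `re.search` rather than a full match, and the function name are illustrative assumptions:

```python
import re
import warnings


def group_domains(domains, patterns):
    """Map each summary-group regex to the domain names it matches."""
    groups = {}
    for pattern in patterns:
        regex = re.compile(pattern)
        matched = [name for name in domains if regex.search(name)]
        if not matched:
            # Warn instead of failing so a typo in one group is not fatal.
            warnings.warn(
                f"summary group '{pattern}' did not match any domain names")
        groups[pattern] = matched
    return groups
```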
* Compile fix for lib/rocprofiler-sdk-tool/tool.cpp
- get_config().scratch_memory_trace
- pass contributions to write_json
* Update rocprofv3.py to preload rocprofiler-sdk-roctx
- appended to LD_PRELOAD when args.marker_trace is enabled
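
As a hedged sketch, appending the library to `LD_PRELOAD` might look like this. The helper name and the choice to return a copied environment are assumptions; only the library name and the marker-trace condition come from this log:

```python
def preload_env(env, library, enabled):
    """Return a copy of env with `library` appended to LD_PRELOAD if enabled."""
    updated = dict(env)
    if not enabled:
        return updated
    # Keep existing entries first so the user's own preloads retain priority.
    entries = [p for p in updated.get("LD_PRELOAD", "").split(":") if p]
    if library not in entries:
        entries.append(library)
    updated["LD_PRELOAD"] = ":".join(entries)
    return updated
```

A caller would pass something like `preload_env(os.environ, "librocprofiler-sdk-roctx.so", args.marker_trace)` and hand the result to the child process.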
* Fix ReST link errors about subtitle underline being too short
* Patch tokenization of config::stats_summary_groups
- guard against array values of empty strings
* Tweak rocprofv3 summary test
- input-summary.yaml (used by rocprofv3-test-summary-inp-yaml-execute) only provides one summary group regex
* Disable LD_PRELOAD of librocprofiler-sdk-roctx.so
- this causes problems in the sanitizers and will be addressed in another PR
* PC sampling: online partial PC sampling decoding
PC sampling service decodes a PC sample partially
by replacing the PC with an id of the loaded code object instance
containing PC and the offset of the PC within that code object instance.
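
The partial decoding described above can be modelled as an address-range lookup. This is only a sketch: the class and field names, the sentinel value `0` for the NULL code object id, and returning the raw PC as the offset when nothing matches are all assumptions:

```python
import bisect
from dataclasses import dataclass

NULL_CODE_OBJECT_ID = 0  # assumed sentinel for "no loaded code object contains PC"


@dataclass(frozen=True)
class LoadedCodeObject:
    code_object_id: int
    load_base: int
    load_size: int


class CodeObjectTable:
    def __init__(self, objects):
        self._objects = sorted(objects, key=lambda o: o.load_base)
        self._bases = [o.load_base for o in self._objects]

    def decode(self, pc):
        """Replace a raw PC with (code_object_id, offset within that object)."""
        # Find the last object whose base address is <= pc.
        index = bisect.bisect_right(self._bases, pc) - 1
        if index >= 0:
            obj = self._objects[index]
            if pc < obj.load_base + obj.load_size:
                return obj.code_object_id, pc - obj.load_base
        return NULL_CODE_OBJECT_ID, pc
```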
* PC sampling: marker records removed
* PC sampling parser: minor doc update in mock
* PC sampling: introducing rocprofiler_pc_t
* NULL value of the code object id introduced.
* Clarifying documentation related to PC offset.
* PC offset documentation improvement
* PC sampling parser benchmark: Reducing the number of samples to recreate half of performance.
- ROCPROFILER_API after function
- use rocprofiler_tracing_operation_t in lieu of uint32_t where appropriate
- rocprofiler_tracing_operation_t is now an int32_t typedef (formerly uint32_t)
- use const T* instead of T* where appropriate
* Tidying ATT dispatch API. ATT Agent to be initialized with rest of profiler. Removing read_index-based wait.
* Formatting
* Adding some input validation
* Add perf test for agent
* Removing async
* look for symbols in dynsym table
* checking both symtab and dynsym
* Avoid symbol duplication in non stripped binaries
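
A small sketch of the deduplication idea, assuming symbols are compared as `(name, address)` pairs (the actual key used in elf_utils.cpp is not stated here). In a non-stripped binary the same symbol can appear in both tables, so reading both needs a merge step:

```python
from itertools import chain


def merge_symbols(symtab, dynsym):
    """Combine both symbol tables, keeping the first copy of each symbol."""
    # dict.fromkeys preserves insertion order and drops repeated keys.
    return list(dict.fromkeys(chain(symtab, dynsym)))
```

For a stripped binary `symtab` is empty and the result is just the `dynsym` entries.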
* clang-format
* Minor elf_utils.cpp updates
- use 'else if' instead of 'if'
- logging tweaks
* Update registration
- tweak logging
* Update testing
- strip the rocprofiler-sdk-c-tool library
- add test-c-tool-rocp-tool-lib-execute test which does NOT LD_PRELOAD the library (uses only ROCP_TOOL_LIBRARIES instead)
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
- missing-new-line CI job: ensures all source files end with new line
- logging updates
- add new line to the end of many files
- fix header include ordering in misc places
- transition to use hsa::get_core_table() and hsa::get_amd_ext_table() in various places instead of making copies
* PC sampling: integration test with instruction decoding
* PC sampling: verifying internal and external CIDs
The PC sampling integration test has been extended
to verify internal and external correlation IDs.
* tmp solution of using Instructions as keys
* wrapper for HIP call
* PCS integration test: ld_addr as instruction id
For the sake of the integration test, use as the
instruction identifier. To support code object unloading
and relocations, use as the identifier
(the change in the decoder is required).
* PCS integration test: removing shared_ptr
Completely removing usage of shared pointers.
* PCS integration test: removing decoder
When a code object has been unloaded, ensure all PC samples
corresponding to that object are decoded, prior to removing
the decoder.
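
The ordering constraint above (decode outstanding samples first, then drop the decoder) can be sketched as follows. `DecoderRegistry` and its methods are hypothetical, and a decoder is modelled as a plain callable from offset to instruction text:

```python
class DecoderRegistry:
    """Hypothetical model: one decoder and one pending-sample list per object."""

    def __init__(self):
        self._decoders = {}
        self._pending = {}
        self.decoded = []

    def on_load(self, code_object_id, decoder):
        self._decoders[code_object_id] = decoder
        self._pending[code_object_id] = []

    def on_sample(self, code_object_id, offset):
        self._pending[code_object_id].append(offset)

    def flush(self, code_object_id):
        decoder = self._decoders[code_object_id]
        for offset in self._pending[code_object_id]:
            self.decoded.append(decoder(offset))
        self._pending[code_object_id].clear()

    def on_unload(self, code_object_id):
        # Decode everything that still needs this decoder before removing it.
        self.flush(code_object_id)
        del self._decoders[code_object_id]
        del self._pending[code_object_id]
```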
* PCS integration test: fixing build flags and imports
* PCS integration test: fixing labels
* PCS integration test: cmake flags fix
* PC sampling cmake labels renamed
* PCS integration test refactoring
* PCS integration test: minimize usage of raw pointers
* PCS integration test: at least one sample should be delivered.
* PC sampling labels: pc-sampling
- source/lib/rocprofiler-sdk/hsa/queue.cpp
- Optimize WriteInterceptor to eliminate extra barrier packets causing gaps between kernels in kernel tracing
- increase timeout_hint in hsa_signal_wait in set_profiler_active_on_queue
- misc logging improvements
- source/lib/rocprofiler-sdk/counters/agent_profiling.cpp
- increase timeout_hint in hsa_signal_wait in set_profiler_active_on_queue
- tests/rocprofv3/hsa-queue-dependency/CMakeLists.txt
- add TIMEOUT for rocprofv3-test-hsa-multiqueue-execute
* Incremental Counter Profile Creation
Adds support for incremental counter creation by changing the behavior of
rocprofiler_create_profile_config:
rocprofiler_create_profile_config(rocprofiler_agent_id_t agent_id,
rocprofiler_counter_id_t* counters_list,
size_t counters_count,
rocprofiler_profile_config_id_t* config_id)
The behavior of this function now allows an existing config_id to be
supplied via config_id. The counters contained in this config will be
copied over and used as a base for a new config along with any counters
supplied in counters_list. The new config id is returned via config_id
and can be used in future dispatch/agent counting sessions.
A new config is created rather than modifying the existing one because
there is no guarantee that the existing config isn't already in use. While
we could add locks (or other mutual exclusion mechanisms) to check whether
it is in use and reject an update, the benefit from doing so is minor in
comparison to just creating a new config. This also accommodates a common
pattern in which a tool adds counters at some later point during
execution. Now it can do that without destroying the existing
config.
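
The copy-then-extend semantics can be modelled in a few lines. This is a Python sketch of the described behavior, not the C API: `ProfileConfigStore`, the `base_config_id` parameter (standing in for the in/out `config_id` argument), the agent-mismatch check, and the counter names in the usage note are all assumptions:

```python
import itertools


class ProfileConfigStore:
    """Model of incremental creation: existing configs are never mutated."""

    def __init__(self):
        self._configs = {}
        self._next_id = itertools.count(1)

    def create_profile_config(self, agent_id, counters, base_config_id=None):
        merged = []
        if base_config_id is not None:
            base_agent, base_counters = self._configs[base_config_id]
            if base_agent != agent_id:  # assumed validation
                raise ValueError("base config belongs to a different agent")
            merged.extend(base_counters)
        for counter in counters:
            if counter not in merged:
                merged.append(counter)
        new_id = next(self._next_id)
        # Always store under a fresh id so in-use configs stay untouched.
        self._configs[new_id] = (agent_id, tuple(merged))
        return new_id

    def counters(self, config_id):
        return self._configs[config_id][1]
```

A tool could create a first config with `["SQ_WAVES"]`, later call `create_profile_config(agent, ["GRBM_COUNT"], base_config_id=first)`, and keep using the first config for sessions that are already running.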
---------
Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* SWDEV-465322: Adding support for Perfcounter SIMD Mask in ATT
* Apply suggestions from code review
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
* Adding unit tests
* Adding counters check for gfx9 and SQ block only
* Addressing review comments
* changing the struct size
* fixing header includes
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
* Dispatch table copy/update uses ROCP_TRACE instead of ROCP_INFO
* Update rocprofiler-sdk CMake config
- rocprofiler::rocprofiler is alias to rocprofiler-sdk::rocprofiler-sdk instead of other way around
* Prefer rocprofiler-sdk::rocprofiler-sdk over rocprofiler::rocprofiler
* Fix WITH_UNWIND for glog
- requires a value of "none" instead of boolean now
* Update include/rocprofiler-sdk/registration.h
- explicit struct names to permit forward decl
* Update include/rocprofiler-sdk/cxx/serialization.hpp
- ROCPROFILER_SDK_CEREAL_NAMESPACE_BEGIN and ROCPROFILER_SDK_CEREAL_NAMESPACE_END to enable customized namespace
* Perfetto submodule
* include/rocprofiler-sdk/cxx/perfetto.hpp
- adapted from tests/common/perfetto.hpp
- updated json-tool to use <rocprofiler-sdk/cxx/perfetto.hpp>
* Update include/rocprofiler-sdk/cxx
- add details/delimit.hpp
- add details/join.hpp
- extend details/mpl.hpp
- extend details/operators.hpp
* Update lib/rocprofiler-sdk/hsa/async_copy.cpp
- update MEMORY_COPY direction names
* Preliminary perfetto support
* Update lib/rocprofiler-sdk-tool/generatePerfetto.cpp
- fix getting roctx msg vs. buffer operation name
* Temporary variable restructuring
* Perfetto patches after rebasing onto main
* Revert lib/rocprofiler-sdk/hsa/async_copy.cpp
- revert name
* Update lib/rocprofiler-sdk-tool/generatePerfetto.cpp
- fix ReadTrace
* Update tests/bin/hip-in-libraries
- sleep_for
* Support PFTRACE output format option in rocprofv3
* Change perfetto logging
* Update rocprofv3 tests to generate pftrace output
* Minor tweak to json-tool.cpp
* Update requirements.txt for perfetto testing
* Fix data race on amount_read in generatePerfetto.cpp
* Add testing for pftrace output
- relatively simple testing which verifies that the pftrace file has the same number of entries as JSON data for HIP/HSA/marker/kernel/memory_copy
* Fix import in perfetto_reader.py
* Fix data race in generatePerfetto.cpp
* Add default values for kernel struct
* Update hsa-queue-dependency app
- default initializers
- check HSA_AMD_MEMORY_POOL_INFO_RUNTIME_ALLOC_ALLOWED for memory pools
- clang-tidy fixes (member -> static, etc.)
* Update run-ci.py
- add --progress --output-on-failure -V if no other options regarding verbosity are passed
- improve the ability to control the stages
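
The default-verbosity rule for run-ci.py can be sketched as below. The three appended flags come from this log; which CTest options count as "regarding verbosity" is an assumption, as is the function name:

```python
# Assumed set of CTest options that signal the caller already chose a verbosity.
VERBOSITY_FLAGS = {
    "-V", "--verbose", "-VV", "--extra-verbose",
    "-Q", "--quiet", "--progress", "--output-on-failure",
}

DEFAULT_VERBOSITY = ["--progress", "--output-on-failure", "-V"]


def with_default_verbosity(ctest_args):
    """Append the default verbosity flags only if none were passed."""
    if any(arg in VERBOSITY_FLAGS for arg in ctest_args):
        return list(ctest_args)
    return list(ctest_args) + DEFAULT_VERBOSITY
```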
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>