Commit graph

63 commits

Author SHA1 Message Date
Mythreya 4fa165ec1a Add support for scratch reporting (#523)
* Add ToolsApiTable

Add ToolsApiTable wrapping for
scratch memory tracking

* Add initial support for scratch memory tracking

Buffering is implemented

* cmake formatting (cmake-format) (#525)

Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>

* source formatting (clang-format v11) (#524)

Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>

* Add callback tracing for scratch

Fixed the error where scratch tracking init was called regardless of whether any client requested it

* Apply suggestions from code review

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

* Fix tools api copy/update

Tables were saved/updated incorrectly in the previous
commit. Also adds passing user data through the callback

* Fix OpKind sequence for scratch tracking

Previously scratch was using OpKind from rocprofiler-sdk, but
templates were instantiated using API ID. These differ by 1

* Integration tests for scratch reporting

Added buffer and callback integration tests for scratch reporting

* source formatting (clang-format v11) (#550)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* cmake formatting (cmake-format) (#551)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* python formatting (black) (#549)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* CI fixes

* source formatting (clang-format v11) (#554)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* Update api

Rebase on main and updates based on PR feedback

* Update scratch reporting and address PR comments

- Added agent id to buffer records
- Updated `test_internal_correlation_ids` - Is almost identical to
  one in async-copy
- Updated scratch test to check for agent id
- Updated queue id serialization in callback records (prints
  handle as nested key)
- Remove `marker_api_traces` from scratch `test_internal_correlation_ids`
  validation test
- Rename `amd_tools_api` to `scratch_memory`
- Added doxygen comments
- Remove scratch callback from `tool.cpp`
- Replace assert with `LOG_IF` in `scratch_memory.cpp`

* Update tools table

Changed to match up with changes to hsa tables in main branch

* Rework scratch memory structure

* Update tests

- Added suggestions from PR review, and updated tests accordingly

* Misc cleanup

* Update scratch test

As of Apr 4th, `hsa_amd_agent_set_async_scratch_limit` is disabled.

Note,
> This API: `hsa_amd_agent_set_async_scratch_limit` is currently
> disabled. We need some changes in CP firmware to be able to do this
> and these changes are not ready yet.
> With the current code, you will also not get notifications for
> alternate-scratch allocations because this feature has been disabled
> while CP firmware is making additional changes
> We are hoping to have that feature enabled by ROCm-6.3

* Minor update to lib/rocprofiler-sdk/internal_threading.*

- delay destruction of shared_ptrs of the tasks to prevent rare (but possible) data race on the destruction of the shared_ptr

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-04-05 20:32:57 -05:00
Jonathan R. Madsen 1addfed9f6 Fix agent node id + randomize offset id (#625)
* Fix agent node id + randomize offset id

- fixes the node_id value
- randomizes a constant offset for the id.handle values
- switch to using node ids in rocprofiler-sdk-tool library
- update tests related to agents

* Logical node id

- sequential node id values from 0 to (N-1) where N is the number of agents
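
A minimal sketch of the two ideas in this commit, assuming the handle is derived as "offset + index". The `agent_info` struct and both functions are hypothetical illustrations, not rocprofiler-sdk types or API; the actual derivation in the library may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in for an agent record; not the rocprofiler-sdk type.
struct agent_info
{
    std::uint64_t handle          = 0;  // opaque id handed to tools
    std::uint32_t logical_node_id = 0;  // sequential: 0 .. N-1
};

// Pick one offset per process so that handles are not simply 0, 1, 2, ...
// The upper bound leaves room for adding the agent index without overflow.
inline std::uint64_t
make_handle_offset()
{
    std::random_device                           rd{};
    std::mt19937_64                              gen{rd()};
    std::uniform_int_distribution<std::uint64_t> dist{1, std::uint64_t{1} << 32};
    return dist(gen);
}

// Assign logical node ids in enumeration order and derive each handle from
// the shared offset (assumption), so handle - offset recovers the node id.
inline std::vector<agent_info>
enumerate_agents(std::size_t num_agents, std::uint64_t offset)
{
    assert(offset > 0 && "offset is expected to be nonzero");
    std::vector<agent_info> agents(num_agents);
    for(std::size_t i = 0; i < num_agents; ++i)
    {
        agents[i].logical_node_id = static_cast<std::uint32_t>(i);
        agents[i].handle          = offset + i;
    }
    return agents;
}
```

A randomized offset makes it harder for a tool to accidentally treat a handle as an array index, while the logical node id stays usable for exactly that purpose.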
2024-03-21 20:04:21 -05:00
Jonathan R. Madsen 8591ed1c96 Use small_vector for API iterate_args (#597)
* Use small_vector for API iterate_args

- replace dim3 value arguments with rocprofiler_dim3_t
  - dim3 has a non-trivial destructor
- common::mpl::unqualified_type
- common::stringified_argument_array_t<N> alias
- assert_public_data_type_properties()
- common::container::small_vector<T>::at function
- stringize returns small_vector<stringified_argument>
  - stack allocated vector
- remove has_pc_sampling condition (HSA, HIP)
  - this will be handled in queue interception
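
The note that dim3 has a non-trivial destructor can be checked at compile time. The sketch below is illustrative only: `plain_dim3`, `dim3_like`, and `is_public_data_type()` are hypothetical names, and the properties checked are an assumption about what a helper such as assert_public_data_type_properties() might verify.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical plain-data replacement: no constructors, no destructor.
struct plain_dim3
{
    std::uint32_t x;
    std::uint32_t y;
    std::uint32_t z;
};

// Hypothetical type with a user-provided (therefore non-trivial) destructor.
struct dim3_like
{
    std::uint32_t x = 1;
    std::uint32_t y = 1;
    std::uint32_t z = 1;
    ~dim3_like() {}
};

// Properties one might require of a struct stored by value in a public,
// C-compatible argument record (assumed set of requirements).
template <typename Tp>
constexpr bool
is_public_data_type()
{
    return std::is_standard_layout<Tp>::value && std::is_trivially_copyable<Tp>::value &&
           std::is_trivially_destructible<Tp>::value;
}

static_assert(is_public_data_type<plain_dim3>(), "plain struct qualifies");
static_assert(!is_public_data_type<dim3_like>(), "user-provided destructor disqualifies");
```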

* Misc tweaks
2024-03-13 07:36:55 -05:00
Jonathan R. Madsen 7b6d3c70bd Shared Library Constructor (rocprofv3 deadlock fix) (#599)
* Moved tests/apps to tests/bin

* Renamed cmake project in tests/bin

* Update samples

- Use ROCPROFILER_DEFAULT_FAIL_REGEX
- tweaks to stdout messages

* Update tests

- Use ROCPROFILER_DEFAULT_FAIL_REGEX

* Add tests/lib

- libraries with HIP code

* Update PTL submodule

- remove atexit delete of thread_id_map

* Update cmake/rocprofiler_options.cmake

- Set ROCPROFILER_DEFAULT_FAIL_REGEX

* Update common lib: env + logging

- improved customization of logging settings
- default to disabling logging to files
- install failure handler for rocprofv3
- set_env support in environment.*

* Add lib/rocprofiler-sdk/shared_library.cpp

- shared library constructor
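
For readers unfamiliar with the term, a shared library constructor is a function the dynamic loader runs when the library is loaded. The sketch below uses the GCC/Clang `constructor` attribute; the function and variable names are hypothetical, and this is not the content of shared_library.cpp.

```cpp
#include <atomic>
#include <cassert>

namespace
{
// Constant-initialized, so it is valid before any dynamic initialization runs.
std::atomic<bool> library_initialized{false};

// GCC/Clang extension: runs when the containing shared library (or
// executable) is loaded, which is before main() for libraries linked at startup.
__attribute__((constructor)) void
on_library_load()
{
    // Expected to run exactly once per load.
    assert(!library_initialized.load());
    library_initialized.store(true);
}
}  // namespace

// Lets other code check whether the load-time hook has already run.
bool
is_library_initialized()
{
    return library_initialized.load();
}
```

Doing setup in a load-time hook means it completes before the application's own code starts making runtime calls.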

* Update lib/rocprofiler-sdk-tool/tool.cpp

- destructor thread safety
- convert callback_name_info and buffered_name_info to pointers
- install failure handler for logging

* Add tests/bin/hip-in-libraries

- hip-in-libraries is an exe which uses two shared libraries where each shared library contains HIP kernels
  - used for testing deadlocking within __hipRegisterFatBinary

* Update bin/rocprofv3

- reorganized the env variables
- use exec to launch command
- set ROCPROFILER_LIBRARY_CTOR=1

* Add tests/rocprofv3/tracing-hip-in-libraries

- uses the hip-in-libraries exe, which relies on shared libraries to launch HIP kernels

* Update bin/rocprofv3

- fix counter collection (no exec)

* Update lib/rocprofiler-sdk-tool/tool.cpp

- replace "Kernel-Name" with "Kernel_Name"

* Update lib/rocprofiler-sdk/registration.cpp

Use RTLD_LOCAL instead of RTLD_GLOBAL for env libraries

* Update tests/rocprofv3

- replace "Kernel-Name" with "Kernel_Name"

* Update tests

- vector-ops (bin) stream syncs + runs with 4 queues per device
- improve counter-collection/input1 validation
- rocprofv3/tracing-hip-in-libraries does not do sys-trace
- improved validation script for tracing-hip-in-libraries
- updated dispatch_callback in json-tool.cpp following reworking of prototypes for counter collection

* Update samples/counter_collection

- updated dispatch_callback(s) and record_callback(s) following reworking of prototypes

* Update bin/rocprofv3

- reorganized help menu
- added options for sub-HSA tables
- added --hip-runtime-trace
- changed --hip-trace to include --hip-compiler-trace

* Update lib/rocprofiler-sdk-tool

- improved kernel filtering
- removed arch_vgpr, accum_vgpr, sgpr code (in rocprofiler-sdk)
- fixed issue with counter-collection w/o tracing
- added support for fine grained HSA API tracing
- removed directly linking to HSA-runtime

* Update lib/rocprofiler-sdk/agent.cpp

- rocp_agents != hsa_agents is non-fatal when ROCPROFILER_BUILD_CI=OFF (CMake option)

* GPR (vector and scalar) info in kernel symbol data

- rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t contains general purpose register info

* Header include order fix

- Include repo headers first
- Third party library headers next
- standard library headers last

* Update dispatch profiling public API

- introduce rocprofiler_profile_counting_dispatch_data_t
- change signature of rocprofiler_profile_counting_dispatch_callback_t and rocprofiler_profile_counting_record_callback_t
- provide rocprofiler_user_data_t pointer in dispatch callback
- provide rocprofiler_user_data_t value (from dispatch cb) in record callback

* Update tests/bin/CMakeLists.txt

- fix add_subdirectory(hip-in-libraries) order

* Update VERSION

- bump to 0.2.0 in prep for AFAR
2024-03-07 22:21:26 -06:00
Gopesh Bhardwaj 665c546e65 Documentation Updates (#470)
* Updating installation doc

* updating README

* Addressing Yifan's feedback

* trivial updates

* Added limitation for individual xcc in readme

* Added rocprofv3 page

* README updates

* Updating rocprofv3

* source formatting (clang-format v11) (#538)

Co-authored-by: bgopesh <7112102+bgopesh@users.noreply.github.com>

* Merging documentation team's update

* Update source/docs/installation.md

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bgopesh <7112102+bgopesh@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
2024-03-07 21:05:16 -06:00
Jonathan R. Madsen 1d33d4cf78 Update rocprofiler_query_available_agents(...) (#596)
* Agent info version

* Complete implementation

- revert "rocprofiler_iterate_agents" to "rocprofiler_query_available_agents"

* Misc tweaks

- update rocprofiler_query_available_agents impl

* Update include/rocprofiler-sdk/agent.h

- Fix undocumented param for rocprofiler_query_available_agents
2024-03-06 02:17:40 -06:00
Jonathan R. Madsen 19971d5719 Fix rocprofiler_context_is_active(...) (#595)
* Fix rocprofiler_context_is_active

- previously returning ROCPROFILER_STATUS_ERROR_CONTEXT_NOT_FOUND if context was inactive

* Update include/rocprofiler-sdk/context.h

- Update doxygen docs
2024-03-06 00:32:34 -06:00
Jonathan R. Madsen 1bb94add11 Fix rocprofiler_iterate_callback_tracing_kind_operation_args for HIP compiler callbacks (#532)
* Fix HIP compiler iterate args

- `include/rocprofiler-sdk/hip/api_args.h`
  - replace struct fields named "f" with "func"
  - replace hip stream fields named "hStream" with "stream"
- `lib/rocprofiler-sdk/callback_tracing.cpp`
  - iterate_args for HIP compiler table
- `lib/rocprofiler-sdk/registration.cpp`
  - fix warning about roctx num_tables
- `lib/rocprofiler-sdk/hip/hip.def.cpp`
  - replace struct fields named "f" with "func"
  - replace hip stream fields named "hStream" with "stream"
- `lib/rocprofiler-sdk/{hip,hsa,marker}/utils.hpp`
  - improve `stringize_impl`
- `lib/rocprofiler-sdk/hsa/code_object.cpp`
  - remove stale commented out code
- `lib/rocprofiler-sdk/hsa/queue_controller.*`
  - destory_queue -> destroy_queue
- `tests/tools/json-tool.cpp`
  - improve parallelism in tool_tracing_callback
  - serialize the marker api args
  - only invoke rocprofiler_iterate_callback_tracing_kind_operation_args in exit phase
- `samples/counter_collection/CMakeLists.txt`
  - reduce timeout on tests to 120 seconds

* Update lib/rocprofiler-sdk/hsa/utils.hpp

- disable dereference of double pointer in stringize_impl

* Update lib/common

- indirection_level in mpl.hpp
- stringize_arg.hpp

* Rework rocprofiler_iterate_callback_tracing_kind_operation_args

- provide more information in rocprofiler_callback_tracing_operation_args_cb_t
- support specifying the dereference level to account for output parameters
2024-03-01 01:46:07 -06:00
Jonathan R. Madsen a1267e1fd2 C compatibility for public headers (#566)
* C compatibility for public headers

- add tests/tools/c-tool.c
  - builds a tool (which does nothing) with C language
  - ensures that tool can be compiled in C
- add tests/c-tool/CMakeLists.txt
  - ensures that tool library build from C is a valid tool
- rocprofiler_counter_info_v0_t is_derived is int instead of bool
  - C does not have bool unless <stdbool.h> is included
- add `include/rocprofiler-sdk/hsa/api_trace_version.h`
  - handles providing HSA_*_TABLE_(MAJOR|STEP)_VERSION values if compiled from C
- cmake define in version.h.in for ROCPROFILER_HSA_*_TABLE_(MAJOR|STEP)_VERSION
  - HSA table versions compiled with
- use rocprofiler_(hsa|hip|marker)_api_no_args struct to handle the incompatibility between empty structs in C vs. C++ (size of 0 vs. size of 1)
- extern "C" in include/rocprofiler-sdk/{hsa,hip,marker}/api_args.h
- fixed spelling error: derrived -> derived
- scope YY_NO_INPUT compile definition to lib/rocprofiler-sdk/counters/parser/*
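
The empty-struct incompatibility mentioned above can be seen from the C++ side. In this sketch `empty_args` and `no_args` are hypothetical names; the single placeholder member is an assumption about how a "no_args" struct could keep the same layout under both languages.

```cpp
#include <cstddef>

// An empty struct has nonzero size in C++, whereas GNU C (which permits
// empty structs as an extension) gives it size 0.
struct empty_args
{};

// Hypothetical placeholder: one explicit member gives the struct the same
// size whether the header is compiled as C or as C++.
struct no_args
{
    char empty;
};

static_assert(sizeof(empty_args) >= 1, "C++ objects always occupy at least one byte");
static_assert(sizeof(no_args) == sizeof(char), "single char member, no padding");
```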

* Revert CDash dashboard
2024-02-29 23:49:54 -06:00
Jonathan R. Madsen 875f53b608 Correlation ID Retirement + misc (#527)
* Correlation ID Retirement

- include/rocprofiler-sdk/buffer_tracing.h
  - add rocprofiler_buffer_tracing_correlation_id_retirement_record_t
- include/rocprofiler-sdk/fwd.h
  - ROCPROFILER_BUFFER_TRACING_CORRELATION_ID_RETIREMENT
- lib/rocprofiler-sdk/buffer_tracing.cpp
  - kind string for correlation id retirement
- lib/rocprofiler-sdk/buffer.hpp
  - emplace returns bool
- lib/rocprofiler-sdk/registration.cpp
  - pass lib_instance to copy_table functions
- lib/rocprofiler-sdk/context/context.*
  - update correlation_id struct
    - make ref_count private
    - {get,add,sub}_ref_count() functions
      - sub_ref_count() performs correlation id retirement
    - use stack for "latest" thread-local correlation id
- lib/rocprofiler-sdk/hip/hip.*
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - return in iterate_args
  - handle table instance in copy_table
- lib/rocprofiler-sdk/hsa/hsa.*
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - return in iterate_args
  - handle table instance in copy_table
- lib/rocprofiler-sdk/marker/marker.*
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - return in iterate_args
  - handle table instance in copy_table
- lib/rocprofiler-sdk/hsa/async_copy.cpp
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - handle table instance in async_copy_init / async_copy_save
- lib/rocprofiler-sdk/hsa/queue.cpp
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - tweak to external correlation id mapping in WriteInterceptor
- tests/async-copy-tracing/validate.py
  - check retired_correlation_ids
- tests/common/serialization.hpp
  - support rocprofiler_buffer_tracing_correlation_id_retirement_record_t
- tests/kernel-tracing/validate.py
  - check retired_correlation_ids
- tests/common/CMakeLists.txt
  - perfetto external project
- tests/common/perfetto.hpp
  - perfetto categories + aliases
  - add_perfetto_annotation
  - metaprogramming helpers
- tests/tools/CMakeLists.txt
  - link to tests-perfetto
- tests/tools/json-tool.cpp
  - demangling functions
  - serialization of marker API callback args
  - reduce parallel bottleneck in tool_tracing_callback
  - support correlation id retirement
  - Multiple threads for buffers
  - Support ROCPROFILER_TOOL_CONTEXTS_EXCLUDE env variable
  - write_perfetto() function
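
The private ref_count with {get,add,sub}_ref_count() described for lib/rocprofiler-sdk/context/context.* can be sketched as follows. This is a hypothetical class under the assumption that retirement fires when the last reference is released; it is not the library's correlation_id struct.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical sketch; not the rocprofiler-sdk correlation_id struct.
class correlation_id
{
public:
    using retire_fn = std::function<void(std::uint64_t)>;

    correlation_id(std::uint64_t id, retire_fn on_retire)
    : m_id{id}
    , m_on_retire{std::move(on_retire)}
    {}

    std::uint64_t id() const { return m_id; }
    std::uint32_t get_ref_count() const { return m_ref_count.load(); }
    std::uint32_t add_ref_count() { return m_ref_count.fetch_add(1) + 1; }

    // Drops one reference; whoever releases the last one triggers retirement.
    std::uint32_t sub_ref_count()
    {
        auto previous = m_ref_count.fetch_sub(1);
        assert(previous > 0 && "sub_ref_count() called without a matching add");
        if(previous == 1 && m_on_retire) m_on_retire(m_id);
        return previous - 1;
    }

private:
    std::uint64_t              m_id = 0;
    std::atomic<std::uint32_t> m_ref_count{0};  // private: only changed via add/sub
    retire_fn                  m_on_retire;
};
```

Keeping the counter private forces every holder through sub_ref_count(), so the retirement notification cannot be skipped by a direct decrement.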

* Update tests/rocprofv3/tracing/validate.py

- tweak test_hsa_api_trace

* Update PTL submodule

- fixes for data race during destruction of task

* Update lib/rocprofiler-sdk/buffer.*

- unique_buffer_vec_t uses std::unique_ptr instead of allocator::unique_static_ptr_t

* Reduce timeouts in counter collection samples [skip ci]

* Update tests/tools/json-tool.cpp

- tweak demangle(string_view, int*) -> demangle(string_view, int&)
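
A function with the new signature might look like the sketch below. It assumes a GCC/Clang toolchain (for `<cxxabi.h>`) and C++17 (for `std::string_view`); the body is an illustration and not the code in json-tool.cpp.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>
#include <string_view>

// Demangles an Itanium-ABI symbol name. On failure the input is returned
// unchanged and `status` holds the nonzero code from __cxa_demangle.
inline std::string
demangle(std::string_view name, int& status)
{
    // __cxa_demangle needs a null-terminated string, which a string_view
    // does not guarantee, so copy into a std::string first.
    const std::string mangled{name};
    char* demangled = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if(status != 0) return mangled;
    assert(demangled != nullptr);  // guaranteed when status == 0
    std::string result{demangled};
    std::free(demangled);  // the buffer was allocated with malloc
    return result;
}
```

Taking `int&` instead of `int*` removes the need for a null check on the status argument at every call site.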

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- move sub_ref_count() to later in async_copy_handler to delay retirement slightly more
2024-02-23 10:30:33 -06:00
Jonathan R. Madsen 0d939edbba Updates/fixes for CI, docs, tests, samples, and common library (#528)
- .github/workflows/continuous_integration.yml
  - apt-get update before apt-get install
  - remove libgtest-dev
  - actions-comment-pull-request: v2.4.3 -> v2.5.0
- .github/workflows/formatting.yml
  - create-pull-request: v5 -> v6
- cmake/rocprofiler_options.cmake
  - remove unused ROCPROFILER_DEBUG_TRACE and ROCPROFILER_LD_AQLPROFILE options
- samples/counter_collection/callback_client.cpp
  - corr_id field renamed to correlation_id
- samples/counter_collection/client.cpp
  - corr_id field renamed to correlation_id
- include/rocprofiler-sdk/fwd.h
  - In rocprofiler_record_counter_t: rename corr_id field to correlation_id
  - doxygen fixes
- lib/common/utility.*
  - remove get_accurate_clock_id_impl
  - timestamp_ns() defaults to CLOCK_BOOTTIME
- lib/rocprofiler-sdk/counters/core.cpp
  - fix spelling mistake: extrenal -> external
  - corr_id field renamed to correlation_id
- lib/rocprofiler-sdk-tool/tool.cpp
  - fix destruction of static tool::output_file before finalization
- scripts/update-docs.sh
  - define PROJECT_NAME
- tests/async-copy-tracing/validate.py
  - init_time and fini_time checks
  - hip_api_traces, marker_api_tracing
- tests/common/serialization.hpp
  - fix save function for rocprofiler_record_counter_t following rename of corr_id to correlation_id
- tests/kernel-tracing/validate.py
  - init_time and fini_time checks
  - relax test_total_runtime range
- tests/rocprofv3/tracing/CMakeLists.txt
  - remove -M from rocprofv3-test-systrace-execute
  - exclude test_hsa_api_trace in rocprofv3-test-systrace-validate due to HIP API tracing
- tests/rocprofv3/tracing/validate.py
  - update test_kernel_trace to accept mangled or demangled
- tests/tools/json-tool.cpp
  - remove use of GLOG
  - include init_time and fini_time
  - write_json(...) function
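
For context on the lib/common/utility.* entry above, a nanosecond timestamp helper that defaults to CLOCK_BOOTTIME could be sketched like this. CLOCK_BOOTTIME is Linux-specific; the function body is an assumption and not the library's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <time.h>

// Sketch of a nanosecond timestamp helper. Unlike CLOCK_MONOTONIC,
// CLOCK_BOOTTIME keeps counting while the system is suspended.
inline std::uint64_t
timestamp_ns(clockid_t clock_id = CLOCK_BOOTTIME)
{
    constexpr std::uint64_t nsec_per_sec = 1000000000ULL;
    struct timespec         ts           = {};
    int                     ret          = clock_gettime(clock_id, &ts);
    assert(ret == 0 && "clock_gettime failed");
    (void) ret;  // avoids an unused-variable warning when NDEBUG is defined
    return static_cast<std::uint64_t>(ts.tv_sec) * nsec_per_sec +
           static_cast<std::uint64_t>(ts.tv_nsec);
}
```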
2024-02-22 00:16:43 -06:00
Benjamin Welton 7adffd5b22 Add rocprofiler_query_counter_info function (#452)
* Add rocprofiler_query_counter_info function

Replaces rocprofiler_query_counter_name. Allows for
querying other types of info from counters (such as
description) and gives us some flexibility to add
return data in the near future (if we have to).

* source formatting (clang-format v11) (#453)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Updated version fetching

* source formatting (clang-format v11) (#509)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Merged

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
2024-02-19 16:05:38 -08:00
Benjamin Welton 3638351b4c Callback based handler for counter collection (#506)
* Callback based handler for counter collection

* source formatting (clang-format v11) (#507)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* cmake formatting (cmake-format) (#508)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Doc fix

* Minor doc fix

* More doc fixes

* More doc fixes

* More doc fixes

* Update CI

* Changes to the API per comments

* Mutex exception for HSA

* source formatting (clang-format v11) (#511)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Doc fix

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
2024-02-19 15:55:21 -08:00
Benjamin Welton 3eb6a27bc6 Add support for AQL dimensions (#262)
* Add support for AQL dimension changes

Adds support for returning dimensions from AQLProfile through rocprofiler
to tools. Includes a much larger expanded test suite that covers nearly
all files in counter collection.

Specific changes below:

samples/counter_collection/print_functional_counters: Modified to check
the validity of dimensions returned in comparison to the actual underlying
data obtained from a kernel execution.

rocprofiler-sdk/aql/helpers: adds function calls to support fetching
dimension information from AQLProfile.

rocprofiler-sdk/aql/packet_construct: modified to allow for events
to be exported to aid evaluate_ast in decoding the output buffer.

lib/rocprofiler-sdk/counters: Instance count now derived from dimension
sizes. rocprofiler_query_counter_dimensions now moved to a callback format
to improve usability.

rocprofiler-sdk/counters/core: Code migrations and exports of functions
for testing.

rocprofiler-sdk/counters/dimensions: Generates a dimension cache to be
used when querying dimension information for a counter id.

rocprofiler-sdk/counters/evaluate_ast: Modified to pass back correct
dimension information and to check/determine output dimensions for derived
counters.

rocprofiler-sdk/counters/id_decode: Modified to have a map between
dimension name -> dimension along with a conversion from the aql profile
id for a dimension (string) -> integer based id (happens only once during
init).

rocprofiler-sdk/hsa/queue: Modified to allow for making testing easier.
Specifically to allow Queue to now be mocked in unit tests for counter
collection.

* Merge with changes for serialization

* Added suggestions

* source formatting (clang-format v11) (#457)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Minor fix

* Test change

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
2024-02-07 22:03:21 -06:00
Jonathan R. Madsen aaff4976d2 Kernel Tracing Fix (#439)
* Update lib/rocprofiler-sdk/hsa/queue.cpp

- switch to using the kernel_pkt.kernel_dispatch.completion_signal instead of the interrupt signal for getting the dispatch time

* Update tests/kernel-tracing/validate.py

- add verification of total runtime collected in test_timestamps
  - the sum of the runtime of all the kernels in reproducible-runtime should be ~1 sec +/- 10%
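
The "~1 sec +/- 10%" check amounts to a relative-tolerance comparison on the summed kernel runtimes. The real test lives in validate.py (Python); the C++ sketch below only illustrates the arithmetic, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <numeric>
#include <vector>

// Sum of per-kernel runtimes (end - start), in nanoseconds.
inline std::uint64_t
total_runtime_ns(const std::vector<std::uint64_t>& kernel_runtimes_ns)
{
    return std::accumulate(
        kernel_runtimes_ns.begin(), kernel_runtimes_ns.end(), std::uint64_t{0});
}

// True when `measured` lies within rel_tol (e.g. 0.10 for 10%) of `expected`.
inline bool
within_tolerance(double measured, double expected, double rel_tol)
{
    assert(rel_tol >= 0.0);
    return std::abs(measured - expected) <= rel_tol * std::abs(expected);
}
```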

* Remove include/rocprofiler-sdk/rocprofiler_plugin.h

* Update CI workflow

- update actions/cache@v3 -> v4
- actions/cache/save@v3 -> v4
- thollander/actions-comment-pull-request@v2 -> v2.4.3

* Update pytest.ini

- change default options to one that is more verbose

* Update tests/kernel-tracing/CMakeLists.txt

- skip test_total_runtime when Address or Thread Sanitizer enabled
  - overhead skews the results

* Update tests/kernel-tracing/validate.py

- separate test_total_runtime test
2024-01-30 14:52:17 -06:00
Jonathan R. Madsen 3f39339926 API Tracing Overhaul (#437)
* Update include/rocprofiler-sdk/hsa/*

- split HSA API IDs into separate enumerations
- add support for finalize ext table

* Update include/rocprofiler-sdk/hip/*

- remove compiler_api_args.h
- rocprofiler_hip_api_args_t contains all for HIP runtime and HIP compiler
- ROCPROFILER_HIP_API_ID_ -> ROCPROFILER_HIP_RUNTIME_API_ID_

* Update include/rocprofiler-sdk/marker/table_api_id.h

- ROCPROFILER_MARKER_API_TABLE_ID_ -> ROCPROFILER_MARKER_TABLE_ID_

* Update include/rocprofiler-sdk/*/table_api_id.h

- table_api_id.h -> table_id.h

* Update include/rocprofiler-sdk/fwd.h

- ROCPROFILER_CALLBACK_TRACING_HSA_API split into 4 enum values:
  - ROCPROFILER_CALLBACK_TRACING_HSA_CORE_API
  - ROCPROFILER_CALLBACK_TRACING_HSA_AMD_EXT_API
  - ROCPROFILER_CALLBACK_TRACING_HSA_IMAGE_EXT_API
  - ROCPROFILER_CALLBACK_TRACING_HSA_FINALIZE_EXT_API
- ROCPROFILER_BUFFER_TRACING_HSA_API split into 4 enum values:
  - ROCPROFILER_BUFFER_TRACING_HSA_CORE_API
  - ROCPROFILER_BUFFER_TRACING_HSA_AMD_EXT_API
  - ROCPROFILER_BUFFER_TRACING_HSA_IMAGE_EXT_API
  - ROCPROFILER_BUFFER_TRACING_HSA_FINALIZE_EXT_API
- rocprofiler_callback_tracing_code_object_operation_t renamed to rocprofiler_code_object_operation_t (more consistent)
- doxygen updates

* Update include/rocprofiler-sdk/buffer_tracing.h

- improved doxygen comments
- removed unused rocprofiler_buffer_tracing_queue_scheduling_record_t
- removed unused rocprofiler_buffer_tracing_correlation_record_t

* Update include/rocprofiler-sdk/callback_tracing.h

- removed rocprofiler_callback_tracing_hip_compiler_api_data_t
  - rocprofiler_hip_api_args_t and rocprofiler_hip_compiler_api_args_t were combined
  - rocprofiler_hsa_api_retval_t and rocprofiler_hsa_compiler_api_retval_t were combined

* Update lib/rocprofiler-sdk/hsa/*

- utils.hpp
  - formatters for hsa_ext_program_t and hsa_ext_control_directives_t
- defines.hpp
  - removed variadic macros from lib/common/defines.hpp
  - HSA_API_META_DEFINITION, HSA_API_INFO_DEFINITION_0, HSA_API_INFO_DEFINITION_V specialize on table id
- async_copy.cpp
  - ROCPROFILER_HSA_API_ID_* -> ROCPROFILER_HSA_AMD_EXT_API_ID_*
  - add table id to templates
  - improve async_copy_fini
- hsa.hpp
  - add hsa_table_id_lookup
  - add hsa_domain_info
  - add table id to templates
  - add copy_table function
- hsa.cpp
  - add table id to templates
  - require hsa tables to be trivial and standard layout
  - remove set_data_args specialization for hsa_amd_memory_async_copy_rect
  - implement copy_table function
- hsa.def.cpp
  - update enums

* Update lib/rocprofiler-sdk/hip/*

- defines.hpp
  - use lib/common/defines.hpp
  - add hip_table_id_lookup to HIP_API_TABLE_LOOKUP_DEFINITION
- hip.hpp
  - hip_table_id_lookup
  - template iterate_args on table id
  - templated copy_table and update_table
- hip.cpp
  - replaced api_id_bounds with hip_domain_info
  - templated iterate_args on table id
  - templated copy_table and update_table

* Update lib/rocprofiler-sdk/marker/*

- defines.hpp
  - use lib/common/defines.hpp
- marker.cpp
  - updated enums
- marker.def.cpp
  - updated enums

* Update lib/rocprofiler-sdk/tests

- common.hpp
  - ROCPROFILER_CALL_EXPECT
  - callback_data_ext
  - update get_callback_tracing_names with new enums
  - update get_buffer_tracing_names with new enums
- external_correlation.cpp
  - support new HSA API enums
- intercept_table.cpp
  - use test/common.hpp
  - update to new HSA API enums
- registration.cpp
  - support new HSA API enums
- naming.cpp
  - validation for all get_ids(), get_names(), name_by_id(), id_by_name(), etc.

* Update lib/common

- defines.hpp
  - Move IMPL_DETAIL_FOR_EACH_NARG, GET_ADDR_MEMBER_FIELDS, and GET_NAMED_MEMBER_FIELDS here
    - used by HSA, HIP, and Marker
- static_object.hpp
  - is_trivial_standard_layout static constexpr member function
  - suppress register_static_dtor when is_trivial_standard_layout

* Update lib/rocprofiler-sdk/hsa/code_object.*

- name_by_id
- id_by_name
- get_names
- get_ids

* Update lib/rocprofiler-sdk/registration.cpp

- Update rocprofiler_set_api_table for HSA

* Update lib/rocprofiler-sdk/callback_tracing.cpp

- Update for new HSA enums
- Rework to use switch statement
  - rocprofiler_query_callback_tracing_kind_operation_name
  - rocprofiler_iterate_callback_tracing_kind_operations
  - rocprofiler_iterate_callback_tracing_kind_operation_args

* Update lib/rocprofiler-sdk/buffer_tracing.cpp

- Update for new HSA enums
- Rework to use switch statement
  - rocprofiler_query_buffer_tracing_kind_operation_name
  - rocprofiler_iterate_buffer_tracing_kind_operations

* Update lib/rocprofiler-sdk-tool

- helper.cpp
  - update get_buffer_id_names with new enums
  - update get_callback_id_names with new enums
- tools.cpp
  - update to use new HSA enums

* Update samples/common

- added call_stack.hpp
  - source_location struct
  - call_stack_t alias
  - print_call_stack function
- added name_info.hpp
  - utils for getting buffer/callback domain and operation names

* Update samples/api_buffered_tracing/client.cpp

- use samples/common/call_stack.hpp
- use samples/common/name_info.hpp
- update for new HSA enums

* Update samples/api_callback_tracing/client.cpp

- use samples/common/call_stack.hpp
- use samples/common/name_info.hpp
- update for new HSA enums

* Update tests/tools/json-tool.cpp

- update for new HSA enums

* Update tests/rocprofv3/tracing/validate.py

- update for new HSA domain names

* Update samples/counter_collection/main.cpp

- reduce number of kernels to 50,000 since 200,000 causes issues with thread sanitizer
2024-01-30 12:14:26 -06:00
Jonathan R. Madsen 9efafc4d23 Split ROCTx API tables and update intercept table API (#421)
* Update include/rocprofiler-sdk

- buffer_tracing.h
  - fix doxygen for rocprofiler_buffer_tracing_hip_api_record_t
  - update doxygen for rocprofiler_buffer_tracing_marker_api_record_t
    - remove unused marker_id field
- fwd.h
  - Split ROCPROFILER_CALLBACK_TRACING_MARKER_API into ROCPROFILER_CALLBACK_TRACING_MARKER_{CORE,CONTROL,NAME}_API
  - Split ROCPROFILER_BUFFER_TRACING_MARKER_API into ROCPROFILER_BUFFER_TRACING_MARKER_{CORE,CONTROL,NAME}_API
  - split rocprofiler_runtime_library_t into rocprofiler_runtime_library_t and rocprofiler_intercept_table_t
    - after split of ROCTx into 3 tables, specifying rocprofiler_at_internal_thread_create became confusing

* Update include/rocprofiler-sdk-roctx/api_trace.h

- Split into three tables: core, control, and name
  - core: what it sounds like
  - control: functions for controlling the profiler
  - name: functions for giving resources names

* Update lib/rocprofiler-sdk-roctx/roctx.cpp

- modifications following split into multiple tables

* Update lib/rocprofiler-sdk/marker/*

- modifications following split of ROCTx API into multiple intercept tables

* Update lib/rocprofiler-sdk/tests

- common.hpp
  - add enums to get_callback_tracing_names() and get_buffer_tracing_names()
- intercept_table.cpp
  - update test to use rocprofiler_intercept_table_t (and enums) instead of rocprofiler_runtime_library_t
  - update OR combos tested
- roctx.cpp
  - updates following split of ROCTx API table into multiple tables
  - use simplified specification of control API

* Update lib/rocprofiler-sdk

- buffer_tracing.cpp
  - Updates for ROCPROFILER_BUFFER_TRACING_MARKER_{CORE,CONTROL,NAME}_API enum values
- callback_tracing.cpp
  - Updates for ROCPROFILER_CALLBACK_TRACING_MARKER_{CORE,CONTROL,NAME}_API enum values
- intercept_table.hpp
  - notify_runtime_api_registration -> notify_intercept_table_registration
- intercept_table.cpp
  - updates for new rocprofiler_intercept_table_t enum and new ROCTx tables
- registration.cpp
  - updates for new rocprofiler_intercept_table_t enum and new ROCTx tables
  - updates for notify_runtime_api_registration -> notify_intercept_table_registration

* Update lib/rocprofiler-sdk-tool

- helper.cpp
  - Updates for new enums in get_callback_id_names() and get_buffer_id_names()
- tool.cpp
  - migrate to new enums for split ROCTx tables
  - use simplified split for control table vs. core+name tables

* Update samples/{api_callback_tracing,intercept_table}

- intercept_table/client.cpp
  - rocprofiler_runtime_library_t -> rocprofiler_intercept_table_t
- api_callback_tracing/client.cpp
  - Updates for new enums in get_callback_id_names()
  - use simplified split for control table vs. core+name tables
  - migrate to new enums for split ROCTx tables

* Update tests

- rocprofv3/tracing/validate.py
  - handle new marker domain names
- tools/json-tool.cpp
  - Updates for new enums in get_callback_id_names() and get_buffer_id_names()
  - use simplified split for control table vs. core+name tables
  - migrate to new enums for split ROCTx tables

* Update tests/rocprofv3/tracing/CMakeLists.txt

- fix FAIL_REGULAR_EXPRESSION for rocprofv3-test-trace-execute

* Update lib/rocprofiler-sdk-tool/{output_file,tool}.*

- logging in output_file dtor
- support stdout/stderr

* Update lib/common/container/record_header_buffer.hpp

- reduce probability of is_empty() returning true while emplace is happening

* Update lib/rocprofiler-sdk-tool/tool.cpp

- logging for buffered_tracing_callback
- counter collection uses CSV encoder

* Update bin/rocprofv3

- remove -i flag from help menu
2024-01-26 13:56:15 -06:00
Jonathan R. Madsen 3547a45c0c Improve buffer flush error handling (#416)
* Update include/rocprofiler-sdk/fwd.h

- add ROCPROFILER_STATUS_ERROR_FINALIZED error code

* Update lib/rocprofiler-sdk/rocprofiler.cpp

- status string for ROCPROFILER_STATUS_ERROR_FINALIZED

* Update lib/rocprofiler-sdk/buffer.cpp

- return error code if buffer flush invoked after finalized
- fatal error if task group destroyed
- error message if task runs after finalized
- improve join of task group

* Update lib/rocprofiler-sdk/counters/tests/evaluate_ast_tests.cpp

- Update lambdas to return reference due to strange -Warray-bounds and -Wstringop-overflow warnings with g++ (Ubuntu 13.1.0-8ubuntu1~20.04.2) 13.1.0
2024-01-26 04:01:09 -06:00
Jonathan R. Madsen 9a8b6f6b7b Counter API and Samples Updates (#410)
* Update include/rocprofiler-sdk/{counters,profile_config}.h

- use rocprofiler_agent_id_t instead of rocprofiler_agent_t

* Update samples

- use rocprofiler-sdk::rocprofiler-sdk instead of rocprofiler::rocprofiler in cmake
- api_callback_tracing sample roctxProfiler{Pause,Resume}
- api_callback_tracing sample uses ROCTx
- updates to use rocprofiler_agent_id_t

* Update run-ci.py

- exclude rocprofiler-sdk-tool from samples (no sample uses that code)

* Update lib/rocprofiler-sdk-tool/tool.cpp

- Update rocprofiler_iterate_agent_supported_counters to use agent ID

* Update lib/rocprofiler-sdk/counters/core.*

- profile_config has pointer to agent instead of copy

* Update lib/rocprofiler-sdk/agent.*

- provide get_agent(...) func via rocp agent id

* Update lib/rocprofiler-sdk/{buffer,callback}_tracing.cpp

- return ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED for enums missing implementation

* Update lib/rocprofiler-sdk/counters.cpp

- update to use rocprofiler_agent_id_t instead of rocprofiler_agent_t

* Update lib/rocprofiler-sdk/profile_config.cpp

- update to use rocprofiler_agent_id_t instead of rocprofiler_agent_t

* Update source/docs

- requirements.txt + install reqs in cmake

* Bump version to 0.1.0

* Update samples/api_callback_tracing/CMakeLists.txt

- LD_LIBRARY_PATH for test

* Update test/rocprofv3/tracing/CMakeLists.txt

- reorder validation files so memory copy comes first

* Update lib/rocprofiler-sdk-tool/tool.cpp

- logging for flushing buffers
- variables for buffer_size and buffer_watermark
  - increase the watermark to a full buffer
- use dedicated threads for each buffer

* Update lib/rocprofiler-sdk-tool/CMakeLists.txt

- test sets ROCPROF_LOG_LEVEL and ROCPROFILER_LOG_LEVEL to info

* Remove lib/rocprofiler-sdk-tool/trace_buffer.hpp

* Update lib/rocprofiler-sdk-tool/CMakeLists.txt

- drop log level to warning when leak sanitizer is enabled (produces small memory leak)
2024-01-25 23:47:40 -06:00
Jonathan R. Madsen c641749fe6 HIP API Tracing (#357)
* Update include/rocprofiler-sdk/hip*

- updates for intercept table

* Update lib/common/units.hpp

- clang-tidy fixes

* Add lib/rocprofiler-sdk/hip

- tracing implementation for the HIP intercept table

* Update source/lib/rocprofiler-sdk/CMakeLists.txt

- add_subdirectory(hip)

* Update source/lib/rocprofiler-sdk/hsa

- offset function in hsa_api_info<Idx>
- remove report_activity, set_callback
- Tweak HSA_API_TABLE_LOOKUP_DEFINITION

* Update lib/rocprofiler-sdk/hip

- rocprofiler::hip::copy_table
- stringize_impl print dereferenced pointers when possible

* Update lib/rocprofiler-sdk/hsa/utils.hpp

- stringize_impl print dereferenced pointers when possible

* Update lib/rocprofiler-sdk/tests/intercept_table.cpp

- remove failures for intercepting HIP API tables

* Update include/rocprofiler-sdk/fwd.h

- add ROCPROFILER_HIP_RUNTIME_LIBRARY (== ROCPROFILER_HIP_LIBRARY)
- add ROCPROFILER_HIP_COMPILER_LIBRARY

* Update lib/rocprofiler-sdk/buffer_tracing.cpp

- Support ROCPROFILER_BUFFER_TRACING_HIP_API in rocprofiler_query_buffer_tracing_kind_operation_name
- Support ROCPROFILER_BUFFER_TRACING_HIP_API in rocprofiler_iterate_buffer_tracing_kind_operations

* Update lib/rocprofiler-sdk/callback_tracing.cpp

- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_query_callback_tracing_kind_operation_name
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_iterate_callback_tracing_kind_operations
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_iterate_callback_tracing_kind_operation_args

* Update lib/rocprofiler-sdk/intercept_table.cpp

- support HipDispatchTable and HipCompilerDispatchTable

* Update lib/rocprofiler-sdk/internal_threading.cpp

- Support ROCPROFILER_HIP_COMPILER_LIBRARY

* Update lib/rocprofiler-sdk/registration.cpp

- Support "hip" and "hip_compiler" in rocprofiler_set_api_table
- Added some extra logging

* Update samples/api_{buffered,callback}_tracing

- Modifications to demonstrate HIP API tracing

* Update tests/kernel-tracing

- Modifications to handle/test HIP API tracing

* Separate HIP tracing from HIP compiler tracing

* Fix installation of include/rocprofiler-sdk/hip/*

- add compiler and table headers to install

* Fixes to HIP interception

- hip_api_trace.hpp was updated a bit
  - removed hipGetDeviceProperties (generic)
  - added hipGetDevicePropertiesR0600
  - added hipGetDevicePropertiesR0000
  - removed hipRegisterTracerCallback
  - reordered hipCreateChannelDesc, hipExtModuleLaunchKernel, hipHccModuleLaunchKernel
  - added hipDrvGraphAddMemsetNode
- static asserts in hsa_api_info ensuring ordering of pointers

* Update lib/rocprofiler-sdk/hip/hip.*

- use size_t instead of rocprofiler_hip_table_api_id_t as non-type template parameter (smaller binary)
- separated out population of callback_context_data and buffered_context_data into non-template function (significantly smaller binary)

* Update lib/rocprofiler-sdk/hsa/hsa.*

- separated out population of callback_context_data and buffered_context_data into non-template function (significantly smaller binary)
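
The binary-size notes for hip.* and hsa.* above follow a common pattern: keep the per-operation template thin and move type-independent work into an ordinary function. A sketch under assumed names (`context_data`, `populate`, and `make_record` are illustrative, not the SDK's types):

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical record standing in for the callback/buffered context data.
struct context_data
{
    std::uint64_t thread_id = 0;
    std::size_t   operation = 0;
};

// Ordinary (non-template) function: its body is defined exactly once,
// no matter how many operations the template below is instantiated for.
void populate(context_data& data, std::uint64_t tid, std::size_t op)
{
    data.thread_id = tid;
    data.operation = op;
}

// Thin per-operation shim. Using a plain std::size_t as the non-type
// template parameter mirrors the "size_t instead of enum" note above.
template <std::size_t Idx>
context_data make_record(std::uint64_t tid)
{
    context_data data{};
    populate(data, tid, Idx);
    return data;
}
```

Each instantiation of `make_record` now contains little more than a call, so the code that would otherwise be repeated per API function can be shared.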

* Update test/kernel-tracing/validate.py

- does not expect any hip_api_traces until libamdhip.so actually starts using rocprofiler-register

* Update tests/tools/json-tool.cpp

- fix context associated with "HIP_API_CALLBACK"

* Update external/CMakeLists.txt

- move misc variables to top of CMakeLists.txt so they apply to all external subprojects
  - BUILD_TESTING (OFF)
  - BUILD_SHARED_LIBS (OFF)
  - BUILD_OBJECT_LIBS (OFF)
  - BUILD_STATIC_LIBS (ON)
  - CMAKE_POSITION_INDEPENDENT_CODE (ON)
  - CMAKE_VISIBILITY_INLINES_HIDDEN (ON)
  - CMAKE_CXX_VISIBILITY_PRESET (hidden)
- disable using libunwind in glog

* Update lib/rocprofiler-{sdk,sdk-tool}/CMakeLists.txt

- remove explicit setting of SKIP_BUILD_RPATH

* Update CMakeLists.txt

- set high-level CMAKE_BUILD_RPATH and CMAKE_INSTALL_RPATH_USE_LINK_PATH

* Update tests/CMakeLists.txt

- include(GNUInstallDirs)

* Update samples/CMakeLists.txt

- include(GNUInstallDirs)

* Update include/rocprofiler-sdk/hip/{compiler_api,api}_args.h

- remove extern "C" due to incompatibility between empty struct in C (size 0) vs. empty struct in C++ (size 1)
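
The size mismatch can be seen directly from the C++ side. In C++, every complete object type occupies at least one byte, whereas GNU C accepts an empty struct as an extension and gives it size zero. The struct name below is hypothetical:

```cpp
// Hypothetical argument struct for an API call that takes no parameters.
struct no_args
{};

// C++ never produces a zero-sized complete class type.
static_assert(sizeof(no_args) >= 1, "empty class types are not zero-sized in C++");

// Helper so the size can also be inspected at run time.
constexpr unsigned long no_args_size() { return sizeof(no_args); }
```

A union or parent struct containing such a member can therefore have a different layout depending on which language compiled the header.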

* Update lib/rocprofiler-sdk/hip/details/ostream.hpp

- clang-tidy fixes

* Update cmake/rocprofiler_linting.cmake

- add a feature for clang tidy exe

* Update lib/rocprofiler-sdk/hip/hip.cpp

- use recursion instead of fold expression due to clang-tidy errors (maximum nesting level exceeded)
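
A sketch of the recursion-instead-of-fold approach, using a hypothetical argument printer (`append_args` and `join_args` are illustrative names, not the functions in hip.cpp):

```cpp
#include <sstream>
#include <string>

// Base case: no arguments left to print.
inline void append_args(std::ostringstream&) {}

// Recursive case: print the head, then recurse on the tail.
template <typename Head, typename... Tail>
void append_args(std::ostringstream& os, const Head& head, const Tail&... tail)
{
    os << head;
    if(sizeof...(Tail) > 0) os << ", ";
    append_args(os, tail...);
}

// A fold-expression form would be roughly:
//   ((os << args << ", "), ...);   // note: also emits a trailing separator

// Join all arguments into one comma-separated string.
template <typename... Args>
std::string join_args(const Args&... args)
{
    std::ostringstream os;
    append_args(os, args...);
    return os.str();
}
```

The recursive form expands one argument per call, so no single expression nests as deeply as a fold over a long parameter pack.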

* Update lib/rocprofiler-sdk/buffer_tracing.cpp

- fix merge

* Update lib/rocprofiler-sdk/callback_tracing.cpp

- fix merge

* Update bin/rocprofv3

- args for marker, HIP runtime, and HIP compiler tracing

* Update tests/apps/simple-transpose

- use roctx

* Update tests/rocprofv3/tracing

- validate marker API data

* Update lib/rocprofiler-sdk-tool

- support for HIP runtime, HIP compiler, marker API

* Update queue/queue_controller/registration/utility

- call hsa::queue_controller_fini() during finalization
- add a yield function to common/utility.hpp
  - implements a thread yield + sleep
- add a sync function to Queue class
- add an iterate_queues member function to QueueController
  - this is used to sync each queue during queue_controller_fini()
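
A sketch of what a "thread yield + sleep" helper might look like, and how a sync loop could use it. `yield_for` and `wait_until` are hypothetical names, and the 10 microsecond pause is an arbitrary choice for illustration:

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: give up the current time slice, then sleep briefly.
template <typename Rep, typename Period>
void yield_for(std::chrono::duration<Rep, Period> pause)
{
    std::this_thread::yield();
    std::this_thread::sleep_for(pause);
}

// Poll until done() returns true, backing off between checks.
// Returns how many times the loop had to wait.
template <typename Predicate>
int wait_until(Predicate&& done)
{
    int polls = 0;
    while(!done())
    {
        ++polls;
        yield_for(std::chrono::microseconds{10});
    }
    return polls;
}
```

A per-queue sync during finalization could then be expressed as `wait_until` over a predicate that checks for outstanding work.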

* Fix data races: queue/context/stable_vector

- stable_vector::emplace_back returns reference
- correlation id map uses stable_vector
- queue_info_session has explicit fields for queue id, hsa agent, rocp agent
- use hsa::get_table() in AsyncSignalHandler
- WriteInterceptor does not use TLS for context array

* Update lib/rocprofiler-sdk/hsa/hsa.*

- static object for API subtables
- accessors for API subtables
- google tests for HSA API subtables

* Update lib/rocprofiler-sdk/hsa/{queue,async_copy}.cpp

- use HSA subtable accessors

* Update rocprofiler_memcheck and CI workflow

- use GCC 13 instead of GCC 11 due to suspected false positives in thread sanitizer
  - GCC 13 uses libtsan.so.2

* Update CI workflow

* Update lib/rocprofiler-sdk/counters/{metrics,counters}

- fix possibly dangling reference to a temporary from gcc-13

* Update thread-sanitizer-suppr.txt

- Ignore data races originating in hsa-runtime library

* Update cmake/rocprofiler_memcheck.cmake

- Deduce the sanitizer library to preload by compiling an application and extracting the linked sanitizer library

* Update tests/rocprofv3/tracing/CMakeLists.txt

- add csv files to REQUIRED_FILES and ATTACH_ON_FAIL in validate test

* Update lib/common/container/record_header_buffer.hpp

- fix data race identified by gcc v13 and libtsan.so.2

* Update hip API id, args, and def

- remove hipDrvGraphAddMemsetNode (not part of ROCm 6.0)

* Update lib/common/container/record_header_buffer.hpp

- fix deadlock in save/read/reset

* Update source/docs/CMakeLists.txt

- remove COMMAND_ERROR_IS_FATAL ANY to allow for printing of stdout/stderr

* Update lib/rocprofiler-sdk/hip/details/ostream.hpp

- remove overloads for HIP_MEMSET_NODE_PARAMS

* Update docs/CMakeLists.txt

- use find_program for shell instead of hardcoded /bin/bash
2024-01-24 16:32:54 -06:00
Jonathan R. Madsen 1f4cf1aa39 Tools update (#397)
* Srnagara/tool counters collect (#331)

* Adding counter collection capability to tools

* Adding counter collection feature to tools

* Adding counter collection capability to tools

* Fixing merge down issues

* Small tool fixes for build + prevent profile realloc

* Reproducing the counter name query issue in buffered callback

* Minor fix for init order + sample that directly uses sdk-tool for debug purposes

* Adding a temporary fix to print the counter names

* Fixing the output file name and reverting the changes of caching the profile config

* Fixing SGPR_Count value

* cleaning up debug prints

* Adding header to counter collection file

* Adding kernel filtering support

* Remove threading

* Cleaning up the code

* Removing redundant prints

* Revert "Remove threading"

This reverts commit 05c58fb9de826e92cf8d2e3d1c31d5578525dcb4.

* Revert "Cleaning up the code"

This reverts commit 1d964882bf2396dee8ad020cbb6c83b36e0674e9.

* Changing the tools code to align with init-order fix

* cmake formatting (cmake-format) (#335)

Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>

* source formatting (clang-format v11) (#336)

Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>

* Adding support for async memory copy

* source formatting (clang-format v11) (#391)

Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>

* Fixing header typo

* Fixing tool_fini

* Replacing the direction and kind field values with descriptions

* Update lib/rocprofiler-sdk-tool/helper.cpp

- Remove use of VLA

* Update lib/rocprofiler-sdk-tool/tool.cpp

- Formatting

* Migrate common/config.* to rocprofiler-sdk-tool

* Update lib/rocprofiler-sdk-tool/tool.cpp

- fix clang-tidy issues

* source formatting (clang-format v11) (#392)

Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>

* Update lib/common/mpl.hpp

- is_string_type / is_string_type_impl for deducing if type is a string type
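
One way such a trait pair could be written, assuming the usual impl/wrapper split. This is a sketch, not the contents of mpl.hpp; the set of types treated as strings is an assumption:

```cpp
#include <string>
#include <string_view>
#include <type_traits>

// Primary template: not a string type.
template <typename T>
struct is_string_type_impl : std::false_type
{};

// Types treated as strings in this sketch (an assumed list).
template <>
struct is_string_type_impl<std::string> : std::true_type
{};
template <>
struct is_string_type_impl<std::string_view> : std::true_type
{};
template <>
struct is_string_type_impl<const char*> : std::true_type
{};
template <>
struct is_string_type_impl<char*> : std::true_type
{};

// Public trait: strip references, cv-qualifiers, and array-ness first,
// so that string literals and const std::string& are both recognized.
template <typename T>
struct is_string_type : is_string_type_impl<std::decay_t<T>>
{};
```

Decaying first means a literal such as `"abc"` (an array of `const char`) is classified the same way as a `const char*`.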

* Update include/rocprofiler-sdk/fwd.h

- ROCPROFILER_BUFFER_TRACING_MEMORY_COPY_NONE starts at zero

* Update lib/rocprofiler-sdk/hsa/async_copy.*

- functions for operation ids and names

* Update lib/rocprofiler-sdk/buffer_tracing.cpp

- support iterating and getting names for ROCPROFILER_BUFFER_TRACING_MEMORY_COPY

* Update lib/rocprofiler-sdk-tool/config.*

- env ROCPROFILER_ prefix -> ROCPROF_ prefix
- add support for memory copy tracing, counter collection, etc.

* Update lib/rocprofiler-sdk-tool/helper.*

- removed TracerFlushRecord
- removed cxa_demangle (use one in common library)
- removed GetCounterNames (handled in config)
- removed GetKernelNames (handled in config)

* Add lib/rocprofiler-sdk-tool/output_file.*

- separate out get_output_stream function and output_file struct from tool.cpp

* Add lib/rocprofiler-sdk-tool/csv.hpp

- write_csv_entry automatically quotes strings
- csv_encoder struct enforces correct number of columns
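
A sketch of how these two pieces might fit together. The names match the notes, but the signatures and bodies are assumptions for illustration (C++17), not the code in csv.hpp; for brevity the sketch does not escape embedded quotes:

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Write one cell; string-like values are wrapped in double quotes.
template <typename T>
void write_csv_entry(std::ostream& os, const T& value)
{
    if constexpr(std::is_convertible<std::decay_t<T>, std::string>::value)
        os << '"' << value << '"';
    else
        os << value;
}

// Row writer whose column count is fixed at compile time.
template <std::size_t NumCols>
struct csv_encoder
{
    template <typename... Args>
    static void write_row(std::ostream& os, const Args&... args)
    {
        static_assert(sizeof...(Args) == NumCols, "wrong number of columns");
        std::size_t idx = 0;
        // Emit each cell followed by a comma, or a newline after the last one.
        ((write_csv_entry(os, args), os << (++idx < NumCols ? "," : "\n")), ...);
    }
};

// Example row: kernel name, dispatch id, duration (made-up values).
inline std::string encode_example()
{
    std::ostringstream os;
    csv_encoder<3>::write_row(os, "vectoradd", 42, 1.5);
    return os.str();
}
```

Passing two or four values to `csv_encoder<3>::write_row` fails at compile time, which is the point of encoding the column count in the type.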

* Update lib/rocprofiler-sdk-tool/CMakeLists.txt

- add new files

* Update lib/rocprofiler-sdk-tool/tool.cpp

- update construction of output_file class
- add kernel_symbol_data for serializing kernel trace data
- use config instead of env lookups
- optimize counter collection profile config lookup/creation

* Update bin/rocprofv3

- rocprofv3 --help exits with 0 (as it should)
- command-line arg for memory copy tracing
- command-line arg for mangled kernels
- command-line arg for truncated kernels
- env ROCPROFILER_ prefix -> env ROCPROF_ prefix

* Update tests/async-copy-tracing/validate.py

- update test_async_copy_direction to new enum values

* Update tests/kernel-tracing/validate.py

- update test_async_copy_direction to new enum values

* Update tests/tools/json-tool.cpp

- add ROCPROFILER_BUFFER_TRACING_MEMORY_COPY to supported buffer_name_info

* Update samples/counter_collection/{CMakeLists.txt,main.cpp}

- remove counter-collection-sdk-tool

* Update .github/workflows/docs.yml

- fix paths triggering running the workflow

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>

* adding counter collection support

* Adding counter collection test

* changing directory structure of counter collection tests

* Fixing test path for rocprofv3

* Adding hsa-tracing basic test

* cmake formatting (cmake-format) (#362)

Co-authored-by: bgopesh <bgopesh@users.noreply.github.com>

* counter collection tests drop2

* fixing hsa-trace test for rocprofv3 path

* python formatting (black) (#371)

Co-authored-by: bgopesh <bgopesh@users.noreply.github.com>

* both counter collection and tracing should work together

* Fixing rocprofv3 path

* Attempt to fix Segfault with AddressSanitizer

* fixing sanitizer segfault

* Update rocprofv3

* Update lib/rocprofiler-sdk-tool/README.md

- update env variables

* Update lib/rocprofiler-sdk/buffer_tracing.cpp

- return ROCPROFILER_STATUS_BUFFER_NOT_FOUND if buffer tracing service is configured with invalid buffer

* Update lib/rocprofiler-sdk-tool/tool.cpp

- designated hsa API trace buffer

* Update tests/hsa-tracing/CMakeLists.txt

- Fix environment

* Update rocprofv3

- do not override HSA_TOOLS_LIB
- support ROCPROF_PRELOAD
- LD_PRELOAD librocprofiler-sdk.so

* Restructure tests directory

- move all rocprofv3 integration tests into subfolder

* Update cmake/Templates/rocprofiler-sdk/config.cmake.in

- create rocprofiler-sdk::rocprofv3 cmake target

* Update tests/rocprofv3/hsa-tracing

- improve validate.py
- convert input to dict via csv.DictReader

* Update tests/apps/CMakeLists.txt

- fix build rpath for simple-transpose

* Update cmake/rocprofiler_memcheck.cmake

- prefer libtsan.so.0

* Update tests/rocprofv3/hsa-tracing

- move to tests/rocprofv3/tracing
- include kernel tracing and memory copy tracing

* Update lib/rocprofiler-sdk-tool/tool.cpp

- normalize "_ID" vs. "_Id" in CSV column names (use "_Id")

* Update lib/rocprofiler-sdk/buffer.{hpp,cpp}

- change signature of buffer::get_buffers()
- buffer::get_buffers() uses static_object

* Update lib/rocprofiler-sdk/context/context.cpp

- update usage of buffer::get_buffers()
  - now returns pointer

* Update lib/rocprofiler-sdk/tests/buffer.cpp

- update to change for signature of buffer::get_buffers()

* Update tests/rocprofv3/tracing/CMakeLists.txt

- use %argt% with -d argument

* Update lib/rocprofiler-sdk-tool/tool.cpp

- use atexit for finalization

* Update tests/rocprofv3/tracing/CMakeLists.txt

- tweaked name of tests

* Update lib/rocprofiler-sdk/hsa/async_copy.*

- async_copy_fini + reference counting signals

* Update lib/rocprofiler-sdk/registration.cpp

- invoke hsa::async_copy_fini() to prevent data race on signals

---------

Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>
Co-authored-by: gobhardw <gopesh.bhardwaj@amd.com>
Co-authored-by: bgopesh <bgopesh@users.noreply.github.com>
2024-01-22 19:06:25 -06:00
Jonathan R. Madsen 21dd088c8e ROCTx Library Tracing (#390)
* Update include/rocprofiler-sdk/marker/*

- Update rocprofiler_marker_api_args_t for all API functions
- Add ROCPROFILER_MARKER_API_ID_roctxGetThreadId to rocprofiler_marker_api_id_t

* Update include/rocprofiler-sdk/marker/api_args.h

- fix include

* Update lib/common/mpl.hpp

- is_pair
- is_type_complete_v
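
Both traits can be written with standard idioms. The sketch below (C++17) is one plausible form, not the actual mpl.hpp code; `declared_only` and `fully_defined` are hypothetical types used only to exercise it:

```cpp
#include <type_traits>
#include <utility>

// is_pair: true only for std::pair<A, B>.
template <typename T>
struct is_pair : std::false_type
{};
template <typename A, typename B>
struct is_pair<std::pair<A, B>> : std::true_type
{};

// is_type_complete: sizeof(T) is ill-formed for incomplete types, so the
// partial specialization is discarded via SFINAE when T is incomplete.
template <typename T, typename = void>
struct is_type_complete : std::false_type
{};
template <typename T>
struct is_type_complete<T, std::void_t<decltype(sizeof(T))>> : std::true_type
{};
template <typename T>
constexpr bool is_type_complete_v = is_type_complete<T>::value;

struct declared_only;  // forward declaration only: incomplete
struct fully_defined   // complete type
{
    int x;
};
```

One caveat of the completeness check: the answer is fixed at first instantiation, so querying a type before and after its definition in the same program should be avoided.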

* Update include/rocprofiler-sdk/marker/*

- fix rocprofiler_marker_api_retval_t
- add roctxGetThreadId to rocprofiler_marker_api_args_t
- fix type in enum: HsaDevice -> HsaAgent
- add table_api_id.h

* Update include/rocprofiler-sdk/marker.h

- include marker/table_api_id.h

* Update include/rocprofiler-sdk/buffer_tracing.h

- Buffer marker tracer records have begin and end timestamp

* Add lib/rocprofiler-sdk/marker

- tracing implementation for marker (roctx) library

* Update include/rocprofiler-sdk/{buffer_tracing,marker/table_api_id}.h

- rocprofiler_buffer_tracing_marker_record_t -> rocprofiler_buffer_tracing_marker_api_record_t

* Update lib/rocprofiler-sdk/buffer_tracing.cpp

- support for ROCPROFILER_BUFFER_TRACING_MARKER_API

* Update lib/rocprofiler-sdk/callback_tracing.cpp

- support for ROCPROFILER_CALLBACK_TRACING_MARKER_API

* Update lib/rocprofiler-sdk/intercept_table.cpp

- template instantiation for notify_runtime_api_registration

* Update lib/rocprofiler-sdk/registration.cpp

- enable roctx in rocprofiler_set_api_table

* Update lib/rocprofiler-sdk/marker/marker.cpp

- rocprofiler_buffer_tracing_marker_record_t -> rocprofiler_buffer_tracing_marker_api_record_t

* Update lib/rocprofiler/tests for roctx testing

- add roctx.cpp
  - unit tests for roctx callback and buffer tracing
- support marker API in get_{buffer,callback}_tracing_names()

* Update lib/common/logging.cpp

- logging initialized message mentions env variable

* Update lib/common/mpl.hpp

- NOLINT for misc-definitions-in-headers

* Update lib/rocprofiler-sdk/tests/CMakeLists.txt

- include LD_LIBRARY_PATH in rocprofiler-lib-tests-shared tests

* Update lib/rocprofiler-sdk/registration.cpp

- client_library_vec_t is now vector of option<client_library>
  - enables resetting the client_library after finalization
- removed acquiring registration lock when invoke_client_finalizers called via atexit
  - this was causing some lock-order-inversion warnings (potential deadlock)

* Update lib/rocprofiler-sdk/agent.cpp

- model name for agent supports spaces

* Update tests/common/serialization.hpp

- add serialization support for marker tracing data structures

* Update tests/apps

- Add ROCTx markers into reproducible-runtime and transpose

* Update tests/tools/json-tools.cpp

- add marker tracing support
- remove strdup (no longer necessary)

* Update tests/kernel-tracing/validate.py

- validate marker API tracing data

* Update tests/async-copy-tracing/validate.py

- validate marker API tracing data

* Update cmake for load path resolution during testing

* Update tests/async-copy-tracing/CMakeLists.txt

- fix test LD_LIBRARY_PATH

* Update cmake/Templates/rocprofiler-sdk-roctx/config.cmake.in

- fix constructing rocprofiler-sdk-roctx::rocprofiler-sdk-roctx
2024-01-18 09:48:06 -06:00
Jonathan R. Madsen 1edd4891b2 ROCTx Library (#360)
* Initial implementation of roctx library

* Update include/roctx/CMakeLists.txt

- fix installation

* Update cmake/rocprofiler_config_packaging.cmake

- add rocprofiler-sdk-roctx installer

* Update include/roctx/CMakeLists.txt

- include api_trace.h in installation

* Update include/roctx/api_trace.h

- add ROCTX_API_TABLE_VERSION_MAJOR define
- add ROCTX_API_TABLE_VERSION_STEP define

* Update lib/roctx/roctx.cpp

- static asserts for table size and struct member offsets
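
A sketch of the kind of layout checks this refers to. The table below is hypothetical (the real layout is defined in api_trace.h), and it assumes the size field and function pointers have the same width, as on common 64-bit targets:

```cpp
#include <cstddef>

// Hypothetical dispatch table: a size field followed by function pointers.
struct example_api_table
{
    std::size_t size;
    void (*mark_fn)(const char*);
    int (*range_push_fn)(const char*);
    int (*range_pop_fn)();
};

// Expected byte offset of the Nth function pointer after the size field.
constexpr std::size_t expected_offset(std::size_t index)
{
    return sizeof(std::size_t) + index * sizeof(void (*)());
}

// Reordering or inserting members breaks these at compile time.
static_assert(offsetof(example_api_table, mark_fn) == expected_offset(0), "ABI break");
static_assert(offsetof(example_api_table, range_push_fn) == expected_offset(1), "ABI break");
static_assert(offsetof(example_api_table, range_pop_fn) == expected_offset(2), "ABI break");

// Appending a member changes the size, signalling that a version bump is due.
static_assert(sizeof(example_api_table) == expected_offset(3), "table size changed");
```

Checks like these turn an accidental ABI change into a build failure instead of a runtime mismatch between the library and its interceptors.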

* Update external/CMakeLists.txt

- move BUILD_SHARED_LIBS to top
- disable libunwind for glog

* Update lib/roctx/CMakeLists.txt

- Update {BUILD,INSTALL}_RPATH

* Relocate include/roctx to include/rocprofiler-sdk/roctx

* Relocate lib/roctx to lib/rocprofiler-sdk-roctx

- change the name of the library from libroctx to librocprofiler-sdk-roctx

* Move lib/plugins to lib/rocprofiler-sdk-tool/plugins

- also change install export group

* Update lib/rocprofiler-sdk/CMakeLists.txt

- change rocprofiler-shared-library EXPORT group (rocprofiler-sdk-library-targets -> rocprofiler-sdk-targets)

* Update cmake/rocprofiler_utilities.cmake

- change install EXPORT group
  - rocprofiler-sdk-library-targets -> rocprofiler-sdk-targets

* Update CMakeLists.txt

- set PACKAGE_NAME at high level
- include(rocprofiler_config_install_roctx)

* Update cmake/rocprofiler_config_install* and cmake/Templates/*.cmake.in

- added rocprofiler_config_install_roctx.cmake for installing roctx as a package
- reorganization of existing cmake/Templates/*-config.cmake.in files
- created new config.cmake.in and build-config.cmake.in for rocprofiler-sdk-roctx

* Relocate include/rocprofiler-sdk/roctx to include/rocprofiler-sdk-roctx

* Update rocprofiler_config_install_roctx.cmake

* Update lib/rocprofiler-sdk-roctx/roctx.cpp

- update include paths

* Update lib/rocprofiler-sdk-roctx/CMakeLists.txt

- change target name to have rocprofiler-sdk- prefix
- interface target_include_directories
- define export symbol

* source formatting (clang-format v11) (#361)

Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>

* Update include/rocprofiler-sdk/fwd.h

- fix doxygen markup for ROCPROFILER_STATUS_ERROR_CONTEXT_ERROR

* Update modulefile and setup-env.sh

* Update cmake/Templates/rocprofiler-sdk/config.cmake.in

- fix inclusion of rocprofiler-sdk-targets.cmake

* Update include/rocprofiler-sdk-roctx

- add types.h for typedefs
- add doxygen comments for roctx.h
- add roctxGetThreadId function
- roctxProfilerPause and roctxProfilerResume accept thread ID param

* Update lib/rocprofiler-sdk-roctx/roctx.cpp

- hsa_agent_t* -> hsa_agent_s*

* Update lib/rocprofiler-sdk-roctx/roctx.cpp

- support for roctxGetThreadId
- update signatures of roctxProfilerPause and roctxProfilerResume

* Update lib/rocprofiler-sdk-roctx/roctx.cpp

- Initialize logging with ROCTX_LOG_LEVEL

* Update include/rocprofiler-sdk-roctx/roctx.h

- remove ROCTX_NONNULL for ihipStream_t parameter in roctxNameHipStream because default stream is a nullptr

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-01-17 01:27:41 -06:00
Jonathan R. Madsen dc8b8aa448 Cleanup + logging env variable (#387)
* [CP] Update tests/common/serialization.hpp

- remove duplication in rocprofiler_callback_tracing_code_object_load_data_t

* [CP] Update lib/rocprofiler-sdk/tests

- create common.hpp
- update registration.cpp to use common.hpp

* [CP] Add lib/common/logging.{hpp,cpp}

- generic init_logging function

* [CP] Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- remove excess logging

* [CP] Update lib/rocprofiler-sdk/registration.cpp

- use common::init_logging(...)
- enforce ROCPROFILER_REGISTER_FORCE_LOAD in rocprofiler_force_configure
- logging updates in rocprofiler_set_api_table

* Update include/rocprofiler-sdk/buffer_tracing.h

- rocprofiler_buffer_tracing_marker_record_t -> rocprofiler_buffer_tracing_marker_api_record_t

* Update lib/common/utility.hpp

- remove active_capacity_gate

* Update lib/rocprofiler-sdk/tests/common.hpp

- fix get_{callback,buffer}_tracing_names()

* Update lib/rocprofiler-sdk/counters/xml/{basic,derived}_counters.xml

- add entries for gfx1102
2024-01-17 00:28:20 -06:00
Benjamin Welton 0952308c4a Add check to ensure metrics are valid on GPU Arch (#384)
* Add check to ensure metrics are valid on GPU Arch

Ensure requested metrics are valid on the GPU arch. If not valid,
error is returned during profile config init.

* source formatting (clang-format v11) (#385)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Update metrics.cpp

* source formatting (clang-format v11) (#386)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Update metrics.cpp

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
2024-01-16 21:47:45 -06:00
Jonathan R. Madsen 936816f762 Async memory copy tracing (#317)
* Update samples/api_buffered_tracing/client.cpp

- support ROCPROFILER_BUFFER_TRACING_MEMORY_COPY

* Update include/rocprofiler-sdk/{buffer_tracing,fwd}.h

- update rocprofiler_buffer_tracing_memory_copy_record_t
- add ROCPROFILER_BUFFER_TRACING_MEMORY_COPY_HOST_TO_HOST to rocprofiler_memory_copy_operation_t

* Update lib/rocprofiler-sdk/context/context.*

- get_registered_contexts functions (local copy)

* Update tests/apps/reproducible-runtime/reproducible-runtime.cpp

- include some memory allocations and memory copies for better testing

* Update tests/common/serialization.hpp

- update serialization save function for rocprofiler_buffer_tracing_memory_copy_record_t

* Update lib/rocprofiler-sdk/hsa/hsa.*

- remove stale set_callback / activity_functor_t code
- forward decl hsa_api_meta
- template struct hsa_api_func for getting function return type and args

* Update tests/kernel-tracing/validate.py

- enforce memory_copies data size
- test timestamps in memory copies data
- improve internal and external correlation id validation

* Update lib/rocprofiler-sdk/hsa/defines.hpp

- HSA_API_META_DEFINITION macro

* Update lib/rocprofiler/hsa/rocprofiler-sdk/hsa/hsa.def.cpp

- HSA_API_META_DEFINITION specializations for async copy functions

* Add lib/rocprofiler-sdk/hsa/async_copy.{hpp,cpp}

- implements buffer memory tracing

* Update lib/rocprofiler-sdk/registration.cpp

- invoke rocprofiler::hsa::async_copy_init

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- logging improvements
- improve hsa <-> rocp agent mapping

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- load original signal in async signal handler before store_screlease

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- use store_relaxed instead of store_screlease

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- logging

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- logging

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- misc changes

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- misc changes

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- misc changes

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- return function pointer instead of lambda

* Update reproducible-runtime.cpp

- device sync

* Update tests/apps/reproducible-runtime/reproducible-runtime.cpp

- use *Async variants of hipMalloc and hipMemcpy

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- populate async data properly

* Update tests/kernel-tracing/validate.py

- verification of async copy direction

* Update tests/apps/reproducible-runtime/reproducible-runtime.cpp

- temporarily disable async memcpy functions

* Create tests/tools

- directory containing tool libraries used for collecting data in integration tests

* Update tests/kernel-tracing

- remove kernel-tracing-test-tool library (now rocprofiler-sdk-json-tool)
- update cmake, validate.py, conftest.py accordingly

* Add tests/async-copy-tracing

- integration test validating async copy tracing in transpose example

* Update tests/CMakeLists.txt

- updates for restructuring

* Revert tests/apps/reproducible-runtime

- restore code to semi-original state (no memory copying)

* Update tests/async-copy-tracing/validate.py

- fix comment in test_async_copy_direction

* Fix building tests against installation
2024-01-09 11:34:46 -06:00
Jonathan R. Madsen 6b374b8e68 Improve static singleton memory safety (#316)
* Update GitHub links

* Update samples/api_buffered_tracing/client.cpp

- check if initialized before forcing initialization

* Add lib/common/static_object.*

- template class for creating a static allocation in the binary which has all the properties of a heap allocated singleton but does not trigger leak sanitizers
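
A minimal sketch of the idea, assuming placement-new into static storage. This is not the contents of static_object.*; the interface (`construct`/`get`) is an assumption, and the sketch is not thread-safe on first construction:

```cpp
#include <new>
#include <utility>

// Sketch: build T once inside static storage and never run its destructor.
// The object outlives all other static destructors (like a leaked heap
// singleton), but no heap block exists for a leak sanitizer to report.
template <typename T>
class static_object
{
public:
    // Construct on first call; later calls return the existing instance.
    template <typename... Args>
    static T* construct(Args&&... args)
    {
        T*& ptr = slot();
        if(ptr == nullptr) ptr = new(storage()) T(std::forward<Args>(args)...);
        return ptr;
    }

    // Returns nullptr until construct() has been called.
    static T* get() { return slot(); }

private:
    // Raw, suitably aligned bytes living in the binary's static data.
    static void* storage()
    {
        alignas(T) static unsigned char buffer[sizeof(T)];
        return buffer;
    }

    static T*& slot()
    {
        static T* ptr = nullptr;
        return ptr;
    }
};
```

The storage and pointer live in non-template member functions so that every `construct` overload shares the same single instance per `T`.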

* Update include/rocprofiler-sdk/internal_threading.h

- document return values

* Update lib/rocprofiler-sdk/internal_threading.cpp

- return codes from rocprofiler_create_callback_thread and rocprofiler_assign_callback_thread
- use common::static_object for thread-pool object

* Update lib/rocprofiler-sdk/agent.cpp

- use common::static_object to store array of strings and their hashes

* Update lib/rocprofiler-sdk/hsa/code_object.cpp

- use common::static_object to store array of strings and their hashes to ensure strings exist until termination

* Update lib/rocprofiler-sdk/registration.cpp

- use common::static_object to store status and client libraries
- update return values for rocprofiler_set_api_table

* Update lib/rocprofiler-sdk/hsa/hsa.cpp

- check registration::get_fini_status() in hsa_api_impl::functor<Idx>(args...)

* Update lib/rocprofiler-sdk/context/context.cpp

- using common::static_object for correlation id map
2023-12-19 13:47:21 -06:00
Jonathan R. Madsen 8ed68ce4f3 Update packaging (#306)
* Restructured tests

- support standalone compilation
- move tests/kernel-tracing/serialization.hpp to tests/common/serialization.hpp
- created tests/common library
- handle cloning of cereal library in standalone build

* Update install and packaging

* Update cmake/rocprofiler_config_packaging.cmake

- condense core, samples, development, and tools install components into single rocprofiler-sdk package
- keep tests install component in separate rocprofiler-sdk-tests package

* Update CI workflow to test install and packaging

* Update CI workflow

- install newer cmake for packaging checks

* Update cmake/rocprofiler_config_packaging.cmake

- disable auto-generation of shared-lib deps and provides for tests package

* Update CI workflow

- add sbin to PATH for dpkg install

* Update CI workflow

- remove using github.workspace when installing packages

* Update CI workflow

- hack to fix ordering of dpkg install

* Update CI workflow

- whitespace cleanup
2023-12-15 14:39:13 -06:00
Vladimir Indic 0666f6a197 AmdExtTable updated (#292)
* AmdExtTable updated

* hsa_amd_agent_set_async_scratch_limit introduced

* source formatting (clang-format v11) (#294)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
2023-12-12 10:36:38 -06:00
Benjamin Welton d2a6eec1bf Added kernel id to enqueue callback for kernel dispatch (#276)
Adds kernel id as parameter to rocprofiler_profile_counting_dispatch_callback_t.
Small cleanup of code in core.cpp.
2023-12-11 09:13:48 -08:00
Jonathan R. Madsen 1c02e7a92a Update documentation (#275)
- finished most of the TODOs
2023-12-04 13:43:22 -06:00
Benjamin Welton 022d7abc29 Documentation Update For Counters (#246)
* Documentation Update

* Minor fixes

* source formatting (clang-format v11) (#265)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
2023-11-30 14:58:54 -08:00
Jonathan R. Madsen 9a0c84efa6 Use -sdk suffix and reset VERSION to 0.0.0 (#263)
* Fix find_package(rocprofiler) in build tree

* Move include/rocprofiler to include/rocprofiler-sdk

* Update include/CMakeLists.txt

- add_subdirectory(rocprofiler-sdk)

* Move lib/rocprofiler to lib/rocprofiler-sdk

* Move lib/rocprofiler-tool to lib/rocprofiler-sdk-tool

* Update lib/CMakeLists.txt

- add_subdirectory(rocprofiler-sdk)
- add_subdirectory(rocprofiler-sdk-tool)

* Update lib/rocprofiler-sdk/CMakeLists.txt

* Rename rocprofiler-tool to rocprofiler-sdk-tool

* Replace include rocprofiler/ with include rocprofiler-sdk/

* Replace include lib/rocprofiler/ with include lib/rocprofiler-sdk/

* Set VERSION to 0.0.0 and finish install to rocprofiler-sdk

* More fixes for rocprofiler -> rocprofiler-sdk

- fix issue with rocprofiler-sdk-config.cmake.in
- fix counters xml install path

* Fix documentation generation

* Create rocprofiler_LIB_ROCPROFILER_SDK_DIR for build tree

* cmake formatting (cmake-format) (#264)

Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-11-29 20:43:18 -06:00
Ammar ELWazir fe5d074375 Misc updates for distribution (#233)
* Adding tools support

* cmake formatting (cmake-format) (#227)

Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>

* Checking to do rebase

* Adding rocprofv2 script

* cmake formatting (cmake-format) (#229)

Co-authored-by: bgopesh <bgopesh@users.noreply.github.com>

* Fixing build for the tool

* Removing the requirement for rocm_version

* Update rocprofiler_utilities.cmake

* C++ filesystem fixes

- added source/lib/common/filesystem.hpp
  - support older compilers which have <experimental/filesystem> and do not have <filesystem>
- added samples/common/filesystem.hpp
  - samples now depend on "common" library which provides the correct filesystem header
- renamed rocprofiler-stdcxxfs interface target to rocprofiler-cxx-filesystem
  - support old LLVM in addition to GNU
- fix bin/rocprof/rocprof.cpp
  - was using VLA

* Fix rocprofiler-drm include directories

- OpenSUSE only has include/libdrm/drm.h (no include/drm/drm.h)

* Tools fixes

* Fix for the tools

* Fix rocprofv2 script

* Fixing Filesystem Issues

* source formatting (clang-format v11) (#234)

Co-authored-by: ammarwa <ammarwa@users.noreply.github.com>

* Vlaindic/pc sampling api update (#235)

* pcs: updating PC sampling API

* source formatting (clang-format v11) (#232)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

---------

Co-authored-by: vlaindic <vladimir.indic@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* Vlaindic/pc sampling api update for ammar branch (#244)

* Updating the documentation inside pc_sampling.h

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: use @p in front of params

* pcs: documenting struct fields updated

* Fixing PC Sampling Documentation issues

* Fixing PC Sampling Documentation

* Relocated tools directory to source/lib/rocprofiler-tool

* Fixes/updates to rocprofiler-tool

- updated CMake
- Fixed miscellaneous issues in the code (VLAs, etc.)
- Updated rocprofv2 to reflect some minor env variables changes in rocprofiler-tool
- Fixed clang-tidy warnings

* Update lib/rocprofiler-tool/CMakeLists.txt

- link to atomic library

* Add $ORIGIN/.. RUNPATH to rocprofiler-tool

* Adding readme file for tools

* Renaming the tools readme file

* Update ReadMe.md

* Update ReadMe.md

* Documentation updates

- overview and explanation of design and concepts

* Fix lib/rocprofiler-tool/README.md

- delete ReadMe.md

* Hacks for build

* Update Filesystem

* cmake formatting (cmake-format) (#248)

Co-authored-by: ammarwa <ammarwa@users.noreply.github.com>

* source formatting (clang-format v11) (#249)

Co-authored-by: ammarwa <ammarwa@users.noreply.github.com>

* source formatting (clang-format v11) (#250)

Co-authored-by: ammarwa <ammarwa@users.noreply.github.com>

* Addressing review comments on the tool readme file

* Revert "Hacks for build"

This reverts commit d6688cb3d1226c46fc97e37ced889a5b0d180940.

* Fixes for GCC 7.5 compiler in OpenSUSE 15.4

* Update lib/rocprofiler-tool/CMakeLists.txt

- link to AQL profile library

* Fix lib/rocprofiler-tool/README.md

- fix markdown

* Fix lib/rocprofiler-tool

- fix usage of hsa_ven_amd_loader_query_host_address

* Fix unused variable warnings

- byproduct of variables only used in assert statements

* Update docs

- update about.md
  - move "Important Changes" section here
- update tool_library_overview.md
  - extend "Tool Library Design" section
  - write "Tool Initialization" section
  - write "Tool Finalization" section

* Add ghc::filesystem submodule

* Implement usage of ghc::filesystem

* Add ROCPROFILER_BUILD_GHC_FS option

- option to use external/filesystem (ghc)

* Update samples/counter-collection

- compile flags
- common library
- fixes for warnings

* Update tests/kernel-tracing/CMakeLists.txt

- change install location of kernel-tracing-test-tool and install rpath

* Update samples/common/CMakeLists.txt

- compile features requiring C++17

* Update lib/rocprofiler-tool/tool.cpp

- remove include <filesystem>
- comment out unused variable
- remove unused functions
- move some functions into anonymous namespace

---------

Co-authored-by: Sriraksha Nagaraj <Sriraksha.Nagaraj@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>
Co-authored-by: gobhardw <gopesh.bhardwaj@amd.com>
Co-authored-by: bgopesh <bgopesh@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: ammarwa <ammarwa@users.noreply.github.com>
Co-authored-by: vlaindic <vladimir.indic@amd.com>
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
2023-11-28 10:04:37 -06:00
Benjamin Welton e8a5845661 Buffered Counter Collection API (#179)
* Added buffer counter collection API.

Initial testing added into counter-collection sample.

Added support for constant metrics in counter collection (#194)

* Added support for constant metrics in counter collection

Adds support and test cases for constant metrics (such as max wave size)
and adds the kernel duration metric (though it is not yet
calculated).

* Minor doc updates

* Simple counter unit tests (#199)

* Simple counter unit tests

Unit tests and some minor fixes for simple and derived counter evaluation

* Added unit tests for reduction operations (#200)

* Added unit tests for reduction operations

* added tests for combo (constant+regular) counters (#201)

source formatting (clang-format v11) (#202)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

source formatting (clang-format v11) (#203)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

Local changes

source formatting (clang-format v11) (#205)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

Minor doc fix

Remove kernel_duration, migrate over set_dimensions to after HSA init

source formatting (clang-format v11) (#207)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

Added output to ROCPROFILER_SAMPLE_OUTPUT_FILE:

* Remove integer based counter in return struct

This causes a lot of complications and seems to provide limited benefit
over just treating all counters as doubles. For ease of use, drop the
integer-based counter.

* source formatting (clang-format v11) (#217)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Add correlation id support to counters (#218)

Adds correlation id support to counter collection. Requires tracing
to be enabled to return any useful value currently (since we do not
have HIP kernel tracing yet).

* source formatting (clang-format v11) (#223)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Add sample that attempts to fetch all counters

On whatever machine this test is run on, an attempt is made to fetch every
counter available on the platform from a kernel execution.
Each counter is fetched once to check that it can be
fetched on the platform and that it returns the correct instance
count (however, due to the lack of transparency from AQL profiler, this
check is not functional for some counters). We do not do any implicit
reduction on any counter; as a result, we see more counters than
the number of events being requested.
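
For illustration only (not part of this change): a minimal Python sketch of how status lines printed by such a sample could be tallied. It assumes the line format "Counter ID: N (NAME) expected X instances and got Y" with an optional "[ERROR]" prefix; the helper names `parse_status` and `mismatches` are hypothetical and are not part of rocprofiler-sdk.

```python
import re

# Assumed format of one status line; the optional "[ERROR]" prefix marks a
# line where the expected and observed instance counts differ.
LINE_RE = re.compile(
    r"^(?P<err>\[ERROR\])?Counter ID: (?P<id>\d+) \((?P<name>[^)]+)\) "
    r"expected (?P<expected>\d+) instances and got (?P<got>\d+)$"
)


def parse_status(lines):
    """Return one dict per recognized status line; other lines are skipped."""
    records = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        records.append(
            {
                "id": int(m.group("id")),
                "name": m.group("name"),
                "expected": int(m.group("expected")),
                "got": int(m.group("got")),
                "flagged": m.group("err") is not None,
            }
        )
    return records


def mismatches(records):
    """Return the records whose observed count differs from the expected one."""
    return [r for r in records if r["expected"] != r["got"]]


# A few lines in the assumed format, plus one header line that is ignored.
SAMPLE = [
    "Got 516 counters collected",
    "Counter ID: 520 (CU_NUM) expected 1 instances and got 1",
    "[ERROR]Counter ID: 521 (SQ_WAIT_INST_LDS) expected 1 instances and got 8",
    "[ERROR]Counter ID: 522 (TCP_TCP_TA_DATA_STALL_CYCLES) expected 16 instances and got 128",
    "Counter ID: 738 (TCA_CYCLE) expected 32 instances and got 32",
]

records = parse_status(SAMPLE)
bad = mismatches(records)
for r in bad:
    print(f"{r['name']}: expected {r['expected']}, got {r['got']}")
```

Comparing the parsed counts directly, rather than relying only on the "[ERROR]" prefix, keeps the tally independent of how the sample chooses to label a line.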

Below is the status of all counters on MI210.  All counters appear
functional with the changes in this PR. However, the instance count
returned will be greater than that returned by
rocprofiler_query_counter_instance_count.

Got 516 counters collected
Counter ID: 0 (size) expected 1 instances and got 1
Counter ID: 1 (processor_id_low) expected 1 instances and got 1
Counter ID: 2 (capability) expected 1 instances and got 1
Counter ID: 3 (local_mem_size) expected 1 instances and got 1
Counter ID: 4 (min_latency) expected 1 instances and got 1
Counter ID: 5 (weight) expected 1 instances and got 1
Counter ID: 6 (node_from) expected 1 instances and got 1
Counter ID: 7 (version_major) expected 1 instances and got 1
Counter ID: 8 (version_minor) expected 1 instances and got 1
Counter ID: 9 (mem_clk_max) expected 1 instances and got 1
Counter ID: 10 (num_xcc) expected 1 instances and got 1
Counter ID: 11 (width) expected 1 instances and got 1
Counter ID: 12 (flags) expected 1 instances and got 1
Counter ID: 13 (size_in_bytes) expected 1 instances and got 1
Counter ID: 14 (array_count) expected 1 instances and got 1
Counter ID: 15 (num_gws) expected 1 instances and got 1
Counter ID: 16 (simd_id_base) expected 1 instances and got 1
Counter ID: 17 (max_waves_per_simd) expected 1 instances and got 1
Counter ID: 18 (sdma_fw_version) expected 1 instances and got 1
Counter ID: 19 (gfx_target_version) expected 1 instances and got 1
Counter ID: 20 (max_bandwidth) expected 1 instances and got 1
Counter ID: 21 (cpu_core_id_base) expected 1 instances and got 1
Counter ID: 22 (cache_line_size) expected 1 instances and got 1
Counter ID: 23 (level) expected 1 instances and got 1
Counter ID: 24 (min_bandwidth) expected 1 instances and got 1
Counter ID: 25 (location_id) expected 1 instances and got 1
Counter ID: 26 (wave_front_size) expected 1 instances and got 1
Counter ID: 27 (lds_size_in_kb) expected 1 instances and got 1
Counter ID: 28 (simd_count) expected 1 instances and got 1
Counter ID: 29 (fw_version) expected 1 instances and got 1
Counter ID: 30 (recommended_transfer_size) expected 1 instances and got 1
Counter ID: 31 (simd_per_cu) expected 1 instances and got 1
Counter ID: 32 (association) expected 1 instances and got 1
Counter ID: 33 (mem_banks_count) expected 1 instances and got 1
Counter ID: 34 (latency) expected 1 instances and got 1
Counter ID: 35 (max_latency) expected 1 instances and got 1
Counter ID: 36 (cpu_cores_count) expected 1 instances and got 1
Counter ID: 37 (io_links_count) expected 1 instances and got 1
Counter ID: 38 (domain) expected 1 instances and got 1
Counter ID: 39 (max_engine_clk_fcompute) expected 1 instances and got 1
Counter ID: 40 (caches_count) expected 1 instances and got 1
Counter ID: 41 (simd_arrays_per_engine) expected 1 instances and got 1
Counter ID: 42 (cache_lines_per_tag) expected 1 instances and got 1
Counter ID: 43 (gds_size_in_kb) expected 1 instances and got 1
Counter ID: 44 (cu_per_simd_array) expected 1 instances and got 1
Counter ID: 45 (type) expected 1 instances and got 1
Counter ID: 46 (max_slots_scratch_cu) expected 1 instances and got 1
Counter ID: 47 (vendor_id) expected 1 instances and got 1
Counter ID: 48 (device_id) expected 1 instances and got 1
Counter ID: 49 (heap_type) expected 1 instances and got 1
Counter ID: 50 (drm_render_minor) expected 1 instances and got 1
Counter ID: 51 (num_sdma_engines) expected 1 instances and got 1
Counter ID: 52 (node_to) expected 1 instances and got 1
Counter ID: 53 (num_sdma_xgmi_engines) expected 1 instances and got 1
Counter ID: 54 (num_sdma_queues_per_engine) expected 1 instances and got 1
Counter ID: 55 (hive_id) expected 1 instances and got 1
Counter ID: 56 (num_cp_queues) expected 1 instances and got 1
Counter ID: 57 (max_engine_clk_ccompute) expected 1 instances and got 1
Counter ID: 517 (MAX_WAVE_SIZE) expected 1 instances and got 1
Counter ID: 518 (SE_NUM) expected 1 instances and got 1
Counter ID: 519 (SIMD_NUM) expected 1 instances and got 1
Counter ID: 520 (CU_NUM) expected 1 instances and got 1
[ERROR]Counter ID: 521 (SQ_WAIT_INST_LDS) expected 1 instances and got 8
[ERROR]Counter ID: 522 (TCP_TCP_TA_DATA_STALL_CYCLES) expected 16 instances and got 128
Counter ID: 523 (GRBM_COUNT) expected 1 instances and got 1
Counter ID: 524 (GRBM_GUI_ACTIVE) expected 1 instances and got 1
Counter ID: 525 (GRBM_CP_BUSY) expected 1 instances and got 1
Counter ID: 526 (GRBM_SPI_BUSY) expected 1 instances and got 1
Counter ID: 527 (GRBM_TA_BUSY) expected 1 instances and got 1
Counter ID: 528 (GRBM_TC_BUSY) expected 1 instances and got 1
Counter ID: 529 (GRBM_CPC_BUSY) expected 1 instances and got 1
Counter ID: 530 (GRBM_CPF_BUSY) expected 1 instances and got 1
Counter ID: 531 (GRBM_UTCL2_BUSY) expected 1 instances and got 1
Counter ID: 532 (GRBM_EA_BUSY) expected 1 instances and got 1
Counter ID: 533 (CPC_ME1_BUSY_FOR_PACKET_DECODE) expected 1 instances and got 1
Counter ID: 534 (CPC_UTCL1_STALL_ON_TRANSLATION) expected 1 instances and got 1
Counter ID: 535 (CPC_CPC_STAT_BUSY) expected 1 instances and got 1
Counter ID: 536 (CPC_CPC_STAT_IDLE) expected 1 instances and got 1
Counter ID: 537 (CPC_CPC_STAT_STALL) expected 1 instances and got 1
Counter ID: 538 (CPC_CPC_TCIU_BUSY) expected 1 instances and got 1
Counter ID: 539 (CPC_CPC_TCIU_IDLE) expected 1 instances and got 1
Counter ID: 540 (CPC_CPC_UTCL2IU_BUSY) expected 1 instances and got 1
Counter ID: 541 (CPC_CPC_UTCL2IU_IDLE) expected 1 instances and got 1
Counter ID: 542 (CPC_CPC_UTCL2IU_STALL) expected 1 instances and got 1
Counter ID: 543 (CPC_ME1_DC0_SPI_BUSY) expected 1 instances and got 1
Counter ID: 544 (CPF_CMP_UTCL1_STALL_ON_TRANSLATION) expected 1 instances and got 1
Counter ID: 545 (CPF_CPF_STAT_BUSY) expected 1 instances and got 1
Counter ID: 546 (CPF_CPF_STAT_IDLE) expected 1 instances and got 1
Counter ID: 547 (CPF_CPF_STAT_STALL) expected 1 instances and got 1
Counter ID: 548 (CPF_CPF_TCIU_BUSY) expected 1 instances and got 1
Counter ID: 549 (CPF_CPF_TCIU_IDLE) expected 1 instances and got 1
Counter ID: 550 (CPF_CPF_TCIU_STALL) expected 1 instances and got 1
[ERROR]Counter ID: 551 (SPI_CSN_WINDOW_VALID) expected 1 instances and got 8
[ERROR]Counter ID: 552 (SPI_CSN_BUSY) expected 1 instances and got 8
[ERROR]Counter ID: 553 (SPI_CSN_NUM_THREADGROUPS) expected 1 instances and got 8
[ERROR]Counter ID: 554 (SPI_CSN_WAVE) expected 1 instances and got 8
[ERROR]Counter ID: 555 (SPI_RA_REQ_NO_ALLOC) expected 1 instances and got 8
[ERROR]Counter ID: 556 (SPI_RA_REQ_NO_ALLOC_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 557 (SPI_RA_RES_STALL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 558 (SPI_RA_TMP_STALL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 559 (SPI_RA_WAVE_SIMD_FULL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 560 (SPI_RA_VGPR_SIMD_FULL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 561 (SPI_RA_SGPR_SIMD_FULL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 562 (SPI_RA_LDS_CU_FULL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 563 (SPI_RA_BAR_CU_FULL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 564 (SPI_RA_BULKY_CU_FULL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 565 (SPI_RA_TGLIM_CU_FULL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 566 (SPI_RA_WVLIM_STALL_CSN) expected 1 instances and got 8
[ERROR]Counter ID: 567 (SPI_SWC_CSC_WR) expected 1 instances and got 8
[ERROR]Counter ID: 568 (SPI_VWC_CSC_WR) expected 1 instances and got 8
[ERROR]Counter ID: 569 (SQ_ACCUM_PREV) expected 1 instances and got 8
[ERROR]Counter ID: 570 (SQ_CYCLES) expected 1 instances and got 8
[ERROR]Counter ID: 571 (SQ_BUSY_CYCLES) expected 1 instances and got 8
[ERROR]Counter ID: 572 (SQ_WAVES) expected 1 instances and got 8
[ERROR]Counter ID: 573 (SQ_LEVEL_WAVES) expected 1 instances and got 8
[ERROR]Counter ID: 574 (SQ_WAVES_EQ_64) expected 1 instances and got 8
[ERROR]Counter ID: 575 (SQ_WAVES_LT_64) expected 1 instances and got 8
[ERROR]Counter ID: 576 (SQ_WAVES_LT_48) expected 1 instances and got 8
[ERROR]Counter ID: 577 (SQ_WAVES_LT_32) expected 1 instances and got 8
[ERROR]Counter ID: 578 (SQ_WAVES_LT_16) expected 1 instances and got 8
[ERROR]Counter ID: 579 (SQ_BUSY_CU_CYCLES) expected 1 instances and got 8
[ERROR]Counter ID: 580 (SQ_ITEMS) expected 1 instances and got 8
[ERROR]Counter ID: 581 (SQ_INSTS) expected 1 instances and got 8
[ERROR]Counter ID: 582 (SQ_INSTS_VALU) expected 1 instances and got 8
[ERROR]Counter ID: 583 (SQ_INSTS_VALU_ADD_F16) expected 1 instances and got 8
[ERROR]Counter ID: 584 (SQ_INSTS_VALU_MUL_F16) expected 1 instances and got 8
[ERROR]Counter ID: 585 (SQ_INSTS_VALU_FMA_F16) expected 1 instances and got 8
[ERROR]Counter ID: 586 (SQ_INSTS_VALU_TRANS_F16) expected 1 instances and got 8
[ERROR]Counter ID: 587 (SQ_INSTS_VALU_ADD_F32) expected 1 instances and got 8
[ERROR]Counter ID: 588 (SQ_INSTS_VALU_MUL_F32) expected 1 instances and got 8
[ERROR]Counter ID: 589 (SQ_INSTS_VALU_FMA_F32) expected 1 instances and got 8
[ERROR]Counter ID: 590 (SQ_INSTS_VALU_TRANS_F32) expected 1 instances and got 8
[ERROR]Counter ID: 591 (SQ_INSTS_VALU_ADD_F64) expected 1 instances and got 8
[ERROR]Counter ID: 592 (SQ_INSTS_VALU_MUL_F64) expected 1 instances and got 8
[ERROR]Counter ID: 593 (SQ_INSTS_VALU_FMA_F64) expected 1 instances and got 8
[ERROR]Counter ID: 594 (SQ_INSTS_VALU_TRANS_F64) expected 1 instances and got 8
[ERROR]Counter ID: 595 (SQ_INSTS_VALU_INT32) expected 1 instances and got 8
[ERROR]Counter ID: 596 (SQ_INSTS_VALU_INT64) expected 1 instances and got 8
[ERROR]Counter ID: 597 (SQ_INSTS_VALU_CVT) expected 1 instances and got 8
[ERROR]Counter ID: 598 (SQ_INSTS_VALU_MFMA_I8) expected 1 instances and got 8
[ERROR]Counter ID: 599 (SQ_INSTS_VALU_MFMA_F16) expected 1 instances and got 8
[ERROR]Counter ID: 600 (SQ_INSTS_VALU_MFMA_BF16) expected 1 instances and got 8
[ERROR]Counter ID: 601 (SQ_INSTS_VALU_MFMA_F32) expected 1 instances and got 8
[ERROR]Counter ID: 602 (SQ_INSTS_VALU_MFMA_F64) expected 1 instances and got 8
[ERROR]Counter ID: 603 (SQ_INSTS_VALU_MFMA_MOPS_I8) expected 1 instances and got 8
[ERROR]Counter ID: 604 (SQ_INSTS_VALU_MFMA_MOPS_F16) expected 1 instances and got 8
[ERROR]Counter ID: 605 (SQ_INSTS_VALU_MFMA_MOPS_BF16) expected 1 instances and got 8
[ERROR]Counter ID: 606 (SQ_INSTS_VALU_MFMA_MOPS_F32) expected 1 instances and got 8
[ERROR]Counter ID: 607 (SQ_INSTS_VALU_MFMA_MOPS_F64) expected 1 instances and got 8
[ERROR]Counter ID: 608 (SQ_INSTS_MFMA) expected 1 instances and got 8
[ERROR]Counter ID: 609 (SQ_INSTS_VMEM_WR) expected 1 instances and got 8
[ERROR]Counter ID: 610 (SQ_INSTS_VMEM_RD) expected 1 instances and got 8
[ERROR]Counter ID: 611 (SQ_INSTS_VMEM) expected 1 instances and got 8
[ERROR]Counter ID: 612 (SQ_INSTS_SALU) expected 1 instances and got 8
[ERROR]Counter ID: 613 (SQ_INSTS_SMEM) expected 1 instances and got 8
[ERROR]Counter ID: 614 (SQ_INSTS_FLAT) expected 1 instances and got 8
[ERROR]Counter ID: 615 (SQ_INSTS_FLAT_LDS_ONLY) expected 1 instances and got 8
[ERROR]Counter ID: 616 (SQ_INSTS_LDS) expected 1 instances and got 8
[ERROR]Counter ID: 617 (SQ_INSTS_GDS) expected 1 instances and got 8
[ERROR]Counter ID: 618 (SQ_INSTS_EXP_GDS) expected 1 instances and got 8
[ERROR]Counter ID: 619 (SQ_INSTS_BRANCH) expected 1 instances and got 8
[ERROR]Counter ID: 620 (SQ_INSTS_SENDMSG) expected 1 instances and got 8
[ERROR]Counter ID: 621 (SQ_INSTS_VSKIPPED) expected 1 instances and got 8
[ERROR]Counter ID: 622 (SQ_INST_LEVEL_VMEM) expected 1 instances and got 8
[ERROR]Counter ID: 623 (SQ_INST_LEVEL_SMEM) expected 1 instances and got 8
[ERROR]Counter ID: 624 (SQ_INST_LEVEL_LDS) expected 1 instances and got 8
[ERROR]Counter ID: 625 (SQ_VALU_MFMA_BUSY_CYCLES) expected 1 instances and got 8
[ERROR]Counter ID: 626 (SQ_WAVE_CYCLES) expected 1 instances and got 8
[ERROR]Counter ID: 627 (SQ_WAIT_ANY) expected 1 instances and got 8
[ERROR]Counter ID: 628 (SQ_WAIT_INST_ANY) expected 1 instances and got 8
[ERROR]Counter ID: 629 (SQ_ACTIVE_INST_ANY) expected 1 instances and got 8
[ERROR]Counter ID: 630 (SQ_ACTIVE_INST_VMEM) expected 1 instances and got 8
[ERROR]Counter ID: 631 (SQ_ACTIVE_INST_LDS) expected 1 instances and got 8
[ERROR]Counter ID: 632 (SQ_ACTIVE_INST_VALU) expected 1 instances and got 8
[ERROR]Counter ID: 633 (SQ_ACTIVE_INST_SCA) expected 1 instances and got 8
[ERROR]Counter ID: 634 (SQ_ACTIVE_INST_EXP_GDS) expected 1 instances and got 8
[ERROR]Counter ID: 635 (SQ_ACTIVE_INST_MISC) expected 1 instances and got 8
[ERROR]Counter ID: 636 (SQ_ACTIVE_INST_FLAT) expected 1 instances and got 8
[ERROR]Counter ID: 637 (SQ_INST_CYCLES_VMEM_WR) expected 1 instances and got 8
[ERROR]Counter ID: 638 (SQ_INST_CYCLES_VMEM_RD) expected 1 instances and got 8
[ERROR]Counter ID: 639 (SQ_INST_CYCLES_SMEM) expected 1 instances and got 8
[ERROR]Counter ID: 640 (SQ_INST_CYCLES_SALU) expected 1 instances and got 8
[ERROR]Counter ID: 641 (SQ_THREAD_CYCLES_VALU) expected 1 instances and got 8
[ERROR]Counter ID: 642 (SQ_IFETCH) expected 1 instances and got 8
[ERROR]Counter ID: 643 (SQ_IFETCH_LEVEL) expected 1 instances and got 8
[ERROR]Counter ID: 644 (SQ_LDS_BANK_CONFLICT) expected 1 instances and got 8
[ERROR]Counter ID: 645 (SQ_LDS_ADDR_CONFLICT) expected 1 instances and got 8
[ERROR]Counter ID: 646 (SQ_LDS_UNALIGNED_STALL) expected 1 instances and got 8
[ERROR]Counter ID: 647 (SQ_LDS_MEM_VIOLATIONS) expected 1 instances and got 8
[ERROR]Counter ID: 648 (SQ_LDS_ATOMIC_RETURN) expected 1 instances and got 8
[ERROR]Counter ID: 649 (SQ_LDS_IDX_ACTIVE) expected 1 instances and got 8
[ERROR]Counter ID: 650 (SQ_ACCUM_PREV_HIRES) expected 1 instances and got 8
[ERROR]Counter ID: 651 (SQ_WAVES_RESTORED) expected 1 instances and got 8
[ERROR]Counter ID: 652 (SQ_WAVES_SAVED) expected 1 instances and got 8
[ERROR]Counter ID: 653 (SQ_INSTS_SMEM_NORM) expected 1 instances and got 8
[ERROR]Counter ID: 654 (SQC_DCACHE_INPUT_VALID_READYB) expected 1 instances and got 8
[ERROR]Counter ID: 655 (SQC_TC_REQ) expected 1 instances and got 8
[ERROR]Counter ID: 656 (SQC_TC_INST_REQ) expected 1 instances and got 8
[ERROR]Counter ID: 657 (SQC_TC_DATA_READ_REQ) expected 1 instances and got 8
[ERROR]Counter ID: 658 (SQC_TC_DATA_WRITE_REQ) expected 1 instances and got 8
[ERROR]Counter ID: 659 (SQC_TC_DATA_ATOMIC_REQ) expected 1 instances and got 8
[ERROR]Counter ID: 660 (SQC_TC_STALL) expected 1 instances and got 8
[ERROR]Counter ID: 661 (SQC_ICACHE_REQ) expected 1 instances and got 8
[ERROR]Counter ID: 662 (SQC_ICACHE_HITS) expected 1 instances and got 8
[ERROR]Counter ID: 663 (SQC_ICACHE_MISSES) expected 1 instances and got 8
[ERROR]Counter ID: 664 (SQC_ICACHE_MISSES_DUPLICATE) expected 1 instances and got 8
[ERROR]Counter ID: 665 (SQC_DCACHE_REQ) expected 1 instances and got 8
[ERROR]Counter ID: 666 (SQC_DCACHE_HITS) expected 1 instances and got 8
[ERROR]Counter ID: 667 (SQC_DCACHE_MISSES) expected 1 instances and got 8
[ERROR]Counter ID: 668 (SQC_DCACHE_MISSES_DUPLICATE) expected 1 instances and got 8
[ERROR]Counter ID: 669 (SQC_DCACHE_ATOMIC) expected 1 instances and got 8
[ERROR]Counter ID: 670 (SQC_DCACHE_REQ_READ_1) expected 1 instances and got 8
[ERROR]Counter ID: 671 (SQC_DCACHE_REQ_READ_2) expected 1 instances and got 8
[ERROR]Counter ID: 672 (SQC_DCACHE_REQ_READ_4) expected 1 instances and got 8
[ERROR]Counter ID: 673 (SQC_DCACHE_REQ_READ_8) expected 1 instances and got 8
[ERROR]Counter ID: 674 (SQC_DCACHE_REQ_READ_16) expected 1 instances and got 8
[ERROR]Counter ID: 675 (TA_TA_BUSY) expected 16 instances and got 128
[ERROR]Counter ID: 676 (TA_TOTAL_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 677 (TA_BUFFER_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 678 (TA_BUFFER_READ_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 679 (TA_BUFFER_WRITE_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 680 (TA_BUFFER_ATOMIC_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 681 (TA_BUFFER_TOTAL_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 682 (TA_BUFFER_COALESCED_READ_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 683 (TA_BUFFER_COALESCED_WRITE_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 684 (TA_ADDR_STALLED_BY_TC_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 685 (TA_ADDR_STALLED_BY_TD_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 686 (TA_DATA_STALLED_BY_TC_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 687 (TA_FLAT_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 688 (TA_FLAT_READ_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 689 (TA_FLAT_WRITE_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 690 (TA_FLAT_ATOMIC_WAVEFRONTS) expected 16 instances and got 128
[ERROR]Counter ID: 691 (TD_TD_BUSY) expected 16 instances and got 128
[ERROR]Counter ID: 692 (TD_TC_STALL) expected 16 instances and got 128
[ERROR]Counter ID: 693 (TD_SPI_STALL) expected 16 instances and got 128
[ERROR]Counter ID: 694 (TD_LOAD_WAVEFRONT) expected 16 instances and got 128
[ERROR]Counter ID: 695 (TD_ATOMIC_WAVEFRONT) expected 16 instances and got 128
[ERROR]Counter ID: 696 (TD_STORE_WAVEFRONT) expected 16 instances and got 128
[ERROR]Counter ID: 697 (TD_COALESCABLE_WAVEFRONT) expected 16 instances and got 128
[ERROR]Counter ID: 698 (TCP_GATE_EN1) expected 16 instances and got 128
[ERROR]Counter ID: 699 (TCP_GATE_EN2) expected 16 instances and got 128
[ERROR]Counter ID: 700 (TCP_TD_TCP_STALL_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 701 (TCP_TCR_TCP_STALL_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 702 (TCP_READ_TAGCONFLICT_STALL_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 703 (TCP_WRITE_TAGCONFLICT_STALL_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 704 (TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 705 (TCP_PENDING_STALL_CYCLES) expected 16 instances and got 128
[ERROR]Counter ID: 706 (TCP_TA_TCP_STATE_READ) expected 16 instances and got 128
[ERROR]Counter ID: 707 (TCP_VOLATILE) expected 16 instances and got 128
[ERROR]Counter ID: 708 (TCP_TOTAL_ACCESSES) expected 16 instances and got 128
[ERROR]Counter ID: 709 (TCP_TOTAL_READ) expected 16 instances and got 128
[ERROR]Counter ID: 710 (TCP_TOTAL_WRITE) expected 16 instances and got 128
[ERROR]Counter ID: 711 (TCP_TOTAL_ATOMIC_WITH_RET) expected 16 instances and got 128
[ERROR]Counter ID: 712 (TCP_TOTAL_ATOMIC_WITHOUT_RET) expected 16 instances and got 128
[ERROR]Counter ID: 713 (TCP_TOTAL_WRITEBACK_INVALIDATES) expected 16 instances and got 128
[ERROR]Counter ID: 714 (TCP_UTCL1_REQUEST) expected 16 instances and got 128
[ERROR]Counter ID: 715 (TCP_UTCL1_TRANSLATION_MISS) expected 16 instances and got 128
[ERROR]Counter ID: 716 (TCP_UTCL1_TRANSLATION_HIT) expected 16 instances and got 128
[ERROR]Counter ID: 717 (TCP_UTCL1_PERMISSION_MISS) expected 16 instances and got 128
[ERROR]Counter ID: 718 (TCP_TOTAL_CACHE_ACCESSES) expected 16 instances and got 128
[ERROR]Counter ID: 719 (TCP_TCP_LATENCY) expected 16 instances and got 128
[ERROR]Counter ID: 720 (TCP_TCC_READ_REQ_LATENCY) expected 16 instances and got 128
[ERROR]Counter ID: 721 (TCP_TCC_WRITE_REQ_LATENCY) expected 16 instances and got 128
[ERROR]Counter ID: 722 (TCP_TCC_READ_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 723 (TCP_TCC_WRITE_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 724 (TCP_TCC_ATOMIC_WITH_RET_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 725 (TCP_TCC_ATOMIC_WITHOUT_RET_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 726 (TCP_TCC_NC_READ_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 727 (TCP_TCC_NC_WRITE_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 728 (TCP_TCC_NC_ATOMIC_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 729 (TCP_TCC_UC_READ_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 730 (TCP_TCC_UC_WRITE_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 731 (TCP_TCC_UC_ATOMIC_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 732 (TCP_TCC_CC_READ_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 733 (TCP_TCC_CC_WRITE_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 734 (TCP_TCC_CC_ATOMIC_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 735 (TCP_TCC_RW_READ_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 736 (TCP_TCC_RW_WRITE_REQ) expected 16 instances and got 128
[ERROR]Counter ID: 737 (TCP_TCC_RW_ATOMIC_REQ) expected 16 instances and got 128
Counter ID: 738 (TCA_CYCLE) expected 32 instances and got 32
Counter ID: 739 (TCA_BUSY) expected 32 instances and got 32
Counter ID: 740 (TCC_CYCLE) expected 32 instances and got 32
Counter ID: 741 (TCC_BUSY) expected 32 instances and got 32
Counter ID: 742 (TCC_REQ) expected 32 instances and got 32
Counter ID: 743 (TCC_STREAMING_REQ) expected 32 instances and got 32
Counter ID: 744 (TCC_NC_REQ) expected 32 instances and got 32
Counter ID: 745 (TCC_UC_REQ) expected 32 instances and got 32
Counter ID: 746 (TCC_CC_REQ) expected 32 instances and got 32
Counter ID: 747 (TCC_RW_REQ) expected 32 instances and got 32
Counter ID: 748 (TCC_PROBE) expected 32 instances and got 32
Counter ID: 749 (TCC_PROBE_ALL) expected 32 instances and got 32
Counter ID: 750 (TCC_READ) expected 32 instances and got 32
Counter ID: 751 (TCC_WRITE) expected 32 instances and got 32
Counter ID: 752 (TCC_ATOMIC) expected 32 instances and got 32
Counter ID: 753 (TCC_HIT) expected 32 instances and got 32
Counter ID: 754 (TCC_MISS) expected 32 instances and got 32
Counter ID: 755 (TCC_WRITEBACK) expected 32 instances and got 32
Counter ID: 756 (TCC_EA_WRREQ) expected 32 instances and got 32
Counter ID: 757 (TCC_EA_WRREQ_64B) expected 32 instances and got 32
Counter ID: 758 (TCC_EA_WR_UNCACHED_32B) expected 32 instances and got 32
Counter ID: 759 (TCC_EA_WRREQ_STALL) expected 32 instances and got 32
Counter ID: 760 (TCC_EA_WRREQ_IO_CREDIT_STALL) expected 32 instances and got 32
Counter ID: 761 (TCC_EA_WRREQ_GMI_CREDIT_STALL) expected 32 instances and got 32
Counter ID: 762 (TCC_EA_WRREQ_DRAM_CREDIT_STALL) expected 32 instances and got 32
Counter ID: 763 (TCC_TOO_MANY_EA_WRREQS_STALL) expected 32 instances and got 32
Counter ID: 764 (TCC_EA_WRREQ_LEVEL) expected 32 instances and got 32
Counter ID: 765 (TCC_EA_ATOMIC) expected 32 instances and got 32
Counter ID: 766 (TCC_EA_ATOMIC_LEVEL) expected 32 instances and got 32
Counter ID: 767 (TCC_EA_RDREQ) expected 32 instances and got 32
Counter ID: 768 (TCC_EA_RDREQ_32B) expected 32 instances and got 32
Counter ID: 769 (TCC_EA_RD_UNCACHED_32B) expected 32 instances and got 32
Counter ID: 770 (TCC_EA_RDREQ_IO_CREDIT_STALL) expected 32 instances and got 32
Counter ID: 771 (TCC_EA_RDREQ_GMI_CREDIT_STALL) expected 32 instances and got 32
Counter ID: 772 (TCC_EA_RDREQ_DRAM_CREDIT_STALL) expected 32 instances and got 32
Counter ID: 773 (TCC_EA_RDREQ_LEVEL) expected 32 instances and got 32
Counter ID: 774 (TCC_TAG_STALL) expected 32 instances and got 32
Counter ID: 775 (TCC_NORMAL_WRITEBACK) expected 32 instances and got 32
Counter ID: 776 (TCC_ALL_TC_OP_WB_WRITEBACK) expected 32 instances and got 32
Counter ID: 777 (TCC_NORMAL_EVICT) expected 32 instances and got 32
Counter ID: 778 (TCC_ALL_TC_OP_INV_EVICT) expected 32 instances and got 32
Counter ID: 779 (TCC_EA_RDREQ_DRAM) expected 32 instances and got 32
Counter ID: 780 (TCC_EA_WRREQ_DRAM) expected 32 instances and got 32
[ERROR]Counter ID: 1893 (MeanOccupancyPerCU) expected 1 instances and got 8
[ERROR]Counter ID: 1894 (MeanOccupancyPerActiveCU) expected 1 instances and got 8
[ERROR]Counter ID: 1895 (TA_BUSY_avr) expected 16 instances and got 1
[ERROR]Counter ID: 1896 (TA_BUSY_max) expected 16 instances and got 1
[ERROR]Counter ID: 1897 (TA_BUSY_min) expected 16 instances and got 1
[ERROR]Counter ID: 1898 (TA_TA_BUSY_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1899 (TA_TOTAL_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1900 (TA_ADDR_STALLED_BY_TC_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1901 (TA_ADDR_STALLED_BY_TD_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1902 (TA_DATA_STALLED_BY_TC_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1903 (TA_FLAT_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1904 (TA_FLAT_READ_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1905 (TA_FLAT_WRITE_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1906 (TA_FLAT_ATOMIC_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1907 (TA_BUFFER_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1908 (TA_BUFFER_READ_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1909 (TA_BUFFER_WRITE_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1910 (TA_BUFFER_ATOMIC_WAVEFRONTS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1911 (TA_BUFFER_TOTAL_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1912 (TA_BUFFER_COALESCED_READ_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1913 (TA_BUFFER_COALESCED_WRITE_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1914 (TD_TD_BUSY_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1915 (TD_TC_STALL_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1916 (TD_LOAD_WAVEFRONT_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1917 (TD_ATOMIC_WAVEFRONT_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1918 (TD_STORE_WAVEFRONT_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1919 (TD_COALESCABLE_WAVEFRONT_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1920 (TD_SPI_STALL_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1921 (TCP_GATE_EN1_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1922 (TCP_GATE_EN2_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1923 (TCP_TD_TCP_STALL_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1924 (TCP_TCR_TCP_STALL_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1925 (TCP_READ_TAGCONFLICT_STALL_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1926 (TCP_WRITE_TAGCONFLICT_STALL_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1927 (TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1928 (TCP_VOLATILE_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1929 (TCP_TOTAL_ACCESSES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1930 (TCP_TOTAL_READ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1931 (TCP_TOTAL_WRITE_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1932 (TCP_TOTAL_ATOMIC_WITH_RET_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1933 (TCP_TOTAL_ATOMIC_WITHOUT_RET_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1934 (TCP_TOTAL_WRITEBACK_INVALIDATES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1935 (TCP_UTCL1_REQUEST_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1936 (TCP_UTCL1_TRANSLATION_MISS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1937 (TCP_UTCL1_TRANSLATION_HIT_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1938 (TCP_UTCL1_PERMISSION_MISS_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1939 (TCP_TOTAL_CACHE_ACCESSES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1940 (TCP_TCP_LATENCY_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1941 (TCP_TA_TCP_STATE_READ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1942 (TCP_TCC_READ_REQ_LATENCY_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1943 (TCP_TCC_WRITE_REQ_LATENCY_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1944 (TCP_TCC_READ_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1945 (TCP_TCC_WRITE_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1946 (TCP_TCC_ATOMIC_WITH_RET_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1947 (TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1948 (TCP_TCC_NC_READ_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1949 (TCP_TCC_NC_WRITE_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1950 (TCP_TCC_NC_ATOMIC_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1951 (TCP_TCC_UC_READ_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1952 (TCP_TCC_UC_WRITE_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1953 (TCP_TCC_UC_ATOMIC_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1954 (TCP_TCC_CC_READ_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1955 (TCP_TCC_CC_WRITE_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1956 (TCP_TCC_CC_ATOMIC_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1957 (TCP_TCC_RW_READ_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1958 (TCP_TCC_RW_WRITE_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1959 (TCP_TCC_RW_ATOMIC_REQ_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1960 (TCP_PENDING_STALL_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 1961 (TCA_CYCLE_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1962 (TCA_BUSY_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1963 (TCC_BUSY_avr) expected 32 instances and got 1
[ERROR]Counter ID: 1964 (TCC_WRREQ_STALL_max) expected 32 instances and got 1
[ERROR]Counter ID: 1965 (TCC_CYCLE_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1966 (TCC_BUSY_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1967 (TCC_REQ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1968 (TCC_STREAMING_REQ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1969 (TCC_NC_REQ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1970 (TCC_UC_REQ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1971 (TCC_CC_REQ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1972 (TCC_RW_REQ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1973 (TCC_PROBE_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1974 (TCC_PROBE_ALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1975 (TCC_READ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1976 (TCC_WRITE_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1977 (TCC_ATOMIC_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1978 (TCC_HIT_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1979 (TCC_MISS_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1980 (TCC_WRITEBACK_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1981 (TCC_EA_WRREQ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1982 (TCC_EA_WRREQ_64B_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1983 (TCC_EA_WR_UNCACHED_32B_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1984 (TCC_EA_WRREQ_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1985 (TCC_EA_WRREQ_IO_CREDIT_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1986 (TCC_EA_WRREQ_GMI_CREDIT_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1987 (TCC_EA_WRREQ_DRAM_CREDIT_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1988 (TCC_TOO_MANY_EA_WRREQS_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1989 (TCC_EA_WRREQ_LEVEL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1990 (TCC_EA_RDREQ_LEVEL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1991 (TCC_EA_ATOMIC_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1992 (TCC_EA_ATOMIC_LEVEL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1993 (TCC_EA_RDREQ_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1994 (TCC_EA_RDREQ_32B_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1995 (TCC_EA_RD_UNCACHED_32B_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1996 (TCC_EA_RDREQ_IO_CREDIT_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1997 (TCC_EA_RDREQ_GMI_CREDIT_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1998 (TCC_EA_RDREQ_DRAM_CREDIT_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 1999 (TCC_TAG_STALL_sum) expected 32 instances and got 1
[ERROR]Counter ID: 2000 (TCC_NORMAL_WRITEBACK_sum) expected 32 instances and got 1
[ERROR]Counter ID: 2001 (TCC_ALL_TC_OP_WB_WRITEBACK_sum) expected 32 instances and got 1
[ERROR]Counter ID: 2002 (TCC_NORMAL_EVICT_sum) expected 32 instances and got 1
[ERROR]Counter ID: 2003 (TCC_ALL_TC_OP_INV_EVICT_sum) expected 32 instances and got 1
[ERROR]Counter ID: 2004 (TCC_EA_RDREQ_DRAM_sum) expected 32 instances and got 1
[ERROR]Counter ID: 2005 (TCC_EA_WRREQ_DRAM_sum) expected 32 instances and got 1
[ERROR]Counter ID: 2006 (FETCH_SIZE) expected 32 instances and got 1
[ERROR]Counter ID: 2007 (WRITE_SIZE) expected 32 instances and got 1
[ERROR]Counter ID: 2008 (WRITE_REQ_32B) expected 32 instances and got 1
[ERROR]Counter ID: 2009 (CU_OCCUPANCY) expected 1 instances and got 8
Counter ID: 2010 (CU_UTILIZATION) expected 1 instances and got 1
[ERROR]Counter ID: 2011 (TOTAL_16_OPS) expected 1 instances and got 8
[ERROR]Counter ID: 2012 (TOTAL_32_OPS) expected 1 instances and got 8
[ERROR]Counter ID: 2013 (TOTAL_64_OPS) expected 1 instances and got 8
Counter ID: 2014 (AggSysCycles) expected 1 instances and got 1
Counter ID: 2015 (GpuUtil) expected 1 instances and got 1
Counter ID: 2016 (CpUtil) expected 1 instances and got 1
Counter ID: 2017 (SpiUtil) expected 1 instances and got 1
Counter ID: 2018 (TaUtil) expected 1 instances and got 1
Counter ID: 2019 (TcUtil) expected 1 instances and got 1
Counter ID: 2020 (EaUtil) expected 1 instances and got 1
[ERROR]Counter ID: 2021 (InstrFetchLatency) expected 1 instances and got 8
[ERROR]Counter ID: 2022 (WaveOccupancy) expected 1 instances and got 8
[ERROR]Counter ID: 2023 (WaveDuration) expected 1 instances and got 8
[ERROR]Counter ID: 2024 (WaveDepWait) expected 1 instances and got 8
[ERROR]Counter ID: 2025 (WaveIssueWait) expected 1 instances and got 8
[ERROR]Counter ID: 2026 (WaveExec) expected 1 instances and got 8
[ERROR]Counter ID: 2027 (ValuIops) expected 1 instances and got 8
[ERROR]Counter ID: 2028 (MfmaFlops) expected 1 instances and got 8
[ERROR]Counter ID: 2029 (MfmaFlopsF16) expected 1 instances and got 8
[ERROR]Counter ID: 2030 (MfmaFlopsBF16) expected 1 instances and got 8
[ERROR]Counter ID: 2031 (MfmaFlopsF32) expected 1 instances and got 8
[ERROR]Counter ID: 2032 (MfmaFlopsF64) expected 1 instances and got 8
[ERROR]Counter ID: 2033 (ScaPipeIssueUtil) expected 1 instances and got 8
[ERROR]Counter ID: 2034 (ValuPipeIssueUtil) expected 1 instances and got 8
[ERROR]Counter ID: 2035 (VmemPipeIssueUtil) expected 1 instances and got 8
[ERROR]Counter ID: 2036 (MfmaUtil) expected 1 instances and got 8
[ERROR]Counter ID: 2037 (AvgNumActiveThreads) expected 1 instances and got 8
[ERROR]Counter ID: 2038 (VmemLatency) expected 1 instances and got 8
[ERROR]Counter ID: 2039 (SmemLatency) expected 1 instances and got 8
[ERROR]Counter ID: 2040 (LdsUtil) expected 1 instances and got 8
[ERROR]Counter ID: 2041 (LdsPipeIssueUtil) expected 1 instances and got 8
[ERROR]Counter ID: 2042 (LdsLatency) expected 1 instances and got 8
[ERROR]Counter ID: 2043 (LdsBankConflict) expected 1 instances and got 8
[ERROR]Counter ID: 2044 (L1iCacheHitRate) expected 1 instances and got 8
[ERROR]Counter ID: 2045 (sL1dCacheHitRate) expected 1 instances and got 8
[ERROR]Counter ID: 2046 (vL1dBufCoalesceRate) expected 16 instances and got 1
[ERROR]Counter ID: 2047 (vL1dCacheUtil) expected 16 instances and got 1
[ERROR]Counter ID: 2048 (vL1dCacheTcbHitRate) expected 16 instances and got 1
[ERROR]Counter ID: 2049 (vL1dCacheWaveLatency) expected 16 instances and got 1
[ERROR]Counter ID: 2050 (vL1dReadFromL2Latency) expected 16 instances and got 1
[ERROR]Counter ID: 2051 (vL1dWriteToL2Latency) expected 16 instances and got 1
[ERROR]Counter ID: 2052 (vL1dRdTagConfStallRate) expected 16 instances and got 1
[ERROR]Counter ID: 2053 (vL1dWrTagConfStallRate) expected 16 instances and got 1
[ERROR]Counter ID: 2054 (vL1dAtomicTagConfStallRate) expected 16 instances and got 1
[ERROR]Counter ID: 2055 (vL1dMissReqStallRate) expected 16 instances and got 1
[ERROR]Counter ID: 2056 (vL1dDataPendRate) expected 16 instances and got 1
[ERROR]Counter ID: 2057 (vL1dDataRetStallRate) expected 16 instances and got 1
[ERROR]Counter ID: 2058 (L2CacheHitRate) expected 32 instances and got 1
[ERROR]Counter ID: 2059 (L2CacheTagRamStallRate) expected 32 instances and got 1
[ERROR]Counter ID: 2060 (EaRdLatency) expected 32 instances and got 1
[ERROR]Counter ID: 2061 (EaRdIoStallRate) expected 32 instances and got 1
[ERROR]Counter ID: 2062 (EaRdGmiStallRate) expected 32 instances and got 1
[ERROR]Counter ID: 2063 (EaRdDramStallRate) expected 32 instances and got 1
[ERROR]Counter ID: 2064 (EaWrLatency) expected 32 instances and got 1
[ERROR]Counter ID: 2065 (EaWrIoStallRate) expected 32 instances and got 1
[ERROR]Counter ID: 2066 (EaWrGmiStallRate) expected 32 instances and got 1
[ERROR]Counter ID: 2067 (EaWrDramStallRate) expected 32 instances and got 1
[ERROR]Counter ID: 2068 (EaWrStarveRate) expected 32 instances and got 1
[ERROR]Counter ID: 2069 (EaAtomicLatency) expected 32 instances and got 1
[ERROR]Counter ID: 2070 (TCP_TCP_TA_DATA_STALL_CYCLES_sum) expected 16 instances and got 1
[ERROR]Counter ID: 2071 (TCP_TCP_TA_DATA_STALL_CYCLES_max) expected 16 instances and got 1
[ERROR]Counter ID: 2072 (VFetchInsts) expected 16 instances and got 8
[ERROR]Counter ID: 2073 (VWriteInsts) expected 16 instances and got 8
[ERROR]Counter ID: 2074 (FlatVMemInsts) expected 1 instances and got 8
[ERROR]Counter ID: 2075 (LDSInsts) expected 1 instances and got 8
[ERROR]Counter ID: 2076 (FlatLDSInsts) expected 1 instances and got 8
[ERROR]Counter ID: 2077 (VALUUtilization) expected 1 instances and got 8
[ERROR]Counter ID: 2078 (VALUBusy) expected 1 instances and got 8
[ERROR]Counter ID: 2079 (SALUBusy) expected 1 instances and got 8
[ERROR]Counter ID: 2080 (FetchSize) expected 32 instances and got 1
[ERROR]Counter ID: 2081 (WriteSize) expected 32 instances and got 1
[ERROR]Counter ID: 2082 (MemWrites32B) expected 32 instances and got 1
[ERROR]Counter ID: 2083 (L2CacheHit) expected 32 instances and got 1
[ERROR]Counter ID: 2084 (MemUnitStalled) expected 16 instances and got 1
[ERROR]Counter ID: 2085 (WriteUnitStalled) expected 32 instances and got 1
[ERROR]Counter ID: 2086 (LDSBankConflict) expected 1 instances and got 8

* source formatting (clang-format v11) (#225)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* cmake formatting (cmake-format) (#224)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Minor fixes

* source formatting (clang-format v11) (#226)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Minor test change

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
2023-11-17 01:49:51 -08:00
Jonathan R. Madsen ca296ff22b Remove _service from rocprofiler_service_* types (#221)
- this is a continuation of #168 which removed _SERVICE from the ROCPROFILER_SERVICE_ enums
2023-11-16 04:44:50 -06:00
Jonathan R. Madsen cf5e4b4b1b Integration Testing (#211)
* Add external/cereal submodule

- used for integration testing

* Update lib/common/container/small_vector.hpp

- documentation notes

* Update tests/apps

- update transpose app (fix build)
- add reproducible-runtime app

* Update include/rocprofiler/fwd.h

- rocprofiler_service_callback_phase_t -> rocprofiler_callback_phase_t

* Update PTL submodule

- fix for task group: submitting tasks from different thread

* Update lib/rocprofiler/hsa/queue.cpp

- CHECK_NOTNULL(_buffer)

* Update lib/rocprofiler/hsa/hsa.cpp

- use buffer::get_buffer instead of manually looking for buffer

* Update lib/rocprofiler/internal_threading.cpp

- use buffer::get_buffer instead of manually looking for buffer

* Update lib/rocprofiler/buffer.cpp

- offset the buffer id
- properly handle rocprofiler_create_buffer reusing rocprofiler_buffer_id_t on a different context

* Update tests

- kernel tracing library for integration testing

* Add cereal submodule

* Update lib/rocprofiler/registration.*

- OnUnload
- Support ROCP_TOOL_LIBRARIES for python usage
- improve finalize function
- remove calling hsa_shut_down in finalize function

* Update lib/rocprofiler/buffer.*

- allocate_buffer sets the buffer id value
- expose (internally) is_valid_buffer_id
- update test

* Update tests/kernel-tracing

- installation
- better organization of JSON groups
- improved messaging

* Update lib/rocprofiler/registration.cpp

- add workaround for hsa-runtime supporting rocprofiler-register

* Update tests/kernel-tracing/kernel-tracing.cpp

- fix memory leaks

* cereal support for minimal JSON

- update cereal submodule to rocprofiler branch
- change REPO_BRANCH in rocprofiler_checkout_git_submodule for cereal
- update tests/kernel-tracing/kernel-tracing.cpp
  - use minimal json
  - slight tweak: give contexts a name by storing name + context pointer pairs in a map

* Update tests/kernel-tracing/kernel-tracing.cpp

- support runtime selection of contexts via KERNEL_TRACING_CONTEXTS environment variable

* Update tests

- tests/CMakeLists.txt
  - find_package(Python3 REQUIRED)
- tests/kernel-tracing
  - pytest validation

* Update CI workflow

- install pytest
- add checks for test labels

* Update scripts/run-ci.py

- change --coverage options
  - replace 'unittests' with 'tests'
- replace test label regex '-L unittests' with '-L tests'

* Update requirements.txt

- this is now an empty file since none of the packages are required for this repo
2023-11-16 03:21:39 -06:00
Jonathan R. Madsen 086218c2eb Fixes licensing in files (#206)
* Update LICENSE

- fix inconsistencies

* Revert lib/rocprofiler/counters/parser/scanner.cpp

* Update lib/rocprofiler/counters/tests/dimension.cpp

- revert ending curly brace

* Revert missing curly braces

- missing curly braces when file did not end with a new line
2023-11-14 10:58:33 -06:00
Jonathan R. Madsen 3082288a25 Code object, kernel dispatch, and memory copy tracing (#177)
* Update samples/api_buffered_tracing

- external correlation id
- support ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH

* Update lib/rocprofiler/context.cpp

- update alternative get_active_contexts paradigm

* Update lib/rocprofiler/external_correlation.cpp

- inherit correlation id from main thread

* Update lib/rocprofiler/hsa/queue.*

- typedef changes
- rocprofiler_packet union
- modify Queue::queue_info_session_t
  - use rocprofiler_packet
  - add thread id
  - add kernel id
  - add correlation id
- out of line definitions
- AsyncSignalHandler function update
  - handle kernel dispatch tracing
- Move CreateBarrierPacket and AddVendorSpecificPacket to lambdas
- handle contexts

* Update lib/rocprofiler/hsa/hsa.cpp

- remove unnecessary log function
- use new get_active_contexts paradigm
- use new correlation id updates

* Update AgentCache and kernel dispatch record

- include const rocprofiler_agent_t* in rocprofiler_buffer_tracing_kernel_dispatch_record_t
- AgentCache::get_rocp_agent returns const pointer

* Replace ROCPROFILER_SERVICE_ with ROCPROFILER_

* source formatting

* Code Object Tracing

- include/rocprofiler/callback_tracing.h
  - remove rocprofiler_callback_tracing_code_object_unload_data_t
  - remove rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t
- include/rocprofiler/fwd.h
  - remove ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT_UNLOAD
  - remove ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_UNREGISTER
- lib/common/utility.hpp
  - assert_public_api_struct_properties()
  - init_public_api_struct(...)
- lib/rocprofiler/registration.cpp
  - invoke hsa::code_object_init
- lib/rocprofiler/hsa/CMakeLists.txt
  - compile code_object code
- lib/rocprofiler/hsa/code_object.{hpp,cpp}
  - tracing code object load/unload
- lib/rocprofiler/hsa/queue.cpp
  - get_kernel_id

* Update lib/rocprofiler/hsa/hsa.cpp

- fix should_wrap_functor logic (which was not handling callback_tracer + buffered_tracer properly)

* Update lib/rocprofiler/hsa/queue.cpp

- fix rocprofiler_buffer_tracing_kernel_dispatch_record_t construction

* Update samples/api_buffered_tracing/client.cpp

- print kernel names

* Move samples/apps to tests/apps

* Update lib/rocprofiler/hsa/code_object.cpp

- ensure unload callbacks when application is exiting
- support user data in between load/unload callbacks

* Update lib/rocprofiler/hsa/queue.{hpp,cpp}

- store contexts and external correlation ids in queue_info_session
- reduce signal_limiter to 96 to fix hangs
- fix support for kernel tracing and async memory copies

* Add lib/common/scope_destructor.hpp

- similar to static_cleanup_wrapper but different

* Update include/rocprofiler/buffer_tracing.h

- update rocprofiler_buffer_tracing_memory_copy_record_t
- remove operation: user can figure that out from correlation id
- add kernel id
- add rocprofiler agent id

* Update include/rocprofiler/callback_tracing.h

- fix data type of load_delta field in code object
- remove rocp_agent from kernel_symbol_register_data_t (known via code_object_id)

* Add samples/code_object_tracing

- sample demonstrating code object tracing

* Update samples

- minor tweak to print_call_stack

* Update lib/rocprofiler/hsa/code_object.cpp

- flip ordering of unload callbacks for code object unloading and kernel symbol deregistering

* clang-tidy fixes

* Update lib/rocprofiler/hsa/code_object.cpp

- fix heap-use-after-free issue with code object

* Update include/rocprofiler/external_correlation.h

- update documentation to include info about default value of external correlation value

* Use common::container::small_vector for contexts

- small_vector<const context*> is an ideal data structure for array of active contexts

* Update context handling for code object unload

- code object unload is only called for contexts which received the load callback

* Update samples

- improve ROCPROFILER_CALL macro to include status string
- api_buffered_tracing handles ROCPROFILER_STATUS_ERROR_BUFFER_BUSY

* Code object shutdown

- ensure code object callbacks are invoked prior to finalizing

* Update lib/common (memory allocators)

- added lib/common/memory folder with allocators

* Add lib/rocprofiler/allocator.*

- rocprofiler::allocator::static_data_allocator
  - special allocator for static data which finalizes before any data gets destroyed
- rocprofiler::allocator::unique_static_ptr_t
  - unique_ptr that uses static data deleter (ensure finalize is called)

* Update lib/rocprofiler/buffer.cpp

- flush checks fini status
- use unique_static_ptr_t

* Update lib/rocprofiler/internal_threading.*

- change meaning of thread_pool_t and task_group_t
- improve finalization to prevent data races and heap-use-after-free

* Update lib/rocprofiler/registration.cpp

- use static_data_allocator for client_library vector

* Update lib/rocprofiler/context/context.*

- use allocator::unique_static_ptr_t

* Update lib/rocprofiler/allocator.cpp

- avoid deadlock in deleter<static_data>::operator()

* Update lib/rocprofiler/registration.cpp

- avoid deadlock in rocprofiler::registration::finalize()

* Update lib/rocprofiler/hsa/code_object.cpp

- suppress duplicate reporting of code-object/kernel-symbol load/unload

* Update leak sanitizer suppressions

- __new_exitfn (via stdlib/cxa_atexit.c) leaks
2023-11-13 22:30:15 -06:00
Jonathan R. Madsen 55f2dabbb3 Generalized updates (#174)
- include/rocprofiler/agent.h
  - move rocprofiler_dim3_t
- include/rocprofiler/buffer_tracing.h
  - size fields
  - update kernel dispatch record
- include/rocprofiler/callback_tracing.h
  - remove rocprofiler_callback_tracing_code_object_unload_data_t
  - remove rocprofiler_callback_tracing_code_object_register_host_kernel_symbol_data_t
- include/rocprofiler/fwd.h
  - added ROCPROFILER_STATUS_ERROR_CONTEXT_CONFLICT
  - remove ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT_UNLOAD
  - remove ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_UNREGISTER
  - add rocprofiler_kernel_id_t typedef
  - add rocprofiler_dim3_t (moved from agent.h)
- lib/common/synchronized.hpp
  - rlock/wlock return decltype(auto)
  - separate prototype from definition
- lib/common/utility.{hpp,cpp}
  - timestamp functions replicating HSA implementation(s)
  - init_public_api_struct for setting size field and ensuring certain type traits
  - simplified static_cleanup_wrapper
  - separate prototype from definition in active_capacity_gate
- lib/rocprofiler/agent.cpp
  - tweak get_rocprofiler_agent impl
- lib/rocprofiler/buffer.cpp
  - fix buffer message log level
- lib/rocprofiler/context.cpp
  - use new paradigm for getting active contexts
- lib/rocprofiler/internal_threading.hpp
  - update to simplified static_cleanup_wrapper implementation
- lib/rocprofiler/registration.cpp
  - fix deactivating contexts
- lib/rocprofiler/rocprofiler.cpp
  - status string for context conflict
- lib/rocprofiler/context/context.*
  - correlation_id struct
  - new get_active_contexts paradigm
- lib/rocprofiler/counters/core.*
  - rocprofiler_packet union
  - tweak start/stop context to accept pointer instead of handle
- lib/rocprofiler/counters/dimensions.cpp
  - update to new get_rocp_agent() return type
- lib/rocprofiler/hsa/hsa.*
  - update to new get_active_contexts paradigm
  - update to new correlation id implementation
  - guard against hsa.def.cpp direct compilation
- lib/rocprofiler/hsa/queue_controller.*
  - update to change in get_rocp_agent return type
  - consistent aliases
  - lookup function for getting queue pointer from hsa queue id
- lib/rocprofiler/hsa/queue.*
  - rocprofiler_packet
  - extend queue_info_session_t
- lib/rocprofiler/tests/registration.cpp
  - improve diagnostic on perf check for rocprofiler_lib.callback_registration_lambda_with_result
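
The `rlock`/`wlock` entry for lib/common/synchronized.hpp in the list above says only that these functions now return `decltype(auto)`. A minimal sketch of what that return type buys is given here. The class name `synchronized_sketch`, its members, and the helper `synchronized_demo` are hypothetical and are not taken from the repository. A single `std::mutex` guards both accessors for brevity, whereas a real reader/writer wrapper would likely use a shared lock for reads.

```cpp
#include <cassert>
#include <mutex>
#include <utility>

// Hypothetical wrapper (not the repository's class): the data can only be
// reached through a functor that runs while the lock is held.
template <typename T>
class synchronized_sketch
{
public:
    explicit synchronized_sketch(T value)
    : m_data(std::move(value))
    {}

    // Read access: the functor receives a const reference.
    // decltype(auto) forwards exactly what the functor returns
    // (a value, a reference, or void) without forcing a copy.
    template <typename Func>
    decltype(auto) rlock(Func&& func) const
    {
        std::lock_guard<std::mutex> guard{m_mutex};
        return std::forward<Func>(func)(m_data);
    }

    // Write access: the functor receives a mutable reference.
    template <typename Func>
    decltype(auto) wlock(Func&& func)
    {
        std::lock_guard<std::mutex> guard{m_mutex};
        return std::forward<Func>(func)(m_data);
    }

private:
    mutable std::mutex m_mutex;
    T                  m_data;
};

// Small self-check exercising both accessors.
inline int synchronized_demo()
{
    synchronized_sketch<int> value{41};
    value.wlock([](int& v) { ++v; });  // void-returning functor
    const int result = value.rlock([](const int& v) { return v; });
    assert(result == 42);
    return result;
}
```

Because `decltype(auto)` preserves reference return types, a functor that returns a reference to the guarded data lets that reference escape the lock, so callers of such a wrapper normally return by value.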
2023-11-06 21:59:31 -06:00
Saurabh Verma 63775f241a Evaluation portion for metrics (#123)
* EvaluateAST and validation of RawAST

* Adding MetricDimension class and concepts

* set_dimensions() and improved ValidateRawAST()

* source formatting (clang-format v11) (#124)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Addressing 1st round of review comments

* Modified the parser production rules to support the right syntax for REDUCE and SELECT derived metric expressions

* changes to raw_ast.hpp and fmt::format()

* Parser tests updated to support corrected REDUCE and SELECT syntax

* changes to EvaluateAST::set_dimensions() and other dimension related code changes

* Added a test for EvaluateAST::evaluate() to test basic arithmetic on EvaluateAST

* Format source code (via clang-format v11) on sauverma/evaluate-ast (#146)

* source formatting (clang-format v11)

* Add dimension information to counter record

Restructures counter records to have the following design:

rocprofiler_record_id_t which is an int64_t that encodes
both the counter id and dimension information for the
record. The first 16 bits are reserved for the counter id while
the last 48 are split among the dimensions specified in
rocprofiler_dimension_t (currently 8 bits per dimension).
Each dimension's 8-bit field stores the index along that
dimension for this record (i.e. a value of 8
on dimension XCC would denote XCC[8] for the counter). The
split among the dimensions will automatically adjust as
dimensions are added or removed.

The record also contains a union of {int64_t hw_counter, double
derived_counter} that holds the value of the record identified by
rocprofiler_record_id_t. int64_t is used for a physical hardware
counter with an integer value, while double is used for derived
counters (which of the two types a given counter's values use must
be queried separately).
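
The packing described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's implementation: the counter id is assumed to sit in the upper 16 bits, the six 8-bit dimension fields are assumed to be laid out from bit 0 upward, an unsigned 64-bit integer is used in place of int64_t to keep the shifts well defined, and every name (`record_id_t`, `make_record_id`, `set_dim`, `get_dim`, `get_counter_id`, `counter_record`) is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout for illustration only:
//   bits 63..48 : counter id (16 bits)
//   bits 47..0  : six dimension fields, 8 bits each
using record_id_t = std::uint64_t;

constexpr unsigned kDimBits      = 8;
constexpr unsigned kNumDims      = 6;                    // 48 / 8
constexpr unsigned kCounterShift = kDimBits * kNumDims;  // 48

// Start a record id that carries only the counter id.
inline record_id_t make_record_id(std::uint16_t counter_id)
{
    return record_id_t{counter_id} << kCounterShift;
}

// Store the index `pos` in the 8-bit field of dimension `dim`.
inline record_id_t set_dim(record_id_t id, unsigned dim, std::uint8_t pos)
{
    assert(dim < kNumDims);
    const unsigned    shift = dim * kDimBits;
    const record_id_t mask  = record_id_t{0xFF} << shift;
    return (id & ~mask) | (record_id_t{pos} << shift);
}

inline std::uint16_t get_counter_id(record_id_t id)
{
    return static_cast<std::uint16_t>(id >> kCounterShift);
}

inline std::uint8_t get_dim(record_id_t id, unsigned dim)
{
    assert(dim < kNumDims);
    return static_cast<std::uint8_t>((id >> (dim * kDimBits)) & 0xFF);
}

// Record value: integer for hardware counters, double for derived ones.
struct counter_record
{
    record_id_t id;
    union
    {
        std::int64_t hw_counter;
        double       derived_counter;
    };
};
```

With this layout, `set_dim(make_record_id(751), 1, 8)` yields an id whose counter id reads back as 751 and whose dimension 1 reads back as 8. If dimensions are added or removed, only `kNumDims` and `kDimBits` would change in this sketch.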

* Integration of new id type + other fixes

---------

Co-authored-by: sauverma93 <sauverma93@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>

* Fixed issues with reduce() implementation and added a test for reduce()

* Updated parser syntax for reduce() and updated the parser test. Disabled the test for select()

* Build warning fixes

* Modifications to support fetching xcc/etc info from agent

* Initial plumbing working for single counters, cleanup+tests still needed

* Remove string comparison from reduce ops

* source formatting (clang-format v11) (#163)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* cmake formatting (cmake-format) (#164)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* source formatting (clang-format v11) (#171)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Merged with master

* source formatting (clang-format v11) (#172)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* source formatting (clang-format v11) (#173)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Test fix

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
Co-authored-by: sauverma93 <sauverma93@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2023-11-03 21:10:40 -07:00
Jonathan R. Madsen 4f2dc896d3 Support Tool Intercept API Tables (#165)
* Update include/rocprofiler

- intercept_table.h header
- generic rocprofiler_runtime_library_t
- rocprofiler_internal_thread_library_t is now a typedef for rocprofiler_runtime_library_t
- rocprofiler_at_runtime_api_registration

* Update lib/rocprofiler

- minor tweaks to context.cpp
  - check if none context early
  - disallow stop_context when finalizing
- add intercept_table.hpp and intercept_table.cpp
  - implement rocprofiler_at_runtime_api_registration
  - implement notify_runtime_api_registration
- update registration.cpp
  - invoke notify_runtime_api_registration
  - tweak to fini status when invoking client finalizer

* Update lib/rocprofiler/tests

- add tests for intercept table

* Add samples/intercept_table

- demonstrate how to install custom API function wrappers instead of relying on HSA callback tracing
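
The general idea of installing a wrapper into an API function table can be sketched as below. This is a generic illustration and does not use the rocprofiler or HSA API: `api_table`, `real_add`, `wrapped_add`, `saved_add`, `call_count`, and `install_wrappers` are all hypothetical stand-ins, and the real sample's table types and registration call are not shown here.

```cpp
#include <cassert>

// Hypothetical stand-in for a runtime API table:
// one function pointer per API entry point.
struct api_table
{
    int (*add)(int, int);
};

// The "runtime" implementation that the table initially points at.
inline int real_add(int a, int b) { return a + b; }

// State kept by the tool: the original entry and a call counter.
inline int (*&saved_add())(int, int)
{
    static int (*fn)(int, int) = nullptr;
    return fn;
}

inline int& call_count()
{
    static int count = 0;
    return count;
}

// Wrapper with the same signature as the wrapped entry point.
inline int wrapped_add(int a, int b)
{
    assert(saved_add() != nullptr);
    ++call_count();            // tool-side work before the real call
    return saved_add()(a, b);  // forward to the original implementation
}

// Save the original pointer, then swap the wrapper into the table.
inline void install_wrappers(api_table& table)
{
    saved_add() = table.add;
    table.add   = wrapped_add;
}
```

After `install_wrappers(table)`, every call made through `table.add` passes through `wrapped_add` first, which is what lets a tool observe or modify calls without relying on a callback tracing service.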

* Update lib/rocprofiler/tests/intercept_table.cpp

- remove _SERVICE from ROCPROFILER_SERVICE_

* Update include/rocprofiler/intercept_table.h

- Update doxygen comments

* Update lib/rocprofiler/intercept_table.cpp

- return error config locked if already initialized

* Update lib/rocprofiler/intercept_table.cpp

- remove unnecessary alias

* Apply suggestions from code review

Co-authored-by: Tony Tye <Tony.Tye@amd.com>

* Update doxygen comments

- clarify when rocprofiler_at_runtime_api_registration can be invoked

* Use rocprofiler_runtime_library_t for intercept table and internal threading

- remove rocprofiler_intercept_library_t alias to rocprofiler_runtime_library_t
- remove rocprofiler_internal_thread_library_t alias to rocprofiler_runtime_library_t
- move around documentation with regard to rocprofiler_runtime_library_t enumeration
- added some extra doxygen documentation to internal threading functions

---------

Co-authored-by: Tony Tye <Tony.Tye@amd.com>
2023-11-02 19:10:10 -05:00
Jonathan R. Madsen 14373c57be Doxygen Improvements (#170)
* Doxygen updates

- Fix multiple @param where [in]/[out] was misplaced
- Fix @return
- Insert @retval
- Separate out installing conda environment from build docs step
2023-11-01 15:31:15 -05:00
Jonathan R. Madsen 033fd941e0 Remove SERVICE_ from ROCPROFILER_SERVICE_* enum vals (#168)
- these are unnecessary and inconsistent with the naming convention of everything else related to callback tracing
2023-10-31 15:06:03 -05:00
Jonathan R. Madsen cfbea0e5eb Update include/rocprofiler and lib/rocprofiler (#166)
- renamed inconsistent callback tracing types
- updated HIP and Marker API data structures (resemble HSA)
- cleaned up api_args.h and api_id.h headers
- cleaned up hsa.h, hip.h, and marker.h headers
- update to use (more consistent) name changes
- update code object data structs
- ROCPROFILER_SERVICE_CALLBACK_PHASE_{LOAD,UNLOAD} equivalent to ENTER, EXIT respectively
2023-10-31 12:48:24 -05:00
Jonathan R. Madsen 7f631de401 Separate agent cache from queue controller (#145)
* Update lib/rocprofiler/agent.{hpp,cpp}

- get_agents() function for internal access to agent pointers

* Update AgentCache

- make member variables and member functions clearly distinguish between the HSA agent and the rocprofiler agent

* Change ctor of AgentCache

* Update lib/rocprofiler/hsa/queue_controller.cpp

- QueueController::init uses agent::get_agent_cache

* Update lib/rocprofiler/hsa/agent_cache.*

- member function to get index
- operator== for rocprofiler_agent_t and hsa_agent_t
- removed hsa_iterate_agents from ctor (now in agent.cpp)

* Update lib/rocprofiler/agent.*

- construct_agent_cache function
- functions for rocprofiler agent <-> HSA agent
- functions for getting agent cache

* Update lib/rocprofiler/registration.cpp

- invoke construct_agent_cache when HSA table is received

* Update lib/rocprofiler/agent.cpp

- loosen failure conditions
- handle spurious duplicate entry warning

* Update lib/rocprofiler/agent.cpp

- improve read_map diagnostics

* Update lib/rocprofiler/agent.cpp

- avoid infinite loop in read_map

* Update lib/rocprofiler/agent.cpp

- handle empty kfd node properties file

* Update lib/rocprofiler/agent.cpp

- check for permissions to read a node properties file

* Update lib/rocprofiler/agent.cpp

- more checks on file readability

* Update lib/rocprofiler/tests/agent.cpp

- print virtual kfd topology

* Update lib/rocprofiler/tests/agent.cpp

- verify id.handle == hsa_agent internal node id

* Update lib/rocprofiler/tests/agent.cpp

- check node_id
- check location id
- check device id
- update abi test

* Update include/rocprofiler/agent.h

- add node_id field
- add reserved0 field to ensure new field increases struct size

* Update lib/rocprofiler/agent.cpp

- node_id instead of id.handle

* Update lib/rocprofiler/agent_cache.cpp

- node_id instead of id.handle

* Update samples/pc_sampling

- node_id for agent instead of id.handle

* Update lib/rocprofiler/buffer.cpp

- remove debug prints
2023-10-19 19:04:02 -05:00
Jonathan R. Madsen 87cc748c3d Query callback and buffered tracing names (#135)
* Update include/rocprofiler/buffer_tracing.h

- add query functions for kind name, and kind operation name
- tweak iterate functions to not be specifically dedicated to names

* Update include/rocprofiler/callback_tracing.h

- add query functions for kind name, and kind operation name
- tweak iterate functions to not be specifically dedicated to names

* Update lib/rocprofiler/callback_tracing.cpp

- implement rocprofiler_query_callback_tracing_kind_name
- implement rocprofiler_query_callback_tracing_kind_name_buf
- implement rocprofiler_query_callback_tracing_kind_operation_name
- implement rocprofiler_query_callback_tracing_kind_operation_name_buf
- implement rocprofiler_iterate_callback_tracing_kinds
- implement rocprofiler_iterate_callback_tracing_kind_operations

* Update lib/rocprofiler/buffer_tracing.cpp

- implement rocprofiler_query_buffer_tracing_kind_name
- implement rocprofiler_query_buffer_tracing_kind_name_buf
- implement rocprofiler_query_buffer_tracing_kind_operation_name
- implement rocprofiler_query_buffer_tracing_kind_operation_name_buf
- implement rocprofiler_iterate_buffer_tracing_kinds
- implement rocprofiler_iterate_buffer_tracing_kind_operations

* Update lib/rocprofiler/tests/registration.cpp

- use new implementation for getting callback/buffer tracing names

* Update samples/api_buffered_tracing

- use new implementation for getting callback/buffer tracing names

* Update samples/api_callback_tracing

- use new implementation for getting callback/buffer tracing names

* Remove buffered query functions

- *_buf variants of the rocprofiler_query_X_tracing_Y functions were removed since we currently have no names requiring these functions

* Rename ROCPROFILER_STATUS_ERROR_DOMAIN_NOT_FOUND

- "DOMAIN" changed to "KIND" since the former is specific to tracing, whereas "kind" is used more generically
2023-10-19 15:21:07 -05:00
Jonathan R. Madsen 6a3f79e626 Update correlation id definition + status strings + const active contexts (#127)
* Update include/rocprofiler

- remove rocprofiler_external_correlation_id_t
- redefine rocprofiler_correlation_id_t to include internal id and external user data
- associate rocprofiler_push_external_correlation_id and rocprofiler_pop_external_correlation_id with a context
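A rough sketch of the data model described above, assuming a correlation id that carries both a library-generated internal value and a user-supplied external value, with push/pop stacks keyed by context. All type and member names here are hypothetical; the real rocprofiler definitions may differ, and this sketch ignores per-thread handling.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical combined id: internal value plus external user data.
struct correlation_id
{
    std::uint64_t internal;  // generated by the library
    std::uint64_t external;  // most recently pushed user value, 0 if none
};

// Hypothetical per-context stacks of external correlation values.
class external_correlation_stacks
{
public:
    void push(std::uint64_t context_id, std::uint64_t value)
    {
        m_stacks[context_id].push_back(value);
    }

    // Pops the latest value for the context; returns false if nothing was pushed.
    bool pop(std::uint64_t context_id, std::uint64_t& value)
    {
        auto itr = m_stacks.find(context_id);
        if(itr == m_stacks.end() || itr->second.empty()) return false;
        value = itr->second.back();
        itr->second.pop_back();
        return true;
    }

    // Combines an internal id with the context's current external value.
    correlation_id make_record_id(std::uint64_t context_id, std::uint64_t internal) const
    {
        correlation_id id{internal, 0};
        auto itr = m_stacks.find(context_id);
        if(itr != m_stacks.end() && !itr->second.empty()) id.external = itr->second.back();
        return id;
    }

private:
    std::unordered_map<std::uint64_t, std::vector<std::uint64_t>> m_stacks;
};
```

Keying the stacks by context means a value pushed for one context never appears in records produced for another.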

* Update include/rocprofiler/rocprofiler.h

- rocprofiler_get_status_name
- rocprofiler_get_status_string

* Update lib/rocprofiler/rocprofiler.cpp

- implement rocprofiler_get_status_name and rocprofiler_get_status_string

* Update lib/rocprofiler/tests/status.cpp

- unit test for status string and name

* Update lib/rocprofiler/tests/registration.cpp

- update to new rocprofiler_correlation_id_t

* Update samples

- update to new rocprofiler_correlation_id_t

* Add lib/rocprofiler/external_correlation.cpp

- placeholder for external correlation push/pop

* Update lib/rocprofiler/hsa/agent_cache.cpp

- slight tweak to when HSA_AMD_AGENT_INFO_NEAREST_CPU is defined

* Update context implementation and hsa.cpp

- get_active_contexts is array of const context pointers
- update hsa_api_impl<Idx>::functor to new rocprofiler_correlation_id_t

* Update include/rocprofiler/fwd.h

- add ROCPROFILER_STATUS_ERROR_INVALID_ARGUMENT
- reorder enum for consistency

* Update include/rocprofiler/external_correlation.h

- doxygen comments
- thread id parameter

* Update include/rocprofiler/rocprofiler.h

- add rocprofiler_get_thread_id function (needed for external corr id)

* Update lib/common/synchronized.hpp

- explicit LockedType
- define all copy/move ctor and assignment
- update rlock/wlock/ulock to support arguments and return values
- Support additional template parameter for special case of synchronized instance which is the mapped type of a synchronized map
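A simplified sketch of a lock wrapper whose `rlock`/`wlock` accept extra arguments and return the callable's result, as the bullets above describe. The class name `synchronized_value` and its layout are assumptions for illustration (C++17); this is not the contents of lib/common/synchronized.hpp and omits `ulock` and the extra template parameter.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <utility>

// Hypothetical wrapper: the data is only reachable while a lock is held.
template <typename Tp>
class synchronized_value
{
public:
    explicit synchronized_value(Tp value)
    : m_data(std::move(value))
    {}

    // Shared (read) lock: calls func(const Tp&, args...) and returns its result.
    template <typename FuncT, typename... Args>
    decltype(auto) rlock(FuncT&& func, Args&&... args) const
    {
        std::shared_lock<std::shared_mutex> lk{m_mutex};
        return std::forward<FuncT>(func)(m_data, std::forward<Args>(args)...);
    }

    // Exclusive (write) lock: calls func(Tp&, args...) and returns its result.
    template <typename FuncT, typename... Args>
    decltype(auto) wlock(FuncT&& func, Args&&... args)
    {
        std::unique_lock<std::shared_mutex> lk{m_mutex};
        return std::forward<FuncT>(func)(m_data, std::forward<Args>(args)...);
    }

private:
    mutable std::shared_mutex m_mutex;
    Tp                        m_data;
};
```

Callers should return values rather than references from the callable, since a reference to the guarded data would outlive the lock.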

* Update lib/rocprofiler/external_correlation.cpp

- implement rocprofiler_{push,pop}_external_correlation_id

* Update lib/rocprofiler/CMakeLists.txt

- external_correlation.hpp

* Update lib/rocprofiler/rocprofiler.cpp

- status string for ROCPROFILER_STATUS_ERROR_INVALID_ARGUMENT
- implement rocprofiler_get_thread_id

* Update lib/rocprofiler/tests (external correlation)

- add external_correlation unit tests

* Update include/rocprofiler/callback_tracing.h

- doxygen comments
- callback invoked in callback tracing has user_data pointer passed to it

* Update samples/api_callback_tracing/client.cpp

- add rocprofiler_user_data_t* to tool_tracing_callback

* Update lib/rocprofiler/tests/registration.cpp

- add rocprofiler_user_data_t* to tool_tracing_callback

* Update lib/rocprofiler/context/context.{hpp,cpp}

- update correlation_tracing_service
  - external_correlation instance
  - rename get_unique_record_id to get_unique_internal_id

* Update lib/tests/common/demangling.cpp

- tweak mangled definitions due to changing function get_unique_record_id to get_unique_internal_id

* Update lib/rocprofiler/hsa/hsa.cpp

- handle updates to external correlation id
- handle updates to callback signature in callback tracing

* Update CMakeLists.txt

- CMAKE_BUILD_TYPE=Coverage defines CODECOV=1

* Update samples/api_callback_tracing/client.cpp
2023-10-18 13:59:41 -05:00
Jonathan R. Madsen d1518c65b2 Miscellaneous Updates (const-correctness, logic fixes, etc.) (#126)
* Update lib/rocprofiler/hsa/hsa.cpp

- fix logic for constructing callback_contexts and buffered_contexts arrays

* Update include/rocprofiler/{agent,fwd,pc_sampling}.h

- remove rocprofiler_pc_sampling_config_array_t due to const problems
- update rocprofiler_agent_t to use arrays to const data
- remove redundant rocprofiler_query_pc_sampling_agent_configurations
  - this implementation is quite literally looking up info in the agent struct that was passed

* Update lib/rocprofiler/pc_sampling.cpp

- remove rocprofiler_query_pc_sampling_agent_configurations

* update lib/rocprofiler/agent.cpp

- handle const fields
- make mi200_pc_sampling_config variable static

* Update lib/rocprofiler/tests/agent.cpp

- tweak to pc_sampling_configs offset

* Update samples/pc_sampling

- Update sample to reflect minor tweaks to pc_sampling_configs in rocprofiler_agent_t

* Update CI workflow

- remove 'if: ${{ always() }}'
  - I suspect this is why the jobs do not cancel in progress correctly
2023-10-17 00:39:41 -05:00
Benjamin Welton 010693b795 Agent, Counters, and AQL (#55)
* Migrate XML counter defs and reader from v1/v2

* Current Working Set

* Modified parser

* Evaluate AST Start

* Update lib/common/xml

- move definitions out of class declaration

* Update lib/rocprofiler/counters/parser

- update the bison and flex build
  - reproducible generation
- add ROCPROFILER_REGENERATE_COUNTERS_PARSER option
- fix namespacing

* Update lib/rocprofiler/counters/xml

- change location of XML files and install them

* Update lib/rocprofiler/counters/tests

- normalize the test names
- improve test failure messages (clearer about where the failure is)

* Update lib/rocprofiler/counters

- fix namespace
- update to new XML metrics directory

* Update lib/rocprofiler/CMakeLists.txt

- link to object library

* Update lib/rocprofiler/hsa/types.hpp

- reorganize includes

* Add metric loading class/printers

* Agent Implementation

* Queue Implementation (#79)

* Queue Implementation

* API Implementation For Counters (part 1) (#80)

* API Implementation For Counters

* Bewelton/counter collection 3 (#84)

* Added counter sample

* More changes

* More changes

* Update samples/counter_collection

- mostly formatting

* Update include/rocprofiler/counters.h

- formatting

* Add lib/common/synchronized.hpp

- Synchronized struct

* Update lib/rocprofiler/counters/xml/basic_counters.xml

- whitespace

* Update scripts/patch-parser.cmake

- tweaks for consistency

* Update lib/rocprofiler/counters/parser/tests/parser_tests.cpp

- formatting

* Update lib/rocprofiler/counters/parser

- improve consistency in rocprofiler-expr-parser-patch
- update parser.{h,cpp} and scanner.cpp
  - formatting + regenerated

* Update lib/rocprofiler/aql

- formatting
- clang-tidy fixes
- guard against memory pool access errors

* Update lib/rocprofiler/aql/tests

- formatting
- update use of get_val
- normalize test names

* Update lib/rocprofiler/counters/tests

- formatting
- patch basic_counters and derived_counters
- normalize test names

* Update lib/rocprofiler/aql/tests

- set_tests_properties

* Update test labels

- fix minor issue with gtest labels

* Update lib/rocprofiler/counters

- formatting
- clang-tidy fixes

* Update lib/rocprofiler/hsa

- fix includes
- formatting
- clang-tidy fixes
- tweak to queue_controller_init interface

* Update lib/rocprofiler

- include fixes
- namespace fixes
- clang-tidy fixes
- formatting

* Update scripts/run-ci.py

- exclude counters/parser from code coverage (generated files)

* Update include/rocprofiler/counters.h

- fix doxygen comment

* Update lib/rocprofiler/aql/packet_construct.cpp

- guard against HSA_AMD_MEMORY_POOL_ACCESS_DISALLOWED_BY_DEFAULT and HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED

* Update lib/rocprofiler/counters/parser/raw_ast.hpp

- clang-tidy fixes

* Update lib/rocprofiler/counters/evaluate_ast.hpp

- clang-tidy fixes

* Update lib/rocprofiler/aql/tests

- disable packet_generation_single and packet_generation_multi tests
  - the entire implementation of rocprofiler::get_ext_table() is incorrect

* Minor fixes before cleanup

* More changes

* More fixes

* More fixes

* source formatting (clang-format v11) (#99)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Revert PTL submodule

* Update scripts/run-ci.py

- exclude counters/parser from code coverage (generated files)

* Migrating counters state to context

* Linting

* source formatting (clang-format v11) (#101)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* revert run-ci

* Testing fixes

* More test changes

* Fix minor typo

* Small queue change

* Small queue change

* source formatting (clang-format v11) (#102)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* source formatting (clang-format v11) (#105)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Documentation Change

* More documentation fixes

* source formatting (clang-format v11) (#106)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Threading fixes

* Threading fixes

* source formatting (clang-format v11) (#107)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Threading fixes

* More test fixes

* More agent fixes

* More build fixes

* source formatting (clang-format v11) (#109)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* changed test timeouts

* Build fix

* Build fix

* Updates to agent

* source formatting (clang-format v11) (#114)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* cmake formatting (cmake-format) (#113)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* remove git worktree folder

* Doc update

* testing fix

* Another test fix

* More test changes

* Rebase

* source formatting (clang-format v11) (#116)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Documentation

* source formatting (clang-format v11) (#119)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* PTL Changes

* Minor agent fix for empty labels

* source formatting (clang-format v11) (#120)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Minor agent fix for empty labels

* Refactor read_map

* source formatting (clang-format v11) (#121)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Refactor read_map

* Cache fixes

* source formatting (clang-format v11) (#122)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
2023-10-16 15:41:40 -05:00