Commit Graph

89 Commits

Author SHA1 Message Date
Jonathan R. Madsen 5eb8c2658c rocprofv3: refactor and reorganize rocprofiler-sdk-tool library (#1138)
* Add rocprofv3-multi-node.md to source/lib/rocprofiler-sdk-tool

* Initial source re-organization

- create "output" static library

* Update include/rocprofiler-sdk/cxx/serialization.hpp

- add GPR count fields to kernel symbol serialization

* Add source/scripts/generate-rocpd.py

- reads one or more JSON output files from rocprofv3 and writes rocpd SQLite3 database
- Note: preliminary implementation

* More reorganization between lib/rocprofiler-sdk-tool and lib/output

* Updates to generate-rocpd.py

- add SQL views
- option: --absolute-timestamps -> --normalize-timestamps
- option: --generic-markers
- misc fixes to get the views working
- support marker names

* Update generate-rocpd.py

- Add --marker-mode option

* Update generate-rocpd.py

- Improve debugging of bad bulk SQLite statements

* Update rocprofv3-multi-node.md

- cleanup of proposed SQL schema

* lib/output/format_path.{hpp,cpp}

- rename format to format_path (in config.hpp and config.cpp)
- move format_path functionality to format_path.{hpp,cpp}

* Rework lib/output/tmp_file_buffer.{hpp,cpp}

* Update output_key.cpp

- support %cwd%, %launch_date%

* Rework lib/output/buffered_output.hpp

* Support csv_output_file constructed via domain_type

* Update lib/output/domain_type.{hpp,cpp}

- get_domain_trace_file_name
- get_domain_stats_file_name

* Update lib/rocprofiler-sdk-tool/tool.cpp

- tweak headers

* Update lib/output/generate*.cpp

- remove include of helpers.hpp
- CSV uses domain_type for filenames

* Update samples/counter_collection/per_dev_serialization.cpp

- make wait_on volatile

* Remove tool_table from lib/output and lib/rocprofiler-sdk-tool

- Also split various structs into their own files
  - lib/output/agent_info
  - lib/output/metadata
  - lib/output/kernel_symbol_info
  - lib/output/counter_info
- Implemented rocprofiler::tool::metadata

* Optimize rocprofiler_tool_counter_collection_record_t

- reduce the size of the struct from 24784 bytes to 8376 bytes

* Introduced output_config

- split subset of config (from tools library) into output_config to be able to configure the output generating functions separately from the tool library
- this is a significant step towards the output generating functions not relying on static global memory

* Stream chunks of data into output instead of loading everything into memory

* Remove duplicate group_segment_size in rocprofiler_kernel_dispatch_info_t serialization

* Adding Q&A to rocprofv3-multi-node.md

* Remove all remaining include lib/rocprofiler-sdk-tool from lib/output

- migrated a fair amount of code from lib/rocprofiler-sdk-tool/helper.hpp to lib/output

* Update Q&A of rocprofv3-multi-node.md

* Fix minor compilation errors + minor cleanup

* Update hsa/async_copy.cpp

- when ROCPROFILER_CI_STRICT_TIMESTAMPS > 0, reduce the active_signal sync wait time

* Update profiling_time.hpp

- fix log messages for when start/end time is less/greater than enqueue/current CPU time

* Fix generate_stats for tool_counter_record_t

* Dictionary optimization for generate-rocpd.py

---------

Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>
2024-11-07 01:15:19 -06:00
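The generate-rocpd.py entry in the commit above describes reading one or more JSON output files and writing a SQLite3 database, with SQL views and a --normalize-timestamps option. A minimal sketch of that flow follows. The JSON layout (`kernel_dispatch` entries with `name`, `start`, `end`), the table and view names, and both function names are assumptions made for illustration; they are not the rocprofv3 output format or the rocpd schema.

```python
import json
import sqlite3


def load_records(json_paths):
    """Collect dispatch records from one or more JSON files.

    Assumed (hypothetical) layout per file:
    {"kernel_dispatch": [{"name": str, "start": int, "end": int}, ...]}
    """
    records = []
    for path in json_paths:
        with open(path, "r", encoding="utf-8") as f:
            doc = json.load(f)
        records.extend(doc.get("kernel_dispatch", []))
    return records


def write_database(records, db_path, normalize_timestamps=False):
    """Write records to SQLite; optionally shift timestamps to start at 0."""
    offset = 0
    if normalize_timestamps and records:
        offset = min(r["start"] for r in records)

    conn = sqlite3.connect(db_path)
    try:
        # Hypothetical table, not the schema proposed in rocprofv3-multi-node.md.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS kernel_dispatch "
            "(name TEXT, start_ns INTEGER, end_ns INTEGER)"
        )
        # Bulk insert with bound parameters.
        conn.executemany(
            "INSERT INTO kernel_dispatch VALUES (?, ?, ?)",
            [(r["name"], r["start"] - offset, r["end"] - offset) for r in records],
        )
        # A view derived from the base table, in the spirit of "add SQL views".
        conn.execute(
            "CREATE VIEW IF NOT EXISTS kernel_duration AS "
            "SELECT name, end_ns - start_ns AS duration_ns FROM kernel_dispatch"
        )
        conn.commit()
    finally:
        conn.close()
```

Calling `write_database(load_records(paths), "out.db", normalize_timestamps=True)` would merge the files into one database whose earliest record starts at time 0.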
Benjamin Welton 4a5b1d98c2 SDK: counter collection serialization per device (#1157)
Migrates profiler_serializer class in QueueController to have an instance per-agent instead of one globally. Other changes in this commit are to allow for maps of the queues associated with each agent to be passed to profiler_serializer when it is turned on/off. Existing test cases cover whether or not the kernels are serialized (multistream app). New test case added to show that this serialization only occurs on a per device level with a kernel launched on one device waiting for a value to be set on the other.
2024-10-25 13:13:36 -07:00
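The per-device serialization described in the commit above can be modeled with host threads. This is only a sketch: `PerAgentSerializer` and its `run` method are hypothetical names, and Python locks stand in for whatever the SDK's profiler_serializer does with HSA queues. It shows why one serializer per agent lets a kernel on one device wait for a value set by a kernel on another, which a single global serializer would block.

```python
import threading
from collections import defaultdict


class PerAgentSerializer:
    """Serialize work per agent: one lock per agent id, none shared globally."""

    def __init__(self):
        self._locks = defaultdict(threading.Lock)  # agent id -> lock
        self._guard = threading.Lock()             # protects the dict itself

    def run(self, agent_id, kernel):
        # Look up (or create) the lock for this agent only.
        with self._guard:
            lock = self._locks[agent_id]
        # Kernels on the same agent run one at a time; kernels on
        # different agents do not block each other.
        with lock:
            return kernel()
```

In this model, a "kernel" on agent 0 that waits on a `threading.Event` can be released by a "kernel" on agent 1 that sets it, mirroring the new test case mentioned in the commit message.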
Jonathan R. Madsen 74facf87a6 CMake: Consistently name CMake Targets (#1082)
* Change all rocprofiler-X target names to rocprofiler-sdk-X

* Update rocprofiler-sdk-config.cmake

- fix install tree target names
- simplify logic for using find w/ components and find w/o components

* Update rocprofiler-sdk-roctx-config.cmake

- simplify logic for using find w/ components and find w/o components

* Update samples/intercept_table/CMakeLists.txt

- demonstrate/test use of `find_package(rocprofiler-sdk ... COMPONENTS ...)`
2024-10-25 11:17:34 -05:00
venkat1361 3f91d90bbc Check to force tools to initialize the ctx id to zero. (#1135)
* Check to force tool to initialize the ctx id to zero.

* initialize rocprofiler_context_id_t with 0 in unit tests

* changelog

---------

Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
2024-10-22 18:09:25 +05:30
Benjamin Welton 210762c69d Added agent_id to rocprofiler_record_counter_t (#1078)
Co-authored-by: Benjamin Welton <ben@amd.com>
2024-10-21 16:29:53 -07:00
Benjamin Welton bb69467765 Renamed agent profiling service to device counting service (#1132)
* Renamed agent profiling service to device counting service

Name more aptly represents what agent profiling did (device wide
counter collection). Conversion of existing user code can be
performed by the following find/sed command:

find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +

* Converted dispatch profile to dispatch counting service

* Debug for functional counters test

* Minor changes for CI

* Minor fix

* More fixes for CI

* Update evaluate_ast.cpp

---------

Co-authored-by: Benjamin Welton <ben@amd.com>
2024-10-18 14:14:11 +05:30
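For projects that prefer not to run the find/sed command from the commit above, the same four renames can be applied from Python. This is a sketch, not part of the SDK: `RENAMES` copies the old/new pairs from the sed expressions, and `migrate_source` is a hypothetical helper name. Unlike sed, `str.replace` treats the `.` in `agent_profile.h` literally rather than as a regular-expression wildcard.

```python
# Old/new identifier pairs, taken from the sed expressions in the commit message
# and applied in the same order.
RENAMES = [
    ("rocprofiler_agent_profile_callback_t",
     "rocprofiler_device_counting_service_callback_t"),
    ("rocprofiler_configure_agent_profile_counting_service",
     "rocprofiler_configure_device_counting_service"),
    ("agent_profile.h", "device_counting_service.h"),
    ("rocprofiler_sample_agent_profile_counting_service",
     "rocprofiler_sample_device_counting_service"),
]


def migrate_source(text):
    """Return text with the agent-profiling names replaced by the new names."""
    for old, new in RENAMES:
        text = text.replace(old, new)  # literal, non-regex replacement
    return text
```

Applying `migrate_source` to the contents of each source file and writing the result back reproduces the effect of `sed -i` for these identifiers.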
itrowbri 8d7be2e4b4 SWDEV-483130: Replace calls to deprecated functions hipHostMalloc/hipHostFree (#1070)
* SWDEV-483130: Replace calls to deprecated functions hipHostMalloc/hipHostFree

* SWDEV-483130: Replace calls to deprecated functions hipHostMalloc/hipHostFree. Moved definitions from lib/commons/defines.hpp to samples/common/defines.hpp and tests/common/defines.hpp

* Updated comment for clarity

* Update tests/rocprofv3/aborted-app/validate.py

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Formatting

* Formatting

* Updated CHANGELOG

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-09-12 18:31:00 -05:00
Vladimir Indic 93e82663d9 PC sampling: online partial PC sampling decoding (#1004)
* PC sampling: online partial PC sampling decoding

PC sampling service decodes a PC sample partially
by replacing the PC with an id of the loaded code object instance
containing PC and the offset of the PC within that code object instance.

* PC sampling: marker records removed

* PC sampling parser: minor doc update in mock

* PC sampling: introducing rocprofiler_pc_t

* NULL value of the code object id introduced.

* Clarifying documentation related to PC offset.

* PC offset documentation improvement

* PC sampling parser benchmark: Reducing the number of samples to recreate half of performance.
2024-09-05 11:35:46 -05:00
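The partial decoding described in the commit above replaces a raw PC with the id of the loaded code object instance containing it plus the offset of the PC within that instance. A minimal sketch of such a lookup follows. The `CodeObject` fields, the `decode_pc` name, and the use of 0 as the NULL code object id are assumptions for illustration; the sketch also assumes load ranges do not overlap and that a PC outside every range is reported with the NULL id and its raw value.

```python
import bisect
from dataclasses import dataclass

NULL_CODE_OBJECT_ID = 0  # assumed sentinel value, not confirmed by the source


@dataclass(frozen=True)
class CodeObject:
    code_object_id: int
    load_base: int  # first address of the loaded instance
    load_size: int  # size of the loaded instance in bytes


def decode_pc(code_objects, pc):
    """Map a raw PC to (code_object_id, offset within that code object)."""
    objs = sorted(code_objects, key=lambda co: co.load_base)
    bases = [co.load_base for co in objs]
    # Index of the last code object whose base address is <= pc.
    idx = bisect.bisect_right(bases, pc) - 1
    if idx >= 0:
        co = objs[idx]
        if pc < co.load_base + co.load_size:
            return (co.code_object_id, pc - co.load_base)
    # PC is not inside any loaded code object.
    return (NULL_CODE_OBJECT_ID, pc)
```

A consumer holding the (id, offset) pair can later resolve the instruction from the code object itself, without needing the address at which it happened to be loaded.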
Jonathan R. Madsen 5d54682468 Misc cleanup and stale code removal (#1026)
* Remove custom allocators

- remove unused lib/rocprofiler-sdk/allocator.*
- remove unused lib/rocprofiler-sdk/context/allocator.hpp

* Fix rocprofiler_strip_target (rocprofiler_utilities.cmake)

* Remove old HSA_TOOLS_LIB support

- remove OnLoad/OnUnload functions used by HSA_TOOLS_LIB env variable

* Fix linter warnings + specific NOLINT exceptions

- replace bare NOLINT with NOLINT(<warning-name>)
2024-08-20 01:07:32 -05:00
Jonathan R. Madsen bb25376480 Misc API cleanup and consistency fixes (#1023)
- ROCPROFILER_API after function
- use rocprofiler_tracing_operation_t in lieu of uint32_t where appropriate
- rocprofiler_tracing_operation_t is now an int32_t typedef (formerly uint32_t)
- use const T* instead of T* where appropriate
2024-08-20 01:06:12 -05:00
Jonathan R. Madsen 20e07caad4 Reorganize thread trace codeobj headers (#1001)
* include/rocprofiler-sdk/cxx/codeobj

- Relocated from include/rocprofiler-sdk/amd_detail/rocprofiler-sdk-codeobj

* Update include/rocprofiler-sdk/cxx

- cmake updates
- correct namespace rocprofiler::codeobj -> rocprofiler::sdk::codeobj

* Update codeobj tests and samples
2024-08-01 00:10:09 -05:00
Vladimir Indic 0f89f0449d PC sampling: chiplet id + integration test fix (#983)
* PCS: show chiplet; cover loading/unloading in integration test

* Use (code_object_id, pc_addr) pair as instruction id.
2024-07-22 16:00:59 +05:30
Jonathan R. Madsen 1e49b43738 Miscellaneous updates (#959)
- missing-new-line CI job: ensures all source files end with new line
- logging updates
- add new line to the end of many files
- fix header include ordering in misc places
- transition to use hsa::get_core_table() and hsa::get_amd_ext_table() in various places instead of making copies
2024-07-08 16:50:32 -05:00
Giovanni Lenzi Baraldi 78fd8cb379 Returning code object id information in code_printing.cpp:Instruction (#965)
* Returning code object id information in code_printing.cpp:Instruction

* Adding assertions

* Simplifying decoder library
2024-07-08 16:59:40 -03:00
Giovanni Lenzi Baraldi a045947a89 Removing cache of decoded lines and returning shared_ptr (#953) 2024-06-25 16:00:59 -03:00
Benjamin Welton 81d1407565 Incremental Counter Profile Creation (#933)
* Incremental Counter Profile Creation

Adds support for incremental counter creation by changing the behavior
of rocprofiler_create_profile_config:

rocprofiler_create_profile_config(rocprofiler_agent_id_t           agent_id,
                                  rocprofiler_counter_id_t*        counters_list,
                                  size_t                           counters_count,
                                  rocprofiler_profile_config_id_t* config_id)

The behavior of this function now allows an existing config_id to be
supplied via config_id. The counters contained in this config will be
copied over and used as a base for a new config along with any counters
supplied in counters_list. The new config id is returned via config_id
and can be used in future dispatch/agent counting sessions.

A new config is created rather than modifying an existing one since there
is no guarantee that the existing config isn't already in use. While we
could add locks (or other mutual exclusion mechanisms) to check if it's
in use and reject an update, the benefit from doing so is minor in
comparison to just creating a new config. This also supports a common
pattern in which a tool adds counters at some point later on
during execution. Now it can do that without destroying the existing
config.

---------

Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-06-19 00:11:03 -07:00
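The copy-then-extend behavior described in the commit above can be illustrated with a small model. This is not the SDK implementation: `ProfileConfigRegistry`, its methods, and the counter strings are hypothetical, and skipping counters already present in the base config is an assumption. The point it shows is that supplying an existing config id yields a new id while the original config stays untouched.

```python
import itertools


class ProfileConfigRegistry:
    """Toy model of incremental profile config creation."""

    def __init__(self):
        self._configs = {}                  # config id -> tuple of counters
        self._next_id = itertools.count(1)  # ids are never reused

    def create_profile_config(self, counters, base_config_id=None):
        # Start from a copy of the base config, if one was supplied.
        merged = []
        if base_config_id is not None:
            merged = list(self._configs[base_config_id])
        # Append new counters (assumption: duplicates are skipped).
        for counter in counters:
            if counter not in merged:
                merged.append(counter)
        # Always store under a fresh id; the base config is never mutated,
        # so a config that is already in use cannot change underneath its user.
        new_id = next(self._next_id)
        self._configs[new_id] = tuple(merged)
        return new_id

    def counters(self, config_id):
        return self._configs[config_id]
```

A tool that starts with one counter and later wants a second would call `create_profile_config` again with the first id as the base, then switch to the returned id for subsequent sessions.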
Giovanni Lenzi Baraldi 9676295d3d ATT API changes - add user_data field and separation of dispatch vs agent profiling (#893)
* DRM Issue Fix for SLES 15 (#897)

* DRM Issue Fix

* Formatting Fix

* PC sampling: CID manager unit test (#898)

* Adding per-dispatch userdata field to ATT

* Clang tidy

* Formatting

* Update source/lib/rocprofiler-sdk/hsa/aql_packet.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Adding dispatch_id, fixing user_data and update aql_profile_v2

* Formatting

* Tidy fixes

* Second fix for userdata

* removing assert for union

* Adding serialization. Created agent profiling-like thread trace

* Implemented agent thread trace

* Update source/lib/rocprofiler-sdk/hsa/aql_packet.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Restructured thread trace packets

* Added agent API tests

* Fixing multigpu for agent test

* Formatting

* Formatting

* Improving header locations

* Fixing merge conflicts

* Tidy

* Tidy

* Tidy

---------

Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
2024-06-13 15:29:29 -03:00
Manjunath P Jakaraddi c49719649b SWDEV-465322: Adding support for Perfcounter SIMD Mask in ATT (#910)
* SWDEV-465322: Adding support for Perfcounter SIMD Mask in ATT

* Apply suggestions from code review

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>

* Adding unit tests

* Adding counters check for gfx9 and SQ block only

* Addressing review comments

* changing the struct size

* fixing header includes

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2024-06-12 16:25:06 -07:00
Benjamin Welton f5753d3ae3 Add dimension query to counter collection sample (#918)
Co-authored-by: Benjamin Welton <ben@amd.com>
2024-06-07 14:30:01 -07:00
Benjamin Welton 3d9a448797 Small change to sample for clarity (#913)
Co-authored-by: Benjamin Welton <ben@amd.com>
2024-06-07 13:05:46 -07:00
Vladimir Indic b0c41827c3 PC sampling client: using raw pointers (#902)
* PC sampling client: using raw pointers to prevent premature destruction of buffers

* PCS client: freeing buffer_ids
2024-06-04 13:16:26 -05:00
Ammar ELWazir 0e43a30de0 Update client.cpp (#900) 2024-06-04 13:16:26 -05:00
Jonathan R. Madsen a76f61a0a3 Migrate to rocprofiler-sdk:: namespace in CMake everywhere (#892)
- remove all usage/support for rocprofiler:: namespace
2024-05-29 22:28:43 -05:00
Jonathan R. Madsen 5525b400c3 Miscellaneous AFAR 5 Updates (#891)
* Dispatch table copy/update uses ROCP_TRACE instead of ROCP_INFO

* Update rocprofiler-sdk CMake config

- rocprofiler::rocprofiler is alias to rocprofiler-sdk::rocprofiler-sdk instead of other way around

* Prefer rocprofiler-sdk::rocprofiler-sdk over rocprofiler::rocprofiler

* Fix WITH_UNWIND for glog

- requires a value of "none" instead of boolean now

* Update include/rocprofiler-sdk/registration.h

- explicit struct names to permit forward decl

* Update include/rocprofiler-sdk/cxx/serialization.hpp

- ROCPROFILER_SDK_CEREAL_NAMESPACE_BEGIN and ROCPROFILER_SDK_CEREAL_NAMESPACE_END to enable customized namespace
2024-05-29 16:45:56 -05:00
Giovanni Lenzi Baraldi 1b95089c28 Enable ATT continuous mode and code object tracing registration (#850)
* Adding ATT continuous mode and ATT code object tracking

* Fixing aql_packet.cpp

* Updating to aqlprofile codeobj changes

* Removing kernel packet from ATT dispatch callback

* Changing getSymbolMap() to return relative vaddr

* Tidy fixes

* Formatting

* Fix shadowing

* Fixing packet test

* Updating tests

* Simplifying multi-agent traces

* Adding dynamic codeobj tracking

* leftover book-keeping for codeobj markers

* Formatting

* Formatting

* Temporary removing codeobj marker

* Formatting

* Re-enabling codeobj tracking

* Making copy of coreapi table

* Fixing issues with toolData lifetime

* Formatting

* Fixing issues with ASAN

* Improving memory profile

* Removing misplaced annotation

* Fixing queue type and allowing shared_locks in globalThreadTracer

* Update logging

* Changing ATT formats to be more in line with the SDK (#883)

* Fixing some merge conflicts

* Fixing cmakelists

* Fixing merge conflicts

* Formatting
2024-05-29 11:09:28 -05:00
Giovanni Lenzi Baraldi 385980e279 Moving ATT to amd_detail (#885)
* Moving ATT to amd_detail

* Formatting
2024-05-29 11:28:01 -03:00
Giovanni Lenzi Baraldi a84c9fa7d4 Removing code object static library (#865)
* Removing static library build for codeobj library

* Moving codeobj library to amd_detail

* Formatting

* Formatting

* Adding findDW

* Adding libdw to common samples cmake
2024-05-28 23:15:11 -03:00
Ammar ELWazir 987ae3cc47 PC Sampling Support (#715)
* cmake formatting (cmake-format) (#188)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* source formatting (clang-format v11) (#189)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: design of the pc sampling data struct; guarding parts of code that uses ROCr marker packets

* source formatting (clang-format v11) (#191)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* cmake formatting (cmake-format) (#192)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: shadow variable fix

* pcs: fix for compiler errors reported by CI/CD

* source formatting (clang-format v11) (#193)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: docs fix; samples uses rocprofiler::rocprofiler library

* cmake formatting (cmake-format) (#195)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: client in samples folder fixed

* pcs: client requires rocprofiler package as dependency

* pcs: client uses single context

* source formatting (clang-format v11) (#196)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: client using single buffer; no buffer destroy in client

* pcs: client::setup explicitly called from the example

* pcs: rocprofiler_pc_sample_record_t updated

* pcs: fixed init of external correlation id

* source formatting (clang-format v11) (#198)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: remove outdated files; update CMakeLists

* cmake formatting (cmake-format) (#212)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: using rocprofiler_agent_id_t

* pcs: Removing trailing whitespaces

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

* source formatting (clang-format v11) (#214)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: mapping agent_id to the agent

* source formatting (clang-format v11) (#215)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: const while iterating over agents

* source formatting (clang-format v11) (#216)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: calling get_buffer instead of get_buffers

* pcs: workgroup typo

* pcs: documentation for the public PC sampling API

* pcs: queue_cb_t signature adaptation

* pcs: mocks removed

* pcs: updating HsaApiTable with HSA/ROCr PC sampling API

* pcs: querying available PC sampling configs through IOCTL

* pcs: create the PCS session in IOCTL

* pcs: first actual PC samples delivered to the rocprofiler's client :)

* pcs: works with marker packet too

* pcs: using HSA table to call pc sampling related functions

* pcs: using ioctl instead of kfd in naming

* pcs: configuration service test fixed

* pcs: sample processing test fixed

* pcs: marker packet macro wrapper removed

* pcs: marker packet is part of the rocprofiler_packet union

* pcs: one fixme added

* pcs: client that uses pc-sampling and code obj tracing

* pcs: client that supports PC sampling and code obj tracing refactored

* pcs: show more info for each PC sample

* pcs: hex output for the samples that do not belong to the matmul kernel

* pcs: querying avail configuration happens immediately before configuring

* pcs: hsa_ven_amd_pcs_create_from_id renamed

* pcs: using hsa_stop; accessing a buffer by id from parser

* pcs: includes reworked, tests returned to life

* pcs: rocprofiler dir removed as outdated

* cmake formatting (cmake-format) (#271)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* source formatting (clang-format v11) (#272)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: some warnings fixed

* source formatting (clang-format v11) (#273)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* cmake formatting (cmake-format) (#274)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: show MI200 relevant information in the sample

* pcs: queue cb fixed; rocr.h include fixed

* source formatting (clang-format v11) (#296)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: getting hsa_agent and the doorbell_id from hsa_queue

* source formatting (clang-format v11) (#297)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: correlation ID logic fixed

* source formatting (clang-format v11) (#303)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: pure pc sampling example fixed

* source formatting (clang-format v11) (#307)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* cmake formatting (cmake-format) (#308)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: interval value if the PC sampling is already configured

* pcs: ROCPROFILER_STATUS_ERROR_PC_SAMPLING_ALREADY_CONFIGURED

New status code if another process configured PC sampling service with different configuration.
Samples are extended to consider this case and retry if it happens.

* pcs: hsa_amd_queue_get_info mocked in tests

* source formatting (clang-format v11) (#328)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs (tests): query configs after configuring service

* source formatting (clang-format v11) (#329)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: sample checks workgroup_id_* and wave_id

* source formatting (clang-format v11) (#330)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs samples: running samples on the device 0

* pcs: kfd_ioctl updated

* pcs: ioctl config struct changed fields names

* pcs: status when PC sampling is configured by another process is renamed

* pcs: HSA PC sampling API table fixed

* pcs: tmp hack to be able to use HSA pc sampling table

* source formatting (clang-format v11) (#443)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs service use CIDs generated by HIP API tracing service

* source formatting (clang-format v11) (#455)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* cmake formatting (cmake-format) (#456)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: CID manager

* pcs: explicit flush with no delivered data executes retirement logic

* source formatting (clang-format v11) (#464)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: rocprofiler_query_pc_sampling_agent_configurations docs update

* source formatting (clang-format v11) (#465)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: rocprofiler_configure_pc_sampling_service docs update

* pcs: explicit sync introduced in PCSCIDManager

* pcs: new logic for retiring CIDs in PC sampling service documented

* pcs: queue interception cb signature updated

* source formatting (clang-format v11) (#471)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: if no agents supports PC sampling, fail gracefully

* elaborating when KFD returns EBUSY and EEXIST

* pcs: the second PC sampling examples fails gracefully

* code samples use only single kernel for now

* pcs: CID manager refactored

* source formatting (clang-format v11) (#481)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: ioctl update

* source formatting (clang-format v11) (#531)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs:code sample to test PC sampling applied on concurrent kernels

* source formatting (clang-format v11) (#533)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: pc sampling stress test included

* cmake formatting (cmake-format) (#539)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* source formatting (clang-format v11) (#540)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: standalone benchmark

* cmake formatting (cmake-format) (#555)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: glance in external correlation IDs

* source formatting (clang-format v11) (#557)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* another change in ioctl interface

* pcs: update queue interceptor callbacks and samples according to the agent 0 version

* source formatting (clang-format v11) (#611)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: avoid running problematic PC sampling test

* pcs: guarding tests not to fail on architectures not supporting PC sampling

* source formatting (clang-format v11) (#617)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: check IOCTL version prior to each KFD call

* pcs: ioctl refactoring

* pcs: PC sampling service increases the ref_count of the correlation ID of the kernel dispatch

* cmake formatting (cmake-format) (#631)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* source formatting (clang-format v11) (#632)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: PC sampling service provides external correlation IDs

* source formatting (clang-format v11) (#644)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: use rocprofiler_dim3_t for workgroup_id

* source formatting (clang-format v11) (#645)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: minor fixes

* pcs: updating the documentation for the pc sampling API functions

* pcs: api table and queue controller fix

* pcs: don't generate marker packets for the agent if PC sampling is not configured on it

* pcs: multi-GPU and single-GPU clients

* source formatting (clang-format v11) (#700)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: warning and errors fixed

* source formatting (clang-format v11) (#702)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: clang compiler errors and warnings fixed

* source formatting (clang-format v11) (#716)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: const reference in cid manager

* source formatting (clang-format v11) (#717)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: const & func in manager explicit

* pcs: test to cover creating PC sampling service of agent that does not exist

* pcs: generate marker packets if service is active

* source formatting (clang-format v11) (#719)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: refactoring hsa_adapter; use the correlation_id->thread_idx

* Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/utils.cpp

* Update utils.cpp

* moving pc-sampling tests and samples to pc-sampling label

* Format fix

* pcs: use configured instead of active service

* Update source/lib/rocprofiler-sdk/pc_sampling/service.cpp

* pcs: ensure configuring PC sampling on the HSA level is called only once

* pcs: minor fix

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* pcs: refactoring IOCTL integration

* Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt

Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: reverting back what bot doubled

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: retesting the bot

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: why bot fails on this IOCTL status

* pcs: why failing on <vector>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: returning commits removed by bot

* pcs: formatting locally

* pcs: clients are flushing buffers inside the tool_fini

* pcs: sync function in public API

* pcs: sync prior to unloading the code object

* pcs: sync function requires context

* pcs: client uses CID retirement service

* pcs: test for flushing internal ROCr buffers

* pcs: source formatting

* Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: code samples refactoring

* pcs: public API header refactored

* pcs: rocprofiler_buffer_flush drains internal PC sampling buffers too

* pcs: remove unnecessary functions

* pcs: do not call hsa's copytables

* pcs: include reordering

* pcs: using ROCP_ERROR inside PC sampling implementation

* pcs: pc_sampling sample uses ostream instead of printfs

* pcs: pc_sampling_codeobj tracing using ostream instead of prints

* pcs: registering once for interceptor callbacks

* pcs: do not generate internal CIDs if not in debug mode

* pcs: rebasing fixed; missing external correlation IDs

* pcs: code formatting

* enable kernel tracing service to receive external correlation IDs

* pcs: using ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL

* pcs: polishing parser

* formatting

* updating parser to use workgroup_id

* kfd_ioctl.h extracted in details folder

* refactoring

* pcs: preparing to generate code object information

* flush internal buffers prior to unloading code object

* pcs: generating marker records

* pcs: wrap code_object's shutdown function

* ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES unsupported at the moment

* documenting that ROCR/HIP_VISIBLE_DEVICES are ignored

* pcs: separate structs for code object loading/unloading markers

* pcs: inst_pkt_t changed the namespace

* pcs: removing wrapper around the shutdown function

* pcs: size in record field

* pcs: documentation refactoring + typedefs

* renaming PCSAgentConfig to PCSAgentSession

* pcs: service does not keep a pointer to the context

* pcs: static assertions related to the versioning

* pcs: rocprofiler_pc_sampling_configuration_t size field

* pcs: report API unimplemented unless explicitly enabled

* pcs: skip tests if KFD does not support PC sampling

* pcs: if ROCr hides some devices, no PC samples will be delivered for them

* pcs: hip error check after kernel launch

* formatting

* removing PCS info from agent.h

* fix based on review

* Update continuous integration workflow

- use mi200 runner for code coverage (supports PC sampling)
- split sanitizer jobs across navi3, vega20, and mi300

* Updating pc sampling test labels

* ROCP_PC_SAMPLING_ENABLED env in CI

* ROCP_PC_SAMPLING_ENABLED for all CI mi200 jobs

* Rearrange sanitizer assignments

* fixes according to review

* removed unused functions

* pcs: rocprofiler_agent_id_t instead of handle as a key in map

* Update source/lib/rocprofiler-sdk/context/context.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* removing drm_fd from the agent.h

* pcs: removing one sample due to complexity

* pcs: refactoring sample

* simplifying sample

* new lines

* Improve queue_control enable interceptor logic

* Update lib/rocprofiler-sdk/hsa/types.hpp

- handle amd_ext size for HSA 1.12.0

* ROCP_PC_SAMPLING_ENABLED -> ROCPROFILER_PC_SAMPLING_BETA_ENABLED

* Update hsa_adapter.cpp

- anonymous namespace + remove debug

* parser update

* Apply suggestions from code review

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
Co-authored-by: vlaindic <vladimir.indic@amd.com>
Co-authored-by: vlaindic <vlaindic@amd.com>
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: gobhardw <gopesh.bhardwaj@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-05-24 09:49:44 -05:00
Benjamin Welton 28e6430d04 [2/N] Agent Counter implementation with unit tests to check functionality (#846)
Agent Counter Collection API with tests and samples.
---------

Co-authored-by: Benjamin Welton <ben@amd.com>
2024-05-21 13:34:54 -07:00
Vladimir Indic 358c599c3f avoiding early destruction of code objects list (#847) 2024-05-08 11:00:08 +02:00
Giovanni Lenzi Baraldi 099ac7c72d Gbaraldi/att tool (#766)
* Enabling codeobj and thread trace samples

* Updating aqlprofile_v2 header

* Codeobj and thread trace samples with output log files

* Fixing clang format

* Cmake formatting

* Adding coverage to codeobj

* Comment trace sample

* Adding ATT Parser API

* Fixing forwarding to aqlprofile

* Clang formatting

* Clang tidy

* Adding option to print memory kernels

* Clang format

* Remove default from switch case

* Separating client/main on codeobj sample for ASan

* Formatting

* Gbaraldi/att tool rebase (#801)

* Enabling codeobj and thread trace samples

* Updating aqlprofile_v2 header

* Codeobj and thread trace samples with output log files

* Fixing clang format

* Cmake formatting

* Adding coverage to codeobj

* Comment trace sample

* Removing python from workflow

* Adding ATT Parser API

* Fixing forwarding to aqlprofile

* Clang formatting

* Clang tidy

* Adding option to print memory kernels

* Clang format

* Remove default from switch case

* Separating client/main on codeobj sample for ASan

* Formatting

* Enabling codeobj and thread trace samples

* Updating aqlprofile_v2 header

* Codeobj and thread trace samples with output log files

* Fixing clang format

* Cmake formatting

* Adding coverage to codeobj

* Comment trace sample

* Adding ATT Parser API

* Fixing forwarding to aqlprofile

* Clang formatting

* Clang tidy

* Adding option to print memory kernels

* Clang format

* Remove default from switch case

* Separating client/main on codeobj sample for ASan

* Formatting

* Fix codeobj library

* Allow thread trace in parallel with other service

* Zeroing the HSA signals

* Adding exception wrappers in ATT sample

* Removed force configure

* Remove force configure from ISA decode

* Removing codecov flag

* Gbaraldi/att tool tests (#828)

* Adding tests for codeobj ISA decode

* Adding ATT tests

* Adding ATT integration tests

* Formatting

* Changing codeobj binary extension

* Renaming codeobj library spaces

* Fixing samples

* Formatting

* Formatting

* Fixing int test

* Fixing linker error

* Fixing memory fault

* Moving kernel to inside namespace

* ASAN linking fix

* Removing unnecessary headers

* Formatting

* Fixing target_cu

* Remove codeobj binary

* Revert "Remove codeobj binary"

This reverts commit 7d286f89d8096bc36925cd79cd742a5e6d10d179.

* Enable memory snapshot

* adding comgr

---------

Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
2024-05-03 18:45:47 -03:00
Jonathan R. Madsen de13d2ac5d Public C++ header files and samples updates (#819)
* Public C++ header files (source/include/rocprofiler-sdk/cxx)

* Update samples/api_buffered_tracing

- scratch memory and page migration
- README

* Update samples/api_buffered_tracing

- page migration component in sample

* Update tests/page-migration/validate.py

- fix checks for page migration operation names

* Update tests/page-migration/validate.py

- fix get_allocated_pages

* Update scratch memory and page migration validations

* Fix include/rocprofiler-sdk/cxx installation

* Rework include/rocprofiler-sdk/cxx

- Improve name_info to support const char*, string_view, string

* Update samples/api_{buffered,callback}_tracing

* External correlation ID request sample

- includes correlation ID retirement demo

* Update samples/api_buffered_tracing/README.md

* Update lib/rocprofiler-sdk/hsa/queue.cpp

- generate correlation ID for kernel launch if one doesn't exist

* Remove priority check from tool libraries (samples/tests)

- if(priority > 0) return nullptr check in rocprofiler_configure has proliferated beyond its intended use

* Apply suggestions from code review
2024-04-25 20:09:11 -05:00
Jonathan R. Madsen 8c985543d9 Rework counter collection sample app (#822)
* Sync more often in counter collection samples

* Update samples/counter_collection/main.cpp

- support command line arguments
  - number of iterations
  - iterations per sync
  - number of devices to use
2024-04-24 14:00:59 -05:00
Jonathan R. Madsen b570ff5273 Update samples/intercept_table (#792)
- install function wrappers around HIP runtime API
  - easily correlated to the executable
  - safer than HSA runtime due to potential for HSA to get invoked after main returns
2024-04-18 05:30:34 -05:00
Benjamin Welton edb1883a05 Modified hipMalloc size for main.cpp in sample (#786)
* Modified hipMalloc size for main.cpp in sample

* Update samples/counter_collection/main.cpp

---------

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
2024-04-18 04:28:56 -05:00
Benjamin Welton c2f659ab5c Removal of HSA from counter collection (#697)
* Minor fix

Removal of HSA from counter collection

Tests for AQL

Updated counter collection client to build profiles in tool init

* Rebased

* Debug printing

* Formatting

* More format

* fix shadowing

---------

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
2024-04-12 18:46:10 -07:00
Jonathan R. Madsen 07537b6231 rocprofiler_kernel_dispatch_info_t + header record for buffered counter collection (#758)
* Update include/rocprofiler-sdk

- defines.h
  - ROCPROFILER_VERSION_10_0 -> ROCPROFILER_SDK_VERSION_0_0
- fwd.h
  - rocprofiler_counter_record_kind_t
  - rocprofiler_kernel_dispatch_info_t
  - rocprofiler_record_counter_t
    - has dispatch id instead of correlation id
  - rocprofiler_counter_info_v0_t
    - added rocprofiler_counter_id_t field
    - added is_constant field
    - reordered for better packing
- dispatch_profile.h
  - added rocprofiler_profile_counting_dispatch_record_t for use as a header record for rocprofiler_profile_counting_dispatch_data_t
- callback_tracing.h
  - rocprofiler_callback_tracing_kernel_dispatch_data_t uses rocprofiler_kernel_dispatch_info_t
- buffer_tracing.h
  - rocprofiler_buffer_tracing_kernel_dispatch_record_t uses rocprofiler_kernel_dispatch_info_t

* Update lib/rocprofiler-sdk/*

- transition to rocprofiler_kernel_dispatch_info_t
- set id and is_constant values for rocprofiler_counter_info_v0_t in rocprofiler_query_counter_info

* Update lib/rocprofiler-sdk-tool

- transition to rocprofiler_kernel_dispatch_info_t

* Update lib/rocprofiler-sdk/counters/tests/core.cpp

- transition to rocprofiler_kernel_dispatch_info_t

* Update samples

- transition to rocprofiler_kernel_dispatch_info_t
- transition to rocprofiler_counter_record_kind_t

* Update tests

- transition to rocprofiler_kernel_dispatch_info_t
- transition to rocprofiler_counter_record_kind_t
- improve integration test validation for counter-collection
- update serialization for new/additional types

* Fix tests/counter-collection/validate.py

- loosen restrictions on the length of counter description

* Update include/rocprofiler-sdk/buffer_tracing.h

- remove accidental packed attribute

* Update lib/rocprofiler-sdk/counters/xml/derived_counters.xml

- Add description for TCC_TAG_STALL_sum (reference: https://rocm.docs.amd.com/en/develop/conceptual/gpu-arch/mi300-mi200-performance-counters.html)

* Update tests/page-migration/validate.py
2024-04-12 17:30:34 -05:00
Jonathan R. Madsen 3eaa678054 CTest Environment Update (#756)
* Update test/tools/json-tool.cpp

- push/pop ppid as external correlation id instead of pid

* Update environment variables for tests and samples

* Revert to old CDash dashboard in run-ci.py

* Revert to new CDash dashboard in run-ci.py
2024-04-12 08:40:00 -05:00
Jonathan R. Madsen 0f5c575435 Fix code_object_operation_t and memory_copy_operation_t enums (#751)
- enums for operations should not contain callback/buffer tracing categorization
- e.g. ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT_LOAD should be ROCPROFILER_CODE_OBJECT_LOAD
2024-04-11 18:52:13 -05:00
Giovanni Lenzi Baraldi 69b8a43dc6 Gbaraldi/threadtrace2 (#724)
* Added first ATT API

* Finalizing thread trace API

* Fixing more rebase conflicts

* Added codeobj disassembly sample

* Fixing merge issues with rebase [2]

* Adding ATT packets

* Implemented thread trace intercept

* Moved codeobj parser to same repo as rocprofiler

* Moved thread trace to new API

* Fixing merge conflicts

* Fixing more merge conflicts

* Adding thread trace packet reuse

* Merged aql_profile_v2 headers

* Linked ATT sample to aqlprofile

* Updated decoder to include non-loaded codeobjs

* Implemented ISA decoder into ATT sample

* Added marker_id to vaddr

* Updating aql_profile_v2 API to memcpy

* Updating thread trace API to include 64bit markers. Using the result of ISA matching.

* Added instruction type and cycles summary

* Updated sample with selection of kernel by kernel_object

* Added option to copy from memory kernels

* Moved tool_data in thread_trace to dynamic alloc

* Restoring hsa.cpp

* Fixed ATT sample crash. General improvements.

* Moved codeobj library to outside src/

* Updated license header

* Moved codeobj_capture to camelcase

* Solving some more merge conflicts

* Update samples/advanced_thread_trace/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update samples/advanced_thread_trace/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update samples/code_object_isa_decode/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/thread_trace/CMakeLists.txt

* Removing unused parameter check

* Adding const to isEmpty

* Removing unused warning

* Adding libdw-dev to requirements

* Running clang-format

* Commenting out new aql calls

* Clang format

* Unused variable fix

* Adding codeobj-decoder coverage

* Commenting out threadtrace

* Update samples/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* P

* WOverloaded

* Addressing clang-tidy

* Virtual destructor on ttracer class

* Corr id

* Fixing code source format

* Update CMakeLists.txt

* Build fixes

* Update source/lib/rocprofiler-sdk-codeobj/code_object_track.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fix shadowing

* Update CMakeLists.txt

* Update samples/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2024-04-08 12:43:02 -07:00
Jonathan R. Madsen 939e23e9d1 Stop all client contexts prior to finalization (#721)
* Stop all client contexts prior to finalization

* Update lib/common/container/static_vector.hpp

- improve emplace_back for non-{move,copy}-assignable object

* Update samples/intercept_table/client.cpp

- improve robustness against static object destruction

* Update lib/rocprofiler-sdk/context/context.cpp

- change storage of registered context array
  - stable_vector of optional contexts
  - common::static_object wrapper around stable_vector

* Update samples/intercept_table/client.cpp

- use variable template for underlying function pointer
2024-04-02 03:05:11 -05:00
Jonathan R. Madsen 7b6d3c70bd Shared Library Constructor (rocprofv3 deadlock fix) (#599)
* Moved tests/apps to tests/bin

* Renamed cmake project in tests/bin

* Update samples

- Use ROCPROFILER_DEFAULT_FAIL_REGEX
- tweaks to stdout messages

* Update tests

- Use ROCPROFILER_DEFAULT_FAIL_REGEX

* Add tests/lib

- libraries with HIP code

* Update PTL submodule

- remove atexit delete of thread_id_map

* Update cmake/rocprofiler_options.cmake

- Set ROCPROFILER_DEFAULT_FAIL_REGEX

* Update common lib: env + logging

- improved customization of logging settings
- default to disabling logging to files
- install failure handler for rocprofv3
- set_env support in environment.*

* Add lib/rocprofiler-sdk/shared_library.cpp

- shared library constructor

* Update lib/rocprofiler-sdk-tool/tool.cpp

- destructor thread safety
- convert callback_name_info and buffered_name_info to pointers
- install failure handler for logging

* Add tests/bin/hip-in-libraries

- hip-in-libraries is an exe which uses two shared libraries where each shared library contains HIP kernels
  - used for testing deadlocking within __hipRegisterFatBinary

* Update bin/rocprofv3

- reorganized the env variables
- use exec to launch command
- set ROCPROFILER_LIBRARY_CTOR=1

* Add tests/rocprofv3/tracing-hip-in-libraries

- uses the hip-in-libraries exe, which launches HIP kernels from shared libraries

* Update bin/rocprofv3

- fix counter collection (no exec)

* Update lib/rocprofiler-sdk-tool/tool.cpp

- replace "Kernel-Name" with "Kernel_Name"

* Update lib/rocprofiler-sdk/registration.cpp

Use RTLD_LOCAL instead of RTLD_GLOBAL for env libraries

* Update tests/rocprofv3

- replace "Kernel-Name" with "Kernel_Name"

* Update tests

- vector-ops (bin) stream syncs + runs with 4 queues per device
- improve counter-collection/input1 validation
- rocprofv3/tracing-hip-in-libraries does not do sys-trace
- improved validation script for tracing-hip-in-libraries
- updated dispatch_callback in json-tool.cpp following reworking of prototypes for counter collection

* Update samples/counter_collection

- updated dispatch_callback(s) and record_callback(s) following reworking of prototypes

* Update bin/rocprofv3

- reorganized help menu
- added options for sub-HSA tables
- added --hip-runtime-trace
- changed --hip-trace to include --hip-compiler-trace

* Update lib/rocprofiler-sdk-tool

- improved kernel filtering
- removed arch_vgpr, accum_vgpr, sgpr code (in rocprofiler-sdk)
- fixed issue with counter-collection w/o tracing
- added support for fine grained HSA API tracing
- removed directly linking to HSA-runtime

* Update lib/rocprofiler-sdk/agent.cpp

- rocp_agents != hsa_agents is non-fatal when ROCPROFILER_BUILD_CI=OFF (CMake option)

* GPR (vector and scalar) info in kernel symbol data

- rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t contains general purpose register info

* Header include order fix

- Include repo headers first
- Third party library headers next
- standard library headers last

* Update dispatch profiling public API

- introduce rocprofiler_profile_counting_dispatch_data_t
- change signature of rocprofiler_profile_counting_dispatch_callback_t and rocprofiler_profile_counting_record_callback_t
- provide rocprofiler_user_data_t pointer in dispatch callback
- provide rocprofiler_user_data_t value (from dispatch cb) in record callback

* Update tests/bin/CMakeLists.txt

- fix add_subdirectory(hip-in-libraries) order

* Update VERSION

- bump to 0.2.0 in prep for AFAR
2024-03-07 22:21:26 -06:00
Jonathan R. Madsen 1bb94add11 Fix rocprofiler_iterate_callback_tracing_kind_operation_args for HIP compiler callbacks (#532)
* Fix HIP compiler iterate args

- `include/rocprofiler-sdk/hip/api_args.h`
  - replace struct fields named "f" with "func"
  - replace hip stream fields named "hStream" with "stream"
- `lib/rocprofiler-sdk/callback_tracing.cpp`
  - iterate_args for HIP compiler table
- `lib/rocprofiler-sdk/registration.cpp`
  - fix warning about roctx num_tables
- `lib/rocprofiler-sdk/hip/hip.def.cpp`
  - replace struct fields named "f" with "func"
  - replace hip stream fields named "hStream" with "stream"
- `lib/rocprofiler-sdk/{hip,hsa,marker}/utils.hpp`
  - improve `stringize_impl`
- `lib/rocprofiler-sdk/hsa/code_object.cpp`
  - remove stale commented out code
- `lib/rocprofiler-sdk/hsa/queue_controller.*`
  - destory_queue -> destroy_queue
- `tests/tools/json-tool.cpp`
  - improve parallelism in tool_tracing_callback
  - serialize the marker api args
  - only invoke rocprofiler_iterate_callback_tracing_kind_operation_args in exit phase
- `samples/counter_collection/CMakeLists.txt`
  - reduce timeout on tests to 120 seconds

* Update lib/rocprofiler-sdk/hsa/utils.hpp

- disable dereference of double pointer in stringize_impl

* Update lib/common

- indirection_level in mpl.hpp
- stringize_arg.hpp

* Rework rocprofiler_iterate_callback_tracing_kind_operation_args

- provide more information in rocprofiler_callback_tracing_operation_args_cb_t
- support specifying the dereference level to account for output parameters
2024-03-01 01:46:07 -06:00
Jonathan R. Madsen a1267e1fd2 C compatibility for public headers (#566)
* C compatibility for public headers

- add tests/tools/c-tool.c
  - builds a tool (which does nothing) with C language
  - ensures that tool can be compiled in C
- add tests/c-tool/CMakeLists.txt
  - ensures that tool library built from C is a valid tool
- rocprofiler_counter_info_v0_t is_derived is int instead of bool
  - C does not have bool unless <stdbool.h> is included
- add `include/rocprofiler-sdk/hsa/api_trace_version.h`
  - handles providing HSA_*_TABLE_(MAJOR|STEP)_VERSION values if compiled from C
- cmake define in version.h.in for ROCPROFILER_HSA_*_TABLE_(MAJOR|STEP)_VERSION
  - HSA table versions compiled with
- use rocprofiler_(hsa|hip|marker)_api_no_args struct to handle incompatibility b/t empty structs in C vs. C++ (size of 0 vs. size of 1)
- extern "C" in include/rocprofiler-sdk/{hsa,hip,marker}/api_args.h
- fixed spelling error: derrived -> derived
- scope YY_NO_INPUT compile definition to lib/rocprofiler-sdk/counters/parser/*
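
The empty-struct incompatibility mentioned above can be illustrated with a minimal sketch. The type names here are hypothetical and are not the SDK's definitions. In C++, an empty class still has non-zero size, while GNU C accepts an empty struct as an extension with size 0, so a placeholder struct with one member gives both languages the same layout.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration only; not the SDK's definitions.
struct empty_args
{};

// A placeholder with a single member has the same size in C and C++.
struct no_args
{
    char empty;
};

// C++ requires every complete object type to occupy at least one byte.
static_assert(sizeof(empty_args) >= 1, "empty class has non-zero size in C++");
static_assert(sizeof(no_args) == sizeof(char), "placeholder holds exactly one char");

// Size of the placeholder, usable from generic code.
constexpr std::size_t
no_args_size()
{
    return sizeof(no_args);
}
```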

* Revert CDash dashboard
2024-02-29 23:49:54 -06:00
Jonathan R. Madsen 875f53b608 Correlation ID Retirement + misc (#527)
* Correlation ID Retirement

- include/rocprofiler-sdk/buffer_tracing.h
  - add rocprofiler_buffer_tracing_correlation_id_retirement_record_t
- include/rocprofiler-sdk/fwd.h
  - ROCPROFILER_BUFFER_TRACING_CORRELATION_ID_RETIREMENT
- lib/rocprofiler-sdk/buffer_tracing.cpp
  - kind string for correlation id retirement
- lib/rocprofiler-sdk/buffer.hpp
  - emplace returns bool
- lib/rocprofiler-sdk/registration.cpp
  - pass lib_instance to copy_table functions
- lib/rocprofiler-sdk/context/context.*
  - update correlation_id struct
    - make ref_count private
    - {get,add,sub}_ref_count() functions
      - sub_ref_count() performs correlation id retirement
    - use stack for "latest" thread-local correlation id
- lib/rocprofiler-sdk/hip/hip.*
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - return in iterate_args
  - handle table instance in copy_table
- lib/rocprofiler-sdk/hsa/hsa.*
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - return in iterate_args
  - handle table instance in copy_table
- lib/rocprofiler-sdk/marker/marker.*
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - return in iterate_args
  - handle table instance in copy_table
- lib/rocprofiler-sdk/hsa/async_copy.cpp
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - handle table instance in async_copy_init / async_copy_save
- lib/rocprofiler-sdk/hsa/queue.cpp
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - tweak to external correlation id mapping in WriteInterceptor
- tests/async-copy-tracing/validate.py
  - check retired_correlation_ids
- tests/common/serialization.hpp
  - support rocprofiler_buffer_tracing_correlation_id_retirement_record_t
- tests/kernel-tracing/validate.py
  - check retired_correlation_ids
- tests/common/CMakeLists.txt
  - perfetto external project
- tests/common/perfetto.hpp
  - perfetto categories + aliases
  - add_perfetto_annotation
  - metaprogramming helpers
- tests/tools/CMakeLists.txt
  - link to tests-perfetto
- tests/tools/json-tool.cpp
  - demangling functions
  - serialization of marker API callback args
  - reduce parallel bottleneck in tool_tracing_callback
  - support correlation id retirement
  - Multiple threads for buffers
  - Support ROCPROFILER_TOOL_CONTEXTS_EXCLUDE env variable
  - write_perfetto() function
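
The reference-counted retirement described for `lib/rocprofiler-sdk/context/context.*` could be sketched as follows. This is a simplified illustration under assumptions: the class layout, the callback type, and the member names other than `{get,add,sub}_ref_count()` are hypothetical and do not reflect the SDK's actual `correlation_id` struct.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical sketch; the SDK's correlation_id struct is not shown in this log.
class correlation_id
{
public:
    using retire_fn = std::function<void(std::uint64_t)>;

    correlation_id(std::uint64_t id, retire_fn on_retire)
    : m_id{id}
    , m_on_retire{std::move(on_retire)}
    {}

    std::uint32_t get_ref_count() const { return m_ref_count.load(); }
    std::uint32_t add_ref_count() { return m_ref_count.fetch_add(1) + 1; }

    // Dropping the last reference retires the ID by invoking the callback once.
    std::uint32_t sub_ref_count()
    {
        auto previous = m_ref_count.fetch_sub(1);
        assert(previous > 0 && "sub_ref_count() without matching add_ref_count()");
        auto remaining = previous - 1;
        if(remaining == 0 && m_on_retire) m_on_retire(m_id);
        return remaining;
    }

private:
    std::uint64_t              m_id;
    retire_fn                  m_on_retire;
    std::atomic<std::uint32_t> m_ref_count{0};  // private, as in the change above
};
```

Keeping the counter private forces every user through the three accessors, so retirement has a single trigger point.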

* Update tests/rocprofv3/tracing/validate.py

- tweak test_hsa_api_trace

* Update PTL submodule

- fixes for data race during destruction of task

* Update lib/rocprofiler-sdk/buffer.*

- unique_buffer_vec_t uses std::unique_ptr instead of allocator::unique_static_ptr_t

* Reduce timeouts in counter collection samples [skip ci]

* Update tests/tools/json-tool.cpp

- tweak demangle(string_view, int*) -> demangle(string_view, int&)
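
A demangling helper with the reference-based status parameter might look like the sketch below. It assumes the Itanium C++ ABI function `abi::__cxa_demangle` from `<cxxabi.h>` (available with GCC and Clang); the body is illustrative and is not the code in json-tool.cpp.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>
#include <string_view>

// Illustrative helper: returns the demangled name, or the input on failure.
// The status is reported through a reference, so callers cannot pass nullptr.
inline std::string
demangle(std::string_view name, int& status)
{
    // __cxa_demangle needs a NUL-terminated string, so copy the view first.
    std::string mangled{name};
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if(status != 0 || out == nullptr) return mangled;

    std::string result{out};
    std::free(out);  // the buffer was allocated with malloc
    return result;
}
```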

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- move sub_ref_count() to later in async_copy_handler to delay retirement slightly more
2024-02-23 10:30:33 -06:00
Jonathan R. Madsen 0d939edbba Updates/fixes for CI, docs, tests, samples, and common library (#528)
- .github/workflows/continuous_integration.yml
  - apt-get update before apt-get install
  - remove libgtest-dev
  - actions-comment-pull-request: v2.4.3 -> v2.5.0
- .github/workflows/formatting.yml
  - create-pull-request: v5 -> v6
- cmake/rocprofiler_options.cmake
  - remove unused ROCPROFILER_DEBUG_TRACE and ROCPROFILER_LD_AQLPROFILE options
- samples/counter_collection/callback_client.cpp
  - corr_id field renamed to correlation_id
- samples/counter_collection/client.cpp
  - corr_id field renamed to correlation_id
- include/rocprofiler-sdk/fwd.h
  - In rocprofiler_record_counter_t: rename corr_id field to correlation_id
  - doxygen fixes
- lib/common/utility.*
  - remove get_accurate_clock_id_impl
  - timestamp_ns() defaults to CLOCK_BOOTTIME
- lib/rocprofiler-sdk/counters/core.cpp
  - fix spelling mistake: extrenal -> external
  - corr_id field renamed to correlation_id
- lib/rocprofiler-sdk-tool/tool.cpp
  - fix destruction of static tool::output_file before finalization
- scripts/update-docs.sh
  - define PROJECT_NAME
- tests/async-copy-tracing/validate.py
  - init_time and fini_time checks
  - hip_api_traces, marker_api_tracing
- tests/common/serialization.hpp
  - fix save function for rocprofiler_record_counter_t following rename of corr_id to correlation_id
- tests/kernel-tracing/validate.py
  - init_time and fini_time checks
  - relax test_total_runtime range
- tests/rocprofv3/tracing/CMakeLists.txt
  - remove -M from rocprofv3-test-systrace-execute
  - exclude test_hsa_api_trace in rocprofv3-test-systrace-validate due to HIP API tracing
- tests/rocprofv3/tracing/validate.py
  - update test_kernel_trace to accept mangled or demangled
- tests/tools/json-tool.cpp
  - remove use of GLOG
  - include init_time and fini_time
  - write_json(...) function
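
The `CLOCK_BOOTTIME` default noted for `timestamp_ns()` can be illustrated with a small sketch. The helper name `clock_ns` is hypothetical and this is not the implementation in lib/common/utility.*; it only shows how a nanosecond timestamp is typically read from a POSIX clock on Linux.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>

// Hypothetical helper: nanoseconds read from a POSIX clock.
// CLOCK_BOOTTIME is Linux-specific and keeps counting while the system is suspended.
inline std::uint64_t
clock_ns(clockid_t clock_id = CLOCK_BOOTTIME)
{
    timespec ts{};
    if(clock_gettime(clock_id, &ts) != 0) return 0;  // report failure as 0
    return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ULL +
           static_cast<std::uint64_t>(ts.tv_nsec);
}
```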
2024-02-22 00:16:43 -06:00
Benjamin Welton 7adffd5b22 Add rocprofiler_query_counter_info function (#452)
* Add rocprofiler_query_counter_info function

Replaces rocprofiler_query_counter_name. Allows for
querying other types of info from counters (such as
description) and gives us some flexibility to add
return data in the near future (if we have to).

* source formatting (clang-format v11) (#453)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Updated version fetching

* source formatting (clang-format v11) (#509)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Merged

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
2024-02-19 16:05:38 -08:00
Benjamin Welton 3638351b4c Callback based handler for counter collection (#506)
* Callback based handler for counter collection

* source formatting (clang-format v11) (#507)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* cmake formatting (cmake-format) (#508)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Doc fix

* Minor doc fix

* More doc fixes

* More doc fixes

* More doc fixes

* Update CI

* Changes to the API per comments

* Mutex exception for HSA

* source formatting (clang-format v11) (#511)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Doc fix

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
2024-02-19 15:55:21 -08:00
Benjamin Welton 3eb6a27bc6 Add support for AQL dimensions (#262)
* Add support for AQL dimension changes

Adds support for returning dimensions from AQLProfile through rocprofiler
to tools. Includes a much larger expanded test suite that covers nearly
all files in counter collection.

Specific changes below:

samples/counter_collection/print_functional_counters: Modified to check
the validity of dimensions returned in comparison to the actual underlying
data obtained from a kernel execution.

rocprofiler-sdk/aql/helpers: adds function calls to support fetching
dimension information from AQLProfile.

rocprofiler-sdk/aql/packet_construct: modified to allow for events
to be exported to aid evaluate_ast in decoding the output buffer.

lib/rocprofiler-sdk/counters: Instance count now derived from dimension
sizes. rocprofiler_query_counter_dimensions now moved to a callback format
to improve usability.
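
One plausible reading of "instance count derived from dimension sizes" is that the count is the product of the sizes. The sketch below shows that calculation under this assumption; the `dimension` struct and the dimension names are hypothetical and are not taken from the SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical dimension description: a name and the number of instances along it.
struct dimension
{
    std::string name;
    std::size_t size;
};

// Assumed rule: the total instance count is the product of all dimension sizes.
inline std::size_t
instance_count(const std::vector<dimension>& dims)
{
    std::size_t count = 1;
    for(const auto& dim : dims)
        count *= dim.size;
    return count;
}
```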

rocprofiler-sdk/counters/core: Code migrations and exports of functions
for testing.

rocprofiler-sdk/counters/dimensions: Generates a dimension cache to be
used when querying dimension information for a counter id.

rocprofiler-sdk/counters/evaluate_ast: Modified to pass back correct
dimension information and to check/determine output dimensions for derived
counters.

rocprofiler-sdk/counters/id_decode: Modified to have a map between
dimension name -> dimension along with a conversion from the aql profile
id for a dimension (string) -> integer based id (happens only once during
init).

rocprofiler-sdk/hsa/queue: Modified to make testing easier, specifically
by allowing Queue to be mocked in unit tests for counter collection.

* Merge with changes for serialization

* Added suggestions

* source formatting (clang-format v11) (#457)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Minor fix

* Test change

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
2024-02-07 22:03:21 -06:00
Gopesh Bhardwaj 8a25b239bc Fixing counter collection in tools and enabling tests (#436)
* Fixing counter collection in tools and enabling tests

* fixing tests

* improving coverage on test

* Adding vector operations app

* Fixing tools bug for counter collection

* removing roctx linking
2024-02-06 09:55:07 -08:00