Commit graph

126 commits

Author SHA1 Message Date
Giovanni Lenzi Baraldi 69b8a43dc6 Gbaraldi/threadtrace2 (#724)
* Added first ATT API

* Finalizing thread trace API

* Fixing more rebase conflicts

* Added codeobj disassembly sample

* Fixing merge issues with rebase [2]

* Adding ATT packets

* Implemented thread trace intercept

* Moved codeobj parser to same repo as rocprofiler

* Moved thread trace to new API

* Fixing merge conflicts

* Fixing more merge conflicts

* Adding thread trace packet reuse

* Merged aql_profile_v2 headers

* Linked ATT sample to aqlprofile

* Updated decoder to include non-loaded codeobjs

* Implemented ISA decoder into ATT sample

* Added marker_id to vaddr

* Updating aql_profile_v2 API to memcpy

* Updating thread trace API to include 64bit markers. Using the result of ISA matching.

* Added instruction type and cycles summary

* Updated sample with selection of kernel by kernel_object

* Added option to copy from memory kernels

* Moved tool_data in thread_trace to dynamic alloc

* Restoring hsa.cpp

* Fixed ATT sample crash. General improvements.

* Moved codeobj library to outside src/

* Updated license header

* Moved codeobj_capture to camelcase

* Solving some more merge conflicts

* Update samples/advanced_thread_trace/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update samples/advanced_thread_trace/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update samples/code_object_isa_decode/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/thread_trace/CMakeLists.txt

* Removing unused parameter check

* Adding const to isEmpty

* Removing unused warning

* Adding libdw-dev to requirements

* Running clang-format

* Commenting out new aql calls

* Clang format

* Unused variable fix

* Adding codeobj-decoder coverage

* Commenting out threadtrace

* Update samples/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* P

* WOverloaded

* Addressing clang-tidy

* Virtual destructor on ttracer class

* Corr id

* Fixing code source format

* Update CMakeLists.txt

* Build fixes

* Update source/lib/rocprofiler-sdk-codeobj/code_object_track.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fix shadowing

* Update CMakeLists.txt

* Update samples/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2024-04-08 12:43:02 -07:00
Mythreya 4fa165ec1a Add support for scratch reporting (#523)
* Add ToolsApiTable

Add ToolsApiTable wrapping for
scratch memory tracking

* Add initial support for scratch memory tracking

Buffering is implemented

* cmake formatting (cmake-format) (#525)

Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>

* source formatting (clang-format v11) (#524)

Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>

* Add callback tracing for scratch

Fixed the error where scratch tracking init was called irrespective of whether any client requested it

* Apply suggestions from code review

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

* Fix tools api copy/update

Tables were saved/updated incorrectly in the previous
commit. Also adds passing user data through the callback

* Fix OpKind sequence for scratch tracking

Previously scratch was using OpKind from rocprofiler-sdk, but
templates were instantiated using API ID. These differ by 1

* Integration tests for scratch reporting

Added buffer and callback integration tests for scratch reporting

* source formatting (clang-format v11) (#550)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* cmake formatting (cmake-format) (#551)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* python formatting (black) (#549)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* CI fixes

* source formatting (clang-format v11) (#554)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* Update api

Rebase on main and updates based on PR feedback

* Update scratch reporting and address PR comments

- Added agent id to buffer records
- Updated `test_internal_correlation_ids` - Is almost identical to
  one in async-copy
- Updated scratch test to check for agent id
- Updated queue id serialization in callback records (prints
  handle as nested key)
- Remove `marker_api_traces` from scratch `test_internal_correlation_ids`
  validation test
- Rename `amd_tools_api` to `scratch_memory`
- Added doxygen comments
- Remove scratch callback from `tool.cpp`
- Replace assert with `LOG_IF` in `scratch_memory.cpp`

* Update tools table

Changed to match up with changes to hsa tables in main branch

* Rework scratch memory structure

* Update tests

- Added suggestions from PR review, and updated tests accordingly

* Misc cleanup

* Update scratch test

As of Apr 4th, `hsa_amd_agent_set_async_scratch_limit` is disabled.

Note,
> This API: `hsa_amd_agent_set_async_scratch_limit` is currently
> disabled. We need some changes in CP firmware to be able to do this
> and these changes are not ready yet.
> With the current code, you will also not get notifications for
> alternate-scratch allocations because this feature has been disabled
> while CP firmware is making additional changes
> We are hoping to have that feature enabled by ROCm-6.3

* Minor update to lib/rocprofiler-sdk/internal_threading.*

- delay destruction of shared_ptrs of the tasks to prevent rare (but possible) data race on the destruction of the shared_ptr

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-04-05 20:32:57 -05:00
Ammar ELWazir 5bb087f072 Adding useful scripts for formatting and building (#737)
* Adding useful scripts for formatting and building

* Update build.sh

* Update build.sh

* Update continuous_integration.yml
2024-04-04 06:49:17 -05:00
Benjamin Welton e0caae9ebc Add debug printing for write interceptor injected packets (#674)
* Add debug printing for write interceptor injected packets

Adds debug printing for write interceptor injected
packets. All packets that pass through the write
interceptor while enabled will be printed.

Only executes/prints when the environment variable
GLOG_v is set to 2 or higher (otherwise it is a no-op
and the expression is not evaluated).

* source formatting (clang-format v11) (#675)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* Changes on fmt location

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>
2024-04-03 18:14:22 -07:00
Benjamin Welton 41c0ddd72d Convert LOG() -> ROCP_X logging macros. (#695)
* Convert LOG() -> ROCP_X logging macros.

This patch converts the LOG() macro to the ROCP_X logging macros.
There are the following levels of logs.

Logs whose expressions are not evaluated unless the log level is enabled:

ROCP_TRACE - VLOG(2) (enabled by env variable GLOG_v=2)
ROCP_INFO - VLOG(1) (enabled by env variable GLOG_v=1)

Logs whose expressions are always evaluated:

ROCP_WARNING - LOG(WARNING)
ROCP_ERROR - LOG(ERROR)
ROCP_FATAL - LOG(FATAL)
ROCP_DFATAL - DLOG(FATAL) (only fatal in debug mode)
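
The difference between the lazily evaluated and always evaluated levels can be illustrated with a minimal, self-contained sketch. It does not use glog or the real ROCP_X macros: `SKETCH_TRACE`, `SKETCH_INFO`, `SKETCH_WARNING`, `sketch_verbosity`, and `expensive` are hypothetical names introduced only for illustration, and the verbosity check simply reads the `GLOG_v` environment variable on every call.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical stand-in for the verbosity check (not glog itself):
// reads GLOG_v on each call and treats an unset variable as level 0.
inline int sketch_verbosity()
{
    const char* value = std::getenv("GLOG_v");
    return value ? std::atoi(value) : 0;
}

// Lazily evaluated: when the level is disabled, the streamed expression
// sits in the untaken else-branch and is never evaluated.
#define SKETCH_VLOG(LEVEL) \
    if(sketch_verbosity() < (LEVEL)) {} else std::cerr

#define SKETCH_TRACE SKETCH_VLOG(2)  // analogous to ROCP_TRACE
#define SKETCH_INFO  SKETCH_VLOG(1)  // analogous to ROCP_INFO

// Always evaluated: the streamed expression runs regardless of verbosity.
#define SKETCH_WARNING std::cerr << "[warning] "

// Counts how often the "expensive" argument is actually computed.
inline int& evaluation_count()
{
    static int count = 0;
    return count;
}

inline std::string expensive()
{
    ++evaluation_count();
    return "expensive value";
}

inline void log_examples()
{
    SKETCH_TRACE << expensive() << '\n';    // evaluated only if GLOG_v >= 2
    SKETCH_WARNING << expensive() << '\n';  // always evaluated
}
```

With `GLOG_v` unset, one call to `log_examples()` evaluates `expensive()` once (for the warning only); with `GLOG_v=2` it is evaluated twice.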

* source formatting (clang-format v11) (#696)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* Minor fix

* Fixes for VLOG before main

* fix vmodule

* source formatting (clang-format v11) (#718)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* memory leak fix

* Vlog change

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>
2024-04-02 17:15:30 -07:00
Benjamin Welton 1e612a5e52 Wait for all memory copies to complete before allowing destruction (#725)
* Wait for all mem copies to complete before destroying.

* Update source/lib/rocprofiler-sdk/hsa/async_copy.cpp

Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>

* Update async_copy.cpp

---------

Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
2024-04-02 08:22:37 -05:00
Jonathan R. Madsen 939e23e9d1 Stop all client contexts prior to finalization (#721)
* Stop all client contexts prior to finalization

* Update lib/common/container/static_vector.hpp

- improve emplace_back for non-{move,copy}-assignable object

* Update samples/intercept_table/client.cpp

- improve robustness against static object destruction

* Update lib/rocprofiler-sdk/context/context.cpp

- change storage of registered context array
  - stable_vector of optional contexts
  - common::static_object wrapper around stable_vector

* Update samples/intercept_table/client.cpp

- use variable template for underlying function pointer
2024-04-02 03:05:11 -05:00
Gopesh Bhardwaj e3c7eed7c0 SWDEV-451569: bug in tracing options (#728) 2024-04-02 03:03:02 -05:00
Ammar ELWazir 2905fb5e95 Update run-ci.py (#641)
* Temp: Fixing node id

* source formatting (clang-format v11) (#709)

Co-authored-by: ammarwa <3832908+ammarwa@users.noreply.github.com>

* Using logical node id

* Update agent.cpp

* Update agent.cpp

* Python formatting

* Update run-ci.py

* Update run-ci.py

* Update continuous_integration.yml

* Update continuous_integration.yml

running directly using the prepared runner container

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update run-ci.py

* Clean up

* Fixing install paths

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Fixing GPU Agents Test Validation

* python formatting (black) (#712)

Co-authored-by: ammarwa <3832908+ammarwa@users.noreply.github.com>

* Fixing the issue with rocclr detected kernels __amd_rocclr_.*

* python formatting (black) (#713)

Co-authored-by: ammarwa <3832908+ammarwa@users.noreply.github.com>

* Fixing the issue with rocclr detected kernels __amd_rocclr_.*

* Fixing static number of async copies and using hsa_api instead for validation

* python formatting (black) (#714)

Co-authored-by: ammarwa <3832908+ammarwa@users.noreply.github.com>

* Increasing the time limit for waiting on active signals

* Update continuous_integration.yml

* Update async_copy.cpp

* Update CMakeLists.txt

* changing node id to logical node id in rocprofv3

* Update tool.cpp

* testing async mem copy signal decrement

* Update logging.cpp

* Update validate.py

---------

Co-authored-by: Ammar ELWazir <aelwazir@rocprofiler1.amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ammarwa <3832908+ammarwa@users.noreply.github.com>
Co-authored-by: Ammar ELWazir <aelwazir@rocprofiler2.amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-04-02 01:39:24 -05:00
Ammar ELWazir 62625d0aa1 Use logical_node_id for mapping rocprofiler agents to HSA agents (#708)
* Temp: Fixing node id

* source formatting (clang-format v11) (#709)

Co-authored-by: ammarwa <3832908+ammarwa@users.noreply.github.com>

* Using logical node id

* Update agent.cpp

* Update agent.cpp

* Python formatting

---------

Co-authored-by: Ammar ELWazir <aelwazir@rocprofiler1.amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ammarwa <3832908+ammarwa@users.noreply.github.com>
Co-authored-by: Ammar ELWazir <aelwazir@rocprofiler2.amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-04-02 01:38:18 -05:00
Jonathan R. Madsen 092c428b78 Update internal threading (#720)
- update lib/rocprofiler-sdk/internal_threading.*
- use PTL::TaskManager instead of PTL::TaskGroup
  - easier to handle for our needs
  - eliminate data race in rocprofiler_flush_buffer
  - combine memory management of TaskManager and ThreadPool
2024-04-01 20:31:54 -05:00
Gopesh Bhardwaj ecc79b1fa3 SWDEV-452077 Fixing MI300 list counters and metrics issue (#701) 2024-03-29 14:44:38 -07:00
Benjamin Welton f0924c6aa7 Make dimension error message print the counter name (#658)
* temp

* source formatting (clang-format v11) (#659)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>
2024-03-26 17:19:04 -05:00
Jonathan R. Madsen bc9f86ec62 Update HSA copy table (#687)
- two copies of HSA table: internal and tracing
- internal is used to invoke HSA function without any possibility of triggering tracing, etc.
2024-03-26 17:11:34 -05:00
Jonathan R. Madsen 1addfed9f6 Fix agent node id + randomize offset id (#625)
* Fix agent node id + randomize offset id

- fixes the node_id value
- randomizes a constant offset for the id.handle values
- switch to using node ids in rocprofiler-sdk-tool library
- update tests related to agents

* Logical node id

- sequential node id values from 0 to (N-1) where N is the number of agents
2024-03-21 20:04:21 -05:00
Jonathan R. Madsen 2f9b1767e9 Handle hsa_queue_destroy after finalization (#679)
* Handle hsa_queue_destroy after finalization

- fixes issue where hsa_queue_destroy(...) is invoked after rocprofiler-sdk has finalized
- hsa::get_queue_controller() returns pointer
- if queue controller is a null pointer, skip invoking QueueController::destroy_queue
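
The null-pointer guard described above can be sketched roughly as follows. This is an illustration under assumptions, not the rocprofiler-sdk implementation: the `QueueController` here is a stub, and `controller_storage`, `finalize`, and `on_queue_destroy` are hypothetical helpers that only model "the controller may already be gone when a late `hsa_queue_destroy` arrives".

```cpp
#include <memory>

// Stub controller; the real class manages intercepted HSA queues.
struct QueueController
{
    int destroyed = 0;
    void destroy_queue(int /*queue_id*/) { ++destroyed; }
};

// Hypothetical storage whose reset models finalization.
inline std::unique_ptr<QueueController>& controller_storage()
{
    static auto ptr = std::make_unique<QueueController>();
    return ptr;
}

// Returns a pointer (rather than a reference) so callers can detect
// that the controller no longer exists.
inline QueueController* get_queue_controller()
{
    return controller_storage().get();
}

inline void finalize() { controller_storage().reset(); }

// Called when a queue is destroyed; returns true if the call was
// forwarded to the controller, false if it was skipped.
inline bool on_queue_destroy(int queue_id)
{
    if(auto* controller = get_queue_controller())
    {
        controller->destroy_queue(queue_id);
        return true;
    }
    return false;  // finalized already: skip instead of dereferencing null
}
```

Returning a pointer makes the "already finalized" state explicit at the call site, at the cost of requiring every caller to check it.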

* Update HIP/HSA/marker update_table logging

* Update rocprofv3 tests

- remove HSA_TOOLS_LIB env variable
- remove setting ROCPROFILER_LOG_LEVEL env variable
- add timeouts to tests which are missing them

* Disable thread sanitizer deadlock detection

* Update CI workflow

- rename vega20-ubuntu job to core-ci
- enable navi32 in core-ci and sanitizers

* Update run-ci.py

- set gcovr html medium and high threshold

* Update lib/rocprofiler-sdk/hsa/queue_controller.cpp

- remove this capture from enable/disable serialization

* Update lib/rocprofiler-sdk/hsa/{hsa_barrier,profile_serializer}.*

- hsa_barrier::set_barrier accepts const-ref to queue map
- profile_serializer::enable and profile_serializer::disable accept const-ref to queue map

* Logging for HIP/HSA/marker/profile_serializer

* Logging for HIP/HSA/marker/queue_controller

* Improve test_retired_correlation_ids asserts

* Fix tests/counter-collection/validate.py

- scale expected SQ_WAVES counter value based on warp size of GPU

* Tweak github comment for code coverage

* Remove gcovr html high/medium threshold args

* Fix tests/counter-collection/validate.py

- round before casting to int in test_counter_values

* operator bool for profile_serializer

- only wait on CV if profile_serializer is used

* Logging updates (profile_serializer + code_object)

* Update counter-collection validate.py

* QueueController does not wait on CV if finalizing/finalized

* Update CI workflow

- remove navi32 from core job

* Improve HIP/HSA/marker tracing get_functor/functor

- remove lambda wrapper around functor

* Update lib/rocprofiler-sdk/hsa/queue_controller.cpp

- do not acquire cvmutex lock during finalization

* Update lib/rocprofiler-sdk/hsa/hsa_barrier.*

- move ctor and dtor to implementation
- skip signal store screlease and destroy if already finalized

* Update CI workflow

- remove navi32 runners

* bwelton fixes for hangs

* CMake improvements + simplified demangle

- remove amd-comgr from common target (and thus removed from roctx DT_NEEDED)

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
2024-03-21 17:52:15 -05:00
Vladimir Indic 78939e705a PCS parser is aware of external correlation IDs (#639)
* PCS parser is aware of external correlation IDs

* source formatting (clang-format v11) (#640)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-03-15 14:04:06 -05:00
Jonathan R. Madsen 0fdb21c050 Context Updates (#624)
* Improve error checks related to context create/start/stop/is_valid

* Bump version to 0.2.1

* Track number of kernels associated with correlation id

- add atomic kernel counter variable to context::correlation_id

* Update lib/rocprofiler-sdk/hsa/queue.cpp

- apply the +/- kernel count
2024-03-14 04:40:58 -05:00
Jonathan R. Madsen 7ab1a8015f Fix tracing context domain logic for operations (#621)
* Fix tracing context domain logic for operations

- logic error: domain enabled (all operations are implicitly enabled) + domain enabled for subset of operations resulted in only explicitly enabled operations being treated as enabled
- domain_context: split single bitset for operations in all domains into array of bitsets for each domain
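
One way to picture the per-domain layout is the sketch below. It is an assumption-laden illustration rather than the actual `domain_context`: the type `domain_context_sketch`, the constants `num_domains` and `max_ops`, and the choice to represent "whole domain enabled" by setting every operation bit are all hypothetical.

```cpp
#include <array>
#include <bitset>
#include <cstddef>

constexpr std::size_t num_domains = 3;   // hypothetical number of tracing domains
constexpr std::size_t max_ops     = 64;  // hypothetical upper bound on operations per domain

struct domain_context_sketch
{
    // One bitset of operations per domain instead of a single shared bitset.
    std::array<std::bitset<max_ops>, num_domains> ops = {};

    // Enabling a whole domain marks every operation in that domain.
    void enable_domain(std::size_t domain) { ops.at(domain).set(); }

    // Enabling a single operation only adds to what is already enabled,
    // so it can never narrow an earlier whole-domain request.
    void enable_op(std::size_t domain, std::size_t op) { ops.at(domain).set(op); }

    bool is_enabled(std::size_t domain, std::size_t op) const
    {
        return ops.at(domain).test(op);
    }
};
```

In this sketch, calling `enable_domain(0)` followed by `enable_op(0, 3)` leaves every operation of domain 0 enabled, which is the behavior the fix is described as restoring.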

* Update lib/common/mpl.hpp

- assert_false for static_asserts in if constexpr expressions

* Update lib/rocprofiler-sdk/tests/contexts.cpp

- Tests for validating logic regarding domain and operations for callback and buffer tracing
2024-03-14 01:25:43 -05:00
Ammar ELWazir 2bfce8b86d Temporarily move CI to hip staging (#615)
* Update continuous_integration.yml

* Update ostream.hpp
2024-03-13 12:34:29 -05:00
Benjamin Welton a06eef3488 Use static_object for dimension map (#616) 2024-03-13 09:28:33 -05:00
Jonathan R. Madsen 8591ed1c96 Use small_vector for API iterate_args (#597)
* Use small_vector for API iterate_args

- replace dim3 value arguments with rocprofiler_dim3_t
  - dim3 has a non-trivial destructor
- common::mpl::unqualified_type
- common::stringified_argument_array_t<N> alias
- assert_public_data_type_properties()
- common::container::small_vector<T>::at function
- stringize returns small_vector<stringified_argument>
  - stack allocated vector
- remove has_pc_sampling condition (HSA, HIP)
  - this will be handled in queue interception

* Misc tweaks
2024-03-13 07:36:55 -05:00
Benjamin Welton 2a262235db Disable test that checks dims (need AQL fake data support) (#612)
* Disable test that checks dims (need AQL fake data support)

* source formatting (clang-format v11) (#613)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>
2024-03-12 23:20:14 -05:00
Benjamin Welton 1de44447f4 Deadlock Fix for HSA and Serialization Disable/Enabling support (#582)
* Initial barrier

* Working on profiler serializer extraction

* Current progress

* Serialization Support

* source formatting (clang-format v11) (#583)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* cmake formatting (cmake-format) (#584)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* Minor fix

* Current Progress

* Current progress

* More fixes

* Serialization Fixes

* Bug fix

* source formatting (clang-format v11) (#600)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* More fixes

* More minor fixes

* source formatting (clang-format v11) (#603)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* source formatting (clang-format v11) (#604)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* Lock order inversion false positive

* order fix

* More changes

* source formatting (clang-format v11) (#607)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* minor test fix

* Minor test changes

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>
2024-03-08 09:02:43 -06:00
Jonathan R. Madsen 7b6d3c70bd Shared Library Constructor (rocprofv3 deadlock fix) (#599)
* Moved tests/apps to tests/bin

* Renamed cmake project in tests/bin

* Update samples

- Use ROCPROFILER_DEFAULT_FAIL_REGEX
- tweaks to stdout messages

* Update tests

- Use ROCPROFILER_DEFAULT_FAIL_REGEX

* Add tests/lib

- libraries with HIP code

* Update PTL submodule

- remove atexit delete of thread_id_map

* Update cmake/rocprofiler_options.cmake

- Set ROCPROFILER_DEFAULT_FAIL_REGEX

* Update common lib: env + logging

- improved customization of logging settings
- default to disabling logging to files
- install failure handler for rocprofv3
- set_env support in environment.*

* Add lib/rocprofiler-sdk/shared_library.cpp

- shared library constructor

* Update lib/rocprofiler-sdk-tool/tool.cpp

- destructor thread safety
- convert callback_name_info and buffered_name_info to pointers
- install failure handler for logging

* Add tests/bin/hip-in-libraries

- hip-in-libraries is an exe which uses two shared libraries where each shared library contains HIP kernels
  - used for testing deadlocking within __hipRegisterFatBinary

* Update bin/rocprofv3

- reorganized the env variables
- use exec to launch command
- set ROCPROFILER_LIBRARY_CTOR=1

* Add tests/rocprofv3/tracing-hip-in-libraries

- uses hip-in-libraries exe for exe which uses shared libraries to launch HIP kernels

* Update bin/rocprofv3

- fix counter collection (no exec)

* Update lib/rocprofiler-sdk-tool/tool.cpp

- replace "Kernel-Name" with "Kernel_Name"

* Update lib/rocprofiler-sdk/registration.cpp

Use RTLD_LOCAL instead of RTLD_GLOBAL for env libraries

* Update tests/rocprofv3

- replace "Kernel-Name" with "Kernel_Name"

* Update tests

- vector-ops (bin) stream syncs + runs with 4 queues per device
- improve counter-collection/input1 validation
- rocprofv3/tracing-hip-in-libraries does not do sys-trace
- improved validation script for tracing-hip-in-libraries
- updated dispatch_callback in json-tool.cpp following reworking of prototypes for counter collection

* Update samples/counter_collection

- updated dispatch_callback(s) and record_callback(s) following reworking of prototypes

* Update bin/rocprofv3

- reorganized help menu
- added options for sub-HSA tables
- added --hip-runtime-trace
- changed --hip-trace to include --hip-compiler-trace

* Update lib/rocprofiler-sdk-tool

- improved kernel filtering
- removed arch_vgpr, accum_vgpr, sgpr code (in rocprofiler-sdk)
- fixed issue with counter-collection w/o tracing
- added support for fine grained HSA API tracing
- removed directly linking to HSA-runtime

* Update lib/rocprofiler-sdk/agent.cpp

- rocp_agents != hsa_agents is non-fatal when ROCPROFILER_BUILD_CI=OFF (CMake option)

* GPR (vector and scalar) info in kernel symbol data

- rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t contains general purpose register info

* Header include order fix

- Include repo headers first
- Third party library headers next
- standard library headers last

* Update dispatch profiling public API

- introduce rocprofiler_profile_counting_dispatch_data_t
- change signature of rocprofiler_profile_counting_dispatch_callback_t and rocprofiler_profile_counting_record_callback_t
- provide rocprofiler_user_data_t pointer in dispatch callback
- provide rocprofiler_user_data_t value (from dispatch cb) in record callback

* Update tests/bin/CMakeLists.txt

- fix add_subdirectory(hip-in-libraries) order

* Update VERSION

- bump to 0.2.0 in prep for AFAR
2024-03-07 22:21:26 -06:00
Jonathan R. Madsen 1d33d4cf78 Update rocprofiler_query_available_agents(...) (#596)
* Agent info version

* Complete implementation

- revert "rocprofiler_iterate_agents" to "rocprofiler_query_available_agents"

* Misc tweaks

- update rocprofiler_query_available_agents impl

* Update include/rocprofiler-sdk/agent.h

- Fix undocumented param for rocprofiler_query_available_agents
2024-03-06 02:17:40 -06:00
Jonathan R. Madsen 19971d5719 Fix rocprofiler_context_is_active(...) (#595)
* Fix rocprofiler_context_is_active

- previously returning ROCPROFILER_STATUS_ERROR_CONTEXT_NOT_FOUND if context was inactive

* Update include/rocprofiler-sdk/context.h

- Update doxygen docs
2024-03-06 00:32:34 -06:00
SrirakshaNag c7407d0a9f Adding list-metrics (#585)
* Adding list-metrics

* cmake formatting (cmake-format) (#587)

Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>

* source formatting (clang-format v11) (#586)

Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>

* Fixing issues with validation tests

* python formatting (black) (#588)

Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>

* source formatting (clang-format v11) (#589)

Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>

* cmake formatting (cmake-format) (#590)

Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>

* Update conftest.py

* Update validate.py

* python formatting (black) (#591)

Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>

* python formatting (black) (#592)

Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>

* Checking if agent-id in validate.py

* Fixing list metrics execute test

* cmake formatting (cmake-format) (#593)

Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>

* Fixing CI failure

* cmake formatting (cmake-format) (#594)

Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>

* Review Comments

* Update source/bin/rocprofv3

Support -L shorthand for --list-metrics

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
2024-03-05 20:53:15 -06:00
Jonathan R. Madsen b0a88d9124 Update registration client search (#569)
* Update registration client search

- Search ROCP_TOOL_LIBRARIES before dlopen search
- Fatal error if ROCP_TOOL_LIBRARIES entry does not contain rocprofiler_configure symbol
- Use RTLD_DEFAULT and RTLD_NEXT to (potentially) find first two instances of rocprofiler_configure
  - if no rocprofiler_configure found via RTLD_NEXT, do not do extensive search via link map
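
The `RTLD_DEFAULT` / `RTLD_NEXT` lookup can be sketched with plain `dlsym` calls. This is a simplified illustration, not the registration code itself: `find_configure_symbols` is a hypothetical helper, and the handling of `ROCP_TOOL_LIBRARIES` and the link-map search are omitted. It assumes a glibc-style platform where `_GNU_SOURCE` exposes both pseudo-handles.

```cpp
#ifndef _GNU_SOURCE
#    define _GNU_SOURCE  // needed for RTLD_DEFAULT / RTLD_NEXT on glibc
#endif
#include <dlfcn.h>

#include <vector>

// Hypothetical helper: returns up to two distinct addresses for `name`.
inline std::vector<void*>
find_configure_symbols(const char* name = "rocprofiler_configure")
{
    std::vector<void*> found;

    // First occurrence in the default (global) search order.
    if(void* first = dlsym(RTLD_DEFAULT, name)) found.push_back(first);

    // Next occurrence after the calling object in the search order.
    void* next = dlsym(RTLD_NEXT, name);

    // Keep it only if it is a different definition from the first one.
    if(next && (found.empty() || next != found.front())) found.push_back(next);

    // If RTLD_NEXT found nothing, the caller would skip any further,
    // more expensive search through the link map.
    return found;
}
```

On older glibc versions this may need to be linked with `-ldl`.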

* _GNU_SOURCE instead of GNU_SOURCE

* Clang-tidy fix
2024-03-01 17:44:12 -06:00
Jonathan R. Madsen 1bb94add11 Fix rocprofiler_iterate_callback_tracing_kind_operation_args for HIP compiler callbacks (#532)
* Fix HIP compiler iterate args

- `include/rocprofiler-sdk/hip/api_args.h`
  - replace struct fields named "f" with "func"
  - replace hip stream fields named "hStream" with "stream"
- `lib/rocprofiler-sdk/callback_tracing.cpp`
  - iterate_args for HIP compiler table
- `lib/rocprofiler-sdk/registration.cpp`
  - fix warning about roctx num_tables
- `lib/rocprofiler-sdk/hip/hip.def.cpp`
  - replace struct fields named "f" with "func"
  - replace hip stream fields named "hStream" with "stream"
- `lib/rocprofiler-sdk/{hip,hsa,marker}/utils.hpp`
  - improve `stringize_impl`
- `lib/rocprofiler-sdk/hsa/code_object.cpp`
  - remove stale commented out code
- `lib/rocprofiler-sdk/hsa/queue_controller.*`
  - destory_queue -> destroy_queue
- `tests/tools/json-tool.cpp`
  - improve parallelism in tool_tracing_callback
  - serialize the marker api args
  - only invoke rocprofiler_iterate_callback_tracing_kind_operation_args in exit phase
- `samples/counter_collection/CMakeLists.txt`
  - reduce timeout on tests to 120 seconds

* Update lib/rocprofiler-sdk/hsa/utils.hpp

- disable dereference of double pointer in stringize_impl

* Update lib/common

- indirection_level in mpl.hpp
- stringize_arg.hpp

* Rework rocprofiler_iterate_callback_tracing_kind_operation_args

- provide more information in rocprofiler_callback_tracing_operation_args_cb_t
- support specifying the dereference level to account for output parameters
2024-03-01 01:46:07 -06:00
Jonathan R. Madsen a1267e1fd2 C compatibility for public headers (#566)
* C compatibility for public headers

- add tests/tools/c-tool.c
  - builds a tool (which does nothing) with C language
  - ensures that tool can be compiled in C
- add tests/c-tool/CMakeLists.txt
  - ensures that tool library built from C is a valid tool
- rocprofiler_counter_info_v0_t is_derived is int instead of bool
  - C does not have bool unless <stdbool.h> is included
- add `include/rocprofiler-sdk/hsa/api_trace_version.h`
  - handles providing HSA_*_TABLE_(MAJOR|STEP)_VERSION values if compiled from C
- cmake define in version.h.in for ROCPROFILER_HSA_*_TABLE_(MAJOR|STEP)_VERSION
  - HSA table versions compiled with
- use rocprofiler_(hsa|hip|marker)_api_no_args struct to handle incompatibility b/t empty structs in C vs. C++ (size of 0 vs. size of 1)
- extern "C" in include/rocprofiler-sdk/{hsa,hip,marker}/api_args.h
- fixed spelling error: derrived -> derived
- scope YY_NO_INPUT compile definition to lib/rocprofiler-sdk/counters/parser/*
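
The empty-struct incompatibility mentioned above can be illustrated with a short sketch. The names `empty_args`, `sketch_api_no_args`, `sketch_api_with_value`, and `sketch_api_args` are hypothetical and are not the SDK's actual types; the sketch only shows why a one-byte placeholder member gives the same layout in both languages.

```cpp
#include <cstddef>

// In C++ an empty struct still occupies one byte. In C, empty structs are
// a compiler extension (GNU C gives them size 0), so the same declaration
// would produce a different layout depending on the language.
struct empty_args
{};
static_assert(sizeof(empty_args) == 1, "C++ empty struct has size 1");

extern "C" {
// Hypothetical "no arguments" placeholder: one explicit byte, so the size
// is 1 whether the header is compiled as C or as C++.
typedef struct sketch_api_no_args
{
    char empty;
} sketch_api_no_args;

// Hypothetical argument struct for a function taking one int.
typedef struct sketch_api_with_value
{
    int value;
} sketch_api_with_value;

// A union of per-function argument structs, in the style of an API args
// header; functions without parameters use the placeholder struct.
typedef union sketch_api_args
{
    sketch_api_no_args    no_args;
    sketch_api_with_value with_value;
} sketch_api_args;
}

static_assert(sizeof(sketch_api_no_args) == 1, "placeholder is exactly one byte");
```

The same reasoning applies to using `int` instead of `bool` for a flag: the type is available in C without including `<stdbool.h>`.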

* Revert CDash dashboard
2024-02-29 23:49:54 -06:00
SrirakshaNag bd81e6a5f8 tools support for callback counter collection (#515)
* tools support for callback counter collection

* source formatting (clang-format v11) (#516)

Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>

* Fixing conflicts with main

* source formatting (clang-format v11) (#517)

Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>

* source formatting (clang-format v11) (#529)

Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>

* Adding a column for counter name and counter value

* source formatting (clang-format v11) (#530)

Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>

* fixing issues with atomic

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>
2024-02-25 20:30:38 -06:00
Jonathan R. Madsen 875f53b608 Correlation ID Retirement + misc (#527)
* Correlation ID Retirement

- include/rocprofiler-sdk/buffer_tracing.h
  - add rocprofiler_buffer_tracing_correlation_id_retirement_record_t
- include/rocprofiler-sdk/fwd.h
  - ROCPROFILER_BUFFER_TRACING_CORRELATION_ID_RETIREMENT
- lib/rocprofiler-sdk/buffer_tracing.cpp
  - kind string for correlation id retirement
- lib/rocprofiler-sdk/buffer.hpp
  - emplace returns bool
- lib/rocprofiler-sdk/registration.cpp
  - pass lib_instance to copy_table functions
- lib/rocprofiler-sdk/context/context.*
  - update correlation_id struct
    - make ref_count private
    - {get,add,sub}_ref_count() functions
      - sub_ref_count() performs correlation id retirement
    - use stack for "latest" thread-local correlation id
- lib/rocprofiler-sdk/hip/hip.*
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - return in iterate_args
  - handle table instance in copy_table
- lib/rocprofiler-sdk/hsa/hsa.*
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - return in iterate_args
  - handle table instance in copy_table
- lib/rocprofiler-sdk/marker/marker.*
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - return in iterate_args
  - handle table instance in copy_table
- lib/rocprofiler-sdk/hsa/async_copy.cpp
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - handle table instance in async_copy_init / async_copy_save
- lib/rocprofiler-sdk/hsa/queue.cpp
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - tweak to external correlation id mapping in WriteInterceptor
- tests/async-copy-tracing/validate.py
  - check retired_correlation_ids
- tests/common/serialization.hpp
  - support rocprofiler_buffer_tracing_correlation_id_retirement_record_t
- tests/kernel-tracing/validate.py
  - check retired_correlation_ids
- tests/common/CMakeLists.txt
  - perfetto external project
- tests/common/perfetto.hpp
  - perfetto categories + aliases
  - add_perfetto_annotation
  - metaprogramming helpers
- tests/tools/CMakeLists.txt
  - link to tests-perfetto
- tests/tools/json-tool.cpp
  - demangling functions
  - serialization of marker API callback args
  - reduce parallel bottleneck in tool_tracing_callback
  - support correlation id retirement
  - Multiple threads for buffers
  - Support ROCPROFILER_TOOL_CONTEXTS_EXCLUDE env variable
  - write_perfetto() function
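
The {get,add,sub}_ref_count() pattern described for the correlation_id struct can be sketched as follows. This is an assumption-laden illustration, not the SDK implementation: the class name, the callback type, and the choice to retire when the count reaches zero are all hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical sketch: the reference count is private and only reachable
// through get/add/sub, and the final sub triggers retirement.
class correlation_id_sketch
{
public:
    using retire_fn = std::function<void(std::uint64_t)>;

    correlation_id_sketch(std::uint64_t id, retire_fn on_retire)
    : m_id{id}
    , m_on_retire{std::move(on_retire)}
    {}

    std::uint32_t get_ref_count() const { return m_ref_count.load(); }

    // Returns the count after the increment.
    std::uint32_t add_ref_count() { return m_ref_count.fetch_add(1) + 1; }

    // Returns the count after the decrement; retires the id when it hits zero.
    std::uint32_t sub_ref_count()
    {
        const auto previous = m_ref_count.fetch_sub(1);
        assert(previous > 0 && "reference count underflow");
        if(previous == 1 && m_on_retire) m_on_retire(m_id);
        return previous - 1;
    }

private:
    std::uint64_t              m_id;
    retire_fn                  m_on_retire;
    std::atomic<std::uint32_t> m_ref_count{0};
};
```

Keeping the counter private means every decrement goes through the one place that can emit the retirement record.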

* Update tests/rocprofv3/tracing/validate.py

- tweak test_hsa_api_trace

* Update PTL submodule

- fixes for data race during destruction of task

* Update lib/rocprofiler-sdk/buffer.*

- unique_buffer_vec_t uses std::unique_ptr instead of allocator::unique_static_ptr_t

* Reduce timeouts in counter collection samples [skip ci]

* Update tests/tools/json-tool.cpp

- tweak demangle(string_view, int*) -> demangle(string_view, int&)
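
A minimal sketch of a demangle helper that takes the status by reference, as in the signature change above. The function name is hypothetical, and the body assumes a GCC- or Clang-compatible toolchain that provides abi::__cxa_demangle from <cxxabi.h>; it is not the json-tool implementation.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>
#include <string_view>

// Hypothetical helper: the caller always passes a valid int, so there is no
// null-pointer case to handle as there would be with an int* parameter.
inline std::string
demangle_sketch(std::string_view name, int& status)
{
    // __cxa_demangle needs a null-terminated string, so copy the view.
    const std::string mangled{name};
    char* demangled = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if(status != 0 || demangled == nullptr) return mangled;  // fall back to the input

    std::string result{demangled};
    std::free(demangled);  // the buffer is malloc-allocated by the ABI library
    return result;
}
```

On failure the input is returned unchanged and status holds the nonzero code reported by the ABI library.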

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- move sub_ref_count() to later in async_copy_handler to delay retirement slightly more
2024-02-23 10:30:33 -06:00
Gopesh Bhardwaj 2d71520953 Fixing opensuse compilation (#521)
* Fixing opensuse compilation

* source formatting (clang-format v11) (#526)

Co-authored-by: bgopesh <bgopesh@users.noreply.github.com>

* Update tests/tools/CMakeLists.txt

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bgopesh <bgopesh@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
2024-02-22 01:10:23 -06:00
Jonathan R. Madsen 0d939edbba Updates/fixes for CI, docs, tests, samples, and common library (#528)
- .github/workflows/continuous_integration.yml
  - apt-get update before apt-get install
  - remove libgtest-dev
  - actions-comment-pull-request: v2.4.3 -> v2.5.0
- .github/workflows/formatting.yml
  - create-pull-request: v5 -> v6
- cmake/rocprofiler_options.cmake
  - remove unused ROCPROFILER_DEBUG_TRACE and ROCPROFILER_LD_AQLPROFILE options
- samples/counter_collection/callback_client.cpp
  - corr_id field renamed to correlation_id
- samples/counter_collection/client.cpp
  - corr_id field renamed to correlation_id
- include/rocprofiler-sdk/fwd.h
  - In rocprofiler_record_counter_t: rename corr_id field to correlation_id
  - doxygen fixes
- lib/common/utility.*
  - remove get_accurate_clock_id_impl
  - timestamp_ns() defaults to CLOCK_BOOTTIME
- lib/rocprofiler-sdk/counters/core.cpp
  - fix spelling mistake: extrenal -> external
  - corr_id field renamed to correlation_id
- lib/rocprofiler-sdk-tool/tool.cpp
  - fix destruction of static tool::output_file before finalization
- scripts/update-docs.sh
  - define PROJECT_NAME
- tests/async-copy-tracing/validate.py
  - init_time and fini_time checks
  - hip_api_traces, marker_api_tracing
- tests/common/serialization.hpp
  - fix save function for rocprofiler_record_counter_t following rename of corr_id to correlation_id
- tests/kernel-tracing/validate.py
  - init_time and fini_time checks
  - relax test_total_runtime range
- tests/rocprofv3/tracing/CMakeLists.txt
  - remove -M from rocprofv3-test-systrace-execute
  - exclude test_hsa_api_trace in rocprofv3-test-systrace-validate due to HIP API tracing
- tests/rocprofv3/tracing/validate.py
  - update test_kernel_trace to accept mangled or demangled
- tests/tools/json-tool.cpp
  - remove use of GLOG
  - include init_time and fini_time
  - write_json(...) function
2024-02-22 00:16:43 -06:00
Benjamin Welton 7adffd5b22 Add rocprofiler_query_counter_info function (#452)
* Add rocprofiler_query_counter_info function

Replaces rocprofiler_query_counter_name. Allows for
querying other types of info from counters (such as
description) and gives us some flexibility to add
return data in the near future (if we have to).

* source formatting (clang-format v11) (#453)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Updated version fetching

* source formatting (clang-format v11) (#509)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Merged

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
2024-02-19 16:05:38 -08:00
Benjamin Welton 3638351b4c Callback based handler for counter collection (#506)
* Callback based handler for counter collection

* source formatting (clang-format v11) (#507)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* cmake formatting (cmake-format) (#508)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Doc fix

* Minor doc fix

* More doc fixes

* More doc fixes

* More doc fixes

* Update CI

* Changes to the API per comments

* Mutex exception for HSA

* source formatting (clang-format v11) (#511)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Doc fix

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
2024-02-19 15:55:21 -08:00
Gopesh Bhardwaj c9dfa9e617 Fixing AFAR builds for some GNU tool chains (#498) 2024-02-15 08:08:37 -08:00
Benjamin Welton e9e7fc8e3f Add GFX1010 to counter data (#466)
NAVI 10 support.
2024-02-12 13:42:13 -06:00
Benjamin Welton 3eb6a27bc6 Add support for AQL dimensions (#262)
* Add support for AQL dimension changes

Adds support for returning dimensions from AQLProfile through rocprofiler
to tools. Includes a much larger test suite that covers nearly
all files in counter collection.

Specific changes below:

samples/counter_collection/print_functional_counters: Modified to check
the validity of dimensions returned in comparison to the actual underlying
data obtained from a kernel execution.

rocprofiler-sdk/aql/helpers: adds function calls to support fetching
dimension information from AQLProfile.

rocprofiler-sdk/aql/packet_construct: modified to allow for events
to be exported to aid evaluate_ast in decoding the output buffer.

lib/rocprofiler-sdk/counters: Instance count now derived from dimension
sizes. rocprofiler_query_counter_dimensions now moved to a callback format
to improve usability.
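
One plausible reading of "instance count derived from dimension sizes" is a product over the sizes of all dimensions. The sketch below assumes exactly that; the struct and function names are hypothetical and do not come from the SDK.

```cpp
#include <cstddef>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical description of one counter dimension.
struct dimension_sketch_t
{
    std::string name;
    std::size_t size;
};

// Assumption: the number of instances is the product of the dimension sizes,
// so a counter with no dimensions has a single instance.
inline std::size_t
instance_count_sketch(const std::vector<dimension_sketch_t>& dims)
{
    return std::accumulate(
        dims.begin(),
        dims.end(),
        std::size_t{1},
        [](std::size_t total, const dimension_sketch_t& dim) { return total * dim.size; });
}
```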

rocprofiler-sdk/counters/core: Code migrations and exports of functions
for testing.

rocprofiler-sdk/counters/dimensions: Generates a dimension cache to be
used when querying dimension information for a counter id.

rocprofiler-sdk/counters/evaluate_ast: Modified to pass back correct
dimension information and to check/determine output dimensions for derived
counters.

rocprofiler-sdk/counters/id_decode: Modified to have a map between
dimension name -> dimension along with a conversion from the aql profile
id for a dimension (string) -> integer based id (happens only once during
init).

rocprofiler-sdk/hsa/queue: Modified to make testing easier.
Specifically, Queue can now be mocked in unit tests for counter
collection.

* Merge with changes for serialization

* Added suggestions

* source formatting (clang-format v11) (#457)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Minor fix

* Test change

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
2024-02-07 22:03:21 -06:00
Gopesh Bhardwaj 8a25b239bc Fixing counter collection in tools and enabling tests (#436)
* Fixing counter collection in tools and enabling tests

* fixing tests

* improving coverage on test

* Adding vector operations app

* Fixing tools bug for counter collection

* removing roctx linking
2024-02-06 09:55:07 -08:00
SrirakshaNag f6198f226a Kernel Serialization Support (#379)
* Serialization-rebased with main branch

* Removing client_id from queue completion callbacks

* removing debugging code

* source formatting (clang-format v11) (#449)

Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>

* moving ready signal handler to anonymous namespace

* source formatting (clang-format v11) (#450)

Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>

* Handling deque search better in queue destructor

* source formatting (clang-format v11) (#451)

Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>

* disabling test_total_runtime test in code coverage

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>
2024-02-05 12:42:59 -06:00
Jonathan R. Madsen aaff4976d2 Kernel Tracing Fix (#439)
* Update lib/rocprofiler-sdk/hsa/queue.cpp

- switch to using the kernel_pkt.kernel_dispatch.completion_signal instead of the interrupt signal to get the dispatch time

* Update tests/kernel-tracing/validate.py

- add verification of total runtime collected in test_timestamps
  - the sum of the runtime of all the kernels in reproducible-runtime should be ~1 sec +/- 10%

* Remove include/rocprofiler-sdk/rocprofiler_plugin.h

* Update CI workflow

- update actions/cache@v3 -> v4
- actions/cache/save@v3 -> v4
- thollander/actions-comment-pull-request@v2 -> v2.4.3

* Update pytest.ini

- change default options to a more verbose set

* Update tests/kernel-tracing/CMakeLists.txt

- skip test_total_runtime when Address or Thread Sanitizer enabled
  - overhead skews the results

* Update tests/kernel-tracing/validate.py

- separate test_total_runtime test
2024-01-30 14:52:17 -06:00
Jonathan R. Madsen 3f39339926 API Tracing Overhaul (#437)
* Update include/rocprofiler-sdk/hsa/*

- split HSA API IDs into separate enumerations
- add support for finalize ext table

* Update include/rocprofiler-sdk/hip/*

- remove compiler_api_args.h
- rocprofiler_hip_api_args_t contains all arguments for both the HIP runtime and HIP compiler APIs
- ROCPROFILER_HIP_API_ID_ -> ROCPROFILER_HIP_RUNTIME_API_ID_

* Update include/rocprofiler-sdk/marker/table_api_id.h

- ROCPROFILER_MARKER_API_TABLE_ID_ -> ROCPROFILER_MARKER_TABLE_ID_

* Update include/rocprofiler-sdk/*/table_api_id.h

- table_api_id.h -> table_id.h

* Update include/rocprofiler-sdk/fwd.h

- ROCPROFILER_CALLBACK_TRACING_HSA_API split into 4 enum values:
  - ROCPROFILER_CALLBACK_TRACING_HSA_CORE_API
  - ROCPROFILER_CALLBACK_TRACING_HSA_AMD_EXT_API
  - ROCPROFILER_CALLBACK_TRACING_HSA_IMAGE_EXT_API
  - ROCPROFILER_CALLBACK_TRACING_HSA_FINALIZE_EXT_API
- ROCPROFILER_BUFFER_TRACING_HSA_API split into 4 enum values:
  - ROCPROFILER_BUFFER_TRACING_HSA_CORE_API
  - ROCPROFILER_BUFFER_TRACING_HSA_AMD_EXT_API
  - ROCPROFILER_BUFFER_TRACING_HSA_IMAGE_EXT_API
  - ROCPROFILER_BUFFER_TRACING_HSA_FINALIZE_EXT_API
- rocprofiler_callback_tracing_code_object_operation_t renamed to rocprofiler_code_object_operation_t (more consistent)
- doxygen updates

* Update include/rocprofiler-sdk/buffer_tracing.h

- improved doxygen comments
- removed unused rocprofiler_buffer_tracing_queue_scheduling_record_t
- removed unused rocprofiler_buffer_tracing_correlation_record_t

* Update include/rocprofiler-sdk/callback_tracing.h

- removed rocprofiler_callback_tracing_hip_compiler_api_data_t
  - rocprofiler_hip_api_args_t and rocprofiler_hip_compiler_api_args_t were combined
  - rocprofiler_hsa_api_retval_t and rocprofiler_hsa_compiler_api_retval_t were combined

* Update lib/rocprofiler-sdk/hsa/*

- utils.hpp
  - formatters for hsa_ext_program_t and hsa_ext_control_directives_t
- defines.hpp
  - removed variadic macros from lib/common/defines.hpp
  - HSA_API_META_DEFINITION, HSA_API_INFO_DEFINITION_0, HSA_API_INFO_DEFINITION_V specialize on table id
- async_copy.cpp
  - ROCPROFILER_HSA_API_ID_* -> ROCPROFILER_HSA_AMD_EXT_API_ID_*
  - add table id to templates
  - improve async_copy_fini
- hsa.hpp
  - add hsa_table_id_lookup
  - add hsa_domain_info
  - add table id to templates
  - add copy_table function
- hsa.cpp
  - add table id to templates
  - require hsa tables to be trivial and standard layout
  - remove set_data_args specialization for hsa_amd_memory_async_copy_rect
  - implement copy_table function
- hsa.def.cpp
  - update enums

* Update lib/rocprofiler-sdk/hip/*

- defines.hpp
  - use lib/common/defines.hpp
  - add hip_table_id_lookup to HIP_API_TABLE_LOOKUP_DEFINITION
- hip.hpp
  - hip_table_id_lookup
  - template iterate_args on table id
  - templated copy_table and update_table
- hip.cpp
  - replaced api_id_bounds with hip_domain_info
  - templated iterate_args on table id
  - templated copy_table and update_table

* Update lib/rocprofiler-sdk/marker/*

- defines.hpp
  - use lib/common/defines.hpp
- marker.cpp
  - updated enums
- marker.def.cpp
  - updated enums

* Update lib/rocprofiler-sdk/tests

- common.hpp
  - ROCPROFILER_CALL_EXPECT
  - callback_data_ext
  - update get_callback_tracing_names with new enums
  - update get_buffer_tracing_names with new enums
- external_correlation.cpp
  - support new HSA API enums
- intercept_table.cpp
  - use test/common.hpp
  - update to new HSA API enums
- registration.cpp
  - support new HSA API enums
- naming.cpp
  - validation for all get_ids(), get_names(), name_by_id(), id_by_name(), etc.

* Update lib/common

- defines.hpp
  - Move IMPL_DETAIL_FOR_EACH_NARG, GET_ADDR_MEMBER_FIELDS, and GET_NAMED_MEMBER_FIELDS here
    - used by HSA, HIP, and Marker
- static_object.hpp
  - is_trivial_standard_layout static constexpr member function
  - suppress register_static_dtor when is_trivial_standard_layout
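
A sketch of the is_trivial_standard_layout check using standard type traits. The wrapper and the two example types are hypothetical; only the trait combination is taken from the description above, and skipping destructor registration for such types is shown as a simple boolean rather than the real registration call.

```cpp
#include <string>
#include <type_traits>

// Hypothetical wrapper mirroring the static constexpr member described above.
template <typename Tp>
struct static_object_sketch
{
    static constexpr bool is_trivial_standard_layout()
    {
        return std::is_trivial<Tp>::value && std::is_standard_layout<Tp>::value;
    }

    // Trivial, standard-layout objects need no cleanup at exit, so the
    // destructor registration can be skipped for them.
    static constexpr bool needs_static_dtor() { return !is_trivial_standard_layout(); }
};

// Hypothetical API-table-like type: plain function pointers only.
struct table_sketch_t
{
    void (*fn_a)();
    void (*fn_b)();
};

// Hypothetical type with a non-trivial member.
struct named_sketch_t
{
    std::string name;
};
```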

* Update lib/rocprofiler-sdk/hsa/code_object.*

- name_by_id
- id_by_name
- get_names
- get_ids

* Update lib/rocprofiler-sdk/registration.cpp

- Update rocprofiler_set_api_table for HSA

* Update lib/rocprofiler-sdk/callback_tracing.cpp

- Update for new HSA enums
- Rework to use switch statement
  - rocprofiler_query_callback_tracing_kind_operation_name
  - rocprofiler_iterate_callback_tracing_kind_operations
  - rocprofiler_iterate_callback_tracing_kind_operation_args

* Update lib/rocprofiler-sdk/buffer_tracing.cpp

- Update for new HSA enums
- Rework to use switch statement
  - rocprofiler_query_buffer_tracing_kind_operation_name
  - rocprofiler_iterate_buffer_tracing_kind_operations

* Update lib/rocprofiler-sdk-tool

- helper.cpp
  - update get_buffer_id_names with new enums
  - update get_callback_id_names with new enums
- tool.cpp
  - update to use new HSA enums

* Update samples/common

- added call_stack.hpp
  - source_location struct
  - call_stack_t alias
  - print_call_stack function
- added name_info.hpp
  - utils for getting buffer/callback domain and operation names

* Update samples/api_buffered_tracing/client.cpp

- use samples/common/call_stack.hpp
- use samples/common/name_info.hpp
- update for new HSA enums

* Update samples/api_callback_tracing/client.cpp

- use samples/common/call_stack.hpp
- use samples/common/name_info.hpp
- update for new HSA enums

* Update tests/tools/json-tool.cpp

- update for new HSA enums

* Update tests/rocprofv3/tracing/validate.py

- update for new HSA domain names

* Update samples/counter_collection/main.cpp

- reduce number of kernels to 50,000 since 200,000 causes issues with thread sanitizer
2024-01-30 12:14:26 -06:00
Jonathan R. Madsen 9efafc4d23 Split ROCTx API tables and update intercept table API (#421)
* Update include/rocprofiler-sdk

- buffer_tracing.h
  - fix doxygen for rocprofiler_buffer_tracing_hip_api_record_t
  - update doxygen for rocprofiler_buffer_tracing_marker_api_record_t
    - remove unused marker_id field
- fwd.h
  - Split ROCPROFILER_CALLBACK_TRACING_MARKER_API into ROCPROFILER_CALLBACK_TRACING_MARKER_{CORE,CONTROL,NAME}_API
  - Split ROCPROFILER_BUFFER_TRACING_MARKER_API into ROCPROFILER_BUFFER_TRACING_MARKER_{CORE,CONTROL,NAME}_API
  - split rocprofiler_runtime_library_t into rocprofiler_runtime_library_t and rocprofiler_intercept_table_t
    - after split of ROCTx into 3 tables, specifying rocprofiler_at_internal_thread_create became confusing

* Update include/rocprofiler-sdk-roctx/api_trace.h

- Split into three tables: core, control, and name
  - core: what it sounds like
  - control: functions for controlling the profiler
  - name: functions for giving resources names

* Update lib/rocprofiler-sdk-roctx/roctx.cpp

- modifications following split into multiple tables

* Update lib/rocprofiler-sdk/marker/*

- modifications following split of ROCTx API into multiple intercept tables

* Update lib/rocprofiler-sdk/tests

- common.hpp
  - add enums to get_callback_tracing_names() and get_buffer_tracing_names()
- intercept_table.cpp
  - update test to use rocprofiler_intercept_table_t (and enums) instead of rocprofiler_runtime_library_t
  - update OR combos tested
- roctx.cpp
  - updates following split of ROCTx API table into multiple tables
  - use simplified specification of control API

* Update lib/rocprofiler-sdk

- buffer_tracing.cpp
  - Updates for ROCPROFILER_BUFFER_TRACING_MARKER_{CORE,CONTROL,NAME}_API enum values
- callback_tracing.cpp
  - Updates for ROCPROFILER_CALLBACK_TRACING_MARKER_{CORE,CONTROL,NAME}_API enum values
- intercept_table.hpp
  - notify_runtime_api_registration -> notify_intercept_table_registration
- intercept_table.cpp
  - updates for new rocprofiler_intercept_table_t enum and new ROCTx tables
- registration.cpp
  - updates for new rocprofiler_intercept_table_t enum and new ROCTx tables
  - updates for notify_runtime_api_registration -> notify_intercept_table_registration

* Update lib/rocprofiler-sdk-tool

- helper.cpp
  - Updates for new enums in get_callback_id_names() and get_buffer_id_names()
- tool.cpp
  - migrate to new enums for split ROCTx tables
  - use simplified split for control table vs. core+name tables

* Update samples/{api_callback_tracing,intercept_table}

- intercept_table/client.cpp
  - rocprofiler_runtime_library_t -> rocprofiler_intercept_table_t
- api_callback_tracing/client.cpp
  - Updates for new enums in get_callback_id_names()
  - use simplified split for control table vs. core+name tables
  - migrate to new enums for split ROCTx tables

* Update tests

- rocprofv3/tracing/validate.py
  - handle new marker domain names
- tools/json-tool.cpp
  - Updates for new enums in get_callback_id_names() and get_buffer_id_names()
  - use simplified split for control table vs. core+name tables
  - migrate to new enums for split ROCTx tables

* Update tests/rocprofv3/tracing/CMakeLists.txt

- fix FAIL_REGULAR_EXPRESSION for rocprofv3-test-trace-execute

* Update lib/rocprofiler-sdk-tool/{output_file,tool}.*

- logging in output_file dtor
- support stdout/stderr

* Update lib/common/container/record_header_buffer.hpp

- reduce probability of is_empty() returning true while emplace is happening

* Update lib/rocprofiler-sdk-tool/tool.cpp

- logging for buffered_tracing_callback
- counter collection uses CSV encoder

* Update bin/rocprofv3

- remove -i flag from help menu
2024-01-26 13:56:15 -06:00
Benjamin Welton 75264b5587 Clang-tidy performance error fixes (#411)
Fixes perf errors + ambiguity issues raised by clang-tidy
2024-01-26 10:19:18 -08:00
Jonathan R. Madsen aa813f5c9b Update lib/rocprofiler-sdk/hsa/queue_controller.cpp (#420)
- designated initializers for default_agent
2024-01-26 07:13:15 -06:00
Jonathan R. Madsen 3547a45c0c Improve buffer flush error handling (#416)
* Update include/rocprofiler-sdk/fwd.h

- add ROCPROFILER_STATUS_ERROR_FINALIZED error code

* Update lib/rocprofiler-sdk/rocprofiler.cpp

- status string for ROCPROFILER_STATUS_ERROR_FINALIZED

* Update lib/rocprofiler-sdk/buffer.cpp

- return error code if buffer flush invoked after finalized
- fatal error if task group destroyed
- error message if task runs after finalized
- improve join of task group

* Update lib/rocprofiler-sdk/counters/tests/evaluate_ast_tests.cpp

- Update lambdas to return reference due to strange -Warray-bounds and -Wstringop-overflow warnings with g++ (Ubuntu 13.1.0-8ubuntu1~20.04.2) 13.1.0
2024-01-26 04:01:09 -06:00
Jonathan R. Madsen 9a8b6f6b7b Counter API and Samples Updates (#410)
* Update include/rocprofiler-sdk/{counters,profile_config}.h

- use rocprofiler_agent_id_t instead of rocprofiler_agent_t

* Update samples

- use rocprofiler-sdk::rocprofiler-sdk instead of rocprofiler::rocprofiler in cmake
- api_callback_tracing sample roctxProfiler{Pause,Resume}
- api_callback_tracing sample uses ROCTx
- updates to use rocprofiler_agent_id_t

* Update run-ci.py

- exclude rocprofiler-sdk-tool from samples (no sample uses that code)

* Update lib/rocprofiler-sdk-tool/tool.cpp

- Update rocprofiler_iterate_agent_supported_counters to use agent ID

* Update lib/rocprofiler-sdk/counters/core.*

- profile_config has pointer to agent instead of copy

* Update lib/rocprofiler-sdk/agent.*

- provide get_agent(...) func via rocp agent id

* Update lib/rocprofiler-sdk/{buffer,callback}_tracing.cpp

- return ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED for enums missing implementation

* Update lib/rocprofiler-sdk/counters.cpp

- update to use rocprofiler_agent_id_t instead of rocprofiler_agent_t

* Update lib/rocprofiler-sdk/profile_config.cpp

- update to use rocprofiler_agent_id_t instead of rocprofiler_agent_t

* Update source/docs

- requirements.txt + install reqs in cmake

* Bump version to 0.1.0

* Update samples/api_callback_tracing/CMakeLists.txt

- LD_LIBRARY_PATH for test

* Update tests/rocprofv3/tracing/CMakeLists.txt

- reorder validation files so memory copy comes first

* Update lib/rocprofiler-sdk-tool/tool.cpp

- logging for flushing buffers
- variables for buffer_size and buffer_watermark
  - increase the watermark to a full buffer
- use dedicated threads for each buffer

* Update lib/rocprofiler-sdk-tool/CMakeLists.txt

- test sets ROCPROF_LOG_LEVEL and ROCPROFILER_LOG_LEVEL to info

* Remove lib/rocprofiler-sdk-tool/trace_buffer.hpp

* Update lib/rocprofiler-sdk-tool/CMakeLists.txt

- drop log level to warning when leak sanitizer is enabled (produces small memory leak)
2024-01-25 23:47:40 -06:00
Jonathan R. Madsen c641749fe6 HIP API Tracing (#357)
* Update include/rocprofiler-sdk/hip*

- updates for intercept table

* Update lib/common/units.hpp

- clang-tidy fixes

* Add lib/rocprofiler-sdk/hip

- tracing implementation for the HIP intercept table

* Update source/lib/rocprofiler-sdk/CMakeLists.txt

- add_subdirectory(hip)

* Update source/lib/rocprofiler-sdk/hsa

- offset function in hsa_api_info<Idx>
- remove report_activity, set_callback
- Tweak HSA_API_TABLE_LOOKUP_DEFINITION

* Update lib/rocprofiler-sdk/hip

- rocprofiler::hip::copy_table
- stringize_impl print dereferenced pointers when possible

* Update lib/rocprofiler-sdk/hsa/utils.hpp

- stringize_impl print dereferenced pointers when possible

* Update lib/rocprofiler-sdk/tests/intercept_table.cpp

- remove failures for intercepting HIP API tables

* Update include/rocprofiler-sdk/fwd.h

- add ROCPROFILER_HIP_RUNTIME_LIBRARY (== ROCPROFILER_HIP_LIBRARY)
- add ROCPROFILER_HIP_COMPILER_LIBRARY

* Update lib/rocprofiler-sdk/buffer_tracing.cpp

- Support ROCPROFILER_BUFFER_TRACING_HIP_API in rocprofiler_query_buffer_tracing_kind_operation_name
- Support ROCPROFILER_BUFFER_TRACING_HIP_API in rocprofiler_iterate_buffer_tracing_kind_operations

* Update lib/rocprofiler-sdk/callback_tracing.cpp

- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_query_callback_tracing_kind_operation_name
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_iterate_callback_tracing_kind_operations
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_iterate_callback_tracing_kind_operation_args

* Update lib/rocprofiler-sdk/intercept_table.cpp

- support HipDispatchTable and HipCompilerDispatchTable

* Update lib/rocprofiler-sdk/internal_threading.cpp

- Support ROCPROFILER_HIP_COMPILER_LIBRARY

* Update lib/rocprofiler-sdk/registration.cpp

- Support "hip" and "hip_compiler" in rocprofiler_set_api_table
- Added some extra logging

* Update samples/api_{buffered,callback}_tracing

- Modifications to demonstrate HIP API tracing

* Update tests/kernel-tracing

- Modifications to handle/test HIP API tracing

* Separate HIP tracing from HIP compiler tracing

* Fix installation of include/rocprofiler-sdk/hip/*

- add compiler and table headers to install

* Fixes to HIP interception

- hip_api_trace.hpp was updated a bit
  - removed hipGetDeviceProperties (generic)
  - added hipGetDevicePropertiesR0600
  - added hipGetDevicePropertiesR0000
  - removed hipRegisterTracerCallback
  - reordered hipCreateChannelDesc, hipExtModuleLaunchKernel, hipHccModuleLaunchKernel
  - added hipDrvGraphAddMemsetNode
- static asserts in hsa_api_info ensuring ordering of pointers

* Update lib/rocprofiler-sdk/hip/hip.*

- use size_t instead of rocprofiler_hip_table_api_id_t as non-type template parameter (smaller binary)
- separated out population of callback_context_data and buffered_context_data into non-template function (significantly smaller binary)

* Update lib/rocprofiler-sdk/hsa/hsa.*

- separated out population of callback_context_data and buffered_context_data into non-template function (significantly smaller binary)

* Update tests/kernel-tracing/validate.py

- does not expect any hip_api_traces until libamdhip.so actually starts using rocprofiler-register

* Update tests/tools/json-tool.cpp

- fix context associated with "HIP_API_CALLBACK"

* Update external/CMakeLists.txt

- move misc variables to top of CMakeLists.txt so they apply to all external subprojects
  - BUILD_TESTING (OFF)
  - BUILD_SHARED_LIBS (OFF)
  - BUILD_OBJECT_LIBS (OFF)
  - BUILD_STATIC_LIBS (ON)
  - CMAKE_POSITION_INDEPENDENT_CODE (ON)
  - CMAKE_VISIBILITY_INLINES_HIDDEN (ON)
  - CMAKE_CXX_VISIBILITY_PRESET (hidden)
- disable using libunwind in glog

* Update lib/rocprofiler-{sdk,sdk-tool}/CMakeLists.txt

- remove explicit setting of SKIP_BUILD_RPATH

* Update CMakeLists.txt

- set high-level CMAKE_BUILD_RPATH and CMAKE_INSTALL_RPATH_USE_LINK_PATH

* Update tests/CMakeLists.txt

- include(GNUInstallDirs)

* Update samples/CMakeLists.txt

- include(GNUInstallDirs)

* Update include/rocprofiler-sdk/hip/{compiler_api,api}_args.h

- remove extern "C" due to incompatibility b/t empty struct in C (size 0) vs. empty struct in C++ (size 1)

* Update lib/rocprofiler-sdk/hip/details/ostream.hpp

- clang-tidy fixes

* Update cmake/rocprofiler_linting.cmake

- add a feature for clang tidy exe

* Update lib/rocprofiler-sdk/hip/hip.cpp

- use recursion instead of fold expression due to clang-tidy errors (maximum nesting level exceeded)
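
The difference between a fold expression and the recursive alternative can be sketched on a toy problem (joining arguments into a string). Both functions are hypothetical illustrations and have nothing to do with the actual hip.cpp code; they only show that the two forms produce the same result.

```cpp
#include <sstream>
#include <string>
#include <utility>

// C++17 fold expression: the whole pack expands into one nested expression.
template <typename... Args>
std::string
join_fold(Args&&... args)
{
    std::ostringstream os;
    ((os << args << ';'), ...);
    return os.str();
}

// Recursive alternative: base case for an empty pack.
inline void
join_recursive_impl(std::ostringstream&)
{}

// Recursive alternative: handle one argument, then recurse on the rest.
template <typename Head, typename... Tail>
void
join_recursive_impl(std::ostringstream& os, Head&& head, Tail&&... tail)
{
    os << head << ';';
    join_recursive_impl(os, std::forward<Tail>(tail)...);
}

template <typename... Args>
std::string
join_recursive(Args&&... args)
{
    std::ostringstream os;
    join_recursive_impl(os, std::forward<Args>(args)...);
    return os.str();
}
```

The recursive form spreads the work over one function instantiation per argument instead of a single deeply nested expression, which is the kind of restructuring that avoids nesting-depth limits in analysis tools.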

* Update lib/rocprofiler-sdk/buffer_tracing.cpp

- fix merge

* Update lib/rocprofiler-sdk/callback_tracing.cpp

- fix merge

* Update bin/rocprofv3

- args for marker, HIP runtime, and HIP compiler tracing

* Update tests/apps/simple-transpose

- use roctx

* Update tests/rocprofv3/tracing

- validate marker API data

* Update lib/rocprofiler-sdk-tool

- support for HIP runtime, HIP compiler, marker API

* Update queue/queue_controller/registration/utility

- call hsa::queue_controller_fini() during finalization
- add a yield function to common/utility.hpp
  - implements a thread yield + sleep
- add a sync function to Queue class
- add a iterate_queues member function to QueueController
  - this is used to sync each queue during queue_controller_fini()
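
A yield function that combines a thread yield with a short sleep can be sketched as a polling loop. The function name, the default pause, and the attempt limit are hypothetical; the sketch only shows how such a helper could be used to wait for outstanding work during finalization.

```cpp
#include <chrono>
#include <cstddef>
#include <thread>

// Hypothetical helper: poll `done` until it returns true, yielding the thread
// and sleeping briefly between polls. Returns the number of pauses taken.
template <typename Predicate>
std::size_t
wait_until_sketch(Predicate&&               done,
                  std::chrono::microseconds pause        = std::chrono::microseconds{100},
                  std::size_t               max_attempts = 10000)
{
    std::size_t attempts = 0;
    while(!done() && attempts < max_attempts)
    {
        std::this_thread::yield();           // give other threads a chance to run
        std::this_thread::sleep_for(pause);  // then back off briefly
        ++attempts;
    }
    return attempts;
}
```

The attempt limit keeps the loop from spinning forever if the condition never becomes true.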

* Fix data races: queue/context/stable_vector

- stable_vector::emplace_back returns reference
- correlation id map uses stable_vector
- queue_info_session has explicit fields for queue id, hsa agent, rocp agent
- use hsa::get_table() in AsyncSignalHandler
- WriteInterceptor does not use TLS for context array
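
An emplace_back that returns a reference which stays valid as the container grows can be sketched with std::deque as a stand-in. This is not the SDK's stable_vector; it only relies on the standard guarantee that inserting at the end of a deque does not invalidate references to existing elements.

```cpp
#include <cstddef>
#include <deque>
#include <utility>

// Hypothetical stand-in for a container with stable element addresses.
template <typename Tp>
class stable_vector_sketch
{
public:
    // Construct in place and hand back a reference, so the caller does not
    // need a second lookup that could race with another insertion.
    template <typename... Args>
    Tp& emplace_back(Args&&... args)
    {
        m_data.emplace_back(std::forward<Args>(args)...);
        return m_data.back();
    }

    Tp&         at(std::size_t idx) { return m_data.at(idx); }
    std::size_t size() const { return m_data.size(); }

private:
    std::deque<Tp> m_data;
};
```

Note that the sketch itself is not thread-safe; concurrent callers would still need external synchronization around emplace_back.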

* Update lib/rocprofiler-sdk/hsa/hsa.*

- static object for API subtables
- accessors for API subtables
- google tests for HSA API subtables

* Update lib/rocprofiler-sdk/hsa/{queue,async_copy}.cpp

- use HSA subtable accessors

* Update rocprofiler_memcheck and CI workflow

- use GCC 13 instead of GCC 11 due to suspected false positives in thread sanitizer
  - GCC 13 uses libtsan.so.2

* Update CI workflow

* Update lib/rocprofiler-sdk/counters/{metrics,counters}

- fix possibly dangling reference to a temporary from gcc-13

* Update thread-sanitizer-suppr.txt

- Ignore data races originating in hsa-runtime library

* Update cmake/rocprofiler_memcheck.cmake

- Deduce the sanitizer library to preload by compiling an application and extracting the linked sanitizer library

* Update tests/rocprofv3/tracing/CMakeLists.txt

- add csv files to REQUIRED_FILES and ATTACH_ON_FAIL in validate test

* Update lib/common/container/record_header_buffer.hpp

- fix data race identified by gcc v13 and libtsan.so.2

* Update hip API id, args, and def

- remove hipDrvGraphAddMemsetNode (not part of ROCm 6.0)

* Update lib/common/container/record_header_buffer.hpp

- fix deadlock in save/read/reset

* Update source/docs/CMakeLists.txt

- remove COMMAND_ERROR_IS_FATAL ANY to allow for printing of stdout/stderr

* Update lib/rocprofiler-sdk/hip/details/ostream.hpp

- remove overloads for HIP_MEMSET_NODE_PARAMS

* Update docs/CMakeLists.txt

- use find_program for shell instead of hardcoded /bin/bash
2024-01-24 16:32:54 -06:00