Commit Graph

39 Commits

Author SHA1 Message Date
Rawat, Swati 97b7a6315d update copyright date to 2025 (#102)
* Update LICENSE

* Update conf.py

* Update copyright year

* [fix] Update copyright year

* Update copyright year "ROCm Developer Tools"

* Add license headers to c++ files

* Add license to *.py

* Update licenses in rocdecode sources

---------

Co-authored-by: srawat <120587655+SwRaw@users.noreply.github.com>
Co-authored-by: Mythreya <mythreya.kuricheti@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-01-22 19:11:20 -06:00
Trowbridge, Ian e307b89ca4 rocDecode API Tracing Support (#49)
* rocDecode API Tracing support

* Test bin file added to rocdecode. Need to add validate python methods

* Added option to not make rocDecode tests

* Added rocdecode and rocprofv3 tests

* Added csv test

* Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependency. Changed cmake option to disable tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI

* Add option to avoid building rocdecode tests

* Added option to avoid building rocdecode bin file

* Merge conflict error

* CMake files changed in response to review comments. Attempting to implement callbacks.

* Turned off test building for rocdecode

* Minor fixes for review comments

* Review comments

* Updated formatting

* Document changes and format.hpp reversion. Need to remove iterate args support for now for later update.

* Remove iterate args support

* Remove iterate-args

* enforce abi versioning in macro if

* Fix doc error

* removed spaces to fix indentation error

---------

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
2025-01-17 14:42:25 -08:00
Jakaraddi, Manjunath 78d8f4b8ea SWDEV-492623: Hip Host Function to Device Symbols Mapping (#18)
* Adding changes to register and read symbols from the hip fat binary

* adding json output for host_functions

* added error handling

* adding json tool support

* Adding tests

* formatting changes

* Adding documentation

* refactoring as per amd-staging

* Adding initializers and changing macros

* Fix page-migration background thread on fork (#31)

* Fix page-migration background thread on fork

After falling off main in the forked child, all the children
try to join on the parent's monitoring thread. This results
in a deadlock. Parent is waiting for the child to exit, but
the child is trying to join the parent's thread which is
signaled from the parent's static destructors.

Even with just one parent and child, due to copy-on-write
semantics, a child signalling the background thread to join
will still block (thread's updated state is not visible
in the child).

This fix creates background threads on fork per-child with a
pthread_atfork handler, ensuring that each child has its own
monitoring thread.

* Formatting fixes

* Detach page-migration background thread and update test timeout

* Attach files with ctest

* Update corr-id assert

* Tweak on-fork, simplify background thread

* Revert thread detach

* Adding --collection-period feature in rocprofv3 to match v1/v2 parity (#9)

* Adding Trace Period feature to rocprofv3

* Adding feature documentation

* Update source/bin/rocprofv3.py

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fixing format

* Moving to Collection Period and changing the input params

* Format Fixes

* Fixing rebasing issues

* Removing atomic include from the tool

* Adding more options for units, optimizing the code

* Fixing rocprofv3.py

* Fixing time conv & adding time controlled app

* Fixing format

* Changing to shared memory testing methodology

* use of shmem use

* Fix include headers for transpose-time-controlled.cpp

* Format upload-image-to-github.py

* Removing shmem and using only env var to dump timestamps from the tool

* Tool Fixes + Test Config

* Adding Tests

* Fixing Review comments

* Update trace period implementation

* Update trace period tests

* check between start and stop timestamps

* Merge Fix

* Update validate.py

* Improve safety of rocprofiler_stop_context after finalization

* Pass context id to collection_period_cntrl by value

* Adding 20 us error margin

* Ensure log level for collection-period test is not more than warning

---------

Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

* Update lib/rocprofiler-sdk/code_object/hip/code_object.*

- move error code check macros to implementation
- fix macros which check error code
- use constexpr values instead of #define

* Update lib/rocprofiler-sdk/code_object/hip/code_object.*

- debugging for error that cannot be locally reproduced

* Update lib/rocprofiler-sdk/code_object/hip/code_object.*

- improve error handling and logging

* Update lib/rocprofiler-sdk/code_object/hip/code_object.*

- tweak to non-fatal logging messages

* Update lib/rocprofiler-sdk/code_object/hip/code_object.*

- cleanup of logging messages

* Update host kernel symbol register data fields

* Update source/lib/rocprofiler-sdk/code_object/hip/code_object.hpp

---------

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Kuricheti, Mythreya <Mythreya.Kuricheti@amd.com>
Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-12-06 11:42:37 +00:00
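
The page-migration fork fix folded into the commit above (#31) gives each forked child its own monitoring thread through a `pthread_atfork` handler. A minimal Python sketch of that pattern follows, using `os.register_at_fork` as a stand-in for `pthread_atfork`. The names `start_monitor`, `stop_monitor` and `run_child_and_wait` are hypothetical and are not part of rocprofiler-sdk; the sketch illustrates only the "one monitor per child" idea and does not reproduce the original deadlock.

```python
import os
import threading

_stop = threading.Event()
_monitor = None  # background thread owned by the current process


def _monitor_loop(stop):
    # Stand-in for the page-migration monitor: idle until asked to stop.
    while not stop.wait(0.01):
        pass


def start_monitor():
    """Start a monitoring thread that belongs to the calling process."""
    global _stop, _monitor
    _stop = threading.Event()
    _monitor = threading.Thread(target=_monitor_loop, args=(_stop,), daemon=True)
    _monitor.start()


def stop_monitor():
    """Signal and join this process's own monitoring thread."""
    _stop.set()
    _monitor.join(timeout=1.0)
    return not _monitor.is_alive()


# Only the forking thread survives fork(), so the child creates a fresh
# monitor instead of trying to join the thread that lives in the parent.
os.register_at_fork(after_in_child=start_monitor)


def run_child_and_wait():
    """Fork; the child joins its own monitor and reports the result as its exit code."""
    pid = os.fork()
    if pid == 0:
        ok = _monitor is not None and _monitor.is_alive() and stop_monitor()
        os._exit(0 if ok else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

Calling `start_monitor()` in the parent and then `run_child_and_wait()` should return 0 on a POSIX system, since the child signals and joins a thread it created itself rather than one that exists only in the parent.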
Madsen, Jonathan 00c46fd5e5 SDK: OMPT Support (#22)
* Ability to select alternative compiler per file

Implementation of ompt interface to rocprofiler SDK. task_create and task_schedule are not supported.

Misc updates

Update OpenMP target sample

- samples/ompt -> samples/openmp_target
- fix sample test of openmp-target
- reorganize files

Rework OpenMP implementation

Minor OpenMP implementation cleanup

Rename samples/openmp_target CMake targets

Add tests/bin/openmp

- OpenMP target test app in tests/bin/openmp/target

Format samples/openmp_target CMakeLists.txt

Misc lib/rocprofiler-sdk/openmp cleanup

- fix includes
- convert_arg

Update openmp.def.cpp

- tweak includes
- remove lots of temporary variables

Update samples

- common::get_callback_id_names() -> common::get_callback_tracing_names()
- add kernel dispatch, memory copy, scratch memory buffered tracing to openmp target sample

Fix code object operation names

- add "CODE_OBJECT_" prefix

Update include/rocprofiler-sdk/openmp/api_id.h

- remove spurious comment

Miscellaneous openmp updates

- similar API for openmp_begin and openmp_end
- move implementations of ompt callbacks to openmp.cpp
- ompt_{thread_begin,thread_end,parallel_begin,parallel_end}_callbacks are openmp_events

[SWDEV-484495] Fix int truncation in CSV output (#1098)

CSV output truncates doubles to ints when it shouldn't. Derived metrics
are (mostly) doubles and lose precision (or become worthless) if treated
as an int. Converted these to double to match the format we return from
rocprof-sdk.

Co-authored-by: Benjamin Welton <ben@amd.com>
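
The truncation described in the CSV fix above (#1098) is easy to reproduce in isolation. The following is a small Python sketch, not the tool's C++ code; the column names and the two metric names are hypothetical and chosen only to show why derived metrics need to stay doubles.

```python
import csv
import io


def write_counter_rows(rows, as_int):
    """Write (counter name, value) rows to CSV text.

    as_int=True mimics the truncating behaviour; as_int=False keeps doubles.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Counter_Name", "Counter_Value"])  # hypothetical header
    for name, value in rows:
        writer.writerow([name, int(value) if as_int else float(value)])
    return out.getvalue()


# Hypothetical derived metrics: ratios and averages are rarely whole numbers.
rows = [("HYPOTHETICAL_HIT_RATE", 0.8731), ("HYPOTHETICAL_AVG_BUSY", 41.96)]

truncated = write_counter_rows(rows, as_int=True)
preserved = write_counter_rows(rows, as_int=False)
```

With integer conversion the hit rate is written as `0`, which makes the metric worthless; written as a double it keeps its value.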

Update limit for max counter records in rocprof-tool (#1073)

A fixed sized std::array is used to store counter records in rocprofiler SDK. This limit was breached in SWDEV-484742. Upping the limit to 512 to be less likely to reach this limit again.

adding proxy ompt_data_t * arguments

fixes for proxy pointers

- Implement proxy ompt_data_t* pointers for clients
- Add ompt_data_t* arguments back to callback API
- Modify openmp sample to illustrate use of proxy pointers

formatting

SWDEV-467350: Skipping tool counter iteration for unsupported hardware (#1083)

Fixing some accumulate metrics (#1089)

* Fixing some accumulate metrics

* Fixing some more accumulate metrics

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>

updating rocprofv3 help options (#1113)

* updating rocprofv3 help options

* updating CHANGELOG

Fixing installed package tests in CI (#1119)

* Fixing installed package tests in CI

* Formatted rocprofv3.py with black formatter

SWDEV-488948: PC Sampling - Correlation class to provide some thread safety. Adding multithread tests. (#1112)

* SWDEV-488948: PC Sampling - Correlation class to provide some thread safety. Adding multithread tests.

* Update source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Adding backlog for codeobj changes

* Formatting

* Update source/lib/rocprofiler-sdk/pc_sampling/code_object.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/code_object.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

---------

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

SWDEV-487621: Fixes for metric definitions (#1118)

* Fixes for metric definitions

* Removing gfx8

* Update changelog

* Fixing unit tests

* Small fixes

* Fix for write size

Fix PSDB change (#1120)

Reverts change to `source/include/rocprofiler-sdk/callback_tracing.h`
from commit 9b2ece76c3

clang-18 build fix for RCCL (#1123)

Removes ambiguity on const usage, which clang-18 complains about
(preventing build with warn error).

mem copy direction field update (#1124)

Adding Node-id for debugging with log level trace (#1090)

fix botched rebase

Per Jonathan to remove -rdynamic warning so CI will continue

pedantic formatting

Correct the package name of rocprofiler-sdk (#1126)

* Correct the package name of rocprofiler-sdk

ROCM VERSION(for ex: 60300) was missing in the package name.
Added the same

* Use cmake cache string while setting the variable for ROCm Version

* correct the cmake-format

---------

Co-authored-by: Ranjith Ramakrishnan <Ranjith.Ramakrishnan@amd.com>

Fixing kokkosp tool library packaging (#1121)

* Fixing kokkosp tool library packaging

* Update source/lib/rocprofiler-sdk-tool/kokkosp/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update CMakeLists.txt

* Update CMakeLists.txt

* Component Requirement in CPack

* Adding package dependency

* Update CMakeLists.txt

* Update rocprofiler_config_packaging.cmake

* Fix rocprofiler-sdk-tool-kokkosp BUILD/INSTALL RPATH

- CMAKE_INSTALL_LIBDIR doesn't help

* Add BUILD/INSTALL RPATH to rocprofv3-trigger-list-metrics

- fixes packaging issues

* Update packaging

- core depends on rocprofiler-sdk-roctx
- add CPACK_DEBIAN_PACKAGE_SHLIBDEPS_PRIVATE_DIRS to resolve inter-package dependencies

* Fix package depends version format

* Improve tests/rocprofv3/summary/validate logging

* Update CI workflow

- prioritize roctx package in Install Packages step

* Remove setting <package-name>_VERSION in config.cmake.in

- this is automatically handled by existence of <package-name>-config-version.cmake

* Update rocprofiler-sdk-config.cmake

- relax find_package versioning requirements to same major and minor version

* Update rocprofiler-sdk-config.cmake

- relax find_package versioning requirements (remove EXACT, specify range)

* Tweak CI workflow

* Update perfetto_reader.py

- better handle failure to load trace processor

* Misc cleanup for config packaging

* Update config packaging

* Update config packaging

* Revert perfetto for core-rpm packages

* Revert perfetto for core-rpm packages

- perfetto < 0.9.0

* Tweak tests/rocprofv3/summary/validate.py

- reorder some checks

---------

Co-authored-by: Ammar Elwazir <aelwazir@useocpm2m-387-013.amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

Clang Warning Fixes (#1131)

Builds prevented on clang-18

Adding start and end timestamp columns in csv (#1128)

* Adding start and end timestamp columns in csv

* Adding assert check for the counter timestamps

---------

Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>

rocprofv3: docs and help menu updates (#1129)

* doc updates

* Correcting ROCtx information

* Making ROCTx string consistent

* missing occurrence

Renamed agent profiling service to device counting service (#1132)

* Renamed agent profiling service to device counting service

Name more aptly represents what agent profiling did (device wide
counter collection). Conversion of existing user code can be
performed by the following find/sed command:

find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +

* Converted dispatch profile to dispatch counting service

* Debug for functional counters test

* Minor changes for CI

* Minor fix

* More fixes for CI

* Update evaluate_ast.cpp

---------

Co-authored-by: Benjamin Welton <ben@amd.com>

Testing updated RPM dockers (#1136)

* Testing updated RPM dockers

* Trying to fix PSDB for test package dependency

Agent Profiling Fixes for Broken/Improper API Usage (#1122)

Prevents multiple setups of agent profiling on the same agent.

Fixes agent read context to only read agents that were set up.

Prevent copy of agent profiling internal data struct and reset
hsa_signal on move to prevent inadvertent delete.

Simplifying PR template (#1139)

delete unused files

added arguments to some OMPT buffer records

* Fix cmake issues

Remove rocprofiler_ompt_finalize_tool

- a public API function is not necessary: should just finalize rocprofiler-sdk

Fix duplicate ROCPROFILER_{BUFFER,CALLBACK}_TRACING_KIND_STRING

Add lib/rocprofiler-sdk/ompt.hpp

- declares rocprofiler::sdk::finalize_ompt

Remove change to tests/rocprofv3/summary/conftest.py

Add set_fini_status(1) back to registration.cpp

Deleted unneeded files

Incorporate OpenMP code and sample

Fix merge issues with amd-staging

Add push_correlation_id for OpenMP tasking; improve debuggability

fixup bad merge

* Suppress OpenMP data race

* Fix openmp_target sample

* Enum and struct name changes + source code reorg

- remove mix of ompt and openmp
  - opted for ompt
- changes made for consistency
  - ompt_api -> ompt
  - openmp_api -> ompt
  - OPENMP -> OMPT

* Update tests and more renaming

- dest_device_num -> dst_device_num
- src_addr -> src_address
- dest_addr -> dst_address
- remove info_type::begin
- require OMP_TARGET_OFFLOAD

* Update openmp-target test/sample env and labels

* Formatting

* Tweaks to cmake for openmp target

- Disable for thread sanitizers due to preloading issue

* OpenMP target cmake updates

- remove gfx1010 (fails on mi300)
- OPENMP_GPU_TARGETS

* Remove device_unload and target_map_emi support

- these are never supported by AMD OpenMP compilers

* Update CI workflow

- exclude openmp-target tests from navi3 and vega20

---------

Co-authored-by: Larry Meadows <Lawrence.Meadows@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-12-05 22:48:19 -06:00
Jonathan R. Madsen 249c50fc40 Runtime Initialization Tracing (#1105)
* Runtime initialization tracing

- callbacks and buffer entries notifying when a runtime has been initialized

* Minor cleanup to registration.cpp

* JSON tool implementation

* Increase perfetto_reader timeout

* Handle perfetto_reader timeout when attr doesn't exist

* clang-tidy fixes to memory_allocation.cpp
2024-11-18 20:50:29 -06:00
itrowbri 3bd7773cf7 Memory Allocation Tracking (#1142)
* Initial commit: Need to implement wrapper function to collect data and test that wrapper function is correctly replacing core HSA functions

* Attempted to implement wrapper implementation for hsa memory allocation functions. Need to modify generate record files and test if implementation is working as expected

* Debugging and implementing generateCSV function

* Memory allocation size and starting address outputted to csv and json file formats

* Formatting

* Initial setup for OTF2 and Perfetto generation

* Collecting agent id for memory_allocation and formatting

* Modified memory_allocation.cpp to set up code for AMD_EXT commands

* Support for memory_pool_allocate added

* Removed accidentally added file

* Made flag optional and added more OTF2 and Perfetto code. Needs testing to ensure perfetto and OTF2 works

* Formatting

* Fixed perfetto and otf2 output

* Fixed flag issue due to incorrect buffer use

* Updated documentation

* Small cleaning and comments

* Added test for HSA memory allocation tracing

* Fixed summary test validation errors due to allocation tracing. Added type to location_base to create unique event ids for allocation due to OTF2 trace error

* Decreased lower limit of hip calls for test

* Modified summary tests to vary number of allocate requests

* Minor fixes to address comments. Still need to address OTF2 comments

* Fix docs and changed OTF2 to use enum for type specified in location_base construction

* Fixed schema error

* Added vmem command tracking. Need to add test

* Updated test to work with vmem command and updated generateCSV to output int instead of hex string.

* OTF2 enum update and misspelling fix

* CI does not support Virtual Memory API. Removed vmem test. Will add back if CI is modified to support vmem API

* Update CMakeLists.txt for memory allocation test

* Updated summary test

* Minor fixes to address comments

* Moved domain_type.hpp enum to before LAST

* Fixed compile errors and formatting

* Fixed stats summary domain name error

* Added rocprofv3 test

* Page migration test fix

* Undo page migration test changes. Failures do not appear to have to do with memory allocation
2024-11-18 20:22:14 -06:00
Benjamin Welton bb69467765 Renamed agent profiling service to device counting service (#1132)
* Renamed agent profiling service to device counting service

Name more aptly represents what agent profiling did (device wide
counter collection). Conversion of existing user code can be
performed by the following find/sed command:

find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +

* Converted dispatch profile to dispatch counting service

* Debug for functional counters test

* Minor changes for CI

* Minor fix

* More fixes for CI

* Update evaluate_ast.cpp

---------

Co-authored-by: Benjamin Welton <ben@amd.com>
2024-10-18 14:14:11 +05:30
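
For code bases where running `sed` in place is inconvenient, the rename listed in the commit above can be sketched in Python. The old and new names are copied from the sed expressions in the commit message; the helper names `rename_identifiers` and `rename_in_file` are hypothetical. One deliberate difference: the `.` in `agent_profile.h` is matched literally here, whereas in the sed expression it matches any character.

```python
import re

# Old name -> new name, copied from the sed expressions in the commit message.
RENAMES = {
    "rocprofiler_agent_profile_callback_t": "rocprofiler_device_counting_service_callback_t",
    "rocprofiler_configure_agent_profile_counting_service": "rocprofiler_configure_device_counting_service",
    "agent_profile.h": "device_counting_service.h",
    "rocprofiler_sample_agent_profile_counting_service": "rocprofiler_sample_device_counting_service",
}

# One alternation over all old names, escaped so they match literally.
_PATTERN = re.compile("|".join(re.escape(old) for old in RENAMES))


def rename_identifiers(text):
    """Return text with every old name replaced by its new name."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(0)], text)


def rename_in_file(path):
    """Rewrite one file in place; return True if its contents changed."""
    with open(path, encoding="utf-8") as handle:
        original = handle.read()
    updated = rename_identifiers(original)
    if updated != original:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(updated)
    return updated != original
```

Applying `rename_identifiers` to a string first makes it possible to preview the changes before any file is rewritten.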
Gopesh Bhardwaj 320427b5f5 rocprofv3: docs and help menu updates (#1129)
* doc updates

* Correcting ROCtx information

* Making ROCTx string consistent

* missing occurrence
2024-10-17 13:28:53 +05:30
Mythreya 2a146259c7 Add support for RCCL tracing (#1047)
* [Draft]: Add support for RCCL tracing

Address comments

* [Draft]: Add support for RCCL tracing

Address PR comments, changes from RCCL upstream

* Add RCCL library table registration

Working on adding support to rocprofiler-register

* Support compilation w/o <rccl/amd_detail/api_trace.h>

- dummy api_trace.h header
- return ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED when RCCL does not have api_trace.h header

* RCCL API tracing tool support

- add to rocprofv3
- add to json-tool

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-09-12 00:42:58 -05:00
Jonathan R. Madsen 5d54682468 Misc cleanup and stale code removal (#1026)
* Remove custom allocators

- remove unused lib/rocprofiler-sdk/allocator.*
- remove unused lib/rocprofiler-sdk/context/allocator.hpp

* Fix rocprofiler_strip_target (rocprofiler_utilities.cmake)

* Remove old HSA_TOOLS_LIB support

- remove OnLoad/OnUnload functions used by HSA_TOOLS_LIB env variable

* Fix linter warnings + specific NOLINT exceptions

- replace bare NOLINT with NOLINT(<warning-name>)
2024-08-20 01:07:32 -05:00
Giovanni Lenzi Baraldi fa1b9e67ab ATT Agent fixes and improvements (#1011)
* Tidying ATT dispatch API. ATT Agent to be initialized with rest of profiler. Removing read_index-based wait.

* Formatting

* Adding some input validation

* Add perf test for agent

* Removing async
2024-08-15 13:57:13 -03:00
Gopesh Bhardwaj dc671497da look for symbols in dynsym table (#990)
* look for symbols in dynsym table

* checking both symtab and dynsym

* Avoid symbol duplication in non stripped binaries

* clang-format

* Minor elf_utils.cpp updates

- use 'else if' instead of 'if'
- logging tweaks

* Update registration

- tweak logging

* Update testing

- strip the rocprofiler-sdk-c-tool library
- add test-c-tool-rocp-tool-lib-execute test which does NOT LD_PRELOAD the library (uses only ROCP_TOOL_LIBRARIES instead)

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-07-26 10:21:04 +05:30
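
The commit above checks both `.symtab` and `.dynsym` and avoids reporting a symbol twice in non-stripped binaries. The idea can be sketched in Python with the standard `struct` module. This is an illustrative reader for little-endian ELF64 images only, not the SDK's `elf_utils.cpp`; the function names are hypothetical, and `rocprofiler_configure` is the symbol named in the neighbouring commit (#970).

```python
import struct

SHT_SYMTAB, SHT_DYNSYM = 2, 11                 # section types from the ELF spec
_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")     # Elf64_Ehdr, 64 bytes
_SHDR = struct.Struct("<IIQQQQIIQQ")           # Elf64_Shdr, 64 bytes
_SYM = struct.Struct("<IBBHQQ")                # Elf64_Sym, 24 bytes


def elf_symbol_names(data):
    """Return the set of defined symbol names found in .symtab and .dynsym."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    ehdr = _EHDR.unpack_from(data, 0)
    shoff, shentsize, shnum = ehdr[6], ehdr[11], ehdr[12]
    sections = [_SHDR.unpack_from(data, shoff + i * shentsize) for i in range(shnum)]

    names = set()  # a set, so a symbol present in both tables is kept once
    for section in sections:
        sh_type, sh_offset, sh_size, sh_link = section[1], section[4], section[5], section[6]
        if sh_type not in (SHT_SYMTAB, SHT_DYNSYM):
            continue
        str_offset = sections[sh_link][4]  # sh_link points at the string table
        last = sh_offset + sh_size - _SYM.size
        for off in range(sh_offset, last + 1, _SYM.size):
            st_name, _info, _other, st_shndx, _value, _size = _SYM.unpack_from(data, off)
            if st_name == 0 or st_shndx == 0:  # skip unnamed and undefined symbols
                continue
            start = str_offset + st_name
            end = data.index(b"\0", start)
            names.add(data[start:end].decode("utf-8", "replace"))
    return names


def file_defines_symbol(path, symbol="rocprofiler_configure"):
    """Check whether the ELF file at path defines the given symbol."""
    with open(path, "rb") as handle:
        return symbol in elf_symbol_names(handle.read())
```

Scanning `.dynsym` as well as `.symtab` matters for stripped libraries, where `.symtab` has been removed but the dynamic symbol table remains.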
Jonathan R. Madsen 2be3543c7b Parse ELF format for rocprofiler_configure symbol (#970)
* Parse ELF format to search for rocprofiler_configure

* Use ELF parsing in registration
2024-07-11 20:22:26 -05:00
Giovanni Lenzi Baraldi 8da0c35079 Adding wrappers on HSA for executable load/unload and allowing multiple agents per context on ATT (#951)
* Codeobj wrappers around HSA calls for ATT

* Formatting

* Bookkeeping

* Tidy

* Tidy

* Update source/lib/rocprofiler-sdk/thread_trace/code_object.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/thread_trace/att_core.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Variable naming

---------

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
2024-06-25 13:46:13 -03:00
Jonathan R. Madsen 62ec95eae6 Sync queue and async copy on client finalizer (#950)
2024-06-24 20:38:34 -05:00
Ammar ELWazir 987ae3cc47 PC Sampling Support (#715)
* cmake formatting (cmake-format) (#188)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* source formatting (clang-format v11) (#189)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: design of the pc sampling data struct; guarding parts of code that uses ROCr marker packets

* source formatting (clang-format v11) (#191)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* cmake formatting (cmake-format) (#192)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: shadow variable fix

* pcs: fix for compiler errors reported by CI/CD

* source formatting (clang-format v11) (#193)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: docs fix; samples uses rocprofiler::rocprofiler library

* cmake formatting (cmake-format) (#195)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: client in samples folder fixed

* pcs: client requires rocprofiler package as dependency

* pcs: client uses single context

* source formatting (clang-format v11) (#196)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: client using single buffer; no buffer destroy in client

* pcs: client::setup explicitly called from the example

* pcs: rocprofiler_pc_sample_record_t updated

* pcs: fixed init of external correlation id

* source formatting (clang-format v11) (#198)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: remove outdated files; update CMakeLists

* cmake formatting (cmake-format) (#212)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: using rocprofiler_agent_id_t

* pcs: Removing trailing whitespaces

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

* source formatting (clang-format v11) (#214)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: mapping agent_id to the agent

* source formatting (clang-format v11) (#215)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: const while iterating over agents

* source formatting (clang-format v11) (#216)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: calling get_buffer instead of get_buffers

* pcs: workgroup typo

* pcs: documentation for the public PC sampling API

* pcs: queue_cb_t signature adaptation

* pcs: mocks removed

* pcs: updating HsaApiTable with HSA/ROCr PC sampling API

* pcs: querying available PC sampling configs through IOCTL

* pcs: create the PCS session in IOCTL

* pcs: first actual PC samples delivered to the rocprofiler's client :)

* pcs: works with marker packet too

* pcs: using HSA table to call pc sampling related functions

* pcs: using ioctl instead of kfd in naming

* pcs: configuration service test fixed

* pcs: sample processing test fixed

* pcs: marker packet macro wrapper removed

* pcs: marker packet is part of the rocprofiler_packet union

* pcs: one fixme added

* pcs: client that uses pc-sampling and code obj tracing

* pcs: client that supports PC sampling and code obj tracing refactored

* pcs: show more info for each PC sample

* pcs: hex output for the samples that do not belong to the matmul kernel

* pcs: querying avail configuration happens immediately before configuring

* pcs: hsa_ven_amd_pcs_create_from_id renamed

* pcs: using hsa_stop; accessing a buffer by id from parser

* pcs: includes reworked, tests returned to life

* pcs: rocprofiler dir removed as outdated

* cmake formatting (cmake-format) (#271)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* source formatting (clang-format v11) (#272)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: some warnings fixed

* source formatting (clang-format v11) (#273)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* cmake formatting (cmake-format) (#274)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: show MI200 relevant information in the sample

* pcs: queue cb fixed; rocr.h include fixed

* source formatting (clang-format v11) (#296)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: getting hsa_agent and the doorbell_id from hsa_queue

* source formatting (clang-format v11) (#297)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: correlation ID logic fixed

* source formatting (clang-format v11) (#303)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: pure pc sampling example fixed

* source formatting (clang-format v11) (#307)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* cmake formatting (cmake-format) (#308)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: interval value if the PC sampling is already configured

* pcs: ROCPROFILER_STATUS_ERROR_PC_SAMPLING_ALREADY_CONFIGURED

New status code if another process configured PC sampling service with different configuration.
Samples are extended to consider this case and retry if it happens.

* pcs: hsa_amd_queue_get_info mocked in tests

* source formatting (clang-format v11) (#328)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs (tests): query configs after configuring service

* source formatting (clang-format v11) (#329)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: sample checks workgroup_id_* and wave_id

* source formatting (clang-format v11) (#330)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs samples: running samples on the device 0

* pcs: kfd_ioctl updated

* pcs: ioctl config struct changed fields names

* pcs: status when PC sampling is configured by another process is renamed

* pcs: HSA PC sampling API table fixed

* pcs: tmp hack to be able to use HSA pc sampling table

* source formatting (clang-format v11) (#443)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs service use CIDs generated by HIP API tracing service

* source formatting (clang-format v11) (#455)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* cmake formatting (cmake-format) (#456)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: CID manager

* pcs: explicit flush with no delivered data executes retirement logic

* source formatting (clang-format v11) (#464)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: rocprofiler_query_pc_sampling_agent_configurations docs update

* source formatting (clang-format v11) (#465)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: rocprofiler_configure_pc_sampling_service docs update

* pcs: explicit sync introduced in PCSCIDManager

* pcs: new logic for retiring CIDs in PC sampling service documented

* pcs: queue interception cb signature updated

* source formatting (clang-format v11) (#471)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: if no agent supports PC sampling, fail gracefully

* elaborating when KFD returns EBUSY and EEXIST

* pcs: the second PC sampling example fails gracefully

* code samples use only single kernel for now

* pcs: CID manager refactored

* source formatting (clang-format v11) (#481)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: ioctl update

* source formatting (clang-format v11) (#531)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: code sample to test PC sampling applied on concurrent kernels

* source formatting (clang-format v11) (#533)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: pc sampling stress test included

* cmake formatting (cmake-format) (#539)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* source formatting (clang-format v11) (#540)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: standalone benchmark

* cmake formatting (cmake-format) (#555)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: glance in external correlation IDs

* source formatting (clang-format v11) (#557)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* another change in ioctl interface

* pcs: update queue interceptor callbacks and samples according to the agent 0 version

* source formatting (clang-format v11) (#611)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: avoid running problematic PC sampling test

* pcs: guarding tests not to fail on architectures not supporting PC sampling

* source formatting (clang-format v11) (#617)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: check IOCTL version prior to each KFD call

* pcs: ioctl refactoring

* pcs: PC sampling service increases the ref_count of the correlation ID of the kernel dispatch

* cmake formatting (cmake-format) (#631)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* source formatting (clang-format v11) (#632)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: PC sampling service provides external correlation IDs

* source formatting (clang-format v11) (#644)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: use rocprofiler_dim3_t for workgroup_id

* source formatting (clang-format v11) (#645)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: minor fixes

* pcs: updating the documentation for the pc sampling API functions

* pcs: api table and queue controller fix

* pcs: don't generate marker packets for the agent if PC sampling is not configured on it

* pcs: multi-GPU and single-GPU clients

* source formatting (clang-format v11) (#700)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: warning and errors fixed

* source formatting (clang-format v11) (#702)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: clang compiler errors and warnings fixed

* source formatting (clang-format v11) (#716)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: const reference in cid manager

* source formatting (clang-format v11) (#717)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: const & func in manager explicit

* pcs: test to cover creating PC sampling service of agent that does not exist

* pcs: generate marker packets if service is active

* source formatting (clang-format v11) (#719)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: refactoring hsa_adapter; use the correlation_id->thread_idx

* Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/utils.cpp

* Update utils.cpp

* moving pc-sampling tests and samples to pc-sampling label

* Format fix

* pcs: use configured instead of active service

* Update source/lib/rocprofiler-sdk/pc_sampling/service.cpp

* pcs: ensure configuring PC sampling on the HSA level is called only once

* pcs: minor fix

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* pcs: refactoring IOCTL integration

* Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt

Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: reverting back what bot doubled

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: retesting the bot

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: why bot fails on this IOCTL status

* pcs: why failing on <vector>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: returning commits removed by bot

* pcs: formatting locally

* pcs: clients are flushing buffers inside the tool_fini

* pcs: sync function in public API

* pcs: sync prior to unloading the code object

* pcs: sync function requires context

* pcs: client uses CID retirement service

* pcs: test for flushing internal ROCr buffers

* pcs: source formatting

* Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: code samples refactoring

* pcs: public API header refactored

* pcs: rocprofiler_buffer_flush drains internal PC sampling buffers too

* pcs: remove unnecessary functions

* pcs: do not call hsa's copytables

* pcs: include reordering

* pcs: using ROCP_ERROR inside PC sampling implementation

* pcs: pc_sampling sample uses ostream instead of printfs

* pcs: pc_sampling_codeobj tracing using ostream instead of prints

* pcs: registering once for interceptor callbacks

* pcs: do not generate internal CIDs if not in debug mode

* pcs: rebasing fixed; missing external correlation IDs

* pcs: code formatting

* enable kernel tracing service to receive external correlation IDs

* pcs: using ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL

* pcs: polishing parser

* formatting

* updating parser to use workgroup_id

* kfd_ioctl.h extracted in details folder

* refactoring

* pcs: preparing to generate code object information

* flush internal buffers prior to unloading code object

* pcs: generating marker records

* pcs: wrap code_object's shutdown function

* ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES unsupported at the moment

* documenting that ROCR/HIP_VISIBLE_DEVICES are ignored

* pcs: separate structs for code object loading/unloading markers

* pcs: inst_pkt_t changed the namespace

* pcs: removing wrapper around the shutdown function

* pcs: size in record field

* pcs: documentation refactoring + typedefs

* renaming PCSAgentConfig to PCSAgentSession

* pcs: service does not keep a pointer to the context

* pcs: static assertions related to the versioning

* pcs: rocprofiler_pc_sampling_configuration_t size field

* pcs: report API unimplemented unless explicitly enabled

* pcs: skip tests if KFD does not support PC sampling

* pcs: if ROCr hides some devices, no PC samples will be delivered for them

* pcs: hip error check after kernel launch

* formatting

* removing PCS info from agent.h

* fix based on review

* Update continuous integration workflow

- use mi200 runner for code coverage (supports PC sampling)
- split sanitizer jobs across navi3, vega20, and mi300

* Updating pc sampling test labels

* ROCP_PC_SAMPLING_ENABLED env in CI

* ROCP_PC_SAMPLING_ENABLED for all CI mi200 jobs

* Rearrange sanitizer assignments

* fixes according to review

* removed unused functions

* pcs: rocprofiler_agent_id_t instead of handle as a key in map

* Update source/lib/rocprofiler-sdk/context/context.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* removing drm_fd from the agent.h

* pcs: removing one sample due to complexity

* pcs: refactoring sample

* simplifying sample

* new lines

* Improve queue_control enable interceptor logic

* Update lib/rocprofiler-sdk/hsa/types.hpp

- handle amd_ext size for HSA 1.12.0

* ROCP_PC_SAMPLING_ENABLED -> ROCPROFILER_PC_SAMPLING_BETA_ENABLED

* Update hsa_adapter.cpp

- anonymous namespace + remove debug

* parser update

* Apply suggestions from code review

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
Co-authored-by: vlaindic <vladimir.indic@amd.com>
Co-authored-by: vlaindic <vlaindic@amd.com>
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: gobhardw <gopesh.bhardwaj@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-05-24 09:49:44 -05:00
Benjamin Welton 28e6430d04 [2/N] Agent Counter implementation with unit tests to check functionality (#846)
Agent Counter Collection API with tests and samples.
---------

Co-authored-by: Benjamin Welton <ben@amd.com>
2024-05-21 13:34:54 -07:00
Jonathan R. Madsen 4d5b71b0e7 Update logging (#838)
* Update logging

* Remove unused function

* Fix lib/rocprofiler-sdk/hsa/pc_sampling.cpp logging compilation

* Fix logging FLAGS_vmodule string leak and numerical log level

* Update logging

* Update glog submodule

* Leak fixes

* format
2024-05-20 15:38:18 -05:00
Vladimir Indic 733aa8e438 Restructure code object source code (#826)
* public codeobj info

* Restructure code object source code file layout

* Update get_unloaded_code_objects + add iterate_loaded_code_objects

* Remove get_unloaded_code_objects from visible internal API

- iterate_loaded_code_objects + functor which filters on the hsa_executable_t effectively reproduces this behavior

* Whitespace removal

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-04-25 14:03:04 -05:00
Vladimir Indic deabd869b5 Introducing PcSamplingExtTable (#735)
* pcs: updating the PCS table

* Fixing Clang Tidy errors

* pcs: reverting old table version

* testing wrong table size

* new size

* testing step

* reverting old steps

* hsa_amd_queue_get_info introduced

* pcs: testing table version

* formatting

* removing redundant declarations

* removing unnecessary files

* Apply suggestions from code review

* Apply suggestions from code review

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/hsa/pc_sampling.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Enable function pointer offset check in hsa::pc_sampling::copy_table

- add offset() to HSA_API_META_DEFINITION
- check if offset() >= size of struct
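The offset check listed above can be sketched in isolation. This is a minimal illustration, not the rocprofiler-sdk implementation: `SketchTable`, `member_available`, `sketch_create`, and `copy_table` are hypothetical names, and the only idea carried over from the commit message is that a table member is skipped when its offset is greater than or equal to the size the runtime reports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical API table. The runtime reports how many bytes of the struct it
// populated, so an older runtime reports a smaller size.
struct SketchTable
{
    std::uint64_t size;     // bytes filled in by the runtime
    int (*create_fn)(int);  // present in every version
    int (*flush_fn)(int);   // added in a later version
};

// A member is treated as present when its offset is below the reported size.
// An offset >= the reported size means the runtime predates that member.
constexpr bool
member_available(std::size_t member_offset, std::uint64_t reported_size)
{
    return member_offset < reported_size;
}

// Hypothetical runtime function used to populate the table.
inline int
sketch_create(int v)
{
    return v + 1;
}

// Copy only the members the runtime actually provides; leave the rest null.
inline SketchTable
copy_table(const SketchTable& runtime_table)
{
    SketchTable copy{runtime_table.size, nullptr, nullptr};
    if(member_available(offsetof(SketchTable, create_fn), runtime_table.size))
        copy.create_fn = runtime_table.create_fn;
    if(member_available(offsetof(SketchTable, flush_fn), runtime_table.size))
        copy.flush_fn = runtime_table.flush_fn;
    return copy;
}
```

Checking the offset before reading a member keeps the copy from touching memory that an older runtime never wrote.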

* Support build without PC sampling API table

* ids for ROCr's PC sampling public functions

---------

Co-authored-by: Ammar ELWazir <aelwazir@hpe6u-21.amd.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-04-22 20:07:28 -05:00
Mythreya fd3d97287c Page migration reporting (#651)
* Page migration reporting support

* Page migration: Update parser and reporting

Container does not have latest KFD header, so CI might fail

* Add kfd_ioctl.h

* Formatting

* Update get_key

- get key was not used (and shouldn't be), so delete it

* clang-tidy fixes

* Tests for page migration

* Apply suggestions from code review

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update tests/bin/page-migration/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update page-migration test app

- add hipHostRegister to register mmap'ed allocation with HIP
- misc cleanup and reorg
- remove HSA_XNACK=1 from test env

* Update lib/rocprofiler-sdk/tests/page_migration.cpp

- fix compilation error

* Minor updates (reorg, rename)

* Page migration reporting support

* Page migration: Update parser and reporting

Container does not have latest KFD header, so CI might fail

* Update page migration tests, fix trigger types

* Page Migration Tracing Support Refactoring (#753)

* Reorganization

* Update page migration init/fini

* Formatting

* Update page_migration.cpp

- change logging severity

* Skip test if KFD does not support page migration reporting

* Rework skipping test if KFD does not support page migration

* Fix event trigger enum values

* Fix clang-diagnostic-unused-const-variable

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
2024-04-12 15:51:44 -05:00
Gopesh Bhardwaj e2d8ccad4b adding pandas and pytest to requirements.txt (#748)
* adding pandas and pytest to requirements.txt

* setting up requirements.txt

* Update requirements

- formatting packages
- remove packages not directly used by rocprofiler-sdk

* Update cmake formatting, linting, and options

- if BUILD_CI -> force BUILD_DEVELOPER and BUILD_WERROR
- support python installed clang-format and python installed clang-tidy

* Update build.sh

- split into install-deps.sh and install-apt-deps.sh

* Improve code coverage

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-04-09 07:24:40 -05:00
Mythreya 4fa165ec1a Add support for scratch reporting (#523)
* Add ToolsApiTable

Add ToolsApiTable wrapping for
scratch memory tracking

* Add initial support for scratch memory tracking

Buffering is implemented

* cmake formatting (cmake-format) (#525)

Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>

* source formatting (clang-format v11) (#524)

Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>

* Add callback tracing for scratch

Fixed the error where scratch tracking init was called irrespective of whether any client requested it

* Apply suggestions from code review

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

* Fix tools api copy/update

Tables were saved/updated incorrectly in the previous
commit. Also adds passing user data through the callback

* Fix OpKind sequence for scratch tracking

Previously scratch was using OpKind from rocprofiler-sdk, but
templates were instantiated using API ID. These differ by 1

* Integration tests for scratch reporting

Added buffer and callback integration tests for scratch reporting

* source formatting (clang-format v11) (#550)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* cmake formatting (cmake-format) (#551)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* python formatting (black) (#549)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* CI fixes

* source formatting (clang-format v11) (#554)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* Update api

Rebase on main and updates based on PR feedback

* Update scratch reporting and address PR comments

- Added agent id to buffer records
- Updated `test_internal_correlation_ids` - Is almost identical to
  one in async-copy
- Updated scratch test to check for agent id
- Updated queue id serialization in callback records (prints
  handle as nested key)
- Remove `marker_api_traces` from scratch `test_internal_correlation_ids`
  validation test
- Rename `amd_tools_api` to `scratch_memory`
- Added doxygen comments
- Remove scratch callback from `tool.cpp`
- Replace assert with `LOG_IF` in `scratch_memory.cpp`

* Update tools table

Changed to match up with changes to hsa tables in main branch

* Rework scratch memory structure

* Update tests

- Added suggestions from PR review, and updated tests accordingly

* Misc cleanup

* Update scratch test

As of Apr 4th, `hsa_amd_agent_set_async_scratch_limit` is disabled.

Note,
> This API: `hsa_amd_agent_set_async_scratch_limit` is currently
> disabled. We need some changes in CP firmware to be able to do this
> and these changes are not ready yet.
> With the current code, you will also not get notifications for
> alternate-scratch allocations because this feature has been disabled
> while CP firmware is making additional changes
> We are hoping to have that feature enabled by ROCm-6.3

* Minor update to lib/rocprofiler-sdk/internal_threading.*

- delay destruction of shared_ptrs of the tasks to prevent rare (but possible) data race on the destruction of the shared_ptr

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-04-05 20:32:57 -05:00
Benjamin Welton 41c0ddd72d Convert LOG() -> ROCP_X logging macros. (#695)
* Convert LOG() -> ROCP_X logging macros.

This patch converts the LOG() macro to the ROCP_X logging macros.
There are the following levels of logs.

Logs whose expressions are not evaluated unless the log level is enabled:

ROCP_TRACE - VLOG(2) (enabled by env variable GLOG_v=2)
ROCP_INFO - VLOG(1) (enabled by env variable GLOG_v=1)

Logs whose expressions are always evaluated:

ROCP_WARNING - LOG(WARNING)
ROCP_ERROR - LOG(ERROR)
ROCP_FATAL - LOG(FATAL)
ROCP_DFATAL - DLOG(FATAL) (only fatal in debug mode)
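
The split between lazily and always evaluated logs can be illustrated without glog. The sketch below is an assumption-laden stand-in: `SKETCH_VLOG`, `SKETCH_INFO`, `SKETCH_TRACE`, `SKETCH_WARNING`, `sketch_verbosity`, and `expensive_message` are hypothetical names, and the real ROCP_X macros are built on glog rather than on `std::cerr`.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical stand-in for the verbosity level that GLOG_v controls.
inline int&
sketch_verbosity()
{
    static int level = 0;
    return level;
}

// Verbose macros: when the level is disabled, the empty branch is taken and
// the streamed expression after the macro is never evaluated.
#define SKETCH_VLOG(LEVEL)                                                     \
    if(sketch_verbosity() < (LEVEL)) {                                         \
    } else                                                                     \
        std::cerr

#define SKETCH_INFO  SKETCH_VLOG(1)
#define SKETCH_TRACE SKETCH_VLOG(2)

// Severity macro: the streamed expression is always evaluated.
#define SKETCH_WARNING std::cerr << "[warning] "

// Counts how often the (pretend) expensive message is actually built.
inline int&
sketch_eval_count()
{
    static int count = 0;
    return count;
}

inline std::string
expensive_message()
{
    ++sketch_eval_count();
    return "details";
}
```

With the verbosity at 0, `SKETCH_INFO << expensive_message();` leaves the counter untouched, while `SKETCH_WARNING << expensive_message();` increments it, which is the cost difference the two groups of macros describe.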

* source formatting (clang-format v11) (#696)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* Minor fix

* Fixes for VLOG before main

* fix vmodule

* source formatting (clang-format v11) (#718)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* memory leak fix

* Vlog change

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>
2024-04-02 17:15:30 -07:00
Jonathan R. Madsen 939e23e9d1 Stop all client contexts prior to finalization (#721)
* Stop all client contexts prior to finalization

* Update lib/common/container/static_vector.hpp

- improve emplace_back for non-{move,copy}-assignable object

* Update samples/intercept_table/client.cpp

- improve robustness against static object destruction

* Update lib/rocprofiler-sdk/context/context.cpp

- change storage of registered context array
  - stable_vector of optional contexts
  - common::static_object wrapper around stable_vector

* Update samples/intercept_table/client.cpp

- use variable template for underlying function pointer
2024-04-02 03:05:11 -05:00
Jonathan R. Madsen bc9f86ec62 Update HSA copy table (#687)
- two copies of HSA table: internal and tracing
- internal is used to invoke HSA function without any possibility of triggering tracing, etc.
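
A rough sketch of the two-table idea follows, assuming a one-function table. `SketchApiTable`, `runtime_add`, `internal_table`, `traced_add`, and `install_tracing` are hypothetical names chosen for illustration and do not come from the HSA or rocprofiler-sdk sources.

```cpp
#include <cassert>

// Hypothetical one-function API table standing in for an HSA table.
struct SketchApiTable
{
    int (*add)(int, int);
};

// Hypothetical runtime implementation of the API function.
inline int
runtime_add(int a, int b)
{
    return a + b;
}

// Internal copy: always holds the original, untraced function pointers.
inline SketchApiTable&
internal_table()
{
    static SketchApiTable table{};
    return table;
}

// Number of calls that went through the tracing wrapper.
inline int&
traced_calls()
{
    static int count = 0;
    return count;
}

// Tracing wrapper: records the call, then forwards through the internal copy.
inline int
traced_add(int a, int b)
{
    ++traced_calls();
    return internal_table().add(a, b);
}

// Save the originals first, then install wrappers in the application's table.
inline void
install_tracing(SketchApiTable& runtime_table)
{
    internal_table()  = runtime_table;
    runtime_table.add = &traced_add;
}
```

Calls made through the application's table are counted, while calls made through `internal_table()` reach the runtime directly and cannot trigger tracing.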
2024-03-26 17:11:34 -05:00
Jonathan R. Madsen 7b6d3c70bd Shared Library Constructor (rocprofv3 deadlock fix) (#599)
* Moved tests/apps to tests/bin

* Renamed cmake project in tests/bin

* Update samples

- Use ROCPROFILER_DEFAULT_FAIL_REGEX
- tweaks to stdout messages

* Update tests

- Use ROCPROFILER_DEFAULT_FAIL_REGEX

* Add tests/lib

- libraries with HIP code

* Update PTL submodule

- remove atexit delete of thread_id_map

* Update cmake/rocprofiler_options.cmake

- Set ROCPROFILER_DEFAULT_FAIL_REGEX

* Update common lib: env + logging

- improved customization of logging settings
- default to disabling logging to files
- install failure handler for rocprofv3
- set_env support in environment.*

* Add lib/rocprofiler-sdk/shared_library.cpp

- shared library constructor

* Update lib/rocprofiler-sdk-tool/tool.cpp

- destructor thread safety
- convert callback_name_info and buffered_name_info to pointers
- install failure handler for logging

* Add tests/bin/hip-in-libraries

- hip-in-libraries is an exe which uses two shared libraries where each shared library contains HIP kernels
  - used for testing deadlocking within __hipRegisterFatBinary

* Update bin/rocprofv3

- reorganized the env variables
- use exec to launch command
- set ROCPROFILER_LIBRARY_CTOR=1

* Add tests/rocprofv3/tracing-hip-in-libraries

- uses hip-in-libraries exe for exe which uses shared libraries to launch HIP kernels

* Update bin/rocprofv3

- fix counter collection (no exec)

* Update lib/rocprofiler-sdk-tool/tool.cpp

- replace "Kernel-Name" with "Kernel_Name"

* Update lib/rocprofiler-sdk/registration.cpp

Use RTLD_LOCAL instead of RTLD_GLOBAL for env libraries

* Update tests/rocprofv3

- replace "Kernel-Name" with "Kernel_Name"

* Update tests

- vector-ops (bin) stream syncs + runs with 4 queues per device
- improve counter-collection/input1 validation
- rocprofv3/tracing-hip-in-libraries does not do sys-trace
- improved validation script for tracing-hip-in-libraries
- updated dispatch_callback in json-tool.cpp following reworking of prototypes for counter collection

* Update samples/counter_collection

- updated dispatch_callback(s) and record_callback(s) following reworking of prototypes

* Update bin/rocprofv3

- reorganized help menu
- added options for sub-HSA tables
- added --hip-runtime-trace
- changed --hip-trace to include --hip-compiler-trace

* Update lib/rocprofiler-sdk-tool

- improved kernel filtering
- removed arch_vgpr, accum_vgpr, sgpr code (in rocprofiler-sdk)
- fixed issue with counter-collection w/o tracing
- added support for fine grained HSA API tracing
- removed directly linking to HSA-runtime

* Update lib/rocprofiler-sdk/agent.cpp

- rocp_agents != hsa_agents is non-fatal when ROCPROFILER_BUILD_CI=OFF (CMake option)

* GPR (vector and scalar) info in kernel symbol data

- rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t contains general purpose register info

* Header include order fix

- Include repo headers first
- Third party library headers next
- standard library headers last

* Update dispatch profiling public API

- introduce rocprofiler_profile_counting_dispatch_data_t
- change signature of rocprofiler_profile_counting_dispatch_callback_t and rocprofiler_profile_counting_record_callback_t
- provide rocprofiler_user_data_t pointer in dispatch callback
- provide rocprofiler_user_data_t value (from dispatch cb) in record callback

* Update tests/bin/CMakeLists.txt

- fix add_subdirectory(hip-in-libraries) order

* Update VERSION

- bump to 0.2.0 in prep for AFAR
2024-03-07 22:21:26 -06:00
Jonathan R. Madsen b0a88d9124 Update registration client search (#569)
* Update registration client search

- Search ROCP_TOOL_LIBRARIES before dlopen search
- Fatal error if ROCP_TOOL_LIBRARIES entry does not contain rocprofiler_configure symbol
- Use RTLD_DEFAULT and RTLD_NEXT to (potentially) find first two instances of rocprofiler_configure
  - if no rocprofiler_configure found via RTLD_NEXT, do not do extensive search via link map

* _GNU_SOURCE instead of GNU_SOURCE

* Clang-tidy fix
2024-03-01 17:44:12 -06:00
Jonathan R. Madsen 1bb94add11 Fix rocprofiler_iterate_callback_tracing_kind_operation_args for HIP compiler callbacks (#532)
* Fix HIP compiler iterate args

- `include/rocprofiler-sdk/hip/api_args.h`
  - replace struct fields named "f" with "func"
  - replace hip stream fields named "hStream" with "stream"
- `lib/rocprofiler-sdk/callback_tracing.cpp`
  - iterate_args for HIP compiler table
- `lib/rocprofiler-sdk/registration.cpp`
  - fix warning about roctx num_tables
- `lib/rocprofiler-sdk/hip/hip.def.cpp`
  - replace struct fields named "f" with "func"
  - replace hip stream fields named "hStream" with "stream"
- `lib/rocprofiler-sdk/{hip,hsa,marker}/utils.hpp`
  - improve `stringize_impl`
- `lib/rocprofiler-sdk/hsa/code_object.cpp`
  - remove stale commented out code
- `lib/rocprofiler-sdk/hsa/queue_controller.*`
  - destory_queue -> destroy_queue
- `tests/tools/json-tool.cpp`
  - improve parallelism in tool_tracing_callback
  - serialize the marker api args
  - only invoke rocprofiler_iterate_callback_tracing_kind_operation_args in exit phase
- `samples/counter_collection/CMakeLists.txt`
  - reduce timeout on tests to 120 seconds

* Update lib/rocprofiler-sdk/hsa/utils.hpp

- disable dereference of double pointer in stringize_impl

* Update lib/common

- indirection_level in mpl.hpp
- stringize_arg.hpp

* Rework rocprofiler_iterate_callback_tracing_kind_operation_args

- provide more information in rocprofiler_callback_tracing_operation_args_cb_t
- support specifying the dereference level to account for output parameters
2024-03-01 01:46:07 -06:00
Jonathan R. Madsen 875f53b608 Correlation ID Retirement + misc (#527)
* Correlation ID Retirement

- include/rocprofiler-sdk/buffer_tracing.h
  - add rocprofiler_buffer_tracing_correlation_id_retirement_record_t
- include/rocprofiler-sdk/fwd.h
  - ROCPROFILER_BUFFER_TRACING_CORRELATION_ID_RETIREMENT
- lib/rocprofiler-sdk/buffer_tracing.cpp
  - kind string for correlation id retirement
- lib/rocprofiler-sdk/buffer.hpp
  - emplace returns bool
- lib/rocprofiler-sdk/registration.cpp
  - pass lib_instance to copy_table functions
- lib/rocprofiler-sdk/context/context.*
  - update correlation_id struct
    - make ref_count private
    - {get,add,sub}_ref_count() functions
      - sub_ref_count() performs correlation id retirement
    - use stack for "latest" thread-local correlation id
- lib/rocprofiler-sdk/hip/hip.*
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - return in iterate_args
  - handle table instance in copy_table
- lib/rocprofiler-sdk/hsa/hsa.*
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - return in iterate_args
  - handle table instance in copy_table
- lib/rocprofiler-sdk/marker/marker.*
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - return in iterate_args
  - handle table instance in copy_table
- lib/rocprofiler-sdk/hsa/async_copy.cpp
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - handle table instance in async_copy_init / async_copy_save
- lib/rocprofiler-sdk/hsa/queue.cpp
  - migrate to new {get,add,sub}_ref_count() for correlation ids
  - tweak to external correlation id mapping in WriteInterceptor
- tests/async-copy-tracing/validate.py
  - check retired_correlation_ids
- tests/common/serialization.hpp
  - support rocprofiler_buffer_tracing_correlation_id_retirement_record_t
- tests/kernel-tracing/validate.py
  - check retired_correlation_ids
- tests/common/CMakeLists.txt
  - perfetto external project
- tests/common/perfetto.hpp
  - perfetto categories + aliases
  - add_perfetto_annotation
  - metaprogramming helpers
- tests/tools/CMakeLists.txt
  - link to tests-perfetto
- tests/tools/json-tool.cpp
  - demangling functions
  - serialization of marker API callback args
  - reduce parallel bottleneck in tool_tracing_callback
  - support correlation id retirement
  - Multiple threads for buffers
  - Support ROCPROFILER_TOOL_CONTEXTS_EXCLUDE env variable
  - write_perfetto() function
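
The reference-counting scheme described for lib/rocprofiler-sdk/context/context.* can be sketched as follows. This is a minimal illustration under stated assumptions: `SketchCorrelationId` and its retirement callback are hypothetical, and only the private counter, the `{get,add,sub}_ref_count()` accessors, and retirement inside `sub_ref_count()` are taken from the commit message.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical correlation ID whose reference count is private and is only
// reachable through the {get,add,sub}_ref_count() accessors.
class SketchCorrelationId
{
public:
    using retire_fn_t = std::function<void(std::uint64_t)>;

    SketchCorrelationId(std::uint64_t id, retire_fn_t on_retire)
    : m_id{id}
    , m_on_retire{std::move(on_retire)}
    {}

    std::uint32_t get_ref_count() const { return m_ref_count.load(); }

    // Returns the count after the increment.
    std::uint32_t add_ref_count() { return m_ref_count.fetch_add(1) + 1; }

    // Returns the count after the decrement. Dropping the last reference
    // retires the ID. Callers must pair every sub with an earlier add.
    std::uint32_t sub_ref_count()
    {
        auto remaining = m_ref_count.fetch_sub(1) - 1;
        if(remaining == 0 && m_on_retire) m_on_retire(m_id);
        return remaining;
    }

private:
    std::uint64_t              m_id;
    retire_fn_t                m_on_retire;
    std::atomic<std::uint32_t> m_ref_count{0};
};
```

Keeping the counter private means retirement can only happen inside `sub_ref_count()`, so every service that holds a reference releases it through the same path.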

* Update tests/rocprofv3/tracing/validate.py

- tweak test_hsa_api_trace

* Update PTL submodule

- fixes for data race during destruction of task

* Update lib/rocprofiler-sdk/buffer.*

- unique_buffer_vec_t uses std::unique_ptr instead of allocator::unique_static_ptr_t

* Reduce timeouts in counter collection samples [skip ci]

* Update tests/tools/json-tool.cpp

- tweak demangle(string_view, int*) -> demangle(string_view, int&)

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- move sub_ref_count() to later in async_copy_handler to delay retirement slightly more
2024-02-23 10:30:33 -06:00
Jonathan R. Madsen 3f39339926 API Tracing Overhaul (#437)
* Update include/rocprofiler-sdk/hsa/*

- split HSA API IDs into separate enumerations
- add support for finalize ext table

* Update include/rocprofiler-sdk/hip/*

- remove compiler_api_args.h
- rocprofiler_hip_api_args_t contains all for HIP runtime and HIP compiler
- ROCPROFILER_HIP_API_ID_ -> ROCPROFILER_HIP_RUNTIME_API_ID_

* Update include/rocprofiler-sdk/marker/table_api_id.h

- ROCPROFILER_MARKER_API_TABLE_ID_ -> ROCPROFILER_MARKER_TABLE_ID_

* Update include/rocprofiler-sdk/*/table_api_id.h

- table_api_id.h -> table_id.h

* Update include/rocprofiler-sdk/*/table_api_id.h

- table_api_id.h -> table_id.h

* Update include/rocprofiler-sdk/fwd.h

- ROCPROFILER_CALLBACK_TRACING_HSA_API split into 4 enum values:
  - ROCPROFILER_CALLBACK_TRACING_HSA_CORE_API
  - ROCPROFILER_CALLBACK_TRACING_HSA_AMD_EXT_API
  - ROCPROFILER_CALLBACK_TRACING_HSA_IMAGE_EXT_API
  - ROCPROFILER_CALLBACK_TRACING_HSA_FINALIZE_EXT_API
- ROCPROFILER_BUFFER_TRACING_HSA_API split into 4 enum values:
  - ROCPROFILER_BUFFER_TRACING_HSA_CORE_API
  - ROCPROFILER_BUFFER_TRACING_HSA_AMD_EXT_API
  - ROCPROFILER_BUFFER_TRACING_HSA_IMAGE_EXT_API
  - ROCPROFILER_BUFFER_TRACING_HSA_FINALIZE_EXT_API
- rocprofiler_callback_tracing_code_object_operation_t renamed to rocprofiler_code_object_operation_t (more consistent)
- doxygen updates

* Update include/rocprofiler-sdk/buffer_tracing.h

- improved doxygen comments
- removed unused rocprofiler_buffer_tracing_queue_scheduling_record_t
- removed unused rocprofiler_buffer_tracing_correlation_record_t

* Update include/rocprofiler-sdk/callback_tracing.h

- removed rocprofiler_callback_tracing_hip_compiler_api_data_t
  - rocprofiler_hip_api_args_t and rocprofiler_hip_compiler_api_args_t were combined
  - rocprofiler_hsa_api_retval_t and rocprofiler_hsa_compiler_api_retval_t were combined

* Update lib/rocprofiler-sdk/hsa/*

- utils.hpp
  - formatters for hsa_ext_program_t and hsa_ext_control_directives_t
- defines.hpp
  - removed variadic macros from lib/common/defines.hpp
  - HSA_API_META_DEFINITION, HSA_API_INFO_DEFINITION_0, HSA_API_INFO_DEFINITION_V specialize on table id
- async_copy.cpp
  - ROCPROFILER_HSA_API_ID_* -> ROCPROFILER_HSA_AMD_EXT_API_ID_*
  - add table id to templates
  - improve async_copy_fini
- hsa.hpp
  - add hsa_table_id_lookup
  - add hsa_domain_info
  - add table id to templates
  - add copy_table function
- hsa.cpp
  - add table id to templates
  - require hsa tables to be trivial and standard layout
  - remove set_data_args specialization for hsa_amd_memory_async_copy_rect
  - implement copy_table function
- hsa.def.cpp
  - update enums

* Update lib/rocprofiler-sdk/hip/*

- defines.hpp
  - use lib/common/defines.hpp
  - add hip_table_id_lookup to HIP_API_TABLE_LOOKUP_DEFINITION
- hip.hpp
  - hip_table_id_lookup
  - template iterate_args on table id
  - templated copy_table and update_table
- hip.cpp
  - replaced api_id_bounds with hip_domain_info
  - templated iterate_args on table id
  - templated copy_table and update_table

* Update lib/rocprofiler-sdk/marker/*

- defines.hpp
  - use lib/common/defines.hpp
- marker.cpp
  - updated enums
- marker.def.cpp
  - updated enums

* Update lib/rocprofiler-sdk/tests

- common.hpp
  - ROCPROFILER_CALL_EXPECT
  - callback_data_ext
  - update get_callback_tracing_names with new enums
  - update get_buffer_tracing_names with new enums
- external_correlation.cpp
  - support new HSA API enums
- intercept_table.cpp
  - use test/common.hpp
  - update to new HSA API enums
- registration.cpp
  - support new HSA API enums
- naming.cpp
  - validation for all get_ids(), get_names(), name_by_id(), id_by_name(), etc.

* Update lib/common

- defines.hpp
  - Move IMPL_DETAIL_FOR_EACH_NARG, GET_ADDR_MEMBER_FIELDS, and GET_NAMED_MEMBER_FIELDS here
    - used by HSA, HIP, and Marker
- static_object.hpp
  - is_trivial_standard_layout static constexpr member function
  - suppress register_static_dtor when is_trivial_standard_layout

* Update lib/rocprofiler-sdk/hsa/code_object.*

- name_by_id
- id_by_name
- get_names
- get_ids

* Update lib/rocprofiler-sdk/registration.cpp

- Update rocprofiler_set_api_table for HSA

* Update lib/rocprofiler-sdk/callback_tracing.cpp

- Update for new HSA enums
- Rework to use switch statement
  - rocprofiler_query_callback_tracing_kind_operation_name
  - rocprofiler_iterate_callback_tracing_kind_operations
  - rocprofiler_iterate_callback_tracing_kind_operation_args

* Update lib/rocprofiler-sdk/buffer_tracing.cpp

- Update for new HSA enums
- Rework to use switch statement
  - rocprofiler_query_buffer_tracing_kind_operation_name
  - rocprofiler_iterate_buffer_tracing_kind_operations

* Update lib/rocprofiler-sdk-tool

- helper.cpp
  - update get_buffer_id_names with new enums
  - update get_callback_id_names with new enums
- tools.cpp
  - update to use new HSA enums

* Update samples/common

- added call_stack.hpp
  - source_location struct
  - call_stack_t alias
  - print_call_stack function
- added name_info.hpp
  - utils for getting buffer/callback domain and operation names

* Update samples/api_buffered_tracing/client.cpp

- use samples/common/call_stack.hpp
- use samples/common/name_info.hpp
- update for new HSA enums

* Update samples/api_callback_tracing/client.cpp

- use samples/common/call_stack.hpp
- use samples/common/name_info.hpp
- update for new HSA enums

* Update tests/tools/json-tool.cpp

- update for new HSA enums

* Update tests/rocprofv3/tracing/validate.py

- update for new HSA domain names

* Update samples/counter_collection/main.cpp

- reduce number of kernels to 50,000 since 200,000 causes issues with thread sanitizer
2024-01-30 12:14:26 -06:00
Jonathan R. Madsen 9efafc4d23 Split ROCTx API tables and update intercept table API (#421)
* Update include/rocprofiler-sdk

- buffer_tracing.h
  - fix doxygen for rocprofiler_buffer_tracing_hip_api_record_t
  - update doxygen for rocprofiler_buffer_tracing_marker_api_record_t
    - remove unused marker_id field
- fwd.h
  - Split ROCPROFILER_CALLBACK_TRACING_MARKER_API into ROCPROFILER_CALLBACK_TRACING_MARKER_{CORE,CONTROL,NAME}_API
  - Split ROCPROFILER_BUFFER_TRACING_MARKER_API into ROCPROFILER_BUFFER_TRACING_MARKER_{CORE,CONTROL,NAME}_API
  - split rocprofiler_runtime_library_t into rocprofiler_runtime_library_t and rocprofiler_intercept_table_t
    - after split of ROCTx into 3 tables, specifying rocprofiler_at_internal_thread_create became confusing

* Update include/rocprofiler-sdk-roctx/api_trace.h

- Split into three tables: core, control, and name
  - core: what it sounds like
  - control: functions for controlling the profiler
  - name: functions for giving resources names

* Update lib/rocprofiler-sdk-roctx/roctx.cpp

- modifications following split into multiple tables

* Update lib/rocprofiler-sdk/marker/*

- modifications following split of ROCTx API into multiple intercept tables

* Update lib/rocprofiler-sdk/tests

- common.hpp
  - add enums to get_callback_tracing_names() and get_buffer_tracing_names()
- intercept_table.cpp
  - update test to use rocprofiler_intercept_table_t (and enums) instead of rocprofiler_runtime_library_t
  - update OR combos tested
- roctx.cpp
  - updates following split of ROCTx API table into multiple tables
  - use simplified specification of control API

* Update lib/rocprofiler-sdk

- buffer_tracing.cpp
  - Updates for ROCPROFILER_BUFFER_TRACING_MARKER_{CORE,CONTROL,NAME}_API enum values
- callback_tracing.cpp
  - Updates for ROCPROFILER_CALLBACK_TRACING_MARKER_{CORE,CONTROL,NAME}_API enum values
- intercept_table.hpp
  - notify_runtime_api_registration -> notify_intercept_table_registration
- intercept_table.cpp
  - updates for new rocprofiler_intercept_table_t enum and new ROCTx tables
- registration.cpp
  - updates for new rocprofiler_intercept_table_t enum and new ROCTx tables
  - updates for notify_runtime_api_registration -> notify_intercept_table_registration

* Update lib/rocprofiler-sdk-tool

- helper.cpp
  - Updates for new enums in get_callback_id_names() and get_buffer_id_names()
- tool.cpp
  - migrate to new enums for split ROCTx tables
  - use simplified split for control table vs. core+name tables

* Update samples/{api_callback_tracing,intercept_table}

- intercept_table/client.cpp
  - rocprofiler_runtime_library_t -> rocprofiler_intercept_table_t
- api_callback_tracing/client.cpp
  - Updates for new enums in get_callback_id_names()
  - use simplified split for control table vs. core+name tables
  - migrate to new enums for split ROCTx tables

* Update tests

- rocprofv3/tracing/validate.py
  - handle new marker domain names
- tools/json-tool.cpp
  - Updates for new enums in get_callback_id_names() and get_buffer_id_names()
  - use simplified split for control table vs. core+name tables
  - migrate to new enums for split ROCTx tables

* Update tests/rocprofv3/tracing/CMakeLists.txt

- fix FAIL_REGULAR_EXPRESSION for rocprofv3-test-trace-execute

* Update lib/rocprofiler-sdk-tool/{output_file,tool}.*

- logging in output_file dtor
- support stdout/stderr

* Update lib/common/container/record_header_buffer.hpp

- reduce probability of is_empty() returning true while emplace is happening

* Update lib/rocprofiler-sdk-tool/tool.cpp

- logging for buffered_tracing_callback
- counter collection uses CSV encoder

* Update bin/rocprofv3

- remove -i flag from help menu
2024-01-26 13:56:15 -06:00
Jonathan R. Madsen c641749fe6 HIP API Tracing (#357)
* Update include/rocprofiler-sdk/hip*

- updates for intercept table

* Update lib/common/units.hpp

- clang-tidy fixes

* Add lib/rocprofiler-sdk/hip

- tracing implementation for the HIP intercept table

* Update source/lib/rocprofiler-sdk/CMakeLists.txt

- add_subdirectory(hip)

* Update source/lib/rocprofiler-sdk/hsa

- offset function in hsa_api_info<Idx>
- remove report_activity, set_callback
- Tweak HSA_API_TABLE_LOOKUP_DEFINITION

* Update lib/rocprofiler-sdk/hip

- rocprofiler::hip::copy_table
- stringize_impl print dereferenced pointers when possible

* Update lib/rocprofiler-sdk/hsa/utils.hpp

- stringize_impl print dereferenced pointers when possible

* Update lib/rocprofiler-sdk/tests/intercept_table.cpp

- remove failures for intercepting HIP API tables

* Update include/rocprofiler-sdk/fwd.h

- add ROCPROFILER_HIP_RUNTIME_LIBRARY (== ROCPROFILER_HIP_LIBRARY)
- add ROCPROFILER_HIP_COMPILER_LIBRARY

* Update lib/rocprofiler-sdk/buffer_tracing.cpp

- Support ROCPROFILER_BUFFER_TRACING_HIP_API in rocprofiler_query_buffer_tracing_kind_operation_name
- Support ROCPROFILER_BUFFER_TRACING_HIP_API in rocprofiler_iterate_buffer_tracing_kind_operations

* Update lib/rocprofiler-sdk/callback_tracing.cpp

- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_query_callback_tracing_kind_operation_name
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_iterate_callback_tracing_kind_operations
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_iterate_callback_tracing_kind_operation_args

* Update lib/rocprofiler-sdk/intercept_table.cpp

- support HipDispatchTable and HipCompilerDispatchTable

* Update lib/rocprofiler-sdk/internal_threading.cpp

- Support ROCPROFILER_HIP_COMPILER_LIBRARY

* Update lib/rocprofiler-sdk/registration.cpp

- Support "hip" and "hip_compiler" in rocprofiler_set_api_table
- Added some extra logging

* Update samples/api_{buffered,callback}_tracing

- Modifications to demonstrate HIP API tracing

* Update tests/kernel-tracing

- Modifications to handle/test HIP API tracing

* Separate HIP tracing from HIP compiler tracing

* Fix installation of include/rocprofiler-sdk/hip/*

- add compiler and table headers to install

* Fixes to HIP interception

- hip_api_trace.hpp was updated a bit
  - removed hipGetDeviceProperties (generic)
  - added hipGetDevicePropertiesR0600
  - added hipGetDevicePropertiesR0000
  - removed hipRegisterTracerCallback
  - reordered hipCreateChannelDesc, hipExtModuleLaunchKernel, hipHccModuleLaunchKernel
  - added hipDrvGraphAddMemsetNode
- static asserts in hsa_api_info ensuring ordering of pointers

* Update lib/rocprofiler-sdk/hip/hip.*

- use size_t instead of rocprofiler_hip_table_api_id_t as non-type template parameter (smaller binary)
- separated out population of callback_context_data and buffered_context_data into non-template function (significantly smaller binary)

* Update lib/rocprofiler-sdk/hsa/hsa.*

- separated out population of callback_context_data and buffered_context_data into non-template function (significantly smaller binary)

* Update test/kernel-tracing/validate.py

- does not expect any hip_api_traces until libamdhip.so actually starts using rocprofiler-register

* Update tests/tools/json-tool.cpp

- fix context associated with "HIP_API_CALLBACK"

* Update external/CMakeLists.txt

- move misc variables to top of CMakeLists.txt so they apply to all external subprojects
  - BUILD_TESTING (OFF)
  - BUILD_SHARED_LIBS (OFF)
  - BUILD_OBJECT_LIBS (OFF)
  - BUILD_STATIC_LIBS (ON)
  - CMAKE_POSITION_INDEPENDENT_CODE (ON)
  - CMAKE_VISIBILITY_INLINES_HIDDEN (ON)
  - CMAKE_CXX_VISIBILITY_PRESET (hidden)
- disable using libunwind in glog

* Update lib/rocprofiler-{sdk,sdk-tool}/CMakeLists.txt

- remove explicit setting of SKIP_BUILD_RPATH

* Update CMakeLists.txt

- set high-level CMAKE_BUILD_RPATH and CMAKE_INSTALL_RPATH_USE_LINK_PATH

* Update tests/CMakeLists.txt

- include(GNUInstallDirs)

* Update samples/CMakeLists.txt

- include(GNUInstallDirs)

* Update include/rocprofiler-sdk/hip/{compiler_api,api}_args.h

- remove extern "C" due to incompatibility between an empty struct in C (size 0) and an empty struct in C++ (size 1)
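
For illustration only, a minimal sketch of the size mismatch noted above. The type name `empty_args_t` and the helper `empty_args_size` are hypothetical and not taken from the headers. C++ gives every complete object type a nonzero size, whereas GNU C, which accepts empty structs as an extension, gives them size 0, so one declaration shared through extern "C" would not have the same layout in both languages.

```cpp
#include <cstddef>

// Hypothetical stand-in for an API argument struct that has no members.
struct empty_args_t
{};

// C++ requires a nonzero size for every complete object type, so an empty
// struct still occupies storage (1 byte on common compilers).
static_assert(sizeof(empty_args_t) >= 1, "empty struct occupies storage in C++");

// Helper so the size can also be inspected from other code.
constexpr std::size_t
empty_args_size()
{
    return sizeof(empty_args_t);
}
```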

* Update lib/rocprofiler-sdk/hip/details/ostream.hpp

- clang-tidy fixes

* Update cmake/rocprofiler_linting.cmake

- add a feature for clang tidy exe

* Update lib/rocprofiler-sdk/hip/hip.cpp

- use recursion instead of fold expression due to clang-tidy errors (maximum nesting level exceeded)
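
A rough sketch of the two forms, assuming C++17; the function names below are hypothetical and this is not the code from hip.cpp. A fold over a parameter pack expands into a single nested expression whose depth grows with the pack, while the recursive form peels off one argument per call.

```cpp
#include <cstddef>

// Fold-expression form: the whole pack expands into one nested expression.
template <typename... Args>
constexpr std::size_t
total_size_fold(const Args&... args)
{
    return (std::size_t{0} + ... + sizeof(args));
}

// Recursive form, base case: no arguments left.
constexpr std::size_t
total_size_recursive()
{
    return 0;
}

// Recursive form: handle the first argument, then recurse on the rest.
template <typename Tp, typename... Rest>
constexpr std::size_t
total_size_recursive(const Tp& first, const Rest&... rest)
{
    return sizeof(first) + total_size_recursive(rest...);
}

// Both forms compute the same value.
static_assert(total_size_fold(char{}, char{}) == total_size_recursive(char{}, char{}),
              "fold and recursion agree");
```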

* Update lib/rocprofiler-sdk/buffer_tracing.cpp

- fix merge

* Update lib/rocprofiler-sdk/callback_tracing.cpp

- fix merge

* Update bin/rocprofv3

- args for marker, HIP runtime, and HIP compiler tracing

* Update tests/apps/simple-transpose

- use roctx

* Update tests/rocprofv3/tracing

- validate marker API data

* Update lib/rocprofiler-sdk-tool

- support for HIP runtime, HIP compiler, marker API

* Update queue/queue_controller/registration/utility

- call hsa::queue_controller_fini() during finalization
- add a yield function to common/utility.hpp
  - implements a thread yield + sleep
- add a sync function to Queue class
- add a iterate_queues member function to QueueController
  - this is used to sync each queue during queue_controller_fini()
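
A minimal sketch of a "thread yield + sleep" helper and a polling loop built on it. The names `yield_and_sleep` and `wait_until`, the 100 microsecond delay, and the poll limit are assumptions for illustration; the actual function in common/utility.hpp and the Queue sync logic may differ.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: give up the current time slice, then sleep briefly.
template <typename Rep, typename Period>
void
yield_and_sleep(std::chrono::duration<Rep, Period> delay)
{
    std::this_thread::yield();
    std::this_thread::sleep_for(delay);
}

// Hypothetical polling loop: call done() up to max_polls times, backing off
// between attempts. Returns true as soon as done() reports completion.
template <typename Pred>
bool
wait_until(Pred&& done, int max_polls)
{
    for(int i = 0; i < max_polls; ++i)
    {
        if(done()) return true;
        yield_and_sleep(std::chrono::microseconds{100});
    }
    return done();
}
```

A finalization routine could call something like `wait_until` once per queue to let outstanding work drain before teardown.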

* Fix data races: queue/context/stable_vector

- stable_vector::emplace_back returns reference
- correlation id map uses stable_vector
- queue_info_session has explicit fields for queue id, hsa agent, rocp agent
- use hsa::get_table() in AsyncSignalHandler
- WriteInterceptor does not use TLS for context array

* Update lib/rocprofiler-sdk/hsa/hsa.*

- static object for API subtables
- accessors for API subtables
- google tests for HSA API subtables

* Update lib/rocprofiler-sdk/hsa/{queue,async_copy}.cpp

- use HSA subtable accessors

* Update rocprofiler_memcheck and CI workflow

- use GCC 13 instead of GCC 11 due to suspected false positives in thread sanitizer
  - GCC 13 uses libtsan.so.2

* Update CI workflow

* Update lib/rocprofiler-sdk/counters/{metrics,counters}

- fix possibly dangling reference to a temporary from gcc-13

* Update thread-sanitizer-suppr.txt

- Ignore data races originating in hsa-runtime library

* Update cmake/rocprofiler_memcheck.cmake

- Deduce the sanitizer library to preload by compiling an application and extracting the linked sanitizer library

* Update tests/rocprofv3/tracing/CMakeLists.txt

- add csv files to REQUIRED_FILES and ATTACH_ON_FAIL in validate test

* Update lib/common/container/record_header_buffer.hpp

- fix data race identified by gcc v13 and libtsan.so.2

* Update hip API id, args, and def

- remove hipDrvGraphAddMemsetNode (not part of ROCm 6.0)

* Update lib/common/container/record_header_buffer.hpp

- fix deadlock in save/read/reset

* Update source/docs/CMakeLists.txt

- remove COMMAND_ERROR_IS_FATAL ANY to allow for printing of stdout/stderr

* Update lib/rocprofiler-sdk/hip/details/ostream.hpp

- remove overloads for HIP_MEMSET_NODE_PARAMS

* Update docs/CMakeLists.txt

- use find_program for shell instead of hardcoded /bin/bash
2024-01-24 16:32:54 -06:00
Jonathan R. Madsen 1f4cf1aa39 Tools update (#397)
* Srnagara/tool counters collect (#331)

* Adding counter collection capability to tools

* Adding counter collection feature to tools

* Adding counter collection capability to tools

* Fixing merge down issues

* Small tool fixes for build + prevent profile realloc

* Reproducing the counter name query issue in buffered callback

* Minor fix for init order + sample that directly uses sdk-tool for debug purposes

* Adding a temporary fix to print the counter names

* Fixing the output file name and reverting the changes of caching the profile config

* Fixing SGPR_Count value

* cleaning up debug prints

* Adding header to counter collection file

* Adding kernel filtering support

* Remove threading

* Cleaning up the code

* Removing redundant prints

* Revert "Remove threading"

This reverts commit 05c58fb9de826e92cf8d2e3d1c31d5578525dcb4.

* Revert "Cleaning up the code"

This reverts commit 1d964882bf2396dee8ad020cbb6c83b36e0674e9.

* Changing the tools code to align with init-order fix

* cmake formatting (cmake-format) (#335)

Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>

* source formatting (clang-format v11) (#336)

Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>

* Adding support for async memory copy

* source formatting (clang-format v11) (#391)

Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>

* Fixing header typo

* Fixing tool_fini

* Replacing the direction and kind field values with descriptions

* Update lib/rocprofiler-sdk-tool/helper.cpp

- Remove use of VLA

* Update lib/rocprofiler-sdk-tool/tool.cpp

- Formatting

* Migrate common/config.* to rocprofiler-sdk-tool

* Update lib/rocprofiler-sdk-tool/tool.cpp

- fix clang-tidy issues

* source formatting (clang-format v11) (#392)

Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>

* Update lib/common/mpl.hpp

- is_string_type / is_string_type_impl for deducing if type is a string type
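
One possible shape for such a trait, as a sketch only. The names `is_string_type` and `is_string_type_impl` come from the entry above, but the set of types treated as strings here is an assumption, not the contents of mpl.hpp.

```cpp
#include <string>
#include <string_view>
#include <type_traits>

// Primary template: not a string type.
template <typename Tp>
struct is_string_type_impl : std::false_type
{};

// Assumed set of string-like types.
template <>
struct is_string_type_impl<std::string> : std::true_type
{};
template <>
struct is_string_type_impl<std::string_view> : std::true_type
{};
template <>
struct is_string_type_impl<const char*> : std::true_type
{};
template <>
struct is_string_type_impl<char*> : std::true_type
{};

// std::decay_t strips references and cv-qualifiers and turns char arrays
// (e.g. string literals) into pointers before the lookup.
template <typename Tp>
struct is_string_type : is_string_type_impl<std::decay_t<Tp>>
{};
```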

* Update include/rocprofiler-sdk/fwd.h

- ROCPROFILER_BUFFER_TRACING_MEMORY_COPY_NONE starts at zero

* Update lib/rocprofiler-sdk/hsa/async_copy.*

- functions for operation ids and names

* Update lib/rocprofiler-sdk/buffer_tracing.cpp

- support iterating and getting names for ROCPROFILER_BUFFER_TRACING_MEMORY_COPY

* Update lib/rocprofiler-sdk-tool/config.*

- env ROCPROFILER_ prefix -> ROCPROF_ prefix
- add support for memory copy tracing, counter collection, etc.

* Update lib/rocprofiler-sdk-tool/helper.*

- removed TracerFlushRecord
- removed cxa_demangle (use one in common library)
- removed GetCounterNames (handled in config)
- removed GetKernelNames (handled in config)

* Add lib/rocprofiler-sdk-tool/output_file.*

- separate out get_output_stream function and output_file struct from tool.cpp

* Add lib/rocprofiler-sdk-tool/csv.hpp

- write_csv_entry automatically quotes strings
- csv_encoder struct enforces correct number of columns
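
A sketch of how these two pieces could fit together, assuming C++17. Only the names `write_csv_entry` and `csv_encoder` come from the entry above; the signatures, the `write_row` member, and the doubling of embedded quotes are assumptions rather than the contents of csv.hpp.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <type_traits>

// Quote string-like values (doubling embedded quotes); write others as-is.
template <typename Tp>
void
write_csv_entry(std::ostream& os, const Tp& value)
{
    if constexpr(std::is_convertible_v<const Tp&, std::string>)
    {
        const std::string str(value);
        os << '"';
        for(char c : str)
        {
            if(c == '"') os << '"';
            os << c;
        }
        os << '"';
    }
    else
    {
        os << value;
    }
}

// The column count is a template parameter, so passing the wrong number of
// values is rejected at compile time.
template <std::size_t NumCols>
struct csv_encoder
{
    template <typename... Args>
    static void write_row(std::ostream& os, const Args&... args)
    {
        static_assert(sizeof...(Args) == NumCols, "wrong number of columns");
        std::size_t idx = 0;
        // Emit each entry followed by a comma, or a newline after the last.
        ((write_csv_entry(os, args), os << (++idx < NumCols ? "," : "\n")), ...);
    }
};
```

With this shape, `csv_encoder<3>::write_row(os, 1, "name", 2.5)` compiles, while passing two or four values does not.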

* Update lib/rocprofiler-sdk-tool/CMakeLists.txt

- add new files

* Update lib/rocprofiler-sdk-tool/tool.cpp

- update construction of output_file class
- add kernel_symbol_data for serializing kernel trace data
- use config instead of env lookups
- optimize counter collection profile config lookup/creation

* Update bin/rocprofv3

- rocprofv3 --help exits with 0 (as it should)
- command-line arg for memory copy tracing
- command-line arg for mangled kernels
- command-line arg for truncated kernels
- env ROCPROFILER_ prefix -> env ROCPROF_ prefix

* Update tests/async-copy-tracing/validate.py

- update test_async_copy_direction to new enum values

* Update tests/kernel-tracing/validate.py

- update test_async_copy_direction to new enum values

* Update tests/tools/json-tool.cpp

- add ROCPROFILER_BUFFER_TRACING_MEMORY_COPY to supported buffer_name_info

* Update samples/counter_collection/{CMakeLists.txt,main.cpp}

- remove counter-collection-sdk-tool

* Update .github/workflows/docs.yml

- fix paths triggering running the workflow

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>

* adding counter collection support

* Adding counter collection test

* changing directory structure of counter collection tests

* Fixing test path for rocprofv3

* Adding hsa-tracing basic test

* cmake formatting (cmake-format) (#362)

Co-authored-by: bgopesh <bgopesh@users.noreply.github.com>

* counter collection tests drop2

* fixing hsa-trace test for rocprofv3 path

* python formatting (black) (#371)

Co-authored-by: bgopesh <bgopesh@users.noreply.github.com>

* both counter collection and tracing should work together

* Fixing rocprofv3 path

* Attempt to fix Segfault with AddressSanitizer

* fixing sanitizer segfault

* Update rocprofv3

* Update lib/rocprofiler-sdk-tool/README.md

- update env variables

* Update lib/rocprofiler-sdk/buffer_tracing.cpp

- return ROCPROFILER_STATUS_BUFFER_NOT_FOUND if buffer tracing service is configured with invalid buffer

* Update lib/rocprofiler-sdk-tool/tool.cpp

- designated hsa API trace buffer

* Update tests/hsa-tracing/CMakeLists.txt

- Fix environment

* Update rocprofv3

- do not override HSA_TOOLS_LIB
- support ROCPROF_PRELOAD
- LD_PRELOAD librocprofiler-sdk.so

* Restructure tests directory

- move all rocprofv3 integration tests into subfolder

* Update cmake/Templates/rocprofiler-sdk/config.cmake.in

- create rocprofiler-sdk::rocprofv3 cmake target

* Update tests/rocprofv3/hsa-tracing

- improve validate.py
- convert input to dict via csv.DictReader

* Update tests/apps/CMakeLists.txt

- fix build rpath for simple-transpose

* Update cmake/rocprofiler_memcheck.cmake

- prefer libtsan.so.0

* Update tests/rocprofv3/hsa-tracing

- move to tests/rocprofv3/tracing
- include kernel tracing and memory copy tracing

* Update lib/rocprofiler-sdk-tool/tool.cpp

- normalize "_ID" vs. "_Id" in CSV column names (use "_Id")

* Update lib/rocprofiler-sdk/buffer.{hpp,cpp}

- change signature of buffer::get_buffers()
- buffer::get_buffers() uses static_object

* Update lib/rocprofiler-sdk/context/context.cpp

- update usage of buffer::get_buffers()
  - now returns pointer

* Update lib/rocprofiler-sdk/tests/buffer.cpp

- update to change for signature of buffer::get_buffers()

* Update tests/rocprofv3/tracing/CMakeLists.txt

- use %argt% with -d argument

* Update lib/rocprofiler-sdk-tool/tool.cpp

- use atexit for finalization

* Update tests/rocprofv3/tracing/CMakeLists.txt

- tweaked name of tests

* Update lib/rocprofiler-sdk/hsa/async_copy.*

- async_copy_fini + reference counting signals

* Update lib/rocprofiler-sdk/registration.cpp

- invoke hsa::async_copy_fini() to prevent data race on signals

---------

Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: SrirakshaNag <SrirakshaNag@users.noreply.github.com>
Co-authored-by: gobhardw <gopesh.bhardwaj@amd.com>
Co-authored-by: bgopesh <bgopesh@users.noreply.github.com>
2024-01-22 19:06:25 -06:00
Jonathan R. Madsen 21dd088c8e ROCTx Library Tracing (#390)
* Update include/rocprofiler-sdk/marker/*

- Update rocprofiler_marker_api_args_t for all API functions
- Add ROCPROFILER_MARKER_API_ID_roctxGetThreadId to rocprofiler_marker_api_id_t

* Update include/rocprofiler-sdk/marker/api_args.h

- fix include

* Update lib/common/mpl.hpp

- is_pair
- is_type_complete_v
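
A common way to write such traits, shown as a sketch under the assumption of C++17. The names `is_pair` and `is_type_complete_v` come from the entry above; the bodies and the two example structs are illustrative and may not match mpl.hpp.

```cpp
#include <type_traits>
#include <utility>

// is_pair: true only for specializations of std::pair.
template <typename Tp>
struct is_pair : std::false_type
{};

template <typename LhsT, typename RhsT>
struct is_pair<std::pair<LhsT, RhsT>> : std::true_type
{};

// sizeof(Tp) is only well-formed for complete types, so the partial
// specialization is selected via SFINAE only when Tp is complete.
template <typename Tp, typename = void>
struct is_type_complete : std::false_type
{};

template <typename Tp>
struct is_type_complete<Tp, std::void_t<decltype(sizeof(Tp))>> : std::true_type
{};

template <typename Tp>
inline constexpr bool is_type_complete_v = is_type_complete<Tp>::value;

// Hypothetical types used only to exercise the trait.
struct declared_only;  // forward declaration, never defined
struct defined_type
{};
```

Note that the result is fixed at the first instantiation for a given type, so a check like this is only reliable for types that are never completed later in the same program.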

* Update include/rocprofiler-sdk/marker/*

- fix rocprofiler_marker_api_retval_t
- add roctxGetThreadId to rocprofiler_marker_api_args_t
- fix typo in enum: HsaDevice -> HsaAgent
- add table_api_id.h

* Update include/rocprofiler-sdk/marker.h

- include marker/table_api_id.h

* Update include/rocprofiler-sdk/buffer_tracing.h

- Buffer marker tracer records have begin and end timestamp

* Add lib/rocprofiler-sdk/marker

- tracing implementation for marker (roctx) library

* Update include/rocprofiler-sdk/{buffer_tracing,marker/table_api_id}.h

- rocprofiler_buffer_tracing_marker_record_t -> rocprofiler_buffer_tracing_marker_api_record_t

* Update lib/rocprofiler-sdk/buffer_tracing.cpp

- support for ROCPROFILER_BUFFER_TRACING_MARKER_API

* Update lib/rocprofiler-sdk/callback_tracing.cpp

- support for ROCPROFILER_CALLBACK_TRACING_MARKER_API

* Update lib/rocprofiler-sdk/intercept_table.cpp

- template instantiation for notify_runtime_api_registration

* Update lib/rocprofiler-sdk/registration.cpp

- enable roctx in rocprofiler_set_api_table

* Update lib/rocprofiler-sdk/marker/marker.cpp

- rocprofiler_buffer_tracing_marker_record_t -> rocprofiler_buffer_tracing_marker_api_record_t

* Update lib/rocprofiler/tests for roctx testing

- add roctx.cpp
  - unit tests for roctx callback and buffer tracing
- support marker API in get_{buffer,callback}_tracing_names()

* Update lib/common/logging.cpp

- logging initialized message mentions env variable

* Update lib/common/mpl.hpp

- NOLINT for misc-definitions-in-headers

* Update lib/rocprofiler-sdk/tests/CMakeLists.txt

- include LD_LIBRARY_PATH in rocprofiler-lib-tests-shared tests

* Update lib/rocprofiler-sdk/registration.cpp

- client_library_vec_t is now vector of optional<client_library>
  - enables resetting the client_library after finalization
- removed acquiring registration lock when invoke_client_finalizers called via atexit
  - this was causing some lock-order-inversion warnings (potential deadlock)

* Update lib/rocprofiler-sdk/agent.cpp

- model name for agent supports spaces

* Update tests/common/serialization.hpp

- add serialization support for marker tracing data structures

* Update tests/apps

- Add ROCTx markers into reproducible-runtime and transpose

* Update tests/tools/json-tools.cpp

- add marker tracing support
- remove strdup (no longer necessary)

* Update tests/kernel-tracing/validate.py

- validate marker API tracing data

* Update tests/async-copy-tracing/validate.py

- validate marker API tracing data

* Update cmake for load path resolution during testing

* Update tests/async-copy-tracing/CMakeLists.txt

- fix test LD_LIBRARY_PATH

* Update cmake/Templates/rocprofiler-sdk-roctx/config.cmake.in

- fix constructing rocprofiler-sdk-roctx::rocprofiler-sdk-roctx
2024-01-18 09:48:06 -06:00
Jonathan R. Madsen dc8b8aa448 Cleanup + logging env variable (#387)
* [CP] Update tests/common/serialization.hpp

- remove duplication in rocprofiler_callback_tracing_code_object_load_data_t

* [CP] Update lib/rocprofiler-sdk/tests

- create common.hpp
- update registration.cpp to use common.hpp

* [CP] Add lib/common/logging.{hpp,cpp}

- generic init_logging function

* [CP] Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- remove excess logging

* [CP] Update lib/rocprofiler-sdk/registration.cpp

- use common::init_logging(...)
- enforce ROCPROFILER_REGISTER_FORCE_LOAD in rocprofiler_force_configure
- logging updates in rocprofiler_set_api_table

* Update include/rocprofiler-sdk/buffer_tracing.h

- rocprofiler_buffer_tracing_marker_record_t -> rocprofiler_buffer_tracing_marker_api_record_t

* Update lib/common/utility.hpp

- remove active_capacity_gate

* Update lib/rocprofiler-sdk/tests/common.hpp

- fix get_{callback,buffer}_tracing_names()

* Update lib/rocprofiler-sdk/counters/xml/{basic,derived}_counters.xml

- add entries for gfx1102
2024-01-17 00:28:20 -06:00
Jonathan R. Madsen 936816f762 Async memory copy tracing (#317)
* Update samples/api_buffered_tracing/client.cpp

- support ROCPROFILER_BUFFER_TRACING_MEMORY_COPY

* Update include/rocprofiler-sdk/{buffer_tracing,fwd}.h

- update rocprofiler_buffer_tracing_memory_copy_record_t
- add ROCPROFILER_BUFFER_TRACING_MEMORY_COPY_HOST_TO_HOST to rocprofiler_memory_copy_operation_t

* Update lib/rocprofiler-sdk/context/context.*

- get_registered_contexts functions (local copy)

* Update tests/apps/reproducible-runtime/reproducible-runtime.cpp

- include some memory allocations and memory copies for better testing

* Update tests/common/serialization.hpp

- update serialization save function for rocprofiler_buffer_tracing_memory_copy_record_t

* Update lib/rocprofiler-sdk/hsa/hsa.*

- remove stale set_callback / activity_functor_t code
- forward decl hsa_api_meta
- template struct hsa_api_func for getting function return type and args

* Update tests/kernel-tracing/validate.py

- enforce memory_copies data size
- test timestamps in memory copies data
- improve internal and external correlation id validation

* Update lib/rocprofiler-sdk/hsa/defines.hpp

- HSA_API_META_DEFINITION macro

* Update lib/rocprofiler/hsa/rocprofiler-sdk/hsa/hsa.def.cpp

- HSA_API_META_DEFINITION specializations for async copy functions

* Add lib/rocprofiler-sdk/hsa/async_copy.{hpp,cpp}

- implements buffer memory tracing

* Update lib/rocprofiler-sdk/registration.cpp

- invoke rocprofiler::hsa::async_copy_init

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- logging improvements
- improve hsa <-> rocp agent mapping

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- load original signal in async signal handler before store_screlease

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- use store_relaxed instead of store_screlease

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- logging

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- logging

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- misc changes

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- misc changes

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- misc changes

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- return function pointer instead of lambda

* Update reproducible-runtime.cpp

- device sync

* Update tests/apps/reproducible-runtime/reproducible-runtime.cpp

- use *Async variants of hipMalloc and hipMemcpy

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- populate async data properly

* Update tests/kernel-tracing/validate.py

- verification of async copy direction

* Update tests/apps/reproducible-runtime/reproducible-runtime.cpp

- temporarily disable async memcpy functions

* Create tests/tools

- directory containing tool libraries used for collecting data in integration tests

* Update tests/kernel-tracing

- remove kernel-tracing-test-tool library (now rocprofiler-sdk-json-tool)
- update cmake, validate.py, conftest.py accordingly

* Add tests/async-copy-tracing

- integration test validating async copy tracing in transpose example

* Update tests/CMakeLists.txt

- updates for restructuring

* Revert tests/apps/reproducible-runtime

- restore code to semi-original state (no memory copying)

* Update tests/async-copy-tracing/validate.py

- fix comment in test_async_copy_direction

* Fix building tests against installation
2024-01-09 11:34:46 -06:00
Jonathan R. Madsen 6b374b8e68 Improve static singleton memory safety (#316)
* Update GitHub links

* Update samples/api_buffered_tracing/client.cpp

- check if initialized before forcing initialization

* Add lib/common/static_object.*

- template class for creating a static allocation in the binary which has all the properties of a heap allocated singleton but does not trigger leak sanitizers
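
A simplified sketch of the idea, assuming C++17. The class name `static_object` comes from the entry above, but the member names and interface here are assumptions, and this version is not thread-safe and does not register any destructor. The object is built with placement new inside a static buffer, so it behaves like a never-deleted singleton without a heap allocation for the object itself.

```cpp
#include <cassert>
#include <new>
#include <utility>

template <typename Tp>
class static_object
{
public:
    // Construct the object in the static buffer on first call; later calls
    // ignore their arguments and return the existing object.
    template <typename... Args>
    static Tp* construct(Args&&... args)
    {
        if(!m_object) m_object = new(m_storage) Tp(std::forward<Args>(args)...);
        return m_object;
    }

    // Returns nullptr until construct() has been called.
    static Tp* get() { return m_object; }

private:
    // Storage is part of the binary's static data rather than the heap.
    alignas(Tp) static inline unsigned char m_storage[sizeof(Tp)] = {};
    static inline Tp* m_object                                    = nullptr;
};
```

A caller would use `static_object<my_type>::construct(args...)` once during initialization and `static_object<my_type>::get()` afterwards, where `my_type` stands for whatever singleton type is being stored.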

* Update include/rocprofiler-sdk/internal_threading.h

- document return values

* Update lib/rocprofiler-sdk/internal_threading.cpp

- return codes from rocprofiler_create_callback_thread and rocprofiler_assign_callback_thread
- use common::static_object for thread-pool object

* Update lib/rocprofiler-sdk/agent.cpp

- use common::static_object to store array of strings and their hashes

* Update lib/rocprofiler-sdk/hsa/code_object.cpp

- use common::static_object to store array of strings and their hashes to ensure strings exist until termination

* Update lib/rocprofiler-sdk/registration.cpp

- use common::static_object to store status and client libraries
- update return values for rocprofiler_set_api_table

* Update lib/rocprofiler-sdk/hsa/hsa.cpp

- check registration::get_fini_status() in hsa_api_impl::functor<Idx>(args...)

* Update lib/rocprofiler-sdk/context/context.cpp

- using common::static_object for correlation id map
2023-12-19 13:47:21 -06:00
Jonathan R. Madsen 9a0c84efa6 Use -sdk suffix and reset VERSION to 0.0.0 (#263)
* Fix find_package(rocprofiler) in build tree

* Move include/rocprofiler to include/rocprofiler-sdk

* Update include/CMakeLists.txt

- add_subdirectory(rocprofiler-sdk)

* Move lib/rocprofiler to lib/rocprofiler-sdk

* Move lib/rocprofiler-tool to lib/rocprofiler-sdk-tool

* Update lib/CMakeLists.txt

- add_subdirectory(rocprofiler-sdk)
- add_subdirectory(rocprofiler-sdk-tool)

* Update lib/rocprofiler-sdk/CMakeLists.txt

* Rename rocprofiler-tool to rocprofiler-sdk-tool

* Replace include rocprofiler/ with include rocprofiler-sdk/

* Replace include lib/rocprofiler/ with include lib/rocprofiler-sdk/

* Set VERSION to 0.0.0 and finish install to rocprofiler-sdk

* More fixes for rocprofiler -> rocprofiler-sdk

- fix issue with rocprofiler-sdk-config.cmake.in
- fix counters xml install path

* Fix documentation generation

* Create rocprofiler_LIB_ROCPROFILER_SDK_DIR for build tree

* cmake formatting (cmake-format) (#264)

Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-11-29 20:43:18 -06:00