Graphe des révisions

86 Révisions

Auteur SHA1 Message Date
Madsen, Jonathan 4d6a61f5e5 [SDK] Fix null handles (#474)
* Fix null handle

- use .handle=0, not .handle=numeric_limits<>::max()

* Update lib.common.hasher

* Fix ROCPROFILER_CONTEXT_NONE

* Use context operator==

* Update CHANGELOG

* Updated null handle for scratch memory and changed allocation test so that free ops account for null agent

---------

Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>
2025-07-18 12:05:52 -05:00
Welton, Benjamin 7e3ea0c58e [SWDEV-540753] Reduce memory allocations in device profiling (#507)
Cache packet creation in all cases to reduce the number of allocations/
destruction operations made down to KFD. There is a bug that we
encounter after a period of runtime in KFD where allocations fail to be
visable to the GPU (suspect this is a FW issue, similar to other FW
issues they have had along the same lines). This sidesteps that issue in
rocprof (and likely should be done regardless)

Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-07-11 12:46:19 -07:00
Madsen, Jonathan caf1a2174e [SDK] Support HIP 7.0 API changes (#432)
* [Do not merge] Make changes to api_args

* Support HIP 7.0 API changes

- Provide ROCPROFILER_SDK_ variants of ROCPROFILER_ version defines
- Provide ROCPROFILER_SDK_COMPUTE_VERSION
- hipCtxGetApiVersion changes parameter from int* to unsigned int*
- hipMemcpyHtoD and hipMemcpyHtoDAsync changed void* to const void*

* Fix comment

---------

Co-authored-by: Jatin Chaudhary <jatchaud@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-06-03 21:50:50 -05:00
Madsen, Jonathan 7afedc63be [rocprofv3] SQLite3 database output (rocpd) support + rocprofiler-sdk-rocpd (#403)
* [rocprofv3] rocpd SQLite3 database output support

* Move counters xml and yaml to source/share/rocprofiler-sdk

- more representative of install hierarchy

* Add share/rocprofiler-sdk/rocpd SQL files

* Experimental rocprofiler-sdk SQL API

* rocprofv3 default output format is rocpd

* Fix rocpd event ids for counter collection w/o kernel dispatch

* Remove fktable entries from rocpd_tables.sql

* Fix rocpd schema path

* Fix install component for roctx python bindings

* rocprofiler-sdk-rocpd

- create include/rocprofiler-sdk-rocpd
- create rocprofiler-sdk-rocpd library, package, etc.
- default all "guid" fields to "{{guid}}" in tables
- remove "{{view_uuid}}" support (always unused)

* Migrate rocprofv3 to use rocprofiler-sdk-rocpd

* Fix missing foreign key reference

* Revert change

* Fix cmake comment

* Fix maybe-uninitialized compiler warning

* Fix maybe-uninitialized compiler warning

* Add logging to rocpd_sql_load_schema

* Improve string sanitization when inserting json strings

* Initialize rocpd logging on rocprofiler-sdk-rocpd library load

* Revert lib/output/generatePerfetto.cpp changes

* [temporary] Tweak rocprofv3-test-list-avail-trace-execute test log level

* Update get_install_path for lib/rocprofiler-sdk-rocpd/sql.cpp

- try to resolve issues on RHEL/SLES for dladdr

* Update lib/common/logging.cpp

- enable environ overrides

* dlsym for rocpd_sql_load_schema

* Make dl_info.dli_fname lexically normal

* Implement node_info alternatives if /etc/machine-id does not exist

* Misc include fixes

* SHA256 and UUIDv7 support

* Implement UUIDv7 in generateRocpd.cpp

* Support push/pop environment variables

* Minor tweak

* Fix glog segfaults when unsetting glog env

* Updated CHANGELOG

* Updates tests/pytest-packages

- rocpd_reader.py: RocpdReader

* Update tests / marker_views.sql

- add test_rocpd_data

* Update rocpd_tables.sql

- Use AUTOINCREMENT
- insert "uuid" and "guid" into rocpd_metadata

* Minor updates to generateRocpd.cpp

- don't quote GUID
- use sqlite3_open_v2
- use sqlite3_close_v2

* Update execute_raw_sql_statements_impl

- uses sqlite3_last_insert_rowid for autoincrement

* Update SQL deferred_transaction

- CI check for nullptr to connection

* Apply suggestions from code review

Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>

* Code review updates

- formatting
- replace if with switch
- remove loop for {{uuid}}

* Fix pmc_groups handling in rocprofv3

* Address code review feedback

- Include rocm_version in rocprofv3 version info
- Note `--version` option for `rocprofv3` in CHANGELOG.md
- remove commented out code

* Fix packaging dependencies

* Fix install package step of CI workflow

* Fix install package step of CI workflow

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>
2025-05-30 00:13:19 -05:00
Madsen, Jonathan dbb2e52216 [SDK] Remove std::regex usage from rocprofiler-sdk library and common library (#421)
* Remove std::regex usage from rocprofiler-sdk and common library

- See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118408
- std::regex usage produces exceptions or segfaults when used when on applications compiled with dual ABI
- Add code restrictions workflow
  - simple workflow ensuring code restrictions (such as std::regex) are not used

* Update CHANGELOG

* Explicitly set permissions for restrictions workflow

* Fix handling of /proc/cpuinfo entries with no info

- e.g. "power_management:" (colon is last character)

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-05-29 23:11:13 -05:00
Madsen, Jonathan 7166b1ab58 [rocprofv3] Add rocpd output support (part 1: prelude) (#401)
* [rocprofv3] Add rocpd output support (part 1: prelude)

- git submodules for sqlite3, GOTCHA, and pybind11
- HIP stream data
- rocprofiler_query_intercept_table_name(...)
- serialization load
- rocprofiler::sdk::get_perfetto_category(KindT)
- rocprofiler::sdk::parse::strip
- common library updates
  - md5sum
  - hasher
  - simple_timer
  - static_tl_object
  - get_process_start_time_ns(pid_t)
- output library updates
  - node_info
  - file_generator (generator is now virtual base class)
  - stream info updates

* Added submodules

* Code review updates

* Minor unused-but-set-X warning fixes

* Update CI

- install libsqlite3-dev package

* Update CI

- install libsqlite3-dev package

* Fix static thread-local object memory leak

- also fix signal handler chaining

* Remove URL from comment

* Remove page migration exception

* Enable ROCPROFILER_BUILD_SQLITE3 by default

- try find_package(SQLite3) first and then build when ROCPROFILER_BUILD_SQLITE3=ON

* Fix gotcha installation

- make install of target optional

* Validate tracing + counter collection dispatch data

- i.e. correlation ids, thread ids, timestamps

* Make find_package(SQLite3) optional

- ROCm CI does not have SQLite3 dev package installed and cannot build from source (missing tclsh)

* Fixes to tracing + counter collection test

* get_process_start_time_ns update

- original implementation did not work

* Fix pytest-packages test_perfetto_data for counter collection

- erroneous failure when used with same PMC + multiple agents

* cmake policy: option() honors normal variables

- for GOTCHA submodule

* Improve samples/api_buffered_tracing stability

- reduce likelihood of sporadic exception throw

* Update gotcha submodule

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-05-18 20:11:26 -05:00
Madsen, Jonathan 8a1ee46e47 [SDK] Fix double buffer data race (#394)
* Fix double buffer data race

- fixes relatively rare data race in double buffering scheme

In `rocprofiler::buffer::instance::emplace`, the `container::record_header_buffer::get_record_headers()` function returned a `std::vector<rocprofiler_record_header_t*>` and then invoked callback to tool. It was possible for that callback to still be executing while the buffer was being updated. This potentially introduced a scenario where the rocprofiler_record_header_t* was modified (or corrupted) before the tool processed the record. In rocprofv3, this would result in a "future" buffer record showing up among "past" buffer records. E.g., correlation id sequence of 1-15 where the buffer flushes after five values, could result in this during processing:
|     |     |     |     |      |
|:---:|:---:|:---:|:---:|:---:|
|  1 |  2 |  3 |  4 | 15 |
|  6 |  7 |  8 |  9 | 10 |
| 11 | 12 | 13 | 14 | 15 |

Because buffer A (of double buffering scheme) originally containing corr ids 1-5 stalled after process corr id 4 (e.g. write to disk), buffer B filled up with 6-10 and started flushing, causing a switch back to buffer A, and buffer A was filled with 11-15 by the time callback accessed what was originally corr id 5 but was now updated to corr id 15.

* Update CHANGELOG

* misc minor cleanup

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-05-14 13:19:22 -05:00
Nagaraj, Sriraksha 87badfbd15 [rocprofv3] signal handler fix (#332)
* rocprofv3: LD_PRELOAD for signal and sigaction

- wrappers around `signal` and `sigaction` to prevent applications which install signal handlers to replace the rocprofv3 signal handlers
- minor tweaks to buffer sizes (use page_size instead of
KiB)

* [DO NOT COMMIT] extra logging

* Switch git submodule url for perfetto

- use GitHub URL as this is more accessible

* Update ring_buffer<Tp>

- account for alignment padding

* Update buffered_output

- track number of bytes stored
- add nullptr checks

* Update tmp_file_buffer

- track number of bytes
- read_tmp_file does not create tmp file if it does not already exist

* Update tmp_file

- add exists member function for checking whether temporary file already exists
- tweak remove() implementation

* Update config.hpp

- add option to enable/disable signal handlers
- add option for minimum_output_bytes

* Make signal, sigaction functions visible

* rocprofv3 tool updates

- chained signals
- override the signal handler(s) installed by the application
- improve cleanup of temporary files
- support minimum output bytes

* Add commandline support

* fixing test

* minor fix

* minor fix

* fix clang issue

* fix

* Adding docs

* review comments

* review changes

* review

* YUV pulldown additions to rocdecode

* More rocdecode changes

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <Jonathan.Madsen@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-04-17 21:10:52 -07:00
Madsen, Jonathan c5a3edc3fa [Misc] Rework header includes (#311)
* Update header file includes

* Fix includes for lib/rocprofiler-sdk/hip/hip.hpp

* Minor touch ups

* Minor include improvements

* Doxygen tweak

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-04-15 14:02:12 -07:00
Trowbridge, Ian 4fbcfd142c Copyright Compliance (#333)
* Added copyright information to requested files

* Formatting

* Fix bad function name error
2025-04-14 13:07:32 -05:00
Welton, Benjamin 4cd121e27b [SDK] Release 1.0 Public API Modifications (#277)
* Make sure all structs/enums can be forward declared

* Updates to counter collection

- consistency updates and cleanup

* Conversion of dimension information to info struct

* Added deprecated folder

* Testing changes

* merge changes

* Fix shadowed variable

* Source code formatting

* Fix shadowed variable

* Update rocprofiler_counter_info_v1_t member names

* Split version.h into version.h and ext_version.h

- ext_version.h contains external version info, e.g. ROCPROFILER_HSA_API_TABLE_MAJOR_VERSION, ROCPROFILER_HSA_RUNTIME_VERSION
- this reduces amount of recompilation after a commit since version.h gets updated with the git revision

* profile_config -> counter_config

* EOF new line

* [Samples] Reduce header includes + reorg counter collection samples

* Misc compilation fixes

- shadowed variables
- use of [[deprecated("...")]] in C code
- unused variables

* Minor misc modifications

- use common:: instead of rocprofiler::common:: when inside rocprofiler namespace
- counters.cpp
  - move local anon namespace functions into rocprofiler::counters:: anon namespace
  - use std::string_view for get_static_string
  - const ref for get_static_ptr
  - misc namespace shortening

* [Public API] rocprofiler_get_version_triplet + rocprofiler_version_triplet_t

- struct rocprofiler_version_triplet_t containing fields for the major, minor, and patch version
- public API function: rocprofiler_get_version_triplet
- define C++ operators for rocprofiler_version_triplet_t
- C++ function compute_version_triplet

* [Tests] Improve async-copy-testing test

- relax constraints
- improve logging

* Update counter_config.h doxygen docs

* ROCPROFILER_SDK_BETA_COMPAT

- ppdef which helps with renaming when set to 1

* Remove spurious include

* Fix includes for cxx/version.hpp

* Doxygen fixes for rocprofiler_get_version and rocprofiler_get_version_triplet

* Public API Experimental Designation

- ROCPROFILER_SDK_EXPERIMENTAL added to experimental function
- "(experimental)" added to doxygen @brief entries

* Fix use of assert instead of static_assert in hip/stream.cpp

* Use typedef instead of define for rocprofiler_profile_config_id_t

* Use inline rocprofiler_{create,destroy}_profile_config instead of ppdef

- added <rocprofiler-sdk/deprecated/profile_config.h>

* Doxygen for rocprofiler_{create,destroy}_profile_config

* ROCPROFILER_SDK_DEPRECATED_WARNINGS

* Temporarily comment out ROCPROFILER_SDK_DEPRECATED_WARNINGS=1

* cmake formatting

* Misc variable renaming in samples and tests

* Fix declarations of types

* Fix hip stream tracing service struct name

- rocprofiler_callback_tracing_stream_handle_data_t renamed to rocprofiler_callback_tracing_hip_stream_api_data_t

* Rename "HIP_STREAM_API" to "HIP_STREAM"

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-03-24 12:07:33 +05:30
Trowbridge, Ian ccd1e54293 HIP Streams to Queues Translation (#235)
* rocprofiler_stream_id_t: opaque handle for a stream

- e.g. HIP stream
- the same HIP stream may map to different HSA queues at different points in the application
- added to:
  - rocprofiler_buffer_tracing_hip_api_record_t
  - rocprofiler_buffer_tracing_memory_copy_record_t
  - rocprofiler_callback_tracing_hip_api_data_t
  - rocprofiler_callback_tracing_memory_copy_data_t
---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Mark Meserve <mark.meserve@amd.com>
Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Jakaraddi, Manjunath <Manjunath.Jakaraddi@amd.com>
Co-authored-by: Bhardwaj, Gopesh <Gopesh.Bhardwaj@amd.com>
Co-authored-by: Nagaraj, Sriraksha <Sriraksha.Nagaraj@amd.com>
Co-authored-by: U, Srihari <Srihari.U@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>
Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Indic, Vladimir <Vladimir.Indic@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-03-14 02:45:13 -07:00
Madsen, Jonathan 470f347e50 SDK: remove majority of exceptions (#176)
* SDK: remove majority of exceptions

- replace with ROCP_FATAL, ROCP_CI_LOG(WARNING), etc.
- improve logging of symbolic link
- add --readlink and --realpath (hidden options) to rocprofv3 to follow symlinks for preloaded libraries

* Add rocprofv3 --rocm-root argument

* Fix registration resolved_exists

* Fix rocprofv3_avail.py

* Update logging for rocprofiler_configure search

- relax failure conditions

* Misc clang-tidy fixes

* Fix merge

* Fix merge

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Bhardwaj, Gopesh <Gopesh.Bhardwaj@amd.com>
2025-02-18 10:44:37 -06:00
Madsen, Jonathan 6246ec4040 SDK: Agent UUIDs, agent runtime visibility, kernel symbol address (#154)
* [DO NOT MERGE] Misc UUID updates

- this is WIP

* Agent visibility

- Support for ROCR_VISIBLE_DEVICES, HIP_VISIBLE_DEVICES, CUDA_VISIBLE_DEVICES, GPU_DEVICE_ORDINAL

* Update CHANGELOG

* tweak to rocprofiler_agent_runtime_visiblity_t

* Code object kernel address

- new fields in code_object_kernel_symbol_register_data_t
  - kernel_code_entry_byte_offset
  - kernel_address

* Support ROCR_VISIBLE_DEVICES reordering devices for HIP

* Addressed code review changes

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-02-11 14:36:23 -06:00
Rawat, Swati 97b7a6315d update copyright date to 2025 (#102)
* Update LICENSE

* Update conf.py

* Update copyright year

* [fix] Update copyright year

* Update copyright year "ROCm Developer Tools"

* Add license headers to c++ files

* Add license to *.py

* Update licenses in rocdecode sources

---------

Co-authored-by: srawat <120587655+SwRaw@users.noreply.github.com>
Co-authored-by: Mythreya <mythreya.kuricheti@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-01-22 19:11:20 -06:00
Madsen, Jonathan bd447ab941 Misc AFAR VII updates + clang-tidy-19 + bump version to 0.6.0 (#54)
* Misc AFAR VII updates + clang-tidy-19 + bump version to 0.6.0

- move tests/rocprofv3/trace-period to tests/rocprofv3/collection-period
- bump clang-tidy to v19
- fix misc clang-tidy errors

* Update the collection period test

- don't attach files on fail bc when test is disabled, it causes problems

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-12-06 12:35:29 -06:00
Jonathan R. Madsen 5eb8c2658c rocprofv3: refactor and reorganize rocprofiler-sdk-tool library (#1138)
* Add rocprofv3-multi-node.md to source/lib/rocprofiler-sdk-tool

* Initial source re-organization

- create "output" static library

* Update include/rocprofiler-sdk/cxx/serialization.hpp

- add GPR count fields to kernel symbol serialization

* Add source/scripts/generate-rocpd.py

- reads one or more JSON output files from rocprofv3 and writes rocpd SQLite3 database
- Note: preliminary implementation

* More reorganization b/t lib/rocprofiler-sdk-tool and lib/output

* Updates to generate-rocpd.py

- add SQL views
- option: --absolute-timestamps -> --normalize-timestamps
- option: --generic-markers
- misc fixes with regards to getting the views working
- support marker names

* Update generate-rocpd.py

- Add --marker-mode option

* Update generate-rocpd.py

- Improve debugging of bad bulk SQLite statements

* Update rocprofv3-multi-node.md

- cleanup of proposed SQL schema

* lib/output/format_path.{hpp,cpp}

- rename format to format_path (in config.hpp and config.cpp)
- move format_path functionality to format_path.{hpp,cpp}

* Rework lib/output/tmp_file_buffer.{hpp,cpp}

* Update output_key.cpp

- support %cwd%, %launch_date%

* Rework lib/output/buffered_output.hpp

* Support csv_output_file constructed via domain_type

* Update lib/output/domain_type.{hpp,cpp}

- get_domain_trace_file_name
- get_domain_stats_file_name

* Update lib/rocprofiler-sdk-tool/tool.cpp

- tweak headers

* Update lib/output/generate*.cpp

- remove include of helpers.hpp
- CSV uses domain_type for filenames

* Update samples/counter_collection/per_dev_serialization.cpp

- make wait_on volatile

* Remove tool_table from lib/output and lib/rocprofiler-sdk-tool

- Also split various structs into their own files
  - lib/output/agent_info
  - lib/output/metadata
  - lib/output/kernel_symbol_info
  - lib/output/counter_info
- Implemented rocprofiler::tool::metadata

* Optimize rocprofiler_tool_counter_collection_record_t

- reduce the size of the struct from 24784 bytes to 8376 bytes

* Introduced output_config

- split subset of config (from tools library) into output_config to be able to configure the output generating functions separately from the tool library
- this is a significant step towards the output generating functions not relying on static global memory

* Stream chunks of data into output instead of loading all info memory

* Remove duplicate group_segment_size in rocprofiler_kernel_dispatch_info_t serialization

* Adding Q&A to rocprofv3-multi-node.md

* Remove all remaining include lib/rocprofiler-sdk-tool from lib/output

- migrated a fair amount of code from lib/rocprofiler-sdk-tool/helper.hpp to lib/output

* Update Q&A of rocprofv3-multi-node.md

* Fix minor compilation errors + minor cleanup

* Update hsa/async_copy.cpp

- when ROCPROFILER_CI_STRICT_TIMESTAMPS > 0, reduce the active_signal sync wait time

* Update profiling_time.hpp

- fix log messages for when start/end time is less/greater than enqueue/current CPU time

* Fix generate_stats for tool_counter_record_t

* Dictionary optimization for generate-rocpd.py

---------

Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>
2024-11-07 01:15:19 -06:00
Jonathan R. Madsen 74facf87a6 CMake: Consistently name CMake Targets (#1082)
* Change all rocprofiler-X target names to rocprofiler-sdk-X

* Update rocprofiler-sdk-config.cmake

- fix install tree target names
- simplify logic for using find w/ components and find w/o components

* Update rocprofiler-sdk-roctx-config.cmake

- simplify logic for using find w/ components and find w/o components

* Update samples/intercept_table/CMakeLists.txt

- demonstrate/test use of `find_package(rocprofiler-sdk ... COMPONENTS ...)`
2024-10-25 11:17:34 -05:00
Jonathan R. Madsen 7861dcc6c6 Update HIP tracing ABI (#1025)
* Update HIP ABI tracing

* Minor HIP abi.cpp updates

* Misc roctx updates (version.h + more)

* Common static thread-local template struct

- static_tl_object
- similar to static_object but with thread-local semantics

* rocprofiler-sdk/version.h updates

* Update for HIP_RUNTIME_API_TABLE_STEP_VERSION == {4,5,6}

* Fix roctx.cpp tweaks
2024-09-13 17:10:35 -05:00
Jonathan R. Madsen 6098d52335 Prevent misaligned read from common::container::ring_buffer (#1076)
- causes undefined behavior
2024-09-12 18:25:05 -05:00
Jonathan R. Madsen 37e0d7efce Fix misaligned stores in buffer (#1063)
* Fix misaligned read/write to buffer

- causes undefined behavior

* Update run-ci.py

- fix spurious CDash submission failure warning

* Improve run-ci.py support for UBSan

* Relax rocprofv3 summary stats count expectation

* Update CHANGELOG
2024-09-10 17:08:57 -05:00
Jonathan R. Madsen 7a639f3439 Update HSA ABI checks for tracing (#1027)
* Update HSA ABI checks for tracing

* Update lib/common/abi.hpp

- perform ABI versioning checks even when `ROCPROFILER_CI` is not defined (or ROCPROFILER_CI=0)

* Enforce versioning size for various HSA AmdExt step versions + hsa_amd_enable_logging support

* Minor HIP abi.cpp updates
2024-08-20 01:08:34 -05:00
Jonathan R. Madsen 5d54682468 Misc cleanup and stale code removal (#1026)
* Remove custom allocators

- remove unused lib/rocprofiler-sdk/allocator.*
- remove unused lib/rocprofiler-sdk/context/allocator.hpp

* Fix rocprofiler_strip_target (rocprofiler_utilities.cmake)

* Remove old HSA_TOOLS_LIB support

- remove OnLoad/OnUnload functions used by HSA_TOOLS_LIB env variable

* Fix linter warnings + specific NOLINT exceptions

- replace bare NOLINT with NOLINT(<warning-name>)
2024-08-20 01:07:32 -05:00
Jonathan R. Madsen 16d535ef48 rocprofv3 OTF2 Output Support (#995)
* CMake support for OTF2 library

* Preliminary OTF2 generation implementation

* Completed OTF2 Support

- HSA API
- HIP API
- Marker API
- Async Memory Copies
- Kernel Dispatch

* Update lib/rocprofiler-sdk-tool/generateOTF2.cpp

- fix location type for dispatches

* Testing for OTF2 output

* Add OTF2 to requirements.txt

* Update lib/rocprofiler-sdk-tool/generateOTF2.cpp

- fix getting kernel name

* OTF2 testing with rocprofv3/tracing-hip-in-libraries

* Format external/otf2/CMakeLists.txt

* Update external/otf2/CMakeLists.txt

- guard CMP0135 for cmake < 3.24

* Update lib/rocprofiler-sdk-tool/generateOTF2.cpp

- fix duplicate string ref issue

* Update lib/rocprofiler-sdk-tool/generateOTF2.cpp

- fix header includes

* Update CI workflow

- sudo install pypi requirements for core-rpm for $HOME/.local installs

* Update pytest_utils/otf2_reader.py

- modifications for reading trace

* Update pytest_utils/otf2_reader.py

- misc cleanup

* Update CI workflow

- fix installer artifact naming

* Update pytest_utils/otf2_reader.py

- handle slightly overlapping kernel timestamps for MI300

* OTF2 attributes for category

* Testing with OTF2Reader category attributes

* Fix memory leak in OTF2 generation

- leaking OTF2_AttributeList
2024-07-30 19:57:19 -05:00
Jonathan R. Madsen ebb021c59f rocprofv3 kernel renaming support + misc rocprofv3 updates (#992)
* Increase rocprofv3 tool buffer size

- 32 pages instead of 1 page

* Improve rocprofv3 perfetto track labels

* Preliminary kernel renaming support + misc rocprofv3 updates

- add rocprofv3 option --kernel-rename
- add rocprofv3 options for perfetto settings (buffer size, etc.)
- add CSV columns for kernel trace
  - Thread_Id
  - Dispatch_Id
- add CSV column for counter_collection
 - Kernel_Id
2024-07-29 14:33:50 -05:00
Gopesh Bhardwaj dc671497da look for symbols in dynsym table (#990)
* look for symbols in dynsym table

* checking both symtab and dynsym

* Avoid symbol duplication in non stripped binaries

* clang-format

* Minor elf_utils.cpp updates

- use 'else if' instead of 'if'
- logging tweaks

* Update registration

- tweak logging

* Update testing

- strip the rocprofiler-sdk-c-tool library
- add test-c-tool-rocp-tool-lib-execute test which does NOT LD_PRELOAD the library (uses only ROCP_TOOL_LIBRARIES instead)

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-07-26 10:21:04 +05:30
Jonathan R. Madsen 90accda152 Common string entry (#971)
* Common string entry

* Add lib/common/string_entry.cpp + return const string*
2024-07-16 23:07:49 -07:00
Benjamin Welton 34897d318f Convert counter def format to YAML (#976)
* Convert counter def format to YAML

Converts counter definition format to YAML with the
following structure:

```yaml
COUNTER_NAME:
 architectures:
  gfxXX: // Can be more than one, / deliminated if they share idential data
    block: <Optional>
    event: <Optional>
    expression: <optional>
    description: <Optional> // In case per arch notes are needed
  gfxYY:
    ...
 description: General counter desctiption
```

All counters (derived and hardware) are now defined
in the same file for ease of future additions/subtractions.

Removes existing XML parser. Keeps the existing XML
definitions for now (since other tools still rely on
its presence).
2024-07-12 16:20:33 -07:00
Jonathan R. Madsen 2be3543c7b Parse ELF format for rocprofiler_configure symbol (#970)
* Parse ELF format to search for rocprofiler_configure

* Use ELF parsing in registration
2024-07-11 20:22:26 -05:00
Jonathan R. Madsen 60b1dbfb6f Update HIP API tracing (#958)
- support HipDispatchTable additions for HIP_RUNTIME_API_TABLE_STEP_VERSION 1 thru 4
2024-07-08 17:12:53 -05:00
Jonathan R. Madsen 1e49b43738 Miscellaneous updates (#959)
- missing-new-line CI job: ensures all source files end with new line
- logging updates
- add new line to the end of many files
- fix header include ordering is misc places
- transition to use hsa::get_core_table() and hsa::get_amd_ext_table() in various places instead of making copies
2024-07-08 16:50:32 -05:00
Benjamin Welton 45e165b256 Add calls to stop and wait for a debugger (#916)
Co-authored-by: Benjamin Welton <ben@amd.com>
2024-06-07 13:04:52 -07:00
Jonathan R. Madsen a76f61a0a3 Migrate to rocprofiler-sdk:: namespace in CMake everywhere (#892)
- remove all usage/support for rocprofiler:: namespace
2024-05-29 22:28:43 -05:00
Jonathan R. Madsen 92b7326910 Adding JSON support (#860)
* Adding json support

minor bugs

Fixing tests

Fixing formatting issues

Fixing test

test fix

Misc testing fixes

Use rocprofiler/cxx/name_info in rocprofiler-sdk-tool

fixes to reduce the Json file size

Update source/lib/rocprofiler-sdk-tool/generateJSON.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/generateJSON.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/tool.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/tool.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/helper.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/helper.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/generateJSON.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/generateJSON.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/generateJSON.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/tool.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/tool.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/helper.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/tool.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/generateJSON.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Update source/lib/rocprofiler-sdk-tool/generateJSON.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

misc fixes

Removing int cast for JSON tests

formatting

removing a condition test on Navi3

adding debug info

Misc fix

* CSV updates

- fix stats
- numerical formatter support for customizing write_csv_entry
- misc formatting
- get_marker_stats_file

* Misc tests/rocprofv3/counter-collection/input2 fixes

- rocprofiler_configure_pytest_files in rocprofv3/counter-collection/input2
- removed state code from merge in rocprofv3/counter-collection/input2

* Tool: "Agent-id" -> "Agent_Id"

- consistency

* Tool update

- remove rocprofiler_tool_marker_record_t
- add marker_tracing_kind_conversion
- fix memory leak in write_json
- minor update to get_output_stream
- rework handling of marker records

* Update tests/pytest-packages/pytest_utils/__init__.py

- add collapse_dict_list function for converting a dictionary value that is a list of length one into a directly mapped value

* Update tests/rocprofv3/**/conftest.py

- use collapse_dict_list when reading in JSONs

* Update tests/rocprofv3/counter-collection/input1/validate.py

- relax testing requirements gfx1102 (AQLProfile bugs)
  - in addition to relaxed testing requirements for gfx1101

* Update tests/rocprofv3/tracing/validate.py

- fix removal of PID in every marker record

* Update tests/rocprofv3/tracing-plus-cc

- remove test design that relies on iterating subdirectories

* Wrapper around __libc_start_main

- Ensures finalization happens before main returns
- Update tests/rocprofv3/tracing/validate.py
  - wrapper around __libc_start_main changed roctx calls

* Combine include/rocprofiler-sdk/cxx/serialization.hpp and include/rocprofiler-sdk/external/serialization.hpp

- tests/common/serialization.hpp simply includes include/rocprofiler-sdk/cxx/serialization.hpp now

* Update lib/rocprofiler-sdk/hip/hip.cpp

- tracing function immediately returns when fini_status is non-zero

* Update lib/rocprofiler-sdk/hsa/hsa.cpp

- remove logging of tracing function when fini_status is non-zero

* Update lib/rocprofiler-sdk-tool/CMakeLists.txt

- remove rocprofv3_trigger_list_metrics.cpp from TOOL_SOURCES

* Update tests/rocprofv3/tracing-plus-cc/CMakeLists.txt

- fix depends

* Domain statistics

* Update tests/rocprofv3/tracing-plus-cc/CMakeLists.txt

- do not set ROCP_LOG_LEVEL in env

* Remove erroneous <bits/utility.h> include

* Restructure tool source + reduce tool table + support multiple formats

- buffered_output struct for handling output
- support multiple output formats, e.g. --output-format csv,json
- rename buffer_type_t -> domain_type
- simplified generation of CSV output files
- removed rocprofiler_tool_marker_record_t

* Update lib/common/container/ring_buffer.hpp

- value_type alias in ring_buffer<Tp>

* Remove all but one json-execute tests

- generate CSV and JSON in same run

* Fix include for domain_type.cpp

* Update tests/rocprofv3/tracing-plus-cc/input.txt

- only specify counters which can be found on gfx8, gfx9, gfx10, gfx11, etc.
- use :device= syntax

* Update lib/rocprofiler-sdk-tool/config.cpp

- support :device=N syntax for counters file
- improve stripping comments in PMC files
- only read after pmc:

* Rework tool library counter collection

- fatal error if all requested counters for device are not found
- support :device= syntax

* Update tests/rocprofv3/tracing-plus-cc/input.txt

- removed L2CacheHit (not supported on mi300)

* Disable JSON tests in tests/rocprofv3

* Update include/rocprofiler-sdk/cxx/serialization.hpp

- support rocprofiler_record_dimension_info_t

* Update tool JSON schema

- remove domain_type::CODE_OBJECT
- rocprofiler_tool_agent_v0_t
  - rocprofiler_agent_v0_t + counters
- rocprofiler_tool_counter_info_t
- get_code_object_data()

* Update JSON schema for tool

* Update lib/rocprofiler-sdk-tool/tool.cpp

- fix ROCP_WARNING_IF

* rocprofv3 -> rocprofv3.sh

- install rocprofv3.sh into sbin
- configure_file <source-tree>/rocprofv3.sh -> <binary-tree>/bin/rocprofv3

* Update tool counter collection

- rocprofiler_tool_record_counter_t
- rocprofiler_tool_counter_collection_record_t

* Update tests/rocprofv3/counter-collection/input1/CMakeLists.txt

- use rocprofiler_configure_pytest_files for validate.py, conftest.py, and input.txt

* Update tests/rocprofv3/counter-collection/input1/validate.py

- re-enable test_validate_counter_collection_pmc1_json

* Update tests/rocprofv3/counter-collection/input2/validate.py

- remove unused code

* Update tests/rocprofv3/counter-collection/input2/validate.py

- remove unused code

* Update tests/rocprofv3/hsa-queue-dependency/validate.py

- re-enable JSON tests

* Misc tests/rocprofv3 CMake updates

* Update tests/rocprofv3/tracing/validate.py

- re-enable JSON tests

* Update tests/rocprofv3/tracing-hip-in-libraries/validate.py

- re-enable JSON tests

* Update tests/rocprofv3/tracing/validate.py

- remove unused node_exists function

* Update tests/rocprofv3/tracing/validate.py

- fix test_marker_api_trace_json

---------

Co-authored-by: Sriraksha Nagaraj <Sriraksha.Nagaraj@amd.com>
2024-05-22 00:53:42 -05:00
Jonathan R. Madsen 7eeb7376b4 Fix ROCP_CI_LOG when ROCPROFILER_CI not defined (#864) 2024-05-20 16:04:28 -05:00
Jonathan R. Madsen 4d5b71b0e7 Update logging (#838)
* Update logging

* Remove unused function

* Fix lib/rocprofiler-sdk/hsa/pc_sampling.cpp logging compilation

* Fix logging FLAGS_vmodule string leak and numerical log level

* Update logging

* Update glog submodule

* Leak fixes

* format
2024-05-20 15:38:18 -05:00
Jonathan R. Madsen 1f96593b4f Test using HIP Graphs (#835)
* Test using hip graphs

* Remove assert for api_end < async_end

* Update rocprofv3/tracing-hip-in-libraries::test_api_trace

* Update rocprofv3/tracing-hip-in-libraries::test_api_trace

* Increase rocprofv3-test-trace-hip-in-libraries-validate timeout

* Update rocprofv3/tracing-hip-in-libraries::test_api_trace

* Remove submit retry

* Update rocprofv3/tracing-hip-in-libraries::test_api_trace

* Increase rocprofv3-test-trace-hip-in-libraries-validate timeout

* Update lib/common/container/record_header_buffer.hpp

- minor tweaks

* Update lib/rocprofiler-sdk/buffer.hpp

- tweak ROCPROFILER_BUFFER_POLICY_LOSSLESS flush behavior

* Increase rocprofv3-test-trace-hip-in-libraries-validate timeout

* Update rocprofv3/tracing-hip-in-libraries::test_api_trace

* Revert rocprofv3-test-trace-hip-in-libraries-validate timeout

* Update run-ci.py

- RETRY_COUNT set to zero
2024-05-07 15:10:22 -05:00
Jonathan R. Madsen 12c836f95f Async memory copy callback tracing + memory copy size (#791)
* Async memory copy tracing update

- rocprofiler_buffer_tracing_memory_copy_record_t: thread_id and bytes
- support ROCPROFILER_CALLBACK_TRACING_MEMORY_COPY
- init_public_api_struct can fully construct

* Testing for callback async copy tracing
2024-04-18 04:31:59 -05:00
Jonathan R. Madsen d6bb50cae1 Minor fixes + correlation id files + compute_runtime_sizeof (#757)
* Update lib/rocprofiler-sdk/context/*

- create correlation_id.{hpp,cpp} and moved implementation into these files instead of in context.{hpp,cpp}

* Update lib/rocprofiler-sdk/thread_trace/att_core.hpp

- fixed header includes

* Update lib/common/utility.hpp (runtime sizeof)

- added compute_runtime_sizeof<T>() function to set the "size" field to be the offset of the "reserved_padding" field if one exists

* Fix to compute_runtime_sizeof
2024-04-12 12:34:00 -05:00
Jonathan R. Madsen 3c005b81b1 Add support for hsa_amd_queue_get_info (#752)
* Add support for hsa_amd_queue_get_info

- HSA_AMD_EXT_API_TABLE_STEP_VERSION == 0x02

* Suppress unused-but-set-parameter warnings
2024-04-11 18:29:22 -05:00
SrirakshaNag bef14ad1b2 rocprofiler-sdk-tool library intermediate binary output (#734)
* Support for binary temporary files

* clang formatting

* formating ring buffer.hpp

* Update source/lib/common/container/ring_buffer.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fixing bugs

* fix loop range

* Fix for v3 test failures

* bug fix

* fix bug

* fix memory leaks

* destructing agent_info

* Update CMakeLists.txt

* clang-tidy fixes

* Fix data race on destructor of rocprofiler_agent_t map in rocprofiler-sdk-tool library

* Create lib/rocproifler-sdk-tool/tmp_file.*

- move tmp_file class into separate header/implementation

* Agent Info CSV in rocprofiler-sdk-tool

- update tests to use agent_info.csv instead of rocminfo

* Update lib/rocprofiler-sdk-tool/tool.cpp

- use logical_node_id instead of node_id

* Adding stats file

* Adding tests for stats

* Update scratch memory support

- convert scratch memory support to use binary output

* Tool Update: scratch memory stats + extended statistics

- replace generate_*_csv with generate_csv overloads
- added generate_csv for scratch memory
- enable stats for scratch memory
- replace ROCPROF_*_STATS env variables with ROCPROF_STATS env variable

* rocprofv3 update

- simple --stats option
- add scratch memory trace to --sys-trace

* Update tests/rocprofv3/tracing-hip-in-libraries

- extend validate.py to test stats data
- fix conftest.py for memory_copy_stats_data

* Code coverage fixes

- invoke __gcov_dump to ensure that code coverage is flushed after finalization

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-04-09 05:25:28 -05:00
Benjamin Welton 41c0ddd72d Convert LOG() -> ROCP_X logging macros. (#695)
* Convert LOG() -> ROCP_X logging macros.

This patch converts the LOG() macro to the ROCP_X logging macros.
There are the following levels of logs.

Logs whos expressions are not evaluated unless the log level is enabled:

ROCP_TRACE - VLOG(2) (enabeled by env variable GLOG_v=2)
ROCP_INFO - VLOG(1) (enabeled by env variable GLOG_v=1)

Logs whos expressions are always evaluated:

ROCP_WARNING - LOG(WARNING)
ROCP_ERROR - LOG(ERROR)
ROCP_FATAL - LOG(FATAL)
ROCP_DFATAL - DLOG(FATAL) (only fatal in debug mode)

* source formatting (clang-format v11) (#696)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* Minor fix

* Fixes for VLOG before main

* fix vmodule

* source formatting (clang-format v11) (#718)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* memory leak fix

* Vlog change

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>
2024-04-02 17:15:30 -07:00
Jonathan R. Madsen 939e23e9d1 Stop all client contexts prior to finalization (#721)
* Stop all client contexts prior to finalization

* Update lib/common/container/static_vector.hpp

- improve emplace_back for non-{move,copy}-assignable object

* Update samples/intercept_table/client.cpp

- improve robustness against static object destruction

* Update lib/rocprofiler-sdk/context/context.cpp

- change storage of registered context array
  - stable_vector of optional contexts
  - common::static_object wrapper around stable_vector

* Update samples/intercept_table/client.cpp

- use variable template for underlying function pointer
2024-04-02 03:05:11 -05:00
Ammar ELWazir 2905fb5e95 Update run-ci.py (#641)
* Temp: Fixing node id

* source formatting (clang-format v11) (#709)

Co-authored-by: ammarwa <3832908+ammarwa@users.noreply.github.com>

* Using logical node id

* Update agent.cpp

* Update agent.cpp

* Python formatting

* Update run-ci.py

* Update run-ci.py

* Update continuous_integration.yml

* Update continuous_integration.yml

running directly using the prepared runner container

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update run-ci.py

* Clean up

* Fixing install paths

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Update continuous_integration.yml

* Fixing GPU Agents Test Validation

* python formatting (black) (#712)

Co-authored-by: ammarwa <3832908+ammarwa@users.noreply.github.com>

* Fixing the issue with rocclr detected kernels __amd_rocclr_.*

* python formatting (black) (#713)

Co-authored-by: ammarwa <3832908+ammarwa@users.noreply.github.com>

* Fixing the issue with rocclr detected kernels __amd_rocclr_.*

* Fixing static number of async copies and using hsa_api instead for validation

* python formatting (black) (#714)

Co-authored-by: ammarwa <3832908+ammarwa@users.noreply.github.com>

* Increasing the time limit for waiting on active signals

* Update continuous_integration.yml

* Update async_copy.cpp

* Update CMakeLists.txt

* changing node id to logical node id in rocprofv3

* Update tool.cpp

* testing async mem copy signal decrement

* Update logging.cpp

* Update validate.py

---------

Co-authored-by: Ammar ELWazir <aelwazir@rocprofiler1.amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ammarwa <3832908+ammarwa@users.noreply.github.com>
Co-authored-by: Ammar ELWazir <aelwazir@rocprofiler2.amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-04-02 01:39:24 -05:00
Jonathan R. Madsen 2f9b1767e9 Handle hsa_queue_destroy after finalization (#679)
* Handle hsa_queue_destroy after finalization

- fixes issue where hsa_queue_destroy(...) is invoked after rocprofiler-sdk has finalized
- hsa::get_queue_controller() returns pointer
- if queue controller is a null pointer, skip invoking QueueController::destroy_queue

* Update HIP/HSA/marker update_table logging

* Update rocprofv3 tests

- remove HSA_TOOLS_LIB env variable
- remove setting ROCPROFILER_LOG_LEVEL env variable
- add timeouts to tests which are missing them

* Disable thread sanitizer deadlock detection

* Update CI workflow

- rename vega20-ubuntu job to core-ci
- enable navi32 in core-ci and sanitizers

* Update run-ci.py

- set gcovr html medium and high threshold

* Update lib/rocprofiler-sdk/hsa/queue_controller.cpp

- remove this capture from enable/disable serialization

* Update lib/rocprofiler-sdk/hsa/{hsa_barrier,profile_serializer}.*

- hsa_barrier::set_barrier accepts const-ref to queue map
- profile_serializer::enable and profile_serializer::disable accept const-ref to queue map

* Logging for HIP/HSA/marker/profile_serializer

* Logging for HIP/HSA/marker/queue_controller

* Improve test_retired_correlation_ids asserts

* Fix tests/counter-collection/validate.py

- scale expected SQ_WAVES counter value based on warp size of GPU

* Tweak github comment for code coverage

* Remove gcovr html high/medium threshold args

* Fix tests/counter-collection/validate.py

- round before casting to int in test_counter_values

* operator bool for profile_serializer

- only wait on CV if profile_serializer is used

* Logging updates (profile_serializer + code_object)

* Update counter-collection validate.py

* QueueController does not wait on CV if finalizing/finalized

* Update CI workflow

- remove navi32 from core job

* Improve HIP/HSA/marker tracing get_functor/functor

- remove lambda wrapper around functor

* Update lib/rocprofiler-sdk/hsa/queue_controller.cpp

- do not acquire cvmutex lock during finalization

* Update lib/rocprofiler-sdk/hsa/hsa_barrier.*

- move ctor and dtor to implementation
- skip signal store screlease and destroy if already finalized

* Update CI workflow

- remove navi32 runners

* bwelton fixes for hangs

* CMake improvements + simplified demangle

- remove amd-comgr from common target (and thus removed from roctx DT_NEEDED)

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
2024-03-21 17:52:15 -05:00
Jonathan R. Madsen 7ab1a8015f Fix tracing context domain logic for operations (#621)
* Fix tracing context domain logic for operations

- logic error: domain enabled (all operations all implicitly enabled) + domain enabled for subset of operations resulted in only explicitly enabled operations being treated as enabled
- domain_context: split single bitset for operations in all domains into array of bitsets for each domain

* Update lib/common/mpl.hpp

- assert_false for static_asserts in if constexpr expressions

* Update lib/rocprofiler-sdk/tests/contexts.cpp

- Tests for validating logic regarding domain and operations for callback and buffer tracing
2024-03-14 01:25:43 -05:00
Jonathan R. Madsen 8591ed1c96 Use small_vector for API iterate_args (#597)
* Use small_vector for API iterate_args

- replace dim3 value arguments with rocprofiler_dim3_t
  - dim3 has a non-trivial destructor
- common::mpl::unqualified_type
- common::stringified_argument_array_t<N> alias
- assert_public_data_type_properties()
- common::container::small_vector<T>::at function
- stringize returns small_vector<stringified_argument>
  - stack allocated vector
- remove has_pc_sampling condition (HSA, HIP)
  - this will be handled in queue interception

* Misc tweaks
2024-03-13 07:36:55 -05:00
Benjamin Welton 1de44447f4 Deadlock Fix for HSA and Serialization Disable/Enabling support (#582)
* Initial barrier

* Working on profiler serializer extraction

* Current progress

* Serializtion Support

* source formatting (clang-format v11) (#583)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* cmake formatting (cmake-format) (#584)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* Minor fix

* Current Progress

* Current progress

* More fixes

* Serialization Fixes

* Bug fix

* source formatting (clang-format v11) (#600)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* More fixes

* More minor fixes

* source formatting (clang-format v11) (#603)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* source formatting (clang-format v11) (#604)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* Lock order inversion false positive

* order fix

* More changes

* source formatting (clang-format v11) (#607)

Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>

* minor test fix

* Minor test changes

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <1683479+bwelton@users.noreply.github.com>
2024-03-08 09:02:43 -06:00
Jonathan R. Madsen 7b6d3c70bd Shared Library Constructor (rocprofv3 deadlock fix) (#599)
* Moved tests/apps to tests/bin

* Renamed cmake project in tests/bin

* Update samples

- Use ROCPROFILER_DEFAULT_FAIL_REGEX
- tweaks to stdout messages

* Update tests

- Use ROCPROFILER_DEFAULT_FAIL_REGEX

* Add tests/lib

- libraries with HIP code

* Update PTL submodule

- remove atexit delete of thread_id_map

* Update cmake/rocprofiler_options.cmake

- Set ROCPROFILER_DEFAULT_FAIL_REGEX

* Update common lib: env + logging

- improved customization of logging settings
- default to disabling logging to files
- install failure handler for rocprofv3
- set_env support in environment.*

* Add lib/rocprofiler-sdk/shared_library.cpp

- shared library constructor

* Update lib/rocprofiler-sdk-tool/tool.cpp

- destructor thread safety
- convert callback_name_info and buffered_name_info to pointers
- install failure handler for logging

* Add tests/bin/hip-in-libraries

- hip-in-libraries is an exe which uses two shared libraries where each shared library contains HIP kernels
  - used for testing deadlocking within __hipRegisterFatBinary

* Update bin/rocprofv3

- reorganized the env variables
- use exec to launch command
- set ROCPROFILER_LIBRARY_CTOR=1

* Add tests/rocprofv3/tracing-hip-in-libraries

- uses hip-in-libraries exe for exe which uses shared libraries to launch HIP kernels

* Update bin/rocprofv3

- fix counter collection (no exec)

* Update lib/rocprofiler-sdk-tool/tool.cpp

- replace "Kernel-Name" with "Kernel_Name"

* Update lib/rocprofiler-sdk/registration.cpp

Use RTLD_LOCAL instead of RTLD_GLOBAL for env libraries

* Update tests/rocprofv3

- replace "Kernel-Name" with "Kernel_Name"

* Update tests

- vector-ops (bin) stream syncs + runs with 4 queues per device
- improve counter-collection/input1 validation
- rocprofv3/tracing-hip-in-libraries does not do sys-trace
- improved validation script for tracing-hip-in-libraries
- updated dispatch_callback in json-tool.cpp following reworking of prototypes for counter collection

* Update samples/counter_collection

- updated dispatch_callback(s) and record_callback(s) following reworking of prototypes

* Update bin/rocprofv3

- reorganized help menu
- added options for sub-HSA tables
- added --hip-runtime-trace
- changed --hip-trace to include --hip-compiler-trace

* Update lib/rocprofiler-sdk-tool

- improved kernel filtering
- removed arch_vgpr, accum_vgpr, sgpr code (in rocprofiler-sdk)
- fixed issue with counter-collection w/o tracing
- added support for fine grained HSA API tracing
- removed directly linking to HSA-runtime

* Update lib/rocprofiler-sdk/agent.cpp

- rocp_agents != hsa_agents is non-fatal when ROCPROFILER_BUILD_CI=OFF (CMake option)

* GPR (vector and scalar) info in kernel symbol data

- rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t contains general purpose register info

* Header include order fix

- Include repo headers first
- Third party library headers next
- standard library headers last

* Update dispatch profiling public API

- introduce rocprofiler_profile_counting_dispatch_data_t
- change signature of rocprofiler_profile_counting_dispatch_callback_t and rocprofiler_profile_counting_record_callback_t
- provide rocprofiler_user_data_t pointer in dispatch callback
- provide rocprofiler_user_data_t value (from dispatch cb) in record callback

* Update tests/bin/CMakeLists.txt

- fix add_subdirectory(hip-in-libraries) order

* Update VERSION

- bump to 0.2.0 in prep for AFAR
2024-03-07 22:21:26 -06:00
Jonathan R. Madsen 1bb94add11 Fix rocprofiler_iterate_callback_tracing_kind_operation_args for HIP compiler callbacks (#532)
* Fix HIP compiler iterate args

- `include/rocprofiler-sdk/hip/api_args.h`
  - replace struct fields named "f" with "func"
  - replace hip stream fields named "hStream" with "stream"
- `lib/rocprofiler-sdk/callback_tracing.cpp`
  - iterate_args for HIP compiler table
- `lib/rocprofiler-sdk/registration.cpp`
  - fix warning about roctx num_tables
- `lib/rocprofiler-sdk/hip/hip.def.cpp`
  - replace struct fields named "f" with "func"
  - replace hip stream fields named "hStream" with "stream"
- `lib/rocprofiler-sdk/{hip,hsa,marker}/utils.hpp`
  - improve `stringize_impl`
- `lib/rocprofiler-sdk/hsa/code_object.cpp`
  - remove stale commented out code
- `lib/rocprofiler-sdk/hsa/queue_controller.*`
  - destory_queue -> destroy_queue
- `tests/tools/json-tool.cpp`
  - improve parallelism in tool_tracing_callback
  - serialize the marker api args
  - only invoke rocprofiler_iterate_callback_tracing_kind_operation_args in exit phase
- `samples/counter_collection/CMakeLists.txt`
  - reduce timeout on tests to 120 seconds

* Update lib/rocprofiler-sdk/hsa/utils.hpp

- disable dereference of double pointer in stringize_impl

* Update lib/common

- indirection_level in mpl.hpp
- stringize_arg.hpp

* Rework rocprofiler_iterate_callback_tracing_kind_operation_args

- provide more information in rocprofiler_callback_tracing_operation_args_cb_t
- support specifying the dereference level to account for output paramters
2024-03-01 01:46:07 -06:00