Commit graph

134 commits

Author SHA1 Message Date
Madsen, Jonathan bd447ab941 Misc AFAR VII updates + clang-tidy-19 + bump version to 0.6.0 (#54)
* Misc AFAR VII updates + clang-tidy-19 + bump version to 0.6.0

- move tests/rocprofv3/trace-period to tests/rocprofv3/collection-period
- bump clang-tidy to v19
- fix misc clang-tidy errors

* Update the collection period test

- don't attach files on failure because it causes problems when the test is disabled

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-12-06 12:35:29 -06:00
Jakaraddi, Manjunath 78d8f4b8ea SWDEV-492623: Hip Host Function to Device Symbols Mapping (#18)
* Adding changes to register and read symbols from the hip fat binary

* adding json output for host_functions

* added error handling

* adding json tool support

* Adding tests

* formatting changes

* Adding documentation

* refactoring as per amd-staging

* Adding initializers and changing macros

* Fix page-migration background thread on fork (#31)

* Fix page-migration background thread on fork

After falling off main in the forked child, all the children
try to join the parent's monitoring thread. This results
in a deadlock: the parent is waiting for the child to exit, but
the child is trying to join the parent's thread, which is
signaled from the parent's static destructors.

Even with just one parent and child, due to copy-on-write
semantics, a child signalling the background thread to join
will still block (thread's updated state is not visible
in the child).

This fix creates a background thread for each child on fork with a
pthread_atfork handler, ensuring that each child has its own
monitoring thread.
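The per-child fix described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the SDK's implementation: the `monitor` namespace and its `start`, `stop`, `on_fork_child`, and `child_shutdown_ok` functions are hypothetical, and a 1 ms polling loop stands in for the real page-migration monitoring work. The point is the `pthread_atfork` child handler, which gives each child its own thread instead of letting it join the parent's.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <pthread.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>

// Hypothetical background monitor standing in for the page-migration thread.
namespace monitor
{
std::atomic<bool> running{false};
std::thread*      worker = nullptr;

void loop()
{
    while(running.load())
        std::this_thread::sleep_for(std::chrono::milliseconds{1});
}

void start()
{
    running.store(true);
    worker = new std::thread{loop};
}

void stop()
{
    running.store(false);
    if(worker != nullptr)
    {
        worker->join();  // only ever joins a thread owned by this process
        delete worker;
        worker = nullptr;
    }
}

// Runs in the child right after fork(). The inherited std::thread refers to a
// thread that exists only in the parent, so joining it can deadlock.
void on_fork_child()
{
    if(worker != nullptr)
    {
        worker = nullptr;  // stale handle intentionally leaked (see note below)
        start();           // child-owned monitoring thread
    }
}

void install_fork_handler() { pthread_atfork(nullptr, nullptr, on_fork_child); }

// Demonstration: a forked child can shut down its own monitor and exit.
bool child_shutdown_ok()
{
    pid_t pid = fork();
    if(pid < 0) return false;
    if(pid == 0)
    {
        stop();
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
}  // namespace monitor
```

Leaking the stale `std::thread` in the child is deliberate: destroying or reassigning a joinable `std::thread` calls `std::terminate`, and joining it would wait on a thread that does not exist in the child.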

* Formatting fixes

* Detach page-migration background thread and update test timeout

* Attach files with ctest

* Update corr-id assert

* Tweak on-fork, simplify background thread

* Revert thread detach

* Adding --collection-period feature in rocprofv3 to match v1/v2 parity (#9)

* Adding Trace Period feature to rocprofv3

* Adding feature documentation

* Update source/bin/rocprofv3.py

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fixing format

* Moving to Collection Period and changing the input params

* Format Fixes

* Fixing rebasing issues

* Removing atomic include from the tool

* Adding more options for units, optimizing the code

* Fixing rocprofv3.py

* Fixing time conv & adding time controlled app

* Fixing format

* Changing to shared memory testing methodology

* use shmem

* Fix include headers for transpose-time-controlled.cpp

* Format upload-image-to-github.py

* Removing shmem and using only env var to dump timestamps from the tool

* Tool Fixes + Test Config

* Adding Tests

* Fixing Review comments

* Update trace period implementation

* Update trace period tests

* check between start and stop timestamps

* Merge Fix

* Update validate.py

* Improve safety of rocprofiler_stop_context after finalization

* Pass context id to collection_period_cntrl by value

* Adding 20 us error margin

* Ensure log level for collection-period test is not more than warning

---------

Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

* Update lib/rocprofiler-sdk/code_object/hip/code_object.*

- move error code check macros to implementation
- fix macros which check error code
- use constexpr values instead of #define

* Update lib/rocprofiler-sdk/code_object/hip/code_object.*

- debugging for error that cannot be locally reproduced

* Update lib/rocprofiler-sdk/code_object/hip/code_object.*

- improve error handling and logging

* Update lib/rocprofiler-sdk/code_object/hip/code_object.*

- tweak to non-fatal logging messages

* Update lib/rocprofiler-sdk/code_object/hip/code_object.*

- cleanup of logging messages

* Update host kernel symbol register data fields

* Update source/lib/rocprofiler-sdk/code_object/hip/code_object.hpp

---------

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Kuricheti, Mythreya <Mythreya.Kuricheti@amd.com>
Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-12-06 11:42:37 +00:00
Trowbridge, Ian 79006bb896 SWDEV-492625 memory free functions (#11)
* SWDEV-492625: Track free memory HSA functions to help determine total amount of memory allocated on the system at any one time

* Minor fixes to address comments

* Update allocation size description

* Moved get function back to specialization, minor typo fixes

* Removed memory_operation_type field, removed memory_pool allocation enum, converted starting address to hex string for json format.

* Made conversion to hex_string a function, changed address to use union rocprofiler_address_t type, changed VMEM descriptors

* Removed as_hex from the global namespace
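The hex-string conversion mentioned in this commit (a dedicated function, kept out of the global namespace) might look like the sketch below. The `tool_output` namespace and the exact `as_hex` signature and output format are assumptions for illustration, not the tool's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical namespace: keeping the helper out of the global namespace
// avoids clashes with other as_hex overloads elsewhere in the tool.
namespace tool_output
{
// Render an address as a "0x"-prefixed lowercase hex string for JSON output.
// A string also sidesteps precision loss in JSON parsers that store numbers
// as doubles, which cannot represent every 64-bit value exactly.
std::string as_hex(uint64_t address)
{
    std::ostringstream oss;
    oss << "0x" << std::hex << address;
    return oss.str();
}
}  // namespace tool_output
```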

* Formatting

* Removed TRACK_EVENT for memory allocation, now TRACK_COUNTER for memory allocation is being performed

* Check if address was recorded before retrieving allocation size in generate Perfetto

* Formatting

* Update source/lib/output/generatePerfetto.cpp

* Explicitly disable app-abort tests

* Remove excluding app-abort test from workflow CI

- redundant because these tests are now explicitly marked as disabled

---------

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-12-06 00:05:30 -06:00
Madsen, Jonathan 00c46fd5e5 SDK: OMPT Support (#22)
* Ability to select alternative compiler per file

Implementation of ompt interface to rocprofiler SDK. task_create and task_schedule are not supported.

Misc updates

Update OpenMP target sample

- samples/ompt -> samples/openmp_target
- fix sample test of openmp-target
- reorganize files

Rework OpenMP implementation

Minor OpenMP implementation cleanup

Rename samples/openmp_target CMake targets

Add tests/bin/openmp

- OpenMP target test app in tests/bin/openmp/target

Format samples/openmp_target CMakeLists.txt

Misc lib/rocprofiler-sdk/openmp cleanup

- fix includes
- convert_arg

Update openmp.def.cpp

- tweak includes
- remove lots of temporary variables

Update samples

- common::get_callback_id_names() -> common::get_callback_tracing_names()
- add kernel dispatch, memory copy, scratch memory buffered tracing to openmp target sample

Fix code object operation names

- add "CODE_OBJECT_" prefix

Update include/rocprofiler-sdk/openmp/api_id.h

- remove spurious comment

Miscellaneous openmp updates

- similar API for openmp_begin and openmp_end
- move implementations of ompt callbacks to openmp.cpp
- ompt_{thread_begin,thread_end,parallel_begin,parallel_end}_callbacks are openmp_events

[SWDEV-484495] Fix int truncation in CSV output (#1098)

CSV output truncates doubles to ints when it shouldn't. Derived metrics
are (mostly) doubles and lose precision (or become worthless) if treated
as an int. Converted these to double to match the format we return from
rocprof-sdk.
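A minimal C++ illustration of the truncation described above, assuming the value reaches the CSV through a string conversion. `format_as_int` and `format_as_double` are hypothetical helpers, not rocprofv3 functions.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Buggy pattern: casting a derived metric to an integer before writing it
// silently drops the fractional part (0.75 becomes "0").
std::string format_as_int(double value)
{
    return std::to_string(static_cast<int64_t>(value));
}

// Fixed pattern: keep the value as a double and print enough significant
// digits (max_digits10) for the text to round-trip to the same double.
std::string format_as_double(double value)
{
    std::ostringstream oss;
    oss << std::setprecision(std::numeric_limits<double>::max_digits10) << value;
    return oss.str();
}
```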

Co-authored-by: Benjamin Welton <ben@amd.com>

Update limit for max counter records in rocprof-tool (#1073)

A fixed-size std::array is used to store counter records in rocprofiler SDK. This limit was exceeded in SWDEV-484742. Raise the limit to 512 to make it less likely to be reached again.
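Because a `std::array` cannot grow, its size is a hard cap, and every append has to be checked against it. The sketch below shows that pattern with a capacity of 512; `counter_record` and `counter_record_buffer` are hypothetical types, and the SDK's real record layout and overflow handling may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record type; the real SDK record has more fields.
struct counter_record
{
    uint64_t id    = 0;
    double   value = 0.0;
};

// Fixed-capacity storage: appends past the capacity are refused and counted
// instead of writing past the end of the array.
template <std::size_t Capacity>
struct counter_record_buffer
{
    std::array<counter_record, Capacity> records{};
    std::size_t                          count   = 0;
    std::size_t                          dropped = 0;

    bool push(const counter_record& rec)
    {
        if(count >= Capacity)
        {
            ++dropped;  // limit reached: the record is not stored
            return false;
        }
        records[count++] = rec;
        return true;
    }
};
```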

adding proxy ompt_data_t * arguments

fixes for proxy pointers

- Implement proxy ompt_data_t* pointers for clients
- Add ompt_data_t* arguments back to callback API
- Modify openmp sample to illustrate use of proxy pointers

formatting

SWDEV-467350: Skipping tool counter iteration for unsupported hardware (#1083)

Fixing some accumulate metrics (#1089)

* Fixing some accumulate metrics

* Fixing some more accumulate metrics

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>

updating rocprofv3 help options (#1113)

* updating rocprofv3 help options

* updating CHANGELOG

Fixing installed package tests in CI (#1119)

* Fixing installed package tests in CI

* Formatted rocprofv3.py with black formatter

SWDEV-488948: PC Sampling - Correlation class to provide some thread safety. Adding multithread tests. (#1112)

* SWDEV-488948: PC Sampling - Correlation class to provide some thread safety. Adding multithread tests.

* Update source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Adding backlog for codeobj changes

* Formatting

* Update source/lib/rocprofiler-sdk/pc_sampling/code_object.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/code_object.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

---------

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

SWDEV-487621: Fixes for metric definitions (#1118)

* Fixes for metric definitions

* Removing gfx8

* Update changelog

* Fixing unit tests

* Small fixes

* Fix for write size

Fix PSDB change (#1120)

Reverts change to `source/include/rocprofiler-sdk/callback_tracing.h`
from commit 9b2ece76c3

clang-18 build fix for RCCL (#1123)

Removes ambiguity on const usage, which clang-18 complains about
(preventing build with warn error).

mem copy direction field update (#1124)

Adding Node-id for debugging with log level trace (#1090)

fix botched rebase

Per Jonathan, remove -rdynamic warning so CI will continue

pedantic formatting

Correct the package name of rocprofiler-sdk (#1126)

* Correct the package name of rocprofiler-sdk

The ROCm version (e.g. 60300) was missing from the package name.
Added it.

* Use cmake cache string while setting the variable for ROCm Version

* correct the cmake-format

---------

Co-authored-by: Ranjith Ramakrishnan <Ranjith.Ramakrishnan@amd.com>

Fixing kokkosp tool library packaging (#1121)

* Fixing kokkosp tool library packaging

* Update source/lib/rocprofiler-sdk-tool/kokkosp/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update CMakeLists.txt

* Update CMakeLists.txt

* Component Requirement in CPack

* Adding package dependency

* Update CMakeLists.txt

* Update rocprofiler_config_packaging.cmake

* Fix rocprofiler-sdk-tool-kokkosp BUILD/INSTALL RPATH

- CMAKE_INSTALL_LIBDIR doesn't help

* Add BUILD/INSTALL RPATH to rocprofv3-trigger-list-metrics

- fixes packaging issues

* Update packaging

- core depends on rocprofiler-sdk-roctx
- add CPACK_DEBIAN_PACKAGE_SHLIBDEPS_PRIVATE_DIRS to resolve inter-package dependencies

* Fix package depends version format

* Improve tests/rocprofv3/summary/validate logging

* Update CI workflow

- prioritize roctx package in Install Packages step

* Remove setting <package-name>_VERSION in config.cmake.in

- this is automatically handled by existence of <package-name>-config-version.cmake

* Update rocprofiler-sdk-config.cmake

- relax find_package versioning requirements to same major and minor version

* Update rocprofiler-sdk-config.cmake

- relax find_package versioning requirements (remove EXACT, specify range)

* Tweak CI workflow

* Update perfetto_reader.py

- better handle failure to load trace processor

* Misc cleanup for config packaging

* Update config packaging

* Update config packaging

* Revert perfetto for core-rpm packages

* Revert perfetto for core-rpm packages

- perfetto < 0.9.0

* Tweak tests/rocprofv3/summary/validate.py

- reorder some checks

---------

Co-authored-by: Ammar Elwazir <aelwazir@useocpm2m-387-013.amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

Clang Warning Fixes (#1131)

Builds prevented on clang-18

Adding start and end timestamp columns in csv (#1128)

* Adding start and end timestamp columns in csv

* Adding assert check for the counter timestamps

---------

Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>

rocprofv3: docs and help menu updates (#1129)

* doc updates

* Correcting ROCtx information

* Making ROCTx string consistent

* missing occurrence

Renamed agent profiling service to device counting service (#1132)

* Renamed agent profiling service to device counting service

The new name better represents what agent profiling did (device-wide
counter collection). Conversion of existing user code can be
performed by the following find/sed command:

find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +

* Converted dispatch profile to dispatch counting service

* Debug for functional counters test

* Minor changes for CI

* Minor fix

* More fixes for CI

* Update evaluate_ast.cpp

---------

Co-authored-by: Benjamin Welton <ben@amd.com>

Testing updated RPM dockers (#1136)

* Testing updated RPM dockers

* Trying to fix PSDB for test package dependency

Agent Profiling Fixes for Broken/Improper API Usage (#1122)

Prevents multiple setups of agent profiling on the same agent.

Fixes agent read context to only read agents that were set up.

Prevents copying of the agent profiling internal data struct and resets
hsa_signal on move to prevent inadvertent deletion.

Simplifying PR template (#1139)

delete unused files

added arguments to some OMPT buffer records

* Fix cmake issues

Remove rocprofiler_ompt_finalize_tool

- a public API function is not necessary: should just finalize rocprofiler-sdk

Fix duplicate ROCPROFILER_{BUFFER,CALLBACK}_TRACING_KIND_STRING

Add lib/rocprofiler-sdk/ompt.hpp

- declares rocprofiler::sdk::finalize_ompt

Remove change to tests/rocprofv3/summary/conftest.py

Add set_fini_status(1) back to registration.cpp

Deleted unneeded files

Incorporate OpenMP code and sample

Fix merge issues with amd-staging

Add push_correlation_id for OpenMP tasking; improve debuggability

fixup bad merge

* Suppress OpenMP data race

* Fix openmp_target sample

* Enum and struct name changes + source code reorg

- remove mix of ompt and openmp
  - opted for ompt
- changes made for consistency
  - ompt_api -> ompt
  - openmp_api -> ompt
  - OPENMP -> OMPT

* Update tests and more renaming

- dest_device_num -> dst_device_num
- src_addr -> src_address
- dest_addr -> dst_address
- remove info_type::begin
- require OMP_TARGET_OFFLOAD

* Update openmp-target test/sample env and labels

* Formatting

* Tweaks to cmake for openmp target

- Disable for thread sanitizers due to preloading issue

* OpenMP target cmake updates

- remove gfx1010 (fails on mi300)
- OPENMP_GPU_TARGETS

* Remove device_unload and target_map_emi support

- these are never supported by AMD OpenMP compilers

* Update CI workflow

- exclude openmp-target tests from navi3 and vega20

---------

Co-authored-by: Larry Meadows <Lawrence.Meadows@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-12-05 22:48:19 -06:00
Meserve, Mark fc2513888f SWDEV-445864: SWDEV-445865: Update page migration events (#16)
* Update kfd ioctl header

- Adds new event for dropped events
- Mirrors kernel update by Philip Yang

* Add error code for page migration events

- Adds support for new error code field for page migration end events
  - Page migration end event is now generated for migration failure
  - Error code is zero for successful migration

* Add dropped event SMI event

- New event type indicates if events were dropped
  - Events are dropped if the buffer is full
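Why a separate dropped-event notification helps can be sketched with a fixed-capacity buffer between a producer (such as the kernel driver) and a consumer: once the buffer is full, new events are lost, and the consumer only learns about them through a counter. `bounded_event_queue` is a hypothetical type for illustration and is not part of the KFD interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bounded event queue illustrating buffer-full event loss.
struct bounded_event_queue
{
    explicit bounded_event_queue(std::size_t capacity) : cap{capacity} {}

    void produce(uint64_t event)
    {
        if(events.size() >= cap)
            ++dropped;  // buffer full: the event itself is lost
        else
            events.push_back(event);
    }

    // Drain buffered events into 'out' and return how many were dropped since
    // the last drain; a reader would surface this as one "dropped" record.
    std::size_t consume(std::vector<uint64_t>& out)
    {
        out.insert(out.end(), events.begin(), events.end());
        events.clear();
        std::size_t n = dropped;
        dropped       = 0;
        return n;
    }

    std::size_t           cap;
    std::vector<uint64_t> events{};
    std::size_t           dropped = 0;
};
```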
2024-12-05 20:44:10 +00:00
Nagaraj, Sriraksha 50b185b9ac rocprofv3: PC Sampling Support (#14)
* Adding tool pc sampling support

Fixing merge issue

tool support on SDK updates

link amd-comgr

Sanitizer failure fix

fix format

Addressing review comments

misc fix

Adding dispatch id to the CSV output

Adding CHANGELOG

[ROCProfV3][PC Sampling] Initial ROCProfV3 PC sampling tests for JSON and CSV formats (#17)

ROCProfV3 initial tests for JSON and CSV output.

Simple kernels that make it easier to verify samples against instruction decoding
have been introduced.

removing option to enable pc sampling explicitly

Adding documentation

no pc-sampling option in tests anymore

Addressing review comments

Updating docs

an option for choosing whether all units must be sampled

try ignoring PC sampling tests (#36)

* run pc-sampling tests on MI2xx runners
* use v_fmac_f32 instead of s_nop 0 in tests

* fixing docs
2024-12-04 18:32:48 -06:00
Benjamin Welton 7ddc72ad45 Add rocprofiler_load_counter_definition (#1193)
Adds rocprofiler_load_counter_definition. This function allows a counter definition file to be supplied to rocprofiler-sdk directly. It takes a string containing the counter definition YAML, its size (in bytes), and a flag indicating whether this is an append operation.

---------

Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: usrihari123 <srihari.u@amd.com>
2024-11-22 01:55:47 -08:00
Gopesh Bhardwaj 7ea9ced493 SDK doc updates (#1183)
* correcting usage example

* rccl trace

* Adding Navi power state limitation

* Addressed feedback

* kernel-rename

* kokkos trace

* more information on kokkos tracing

* Correcting tool library hardcoding

* summary domains

* Updating domain stats file

* updating images

* rocprofv3 default behavior update

* Removing README from API documentation

* Added missing description in Topics

* Fixed wrong rendering of README in API document

* Fixing Topics in API docs

* Removing API doc for details/rccl.h

* Addressed review comments
2024-11-22 12:05:11 +05:30
Vladimir Indic bc52c17e64 Host trap PC sampling uses new record type (#1207)
* Host trap PC sampling uses new record type

* removing redundant field

* formatting

* simplifying templates in the parser - no need for HostTrap boolean

* reviving some parser tests

* hw_id decoding on GFX9

* HW id parser test

* parser CID test

* Parser multigpu test

* removing rocprofiler_pc_sampling_record_t and some fields from hw_id

* simplifying parser context

* keep bench test internally

* initializing gfx9_hw_id_t differently

* anonymous struct first

* avoiding inlining initialization of struct
2024-11-20 14:02:47 -06:00
Jonathan R. Madsen 249c50fc40 Runtime Initialization Tracing (#1105)
* Runtime initialization tracing

- callbacks and buffer entries notifying when a runtime has been initialized

* Minor cleanup to registration.cpp

* JSON tool implementation

* Increase perfetto_reader timeout

* Handle perfetto_reader timeout when attr doesn't exist

* clang-tidy fixes to memory_allocation.cpp
2024-11-18 20:50:29 -06:00
itrowbri 3bd7773cf7 Memory Allocation Tracking (#1142)
* Initial commit: Need to implement wrapper function to collect data and test that wrapper function is correctly replacing core HSA functions

* Attempted to implement wrapper implementation for hsa memory allocation functions. Need to modify generate record files and test if implementation is working as expected

* Debugging and implementing generateCSV function

* Memory allocation size and starting address outputted to csv and json file formats

* Formatting

* Initial setup for OTF2 and Perfetto generation

* Collecting agent id for memory_allocation and formatting

* Modified memory_allocation.cpp to set up code for AMD_EXT commands

* Support for memory_pool_allocate added

* Removed accidentally added file

* Made flag optional and added more OTF2 and Perfetto code. Needs testing to ensure perfetto and OTF2 works

* Formatting

* Fixed perfetto and otf2 output

* Fixed flag issue due to incorrect buffer use

* Updated documentation

* Small cleaning and comments

* Added test for HSA memory allocation tracing

* Fixed summary test validation errors due to allocation tracing. Added type to location_base to create unique event ids for allocation due to OTF2 trace error

* Decreased lower limit of hip calls for test

* Modified summary tests to vary number of allocate requests

* Minor fixes to address comments. Still need to address OTF2 comments

* Fix docs and changed OTF2 to use enum for type specified in location_base construction

* Fixed schema error

* Added vmem command tracking. Need to add test

* Updated test to work with vmem command and updated generateCSV to output int instead of hex string.

* OTF2 enum update and misspelling fix

* CI does not support Virtual Memory API. Removed vmem test. Will add back if CI is modified to support vmem API

* Update CMakeLists.txt for memory allocation test

* Updated summary test

* Minor fixes to address comments

* Moved domain_type.hpp enum to before LAST

* Fixed compile errors and formatting

* Fixed stats summary domain name error

* Added rocprofv3 test

* Page migration test fix

* Undo page migration test changes. Failures do not appear to have to do with memory allocation
2024-11-18 20:22:14 -06:00
Mythreya 363f85dc72 Report page migration events as start/end (#793)
* Squashed commit of the following:

commit b76f2635f4b65599f03812a73d0cf410f5ada213
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Fri Apr 26 00:29:09 2024 +0000

    Changed for PR feedback

commit bedb8ad566ff42fbf117b19202c26c507abcf8ac
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date:   Thu Apr 25 19:20:06 2024 -0500

    Fix installation

commit a98f8a69459a1450a1be9c98e20b3c1e7f2568c2
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date:   Thu Apr 25 19:16:35 2024 -0500

    Restructure the headers

commit 46489a020ffafdd5f4ce3f580469ff233ef67fe1
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Tue Apr 23 23:31:10 2024 +0000

    Update hsa include

commit 8e795282cce348fc6aa736b7857b21aeb32aa20a
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Tue Apr 23 23:02:32 2024 +0000

    Report page migration events as start/end

    * Updated tests accordingly
    * Page migration events are reported independently

commit 8784e5ad4895a626a2a8e4ac12f8021b34172bd4
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Tue Apr 16 17:01:57 2024 +0000

    Update handling of dropped page migration events

    Previously, we dropped all locally buffered events when we detect that
    KFD has dropped some events. This may drop too many pending events too eagerly.

    When we receive an end event and cannot find the corresponding start,
    we can be sure that KFD has dropped some events in the immediate past.

    When this happens, we look through all locally buffered events and report
    the start events that are older than 10s as partial events --- they have
    no "end" information (we expect that the end events have been dropped).

    We also set the polling timeout to 10s to prevent the local buffer from
    getting too large with events waiting to be paired up.

    Updated tests
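The pairing strategy described above can be sketched as follows, assuming each start/end event carries a key that identifies the migration and a nanosecond timestamp. `migrate_event`, `migrate_pair`, and `migrate_tracker` are hypothetical names; only the 10 s threshold comes from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical event shape: 'key' identifies the migration.
struct migrate_event
{
    uint64_t key          = 0;
    uint64_t timestamp_ns = 0;
};

struct migrate_pair
{
    migrate_event start{};
    migrate_event end{};
    bool          partial = false;  // true if the end event was never seen
};

constexpr uint64_t stale_after_ns = 10'000'000'000ULL;  // 10 s

struct migrate_tracker
{
    std::map<uint64_t, migrate_event> pending{};  // starts awaiting an end
    std::vector<migrate_pair>         reported{};

    void on_start(const migrate_event& ev) { pending[ev.key] = ev; }

    void on_end(const migrate_event& ev)
    {
        auto itr = pending.find(ev.key);
        if(itr != pending.end())
        {
            reported.push_back({itr->second, ev, false});
            pending.erase(itr);
            return;
        }
        // An end without a matching start means events were dropped recently.
        // Flush only starts older than the threshold as partial; younger ones
        // may still receive their end. (The unmatched end is not reported here.)
        for(auto it = pending.begin(); it != pending.end();)
        {
            const uint64_t t0 = it->second.timestamp_ns;
            if(ev.timestamp_ns > t0 && ev.timestamp_ns - t0 > stale_after_ns)
            {
                reported.push_back({it->second, migrate_event{}, true});
                it = pending.erase(it);
            }
            else
            {
                ++it;
            }
        }
    }
};
```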

commit 2e8e0b07eeda9b5990e1ae8d28dcd3a035ce38e1
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Tue Apr 16 17:01:31 2024 +0000

    Docs for triggers

* Fix page migration sample

* Fix hasher, kfd install

* Add hsa include
* Install KFD include dir

* Updates from code review

- single timestamp field
- node_id -> agent_id
- from_node -> from_agent
- to_node -> to_agent

* Misc revisions

* Remove page-migration install target

* Update page-migration pytest

* Tweak to serialization

* Address PR comments

* Update page-migration test

* Add cli args, update iterations

* Address PR comments

* Add abi.cpp for static_asserts
* Update page_migration gtest with only runtime tests
* Moved helpers into utils.hpp

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-11-11 11:08:47 -06:00
Jonathan R. Madsen 5eb8c2658c rocprofv3: refactor and reorganize rocprofiler-sdk-tool library (#1138)
* Add rocprofv3-multi-node.md to source/lib/rocprofiler-sdk-tool

* Initial source re-organization

- create "output" static library

* Update include/rocprofiler-sdk/cxx/serialization.hpp

- add GPR count fields to kernel symbol serialization

* Add source/scripts/generate-rocpd.py

- reads one or more JSON output files from rocprofv3 and writes rocpd SQLite3 database
- Note: preliminary implementation

* More reorganization b/t lib/rocprofiler-sdk-tool and lib/output

* Updates to generate-rocpd.py

- add SQL views
- option: --absolute-timestamps -> --normalize-timestamps
- option: --generic-markers
- misc fixes with regards to getting the views working
- support marker names

* Update generate-rocpd.py

- Add --marker-mode option

* Update generate-rocpd.py

- Improve debugging of bad bulk SQLite statements

* Update rocprofv3-multi-node.md

- cleanup of proposed SQL schema

* lib/output/format_path.{hpp,cpp}

- rename format to format_path (in config.hpp and config.cpp)
- move format_path functionality to format_path.{hpp,cpp}

* Rework lib/output/tmp_file_buffer.{hpp,cpp}

* Update output_key.cpp

- support %cwd%, %launch_date%
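Placeholder expansion of the `%cwd%` / `%launch_date%` kind can be sketched as a simple token replacement. `expand_placeholders` is a hypothetical helper, and the values in the usage check are made up; the real output_key.cpp may compute and substitute them differently.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: replace every occurrence of each %key% token in
// 'pattern' with its value.
std::string expand_placeholders(std::string                               pattern,
                                const std::map<std::string, std::string>& values)
{
    for(const auto& [key, value] : values)
    {
        const std::string token = "%" + key + "%";
        // Resume searching after the inserted value so a value that happens
        // to contain the token cannot cause an endless loop.
        for(auto pos = pattern.find(token); pos != std::string::npos;
            pos      = pattern.find(token, pos + value.size()))
        {
            pattern.replace(pos, token.size(), value);
        }
    }
    return pattern;
}
```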

* Rework lib/output/buffered_output.hpp

* Support csv_output_file constructed via domain_type

* Update lib/output/domain_type.{hpp,cpp}

- get_domain_trace_file_name
- get_domain_stats_file_name

* Update lib/rocprofiler-sdk-tool/tool.cpp

- tweak headers

* Update lib/output/generate*.cpp

- remove include of helpers.hpp
- CSV uses domain_type for filenames

* Update samples/counter_collection/per_dev_serialization.cpp

- make wait_on volatile

* Remove tool_table from lib/output and lib/rocprofiler-sdk-tool

- Also split various structs into their own files
  - lib/output/agent_info
  - lib/output/metadata
  - lib/output/kernel_symbol_info
  - lib/output/counter_info
- Implemented rocprofiler::tool::metadata

* Optimize rocprofiler_tool_counter_collection_record_t

- reduce the size of the struct from 24784 bytes to 8376 bytes

* Introduced output_config

- split subset of config (from tools library) into output_config to be able to configure the output generating functions separately from the tool library
- this is a significant step towards the output generating functions not relying on static global memory

* Stream chunks of data into output instead of loading everything into memory

* Remove duplicate group_segment_size in rocprofiler_kernel_dispatch_info_t serialization

* Adding Q&A to rocprofv3-multi-node.md

* Remove all remaining include lib/rocprofiler-sdk-tool from lib/output

- migrated a fair amount of code from lib/rocprofiler-sdk-tool/helper.hpp to lib/output

* Update Q&A of rocprofv3-multi-node.md

* Fix minor compilation errors + minor cleanup

* Update hsa/async_copy.cpp

- when ROCPROFILER_CI_STRICT_TIMESTAMPS > 0, reduce the active_signal sync wait time

* Update profiling_time.hpp

- fix log messages for when start/end time is less/greater than enqueue/current CPU time
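The two conditions named above can be sketched as a small check that returns the warnings it would log. `check_profiling_time` and the warning strings are hypothetical; the actual profiling_time.hpp logic and messages may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sanity check: a GPU operation's start should not precede its
// enqueue time, and its end should not lie after the current CPU time.
std::vector<std::string> check_profiling_time(uint64_t enqueue_ns,
                                              uint64_t start_ns,
                                              uint64_t end_ns,
                                              uint64_t current_ns)
{
    std::vector<std::string> warnings;
    if(start_ns < enqueue_ns)
        warnings.emplace_back("start time is less than enqueue time");
    if(end_ns > current_ns)
        warnings.emplace_back("end time is greater than current CPU time");
    return warnings;
}
```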

* Fix generate_stats for tool_counter_record_t

* Dictionary optimization for generate-rocpd.py

---------

Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>
2024-11-07 01:15:19 -06:00
Larry Meadows 62e0a9c1a3 SDK: OMPT Support part 1: include file and print formatters for OMPT support (#1175)
* include file and print formatters for OMPT support

* Apply suggestions from code review

* Remove rocprofiler_ompt_set_callbacks

* Reorder ROCPROFILER_EXTERNAL_CORRELATION_REQUEST_OPENMP

---------

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-11-05 23:57:11 -06:00
Jonathan R. Madsen 7f416a2f82 Remove serializing Reserved field of HSA_CAPABILITY (#1170)
- reserved fields have no meaning
2024-10-30 00:12:48 -05:00
venkat1361 3f91d90bbc Check to force tools to initialize the ctx id to zero. (#1135)
* Check to force tool to initialize the ctx id to zero.

* initialize rocprofiler_context_id_t with 0 in units tests

* changelog

---------

Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
2024-10-22 18:09:25 +05:30
Benjamin Welton 210762c69d Added agent_id to rocprofiler_record_counter_t (#1078)
Co-authored-by: Benjamin Welton <ben@amd.com>
2024-10-21 16:29:53 -07:00
Benjamin Welton 788e687167 Agent Profiling Fixes for Broken/Improper API Usage (#1122)
Prevents multiple setups of agent profiling on the same agent.

Fixes agent read context to only read agents that were set up.

Prevents copying of the agent profiling internal data struct and resets
hsa_signal on move to prevent inadvertent deletion.
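The copy/move rules described above follow a standard C++ ownership pattern, sketched here. `fake_signal` and `destroy_signal` are hypothetical stand-ins for `hsa_signal_t` and `hsa_signal_destroy` so the example compiles without HSA, and `agent_profile_state` is an illustrative name, not the SDK's struct.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins so the sketch compiles without HSA.
struct fake_signal
{
    uint64_t handle = 0;
};

inline int  destroyed_count = 0;
inline void destroy_signal(fake_signal) { ++destroyed_count; }

struct agent_profile_state
{
    fake_signal signal{};

    explicit agent_profile_state(fake_signal sig) : signal{sig} {}

    // Copying would leave two owners of one signal, so copies are disabled.
    agent_profile_state(const agent_profile_state&) = delete;
    agent_profile_state& operator=(const agent_profile_state&) = delete;

    // Moving transfers ownership and resets the source handle to 0, so the
    // moved-from object's destructor does not destroy the signal a second time.
    agent_profile_state(agent_profile_state&& other) noexcept
    : signal{std::exchange(other.signal, fake_signal{})}
    {}

    agent_profile_state& operator=(agent_profile_state&& other) noexcept
    {
        if(this != &other)
        {
            release();
            signal = std::exchange(other.signal, fake_signal{});
        }
        return *this;
    }

    ~agent_profile_state() { release(); }

private:
    void release()
    {
        if(signal.handle != 0) destroy_signal(signal);
        signal = fake_signal{};
    }
};
```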
2024-10-18 15:48:22 -07:00
Benjamin Welton bb69467765 Renamed agent profiling service to device counting service (#1132)
* Renamed agent profiling service to device counting service

The new name better represents what agent profiling did (device-wide
counter collection). Conversion of existing user code can be
performed by the following find/sed command:

find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +

* Converted dispatch profile to dispatch counting service

* Debug for functional counters test

* Minor changes for CI

* Minor fix

* More fixes for CI

* Update evaluate_ast.cpp

---------

Co-authored-by: Benjamin Welton <ben@amd.com>
2024-10-18 14:14:11 +05:30
Gopesh Bhardwaj 320427b5f5 rocprofv3: docs and help menu updates (#1129)
* doc updates

* Correcting ROCtx information

* Making ROCTx string consistent

* missing occurrence
2024-10-17 13:28:53 +05:30
Benjamin Welton 28a6918b33 Clang Warning Fixes (#1131)
Builds prevented on clang-18
2024-10-15 22:20:38 -07:00
Benjamin Welton b46966e96b clang-18 build fix for RCCL (#1123)
Removes ambiguity on const usage, which clang-18 complains about
(preventing build with warn error).
2024-10-07 15:58:41 -07:00
Mythreya a3b41d04fc Fix PSDB change (#1120)
Reverts change to `source/include/rocprofiler-sdk/callback_tracing.h`
from commit 9b2ece76c3
2024-10-07 17:37:19 -05:00
Gopesh Bhardwaj 9b2ece76c3 Fixing installed package tests in CI (#1119)
* Fixing installed package tests in CI

* Formatted rocprofv3.py with black formatter
2024-10-07 20:55:35 +05:30
Jonathan R. Madsen 7861dcc6c6 Update HIP tracing ABI (#1025)
* Update HIP ABI tracing

* Minor HIP abi.cpp updates

* Misc roctx updates (version.h + more)

* Common static thread-local template struct

- static_tl_object
- similar to static_object but with thread-local semantics
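A thread-local counterpart to a static-object helper can be sketched with a function-local `static thread_local`. The `static_tl_object` template below (with a hypothetical `Tag` parameter to separate instances of the same type) and the `isolated_per_thread` check are illustrative assumptions, not the SDK's actual definition.

```cpp
#include <cassert>
#include <thread>

// Hypothetical sketch: each thread lazily gets its own instance of Tp.
// Distinct Tag types yield distinct instances of the same Tp.
template <typename Tp, typename Tag = void>
struct static_tl_object
{
    // Returns this thread's instance, value-initializing it on first use.
    static Tp& get()
    {
        static thread_local Tp instance{};
        return instance;
    }
};

// Usage check: a value set on this thread is not visible from a new thread.
inline bool isolated_per_thread()
{
    static_tl_object<int>::get() = 5;
    int         seen_in_other    = -1;
    std::thread worker{[&seen_in_other] { seen_in_other = static_tl_object<int>::get(); }};
    worker.join();
    return static_tl_object<int>::get() == 5 && seen_in_other == 0;
}
```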

* rocprofiler-sdk/version.h updates

* Update for HIP_RUNTIME_API_TABLE_STEP_VERSION == {4,5,6}

* Fix roctx.cpp tweaks
2024-09-13 17:10:35 -05:00
venkat1361 bc82eccf4f SWDEV-476852 - Check added for agent architecture counters support. (#1022)
* check added for agent arch support

* formatting issue
2024-09-13 11:28:00 -07:00
Jonathan R. Madsen 8c1382fceb Package RCCL headers to support adding RCCL support w/o installed headers (#1075)
- in ROCm CI, rocprofiler-sdk gets built before RCCL is installed; this is a workaround for that issue
2024-09-12 18:24:50 -05:00
Mythreya 2a146259c7 Add support for RCCL tracing (#1047)
* [Draft]: Add support for RCCL tracing

Address comments

* [Draft]: Add support for RCCL tracing

Address PR comments, changes from RCCL upstream

* Add RCCL library table registration

Working on adding support to rocprofiler-register

* Support compilation w/o <rccl/amd_detail/api_trace.h>

- dummy api_trace.h header
- return ROCPROFILER_STATUS_ERROR_NOT_IMPLEMENTED when RCCL does not have api_trace.h header
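The "build without the optional header, report not-implemented at runtime" pattern can be sketched with C++17 `__has_include`. Note this swaps in a different technique: the commit ships a dummy `api_trace.h`, whereas this sketch detects the real header at preprocessing time. `example_status`, `EXAMPLE_HAS_RCCL_API_TRACE` and `example_register_rccl_table` are hypothetical stand-ins for the SDK's status codes and registration hook.

```cpp
#include <cassert>

// Detect the optional RCCL tracing header at preprocessing time.
#if defined(__has_include)
#    if __has_include(<rccl/amd_detail/api_trace.h>)
#        define EXAMPLE_HAS_RCCL_API_TRACE 1
#    endif
#endif
#ifndef EXAMPLE_HAS_RCCL_API_TRACE
#    define EXAMPLE_HAS_RCCL_API_TRACE 0
#endif

// Hypothetical status codes, standing in for rocprofiler_status_t values.
enum class example_status
{
    success,
    error_not_implemented,
};

// Hypothetical registration hook for an RCCL dispatch table.
example_status example_register_rccl_table(void* table)
{
    (void) table;
#if EXAMPLE_HAS_RCCL_API_TRACE
    // Real code would wrap the dispatch table entries here.
    return example_status::success;
#else
    // Compiled without the header: the feature exists but cannot work.
    return example_status::error_not_implemented;
#endif
}
```

Either way the library compiles; only the runtime answer differs, which is what lets a tool ask for RCCL tracing and get a clean error instead of a build failure.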

* RCCL API tracing tool support

- add to rocprofv3
- add to json-tool

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-09-12 00:42:58 -05:00
Jonathan R. Madsen 395f01b689 rocprofv3: summary reports + more JSON metadata (#1029)
* Move include/rocprofiler-sdk/cxx/details/delimit.hpp to tokenize.hpp

* Update docs/how-to/using-rocprofv3.rst

- fix code block indents
- reorder rocprofv3 options, limit them to important options
- add docs for `--runtime-trace`

* Update rocprofv3.py

- parser argument groups
- new `--runtime-trace` option
- new `--summary` option
- new `--summary-per-domain` option
- new `--summary-groups` option
- new `--summary-output-file` option
- new `--summary-units` option

* Update lib/rocprofiler-sdk/hsa/async_copy.cpp

- fix async copy operation names: add "MEMORY_COPY_" prefix

* lib/rocprofiler-sdk-tool: update statistics.{hpp,cpp}

- statistics<>::get_percent function
- stats_entry_t struct
- stats_formatter struct
- percentage struct
- std::to_string(::rocprofiler::tool::percentage)
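The kind of accumulation these statistics helpers perform can be sketched as follows. This is a hypothetical model, assuming a summary entry tracks count, total, min and max per operation name and that `get_percent` reports a name's share of the grand total; `stats_entry_sketch` and `statistics_sketch` are illustrative names, not the tool's `stats_entry_t` or `statistics<>`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <map>
#include <string>

// Hypothetical per-operation statistics (not the tool's stats_entry_t).
struct stats_entry_sketch
{
    std::uint64_t count = 0;
    double        total = 0.0;
    double        min   = std::numeric_limits<double>::max();
    double        max   = std::numeric_limits<double>::lowest();

    void add(double value)
    {
        ++count;
        total += value;
        min = std::min(min, value);
        max = std::max(max, value);
    }

    double mean() const { return (count > 0) ? total / static_cast<double>(count) : 0.0; }
};

// Accumulates durations per name and reports each name's share of the grand total.
class statistics_sketch
{
public:
    void add(const std::string& name, double duration)
    {
        m_entries[name].add(duration);
        m_grand_total += duration;
    }

    // Throws std::out_of_range for names that were never recorded.
    const stats_entry_sketch& at(const std::string& name) const { return m_entries.at(name); }

    // Percentage of all accumulated time spent in `name` (0 if nothing recorded yet).
    double get_percent(const std::string& name) const
    {
        return (m_grand_total > 0.0) ? 100.0 * m_entries.at(name).total / m_grand_total : 0.0;
    }

private:
    std::map<std::string, stats_entry_sketch> m_entries;
    double                                    m_grand_total = 0.0;
};
```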

* lib/rocprofiler-sdk-tool: update domain_type.{hpp,cpp}

- reorder domain_type enum values

* lib/rocprofiler-sdk-tool: update generateCSV.{hpp,cpp}

- separate writing CSV from accumulating statistics
- a lot of functionality was moved to statistics.{hpp,cpp}

* lib/rocprofiler-sdk-tool: update output_file.{hpp,cpp}

- output_stream_t struct
- get_output_stream(...) returns output_stream_t instance

* lib/rocprofiler-sdk-tool: update generateJSON.cpp

- update get_output_stream usage to output_stream_t

* lib/rocprofiler-sdk-tool: update generateOTF2.cpp

- header include order tweak

* lib/rocprofiler-sdk-tool: update buffered_output.hpp

- stats_data_t was renamed to stats_entry_t

* lib/rocprofiler-sdk-tool: update generatePerfetto.cpp

- header include tweak

* lib/rocprofiler-sdk-tool: update tmp_file_buffer.hpp

- emit warning message if write_ring_buffer fails after offloading instead of aborting
- prefer placement new instead of assignment in write_ring_buffer
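Both ring-buffer changes can be illustrated with a small sketch. It assumes a fixed-capacity buffer over raw storage, where "offloading" is modeled as a caller-supplied callback that drains the buffer (for instance to a temporary file). `ring_buffer_sketch` and `write_or_warn` are hypothetical names, not the tool's `tmp_file_buffer` API. The placement-new point is that an empty slot holds no live object, so assigning through a `Tp*` would call `operator=` on something never constructed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <new>
#include <utility>
#include <vector>

// Hypothetical fixed-capacity ring buffer over raw storage.
template <typename Tp>
class ring_buffer_sketch
{
public:
    explicit ring_buffer_sketch(std::size_t capacity)
    : m_slots(capacity)
    {}
    ring_buffer_sketch(const ring_buffer_sketch&) = delete;
    ring_buffer_sketch& operator=(const ring_buffer_sketch&) = delete;
    ~ring_buffer_sketch() { drain([](Tp&&) {}); }

    // Construct a Tp in the next free slot; returns false if the buffer is full.
    template <typename... Args>
    bool write(Args&&... args)
    {
        if(m_size == m_slots.size()) return false;
        void* where = m_slots[(m_head + m_size) % m_slots.size()].bytes;
        // Placement new: the slot is uninitialized storage, not a live Tp.
        ::new(where) Tp(std::forward<Args>(args)...);
        ++m_size;
        return true;
    }

    // Hand every buffered record (oldest first) to `consume`, then destroy it.
    template <typename FuncT>
    void drain(FuncT&& consume)
    {
        while(m_size > 0)
        {
            Tp* obj = std::launder(reinterpret_cast<Tp*>(m_slots[m_head].bytes));
            consume(std::move(*obj));
            obj->~Tp();
            m_head = (m_head + 1) % m_slots.size();
            --m_size;
        }
    }

    std::size_t size() const { return m_size; }

private:
    struct alignas(Tp) slot
    {
        unsigned char bytes[sizeof(Tp)];
    };
    std::vector<slot> m_slots;
    std::size_t       m_head = 0;
    std::size_t       m_size = 0;
};

// If the buffer is full, offload it and retry once; if it is still full,
// warn and drop the record instead of aborting the process.
template <typename Tp, typename OffloadT>
bool write_or_warn(ring_buffer_sketch<Tp>& buffer, OffloadT&& offload, const Tp& value)
{
    if(buffer.write(value)) return true;
    offload(buffer);
    if(buffer.write(value)) return true;
    std::fprintf(stderr, "warning: record dropped, buffer still full after offloading\n");
    return false;
}
```

Dropping one record with a warning trades a small loss of data for keeping the profiled application alive, which is usually the better failure mode for a profiler.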

* lib/rocprofiler-sdk-tool: add generateStats.{hpp,cpp}

- functions for accumulating statistics

* Update tests/rocprofv3/tracing-hip-in-libraries/CMakeLists.txt

- accommodate tweak to CSV output file name for HIP and HSA traces

* lib/rocprofiler-sdk-tool: update config.{hpp,cpp}

- new config variables
  - stats_summary
  - stats_summary_per_domain
  - summary_output
  - stats_summary_unit_value
  - stats_summary_unit
  - stats_summary_file
  - stats_summary_groups
- support output keys for hostname: %hostname% / %h

* lib/rocprofiler-sdk-tool: update tool.cpp

- support summary output

* Documentation fixes

* Test for summary output

* Update tests/bin/transpose to use more ROCTx

- also support building with the roctracer ROCTx

* Remove roctxMark from OTF2 + fix kernel-rename tests

- following more ROCTx calls in transpose, kernel-rename validation had to be updated

* JSON metadata + JSON summary

- add serialization support for config
- add serialization support for statistics
- additions to json spec
  - rocprofiler-sdk-tool/metadata/config
  - rocprofiler-sdk-tool/metadata/command
  - rocprofiler-sdk-tool/summary
- config output_keys support for NVIDIA %q{<ENV-VAR>} syntax
- config output_keys support keys within keys
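Expansion of the output keys mentioned above (`%hostname%` / `%h`, and the `%q{<ENV-VAR>}` form) can be sketched roughly as below. This is a hypothetical illustration, not rocprofv3's implementation; it assumes an unset variable expands to an empty string and an unterminated `%q{` is left untouched, and it does not attempt the "keys within keys" behavior.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical output-key expansion:
//   %hostname% and %h -> the supplied host name
//   %q{VAR}           -> value of environment variable VAR ("" if unset)
std::string expand_output_keys(std::string path, const std::string& hostname)
{
    auto replace_all = [&path](const std::string& key, const std::string& value) {
        for(auto pos = path.find(key); pos != std::string::npos;
            pos      = path.find(key, pos + value.size()))
            path.replace(pos, key.size(), value);
    };
    // Longer key first, otherwise "%h" would match the start of "%hostname%".
    replace_all("%hostname%", hostname);
    replace_all("%h", hostname);

    for(auto beg = path.find("%q{"); beg != std::string::npos; beg = path.find("%q{", beg))
    {
        auto end = path.find('}', beg + 3);
        if(end == std::string::npos) break;  // unterminated key: leave the rest as-is
        const char*       env   = std::getenv(path.substr(beg + 3, end - beg - 3).c_str());
        const std::string value = (env != nullptr) ? env : "";
        path.replace(beg, end - beg + 1, value);
        beg += value.size();  // do not rescan the substituted text
    }
    return path;
}
```

For example, `expand_output_keys("job-%q{SLURM_JOB_ID}/%h", "node01")` would yield `job-<value>/node01` when that variable is set.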

* rocprofv3 --summary-groups warning if no domain matches

- emit a warning if a summary-group regex does not match any domain names
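The summary-group check can be sketched like this: each group regex is applied to the known domain names, and a warning is printed when it selects nothing. This is a hypothetical helper; whether rocprofv3 uses search or full-match semantics is an assumption (the sketch uses `std::regex_search`), and the domain names in the usage are illustrative.

```cpp
#include <cassert>
#include <cstdio>
#include <regex>
#include <string>
#include <vector>

// Hypothetical: return the domains selected by one summary-group regex,
// warning (not failing) when the regex matches nothing.
std::vector<std::string> match_summary_group(const std::string&              pattern,
                                             const std::vector<std::string>& domains)
{
    const std::regex         re{pattern};
    std::vector<std::string> matched;
    for(const auto& domain : domains)
        if(std::regex_search(domain, re)) matched.push_back(domain);

    if(matched.empty())
        std::fprintf(stderr,
                     "warning: summary group regex '%s' did not match any domain\n",
                     pattern.c_str());
    return matched;
}
```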

* Compile fix for lib/rocprofiler-sdk-tool/tool.cpp

- get_config().scratch_memory_trace
- pass contributions to write_json

* Update rocprofv3.py to preload rocprofiler-sdk-roctx

- appended to LD_PRELOAD when args.marker_trace is enabled

* Fix ReST link errors about subtitle underline being too short

* Patch tokenization of config::stats_summary_groups

- guard against array values of empty strings
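Guarding against empty values while tokenizing can be sketched as a splitter that simply skips zero-length tokens, so consecutive or trailing delimiters never produce an empty regex (which would match every domain). `tokenize_nonempty` and the `#` delimiter in the usage are hypothetical; the real `config::stats_summary_groups` parsing may differ.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical splitter: breaks `input` on any character in `delimiters`
// and drops empty tokens (from leading, trailing, or repeated delimiters).
std::vector<std::string> tokenize_nonempty(const std::string& input, const std::string& delimiters)
{
    std::vector<std::string> tokens;
    std::size_t              start = 0;
    while(start <= input.size())
    {
        std::size_t end = input.find_first_of(delimiters, start);
        if(end == std::string::npos) end = input.size();
        if(end > start) tokens.emplace_back(input.substr(start, end - start));
        start = end + 1;
    }
    return tokens;
}
```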

* Tweak rocprofv3 summary test

- input-summary.yaml (used by rocprofv3-test-summary-inp-yaml-execute) only provides one summary group regex

* Disable LD_PRELOAD of librocprofiler-sdk-roctx.so

- this causes problems in the sanitizers, will be addressed in another PR
2024-09-09 11:20:55 -05:00
Vladimir Indic 93e82663d9 PC sampling: online partial PC sampling decoding (#1004)
* PC sampling: online partial PC sampling decoding

PC sampling service decodes a PC sample partially
by replacing the PC with an id of the loaded code object instance
containing PC and the offset of the PC within that code object instance.
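The partial decoding described above amounts to an address-range lookup. Below is a minimal sketch assuming each loaded code object instance is known by id, load base and size, and assuming (as a hypothetical convention here) that a PC outside every loaded code object gets a "NULL" id of 0 and keeps its raw address as the offset. `code_object_range`, `decoded_pc` and `decode_pc` are illustrative, not the SDK's `rocprofiler_pc_t`.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical types; not rocprofiler-sdk's rocprofiler_pc_t.
struct code_object_range
{
    std::uint64_t id;
    std::uint64_t load_base;
    std::uint64_t load_size;
};

struct decoded_pc
{
    std::uint64_t code_object_id;
    std::uint64_t code_object_offset;
};

constexpr std::uint64_t code_object_id_none = 0;  // assumed "NULL" id

// Replace an absolute PC with (code object id, offset within that code object).
decoded_pc decode_pc(std::uint64_t pc, const std::vector<code_object_range>& loaded)
{
    for(const auto& co : loaded)
        if(pc >= co.load_base && pc - co.load_base < co.load_size)
            return {co.id, pc - co.load_base};
    // Not inside any loaded code object: keep the raw address (sketch's convention).
    return {code_object_id_none, pc};
}
```

A linear scan keeps the sketch short; with many code objects, a map sorted by load base would give logarithmic lookups.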

* PC sampling: marker records removed

* PC sampling parser: minor doc update in mock

* PC sampling: introducing rocprofiler_pc_t

* NULL value of the code object id introduced.

* Clarifying documentation related to PC offset.

* PC offset documentation improvement

* PC sampling parser benchmark: Reducing the number of samples to recreate half of performance.
2024-09-05 11:35:46 -05:00
Jonathan R. Madsen 7a639f3439 Update HSA ABI checks for tracing (#1027)
* Update HSA ABI checks for tracing

* Update lib/common/abi.hpp

- perform ABI versioning checks even when `ROCPROFILER_CI` is not defined (or ROCPROFILER_CI=0)
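The general technique behind compile-time ABI checks on a dispatch table can be sketched with `offsetof` and `static_assert`: if a member moves or the table grows, the build fails until the tracing wrappers are updated. This is a hypothetical illustration, not the macros in `lib/common/abi.hpp`; it assumes a table laid out as a `size_t` size field followed by function pointers, each the size of `void*` (true on common 64-bit Linux targets).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical dispatch table: a size field followed by function pointers
// that are appended over time ("step versions").
struct example_api_table
{
    std::size_t size;
    int (*fn_init)();
    int (*fn_shutdown)();
    int (*fn_query)(int);  // appended in a later step version
};

// MEMBER must sit in function-pointer slot INDEX, right after the size field.
#define EXAMPLE_ABI_CHECK_OFFSET(TABLE, MEMBER, INDEX)                                  \
    static_assert(offsetof(TABLE, MEMBER) == sizeof(std::size_t) + (INDEX) * sizeof(void*), \
                  #MEMBER " moved: ABI break in " #TABLE)

// TABLE must contain exactly NUM function-pointer slots.
#define EXAMPLE_ABI_CHECK_SIZE(TABLE, NUM)                                              \
    static_assert(sizeof(TABLE) == sizeof(std::size_t) + (NUM) * sizeof(void*),         \
                  #TABLE " changed size: update the tracing wrappers")

EXAMPLE_ABI_CHECK_OFFSET(example_api_table, fn_init, 0);
EXAMPLE_ABI_CHECK_OFFSET(example_api_table, fn_shutdown, 1);
EXAMPLE_ABI_CHECK_OFFSET(example_api_table, fn_query, 2);
EXAMPLE_ABI_CHECK_SIZE(example_api_table, 3);
```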

* Enforce versioning size for various HSA AmdExt step versions + hsa_amd_enable_logging support

* Minor HIP abi.cpp updates
2024-08-20 01:08:34 -05:00
Jonathan R. Madsen bb25376480 Misc API cleanup and consistency fixes (#1023)
- ROCPROFILER_API after function
- use rocprofiler_tracing_operation_t in lieu of uint32_t where appropriate
- rocprofiler_tracing_operation_t is now an int32_t typedef (formerly uint32_t)
- use const T* instead of T* where appropriate
2024-08-20 01:06:12 -05:00
Jonathan R. Madsen b15e498945 Add kernel profiling time info to counter collection records (#1000)
* Add kernel profiling time info to counter collection records

- lib/rocprofiler-sdk/kernel_dispatch
  - added profiling_time.{hpp,cpp}
  - restructured tracing.cpp
- updated queue.cpp AsyncSignalHandler
  - gets kernel dispatch profiling time and passes to dispatch_complete and signal callbacks
- structured some header includes to reduce cyclic include probability
  - originally, including kernel_dispatch/tracing.hpp in hsa/queue.hpp created a lot of cyclic includes

* Fix kernel_dispatch.cpp includes

* Fix kernel_dispatch.cpp

- include <cstring>
- replace use of ROCPROFILER_HSA_AMD_EXT_API_ID_NONE with ROCPROFILER_KERNEL_DISPATCH_LAST
2024-08-19 20:05:04 -05:00
Giovanni Lenzi Baraldi fa1b9e67ab ATT Agent fixes and improvements (#1011)
* Tidying ATT dispatch API. ATT Agent to be initialized with rest of profiler. Removing read_index-based wait.

* Formatting

* Adding some input validation

* Add perf test for agent

* Removing async
2024-08-15 13:57:13 -03:00
Jonathan R. Madsen 20e07caad4 Reorganize thread trace codeobj headers (#1001)
* include/rocprofiler-sdk/cxx/codeobj

- Relocated from include/rocprofiler-sdk/amd_detail/rocprofiler-sdk-codeobj

* Update include/rocprofiler-sdk/cxx

- cmake updates
- correct namespace: rocprofiler::codeobj to rocprofiler::sdk::codeobj

* Update codeobj tests and samples
2024-08-01 00:10:09 -05:00
Jonathan R. Madsen 16d535ef48 rocprofv3 OTF2 Output Support (#995)
* CMake support for OTF2 library

* Preliminary OTF2 generation implementation

* Completed OTF2 Support

- HSA API
- HIP API
- Marker API
- Async Memory Copies
- Kernel Dispatch

* Update lib/rocprofiler-sdk-tool/generateOTF2.cpp

- fix location type for dispatches

* Testing for OTF2 output

* Add OTF2 to requirements.txt

* Update lib/rocprofiler-sdk-tool/generateOTF2.cpp

- fix getting kernel name

* OTF2 testing with rocprofv3/tracing-hip-in-libraries

* Format external/otf2/CMakeLists.txt

* Update external/otf2/CMakeLists.txt

- guard CMP0135 for cmake < 3.24

* Update lib/rocprofiler-sdk-tool/generateOTF2.cpp

- fix duplicate string ref issue

* Update lib/rocprofiler-sdk-tool/generateOTF2.cpp

- fix header includes

* Update CI workflow

- sudo install pypi requirements for core-rpm for $HOME/.local installs

* Update pytest_utils/otf2_reader.py

- modifications for reading trace

* Update pytest_utils/otf2_reader.py

- misc cleanup

* Update CI workflow

- fix installer artifact naming

* Update pytest_utils/otf2_reader.py

- handle slightly overlapping kernel timestamps for MI300

* OTF2 attributes for category

* Testing with OTF2Reader category attributes

* Fix memory leak in OTF2 generation

- leaking OTF2_AttributeList
2024-07-30 19:57:19 -05:00
Jonathan R. Madsen 60b1dbfb6f Update HIP API tracing (#958)
- support HipDispatchTable additions for HIP_RUNTIME_API_TABLE_STEP_VERSION 1 through 4
2024-07-08 17:12:53 -05:00
Giovanni Lenzi Baraldi 78fd8cb379 Returning code object id information in code_printing.cpp:Instruction (#965)
* Returning code object id information in code_printing.cpp:Instruction

* Adding assertions

* Simplifying decoder library
2024-07-08 16:59:40 -03:00
Giovanni Lenzi Baraldi a045947a89 Removing cache of decoded lines and returning shared_ptr (#953) 2024-06-25 16:00:59 -03:00
Jonathan R. Madsen af2f85ca93 Add logical_node_type_id field to rocprofiler_agent_t (#948)
* Add logical_node_type_id field to rocprofiler_agent_t

* Patch queue_controller
2024-06-24 23:18:58 -05:00
Jonathan R. Madsen 27fa455201 Fix documentation (#949) 2024-06-24 17:12:47 -05:00
Jonathan R. Madsen 12785ad365 Add HSA tracing support for hsa_amd_vmem_address_reserve_align (#946)
* Add support for hsa_amd_vmem_address_reserve_align

* Update lib/rocprofiler-sdk/hsa/types.hpp

- support HSA_AMD_EXT_API_TABLE_STEP_VERSION == 0x2 for HSA v1.14.0

---------

Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
2024-06-21 22:28:39 +05:30
Benjamin Welton 81d1407565 Incremental Counter Profile Creation (#933)
* Incremental Counter Profile Creation

Adds support for incremental counter profile creation. This works by
changing the behavior of rocprofiler_create_profile_config:

rocprofiler_create_profile_config(rocprofiler_agent_id_t           agent_id,
                                  rocprofiler_counter_id_t*        counters_list,
                                  size_t                           counters_count,
                                  rocprofiler_profile_config_id_t* config_id)

The behavior of this function now allows an existing config_id to be
supplied via config_id. The counters contained in this config will be
copied over and used as a base for a new config along with any counters
supplied in counters_list. The new config id is returned via config_id
and can be used in future dispatch/agent counting sessions.

A new config is created rather than modifying the existing one because
there is no guarantee that the existing config isn't already in use. While we
could add locks (or other mutual exclusion) to check whether it is
in use and reject an update, the benefit of doing so is minor compared
to simply creating a new config. This also accommodates a common
pattern in which a tool adds more counters later during execution:
it can now do so without destroying the existing
config.
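The copy-then-extend semantics described above can be modeled with a small in-memory registry. This is a hypothetical sketch, not the SDK implementation: the real function is `rocprofiler_create_profile_config`, which uses SDK handle and status types, while here ids are plain integers, `0` means "no base config" by this sketch's convention, and `bool` stands in for the status.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <utility>

// Hypothetical model of incremental config creation.
using counter_id = std::uint64_t;
using config_id  = std::uint64_t;  // 0 means "no base config" in this sketch

class config_registry
{
public:
    // If *id names an existing config, its counters seed the new config.
    // The base is copied, never modified, so sessions already using it are unaffected.
    bool create(const counter_id* counters, std::size_t count, config_id* id)
    {
        std::set<counter_id> merged;
        if(*id != 0)
        {
            auto base = m_configs.find(*id);
            if(base == m_configs.end()) return false;  // unknown base config
            merged = base->second;                     // copy of the base counters
        }
        merged.insert(counters, counters + count);
        *id = m_next_id++;
        m_configs.emplace(*id, std::move(merged));
        return true;
    }

    const std::set<counter_id>& counters(config_id id) const { return m_configs.at(id); }

private:
    std::map<config_id, std::set<counter_id>> m_configs;
    config_id                                 m_next_id = 1;
};
```

Because every call yields a fresh id, no locking is needed to protect configs that may already be in use, which is the trade-off the commit message argues for.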

---------

Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-06-19 00:11:03 -07:00
Giovanni Lenzi Baraldi 9676295d3d ATT API changes - add user_data field and separation of dispatch vs agent profiling (#893)
* DRM Issue Fix for SLES 15 (#897)

* DRM Issue Fix

* Formatting Fix

* PC sampling: CID manager unit test (#898)

* Adding per-dispatch userdata field to ATT

* Clang tidy

* Formatting

* Update source/lib/rocprofiler-sdk/hsa/aql_packet.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Adding dispatch_id, fixing user_data and update aql_profile_v2

* Formatting

* Tidy fixes

* Second fix for userdata

* removing assert for union

* Adding serialization. Created agent profiling-like thread trace

* Implemented agent thread trace

* Update source/lib/rocprofiler-sdk/hsa/aql_packet.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Restructured thread trace packets

* Added agent API tests

* Fixing multigpu for agent test

* Formatting

* Formatting

* Improving header locations

* Fixing merge conflicts

* Tidy

* Tidy

* Tidy

---------

Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
2024-06-13 15:29:29 -03:00
Manjunath P Jakaraddi c49719649b SWDEV-465322: Adding support for Perfcounter SIMD Mask in ATT (#910)
* SWDEV-465322: Adding support for Perfcounter SIMD Mask in ATT

* Apply suggestions from code review

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>

* Adding unit tests

* Adding counters check for gfx9 and SQ block only

* Addressing review comments

* changing the struct size

* fixing header includes

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2024-06-12 16:25:06 -07:00
Vladimir Indic b4f7154716 The NULL value of an internal correlation ID defined (#901) 2024-06-10 16:12:01 +02:00
Vladimir Indic 211ee219c4 Disable PC sampling service if counter collection service is configured (#899) 2024-06-10 15:13:49 +02:00
Benjamin Welton 680448c444 Small doc update to remove restrictions no longer present (#917)
* Small doc update to remove restrictions no longer present
2024-06-06 16:30:48 -07:00
Jonathan R. Madsen 5525b400c3 Miscellaneous AFAR 5 Updates (#891)
* Dispatch table copy/update uses ROCP_TRACE instead of ROCP_INFO

* Update rocprofiler-sdk CMake config

- rocprofiler::rocprofiler is alias to rocprofiler-sdk::rocprofiler-sdk instead of other way around

* Prefer rocprofiler-sdk::rocprofiler-sdk over rocprofiler::rocprofiler

* Fix WITH_UNWIND for glog

- requires a value of "none" instead of boolean now

* Update include/rocprofiler-sdk/registration.h

- explicit struct names to permit forward decl
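Why an explicit struct tag permits forward declarations can be shown with a hypothetical type (`example_client_t` is illustrative, not a `registration.h` type). With an anonymous `typedef struct { ... } example_client_t;` there is no tag to forward-declare: in C++ the `struct example_client_t;` line below would conflict with the typedef, and in C it would name a different, never-completed type.

```cpp
#include <cassert>

// A header that only passes the type around by pointer can forward-declare it.
struct example_client_t;
const char* example_client_name(const example_client_t* client);

// Definition with an explicit tag, in the style the commit describes.
typedef struct example_client_t
{
    const char* name;
    unsigned    handle;
} example_client_t;

const char* example_client_name(const example_client_t* client) { return client->name; }
```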

* Update include/rocprofiler-sdk/cxx/serialization.hpp

- ROCPROFILER_SDK_CEREAL_NAMESPACE_BEGIN and ROCPROFILER_SDK_CEREAL_NAMESPACE_END to enable customized namespace
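The overridable-namespace pattern can be sketched with hypothetical macro names (`EXAMPLE_CEREAL_NAMESPACE_BEGIN`/`_END`). The header only defines a default when the user has not, so defining the macros before the include redirects everything into a namespace of the user's choosing. The `namespace cereal {` default shown here is this sketch's assumption, not necessarily the SDK's.

```cpp
#include <cassert>

// --- user side: pick the namespace before including the header ------------
#define EXAMPLE_CEREAL_NAMESPACE_BEGIN namespace my_cereal {
#define EXAMPLE_CEREAL_NAMESPACE_END   }

// --- header side: provide defaults only if the user did not override them --
#ifndef EXAMPLE_CEREAL_NAMESPACE_BEGIN
#    define EXAMPLE_CEREAL_NAMESPACE_BEGIN namespace cereal {
#endif
#ifndef EXAMPLE_CEREAL_NAMESPACE_END
#    define EXAMPLE_CEREAL_NAMESPACE_END }
#endif

EXAMPLE_CEREAL_NAMESPACE_BEGIN
// Serialization overloads would live here; a trivial function stands in for them.
inline int example_save_version() { return 1; }
EXAMPLE_CEREAL_NAMESPACE_END
```

This lets a tool that bundles its own copy of cereal avoid clashing with the SDK's serialization overloads.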
2024-05-29 16:45:56 -05:00
Giovanni Lenzi Baraldi 1b95089c28 Enable ATT continuous mode and code object tracing registration (#850)
* Adding ATT continuous mode and ATT code object tracking

* Fixing aql_packet.cpp

* Updating to aqlprofile codeobj changes

* Removing kernel packet from ATT dispatch callback

* Changing getSymbolMap() to return relative vaddr

* Tidy fixes

* Formatting

* Fix shadowing

* Fixing packet test

* Updating tests

* Simplifying multi-agent traces

* Adding dynamic codeobj tracking

* leftover book-keeping for codeobj markers

* Formatting

* Formatting

* Temporarily removing codeobj marker

* Formatting

* Re-enabling codeobj tracking

* Making copy of coreapi table

* Fixing issues with toolData lifetime

* Formatting

* Fixing issues with ASAN

* Improving memory profile

* Removing misplaced annotation

* Fixing queue type and allowing shared_locks in globalThreadTracer

* Update logging

* Changing ATT formats to be more in line with the SDK (#883)

* Fixing some merge conflicts

* Fixing cmakelists

* Fixing merge conflicts

* Formatting
2024-05-29 11:09:28 -05:00