2
0

58 Cometimentos

Autor(a) SHA1 Mensagem Data
Gopesh Bhardwaj da457c9a43 [Documentation] rocprofv3 attach/detach (#1108)
* Fixing typo in script

* updating docs

* updating docs

* updating docs

* Update projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3-process-attachment.rst

Co-authored-by: Mark Meserve <mark.meserve@amd.com>

* Update projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3-process-attachment.rst

Co-authored-by: Mark Meserve <mark.meserve@amd.com>

---------

Co-authored-by: Mark Meserve <mark.meserve@amd.com>
2025-10-07 13:17:55 +05:30
Mark Meserve bf49039005 [rocprofiler-sdk][rocprofiler-register] Initial Attachment Support (#316)
* attach: milestone: API tracing

- This pairs with another commit in rocprofiler-sdk to fully
  function
- Add ptrace entry points for tool attachment
- API tracing works at this commit
- Queue tracing not supported yet

* attach: cleanup

- Remove hardcode for loading of tool library
- Make invoke registration functions public again

* attach: proxy queue first draft

- Adds ability to trace with queues during attachment
- Must be paired with updated rocprofiler-sdk

* attach: prestore overhaul

- Must be paired with commit in rocprofiler-sdk

* attach: add dispatch table rework

- Register will load the prestore library and provide entrypoints to sdk

* attach: formatting and cleanup

* attach: revise dispatch table scheme

* attach: formatting

* attach: milestone: API tracing

- This change must be paired with a change in rocprofiler-register to
  fully function.
- API tracing works at this commit
- Queue tracing not supported yet

* attach: cleanup and comments

* attach: Formatting and crash fixes

* attach: add attach duration

- Add option attach-duration-msec for attachment

* Formatting + sglang hang fix via signal handling

* Changed FATAL_IF to DFATAL_IF for scratch_memory due to persistent crash when iterating queues

* attach: proxy queue first draft

- Adds ability to trace with queues during attachment
- Must be paired with updated rocprofiler-register

* Allow null agents for scratch output

* attach: improve queue library interface

- Significant changes to force exported interfaces back to C
- Fixes bug with unknown agents at attachment
- Code objects' names may still be incorrect

* attach: add code_object support

- Kernel traces will now have names and all other information for launches
- Add capture of hsa_executable to the queue library
- Various logging improvements

* attach: rename queue library to prestore

* attach: prestore overhaul

- Must be paired with commit from rocprofiler-register
- Massive overhaul of code organization in prestore library
  - Separates registrations for different object types
  - Sets up future changes for initialization

* attach: add prestore dispatch table

- Removes linkage to prestore library from sdk

* attach: cleanup

* attach: formatting

* attach: fix input prompt not appearing

* attach: fix component name in cmake

* attach: revert change to export level

* Make prestore API public

* attach: update sdk attachment library WIP

- This commit is NONFUNCTIONAL

- Changes around structure to remove classes
- Seperate C linkage where needed
- Still needs updates to register for correct usage

* attach: update register with dispatch table WIP
- This commit is NONFUNCTIONAL

- Changes rocprofiler_register to handle dispatch table from attach
  library.
- Still needs changes in SDK with dispatch table usage

* attach: dispatch table wip
- This commit is NONFUNCTIONAL

* attach: move attach component into core

* attach: rename to rocprofv3-attach

* attach: add callbacks for new queues and code objects

* attach: finish dispatch table implementation

- Fixes kernel tracing

* attach: add cmake variable for attachment support

* feat: Add --attach alias for rocprofv3 with comprehensive attachment tests

- Add `--attach` as an alias to existing `-p/--pid` functionality in rocprofv3.py
- Create comprehensive attachment test suite with CSV and JSON output validation:
- New attachment-test application for testing dynamic profiling scenarios
- Unified test script supporting both CSV and JSON output formats
- Pytest-based validation for kernel traces, memory copies, HSA API calls, and agent info
- Add CMake integration for automated attachment testing
- Support parameterized output directory and filename specification
- Implement proper environment setup for attachment queue registration

Tests verify successful attachment to running processes and capture of:
- Kernel dispatch traces with workgroup/grid dimensions
- Memory copy operations (H2D/D2H) with size validation
- HSA API call traces across multiple domains
- GPU/CPU agent information and capabilities

* Documentation Update

* attach: make attach script callable

* Added ROCPROFILER_REGISTER_ATTACHMENT_TOOL_LIB to remove hardcoded name

* attach: revert metrics library path changes

* Generic Attachment in Register (#942)

Remove tool references in register

* Add second param to attach call in rocprof register

* Add experimental reattachment support for ROCprofiler-SDK

This commit introduces experimental reattachment functionality allowing tools
to dynamically reattach to running processes with comprehensive design changes
to support multiple attach/detach cycles:

**Core Reattachment API:**
- Add rocprofiler_tool_configure_result_experimental_t with tool_reattach/tool_detach callbacks
- Add rocprofiler_call_client_reattach and rocprofiler_call_client_detach C exports
- Implement reattachment tracking in rocprofiler_register_attach to differentiate
initial attachment from reattachment cycles
- Add rocprofiler_register_invoke_reattach for handling reattachment requests

**Design Changes - Registration System Flow:**
The registration system now supports a dual-path initialization:

1. Initial Attachment Flow:
    - rocprofiler_register_attach() -> rocprofiler_register_invoke_all_registrations()
    - Full tool initialization with complete context setup
    - Sets prev_attached atomic flag to track state

2. Reattachment Flow:
    - rocprofiler_register_attach() detects prev_attached=true -> rocprofiler_register_invoke_reattach()
    - Bypasses full re-initialization, calls client reattach callbacks instead
    - Preserves existing contexts and buffers, only reactivates profiling services

**Design Changes - Tool Library Loading:**
Enhanced rocprofiler-register library loading with function pointer resolution:
- Extended rocp_set_api_table_data_t tuple to include reattach/detach function pointers
- Automatic symbol resolution for rocprofiler_call_client_reattach/detach functions
- Support for both LD_PRELOAD and dlopen scenarios with consistent callback availability

**Design Changes - Context Management:**
Introduced dual context systems for attachment scenarios:
- get_contexts() - Original contexts for standard tool initialization
- get_attach_contexts() - Separate context map for attachment-specific lifecycle
- attach_init() - Creates contexts for ALL buffer tracing services using existing buffers
- attach_start() - Selectively starts contexts based on configuration options
- attach_detach() - Cleanly stops and destroys attachment contexts

**Design Changes - Buffer Management:**
Added reset_tmp_file_buffer() template for clean reattachment state:
- Properly closes and removes old temporary files
- Deletes existing file_buffer instances to prevent stale file position tracking
- Creates fresh file_buffer instances for clean reattachment cycles
- Addresses core issue where file position metadata becomes stale between cycles

**Design Changes - Environment Variable Injection:**
Added ROCP_REGISTERED_TOOL_ATTACH environment variable:
- Distinguishes attachment-loaded tools from LD_PRELOAD scenarios
- Enables registration system to apply attachment-specific logic
- Helps tools adapt behavior for attachment vs standard initialization

**Attachment Context Management:**
- Add attach_init/attach_start/attach_detach functions for dynamic context lifecycle
- Add reset_tmp_file_buffer template for clean reattachment state management
- Implement get_attach_contexts() for tracking active attachment contexts

**Test Infrastructure:**
- Add projects/rocprofiler-sdk/tests/rocprofv3/reattach/ comprehensive test suite
- Include reattachment test scripts with unified attachment/detachment cycles
- Add validate.py with trace data validation for kernel, memory copy, HSA API, and agent info
- Add conftest.py for JSON and CSV data loading utilities

**Configuration Updates:**
- Update CMakeLists.txt to include reattachment tests in build system
- Add environment variable ROCP_REGISTERED_TOOL_ATTACH for attachment state tracking
- Enhance rocprofiler-register library loading with reattach/detach function resolution

**Flow Impact Analysis:**
This design enables robust multi-cycle attachment by:
1. Preventing duplicate initialization on reattachment
2. Maintaining separate context lifecycles for attachment vs standard operation
3. Ensuring clean temporary file state between attachment cycles
4. Providing tools with explicit reattach/detach callback hooks
5. Supporting both programmatic and environment-based tool configuration

The experimental nature allows for iteration on the API while establishing
the foundation for production-ready dynamic profiling capabilities.

* Fix misc clang-tidy warnings/errors

* CMake Option and Environment Variable Updates

- CMake: ROCPROFILER_REGISTER_ALWAYS_SUPPORT_ATTACH -> ROCPROFILER_REGISTER_BUILD_DEFAULT_ATTACHMENT
- Env: ROCPROFILER_REGISTER_ATTACHMENT_ENABLED ->

* Source reorganization

* Formatting + new lines at EOF

* Fix flake8 F841: local variable is assigned to but never used

* Update attachment test

- get rid of 5 second start delay
- add roctx

* Rework implementation

- Remove rocprofiler_tool_configure_result_experimental_t in lieu of rocprofiler_configure_attach
- Add <rocprofiler-sdk/experimental/registration.h>
- TODO: Update process_attachment.rst

* Handle re-attachment options

- inherit options from previous attachment
- check previous options do not modify data collection services

* Fix support for tools w/o rocprofiler_configure_attach

- fix segfault when rocprofiler_configure_attach does not exist
- fix naming convention for functions accepting attach dispatch table
- cleanup rocprofiler_configure_attach implementation in rocprofv3 tool

* attach: remove unknown agent handling

- Change was from earlier commit, no longer needed

* attach: add error for attaching without library loaded

* attach: revise version numbering

* attach: register header revisions

* attach: clang format register

* attach: formatting

* attach: fix build failure

- Remove cross dependency into rocprofiler-sdk, fixes build on some systems

* attach: revise register library detection

* Update rocprofiler-register and attach library

- formatting
- proper signature of register_functor for rocprofiler-sdk-attach library callback
- remove get_dispatch_registration_table()

* Bump rocprofiler-register version to 0.6.0 + AnyNewerVersion

* Fix output support for rocprofiler-sdk-tool

* Fix formatting

* Fix clang tidy errors

* Misc rocprofiler-sdk-attach fixes

* attach: add sigint handling to attach python

* tool README.md formatting

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

* Fix buffered output issue

* attach: add errors for tool attach

* CI Fixes

* Rework tests

* attach: improve library loading in rocprofv3 attach

* formatting

* Update tests to use pytest framework

* Fix test_attachment_hsa_api_trace

* attach: catch ctypes exceptions

* attach: fix leak in registration

* attach: fix sanitizer tests

* attach: fix sanitizer tests further

* attach: disable attach asan tests

* attach: disable ubsan test

* attach: fix permissions in installed test package

* attach: formatting

---------

Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>
Co-authored-by: Tim Gu <Tim.Gu@amd.com>
Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Benjamin Welton <bwelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-09-18 18:10:45 -05:00
Julian Jose 2d3803da89 Update using-rocprofv3 documentation (#331)
* Update using-rocprofv3 documentation

* Update using-rocprofv3.rst

* Update using-rocprofv3.rst

* Update projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3.rst

Co-authored-by: Swati Rawat <120587655+SwRaw@users.noreply.github.com>

* Update projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3.rst

Co-authored-by: Swati Rawat <120587655+SwRaw@users.noreply.github.com>

---------

Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>
Co-authored-by: Swati Rawat <120587655+SwRaw@users.noreply.github.com>
2025-09-11 12:11:04 +05:30
usrihari123 2449bfd483 Update the scratch memory docs with the new allocation_size field (#685)
* Update the scratch memory docs with the new allocation_size field

* Address review comment

---------

Co-authored-by: Srihari <srihariu1@gmail.com>
2025-08-28 17:37:06 +05:30
Bhardwaj, Gopesh f625253208 SWDEV-544115 Adding documentation for rocprofv3 advanced options (#516)
* SWDEV-544115 Adding documentaiton for rocprofv3 advanced options

* minor changes

* updating rocpd documentation

* updated changelog

* adressed Feedback

[ROCm/rocprofiler-sdk commit: 4120c12ed5]
2025-07-30 22:25:40 +05:30
Gill, Harkirat b88018d24d Update output file fields docs to correctly define Grid_Size (#526)
Co-authored-by: Bhardwaj, Gopesh <Gopesh.Bhardwaj@amd.com>

[ROCm/rocprofiler-sdk commit: e948034c83]
2025-07-22 23:16:01 +05:30
Nagaraj, Sriraksha 28d2a8f5bb [rocprofv3-avail] - Add sample data (#514)
* Add sample data for avail and remove color code for non terminal output

* review comments

* review comments

* add documentation

* test fix

[ROCm/rocprofiler-sdk commit: 2447a85215]
2025-07-22 08:39:59 -07:00
U, Srihari 7243889d6a Add perfetto support for scratch memory (#303)
* Add perfetto support for scratch memory

* Updated tests and docs.

* Update docs data

* Added underflow check

* Record all free events to 0 bytes

* Add format

* Address review comment

* updated tests for scratch memory

* update scratch-memory tests.

[ROCm/rocprofiler-sdk commit: 6f2a5a9646]
2025-07-09 21:05:45 +05:30
Bhardwaj, Gopesh 1f4084c7b5 Adding rocpd documenation (#449)
* Adding rocpd docuemenation

* rocpd format

* CHANGELOG update and indexing

* Fixing links

* format fixes

* fixing table

* major edits

* fixed logical error

* fixing rocprofv3 avail

[ROCm/rocprofiler-sdk commit: 3e43b1f019]
2025-06-17 15:41:53 +05:30
Kandula, Venkateshwar reddy 1562e4573c [DOCS] SWDEV-534589 Update docs with new info in kernel_trace csv output (#438)
* Update docs with new info in kernel_trace csv output and add flag for csv in docs.

* Misc.

---------

Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
Co-authored-by: Vaddireddy, Sushma <Sushma.Vaddireddy@amd.com>
Co-authored-by: Bhardwaj, Gopesh <Gopesh.Bhardwaj@amd.com>

[ROCm/rocprofiler-sdk commit: 1c91774c6a]
2025-06-10 08:20:07 +05:30
Rawat, Swati 3f47d1fffa Doc review (#386)
* doc review

* more updates

* install title

* Update rocprofiler.h

---------

Co-authored-by: srawat <120587655+SwRaw@users.noreply.github.com>
Co-authored-by: Bhardwaj, Gopesh <Gopesh.Bhardwaj@amd.com>

[ROCm/rocprofiler-sdk commit: c255ec5b5c]
2025-05-27 11:28:38 -05:00
Welton, Benjamin cfce653d86 [SDK] Standardize rocprofiler-sdk counter definition YAML schema (#370)
* Convert YAML Format

Convert YAML format and reader to properly read the YAML.

Comparison between output's from the YAML show only changes in ordering
of architectures (and ids).

* Test fixes

* Add script for converting the YAML schema to source/scripts

* Update documentation

* Change the extra counter code block to YAML

* Add missing new line at EOF

* remove name issues

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 33e43e66d3]
2025-05-14 13:31:51 -05:00
Trowbridge, Ian 24f054f509 Fix HIP Streams Duplication Error (#313)
* Fix stream duplication and fixed tests

* Added comments to explain stream.cpp code, change stream nullptr check to occur in update table to prevent readding null stream, simplified hip-streams bin file code, add destroyStreams to hip-streams bin file code

* Removed roctx from CMakeLists.txt

* Updated documentation

* Fix documentation

* Removed update_table for HIP compiler table and updated stream.cpp to remove support for HIP compiler table

* Added runtime initialization check for HIP

* Changed tool name, working on fixing memory management

* Added context for counter collection kernel rename combination

* Changed name from map to set and changed description

* Fix documentation description for group-by-queue

* Merged memory copy and kernel operations onto a single track when on the same stream

* Updated perfetto output to remove hardware information from track name to merge all memory copy and kernel operations on the same stream to the same track:

* Most pr comments addressed

* Added filter for counter collection and removed kernel buffer tracing hack

* Added PR comment fixes

---------

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>

[ROCm/rocprofiler-sdk commit: e626df43eb]
2025-05-01 00:56:15 -05:00
Madsen, Jonathan 871686ad40 [rocprofv3] Use -P for collection period shorthand option (#356)
* [rocprofv3] Use -P for collection period option

- Reserve -p for profiler attachment

* Update changelog

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: d2bde3ce27]
2025-04-27 20:18:26 -05:00
Bhardwaj, Gopesh 20bc0deaf1 doc improvements for 1.0.0 (#367)
* correcting rocprofiler_configure

* Fix indexing order

* doc feedback

[ROCm/rocprofiler-sdk commit: bbe9eab53a]
2025-04-24 17:05:22 +05:30
Bhardwaj, Gopesh 1646c6fdd0 Using miniconda docker (#366)
* Using miniconda docker

* remove sudo

* Remove double install of rocprofiler-docs conda environment

* Fix building docs

* Fix build docs

- Additional system packages

* Using miniforge

* Fixing warning as errors build issue

* cmake formatting

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 024cf0e5e3]
2025-04-23 23:52:03 -05:00
Bhardwaj, Gopesh 93abda4cfd Copilot suggestions (#360)
* Copilot suggestions

* Fixing perfetto links

* correcting default value of agent-index

[ROCm/rocprofiler-sdk commit: 1f1c192a5e]
2025-04-22 20:52:37 +05:30
Nagaraj, Sriraksha 2e7d0b3aec [rocprofv3] signal handler fix (#332)
* rocprofv3: LD_PRELOAD for signal and sigaction

- wrappers around `signal` and `sigaction` to prevent applications which install signal handlers to replace the rocprofv3 signal handlers
- minor tweaks to buffer sizes (use page_size instead of
KiB)

* [DO NOT COMMIT] extra logging

* Switch git submodule url for perfetto

- use GitHub URL as this is more accessible

* Update ring_buffer<Tp>

- account for alignment padding

* Update buffered_output

- track number of bytes stored
- add nullptr checks

* Update tmp_file_buffer

- track number of bytes
- read_tmp_file does not create tmp file if it does not already exist

* Update tmp_file

- add exists member function for checking whether temporary file already exists
- tweak remove() implementation

* Update config.hpp

- add option to enable/disable signal handlers
- add option for minimum_output_bytes

* Make signal, sigaction functions visible

* rocprofv3 tool updates

- chained signals
- override the signal handler(s) installed by the application
- improve cleanup of temporary files
- support minimum output bytes

* Add commandline support

* fixing test

* minor fix

* minor fix

* fix clang issue

* fix

* Adding docs

* review comments

* review changes

* review

* YUV pulldown additions to rocdecode

* More rocdecode changes

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <Jonathan.Madsen@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>

[ROCm/rocprofiler-sdk commit: 87badfbd15]
2025-04-17 21:10:52 -07:00
Bhardwaj, Gopesh 8778237298 doc improvements for 1.0.0 part 2 (#330)
* update installation steps

* Github Issue #50 Adding README's for samples

* Making name change to ROCprofiler-SDK for consistency

* Fix HIP trace documentation

* Fix HSA trace in docs

* Fix kernel trace in docs

* Fixing memory copy and memory allocation traces

* runtime trace and sys trace doc update

* Fix scratch memory doc

* kernel naming and filtering options

* Adding collection period in docs

* Perfetto configs update

* summary output file

* kernel trace format fix

* update CHANGELOG

* Agent index doc update

* rocm-smi output

* group by queue option

* Updated --group-by-queue description

* perfetto visualization

---------

Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>

[ROCm/rocprofiler-sdk commit: ca7cce9e81]
2025-04-15 13:30:07 -07:00
Trowbridge, Ian 5467c83188 rocDecode Buffer Tracing Support (#315)
* Added buffer tracing support for rocdecode and updated tests to work with buffer tracing

* Updated perfetto to output args individually rather than as a string list

* Updated docstrings and operation type, changed OTF2 code to remove warning due to change in operation type

* Updated tests for review comments

* Test args exist and return value

* Updated to use string entry

* Change function name

* Updated PR to reflect review comments

* Updated for PR review comments

* Change function name

[ROCm/rocprofiler-sdk commit: 077723337a]
2025-04-11 21:56:36 +00:00
Bhardwaj, Gopesh 146169577b doc improvements and fixes SWDEV-523395,SWDEV-516979 (#314)
* doc improvements and fixes SWDEV-523395,SWDEV-516979

* Adding changes from PR 231

[ROCm/rocprofiler-sdk commit: 6d6eec230c]
2025-03-26 10:09:08 +05:30
Madsen, Jonathan 85897f3588 [rocprofv3] Support negating aggregate tracing options (#251)
* Support negating aggregate tracing options

- E.g. --runtime-trace --scratch-memory-trace=False

* Add tests

* Update CHANGELOG

* rocprofv3 tweaks

* Added docs update

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Srihari Uttanur <srihari.u@amd.com>

[ROCm/rocprofiler-sdk commit: b01465303b]
2025-03-21 18:22:39 -05:00
Srihari Uttanur b2c0f91aef Add perfetto support for counter collection
Fix endtimestamp for counter tracks

Add fix for rocprofv3 counter collection tests

Fix formats and refactors

Added docs and addressed review comments

Address more review comments.


[ROCm/rocprofiler-sdk commit: c9ca876b79]
2025-03-21 01:41:19 +05:30
Trowbridge, Ian 7aeaffd871 HIP Streams to Queues Translation (#235)
* rocprofiler_stream_id_t: opaque handle for a stream

- e.g. HIP stream
- the same HIP stream may map to different HSA queues at different points in the application
- added to:
  - rocprofiler_buffer_tracing_hip_api_record_t
  - rocprofiler_buffer_tracing_memory_copy_record_t
  - rocprofiler_callback_tracing_hip_api_data_t
  - rocprofiler_callback_tracing_memory_copy_data_t
---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Mark Meserve <mark.meserve@amd.com>
Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Jakaraddi, Manjunath <Manjunath.Jakaraddi@amd.com>
Co-authored-by: Bhardwaj, Gopesh <Gopesh.Bhardwaj@amd.com>
Co-authored-by: Nagaraj, Sriraksha <Sriraksha.Nagaraj@amd.com>
Co-authored-by: U, Srihari <Srihari.U@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>
Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Indic, Vladimir <Vladimir.Indic@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>

[ROCm/rocprofiler-sdk commit: ccd1e54293]
2025-03-14 02:45:13 -07:00
Welton, Benjamin c08db2daa1 [SWDEV-512693] Iteration based counter multiplexing (#272)
Adds iteration based multiplexing to counter collection. Counter groups can now be specified. These counter groups are collected on a device individually until a specified interval period is reached. When the interval is reached, the next counter group is set to be collected on subsequent kernel executions.

Supplies two new argument types that can be included in YAML/JSON inputs:

pmc_groups: an array of arrays containing the counter groups to run (i.e. [ ["SQ_WAVES", "GRBM_COUNT"], ["GRBM_GUI_ACTIVE"])
pmc_group_interval: the number of kernel invocations on a GPU of a group before rotating to the next group

Note: originally there was a random_seed_generator proposed in the linked ticket, that was not implemented since there are very few instances where you would want the selection of the groups to be randomly generated (and if you do, you can randomly generate the pattern and place it as a large list of groups in pmc_group).

All existing counter functionality should be preserved (selection of counters on specific devices only, profiling of only specific kernels, etc).

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>

[ROCm/rocprofiler-sdk commit: aa88dd44c7]
2025-03-14 02:05:36 -07:00
Rawat, Swati f7d1f14c60 Documentation updates (#236)
* Documentation updates

* formatting

* Update using-rocprofv3.rst

* Update counter_collection_services.md

---------

Co-authored-by: srawat <120587655+SwRaw@users.noreply.github.com>

[ROCm/rocprofiler-sdk commit: 31b8f61c8e]
2025-02-28 10:10:26 +05:30
Trowbridge, Ian 74efdad1aa rocJPEG API Tracing (#73)
* rocDecode API Tracing support

* Test bin file added to rocdecode. Need to add validate python methods

* Added option to not make rocDecode tests

* Added rocdecode and rocprofv3 tests

* Added csv test

* Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependancy. Changed cmake option to disbale tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI

* Add option to avoid building rocdecode tests

* Added option to avoid building rocdecode bin file

* Support for rocJPEG API Trace

* Added newline to rocjpeg_version.h

* json-tool code added, initial test/bin commit

* Formatting

* Resolved rocjpeg bin test compilation errors

* Tests implemented. Perfetto module currently resulting in errors, so need to retest whenever it is fixed

* Formatting and compilation errors

* Minor fixes

* Copyright year update and minor fixes

* Doc update fix

* Added rocjpeg csv file in data

* Addresses review comments: Updated fixed Findroc.. and uses root directory as a hint, fixed documentation error, changed tables to use _CORE, minor style fixes

* Added rocdecode and rocjpeg to CI

* Removed rocdecode and rocjpeg from CI and added back build tests option

* Updated Cmake Files

* Added rocDecode and rocJPEG to CI

* Remove cmake line added in error

* Temporarily modified tests to pass if rocdecode or rocjpeg tracing are not supported for CI, cmake changes

* Added find_package for test

* Added back use of system rocDecode and rocJPEG, modifies system files to include prefix path

* Updated no-link to include INCLUDE_DIR/roc(decode|jpeg), added comments for tests

* Resolve merge conflicts and formatting

* Added regex find and replace instead of include for CI

* VAAPI package causing errors on Vega20

* Removed system rocjpeg and rocdecode use temporarily until cmake issues resolved

* Removed workflows regex

* Formatting and minor test modification

* Modified test for vega20

* Update rocDecode and rocJPEG cmake and tests

* Changelog

* Fix merge conflict

* Added back if-statements around add-tests since cmake-generator-expressions are resulting in errors when the packages are missing

* Removed if found statements, replaced with TARGET:EXISTS

* Skip json file for rocjpeg and rocdecode tests if not supported

* Add os import

---------

Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 31fe8858d1]
2025-02-21 13:43:49 -08:00
Kandula, Venkateshwar reddy 7fde16067f Accum_vgpr support in Rocprofv3 (#70)
* output accumulate vgpr count

* fix logic for computing accum_vgpr

* add accum_vgpr to csv.

* accumulation vgpr's docs and support for rocprofv3

* CHANGELOG.md

---------

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>

[ROCm/rocprofiler-sdk commit: 6427fbafc2]
2025-02-12 10:47:46 -08:00
Bhardwaj, Gopesh 9874a65bea output format envs doc update (#173)
[ROCm/rocprofiler-sdk commit: 075d36eb82]
2025-02-11 21:37:12 -06:00
Bhardwaj, Gopesh a8a82d7263 Adding ROCTx usage doc (#159)
* Adding Roctx usage doc

* updated CHANGELOG

* dpc update

* Fixing Related Pages issue

* updating doc

* updating docs

* Adding Resource naming section

* Fixed Formatting

* format fix

* format fix

* Fixing build due to incorrect indentation

[ROCm/rocprofiler-sdk commit: 12508b9521]
2025-02-05 11:04:24 -06:00
Nagaraj, Sriraksha 4282aa31d9 Adding att v3 support (#84)
* Adding att v3 support

* misc fix

* bug fix

* Python linting workflow and rules

* fix regex

* Adding temporary args

* fix temporary args

* fix format

* remove att_perfcounters from test input

* Review comments (#163)

Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>

* Revert "Review comments (#163)"

This reverts commit 9ef0f8e5a4489d5581255e1b70ced2aef5c1c1d0.

* Address review comments 2

* review changes

* review comments

* review

* cmake alias

* review

* review

* review

* review

* Enabling percounter in v3 script

* review

* formatting

* formatting

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Baraldi, Giovanni <Giovanni.Baraldi@amd.com>
Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>

[ROCm/rocprofiler-sdk commit: d4a51e4102]
2025-02-04 04:05:38 -06:00
Trowbridge, Ian 0b0a3d473a Documentation Update to reflect that memory allocation trace records null pointers for free operations (#127)
Update documentation to reflect that nullpointers can be recorded in free memory operations

[ROCm/rocprofiler-sdk commit: 73e72bb088]
2025-01-22 11:20:50 -06:00
Trowbridge, Ian 3e1b8ba4ec rocDecode API Tracing Support (#49)
* rocDecode API Tracing support

* Test bin file added to rocdecode. Need to add validate python methods

* Added option to not make rocDecode tests

* Added rocdecode and rocprofv3 tests

* Added csv test

* Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependancy. Changed cmake option to disbale tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI

* Add option to avoid building rocdecode tests

* Added option to avoid building rocdecode bin file

* Merge conflict error

* CMake files changed in response to review comments. Attempting to implement callbacks.

* Turned off test building for rocdecode

* Minor fixes for review comments

* Review comments

* Updated formatting

* Document changes and format.hpp reversion. Need to remove iterate args support for now for later update.

* Remove iterate args support

* Remove iterate-args

* enforce abi versioning in macro if

* Fix doc error

* removed spaces to fix indentation error

---------

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>

[ROCm/rocprofiler-sdk commit: e307b89ca4]
2025-01-17 14:42:25 -08:00
Jakaraddi, Manjunath 324c57ede1 SWDEV-500520: Updated documentation for hang issue (#79)
* SWDEV-500520: Updated documentation for hang issue

* Avoid fatal error when invalid metric is found

* removing invalid metrics

* clang formatting

[ROCm/rocprofiler-sdk commit: dfee6489b1]
2025-01-16 02:14:22 -08:00
Bhardwaj, Gopesh a4aa2fd14e miscellaneous doc updates (#86)
* miscellaneous doc updates

* updated deprecartion message

* Updated memory allocation tracking documentation

* Update comparing-with-legacy-tools.rst

Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>

* Update comparing-with-legacy-tools.rst

Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>

* Update comparing-with-legacy-tools.rst

---------

Co-authored-by: Ian Trowbridge <ian.trowbridge@amd.com>
Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>

[ROCm/rocprofiler-sdk commit: 81f600e3ba]
2025-01-14 11:17:45 -06:00
Nagaraj, Sriraksha fd9da7dc43 Updating rocprofv3 doc for pc sampling beta option (#59)
* Updating rocprofv3 doc for pc sampling beta option

* Update source/docs/rocprofv3_input_schema.json

* Update using-rocprofv3.rst

---------

Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>

[ROCm/rocprofiler-sdk commit: c509fe799d]
2024-12-06 17:41:28 -06:00
Trowbridge, Ian 792329fefd SWDEV-492625 memory free functions (#11)
* SWDEV-492625: Track free memory HSA functions to help determine total amount of memory allocated on the system at any one time

* Minor fixes to address comments

* Update allocation size description

* Moved get function back to specialization, minor typo fixes

* Removed memory_operation_type field, removed memory_pool allocation enum, converted starting address to hex string for json format.

* Made conversion to hex_string a function, changed address to use union rocprofiler_address_t type, changed VMEM descriptors

* Removed as_hex from the global namespace

* Formatting

* Removed TRACK_EVENT for memory allocation, now TRACK_COUNTER for memory allocation is being performed

* Check if address was recorded before retrieving allocation size in generate Perfetto

* Formatting

* Update source/lib/output/generatePerfetto.cpp

* Explicitly disable app-abort tests

* Remove excluding app-abort test from workflow CI

- redundant bc these tests are explicitly marked as disabled now

---------

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: 79006bb896]
2024-12-06 00:05:30 -06:00
Elwazir, Ammar 90e3a30627 Adding --collection-period feature in rocprofv3 to match v1/v2 parity (#9)
* Adding Trace Period feature to rocprofv3

* Adding feature documentation

* Update source/bin/rocprofv3.py

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Fixing format

* Moving to Collection Period and changing the input params

* Format Fixes

* Fixing rebasing issues

* Removing atomic include from the tool

* Adding more options for units, optimizing the code

* Fixing rocprofv3.py

* Fixing time conv & adding time controlled app

* Fixing format

* Changing to shared memory testing methodology

* use of shmem use

* Fix include headers for transpose-time-controlled.cpp

* Format upload-image-to-github.py

* Removing shmem and using only env var to dump timestamps from the tool

* Tool Fixes + Test Config

* Adding Tests

* Fixing Review comments

* Update trace period implementation

* Update trace period tests

* check between start and stop timestamps

* Merge Fix

* Update validate.py

* Improve safety of rocprofiler_stop_context after finalization

* Pass context id to collection_period_cntrl by value

* Adding 20 us error margin

* Ensure log level for collection-period test is not more than warning

---------

Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

[ROCm/rocprofiler-sdk commit: a579c70b71]
2024-12-06 02:17:24 +00:00
Bhardwaj, Gopesh 9ef758936f updating roctx documentation for functions (#30)
updating roctx documentation for funcitons

[ROCm/rocprofiler-sdk commit: 6d2e70d8da]
2024-12-05 19:47:57 +05:30
Nagaraj, Sriraksha e67918dfe8 rocprofv3: rocprofv3-avail tool (#15)
* support avail tool

Updating avail library and script

Listing on Std output incase the output folder is not given

Extending list metrics test

misc fix

misc fix

fixing memory leak

changing list-metrics to list-avail

fixing formatting issue

Fixing CMakeLists

Add test for list avil with trace

Fix test fail

clang tidy errors fixed

Removing build commands for rocprofv3-trigger-list

Addressing review changes

addressing review comment

moving avail to libexec

merge fix

Fix test failures

updating doc

Fix doc error

* updating legacy doc

* fix formatting issue

* Addressing review comments

[ROCm/rocprofiler-sdk commit: c42bdc3128]
2024-12-04 18:34:10 -06:00
Nagaraj, Sriraksha 9cfbfb5060 rocprofv3: PC Sampling Support (#14)
* Adding tool pc sampling support

Fixing merge issue

tool support on SDKupdates

link amd-comgr

Sanitizer failure fix

fix format

Addressing review comments

misc fix

Adding dispatch id to the CSV output

AddingCHANGELOG

[ROCProfV3][PC Sampling] Initial ROCProfV3 PC sampling tests for JSON and CSV formats (#17)

ROCProfV3 initial tests for JSON and CSV output.

Simple kernels that simplify the verification of samples to instruction decoding
has been introduced.

removing option to enable pc sampling explicitly

Adding documentation

no pc-sampling option in tests anymore

Addressing review comments

Updating docs

an option for choosing whether all units must be sampled

try ignoring PC sampling tests (#36)

* run pc-sampling tests on MI2xx runners
* use v_fmac_f32 instead of s_nop 0 in tests

* fixing docs

[ROCm/rocprofiler-sdk commit: 50b185b9ac]
2024-12-04 18:32:48 -06:00
Benjamin Welton 39db3e8a1d Add rocprofiler_load_counter_definition (#1193)
Adds rocprofiler_load_counter_definition. This function allows a counter definition file to be supplied to rocprofiler-sdk directly. Takes in a string containing the counter definition YAML, its size (in bytes), and a flag value to state whether this is an append operation or not.

---------

Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: usrihari123 <srihari.u@amd.com>

[ROCm/rocprofiler-sdk commit: 7ddc72ad45]
2024-11-22 01:55:47 -08:00
Gopesh Bhardwaj 5bea1772ea SDK doc updates (#1183)
* correcting usage example

* rccl trace

* Adding Navi power state limitation

* Addressed feedback

* kernel-rename

* kokkos trace

* more information on kookos tracing

* Corecting tool library hardcoding

* summary domains

* Updating domain stats file

* updating images

* rocprofv3 default behavior update

* Removing README from API documentation

* Added missing description in Topics

* Fixed wrong rendering of README in API document

* Fixing Topics in API docs

* Removing API doc for details/rccl.h

* Addressed review comments

[ROCm/rocprofiler-sdk commit: 7ea9ced493]
2024-11-22 12:05:11 +05:30
itrowbri 94f4f56c40 Memory Allocation Tracking (#1142)
* Initial commit: Need to implement wrapper function to collect data and test that wrapper function is correctly replacing core HSA functions

* Attempted to implement wrapper implementation for hsa memory allocation functions. Need to modify generate record files and test if implementation is working as expected

* Debugging and implementing generateCSV function

* Memory allocation size and starting address outputted to csv and json file formats

* Formatting

* Initial setup for OTF2 and Perfetto generation

* Collecting agent id for memory_allocation and formatting

* Modified memory_allocation.cpp to set up code for AMD_EXT commands

* Support for memory_pool_allocate added

* Removed accidently added file

* Made flag optional and added more OTF2 and Perfetto code. Needs testing to ensure perfetto and OTF2 works

* Formatting

* Fixed perfetto and otf2 output

* Fixed flag issue due to incorrect buffer use

* Updated documentation

* Small cleaning and comments

* Added test for HSA memory allocation tracing

* Fixed summary test validation errors due to allocation tracing. Added type to location_base to create unique event ids for allocation due to OTF2 trace error

* Decreased lower limit of hip calls for test

* Modified summary tests to vary number of allocate requests

* Minor fixes to address comments. Still need to address OTF2 comments

* Fix docs and changed OTF2 to use enum for type specified in location_base construction

* Fixed schema error

* Added vmem command tracking. Need to add test

* Updated test to work with vmem command and updated generateCSV to output int instead of hex string.

* OTF2 enum update and mispelling fix

* CI does not support Virtual Memory API. Removed vmem test. Will add back if CI is modifed to suport vmem API

* Update CMakeLists.txt for memory allocation test

* Updated summary test

* Minor fixes to address comments

* Moved domain_type.hpp enum to before LAST

* Fixed compile errors and formatting

* Fixed stats summary domain name error

* Added rocprofv3 test

* Page migration test fix

* Undo page migration test changes. Failures do not appear to have to do with memory allocation

[ROCm/rocprofiler-sdk commit: 3bd7773cf7]
2024-11-18 20:22:14 -06:00
srawat d8cfdd2887 Refactor API reference docs (#1125)
* Refactor API reference docs

* refactor API ref docs

* corrections

* consistent naming

* updates

* Update CHANGELOG.md

* improving SEO

* improving SEO

* Update using-rocprofv3.rst

* Update counter_collection_services.md

* Update using-rocprofv3.rst

* Fixing doc build errors

* changelogs and some formatting issues

---------

Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>

[ROCm/rocprofiler-sdk commit: 4204042ac6]
2024-10-30 19:39:08 +05:30
Manjunath P Jakaraddi fa35b62708 SWDEV-493402: Changing the header format for counter collection (#1148)
* SWDEV-493402: Changing the header format for counter collection

* Adding column widths

[ROCm/rocprofiler-sdk commit: f087debe84]
2024-10-29 10:11:29 +05:30
Jonathan R. Madsen 651e28f7f0 rocprofv3: support specifying HW counters via command line (#1130)
* rocprofv3: support specifying PMC counters via command line

- E.g. `rocprofv3 --pmc SQ_WAVES -- <app>`

* Update CHANGELOG

* updated rocprofv3 help and documentation

---------

Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>

[ROCm/rocprofiler-sdk commit: d93d4d9463]
2024-10-25 02:49:30 -05:00
Gopesh Bhardwaj 63c5f87b85 rocprofv3: docs and help menu updates (#1129)
* doc updates

* Correcting ROCtx information

* Making ROCTx string consistent

* missing occurence

[ROCm/rocprofiler-sdk commit: 320427b5f5]
2024-10-17 13:28:53 +05:30
Manjunath P Jakaraddi ad4b331279 Adding start and end timestamp columns in csv (#1128)
* Adding start and end timestamp columns in csv

* Adding assert check for the counter timestamps

---------

Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>

[ROCm/rocprofiler-sdk commit: 673c21c6db]
2024-10-17 11:31:00 +05:30
Gopesh Bhardwaj d7aa58e291 comparing tool options in rocprof/rocprofv2/rocprofv3 (#1050)
* comparing tool options in rocprof/rocprofv2/rocprofv3

* Added more perfetto options

* Added new summary optipons

* Added Category for all options

* Apply suggestions from code review

Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>

* changes requested in SWDEV-484472

* Correct broken table formatting

* detailed explanation on json custom format

* Feedback Resolution

---------

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>

[ROCm/rocprofiler-sdk commit: 863b608ae3]
2024-09-16 20:08:11 +05:30