Commit Graph

423 Commits

Author SHA1 Message Date
Kuricheti, Mythreya f3ea8b1178 Fix fold expr (#389) 2025-05-09 14:18:19 -07:00
Kuricheti, Mythreya eaf3bbceb7 Add enum to string cxx utils (#153)
* Add enum to string cxx utils

* Add license header

* Address review comments

* Add version assertions

* Address review comments

* Fix thread trace header include

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-05-05 21:02:13 -05:00
Welton, Benjamin 65f60bbb96 Remove unnecessary log line (#387)
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-05-02 12:25:32 -07:00
Trowbridge, Ian e626df43eb Fix HIP Streams Duplication Error (#313)
* Fix stream duplication and fixed tests

* Added comments to explain stream.cpp code, change stream nullptr check to occur in update table to prevent readding null stream, simplified hip-streams bin file code, add destroyStreams to hip-streams bin file code

* Removed roctx from CMakeLists.txt

* Updated documentation

* Fix documentation

* Removed update_table for HIP compiler table and updated stream.cpp to remove support for HIP compiler table

* Added runtime initialization check for HIP

* Changed tool name, working on fixing memory management

* Added context for counter collection kernel rename combination

* Changed name from map to set and changed description

* Fix documentation description for group-by-queue

* Merged memory copy and kernel operations onto a single track when on the same stream

* Updated perfetto output to remove hardware information from track names so that all memory copy and kernel operations on the same stream share one track

* Most pr comments addressed

* Added filter for counter collection and removed kernel buffer tracing hack

* Added PR comment fixes

---------

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
2025-05-01 00:56:15 -05:00
Madsen, Jonathan 4f03ebc360 [CI] Fix code coverage and thread sanitizer workflows (#378)
* Fix code coverage workflow

* Relocate rocprofv3 conversion test script + rename tests

- these are rocprofv3 tests and were not properly located and not properly named

* Fix thread sanitizer

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-04-28 10:19:04 -05:00
Madsen, Jonathan 032c06db9a Fix evaulate_ast.evaulate_hybrid_counters test (#377)
- broken by #348

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-04-27 22:42:10 -05:00
Madsen, Jonathan 3580478426 Build system (libdw), correlation ID, and shebang fixes (#354)
* Fix compilation for output library

- link to targets for ATT (amd-comgr, dw, elf)

* Relax correlation ID retirement log failures

- only fail for correlation ID retirement underflow when building in CI mode

* Fix shebang for several files

- license was inserted before shebang in several places

* Update code coverage exclude folders for samples

* Tweak to agent tests

- test to make sure hsa agent is not the old value instead of testing that it is the new value

* Fix libdw include/link

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-04-27 20:16:18 -05:00
Kandula, Venkateshwar reddy 06005c7f6b [SDK] SWDEV-524163 - Add error msg when accumulate is used on counters not from sq block. (#348)
* add error log when accumulate is used on counters not from sq block.

* Address comments.

---------

Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
2025-04-27 20:03:27 -05:00
Meserve, Mark 61307442f0 [SDK] Fix for hang in hsa_barrier (#316)
* SWDEV-515895: Fix for hang in hsa_barrier

- Root cause of hang is enqueue_packet using a completed barrier
  - This occurs because set_barrier does not correctly mark the barrier
    as complete.
- Fixed by consolidating duplicate code for how a barrier is:
  - Marked as complete
    - Added common function and trace message
  - Checked as complete
    - Removed internal completion tracking variable in favor of signal
- Added some log messages which were useful in tracking this hang
- Destroying an hsa_barrier now calls the provided completion callback

* formatting

* Fix hsa_barrier test

- Removes assumption that enqueue_packet always generates packets
- Add assertion that a complete barrier does not generate packets

* formatting

* Address review comments
2025-04-27 19:55:32 -05:00
Baraldi, Giovanni a8f3397069 SWDEV-528686: ATT fix for gfx12 s_wait_idle. Fixes for csv. Default to parse to trace. Fix for ROCR_VISIBLE_DEVICES. (#345)
* Fix for gfx12 s_wait_idle. Added wait field on att.csv

* Format and default to ATT to trace

* Update .mds

* No fatal error for invalid agent

* Tidy fixes

* Rename wait to idle, removed unneeded headers

* Remove unused traceID

* Tidy fix

* Fix csv output

* Formatting

* Fix tests

* Fix tests

* Fix for visible devices

* Review comment: Fix cmake

* Review suggestion

* Remove changelog/readme

* Review comments

* Review comment for CSV

* Formatting

---------

Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
2025-04-25 11:49:16 -05:00
Vaddireddy, Sushma d1e9e3917e Grid dimensions update (#329)
* Grid dimensions update

* minor edit

* CI compilation fix

* Adding target_compile_definitions

* Updating changes as per suggestions

* Skip checks for older grid dimension values

---------

Co-authored-by: Sushma Vaddireddy <svaddire@amd.com>
2025-04-21 15:33:37 -05:00
Elwazir, Ammar 8baea19df7 Older Glibc doesn't have gettid (#349)
* Older Glibc doesn't have gettid

* Format fix

* Using internal common get_tid

---------

Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
2025-04-21 10:22:50 -05:00
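
The portability problem behind commit 8baea19df7 above is that glibc only gained a `gettid()` wrapper in version 2.30, so code calling it directly fails to build against older C libraries. A minimal sketch of the usual workaround follows, assuming a Linux target. The helper name `portable_gettid` is hypothetical and is not the SDK's internal `get_tid`, whose implementation the commit message does not show.

```cpp
#include <cassert>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not the SDK's get_tid): asks the kernel for the
// calling thread's ID through the raw syscall interface, which is available
// regardless of whether the installed glibc ships a gettid() wrapper.
inline pid_t portable_gettid()
{
    const long tid = ::syscall(SYS_gettid);
    // The gettid syscall is documented to always succeed.
    assert(tid > 0);
    return static_cast<pid_t>(tid);
}
```

Routing every caller through one helper like this keeps the glibc-version dependency in a single place.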
Nagaraj, Sriraksha 87badfbd15 [rocprofv3] signal handler fix (#332)
* rocprofv3: LD_PRELOAD for signal and sigaction

- wrappers around `signal` and `sigaction` to prevent applications that install signal handlers from replacing the rocprofv3 signal handlers
- minor tweaks to buffer sizes (use page_size instead of KiB)

* [DO NOT COMMIT] extra logging

* Switch git submodule url for perfetto

- use GitHub URL as this is more accessible

* Update ring_buffer<Tp>

- account for alignment padding

* Update buffered_output

- track number of bytes stored
- add nullptr checks

* Update tmp_file_buffer

- track number of bytes
- read_tmp_file does not create tmp file if it does not already exist

* Update tmp_file

- add exists member function for checking whether temporary file already exists
- tweak remove() implementation

* Update config.hpp

- add option to enable/disable signal handlers
- add option for minimum_output_bytes

* Make signal, sigaction functions visible

* rocprofv3 tool updates

- chained signals
- override the signal handler(s) installed by the application
- improve cleanup of temporary files
- support minimum output bytes

* Add commandline support

* fixing test

* minor fix

* minor fix

* fix clang issue

* fix

* Adding docs

* review comments

* review changes

* review

* YUV pulldown additions to rocdecode

* More rocdecode changes

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <Jonathan.Madsen@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-04-17 21:10:52 -07:00
Baraldi, Giovanni 46818b0167 SWDEV-527202: Moving ATT to experimental (#335)
* Moving ATT to experimental

* Formatting + rebase

* Addressing review comments

* Formatting

* Update source/lib/att-tool/waitcnt/analysis.cpp

---------

Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
2025-04-17 14:43:15 -05:00
Trowbridge, Ian d7c903654e Add buffer tracing header for rocdecode.hpp (#340)
* Add buffer tracing header for rocdecode.hpp

* Add new line for formatting
2025-04-16 13:07:44 -05:00
Kandula, Venkateshwar reddy 235a148ce6 [SDK] remove HIP_MEMSET_NODE_PARAMS for HIP step version to 13 (#343)
* Update api_args.h hip_memset_node_params to version 13

* Update format.hpp move version to 13 hip_memset_node_params
2025-04-16 10:34:48 -07:00
Kandula, Venkateshwar reddy 6cea857c6a [SDK] remove HIP_MEMSET_NODE_PARAMS for HIP step version >= 12. (#334)
* remove HIP_MEMSET_NODE_PARAMS for HIP step version >= 12.

* remove extra ifdef

* format.

---------

Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
2025-04-15 14:05:15 -07:00
Madsen, Jonathan c5a3edc3fa [Misc] Rework header includes (#311)
* Update header file includes

* Fix includes for lib/rocprofiler-sdk/hip/hip.hpp

* Minor touch ups

* Minor include improvements

* Doxygen tweak

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-04-15 14:02:12 -07:00
Kandula, Venkateshwar reddy bc85151a51 SWDEV-524130: add reduced sum counters for new mi355 counters (#328)
add derived counter for td, tcc, ta.

Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
2025-04-15 13:37:28 -07:00
Welton, Benjamin f143333df0 Add SerializedAtomicRatio counter (#327)
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-04-15 13:36:34 -07:00
Trowbridge, Ian 4fbcfd142c Copyright Compliance (#333)
* Added copyright information to requested files

* Formatting

* Fix bad function name error
2025-04-14 13:07:32 -05:00
Trowbridge, Ian 077723337a rocDecode Buffer Tracing Support (#315)
* Added buffer tracing support for rocdecode and updated tests to work with buffer tracing

* Updated perfetto to output args individually rather than as a string list

* Updated docstrings and operation type, changed OTF2 code to remove warning due to change in operation type

* Updated tests for review comments

* Test args exist and return value

* Updated to use string entry

* Change function name

* Updated PR to reflect review comments

* Updated for PR review comments

* Change function name
2025-04-11 21:56:36 +00:00
Kandula, Venkateshwar reddy a7f96dde29 SWDEV-524130: add missing mi355 counters and derived counters (#323)
* add missing counters from public doc.

* add reduce sum counter for mi355 tcc, tcp, ta.

---------

Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
2025-04-02 09:44:57 -07:00
Baraldi, Giovanni 48c672e23e SWDEV-523436: Fix logging of code object id=0. Add perfevent test. (#318)
* SWDEV-523436: Fix logging of code object id=0. Add perfevent test.

* Apply suggestions from code review

---------

Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
2025-03-31 10:34:53 -07:00
Meserve, Mark a1fcdf7f83 Additional 1.0.0 changes (#317)
* Additional 1.0.0 changes

- Update VERSION
- Add beta compatibility for rocprofiler_agent_set_profile_callback_t

* Fix location of deprecated typedef rocprofiler_agent_set_profile_callback_t

* rocprofiler_record_counter_t -> rocprofiler_counter_record_t

* Experimental + deprecated annotations

* rocprofiler_record_dimension_info_t -> rocprofiler_counter_record_dimension_info_t

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-03-26 02:12:03 -05:00
Welton, Benjamin 4cd121e27b [SDK] Release 1.0 Public API Modifications (#277)
* Make sure all structs/enums can be forward declared

* Updates to counter collection

- consistency updates and cleanup

* Conversion of dimension information to info struct

* Added deprecated folder

* Testing changes

* merge changes

* Fix shadowed variable

* Source code formatting

* Fix shadowed variable

* Update rocprofiler_counter_info_v1_t member names

* Split version.h into version.h and ext_version.h

- ext_version.h contains external version info, e.g. ROCPROFILER_HSA_API_TABLE_MAJOR_VERSION, ROCPROFILER_HSA_RUNTIME_VERSION
- this reduces amount of recompilation after a commit since version.h gets updated with the git revision

* profile_config -> counter_config

* EOF new line

* [Samples] Reduce header includes + reorg counter collection samples

* Misc compilation fixes

- shadowed variables
- use of [[deprecated("...")]] in C code
- unused variables

* Minor misc modifications

- use common:: instead of rocprofiler::common:: when inside rocprofiler namespace
- counters.cpp
  - move local anon namespace functions into rocprofiler::counters:: anon namespace
  - use std::string_view for get_static_string
  - const ref for get_static_ptr
  - misc namespace shortening

* [Public API] rocprofiler_get_version_triplet + rocprofiler_version_triplet_t

- struct rocprofiler_version_triplet_t containing fields for the major, minor, and patch version
- public API function: rocprofiler_get_version_triplet
- define C++ operators for rocprofiler_version_triplet_t
- C++ function compute_version_triplet

* [Tests] Improve async-copy-testing test

- relax constraints
- improve logging

* Update counter_config.h doxygen docs

* ROCPROFILER_SDK_BETA_COMPAT

- ppdef which helps with renaming when set to 1

* Remove spurious include

* Fix includes for cxx/version.hpp

* Doxygen fixes for rocprofiler_get_version and rocprofiler_get_version_triplet

* Public API Experimental Designation

- ROCPROFILER_SDK_EXPERIMENTAL added to experimental function
- "(experimental)" added to doxygen @brief entries

* Fix use of assert instead of static_assert in hip/stream.cpp

* Use typedef instead of define for rocprofiler_profile_config_id_t

* Use inline rocprofiler_{create,destroy}_profile_config instead of ppdef

- added <rocprofiler-sdk/deprecated/profile_config.h>

* Doxygen for rocprofiler_{create,destroy}_profile_config

* ROCPROFILER_SDK_DEPRECATED_WARNINGS

* Temporarily comment out ROCPROFILER_SDK_DEPRECATED_WARNINGS=1

* cmake formatting

* Misc variable renaming in samples and tests

* Fix declarations of types

* Fix hip stream tracing service struct name

- rocprofiler_callback_tracing_stream_handle_data_t renamed to rocprofiler_callback_tracing_hip_stream_api_data_t

* Rename "HIP_STREAM_API" to "HIP_STREAM"

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-03-24 12:07:33 +05:30
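
Commit 4cd121e27b above mentions a version triplet struct with major, minor, and patch fields, C++ operators for it, and a function that computes the triplet. The sketch below illustrates that idea only. The type and function names are hypothetical stand-ins, and the packed encoding (10000 * major + 100 * minor + patch) is an assumption that the commit message does not confirm.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for a major/minor/patch struct; not the SDK type.
struct version_triplet
{
    std::uint32_t major;
    std::uint32_t minor;
    std::uint32_t patch;
};

// Assumed encoding (not confirmed by the commit):
//   value = 10000 * major + 100 * minor + patch
constexpr version_triplet decode_version(std::uint32_t value)
{
    return {value / 10000, (value % 10000) / 100, value % 100};
}

// Lexicographic comparison over (major, minor, patch).
inline bool operator==(const version_triplet& a, const version_triplet& b)
{
    return std::tie(a.major, a.minor, a.patch) == std::tie(b.major, b.minor, b.patch);
}

inline bool operator<(const version_triplet& a, const version_triplet& b)
{
    return std::tie(a.major, a.minor, a.patch) < std::tie(b.major, b.minor, b.patch);
}

// Sanity checks for the sketch; call from a test or from main().
inline void check_version_sketch()
{
    assert((decode_version(10203) == version_triplet{1, 2, 3}));
    assert(decode_version(10000) < decode_version(10001));
}
```

Comparing through `std::tie` keeps the ordering rule in one expression, which makes it hard for the operators to drift apart.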
Trowbridge, Ian cd4676ae6f [SDK] Callback Tracing Iterate Args Support for rocDecode (#294)
Callback tracing for rocdecode created
2025-03-23 19:15:30 -05:00
Madsen, Jonathan e33dff7ad0 [SDK][rocprofv3] Buffer tracing records with args (HIP) (#285)
* [SDK][rocprofv3] HIP API buffer records with args (ext)

- New buffer tracing domain(s) for HIP APIs which include the arguments and the return value in the buffer records
- Update HIP stream support for extended HIP buffer tracing
- Update rocprofv3 tool library and output library to use extended HIP buffer tracing records

* Update stream.cpp

- handle hipStream_t address being reused for a new stream

* Update doxygen docs for rocprofiler_iterate_buffer_tracing_record_args

* Update rocprofv3 tool.cpp

- configure buffer tracing services with HIP_*_API_EXT variants
- tweak logging level for hip_stream_display_callback

* Fix validation tests

- add HIP_RUNTIME_API_EXT and HIP_COMPILER_API_EXT to valid domain names

* Serialization support for buffer tracing args

* Disable stream service for __hipPopCallConfiguration

- this is interpreted as a stream create but it doesn't create a stream

* Fix execute_buffer_record_emplace for HIP extended contexts

* Add uint64_t_retval to rocprofiler_hip_api_retval_t union

- reading in hipError_t_retval during serialization of pointer return value causes undefined behavior

* Fix compilation warning about unused but set parameter

- in hip/stream.cpp

* Add synchronization for async_copy_data

* Fix compilation error

* Fix compilation error

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-03-22 19:57:32 -05:00
Madsen, Jonathan 2d072f9217 [CI] Miscellaneous Testing Updates (#305)
* Add rocprofiler-sdk-utilities.cmake

- contains cmake function rocprofiler_sdk_get_gfx_architectures

* Update perfetto_reader.py

- fix hash collision

* Update project names in tests folders

- rocprofiler-tests -> rocprofiler-sdk-tests

* Fix incorrect allocation-error handling

* [CI] Disable openmp tests for navi2, navi3, and navi4

* Suppress leaks by omptarget and llvm

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-03-22 18:51:42 -05:00
Indic, Vladimir 49ce79a5b5 [SDK][rocprofv3] MI300 Stochastic PC sampling (#92)
* MI300 Stochastic PC sampling SDK API implementation

* ROCProfV3: Stochastic PC sampling Support (#94)

* ROCProfV3: MI300 Stochastic PC sampling initial draft

* ROCProfV3: Initial Stochastic PC sampling Tests (#95)

ROCProfV3: Initial Stochastic PC sampling tests

* Update rocprofiler_pc_sampling_record_stochastic_v0_t

- update doxygen docs for members
- replace rocprofiler_correlation_id_t with rocprofiler_async_correlation_id_t

* Relax the check in JSON tests

* drain PC sampling buffer during finalize_rocprofv3

* Increase timeout for "Test Install Build" step

- 10 minutes -> 20 minutes
- "Test Installed Packages" has 20 minutes, so "Test Install Build" should as well

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-03-21 14:40:45 -05:00
Bhardwaj, Gopesh c06feccf2a Potential fix for code scanning alert no. 24: Use of potentially dangerous function (#220)
* Potential fix for code scanning alert no. 24: Use of potentially dangerous function

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* clang-format fix

* use std::localtime_r instead of localtime.

Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>

* localtime_r is defined in global namespace.

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
2025-03-21 14:21:49 +05:30
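
The change in commit c06feccf2a above rests on two facts: `localtime` returns a pointer to shared static storage, so concurrent callers can overwrite each other's result, while POSIX `localtime_r` writes into a buffer supplied by the caller. As the last bullet notes, `localtime_r` is declared in the global namespace, not in `std`. A minimal sketch, assuming a POSIX system; the helper name `format_local_time` is hypothetical and does not come from the SDK.

```cpp
#include <cassert>
#include <ctime>
#include <string>
#include <time.h>  // POSIX declaration of localtime_r

// Hypothetical helper: formats a time_t as "YYYY-MM-DD HH:MM:SS" in the
// local time zone without touching the static buffer used by localtime().
inline std::string format_local_time(std::time_t t)
{
    std::tm buf{};
    // localtime_r fills the caller-provided buffer and returns nullptr on error.
    if(::localtime_r(&t, &buf) == nullptr) return {};

    char              out[32];
    const std::size_t n = std::strftime(out, sizeof(out), "%Y-%m-%d %H:%M:%S", &buf);
    // strftime returns the number of bytes written, excluding the terminator.
    assert(n < sizeof(out));
    return std::string(out, n);
}
```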
Kuricheti, Mythreya f27f76716e [SDK] Add Stack IDs (#269)
* Add Stack IDs

* Add memcpy test

* Add async corr id record

* Async events use `rocprofiler_async_correlation_id_t`
* Sync events use `rocprofiler_correlation_id_t`

* Update ATT to use async IDs

* Review comments
2025-03-21 00:52:48 -05:00
Vaddireddy, Sushma ae0db8cee5 [SDK] Model Name fix for rocprofiler_lib.agent (#298)
* Model Name fix for rocprofiler_lib.agent

* fixing format

* formatting source

* Adding comments and example

---------

Co-authored-by: Sushma Vaddireddy <svaddire@amd.com>
2025-03-20 22:06:53 -05:00
Madsen, Jonathan 66e9dc54e9 [SDK] Memory copy src and dst addresses (#282)
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-03-20 21:10:19 -05:00
Kuricheti, Mythreya 6b6e17973f [CI] Disable debug annotations for page-migration test (#291)
fix: Disable debug annotations in test

Fixup of PR: disable perfetto debug annotations in json tool
2025-03-20 20:55:26 -05:00
Madsen, Jonathan 91f7f42104 [SDK] Update finalization and correlation ID retirement (#281)
* Update finalization and correlation ID retirement

- directly invoke finalize if only one client
- correlation_id_finalize

* Address PR comments

* Improve logging for correlation_id_finalize

* Fix correlation ID handling in memory allocation service

* Fix clang-tidy issues in hsa-memory-allocation test exe

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-03-20 16:59:23 -05:00
Baraldi, Giovanni b21452ec11 Fix for ATT codeobj table initialization (#290)
* Fix for codeobj HSA table order

* Fix tests

* Format

---------

Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
2025-03-20 14:27:46 -07:00
Srihari Uttanur c9ca876b79 Add perfetto support for counter collection
Fix endtimestamp for counter tracks

Add fix for rocprofv3 counter collection tests

Fix formats and refactors

Added docs and addressed review comments

Address more review comments.
2025-03-21 01:41:19 +05:30
Madsen, Jonathan bcc15a28d0 [rocprofiler-sdk-att] Minor cmake update (#283)
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2025-03-20 10:57:35 -05:00
Vaddireddy, Sushma 09c7d44cc4 MI355X Support - PC Sampling and updating counter_defs.yaml (#206)
* Update mi350/gfx950 counter_defs.yaml (#131)

* Update gfx950 counter_defs.yaml

* Update F8 MFMA for gfx950

* Update counter_defs.yaml

* Update counter_defs.yaml

* add simd_util counter

* add new rdc ops gfx950

* Update counter_defs.yaml

* New mi350 CPC counters

* Update counter_defs.yaml

* New mi350 spi counters

* Update new mi350 sq counter_defs.yaml

* Update TA counter_defs.yaml

* Update TD GFX950 counter_defs.yaml

* Update TCP gfx950 counter_defs.yaml

* Update new gfx950 tcc counter_defs.yaml

* Update TCP_PENDING_STALL_CYCLES counter_defs.yaml

* MI355X Host-Trap PC sampling Support (#130)

* Adding gfx12 to CU_NUM

* Add ELFABIVERSION_AMDGPU_HSA_V6

* add gfx950 to TEST_YAML_LOAD metric

* add gfx950 to append counters tests

* Updated CHANGELOG.md

---------

Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Indic, Vladimir <Vladimir.Indic@amd.com>
Co-authored-by: Bhardwaj, Gopesh <Gopesh.Bhardwaj@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
2025-03-17 15:20:40 -05:00
Baraldi, Giovanni 821918a512 SWDEV-516846: Fix serialization services conflicts and ATT counter streaming (#230)
* Update TT API

* Rework serialization

* update att_core

* Fix tests

* Fix tool

* Formatting

* Fix perfcounter

* Formatting

* Rename agent TT

* Format

* Workaround for codeQL alert

* Tidy fix

* Fix compiler error

* Tidy

* Fix some tests

* Fixing some tests

* formatting

* Fixing ATT serialization

* Format

* Fix test commandline

* Fixing init order

* Format

* Tidy fixes

* Removing unused sample

* Fix tests and schema

* Added ATT + PMC test

* Fix mode

* Fix file mode

* Review comments

* Fix typo

* Review comments

* Review comments

* Fix missing id inc after review comment

* Review comments

* Suggested Fixes

* Testing changes

* Test fix

* Build fixes

* Minor build fix

---------

Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>
2025-03-14 18:11:10 -07:00
Trowbridge, Ian ccd1e54293 HIP Streams to Queues Translation (#235)
* rocprofiler_stream_id_t: opaque handle for a stream

- e.g. HIP stream
- the same HIP stream may map to different HSA queues at different points in the application
- added to:
  - rocprofiler_buffer_tracing_hip_api_record_t
  - rocprofiler_buffer_tracing_memory_copy_record_t
  - rocprofiler_callback_tracing_hip_api_data_t
  - rocprofiler_callback_tracing_memory_copy_data_t
---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Mark Meserve <mark.meserve@amd.com>
Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Jakaraddi, Manjunath <Manjunath.Jakaraddi@amd.com>
Co-authored-by: Bhardwaj, Gopesh <Gopesh.Bhardwaj@amd.com>
Co-authored-by: Nagaraj, Sriraksha <Sriraksha.Nagaraj@amd.com>
Co-authored-by: U, Srihari <Srihari.U@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>
Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Indic, Vladimir <Vladimir.Indic@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-03-14 02:45:13 -07:00
Welton, Benjamin aa88dd44c7 [SWDEV-512693] Iteration based counter multiplexing (#272)
Adds iteration based multiplexing to counter collection. Counter groups can now be specified. These counter groups are collected on a device individually until a specified interval period is reached. When the interval is reached, the next counter group is set to be collected on subsequent kernel executions.

Supplies two new argument types that can be included in YAML/JSON inputs:

pmc_groups: an array of arrays containing the counter groups to run (e.g. [["SQ_WAVES", "GRBM_COUNT"], ["GRBM_GUI_ACTIVE"]])
pmc_group_interval: the number of kernel invocations on a GPU of a group before rotating to the next group

Note: the linked ticket originally proposed a random_seed_generator. It was not implemented because there are very few cases where the group selection should be random, and in those cases the pattern can be generated randomly ahead of time and supplied as a large list of groups in pmc_groups.

All existing counter functionality should be preserved (selection of counters on specific devices only, profiling of only specific kernels, etc).

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-03-14 02:05:36 -07:00
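
The rotation described in commit aa88dd44c7 above (collect one counter group until the interval is reached, then switch to the next) can be sketched as a small state machine. This is an illustration only: `group_rotator` is a hypothetical type, not part of the rocprofiler-sdk API. It assumes the interval counts kernel dispatches and that rotation wraps around to the first group after the last one, which the commit message does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration type; not part of the rocprofiler-sdk API.
class group_rotator
{
public:
    using group_t = std::vector<std::string>;

    // groups   ~ pmc_groups         (list of counter groups)
    // interval ~ pmc_group_interval (dispatches per group before rotating)
    group_rotator(std::vector<group_t> groups, std::size_t interval)
    : m_groups{std::move(groups)}
    , m_interval{interval}
    {
        assert(!m_groups.empty() && m_interval > 0);
    }

    // Returns the group to collect for the next kernel dispatch, then
    // advances to the following group once `interval` dispatches have
    // used the current one. Wrap-around is an assumption of this sketch.
    const group_t& next()
    {
        const group_t& current = m_groups[m_index];
        if(++m_count == m_interval)
        {
            m_count = 0;
            m_index = (m_index + 1) % m_groups.size();
        }
        return current;
    }

private:
    std::vector<group_t> m_groups;
    std::size_t          m_interval;
    std::size_t          m_index = 0;
    std::size_t          m_count = 0;
};
```

With the two example groups from the commit message and an interval of 2, the first two dispatches would collect `SQ_WAVES` and `GRBM_COUNT`, and the next two would collect `GRBM_GUI_ACTIVE`.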
Welton, Benjamin 007285272b [SWDEV-518071] Return HSA not loaded status (device counter collection) (#242)
* [SWDEV-518071] Return HSA not loaded status (device counter collection)

This is a state that a caller would want to know about to understand if
they got no counters because of a failure or if they were trying to
collect counters too early (as is the case in the sample, which can
attempt to collect counters before HSA is inited).

* Minor edit

* format

* [SWDEV-518081] Simplify Metric Loading (#243)

* [SWDEV-518071] Return HSA not loaded status (device counter collection)

This is a state that a caller would want to know about to understand if
they got no counters because of a failure or if they were trying to
collect counters too early (as is the case in the sample, which can
attempt to collect counters before HSA is inited).
* [SWDEV-518324] Add AST update support

Allows ASTs to be updated (instead of being an unchangeable
static value). Adds a shared pointer return type to protect against
static destructors/modifications invalidating AST definitions that may
still be in use. No functionality/use changes in this PR.
* [SWDEV-518593] Add updatable dimension cache + fix string issues (#252)

* [SWDEV-518593] Add updatable dimension cache + fix string issues

Updates dimension cache to use the same design pattern as AST/Metrics.

Fixes the string scoping issue seen in ASTs, which appears here as well.

* Add rocprofiler_create_counter

Creates derived counters based on input from the API. This PR does three
things:

1. Adds the API + test case
2. Validates that an AST can be constructed from the counter supplied.
3. Updates metrics, ast, and dimension caches to include the new metric.

Metric should be available for use immediately after the call completes.

Due to the regeneration of ASTs, this call should not be performed in
performance sensitive code.

* Suggestion fixes

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>

* Minor tweak

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>

* Fixes for comments

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
2025-03-14 01:07:16 -07:00
Nagaraj, Sriraksha c30bb7cbda Adding agent-index (#189)
* Adding agent-index

* review changes

* review comments addressed

* minor fix

* fix CI failure

* review comments

* Fix agent index test and address review comments

* Build Fixes

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-03-14 00:51:32 -07:00
Trowbridge, Ian 6518c5463d Temporarily Fix Incorrect Kernel Perfetto Trace Duration due to Firmware Timestamp Bug (#134)
* Perfetto duration temp fix setup

* Add timestamp change amounts to ROCP Info

* Groups kernel dispatch info by agent and queue id before sorting. Midpoint interpolation is then performed on the sorted kernels

* Moved dispatch bins into the for-loop

* Fix compilation error by using const ref

* Modified for review comments

* Changed variable names
2025-03-13 20:40:03 -07:00
Verma, Saurabh cffda33d3c Fixes for runtime errors reported in id_decode.hpp:set_dim_in_rec() by Mi300 UndefinedBehaviorSanitizer job (#114)
* Initial fix for runtime error in id_decode.hpp:set_dim_in_rec()

* actual fix: corrected the handling of case where dim==1 (ROCPROFILER_DIMENSION_NONE)

* removing magic numbers

* minor fix

* fix for invalid bool value at runtime

* clang format

* build fix

---------

Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-03-13 20:17:32 -07:00
Baraldi, Giovanni 346c7149dd SWDEV-518826: Adding nullptr check after gpu name query (#257)
* Fix segfault on fail to query GPU name

* Format

* Review comments

* Format

* Review comment

---------

Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
2025-03-13 16:25:16 +00:00
Kandula, Venkateshwar reddy 8735ae4eb0 SWDEV-518356: added check to avoid out of range hip host to device. (#267)
added check to avoid out of range.

Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
2025-03-11 15:37:59 -05:00
Welton, Benjamin f7e94c1ee8 Add debug printing statement to packet submission (#212)
* Add debug printing statement to packet submission

Adds debug printing to packets being submitted to HSA Queue in device
counting mode.

* Minor change

* Small fix

* formatting

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
2025-03-10 14:02:30 -07:00