Graf commitů

42 Commity

Autor SHA1 Zpráva Datum
lloginov-amd e49b501e9a Add scratch memory support (#2211) 2026-01-19 16:24:30 +01:00
Milan Radosavljevic 318d13870f [rocprofiler-systems] Update logging to use spdlog library (#2428)
## Motivation

- Structured logging with proper log levels (TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL)
- Better performance through compile-time formatting
- Consistent formatting using fmt library
- Runtime log level control via arguments and environment variables
- Easier maintenance and debugging capabilities

## Technical Details

- Added spdlog as a submodule and integrated it into CMake build system
- Created new `rocprofiler-systems-logger` library wrapping spdlog functionality
- Replaced custom logging macros (`ROCPROFSYS_VERBOSE`, `ROCPROFSYS_DEBUG`, `ROCPROFSYS_FATAL`, `ROCPROFSYS_REQUIRE`, `ROCPROFSYS_CI_THROW`, etc.) with spdlog equivalents (`LOG_DEBUG`, `LOG_WARNING`, `LOG_CRITICAL`, etc.)
- Implemented log level control through command-line arguments and environment variables
- Converted assertion macros to proper error handling with exceptions and std::abort()
2026-01-14 15:27:51 -05:00
Sajina PK b3f59a37e4 [Rocprofiler-system]: Fix GPU event enumeration for rocprof-sys-avail and CLI option for parsing GPU HW Counters (#2476)
## Motivation

The `rocprof-sys-avail -H -c GPU` command is returning blank output which is expected to display a list of available GPU hardware counters instead.
The `rocprof-sys-sample` and `rocprof-sys-run` is missing the `--gpu-events` option for specifying GPU counter events during profiling.

## Technical Details

The initialize_event_info() function had a logic bug where it only called set_agents() if the agent_manager was empty, but the actual issue was that the gpu_agents and cpu_agents vectors were empty even when agents were discovered.
Fixed the conditional logic to properly call set_agents() when gpu_agents and cpu_agents are empty, regardless of the agent_manager state.

Added the `--gpu-events (-G)` option which sets the `ROCPROFSYS_ROCM_EVENTS` environment variable to the specified values.

Fixes an issue where unsupported GPU/APU arch is being skipped gracefully - more details about this issue in the below comment.
2026-01-09 11:59:45 -05:00
marantic-amd ba1380a75d Put cached perfetto traces as default one (#2138)
* Put cached perfetto traces as default one

* Improve cached data and perfetto traces in order to be more aligned with E2E tests

* Addressing PR comments and findings

* Force early instrumentation bundle instantiation

* Sync-up insturumented containers with thread growth data

* Revert ompvv number of host threads to default 8

* Fixed counter track namings for amd-smi

* AIPROFSYST-34 [rocprof-sys] Update documentation describing newly introduced changes to default tracing mechanism
2025-12-22 12:47:35 +01:00
Milan Radosavljevic 666e76deac [rocprofiler-systems] Add cached demangler and replace old demangle (#2135)
* Add cached demangler and replace old

* Add unit tests

* Applied suggestions from code review

* Applied suggestions from code review
2025-12-16 08:32:18 +01:00
marantic-amd 3b11e01716 Perfetto traces from cached data (#1704)
## Motivation

The idea is to unify the way and place where we store our traces. Current implementation uses `trace_cache` for rocpd traces, but perfetto is in lined inside of each module. This change allows us to have a single point in code where we will collect data, process it and store it in the desired format. This means that we can declutter the code further and have single point of responsibility and single point of failure.

## Technical Details

New `processor` (perfetto_post_processing.cpp) is added to the `trace_cache` which purpose is to use the cached data to populate perfetto tracks. Cache manager is responsible for keeping the instance of this processor and for its lifetime.
2025-12-01 09:59:16 -05:00
Kian Cossettini b506c75f28 [rocprof-sys] Fix roctx wall clock tree, change timemory push/pop to use proper category, and add roctx as valid domain choice (#2062)
When doing this ticket, I also noticed the program would SEGFAULT when ROCPROFSYS_ROCM_DOMAINS=roctx even though the docs tell us we can do this. Went ahead and fixed that.

Also noticed that timemory push/pop in rocprofiler-sdk.cpp was always using category::rocm_marker_api instead of CategoryT. Fixed that as well.
2025-12-01 09:50:58 -05:00
Kian Cossettini 76a23eab14 [rocprofiler-systems] Add support for ompt_callback_thread_begin (#1681)
* Add thread_begin callback

* Make OMPT callbacks that are instant have start_ts = end_ts
2025-11-26 13:38:04 -05:00
Milan Radosavljevic 4d670099fa [rocprof-sys] Refactor trace_cache architecture with improved type erasure and processing pipeline (#1710)
- Redesigned buffer_storage with a flush_worker pattern for better thread management and resource cleanup
- Introduced type-safe abstractions through new components: cacheable.hpp, cache_type_traits.hpp, sample_processor.hpp, and type_registry.hpp
- Optimized type erasure implementation in sample processor to reduce runtime overhead
- Renamed rocpd_post_processing to rocpd_processor and restructured the processing pipeline
- Removed storage_parser.cpp and integrated functionality into header-based template implementation
- Enhanced cache_manager with improved processing workflow and better separation of concerns
2025-11-20 14:18:13 -05:00
Milan Radosavljevic a77be32660 Prevent duplicated sdk events (#1826) 2025-11-13 22:36:36 -05:00
habajpai-amd 590c6c3b4f fix: null pointer after delete in get_stream_id (#1720) 2025-11-06 23:43:34 -05:00
habajpai-amd ea31a0bf18 rocprofiler-sdk: fix per-record group_by_queue scoping (#1676)
* rocprofiler-sdk: fix per-record group_by_queue scoping

* added under resolved issues to CHANGELOG.md

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-11-05 21:46:44 -05:00
David Galiffi 4b0fb2cdf5 Rename "corr_id" to "stack_id" in Perfetto annotations to match new n… (#1618)
* Rename "corr_id" to "stack_id" in Perfetto annotations to match new naming in schema.

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* correlation_id.ancestor was not added until ROCPROFILER_VERSION 1.0

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2025-11-04 14:20:11 -05:00
Kian Cossettini 883caf2719 [rocprofiler-systems] Overhaul skip condition of implicit_task and add ROCPD validation test (#1589)
- Add rocpd validation check and fix implicit_task check
- SWDEV-562896
2025-10-31 09:59:23 -04:00
Milan Radosavljevic 8806be162c Change how cache manager handles child process trace cache for rocpd (#1033)
* Change how cache manager handles child process trace cache

* Sampling and backtrace metrics to cache

* Apply cmake formatting

* Fix parsing of metadata json

* Code clean up

* Fix build nlohmann json from source

* Fix storage parsed finished callback

* Revert sampling for child process

* Change cache file name generating

* Fix thread start stop

* Fix process start end timestamp

* Applied suggestions from code review

* Try with late start of flushing task thread

* Change dockerfiles for ci

* Revert changes on github workflows

* Remove json_fwd.hpp include

* fix dump

* Build nlohmann/json by default

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update location of build artifacts for nlohmann/json

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Revert use_output_suffix

* Remove unused logs

* Fix cache store inside counter due to structure change

* Remove decode tests from debian ci

* Fix issue where all databases have the same UUID (#1499)

Co-authored-by: Aleksandar Djordjevic <adjordje@amd.com>

* Removing the cpack and install steps to save space

* Revert "Remove decode tests from debian ci"

This reverts commit ddabf6dd142dcf438e6b8997b8abe86f2c868468.

* Revert "Removing the cpack and install steps to save space"

This reverts commit 973da3a1ba99d99d529af5269d30e177092f9bfa.

* Add prepare-runner job as dependency to clean up the space

* Fix formatting

* Free up even more space

* Remove verbose for workflows

* remove hw_counters from ext_data

* move space clean up inside container

* try to remove external folder to free up space

* Check space

* Refactor Cleanup to it's own step

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: Aleksandar Djordjevic <aleksandar.djordjevic@amd.com>
Co-authored-by: Aleksandar Djordjevic <adjordje@amd.com>
2025-10-24 11:47:15 -04:00
Milan Radosavljevic 48fdcebf62 Add caching of category region for rocpd (#1420)
* Add caching of category region

Fix vaapi traces

Remove region_with_name

* Applied suggestions from code review
2025-10-20 16:05:14 -04:00
Kian Cossettini 0c53a12a88 [rocprofiler-systems] [ROCpd] Add OMPT callbacks to ROCpd (#1016)
* Add OMPT to ROCpd

* Use correct category

* Added wrapper functions for future control

* Formatting

* Fix naming

* Comment change

* Remove ompt_get_cb_args

* Switched to using region_sample for OMPT

* Remove relic function

* Remove get_use_rocpd that was used in this pr (one still remains)

* Rename ompt_get_args_string and reuse in tool_tracing_callback_stop

* Make lock init and destroy cb instant

* [Prototype] ROCPD Name fix

* [Prototype] ROCPD Name fix P1

* [Prototype] ROCPD Name fix P2

* ROCPD Name fix

* Var name changes

* Rewrite cb overwrite to single function

* [Important] Use parallel_data as key for parallel callback map

* Fix workflow failure

* Make cpp USE_ROCM consistent with hpp and use default constructor if USE_ROCM = 0

* Add missing ROCPROFILER_VERSION check

* Improve readability

* Make ompt storage maps thread local

* Part 1: Variable name fix, memory cleanup, and fixed asserts

* Part 2: Add comments

* Part 3: Add CI_THROW

* Part 4: Formatting

* Part 5: Move #include to cpp
2025-10-07 19:01:25 -04:00
Kian Cossettini 7eb606a582 Make lock init and destroy cb events instant (#1074)
Removed names changes for `ROCPROFILER_OMPT_ID_lock_init` and `ROCPROFILER_OMPT_ID_lock_destroy`. 
Made both of these callbacks instant.
2025-09-24 07:41:47 -04:00
Sajina PK 2da209da7f SWDEV-536287 - Detect SELinux mode and log error if enabled (#819)
* Detect SELinux mode and fail-fast

* Detect SELinux status by reading /sys/fs/selinux/enforce during initialization.
* Fix the verbose mode for HIP Stream events

* Add more information in the logs
Add information to the user about how to change the setting
2025-09-03 09:16:36 -04:00
Kian Cossettini 07a7b9b845 Use rocprofiler-SDK for OMPT tracing (#702)
Switch to using SDK for OMPT tracing and remove older OMPT code path
2025-08-26 16:54:01 -04:00
Milan Radosavljevic df7b9d559f Fix collecting of stream id's for rocpd (#751) 2025-08-26 16:17:42 -04:00
systems-assistant[bot] 1f86010ca2 ROCpd support [Part 2] (#109)
* Rocpd part 2, caching

* Fix shadowed variables

* backward compatibility

* Fixed designated initializers

* Fix timemory include

* Remove benchmark & Fix build issues for rhel

* Add missing bracket

* Fix shadowing and pedantic

* Fix pedantic pt2

* Fix duplicated SDK calls

* Add decay in get_size_impl

* Rename sample cache to trace cache

* Add cache storage supported types

* Resolving track naming in sampling module

* fix sampling of flushing thread

* fix sampling of flushing thread 2

* throw exception upon store while buffer storage is not running

* Prevent fork crashing

* Fix rebase issue

* Applied suggestions from code review

* Change flushing thread to use PTL

* Fix agent creation order

* Fix stream id ci throw

* Remove force setup of rocprofiler-sdk

* Code cleanup

* Change initialization for agent

* Add missing namespace

* Fix the mismatch within the tool_agent->device_id

* Switch from using handle to use agent type index

* Fix pmc info comparator in metadata registry

---------

Co-authored-by: Aleksandar <aleksandar.djordjevic@amd.com>
Co-authored-by: Milan Radosavljevic <milan.radosavljevic@amd.com>
Co-authored-by: Marjan Antic <marantic@amd.com>
2025-08-19 22:01:04 -04:00
habajpai-amd 15fb4943e2 Fix the openmp-target ctest (#300)
- openmp-target: add runtime rpath for libomptarget and update tests
- Handle events not associated with a HIP Stream
  - Kernels from OpenMP target offload are not associated with a HIP stream. Fix handling with the callback record's stream_id is 0

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: c424dac261]
2025-07-31 10:41:27 -04:00
ajanicijamd e2fc692ee0 Allow events to be grouped by HIP stream ID (#274)
- Corelate memory_copy and kernel_dispatch events with their HIP stream_id and add stream_id as an annotation in Perfetto.
- By default, group memory_copy and kernel_dispatch events in Perfetto output by their stream_id.
- Add option, with the configuration setting ROCPROFSYS_ROCM_GROUP_BY_QUEUE, to group by HSA queue instead.

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 4b4a846b58]
2025-07-23 21:28:26 -04:00
darren-amd a5ca94ab9c Fix ROCtx event ranges in trace output (#278)
* Fix marker api traces

* Remove space

* Formatting change

* Small change

* Update Changelog

* Add period to changelog

* Update source/lib/rocprof-sys/library/rocprofiler-sdk.cpp

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

* Fix roctx tests

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: c996c23a13]
2025-07-14 19:31:14 -04:00
David Galiffi 0403aaa97f Use clang-format-18 for source formatting (#256)
* Updating clang-format to v18

- Updates the pre-commit-config
- Formats source files according to the utility

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update format source workflow

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update CONTRIBUTING

* Update comment in .clang-format

* Update CONTRIBUTING.md

* Update helper script

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 1e13b590e7]
2025-06-22 08:48:08 -04:00
David Galiffi 133834335d Unhandled enum in switch statement (#247)
Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 244c193a57]
2025-06-17 09:57:27 -04:00
David Galiffi c7c3c3f97e Use rocprofiler-sdk for RCCL-API tracing (#126)
- Add support for RCCL API tracing through rocprofiler-sdk.
- Refactored the comm_data code to use the SDK RCCL_API callbacks.
- Add a runtime version check for SDK to gate callback enablement, rather than just the compile-time check.
- Fixed: SAMPLING_TIMEOUT was not being handled correctly in add_test.

[ROCm/rocprofiler-systems commit: af77d93f75]
2025-06-06 11:36:17 -04:00
habajpai-amd 75a335b245 Add corr_id for HIP Runtime API in Perfetto (#218)
for SWDEV-533883

---------

Co-authored-by: Aleksandar Djordjevic <aleksandar.djordjevic@amd.com>

[ROCm/rocprofiler-systems commit: 39090bfc54]
2025-05-29 16:00:31 -04:00
anujshuk-amd 31dc2414e3 Reverting PR-154 Changes since VCN data not seen on Perfetto file (#191)
[ROCm/rocprofiler-systems commit: ff109912c2]
2025-05-02 16:19:43 -04:00
Sohaib Nadeem 160edff37f Initialization fixes (#154)
- Remove tooling initialization from rocprofiler_configure:
when rocprofiler configure is called from __hip_module_ctor
(which in turn is called as a global constructor when loading shared
libraries or before main in a hip program), initializing tooling
in it can cause problems because it is too early to do some of the tasks
that it involves (e.g. opening shared libraries, creating threads).
Instead, we rely on rocprofsys_main to initialize tooling later.

- Skip rocprofiler_configure if ROCPROFSYS_PRELOAD is not set since
preload is required for tooling (such as perfetto, which is used by
the rocprofiler callbacks) to be initialized.

- Revert RCCL initialization changes: These are no longer needed since rocprofsys_init_tooling_hidden will not
be called from rocprofiler_configure

- Force rocprofiler_configure in rocprofsys_init_tooling_hidden if it hasn't been
called through __hip_module_ctor global constructor

[ROCm/rocprofiler-systems commit: 0e535daa93]
2025-04-21 17:04:24 -04:00
David Galiffi bb4ed0b3ba Add rocm-6.4 to workflows (#165)
* Add rocm-6.4 to workflows

* Update containers.yml

* Update cpack.yml

* Update cpack.yml

* Disable OpenMP Target Examples on GitHub Runners

* Fix build warnings.

Switch statements with unhandled enums.

* Enable testing on 6.3 and 6.4

* Ubuntu 24 workflow. Build both ROCm 6.3 and 6.4

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 169c9a0d49]
2025-04-18 09:32:26 -04:00
Sajina PK 04fb7e4fe7 RocJpeg cmake and document fixes (#157)
- Fix for rocjpeg sample cmake due to changes in the rocJPEG project
- Fix for rocprofiler-sdk version check - change the format
- Edits to docs for jpeg and vcn activity support - mention that these values may not be supported on all ASICs.

[ROCm/rocprofiler-systems commit: fad3a0d341]
2025-04-09 16:20:02 -04:00
David Galiffi 70b456f2a3 Fix "ROCPROFSYS_USE_ROCM" runtime config setting. (#144)
[ROCm/rocprofiler-systems commit: b6b39af011]
2025-03-27 16:03:46 -04:00
David Galiffi bd0eeb9555 Reapply "Upgrade ROCm-SMI to AMD SMI (#86)" (#147)
* Reapply "Upgrade ROCm-SMI to AMD SMI (#86)"

This reverts commit 9fcea73122.

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 85bbea4954]
2025-03-25 17:31:27 -04:00
David Galiffi 92e439627e Fix logging error (#130)
When we create profile config with rocprofiler we log the counters being registered. However, this log was being skipped in certain cases.

[ROCm/rocprofiler-systems commit: eb0a969a9c]
2025-03-06 14:30:45 -05:00
Sohaib Nadeem 95a07edf0b Fix hardware counter summary files not being generated after profiling (#124)
- Register a cleanup function in tim::manager instance to write out data in
counter storages

- The counter_storage::write() calls in tool_fini happen after the storage is destroyed
which is too late for the write to happen.

- Adjust traits for counter_data_tracker

- Add MIN, MAX, VAR, STDDEV columns
- Remove DEPTH, UNITS, %SELF columns

- Update "add_validation_test" to test for the existence of output file(s).
- Added step to test perfetto output for `transpose-rocprofiler-sampling`
and `transpose-rocprofiler-binary-rewrite`

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 42922ec851]
2025-03-05 16:05:18 -05:00
Sajina PK 1db0539c30 Add support for rocJPEG API tracing (#116)
- Add rocDecode API Tracing support using domain `rocjpeg_api` in ROCPROFSYS_ROCM_DOMAINS.
- Modify existing `videodecode` and `jpegdecode` ctests to verify API tracing
- Print Perfetto values for easy debugging in verbose mode
- Convert CMake error to a warning and skip building the "decode" examples if requirements are not found

[ROCm/rocprofiler-systems commit: 3bea1d8eac]
2025-02-25 21:14:14 -05:00
Sajina PK fba64f8acd Add support for VA-API and rocDecode tracing (#92)
- VA API tracing using Timemory gotcha wrappers.
- rocDecode API tracing integration using callback to ROCPROFILER_CALLBACK_TRACING_ROCDECODE_API
- Updated videodecode ctest to validate rocDecode APIs in perfetto trace. 

[ROCm/rocprofiler-systems commit: 697d1ac02f]
2025-02-11 13:08:23 -05:00
David Galiffi 9fcea73122 Revert "Upgrade ROCm-SMI to AMD SMI (#86)" (#100)
This reverts commit 8c5db3f1d8.

[ROCm/rocprofiler-systems commit: b3eee295dd]
2025-02-07 11:45:26 -05:00
cfallows-amd 8c5db3f1d8 Upgrade ROCm-SMI to AMD SMI (#86)
* Integrating amd-smi into rocprofiler-systems due to rocm-smi deprecation.
* No functionality changes to users other than naming conventions.
* New tracks available in perfetto- gpu busy percentage metrics now splits gfx busy into separate gfx, umc, and mm engine measurements.

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 0c32dfd6bc]
2025-01-30 21:32:07 -05:00
David Galiffi b29cfac106 Update to use rocprofiler-sdk (#55)
- Renames the CMake option "ROCPROFSYS_USE_HIP" to "ROCPROFSYS_USE_ROCM"
- Remove the "ROCPROFSYS_USE_ROCM_SMI option. Controlled with the "ROCPROFSYS_USE_ROCM" option, instead.
   - Runtime configuration can still toggle ROCPROFSYS_USE_ROCM_SMI to disable the sampling.
- Rename ROCPROFSYS_HIP_VERSION macro to ROCPROFSYS_ROCM_VERSION and remove blocks for `ROCPROFSYS_ROCM_VERSION < 60000`
- Remove ROCPROFSYS_USE_ROCTRACER and ROCPROFSYS_USE_ROCPROFILER
- Update test cases
- Update docker files and workflows to install cmake 3.21, which is required for the rocprofiler-sdk findPackage script.
- Removed rocm-6.2 from workflows due to a rocprofiler-sdk API change. 

[ROCm/rocprofiler-systems commit: 88aa2d3cbe]
2024-12-13 18:48:39 -05:00