Commit graph

289 commits

Author SHA1 Message Date
marantic-amd 51f49d8835 Add notice for the newly deprecated env variables (#2690) 2026-01-20 13:59:31 +01:00
Milan Radosavljevic b533f56197 Add automatic PyTorch library discovery for Python applications (#2623)
* Add automatic PyTorch library discovery for Python applications (#2623)
2026-01-20 08:42:49 +01:00
lloginov-amd e49b501e9a Add scratch memory support (#2211) 2026-01-19 16:24:30 +01:00
habajpai-amd b53c99669c Revert "fix: prevent double-free crash during process exit in amd-smi (#2213)" (#2640)
This reverts commit 7b00d3a89b.

The workaround is no longer needed - root cause fixed in:
- rocm-smi-lib (PR #2531): Made devInfoTypesStrings file-local static
- amdsmi (PR #2575): Added visibility("hidden") attribute
2026-01-16 16:08:52 -05:00
Milan Radosavljevic 940488ed58 [rocprofiler-systems] Fix naming and description of process_page category (#2606) 2026-01-15 16:10:50 +01:00
Milan Radosavljevic 318d13870f [rocprofiler-systems] Update logging to use spdlog library (#2428)
## Motivation

- Structured logging with proper log levels (TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL)
- Better performance through compile-time formatting
- Consistent formatting using fmt library
- Runtime log level control via arguments and environment variables
- Easier maintenance and debugging capabilities

## Technical Details

- Added spdlog as a submodule and integrated it into CMake build system
- Created new `rocprofiler-systems-logger` library wrapping spdlog functionality
- Replaced custom logging macros (`ROCPROFSYS_VERBOSE`, `ROCPROFSYS_DEBUG`, `ROCPROFSYS_FATAL`, `ROCPROFSYS_REQUIRE`, `ROCPROFSYS_CI_THROW`, etc.) with spdlog equivalents (`LOG_DEBUG`, `LOG_WARNING`, `LOG_CRITICAL`, etc.)
- Implemented log level control through command-line arguments and environment variables
- Converted assertion macros to proper error handling with exceptions and std::abort()
2026-01-14 15:27:51 -05:00
Sajina PK b3f59a37e4 [Rocprofiler-system]: Fix GPU event enumeration for rocprof-sys-avail and CLI option for parsing GPU HW Counters (#2476)
## Motivation

The `rocprof-sys-avail -H -c GPU` command returns blank output when it is expected to display a list of available GPU hardware counters.
The `rocprof-sys-sample` and `rocprof-sys-run` commands are missing the `--gpu-events` option for specifying GPU counter events during profiling.

## Technical Details

The initialize_event_info() function had a logic bug where it only called set_agents() if the agent_manager was empty, but the actual issue was that the gpu_agents and cpu_agents vectors were empty even when agents were discovered.
Fixed the conditional logic to properly call set_agents() when gpu_agents and cpu_agents are empty, regardless of the agent_manager state.

Added the `--gpu-events (-G)` option which sets the `ROCPROFSYS_ROCM_EVENTS` environment variable to the specified values.

Fixes an issue where unsupported GPU/APU arch is being skipped gracefully - more details about this issue in the below comment.
2026-01-09 11:59:45 -05:00
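To illustrate the `--gpu-events` behavior described in #2476, here is a minimal Python sketch (the tools themselves are C++) of an option whose value is forwarded to the `ROCPROFSYS_ROCM_EVENTS` environment variable. The helper `build_env`, the unchanged pass-through of the value, and the placeholder event name in the usage note are assumptions made for illustration; they are not taken from the rocprof-sys sources.

```python
import argparse


def build_env(argv, base_env=None):
    """Return a copy of base_env with the GPU events option applied.

    Hypothetical helper: only the option names (-G/--gpu-events) and the
    variable name ROCPROFSYS_ROCM_EVENTS come from the commit message.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-G", "--gpu-events", default=None)
    # Unknown arguments (for example the profiled command) are left alone.
    args, _unparsed = parser.parse_known_args(argv)

    env = dict(base_env or {})
    if args.gpu_events is not None:
        # Assumption: the value is passed through unchanged.
        env["ROCPROFSYS_ROCM_EVENTS"] = args.gpu_events
    return env
```

For example, `build_env(["-G", "EVENT_A"])` yields a mapping in which `ROCPROFSYS_ROCM_EVENTS` is `"EVENT_A"`, where `EVENT_A` is a placeholder rather than a real counter name.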
Aleksandar Djordjevic aecea25a61 [rocprofiler-systems] CMake Cleanup (#2455)
## Technical Details

- Removed the `configure_file()` call that was generating `defines.hpp` from `defines.hpp.in` and updated the CMake file to reference the renamed file.
- Removed duplicate `find_library(pthread_LIBRARY NAMES pthread pthreads)`
2026-01-07 14:07:37 -05:00
anujshuk-amd 596ffce5fe [rocprof-sys] Fix segfault from thread ID array overflow (#2172)
**Thread limit configuration and enforcement:**

* Added a check in `CMakeLists.txt` to ensure `ROCPROFSYS_MAX_THREADS` is at least 128, automatically setting it to 128 with a warning if a lower value is provided.
* Replaced hardcoded thread limit (`allowed_max_threads`) in `pthread_create_gotcha.cpp` with the configurable `ROCPROFSYS_MAX_THREADS` value, ensuring all runtime checks and warnings use the actual configured limit.

**Documentation improvements:**

* Updated the development guide to explain the new thread limit behavior, including how exceeding the limit is handled gracefully, how to configure it, and the build-time validation rules.

**Test updates:**

* Modified thread limit tests to use the configurable `ROCPROFSYS_MAX_THREADS` value instead of a hardcoded limit and expanded the range of tested thread values.
* Increased test timeouts to accommodate larger thread counts and ensure reliability with higher limits.
2026-01-07 14:03:37 -05:00
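The build-time rule from #2172 (a `ROCPROFSYS_MAX_THREADS` value below 128 is raised to 128 with a warning) can be sketched as follows. This is a Python illustration only: the real check lives in `CMakeLists.txt`, and the function name `resolve_max_threads` and the warning text are hypothetical.

```python
MIN_MAX_THREADS = 128  # lower bound stated in the commit message


def resolve_max_threads(requested):
    """Return (effective_limit, warning_or_None) for a requested limit.

    Hypothetical helper mirroring the described validation rule.
    """
    if requested < MIN_MAX_THREADS:
        warning = (
            f"ROCPROFSYS_MAX_THREADS={requested} is below {MIN_MAX_THREADS}; "
            f"using {MIN_MAX_THREADS}"
        )
        return MIN_MAX_THREADS, warning
    return requested, None
```

Returning the warning instead of printing it keeps the sketch easy to check; values at or above the minimum pass through unchanged.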
habajpai-amd 9e4d1c31c7 fix: prevent static initialization deadlock in thread_data (#2474)
* fix: prevent static initialization deadlock in thread_data

* update comment
2026-01-06 16:39:32 +05:30
marantic-amd bb83791b17 Remove redundant ROCPROFSYS_TRACE_CACHED variable from the code (#2434) 2025-12-25 13:36:04 +01:00
marantic-amd c3132773c8 Fix agent device ID in the cached kernel_dispatch trace (#2452) 2025-12-25 10:23:16 +01:00
Milan Radosavljevic 719556fbba [rocprofiler-systems] Add SIGKILL delay option (#2384)
## Motivation

When profiling multi-process applications where a parent process sends SIGKILL to child processes, the termination can occur before the profiler has a chance to flush collected data. This PR introduces a configurable delay before SIGKILL signals are forwarded, allowing profiling data to be captured before process termination. This is a workaround.

## Technical Details

- Added new configuration setting `ROCPROFSYS_KILL_DELAY` (default: 0 seconds) to specify a delay before SIGKILL signals are forwarded to other processes
- Implemented `kill_gotcha` component that intercepts the `kill()` system call
- The gotcha only delays SIGKILL signals sent to external processes (pid > 0 and not self)
- Integrated `kill_gotcha_t` into the `preinit_bundle_t` for early initialization
2025-12-22 21:17:57 -05:00
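The delay rule described in #2384 (only SIGKILL sent to another process, with `pid > 0` and not self, is delayed, and the default delay of 0 means no delay) can be sketched in Python as below. The function names and the injectable `send` and `sleep` parameters are assumptions for illustration; the actual `kill_gotcha` component is C++ and wraps `kill()` directly.

```python
import os
import signal
import time


def should_delay(sig, target_pid, self_pid):
    """True only for SIGKILL aimed at another, specific process."""
    return sig == signal.SIGKILL and target_pid > 0 and target_pid != self_pid


def forward_kill(target_pid, sig, delay_seconds=0, send=os.kill, sleep=time.sleep):
    """Forward a signal, sleeping first when the delay rule applies.

    Hypothetical wrapper: send and sleep are injectable so the logic can
    be exercised without signalling real processes.
    """
    if delay_seconds > 0 and should_delay(sig, target_pid, os.getpid()):
        sleep(delay_seconds)
    return send(target_pid, sig)
```

Process-group targets (`pid <= 0`) and signals other than SIGKILL are forwarded immediately in this sketch, matching the conditions listed in the commit message.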
habajpai-amd 447025011a [Rocprof-Sys] Resolve crash when profiling TensorFlow GPU application (#2381)
* fix: resolve crash when profiling TensorFlow GPU application

* incorporate review comments

* updated min_rows from 3 to 2 for threads table validation as internal threads are not profiled and are now correctly bypassed
2025-12-22 14:00:55 -05:00
marantic-amd ba1380a75d Put cached perfetto traces as default one (#2138)
* Put cached perfetto traces as default one

* Improve cached data and perfetto traces in order to be more aligned with E2E tests

* Addressing PR comments and findings

* Force early instrumentation bundle instantiation

* Sync-up instrumented containers with thread growth data

* Revert ompvv number of host threads to default 8

* Fixed counter track namings for amd-smi

* AIPROFSYST-34 [rocprof-sys] Update documentation describing newly introduced changes to default tracing mechanism
2025-12-22 12:47:35 +01:00
Aleksandar Djordjevic 7da3275b42 [rocprofiler-systems] Improve metadata parsing (#2238)
* Improve metadata JSON parsing
* Fix string ownership
2025-12-22 12:30:51 +01:00
habajpai-amd 7b00d3a89b fix: prevent double-free crash during process exit in amd-smi (#2213) 2025-12-19 11:56:40 +05:30
habajpai-amd b4e04b07ed test: add unit tests for common utilities from PR #1249 (#2237)
* test: add unit tests for common utilities from PR #1249

* incorporate review comments specific to tests formatting

* use filesystem API instead of std::system for safer cleanup

* Add ghc/filesystem submodule v1.5.14 for portable C++17 filesystem support

* fix: add cmake/GhcFilesystem.cmake for CI submodule auto-checkout

* incorporate review comment

* incorporate review comment
2025-12-18 11:03:14 +05:30
marantic-amd 79ad00fb15 Scale down memory usage data when the actual data is stored to cache (#2343) 2025-12-17 14:57:41 +01:00
Aleksandar Djordjevic 0b4a309ff7 [rocprofiler-systems] Add span (#2142)
* Add span
* Update unit tests
2025-12-16 13:38:47 +01:00
Milan Radosavljevic 666e76deac [rocprofiler-systems] Add cached demangler and replace old demangle (#2135)
* Add cached demangler and replace old

* Add unit tests

* Applied suggestions from code review

* Applied suggestions from code review
2025-12-16 08:32:18 +01:00
Mario Limonciello d1aaae2539 Run pre-commit's whitespace related hooks on projects/rocprofiler-systems (#2123)
In order for pre-commit to be useful, everything needs to meet a common
baseline.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-12-04 23:39:42 -05:00
habajpai-amd 30161885e2 refactor: centralize update_env across binaries with unit test added … (#2029)
* refactor: centralize update_env across binaries with unit test added for testing

* removed unused includes suggested by clangd and small cleanup

* use centralized update_env in argparse as well

* review comments incorporated

* move update_env tests closer to common library

* fix: missing common:: prefix in rocprof-sys-sample

* cmake formatting
2025-12-04 19:24:27 +05:30
Milan Radosavljevic fddef714a0 [rocprofiler-systems] Add trace_cache unit tests (#2086)
Improve test coverage and reliability of the trace_cache module by adding comprehensive unit tests for all major components.
2025-12-03 09:25:33 -05:00
Milan Radosavljevic 09a9f9e31d [rocprofiler-systems] Improve rocpd writing speed (#2061) 2025-12-03 13:11:15 +01:00
marantic-amd 3b11e01716 Perfetto traces from cached data (#1704)
## Motivation

The idea is to unify how and where we store our traces. The current implementation uses `trace_cache` for rocpd traces, but perfetto is inlined inside each module. This change gives us a single point in the code where we collect data, process it, and store it in the desired format. This means that we can declutter the code further and have a single point of responsibility and a single point of failure.

## Technical Details

A new `processor` (perfetto_post_processing.cpp) is added to the `trace_cache`; its purpose is to use the cached data to populate perfetto tracks. The cache manager is responsible for keeping the instance of this processor and for its lifetime.
2025-12-01 09:59:16 -05:00
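The design described in #1704 (one cache that collects samples, with processors attached to it that each emit a different output format) can be sketched roughly as follows. This is a Python illustration under assumptions: the class `CacheManager`, its methods, and the sample layout are hypothetical and do not reflect the actual `trace_cache` C++ interfaces.

```python
class CacheManager:
    """Hypothetical single collection point that owns its processors."""

    def __init__(self):
        self._samples = []
        self._processors = []

    def register(self, processor):
        # The manager keeps the processor alive for its own lifetime.
        self._processors.append(processor)

    def store(self, sample):
        # Modules only store samples; they do not write any output format.
        self._samples.append(sample)

    def post_process(self):
        # Every cached sample is handed to every registered processor.
        for sample in self._samples:
            for processor in self._processors:
                processor(sample)
```

In this sketch a rocpd writer and a perfetto writer would both be plain callables registered on the same manager, so adding an output format does not touch the modules that produce samples.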
Kian Cossettini b506c75f28 [rocprof-sys] Fix roctx wall clock tree, change timemory push/pop to use proper category, and add roctx as valid domain choice (#2062)
When doing this ticket, I also noticed the program would SEGFAULT when ROCPROFSYS_ROCM_DOMAINS=roctx even though the docs tell us we can do this. Went ahead and fixed that.

Also noticed that timemory push/pop in rocprofiler-sdk.cpp was always using category::rocm_marker_api instead of CategoryT. Fixed that as well.
2025-12-01 09:50:58 -05:00
Kian Cossettini ae29018bb0 [rocprofiler-systems] Enable HOST OMPVV runtime-instrumentation CTests (#1970)
* Enable HOST ompvv runtime-instrumentation ctests

* Fix rocprofiler-systems-avail-regex-negation test failure

* Exclude problematic function from instrumentation

* Make push pop skip an env option for ctests

* Remove SKIP_PUSH_POP_CHECK from argument parse

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-12-01 09:26:24 -05:00
Kian Cossettini 76a23eab14 [rocprofiler-systems] Add support for ompt_callback_thread_begin (#1681)
* Add thread_begin callback

* Make OMPT callbacks that are instant have start_ts = end_ts
2025-11-26 13:38:04 -05:00
marantic-amd daf8596ce9 [rocprof-sys] Process all information regarding agents and store them as extdata in rocpd database (#1880)
## Motivation

Resolved: SWDEV-566226

The current implementation of agents in rocprof-systems keeps just the minimal set of information required for populating the `info_agent` table in the rocpd database. A significant amount of data is left out of the database, so this change stores the additional agent information as an `extdata` row in the `info_agent` table.

## Technical Details

This PR introduces an additional field in the `agent` structure that holds a JSON-formatted string of all the additional information we can acquire about a particular agent. This data is processed and added during the initial fetching of agents, and afterwards pushed into the database.

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-11-25 17:33:12 -05:00
habajpai-amd 1a3564a51a [rocprof-sys] Fix fork() handling for GPU profiling and AMD SMI (#1930)
- Fix fork() handling for GPU profiling and AMD SMI
- Add hipMallocConcurrency test for CI with GPU
2025-11-24 09:21:27 -05:00
marantic-amd ebd55d2ce0 Track process_sampler state for CPU sampling (#1993) 2025-11-24 15:03:08 +01:00
Sajina PK d77b245730 [Rocprofiler-systems] : Refactor papi enumeration to fix a hang on Intel systems (#1672)
* Refactor papi enumeration to fix a hang on Intel systems

- Add an exclude argument to available_events_info() for
  perf_event_uncore, which caused a hang-like case on Intel systems with
  a large number of uncore events.
- Enumerate papi available events only when papi events are specified by
  users inside early initialization logic
- Move papi available event query for ROCPROFSYS_SAMPLING_OVERFLOW_EVENT
  config setting to the avail component, to move the heavy logic outside
  initialization.
- Make category option for rocprof-sys-avail -H -c case insensitive
- Provide new option to query available overflow events that can be
  specified for ROCPROFSYS_SAMPLING_OVERFLOW_EVENT using new command
  option rocprof-sys-avail -H -c overflow

* Update projects/rocprofiler-systems/source/bin/rocprof-sys-avail/common.cpp

Co-authored-by: Milan Radosavljevic <milan.radosavljevic@amd.com>

* Update timemory submodule pointer

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Fix errors on compile

* Change 1: Optimization for the category matching lambda

Optimization changes.

* Modify the rocprof-sys-avail -c option for overflow

Overflow should not be displayed as a device in rocprof-sys-avail -H -c CPU

Users can instead do regex on summary where overflow is appended in description

User can do rocprof-sys-avail -H -c CPU -d -r overflow

* Revert change to column width

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: Milan Radosavljevic <milan.radosavljevic@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-11-21 00:19:58 -05:00
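One of the changes in #1672 makes the category option of `rocprof-sys-avail -H -c` case insensitive. A minimal Python sketch of such matching is shown below; the function name `filter_by_category` and the dictionary layout of the entries are assumptions for illustration, not the tool's actual data structures.

```python
def filter_by_category(entries, category):
    """Return entries whose category matches, ignoring letter case.

    Hypothetical helper: each entry is assumed to be a dict with a
    "category" key.
    """
    wanted = category.casefold()
    return [entry for entry in entries if entry["category"].casefold() == wanted]
```

With this approach `cpu`, `CPU`, and `Cpu` all select the same entries, while other categories are left out.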
Milan Radosavljevic 4d670099fa [rocprof-sys] Refactor trace_cache architecture with improved type erasure and processing pipeline (#1710)
- Redesigned buffer_storage with a flush_worker pattern for better thread management and resource cleanup
- Introduced type-safe abstractions through new components: cacheable.hpp, cache_type_traits.hpp, sample_processor.hpp, and type_registry.hpp
- Optimized type erasure implementation in sample processor to reduce runtime overhead
- Renamed rocpd_post_processing to rocpd_processor and restructured the processing pipeline
- Removed storage_parser.cpp and integrated functionality into header-based template implementation
- Enhanced cache_manager with improved processing workflow and better separation of concerns
2025-11-20 14:18:13 -05:00
habajpai-amd b09834e784 refactor: duplicated path helpers into common/path.hpp (#1249)
* refactor: duplicated path helpers into common/path.hpp

* update rocprof-sys-instrument to use shared path utility

* Add path::realpath(std::string[, std::string*]) helper function in common/path.hpp for binaries

* common: centralize remove_env implementation in environment.hpp

* remove unused includes from rocprof-sys binaries and argparse

* changing set to unordered_set wherever sorting is not required and additional cleanup

* review comment incorporated

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* copilot review for remove_env incorporated

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-20 09:55:00 +05:30
Milan Radosavljevic 3ee393047c Add user_api_active flag to enable/disable user-defined regions (#312)
* Add user start/stop bool

* Update documentation for user-api

* Update projects/rocprofiler-systems/source/lib/rocprof-sys-dl/dl.cpp

Co-authored-by: Aleksandar Djordjevic <aleksandar.djordjevic@amd.com>

* Format fix

---------

Co-authored-by: Aleksandar Djordjevic <aleksandar.djordjevic@amd.com>
2025-11-18 13:48:27 -05:00
Sajina PK 09b8342e22 [Rocprofiler-systems] : Add XGMI and PCIe metrics to the profiling data (#1628)
* Add XGMI and PCIe metrics to the profiling data

Add support for AMD XGMI (GPU-to-GPU interconnect) and PCIe
metrics:
  * XGMI link width in bits
  * XGMI link speed in GT/s
  * Per-link read bandwidth (KB)
  * Per-link write bandwidth (KB)

- Add new categories for PCIe metrics:
  * PCIe link width
  * PCIe link speed in GT/s
  * Accumulated bandwidth (MB)
  * Instantaneous bandwidth (MB/s)

* Fix VCN/JPEG insert logic

* Modify the gpu_metrics struct to accommodate XCP structure

* Add ctest automation for gpu interconnect metrics

* Refactor to move gpu_metrics struct and serialization to another file

* Possible fix for timeout in CI

Fix redundant skip check in ctest
Add xgmi and pcie option in rocprof-sys-avail.

* Change2: Address review comments

Change ctest sampling to avoid timeout
Change variable name and code structuring

* Add option in ctest to run rocprof-sys-run without rewrite

Run transferbench with rocprof-sys-run without sampling

* Change3: Fix sample insert bug and address review comments

xgmi and pci support check
renaming variables
additional hip_api validation in rocpd

* Reduce the load from the transferBench sample

The CI builds were timing out when flushing a big temporary file to the
DB: (2720824.23 KB / 2720.82 MB / 2.72 GB)...
2025-11-14 19:42:33 -05:00
Milan Radosavljevic a77be32660 Prevent duplicated sdk events (#1826) 2025-11-13 22:36:36 -05:00
Milan Radosavljevic 833c250c27 Add clean up fixture for trace cache temporary files (#1836)
* Add clean up fixture for trace cache tmp files

* Switch to bash instead of cmake running command
2025-11-13 21:01:04 -05:00
Aleksandar Djordjevic f39a60ac25 [rocprofiler-systems] Apply new CMake formatting for the latest gersemi version (#1778)
* Fix cmake formatting

* Updated rev. in `.pre-commit-config.yaml`

* Pin the gersemi used in CI to v0.23.1, matching the pre-commit

---------

Co-authored-by: Aleksandar Djordjevic <adjordje@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-11-10 13:08:44 -05:00
Milan Radosavljevic d9b00da102 Add clean up of buffered_storage files (#1738)
* Add clean up of buffered_storage files

* Add step to workflows to test for remaining temp files after tests

* Applied suggestions from code review

* add deletion of all cache files

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-11-07 11:51:09 -05:00
Milan Radosavljevic a9082a7158 ROCpd schema fetching from rocprofiler-sdk (#1501)
- Integrate rocprofiler-systems with rocprofiler-sdk-rocpd to fetch schema
- If rocprofiler-sdk-rocpd is not available, use embedded schema files. With this we provide rocpd format support even if ROCm is not available
- Include detection in CMake if rocprofiler-sdk-rocpd package is available (and valid), and build database class upon that
- Update embedded schema that is used as a fallback.
- Update some validation tests to account for schema changes.
2025-11-07 09:45:29 -05:00
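The fallback described in #1501 (use the schema from rocprofiler-sdk-rocpd when the package is available, otherwise the embedded schema files) has roughly the following shape. According to the commit message the real detection happens in CMake; this Python sketch only illustrates the decision, and the function name and directory arguments are hypothetical.

```python
from pathlib import Path


def choose_schema_dir(sdk_schema_dir, embedded_schema_dir):
    """Prefer the SDK-provided schema directory, else the embedded copy.

    Hypothetical helper: "available" is reduced here to "the directory
    exists"; the commit also mentions a validity check not modeled here.
    """
    if sdk_schema_dir is not None and Path(sdk_schema_dir).is_dir():
        return Path(sdk_schema_dir)
    return Path(embedded_schema_dir)
```

Passing `None` or a missing directory for the SDK location selects the embedded schema, which corresponds to the case where ROCm is not installed.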
habajpai-amd 590c6c3b4f fix: null pointer after delete in get_stream_id (#1720) 2025-11-06 23:43:34 -05:00
habajpai-amd ea31a0bf18 rocprofiler-sdk: fix per-record group_by_queue scoping (#1676)
* rocprofiler-sdk: fix per-record group_by_queue scoping

* added under resolved issues to CHANGELOG.md

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-11-05 21:46:44 -05:00
David Galiffi 4b0fb2cdf5 Rename "corr_id" to "stack_id" in Perfetto annotations to match new n… (#1618)
* Rename "corr_id" to "stack_id" in Perfetto annotations to match new naming in schema.

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* correlation_id.ancestor was not added until ROCPROFILER_VERSION 1.0

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2025-11-04 14:20:11 -05:00
Kian Cossettini 883caf2719 [rocprofiler-systems] Overhaul skip condition of implicit_task and add ROCPD validation test (#1589)
- Add rocpd validation check and fix implicit_task check
- SWDEV-562896
2025-10-31 09:59:23 -04:00
marantic-amd 08d259c24c Fix the issue when sampling JAX with rocpd (#1552) 2025-10-27 09:59:51 -04:00
Milan Radosavljevic 8806be162c Change how cache manager handles child process trace cache for rocpd (#1033)
* Change how cache manager handles child process trace cache

* Sampling and backtrace metrics to cache

* Apply cmake formatting

* Fix parsing of metadata json

* Code clean up

* Fix build nlohmann json from source

* Fix storage parsed finished callback

* Revert sampling for child process

* Change cache file name generating

* Fix thread start stop

* Fix process start end timestamp

* Applied suggestions from code review

* Try with late start of flushing task thread

* Change dockerfiles for ci

* Revert changes on github workflows

* Remove json_fwd.hpp include

* fix dump

* Build nlohmann/json by default

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update location of build artifacts for nlohmann/json

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Revert use_output_suffix

* Remove unused logs

* Fix cache store inside counter due to structure change

* Remove decode tests from debian ci

* Fix issue where all databases have the same UUID (#1499)

Co-authored-by: Aleksandar Djordjevic <adjordje@amd.com>

* Removing the cpack and install steps to save space

* Revert "Remove decode tests from debian ci"

This reverts commit ddabf6dd142dcf438e6b8997b8abe86f2c868468.

* Revert "Removing the cpack and install steps to save space"

This reverts commit 973da3a1ba99d99d529af5269d30e177092f9bfa.

* Add prepare-runner job as dependency to clean up the space

* Fix formatting

* Free up even more space

* Remove verbose for workflows

* remove hw_counters from ext_data

* move space clean up inside container

* try to remove external folder to free up space

* Check space

* Refactor Cleanup to its own step

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: Aleksandar Djordjevic <aleksandar.djordjevic@amd.com>
Co-authored-by: Aleksandar Djordjevic <adjordje@amd.com>
2025-10-24 11:47:15 -04:00
Milan Radosavljevic 48fdcebf62 Add caching of category region for rocpd (#1420)
* Add caching of category region

Fix vaapi traces

Remove region_with_name

* Applied suggestions from code review
2025-10-20 16:05:14 -04:00
Milan Radosavljevic 00faa48ac2 Add flushing of perfetto buffer (#1417)
- Add flushing of perfetto buffer
- Add `ROCPROFSYS_PERFETTO_FLUSH_PERIOD_MS` config setting.
- Update CHANGELOG.sh
- Resolves SWDEV-518817

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-10-17 09:30:29 -04:00