Wykres commitów

684 Commity

Autor SHA1 Wiadomość Data
David Galiffi 4c458fae9c [rocprofiler-systems] Fix ROCM_VERSION guard used for the scratch_memory_record structure (#2948)
- Fix ROCM_VERSION guard used for the scratch_memory_record structure
- This fixes a rocm/7.0.2 build failure

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2026-01-29 09:34:27 -05:00
Sajina PK e265e0e24f [rocprofiler-systems]: Add documentation for communication API tracing (#2478)
Add documentation for communication runtime tracing for MPI, UCX, RCCL.

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2026-01-27 23:48:27 -05:00
Kian Cossettini 0eac446cb0 [rocprofiler-systems] - Implement subset of CTests into PyTests (#2666)
Convert a subset of the ctest to pytest to be used in TheRock CI.
Create a new cmake flag `ROCPROFSYS_INSTALL_TESTING` to control test suite installation.
- pytest package will be installed to share/rocprofiler-systems/tests
- all compiled examples are put in share/rocprofiler-systems/examples
- all test relevant scripts are put in share/rocprofiler-systems/tests
- see README.md in share/rocprofiler-systems/tests
2026-01-26 23:10:01 -05:00
David Galiffi 5a7a32bc12 Bump version from 1.4.0 to 1.5.0 (#2834)
Bumping the version now that 7.10 has branched.
2026-01-24 01:10:27 -05:00
marantic-amd 7af2dba741 [rocprof-sys] Align rocpd symbols of the same counter types (#2675)
## Motivation

In order for Optiq to be able to detect that counter tracks are of the same type, we aligned `info_pmc` symbol naming across the tracks of the same type. Being able to know this will be useful for grouping and categorizing similar types of counter tracks and for setting up a consistent y-axis scale when plotting the values on charts.

## Technical Details

Replace unique and/or ordered symbol names with counter-common symbol name which will be the same for the counters of the same type, with counter track name remaining the unique identifier for that counter track. For example, the "symbol" field was "JpegAct_0" but is now "JpegAct".
2026-01-23 00:36:08 -05:00
marantic-amd 956a73c4c8 [rocprof-sys] Use fmt APIs to construct strings instead of JOIN (#2643)
## Motivation

With the introduction of the new logging system base on `spdlog` library, opportunity shows to replace `timemory` dependent JOIN implementation with `fmt` library `format` and `join` APIs, which are shipped as a part of `spdlog` lib

## Technical Details

Use `fmt` provided APIs to properly format and package strings.
2026-01-23 00:34:58 -05:00
habajpai-amd 3318c540ea fix roctx range markers not paired correctly in rocpd output (#2793)
## Motivation

Fix roctx range markers (Push/Pop, Start/Stop) not being displayed correctly in rocpd output. The Visualizer was showing only Stop/Pop events as instant markers instead of proper duration ranges with labels, while Perfetto output displayed them correctly.

## Technical Details

In `tool_tracing_callback_stop()`, the rocpd/database output was using `user_data->value` (timestamp of the Pop/Stop event) instead of `begin_ts` (corrected timestamp from the corresponding Push/Start event) when calling `cache_region()`.

The Perfetto output already used `begin_ts` correctly (line 818). This change aligns the rocpd output with the Perfetto behavior by using `begin_ts` instead of `user_data->value` (line 887).

Updated rocpd validation rules
2026-01-22 23:47:43 -05:00
Sajina PK 6cf9b217d3 Update CHANGELOG.md (#2813)
Modified the release version for the XGMI/PCIe feature.
2026-01-22 17:48:30 -05:00
habajpai-amd 7e74d163fd [rocprof-sys] Fix RCCL comm_data counters in rocpd output (#2607)
## Motivation

<!-- Explain the purpose of this PR and the goals it aims to achieve. -->
The validate-rccl-* tests were failing because "RCCL Comm" counters were not being written to perfetto traces when using the new cached-perfetto approach.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->
Root Cause: The write_perfetto_counter_track() in rccl.cpp was only called when config::get_use_perfetto() returned true, which requires ROCPROFSYS_TRACE_LEGACY=ON. This meant RCCL counters weren't captured with the new trace cache approach.

Solution: Integrated RCCL with the trace cache system:

Changes to source/lib/rocprof-sys/library/rocprofiler-sdk/rccl.cpp:

- Added cache_rccl_comm_data_events<Track>() function to store RCCL comm data via pmc_event_with_sample with category::comm_data
- Modified tool_tracing_callback_rccl() to always cache events for new perfetto approach, while preserving legacy write_perfetto_counter_track() calls for backward compatibility

Changes to tests/rocprof-sys-testing.cmake:

- Added rccl_api to ROCPROFSYS_ROCM_DOMAINS to enable RCCL API callback tracing

Handler verification: The perfetto_processor_t already has a handler for ROCPROFSYS_CATEGORY_COMM_DATA in m_pmc_track_map that processes the cached events.
2026-01-22 15:38:19 -05:00
Kian Cossettini 28b2ade7d2 Update mentions of OpenMP to reflect newer implementation (#2701)
Update timemory examples in docs to use the `rocprofiler-sdk` API.
2026-01-21 07:18:51 -05:00
Kian Cossettini 7c9361190b [rocprofiler-systems] Fix MPI recv_data calculation (#2694)
Fix incorrect `mpi_recv` calculation. It was using `_send_size` instead of `_recv_size` for `mpi_recv`.
2026-01-20 16:17:22 -05:00
Sajina PK 15c82d6da8 [rocprofiler-system]: Enable UCX Communication API tracing (#2306)
## Motivation

Enable UCX communication tracing and communication metadata 

## Technical Details

Implement UCX API wrappers to trace transport-layer communication. This adds communication data tracking and exposes “UCX Comm Send/Recv” timelines, enabling detailed analysis of MPI, OpenSHMEM, and other UCX-based runtime communication patterns.

- Implements function interception for UCX functions across multiple categories using gotcha component.
- Extended comm_data component to track UCX send/recv operations - Added ucx_send and ucx_recv labels for Perfetto counter tracks. Integrated UCX data tracking with existing MPI/RCCL tracking infrastructure.
- Added ROCPROFSYS_USE_UCX configuration option (enabled by default).
- Created FindUCX.cmake module for UCX header detection. Falls back to internal UCX headers if system headers not found.
- Updated all Dockerfiles  to include UCX dependencies.
2026-01-20 13:16:43 -05:00
Kian Cossettini 698ac6b8bc [rocprofiler-systems] Add build option for "examples" to specify gfx-arch (#2626)
## Motivation
 - Added `check_rocminfo` function that returns true if the provided regex was found, false otherwise. Can also use `GET_OUTPUT` to get the raw output filtered with or without a regex.
 - Moved `rocprofiler_systems_get_gfx_archs()` to `MacroUtilities.cmake` 
 - Added `rocprofiler_systems_lookup_gfx()`, which detects whether a given `gfx` is from the `instinct`, `radeon` or `apu` family.
 - Added `ROCPROFSYS_GFX_TARGETS` as a build argument. Used to specify the offloading architectures that GPU examples should compile for. If empty, defaults to whatever your system has.
 - GPU examples now check if the given `gfx` targets (from `ROCPROFSYS_GFX_TARGETS`) are supported.
 - OMPVV offload tests now only compile if `amdflang` version is `>= 20`
 - Improve link time by reducing the number of GFX targets that binaries need to support.
   - RCCL is now passed a `GPU_TARGETS` var specifying the architectures to build/link against.
2026-01-20 12:13:21 -05:00
marantic-amd 51f49d8835 Add notice for the newly deprecated env variables (#2690) 2026-01-20 13:59:31 +01:00
Milan Radosavljevic b533f56197 Add automatic PyTorch library discovery for Python applications (#2623)
* Add automatic PyTorch library discovery for Python applications (#2623)
2026-01-20 08:42:49 +01:00
David Galiffi c83b3aae07 Fix Python Formatting (#2679)
Updated version of black to 26.1.0 updated some formatting rules

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2026-01-19 21:26:50 -05:00
lloginov-amd e49b501e9a Add scratch memory support (#2211) 2026-01-19 16:24:30 +01:00
habajpai-amd b53c99669c Revert "fix: prevent double-free crash during process exit in amd-smi (#2213)" (#2640)
This reverts commit 7b00d3a89b.

The workaround is no longer needed - root cause fixed in:
- rocm-smi-lib (PR #2531): Made devInfoTypesStrings file-local static
- amdsmi (PR #2575): Added visibility("hidden") attribute
2026-01-16 16:08:52 -05:00
Kian Cossettini 9f014db6a4 [rocprofiler-systems] Update install path for examples (#2625)
* Update install path for examples to `share/rocprofiler-systems/examples`

----

Co-authored-by: Kian Cossettini <Kian.Cossettini@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2026-01-15 21:51:16 -05:00
Milan Radosavljevic 940488ed58 [rocprofiler-systems] Fix naming and description of process_page category (#2606) 2026-01-15 16:10:50 +01:00
Milan Radosavljevic 318d13870f [rocprofiler-systems] Update logging to use spdlog library (#2428)
## Motivation

- Structured logging with proper log levels (TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL)
- Better performance through compile-time formatting
- Consistent formatting using fmt library
- Runtime log level control via arguments and environment variables
- Easier maintenance and debugging capabilities

## Technical Details

- Added spdlog as a submodule and integrated it into CMake build system
- Created new `rocprofiler-systems-logger` library wrapping spdlog functionality
- Replaced custom logging macros (`ROCPROFSYS_VERBOSE`, `ROCPROFSYS_DEBUG`, `ROCPROFSYS_FATAL`, `ROCPROFSYS_REQUIRE`, `ROCPROFSYS_CI_THROW`, etc.) with spdlog equivalents (`LOG_DEBUG`, `LOG_WARNING`, `LOG_CRITICAL`, etc.)
- Implemented log level control through command-line arguments and environment variables
- Converted assertion macros to proper error handling with exceptions and std::abort()
2026-01-14 15:27:51 -05:00
David Galiffi 2daec0e4d0 Revert 63713f01e0 (#2585)
## Motivation

<!-- Explain the purpose of this PR and the goals it aims to achieve. -->
Remove Fortran example due to Palamida scan violation.

## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->
Revert 63713f01e0.
New test to be added later.

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2026-01-12 23:44:26 -05:00
Sajina PK b3f59a37e4 [Rocprofiler-system]: Fix GPU event enumeration for rocprof-sys-avail and CLI option for parsing GPU HW Counters (#2476)
## Motivation

The `rocprof-sys-avail -H -c GPU` command is returning blank output which is expected to display a list of available GPU hardware counters instead.
The `rocprof-sys-sample` and `rocprof-sys-run` is missing the `--gpu-events` option for specifying GPU counter events during profiling.

## Technical Details

The initialize_event_info() function had a logic bug where it only called set_agents() if the agent_manager was empty, but the actual issue was that the gpu_agents and cpu_agents vectors were empty even when agents were discovered.
Fixed the conditional logic to properly call set_agents() when gpu_agents and cpu_agents are empty, regardless of the agent_manager state.

Added the `--gpu-events (-G)` option which sets the `ROCPROFSYS_ROCM_EVENTS` environment variable to the specified values.

Fixes an issue where unsupported GPU/APU arch is being skipped gracefully - more details about this issue in the below comment.
2026-01-09 11:59:45 -05:00
David Galiffi cb17e59a57 [rocprofiler-systems] Improve build time by refactoring RCCL test cmake (#1656)
Improve cmake configuration time by making sure the rccl-tests are built during the build phase rather than the configuration phase.
2026-01-07 19:51:54 -05:00
anujshuk-amd c35a7dd8cb [rocprofiler-systems] Update timemory submodule (#2440)
- Fixes SWDEV-559349 
- Fix build failure caused by correct libunwind not being found in some environments.
- Updated the `timemory` submodule to commit `24407d37ab85c46ba6c18fba9498320f825ee4e4 `.
2026-01-07 19:35:23 -05:00
Aleksandar Djordjevic aecea25a61 [rocprofiler-systems] CMake Cleanup (#2455)
## Technical Details

- Removed `configure_file()` call that was generating `defines.hpp` from `defines.hpp.in` and update CMake file to reference renamed file.
- Remove duplicate `find_library(pthread_LIBRARY NAMES pthread pthreads)`
2026-01-07 14:07:37 -05:00
anujshuk-amd 596ffce5fe [rocprof-sys] Fix segfault from thread ID array overflow (#2172)
**Thread limit configuration and enforcement: **

* Added a check in `CMakeLists.txt` to ensure `ROCPROFSYS_MAX_THREADS` is at least 128, automatically setting it to 128 with a warning if a lower value is provided.
* Replaced hardcoded thread limit (`allowed_max_threads`) in `pthread_create_gotcha.cpp` with the configurable `ROCPROFSYS_MAX_THREADS` value, ensuring all runtime checks and warnings use the actual configured limit.

**Documentation improvements: **

* Updated the development guide to explain the new thread limit behavior, including how exceeding the limit is handled gracefully, how to configure it, and the build-time validation rules.

**Test updates: **

* Modified thread limit tests to use the configurable `ROCPROFSYS_MAX_THREADS` value instead of a hardcoded limit and expanded the range of tested thread values.
* Increased test timeouts to accommodate larger thread counts and ensure reliability with higher limits.
2026-01-07 14:03:37 -05:00
habajpai-amd 9e4d1c31c7 fix: prevent static initialization deadlock in thread_data (#2474)
* fix: prevent static initialization deadlock in thread_data

* update comment
2026-01-06 16:39:32 +05:30
Jason Bonnell 1d5a6e9bfe Update rocprofiler workflows to use new mi325 runner names (#2467)
* Update rocprofiler workflows to use new runner naming for mi325

* Add input options to workflow_dispatch for rocprofiler-systems CI workflow

* Update runner name on therock-ci-linux.yml as well
2026-01-05 15:41:01 -05:00
marantic-amd bb83791b17 Remove redundant ROCPROFSYS_TRACE_CACHED variable from the code (#2434) 2025-12-25 13:36:04 +01:00
marantic-amd c3132773c8 Fix agent device ID in the cached kernel_dispatch trace (#2452) 2025-12-25 10:23:16 +01:00
Milan Radosavljevic 719556fbba [rocprofiler-systems] Add SIGKILL delay option (#2384)
## Motivation

When profiling multi-process applications where a parent process sends SIGKILL to child processes, the termination can occur before the profiler has a chance to flush collected data. This PR introduces a configurable delay before SIGKILL signals are forwarded, allowing profiling data to be captured before process termination. This is workaround.

## Technical Details

- Added new configuration setting `ROCPROFSYS_KILL_DELAY` (default: 0 seconds) to specify a delay before SIGKILL signals are forwarded to other processes
- Implemented `kill_gotcha` component that intercepts the `kill()` system call
- The gotcha only delays SIGKILL signals sent to external processes (pid > 0 and not self)
- Integrated `kill_gotcha_t` into the `preinit_bundle_t` for early initialization
2025-12-22 21:17:57 -05:00
habajpai-amd 447025011a [Rocprof-Sys] Resolve crash when profiling TensorFlow GPU application (#2381)
* fix: resolve crash when profiling TensorFlow GPU application

* incorporate review comments

* updated min_rows from 3 to 2 for threads table validation as internal threads are not profiled and are now correctly bypassed
2025-12-22 14:00:55 -05:00
marantic-amd ba1380a75d Put cached perfetto traces as default one (#2138)
* Put cached perfetto traces as default one

* Improve cached data and perfetto traces in order to be more aligned with E2E tests

* Addressing PR comments and findings

* Force early instrumentation bundle instantiation

* Sync-up insturumented containers with thread growth data

* Revert ompvv number of host threads to default 8

* Fixed counter track namings for amd-smi

* AIPROFSYST-34 [rocprof-sys] Update documentation describing newly introduced changes to default tracing mechanism
2025-12-22 12:47:35 +01:00
Aleksandar Djordjevic 7da3275b42 [rocprofiler-systems] Improve metadata parsing (#2238)
* Improve metadata JSON parsing
* Fix string ownership
2025-12-22 12:30:51 +01:00
habajpai-amd 7b00d3a89b fix: prevent double-free crash during process exit in amd-smi (#2213) 2025-12-19 11:56:40 +05:30
habajpai-amd b4e04b07ed test: add unit tests for common utilities from PR #1249 (#2237)
* test: add unit tests for common utilities from PR #1249

* incorporate review comments specific to tests formatting

* use filesystem API instead of std::system for safer cleanup

* Add ghc/filesystem submodule v1.5.14 for portable C++17 filesystem support

* fix: add cmake/GhcFilesystem.cmake for CI submodule auto-checkout

* incorporate review comment

* incorporate review comment
2025-12-18 11:03:14 +05:30
marantic-amd 79ad00fb15 Scale down memory usage data when the actual data is stored to cache (#2343) 2025-12-17 14:57:41 +01:00
Aleksandar Djordjevic 0b4a309ff7 [rocprofiler-systems] Add span (#2142)
* Add span
* Update unit tests
2025-12-16 13:38:47 +01:00
Milan Radosavljevic 666e76deac [rocprofiler-systems] Add cached demangler and replace old demangle (#2135)
* Add cached demangler and replace old

* Add unit tests

* Applied suggestions from code review

* Applied suggestions from code review
2025-12-16 08:32:18 +01:00
habajpai-amd 6b45657493 update build rccl-tests infrastructure and add getAlgoProtoChannels support (#2212) 2025-12-11 18:29:06 +05:30
David Galiffi e4bd55b0f0 [rocprof-sys] Update VERSION to 1.4.0 (#2216)
Bumping version to 1.4.0, since `release/rocm-rel-7.2` has branched
2025-12-08 15:15:18 -05:00
Mario Limonciello d1aaae2539 Run pre-commit's whitespace related hooks on projects/rocprofiler-systems (#2123)
In order for pre-commit to be useful, everything needs to meet a common
baseline.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-12-04 23:39:42 -05:00
Jason Bonnell 463126770a Update build docker container workflow, opensuse dockerfiles (#1883)
## Motivation

<!-- Explain the purpose of this PR and the goals it aims to achieve. -->

- __Reduced Code Duplication__: Version parsing logic moved from individual Dockerfiles to the central build script
- __Improved Edge Case Handling__: Better handling of ROCm versions with and without patch numbers (e.g., `6.2` vs `6.2.0`)
- __Easier Maintenance__: Future version-related changes only need to be made in one place
- __Cleaner Dockerfiles__: Simplified Dockerfiles focus on package installation rather than complex shell logic
- __Updated Platform Support__: Refreshed container matrix to reflect current platform/ROCm version combinations
- __Fix OpenSUSE Docker Generation__: OpenSUSE container generation fails due to a change to the `binutils-gold` package
- __Error Handling__: Fix bug where errors in docker image build were being masked, allowing workflow to pass anyway.


## Technical Details

<!-- Explain the changes along with any relevant GitHub links. -->
- Updated `Dockerfile.opensuse` and `Dockerfile.opensuse.ci` docker files to remove `binutils-gold`
  - Not needed since we build `binutils` with systems anyways
- Updated `rocprofiler-systems-containers.yml` to remove `pushd/popd` commands and just run the shell scripts
  - There was a silent failure observed here, which I verified in this PR before adding the fix for openSUSE
- Refactor ROCm version parsing. Move this logic to the `build-docker.sh` script to reduce duplication.
  - Fix bug that caused ROCm 7.0 to fail installation. The trailing `.0` was being trimmed.
- Fixed inconsistencies in `containers.yml` that lead to invalid ROCm-OS_VERSION combinations.
- Formatting fixes 
  - Removed trailing whitespace
  - Fix docker build warnings. Use an `=` rather than ` ` when assigning an environment variable.
2025-12-04 23:33:15 -05:00
habajpai-amd 30161885e2 refactor: centralize update_env across binaries with unit test added … (#2029)
* refactor: centralize update_env across binaries with unit test added for testing

* removed unused includes suggested by clangd and small cleanup

* use centralized update_env in argparse as well

* review comments incorporated

* move update_env tests closer to common library

* fix: missing common:: prefix in rocprof-sys-sample

* cmake formatting
2025-12-04 19:24:27 +05:30
Kian Cossettini 43f0a53fb0 Skip transferbench validation if binary is not built (#2136)
Added set(skip_validation TRUE) if transferBench binary not found.
2025-12-03 10:14:16 -05:00
Milan Radosavljevic fddef714a0 [rocprofiler-systems] Add trace_cache unit tests (#2086)
Improve test coverage and reliability of the trace_cache module by adding comprehensive unit tests for all major components.
2025-12-03 09:25:33 -05:00
Milan Radosavljevic 09a9f9e31d [rocprofiler-systems] Improve rocpd writing speed (#2061) 2025-12-03 13:11:15 +01:00
marantic-amd 3b11e01716 Perfetto traces from cached data (#1704)
## Motivation

The idea is to unify the way and place where we store our traces. Current implementation uses `trace_cache` for rocpd traces, but perfetto is in lined inside of each module. This change allows us to have a single point in code where we will collect data, process it and store it in the desired format. This means that we can declutter the code further and have single point of responsibility and single point of failure.

## Technical Details

New `processor` (perfetto_post_processing.cpp) is added to the `trace_cache` which purpose is to use the cached data to populate perfetto tracks. Cache manager is responsible for keeping the instance of this processor and for its lifetime.
2025-12-01 09:59:16 -05:00
Kian Cossettini b506c75f28 [rocprof-sys] Fix roctx wall clock tree, change timemory push/pop to use proper category, and add roctx as valid domain choice (#2062)
When doing this ticket, I also noticed the program would SEGFAULT when ROCPROFSYS_ROCM_DOMAINS=roctx even though the docs tell us we can do this. Went ahead and fixed that.

Also noticed that timemory push/pop in rocprofiler-sdk.cpp was always using category::rocm_marker_api instead of CategoryT. Fixed that as well.
2025-12-01 09:50:58 -05:00