116 Commits

Author SHA1 Message Date
Kian Cossettini 698ac6b8bc [rocprofiler-systems] Add build option for "examples" to specify gfx-arch (#2626)
## Motivation
 - Added `check_rocminfo` function that returns true if the provided regex was found, false otherwise. Can also use `GET_OUTPUT` to get the raw output filtered with or without a regex.
 - Moved `rocprofiler_systems_get_gfx_archs()` to `MacroUtilities.cmake` 
 - Added `rocprofiler_systems_lookup_gfx()`, which detects whether a given `gfx` is from the `instinct`, `radeon` or `apu` family.
 - Added `ROCPROFSYS_GFX_TARGETS` as a build argument. Used to specify the offloading architectures that GPU examples should compile for. If empty, defaults to the architectures detected on the build system.
 - GPU examples now check if the given `gfx` targets (from `ROCPROFSYS_GFX_TARGETS`) are supported.
 - OMPVV offload tests now only compile if `amdflang` version is `>= 20`
 - Improve link time by reducing the number of GFX targets that binaries need to support.
   - RCCL is now passed a `GPU_TARGETS` var specifying the architectures to build/link against.
2026-01-20 12:13:21 -05:00
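The `check_rocminfo` behavior described in this commit (report whether a regex was found, or return the output with or without a regex filter) can be sketched in Python as follows. This is only an illustration of the matching logic, not the CMake implementation: the function names `check_output` and `get_output` and the sample text are hypothetical, and the sample is not captured from a real `rocminfo` run.

```python
import re
from typing import List, Optional

# Hypothetical stand-in for tool output; not taken from a real system.
SAMPLE_OUTPUT = """\
Agent 1
  Name:                    gfx90a
Agent 2
  Name:                    gfx1100
"""


def check_output(text: str, pattern: str) -> bool:
    """Return True if `pattern` matches anywhere in `text`, False otherwise."""
    return re.search(pattern, text) is not None


def get_output(text: str, pattern: Optional[str] = None) -> List[str]:
    """Return every line, or only the lines matching `pattern` when one is given."""
    lines = text.splitlines()
    if pattern is None:
        return lines
    return [line for line in lines if re.search(pattern, line)]
```

A caller could use `check_output(SAMPLE_OUTPUT, r"gfx90a")` as a yes/no probe and `get_output(SAMPLE_OUTPUT, r"gfx\d+")` to collect the matching lines.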
Kian Cossettini 9f014db6a4 [rocprofiler-systems] Update install path for examples (#2625)
* Update install path for examples to `share/rocprofiler-systems/examples`

----

Co-authored-by: Kian Cossettini <Kian.Cossettini@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2026-01-15 21:51:16 -05:00
David Galiffi 2daec0e4d0 Revert 63713f01e0 (#2585)
## Motivation

Remove Fortran example due to Palamida scan violation.

## Technical Details

Revert 63713f01e0.
New test to be added later.

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2026-01-12 23:44:26 -05:00
David Galiffi cb17e59a57 [rocprofiler-systems] Improve build time by refactoring RCCL test cmake (#1656)
Improve cmake configuration time by making sure the rccl-tests are built during the build phase rather than the configuration phase.
2026-01-07 19:51:54 -05:00
marantic-amd ba1380a75d Put cached perfetto traces as default one (#2138)
* Put cached perfetto traces as default one

* Improve cached data and perfetto traces in order to be more aligned with E2E tests

* Addressing PR comments and findings

* Force early instrumentation bundle instantiation

* Sync-up instrumented containers with thread growth data

* Revert ompvv number of host threads to default 8

* Fixed counter track names for amd-smi

* AIPROFSYST-34 [rocprof-sys] Update documentation describing newly introduced changes to default tracing mechanism
2025-12-22 12:47:35 +01:00
habajpai-amd 6b45657493 update build rccl-tests infrastructure and add getAlgoProtoChannels support (#2212) 2025-12-11 18:29:06 +05:30
Mario Limonciello d1aaae2539 Run pre-commit's whitespace related hooks on projects/rocprofiler-systems (#2123)
In order for pre-commit to be useful, everything needs to meet a common
baseline.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
2025-12-04 23:39:42 -05:00
Kian Cossettini 63713f01e0 [rocprofiler-systems] Add Fortran MPI CTests (#1172)
* Add MPI CTests (use gfortran)

* Add proper regex check

* Skip Runtime-Instrument due to incompatibility with MPI

---------

Co-authored-by: Sajina Kandy <sputhala@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-11-27 10:32:09 -05:00
anujshuk-amd 85b5c03f36 [rocprof-sys] Fix test build failure on RHEL 10 (#1955)
## Motivation

To solve: SWDEV-566076 
FFmpeg versions >= 58.134 no longer expose read_seek and read_seek2 function pointers in AVInputFormat,
requiring alternative seek detection methods. This pull request updates the `VideoDemuxer` class to improve compatibility with newer versions of FFmpeg. The main change is how the code determines whether the input file is seekable, addressing differences in FFmpeg API versions.


## Technical Details

In `video_demuxer.h`, added a conditional check for `USE_AVCODEC_GREATER_THAN_58_134` that sets `is_seekable_` to `true` for newer FFmpeg versions, since `read_seek` and `read_seek2` are no longer exposed in `AVInputFormat`. For older versions, the previous method of checking these fields remains in place. With this conditional compilation, seek capability is assumed to be available on newer FFmpeg versions.
2025-11-25 15:25:05 -05:00
habajpai-amd 1a3564a51a [rocprof-sys] Fix fork() handling for GPU profiling and AMD SMI (#1930)
- Fix fork() handling for GPU profiling and AMD SMI
- Add hipMallocConcurrency test for CI with GPU
2025-11-24 09:21:27 -05:00
Sajina PK 09b8342e22 [Rocprofiler-systems] : Add XGMI and PCIe metrics to the profiling data (#1628)
* Add XGMI and PCIe metrics to the profiling data

Add support for AMD XGMI (GPU-to-GPU interconnect) and PCIe
metrics:
  * XGMI link width in bits
  * XGMI link speed in GT/s
  * Per-link read bandwidth (KB)
  * Per-link write bandwidth (KB)

- Add new categories for PCIe metrics:
  * PCIe link width
  * PCIe link speed in GT/s
  * Accumulated bandwidth (MB)
  * Instantaneous bandwidth (MB/s)

* Fix VCN/JPEG insert logic

* Modify the gpu_metrics struct to accommodate XCP structure

* Add ctest automation for gpu interconnect metrics

* Refactor to move gpu_metrics struct and serialization to another file

* Possible fix for timeout in CI

Fix redundant skip check in ctest
Add xgmi and pcie option in rocprof-sys-avail.

* Change2: Address review comments

Change ctest sampling to avoid timeout
Change variable name and code structuring

* Add option in ctest to run rocprof-sys-run without rewrite

Run transferbench with rocprof-sys-run without sampling

* Change3: Fix sample insert bug and address review comments

xgmi and pci support check
renaming variables
additional hip_api validation in rocpd

* Reduce the load from the TransferBench sample

The CI builds were timing out when flushing a big temporary file to the
DB: (2720824.23 KB / 2720.82 MB / 2.72 GB)...
2025-11-14 19:42:33 -05:00
ajanicijamd 2f9017f706 Fix build failure with Clang 20. (#1667)
* Modified for Clang

* Updated timemory version so it compiles with Clang 20

* Using TBB version 2018.6 for both GCC and Clang builds
2025-11-08 11:36:12 -05:00
Kian Cossettini f4d0aeb8f3 Adjust host thread count for OpenMP-VV tests (#1742)
Reducing test time
2025-11-06 16:04:47 -05:00
Kian Cossettini 2a080641a1 [rocprofiler-systems] Consolidate CTests to tests/ folder (#1461)
* Consolidate CTests to tests/ folder

* Remove comment

* Separate source code and test code for thread-limit into appropriate folders

* Remove sleeper.cpp and instead use linux sleep cmd

* Merge python-console tests into python-tests
2025-11-03 11:03:35 -05:00
Kian Cossettini db949445c3 [rocprofiler-systems] Overhaul OpenMP-VV Test compilation (#1389)
* Reworked Compilation

* Formatting

* Change compile log name

* Optimize Code

* Remove gfx940 and gfx941
2025-10-23 13:58:11 -04:00
habajpai-amd 74fc268a32 Add libomptarget discovery to prevent OpenMP/HIP segfaults (#1043)
This PR fixes a segmentation fault seen when running rocprof-sys-sample with multi-process OpenMP/HIP applications.
The crash was caused by missing libomptarget.so on the runtime loader path or incorrect LD_PRELOAD settings.

Fixes SWDEV-552804

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-10-01 09:51:26 -04:00
Julian Jose 8157437273 [Palamida scan] SWDEV-553054 Adding missing copyrights information (#900)
* Add missing copyright headers in rocprofiler-systems
* Update python-tests
* Update causal test

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-09-12 14:17:58 -04:00
Kian Cossettini 5d582fcd37 [rocprofiler-systems] Add Fortran OpenMP CTests (#874)
* Added Fortran (amdflang) openmp tests using the openmp-vv project

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
2025-09-12 09:52:16 -04:00
habajpai-amd 1c7293e6d0 Add bounds checks in transpose_a for both load and store so edge tiles don't read/write past MxN (#950) 2025-09-12 17:32:30 +05:30
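The idea behind guarding both the load and the store of a tiled transpose can be sketched on the CPU in Python. This is an illustrative model under assumptions, not the `transpose_a` kernel itself: the function name `transpose_tiled`, the flat row-major layout, and the tile size are all hypothetical.

```python
def transpose_tiled(src, rows, cols, tile=4):
    """Transpose a row-major rows x cols matrix (flat list) one tile at a time."""
    dst = [0] * (rows * cols)
    tiles_y = (rows + tile - 1) // tile  # round up so edge tiles are covered
    tiles_x = (cols + tile - 1) // tile
    for by in range(tiles_y):
        for bx in range(tiles_x):
            scratch = [[0] * tile for _ in range(tile)]
            # Load phase: skip positions of an edge tile that fall outside the input.
            for ty in range(tile):
                for tx in range(tile):
                    r, c = by * tile + ty, bx * tile + tx
                    if r < rows and c < cols:
                        scratch[ty][tx] = src[r * cols + c]
            # Store phase: the output is cols x rows, so the guard uses swapped limits.
            for ty in range(tile):
                for tx in range(tile):
                    out_r, out_c = bx * tile + ty, by * tile + tx
                    if out_r < cols and out_c < rows:
                        dst[out_r * rows + out_c] = scratch[tx][ty]
    return dst
```

Without the two `if` guards, a matrix whose dimensions are not multiples of the tile size would index past the end of `src` on load or `dst` on store.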
habajpai-amd fb6fe518e8 fix(transpose): correct host allocation and GB/s calculation (#860) 2025-09-04 16:08:16 -04:00
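The commit does not state which GB/s formula it corrected. One common convention for a transpose, assumed here, counts each element as read once and written once. The function name and parameters below are hypothetical.

```python
def effective_bandwidth_gbs(rows, cols, elem_size_bytes, elapsed_seconds):
    """Estimate effective bandwidth in GB/s (1 GB = 1e9 bytes) for a transpose."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Assumption: every element is read once and written once.
    bytes_moved = 2 * rows * cols * elem_size_bytes
    return bytes_moved / elapsed_seconds / 1e9
```

Under this assumption, a 1000 x 1000 matrix of 4-byte elements transposed in 1 ms moves 8 MB, which corresponds to roughly 8 GB/s.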
habajpai-amd cd729ab630 Improve library discovery in openmp-target example (#792)
cmake(openmp/target): make libomptarget discovery robust across ROCm layouts
2025-08-28 14:55:55 -04:00
Kian Cossettini 07a7b9b845 Use rocprofiler-SDK for OMPT tracing (#702)
Switch to using SDK for OMPT tracing and remove older OMPT code path
2025-08-26 16:54:01 -04:00
David Galiffi 847580dd9e Update minimum_cmake_required to match version used in CI (#679)
- Update minimum_cmake_required to match version used in CI
  - We should match the minimum version that we test against

- Ensure ".S" files are treated as assembly.
2025-08-21 15:56:47 -04:00
habajpai-amd 15fb4943e2 Fix the openmp-target ctest (#300)
- openmp-target: add runtime rpath for libomptarget and update tests
- Handle events not associated with a HIP Stream
  - Kernels from OpenMP target offload are not associated with a HIP stream. Fix handling when the callback record's stream_id is 0

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: c424dac261]
2025-07-31 10:41:27 -04:00
Kian Cossettini df7605835f fork-runtime-instrument ctest fix (#295)
* Fix fork-runtime-instrument ctest failure by adding -fno-inline flag

[ROCm/rocprofiler-systems commit: d1a2deba1f]
2025-07-31 07:59:41 -04:00
Sajina PK aa9b265302 Manually search for rocdecode and rocjpeg libraries in cmake (#294)
* Manually search for rocdecode and rocjpeg libraries

* Update examples/jpegdecode/CMakeLists.txt

Fix typo.

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: f4e9846e1c]
2025-07-21 14:07:00 -04:00
habajpai-amd 7eb189db84 Add missing <cstring> include for C string functions in RCCL tests (#282)
* Fix: Add missing <string.h> include for C string functions in RCCL tests

* Update examples/rccl/rccl-tests/src/common.h

Yes, confirmed—<cstring> alone works in my environment. Updated the PR

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

* clang-format

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 0ec3072e05]
2025-07-16 11:23:50 -04:00
Sajina PK 1892bdef83 Make CMakeFile fixes to align with rocDecode and rocJpeg changes (#281)
[ROCm/rocprofiler-systems commit: be06384250]
2025-07-14 19:23:19 -04:00
Pranjal Swarup 0497b7934f Update RCCL-tests in examples folder (#261)
- Create a local copy for ROCm/rccl-tests for our examples.
- Update argument parsing to no longer use getopt_long.
- Workaround for Dyninst instrumentation.

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 4e5029221b]
2025-06-27 11:44:13 -04:00
anujshuk-amd 0c91a0d8ed Add ctests to verify roctx api (#260)
---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 3631362903]
2025-06-25 14:01:04 -04:00
David Galiffi 8fcf3a50b0 Use gersemi for CMake formatting (#257)
* Replace `cmake-format` with `gersemi`

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Remove .cmake-format.yaml

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update workflow to use gersemi

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update CONTRIBUTING.md

* Update helper scripts

* Don't include `*/external/*` in workflows

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 122623a929]
2025-06-22 10:44:33 -04:00
David Galiffi 0403aaa97f Use clang-format-18 for source formatting (#256)
* Updating clang-format to v18

- Updates the pre-commit-config
- Formats source files according to the utility

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update format source workflow

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update CONTRIBUTING

* Update comment in .clang-format

* Update CONTRIBUTING.md

* Update helper script

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 1e13b590e7]
2025-06-22 08:48:08 -04:00
Daniel Su f155355b33 Add thread header to videodecode example (#252)
[ROCm/rocprofiler-systems commit: 475d6c0f1f]
2025-06-18 12:40:24 -04:00
David Galiffi db21150ab0 Fix OpenMP-Target ctest (#241)
Test is missing from rocm-7.0 stack because of a HIP version check.
In these builds, hip_version.h is still reporting 6.5.0.
This check was originally put in to skip the test on older versions
of ROCm, which should no longer be required.

- For SWDEV-537718

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 28bee27253]
2025-06-11 16:00:48 -04:00
Sajina PK 916aac1e92 Enable MPI tracing for Fortran (#185)
- Move the MPI gotcha functionality from Timemory to the repo.
- Add the PMPI Fortran MPI functions to the existing mpi gotcha handle.

[ROCm/rocprofiler-systems commit: 4fcd8cc78d]
2025-06-04 18:06:18 -04:00
David Galiffi 172fc80443 Fix openmp-target test on gfx950 (#204)
For SWDEV-530112

[ROCm/rocprofiler-systems commit: 45b600b34e]
2025-05-12 14:20:24 -04:00
David Galiffi 29a312f7df Fix build failure for the openmp-target example when building in docker. (#197)
- Add cmake formatting rule for `rocprofiler-systems-custom-compilation`
- Resolve build failure observed with the `openmp-target` example when building in some environments
  - Observed in our docker images
  - Ensure `libomptarget-amdgpu-gfx*.bc` files are found


[ROCm/rocprofiler-systems commit: 96b31d92c3]
2025-05-05 21:49:46 -04:00
Sajina PK b5b5bac8b2 Updates to rocDecode and rocJPEG samples (#166)
* Edit CMakeList files and include paths for library headers.


[ROCm/rocprofiler-systems commit: e805c0f3e7]
2025-04-16 17:12:21 -04:00
Sajina PK 04fb7e4fe7 RocJpeg cmake and document fixes (#157)
- Fix for rocjpeg sample cmake due to changes in the rocJPEG project
- Fix for rocprofiler-sdk version check - change the format
- Edits to docs for jpeg and vcn activity support - mention that these values may not be supported on all ASICs.

[ROCm/rocprofiler-systems commit: fad3a0d341]
2025-04-09 16:20:02 -04:00
Sajina PK e004775878 Jpeg sample fix for seg fault (#158)
Update the rocJpeg sample used for testing with the latest one from the rocJPEG repo.

[ROCm/rocprofiler-systems commit: 83d6f73f03]
2025-04-03 13:54:32 -04:00
Sajina PK 3ca3d63d5c Fix for excluding JPEG and VCN activity test. (#135)
JPEG activity recording is currently only supported on the MI300 series.
VCN activity is also supported on MI100, but there is a bug that is currently being fixed by FW.

- Currently only testing the Activity verification tests for MI300
- Also moves the Jpeg image copying code to after the package is found.

[ROCm/rocprofiler-systems commit: e605e5d33f]
2025-03-11 14:12:28 -04:00
Sajina PK 1db0539c30 Add support for rocJPEG API tracing (#116)
- Add rocJPEG API tracing support using domain `rocjpeg_api` in ROCPROFSYS_ROCM_DOMAINS.
- Modify existing `videodecode` and `jpegdecode` ctests to verify API tracing
- Print Perfetto values for easy debugging in verbose mode
- Convert CMake error to a warning and skip building the "decode" examples if requirements are not found

[ROCm/rocprofiler-systems commit: 3bea1d8eac]
2025-02-25 21:14:14 -05:00
Sajina PK 572f9532ef JPEG Activity tracing in Perfetto (#108)
- Add JPEG activity track in perfetto trace
- Add JPEG decode tests to the examples
- Change existing videodecode test to include JPEG testing
- Rename videodecode test file to decode to include jpeg tests too
- Fix a bug in the test which checks for total activity of 0
- Disable rocDecode and rocJPEG samples from the github image files

[ROCm/rocprofiler-systems commit: 59d3399901]
2025-02-21 10:25:01 -05:00
David Galiffi 0c1cf2c7d7 Update the videodecode ctest (#85)
- Copy sample videos and include in install.
- Only using the H26* videos, so the same tests can be used on MI100, which lacks AV1 decode support
- Update test parameters to videodecode test, based on feedback from the rocDecode team.

[ROCm/rocprofiler-systems commit: 5b439d80a0]
2025-01-29 16:53:16 -05:00
Peter Park 3f9a3861ac Update copyright year to 2025 (#83)
[ROCm/rocprofiler-systems commit: 0a15d355e0]
2025-01-29 16:53:16 -05:00
Sajina PK b502b03be2 Add tests to check VCN activity tracing in perfetto output (#75)
The test will run sampling on the example from the rocDecode repo:
https://github.com/ROCm/rocDecode/tree/develop/samples/videoDecodeBatch

Then, ensure that the Perfetto output captures VCN activity in the trace.

[ROCm/rocprofiler-systems commit: 6dd4938b3e]
2025-01-29 16:53:16 -05:00
darren-amd 5a6f6ed83d Allow disabling of openmp examples (#64)
Add flag to disable openmp examples

---------

Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 888b9a43a0]
2024-12-16 14:31:56 -05:00
David Galiffi 6a6fd7f0f9 OMPT Target Offload Support (#17)
- Porting from https://github.com/ROCm/omnitrace/pull/411
- Improve OMPT support
- Add OpenMP target example to testing
- Update Timemory submodule to use ROCm/Timemory rather than NERSC/Timemory
- Update `actions/upload-artifacts` to v4
- Standardize the `cmake_minimum_required` to 3.18.4 across workflows, project, and examples
- Updated Ubuntu 20.04 workflows

[ROCm/rocprofiler-systems commit: 7dce5926a7]
2024-11-07 16:49:32 -05:00
David Galiffi 181a782835 Rename some examples still using the "omni" prefix. (#8)
* Rename some examples still using the "omni" prefix.

* CMake formatting

[ROCm/rocprofiler-systems commit: 656b34b61f]
2024-10-17 14:52:09 -04:00
David Galiffi 489eda995d Rename Omnitrace to ROCm Systems Profiler (#4)
The Omnitrace program is being renamed. 

Full name: "ROCm Systems Profiler"
Package name: "rocprofiler-systems"
Binary / Library names: "rocprof-sys-*"

---------
Co-authored-by: Xuan Chen <xuchen@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: d07bf508a9]
2024-10-15 11:20:40 -04:00