9 Commits

Auteur SHA1 Bericht Datum
Sajina PK 09b8342e22 [Rocprofiler-systems] : Add XGMI and PCIe metrics to the profiling data (#1628)
* Add XGMI and PCIe metrics to the profiling data

Add support for AMD XGMI (GPU-to-GPU interconnect) and PCIe
metrics:
  * XGMI link width in bits
  * XGMI link speed in GT/s
  * Per-link read bandwidth (KB)
  * Per-link write bandwidth (KB)

- Add new categories for PCIe metrics:
  * PCIe link width
  * PCIe link speed in GT/s
  * Accumulated bandwidth (MB)
  * Instantaneous bandwidth (MB/s)

* Fix VCN/JPEG insert logic

* Modify the gpu_metrics struct to accomodate XCP structure

* Add ctest automation for gpu interconnect metrics

* Refactor to move gpu_metrics struct and serialization to another file

* Possible fix for timeout in CI

Fix redundant skip check in ctest
Add xgmi and pcie option in rocprof-sys-avail.

* Change2: Address review comments

Change ctest sampling to avoid timeout
Change variable name and code structuring

* Add option in ctest to run rocprof-sys-run without rewrite

Run transferbench with rocprof-sys-run without sampling

* Change3: Fix sample insert bug and address review comments

xgmi and pci support check
renaming variables
additional hip_api validation in rocpd

* Reduce the load from the trnasferBench sample

The CI builds were timing out when flushing a big temporary file to the
DB: (2720824.23 KB / 2720.82 MB / 2.72 GB)...
2025-11-14 19:42:33 -05:00
Kian Cossettini 348afae1a8 Improve rocprof-sys-avail to report VCN and JPEG metrics on supported devices (#226)
* SWDEV-535445: rocprof-sys-avail shows jpeg_activity even when unsupported

* Added vcn tracking

* jpeg and vcn description now includes supported gpus

* Add getter methods per device to check vcn and jpeg support

Add logic to check if vcn activity and vcn busy values are supported for each device.
Add logic to check if jpeg activity and jpeg busy values are supported for each device.

Co-authored-by: Sajina P Kandy <sputhala@amd.com>

* Add getter methods per device to check vcn and jpeg support (#228)

* Formatting

* Variable fix

* List of supported GPUs are now ordered

* Removed the ability to see which gpu supports jpeg and vcn activity to reduce clutter

* Formatting

* Testing for busy support

* jpeg and vcn only show if supported

* Removed commented code

* Formatting

* Applied amd_smi cpp/hpp fixes

* Added break condition for xcp loop

* Modified loops for efficiency

* Removed unneccessary macro

* Removed unneccessary includes

---------

Co-authored-by: Sajina Kandy <sputhala@amd.com>
Co-authored-by: Sajina PK <Sajina.PuthalathKandy@amd.com>

[ROCm/rocprofiler-systems commit: 0380cf58ba]
2025-06-09 16:14:53 -04:00
David Galiffi bd0eeb9555 Reapply "Upgrade ROCm-SMI to AMD SMI (#86)" (#147)
* Reapply "Upgrade ROCm-SMI to AMD SMI (#86)"

This reverts commit 9fcea73122.

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 85bbea4954]
2025-03-25 17:31:27 -04:00
David Galiffi 9fcea73122 Revert "Upgrade ROCm-SMI to AMD SMI (#86)" (#100)
This reverts commit 8c5db3f1d8.

[ROCm/rocprofiler-systems commit: b3eee295dd]
2025-02-07 11:45:26 -05:00
cfallows-amd 8c5db3f1d8 Upgrade ROCm-SMI to AMD SMI (#86)
* Integrating amd-smi into rocprofiler-systems due to rocm-smi deprecation.
* No functionality changes to users other than naming conventions.
* New tracks available in perfetto- gpu busy percentage metrics now splits gfx busy into separate gfx, umc, and mm engine measurements.

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: 0c32dfd6bc]
2025-01-30 21:32:07 -05:00
Peter Park 3f9a3861ac Update copyright year to 2025 (#83)
[ROCm/rocprofiler-systems commit: 0a15d355e0]
2025-01-29 16:53:16 -05:00
David Galiffi b29cfac106 Update to use rocprofiler-sdk (#55)
- Renames the CMake option "ROCPROFSYS_USE_HIP" to "ROCPROFSYS_USE_ROCM"
- Remove the "ROCPROFSYS_USE_ROCM_SMI option. Controlled with the "ROCPROFSYS_USE_ROCM" option, instead.
   - Runtime configuration can still toggle ROCPROFSYS_USE_ROCM_SMI to disable the sampling.
- Rename ROCPROFSYS_HIP_VERSION macro to ROCPROFSYS_ROCM_VERSION and remove blocks for `ROCPROFSYS_ROCM_VERSION < 60000`
- Remove ROCPROFSYS_USE_ROCTRACER and ROCPROFSYS_USE_ROCPROFILER
- Update test cases
- Update docker files and workflows to install cmake 3.21, which is required for the rocprofiler-sdk findPackage script.
- Removed rocm-6.2 from workflows due to a rocprofiler-sdk API change. 

[ROCm/rocprofiler-systems commit: 88aa2d3cbe]
2024-12-13 18:48:39 -05:00
David Galiffi 489eda995d Rename Omnitrace to ROCm Systems Profiler (#4)
The Omnitrace program is being renamed. 

Full name: "ROCm Systems Profiler"
Package name: "rocprofiler-systems"
Binary / Library names: "rocprof-sys-*"

---------
Co-authored-by: Xuan Chen <xuchen@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>

[ROCm/rocprofiler-systems commit: d07bf508a9]
2024-10-15 11:20:40 -04:00
Jonathan R. Madsen b2bedda138 restructure libomnitrace + tasking and omnitrace-causal updates (#237)
* restructured libomnitrace

- this is necessary to incorporate some of the binary analysis capabilities into omnitrace exe
- created libomnitrace-core (static)
- created libomnitrace-binary (static)
- created libomnitrace (static)
- omnitrace-avail links to libomnitrace.a
- omnitrace-critical-trace links to libomnitrace.a
- tweaked the testing
  - reduced verbosity on some of MPI tests
  - excluded trace-time-window from tests on Ubuntu 18.04
  - reduced causal e2e iterations
- minor tweak to tasking
  - manually create `PTL::UserTaskQueue` instance instead of relying on `PTL::ThreadPool` to create it

* Update formatting workflow

- source formatting uses ubuntu-22.04
- check-includes doesn't generate false positive for 'include "timemory.hpp"'

* omnitrace-causal --generate-configs

- fix config generation in omnitrace causal
- add test for omnitrace-causal + generating configs

* Fix omnitrace-object-library build

- accidentally included rocm sources in non-rocm builds

* Fix rocm compilation w/o rocprofiler

* update timemory submodule with mpi_get warning messages

* sampling offload file updates

- more verbose messages
- disable offload before stopping

* testing updates

- increase causal e2e iterations to 12
- increase lock_environment verbose to 2 (for sampling offload messages)
- fix return for omnitrace_add_validation_test

[ROCm/rocprofiler-systems commit: e7d3125459]
2023-02-04 10:59:50 -06:00