2
0

109 Cometimentos

Autor(a) SHA1 Mensagem Data
vedithal-amd 996202f560 [rocprofiler-compute] Backport documentation changes from ROCm 7.1 release branch (#2894)
* Backport documentation changes from ROCm 7.1 release branch

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Address review comments

---------

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-27 17:22:41 -05:00
ggottipa-amd 77f7541755 [rocprofiler-compute] Adding --torch-trace option for SWDEV-559789 (#2089)
* Adding --torch-operator option in rocprof-compute. Creates csv file for
each operator that has gpu activity, showing operator to counter values
mapping.

* --torch-operators flag added to rocprofiler-sdk

* Adding ctest for --torch-operators.

* Adding pytest markers.

* Corrections in ctest and message logging.

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Adding a check for pytorch installation only when --torch-operators is passed.

* moving inject_roctx.py into src/utils.

* rebase

* Updating docs and changelog.

* Update projects/rocprofiler-compute/src/argparser.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/src/utils/inject_roctx.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Removing special characters.

* Minor corrections.

* Setting default value for torch_operators_enabled.

* Updating the number of files according to the number of passes.

* Adding rocpd support.

* Adding a warning message to be shown when profiling a non-python workload.

* copilot suggestions, rocpd+native tool fix

* Fixed the incorrect usage of dispatch_id as event_id in the function update_rocpd_pmc_events()

* ruff format fix

* ruff formating

* Deleting torch_trace.csvs after consolidating the operator data.

* Removing checks since *torch_trace.csv files are deleted.

* Fixing file deletion.

* Update projects/rocprofiler-compute/src/utils/inject_roctx.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/src/utils/utils.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/tests/test_profile_general.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Using default options in the testcase.

* Adding test for overhead measurement.

* Corrections in docs.

* doc updates.

* Update projects/rocprofiler-compute/src/utils/inject_roctx.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Handling potential empty frames.

* Corrected the test cases.

* Changing the flag to --torch-trace

* Fixed helper_app path issues

* Path issues

* process_torch_trace_output() now takes csv file paths as input + allows default usage.

* Replaced pandas with sqlite3

* Adding marker_trace extraction to rocpd_data.py

* Allowing all workloads to use --torch-trace option. Assuming the workload is user verified.

* Modified help section for the flag.

* Added difference in runtimes for longest running kernels in each profiling runs to overhead measurements.

* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Removed the accesses to the tables.

* Ruff fixes.

* ruff

* Ruff Fixes

* Adding getattr for args.torch_trace to handle mock args.

* Fix for 'Missing guid in counter collection data - in csv mode'

* Sending output_format to process_torch_trace_output

* Warning for self contained binaries.

* Ruff

* Ruff

* Measuring longest_running_kernel_baseline instead of worst_kernel_increase, very small kernel runtimes are blowing up the worst_kernel_increase metric.

* Minor fixes in input arguments

* Ruff

* Loging PyTorch version

* Fix ruff formatting for PyTorch version logging

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-27 19:50:25 +05:30
vedithal-amd aa5dfb98f9 MI350 Fix L2 cache to HBM read counters/metrics (#2501)
* Fix rocprofiler-sdk metrics definition

* Use TCC_EA0_RDREQ_128B instead of TCC_BUBBLE counter for L2 cache to
  HBM counters and metrics

* Update MI350 counter definitions
    * FETCH_SIZE
    * BANDWIDTH_EA

* Update MI350 metrics definitions
    * System Speed of Light, L2-Fabric Read BW
    * Roofline Plot Points, AI (Arithmetic Intensity) HBM
    * Roofline Performance Rates, HBM Bandwidth

* Remove redundant definition for gfx950 and fix BANDWIDTH_EA definition

Test HBM bandwidth metric for memcopy workload

* Add memcopy.cpp workload

* Add metric validation test suite to validate HBM Bandwidth metric for
  memcopy workload

* Move gpu_soc() to test_utils.py for better re-usability

* Update TUI analysis config

* Fix hbm bandwidth formula for mi350 in calc_ai_profile

Co-authored-by: Alysa Liu <Alysa.Liu@amd.com>
2026-01-23 15:56:24 -05:00
jamessiddeley-amd 69281bbcf4 [rocprofiler-compute] Threshold Based Clamping in Analyze Stage (#2565)
* add threshold clamping function + parse in parser.py (with I/O)

* implemented hybrid threshold solution

* update changelog

* removed absolute threshold hybrid approach; restored relative threshold + warn

* edited warning msg, threshold -> 1%

* update changelog

* added 2 test cases

* ran master workflow yaml config files

* added to FAQ

* Revert "ran master workflow yaml config files"

This reverts commit 75a670e14d6f1619ebbda0ec218755ccbe0d22b1.

* update FAQ

* update config hashes

* Broke down long functions into Class with sub-functions

* ruff format

* addressed comments
2026-01-23 00:54:54 -05:00
vedithal-amd 4a5cbbfba5 [rocprofiler-compute] Fix kernel/dispatch filtering (#2479)
* Fix kernel/dispatch fitlering in GUI

* Disallow --kernel and --dispatch filtering in analyze --gui mode since
  GUI frontend offers dropdown menu for kernel and dispatch filtering
    * Update CHANGELOG and documentation

* Gracefully handle N/A values

* Ensure workload path is valid before using it in GUI

* Ignore kernel filters if dispatch filters provided

* Add documentation for dispatch filtering overriding kernel filtering

* Fix typo

* Fix documentation

* remove unnecessary whitespace

* Address review comments

* Allow kernel/dispatch filtering with --gui

* Address review comments

* Address review comments

* Update CHANGELOG

* Fix formatting
2026-01-20 10:02:31 -05:00
vedithal-amd f64d8e0f43 [rocprofiler-compute] Improve native tool discovery and partition detection (#2630)
* Improve native tool discovery and partition detection

- Enhanced native tool path resolution to support CMAKE_INSTALL_LIBDIR variations
  (lib, lib64, lib32, etc.) using glob pattern matching
- Extracted path variables to avoid duplication in error messages
- Improved error message clarity by showing exact paths searched for .so and .cpp files
- Simplified code path construction using consistent Path.resolve().parents[x] syntax

- Fixed redundant partition warnings on pre-MI300 GPUs by adding architecture check
- Only query compute/memory partition on MI300+ series (gfx940+)
- Added proper type hints for gpu_arch parameter
- Moved gpu_info extraction after soc_info to ensure gpu_arch is available
- Improved code comments for MI300 series threshold

* Handle gpu arch like a hex string
2026-01-16 10:36:19 -05:00
vedithal-amd 51ba3c3a53 [rocprofiler-compute] Standalone roofline should create HTML instead of PDF (#2535)
* Standalone roofline should create HTML instead of PDF

* Eiminate the dependency on kaleido and plotly_get_chrome by moving
  towards plotly native HTML image roofline chart generation

* Address review comments
2026-01-09 09:05:49 -05:00
vedithal-amd 9c1560b8bb [rocprofiler-compute] Fix merging logic for multi process (#2445)
* Fix merging logic for multi process

* Fix dispatch id reset logic in case of rocpd format

* Fix kernel id reset logic in case of csv format

* Revert correlation logic change in csv format

* Do inner join instead of left join
2025-12-27 09:47:42 -05:00
vedithal-amd 588773f9bf [rocprofiler-compute] Fix for multi process workload profiling (#2418)
* Fix for multi process workload profiling

Native counter collection tool updates:
    * Do not dump empty counter data for a process
    * Use PID instead of UUID for dumped csv files to facilitate correlation
    * Handle merging multiple pairs of rocpd (from sdk tool) and csv (from
      native tool) files
    * Handle merging multiple pairs of csv (from sdk tool) and csv (from
      native tool) files

Rocpd output format updates:
    * Merge multiple rocpd databases into a single csv
    * Reset dispatch id and kernel id for unique dispatches and unique
      kernels respectively
    * Retain multiple rocpd databases per run for multi process workloads

* Add test case for multiprocess profiling using rocflop workload

* Add rocflop

* Fix native counter csv to rocprofv3 csv conversion

* Use kernel_id instead of dispatch_id to correlate native counter csv
  and kernel trace csv

* python formatting using ruff 0.14 instead of 0.13
2025-12-23 13:12:18 -05:00
vedithal-amd e4abee4f7d [rocprofiler-compute] Improve iteration multiplexing code and documentation (#2080)
* Improve Iteration multiplexing

* Improve iteration multiplexing documentation by adding usage note and
  listing caveats

* Bugfixes for iteration mulitplexing
    * Use merge iteration multiplexing in analysis webui and db mode
    * Do not remove Dispatch_ID column in merge iteration multiplexing
      since it is needed for analysis of top dispatches based on
duration

* Bugfixes for analysis logic
    * Graceful handling of missing counters in case of iteration
      multiplexing
    * Improved warnings when metrics could not be calculated due to
      missing counter data
    * Fix the check to prevent showing table when a column is full of
      N/A
    * Improve detection of empty values when metric evaludation fails
      due to missing counter data

* Bugfixes for profile logic
    * Fix kernel filtering during roofline benchmark phase

* Update changelog for bugfixes

* Remove unnecessary columns when merging dispatches for iteration multiplexing

* bugfix

* Better analysis warnings

* fix to_std() in parser

* Use median in merge iteration multiplex

* Address review comments

* Fix cmake formatting

* fix None handling of parser util functions

* Enable stochastic counter accuracy test

* fix cmake formatting
2025-12-18 11:51:21 -05:00
xuchen-amd c738e73d99 [rocprofiler-compute][tui] menu bar lag fix (#1942) 2025-12-16 17:02:27 -05:00
vedithal-amd 793732a04e [rocprofiler-compute] Improve amdsmi interface (#2245)
* Improve amdsmi interface

* Fix issue where max mem clock was being set as max gfx clock

* Handle the case when all device handles might not be usable due to
  devices being hidden by ROCR and HIP environment variables

* Fix get gpu vram size to return str in KB

* Improve testing of amdsmi interface functions
2025-12-12 09:02:37 -05:00
jamessiddeley-amd 8f452d29df [rocprof-compute] Update Docs 7.2 + Dual Issue Detection (#2160)
* modified changelog for docs updates 7.2

* update documentation for 7.2

* update FAQ wording

* Update projects/rocprofiler-compute/docs/reference/faq.rst

Co-authored-by: cfallows-amd <Carrie.Fallows@amd.com>

* addressed comments

* fixed header for 'On MI350 and newer platforms'

* Update projects/rocprofiler-compute/src/rocprof_compute_soc/analysis_configs/gfx950/1100_compute_units_compute_pipeline.yaml

Co-authored-by: cfallows-amd <Carrie.Fallows@amd.com>

* ruff format

---------

Co-authored-by: cfallows-amd <Carrie.Fallows@amd.com>
2025-12-11 14:23:34 -05:00
vedithal-amd 252a5e8146 [rocprofiler-compute] Remove TCP_TCP_LATENCY_sum counter for MI300 (#2174)
* Remove TCP_TCP_LATENCY_sum counter for MI300

* Remove TCP_TCP_LATENCY_sum counter which is unsupported for MI300 per register specification

* Remove VL1 Lat metric from memory chart section (block 3) for MI 300
  since it uses TCP_TCP_LATENCY_sum counter which is unsupported

* Remove references to TCP_TCP_LATENCY_sum

* Update CHANGELOG

* reword changelog
2025-12-10 09:41:46 -05:00
cfallows-amd 9d34098350 [rocprofiler-compute] Roofline runtime compilation patch (#2232)
* Add install into CMakeLists.txt file- resolves 'no hip module' issues.
* Readd printout line for peak VALU during benchmarking removed on accident in a different commit.
* Add CHANGELOG entry for commit 2bfa9a4 ("Integrate roofline benchmark into rocprof-compute (#2015)")

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Run formatter checks on rocprof-compute to clear PR checks

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Update benchmark.py link in changelog

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestions to CHANGELOG from code review

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2025-12-10 01:44:28 -05:00
vedithal-amd 2700ca0287 Update changelog to reflect 7.2 cherry-pick (#2137) 2025-12-02 17:26:17 -05:00
abchoudh-amd fd61b0f507 Add CU Utilization and deprecate Active CUs (#1822)
* ChangeLog

* Deprecation notice in old arch

* Deprecation notice current arch

* New config hash

* Added Config deltas

* Added metric description
2025-11-28 11:32:25 -05:00
vedithal-amd 2e10041210 Fix sL1D values in memory chart (#2037) 2025-11-27 09:13:19 -05:00
vedithal-amd 6540155c9d Bugfixes (#1971)
* Implement AMDGPU driver info and GPU VRAM attributes in system info.
  section of analysis report.

* Backward compatibility for rocprofiler-sdk avail module path migration

* Fix roofline calculation where AI data points are N/A
2025-11-21 10:54:25 -05:00
ggottipa-amd bf521c996b Correcting peak VALU Roofline profiling and analysis by removing FP8 VALU and BF16 VALU benchmarking. (#1442)
* Removing FP8 from peak VALU datatypes - PEAK_OPS_DATATYPES.

* Similar change for BF16.

* Roofline binaries from rocm-amdgpu-bench generated 10/22.
https://github.com/ROCm/rocm-amdgpu-bench/commit/2113ef1f5eada8a4a6e44e6d07fd6abac9b0a3f8
Bins include change that removes FP8 and BF16 peak VALU benchmarks.
Built and tested on rhel8, azl3, ubuntu22.04, sles15sp6.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Re-committing the bins

accidentally copied over bins from the wrong folder earlier, caught by Gowthami during testing.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Updated changelog

* gersemi fix

* Changelog corrected.

* Changelog fix.

* Adding this to the 7.2.0 section to be picked up in an RC build.

* Moving changelog entry into unreleasesd section - team reconfirmed cutoff date after I requested this change so I am just quickly correcting my mistake in my ask.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com>
2025-11-20 12:46:48 +05:30
abchoudh-amd 433af908a6 Iteration multiplexing in rocprof-compute (#1533)
* Profile with multiple input files

* Iteration multiplexing kernel option

* Iteration multiplexing data

* Iteration multiplexing

* Counter profile caching

* Counter dispatch info

* Sanitize CLI args

* Formatting and removed unused header file

* Formattng

* Changed CLI args

* Merge counters for analysis

* Iteration multiplexing log while analysis

* Formatting

* Log

* Guard against incomplete profiling

* Fixed merge counter

* Tests

* Update doc

* Test update

* Fixed formatting

* Test fix

* Merge conflict commit

* Fix tests

* Added comment for counter definition file

* Do not allow dispatch filtering with iteration multiplexing

* Fixed formatting

* Doc indentation update
2025-11-19 21:09:16 +05:30
abchoudh-amd 76ea35787d Split roofline tests, and fix none outputs (#1913)
* Split roofline tests

* Use N/A for missing values

* Test eval_expression for no valid data

* Fixed tests

* Updated Changelog for N/A

* Fixed platform specific test failure
2025-11-19 15:36:08 +05:30
vedithal-amd ae8f72fa79 [rocprofiler-compute] Use native tool for counter collection (#1212)
* Use native tool for counter collection

* Add native counter collection tool which uses rocprofiler-sdk C++
  library public API to get counter collection data
    * This is enabled by default, unless --no-native-tool option is
      provided or ROCPROF=rocprofv3 env. var. is provided
    * This tool is only supported for ROCm version >=7.x.x
    * This tool is not supported for attach/detach scenario
* Build native tool shared object during build time
* If using rocprof-compute without building then runtime compilation of
  t push native tool shared object is performed
* rocprofiler-sdk tools is still used for services other than counter
  collection and data collected by native tool is merged into the
  rocpd/csv output of rocprofiler-sdk tool

* Make `rocpd` choice the default choice for `--format-rocprof-output`
  option
    * If `rocpd` public API from rocprofiler-sdk library is not present,
      then fallback to `csv` choice
    * In this case only `pmc_perf.csv` is written in workload folder
      instead of multiple `csv` files for each profiling run
* Remove `json` choice from `--format-rocprof-output` option since it
  functions identical to `csv` option

* Rename option `--rocprofiler-sdk-library-path` to
  `--rocprofiler-sdk-tool-path` since we LD_PRELOAD the
  rocprofiler-sdk tool shared object and not the rocprofiler-sdk library
shared object

* Fix the meaning of `--dispatch` option in `profile` mode to mention
  dispatch iteration filtering instead of dispatch id filtering
    * --dispatch option in analyze mode does dispatch id filtering

* Move standalone binary creation logic from cmake file to docker file

* fix native counter collection tool during attach/detach

* improve logging

* fix attach detach with native tool

* fix attach detach with native tool

* do not support attach/detach in native tool

* Update changelog

* add standalone binary creation functionality in cmake

* address review comments

* address review comments

* fix formatting

* address review comments

* Adding paths for cmake to search. Also updated min. cmake requirement to 3.21 as this was when hip was supported.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Update hip compiler ID check, sometimes comes up as Clang, sometimes ROCMClang- depends on setup.
Updated formatting.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* RHEL8.10 unable to compile due to defaulting to old c++ version, need to force c++17

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Updating changelog per docs team recommendations

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Apply suggestions from code review to changelog

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Do not required HIP complier to build native counter collection tool

* fix cmake

* gersemi formatting on latest cmake change

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* ex ci updated dependencies to include rocprofiler-sdk, but cmake was still not capturing the path- there was a commit that added to the cmake_prefix_path entry that specified rocprof-sdk's cmake location ut was too specific for the search paths in find_package's config mode.
removing the cmake_prefix_path var and adding hints to find_package call instead, and specifying config mode so it knows how to construct the search paths

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* gersemi run for formatting

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Still need prefix path, should not have been removed in last commit but does need to be shortened to just the rocm path to allow for find_package config mode to do the job

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* include cstdint for uint32_t

* Run formatting on helper.cpp

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Remove rocm 7.2 release stuff from version and changelog and handle it in separate pr

* fix version

* fix changelog

* fix changelog

* run ruff formatter

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* fix rocprofiler-sdk attach so path

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2025-11-18 23:34:38 -05:00
vedithal-amd 44a32e23ac [rocprofiler-compute] Bump version and update changelog ahead of ROCm 7.2 release (#1908) 2025-11-18 10:04:28 -05:00
Pratik Basyal 9d84958527 JSON profiler option removed (#1649) 2025-11-04 17:49:22 +01:00
vedithal-amd bb5fd1d4ae [rocprofiler-compute] Update analysis db for visualizer integration (#1548)
* Analysis db changes for visualizer

* Add support for per kernel analysis metrics

* Add support for dispatch timeline visualiztion

* Show median instead of mean of dispatch duration in kernel view

* Add test case to validate analysis db schema

* Analysis db schema updte
    * Add Kernel table and make Metric and Dispatch table its children
    * Kernel table is a child of Workload table
    * Update metric_view to show kernel_name column
    * Add disptach timestamps to Dispatch table for dispatch timeline
      visualization
    * Update kernel_view to show duration_ns_median instead of mean
      duration

* Add mean duation in kernel view

* update changelog

---------

Co-authored-by: Fei Zheng <44449748+feizheng10@users.noreply.github.com>
2025-11-03 09:25:12 -05:00
xuchen-amd b774f28181 [rocprofiler-compute] Remove grafana and mongodb integration (#978)
* Remove grafana and mongodb integration

* Remove grafana documentation assets

* clarify changelog

---------

Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
2025-10-29 11:32:06 -04:00
abchoudh-amd a7bbe0c5d2 Use amd-smi Python API instead of CLI (#1334)
* Use amd-smi Python API instead of CLI

Formatting fix

python path

* Update CHANGELOG

* Create amdsmi interface

* Added amdsmi tests

* Removed run

* Prioritize rocm's amdsmi python API

* address review comments

* update changelog

* fix ruff formatting

---------

Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
2025-10-24 11:11:33 +05:30
vedithal-amd 2a37cbf2ca Bump VERSION and add CHANGELOG for ROCm 7.1.1 release (#1447) 2025-10-23 09:34:18 -04:00
ywang103-amd 9b562c0e58 pc sampling multi kernel (#1382)
* initial commit

* add csv support extraction for non kernel selection mode

* add --kernel-trace for rocprofiler-sdk mode

* make non kernel selective mode runnable

* make kernel selection work with -k

* remove upper case of arg hint

* update documentation

* display same kernel name at only one place and merge instruction id with same obj id as well as offset

* remove kernel name's display for single kernel selection

* change log added

---------

Co-authored-by: Fei Zheng <44449748+feizheng10@users.noreply.github.com>
2025-10-23 01:26:08 -04:00
xuchen-amd 578589d363 [rocprofiler-compute] metrics generator (#1199) 2025-10-22 15:17:43 -04:00
cfallows-amd c215ace6c3 Update Roofline binaries with improved flops benchmarking (#1402)
* Update roof bins- rebuild from rocm-amdgpu-bench as of oct15/25

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Update CHANGELOG.md

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2025-10-21 10:14:43 -04:00
Ben Richard 35b07e041f [rocprof-compute] Run roofline test on GPU 0 by default (#1390)
* rocprof-compute: Default roofline to GPU 0

Previously was running the roofline test on ALL GPUs but only
selecting the first entry in the roofline.csv. So even in default
ALL case, GPU 0 was selected.

* Update CHANGELOG.MD

* Use better wording in changelog entry
2025-10-20 16:36:55 -04:00
vedithal-amd ecf0d32644 Update CHANGELOG.md for ROCm 7.1.0 release (#1362) 2025-10-15 14:25:34 -05:00
vedithal-amd bd7a1de879 Remove rocprofv1/v2 in favour of rocprofiler-sdk (#673)
* Set default rocprof interface as rocprofiler-sdk

* Remove rocrprofv1 and rocprofv2 interfaces

* Remove deprecation notice for rocprof v1/v2/v3 interfaces
  * Make rocprofiler-sdk the default interface and make rocprofv3 interface opt-in using ROCPROF=rocprofv3

* Add deprecation notice for rocprofv3
2025-09-24 10:37:01 -04:00
systems-assistant[bot] 872f0aed0c Live attach/detach and its unit tests (#53) 2025-09-23 13:17:08 -04:00
cfallows-amd 9819e1cbfc Refactor roofline binary detection (#933)
* Simplify the roofline binary pickup process by determining which base distribution the system OS is based off of, and select the correct binary.
* Add more OS distribution support to roofline by modifying the detection parameters and adding an AZL binary
* Update changelog to include roofline support additions

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2025-09-22 12:04:20 -04:00
systems-assistant[bot] 3b5467b746 [DOC] single pass counter collection (#95) 2025-09-16 11:00:11 -04:00
jamessiddeley-amd 62843ed900 [rocprof-compute] Wrap negative values in L2 - Fabric interface detailed metrics (#833)
* update gfx942 soc_config 1700

* add MAX wrapper for Write and Atomic (32B)

* removed trailing whitespace and EA* fix

* added to CHANGELOG.md

* edited changelog
2025-09-11 14:51:58 -04:00
systems-assistant[bot] d58adf96da [DOC] TUI kernel selection (#94) 2025-09-08 13:52:39 -04:00
abchoudh-amd 682ae2d014 Streamline --list-metrics command line option in rocprof-compute (#310)
* Remove L2 channels from --list-metrics

--list-metrics moved to general options

List metrics for the current architecture

Filter blocks for metrics

Removed test for --list-metrics in profile mode

Test the options don't throw error

Fixed --config-dir error

Test stdout for command line options

Provide path list for loading panel configs

Show L2 Cache (per) channel metrics

Changed command line option names

Can show two levels only

Removed filtering blocks

Moved blocks to original position

Removed filter block tests

Removed filtering

Formaating fix

Readability enhancement

Test formatting

Filter L2 channels without sysinfo

Show avilable metrics for current arch

Intermediate commit

Fixed tests

Added argument sanitization

Added list_metrics to ctest

merge iconflict resolution

Updated test marker

Updated changelog

Fixed formatting

* Updated docs
2025-09-08 20:21:46 +05:30
ywang103-amd c9b1ad72a5 scientific notion of memchart(CLI, TUI and GUI) (#764) 2025-09-08 01:30:30 -04:00
cfallows-amd c68ba44e72 Add single kernel filtering to roofline plots (#757)
* Add single kernel filtering for roofline
* Add --kernel to documentation
* Add kernel labels to roofline pdfs

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Add test cases

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Add autodetect for mode (profile or analyze) during roof validate and filter
Prevent --kernel from affecting roofline in gui mode- although this may be broken in develop branch anyways

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Add note about roof-only usage checking for existing profiling files in the dir. If roof-only is not provided, rocprof-compute currently assumes it has to profile in full regardless. Will look into this another day.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Update CHANGELOG.md

Add line in resolved issues section to highlight that kernel filtering is now working for roofline plots

* Apply changes suggested by docs team

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Update projects/rocprofiler-compute/CHANGELOG.md

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2025-08-27 13:41:07 -04:00
Pratik Basyal cfd3fee0e2 ROCm Compute Profiler changelog for 7.0 updated (#740)
* ROCm Compute Profiler changelog for 7.0 updated

* Roofline limited support for MI350 removed

* Update CHANGELOG.md
2025-08-27 11:24:49 -04:00
vedithal-amd 323d06c79c [rocprofiler-compute] Add database output format to analyze mode (#748)
Analysis data dump

* Add `--output-format` and `--output-name` option to analyze mode

* Remove `--output` and `-save-dfs` option to analyze mode

* Add documentation on `rocpd` output format and analysis database file

* Create sqlite3 database using object relation mapping (ORM) provided
  by sqlalchemy library

* Fix metrics config to remove metrics marked as `null`, fix `Unit` header, add
  missing `title`

* Add test cases to ensure analysis data dump work
2025-08-26 14:15:05 -04:00
xuchen-amd 5c8b34ddf5 [rocprofiler-compute][TUI] Add interactive metric description (#718) 2025-08-25 15:53:55 -04:00
Taylor Ding b5c8c8bcb1 Eval metrics performance optimizations (#435)
Post-analysis eval metrics performance optimizations.
2025-08-22 16:35:48 -04:00
xuchen-amd 0bf66a519c [rocprofiler-compute][TUI] Restructure Performance Metrics (#232) 2025-08-20 17:00:54 -04:00
vedithal-amd 80b7e6baee Backport CHANGELOG change from rc4 cherry pick (#342)
* Backport from https://github.com/ROCm/rocprofiler-compute/pull/860
2025-08-13 14:49:53 -04:00
systems-assistant[bot] d3f9ab25eb Use own counter definition (#91)
* Use own counter definition
  * Do not depend on rocprofiler-sdk counter definition

* Add missing counter definitions for MI100, MI200, MI300, MI350 series
  * Counters added based on register specification
  * This prevents some missing metrics

* Enable SQC_DCACHE_INFLIGHT_LEVEL counter and associated metrics

* Enable TCP_TCP_LATENCY counter and associated counter for all GPUs
  except MI300

* Update TCC_EA_* counters for MI100 to TCC_EA0_*
  * Update MI100 metrics which depend on TCC_EA0_* counters

* Enable accumulation counters for MI100

* Improve rocprof list avail usage to get a better idea of supported
  counters

* Update CHANGELOG

* Move accumulation counters to counter definition

---------

Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
2025-08-08 14:39:10 -04:00