50 Commits

Author SHA1 Message Date
vedithal-amd 996202f560 [rocprofiler-compute] Backport documentation changes from ROCm 7.1 release branch (#2894)
* Backport documentation changes from ROCm 7.1 release branch

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Address review comments

---------

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-27 17:22:41 -05:00
cfallows-amd 4d7f709510 [rocprofiler-compute] Update baseline comparison notes in documentation (#2878)
* Update baseline comparison with anchor, text, samples, image in CLI page. Fixes broken 404 links after grafana was removed.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Update options in list to full name, correct gpu id option.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Formatting and broken intersphinx fixed

* Indentation formatting fixed

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: prbasyal <prbasyal@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2026-01-27 16:04:21 -05:00
ggottipa-amd 77f7541755 [rocprofiler-compute] Adding --torch-trace option for SWDEV-559789 (#2089)
* Adding --torch-operator option in rocprof-compute. Creates csv file for
each operator that has gpu activity, showing operator to counter values
mapping.

* --torch-operators flag added to rocprofiler-sdk

* Adding ctest for --torch-operators.

* Adding pytest markers.

* Corrections in ctest and message logging.

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Adding a check for pytorch installation only when --torch-operators is passed.

* moving inject_roctx.py into src/utils.

* rebase

* Updating docs and changelog.

* Update projects/rocprofiler-compute/src/argparser.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/src/utils/inject_roctx.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Removing special characters.

* Minor corrections.

* Setting default value for torch_operators_enabled.

* Updating the number of files according to the number of passes.

* Adding rocpd support.

* Adding a warning message to be shown when profiling a non-python workload.

* copilot suggestions, rocpd+native tool fix

* Fixed the incorrect usage of dispatch_id as event_id in the function update_rocpd_pmc_events()

* ruff format fix

* ruff formatting

* Deleting torch_trace.csvs after consolidating the operator data.

* Removing checks since *torch_trace.csv files are deleted.

* Fixing file deletion.

* Update projects/rocprofiler-compute/src/utils/inject_roctx.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/src/utils/utils.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/tests/test_profile_general.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Using default options in the testcase.

* Adding test for overhead measurement.

* Corrections in docs.

* doc updates.

* Update projects/rocprofiler-compute/src/utils/inject_roctx.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Handling potential empty frames.

* Corrected the test cases.

* Changing the flag to --torch-trace

* Fixed helper_app path issues

* Path issues

* process_torch_trace_output() now takes csv file paths as input + allows default usage.

* Replaced pandas with sqlite3

* Adding marker_trace extraction to rocpd_data.py

* Allowing all workloads to use --torch-trace option. Assuming the workload is user verified.

* Modified help section for the flag.

* Added the difference in runtimes of the longest-running kernels in each profiling run to the overhead measurements.

* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Removed the accesses to the tables.

* Ruff fixes.

* ruff

* Ruff Fixes

* Adding getattr for args.torch_trace to handle mock args.

* Fix for 'Missing guid in counter collection data - in csv mode'

* Sending output_format to process_torch_trace_output

* Warning for self contained binaries.

* Ruff

* Ruff

* Measuring longest_running_kernel_baseline instead of worst_kernel_increase, since very small kernel runtimes were blowing up the worst_kernel_increase metric.

* Minor fixes in input arguments

* Ruff

* Logging PyTorch version

* Fix ruff formatting for PyTorch version logging

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-27 19:50:25 +05:30
vedithal-amd 4a5cbbfba5 [rocprofiler-compute] Fix kernel/dispatch filtering (#2479)
* Fix kernel/dispatch filtering in GUI

* Disallow --kernel and --dispatch filtering in analyze --gui mode since
  GUI frontend offers dropdown menu for kernel and dispatch filtering
    * Update CHANGELOG and documentation

* Gracefully handle N/A values

* Ensure workload path is valid before using it in GUI

* Ignore kernel filters if dispatch filters provided

* Add documentation for dispatch filtering overriding kernel filtering

* Fix typo

* Fix documentation

* remove unnecessary whitespace

* Address review comments

* Allow kernel/dispatch filtering with --gui

* Address review comments

* Address review comments

* Update CHANGELOG

* Fix formatting
2026-01-20 10:02:31 -05:00
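
The precedence rule in the commit above ("Ignore kernel filters if dispatch filters provided") can be illustrated with a minimal sketch. The function name `effective_filters` and its argument shapes are hypothetical and are not taken from the rocprofiler-compute source; only the precedence itself comes from the commit message.

```python
from typing import List, Optional, Sequence, Tuple


def effective_filters(
    kernels: Optional[Sequence[str]],
    dispatches: Optional[Sequence[int]],
) -> Tuple[List[str], List[int]]:
    """Return the (kernel, dispatch) filters that would actually be applied.

    Hypothetical helper: when any dispatch filter is given, kernel filters
    are ignored, mirroring the rule described in the commit message.
    """
    if dispatches:
        # Dispatch filtering overrides kernel filtering.
        return [], list(dispatches)
    return list(kernels or []), []


if __name__ == "__main__":
    print(effective_filters(["vecAdd"], [3, 5]))
    print(effective_filters(["vecAdd"], None))
```

Returning both lists, rather than silently dropping one, would make it easy to log which filter was discarded.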
vedithal-amd 0254181f42 [rocprofiler-compute] Analysis Database Schema Improvements (v1.2.0) (#2526)
* Analysis database v1.2.0

* `pc_sampling` and `roofline_data` tables should relate to `kernel` table instead of `workload` table

* Remove `kernel_name` fields in `pc_sampling` and `roofline_data` table

* Add kernel existence check for roofline data to prevent KeyError (#2536)

* Initial plan

* Add kernel existence check for roofline data to prevent KeyError

Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>

* Optimize analysis performance

* Refactor database schema: separate metric definitions from kernels

Reorganize the database ORM to decouple metric definitions from kernel
objects. This improves the schema design by:

- Rename Metric -> MetricDefinition and Value -> MetricValue for clarity
- Move metric definitions from kernel-level to workload-level, since
  metric definitions are shared across kernels
- Update relationships: MetricDefinition belongs to Workload,
  MetricValue
  references both MetricDefinition and Kernel
- Refactor metric_view to join through the new schema structure
- Update test fixtures to use renamed table and class names
- Update documentation with new example output using nbody workload
- Regenerate database schema and views diagrams

* Add min and max aggregation in kernel_view

* Add primary key id from tables into the view

---------

Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>
2026-01-19 15:25:43 -05:00
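
The relationships described in the schema refactor above (metric definitions belong to a workload, metric values reference both a definition and a kernel) can be sketched with the standard-library `sqlite3` module. This is an illustration only: the commit history says the project builds its database through the sqlalchemy ORM, and the table names, column names, and sample values below are assumptions, not the real v1.2.0 schema.

```python
import sqlite3

# Hypothetical table and column names; only the relationships follow the commit text.
SCHEMA = """
CREATE TABLE workload (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE kernel (
    id          INTEGER PRIMARY KEY,
    workload_id INTEGER NOT NULL REFERENCES workload(id),
    name        TEXT NOT NULL
);
CREATE TABLE metric_definition (
    id          INTEGER PRIMARY KEY,
    workload_id INTEGER NOT NULL REFERENCES workload(id),
    name        TEXT NOT NULL,
    unit        TEXT
);
CREATE TABLE metric_value (
    id        INTEGER PRIMARY KEY,
    metric_id INTEGER NOT NULL REFERENCES metric_definition(id),
    kernel_id INTEGER NOT NULL REFERENCES kernel(id),
    value     REAL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one workload, two kernels, one metric."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO workload (id, name) VALUES (1, 'nbody')")
    conn.executemany(
        "INSERT INTO kernel (id, workload_id, name) VALUES (?, 1, ?)",
        [(1, "kernel_a"), (2, "kernel_b")],
    )
    # The definition is stored once per workload and shared by every kernel.
    conn.execute(
        "INSERT INTO metric_definition (id, workload_id, name, unit) "
        "VALUES (1, 1, 'VALU Utilization', 'pct')"
    )
    conn.executemany(
        "INSERT INTO metric_value (metric_id, kernel_id, value) VALUES (1, ?, ?)",
        [(1, 40.0), (2, 75.0)],
    )
    return conn


def metric_rows(conn: sqlite3.Connection):
    """Join values back to their definition and kernel, as a metric view might."""
    return conn.execute(
        """
        SELECT k.name, d.name, v.value
        FROM metric_value AS v
        JOIN metric_definition AS d ON d.id = v.metric_id
        JOIN kernel AS k ON k.id = v.kernel_id
        ORDER BY k.name
        """
    ).fetchall()


if __name__ == "__main__":
    for row in metric_rows(build_demo_db()):
        print(row)
```

Storing each definition once per workload avoids repeating the metric name and unit for every kernel, which appears to be the motivation given in the commit message.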
vedithal-amd 51ba3c3a53 [rocprofiler-compute] Standalone roofline should create HTML instead of PDF (#2535)
* Standalone roofline should create HTML instead of PDF

* Eliminate the dependency on kaleido and plotly_get_chrome by moving
  towards plotly native HTML image roofline chart generation

* Address review comments
2026-01-09 09:05:49 -05:00
vedithal-amd 769d3dd67a [rocprofiler-compute] Data imputation strategy for iteration multiplexing (#2468)
* Data imputation strategy for iteration multiplexing

* Implement data imputation methodology to handle missing counter values
  in case of iteration multiplexing

* Enable dispatch filtering with iteration multiplexing since we are no
  longer merging dispatches

* Bugfix to prevent check for missing counter values when using csv
  format when profiling with iteration multiplexing

* Move warning and info message in case of iteration multiplexing to
  sanitize function which comes earlier in analyze mode

* Address review comments

* Fix typo in documentation

* Move profiling config init. after path check in sanitize()

* Graceful handling of dispatches with all counters empty within data
  imputation logic

* Improve info message for iteration multiplexing based analysis

* Ensure proper error message when trying to run iteration multiplexing with attach/detach

* fix test case
2026-01-08 12:01:51 -05:00
vedithal-amd e4abee4f7d [rocprofiler-compute] Improve iteration multiplexing code and documentation (#2080)
* Improve Iteration multiplexing

* Improve iteration multiplexing documentation by adding usage note and
  listing caveats

* Bugfixes for iteration multiplexing
    * Use merge iteration multiplexing in analysis webui and db mode
    * Do not remove Dispatch_ID column in merge iteration multiplexing
      since it is needed for analysis of top dispatches based on
      duration

* Bugfixes for analysis logic
    * Graceful handling of missing counters in case of iteration
      multiplexing
    * Improved warnings when metrics could not be calculated due to
      missing counter data
    * Fix the check to prevent showing table when a column is full of
      N/A
    * Improve detection of empty values when metric evaluation fails
      due to missing counter data

* Bugfixes for profile logic
    * Fix kernel filtering during roofline benchmark phase

* Update changelog for bugfixes

* Remove unnecessary columns when merging dispatches for iteration multiplexing

* bugfix

* Better analysis warnings

* fix to_std() in parser

* Use median in merge iteration multiplex

* Address review comments

* Fix cmake formatting

* fix None handling of parser util functions

* Enable stochastic counter accuracy test

* fix cmake formatting
2025-12-18 11:51:21 -05:00
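
The commit above mentions using the median when merging dispatches for iteration multiplexing. A minimal sketch of that idea follows, assuming each dispatch row carries a kernel name and only the counters collected in that iteration (missing ones are `None`). The row layout, the `Kernel_Name` key, and the counter names are illustrative assumptions, not the tool's actual data model.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def merge_by_median(rows: List[Row]) -> Dict[str, Dict[str, float]]:
    """Collapse dispatches of the same kernel into one record per kernel.

    For every counter, take the median over the dispatches in which that
    counter was actually collected; missing (None) values are skipped.
    """
    grouped: Dict[str, Dict[str, list]] = defaultdict(lambda: defaultdict(list))
    for row in rows:
        kernel = row["Kernel_Name"]
        for key, value in row.items():
            if key == "Kernel_Name" or value is None:
                continue
            grouped[kernel][key].append(value)
    return {
        kernel: {counter: median(values) for counter, values in counters.items()}
        for kernel, counters in grouped.items()
    }


if __name__ == "__main__":
    demo = [
        {"Kernel_Name": "k", "COUNTER_A": 10, "COUNTER_B": None},
        {"Kernel_Name": "k", "COUNTER_A": None, "COUNTER_B": 4},
        {"Kernel_Name": "k", "COUNTER_A": 30, "COUNTER_B": 6},
    ]
    print(merge_by_median(demo))
```

A median is less sensitive than a mean to a single outlier dispatch, which is one plausible reason for the choice; the commit message does not state the rationale.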
vedithal-amd 252a5e8146 [rocprofiler-compute] Remove TCP_TCP_LATENCY_sum counter for MI300 (#2174)
* Remove TCP_TCP_LATENCY_sum counter for MI300

* Remove TCP_TCP_LATENCY_sum counter which is unsupported for MI300 per register specification

* Remove VL1 Lat metric from memory chart section (block 3) for MI 300
  since it uses TCP_TCP_LATENCY_sum counter which is unsupported

* Remove references to TCP_TCP_LATENCY_sum

* Update CHANGELOG

* reword changelog
2025-12-10 09:41:46 -05:00
Pratik Basyal 792ecc1a83 Formatting fixed (#1691) 2025-11-27 18:55:45 -05:00
abchoudh-amd 433af908a6 Iteration multiplexing in rocprof-compute (#1533)
* Profile with multiple input files

* Iteration multiplexing kernel option

* Iteration multiplexing data

* Iteration multiplexing

* Counter profile caching

* Counter dispatch info

* Sanitize CLI args

* Formatting and removed unused header file

* Formatting

* Changed CLI args

* Merge counters for analysis

* Iteration multiplexing log while analysis

* Formatting

* Log

* Guard against incomplete profiling

* Fixed merge counter

* Tests

* Update doc

* Test update

* Fixed formatting

* Test fix

* Merge conflict commit

* Fix tests

* Added comment for counter definition file

* Do not allow dispatch filtering with iteration multiplexing

* Fixed formatting

* Doc indentation update
2025-11-19 21:09:16 +05:30
vedithal-amd ae8f72fa79 [rocprofiler-compute] Use native tool for counter collection (#1212)
* Use native tool for counter collection

* Add native counter collection tool which uses rocprofiler-sdk C++
  library public API to get counter collection data
    * This is enabled by default, unless --no-native-tool option is
      provided or ROCPROF=rocprofv3 env. var. is provided
    * This tool is only supported for ROCm version >=7.x.x
    * This tool is not supported for attach/detach scenario
* Build native tool shared object during build time
* If using rocprof-compute without building then runtime compilation of
  native tool shared object is performed
* rocprofiler-sdk tools is still used for services other than counter
  collection and data collected by native tool is merged into the
  rocpd/csv output of rocprofiler-sdk tool

* Make `rocpd` choice the default choice for `--format-rocprof-output`
  option
    * If `rocpd` public API from rocprofiler-sdk library is not present,
      then fallback to `csv` choice
    * In this case only `pmc_perf.csv` is written in workload folder
      instead of multiple `csv` files for each profiling run
* Remove `json` choice from `--format-rocprof-output` option since it
  functions identical to `csv` option

* Rename option `--rocprofiler-sdk-library-path` to
  `--rocprofiler-sdk-tool-path` since we LD_PRELOAD the
  rocprofiler-sdk tool shared object and not the rocprofiler-sdk library
  shared object

* Fix the meaning of `--dispatch` option in `profile` mode to mention
  dispatch iteration filtering instead of dispatch id filtering
    * --dispatch option in analyze mode does dispatch id filtering

* Move standalone binary creation logic from cmake file to docker file

* fix native counter collection tool during attach/detach

* improve logging

* fix attach detach with native tool

* fix attach detach with native tool

* do not support attach/detach in native tool

* Update changelog

* add standalone binary creation functionality in cmake

* address review comments

* address review comments

* fix formatting

* address review comments

* Adding paths for cmake to search. Also updated min. cmake requirement to 3.21 as this was when hip was supported.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Update hip compiler ID check, sometimes comes up as Clang, sometimes ROCMClang- depends on setup.
Updated formatting.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* RHEL8.10 unable to compile due to defaulting to old c++ version, need to force c++17

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Updating changelog per docs team recommendations

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Apply suggestions from code review to changelog

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

* Do not require HIP compiler to build native counter collection tool

* fix cmake

* gersemi formatting on latest cmake change

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* ex ci updated dependencies to include rocprofiler-sdk, but cmake was still not capturing the path. There was a commit that added to the cmake_prefix_path entry that specified rocprof-sdk's cmake location, but it was too specific for the search paths in find_package's config mode.
removing the cmake_prefix_path var and adding hints to find_package call instead, and specifying config mode so it knows how to construct the search paths

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* gersemi run for formatting

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Still need prefix path, should not have been removed in last commit but does need to be shortened to just the rocm path to allow for find_package config mode to do the job

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* include cstdint for uint32_t

* Run formatting on helper.cpp

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Remove rocm 7.2 release stuff from version and changelog and handle it in separate pr

* fix version

* fix changelog

* fix changelog

* run ruff formatter

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* fix rocprofiler-sdk attach so path

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2025-11-18 23:34:38 -05:00
jamessiddeley-amd 42cc721a4b [rocprof-compute] remove references to --kernel-names (#1543)
* remove references to --kernel-names

* ruff format

* remove redundant comments

* update docs and roofline image

* added two output lines to docs
2025-11-10 11:47:39 -05:00
Pratik Basyal 9d84958527 JSON profiler option removed (#1649) 2025-11-04 17:49:22 +01:00
xuchen-amd b774f28181 [rocprofiler-compute] Remove grafana and mongodb integration (#978)
* Remove grafana and mongodb integration

* Remove grafana documentation assets

* clarify changelog

---------

Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
2025-10-29 11:32:06 -04:00
ywang103-amd 9b562c0e58 pc sampling multi kernel (#1382)
* initial commit

* add csv support extraction for non kernel selection mode

* add --kernel-trace for rocprofiler-sdk mode

* make non kernel selective mode runnable

* make kernel selection work with -k

* remove upper case of arg hint

* update documentation

* display same kernel name at only one place and merge instruction id with same obj id as well as offset

* remove kernel name's display for single kernel selection

* change log added

---------

Co-authored-by: Fei Zheng <44449748+feizheng10@users.noreply.github.com>
2025-10-23 01:26:08 -04:00
xuchen-amd 578589d363 [rocprofiler-compute] metrics generator (#1199) 2025-10-22 15:17:43 -04:00
Young Hui - AMD 161e44c425 [rocprof-compute] Documentation changes for move to super-repo for 7.1 (#1329)
- also remove json output mention in docs
2025-10-15 15:32:54 -04:00
systems-assistant[bot] 872f0aed0c Live attach/detach and its unit tests (#53) 2025-09-23 13:17:08 -04:00
systems-assistant[bot] 3b5467b746 [DOC] single pass counter collection (#95) 2025-09-16 11:00:11 -04:00
vedithal-amd 85a557673d Handle mutually exclusive report section filters (#710)
* Make --roof-only, --block and --set mutually exclusive from each other

* Update help output and documentation
  * Add sanitize function for checking profiler options

* Update filter blocks arguments when --set or --roof-only is provided

* Update filter_blocks in profiling_config.yaml based on --set option
  * Log Filtered Sections instead of Report Sections and Set Selection

* Move soc class function calls from rocprof compute base class to profiler base class

* Fix bug in panel level filtering using --filter-block option

* Remove roofline specific pmc files

* Move microbenchmark entry point from gfx specific soc class to base soc class

* Run microbenchmarks only if block 4 is selected or roof only is selected; skip for mi100
2025-09-09 17:48:20 -04:00
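
The mutual exclusivity of `--roof-only`, `--block`, and `--set` described in the commit above could be expressed with `argparse`'s built-in mutually exclusive groups. This is only a sketch: the commit says the project added its own sanitize function, so the approach below is a swapped-in technique, and the argument types, `dest` names, and sample values are assumptions.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser in which the three section filters exclude each other."""
    parser = argparse.ArgumentParser(prog="profile-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--roof-only", action="store_true")
    # Value shapes below are assumptions made for illustration.
    group.add_argument("--block", nargs="+", default=None)
    group.add_argument("--set", dest="set_name", default=None)
    return parser


def selection_is_valid(argv: List[str]) -> bool:
    """Return True if argv parses, False if argparse rejects the combination."""
    try:
        build_parser().parse_args(argv)
    except SystemExit:
        # argparse reports the conflict on stderr and exits with status 2.
        return False
    return True


if __name__ == "__main__":
    print(selection_is_valid(["--block", "6.2"]))
    print(selection_is_valid(["--roof-only", "--set", "example"]))
```

A hand-written sanitize step, as in the actual commit, has the advantage of allowing custom error messages and follow-up actions such as updating the stored filter blocks.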
systems-assistant[bot] d58adf96da [DOC] TUI kernel selection (#94) 2025-09-08 13:52:39 -04:00
abchoudh-amd 682ae2d014 Streamline --list-metrics command line option in rocprof-compute (#310)
* Remove L2 channels from --list-metrics

--list-metrics moved to general options

List metrics for the current architecture

Filter blocks for metrics

Removed test for --list-metrics in profile mode

Test the options don't throw error

Fixed --config-dir error

Test stdout for command line options

Provide path list for loading panel configs

Show L2 Cache (per) channel metrics

Changed command line option names

Can show two levels only

Removed filtering blocks

Moved blocks to original position

Removed filter block tests

Removed filtering

Formatting fix

Readability enhancement

Test formatting

Filter L2 channels without sysinfo

Show available metrics for current arch

Intermediate commit

Fixed tests

Added argument sanitization

Added list_metrics to ctest

merge conflict resolution

Updated test marker

Updated changelog

Fixed formatting

* Updated docs
2025-09-08 20:21:46 +05:30
cfallows-amd c68ba44e72 Add single kernel filtering to roofline plots (#757)
* Add single kernel filtering for roofline
* Add --kernel to documentation
* Add kernel labels to roofline pdfs

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Add test cases

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Add autodetect for mode (profile or analyze) during roof validate and filter
Prevent --kernel from affecting roofline in gui mode- although this may be broken in develop branch anyways

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Add note about roof-only usage checking for existing profiling files in the dir. If roof-only is not provided, rocprof-compute currently assumes it has to profile in full regardless. Will look into this another day.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Update CHANGELOG.md

Add line in resolved issues section to highlight that kernel filtering is now working for roofline plots

* Apply changes suggested by docs team

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Update projects/rocprofiler-compute/CHANGELOG.md

Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
2025-08-27 13:41:07 -04:00
Pratik Basyal cfd3fee0e2 ROCm Compute Profiler changelog for 7.0 updated (#740)
* ROCm Compute Profiler changelog for 7.0 updated

* Roofline limited support for MI350 removed

* Update CHANGELOG.md
2025-08-27 11:24:49 -04:00
vedithal-amd 323d06c79c [rocprofiler-compute] Add database output format to analyze mode (#748)
Analysis data dump

* Add `--output-format` and `--output-name` option to analyze mode

* Remove `--output` and `--save-dfs` options from analyze mode

* Add documentation on `rocpd` output format and analysis database file

* Create sqlite3 database using object relation mapping (ORM) provided
  by sqlalchemy library

* Fix metrics config to remove metrics marked as `null`, fix `Unit` header, add
  missing `title`

* Add test cases to ensure analysis data dump works
2025-08-26 14:15:05 -04:00
xuchen-amd 5c8b34ddf5 [rocprofiler-compute][TUI] Add interactive metric description (#718) 2025-08-25 15:53:55 -04:00
jamessiddeley-amd 5840940caa [rocprof-compute] Generalize Roofline (#325)
* per kernel analysis Roofline

* added per-kernel eval_metric calculation with display

* fixed typo

* updated tty.py show_all()

* formatting

* fixed ctest failures and updated equations

* formatting

* updated metric descriptions

* review tweaks

* update docs

* added roofline gui analysis

* updated GUI docs

* updated print statement

* comment tweaks and ran ruff formatting
2025-08-20 09:58:08 -04:00
vedithal-amd 354fe5f52c Unified configuration for metrics (#726)
* Show description of metrics during analysis
    * Use --include-cols Description to show the Description column in analyze mode (this is hidden by default)
    * Remove tips field from analysis config

* Align metric names in analysis config and documentation

* Add unified config utils/unified_config.yaml

* Add python script utils/split_config.py to auto generate analysis configuration and documentation metrics description
   * Add test case to ensure unified config is older than auto-generated config
   * Auto generate analysis config and documentation metrics description

* Update CONTRIBUTING.md to add instructions to build documentation assets
    * Add docker image and compose file to build documentation

* Update CHANGELOG and Documentation

* Use jinja template instead of hardcoding metric tables in documentation

[ROCm/rocprofiler-compute commit: bb44e90b2d]
2025-07-25 14:01:34 -04:00
vedithal-amd d9da3feadf Improve block filtering to accept metric ids (#821)
* Fix tests
* Update CHANGELOG and documentation

[ROCm/rocprofiler-compute commit: a70ae40ddc]
2025-07-23 16:16:29 -04:00
cfallows-amd f6f3a6ed3e Update standalone roofline intro (#830)
[ROCm/rocprofiler-compute commit: 2a7bbc4cc2]
2025-07-23 15:17:00 -04:00
vedithal-amd 46ae3d36d9 Remove hardware IP block based filtering (#820)
* Analysis report block based filtering is the default now

* Update documentation

* Update CHANGELOG

* Fix tests
    * Replace hardware block based filtering tests with report block
      based filtering tests

[ROCm/rocprofiler-compute commit: 98bb0f4237]
2025-07-21 09:37:35 -04:00
Pratik Basyal 7c228474ac Minor editorial changes data type selection feature (#816)
[ROCm/rocprofiler-compute commit: 24c27462d7]
2025-07-16 12:39:24 -04:00
xuchen-amd a91363744c Update TUI docs. (#796)
[ROCm/rocprofiler-compute commit: bfb2dc0795]
2025-07-15 11:13:24 -04:00
Fei Zheng e0ba0631b0 Update cli doc description (#804)
[ROCm/rocprofiler-compute commit: 5b8d12fde2]
2025-07-14 13:05:01 -06:00
Fei Zheng 769caa3124 Update PC sampling doc (#798)
[ROCm/rocprofiler-compute commit: 78c1898ba0]
2025-07-14 13:04:14 -06:00
Pratik Basyal 2cfcd9baab roofline footnote updated (#808)
[ROCm/rocprofiler-compute commit: 81d95d8e4a]
2025-07-14 13:27:43 -04:00
cfallows-amd 429b17a1e0 Add roofline PDF output to general profiling runs (#774)
Change when Roofline PDFs are generated- during general profiling and --roof-only profiling (skip only when --no-roof option is present)

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

[ROCm/rocprofiler-compute commit: 630bc149ff]
2025-06-25 01:19:28 -04:00
vedithal-amd 9e743cdff2 Remove rocscope related code and add deprecation warning for mongo db usecase (#744)
* Remove rocscope related code

* Add deprecation warning for database update mode which is used for grafana and mongodb functionality

[ROCm/rocprofiler-compute commit: cdd41dee40]
2025-06-12 14:21:24 -04:00
Pratik Basyal 1440fe180c Formatting issue in code block and TOC fixed for PC Sampling (#731)
* Formatting issue in code block and TOC fixed

* Performance model reverted

[ROCm/rocprofiler-compute commit: ed05c00103]
2025-06-06 16:16:55 -04:00
Fei Zheng 243aa68712 Support stochastic pc sampling
[ROCm/rocprofiler-compute commit: d756aeb3fd]
2025-06-06 12:43:52 -06:00
cfallows-amd 04919c13e0 Fixes for roofline datatype plot outputs (#659)
Profile mode:
Fix roofline plots for datatypes that have peakVALU only. Check for highest roofline to plot the bandwidth lines to proper height, don't rely on existence of peakMFMA for every datatype.
Analyze mode:
Add roofline-data-type option for viewing pdfs in standalone gui. Default is same as profile mode, FP32.

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

[ROCm/rocprofiler-compute commit: c45e20f325]
2025-04-07 12:10:37 -04:00
cfallows-amd 5079a1803f Datatype selection option for roofline (#624)
Added command line option to specify which datatype(s) to capture into the roofline PDF(s).
All datatypes are still collected by roofline call if applicable, but only specific datatypes are plotted into PDF outputs. Will dump out all datatypes into one graph, but separate FP from Int into two graphs if needed. Will skip datatype and give error message if the datatype is not valid on a particular gpu arch.
Default is FP32

Reworked roofline calls and plotting to be general enough such that any new datatypes added into rocm-amdgpu-bench can easily be reflected in rocprof-compute with simple modifications in roofline_calc.py.

Adjusted ctest to reflect expected default pdf outputs from roofline.

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

[ROCm/rocprofiler-compute commit: a492e92034]
2025-03-25 15:02:09 -04:00
vedithal-amd 7968645191 Analysis report block based filtering for profiling (#566)
* Analysis report block based filtering for profiling

* Profiling mode changes

- `-b` option now additionally accepts metric id(s), similar to `-b` option in analyze mode (e.g. 6, 6.2, 6.23)
    - Only counters mentioned in the selected analysis report blocks will be collected
        - Add parsing logic to identify hardware counters from analysis report blocks
        - Add filtering logic to only write filtered counters in perfmon files
        - Log not collected counters in one line
- `--list-metrics` option added in profile mode to list possible metric id(s) similar to analyze mode
- Write arguments provided during profiling in profiling_configuration.yaml file

* Analysis mode changes

- During analysis mode, only show report blocks selected during profiling
    - If `-b` option is provided in analysis mode, then follow provided filters
- Do not show empty tables in analysis report

* Miscellaneous changes

- Update CHANGELOG
- Add test cases
    - Instruction mix report block filter
    - Instruction mix and Memory chart report block filter
    - Instruction mix report block filter and CPC hardware block filter
    - TA hardware block filter
    - --list-metrics in profile mode should work
- Move binary handler fixtures to conftest.py to avoid importing
  fixtures
- cmake file in tests directory has been updated to compile sample/vmem.hip for testing

* Public documentation changes

- Use the term "Hardware report block" instead of "Hardware block"
- Add documentation for "--list-metrics" option in profile mode
- Add example of filtering by hardware report block such as instruction
  mix and wavefront launch statistics
- Add deprecation warning for hardware component (sq, tcc) based filtering

[ROCm/rocprofiler-compute commit: 55cf0e237e]
2025-03-10 14:42:56 -04:00
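
The commit above lets `-b` accept metric ids at several levels (the examples given are 6, 6.2, 6.23). One way such hierarchical selection might work is component-wise prefix matching on dotted ids, sketched below. The id format, the function names, and the sample ids are assumptions made for illustration, not the tool's actual parsing logic.

```python
from typing import Iterable, List


def matches(selector: str, metric_id: str) -> bool:
    """True if the selector equals the metric id or is one of its dotted ancestors.

    Comparison is done per component, so "6.2" selects "6.2.3" but not "6.23.0".
    """
    sel_parts = selector.split(".")
    id_parts = metric_id.split(".")
    return id_parts[: len(sel_parts)] == sel_parts


def filter_metrics(selectors: Iterable[str], metric_ids: Iterable[str]) -> List[str]:
    """Keep the metric ids matched by at least one selector, preserving order."""
    selectors = list(selectors)
    return [m for m in metric_ids if any(matches(s, m) for s in selectors)]


if __name__ == "__main__":
    ids = ["6.1.0", "6.2.0", "6.2.3", "6.23.0", "7.1.0"]
    print(filter_metrics(["6.2"], ids))
    print(filter_metrics(["6"], ids))
```

Splitting on dots before comparing avoids the pitfall of plain string prefixes, where "6.2" would also match "6.23".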
cfallows-amd 54ec17a185 FP8 roofline support (#592)
Adding FP8 datatype to roofline feature in rocprof-compute on MI300-based systems.
FP8 now shows in terminal output and roofline csv, and outputs a standalone PDF.

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

[ROCm/rocprofiler-compute commit: 848fa1dc18]
2025-03-07 11:27:01 -05:00
Cole Ramos 5a7cb724ce Sync staging with mainline (#524)
* External CI: rename pipeline to rocprofiler-compute (#463)

Signed-off-by: Daniel Su <danielsu@amd.com>

* Update webui branding (#459)

* Update name and icon for browser tab to rocprofiler-compute.

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Update name and icon for browser tab to rocprofiler-compute.

Signed-off-by: xuchen-amd <xuchen@amd.com>

---------

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Update branding in documentation (#442)

* find/replace Omniperf to ROCm Compute Profiler

Signed-off-by: Peter Park <peter.park@amd.com>

* update name in Sphinx conf

Signed-off-by: Peter Park <peter.park@amd.com>

* mv what-is-omniperf.rst -> what-is-rocprof-compute.rst

Signed-off-by: Peter Park <peter.park@amd.com>

* update Tutorials section

Signed-off-by: Peter Park <peter.park@amd.com>

* add Omniperf as keyword to Conceptual section for internal search

Signed-off-by: Peter Park <peter.park@amd.com>

* update Reference section

Signed-off-by: Peter Park <peter.park@amd.com>

* black fmt conf.py

Signed-off-by: Peter Park <peter.park@amd.com>

* update profile mode and basic usage subsections

Signed-off-by: Peter Park <peter.park@amd.com>

* update how to use analyze mode subsection

Signed-off-by: Peter Park <peter.park@amd.com>

* update install section

Signed-off-by: Peter Park <peter.park@amd.com>

* fix sphinx warnings

Signed-off-by: Peter Park <peter.park@amd.com>

* fix cmd line examples in profile/mode.rst

Signed-off-by: Peter Park <peter.park@amd.com>

* update install decision tree image

Signed-off-by: Peter Park <peter.park@amd.com>

* fix TOC and index

Signed-off-by: Peter Park <peter.park@amd.com>

fix weird wording

* fix cli text: deriving rocprofiler-compute metrics...

Signed-off-by: Peter Park <peter.park@amd.com>

* update standalone-gui.rst

Signed-off-by: Peter Park <peter.park@amd.com>

* restore removed doc updates from #428

Signed-off-by: Peter Park <peter.park@amd.com>

* update ref to Omniperf in index.rst

Signed-off-by: Peter Park <peter.park@amd.com>

* fix grafana connection name to match image

Signed-off-by: Peter Park <peter.park@amd.com>

* update cmds in tutorials

Signed-off-by: Peter Park <peter.park@amd.com>

---------

Signed-off-by: Peter Park <peter.park@amd.com>

* MI300 roofline enablement in rocprofiler-compute (#470)

* MI300 roofline enablement in rocprofiler-compute

requirements.txt
- running some modules complained about numpy version too new, adding extra requirement that numpy be 1.x
pmc_roof_perf.txt
- adding TCC_BUBBLE_sum counter to profile
soc_gfx940.py
soc_gfx941.py
soc_gfx942.py
- remove console logs reading that roofline is temporarily disabled, uncommenting blocks that check for roofline csv and run roofline post-processing
roofline_calc.py
- add mi300 to supported soc
- add new calculation for hbm_data for MI300 using tcc_bubble_sum, checks if counter > 0 to use
- add to a few comments
roofline-ubuntu-20_04-mi300-rocm6
- binary for the ubuntu systems to enable mi300 roofline calculations from rocm-amdgpu-bench

Note: other distros will get roofline binaries to enable MI300, but they need further testing before being added to the branch.

Signed-off-by: Carrie Fallows <carrie.fallows@amd.com>

* Reformatting roofline_calc.py

Signed-off-by: Carrie Fallows <carrie.fallows@amd.com>

---------

Signed-off-by: Carrie Fallows <carrie.fallows@amd.com>

* Update Python format checker (#471)

* Add pre commit hook for Python formatting

Signed-off-by: coleramos425 <colramos@amd.com>

* Update formatting workflow to run on latest Python and add isort formatter

Signed-off-by: coleramos425 <colramos@amd.com>

* Fix caught yaml formatting issues

* Update pyproject file

* Add pre-commit hook instruction to CONTRIBUTING guide

* Remove target-version from black pyproject.toml

* Fixed formatting errors found with black and isort

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Run hook: Whitespaces, fix end of file spaces

---------

Signed-off-by: coleramos425 <colramos@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>

* Bump cryptography from 43.0.0 to 43.0.1 in /docs/sphinx (#473)

Bumps [cryptography](https://github.com/pyca/cryptography) from 43.0.0 to 43.0.1.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/43.0.0...43.0.1)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix file permission on MI300 roofline binary (#477)

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Removing numpy requirements of <2 (#478)

Checks fail if the version is too high, and there is no need to require a lower version.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Fix crash when loading web UI roofline for gfx942 (#479)

* Fix crash when loading web UI roofline for gfx942

* Fix formatting

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Make same changes for gfx940, gfx942.

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Fix formatting in soc_gfx940 and soc_gfx941.

Signed-off-by: benrichard-amd <ben.richard@amd.com>

---------

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Rebranding name change patch (#469)

* Patch in missed name change for rebranding.

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Patch in missed name change for rebranding.

Signed-off-by: xuchen-amd <xuchen@amd.com>

---------

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Move dependabot.yml to .github/ and bump rocm-docs-core (#481)

* Move dependabot.yml to .github/

* Bump rocm-docs-core to 1.8.5

* Bump rocm-docs-core to 1.9.0

* Fix packaging for upgrading (#486)

Specify that "rocprofiler-compute" replaces / obsoletes the "omniperf" package.

* Renamed extension path from omniperf to rocprofiler_compute (#487)

Signed-off-by: Tim Gu <Tim.Gu@amd.com>

* MI300 rhel and sles roofline binaries (#480)

* Roofline bins for MI300 on rhel and sles distributions
Built from rocm-amdgpu-bench, tested on respective distro systems with MI300 hardware.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Minor modifications removing hardcoded variables in roofline files.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Modify test_profile_general.py ctest to include MI300 enablement (#498)

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* part 1 to support rocprofv3 (#492)

* rocprofv3 support initial commit

-Can run rocprofv3 but ultimately fails. rocprofv3 says the counter capacity
is exceeded and the output CSV file format is different from v1/v2.

* Add rocprofv3 detection so v2 can still be used

It's hacky but it'll do for now.

* Add code path to convert rocprofv3 JSON output into CSV

* Grab correct value for Queue ID

* Use _sum suffix to sum TCC counters

Previously we were specifying each channel for TCC counters. rocprofv3 does
not support specifying each TCC channel, and instead will auto sum given
the TCC counter name. The counter name with the _sum suffix is also
supported by rocprofv3, as well as by v1 and v2, so we will use the TCC
counter name with the _sum suffix.
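
A minimal sketch of the renaming described above, assuming per-channel counters are written as `NAME[channel]` (for example `TCC_HIT[0]`). The helper name `to_sum_counters` is hypothetical and is not taken from the repository:

```python
import re

# Assumed per-channel spelling: "TCC_<NAME>[<channel>]", e.g. "TCC_HIT[3]".
_TCC_CHANNEL = re.compile(r"^(TCC_\w+?)\[\d+\]$")


def to_sum_counters(counters):
    """Replace per-channel TCC counters with a single "<NAME>_sum" entry.

    Hypothetical helper for illustration; the input order is preserved and
    duplicates created by collapsing channels are dropped.
    """
    result = []
    for name in counters:
        match = _TCC_CHANNEL.match(name)
        if match:
            name = match.group(1) + "_sum"
        if name not in result:
            result.append(name)
    return result
```

Counters that are not per-channel TCC entries pass through unchanged, so the same list could be handed to any rocprof version.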

* Fix incorrect counter outputs when using rocprofv3

In the JSON output some counters appear multiple times and must be
summed to get the correct value. These summed values match the
rocprofv3 output in CSV mode and also match the rocprofv2
output.
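
The summing step can be sketched as follows. The record layout (`dispatch_id`, `counter`, `value`) is an assumption made for illustration and may not match the actual rocprofv3 JSON schema:

```python
from collections import defaultdict


def sum_counter_records(records):
    """Sum repeated counter entries per (dispatch, counter) pair.

    `records` is an iterable of dicts with the hypothetical keys
    "dispatch_id", "counter" and "value".
    """
    totals = defaultdict(int)
    for record in records:
        key = (record["dispatch_id"], record["counter"])
        totals[key] += record["value"]
    return dict(totals)
```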

* Remove duplicate Correlation_ID and Wave_Size in output

* Handle json output that does not contain any dispatches

Omniperf was assuming each JSON output from rocprofv3 would always contain
dispatches. This is not the case. For example, in a multi-process
workload where one of the processes does not dispatch any kernels. A JSON
file will still be output for this process but it will not contain any dispatches.

* Code cleanup

* Update search path for rocprofv3 results

Rocprofv3 was updated to include the hostname in the path where
it outputs results.

* Handle accumulate counters

In v1/v2 rocprof uses the SQ_ACCUM_PREV_HIRES counter for the accumulate
counters. v3 does not have this. So we need to define our own counters
in counter_defs.yaml. For this we use the counter name + _ACCUM, for
example SQ_INSTR_LEVEL_SMEM_ACCUM.

To use rocprofv3 you will need to update counter_defs.yaml to include
these new counter definitions.

* Use correct GPU ID

When converting JSON -> CSV we were assigning node_id to GPU_ID. Since
the JSON contains non-GPU devices, the node_id for GPUs might not
start at 0 as expected.

This commit maps the agent ID to the appropriate GPU ID.
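
A sketch of one way such a mapping could be built, assuming each agent record carries a `node_id` and a `type` field and that GPUs are numbered in ascending `node_id` order. Both the field names and the ordering are assumptions, not the repository's actual logic:

```python
def map_gpu_ids(agents):
    """Map each GPU agent's node_id to a zero-based GPU index.

    Non-GPU agents (for example CPUs) are skipped, so the first GPU gets
    index 0 even when its node_id is larger.
    """
    gpu_node_ids = sorted(a["node_id"] for a in agents if a["type"] == "GPU")
    return {node_id: index for index, node_id in enumerate(gpu_node_ids)}
```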

* Parse scratch memory per work item from JSON

* Support rocprofv3 CSV parsing

JSON decoding is very slow for large files. Include support for parsing
rocprofv3 CSV output and make that the default.

CSV/JSON can be toggled via the ROCPROF_OUTPUT_FORMAT environment
variable e.g. ROCPROF_OUTPUT_FORMAT=csv or ROCPROF_OUTPUT_FORMAT=json
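
A minimal sketch of reading this toggle, with CSV as the default as described above. The function name, the case-insensitive matching, and the error on unknown values are assumptions:

```python
import os


def rocprof_output_format(environ=None):
    """Return "csv" or "json" based on ROCPROF_OUTPUT_FORMAT (default "csv")."""
    environ = os.environ if environ is None else environ
    fmt = environ.get("ROCPROF_OUTPUT_FORMAT", "csv").lower()
    if fmt not in ("csv", "json"):
        raise ValueError(f"Unsupported ROCPROF_OUTPUT_FORMAT: {fmt!r}")
    return fmt
```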

* black format after merge

* format isort

* change return of rocprof_cmd to try to resolve test's error

* hack to pick last part of rocminfo's name

* debug log of hacks

* Modify test_profile_general.py ctest to include MI300 enablement. Currently failing because of explicitly excluded roofline files for the soc and autofailed asserts for roof-only tests- originally in place because roofline was not enabled on mi300 yet.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* black and isort formatted

* corrected line of copyright

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: benrichard-amd <ben.richard@amd.com>
Co-authored-by: YANG WANG <ywang@ywang-ubuntu.amd.com>
Co-authored-by: Carrie Fallows <Carrie.Fallows@amd.com>

* fix for crash of timestamp of part 1 for rocprofv3 (#499)

* fix the error caused by ignoring the lack of counter csv file from rocprofv3 for timestamp

* isort and black formatted

* quick fix for gfx906 roofline (#505)

* Multi node support (#503)

* [CTest] Pipeline failures for MI300 (#483)

* Propagate new chip_id logic to testing workflow

Signed-off-by: coleramos425 <colramos@amd.com>

* Add a debug line to tests

Signed-off-by: coleramos425 <colramos@amd.com>

* Trying to set rocprofv2 generally in CTest module

Signed-off-by: coleramos425 <colramos@amd.com>

* Remove temp debugging lines from CI

Signed-off-by: coleramos425 <colramos@amd.com>

* Add roofline entry for MI300 expected files in CI tests

Signed-off-by: coleramos425 <colramos@amd.com>

* Make num_devices modifier global in scope

Signed-off-by: coleramos425 <colramos@amd.com>

* Change kernel name in PyTest to confirm rocprofv2 bug

Related to https://ontrack-internal.amd.com/browse/SWDEV-503453

Signed-off-by: coleramos425 <colramos@amd.com>

---------

Signed-off-by: coleramos425 <colramos@amd.com>

* Spatial-multiplexing: part 1 profiling stage (#465)

* rocprofv3 support initial commit

-Can run rocprofv3 but ultimately fails. rocprofv3 says the counter capacity
is exceeded and the output CSV file format is different from v1/v2.

* Add rocprofv3 detection so v2 can still be used

It's hacky but it'll do for now.

* Add code path to convert rocprofv3 JSON output into CSV

* Grab correct value for Queue ID

* Use _sum suffix to sum TCC counters

Previously we were specifying each channel for TCC counters. rocprofv3 does
not support specifying each TCC channel, and instead will auto sum given
the TCC counter name. The counter name with the _sum suffix is also
supported by rocprofv3, as well as by v1 and v2, so we will use the TCC
counter name with the _sum suffix.

* Fix incorrect counter outputs when using rocprofv3

In the JSON output some counters appear multiple times and must be
summed to get the correct value. These summed values match the
rocprofv3 output in CSV mode and also match the rocprofv2
output.

* Remove duplicate Correlation_ID and Wave_Size in output

* Handle json output that does not contain any dispatches

Omniperf was assuming each JSON output from rocprofv3 would always contain
dispatches. This is not the case. For example, in a multi-process
workload where one of the processes does not dispatch any kernels. A JSON
file will still be output for this process but it will not contain any dispatches.

* Code cleanup

* Update search path for rocprofv3 results

Rocprofv3 was updated to include the hostname in the path where
it outputs results.

* Handle accumulate counters

In v1/v2 rocprof uses the SQ_ACCUM_PREV_HIRES counter for the accumulate
counters. v3 does not have this. So we need to define our own counters
in counter_defs.yaml. For this we use the counter name + _ACCUM, for
example SQ_INSTR_LEVEL_SMEM_ACCUM.

To use rocprofv3 you will need to update counter_defs.yaml to include
these new counter definitions.

* debug code

* add logic code for multiplexing

* minor fix

* more fixes

* rocprofv3 support initial commit

-Can run rocprofv3 but ultimately fails. rocprofv3 says the counter capacity
is exceeded and the output CSV file format is different from v1/v2.

* Add rocprofv3 detection so v2 can still be used

It's hacky but it'll do for now.

* Add code path to convert rocprofv3 JSON output into CSV

* Grab correct value for Queue ID

* Use _sum suffix to sum TCC counters

Previously we were specifying each channel for TCC counters. rocprofv3 does
not support specifying each TCC channel, and instead will auto sum given
the TCC counter name. The counter name with the _sum suffix is also
supported by rocprofv3, as well as by v1 and v2, so we will use the TCC
counter name with the _sum suffix.

* Fix incorrect counter outputs when using rocprofv3

In the JSON output some counters appear multiple times and must be
summed to get the correct value. These summed values match the
rocprofv3 output in CSV mode and also match the rocprofv2
output.

* Remove duplicate Correlation_ID and Wave_Size in output

* Handle json output that does not contain any dispatches

Omniperf was assuming each JSON output from rocprofv3 would always contain
dispatches. This is not the case. For example, in a multi-process
workload where one of the processes does not dispatch any kernels. A JSON
file will still be output for this process but it will not contain any dispatches.

* Code cleanup

* Update search path for rocprofv3 results

Rocprofv3 was updated to include the hostname in the path where
it outputs results.

* Handle accumulate counters

In v1/v2 rocprof uses the SQ_ACCUM_PREV_HIRES counter for the accumulate
counters. v3 does not have this. So we need to define our own counters
in counter_defs.yaml. For this we use the counter name + _ACCUM, for
example SQ_INSTR_LEVEL_SMEM_ACCUM.

To use rocprofv3 you will need to update counter_defs.yaml to include
these new counter definitions.

* count accu files as well

* Use correct GPU ID

When converting JSON -> CSV we were assigning node_id to GPU_ID. Since
the JSON contains non-GPU devices, the node_id for GPUs might not
start at 0 as expected.

This commit maps the agent ID to the appropriate GPU ID.

* fix error with csv file parse from json and merge during post-processing

* implemented parsing of csv files from v3 output for optimization

* Parse scratch memory per work item from JSON

* Support rocprofv3 CSV parsing

JSON decoding is very slow for large files. Include support for parsing
rocprofv3 CSV output and make that the default.

CSV/JSON can be toggled via the ROCPROF_OUTPUT_FORMAT environment
variable e.g. ROCPROF_OUTPUT_FORMAT=csv or ROCPROF_OUTPUT_FORMAT=json

* black format after merge

* format isort

* change return of rocprof_cmd to try to resolve test's error

* hack to pick last part of rocminfo's name

* debug log of hacks

* Modify test_profile_general.py ctest to include MI300 enablement. Currently failing because of explicitly excluded roofline files for the soc and autofailed asserts for roof-only tests- originally in place because roofline was not enabled on mi300 yet.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* black and isort formatted

* formatted by isort and black

* change default rocprof's output to csv

* repaired crash caused by missing csv counter file when running for timestamp

* change name to spatial-multiplexing from multiplexing

* make necessary modification for review

* set the default value of the spatial_multiplexing argument to None

* repair the part that blocks regular pmc files' generation

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: benrichard-amd <ben.richard@amd.com>
Co-authored-by: fei.zheng <fei.zheng@amd.com>
Co-authored-by: YANG WANG <ywang@ywang-ubuntu.amd.com>
Co-authored-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Simple fix for gpu model value. (#508)

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Add FP64 to plot adhering to pdf name (#507)

* Replacing FP32-only plot with an FP32&FP64 combo plot. Results will likely be negligible but the plot name indicates both should be graphed.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Remove duplicate AI plot to clean up fp32 fp64 graph

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Add gpu series for roofline (#510)

* Add gpu_series for roofline.

* Use gpu_series in path names for roofline.

* Fix TCC on MI200 when introducing rocprofv3 (#509)

* quick fix for v2

* one more fix

* revert a bit

---------

Co-authored-by: ywang103-amd <ywang103@amd.com>

* Bump rocm-docs-core from 1.9.0 to 1.12.0 in /docs/sphinx (#511)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.9.0 to 1.12.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.9.0...v1.12.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update sample roofline plot img (#516)

* Modify path to use gpu_model instead of gpu_series to match other workload directory path creation/search points. Affects manual testing, does not seem to affect ctests. (#513)

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Improve formatting when displaying rocprof command. (#476)

* Improve formatting when displaying rocprof command.

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Fix python formatting.

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Strip unwanted characters (rocprofv1 specific) from rocprof commands.

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Strip unwanted characters (rocprofv1 specific) from rocprof commands.

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Save the unmodified arguments for rocprof for debug message display.

Signed-off-by: xuchen-amd <xuchen@amd.com>

---------

Signed-off-by: xuchen-amd <xuchen@amd.com>

* quick fix for mpi_support (#518)

* Pass accumulate counters to rocprofv3 using -E option (#522)

rocprofv3 has a new -E option where extra counters can be passed (see accum_counters.yaml) instead
of defining them in counter_defs.yaml.

* Unify all file handling with pathlib (#512)

* Replace occurrences of os.path functions with equivalent functions from
  pathlib library

* Remove unwanted imports of os.path and os

* Add coding guidelines for using pathlib instead of os.path
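
For illustration, the kind of substitution described above might look like this. The function names are hypothetical and `pmc_perf.csv` is used only as an example file name:

```python
from pathlib import Path


def pmc_perf_path(workload_dir):
    """Build a result path; replaces os.path.join(workload_dir, "pmc_perf.csv")."""
    return Path(workload_dir) / "pmc_perf.csv"


def has_results(workload_dir):
    """Check for the result file; replaces os.path.isfile(...)."""
    return pmc_perf_path(workload_dir).is_file()
```

Returning `Path` objects lets callers chain further operations such as `.parent`, `.name`, or `.read_text()` without importing `os.path`.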

* Auto sync staging and mainline on a weekly cadence (#517)

Signed-off-by: coleramos425 <colramos@amd.com>

---------

Signed-off-by: Daniel Su <danielsu@amd.com>
Signed-off-by: xuchen-amd <xuchen@amd.com>
Signed-off-by: Peter Park <peter.park@amd.com>
Signed-off-by: Carrie Fallows <carrie.fallows@amd.com>
Signed-off-by: coleramos425 <colramos@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Signed-off-by: benrichard-amd <ben.richard@amd.com>
Signed-off-by: Tim Gu <Tim.Gu@amd.com>
Co-authored-by: Daniel Su <danielsu@amd.com>
Co-authored-by: xuchen-amd <xuchen@amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
Co-authored-by: cfallows-amd <Carrie.Fallows@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ben Richard <143630488+benrichard-amd@users.noreply.github.com>
Co-authored-by: Tim Gu <Tim.Gu@amd.com>
Co-authored-by: ywang103-amd <ywang103@amd.com>
Co-authored-by: benrichard-amd <ben.richard@amd.com>
Co-authored-by: YANG WANG <ywang@ywang-ubuntu.amd.com>
Co-authored-by: Fei Zheng <44449748+feizheng10@users.noreply.github.com>
Co-authored-by: fei.zheng <fei.zheng@amd.com>
Co-authored-by: vedithal-amd <Vignesh.Edithal@amd.com>

[ROCm/rocprofiler-compute commit: 272e5b6e32]
2025-01-02 15:29:47 -06:00
xuchen-amd 825440e7ba Rename Omniperf to ROCm Compute Profiler (#428)
- Update filenames.
- Update executable to `rocprof-compute`
- Update package to `rocprofiler-compute`
- Update name in application output and logs
- Update name in README files
- Update testing and workflows

---------

Signed-off-by: Xuan Chen <xuchen@amd.com>

[ROCm/rocprofiler-compute commit: 31b4de1a38]
2024-11-01 12:20:21 -04:00
David Galiffi 3565b2ee46 Remove dev and main branch from workflows. (#404)
* Remove `dev` and `main` branch from workflows.

Update links in documentation.

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* `amd-staging` -> `amd-mainline` in docs

Signed-off-by: Peter Jun Park <peter.park@amd.com>

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Signed-off-by: Peter Jun Park <peter.park@amd.com>
Co-authored-by: Peter Jun Park <peter.park@amd.com>

[ROCm/rocprofiler-compute commit: 68e5db2dbd]
2024-09-25 17:21:39 +00:00
Peter Park f3ae7a618b Docs housekeeping post-6.2.0 (#394)
* remove leftover css

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* fix link to panel_config_template.yaml

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* add note in archived docs pointing to latest

Signed-off-by: Peter Jun Park <peter.park@amd.com>

rm repetition

---------

Signed-off-by: Peter Jun Park <peter.park@amd.com>

[ROCm/rocprofiler-compute commit: e8fc341345]
2024-08-09 09:46:42 -04:00
Peter Park 5d22d5ac8e Docs: refactor and integrate into ROCm docs portal (#362)
* pip-compile docs/requirements.txt

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Add Sphinx docs config

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Add Sphinx config

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Update docs build config

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* style(conf.py): Apply black formatting to docs/conf.py

Signed-off-by: Sam Wu <22262939+samjwu@users.noreply.github.com>

* Update docs requirements

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Update to rocm-docs-core 1.3.0

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Update docs requirements

Signed-off-by: Peter Jun Park <peter.park@amd.com>

pip-compile requirements

Signed-off-by: Peter Jun Park <peter.park@amd.com>

bump rocm-docs-core to 1.5.0

bump rocm-docs-core to 1.4.1

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* Add dependabot.yml and update CODEOWNERS

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Update toc and conf

Signed-off-by: Peter Jun Park <peter.park@amd.com>

update dependabot

* Port docs to rocm-docs standard

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Add toc and Diataxis cards

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Add basic file structure

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add glossary

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add includes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Add license.rst

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add compatible hw

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fix spelling and license

Signed-off-by: Peter Jun Park <peter.park@amd.com>

clean up index

Signed-off-by: Peter Jun Park <peter.park@amd.com>

clean up installation guides

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add basic usage (quickstart)

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add ref to global options

update toc

Signed-off-by: Peter Jun Park <peter.park@amd.com>

modularize modes and global options

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add profile mode

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fixes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

reorg and clean up

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add dynamic omniperf version number in installation guide

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add datatemplate

more reorg

Signed-off-by: Peter Jun Park <peter.park@amd.com>

clean up

Signed-off-by: Peter Jun Park <peter.park@amd.com>

reorg images

move profile mode

reorg

reorg

reorg more

fix formatting

fix headings

ref anchor mi2xx note

add extlinks

add extlinks

Signed-off-by: Peter Jun Park <peter.park@amd.com>

black format

fix formatting, anchors

Signed-off-by: Peter Jun Park <peter.park@amd.com>

reorg

fix words and formatting

Signed-off-by: Peter Jun Park <peter.park@amd.com>

formatting

Signed-off-by: Peter Jun Park <peter.park@amd.com>

same

reorg

format

fix formatting

fix toc

Signed-off-by: Peter Jun Park <peter.park@amd.com>

format

* impr internal linking and fix sphinx warnings

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* add spellcheck/linting from rocm-docs-core

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fix rst directives

satisfy spellcheck

fix more spelling

rm unused files

fix spelling and update wordlist

* bump rocm-docs-core to 1.6.0

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* add fixes from @skyreflectedinmirrors and @lpaoletti

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add references to toc

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add more fixes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* add package manager install section

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* add fixes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add metadata and fixes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add fixes

bump to 1.6.1

more fixes

fix fmt in profiling examples

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add missing mem type table

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fix formatting

fmt

* add custom css

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fix css fs

* make images/figs click-to-expand

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add missed image

update

fix link

* update documentation link in README

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* formatting fixes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

more formatting

* fix heading

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* move archived docs

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* exclude archived docs from docs build

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* update archived docs workflow

Signed-off-by: Peter Jun Park <peter.park@amd.com>

move files

update archived docs workflow

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fix version number

clean up workflow

workflow test

workflow test

another workflow test

* rm docs linting

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* Apply cmake-format suggested changes

Signed-off-by: Sam Wu <22262939+samjwu@users.noreply.github.com>

* Apply cmake-format

Signed-off-by: Sam Wu <22262939+samjwu@users.noreply.github.com>

---------

Signed-off-by: Peter Jun Park <peter.park@amd.com>
Signed-off-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>

[ROCm/rocprofiler-compute commit: a0dc485ceb]
2024-08-09 09:46:42 -04:00