31 Коммитов

Автор SHA1 Сообщение Дата
ggottipa-amd 77f7541755 [rocprofiler-compute] Adding --torch-trace option for SWDEV-559789 (#2089)
* Adding --torch-operator option in rocprof-compute. Creates csv file for
each operator that has gpu activity, showing operator to counter values
mapping.

* --torch-operators flag added to rocprofiler-sdk

* Adding ctest for --torch-operators.

* Adding pytest markers.

* Corrections in ctest and message logging.

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Adding a check for pytorch installation only when --torch-operators is passed.

* moving inject_roctx.py into src/utils.

* rebase

* Updating docs and changelog.

* Update projects/rocprofiler-compute/src/argparser.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/src/utils/inject_roctx.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Removing special characters.

* Minor corrections.

* Setting default value for torch_operators_enabled.

* Updating the number of files according to the number of passes.

* Adding rocpd support.

* Adding a warning message to be shown when profiling a non-python workload.

* copilot suggestions, rocpd+native tool fix

* Fixed the incorrect usage of dispatch_id as event_id in the function update_rocpd_pmc_events()

* ruff format fix

* ruff formating

* Deleting torch_trace.csvs after consolidating the operator data.

* Removing checks since *torch_trace.csv files are deleted.

* Fixing file deletion.

* Update projects/rocprofiler-compute/src/utils/inject_roctx.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/src/utils/utils.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/tests/test_profile_general.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Using default options in the testcase.

* Adding test for overhead measurement.

* Corrections in docs.

* doc updates.

* Update projects/rocprofiler-compute/src/utils/inject_roctx.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Handling potential empty frames.

* Corrected the test cases.

* Changing the flag to --torch-trace

* Fixed helper_app path issues

* Path issues

* process_torch_trace_output() now takes csv file paths as input + allows default usage.

* Replaced pandas with sqlite3

* Adding marker_trace extraction to rocpd_data.py

* Allowing all workloads to use --torch-trace option. Assuming the workload is user verified.

* Modified help section for the flag.

* Added difference in runtimes for longest running kernels in each profiling runs to overhead measurements.

* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Removed the accesses to the tables.

* Ruff fixes.

* ruff

* Ruff Fixes

* Adding getattr for args.torch_trace to handle mock args.

* Fix for 'Missing guid in counter collection data - in csv mode'

* Sending output_format to process_torch_trace_output

* Warning for self contained binaries.

* Ruff

* Ruff

* Measuring longest_running_kernel_baseline instead of worst_kernel_increase, very small kernel runtimes are blowing up the worst_kernel_increase metric.

* Minor fixes in input arguments

* Ruff

* Loging PyTorch version

* Fix ruff formatting for PyTorch version logging

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-27 19:50:25 +05:30
jamessiddeley-amd 69281bbcf4 [rocprofiler-compute] Threshold Based Clamping in Analyze Stage (#2565)
* add threshold clamping function + parse in parser.py (with I/O)

* implemented hybrid threshold solution

* update changelog

* removed absolute threshold hybrid approach; restored relative threshold + warn

* edited warning msg, threshold -> 1%

* update changelog

* added 2 test cases

* ran master workflow yaml config files

* added to FAQ

* Revert "ran master workflow yaml config files"

This reverts commit 75a670e14d6f1619ebbda0ec218755ccbe0d22b1.

* update FAQ

* update config hashes

* Broke down long functions into Class with sub-functions

* ruff format

* addressed comments
2026-01-23 00:54:54 -05:00
abchoudh-amd 6d9d880d31 [rocprofiler-compute] Counter accuracy tests and improvements for iteration multiplexing (#2011)
* Added laplace solver in samples

* Add laplace eqn in CMake

* Added counter accuracy test

* Add iteration CLI arg for laplace eq

* Unnest profile method

* Missing counter warning

* Updated insufficient kernel warning

* Added reference for laplace equation

* variable name change

* Added comments for data comparison

* Included scipy as test requirement

* Added line number for ref

* split stochastic and deterministic tests

* Added order cli option for laplace_eqn

* Install laplace eqn

* Missing counter warning

* Warn about missing kernels during analysis

* Update tests

* Split iteration multiplexing ctests

* Updated warning

* Incorporated copilot's suggestions
2025-12-17 18:26:39 +05:30
abchoudh-amd 433af908a6 Iteration multiplexing in rocprof-compute (#1533)
* Profile with multiple input files

* Iteration multiplexing kernel option

* Iteration multiplexing data

* Iteration multiplexing

* Counter profile caching

* Counter dispatch info

* Sanitize CLI args

* Formatting and removed unused header file

* Formattng

* Changed CLI args

* Merge counters for analysis

* Iteration multiplexing log while analysis

* Formatting

* Log

* Guard against incomplete profiling

* Fixed merge counter

* Tests

* Update doc

* Test update

* Fixed formatting

* Test fix

* Merge conflict commit

* Fix tests

* Added comment for counter definition file

* Do not allow dispatch filtering with iteration multiplexing

* Fixed formatting

* Doc indentation update
2025-11-19 21:09:16 +05:30
abchoudh-amd 76ea35787d Split roofline tests, and fix none outputs (#1913)
* Split roofline tests

* Use N/A for missing values

* Test eval_expression for no valid data

* Fixed tests

* Updated Changelog for N/A

* Fixed platform specific test failure
2025-11-19 15:36:08 +05:30
xuchen-amd b774f28181 [rocprofiler-compute] Remove grafana and mongodb integration (#978)
* Remove grafana and mongodb integration

* Remove grafana documentation assets

* clarify changelog

---------

Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
2025-10-29 11:32:06 -04:00
Fei Zheng 2c59a82fe1 Fix rocprof-compute TUI build err with python 39 (#303)
* Upgrade min python version from 3.8 to 3.9

* Set min version for textual-fspicker for TUI support

* Update workflows to use python 3.9 instead of 3.8

* fix formatting

* fix bug

---------

Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
2025-10-21 00:27:35 -04:00
xuchen-amd c3054c00b1 [rocprofiler-compute] Type annotation patch for analysis_db.py (#981) 2025-09-23 17:05:37 -04:00
systems-assistant[bot] 872f0aed0c Live attach/detach and its unit tests (#53) 2025-09-23 13:17:08 -04:00
abchoudh-amd a927f246f6 Fix test failures (#1059)
* Test fix

* Added path not exists check
2025-09-22 18:07:13 +05:30
xuchen-amd eb46160a8f update proj toml (#974) 2025-09-12 16:24:44 -04:00
xuchen-amd 7ed6000e32 [rocprofiler-compute] Refactor to add type annotation and misc (#787) 2025-09-12 13:53:24 -04:00
ywang103-amd 2a216ecbc1 pc sampling unit tests (#194) 2025-08-23 10:13:22 -04:00
systems-assistant[bot] 58d2a016ce Format source code to PEP8 using Ruff (#36)
* added ruff docs

* style: Run ruff and black before yapf pass

* yapf -r -i (23 fixes)

* fixed conf.py and ran ruff format .

* fixed conf.py 2

* formatted argparser.py

* formatted src/rocprof_compute_analyze

* formatted src/rocprof_compute_profile

* formatted soc_base.py

* formatted rocprof_compute_tui

* formatted gui_components

* formatted src/utils

* formatted tests/

* format extra files

* cleanup

* fix test_utils.py

* fixed typos

* Update pyproject.toml

* Update README.md

* Update test_utils.py

---------

Signed-off-by: jamessiddeley-amd <James.Siddeley@amd.com>
Co-authored-by: James Siddeley <James.Siddeley@amd.com>
Co-authored-by: systems-assistant[bot] <systems-assistant[bot]@users.noreply.github.com>
2025-08-08 15:32:30 -04:00
xuchen-amd 34dd26fb07 Enable single pass counter collection (#833)
[ROCm/rocprofiler-compute commit: 6a77d241ed]
2025-08-06 10:35:05 -04:00
vedithal-amd 46ae3d36d9 Remove hardware IP block based filtering (#820)
* Analysis report block based filtering is the default now

* Update documentation

* Update CHANGELOG

* Fix tests
    * Replace hardware block based filtering tests with report block
      based filtering tests

[ROCm/rocprofiler-compute commit: 98bb0f4237]
2025-07-21 09:37:35 -04:00
xuchen-amd af114a1539 Add test for gfx942 number of xcds. (#674)
* Add test for 9fx942 number of xcds.

* Improve the structure of mi gpu specs, add num_xcds_spec_class test.

* Add to ctest.

---------

Signed-off-by: xuchen-amd <xuchen@amd.com>

[ROCm/rocprofiler-compute commit: 85bfa73e2c]
2025-04-28 11:29:14 -04:00
xuchen-amd 08e083cc25 Add mi300 TCP counter tests (#644)
* Add new sample applications.

* Generalize py test launcher for additional apps.

* Add TCP pytest, and add to ctest.

* Update licensing.

* Disable for non-mi300 machines.

[ROCm/rocprofiler-compute commit: 591632dd69]
2025-04-02 20:32:13 -04:00
vedithal-amd 7968645191 Analysis report block based filtering for profiling (#566)
* Analysis report block based filtering for profiling

* Profiling mode changes

- `-b` option now additionally accepts metric id(s), similar to `-b` option in analyze mode (e.g. 6, 6.2, 6.23)
    - Only counters mentioned in the selected analysis report blocks will be collected
        - Add parsing logic to identify hardware counters from analysis report blocks
        - Add filtering logic to only write filtered counters in perfmon files
        - Log not collected counters in one line
- `--list-metrics` option added in profile mode to list possible metric id(s) similar to analyze mode
- Write arguments provided during profiling in profiling_configuration.yaml file

* Analysis mode changes

- During analysis mode, only show report blocks selected during profiling
    - If `-b` option is provided in analysis mode, then follow provided filters
- Do not show empty tables in analysis report

* Miscellaneous changes

- Update CHANGELOG
- Add test cases
    - Instruction mix report block filter
    - Instruction mix and Memory chart report block filter
    - Instruction mix report block filter and CPC hardware block filter
    - TA hardware block filter
    - --list-metrics in profile mode should work
- Move binary handler fixtures to conftest.py to avoid importing
  fixtures
- cmake file in tests directory has been updated to compile sample/vmem.hip for testing

* Public documentation changes

- Use the term "Hardware report block" instead of "Hardware block"
- Add documentation for "--list-metrics" option in profile mode
- Add example of filtering by hardware report block such as instruction
  mix and wavefront launch statistics
- Add deprecation warning for hardware component (sq, tcc) based filtering

[ROCm/rocprofiler-compute commit: 55cf0e237e]
2025-03-10 14:42:56 -04:00
Cole Ramos 5a7cb724ce Sync staging with mainline (#524)
* External CI: rename pipeline to rocprofiler-compute (#463)

Signed-off-by: Daniel Su <danielsu@amd.com>

* Update webui branding (#459)

* Update name and icon for browser tab to rocprofiler-compute.

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Update name and icon for browser tab to rocprofiler-compute.

Signed-off-by: xuchen-amd <xuchen@amd.com>

---------

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Update branding in documentation (#442)

* find/replace Omniperf to ROCm Compute Profiler

Signed-off-by: Peter Park <peter.park@amd.com>

* update name in Sphinx conf

Signed-off-by: Peter Park <peter.park@amd.com>

* mv what-is-omniperf.rst -> what-is-rocprof-compute.rst

Signed-off-by: Peter Park <peter.park@amd.com>

* update Tutorials section

Signed-off-by: Peter Park <peter.park@amd.com>

* add Omniperf as keyword to Conceptual section for internal search

Signed-off-by: Peter Park <peter.park@amd.com>

* update Reference section

Signed-off-by: Peter Park <peter.park@amd.com>

* black fmt conf.py

Signed-off-by: Peter Park <peter.park@amd.com>

* update profile mode and basic usage subsections

Signed-off-by: Peter Park <peter.park@amd.com>

* update how to use analyze mode subsection

Signed-off-by: Peter Park <peter.park@amd.com>

* update install section

Signed-off-by: Peter Park <peter.park@amd.com>

* fix sphinx warnings

Signed-off-by: Peter Park <peter.park@amd.com>

* fix cmd line examples in profile/mode.rst

Signed-off-by: Peter Park <peter.park@amd.com>

* update install decision tree image

Signed-off-by: Peter Park <peter.park@amd.com>

* fix TOC and index

Signed-off-by: Peter Park <peter.park@amd.com>

fix weird wording

* fix cli text: deriving rocprofiler-compute metrics...

Signed-off-by: Peter Park <peter.park@amd.com>

* update standalone-gui.rst

Signed-off-by: Peter Park <peter.park@amd.com>

* restore removed doc updates from #428

Signed-off-by: Peter Park <peter.park@amd.com>

* update ref to Omniperf in index.rst

Signed-off-by: Peter Park <peter.park@amd.com>

* fix grafana connection name to match image

Signed-off-by: Peter Park <peter.park@amd.com>

* update cmds in tutorials

Signed-off-by: Peter Park <peter.park@amd.com>

---------

Signed-off-by: Peter Park <peter.park@amd.com>

* MI300 roofline enablement in rocprofiler-compute (#470)

* MI300 roofline enablement in rocprofiler-compute

requirements.txt
- running some modules complained about numpy version too new, adding extra requirement that numpy be 1.x
pmc_roof_perf.txt
- adding TCC_BUBBLE_sum counter to profile
soc_gfx940.py
soc_gfx941.py
soc_gfx942.py
- remove console logs reading that roofline is temporarily disabled, uncommenting blocks that check for roofline csv and run roofline post-processing
roofline_calc.py
- add mi300 to supported soc
- add new calculation for hbm_data for MI300 using tcc_bubble_sum, checks if counter > 0 to use
- add to a few comments
roofline-ubuntu-20_04-mi300-rocm6
- binary for the ubuntu systems to enable mi300 roofline calculations from rocm-amdgpu-bench

Note- other distros will get roofline bins to enable mi300, but need to be further tested before putting into branch.

Signed-off-by: Carrie Fallows <carrie.fallows@amd.com>

* Reformatting roofline_calc.py

Signed-off-by: Carrie Fallows <carrie.fallows@amd.com>

---------

Signed-off-by: Carrie Fallows <carrie.fallows@amd.com>

* Update Python format checker (#471)

* Add pre commit hook for Python formatting

Signed-off-by: coleramos425 <colramos@amd.com>

* Update formatting workflow to run on latest Python and add isort formatter

Signed-off-by: coleramos425 <colramos@amd.com>

* Fix caught yaml formatting issues

* Update pyproject file

* Add pre-commit hook instruction to CONTRIBUTING guide

* Remove target-version from black pyproject.toml

* Fixed formatting errors found with black and isort

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Run hook: Whitespaces, fix end of file spaces

---------

Signed-off-by: coleramos425 <colramos@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>

* Bump cryptography from 43.0.0 to 43.0.1 in /docs/sphinx (#473)

Bumps [cryptography](https://github.com/pyca/cryptography) from 43.0.0 to 43.0.1.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/43.0.0...43.0.1)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix file permission on MI300 roofline binary (#477)

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Removing numpy requirements of <2 (#478)

Checks are failing if version too high and no need for lower version

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Fix crash when loading web UI roofline for gfx942 (#479)

* Fix crash when loading web UI roofline for gfx942

* Fix formatting

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Make same changs for gfx940, gfx942.

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Fix formatting in soc_gfx940 and soc_gfx941.

Signed-off-by: benrichard-amd <ben.richard@amd.com>

---------

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Rebranding name change patch (#469)

* Patch in missed name change for rebranding.

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Patch in missed name change for rebranding.

Signed-off-by: xuchen-amd <xuchen@amd.com>

---------

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Move dependabot.yml to .github/ and bump rocm-docs-core (#481)

* Move dependabot.yml to .github/

* Bump rocm-docs-core to 1.8.5

* Bump rocm-docs-core to 1.9.0

* Fix packaging for upgrading (#486)

Specify that "rocprofiler-compute" replaces / obsoletes the "omniperf" package.

* Renamed extension path from omniperf to rocprofiler_compute (#487)

Signed-off-by: Tim Gu <Tim.Gu@amd.com>

* MI300 rhel and sles roofline binaries (#480)

* Roofline bins for MI300 on rhel and sles distributions
Built from rocm-amdgpu-bench, tested on respective distro systems with MI300 hardware.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Minor modifications removing hardcoded variables in roofline files.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Modify test_profile_general.py ctest to include MI300 enablement (#498)

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* part 1 to support rocprofv3 (#492)

* rocprofv3 support initial commit

-Can run rocprofv3 but ultimately fails. rocprofv3 says the counter capacity
is exceeded and the output CSV file format is different from v1/v2.

* Add rocprofv3 detection so v2 can still be used

It's hacky but it'll do for now.

* Add code path to convert rocprofv3 JSON output into CSV

* Grab correct value for Queue ID

* Use _sum suffix to sum TCC counters

Previously we were specifying each channel for TCC counters. rocprofv3 does
not support specifing each TCC channel, and instead will auto sum given
the TCC counter name. The counter name with the _sum suffix is also
supported and is also supported in v1 and v2. So we will use the TCC
counter name with the _sum suffix.

* Fix incorrect counter outputs when using rocprofv3

In the JSON output some counters appear multime times and must be
summed to get the correct value. These summed values match the
rocprofv3 output in CSV mode and also match the rocprofv2
output.

* Remove duplicate Correlation_ID and Wave_Size in output

* Handle json output that does not contain any dispatches

Omniperf was assuming each JSON output from rocprofv3 would always contain
dispatches. This is not the case. For example, in a multi-process
workload where one of the processes does not dispatch any kernels. A JSON
file will still be output for this process but it will not contain any dispatches.

* Code cleanup

* Update search path for rocprofv3 results

Rocprofv3 was updated to include the hostname in the path where
it outputs results.

* Handle accumulate counters

In v1/v2 rocprof uses the SQ_ACCUM_PREV_HIRES counter for the accumualte
counters. v3 does not have this. So we need to define our own counters
in counter_defs.yaml. For this we use the counter name + _ACCUM, for
example SQ_INSTR_LEVEL_SMEM_ACCUM.

To use rocprofv3 you will need to update counter_defs.yaml to include
these new counter definitions.

* Use correct GPU ID

When converting JSON -> CSV we were assigning node_id to GPU_ID. Since
the JSON contains non-GPU devices, the node_id for GPUs might not
start at 0 as expected.

This commit maps the agent ID to the appropriate GPU ID.

* Parse scratch memory per work item from JSON

* Support rocprofv3 CSV parsing

JSON decoding is very slow for large files. Include support for parsing
rocprofv3 CSV output and make that the default.

CSV/JSON can be toggled via the ROCPROF_OUTPUT_FORMAT environment
variable e.g. ROCPROF_OUTPUT_FORMAT=csv or ROCPROF_OUTPUT_FORMAT=json

* black format after merge

* format isort

* change return of rocprof_cmd to try to resolve test's error

* hack to pick last part of rocminfo's name

* debug log of hacks

* Modify test_profile_general.py ctest to include MI300 enablement. Currently failing because of explicitly excluded roofline files for the soc and autofailed asserts for roof-only tests- originally in place because roofline was not enabled on mi300 yet.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* black and isort formated

* corrected line of copyright

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: benrichard-amd <ben.richard@amd.com>
Co-authored-by: YANG WANG <ywang@ywang-ubuntu.amd.com>
Co-authored-by: Carrie Fallows <Carrie.Fallows@amd.com>

* fix for crash of timestamp of part 1 for rocprofv3 (#499)

* fix the error caused by ignoring the lack of counter csv file from rocprofv3 for timestamp

* isort and black formated

* quick fix for gfx906 roofline (#505)

* Multi node support (#503)

* [CTest] Pipeline failures for MI300 (#483)

* Propagate new chip_id logic to testing workflow

Signed-off-by: coleramos425 <colramos@amd.com>

* Add a debug line to tests

Signed-off-by: coleramos425 <colramos@amd.com>

* Trying to set rocprofv2 generally in CTest module

Signed-off-by: coleramos425 <colramos@amd.com>

* Remove temp debugging lines from CI

Signed-off-by: coleramos425 <colramos@amd.com>

* Add roofline entry for MI300 expected files in CI tests

Signed-off-by: coleramos425 <colramos@amd.com>

* Make num_devices modifier global in scope

Signed-off-by: coleramos425 <colramos@amd.com>

* Change kernel name in PyTest to confirm rocprofv2 bug

Related to https://ontrack-internal.amd.com/browse/SWDEV-503453

Signed-off-by: coleramos425 <colramos@amd.com>

---------

Signed-off-by: coleramos425 <colramos@amd.com>

* Spatial-multiplexing: part 1 profiling stage (#465)

* rocprofv3 support initial commit

-Can run rocprofv3 but ultimately fails. rocprofv3 says the counter capacity
is exceeded and the output CSV file format is different from v1/v2.

* Add rocprofv3 detection so v2 can still be used

It's hacky but it'll do for now.

* Add code path to convert rocprofv3 JSON output into CSV

* Grab correct value for Queue ID

* Use _sum suffix to sum TCC counters

Previously we were specifying each channel for TCC counters. rocprofv3 does
not support specifing each TCC channel, and instead will auto sum given
the TCC counter name. The counter name with the _sum suffix is also
supported and is also supported in v1 and v2. So we will use the TCC
counter name with the _sum suffix.

* Fix incorrect counter outputs when using rocprofv3

In the JSON output some counters appear multime times and must be
summed to get the correct value. These summed values match the
rocprofv3 output in CSV mode and also match the rocprofv2
output.

* Remove duplicate Correlation_ID and Wave_Size in output

* Handle json output that does not contain any dispatches

Omniperf was assuming each JSON output from rocprofv3 would always contain
dispatches. This is not the case. For example, in a multi-process
workload where one of the processes does not dispatch any kernels. A JSON
file will still be output for this process but it will not contain any dispatches.

* Code cleanup

* Update search path for rocprofv3 results

Rocprofv3 was updated to include the hostname in the path where
it outputs results.

* Handle accumulate counters

In v1/v2 rocprof uses the SQ_ACCUM_PREV_HIRES counter for the accumualte
counters. v3 does not have this. So we need to define our own counters
in counter_defs.yaml. For this we use the counter name + _ACCUM, for
example SQ_INSTR_LEVEL_SMEM_ACCUM.

To use rocprofv3 you will need to update counter_defs.yaml to include
these new counter definitions.

* debug code

* add logic code for multiplexing

* minor fix

* more fixes

* rocprofv3 support initial commit

-Can run rocprofv3 but ultimately fails. rocprofv3 says the counter capacity
is exceeded and the output CSV file format is different from v1/v2.

* Add rocprofv3 detection so v2 can still be used

It's hacky but it'll do for now.

* Add code path to convert rocprofv3 JSON output into CSV

* Grab correct value for Queue ID

* Use _sum suffix to sum TCC counters

Previously we were specifying each channel for TCC counters. rocprofv3 does
not support specifing each TCC channel, and instead will auto sum given
the TCC counter name. The counter name with the _sum suffix is also
supported and is also supported in v1 and v2. So we will use the TCC
counter name with the _sum suffix.

* Fix incorrect counter outputs when using rocprofv3

In the JSON output some counters appear multime times and must be
summed to get the correct value. These summed values match the
rocprofv3 output in CSV mode and also match the rocprofv2
output.

* Remove duplicate Correlation_ID and Wave_Size in output

* Handle json output that does not contain any dispatches

Omniperf was assuming each JSON output from rocprofv3 would always contain
dispatches. This is not the case. For example, in a multi-process
workload where one of the processes does not dispatch any kernels. A JSON
file will still be output for this process but it will not contain any dispatches.

* Code cleanup

* Update search path for rocprofv3 results

Rocprofv3 was updated to include the hostname in the path where
it outputs results.

* Handle accumulate counters

In v1/v2 rocprof uses the SQ_ACCUM_PREV_HIRES counter for the accumualte
counters. v3 does not have this. So we need to define our own counters
in counter_defs.yaml. For this we use the counter name + _ACCUM, for
example SQ_INSTR_LEVEL_SMEM_ACCUM.

To use rocprofv3 you will need to update counter_defs.yaml to include
these new counter definitions.

* count accu files as well

* Use correct GPU ID

When converting JSON -> CSV we were assigning node_id to GPU_ID. Since
the JSON contains non-GPU devices, the node_id for GPUs might not
start at 0 as expected.

This commit maps the agent ID to the appropriate GPU ID.

* fix error with csv file parse from json and merge during post-processing

* implemented parsing of csv files from v3 output for optimization

* Parse scratch memory per work item from JSON

* Support rocprofv3 CSV parsing

JSON decoding is very slow for large files. Include support for parsing
rocprofv3 CSV output and make that the default.

CSV/JSON can be toggled via the ROCPROF_OUTPUT_FORMAT environment
variable e.g. ROCPROF_OUTPUT_FORMAT=csv or ROCPROF_OUTPUT_FORMAT=json

* black format after merge

* format isort

* change return of rocprof_cmd to try to resolve test's error

* hack to pick last part of rocminfo's name

* debug log of hacks

* Modify test_profile_general.py ctest to include MI300 enablement. Currently failing because of explicitly excluded roofline files for the soc and autofailed asserts for roof-only tests- originally in place because roofline was not enabled on mi300 yet.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* black and isort formated

* formated by isort and black

* change default rocprof's output to csv

* repaired crash caused by missing csv counter file when running for timestamp

* change name to spatial-multiplexing from multiplexing

* make necessary modification for review

* set the value of spatial_multiplexing in argument defautly to None

* repair the part that blocks regular pmc files' generation

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: benrichard-amd <ben.richard@amd.com>
Co-authored-by: fei.zheng <fei.zheng@amd.com>
Co-authored-by: YANG WANG <ywang@ywang-ubuntu.amd.com>
Co-authored-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Simple fix for gpu model value. (#508)

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Add FP64 to plot adhering to pdf name (#507)

* Replacing FP32-only plot with an FP32&FP64 combo plot. Results will likely be negligible but the plot name indicates both should be graphed.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Remove duplicate AI plot to clean up fp32 fp64 graph

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Add gpu series for roofline (#510)

* Add gpu_series for roofline.

* Use gpu_series in path names for roofline.

* Fix  TCC on MI200 when introduce rocprofv3 (#509)

* quick fix for v2

* one more fix

* revert a bit

---------

Co-authored-by: ywang103-amd <ywang103@amd.com>

* Bump rocm-docs-core from 1.9.0 to 1.12.0 in /docs/sphinx (#511)

Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.9.0 to 1.12.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.9.0...v1.12.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Update sample roofline plot img (#516)

* Modify path to use gpu_model instead of gpu_series to match other workload directory path creation/search points. Affects manual testing, does not seem to affect ctests. (#513)

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

* Improve formatting when displaying rocprof command. (#476)

* Improve formatting when displaying rocprof command.

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Fix python formatting.

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Strip unwanted characters (rocprofv1 specific) from rocprof commands.

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Strip unwanted characters (rocprofv1 specific) from rocprof commands.

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Save the unmodified arguments for rocprof for debug message display.

Signed-off-by: xuchen-amd <xuchen@amd.com>

---------

Signed-off-by: xuchen-amd <xuchen@amd.com>

* quick fix for mpi_support (#518)

* Pass accumulate counters to rocprofv3 using -E option (#522)

rocprofv3 has a new -E option where extra counters can be passed (see accum_counters.yaml) instead
of defining them in counter_defs.yaml.

* Unify all file handling with pathlib (#512)

* Replace occurences of os.path functions with equivalent functions from
  pathlib library

* Remove unwanted imports of os.path and os

* Add coding guidelines for using pathlib instead of os.path

* Auto sync staging and mainline on a weekly cadence (#517)

Signed-off-by: coleramos425 <colramos@amd.com>

---------

Signed-off-by: Daniel Su <danielsu@amd.com>
Signed-off-by: xuchen-amd <xuchen@amd.com>
Signed-off-by: Peter Park <peter.park@amd.com>
Signed-off-by: Carrie Fallows <carrie.fallows@amd.com>
Signed-off-by: coleramos425 <colramos@amd.com>
Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Signed-off-by: benrichard-amd <ben.richard@amd.com>
Signed-off-by: Tim Gu <Tim.Gu@amd.com>
Co-authored-by: Daniel Su <danielsu@amd.com>
Co-authored-by: xuchen-amd <xuchen@amd.com>
Co-authored-by: Peter Park <peter.park@amd.com>
Co-authored-by: cfallows-amd <Carrie.Fallows@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Ben Richard <143630488+benrichard-amd@users.noreply.github.com>
Co-authored-by: Tim Gu <Tim.Gu@amd.com>
Co-authored-by: ywang103-amd <ywang103@amd.com>
Co-authored-by: benrichard-amd <ben.richard@amd.com>
Co-authored-by: YANG WANG <ywang@ywang-ubuntu.amd.com>
Co-authored-by: Fei Zheng <44449748+feizheng10@users.noreply.github.com>
Co-authored-by: fei.zheng <fei.zheng@amd.com>
Co-authored-by: vedithal-amd <Vignesh.Edithal@amd.com>

[ROCm/rocprofiler-compute commit: 272e5b6e32]
2025-01-02 15:29:47 -06:00
xuchen-amd 825440e7ba Rename Omniperf to ROCm Compute Profiler (#428)
- Update filenames.
- Update executable to `rocprof-compute` 
- Update update package to `rocprofiler-compute`
- Update name in application output and logs
- Update name in README files
- Update testing and workflows

---------

Signed-off-by: Xuan Chen <xuchen@amd.com>

[ROCm/rocprofiler-compute commit: 31b4de1a38]
2024-11-01 12:20:21 -04:00
Karl W Schulz f442662aa2 update analyze_commands tests to demarcate a subset of tests which
should be run serially; needed to avoid file output race conditions in
a common directory

Signed-off-by: Karl W Schulz <karl.schulz@amd.com>


[ROCm/rocprofiler-compute commit: b38dcfd2c9]
2024-03-14 13:39:10 -05:00
Jose Santos b1f08cc9d8 blocks -> block
Signed-off-by: Jose Santos <josantos@amd.com>


[ROCm/rocprofiler-compute commit: f9e806833b]
2024-03-08 16:27:19 -06:00
Jose Santos f02945c84d Typo: Change blocks to block
Signed-off-by: Jose Santos <josantos@amd.com>


[ROCm/rocprofiler-compute commit: 09f9b9e544]
2024-03-08 16:27:19 -06:00
Jose Santos 1eda63c1d2 Change ipblocks flag to --blocks
Signed-off-by: Jose Santos <josantos@amd.com>


[ROCm/rocprofiler-compute commit: 66696f852b]
2024-03-07 15:26:40 -06:00
Jose Santos 450a0ec12c updating Pytest mark and tests/workloads
Signed-off-by: Jose Santos <josantos@login1.hpcfund>


[ROCm/rocprofiler-compute commit: b018d9207f]
2024-02-23 10:00:23 -06:00
Karl W Schulz 9594d5ef7b adding to project markers
Signed-off-by: Karl W Schulz <karl.schulz@amd.com>


[ROCm/rocprofiler-compute commit: 18251da534]
2024-02-16 10:33:12 -06:00
Karl W Schulz 55344d586c define pytest marker groups
Signed-off-by: Karl W Schulz <karl.schulz@amd.com>


[ROCm/rocprofiler-compute commit: ca6d08d652]
2024-01-04 11:54:12 -06:00
Karl W Schulz 37382bf95f Introduce a cmake option to enable python code coverage. It now
defaults to being disabled and can be enabled via a
-DENABLE_COVERAGE=ON option (#194).  Introduce CI on mi100 that
leverages code coverage and publishes results along with a testing
report.

Signed-off-by: Karl W Schulz <karl.schulz@amd.com>


[ROCm/rocprofiler-compute commit: a39514f02f]
2023-10-31 09:51:36 -04:00
colramos425 c58dfc3ccc Merge branch '30-multi-normalization' into dev
Signed-off-by: colramos425 <colramos@amd.com>


[ROCm/rocprofiler-compute commit: f9f8b749c8]
2022-12-02 11:56:56 -06:00
colramos-amd f0cf74353e Initial commit
[ROCm/rocprofiler-compute commit: 62d130b458]
2022-11-04 14:49:36 -05:00