Commit Graph

555 Commits

Author SHA1 Message Date
xuchen-amd c48e6e31cf Add the ability to determine GPU model from Chip ID (#423)
* Add the ability to determine GPU model from Chip ID for distinguishing MI300 systems by using a built-in dictionary.

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Add support for MI300X_A1

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Remove MI308X identification using num CUs, and format Python using black.

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Add Read the Docs

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Add sphinx requirement

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Remove gpu_model identification using gpu_arch

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Remove OMNIPERF_ARCH_OVERRIDE and its usage. Determining MI300 gpu model solely based on chip id.

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Fix Python formatting using black.

Signed-off-by: xuchen-amd <xuchen@amd.com>

---------

Signed-off-by: xuchen-amd <xuchen@amd.com>
2024-09-25 17:21:40 +00:00
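
Note: the chip-ID lookup described in the commit above can be pictured roughly as follows. This is a minimal sketch, not the code from the commit. The dictionary name, the function name, and every chip ID value are hypothetical placeholders; only the idea of a built-in dictionary keyed by chip ID comes from the commit message.

```python
from typing import Optional

# Hypothetical mapping from chip ID to GPU model name. The IDs below are
# placeholders for illustration only, not real device IDs.
CHIP_ID_TO_MODEL = {
    0x0001: "MI300A",
    0x0002: "MI300X",
    0x0003: "MI300X_A1",
    0x0004: "MI308X",
}


def gpu_model_from_chip_id(chip_id: int) -> Optional[str]:
    """Return the GPU model for a chip ID, or None if the ID is unknown."""
    return CHIP_ID_TO_MODEL.get(chip_id)
```

Returning None for an unknown ID leaves the fallback decision to the caller, which is one possible design and not necessarily the one the commit uses.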
coleramos425 bc4d386683 Fix typo and CHANGELOG modification
After meeting with the DevOps team, I've added the Unreleased keyword to new CHANGELOG section per their request

Signed-off-by: coleramos425 <colramos@amd.com>
2024-09-25 17:21:40 +00:00
David Galiffi 75a4b51d0d Check Python version on application launch (#393)
* Check that the minimum required Python (3.8) version is used.

Prints a descriptive error message, rather than a cryptic import
failure, if minimum Python version is not met.

Internal ticket SWDEV-477233.


* Disable the RPM mangling of shebangs.

The RPM build is changing `#!/usr/bin/python3` to `#!/usr/libexec/platform-python`.
With this set, omniperf is always using the platform installed version
of python, which is python 3.6 on RHEL 8. Using virtual environments,
like conda, did not work.

* Fix pylint issues

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
2024-09-25 17:21:39 +00:00
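
Note: a launch-time version check of the kind this commit describes might look like the sketch below. The function and constant names are hypothetical; only the minimum version (3.8) comes from the commit message. The sketch avoids newer syntax so that the check itself can still run on an older interpreter.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version named in the commit message


def check_python_version(current=None, minimum=MIN_PYTHON):
    """Return an error message if `current` is older than `minimum`, else None."""
    if current is None:
        current = sys.version_info[:2]
    current = tuple(current)[:2]
    if current < minimum:
        # Tuple concatenation supplies the four integers for the format string.
        return (
            "Python %d.%d or newer is required, but %d.%d is in use."
            % (minimum + current)
        )
    return None


if __name__ == "__main__":
    message = check_python_version()
    if message is not None:
        # Exit with a descriptive message instead of a later import failure.
        sys.exit(message)
```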
xuchen-amd 6cea66edaf Enable rocprofv1 for additional socs (#391)
* Enable rocprofv1 for gfx940/941/942.

Signed-off-by: xuchen-amd <xuchen@amd.com>

* Enable rocprofv1 for gfx940/941/942.

Enable rocprofv1 for gfx940/941/942, removing rocscope.
Jira Tracking: https://ontrack-internal.amd.com/browse/SWDEV-474924

Signed-off-by: xuchen-amd <xuchen@amd.com>

---------

Signed-off-by: xuchen-amd <xuchen@amd.com>
2024-08-09 09:46:42 -04:00
Peter Park a0dc485ceb Docs: refactor and integrate into ROCm docs portal (#362)
* pip-compile docs/requirements.txt

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Add Sphinx docs config

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Add Sphinx config

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Update docs build config

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* style(conf.py): Apply black formatting to docs/conf.py

Signed-off-by: Sam Wu <22262939+samjwu@users.noreply.github.com>

* Update docs requirements

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Update to rocm-docs-core 1.3.0

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Update docs requirements

Signed-off-by: Peter Jun Park <peter.park@amd.com>

pip-compile requirements

Signed-off-by: Peter Jun Park <peter.park@amd.com>

bump rocm-docs-core to 1.5.0

bump rocm-docs-core to 1.4.1

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* Add dependabot.yml and update CODEOWNERS

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Update toc and conf

Signed-off-by: Peter Jun Park <peter.park@amd.com>

update dependabot

* Port docs to rocm-docs standard

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Add toc and Diataxis cards

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Add basic file structure

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add glossary

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add includes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Add license.rst

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add compatible hw

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fix spelling and license

Signed-off-by: Peter Jun Park <peter.park@amd.com>

clean up index

Signed-off-by: Peter Jun Park <peter.park@amd.com>

clean up installation guides

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add basic usage (quickstart)

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add ref to global options

update toc

Signed-off-by: Peter Jun Park <peter.park@amd.com>

modularize modes and global options

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add profile mode

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fixes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

reorg and clean up

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add dynamic omniperf version number in installation guide

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add datatemplate

more reorg

Signed-off-by: Peter Jun Park <peter.park@amd.com>

clean up

Signed-off-by: Peter Jun Park <peter.park@amd.com>

reorg images

move profile mode

reorg

reorg

reorg more

fix formatting

fix headings

ref anchor mi2xx note

add extlinks

add extlinks

Signed-off-by: Peter Jun Park <peter.park@amd.com>

black format

fix formatting, anchors

Signed-off-by: Peter Jun Park <peter.park@amd.com>

reorg

fix words and formatting

Signed-off-by: Peter Jun Park <peter.park@amd.com>

formatting

Signed-off-by: Peter Jun Park <peter.park@amd.com>

same

reorg

format

fix formatting

fix toc

Signed-off-by: Peter Jun Park <peter.park@amd.com>

format

* impr internal linking and fix sphinx warnings

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* add spellcheck/linting from rocm-docs-core

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fix rst directives

satisfy spellcheck

fix more spelling

rm unused files

fix spelling and update wordlist

* bump rocm-docs-core to 1.6.0

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* add fixes from @skyreflectedinmirrors and @lpaoletti

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add references to toc

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add more fixes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* add package manager install section

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* add fixes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add metadata and fixes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add fixes

bump to 1.6.1

more fixes

fix fmt in profiling examples

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add missing mem type table

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fix formatting

fmt

* add custom css

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fix css fs

* make images/figs click-to-expand

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add missed image

update

fix link

* update documentation link in README

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* formatting fixes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

more formatting

* fix heading

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* move archived docs

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* exclude archived docs from docs build

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* update archived docs workflow

Signed-off-by: Peter Jun Park <peter.park@amd.com>

move files

update archived docs workflow

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fix version number

clean up workflow

workflow test

workflow test

another workflow test

* rm docs linting

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* Apply cmake-format suggested changes

Signed-off-by: Sam Wu <22262939+samjwu@users.noreply.github.com>

* Apply cmake-format

Signed-off-by: Sam Wu <22262939+samjwu@users.noreply.github.com>

---------

Signed-off-by: Peter Jun Park <peter.park@amd.com>
Signed-off-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
2024-08-09 09:46:42 -04:00
benrichard-amd 96803e327c Fix code formatting
Signed-off-by: benrichard-amd <ben.richard@amd.com>
2024-08-09 09:46:42 -04:00
benrichard-amd c5ea1d0ff0 Set correct number of TCC channels for gfx942
Ran into rocprof error:
ROCProfiler: fatal error: input metric'TCC_EA0_RDREQ[16]' not supported on this hardware: gfx942

gfx942 has 16 channels, not 32.

Signed-off-by: benrichard-amd <ben.richard@amd.com>
2024-08-09 09:46:42 -04:00
benrichard-amd 28900f88cb Remove unused method
Signed-off-by: benrichard-amd <ben.richard@amd.com>
2024-08-09 09:46:42 -04:00
benrichard-amd 4723ecb6c9 Update to work with rocprof v1
Signed-off-by: benrichard-amd <ben.richard@amd.com>
2024-08-09 09:46:42 -04:00
benrichard-amd d77d9f1719 Save accumulate counters to SQ_ files
Omniperf analyze expects the accumulate files to be in SQ_*.csv files.

Since these files also contain PMC counters (we are trying to
fit as many counters into each file as possible to minimize runs),
we need to include these SQ_*.csv files in pmc_perf.csv.

Signed-off-by: benrichard-amd <ben.richard@amd.com>
2024-08-09 09:46:42 -04:00
benrichard-amd 241a0949e1 Remove duplicate normal counters
Interleave TCC channel counters in output file, e.g. TCC_HIT[0] TCC_ATOMIC[0] ... TCC_HIT[1] TCC_ATOMIC[1]

Signed-off-by: benrichard-amd <ben.richard@amd.com>
2024-08-09 09:46:42 -04:00
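
Note: the interleaved ordering in the commit above (all counters for channel 0, then all counters for channel 1, and so on) can be sketched as below. The function name is hypothetical and this is not the code from the commit.

```python
def interleave_channel_counters(counters, num_channels):
    """Order per-channel counters channel by channel.

    For each channel index, emit every counter for that channel before
    moving on to the next channel.
    """
    return [
        f"{name}[{channel}]"
        for channel in range(num_channels)
        for name in counters
    ]


# Two counters across two channels, matching the ordering in the commit message.
ordered = interleave_channel_counters(["TCC_HIT", "TCC_ATOMIC"], 2)
# ordered == ["TCC_HIT[0]", "TCC_ATOMIC[0]", "TCC_HIT[1]", "TCC_ATOMIC[1]"]
```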
benrichard-amd a4fdee488b Interleave TCC channel counters
Signed-off-by: benrichard-amd <ben.richard@amd.com>
2024-08-09 09:46:42 -04:00
benrichard-amd c93fead779 Improve perfmon coalescing
Signed-off-by: benrichard-amd <ben.richard@amd.com>
2024-08-09 09:46:42 -04:00
coleramos425 013bf218d0 Comply to Python formatting
Signed-off-by: coleramos425 <colramos@amd.com>
2024-08-09 09:46:42 -04:00
coleramos425 3d5493a3ff Add detail to Omniperf logs
Signed-off-by: coleramos425 <colramos@amd.com>
2024-08-09 09:46:42 -04:00
coleramos425 19a229a9f2 Split rocprofv2 cmd args prior to subprocess call (#347)
Signed-off-by: coleramos425 <colramos@amd.com>
2024-08-09 09:46:42 -04:00
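
Note: the commit above does not show how the arguments are split. One common standard-library approach, shown here purely as an assumption, is `shlex.split`, which turns a shell-style string into the argument list that `subprocess.run` expects. The argument string below is a stand-in that runs the current Python interpreter, not a real profiler command.

```python
import shlex
import subprocess
import sys

# Hypothetical argument string; a real profiler invocation would differ.
arg_string = '-c "print(40 + 2)"'

# Split into one list element per argument, honoring the quotes.
args = shlex.split(arg_string)

# Pass a list, not a single string, so no shell parsing is involved.
result = subprocess.run(
    [sys.executable] + args, capture_output=True, text=True, check=True
)
```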
coleramos425 fa90752d0c Fix bug in abs diff calculation for analysis output
Signed-off-by: coleramos425 <colramos@amd.com>
2024-08-09 09:46:42 -04:00
Ben Richard 69e5c32d52 Work around crash when profiling multi-process/multi-GPU application (#376)
* Fix crash in multi-GPU scenario

Exclude -o option when invoking rocprof so that each rocprof process
writes to a different .csv file. Concatenate into a single .csv file
when finished.

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Only combine csv files when using rocprofv2

rocprofv1 does not have separate csv files

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Fix indices in combined CSV file

Use ignore_index flag to ensure there are no duplicate indices.

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Fix Dispatch_ID column and remove unnamed column

-Pandas was inserting an unnamed column (index column)
-Overwrite the Dispatch_ID column so that every row is unique, starting at 0
-Remove fixup_rocprofv2_dispatch_ids as no longer needed

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Fix code formatting

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Fix code formatting (for real this time)

Signed-off-by: benrichard-amd <ben.richard@amd.com>

---------

Signed-off-by: benrichard-amd <ben.richard@amd.com>
2024-08-09 09:46:42 -04:00
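
Note: the CSV-combining steps described in this commit (concatenate with ignore_index, overwrite Dispatch_ID, avoid the unnamed index column) can be sketched with pandas as below. The in-memory files and the Kernel_Name column are hypothetical stand-ins for the per-process rocprof output; this is an illustration, not the code from the commit.

```python
import io

import pandas as pd

# Two in-memory CSVs standing in for the per-process output files.
# Column names other than Dispatch_ID are hypothetical.
csv_a = io.StringIO("Dispatch_ID,Kernel_Name\n0,kernel_a\n1,kernel_b\n")
csv_b = io.StringIO("Dispatch_ID,Kernel_Name\n0,kernel_c\n1,kernel_d\n")

frames = [pd.read_csv(f) for f in (csv_a, csv_b)]

# ignore_index=True gives the combined frame a fresh 0..N-1 index instead of
# repeating each file's own index.
combined = pd.concat(frames, ignore_index=True)

# Overwrite Dispatch_ID so every row is unique, starting at 0.
combined["Dispatch_ID"] = list(range(len(combined)))

# index=False keeps pandas from writing the index as an extra unnamed column.
out = io.StringIO()
combined.to_csv(out, index=False)
```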
coleramos425 7046ea15bd Comply to formatting
Signed-off-by: coleramos425 <colramos@amd.com>
2024-06-03 13:47:45 -05:00
coleramos425 1d19ae9483 Detection of MI308X and hardcode mclk to address bug in rocm-smi
Signed-off-by: coleramos425 <colramos@amd.com>
2024-06-03 13:47:45 -05:00
coleramos425 295b344646 Re-implementing HBM stack / XCD info for incoming product sku
Co-authored-by: Nicholas Curtis <nicholas.curtis@amd.com>
Signed-off-by: coleramos425 <colramos@amd.com>
2024-06-03 13:47:45 -05:00
coleramos425 49371cacec Create dedicated subdirectory in perfmon configs for archs supporting roofline
Separate subdirs allow us to target different roofline counters for different archs (e.g. MI300 vs MI200)

Signed-off-by: coleramos425 <colramos@amd.com>
2024-05-31 16:09:58 -05:00
Karl W. Schulz 7a01f499d7 remove use of distutils package entirely to avoid future deprecation
issues

Signed-off-by: Karl W. Schulz <karl.schulz@amd.com>
2024-05-28 15:34:00 -05:00
Karl W. Schulz aca084f41c updated approach for runtime dependency check that does not use "pkg_resources"
which will reportedly be deprecated at some point in the future.

Signed-off-by: Karl W. Schulz <karl.schulz@amd.com>
2024-05-28 15:34:00 -05:00
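
Note: the commit above does not say which mechanism replaced "pkg_resources". One standard-library option, shown here as an assumption, is `importlib.metadata` (available since Python 3.8). The function name is hypothetical.

```python
from importlib import metadata


def find_missing_packages(required):
    """Return the names in `required` that are not installed."""
    missing = []
    for name in required:
        try:
            # Raises PackageNotFoundError when no such distribution is installed.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

A caller could print the returned names in a single friendly error message rather than failing on the first import.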
coleramos425 c6cfa9cc26 Wrap text displayed in 'Top Dispatch' table for neatness
Signed-off-by: coleramos425 <colramos@amd.com>
2024-05-28 15:34:00 -05:00
Nicholas Curtis 047d7771f3 Add fix for case where we pass a single 'nan' value to to_avg
This is triggered by doing e.g., analyze -p <whatever> -k <kernel> -n per_kernel -b 17 18
Manifests as e.g.:

```
  ERROR [analysis] 'float' object has no attribute 'empty'
```

because of:

https://github.com/ROCm/omniperf/blob/d1ee2ec8709b21f2e72536cc14dba8ac2f8621ab/src/utils/parser.py#L135

Instead, we first check whether numpy thinks the whole array is nan's, and bail early if so

Signed-off-by: Nicholas Curtis <nicurtis@amd.com>
2024-05-28 15:34:00 -05:00
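
Note: the early-exit check described in this commit can be sketched as below. The function is a hypothetical stand-in for to_avg, and the use of `np.nanmean` for the non-NaN case is an assumption; only the idea of asking numpy whether the whole input is NaN comes from the commit message.

```python
import numpy as np


def to_avg_sketch(values):
    """Average `values`, returning NaN early when every entry is NaN."""
    # np.isnan accepts both a bare float and an array, so a single NaN
    # passed in place of an array-like is handled without attribute errors.
    if np.all(np.isnan(values)):
        return np.nan
    return float(np.nanmean(values))
```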
Nicholas Curtis 1f584c1612 handle unspecified case
Signed-off-by: Nicholas Curtis <nicurtis@amd.com>
2024-05-28 15:34:00 -05:00
Nick Curtis 5579beeed5 fix formatting
Signed-off-by: Nick Curtis <nicholas.curtis@amd.com>
2024-05-28 15:34:00 -05:00
Nick Curtis 989dd3b7ae Add ability to override arch when name missing in rocminfo
Signed-off-by: Nick Curtis <nicholas.curtis@amd.com>
2024-05-28 15:34:00 -05:00
coleramos425 1f370c9fe7 Format CMake and Python
Signed-off-by: coleramos425 <colramos@amd.com>
2024-05-10 09:07:40 -06:00
coleramos425 dba868973b Add support for --quiet flag to roofline
Signed-off-by: coleramos425 <colramos@amd.com>
2024-05-10 09:07:40 -06:00
coleramos425 3ab51735b5 Add docs for --quiet mode and update README
Signed-off-by: coleramos425 <colramos@amd.com>
2024-05-10 09:07:40 -06:00
coleramos425 8bca70a6ef Update docs for new Grafana reorg
Signed-off-by: coleramos425 <colramos@amd.com>
2024-05-10 09:07:40 -06:00
coleramos425 519bcb9b3e Update from https://github.com/ROCm/mibench/commit/b704bd3ec439f8cbece6713852fcafc855c5b07e 2024-05-10 09:07:40 -06:00
Karl W. Schulz d1ee2ec870 Adding a top-level runtime python dependency checker. Goal is to
provide a kinder error message in the case where python dependencies
are not available locally. This is motivated for future execution by
users who are running from rocm-based binary packaging instead of using
normal cmake build system which would have verified the dependencies.

Signed-off-by: Karl W. Schulz <karl.schulz@amd.com>
2024-05-03 15:26:27 -05:00
coleramos425 7d34e80567 Replace deprecated roofline warning with logging helper function
Signed-off-by: coleramos425 <colramos@amd.com>
2024-04-25 18:43:20 +00:00
coleramos425 0fc620ce79 Add TCC_TOO_MANY_EA_WRREQS_STALL to gfx940 input configs (#349)
Signed-off-by: coleramos425 <colramos@amd.com>
2024-04-25 18:22:00 +00:00
Karl W Schulz b5011ff0ae additional mod needed to support roofline binaries potentially
executing from two different locations

Signed-off-by: Karl W Schulz <karl.schulz@amd.com>
2024-04-22 09:00:18 -05:00
Karl W Schulz 093a4511ee update logic to detect roofline binaries in two alternate paths
depending on whether user is running within local clone or from an
install.

Signed-off-by: Karl W Schulz <karl.schulz@amd.com>
2024-04-21 14:53:14 -05:00
Karl W Schulz 3c562588ff update logic to detect VERSION file to accommodate rocm packaging;
check two locations to cover case where user is running within local
git clone directly or alternatively, from package install.

Signed-off-by: Karl W Schulz <karl.schulz@amd.com>
2024-04-21 14:53:14 -05:00
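
Note: checking two candidate locations for a VERSION file, as the commit above describes, can be sketched as below. The function name and the directory layout in the demonstration are hypothetical; the real paths are not given in the commit message.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def find_version_file(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the VERSION file from the first candidate directory that has one."""
    for directory in candidates:
        version_file = Path(directory) / "VERSION"
        if version_file.is_file():
            return version_file
    return None


# Demonstration with throwaway directories standing in for a git clone
# (no VERSION file here) and a package install (VERSION file present).
with tempfile.TemporaryDirectory() as tmp:
    clone_dir = Path(tmp) / "clone"
    install_dir = Path(tmp) / "install"
    clone_dir.mkdir()
    install_dir.mkdir()
    (install_dir / "VERSION").write_text("0.0.0\n")  # placeholder version
    found = find_version_file([clone_dir, install_dir])
    found_in = found.parent.name if found is not None else None
```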
Karl W Schulz 65967658e9 fix execution error when OMNIPERF_COLOR env is set; update coloring to
support four modes:

(0) - no coloring and no loglevel delimiters
(1) - colored loglevel delimiters
(2) - non-colored loglevel delimiters
(3) - fully colored messages for all levels besides INFO

Signed-off-by: Karl W Schulz <karl.schulz@amd.com>
2024-04-04 14:44:51 -05:00
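
Note: the four OMNIPERF_COLOR modes listed in this commit could be read from the environment roughly as follows. The function name, the fallback behavior for missing or invalid values, and the caller-supplied default are assumptions; only the variable name and the mode descriptions come from the commit message.

```python
import os

# Mode descriptions as listed in the commit message.
COLOR_MODES = {
    0: "no coloring and no loglevel delimiters",
    1: "colored loglevel delimiters",
    2: "non-colored loglevel delimiters",
    3: "fully colored messages for all levels besides INFO",
}


def color_mode_from_env(environ, default):
    """Return the color mode from OMNIPERF_COLOR, or `default` if unusable."""
    raw = environ.get("OMNIPERF_COLOR")
    if raw is None:
        return default
    try:
        mode = int(raw)
    except ValueError:
        # Assumption: a non-integer value falls back to the default.
        return default
    return mode if mode in COLOR_MODES else default


# Typical call; the default of 1 is chosen here only for illustration.
current_mode = color_mode_from_env(os.environ, default=1)
```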
coleramos425 9403dce667 Define a README for /src subdir
Signed-off-by: coleramos425 <colramos@amd.com>
2024-04-01 14:30:21 -05:00
coleramos425 aac471c0fa Reorganizing docs runner and setting archive subdir for old docs
Signed-off-by: coleramos425 <colramos@amd.com>
2024-04-01 14:30:21 -05:00
coleramos425 9c93449cc7 Remove hardcoded URLs from docs in favor of relative links
Signed-off-by: coleramos425 <colramos@amd.com>
2024-04-01 14:30:21 -05:00
coleramos425 e0556f32ab Move dispatch id patch to proper util func in utils. Enable in rocprofv2 post-processing
Signed-off-by: coleramos425 <colramos@amd.com>
2024-04-01 14:30:21 -05:00
Nick Curtis a1017b68e9 implement rocprofv2 workaround for dispatch ids (#336)
* implement rocprofv2 workaround for dispatch ids

Signed-off-by: Nicholas Curtis <nicurtis@amd.com>

* formatting

Signed-off-by: Nicholas Curtis <nicurtis@amd.com>

---------

Signed-off-by: Nicholas Curtis <nicurtis@amd.com>
Co-authored-by: Nicholas Curtis <nicurtis@amd.com>
2024-04-01 14:30:21 -05:00
coleramos425 1a3bdad90a Adding documentation for global command line options
Signed-off-by: coleramos425 <colramos@amd.com>
2024-04-01 14:30:21 -05:00
colramos-amd 6cc8f0154f Restore OMNIPERF_COLOR global to disable default log coloring
Signed-off-by: colramos-amd <colramos@amd.com>
2024-04-01 14:30:21 -05:00
colramos-amd b1d0b3905c Extending log coloring to message text. Enable by default.
Signed-off-by: colramos-amd <colramos@amd.com>
2024-04-01 14:30:21 -05:00
colramos-amd 78c48eaed5 Remove superfluous logging statement
Signed-off-by: colramos-amd <colramos@amd.com>
2024-04-01 14:30:21 -05:00