Commit Graph

1146 Commits

Author SHA1 Message Date
vedithal-amd 27585a8a2b Support MI 350 profiling (#632)
* Add MI 350 hardware information

* Refactor MI GPU YAML file and corresponding interface

* Add SoC file for gfx950 architecture

* Add analysis report configs for MI 350 containing existing metrics

* Add placeholder None valued metrics for previous architectures to make
  baseline comparison work

* Enable testing on MI 350

* Analysis config metric changes
    - SPI changes
        - Update metric formula for default SPI pipe counter
            - Use efficiently collected pipe-wise SPI counters
        - Add SPI Wave Occupancy
        - Add Scheduler-Pipe Wave Utilization
        - Update formula for VGPR Writes
        - Add Scheduler-Pipe FIFO Full Rate
    - CPC changes
        - Add CPC SYNC FIFO Full Rate
        - Add CPC CANE Stall Rate
        - Add CPC ADC Utilization
    - SQ changes
        - Add VALU co-issue efficiency
        - Add F6F4 datatype metrics
        - Update formula for total FLOPs by adding F6F4 counters
        - Add LDS STORE / LOAD / ATOMIC metrics
        - Add LDS STORE / LOAD / ATOMIC bandwidth
        - Add LDS FIFO and TA ADDR / CMD / DATA FIFO full rates

* Collect TCP_TCP_LATENCY_sum only for gfx950 (MI 350)

* Do not inject SQ_ACCUM_PREV_HIRES unnecessarily

* Do not hardcode memory and shader clock speeds

* Write num_hbm_channels to sysinfo.csv instead of hbm_bw while profiling

* Move sysinfo.csv generation to the pre-processing step of profiling

* Add warnings to use --specs-correction for missing sysinfo.csv values during analysis phase

* Update CHANGELOG

* Analysis phase warning to use --specs-correction when needed

[ROCm/rocprofiler-compute commit: f9aa7be97c]
2025-04-03 02:21:18 -04:00
xuchen-amd 1273a5e2a9 Add mi350 ta td tcp tcc counters (#653)
* Add mi350 TA and TD metrics.

* Add mi350 TCC metrics, and separate write and atomic metrics.

* Add mi350 TCP metrics.

* Add none values for non-gfx950 socs, remove missing metrics in rocprofv3.

---------

Signed-off-by: xuchen-amd <xuchen@amd.com>

[ROCm/rocprofiler-compute commit: f3736778f4]
2025-04-02 21:25:47 -04:00
xuchen-amd 08e083cc25 Add mi300 TCP counter tests (#644)
* Add new sample applications.

* Generalize py test launcher for additional apps.

* Add TCP pytest, and add to ctest.

* Update licensing.

* Disable for non-mi300 machines.

[ROCm/rocprofiler-compute commit: 591632dd69]
2025-04-02 20:32:13 -04:00
xuchen-amd 35acf4c410 remove flask debug msg (#655)
* Suppress Flask warning message in quiet mode.

* Initialize args.gui if it does not exist.

[ROCm/rocprofiler-compute commit: c7202923b0]
2025-04-02 20:29:39 -04:00
xuchen-amd b21384ca60 Enable tuned performance counters for gfx950 (#652)
* Enable non-functional performance counters for gfx950.

* Update changelog.

* Add none value metrics for non-gfx950 socs

* Remove rocprofv3 missing metrics.

[ROCm/rocprofiler-compute commit: dce75f4afa]
2025-04-02 14:43:12 -04:00
raramakr 7bfc49e9f8 SWDEV-521636 - Add dependent script path to system path in rocprof-compute (#651)
In a wheel environment, rocprof-compute in the bin folder is not a soft link. To execute rocprof-compute from the bin folder, the system path must also include the dependency script paths; this change adds them.

[ROCm/rocprofiler-compute commit: df2296529b]
2025-04-02 09:41:02 -07:00
vedithal-amd ab290f250d Weekly rebase liangdin-test on top of amd-mainline (#650)
[ROCm/rocprofiler-compute commit: a7ebbbd41e]
2025-04-01 14:18:29 -04:00
xuchen-amd abc1c336f6 Improve chip id logic (#648)
* Improve chip id logic, add missing physical and virtual chip ids.

[ROCm/rocprofiler-compute commit: e77dd1a1ab]
2025-04-01 12:18:07 -04:00
ywang103-amd 6e1cab4e03 rewrite function that detects whether v1 is in use, to avoid a false negative result when ROCPROF is not set (#647)
[ROCm/rocprofiler-compute commit: 7b38766caa]
2025-03-31 16:40:53 -04:00
Fei Zheng ee5df82698 Support host-trap PC Sampling on CLI (beta version)
[ROCm/rocprofiler-compute commit: 9bacad0876]
2025-03-28 16:51:49 -06:00
Ben Richard b0844b42bb Read Accum_VGPR_Count from rocprof output if provided (#645)
[ROCm/rocprofiler-compute commit: 9bd45f5135]
2025-03-28 10:43:24 -04:00
ywang103-amd ad070d94db fix the wrong number of TCC counter channels written to the pmc txt file (#633)
[ROCm/rocprofiler-compute commit: 7c1f14123a]
2025-03-27 18:15:41 -04:00
ywang103-amd 79a333231c fix IP block test by changing the way the agent ID is extracted (#639)
[ROCm/rocprofiler-compute commit: cdb93b7a4c]
2025-03-27 16:28:00 -04:00
vedithal-amd 04dbdc5c5d Inject SQ_ACCUM_PREV_HIRES for LEVEL counters only (#641)
[ROCm/rocprofiler-compute commit: af76525baa]
2025-03-27 10:24:21 -04:00
cfallows-amd c615c12209 Add datatypes for roofline profiling (#642)
Rebuild of rocm-amdgpu-bench roofline binaries for MI200/MI300 systems with rocm6.
Added datatype options to roofline feature.

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

[ROCm/rocprofiler-compute commit: 6cb5bcdbe9]
2025-03-26 21:07:48 -04:00
Cole Ramos 088fa6a4ac Fix incorrect logging in mi_gpu_spec.py (#626)
* Move console logging to logger function to avoid circular dependency in utils module

Signed-off-by: coleramos425 <colramos@amd.com>

* Apply python formatting

Signed-off-by: coleramos425 <colramos@amd.com>

* Remove the default StreamHandler before adding the custom handler

  If this default handler is not explicitly removed, it can cause duplicate output.

Signed-off-by: coleramos425 <colramos@amd.com>

* Fix lingering bugs from merge conflict resolution

Signed-off-by: coleramos425 <colramos@amd.com>

* Comply with Python formatting and update pre-commit hook helper

Signed-off-by: coleramos425 <colramos@amd.com>

* Remove redundant console_log call at the get_mi300_num_xcds() call; otherwise ALL MI200 profiling runs will print this message

Signed-off-by: coleramos425 <colramos@amd.com>

---------

Signed-off-by: coleramos425 <colramos@amd.com>

[ROCm/rocprofiler-compute commit: 04f92b72a9]
2025-03-25 17:06:37 -05:00
xuchen-amd a851c977c7 Improve readability. (#628)
[ROCm/rocprofiler-compute commit: 3294c495f5]
2025-03-25 17:49:42 -04:00
Cole Ramos 796241206d Generalize locale checker to support more UTF-8 types (#623)
Signed-off-by: coleramos425 <colramos@amd.com>

[ROCm/rocprofiler-compute commit: 38c7dce84a]
2025-03-25 16:39:02 -05:00
ywang103-amd d8c291a29d fix the crash related to agent id in rocprofv3 (#631)
[ROCm/rocprofiler-compute commit: 983f902fa0]
2025-03-25 16:33:12 -04:00
ywang103-amd 7e94296408 disable TCC flatten for rocprofv1 to avoid a crash caused by its unsupported implementation (#629)
[ROCm/rocprofiler-compute commit: a92bf96e56]
2025-03-25 15:12:19 -04:00
cfallows-amd 5079a1803f Datatype selection option for roofline (#624)
Added command line option to specify which datatype(s) to capture into the roofline PDF(s).
All applicable datatypes are still collected by the roofline call, but only the selected datatypes are plotted into the PDF outputs. Selected datatypes are plotted together in one graph, with FP and Int split into two graphs if needed. A datatype that is not valid on a particular GPU architecture is skipped with an error message.
The default is FP32.

Reworked roofline calls and plotting to be general enough such that any new datatypes added into rocm-amdgpu-bench can easily be reflected in rocprof-compute with simple modifications in roofline_calc.py.

Adjusted ctest to reflect expected default pdf outputs from roofline.

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

[ROCm/rocprofiler-compute commit: a492e92034]
2025-03-25 15:02:09 -04:00
vedithal-amd 67b20baf11 Remove static pmc files (#606)
* Delete static pmc files

* Counter parsing changes
	- Move counter parsing logic to another function
	- Fix counter parsing regex
	- Log list of counters being collected

* Sanity check counters supported by rocprof
	- Emit a warning instead of an error, since rocprof's list of supported
	  counters might be inaccurate

* Do not collect these counters
	- TCP_TCP_LATENCY_sum (except for gfx908 and gfx90a)
	- SQC_DCACHE_INFLIGHT_LEVEL

* Update logic of writing TCC channel counter definition yaml file

* Fix bug in capture_subprocess_output() utility function
	- Make logging optional in capture_subprocess_output()

* Fix formatting and tests

* Update changelog

[ROCm/rocprofiler-compute commit: 58cf702d40]
2025-03-24 17:43:29 -04:00
ywang103-amd c52fa4ef5f fix the code that caused _sum counters to be missing for TCC (#627)
[ROCm/rocprofiler-compute commit: 0c0906f238]
2025-03-24 11:16:03 -04:00
dependabot[bot] bfce7de3ad Bump rocm-docs-core from 1.17.0 to 1.18.1 in /docs/sphinx (#611)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.17.0 to 1.18.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.17.0...v1.18.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rocprofiler-compute commit: ebb21c4bba]
2025-03-21 17:16:51 -06:00
cfallows-amd a2b7bf38dd Add Alibaba Cloud Linux 3 to distro checking for roofline (#620)
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

[ROCm/rocprofiler-compute commit: fbe5e34a46]
2025-03-21 12:46:17 -04:00
xuchen-amd c0676b3489 Improve chip info logic. (#581)
* Clean up unused functions.

* Fix number of XCDs for MI300X CPX (core partition).

* Add support for memory partition mode.

* Modify total_xcd to adapt to all gpu models.

* Run black and isort.

* Make gpu_arch regex more generic.

* Add error checking for compute partition mode num xcds.

* Set gpu_chip_id as optional.

* Fix get_gpu_model.

---------

Signed-off-by: xuchen-amd <xuchen@amd.com>

[ROCm/rocprofiler-compute commit: 2e7f82aa13]
2025-03-21 02:02:58 -04:00
ywang103-amd 5162744be0 Revert "fix the error of output path of multi-node mode (#616)" (#621)
[ROCm/rocprofiler-compute commit: b596098d14]
2025-03-20 22:03:02 -04:00
ywang103-amd b812f54a9b TCC per channel metric's fix for rocprofv3 (#597)
[ROCm/rocprofiler-compute commit: 46e15dc840]
2025-03-20 20:09:27 -04:00
vedithal-amd 77ffa73e5b Update default PR reviewers (#617)
[ROCm/rocprofiler-compute commit: 40ad99eae1]
2025-03-19 16:00:28 -04:00
ywang103-amd 1d2b1fc707 fix the error of output path of multi-node mode (#616)
* fix the error that caused the name passed with -n to be ignored in multi-node applications

* isort and black formatted

[ROCm/rocprofiler-compute commit: 23b42e90c9]
2025-03-18 17:19:19 -04:00
vedithal-amd ec4359658b Fix kernel filtering when using rocprofv3 (#615)
When using rocprof v3:
* Use --kernel-include-regex for kernel name filtering
* Use --kernel-iteration-range for kernel dispatch filtering

Update changelog

[ROCm/rocprofiler-compute commit: 45b8937d5d]
2025-03-18 11:26:45 -04:00
vedithal-amd f775c7cdbd Band aid fix for MI 100 no counters collected (#614)
* rocprofv3 might not collect any counters for MI 100; handle this case gracefully to prevent test failures

[ROCm/rocprofiler-compute commit: 64ccd588de]
2025-03-18 11:26:17 -04:00
Ben Richard d43675860b Remove dependency on en_US.UTF-8 locale (#613)
Set locale to C.utf8 instead of en_US.UTF-8

Avoid forcing the user to use en_US.UTF-8. Most Linux systems have C.utf8.

[ROCm/rocprofiler-compute commit: 96a25e8cbc]
2025-03-17 16:40:36 -04:00
ywang103-amd 83edd97f78 replace rocm-smi with amd-smi cmd (#612)
[ROCm/rocprofiler-compute commit: 0c6cec5671]
2025-03-17 16:20:41 -04:00
cfallows-amd eba173de5e Debug logging during intensities calculations when no flops recorded (#608)
Added a debug log for when no FLOPs are recorded (total_flops is 0), in which case AI points are not plotted.
Removed a commented-out print statement that was not functional (it contained a nonexistent method call).

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

[ROCm/rocprofiler-compute commit: 1c237c1382]
2025-03-14 12:37:19 -04:00
vedithal-amd 30a3751a42 Fix tests on MI 100 (#609)
[ROCm/rocprofiler-compute commit: 6827330135]
2025-03-13 12:07:03 -04:00
vedithal-amd e4fce70067 selective counter bugfix (#602)
Allow block filter of the form xx.x

[ROCm/rocprofiler-compute commit: 30752d1547]
2025-03-11 13:34:48 -04:00
vedithal-amd 0f4b5e91bd Standalone binary no self execute fix (#603)
* Fix nuitka command

[ROCm/rocprofiler-compute commit: 15edbf475e]
2025-03-11 13:34:37 -04:00
cfallows-amd 1eb6fef1b3 Add fp8 graph to standalone gui in analysis mode (#600)
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

[ROCm/rocprofiler-compute commit: 097d30dc5c]
2025-03-11 13:32:37 -04:00
Fei Zheng 1ce11882a6 Fix counter collection inconsistency with rocprofv3
[ROCm/rocprofiler-compute commit: 7e8d2d2c0e]
2025-03-10 21:05:40 -06:00
vedithal-amd 2167d9d295 Update sphinx docs version in develop (#601)
* Update sphinx docs version in develop instead of amd-staging branch
* Add repo admins to pr review

[ROCm/rocprofiler-compute commit: 51c9c6fad3]
2025-03-10 18:01:37 -04:00
cfallows-amd 8109228f02 Force kaleido version to be no greater than 0.2.1 (#599)
Higher versions (e.g., 0.4.1) have external dependencies that cause errors and force early exits without creating roofline plots.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

[ROCm/rocprofiler-compute commit: fd288e6d13]
2025-03-10 15:16:14 -04:00
vedithal-amd 7968645191 Analysis report block based filtering for profiling (#566)
* Analysis report block based filtering for profiling

* Profiling mode changes

- `-b` option now additionally accepts metric id(s), similar to `-b` option in analyze mode (e.g. 6, 6.2, 6.23)
    - Only counters mentioned in the selected analysis report blocks will be collected
        - Add parsing logic to identify hardware counters from analysis report blocks
        - Add filtering logic to only write filtered counters in perfmon files
        - Log uncollected counters on one line
- `--list-metrics` option added in profile mode to list possible metric id(s) similar to analyze mode
- Write arguments provided during profiling in profiling_configuration.yaml file

* Analysis mode changes

- During analysis mode, only show report blocks selected during profiling
    - If `-b` option is provided in analysis mode, then follow provided filters
- Do not show empty tables in analysis report

* Miscellaneous changes

- Update CHANGELOG
- Add test cases
    - Instruction mix report block filter
    - Instruction mix and Memory chart report block filter
    - Instruction mix report block filter and CPC hardware block filter
    - TA hardware block filter
    - --list-metrics in profile mode should work
- Move binary handler fixtures to conftest.py to avoid importing
  fixtures
- cmake file in tests directory has been updated to compile sample/vmem.hip for testing

* Public documentation changes

- Use the term "Hardware report block" instead of "Hardware block"
- Add documentation for "--list-metrics" option in profile mode
- Add example of filtering by hardware report block such as instruction
  mix and wavefront launch statistics
- Add deprecation warning for hardware component (sq, tcc) based filtering

[ROCm/rocprofiler-compute commit: 55cf0e237e]
2025-03-10 14:42:56 -04:00
Peter Park 3335ea5f21 Fix name in package manager install docs (#593)
* Fix post analysis gui in standalone binary (#591)

* Fix post analysis gui in standalone binary

* Add post analysis gui assets and required server libraries for GUI
  server and web page

* Add port forwarding to docker test compose

* Update README to use `docker compose up` instead of `docker compose run`
  to run containers with port forwarding and to leverage other
  functionalities of docker compose

* Fix rocprofv1 output processing. (#588)

* fix rocprof-compute binary name in package manager install docs

---------

Co-authored-by: vedithal-amd <Vignesh.Edithal@amd.com>
Co-authored-by: xuchen-amd <xuchen@amd.com>

[ROCm/rocprofiler-compute commit: 0aefd15b7b]
2025-03-10 13:47:30 -04:00
cfallows-amd 54ec17a185 FP8 roofline support (#592)
Adding FP8 datatype to roofline feature in rocprof-compute on MI300-based systems.
FP8 now appears in the terminal output and roofline CSV, and is also output as a standalone PDF.

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>

[ROCm/rocprofiler-compute commit: 848fa1dc18]
2025-03-07 11:27:01 -05:00
Fei Zheng 5793abe46e fix specs-correction
[ROCm/rocprofiler-compute commit: f64c83fc5e]
2025-03-07 11:27:01 -05:00
xuchen-amd 66d538c2e0 Disable --kokkos-trace in rocprof-compute (#594)
[ROCm/rocprofiler-compute commit: 1b6bc89137]
2025-03-07 11:27:01 -05:00
xuchen-amd 663a41075b Fix rocprofv1 output processing. (#588)
[ROCm/rocprofiler-compute commit: b81310070e]
2025-03-07 11:27:01 -05:00
vedithal-amd 536fb5ea26 Fix post analysis gui in standalone binary (#591)
* Fix post analysis gui in standalone binary

* Add post analysis gui assets and required server libraries for GUI
  server and web page

* Add port forwarding to docker test compose

* Update README to use `docker compose up` instead of `docker compose run`
  to run containers with port forwarding and to leverage other
  functionalities of docker compose

[ROCm/rocprofiler-compute commit: 0b3114fa88]
2025-03-07 11:27:01 -05:00
dependabot[bot] 254592953c Bump rocm-docs-core from 1.15.0 to 1.17.0 in /docs/sphinx (#572)
Bumps [rocm-docs-core](https://github.com/ROCm/rocm-docs-core) from 1.15.0 to 1.17.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/ROCm/rocm-docs-core/compare/v1.15.0...v1.17.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

[ROCm/rocprofiler-compute commit: 3b78dc9177]
2025-02-28 16:34:11 -07:00