Commit Graph

660 Commits

Author SHA1 Message Date
vedithal-amd 7c0ffd14a9 Print counter list in DEBUG level (#758) 2025-06-16 16:53:00 -04:00
ywang103-amd b8dd6d049d format the code after mem chart's fix of test (#753) 2025-06-13 12:03:42 -04:00
ywang103-amd bb0c417871 fix broken test with --cols options (#750) 2025-06-12 19:45:24 -04:00
Vignesh Edithal 6054b3b7fd Bugfix for PR #744 2025-06-12 16:09:15 -04:00
vedithal-amd d27ee69b52 Change default rocprof to rocprofv3 (#748)
* Revert of https://github.com/ROCm/rocprofiler-compute/pull/738

* Change default rocprof backend interface to rocprofv3

* Add MI 350 support in documentation

* Added known issue that MI 100 profiling will not work unless rocprofv1
  is explicitly opted in

* Remove MI 50 soc gfx python class since MI 50 is not supported
2025-06-12 15:45:11 -04:00
vedithal-amd cdd41dee40 Remove rocscope related code and add deprecation warning for mongo db usecase (#744)
* Remove rocscope related code

* Add deprecation warning for database update mode which is used for grafana and mongodb functionality
2025-06-12 14:21:24 -04:00
David Galiffi 1cd989a110 Copyright Header Compliance (#745)
- for SWDEV-537492
2025-06-12 12:02:58 -04:00
jamessiddeley-amd f004aeebe9 fixed long kernel names cut off in --kernel-names option (#728)
* reformatted kernel roofline PDF to use table

* restored kernel symbol icons

* enhance code readability

* restored cell text wrap

---------

Signed-off-by: jamessiddeley-amd <James.Siddeley@amd.com>
2025-06-12 10:23:40 -04:00
cfallows-amd 0415bb9740 Add roofline cli_generate_plot method (#737)
Add option to print out roofline plot in terminal using plotext.
Takes in one datatype and returns the str from plot.build() which contains the visual plot of roofline analysis for said datatype.

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2025-06-11 15:38:21 -04:00
cfallows-amd 24d3e7eecd Update roofline binaries (#741)
Update roofline binaries from rocm-amdgpu-bench
- uses hip to find number of CUs dynamically instead of hardcoded values in table

Remove duplicate AI plot points printing
- only print ai points once on plot since we are measuring using total flops and value is same
- remove datatype from legend labels

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2025-06-10 15:43:56 -04:00
cfallows-amd ce3ef1400e Fix load_kernel_top arg for GUI analyze mode (#740)
--gui option for analyze mode failing due to missing arg in load_kernel_top call in pre_processing

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2025-06-10 11:27:03 -04:00
vedithal-amd a1ef3425c6 Revert default interface to rocprof v1 (#738)
* Add deprecation warning for rocprof v1 / v2 / v3 interfaces to use
  rocprofiler-sdk interface
2025-06-09 16:39:11 -04:00
Fei Zheng 96aa04fb13 TUI improvement (#732) 2025-06-09 11:29:10 -06:00
vedithal-amd 721053bd03 Bugfix for rocprofiler sdk interface not working in MI 200 (#733) 2025-06-09 12:33:25 -04:00
Fei Zheng e5b31af2a4 CLI: enable mem_chart for single run (#643) 2025-06-06 16:15:56 -06:00
Fei Zheng d756aeb3fd Support stochastic pc sampling 2025-06-06 12:43:52 -06:00
xuchen-amd ca0cdaf948 Introduce rocprof-compute TUI (Text User Interface) (#682)
* rocprof-compute TUI (Text User Interface) - providing users interactive analyze experience with visuals.

* Analyze results with tables, charts, plots.

* Add menu bar, terminal, directory dialog. Improve logging and ui.

* Add display config file to manipulate result categorization.

* Add support for recently opened dirs.

* Update licensing and version.
2025-06-04 17:06:08 -04:00
Fei Zheng ab6665d317 Fix peak flops of F8 I8 F16 and BF16 on MI300 2025-06-04 12:51:46 -06:00
ywang103-amd e5c7d4795a Tcc new format input yaml (#723) 2025-06-04 12:24:57 -04:00
xuchen-amd f0fad19e8b Add chip specs (#681)
* Add perfmon config spec, enhance memory partition info.

* Add gfx950 perfmon config.

* Add High Freq variants in gfx942.

* Add backup detection methods for gpu model.

* Improve get_num_xcds logic by adding detection of 1to1 arch-to-compute_partition logic.

* Add default compute partition settings spx:8 for when gpu_model=None.

* Update gpu spec tests.

* Add backup compute partition detection.

---------

Signed-off-by: xuchen-amd <xuchen@amd.com>
2025-05-29 16:35:34 -04:00
Ben Richard 45296ceb46 Upgrade to Dash 3.0 (#719) 2025-05-29 14:36:03 -04:00
cfallows-amd bbe2e17b80 Rename roofline bins (#717)
Rename roofline bins, remove rocm version in naming. Change method for binary search.

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2025-05-28 14:55:51 -04:00
cfallows-amd cb2d928ecf Add F4 F6 to roofline for MI350 series (#709)
Add roofline bins with FP4 FP6 datatypes enabled for gfx950 arch

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2025-05-26 18:36:31 -04:00
jamessiddeley-amd 09b6ef4508 Fixed duplicate keys in analysis_configs yamls (#707)
* fixed duplicate keys in analysis_configs yamls

* Fix: removed TODO comment

Signed-off-by: James Siddeley <james.siddeley@amd.com>

---------

Signed-off-by: James Siddeley <james.siddeley@amd.com>
2025-05-20 13:12:46 -04:00
Ben Richard 41dd4aab90 Update illegal character check for profile name (#703) 2025-05-16 15:45:16 -04:00
cfallows-amd 43dbf38b27 Check mode during soc init for roofline (#705)
Check mode before creating roofline object; skip if only printing specs

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2025-05-16 12:54:53 -04:00
vedithal-amd 5cb86e31fc Implement interface to rocprofiler sdk (#695)
* Setting ROCPROF=rocprofiler-sdk environment variable will use rocprofiler-sdk C++ library instead of rocprofv3 python script

* Add runtime option --rocprofiler-sdk-library-path to use custom version of rocprofiler sdk library
    * Add --rocprofiler-sdk-library-path conftest option for tests

* Setup appropriate environment variables to inject rocprofiler sdk code to user command
    * Add env. vars. for counter collection and filtering
    * Add env. vars. for pc sampling

* Use python bindings to list counters supported by rocprofiler sdk
2025-05-13 10:48:21 -04:00
cfallows-amd d527d77337 Fix setting roofline-data-type option in both profile and analyze modes (#702)
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2025-05-12 23:02:47 -04:00
vedithal-amd abd500593b Fix PC sampling analysis config issue (#697) 2025-05-06 18:22:15 -04:00
Ben Richard 35493f440c Avoid crash when profiling data not generated (#694)
* Avoid crash when profiling data not generated

-Handle case where program has no kernel launches
-Improve error messages
-Avoid roofline when profiling data is missing

Signed-off-by: benrichard-amd <ben.richard@amd.com>

* Update other soc_gfx files to catch missing pmc_perf.csv

* Fix formatting

* Fix incorrectly ordered imports

---------

Signed-off-by: benrichard-amd <ben.richard@amd.com>
2025-05-05 16:09:48 -04:00
cfallows-amd 41e73650d5 Enable roofline for MI350 series (#677)
Rework of roofline binaries generated from rocm-amdgpu-bench
- removed arch identifier in bin name
- removed rocm5 bins altogether

Updated required distros for roofline
- updated distro checks and bin naming
- moved up ubuntu20.04->22.04 and sles15.3->15.6 per rocm support

Enabled ctests for mi350 for test_roof_*
- removed mi350 series check to skip these specific tests

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2025-04-28 16:08:23 -04:00
Daniel Su b77fcf575e Set rocprofV3 agent-index to absolute (#675)
Signed-off-by: Daniel Su <danielsu@amd.com>
2025-04-28 15:38:07 -04:00
xuchen-amd 85bfa73e2c Add test for gfx942 number of xcds. (#674)
* Add test for gfx942 number of xcds.

* Improve the structure of mi gpu specs, add num_xcds_spec_class test.

* Add to ctest.

---------

Signed-off-by: xuchen-amd <xuchen@amd.com>
2025-04-28 11:29:14 -04:00
xuchen-amd ee73c2a119 process hip trace output. (#654)
Signed-off-by: xuchen-amd <xuchen@amd.com>
2025-04-22 18:31:47 -04:00
xuchen-amd f145f89e30 Patch in new rocprofv3 metrics. (#679) 2025-04-22 18:30:26 -04:00
ywang103-amd 3e09f038e5 change default rocprof version to v3 when not setting env variable (#673) 2025-04-16 12:38:20 -04:00
cfallows-amd c056a39db4 Add roofline support for rhel10 (#667)
-add check for rhel10 (platform:el10), force use rhel roof binary
-update changelog in 'unreleased- added' section

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2025-04-11 17:45:53 -04:00
cfallows-amd 03732d3719 Fix rpath checks during RPM generation on RHEL10 (#669)
Invalid rpath on roofline binaries reported during build testing for new RHEL10 addition, removed rpaths to prevent rpath check failures.

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2025-04-11 17:40:59 -04:00
xuchen-amd e7a7af539a Add mi325 specs. (#663) 2025-04-07 17:03:40 -04:00
cfallows-amd c45e20f325 Fixes for roofline datatype plot outputs (#659)
Profile mode:
Fix roofline plots for datatypes that have peakVALU only. Check for highest roofline to plot the bandwidth lines to proper height, don't rely on existence of peakMFMA for every datatype.
Analyze mode:
Add roofline-data-type option for viewing pdfs in standalone gui. Default is same as profile mode, FP32.

---------

Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
2025-04-07 12:10:37 -04:00
vedithal-amd f9aa7be97c Support MI 350 profiling (#632)
* Add MI 350 hardware information

* Refactor MI GPU YAML file and corresponding interface

* Add SoC file for gfx950 architecture

* Add analysis report configs for MI 350 containing existing metrics

* Add placeholder None valued metrics for previous architectures to make
  baseline comparison work

* Enable testing on MI 350

* Analysis config metric changes
    - SPI changes
        - Update metric formula for default SPI pipe counter
            - Use efficiently collected pipe wise SPI counters
        - Add SPI Wave Occupancy
        - Add Scheduler-Pipe Wave Utilization
        - Update formula for VGPR Writes
        - Add Scheduler-Pipe FIFO Full Rate
    - CPC changes
        - Add CPC SYNC FIFO Full Rate
        - Add CPC CANE Stall Rate
        - Add CPC ADC Utilization
    - SQ changes
        - Add VALU co-issue efficiency
        - Add F6F4 datatype metrics
        - Update formula for total FLOPs by adding F6F4 counters
        - Add LDS STORE / LOAD / ATOMIC metrics
        - Add LDS STORE / LOAD / ATOMIC bandwidth
        - Add LDS FIFO and TA ADDR / CMD / DATA FIFO full rates

* Collect TCP_TCP_LATENCY_sum only for gfx950 (MI 350)

* Do not inject SQ_ACCUM_PREV_HIRES unnecessarily

* Do not hardcode memory and shader clock speeds

* Write num_hbm_channels to sysinfo.csv instead of hbm_bw while profiling

* Move generate sysinfo.csv to pre processing step of profiling

* Add warnings to use --specs-correction for missing sysinfo.csv values during analysis phase

* Update CHANGELOG

* Analysis phase warning to use --specs-correction when needed
2025-04-03 02:21:18 -04:00
xuchen-amd f3736778f4 Add mi350 ta td tcp tcc counters (#653)
* Add mi350 TA and TD metrics.

* Add mi350 TCC metrics, and separate write and atomic metrics.

* Add mi350 TCP metrics.

* Add none values for non-gfx950 socs, remove missing metrics in rocprofv3.

---------

Signed-off-by: xuchen-amd <xuchen@amd.com>
2025-04-02 21:25:47 -04:00
xuchen-amd c7202923b0 remove flask debug msg (#655)
* Suppress Flask warning message in quiet mode.

* Init args.gui if it does not exist.
2025-04-02 20:29:39 -04:00
xuchen-amd dce75f4afa Enable tuned performance counters for gfx950 (#652)
* Enable non-functional performance counters for gfx950.

* Update changelog.

* Add none value metrics for non-gfx950 socs

* Remove rocprofv3 missing metrics.
2025-04-02 14:43:12 -04:00
raramakr df2296529b SWDEV-521636 - Add dependent script path to system path in rocprof-compute (#651)
In a wheel environment, rocprof-compute in the bin folder is not a soft link. To execute rocprof-compute from the bin folder, the system path must also include the dependency script paths, so they are now added.
2025-04-02 09:41:02 -07:00
xuchen-amd e77dd1a1ab Improve chip id logic (#648)
* Improve chip id logic, add missing physical and virtual chip ids.
2025-04-01 12:18:07 -04:00
ywang103-amd 7b38766caa re-write function that detects whether v1 is in use to avoid false negative result when ROCPROF is not set (#647) 2025-03-31 16:40:53 -04:00
Fei Zheng 9bacad0876 Support host-trap PC Sampling on CLI (beta version) 2025-03-28 16:51:49 -06:00
Ben Richard 9bd45f5135 Read Accum_VGPR_Count from rocprof output if provided (#645) 2025-03-28 10:43:24 -04:00
ywang103-amd 7c1f14123a fix the wrong number of channels of TCC counters to put in pmc txt file (#633) 2025-03-27 18:15:41 -04:00