* Adding --torch-operators option in rocprof-compute. Creates a csv file for
each operator that has gpu activity, showing the mapping from operator to
counter values.
* --torch-operators flag added to rocprofiler-sdk
* Adding ctest for --torch-operators.
* Adding pytest markers.
* Corrections in ctest and message logging.
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Adding a check for pytorch installation only when --torch-operators is passed.
* moving inject_roctx.py into src/utils.
* rebase
* Updating docs and changelog.
* Update projects/rocprofiler-compute/src/argparser.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update projects/rocprofiler-compute/src/utils/inject_roctx.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Removing special characters.
* Minor corrections.
* Setting default value for torch_operators_enabled.
* Updating the number of files according to the number of passes.
* Adding rocpd support.
* Adding a warning message to be shown when profiling a non-python workload.
* copilot suggestions, rocpd+native tool fix
* Fixed the incorrect usage of dispatch_id as event_id in the function update_rocpd_pmc_events()
* ruff format fix
* ruff formatting
* Deleting torch_trace.csvs after consolidating the operator data.
* Removing checks since *torch_trace.csv files are deleted.
* Fixing file deletion.
* Update projects/rocprofiler-compute/src/utils/inject_roctx.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update projects/rocprofiler-compute/src/utils/utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update projects/rocprofiler-compute/tests/test_profile_general.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Using default options in the testcase.
* Adding test for overhead measurement.
* Corrections in docs.
* doc updates.
* Update projects/rocprofiler-compute/src/utils/inject_roctx.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Handling potential empty frames.
* Corrected the test cases.
* Changing the flag to --torch-trace
* Fixed helper_app path issues
* Path issues
* process_torch_trace_output() now takes csv file paths as input + allows default usage.
* Replaced pandas with sqlite3
* Adding marker_trace extraction to rocpd_data.py
* Allowing all workloads to use --torch-trace option. Assuming the workload is user verified.
* Modified help section for the flag.
* Added difference in runtimes for longest running kernels in each profiling run to overhead measurements.
* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Removed the accesses to the tables.
* Ruff fixes.
* ruff
* Ruff Fixes
* Adding getattr for args.torch_trace to handle mock args.
* Fix for 'Missing guid in counter collection data - in csv mode'
* Sending output_format to process_torch_trace_output
* Warning for self contained binaries.
* Ruff
* Ruff
* Measuring longest_running_kernel_baseline instead of worst_kernel_increase, very small kernel runtimes are blowing up the worst_kernel_increase metric.
* Minor fixes in input arguments
* Ruff
* Logging PyTorch version
* Fix ruff formatting for PyTorch version logging
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
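The reason for dropping worst_kernel_increase can be illustrated with made-up numbers. The sketch below is not the test's actual code: the kernel names, timings, and the `percent_increase` helper are all hypothetical. It only shows why a relative increase taken over every kernel is dominated by very short kernels, while the longest running kernel gives a steadier figure.

```python
# Hypothetical per-kernel durations in nanoseconds:
# (baseline run, run with --torch-trace). All values are invented.
durations_ns = {
    "tiny_kernel": (200, 900),
    "large_gemm": (5_000_000, 5_050_000),
}


def percent_increase(baseline, traced):
    """Relative slowdown of the traced run over the baseline, in percent."""
    return (traced - baseline) / baseline * 100.0


# Worst relative increase over all kernels: dominated by the tiny kernel.
worst_kernel_increase = max(
    percent_increase(base, traced) for base, traced in durations_ns.values()
)

# Relative increase of the kernel that runs longest in the baseline profile.
longest_kernel = max(durations_ns, key=lambda name: durations_ns[name][0])
longest_running_kernel_increase = percent_increase(*durations_ns[longest_kernel])

print(f"worst_kernel_increase: {worst_kernel_increase:.1f}%")
print(f"{longest_kernel}: {longest_running_kernel_increase:.1f}%")
```

With these invented timings, a 700 ns difference on the tiny kernel reads as 350%, whereas the long kernel shows 1%.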
* Added iteration_multiplex_impute_counters on pmc data; the GUI dataframe did not previously implement this in the build_layout method
* Created a Workload() in profile mode post-processing so that the standalone roofline HTML plot can be generated; this will be removed once the roofline plot is moved to the analyze phase in a future release
* Added iteration_multiplexing run parameter to roofline object init so that the dataframe is parsed accurately if the option was used during profiling; this helps avoid reading NaN values in certain dispatches that did not get imputed in calc_ai_profile
* Cleanup of unused legacy code; adjusted method parameters to assist in moving roofline plotting to analyze mode in a future release
* Update iteration multiplexing data imputation algorithm to impute counters for ungrouped dispatches at the end based on the previous group. However, this won't work if there are no dispatches that can be grouped (i.e. number of dispatches < number of counter buckets)
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
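One plausible reading of the trailing-dispatch imputation described in this change is sketched below. It is an assumption-laden illustration, not the project's implementation: the function name `impute_trailing_dispatches`, the dict-per-dispatch layout, and the rule that a full group is merged by taking the union of its counters are all hypothetical.

```python
def impute_trailing_dispatches(dispatches, num_buckets):
    """Fill missing counters for dispatches left over after grouping.

    dispatches: list of dicts, each holding only the counters of the one
                bucket collected for that dispatch (hypothetical layout).
    num_buckets: number of counter buckets, i.e. dispatches per full group.
    """
    full_groups = len(dispatches) // num_buckets
    if full_groups == 0:
        # Fewer dispatches than buckets: no group exists to impute from.
        return [dict(d) for d in dispatches]

    result = []
    last_merged = {}
    for g in range(full_groups):
        group = dispatches[g * num_buckets:(g + 1) * num_buckets]
        merged = {}
        for dispatch in group:
            merged.update(dispatch)
        last_merged = merged
        # Every dispatch in a full group sees the union of the group's counters.
        result.extend(dict(merged) for _ in group)

    # Leftover dispatches borrow missing counters from the previous group,
    # but keep the values they measured themselves.
    for dispatch in dispatches[full_groups * num_buckets:]:
        filled = dict(last_merged)
        filled.update(dispatch)
        result.append(filled)
    return result


example = [{"A": 1}, {"B": 2}, {"A": 3}, {"B": 4}, {"A": 5}]
imputed = impute_trailing_dispatches(example, num_buckets=2)
```

The early return mirrors the caveat in the commit message: when the number of dispatches is smaller than the number of counter buckets, there is no previous group and nothing can be imputed.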
* Removed attach tool library path
* Support new attach/detach API
* New attach/detach API was introduced in
https://github.com/ROCm/rocm-systems/pull/1653
* Provide backward compatibility with old api
* Stabilize attach/detach tests by adding sleep to help workload get
ready for attachment
* Fix typo in test name
---------
Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
Co-authored-by: Fei Zheng <44449748+feizheng10@users.noreply.github.com>
* Analysis database v1.2.0
* `pc_sampling` and `roofline_data` tables should relate to `kernel` table instead of `workload` table
* Remove `kernel_name` fields in `pc_sampling` and `roofline_data` table
* Add kernel existence check for roofline data to prevent KeyError (#2536)
* Initial plan
* Add kernel existence check for roofline data to prevent KeyError
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>
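The kind of guard this fix describes can be sketched as follows. This is a minimal illustration under assumed data shapes, not the code that was merged: `attach_roofline_data`, the `kernels` mapping, and the `kernel_name` key are hypothetical names.

```python
def attach_roofline_data(kernels, roofline_rows):
    """Attach roofline rows to known kernels and report the ones skipped.

    kernels: dict mapping kernel name -> mutable record (hypothetical layout).
    roofline_rows: list of dicts, each with a "kernel_name" key.
    """
    skipped = []
    for row in roofline_rows:
        name = row["kernel_name"]
        if name not in kernels:
            # Indexing kernels[name] directly would raise KeyError here.
            skipped.append(name)
            continue
        kernels[name].setdefault("roofline", []).append(row)
    return skipped


known = {"kernel_a": {}}
rows = [
    {"kernel_name": "kernel_a", "ai": 0.5},
    {"kernel_name": "kernel_missing", "ai": 0.1},
]
skipped_names = attach_roofline_data(known, rows)
```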
* Optimize analysis performance
* Refactor database schema: separate metric definitions from kernels
Reorganize the database ORM to decouple metric definitions from kernel
objects. This improves the schema design by:
- Rename Metric -> MetricDefinition and Value -> MetricValue for clarity
- Move metric definitions from kernel-level to workload-level, since
metric definitions are shared across kernels
- Update relationships: MetricDefinition belongs to Workload,
MetricValue
references both MetricDefinition and Kernel
- Refactor metric_view to join through the new schema structure
- Update test fixtures to use renamed table and class names
- Update documentation with new example output using nbody workload
- Regenerate database schema and views diagrams
* Add min and max aggregation in kernel_view
* Add primary key id from tables into the view
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>
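The relationships described in this refactor can be sketched with the standard-library sqlite3 module. The project itself uses an ORM; the table names, column names, and view definition below are assumptions made for illustration and do not reproduce the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE workload (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE kernel (
        id INTEGER PRIMARY KEY,
        workload_id INTEGER REFERENCES workload(id),
        kernel_name TEXT
    );
    -- Metric definitions are shared by all kernels, so they hang off the workload.
    CREATE TABLE metric_definition (
        id INTEGER PRIMARY KEY,
        workload_id INTEGER REFERENCES workload(id),
        name TEXT,
        unit TEXT
    );
    -- Each value references both its definition and the kernel it was measured on.
    CREATE TABLE metric_value (
        id INTEGER PRIMARY KEY,
        metric_id INTEGER REFERENCES metric_definition(id),
        kernel_id INTEGER REFERENCES kernel(id),
        value REAL
    );
    CREATE VIEW metric_view AS
        SELECT v.id AS id, k.kernel_name AS kernel_name,
               d.name AS metric_name, d.unit AS unit, v.value AS value
        FROM metric_value v
        JOIN metric_definition d ON d.id = v.metric_id
        JOIN kernel k ON k.id = v.kernel_id;
    """
)

conn.execute("INSERT INTO workload VALUES (1, 'nbody')")
conn.executemany(
    "INSERT INTO kernel VALUES (?, 1, ?)", [(1, "kernel_a"), (2, "kernel_b")]
)
conn.execute("INSERT INTO metric_definition VALUES (1, 1, 'example_metric', 'pct')")
conn.executemany(
    "INSERT INTO metric_value VALUES (?, 1, ?, ?)", [(1, 1, 0.5), (2, 2, 0.75)]
)

rows = conn.execute(
    "SELECT kernel_name, metric_name, value FROM metric_view ORDER BY kernel_name"
).fetchall()
definition_count = conn.execute("SELECT COUNT(*) FROM metric_definition").fetchone()[0]
```

In this layout a single definition row serves both kernels, which is the point of moving definitions from kernel level to workload level.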
* Standalone roofline should create HTML instead of PDF
* Eliminate the dependency on kaleido and plotly_get_chrome by moving
towards plotly native HTML image roofline chart generation
* Address review comments
* Faster counter accuracy testing
* Better handle SPI_CSN_* metrics for GPU series lower than MI350
* Use metric filtering to collect only relevant counters for comparison
* Ensure all workload folders are deleted after testing is completed
* Don't use clean_existing=False
* Add manual test for all counter accuracy
* Fix for multi process workload profiling
Native counter collection tool updates:
* Do not dump empty counter data for a process
* Use PID instead of UUID for dumped csv files to facilitate correlation
* Handle merging multiple pairs of rocpd (from sdk tool) and csv (from
native tool) files
* Handle merging multiple pairs of csv (from sdk tool) and csv (from
native tool) files
Rocpd output format updates:
* Merge multiple rocpd databases into a single csv
* Reset dispatch id and kernel id for unique dispatches and unique
kernels respectively
* Retain multiple rocpd databases per run for multi process workloads
* Add test case for multiprocess profiling using rocflop workload
* Add rocflop
* Fix native counter csv to rocprofv3 csv conversion
* Use kernel_id instead of dispatch_id to correlate native counter csv
and kernel trace csv
* python formatting using ruff 0.14 instead of 0.13
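The ID reset applied when several per-process outputs are merged might look roughly like the sketch below. It assumes that a kernel is identified by its name and that rows are plain dicts; `merge_process_dispatches` and the field names are hypothetical, and the real merge operates on rocpd databases and csv files, not on Python lists.

```python
def merge_process_dispatches(per_process_rows):
    """Merge per-process dispatch rows and renumber the IDs.

    per_process_rows: dict mapping pid -> list of dicts with a "kernel_name"
                      key (hypothetical layout). Per-process IDs are discarded
                      because they can collide across processes.
    """
    merged = []
    kernel_ids = {}
    for pid in sorted(per_process_rows):
        for row in per_process_rows[pid]:
            name = row["kernel_name"]
            # One kernel ID per unique kernel name across all processes.
            if name not in kernel_ids:
                kernel_ids[name] = len(kernel_ids) + 1
            merged.append(
                {
                    "pid": pid,
                    # One dispatch ID per dispatch, unique in the merged output.
                    "dispatch_id": len(merged) + 1,
                    "kernel_id": kernel_ids[name],
                    "kernel_name": name,
                }
            )
    return merged


merged_rows = merge_process_dispatches(
    {
        200: [{"kernel_name": "kernel_a"}],
        100: [{"kernel_name": "kernel_a"}, {"kernel_name": "kernel_b"}],
    }
)
```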
* Improve Iteration multiplexing
* Improve iteration multiplexing documentation by adding usage note and
listing caveats
* Bugfixes for iteration multiplexing
* Use merge iteration multiplexing in analysis webui and db mode
* Do not remove Dispatch_ID column in merge iteration multiplexing
since it is needed for analysis of top dispatches based on
duration
* Bugfixes for analysis logic
* Graceful handling of missing counters in case of iteration
multiplexing
* Improved warnings when metrics could not be calculated due to
missing counter data
* Fix the check to prevent showing table when a column is full of
N/A
* Improve detection of empty values when metric evaluation fails
due to missing counter data
* Bugfixes for profile logic
* Fix kernel filtering during roofline benchmark phase
* Update changelog for bugfixes
* Remove unnecessary columns when merging dispatches for iteration multiplexing
* bugfix
* Better analysis warnings
* fix to_std() in parser
* Use median in merge iteration multiplex
* Address review comments
* Fix cmake formatting
* fix None handling of parser util functions
* Enable stochastic counter accuracy test
* fix cmake formatting
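The choice of median when merging dispatches can be illustrated with a small sketch. It assumes dispatches are merged per kernel and that missing counters are represented as None; `merge_dispatches_by_kernel` and the column names are hypothetical and only show why a median is less sensitive to a single outlier dispatch than a mean.

```python
from collections import defaultdict
from statistics import median


def merge_dispatches_by_kernel(rows, counters):
    """Collapse dispatches of the same kernel into one row of median values.

    rows: list of dicts with a "Kernel_Name" key and counter columns.
    counters: names of the counter columns to merge.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["Kernel_Name"]].append(row)

    merged = {}
    for kernel, kernel_rows in grouped.items():
        merged[kernel] = {}
        for counter in counters:
            # Skip dispatches in which this counter was not collected.
            values = [r[counter] for r in kernel_rows if r.get(counter) is not None]
            merged[kernel][counter] = median(values) if values else None
    return merged


sample = [
    {"Kernel_Name": "kernel_a", "CounterX": 10, "CounterY": None},
    {"Kernel_Name": "kernel_a", "CounterX": 12, "CounterY": 7},
    {"Kernel_Name": "kernel_a", "CounterX": 1000, "CounterY": None},
]
merged_sample = merge_dispatches_by_kernel(sample, ["CounterX", "CounterY", "CounterZ"])
```

Here the outlier value 1000 does not move the merged CounterX, and a counter that was never collected stays empty instead of raising an error.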
* Enable running tests from installation only
* Use cmake option -DTEST_FROM_INSTALL=ON to enable running tests from installation folder only
* It is not possible to run tests from build folder in this case
* This option prevents changing working directory to source folder
* Fix SourceFileLoader to import rocprof-compute main module correctly
* Install sample executables in the test folder
* fix num_xcds_cli_output test
* Fix tests
* Skip autogen. config. test and add a TODO task for re-design of this
test
* Add flexible import of source code in test_gpu_specs.py
* Update cmake to install tests/workloads folder when INSTALL_TESTS=ON
* Fix sys.argv[0] for tests
* fix live attach detach test
* Split roofline tests
* Use N/A for missing values
* Test eval_expression for no valid data
* Fixed tests
* Updated Changelog for N/A
* Fixed platform specific test failure
* Use native tool for counter collection
* Add native counter collection tool which uses rocprofiler-sdk C++
library public API to get counter collection data
* This is enabled by default, unless --no-native-tool option is
provided or ROCPROF=rocprofv3 env. var. is provided
* This tool is only supported for ROCm version >=7.x.x
* This tool is not supported for attach/detach scenario
* Build native tool shared object during build time
* If using rocprof-compute without building, then runtime compilation of
the native tool shared object is performed
* rocprofiler-sdk tools is still used for services other than counter
collection and data collected by native tool is merged into the
rocpd/csv output of rocprofiler-sdk tool
* Make `rocpd` choice the default choice for `--format-rocprof-output`
option
* If `rocpd` public API from rocprofiler-sdk library is not present,
then fallback to `csv` choice
* In this case only `pmc_perf.csv` is written in workload folder
instead of multiple `csv` files for each profiling run
* Remove `json` choice from `--format-rocprof-output` option since it
functions identically to the `csv` option
* Rename option `--rocprofiler-sdk-library-path` to
`--rocprofiler-sdk-tool-path` since we LD_PRELOAD the
rocprofiler-sdk tool shared object and not the rocprofiler-sdk library
shared object
* Fix the meaning of `--dispatch` option in `profile` mode to mention
dispatch iteration filtering instead of dispatch id filtering
* --dispatch option in analyze mode does dispatch id filtering
* Move standalone binary creation logic from cmake file to docker file
* fix native counter collection tool during attach/detach
* improve logging
* fix attach detach with native tool
* fix attach detach with native tool
* do not support attach/detach in native tool
* Update changelog
* add standalone binary creation functionality in cmake
* address review comments
* address review comments
* fix formatting
* address review comments
* Adding paths for cmake to search. Also updated min. cmake requirement to 3.21 as this was when hip was supported.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Update hip compiler ID check, sometimes comes up as Clang, sometimes ROCMClang- depends on setup.
Updated formatting.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* RHEL8.10 unable to compile due to defaulting to old c++ version, need to force c++17
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Updating changelog per docs team recommendations
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Apply suggestions from code review to changelog
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
* Do not require HIP compiler to build native counter collection tool
* fix cmake
* gersemi formatting on latest cmake change
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* ex ci updated dependencies to include rocprofiler-sdk, but cmake was still not capturing the path. A commit added an entry to cmake_prefix_path that specified rocprof-sdk's cmake location, but it was too specific for the search paths in find_package's config mode.
Removing the cmake_prefix_path var and adding hints to the find_package call instead, and specifying config mode so it knows how to construct the search paths
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* gersemi run for formatting
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Still need prefix path, should not have been removed in last commit but does need to be shortened to just the rocm path to allow for find_package config mode to do the job
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* include cstdint for uint32_t
* Run formatting on helper.cpp
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Remove rocm 7.2 release stuff from version and changelog and handle it in separate pr
* fix version
* fix changelog
* fix changelog
* run ruff formatter
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* fix rocprofiler-sdk attach so path
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
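The conditions under which the native counter collection tool is used, as listed in this change, can be condensed into a small decision function. This is a sketch of the stated rules only: `use_native_tool` and its parameters are hypothetical and do not mirror the real code paths.

```python
def use_native_tool(rocm_major, no_native_tool=False, attach=False, env=None):
    """Return True if the native counter collection tool would be used.

    rocm_major: major ROCm version as an integer.
    no_native_tool: True when --no-native-tool was passed.
    attach: True for an attach/detach profiling session.
    env: mapping of environment variables (defaults to an empty mapping).
    """
    env = {} if env is None else env
    if no_native_tool:
        return False  # explicitly disabled on the command line
    if env.get("ROCPROF") == "rocprofv3":
        return False  # user opted into the rocprofv3 interface
    if rocm_major < 7:
        return False  # only supported for ROCm >= 7.x.x
    if attach:
        return False  # not supported for attach/detach
    return True


default_choice = use_native_tool(rocm_major=7)
```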
* Analysis db changes for visualizer
* Add support for per kernel analysis metrics
* Add support for dispatch timeline visualization
* Show median instead of mean of dispatch duration in kernel view
* Add test case to validate analysis db schema
* Analysis db schema update
* Add Kernel table and make Metric and Dispatch table its children
* Kernel table is a child of Workload table
* Update metric_view to show kernel_name column
* Add dispatch timestamps to Dispatch table for dispatch timeline
visualization
* Update kernel_view to show duration_ns_median instead of mean
duration
* Add mean duration in kernel view
* update changelog
---------
Co-authored-by: Fei Zheng <44449748+feizheng10@users.noreply.github.com>
* add double mode of workload dynamic_share with on remove sleeping and
set ROCP_TOOL_ATTACH=1 for running workload
* add comment in dynamic_shared.hip to explain how to use argv
* refactor the attach/detach profiling time in unit tests
* Set default rocprof interface as rocprofiler-sdk
* Remove rocprofv1 and rocprofv2 interfaces
* Remove deprecation notice for rocprof v1/v2/v3 interfaces
* Make rocprofiler-sdk the default interface and make rocprofv3 interface opt-in using ROCPROF=rocprofv3
* Add deprecation notice for rocprofv3
* Make --roof-only, --block and --set mutually exclusive from each other
* Update help output and documentation
* Add sanitize function for checking profiler options
* Update filter blocks arguments when --set or --roof-only is provided
* Update filter_blocks in profiling_config.yaml based on --set option
* Log Filtered Sections instead of Report Sections and Set Selection
* Move soc class function calls from rocprof compute base class to profiler base class
* Fix bug in panel level filtering using --filter-block option
* Remove roofline specific pmc files
* Move microbenchmark entry point from gfx specific soc class to base soc class
* Run microbenchmarks only if block 4 is selected or roof only is selected; skip for mi100
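Mutual exclusion between `--roof-only`, `--block` and `--set` can be expressed with argparse's built-in mutually exclusive groups, as in the sketch below. This is an assumption about one way to do it; the change also mentions a sanitize function, so the real check may well live there. The parser name, `dest` values, and the `is_accepted` helper are hypothetical.

```python
import argparse


def build_parser():
    """Build a toy parser in which the three options exclude each other."""
    parser = argparse.ArgumentParser(prog="profile-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--roof-only", action="store_true")
    group.add_argument("--block", nargs="+")
    group.add_argument("--set", dest="metric_set")
    return parser


def is_accepted(argv):
    """Return True if argparse accepts the given argument list."""
    try:
        build_parser().parse_args(argv)
    except SystemExit:
        # argparse reports the conflict on stderr and exits with status 2.
        return False
    return True


single_option_ok = is_accepted(["--block", "4"])
conflict_rejected = not is_accepted(["--roof-only", "--set", "example_set"])
```

Letting argparse own the check keeps the conflict message consistent with the rest of the usage output, at the cost of less control over its wording.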
* Remove L2 channels from --list-metrics
--list-metrics moved to general options
List metrics for the current architecture
Filter blocks for metrics
Removed test for --list-metrics in profile mode
Test the options don't throw error
Fixed --config-dir error
Test stdout for command line options
Provide path list for loading panel configs
Show L2 Cache (per) channel metrics
Changed command line option names
Can show two levels only
Removed filtering blocks
Moved blocks to original position
Removed filter block tests
Removed filtering
Formatting fix
Readability enhancement
Test formatting
Filter L2 channels without sysinfo
Show available metrics for current arch
Intermediate commit
Fixed tests
Added argument sanitization
Added list_metrics to ctest
merge conflict resolution
Updated test marker
Updated changelog
Fixed formatting
* Updated docs
* Add single kernel filtering for roofline
* Add --kernel to documentation
* Add kernel labels to roofline pdfs
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Add test cases
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Add autodetect for mode (profile or analyze) during roof validate and filter
Prevent --kernel from affecting roofline in gui mode- although this may be broken in develop branch anyways
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Add note about roof-only usage checking for existing profiling files in the dir. If roof-only is not provided, rocprof-compute currently assumes it has to profile in full regardless. Will look into this another day.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Update CHANGELOG.md
Add line in resolved issues section to highlight that kernel filtering is now working for roofline plots
* Apply changes suggested by docs team
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Update projects/rocprofiler-compute/CHANGELOG.md
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Pratik Basyal <pratik.basyal@amd.com>
Analysis data dump
* Add `--output-format` and `--output-name` options to analyze mode
* Remove `--output` and `--save-dfs` options from analyze mode
* Add documentation on `rocpd` output format and analysis database file
* Create sqlite3 database using object relation mapping (ORM) provided
by sqlalchemy library
* Fix metrics config to remove metrics marked as `null`, fix `Unit` header, add
missing `title`
* Add test cases to ensure analysis data dump works
* Add `rocpd` choice for `--format-rocprof-output` option
* Add rocpd_data.py which defines SQL queries to extract data from rocpd database
* Use sqlite3 package to read the database
* Add `--retain-rocpd-output` option in profile mode to retain raw
rocpd database
* Add warning notice to say `--format-rocprof-output rocpd` will be
default in future release
For rocpd output:
* Use only `pmc_perf.csv` instead of reading individual coll_level results csv files
* Post process csv files using pandas in analysis mode instead of profile mode
* Use ACCUM counters instead of SQ_ACCUM_PREV_HIRES
* Add test cases for rocpd output format
* Fix code formatting issues
* Update CHANGELOG
[ROCm/rocprofiler-compute commit: 03d27c0ba0]
* Show description of metrics during analysis
* Use --include-cols Description to show the Description column in analyze mode (this is hidden by default)
* Remove tips field from analysis config
* Align metric names in analysis config and documentation
* Add unified config utils/unified_config.yaml
* Add python script utils/split_config.py to auto generate analysis configuration and documentation metrics description
* Add test case to ensure unified config is older than auto-generated config
* Auto generate analysis config and documentation metrics description
* Update CONTRIBUTING.md to add instructions to build documentation assets
* Add docker image and compose file to build documentation
* Update CHANGELOG and Documentation
* Use jinja template instead of hardcoding metric tables in documentation
[ROCm/rocprofiler-compute commit: bb44e90b2d]
* Analysis report block based filtering is the default now
* Update documentation
* Update CHANGELOG
* Fix tests
* Replace hardware block based filtering tests with report block
based filtering tests
[ROCm/rocprofiler-compute commit: 98bb0f4237]
* Fix roofline rocm version bug
* Fix utils bug
* Remove unnecessary tests
* Do not check textual-fspicker package in cmake build
* Use rocprofv3 to test MI 100 and fix tests
[ROCm/rocprofiler-compute commit: 000fd4f5b2]