* Adding --torch-operators option in rocprof-compute. Creates a csv file for
each operator that has gpu activity, showing the mapping from operator to
counter values.
* --torch-operators flag added to rocprofiler-sdk
* Adding ctest for --torch-operators.
* Adding pytest markers.
* Corrections in ctest and message logging.
* Apply suggestion from @Copilot
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Adding a check for pytorch installation only when --torch-operators is passed.
* moving inject_roctx.py into src/utils.
* rebase
* Updating docs and changelog.
* Update projects/rocprofiler-compute/src/argparser.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update projects/rocprofiler-compute/src/utils/inject_roctx.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Removing special characters.
* Minor corrections.
* Setting default value for torch_operators_enabled.
* Updating the number of files according to the number of passes.
* Adding rocpd support.
* Adding a warning message to be shown when profiling a non-python workload.
* copilot suggestions, rocpd+native tool fix
* Fixed the incorrect usage of dispatch_id as event_id in the function update_rocpd_pmc_events()
* ruff format fix
* ruff formatting
* Deleting torch_trace.csvs after consolidating the operator data.
* Removing checks since *torch_trace.csv files are deleted.
* Fixing file deletion.
* Update projects/rocprofiler-compute/src/utils/inject_roctx.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update projects/rocprofiler-compute/src/utils/utils.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update projects/rocprofiler-compute/tests/test_profile_general.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Using default options in the testcase.
* Adding test for overhead measurement.
* Corrections in docs.
* doc updates.
* Update projects/rocprofiler-compute/src/utils/inject_roctx.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Handling potential empty frames.
* Corrected the test cases.
* Changing the flag to --torch-trace
* Fixed helper_app path issues
* Path issues
* process_torch_trace_output() now takes csv file paths as input + allows default usage.
* Replaced pandas with sqlite3
* Adding marker_trace extraction to rocpd_data.py
* Allowing all workloads to use --torch-trace option. Assuming the workload is user verified.
* Modified help section for the flag.
* Added the difference in runtimes of the longest running kernel in each profiling run to overhead measurements.
* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Removed the accesses to the tables.
* Ruff fixes.
* ruff
* Ruff Fixes
* Adding getattr for args.torch_trace to handle mock args.
* Fix for 'Missing guid in counter collection data - in csv mode'
* Sending output_format to process_torch_trace_output
* Warning for self contained binaries.
* Ruff
* Ruff
* Measuring longest_running_kernel_baseline instead of worst_kernel_increase, since very small kernel runtimes blow up the worst_kernel_increase metric.
* Minor fixes in input arguments
* Ruff
* Logging PyTorch version
* Fix ruff formatting for PyTorch version logging
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
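The `getattr` guard for `args.torch_trace` noted above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the helper name `torch_trace_enabled` and the `False` default are assumptions, and `argparse.Namespace` stands in for whatever mock args the tests build.

```python
import argparse


def torch_trace_enabled(args) -> bool:
    """Return True only if args has a truthy torch_trace attribute.

    Hypothetical helper: getattr with a default avoids an AttributeError
    when a test passes a hand-built namespace that lacks the flag.
    """
    return bool(getattr(args, "torch_trace", False))


# A namespace as the real parser might produce it (attribute present).
parsed_args = argparse.Namespace(torch_trace=True)

# A bare namespace such as a test might construct (attribute missing).
mock_args = argparse.Namespace()

enabled_for_parsed = torch_trace_enabled(parsed_args)
enabled_for_mock = torch_trace_enabled(mock_args)
```

Direct attribute access (`mock_args.torch_trace`) would raise `AttributeError` in the second case, which is the failure the guard is meant to avoid.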
* Pin versions in requirements-test.txt
- Validated compatibility to version pins in requirements.txt
- Validated compatibility with pytest, ctest, automatic test suite
- Validated compatibility with Python 3.9, 3.10, 3.11, and 3.12.
* Remove unused mock dependency
* Initial cleanup of compute workflows and skeleton of ghcr workflow
* Add containers-ci.yml, update opensuse and rhel dockerfiles
* rename id in rocprofiler-compute-ghcr.yml
* Add new line to end of containers-ci.yml
* Update action versions for rocprofiler-compute-ghcr.yml
* Switch back to SHA for action versions
* Add conda set solver classic fix to compute CI dockerfiles
* Update conda install for compute Dockerfiles
* Change opensuse version to 15.6 in containers-ci.yml
* Add fix for ubuntu noble to compute Dockerfile.ubuntu.ci
* Add default distro and version to Dockerfile.ubuntu.ci
* Updated regex for tarball version
* Remove Python3.8 from compute CI Dockerfiles
* Change RHEL 9.4 to 9, add retry for compute workflow
* Revert name change for compute rhel workflow
* update path naming
* Remove binutils-gold from Dockerfile.opensuse.ci
* Remove conda python installs from Dockerfile.ci files in compute
* Change CMake version to 3.21 in compute Dockerfile.ci files
* Update checkout actions from v4 to v5
* Pin dependencies and fix test paths for package layout
- Pin all dependencies in requirements.txt to specific versions to ensure stability and reproducibility.
- Update test_autogen_config.py to correctly resolve source paths for both development and installed package layouts.
- Validated compatibility with Python 3.9, 3.10, 3.11, and 3.12.
* Remove setuptools dependency since we don't support pip install and instead use cmake
* Added iteration_multiplex_impute_counters on pmc data; the GUI dataframe did not implement this in the build_layout method previously
* Created a Workload() in profile mode post-processing so that the standalone roofline html plot can be generated; this will be removed once the roofline plot is moved to the analyze phase in a future release
* Added iteration_multiplexing run parameter to roofline object init so that we can accurately parse the dataframe if the option was used during profiling; this helps us avoid reading nan values in certain dispatches that did not get imputed in calc_ai_profile
* Cleanup of unused legacy code, adjusted method parameters to assist in moving roofline plotting to analyze mode in a future release
* Update iteration multiplexing data imputation algorithm to impute counters for ungrouped dispatches at the end based on the previous group. However, this won't work if there are no dispatches that can be grouped (i.e. number of dispatches < number of counter buckets)
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Use TheRock nightly builds in testing container
* Add HIP_DEVICE_LIB_PATH env var for hipcc to work
* Add HIP_PLATFORM env var for cmake hip package
* Add tarball placeholder
* Add -f to curl command to fail on HTTP error
* Removed attach tool library path
* Support new attach/detach API
* New attach/detach API was introduced in
https://github.com/ROCm/rocm-systems/pull/1653
* Provide backward compatibility with old api
* Stabilize attach/detach tests by adding sleep to help workload get
ready for attachment
* Fix typo in test name
---------
Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
Co-authored-by: Fei Zheng <44449748+feizheng10@users.noreply.github.com>
* Analysis database v1.2.0
* `pc_sampling` and `roofline_data` tables should relate to `kernel` table instead of `workload` table
* Remove `kernel_name` fields in `pc_sampling` and `roofline_data` table
* Add kernel existence check for roofline data to prevent KeyError (#2536)
* Initial plan
* Add kernel existence check for roofline data to prevent KeyError
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>
* Optimize analysis performance
* Refactor database schema: separate metric definitions from kernels
Reorganize the database ORM to decouple metric definitions from kernel
objects. This improves the schema design by:
- Rename Metric -> MetricDefinition and Value -> MetricValue for clarity
- Move metric definitions from kernel-level to workload-level, since
metric definitions are shared across kernels
- Update relationships: MetricDefinition belongs to Workload, MetricValue
references both MetricDefinition and Kernel
- Refactor metric_view to join through the new schema structure
- Update test fixtures to use renamed table and class names
- Update documentation with new example output using nbody workload
- Regenerate database schema and views diagrams
* Add min and max aggregation in kernel_view
* Add primary key id from tables into the view
---------
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: vedithal-amd <191402304+vedithal-amd@users.noreply.github.com>
* Update readme general section and citation version and date.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Minor change to project title; changing now so as not to forget, but we are waiting on feedback about the citation from R&D.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Edit citation from R&D feedback
---------
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Improve native tool discovery and partition detection
- Enhanced native tool path resolution to support CMAKE_INSTALL_LIBDIR variations
(lib, lib64, lib32, etc.) using glob pattern matching
- Extracted path variables to avoid duplication in error messages
- Improved error message clarity by showing exact paths searched for .so and .cpp files
- Simplified code path construction using consistent Path.resolve().parents[x] syntax
- Fixed redundant partition warnings on pre-MI300 GPUs by adding architecture check
- Only query compute/memory partition on MI300+ series (gfx940+)
- Added proper type hints for gpu_arch parameter
- Moved gpu_info extraction after soc_info to ensure gpu_arch is available
- Improved code comments for MI300 series threshold
* Handle gpu arch like a hex string
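Two of the changes above (glob-based lookup across `lib`, `lib64`, `lib32` and treating the gfx arch as a hex string) can be illustrated with a small sketch. The function names, the `libexample_tool.so` file name, and the temporary directory layout are all hypothetical; only the `lib*` pattern idea and the gfx940 threshold come from the commit text.

```python
import tempfile
from pathlib import Path


def find_native_tool(prefix: Path, library_name: str) -> list:
    """Look for library_name under any lib* directory of prefix.

    Hypothetical helper. Note that "lib*" also matches e.g. libexec,
    so a real implementation may want a stricter pattern.
    """
    return sorted(prefix.glob(f"lib*/{library_name}"))


def gfx_arch_number(gpu_arch: str) -> int:
    """Parse the part after 'gfx' as hexadecimal, e.g. 'gfx90a' -> 0x90A."""
    return int(gpu_arch.removeprefix("gfx"), 16)


def is_gfx940_or_newer(gpu_arch: str) -> bool:
    # Assumption: a plain numeric threshold. This would also admit
    # gfx10xx/gfx11xx parts, so real code likely needs a narrower check.
    return gfx_arch_number(gpu_arch) >= 0x940


# Demo: build a fake install prefix that uses lib64 instead of lib.
with tempfile.TemporaryDirectory() as tmp:
    prefix = Path(tmp)
    (prefix / "lib64").mkdir()
    (prefix / "lib64" / "libexample_tool.so").touch()
    found = [
        p.relative_to(prefix).as_posix()
        for p in find_native_tool(prefix, "libexample_tool.so")
    ]
```

Parsing as hex matters because names such as `gfx90a` are not valid decimal numbers: `int("90a")` raises `ValueError`, while `int("90a", 16)` sorts correctly below `0x940`.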
* attach: Formalize ROCAttach API
- Make ROCAttach public with public headers
- Change detach to take a PID
- attach and detach are now reentrant
- Cleanup of states and signal handling in ptrace session
- Fixes mixed up definition of ROCPROF_ATTACH_TOOL_LIBRARY
- ROCPROF_ATTACH_TOOL_LIBRARY now always means the tool library loaded by the attachment target
- ROCPROF_ATTACH_LIBRARY refers to the library used to perform attachment
- Add direct call of rocprof-attach
- Fix python library call of rocprof-attach
- Function now named attach(), changed from main()
* attach: rocprof-compute ROCAttach updates
- Update to new library names
- Correct usage of C lib detach
* attach: add test for rocattach
- Disable ASan, TSan, and UBSan for the new parallel-attach test
- Lower log level for LSan tests, existing behavior from other tests
---------
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
* Add cmake based instructions to create standalone binary
* Specify standalone binary extraction path in doc.
* Add documentation to explain how to specify self-extraction path
when building the standalone binary where contents of the binary
are extracted during execution
* Pin Nuitka to version 2.6 for consistency in building standalone binary
* Replace O(n^2) nested loop with O(1) dictionary lookup when associating
metric values with metrics. Pre-group values by (metric_id, kernel_name)
to eliminate redundant iteration over entire values dataframe for each
metric-kernel combination.
* This optimization significantly improves database write performance for
workloads with large numbers of metrics and kernels.
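The pre-grouping idea described above can be sketched as follows. This is an illustration under assumptions rather than the actual database-writing code: plain dicts stand in for the values dataframe, and the field names other than `metric_id` and `kernel_name` are made up.

```python
from collections import defaultdict


def group_values(value_rows):
    """Group values by (metric_id, kernel_name) in a single pass."""
    grouped = defaultdict(list)
    for row in value_rows:
        grouped[(row["metric_id"], row["kernel_name"])].append(row["value"])
    return grouped


# Hypothetical value rows; real data would come from the analysis dataframe.
value_rows = [
    {"metric_id": "1.1", "kernel_name": "kernel_a", "value": 10.0},
    {"metric_id": "1.1", "kernel_name": "kernel_b", "value": 20.0},
    {"metric_id": "1.1", "kernel_name": "kernel_a", "value": 12.0},
    {"metric_id": "2.3", "kernel_name": "kernel_a", "value": 0.5},
]

# Metric-kernel combinations that need their values attached.
combinations = [("1.1", "kernel_a"), ("1.1", "kernel_b"), ("2.3", "kernel_b")]

grouped = group_values(value_rows)

# Each lookup is a constant-time dict access instead of a scan over all rows.
associated = {key: grouped.get(key, []) for key in combinations}
```

The grouping pass touches each value row once, so total work grows with the number of rows plus the number of combinations instead of their product.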
* Standalone roofline should create HTML instead of PDF
* Eliminate the dependency on kaleido and plotly_get_chrome by moving
towards plotly native HTML image roofline chart generation
* Address review comments
config_hashes json had mismatched md5s for the delta_hash values, regenerated the file with the existing files in develop branch.
Signed-off-by: Carrie Fallows <Carrie.Fallows@amd.com>
* Data imputation strategy for iteration multiplexing
* Implement data imputation methodology to handle missing counter values
in case of iteration multiplexing
* Enable dispatch filtering with iteration multiplexing since we are no
longer merging dispatches
* Bugfix to prevent check for missing counter values when using csv
format when profiling with iteration multiplexing
* Move warning and info message in case of iteration multiplexing to
sanitize function which comes earlier in analyze mode
* Address review comments
* Fix typo in documentation
* Move profiling config init. after path check in sanitize()
* Graceful handling of dispatches with all counters empty within data
imputation logic
* Improve info message for iteration multiplexing based analysis
* Ensure proper error message when trying to run iteration multiplexing with attach/detach
* fix test case
* Remove MFMA functionality in rocflop sample since it's not supported on MI50
* Add gfx arch based support for MFMA and SMFMAC in rocflop.cpp
* Add --int32 usage doc
* Address review comments
* Update rocprofiler workflows to use new runner naming for mi325
* Add input options to workflow_dispatch for rocprofiler-systems CI workflow
* Update runner name on therock-ci-linux.yml as well
* Fix merging logic for multi process
* Fix dispatch id reset logic in case of rocpd format
* Fix kernel id reset logic in case of csv format
* Revert correlation logic change in csv format
* Do inner join instead of left join
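The commit does not say which data sets are joined, so the following is only a generic sketch of why an inner join can be preferable to a left join when merging trace and counter data. The table names, columns, and sample rows are assumptions, and in-memory `sqlite3` is used purely for demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE kernel_trace (kernel_id INTEGER, kernel_name TEXT);
    CREATE TABLE counters (kernel_id INTEGER, counter_name TEXT, value REAL);
    INSERT INTO kernel_trace VALUES (1, 'vecadd'), (2, 'matmul');
    -- Only kernel 1 has counter data in this hypothetical run.
    INSERT INTO counters VALUES (1, 'SQ_WAVES', 64.0);
    """
)

QUERY = """
    SELECT t.kernel_name, c.counter_name, c.value
    FROM kernel_trace AS t
    {join} JOIN counters AS c ON t.kernel_id = c.kernel_id
    ORDER BY t.kernel_id
"""

# Left join keeps every trace row and pads missing counters with NULL.
left_rows = conn.execute(QUERY.format(join="LEFT")).fetchall()

# Inner join keeps only rows that exist on both sides.
inner_rows = conn.execute(QUERY.format(join="INNER")).fetchall()

conn.close()
```

With the left join, `matmul` appears with `None` for its counter fields; with the inner join it is dropped, so downstream code never sees rows with missing counter values.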
* Added tool for dumping counter and metric values
* Skip Linting
* Added support for iteration multiplexing
* Remove subparser and suppress compute options
* Specify output dir
* Add kernel info
* csv name change
* Added comments
* Support dispatch id-less dataframes
* Formatting fix
* Add default for path
* Print help with no args
* Support only single workload
* Faster counter accuracy testing
* Better handle SPI_CSN_* metrics for GPUs older than the MI350 series
* Use metric filtering to collect only relevant counters for comparison
* Ensure all workload folders are deleted after testing is completed
* Don't use clean_existing=False
* Add manual test for all counter accuracy
* Test env. vars. in rocprofiler-sdk backend
* Improve rocprofiler-sdk backend test case to check for env. vars. and
ensure we do not overwrite irrelevant env. vars.
* Remove unnecessary usage of ROCPROF_INDIVIDUAL_XCC_MODE env. var.
* Formatting fixes
* Test fixes
* Remove redundant code in tests
* Remove usage of utils_mod and use utils instead, this prevents
duplicate imports
* Fix for multi process workload profiling
Native counter collection tool updates:
* Do not dump empty counter data for a process
* Use PID instead of UUID for dumped csv files to facilitate correlation
* Handle merging multiple pairs of rocpd (from sdk tool) and csv (from
native tool) files
* Handle merging multiple pairs of csv (from sdk tool) and csv (from
native tool) files
Rocpd output format updates:
* Merge multiple rocpd databases into a single csv
* Reset dispatch id and kernel id for unique dispatches and unique
kernels respectively
* Retain multiple rocpd databases per run for multi process workloads
* Add test case for multiprocess profiling using rocflop workload
* Add rocflop
* Fix native counter csv to rocprofv3 csv conversion
* Use kernel_id instead of dispatch_id to correlate native counter csv
and kernel trace csv
* python formatting using ruff 0.14 instead of 0.13
* Install rocm-dev in rocprofiler-compute-tarball.yml workflow
* Update paths for push and PR for rocprofiler-compute-tarball.yml
* Add ROCm dependencies to disttest job
* cmake fix binary link creation and fix format
* Use python3 instead of python3.9 in RHEL 8 and RHEL 9 workflows
* set default python3 to python3.9 in rhel8
* Try alternatives setup for python3 in RHEL8 env
* Add pip install cmake to debug RHEL8 issue
* Remove python3.11 in RHEL8 workflow
* Add back comment regarding RHEL8
---------
Co-authored-by: Vignesh Edithal <Vignesh.Edithal@amd.com>
* Improve Iteration multiplexing
* Improve iteration multiplexing documentation by adding usage note and
listing caveats
* Bugfixes for iteration multiplexing
* Use merge iteration multiplexing in analysis webui and db mode
* Do not remove Dispatch_ID column in merge iteration multiplexing
since it is needed for analysis of top dispatches based on
duration
* Bugfixes for analysis logic
* Graceful handling of missing counters in case of iteration
multiplexing
* Improved warnings when metrics could not be calculated due to
missing counter data
* Fix the check to prevent showing table when a column is full of
N/A
* Improve detection of empty values when metric evaluation fails
due to missing counter data
* Bugfixes for profile logic
* Fix kernel filtering during roofline benchmark phase
* Update changelog for bugfixes
* Remove unnecessary columns when merging dispatches for iteration multiplexing
* bugfix
* Better analysis warnings
* fix to_std() in parser
* Use median in merge iteration multiplex
* Address review comments
* Fix cmake formatting
* fix None handling of parser util functions
* Enable stochastic counter accuracy test
* fix cmake formatting
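The "Use median in merge iteration multiplex" item above can be pictured with a rough sketch. It assumes that dispatches of the same kernel each carry only a subset of counters, and that merging takes the per-counter median over the values that are present. The data layout and function name are hypothetical and not taken from the project.

```python
from collections import defaultdict
from statistics import median


def merge_dispatches_by_median(dispatches):
    """Merge dispatches per kernel, taking the median of each counter.

    Missing values (None) are skipped, so a counter collected in only
    some dispatches still gets a merged value.
    """
    collected = defaultdict(lambda: defaultdict(list))
    for dispatch in dispatches:
        for counter, value in dispatch["counters"].items():
            if value is not None:
                collected[dispatch["kernel"]][counter].append(value)
    return {
        kernel: {counter: median(values) for counter, values in counters.items()}
        for kernel, counters in collected.items()
    }


# Hypothetical dispatches: each one collected a different counter bucket.
dispatches = [
    {"kernel": "vecadd", "counters": {"SQ_WAVES": 10, "GRBM_GUI_ACTIVE": None}},
    {"kernel": "vecadd", "counters": {"SQ_WAVES": 12, "GRBM_GUI_ACTIVE": 4.0}},
    {"kernel": "vecadd", "counters": {"SQ_WAVES": 100, "GRBM_GUI_ACTIVE": 6.0}},
]

merged = merge_dispatches_by_median(dispatches)
```

A median is less sensitive than a mean to a single outlier dispatch, such as the value `100` in the sample data.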
* added graceful errors/exit in profile/analyze roofline.csv
* edit if statement truth
* restore if statement truth (roofline_csv needs at least 2 rows)
* addressed comments and skipped showing roof metrics when data invalid
* fix workload merge
* changed warning to error
* removed redundant variable definition
* added roofline csv validate check in TUI
* add test cases to test validation function
* ruff format
* simplified TUI roofline handling
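A validation check along the lines of "roofline_csv needs at least 2 rows" might look like the sketch below. It assumes the two rows are a header plus at least one data row, which the commit text does not state; the function name and sample contents are illustrative only.

```python
import csv
import io


def roofline_csv_has_data(csv_text: str) -> bool:
    """Return True if the csv text has at least two non-empty rows.

    Assumption: row one is a header and row two is the first data row.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    return len(rows) >= 2


# Hypothetical file contents for demonstration.
header_only = "device,HBMBw\n"
with_data = "device,HBMBw\n0,1234.5\n"

header_only_ok = roofline_csv_has_data(header_only)
with_data_ok = roofline_csv_has_data(with_data)
```

A caller could report an error and skip the roofline metrics when the check returns `False`, in line with the graceful error handling described in the items above.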
* Improve amdsmi interface
* Fix issue where max mem clock was being set as max gfx clock
* Handle the case when all device handles might not be usable due to
devices being hidden by ROCR and HIP environment variables
* Fix get gpu vram size to return str in KB
* Improve testing of amdsmi interface functions