ROCm Compute Profiler

General

ROCm Compute Profiler is a system performance profiling tool for machine learning and HPC workloads running on AMD Instinct MI-series GPUs. It currently targets MI100, MI200, MI300, and MI350 series accelerators.

  • For more information on available features, installation steps, and workload profiling and analysis, please refer to the online documentation.

  • ROCm Compute Profiler is an AMD open source tool that is part of the ROCm software stack. We welcome contributions and feedback from the community. Please see the CONTRIBUTING.md file for additional details on our contribution process.

  • Licensing information can be found in the LICENSE file.

Development

ROCm Compute Profiler is now included in the rocm-systems super-repo. The latest sources are in the develop branch, and each release is available in its corresponding release/rocm-rel-X.Y branch.

Pulling the source using sparse-checkout

Because ROCm Compute Profiler lives in the super-repo, you can pull only this project's sources with a sparse checkout:

git clone --no-checkout --filter=blob:none https://github.com/ROCm/rocm-systems.git
cd rocm-systems
git sparse-checkout init --cone
git sparse-checkout set projects/rocprofiler-compute
git checkout develop

cd projects/rocprofiler-compute
python3 -m pip install -r requirements.txt

Testing

Populate the variable in docker/docker-compose.customrocmtest.yml. Populate the <rocm_build_image> variable in docker/Dockerfile.customrocmtest based on the latest ROCm CI build information.

To quickly get the environment (bash shell) for building and testing, run the following commands:

  • cd docker
  • If the docker image is not available on the machine, build it; otherwise, skip this step: docker compose -f docker-compose.customrocmtest.yml build
  • Launch the container and note its name: docker compose -f docker-compose.customrocmtest.yml up --force-recreate -d
  • Open a bash shell in the launched container: docker exec -it <container_name> bash
  • When testing is done, kill the container: docker container kill <container_name>

Inside the docker container, clean, build, then install the project with tests enabled:

rm -rf build install && cmake -B build -D CMAKE_INSTALL_PREFIX=install -D ENABLE_TESTS=ON -D INSTALL_TESTS=ON -DENABLE_COVERAGE=ON -S . && cmake --build build --target install --parallel 8

With the command above, build artifacts are stored under the build directory and installed artifacts under the install directory.

Then, to run the automated test suite, run the following commands:

cd build
ctest

For manual testing, the executable is at install/bin/rocprof-compute.

Standalone binary

Create standalone binary using docker container

This method uses the cmake target inside a docker container.

To create a standalone binary, run the following commands:

  • cd docker
  • Optionally, provide the --build-arg STANDALONEBINARY_EXTRACT_DIR=/<path> option in the build container command to change the absolute path where the standalone binary extracts its contents. Specify this option after the build keyword. The default is /tmp.
  • docker compose -f docker-compose.standalone.yml build (build container command)
  • docker compose -f docker-compose.standalone.yml up --force-recreate -d && docker attach docker-standalone-1 (run container and attach to see its output)

Create standalone binary using cmake target locally without docker

To create a standalone binary, run the following commands:

  • pip install -r requirements.txt (install python dependencies)
  • Optionally, provide the -D STANDALONEBINARY_EXTRACT_DIR=/<path> option in the cmake configure command to change the absolute path where the standalone binary extracts its contents. The default is /tmp.
  • cmake -B build -S . (cmake configure command)
  • cmake --build build --target standalonebinary (call standalonebinary cmake target)
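Per the notes on the standalone binary, each run unpacks roughly 650 MB into a rocprof_compute_standalonebinary_<pid> directory under the configured extract directory. This README does not say whether the binary removes that directory on exit. The following minimal Python sketch therefore assumes directories can be left behind, for example after an interrupted run, and lists those whose owning process is no longer running. The helper names stale_extraction_dirs and pid_is_running are hypothetical and not part of rocprof-compute. The directory pattern is assumed from the naming described in this README.

```python
import os
import re
from pathlib import Path
from typing import List

# Assumed naming, taken from this README: rocprof_compute_standalonebinary_<pid>.
_EXTRACT_DIR_RE = re.compile(r"^rocprof_compute_standalonebinary_(\d+)$")


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering a signal
    except (ProcessLookupError, OverflowError):
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stale_extraction_dirs(extract_dir: str = "/tmp") -> List[Path]:
    """List extraction directories whose owning process is no longer running."""
    root = Path(extract_dir)
    if not root.is_dir():
        return []
    stale = []
    for entry in sorted(root.iterdir()):
        match = _EXTRACT_DIR_RE.match(entry.name)
        if match and entry.is_dir() and not pid_is_running(int(match.group(1))):
            stale.append(entry)
    return stale


if __name__ == "__main__":
    # Report only; review the list before deleting anything.
    for path in stale_extraction_dirs():
        print(path)
```

The sketch only reports and never deletes. A reused PID makes a stale directory look live, so the check errs toward not listing it. Pass the same path you gave STANDALONEBINARY_EXTRACT_DIR if you changed the default.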

Standalone binary creation methodology

The binary is built with the following steps:

  • Use RHEL 8.10 docker image as the base image (only in docker method)
  • Install python3.9 (only in docker method)
  • Install runtime dependencies (only in docker method)
  • Install dependencies for building standalone binary
  • Call the standalonebinary cmake target which uses Nuitka to build the standalone binary

You should find the rocprof-compute.bin standalone binary inside the build folder in the root directory of the project.

Things to note about standalone binary

  • Nuitka compiles the Python interpreter, Python dependencies, and source code into C and then into an executable. The whole process takes about 30 minutes. The self-extracting standalone binary is approximately 150 MB in size; however, the extracted compiled artifacts total approximately 650 MB.

  • By default, the standalone binary extracts its contents to a rocprof_compute_standalonebinary_<pid> directory under /tmp when it runs. The parent directory can be changed with STANDALONEBINARY_EXTRACT_DIR, as explained in the standalone binary creation section.

  • With the docker method, the binary is built on RHEL 8, which ships glibc 2.28, so the standalone binary runs only in environments with glibc 2.28 or newer. Check the glibc version with the ldd --version command.

  • If not using docker, the minimum glibc version is determined by the OS where cmake is run.
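You can also check the glibc requirement from Python before running the binary, as an alternative to ldd --version. This minimal sketch uses the standard library's platform.libc_ver(), which returns ('', '') when it cannot identify glibc. The function name glibc_is_compatible is illustrative. The 2.28 threshold comes from the docker-method note above and is an assumption for locally built binaries, which need the threshold of their own build host.

```python
import platform
import re
from typing import Optional, Tuple

# Threshold from the docker-method note (RHEL 8 ships glibc 2.28).
# Assumption: adjust it if the binary was built locally on a different OS.
REQUIRED_GLIBC = "2.28"


def parse_version(text: str) -> Tuple[int, ...]:
    """Convert a version string such as '2.28' into (2, 28)."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def glibc_is_compatible(
    detected: Optional[str] = None, required: str = REQUIRED_GLIBC
) -> bool:
    """Return True if the glibc version is at least `required`.

    If `detected` is None, query the running system with platform.libc_ver(),
    which returns ('', '') when it cannot identify glibc.
    """
    if detected is None:
        lib, detected = platform.libc_ver()
        if lib != "glibc" or not detected:
            return False
    # Integer tuples compare numerically, so (2, 9) < (2, 28).
    return parse_version(detected) >= parse_version(required)


if __name__ == "__main__":
    lib, version = platform.libc_ver()
    print(f"detected: {lib or 'unknown'} {version}")
    print("compatible:", glibc_is_compatible())
```

Versions are compared as integer tuples rather than strings. A plain string comparison would rank "2.9" above "2.28".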

To test the standalone binary, pass the --call-binary option to pytest.

How to Cite

This software can be cited using a Zenodo DOI reference. A BibTeX-style reference is provided below for convenience:

@misc{xiaomin_lu_2022_7314631,
  author       = {Xiaomin Lu and
                  Cole Ramos and
                  Fei Zheng and
                  Karl W. Schulz and
                  Jose Santos and
                  Keith Lowery and
                  Nicholas Curtis and
                  Cristian Di Pietrantonio},
  title        = {rocprofiler-compute},
  url          = {https://github.com/ROCm/rocm-systems/blob/develop/projects/rocprofiler-compute}
}