# ROCm Compute Profiler

## General

ROCm Compute Profiler is a system performance profiling tool for machine
learning/HPC workloads running on AMD MI GPUs. The tool presently
targets usage on MI100, MI200, MI300, and MI350 series accelerators.

* For more information on available features, installation steps, and
workload profiling and analysis, please refer to the online
[documentation](https://rocm.docs.amd.com/projects/rocprofiler-compute/en/latest/).

* ROCm Compute Profiler is an AMD open source tool that is part of the ROCm software stack. We welcome contributions and
feedback from the community. Please see the
[CONTRIBUTING.md](CONTRIBUTING.md) file for additional details on our
contribution process.

* Licensing information can be found in the [LICENSE](LICENSE.md) file.

## Development

ROCm Compute Profiler is now included in the rocm-systems super-repo. The latest sources are in the `develop` branch. Sources for a specific release are in the corresponding `release/rocm-rel-X.Y` branch.

### Pulling the source using sparse-checkout

Because the project lives in the super-repo, use a sparse checkout if you only want to pull the source for a particular project:

```bash
git clone --no-checkout --filter=blob:none https://github.com/ROCm/rocm-systems.git
cd rocm-systems
git sparse-checkout init --cone
git sparse-checkout set projects/rocprofiler-compute
git checkout develop

cd projects/rocprofiler-compute
python3 -m pip install -r requirements.txt
```

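To work from a release instead of `develop`, the same sparse checkout can be pointed at a `release/rocm-rel-X.Y` branch. The sketch below is only illustrative: `release_branch` is a hypothetical helper, not part of the repository, and the version `7.0` is a placeholder that may not correspond to an existing branch.

```bash
# Hypothetical helper: build the release branch name from a ROCm version.
# $1 is a version in X.Y form, for example 7.0 (placeholder value).
release_branch() {
    printf 'release/rocm-rel-%s\n' "$1"
}

# Usage, from inside the rocm-systems checkout created with sparse-checkout:
#   git checkout "$(release_branch 7.0)"
```
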
## Testing

Populate the `<usename>` variable in `docker/docker-compose.customrocmtest.yml`.
Populate the `<rocm_build_image>` variable in `docker/Dockerfile.customrocmtest` based on the latest ROCm CI build information.

To quickly get an environment (bash shell) for building and testing, run the following commands:
* `cd docker`
* If the docker image is not yet available on the machine, build it; otherwise skip this step: `docker compose -f docker-compose.customrocmtest.yml build`
* Launch the container and note its name: `docker compose -f docker-compose.customrocmtest.yml up --force-recreate -d`
* Run a bash shell in the launched container: `docker exec -it <container_name> bash`
* When testing is done, kill the container: `docker container kill <container_name>`

Inside the docker container, clean, build, then install the project with tests enabled:
```
rm -rf build install && cmake -B build -D CMAKE_INSTALL_PREFIX=install -D ENABLE_TESTS=ON -D INSTALL_TESTS=ON -D ENABLE_COVERAGE=ON -S . && cmake --build build --target install --parallel 8
```

With the above command, build assets are stored under the `build` directory and installed assets under the `install` directory.

Then, to run the automated test suite, run the following commands:
```
cd build
ctest
```

For manual testing, you can find the executable at `install/bin/rocprof-compute`.

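As a convenience, the test run can be wrapped in a small shell function that checks for the build directory first. This is a minimal sketch, assuming the `build` directory name used in the cmake command above; `run_ctest` is a hypothetical helper, not something shipped with the project. The `--output-on-failure` flag is a standard ctest option that prints the output of failing tests.

```bash
# Hypothetical helper: run ctest from the build directory, failing early
# with a clear message if the project has not been configured yet.
run_ctest() {
    build_dir="${1:-build}"   # defaults to the `build` directory used above
    if [ ! -d "$build_dir" ]; then
        echo "build directory '$build_dir' not found; run the cmake step first" >&2
        return 1
    fi
    # Subshell keeps the caller's working directory unchanged.
    (cd "$build_dir" && ctest --output-on-failure)
}

# Usage, from the project root inside the container:
#   run_ctest
```
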
## Standalone binary

### Create standalone binary using docker container

This method uses the cmake target inside a docker container.

To create a standalone binary, run the following commands:
* `cd docker`
* Optionally, pass `--build-arg STANDALONEBINARY_EXTRACT_DIR=/<path>` to the build container command to change the absolute path where the standalone binary extracts its contents. This option must be specified after the `build` keyword. The default is `/tmp`.
* `docker compose -f docker-compose.standalone.yml build` (build container command)
* `docker compose -f docker-compose.standalone.yml up --force-recreate -d && docker attach docker-standalone-1` (run the container and attach to see its output)

### Create standalone binary using cmake target locally without docker

To create a standalone binary, run the following commands:
* `pip install -r requirements.txt` (install python dependencies)
* Optionally, pass `-D STANDALONEBINARY_EXTRACT_DIR=/<path>` to the cmake configure command to change the absolute path where the standalone binary extracts its contents. The default is `/tmp`.
* `cmake -B build -S .` (cmake configure command)
* `cmake --build build --target standalonebinary` (call the standalonebinary cmake target)

### Standalone binary creation methodology

To build the binary we follow these steps:
* Use the RHEL 8.10 docker image as the base image (only in the docker method)
* Install python3.9 (only in the docker method)
* Install runtime dependencies (only in the docker method)
* Install dependencies for building the standalone binary
* Call the standalonebinary cmake target, which uses Nuitka to build the standalone binary

You should find the `rocprof-compute.bin` standalone binary inside the `build` folder in the root directory of the project.

### Things to note about standalone binary

* [Nuitka](https://nuitka.net/user-documentation/) is used to compile the python interpreter, python dependencies, and source code into C and then into an executable. The whole process takes about 30 minutes. The self-extracting standalone binary itself is approximately 150 MB in size; the extracted compiled artifacts total approximately 650 MB.

* By default, the standalone binary extracts its contents upon execution to a directory named `rocprof_compute_standalonebinary_<pid>` under the `/tmp` parent directory. The parent directory can be configured as explained in the standalone binary creation sections.

* When using the docker method, since RHEL 8 ships with glibc version 2.28, the standalone binary can only be run in environments with glibc version 2.28 or newer.
  The glibc version can be checked using the `ldd --version` command.

* If not using docker, the minimum glibc version is determined by the OS where cmake is run.

To test the standalone binary, provide the `--call-binary` option to pytest.

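The glibc requirement noted above can be checked before copying the binary to another machine. The following is a rough sketch rather than a supported tool: `version_ge` and `glibc_version` are hypothetical helper names, the comparison relies on GNU `sort -V`, and the parsing assumes the GNU-style first line printed by `ldd --version` (for example `ldd (GNU libc) 2.35`).

```bash
# Succeeds (exit 0) when version $1 is greater than or equal to version $2.
# Uses GNU sort's version ordering (-V); the smaller version sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints the glibc version reported by `ldd --version`.
# Assumes the version is the last X.Y number on the first output line.
glibc_version() {
    ldd --version 2>/dev/null | head -n 1 | grep -oE '[0-9]+\.[0-9]+' | tail -n 1
}

# Usage: compare the local glibc against the 2.28 baseline from RHEL 8.
#   if version_ge "$(glibc_version)" 2.28; then
#       echo "glibc is new enough for the docker-built binary"
#   fi
```
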
## How to Cite

This software can be cited using a Zenodo
[DOI](https://doi.org/10.5281/zenodo.7314631) reference. A BibTeX
style reference is provided below for convenience:

```
@misc{xiaomin_lu_2022_7314631,
  author       = {Xiaomin Lu and
                  Cole Ramos and
                  Fei Zheng and
                  Karl W. Schulz and
                  Jose Santos and
                  Keith Lowery and
                  Nicholas Curtis and
                  Cristian Di Pietrantonio},
  title        = {rocprofiler-compute},
  url          = {https://github.com/ROCm/rocm-systems/blob/develop/projects/rocprofiler-compute}
}
```