SWDEV-415147 Added Supported architectures for V1 and V2

Change-Id: I307061154a17dd42ead49459a73522c9bdedca8c

Committed by Gopesh Bhardwaj

Parent: 1d8401dec8

Commit: 77988ac0b2

+35 -16

@@ -23,7 +23,7 @@ The ROCm Profiler Tool that uses `rocprofilerV1` can be invoked using the
following command:

```sh
rocprof …
```

To write a custom tool based on the `rocprofilerV1` API do the following:

@@ -40,7 +40,7 @@ int main() {
This can be built in the following manner:

```sh
gcc main.c -I/opt/rocm-5.4.4/include -L/opt/rocm-5.4.4/lib -lrocprofiler64
```

The resulting `a.out` will depend on

@@ -50,7 +50,7 @@ The ROCm Profiler that uses `rocprofilerV2` API can be invoked using the
following command:

```sh
rocsight …
```

To write a custom tool based on the `rocmtools` API do the following:

@@ -67,7 +67,7 @@ int main() {
This can be built in the following manner:

```sh
gcc main.c -I/opt/rocm-5.4.4/include -L/opt/rocm-5.4.4/lib -lrocmtools
```

The resulting `a.out` will depend on `/opt/rocm-5.4.4/lib/librocmtools.so.1`.

@@ -84,12 +84,11 @@ available in ROCm 5.5 but is deprecated and will be removed in a future release.
| **API include** | `include/rocprofiler/rocprofiler.h` | `include/rocprofiler/rocprofiler.h` | `include/rocmtools/rocmtools.h` |
| **API library** | `lib/librocprofiler64.so.1` | `lib/librocprofiler64.so.1` | `lib/librocmtools.so.1` |

The ROCm Profiler Tool that uses `rocprofilerV1` can be invoked using the
following command:

```sh
rocprof …
```

To write a custom tool based on the `rocprofilerV1` API it is necessary to

@@ -108,7 +107,7 @@ int main() {
This can be built in the following manner:

```sh
gcc main.c -I/opt/rocm-5.5.0/include -L/opt/rocm-5.5.0/lib -lrocprofiler64
```

The resulting `a.out` will depend on

@@ -118,7 +117,7 @@ The ROCm Profiler that uses `rocprofilerV2` API can be invoked using the
following command:

```sh
rocprofv2 …
```

To write a custom tool based on the `rocprofilerV2` API do the following:

@@ -135,7 +134,7 @@ int main() {
This can be built in the following manner:

```sh
gcc main.c -I/opt/rocm-5.5.0/include -L/opt/rocm-5.5.0/lib -lrocprofiler64
```

The resulting `a.out` will depend on

@@ -157,7 +156,7 @@ The ROCm Profiler Tool that uses `rocprofilerV1` can be invoked using the
following command:

```sh
rocprof …
```

To write a custom tool based on the `rocprofilerV1` API do the following:

@@ -174,7 +173,7 @@ int main() {
This can be built in the following manner:

```sh
gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64
```

The resulting `a.out` will depend on

@@ -184,7 +183,7 @@ The ROCm Profiler that uses `rocprofilerV2` API can be invoked using the
following command:

```sh
rocprofv2 …
```

To write a custom tool based on the `rocprofilerV2` API do the following:

@@ -201,38 +200,48 @@ int main() {
This can be built in the following manner:

```sh
gcc main.c -I/opt/rocm-5.6.0/include -L/opt/rocm-5.6.0/lib -lrocprofiler64v2
```

The resulting `a.out` will depend on
`/opt/rocm-5.6.0/lib/librocprofiler64.so.2`.

### Optimized

- Improved Test Suite

### Added

- `end_time` needs to be disabled in roctx_trace.txt
- Support for hsa_amd_memory_async_copy_on_engine API function trace

### Fixed

- rocprof in ROCm/5.4.0 GPU selector broken.
- rocprof in ROCm/5.4.1 fails to generate kernel info.
- rocprof clobbers LD_PRELOAD.

## ROCprofiler for rocm 5.7.0

### Navi support

Rocprofiler for ROCm 5.7 added support for counter collection (PMC) and advanced thread tracing (ATT) for Navi21 and Navi31 GPUs.

- On Navi3x, counter collection requires the GPU to be in a stable power state. See README.md for instructions. HIP RT in ATT is not yet supported.

### Changed

- ATT analysis will not run by default. For ATT to have the same behaviour as in 5.5, use `--plugin att <as.s> --mode network`.
- Kernel names are no longer included in HIP API records. Users of the API can get the kernel names from the corresponding HIP Dispatch OPS using the correlation ID. This change was made to optimize and manage the amount of data copied.
- Removed replay modes, as we discovered that some of them corrupt the applications' behavior; we will re-add them once we implement a fix.

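Since HIP API records no longer carry kernel names, a consumer has to join them to dispatch records on the correlation ID. The sketch below shows that join with `awk`. It assumes two hypothetical CSV inputs with header lines: a simplified dispatch file whose first column is `Correlation_ID` and whose remainder is the kernel name, and an API file in the `Domain,Function,Kernel_Name,Start_Timestamp,End_Timestamp,Correlation_ID` layout shown in the File plugin example below. The helper name `join_kernel_names` is illustrative; adjust the column numbers to your actual output.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: attach kernel names to HIP API records by correlation ID.
#   $1: dispatch records, "Correlation_ID,Kernel_Name" (assumed simplified layout;
#       the name may itself contain commas)
#   $2: API records, "Domain,Function,Kernel_Name,Start_Timestamp,End_Timestamp,Correlation_ID"
# Prints "Function,Correlation_ID,Kernel_Name"; the name is empty when no dispatch matches.
join_kernel_names() {
  awk -F, '
    # First file: remember the kernel name for each correlation ID (skip header).
    NR == FNR {
      if (FNR > 1) name[$1] = substr($0, index($0, ",") + 1)
      next
    }
    # Second file: emit the API function, its correlation ID and any matched name.
    FNR > 1 {
      print $2 "," $6 "," (($6 in name) ? name[$6] : "")
    }
  ' "$1" "$2"
}
```

Keeping the name as "everything after the first comma" avoids splitting demangled names such as `helloworld(char*, char*)`.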
### Optimized

- Improved ATT parser performance and file sizes.
- The profiler now autocorrects user input errors for pmc and throws an exception for wrong input with this message: "Bad input metric. usage --> pmc: [counter1] [counter2]"

### Added

- Every API trace in V2 that is reported synchronously will have two records, one for the Enter phase and one for the Exit phase.
- File Plugin now reports the HSA OPS operation kind as part of the output text.
- MI300 counters support for rocprof v1 and v2.

@@ -243,20 +252,23 @@ Rocprofiler for ROCm 5.7 added support for counter collection (PMC) and advanced
- The File plugin is split into File & CLI plugins. The CLI plugin is responsible for showing results on the terminal screen and is selected automatically if no `-d` option is given to rocprof; the File plugin is responsible for writing the output results to files if the `-d` option is given.
- The structure of the results differs between the CLI & File plugins: the File plugin makes sure every type of result is in a separate file, starting with a header line; the CLI plugin keeps the records in the old format.

Example for file plugin output:

```string
Dispatch_ID,GPU_ID,Queue_ID,Queue_Index,PID,TID,GRD,WGR,LDS,SCR,Arch_VGPR,ACCUM_VGPR,SGPR,Wave_Size,SIG,OBJ,Kernel_Name,Start_Timestamp,End_Timestamp,Correlation_ID,GRBM_COUNT
1,4,1,1,1584730,1584730,10,10,0,0,8,0,16,64,140464978048000,1,"helloworld(char*, char*) (.kd)",0,140469300947216,33,12637.000000
```

```string
Domain,Function,Kernel_Name,Start_Timestamp,End_Timestamp,Correlation_ID
HIP_API_DOMAIN,hipGetDeviceProperties,,316678074094190,316678074098929,1
HIP_API_DOMAIN,hipMalloc,,316678074105702,316678074130851,2
HIP_API_DOMAIN,hipMalloc,,316678074131382,316678074136111,3
```

- Removed record IDs from tracer records in the CLI plugin.
- Added Flush Interval and Trace Period functionality: `--flush-interval [time_in_ms]` flushes the buffers at the interval given by the user, and `--trace-period [delay]:[trace_time]:[interval]` controls session timing, where delay is the time to wait before starting the session, trace_time is the time between each start and stop of the session, and interval is the time between two consecutive sessions (omitting interval means infinite). For more details, please refer to the ROCProfV2 tool usage document.
- Added requirements.txt to be used to install all the necessary Python 3 packages.
- ATT plugin:
  - Added `--mode`, `--mpi` and `--depth` parameters.

@@ -269,6 +281,7 @@ Example for file plugin output:
  - Added "DISPATCH=id" or "DISPATCH=id,rank" to set which dispatch IDs to profile for which MPI rank.

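The `--trace-period` value packs up to three fields into one colon-separated argument. The following bash sketch splits it, assuming the fields are plain non-negative integers and reporting an omitted interval as `infinite`, as described above. The helper name `parse_trace_period` and its validation rules are illustrative assumptions, not rocprofv2's own parser, and no time unit is implied.

```bash
#!/usr/bin/env bash
# Hypothetical helper: split "<delay>:<trace_time>[:<interval>]" and print
# "delay=<d> trace_time=<t> interval=<i>". A missing interval means infinite.
parse_trace_period() {
  local spec=$1 delay trace_time interval extra
  IFS=: read -r delay trace_time interval extra <<< "$spec"
  # Require integer delay and trace_time, and at most three fields in total.
  if ! [[ $delay =~ ^[0-9]+$ && $trace_time =~ ^[0-9]+$ ]] || [[ -n $extra ]]; then
    echo "invalid trace period: $spec" >&2
    return 1
  fi
  if [[ -z $interval ]]; then
    interval=infinite
  elif ! [[ $interval =~ ^[0-9]+$ ]]; then
    echo "invalid interval: $interval" >&2
    return 1
  fi
  echo "delay=$delay trace_time=$trace_time interval=$interval"
}
```

A check like this is handy in wrapper scripts that assemble `--trace-period` from user input before launching a long profiling run.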
### Fixed

- Samples are fixed to show the new usage of phases.
- Plugin option validates the plugin names.
- Fixed rocsys; `rocsys -h` can be called to list the rocsys options.

@@ -280,3 +293,9 @@ Example for file plugin output:
- If the ROCPROFILER_METRICS_PATH environment variable is not set, the counters XML path is taken from `../libexec/rocprofiler/counters/derived_counters.xml`, relative to librocprofiler64.so.2.0.0.
- Repeated base metrics were not being properly reused by derived counters.
- Fixed wrong dispatch ID in kernel.txt.

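The counters XML lookup order described in the fix above can be written as a small shell function. This is an illustrative sketch, not rocprofiler's implementation: the function name `resolve_metrics_xml` is hypothetical, and it takes the directory containing librocprofiler64.so.2.0.0 as an argument instead of discovering it at run time.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the documented lookup order for the counters XML:
#   1. $ROCPROFILER_METRICS_PATH, if set and non-empty;
#   2. otherwise ../libexec/rocprofiler/counters/derived_counters.xml,
#      relative to the directory holding librocprofiler64.so.2.0.0 ($1).
resolve_metrics_xml() {
  local lib_dir=$1
  if [[ -n ${ROCPROFILER_METRICS_PATH:-} ]]; then
    echo "$ROCPROFILER_METRICS_PATH"
  else
    echo "$lib_dir/../libexec/rocprofiler/counters/derived_counters.xml"
  fi
}
```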
## ROCprofiler for rocm 6.0

### Added

- Updated supported GPU architectures in README with profiler versions

+357 -240

@@ -1,16 +1,19 @@
# ROCm Profiling Tools

## DISCLAIMER

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. Any computer system has risks of security vulnerabilities that cannot be completely prevented or mitigated. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. THIS INFORMATION IS PROVIDED “AS IS.” AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY RELIANCE, DIRECT, INDIRECT, SPECIAL, OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

© 2022 Advanced Micro Devices, Inc. All Rights Reserved.

## Introduction

ROCProfiler is AMD’s tooling infrastructure that provides a hardware-specific, low-level performance analysis interface for profiling and tracing GPU compute applications.

## ROCProfiler V1

Profiling with metrics and traces based on perfcounters (PMC) and traces (SPM).
Implementation is based on the AqlProfile HSA extension.
Library supports GFX8/GFX9.
The last API library version for ROCProfiler v1 is 8.0.0

The library source tree:

@@ -33,14 +36,14 @@ The library source tree:
Roctracer & Rocprofiler need to be installed in the same directory.

```bash
export CMAKE_PREFIX_PATH=<path_to_hsa-runtime_includes>:<path_to_hsa-runtime_library>
export CMAKE_BUILD_TYPE=<debug|release> # release by default
export CMAKE_DEBUG_TRACE=1 # 1 to enable debug tracing
```

To build with the currently installed ROCm:

```bash
cd .../rocprofiler
./build.sh ## (for clean build use `-cb`)
```

@@ -48,36 +51,51 @@ cd .../rocprofiler
To run the test:

```bash
cd .../rocprofiler/build
export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH # paths to ROC profiler and other libraries
export HSA_TOOLS_LIB=librocprofiler64.so.1 # ROC profiler library loaded by HSA runtime
export ROCP_TOOL_LIB=test/librocprof-tool.so # tool library loaded by ROC profiler
export ROCP_METRICS=metrics.xml # ROC profiler metrics config file
export ROCP_INPUT=input.xml # input file for the tool library
export ROCP_OUTPUT_DIR=./ # output directory for the tool library, for metrics results file 'results.txt' and trace files
./<your_test>
```

Internal 'simple_convolution' test run script:

```bash
cd .../rocprofiler/build
./run.sh
```

- To enable logging of error messages to '/tmp/rocprofiler_log.txt':

```bash
export ROCPROFILER_LOG=1
```

- To enable verbose tracing:

```bash
export ROCPROFILER_TRACE=1
```

## Supported AMD GPU Architectures (V1)

The following AMD GPU architectures are supported with ROCprofiler V1:

- gfx8 (Fiji/Ellesmere)
- gfx900 (AMD Vega 10)
- gfx906 (AMD Vega 7nm, also referred to as AMD Vega 20)
- gfx908 (AMD Instinct™ MI100 accelerator)
- gfx90a (AMD Instinct™ MI200)

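Scripts that choose between `rocprof` (V1) and `rocprofv2` may want to check a GPU target against the list above. A minimal bash sketch, with a hypothetical helper name `is_v1_supported`; treating "gfx8" as every `gfx8…` target is an assumption, since the list names only the family.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the gfx target is in the V1 support list.
# Assumption: "gfx8" in the list covers every gfx8xx target (Fiji/Ellesmere family).
is_v1_supported() {
  case $1 in
    gfx8*|gfx900|gfx906|gfx908|gfx90a) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, assuming `rocminfo` is installed and reports the GPU agent name as a `gfx…` string, `is_v1_supported "$(rocminfo | grep -o -m1 'gfx[0-9a-f]*')"` succeeds on a V1-supported GPU.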
## ROCProfiler V2

ROCProfilerV2 is a newly developed design for AMD’s tooling infrastructure that provides a hardware-specific, low-level performance analysis interface for profiling GPU compute applications.
The first API library version for ROCProfiler v2 is 9.0.0

### Note: ROCProfilerV2 is currently considered a beta version and is subject to change in future releases

### ROCProfilerV2 Modules

@@ -100,6 +118,7 @@ The first API library version for ROCProfiler v2 is 9.0.0
- libsystemd-dev, libelf-dev, libnuma-dev, libpciaccess-dev on Ubuntu, or their corresponding packages on any other OS
- Cppheaderparser, websockets, matplotlib, lxml, barectf Python3 packages
- Python packages can be installed using:

```bash
pip3 install -r requirements.txt
```

@@ -112,242 +131,310 @@ The user has two options for building:

- Run

Normal Build

```bash
./build.sh --build   # or, equivalently: ./build.sh -b
```

Clean Build

```bash
./build.sh --clean-build   # or, equivalently: ./build.sh -cb
```

- Option 2 (the ROCM_PATH environment variable needs to be set to the current ROCm installation directory), run the following:

- Creating the build directory

```bash
mkdir build && cd build
```

- Configuring the rocprofv2 build

```bash
cmake -DCMAKE_PREFIX_PATH=$ROCM_PATH -DCMAKE_MODULE_PATH=$ROCM_PATH/hip/cmake -DROCPROFILER_BUILD_TESTS=1 -DROCPROFILER_BUILD_SAMPLES=1 <CMAKE_OPTIONS> ..
```

- Building the main runtime of the rocprofv2 project

```bash
cmake --build . -- -j
```

- Optionally, for building API documentation

```bash
cmake --build . -- -j doc
```

- Optionally, for building packages (DEB, RPM, TGZ)

Note: Requires the rpm package on Ubuntu

```bash
cmake --build . -- -j package
```

### Install

- Optionally, run the following to install

```bash
# Install rocprofv2 in the ROCM_PATH path
./rocprofv2 --install
```

OR, if you are using option 2 in building

```bash
cd build
# Install rocprofv2 in the ROCM_PATH path
cmake --build . -- -j install
```

## Features & Usage

### rocsys

A command line utility to control a session (launch/start/stop/exit), with the required application to be traced or profiled in a rocprofv2 context. Usage:

- Launch the application with the required profiling and tracing options, giving it a session identifier to be used later

```bash
rocsys --session session_name launch mpiexec -n 2 rocprofv2 -i samples/input.txt Histogram
```

- Start a session with a given identifier created at launch

```bash
rocsys --session session_name start
```

- Stop a session with a given identifier created at launch

```bash
rocsys --session session_name stop
```

- Exit a session with a given identifier created at launch

```bash
rocsys --session session_name exit
```

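The start and stop commands above can bracket a fixed collection window. A minimal sketch, assuming a session named by `$1` was already launched (for example from another terminal) and that `rocsys ... start`/`stop` return once the request has been delivered. The function name `profile_window` and the use of `sleep` to set the window length are illustrative choices, not part of rocsys.

```bash
#!/usr/bin/env bash
# Hypothetical helper: collect data for an already-launched session during a
# fixed window, using only the rocsys start/stop subcommands shown above.
#   $1: session identifier given at launch
#   $2: window length in seconds
profile_window() {
  local session=$1 seconds=$2
  rocsys --session "$session" start || return 1
  sleep "$seconds"
  rocsys --session "$session" stop
}
# When the application is done, release the session with:
#   rocsys --session <session> exit
```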
### Counters and Metric Collection

HW counters and derived metrics can be collected using the following option:

```bash
rocprofv2 -i samples/input.txt <app_relative_path>
```

input.txt content example (details of what is needed inside input.txt will be mentioned with every feature):

`pmc: SQ_WAVES GRBM_COUNT GRBM_GUI_ACTIVE SQ_INSTS_VALU`

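Because input.txt is plain text, it can be generated from a script. A minimal sketch, assuming a single `pmc:` line with space-separated counter names, as in the example above. The helper `write_pmc_input` is hypothetical, and its only check (names must be non-empty and contain no whitespace) is an illustrative rule; rocprofv2 itself decides which counters are valid on a given GPU.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write "pmc: <counter> <counter> ..." to the file in $1.
write_pmc_input() {
  local out=$1
  shift
  if (( $# == 0 )); then
    echo "usage: write_pmc_input <file> <counter>..." >&2
    return 1
  fi
  local c
  for c in "$@"; do
    # Illustrative sanity check only: reject empty names or names with whitespace.
    if [[ -z $c || $c =~ [[:space:]] ]]; then
      echo "bad counter name: '$c'" >&2
      return 1
    fi
  done
  printf 'pmc: %s\n' "$*" > "$out"
}
```

Usage: `write_pmc_input samples/input.txt SQ_WAVES GRBM_COUNT GRBM_GUI_ACTIVE SQ_INSTS_VALU`, then run `rocprofv2 -i samples/input.txt <app_relative_path>` as shown above.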
### Application Trace Support

Different trace options are available while profiling an app:

- HIP API & asynchronous activity tracing

```bash
rocprofv2 --hip-api <app_relative_path> ## For synchronous HIP API activity tracing
rocprofv2 --hip-activity <app_relative_path> ## For both synchronous & asynchronous HIP API activity tracing
rocprofv2 --hip-trace <app_relative_path> ## Same as --hip-activity, added for backward compatibility
```

- HSA API & asynchronous activity tracing

```bash
rocprofv2 --hsa-api <app_relative_path> ## For synchronous HSA API activity tracing
rocprofv2 --hsa-activity <app_relative_path> ## For both synchronous & asynchronous HSA API activity tracing
rocprofv2 --hsa-trace <app_relative_path> ## Same as --hsa-activity, added for backward compatibility
```

- Kernel dispatches tracing

```bash
rocprofv2 --kernel-trace <app_relative_path> ## Kernel dispatch tracing
```

- HIP & HSA API and asynchronous activity and kernel dispatches tracing

```bash
rocprofv2 --sys-trace <app_relative_path> ## Same as combining --hip-trace & --hsa-trace & --kernel-trace
```

- For complete usage options, please run `rocprofv2 --help`

```bash
rocprofv2 --help
```

### Plugin Support

We have a template for adding new plugins. New plugins can be written on top of rocprofv2 to support the desired output format using the include/rocprofiler/v2/rocprofiler_plugins.h header file. These plugins are modular in nature and can easily be decoupled from the code based on need. Installation files:

```string
rocprofiler-plugins_2.0.0-local_amd64.deb
rocprofiler-plugins-2.0.0-local.x86_64.rpm
```

- File plugin: outputs the data in txt files.

Usage:

```bash
rocprofv2 --plugin file -i samples/input.txt -d output_dir <app_relative_path> # -d is optional; it sets the output directory for the results
```

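The File plugin writes comma-separated records that start with a header line, like the `Domain,Function,Kernel_Name,Start_Timestamp,End_Timestamp,Correlation_ID` sample in the 5.7.0 changelog entry above. As a sketch, assuming that column layout, integer timestamps in a common unit, and no commas inside fields (true for HIP API rows, whose Kernel_Name is empty), per-call durations can be computed with `awk`. The function name `api_durations` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "Function,Correlation_ID,Duration" for each API record,
# where Duration = End_Timestamp - Start_Timestamp (same unit as the input).
# Assumed columns: Domain,Function,Kernel_Name,Start_Timestamp,End_Timestamp,Correlation_ID
# Assumes no field contains a comma.
api_durations() {
  # %.0f keeps large integer differences exact without relying on %d range.
  awk -F, 'FNR > 1 { printf "%s,%s,%.0f\n", $2, $6, $5 - $4 }' "$@"
}
```

Applied to the three HIP API sample rows in the 5.7.0 entry, this yields durations of 4739, 25149 and 4729 timestamp units.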
- Perfetto plugin: outputs the data in protobuf format. Protobuf files can be viewed using ui.perfetto.dev or using trace_processor.

Usage:

```bash
rocprofv2 --plugin perfetto --hsa-trace -d output_dir <app_relative_path> # -d is optional; it sets the output directory for the results
```

Both the output directory and filenames allow for simple environment variable substitution via the special syntax `%q{var}` -> `$var`, e.g.:

```bash
export var="FOO"
rocprofv2 --plugin perfetto -o file_%q{var}_name
# Generates file names: file_FOO_name[...].pftrace
```

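The `%q{var}` syntax behaves like a tiny template language. The bash sketch below reproduces the documented `file_%q{var}_name` to `file_FOO_name` behavior for illustration only; it is not rocprofv2's implementation, the function name `expand_q` is hypothetical, and expanding an unset variable to an empty string is an assumption.

```bash
#!/usr/bin/env bash
# Illustrative re-implementation: replace every %q{NAME} with the value of $NAME.
# Assumption: unset variables expand to an empty string.
expand_q() {
  local s=$1 out="" name
  local re='%q[{]([A-Za-z_][A-Za-z0-9_]*)[}]'
  while [[ $s =~ $re ]]; do
    name=${BASH_REMATCH[1]}
    out+=${s%%"${BASH_REMATCH[0]}"*}   # text before the placeholder
    out+=${!name-}                     # value of the named variable
    s=${s#*"${BASH_REMATCH[0]}"}       # continue after the placeholder
  done
  printf '%s\n' "$out$s"
}
```

Because substituted values are appended to `out` rather than rescanned, a value that itself contains `%q{...}` is not expanded again.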
- CTF plugin: outputs the data in CTF format (a binary trace format). CTF binary output can be viewed using TraceCompass or babeltrace.

Usage:

```bash
rocprofv2 --plugin ctf --hip-trace -d output_dir <app_relative_path> # -d is optional; it sets the output directory for the results
```

- ATT (Advanced Thread Tracer) plugin: advanced hardware trace data in binary format. Please refer to the ATT section.

Tool used to collect fine-grained hardware metrics. Provides ISA-level instruction hotspot analysis via hardware tracing.

***
Note: ATT (Advanced Thread Trace) needs some preparation before running.
***

For MPI or very long applications, we recommend running the collection first and running the parser later on the already collected data:

```bash
# Run only collection: The assembly file is not used. Use mpirun [...] rocprofv2 [...] if needed.
rocprofv2 -i input.txt --plugin att none ./vectoradd_hip.exe
# Remove the binary/application: Only runs the parser.
rocprofv2 -i input.txt --plugin att vectoradd_hip-hip-amdgcn-amd-amdhsa-gfx1100.s --mode network
```

- ##### app_assembly_file_relative_path

  AMDGCN ISA file with .s extension generated in the first step

- ##### app_relative_path

  Path for the running application

- ##### ATT plugin optional parameters

  - `--depth [n]`: How many waves per slot to parse (maximum).
  - `--mpi [proc]`: Parse with this many MPI processes, for greater analysis speed. Does not change results. Requires mpi4py.
  - `--att_kernel "filename"`: Kernel filename to use (instead of ATT asking which one to use).
  - `--trace_file "files"`: glob (wildcards allowed) of trace files to parse. Requires quotes for use with wildcards.
  - `--mode [network, file, off (default)]`
    - ##### network

      Opens the server with the browser UI.
      ATT needs 2 ports available (e.g. 8000, 18000). There is an option (default: `--ports "8000,18000"`) to change these.
      In case rocprofv2 is running on a different machine, use port forwarding (`ssh -L 8000:localhost:8000 <user@IP>`) so the browser can be used locally. For docker, use `--network=host --ipc=host -p8000:8000 -p18000:18000`.

    - ##### file

      Dumps the analyzed JSON files to disk for viewing at a later time. Run `python3 httpserver.py` from within the generated ui/ folder to view the trace, similarly to network mode. The folder can be copied to another machine and will run without ROCm.

    - ##### off

      Runs trace collection but not analysis, so it can be analyzed at a later time. Run rocprofv2 ATT [network, file] with the same parameters, removing the application binary, to analyze previously generated traces. We recommend not setting the mode when collecting for MPI applications.

- ##### input.txt

  Required. Used to select specific compute units and other trace parameters.
  For first-time users, we recommend compiling and running vectorAdd with

  ```bash
  att: TARGET_CU=1
  SE_MASK=0x1
  SIMD_MASK=0x3
  ```

  and histogram with

  ```bash
  att: TARGET_CU=0
  SE_MASK=0xFF
  SIMD_MASK=0xF // 0xF for GFX9, SIMD_MASK=0 for Navi
  ```

  Possible contents:

  - att: TARGET_CU=1 //or some other CU [0,15] - WGP for Navi [0,8]
  - SE_MASK=0x1 // bitmask of shader engines. The fewer, the easier on the hardware. Default enables 1 out of 4 shader engines.
  - SIMD_MASK=0xF // GFX9: bitmask of SIMDs. Navi: SIMD Index [0-3].
  - DISPATCH=ID,RN // collect trace only for the given dispatch_ID (from --kernel-trace) and MPI rank RN. RN is optional and ignored for single processes. Multiple lines with varying combinations of RN and ID can be added.
  - KERNEL=kernname // Profile only kernels containing the string kernname (C++ mangled name). Multiple lines can be added.
  - PERFCOUNTERS_COL_PERIOD=0x3 // Multiplier period for counter collection [0~31]. 0=fastest (usually once every 16 cycles). GFX9 only. Counters will be shown in a graph over time in the browser UI.
  - PERFCOUNTER=counter_name // Add an SQ counter to be collected with ATT; period defined by PERFCOUNTERS_COL_PERIOD. GFX9 only.
  - BUFFER_SIZE=[size] // Sets size of the ATT buffer collection, per dispatch, in megabytes (shared among all shader engines).

- Make sure to generate the assembly file for the application by running the following before compiling your HIP application. This sets the flags globally through an environment variable:

```bash
export HIPCC_COMPILE_FLAGS_APPEND="--save-temps -g"
```

Alternatively, the `--save-temps -g` flags can be added per file for finer control over ISA generation.
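With `--save-temps`, hipcc keeps its intermediate files, including the AMDGCN ISA (`.s`) files, which are typically written to the current directory. The sketch below is a minimal, hypothetical helper (`list_isa_files` is not part of rocprofiler) for listing the candidate ISA files so you can pick the one that matches your GPU. It assumes the `<source>-hip-amdgcn-amd-amdhsa-<gfx>.s` naming seen in the vectoradd example and GNU `find`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list AMDGCN ISA files produced by `hipcc --save-temps`.
# Assumes the "<source>-hip-amdgcn-amd-amdhsa-<gfx>.s" naming from the vectoradd
# example. Host-side .s files do not match the pattern and are skipped.
list_isa_files() {
  local dir=${1:-.}
  # -maxdepth 1: only look next to the build; -printf '%f\n' (GNU find) prints bare names.
  find "$dir" -maxdepth 1 -type f -name '*-hip-amdgcn-amd-amdhsa-gfx*.s' -printf '%f\n' | sort
}
```

For example, run `list_isa_files .` after `hipcc -g --save-temps vectoradd_hip.cpp -o vectoradd_hip.exe`.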
- Install the plugin package. See the Plugin Support section for installation instructions.

- Run the following to view the trace. ATT-specific options must come right after the assembly file.

```bash
rocprofv2 -i input.txt --plugin att <app_assembly_file> --mode network <app_relative_path>
```
- Example for vectoradd on Navi31:

***

Note: Pay special attention to the `gfx1100` part of the ISA file name, which corresponds to Navi31. Use `gfx1030` for Navi21, `gfx90a` for MI200, and `gfx940` for MI300.

***

```bash
hipcc -g --save-temps vectoradd_hip.cpp -o vectoradd_hip.exe
rocprofv2 -i input.txt --plugin att vectoradd_hip-hip-amdgcn-amd-amdhsa-gfx1100.s --mode network ./vectoradd_hip.exe
```

Then open the browser at `http://localhost:8000`.

The ISA can also be obtained with llvm/roc objdump, but the annotations will be different.
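Because the ISA file name changes with the target, it can help to derive it rather than type it by hand. The following is a sketch with a hypothetical helper, `att_isa_file`. The product-to-gfx mapping comes from the note above, and the `<stem>-hip-amdgcn-amd-amdhsa-<gfx>.s` pattern is inferred from the vectoradd example. Any other product name must be passed directly as a gfx target.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the ISA file name that `hipcc --save-temps` emits
# for a source stem and target. Mapping taken from the note above; the
# "<stem>-hip-amdgcn-amd-amdhsa-<gfx>.s" pattern is inferred from the vectoradd example.
att_isa_file() {
  local stem=$1 product=${2,,} gfx   # ${2,,} lowercases the target (bash 4+)
  case $product in
    navi31) gfx=gfx1100 ;;
    navi21) gfx=gfx1030 ;;
    mi200)  gfx=gfx90a ;;
    mi300)  gfx=gfx940 ;;
    gfx*)   gfx=$product ;;          # already a gfx target name
    *) echo "unknown target: $2" >&2; return 1 ;;
  esac
  printf '%s-hip-amdgcn-amd-amdhsa-%s.s\n' "$stem" "$gfx"
}
```

Usage, equivalent to the Navi31 command above: `rocprofv2 -i input.txt --plugin att "$(att_isa_file vectoradd_hip navi31)" --mode network ./vectoradd_hip.exe`.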
- app_assembly_file_relative_path

  AMDGCN ISA file with the `.s` extension generated in the first step.

- app_relative_path

  Path to the application to run.
- ATT plugin optional parameters
  - `--depth [n]`: Maximum number of waves per slot to parse.
  - `--mpi [proc]`: Parse with this many MPI processes for faster analysis. Does not change the results. Requires `mpi4py`.
  - `--att_kernel "filename"`: Kernel filename to use, instead of ATT asking which one to use.
  - `--trace_file "files"`: Glob (wildcards allowed) of trace files to parse. Quotes are required when using wildcards.
  - `--mode [network, file, off (default)]`
    - network

      Opens the server with the browser UI.

      ATT needs two available ports (e.g. 8000 and 18000). Use the `--ports` option (default: `--ports "8000,18000"`) to change them.

      If rocprofv2 is running on a different machine, use port forwarding (`ssh -L 8000:localhost:8000 <user@IP>`) so that the browser can be used locally. For Docker, use `--network=host --ipc=host -p8000:8000 -p18000:18000`.

    - file

      Dumps the analyzed JSON files to disk for viewing at a later time. Run `python3 httpserver.py` from within the generated `ui/` folder to view the trace, as in network mode. The folder can be copied to another machine and runs without ROCm.

    - off

      Runs trace collection but not analysis, so that the traces can be analyzed at a later time. To analyze previously generated traces, run rocprofv2 ATT in network or file mode with the same parameters, removing the application binary. We recommend not setting the mode when collecting for MPI applications.
- input.txt

  Required. Used to select specific compute units and other trace parameters.

  For first-time users, we recommend compiling and running vectorAdd with the following input file:

```string
att: TARGET_CU=1
SE_MASK=0x1
SIMD_MASK=0x3
```

and histogram with:

```string
att: TARGET_CU=0
SE_MASK=0xFF
SIMD_MASK=0xF // 0xF for GFX9, SIMD_MASK=0 for Navi
```
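To avoid hand-editing `input.txt` between runs, you could generate it. The sketch below uses a hypothetical helper, `write_att_input`. It writes the same three lines as the vectorAdd and histogram configurations, one parameter per line as in those examples, and appends an optional `KERNEL=` line for each kernel-name substring. Whether other layouts are accepted by the parser is not covered here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write an ATT input file in the layout of the examples above.
# Usage: write_att_input <file> <target_cu> <se_mask> <simd_mask> [kernel_substring...]
write_att_input() {
  local file=$1 cu=$2 se=$3 simd=$4
  shift 4
  {
    echo "att: TARGET_CU=$cu"
    echo "SE_MASK=$se"
    echo "SIMD_MASK=$simd"
    # Optional filters: one KERNEL= line per substring (multiple lines are allowed).
    local k
    for k in "$@"; do
      echo "KERNEL=$k"
    done
  } > "$file"
}
```

For example, `write_att_input input.txt 1 0x1 0x3` reproduces the vectorAdd configuration, and `write_att_input input.txt 0 0xFF 0xF` reproduces the GFX9 histogram one.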
Possible contents:

- `att: TARGET_CU=1`: The compute unit to trace, in the range [0,15]. For Navi, this selects the WGP, in the range [0,8].
- `SE_MASK=0x1`: Bitmask of shader engines. The fewer shader engines enabled, the lighter the load on the hardware. The default enables 1 out of 4 shader engines.
- `SIMD_MASK=0xF`: On GFX9, a bitmask of SIMDs. On Navi, the SIMD index [0-3].
- `DISPATCH=ID,RN`: Collect a trace only for the given dispatch ID (from `--kernel-trace`) and MPI rank RN. RN is optional and ignored for single processes. Multiple lines with varying combinations of RN and ID can be added.
- `KERNEL=kernname`: Profile only kernels whose C++ mangled name contains the string `kernname`. Multiple lines can be added.
- `PERFCOUNTERS_COL_PERIOD=0x3`: Multiplier period for counter collection [0~31]. 0 is the fastest (usually once every 16 cycles). GFX9 only. Counters are shown in a graph over time in the browser UI.
- `PERFCOUNTER=counter_name`: Add an SQ counter to be collected with ATT, at the period defined by `PERFCOUNTERS_COL_PERIOD`. GFX9 only.
- `BUFFER_SIZE=[size]`: Size of the ATT buffer, per dispatch, in megabytes (shared among all shader engines).
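`SE_MASK` (and `SIMD_MASK` on GFX9) are bitmasks. The sketch below uses a hypothetical `make_mask` helper and assumes that bit *i* selects shader engine (or SIMD) *i*, which is consistent with `0x1` enabling one engine and `0xFF` enabling eight. It builds the hex value from a list of zero-based indices. It does not apply to `SIMD_MASK` on Navi, where the value is an index rather than a mask.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a hex bitmask from zero-based indices,
# setting bit i for each index i (assumed bit-to-engine mapping).
make_mask() {
  local mask=0 i
  for i in "$@"; do
    mask=$(( mask | (1 << i) ))
  done
  printf '0x%X\n' "$mask"
}
```

For example, `make_mask 0 1` yields `0x3` (the vectorAdd `SIMD_MASK`), and `make_mask 0 1 2 3 4 5 6 7` yields `0xFF` (the histogram `SE_MASK`).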
***

Note: For MPI or long-running applications, we recommend running the collection first and running the parser later on the already-collected data.

Run only the collection. The assembly file is not used, so `none` is passed in its place. Use `mpirun [...] rocprofv2 [...]` if needed.

```bash
rocprofv2 -i input.txt --plugin att none ./vectoradd_hip.exe
```

Then remove the binary/application to run only the parser:

```bash
rocprofv2 -i input.txt --plugin att vectoradd_hip-hip-amdgcn-amd-amdhsa-gfx1100.s --mode network
```

***
### Flush Interval

The flush interval controls the time, in milliseconds, between buffer flushes for the tool. A flush is also triggered automatically whenever the buffers are full. For example:

```bash
rocprofv2 --flush-interval <TIME_INTERVAL_IN_MILLISECONDS> <rest_of_rocprofv2_arguments> <app_relative_path>
```
### Trace Period

The trace period controls when profiling or tracing is enabled. It takes two values plus an optional interval. The first value is the delay, the time spent idle without tracing or profiling. The second is the active time, during which profiling or tracing runs. The session follows this timeline:

```string
<DELAY_TIME> => <PROFILING_OR_TRACING_SESSION_START> => <ACTIVE_PROFILING_OR_TRACING_TIME> => <PROFILING_OR_TRACING_SESSION_STOP>
```

This feature can be used with the following command:

```bash
rocprofv2 --trace-period <delay>:<active_time>:<interval> <rest_of_rocprofv2_arguments> <app_relative_path>
```

- delay: Time to wait before profiling starts (ms).
- active_time: How long to profile for (ms).
- interval: If set, profiling sessions start (loop) every `interval` and run for `active_time` until the application ends. Must be greater than `active_time`.
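It can be useful to preview which time windows a given `--trace-period` value will cover before launching a long run. The sketch below uses a hypothetical helper, `trace_windows`, and makes several assumptions. The first session starts after `delay`. Each later session starts `interval` ms after the previous one. The application's runtime is an estimate you supply. It also rejects an interval that is not greater than `active_time`, as required above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the profiling windows (ms after launch) implied by
# --trace-period <delay>:<active_time>[:<interval>].
# Assumptions: first session starts after <delay>; each later session starts
# <interval> ms after the previous one; <app_runtime_ms> is a user-supplied estimate.
trace_windows() {
  local spec=$1 app_runtime_ms=$2
  local delay active interval start
  IFS=: read -r delay active interval <<< "$spec"
  if [[ -z $interval ]]; then
    # No interval: a single session.
    echo "${delay}-$(( delay + active ))"
    return 0
  fi
  if (( interval <= active )); then
    echo "error: interval ($interval) must be higher than active_time ($active)" >&2
    return 1
  fi
  for (( start = delay; start < app_runtime_ms; start += interval )); do
    echo "${start}-$(( start + active ))"
  done
}
```

For example, `trace_windows 100:50:200 1000` lists the windows `100-150`, `300-350`, `500-550`, `700-750`, and `900-950` under these assumptions.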
### Device Profiling

A device profiling session lets the user profile the GPU device for counters, irrespective of the applications running on the GPU. This differs from application profiling: a device profiling session does not take host processes and threads into account, and it directly provides low-level profiling information.
### Session Support

A session is a unique identifier for a profiling, tracing, or PC-sampling task. A ROCProfilerV2 session holds the information about what needs to be collected or traced, and it allows the user to start and stop profiling or tracing whenever required. More details on the API can be found in the API specification documentation, which can be installed with the `rocprofiler-doc` package. Samples showing how to use the API can be found in the `samples` directory.
## Tests

- memorytests (standalone): Runs AddressSanitizer to detect memory leaks and corruptions.

Installation:

```bash
rocprofiler-tests_9.0.0-local_amd64.deb
rocprofiler-tests-9.0.0-local.x86_64.rpm
```
### List and Run tests

#### Run unit tests on the commandline

```bash
./build/tests/unittests/runUnitTests
```

#### Run profiler feature tests on the commandline

```bash
./build/tests/featuretests/profiler/runFeatureTests
```

#### Run tracer feature tests on the commandline

```bash
./build/tests/featuretests/tracer/runTracerFeatureTests
```

#### Run all tests

```bash
rocprofv2 -t
```

Alternatively, run the `rocprofv2` script from the build directory:

```bash
cd build && ./rocprofv2 -t
```
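The three test binaries listed above can also be run in one pass with a combined status. The sketch below is a hypothetical wrapper (`run_rocprofiler_tests` is not shipped with rocprofiler). It assumes the `build/tests/...` layout shown above, skips binaries that were not built, and returns non-zero if any binary fails.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run the test binaries listed above from a build
# directory, skipping any that were not built, and report an overall status.
run_rocprofiler_tests() {
  local build_dir=${1:-./build} status=0 t
  for t in tests/unittests/runUnitTests \
           tests/featuretests/profiler/runFeatureTests \
           tests/featuretests/tracer/runTracerFeatureTests; do
    if [[ ! -x $build_dir/$t ]]; then
      echo "skip: $t (not built)"
      continue
    fi
    if "$build_dir/$t"; then
      echo "pass: $t"
    else
      echo "FAIL: $t"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `run_rocprofiler_tests ./build`.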
The tests can also be run using CMake directly:

```bash
cd build && cmake --build . -- -j check
```

### Guidelines for adding new tests

- Prefer enhancing an existing test to writing a new one. Every test has startup overhead, and many small tests spend precious test time on startup and initialization.
- Make the test run standalone, without requiring command-line arguments. This makes debugging easier: the name of the test is shown in the test report, and if you know the name of the test, you can run it directly.
## Logging

To enable logging of error messages to `/tmp/rocprofiler_log.txt`:

```bash
export ROCPROFILER_LOG=1
```
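After a run with logging enabled, you may want a quick look at the most recent errors. The sketch below is a hypothetical helper, `show_rocprofiler_log`. It defaults to the log path stated above and prints a hint if the file is missing or empty.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show the last lines of the rocprofiler error log, if any.
# Defaults to the log location stated above; pass another path to override it.
show_rocprofiler_log() {
  local log=${1:-/tmp/rocprofiler_log.txt} lines=${2:-20}
  if [[ -s $log ]]; then
    tail -n "$lines" "$log"
  else
    echo "no rocprofiler log at $log (is ROCPROFILER_LOG=1 set?)"
  fi
}
```

Usage: `export ROCPROFILER_LOG=1`, run `rocprofv2 <rest_of_rocprofv2_arguments> <app_relative_path>`, then run `show_rocprofiler_log`.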
## Documentation

We use Doxygen to automatically generate the API documentation. The generated documentation can be found at the following path:

```bash
# ROCM_PATH by default is /opt/rocm
# It can be set by the user to a different location if needed.
<ROCM_PATH>/share/doc/rocprofv2
```
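The `<ROCM_PATH>` placeholder can be resolved with standard bash parameter expansion, falling back to `/opt/rocm` when `ROCM_PATH` is unset or empty. The following is a minimal sketch; the function name `rocprofv2_doc_dir` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the documentation directory, using ROCM_PATH when set
# and non-empty, otherwise the default /opt/rocm.
rocprofv2_doc_dir() {
  printf '%s/share/doc/rocprofv2\n' "${ROCM_PATH:-/opt/rocm}"
}
```

Usage: `ls "$(rocprofv2_doc_dir)"`.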
Installation:

```string
rocprofiler-docs_9.0.0-local_amd64.deb
rocprofiler-docs-9.0.0-local.x86_64.rpm
```
## Samples

Installation:

```string
rocprofiler-samples_9.0.0-local_amd64.deb
rocprofiler-samples-9.0.0-local.x86_64.rpm
```
Please report issues in GitHub Issues.

## Limitations
- Navi3x requires a stable power state for counter collection.

  Currently, this state needs to be set by the user. To do so, make `power_dpm_force_performance_level` writable for non-root users, then set the performance level to `profile_standard`:

```bash
sudo chmod 777 /sys/class/drm/card0/device/power_dpm_force_performance_level
echo profile_standard >> /sys/class/drm/card0/device/power_dpm_force_performance_level
```

Recommended: `profile_standard` for counter collection and `auto` for all other profiling. Use `rocm-smi` to verify the current power state. On multi-GPU systems (including systems with integrated graphics), replace `card0` with the desired card.
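Switching between `profile_standard` and `auto` for different cards can be wrapped in a small helper. The sketch below is hypothetical (`perf_level_file` and `set_perf_level` are not part of ROCm). It builds the sysfs path from the card number, as described above. It also takes the sysfs root as a parameter, which lets you try the logic on a scratch directory. Writing to the real file still requires the permission change above, or root.

```bash
#!/usr/bin/env bash
# Hypothetical helpers around the sysfs file shown above. The sysfs root is a
# parameter (default /sys/class/drm) so the logic can be tried on a scratch directory.
perf_level_file() {
  local card=${1:-0} root=${2:-/sys/class/drm}
  printf '%s/card%s/device/power_dpm_force_performance_level\n' "$root" "$card"
}

# Usage: set_perf_level <level> [card] [sysfs_root]
# Writing the real file requires the permission change above (or root).
set_perf_level() {
  local level=$1 file
  file=$(perf_level_file "${2:-0}" "${3:-/sys/class/drm}")
  if [[ ! -e $file ]]; then
    echo "no such file: $file (wrong card number?)" >&2
    return 1
  fi
  echo "$level" > "$file"
}
```

For example, run `set_perf_level profile_standard 0` before counter collection and `set_perf_level auto 0` afterwards, then confirm the state with `rocm-smi`.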
## Supported AMD GPU Architectures (V2)

The following AMD GPU architectures are supported with ROCprofiler V2:

- gfx900 (AMD Vega 10)
- gfx906 (AMD Vega 7nm, also referred to as AMD Vega 20)
- gfx908 (AMD Instinct™ MI100 accelerator)
- gfx90a (AMD Instinct™ MI200)
- gfx94x (AMD Instinct™ MI300)
- gfx10xx ([Navi2x] AMD Radeon™ Graphics)
- gfx11xx ([Navi3x] AMD Radeon™ Graphics)
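A quick way to check a target name against this list is sketched below with a hypothetical helper, `is_v2_supported`. It assumes that the `x` characters in `gfx94x`, `gfx10xx`, and `gfx11xx` stand for any single digit or letter. That reading is an interpretation of the list, not an official compatibility check.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check a gfx target name against the V2 list above.
# Assumption: each "x" in gfx94x / gfx10xx / gfx11xx stands for any single digit or letter.
is_v2_supported() {
  case $1 in
    gfx900|gfx906|gfx908|gfx90a) return 0 ;;
    gfx94[0-9a-z]) return 0 ;;
    gfx10[0-9][0-9a-z]|gfx11[0-9][0-9a-z]) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, the targets on the current machine could be listed with `rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u` and each one passed to `is_v2_supported`.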