SWDEV-421283: ATT readme updates

Change-Id: I282a53ba6a6349b10c5976ab4232f451f51d82f6
Giovanni LB
2023-10-11 00:52:39 -03:00
committed by Giovanni Baraldi
parent 865d2782e8
commit fdd4af4023
+40 -37
@@ -304,59 +304,58 @@ Usage:
Tool used to collect fine-grained hardware metrics. Provides ISA-level instruction hotspot analysis via hardware tracing.
- Install plugin package. See Plugin Support section for installation
- Run the following to view the trace. Att-specific options must come right after the assembly file
- Run the following to view the trace. Att-specific options must come right after the assembly file.
- On ROCm 6.0, ATT enables automatic capture of the ISA during kernel execution, and does not require recompiling. It is recommended to leave this at "auto".
```bash
rocprofv2 -i input.txt --plugin att <app_assembly_file> --mode network <app_relative_path>
rocprofv2 -i input.txt --plugin att auto --mode csv <app_relative_path>
# This is the 5.7 version:
# rocprofv2 -i input.txt --plugin att <app_assembly_file> --mode csv <app_relative_path>
```
- app_assembly_file:
On ROCm 6.0, ATT enables automatic capture of the ISA during kernel execution, and does not require recompiling. It is recommended to leave this at "auto".
- app_relative_path
Path for the running application
- ATT plugin optional parameters
- --depth [n]: How many waves per slot to parse (maximum).
- --mpi [proc]: Parse with this many mpi processes, for greater analysis speed. Does not change results. Requires mpi4py.
- --att_kernel "filename": Kernel filename to use (instead of ATT asking which one to use).
- --trace_file "files": glob (wildcards allowed) of trace files to parse. Requires quotes for use with wildcards.
- --mode [network, file, csv, off (default)]
- network
- --att_kernel "filename": Kernel filename(s) (glob) to use. A CSV file (or UI folder) will be generated for each kernel.txt file. Default: all in current folder.
- --mode [csv, network, file, off (default)]
- off
Runs trace collection but not analysis, so it can be analyzed at a later time. Run rocprofv2 ATT with the same parameters (+ --mode csv), removing the application binary, to analyze previously generated traces.
- csv
Dumps the analyzed assembly into a CSV format, with the hitcount and total cycles cost. Recommended mode for most users.
- network (deprecated)
Opens the server with the browser UI.
att needs 2 ports available (e.g. 8000, 18000). There is an option (default: --ports "8000,18000") to change these.
In case rocprofv2 is running on a different machine, use port forwarding `ssh -L 8000:localhost:8000 <user@IP>` so the browser can be used locally. For docker, use --network=host --ipc=host -p8000:8000 -p18000:18000
- file
- file (deprecated)
Dumps the analyzed JSON files to disk for viewing at a later time. Run python3 httpserver.py from within the generated ui/ folder to view the trace, similarly to network mode. The folder can be copied to another machine, and will run without ROCm.
- csv
Dumps the analyzed assembly into a CSV format, with the hitcount and total cycles cost.
Use rocprofv2's -o option to specify output file name (default "att_output.csv").
- off
Runs trace collection but not analysis, so it can be analyzed at a later time. Run rocprofv2 ATT [network, file] with the same parameters, removing the application binary, to analyze previously generated traces.
- input.txt
Required. Used to select specific compute units and other trace parameters.
For first time users, we recommend compiling and running vectorAdd with
For first-time users, we recommend using the following input file:
```string
```bash
# vectoradd
att: TARGET_CU=1
SE_MASK=0x1
SIMD_MASK=0x3
SIMD_SELECT=0x3
```
and histogram with
```string
```bash
# histogram
att: TARGET_CU=0
SE_MASK=0xFF
SIMD_MASK=0xF // 0xF for GFX9, SIMD_MASK=0 for Navi
SIMD_SELECT=0xF // 0xF for GFX9, SIMD_SELECT=0 for Navi
```
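As a minimal sketch, the vectoradd input file shown above could be created from the shell with a here-document. The file name `input.txt` matches the commands in this section; writing it into the current working directory is an assumption.

```bash
# Write the first-time-user ATT input file (vectoradd variant) shown above.
# Assumption: rocprofv2 is later invoked from this directory with "-i input.txt".
cat > input.txt <<'EOF'
att: TARGET_CU=1
SE_MASK=0x1
SIMD_SELECT=0x3
EOF
```

The quoted `'EOF'` delimiter keeps the shell from expanding anything inside the file body.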
Possible contents:
- att: TARGET_CU=1 //or some other CU [0,15] - WGP for Navi [0,8]
- att: TARGET_CU=1 // or some other CU [0,15] - WGP for Navi [0,8]
- SE_MASK=0x1 // bitmask of shader engines. The fewer, the easier on the hardware. Default enables 1 out of 4 shader engines.
- SIMD_MASK=0xF // GFX9: bitmask of SIMDs. Navi: SIMD Index [0-3].
- DISPATCH=ID,RN // collect trace only for the given dispatch_ID and MPI rank RN. RN ignored for single processes. Multiple lines with varying combinations of RN and ID can be added.
- SIMD_SELECT=0xF // GFX9: bitmask of SIMDs. Navi: SIMD Index [0-3]. Recommended 0xF for GFX9 and 0x0 for Navi.
- DISPATCH=ID // collect trace only for the given dispatch_ID. Multiple lines can be added.
- DISPATCH=ID,RN // collect trace only for the given dispatch_ID and MPI rank RN. Multiple lines with varying combinations of RN and ID can be added.
- KERNEL=kernname // Profile only kernels containing the string kernname (c++ mangled name). Multiple lines can be added.
- PERFCOUNTERS_COL_PERIOD=0x3 // Multiplier period for counter collection [0~31]. 0=fastest (usually once every 16 cycles). GFX9 only. Counters will be shown in a graph over time in the browser UI.
- PERFCOUNTER=counter_name // Add an SQ counter to be collected with ATT; period defined by PERFCOUNTERS_COL_PERIOD. GFX9 only.
- PERFCOUNTERS_CTRL=0x3 // Multiplier period for counter collection [0~31]. 0=fastest. GFX9 only.
- PERFCOUNTER_MASK=0xFFF // Bitmask for perfcounter collection. GFX9 only.
- PERFCOUNTER=counter_name // Add an SQ counter to be collected with ATT; period defined by PERFCOUNTERS_CTRL. GFX9 only.
- BUFFER_SIZE=[size] // Sets size of the ATT buffer collection, per dispatch, in megabytes (shared among all shader engines).
- ISA_CAPTURE_MODE=[0,1,2] // Set capture mode during kernel dispatch.
- 0 = capture symbols only.
@@ -366,23 +365,21 @@ Tool used to collect fine-grained hardware metrics. Provides ISA-level instructi
- Example for vectoradd.
```bash
# -g add debugging symbols to the binary. Required only for tracking disassembly back to c++.
hipcc -g --save-temps vectoradd_hip.cpp -o vectoradd_hip.exe
# "auto" means to use the automatically captured ISA. "csv" dumps result to "my.csv".
rocprofv2 -i input.txt -o my.csv --plugin att auto --mode csv ./vectoradd_hip.exe
# -g adds debugging symbols to the binary. Required only for tracking disassembly back to c++.
hipcc -g vectoradd_hip.cpp -o vectoradd_hip.exe
# "auto" means to use the automatically captured ISA. "csv" dumps result to "vectoradd_float_v0.csv".
rocprofv2 -i input.txt --plugin att auto --mode csv ./vectoradd_hip.exe
```
Instruction latencies will be in my.csv
Instruction latencies will be in vectoradd_float_v0.csv
- Example with symbolic ISA (as in ROCm 5.7 or previous).
```bash
hipcc -g --save-temps vectoradd_hip.cpp -o vectoradd_hip.exe
# A custom ISA can be used such as vectoradd_hip-hip-amdgcn-amd-amdhsa-gfx1100.s
# Special attention to the correct architecture for the ISA, such as "gfx1100" (navi31).
rocprofv2 -i input.txt --plugin att vectoradd_hip-hip-amdgcn-amd-amdhsa-gfx1100.s --mode network ./vectoradd_hip.exe
rocprofv2 -i input.txt --plugin att vectoradd_hip-hip-amdgcn-amd-amdhsa-gfx1100.s --mode csv ./vectoradd_hip.exe
```
Then open the browser at `http://localhost:8000`
***
Note: For MPI or long-running applications, we recommend running collection first and running the parser later on the already collected data:
Run only collection: The assembly file is not used. Use mpirun [...] rocprofv2 [...] if needed.
@@ -394,7 +391,13 @@ Tool used to collect fine-grained hardware metrics. Provides ISA-level instructi
Remove the binary/application from the command line.
```bash
# Only runs the parser on previously collected data.
rocprofv2 -i input.txt --plugin att auto --mode network
rocprofv2 -i input.txt --plugin att auto --mode csv
```
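The two phases described in this note can be sketched as a pair of shell helpers. The function names `att_collect` and `att_parse` and the `ROCPROF` override variable are hypothetical, not part of rocprofv2. Using `--mode off` for the collection phase follows the mode description earlier in this section.

```bash
# Hypothetical helpers (not part of rocprofv2) wrapping the two phases.
# ROCPROF can be overridden, e.g. for a dry run; it defaults to rocprofv2.

# Phase 1: collect traces only. "--mode off" skips the analysis step.
att_collect() {
  local input="$1"; shift
  "${ROCPROF:-rocprofv2}" -i "$input" --plugin att auto --mode off "$@"
}

# Phase 2: parse previously collected traces.
# Same parameters as phase 1, but without the application binary.
att_parse() {
  local input="$1"
  "${ROCPROF:-rocprofv2}" -i "$input" --plugin att auto --mode csv
}

# Usage (commented out so that sourcing this snippet has no side effects):
# att_collect input.txt ./vectoradd_hip.exe
# att_parse input.txt
```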
Note 2: By default, ATT only collects a SINGLE kernel dispatch for the whole application, which is the first dispatch matching the given filters (DISPATCH=<id> or KERNEL=<name>). To collect multiple dispatches in a single application run, use:
```bash
export ROCPROFILER_MAX_ATT_PROFILES=<max_collections>
```
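If the limit should apply to a single run rather than the whole shell session, the variable can be set for one command only. The sketch below assumes a hypothetical wrapper name `att_collect_many` and a hypothetical `ROCPROF` override variable. Only `ROCPROFILER_MAX_ATT_PROFILES` itself comes from this README.

```bash
# Hypothetical wrapper: run one rocprofv2 command with a per-run dispatch limit.
# $1 is the maximum number of dispatches to collect; remaining arguments are
# passed to rocprofv2 (or to the command named in ROCPROF) unchanged.
att_collect_many() {
  local max="$1"; shift
  ROCPROFILER_MAX_ATT_PROFILES="$max" "${ROCPROF:-rocprofv2}" "$@"
}

# Usage (commented out so that sourcing this snippet has no side effects):
# att_collect_many 4 -i input.txt --plugin att auto --mode csv ./vectoradd_hip.exe
```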
***