SWDEV-421283: ATT readme updates
Change-Id: I282a53ba6a6349b10c5976ab4232f451f51d82f6
Цей коміт міститься в:
зафіксовано
Giovanni Baraldi
джерело
865d2782e8
коміт
fdd4af4023
+40
-37
@@ -304,59 +304,58 @@ Usage:
|
||||
Tool used to collect fine-grained hardware metrics. Provides ISA-level instruction hotspot analysis via hardware tracing.
|
||||
|
||||
- Install plugin package. See Plugin Support section for installation
|
||||
- Run the following to view the trace. Att-specific options must come right after the assembly file
|
||||
- Run the following to view the trace. Att-specific options must come right after the assembly file.
|
||||
- On ROCm 6.0, ATT enables automatic capture of the ISA during kernel execution, and does not require recompiling. It is recommeneded to leave at "auto".
|
||||
|
||||
```bash
|
||||
rocprofv2 -i input.txt --plugin att <app_assembly_file> --mode network <app_relative_path>
|
||||
rocprofv2 -i input.txt --plugin att auto --mode csv <app_relative_path>
|
||||
# This is the 5.7 version:
|
||||
# rocprofv2 -i input.txt --plugin att <app_assembly_file> --mode csv <app_relative_path>
|
||||
```
|
||||
|
||||
- app_assembly_file:
|
||||
On ROCm 6.0, ATT enables automatic capture of the ISA during kernel execution, and does not require recompiling. It is recommeneded to leave at "auto".
|
||||
- app_relative_path
|
||||
Path for the running application
|
||||
- ATT plugin optional parameters
|
||||
- --depth [n]: How many waves per slot to parse (maximum).
|
||||
- --mpi [proc]: Parse with this many mpi processes, for greater analysis speed. Does not change results. Requires mpi4py.
|
||||
- --att_kernel "filename": Kernel filename to use (instead of ATT asking which one to use).
|
||||
- --trace_file "files": glob (wildcards allowed) of traces files to parse. Requires quotes for use with wildcards.
|
||||
- --mode [network, file, csv, off (default)]
|
||||
- network
|
||||
- --att_kernel "filename": Kernel filename(s) (glob) to use. A CSV file (or UI folder) will be generated for each kernel.txt file. Default: all in current folder.
|
||||
- --mode [csv, network, file, off (default)]
|
||||
- off
|
||||
Runs trace collection but not analysis, so it can be analyzed at a later time. Run rocprofv2 ATT with the same parameters (+ --mode csv), removing the application binary, to analyze previously generated traces.
|
||||
- csv
|
||||
Dumps the analyzed assembly into a CSV format, with the hitcount and total cycles cost. Recommended mode for most users.
|
||||
- network (deprecated)
|
||||
Opens the server with the browser UI.
|
||||
att needs 2 ports available (e.g. 8000, 18000). There is an option (default: --ports "8000,18000") to change these.
|
||||
In case rocprofv2 is running on a different machine, use port forwarding `ssh -L 8000:localhost:8000 <user@IP>` so the browser can be used locally. For docker, use --network=host --ipc=host -p8000:8000 -p18000:18000
|
||||
- file
|
||||
- file (deprecated)
|
||||
Dumps the analyzed json files to disk for vieweing at a later time. Run python3 httpserver.py from within the generated ui/ folder to view the trace, similarly to network mode. The folder can be copied to another machine, and will run without rocm.
|
||||
- csv
|
||||
Dumps the analyzed assembly into a CSV format, with the hitcount and total cycles cost.
|
||||
Use rocprofv2's -o option to specify output file name (default "att_output.csv").
|
||||
- off
|
||||
Runs trace collection but not analysis, so it can be analyzed at a later time. Run rocprofv2 ATT [network, file] with the same parameters, removing the application binary, to analyze previously generated traces.
|
||||
- input.txt
|
||||
Required. Used to select specific compute units and other trace parameters.
|
||||
For first time users, we recommend compiling and running vectorAdd with
|
||||
For first time users, using the following input file:
|
||||
|
||||
```string
|
||||
```bash
|
||||
# vectoradd
|
||||
att: TARGET_CU=1
|
||||
SE_MASK=0x1
|
||||
SIMD_MASK=0x3
|
||||
SIMD_SELECT=0x3
|
||||
```
|
||||
|
||||
and histogram with
|
||||
|
||||
```string
|
||||
```bash
|
||||
# histogram
|
||||
att: TARGET_CU=0
|
||||
SE_MASK=0xFF
|
||||
SIMD_MASK=0xF // 0xF for GFX9, SIMD_MASK=0 for Navi
|
||||
SIMD_SELECT=0xF // 0xF for GFX9, SIMD_MASK=0 for Navi
|
||||
```
|
||||
|
||||
Possible contents:
|
||||
- att: TARGET_CU=1 //or some other CU [0,15] - WGP for Navi [0,8]
|
||||
- att: TARGET_CU=1 // or some other CU [0,15] - WGP for Navi [0,8]
|
||||
- SE_MASK=0x1 // bitmask of shader engines. The fewer, the easier on the hardware. Default enables 1 out of 4 shader engines.
|
||||
- SIMD_MASK=0xF // GFX9: bitmask of SIMDs. Navi: SIMD Index [0-3].
|
||||
- DISPATCH=ID,RN // collect trace only for the given dispatch_ID and MPI rank RN. RN ignored for single processes. Multiple lines with varying combinations of RN and ID can be added.
|
||||
- SIMD_SELECT=0xF // GFX9: bitmask of SIMDs. Navi: SIMD Index [0-3]. Recommended 0xF for GFX9 and 0x0 for Navi.
|
||||
- DISPATCH=ID // collect trace only for the given dispatch_ID. Multiple lines for can be added.
|
||||
- DISPATCH=ID,RN // collect trace only for the given dispatch_ID and MPI rank RN. Multiple lines with varying combinations of RN and ID can be added.
|
||||
- KERNEL=kernname // Profile only kernels containing the string kernname (c++ mangled name). Multiple lines can be added.
|
||||
- PERFCOUNTERS_COL_PERIOD=0x3 // Multiplier period for counter collection [0~31]. 0=fastest (usually once every 16 cycles). GFX9 only. Counters will be shown in a graph over time in the browser UI.
|
||||
- PERFCOUNTER=counter_name // Add a SQ counter to be collected with ATT; period defined by PERFCOUNTERS_COL_PERIOD. GFX9 only.
|
||||
- PERFCOUNTERS_CTRL=0x3 // Multiplier period for counter collection [0~31]. 0=fastest. GFX9 only.
|
||||
- PERFCOUNTER_MASK=0xFFF // Bitmask for perfcounter collection. GFX9 only.
|
||||
- PERFCOUNTER=counter_name // Add a SQ counter to be collected with ATT; period defined by PERFCOUNTERS_CTRL. GFX9 only.
|
||||
- BUFFER_SIZE=[size] // Sets size of the ATT buffer collection, per dispatch, in megabytes (shared among all shader engines).
|
||||
- ISA_CAPTURE_MODE=[0,1,2] // Set capture mode during kernel dispatch.
|
||||
- 0 = capture symbols only.
|
||||
@@ -366,23 +365,21 @@ Tool used to collect fine-grained hardware metrics. Provides ISA-level instructi
|
||||
- Example for vectoradd.
|
||||
|
||||
```bash
|
||||
# -g add debugging symbols to the binary. Required only for tracking disassembly back to c++.
|
||||
hipcc -g --save-temps vectoradd_hip.cpp -o vectoradd_hip.exe
|
||||
# "auto" means to use the automatically captured ISA. "csv" dumps result to "my.csv".
|
||||
rocprofv2 -i input.txt -o my.csv --plugin att auto --mode csv ./vectoradd_hip.exe
|
||||
# -g adds debugging symbols to the binary. Required only for tracking disassembly back to c++.
|
||||
hipcc -g vectoradd_hip.cpp -o vectoradd_hip.exe
|
||||
# "auto" means to use the automatically captured ISA. "csv" dumps result to "vectoradd_float_v0.csv".
|
||||
rocprofv2 -i input.txt --plugin att auto --mode csv ./vectoradd_hip.exe
|
||||
```
|
||||
Instruction latencies will be in my.csv
|
||||
Instruction latencies will be in vectoradd_float_v0.csv
|
||||
|
||||
- Example with symbolic ISA (as in ROCm 5.7 or previous).
|
||||
```bash
|
||||
hipcc -g --save-temps vectoradd_hip.cpp -o vectoradd_hip.exe
|
||||
# A custom ISA can be used such as vectoradd_hip-hip-amdgcn-amd-amdhsa-gfx1100.s
|
||||
# Special attention to the correct architecture for the ISA, such as "gfx1100" (navi31).
|
||||
rocprofv2 -i input.txt --plugin att vectoradd_hip-hip-amdgcn-amd-amdhsa-gfx1100.s --mode network ./vectoradd_hip.exe
|
||||
rocprofv2 -i input.txt --plugin att vectoradd_hip-hip-amdgcn-amd-amdhsa-gfx1100.s --mode csv ./vectoradd_hip.exe
|
||||
```
|
||||
|
||||
Then open the browser at `http://localhost:8000`
|
||||
|
||||
***
|
||||
Note: For MPI or long running applications, we recommend to run collection, and later run the parser with already collected data:
|
||||
Run only collection: The assembly file is not used. Use mpirun [...] rocprofv2 [...] if needed.
|
||||
@@ -394,7 +391,13 @@ Tool used to collect fine-grained hardware metrics. Provides ISA-level instructi
|
||||
Remove the binary/application from the command line.
|
||||
```bash
|
||||
# Only runs the parser on previously collected data.
|
||||
rocprofv2 -i input.txt --plugin att auto --mode network
|
||||
rocprofv2 -i input.txt --plugin att auto --mode csv
|
||||
```
|
||||
|
||||
Note 2: By default, ATT only collects a SINGLE kernel dispatch for the whole application, which is the first dispatch matching the given filters (DISPATCH=<id> or KERNEL=<name>). To collect multiple dispatches in a single application run, use:
|
||||
|
||||
```bash
|
||||
export ROCPROFILER_MAX_ATT_PROFILES=<max_collections>
|
||||
```
|
||||
|
||||
***
|
||||
|
||||
Посилання в новій задачі
Заблокувати користувача