issue 40 typos fix

Change-Id: I486301c42bc5691a4d8a852e0ce168f8ca7776a0


[ROCm/rocprofiler commit: 1a63cba43d]
gobhardw
2024-08-07 23:00:28 +05:30
committed by Giovanni Baraldi
parent f731ae7593
commit b363586f82
5 changed files with 22 additions and 22 deletions
+5 -5
@@ -231,14 +231,14 @@ Rocprofiler for ROCm 5.7 added support for counter collection (PMC) and advanced
 ### Changed
-- ATT analysis will not run by default. For ATT to have the same behaviour as 5.5, use --plugin att <as.s> --mode network
+- ATT analysis will not run by default. For ATT to have the same behavior as 5.5, use --plugin att <as.s> --mode network
 - Kernel Names are now removed from HIP API records, users of the API can get the kernel names from the corresponding HIP Dispatch OPS using the correlation ID, this change was done to optimize and to manage the data copied.
 - Removing Replay modes as we discovered that some of them will corrupt the applications' behavior, we will re-add them once we implement the fix for them.
 ### Optimized
-- Improved ATT parser performance and filesizes.
-- Now profiler autocorrects user input errors for pmc and throws exception for wrong input with this message:"Bad input metric. usage --> pmc: [counter1] [counter2]"
+- Improved ATT parser performance and file sizes.
+- Now profiler autocorrect user input errors for pmc and throws exception for wrong input with this message:"Bad input metric. usage --> pmc: [counter1] [counter2]"
 ### Added
@@ -249,7 +249,7 @@ Rocprofiler for ROCm 5.7 added support for counter collection (PMC) and advanced
 - MI300 individual XCC counters dumped per-xcc as separate records but with same record-id and kernel dispatch info
 - Naming for MPI ranks. Filenames containing "%rank" are replaced by variables "MPI_RANK", "OMPI_COMM_WORLD_RANK" or "MV2_COMM_WORLD_RANK".
 - MPI Rank will appear in perfetto track names.
-- File plugin is splitted to File & CLI plugins, CLI plugin is responsible for showing results on the terminal screen and will be automatically the choice if no -d option given in rocprof, File plugin on the other hand is responsible for writing the output results in files if -d option is given.
+- File plugin has been split to File & CLI plugins, CLI plugin is responsible for showing results on the terminal screen and will be automatically the choice if no -d option given in rocprof, File plugin on the other hand is responsible for writing the output results in files if -d option is given.
 - Structure of the results is different for both CLI & File plugin; File plugin will make sure every type of result is in a separate file, starting by specifying the header; CLI plugin will have the records in the old way.
 Example for file plugin output:
@@ -268,7 +268,7 @@ Example for file plugin output:
 ```
 - Removing Record IDs from tracer records in CLI plugin.
-- Added Flush Interval and Trace Period functionality, where --flush-interval [time_in_ms], for flushing the buffers every given interval by the user, and --trace-period [delay]:[trace_time]:[interval], where delay is the time to wait before starting session, trace_time is the time between every start and stop session and interval the time between two consecutive sessions (ommiting interval = infinite). For more details please refer to the ROCProfV2 tool usage document.
+- Added Flush Interval and Trace Period functionality, where --flush-interval [time_in_ms], for flushing the buffers every given interval by the user, and --trace-period [delay]:[trace_time]:[interval], where delay is the time to wait before starting session, trace_time is the time between every start and stop session and interval the time between two consecutive sessions (omitting interval = infinite). For more details please refer to the ROCProfV2 tool usage document.
 - Added requirements.txt to be used to install all the necessary python3 packages.
 - ATT plugin:
 - Added --mode, --mpi and --depth parameters.
+5 -5
@@ -154,7 +154,7 @@ The user has two options for building:
 ./build.sh --clean-build OR ./build.sh -cb
 ```
-- Option 2 (Where ROCM_PATH envronment need to be set with the current installation directory of rocm), run the following:
+- Option 2 (Where ROCM_PATH environment need to be set with the current installation directory of rocm), run the following:
 - Creating the build directory
@@ -353,7 +353,7 @@ Tool used to collect fine-grained hardware metrics. Provides ISA-level instructi
 - Install plugin package. See Plugin Support section for installation
 - Run the following to view the trace. Att-specific options must come right after the assembly file.
-- On ROCm 6.0, ATT enables automatic capture of the ISA during kernel execution, and does not require recompiling. It is recommeneded to leave at "auto".
+- On ROCm 6.0, ATT enables automatic capture of the ISA during kernel execution, and does not require recompiling. It is recommended to leave at "auto".
 ```bash
 rocprofv2 -i input.txt --plugin att auto --mode csv <app_relative_path>
@@ -371,7 +371,7 @@ Tool used to collect fine-grained hardware metrics. Provides ISA-level instructi
 - csv
 Dumps the analyzed assembly into a CSV format, with the hitcount and total cycles cost. Recommended mode for most users.
 - file (deprecated)
-Dumps the analyzed json files to disk for vieweing at a later time. Run python3 httpserver.py from within the generated name_ui/ folder to view the trace. The folder can be copied to another machine, and will run without rocm.
+Dumps the analyzed json files to disk for viewing at a later time. Run python3 httpserver.py from within the generated name_ui/ folder to view the trace. The folder can be copied to another machine, and will run without rocm.
 - file,csv
 Both options can be used at the same time, generating a UI folder and a .csv.
 - network [removed]
@@ -503,7 +503,7 @@ A device profiling session allows the user to profile the GPU device for counter
 - unittests (Gtest Based) : These includes tests for core classes. Any newly added functionality should have a unit test written to it.
-- featuretests (standalone and Gtest Based): These includes both API tests and tool tests. Tool is tested against different applications to make sure we have right output in evry run.
+- featuretests (standalone and Gtest Based): These includes both API tests and tool tests. Tool is tested against different applications to make sure we have right output in every run.
 - memorytests (standalone): This includes running address sanitizer for memory leaks, corruptions.
@@ -607,7 +607,7 @@ samples can be run as independent executables once installed
 - plugin
 - file: File Plugin
 - perfetto: Perfetto Plugin
-- att: Adavced thread tracer Plugin
+- att: Advanced thread tracer Plugin
 - ctf: CTF Plugin
 - samples: Samples of how to use the API, and also input.txt input file samples for counter collection and ATT.
 - script: Scripts needed for tracing
@@ -124,7 +124,7 @@ Info API:
 - rocprofiler_info_query_t - profiling info query
 - rocprofiler_info_data_t - profiling info data
 - rocprofiler_get_info - return the info for a given info kind
-- rocprofiler_iterote_inf_ - iterate over the info for a given info kind
+- rocprofiler_iterate_inf_ - iterate over the info for a given info kind
 - rocprofiler_query_info - iterate over the info for a given info query
 Context API:
@@ -38,7 +38,7 @@ ROCProfiler: input from "/tmp/rpl_data_191018_011134_9695/input0.xml"
 Device name Ellesmere [Radeon RX 470/480/570/570X/580/580X]
 PASSED!
-ROCPRofiler: 1 contexts collected, output directory /tmp/rpl_data_191018_011134_9695/input0_results_191018_011134
+ROCprofiler: 1 contexts collected, output directory /tmp/rpl_data_191018_011134_9695/input0_results_191018_011134
 RPL: '/…./MatrixTranspose/input.csv' is generated
 ```
 #### 2.1.1. Counters and metrics
@@ -94,7 +94,7 @@ Derived metrics:
 TCC_HIT_sum = sum(TCC_HIT,16)
 gpu-agent0 : TCC_MISS_sum : Number of cache misses. Sum over TCC instances.
 TCC_MISS_sum = sum(TCC_MISS,16)
-gpu-agent0 : TCC_MC_RDREQ_sum : Number of 32-byte reads. Sum over TCC instaces.
+gpu-agent0 : TCC_MC_RDREQ_sum : Number of 32-byte reads. Sum over TCC instances.
 TCC_MC_RDREQ_sum = sum(TCC_MC_RDREQ,16)
 . . .
 ```
@@ -199,7 +199,7 @@ hsa: hsa_queue_create hsa_amd_memory_pool_allocate
 #### 3.2.2. Tracing time period
 Trace can be dumped periodically with initial delay, dumping period length and rate:
 ```
---trace-period <dealy:length:rate>
+--trace-period <delay:length:rate>
 ```
 ### 3.3. Concurrent kernels
 Currently concurrent kernels profiling is not supported which is a planned feature. Kernels are serialized.
@@ -268,12 +268,12 @@ Options:
 -o <output file> - output CSV file [<input file base>.csv]
 -d <data directory> - directory where profiler store profiling data including traces [/tmp]
-The data directory is renoving autonatically if the directory is matching the temporary one, which is the default.
+The data directory is removed automatically if the directory is matching the temporary one, which is the default.
 -t <temporary directory> - to change the temporary directory [/tmp]
 By changing the temporary directory you can prevent removing the profiling data from /tmp or enable removing from not '/tmp' directory.
 --basenames <on|off> - to turn on/off truncating of the kernel full function names till the base ones [off]
---timestamp <on|off> - to turn on/off the kernel disoatches timestamps, dispatch/begin/end/complete [off]
+--timestamp <on|off> - to turn on/off the kernel dispatches timestamps, dispatch/begin/end/complete [off]
 --ctx-wait <on|off> - to wait for outstanding contexts on profiler exit [on]
 --ctx-limit <max number> - maximum number of outstanding contexts [0 - unlimited]
 --heartbeat <rate sec> - to print progress heartbeats [0 - disabled]
@@ -297,11 +297,11 @@ Options:
</trace>
 --trace-start <on|off> - to enable tracing on start [on]
---trace-period <dealy:length:rate> - to enable trace with initial delay, with periodic sample length and rate
+--trace-period <delay:length:rate> - to enable trace with initial delay, with periodic sample length and rate
 Supported time formats: <number(m|s|ms|us)>
 Configuration file:
-You can set your parameters defaults preferences in the configuration file 'rpl_rc.xml'. The search path sequence: .:/home/evgeny:<package path>
+You can set your parameters defaults preferences in the configuration file 'rpl_rc.xml'. The search path sequence: .:/home/user:<package path>
 First the configuration file is looking in the current directory, then in your home, and then in the package directory.
 Configurable options: 'basenames', 'timestamp', 'ctx-limit', 'heartbeat', 'obj-tracking'.
 An example of 'rpl_rc.xml':
@@ -51,7 +51,7 @@ The user has two options for building:
 ./build.sh --clean-build OR ./build.sh -cb
 ```
-- Option 2 (Where ROCM_PATH envronment need to be set with the current installation directory of rocm), run the following:
+- Option 2 (Where ROCM_PATH environment need to be set with the current installation directory of rocm), run the following:
 ```bash
 # Creating the build directory
 mkdir build && cd build
@@ -121,7 +121,7 @@ The user has two options for building:
 pmc: SQ_WAVES GRBM_COUNT GRBM_GUI_ACTIVE SQ_INSTS_VALU
 ```
-- Application Trace Support: Differnt trace options are available while profiling an app:
+- Application Trace Support: Different trace options are available while profiling an app:
 ```bash
 # HIP API & asynchronous activity tracing
 rocprofv2 --hip-api <app_relative_path> ## For synchronous HIP API Activity tracing
@@ -202,7 +202,7 @@ The user has two options for building:
 - unittests (Gtest Based) : These includes tests for core classes. Any newly added functionality should have a unit test written to it.
-- featuretests (standalone and Gtest Based): These includes both API tests and tool tests. Tool is tested against different applications to make sure we have right output in evry run.
+- featuretests (standalone and Gtest Based): These includes both API tests and tool tests. Tool is tested against different applications to make sure we have right output in every run.
 - memorytests (standalone): This includes running address sanitizer for memory leaks, corruptions.
@@ -267,7 +267,7 @@ samples can be run as independent executables once installed
 - plugin
 - file: File Plugin
 - perfetto: Perfetto Plugin
-- att: Adavced thread tracer Plugin
+- att: Advanced thread tracer Plugin
 - ctf: CTF Plugin
 - samples: Samples of how to use the API, and also input.txt input file samples for counter collection and ATT.
 - script: Scripts needed for tracing
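
Review note: the corrected help text in this commit documents `--trace-period <delay:length:rate>` with time values of the form `<number(m|s|ms|us)>`. As a reading aid only, the sketch below shows one way such a value could be split and normalized to microseconds. It is not taken from the rocprof sources: the function names `parse_time_us` and `parse_trace_period` are hypothetical, treating `m` as minutes is an assumption, and requiring all three fields is an assumption as well.

```python
import re

# Multipliers to microseconds. Reading "m" as minutes is an assumption;
# the help text only lists the suffixes m, s, ms and us.
_UNIT_TO_US = {"m": 60_000_000, "s": 1_000_000, "ms": 1_000, "us": 1}

# Two-letter suffixes are listed first so "ms" is not read as "m".
_TIME_RE = re.compile(r"(\d+)(ms|us|m|s)")


def parse_time_us(text):
    """Convert a value such as '500ms' to an integer number of microseconds."""
    match = _TIME_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"bad time value: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_TO_US[unit]


def parse_trace_period(spec):
    """Split a 'delay:length:rate' string into microsecond values.

    Hypothetical helper: all three fields are required here, which may be
    stricter than the real tool.
    """
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected delay:length:rate, got {spec!r}")
    delay, length, rate = (parse_time_us(part) for part in parts)
    return {"delay_us": delay, "length_us": length, "rate_us": rate}
```

For example, `parse_trace_period("1s:500ms:2s")` yields a delay of 1,000,000 us, a length of 500,000 us and a rate of 2,000,000 us, while a value without a suffix such as `"10"` is rejected with a `ValueError`.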