Implement docs feedback: typo fixes and minor enhancements

Co-authored-by: Jose Santos <josantos@amd.com>
Signed-off-by: colramos-amd <colramos@amd.com>
[ROCm/rocprofiler-compute commit: cf36fb7fbf]
2024-03-26 12:54:51 -05:00
@@ -5,7 +5,7 @@
   :glob:
   :maxdepth: 4
 ```
-Omniperf offers several ways to interact with the metrics it generates from profiling. The option you choose will likely be influnced by your familiarity with the profiled application, computing enviroment, and experience with Omniperf.
+Omniperf offers several ways to interact with the metrics it generates from profiling. The option you choose will likely be influenced by your familiarity with the profiled application, computing environment, and experience with Omniperf.

 While analyzing with the CLI offers quick and straightforward access to Omniperf metrics from terminal, the GUI adds an extra layer of styling and interactiveness some users may prefer.

@@ -59,7 +59,7 @@ Run `omniperf analyze -h` for more details.
  [analysis] deriving Omniperf metrics...

  --------------------------------------------------------------------------------
-  Detected Kernels (sorted decending by duration)
+  Detected Kernels (sorted descending by duration)
  ╒════╤══════════════════════════════════════════════╕
  │    │ Kernel_Name                                  │
  ╞════╪══════════════════════════════════════════════╡
@@ -472,7 +472,7 @@ Multiple performance number normalizations are provided to allow performance ins
 ##### Baseline Comparison
 Omniperf enables baseline comparison to allow checking A/B effect. Currently baseline comparison is limited to the same SoC. Cross comparison between SoCs is in development.

-For both the Current Workload and the Baseline Workload, one can independently setup the following filters to allow fine grained comparions:
+For both the Current Workload and the Baseline Workload, one can independently set up the following filters to allow fine-grained comparisons:
 - Workload Name 
 - GPU ID filtering (multi-selection)
 - Kernel Name filtering (multi-selection)
@@ -576,7 +576,7 @@ Found sysinfo file
 KernelName shortening enabled
 Kernel name verbose level: 2
 Password:
-Password recieved
+Password received
 -- Conversion & Upload in Progress --
  0%|                                                                                                                                                                                                             | 0/11 [00:00<?, ?it/s]/home/auser/repos/omniperf/sample/workloads/vcopy/MI200/SQ_IFETCH_LEVEL.csv
  9%|█████████████████▉                                                                                                                                                                                   | 1/11 [00:00<00:01,  8.53it/s]/home/auser/repos/omniperf/sample/workloads/vcopy/MI200/pmc_perf.csv
@@ -669,7 +669,7 @@ There are currently 18 main panel categories available for analyzing the compute
  - Per-channel L2-EA Atomic requests
  - Per-channel L2-EA Read latency
  - Per-channel L2-EA Write latency
-  - Per-channel L2-EA Atomic  latency
+  - Per-channel L2-EA Atomic latency
  - Per-channel L2-EA Read stall (I/O, GMI, HBM)
  - Per-channel L2-EA Write stall (I/O, GMI, HBM, Starve)

@@ -36,7 +36,7 @@

 3. **Analyze at the command line**

-   After generating a local output folder (./workloads/\<name>), the command line tool can also be used to quickly interface with profiling results. View different metrics derived from your profiled results and get immediate access all metrics organized by hardware blocks.
+   After generating a local output folder (e.g., ./workloads/vcopy_data/MI200), the command line tool can also be used to quickly interface with profiling results. View different metrics derived from your profiled results and get immediate access to all metrics organized by hardware blocks.

   If no kernel, dispatch, or hardware block filters are applied at this stage, analysis will be reflective of the entirety of the profiling data.

@@ -64,7 +64,7 @@ Modes change the fundamental behavior of the Omniperf command line tool. Dependi

 - **Analyze**: Profiling data from `-p`/`--path` directory is loaded into the Omniperf CLI analyzer where users have immediate access to profiling results and generated metrics. Metrics are quickly generated from the entirety of your profiled application or a subset you’ve identified through the Omniperf CLI analysis filters.

-    To gererate a lightweight GUI interface users can add the `--gui` flag to their analysis command.
+    To generate a lightweight GUI, users can add the `--gui` flag to their analysis command.

    This mode is designed to be a middle ground to the highly detailed Omniperf Grafana GUI and is great for users who want immediate access to a hardware component they’re already familiar with.

@@ -8,14 +8,14 @@

 The [Omniperf](https://github.com/ROCm/omniperf) Tool is architecturally composed of three major components, as shown in the following figure.

- **Omniperf Profiling**: Acquire raw performance counters via application replay based on [rocProf](https://rocm.docs.amd.com/projects/rocprofiler/en/latest/rocprof.html).  The counters are stored in a comma-seperated value, for further analysis. A set of MI200 specific micro benchmarks are also run to acquire the hierarchical roofline data. The roofline model is not available on earlier accelerators.
+- **Omniperf Profiling**: Acquire raw performance counters via application replay based on [rocProf](https://rocm.docs.amd.com/projects/rocprofiler/en/latest/rocprof.html). The counters are stored in comma-separated values (CSV) format for further analysis. A set of MI200-specific microbenchmarks is also run to acquire the hierarchical roofline data. The roofline model is not available on earlier accelerators.

 - **Omniperf Grafana Analyzer**: 
  - *Grafana database import*: All raw performance counters are imported into the backend MongoDB database for Grafana GUI analysis and visualization. Compatibility of previously generated data between Omniperf versions is not necessarily guaranteed.
  - *Grafana GUI Analyzer*: A Grafana dashboard is designed to retrieve the raw counters info from the backend database. It also creates the relevant performance metrics and visualization.
 - **Omniperf Standalone GUI Analyzer**: A standalone GUI is provided to enable performance analysis without importing data into the backend database.

-![Omniperf Architectual Diagram](images/omniperf_server_vs_client_install.png)
+![Omniperf Architectural Diagram](images/omniperf_server_vs_client_install.png)

 > Note: To learn more about the client vs. server model of Omniperf and our install process please see the [Deployment section](./installation.md) of the docs.

@@ -19,7 +19,7 @@ Omniperf is broken into two installation components:

 Determine what you need to install based on how you would like to interact with Omniperf. See the decision tree below to help determine what installation is right for you.

-![Omniperf Installtion Decision Tree](images/install_decision_tree.png)
+![Omniperf Installation Decision Tree](images/install_decision_tree.png)

 ---

@@ -162,7 +162,7 @@ Omniperf server-side requires the following basic software dependencies prior to

 The recommended process for enabling the server-side of Omniperf is to use the provided Docker file to build the Grafana and MongoDB instance.

-Once you have decided which machine you would like to use to host the Grafana and MongoDB instance, please follow the set up instructions below.
+Once you have decided which machine you would like to use to host the Grafana and MongoDB instance, please follow the set-up instructions below.

 ### Install MongoDB Utils
 Omniperf uses [mongoimport](https://www.mongodb.com/docs/database-tools/mongoimport/) to upload data to Grafana's backend database. Install for Ubuntu 20.04 is as follows:
@@ -193,6 +193,13 @@ $ sudo docker-compose up -d
 ```
 > Note that TCP ports for Grafana (4000) and MongoDB (27017) in the docker container are mapped to 14000 and 27018, respectively, on the host side.

+### Restart (Debug)
+In the event that your Grafana or MongoDB instance crashes fatally, you can always restart the server. Navigate to your install directory and run:
+```bash
+$ sudo docker-compose down
+$ sudo docker-compose up -d
+```
+
 ### Setup Grafana Instance
 Once you have launched your docker container you should be able to reach Grafana at **http://\<host-ip>:14000**. The default login credentials for the first-time Grafana setup are:

@@ -233,7 +240,7 @@ Once you have imported a dashboard you are ready to begin! Start by browsing ava

 ![Opening your dashboard](images/opening_dashboard.png)

-Remeber, you will need to upload workload data to the DB backend before analyzing in your Grafana interface. We provide a detailed example of this in our [Analysis section](./analysis.md#grafana-gui-import).
+Remember, you will need to upload workload data to the DB backend before analyzing in your Grafana interface. We provide a detailed example of this in our [Analysis section](./analysis.md#grafana-gui-import).

 After a workload has been successfully uploaded, you should be able to select it from the workload dropdown located at the top of your Grafana dashboard.

@@ -38,7 +38,7 @@ Releasing CPU memory
 ```

 ## Omniperf Profiling
-The *omniperf* executable, available through the Omniperf repository, is used to aquire all necessary performance monitoring data through analysis of compute workloads.
+The *omniperf* executable, available through the Omniperf repository, is used to acquire all necessary performance monitoring data through analysis of compute workloads.

 **omniperf help:**
 ```shell-session
@@ -128,7 +128,7 @@ Collecting Performance Counters
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 [profiling] Current input file: /home/auser/repos/omniperf/sample/workloads/vcopy/MI200/perfmon/SQ_IFETCH_LEVEL.txt
-   |-> [rocprof] RPL: on '240312_174329' from '/opt/rocm-5.2.1' in '/home/colramos/GitHub/omniperf'
+   |-> [rocprof] RPL: on '240312_174329' from '/opt/rocm-5.2.1' in '/home/auser/repos/omniperf/src/omniperf'
   |-> [rocprof] RPL: profiling '""./vcopy -n 1048576 -b 256""'
   |-> [rocprof] RPL: input file '/home/auser/repos/omniperf/sample/workloads/vcopy/MI200/perfmon/SQ_IFETCH_LEVEL.txt'
   |-> [rocprof] RPL: output dir '/tmp/rpl_data_240312_174329_692890'
@@ -226,11 +226,11 @@ To reduce profiling time and the counters collected one may use profiling filter

 Filtering Options:

- The `-k` / `--kernel` flag allows for kernel filtering. Useage is equivalent with the current rocProf utility ([see details below](#kernel-filtering)).
+- The `-k` / `--kernel` \<kernel-substr> flag allows for kernel filtering. Usage is equivalent to the current rocProf utility ([see details below](#kernel-filtering)).

- The `-d` / `--dispatch` flag allows for dispatch ID filtering. Useage is equivalent with the current rocProf utility ([see details below](#dispatch-filtering)).
+- The `-d` / `--dispatch` \<dispatch-id> flag allows for dispatch ID filtering. Usage is equivalent to the current rocProf utility ([see details below](#dispatch-filtering)).

- The `-b` / `--block` flag allows system profiling on one or more selected hardware components to speed up the profiling process ([see details below](#hardware-component-filtering)).
+- The `-b` / `--block` \<block-name> flag allows system profiling on one or more selected hardware components to speed up the profiling process ([see details below](#hardware-component-filtering)).

 ```{note}
 Be cautious while combining different profiling filters in the same call. Conflicting filters may result in error.
@@ -348,9 +348,9 @@ Standalone Roofline Options:

 - The `--sort` \<desired_sort> allows you to specify whether you would like to overlay top kernel or top dispatch data in your roofline plot.

- The `-m` \<cache_level> allows you to specify specific level(s) of cache you would like to include in your roofline plot.
+- The `-m`/`--mem-level` \<cache_level> allows you to specify which level(s) of cache you would like to include in your roofline plot.

- The `--device` \<gpu_id> allows you to specify a device id to collect performace data from when running our roofline benchmark on your system.
+- The `--device` \<gpu_id> allows you to specify a device ID to collect performance data from when running our roofline benchmark on your system.

 - If you would like to distinguish different kernels in your .pdf roofline plot use `--kernel-names`. This will give each kernel a unique marker identifiable from the plot's key.

@@ -209,7 +209,7 @@ class DatabaseConnector:
            except Exception as e:
                console_error("database", "PASSWORD ERROR %s" % e)
            else:
-                console_log("database", "Password recieved")
+                console_log("database", "Password received")
        else:
            password = self.connection_info["password"]

@@ -244,8 +244,6 @@ def show_kernel_stats(args, runs, archConfigs, output):
    """
    Show the kernels and dispatches from "Top Stats" section.
    """
-    # print("\n" + "-" * 80, file=output)
-    # print("Detected Kernels (sorted decending by duration)", file=output)

    df = pd.DataFrame()
    for panel_id, panel in archConfigs.panel_configs.items():
@@ -260,7 +258,8 @@ def show_kernel_stats(args, runs, archConfigs, output):
                    if table_config["id"] == 1:
                        print("\n" + "-" * 80, file=output)
                        print(
-                            "Detected Kernels (sorted decending by duration)", file=output
+                            "Detected Kernels (sorted descending by duration)",
+                            file=output,
                        )
                        df = pd.concat([df, single_df["Kernel_Name"]], axis=1)