From 93abda4cfd8a9cb4f4af0d360f6d44db9595dc51 Mon Sep 17 00:00:00 2001
From: "Bhardwaj, Gopesh"
Date: Tue, 22 Apr 2025 20:52:37 +0530
Subject: [PATCH] Copilot suggestions (#360)

* Copilot suggestions
* Fixing perfetto links
* correcting default value of agent-index

[ROCm/rocprofiler-sdk commit: 1f1c192a5eedd3fc59d77dd269bdc74664b7f5d3]
---
 .../comparing-with-legacy-tools.rst | 10 ++---
 .../source/docs/how-to/samples.rst | 12 +++---
 .../source/docs/how-to/using-pc-sampling.rst | 16 ++++----
 .../how-to/using-rocprofiler-sdk-roctx.rst | 10 ++---
 .../docs/how-to/using-rocprofv3-with-mpi.rst | 8 ++--
 .../source/docs/how-to/using-rocprofv3.rst | 38 ++++++++++---------
 .../source/docs/install/installation.rst | 10 ++---
 7 files changed, 52 insertions(+), 52 deletions(-)

diff --git a/projects/rocprofiler-sdk/source/docs/conceptual/comparing-with-legacy-tools.rst b/projects/rocprofiler-sdk/source/docs/conceptual/comparing-with-legacy-tools.rst
index ae141b7266..ee7a10e1a0 100644
--- a/projects/rocprofiler-sdk/source/docs/conceptual/comparing-with-legacy-tools.rst
+++ b/projects/rocprofiler-sdk/source/docs/conceptual/comparing-with-legacy-tools.rst
@@ -1,5 +1,5 @@
 .. meta::
-  :description: Documentation of the installation, configuration, use of the ROCprofiler-SDK, and rocprofv3 command-line tool
+  :description: ROCprofiler-SDK is a tooling infrastructure for profiling general-purpose GPU compute applications running on the ROCm software
   :keywords: ROCprofiler-SDK tool, ROCprofiler-SDK library, rocprofv3, ROCm, API, reference

 .. _comparing-with-legacy-tools:
@@ -134,7 +134,7 @@ ROCprofiler-SDK introduces a new command-line tool, `rocprofv3`, which is a more
      - Part of `--hsa-trace` option
      - Part of `--hsa-trace` option
      - `--hsa-image-trace`
-     - New option for collecting HSA API Traces (Image-extenson API), e.g. HSA functions prefixed with only `hsa_ext_image_` (i.e. hsa_ext_image_get_capability).
+     - New option for collecting HSA API Traces (Image-extension API), e.g.
HSA functions prefixed with only `hsa_ext_image_` (i.e. hsa_ext_image_get_capability).
      -
    * - Granular tracing options
      - HSA Finalizer trace
@@ -266,9 +266,9 @@ ROCprofiler-SDK introduces a new command-line tool, `rocprofv3`, which is a more
      - Perfetto data collection backend
      - *Not available*
      - *Not available*
-     - `--perfetto-backend` {inprocess,system}
+     - `--perfetto-backend` {in-process,system}
      - New option for perfetto data collection backend. 'system' mode requires starting traced and perfetto daemons
-     - `rocprofv2` used only in-process collection for perfetto plugin, However, `rocprofv3` give the option to the user
+     - `rocprofv2` used only in-process collection for perfetto plugin; however, `rocprofv3` gives the user the option.
    * - Perfetto-specific options
      - Perfetto Buffer Size
      - *Not available*
@@ -338,7 +338,7 @@ ROCprofiler-SDK introduces a new command-line tool, `rocprofv3`, which is a more
      - Supports input text and XML format
      - Only supports input text format
      - Input support for text, YAML and JSON formats
-     - | # Its not possible to check for valid text file. Hence rocprofv3 supports strongly typed input formats.
+     - | # It's not possible to check for valid text file. Hence rocprofv3 supports strongly typed input formats.
        | # YAML and JSON formats are more readable and easy to maintain.
        | # Allows flexibility to add more features for the tool input
      -
diff --git a/projects/rocprofiler-sdk/source/docs/how-to/samples.rst b/projects/rocprofiler-sdk/source/docs/how-to/samples.rst
index 3fe185693b..b8c5ea2765 100644
--- a/projects/rocprofiler-sdk/source/docs/how-to/samples.rst
+++ b/projects/rocprofiler-sdk/source/docs/how-to/samples.rst
@@ -1,9 +1,8 @@
-.. ---
-.. myst:
-.. html_meta:
-.. "description": "ROCprofiler-SDK is a tooling infrastructure for profiling general-purpose GPU compute applications running on the ROCm software."
-.. "keywords": "ROCprofiler-SDK, ROCProfiler-SDK samples"
-.. ---
+..
meta::
+  :description: "ROCprofiler-SDK is a tooling infrastructure for profiling general-purpose GPU compute applications running on the ROCm software."
+  :keywords: "ROCprofiler-SDK, ROCProfiler-SDK samples"
+
+.. _rocprofiler-sdk-samples:

 ROCprofiler-SDK samples
 ========================
@@ -47,3 +46,4 @@ To run the built samples, ``cd`` into the ``build-rocprofiler-sdk-samples`` dire

     ctest -V

+The ``-V`` option enables verbose output, providing detailed information about the test execution.
\ No newline at end of file
diff --git a/projects/rocprofiler-sdk/source/docs/how-to/using-pc-sampling.rst b/projects/rocprofiler-sdk/source/docs/how-to/using-pc-sampling.rst
index 824b25bee3..47f6d4cedd 100644
--- a/projects/rocprofiler-sdk/source/docs/how-to/using-pc-sampling.rst
+++ b/projects/rocprofiler-sdk/source/docs/how-to/using-pc-sampling.rst
@@ -8,7 +8,7 @@
 Using PC sampling
 ==================

-PC (Program Counter) sampling service for GPU profiling is a profiling technique to periodically sample the program counter during GPU kernel execution. PC sampling helps to understand code execution patterns and hotspots.
+PC (Program Counter) sampling service for GPU profiling is a profiling technique to periodically sample the program counter during GPU kernel execution. PC sampling helps in understanding code execution patterns and identifying hotspots.

 Here are the benefits of using PC sampling:

@@ -55,7 +55,7 @@ Based on the preceding configuration, you can use the following command to profi

     rocprofv3 --pc-sampling-beta-enabled --pc-sampling-method host_trap --pc-sampling-unit time --pc-sampling-interval 1 --

-The preceding command enables PC sampling with the ``host_trap`` method, ``time`` unit, and an interval of ``1`` μs (micro second). Replace ```` with the path to the application you want to profile.
+The preceding command enables PC sampling with the ``host_trap`` method, ``time`` unit, and an interval of ``1`` μs (microsecond).
Replace ```` with the path to the application you want to profile.

 This generates two files, ``agent_info.csv`` and ``pc_sampling_host_trap.csv``. Both files are prefixed with the process ID.

@@ -186,10 +186,10 @@ Hardware-Based (Stochastic) PC Sampling Method
 ===============================================

 The new ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` has been introduced for gfx942 architecture.
-It employes a specific hardware for probing waves actively running on GPU.
+It employs a specific hardware for probing waves actively running on GPU.
 Beside information already provided with ``ROCPROFILER_PC_SAMPLING_METHOD_HOST_TRAP`` useful for determining hot-spots within the kernel,
 it delivers additional information that tells whether a sampled wave issued an instruction represented with particular PC.
-If not, it tells what is the reason for not issuing the instruction (stall reason).
+If not, it provides the reason for not issuing the instruction (stall reason).
 This type of information is particularly useful for understanding stalls during the kernel execution.

 To use this method on gfx942, we recommend listing available PC sampling configurations to verify if the latest ROCm stack is installed
@@ -199,7 +199,7 @@ on the system by running:

     rocprofv3 -L

-Outputi similar to the following indicates that the ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` method is available:
+Output similar to the following indicates that the ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` method is available:

 .. code-block:: bash

@@ -208,9 +208,9 @@ Outputi similar to the following indicates that the ``ROCPROFILER_PC_SAMPLING_ME
     Minimum_Interval: 256
     Maximum_Interval: 2147483648

-Please note that on gfx942, ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` requires intervals to be specified in cycles whose value are power of 2.
+Please note that on gfx942, ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` requires intervals to be specified in cycles, whose values are powers of 2.

-To profile a gfx942 accelarated application with ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` PC sampling, one can use the following command:
+To profile a gfx942 accelerated application with ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` PC sampling, one can use the following command:

 .. code-block:: bash

@@ -230,7 +230,7 @@ generates additional fields:
    :widths: 20,10,10,10,10,20,10,20,20,10
    :header-rows: 1

-Similarly, ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` method delievers additional information to every sample in the JSON output.
+Similarly, ``ROCPROFILER_PC_SAMPLING_METHOD_STOCHASTIC`` method delivers additional information to every sample in the JSON output.
 The following snippet shows one sample from ``out_results.json`` file.

 .. code-block:: text
diff --git a/projects/rocprofiler-sdk/source/docs/how-to/using-rocprofiler-sdk-roctx.rst b/projects/rocprofiler-sdk/source/docs/how-to/using-rocprofiler-sdk-roctx.rst
index f9eb4fb763..161fb89e66 100644
--- a/projects/rocprofiler-sdk/source/docs/how-to/using-rocprofiler-sdk-roctx.rst
+++ b/projects/rocprofiler-sdk/source/docs/how-to/using-rocprofiler-sdk-roctx.rst
@@ -8,8 +8,8 @@
 Using ROCTx
 ============

-ROCTx is AMD tools extension library, a cross platform API for annotating code with markers and ranges. The ROCTx API is written in C++.
-In certain situations, such as debugging performance issues in large-scale GPU programs, API-level tracing might be too fine-grained to provide a big picture of the program execution.
+ROCTx is an AMD tools extension library, a cross platform API for annotating code with markers and ranges. The ROCTx API is written in C++.
+In certain situations, such as debugging performance issues in large-scale GPU programs, API-level tracing might be too fine-grained to provide an overview of the program execution.
 In such cases, it is helpful to define specific tasks to be traced.

 To specify the tasks for tracing, enclose the respective source code with the API calls provided by the ROCTx library. This process is also known as instrumentation.
@@ -21,14 +21,14 @@ ROCTx provides two types of annotations: markers and ranges.
 Markers
 ========

-Markers are used to insert a marker in the code with a message. Creating markers help you see when a line of code is executed.
+Markers are used to insert a marker in the code with a message. Creating markers helps you see when a line of code is executed.

 Ranges
 =======

 Ranges are used to define the scope of code for instrumentation using enclosing API calls.
 A range is a programmer-defined task that has a well-defined start and end code scope.
-You can also refine the scope specified within a range using further nested ranges. ``rocprofv3`` also reports the timelines for these nested ranges.
+You can further refine the scope specified within a range using nested ranges. ``rocprofv3`` also reports the timelines for these nested ranges.

 These are the two types of ranges:

@@ -139,7 +139,7 @@ To trace the preceding code, use:

     rocprofv3 --marker-trace --hip-trace --

-The preceding command generates a ``hip_api_trace.csv`` file prefixed with the process ID. The file has only two ``hipMemcpy`` calls with the in-between ``hipMemcpyDeviceToHost`` hidden .
+The preceding command generates a ``hip_api_trace.csv`` file prefixed with the process ID. The file contains two ``hipMemcpy`` calls with the in-between ``hipMemcpyDeviceToHost`` call hidden.

 .. code-block:: shell

diff --git a/projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3-with-mpi.rst b/projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3-with-mpi.rst
index 0649bcf4c9..d191e28e21 100644
--- a/projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3-with-mpi.rst
+++ b/projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3-with-mpi.rst
@@ -1,5 +1,5 @@
 ..
meta::
-  :description: Documentation of the mpi usage for rocprofv3
+  :description: Documentation of the MPI usage for rocprofv3
   :keywords: ROCprofiler-SDK tool, mpirun, rocprofv3, rocprofv3 tool usage, mpich, ROCprofiler-SDK command line tool, ROCprofiler-SDK CLI


@@ -9,9 +9,9 @@ Using rocprofv3 with MPI
 +++++++++++++++++++++++++++++

 Message Passing Interface (MPI) is a standardized and portable message-passing system designed to function on a wide variety of parallel computing architectures. MPI is widely used for developing parallel applications and is considered the de facto standard for communication in high-performance computing (HPC) environments.
-MPI applications are parallel applications running across multiple processes that can be distributed over one or more nodes.
+MPI applications are parallel programs that run across multiple processes, which can be distributed over one or more nodes.

-For MPI applications or other job launchers such as SLURM, place ``rocprofv3`` inside the job launcher. The following example demonstrates how to use ``rocprofv3`` with MPI:
+For MPI applications or other job launchers such as `SLURM `_, place ``rocprofv3`` inside the job launcher. The following example demonstrates how to use ``rocprofv3`` with MPI:

 .. code-block:: bash

@@ -30,7 +30,7 @@ The preceding command runs the application with ``rocprofv3`` and generates the
     2293215_agent_info.csv
     2293215_hip_api_trace.csv

-Since the data collection is performed in-process, it's ideal to collect data from within the process(es) launched by MPI. Outside of ``mpirun``, the tool library is loaded into the ``mpirun`` executable.
+Since the data collection is performed in-process, it's ideal to collect data from within the processes launched by MPI. When ``rocprofv3`` is run outside of ``mpirun``, the tool library is loaded into the ``mpirun`` executable.
 Collecting data outside of ``mpirun`` works but fetches agent info for the ``mpirun`` process too. For example:

 ..
code-block:: bash
diff --git a/projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3.rst b/projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3.rst
index 178bf82db8..7e3d1c8163 100644
--- a/projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3.rst
+++ b/projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3.rst
@@ -8,18 +8,18 @@
 Using rocprofv3
 ======================

-``rocprofv3`` is a CLI tool that helps you quickly optimize applications and understand the low-level kernel details without requiring any modification in the source code.
-It's backward compatible with its predecessor, ``rocprof``, and provides more features for application profiling with better accuracy.
+``rocprofv3`` is a CLI tool that helps you optimize applications and analyze the low-level kernel details without requiring any modification in the source code.
+It's backward compatible with its predecessor, `rocprof `_, and provides enhanced features for application profiling with better accuracy.

 The following sections demonstrate the use of ``rocprofv3`` for application tracing and kernel counter collection using various command-line options.

-``rocprofv3`` is installed with ROCm under ``/opt/rocm/bin``. To use the tool from anywhere in the system, export ``PATH`` variable:
+``rocprofv3`` is installed with ROCm under ``/opt/rocm/bin``. To use the tool from anywhere in the system, export the ``PATH`` variable:

 .. code-block:: bash

     export PATH=$PATH:/opt/rocm/bin

-Before you start tracing or profiling your HIP application using ``rocprofv3``, build the application using:
+Before tracing or profiling your HIP application using ``rocprofv3``, build it using:

 .. code-block:: bash

@@ -55,7 +55,7 @@ The following table lists the commonly used ``rocprofv3`` command-line options c
      - | Specifies the path to the input file.
JSON and YAML formats support configuration of all command-line options for tracing and profiling whereas the text format supports only the specification of HW counters. |br| |br|
        | Specifies output file name. If nothing is specified, the default path is ``%hostname%/%pid%``. |br| |br|
        | Specifies the output path for saving the output files. If nothing is specified, the default path is ``%hostname%/%pid%``. |br| |br|
-       | Specifies output format. Supported formats: CSV, JSON, and PFTrace. |br| |br| |br|
+       | Specifies output format. Supported formats: CSV, JSON, PFTrace, and OTF2. |br| |br| |br|
        | Sets the desired log level. |br| |br| |br|
        | Specifies the path to a YAML file consisting of extra counter definitions.

@@ -85,7 +85,7 @@ The following table lists the commonly used ``rocprofv3`` command-line options c
        | ``--hsa-trace`` [BOOL] |br| |br| |br| |br| |br| |br| |br| |br|
        | ``--rccl-trace`` [BOOL] |br| |br| |br| |br|
        | ``--kokkos-trace`` [BOOL] |br| |br| |br| |br|
-       | ``--rocdecode-trace`` [BOOL]
+       | ``--rocdecode-trace`` [BOOL] |br| |br| |br| |br|
      - | Combination of ``--hip-runtime-trace`` and ``--hip-compiler-trace``. This option only enables the HIP API tracing. Unlike previous iterations of ``rocprof``, this option doesn't enable kernel tracing, memory copy tracing, and so on. |br| |br|
        | Collects marker (ROCTx) traces. Similar to ``--roctx-trace`` option in earlier ``rocprof`` versions, but with improved ``ROCTx`` library with more features. |br| |br|
        | Collects kernel dispatch traces. |br| |br|
@@ -95,7 +95,7 @@ The following table lists the commonly used ``rocprofv3`` command-line options c
        | Collects ``--hsa-core-trace``, ``--hsa-amd-trace``, ``--hsa-image-trace``, and ``--hsa-finalizer-trace``. This option only enables the HSA API tracing. Unlike previous iterations of ``rocprof``, this doesn't enable kernel tracing, memory copy tracing, and so on.
|br| |br|
        | Collects traces for RCCL (ROCm Communication Collectives Library), which is also pronounced as 'Rickle'. |br| |br|
        | Enables builtin Kokkos tools support, which implies enabling ``--marker-trace`` collection and ``--kernel-rename``. |br| |br|
-       | Collects traces for rocDecode APIs.
+       | Collects traces for rocDecode APIs. |br| |br|

    * - Granular tracing
      - | ``--hip-runtime-trace`` [BOOL] |br| |br| |br| |br|
@@ -321,9 +321,9 @@ Marker trace
 Kokkos trace
 ++++++++++++++

-`Kokkos `_ is a C++ library for writing performance portable applications. Kokkos is used in many scientific applications for writing performance portable code that can run on CPUs, GPUs, and other accelerators.
+`Kokkos `_ is a C++ library for writing performance portable applications. Kokkos is widely used in scientific applications to write performance-portable code for CPUs, GPUs, and other accelerators.
 ``rocprofv3`` loads an inbuilt `Kokkos Tools library `_, which emits roctx ranges with the labels passed using Kokkos APIs. For example, ``Kokkos::parallel_for(“MyParallelForLabel”, …)`` calls ``roctxRangePush`` internally and enables the kernel renaming option to replace the highly templated kernel names with the Kokkos labels.
-To enable the inbuilt marker support, use the ``kokkos-trace`` option. Internally, this option enables ``marker-trace`` and ``kernel-rename``:
+To enable the inbuilt marker support, use the ``kokkos-trace`` option. Internally, this option automatically enables ``marker-trace`` and ``kernel-rename``:

 .. code-block:: bash

@@ -429,7 +429,7 @@ For the description of the fields in the output file, see :ref:`output-file-fiel
 Runtime trace
 +++++++++++++++

-This is a short-hand option that targets the most relevant tracing options for a standard user by
+This is a shorthand option that targets the most relevant tracing options for a standard user by
 excluding traces for HSA runtime API and HIP compiler API.
The HSA runtime API is excluded because it is a lower-level API upon which HIP and OpenMP target are built and
@@ -525,7 +525,7 @@ Here are the contents of ``rocdecode_api_trace.csv`` file:
    :widths: 10,10,10,10,10,20,20
    :header-rows: 1

-Perfetto will also show rocDeocde API arguments. Pointers will not be dereferenced and only the address will be displayed.
+Perfetto will also show rocDecode API arguments. Pointers will not be dereferenced and only the address will be displayed.

 rocJPEG trace
 +++++++++++++++
@@ -724,6 +724,8 @@ To supply the input file for collecting traces, use:

     rocprofv3 -i input.yaml --

+Please note that the input file must be a valid YAML or JSON file.
+
 Disabling specific tracing options
 ++++++++++++++++++++++++++++++++++++

@@ -775,9 +777,9 @@ For a comprehensive list of counters available on MI200, see `MI200 performance
 Counter collection using input file
 +++++++++++++++++++++++++++++++++++++

-You can use an input file in text (.txt), YAML (.yaml/.yml), or JSON (.json) format to collect the desired counters.
+Input files can be in text (.txt), YAML (.yaml/.yml), or JSON (.json) format to specify the desired counters for collection.

-When using input file in text format, the line consisting of the counter names must begin with ``pmc``. The number of counters that can be collected in one run of profiling are limited by the GPU hardware resources. If too many counters are selected, the kernels need to be executed multiple times to collect them. For multi-pass execution, include multiple ``pmc`` rows in the input file. Counters in each ``pmc`` row can be collected in each application run.
+When using input file in text format, the line consisting of the counter names must begin with ``pmc``. The number of counters that can be collected in one profiling run is limited by the GPU hardware resources. If too many counters are selected, the kernels need to be executed multiple times (multi-pass execution) to collect all the counters.
For multi-pass execution, include multiple ``pmc`` rows in the input file. Counters in each ``pmc`` row can be collected in each application run.

 Here is a sample input.txt file for specifying counters for collection:

@@ -978,14 +980,14 @@ Perfetto visualization for traces
 +++++++++++++++++++++++++++++++++++++++++++++

 Users can generate Perfetto trace files using the ``--output-format pftrace`` option. This allows users to visualize the traces in the Perfetto viewer.
-Perfetto is a powerful open-source tracing tool that provides a comprehensive view of system performance. It allows you to visualize the collected traces in a user-friendly interface, making it easier to analyze and understand the performance characteristics of your application.
+Perfetto is an open-source tracing tool that provides a detailed view of system performance. It allows you to visualize the collected traces in a user-friendly interface, making it easier to analyze and understand the performance characteristics of your application.
 To generate a Perfetto trace file, use the ``--output-format pftrace`` option along with the desired tracing options. For example, to collect system traces and generate a Perfetto trace file, use:

 .. code-block:: bash

     rocprofv3 --sys-trace --output-format pftrace --

-The generated Perfetto trace file can be opened in the Perfetto UI (https://ui.perfetto.dev/).
+The generated Perfetto trace file can be opened in the `Perfetto UI `_.

 **Figure 1:** Generic perfetto visualization

@@ -1012,9 +1014,9 @@ To generate a Perfetto trace file with counter data, use:

     rocprofv3 --pmc SQ_WAVES GRBM_COUNT --output-format pftrace --

-The generated Perfetto trace file can be opened in the Perfetto UI (https://ui.perfetto.dev/). In the viewer, performance counters will appear as counter tracks organized by agent, allowing you to visualize counter values changing over time alongside kernel executions and other traced activities.
+The generated Perfetto trace file can be opened in the `Perfetto UI `_. In the viewer, performance counters will appear as counter tracks organized by agent, allowing you to visualize counter values changing over time alongside kernel executions and other traced activities.

-you can also combine this with the system trace option to get a more comprehensive view of the system's performance. For example, you can use the following command to collect both system trace and performance counter data:
+You can also combine this with the system trace option to get a more comprehensive view of the system's performance. For example, you can use the following command to collect both system trace and performance counter data:

 .. code-block:: bash
     rocprofv3 --pmc SQ_WAVES GRBM_COUNT --sys-trace --output-format pftrace --
@@ -1053,7 +1055,7 @@ The agent index is a unique identifier for each agent in the system. It is used

 - **type-relative** == *logical_node_type_id* - relative index of the agent accounting for cgroups masking where indexing starts at zero for each agent type. e.g. CPU-0, GPU-0, GPU-1

-To set the agent index in the output files, use the ``--agent-index`` option. The default value is ``absolute``.
+To set the agent index in the output files, use the ``--agent-index`` option. The default value is ``relative``.

 The following example shows how to set the agent index on a system with multiple GPUs and CPUs:

diff --git a/projects/rocprofiler-sdk/source/docs/install/installation.rst b/projects/rocprofiler-sdk/source/docs/install/installation.rst
index dc4757b879..bf5c76911a 100644
--- a/projects/rocprofiler-sdk/source/docs/install/installation.rst
+++ b/projects/rocprofiler-sdk/source/docs/install/installation.rst
@@ -1,9 +1,7 @@
-.. ---
-.. myst:
-.. html_meta:
-.. "description": "ROCprofiler-SDK is a tooling infrastructure for profiling general-purpose GPU compute applications running on the ROCm software."
-..
"keywords": "Installing ROCprofiler-SDK, Install ROCprofiler-SDK, Build ROCprofiler-SDK" -.. --- +.. meta:: + :description: "ROCprofiler-SDK is a tooling infrastructure for profiling general-purpose GPU compute applications running on the ROCm software." + :keywords: "Installing ROCprofiler-SDK, Install ROCprofiler-SDK, Build ROCprofiler-SDK" + ROCprofiler-SDK installation ============================