From 2222ce9b83f3bd8ed6c1d1c47bbbeeb07955f7c6 Mon Sep 17 00:00:00 2001 From: Sajina PK Date: Thu, 6 Mar 2025 18:03:33 -0500 Subject: [PATCH] Documentation update for VCN, JPEG, rocDecode and rocJPEG feature (#109) * Documentation update for VCN, JPEG, rocDecode and rocJPEG feature Update documents to include the new tracks for tracing VCN and JPEG activity. Update the rocDecode and rocJPEG tracing enabled using ROCprofiler-SDK. Update headings to the perfetto output images. * Add few more lines about domain values. * Add missing words to the dictionary --- .wordlist.txt | 3 ++ CHANGELOG.md | 6 +++ README.md | 4 ++ docs/conceptual/rocprof-sys-feature-set.rst | 3 ++ docs/how-to/configuring-runtime-options.rst | 41 ++++++++++++++++++- .../understanding-rocprof-sys-output.rst | 10 +++++ 6 files changed, 66 insertions(+), 1 deletion(-) diff --git a/.wordlist.txt b/.wordlist.txt index 705ade97ef..ae9070776b 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -38,6 +38,9 @@ pid polymorphism ppc proc +rocDecode +rocJPEG +roctx rpath rvalues sdk diff --git a/CHANGELOG.md b/CHANGELOG.md index bb56075c76..df413bd1c8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -2,6 +2,12 @@ Full documentation for ROCm Systems Profiler is available at [https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/](https://rocm.docs.amd.com/projects/rocprofiler-systems/en/latest/). +## ROCm Systems Profiler 1.1.0 for ROCm 6.5 + +### Added + +- Added profiling and metric collection capabilities for VCN engine activity, JPEG engine activity and API tracing for rocDecode, rocJPEG and VA-APIs. 
+ ## ROCm Systems Profiler 0.1.1 for ROCm 6.3.2 ### Resolved issues diff --git a/README.md b/README.md index fed53600e8..2d99d84527 100755 --- a/README.md +++ b/README.md @@ -62,11 +62,15 @@ The documentation source files reside in the [`/docs`](/docs) folder of this rep - HIP kernel tracing - HSA API tracing - HSA operation tracing +- rocDecode API tracing +- rocJPEG API tracing - System-level sampling (via rocm-smi) - Memory usage - Power usage - Temperature - Utilization + - VCN Utilization + - JPEG Utilization ### CPU Metrics diff --git a/docs/conceptual/rocprof-sys-feature-set.rst b/docs/conceptual/rocprof-sys-feature-set.rst index fed24b55b3..5d4c264b3c 100644 --- a/docs/conceptual/rocprof-sys-feature-set.rst +++ b/docs/conceptual/rocprof-sys-feature-set.rst @@ -52,6 +52,8 @@ GPU metrics * HIP kernel tracing * HSA API tracing * HSA operation tracing +* rocDecode API tracing +* rocJPEG API tracing * System-level sampling (via rocm-smi) * Memory usage @@ -59,6 +61,7 @@ GPU metrics * Temperature * Utilization * VCN activity + * JPEG activity CPU metrics ======================================== diff --git a/docs/how-to/configuring-runtime-options.rst b/docs/how-to/configuring-runtime-options.rst index 9694582148..a8352b3bef 100644 --- a/docs/how-to/configuring-runtime-options.rst +++ b/docs/how-to/configuring-runtime-options.rst @@ -217,6 +217,44 @@ The following example: ROCPROFSYS_ROCM_EVENTS = GPUBusy SQ_WAVES:device=0 SQ_INSTS_VALU:device=1 +Exploring GPU Metrics +--------------------- + +ROCm Systems Profiler supports GPU metrics collection, sampling, and API tracing via `ROCprofiler-SDK `_ and `ROCm-SMI `_. +ROCprofiler-SDK supports application tracing, which gives a high-level view of GPU application execution, and kernel profiling, which provides low-level hardware details from the performance counters. +The ROCm-SMI library offers a unified tool for managing, monitoring, and retrieving information about the system's drivers and GPUs. 
+ +Sampling of GPU metrics such as utilization, temperature, power consumption, and memory usage is configured with ``ROCPROFSYS_ROCM_SMI_METRICS``. +The ``ROCPROFSYS_USE_ROCM_SMI`` setting should be enabled for GPU metric collection. + +For example, the following is a valid configuration: + +.. code-block:: shell + + ROCPROFSYS_ROCM_SMI_METRICS=busy,temp,power,vcn_activity,mem_usage + +Supported values for ``ROCPROFSYS_ROCM_SMI_METRICS`` are: ``busy``, ``temp``, ``power``, ``vcn_activity``, ``mem_usage``, ``jpeg_activity``. + +API tracing is configured with the ``ROCPROFSYS_ROCM_DOMAINS`` setting. The domains filter the events that are captured during profiling. +Supported values for this setting are the domains supported by ROCprofiler-SDK, as returned by the ``get_callback_tracing_names()`` and ``get_buffer_tracing_names()`` APIs. See the `ROCprofiler-SDK developer API documentation `_ to learn more about ROCprofiler-SDK APIs. +Use the following command to view the available domains: + +.. code-block:: shell + + rocprof-sys-avail -bd -r ROCM_DOMAINS + +.. note:: + + Some settings enable tracing for multiple domains. For example, ``hip_api`` enables both ``hip_runtime_api`` and ``hip_compiler_api``. + Similarly, ``hsa_api`` enables all HSA domains: ``hsa_core_api``, ``hsa_amd_ext_api``, ``hsa_image_ext_api``, and ``hsa_finalize_ext_api``. + The setting ``marker_api`` or ``roctx`` can be used to enable roctx marker API tracing. + +For example, the following is a valid configuration: + +.. code-block:: shell + + ROCPROFSYS_ROCM_DOMAINS=hip_runtime_api,kernel_dispatch,memory_copy,rocdecode_api,rocjpeg_api + rocprof-sys-avail examples ----------------------------------- @@ -474,7 +512,8 @@ Viewing components | sampling_gpu_power | GPU Power Usage via ROCm-SMI. Derived fro... | | sampling_gpu_temp | GPU Temperature via ROCm-SMI. Derived fro... | | sampling_gpu_busy | GPU Utilization (% busy) via ROCm-SMI. De... 
| - | sampling_vcn_busy | GPU VCN Utilization (% activity) via ROCm... | + | sampling_gpu_vcn | GPU VCN Utilization (% activity) via ROCm... | + | sampling_gpu_jpeg | GPU JPEG Utilization (% activity) via ROCm.. | | sampling_gpu_memory_usage | GPU Memory Usage via ROCm-SMI. Derived fr... | |-----------------------------------|----------------------------------------------| diff --git a/docs/how-to/understanding-rocprof-sys-output.rst b/docs/how-to/understanding-rocprof-sys-output.rst index 73e27fe354..878effefab 100644 --- a/docs/how-to/understanding-rocprof-sys-output.rst +++ b/docs/how-to/understanding-rocprof-sys-output.rst @@ -366,22 +366,32 @@ absolute path, then all ``ROCPROFSYS_OUTPUT_PATH`` and similar settings are ignored. Visit `ui.perfetto.dev `_ and open this file. +**Figure 1:** Visualization of a performance graph in Perfetto + .. image:: ../data/rocprof-sys-perfetto.png :alt: Visualization of a performance graph in Perfetto :width: 800 +**Figure 2:** Visualization of ROCm data in Perfetto + .. image:: ../data/rocprof-sys-rocm.png :alt: Visualization of ROCm data in Perfetto :width: 800 +**Figure 3:** Visualization of ROCm flow data in Perfetto + .. image:: ../data/rocprof-sys-rocm-flow.png :alt: Visualization of ROCm flow data in Perfetto :width: 800 +**Figure 4:** Visualization of ROCm API calls in Perfetto + .. image:: ../data/rocprof-sys-user-api.png :alt: Visualization of ROCm API calls in Perfetto :width: 800 +**Figure 5:** Visualization of ROCm GPU metrics in Perfetto + .. image:: ../data/rocprof-sys-gpu-metrics.png :alt: Visualization of ROCm GPU metrics in Perfetto :width: 800