diff --git a/projects/rocprofiler-compute/CHANGELOG.md b/projects/rocprofiler-compute/CHANGELOG.md
index 4e5576a9a0..b385de2a1f 100644
--- a/projects/rocprofiler-compute/CHANGELOG.md
+++ b/projects/rocprofiler-compute/CHANGELOG.md
@@ -13,6 +13,8 @@ Full documentation for ROCm Compute Profiler is available at [https://rocm.docs.
 * Show description of metrics during analysis
   * Use `--include-cols Description` to show the Description column, which is excluded by default from the ROCm Compute Profiler CLI output.
 
+* `--set` filtering option in profile mode to enable single-pass counter collection for predefined subsets of metrics.
+* `--list-sets` filtering option in profile mode to list the sets available for single-pass counter collection.
 * Add missing counters based on register specification which enables missing metrics
   * Enable SQC_DCACHE_INFLIGHT_LEVEL counter and associated metrics
 
diff --git a/projects/rocprofiler-compute/docs/how-to/profile/mode.rst b/projects/rocprofiler-compute/docs/how-to/profile/mode.rst
index 2ec2133326..4a2f82b0f2 100644
--- a/projects/rocprofiler-compute/docs/how-to/profile/mode.rst
+++ b/projects/rocprofiler-compute/docs/how-to/profile/mode.rst
@@ -279,6 +279,11 @@ Filtering options
    Allows for dispatch ID filtering. Usage is equivalent with the current ``rocprof`` utility.
    See :ref:`profiling-dispatch-filtering`.
 
+``--set ``
+   Enables single-pass counter collection for predefined sets of metrics, minimizing profiling overhead.
+   Cannot be used with ``--roof-only`` or ``--block``.
+   See :ref:`profiling-metric-sets`.
+
 .. tip::
 
    Be cautious when combining different profiling filters in the same call.
@@ -470,6 +475,80 @@ of the application (note zero-based indexing).
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    ...
 
+.. _profiling-metric-sets:
+
+Metric sets filtering
+^^^^^^^^^^^^^^^^^^^^^
+
+A metric set contains a subset of metrics that can be collected in a single pass. This filtering option minimizes profiling overhead by only collecting counters of interest.
+The ``--set`` filter option provides a convenient way to group related metrics for common profiling scenarios, eliminating the need to manually specify individual metrics for typical analysis workflows.
+This option cannot be used with ``--roof-only`` or ``--block``.
+
+.. code-block:: shell-session
+
+   $ rocprof-compute profile --name vcopy --set compute_thruput_util -- ./vcopy -n 1048576 -b 256
+
+                                    __                                       _
+    _ __ ___   ___ _ __  _ __ ___  / _|       ___ ___  _ __ ___  _ __  _   _| |_ ___
+   | '__/ _ \ / __| '_ \| '__/ _ \| |_ _____ / __/ _ \| '_ ` _ \| '_ \| | | | __/ _ \
+   | | | (_) | (__| |_) | | | (_) |  _|_____| (_| (_) | | | | | | |_) | |_| | ||  __/
+   |_|  \___/ \___| .__/|_|  \___/|_|        \___\___/|_| |_| |_| .__/ \__,_|\__\___|
+                  |_|                                           |_|
+
+   rocprofiler-compute version: 2.0.0
+   Profiler choice: rocprofv1
+   Path: /home/auser/repos/rocprofiler-compute/sample/workloads/vcopy/MI200
+   Target: MI200
+   Command: ./vcopy -n 1048576 -b 256
+   Kernel Selection: None
+   Dispatch Selection: ['0']
+   Set Selection: compute_thruput_util
+   Report Sections: ['11.2.3', '11.2.4', '11.2.6', '11.2.7', '11.2.9']
+
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+   Collecting Performance Counters
+   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+   ...
+
+
+To see a list of available sets, use the ``--list-sets`` option.
+
+.. code-block:: shell-session
+
+   $ rocprof-compute profile --list-sets
+
+                                    __                                       _
+    _ __ ___   ___ _ __  _ __ ___  / _|       ___ ___  _ __ ___  _ __  _   _| |_ ___
+   | '__/ _ \ / __| '_ \| '__/ _ \| |_ _____ / __/ _ \| '_ ` _ \| '_ \| | | | __/ _ \
+   | | | (_) | (__| |_) | | | (_) |  _|_____| (_| (_) | | | | | | |_) | |_| | ||  __/
+   |_|  \___/ \___| .__/|_|  \___/|_|        \___\___/|_| |_| |_| .__/ \__,_|\__\___|
+                  |_|                                           |_|
+
+   Available Sets:
+   ===================================================================================================================
+   Set Option              Set Title                         Metric Name                   Metric ID
+   -------------------------------------------------------------------------------------------------------------------
+   compute_thruput_util    Compute Throughput Utilization    SALU Utilization              11.2.3
+                                                             VALU Utilization              11.2.4
+                                                             VMEM Utilization              11.2.6
+                                                             Branch Utilization            11.2.7
+
+   ...
+
+   launch_stats            Launch Stats                      Grid Size                     7.1.0
+                                                             Workgroup Size                7.1.1
+                                                             Total Wavefronts              7.1.2
+                                                             VGPRs                         7.1.5
+                                                             AGPRs                         7.1.6
+                                                             SGPRs                         7.1.7
+                                                             LDS Allocation                7.1.8
+                                                             Scratch Allocation            7.1.9
+
+   Usage Examples:
+   rocprof-compute profile --set compute_thruput_util    # Profile this set
+   rocprof-compute profile --list-sets                   # Show this help
+
+
 .. _standalone-roofline:
 
 Standalone roofline