.. meta::
   :description: ROCm Compute Profiler performance model: Command processor (CP)
   :keywords: Omniperf, ROCm Compute Profiler, ROCm, profiler, tool, Instinct, accelerator, command, processor, fetcher, packet processor, CPF, CPC

**********************
Command processor (CP)
**********************

The command processor (CP) is responsible for interacting with the AMDGPU kernel
driver (part of the Linux kernel) on the CPU and with user-space HSA clients
when they submit commands to HSA queues. Basic tasks of the CP include reading
commands (for example, a command corresponding to a kernel launch) out of
:hsa-runtime-pdf:`HSA queues <68>`, scheduling work to subsequent parts of the
scheduler pipeline, and marking kernels complete for synchronization events on
the host.

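
As a rough mental model, an HSA queue is a ring buffer of fixed-size packets in
memory: a host-side producer writes a packet and advances a write index, and the
CP consumes packets starting at the read index. The following is a minimal,
pure-Python sketch under that assumption. The ``RingQueue`` class and its
methods are hypothetical and are not the HSA runtime API; real AQL packets are
64 bytes, and a real producer rings the queue's doorbell signal after writing.

.. code-block:: python

   class RingQueue:
       """Hypothetical, simplified model of an HSA user-mode queue."""

       def __init__(self, size: int) -> None:
           # HSA queue sizes are powers of two, so index -> slot is a cheap mask.
           assert size > 0 and size & (size - 1) == 0
           self.size = size
           self.slots = [None] * size
           self.write_index = 0  # advanced by the host-side producer
           self.read_index = 0   # advanced by the consumer (the CP)

       def submit(self, packet) -> bool:
           """Producer side: enqueue a packet if a slot is free."""
           if self.write_index - self.read_index >= self.size:
               return False  # queue is full
           self.slots[self.write_index & (self.size - 1)] = packet
           self.write_index += 1  # a real producer would now ring the doorbell
           return True

       def fetch(self):
           """Consumer side: return the next packet, or None if the queue is empty."""
           if self.read_index == self.write_index:
               return None
           packet = self.slots[self.read_index & (self.size - 1)]
           self.read_index += 1
           return packet


   q = RingQueue(4)
   for name in ["kernel_a", "kernel_b"]:
       q.submit(name)
   print(q.fetch(), q.fetch(), q.fetch())  # kernel_a kernel_b None

Because the indices only ever grow and are masked into the slot array, the
producer and consumer never need to agree on anything except the two counters.
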
The command processor consists of two sub-components:

* :ref:`Fetcher <cpf-metrics>` (CPF): Fetches commands out of memory and hands
  them over to the CPC for processing.
* :ref:`Packet processor <cpc-metrics>` (CPC): A microcontroller running the
  command processing firmware, which decodes the fetched commands and (for
  kernels) passes them to the :ref:`workgroup processors <desc-spi>` for
  scheduling.

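
The division of labor between the two sub-components can be pictured as a
two-stage pipeline. The sketch below is a hypothetical, highly simplified model:
``fetcher`` and ``packet_processor`` are illustrative names, packets are plain
dictionaries, and the real CPF and CPC are hardware and firmware, not Python
functions.

.. code-block:: python

   from collections import deque
   from typing import Iterable, Iterator


   def fetcher(memory: Iterable[dict]) -> Iterator[dict]:
       """Hypothetical CPF stage: fetch raw command packets out of memory, in order."""
       for packet in memory:
           yield packet


   def packet_processor(packets: Iterator[dict], dispatch_queue: deque) -> list:
       """Hypothetical CPC stage: decode each packet and forward kernel launches.

       Returns the types of the non-kernel packets it handled itself.
       """
       handled = []
       for packet in packets:
           if packet["type"] == "kernel_dispatch":
               # Kernel launches continue on to workgroup scheduling.
               dispatch_queue.append(packet["kernel"])
           else:
               # Other packets (for example, barriers) are consumed by the CPC.
               handled.append(packet["type"])
       return handled


   commands = [
       {"type": "kernel_dispatch", "kernel": "gemm"},
       {"type": "barrier_and"},
       {"type": "kernel_dispatch", "kernel": "softmax"},
   ]
   to_schedule = deque()
   print(packet_processor(fetcher(commands), to_schedule), list(to_schedule))
   # ['barrier_and'] ['gemm', 'softmax']
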
Before scheduling work to the accelerator, the command processor can first
acquire a memory fence to ensure system consistency
(:hsa-runtime-pdf:`Section 2.6.4 <91>`). After the work is complete, the
command processor can apply a memory-release fence. Depending on the AMD CDNA™
accelerator in question, either of these operations *might* initiate a cache
write-back or invalidation.

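
In HSA, the fences a packet needs are requested in the packet itself: the 16-bit
header of an AQL packet carries a packet type plus 2-bit acquire and release
fence-scope fields (none, agent, or system). The sketch below decodes those
fields. The bit offsets and enumeration values mirror ``hsa.h`` in the HSA
runtime and should be checked against the header you build with; the helper
functions ``field``, ``make_header``, and ``decode_fences`` are hypothetical.
Which scopes actually trigger a write-back or invalidation on a given CDNA
accelerator is not modeled here.

.. code-block:: python

   # Bit offsets and widths within the 16-bit AQL packet header
   # (hsa_packet_header_t / hsa_packet_header_width_t in hsa.h).
   TYPE_SHIFT, TYPE_WIDTH = 0, 8
   BARRIER_SHIFT = 8
   ACQUIRE_SHIFT, RELEASE_SHIFT, SCOPE_WIDTH = 9, 11, 2

   FENCE_SCOPES = {0: "none", 1: "agent", 2: "system"}  # hsa_fence_scope_t
   KERNEL_DISPATCH = 2  # HSA_PACKET_TYPE_KERNEL_DISPATCH


   def field(header: int, shift: int, width: int) -> int:
       """Extract a bit field of the given width starting at `shift`."""
       return (header >> shift) & ((1 << width) - 1)


   def make_header(ptype: int, acquire: int, release: int, barrier: int = 0) -> int:
       """Pack a packet type, barrier bit, and fence scopes into a header value."""
       return (
           (ptype << TYPE_SHIFT)
           | (barrier << BARRIER_SHIFT)
           | (acquire << ACQUIRE_SHIFT)
           | (release << RELEASE_SHIFT)
       )


   def decode_fences(header: int) -> dict:
       """Return the packet type, barrier bit, and fence scopes of a header."""
       return {
           "type": field(header, TYPE_SHIFT, TYPE_WIDTH),
           "barrier": field(header, BARRIER_SHIFT, 1),
           "acquire": FENCE_SCOPES[field(header, ACQUIRE_SHIFT, SCOPE_WIDTH)],
           "release": FENCE_SCOPES[field(header, RELEASE_SHIFT, SCOPE_WIDTH)],
       }


   # A kernel dispatch requesting system-scope acquire and release fences.
   hdr = make_header(KERNEL_DISPATCH, acquire=2, release=2)
   print(hex(hdr), decode_fences(hdr))  # 0x1402 {...'acquire': 'system', ...}
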
Analyzing command processor performance is most useful for kernels that you
suspect are limited by scheduling or launch rate. The command processor's
metrics therefore focus on reporting, for example:

* Utilization of the fetcher
* Utilization of the packet processor, including packet decoding
* Stalls in fetching and processing

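
Utilization and stall metrics of this kind are generally ratios of cycle
counts. The following is a minimal sketch, assuming hypothetical per-block cycle
counts ``busy``, ``idle``, and ``stall`` (with stalls counted within busy
cycles); the function names are illustrative, and the metrics the tool actually
reports for each sub-component are listed in the tables that follow.

.. code-block:: python

   def percent(numerator: float, denominator: float) -> float:
       """Return 100 * numerator / denominator, or 0.0 if the denominator is zero."""
       return 100.0 * numerator / denominator if denominator else 0.0


   def cp_block_summary(busy: int, idle: int, stall: int) -> dict:
       """Hypothetical summary for one CP sub-block (CPF or CPC).

       busy, idle: cycles the block spent working or waiting for work.
       stall: cycles, counted within `busy`, in which the block made no progress.
       """
       return {
           "utilization_pct": percent(busy, busy + idle),
           "stall_pct": percent(stall, busy),
       }


   print(cp_block_summary(busy=600, idle=400, stall=150))
   # {'utilization_pct': 60.0, 'stall_pct': 25.0}

Normalizing stalls by busy cycles rather than total cycles keeps the stall
percentage meaningful even when the block is idle most of the time.
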
.. _cpf-metrics:

Command processor fetcher (CPF)
===============================

.. jinja:: cpf-metrics
   :file: _templates/metrics_table.j2

.. _cpc-metrics:

Command processor packet processor (CPC)
========================================

.. jinja:: cpc-metrics
   :file: _templates/metrics_table.j2