.. meta::
   :description: Omniperf performance model: Command processor (CP)
   :keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, command, processor, fetcher, packet processor, CPF, CPC

**********************
Command processor (CP)
**********************

The command processor (CP) is responsible for interacting with the AMDGPU kernel
driver (in the Linux kernel) on the CPU and with user-space HSA clients when
they submit commands to HSA queues. Basic tasks of the CP include reading
commands (for example, those corresponding to a kernel launch) out of
:hsa-runtime-pdf:`HSA queues <68>`, scheduling work to subsequent parts of the
scheduler pipeline, and marking kernels complete for synchronization events on
the host.

The command processor consists of two sub-components:

* :ref:`Fetcher <cpf-metrics>` (CPF): Fetches commands out of memory to hand
  them over to the CPC for processing.

* :ref:`Packet processor <cpc-metrics>` (CPC): Micro-controller running the
  command processing firmware that decodes the fetched commands and (for
  kernels) passes them to the :ref:`workgroup manager <desc-spi>` for
  scheduling.

Before scheduling work to the accelerator, the command processor can
first apply a memory-acquire fence to ensure system consistency
(:hsa-runtime-pdf:`Section 2.6.4 <91>`). After the work is complete, the
command processor can apply a memory-release fence. Depending on the AMD CDNA™
accelerator in question, either of these operations *might* initiate a cache
write-back or invalidation.

Analyzing command processor performance is most interesting for kernels
that you suspect to be limited by scheduling or launch rate. The command
processor's metrics therefore focus on reporting, for example:

* Utilization of the fetcher

* Utilization of the packet processor and of packet decoding

* Stalls in fetching and processing

.. _cpf-metrics:

Command processor fetcher (CPF)
===============================

.. list-table::
   :header-rows: 1

   * - Metric

     - Description

     - Unit

   * - CPF Utilization

     - Percent of total cycles where the CPF was busy doing any work.
       The ratio of CPF busy cycles over total cycles counted by the CPF.

     - Percent

   * - CPF Stall

     - Percent of CPF busy cycles where the CPF was stalled for any reason.

     - Percent

   * - CPF-L2 Utilization

     - Percent of total cycles counted by the CPF-:doc:`L2 <l2-cache>` interface
       where the CPF-L2 interface was busy doing any work. The ratio of CPF-L2
       busy cycles over total cycles counted by the CPF-L2.

     - Percent

   * - CPF-L2 Stall

     - Percent of CPF-:doc:`L2 <l2-cache>` busy cycles where the CPF-L2
       interface was stalled for any reason.

     - Percent

   * - CPF-UTCL1 Stall

     - Percent of CPF busy cycles where the CPF was stalled by address
       translation.

     - Percent

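The utilization and stall metrics above use different denominators: utilization
is measured against total cycles, while stall is measured against busy cycles.
The following minimal sketch shows that arithmetic. The counter values and
variable names are hypothetical and chosen for illustration only; they are not
actual hardware counter names or measured data.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return 100 * numerator / denominator, or 0.0 if the denominator is 0."""
    if denominator == 0:
        return 0.0
    return 100.0 * numerator / denominator


# Hypothetical raw cycle counts for one kernel (illustrative values only).
cpf_total_cycles = 200_000
cpf_busy_cycles = 50_000
cpf_stalled_cycles = 10_000

# CPF Utilization: busy cycles over total cycles counted by the CPF.
cpf_utilization = percent(cpf_busy_cycles, cpf_total_cycles)

# CPF Stall: stalled cycles over busy cycles (not over total cycles).
cpf_stall = percent(cpf_stalled_cycles, cpf_busy_cycles)

print(f"CPF Utilization: {cpf_utilization:.1f}%")
print(f"CPF Stall: {cpf_stall:.1f}%")
```

With these example values, the fetcher is busy for a quarter of all cycles and
stalled for a fifth of the cycles in which it is busy.
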
.. _cpc-metrics:

Command processor packet processor (CPC)
========================================

.. list-table::
   :header-rows: 1

   * - Metric

     - Description

     - Unit

   * - CPC Utilization

     - Percent of total cycles where the CPC was busy doing any work.
       The ratio of CPC busy cycles over total cycles counted by the CPC.

     - Percent

   * - CPC Stall

     - Percent of CPC busy cycles where the CPC was stalled for any reason.

     - Percent

   * - CPC Packet Decoding Utilization

     - Percent of CPC busy cycles spent decoding commands for processing.

     - Percent

   * - CPC-Workgroup Manager Utilization

     - Percent of CPC busy cycles spent dispatching workgroups to the
       :ref:`workgroup manager <desc-spi>`.

     - Percent

   * - CPC-L2 Utilization

     - Percent of total cycles counted by the CPC-:doc:`L2 <l2-cache>` interface
       where the CPC-L2 interface was busy doing any work.

     - Percent

   * - CPC-UTCL1 Stall

     - Percent of CPC busy cycles where the CPC was stalled by address
       translation.

     - Percent

   * - CPC-UTCL2 Utilization

     - Percent of total cycles counted by the CPC's :doc:`L2 <l2-cache>` address
       translation interface where the CPC was busy doing address translation
       work.

     - Percent

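Several CPC metrics are reported as a percent of CPC *busy* cycles rather than
of total cycles. To compare such a metric against the whole measurement window,
it can be scaled by CPC Utilization. The sketch below illustrates this
conversion. It assumes that both metrics were collected over the same interval,
and the function name and input values are hypothetical.

```python
def share_of_total(utilization_pct: float, busy_relative_pct: float) -> float:
    """Convert a percent-of-busy-cycles metric into a percent of total cycles.

    Assumes both inputs describe the same measurement interval.
    """
    return utilization_pct * busy_relative_pct / 100.0


# Hypothetical reported values (illustrative only).
cpc_utilization = 40.0  # percent of total cycles where the CPC was busy
cpc_stall = 25.0        # percent of CPC busy cycles where the CPC was stalled

# Stalled cycles expressed as a percent of total cycles.
cpc_stall_of_total = share_of_total(cpc_utilization, cpc_stall)

print(f"CPC stalled for {cpc_stall_of_total:.1f}% of total cycles")
```

A high stall percentage matters less when utilization is low: in this example,
a 25% stall rate on a CPC that is busy 40% of the time corresponds to 10% of
total cycles.
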