.. meta::
   :description: Omniperf performance model: Command processor (CP)
   :keywords: Omniperf, ROCm, profiler, tool, Instinct, accelerator, command, processor, fetcher, packet processor, CPF, CPC

**********************
Command processor (CP)
**********************

The command processor (CP) is responsible for interacting with the AMDGPU kernel
driver (the Linux kernel driver) on the CPU and for interacting with user-space
HSA clients when they submit commands to HSA queues. Basic tasks of the CP
include reading commands (for example, those corresponding to a kernel launch)
out of :hsa-runtime-pdf:`HSA queues <68>`, scheduling work to subsequent parts
of the scheduler pipeline, and marking kernels complete for synchronization
events on the host.

The command processor consists of two sub-components:

* :ref:`Fetcher <cpf-metrics>` (CPF): Fetches commands out of memory to hand
  them over to the CPC for processing.

* :ref:`Packet processor <cpc-metrics>` (CPC): Micro-controller running the
  command processing firmware that decodes the fetched commands and (for
  kernels) passes them to the :ref:`workgroup manager <desc-spi>` for
  scheduling.

Before scheduling work to the accelerator, the command processor can
first acquire a memory fence to ensure system consistency
(:hsa-runtime-pdf:`Section 2.6.4 <91>`). After the work is complete, the
command processor can apply a memory-release fence. Depending on the AMD CDNA™
accelerator in question, either of these operations *might* initiate a cache
write-back or invalidation.

Analyzing command processor performance is most useful for kernels
that you suspect are limited by scheduling or launch rate. The command
processor's metrics therefore focus on reporting, for example:

* Utilization of the fetcher

* Utilization of the packet processor, including packet decoding

* Stalls in fetching and processing

.. _cpf-metrics:

Command processor fetcher (CPF)
===============================

.. list-table::
   :header-rows: 1

   * - Metric

     - Description

     - Unit

   * - CPF Utilization

     - Percent of total cycles where the CPF was actively doing work.
       The ratio of CPF busy cycles over total cycles counted by the CPF.

     - Percent

   * - CPF Stall

     - Percent of CPF busy cycles where the CPF was stalled for any reason.

     - Percent

   * - CPF-L2 Utilization

     - Percent of total cycles counted by the CPF-:doc:`L2 <l2-cache>` interface
       where the CPF-L2 interface was actively doing work. The ratio of CPF-L2
       busy cycles over total cycles counted by the CPF-L2 interface.

     - Percent

   * - CPF-L2 Stall

     - Percent of CPF-:doc:`L2 <l2-cache>` busy cycles where the CPF-L2
       interface was stalled for any reason.

     - Percent

   * - CPF-UTCL1 Stall

     - Percent of CPF busy cycles where the CPF was stalled by address
       translation.

     - Percent

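The CPF metrics above are ratios of cycle counts, and the denominator differs
between them: utilization is taken over total cycles, while stall is taken over
busy cycles. The following minimal Python sketch only illustrates that
arithmetic. The ``percent`` helper, the variable names, and the cycle counts
are hypothetical; they are not Omniperf APIs or real hardware counter names.

.. code-block:: python

   def percent(numerator_cycles, denominator_cycles):
       """Return numerator/denominator as a percentage (0.0 if nothing was counted)."""
       if denominator_cycles == 0:
           return 0.0
       return 100.0 * numerator_cycles / denominator_cycles


   # Hypothetical cycle counts for one profiled kernel (illustrative values only).
   cpf_total_cycles = 1_000_000
   cpf_busy_cycles = 250_000
   cpf_stall_cycles = 50_000

   # CPF Utilization: busy cycles over total cycles counted by the CPF.
   cpf_utilization = percent(cpf_busy_cycles, cpf_total_cycles)

   # CPF Stall: stalled cycles over busy cycles (note the different denominator).
   cpf_stall = percent(cpf_stall_cycles, cpf_busy_cycles)

   print(f"CPF Utilization: {cpf_utilization:.1f}%")
   print(f"CPF Stall: {cpf_stall:.1f}%")

With these made-up counts, the fetcher is busy for a quarter of the counted
cycles and stalled for a fifth of the time it is busy.
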
.. _cpc-metrics:

Command processor packet processor (CPC)
========================================

.. list-table::
   :header-rows: 1

   * - Metric

     - Description

     - Unit

   * - CPC Utilization

     - Percent of total cycles where the CPC was actively doing work.
       The ratio of CPC busy cycles over total cycles counted by the CPC.

     - Percent

   * - CPC Stall

     - Percent of CPC busy cycles where the CPC was stalled for any reason.

     - Percent

   * - CPC Packet Decoding Utilization

     - Percent of CPC busy cycles spent decoding commands for processing.

     - Percent

   * - CPC-Workgroup Manager Utilization

     - Percent of CPC busy cycles spent dispatching workgroups to the
       :ref:`workgroup manager <desc-spi>`.

     - Percent

   * - CPC-L2 Utilization

     - Percent of total cycles counted by the CPC-:doc:`L2 <l2-cache>` interface
       where the CPC-L2 interface was actively doing work.

     - Percent

   * - CPC-UTCL1 Stall

     - Percent of CPC busy cycles where the CPC was stalled by address
       translation.

     - Percent

   * - CPC-UTCL2 Utilization

     - Percent of total cycles counted by the CPC's :doc:`L2 <l2-cache>` address
       translation interface where the CPC was busy doing address translation
       work.

     - Percent
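
Several CPC metrics are expressed as a percent of busy cycles rather than of
total cycles, so they cannot be compared directly with the utilization metrics.
The sketch below rescales a busy-cycle percentage to a total-cycle percentage.
It assumes that both metrics were measured over the same cycle window, which is
an assumption of this example rather than a statement about how Omniperf
computes them. The function name and the input values are hypothetical.

.. code-block:: python

   def busy_share_to_total_share(busy_share_pct, utilization_pct):
       """Rescale a percent-of-busy-cycles metric to a percent of total cycles."""
       return busy_share_pct * utilization_pct / 100.0


   # Hypothetical metric values (illustrative only).
   cpc_utilization = 20.0  # percent of total cycles where the CPC was busy
   cpc_stall = 50.0        # percent of CPC busy cycles where the CPC was stalled

   # Fraction of all counted cycles during which the CPC was stalled.
   stall_of_total = busy_share_to_total_share(cpc_stall, cpc_utilization)

   print(f"CPC stalled for {stall_of_total:.1f}% of total cycles")

In this example, a CPC that is stalled for half of its busy time but busy for
only a fifth of the counted cycles is stalled for 10% of total cycles.
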