Files
rocm-systems/projects/rocprofiler-compute/docs/tutorial/includes/valu-arithmetic-instruction-mix.rst
T
Peter Park 5d22d5ac8e Docs: refactor and integrate into ROCm docs portal (#362)
* pip-compile docs/requirements.txt

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Add Sphinx docs config

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Add Sphinx config

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Update docs build config

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* style(conf.py): Apply black formatting to docs/conf.py

Signed-off-by: Sam Wu <22262939+samjwu@users.noreply.github.com>

* Update docs requirements

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Update to rocm-docs-core 1.3.0

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Update docs requirements

Signed-off-by: Peter Jun Park <peter.park@amd.com>

pip-compile requirements

Signed-off-by: Peter Jun Park <peter.park@amd.com>

bump rocm-docs-core to 1.5.0

bump rocm-docs-core to 1.4.1

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* Add dependabot.yml and update CODEOWNERS

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Update toc and conf

Signed-off-by: Peter Jun Park <peter.park@amd.com>

update dependabot

* Port docs to rocm-docs standard

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Add toc and Diataxis cards

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Add basic file structure

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add glossary

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add includes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

Add license.rst

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add compatible hw

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fix spelling and license

Signed-off-by: Peter Jun Park <peter.park@amd.com>

clean up index

Signed-off-by: Peter Jun Park <peter.park@amd.com>

clean up installation guides

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add basic usage (quickstart)

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add ref to global options

update toc

Signed-off-by: Peter Jun Park <peter.park@amd.com>

modularize modes and global options

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add profile mode

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fixes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

reorg and clean up

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add dynamic omniperf version number in installation guide

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add datatemplate

more reorg

Signed-off-by: Peter Jun Park <peter.park@amd.com>

clean up

Signed-off-by: Peter Jun Park <peter.park@amd.com>

reorg images

move profile mode

reorg

reorg

reorg more

fix formatting

fix headings

ref anchor mi2xx note

add extlinks

add extlinks

Signed-off-by: Peter Jun Park <peter.park@amd.com>

black format

fix formatting, anchors

Signed-off-by: Peter Jun Park <peter.park@amd.com>

reorg

fix words and formatting

Signed-off-by: Peter Jun Park <peter.park@amd.com>

formatting

Signed-off-by: Peter Jun Park <peter.park@amd.com>

same

reorg

format

fix formatting

fix toc

Signed-off-by: Peter Jun Park <peter.park@amd.com>

format

* impr internal linking and fix sphinx warnings

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* add spellcheck/linting from rocm-docs-core

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fix rst directives

satisfy spellcheck

fix more spelling

rm unused files

fix spelling and update wordlist

* bump rocm-docs-core to 1.6.0

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* add fixes from @skyreflectedinmirrors and @lpaoletti

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add references to toc

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add more fixes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* add package manager install section

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* add fixes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add metadata and fixes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add fixes

bump to 1.6.1

more fixes

fix fmt in profiling examples

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add missing mem type table

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fix formatting

fmt

* add custom css

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fix css fs

* make images/figs click-to-expand

Signed-off-by: Peter Jun Park <peter.park@amd.com>

add missed image

update

fix link

* update documentation link in README

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* formatting fixes

Signed-off-by: Peter Jun Park <peter.park@amd.com>

more formatting

* fix heading

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* move archived docs

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* exclude archived docs from docs build

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* update archived docs workflow

Signed-off-by: Peter Jun Park <peter.park@amd.com>

move files

update archived docs workflow

Signed-off-by: Peter Jun Park <peter.park@amd.com>

fix version number

clean up workflow

workflow test

workflow test

another workflow test

* rm docs linting

Signed-off-by: Peter Jun Park <peter.park@amd.com>

* Apply cmake-format suggested changes

Signed-off-by: Sam Wu <22262939+samjwu@users.noreply.github.com>

* Apply cmake-format

Signed-off-by: Sam Wu <22262939+samjwu@users.noreply.github.com>

---------

Signed-off-by: Peter Jun Park <peter.park@amd.com>
Signed-off-by: Sam Wu <22262939+samjwu@users.noreply.github.com>
Co-authored-by: Sam Wu <22262939+samjwu@users.noreply.github.com>

[ROCm/rocprofiler-compute commit: a0dc485ceb]
2024-08-09 09:46:42 -04:00

114 řádky
5.7 KiB
ReStructuredText

.. _valu-arith-instruction-mix-ex:
VALU arithmetic instruction mix
===============================
For this example, consider the
:dev-sample:`instruction mix sample <instmix.hip>` distributed as a part
of Omniperf.
.. note::
The examples in the section are expected to work on all CDNA™ accelerators.
However, the actual experiment results in this section were collected on an
:ref:`MI2XX <mixxx-note>` accelerator.
.. _valu-experiment-design:
Design note
-----------
This code uses a number of inline assembly instructions to cleanly
identify the types of instructions being issued, as well as to avoid
optimization / dead-code elimination by the compiler. While inline
assembly is inherently not portable, this example is expected to work on
all GCN™ GPUs and CDNA accelerators.
We reproduce a sample of the kernel as follows:
.. code-block:: cpp
// fp32: add, mul, transcendental and fma
float f1, f2;
asm volatile(
"v_add_f32_e32 %0, %1, %0\n"
"v_mul_f32_e32 %0, %1, %0\n"
"v_sqrt_f32 %0, %1\n"
"v_fma_f32 %0, %1, %0, %1\n"
: "=v"(f1)
: "v"(f2));
These instructions correspond to:
* A 32-bit floating point addition,
* a 32-bit floating point multiplication,
* a 32-bit floating point square-root transcendental operation, and
* a 32-bit floating point fused multiply-add operation.
For more detail, refer to the `CDNA2 ISA
Guide <https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf>`__.
Instruction mix
^^^^^^^^^^^^^^^
This example was compiled and run on a MI250 accelerator using ROCm
v5.6.0, and Omniperf v2.0.0.
.. code-block:: shell
$ hipcc -O3 instmix.hip -o instmix
Generate the profile for this example using the following command.
.. code-block:: shell
$ omniperf profile -n instmix --no-roof -- ./instmix
Analyze the instruction mix section.
.. code-block:: shell
$ omniperf analyze -p workloads/instmix/mi200/ -b 10.2
<...>
10. Compute Units - Instruction Mix
10.2 VALU Arithmetic Instr Mix
╒═════════╤════════════╤═════════╤════════════════╕
│ Index │ Metric │ Count │ Unit │
╞═════════╪════════════╪═════════╪════════════════╡
│ 10.2.0 │ INT32 │ 1.00 │ Instr per wave │
├─────────┼────────────┼─────────┼────────────────┤
│ 10.2.1 │ INT64 │ 1.00 │ Instr per wave │
├─────────┼────────────┼─────────┼────────────────┤
│ 10.2.2 │ F16-ADD │ 1.00 │ Instr per wave │
├─────────┼────────────┼─────────┼────────────────┤
│ 10.2.3 │ F16-MUL │ 1.00 │ Instr per wave │
├─────────┼────────────┼─────────┼────────────────┤
│ 10.2.4 │ F16-FMA │ 1.00 │ Instr per wave │
├─────────┼────────────┼─────────┼────────────────┤
│ 10.2.5 │ F16-Trans │ 1.00 │ Instr per wave │
├─────────┼────────────┼─────────┼────────────────┤
│ 10.2.6 │ F32-ADD │ 1.00 │ Instr per wave │
├─────────┼────────────┼─────────┼────────────────┤
│ 10.2.7 │ F32-MUL │ 1.00 │ Instr per wave │
├─────────┼────────────┼─────────┼────────────────┤
│ 10.2.8 │ F32-FMA │ 1.00 │ Instr per wave │
├─────────┼────────────┼─────────┼────────────────┤
│ 10.2.9 │ F32-Trans │ 1.00 │ Instr per wave │
├─────────┼────────────┼─────────┼────────────────┤
│ 10.2.10 │ F64-ADD │ 1.00 │ Instr per wave │
├─────────┼────────────┼─────────┼────────────────┤
│ 10.2.11 │ F64-MUL │ 1.00 │ Instr per wave │
├─────────┼────────────┼─────────┼────────────────┤
│ 10.2.12 │ F64-FMA │ 1.00 │ Instr per wave │
├─────────┼────────────┼─────────┼────────────────┤
│ 10.2.13 │ F64-Trans │ 1.00 │ Instr per wave │
├─────────┼────────────┼─────────┼────────────────┤
│ 10.2.14 │ Conversion │ 1.00 │ Instr per wave │
╘═════════╧════════════╧═════════╧════════════════╛
This shows that we have exactly one of each type of VALU arithmetic instruction
by construction.