.. _valu-arith-instruction-mix-ex:

VALU arithmetic instruction mix
===============================

For this example, consider the
:dev-sample:`instruction mix sample <instmix.hip>` distributed as part of
Omniperf.

.. note::

   The examples in this section are expected to work on all CDNA™ accelerators.
   However, the actual experiment results in this section were collected on an
   :ref:`MI2XX <mixxx-note>` accelerator.

.. _valu-experiment-design:

Design note
-----------

This code uses a number of inline assembly instructions to cleanly
identify the types of instructions being issued, and to prevent the
compiler from optimizing them away or removing them as dead code. While
inline assembly is inherently not portable, this example is expected to
work on all GCN™ GPUs and CDNA accelerators.

We reproduce a sample of the kernel as follows:

.. code-block:: cpp

   // fp32: add, mul, transcendental and fma
   float f1, f2;
   asm volatile(
       "v_add_f32_e32 %0, %1, %0\n"
       "v_mul_f32_e32 %0, %1, %0\n"
       "v_sqrt_f32 %0, %1\n"
       "v_fma_f32 %0, %1, %0, %1\n"
       : "=v"(f1)
       : "v"(f2));

These instructions correspond to:

* a 32-bit floating-point addition,

* a 32-bit floating-point multiplication,

* a 32-bit floating-point square-root transcendental operation, and

* a 32-bit floating-point fused multiply-add operation.

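To make the operand order concrete, the following host-side C++ sketch mirrors the arithmetic of the four instructions listed above. It assumes the ``dst, src0, src1[, src2]`` operand order, with ``%0`` bound to ``f1`` and ``%1`` bound to ``f2``. The function name ``emulate_fp32_sequence`` is hypothetical and is not part of Omniperf or ``instmix.hip``; it only illustrates the arithmetic and does not issue any VALU instructions.

```cpp
#include <cmath>

// Hypothetical host-side reference for the fp32 sequence in the sample.
// Assumption: operands are ordered dst, src0, src1[, src2], with %0 == f1
// and %1 == f2. This runs on the CPU and is for illustration only.
float emulate_fp32_sequence(float f1, float f2) {
    f1 = f2 + f1;               // v_add_f32_e32 %0, %1, %0
    f1 = f2 * f1;               // v_mul_f32_e32 %0, %1, %0
    f1 = std::sqrt(f2);         // v_sqrt_f32    %0, %1  (overwrites f1)
    f1 = std::fma(f2, f1, f2);  // v_fma_f32     %0, %1, %0, %1
    return f1;
}
```

For example, ``emulate_fp32_sequence(1.0f, 4.0f)`` evaluates to ``12.0f``: the square root overwrites the earlier results with 2, and the fused multiply-add then computes 4 × 2 + 4. The numeric result is irrelevant to the profiling experiment, which only counts the instructions issued, and GPU results may differ slightly because the rounding of the hardware transcendental instruction is not guaranteed to match ``std::sqrt``.
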
For more detail, refer to the `CDNA2 ISA
Guide <https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf>`__.

Instruction mix
^^^^^^^^^^^^^^^

This example was compiled and run on an MI250 accelerator using ROCm
v5.6.0 and Omniperf v2.0.0.

.. code-block:: shell

   $ hipcc -O3 instmix.hip -o instmix

Generate the profile for this example using the following command.

.. code-block:: shell

   $ omniperf profile -n instmix --no-roof -- ./instmix

Analyze the instruction mix section.

.. code-block:: shell

   $ omniperf analyze -p workloads/instmix/mi200/ -b 10.2
   <...>
   10. Compute Units - Instruction Mix
   10.2 VALU Arithmetic Instr Mix
   ╒═════════╤════════════╤═════════╤════════════════╕
   │ Index   │ Metric     │ Count   │ Unit           │
   ╞═════════╪════════════╪═════════╪════════════════╡
   │ 10.2.0  │ INT32      │ 1.00    │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.1  │ INT64      │ 1.00    │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.2  │ F16-ADD    │ 1.00    │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.3  │ F16-MUL    │ 1.00    │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.4  │ F16-FMA    │ 1.00    │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.5  │ F16-Trans  │ 1.00    │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.6  │ F32-ADD    │ 1.00    │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.7  │ F32-MUL    │ 1.00    │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.8  │ F32-FMA    │ 1.00    │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.9  │ F32-Trans  │ 1.00    │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.10 │ F64-ADD    │ 1.00    │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.11 │ F64-MUL    │ 1.00    │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.12 │ F64-FMA    │ 1.00    │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.13 │ F64-Trans  │ 1.00    │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.14 │ Conversion │ 1.00    │ Instr per wave │
   ╘═════════╧════════════╧═════════╧════════════════╛

This shows that, by construction, each wavefront executes exactly one VALU
arithmetic instruction of each type.