.. _valu-arith-instruction-mix-ex:

VALU arithmetic instruction mix
===============================

For this example, consider the
:dev-sample:`instruction mix sample <instmix.hip>` distributed as part
of Omniperf.

.. note::

   The examples in this section are expected to work on all CDNA™ accelerators.
   However, the actual experiment results in this section were collected on an
   :ref:`MI2XX <mixxx-note>` accelerator.

.. _valu-experiment-design:

Design note
-----------

This code uses a number of inline assembly instructions to cleanly
identify the types of instructions being issued, as well as to avoid
optimization / dead-code elimination by the compiler. While inline
assembly is inherently not portable, this example is expected to work on
all GCN™ GPUs and CDNA accelerators.

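The role of ``asm volatile`` in defeating dead-code elimination can also be
illustrated on the host. The following is a minimal sketch and is not part of
the Omniperf sample: it assumes a GCC- or Clang-compatible compiler, and
``keep_live`` and ``add_then_multiply`` are hypothetical names chosen for
illustration.

.. code-block:: cpp

   // Host-side analogue (assumption: GCC/Clang extended inline assembly).
   // The empty `asm volatile` statement declares that it reads and writes
   // `value` in memory, so the optimizer has to materialize the value and
   // cannot discard the computation that produced it as dead code.
   float keep_live(float value) {
       asm volatile("" : "+m"(value));
       return value;
   }

   // Hypothetical example: both intermediate results are kept live.
   float add_then_multiply(float a, float b) {
       float sum = keep_live(a + b);  // the sum cannot be dropped as unused
       return keep_live(sum * b);     // neither can the product
   }

Without the empty assembly statement, an optimizer is free to drop arithmetic
whose result is never observed. The sample kernel goes one step further and
writes the VALU instructions themselves in assembly, so the exact opcodes
issued are known.
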
We reproduce a sample of the kernel as follows:

.. code-block:: cpp

   // fp32: add, mul, transcendental and fma
   float f1, f2;
   asm volatile(
       "v_add_f32_e32 %0, %1, %0\n"
       "v_mul_f32_e32 %0, %1, %0\n"
       "v_sqrt_f32 %0, %1\n"
       "v_fma_f32 %0, %1, %0, %1\n"
       : "=v"(f1)
       : "v"(f2));

These instructions correspond to:

* A 32-bit floating-point addition,

* a 32-bit floating-point multiplication,

* a 32-bit floating-point square-root transcendental operation, and

* a 32-bit floating-point fused multiply-add operation.

For more detail, refer to the `CDNA2 ISA
Guide <https://www.amd.com/system/files/TechDocs/instinct-mi200-cdna2-instruction-set-architecture.pdf>`__.

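To connect these four instructions with the row names that appear in the
profiler output in this example, the correspondence can be sketched as a
simple lookup. This is an illustration only: ``valu_f32_category`` is a
hypothetical helper that is not part of the sample or of Omniperf, and it
covers only the four mnemonics shown in the kernel excerpt.

.. code-block:: cpp

   #include <string>

   // Hypothetical helper: maps a VALU mnemonic from the kernel excerpt to the
   // row name used in the "VALU Arithmetic Instr Mix" table. Illustration
   // only; it does not reflect how the profiler classifies instructions.
   std::string valu_f32_category(const std::string& mnemonic) {
       auto starts_with = [&](const std::string& prefix) {
           return mnemonic.compare(0, prefix.size(), prefix) == 0;
       };
       if (starts_with("v_add_f32")) return "F32-ADD";
       if (starts_with("v_mul_f32")) return "F32-MUL";
       if (starts_with("v_fma_f32")) return "F32-FMA";
       if (starts_with("v_sqrt_f32")) return "F32-Trans";
       return "unknown";  // anything outside the four sampled instructions
   }

For instance, ``valu_f32_category("v_sqrt_f32")`` yields ``F32-Trans``,
matching the square-root transcendental operation in the list above.
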
Instruction mix
^^^^^^^^^^^^^^^

This example was compiled and run on an MI250 accelerator using ROCm
v5.6.0 and Omniperf v2.0.0.

.. code-block:: shell

   $ hipcc -O3 instmix.hip -o instmix

Generate the profile for this example using the following command.

.. code-block:: shell

   $ omniperf profile -n instmix --no-roof -- ./instmix

Analyze the instruction mix section.

.. code-block:: shell

   $ omniperf analyze -p workloads/instmix/mi200/ -b 10.2
   <...>
   10. Compute Units - Instruction Mix
   10.2 VALU Arithmetic Instr Mix
   ╒═════════╤════════════╤═════════╤════════════════╕
   │ Index   │ Metric     │   Count │ Unit           │
   ╞═════════╪════════════╪═════════╪════════════════╡
   │ 10.2.0  │ INT32      │    1.00 │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.1  │ INT64      │    1.00 │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.2  │ F16-ADD    │    1.00 │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.3  │ F16-MUL    │    1.00 │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.4  │ F16-FMA    │    1.00 │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.5  │ F16-Trans  │    1.00 │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.6  │ F32-ADD    │    1.00 │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.7  │ F32-MUL    │    1.00 │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.8  │ F32-FMA    │    1.00 │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.9  │ F32-Trans  │    1.00 │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.10 │ F64-ADD    │    1.00 │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.11 │ F64-MUL    │    1.00 │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.12 │ F64-FMA    │    1.00 │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.13 │ F64-Trans  │    1.00 │ Instr per wave │
   ├─────────┼────────────┼─────────┼────────────────┤
   │ 10.2.14 │ Conversion │    1.00 │ Instr per wave │
   ╘═════════╧════════════╧═════════╧════════════════╛

This shows that we have exactly one of each type of VALU arithmetic instruction
per wave, by construction.

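The ``Instr per wave`` unit suggests how to read these counts. As a rough
sketch, assume that each metric is a raw instruction count divided by the
number of wavefronts launched. The function and parameter names below are
hypothetical and are not Omniperf APIs.

.. code-block:: cpp

   #include <cstdint>

   // Assumption: a "per wave" metric is a raw instruction count divided by
   // the number of wavefronts launched. Hypothetical helper for illustration.
   double instr_per_wave(std::uint64_t instr_count, std::uint64_t wave_count) {
       if (wave_count == 0) {
           return 0.0;  // no wavefronts launched, so nothing to normalize
       }
       return static_cast<double>(instr_count) / static_cast<double>(wave_count);
   }

Under that assumption, a kernel in which every wavefront executes a single
``v_add_f32`` reports 1.00 for F32-ADD regardless of how many wavefronts are
launched, which is consistent with the table above.
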