Files
Peter Park a36e280903 Fix links in docs (#564)
[ROCm/rocprofiler-compute commit: 9643afa62d]
2025-02-13 16:24:48 -05:00

153 řádky
5.3 KiB
ReStructuredText
Surový Trvalý odkaz Blame Historie

Tento soubor obsahuje nejednoznačné znaky Unicode
Tento soubor obsahuje znaky Unicode, které mohou být zaměněny s jinými znaky. Pokud si myslíte, že je to záměrné, můžete toto varování bezpečně ignorovat. Použijte tlačítko Escape sekvence k jejich zobrazení.
.. meta::
:description: ROCm Compute Profiler terminology and definitions
:keywords: Omniperf, ROCm Compute Profiler, ROCm, glossary, definitions, terms, profiler, tool,
Instinct, accelerator, AMD
***********
Definitions
***********
The following table briefly defines some terminology used in ROCm Compute Profiler interfaces
and in this documentation.
.. include:: ./includes/terms.rst
.. include:: ./includes/normalization-units.rst
.. _memory-spaces:
Memory spaces
=============
AMD Instinct™ MI-series accelerators can access memory through multiple address spaces
which may map to different physical memory locations on the system. The
following table provides a view into how various types of memory used
in HIP map onto these constructs:
.. list-table::
:header-rows: 1
* - LLVM Address Space
- Hardware Memory Space
- HIP Terminology
* - Generic
- Flat
- N/A
* - Global
- Global
- Global
* - Local
- LDS
- LDS/Shared
* - Private
- Scratch
- Private
* - Constant
- Same as global
- Constant
The following is a high-level description of the address spaces in the AMDGPU
backend of LLVM:
.. list-table::
:header-rows: 1
* - Address space
- Description
* - Global
- Memory that can be seen by all threads in a process, and may be backed by
the local accelerator's HBM, a remote accelerator's HBM, or the CPU's
DRAM.
* - Local
- Memory that is only visible to a particular workgroup. On AMD's Instinct
accelerator hardware, this is stored in :doc:`LDS <local-data-share>`
memory.
* - Private
- Memory that is only visible to a particular [work-item](workitem)
(thread), stored in the scratch space on AMD's Instinct accelerators.
* - Constant
- Read-only memory that is in the global address space and stored on the
local accelerator's HBM.
* - Generic
- Used when the compiler cannot statically prove that a pointer is
addressing memory in a single (non-generic) address space. Mapped to Flat
on AMD's Instinct accelerators, the pointer could dynamically address
global, local, private or constant memory.
`LLVM's documentation for AMDGPU Backend <https://llvm.org/docs/AMDGPUUsage.html#address-spaces>`_
has the most up-to-date information. Refer to this source for a more complete
explanation.
.. _memory-type:
Memory type
===========
AMD Instinct accelerators contain a number of different memory allocation
types to enable the HIP language's
:doc:`memory coherency model <hip:how-to/hip_runtime_api/memory_management/coherence_control>`.
These memory types are broadly similar between AMD Instinct accelerator
generations, but may differ in exact implementation.
In addition, these memory types *might* differ between accelerators on the same
system, even when accessing the same memory allocation.
For example, an :ref:`MI2XX <mixxx-note>` accelerator accessing *fine-grained*
memory allocated local to that device may see the allocation as coherently
cacheable, while a remote accelerator might see the same allocation as
*uncached*.
These memory types include:
.. list-table::
:header-rows: 1
* - Memory type
- Description
* - Uncached Memory (UC)
- Memory that will not be cached in this accelerator. On
:ref:`MI2XX <mixxx-note>` accelerators, this corresponds “fine-grained”
(or, “coherent”) memory allocated on a remote accelerator or the host,
for example, using ``hipHostMalloc`` or ``hipMallocManaged`` with default
allocation flags.
* - Non-hardware-Coherent Memory (NC)
- Memory that will be cached by the accelerator, and is only guaranteed to
be consistent at kernel boundaries / after software-driven
synchronization events. On :ref:`MI2XX <mixxx-note>` accelerators, this
type of memory maps to, for example, “coarse-grained” ``hipHostMalloc``d
memory -- that is, allocated with the ``hipHostMallocNonCoherent``
flag -- or ``hipMalloc``d memory allocated on a remote accelerator.
* - Coherently Cachable (CC)
- Memory for which only reads from the accelerator where the memory was
allocated will be cached. Writes to CC memory are uncached, and trigger
invalidations of any line within this accelerator. On
:ref:`MI2XX <mixxx-note>` accelerators, this type of memory maps to
“fine-grained” memory allocated on the local accelerator using, for
example, the ``hipExtMallocWithFlags`` API using the
``hipDeviceMallocFinegrained`` flag.
* - Read/Write Coherent Memory (RW)
- Memory that will be cached by the accelerator, but may be invalidated by
writes from remote devices at kernel boundaries / after software-driven
synchronization events. On :ref:`MI2XX <mixxx-note>` accelerators, this
corresponds to “coarse-grained” memory allocated locally to the
accelerator, using for example, the default ``hipMalloc`` allocator.
Find a good discussion of coarse and fine-grained memory allocations and what
type of memory is returned by various combinations of memory allocators, flags
and arguments in the
`Crusher quick-start guide <https://docs.olcf.ornl.gov/systems/crusher_quick_start_guide.html#floating-point-fp-atomic-operations-and-coarse-fine-grained-memory-allocations>`_.