rocm-systems/projects/hip/docs/how-to/hip_runtime_api/multi_device.rst

.. meta::
    :description: This chapter describes how to use multiple devices on one host.
    :keywords: ROCm, HIP, multi-device, multiple, GPUs, devices

.. _multi-device:

*******************************************************************************
Multi-device management
*******************************************************************************

Device enumeration
===============================================================================

Device enumeration involves identifying all the available GPUs connected to the
host system. A single host machine can have multiple GPUs, each with its own
unique identifier. By listing these devices, you can decide which GPU to use
for computation. The host queries the system to count and list all connected
GPUs that support the chosen ``HIP_PLATFORM``, ensuring that the application
can leverage the full computational power available. Typically, applications
list devices and their properties for deployment planning, and also make
dynamic selections during runtime to ensure optimal performance.

If the application does not define a specific GPU, device 0 is selected.

.. literalinclude:: ../../tools/example_codes/device_enumeration.cpp
    :start-after: // [sphinx-start]
    :end-before: // [sphinx-end]
    :language: cpp

.. _multi_device_selection:

Device selection
===============================================================================

Once you have enumerated the available GPUs, the next step is to select a
specific device for computation. This involves setting the active GPU that will
execute subsequent operations. This step is crucial in multi-GPU systems where
different GPUs might have different capabilities or workloads. By selecting the
appropriate device, you ensure that the computational tasks are directed to the
correct GPU, optimizing performance and resource utilization.

.. literalinclude:: ../../tools/example_codes/device_selection.hip
    :start-after: // [sphinx-start]
    :end-before: // [sphinx-end]
    :language: cpp

Stream and event behavior
===============================================================================

In a multi-device system, streams and events are essential for efficient
parallel computation and synchronization. Streams enable asynchronous task
execution, allowing multiple devices to process data concurrently without
blocking one another. Events provide a mechanism for synchronizing operations
across streams and devices, ensuring that tasks on one device are completed
before dependent tasks on another device begin. This coordination prevents race
conditions and optimizes data flow in multi-GPU systems. Together, streams and
events maximize performance by enabling parallel execution, load balancing, and
effective resource utilization across heterogeneous hardware.

.. literalinclude:: ../../tools/example_codes/multi_device_synchronization.hip
    :start-after: // [sphinx-start]
    :end-before: // [sphinx-end]
    :language: cpp

Peer-to-peer memory access
===============================================================================

In multi-GPU systems, peer-to-peer memory access enables one GPU to directly
read or write to the memory of another GPU. This capability reduces data
transfer times by allowing GPUs to communicate directly without involving the
host. Enabling peer-to-peer access can significantly improve the performance of
applications that require frequent data exchange between GPUs, as it eliminates
the need to transfer data through the host memory.

By adding peer-to-peer access to the example referenced in
:ref:`multi_device_selection`, data can be efficiently copied between devices.
If peer-to-peer access is not activated, the call to :cpp:func:`hipMemcpy`
still works but internally uses a staging buffer in host memory, which incurs a
performance penalty.

.. tab-set::

    .. tab-item:: with peer-to-peer

        .. literalinclude:: ../../tools/example_codes/p2p_memory_access.hip
            :start-after: // [sphinx-start]
            :end-before: // [sphinx-end]
            :emphasize-lines: 45-51, 65-69
            :language: cpp

    .. tab-item:: without peer-to-peer

        .. literalinclude:: ../../tools/example_codes/p2p_memory_access_host_staging.hip
            :start-after: // [sphinx-start]
            :end-before: // [sphinx-end]
            :emphasize-lines: 57-59
            :language: cpp