.. meta::
  :description: Compilation workflow of the HIP compilers.
  :keywords: AMD, ROCm, HIP, CUDA, HIP runtime API

.. _hip_compilers:

********************************************************************************
HIP compilers
********************************************************************************

ROCm provides the compiler driver ``hipcc``, which can be used on both AMD ROCm
and NVIDIA CUDA platforms.

On ROCm, ``hipcc`` takes care of the following:

- Setting the default library and include paths for HIP
- Setting some environment variables
- Invoking the appropriate compiler, ``amdclang++``

On the NVIDIA CUDA platform, ``hipcc`` invokes the ``nvcc`` compiler.
``amdclang++`` is based on the ``clang++`` compiler. For more
details, see the :doc:`LLVM project<llvm-project:index>`.

HIPCC
================================================================================

Common Compiler Options
--------------------------------------------------------------------------------

The following table shows the most common compiler options supported by
``hipcc``.

.. list-table::
  :header-rows: 1

  *
    - Option
    - Description
  *
    - ``-fgpu-rdc``
    - Generate relocatable device code, which allows kernels or device functions
      to call device functions in different translation units.
  *
    - ``-ggdb``
    - Equivalent to ``-g`` plus tuning for GDB. This is recommended when using
      ROCm's GDB to debug GPU code.
  *
    - ``--gpu-max-threads-per-block=<num>``
    - Generate code to support up to the specified number of threads per block.
  *
    - ``--offload-arch=<target>``
    - Generate code for the given GPU target.
      For a full list of supported compilation targets, see the `processor names in AMDGPU's LLVM documentation <https://llvm.org/docs/AMDGPUUsage.html#processors>`_.
      This option can appear multiple times to generate a fat binary for multiple
      targets.
      The platform's runtime may not support every target that the compiler can
      generate code for.
  *
    - ``-save-temps``
    - Save the compiler-generated intermediate files.
  *
    - ``-v``
    - Show the compilation steps.

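The options above can be combined in a single invocation. The following is a
minimal sketch rather than a definitive build script: the source file
``saxpy.hip``, the output name ``saxpy``, and the targets ``gfx90a`` and
``gfx1100`` are assumptions chosen for illustration. It builds one
``--offload-arch`` flag per target and calls ``hipcc`` only when the compiler
and the source file are present.

.. code-block:: shell

  # Hypothetical targets, chosen only for illustration.
  TARGETS="gfx90a gfx1100"

  # Build one --offload-arch flag per target to request a fat binary.
  ARCH_FLAGS=""
  for target in $TARGETS; do
    ARCH_FLAGS="$ARCH_FLAGS --offload-arch=$target"
  done

  # Compile only when hipcc and the (hypothetical) source file are available;
  # otherwise print the command that would have been run.
  if command -v hipcc >/dev/null 2>&1 && [ -f saxpy.hip ]; then
    hipcc -ggdb $ARCH_FLAGS saxpy.hip -o saxpy
  else
    echo "would run: hipcc -ggdb$ARCH_FLAGS saxpy.hip -o saxpy"
  fi

Replace the target list with the processors you intend to run on. Each
additional target increases compile time and binary size.
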
Linking
--------------------------------------------------------------------------------

``hipcc`` adds the necessary libraries for HIP as well as for the accelerator
compiler (``nvcc`` or ``amdclang++``). We recommend linking with ``hipcc`` since
it automatically links the binary to the necessary HIP runtime libraries.

Linking Code With Other Compilers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``nvcc`` by default uses ``g++`` to generate the host code.

``amdclang++`` generates both device and host code. The code uses the same ABI
as ``gcc``, which allows code generated by different ``gcc``-compatible
compilers to be linked together. For example, code compiled using ``amdclang++``
can link with code compiled using compilers such as ``gcc``, ``icc`` and
``clang``. Take care to ensure all compilers use the same standard C++ header
and library formats.

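As a sketch of such a mixed build, the commands below compile HIP code with
``hipcc`` and host-only code with ``g++``, then link with ``hipcc`` as
recommended above. The file names are hypothetical, and ``main.cpp`` is assumed
to be plain C++ that reaches the code in ``kernels.hip`` only through ordinary
function declarations, so it needs no HIP headers.

.. code-block:: shell

  # Hypothetical file names, chosen only for illustration.
  if command -v hipcc >/dev/null 2>&1 && [ -f kernels.hip ] && [ -f main.cpp ]; then
    hipcc -c kernels.hip -o kernels.o   # host and device code for the HIP source
    g++ -c main.cpp -o main.o           # host-only code, compiled by g++
    hipcc main.o kernels.o -o app       # hipcc adds the HIP runtime libraries
  else
    echo "hipcc or the example sources are not available; nothing was built"
  fi

Linking with ``hipcc`` avoids having to name the HIP runtime libraries by hand.
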
libc++ and libstdc++
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``hipcc`` links to ``libstdc++`` by default. This provides better compatibility
between ``g++`` and HIP.

To link to ``libc++`` instead, pass ``--stdlib=libc++`` to ``hipcc``.
Generally, ``libc++`` provides a broader set of C++ features while ``libstdc++``
is the standard for more compilers, notably including ``g++``.

When cross-linking C++ code, any C++ functions that use types from the C++
standard library, such as ``std::string``, ``std::vector`` and other containers,
must use the same standard-library implementation. This includes cross-linking
between ``amdclang++`` and other compilers.


HIP compilation workflow
================================================================================

HIP provides a flexible compilation workflow that supports both offline
compilation and runtime or just-in-time (JIT) compilation. Each approach has
advantages depending on the use case, target architecture, and performance
needs.

Offline compilation is ideal for production environments, where performance is
critical and the target GPU architecture is known in advance.

Runtime compilation is useful in development environments or when distributing
software that must run on a wide range of hardware without knowing the GPU in
advance. It provides flexibility at the cost of some performance overhead.

Offline compilation
--------------------------------------------------------------------------------

HIP code is compiled in two stages: a device-code compilation stage and a
host-code compilation stage.

- Device-code compilation stage: The compiled device code is embedded into the
  host object file. Depending on the platform, the device code can be compiled
  into assembly or binary. ``nvcc`` and ``amdclang++`` target different
  architectures and use different code object formats. ``nvcc`` uses the binary
  ``cubin`` or the assembly PTX files, while ``amdclang++`` uses the binary
  ``hsaco`` format. On CUDA platforms, the driver compiles the PTX files
  to executable code during runtime.

- Host-code compilation stage: On the host side, ``hipcc`` or ``amdclang++`` can
  compile the host code in one step without other C++ compilers. ``nvcc``, on
  the other hand, only replaces the ``<<<...>>>`` kernel launch syntax with the
  appropriate CUDA runtime function call and passes the modified host code to
  the default host compiler.

For an example of how to compile HIP from the command line, see the :ref:`SAXPY
tutorial<compiling_on_the_command_line>`.

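To observe the two stages described above, the ``-v`` and ``-save-temps``
options from the compiler options table can be combined. This is a sketch under
assumptions: ``saxpy.hip`` and the ``gfx90a`` target are placeholders, and the
set of intermediate files that is written depends on the compiler version.

.. code-block:: shell

  # Hypothetical source file and target, chosen only for illustration.
  SRC=saxpy.hip
  CMD="hipcc -v -save-temps --offload-arch=gfx90a $SRC -o saxpy"

  # Run the command only when hipcc and the source file are available;
  # otherwise print it so it can be adapted.
  if command -v hipcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $CMD    # -v prints each compilation step, -save-temps keeps intermediate files
  else
    echo "would run: $CMD"
  fi

The verbose output lists the device and host compilation steps in the order the
driver runs them, which helps when diagnosing build problems.
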
Runtime compilation
--------------------------------------------------------------------------------

HIP allows you to compile kernels at runtime using the ``hiprtc*`` API. Kernels
are stored as a text string, which is passed to HIPRTC alongside options to
guide the compilation.

For more details, see
:doc:`HIP runtime compiler <../how-to/hip_rtc>`.

Static libraries
================================================================================

``hipcc`` supports generating two types of static libraries.

- The first type of static library only exports and launches host functions
  within the same library and not the device functions. This library type can
  be linked with a non-hipcc compiler such as ``gcc``. It contains host objects
  with device code embedded as fat binaries, and is generated using the flag
  ``--emit-static-lib``:

  .. code-block:: shell

    # Build the static library; its host objects embed the device code.
    hipcc hipOptLibrary.cpp --emit-static-lib -fPIC -o libHipOptLibrary.a
    # Link with gcc; the HIP runtime library must be named explicitly.
    gcc test.cpp -L. -lHipOptLibrary -L/path/to/hip/lib -lamdhip64 -o test.out

- The second type of static library exports device functions to be linked by
  other code objects by using ``hipcc`` as the linker. This library type
  contains relocatable device objects and is generated using ``ar``:

  .. code-block:: shell

    # Compile to an object file containing relocatable device code.
    hipcc hipDevice.cpp -c -fgpu-rdc -o hipDevice.o
    # Archive the object file into a static library.
    ar rcsD libHipDevice.a hipDevice.o
    # Link with hipcc so that the device code is resolved.
    hipcc libHipDevice.a test.cpp -fgpu-rdc -o test.out

Full examples are available in the ROCm examples repository. See the examples
for
`static host libraries <https://github.com/ROCm/rocm-examples/tree/develop/HIP-Basic/static_host_library>`_
or `static device libraries <https://github.com/ROCm/rocm-examples/tree/develop/HIP-Basic/static_device_library>`_.