diff --git a/projects/hip/docs/data/how-to/hip_runtime_api/runtimes.drawio b/projects/hip/docs/data/how-to/hip_runtime_api/runtimes.drawio index a01c453452..1adf6c08c1 100644 --- a/projects/hip/docs/data/how-to/hip_runtime_api/runtimes.drawio +++ b/projects/hip/docs/data/how-to/hip_runtime_api/runtimes.drawio @@ -1,125 +1,71 @@ - + - - + + - - + + - - + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + - + - + - + - + - + - - + + - - + + - + - + - - + + - + - + - - - - - + + - - - - - - - - - - - - - - - - - + + diff --git a/projects/hip/docs/data/how-to/hip_runtime_api/runtimes.svg b/projects/hip/docs/data/how-to/hip_runtime_api/runtimes.svg index a64a7a54dc..031b83e518 100644 --- a/projects/hip/docs/data/how-to/hip_runtime_api/runtimes.svg +++ b/projects/hip/docs/data/how-to/hip_runtime_api/runtimes.svg @@ -1,2 +1,2 @@ -
[SVG text labels: HIP Runtime API, CUDA Driver API, CUDA runtime, ROCr runtime, PAL, CLR, AMD Platform] -
[SVG text labels: NVIDIA Platform, hipother; fallback text: "Text is not SVG - cannot display"]
\ No newline at end of file +
[SVG text labels: HIP Runtime API, ROCr runtime, PAL, CLR, AMD Platform] +
[SVG fallback text: "Text is not SVG - cannot display"]
\ No newline at end of file diff --git a/projects/hip/docs/data/what_is_hip/hip.drawio b/projects/hip/docs/data/what_is_hip/hip.drawio index 1a47e4b097..bd208230d2 100644 --- a/projects/hip/docs/data/what_is_hip/hip.drawio +++ b/projects/hip/docs/data/what_is_hip/hip.drawio @@ -1,155 +1,105 @@ - + - - + + - - + + - - + + - - + + - - + + - - + + - - + + - + - - + + - - + + - - - - - - - - - - - - - - - - + - - + + - - + + - + + + + + + + + + + + + + - - + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + - - + - + + + + + + + + + + - + - - - - - - - - - - + - - + + - + - - + + - - - - + + - - - - - + + - - - - - - - - - - - - - + + diff --git a/projects/hip/docs/data/what_is_hip/hip.svg b/projects/hip/docs/data/what_is_hip/hip.svg index c151dc8717..761045314c 100644 --- a/projects/hip/docs/data/what_is_hip/hip.svg +++ b/projects/hip/docs/data/what_is_hip/hip.svg @@ -1,2 +1,2 @@ -
[SVG text labels: NVIDIA runtime, NVIDIA Platform, HIP, AMD runtime, AMD Platform] -
[SVG text labels: hipLibrary, rocLibrary, cuLibrary, Application Implementation, Application, runtime API, kernel language; fallback text: "Text is not SVG - cannot display"]
\ No newline at end of file +
[SVG text labels: HIP, AMD runtime, AMD Platform] +
[SVG text labels: hipLibrary, rocLibrary, Application Implementation, Application, runtime API, kernel language; fallback text: "Text is not SVG - cannot display"]
\ No newline at end of file diff --git a/projects/hip/docs/how-to/hip_runtime_api.rst b/projects/hip/docs/how-to/hip_runtime_api.rst index 064d577b92..76d314fe8b 100644 --- a/projects/hip/docs/how-to/hip_runtime_api.rst +++ b/projects/hip/docs/how-to/hip_runtime_api.rst @@ -1,6 +1,6 @@ .. meta:: :description: HIP runtime API usage - :keywords: AMD, ROCm, HIP, CUDA, HIP runtime API How to, + :keywords: AMD, ROCm, HIP, HIP runtime API How to, .. _hip_runtime_api_how-to: @@ -9,32 +9,20 @@ Using HIP runtime API ******************************************************************************** The HIP runtime API provides C and C++ functionalities to manage event, stream, -and memory on GPUs. On the AMD platform, the HIP runtime uses -:doc:`Compute Language Runtime (CLR) <../understand/amd_clr>`, while on NVIDIA -CUDA platform, it is only a thin layer over the CUDA runtime or Driver API. +and memory on GPUs. The HIP runtime uses :doc:`Compute Language Runtime (CLR) <../understand/amd_clr>`. -- **CLR** contains source code for AMD's compute language runtimes: ``HIP`` and - ``OpenCL™``. CLR includes the ``HIP`` implementation on the AMD - platform: `hipamd `_ and the - ROCm Compute Language Runtime (``rocclr``). ``rocclr`` is a - virtual device interface that enables the HIP runtime to interact with - different backends such as :doc:`ROCr ` on Linux or PAL on - Windows. CLR also includes the `OpenCL runtime `_ - implementation. -- The **CUDA runtime** is built on top of the CUDA driver API, which is a C API - with lower-level access to NVIDIA GPUs. For details about the CUDA driver and - runtime API with reference to HIP, see :doc:`CUDA driver API porting guide <../how-to/hip_porting_driver_api>`. +CLR contains source code for AMD ROCm's compute language runtimes: ``HIP`` and +``OpenCL™``. CLR includes the ``HIP`` implementation on the AMD ROCm +platform: `hipamd `_ and the +ROCm Compute Language Runtime (``rocclr``). 
``rocclr`` is a +virtual device interface that enables the HIP runtime to interact with +different backends, such as :doc:`ROCr ` on Linux or PAL on +Microsoft Windows. CLR also includes the `OpenCL runtime `_ +implementation. -The backends of HIP runtime API under AMD and NVIDIA platform are summarized in -the following figure: +The HIP runtime API backends are summarized in the following figure: .. figure:: ../data/how-to/hip_runtime_api/runtimes.svg - -.. note:: - - On NVIDIA platform HIP runtime API calls CUDA runtime or CUDA driver via - hipother interface. For more information, see the `hipother repository `_. - Here are the various HIP Runtime API high level functions: * :doc:`./hip_runtime_api/initialization` diff --git a/projects/hip/docs/sphinx/_toc.yml.in b/projects/hip/docs/sphinx/_toc.yml.in index 7049c46d76..52e4bba9d2 100644 --- a/projects/hip/docs/sphinx/_toc.yml.in +++ b/projects/hip/docs/sphinx/_toc.yml.in @@ -16,12 +16,6 @@ subtrees: title: Installing HIP - file: install/build title: Building HIP from source - - url: https://rocm.docs.amd.com/projects/install-on-linux/en/${branch}/reference/system-requirements.html - title: Linux supported AMD GPUs - - url: https://rocm.docs.amd.com/projects/install-on-windows/en/${branch}/reference/system-requirements.html - title: Windows supported AMD GPUs - - url: https://developer.nvidia.com/cuda-gpus - title: NVIDIA supported GPUs - caption: Programming guide entries: diff --git a/projects/hip/docs/tutorial/saxpy.rst b/projects/hip/docs/tutorial/saxpy.rst index 55f12e6426..1d527db8b4 100644 --- a/projects/hip/docs/tutorial/saxpy.rst +++ b/projects/hip/docs/tutorial/saxpy.rst @@ -16,11 +16,10 @@ Prerequisites ============= To follow this tutorial, you'll need installed drivers and a HIP compiler -toolchain to compile your code. 
Because HIP for ROCm supports compiling and -running on Linux and Windows with AMD and NVIDIA GPUs, the combination of -install instructions is more than worth covering as part of this tutorial. For -more information about installing HIP development packages, see -:doc:`/install/install`. +toolchain to compile your code. Because HIP supports compiling and running on +Linux and Windows with AMD GPUs, the install instructions are more than is worth +covering as part of this tutorial. For more information about +installing HIP development packages, see :doc:`/install/install`. .. _hip-tutorial-saxpy-heterogeneous-programming: @@ -158,8 +157,8 @@ for compilation" on Linux. To make invocations more terse, Linux and Windows example follow. .. tab-set:: - .. tab-item:: Linux and AMD - :sync: linux-amd + .. tab-item:: Linux + :sync: linux While distro maintainers might package ROCm so that it installs to system-default locations, AMD's packages aren't installed that way. They need @@ -182,19 +181,8 @@ example follow. have `/opt/rocm/bin` on the Path for convenience. This subtly affects CMake package detection logic of ROCm libraries. - .. tab-item:: Linux and NVIDIA - :sync: linux-nvidia - - Both distro maintainers and NVIDIA package CUDA so that ``nvcc`` and related - tools are available on the command line by default. You can call the - compiler on the command line with: - - .. code-block:: bash - - nvcc --version - - .. tab-item:: Windows and AMD - :sync: windows-amd + .. tab-item:: Windows + :sync: windows Windows compilers and command line tooling have traditionally relied on extra environmental variables and PATH entries to function correctly. 
- Visual Studio refers to command lines with this setup as "Developer - Command Prompt" or "Developer PowerShell" for ``cmd.exe`` and PowerShell - respectively. - - The HIP and CUDA SDKs on Windows don't include complete toolchains. You will - also need: - - - The Microsoft Windows SDK. It provides the import libs to crucial system - libraries that all executables must link to and some auxiliary compiler - tooling. - - A Standard Template Library (STL). Installed as part of the Microsoft - Visual C++ compiler (MSVC) or with Visual Studio. - - If you don't have a version of Visual Studio 2022 installed, for a - minimal command line experience, install the - `Build Tools for Visual Studio 2022 `_ - with the Desktop Developemnt Workload. Under Individual Components select: - - - A version of the Windows SDK - - "MSVC v143 - VS 2022 C++ x64/x86 build tools (Latest)" - - "C++ CMake tools for Windows" (optional) - - .. note:: - - The "C++ CMake tools for Windows" individual component is a convenience which - puts both ``cmake.exe`` and ``ninja.exe`` onto the PATH inside developer - command prompts. You can install these manually, but then you must manage - them manually. - - Visual Studio 2017 and later are detectable as COM object instances via WMI. - To setup a command line from any shell for the latest Visual Studio's - default Visual C++ toolset issue: - - .. code-block:: powershell - - $InstallationPath = Get-CimInstance MSFT_VSInstance | Sort-Object -Property Version -Descending | Select-Object -First 1 -ExpandProperty InstallLocation - Import-Module $InstallationPath\Common7\Tools\Microsoft.VisualStudio.DevShell.dll - Enter-VsDevShell -InstallPath $InstallationPath -SkipAutomaticLocation -Arch amd64 -HostArch amd64 -DevCmdArguments '-no_logo' - - You should be able to call the compiler on the command line now: - - .. 
code-block:: powershell - - nvcc --version - Invoking the compiler manually ------------------------------ To compile and link a single-file application, use the following commands: .. tab-set:: - .. tab-item:: Linux and AMD - :sync: linux-amd + .. tab-item:: Linux + :sync: linux .. code-block:: bash amdclang++ ./HIP-Basic/saxpy/main.hip -o saxpy -I ./Common -lamdhip64 -L /opt/rocm/lib -O2 - .. tab-item:: Linux and NVIDIA - :sync: linux-nvidia - - .. code-block:: bash - - nvcc ./HIP-Basic/saxpy/main.hip -o saxpy -I ./Common -I /opt/rocm/include -O2 -x cu - - .. tab-item:: Windows and AMD - :sync: windows-amd + .. tab-item:: Windows + :sync: windows .. code-block:: powershell clang++ .\HIP-Basic\saxpy\main.hip -o saxpy.exe -I .\Common -lamdhip64 -L ${env:HIP_PATH}lib -O2 - .. tab-item:: Windows and NVIDIA - :sync: windows-nvidia - - .. code-block:: powershell - - nvcc .\HIP-Basic\saxpy\main.hip -o saxpy.exe -I ${env:HIP_PATH}include -I .\Common -O2 -x cu - Depending on your computer, the resulting binary might or might not run. If not, it typically complains about "Invalid device function". That error (corresponding to the ``hipErrorInvalidDeviceFunction`` entry of ``hipError_t``) @@ -341,8 +265,8 @@ find out what device binary flavors are embedded into the executable? .. tab-set:: - .. tab-item:: Linux and AMD - :sync: linux-amd + .. tab-item:: Linux + :sync: linux The utilities included with ROCm help significantly to inspect binary artifacts on disk. Add the ROCmCC installation folder to your PATH if you @@ -432,32 +356,8 @@ find out what device binary flavors are embedded into the executable? The filename notes the graphics IPs used by the compiler. The contents of this file are similar to the `*.s` file created with ``llvm-objdump`` earlier. - .. tab-item:: Linux and NVIDIA - :sync: linux-nvidia - - Unlike HIP on AMD, when compiling using the NVIDIA support of HIP the resulting - binary will be a valid CUDA executable as far as the binary goes. 
Therefor - it'll incorporate PTX ISA (Parallel Thread eXecution Instruction Set - Architecture) instead of AMDGPU binary. As s result, tooling shipping with the - CUDA SDK can be used to inspect which device ISA got compiled into a specific - executable. The tool most useful to us currently is ``cuobjdump``. - - .. code-block:: bash - - cuobjdump --list-ptx ./saxpy - - Which will print something like: - - .. code-block:: - - PTX file 1: saxpy.1.sm_52.ptx - - From this we can see that the saxpy kernel is stored as ``sm_52``, which shows - that a compute capability 5.2 ISA got embedded into the executable, so devices - which sport compute capability 5.2 or newer will be able to run this code. - - .. tab-item:: Windows and AMD - :sync: windows-amd + .. tab-item:: Windows + :sync: windows The HIP SDK for Windows don't yet sport the ``roc-*`` set of utilities to work with binary artifacts. To find out what binary formats are embedded into an @@ -562,36 +462,12 @@ find out what device binary flavors are embedded into the executable? s_endpgm ... - .. tab-item:: Windows and NVIDIA - :sync: windows-nvidia - - Unlike HIP on AMD, when compiling using the NVIDIA support for HIP, the resulting - binary will be a valid CUDA executable. Therefore, it'll incorporate PTX ISA - (Parallel Thread eXecution Instruction Set Architecture) instead of AMDGPU - binary. As a result, tooling included with the CUDA SDK can be used to - inspect which device ISA was compiled into a specific executable. The most - helpful to us currently is ``cuobjdump``. - - .. code-block:: bash - - cuobjdump.exe --list-ptx .\saxpy.exe - - Which prints something like: - - .. code-block:: - - PTX file 1: saxpy.1.sm_52.ptx - - This example shows that the SAXPY kernel is stored as ``sm_52``. It also shows - that a compute capability 5.2 ISA was embedded into the executable, so devices - that support compute capability 5.2 or newer will be able to run this code. 
- Now that you've found what binary got embedded into the executable, find which format our available devices use. .. tab-set:: - .. tab-item:: Linux and AMD - :sync: linux-amd + .. tab-item:: Linux + :sync: linux On Linux a utility called ``rocminfo`` helps us list all the properties of the devices available on the system, including which version of graphics IP @@ -618,60 +494,8 @@ format our available devices use. Calculating y[i] = a * x[i] + y[i] over 1000000 elements. First 10 elements of the results: [ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ] - .. tab-item:: Linux and NVIDIA - :sync: linux-nvidia - - On Linux HIP with the NVIDIA back-end, the ``deviceQuery`` CUDA SDK sample - can help us list all the properties of the devices available on the system, - including which version of compute capability a device sports. - ``.`` compute capability is passed to ``nvcc`` on the - command-line as ``sm_``, for eg. ``8.6`` is ``sm_86``. - - Because it's not included as a binary, compile the matching - example from ROCm. - - .. code-block:: bash - - nvcc ./HIP-Basic/device_query/main.cpp -o device_query -I ./Common -I /opt/rocm/include -O2 - - Filter the output to have only the lines of interest, for example: - - .. code-block:: bash - - ./device_query | grep "major.minor" - major.minor: 8.6 - major.minor: 7.0 - - .. note:: - - In addition to the ``nvcc`` executable is another tool called ``__nvcc_device_query`` - which prints the SM Architecture numbers to standard out as a comma - separated list of numbers. The utility's name suggests it's not a user-facing - executable but is used by ``nvcc`` to determine what devices are in the - system at hand. - - Now that you know which graphics IPs our devices use, recompile your program with - the appropriate parameters. - - .. code-block:: bash - - nvcc ./HIP-Basic/saxpy/main.hip -o saxpy -I ./Common -I /opt/rocm/include -O2 -x cu -arch=sm_70,sm_86 - - .. 
note:: - - If you want to portably target the development machine which is compiling, you - may specify ``-arch=native`` instead. - - Now the sample will run. - - .. code-block:: - - ./saxpy - Calculating y[i] = a * x[i] + y[i] over 1000000 elements. - First 10 elements of the results: [ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ] - - .. tab-item:: Windows and AMD - :sync: windows-amd + .. tab-item:: Windows + :sync: windows On Windows, a utility called ``hipInfo.exe`` helps us list all the properties of the devices available on the system, including which version of graphics IP @@ -698,56 +522,3 @@ format our available devices use. .\saxpy.exe Calculating y[i] = a * x[i] + y[i] over 1000000 elements. First 10 elements of the results: [ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ] - - .. tab-item:: Windows and NVIDIA - :sync: windows-nvidia - - On Windows HIP with the NVIDIA back-end, the ``deviceQuery`` CUDA SDK sample - can help us list all the properties of the devices available on the system, - including which version of compute capability a device sports. - ``.`` compute capability is passed to ``nvcc`` on the - command-line as ``sm_``, for eg. ``8.6`` is ``sm_86``. - - Because it's not included as a binary, compile the matching - example from ROCm. - - .. code-block:: powershell - - nvcc .\HIP-Basic\device_query\main.cpp -o device_query.exe -I .\Common -I ${env:HIP_PATH}include -O2 - - Filter the output to have only the lines of interest, for example: - - .. code-block:: powershell - - .\device_query.exe | Select-String "major.minor" - - major.minor: 8.6 - major.minor: 7.0 - - .. note:: - - Next to the ``nvcc`` executable is another tool called ``__nvcc_device_query.exe`` - which simply prints the SM Architecture numbers to standard out as a comma - separated list of numbers. The naming of this utility suggests it's not a user - facing executable but is used by ``nvcc`` to determine what devices are in the - system at hand. 
- - Now that you know which graphics IPs our devices use, recompile your program with - the appropriate parameters. - - .. code-block:: powershell - - nvcc .\HIP-Basic\saxpy\main.hip -o saxpy.exe -I ${env:HIP_PATH}include -I .\Common -O2 -x cu -arch=sm_70,sm_86 - - .. note:: - - If you want to portably target the development machine which is compiling, you - may specify ``-arch=native`` instead. - - Now the sample will run. - - .. code-block:: - - .\saxpy.exe - Calculating y[i] = a * x[i] + y[i] over 1000000 elements. - First 10 elements of the results: [ 3, 5, 7, 9, 11, 13, 15, 17, 19, 21 ] diff --git a/projects/hip/docs/what_is_hip.rst b/projects/hip/docs/what_is_hip.rst index 96875c65f2..a6e8b8d7f3 100644 --- a/projects/hip/docs/what_is_hip.rst +++ b/projects/hip/docs/what_is_hip.rst @@ -1,6 +1,6 @@ .. meta:: :description: This chapter provides an introduction to the HIP API. - :keywords: AMD, ROCm, HIP, CUDA, C++ language extensions + :keywords: AMD, ROCm, HIP, C++ language extensions .. _intro-to-hip: @@ -8,39 +8,36 @@ What is HIP? ******************************************************************************* -The Heterogeneous-computing Interface for Portability (HIP) API is a C++ runtime API -and kernel language that lets developers create portable applications running in heterogeneous systems, -using CPUs and AMD GPUs or NVIDIA GPUs from a single source code. HIP provides a simple -marshalling language to access either the AMD ROCM back-end, or NVIDIA CUDA back-end, -to build and run application kernels. +The Heterogeneous-computing Interface for Portability (HIP) API, part of AMD's +ROCm platform, is a C++ runtime API and kernel language that lets developers +create portable applications that run on heterogeneous systems, using CPUs and +AMD GPUs from a single source code base. .. figure:: data/what_is_hip/hip.svg :alt: HIP in an application. 
:align: center * HIP is a thin API with little or no performance impact over coding directly - in NVIDIA CUDA or AMD :doc:`ROCm `. + in AMD :doc:`ROCm `. -* HIP enables coding in a single-source C++ programming language including +* HIP enables coding in a single-source C++ programming language, including features such as templates, C++11 lambdas, classes, namespaces, and more. -* Developers can specialize for the platform (CUDA or ROCm) to tune for - performance or handle tricky cases. +* Developers can tune for performance or handle tricky cases via HIP. -ROCm offers compilers (``clang``, ``hipcc``), code -profilers (``rocprof``, ``omnitrace``), debugging tools (``rocgdb``), libraries -and HIP with the runtime API and kernel language, to create heterogeneous applications -running on both CPUs and GPUs. ROCm provides marshalling libraries like +ROCm offers compilers (``clang``, ``hipcc``), code profilers (``rocprofv3``), +debugging tools (``rocgdb``), libraries and HIP with the runtime +API and kernel language, to create heterogeneous applications running on both +CPUs and GPUs. ROCm provides marshalling libraries like :doc:`hipFFT ` or :doc:`hipBLAS ` that act as a -thin programming layer over either NVIDIA CUDA or AMD ROCm to enable support for -either back-end. These libraries offer pointer-based memory interfaces and are -easily integrated into your applications. +thin programming layer over AMD ROCm and offer API compatibility with the +equivalent NVIDIA CUDA libraries. These libraries provide pointer-based memory +interfaces and can be easily integrated into your applications. -HIP supports the ability to build and run on either AMD GPUs or NVIDIA GPUs. +HIP supports building and running on either AMD GPUs or NVIDIA GPUs. GPU Programmers familiar with NVIDIA CUDA or OpenCL will find the HIP API -familiar and easy to use. Developers no longer need to choose between AMD or -NVIDIA GPUs. 
You can quickly port your application to run on the available -hardware while maintaining a single codebase. The :doc:`HIPify ` +familiar and easy to use. You can quickly port your application to run on the +available hardware while maintaining a single codebase. The :doc:`HIPify ` tools, based on the clang front-end and Perl language, can convert CUDA API calls into the corresponding HIP API calls. However, HIP is not intended to be a drop-in replacement for CUDA, and developers should expect to do some manual @@ -56,8 +53,7 @@ develop massively parallel programs that run on GPUs, and provides access to GPU specific hardware capabilities. In summary, HIP simplifies cross-platform development, maintains performance, -and provides a familiar C++ experience for GPU programming that runs seamlessly -on both AMD and NVIDIA GPUs. +and provides a familiar C++ experience for GPU programming that runs seamlessly on AMD GPUs. HIP components =============================================== @@ -68,19 +64,11 @@ associated with each component, see :doc:`HIP licensing `. C++ runtime API ----------------------------------------------- -For the AMD ROCm platform, HIP provides headers and a runtime library built on -top of HIP-Clang compiler in the repository -:doc:`Compute Language Runtime (CLR) `. The HIP runtime -implements HIP streams, events, and memory APIs, and is an object library that -is linked with the application. The source code for all headers and the library -implementation is available on GitHub. - -For the NVIDIA CUDA platform, HIP provides headers that translate from the -HIP runtime API to the CUDA runtime API. The host-side contains mostly inlined -wrappers or even just preprocessor defines, with no additional overhead. -The device-side code is compiled with ``nvcc``, just like normal CUDA kernels, -and therefore one can expect the same performance as if directly coding in CUDA. -The CUDA specific headers can be found in the `hipother repository `_. 
+HIP provides headers and a runtime library built on top of the HIP-Clang compiler in +the repository :doc:`Compute Language Runtime (CLR) `. The +HIP runtime implements HIP streams, events, and memory APIs, and is an object +library that is linked with the application. The source code for all headers and +the library implementation is available on GitHub. For further details, check :ref:`HIP Runtime API Reference `.