diff --git a/docs/markdown/hip_faq.md b/docs/markdown/hip_faq.md index 101d99486e..8aa3daa239 100644 --- a/docs/markdown/hip_faq.md +++ b/docs/markdown/hip_faq.md @@ -15,20 +15,23 @@ - [Do HIPIFY tools automatically convert all source code?](#do-hipify-tools-automatically-convert-all-source-code) - [What is NVCC?](#what-is-nvcc) - [What is HCC?](#what-is-hcc) +- [What is HIP-Clang?](#what-is-hip-clang) - [Why use HIP rather than supporting CUDA directly?](#why-use-hip-rather-than-supporting-cuda-directly) - [Can I develop HIP code on an Nvidia CUDA platform?](#can-i-develop-hip-code-on-an-nvidia-cuda-platform) -- [Can I develop HIP code on an AMD HCC platform?](#can-i-develop-hip-code-on-an-amd-hcc-platform) +- [Can I develop HIP code on an AMD HIP-Clang platform?](#can-i-develop-hip-code-on-an-amd-hip-clang-platform) +- [Do I need to make code changes in HIP code if switch compiler from HCC to HIP-Clang?](#Do-I-need-to-make-code-changes-in-hip-code-if-switch-compiler-from-hcc-to-hip-clang) +- [How to use HIP-Clang to build HIP programs instead of HCC?](#how-to-use-hip-clang-to-build-hip-programs-instead-of-hcc) +- [What is ROCclr?](#what-is-rocclr) - [Can a HIP binary run on both AMD and Nvidia platforms?](#can-a-hip-binary-run-on-both-amd-and-nvidia-platforms) - [What's the difference between HIP and hc?](#whats-the-difference-between-hip-and-hc) -- [On HCC, can I link HIP code with host code compiled with another compiler such as gcc, icc, or clang?](#on-hcc-can-i-link-hip-code-with-host-code-compiled-with-another-compiler-such-as-gcc-icc-or-clang-) -- [HIP detected my platform (hcc vs nvcc) incorrectly - what should I do?](#hip-detected-my-platform-hcc-vs-nvcc-incorrectly---what-should-i-do) -- [Can I install both CUDA SDK and HCC on same machine?](#can-i-install-both-cuda-sdk-and-hcc-on-same-machine) +- [On HIP-Clang, can I link HIP code with host code compiled with another compiler such as gcc, icc, or clang?](#on-HIP-Clang-can-i-link-hip-code-with-host-code-compiled-with-another-compiler-such-as-gcc-icc-or-clang-) +- [HIP detected my platform (hip-clang vs nvcc) incorrectly - what should I do?](#hip-detected-my-platform-hip-clang-vs-nvcc-incorrectly---what-should-i-do) +- [Can I install both CUDA SDK and HIP-clang on same machine?](#can-i-install-both-cuda-sdk-and-hip-clang-on-same-machine) - [On CUDA, can I mix CUDA code with HIP code?](#on-cuda-can-i-mix-cuda-code-with-hip-code) -- [On HCC, can I use HC functionality with HIP?](#on-hcc-can-i-use-hc-functionality-with-hip) +- [On HIP-Clang, can I use HC functionality with HIP?](#on-hip-clang-can-i-use-hc-functionality-with-hip) - [How do I trace HIP application flow?](#how-do-i-trace-hip-application-flow) - [What if HIP generates an error of "symbol multiply defined!" only on AMD machine?](#what-if-hip-generates-error-of-symbol-multiply-defined-only-on-amd-machine) -- [How do I disable HIP Generic Grid Launch option?](#how-do-i-disable-hip-generic-grid-launch-option) - +- [What is maximum limit of Generic kernel launching parameter?](#what-is-maximum-limit-of-generic-kernel-launching-parameter) ### What APIs and features does HIP support? @@ -47,6 +50,7 @@ HIP provides the following: The HIP API documentation describes each API and its limitations, if any, compared with the equivalent CUDA API. ### What is not supported? + #### Runtime/Driver API features At a high-level, the following features are not supported: - Textures (partial support available) @@ -62,7 +66,7 @@ See the [API Support Table](CUDA_Runtime_API_functions_supported_by_HIP.md) for #### Kernel language features - C++-style device-side dynamic memory allocations (free, new, delete) (CUDA 4.0) - Virtual functions, indirect functions and try/catch (CUDA 4.0) -- `__prof_trigger` +- `__prof_trigger` - PTX assembly (CUDA 4.0). HCC supports inline GCN assembly. - Several kernel features are under development. See the [HIP Kernel Language](hip_kernel_language.md) for more information. These include: - printf @@ -70,19 +74,18 @@ See the [API Support Table](CUDA_Runtime_API_functions_supported_by_HIP.md) for ### Is HIP a drop-in replacement for CUDA? No. HIP provides porting tools which do most of the work to convert CUDA code into portable C++ code that uses the HIP APIs. -Most developers will port their code from CUDA to HIP and then maintain the HIP version. +Most developers will port their code from CUDA to HIP and then maintain the HIP version. HIP code provides the same performance as native CUDA code, plus the benefits of running on AMD platforms. ### What specific version of CUDA does HIP support? -HIP APIs and features do not map to a specific CUDA version. HIP provides a strong subset of the functionality provided in CUDA, and the hipify tools can -scan code to identify any unsupported CUDA functions - this is useful for identifying the specific features required by a given application. +HIP APIs and features do not map to a specific CUDA version. HIP provides a strong subset of the functionality provided in CUDA, and the hipify tools can scan code to identify any unsupported CUDA functions - this is useful for identifying the specific features required by a given application. However, we can provide a rough summary of the features included in each CUDA SDK and the support level in HIP. Each bullet below lists the major new language features in each CUDA release and then indicate which are supported/not supported in HIP: - CUDA 4.0 and earlier : - HIP supports CUDA 4.0 except for the limitations described above. - CUDA 5.0 : - - Dynamic Parallelism (not supported) + - Dynamic Parallelism (not supported) - cuIpc functions (under development). - CUDA 5.5 : - CUPTI (not directly supported, [AMD GPUPerfAPI](http://developer.amd.com/tools-and-sdks/graphics-development/gpuperfapi/) can be used as an alternative in some cases) @@ -98,10 +101,9 @@ However, we can provide a rough summary of the features included in each CUDA SD - CUDA 8.0 : - Page Migration including cudaMemAdvise, cudaMemPrefetch, other cudaMem* APIs(not supported) - ### What libraries does HIP support? HIP includes growing support for the four key math libraries using hcBlas, hcFft, hcrng and hcsparse, as well as MIOpen for machine intelligence applications. -These offer pointer-based memory interfaces (as opposed to opaque buffers) and can be easily interfaced with other HIP applications. +These offer pointer-based memory interfaces (as opposed to opaque buffers) and can be easily interfaced with other HIP applications. The hip interfaces support both ROCm and CUDA paths, with familiar library interfaces. - [hipBlas](https://github.com/ROCmSoftwarePlatform/hipBLAS), which utilizes [rocBlas](https://github.com/ROCmSoftwarePlatform/rocBLAS). @@ -110,7 +112,7 @@ The hip interfaces support both ROCm and CUDA paths, with familiar library inter - [hiprng](https://github.com/ROCmSoftwarePlatform/hcrng) Additionally, some of the cublas routines are automatically converted to hipblas equivalents by the HIPIFY tools. These APIs use cublas or hcblas depending on the platform and replace the need -to use conditional compilation. +to use conditional compilation. ### How does HIP compare with OpenCL? Both AMD and Nvidia support OpenCL 1.2 on their devices so that developers can write portable code. @@ -131,7 +133,7 @@ HIP and CUDA provide similar math library calls as well. In summary, the HIP ph This reduces the potential for error, and also makes it easy to automate the translation. HIP's goal is to quickly get the ported program running on both platforms with little manual intervention, so that the programmer can focus on performance optimizations. -There have been several tools that have attempted to convert CUDA into OpenCL, such as CU2CL. OpenCL is a C99-based kernel language (rather than C++) and also does not support single-source compilation. +There have been several tools that have attempted to convert CUDA into OpenCL, such as CU2CL. OpenCL is a C99-based kernel language (rather than C++) and also does not support single-source compilation. As a result, the OpenCL syntax is different from CUDA, and the porting tools have to perform some heroic transformations to bridge this gap. The tools also struggle with more complex CUDA applications, in particular, those that use templates, classes, or other C++ features inside the kernel. @@ -153,52 +155,71 @@ NVCC is Nvidia's compiler driver for compiling "CUDA C++" code into PTX or devic ### What is HCC? HCC is AMD's compiler driver which compiles "heterogeneous C++" code into HSAIL or GCN device code for AMD GPUs. It's an open-source compiler based on recent versions of CLANG/LLVM. +In ROCM v3.5 release, HCC compiler is deprecated and HIP-Clang compiler is introduced to compile HIP programs. + +### What is HIP-Clang? +HIP-Clang is new compiler to emphasize its capability to compile HIP programs which can run on AMD platform. + ### Why use HIP rather than supporting CUDA directly? While HIP is a strong subset of the CUDA, it is a subset. The HIP layer allows that subset to be clearly defined and documented. -Developers who code to the HIP API can be assured their code will remain portable across Nvidia and AMD platforms. +Developers who code to the HIP API can be assured their code will remain portable across Nvidia and AMD platforms. In addition, HIP defines portable mechanisms to query architectural features and supports a larger 64-bit wavesize which expands the return type for cross-lane functions like ballot and shuffle from 32-bit ints to 64-bit ints. ### Can I develop HIP code on an Nvidia CUDA platform? -Yes. HIP's CUDA path only exposes the APIs and functionality that work on both NVCC and HCC back-ends. -"Extra" APIs, parameters, and features which exist in CUDA but not in HCC will typically result in compile-time or run-time errors. +Yes. HIP's CUDA path only exposes the APIs and functionality that work on both NVCC and AMDGPU back-ends. +"Extra" APIs, parameters, and features which exist in CUDA but not in HIP-Clang will typically result in compile-time or run-time errors. Developers need to use the HIP API for most accelerator code and bracket any CUDA-specific code with preprocessor conditionals. Developers concerned about portability should, of course, run on both platforms, and should expect to tune for performance. In some cases, CUDA has a richer set of modes for some APIs, and some C++ capabilities such as virtual functions - see the HIP @API documentation for more details. -### Can I develop HIP code on an AMD HCC platform? -Yes. HIP's HCC path only exposes the APIs and functions that work on both NVCC and HCC back ends. "Extra" APIs, parameters and features that appear in HCC but not CUDA will typically cause compile- or run-time errors. Developers must use the HIP API for most accelerator code and bracket any HCC-specific code with preprocessor conditionals. Those concerned about portability should, of course, test their code on both platforms and should tune it for performance. Typically, HCC supports a more modern set of C++11/C++14/C++17 features, so HIP developers who want portability should be careful when using advanced C++ features on the hc path. +### Can I develop HIP code on an AMD HIP-Clang platform? +Yes. HIP's HIP-Clang path only exposes the APIs and functions that work on AMD runtime back ends. "Extra" APIs, parameters and features that appear in HIP-Clang but not CUDA will typically cause compile- or run-time errors. Developers must use the HIP API for most accelerator code and bracket any HIP-Clang specific code with preprocessor conditionals. Those concerned about portability should, of course, test their code on both platforms and should tune it for performance. Typically, HIP-Clang supports a more modern set of C++11/C++14/C++17 features, so HIP developers who want portability should be careful when using advanced C++ features on the HIP-Clang path. +In ROCM v3.5 release, HCC compiler is deprecated, and the HIP-Clang compiler can be used for compiling HIP programs. + +### Do I need to make code changes in HIP code if switching compiler from HCC to HIP-Clang? +For most HIP applications, the transition from HCC to HIP-Clang is transparent as the HIPCC and HIP cmake files automatically choose compiler options for HIP-Clang and hide the difference between the HCC and HIP-Clang code. +However, minor changes may be required as HIP-Clang has stricter syntax and semantic checks compared to HCC. + +### How to use HIP-Clang to build HIP programs? +The environment variable can be used to set compiler path: +- HIP_CLANG_PATH: path to hip-clang. When set, this variable let hipcc to use hip-clang for compilation/linking + +There is an alternative environment variable to set compiler path: +- HIP_ROCCLR_HOME: path to root directory of the HIP-ROCclr runtime. When set, this variable let hipcc use hip-clang from the ROCclr distribution. +NOTE: If HIP_ROCCLR_HOME is set, there is no need to set HIP_CLANG_PATH since hipcc will deduce them from HIP_ROCCLR_HOME. + +### What is ROCclr? +ROCclr (Radeon Open Compute Common Language Runtime) is a virtual device interface that compute runtimes interact with backends such as ROCr on Linux, as well as PAL on Windows. ### Can a HIP binary run on both AMD and Nvidia platforms? -HIP is a source-portable language that can be compiled to run on either the HCC or NVCC platform. HIP tools don't create a "fat binary" that can run on either platform, however. - +HIP is a source-portable language that can be compiled to run on either AMD or NVIDIA platform. HIP tools don't create a "fat binary" that can run on either platform, however. ### What's the difference between HIP and hc? HIP is a portable C++ language that supports a strong subset of the CUDA run-time APIs and device-kernel language. It's designed to simplify CUDA conversion to portable C++. HIP provides a C-compatible run-time API, C-compatible kernel-launch mechanism, C++ kernel language and pointer-based memory management. -A C++ dialect, hc is supported by the AMD HCC compiler. It provides C++ run time, C++ kernel-launch APIs (parallel_for_each), C++ kernel language, and several memory-management options, including pointers, arrays and array_view (with implicit data synchronization). It's intended to be a leading indicator of the ISO C++ standard. +A C++ dialect, hc is supported by the AMD compiler. It provides C++ run time, C++ kernel-launch APIs (parallel_for_each), C++ kernel language, and several memory-management options, including pointers, arrays and array_view (with implicit data synchronization). It's intended to be a leading indicator of the ISO C++ standard. +The HCC compiler has been deprecated in the ROCm Release v3.5. - -### On HCC, can I link HIP code with host code compiled with another compiler such as gcc, icc, or clang ? -Yes. HIP/HCC generates the object code which conforms to the GCC ABI, and also links with libstdc++. This means you can compile host code with the compiler of your choice and link the generated object code +### On HIP-Clang, can I link HIP code with host code compiled with another compiler such as gcc, icc, or clang ? +Yes. HIP generates the object code which conforms to the GCC ABI, and also links with libstdc++. This means you can compile host code with the compiler of your choice and link the generated object code with GPU code compiled with HIP. Larger projects often contain a mixture of accelerator code (initially written in CUDA with nvcc) and host code (compiled with gcc, icc, or clang). These projects can convert the accelerator code to HIP, compile that code with hipcc, and link with object code from their preferred compiler. +### Can I install both CUDA SDK and HIP-Clang on the same machine? +Yes. You can use HIP_PLATFORM to choose which path hipcc targets. This configuration can be useful when using HIP to develop an application which is portable to both AMD and NVIDIA. -### HIP detected my platform (hcc vs nvcc) incorrectly - what should I do? -HIP will set the platform to HCC if it sees that the AMD graphics driver is installed and has detected an AMD GPU. -Sometimes this isn't what you want - you can force HIP to recognize the platform by setting HIP_PLATFORM to hcc (or nvcc) +### HIP detected my platform (HIP-Clang vs nvcc) incorrectly - what should I do? +HIP will set the platform to hcc and compiler to HIP-Clang if it sees that the AMD graphics driver is installed and has detected an AMD GPU. +Sometimes this isn't what you want - you can force HIP to recognize the platform by setting the following, ``` +export HIP_COMPILER=clang export HIP_PLATFORM=hcc ``` -One symptom of this problem is the message "error: 'unknown error'(11) at square.hipref.cpp:56". This can occur if you have a CUDA installation on an AMD platform, and HIP incorrectly detects the platform as nvcc. HIP may be able to compile the application using the nvcc tool-chain but will generate this error at runtime since the platform does not have a CUDA device. The fix is to set HIP_PLATFORM=hcc and rebuild. - -If you see issues related to incorrect platform detection, please file an issue with the GitHub issue tracker so we can improve HIP's platform detection logic. - -### Can I install both CUDA SDK and HCC on the same machine? -Yes. You can use HIP_PLATFORM to choose which path hipcc targets. This configuration can be useful when using HIP to develop an application which is portable to both AMD and NVIDIA. +One symptom of this problem is the message "error: 'unknown error'(11) at square.hipref.cpp:56". This can occur if you have a CUDA installation on an AMD platform, and HIP incorrectly detects the platform as nvcc. HIP may be able to compile the application using the nvcc tool-chain but will generate this error at runtime since the platform does not have a CUDA device. +The fix is to set HIP_PLATFORM=hcc and rebuild. ### On CUDA, can I mix CUDA code with HIP code? @@ -211,17 +232,8 @@ hipCUResultTohipError If platform portability is important, use #ifdef __HIP_PLATFORM_NVCC__ to guard the CUDA-specific code. -### On HCC, can I use HC functionality with HIP? -Yes. -The code can include hc.hpp and use HC functions inside the kernel. A typical use-case is to use AMD-specific hardware features such as the permute, swizzle, or DPP operations. -See the 'bit_extract' sample for an example. - -Also these functions can be used to extract HCC accelerator and accelerator_view structures from the HIP deviceId and hipStream_t: -hipHccGetAccelerator(int deviceId, hc::accelerator *acc); -hipError_t hipHccGetAcceleratorView(hipStream_t stream, hc::accelerator_view **av); - -If platform portability is important, use #ifdef __HIP_PLATFORM_HIPCC__ to guard the HCC-specific code. - +### On HIP-Clang, can I use HC functionality with HIP? +No. HC functionality is not supported by HIP-Clang. ### How do I trace HIP application flow? See the [HIP Profiling Guide](hip_porting_guide.md) for more information. @@ -231,15 +243,8 @@ Unlike CUDA, in HCC, for functions defined in the header files, the keyword of " Thus, if failed to define "static" keyword, you might see a lot of "symbol multiply defined!" errors at compilation. The workaround is to explicitly add the keyword of "static" before any functions that were defined as "__forceinline__". -### How do I disable HIP Generic Grid Launch option? -Generic Grid Launch(GGL) is currently the default method for hip kernel launch. -To disable it and use the legancy grid launch method, please either change the default value of GENERIC_GRID_LAUNCH to 0 in the following to header files and rebuild HIP: -$HIP/include/hip/hcc_detail/hip_runtime_api.h -$HIP/include/hip/hcc_detail/host_defines.h -Or pass "-DGENERIC_GRID_LAUNCH=0" to hipcc at application compilation time. - -### What is maximum limit of Generic Grid Launch parameters (grid and block)? -Product of (grid.x and block.x), (grid.y and block.y) or (grid.z and block.z) should always be less than UINT_MAX. +### What is maximum limit of kernel launching parameter? +Product of block.x, block.y, and block.z should be less than 1024. ### Are __shfl_*_sync functions supported on HIP platform? -__shfl_*_sync is not supported on HIP but for nvcc path CUDA 9.0 and above all shuffle calls get redirected to it's sync version. \ No newline at end of file +__shfl_*_sync is not supported on HIP but for nvcc path CUDA 9.0 and above all shuffle calls get redirected to it's sync version. diff --git a/docs/markdown/hip_porting_guide.md b/docs/markdown/hip_porting_guide.md index c291fa8ae6..816802fc79 100644 --- a/docs/markdown/hip_porting_guide.md +++ b/docs/markdown/hip_porting_guide.md @@ -1,13 +1,13 @@ # HIP Porting Guide In addition to providing a portable C++ programming environment for GPUs, HIP is designed to ease -the porting of existing CUDA code into the HIP environment. This section describes the available tools +the porting of existing CUDA code into the HIP environment. This section describes the available tools and provides practical suggestions on how to port CUDA code and work through common issues. ## Table of Contents -- [Porting a New Cuda Project](#porting-a-new-cuda-project) +- [Porting a New CUDA Project](#porting-a-new-cuda-project) * [General Tips](#general-tips) * [Scanning existing CUDA code to scope the porting effort](#scanning-existing-cuda-code-to-scope-the-porting-effort) * [Converting a project "in-place"](#converting-a-project-in-place) @@ -22,6 +22,7 @@ and provides practical suggestions on how to port CUDA code and work through com * [Device-Architecture Properties](#device-architecture-properties) * [Table of Architecture Properties](#table-of-architecture-properties) - [Finding HIP](#finding-hip) +- [Identifying HIP Runtime](#identifying-hip-runtime) - [hipLaunchKernel](#hiplaunchkernel) - [Compiler Options](#compiler-options) - [Linking Issues](#linking-issues) @@ -47,20 +48,20 @@ and provides practical suggestions on how to port CUDA code and work through com + [/usr/include/c++/v1/memory:5172:15: error: call to implicitly deleted default constructor of 'std::__1::bad_weak_ptr' throw bad_weak_ptr();](#usrincludecv1memory517215-error-call-to-implicitly-deleted-default-constructor-of-std__1bad_weak_ptr-throw-bad_weak_ptr) * [HIP Environment Variables](#hip-environment-variables) * [Editor Highlighting](#editor-highlighting) - + -## Porting a New Cuda Project +## Porting a New CUDA Project ### General Tips -- Starting the port on a Cuda machine is often the easiest approach, since you can incrementally port pieces of the code to HIP while leaving the rest in Cuda. (Recall that on Cuda machines HIP is just a thin layer over Cuda, so the two code types can interoperate on nvcc platforms.) Also, the HIP port can be compared with the original Cuda code for function and performance. -- Once the Cuda code is ported to HIP and is running on the Cuda machine, compile the HIP code using hcc on an AMD machine. -- HIP ports can replace Cuda versions: HIP can deliver the same performance as a native Cuda implementation, with the benefit of portability to both Nvidia and AMD architectures as well as a path to future C++ standard support. You can handle platform-specific features through conditional compilation or by adding them to the open-source HIP infrastructure. -- Use **[bin/hipconvertinplace-perl.sh](https://github.com/ROCm-Developer-Tools/HIP/blob/master/bin/hipconvertinplace-perl.sh)** to hipify all code files in the Cuda source directory. +- Starting the port on a CUDA machine is often the easiest approach, since you can incrementally port pieces of the code to HIP while leaving the rest in CUDA. (Recall that on CUDA machines HIP is just a thin layer over CUDA, so the two code types can interoperate on nvcc platforms.) Also, the HIP port can be compared with the original CUDA code for function and performance. +- Once the CUDA code is ported to HIP and is running on the CUDA machine, compile the HIP code using the HIP compiler on an AMD machine. +- HIP ports can replace CUDA versions: HIP can deliver the same performance as a native CUDA implementation, with the benefit of portability to both Nvidia and AMD architectures as well as a path to future C++ standard support. You can handle platform-specific features through conditional compilation or by adding them to the open-source HIP infrastructure. +- Use **[bin/hipconvertinplace-perl.sh](https://github.com/ROCm-Developer-Tools/HIP/blob/master/bin/hipconvertinplace-perl.sh)** to hipify all code files in the CUDA source directory. ### Scanning existing CUDA code to scope the porting effort -The hipexamine-perl.sh tool will scan a source directory to determine which files contain CUDA code and how much of that code can be automatically hipified, +The hipexamine-perl.sh tool will scan a source directory to determine which files contain CUDA code and how much of that code can be automatically hipified. ``` > cd examples/rodinia_3.0/cuda/kmeans > $HIP_DIR/bin/hipexamine-perl.sh. @@ -128,110 +129,119 @@ directory names. | CUDA Library | ROCm Library | Comment | |------- | --------- | ----- | -| cuBLAS | rocBLAS | Basic Linear Algebra Subroutines -| cuFFT | rocFFT | Fast Fourier Transfer Library -| cuSPARSE | rocSPARSE | Sparse BLAS + SPMV -| cuSolver | rocSolver | Lapack library +| cuBLAS | rocBLAS | Basic Linear Algebra Subroutines +| cuFFT | rocFFT | Fast Fourier Transfer Library +| cuSPARSE | rocSPARSE | Sparse BLAS + SPMV +| cuSolver | rocSOLVER | Lapack library | AMG-X | rocALUTION | Sparse iterative solvers and preconditioners with Geometric and Algebraic MultiGrid -| Thrust | hipThrust | C++ parallel algorithms library +| Thrust | rocThrust | C++ parallel algorithms library | CUB | rocPRIM | Low Level Optimized Parallel Primitives -| cuDNN | MIOpen | Deep learning Solver Library +| cuDNN | MIOpen | Deep learning Solver Library | cuRAND | rocRAND | Random Number Generator Library -| EIGEN | EIGEN – HIP port | C++ template library for linear algebra: matrices, vectors, numerical solvers, +| EIGEN | EIGEN – HIP port | C++ template library for linear algebra: matrices, vectors, numerical solvers, | NCCL | RCCL | Communications Primitives Library based on the MPI equivalents - + ## Distinguishing Compiler Modes - - + + ### Identifying HIP Target Platform -All HIP projects target either the hcc or nvcc platform. The platform affects which headers are included and which libraries are used for linking. - -- `HIPCC_PLATFORM_HCC` is defined if the HIP platform targets hcc -- `HIPCC_PLATFORM_NVCC` is defined if the HIP platform targets nvcc - -Many projects use a mixture of an accelerator compiler (hcc or nvcc) and a standard compiler (e.g., g++). These defines are set for both accelerator and standard compilers and thus are often the best option when writing code that uses conditional compilation. - - +All HIP projects target either AMD or NVIDIA platform. The platform affects which headers are included and which libraries are used for linking. + +- `HIP_PLATFORM_HCC` is defined if the HIP platform targets AMD + +- `HIP_PLATFORM_NVCC` is defined if the HIP platform targets NVIDIA + +On AMD platform, the compiler was hcc, but is deprecated in ROCM v3.5 release, and HIP-Clang compiler is introduced for compiling HIP programs. + +For most HIP applications, the transition from hcc to HIP-Clang is transparent. +HIPCC and HIP cmake files automatically choose compilation options for HIP-Clang and hide the difference between the hcc and hip-clang code. +However, minor changes may be required as HIP-Clang has stricter syntax and semantic checks compared to hcc. + +Many projects use a mixture of an accelerator compiler (AMD or NVIDIA) and a standard compiler (e.g. g++). These defines are set for both accelerator and standard compilers and thus are often the best option when writing code that uses conditional compilation. + + + ### Identifying the Compiler: hcc, hip-clang or nvcc -Often, it's useful to know whether the underlying compiler is hcc, hip-clang or nvcc. This knowledge can guard platform-specific code (features that only work on the nvcc, hip-clang or hcc path but not all) or aid in platform-specific performance tuning. - +Often, it's useful to know whether the underlying compiler is hcc, HIP-Clang or nvcc. This knowledge can guard platform-specific code or aid in platform-specific performance tuning. + + ``` #ifdef __HCC__ -// Compiled with hcc - +// Compiled with hcc + ``` ``` #ifdef __HIP__ -// Compiled with hip-clang - +// Compiled with HIP-Clang + ``` - + ``` #ifdef __NVCC__ -// Compiled with nvcc -// Could be compiling with Cuda language extensions enabled (for example, a ".cu file) +// Compiled with nvcc +// Could be compiling with CUDA language extensions enabled (for example, a ".cu file) // Could be in pass-through mode to an underlying host compile OR (for example, a .cpp file) - + ``` - + ``` #ifdef __CUDACC__ -// Compiled with nvcc (Cuda language extensions enabled) +// Compiled with nvcc (CUDA language extensions enabled) ``` - -hcc and hip-clang directly generates the host code (using the Clang x86 target) and passes the code to another host compiler. Thus, they have no equivalent of the \__CUDA_ACC define. - -The macro `__HIPCC__` is set if either `__HCC__`, `__HIP__` or `__CUDACC__` is defined. This configuration is useful in determining when code is being compiled using an accelerator-enabled compiler (hcc or nvcc) as opposed to a standard host compiler (GCC, ICC, Clang, etc.). - + +Compiler directly generates the host code (using the Clang x86 target) and passes the code to another host compiler. Thus, they have no equivalent of the \__CUDA_ACC define. + + ### Identifying Current Compilation Pass: Host or Device - -Both nvcc and hcc make two passes over the code: one for host code and one for device code. `__HIP_DEVICE_COMPILE__` is set to a nonzero value when the compiler (hcc or nvcc) is compiling code for a device inside a `__global__` kernel or for a device function. `__HIP_DEVICE_COMPILE__` can replace #ifdef checks on the `__CUDA_ARCH__` define. - + +nvcc makes two passes over the code: one for host code and one for device code. +HIP-Clang will have multiple passes over the code: one for the host code, and one for each architecture on the device code. +`__HIP_DEVICE_COMPILE__` is set to a nonzero value when the compiler (hcc, HIP-Clang or nvcc) is compiling code for a device inside a `__global__` kernel or for a device function. `__HIP_DEVICE_COMPILE__` can replace #ifdef checks on the `__CUDA_ARCH__` define. + ``` -// #ifdef __CUDA_ARCH__ +// #ifdef __CUDA_ARCH__ #if __HIP_DEVICE_COMPILE__ ``` - -Unlike `__CUDA_ARCH__`, the `__HIP_DEVICE_COMPILE__` value is 1 or undefined, and it doesn't represent the feature capability of the target device. + +Unlike `__CUDA_ARCH__`, the `__HIP_DEVICE_COMPILE__` value is 1 or undefined, and it doesn't represent the feature capability of the target device. ### Compiler Defines: Summary -|Define | hcc | hip-clang | nvcc | Other (GCC, ICC, Clang, etc.) +|Define | hcc | HIP-Clang | nvcc | Other (GCC, ICC, Clang, etc.) |--- | --- | --- | --- |---| |HIP-related defines:| |`__HIP_PLATFORM_HCC__`| Defined | Defined | Undefined | Defined if targeting hcc platform; undefined otherwise | |`__HIP_PLATFORM_NVCC__`| Undefined | Undefined | Defined | Defined if targeting nvcc platform; undefined otherwise | -|`__HIP_DEVICE_COMPILE__` | 1 if compiling for device; undefined if compiling for host | 1 if compiling for device; undefined if compiling for host |1 if compiling for device; undefined if compiling for host | Undefined +|`__HIP_DEVICE_COMPILE__` | 1 if compiling for device; undefined if compiling for host | 1 if compiling for device; undefined if compiling for host |1 if compiling for device; undefined if compiling for host | Undefined |`__HIPCC__` | Defined | Defined | Defined | Undefined -|`__HIP_ARCH_*` | 0 or 1 depending on feature support (see below) |0 or 1 depending on feature support (see below) | 0 or 1 depending on feature support (see below) | 0 +|`__HIP_ARCH_*` | 0 or 1 depending on feature support (see below) |0 or 1 depending on feature support (see below) | 0 or 1 depending on feature support (see below) | 0 |nvcc-related defines:| |`__CUDACC__` | Undefined | Undefined | Defined if source code is compiled by nvcc; undefined otherwise | Undefined |`__NVCC__` | Undefined | Undefined | Defined | Undefined -|`__CUDA_ARCH__` | Undefined | Undefined | Unsigned representing compute capability (e.g., "130") if in device code; 0 if in host code | Undefined +|`__CUDA_ARCH__` | Undefined | Undefined | Unsigned representing compute capability (e.g., "130") if in device code; 0 if in host code | Undefined |hcc-related defines:| |`__HCC__` | Defined | Undefined | Undefined | Undefined -|`__HCC_ACCELERATOR__` | Nonzero if in device code; otherwise undefined | Undefined | Undefined | Undefined +|`__HCC_ACCELERATOR__` | Nonzero if in device code; otherwise undefined | Undefined | Undefined | Undefined |hip-clang-related defines:| |`__HIP__` | Undefined | Defined | Undefined | Undefined -|hcc/hip-clang common defines:| +|hcc/HIP-Clang common defines:| |`__clang__` | Defined | Defined | Undefined | Defined if using Clang; otherwise undefined - ## Identifying Architecture Features ### HIP_ARCH Defines -Some Cuda code tests `__CUDA_ARCH__` for a specific value to determine whether the machine supports a certain architectural feature. For instance, +Some CUDA code tests `__CUDA_ARCH__` for a specific value to determine whether the machine supports a certain architectural feature. For instance, ``` -#if (__CUDA_ARCH__ >= 130) +#if (__CUDA_ARCH__ >= 130) // doubles are supported ``` -This type of code requires special attention, since hcc/AMD and nvcc/Cuda devices have different architectural capabilities. Moreover, you can't determine the presence of a feature using a simple comparison against an architecture's version number. HIP provides a set of defines and device properties to query whether a specific architectural feature is supported. +This type of code requires special attention, since hcc/AMD and nvcc/CUDA devices have different architectural capabilities. Moreover, you can't determine the presence of a feature using a simple comparison against an architecture's version number. HIP provides a set of defines and device properties to query whether a specific architectural feature is supported. -The `__HIP_ARCH_*` defines can replace comparisons of `__CUDA_ARCH__` values: +The `__HIP_ARCH_*` defines can replace comparisons of `__CUDA_ARCH__` values: ``` //#if (__CUDA_ARCH__ >= 130) // non-portable if __HIP_ARCH_HAS_DOUBLES__ { // portable HIP feature query @@ -281,7 +291,7 @@ The table below shows the full set of architectural properties that HIP supports |`__HIP_ARCH_HAS_SURFACE_FUNCS__` | hasSurfaceFuncs | |`__HIP_ARCH_HAS_3DGRID__` | has3dGrid | Grids and groups are 3D |`__HIP_ARCH_HAS_DYNAMIC_PARALLEL__` | hasDynamicParallelism | - + ## Finding HIP @@ -291,10 +301,24 @@ Makefiles can use the following syntax to conditionally provide a default HIP_PA HIP_PATH ?= $(shell hipconfig --path) ``` -## hipLaunchKernel +## Identifying HIP Runtime + +HIP can depend on ROCclr, or NVCC as runtime + +- AMD platform +`HIP_ROCclr` is defined on AMD platform that HIP use Radeon Open Compute Common Language Runtime, called ROCclr. + +ROCclr is a virtual device interface that HIP runtimes interact with different backends which allows runtimes to work on Linux , as well as Windows without much efforts. + +- NVIDIA platform +On Nvidia platform, HIP is just a thin layer on top of CUDA. +On non-AMD platform, HIP runtime determines if nvcc is available and can be used. If available, HIP_PLATFORM is set to nvcc and underneath CUDA path is used. + + +## hipLaunchKernel hipLaunchKernel is a variadic macro which accepts as parameters the launch configurations (grid dims, group dims, stream, dynamic shared size) followed by a variable number of kernel arguments. -This sequence is then expanded into the appropriate kernel launch syntax depending on the platform. +This sequence is then expanded into the appropriate kernel launch syntax depending on the platform. While this can be a convenient single-line kernel launch syntax, the macro implementation can cause issues when nested inside other macros. For example, consider the following: ``` @@ -325,57 +349,42 @@ MY_LAUNCH (hipLaunchKernel(vAdd, dim3(1024), dim3(1), 0, 0, Ad), true, "firstCal ## Compiler Options -hipcc is a portable compiler driver that will call nvcc or hcc (depending on the target system) and attach all required include and library options. It passes options through to the target compiler. Tools that call hipcc must ensure the compiler options are appropriate for the target compiler. The `hipconfig` script may helpful in making -infrastructure that identifies the target platform and sets options appropriately. It returns either "nvcc" or "hcc." The following sample shows the script in a makefile: - -``` -HIP_PLATFORM=$(shell hipconfig --compiler) - -ifeq (${HIP_PLATFORM}, nvcc) - HIPCC_FLAGS = -gencode=arch=compute_20,code=sm_20 -endif -ifeq (${HIP_PLATFORM}, hcc) - HIPCC_FLAGS = -Wno-deprecated-register -endif - -``` - +hipcc is a portable compiler driver that will call nvcc or HIP-Clang (depending on the target system) and attach all required include and library options. It passes options through to the target compiler. Tools that call hipcc must ensure the compiler options are appropriate for the target compiler. +The `hipconfig` script may helpful in identifying the target platform, compiler and runtime. It can also help set options appropriately. ## Linking Issues ### Linking With hipcc -hipcc adds the necessary libraries for HIP as well as for the accelerator compiler (nvcc or hcc). We recommend linking with hipcc. +hipcc adds the necessary libraries for HIP as well as for the accelerator compiler (nvcc or AMD compiler). We recommend linking with hipcc since it automatically links the binary to the necessary HIP runtime libraries. It also has knowledge on how to link and to manage the GPU objects. ### -lm Option - + hipcc adds -lm by default to the link command. ## Linking Code With Other Compilers -Cuda code often uses nvcc for accelerator code (defining and launching kernels, typically defined in .cu or .cuh files). -It also uses a standard compiler (g++) for the rest of the application. nvcc is a preprocessor that employs a standard host compiler (e.g., gcc) to generate the host code. -Code compiled using this tool can employ only the intersection of language features supported by both nvcc and the host compiler. -In some cases, you must take care to ensure the data types and alignment of the host compiler are identical to those of the device compiler. Only some host compilers are supported---for example, recent nvcc versions lack Clang host-compiler capability. +CUDA code often uses nvcc for accelerator code (defining and launching kernels, typically defined in .cu or .cuh files). +It also uses a standard compiler (g++) for the rest of the application. nvcc is a preprocessor that employs a standard host compiler (gcc) to generate the host code. +Code compiled using this tool can employ only the intersection of language features supported by both nvcc and the host compiler. +In some cases, you must take care to ensure the data types and alignment of the host compiler are identical to those of the device compiler. Only some host compilers are supported---for example, recent nvcc versions lack Clang host-compiler capability. hcc generates both device and host code using the same Clang-based compiler. The code uses the same API as gcc, which allows code generated by different gcc-compatible compilers to be linked together. For example, code compiled using hcc can link with code compiled using "standard" compilers (such as gcc, ICC and Clang). Take care to ensure all compilers use the same standard C++ header and library formats. ### libc++ and libstdc++ -Version 0.86 of hipcc now uses libstdc++ by default for the HCC platform. This improves cross-linking support between G++ and hcc, in particular for interfaces that use - standard C++ libraries (ie std::vector, std::string). +hipcc links to libstdc++ by default. This provides better compatibility between g++ and HIP. -If you pass "--stdlib=libc++" to hipcc, hipcc will use the libc++ library. Generally, libc++ provides a broader set of C++ features while libstdc++ is the standard -for more compilers (notably including g++). +If you pass "--stdlib=libc++" to hipcc, hipcc will use the libc++ library. Generally, libc++ provides a broader set of C++ features while libstdc++ is the standard for more compilers (notably including g++). -When cross-linking C++ code, any C++ functions that use types from the C++ standard library (including std::string, std::vector and other containers) must use the same standard-library implementation. They include the following: +When cross-linking C++ code, any C++ functions that use types from the C++ standard library (including std::string, std::vector and other containers) must use the same standard-library implementation. They include the following: - Functions or kernels defined in hcc that are called from a standard compiler - Functions defined in a standard compiler that are called from hcc. -Applications with these interfaces should use the default libstdc++ linking. +Applications with these interfaces should use the default libstdc++ linking. Applications which are compiled entirely with hipcc, and which benefit from advanced C++ features not supported in libstdc++, and which do not require portability to nvcc, may choose to use libc++. @@ -387,12 +396,12 @@ The hip_runtime.h and hip_runtime_api.h files define the types, functions and en - hip_runtime_api.h: defines all the HIP runtime APIs (e.g., hipMalloc) and the types required to call them. A source file that is only calling HIP APIs but neither defines nor launches any kernels can include hip_runtime_api.h. hip_runtime_api.h uses no custom hc language features and can be compiled using a standard C++ compiler. - hip_runtime.h: included in hip_runtime_api.h. It additionally provides the types and defines required to create and launch kernels. hip_runtime.h does use custom hc language features, but they are guarded by ifdef checks. It can be compiled using a standard C++ compiler but will expose a subset of the available functions. -Cuda has slightly different contents for these two files. In some cases you may need to convert hipified code to include the richer hip_runtime.h instead of hip_runtime_api.h. +CUDA has slightly different contents for these two files. In some cases you may need to convert hipified code to include the richer hip_runtime.h instead of hip_runtime_api.h. ### Using a Standard C++ Compiler You can compile hip\_runtime\_api.h using a standard C or C++ compiler (e.g., gcc or ICC). The HIP include paths and defines (`__HIP_PLATFORM_HCC__` or `__HIP_PLATFORM_NVCC__`) must pass to the standard compiler; hipconfig then returns the necessary options: ``` -> hipconfig --cxx_config +> hipconfig --cxx_config -D__HIP_PLATFORM_HCC__ -I/home/user1/hip/include ``` @@ -410,11 +419,11 @@ The hipify-perl script automatically converts "cuda_runtime.h" to "hip_runtime.h #### cuda.h -The hcc path provides an empty cuda.h file. Some existing Cuda programs include this file but don't require any of the functions. +The hcc path provides an empty cuda.h file. Some existing CUDA programs include this file but don't require any of the functions. ### Choosing HIP File Extensions -Many existing Cuda projects use the ".cu" and ".cuh" file extensions to indicate code that should be run through the nvcc compiler. +Many existing CUDA projects use the ".cu" and ".cuh" file extensions to indicate code that should be run through the nvcc compiler. For quick HIP ports, leaving these file extensions unchanged is often easier, as it minimizes the work required to change file names in the directory and #include statements in the files. For new projects or ports which can be re-factored, we recommend the use of the extension ".hip.cpp" for source files, and @@ -547,7 +556,7 @@ hipcc-cmd: /opt/hcc/bin/hcc -hc -I/opt/hcc/include -stdlib=libc++ -I../../../.. #### /usr/include/c++/v1/memory:5172:15: error: call to implicitly deleted default constructor of 'std::__1::bad_weak_ptr' throw bad_weak_ptr(); -If you pass a ".cu" file, hcc will attempt to compile it as a Cuda language file. You must tell hcc that it's in fact a C++ file: use the "-x c++" option. +If you pass a ".cu" file, hcc will attempt to compile it as a CUDA language file. You must tell hcc that it's in fact a C++ file: use the "-x c++" option. ### HIP Environment Variables diff --git a/docs/markdown/hip_terms2.md b/docs/markdown/hip_terms2.md index 3b4661729d..8065c47876 100644 --- a/docs/markdown/hip_terms2.md +++ b/docs/markdown/hip_terms2.md @@ -10,10 +10,16 @@ The default device can be set with hipSetDevice. - completion_future becomes ready. "Completes". -- hcc = Heterogeneous Compute Compiler (https://bitbucket.org/multicoreware/hcc/wiki/Home). +- hcc = Heterogeneous Compute Compiler ( https://github.com/RadeonOpenCompute/hcc). + Starting from ROCM v3.5 release, hcc compiler is deprecated and HIP-Clang compiler is introduced for compiling HIP programs + +- HIP-Clang - Heterogeneous AMDGPU Compiler, with its capability to compile HIP programs on AMD platform (https://github.com/RadeonOpenCompute/llvm-project). + +- ROCclr - a virtual device interface that compute runtimes interact with different backends such as ROCr on Linux or PAL on Windows. + The ROCclr (https://github.com/ROCm-Developer-Tools/ROCclr) is an abstraction layer allowing runtimes to work on both OSes without much effort. - hipify tools - tools to convert CUDA(R) code to portable C++ code (https://github.com/ROCm-Developer-Tools/HIPIFY). + - hipconfig - tool to report various configuration properties of the target platform. - nvcc = nvcc compiler, do not capitalize. -- hcc = heterogeneous compute compiler, do not capitalize.