diff --git a/docs/markdown/hip_porting_guide.md b/docs/markdown/hip_porting_guide.md
index b706262f71..de5c590e12 100644
--- a/docs/markdown/hip_porting_guide.md
+++ b/docs/markdown/hip_porting_guide.md
@@ -26,7 +26,7 @@ and provides practical suggestions on how to port CUDA code and work through com
 * [Linking With hipcc](#linking-with-hipcc)
 hipconfig --cxx_config -D__HIP_PLATFORM_HCC__ -I/home/user1/hip/include
@@ -341,8 +343,11 @@ You can capture the hipconfig output and passed it to the standard compiler; bel
 CPPFLAGS += $(shell $(HIP_PATH)/bin/hipconfig --cpp_config)
 ```
 
-nvcc includes some headers by default. Files that call HIP run-time APIs or define HIP kernels must explicitly include HIP headers. If the compilation process reports that it cannot find necessary APIs (for example, "error: identifier ‘hipSetDevice’ is undefined"),
-ensure that the file includes hip_runtime.h (or hip_runtime_api.h, if appropriate). The hipify script automatically converts "cuda_runtime.h" to "hip_runtime.h," and it converts "cuda_runtime_api.h" to "hip_runtime_api.h", but it may miss nested headers or macros.
+nvcc includes some headers by default. HIP does not, so every header a file needs must be included explicitly.
+In particular, files that call HIP run-time APIs or define HIP kernels must include the appropriate HIP headers.
+If the compilation process reports that it cannot find necessary APIs (for example, "error: identifier ‘hipSetDevice’ is undefined"),
+ensure that the file includes hip_runtime.h (or hip_runtime_api.h, if appropriate).
+The hipify script automatically converts "cuda_runtime.h" to "hip_runtime.h" and "cuda_runtime_api.h" to "hip_runtime_api.h", but it may miss nested headers or macros.
 
 #### cuda.h
 
@@ -366,6 +371,8 @@ Code should not assume a warp size of 32 or 64.
 See [Warp Cross-Lane Functions]
 
 #### Textures and Cache Control
+> Texture support is under development and is not yet available in HIP.
+
 Compute programs sometimes use textures either to access dedicated texture caches or to use the texture-sampling hardware for interpolation and clamping. The former approach uses simple point samplers with linear interpolation, essentially only reading a single point. The latter approach uses the sampler hardware to interpolate and combine multiple point samples. AMD hardware, as well as recent competing hardware, has a unified texture/L1 cache, so it no longer has a dedicated texture cache. But the nvcc path often caches global loads in the L2 cache, and some programs may benefit from explicit control of the L1 cache contents. We recommend the __ldg instruction for this purpose.