Merge branch 'release_0.92.00' into amd-develop

Conflicts: RELEASE.md docs/markdown/hip_faq.md Change-Id: Ifae1b64b6255a7872dfdc885bb8fb52f622464b7 [ROCm/clr commit: 2d6eb727e9]
2016-08-01 10:47:25 +05:30
@@ -1,18 +1,17 @@
-<!-- START doctoc generated TOC please keep comment here to allow auto update -->
-<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
-**Installation** 
+## Table of Contents

- [Installing pre-built packages:](#installing-pre-built-packages)
-  - [Prerequisites](#prerequisites)
-  - [AMD (hcc)](#amd-hcc)
-  - [NVIDIA (nvcc)](#nvidia-nvcc)
-  - [Verify your installation](#verify-your-installation)
+<!-- toc -->
+
+- [Installing pre-built packages](#installing-pre-built-packages)
+  * [Prerequisites](#prerequisites)
+  * [AMD-hcc](#amd-hcc)
+  * [NVIDIA-nvcc](#nvidia-nvcc)
+  * [Verify your installation](#verify-your-installation)
 - [Building HIP from source](#building-hip-from-source)
-  - [HCC Options](#hcc-options)
-    - [Using HIP with the AMD Native-GCN compiler.](#using-hip-with-the-amd-native-gcn-compiler)
-    - [Compiling CodeXL markers for HIP Functions](#compiling-codexl-markers-for-hip-functions)
+  * [HCC Options](#hcc-options)
+    + [Using HIP with the AMD Native-GCN compiler.](#using-hip-with-the-amd-native-gcn-compiler)

-<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+<!-- tocstop -->

 # Installing pre-built packages

@@ -62,7 +61,7 @@ HIP source code is available and the project can be built from source on the HCC
 2. Download HIP source code (from the [GitHub repot](https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP).)
 3. Build and install HIP (This is the simple version assuming default paths ; see below for additional options.)
 ```
-cd HIP-privatestaging
+cd HIP
 mkdir build
 cd build
 cmake .. 
@@ -78,7 +77,7 @@ make install
 Here's a richer command-line that overrides the default paths:

 ```shell
-cd HIP-privatestaging
+cd HIP
 mkdir build
 cd build
 cmake -DHSA_PATH=/path/to/hsa -DHCC_HOME=/path/to/hcc -DCMAKE_INSTALL_PREFIX=/where/to/install/hip -DCMAKE_BUILD_TYPE=Release ..
@@ -105,42 +104,3 @@ Alternatively, this sections describes how to build it from source:
 export HCC_HOME=/path/to/native/hcc
 ```

-
-### Compiling CodeXL markers for HIP Functions
-HIP can generate markers at function begin/end which are displayed on the CodeXL timeline view.  To do this, you need to install CodeXL, tell HIP
-where the CodeXL install directory lives, and enable HIP to generate the markers:
-
-1. Install CodeXL
-See [CodeXL Download](http://developer.amd.com/tools-and-sdks/opencl-zone/codexl/?webSyncID=9d9c2cb9-3d73-5e65-268a-c7b06428e5e0&sessionGUID=29beacd0-d654-ddc6-a3e2-b9e6c0b0cc77) for the installation file.
-Also this [blog](http://gpuopen.com/getting-up-to-speed-with-the-codexl-gpu-profiler-and-radeon-open-compute/) provides more information and tips for using CodeXL.  In addition to installing the CodeXL profiling 
-and visualization tools, CodeXL also comes with an SDK that allow applications to add markers to the timeline viewer.  We'll be linking HIP against this library.
-
-2. Set CODEXL_PATH
-```shell
-# set to your code-xl installation location:
-export CODEXL_PATH=/opt/AMD/CodeXL
-```
-
-3. Enable in source code.
-In src/hip_hcc.cpp, enable the define 
-```c
-#define COMPILE_TRACE_MARKER 1
-```
-
-
-Then recompile the target application, run with profiler enabled to generate ATP file or trace log.
-```shell
-# Use profiler to generate timeline view:
-$CODEXL_PATH/CodeXLGpuProfiler -A  -o  ./myHipApp  
-...
-Session output path: /home/me/HIP-privatestaging/tests/b1/mytrace.atp
-```
-
-You can also print the HIP function strings to stderr using HIP_TRACE_API environment variable.  This can be useful for tracing application flow.  Also can be combined with the more detailed debug information provided
-by the HIP_DB switch.  For example:
-```shell
-# Trace to stderr showing begin/end of each function (with arguments) + intermediate debug trace during the execution of each function.
-HIP_TRACE_API=1 HIP_DB=0x2 ./myHipApp  
-```
-
-Note this trace mode uses colors.  "less -r" can handle raw control characters and will display the debug output in proper colors.
@@ -3,13 +3,27 @@
 We have attempted to document known bugs and limitations - in particular the [HIP Kernel Language](docs/markdown/hip_kernel_language.md) document uses the phrase "Under Development", and the [HIP Runtime API bug list](http://gpuopen-professionalcompute-tools.github.io/HIP/bug.html) lists known bugs. 

 Upcoming:
- hipLaunchKernel supports one-dimensional grid and/or block dims, without explicit cast to dim3 type.
 - Stability: Enforce perioidic host synchronization to reclaim resources if the application has launched a large
  number of commands (>1K) without synchronizing.  
 - Register keyword now silently ignored on HCC (previously would emit warning).
 - Doc updates: Add some more frequently asked questions to FAQ, fix TOC in some files, review.
+
 ===================================================================================================

+## Revision History:
+
+===================================================================================================
+Release:0.92.00
+- hipLaunchKernel supports one-dimensional grid and/or block dims, without explicit cast to dim3 type (actually in 0.90.00)
+- fp16 software support
+- Support for Hawaii dGPUs using environment variable ROCM_TARGET=hawaii
+- Support hipArray
+- Improved profiler support
+- Documentation updates
+- Improvements to clang-hipify
+
+
+===================================================================================================
 Release:0.90.00
 Date: 2016.06.29
 - Support dynamic shared memory allocations
@@ -19,9 +33,6 @@ Date: 2016.06.29
 - Improve error code reporting on nvcc.
 - Add hipPeekAtError for nvcc.

-===================================================================================================
-
-## Revision History:

 ===================================================================================================
 Release:0.86.00
@@ -1,3 +1,14 @@
+## Table of Contents
+
+<!-- toc -->
+
+- [Using hipify-clang](#using-hipify-clang)
+  * [Build and install](#build-and-install)
+  * [Running and using hipify-clang](#running-and-using-hipify-clang)
+    + [Disclaimer](#disclaimer)
+
+<!-- tocstop -->
+
 ## Using hipify-clang

 `hipify-clang` is a clang-based tool which can automate the translation of CUDA source code into portable HIP C++.
@@ -43,4 +54,4 @@ The information contained herein is for informational purposes only, and is subj

 AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

-Copyright (c) 2014-2016 Advanced Micro Devices, Inc. All rights reserved.
+Copyright (c) 2014-2016 Advanced Micro Devices, Inc. All rights reserved.
@@ -1,12 +1,14 @@
 # FAQ

- [What APIs and features does HIP support ?](#Q1)
- [What is not supported?](#Q2)
-  - [Run-time features](#run-time-features)
-  - [Kernel language features](#kernel-language-features)
- [Is HIP a drop-in replacement for CUDA?](#Q3)
- [What version of CUDA is supported?](#Q4)
- [What libraries does HIP support?](#Q5)
+<!-- toc -->
+
+- [What APIs and features does HIP support?](#what-apis-and-features-does-hip-support)
+- [What is not supported?](#what-is-not-supported)
+  * [Run-time features](#run-time-features)
+  * [Kernel language features](#kernel-language-features)
+- [Is HIP a drop-in replacement for CUDA?](#is-hip-a-drop-in-replacement-for-cuda)
+- [What specific version of CUDA does HIP support?](#what-specific-version-of-cuda-does-hip-support)
+- [What libraries does HIP support?](#what-libraries-does-hip-support)
 - [How does HIP compare with OpenCL?](#how-does-hip-compare-with-opencl)
 - [What hardware does HIP support?](#what-hardware-does-hip-support)
 - [Does Hipify automatically convert all source code?](#does-hipify-automatically-convert-all-source-code)
@@ -16,14 +18,17 @@
 - [Can I develop HIP code on an Nvidia CUDA platform?](#can-i-develop-hip-code-on-an-nvidia-cuda-platform)
 - [Can I develop HIP code on an AMD HCC platform?](#can-i-develop-hip-code-on-an-amd-hcc-platform)
 - [Can a HIP binary run on both AMD and Nvidia platforms?](#can-a-hip-binary-run-on-both-amd-and-nvidia-platforms)
- [Can I link HIP code with host code compiled with another compiler such as gcc, icc, or clang ?]
 - [What's the difference between HIP and hc?](#whats-the-difference-between-hip-and-hc)
+- [On HCC, can I link HIP code with host code compiled with another compiler such as gcc, icc, or clang ?](#on-hcc-can-i-link-hip-code-with-host-code-compiled-with-another-compiler-such-as-gcc-icc-or-clang-)
 - [HIP detected my platform (hcc vs nvcc) incorrectly - what should I do?](#hip-detected-my-platform-hcc-vs-nvcc-incorrectly---what-should-i-do)
- [Can I install both CUDA SDK and HCC on same machine?](#Q100)
+- [Can I install both CUDA SDK and HCC on same machine?](#can-i-install-both-cuda-sdk-and-hcc-on-same-machine)
+- [How do I trace HIP application flow?](#how-do-i-trace-hip-application-flow)
+  * [Using CodeXL markers for HIP Functions](#using-codexl-markers-for-hip-functions)
+  * [Using HIP_TRACE_API](#using-hip_trace_api)

+<!-- tocstop -->

-
-### What APIs and features does HIP support?<a name="Q1"></a>
+### What APIs and features does HIP support?
 HIP provides the following:
 - Devices (hipSetDevice(), hipGetDeviceProperties(), etc.)
 - Memory management (hipMalloc(), hipMemcpy(), hipFree(), etc.)
@@ -36,7 +41,7 @@ HIP provides the following:

 The HIP API documentation describes each API and its limitations, if any, compared with the equivalent CUDA API.

-### What is not supported?<a name="Q2"></a>
+### What is not supported?
 #### Run-time features
 - Textures 
 - MemcpyToSymbol functions
@@ -53,19 +58,19 @@ The HIP API documentation describes each API and its limitations, if any, compar
 - PTX assembly (CUDA 4.0)
 - Several kernel features are under development.  See the [HIP Kernel Language](hip_kernel_language.md) for more information.

-### Is HIP a drop-in replacement for CUDA?<a name="Q3"></a>
+### Is HIP a drop-in replacement for CUDA?
 No. HIP provides porting tools which do most of the work do convert CUDA code into portable C++ code that uses the HIP APIs.
 Most developers will port their code from CUDA to HIP and then maintain the HIP version. 
 HIP code provides the same performance as coding in native CUDA, plus the benefit that the code can also run on AMD platforms.

-### What specific version of CUDA does HIP support?<a name="Q4"></a>
+### What specific version of CUDA does HIP support?
 HIP APIs and features do not map to a specific CUDA version.  HIP provides a strong subset of functionality provided in CUDA, and the hipify tools can 
 scan code to identify any unsupported CUDA functions - this is very useful for identifying the specific features required by a given application.

 However, we can provide a rough summary of the features included in each CUDA SDK and the support level in HIP:

 - CUDA 4.0 and earlier :  
-    - HIP supports CUDA 4.0 except for the limitations described [above](#Q2).
+    - HIP supports CUDA 4.0 except for the limitations described above.
 - CUDA 5.0 : 
    - Dynamic Parallelism (not supported) 
    - cuIpc functions (under development).
@@ -83,7 +88,7 @@ However, we can provide a rough summary of the features included in each CUDA SD
 - CUDA 8.0
    - No new language features.

-### What libraries does HIP support? <a name="Q5"></a>
+### What libraries does HIP support?
 HIP includes growing support for the 4 key math libraries using hcBlas, hcFft, hcrng, and hcsparse).
 These offer pointer-based memory interfaces (as opposed to opaque buffers) and can be easily interfaces with other HCC code.  Developers should use conditional compliation if portability to nvcc systems is desired - using calls to cu* routines on one path and hc* routines on the other.  

@@ -148,14 +153,13 @@ HIP is a portable C++ language that supports a strong subset of the CUDA run-tim
 A C++ dialect, hc is supported by the AMD HCC compiler. It provides C++ run time, C++ kernel-launch APIs (parallel_for_each), C++ kernel language, and several memory-management options, including pointers, arrays and array_view (with implicit data synchronization). It's intended to be a leading indicator of the ISO C++ standard.


-### [On HCC, can I link HIP code with host code compiled with another compiler such as gcc, icc, or clang ?
+### On HCC, can I link HIP code with host code compiled with another compiler such as gcc, icc, or clang ?
 Yes!  HIP/HCC generates the object code which conforms to the GCC ABI, and also links with libstdc++.  This means you can compile host code with the compiler of your choice and link this
 with GPU code compiler with HIP.  Larger projects often contain a mixture of accelerator code (initially written in CUDA with nvcc) plus host code (compiled with gcc, icc, or clang).   These projects
 can convert the accelerator code to HIP, compile that code with hipcc, and link with object code from the preferred compiler.


 ### HIP detected my platform (hcc vs nvcc) incorrectly - what should I do?
-
 HIP will set the platform to HCC if it sees that the AMD graphics driver is installed and has detected an AMD GPU.
 Sometimes this isn't what you want - you can force HIP to recognize the platform by setting HIP_PLATFORM to hcc (or nvcc)
 ```
@@ -166,5 +170,41 @@ One symptom of this problem is the message "error: 'unknown error'(11) at square

 If you see issues related to incorrect platform detection, please file an issue with the GitHub issue tracker so we can improve HIP's platform detection logic.

-### [Can I install both CUDA SDK and HCC on same machine?]<a name="Q100"></a>
+### Can I install both CUDA SDK and HCC on same machine?
 Yes. You can use HIP_PLATFORM to choose which path hipcc targets.  This configuration can be useful when using HIP to develop an application which is portable to both AMD and NVIDIA.
+
+### How do I trace HIP application flow?
+#### Using CodeXL markers for HIP Functions
+HIP can generate markers at function being/end which are displayed on the CodeXL timeline view.
+To do this, you need to install ROCm-Profiler and enable HIP to generate the markers:
+
+1. Install ROCm-Profiler
+Installing HIP from the [rocm](http://gpuopen.com/getting-started-with-boltzmann-components-platforms-installation/) pre-built packages, installs the ROCm-Profiler as well.
+Alternatively, you can build ROCm-Profiler using the instructions [here](https://github.com/RadeonOpenCompute/ROCm-Profiler#building-the-rocm-profiler).
+
+2. Build HIP with ATP markers enabled
+HIP pre-built packages are enabled with ATP marker support by default.
+To enable ATP marker support when building HIP from source, use the option ```-DCOMPILE_HIP_ATP_MARKER=1``` during the cmake configure step.
+
+3. Set HIP_ATP_MARKER
+```shell
+export HIP_ATP_MARKER=1
+```
+
+4. Recompile the target application
+
+5. Run with profiler enabled to generate ATP file.
+```shell
+# Use profile to generate timeline view:
+/opt/rocm/bin/rocm-profiler -o <outputATPFileName> -A <applicationName> <applicationArguments>
+```
+
+#### Using HIP_TRACE_API
+You can also print the HIP function strings to stderr using HIP_TRACE_API environment variable. This can also be combined with the more detailed debug information provided
+by the HIP_DB switch. For example:
+```shell
+# Trace to stderr showing being/end of each function (with arguments) + intermediate debug trace during the execution of each function.
+HIP_TRACE_API=1 HIP_DB=0x2 ./myHipApp
+```
+
+Note this trace mode uses colors. "less -r" can handle raw control characters and will display the debug output in proper colors.
@@ -1,38 +1,40 @@
-**Table of Contents**  
+## Table of Contents
+
+<!-- toc -->

 - [Introduction](#introduction)
 - [Function-Type Qualifiers](#function-type-qualifiers)
-  - [`__device__`](#__device__)
-  - [`__global__`](#__global__)
-  - [`__host__`](#__host__)
+  * [`__device__`](#__device__)
+  * [`__global__`](#__global__)
+  * [`__host__`](#__host__)
 - [Calling `__global__` Functions](#calling-__global__-functions)
 - [Kernel-Launch Example](#kernel-launch-example)
 - [Variable-Type Qualifiers](#variable-type-qualifiers)
-  - [`__constant__`](#__constant__)
-  - [`__shared__`](#__shared__)
-  - [`__managed__`](#__managed__)
-  - [`__restrict__`](#__restrict__)
+  * [`__constant__`](#__constant__)
+  * [`__shared__`](#__shared__)
+  * [`__managed__`](#__managed__)
+  * [`__restrict__`](#__restrict__)
 - [Built-In Variables](#built-in-variables)
-  - [Coordinate Built-Ins](#coordinate-built-ins)
-  - [warpSize](#warpsize)
+  * [Coordinate Built-Ins](#coordinate-built-ins)
+  * [warpSize](#warpsize)
 - [Vector Types](#vector-types)
-  - [Short Vector Types](#short-vector-types)
-  - [dim3](#dim3)
+  * [Short Vector Types](#short-vector-types)
+  * [dim3](#dim3)
 - [Memory-Fence Instructions](#memory-fence-instructions)
 - [Synchronization Functions](#synchronization-functions)
 - [Math Functions](#math-functions)
-  - [Single Precision Mathematical Functions](#single-precision-mathematical-functions)
-  - [Double Precision Mathematical Functions](#double-precision-mathematical-functions)
-  - [Integer Intrinsics](#integer-intrinsics)
-  - [Floating-point Intrinsics](#floating-point-intrinsics)
+  * [Single Precision Mathematical Functions](#single-precision-mathematical-functions)
+  * [Double Precision Mathematical Functions](#double-precision-mathematical-functions)
+  * [Integer Intrinsics](#integer-intrinsics)
+  * [Floating-point Intrinsics](#floating-point-intrinsics)
 - [Texture Functions](#texture-functions)
 - [Surface Functions](#surface-functions)
 - [Timer Functions](#timer-functions)
 - [Atomic Functions](#atomic-functions)
-  - [Caveats and Features Under-Development:](#caveats-and-features-under-development)
+  * [Caveats and Features Under-Development:](#caveats-and-features-under-development)
 - [Warp Cross-Lane Functions](#warp-cross-lane-functions)
-  - [Warp Vote and Ballot Functions](#warp-vote-and-ballot-functions)
-  - [Warp Shuffle Functions](#warp-shuffle-functions)
+  * [Warp Vote and Ballot Functions](#warp-vote-and-ballot-functions)
+  * [Warp Shuffle Functions](#warp-shuffle-functions)
 - [Profiler Counter Function](#profiler-counter-function)
 - [Assert](#assert)
 - [Printf](#printf)
@@ -43,7 +45,7 @@
 - [In-Line Assembly](#in-line-assembly)
 - [C++ Support](#c-support)

-
+<!-- tocstop -->

 ## Introduction

@@ -4,7 +4,6 @@ the porting of existing CUDA code into the HIP environment.  This section descri
 and provides practical suggestions on how to port CUDA code and work through common issues. 

 ## Table of Contents
-=================

 <!-- toc -->

@@ -41,8 +40,8 @@ and provides practical suggestions on how to port CUDA code and work through com
  * [Debugging hipcc](#debugging-hipcc)
  * [What Does This Error Mean?](#what-does-this-error-mean)
    + [/usr/include/c++/v1/memory:5172:15: error: call to implicitly deleted default constructor of 'std::__1::bad_weak_ptr' throw bad_weak_ptr();](#usrincludecv1memory517215-error-call-to-implicitly-deleted-default-constructor-of-std__1bad_weak_ptr-throw-bad_weak_ptr)
-    + [HIP Environment Variables](#hip-environment-variables)
-    + [Editor Highlighting](#editor-highlighting)
+  * [HIP Environment Variables](#hip-environment-variables)
+  * [Editor Highlighting](#editor-highlighting)

 <!-- tocstop -->

@@ -461,7 +460,7 @@ hipcc-cmd: /opt/hcc/bin/hcc  -hc -I/opt/hcc/include -stdlib=libc++ -I../../../..
 If you pass a ".cu" file, hcc will attempt to compile it as a Cuda language file. You must tell hcc that its in fact a C++ file: use the "-x c++" option.


-#### HIP Environment Variables
+### HIP Environment Variables

 On the HCC path, HIP provides a number of environment variables that control the behavior of HIP.  Some of these are useful for appliction development (for example HIP_VISIBLE_DEVICES, HIP_LAUNCH_BLOCKING),
 some are useful for performance tuning or experimentation (for example HIP_STAGING*), and some are useful for debugging (HIP_DB).  You can see the environment variables supported by HIP as well as
@@ -484,5 +483,5 @@ HIP_DISABLE_HW_COPY_DEP        =  1 : Disable HW dependencies before copy comman
 ```


-#### Editor Highlighting
+### Editor Highlighting
 See the utils/vim or utils/gedit directories to add handy highlighting to hip files.
@@ -1,11 +1,3 @@
-<!-- START doctoc generated TOC please keep comment here to allow auto update -->
-<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
-**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*
-
- [Table Comparing Syntax for Different Compute APIs](#table-comparing-syntax-for-different-compute-apis)
-
-<!-- END doctoc generated TOC please keep comment here to allow auto update -->
-
 # Table Comparing Syntax for Different Compute APIs

 |Term|CUDA|HIP|HC|C++AMP|OpenCL|
@@ -46,5 +38,3 @@
 2. The indexing functions (starting with `thread-index`) show the terminology for a 1D grid.  Some APIs use reverse order of xyz / 012 indexing for 3D grids.
 3. HC allows tile dimensions to be specified at runtime while C++AMP requires that tile dimensions be specified at compile-time.  Thus hc syntax for tile dims is `t_ext.tile_dim[0]` while C++AMP is t_ext.tile_dim0.

-
-
@@ -1,12 +1,3 @@
-<!-- START doctoc generated TOC please keep comment here to allow auto update -->
-<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
-**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*
-
- [Terms used in HIP Documentation](#terms-used-in-hip-documentation)
-
-<!-- END doctoc generated TOC please keep comment here to allow auto update -->
-
-
 # Terms used in HIP Documentation

 - host,  host cpu : Executes the HIP runtime API and is capable of initiating kernel launches to one or more devices.
@@ -24,6 +15,3 @@ The default device can be set with hipSetDevice.
 - hipify - tool to convert CUDA(R) code to portable C++ code.
 - hipconfig - tool to report various confoguration properties of the target platform.

-
-
-