From 0385a53d43d69b0d161b3119cb2b4c38f4560301 Mon Sep 17 00:00:00 2001 From: Maneesh Gupta Date: Tue, 24 May 2016 12:37:18 +0530 Subject: [PATCH 1/4] Move hipify-clang info to its own README Squashed commit of the following: commit bc44bcee461e46b0cf5cb9fe09213dca450b081a Author: Daniil Fukalov Date: Mon May 16 20:50:05 2016 +0300 added note about errors without CUDA sdk installed commit 5fd73ba90c0940bdc977737894362a99f4232b56 Author: Daniil Fukalov Date: Mon May 16 20:31:47 2016 +0300 move clang-hipify info to its own README commit 21d81a6d5acd3f093d77ac4d584e6f5bbe48f8cc Author: Daniil Fukalov Date: Mon May 16 20:30:00 2016 +0300 initial version Change-Id: I157294699a7be3d0bb38b2ee4a137a94280529c9 [ROCm/hip commit: 4a0f837042d961b7e746ee05bfcbf9c60758749e] --- projects/hip/INSTALL.md | 43 --------------------------- projects/hip/clang-hipify/README.md | 46 +++++++++++++++++++++++++++++ 2 files changed, 46 insertions(+), 43 deletions(-) create mode 100644 projects/hip/clang-hipify/README.md diff --git a/projects/hip/INSTALL.md b/projects/hip/INSTALL.md index dccbee2995..0b05a11f6b 100644 --- a/projects/hip/INSTALL.md +++ b/projects/hip/INSTALL.md @@ -11,9 +11,6 @@ - [HCC Options](#hcc-options) - [Using HIP with the AMD Native-GCN compiler.](#using-hip-with-the-amd-native-gcn-compiler) - [Compiling CodeXL markers for HIP Functions](#compiling-codexl-markers-for-hip-functions) - - [Using clang-hipify](#using-clang-hipify) - - [Building](#building) - - [Running and using clang-hipify](#running-and-using-clang-hipify) @@ -147,43 +144,3 @@ HIP_TRACE_API=1 HIP_DB=0x2 ./myHipApp ``` Note this trace mode uses colors. "less -r" can handle raw control characters and will display the debug output in proper colors. - - -### Using clang-hipify - -Clang-hipify is a clang-based tool which can automate the translation of CUDA source code into portable HIP C++. 
-The clang-hipify tool can automatically add extra HIP arguments (notably the "hipLaunchParm" required at the -beginning of every HIP kernel call). Clang-hipify has some additional dependencies explained below and -can be built as a separate make step. - - -#### Building - -1. Download and unpack clang+llvm 3.8 binary package preqrequisite: -``` -wget http://llvm.org/releases/3.8.0/clang+llvm-3.8.0-x86_64-linux-gnu-ubuntu-14.04.tar.xz -tar xvfJ clang+llvm-3.8.0-x86_64-linux-gnu-ubuntu-14.04.tar.xz -``` - -2. Enable build of clang-hipify and specify path to LLVM: -Note LLVM_DIR must be a full absolute path (not relative) to the location extracted above. Here's an example assuming we -extract the clang 3.8 package into ~/HIP-privatestaging/clang+llvm-3.8.0-x86_64-linux-gnu-ubuntu-14.04/. -``` -cd HIP-privatestaging -mkdir build.clang-hipify -cd build.clang-hipify -cmake -DBUILD_CLANG_HIPIFY=1 -DLLVM_DIR=~/HIP-privatestaging/clang+llvm-3.8.0-x86_64-linux-gnu-ubuntu-14.04/ -DCMAKE_BUILD_TYPE=Release .. -make -make install -``` - -#### Running and using clang-hipify -clang-hipify performs an initial compile of the CUDA source code into a "symbol tree", and thus needs access to the appropriate header files: - 1. Download "deb(network)" variant of target installer from https://developer.nvidia.com/cuda-downloads. The commands below show how to download and install a recent version from the http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.5-18_amd64.deb. 
- ``` wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.5-18_amd64.deb -sudo dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb -sudo apt-get update && sudo apt-get install cuda-minimal-build-7-5 cuda-curand-dev-7-5 -``` - diff --git a/projects/hip/clang-hipify/README.md b/projects/hip/clang-hipify/README.md new file mode 100644 index 0000000000..6ea9e4a7a7 --- /dev/null +++ b/projects/hip/clang-hipify/README.md @@ -0,0 +1,46 @@ +## Using hipify-clang + +`hipify-clang` is a clang-based tool that can automate the translation of CUDA source code into portable HIP C++. +The tool can automatically add extra HIP arguments (notably the "hipLaunchParm" required at the beginning of every HIP kernel call). +`hipify-clang` has some additional dependencies, explained below, and can be built as a separate make step. The instructions below are specifically for **Ubuntu 14.04**. + +### Build and install + +- Download and unpack the clang+llvm 3.8 binary package prerequisite. +```shell +wget http://llvm.org/releases/3.8.0/clang+llvm-3.8.0-x86_64-linux-gnu-ubuntu-14.04.tar.xz +tar xvfJ clang+llvm-3.8.0-x86_64-linux-gnu-ubuntu-14.04.tar.xz +``` + +- Enable the build of hipify-clang and specify the path to LLVM. + +Note: `LLVM_DIR` must be a full absolute path to the location extracted above. The example below assumes the clang 3.8 package was extracted into ~/HIP/clang+llvm-3.8.0-x86_64-linux-gnu-ubuntu-14.04/. It uses `$HOME` rather than `~` because the shell does not expand a tilde inside `-DLLVM_DIR=~/...`. +```shell +cd HIP +mkdir build +cd build +cmake -DBUILD_CLANG_HIPIFY=1 -DLLVM_DIR=$HOME/HIP/clang+llvm-3.8.0-x86_64-linux-gnu-ubuntu-14.04/ -DCMAKE_BUILD_TYPE=Release .. +make +make install +``` + +### Running and using hipify-clang + +`hipify-clang` performs an initial compile of the CUDA source code into a "symbol tree", and therefore needs access to the appropriate CUDA header files. + +If `hipify-clang` cannot find the CUDA headers, it reports various errors about unknown keywords (e.g. '\__global\__'), API function names (e.g.
'cudaMalloc'), syntax (e.g. 'foo<<<1,n>>>(...)'), etc. + +To install CUDA headers, download the "deb(network)" variant of the target installer from https://developer.nvidia.com/cuda-downloads. The commands below show how to download and install a recent version from http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.5-18_amd64.deb. +```shell +wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.5-18_amd64.deb +sudo dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb +sudo apt-get update && sudo apt-get install cuda-minimal-build-7-5 cuda-curand-dev-7-5 +``` + +#### Disclaimer + +The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD's products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. + +AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. + +Copyright (c) 2014-2016 Advanced Micro Devices, Inc. 
All rights reserved. \ No newline at end of file From d4ce97429b3a7a8eba915dd2edaad39658c864ad Mon Sep 17 00:00:00 2001 From: Maneesh Gupta Date: Mon, 6 Jun 2016 21:48:40 +0530 Subject: [PATCH 2/4] Updated doxygen documentation Change-Id: Idec4b7b811a441c9a792aa205286352f243945f0 [ROCm/hip commit: a352f83710a58f18e4a7c2f0e3cfda084c24487a] --- .../docs/RuntimeAPI/html/Synchonization.html | 2 +- .../hip/docs/RuntimeAPI/html/annotated.html | 2 +- projects/hip/docs/RuntimeAPI/html/bug.html | 6 +- .../html/classFakeMutex-members.html | 2 +- .../docs/RuntimeAPI/html/classFakeMutex.html | 4 +- .../html/classLockedAccessor-members.html | 2 +- .../RuntimeAPI/html/classLockedAccessor.html | 4 +- .../hip/docs/RuntimeAPI/html/classes.html | 2 +- ...lassihipDeviceCriticalBase__t-members.html | 21 +- .../html/classihipDeviceCriticalBase__t.html | 7 +- .../html/classihipDevice__t-members.html | 2 +- .../RuntimeAPI/html/classihipDevice__t.html | 6 +- .../html/classihipException-members.html | 2 +- .../RuntimeAPI/html/classihipException.html | 4 +- ...lassihipStreamCriticalBase__t-members.html | 2 +- .../html/classihipStreamCriticalBase__t.html | 4 +- .../html/classihipStream__t-members.html | 2 +- .../RuntimeAPI/html/classihipStream__t.html | 6 +- .../dir_68267d1309a1af8e8297ef4c3efbcdba.html | 6 +- .../dir_6d8604cb65fa6b83549668eb0ce09cac.html | 6 +- .../dir_d44c64559bbebec7f509842c48db8b23.html | 4 +- projects/hip/docs/RuntimeAPI/html/files.html | 33 +- .../hip/docs/RuntimeAPI/html/functions.html | 2 +- .../docs/RuntimeAPI/html/functions_vars.html | 2 +- .../hip/docs/RuntimeAPI/html/globals.html | 20 +- .../docs/RuntimeAPI/html/globals_defs.html | 5 +- .../docs/RuntimeAPI/html/globals_enum.html | 2 +- .../docs/RuntimeAPI/html/globals_eval.html | 2 +- .../docs/RuntimeAPI/html/globals_func.html | 8 +- .../docs/RuntimeAPI/html/globals_type.html | 2 +- .../hip/docs/RuntimeAPI/html/group__API.html | 2 +- .../docs/RuntimeAPI/html/group__Device.html | 2 +- 
.../docs/RuntimeAPI/html/group__Error.html | 2 +- .../docs/RuntimeAPI/html/group__Event.html | 2 +- .../RuntimeAPI/html/group__GlobalDefs.html | 13 +- .../RuntimeAPI/html/group__HCC__Specific.html | 2 +- .../docs/RuntimeAPI/html/group__HIP-ENV.html | 2 +- .../docs/RuntimeAPI/html/group__Memory.html | 5 +- .../RuntimeAPI/html/group__PeerToPeer.html | 148 +- .../docs/RuntimeAPI/html/group__Profiler.html | 2 +- .../docs/RuntimeAPI/html/group__Stream.html | 4 +- .../docs/RuntimeAPI/html/group__Texture.html | 2 +- .../docs/RuntimeAPI/html/group__Version.html | 2 +- .../docs/RuntimeAPI/html/hcc_8h_source.html | 4 +- .../RuntimeAPI/html/hcc__acc_8h_source.html | 8 +- .../html/hcc__detail_2hip__runtime_8h.html | 748 +++++++++- .../hcc__detail_2hip__runtime_8h_source.html | 997 ++++++------- .../hcc__detail_2hip__runtime__api_8h.html | 10 +- ...__detail_2hip__runtime__api_8h_source.html | 173 +-- .../hcc__detail_2hip__vector__types_8h.html | 336 ++--- ..._detail_2hip__vector__types_8h_source.html | 234 +-- .../hip/docs/RuntimeAPI/html/hierarchy.html | 2 +- .../html/hip__common_8h_source.html | 4 +- .../docs/RuntimeAPI/html/hip__hcc_8cpp.html | 22 +- .../RuntimeAPI/html/hip__hcc_8h_source.html | 1323 +++++++++-------- .../RuntimeAPI/html/hip__ldg_8h_source.html | 207 +++ .../html/hip__runtime_8h_source.html | 4 +- .../html/hip__runtime__api_8h_source.html | 184 +-- .../docs/RuntimeAPI/html/hip__texture_8h.html | 4 +- .../html/hip__texture_8h_source.html | 4 +- .../RuntimeAPI/html/hip__util_8h_source.html | 4 +- .../html/hip__vector__types_8h_source.html | 4 +- .../RuntimeAPI/html/host__defines_8h.html | 4 +- .../html/host__defines_8h_source.html | 4 +- projects/hip/docs/RuntimeAPI/html/index.html | 2 +- .../hip/docs/RuntimeAPI/html/modules.html | 2 +- projects/hip/docs/RuntimeAPI/html/pages.html | 2 +- .../hip/docs/RuntimeAPI/html/search/all_10.js | 7 +- .../hip/docs/RuntimeAPI/html/search/all_11.js | 5 +- .../hip/docs/RuntimeAPI/html/search/all_12.js | 2 +- 
.../hip/docs/RuntimeAPI/html/search/all_13.js | 2 +- .../hip/docs/RuntimeAPI/html/search/all_14.js | 2 +- .../docs/RuntimeAPI/html/search/all_15.html | 26 - .../hip/docs/RuntimeAPI/html/search/all_15.js | 4 - .../hip/docs/RuntimeAPI/html/search/all_8.js | 4 +- .../hip/docs/RuntimeAPI/html/search/all_d.js | 3 +- .../hip/docs/RuntimeAPI/html/search/all_e.js | 3 +- .../hip/docs/RuntimeAPI/html/search/all_f.js | 4 +- .../RuntimeAPI/html/search/defines_2.html | 26 - .../docs/RuntimeAPI/html/search/defines_2.js | 4 - .../RuntimeAPI/html/search/enumvalues_0.js | 2 + .../RuntimeAPI/html/search/functions_0.js | 2 - .../hip/docs/RuntimeAPI/html/search/search.js | 4 +- .../html/staging__buffer_8h_source.html | 31 +- .../html/structLockedBase-members.html | 2 +- .../RuntimeAPI/html/structLockedBase.html | 4 +- .../html/structStagingBuffer-members.html | 7 +- .../RuntimeAPI/html/structStagingBuffer.html | 9 +- .../RuntimeAPI/html/structdim3-members.html | 2 +- .../hip/docs/RuntimeAPI/html/structdim3.html | 4 +- .../structhipChannelFormatDesc-members.html | 2 +- .../html/structhipChannelFormatDesc.html | 4 +- .../html/structhipDeviceArch__t-members.html | 2 +- .../html/structhipDeviceArch__t.html | 4 +- .../html/structhipDeviceProp__t-members.html | 2 +- .../html/structhipDeviceProp__t.html | 4 +- .../html/structhipEvent__t-members.html | 2 +- .../RuntimeAPI/html/structhipEvent__t.html | 4 +- .../structhipPointerAttribute__t-members.html | 2 +- .../html/structhipPointerAttribute__t.html | 4 +- .../html/structihipEvent__t-members.html | 2 +- .../RuntimeAPI/html/structihipEvent__t.html | 4 +- .../html/structihipSignal__t-members.html | 2 +- .../RuntimeAPI/html/structihipSignal__t.html | 6 +- .../html/structtextureReference-members.html | 2 +- .../html/structtextureReference.html | 4 +- .../html/trace__helper_8h_source.html | 4 +- 107 files changed, 2834 insertions(+), 2025 deletions(-) create mode 100644 projects/hip/docs/RuntimeAPI/html/hip__ldg_8h_source.html delete mode 100644 
projects/hip/docs/RuntimeAPI/html/search/all_15.html delete mode 100644 projects/hip/docs/RuntimeAPI/html/search/all_15.js delete mode 100644 projects/hip/docs/RuntimeAPI/html/search/defines_2.html delete mode 100644 projects/hip/docs/RuntimeAPI/html/search/defines_2.js diff --git a/projects/hip/docs/RuntimeAPI/html/Synchonization.html b/projects/hip/docs/RuntimeAPI/html/Synchonization.html index 05c16aa272..7ef63789c0 100644 --- a/projects/hip/docs/RuntimeAPI/html/Synchonization.html +++ b/projects/hip/docs/RuntimeAPI/html/Synchonization.html @@ -109,7 +109,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/annotated.html b/projects/hip/docs/RuntimeAPI/html/annotated.html index 1e46ddb367..91a1033754 100644 --- a/projects/hip/docs/RuntimeAPI/html/annotated.html +++ b/projects/hip/docs/RuntimeAPI/html/annotated.html @@ -112,7 +112,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/bug.html b/projects/hip/docs/RuntimeAPI/html/bug.html index d263a9da5e..ed5904320f 100644 --- a/projects/hip/docs/RuntimeAPI/html/bug.html +++ b/projects/hip/docs/RuntimeAPI/html/bug.html @@ -85,15 +85,13 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');

HCC always returns 0 for regsPerBlock

HCC always returns 0 for l2CacheSize

-
Member hipMemcpyPeerAsync (void *dst, int dstDevice, const void *src, int srcDevice, size_t sizeBytes, hipStream_t stream)
-
This function uses a synchronous copy
-
Member hipStreamWaitEvent (hipStream_t stream, hipEvent_t event, unsigned int flags)
+
Member hipStreamWaitEvent (hipStream_t stream, hipEvent_t event, unsigned int flags)
This function conservatively waits for all work in the specified stream to complete.
diff --git a/projects/hip/docs/RuntimeAPI/html/classFakeMutex-members.html b/projects/hip/docs/RuntimeAPI/html/classFakeMutex-members.html index 93cef01e63..dfc1ad2516 100644 --- a/projects/hip/docs/RuntimeAPI/html/classFakeMutex-members.html +++ b/projects/hip/docs/RuntimeAPI/html/classFakeMutex-members.html @@ -96,7 +96,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/classFakeMutex.html b/projects/hip/docs/RuntimeAPI/html/classFakeMutex.html index ee6281bdcf..edd6c7a5de 100644 --- a/projects/hip/docs/RuntimeAPI/html/classFakeMutex.html +++ b/projects/hip/docs/RuntimeAPI/html/classFakeMutex.html @@ -104,12 +104,12 @@ void unlock () 
The documentation for this class was generated from the following file:
    -
  • /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_hcc.h
  • +
  • /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_hcc.h
diff --git a/projects/hip/docs/RuntimeAPI/html/classLockedAccessor-members.html b/projects/hip/docs/RuntimeAPI/html/classLockedAccessor-members.html index b28c517916..5c62d108d2 100644 --- a/projects/hip/docs/RuntimeAPI/html/classLockedAccessor-members.html +++ b/projects/hip/docs/RuntimeAPI/html/classLockedAccessor-members.html @@ -97,7 +97,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/classLockedAccessor.html b/projects/hip/docs/RuntimeAPI/html/classLockedAccessor.html index 198e041c75..6aee2a78d8 100644 --- a/projects/hip/docs/RuntimeAPI/html/classLockedAccessor.html +++ b/projects/hip/docs/RuntimeAPI/html/classLockedAccessor.html @@ -104,12 +104,12 @@ T * operator-> () 
The documentation for this class was generated from the following file:
    -
  • /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_hcc.h
  • +
  • /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_hcc.h
diff --git a/projects/hip/docs/RuntimeAPI/html/classes.html b/projects/hip/docs/RuntimeAPI/html/classes.html index 670a0c35d8..0d6ff008c7 100644 --- a/projects/hip/docs/RuntimeAPI/html/classes.html +++ b/projects/hip/docs/RuntimeAPI/html/classes.html @@ -111,7 +111,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/classihipDeviceCriticalBase__t-members.html b/projects/hip/docs/RuntimeAPI/html/classihipDeviceCriticalBase__t-members.html index cef393efa0..61c294b851 100644 --- a/projects/hip/docs/RuntimeAPI/html/classihipDeviceCriticalBase__t-members.html +++ b/projects/hip/docs/RuntimeAPI/html/classihipDeviceCriticalBase__t-members.html @@ -97,19 +97,20 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); ihipDeviceCriticalBase_t() (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE >inline incStreamId() (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE >inline init(unsigned deviceCnt) (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE >inline - lock() (defined in LockedBase< MUTEX_TYPE >)LockedBase< MUTEX_TYPE >inlineprivate - LockedAccessor< ihipDeviceCriticalBase_t > (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE >friend - peerAgents() const (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE >inline - peerCnt() const (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE >inline - removePeer(ihipDevice_t *peer) (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE > - resetPeers(ihipDevice_t *thisDevice) (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE > - streams() (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE >inline - unlock() (defined in 
LockedBase< MUTEX_TYPE >)LockedBase< MUTEX_TYPE >inlineprivate - ~ihipDeviceCriticalBase_t() (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE >inline + isPeer(const ihipDevice_t *peer) (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE > + lock() (defined in LockedBase< MUTEX_TYPE >)LockedBase< MUTEX_TYPE >inlineprivate + LockedAccessor< ihipDeviceCriticalBase_t > (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE >friend + peerAgents() const (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE >inline + peerCnt() const (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE >inline + removePeer(ihipDevice_t *peer) (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE > + resetPeers(ihipDevice_t *thisDevice) (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE > + streams() (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE >inline + unlock() (defined in LockedBase< MUTEX_TYPE >)LockedBase< MUTEX_TYPE >inlineprivate + ~ihipDeviceCriticalBase_t() (defined in ihipDeviceCriticalBase_t< MUTEX_TYPE >)ihipDeviceCriticalBase_t< MUTEX_TYPE >inline diff --git a/projects/hip/docs/RuntimeAPI/html/classihipDeviceCriticalBase__t.html b/projects/hip/docs/RuntimeAPI/html/classihipDeviceCriticalBase__t.html index c3bbd18337..2dc73d4d1d 100644 --- a/projects/hip/docs/RuntimeAPI/html/classihipDeviceCriticalBase__t.html +++ b/projects/hip/docs/RuntimeAPI/html/classihipDeviceCriticalBase__t.html @@ -115,6 +115,9 @@ const std::list< ihipStream_t ihipStream_t::SeqNum_t incStreamId ()   + +bool isPeer (const ihipDevice_t *peer) +  bool addPeer (ihipDevice_t *peer)   @@ -141,12 +144,12 @@ class LockedAccessor< i  
The documentation for this class was generated from the following file:
    -
  • /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_hcc.h
  • +
  • /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_hcc.h
diff --git a/projects/hip/docs/RuntimeAPI/html/classihipDevice__t-members.html b/projects/hip/docs/RuntimeAPI/html/classihipDevice__t-members.html index abfe39aa93..1b0a363925 100644 --- a/projects/hip/docs/RuntimeAPI/html/classihipDevice__t-members.html +++ b/projects/hip/docs/RuntimeAPI/html/classihipDevice__t-members.html @@ -110,7 +110,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/classihipDevice__t.html b/projects/hip/docs/RuntimeAPI/html/classihipDevice__t.html index c808706695..16dde77fa6 100644 --- a/projects/hip/docs/RuntimeAPI/html/classihipDevice__t.html +++ b/projects/hip/docs/RuntimeAPI/html/classihipDevice__t.html @@ -144,13 +144,13 @@ unsigned _device_flags  
The documentation for this class was generated from the following files:
    -
  • /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_hcc.h
  • -
  • /home/mangupta/hip_git/release_0.84.00/src/hip_hcc.cpp
  • +
  • /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_hcc.h
  • +
  • /home/mangupta/git/hip/release_0.86.00/src/hip_hcc.cpp
diff --git a/projects/hip/docs/RuntimeAPI/html/classihipException-members.html b/projects/hip/docs/RuntimeAPI/html/classihipException-members.html index 16623c6191..6a6901ab9e 100644 --- a/projects/hip/docs/RuntimeAPI/html/classihipException-members.html +++ b/projects/hip/docs/RuntimeAPI/html/classihipException-members.html @@ -95,7 +95,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/classihipException.html b/projects/hip/docs/RuntimeAPI/html/classihipException.html index b882caa43f..329b7becab 100644 --- a/projects/hip/docs/RuntimeAPI/html/classihipException.html +++ b/projects/hip/docs/RuntimeAPI/html/classihipException.html @@ -113,12 +113,12 @@ Public Attributes  
The documentation for this class was generated from the following file:
    -
  • /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_hcc.h
  • +
  • /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_hcc.h
diff --git a/projects/hip/docs/RuntimeAPI/html/classihipStreamCriticalBase__t-members.html b/projects/hip/docs/RuntimeAPI/html/classihipStreamCriticalBase__t-members.html index b8c648b0b6..0576c1586a 100644 --- a/projects/hip/docs/RuntimeAPI/html/classihipStreamCriticalBase__t-members.html +++ b/projects/hip/docs/RuntimeAPI/html/classihipStreamCriticalBase__t-members.html @@ -106,7 +106,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/classihipStreamCriticalBase__t.html b/projects/hip/docs/RuntimeAPI/html/classihipStreamCriticalBase__t.html index 8d6d0cbdc0..cd5458940c 100644 --- a/projects/hip/docs/RuntimeAPI/html/classihipStreamCriticalBase__t.html +++ b/projects/hip/docs/RuntimeAPI/html/classihipStreamCriticalBase__t.html @@ -144,12 +144,12 @@ MUTEX_TYPE _mutex  
The documentation for this class was generated from the following file:
    -
  • /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_hcc.h
  • +
  • /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_hcc.h
diff --git a/projects/hip/docs/RuntimeAPI/html/classihipStream__t-members.html b/projects/hip/docs/RuntimeAPI/html/classihipStream__t-members.html index 78a1c9e051..8a1c920e91 100644 --- a/projects/hip/docs/RuntimeAPI/html/classihipStream__t-members.html +++ b/projects/hip/docs/RuntimeAPI/html/classihipStream__t-members.html @@ -113,7 +113,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/classihipStream__t.html b/projects/hip/docs/RuntimeAPI/html/classihipStream__t.html index af8a80a26e..3cc28b243f 100644 --- a/projects/hip/docs/RuntimeAPI/html/classihipStream__t.html +++ b/projects/hip/docs/RuntimeAPI/html/classihipStream__t.html @@ -164,13 +164,13 @@ std::ostream & operato  
The documentation for this class was generated from the following files:
    -
  • /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_hcc.h
  • -
  • /home/mangupta/hip_git/release_0.84.00/src/hip_hcc.cpp
  • +
  • /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_hcc.h
  • +
  • /home/mangupta/git/hip/release_0.86.00/src/hip_hcc.cpp
diff --git a/projects/hip/docs/RuntimeAPI/html/dir_68267d1309a1af8e8297ef4c3efbcdba.html b/projects/hip/docs/RuntimeAPI/html/dir_68267d1309a1af8e8297ef4c3efbcdba.html index e5899cfc4f..424d2390ce 100644 --- a/projects/hip/docs/RuntimeAPI/html/dir_68267d1309a1af8e8297ef4c3efbcdba.html +++ b/projects/hip/docs/RuntimeAPI/html/dir_68267d1309a1af8e8297ef4c3efbcdba.html @@ -4,7 +4,7 @@ -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/src Directory Reference +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/src Directory Reference @@ -96,6 +96,8 @@ Files   file  hip_hcc.cpp   +file  hip_ldg.cpp +  file  hip_memory.cpp   file  hip_peer.cpp @@ -108,7 +110,7 @@ Files diff --git a/projects/hip/docs/RuntimeAPI/html/dir_6d8604cb65fa6b83549668eb0ce09cac.html b/projects/hip/docs/RuntimeAPI/html/dir_6d8604cb65fa6b83549668eb0ce09cac.html index 01b2359943..30736dcaa4 100644 --- a/projects/hip/docs/RuntimeAPI/html/dir_6d8604cb65fa6b83549668eb0ce09cac.html +++ b/projects/hip/docs/RuntimeAPI/html/dir_6d8604cb65fa6b83549668eb0ce09cac.html @@ -4,7 +4,7 @@ -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hcc_detail Directory Reference +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc_detail Directory Reference @@ -90,6 +90,8 @@ Files   file  hip_hcc.h [code]   +file  hip_ldg.h [code] +  file  hip_runtime.h [code]  Contains definitions of APIs for HIP runtime.
  @@ -115,7 +117,7 @@ Files diff --git a/projects/hip/docs/RuntimeAPI/html/dir_d44c64559bbebec7f509842c48db8b23.html b/projects/hip/docs/RuntimeAPI/html/dir_d44c64559bbebec7f509842c48db8b23.html index 540366d798..3803ab7798 100644 --- a/projects/hip/docs/RuntimeAPI/html/dir_d44c64559bbebec7f509842c48db8b23.html +++ b/projects/hip/docs/RuntimeAPI/html/dir_d44c64559bbebec7f509842c48db8b23.html @@ -4,7 +4,7 @@ -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include Directory Reference +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include Directory Reference @@ -105,7 +105,7 @@ Files diff --git a/projects/hip/docs/RuntimeAPI/html/files.html b/projects/hip/docs/RuntimeAPI/html/files.html index 6d6a5803cd..f463556c0a 100644 --- a/projects/hip/docs/RuntimeAPI/html/files.html +++ b/projects/hip/docs/RuntimeAPI/html/files.html @@ -91,27 +91,28 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); |o-hcc_detail ||o*hcc_acc.h ||o*hip_hcc.h -||o*hip_runtime.hContains definitions of APIs for HIP runtime -||o*hip_runtime_api.hContains C function APIs for HIP runtime. This file does not use any HCC builtin or special language extensions (-hc mode) ; those functions in hip_runtime.h -||o*hip_texture.hHIP C++ Texture API for hcc compiler -||o*hip_util.h -||o*hip_vector_types.hDefines the different newt vector types for HIP runtime -||o*host_defines.hTODO-doc -||o*staging_buffer.h -||\*trace_helper.h -|o*hcc.h -|o*hip_common.h -|o*hip_runtime.h -|o*hip_runtime_api.h -|\*hip_vector_types.h -\-src - \*hip_hcc.cpp +||o*hip_ldg.h +||o*hip_runtime.hContains definitions of APIs for HIP runtime +||o*hip_runtime_api.hContains C function APIs for HIP runtime. 
This file does not use any HCC builtin or special language extensions (-hc mode) ; those functions in hip_runtime.h +||o*hip_texture.hHIP C++ Texture API for hcc compiler +||o*hip_util.h +||o*hip_vector_types.hDefines the different newt vector types for HIP runtime +||o*host_defines.hTODO-doc +||o*staging_buffer.h +||\*trace_helper.h +|o*hcc.h +|o*hip_common.h +|o*hip_runtime.h +|o*hip_runtime_api.h +|\*hip_vector_types.h +\-src + \*hip_hcc.cpp diff --git a/projects/hip/docs/RuntimeAPI/html/functions.html b/projects/hip/docs/RuntimeAPI/html/functions.html index f8cf987a05..f4d68b34f9 100644 --- a/projects/hip/docs/RuntimeAPI/html/functions.html +++ b/projects/hip/docs/RuntimeAPI/html/functions.html @@ -309,7 +309,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/functions_vars.html b/projects/hip/docs/RuntimeAPI/html/functions_vars.html index f0369d18f3..2e2742f1a0 100644 --- a/projects/hip/docs/RuntimeAPI/html/functions_vars.html +++ b/projects/hip/docs/RuntimeAPI/html/functions_vars.html @@ -309,7 +309,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/globals.html b/projects/hip/docs/RuntimeAPI/html/globals.html index d945f975ca..e79e05a327 100644 --- a/projects/hip/docs/RuntimeAPI/html/globals.html +++ b/projects/hip/docs/RuntimeAPI/html/globals.html @@ -79,8 +79,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); @@ -294,13 +293,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); : hip_runtime_api.h
  • hipMemcpyKind -: hip_runtime_api.h -
  • -
  • hipMemcpyPeer() -: hip_runtime_api.h -
  • -
  • hipMemcpyPeerAsync() -: hip_runtime_api.h +: hip_runtime_api.h
  • hipMemcpyToSymbol() : hip_runtime_api.h @@ -375,17 +368,10 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); : hip_runtime.h
  • - - -

    - o -

    diff --git a/projects/hip/docs/RuntimeAPI/html/globals_defs.html b/projects/hip/docs/RuntimeAPI/html/globals_defs.html index 687f6b28ad..f48e455092 100644 --- a/projects/hip/docs/RuntimeAPI/html/globals_defs.html +++ b/projects/hip/docs/RuntimeAPI/html/globals_defs.html @@ -131,14 +131,11 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
  • hipThreadIdx_x : hip_runtime.h
  • -
  • ONE_COMPONENT_ACCESS -: hip_vector_types.h -
  • diff --git a/projects/hip/docs/RuntimeAPI/html/globals_enum.html b/projects/hip/docs/RuntimeAPI/html/globals_enum.html index 4888f8c682..242db05078 100644 --- a/projects/hip/docs/RuntimeAPI/html/globals_enum.html +++ b/projects/hip/docs/RuntimeAPI/html/globals_enum.html @@ -111,7 +111,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/globals_eval.html b/projects/hip/docs/RuntimeAPI/html/globals_eval.html index 5739a6b30a..d5761c75fe 100644 --- a/projects/hip/docs/RuntimeAPI/html/globals_eval.html +++ b/projects/hip/docs/RuntimeAPI/html/globals_eval.html @@ -138,7 +138,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/globals_func.html b/projects/hip/docs/RuntimeAPI/html/globals_func.html index cbf0b639db..7df814254c 100644 --- a/projects/hip/docs/RuntimeAPI/html/globals_func.html +++ b/projects/hip/docs/RuntimeAPI/html/globals_func.html @@ -216,12 +216,6 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
  • hipMemcpyAsync() : hip_runtime_api.h
  • -
  • hipMemcpyPeer() -: hip_runtime_api.h -
  • -
  • hipMemcpyPeerAsync() -: hip_runtime_api.h -
  • hipMemcpyToSymbol() : hip_runtime_api.h
  • @@ -268,7 +262,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/globals_type.html b/projects/hip/docs/RuntimeAPI/html/globals_type.html index a5f674defa..df5e5fcae7 100644 --- a/projects/hip/docs/RuntimeAPI/html/globals_type.html +++ b/projects/hip/docs/RuntimeAPI/html/globals_type.html @@ -108,7 +108,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/group__API.html b/projects/hip/docs/RuntimeAPI/html/group__API.html index f32b09b16b..2ff4d4e50f 100644 --- a/projects/hip/docs/RuntimeAPI/html/group__API.html +++ b/projects/hip/docs/RuntimeAPI/html/group__API.html @@ -110,7 +110,7 @@ Modules diff --git a/projects/hip/docs/RuntimeAPI/html/group__Device.html b/projects/hip/docs/RuntimeAPI/html/group__Device.html index b97f6c6b23..47181d6d5f 100644 --- a/projects/hip/docs/RuntimeAPI/html/group__Device.html +++ b/projects/hip/docs/RuntimeAPI/html/group__Device.html @@ -463,7 +463,7 @@ Functions diff --git a/projects/hip/docs/RuntimeAPI/html/group__Error.html b/projects/hip/docs/RuntimeAPI/html/group__Error.html index 6a01f21b81..cf2ef0f018 100644 --- a/projects/hip/docs/RuntimeAPI/html/group__Error.html +++ b/projects/hip/docs/RuntimeAPI/html/group__Error.html @@ -197,7 +197,7 @@ Functions diff --git a/projects/hip/docs/RuntimeAPI/html/group__Event.html b/projects/hip/docs/RuntimeAPI/html/group__Event.html index f4f185dcac..7032ed64df 100644 --- a/projects/hip/docs/RuntimeAPI/html/group__Event.html +++ b/projects/hip/docs/RuntimeAPI/html/group__Event.html @@ -340,7 +340,7 @@ Functions diff --git a/projects/hip/docs/RuntimeAPI/html/group__GlobalDefs.html b/projects/hip/docs/RuntimeAPI/html/group__GlobalDefs.html index 3cabc32dc3..886f96f61a 100644 --- a/projects/hip/docs/RuntimeAPI/html/group__GlobalDefs.html +++ b/projects/hip/docs/RuntimeAPI/html/group__GlobalDefs.html @@ -202,7 +202,10 @@ Enumerations
      hipErrorRuntimeMemory, hipErrorRuntimeOther, -hipErrorTbd +hipErrorHostMemoryAlreadyRegistered, +hipErrorHostMemoryNotRegistered, +
    +  hipErrorTbd
    }   @@ -532,6 +535,12 @@ Enumerations hipErrorRuntimeOther 

    An HSA runtime call other than a memory call returned an error. Typically not seen in production systems.

    +hipErrorHostMemoryAlreadyRegistered  +

    Produced when trying to lock memory that is already page-locked.

    + +hipErrorHostMemoryNotRegistered  +

    Produced when trying to unlock memory that is not page-locked.

    + hipErrorTbd 

    Marker that more error codes are needed.

    @@ -623,7 +632,7 @@ Enumerations diff --git a/projects/hip/docs/RuntimeAPI/html/group__HCC__Specific.html b/projects/hip/docs/RuntimeAPI/html/group__HCC__Specific.html index bb5dee0fdc..4a6e6b63bb 100644 --- a/projects/hip/docs/RuntimeAPI/html/group__HCC__Specific.html +++ b/projects/hip/docs/RuntimeAPI/html/group__HCC__Specific.html @@ -88,7 +88,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/group__HIP-ENV.html b/projects/hip/docs/RuntimeAPI/html/group__HIP-ENV.html index 7445752649..938d8cb437 100644 --- a/projects/hip/docs/RuntimeAPI/html/group__HIP-ENV.html +++ b/projects/hip/docs/RuntimeAPI/html/group__HIP-ENV.html @@ -82,7 +82,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/group__Memory.html b/projects/hip/docs/RuntimeAPI/html/group__Memory.html index 7245bf2416..2bdce29d59 100644 --- a/projects/hip/docs/RuntimeAPI/html/group__Memory.html +++ b/projects/hip/docs/RuntimeAPI/html/group__Memory.html @@ -535,6 +535,7 @@ Functions

    Copy data from src to dst.

    It supports copies from host to device, device to host, device to device, and host to host. The src and dst regions must not overlap.

    +

    For hipMemcpy, the copy is always performed by the current device (set by hipSetDevice). For multi-GPU or peer-to-peer configurations, it is recommended to set the current device to the device where the src data is physically located. For optimal peer-to-peer copies, the copy device must be able to access the src and dst pointers (by calling hipDeviceEnablePeerAccess with the copy agent as the current device and src/dst as the peerDevice argument). If this is not done, hipMemcpy will still work, but it will perform the copy using a staging buffer on the host.

    Parameters
    @@ -592,6 +593,8 @@ Functions

    Copy data from src to dst asynchronously.

    Warning
    If host or dest are not pinned, the memory copy will be performed synchronously. For best performance, use hipHostMalloc to allocate host memory that is transferred asynchronously.
    +

    For hipMemcpyAsync, the copy is always performed by the device associated with the specified stream.

    +

    For multi-GPU or peer-to-peer configurations, it is recommended to use a stream that is attached to the device where the src data is physically located. For optimal peer-to-peer copies, the copy device must be able to access the src and dst pointers (by calling hipDeviceEnablePeerAccess with the copy agent as the current device and src/dst as the peerDevice argument). If this is not done, hipMemcpyAsync will still work, but it will perform the copy using a staging buffer on the host.

    Parameters
    [out]dst Data being copied to
    @@ -829,7 +832,7 @@ on HCC hipMemcpyAsync requires that any host pointers are pinned (ie via the hip diff --git a/projects/hip/docs/RuntimeAPI/html/group__PeerToPeer.html b/projects/hip/docs/RuntimeAPI/html/group__PeerToPeer.html index c737bc7bf4..e50d5831d9 100644 --- a/projects/hip/docs/RuntimeAPI/html/group__PeerToPeer.html +++ b/projects/hip/docs/RuntimeAPI/html/group__PeerToPeer.html @@ -93,15 +93,10 @@ Functions - - - - - -
    [out]dst Data being copied to
    hipError_t hipDeviceDisablePeerAccess (int peerDeviceId)
     Disable direct access from the current device's virtual address space to memory allocations physically located on a peer device.
     
    hipError_t hipMemcpyPeer (void *dst, int dstDeviceId, const void *src, int srcDeviceId, size_t sizeBytes)
     Copies memory from one device to memory on another device.
     
    hipError_t hipMemcpyPeerAsync (void *dst, int dstDevice, const void *src, int srcDevice, size_t sizeBytes, hipStream_t stream)
     Copies memory from one device to memory on another device.
     

    Detailed Description

    ----------------------------------------------------------------------------------------------—

    +
    Warning
    PeerToPeer support is experimental.

    Function Documentation

    @@ -136,17 +131,19 @@ Functions

    Determine if a device can access a peer's memory.

    Parameters
    - - - + + +
    [out]canAccessPeerreturns true if specified devices are peers.
    [in]device
    [in]peerDeviceReturns "1" in canAccessPeer if the specified device is capable of directly accessing memory physically located on peerDevice , or "0" if not.
    [out]canAccessPeerReturns the peer access capability (0 or 1)
    [in]device- device from where memory may be accessed.
    [in]peerDevice- device where memory is physically located
    +

    Returns "1" in canAccessPeer if the specified device is capable of directly accessing memory physically located on peerDevice, or "0" if not.

    Returns "0" in canAccessPeer if deviceId == peerDeviceId and both are valid devices; a device is not a peer of itself.

    Returns
    hipSuccess,
    -hipErrorInvalidDevice if deviceId or peerDeviceId are not valid devices
    -
    Warning
    HCC returns 0 in *canAccessPeer ; Need to update this function when RT supports P2P
    +hipErrorInvalidDevice if deviceId or peerDeviceId are not valid devices
    +
    Warning
    PeerToPeer support is experimental.
    +

    HCC returns 0 in *canAccessPeer; this function needs to be updated when the runtime supports P2P.

    @@ -168,10 +165,12 @@ Functions

    Returns hipErrorPeerAccessNotEnabled if direct access to memory on peerDevice has not yet been enabled from the current device.

    Parameters
    - +
    [in]peerDeviceIdReturns hipSuccess, hipErrorPeerAccessNotEnabled
    [in]peerDeviceId
    +
    Returns
    hipSuccess, hipErrorPeerAccessNotEnabled
    +
    Warning
    PeerToPeer support is experimental.
    @@ -209,135 +208,14 @@ Functions
    Returns
    hipErrorPeerAccessAlreadyEnabled if peer access is already enabled for this device.
    - - - - -
    -
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    hipError_t hipMemcpyPeer (void * dst,
    int dstDeviceId,
    const void * src,
    int srcDeviceId,
    size_t sizeBytes 
    )
    -
    - -

    Copies memory from one device to memory on another device.

    -
    Parameters
    - - - - - - -
    [out]dst- Destination device pointer.
    [in]dstDeviceId- Destination device
    [in]src- Source device pointer
    [in]srcDeviceId- Source device
    [in]sizeBytes- Size of memory copy in bytes
    -
    -
    -

    Returns hipSuccess, hipErrorInvalidValue, hipErrorInvalidDevice

    - -
    -
    - -
    -
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    hipError_t hipMemcpyPeerAsync (void * dst,
    int dstDevice,
    const void * src,
    int srcDevice,
    size_t sizeBytes,
    hipStream_t stream 
    )
    -
    - -

    Copies memory from one device to memory on another device.

    -
    Parameters
    - - - - - - - -
    [out]dst- Destination device pointer.
    [in]dstDevice- Destination device
    [in]src- Source device pointer
    [in]srcDevice- Source device
    [in]sizeBytes- Size of memory copy in bytes
    [in]stream- Stream identifier
    -
    -
    -

    Returns hipSuccess, hipErrorInvalidValue, hipErrorInvalidDevice

    -
    Bug:
    This function uses a synchronous copy
    +
    Warning
    PeerToPeer support is experimental.
    diff --git a/projects/hip/docs/RuntimeAPI/html/group__Profiler.html b/projects/hip/docs/RuntimeAPI/html/group__Profiler.html index be90725871..1400fe8d2e 100644 --- a/projects/hip/docs/RuntimeAPI/html/group__Profiler.html +++ b/projects/hip/docs/RuntimeAPI/html/group__Profiler.html @@ -85,7 +85,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/group__Stream.html b/projects/hip/docs/RuntimeAPI/html/group__Stream.html index f3a4f40535..be16cb0f5f 100644 --- a/projects/hip/docs/RuntimeAPI/html/group__Stream.html +++ b/projects/hip/docs/RuntimeAPI/html/group__Stream.html @@ -307,14 +307,14 @@ Functions
    Returns
    hipSuccess, hipErrorInvalidResourceHandle

    This function inserts a wait operation into the specified stream. All future work submitted to stream will wait until event reports completion before beginning execution. This function is host-asynchronous and the function may return before the wait has completed.

    -
    Bug:
    This function conservatively waits for all work in the specified stream to complete.
    +
    Bug:
    This function conservatively waits for all work in the specified stream to complete.
    diff --git a/projects/hip/docs/RuntimeAPI/html/group__Texture.html b/projects/hip/docs/RuntimeAPI/html/group__Texture.html index 0142de0aaa..308ae147b9 100644 --- a/projects/hip/docs/RuntimeAPI/html/group__Texture.html +++ b/projects/hip/docs/RuntimeAPI/html/group__Texture.html @@ -121,7 +121,7 @@ template<class T , int dim, enum hipTextureReadMode readMode> diff --git a/projects/hip/docs/RuntimeAPI/html/group__Version.html b/projects/hip/docs/RuntimeAPI/html/group__Version.html index 8c8a1ef101..0a49b3eb83 100644 --- a/projects/hip/docs/RuntimeAPI/html/group__Version.html +++ b/projects/hip/docs/RuntimeAPI/html/group__Version.html @@ -114,7 +114,7 @@ Functions diff --git a/projects/hip/docs/RuntimeAPI/html/hcc_8h_source.html b/projects/hip/docs/RuntimeAPI/html/hcc_8h_source.html index 287ff77891..c4fbf7b2f5 100644 --- a/projects/hip/docs/RuntimeAPI/html/hcc_8h_source.html +++ b/projects/hip/docs/RuntimeAPI/html/hcc_8h_source.html @@ -4,7 +4,7 @@ -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hcc.h Source File +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc.h Source File @@ -100,7 +100,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search'); diff --git a/projects/hip/docs/RuntimeAPI/html/hcc__acc_8h_source.html b/projects/hip/docs/RuntimeAPI/html/hcc__acc_8h_source.html index 5c1b902cfe..8d72291a50 100644 --- a/projects/hip/docs/RuntimeAPI/html/hcc__acc_8h_source.html +++ b/projects/hip/docs/RuntimeAPI/html/hcc__acc_8h_source.html @@ -4,7 +4,7 @@ -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hcc_acc.h Source File +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hcc_acc.h Source File @@ -103,13 +103,13 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    18 #endif
    19 
    20 #endif
    -
    hipError_t hipHccGetAccelerator(int deviceId, hc::accelerator *acc)
    Definition: hip_hcc.cpp:1396
    +
    hipError_t hipHccGetAccelerator(int deviceId, hc::accelerator *acc)
    Definition: hip_hcc.cpp:1499
    hipError_t
    Definition: hip_runtime_api.h:142
    -
    hipError_t hipHccGetAcceleratorView(hipStream_t stream, hc::accelerator_view **av)
    Definition: hip_hcc.cpp:1416
    +
    hipError_t hipHccGetAcceleratorView(hipStream_t stream, hc::accelerator_view **av)
    Definition: hip_hcc.cpp:1519
    diff --git a/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__runtime_8h.html b/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__runtime_8h.html index 3512236a26..e3f4149584 100644 --- a/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__runtime_8h.html +++ b/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__runtime_8h.html @@ -4,7 +4,7 @@ -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_runtime.h File Reference +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_runtime.h File Reference @@ -96,7 +96,8 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');

    Contains definitions of APIs for HIP runtime.

    -
    #include <string.h>
    +
    #include <cmath>
    +#include <string.h>
    #include <stddef.h>
    #include <hip/hip_runtime_api.h>
    #include <grid_launch.h>
    @@ -162,12 +163,753 @@ Macros + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

    Functions

    +__device__ float acosf (float x)
     
    +__device__ float acoshf (float x)
     
    +__device__ float asinf (float x)
     
    +__device__ float asinhf (float x)
     
    +__device__ float atan2f (float y, float x)
     
    +__device__ float atanf (float x)
     
    +__device__ float atanhf (float x)
     
    +__device__ float cbrtf (float x)
     
    +__device__ float ceilf (float x)
     
    +__device__ float copysignf (float x, float y)
     
    +__device__ float cosf (float x)
     
    +__device__ float coshf (float x)
     
    +__device__ float cyl_bessel_i0f (float x)
     
    +__device__ float cyl_bessel_i1f (float x)
     
    +__device__ float erfcf (float x)
     
    +__device__ float erfcinvf (float y)
     
    +__device__ float erfcxf (float x)
     
    +__device__ float erff (float x)
     
    +__device__ float erfinvf (float y)
     
    +__device__ float exp10f (float x)
     
    +__device__ float exp2f (float x)
     
    +__device__ float expf (float x)
     
    +__device__ float expm1f (float x)
     
    +__device__ float fabsf (float x)
     
    +__device__ float fdimf (float x, float y)
     
    +__device__ float fdividef (float x, float y)
     
    +__device__ float floorf (float x)
     
    +__device__ float fmaf (float x, float y, float z)
     
    +__device__ float fmaxf (float x, float y)
     
    +__device__ float fminf (float x, float y)
     
    +__device__ float fmodf (float x, float y)
     
    +__device__ float frexpf (float x, float y)
     
    +__device__ float hypotf (float x, float y)
     
    +__device__ float ilogbf (float x)
     
    +__host__ __device__ unsigned isfinite (float a)
     
    +__device__ unsigned isinf (float a)
     
    +__device__ unsigned isnan (float a)
     
    +__device__ float j0f (float x)
     
    +__device__ float j1f (float x)
     
    +__device__ float jnf (int n, float x)
     
    +__device__ float ldexpf (float x, int exp)
     
    +__device__ float lgammaf (float x)
     
    +__device__ long long int llrintf (float x)
     
    +__device__ long long int llroundf (float x)
     
    +__device__ float log10f (float x)
     
    +__device__ float log1pf (float x)
     
    +__device__ float log2f (float x)
     
    +__device__ float logbf (float x)
     
    +__device__ float logf (float x)
     
    +__device__ long int lrintf (float x)
     
    +__device__ long int lroundf (float x)
     
    +__device__ float modff (float x, float *iptr)
     
    +__device__ float nanf (const char *tagp)
     
    +__device__ float nearbyintf (float x)
     
    +__device__ float nextafterf (float x, float y)
     
    +__device__ float norm3df (float a, float b, float c)
     
    +__device__ float norm4df (float a, float b, float c, float d)
     
    +__device__ float normcdff (float y)
     
    +__device__ float normcdfinvf (float y)
     
    +__device__ float normf (int dim, const float *a)
     
    +__device__ float powf (float x, float y)
     
    +__device__ float rcbtrf (float x)
     
    +__device__ float remainderf (float x, float y)
     
    +__device__ float remquof (float x, float y, int *quo)
     
    +__device__ float rhypotf (float x, float y)
     
    +__device__ float rintf (float x)
     
    +__device__ float rnorm3df (float a, float b, float c)
     
    +__device__ float rnorm4df (float a, float b, float c, float d)
     
    +__device__ float rnormf (int dim, const float *a)
     
    +__device__ float roundf (float x)
     
    +__device__ float rsqrtf (float x)
     
    +__device__ float scalblnf (float x, long int n)
     
    +__device__ float scalbnf (float x, int n)
     
    +__host__ __device__ unsigned signbit (float a)
     
    +__device__ void sincosf (float x, float *sptr, float *cptr)
     
    +__device__ void sincospif (float x, float *sptr, float *cptr)
     
    +__device__ float sinf (float x)
     
    +__device__ float sinhf (float x)
     
    +__device__ float sinpif (float x)
     
    +__device__ float sqrtf (float x)
     
    +__device__ float tanf (float x)
     
    +__device__ float tanhf (float x)
     
    +__device__ float tgammaf (float x)
     
    +__device__ float truncf (float x)
     
    +__device__ float y0f (float x)
     
    +__device__ float y1f (float x)
     
    +__device__ float ynf (int n, float x)
     
    +__host__ __device__ float cospif (float x)
     
    +__device__ double acos (double x)
     
    +__device__ double acosh (double x)
     
    +__device__ double asin (double x)
     
    +__device__ double asinh (double x)
     
    +__device__ double atan (double x)
     
    +__device__ double atan2 (double y, double x)
     
    +__device__ double atanh (double x)
     
    +__device__ double cbrt (double x)
     
    +__device__ double ceil (double x)
     
    +__device__ double copysign (double x, double y)
     
    +__device__ double cos (double x)
     
    +__device__ double cosh (double x)
     
    +__host__ __device__ double cospi (double x)
     
    +__device__ double cyl_bessel_i0 (double x)
     
    +__device__ double cyl_bessel_i1 (double x)
     
    +__device__ double erf (double x)
     
    +__device__ double erfc (double x)
     
    +__device__ double erfcinv (double y)
     
    +__device__ double erfcx (double x)
     
    +__device__ double exp (double x)
     
    +__device__ double exp10 (double x)
     
    +__device__ double exp2 (double x)
     
    +__device__ double expm1 (double x)
     
    +__device__ double fabs (double x)
     
    +__device__ double fdim (double x, double y)
     
    +__device__ double floor (double x)
     
    +__device__ double fma (double x, double y, double z)
     
    +__device__ double fmax (double x, double y)
     
    +__device__ double fmin (double x, double y)
     
    +__device__ double fmod (double x, double y)
     
    +__device__ double frexp (double x, int *nptr)
     
    +__device__ double hypot (double x, double y)
     
    +__device__ double ilogb (double x)
     
    +__host__ __device__ unsigned isfinite (double x)
     
    +__device__ unsigned isinf (double x)
     
    +__device__ unsigned isnan (double x)
     
    +__device__ double j0 (double x)
     
    +__device__ double j1 (double x)
     
    +__device__ double jn (int n, double x)
     
    +__device__ double ldexp (double x, int exp)
     
    +__device__ double lgamma (double x)
     
    +__device__ long long llrint (double x)
     
    +__device__ long llround (double x)
     
    +__device__ double log (double x)
     
    +__device__ double log10 (double x)
     
    +__device__ double log1p (double x)
     
    +__device__ double log2 (double x)
     
    +__device__ double logb (double x)
     
    +__device__ long int lrint (double x)
     
    +__device__ long int lround (double x)
     
    +__device__ double modf (double x, double *iptr)
     
    +__device__ double nan (const char *tagp)
     
    +__device__ double nearbyint (double x)
     
    +__device__ double nextafter (double x, double y)
     
    +__device__ double norm (int dim, const double *t)
     
    +__device__ double norm3d (double a, double b, double c)
     
    +__device__ double norm4d (double a, double b, double d)
     
    +__device__ double normcdf (double y)
     
    +__device__ double normcdfinv (double y)
     
    +__device__ double pow (double x, double y)
     
    +__device__ double rcbrt (double x)
     
    +__device__ double remainder (double x, double y)
     
    +__device__ double remquo (double x, double y, int *quo)
     
    +__device__ double rhypot (double x, double y)
     
    +__device__ double rint (double x)
     
    +__device__ double rnorm (int dim, const double *t)
     
    +__device__ double rnorm3d (double a, double b, double c)
     
    +__device__ double rnorm4d (double a, double b, double c, double d)
     
    +__device__ double round (double x)
     
    +__host__ __device__ double rsqrt (double x)
     
    +__device__ double scalbln (double x, long int n)
     
    +__device__ double scalbn (double x, int n)
     
    +__host__ __device__ unsigned signbit (double a)
     
    +__device__ double sin (double a)
     
    +__device__ double sincos (double x, double *sptr, double *cptr)
     
    +__device__ double sincospi (double x, double *sptr, double *cptr)
     
    +__device__ double sinh (double x)
     
    +__host__ __device__ double sinpi (double x)
     
    +__device__ double sqrt (double x)
     
    +__device__ double tan (double x)
     
    +__device__ double tanh (double x)
     
    +__device__ double tgamma (double x)
     
    +__device__ double trunc (double x)
     
    +__device__ double y0 (double x)
     
    +__device__ double y1 (double y)
     
    +__device__ double yn (int n, double x)
     
    __device__ long long int clock64 ()
     
    __device__ clock_t clock ()
     
    +__device__ int atomicAdd (int *address, int val)
     
    +__device__ unsigned int atomicAdd (unsigned int *address, unsigned int val)
     
    +__device__ unsigned long long int atomicAdd (unsigned long long int *address, unsigned long long int val)
     
    +__device__ float atomicAdd (float *address, float val)
     
    +__device__ int atomicSub (int *address, int val)
     
    +__device__ unsigned int atomicSub (unsigned int *address, unsigned int val)
     
    +__device__ int atomicExch (int *address, int val)
     
    +__device__ unsigned int atomicExch (unsigned int *address, unsigned int val)
     
    +__device__ unsigned long long int atomicExch (unsigned long long int *address, unsigned long long int val)
     
    +__device__ float atomicExch (float *address, float val)
     
    +__device__ int atomicMin (int *address, int val)
     
    +__device__ unsigned int atomicMin (unsigned int *address, unsigned int val)
     
    +__device__ unsigned long long int atomicMin (unsigned long long int *address, unsigned long long int val)
     
    +__device__ int atomicMax (int *address, int val)
     
    +__device__ unsigned int atomicMax (unsigned int *address, unsigned int val)
     
    +__device__ unsigned long long int atomicMax (unsigned long long int *address, unsigned long long int val)
     
    +__device__ int atomicCAS (int *address, int compare, int val)
     
    +__device__ unsigned int atomicCAS (unsigned int *address, unsigned int compare, unsigned int val)
     
    +__device__ unsigned long long int atomicCAS (unsigned long long int *address, unsigned long long int compare, unsigned long long int val)
     
    +__device__ int atomicAnd (int *address, int val)
     
    +__device__ unsigned int atomicAnd (unsigned int *address, unsigned int val)
     
    +__device__ unsigned long long int atomicAnd (unsigned long long int *address, unsigned long long int val)
     
    +__device__ int atomicOr (int *address, int val)
     
    +__device__ unsigned int atomicOr (unsigned int *address, unsigned int val)
     
    +__device__ unsigned long long int atomicOr (unsigned long long int *address, unsigned long long int val)
     
    +__device__ int atomicXor (int *address, int val)
     
    +__device__ unsigned int atomicXor (unsigned int *address, unsigned int val)
     
    +__device__ unsigned long long int atomicXor (unsigned long long int *address, unsigned long long int val)
     
    +__device__ unsigned int atomicInc (unsigned int *address, unsigned int val)
     
    +__device__ unsigned int atomicDec (unsigned int *address, unsigned int val)
     
    +__device__ unsigned int __popc (unsigned int input)
     
    +__device__ unsigned int __popcll (unsigned long long int input)
     
    +__device__ unsigned int __clz (unsigned int input)
     
    +__device__ unsigned int __clzll (unsigned long long int input)
     
    +__device__ unsigned int __clz (int input)
     
    +__device__ unsigned int __clzll (long long int input)
     
    +__device__ unsigned int __ffs (unsigned int input)
     
    +__device__ unsigned int __ffsll (unsigned long long int input)
     
    +__device__ unsigned int __ffs (int input)
     
    +__device__ unsigned int __ffsll (long long int input)
     
    +__device__ unsigned int __brev (unsigned int input)
     
    +__device__ unsigned long long int __brevll (unsigned long long int input)
     
    +__device__ int __all (int input)
     
    +__device__ int __any (int input)
     
    +__device__ unsigned long long int __ballot (int input)
     
    +__device__ int __shfl (int input, int lane, int width)
     
    +__device__ int __shfl_up (int input, unsigned int lane_delta, int width)
     
    +__device__ int __shfl_down (int input, unsigned int lane_delta, int width)
     
    +__device__ int __shfl_xor (int input, int lane_mask, int width)
     
    +__device__ float __shfl (float input, int lane, int width)
     
    +__device__ float __shfl_up (float input, unsigned int lane_delta, int width)
     
    +__device__ float __shfl_down (float input, unsigned int lane_delta, int width)
     
    +__device__ float __shfl_xor (float input, int lane_mask, int width)
     
    +__host__ __device__ int min (int arg1, int arg2)
     
    +__host__ __device__ int max (int arg1, int arg2)
     
    +__device__ float __cosf (float x)
     
    +__device__ float __expf (float x)
     
    +__device__ float __frsqrt_rn (float x)
     
    +__device__ float __fsqrt_rd (float x)
     
    +__device__ float __fsqrt_rn (float x)
     
    +__device__ float __fsqrt_ru (float x)
     
    +__device__ float __fsqrt_rz (float x)
     
    +__device__ float __log10f (float x)
     
    +__device__ float __log2f (float x)
     
    +__device__ float __logf (float x)
     
    +__device__ float __powf (float base, float exponent)
     
    +__device__ void __sincosf (float x, float *s, float *c)
     
    +__device__ float __sinf (float x)
     
    +__device__ float __tanf (float x)
     
    +__device__ float __dsqrt_rd (double x)
     
    +__device__ float __dsqrt_rn (double x)
     
    +__device__ float __dsqrt_ru (double x)
     
    +__device__ float __dsqrt_rz (double x)
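The bit-manipulation and cross-lane intrinsics in the list above are only declared here, and their semantics are not described. The host-side C++ sketch below models what `__popc`, `__clz`, `__ffs`, `__brev` and `__shfl_xor` are assumed to compute. It assumes they follow the CUDA definitions of the same intrinsics. The `_ref`/`_sim` helpers and `butterfly_sum` are hypothetical names and are not part of HIP. The wavefront is simulated as a plain `std::vector` of per-lane values, so the code runs on any host compiler without the HIP runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference models (not part of HIP). Semantics are
// assumed to match the CUDA definitions of the corresponding intrinsics.

// Models __popc: number of set bits in a 32-bit word.
unsigned popc_ref(std::uint32_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Models __clz: number of leading zero bits; 32 when x == 0.
unsigned clz_ref(std::uint32_t x) {
    unsigned n = 0;
    for (std::uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1) {
        ++n;
    }
    return n;
}

// Models __ffs: 1-based position of the least significant set bit; 0 when x == 0.
unsigned ffs_ref(std::uint32_t x) {
    if (x == 0) return 0;
    unsigned pos = 1;
    while ((x & 1u) == 0) {
        x >>= 1;
        ++pos;
    }
    return pos;
}

// Models __brev: reverses the bit order of a 32-bit word.
std::uint32_t brev_ref(std::uint32_t x) {
    std::uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (x & 1u);  // shift the next low bit of x into r
        x >>= 1;
    }
    return r;
}

// Models __shfl_xor(input, lane_mask, width) across a whole wavefront:
// lane i receives the value held by lane (i ^ lane_mask) of its width-sized segment.
// Assumes width is a power of two and lane_mask < width.
std::vector<int> shfl_xor_sim(const std::vector<int>& lanes, unsigned lane_mask, unsigned width) {
    std::vector<int> out(lanes.size());
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        std::size_t base = i - i % width;  // first lane of this segment
        out[i] = lanes[base + ((i % width) ^ lane_mask)];
    }
    return out;
}

// Butterfly reduction: after log2(width) xor-shuffle steps, every lane holds
// the sum of all values in its segment.
std::vector<int> butterfly_sum(std::vector<int> lanes, unsigned width) {
    assert(width != 0 && (width & (width - 1)) == 0);  // width must be a power of two
    assert(lanes.size() % width == 0);                  // only whole segments
    for (unsigned mask = width / 2; mask > 0; mask /= 2) {
        std::vector<int> partner = shfl_xor_sim(lanes, mask, width);
        for (std::size_t i = 0; i < lanes.size(); ++i) lanes[i] += partner[i];
    }
    return lanes;
}
```

For example, calling `butterfly_sum` on 64 lanes holding 0..63 with `width` 64 leaves 2016 (the full sum) in every lane. With `width` 32, each half holds its own segment sum. In a kernel, each loop step would correspond to `v += __shfl_xor(v, mask, width);` using the `int` overload declared above. Whether HIP's `width` handling matches CUDA's exactly is an assumption worth checking against hip_runtime.h.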
     
    @@ -197,7 +939,7 @@ const int  diff --git a/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__runtime_8h_source.html b/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__runtime_8h_source.html index 17f164c7e0..3da659c60e 100644 --- a/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__runtime_8h_source.html +++ b/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__runtime_8h_source.html @@ -4,7 +4,7 @@ -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_runtime.h Source File +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_runtime.h Source File @@ -119,7 +119,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    33 
    34 
    35 //#include <cstring>
    -
    36 //#include <cmath>
    +
    36 #include <cmath>
    37 #include <string.h>
    38 #include <stddef.h>
    39 
    @@ -131,526 +131,527 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    45 //---
    46 // Remainder of this file only compiles with HCC
    47 #ifdef __HCC__
    -
    48 #if __cplusplus
    -
    49 #include <hc.hpp>
    -
    50 #endif
    -
    51 #include <grid_launch.h>
    -
    52 extern int HIP_TRACE_API;
    -
    53 
    -
    54 //TODO-HCC-GL - change this to typedef.
    -
    55 //typedef grid_launch_parm hipLaunchParm ;
    -
    56 #define hipLaunchParm grid_launch_parm
    -
    57 #ifdef __cplusplus
    -
    58 #include <hip/hcc_detail/hip_texture.h>
    -
    59 #endif
    -
    60 #include <hip/hcc_detail/host_defines.h>
    -
    61 // TODO-HCC remove old definitions ; ~1602 hcc supports __HCC_ACCELERATOR__ define.
    -
    62 #if defined (__KALMAR_ACCELERATOR__) && !defined (__HCC_ACCELERATOR__)
    -
    63 #define __HCC_ACCELERATOR__ __KALMAR_ACCELERATOR__
    -
    64 #endif
    -
    65 
    -
    66 // Feature tests:
    -
    67 #if defined(__HCC_ACCELERATOR__) && (__HCC_ACCELERATOR__ != 0)
    -
    68 // Device compile and not host compile:
    -
    69 
    -
    70 //TODO-HCC enable __HIP_ARCH_HAS_ATOMICS__ when HCC supports these.
    -
    71  // 32-bit Atomics:
    -
    72 #define __HIP_ARCH_HAS_GLOBAL_INT32_ATOMICS__ (1)
    -
    73 #define __HIP_ARCH_HAS_GLOBAL_FLOAT_ATOMIC_EXCH__ (1)
    -
    74 #define __HIP_ARCH_HAS_SHARED_INT32_ATOMICS__ (1)
    -
    75 #define __HIP_ARCH_HAS_SHARED_FLOAT_ATOMIC_EXCH__ (1)
    -
    76 #define __HIP_ARCH_HAS_FLOAT_ATOMIC_ADD__ (0)
    -
    77 
    -
    78 // 64-bit Atomics:
    -
    79 #define __HIP_ARCH_HAS_GLOBAL_INT64_ATOMICS__ (1)
    -
    80 #define __HIP_ARCH_HAS_SHARED_INT64_ATOMICS__ (0)
    -
    81 
    -
    82 // Doubles
    -
    83 #define __HIP_ARCH_HAS_DOUBLES__ (1)
    -
    84 
    -
    85 //warp cross-lane operations:
    -
    86 #define __HIP_ARCH_HAS_WARP_VOTE__ (1)
    -
    87 #define __HIP_ARCH_HAS_WARP_BALLOT__ (1)
    -
    88 #define __HIP_ARCH_HAS_WARP_SHUFFLE__ (1)
    -
    89 #define __HIP_ARCH_HAS_WARP_FUNNEL_SHIFT__ (0)
    -
    90 
    -
    91 //sync
    -
    92 #define __HIP_ARCH_HAS_THREAD_FENCE_SYSTEM__ (0)
    -
    93 #define __HIP_ARCH_HAS_SYNC_THREAD_EXT__ (0)
    -
    94 
    -
    95 // misc
    -
    96 #define __HIP_ARCH_HAS_SURFACE_FUNCS__ (0)
    -
    97 #define __HIP_ARCH_HAS_3DGRID__ (1)
    -
    98 #define __HIP_ARCH_HAS_DYNAMIC_PARALLEL__ (0)
    +
    48 #include <grid_launch.h>
    +
    49 extern int HIP_TRACE_API;
    +
    50 
    +
    51 //TODO-HCC-GL - change this to typedef.
    +
    52 //typedef grid_launch_parm hipLaunchParm ;
    +
    53 #define hipLaunchParm grid_launch_parm
    +
    54 #ifdef __cplusplus
    +
    55 #include <hip/hcc_detail/hip_texture.h>
    +
    56 #include <hip/hcc_detail/hip_ldg.h>
    +
    57 #endif
    +
    58 #include <hip/hcc_detail/host_defines.h>
    +
    59 // TODO-HCC remove old definitions ; ~1602 hcc supports __HCC_ACCELERATOR__ define.
    +
    60 #if defined (__KALMAR_ACCELERATOR__) && !defined (__HCC_ACCELERATOR__)
    +
    61 #define __HCC_ACCELERATOR__ __KALMAR_ACCELERATOR__
    +
    62 #endif
    +
    63 
    +
    64 // Feature tests:
    +
    65 #if defined(__HCC_ACCELERATOR__) && (__HCC_ACCELERATOR__ != 0)
    +
    66 // Device compile and not host compile:
    +
    67 
    +
    68 //TODO-HCC enable __HIP_ARCH_HAS_ATOMICS__ when HCC supports these.
    +
    69  // 32-bit Atomics:
    +
    70 #define __HIP_ARCH_HAS_GLOBAL_INT32_ATOMICS__ (1)
    +
    71 #define __HIP_ARCH_HAS_GLOBAL_FLOAT_ATOMIC_EXCH__ (1)
    +
    72 #define __HIP_ARCH_HAS_SHARED_INT32_ATOMICS__ (1)
    +
    73 #define __HIP_ARCH_HAS_SHARED_FLOAT_ATOMIC_EXCH__ (1)
    +
    74 #define __HIP_ARCH_HAS_FLOAT_ATOMIC_ADD__ (0)
    +
    75 
    +
    76 // 64-bit Atomics:
    +
    77 #define __HIP_ARCH_HAS_GLOBAL_INT64_ATOMICS__ (1)
    +
    78 #define __HIP_ARCH_HAS_SHARED_INT64_ATOMICS__ (0)
    +
    79 
    +
    80 // Doubles
    +
    81 #define __HIP_ARCH_HAS_DOUBLES__ (1)
    +
    82 
    +
    83 //warp cross-lane operations:
    +
    84 #define __HIP_ARCH_HAS_WARP_VOTE__ (1)
    +
    85 #define __HIP_ARCH_HAS_WARP_BALLOT__ (1)
    +
    86 #define __HIP_ARCH_HAS_WARP_SHUFFLE__ (1)
    +
    87 #define __HIP_ARCH_HAS_WARP_FUNNEL_SHIFT__ (0)
    +
    88 
    +
    89 //sync
    +
    90 #define __HIP_ARCH_HAS_THREAD_FENCE_SYSTEM__ (0)
    +
    91 #define __HIP_ARCH_HAS_SYNC_THREAD_EXT__ (0)
    +
    92 
    +
    93 // misc
    +
    94 #define __HIP_ARCH_HAS_SURFACE_FUNCS__ (0)
    +
    95 #define __HIP_ARCH_HAS_3DGRID__ (1)
    +
    96 #define __HIP_ARCH_HAS_DYNAMIC_PARALLEL__ (0)
    +
    97 
    +
    98 #endif /* Device feature flags */
    99 
    -
    100 #endif
    -
    101 
    -
    102 
    -
    103 //TODO-HCC this is currently ignored by HCC target of HIP
    -
    104 #define __launch_bounds__(requiredMaxThreadsPerBlock, minBlocksPerMultiprocessor)
    -
    105 
    -
    106 // Detect if we are compiling C++ mode or C mode
    -
    107 #if defined(__cplusplus)
    -
    108 #define __HCC_CPP__
    -
    109 #elif defined(__STDC_VERSION__)
    -
    110 #define __HCC_C__
    -
    111 #endif
    -
    112 
    -
    113 #if __cplusplus
    -
    114 __device__ float acosf(float x);
    -
    115 __device__ float acoshf(float x);
    -
    116 __device__ float asinf(float x);
    -
    117 __device__ float asinhf(float x);
    -
    118 __device__ float atan2f(float y, float x);
    -
    119 __device__ float atanf(float x);
    -
    120 __device__ float atanhf(float x);
    -
    121 __device__ float cbrtf(float x);
    -
    122 __device__ float ceilf(float x);
    -
    123 __device__ float copysignf(float x, float y);
    -
    124 __device__ float cosf(float x);
    -
    125 __device__ float coshf(float x);
    -
    126 __device__ float cyl_bessel_i0f(float x);
    -
    127 __device__ float cyl_bessel_i1f(float x);
    -
    128 __device__ float erfcf(float x);
    -
    129 __device__ float erfcinvf(float y);
    -
    130 __device__ float erfcxf(float x);
    -
    131 __device__ float erff(float x);
    -
    132 __device__ float erfinvf(float y);
    -
    133 __device__ float exp10f(float x);
    -
    134 __device__ float exp2f(float x);
    -
    135 __device__ float expf(float x);
    -
    136 __device__ float expm1f(float x);
    -
    137 __device__ float fabsf(float x);
    -
    138 __device__ float fdimf(float x, float y);
    -
    139 __device__ float fdividef(float x, float y);
    -
    140 __device__ float floorf(float x);
    -
    141 __device__ float fmaf(float x, float y, float z);
    -
    142 __device__ float fmaxf(float x, float y);
    -
    143 __device__ float fminf(float x, float y);
    -
    144 __device__ float fmodf(float x, float y);
    -
    145 __device__ float frexpf(float x, float y);
    -
    146 __device__ float hypotf(float x, float y);
    -
    147 __device__ float ilogbf(float x);
    -
    148 __device__ unsigned isfinite(float a);
    -
    149 __device__ unsigned isinf(float a);
    -
    150 __device__ unsigned isnan(float a);
    -
    151 __device__ float j0f(float x);
    -
    152 __device__ float j1f(float x);
    -
    153 __device__ float jnf(int n, float x);
    -
    154 __device__ float ldexpf(float x, int exp);
    -
    155 __device__ float lgammaf(float x);
    -
    156 __device__ long long int llrintf(float x);
    -
    157 __device__ long long int llroundf(float x);
    -
    158 __device__ float log10f(float x);
    -
    159 __device__ float log1pf(float x);
    -
    160 __device__ float log2f(float x);
    -
    161 __device__ float logbf(float x);
    -
    162 __device__ float logf(float x);
    -
    163 __device__ long int lrintf(float x);
    -
    164 __device__ long int lroundf(float x);
    -
    165 __device__ float modff(float x, float *iptr);
    -
    166 __device__ float nanf(const char* tagp);
    -
    167 __device__ float nearbyintf(float x);
    -
    168 __device__ float nextafterf(float x, float y);
    -
    169 __device__ float norm3df(float a, float b, float c);
    -
    170 __device__ float norm4df(float a, float b, float c, float d);
    -
    171 __device__ float normcdff(float y);
    -
    172 __device__ float normcdfinvf(float y);
    -
    173 __device__ float normf(int dim, const float *a);
    -
    174 __device__ float powf(float x, float y);
    -
    175 __device__ float rcbtrf(float x);
    -
    176 __device__ float remainderf(float x, float y);
    -
    177 __device__ float remquof(float x, float y, int *quo);
    -
    178 __device__ float rhypotf(float x, float y);
    -
    179 __device__ float rintf(float x);
    -
    180 __device__ float rnorm3df(float a, float b, float c);
    -
    181 __device__ float rnorm4df(float a, float b, float c, float d);
    -
    182 __device__ float rnormf(int dim, const float* a);
    -
    183 __device__ float roundf(float x);
    -
    184 __device__ float rsqrtf(float x);
    -
    185 __device__ float scalblnf(float x, long int n);
    -
    186 __device__ float scalbnf(float x, int n);
    -
    187 __device__ unsigned signbit(float a);
    -
    188 __device__ void sincosf(float x, float *sptr, float *cptr);
    -
    189 __device__ void sincospif(float x, float *sptr, float *cptr);
    -
    190 __device__ float sinf(float x);
    -
    191 __device__ float sinhf(float x);
    -
    192 __device__ float sinpif(float x);
    -
    193 __device__ float sqrtf(float x);
    -
    194 __device__ float tanf(float x);
    -
    195 __device__ float tanhf(float x);
    -
    196 __device__ float tgammaf(float x);
    -
    197 __device__ float truncf(float x);
    -
    198 __device__ float y0f(float x);
    -
    199 __device__ float y1f(float x);
    -
    200 __device__ float ynf(int n, float x);
    -
    201 
    -
    202 __host__ __device__ float cospif(float x);
    -
    203 __host__ __device__ float sinpif(float x);
    -
    204 __device__ float sqrtf(float x);
    -
    205 __host__ __device__ float rsqrtf(float x);
    -
    206 
    -
    207 __device__ double acos(double x);
    -
    208 __device__ double acosh(double x);
    -
    209 __device__ double asin(double x);
    -
    210 __device__ double asinh(double x);
    -
    211 __device__ double atan(double x);
    -
    212 __device__ double atan2(double y, double x);
    -
    213 __device__ double atanh(double x);
    -
    214 __device__ double cbrt(double x);
    -
    215 __device__ double ceil(double x);
    -
    216 __device__ double copysign(double x, double y);
    -
    217 __device__ double cos(double x);
    -
    218 __device__ double cosh(double x);
    -
    219 __host__ __device__ double cospi(double x);
    -
    220 __device__ double cyl_bessel_i0(double x);
    -
    221 __device__ double cyl_bessel_i1(double x);
    -
    222 __device__ double erf(double x);
    -
    223 __device__ double erfc(double x);
    -
    224 __device__ double erfcinv(double y);
    -
    225 __device__ double erfcx(double x);
    -
    226 __device__ double exp(double x);
    -
    227 __device__ double exp10(double x);
    -
    228 __device__ double exp2(double x);
    -
    229 __device__ double expm1(double x);
    -
    230 __device__ double fabs(double x);
    -
    231 __device__ double fdim(double x, double y);
    -
    232 __device__ double floor(double x);
    -
    233 __device__ double fma(double x, double y, double z);
    -
    234 __device__ double fmax(double x, double y);
    -
    235 __device__ double fmin(double x, double y);
    -
    236 __device__ double fmod(double x, double y);
    -
    237 __device__ double frexp(double x, int *nptr);
    -
    238 __device__ double hypot(double x, double y);
    -
    239 __device__ double ilogb(double x);
    -
    240 __device__ unsigned isfinite(double x);
    -
    241 __device__ unsigned isinf(double x);
    -
    242 __device__ unsigned isnan(double x);
    -
    243 __device__ double j0(double x);
    -
    244 __device__ double j1(double x);
    -
    245 __device__ double jn(int n, double x);
    -
    246 __device__ double ldexp(double x, int exp);
    -
    247 __device__ double lgamma(double x);
    -
    248 __device__ long long llrint(double x);
    -
    249 __device__ long llround(double x);
    -
    250 __device__ double log(double x);
    -
    251 __device__ double log10(double x);
    -
    252 __device__ double log1p(double x);
    -
    253 __device__ double log2(double x);
    -
    254 __device__ double logb(double x);
    -
    255 __device__ long int lrint(double x);
    -
    256 __device__ long int lround(double x);
    -
    257 __device__ double modf(double x, double *iptr);
    -
    258 __device__ double nan(const char* tagp);
    -
    259 __device__ double nearbyint(double x);
    -
    260 __device__ double nextafter(double x, double y);
    -
    261 __device__ double norm(int dim, const double* t);
    -
    262 __device__ double norm3d(double a, double b, double c);
    -
    263 __device__ double norm4d(double a, double b, double d);
    -
    264 __device__ double normcdf(double y);
    -
    265 __device__ double normcdfinv(double y);
    -
    266 __device__ double pow(double x, double y);
    -
    267 __device__ double rcbrt(double x);
    -
    268 __device__ double remainder(double x, double y);
    -
    269 __device__ double remquo(double x, double y, int *quo);
    -
    270 __device__ double rhypot(double x, double y);
    -
    271 __device__ double rint(double x);
    -
    272 __device__ double rnorm(int dim, const double* t);
    -
    273 __device__ double rnorm3d(double a, double b, double c);
    -
    274 __device__ double rnorm4d(double a, double b, double c, double d);
    -
    275 __device__ double round(double x);
    -
    276 __host__ __device__ double rsqrt(double x);
    -
    277 __device__ double scalbln(double x, long int n);
    -
    278 __device__ double scalbn(double x, int n);
    -
    279 __device__ unsigned signbit(double a);
    -
    280 __device__ double sin(double a);
    -
    281 __device__ double sincos(double x, double *sptr, double *cptr);
    -
    282 __device__ double sincospi(double x, double *sptr, double *cptr);
    -
    283 __device__ double sinh(double x);
    -
    284 __host__ __device__ double sinpi(double x);
    -
    285 __device__ double sqrt(double x);
    -
    286 __device__ double tan(double x);
    -
    287 __device__ double tanh(double x);
    -
    288 __device__ double tgamma(double x);
    -
    289 __device__ double trunc(double x);
    -
    290 __device__ double y0(double x);
    -
    291 __device__ double y1(double y);
    -
    292 __device__ double yn(int n, double x);
    -
    293 #endif
    -
    294 
    -
    295 // TODO - hipify-clang - change to use the function call.
    -
    296 //#define warpSize hc::__wavesize()
    -
    297 extern const int warpSize;
    -
    298 
    +
    100 
    +
    101 //TODO-HCC this is currently ignored by HCC target of HIP
    +
    102 #define __launch_bounds__(requiredMaxThreadsPerBlock, minBlocksPerMultiprocessor)
    +
    103 
    +
    104 // Detect if we are compiling C++ mode or C mode
    +
    105 #if defined(__cplusplus)
    +
    106 #define __HCC_CPP__
    +
    107 #elif defined(__STDC_VERSION__)
    +
    108 #define __HCC_C__
    +
    109 #endif
    +
    110 
    +
    111 __device__ float acosf(float x);
    +
    112 __device__ float acoshf(float x);
    +
    113 __device__ float asinf(float x);
    +
    114 __device__ float asinhf(float x);
    +
    115 __device__ float atan2f(float y, float x);
    +
    116 __device__ float atanf(float x);
    +
    117 __device__ float atanhf(float x);
    +
    118 __device__ float cbrtf(float x);
    +
    119 __device__ float ceilf(float x);
    +
    120 __device__ float copysignf(float x, float y);
    +
    121 __device__ float cosf(float x);
    +
    122 __device__ float coshf(float x);
    +
    123 __device__ float cyl_bessel_i0f(float x);
    +
    124 __device__ float cyl_bessel_i1f(float x);
    +
    125 __device__ float erfcf(float x);
    +
    126 __device__ float erfcinvf(float y);
    +
    127 __device__ float erfcxf(float x);
    +
    128 __device__ float erff(float x);
    +
    129 __device__ float erfinvf(float y);
    +
    130 __device__ float exp10f(float x);
    +
    131 __device__ float exp2f(float x);
    +
    132 __device__ float expf(float x);
    +
    133 __device__ float expm1f(float x);
    +
    134 __device__ float fabsf(float x);
    +
    135 __device__ float fdimf(float x, float y);
    +
    136 __device__ float fdividef(float x, float y);
    +
    137 __device__ float floorf(float x);
    +
    138 __device__ float fmaf(float x, float y, float z);
    +
    139 __device__ float fmaxf(float x, float y);
    +
    140 __device__ float fminf(float x, float y);
    +
    141 __device__ float fmodf(float x, float y);
    +
    142 __device__ float frexpf(float x, float y);
    +
    143 __device__ float hypotf(float x, float y);
    +
    144 __device__ float ilogbf(float x);
    +
    145 __host__ __device__ unsigned isfinite(float a);
    +
    146 __device__ unsigned isinf(float a);
    +
    147 __device__ unsigned isnan(float a);
    +
    148 __device__ float j0f(float x);
    +
    149 __device__ float j1f(float x);
    +
    150 __device__ float jnf(int n, float x);
    +
    151 __device__ float ldexpf(float x, int exp);
    +
    152 __device__ float lgammaf(float x);
    +
    153 __device__ long long int llrintf(float x);
    +
    154 __device__ long long int llroundf(float x);
    +
    155 __device__ float log10f(float x);
    +
    156 __device__ float log1pf(float x);
    +
    157 __device__ float log2f(float x);
    +
    158 __device__ float logbf(float x);
    +
    159 __device__ float logf(float x);
    +
    160 __device__ long int lrintf(float x);
    +
    161 __device__ long int lroundf(float x);
    +
    162 __device__ float modff(float x, float *iptr);
    +
    163 __device__ float nanf(const char* tagp);
    +
    164 __device__ float nearbyintf(float x);
    +
    165 __device__ float nextafterf(float x, float y);
    +
    166 __device__ float norm3df(float a, float b, float c);
    +
    167 __device__ float norm4df(float a, float b, float c, float d);
    +
    168 __device__ float normcdff(float y);
    +
    169 __device__ float normcdfinvf(float y);
    +
    170 __device__ float normf(int dim, const float *a);
    +
    171 __device__ float powf(float x, float y);
    +
    172 __device__ float rcbtrf(float x);
    +
    173 __device__ float remainderf(float x, float y);
    +
    174 __device__ float remquof(float x, float y, int *quo);
    +
    175 __device__ float rhypotf(float x, float y);
    +
    176 __device__ float rintf(float x);
    +
    177 __device__ float rnorm3df(float a, float b, float c);
    +
    178 __device__ float rnorm4df(float a, float b, float c, float d);
    +
    179 __device__ float rnormf(int dim, const float* a);
    +
    180 __device__ float roundf(float x);
    +
    181 __device__ float rsqrtf(float x);
    +
    182 __device__ float scalblnf(float x, long int n);
    +
    183 __device__ float scalbnf(float x, int n);
    +
    184 __host__ __device__ unsigned signbit(float a);
    +
    185 __device__ void sincosf(float x, float *sptr, float *cptr);
    +
    186 __device__ void sincospif(float x, float *sptr, float *cptr);
    +
    187 __device__ float sinf(float x);
    +
    188 __device__ float sinhf(float x);
    +
    189 __device__ float sinpif(float x);
    +
    190 __device__ float sqrtf(float x);
    +
    191 __device__ float tanf(float x);
    +
    192 __device__ float tanhf(float x);
    +
    193 __device__ float tgammaf(float x);
    +
    194 __device__ float truncf(float x);
    +
    195 __device__ float y0f(float x);
    +
    196 __device__ float y1f(float x);
    +
    197 __device__ float ynf(int n, float x);
    +
    198 
    +
    199 __host__ __device__ float cospif(float x);
    +
    200 __host__ __device__ float sinpif(float x);
    +
    201 __device__ float sqrtf(float x);
    +
    202 __host__ __device__ float rsqrtf(float x);
    +
    203 
    +
    204 __device__ double acos(double x);
    +
    205 __device__ double acosh(double x);
    +
    206 __device__ double asin(double x);
    +
    207 __device__ double asinh(double x);
    +
    208 __device__ double atan(double x);
    +
    209 __device__ double atan2(double y, double x);
    +
    210 __device__ double atanh(double x);
    +
    211 __device__ double cbrt(double x);
    +
    212 __device__ double ceil(double x);
    +
    213 __device__ double copysign(double x, double y);
    +
    214 __device__ double cos(double x);
    +
    215 __device__ double cosh(double x);
    +
    216 __host__ __device__ double cospi(double x);
    +
    217 __device__ double cyl_bessel_i0(double x);
    +
    218 __device__ double cyl_bessel_i1(double x);
    +
    219 __device__ double erf(double x);
    +
    220 __device__ double erfc(double x);
    +
    221 __device__ double erfcinv(double y);
    +
    222 __device__ double erfcx(double x);
    +
    223 __device__ double exp(double x);
    +
    224 __device__ double exp10(double x);
    +
    225 __device__ double exp2(double x);
    +
    226 __device__ double expm1(double x);
    +
    227 __device__ double fabs(double x);
    +
    228 __device__ double fdim(double x, double y);
    +
    229 __device__ double floor(double x);
    +
    230 __device__ double fma(double x, double y, double z);
    +
    231 __device__ double fmax(double x, double y);
    +
    232 __device__ double fmin(double x, double y);
    +
    233 __device__ double fmod(double x, double y);
    +
    234 __device__ double frexp(double x, int *nptr);
    +
    235 __device__ double hypot(double x, double y);
    +
    236 __device__ double ilogb(double x);
    +
    237 __host__ __device__ unsigned isfinite(double x);
    +
    238 __device__ unsigned isinf(double x);
    +
    239 __device__ unsigned isnan(double x);
    +
    240 __device__ double j0(double x);
    +
    241 __device__ double j1(double x);
    +
    242 __device__ double jn(int n, double x);
    +
    243 __device__ double ldexp(double x, int exp);
    +
    244 __device__ double lgamma(double x);
    +
    245 __device__ long long llrint(double x);
    +
    246 __device__ long llround(double x);
    +
    247 __device__ double log(double x);
    +
    248 __device__ double log10(double x);
    +
    249 __device__ double log1p(double x);
    +
    250 __device__ double log2(double x);
    +
    251 __device__ double logb(double x);
    +
    252 __device__ long int lrint(double x);
    +
    253 __device__ long int lround(double x);
    +
    254 __device__ double modf(double x, double *iptr);
    +
    255 __device__ double nan(const char* tagp);
    +
    256 __device__ double nearbyint(double x);
    +
    257 __device__ double nextafter(double x, double y);
    +
    258 __device__ double norm(int dim, const double* t);
    +
    259 __device__ double norm3d(double a, double b, double c);
    +
    260 __device__ double norm4d(double a, double b, double d);
    +
    261 __device__ double normcdf(double y);
    +
    262 __device__ double normcdfinv(double y);
    +
    263 __device__ double pow(double x, double y);
    +
    264 __device__ double rcbrt(double x);
    +
    265 __device__ double remainder(double x, double y);
    +
    266 __device__ double remquo(double x, double y, int *quo);
    +
    267 __device__ double rhypot(double x, double y);
    +
    268 __device__ double rint(double x);
    +
    269 __device__ double rnorm(int dim, const double* t);
    +
    270 __device__ double rnorm3d(double a, double b, double c);
    +
    271 __device__ double rnorm4d(double a, double b, double c, double d);
    +
    272 __device__ double round(double x);
    +
    273 __host__ __device__ double rsqrt(double x);
    +
    274 __device__ double scalbln(double x, long int n);
    +
    275 __device__ double scalbn(double x, int n);
    +
    276 __host__ __device__ unsigned signbit(double a);
    +
    277 __device__ double sin(double a);
    +
    278 __device__ double sincos(double x, double *sptr, double *cptr);
    +
    279 __device__ double sincospi(double x, double *sptr, double *cptr);
    +
    280 __device__ double sinh(double x);
    +
    281 __host__ __device__ double sinpi(double x);
    +
    282 __device__ double sqrt(double x);
    +
    283 __device__ double tan(double x);
    +
    284 __device__ double tanh(double x);
    +
    285 __device__ double tgamma(double x);
    +
    286 __device__ double trunc(double x);
    +
    287 __device__ double y0(double x);
    +
    288 __device__ double y1(double y);
    +
    289 __device__ double yn(int n, double x);
    +
    290 
    +
    291 // TODO - hipify-clang - change to use the function call.
    +
    292 //#define warpSize hc::__wavesize()
    +
    293 extern const int warpSize;
    +
    294 
    +
    295 
    +
    296 #define clock_t long long int
    +
    297 __device__ long long int clock64();
    +
    298 __device__ clock_t clock();
    299 
    -
    300 #define clock_t long long int
    -
    301 __device__ long long int clock64();
    -
    302 __device__ clock_t clock();
    -
    303 #if __cplusplus
    -
    304 //atomicAdd()
    -
    305 __device__ int atomicAdd(int* address, int val);
    -
    306 __device__ unsigned int atomicAdd(unsigned int* address,
    -
    307  unsigned int val);
    -
    308 
    -
    309 __device__ unsigned long long int atomicAdd(unsigned long long int* address,
    -
    310  unsigned long long int val);
    -
    311 
    -
    312 __device__ float atomicAdd(float* address, float val);
    +
    300 //atomicAdd()
    +
    301 __device__ int atomicAdd(int* address, int val);
    +
    302 __device__ unsigned int atomicAdd(unsigned int* address,
    +
    303  unsigned int val);
    +
    304 
    +
    305 __device__ unsigned long long int atomicAdd(unsigned long long int* address,
    +
    306  unsigned long long int val);
    +
    307 
    +
    308 __device__ float atomicAdd(float* address, float val);
    +
    309 
    +
    310 
    +
    311 //atomicSub()
    +
    312 __device__ int atomicSub(int* address, int val);
    313 
    -
    314 
    -
    315 //atomicSub()
    -
    316 __device__ int atomicSub(int* address, int val);
    +
    314 __device__ unsigned int atomicSub(unsigned int* address,
    +
    315  unsigned int val);
    +
    316 
    317 
    -
    318 __device__ unsigned int atomicSub(unsigned int* address,
    -
    319  unsigned int val);
    +
    318 //atomicExch()
    +
    319 __device__ int atomicExch(int* address, int val);
    320 
    -
    321 
    -
    322 //atomicExch()
    -
    323 __device__ int atomicExch(int* address, int val);
    -
    324 
    -
    325 __device__ unsigned int atomicExch(unsigned int* address,
    -
    326  unsigned int val);
    -
    327 
    -
    328 __device__ unsigned long long int atomicExch(unsigned long long int* address,
    -
    329  unsigned long long int val);
    -
    330 
    -
    331 __device__ float atomicExch(float* address, float val);
    -
    332 
    -
    333 
    -
    334 //atomicMin()
    -
    335 __device__ int atomicMin(int* address, int val);
    -
    336 __device__ unsigned int atomicMin(unsigned int* address,
    -
    337  unsigned int val);
    -
    338 __device__ unsigned long long int atomicMin(unsigned long long int* address,
    -
    339  unsigned long long int val);
    -
    340 
    -
    341 
    -
    342 //atomicMax()
    -
    343 __device__ int atomicMax(int* address, int val);
    -
    344 __device__ unsigned int atomicMax(unsigned int* address,
    -
    345  unsigned int val);
    -
    346 __device__ unsigned long long int atomicMax(unsigned long long int* address,
    -
    347  unsigned long long int val);
    -
    348 
    -
    349 
    -
    350 //atomicCAS()
    -
    351 __device__ int atomicCAS(int* address, int compare, int val);
    -
    352 __device__ unsigned int atomicCAS(unsigned int* address,
    -
    353  unsigned int compare,
    -
    354  unsigned int val);
    -
    355 __device__ unsigned long long int atomicCAS(unsigned long long int* address,
    -
    356  unsigned long long int compare,
    -
    357  unsigned long long int val);
    -
    358 
    -
    359 
    -
    360 //atomicAnd()
    -
    361 __device__ int atomicAnd(int* address, int val);
    -
    362 __device__ unsigned int atomicAnd(unsigned int* address,
    -
    363  unsigned int val);
    -
    364 __device__ unsigned long long int atomicAnd(unsigned long long int* address,
    -
    365  unsigned long long int val);
    -
    366 
    -
    367 
    -
    368 //atomicOr()
    -
    369 __device__ int atomicOr(int* address, int val);
    -
    370 __device__ unsigned int atomicOr(unsigned int* address,
    -
    371  unsigned int val);
    -
    372 __device__ unsigned long long int atomicOr(unsigned long long int* address,
    -
    373  unsigned long long int val);
    -
    374 
    -
    375 
    -
    376 //atomicXor()
    -
    377 __device__ int atomicXor(int* address, int val);
    -
    378 __device__ unsigned int atomicXor(unsigned int* address,
    -
    379  unsigned int val);
    -
    380 __device__ unsigned long long int atomicXor(unsigned long long int* address,
    -
    381  unsigned long long int val);
    +
    321 __device__ unsigned int atomicExch(unsigned int* address,
    +
    322  unsigned int val);
    +
    323 
    +
    324 __device__ unsigned long long int atomicExch(unsigned long long int* address,
    +
    325  unsigned long long int val);
    +
    326 
    +
    327 __device__ float atomicExch(float* address, float val);
    +
    328 
    +
    329 
    +
    330 //atomicMin()
    +
    331 __device__ int atomicMin(int* address, int val);
    +
    332 __device__ unsigned int atomicMin(unsigned int* address,
    +
    333  unsigned int val);
    +
    334 __device__ unsigned long long int atomicMin(unsigned long long int* address,
    +
    335  unsigned long long int val);
    +
    336 
    +
    337 
    +
    338 //atomicMax()
    +
    339 __device__ int atomicMax(int* address, int val);
    +
    340 __device__ unsigned int atomicMax(unsigned int* address,
    +
    341  unsigned int val);
    +
    342 __device__ unsigned long long int atomicMax(unsigned long long int* address,
    +
    343  unsigned long long int val);
    +
    344 
    +
    345 
    +
    346 //atomicCAS()
    +
    347 __device__ int atomicCAS(int* address, int compare, int val);
    +
    348 __device__ unsigned int atomicCAS(unsigned int* address,
    +
    349  unsigned int compare,
    +
    350  unsigned int val);
    +
    351 __device__ unsigned long long int atomicCAS(unsigned long long int* address,
    +
    352  unsigned long long int compare,
    +
    353  unsigned long long int val);
    +
    354 
    +
    355 
    +
    356 //atomicAnd()
    +
    357 __device__ int atomicAnd(int* address, int val);
    +
    358 __device__ unsigned int atomicAnd(unsigned int* address,
    +
    359  unsigned int val);
    +
    360 __device__ unsigned long long int atomicAnd(unsigned long long int* address,
    +
    361  unsigned long long int val);
    +
    362 
    +
    363 
    +
    364 //atomicOr()
    +
    365 __device__ int atomicOr(int* address, int val);
    +
    366 __device__ unsigned int atomicOr(unsigned int* address,
    +
    367  unsigned int val);
    +
    368 __device__ unsigned long long int atomicOr(unsigned long long int* address,
    +
    369  unsigned long long int val);
    +
    370 
    +
    371 
    +
    372 //atomicXor()
    +
    373 __device__ int atomicXor(int* address, int val);
    +
    374 __device__ unsigned int atomicXor(unsigned int* address,
    +
    375  unsigned int val);
    +
    376 __device__ unsigned long long int atomicXor(unsigned long long int* address,
    +
    377  unsigned long long int val);
    +
    378 
    +
    379 //atomicInc()
    +
    380 __device__ unsigned int atomicInc(unsigned int* address,
    +
    381  unsigned int val);
    382 
    383 
    -
    384 // integer intrinsic function __poc __clz __ffs __brev
    -
    385 __device__ unsigned int __popc( unsigned int input);
    -
    386 __device__ unsigned int __popcll( unsigned long long int input);
    -
    387 __device__ unsigned int __clz(unsigned int input);
    -
    388 __device__ unsigned int __clzll(unsigned long long int input);
    -
    389 __device__ unsigned int __clz(int input);
    -
    390 __device__ unsigned int __clzll(long long int input);
    -
    391 __device__ unsigned int __ffs(unsigned int input);
    -
    392 __device__ unsigned int __ffsll(unsigned long long int input);
    -
    393 __device__ unsigned int __ffs(int input);
    -
    394 __device__ unsigned int __ffsll(long long int input);
    -
    395 __device__ unsigned int __brev( unsigned int input);
    -
    396 __device__ unsigned long long int __brevll( unsigned long long int input);
    -
    397 
    -
    398 
    -
    399 // warp vote function __all __any __ballot
    -
    400 __device__ int __all( int input);
    -
    401 __device__ int __any( int input);
    -
    402 __device__ unsigned long long int __ballot( int input);
    +
    384 //atomicDec()
    +
    385 __device__ unsigned int atomicDec(unsigned int* address,
    +
    386  unsigned int val);
    +
    387 
    +
    388 
    +
    389 // integer intrinsic function __poc __clz __ffs __brev
    +
    390 __device__ unsigned int __popc( unsigned int input);
    +
    391 __device__ unsigned int __popcll( unsigned long long int input);
    +
    392 __device__ unsigned int __clz(unsigned int input);
    +
    393 __device__ unsigned int __clzll(unsigned long long int input);
    +
    394 __device__ unsigned int __clz(int input);
    +
    395 __device__ unsigned int __clzll(long long int input);
    +
    396 __device__ unsigned int __ffs(unsigned int input);
    +
    397 __device__ unsigned int __ffsll(unsigned long long int input);
    +
    398 __device__ unsigned int __ffs(int input);
    +
    399 __device__ unsigned int __ffsll(long long int input);
    +
    400 __device__ unsigned int __brev( unsigned int input);
    +
    401 __device__ unsigned long long int __brevll( unsigned long long int input);
    +
    402 
    403 
    -
    404 // warp shuffle functions
    -
    405 #ifdef __cplusplus
    -
    406 
    -
    407 __device__ int __shfl(int input, int lane, int width=warpSize);
    -
    408 __device__ int __shfl_up(int input, unsigned int lane_delta, int width=warpSize);
    -
    409 __device__ int __shfl_down(int input, unsigned int lane_delta, int width=warpSize);
    -
    410 __device__ int __shfl_xor(int input, int lane_mask, int width=warpSize);
    -
    411 __device__ float __shfl(float input, int lane, int width=warpSize);
    -
    412 __device__ float __shfl_up(float input, unsigned int lane_delta, int width=warpSize);
    -
    413 __device__ float __shfl_down(float input, unsigned int lane_delta, int width=warpSize);
    -
    414 __device__ float __shfl_xor(float input, int lane_mask, int width=warpSize);
    -
    415 #else
    -
    416 __device__ int __shfl(int input, int lane, int width);
    -
    417 __device__ int __shfl_up(int input, unsigned int lane_delta, int width);
    -
    418 __device__ int __shfl_down(int input, unsigned int lane_delta, int width);
    -
    419 __device__ int __shfl_xor(int input, int lane_mask, int width);
    -
    420 __device__ float __shfl(float input, int lane, int width);
    -
    421 __device__ float __shfl_up(float input, unsigned int lane_delta, int width);
    -
    422 __device__ float __shfl_down(float input, unsigned int lane_delta, int width);
    -
    423 __device__ float __shfl_xor(float input, int lane_mask, int width);
    -
    424 #endif
    -
    425 
    -
    426 __host__ __device__ int min(int arg1, int arg2);
    -
    427 __host__ __device__ int max(int arg1, int arg2);
    -
    428 
    -
    429 //TODO - add a couple fast math operations here, the set here will grow :
    -
    430 __device__ float __cosf(float x);
    -
    431 __device__ float __expf(float x);
    -
    432 __device__ float __frsqrt_rn(float x);
    -
    433 __device__ float __fsqrt_rd(float x);
    -
    434 __device__ float __fsqrt_rn(float x);
    -
    435 __device__ float __fsqrt_ru(float x);
    -
    436 __device__ float __fsqrt_rz(float x);
    -
    437 __device__ float __log10f(float x);
    -
    438 __device__ float __log2f(float x);
    -
    439 __device__ float __logf(float x);
    -
    440 __device__ float __powf(float base, float exponent);
    -
    441 __device__ void __sincosf(float x, float *s, float *c) ;
    -
    442 __device__ float __sinf(float x);
    -
    443 __device__ float __tanf(float x);
    -
    444 __device__ float __dsqrt_rd(double x);
    -
    445 __device__ float __dsqrt_rn(double x);
    -
    446 __device__ float __dsqrt_ru(double x);
    -
    447 __device__ float __dsqrt_rz(double x);
    -
    448 #endif
    -
    449 
    -
    453 #if __hcc_workweek__ >= 16123
    -
    454 
    -
    455 #define hipThreadIdx_x (amp_get_local_id(0))
    -
    456 #define hipThreadIdx_y (amp_get_local_id(1))
    -
    457 #define hipThreadIdx_z (amp_get_local_id(2))
    +
    404 // warp vote function __all __any __ballot
    +
    405 __device__ int __all( int input);
    +
    406 __device__ int __any( int input);
    +
    407 __device__ unsigned long long int __ballot( int input);
    +
    408 
    +
    409 // warp shuffle functions
    +
    410 #ifdef __cplusplus
    +
    411 __device__ int __shfl(int input, int lane, int width=warpSize);
    +
    412 __device__ int __shfl_up(int input, unsigned int lane_delta, int width=warpSize);
    +
    413 __device__ int __shfl_down(int input, unsigned int lane_delta, int width=warpSize);
    +
    414 __device__ int __shfl_xor(int input, int lane_mask, int width=warpSize);
    +
    415 __device__ float __shfl(float input, int lane, int width=warpSize);
    +
    416 __device__ float __shfl_up(float input, unsigned int lane_delta, int width=warpSize);
    +
    417 __device__ float __shfl_down(float input, unsigned int lane_delta, int width=warpSize);
    +
    418 __device__ float __shfl_xor(float input, int lane_mask, int width=warpSize);
    +
    419 #else
    +
    420 __device__ int __shfl(int input, int lane, int width);
    +
    421 __device__ int __shfl_up(int input, unsigned int lane_delta, int width);
    +
    422 __device__ int __shfl_down(int input, unsigned int lane_delta, int width);
    +
    423 __device__ int __shfl_xor(int input, int lane_mask, int width);
    +
    424 __device__ float __shfl(float input, int lane, int width);
    +
    425 __device__ float __shfl_up(float input, unsigned int lane_delta, int width);
    +
    426 __device__ float __shfl_down(float input, unsigned int lane_delta, int width);
    +
    427 __device__ float __shfl_xor(float input, int lane_mask, int width);
    +
    428 #endif
    +
    429 
    +
    430 __host__ __device__ int min(int arg1, int arg2);
    +
    431 __host__ __device__ int max(int arg1, int arg2);
    +
    432 
    +
    433 //TODO - add a couple fast math operations here, the set here will grow :
    +
    434 __device__ float __cosf(float x);
    +
    435 __device__ float __expf(float x);
    +
    436 __device__ float __frsqrt_rn(float x);
    +
    437 __device__ float __fsqrt_rd(float x);
    +
    438 __device__ float __fsqrt_rn(float x);
    +
    439 __device__ float __fsqrt_ru(float x);
    +
    440 __device__ float __fsqrt_rz(float x);
    +
    441 __device__ float __log10f(float x);
    +
    442 __device__ float __log2f(float x);
    +
    443 __device__ float __logf(float x);
    +
    444 __device__ float __powf(float base, float exponent);
    +
    445 __device__ void __sincosf(float x, float *s, float *c) ;
    +
    446 __device__ float __sinf(float x);
    +
    447 __device__ float __tanf(float x);
    +
    448 __device__ float __dsqrt_rd(double x);
    +
    449 __device__ float __dsqrt_rn(double x);
    +
    450 __device__ float __dsqrt_ru(double x);
    +
    451 __device__ float __dsqrt_rz(double x);
    +
    456 // Choose correct polarity of xyz/zyx ordering:
    +
    457 #if __hcc_workweek__ >= 16123
    458 
    -
    459 #define hipBlockIdx_x (hc_get_group_id(0))
    -
    460 #define hipBlockIdx_y (hc_get_group_id(1))
    -
    461 #define hipBlockIdx_z (hc_get_group_id(2))
    +
    459 #define hipThreadIdx_x (amp_get_local_id(0))
    +
    460 #define hipThreadIdx_y (amp_get_local_id(1))
    +
    461 #define hipThreadIdx_z (amp_get_local_id(2))
    462 
    -
    463 #define hipBlockDim_x (amp_get_local_size(0))
    -
    464 #define hipBlockDim_y (amp_get_local_size(1))
    -
    465 #define hipBlockDim_z (amp_get_local_size(2))
    +
    463 #define hipBlockIdx_x (hc_get_group_id(0))
    +
    464 #define hipBlockIdx_y (hc_get_group_id(1))
    +
    465 #define hipBlockIdx_z (hc_get_group_id(2))
    466 
    -
    467 #define hipGridDim_x (hc_get_num_groups(0))
    -
    468 #define hipGridDim_y (hc_get_num_groups(1))
    -
    469 #define hipGridDim_z (hc_get_num_groups(2))
    +
    467 #define hipBlockDim_x (amp_get_local_size(0))
    +
    468 #define hipBlockDim_y (amp_get_local_size(1))
    +
    469 #define hipBlockDim_z (amp_get_local_size(2))
    470 
    -
    471 #else
    -
    472 
    -
    473 #define hipThreadIdx_x (amp_get_local_id(2))
    -
    474 #define hipThreadIdx_y (amp_get_local_id(1))
    -
    475 #define hipThreadIdx_z (amp_get_local_id(0))
    +
    471 #define hipGridDim_x (hc_get_num_groups(0))
    +
    472 #define hipGridDim_y (hc_get_num_groups(1))
    +
    473 #define hipGridDim_z (hc_get_num_groups(2))
    +
    474 
    +
    475 #else
    476 
    -
    477 #define hipBlockIdx_x (hc_get_group_id(2))
    -
    478 #define hipBlockIdx_y (hc_get_group_id(1))
    -
    479 #define hipBlockIdx_z (hc_get_group_id(0))
    +
    477 #define hipThreadIdx_x (amp_get_local_id(2))
    +
    478 #define hipThreadIdx_y (amp_get_local_id(1))
    +
    479 #define hipThreadIdx_z (amp_get_local_id(0))
    480 
    -
    481 #define hipBlockDim_x (amp_get_local_size(2))
    -
    482 #define hipBlockDim_y (amp_get_local_size(1))
    -
    483 #define hipBlockDim_z (amp_get_local_size(0))
    +
    481 #define hipBlockIdx_x (hc_get_group_id(2))
    +
    482 #define hipBlockIdx_y (hc_get_group_id(1))
    +
    483 #define hipBlockIdx_z (hc_get_group_id(0))
    484 
    -
    485 #define hipGridDim_x (hc_get_num_groups(2))
    -
    486 #define hipGridDim_y (hc_get_num_groups(1))
    -
    487 #define hipGridDim_z (hc_get_num_groups(0))
    +
    485 #define hipBlockDim_x (amp_get_local_size(2))
    +
    486 #define hipBlockDim_y (amp_get_local_size(1))
    +
    487 #define hipBlockDim_z (amp_get_local_size(0))
    488 
    -
    489 #endif
    -
    490 
    -
    491 #define __syncthreads() hc_barrier(CLK_LOCAL_MEM_FENCE)
    +
    489 #define hipGridDim_x (hc_get_num_groups(2))
    +
    490 #define hipGridDim_y (hc_get_num_groups(1))
    +
    491 #define hipGridDim_z (hc_get_num_groups(0))
    492 
    -
    493 #define HIP_KERNEL_NAME(...) __VA_ARGS__
    +
    493 #endif // __hcc_workweek__ check
    494 
    -
    495 #ifdef __HCC_CPP__
    -
    496 hipStream_t ihipPreLaunchKernel(hipStream_t stream, hc::accelerator_view **av);
    -
    497 void ihipPostLaunchKernel(hipStream_t stream, hc::completion_future &cf);
    -
    498 
    -
    499 // TODO - move to common header file.
    -
    500 #define KNRM "\x1B[0m"
    -
    501 #define KGRN "\x1B[32m"
    -
    502 
    -
    503 #if not defined(DISABLE_GRID_LAUNCH)
    -
    504 #define hipLaunchKernel(_kernelName, _numBlocks3D, _blockDim3D, _groupMemBytes, _stream, ...) \
    -
    505 do {\
    -
    506  grid_launch_parm lp;\
    -
    507  lp.gridDim.x = _numBlocks3D.x; \
    -
    508  lp.gridDim.y = _numBlocks3D.y; \
    -
    509  lp.gridDim.z = _numBlocks3D.z; \
    -
    510  lp.groupDim.x = _blockDim3D.x; \
    -
    511  lp.groupDim.y = _blockDim3D.y; \
    -
    512  lp.groupDim.z = _blockDim3D.z; \
    -
    513  lp.groupMemBytes = _groupMemBytes;\
    -
    514  hc::completion_future cf;\
    -
    515  lp.cf = &cf; \
    -
    516  hipStream_t trueStream = (ihipPreLaunchKernel(_stream, &lp.av)); \
    -
    517  if (HIP_TRACE_API) {\
    -
    518  fprintf(stderr, KGRN "<<hip-api: hipLaunchKernel '%s' gridDim:[%d.%d.%d] groupDim:[%d.%d.%d] groupMem:+%d stream=%p\n" KNRM, \
    -
    519  #_kernelName, lp.gridDim.z, lp.gridDim.y, lp.gridDim.x, lp.groupDim.z, lp.groupDim.y, lp.groupDim.x, lp.groupMemBytes, (void*)(_stream));\
    -
    520  }\
    -
    521  _kernelName (lp, __VA_ARGS__);\
    -
    522  ihipPostLaunchKernel(trueStream, cf);\
    -
    523 } while(0)
    -
    524 
    -
    525 #else
    -
    526 #warning(DISABLE_GRID_LAUNCH set)
    -
    527 
    -
    528 #define hipLaunchKernel(_kernelName, _numBlocks3D, _blockDim3D, _groupMemBytes, _stream, ...) \
    -
    529 do {\
    -
    530  grid_launch_parm lp;\
    -
    531  lp.gridDim.x = _numBlocks3D.x * _blockDim3D.x;/*Convert from #blocks to #threads*/ \
    -
    532  lp.gridDim.y = _numBlocks3D.y * _blockDim3D.y;/*Convert from #blocks to #threads*/ \
    -
    533  lp.gridDim.z = _numBlocks3D.z * _blockDim3D.z;/*Convert from #blocks to #threads*/ \
    -
    534  lp.groupDim.x = _blockDim3D.x; \
    -
    535  lp.groupDim.y = _blockDim3D.y; \
    -
    536  lp.groupDim.z = _blockDim3D.z; \
    -
    537  lp.groupMemBytes = _groupMemBytes;\
    -
    538  hc::completion_future cf;\
    -
    539  lp.cf = &cf; \
    -
    540  hipStream_t trueStream = (ihipPreLaunchKernel(_stream, &lp.av)); \
    -
    541  if (HIP_TRACE_API) {\
    -
    542  fprintf(stderr, "==hip-api: launch '%s' gridDim:[%d.%d.%d] groupDim:[%d.%d.%d] groupMem:+%d stream=%p\n", \
    -
    543  #_kernelName, lp.gridDim.z, lp.gridDim.y, lp.gridDim.x, lp.groupDim.z, lp.groupDim.y, lp.groupDim.x, lp.groupMemBytes, (void*)(_stream));\
    -
    544  }\
    -
    545  _kernelName (lp, __VA_ARGS__);\
    -
    546  ihipPostLaunchKernel(trueStream, cf);\
    -
    547 } while(0)
    -
    548 /*end hipLaunchKernel */
    -
    549 #endif
    -
    550 
    -
    551 #elif defined (__HCC_C__)
    +
    495 #define __syncthreads() hc_barrier(CLK_LOCAL_MEM_FENCE)
    +
    496 
    +
    497 #define HIP_KERNEL_NAME(...) __VA_ARGS__
    +
    498 
    +
    499 #ifdef __HCC_CPP__
    +
    500 hipStream_t ihipPreLaunchKernel(hipStream_t stream, grid_launch_parm *lp);
    +
    501 void ihipPostLaunchKernel(hipStream_t stream, grid_launch_parm &lp);
    +
    502 
    +
    503 // TODO - move to common header file.
    +
    504 #define KNRM "\x1B[0m"
    +
    505 #define KGRN "\x1B[32m"
    +
    506 
    +
    507 #if not defined(DISABLE_GRID_LAUNCH)
    +
    508 #define hipLaunchKernel(_kernelName, _numBlocks3D, _blockDim3D, _groupMemBytes, _stream, ...) \
    +
    509 do {\
    +
    510  grid_launch_parm lp;\
    +
    511  lp.gridDim.x = _numBlocks3D.x; \
    +
    512  lp.gridDim.y = _numBlocks3D.y; \
    +
    513  lp.gridDim.z = _numBlocks3D.z; \
    +
    514  lp.groupDim.x = _blockDim3D.x; \
    +
    515  lp.groupDim.y = _blockDim3D.y; \
    +
    516  lp.groupDim.z = _blockDim3D.z; \
    +
    517  lp.groupMemBytes = _groupMemBytes; \
    +
    518  hipStream_t trueStream = (ihipPreLaunchKernel(_stream, &lp)); \
    +
    519  if (HIP_TRACE_API) {\
    +
    520  fprintf(stderr, KGRN "<<hip-api: hipLaunchKernel '%s' gridDim:(%d,%d,%d) groupDim:(%d,%d,%d) groupMem:+%d stream=%p\n" KNRM, \
    +
    521  #_kernelName, lp.gridDim.x, lp.gridDim.y, lp.gridDim.z, lp.groupDim.x, lp.groupDim.y, lp.groupDim.z, lp.groupMemBytes, (void*)(_stream));\
    +
    522  }\
    +
    523  _kernelName (lp, __VA_ARGS__);\
    +
    524  ihipPostLaunchKernel(trueStream, lp);\
    +
    525 } while(0)
    +
    526 
    +
    527 #else
    +
    528 #warning(DISABLE_GRID_LAUNCH set)
    +
    529 
    +
    530 #define hipLaunchKernel(_kernelName, _numBlocks3D, _blockDim3D, _groupMemBytes, _stream, ...) \
    +
    531 do {\
    +
    532  grid_launch_parm lp;\
    +
    533  lp.gridDim.x = _numBlocks3D.x * _blockDim3D.x;/*Convert from #blocks to #threads*/ \
    +
    534  lp.gridDim.y = _numBlocks3D.y * _blockDim3D.y;/*Convert from #blocks to #threads*/ \
    +
    535  lp.gridDim.z = _numBlocks3D.z * _blockDim3D.z;/*Convert from #blocks to #threads*/ \
    +
    536  lp.groupDim.x = _blockDim3D.x; \
    +
    537  lp.groupDim.y = _blockDim3D.y; \
    +
    538  lp.groupDim.z = _blockDim3D.z; \
    +
    539  lp.groupMemBytes = _groupMemBytes;\
    +
    540  hc::completion_future cf;\
    +
    541  lp.cf = &cf; \
    +
    542  hipStream_t trueStream = (ihipPreLaunchKernel(_stream, &lp.av)); \
    +
    543  if (HIP_TRACE_API) {\
    +
    544  fprintf(stderr, "==hip-api: launch '%s' gridDim:[%d.%d.%d] groupDim:[%d.%d.%d] groupMem:+%d stream=%p\n", \
    +
    545  #_kernelName, lp.gridDim.z, lp.gridDim.y, lp.gridDim.x, lp.groupDim.z, lp.groupDim.y, lp.groupDim.x, lp.groupMemBytes, (void*)(_stream));\
    +
    546  }\
    +
    547  _kernelName (lp, __VA_ARGS__);\
    +
    548  ihipPostLaunchKernel(trueStream, cf);\
    +
    549 } while(0)
    +
    550 /*end hipLaunchKernel */
    +
    551 #endif
    552 
    -
    553 //TODO - develop C interface.
    -
    554 
    -
    555 #endif
    -
    556 
    -
    557 #endif // __HCC__
    +
    553 #elif defined (__HCC_C__)
    +
    554 
    +
    555 //TODO - develop C interface.
    +
    556 
    +
    557 #endif
    558 
    -
    559 
    -
    564 //extern int HIP_PRINT_ENV ; ///< Print all HIP-related environment variables.
    -
    565 //extern int HIP_TRACE_API; ///< Trace HIP APIs.
    -
    566 //extern int HIP_LAUNCH_BLOCKING ; ///< Make all HIP APIs host-synchronous
    -
    567 
    -
    573 // End doxygen API:
    -
    579 #endif
    +
    559 #endif // __HCC__
    +
    560 
    +
    561 
    +
    566 //extern int HIP_PRINT_ENV ; ///< Print all HIP-related environment variables.
    +
    567 //extern int HIP_TRACE_API; ///< Trace HIP APIs.
    +
    568 //extern int HIP_LAUNCH_BLOCKING ; ///< Make all HIP APIs host-synchronous
    +
    569 
    +
    575 // End doxygen API:
    +
    581 #endif
    #define __host__
    Definition: host_defines.h:35
    diff --git a/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__runtime__api_8h.html b/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__runtime__api_8h.html index 67633ce1de..8b870efc4f 100644 --- a/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__runtime__api_8h.html +++ b/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__runtime__api_8h.html @@ -4,7 +4,7 @@ -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_runtime_api.h File Reference +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_runtime_api.h File Reference @@ -377,12 +377,6 @@ Functions - - - - - - @@ -392,7 +386,7 @@ Functions diff --git a/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__runtime__api_8h_source.html b/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__runtime__api_8h_source.html index b62d900793..7cdbc0c4ab 100644 --- a/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__runtime__api_8h_source.html +++ b/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__runtime__api_8h_source.html @@ -4,7 +4,7 @@ -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_runtime_api.h Source File +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_runtime_api.h Source File @@ -353,133 +353,136 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    791 
    792 
    793 
    -
    807 hipError_t hipMemcpy(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind);
    -
    808 
    -
    809 
    -
    824 hipError_t hipMemcpyToSymbol(const char* symbolName, const void *src, size_t sizeBytes, size_t offset, hipMemcpyKind kind);
    -
    825 
    -
    826 
    -
    839 #if __cplusplus
    -
    840 hipError_t hipMemcpyAsync(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind, hipStream_t stream=0);
    -
    841 #else
    -
    842 hipError_t hipMemcpyAsync(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind, hipStream_t stream);
    -
    843 #endif
    -
    844 
    -
    857 hipError_t hipMemset(void* dst, int value, size_t sizeBytes );
    -
    858 
    -
    859 
    -
    873 #if __cplusplus
    -
    874 hipError_t hipMemsetAsync(void* dst, int value, size_t sizeBytes, hipStream_t = 0 );
    -
    875 #else
    -
    876 hipError_t hipMemsetAsync(void* dst, int value, size_t sizeBytes, hipStream_t stream);
    -
    877 #endif
    -
    878 
    -
    885 hipError_t hipMemGetInfo (size_t * free, size_t * total) ;
    -
    886 
    -
    887 // doxygen end Memory
    -
    919 hipError_t hipDeviceCanAccessPeer (int* canAccessPeer, int deviceId, int peerDeviceId);
    -
    920 
    -
    921 
    -
    936 hipError_t hipDeviceEnablePeerAccess (int peerDeviceId, unsigned int flags);
    -
    937 
    -
    938 
    -
    948 hipError_t hipDeviceDisablePeerAccess (int peerDeviceId);
    -
    949 
    -
    950 
    -
    962 hipError_t hipMemcpyPeer (void* dst, int dstDeviceId, const void* src, int srcDeviceId, size_t sizeBytes);
    -
    963 
    -
    976 #if __cplusplus
    -
    977 hipError_t hipMemcpyPeerAsync ( void* dst, int dstDeviceId, const void* src, int srcDevice, size_t sizeBytes, hipStream_t stream=0 );
    -
    978 #else
    -
    979 hipError_t hipMemcpyPeerAsync(void* dst, int dstDevice, const void* src, int srcDevice, size_t sizeBytes, hipStream_t stream);
    -
    980 #endif
    -
    981 // doxygen end PeerToPeer
    -
    1005 hipError_t hipDriverGetVersion(int *driverVersion) ;
    -
    1006 
    -
    1007 
    -
    1008 
    -
    1009 // doxygen end Version Management
    -
    1036 #ifdef __cplusplus
    -
    1037 } /* extern "c" */
    -
    1038 #endif
    -
    1039 
    -
    1040 
    -
    1058 // end-group HCC_Specific
    -
    1065 // doxygen end HIP API
    -
    1070 #endif
    -
    hipError_t hipHostFree(void *ptr)
    Free memory allocated by the hcc hip host memory allocation API.
    Definition: hip_memory.cpp:488
    -
    hipError_t hipMemcpyAsync(void *dst, const void *src, size_t sizeBytes, hipMemcpyKind kind, hipStream_t stream)
    Copy data from src to dst asynchronously.
    Definition: hip_memory.cpp:343
    +
    813 hipError_t hipMemcpy(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind);
    +
    814 
    +
    815 
    +
    830 hipError_t hipMemcpyToSymbol(const char* symbolName, const void *src, size_t sizeBytes, size_t offset, hipMemcpyKind kind);
    +
    831 
    +
    832 
    +
    852 #if __cplusplus
    +
    853 hipError_t hipMemcpyAsync(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind, hipStream_t stream=0);
    +
    854 #else
    +
    855 hipError_t hipMemcpyAsync(void* dst, const void* src, size_t sizeBytes, hipMemcpyKind kind, hipStream_t stream);
    +
    856 #endif
    +
    857 
    +
    870 hipError_t hipMemset(void* dst, int value, size_t sizeBytes );
    +
    871 
    +
    872 
    +
    886 #if __cplusplus
    +
    887 hipError_t hipMemsetAsync(void* dst, int value, size_t sizeBytes, hipStream_t = 0 );
    +
    888 #else
    +
    889 hipError_t hipMemsetAsync(void* dst, int value, size_t sizeBytes, hipStream_t stream);
    +
    890 #endif
    +
    891 
    +
    898 hipError_t hipMemGetInfo (size_t * free, size_t * total) ;
    +
    899 
    +
    900 // doxygen end Memory
    +
    933 hipError_t hipDeviceCanAccessPeer (int* canAccessPeer, int deviceId, int peerDeviceId);
    +
    934 
    +
    935 
    +
    951 hipError_t hipDeviceEnablePeerAccess (int peerDeviceId, unsigned int flags);
    +
    952 
    +
    953 
    +
    964 hipError_t hipDeviceDisablePeerAccess (int peerDeviceId);
    +
    965 
    +
    966 
    +
    967 #ifdef PEER_NON_UNIFIED
    +
    968 
    +
    980 hipError_t hipMemcpyPeer (void* dst, int dstDeviceId, const void* src, int srcDeviceId, size_t sizeBytes);
    +
    981 
    +
    994 #if __cplusplus
    +
    995 hipError_t hipMemcpyPeerAsync ( void* dst, int dstDeviceId, const void* src, int srcDevice, size_t sizeBytes, hipStream_t stream=0 );
    +
    996 #else
    +
    997 hipError_t hipMemcpyPeerAsync(void* dst, int dstDevice, const void* src, int srcDevice, size_t sizeBytes, hipStream_t stream);
    +
    998 #endif
    +
    999 #endif
    +
    1000 
    +
    1001 
    +
    1002 // doxygen end PeerToPeer
    +
    1027 hipError_t hipDriverGetVersion(int *driverVersion) ;
    +
    1028 
    +
    1029 
    +
    1030 
    +
    1031 // doxygen end Version Management
    +
    1058 #ifdef __cplusplus
    +
    1059 } /* extern "c" */
    +
    1060 #endif
    +
    1061 
    +
    1062 
    +
    1080 // end-group HCC_Specific
    +
    1087 // doxygen end HIP API
    +
    1092 #endif
    +
    hipError_t hipHostFree(void *ptr)
    Free memory allocated by the hcc hip host memory allocation API.
    Definition: hip_memory.cpp:539
    +
    hipError_t hipMemcpyAsync(void *dst, const void *src, size_t sizeBytes, hipMemcpyKind kind, hipStream_t stream)
    Copy data from src to dst asynchronously.
    Definition: hip_memory.cpp:349
    hipError_t hipPeekAtLastError(void)
    Return last error returned by any HIP runtime API call.
    struct dim3 dim3
    -
    hipError_t hipGetDeviceProperties(hipDeviceProp_t *prop, int device)
    Returns device properties.
    Definition: hip_device.cpp:267
    -
    hipError_t hipMemcpyToSymbol(const char *symbolName, const void *src, size_t sizeBytes, size_t offset, hipMemcpyKind kind)
    Copies sizeBytes bytes from the memory area pointed to by src to the memory area pointed to by offset...
    Definition: hip_memory.cpp:291
    +
    hipError_t hipGetDeviceProperties(hipDeviceProp_t *prop, int device)
    Returns device properties.
    Definition: hip_device.cpp:261
    +
    hipError_t hipMemcpyToSymbol(const char *symbolName, const void *src, size_t sizeBytes, size_t offset, hipMemcpyKind kind)
    Copies sizeBytes bytes from the memory area pointed to by src to the memory area pointed to by offset...
    Definition: hip_memory.cpp:297
    hipError_t hipFuncSetCacheConfig(hipFuncCache config)
    Set Cache configuration for a specific function.
    Definition: hip_device.cpp:90
    no preference for shared memory or L1 (default)
    Definition: hip_runtime_api.h:92
    uint32_t x
    x
    Definition: hip_runtime_api.h:115
    Host-to-Device Copy.
    Definition: hip_runtime_api.h:131
    -
    hipError_t hipDeviceEnablePeerAccess(int peerDeviceId, unsigned int flags)
    Enable direct access from current device's virtual address space to memory allocations physically loc...
    Definition: hip_peer.cpp:101
    +
    hipError_t hipDeviceEnablePeerAccess(int peerDeviceId, unsigned int flags)
    Enable direct access from current device's virtual address space to memory allocations physically loc...
    Definition: hip_peer.cpp:106
    hipError_t hipDeviceGetSharedMemConfig(hipSharedMemConfig *pConfig)
    Get Shared memory bank configuration.
    Definition: hip_device.cpp:120
    hipError_t hipSetDevice(int device)
    Set default device to be used for subsequent hip API calls from this thread.
    Definition: hip_device.cpp:133
    Definition: hip_runtime_api.h:117
    Device-to-Host Copy.
    Definition: hip_runtime_api.h:132
    hipError_t hipHostGetDevicePointer(void **devPtr, void *hstPtr, unsigned int flags)
    Get Device pointer from Host Pointer allocated through hipHostAlloc.
    -
    hipError_t hipEventSynchronize(hipEvent_t event)
    : Wait for an event to complete.
    Definition: hip_event.cpp:121
    +
    hipError_t hipEventSynchronize(hipEvent_t event)
    : Wait for an event to complete.
    Definition: hip_event.cpp:120
    hipError_t hipSetDeviceFlags(unsigned flags)
    Set Device flags.
    hipFuncCache
    Definition: hip_runtime_api.h:91
    -
    hipError_t hipEventQuery(hipEvent_t event)
    Query event status.
    Definition: hip_event.cpp:199
    +
    hipError_t hipEventQuery(hipEvent_t event)
    Query event status.
    Definition: hip_event.cpp:198
    hipError_t hipDeviceGetCacheConfig(hipFuncCache *cacheConfig)
    Set Cache configuration for a specific function.
    Definition: hip_device.cpp:76
    -
    hipError_t hipDeviceDisablePeerAccess(int peerDeviceId)
    Disable direct access from current device's virtual address space to memory allocations physically lo...
    Definition: hip_peer.cpp:61
    -
    hipError_t hipDeviceGetAttribute(int *pi, hipDeviceAttribute_t attr, int device)
    Query device attribute.
    Definition: hip_device.cpp:191
    -
    hipError_t hipMallocHost(void **ptr, size_t size) __attribute__((deprecated("use hipHostMalloc instead")))
    Allocate pinned host memory.
    Definition: hip_memory.cpp:203
    +
    hipError_t hipDeviceDisablePeerAccess(int peerDeviceId)
    Disable direct access from current device's virtual address space to memory allocations physically lo...
    Definition: hip_peer.cpp:63
    +
    hipError_t hipDeviceGetAttribute(int *pi, hipDeviceAttribute_t attr, int device)
    Query device attribute.
    Definition: hip_device.cpp:185
    +
    hipError_t hipMallocHost(void **ptr, size_t size) __attribute__((deprecated("use hipHostMalloc instead")))
    Allocate pinned host memory.
    Definition: hip_memory.cpp:195
    hipError_t hipGetDevice(int *device)
    Return the default device id for the calling host thread.
    Definition: hip_device.cpp:31
    -
    hipError_t hipHostMalloc(void **ptr, size_t size, unsigned int flags)
    Allocate device accessible page locked host memory.
    Definition: hip_memory.cpp:152
    -
    hipDeviceAttribute_t
    Definition: hip_runtime_api.h:170
    -
    hipError_t hipEventDestroy(hipEvent_t event)
    Destroy the specified event.
    Definition: hip_event.cpp:106
    +
    hipError_t hipHostMalloc(void **ptr, size_t size, unsigned int flags)
    Allocate device accessible page locked host memory.
    Definition: hip_memory.cpp:148
    +
    hipDeviceAttribute_t
    Definition: hip_runtime_api.h:172
    +
    hipError_t hipEventDestroy(hipEvent_t event)
    Destroy the specified event.
    Definition: hip_event.cpp:105
    hipError_t hipStreamCreateWithFlags(hipStream_t *stream, unsigned int flags)
    Create an asynchronous stream.
    Definition: hip_stream.cpp:54
    Definition: hip_runtime_api.h:114
    uint32_t y
    y
    Definition: hip_runtime_api.h:116
    prefer equal size L1 cache and shared memory
    Definition: hip_runtime_api.h:95
    hipError_t hipEventCreateWithFlags(hipEvent_t *event, unsigned flags)
    Create an event with the specified flags.
    Definition: hip_event.cpp:53
    -
    hipError_t hipEventElapsedTime(float *ms, hipEvent_t start, hipEvent_t stop)
    Return the elapsed time between two events.
    Definition: hip_event.cpp:154
    +
    hipError_t hipEventElapsedTime(float *ms, hipEvent_t start, hipEvent_t stop)
    Return the elapsed time between two events.
    Definition: hip_event.cpp:153
    hipError_t hipDeviceCanAccessPeer(int *canAccessPeer, int deviceId, int peerDeviceId)
    Determine if a device can access a peer's memory.
    Definition: hip_peer.cpp:30
    hipError_t hipGetDeviceCount(int *count)
    Return number of compute-capable devices.
    Definition: hip_device.cpp:44
    -
    hipError_t hipMemset(void *dst, int value, size_t sizeBytes)
    Copy data from src to dst asynchronously.
    Definition: hip_memory.cpp:422
    +
    hipError_t hipMemset(void *dst, int value, size_t sizeBytes)
    Copy data from src to dst asynchronously.
    Definition: hip_memory.cpp:428
    hipError_t hipStreamDestroy(hipStream_t stream)
    Destroys the specified stream.
    Definition: hip_stream.cpp:117
    -
    hipError_t hipHostGetFlags(unsigned int *flagsPtr, void *hostPtr)
    Get flags associated with host pointer.
    Definition: hip_memory.cpp:210
    +
    hipError_t hipHostGetFlags(unsigned int *flagsPtr, void *hostPtr)
    Get flags associated with host pointer.
    Definition: hip_memory.cpp:202
    hipError_t hipStreamSynchronize(hipStream_t stream)
    Wait for all commands in stream to complete.
    Definition: hip_stream.cpp:94
    Shared mem is banked at 4-bytes intervals and performs best when adjacent threads access data 4 bytes...
    Definition: hip_runtime_api.h:104
    hipError_t
    Definition: hip_runtime_api.h:142
    hipMemcpyKind
    Definition: hip_runtime_api.h:129
    prefer larger L1 cache and smaller shared memory
    Definition: hip_runtime_api.h:94
    -
    hipError_t hipDriverGetVersion(int *driverVersion)
    Returns the approximate HIP driver version.
    Definition: hip_peer.cpp:156
    +
    hipError_t hipDriverGetVersion(int *driverVersion)
    Returns the approximate HIP driver version.
    Definition: hip_peer.cpp:163
    hipError_t hipDeviceSynchronize(void)
    Blocks until the default device has completed all preceding requested tasks.
    Definition: hip_device.cpp:149
    Definition: hip_runtime_api.h:47
    -
    hipError_t hipHostRegister(void *hostPtr, size_t sizeBytes, unsigned int flags)
    Register host memory so it can be accessed from the current device.
    Definition: hip_memory.cpp:236
    +
    hipError_t hipHostRegister(void *hostPtr, size_t sizeBytes, unsigned int flags)
    Register host memory so it can be accessed from the current device.
    Definition: hip_memory.cpp:228
    hipError_t hipDeviceSetCacheConfig(hipFuncCache cacheConfig)
    Set L1/Shared cache partition.
    Definition: hip_device.cpp:62
    -
    hipError_t hipMalloc(void **ptr, size_t size)
    Allocate memory on the default accelerator.
    Definition: hip_memory.cpp:117
    +
    hipError_t hipMalloc(void **ptr, size_t size)
    Allocate memory on the default accelerator.
    Definition: hip_memory.cpp:116
    const char * hipGetErrorName(hipError_t hip_error)
    Return name of the specified error code in text form.
    Definition: hip_error.cpp:53
    hipError_t hipGetLastError(void)
    Return last error returned by any HIP runtime API call and resets the stored error code to hipSuccess...
    Definition: hip_error.cpp:31
    hipError_t hipStreamWaitEvent(hipStream_t stream, hipEvent_t event, unsigned int flags)
    Make the specified compute stream wait for an event.
    Definition: hip_stream.cpp:75
    hipError_t hipStreamGetFlags(hipStream_t stream, unsigned int *flags)
    Return flags associated with this stream.
    Definition: hip_stream.cpp:146
    -
    hipError_t hipMemGetInfo(size_t *free, size_t *total)
    Query memory info. Return snapshot of free memory, and total allocatable memory on the device...
    Definition: hip_memory.cpp:435
    -
    hipError_t hipFree(void *ptr)
    Free memory allocated by the hcc hip memory allocation API. This API performs an implicit hipDeviceSy...
    Definition: hip_memory.cpp:463
    +
    hipError_t hipMemGetInfo(size_t *free, size_t *total)
    Query memory info. Return snapshot of free memory, and total allocatable memory on the device...
    Definition: hip_memory.cpp:484
    +
    hipError_t hipFree(void *ptr)
    Free memory allocated by the hcc hip memory allocation API. This API performs an implicit hipDeviceSy...
    Definition: hip_memory.cpp:514
    uint32_t z
    z
    Definition: hip_runtime_api.h:117
    hipError_t hipDeviceReset(void)
    Destroy all resources and reset all state on the default device in the current process.
    Definition: hip_device.cpp:163
    Definition: hip_runtime_api.h:74
    -
    hipError_t hipMemsetAsync(void *dst, int value, size_t sizeBytes, hipStream_t stream)
    Fills the first sizeBytes bytes of the memory area pointed to by dev with the constant byte value val...
    Definition: hip_memory.cpp:372
    +
    hipError_t hipMemsetAsync(void *dst, int value, size_t sizeBytes, hipStream_t stream)
    Fills the first sizeBytes bytes of the memory area pointed to by dev with the constant byte value val...
    Definition: hip_memory.cpp:378
    The compiler selects a device-specific value for the banking.
    Definition: hip_runtime_api.h:103
    Device-to-Device Copy.
    Definition: hip_runtime_api.h:133
    -
    Definition: hip_hcc.h:483
    -
    hipError_t hipMemcpyPeerAsync(void *dst, int dstDevice, const void *src, int srcDevice, size_t sizeBytes, hipStream_t stream)
    Copies memory from one device to memory on another device.
    Definition: hip_peer.cpp:144
    +
    Definition: hip_hcc.h:485
    Runtime will automatically determine copy-kind based on virtual addresses.
    Definition: hip_runtime_api.h:134
    hipSharedMemConfig
    Definition: hip_runtime_api.h:102
    -
    hipError_t hipHostUnregister(void *hostPtr)
    Un-register host pointer.
    Definition: hip_memory.cpp:272
    -
    Definition: hip_hcc.h:399
    -
    hipError_t hipMemcpyPeer(void *dst, int dstDeviceId, const void *src, int srcDeviceId, size_t sizeBytes)
    Copies memory from one device to memory on another device.
    Definition: hip_peer.cpp:131
    +
    hipError_t hipHostUnregister(void *hostPtr)
    Un-register host pointer.
    Definition: hip_memory.cpp:275
    +
    Definition: hip_hcc.h:397
    hipError_t hipStreamCreate(hipStream_t *stream)
    Create an asynchronous stream.
    Definition: hip_stream.cpp:63
    -
    hipError_t hipMemcpy(void *dst, const void *src, size_t sizeBytes, hipMemcpyKind kind)
    Copy data from src to dst.
    Definition: hip_memory.cpp:312
    +
    hipError_t hipMemcpy(void *dst, const void *src, size_t sizeBytes, hipMemcpyKind kind)
    Copy data from src to dst.
    Definition: hip_memory.cpp:318
    hipError_t hipEventCreate(hipEvent_t *event)
    Definition: hip_event.cpp:61
    -
    hipError_t hipFreeHost(void *ptr) __attribute__((deprecated("use hipHostFree instead")))
    Free memory allocated by the hcc hip host memory allocation API.
    Definition: hip_memory.cpp:513
    +
    hipError_t hipFreeHost(void *ptr) __attribute__((deprecated("use hipHostFree instead")))
    Free memory allocated by the hcc hip host memory allocation API.
    Definition: hip_memory.cpp:564
    hipError_t hipDeviceSetSharedMemConfig(hipSharedMemConfig config)
    Set Shared memory bank configuration.
    Definition: hip_device.cpp:105
    hipError_t hipEventRecord(hipEvent_t event, hipStream_t stream)
    Record an event in the specified stream.
    Definition: hip_event.cpp:70
    prefer larger shared memory and smaller L1 cache
    Definition: hip_runtime_api.h:93
    @@ -490,7 +493,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__vector__types_8h.html b/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__vector__types_8h.html
    index 1750bed328..7dcaae4e47 100644
    --- a/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__vector__types_8h.html
    +++ b/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__vector__types_8h.html
    @@ -4,7 +4,7 @@
    -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_vector_types.h File Reference
    +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_vector_types.h File Reference
    @@ -96,25 +96,15 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');

    Defines the different newt vector types for HIP runtime. More...

    - +
    #include "hip/hcc_detail/host_defines.h"
    +

    Go to the source code of this file.

    Variables

    warpSize
    hipError_t hipDeviceDisablePeerAccess (int peerDeviceId)
     Disable direct access from current device's virtual address space to memory allocations physically located on a peer device. More...
     
    hipError_t hipMemcpyPeer (void *dst, int dstDeviceId, const void *src, int srcDeviceId, size_t sizeBytes)
     Copies memory from one device to memory on another device. More...
     
    hipError_t hipMemcpyPeerAsync (void *dst, int dstDevice, const void *src, int srcDevice, size_t sizeBytes, hipStream_t stream)
     Copies memory from one device to memory on another device. More...
     
    hipError_t hipDriverGetVersion (int *driverVersion)
     Returns the approximate HIP driver version. More...
     
    - - - - - - - - - + +

    Macros

    -#define ONE_COMPONENT_ACCESS(T, VT)   inline VT make_ ##VT (T x) { VT t; t.x = x; return t; };
     
    -
     
    -#define TWO_COMPONENT_ACCESS(T, VT)   inline VT make_ ##VT (T x, T y) { VT t; t.x=x; t.y=y; return t; };
     
    -#define THREE_COMPONENT_ACCESS(T, VT)   inline VT make_ ##VT (T x, T y, T z) { VT t; t.x=x; t.y=y; t.z=z; return t; };
     
    -#define FOUR_COMPONENT_ACCESS(T, VT)   inline VT make_ ##VT (T x, T y, T z, T w) { VT t; t.x=x; t.y=y; t.z=z; t.w=w; return t; };
     
    +#define __HIP_DEVICE__   __device__ __host__
     
    @@ -269,181 +259,157 @@ typedef hc::short_vector::double4  - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

    Typedefs

    Functions

    ONE_COMPONENT_ACCESS (signed char, char1)
     
    TWO_COMPONENT_ACCESS (signed char, char2)
     
    THREE_COMPONENT_ACCESS (signed char, char3)
     
    FOUR_COMPONENT_ACCESS (signed char, char4)
     
    ONE_COMPONENT_ACCESS (short, short1)
     
    TWO_COMPONENT_ACCESS (short, short2)
     
    THREE_COMPONENT_ACCESS (short, short3)
     
    FOUR_COMPONENT_ACCESS (short, short4)
     
    ONE_COMPONENT_ACCESS (int, int1)
     
    TWO_COMPONENT_ACCESS (int, int2)
     
    THREE_COMPONENT_ACCESS (int, int3)
     
    FOUR_COMPONENT_ACCESS (int, int4)
     
    ONE_COMPONENT_ACCESS (long int, long1)
     
    TWO_COMPONENT_ACCESS (long int, long2)
     
    THREE_COMPONENT_ACCESS (long int, long3)
     
    FOUR_COMPONENT_ACCESS (long int, long4)
     
    ONE_COMPONENT_ACCESS (long long int, ulong1)
     
    TWO_COMPONENT_ACCESS (long long int, ulong2)
     
    THREE_COMPONENT_ACCESS (long long int, ulong3)
     
    FOUR_COMPONENT_ACCESS (long long int, ulong4)
     
    ONE_COMPONENT_ACCESS (long long int, longlong1)
     
    TWO_COMPONENT_ACCESS (long long int, longlong2)
     
    THREE_COMPONENT_ACCESS (long long int, longlong3)
     
    FOUR_COMPONENT_ACCESS (long long int, longlong4)
     
    ONE_COMPONENT_ACCESS (unsigned char, uchar1)
     
    TWO_COMPONENT_ACCESS (unsigned char, uchar2)
     
    THREE_COMPONENT_ACCESS (unsigned char, uchar3)
     
    FOUR_COMPONENT_ACCESS (unsigned char, uchar4)
     
    ONE_COMPONENT_ACCESS (unsigned short, ushort1)
     
    TWO_COMPONENT_ACCESS (unsigned short, ushort2)
     
    THREE_COMPONENT_ACCESS (unsigned short, ushort3)
     
    FOUR_COMPONENT_ACCESS (unsigned short, ushort4)
     
    ONE_COMPONENT_ACCESS (unsigned int, uint1)
     
    TWO_COMPONENT_ACCESS (unsigned int, uint2)
     
    THREE_COMPONENT_ACCESS (unsigned int, uint3)
     
    FOUR_COMPONENT_ACCESS (unsigned int, uint4)
     
    ONE_COMPONENT_ACCESS (unsigned long int, ulong1)
     
    TWO_COMPONENT_ACCESS (unsigned long int, ulong2)
     
    THREE_COMPONENT_ACCESS (unsigned long int, ulong3)
     
    FOUR_COMPONENT_ACCESS (unsigned long int, ulong4)
     
    ONE_COMPONENT_ACCESS (unsigned long long int, ulong1)
     
    TWO_COMPONENT_ACCESS (unsigned long long int, ulong2)
     
    THREE_COMPONENT_ACCESS (unsigned long long int, ulong3)
     
    FOUR_COMPONENT_ACCESS (unsigned long long int, ulong4)
     
    ONE_COMPONENT_ACCESS (unsigned long long int, ulonglong1)
     
    TWO_COMPONENT_ACCESS (unsigned long long int, ulonglong2)
     
    THREE_COMPONENT_ACCESS (unsigned long long int, ulonglong3)
     
    FOUR_COMPONENT_ACCESS (unsigned long long int, ulonglong4)
     
    ONE_COMPONENT_ACCESS (float, float1)
     
    TWO_COMPONENT_ACCESS (float, float2)
     
    THREE_COMPONENT_ACCESS (float, float3)
     
    FOUR_COMPONENT_ACCESS (float, float4)
     
    ONE_COMPONENT_ACCESS (double, double1)
     
    TWO_COMPONENT_ACCESS (double, double2)
     
    THREE_COMPONENT_ACCESS (double, double3)
     
    FOUR_COMPONENT_ACCESS (double, double4)
     
    +__HIP_DEVICE__ char1 make_char1 (signed char)
     
    +__HIP_DEVICE__ char2 make_char2 (signed char, signed char)
     
    +__HIP_DEVICE__ char3 make_char3 (signed char, signed char, signed char)
     
    +__HIP_DEVICE__ char4 make_char4 (signed char, signed char, signed char, signed char)
     
    +__HIP_DEVICE__ short1 make_short1 (short)
     
    +__HIP_DEVICE__ short2 make_short2 (short, short)
     
    +__HIP_DEVICE__ short3 make_short3 (short, short, short)
     
    +__HIP_DEVICE__ short4 make_short4 (short, short, short, short)
     
    +__HIP_DEVICE__ int1 make_int1 (int)
     
    +__HIP_DEVICE__ int2 make_int2 (int, int)
     
    +__HIP_DEVICE__ int3 make_int3 (int, int, int)
     
    +__HIP_DEVICE__ int4 make_int4 (int, int, int, int)
     
    +__HIP_DEVICE__ long1 make_long1 (long)
     
    +__HIP_DEVICE__ long2 make_long2 (long, long)
     
    +__HIP_DEVICE__ long3 make_long3 (long, long, long)
     
    +__HIP_DEVICE__ long4 make_long4 (long, long, long, long)
     
    +__HIP_DEVICE__ longlong1 make_longlong1 (long long)
     
    +__HIP_DEVICE__ longlong2 make_longlong2 (long long, long long)
     
    +__HIP_DEVICE__ longlong3 make_longlong3 (long long, long long, long long)
     
    +__HIP_DEVICE__ longlong4 make_longlong4 (long long, long long, long long, long long)
     
    +__HIP_DEVICE__ uchar1 make_uchar1 (unsigned char)
     
    +__HIP_DEVICE__ uchar2 make_uchar2 (unsigned char, unsigned char)
     
    +__HIP_DEVICE__ uchar3 make_uchar3 (unsigned char, unsigned char, unsigned char)
     
    +__HIP_DEVICE__ uchar4 make_uchar4 (unsigned char, unsigned char, unsigned char, unsigned char)
     
    +__HIP_DEVICE__ ushort1 make_ushort1 (unsigned short)
     
    +__HIP_DEVICE__ ushort2 make_ushort2 (unsigned short, unsigned short)
     
    +__HIP_DEVICE__ ushort3 make_ushort3 (unsigned short, unsigned short, unsigned short)
     
    +__HIP_DEVICE__ ushort4 make_ushort4 (unsigned short, unsigned short, unsigned short, unsigned short)
     
    +__HIP_DEVICE__ uint1 make_uint1 (unsigned int)
     
    +__HIP_DEVICE__ uint2 make_uint2 (unsigned int, unsigned int)
     
    +__HIP_DEVICE__ uint3 make_uint3 (unsigned int, unsigned int, unsigned int)
     
    +__HIP_DEVICE__ uint4 make_uint4 (unsigned int, unsigned int, unsigned int, unsigned int)
     
    +__HIP_DEVICE__ ulong1 make_ulong1 (unsigned long)
     
    +__HIP_DEVICE__ ulong2 make_ulong2 (unsigned long, unsigned long)
     
    +__HIP_DEVICE__ ulong3 make_ulong3 (unsigned long, unsigned long, unsigned long)
     
    +__HIP_DEVICE__ ulong4 make_ulong4 (unsigned long, unsigned long, unsigned long, unsigned long)
     
    +__HIP_DEVICE__ ulonglong1 make_ulonglong1 (unsigned long long)
     
    +__HIP_DEVICE__ ulonglong2 make_ulonglong2 (unsigned long long, unsigned long long)
     
    +__HIP_DEVICE__ ulonglong3 make_ulonglong3 (unsigned long long, unsigned long long, unsigned long long)
     
    +__HIP_DEVICE__ ulonglong4 make_ulonglong4 (unsigned long long, unsigned long long, unsigned long long, unsigned long long)
     
    +__HIP_DEVICE__ float1 make_float1 (float)
     
    +__HIP_DEVICE__ float2 make_float2 (float, float)
     
    +__HIP_DEVICE__ float3 make_float3 (float, float, float)
     
    +__HIP_DEVICE__ float4 make_float4 (float, float, float, float)
     
    +__HIP_DEVICE__ double1 make_double1 (double)
     
    +__HIP_DEVICE__ double2 make_double2 (double, double)
     
    +__HIP_DEVICE__ double3 make_double3 (double, double, double)
     
    +__HIP_DEVICE__ double4 make_double4 (double, double, double, double)
     

    Detailed Description

    Defines the different newt vector types for HIP runtime.

    diff --git a/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__vector__types_8h_source.html b/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__vector__types_8h_source.html
    index c19ef2c17b..2ecf8e9717 100644
    --- a/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__vector__types_8h_source.html
    +++ b/projects/hip/docs/RuntimeAPI/html/hcc__detail_2hip__vector__types_8h_source.html
    @@ -4,7 +4,7 @@
    -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_vector_types.h Source File
    +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_vector_types.h Source File
    @@ -201,97 +201,163 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    115 typedef hc::short_vector::double3 double3;
    116 typedef hc::short_vector::double4 double4;
    117 
    -
    118 
    -
    120 // Inline functions for creating vector types from basic types
    -
    121 #define ONE_COMPONENT_ACCESS(T, VT) inline VT make_ ##VT (T x) { VT t; t.x = x; return t; };
    -
    122 #define TWO_COMPONENT_ACCESS(T, VT) inline VT make_ ##VT (T x, T y) { VT t; t.x=x; t.y=y; return t; };
    -
    123 #define THREE_COMPONENT_ACCESS(T, VT) inline VT make_ ##VT (T x, T y, T z) { VT t; t.x=x; t.y=y; t.z=z; return t; };
    -
    124 #define FOUR_COMPONENT_ACCESS(T, VT) inline VT make_ ##VT (T x, T y, T z, T w) { VT t; t.x=x; t.y=y; t.z=z; t.w=w; return t; };
    -
    125 
    -
    126 
    -
    127 //signed:
    -
    128 ONE_COMPONENT_ACCESS (signed char, char1);
    -
    129 TWO_COMPONENT_ACCESS (signed char, char2);
    -
    130 THREE_COMPONENT_ACCESS(signed char, char3);
    -
    131 FOUR_COMPONENT_ACCESS (signed char, char4);
    -
    132 
    -
    133 ONE_COMPONENT_ACCESS (short, short1);
    -
    134 TWO_COMPONENT_ACCESS (short, short2);
    -
    135 THREE_COMPONENT_ACCESS(short, short3);
    -
    136 FOUR_COMPONENT_ACCESS (short, short4);
    -
    137 
    -
    138 ONE_COMPONENT_ACCESS (int, int1);
    -
    139 TWO_COMPONENT_ACCESS (int, int2);
    -
    140 THREE_COMPONENT_ACCESS(int, int3);
    -
    141 FOUR_COMPONENT_ACCESS (int, int4);
    -
    142 
    -
    143 ONE_COMPONENT_ACCESS (long int, long1);
    -
    144 TWO_COMPONENT_ACCESS (long int, long2);
    -
    145 THREE_COMPONENT_ACCESS(long int, long3);
    -
    146 FOUR_COMPONENT_ACCESS (long int, long4);
    -
    147 
    -
    148 ONE_COMPONENT_ACCESS (long long int, ulong1);
    -
    149 TWO_COMPONENT_ACCESS (long long int, ulong2);
    -
    150 THREE_COMPONENT_ACCESS(long long int, ulong3);
    -
    151 FOUR_COMPONENT_ACCESS (long long int, ulong4);
    -
    152 
    -
    153 ONE_COMPONENT_ACCESS (long long int, longlong1);
    -
    154 TWO_COMPONENT_ACCESS (long long int, longlong2);
    -
    155 THREE_COMPONENT_ACCESS(long long int, longlong3);
    -
    156 FOUR_COMPONENT_ACCESS (long long int, longlong4);
    -
    157 
    -
    158 
    -
    159 // unsigned:
    -
    160 ONE_COMPONENT_ACCESS (unsigned char, uchar1);
    -
    161 TWO_COMPONENT_ACCESS (unsigned char, uchar2);
    -
    162 THREE_COMPONENT_ACCESS(unsigned char, uchar3);
    -
    163 FOUR_COMPONENT_ACCESS (unsigned char, uchar4);
    +
    118 #if __HCC__
    +
    119 #include"hip/hcc_detail/host_defines.h"
    +
    120 #define __HIP_DEVICE__ __device__ __host__
    +
    121 #else
    +
    122 #define __HIP_DEVICE__
    +
    123 #endif
    +
    124 
    +
    125 __HIP_DEVICE__ char1 make_char1(signed char );
    +
    126 __HIP_DEVICE__ char2 make_char2(signed char, signed char );
    +
    127 __HIP_DEVICE__ char3 make_char3(signed char, signed char, signed char );
    +
    128 __HIP_DEVICE__ char4 make_char4(signed char, signed char, signed char, signed char );
    +
    129 
    +
    130 __HIP_DEVICE__ short1 make_short1(short );
    +
    131 __HIP_DEVICE__ short2 make_short2(short, short );
    +
    132 __HIP_DEVICE__ short3 make_short3(short, short, short );
    +
    133 __HIP_DEVICE__ short4 make_short4(short, short, short, short );
    +
    134 
    +
    135 __HIP_DEVICE__ int1 make_int1(int );
    +
    136 __HIP_DEVICE__ int2 make_int2(int, int );
    +
    137 __HIP_DEVICE__ int3 make_int3(int, int, int );
    +
    138 __HIP_DEVICE__ int4 make_int4(int, int, int, int );
    +
    139 
    +
    140 __HIP_DEVICE__ long1 make_long1(long );
    +
    141 __HIP_DEVICE__ long2 make_long2(long, long );
    +
    142 __HIP_DEVICE__ long3 make_long3(long, long, long );
    +
    143 __HIP_DEVICE__ long4 make_long4(long, long, long, long );
    +
    144 
    +
    145 __HIP_DEVICE__ longlong1 make_longlong1(long long );
    +
    146 __HIP_DEVICE__ longlong2 make_longlong2(long long, long long );
    +
    147 __HIP_DEVICE__ longlong3 make_longlong3(long long, long long, long long );
    +
    148 __HIP_DEVICE__ longlong4 make_longlong4(long long, long long, long long, long long );
    +
    149 
    +
    150 __HIP_DEVICE__ uchar1 make_uchar1(unsigned char );
    +
    151 __HIP_DEVICE__ uchar2 make_uchar2(unsigned char, unsigned char );
    +
    152 __HIP_DEVICE__ uchar3 make_uchar3(unsigned char, unsigned char, unsigned char );
    +
    153 __HIP_DEVICE__ uchar4 make_uchar4(unsigned char, unsigned char, unsigned char, unsigned char );
    +
    154 
    +
    155 __HIP_DEVICE__ ushort1 make_ushort1(unsigned short );
    +
    156 __HIP_DEVICE__ ushort2 make_ushort2(unsigned short, unsigned short );
    +
    157 __HIP_DEVICE__ ushort3 make_ushort3(unsigned short, unsigned short, unsigned short );
    +
    158 __HIP_DEVICE__ ushort4 make_ushort4(unsigned short, unsigned short, unsigned short, unsigned short );
    +
    159 
    +
    160 __HIP_DEVICE__ uint1 make_uint1(unsigned int );
    +
    161 __HIP_DEVICE__ uint2 make_uint2(unsigned int, unsigned int );
    +
    162 __HIP_DEVICE__ uint3 make_uint3(unsigned int, unsigned int, unsigned int );
    +
    163 __HIP_DEVICE__ uint4 make_uint4(unsigned int, unsigned int, unsigned int, unsigned int );
    164 
    -
    165 ONE_COMPONENT_ACCESS (unsigned short, ushort1);
    -
    166 TWO_COMPONENT_ACCESS (unsigned short, ushort2);
    -
    167 THREE_COMPONENT_ACCESS(unsigned short, ushort3);
    -
    168 FOUR_COMPONENT_ACCESS (unsigned short, ushort4);
    +
    165 __HIP_DEVICE__ ulong1 make_ulong1(unsigned long );
    +
    166 __HIP_DEVICE__ ulong2 make_ulong2(unsigned long, unsigned long );
    +
    167 __HIP_DEVICE__ ulong3 make_ulong3(unsigned long, unsigned long, unsigned long );
    +
    168 __HIP_DEVICE__ ulong4 make_ulong4(unsigned long, unsigned long, unsigned long, unsigned long );
    169 
    -
    170 ONE_COMPONENT_ACCESS (unsigned int, uint1);
    -
    171 TWO_COMPONENT_ACCESS (unsigned int, uint2);
    -
    172 THREE_COMPONENT_ACCESS(unsigned int, uint3);
    -
    173 FOUR_COMPONENT_ACCESS (unsigned int, uint4);
    +
    170 __HIP_DEVICE__ ulonglong1 make_ulonglong1(unsigned long long );
    +
    171 __HIP_DEVICE__ ulonglong2 make_ulonglong2(unsigned long long, unsigned long long);
    +
    172 __HIP_DEVICE__ ulonglong3 make_ulonglong3(unsigned long long, unsigned long long, unsigned long long);
    +
    173 __HIP_DEVICE__ ulonglong4 make_ulonglong4(unsigned long long, unsigned long long, unsigned long long, unsigned long long );
    174 
    -
    175 ONE_COMPONENT_ACCESS (unsigned long int, ulong1);
    -
    176 TWO_COMPONENT_ACCESS (unsigned long int, ulong2);
    -
    177 THREE_COMPONENT_ACCESS(unsigned long int, ulong3);
    -
    178 FOUR_COMPONENT_ACCESS (unsigned long int, ulong4);
    +
    175 __HIP_DEVICE__ float1 make_float1(float );
    +
    176 __HIP_DEVICE__ float2 make_float2(float, float );
    +
    177 __HIP_DEVICE__ float3 make_float3(float, float, float );
    +
    178 __HIP_DEVICE__ float4 make_float4(float, float, float, float );
    179 
    -
    180 ONE_COMPONENT_ACCESS (unsigned long long int, ulong1);
    -
    181 TWO_COMPONENT_ACCESS (unsigned long long int, ulong2);
    -
    182 THREE_COMPONENT_ACCESS(unsigned long long int, ulong3);
    -
    183 FOUR_COMPONENT_ACCESS (unsigned long long int, ulong4);
    +
    180 __HIP_DEVICE__ double1 make_double1(double );
    +
    181 __HIP_DEVICE__ double2 make_double2(double, double );
    +
    182 __HIP_DEVICE__ double3 make_double3(double, double, double );
    +
    183 __HIP_DEVICE__ double4 make_double4(double, double, double, double );
    184 
    -
    185 ONE_COMPONENT_ACCESS (unsigned long long int, ulonglong1);
    -
    186 TWO_COMPONENT_ACCESS (unsigned long long int, ulonglong2);
    -
    187 THREE_COMPONENT_ACCESS(unsigned long long int, ulonglong3);
    -
    188 FOUR_COMPONENT_ACCESS (unsigned long long int, ulonglong4);
    -
    189 
    -
    190 
    -
    191 //Floating point
    -
    192 ONE_COMPONENT_ACCESS (float, float1);
    -
    193 TWO_COMPONENT_ACCESS (float, float2);
    -
    194 THREE_COMPONENT_ACCESS(float, float3);
    -
    195 FOUR_COMPONENT_ACCESS (float, float4);
    -
    196 
    -
    197 ONE_COMPONENT_ACCESS (double, double1);
    -
    198 TWO_COMPONENT_ACCESS (double, double2);
    -
    199 THREE_COMPONENT_ACCESS(double, double3);
    -
    200 FOUR_COMPONENT_ACCESS (double, double4);
    -
    201 
    -
    202 
    -
    203 #endif
    -
    204 
    -
    #define ONE_COMPONENT_ACCESS(T, VT)
    Definition: hip_vector_types.h:121
    +
    185 /*
    +
    187 // Inline functions for creating vector types from basic types
    +
    188 #define ONE_COMPONENT_ACCESS(T, VT) inline VT make_ ##VT (T x) { VT t; t.x = x; return t; };
    +
    189 #define TWO_COMPONENT_ACCESS(T, VT) inline VT make_ ##VT (T x, T y) { VT t; t.x=x; t.y=y; return t; };
    +
    190 #define THREE_COMPONENT_ACCESS(T, VT) inline VT make_ ##VT (T x, T y, T z) { VT t; t.x=x; t.y=y; t.z=z; return t; };
    +
    191 #define FOUR_COMPONENT_ACCESS(T, VT) inline VT make_ ##VT (T x, T y, T z, T w) { VT t; t.x=x; t.y=y; t.z=z; t.w=w; return t; };
    +
    192 
    +
    193 
    +
    194 //signed:
    +
    195 ONE_COMPONENT_ACCESS (signed char, char1);
    +
    196 TWO_COMPONENT_ACCESS (signed char, char2);
    +
    197 THREE_COMPONENT_ACCESS(signed char, char3);
    +
    198 FOUR_COMPONENT_ACCESS (signed char, char4);
    +
    199 
    +
    200 ONE_COMPONENT_ACCESS (short, short1);
    +
    201 TWO_COMPONENT_ACCESS (short, short2);
    +
    202 THREE_COMPONENT_ACCESS(short, short3);
    +
    203 FOUR_COMPONENT_ACCESS (short, short4);
    +
    204 
    +
    205 ONE_COMPONENT_ACCESS (int, int1);
    +
    206 TWO_COMPONENT_ACCESS (int, int2);
    +
    207 THREE_COMPONENT_ACCESS(int, int3);
    +
    208 FOUR_COMPONENT_ACCESS (int, int4);
    +
    209 
    +
    210 ONE_COMPONENT_ACCESS (long int, long1);
    +
    211 TWO_COMPONENT_ACCESS (long int, long2);
    +
    212 THREE_COMPONENT_ACCESS(long int, long3);
    +
    213 FOUR_COMPONENT_ACCESS (long int, long4);
    +
    214 
    +
    215 ONE_COMPONENT_ACCESS (long long int, ulong1);
    +
    216 TWO_COMPONENT_ACCESS (long long int, ulong2);
    +
    217 THREE_COMPONENT_ACCESS(long long int, ulong3);
    +
    218 FOUR_COMPONENT_ACCESS (long long int, ulong4);
    +
    219 
    +
    220 ONE_COMPONENT_ACCESS (long long int, longlong1);
    +
    221 TWO_COMPONENT_ACCESS (long long int, longlong2);
    +
    222 THREE_COMPONENT_ACCESS(long long int, longlong3);
    +
    223 FOUR_COMPONENT_ACCESS (long long int, longlong4);
    +
    224 
    +
    225 
    +
    226 // unsigned:
    +
    227 ONE_COMPONENT_ACCESS (unsigned char, uchar1);
    +
    228 TWO_COMPONENT_ACCESS (unsigned char, uchar2);
    +
    229 THREE_COMPONENT_ACCESS(unsigned char, uchar3);
    +
    230 FOUR_COMPONENT_ACCESS (unsigned char, uchar4);
    +
    231 
    +
    232 ONE_COMPONENT_ACCESS (unsigned short, ushort1);
    +
    233 TWO_COMPONENT_ACCESS (unsigned short, ushort2);
    +
    234 THREE_COMPONENT_ACCESS(unsigned short, ushort3);
    +
    235 FOUR_COMPONENT_ACCESS (unsigned short, ushort4);
    +
    236 
    +
    237 ONE_COMPONENT_ACCESS (unsigned int, uint1);
    +
    238 TWO_COMPONENT_ACCESS (unsigned int, uint2);
    +
    239 THREE_COMPONENT_ACCESS(unsigned int, uint3);
    +
    240 FOUR_COMPONENT_ACCESS (unsigned int, uint4);
    +
    241 
    +
    242 ONE_COMPONENT_ACCESS (unsigned long int, ulong1);
    +
    243 TWO_COMPONENT_ACCESS (unsigned long int, ulong2);
    +
    244 THREE_COMPONENT_ACCESS(unsigned long int, ulong3);
    +
    245 FOUR_COMPONENT_ACCESS (unsigned long int, ulong4);
    +
    246 
    +
    247 ONE_COMPONENT_ACCESS (unsigned long long int, ulong1);
    +
    248 TWO_COMPONENT_ACCESS (unsigned long long int, ulong2);
    +
    249 THREE_COMPONENT_ACCESS(unsigned long long int, ulong3);
    +
    250 FOUR_COMPONENT_ACCESS (unsigned long long int, ulong4);
    +
    251 
    +
    252 ONE_COMPONENT_ACCESS (unsigned long long int, ulonglong1);
    +
    253 TWO_COMPONENT_ACCESS (unsigned long long int, ulonglong2);
    +
    254 THREE_COMPONENT_ACCESS(unsigned long long int, ulonglong3);
    +
    255 FOUR_COMPONENT_ACCESS (unsigned long long int, ulonglong4);
    +
    256 
    +
    257 
    +
    258 //Floating point
    +
    259 ONE_COMPONENT_ACCESS (float, float1);
    +
    260 TWO_COMPONENT_ACCESS (float, float2);
    +
    261 THREE_COMPONENT_ACCESS(float, float3);
    +
    262 FOUR_COMPONENT_ACCESS (float, float4);
    +
    263 
    +
    264 ONE_COMPONENT_ACCESS (double, double1);
    +
    265 TWO_COMPONENT_ACCESS (double, double2);
    +
    266 THREE_COMPONENT_ACCESS(double, double3);
    +
    267 FOUR_COMPONENT_ACCESS (double, double4);
    +
    268 */
    +
    269 
    +
    270 #endif
    +
    271 
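In the regenerated source listing, the old ONE_/TWO_/THREE_/FOUR_COMPONENT_ACCESS macros survive only inside a `/* ... */` block. Explicit `__HIP_DEVICE__` declarations such as `make_float2(float, float)` replace them. The following is a minimal host-only C++ sketch of how the macro form produced a `make_*` constructor. It assumes two hypothetical structs, `demo_float2` and `demo_int4`, that stand in for the `hc::short_vector` types HIP typedefs as `float2` and `int4`. It does not include or use the HIP headers.

```cpp
// Host-only sketch. demo_float2 / demo_int4 are HYPOTHETICAL stand-ins for
// the hc::short_vector types that HIP typedefs as float2 / int4.
#include <cassert>

struct demo_float2 { float x, y; };
struct demo_int4   { int x, y, z, w; };

// Same shape as the commented-out macros in hip_vector_types.h:
// "make_ ##VT" pastes the vector type name onto "make_".
// The trailing ';' after the function body is an empty declaration (valid C++11).
#define TWO_COMPONENT_ACCESS(T, VT) inline VT make_ ##VT (T x, T y) { VT t; t.x=x; t.y=y; return t; };
#define FOUR_COMPONENT_ACCESS(T, VT) inline VT make_ ##VT (T x, T y, T z, T w) { VT t; t.x=x; t.y=y; t.z=z; t.w=w; return t; };

// Expands to: inline demo_float2 make_demo_float2(float x, float y) { ... };
TWO_COMPONENT_ACCESS(float, demo_float2)
// Expands to: inline demo_int4 make_demo_int4(int x, int y, int z, int w) { ... };
FOUR_COMPONENT_ACCESS(int, demo_int4)

// Self-check: each generated constructor copies its arguments into x, y, z, w in order.
inline void check_generated_constructors() {
    demo_float2 f = make_demo_float2(1.5f, -2.0f);
    assert(f.x == 1.5f && f.y == -2.0f);
    demo_int4 i = make_demo_int4(1, 2, 3, 4);
    assert(i.x == 1 && i.y == 2 && i.z == 3 && i.w == 4);
}
```

Each macro line defines one inline constructor per vector type. The declaration form that replaces the macros lets the same names be qualified with `__HIP_DEVICE__`, which the new header defines as `__device__ __host__` under `__HCC__` and as empty otherwise.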
    diff --git a/projects/hip/docs/RuntimeAPI/html/hierarchy.html b/projects/hip/docs/RuntimeAPI/html/hierarchy.html
    index 10dfb72c2e..df6964be1a 100644
    --- a/projects/hip/docs/RuntimeAPI/html/hierarchy.html
    +++ b/projects/hip/docs/RuntimeAPI/html/hierarchy.html
    @@ -117,7 +117,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/hip__common_8h_source.html b/projects/hip/docs/RuntimeAPI/html/hip__common_8h_source.html
    index b2940a5a45..e67805a70c 100644
    --- a/projects/hip/docs/RuntimeAPI/html/hip__common_8h_source.html
    +++ b/projects/hip/docs/RuntimeAPI/html/hip__common_8h_source.html
    @@ -4,7 +4,7 @@
    -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hip_common.h Source File
    +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hip_common.h Source File
    @@ -181,7 +181,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/hip__hcc_8cpp.html b/projects/hip/docs/RuntimeAPI/html/hip__hcc_8cpp.html
    index 2243e52457..23737a93e2 100644
    --- a/projects/hip/docs/RuntimeAPI/html/hip__hcc_8cpp.html
    +++ b/projects/hip/docs/RuntimeAPI/html/hip__hcc_8cpp.html
    @@ -4,7 +4,7 @@
    -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/src/hip_hcc.cpp File Reference
    +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/src/hip_hcc.cpp File Reference
    @@ -112,9 +112,6 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    -
    -
    @@ -132,6 +129,9 @@ const char * 
    +
    +
    @@ -153,12 +153,12 @@ void <
    -
    -
    -
    -
    +
    +
    +
    +
    @@ -335,7 +335,7 @@ hsa_agent_t 
    diff --git a/projects/hip/docs/RuntimeAPI/html/hip__texture_8h_source.html b/projects/hip/docs/RuntimeAPI/html/hip__texture_8h_source.html
    index 1ae7203a08..7ee8a7f90f 100644
    --- a/projects/hip/docs/RuntimeAPI/html/hip__texture_8h_source.html
    +++ b/projects/hip/docs/RuntimeAPI/html/hip__texture_8h_source.html
    @@ -4,7 +4,7 @@
    -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_texture.h Source File
    +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_texture.h Source File
    @@ -265,7 +265,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/hip__util_8h_source.html b/projects/hip/docs/RuntimeAPI/html/hip__util_8h_source.html
    index e139a984ef..4fdd3ad2ea 100644
    --- a/projects/hip/docs/RuntimeAPI/html/hip__util_8h_source.html
    +++ b/projects/hip/docs/RuntimeAPI/html/hip__util_8h_source.html
    @@ -4,7 +4,7 @@
    -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_util.h Source File
    +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_util.h Source File
    @@ -128,7 +128,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/hip__vector__types_8h_source.html b/projects/hip/docs/RuntimeAPI/html/hip__vector__types_8h_source.html
    index bf88422fb1..7626a68dc4 100644
    --- a/projects/hip/docs/RuntimeAPI/html/hip__vector__types_8h_source.html
    +++ b/projects/hip/docs/RuntimeAPI/html/hip__vector__types_8h_source.html
    @@ -4,7 +4,7 @@
    -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hip_vector_types.h Source File
    +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hip_vector_types.h Source File
    @@ -128,7 +128,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/host__defines_8h.html b/projects/hip/docs/RuntimeAPI/html/host__defines_8h.html
    index e6a0f4c745..9707d9e03b 100644
    --- a/projects/hip/docs/RuntimeAPI/html/host__defines_8h.html
    +++ b/projects/hip/docs/RuntimeAPI/html/host__defines_8h.html
    @@ -4,7 +4,7 @@
    -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/host_defines.h File Reference
    +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/host_defines.h File Reference
    @@ -139,7 +139,7 @@ Macros
    diff --git a/projects/hip/docs/RuntimeAPI/html/host__defines_8h_source.html b/projects/hip/docs/RuntimeAPI/html/host__defines_8h_source.html
    index d52a1e6c0e..eb19a96656 100644
    --- a/projects/hip/docs/RuntimeAPI/html/host__defines_8h_source.html
    +++ b/projects/hip/docs/RuntimeAPI/html/host__defines_8h_source.html
    @@ -4,7 +4,7 @@
    -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/host_defines.h Source File
    +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/host_defines.h Source File
    @@ -156,7 +156,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/index.html b/projects/hip/docs/RuntimeAPI/html/index.html
    index 620fc7a2dd..95d80acd77 100644
    --- a/projects/hip/docs/RuntimeAPI/html/index.html
    +++ b/projects/hip/docs/RuntimeAPI/html/index.html
    @@ -91,7 +91,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/modules.html b/projects/hip/docs/RuntimeAPI/html/modules.html
    index 4b42aad93c..bea07f6a17 100644
    --- a/projects/hip/docs/RuntimeAPI/html/modules.html
    +++ b/projects/hip/docs/RuntimeAPI/html/modules.html
    @@ -99,7 +99,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/pages.html b/projects/hip/docs/RuntimeAPI/html/pages.html
    index feeb12c8f1..e005f8d396 100644
    --- a/projects/hip/docs/RuntimeAPI/html/pages.html
    +++ b/projects/hip/docs/RuntimeAPI/html/pages.html
    @@ -88,7 +88,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/search/all_10.js b/projects/hip/docs/RuntimeAPI/html/search/all_10.js
    index 7899428a20..bb8f6295d1 100644
    --- a/projects/hip/docs/RuntimeAPI/html/search/all_10.js
    +++ b/projects/hip/docs/RuntimeAPI/html/search/all_10.js
    @@ -1,6 +1,7 @@
     var searchData=
     [
    - ['sharedmemperblock',['sharedMemPerBlock',['../structhipDeviceProp__t.html#a3b9138678a0795c2677eddcfb1c67156',1,'hipDeviceProp_t']]],
    - ['stagingbuffer',['StagingBuffer',['../structStagingBuffer.html',1,'']]],
    - ['stream_20management',['Stream Management',['../group__Stream.html',1,'']]]
    + ['texture_20reference_20management',['Texture Reference Management',['../group__Texture.html',1,'']]],
    + ['texturereference',['textureReference',['../structtextureReference.html',1,'']]],
    + ['totalconstmem',['totalConstMem',['../structhipDeviceProp__t.html#a29880232c56120be3455ce00d5379665',1,'hipDeviceProp_t']]],
    + ['totalglobalmem',['totalGlobalMem',['../structhipDeviceProp__t.html#acedd6a2d23423441e4bf51c4a1b719f9',1,'hipDeviceProp_t']]]
     ];
    diff --git a/projects/hip/docs/RuntimeAPI/html/search/all_11.js b/projects/hip/docs/RuntimeAPI/html/search/all_11.js
    index bb8f6295d1..46a1400a7b 100644
    --- a/projects/hip/docs/RuntimeAPI/html/search/all_11.js
    +++ b/projects/hip/docs/RuntimeAPI/html/search/all_11.js
    @@ -1,7 +1,4 @@
     var searchData=
     [
    - ['texture_20reference_20management',['Texture Reference Management',['../group__Texture.html',1,'']]],
    - ['texturereference',['textureReference',['../structtextureReference.html',1,'']]],
    - ['totalconstmem',['totalConstMem',['../structhipDeviceProp__t.html#a29880232c56120be3455ce00d5379665',1,'hipDeviceProp_t']]],
    -
['totalglobalmem',['totalGlobalMem',['../structhipDeviceProp__t.html#acedd6a2d23423441e4bf51c4a1b719f9',1,'hipDeviceProp_t']]] + ['warpsize',['warpSize',['../structhipDeviceProp__t.html#af3357d33c004608bf05bc21a352be81b',1,'hipDeviceProp_t']]] ]; diff --git a/projects/hip/docs/RuntimeAPI/html/search/all_12.js b/projects/hip/docs/RuntimeAPI/html/search/all_12.js index 46a1400a7b..250c203caf 100644 --- a/projects/hip/docs/RuntimeAPI/html/search/all_12.js +++ b/projects/hip/docs/RuntimeAPI/html/search/all_12.js @@ -1,4 +1,4 @@ var searchData= [ - ['warpsize',['warpSize',['../structhipDeviceProp__t.html#af3357d33c004608bf05bc21a352be81b',1,'hipDeviceProp_t']]] + ['x',['x',['../structdim3.html#ac866c05f83a28dac20a153fc65b3b16c',1,'dim3']]] ]; diff --git a/projects/hip/docs/RuntimeAPI/html/search/all_13.js b/projects/hip/docs/RuntimeAPI/html/search/all_13.js index 250c203caf..133dd9dc6e 100644 --- a/projects/hip/docs/RuntimeAPI/html/search/all_13.js +++ b/projects/hip/docs/RuntimeAPI/html/search/all_13.js @@ -1,4 +1,4 @@ var searchData= [ - ['x',['x',['../structdim3.html#ac866c05f83a28dac20a153fc65b3b16c',1,'dim3']]] + ['y',['y',['../structdim3.html#a83e60e072f7e8bdfde6ac05053cbb370',1,'dim3']]] ]; diff --git a/projects/hip/docs/RuntimeAPI/html/search/all_14.js b/projects/hip/docs/RuntimeAPI/html/search/all_14.js index 133dd9dc6e..e8bf38b99c 100644 --- a/projects/hip/docs/RuntimeAPI/html/search/all_14.js +++ b/projects/hip/docs/RuntimeAPI/html/search/all_14.js @@ -1,4 +1,4 @@ var searchData= [ - ['y',['y',['../structdim3.html#a83e60e072f7e8bdfde6ac05053cbb370',1,'dim3']]] + ['z',['z',['../structdim3.html#a866e38993ecc4e76fd47311236c16b04',1,'dim3']]] ]; diff --git a/projects/hip/docs/RuntimeAPI/html/search/all_15.html b/projects/hip/docs/RuntimeAPI/html/search/all_15.html deleted file mode 100644 index d3b5274ba7..0000000000 --- a/projects/hip/docs/RuntimeAPI/html/search/all_15.html +++ /dev/null @@ -1,26 +0,0 @@ - - - - - - - - - -
    -
    Loading...
    -
    - -
    Searching...
    -
    No Matches
    - -
    - - diff --git a/projects/hip/docs/RuntimeAPI/html/search/all_15.js b/projects/hip/docs/RuntimeAPI/html/search/all_15.js deleted file mode 100644 index e8bf38b99c..0000000000 --- a/projects/hip/docs/RuntimeAPI/html/search/all_15.js +++ /dev/null @@ -1,4 +0,0 @@ -var searchData= -[ - ['z',['z',['../structdim3.html#a866e38993ecc4e76fd47311236c16b04',1,'dim3']]] -]; diff --git a/projects/hip/docs/RuntimeAPI/html/search/all_8.js b/projects/hip/docs/RuntimeAPI/html/search/all_8.js index 223696d7f7..019aa40bc6 100644 --- a/projects/hip/docs/RuntimeAPI/html/search/all_8.js +++ b/projects/hip/docs/RuntimeAPI/html/search/all_8.js @@ -66,6 +66,8 @@ var searchData= ['hipdevicesynchronize',['hipDeviceSynchronize',['../group__Device.html#gaefdc2847fb1d6c3fb1354e827a191ebd',1,'hipDeviceSynchronize(void): hip_device.cpp'],['../group__Device.html#gaefdc2847fb1d6c3fb1354e827a191ebd',1,'hipDeviceSynchronize(void): hip_device.cpp']]], ['hipdrivergetversion',['hipDriverGetVersion',['../group__Version.html#gaf6c342f52d2a29a0aca5cdd89b4dd47c',1,'hipDriverGetVersion(int *driverVersion): hip_peer.cpp'],['../group__Version.html#gaf6c342f52d2a29a0aca5cdd89b4dd47c',1,'hipDriverGetVersion(int *driverVersion): hip_peer.cpp']]], ['hiperror_5ft',['hipError_t',['../group__GlobalDefs.html#gadf5010f6e140a53ecbdf949e73e87594',1,'hip_runtime_api.h']]], + ['hiperrorhostmemoryalreadyregistered',['hipErrorHostMemoryAlreadyRegistered',['../group__GlobalDefs.html#ggadf5010f6e140a53ecbdf949e73e87594a9d7173cea72aace620a83d502569de1b',1,'hip_runtime_api.h']]], + ['hiperrorhostmemorynotregistered',['hipErrorHostMemoryNotRegistered',['../group__GlobalDefs.html#ggadf5010f6e140a53ecbdf949e73e87594a6901476ca88eed786fb8be003d9661d9',1,'hip_runtime_api.h']]], ['hiperrorinitializationerror',['hipErrorInitializationError',['../group__GlobalDefs.html#ggadf5010f6e140a53ecbdf949e73e87594a7e935ae88ee1f9ff3920156ac6864520',1,'hip_runtime_api.h']]], 
['hiperrorinvaliddevice',['hipErrorInvalidDevice',['../group__GlobalDefs.html#ggadf5010f6e140a53ecbdf949e73e87594a07ab9b704ea693c1781a52741c60cd0d',1,'hip_runtime_api.h']]], ['hiperrorinvaliddevicepointer',['hipErrorInvalidDevicePointer',['../group__GlobalDefs.html#ggadf5010f6e140a53ecbdf949e73e87594a37a93fcd2b0aed9bf52b82fa26031e6f',1,'hip_runtime_api.h']]], @@ -134,8 +136,6 @@ var searchData= ['hipmemcpyhosttodevice',['hipMemcpyHostToDevice',['../group__GlobalDefs.html#gga232e222db36b1fc672ba98054d036a18aff32175ecb0c7113200286eff8211008',1,'hip_runtime_api.h']]], ['hipmemcpyhosttohost',['hipMemcpyHostToHost',['../group__GlobalDefs.html#gga232e222db36b1fc672ba98054d036a18a9d66b705aa85a9c83f0f533cef70d0af',1,'hip_runtime_api.h']]], ['hipmemcpykind',['hipMemcpyKind',['../group__GlobalDefs.html#ga232e222db36b1fc672ba98054d036a18',1,'hipMemcpyKind(): hip_runtime_api.h'],['../group__GlobalDefs.html#ga0c04e67413ce030817361f02673e5c85',1,'hipMemcpyKind(): hip_runtime_api.h']]], - ['hipmemcpypeer',['hipMemcpyPeer',['../group__PeerToPeer.html#ga5512f45e25c08052667c8ffe7162333b',1,'hipMemcpyPeer(void *dst, int dstDeviceId, const void *src, int srcDeviceId, size_t sizeBytes): hip_peer.cpp'],['../group__PeerToPeer.html#ga5512f45e25c08052667c8ffe7162333b',1,'hipMemcpyPeer(void *dst, int dstDevice, const void *src, int srcDevice, size_t sizeBytes): hip_peer.cpp']]], - ['hipmemcpypeerasync',['hipMemcpyPeerAsync',['../group__PeerToPeer.html#ga216f951370c931d22e80c089ab724ed9',1,'hipMemcpyPeerAsync(void *dst, int dstDevice, const void *src, int srcDevice, size_t sizeBytes, hipStream_t stream): hip_peer.cpp'],['../group__PeerToPeer.html#ga216f951370c931d22e80c089ab724ed9',1,'hipMemcpyPeerAsync(void *dst, int dstDevice, const void *src, int srcDevice, size_t sizeBytes, hipStream_t stream): hip_peer.cpp']]], ['hipmemcpytosymbol',['hipMemcpyToSymbol',['../group__Memory.html#ga131ac5c1ba04e186112491cb9bf964bc',1,'hipMemcpyToSymbol(const char *symbolName, const void *src, size_t 
sizeBytes, size_t offset, hipMemcpyKind kind): hip_memory.cpp'],['../group__Memory.html#ga131ac5c1ba04e186112491cb9bf964bc',1,'hipMemcpyToSymbol(const char *symbolName, const void *src, size_t count, size_t offset, hipMemcpyKind kind): hip_memory.cpp']]], ['hipmemgetinfo',['hipMemGetInfo',['../group__Memory.html#ga311c3e246a21590de14478b8bd063be2',1,'hipMemGetInfo(size_t *free, size_t *total): hip_memory.cpp'],['../group__Memory.html#ga311c3e246a21590de14478b8bd063be2',1,'hipMemGetInfo(size_t *free, size_t *total): hip_memory.cpp']]], ['hipmemset',['hipMemset',['../group__Memory.html#gac7441e74affcce4b8b69dba996c5ebc4',1,'hipMemset(void *dst, int value, size_t sizeBytes): hip_memory.cpp'],['../group__Memory.html#gac7441e74affcce4b8b69dba996c5ebc4',1,'hipMemset(void *dst, int value, size_t sizeBytes): hip_memory.cpp']]], diff --git a/projects/hip/docs/RuntimeAPI/html/search/all_d.js b/projects/hip/docs/RuntimeAPI/html/search/all_d.js index 3eaae3688b..71b6a5df56 100644 --- a/projects/hip/docs/RuntimeAPI/html/search/all_d.js +++ b/projects/hip/docs/RuntimeAPI/html/search/all_d.js @@ -1,4 +1,5 @@ var searchData= [ - ['one_5fcomponent_5faccess',['ONE_COMPONENT_ACCESS',['../hcc__detail_2hip__vector__types_8h.html#add5d9c0f058c5a52c2b9165a66035d0e',1,'hip_vector_types.h']]] + ['pcibusid',['pciBusID',['../structhipDeviceProp__t.html#a1350f64d49b717ed3a06458f7549ccb0',1,'hipDeviceProp_t']]], + ['pcideviceid',['pciDeviceID',['../structhipDeviceProp__t.html#ae6aa845dc2d540f85098ea30be35f4eb',1,'hipDeviceProp_t']]] ]; diff --git a/projects/hip/docs/RuntimeAPI/html/search/all_e.js b/projects/hip/docs/RuntimeAPI/html/search/all_e.js index 71b6a5df56..44ba50e0b7 100644 --- a/projects/hip/docs/RuntimeAPI/html/search/all_e.js +++ b/projects/hip/docs/RuntimeAPI/html/search/all_e.js @@ -1,5 +1,4 @@ var searchData= [ - ['pcibusid',['pciBusID',['../structhipDeviceProp__t.html#a1350f64d49b717ed3a06458f7549ccb0',1,'hipDeviceProp_t']]], - 
['pcideviceid',['pciDeviceID',['../structhipDeviceProp__t.html#ae6aa845dc2d540f85098ea30be35f4eb',1,'hipDeviceProp_t']]] + ['regsperblock',['regsPerBlock',['../structhipDeviceProp__t.html#a73c1c21648a901799ff6bef83c11135b',1,'hipDeviceProp_t']]] ]; diff --git a/projects/hip/docs/RuntimeAPI/html/search/all_f.js b/projects/hip/docs/RuntimeAPI/html/search/all_f.js index 44ba50e0b7..7899428a20 100644 --- a/projects/hip/docs/RuntimeAPI/html/search/all_f.js +++ b/projects/hip/docs/RuntimeAPI/html/search/all_f.js @@ -1,4 +1,6 @@ var searchData= [ - ['regsperblock',['regsPerBlock',['../structhipDeviceProp__t.html#a73c1c21648a901799ff6bef83c11135b',1,'hipDeviceProp_t']]] + ['sharedmemperblock',['sharedMemPerBlock',['../structhipDeviceProp__t.html#a3b9138678a0795c2677eddcfb1c67156',1,'hipDeviceProp_t']]], + ['stagingbuffer',['StagingBuffer',['../structStagingBuffer.html',1,'']]], + ['stream_20management',['Stream Management',['../group__Stream.html',1,'']]] ]; diff --git a/projects/hip/docs/RuntimeAPI/html/search/defines_2.html b/projects/hip/docs/RuntimeAPI/html/search/defines_2.html deleted file mode 100644 index 6ef4b980d7..0000000000 --- a/projects/hip/docs/RuntimeAPI/html/search/defines_2.html +++ /dev/null @@ -1,26 +0,0 @@ - - - - - - - - - -
    -
    Loading...
    -
    - -
    Searching...
    -
    No Matches
    - -
    - - diff --git a/projects/hip/docs/RuntimeAPI/html/search/defines_2.js b/projects/hip/docs/RuntimeAPI/html/search/defines_2.js deleted file mode 100644 index 3eaae3688b..0000000000 --- a/projects/hip/docs/RuntimeAPI/html/search/defines_2.js +++ /dev/null @@ -1,4 +0,0 @@ -var searchData= -[ - ['one_5fcomponent_5faccess',['ONE_COMPONENT_ACCESS',['../hcc__detail_2hip__vector__types_8h.html#add5d9c0f058c5a52c2b9165a66035d0e',1,'hip_vector_types.h']]] -]; diff --git a/projects/hip/docs/RuntimeAPI/html/search/enumvalues_0.js b/projects/hip/docs/RuntimeAPI/html/search/enumvalues_0.js index 2c6c17cf36..c79bf0c947 100644 --- a/projects/hip/docs/RuntimeAPI/html/search/enumvalues_0.js +++ b/projects/hip/docs/RuntimeAPI/html/search/enumvalues_0.js @@ -25,6 +25,8 @@ var searchData= ['hipdeviceattributepcideviceid',['hipDeviceAttributePciDeviceId',['../group__GlobalDefs.html#ggacc0acd7b9bda126c6bb3dfd6e2796d7ca955d90286e87be9e3528f0b817ab32ff',1,'hip_runtime_api.h']]], ['hipdeviceattributetotalconstantmemory',['hipDeviceAttributeTotalConstantMemory',['../group__GlobalDefs.html#ggacc0acd7b9bda126c6bb3dfd6e2796d7cac6089ac3a0f9c77cc382fb0eaa73ae9c',1,'hip_runtime_api.h']]], ['hipdeviceattributewarpsize',['hipDeviceAttributeWarpSize',['../group__GlobalDefs.html#ggacc0acd7b9bda126c6bb3dfd6e2796d7caffd94133e823247a6f1215343232f6ec',1,'hip_runtime_api.h']]], + ['hiperrorhostmemoryalreadyregistered',['hipErrorHostMemoryAlreadyRegistered',['../group__GlobalDefs.html#ggadf5010f6e140a53ecbdf949e73e87594a9d7173cea72aace620a83d502569de1b',1,'hip_runtime_api.h']]], + ['hiperrorhostmemorynotregistered',['hipErrorHostMemoryNotRegistered',['../group__GlobalDefs.html#ggadf5010f6e140a53ecbdf949e73e87594a6901476ca88eed786fb8be003d9661d9',1,'hip_runtime_api.h']]], ['hiperrorinitializationerror',['hipErrorInitializationError',['../group__GlobalDefs.html#ggadf5010f6e140a53ecbdf949e73e87594a7e935ae88ee1f9ff3920156ac6864520',1,'hip_runtime_api.h']]], 
['hiperrorinvaliddevice',['hipErrorInvalidDevice',['../group__GlobalDefs.html#ggadf5010f6e140a53ecbdf949e73e87594a07ab9b704ea693c1781a52741c60cd0d',1,'hip_runtime_api.h']]], ['hiperrorinvaliddevicepointer',['hipErrorInvalidDevicePointer',['../group__GlobalDefs.html#ggadf5010f6e140a53ecbdf949e73e87594a37a93fcd2b0aed9bf52b82fa26031e6f',1,'hip_runtime_api.h']]], diff --git a/projects/hip/docs/RuntimeAPI/html/search/functions_0.js b/projects/hip/docs/RuntimeAPI/html/search/functions_0.js index fc5d49f443..5299b02a40 100644 --- a/projects/hip/docs/RuntimeAPI/html/search/functions_0.js +++ b/projects/hip/docs/RuntimeAPI/html/search/functions_0.js @@ -39,8 +39,6 @@ var searchData= ['hipmallochost',['hipMallocHost',['../group__Memory.html#gad3d3cdf82eb0058fc9eac1f939cd9d30',1,'hipMallocHost(void **ptr, size_t size) __attribute__((deprecated("use hipHostMalloc instead"))): hip_memory.cpp'],['../group__Memory.html#gad3d3cdf82eb0058fc9eac1f939cd9d30',1,'hipMallocHost(void **ptr, size_t sizeBytes): hip_memory.cpp']]], ['hipmemcpy',['hipMemcpy',['../group__Memory.html#gac1a055d288302edd641c6d7416858e1e',1,'hipMemcpy(void *dst, const void *src, size_t sizeBytes, hipMemcpyKind kind): hip_memory.cpp'],['../group__Memory.html#gac1a055d288302edd641c6d7416858e1e',1,'hipMemcpy(void *dst, const void *src, size_t sizeBytes, hipMemcpyKind kind): hip_memory.cpp']]], ['hipmemcpyasync',['hipMemcpyAsync',['../group__Memory.html#gad55fa9f5980b711bc93c52820149ba18',1,'hipMemcpyAsync(void *dst, const void *src, size_t sizeBytes, hipMemcpyKind kind, hipStream_t stream): hip_memory.cpp'],['../group__Memory.html#gad55fa9f5980b711bc93c52820149ba18',1,'hipMemcpyAsync(void *dst, const void *src, size_t sizeBytes, hipMemcpyKind kind, hipStream_t stream): hip_memory.cpp']]], - ['hipmemcpypeer',['hipMemcpyPeer',['../group__PeerToPeer.html#ga5512f45e25c08052667c8ffe7162333b',1,'hipMemcpyPeer(void *dst, int dstDeviceId, const void *src, int srcDeviceId, size_t sizeBytes): 
hip_peer.cpp'],['../group__PeerToPeer.html#ga5512f45e25c08052667c8ffe7162333b',1,'hipMemcpyPeer(void *dst, int dstDevice, const void *src, int srcDevice, size_t sizeBytes): hip_peer.cpp']]], - ['hipmemcpypeerasync',['hipMemcpyPeerAsync',['../group__PeerToPeer.html#ga216f951370c931d22e80c089ab724ed9',1,'hipMemcpyPeerAsync(void *dst, int dstDevice, const void *src, int srcDevice, size_t sizeBytes, hipStream_t stream): hip_peer.cpp'],['../group__PeerToPeer.html#ga216f951370c931d22e80c089ab724ed9',1,'hipMemcpyPeerAsync(void *dst, int dstDevice, const void *src, int srcDevice, size_t sizeBytes, hipStream_t stream): hip_peer.cpp']]], ['hipmemcpytosymbol',['hipMemcpyToSymbol',['../group__Memory.html#ga131ac5c1ba04e186112491cb9bf964bc',1,'hipMemcpyToSymbol(const char *symbolName, const void *src, size_t sizeBytes, size_t offset, hipMemcpyKind kind): hip_memory.cpp'],['../group__Memory.html#ga131ac5c1ba04e186112491cb9bf964bc',1,'hipMemcpyToSymbol(const char *symbolName, const void *src, size_t count, size_t offset, hipMemcpyKind kind): hip_memory.cpp']]], ['hipmemgetinfo',['hipMemGetInfo',['../group__Memory.html#ga311c3e246a21590de14478b8bd063be2',1,'hipMemGetInfo(size_t *free, size_t *total): hip_memory.cpp'],['../group__Memory.html#ga311c3e246a21590de14478b8bd063be2',1,'hipMemGetInfo(size_t *free, size_t *total): hip_memory.cpp']]], ['hipmemset',['hipMemset',['../group__Memory.html#gac7441e74affcce4b8b69dba996c5ebc4',1,'hipMemset(void *dst, int value, size_t sizeBytes): hip_memory.cpp'],['../group__Memory.html#gac7441e74affcce4b8b69dba996c5ebc4',1,'hipMemset(void *dst, int value, size_t sizeBytes): hip_memory.cpp']]], diff --git a/projects/hip/docs/RuntimeAPI/html/search/search.js b/projects/hip/docs/RuntimeAPI/html/search/search.js index 19b5f78809..4b4647b86f 100644 --- a/projects/hip/docs/RuntimeAPI/html/search/search.js +++ b/projects/hip/docs/RuntimeAPI/html/search/search.js @@ -7,7 +7,7 @@ var indexSectionsWithContent = { - 0: "_abcdefghilmnoprstwxyz", + 0: 
"_abcdefghilmnprstwxyz", 1: "dfhilst", 2: "h", 3: "h", @@ -15,7 +15,7 @@ var indexSectionsWithContent = 5: "dh", 6: "h", 7: "h", - 8: "_ho", + 8: "_h", 9: "cdeghmst", 10: "bh" }; diff --git a/projects/hip/docs/RuntimeAPI/html/staging__buffer_8h_source.html b/projects/hip/docs/RuntimeAPI/html/staging__buffer_8h_source.html index ab3355af51..4d6304d81d 100644 --- a/projects/hip/docs/RuntimeAPI/html/staging__buffer_8h_source.html +++ b/projects/hip/docs/RuntimeAPI/html/staging__buffer_8h_source.html @@ -4,7 +4,7 @@ -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/staging_buffer.h Source File +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/staging_buffer.h Source File @@ -141,23 +141,26 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    50  void CopyDeviceToHost (void* dst, const void* src, size_t sizeBytes, hsa_signal_t *waitFor);
    51  void CopyDeviceToHostPinInPlace(void* dst, const void* src, size_t sizeBytes, hsa_signal_t *waitFor);
    52 
    -
    53 
    -
    54 private:
    -
    55  hsa_agent_t _hsa_agent;
    -
    56  size_t _bufferSize; // Size of the buffers.
    -
    57  int _numBuffers;
    -
    58 
    -
    59  char *_pinnedStagingBuffer[_max_buffers];
    -
    60  hsa_signal_t _completion_signal[_max_buffers];
    -
    61  std::mutex _copy_lock; // provide thread-safe access
    -
    62 };
    -
    63 
    -
    64 #endif
    +
    53  void CopyPeerToPeer( void* dst, hsa_agent_t dstAgent, const void* src, hsa_agent_t srcAgent, size_t sizeBytes, hsa_signal_t *waitFor);
    +
    54 
    +
    55 
    +
    56 private:
    +
    57  hsa_agent_t _hsa_agent;
    +
    58  size_t _bufferSize; // Size of the buffers.
    +
    59  int _numBuffers;
    +
    60 
    +
    61  char *_pinnedStagingBuffer[_max_buffers];
    +
    62  hsa_signal_t _completion_signal[_max_buffers];
    +
    63  hsa_signal_t _completion_signal2[_max_buffers]; // P2P needs another set of signals.
    +
    64  std::mutex _copy_lock; // provide thread-safe access
    +
    65 };
    +
    66 
    +
    67 #endif
    Definition: staging_buffer.h:40
    diff --git a/projects/hip/docs/RuntimeAPI/html/structLockedBase-members.html b/projects/hip/docs/RuntimeAPI/html/structLockedBase-members.html index ae686aeb6e..59f65b693a 100644 --- a/projects/hip/docs/RuntimeAPI/html/structLockedBase-members.html +++ b/projects/hip/docs/RuntimeAPI/html/structLockedBase-members.html @@ -96,7 +96,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');

    Macros

    -#define HIP_HCC
     
    #define DeviceErrorCheck(x)   if (x != HSA_STATUS_SUCCESS) { return hipErrorInvalidDevice; }
     
    ihipErrorStri
    bool ihipIsValidDevice (unsigned deviceIndex)
     
    +ihipDevice_tgetDevice (unsigned deviceIndex)
     
    void error_check (hsa_status_t hsa_error_code, int line_num, std::string str)
     
    ihipInit ()
    hipStream_t ihipSyncAndResolveStream (hipStream_t stream)
     
    -hipStream_t ihipPreLaunchKernel (hipStream_t stream, hc::accelerator_view **av)
     
    -void ihipPostLaunchKernel (hipStream_t stream, hc::completion_future &kernelFuture)
     
    +hipStream_t ihipPreLaunchKernel (hipStream_t stream, grid_launch_parm *lp)
     
    +void ihipPostLaunchKernel (hipStream_t stream, grid_launch_parm &lp)
     
    void ihipSetTs (hipEvent_t e)
     
    g_cpu_agent diff --git a/projects/hip/docs/RuntimeAPI/html/hip__hcc_8h_source.html b/projects/hip/docs/RuntimeAPI/html/hip__hcc_8h_source.html index d8fac6f79f..bb7a4b6999 100644 --- a/projects/hip/docs/RuntimeAPI/html/hip__hcc_8h_source.html +++ b/projects/hip/docs/RuntimeAPI/html/hip__hcc_8h_source.html @@ -4,7 +4,7 @@ -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_hcc.h Source File +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_hcc.h Source File @@ -115,702 +115,707 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    24 #include "hip/hcc_detail/hip_util.h"
    25 #include "hip/hcc_detail/staging_buffer.h"
    26 
    -
    27 #define HIP_HCC
    -
    28 
    -
    29 #if defined(__HCC__) && (__hcc_workweek__ < 1502)
    -
    30 #error("This version of HIP requires a newer version of HCC.");
    -
    31 #endif
    -
    32 
    -
    33 // #define USE_MEMCPYTOSYMBOL
    -
    34 //
    -
    35 //Use the new HCC accelerator_view::copy instead of am_copy
    -
    36 #define USE_AV_COPY 0
    -
    37 
    -
    38 // Compile peer-to-peer support.
    -
    39 // >= 2 : use HCC hc:accelerator::get_is_peer
    -
    40 // >= 3 : use hc::am_memtracker_update_peers(...)
    -
    41 #define USE_PEER_TO_PEER 0
    -
    42 
    -
    43 // Use new lock API in HCC:
    -
    44 #define USE_HCC_LOCK 0
    +
    27 
    +
    28 #if defined(__HCC__) && (__hcc_workweek__ < 16155)
    +
    29 #error("This version of HIP requires a newer version of HCC.");
    +
    30 #endif
    +
    31 
    +
    32 // #define USE_MEMCPYTOSYMBOL
    +
    33 //
    +
    34 //Use the new HCC accelerator_view::copy instead of am_copy
    +
    35 #define USE_AV_COPY 0
    +
    36 
    +
    37 // Compile peer-to-peer support.
    +
    38 // >= 2 : use HCC hc:accelerator::get_is_peer
    +
    39 // >= 3 : use hc::am_memtracker_update_peers(...)
    +
    40 #define USE_PEER_TO_PEER 2
    +
    41 
    +
    42 
    +
    43 // Use new am_memory_host_lock APIs:
    +
    44 #define USE_HCC_LOCK_API 1
    45 
    -
    46 //#define INLINE static inline
    -
    47 
    -
    48 //---
    -
    49 // Environment variables:
    -
    50 
    -
    51 // Intended to distinguish whether an environment variable should be visible only in debug mode, or in debug+release.
    -
    52 //static const int debug = 0;
    -
    53 extern const int release;
    -
    54 
    -
    55 extern int HIP_LAUNCH_BLOCKING;
    -
    56 
    -
    57 extern int HIP_PRINT_ENV;
    -
    58 extern int HIP_ATP_MARKER;
    -
    59 //extern int HIP_TRACE_API;
    -
    60 extern int HIP_ATP;
    -
    61 extern int HIP_DB;
    -
    62 extern int HIP_STAGING_SIZE; /* size of staging buffers, in KB */
    -
    63 extern int HIP_STAGING_BUFFERS; // TODO - remove, two buffers should be enough.
    -
    64 extern int HIP_PININPLACE;
    -
    65 extern int HIP_STREAM_SIGNALS; /* number of signals to allocate at stream creation */
    -
    66 extern int HIP_VISIBLE_DEVICES; /* Contains a comma-separated sequence of GPU identifiers */
    +
    46 
    +
    47 //---
    +
    48 // Environment variables:
    +
    49 
    +
    50 // Intended to distinguish whether an environment variable should be visible only in debug mode, or in debug+release.
    +
    51 //static const int debug = 0;
    +
    52 extern const int release;
    +
    53 
    +
    54 extern int HIP_LAUNCH_BLOCKING;
    +
    55 
    +
    56 extern int HIP_PRINT_ENV;
    +
    57 extern int HIP_ATP_MARKER;
    +
    58 //extern int HIP_TRACE_API;
    +
    59 extern int HIP_ATP;
    +
    60 extern int HIP_DB;
    +
    61 extern int HIP_STAGING_SIZE; /* size of staging buffers, in KB */
    +
    62 extern int HIP_STAGING_BUFFERS; // TODO - remove, two buffers should be enough.
    +
    63 extern int HIP_PININPLACE;
    +
    64 extern int HIP_STREAM_SIGNALS; /* number of signals to allocate at stream creation */
    +
    65 extern int HIP_VISIBLE_DEVICES; /* Contains a comma-separated sequence of GPU identifiers */
    +
    66 
    67 
    -
    68 
    -
    69 //---
    -
    70 // Chicken bits for disabling functionality to work around potential issues:
    -
    71 extern int HIP_DISABLE_HW_KERNEL_DEP;
    -
    72 extern int HIP_DISABLE_HW_COPY_DEP;
    -
    73 
    -
    74 extern thread_local int tls_defaultDevice;
    -
    75 extern thread_local hipError_t tls_lastHipError;
    -
    76 class ihipStream_t;
    -
    77 class ihipDevice_t;
    +
    68 //---
    +
    69 // Chicken bits for disabling functionality to work around potential issues:
    +
    70 extern int HIP_DISABLE_HW_KERNEL_DEP;
    +
    71 extern int HIP_DISABLE_HW_COPY_DEP;
    +
    72 
    +
    73 extern thread_local int tls_defaultDevice;
    +
    74 extern thread_local hipError_t tls_lastHipError;
    +
    75 class ihipStream_t;
    +
    76 class ihipDevice_t;
    +
    77 
    78 
    -
    79 
    -
    80 // Color defs for debug messages:
    -
    81 #define KNRM "\x1B[0m"
    -
    82 #define KRED "\x1B[31m"
    -
    83 #define KGRN "\x1B[32m"
    -
    84 #define KYEL "\x1B[33m"
    -
    85 #define KBLU "\x1B[34m"
    -
    86 #define KMAG "\x1B[35m"
    -
    87 #define KCYN "\x1B[36m"
    -
    88 #define KWHT "\x1B[37m"
    -
    89 
    -
    90 #define API_COLOR KGRN
    -
    91 
    -
    92 
    -
    93 #define HIP_HCC
    -
    94 
    -
    95 // If set, thread-safety is enforced on all stream functions.
    -
    96 // Stream functions will acquire a mutex before entering critical sections.
    -
    97 #define STREAM_THREAD_SAFE 1
    +
    79 // Color defs for debug messages:
    +
    80 #define KNRM "\x1B[0m"
    +
    81 #define KRED "\x1B[31m"
    +
    82 #define KGRN "\x1B[32m"
    +
    83 #define KYEL "\x1B[33m"
    +
    84 #define KBLU "\x1B[34m"
    +
    85 #define KMAG "\x1B[35m"
    +
    86 #define KCYN "\x1B[36m"
    +
    87 #define KWHT "\x1B[37m"
    +
    88 
    +
    89 #define API_COLOR KGRN
    +
    90 
    +
    91 
    +
    92 // If set, thread-safety is enforced on all stream functions.
    +
    93 // Stream functions will acquire a mutex before entering critical sections.
    +
    94 #define STREAM_THREAD_SAFE 1
    +
    95 
    +
    96 
    +
    97 #define DEVICE_THREAD_SAFE 1
    98 
    -
    99 
    -
    100 #define DEVICE_THREAD_SAFE 1
    -
    101 
    -
    102 // If FORCE_COPY_DEP=1 , HIP runtime will add
    -
    103 // synchronization for copy commands in the same stream, regardless of command type.
    -
    104 // If FORCE_COPY_DEP=0 data copies of the same kind (H2H, H2D, D2H, D2D) are assumed to be implicitly ordered.
    -
    105 // ROCR runtime implementation currently provides this guarantee when using SDMA queues but not
    -
    106 // when using shader queues.
    -
    107 // TODO - measure if this matters for performance, in particular for back-to-back small copies.
    -
    108 // If not, we can simplify the copy dependency tracking by collapsing to a single Copy type, and always forcing dependencies for copy commands.
    -
    109 #define FORCE_SAMEDIR_COPY_DEP 1
    -
    110 
    -
    111 
    -
    112 // Compile debug trace mode - this prints debug messages to stderr when env var HIP_DB is set.
    -
    113 // May be set to 0 to remove debug if checks - possible code size and performance difference?
    -
    114 #define COMPILE_HIP_DB 1
    -
    115 
    -
    116 
    -
    117 // Compile HIP tracing capability.
    -
    118 // 0x1 = print a string at function entry with arguments.
    -
    119 // 0x2 = prints a simple message with function name + return code when function exits.
    -
    120 // 0x3 = print both.
    -
    121 // Must be enabled at runtime with HIP_TRACE_API
    -
    122 #define COMPILE_HIP_TRACE_API 0x3
    -
    123 
    -
    124 
    -
    125 // Compile code that generates trace markers for CodeXL ATP at HIP function begin/end.
    -
    126 // ATP is standard CodeXL format that includes timestamps for kernels, HSA RT APIs, and HIP APIs.
    -
    127 #ifndef COMPILE_HIP_ATP_MARKER
    -
    128 #define COMPILE_HIP_ATP_MARKER 0
    -
    129 #endif
    -
    130 
    -
    131 
    -
    132 // #include CPP files to produce one object file
    -
    133 #define ONE_OBJECT_FILE 0
    -
    134 
    -
    135 
    -
    136 // Compile support for trace markers that are displayed on CodeXL GUI at start/stop of each function boundary.
    -
    137 // TODO - currently we print the trace message at the beginning. if we waited, we could also include return codes, and any values returned
    -
    138 // through ptr-to-args (ie the pointers allocated by hipMalloc).
    -
    139 #if COMPILE_HIP_ATP_MARKER
    -
    140 #include "AMDTActivityLogger.h"
    -
    141 #define SCOPED_MARKER(markerName,group,userString) amdtScopedMarker(markerName, group, userString)
    -
    142 #else
    -
    143 // Swallow scoped markers:
    -
    144 #define SCOPED_MARKER(markerName,group,userString)
    -
    145 #endif
    -
    146 
    -
    147 
    -
    148 #if COMPILE_HIP_ATP_MARKER || (COMPILE_HIP_TRACE_API & 0x1)
    -
    149 #define API_TRACE(...)\
    -
    150 {\
    -
    151  if (HIP_ATP_MARKER || (COMPILE_HIP_DB && HIP_TRACE_API)) {\
    -
    152  std::string s = std::string(__func__) + " (" + ToString(__VA_ARGS__) + ')';\
    -
    153  if (COMPILE_HIP_DB && HIP_TRACE_API) {\
    -
    154  fprintf (stderr, API_COLOR "<<hip-api: %s\n" KNRM, s.c_str());\
    -
    155  }\
    -
    156  SCOPED_MARKER(s.c_str(), "HIP", NULL);\
    -
    157  }\
    -
    158 }
    -
    159 #else
    -
    160 // Swallow API_TRACE
    -
    161 #define API_TRACE(...)
    -
    162 #endif
    -
    163 
    -
    164 
    -
    165 
    -
    166 // This macro should be called at the beginning of every HIP API.
    -
    167 // It initialies the hip runtime (exactly once), and
    -
    168 // generate trace string that can be output to stderr or to ATP file.
    -
    169 #define HIP_INIT_API(...) \
    -
    170  std::call_once(hip_initialized, ihipInit);\
    -
    171  API_TRACE(__VA_ARGS__);
    -
    172 
    -
    173 #define ihipLogStatus(_hip_status) \
    -
    174  ({\
    -
    175  hipError_t _local_hip_status = _hip_status; /*local copy so _hip_status only evaluated once*/ \
    -
    176  tls_lastHipError = _local_hip_status;\
    -
    177  \
    -
    178  if ((COMPILE_HIP_TRACE_API & 0x2) && HIP_TRACE_API) {\
    -
    179  fprintf(stderr, " %ship-api: %-30s ret=%2d (%s)>>\n" KNRM, (_local_hip_status == 0) ? API_COLOR:KRED, __func__, _local_hip_status, ihipErrorString(_local_hip_status));\
    -
    180  }\
    -
    181  _local_hip_status;\
    -
    182  })
    -
    183 
    -
    184 
    -
    185 
    -
    186 
    -
    187 //---
    -
    188 //HIP_DB Debug flags:
    -
    189 #define DB_API 0 /* 0x01 - shortcut to enable HIP_TRACE_API on single switch */
    -
    190 #define DB_SYNC 1 /* 0x02 - trace synchronization pieces */
    -
    191 #define DB_MEM 2 /* 0x04 - trace memory allocation / deallocation */
    -
    192 #define DB_COPY1 3 /* 0x08 - trace memory copy commands. . */
    -
    193 #define DB_SIGNAL 4 /* 0x10 - trace signal pool commands */
    -
    194 #define DB_COPY2 5 /* 0x20 - trace memory copy commands. Detailed. */
    -
    195 // When adding a new debug flag, also add to the char name table below.
    -
    196 
    -
    197 static const char *dbName [] =
    -
    198 {
    -
    199  KNRM "hip-api", // not used,
    -
    200  KYEL "hip-sync",
    -
    201  KCYN "hip-mem",
    -
    202  KMAG "hip-copy1",
    -
    203  KRED "hip-signal",
    -
    204  KNRM "hip-copy2",
    -
    205 };
    -
    206 
    -
    207 #if COMPILE_HIP_DB
    -
    208 #define tprintf(trace_level, ...) {\
    -
    209  if (HIP_DB & (1<<(trace_level))) {\
    -
    210  fprintf (stderr, " %s:", dbName[trace_level]); \
    -
    211  fprintf (stderr, __VA_ARGS__);\
    -
    212  fprintf (stderr, "%s", KNRM); \
    -
    213  }\
    -
    214 }
    -
    215 #else
    -
    216 /* Compile to empty code */
    -
    217 #define tprintf(trace_level, ...)
    -
    218 #endif
    -
    219 
    -
    220 class ihipException : public std::exception
    -
    221 {
    -
    222 public:
    -
    223  ihipException(hipError_t e) : _code(e) {};
    +
    99 // If FORCE_SAMEDIR_COPY_DEP=1, the HIP runtime will add
    +
    100 // synchronization for copy commands in the same stream, regardless of command type.
    +
    101 // If FORCE_SAMEDIR_COPY_DEP=0, data copies of the same kind (H2H, H2D, D2H, D2D) are assumed to be implicitly ordered.
    +
    102 // ROCR runtime implementation currently provides this guarantee when using SDMA queues but not
    +
    103 // when using shader queues.
    +
    104 // TODO - measure if this matters for performance, in particular for back-to-back small copies.
    +
    105 // If not, we can simplify the copy dependency tracking by collapsing to a single Copy type, and always forcing dependencies for copy commands.
    +
    106 #define FORCE_SAMEDIR_COPY_DEP 1
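The FORCE_SAMEDIR_COPY_DEP comment describes when a copy must wait on the previous copy in the same stream. The decision can be sketched as a pure function. The `CopyKind` enum and `needsCopyDependency` helper are hypothetical illustrations, not the runtime's actual code; the real logic sits in `preCopyCommand` and may consider more state.

```cpp
// Local, hypothetical mirror of the four copy directions named above.
enum class CopyKind { H2H, H2D, D2H, D2D };

// Hypothetical helper: must a new copy wait on the previous copy that was
// enqueued to the same stream?
//  - forceSameDirDep == true  (FORCE_SAMEDIR_COPY_DEP=1): always synchronize.
//  - forceSameDirDep == false (FORCE_SAMEDIR_COPY_DEP=0): copies of the same
//    kind are assumed to be ordered by the queue, so only a change of
//    direction needs an explicit dependency.
constexpr bool needsCopyDependency(CopyKind last, CopyKind next, bool forceSameDirDep)
{
    return forceSameDirDep || last != next;
}
```

The TODO above asks whether the relaxed mode is worth keeping. If it is not, this helper collapses to always returning true, which is the "single Copy type" simplification the comment mentions.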
    +
    107 
    +
    108 
    +
    109 // Compile debug trace mode - this prints debug messages to stderr when env var HIP_DB is set.
    +
    110 // May be set to 0 to compile out the debug if-checks, possibly reducing code size and improving performance.
    +
    111 #define COMPILE_HIP_DB 1
    +
    112 
    +
    113 
    +
    114 // Compile HIP tracing capability.
    +
    115 // 0x1 = print a string at function entry with arguments.
    +
    116 // 0x2 = prints a simple message with function name + return code when function exits.
    +
    117 // 0x3 = print both.
    +
    118 // Must be enabled at runtime with HIP_TRACE_API
    +
    119 #define COMPILE_HIP_TRACE_API 0x3
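COMPILE_HIP_TRACE_API is a two-bit mask: 0x1 enables the entry message, 0x2 enables the exit message, and 0x3 enables both. This sketch shows how such a mask is tested; the constant and function names are hypothetical.

```cpp
// Hypothetical names for the two bits documented above.
constexpr unsigned kTraceEntry = 0x1; // function name + arguments on entry
constexpr unsigned kTraceExit  = 0x2; // function name + return code on exit

constexpr bool traceEntryEnabled(unsigned mask) { return (mask & kTraceEntry) != 0; }
constexpr bool traceExitEnabled(unsigned mask)  { return (mask & kTraceExit) != 0; }
```

In the header, the entry bit is tested by the preprocessor (`#if ... (COMPILE_HIP_TRACE_API & 0x1)`), so clearing it removes API_TRACE's body entirely. The exit bit is tested in an ordinary `if` inside ihipLogStatus, which the compiler can fold away because the mask is a compile-time constant.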
    +
    120 
    +
    121 
    +
    122 // Compile code that generates trace markers for CodeXL ATP at HIP function begin/end.
    +
    123 // ATP is standard CodeXL format that includes timestamps for kernels, HSA RT APIs, and HIP APIs.
    +
    124 #ifndef COMPILE_HIP_ATP_MARKER
    +
    125 #define COMPILE_HIP_ATP_MARKER 0
    +
    126 #endif
    +
    127 
    +
    128 
    +
    129 // #include CPP files to produce one object file
    +
    130 #define ONE_OBJECT_FILE 0
    +
    131 
    +
    132 
    +
    133 // Compile support for trace markers that are displayed on CodeXL GUI at start/stop of each function boundary.
    +
    134 // TODO - currently we print the trace message at the beginning. If we waited, we could also include return codes and any values returned
    +
    135 // through ptr-to-args (i.e. the pointers allocated by hipMalloc).
    +
    136 #if COMPILE_HIP_ATP_MARKER
    +
    137 #include "AMDTActivityLogger.h"
    +
    138 #define SCOPED_MARKER(markerName,group,userString) amdtScopedMarker(markerName, group, userString)
    +
    139 #else
    +
    140 // Swallow scoped markers:
    +
    141 #define SCOPED_MARKER(markerName,group,userString)
    +
    142 #endif
    +
    143 
    +
    144 
    +
    145 #if COMPILE_HIP_ATP_MARKER || (COMPILE_HIP_TRACE_API & 0x1)
    +
    146 #define API_TRACE(...)\
    +
    147 {\
    +
    148  if (HIP_ATP_MARKER || (COMPILE_HIP_DB && HIP_TRACE_API)) {\
    +
    149  std::string s = std::string(__func__) + " (" + ToString(__VA_ARGS__) + ')';\
    +
    150  if (COMPILE_HIP_DB && HIP_TRACE_API) {\
    +
    151  fprintf (stderr, API_COLOR "<<hip-api: %s\n" KNRM, s.c_str());\
    +
    152  }\
    +
    153  SCOPED_MARKER(s.c_str(), "HIP", NULL);\
    +
    154  }\
    +
    155 }
    +
    156 #else
    +
    157 // Swallow API_TRACE
    +
    158 #define API_TRACE(...)
    +
    159 #endif
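API_TRACE builds its message from `ToString(__VA_ARGS__)`, which is defined elsewhere in HIP and does not appear in this patch. For illustration only, here is a hypothetical C++17 variadic version that joins its arguments with ", ". The real helper may format pointers, enums, and structs differently.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for HIP's ToString(): zero arguments give "".
inline std::string ToString() { return ""; }

// Streams each argument with operator<< and separates them with ", ".
template <typename First, typename... Rest>
std::string ToString(const First& first, const Rest&... rest)
{
    std::ostringstream ss;
    ss << first;
    // C++17 fold over the comma operator: emit ", " before each remaining arg.
    ((ss << ", " << rest), ...);
    return ss.str();
}
```

With this helper, the trace line for a call such as `hipMalloc(ptr, 64)` would be assembled as `std::string(__func__) + " (" + ToString(ptr, 64) + ')'`, matching the expression inside API_TRACE.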
    +
    160 
    +
    161 
    +
    162 
    +
    163 // This macro should be called at the beginning of every HIP API.
    +
    164 // It initializes the HIP runtime (exactly once) and
    +
    165 // generates a trace string that can be output to stderr or to an ATP file.
    +
    166 #define HIP_INIT_API(...) \
    +
    167  std::call_once(hip_initialized, ihipInit);\
    +
    168  API_TRACE(__VA_ARGS__);
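HIP_INIT_API relies on `std::call_once`, so `ihipInit` runs exactly once no matter how many API calls, or how many threads, reach it first. The sketch below shows the pattern in isolation. `g_initFlag`, `initRuntime`, and `apiEntry` are hypothetical stand-ins for `hip_initialized`, `ihipInit`, and a HIP entry point.

```cpp
#include <cassert>
#include <mutex>

std::once_flag g_initFlag;   // plays the role of hip_initialized
int g_initCount = 0;         // how many times the initializer has run

void initRuntime() { ++g_initCount; }   // stand-in for ihipInit

// Every public entry point begins like this, as HIP_INIT_API does.
// std::call_once also blocks concurrent first callers until initRuntime
// returns, so no thread observes a half-initialized runtime.
int apiEntry()
{
    std::call_once(g_initFlag, initRuntime);
    return g_initCount;
}
```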
    +
    169 
    +
    170 #define ihipLogStatus(_hip_status) \
    +
    171  ({\
    +
    172  hipError_t _local_hip_status = _hip_status; /*local copy so _hip_status only evaluated once*/ \
    +
    173  tls_lastHipError = _local_hip_status;\
    +
    174  \
    +
    175  if ((COMPILE_HIP_TRACE_API & 0x2) && HIP_TRACE_API) {\
    +
    176  fprintf(stderr, " %ship-api: %-30s ret=%2d (%s)>>\n" KNRM, (_local_hip_status == 0) ? API_COLOR:KRED, __func__, _local_hip_status, ihipErrorString(_local_hip_status));\
    +
    177  }\
    +
    178  _local_hip_status;\
    +
    179  })
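ihipLogStatus is written as a GNU statement expression (`({ ... })`). This is a GCC/Clang extension, not standard C++. Its local copy guarantees that the status argument is evaluated only once. The following portable sketch produces the same behavior. All names are hypothetical: `logStatusImpl`, `logStatus`, `tls_lastError`, `g_traceApi`, and the two-value error enum. A function argument is evaluated exactly once by construction, and a thin macro still captures the caller's `__func__`.

```cpp
#include <cassert>
#include <cstdio>

enum SketchError { kSuccess = 0, kErrorInvalidValue = 1 };  // hypothetical subset

thread_local int tls_lastError = kSuccess;  // mirrors tls_lastHipError
bool g_traceApi = false;                    // mirrors HIP_TRACE_API

// Records the status as this thread's last error, optionally prints it,
// and returns it unchanged.
inline int logStatusImpl(int status, const char* func)
{
    tls_lastError = status;
    if (g_traceApi) {
        std::fprintf(stderr, "  hip-api: %-30s ret=%2d>>\n", func, status);
    }
    return status;
}

// The macro exists only to pass along the calling function's __func__.
#define logStatus(status) logStatusImpl((status), __func__)
```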
    +
    180 
    +
    181 
    +
    182 
    +
    183 
    +
    184 //---
    +
    185 //HIP_DB Debug flags:
    +
    186 #define DB_API 0 /* 0x01 - shortcut to enable HIP_TRACE_API on single switch */
    +
    187 #define DB_SYNC 1 /* 0x02 - trace synchronization pieces */
    +
    188 #define DB_MEM 2 /* 0x04 - trace memory allocation / deallocation */
    +
    189 #define DB_COPY1 3 /* 0x08 - trace memory copy commands. */
    +
    190 #define DB_SIGNAL 4 /* 0x10 - trace signal pool commands */
    +
    191 #define DB_COPY2 5 /* 0x20 - trace memory copy commands. Detailed. */
    +
    192 // When adding a new debug flag, also add to the char name table below.
    +
    193 
    +
    194 static const char *dbName [] =
    +
    195 {
    +
    196  KNRM "hip-api", // not used,
    +
    197  KYEL "hip-sync",
    +
    198  KCYN "hip-mem",
    +
    199  KMAG "hip-copy1",
    +
    200  KRED "hip-signal",
    +
    201  KNRM "hip-copy2",
    +
    202 };
    +
    203 
    +
    204 #if COMPILE_HIP_DB
    +
    205 #define tprintf(trace_level, ...) {\
    +
    206  if (HIP_DB & (1<<(trace_level))) {\
    +
    207  fprintf (stderr, " %s:", dbName[trace_level]); \
    +
    208  fprintf (stderr, __VA_ARGS__);\
    +
    209  fprintf (stderr, "%s", KNRM); \
    +
    210  }\
    +
    211 }
    +
    212 #else
    +
    213 /* Compile to empty code */
    +
    214 #define tprintf(trace_level, ...)
    +
    215 #endif
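Each DB_* value is a bit position. The corresponding HIP_DB mask bit is `1 << DB_x`, which is the hex value noted in each comment, and tprintf tests `HIP_DB & (1 << trace_level)`. This sketch decodes a HIP_DB value into the channels tprintf would print. It reuses the dbName strings with the color prefixes stripped for readability; `enabledChannels` is a hypothetical helper.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Names in bit order, matching DB_API..DB_COPY2 above (colors stripped).
static const char* kDbNames[] = {
    "hip-api", "hip-sync", "hip-mem", "hip-copy1", "hip-signal", "hip-copy2",
};

// Return the channel names whose bit is set in the given HIP_DB mask.
std::vector<std::string> enabledChannels(unsigned hipDb)
{
    std::vector<std::string> out;
    const unsigned count = sizeof(kDbNames) / sizeof(kDbNames[0]);
    for (unsigned level = 0; level < count; ++level) {
        if (hipDb & (1u << level)) {   // same test as tprintf
            out.push_back(kDbNames[level]);
        }
    }
    return out;
}
```

For example, HIP_DB=0x6 enables hip-sync and hip-mem. Bit 0 is decoded for completeness, but according to the comments above it acts as a shortcut for HIP_TRACE_API rather than as a tprintf channel.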
    +
    216 
    +
    217 class ihipException : public std::exception
    +
    218 {
    +
    219 public:
    +
    220  ihipException(hipError_t e) : _code(e) {};
    +
    221 
    +
    222  hipError_t _code;
    +
    223 };
    224 
    -
    225  hipError_t _code;
    -
    226 };
    -
    227 
    -
    228 
    -
    229 #ifdef __cplusplus
    -
    230 extern "C" {
    -
    231 #endif
    -
    232 
    -
    233 typedef class ihipStream_t* hipStream_t;
    -
    234 //typedef struct hipEvent_t {
    -
    235 // struct ihipEvent_t *_handle;
    -
    236 //} hipEvent_t;
    -
    237 
    -
    238 #ifdef __cplusplus
    -
    239 }
    -
    240 #endif
    -
    241 
    -
    242 const hipStream_t hipStreamNull = 0x0;
    -
    243 
    -
    244 
    -
    245 enum ihipCommand_t {
    -
    246  ihipCommandCopyH2H,
    -
    247  ihipCommandCopyH2D,
    -
    248  ihipCommandCopyD2H,
    -
    249  ihipCommandCopyD2D,
    -
    250  ihipCommandKernel,
    -
    251 };
    -
    252 
    -
    253 static const char* ihipCommandName[] = {
    -
    254  "CopyH2H", "CopyH2D", "CopyD2H", "CopyD2D", "Kernel"
    -
    255 };
    +
    225 
    +
    226 #ifdef __cplusplus
    +
    227 extern "C" {
    +
    228 #endif
    +
    229 
    +
    230 typedef class ihipStream_t* hipStream_t;
    +
    231 //typedef struct hipEvent_t {
    +
    232 // struct ihipEvent_t *_handle;
    +
    233 //} hipEvent_t;
    +
    234 
    +
    235 #ifdef __cplusplus
    +
    236 }
    +
    237 #endif
    +
    238 
    +
    239 const hipStream_t hipStreamNull = 0x0;
    +
    240 
    +
    241 
    +
    242 enum ihipCommand_t {
    +
    243  ihipCommandCopyH2H,
    +
    244  ihipCommandCopyH2D,
    +
    245  ihipCommandCopyD2H,
    +
    246  ihipCommandCopyD2D,
    +
    247  ihipCommandCopyP2P,
    +
    248  ihipCommandKernel,
    +
    249 };
    +
    250 
    +
    251 static const char* ihipCommandName[] = {
    +
    252  "CopyH2H", "CopyH2D", "CopyD2H", "CopyD2D", "CopyP2P", "Kernel"
    +
    253 };
    +
    254 
    +
    255 
    256 
    -
    257 
    +
    257 typedef uint64_t SIGSEQNUM;
    258 
    -
    259 typedef uint64_t SIGSEQNUM;
    -
    260 
    -
    261 //---
    -
    262 // Small wrapper around signals.
    -
    263 // Designed to be used from stream.
    -
    264 // TODO-someday refactor this class so it can be stored in a vector<>
    -
    265 // we already store the index here so we can use for garbage collection.
    -
    266 struct ihipSignal_t {
    -
    267  hsa_signal_t _hsa_signal; // hsa signal handle
    -
    268  int _index; // Index in pool, used for garbage collection.
    -
    269  SIGSEQNUM _sig_id; // unique sequentially increasing ID.
    -
    270 
    -
    271  ihipSignal_t();
    -
    272  ~ihipSignal_t();
    -
    273 
    -
    274  void release();
    -
    275 };
    -
    276 
    -
    277 
    -
    278 // Used to remove lock, for performance or stimulating bugs.
    - -
    280 {
    -
    281  public:
    -
    282  void lock() { }
    -
    283  bool try_lock() {return true; }
    -
    284  void unlock() { }
    -
    285 };
    -
    286 
    -
    287 
    -
    288 #if STREAM_THREAD_SAFE
    -
    289 typedef std::mutex StreamMutex;
    -
    290 #else
    -
    291 #warning "Stream thread-safe disabled"
    -
    292 typedef FakeMutex StreamMutex;
    -
    293 #endif
    -
    294 
    -
    295 #if DEVICE_THREAD_SAFE
    -
    296 typedef std::mutex DeviceMutex;
    -
    297 #else
    -
    298 typedef FakeMutex DeviceMutex;
    -
    299 #warning "Device thread-safe disabled"
    -
    300 #endif
    -
    301 
    -
    302 //
    -
    303 //---
    -
    304 // Protects access to the member _data with a lock acquired on contruction/destruction.
    -
    305 // T must contain a _mutex field which meets the BasicLockable requirements (lock/unlock)
    -
    306 template<typename T>
    - -
    308 {
    -
    309 public:
    -
    310  LockedAccessor(T &criticalData, bool autoUnlock=true) :
    -
    311  _criticalData(&criticalData),
    -
    312  _autoUnlock(autoUnlock)
    -
    313 
    -
    314  {
    -
    315  _criticalData->_mutex.lock();
    -
    316  };
    -
    317 
    -
    318  ~LockedAccessor()
    -
    319  {
    -
    320  if (_autoUnlock) {
    -
    321  _criticalData->_mutex.unlock();
    -
    322  }
    -
    323  }
    -
    324 
    -
    325  void unlock()
    -
    326  {
    -
    327  _criticalData->_mutex.unlock();
    -
    328  }
    -
    329 
    -
    330  // Syntactic sugar so -> can be used to get the underlying type.
    -
    331  T *operator->() { return _criticalData; };
    -
    332 
    -
    333 private:
    -
    334  T *_criticalData;
    -
    335  bool _autoUnlock;
    -
    336 };
    -
    337 
    -
    338 
    -
    339 template <typename MUTEX_TYPE>
    -
    340 struct LockedBase {
    -
    341 
    -
    342  // Experts-only interface for explicit locking.
    -
    343  // Most uses should use the lock-accessor.
    -
    344  void lock() { _mutex.lock(); }
    -
    345  void unlock() { _mutex.unlock(); }
    -
    346 
    -
    347  MUTEX_TYPE _mutex;
    -
    348 };
    -
    349 
    -
    350 
    -
    351 template <typename MUTEX_TYPE>
    -
    352 class ihipStreamCriticalBase_t : public LockedBase<MUTEX_TYPE>
    -
    353 {
    -
    354 public:
    - -
    356  _last_command_type(ihipCommandCopyH2H),
    -
    357  _last_copy_signal(NULL),
    -
    358  _signalCursor(0),
    -
    359  _oldest_live_sig_id(1),
    -
    360  _stream_sig_id(0)
    -
    361  {
    -
    362  _signalPool.resize(HIP_STREAM_SIGNALS > 0 ? HIP_STREAM_SIGNALS : 1);
    -
    363  };
    -
    364 
    - -
    366  _signalPool.clear();
    -
    367  }
    +
    259 //---
    +
    260 // Small wrapper around signals.
    +
    261 // Designed to be used from stream.
    +
    262 // TODO-someday refactor this class so it can be stored in a vector<>
    +
    263 // we already store the index here so we can use for garbage collection.
    +
    264 struct ihipSignal_t {
    +
    265  hsa_signal_t _hsa_signal; // hsa signal handle
    +
    266  int _index; // Index in pool, used for garbage collection.
    +
    267  SIGSEQNUM _sig_id; // unique sequentially increasing ID.
    +
    268 
    +
    269  ihipSignal_t();
    +
    270  ~ihipSignal_t();
    +
    271 
    +
    272  void release();
    +
    273 };
    +
    274 
    +
    275 
    +
    276 // Used to remove lock, for performance or stimulating bugs.
    + +
    278 {
    +
    279  public:
    +
    280  void lock() { }
    +
    281  bool try_lock() {return true; }
    +
    282  void unlock() { }
    +
    283 };
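FakeMutex provides `lock`, `try_lock`, and `unlock`, so it meets the standard Lockable requirements. It can therefore be used in `std::lock_guard` or `std::unique_lock` anywhere `std::mutex` is expected, which is what lets StreamMutex and DeviceMutex switch between the two with a typedef. Here is a minimal sketch; the `Counter` template is hypothetical.

```cpp
#include <cassert>
#include <mutex>

// Same shape as FakeMutex above: every operation is a no-op.
class FakeMutex
{
public:
    void lock() {}
    bool try_lock() { return true; }
    void unlock() {}
};

// Hypothetical container whose locking policy is a template parameter,
// in the spirit of the StreamMutex / DeviceMutex typedefs.
template <typename MUTEX_TYPE>
class Counter
{
public:
    int increment()
    {
        std::lock_guard<MUTEX_TYPE> guard(_mutex);  // compiles for both mutex types
        return ++_value;
    }

private:
    MUTEX_TYPE _mutex;
    int _value = 0;
};
```

With FakeMutex the guard compiles to nothing. That is useful for measuring locking overhead, but `Counter<FakeMutex>` is only safe when a single thread uses it.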
    +
    284 
    +
    285 
    +
    286 #if STREAM_THREAD_SAFE
    +
    287 typedef std::mutex StreamMutex;
    +
    288 #else
    +
    289 #warning "Stream thread-safe disabled"
    +
    290 typedef FakeMutex StreamMutex;
    +
    291 #endif
    +
    292 
    +
    293 #if DEVICE_THREAD_SAFE
    +
    294 typedef std::mutex DeviceMutex;
    +
    295 #else
    +
    296 typedef FakeMutex DeviceMutex;
    +
    297 #warning "Device thread-safe disabled"
    +
    298 #endif
    +
    299 
    +
    300 //
    +
    301 //---
    +
    302 // Protects access to the critical data with a lock acquired on construction and released on destruction.
    +
    303 // T must contain a _mutex field which meets the BasicLockable requirements (lock/unlock)
    +
    304 template<typename T>
    + +
    306 {
    +
    307 public:
    +
    308  LockedAccessor(T &criticalData, bool autoUnlock=true) :
    +
    309  _criticalData(&criticalData),
    +
    310  _autoUnlock(autoUnlock)
    +
    311 
    +
    312  {
    +
    313  _criticalData->_mutex.lock();
    +
    314  };
    +
    315 
    +
    316  ~LockedAccessor()
    +
    317  {
    +
    318  if (_autoUnlock) {
    +
    319  _criticalData->_mutex.unlock();
    +
    320  }
    +
    321  }
    +
    322 
    +
    323  void unlock()
    +
    324  {
    +
    325  _criticalData->_mutex.unlock();
    +
    326  }
    +
    327 
    +
    328  // Syntactic sugar so -> can be used to get the underlying type.
    +
    329  T *operator->() { return _criticalData; };
    +
    330 
    +
    331 private:
    +
    332  T *_criticalData;
    +
    333  bool _autoUnlock;
    +
    334 };
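LockedAccessor locks `T::_mutex` in its constructor and, unless `autoUnlock` is false, unlocks it in its destructor, so the lock lives exactly as long as the accessor. The sketch below is self-contained. `TrackingMutex` and `StreamState` are hypothetical types that make the lock state observable. One deliberate difference from the header: copy operations are deleted here, because copying an accessor would unlock the same mutex twice.

```cpp
#include <cassert>

// Hypothetical mutex that records whether it is held, so the demo can
// observe what the accessor does. Not thread-safe; illustration only.
struct TrackingMutex {
    bool held = false;
    void lock()   { assert(!held); held = true; }
    void unlock() { assert(held);  held = false; }
};

// Same logic as LockedAccessor above: lock on construction, unlock on
// destruction (if autoUnlock), and forward -> to the protected data.
template <typename T>
class LockedAccessor
{
public:
    LockedAccessor(T& criticalData, bool autoUnlock = true)
        : _criticalData(&criticalData), _autoUnlock(autoUnlock)
    {
        _criticalData->_mutex.lock();
    }
    ~LockedAccessor()
    {
        if (_autoUnlock) {
            _criticalData->_mutex.unlock();
        }
    }
    LockedAccessor(const LockedAccessor&) = delete;            // would double-unlock
    LockedAccessor& operator=(const LockedAccessor&) = delete;

    void unlock() { _criticalData->_mutex.unlock(); }
    T* operator->() { return _criticalData; }

private:
    T*   _criticalData;
    bool _autoUnlock;
};

// Hypothetical critical data: any type with a _mutex member qualifies.
struct StreamState {
    TrackingMutex _mutex;
    int _lastCommand = 0;
};
```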
    +
    335 
    +
    336 
    +
    337 template <typename MUTEX_TYPE>
    +
    338 struct LockedBase {
    +
    339 
    +
    340  // Experts-only interface for explicit locking.
    +
    341  // Most uses should use the lock-accessor.
    +
    342  void lock() { _mutex.lock(); }
    +
    343  void unlock() { _mutex.unlock(); }
    +
    344 
    +
    345  MUTEX_TYPE _mutex;
    +
    346 };
    +
    347 
    +
    348 
    +
    349 template <typename MUTEX_TYPE>
    +
    350 class ihipStreamCriticalBase_t : public LockedBase<MUTEX_TYPE>
    +
    351 {
    +
    352 public:
    + +
    354  _last_command_type(ihipCommandCopyH2H),
    +
    355  _last_copy_signal(NULL),
    +
    356  _signalCursor(0),
    +
    357  _oldest_live_sig_id(1),
    +
    358  _stream_sig_id(0)
    +
    359  {
    +
    360  _signalPool.resize(HIP_STREAM_SIGNALS > 0 ? HIP_STREAM_SIGNALS : 1);
    +
    361  };
    +
    362 
    + +
    364  _signalPool.clear();
    +
    365  }
    +
    366 
    +
    368 
    - -
    370 
    -
    371 
    -
    372 public:
    -
    373  // Critical Data:
    -
    374  ihipCommand_t _last_command_type; // type of the last command
    -
    375 
    -
    376  // signal of last copy command sent to the stream.
    -
    377  // May be NULL, indicating the previous command has completley finished and future commands don't need to create a dependency.
    -
    378  // Copy can be either H2D or D2H.
    -
    379  ihipSignal_t *_last_copy_signal;
    +
    369 
    +
    370 public:
    +
    371  // Critical Data:
    +
    372  ihipCommand_t _last_command_type; // type of the last command
    +
    373 
    +
    374  // signal of last copy command sent to the stream.
    +
    375  // May be NULL, indicating the previous command has completely finished and future commands don't need to create a dependency.
    +
    376  // Copy can be either H2D or D2H.
    +
    377  ihipSignal_t *_last_copy_signal;
    +
    378 
    +
    379  hc::completion_future _last_kernel_future; // Completion future of last kernel command sent to GPU.
    380 
    -
    381  hc::completion_future _last_kernel_future; // Completion future of last kernel command sent to GPU.
    -
    382 
    -
    383  // Signal pool:
    -
    384  int _signalCursor;
    -
    385  SIGSEQNUM _oldest_live_sig_id; // oldest live seq_id, anything < this can be allocated.
    -
    386  std::deque<ihipSignal_t> _signalPool; // Pool of signals for use by this stream.
    -
    387 
    -
    388 
    -
    389  SIGSEQNUM _stream_sig_id; // Monotonically increasing unique signal id.
    -
    390 };
    -
    391 
    -
    392 
    - - +
    381  // Signal pool:
    +
    382  int _signalCursor;
    +
    383  SIGSEQNUM _oldest_live_sig_id; // oldest live seq_id, anything < this can be allocated.
    +
    384  std::deque<ihipSignal_t> _signalPool; // Pool of signals for use by this stream.
    +
    385 
    +
    386 
    +
    387  SIGSEQNUM _stream_sig_id; // Monotonically increasing unique signal id.
    +
    388 };
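The signal-pool fields describe a cursor into the pool (`_signalCursor`), a monotonically increasing ID source (`_stream_sig_id`), and a watermark (`_oldest_live_sig_id`) below which signals "can be allocated". The sketch below is a simplified, hypothetical model of how those three interact. `SignalPoolSketch` and its members are illustrative, not the runtime's allocSignal/locked_reclaimSignals implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef uint64_t SIGSEQNUM;

// Hypothetical model: each slot remembers the sequence ID it was last
// handed out with (0 = never used).
struct SignalPoolSketch {
    std::vector<SIGSEQNUM> slotId;
    int cursor = 0;              // like _signalCursor
    SIGSEQNUM oldestLive = 1;    // like _oldest_live_sig_id
    SIGSEQNUM nextId = 0;        // like _stream_sig_id

    // Size falls back to 1, as the constructor does for HIP_STREAM_SIGNALS.
    explicit SignalPoolSketch(int size)
        : slotId(static_cast<std::size_t>(size > 0 ? size : 1), SIGSEQNUM{0}) {}

    // Returns a slot index, or -1 if the slot under the cursor still holds a
    // live signal (a real runtime would wait and reclaim first).
    int alloc()
    {
        if (slotId[cursor] >= oldestLive) {
            return -1;                   // still in flight: reuse is unsafe
        }
        int idx = cursor;
        slotId[idx] = ++nextId;          // IDs start at 1 and only increase
        cursor = (cursor + 1) % static_cast<int>(slotId.size());
        return idx;
    }

    // Every signal with an ID below 'id' has completed and may be reused.
    void reclaimBelow(SIGSEQNUM id) { if (id > oldestLive) oldestLive = id; }
};
```

Because IDs only grow, a single watermark comparison replaces per-slot "is this done?" bookkeeping. That matches the comment that anything below the oldest live ID can be allocated.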
    +
    389 
    +
    390 
    + + +
    393 
    +
    394 
    395 
    -
    396 
    -
    397 
    -
    398 // Internal stream structure.
    - -
    400 public:
    -
    401 typedef uint64_t SeqNum_t ;
    -
    402 
    -
    403  ihipStream_t(unsigned device_index, hc::accelerator_view av, unsigned int flags);
    -
    404  ~ihipStream_t();
    -
    405 
    -
    406  // kind is hipMemcpyKind
    -
    407  void copySync (LockedAccessor_StreamCrit_t &crit, void* dst, const void* src, size_t sizeBytes, unsigned kind);
    -
    408  void locked_copySync (void* dst, const void* src, size_t sizeBytes, unsigned kind);
    +
    396 // Internal stream structure.
    + +
    398 public:
    +
    399 typedef uint64_t SeqNum_t ;
    +
    400 
    +
    401  ihipStream_t(unsigned device_index, hc::accelerator_view av, unsigned int flags);
    +
    402  ~ihipStream_t();
    +
    403 
    +
    404  // kind is hipMemcpyKind
    +
    405  void copySync (LockedAccessor_StreamCrit_t &crit, void* dst, const void* src, size_t sizeBytes, unsigned kind);
    +
    406  void locked_copySync (void* dst, const void* src, size_t sizeBytes, unsigned kind);
    +
    407 
    +
    408  void copyAsync(void* dst, const void* src, size_t sizeBytes, unsigned kind);
    409 
    -
    410  void copyAsync(void* dst, const void* src, size_t sizeBytes, unsigned kind);
    -
    411 
    -
    412  //---
    -
    413  // Thread-safe accessors - these acquire / release mutex:
    -
    414  bool lockopen_preKernelCommand();
    -
    415  void lockclose_postKernelCommand(hc::completion_future &kernel_future);
    +
    410  //---
    +
    411  // Thread-safe accessors - these acquire / release mutex:
    +
    412  bool lockopen_preKernelCommand();
    +
    413  void lockclose_postKernelCommand(hc::completion_future &kernel_future);
    +
    414 
    +
    415  int preCopyCommand(LockedAccessor_StreamCrit_t &crit, ihipSignal_t *lastCopy, hsa_signal_t *waitSignal, ihipCommand_t copyType);
    416 
    -
    417  int preCopyCommand(LockedAccessor_StreamCrit_t &crit, ihipSignal_t *lastCopy, hsa_signal_t *waitSignal, ihipCommand_t copyType);
    -
    418 
    -
    419  void locked_reclaimSignals(SIGSEQNUM sigNum);
    -
    420  void locked_wait(bool assertQueueEmpty=false);
    -
    421  SIGSEQNUM locked_lastCopySeqId() {LockedAccessor_StreamCrit_t crit(_criticalData); return lastCopySeqId(crit); };
    -
    422 
    -
    423  // Use this if we already have the stream critical data mutex:
    -
    424  void wait(LockedAccessor_StreamCrit_t &crit, bool assertQueueEmpty=false);
    +
    417  void locked_reclaimSignals(SIGSEQNUM sigNum);
    +
    418  void locked_wait(bool assertQueueEmpty=false);
    +
    419  SIGSEQNUM locked_lastCopySeqId() {LockedAccessor_StreamCrit_t crit(_criticalData); return lastCopySeqId(crit); };
    +
    420 
    +
    421  // Use this if we already have the stream critical data mutex:
    +
    422  void wait(LockedAccessor_StreamCrit_t &crit, bool assertQueueEmpty=false);
    +
    423 
    +
    424 
    425 
    -
    426 
    -
    427 
    -
    428  // Non-threadsafe accessors - must be protected by high-level stream lock with accessor passed to function.
    -
    429  SIGSEQNUM lastCopySeqId (LockedAccessor_StreamCrit_t &crit) { return crit->_last_copy_signal ? crit->_last_copy_signal->_sig_id : 0; };
    -
    430  ihipSignal_t * allocSignal (LockedAccessor_StreamCrit_t &crit);
    -
    431 
    -
    432 
    -
    433  //-- Non-racy accessors:
    -
    434  // These functions access fields set at initialization time and are non-racy (so do not acquire mutex)
    -
    435  ihipDevice_t * getDevice() const;
    -
    436 
    -
    437 
    -
    438 public:
    -
    439  //---
    -
    440  //Public member vars - these are set at initialization and never change:
    -
    441  SeqNum_t _id; // monotonic sequence ID
    -
    442  hc::accelerator_view _av;
    -
    443  unsigned _flags;
    -
    444 
    -
    445 private: // Critical Data. THis MUST be accessed through LockedAccessor_StreamCrit_t
    -
    446  ihipStreamCritical_t _criticalData;
    -
    447 
    -
    448 private:
    -
    449  void enqueueBarrier(hsa_queue_t* queue, ihipSignal_t *depSignal);
    -
    450  void waitCopy(LockedAccessor_StreamCrit_t &crit, ihipSignal_t *signal);
    -
    451 
    -
    452  // The unsigned return is hipMemcpyKind
    -
    453  unsigned resolveMemcpyDirection(bool srcInDeviceMem, bool dstInDeviceMem);
    -
    454  void setCopyAgents(unsigned kind, ihipCommand_t *commandType, hsa_agent_t *srcAgent, hsa_agent_t *dstAgent);
    -
    455 
    -
    456  unsigned _device_index; // index into the g_device array
    +
    426  // Non-threadsafe accessors - must be protected by high-level stream lock with accessor passed to function.
    +
    427  SIGSEQNUM lastCopySeqId (LockedAccessor_StreamCrit_t &crit) { return crit->_last_copy_signal ? crit->_last_copy_signal->_sig_id : 0; };
    +
    428  ihipSignal_t * allocSignal (LockedAccessor_StreamCrit_t &crit);
    +
    429 
    +
    430 
    +
    431  //-- Non-racy accessors:
    +
    432  // These functions access fields set at initialization time and are non-racy (so do not acquire mutex)
    +
    433  ihipDevice_t * getDevice() const;
    +
    434 
    +
    435 
    +
    436 public:
    +
    437  //---
    +
    438  //Public member vars - these are set at initialization and never change:
    +
    439  SeqNum_t _id; // monotonic sequence ID
    +
    440  hc::accelerator_view _av;
    +
    441  unsigned _flags;
    +
    442 
    +
    443 private:
    +
    444  // Critical Data. This MUST be accessed through LockedAccessor_StreamCrit_t
    +
    445  ihipStreamCritical_t _criticalData;
    +
    446 
    +
    447  // Array of dependency completion_future.
    +
    448  std::vector<hc::completion_future> _depFutures;
    +
    449 
    +
    450 private:
    +
    451  void enqueueBarrier(hsa_queue_t* queue, ihipSignal_t *depSignal);
    +
    452  void waitCopy(LockedAccessor_StreamCrit_t &crit, ihipSignal_t *signal);
    +
    453 
    +
    454  // The unsigned return is hipMemcpyKind
    +
    455  unsigned resolveMemcpyDirection(bool srcTracked, bool dstTracked, bool srcInDeviceMem, bool dstInDeviceMem);
    +
    456  void setAsyncCopyAgents(unsigned kind, ihipCommand_t *commandType, hsa_agent_t *srcAgent, hsa_agent_t *dstAgent);
    457 
    -
    458  friend std::ostream& operator<<(std::ostream& os, const ihipStream_t& s);
    -
    459 };
    -
    460 
    -
    461 
    -
    462 inline std::ostream& operator<<(std::ostream& os, const ihipStream_t& s)
    -
    463 {
    -
    464  os << "stream#";
    -
    465  os << s._device_index;
    -
    466  os << '.';
    -
    467  os << s._id;
    -
    468  return os;
    -
    469 }
    -
    470 
    -
    471 
    -
    472 //----
    -
    473 // Internal event structure:
    -
    474 enum hipEventStatus_t {
    -
    475  hipEventStatusUnitialized = 0, // event is unutilized, must be "Created" before use.
    -
    476  hipEventStatusCreated = 1,
    -
    477  hipEventStatusRecording = 2, // event has been enqueued to record something.
    -
    478  hipEventStatusRecorded = 3, // event has been recorded - timestamps are valid.
    -
    479 } ;
    -
    480 
    -
    481 
    -
    482 // internal hip event structure.
    -
    483 struct ihipEvent_t {
    -
    484  hipEventStatus_t _state;
    -
    485 
    -
    486  hipStream_t _stream; // Stream where the event is recorded, or NULL if all streams.
    -
    487  unsigned _flags;
    -
    488 
    -
    489  hc::completion_future _marker;
    -
    490  uint64_t _timestamp; // store timestamp, may be set on host or by marker.
    -
    491 
    -
    492  SIGSEQNUM _copy_seq_id;
    -
    493 } ;
    -
    494 
    -
    495 
    +
    458  unsigned _device_index; // index into the g_device array
    +
    459 
    +
    460  friend std::ostream& operator<<(std::ostream& os, const ihipStream_t& s);
    +
    461 };
    +
    462 
    +
    463 
    +
    464 inline std::ostream& operator<<(std::ostream& os, const ihipStream_t& s)
    +
    465 {
    +
    466  os << "stream#";
    +
    467  os << s._device_index;
    +
    468  os << '.';
    +
    469  os << s._id;
    +
    470  return os;
    +
    471 }
    +
    472 
    +
    473 
    +
    474 //----
    +
    475 // Internal event structure:
    +
    476 enum hipEventStatus_t {
    +
    477  hipEventStatusUnitialized = 0, // event is uninitialized, must be "Created" before use.
    +
    478  hipEventStatusCreated = 1,
    +
    479  hipEventStatusRecording = 2, // event has been enqueued to record something.
    +
    480  hipEventStatusRecorded = 3, // event has been recorded - timestamps are valid.
    +
    481 } ;
    +
    482 
    +
    483 
    +
    484 // internal hip event structure.
    +
    485 struct ihipEvent_t {
    +
    486  hipEventStatus_t _state;
    +
    487 
    +
    488  hipStream_t _stream; // Stream where the event is recorded, or NULL if all streams.
    +
    489  unsigned _flags;
    +
    490 
    +
    491  hc::completion_future _marker;
    +
    492  uint64_t _timestamp; // store timestamp, may be set on host or by marker.
    +
    493 
    +
    494  SIGSEQNUM _copy_seq_id;
    +
    495 } ;
    496 
    497 
    498 
    -
    499 //---
    -
    500 // Data that must be protected with thread-safe access
    -
    501 // All members are private - this class must be accessed through friend LockedAccessor which
    -
    502 // will lock the mutex on construction and unlock on destruction.
    -
    503 //
    -
    504 // MUTEX_TYPE is template argument so can easily convert to FakeMutex for performance or stress testing.
    -
    505 template <class MUTEX_TYPE>
    - -
    507 {
    -
    508 public:
    -
    509  ihipDeviceCriticalBase_t() : _stream_id(0), _peerAgents(nullptr) {};
    -
    510 
    -
    511  void init(unsigned deviceCnt) {
    -
    512  assert(_peerAgents == nullptr);
    -
    513  _peerAgents = new hsa_agent_t[deviceCnt];
    -
    514  };
    -
    515 
    - -
    517  if (_peerAgents != nullptr) {
    -
    518  delete _peerAgents;
    -
    519  _peerAgents = nullptr;
    -
    520  }
    -
    521  }
    -
    522  friend class LockedAccessor<ihipDeviceCriticalBase_t>;
    -
    523 
    -
    524  std::list<ihipStream_t*> &streams() { return _streams; };
    -
    525  const std::list<ihipStream_t*> &const_streams() const { return _streams; };
    -
    526 
    -
    527  // "Allocate" a stream ID:
    -
    528  ihipStream_t::SeqNum_t incStreamId() { return _stream_id++; };
    -
    529 
    -
    530  bool addPeer(ihipDevice_t *peer);
    -
    531  bool removePeer(ihipDevice_t *peer);
    -
    532  void resetPeers(ihipDevice_t *thisDevice);
    -
    533 
    -
    534 
    -
    535  void addStream(ihipStream_t *stream);
    +
    499 
    +
    500 
    +
    501 //---
    +
    502 // Data that must be protected with thread-safe access
    +
    503 // All members are private - this class must be accessed through friend LockedAccessor which
    +
    504 // will lock the mutex on construction and unlock on destruction.
    +
    505 //
    +
    506 // MUTEX_TYPE is template argument so can easily convert to FakeMutex for performance or stress testing.
    +
    507 template <class MUTEX_TYPE>
    + +
    509 {
    +
    510 public:
    +
    511  ihipDeviceCriticalBase_t() : _stream_id(0), _peerAgents(nullptr) {};
    +
    512 
    +
    513  void init(unsigned deviceCnt) {
    +
    514  assert(_peerAgents == nullptr);
    +
    515  _peerAgents = new hsa_agent_t[deviceCnt];
    +
    516  };
    +
    517 
    + +
    519  if (_peerAgents != nullptr) {
    +
    520  delete[] _peerAgents;
    +
    521  _peerAgents = nullptr;
    +
    522  }
    +
    523  }
    +
    524  friend class LockedAccessor<ihipDeviceCriticalBase_t>;
    +
    525 
    +
    526  std::list<ihipStream_t*> &streams() { return _streams; };
    +
    527  const std::list<ihipStream_t*> &const_streams() const { return _streams; };
    +
    528 
    +
    529  // "Allocate" a stream ID:
    +
    530  ihipStream_t::SeqNum_t incStreamId() { return _stream_id++; };
    +
    531 
    +
    532  bool isPeer(const ihipDevice_t *peer); // returns true if peer has access to memory physically located on this device.
    +
    533  bool addPeer(ihipDevice_t *peer);
    +
    534  bool removePeer(ihipDevice_t *peer);
    +
    535  void resetPeers(ihipDevice_t *thisDevice);
    536 
    -
    537  uint32_t peerCnt() const { return _peerCnt; };
    -
    538  hsa_agent_t *peerAgents() const { return _peerAgents; };
    +
    537 
    +
    538  void addStream(ihipStream_t *stream);
    539 
    -
    540 
    -
    541 private:
    -
    542  std::list<ihipStream_t*> _streams; // streams associated with this device.
    -
    543  ihipStream_t::SeqNum_t _stream_id;
    -
    544 
    -
    545  // These reflect the currently Enabled set of peers for this GPU:
    -
    546  std::list<ihipDevice_t*> _peers; // list of enabled peer devices.
    -
    547  uint32_t _peerCnt; // number of enabled peers
    -
    548  hsa_agent_t *_peerAgents; // efficient packed array of enabled agents (to use for allocations.)
    -
    549 private:
    -
    550  void recomputePeerAgents();
    -
    551 };
    -
    552 
    -
    553 // Note Mutex selected based on DeviceMutex
    - -
    555 
    -
    556 // This type is used by functions that need access to the critical device structures.
    - -
    558 
    -
    559 
    +
    540  uint32_t peerCnt() const { return _peerCnt; };
    +
    541  hsa_agent_t *peerAgents() const { return _peerAgents; };
    +
    542 
    +
    543 
    +
    544 private:
    +
    545  //std::list< std::shared_ptr<ihipStream_t> > _streams; // streams associated with this device. TODO - convert to shared_ptr.
    +
    546  std::list< ihipStream_t* > _streams; // streams associated with this device.
    +
    547  ihipStream_t::SeqNum_t _stream_id;
    +
    548 
    +
    549  // These reflect the currently Enabled set of peers for this GPU:
    +
    550  // Enabled peers have permissions to access the memory physically allocated on this device.
    +
    551  std::list<ihipDevice_t*> _peers; // list of enabled peer devices.
    +
    552  uint32_t _peerCnt; // number of enabled peers
    +
    553  hsa_agent_t *_peerAgents; // efficient packed array of enabled agents (to use for allocations.)
    +
    554 private:
    +
    555  void recomputePeerAgents();
    +
    556 };
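The members `_peers`, `_peerCnt` and `_peerAgents` follow one pattern: a list of enabled peers is mirrored into a packed array sized once in `init(deviceCnt)`, so allocation calls can take a contiguous agent array. The sketch below shows that pattern under stated assumptions: `Agent` is a stand-in for `hsa_agent_t`, `Device` and `PeerSet` are hypothetical types, and the body of `recomputePeerAgents` is a guess at its intent, not HIP's implementation. It also shows why an array created with `new[]` must be freed with `delete[]`; `std::unique_ptr<Agent[]>` handles that automatically.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <list>
#include <memory>

// Stand-in for hsa_agent_t (assumption: an opaque, copyable handle).
struct Agent { std::uint64_t handle; };

// Hypothetical device type; only its agent handle matters here.
struct Device { Agent agent; };

class PeerSet {
public:
    // Size the packed array once for the maximum possible number of peers.
    void init(unsigned deviceCnt) {
        _capacity = deviceCnt;
        _peerAgents = std::make_unique<Agent[]>(deviceCnt); // released with delete[]
    }

    bool addPeer(Device *peer) {
        if (std::find(_peers.begin(), _peers.end(), peer) != _peers.end()) {
            return false; // already enabled
        }
        if (_peers.size() >= _capacity) {
            return false; // no room (or init() not called)
        }
        _peers.push_back(peer);
        recomputePeerAgents();
        return true;
    }

    bool removePeer(Device *peer) {
        auto it = std::find(_peers.begin(), _peers.end(), peer);
        if (it == _peers.end()) {
            return false; // was not enabled
        }
        _peers.erase(it);
        recomputePeerAgents();
        return true;
    }

    std::uint32_t peerCnt() const { return _peerCnt; }
    const Agent *peerAgents() const { return _peerAgents.get(); }

private:
    // Rebuild the contiguous agent array from the list of enabled peers.
    void recomputePeerAgents() {
        _peerCnt = 0;
        for (Device *d : _peers) {
            assert(_peerCnt < _capacity);
            _peerAgents[_peerCnt++] = d->agent;
        }
    }

    std::list<Device *> _peers;          // enabled peers
    std::uint32_t _peerCnt = 0;          // number of valid entries in _peerAgents
    unsigned _capacity = 0;              // size of the packed array
    std::unique_ptr<Agent[]> _peerAgents; // packed array of enabled agents
};
```

Keeping both the list and the array trades a rebuild on every add/remove (rare) for a ready-made contiguous array on every allocation (frequent).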
    +
    557 
    +
    558 // Note Mutex selected based on DeviceMutex
    +
    560 
    -
    561 //-------------------------------------------------------------------------------------------------
    -
    562 // Functions which read or write the critical data are named locked_.
    -
    563 // ihipDevice_t does not use recursive locks so the ihip implementation must avoid calling a locked_ function from within a locked_ function.
    -
    564 // External functions which call several locked_ functions will acquire and release the lock for each function. if this occurs in
    -
    565 // performance-sensitive code we may want to refactor by adding non-locked functions and creating a new locked_ member function to call them all.
    - -
    567 {
    -
    568 public: // Functions:
    -
    569  ihipDevice_t() {}; // note: calls constructor for _criticalData
    -
    570  void init(unsigned device_index, unsigned deviceCnt, hc::accelerator &acc, unsigned flags);
    -
    571  ~ihipDevice_t();
    -
    572 
    -
    573  void locked_addStream(ihipStream_t *s);
    -
    574  void locked_removeStream(ihipStream_t *s);
    -
    575  void locked_reset();
    -
    576  void locked_waitAllStreams();
    -
    577  void locked_syncDefaultStream(bool waitOnSelf);
    -
    578 
    -
    579  ihipDeviceCritical_t &criticalData() { return _criticalData; }; // TODO, move private. Fix P2P.
    -
    580 
    -
    581 public: // Data, set at initialization:
    -
    582  unsigned _device_index; // index into g_devices.
    +
    561 // This type is used by functions that need access to the critical device structures.
    + +
    563 
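The comments above describe the locking scheme: critical data is reached only through `LockedAccessor`, which locks on construction and unlocks on destruction, and `MUTEX_TYPE` can be swapped for a `FakeMutex`. The following is a minimal RAII sketch of that scheme. The `LockedAccessor` here is an illustrative re-implementation, not HIP's class, and `LockedBase`, `DeviceCritical`, `DeviceCritAccessor` and `streamCount` are hypothetical names. Because `std::mutex` is not recursive, constructing a second accessor on the same object in the same thread while the first is alive would deadlock (formally, undefined behavior). That is why a `locked_` function must not call another `locked_` function.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>

// No-op mutex with the same interface as std::mutex; swapping it in removes
// locking cost (only safe when a single thread touches the data).
struct FakeMutex {
    void lock() {}
    void unlock() {}
};

// Holds the mutex; lock()/unlock() are reachable only by friends of a derived class.
template <class MUTEX_TYPE>
class LockedBase {
protected:
    void lock() { _mutex.lock(); }
    void unlock() { _mutex.unlock(); }
private:
    MUTEX_TYPE _mutex;
};

// RAII accessor: the lock is held exactly as long as the accessor lives.
template <class T>
class LockedAccessor {
public:
    explicit LockedAccessor(T &criticalData) : _criticalData(&criticalData) {
        _criticalData->lock();
    }
    ~LockedAccessor() { _criticalData->unlock(); }

    LockedAccessor(const LockedAccessor &) = delete;
    LockedAccessor &operator=(const LockedAccessor &) = delete;

    T *operator->() { return _criticalData; }

private:
    T *_criticalData;
};

// Protected data; MUTEX_TYPE selects a real or fake mutex.
template <class MUTEX_TYPE>
class DeviceCritical : public LockedBase<MUTEX_TYPE> {
public:
    friend class LockedAccessor<DeviceCritical>; // may call lock()/unlock()
    std::list<int> &streams() { return _streams; }
private:
    std::list<int> _streams;
};

typedef DeviceCritical<std::mutex> DeviceCritical_t;
typedef LockedAccessor<DeviceCritical_t> DeviceCritAccessor;

// Each call acquires and releases the lock once.
std::size_t streamCount(DeviceCritical_t &crit) {
    DeviceCritAccessor acc(crit);  // locks here
    return acc->streams().size();  // unlocks when acc goes out of scope
}
```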
    +
    564 
    +
    565 
    +
    566 //-------------------------------------------------------------------------------------------------
    +
    567 // Functions which read or write the critical data are named locked_.
    +
    568 // ihipDevice_t does not use recursive locks so the ihip implementation must avoid calling a locked_ function from within a locked_ function.
    +
    569 // External functions which call several locked_ functions will acquire and release the lock for each function. If this occurs in
    +
    570 // performance-sensitive code we may want to refactor by adding non-locked functions and creating a new locked_ member function to call them all.
    + +
    572 {
    +
    573 public: // Functions:
    +
    574  ihipDevice_t() {}; // note: calls constructor for _criticalData
    +
    575  void init(unsigned device_index, unsigned deviceCnt, hc::accelerator &acc, unsigned flags);
    +
    576  ~ihipDevice_t();
    +
    577 
    +
    578  void locked_addStream(ihipStream_t *s);
    +
    579  void locked_removeStream(ihipStream_t *s);
    +
    580  void locked_reset();
    +
    581  void locked_waitAllStreams();
    +
    582  void locked_syncDefaultStream(bool waitOnSelf);
    583 
    -
    584  hipDeviceProp_t _props; // saved device properties.
    -
    585  hc::accelerator _acc;
    -
    586  hsa_agent_t _hsa_agent; // hsa agent handle
    -
    587 
    -
    588  // The NULL stream is used if no other stream is specified.
    -
    589  // NULL has special synchronization properties with other streams.
    -
    590  ihipStream_t *_default_stream;
    -
    591 
    +
    584  ihipDeviceCritical_t &criticalData() { return _criticalData; }; // TODO, move private. Fix P2P.
    +
    585 
    +
    586 public: // Data, set at initialization:
    +
    587  unsigned _device_index; // index into g_devices.
    +
    588 
    +
    589  hipDeviceProp_t _props; // saved device properties.
    +
    590  hc::accelerator _acc;
    +
    591  hsa_agent_t _hsa_agent; // hsa agent handle
    592 
    -
    593  unsigned _compute_units;
    -
    594 
    -
    595  StagingBuffer *_staging_buffer[2]; // one buffer for each direction.
    +
    593  // The NULL stream is used if no other stream is specified.
    +
    594  // NULL has special synchronization properties with other streams.
    +
    595  ihipStream_t *_default_stream;
    596 
    597 
    -
    598  unsigned _device_flags;
    +
    598  unsigned _compute_units;
    599 
    -
    600 private:
    -
    601  hipError_t getProperties(hipDeviceProp_t* prop);
    +
    600  StagingBuffer *_staging_buffer[2]; // one buffer for each direction.
    +
    601 
    602 
    -
    603 private: // Critical data, protected with locked access:
    -
    604  // Members of _protected data MUST be accessed through the LockedAccessor.
    -
    605  // Search for LockedAccessor<ihipDeviceCritical_t> for examples; do not access _criticalData directly.
    -
    606  ihipDeviceCritical_t _criticalData;
    +
    603  unsigned _device_flags;
    +
    604 
    +
    605 private:
    +
    606  hipError_t getProperties(hipDeviceProp_t* prop);
    607 
    -
    608 };
    -
    609 
    -
    610 
    -
    611 
    -
    612 // Global variable definition:
    -
    613 extern std::once_flag hip_initialized;
    -
    614 extern ihipDevice_t *g_devices; // Array of all non-emulated (ie GPU) accelerators in the system.
    -
    615 extern bool g_visible_device; // Set the flag when HIP_VISIBLE_DEVICES is set
    -
    616 extern unsigned g_deviceCnt;
    -
    617 extern std::vector<int> g_hip_visible_devices; /* vector of integers that contains the visible device IDs */
    -
    618 extern hsa_agent_t g_cpu_agent ; // the CPU agent.
    -
    619 //=================================================================================================
    -
    620 void ihipInit();
    -
    621 const char *ihipErrorString(hipError_t);
    -
    622 ihipDevice_t *ihipGetTlsDefaultDevice();
    -
    623 ihipDevice_t *ihipGetDevice(int);
    -
    624 void ihipSetTs(hipEvent_t e);
    -
    625 
    -
    626 template<typename T>
    -
    627 hc::completion_future ihipMemcpyKernel(hipStream_t, T*, const T*, size_t);
    -
    628 
    -
    629 template<typename T>
    -
    630 hc::completion_future ihipMemsetKernel(hipStream_t, T*, T, size_t);
    -
    631 
    -
    632 hipStream_t ihipSyncAndResolveStream(hipStream_t);
    -
    633 template <typename T>
    -
    634 
    -
    635 hc::completion_future
    -
    636 ihipMemsetKernel(hipStream_t stream, T * ptr, T val, size_t sizeBytes)
    -
    637 {
    -
    638  int wg = std::min((unsigned)8, stream->getDevice()->_compute_units);
    -
    639  const int threads_per_wg = 256;
    -
    640 
    -
    641  int threads = wg * threads_per_wg;
    -
    642  if (threads > sizeBytes) {
    -
    643  threads = ((sizeBytes + threads_per_wg - 1) / threads_per_wg) * threads_per_wg;
    -
    644  }
    +
    608 private: // Critical data, protected with locked access:
    +
    609  // Members of _protected data MUST be accessed through the LockedAccessor.
    +
    610  // Search for LockedAccessor<ihipDeviceCritical_t> for examples; do not access _criticalData directly.
    +
    611  ihipDeviceCritical_t _criticalData;
    +
    612 
    +
    613 };
    +
    614 
    +
    615 
    +
    616 
    +
    617 // Global variable definition:
    +
    618 extern std::once_flag hip_initialized;
    +
    619 extern ihipDevice_t *g_devices; // Array of all non-emulated (ie GPU) accelerators in the system.
    +
    620 extern bool g_visible_device; // Set the flag when HIP_VISIBLE_DEVICES is set
    +
    621 extern unsigned g_deviceCnt;
    +
    622 extern std::vector<int> g_hip_visible_devices; /* vector of integers that contains the visible device IDs */
    +
    623 extern hsa_agent_t g_cpu_agent ; // the CPU agent.
    +
    624 //=================================================================================================
    +
    625 void ihipInit();
    +
    626 const char *ihipErrorString(hipError_t);
    +
    627 ihipDevice_t *ihipGetTlsDefaultDevice();
    +
    628 ihipDevice_t *ihipGetDevice(int);
    +
    629 void ihipSetTs(hipEvent_t e);
    +
    630 
    +
    631 template<typename T>
    +
    632 hc::completion_future ihipMemcpyKernel(hipStream_t, T*, const T*, size_t);
    +
    633 
    +
    634 template<typename T>
    +
    635 hc::completion_future ihipMemsetKernel(hipStream_t, T*, T, size_t);
    +
    636 
    +
    637 hipStream_t ihipSyncAndResolveStream(hipStream_t);
    +
    638 template <typename T>
    +
    639 
    +
    640 hc::completion_future
    +
    641 ihipMemsetKernel(hipStream_t stream, T * ptr, T val, size_t sizeBytes)
    +
    642 {
    +
    643  int wg = std::min((unsigned)8, stream->getDevice()->_compute_units);
    +
    644  const int threads_per_wg = 256;
    645 
    -
    646 
    -
    647  hc::extent<1> ext(threads);
    -
    648  auto ext_tile = ext.tile(threads_per_wg);
    -
    649 
    -
    650  hc::completion_future cf =
    -
    651  hc::parallel_for_each(
    -
    652  stream->_av,
    -
    653  ext_tile,
    -
    654  [=] (hc::tiled_index<1> idx)
    -
    655  __attribute__((hc))
    -
    656  {
    -
    657  int offset = amp_get_global_id(0);
    -
    658  // TODO-HCC - change to hc_get_local_size()
    -
    659  int stride = amp_get_local_size(0) * hc_get_num_groups(0) ;
    -
    660 
    -
    661  for (int i=offset; i<sizeBytes; i+=stride) {
    -
    662  ptr[i] = val;
    -
    663  }
    -
    664  });
    +
    646  int threads = wg * threads_per_wg;
    +
    647  if (threads > sizeBytes) {
    +
    648  threads = ((sizeBytes + threads_per_wg - 1) / threads_per_wg) * threads_per_wg;
    +
    649  }
    +
    650 
    +
    651 
    +
    652  hc::extent<1> ext(threads);
    +
    653  auto ext_tile = ext.tile(threads_per_wg);
    +
    654 
    +
    655  hc::completion_future cf =
    +
    656  hc::parallel_for_each(
    +
    657  stream->_av,
    +
    658  ext_tile,
    +
    659  [=] (hc::tiled_index<1> idx)
    +
    660  __attribute__((hc))
    +
    661  {
    +
    662  int offset = amp_get_global_id(0);
    +
    663  // TODO-HCC - change to hc_get_local_size()
    +
    664  int stride = amp_get_local_size(0) * hc_get_num_groups(0) ;
    665 
    -
    666  return cf;
    -
    667 }
    -
    668 
    -
    669 template <typename T>
    -
    670 hc::completion_future
    -
    671 ihipMemcpyKernel(hipStream_t stream, T * c, const T * a, size_t sizeBytes)
    -
    672 {
    -
    673  int wg = std::min((unsigned)8, stream->getDevice()->_compute_units);
    -
    674  const int threads_per_wg = 256;
    -
    675 
    -
    676  int threads = wg * threads_per_wg;
    -
    677  if (threads > sizeBytes) {
    -
    678  threads = ((sizeBytes + threads_per_wg - 1) / threads_per_wg) * threads_per_wg;
    -
    679  }
    +
    666  for (int i=offset; i<sizeBytes; i+=stride) {
    +
    667  ptr[i] = val;
    +
    668  }
    +
    669  });
    +
    670 
    +
    671  return cf;
    +
    672 }
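`ihipMemsetKernel` uses a grid-stride loop. Each work-item starts at its global id (`amp_get_global_id(0)`) and advances by the total number of work-items (`amp_get_local_size(0) * hc_get_num_groups(0)`). A small, fixed grid can therefore cover a buffer of any length, with every index written exactly once. The host-side simulation below checks that property. `simulateGridStrideMemset` is a hypothetical helper, and the outer loop stands in for the parallel work-items. One observation, hedged because callers are not visible in this excerpt: the kernel compares an element index `i` against `sizeBytes` while writing `ptr[i]` of type `T`. So the bound is a byte count only when `sizeof(T) == 1`, and for wider `T` the caller would need to pass an element count.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host simulation of the kernel body: 'numGroups' and 'localSize' play the
// roles of hc_get_num_groups(0) and amp_get_local_size(0).
template <typename T>
void simulateGridStrideMemset(std::vector<T> &buf, T val,
                              std::size_t numGroups, std::size_t localSize,
                              std::vector<int> &writesPerElement) {
    const std::size_t n = buf.size();
    const std::size_t stride = localSize * numGroups; // total work-items
    writesPerElement.assign(n, 0);
    // Outer loop: one iteration per work-item (its global id, the kernel's 'offset').
    for (std::size_t offset = 0; offset < stride; ++offset) {
        // Inner loop: what each work-item executes in the kernel.
        for (std::size_t i = offset; i < n; i += stride) {
            buf[i] = val;
            ++writesPerElement[i];
        }
    }
}
```

With 4 groups of 256 work-items and a 1000-element buffer, each element is written once by the work-item whose id equals its index. With a 100,000-element buffer and 8 groups, each work-item handles about 49 elements spaced 2048 apart.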
    +
    673 
    +
    674 template <typename T>
    +
    675 hc::completion_future
    +
    676 ihipMemcpyKernel(hipStream_t stream, T * c, const T * a, size_t sizeBytes)
    +
    677 {
    +
    678  int wg = std::min((unsigned)8, stream->getDevice()->_compute_units);
    +
    679  const int threads_per_wg = 256;
    680 
    -
    681 
    -
    682  hc::extent<1> ext(threads);
    -
    683  auto ext_tile = ext.tile(threads_per_wg);
    -
    684 
    -
    685  hc::completion_future cf =
    -
    686  hc::parallel_for_each(
    -
    687  stream->_av,
    -
    688  ext_tile,
    -
    689  [=] (hc::tiled_index<1> idx)
    -
    690  __attribute__((hc))
    -
    691  {
    -
    692  int offset = amp_get_global_id(0);
    -
    693  // TODO-HCC - change to hc_get_local_size()
    -
    694  int stride = amp_get_local_size(0) * hc_get_num_groups(0) ;
    -
    695 
    -
    696  for (int i=offset; i<sizeBytes; i+=stride) {
    -
    697  c[i] = a[i];
    -
    698  }
    -
    699  });
    +
    681  int threads = wg * threads_per_wg;
    +
    682  if (threads > sizeBytes) {
    +
    683  threads = ((sizeBytes + threads_per_wg - 1) / threads_per_wg) * threads_per_wg;
    +
    684  }
    +
    685 
    +
    686 
    +
    687  hc::extent<1> ext(threads);
    +
    688  auto ext_tile = ext.tile(threads_per_wg);
    +
    689 
    +
    690  hc::completion_future cf =
    +
    691  hc::parallel_for_each(
    +
    692  stream->_av,
    +
    693  ext_tile,
    +
    694  [=] (hc::tiled_index<1> idx)
    +
    695  __attribute__((hc))
    +
    696  {
    +
    697  int offset = amp_get_global_id(0);
    +
    698  // TODO-HCC - change to hc_get_local_size()
    +
    699  int stride = amp_get_local_size(0) * hc_get_num_groups(0) ;
    700 
    -
    701  return cf;
    -
    702 }
    -
    703 
    -
    704 #endif
    -
    Definition: hip_hcc.h:566
    -
    Definition: hip_hcc.h:340
    -
    Definition: hip_hcc.h:279
    +
    701  for (int i=offset; i<sizeBytes; i+=stride) {
    +
    702  c[i] = a[i];
    +
    703  }
    +
    704  });
    +
    705 
    +
    706  return cf;
    +
    707 }
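`ihipMemcpyKernel` and `ihipMemsetKernel` share the same launch-size rule. They use at most 8 work-groups (fewer if the device has fewer compute units) of 256 work-items, and trim the grid to the smallest multiple of 256 that covers `sizeBytes` when the transfer is small. A host function reproducing that arithmetic makes the numbers concrete. The name `launchThreads` is hypothetical, and the function uses `std::size_t` where the original uses `int`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Same arithmetic as the kernels' 'threads' computation.
std::size_t launchThreads(unsigned computeUnits, std::size_t sizeBytes) {
    const unsigned wg = std::min(8u, computeUnits);  // number of work-groups
    const std::size_t threadsPerWg = 256;
    std::size_t threads = wg * threadsPerWg;
    if (threads > sizeBytes) {
        // Round sizeBytes up to the next multiple of threadsPerWg.
        threads = ((sizeBytes + threadsPerWg - 1) / threadsPerWg) * threadsPerWg;
    }
    return threads;
}
```

For example, a 1 MiB copy on a 64-CU device gets 2048 work-items, and each one loops over many elements. A 300-byte copy gets 512 (two groups). A 10,000-byte copy on a 4-CU device gets 1024.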
    +
    708 
    +
    709 #endif
    +
    Definition: hip_hcc.h:571
    +
    Definition: hip_hcc.h:338
    +
    Definition: hip_hcc.h:277
    hipError_t
    Definition: hip_runtime_api.h:142
    Definition: hip_runtime_api.h:47
    -
    Definition: hip_hcc.h:506
    -
    Definition: hip_hcc.h:266
    +
    Definition: hip_hcc.h:508
    +
    Definition: hip_hcc.h:264
    Definition: hip_runtime_api.h:74
    Definition: staging_buffer.h:40
    -
    Definition: hip_hcc.h:483
    -
    Definition: hip_hcc.h:220
    -
    Definition: hip_hcc.h:399
    -
    Definition: hip_hcc.h:352
    -
    Definition: hip_hcc.h:307
    +
    Definition: hip_hcc.h:485
    +
    Definition: hip_hcc.h:217
    +
    Definition: hip_hcc.h:397
    +
    Definition: hip_hcc.h:350
    +
    Definition: hip_hcc.h:305
diff --git a/projects/hip/docs/RuntimeAPI/html/hip__ldg_8h_source.html b/projects/hip/docs/RuntimeAPI/html/hip__ldg_8h_source.html
new file mode 100644
index 0000000000..7ea20a40cc
--- /dev/null
+++ b/projects/hip/docs/RuntimeAPI/html/hip__ldg_8h_source.html
@@ -0,0 +1,207 @@
+HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_ldg.h Source File
    +
    + + + + + + +
    +
    HIP: Heterogenous-computing Interface for Portability +
    +
    +
    + + + + + + + + + +
    + +
    + + +
    +
    +
    +
    hip_ldg.h
    +
    +
    +
    1 /*
    +
    2 Copyright (c) 2015-2016 Advanced Micro Devices, Inc. All rights reserved.
    +
    3 Permission is hereby granted, free of charge, to any person obtaining a copy
    +
    4 of this software and associated documentation files (the "Software"), to deal
    +
    5 in the Software without restriction, including without limitation the rights
    +
    6 to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
    +
    7 copies of the Software, and to permit persons to whom the Software is
    +
    8 furnished to do so, subject to the following conditions:
    +
    9 The above copyright notice and this permission notice shall be included in
    +
    10 all copies or substantial portions of the Software.
    +
    11 THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    +
    12 IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    +
    13 FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
    +
    14 AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    +
    15 LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
    +
    16 OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
    +
    17 THE SOFTWARE.
    +
    18 */
    +
    19 
    +
    20 #ifndef HIP_LDG_H
    +
    21 #define HIP_LDG_H
    +
    22 
    +
    23 #if __HCC__
    +
    24 #include"hip_vector_types.h"
    +
    25 #include"host_defines.h"
    +
    26 #if __hcc_workweek__ >= 16164
    +
    27 #include"hip/hip_vector_types.h"
    +
    28 #include"hip/hcc_detail/host_defines.h"
    +
    29 
    +
    30 
    +
    31 __device__ char __ldg(const char* );
    +
    32 __device__ char1 __ldg(const char1* );
    +
    33 __device__ char2 __ldg(const char2* );
    +
    34 __device__ char3 __ldg(const char3* );
    +
    35 __device__ char4 __ldg(const char4* );
    +
    36 __device__ signed char __ldg(const signed char* );
    +
    37 __device__ unsigned char __ldg(const unsigned char* );
    +
    38 
    +
    39 __device__ short __ldg(const short* );
    +
    40 __device__ short1 __ldg(const short1* );
    +
    41 __device__ short2 __ldg(const short2* );
    +
    42 __device__ short3 __ldg(const short3* );
    +
    43 __device__ short4 __ldg(const short4* );
    +
    44 __device__ unsigned short __ldg(const unsigned short* );
    +
    45 
    +
    46 __device__ int __ldg(const int* );
    +
    47 __device__ int1 __ldg(const int1* );
    +
    48 __device__ int2 __ldg(const int2* );
    +
    49 __device__ int3 __ldg(const int3* );
    +
    50 __device__ int4 __ldg(const int4* );
    +
    51 __device__ unsigned int __ldg(const unsigned int* );
    +
    52 
    +
    53 
    +
    54 __device__ long __ldg(const long* );
    +
    55 __device__ long1 __ldg(const long1* );
    +
    56 __device__ long2 __ldg(const long2* );
    +
    57 __device__ long3 __ldg(const long3* );
    +
    58 __device__ long4 __ldg(const long4* );
    +
    59 __device__ unsigned long __ldg(const unsigned long* );
    +
    60 
    +
    61 __device__ long long __ldg(const long long* );
    +
    62 __device__ longlong1 __ldg(const longlong1* );
    +
    63 __device__ longlong2 __ldg(const longlong2* );
    +
    64 __device__ longlong3 __ldg(const longlong3* );
    +
    65 __device__ longlong4 __ldg(const longlong4* );
    +
    66 __device__ unsigned long long __ldg(const unsigned long long* );
    +
    67 
    +
    68 __device__ uchar1 __ldg(const uchar1* );
    +
    69 __device__ uchar2 __ldg(const uchar2* );
    +
    70 __device__ uchar3 __ldg(const uchar3* );
    +
    71 __device__ uchar4 __ldg(const uchar4* );
    +
    72 
    +
    73 __device__ ushort1 __ldg(const ushort1* );
    +
    74 __device__ ushort2 __ldg(const ushort2* );
    +
    75 __device__ ushort3 __ldg(const ushort3* );
    +
    76 __device__ ushort4 __ldg(const ushort4* );
    +
    77 
    +
    78 __device__ uint1 __ldg(const uint1* );
    +
    79 __device__ uint2 __ldg(const uint2* );
    +
    80 __device__ uint3 __ldg(const uint3* );
    +
    81 __device__ uint4 __ldg(const uint4* );
    +
    82 
    +
    83 __device__ ulonglong1 __ldg(const ulonglong1* );
    +
    84 __device__ ulonglong2 __ldg(const ulonglong2* );
    +
    85 __device__ ulonglong3 __ldg(const ulonglong3* );
    +
    86 __device__ ulonglong4 __ldg(const ulonglong4* );
    +
    87 
    +
    88 __device__ float __ldg(const float* );
    +
    89 __device__ float1 __ldg(const float1* );
    +
    90 __device__ float2 __ldg(const float2* );
    +
    91 __device__ float3 __ldg(const float3* );
    +
    92 __device__ float4 __ldg(const float4* );
    +
    93 
    +
    94 __device__ double __ldg(const double* );
    +
    95 __device__ double1 __ldg(const double1* );
    +
    96 __device__ double2 __ldg(const double2* );
    +
    97 __device__ double3 __ldg(const double3* );
    +
    98 __device__ double4 __ldg(const double4* );
    +
    99 
    +
    100 #endif // __hcc_workweek__
    +
    101 
    +
    102 #endif // __HCC__
    +
    103 
    +
    104 #endif // HIP_LDG_H
    +
    105 
    +
    TODO-doc.
    +
    Defines the different new vector types for HIP runtime.
    +
diff --git a/projects/hip/docs/RuntimeAPI/html/hip__runtime_8h_source.html b/projects/hip/docs/RuntimeAPI/html/hip__runtime_8h_source.html
index 4776bfead4..67ae4a53ac 100644
--- a/projects/hip/docs/RuntimeAPI/html/hip__runtime_8h_source.html
+++ b/projects/hip/docs/RuntimeAPI/html/hip__runtime_8h_source.html
@@ -4,7 +4,7 @@
-HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hip_runtime.h Source File
+HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hip_runtime.h Source File
@@ -146,7 +146,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
diff --git a/projects/hip/docs/RuntimeAPI/html/hip__runtime__api_8h_source.html b/projects/hip/docs/RuntimeAPI/html/hip__runtime__api_8h_source.html
index 604563cbe3..f7a9ee1e49 100644
--- a/projects/hip/docs/RuntimeAPI/html/hip__runtime__api_8h_source.html
+++ b/projects/hip/docs/RuntimeAPI/html/hip__runtime__api_8h_source.html
@@ -4,7 +4,7 @@
-HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hip_runtime_api.h Source File
+HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hip_runtime_api.h Source File
@@ -226,131 +226,134 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
- -
    163 } hipError_t;
    -
    164 
    -
    165 /*
    -
    166  * @brief hipDeviceAttribute_t
    -
    167  * @enum
    -
    168  * @ingroup Enumerations
    -
    169  */
    -
    170 typedef enum hipDeviceAttribute_t {
    - - - - - - - - - - - - - - - - - - - - - - - - - - -
    197 
    -
    202 #if defined(__HIP_PLATFORM_HCC__) && !defined (__HIP_PLATFORM_NVCC__)
    -
    203 #include "hip/hcc_detail/hip_runtime_api.h"
    -
    204 #elif defined(__HIP_PLATFORM_NVCC__) && !defined (__HIP_PLATFORM_HCC__)
    -
    205 #include "hip/nvcc_detail/hip_runtime_api.h"
    -
    206 #else
    -
    207 #error("Must define exactly one of __HIP_PLATFORM_HCC__ or __HIP_PLATFORM_NVCC__");
    -
    208 #endif
    -
    209 
    -
    210 
    -
    218 #ifdef __cplusplus
    -
    219 template<class T>
    -
    220 static inline hipError_t hipMalloc ( T** devPtr, size_t size)
    -
    221 {
    -
    222  return hipMalloc((void**)devPtr, size);
    -
    223 }
    -
    224 
    -
    225 // Provide an override to automatically typecast the pointer type from void**, and also provide a default for the flags.
    -
    226 template<class T>
    -
    227 static inline hipError_t hipHostMalloc( T** ptr, size_t size, unsigned int flags = hipHostMallocDefault)
    -
    228 {
    -
    229  return hipHostMalloc((void**)ptr, size, flags);
    -
    230 }
    -
    231 #endif
    + + + +
    165 } hipError_t;
    +
    166 
    +
    167 /*
    +
    168  * @brief hipDeviceAttribute_t
    +
    169  * @enum
    +
    170  * @ingroup Enumerations
    +
    171  */
    +
    172 typedef enum hipDeviceAttribute_t {
    + + + + + + + + + + + + + + + + + + + + + + + + + + +
    199 
    +
    204 #if defined(__HIP_PLATFORM_HCC__) && !defined (__HIP_PLATFORM_NVCC__)
    +
    205 #include "hip/hcc_detail/hip_runtime_api.h"
    +
    206 #elif defined(__HIP_PLATFORM_NVCC__) && !defined (__HIP_PLATFORM_HCC__)
    +
    207 #include "hip/nvcc_detail/hip_runtime_api.h"
    +
    208 #else
    +
    209 #error("Must define exactly one of __HIP_PLATFORM_HCC__ or __HIP_PLATFORM_NVCC__");
    +
    210 #endif
    +
    211 
    +
    212 
    +
    220 #ifdef __cplusplus
    +
    221 template<class T>
    +
    222 static inline hipError_t hipMalloc ( T** devPtr, size_t size)
    +
    223 {
    +
    224  return hipMalloc((void**)devPtr, size);
    +
    225 }
    +
    226 
    +
    227 // Provide an override to automatically typecast the pointer type from void**, and also provide a default for the flags.
    +
    228 template<class T>
    +
    229 static inline hipError_t hipHostMalloc( T** ptr, size_t size, unsigned int flags = hipHostMallocDefault)
    +
    230 {
    +
    231  return hipHostMalloc((void**)ptr, size, flags);
    +
    232 }
    +
    233 #endif
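The `hipMalloc(T**, size_t)` and `hipHostMalloc(T**, size_t, unsigned)` templates above are a common C++ convenience layer over a C API. They accept any `T**`, forward to the `void**` overload, and supply a default for `flags`. The sketch below reproduces that overloading pattern so it runs without the HIP runtime. `myMalloc`, `myHostMalloc`, `myError_t` and `myHostMallocDefault` are hypothetical stand-ins, and `std::malloc` replaces the real allocation. A `float**` argument cannot convert implicitly to `void**`, so the non-template overload is not viable for it and the template is selected. For an exact `void**` argument with all parameters supplied, the non-template overload wins.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical error codes and flag value.
enum myError_t { mySuccess = 0, myErrorMemoryAllocation = 2 };
const unsigned myHostMallocDefault = 0;

// C-style entry point taking void** (stands in for hipMalloc(void**, size_t)).
myError_t myMalloc(void **ptr, std::size_t size) {
    *ptr = std::malloc(size);
    return *ptr ? mySuccess : myErrorMemoryAllocation;
}

// C-style entry point with explicit flags (flags are ignored in this sketch).
myError_t myHostMalloc(void **ptr, std::size_t size, unsigned flags) {
    (void)flags;
    return myMalloc(ptr, size);
}

// Typed overload: callers write myMalloc(&p, n) with no cast.
template <class T>
static inline myError_t myMalloc(T **ptr, std::size_t size) {
    return myMalloc(reinterpret_cast<void **>(ptr), size); // same cast as (void**)ptr
}

// Typed overload with a default for the flags argument.
template <class T>
static inline myError_t myHostMalloc(T **ptr, std::size_t size,
                                     unsigned flags = myHostMallocDefault) {
    return myHostMalloc(reinterpret_cast<void **>(ptr), size, flags);
}
```

With these overloads, `float *p; myMalloc(&p, n * sizeof(float));` compiles without the `(void**)` cast that a C caller would need.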
    Call to hipGetDeviceCount returned 0 devices.
    Definition: hip_runtime_api.h:155
    size_t totalConstMem
    Size of shared memory region (in bytes).
    Definition: hip_runtime_api.h:86
    -
    Maximum Shared Memory Per Multiprocessor.
    Definition: hip_runtime_api.h:194
    -
    Maximum x-dimension of a block.
    Definition: hip_runtime_api.h:172
    -
    Maximum x-dimension of a grid.
    Definition: hip_runtime_api.h:175
    +
    Maximum Shared Memory Per Multiprocessor.
    Definition: hip_runtime_api.h:196
    +
    Maximum x-dimension of a block.
    Definition: hip_runtime_api.h:174
    +
    Maximum x-dimension of a grid.
    Definition: hip_runtime_api.h:177
    Peer access was already enabled from the current device.
    Definition: hip_runtime_api.h:159
    Unknown symbol.
    Definition: hip_runtime_api.h:146
    HSA runtime memory call returned error. Typically not seen in production systems. ...
    Definition: hip_runtime_api.h:160
    -
    Global memory bus width in bits.
    Definition: hip_runtime_api.h:184
    +
    Global memory bus width in bits.
    Definition: hip_runtime_api.h:186
    Successful completion.
    Definition: hip_runtime_api.h:143
    int minor
    Minor compute capability. On HCC, this is an approximation and features may differ from CUDA CC...
    Definition: hip_runtime_api.h:88
    int canMapHostMemory
    Check whether HIP can map host memory.
    Definition: hip_runtime_api.h:100
    -
    Maximum number of 32-bit registers available to a thread block. This number is shared by all thread b...
    Definition: hip_runtime_api.h:181
    +
    Maximum number of 32-bit registers available to a thread block. This number is shared by all thread b...
    Definition: hip_runtime_api.h:183
    int regsPerBlock
    Registers per block.
    Definition: hip_runtime_api.h:78
    -
    Size of L2 cache in bytes. 0 if the device doesn't have L2 cache.
    Definition: hip_runtime_api.h:187
    +
    Size of L2 cache in bytes. 0 if the device doesn't have L2 cache.
    Definition: hip_runtime_api.h:189
    #define hipHostMallocDefault
    Flags that can be used with hipHostMalloc.
    Definition: hip_runtime_api.h:69
    HSA runtime call other than memory returned error. Typically not seen in production systems...
    Definition: hip_runtime_api.h:161
    int isMultiGpuBoard
    1 if device is on a multi-GPU board, 0 if not.
    Definition: hip_runtime_api.h:99
    DeviceID must be in range 0...#compute-devices.
    Definition: hip_runtime_api.h:150
    -
    Peak clock frequency in kilohertz.
    Definition: hip_runtime_api.h:182
    +
    Peak clock frequency in kilohertz.
    Definition: hip_runtime_api.h:184
    Definition: hip_runtime_api.h:117
    int clockRate
    Max clock frequency of the multiProcessors in khz.
    Definition: hip_runtime_api.h:83
    -
    Maximum z-dimension of a grid.
    Definition: hip_runtime_api.h:177
    +
    Maximum z-dimension of a grid.
    Definition: hip_runtime_api.h:179
    Out of resources error.
    Definition: hip_runtime_api.h:147
    -
    Minor compute capability version number.
    Definition: hip_runtime_api.h:190
    -
    Maximum shared memory available per block in bytes.
    Definition: hip_runtime_api.h:178
    +
    Minor compute capability version number.
    Definition: hip_runtime_api.h:192
    +
    Maximum shared memory available per block in bytes.
    Definition: hip_runtime_api.h:180
    int pciBusID
    PCI Bus ID.
    Definition: hip_runtime_api.h:96
    -
    Maximum y-dimension of a grid.
    Definition: hip_runtime_api.h:176
    -
    Multiple GPU devices.
    Definition: hip_runtime_api.h:195
    +
    Maximum y-dimension of a grid.
    Definition: hip_runtime_api.h:178
    +
    Multiple GPU devices.
    Definition: hip_runtime_api.h:197
    Unknown error.
    Definition: hip_runtime_api.h:157
    int maxThreadsPerBlock
    Max work items per work group or workgroup max size.
    Definition: hip_runtime_api.h:80
    -
    Maximum y-dimension of a block.
    Definition: hip_runtime_api.h:173
    -
    hipError_t hipHostMalloc(void **ptr, size_t size, unsigned int flags)
    Allocate device accessible page locked host memory.
    Definition: hip_memory.cpp:152
    +
    Maximum y-dimension of a block.
    Definition: hip_runtime_api.h:175
    +
    hipError_t hipHostMalloc(void **ptr, size_t size, unsigned int flags)
    Allocate device accessible page locked host memory.
    Definition: hip_memory.cpp:148
    size_t sharedMemPerBlock
    Size of shared memory region (in bytes).
    Definition: hip_runtime_api.h:77
    int maxThreadsPerMultiProcessor
    Maximum resident threads per multi-processor.
    Definition: hip_runtime_api.h:91
    +
    Produced when trying to lock page-locked memory.
    Definition: hip_runtime_api.h:162
    int l2CacheSize
    L2 cache size.
    Definition: hip_runtime_api.h:90
    -
    hipDeviceAttribute_t
    Definition: hip_runtime_api.h:170
    -
    Major compute capability version number.
    Definition: hip_runtime_api.h:189
    +
    hipDeviceAttribute_t
    Definition: hip_runtime_api.h:172
    +
    Major compute capability version number.
    Definition: hip_runtime_api.h:191
    Peer access was never enabled from the current device.
    Definition: hip_runtime_api.h:158
    -
    Maximum number of threads per block.
    Definition: hip_runtime_api.h:171
    +
    Maximum number of threads per block.
    Definition: hip_runtime_api.h:173
    Resource handle (hipEvent_t or hipStream_t) invalid.
    Definition: hip_runtime_api.h:149
    Memory allocation error.
    Definition: hip_runtime_api.h:144
    hipDeviceArch_t arch
    Architectural feature flags. New for HIP.
    Definition: hip_runtime_api.h:94
    int maxGridSize[3]
    Max grid dimensions (XYZ).
    Definition: hip_runtime_api.h:82
    int computeMode
    Compute mode.
    Definition: hip_runtime_api.h:92
    -
    Maximum z-dimension of a block.
    Definition: hip_runtime_api.h:174
    -
    PCI Bus ID.
    Definition: hip_runtime_api.h:192
    +
    Maximum z-dimension of a block.
    Definition: hip_runtime_api.h:176
    +
    PCI Bus ID.
    Definition: hip_runtime_api.h:194
    Invalid memory copy direction.
    Definition: hip_runtime_api.h:151
    -
    Marker that more error codes are needed.
    Definition: hip_runtime_api.h:162
    -
    Warp size in threads.
    Definition: hip_runtime_api.h:180
    +
    Marker that more error codes are needed.
    Definition: hip_runtime_api.h:164
    +
    Warp size in threads.
    Definition: hip_runtime_api.h:182
    int major
    Major compute capability. On HCC, this is an approximation and features may differ from CUDA CC...
    Definition: hip_runtime_api.h:87
    -
    Peak memory clock frequency in kilohertz.
    Definition: hip_runtime_api.h:183
    -
    Maximum resident threads per multiprocessor.
    Definition: hip_runtime_api.h:188
    +
    Peak memory clock frequency in kilohertz.
    Definition: hip_runtime_api.h:185
    +
    Maximum resident threads per multiprocessor.
    Definition: hip_runtime_api.h:190
    hipError_t
    Definition: hip_runtime_api.h:142
    int clockInstructionRate
    Frequency in khz of the timer used by the device-side "clock*" instructions. New for HIP...
    Definition: hip_runtime_api.h:93
    -
    Constant memory size in bytes.
    Definition: hip_runtime_api.h:179
    +
    Constant memory size in bytes.
    Definition: hip_runtime_api.h:181
    Memory free error.
    Definition: hip_runtime_api.h:145
    int warpSize
    Warp size.
    Definition: hip_runtime_api.h:79
    int concurrentKernels
    Device can possibly execute multiple kernels concurrently.
    Definition: hip_runtime_api.h:95
    size_t totalGlobalMem
    Size of global memory region (in bytes).
    Definition: hip_runtime_api.h:76
    Invalid Device Pointer.
    Definition: hip_runtime_api.h:152
    -
    hipError_t hipMalloc(void **ptr, size_t size)
    Allocate memory on the default accelerator.
    Definition: hip_memory.cpp:117
    -
    Compute mode that device is currently in.
    Definition: hip_runtime_api.h:186
    -
    PCI Device ID.
    Definition: hip_runtime_api.h:193
    +
    hipError_t hipMalloc(void **ptr, size_t size)
    Allocate memory on the default accelerator.
    Definition: hip_memory.cpp:116
    +
    Compute mode that device is currently in.
    Definition: hip_runtime_api.h:188
    +
    PCI Device ID.
    Definition: hip_runtime_api.h:195
    int maxThreadsDim[3]
    Max number of threads in each dimension (XYZ) of a block.
    Definition: hip_runtime_api.h:81
    -
    Number of multiprocessors on the device.
    Definition: hip_runtime_api.h:185
    +
    Number of multiprocessors on the device.
    Definition: hip_runtime_api.h:187
    int memoryBusWidth
    Global memory bus width in bits.
    Definition: hip_runtime_api.h:85
    One or more of the parameters passed to the API call is NULL or not in an acceptable range...
    Definition: hip_runtime_api.h:148
    Definition: hip_runtime_api.h:74
    @@ -358,15 +361,16 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    size_t maxSharedMemoryPerMultiProcessor
    Maximum Shared Memory Per Multiprocessor.
    Definition: hip_runtime_api.h:98
    int pciDeviceID
    PCI Device ID.
    Definition: hip_runtime_api.h:97
    char name[256]
    Device name.
    Definition: hip_runtime_api.h:75
    +
    Produced when trying to unlock a non-page-locked memory.
    Definition: hip_runtime_api.h:163
    Definition: hip_runtime_api.h:35
    int memoryClockRate
    Max global memory clock frequency in khz.
    Definition: hip_runtime_api.h:84
    TODO comment from hipErrorInitializationError.
    Definition: hip_runtime_api.h:153
    -
    Device can possibly execute multiple kernels concurrently.
    Definition: hip_runtime_api.h:191
    +
    Device can possibly execute multiple kernels concurrently.
    Definition: hip_runtime_api.h:193
    int multiProcessorCount
    Number of multi-processors (compute units).
    Definition: hip_runtime_api.h:89
    diff --git a/projects/hip/docs/RuntimeAPI/html/hip__texture_8h.html b/projects/hip/docs/RuntimeAPI/html/hip__texture_8h.html
    index ed84404545..13af97ee44 100644
    --- a/projects/hip/docs/RuntimeAPI/html/hip__texture_8h.html
    +++ b/projects/hip/docs/RuntimeAPI/html/hip__texture_8h.html
    @@ -4,7 +4,7 @@
    -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_texture.h File Reference
    +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_texture.h File Reference
    @@ -199,7 +199,7 @@ template<class T , int dim, enum hipTextureReadMode readMode>
    diff --git a/projects/hip/docs/RuntimeAPI/html/structLockedBase.html b/projects/hip/docs/RuntimeAPI/html/structLockedBase.html
    index da945347a1..1a7c52b1ba 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structLockedBase.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structLockedBase.html
    @@ -118,12 +118,12 @@ MUTEX_TYPE _mutex
    The documentation for this struct was generated from the following file:
    -/home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_hcc.h
    +/home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_hcc.h
    diff --git a/projects/hip/docs/RuntimeAPI/html/structStagingBuffer-members.html b/projects/hip/docs/RuntimeAPI/html/structStagingBuffer-members.html
    index 9c47a5d15e..92a9c25c65 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structStagingBuffer-members.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structStagingBuffer-members.html
    @@ -95,12 +95,13 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
     CopyDeviceToHostPinInPlace(void *dst, const void *src, size_t sizeBytes, hsa_signal_t *waitFor) (defined in StagingBuffer)StagingBuffer
     CopyHostToDevice(void *dst, const void *src, size_t sizeBytes, hsa_signal_t *waitFor) (defined in StagingBuffer)StagingBuffer
     CopyHostToDevicePinInPlace(void *dst, const void *src, size_t sizeBytes, hsa_signal_t *waitFor) (defined in StagingBuffer)StagingBuffer
    -StagingBuffer(hsa_agent_t hsaAgent, hsa_region_t systemRegion, size_t bufferSize, int numBuffers) (defined in StagingBuffer)StagingBuffer
    -~StagingBuffer() (defined in StagingBuffer)StagingBuffer
    +CopyPeerToPeer(void *dst, hsa_agent_t dstAgent, const void *src, hsa_agent_t srcAgent, size_t sizeBytes, hsa_signal_t *waitFor) (defined in StagingBuffer)StagingBuffer
    +StagingBuffer(hsa_agent_t hsaAgent, hsa_region_t systemRegion, size_t bufferSize, int numBuffers) (defined in StagingBuffer)StagingBuffer
    +~StagingBuffer() (defined in StagingBuffer)StagingBuffer
    diff --git a/projects/hip/docs/RuntimeAPI/html/structStagingBuffer.html b/projects/hip/docs/RuntimeAPI/html/structStagingBuffer.html
    index 64a05ded7e..9256e39ddd 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structStagingBuffer.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structStagingBuffer.html
    @@ -109,6 +109,9 @@ void CopyDeviceToHost
     void CopyDeviceToHostPinInPlace (void *dst, const void *src, size_t sizeBytes, hsa_signal_t *waitFor)
    +
    +void CopyPeerToPeer (void *dst, hsa_agent_t dstAgent, const void *src, hsa_agent_t srcAgent, size_t sizeBytes, hsa_signal_t *waitFor)
    +
    @@ -117,13 +120,13 @@
static const int 

    Static Public Attributes

    _max_buff
     

    The documentation for this struct was generated from the following files:
    -/home/mangupta/hip_git/release_0.84.00/include/hcc_detail/staging_buffer.h
    -/home/mangupta/hip_git/release_0.84.00/src/staging_buffer.cpp
    +/home/mangupta/git/hip/release_0.86.00/include/hcc_detail/staging_buffer.h
    +/home/mangupta/git/hip/release_0.86.00/src/staging_buffer.cpp
    diff --git a/projects/hip/docs/RuntimeAPI/html/structdim3-members.html b/projects/hip/docs/RuntimeAPI/html/structdim3-members.html
    index ac9ae244d8..ccd7ea9126 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structdim3-members.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structdim3-members.html
    @@ -96,7 +96,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/structdim3.html b/projects/hip/docs/RuntimeAPI/html/structdim3.html
    index c6fbec3499..9df96ec8f7 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structdim3.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structdim3.html
    @@ -111,12 +111,12 @@ uint32_t

    Detailed Description

    Struct for data in 3D


    The documentation for this struct was generated from the following file:
    diff --git a/projects/hip/docs/RuntimeAPI/html/structhipChannelFormatDesc-members.html b/projects/hip/docs/RuntimeAPI/html/structhipChannelFormatDesc-members.html
    index bbb75cc17b..7c591d33b6 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structhipChannelFormatDesc-members.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structhipChannelFormatDesc-members.html
    @@ -94,7 +94,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/structhipChannelFormatDesc.html b/projects/hip/docs/RuntimeAPI/html/structhipChannelFormatDesc.html
    index fec2b62496..fecb5d435f 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structhipChannelFormatDesc.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structhipChannelFormatDesc.html
    @@ -98,12 +98,12 @@ int _dummy
    The documentation for this struct was generated from the following file:
    -/home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_texture.h
    +/home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_texture.h
    diff --git a/projects/hip/docs/RuntimeAPI/html/structhipDeviceArch__t-members.html b/projects/hip/docs/RuntimeAPI/html/structhipDeviceArch__t-members.html
    index 49f3d1cfd3..be609f635d 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structhipDeviceArch__t-members.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structhipDeviceArch__t-members.html
    @@ -110,7 +110,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/structhipDeviceArch__t.html b/projects/hip/docs/RuntimeAPI/html/structhipDeviceArch__t.html
    index d8c3e5dc46..a3c90860bd 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structhipDeviceArch__t.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structhipDeviceArch__t.html
    @@ -163,12 +163,12 @@ unsigned
    The documentation for this struct was generated from the following file:
    diff --git a/projects/hip/docs/RuntimeAPI/html/structhipDeviceProp__t-members.html b/projects/hip/docs/RuntimeAPI/html/structhipDeviceProp__t-members.html
    index 8defe33756..340d20dc2b 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structhipDeviceProp__t-members.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structhipDeviceProp__t-members.html
    @@ -119,7 +119,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/structhipDeviceProp__t.html b/projects/hip/docs/RuntimeAPI/html/structhipDeviceProp__t.html
    index d522b400a9..946504549a 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structhipDeviceProp__t.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structhipDeviceProp__t.html
    @@ -203,12 +203,12 @@ int

    Detailed Description

    hipDeviceProp


    The documentation for this struct was generated from the following file:
    diff --git a/projects/hip/docs/RuntimeAPI/html/structhipEvent__t-members.html b/projects/hip/docs/RuntimeAPI/html/structhipEvent__t-members.html
    index b0771e38af..24be66f609 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structhipEvent__t-members.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structhipEvent__t-members.html
    @@ -94,7 +94,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/structhipEvent__t.html b/projects/hip/docs/RuntimeAPI/html/structhipEvent__t.html
    index 4b11f341ef..8d14386695 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structhipEvent__t.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structhipEvent__t.html
    @@ -98,12 +98,12 @@ struct ihipEvent_t *
    The documentation for this struct was generated from the following file:
    diff --git a/projects/hip/docs/RuntimeAPI/html/structhipPointerAttribute__t-members.html b/projects/hip/docs/RuntimeAPI/html/structhipPointerAttribute__t-members.html
    index d1336c466e..2806a775ca 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structhipPointerAttribute__t-members.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structhipPointerAttribute__t-members.html
    @@ -99,7 +99,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/structhipPointerAttribute__t.html b/projects/hip/docs/RuntimeAPI/html/structhipPointerAttribute__t.html
    index 5001dcd04b..c240cfa1e1 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structhipPointerAttribute__t.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structhipPointerAttribute__t.html
    @@ -117,12 +117,12 @@ unsigned allocationFlags

    Detailed Description

    Pointer attributes


    The documentation for this struct was generated from the following file:
    diff --git a/projects/hip/docs/RuntimeAPI/html/structihipEvent__t-members.html b/projects/hip/docs/RuntimeAPI/html/structihipEvent__t-members.html
    index 39b6ea3e46..77aa124649 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structihipEvent__t-members.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structihipEvent__t-members.html
    @@ -99,7 +99,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/structihipEvent__t.html b/projects/hip/docs/RuntimeAPI/html/structihipEvent__t.html
    index ea9db7510a..cfa04779fe 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structihipEvent__t.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structihipEvent__t.html
    @@ -113,12 +113,12 @@ SIGSEQNUM _copy_seq_id
    The documentation for this struct was generated from the following file:
    -/home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_hcc.h
    +/home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_hcc.h
    diff --git a/projects/hip/docs/RuntimeAPI/html/structihipSignal__t-members.html b/projects/hip/docs/RuntimeAPI/html/structihipSignal__t-members.html
    index 83c49b4ee4..c0da64c56c 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structihipSignal__t-members.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structihipSignal__t-members.html
    @@ -99,7 +99,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/structihipSignal__t.html b/projects/hip/docs/RuntimeAPI/html/structihipSignal__t.html
    index 9c17d1f040..e0f59774e9 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structihipSignal__t.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structihipSignal__t.html
    @@ -111,13 +111,13 @@ SIGSEQNUM _sig_id
    The documentation for this struct was generated from the following files:
    -/home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_hcc.h
    -/home/mangupta/hip_git/release_0.84.00/src/hip_hcc.cpp
    +/home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_hcc.h
    +/home/mangupta/git/hip/release_0.86.00/src/hip_hcc.cpp
    diff --git a/projects/hip/docs/RuntimeAPI/html/structtextureReference-members.html b/projects/hip/docs/RuntimeAPI/html/structtextureReference-members.html
    index b180867696..01d24b468b 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structtextureReference-members.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structtextureReference-members.html
    @@ -96,7 +96,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');
    diff --git a/projects/hip/docs/RuntimeAPI/html/structtextureReference.html b/projects/hip/docs/RuntimeAPI/html/structtextureReference.html
    index 546733072a..0f77af701a 100644
    --- a/projects/hip/docs/RuntimeAPI/html/structtextureReference.html
    +++ b/projects/hip/docs/RuntimeAPI/html/structtextureReference.html
    @@ -104,12 +104,12 @@ bool normalized
    The documentation for this struct was generated from the following file:
    -/home/mangupta/hip_git/release_0.84.00/include/hcc_detail/hip_texture.h
    +/home/mangupta/git/hip/release_0.86.00/include/hcc_detail/hip_texture.h
    diff --git a/projects/hip/docs/RuntimeAPI/html/trace__helper_8h_source.html b/projects/hip/docs/RuntimeAPI/html/trace__helper_8h_source.html
    index 825ead43b9..a4cfb60299 100644
    --- a/projects/hip/docs/RuntimeAPI/html/trace__helper_8h_source.html
    +++ b/projects/hip/docs/RuntimeAPI/html/trace__helper_8h_source.html
    @@ -4,7 +4,7 @@
    -HIP: Heterogenous-computing Interface for Portability: /home/mangupta/hip_git/release_0.84.00/include/hcc_detail/trace_helper.h Source File
    +HIP: Heterogenous-computing Interface for Portability: /home/mangupta/git/hip/release_0.86.00/include/hcc_detail/trace_helper.h Source File
    @@ -223,7 +223,7 @@ var searchBox = new SearchBox("searchBox", "search",false,'Search');

From c30b0fe2a6b852c77dd843530488907e68908304 Mon Sep 17 00:00:00 2001
From: Maneesh Gupta
Date: Mon, 6 Jun 2016 21:52:13 +0530
Subject: [PATCH 3/4] Updated release notes

Change-Id: Ied90c54683dd96ac9fb0c3039a94aea5e4aa11c6
[ROCm/hip commit: b22950661134f18b202161ace7e1f200769c15ee]
---
 projects/hip/RELEASE.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/projects/hip/RELEASE.md b/projects/hip/RELEASE.md
index 343e028e9c..d553cda49b 100644
--- a/projects/hip/RELEASE.md
+++ b/projects/hip/RELEASE.md
@@ -9,7 +9,7 @@ Stay tuned - the work for many of these features is already in-flight.
 ===================================================================================================
 Release:0.86.00
-Date: 2016.05.xx
+Date: 2016.06.06
 - Add clang-hipify : clang-based hipify tool. Improved parsing of source code, and automates creation of hipLaunchParm variable.
 - Implement memory register / unregister commands (hipHostRegister, hipHostUnregister)

From 5531b649aa9f5c91fa407ee7cdefebe0f68563f6 Mon Sep 17 00:00:00 2001
From: Maneesh Gupta
Date: Tue, 7 Jun 2016 22:18:18 +0530
Subject: [PATCH 4/4] Fix RELEASE.md

[ROCm/hip commit: 100c8c83c1f54e4f78d50c64307b9e7083498da2]
---
 projects/hip/RELEASE.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/projects/hip/RELEASE.md b/projects/hip/RELEASE.md
index d553cda49b..6a40d28f61 100644
--- a/projects/hip/RELEASE.md
+++ b/projects/hip/RELEASE.md
@@ -17,6 +17,7 @@ Date: 2016.06.06
   standard C++ libraries (ie std::vectors, std::strings). HIPCC now uses libstdc++ by default on the HCC compilation path.
 - More samples including gpu-burn, SHOC, nbody, rtm. See [HIP-Examples](https://github.com/GPUOpen-ProfessionalCompute-Tools/HIP-Examples)
+
 ===================================================================================================
 ## Revision History: