+ Add a corresponding matcher, cudaDeviceFuncCall, that matches only __device__ or __global__ functions and not __host__ functions.
+ Add a corresponding device functions mapping:
only unsupported functions are listed, because supported ones are exactly the same as in CUDA and do not need transformation;
keep FindAndReplace for device functions separate from the one for host API calls.
+ Add a test to distinguish device functions from user-defined ones.
[ROCm/hip commit: 6602fadc16]
The link to the CUDA-to-HIP math library equivalents is broken. These equivalents are key for converting to AMD hardware. This fix corrects the broken link and moves the library equivalents under the "Porting a New Cuda Project" header.
[ROCm/hip commit: 5a6eafcbf1]
The implementation for OCKL AS was recently removed from the device
library since that feature is now superseded by hostcall.
[ROCm/hip commit: 70023c9075]
Currently hipcc uses -O3 for hip-clang by default but switches to -O0 when -g is passed. This
surprises users, since -g should not affect the default optimization level.
[ROCm/hip commit: b046ec698b]
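The difference described in b046ec698b can be sketched in a few lines. This is only an illustration, not hipcc's actual code: both function names are hypothetical, and the sketch assumes the user passed no explicit -O flag.

```cpp
#include <string>

// Hypothetical helper: the behavior the message calls surprising,
// where -g silently lowers the default optimization level.
inline std::string legacyDefaultOpt(bool hasDebugFlag) {
    return hasDebugFlag ? "-O0" : "-O3";
}

// Hypothetical helper: the behavior the message says users expect,
// where -g has no influence on the default optimization level.
inline std::string expectedDefaultOpt(bool /*hasDebugFlag*/) {
    return "-O3";
}
```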
* occupancy.cpp with Makefile
* occupancy sample changes according to the comments
* Changes according to the review comments
* Occupancy Sample Changes
* Changes according to review comments
[ROCm/hip commit: f807cc1a7b]
* first cut of the header implementation of cooperative group feature
* add declarations for device library functions
* fixed various compile time issues in the CG headers
* enabled copy construction and copy assignment
* fixed a minor bug related to conditional compilation macro
* fixed a few more CG constructor issues and added a unit testcase
* fixed typo
* extended unit testcase
* compute size of partitioned CG from mask
* bit of code refactoring
* removed boilerplate code
* addressed a few of the review comments by Brian
* Changes to the signatures of a few grid and multi-grid related OCKL functions
* changes to declarations of OCKL functions related to CG feature
* removed all the block level support as it is not planned for 2.9
* Have taken care of review comments by Brian
* Have taken care of review comments by Brian
* removed unused functions that were initially intended for block-level CG support
[ROCm/hip commit: d75dc4eb29]
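The item "compute size of partitioned CG from mask" in d75dc4eb29 amounts to counting the set bits of a lane mask. A minimal host-side sketch, assuming a 64-bit mask with one bit per participating lane; `sizeFromMask` is a hypothetical name, and the real header may well use a device popcount intrinsic instead:

```cpp
#include <cstdint>

// Hypothetical helper: the group size is the number of set bits in the mask.
inline unsigned sizeFromMask(std::uint64_t mask) {
    unsigned count = 0;
    while (mask != 0) {
        mask &= mask - 1;  // clear the lowest set bit
        ++count;
    }
    return count;
}
```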
+ Start translating the preprocessor's false (excluded) conditional blocks too:
based on clang's https://reviews.llvm.org/D66597;
available only starting from LLVM 10.0 or trunk.
+ Option -skip-excluded-preprocessor-conditional-blocks for skipping excluded conditional blocks:
the default behavior for hipify-clang built with LLVM < 10.0;
false by default for hipify-clang built with LLVM 10 or trunk.
+ Add 4 preprocessor unit tests, 2 of which are LLVM 10.0 only
+ Update a couple of existing tests by setting the -skip-excluded-preprocessor-conditional-blocks option:
update lit testing accordingly
[ROCm/hip commit: 24be21495d]
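To illustrate what a false conditional block is in the context of 24be21495d, here is a small sketch. The macro `SKETCH_USE_CUDA` and the function `allocate` are hypothetical; with the macro set to 0, the compiler never parses the CUDA branch, so a tool that only inspects parsed code would presumably leave that call untranslated.

```cpp
#include <cstdlib>

// Hypothetical macro: 0 means the CUDA branch below is excluded by the preprocessor.
#define SKETCH_USE_CUDA 0

void* allocate(std::size_t bytes) {
#if SKETCH_USE_CUDA
    // Excluded block: skipped by the preprocessor, yet it still holds a CUDA
    // API call that a source translator may need to rewrite.
    void* p = nullptr;
    cudaMalloc(&p, bytes);
    return p;
#else
    // Active block: the only code the compiler actually sees.
    return std::malloc(bytes);
#endif
}
```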
1. Fix setting the C++ standard (11|14)
2. Get rid of WIN32, use MSVC instead
3. Use VERSION_GREATER_EQUAL and VERSION_MAJOR instead of logical expressions
[ROCm/hip commit: eeb4452b23]
- `result_of_t` is defined as a shorthand for
```
template< class T >
using result_of_t = typename result_of<T>::type;
```
[ROCm/hip commit: 63e47e525b]
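As a usage sketch for the alias in 63e47e525b: the namespace `sketch`, its simplified `result_of` stand-in, and the functor `Half` are all hypothetical and exist only to keep the example self-contained. The real alias wraps the standard library's `result_of`.

```cpp
#include <type_traits>
#include <utility>

namespace sketch {
// Simplified stand-in for result_of: the type produced by calling F with Args.
template <class> struct result_of;
template <class F, class... Args>
struct result_of<F(Args...)> {
    using type = decltype(std::declval<F>()(std::declval<Args>()...));
};

// The alias from the message: saves writing `typename ...::type` at each use.
template <class T>
using result_of_t = typename result_of<T>::type;
}  // namespace sketch

// Hypothetical functor used only to exercise the alias.
struct Half {
    double operator()(int x) const { return x / 2.0; }
};

static_assert(std::is_same<sketch::result_of_t<Half(int)>, double>::value,
              "calling Half with an int yields a double");
```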
[Reason] LLVM moved to C++14 last week due to the following change:
37508d3dd94b0154861a90b1909d17b01400df99
Replace llvm::integer_sequence and friends with the C++14 standard version
[ROCm/hip commit: e1d4f8510a]
* [hip] add initial implementation for hipLaunchCooperativeKernel API
* [hip] use total number of work groups to initialize the GWS resource
* [hip] use only one argument for init_gws kernel
* [hip] use the device associated with the stream for checking the device properties
[ROCm/hip commit: 5066700ace]