This fixes a deadlock introduced by the switch to TTAS loops, and is therefore mildly urgent (to prevent the CI from hoovering in the broken code).
[ROCm/clr commit: f909a393ff]
Reverts changes made in #1399. This is a RT api test. For testing hipMemAllocPitch , a new test should be written and that should use correct memset API.
[ROCm/clr commit: 66a3c874c8]
Adds hipMemcpy2DFromArray and hipMemcpy2DFromArrayAsync equivalent to cudaMemcpy2DFromArray and cudaMemcpy2DFromArrayAsync.
[ROCm/clr commit: 14b870d1ce]
* Make CAS loops use the TTAS idiom.
* More efficient re-formulation of TTAS.
* Fix typo.
* The typo was not quite a typo
[ROCm/clr commit: 26914ec76e]
* [hip] add support for implicit kernel argument for multi-grid sync
* modified code for calculating the prev_sum
* change the impCoopArg type to size_t
* add memory clean up
* launch init_gws and main kernels into two separate loops
[ROCm/clr commit: 93c688a0c9]
[Algorithm]
[Release]
If CMAKE_INSTALL_PREFIX is set by the user:
If BIN_INSTALL_DIR is set by HIP, use it as CMAKE_INSTALL_PREFIX, otherwise CMAKE_INSTALL_PREFIX is used unchanged.
If the user does not set CMAKE_INSTALL_PREFIX (CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT):
If BIN_INSTALL_DIR is set by HIP, use it as CMAKE_INSTALL_PREFIX, otherwise use PROJECT_BINARY_DIR/bin for installation.
[Debug]
If CMAKE_INSTALL_PREFIX is set by the user:
CMAKE_INSTALL_PREFIX is used unchanged.
If the user does not set CMAKE_INSTALL_PREFIX (CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT):
use CMAKE_CURRENT_SOURCE_DIR/bin for installation.
Standalone build left unchanged: CMAKE_INSTALL_PREFIX is used if set.
[ROCm/clr commit: 2435567e70]
By implicit unconditional passing -fno-delayed-template-parsing option (which appeared in LLVM 3.8.0, thus doesn't need compatibility wrapping) to hipify-clang.
[Reason] To parse uncalled template functions otherwise they are not parsed without calling, thus not hipified.
Affects cub_03.cu test, which has uncalled global template function.
[ROCm/clr commit: 7ab06b3892]
[Reason] To support maximum CUDA features in offline tests
+ Add defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 600 restriction for atomicAdd on doubles in atomics.cu.
So if LLVM < 7 and --cuda-gpu-arch doesn't work, __CUDA_ARCH__ is unset too (350 by default in clang);
if LLVM >= 7 --cuda-gpu-arch is used and __CUDA_ARCH__ is set based on it.
[ROCm/clr commit: 3233a845f6]
[Reason] To support maximum CUDA features in offline tests
+ Add CUDA_VERSION >= 800 restriction for atomics.cu
[TODO] Find a way to use or exclude atomicAdd for doubles if LLVM < 7, because
LLVM 6.0.1 and older do not use --cuda-gpu-arch in clang's Driver code at all (option is only declared)
[ROCm/clr commit: 9fc7afa738]
Fixes SWDEV-203394.
Currently in runTest() returns true, even if the texture reference copy does not happen. Using the existing testResult Flag to return from runTest().
[ROCm/clr commit: 9d571e3c9e]
* Temporary add -D_OPENMP to clang options in hipcc to allow using CMake OpenMP detection with hip-clang (until updated CMake version is available).
[ROCm/clr commit: 9773f94c71]
* guard against division by zero for no VGPR usage (e.g., in an empty kernel)
* fix bracket format
* clean up parenthesis
[ROCm/clr commit: d16963c9d5]