Reverting #1673, #1697 and #1707.
Support for hipMemcpyWithStream and memcpy optimizations, will be brought in again once issues seen with these are resolved independently.
This will fix issue #1621. It also adds tests for is_callable with c++11, c++14, and c++17.
The fallback implementation was completely broken so I rewrote it so it pass the tests as well. This should be used instead of PR #1631.
* Updated hipEnvVarDriver to work with Windows
* Cleaned up a bit of code
* Fixed a part where putenv was used for both win and linux
* Defines moved to test_common.h and cleaned up code
* Cleaned up some macro defines and used const char instead
* Got rid of some excess commenting
* directory paths are unconditional
* Cleaned some duplicate code, and variables are now declared and defined together
* Add support for extended launch syntax.
* Add unit test.
* Fix typo
* hipExtLaunchKernelGGL lives in hip_ext.h
Change-Id: Ice32dab0d43475fda65c6a910c11416871a8f2ff
* [dtest] remove redundant include from hipModuleGetGlobal dtest
Texture 2D image mapping for pitched arrays:
github issue: Texture Object's Buffer seems to be Misaligned #886
JIRA ticket: SWDEV-199313
SWDEV-151670 : Fixed issue with 3D texture with 4 components
SWDEV-151671 : Issue with 2D layered texture with 4 components
+ Add a corresponding test kernel_launch_01.cu
+ Add isBefore() check to avoid crash on Replacement with negative length
TODO:
+ Compatibility with former LLVM versions
+ More complicated kernel launch tests
Reverts changes made in #1399. This is a RT api test. For testing hipMemAllocPitch , a new test should be written and that should use correct memset API.
By implicit unconditional passing -fno-delayed-template-parsing option (which appeared in LLVM 3.8.0, thus doesn't need compatibility wrapping) to hipify-clang.
[Reason] To parse uncalled template functions otherwise they are not parsed without calling, thus not hipified.
Affects cub_03.cu test, which has uncalled global template function.
[Reason] To support maximum CUDA features in offline tests
+ Add defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 600 restriction for atomicAdd on doubles in atomics.cu.
So if LLVM < 7 and --cuda-gpu-arch doesn't work, __CUDA_ARCH__ is unset too (350 by default in clang);
if LLVM >= 7 --cuda-gpu-arch is used and __CUDA_ARCH__ is set based on it.
[Reason] To support maximum CUDA features in offline tests
+ Add CUDA_VERSION >= 800 restriction for atomics.cu
[TODO] Find a way to use or exclude atomicAdd for doubles if LLVM < 7, because
LLVM 6.0.1 and older do not use --cuda-gpu-arch in clang's Driver code at all (option is only declared)