Rahul Garg
aeb7cebbad
Merge pull request #1515 from ansurya/tex_unbind_issue_fix
...
Fix undefined ref to hipUnbindTexture for texture types
2019-10-30 17:54:15 -07:00
Evgeny Mankov
961bc5737e
Merge pull request #1593 from emankov/doc
...
[HIP][cmake] Move all *_INSTALL_DIR variables up before first add_subdirectory()
2019-10-30 22:10:05 +03:00
Rahul Garg
b94f5bd667
Merge pull request #1607 from mhbliao/hliao/master/missing.api.hip.clang
...
[HIP] Correct headers and add missing function templates for hip-clang.
2019-10-30 07:48:57 -07:00
Michael LIAO
61bc68a5f4
[HIP] Correct headers and add missing function templates for hip-clang.
...
- Fix 2 runtime API prototypes
`hipOccupancyMaxActiveBlocksPerMultiprocessor` and
`hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags`
- Add missing function templates of them in hip-clang.
2019-10-29 22:00:11 -04:00
Rahul Garg
9840cdac99
Merge pull request #1602 from ROCm-Developer-Tools/revert-1560-satyanveshd/hipoccupy
...
Revert "Cooperative groups match with cuda SWDEV-205006"
2019-10-29 16:54:36 -07:00
Evgeny Mankov
daab61e8e8
Merge pull request #1604 from emankov/hipify
...
[HIPIFY][#1603] Fix
2019-10-29 22:12:39 +03:00
Evgeny Mankov
050fdad7b7
[HIPIFY][#1603] Fix
2019-10-29 22:10:36 +03:00
Rahul Garg
27221bc823
Revert "Fix occupany APIs (#1560)"
...
This reverts commit 6c5fbf9b4a.
2019-10-29 11:41:08 -07:00
Evgeny Mankov
8a7e6fb747
Merge pull request #1601 from emankov/hipify
...
[HIPIFY][Linux] Rollback --cuda-compile-host-device on Linux
2019-10-29 20:55:29 +03:00
Evgeny Mankov
dd2243f2fa
[HIPIFY][Linux] Rollback --cuda-compile-host-device on Linux
...
[Reason] It doesn't work with LLVM 9 and higher; Windows is fine
2019-10-29 20:53:54 +03:00
Evgeny Mankov
99c4a40da1
Merge pull request #1600 from emankov/hipify
...
[HIPIFY] Introduce --cuda-compile-host-device for LLVM >= 9
2019-10-29 19:47:15 +03:00
Evgeny Mankov
411b18a124
[HIPIFY] Introduce --cuda-compile-host-device for LLVM >= 9
...
* LLVM < 9 continues using --cuda-host-only
2019-10-29 19:42:53 +03:00
Evgeny Mankov
50df94be1e
Merge pull request #1599 from emankov/hipify
...
[HIPIFY] cudaMemcpy2DFromArray(Async) support
2019-10-29 19:14:00 +03:00
Evgeny Mankov
5dd00bdf52
[HIPIFY] cudaMemcpy2DFromArray(Async) support
2019-10-29 19:12:42 +03:00
Evgeny Mankov
3921ea9057
Merge pull request #1594 from emankov/HIP
...
[HIP][doc] Fix typo: AMD-clang -> HIP-clang
2019-10-28 23:22:57 +03:00
Evgeny Mankov
3df22b2fde
[HIP][doc] NVIDIA-nvcc -> HIP-nvcc
2019-10-28 22:46:33 +03:00
Evgeny Mankov
d312bce79d
[HIP][doc] AMD-hcc -> HIP-hcc
2019-10-28 21:41:12 +03:00
Evgeny Mankov
6284b041e5
[HIP][doc] Fix typo: AMD-clang -> HIP-clang
...
HIP-clang is already used below instead of AMD-clang
2019-10-28 21:19:21 +03:00
Evgeny Mankov
8100e084b8
[HIP][cmake] Move all *_INSTALL_DIR variables up before first add_subdirectory()
...
[REASON]
Those vars may be used by cmake in subdirectories (#1571)
2019-10-28 21:07:00 +03:00
Evgeny Mankov
7f367ff933
Merge pull request #1590 from emankov/doc
...
[HIPIFY][tests] Fix ambiguous call to cusparseGetErrorString declared in cusparse.h
2019-10-25 16:08:22 +03:00
Evgeny Mankov
f68bee02f5
[HIPIFY][tests] Rename the ambiguous call as well
2019-10-25 16:07:31 +03:00
Evgeny Mankov
9529e1d91d
[HIPIFY][tests] Fix ambiguous call to cusparseGetErrorString declared in cusparse.h
2019-10-25 16:04:20 +03:00
Anusha Godavarthy Surya
9332a39838
Merge branch 'master' into tex_unbind_issue_fix
2019-10-25 15:54:25 +05:30
Anusha Godavarthy Surya
ae838f8cee
merge from master
2019-10-25 15:52:09 +05:30
Alex Voicu
40522e2b6a
Add missing operators, fix GCC compilation. (#1589)
2019-10-25 15:44:24 +05:30
Alex Voicu
f909a393ff
Fix deadlock, remove old __sync_* use. (#1584)
...
This fixes a deadlock introduced by the switch to TTAS loops, and is therefore mildly urgent (to prevent the CI from hoovering in the broken code).
2019-10-25 15:44:17 +05:30
Rahul Garg
66a3c874c8
[dtest] Fix hipMemset2D test (#1579)
...
Reverts changes made in #1399. This is an RT API test. For testing hipMemAllocPitch, a new test should be written, and it should use the correct memset API.
2019-10-25 15:44:05 +05:30
Rahul Garg
14b870d1ce
Add hipMemcpy2DfromArray (#1510)
...
Adds hipMemcpy2DFromArray and hipMemcpy2DFromArrayAsync equivalent to cudaMemcpy2DFromArray and cudaMemcpy2DFromArrayAsync.
2019-10-25 15:43:33 +05:30
Anusha Godavarthy Surya
c0fc5e718c
Merge branch 'master' into tex_unbind_issue_fix
2019-10-25 15:36:55 +05:30
Anusha Godavarthy Surya
b9c8dd8ac6
Fixed CI build failure
2019-10-25 12:21:41 +05:30
Rahul Garg
ff8d3fa446
Update profiling doc (#1576)
2019-10-24 17:51:55 +05:30
Jatin Chaudhary
f53b1a1755
Adding New Analyze Target Merging with cppcheck (#1583)
2019-10-24 17:46:06 +05:30
Rahul Garg
170c4f0270
Add HIP checks in texture driver sample (#1581)
2019-10-24 17:45:51 +05:30
gandryey
f25692b399
Hip vdi profiling header (#1577)
...
Add HIP-VDI profiling interface for GPU timing collection.
2019-10-24 17:45:42 +05:30
Alex Voicu
26914ec76e
Make CAS loops use the TTAS idiom. (#1573)
...
* Make CAS loops use the TTAS idiom.
* More efficient re-formulation of TTAS.
* Fix typo.
* The typo was not quite a typo
2019-10-24 17:45:20 +05:30
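For readers unfamiliar with the term in the entry above, TTAS (test and test-and-set) means spinning on a plain load until the lock looks free and only then attempting the atomic compare-and-swap. The following is a minimal host-side sketch of that idiom in standard C++. It is not the code from #1573: the `TtasLock` class and its member names are hypothetical and exist only for illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical illustration of the TTAS idiom; not taken from the HIP sources.
class TtasLock {
public:
    void lock() {
        for (;;) {
            // "Test": spin on a cheap relaxed load while the lock is held,
            // so waiting threads do not hammer the cache line with CAS attempts.
            while (locked_.load(std::memory_order_relaxed)) {
                std::this_thread::yield();
            }
            // "Test-and-set": the lock looked free, so try to take it atomically.
            bool expected = false;
            if (locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
                return;
            }
            // Another thread won the race (or the weak CAS failed spuriously); retry.
        }
    }

    // Single attempt: returns true only if the lock was free and is now held.
    bool try_lock() {
        if (locked_.load(std::memory_order_relaxed)) {
            return false;
        }
        bool expected = false;
        return locked_.compare_exchange_strong(expected, true,
                                               std::memory_order_acquire,
                                               std::memory_order_relaxed);
    }

    void unlock() {
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};
```

The outer `for (;;)` matters: after a failed compare-and-swap the code returns to the read-only spin rather than retrying the CAS immediately, which is the difference between TTAS and a plain CAS loop.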
satyanveshd
6c5fbf9b4a
Fix occupany APIs (#1560)
...
Addresses SWDEV-205006
2019-10-24 17:44:47 +05:30
searlmc1
15a699688e
Improve performance of v2 arg handling (#1539)
...
* Improve performance of v2 arg handling
* Missing change to `std::string`
2019-10-24 17:44:05 +05:30
Alex Voicu
84d5b399f6
Improve scalar access into vector types. (#1531)
...
The improvement is based on the ideas here: https://t0rakka.silvrback.com/simd-scalar-accessor. It yields significantly better ISA when the base's .xyzw members are used.
2019-10-24 17:43:49 +05:30
Aryan Salmanpour
93c688a0c9
[hip] add support for implicit kernel argument for multi-grid sync (#1456)
...
* [hip] add support for implicit kernel argument for multi-grid sync
* modified code for calculating the prev_sum
* change the impCoopArg type to size_t
* add memory clean up
* launch init_gws and main kernels into two separate loops
2019-10-24 17:43:30 +05:30
Rahul Garg
465581612e
Merge pull request #1559 from vsytch/win10_aligned_alloc
...
Fixes for hipMemcpy_simple on Windows
2019-10-23 13:10:59 -07:00
Evgeny Mankov
80bf79c2f8
Merge pull request #1578 from emankov/doc
...
[HIPIFY][cmake][#1571] Take into account building hipify-clang as a part of building HIP while installing
2019-10-23 21:23:05 +03:00
Evgeny Mankov
2435567e70
[HIPIFY][cmake][#1571] Take into account building hipify-clang as a part of building HIP while installing
...
[Algorithm]
[Release]
  If CMAKE_INSTALL_PREFIX is set by the user:
    If BIN_INSTALL_DIR is set by HIP, use it as CMAKE_INSTALL_PREFIX, otherwise CMAKE_INSTALL_PREFIX is used unchanged.
  If the user does not set CMAKE_INSTALL_PREFIX (CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT):
    If BIN_INSTALL_DIR is set by HIP, use it as CMAKE_INSTALL_PREFIX, otherwise use PROJECT_BINARY_DIR/bin for installation.
[Debug]
  If CMAKE_INSTALL_PREFIX is set by the user:
    CMAKE_INSTALL_PREFIX is used unchanged.
  If the user does not set CMAKE_INSTALL_PREFIX (CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT):
    Use CMAKE_CURRENT_SOURCE_DIR/bin for installation.
Standalone build left unchanged: CMAKE_INSTALL_PREFIX is used if set.
2019-10-23 18:54:45 +03:00
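The install-prefix selection described in the commit above can be restated as a small decision function. This is only a sketch of the rules as worded in the commit message, written in C++ rather than CMake. The `InstallInputs` struct, the `resolve_install_prefix` function, and the convention that an empty string means "not set" are all assumptions made for illustration, and the nesting of the Release and Debug branches is one reading of the message.

```cpp
#include <string>

// Hypothetical model of the inputs named in the commit message.
enum class BuildType { Release, Debug };

struct InstallInputs {
    BuildType build_type;
    bool prefix_set_by_user;           // false models CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT
    std::string cmake_install_prefix;  // value of CMAKE_INSTALL_PREFIX
    std::string bin_install_dir;       // BIN_INSTALL_DIR as set by HIP; empty means not set (assumption)
    std::string project_binary_dir;    // PROJECT_BINARY_DIR
    std::string current_source_dir;    // CMAKE_CURRENT_SOURCE_DIR
};

// Returns the directory the described algorithm would install into.
std::string resolve_install_prefix(const InstallInputs& in) {
    if (in.build_type == BuildType::Release) {
        // In Release, a BIN_INSTALL_DIR provided by HIP wins in both sub-cases.
        if (!in.bin_install_dir.empty()) {
            return in.bin_install_dir;
        }
        return in.prefix_set_by_user ? in.cmake_install_prefix
                                     : in.project_binary_dir + "/bin";
    }
    // In Debug, BIN_INSTALL_DIR is not consulted according to the message.
    return in.prefix_set_by_user ? in.cmake_install_prefix
                                 : in.current_source_dir + "/bin";
}
```

Written this way, the two Release sub-cases differ only in their fallback when HIP does not provide BIN_INSTALL_DIR.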
Evgeny Mankov
3d16a8b121
Merge pull request #1574 from emankov/hipify-clang
...
[HIPIFY] Disable delayed template parsing
2019-10-22 19:09:13 +03:00
Evgeny Mankov
7ab06b3892
[HIPIFY] Disable delayed template parsing
...
By implicitly and unconditionally passing the -fno-delayed-template-parsing option (available since LLVM 3.8.0, so it needs no compatibility wrapping) to hipify-clang.
[Reason] To parse uncalled template functions; otherwise they are not parsed, and thus not hipified.
Affects the cub_03.cu test, which has an uncalled global template function.
2019-10-22 19:07:37 +03:00
Evgeny Mankov
bf879f9c86
Merge pull request #1570 from emankov/doc
...
[HIPIFY][#1569] Fix
2019-10-22 11:13:47 +03:00
Evgeny Mankov
e2191e23e6
[HIPIFY][#1569] Fix
2019-10-22 11:08:37 +03:00
Evgeny Mankov
62fc6f8487
Merge pull request #1568 from emankov/hipify-clang
...
[HIPIFY][tests] Set max clang's CudaArch for corresponding CUDA major…
2019-10-21 17:52:02 +03:00
Evgeny Mankov
3233a845f6
[HIPIFY][tests] Set max clang's CudaArch for corresponding CUDA major.minor version
...
[Reason] To support maximum CUDA features in offline tests
+ Add a defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 600 restriction for atomicAdd on doubles in atomics.cu.
So if LLVM < 7 and --cuda-gpu-arch doesn't work, __CUDA_ARCH__ is not set by it either (350 by default in clang);
if LLVM >= 7, --cuda-gpu-arch is used and __CUDA_ARCH__ is set based on it.
2019-10-21 17:50:00 +03:00
Evgeny Mankov
ef25774dae
Merge pull request #1567 from emankov/hipify-clang
...
[HIPIFY][perl] Support of 'using namespace cub'
2019-10-21 17:16:32 +03:00
Evgeny Mankov
9633cdbd8a
[HIPIFY][perl] Support of 'using namespace cub'
2019-10-21 17:15:05 +03:00