Evgeny Mankov
f472257a40
Merge pull request #1612 from emankov/hipify
...
[HIPIFY][cmake][#1572 ] Fix: Do not override CMAKE_INSTALL_PREFIX
[ROCm/clr commit: 8b99b0ffd8 ]
2019-10-31 16:58:36 +03:00
Evgeny Mankov
0cfdeda490
[HIPIFY][cmake][ #1572 ] Fix: Do not override CMAKE_INSTALL_PREFIX
...
Affects building as part of HIP; standalone building is unchanged.
[ROCm/clr commit: e79fd55d01 ]
2019-10-31 16:55:06 +03:00
Rahul Garg
04785f2d54
Merge pull request #1515 from ansurya/tex_unbind_issue_fix
...
Fix undefined ref to hipUnbindTexture for texture types
[ROCm/clr commit: aeb7cebbad ]
2019-10-30 17:54:15 -07:00
Evgeny Mankov
5403e7edcc
Merge pull request #1593 from emankov/doc
...
[HIP][cmake] Move all *_INSTALL_DIR variables up before first add_subdirectory()
[ROCm/clr commit: 961bc5737e ]
2019-10-30 22:10:05 +03:00
Rahul Garg
434231ac69
Merge pull request #1607 from mhbliao/hliao/master/missing.api.hip.clang
...
[HIP] Correct headers and add missing function templates for hip-clang.
[ROCm/clr commit: b94f5bd667 ]
2019-10-30 07:48:57 -07:00
Michael LIAO
64f2d5e861
[HIP] Correct headers and add missing function templates for hip-clang.
...
- Fix 2 runtime API prototypes
`hipOccupancyMaxActiveBlocksPerMultiprocessor` and
`hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags`
- Add missing function templates of them in hip-clang.
[ROCm/clr commit: 61bc68a5f4 ]
2019-10-29 22:00:11 -04:00
Rahul Garg
42ab7b830e
Merge pull request #1602 from ROCm-Developer-Tools/revert-1560-satyanveshd/hipoccupy
...
Revert "Cooperative groups match with cuda SWDEV-205006"
[ROCm/clr commit: 9840cdac99 ]
2019-10-29 16:54:36 -07:00
Evgeny Mankov
615b9b373b
Merge pull request #1604 from emankov/hipify
...
[HIPIFY][#1603 ] Fix
[ROCm/clr commit: daab61e8e8 ]
2019-10-29 22:12:39 +03:00
Evgeny Mankov
dc0186720c
[HIPIFY][ #1603 ] Fix
...
[ROCm/clr commit: 050fdad7b7 ]
2019-10-29 22:10:36 +03:00
Rahul Garg
72c686ed67
Revert "Fix occupany APIs ( #1560 )"
...
This reverts commit ad1e409a24.
[ROCm/clr commit: 27221bc823 ]
2019-10-29 11:41:08 -07:00
Evgeny Mankov
1a9af0ea4e
Merge pull request #1601 from emankov/hipify
...
[HIPIFY][Linux] Rollback --cuda-compile-host-device on Linux
[ROCm/clr commit: 8a7e6fb747 ]
2019-10-29 20:55:29 +03:00
Evgeny Mankov
d8e846fc91
[HIPIFY][Linux] Rollback --cuda-compile-host-device on Linux
...
[Reason] It doesn't work with LLVM 9 and higher; Windows is fine
[ROCm/clr commit: dd2243f2fa ]
2019-10-29 20:53:54 +03:00
Evgeny Mankov
0989a885f9
Merge pull request #1600 from emankov/hipify
...
[HIPIFY] Introduce --cuda-compile-host-device for LLVM >= 9
[ROCm/clr commit: 99c4a40da1 ]
2019-10-29 19:47:15 +03:00
Evgeny Mankov
21d798394a
[HIPIFY] Introduce --cuda-compile-host-device for LLVM >= 9
...
* LLVM < 9 continues using --cuda-host-only
[ROCm/clr commit: 411b18a124 ]
2019-10-29 19:42:53 +03:00
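The version rule in the commit above can be sketched as a small selection function. This is only an illustration: `cudaModeFlag` and its integer parameter are hypothetical names, not taken from the hipify-clang sources, and the sketch reflects this commit alone (the later entry from the same day rolls the change back on Linux).

```cpp
#include <string>

// Hypothetical helper (not from hipify-clang): returns the clang CUDA
// compilation-mode flag for a given LLVM major version, following the rule
// stated in the commit message.
inline std::string cudaModeFlag(int llvmMajor) {
  // LLVM >= 9: compile both host and device sides.
  // LLVM <  9: keep using host-only compilation.
  return llvmMajor >= 9 ? "--cuda-compile-host-device" : "--cuda-host-only";
}
```

For example, `cudaModeFlag(8)` yields `--cuda-host-only`, while `cudaModeFlag(9)` and above yield `--cuda-compile-host-device`.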
Evgeny Mankov
532e8138d5
Merge pull request #1599 from emankov/hipify
...
[HIPIFY] cudaMemcpy2DFromArray(Async) support
[ROCm/clr commit: 50df94be1e ]
2019-10-29 19:14:00 +03:00
Evgeny Mankov
bb75fa46f0
[HIPIFY] cudaMemcpy2DFromArray(Async) support
...
[ROCm/clr commit: 5dd00bdf52 ]
2019-10-29 19:12:42 +03:00
Evgeny Mankov
28a5dd488b
Merge pull request #1594 from emankov/HIP
...
[HIP][doc] Fix typo: AMD-clang -> HIP-clang
[ROCm/clr commit: 3921ea9057 ]
2019-10-28 23:22:57 +03:00
Evgeny Mankov
4e02b285d6
[HIP][doc] NVIDIA-nvcc -> HIP-nvcc
...
[ROCm/clr commit: 3df22b2fde ]
2019-10-28 22:46:33 +03:00
Evgeny Mankov
935dd4ce94
[HIP][doc] AMD-hcc -> HIP-hcc
...
[ROCm/clr commit: d312bce79d ]
2019-10-28 21:41:12 +03:00
Evgeny Mankov
20b127bf45
[HIP][doc] Fix typo: AMD-clang -> HIP-clang
...
HIP-clang is already used below instead of AMD-clang
[ROCm/clr commit: 6284b041e5 ]
2019-10-28 21:19:21 +03:00
Evgeny Mankov
0737167ee4
[HIP][cmake] Move all *_INSTALL_DIR variables up before first add_subdirectory()
...
[REASON]
Those variables are (or may be) used by cmake in subdirectories (#1571)
[ROCm/clr commit: 8100e084b8 ]
2019-10-28 21:07:00 +03:00
Evgeny Mankov
1ef94a4f94
Merge pull request #1590 from emankov/doc
...
[HIPIFY][tests] Fix ambiguous call to cusparseGetErrorString declared in cusparse.h
[ROCm/clr commit: 7f367ff933 ]
2019-10-25 16:08:22 +03:00
Evgeny Mankov
bcc9d88b20
[HIPIFY][tests] Rename the ambiguous call as well
...
[ROCm/clr commit: f68bee02f5 ]
2019-10-25 16:07:31 +03:00
Evgeny Mankov
91732f98c0
[HIPIFY][tests] Fix ambiguous call to cusparseGetErrorString declared in cusparse.h
...
[ROCm/clr commit: 9529e1d91d ]
2019-10-25 16:04:20 +03:00
Anusha Godavarthy Surya
dfa019bdf6
Merge branch 'master' into tex_unbind_issue_fix
...
[ROCm/clr commit: 9332a39838 ]
2019-10-25 15:54:25 +05:30
Anusha Godavarthy Surya
0140ea8e1a
merge from master
...
[ROCm/clr commit: ae838f8cee ]
2019-10-25 15:52:09 +05:30
Alex Voicu
f22391c362
Add missing operators, fix GCC compilation. ( #1589 )
...
[ROCm/clr commit: 40522e2b6a ]
2019-10-25 15:44:24 +05:30
Alex Voicu
acbee5a48b
Fix deadlock, remove old __sync_* use. ( #1584 )
...
This fixes a deadlock introduced by the switch to TTAS loops, and is therefore mildly urgent (to prevent the CI from hoovering in the broken code).
[ROCm/clr commit: f909a393ff ]
2019-10-25 15:44:17 +05:30
Rahul Garg
9c599a3581
[dtest] Fix hipMemset2D test ( #1579 )
...
Reverts the changes made in #1399. This is a runtime (RT) API test. To test hipMemAllocPitch, a new test should be written that uses the correct memset API.
[ROCm/clr commit: 66a3c874c8 ]
2019-10-25 15:44:05 +05:30
Rahul Garg
7ea7a9c3b7
Add hipMemcpy2DfromArray ( #1510 )
...
Adds hipMemcpy2DFromArray and hipMemcpy2DFromArrayAsync equivalent to cudaMemcpy2DFromArray and cudaMemcpy2DFromArrayAsync.
[ROCm/clr commit: 14b870d1ce ]
2019-10-25 15:43:33 +05:30
Anusha Godavarthy Surya
2d538b702d
Merge branch 'master' into tex_unbind_issue_fix
...
[ROCm/clr commit: c0fc5e718c ]
2019-10-25 15:36:55 +05:30
Anusha Godavarthy Surya
1dd70e007b
Fixed CI build failure
...
[ROCm/clr commit: b9c8dd8ac6 ]
2019-10-25 12:21:41 +05:30
Rahul Garg
6760e4065e
Update profiling doc ( #1576 )
...
[ROCm/clr commit: ff8d3fa446 ]
2019-10-24 17:51:55 +05:30
Jatin Chaudhary
e7f4cf4487
Adding New Analyze Target Merging with cppcheck ( #1583 )
...
[ROCm/clr commit: f53b1a1755 ]
2019-10-24 17:46:06 +05:30
Rahul Garg
7f429afe2e
Add HIP checks in texture driver sample ( #1581 )
...
[ROCm/clr commit: 170c4f0270 ]
2019-10-24 17:45:51 +05:30
gandryey
21a2925ee7
Hip vdi profiling header ( #1577 )
...
Add HIP-VDI profiling interface for GPU timing collection.
[ROCm/clr commit: f25692b399 ]
2019-10-24 17:45:42 +05:30
Alex Voicu
5b917afa5f
Make CAS loops use the TTAS idiom. ( #1573 )
...
* Make CAS loops use the TTAS idiom.
* More efficient re-formulation of TTAS.
* Fix typo.
* The typo was not quite a typo
[ROCm/clr commit: 26914ec76e ]
2019-10-24 17:45:20 +05:30
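For readers unfamiliar with the TTAS (test-and-test-and-set) idiom named in the commit above, a generic sketch follows. It is not the code from #1573: `TtasLock` is a hypothetical class built on `std::atomic`, shown only to illustrate the pattern of spinning on a plain load before attempting the atomic read-modify-write.

```cpp
#include <atomic>

// Generic TTAS spinlock sketch (hypothetical, not taken from the HIP sources).
class TtasLock {
 public:
  void lock() {
    for (;;) {
      // "Test": spin on a plain load. Waiting threads only read the flag,
      // so they do not fight over exclusive ownership of the cache line.
      while (locked_.load(std::memory_order_relaxed)) {
      }
      // "Test-and-set": attempt the atomic exchange only once the flag
      // looked free. If another thread won the race, go back to spinning.
      if (!locked_.exchange(true, std::memory_order_acquire)) return;
    }
  }

  // Single non-blocking attempt using the same test-then-set order.
  bool try_lock() {
    return !locked_.load(std::memory_order_relaxed) &&
           !locked_.exchange(true, std::memory_order_acquire);
  }

  void unlock() { locked_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> locked_{false};
};
```

The class can be used with `std::lock_guard<TtasLock>`, since it provides `lock()` and `unlock()`.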
satyanveshd
ad1e409a24
Fix occupany APIs ( #1560 )
...
Addresses SWDEV-205006
[ROCm/clr commit: 6c5fbf9b4a ]
2019-10-24 17:44:47 +05:30
searlmc1
510be4b5dc
Improve performance of v2 arg handling ( #1539 )
...
* Improve performance of v2 arg handling
* Missing change to `std::string`
[ROCm/clr commit: 15a699688e ]
2019-10-24 17:44:05 +05:30
Alex Voicu
fb411b56c2
Improve scalar access into vector types. ( #1531 )
...
The improvement is based on the ideas here: https://t0rakka.silvrback.com/simd-scalar-accessor . It yields significantly better ISA when the base's .xyzw members are used.
[ROCm/clr commit: 84d5b399f6 ]
2019-10-24 17:43:49 +05:30
Aryan Salmanpour
9e0eaef846
[hip] add support for implicit kernel argument for multi-grid sync ( #1456 )
...
* [hip] add support for implicit kernel argument for multi-grid sync
* modified code for calculating the prev_sum
* change the impCoopArg type to size_t
* add memory clean up
* launch init_gws and main kernels into two separate loops
[ROCm/clr commit: 93c688a0c9 ]
2019-10-24 17:43:30 +05:30
Rahul Garg
764135d242
Merge pull request #1559 from vsytch/win10_aligned_alloc
...
Fixes for hipMemcpy_simple on Windows
[ROCm/clr commit: 465581612e ]
2019-10-23 13:10:59 -07:00
Evgeny Mankov
48b264c154
Merge pull request #1578 from emankov/doc
...
[HIPIFY][cmake][#1571 ] Take into account building hipify-clang as a part of building HIP while installing
[ROCm/clr commit: 80bf79c2f8 ]
2019-10-23 21:23:05 +03:00
Evgeny Mankov
50d72e13ca
[HIPIFY][cmake][ #1571 ] Take into account building hipify-clang as a part of building HIP while installing
...
[Algorithm]
[Release]
  If CMAKE_INSTALL_PREFIX is set by the user:
    If BIN_INSTALL_DIR is set by HIP, use it as CMAKE_INSTALL_PREFIX, otherwise CMAKE_INSTALL_PREFIX is used unchanged.
  If the user does not set CMAKE_INSTALL_PREFIX (CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT):
    If BIN_INSTALL_DIR is set by HIP, use it as CMAKE_INSTALL_PREFIX, otherwise use PROJECT_BINARY_DIR/bin for installation.
[Debug]
  If CMAKE_INSTALL_PREFIX is set by the user:
    CMAKE_INSTALL_PREFIX is used unchanged.
  If the user does not set CMAKE_INSTALL_PREFIX (CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT):
    use CMAKE_CURRENT_SOURCE_DIR/bin for installation.
Standalone build left unchanged: CMAKE_INSTALL_PREFIX is used if set.
[ROCm/clr commit: 2435567e70 ]
2019-10-23 18:54:45 +03:00
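The [Algorithm] in the commit above can be read as a small decision function. The sketch below models it in C++ for illustration only; the real logic lives in CMake. `InstallInputs`, `resolveInstallPrefix`, and the convention that an empty string means "not set" are assumptions of this sketch, and it covers only the case described in the message (hipify-clang built as part of HIP).

```cpp
#include <string>

enum class BuildType { Release, Debug };

// Hypothetical input record; an empty string stands for "not set".
struct InstallInputs {
  BuildType buildType;
  std::string userPrefix;        // CMAKE_INSTALL_PREFIX, if set by the user
  std::string hipBinInstallDir;  // BIN_INSTALL_DIR, if set by HIP
  std::string projectBinaryDir;  // PROJECT_BINARY_DIR
  std::string currentSourceDir;  // CMAKE_CURRENT_SOURCE_DIR
};

// Returns the directory used as the install prefix, following the
// [Release] and [Debug] branches of the commit message.
inline std::string resolveInstallPrefix(const InstallInputs& in) {
  if (in.buildType == BuildType::Release) {
    // In Release, BIN_INSTALL_DIR from HIP wins whether or not the user
    // set a prefix.
    if (!in.hipBinInstallDir.empty()) return in.hipBinInstallDir;
    if (!in.userPrefix.empty()) return in.userPrefix;
    return in.projectBinaryDir + "/bin";
  }
  // Debug: a user-set prefix is used unchanged, otherwise install next to
  // the sources.
  if (!in.userPrefix.empty()) return in.userPrefix;
  return in.currentSourceDir + "/bin";
}
```

For instance, a Release build with no user prefix and no BIN_INSTALL_DIR resolves to `PROJECT_BINARY_DIR/bin`, while a Debug build with a user prefix keeps that prefix.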
Evgeny Mankov
b34b56a761
Merge pull request #1574 from emankov/hipify-clang
...
[HIPIFY] Disable delayed template parsing
[ROCm/clr commit: 3d16a8b121 ]
2019-10-22 19:09:13 +03:00
Evgeny Mankov
0896e41987
[HIPIFY] Disable delayed template parsing
...
The -fno-delayed-template-parsing option is now passed to hipify-clang implicitly and unconditionally (the option appeared in LLVM 3.8.0, so it doesn't need compatibility wrapping).
[Reason] Uncalled template functions are otherwise not parsed, and thus not hipified.
Affects the cub_03.cu test, which has an uncalled global template function.
[ROCm/clr commit: 7ab06b3892 ]
2019-10-22 19:07:37 +03:00
Evgeny Mankov
7426cbee0d
Merge pull request #1570 from emankov/doc
...
[HIPIFY][#1569 ] Fix
[ROCm/clr commit: bf879f9c86 ]
2019-10-22 11:13:47 +03:00
Evgeny Mankov
82222bf945
[HIPIFY][ #1569 ] Fix
...
[ROCm/clr commit: e2191e23e6 ]
2019-10-22 11:08:37 +03:00
Evgeny Mankov
fe97898c1a
Merge pull request #1568 from emankov/hipify-clang
...
[HIPIFY][tests] Set max clang's CudaArch for corresponding CUDA major…
[ROCm/clr commit: 62fc6f8487 ]
2019-10-21 17:52:02 +03:00
Evgeny Mankov
e3cf10192c
[HIPIFY][tests] Set max clang's CudaArch for corresponding CUDA major.minor version
...
[Reason] To support maximum CUDA features in offline tests
+ Add a defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 600 restriction for atomicAdd on doubles in atomics.cu.
So if LLVM < 7, where --cuda-gpu-arch doesn't work, __CUDA_ARCH__ is not set from it either (350 by default in clang);
if LLVM >= 7, --cuda-gpu-arch is used and __CUDA_ARCH__ is set based on it.
[ROCm/clr commit: 3233a845f6 ]
2019-10-21 17:50:00 +03:00
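The preprocessor restriction mentioned in the commit above can be illustrated with a host-compilable sketch. It is not the contents of atomics.cu: `kNativeDoubleAtomicAdd` and `hasNativeDoubleAtomicAdd` are hypothetical names, and the sketch only shows how the guard selects a branch at compile time. The 600 threshold corresponds to compute capability 6.0, where CUDA provides atomicAdd on double.

```cpp
// The guard from the commit message. __CUDA_ARCH__ is defined only while
// compiling device code, so in a plain host C++ compilation the #else
// branch is taken.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 600
constexpr bool kNativeDoubleAtomicAdd = true;   // device pass, arch >= 6.0
#else
constexpr bool kNativeDoubleAtomicAdd = false;  // host pass or older arch
#endif

// Hypothetical accessor so callers can query which branch was compiled.
inline bool hasNativeDoubleAtomicAdd() { return kNativeDoubleAtomicAdd; }
```

In a real .cu file the guarded region would contain the atomicAdd call on doubles itself, so that code is skipped whenever the target architecture is unset or below 600.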