
3672 Commits

Author SHA1 Message Date
Evgeny Mankov 0cfdeda490 [HIPIFY][cmake][#1572] Fix: Do not override CMAKE_INSTALL_PREFIX
Affects building with HIP; the standalone build is unchanged


[ROCm/clr commit: e79fd55d01]
2019-10-31 16:55:06 +03:00
Rahul Garg 04785f2d54 Merge pull request #1515 from ansurya/tex_unbind_issue_fix
Fix undefined ref to hipUnbindTexture for texture types

[ROCm/clr commit: aeb7cebbad]
2019-10-30 17:54:15 -07:00
Evgeny Mankov 5403e7edcc Merge pull request #1593 from emankov/doc
[HIP][cmake] Move all *_INSTALL_DIR variables up before first add_subdirectory()

[ROCm/clr commit: 961bc5737e]
2019-10-30 22:10:05 +03:00
Michael LIAO 64f2d5e861 [HIP] Correct headers and add missing function templates for hip-clang.
- Fix 2 runtime API prototypes
  `hipOccupancyMaxActiveBlocksPerMultiprocessor` and
  `hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags`
- Add missing function templates of them in hip-clang.


[ROCm/clr commit: 61bc68a5f4]
2019-10-29 22:00:11 -04:00
Rahul Garg 42ab7b830e Merge pull request #1602 from ROCm-Developer-Tools/revert-1560-satyanveshd/hipoccupy
Revert "Cooperative groups match with cuda SWDEV-205006"

[ROCm/clr commit: 9840cdac99]
2019-10-29 16:54:36 -07:00
Evgeny Mankov dc0186720c [HIPIFY][#1603] Fix
[ROCm/clr commit: 050fdad7b7]
2019-10-29 22:10:36 +03:00
Rahul Garg 72c686ed67 Revert "Fix occupany APIs (#1560)"
This reverts commit ad1e409a24.


[ROCm/clr commit: 27221bc823]
2019-10-29 11:41:08 -07:00
Evgeny Mankov d8e846fc91 [HIPIFY][Linux] Rollback --cuda-compile-host-device on Linux
[Reason] It doesn't work with LLVM 9 and higher; Windows is fine


[ROCm/clr commit: dd2243f2fa]
2019-10-29 20:53:54 +03:00
Evgeny Mankov 21d798394a [HIPIFY] Introduce --cuda-compile-host-device for LLVM >= 9
* LLVM < 9 continues using --cuda-host-only


[ROCm/clr commit: 411b18a124]
2019-10-29 19:42:53 +03:00
Evgeny Mankov bb75fa46f0 [HIPIFY] cudaMemcpy2DFromArray(Async) support
[ROCm/clr commit: 5dd00bdf52]
2019-10-29 19:12:42 +03:00
Evgeny Mankov 4e02b285d6 [HIP][doc] NVIDIA-nvcc -> HIP-nvcc
[ROCm/clr commit: 3df22b2fde]
2019-10-28 22:46:33 +03:00
Evgeny Mankov 935dd4ce94 [HIP][doc] AMD-hcc -> HIP-hcc
[ROCm/clr commit: d312bce79d]
2019-10-28 21:41:12 +03:00
Evgeny Mankov 20b127bf45 [HIP][doc] Fix typo: AMD-clang -> HIP-clang
HIP-clang is already used below instead of AMD-clang


[ROCm/clr commit: 6284b041e5]
2019-10-28 21:19:21 +03:00
Evgeny Mankov 0737167ee4 [HIP][cmake] Move all *_INSTALL_DIR variables up before first add_subdirectory()
[REASON]
Those variables may be used by CMake in subdirectories (#1571)


[ROCm/clr commit: 8100e084b8]
2019-10-28 21:07:00 +03:00
Evgeny Mankov bcc9d88b20 [HIPIFY][tests] Rename the ambiguous call as well
[ROCm/clr commit: f68bee02f5]
2019-10-25 16:07:31 +03:00
Evgeny Mankov 91732f98c0 [HIPIFY][tests] Fix ambiguous call to cusparseGetErrorString declared in cusparse.h
[ROCm/clr commit: 9529e1d91d]
2019-10-25 16:04:20 +03:00
Anusha Godavarthy Surya dfa019bdf6 Merge branch 'master' into tex_unbind_issue_fix
[ROCm/clr commit: 9332a39838]
2019-10-25 15:54:25 +05:30
Anusha Godavarthy Surya 0140ea8e1a merge from master
[ROCm/clr commit: ae838f8cee]
2019-10-25 15:52:09 +05:30
Alex Voicu f22391c362 Add missing operators, fix GCC compilation. (#1589)
[ROCm/clr commit: 40522e2b6a]
2019-10-25 15:44:24 +05:30
Alex Voicu acbee5a48b Fix deadlock, remove old __sync_* use. (#1584)
This fixes a deadlock introduced by the switch to TTAS loops, and is therefore mildly urgent (to prevent the CI from hoovering in the broken code).

[ROCm/clr commit: f909a393ff]
2019-10-25 15:44:17 +05:30
Rahul Garg 9c599a3581 [dtest] Fix hipMemset2D test (#1579)
Reverts changes made in #1399. This is a runtime API test. To test hipMemAllocPitch, a new test should be written, and it should use the correct memset API.

[ROCm/clr commit: 66a3c874c8]
2019-10-25 15:44:05 +05:30
Rahul Garg 7ea7a9c3b7 Add hipMemcpy2DfromArray (#1510)
Adds hipMemcpy2DFromArray and hipMemcpy2DFromArrayAsync equivalent to cudaMemcpy2DFromArray and cudaMemcpy2DFromArrayAsync.

[ROCm/clr commit: 14b870d1ce]
2019-10-25 15:43:33 +05:30
Anusha Godavarthy Surya 2d538b702d Merge branch 'master' into tex_unbind_issue_fix
[ROCm/clr commit: c0fc5e718c]
2019-10-25 15:36:55 +05:30
Anusha Godavarthy Surya 1dd70e007b Fixed CI build failure
[ROCm/clr commit: b9c8dd8ac6]
2019-10-25 12:21:41 +05:30
Rahul Garg 6760e4065e Update profiling doc (#1576)
[ROCm/clr commit: ff8d3fa446]
2019-10-24 17:51:55 +05:30
Jatin Chaudhary e7f4cf4487 Adding New Analyze Target Merging with cppcheck (#1583)
[ROCm/clr commit: f53b1a1755]
2019-10-24 17:46:06 +05:30
Rahul Garg 7f429afe2e Add HIP checks in texture driver sample (#1581)
[ROCm/clr commit: 170c4f0270]
2019-10-24 17:45:51 +05:30
gandryey 21a2925ee7 Hip vdi profiling header (#1577)
Add HIP-VDI profiling interface for GPU timing collection.

[ROCm/clr commit: f25692b399]
2019-10-24 17:45:42 +05:30
Alex Voicu 5b917afa5f Make CAS loops use the TTAS idiom. (#1573)
* Make CAS loops use the TTAS idiom.

* More efficient re-formulation of TTAS.

* Fix typo.

* The typo was not quite a typo


[ROCm/clr commit: 26914ec76e]
2019-10-24 17:45:20 +05:30
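The TTAS (test-and-test-and-set) idiom named in this commit can be illustrated with a minimal, generic C++ sketch using `std::atomic`. This is not HIP's code. The class name `TtasSpinLock` is hypothetical, and the sketch shows the idiom on a spinlock rather than on HIP's actual CAS loops. Waiters spin on a cheap relaxed load and attempt the atomic read-modify-write only once that load suggests it can succeed.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, generic TTAS spinlock (not taken from HIP's sources).
class TtasSpinLock {
public:
    void lock() {
        for (;;) {
            // "Test": spin on a plain load, which only reads the shared cache
            // line instead of repeatedly writing to it.
            while (locked_.load(std::memory_order_relaxed)) {
            }
            // "Test-and-set": attempt the atomic exchange only once the lock
            // looked free; retry from the outer loop if another thread won.
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;
            }
        }
    }

    // Single attempt: skip the exchange entirely if the lock is visibly held.
    bool try_lock() {
        return !locked_.load(std::memory_order_relaxed) &&
               !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    std::atomic<bool> locked_{false};
};
```

The same shape carries over to a compare-and-swap loop: read the current value first, check whether an update is still needed, and only then issue `compare_exchange`.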
satyanveshd ad1e409a24 Fix occupany APIs (#1560)
Addresses SWDEV-205006 

[ROCm/clr commit: 6c5fbf9b4a]
2019-10-24 17:44:47 +05:30
searlmc1 510be4b5dc Improve performance of v2 arg handling (#1539)
* Improve performance of v2 arg handling

* Missing change to `std::string`


[ROCm/clr commit: 15a699688e]
2019-10-24 17:44:05 +05:30
Alex Voicu fb411b56c2 Improve scalar access into vector types. (#1531)
The improvement is based on the ideas here: https://t0rakka.silvrback.com/simd-scalar-accessor. It yields significantly better ISA when the base's .xyzw members are used.

[ROCm/clr commit: 84d5b399f6]
2019-10-24 17:43:49 +05:30
Aryan Salmanpour 9e0eaef846 [hip] add support for implicit kernel argument for multi-grid sync (#1456)
* [hip] add support for implicit kernel argument for multi-grid sync

* modified code for calculating the prev_sum

* change the impCoopArg type to size_t

* add memory clean up

* launch init_gws and main kernels into two separate loops


[ROCm/clr commit: 93c688a0c9]
2019-10-24 17:43:30 +05:30
Rahul Garg 764135d242 Merge pull request #1559 from vsytch/win10_aligned_alloc
Fixes for hipMemcpy_simple on Windows

[ROCm/clr commit: 465581612e]
2019-10-23 13:10:59 -07:00
Evgeny Mankov 50d72e13ca [HIPIFY][cmake][#1571] Take into account building hipify-clang as a part of building HIP while installing
[Algorithm]
  [Release]
    If CMAKE_INSTALL_PREFIX is set by the user:
       If BIN_INSTALL_DIR is set by HIP, use it as CMAKE_INSTALL_PREFIX, otherwise CMAKE_INSTALL_PREFIX is used unchanged.
    If the user does not set CMAKE_INSTALL_PREFIX (CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT):
       If BIN_INSTALL_DIR is set by HIP, use it as CMAKE_INSTALL_PREFIX, otherwise use PROJECT_BINARY_DIR/bin for installation.
  [Debug]
    If CMAKE_INSTALL_PREFIX is set by the user:
       CMAKE_INSTALL_PREFIX is used unchanged.
    If the user does not set CMAKE_INSTALL_PREFIX (CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT):
       use CMAKE_CURRENT_SOURCE_DIR/bin for installation.

Standalone build left unchanged: CMAKE_INSTALL_PREFIX is used if set.


[ROCm/clr commit: 2435567e70]
2019-10-23 18:54:45 +03:00
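The [Algorithm] in this commit message can be restated as a small decision function. This sketch is plain C++ that only models the rules as written. It is not CMake code. The enum, function and parameter names are hypothetical. Unset variables are modelled as `std::nullopt` (requires C++17), and the standalone build path is not covered.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical model of the install-prefix rules for building hipify-clang
// as part of HIP; it encodes the decision table only, not real CMake logic.
enum class BuildType { Release, Debug };

std::string chooseInstallPrefix(BuildType type,
                                const std::optional<std::string>& userPrefix,        // CMAKE_INSTALL_PREFIX, if set by the user
                                const std::optional<std::string>& hipBinInstallDir,  // BIN_INSTALL_DIR, if set by HIP
                                const std::string& projectBinaryDir,                 // PROJECT_BINARY_DIR
                                const std::string& currentSourceDir) {               // CMAKE_CURRENT_SOURCE_DIR
    if (type == BuildType::Release) {
        // Release: BIN_INSTALL_DIR wins whether or not the user set a prefix.
        if (hipBinInstallDir) {
            return *hipBinInstallDir;
        }
        if (userPrefix) {
            return *userPrefix;
        }
        return projectBinaryDir + "/bin";
    }
    // Debug: BIN_INSTALL_DIR is not consulted at all.
    if (userPrefix) {
        return *userPrefix;
    }
    return currentSourceDir + "/bin";
}
```

Written this way, the asymmetry is easy to see: in Release, HIP's BIN_INSTALL_DIR overrides even a user-supplied CMAKE_INSTALL_PREFIX. In Debug, the user's choice is always respected.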
Evgeny Mankov 0896e41987 [HIPIFY] Disable delayed template parsing
By implicitly and unconditionally passing the -fno-delayed-template-parsing option (which appeared in LLVM 3.8.0 and thus needs no compatibility wrapping) to hipify-clang.

[Reason] To parse uncalled template functions; otherwise they are not parsed unless called, and thus are not hipified.

Affects the cub_03.cu test, which has an uncalled global template function.


[ROCm/clr commit: 7ab06b3892]
2019-10-22 19:07:37 +03:00
Evgeny Mankov 82222bf945 [HIPIFY][#1569] Fix
[ROCm/clr commit: e2191e23e6]
2019-10-22 11:08:37 +03:00
Evgeny Mankov e3cf10192c [HIPIFY][tests] Set max clang's CudaArch for corresponding CUDA major.minor version
[Reason] To support maximum CUDA features in offline tests

+ Add a defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 600 restriction for atomicAdd on doubles in atomics.cu.
  So if LLVM < 7, where --cuda-gpu-arch has no effect, __CUDA_ARCH__ is not set from it either (clang defaults to 350);
  if LLVM >= 7, --cuda-gpu-arch is used and __CUDA_ARCH__ is set based on it.


[ROCm/clr commit: 3233a845f6]
2019-10-21 17:50:00 +03:00
Evgeny Mankov de849a44e7 [HIPIFY][perl] Support of 'using namespace cub'
[ROCm/clr commit: 9633cdbd8a]
2019-10-21 17:15:05 +03:00
Evgeny Mankov 665a200247 [HIPIFY][tests] Set max clang's CudaArch for corresponding CUDA version
[Reason] To support maximum CUDA features in offline tests

+ Add CUDA_VERSION >= 800 restriction for atomics.cu

[TODO] Find a way to use or exclude atomicAdd for doubles if LLVM < 7, because
LLVM 6.0.1 and older do not use --cuda-gpu-arch in clang's Driver code at all (the option is only declared)


[ROCm/clr commit: 9fc7afa738]
2019-10-21 15:51:25 +03:00
Evgeny Mankov 3a45daed0a [HIPIFY][tests] Set -I for CUDA path instead of --cuda-path for LLVM < 4
[ROCm/clr commit: ff6057d1ff]
2019-10-20 20:08:56 +03:00
Evgeny Mankov e07be75489 [HIPIFY][tests] Exclude all CUB tests if CUDA_CUB_ROOT_DIR is not set
[ROCm/clr commit: 5bf1ff19ff]
2019-10-20 20:03:18 +03:00
Vladislav Sytchenko 33acfa17c1 Remove extra #endif.
[ROCm/clr commit: 432380aa5d]
2019-10-18 16:40:29 -04:00
Evgeny Mankov bb20336fa6 [HIPIFY][tests] Test clean-up
[ROCm/clr commit: 44a897a146]
2019-10-18 18:55:52 +03:00
Evgeny Mankov 85281b1d86 [HIPIFY][CUB][#1460] Add "using namespace cub" translation support
+ Add cub_03.cu


[ROCm/clr commit: 86f6756b02]
2019-10-18 18:51:40 +03:00
Evgeny Mankov a392a050d6 Merge pull request #1558 from aaronenyeshi/fix-hipify-cmake-version
[HIPIFY][cmake] Make CMakeLists use default 3.5.1 for Ubuntu 16.04

[ROCm/clr commit: eb6690bbba]
2019-10-18 06:39:35 +03:00
Rahul Garg 30759e7c9b Merge pull request #1550 from yxsamliu/new-launch
Add -fhip-new-launch-api to hipcc for HIP/VDI

[ROCm/clr commit: 07eed1e5bf]
2019-10-17 19:07:32 -07:00
Vladislav Sytchenko 54eddfc8f0 _aligned_malloc() on Windows first takes size, then alignment, which is the opposite of how the similar function behaves on Linux. Memory allocated by it also has to be freed using _aligned_free(), unlike Linux where we can use regular free().
Edit aligned_alloc() macro and add a aligned_free() one to align with the above behaviour.


[ROCm/clr commit: f4440817cb]
2019-10-17 18:58:32 -04:00
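This sketch shows one way to hide the platform difference described in this commit behind portable wrappers. It assumes C++17 (for `std::aligned_alloc`) and MSVC's `_aligned_malloc`/`_aligned_free` on Windows. The names `portableAlignedAlloc` and `portableAlignedFree` are hypothetical. HIP's actual aligned_alloc()/aligned_free() macros are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#ifdef _WIN32
#include <malloc.h>
#endif

// Hypothetical portable wrappers. Argument order follows C11/C++17
// aligned_alloc: alignment first, then size.
inline void* portableAlignedAlloc(std::size_t alignment, std::size_t size) {
#ifdef _WIN32
    // MSVC's _aligned_malloc takes size first, then alignment.
    return _aligned_malloc(size, alignment);
#else
    // std::aligned_alloc expects size to be a multiple of alignment.
    return std::aligned_alloc(alignment, size);
#endif
}

inline void portableAlignedFree(void* ptr) {
#ifdef _WIN32
    // Memory from _aligned_malloc must be released with _aligned_free.
    _aligned_free(ptr);
#else
    // Memory from aligned_alloc is released with ordinary free.
    std::free(ptr);
#endif
}
```

Keeping the C11 argument order in the wrapper lets call sites read the same on every platform. Only the Windows branch swaps the arguments.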
Aaron Enye Shi 489e3dda9a [HIPIFY][cmake] Make CMakeLists use default 3.5.1 for Ubuntu 16.04
[ROCm/clr commit: b3ea58abe7]
2019-10-17 21:21:24 +00:00
Evgeny Mankov 9fb60fa36a [HIPIFY][doc] Update README.md
+ Versions, testing


[ROCm/clr commit: 1165e6bd71]
2019-10-17 22:26:48 +03:00