Commit Graph

3685 Commits

Author SHA1 Message Date
Rahul Garg 25a5ca94de Merge pull request #1582 from amd-lthakur/hipExtMLK
Adding a directed test case for hipExtModuleLaunchKernel() api.

[ROCm/hip commit: 4739e68bbe]
2019-10-31 17:13:26 -07:00
Rahul Garg 3e3cebe614 Merge pull request #1598 from lmoriche/master
Fix a code object memory corruption

[ROCm/hip commit: 782cf1c007]
2019-10-31 17:12:24 -07:00
Rahul Garg 75cf902cdb Add stream
[ROCm/hip commit: 85d70086cb]
2019-10-31 12:15:56 -04:00
Rahul Garg 73ca647852 Fix HIP init calls in hipMemcpy2DFromArray
[ROCm/hip commit: efe6fa86dc]
2019-10-31 12:15:56 -04:00
Evgeny Mankov 0feee792b8 [HIPIFY][cmake][#1572] Fix: Do not override CMAKE_INSTALL_PREFIX
Affects building with HIP, standalone building is not changed


[ROCm/hip commit: f563772a25]
2019-10-31 16:55:06 +03:00
Rahul Garg b68c8d2f60 Formatting changes
[ROCm/hip commit: 55f2a38120]
2019-10-30 18:12:51 -07:00
Rahul Garg 8429e15052 Formatting changes ,variable name and check update
[ROCm/hip commit: 4ab71216b4]
2019-10-30 18:09:21 -07:00
Rahul Garg 7e742b1216 Merge pull request #1515 from ansurya/tex_unbind_issue_fix
Fix undefined ref to hipUnbindTexture for texture types

[ROCm/hip commit: ba8105e0cd]
2019-10-30 17:54:15 -07:00
Laurent Morichetti 1056ca35dc Addressed review comments
Change comment "must exceed" to "must be no shorter than"
move the std::string instead of creating a copy


[ROCm/hip commit: 91748f4e6c]
2019-10-30 13:14:41 -07:00
Evgeny Mankov cee5e37f57 Merge pull request #1593 from emankov/doc
[HIP][cmake] Move all *_INSTALL_DIR variables up before first add_subdirectory()

[ROCm/hip commit: 77962371e7]
2019-10-30 22:10:05 +03:00
Michael LIAO 2bff0748cd [HIP] Correct headers and add missing function templates for hip-clang.
- Fix 2 runtime API prototypes
  `hipOccupancyMaxActiveBlocksPerMultiprocessor` and
  `hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags`
- Add missing function templates of them in hip-clang.


[ROCm/hip commit: 5c8a7521f4]
2019-10-29 22:00:11 -04:00
Rahul Garg 2102b59eb6 Merge pull request #1602 from ROCm-Developer-Tools/revert-1560-satyanveshd/hipoccupy
Revert "Cooperative groups match with cuda SWDEV-205006"

[ROCm/hip commit: 4d04baf0cd]
2019-10-29 16:54:36 -07:00
Evgeny Mankov 1ca3744948 [HIPIFY][#1603] Fix
[ROCm/hip commit: 389b5ec957]
2019-10-29 22:10:36 +03:00
Rahul Garg 70449cfa92 Revert "Fix occupany APIs (#1560)"
This reverts commit 4f23f9cb18.


[ROCm/hip commit: e4a1e44162]
2019-10-29 11:41:08 -07:00
Evgeny Mankov bce3beed0c [HIPIFY][Linux] Rollback --cuda-compile-host-device on Linux
[Reason] It doesn't work with LLVM 9 and higher; Windows is fine


[ROCm/hip commit: 85087644da]
2019-10-29 20:53:54 +03:00
Evgeny Mankov 0fd46e00cb [HIPIFY] Introduce --cuda-compile-host-device for LLVM >= 9
* LLVM < 9 continues using --cuda-host-only


[ROCm/hip commit: 3f2eefa82a]
2019-10-29 19:42:53 +03:00
Evgeny Mankov 933568681c [HIPIFY] cudaMemcpy2DFromArray(Async) support
[ROCm/hip commit: 315a10a59d]
2019-10-29 19:12:42 +03:00
Laurent Morichetti 86dd262e9b Fix a code object memory corruption
The lifetime of the buffer given to
hsa_code_object_reader_create_from_memory must exceed that of the
code object reader. We need to create a copy of the code object
binary memory (file) that is kept allocated until the code object
reader is destroyed.


[ROCm/hip commit: 7473140a76]
2019-10-29 08:23:57 -07:00
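
The lifetime rule described in the commit above can be illustrated with a small sketch. This is not the HIP implementation: `BorrowingReader` is a hypothetical stand-in for a reader that keeps a pointer into caller memory (as the commit says `hsa_code_object_reader_create_from_memory` does), and `OwnedCodeObject` is an assumed wrapper that keeps a private copy of the bytes alive for as long as the reader exists.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a reader that borrows its input and never copies it.
struct BorrowingReader {
    const char* data;
    std::size_t size;
};

// Assumed wrapper: owns the code object bytes for the reader's whole lifetime.
class OwnedCodeObject {
public:
    // Take the bytes by value and move them in, so the caller's buffer can go away.
    explicit OwnedCodeObject(std::string bytes)
        : bytes_(std::move(bytes)), reader_{bytes_.data(), bytes_.size()} {
        // In this sketch an empty code object is treated as a caller bug.
        assert(!bytes_.empty());
    }

    // reader_ points into bytes_, so copying or moving would leave it dangling.
    OwnedCodeObject(const OwnedCodeObject&) = delete;
    OwnedCodeObject& operator=(const OwnedCodeObject&) = delete;

    const BorrowingReader& reader() const { return reader_; }

private:
    std::string bytes_;       // declared first: constructed before, destroyed after reader_
    BorrowingReader reader_;  // non-owning view into bytes_
};
```

Because members are destroyed in reverse declaration order, the buffer is released only after the reader, so the buffer lives at least as long as the reader.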
Evgeny Mankov fa39151e3b [HIP][doc] NVIDIA-nvcc -> HIP-nvcc
[ROCm/hip commit: 3a4165779a]
2019-10-28 22:46:33 +03:00
Evgeny Mankov d75d979d31 [HIP][doc] AMD-hcc -> HIP-hcc
[ROCm/hip commit: 46b164c17a]
2019-10-28 21:41:12 +03:00
Evgeny Mankov 995348aecf [HIP][doc] Fix typo: AMD-clang -> HIP-clang
HIP-clang is already used below instead of AMD-clang


[ROCm/hip commit: 06d9e426e0]
2019-10-28 21:19:21 +03:00
Evgeny Mankov a2c162f85e [HIP][cmake] Move all *_INSTALL_DIR variables up before first add_subdirectory()
[REASON]
Those vars (may) used by cmake in subdirectories (#1571)


[ROCm/hip commit: b089d905c6]
2019-10-28 21:07:00 +03:00
Evgeny Mankov 17fd872099 [HIPIFY][tests] Rename the ambiguous call as well
[ROCm/hip commit: 70c5072302]
2019-10-25 16:07:31 +03:00
Evgeny Mankov 536376b341 [HIPIFY][tests] Fix ambiguous call to cusparseGetErrorString declared in cusparse.h
[ROCm/hip commit: 0410d5dcd2]
2019-10-25 16:04:20 +03:00
Anusha Godavarthy Surya 5c77b7d19a Merge branch 'master' into tex_unbind_issue_fix
[ROCm/hip commit: 03623cc3f1]
2019-10-25 15:54:25 +05:30
amd-lthakur 5e11495936 Excluded the test case for nvcc platform
[ROCm/hip commit: 4239c94fe5]
2019-10-25 15:52:11 +05:30
Anusha Godavarthy Surya 196bdea9c0 merge from master
[ROCm/hip commit: 5f47e99ffe]
2019-10-25 15:52:09 +05:30
Alex Voicu 8460793117 Add missing operators, fix GCC compilation. (#1589)
[ROCm/hip commit: dabd939048]
2019-10-25 15:44:24 +05:30
Alex Voicu 2e9868d597 Fix deadlock, remove old __sync_* use. (#1584)
This fixes a deadlock introduced by the switch to TTAS loops, and is therefore mildly urgent (to prevent the CI from hoovering in the broken code).

[ROCm/hip commit: a855a13c22]
2019-10-25 15:44:17 +05:30
Rahul Garg 849ae2bff0 [dtest] Fix hipMemset2D test (#1579)
Reverts changes made in #1399. This is a RT api test. For testing hipMemAllocPitch , a new test should be written and that should use correct memset API.

[ROCm/hip commit: 12e1a86ec1]
2019-10-25 15:44:05 +05:30
Rahul Garg c315da2028 Add hipMemcpy2DfromArray (#1510)
Adds hipMemcpy2DFromArray and hipMemcpy2DFromArrayAsync equivalent to cudaMemcpy2DFromArray and cudaMemcpy2DFromArrayAsync.

[ROCm/hip commit: 356765a223]
2019-10-25 15:43:33 +05:30
Anusha Godavarthy Surya 3007505d30 Merge branch 'master' into tex_unbind_issue_fix
[ROCm/hip commit: 259d8b4cdf]
2019-10-25 15:36:55 +05:30
Anusha Godavarthy Surya f51eeeb5de Fixed CI build failure
[ROCm/hip commit: ce04bdaa1a]
2019-10-25 12:21:41 +05:30
amd-lthakur 158cab3bb7 Refactored the file as suggested
[ROCm/hip commit: 564418c308]
2019-10-25 10:44:38 +05:30
amd-lthakur 629a933b63 Update matmul.cpp
[ROCm/hip commit: 318df5c36b]
2019-10-25 09:22:07 +05:30
amd-lthakur 4b771db194 Update hipExtModuleLaunchKernel.cpp
[ROCm/hip commit: cd25149225]
2019-10-25 09:19:49 +05:30
Rahul Garg 83bbf215c8 Update profiling doc (#1576)
[ROCm/hip commit: 70f2cd1317]
2019-10-24 17:51:55 +05:30
Jatin Chaudhary e0a382a781 Adding New Analyze Target Merging with cppcheck (#1583)
[ROCm/hip commit: 770d3412f8]
2019-10-24 17:46:06 +05:30
Rahul Garg 9e0f66daec Add HIP checks in texture driver sample (#1581)
[ROCm/hip commit: 04e10814d8]
2019-10-24 17:45:51 +05:30
gandryey 4a7884105f Hip vdi profiling header (#1577)
Add HIP-VDI profiling interface for GPU timing collection.

[ROCm/hip commit: 81952ce5a7]
2019-10-24 17:45:42 +05:30
Alex Voicu 8f020907c7 Make CAS loops use the TTAS idiom. (#1573)
* Make CAS loops use the TTAS idiom.

* More efficient re-formulation of TTAS.

* Fix typo.

* The typo was not quite a typo


[ROCm/hip commit: 9ba25b42c8]
2019-10-24 17:45:20 +05:30
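
For readers unfamiliar with the TTAS (test-and-test-and-set) idiom named in the commit above, here is a minimal sketch using `std::atomic`. It is an illustration only, not the code from the HIP change: the `TtasLock` class and its method names are assumptions, and a spinlock is used because it is the simplest place to show the idiom.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical spinlock illustrating test-and-test-and-set; not HIP code.
class TtasLock {
public:
    void lock() {
        for (;;) {
            // Test: spin on a plain load, which does not request exclusive
            // ownership of the cache line while the lock is held.
            while (locked_.load(std::memory_order_relaxed)) {
            }
            // Test-and-set: attempt the atomic write only once the lock looked free.
            if (!locked_.exchange(true, std::memory_order_acquire)) {
                return;
            }
        }
    }

    // Single attempt with the same read-before-write shape as lock().
    bool try_lock() {
        return !locked_.load(std::memory_order_relaxed) &&
               !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() {
        assert(locked_.load(std::memory_order_relaxed) && "unlock() on a lock that is not held");
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};
```

The point of the extra read is to keep contended waiters from issuing a stream of atomic read-modify-write operations. The loop must still retry after a failed exchange, since another thread may win the race between the read and the write.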
satyanveshd 4f23f9cb18 Fix occupany APIs (#1560)
Addresses SWDEV-205006 

[ROCm/hip commit: af351d7e1b]
2019-10-24 17:44:47 +05:30
searlmc1 4d668d5a52 Improve performance of v2 arg handling (#1539)
* Improve performance of v2 arg handling

* Missing change to `std::string`


[ROCm/hip commit: c4a51f3679]
2019-10-24 17:44:05 +05:30
Alex Voicu 52a5380263 Improve scalar access into vector types. (#1531)
The improvement is based on the ideas here: https://t0rakka.silvrback.com/simd-scalar-accessor. It yields significantly better ISA when the base's .xyzw members are used.

[ROCm/hip commit: 4a635add45]
2019-10-24 17:43:49 +05:30
Aryan Salmanpour 9ab561dd66 [hip] add support for implicit kernel argument for multi-grid sync (#1456)
* [hip] add support for implicit kernel argument for multi-grid sync

* modified code for calculating the prev_sum

* change the impCoopArg type to size_t

* add memory clean up

* launch init_gws and main kernels into two separate loops


[ROCm/hip commit: 359dc79101]
2019-10-24 17:43:30 +05:30
amd-lthakur 297a20eac7 Adding a directed test case for hipExtModuleLaunchKernel() api.
[ROCm/hip commit: 8b496e4715]
2019-10-24 15:06:28 +05:30
Rahul Garg 66f0280f0b Merge pull request #1559 from vsytch/win10_aligned_alloc
Fixes for hipMemcpy_simple on Windows

[ROCm/hip commit: fe5f7d4245]
2019-10-23 13:10:59 -07:00
Evgeny Mankov 6a0ce151e5 [HIPIFY][cmake][#1571] Take into account building hipify-clang as a part of building HIP while installing
[Algorithm]
  [Release]
    If CMAKE_INSTALL_PREFIX is set by the user:
       If BIN_INSTALL_DIR is set by HIP, use it as CMAKE_INSTALL_PREFIX, otherwise CMAKE_INSTALL_PREFIX is used unchanged.
    If the user does not set CMAKE_INSTALL_PREFIX (CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT):
       If BIN_INSTALL_DIR is set by HIP, use it as CMAKE_INSTALL_PREFIX, otherwise use PROJECT_BINARY_DIR/bin for installation.
  [Debug]
    If CMAKE_INSTALL_PREFIX is set by the user:
       CMAKE_INSTALL_PREFIX is used unchanged.
    If the user does not set CMAKE_INSTALL_PREFIX (CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT):
       use CMAKE_CURRENT_SOURCE_DIR/bin for installation.

Standalone build left unchanged: CMAKE_INSTALL_PREFIX is used if set.


[ROCm/hip commit: 75d70a6714]
2019-10-23 18:54:45 +03:00
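
The decision table in the commit above can be restated as a small function. This is a sketch of the described logic only, written in C++ rather than CMake. The `InstallInputs` struct, its field names and `resolve_install_prefix` are hypothetical, and an empty `bin_install_dir` is assumed to mean that HIP did not set `BIN_INSTALL_DIR`.

```cpp
#include <cassert>
#include <string>

enum class BuildType { Release, Debug };

// Hypothetical bundle of the values the commit message refers to.
struct InstallInputs {
    BuildType build_type;
    bool user_set_prefix;              // false when CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT
    std::string cmake_install_prefix;  // CMAKE_INSTALL_PREFIX
    std::string bin_install_dir;       // BIN_INSTALL_DIR; empty if HIP did not set it
    std::string project_binary_dir;    // PROJECT_BINARY_DIR
    std::string current_source_dir;    // CMAKE_CURRENT_SOURCE_DIR
};

// Returns the directory that would be used as the install prefix.
std::string resolve_install_prefix(const InstallInputs& in) {
    // A prefix flagged as user-set is expected to be non-empty in this sketch.
    assert(!in.user_set_prefix || !in.cmake_install_prefix.empty());

    if (in.build_type == BuildType::Release) {
        // In Release, BIN_INSTALL_DIR from HIP wins whether or not the user set a prefix.
        if (!in.bin_install_dir.empty()) {
            return in.bin_install_dir;
        }
        return in.user_set_prefix ? in.cmake_install_prefix
                                  : in.project_binary_dir + "/bin";
    }
    // In Debug, BIN_INSTALL_DIR is not consulted.
    return in.user_set_prefix ? in.cmake_install_prefix
                              : in.current_source_dir + "/bin";
}
```

Written this way, the asymmetry between the two build types is easy to see: only Release honors `BIN_INSTALL_DIR`, and the two defaults differ (binary directory for Release, source directory for Debug).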
Evgeny Mankov d39793f0f7 [HIPIFY] Disable delayed template parsing
By implicit unconditional passing -fno-delayed-template-parsing option (which appeared in LLVM 3.8.0, thus doesn't need compatibility wrapping) to hipify-clang.

[Reason] To parse uncalled template functions otherwise they are not parsed without calling, thus not hipified.

Affects cub_03.cu test, which has uncalled global template function.


[ROCm/hip commit: b6e6f12b54]
2019-10-22 19:07:37 +03:00
Evgeny Mankov 9822351686 [HIPIFY][#1569] Fix
[ROCm/hip commit: 6f88c81a78]
2019-10-22 11:08:37 +03:00