Commit graph

4425 commits

Author SHA1 Message Date
Aryan Salmanpour 015895a265 [HIP] add cooperative kernel launch APIs on NVCC (#1929) 2020-03-17 14:01:11 +05:30
Joseph Greathouse 55e55e78bb Fix maxSharedMemoryPerMultiProcessor attribute (#1927)
The maxSharedMemoryPerMultiProcessor attribute is meant to describe
the number of bytes of shared memory (LDS space in AMD terminology)
in each SM (CU in AMD terminology). For instance, on AMD GPUs this
is often 64KB per CU, and on some Nvidia GPUs it's 96KB per SM.

This shared memory is a different address space from the normal
global memory. However, the current HIP-HCC properties fill this
in with a size that matches the totalGlobalMem property. This gives
a drastically too-high calculation for the amount of LDS space that
each CU has -- tens of GBs vs. tens of KBs.

This patch fixes this by pulling the maxSharedMemoryPerMultiProcessor
property from the HSA pool that describes how much workgroup-local
space is available on each CU. The HSA runtime eventually pulls
this from the topology information about LDSSizeInKB, defined as
"Size of Local Data Store in Kilobytes per SIMD".

Previously, this HSA query was used to fill in the value of the
sharedMemPerBlock property. On today's AMD GPUs, we know that
the amount of LDS available to the workgroup is identical to the
amount of LDS space in the CU. However, in the future this may
differ. As such, this patch changes around the order and fills
in the "PerMultiProcessor" property from the HSA query (since
that's what the query is defined to return), and then separately
fills in the "PerBlock" property as we know it.
2020-03-17 14:00:51 +05:30
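The ordering described in 55e55e78bb can be sketched in plain C++. This is only an illustration: `DevicePropsSketch` and `fillProps` are hypothetical names, the LDS size is passed in as a parameter rather than read from the HSA pool query, and the real HIP-HCC code is not shown in the commit message.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the three device properties named in the commit.
struct DevicePropsSketch {
    std::size_t totalGlobalMem;
    std::size_t sharedMemPerBlock;
    std::size_t maxSharedMemoryPerMultiProcessor;
};

// Hypothetical helper: ldsBytesPerCu plays the role of the value returned by
// the workgroup-local pool query.
DevicePropsSketch fillProps(std::size_t globalMemBytes, std::size_t ldsBytesPerCu) {
    DevicePropsSketch p{};
    p.totalGlobalMem = globalMemBytes;
    // The query describes LDS per CU, so it fills the per-multiprocessor
    // property first (not a copy of totalGlobalMem).
    p.maxSharedMemoryPerMultiProcessor = ldsBytesPerCu;
    // Filled in separately: on today's AMD GPUs a workgroup can use all of
    // the CU's LDS, so the two values happen to be equal.
    p.sharedMemPerBlock = p.maxSharedMemoryPerMultiProcessor;
    return p;
}
```

With 16 GB of global memory and 64 KB of LDS per CU, both shared-memory properties come out as 65536 bytes, well below `totalGlobalMem`.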
Joseph Greathouse bf04d7380a Fix errors in occupancy calculation function (#1926)
Fix two errors in hipOccupancyMaxActiveBlocksPerMultiprocessor.
1) Fix a possible segfault if the user passed in a null pointer for
   the numBlocks value.
2) Handle the situation when the user is asking for a block size
   that is larger than what the target device can hold within a
   single block.
2020-03-17 14:00:38 +05:30
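The two guards described in bf04d7380a might look roughly like the following. This is a minimal sketch, not the HIP implementation: `Status`, `maxActiveBlocks`, and its parameters are hypothetical, the occupancy model is reduced to a thread-count division, and reporting zero blocks for an oversized block is an assumption, since the commit message does not say how that case is reported.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical status codes for illustration; not the real hipError_t values.
enum class Status { Success, InvalidValue };

// Hypothetical helper: how many blocks of blockSize threads fit on one
// multiprocessor, given two simple per-device limits.
Status maxActiveBlocks(int* numBlocks, int blockSize,
                       int maxThreadsPerBlock, int maxThreadsPerMultiProcessor) {
    // Guard 1: reject a null output pointer instead of dereferencing it.
    if (numBlocks == nullptr) return Status::InvalidValue;
    if (blockSize <= 0) return Status::InvalidValue;
    // Guard 2: a block larger than the device allows can never be resident,
    // so report zero blocks (assumed behavior).
    if (blockSize > maxThreadsPerBlock) {
        *numBlocks = 0;
        return Status::Success;
    }
    *numBlocks = maxThreadsPerMultiProcessor / blockSize;
    return Status::Success;
}
```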
Yaxun (Sam) Liu 7aa9611689 Let hipcc not pass -mllvm option to HIP-Clang on Windows (#1924)
Currently there is a clang bug on Windows causing duplicate -mllvm options in clang -cc1.

Temporarily disable -mllvm options for HIP-Clang on Windows until the bug is fixed.

Change-Id: I3a4393ba7745989398dc6c6001722837dad18704
2020-03-17 14:00:20 +05:30
Maneesh Gupta eee5cc8621 Annotate __constant__ (#1901) 2020-03-17 13:59:44 +05:30
mhbliao 774035d869 [hip] Improve the portability of the header for vector type support. (#1873)
- Need to check the availability of the `__has_attribute` builtin macro
  instead of compiler versions. That's more reliable and portable among
  various compilers.
- Provide very basic vector support for unknown compilers.
2020-03-17 13:59:24 +05:30
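The feature-test approach from 774035d869 can be illustrated as follows. This is a sketch under assumptions: `SKETCH_HAS_EXT_VECTOR`, `Float4Storage`, `makeFloat4`, and `horizontalSum` are hypothetical names, and the actual HIP header may test different attributes and define its types differently.

```cpp
#include <cassert>
#include <cstddef>

// __has_attribute does not exist on every compiler, so test for the macro
// itself before using it; the nested #if keeps unknown compilers happy.
#if defined(__has_attribute)
#  if __has_attribute(ext_vector_type)
#    define SKETCH_HAS_EXT_VECTOR 1
#  endif
#endif

#if defined(SKETCH_HAS_EXT_VECTOR)
// Native vector type where the compiler advertises support for it.
typedef float Float4Storage __attribute__((ext_vector_type(4)));
#else
// Very basic fallback for compilers without the attribute.
struct Float4Storage {
    float v[4];
    float& operator[](std::size_t i) { return v[i]; }
    const float& operator[](std::size_t i) const { return v[i]; }
};
#endif

// Builds a 4-element vector through the subscript syntax both variants share.
inline Float4Storage makeFloat4(float x, float y, float z, float w) {
    Float4Storage r;
    r[0] = x; r[1] = y; r[2] = z; r[3] = w;
    return r;
}

// Adds the four components.
inline float horizontalSum(const Float4Storage& a) {
    return a[0] + a[1] + a[2] + a[3];
}
```

Code written against `Float4Storage` compiles the same way on either path, which is the portability benefit the commit describes.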
Evgeny Mankov 821c60a3d9 Merge pull request #1916 from asalmanp/refactor_cooperative_APIs
[HIP] Refactor cooperative APIs
2020-03-12 19:12:50 +03:00
Evgeny Mankov 70f5646f8a Merge pull request #1908 from asalmanp/prop_mulit_coop
[HIP] add hip specific properties for cooperative kernel multi device
2020-03-12 19:12:11 +03:00
Alex Voicu 1c5f526e6b Merge branch 'master' of https://github.com/ROCm-Developer-Tools/HIP into feature_robust_constant 2020-03-12 14:20:26 +00:00
Maneesh Gupta 0726abf424 Expose support for non-returning atomic FADD (#1909)
Change-Id: If5359488324477315a9bd4f308a75f606c065b39
2020-03-11 14:33:15 +05:30
srinivamd 65a790bc08 return hipSuccess when count is zero (#1900) 2020-03-11 14:32:54 +05:30
Evgeny Mankov f98ce58e06 Merge pull request #1925 from arghdos/patch-3
Fix incorrect shfl_xor for Windows
2020-03-11 00:58:11 +03:00
Aryan Salmanpour b663fccf0b [HIP] return an error if blockDim exceeds maxThreadsPerBlock 2020-03-10 15:26:53 -04:00
Evgeny Mankov da8669ea03 Merge pull request #1922 from emankov/HIP
[HIP][doc] Update README.md
2020-03-10 22:20:00 +03:00
Nick Curtis 09edc7e49c Fix incorrect shfl_xor for Windows
copy/paste error, need __shfl_xor w/ lane_mask
2020-03-10 12:04:05 -05:00
Evgeny Mankov fea3017168 [HIP][doc] Update README.md 2020-03-10 18:04:01 +03:00
Evgeny Mankov cabadddc7b Merge pull request #1919 from ssahasra/declare-printf
separate printf declaration for vdi/clang
2020-03-10 12:26:54 +03:00
Aryan Salmanpour 5494f5b247 [HIP] fix formatting/code clean up and fix a bug 2020-03-09 16:03:59 -04:00
Sameer Sahasrabuddhe 09130b3b92 separate printf declaration for vdi/clang
There are now two implementations of printf in HIP:

1. The implementation for HCC is controlled by the HC_FEATURE_PRINTF
   macro, and it works only with the HCC compiler used in combination
   with the HCC runtime.

2. The implementation for hip-clang requires the VDI runtime, and is
   always enabled with that combination.
2020-03-09 09:40:05 +05:30
Alex Voicu c7f7ada0e9 Merge branch 'master' of https://github.com/ROCm-Developer-Tools/HIP into feature_robust_constant 2020-03-08 18:00:14 +00:00
Aryan Salmanpour 4844fbdf0a [HIP] Refactor cooperative APIs 2020-03-06 18:30:12 -05:00
Aryan Salmanpour 7e45c54ea6 move new enums to the end to maintain compatibility 2020-03-06 11:38:44 -05:00
Evgeny Mankov 5c036520b1 Merge pull request #1914 from emankov/hipify-clang
[HIPIFY][doc] Update README.md: LLVM 10.0.0-rc3 is supported
2020-03-06 18:20:36 +03:00
Evgeny Mankov dd5f3fd282 [HIPIFY][doc] Update README.md: LLVM 10.0.0-rc3 is supported
+ Add -DLLVM_TEMPORARILY_ALLOW_OLD_TOOLCHAIN=ON for LLVM 10.0.0 or newer
+ Supported versions update
2020-03-06 18:17:05 +03:00
Alex Voicu 44e5834c8e Merge branch 'master' of https://github.com/ROCm-Developer-Tools/HIP into feature_robust_constant 2020-03-06 12:33:31 +02:00
Maneesh Gupta 4a40010ac6 Expose support for non-returning atomic FADD
Change-Id: If5359488324477315a9bd4f308a75f606c065b39
2020-03-05 10:30:52 +05:30
Evgeny Mankov 43d12b523c Merge pull request #1905 from emankov/hipify-clang
[HIP][cmake] Remove dependency from hipify-clang
2020-03-04 07:42:48 +03:00
Aryan Salmanpour 03797ae986 [HIP] add hip specific properties for cooperative kernel multi device 2020-03-03 13:25:36 -05:00
Evgeny Mankov 1561f61642 [HIP][cmake] Remove dependency from hipify-clang
[Reason] hipify-clang is about to be split out into a new repository: https://github.com/ROCm-Developer-Tools/HIPIFY.
2020-03-03 12:07:13 +03:00
Alex Voicu 27480ff5a2 Annotate __constant__ 2020-02-28 22:54:00 +02:00
Jatin Chaudhary d29ad50464 [dtests] __shfl_up and __shfl_down tests (#1899) 2020-02-28 16:48:15 +05:30
Siu Chi Chan 57edf48191 improve code object loading error message (#1889) 2020-02-28 16:47:40 +05:30
saleelk 3e1f41c165 Fix HIPRTC headers to export C style symbols (#1879) 2020-02-28 16:47:29 +05:30
Rahul Garg 6c5fa32815 Remove deprecated HIP markers (#1876) 2020-02-28 16:47:15 +05:30
Rahul Garg edc97f3073 Add hipDrvOccupancyMaxActiveBlocksPerMultiprocessor[WithFlags] (#1854)
Equivalent to cuOccupancyMaxActiveBlocksPerMultiprocessor[WithFlags].
2020-02-28 16:46:55 +05:30
jiabaxie af90312867 Cleaned up error messages for HipEnvVarDriver test (#1825)
Several error messages appeared even when the hipEnvVarDriver.exe test passed and executed successfully. These are now cleaned up. The affected cases are:

* When popen searches for the directed_test directory but does not find it, it outputs an error, then finds hipEnvVar at the same level. With the fix, the test only outputs an error if both searches for hipEnvVar fail.
* In the latter half of the test, conditions are set specifically to hide the devices, so the assertions there result in No Hip Device detected messages. The fix suppresses these errors, because those checks are intended to find no devices. The assertions themselves are untouched.

HipEnvVarDriver.cpp has also been refactored. Reading HipEnvVar now happens in a helper function shared by getDeviceNumber and getDevicePCIBusNumRemote, as the code to read HipEnvVar was very similar in both.
2020-02-28 16:46:12 +05:30
Alex Voicu d830dad3be Address post-staging issues in #1809 (#1894)
Fixes SWDEV-223910 and SWDEV-223663
2020-02-27 16:21:12 +05:30
Maneesh Gupta 71e1f87f7e bump version to 3.2 (#1898)
- Bump version to 3.2
- [ci] Enable tests on ROCm 3.1
2020-02-27 16:18:31 +05:30
Nick Curtis b7dd073d93 fix long shuffle implementations for windows (#1895)
Fixes for SWDEV-223694
2020-02-26 15:53:56 +05:30
Yaxun (Sam) Liu 69404d8e78 Fix hipcc for extra -mllvm option (#1885) 2020-02-26 15:53:43 +05:30
Sarbojit2019 c1a70707e0 [HIPIFY] Add back missing execute permission to hipify-perl (#1881)
The hipify-perl script lost its executable permission, so "samples/0_Intro/square" was failing. Fixes SWDEV 223433.
2020-02-19 13:48:20 +05:30
eshcherb 82ec3c1c5b adding hipExtModuleLaunchKernel to tracing layer (#1880) 2020-02-19 13:47:49 +05:30
Alex Voicu 9b4f39e1d8 Tweak synchronous memcpy implementation (#1809)
The existing one can have issues on certain systems; this change therefore limits use of direct memcpy via largeBAR to sizes where it is unequivocally better.

Also addresses SWDEV-220030 and SWDEV-222237.
2020-02-18 20:50:27 +05:30
Yaxun (Sam) Liu 92cc29ae2b Let HIP-Clang inline all functions by default (#1875)
This is a quick workaround to match HCC behavior for performance, since inlining usually
results in more optimization opportunities and therefore better performance.

We will fine-tune the inline threshold later.
2020-02-17 22:49:26 +05:30
Rahul Garg 8c5e5e435b Fix hipMemcpy3D (#1798)
Fixes #1790 and #1791. hipMemcpy3D still requires further refactoring for different input and output combinations.
2020-02-17 19:35:35 +05:30
Maneesh Gupta 854afef281 [dtests] Fix random timeout failures in hipModuleLoadDataMultThreaded (#1877)
Limit the max threads that are launched to 16.
2020-02-17 11:16:20 +05:30
vsytch 56b8b0d80e Add missing __hip_pinned_shadow__ attributes to the texture global vars. (#1866) 2020-02-15 09:52:25 +05:30
Maneesh Gupta e7120dd876 Use deque instead of vector for code readers so that the iterators and references will be stable (#1851)
* Use deque instead of vector for code readers so that the iterators and references will be stable

* Fix compile error

* Assign the iterator

* Add multithreaded test

* Make threads a multiple of hardware concurrency

* Output on failure

* Add setDevice to try and initialize the context on cuda

* Create context for cuda

* Set context on each thread

* Reduce threads on cuda

* Skip test on cuda

* Try to initialize the primary context on cuda

* Push ctx to the stack as current

* Revert "Push ctx to the stack as current"

This reverts commit bff8cbe950.

* Revert "Try to initialize the primary context on cuda"

This reverts commit fd98514113.

* updated test for nvidia path

* Add c++11 option for nvcc

Co-authored-by: satyanveshd <53337087+satyanveshd@users.noreply.github.com>
2020-02-15 09:51:24 +05:30
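The motivation for e7120dd876 rests on a standard-library guarantee: appending to a `std::deque` keeps references and pointers to existing elements valid (iterators are still invalidated by the insertion), whereas a `std::vector` may reallocate and move every element. A minimal sketch of that property follows; `CodeReader` here is a hypothetical stand-in, since the real type is not shown in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for a code reader object.
struct CodeReader {
    std::string name;
};

// Takes a reference to the first element, appends `extra` more readers, and
// reports whether that reference still designates the same live object.
bool firstReferenceSurvivesGrowth(std::size_t extra) {
    std::deque<CodeReader> readers;
    readers.push_back({"first"});
    CodeReader& first = readers.front();
    const CodeReader* before = &first;
    for (std::size_t i = 0; i < extra; ++i) {
        // push_back on a deque never relocates existing elements.
        readers.push_back({"reader" + std::to_string(i)});
    }
    return before == &readers.front() && first.name == "first";
}
```

Doing the same with `std::vector` would be undefined behavior once the vector reallocates, which is the hazard the commit avoids when several threads hold on to readers.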
Nick Curtis 797a929a65 Implement long / long long shuffles (#1829)
Implement additional data-types for shuffles (long and long long).
Based upon the double implementation.
2020-02-15 09:51:09 +05:30
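One common way to shuffle a 64-bit value on hardware with 32-bit lane shuffles is to split it into two halves, shuffle each, and recombine. The host-side sketch below models that idea for 797a929a65, assuming this is how the double implementation it is based on works; the commit message does not say so. `shuffle32` and `shuffle64` are hypothetical names, and a `std::vector` stands in for the lanes of a wavefront.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Host-side stand-in for a 32-bit lane shuffle: returns the value held by
// lane srcLane. On a GPU this would be a hardware intrinsic.
// `lanes` must not be empty.
std::uint32_t shuffle32(const std::vector<std::uint32_t>& lanes, std::size_t srcLane) {
    return lanes[srcLane % lanes.size()];
}

// 64-bit shuffle built from two 32-bit shuffles. `lanes` must not be empty.
std::int64_t shuffle64(const std::vector<std::int64_t>& lanes, std::size_t srcLane) {
    std::vector<std::uint32_t> lo(lanes.size()), hi(lanes.size());
    for (std::size_t i = 0; i < lanes.size(); ++i) {
        std::uint64_t bits;
        std::memcpy(&bits, &lanes[i], sizeof bits);  // reinterpret without UB
        lo[i] = static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
        hi[i] = static_cast<std::uint32_t>(bits >> 32);
    }
    // Shuffle each half independently, then stitch them back together.
    const std::uint64_t combined =
        (static_cast<std::uint64_t>(shuffle32(hi, srcLane)) << 32) |
        shuffle32(lo, srcLane);
    std::int64_t result;
    std::memcpy(&result, &combined, sizeof result);
    return result;
}
```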
Siu Chi Chan f2ab87d872 Disabling HCC code object v3 generation by default.
Some PyTorch unit tests have regressed. Disabling cov3 to allow more
time to debug and to unblock PyTorch.

Change-Id: Iba7f425ef3499c20c42ec45d9152b5d27ce97d03
2020-02-14 19:39:27 -05:00