* all thread local access now through single struct
* clean up old commented-out code, more use of GET_TLS()
* fewer calls to GET_TLS by passing tls as a function argument
* revert unnecessary change to printf
* fix failing tests due to TLS change
* fix merge conflicts in ihipOccupancyMaxActiveBlocksPerMultiprocessor
[ROCm/clr commit: f337ae1edb]
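
The pattern described in the entry above (one struct for all thread-local state, fetched once and then passed down as an argument) can be sketched roughly as follows. This is only an illustration: `TlsData`, its fields, `tls_get`, `set_last_error` and `api_entry` are hypothetical names, and the `GET_TLS()` macro shown here is an assumed shape, not the runtime's actual definition.

```cpp
#include <string>

// Hypothetical per-thread state, grouped in one struct so that every
// thread-local access goes through a single object.
struct TlsData {
    int last_error = 0;
    int default_device = 0;
    std::string api_name;
};

// One thread_local instance per thread, handed out by pointer.
inline TlsData* tls_get() {
    thread_local TlsData data;
    return &data;
}

// Assumed stand-in for the GET_TLS() macro named in the commit message.
#define GET_TLS() TlsData* tls = tls_get()

// Helpers receive `tls` as an argument instead of looking it up again.
inline int set_last_error(TlsData* tls, int err) {
    tls->last_error = err;
    return err;
}

// Entry point: a single GET_TLS() call, then the pointer is passed down.
inline int api_entry(int device) {
    GET_TLS();
    tls->api_name = "api_entry";
    if (device < 0) return set_last_error(tls, 1);
    tls->default_device = device;
    return set_last_error(tls, 0);
}
```

Passing the pointer avoids repeating the thread-local lookup in every helper, which is the saving the third bullet refers to.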
...while including HIP main header file, which is now inserted after the #ifndef controlling macro, or after #pragma once if that occurs earlier.
+ Add a couple of unit tests.
ToDo: Check backward compatibility on older clang versions.
[ROCm/clr commit: fedef02c37]
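
A rough sketch of the insertion rule described in the entry above, written as a plain string transformation. It is not the tool's implementation: `insert_hip_include` and `starts_with_directive` are hypothetical helpers, the include line `#include <hip/hip_runtime.h>` is assumed to be the main header, and the sketch ignores comments and any code that precedes the guard.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// True if `line`, ignoring leading blanks, starts with `prefix`.
static bool starts_with_directive(const std::string& line, const std::string& prefix) {
    std::size_t pos = line.find_first_not_of(" \t");
    return pos != std::string::npos && line.compare(pos, prefix.size(), prefix) == 0;
}

// Inserts the HIP include after `#pragma once` or after an
// `#ifndef`/`#define` guard pair, whichever appears first;
// falls back to the top of the file when neither is found.
std::string insert_hip_include(const std::string& source) {
    const std::string include_line = "#include <hip/hip_runtime.h>";
    std::vector<std::string> lines;
    std::istringstream in(source);
    for (std::string line; std::getline(in, line);) lines.push_back(line);

    std::size_t insert_at = 0;  // default: top of file
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (starts_with_directive(lines[i], "#pragma once")) {
            insert_at = i + 1;
            break;
        }
        if (starts_with_directive(lines[i], "#ifndef")) {
            bool has_define = i + 1 < lines.size() &&
                              starts_with_directive(lines[i + 1], "#define");
            insert_at = has_define ? i + 2 : i + 1;
            break;
        }
    }
    lines.insert(lines.begin() + static_cast<std::ptrdiff_t>(insert_at), include_line);

    std::string out;
    for (const std::string& line : lines) out += line + "\n";
    return out;
}
```

Placing the include inside the guard keeps it from being processed again when the header is included more than once.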
* Added query of hipDeviceAttributeHdpMemFlushCntl and hipDeviceAttributeHdpRegFlushCntl
* Added NVCC blocker for the hip*FlushCntl test cases
[ROCm/clr commit: abe6776677]
* Added support of hipOccupancyMaxActiveBlocksPerMultiprocessor & hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags APIs
* Take SGPR usage into account when determining the max active blocks in hipOccupancyMaxActiveBlocksPerMultiprocessor()
[ROCm/clr commit: 7b9801fe9a]
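
To illustrate why SGPR usage matters for the occupancy result in the entry above, here is a simplified sketch in which the number of resident wavefronts is capped by whichever register file runs out first. All names (`SimdLimits`, `KernelUsage`, `max_waves_per_simd`, `max_active_blocks`) and any limits passed in are hypothetical; real hardware limits, allocation granularity and LDS usage are deliberately left out, so this is not the formula the runtime uses.

```cpp
#include <algorithm>

// Hypothetical per-SIMD resource limits; real values depend on the GPU.
struct SimdLimits {
    int max_waves;    // hardware cap on resident wavefronts per SIMD
    int total_vgprs;  // vector registers available per SIMD
    int total_sgprs;  // scalar registers available per SIMD
};

// Register usage of one wavefront of the kernel.
struct KernelUsage {
    int vgprs_per_wave;
    int sgprs_per_wave;
};

// Wavefronts per SIMD: the tightest of the hardware cap, the VGPR
// budget and the SGPR budget.
int max_waves_per_simd(const SimdLimits& hw, const KernelUsage& k) {
    int waves = hw.max_waves;
    if (k.vgprs_per_wave > 0) waves = std::min(waves, hw.total_vgprs / k.vgprs_per_wave);
    if (k.sgprs_per_wave > 0) waves = std::min(waves, hw.total_sgprs / k.sgprs_per_wave);
    return waves;
}

// Blocks per compute unit, given how many wavefronts one block needs.
int max_active_blocks(const SimdLimits& hw, const KernelUsage& k,
                      int simds_per_cu, int waves_per_block) {
    if (waves_per_block <= 0) return 0;
    return max_waves_per_simd(hw, k) * simds_per_cu / waves_per_block;
}
```

With made-up limits of 10 waves, 256 VGPRs and 800 SGPRs, a kernel using 32 VGPRs and 200 SGPRs per wave is limited to 4 waves per SIMD by SGPRs, where the VGPR budget alone would have allowed 8.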
+ CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
+ Fix typos, add missing references.
[ROCm/clr commit: b149219167]
* Add hip init in hipExtLaunchMultiKernelMultiDevice
* Add more logstatus for multiple return paths
* Fix missing i in function name
[ROCm/clr commit: 8df47255c5]
* Add HSA_PATH to hip_Includes in cmake and hipconfig
* HSA_PATH to CACHE path, checks for HSA include path
* Removed new lines at EOF
[ROCm/clr commit: 53b5c917cc]
+ CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
+ Fix typos.
[ROCm/clr commit: 98ce4725fd]
+ CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
+ Fix typos
[ROCm/clr commit: e145850f26]
CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
[ROCm/clr commit: 9547bd5ddb]
CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
[ROCm/clr commit: e61a9d60f0]
CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
[ROCm/clr commit: d1a0ac6990]
* Fix hipMemcpy-size test running out of Host Mem
The hipMemcpy-size test uses a maxElem calculated from the total GPU mem / 8. It then allocates 4 times that amount of host memory. The test begins failing when there is not enough host memory, such as on systems with 32GB GPU mem and 16GB RAM. This fixes the test when not enough host memory is available on the system.
* Add windows support to hipMemcpy-size fix
* avoid linking extra libs for windows
* HIPMemcpy-size Remove freeCPU including swap
[ROCm/clr commit: c56876cc19]
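
The sizing problem described in the entry above can be sketched as a small clamp. This is an assumption about the shape of the fix, not the test's actual code: `clamp_max_elems` is a hypothetical helper, the caller is assumed to supply the free host memory (excluding swap), and the factor of 4 and the division by 8 are taken from the commit text.

```cpp
#include <algorithm>
#include <cstdint>

// Returns the element count to test with: total GPU memory / 8, as in
// the commit text, reduced if four host buffers of that size would not
// fit into the free host memory reported by the caller.
std::uint64_t clamp_max_elems(std::uint64_t gpu_total_bytes,
                              std::uint64_t host_free_bytes,
                              std::uint64_t elem_size) {
    const std::uint64_t kHostCopies = 4;  // host allocations made by the test
    if (elem_size == 0) return 0;
    std::uint64_t max_elems = gpu_total_bytes / 8;
    std::uint64_t host_limit = host_free_bytes / (kHostCopies * elem_size);
    return std::min(max_elems, host_limit);
}
```

For a 32GB GPU and one-byte elements, the unclamped value needs 16GB of host memory, which is why a machine with 16GB RAM cannot satisfy it once the rest of the system is accounted for.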
+ Tested on Windows and Linux;
+ Provide patch for clang's bug 38811;
+ Update Readme.md accordingly.
P.S.
With the next 9.0.0 release, patches for Windows won't be needed, because all fixes will be there.
[ROCm/clr commit: deb4325372]
+ Fixes the following assert in debug version:
Assertion failed: (S.empty() || S[0] != '-') && "Option can't start with '-", file C:\GIT\LLVM\trunk-for-submits\llvm-project\llvm\lib\Support\CommandLine.cpp, line 440
+ DashDash option left declared in order to be listed in help.
[ROCm/clr commit: 14aad50e07]
* [hit] Workaround for %cc and %cxx mappings.
HIP CMakeLists.txt modifies CMAKE_C_COMPILER and CMAKE_CXX_COMPILER.
This messes up any dtests that want to test against cc/c++.
So hardcode %cc to /usr/bin/cc and %cxx to /usr/bin/c++ for now till
we come up with a better solution.
Change-Id: I7dce93ce8360191e612a94e3a735e5612ac27ab5
* [hit] Add auto-variable %hip-path to syntax for BUILD_CMD
Change-Id: Id097a183fbce2b2c9691d0180d3304dd17a4e016
[ROCm/clr commit: af9aae6b4e]
* [HIP][tests] New testcases for module api
* [HIP][Tests] Support for CUDA devices
* Updated tests as per latest master & test GetGlobal to work on all platforms
[ROCm/clr commit: f566bec546]