* all thread local access now through single struct
* clean up old commented-out code, more use of GET_TLS()
* fewer calls to GET_TLS by passing tls as a function argument
* revert unnecessary change to printf
* fix failing tests due to TLS change
* fix merge conflicts in ihipOccupancyMaxActiveBlocksPerMultiprocessor
[ROCm/hip commit: 1eb3dbf065]
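
The pattern described in the commit above (one struct for all thread-local state, fetched once and then passed down as an argument) can be sketched as follows. This is a minimal illustration, not the actual HIP code: `TlsData`, its fields, `getTls`, `recordError`, and `setDevice` are hypothetical names, and only the `GET_TLS()` macro name comes from the commit message.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the per-thread state; field names are illustrative.
struct TlsData {
    int lastError = 0;        // last error code recorded on this thread
    int defaultDevice = 0;    // device currently selected on this thread
    std::string lastApiName;  // name of the most recent API call
};

// One thread_local object holds every piece of per-thread state.
inline TlsData* getTls() {
    static thread_local TlsData data;
    return &data;
}

// Mirrors the GET_TLS() idea: fetch the pointer once at the API entry point.
// (Assumed shape of the macro; the real definition may differ.)
#define GET_TLS() TlsData* tls = getTls()

// Helpers take the pointer as an argument instead of calling GET_TLS() again.
inline int recordError(TlsData* tls, int code) {
    tls->lastError = code;
    return code;
}

// Hypothetical API entry point: one TLS lookup, then plain pointer passing.
inline int setDevice(int device) {
    GET_TLS();
    tls->lastApiName = "setDevice";
    if (device < 0) return recordError(tls, 1);  // 1 = illustrative error code
    tls->defaultDevice = device;
    return recordError(tls, 0);
}
```

Calling `setDevice(3)` performs a single thread-local lookup; `recordError` reuses the same pointer, which is the saving the third bullet refers to.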
...while including the HIP main header file, which is now inserted after the #ifndef controlling macro, or after #pragma once if it occurs earlier.
+ Add a couple of unit tests.
ToDo: Check backward compatibility on older clang versions.
[ROCm/hip commit: 25075729f9]
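
The insertion rule from the commit above can be illustrated with a simplified, line-based sketch. It assumes that a header guard is an `#ifndef` line immediately followed by a `#define` line, and it scans plain text rather than using the preprocessor, so it is not how the tool itself works. `insertAfterGuard` and `startsWith` are hypothetical helper names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Returns true when `line`, ignoring leading whitespace, starts with `prefix`.
static bool startsWith(const std::string& line, const std::string& prefix) {
    std::size_t pos = line.find_first_not_of(" \t");
    return pos != std::string::npos &&
           line.compare(pos, prefix.size(), prefix) == 0;
}

// Hypothetical helper: returns `source` with `includeLine` placed after the
// first header guard or `#pragma once`, whichever appears first.
std::string insertAfterGuard(const std::string& source,
                             const std::string& includeLine) {
    std::vector<std::string> lines;
    std::istringstream in(source);
    for (std::string l; std::getline(in, l);) lines.push_back(l);

    std::size_t insertAt = 0;  // default: top of file when no guard is found
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (startsWith(lines[i], "#pragma once")) {
            insertAt = i + 1;
            break;
        }
        if (startsWith(lines[i], "#ifndef")) {
            // Assumption: it is a guard only if a #define follows directly.
            if (i + 1 < lines.size() && startsWith(lines[i + 1], "#define"))
                insertAt = i + 2;
            break;
        }
    }
    lines.insert(lines.begin() + insertAt, includeLine);

    std::string out;
    for (const std::string& l : lines) out += l + "\n";
    return out;
}
```

Passing `"#include <hip/hip_runtime.h>"` as `includeLine` places the include inside the guarded region instead of ahead of it.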
* Added query of hipDeviceAttributeHdpMemFlushCntl and hipDeviceAttributeHdpRegFlushCntl
* Added NVCC blocker for the hip*FlushCntl test cases
[ROCm/hip commit: e7447d5809]
* Added support of hipOccupancyMaxActiveBlocksPerMultiprocessor & hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags APIs
* Taking SGPR usage into account to determine the max active blocks in hipOccupancyMaxActiveBlocksPerMultiprocessor()
[ROCm/hip commit: 4b18b321f7]
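
To show why SGPR usage matters for the occupancy result in the commit above, here is a rough sketch of a register-limited occupancy estimate. It is not the formula used by hipOccupancyMaxActiveBlocksPerMultiprocessor: the `CuLimits` and `KernelUsage` structs, their fields, and every numeric limit passed in are hypothetical, and other limits such as LDS usage are ignored.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-compute-unit limits; real values depend on the GPU.
struct CuLimits {
    int simdsPerCu;       // SIMD units in one compute unit
    int maxWavesPerSimd;  // hardware cap on resident wavefronts per SIMD
    int vgprsPerSimd;     // vector register budget per SIMD (illustrative units)
    int sgprsPerSimd;     // scalar register budget per SIMD (illustrative units)
    int waveSize;         // threads per wavefront
};

// Hypothetical per-kernel register usage.
struct KernelUsage {
    int vgprsPerWave;
    int sgprsPerWave;
};

// Estimates how many blocks of `blockSize` threads fit on one compute unit.
int maxActiveBlocksPerCu(const CuLimits& cu, const KernelUsage& k, int blockSize) {
    if (blockSize <= 0) return 0;
    int waves = cu.maxWavesPerSimd;
    if (k.vgprsPerWave > 0)
        waves = std::min(waves, cu.vgprsPerSimd / k.vgprsPerWave);
    // The SGPR budget can be the tighter bound, so it is checked as well.
    if (k.sgprsPerWave > 0)
        waves = std::min(waves, cu.sgprsPerSimd / k.sgprsPerWave);
    // Round up: a partially filled wavefront still occupies a full slot.
    int wavesPerBlock = (blockSize + cu.waveSize - 1) / cu.waveSize;
    return (waves * cu.simdsPerCu) / wavesPerBlock;
}
```

With the made-up limits `{4, 10, 256, 800, 64}`, a kernel using 32 VGPRs and 200 SGPRs per wave and a block size of 256 is limited to 4 blocks by the SGPR budget, while the VGPR budget alone would allow 8.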
+ CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
+ Fix typos, add missing references.
[ROCm/hip commit: 77e9ade9bc]
* Add hip init in hipExtLaunchMultiKernelMultiDevice
* Add more logstatus for multiple return paths
* Fix missing i in function name
[ROCm/hip commit: b9e6d72ee6]
* Add HSA_PATH to hip_Includes in cmake and hipconfig
* Make HSA_PATH a CACHE path, check for the HSA include path
* Removed new lines at EOF
[ROCm/hip commit: 50597e2085]
+ CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
+ Fix typos.
[ROCm/hip commit: c48fca494a]
+ CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
+ Fix typos
[ROCm/hip commit: 6f6aa13448]
CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
[ROCm/hip commit: 697c7d87d3]
CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
[ROCm/hip commit: 667defc65d]
CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
[ROCm/hip commit: 75c0dc9d8f]
* Fix hipMemcpy-size test running out of Host Mem
The hipMemcpy-size test uses a maxElem calculated from the total GPU mem / 8, then allocates 4 times that amount of host memory. The test begins failing when there is not enough host memory, such as on systems with 32GB GPU mem and 16GB RAM. This fixes the test when not enough host memory is available on the system.
* Add windows support to hipMemcpy-size fix
* avoid linking extra libs for windows
* hipMemcpy-size: remove freeCPU including swap
[ROCm/hip commit: 0de4caa085]
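
The sizing logic described in the commit above can be sketched as a simple clamp. This is an assumption about the shape of the fix, not the test's actual code: `clampedMaxBytes` is a hypothetical helper, sizes are treated as bytes for simplicity, and the caller is assumed to supply the free host memory (excluding swap) from a platform-specific query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: picks a buffer size such that the four host-side
// buffers of that size still fit into the available host memory.
std::uint64_t clampedMaxBytes(std::uint64_t totalGpuBytes,
                              std::uint64_t freeHostBytes) {
    std::uint64_t fromGpu = totalGpuBytes / 8;   // size derived from GPU memory
    std::uint64_t fromHost = freeHostBytes / 4;  // host needs 4x that amount
    return std::min(fromGpu, fromHost);
}
```

For a 32 GiB GPU the unclamped size is 4 GiB, which would need 16 GiB of host memory; with, say, 12 GiB free on the host, the clamp reduces the size to 3 GiB.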
+ Tested on Windows and Linux;
+ Provide patch for clang's bug 38811;
+ Update Readme.md accordingly.
P.S.
With the next 9.0.0 release, patches for Windows won't be needed, because all fixes will be included there.
[ROCm/hip commit: 91e461fcf2]