* all thread local access now through single struct
* clean up old commented-out code, more use of GET_TLS()
* fewer calls to GET_TLS by passing tls as a funtion argument
* revert unnecessary change to printf
* fix failing tests due to TLS change
* fix merge conflicts in ihipOccupancyMaxActiveBlocksPerMultiprocessor
[ROCm/clr commit: f337ae1edb]
...while including HIP main header file, which is inserted now after #indef controlling macro, or after #pragma once, if it's occurred earlier.
+ Add a couple of unit tests.
ToDo: Check backward compatibility on older clang versions.
[ROCm/clr commit: fedef02c37]
* Added query of hipDeviceAttributeHdpMemFlushCntl and hipDeviceAttributeHdpRegFlushCntl
* Added NVCC blocker for the hip*FlushCntl test cases
[ROCm/clr commit: abe6776677]
* Added support of hipOccupancyMaxActiveBlocksPerMultiprocessor & hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags APIs
* Taking into account of SGPR usage to determine the max active blocks in hipOccupancyMaxActiveBlocksPerMultiprocessor()
[ROCm/clr commit: 7b9801fe9a]
+ CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
+ Fix typos, add missing references.
[ROCm/clr commit: b149219167]
* Add hip init in hipExtLaunchMultiKernelMultiDevice
* Add more logstatus for multiple return paths
* Fix missing i in function name
[ROCm/clr commit: 8df47255c5]
* Add HSA_PATH to hip_Includes in cmake and hipconfig
* HSA_PATH to CACHE path,checks for HSA include path
* Removed new lines at EOF
[ROCm/clr commit: 53b5c917cc]
+ CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
+ Fix typos.
[ROCm/clr commit: 98ce4725fd]
+ CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
+ Fix typos
[ROCm/clr commit: e145850f26]
CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
[ROCm/clr commit: 9547bd5ddb]
CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
[ROCm/clr commit: e61a9d60f0]
CUDA version - version in which API has appeared and (optional) last version before abandoning it; no value in case of earlier versions < 7.5.
[ROCm/clr commit: d1a0ac6990]
* Fix hipMemcpy-size test running out of Host Mem
The hipMemcpy-size uses a maxElem calculated from the total GPU mem /8. Then it will allocate 4 times that amount of host memory. This tests begins failing when there is not enough host memory, such as on systems with 32GB GPU mem, and 16GB RAM. This fixes the test if not enough host memory is available on the system.
* Add windows support to hipMemcpy-size fix
* avoid linking extra libs for windows
* HIPMemcpy-size Remove freeCPU including swap
[ROCm/clr commit: c56876cc19]
+ Tested on Windows and Linux;
+ Provide patch for clang's bug 38811;
+ Update Readme.md accordingly.
P.S.
With the next 9.0.0 release patches for Windows won't be needed, cause all fixes will be there.
[ROCm/clr commit: deb4325372]