This is a quick workaround to match HCC behavior for performance, since inlining
usually exposes more optimization opportunities and therefore gives better performance.
We will fine-tune the inline threshold later.
Some PyTorch unit tests have regressed. Disable cov3 to allow more
time to debug and to unblock PyTorch.
Change-Id: Iba7f425ef3499c20c42ec45d9152b5d27ce97d03
- The known-target check should skip `gfx000` as well, since it is not
  used when forming the real compilation command. This avoids generating
  an annoying warning for `gfx000`.
* Fix hipcc warning related to hipVersion
* Rename hipVersion.h to hip_version.h
* Remove HIP_VERSION splitting
* Update .gitignore
- Ignore generated include/hip/hip_version.h
- Removed some stale entries
- Added executables from samples/1_Utils/*/ for consistency with bin/ entries.
HIP_VERSION_MAJOR, HIP_VERSION_MINOR, HIP_VERSION_PATCH and HIP_VERSION pre-processor macros are now defined in hipVersion.h instead of being set by hipcc.
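With the version macros coming from a header rather than from hipcc, code can gate on them at compile time. A minimal sketch follows, assuming only the macro names given above. The fallback values (1.2.3) are placeholders so the snippet compiles without a HIP installation, and `hip_at_least` and `self_check` are hypothetical helpers, not part of HIP.

```cpp
#include <cassert>

// Placeholder fallback values (assumption): in a real build these macros
// come from the generated HIP version header, not from this file.
#ifndef HIP_VERSION_MAJOR
#define HIP_VERSION_MAJOR 1
#define HIP_VERSION_MINOR 2
#define HIP_VERSION_PATCH 3
#endif

// Hypothetical helper: true if the HIP version seen at compile time
// is at least major.minor.
constexpr bool hip_at_least(int major, int minor) {
    return HIP_VERSION_MAJOR > major ||
           (HIP_VERSION_MAJOR == major && HIP_VERSION_MINOR >= minor);
}

// Compile-time gate: the build fails early if the version is too old.
static_assert(hip_at_least(1, 0), "HIP 1.0 or newer is assumed by this sketch");

// Runtime spot check, callable from any test driver.
inline void self_check() {
    assert(hip_at_least(HIP_VERSION_MAJOR, HIP_VERSION_MINOR));
}
```

The same comparison can be written directly in `#if` conditions when a code path must be excluded by the preprocessor rather than checked by the compiler.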
+ Both Driver API and RT API are supported and synced with each other
+ Update *.md docs and hipify-perl accordingly
+ Add new conversion type "virtual_memory", introduced in Driver API
+ Update *.md docs and hipify-perl accordingly
[Reason]
Starting with CUDA 10.1 all error codes are merged between Driver and RT APIs
[ToDo]
Do the same merge in the HIP API, as there is no need to distinguish return codes by API
+ Add one matcher (more will follow)
+ Update Maps and Statistics
+ Add cub_01.cu unit test
+ Update lit harness to support standalone CUB
+ Update README.md
+ Update hipify-perl (only CUB header is supported for now)
[IMPORTANT]
clang (and hipify-clang) works correctly only with the official NVLabs version of CUB on GitHub.
Compiling CUB from the official CUDA release conflicts with THRUST.
Thus, to compile CUB sources, the "-I" option should point to the CUB clone from NVLabs on GitHub.
Added new memory APIs: hipMemAllocPitch, hipMemAllocHost, hipMemsetD16, hipMemsetD16Async, hipMemsetD8Async.
Modified hipMemcpyParam2DAsync and hipMemcpyParam2D to support all scenarios.
+ The hipify-perl script is now generated entirely by hipify-clang via the -perl option
+ hipify-perl still has correctness gaps compared to hipify-clang; they will be eliminated as much as possible later
[REASON]
1. hip-clang accepts the templated kernel launch, so the brackets are unnecessary: HIP_KERNEL_NAME(...) __VA_ARGS__
2. HCC does not, thus: HIP_KERNEL_NAME(...) (__VA_ARGS__)
[TODO] Remove kernel name wrapping entirely once HCC is finally obsolete.
+ Update perl generation, hipify-perl, and affected tests accordingly.
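The two HIP_KERNEL_NAME forms listed under [REASON] can be illustrated in plain C++, without a HIP toolchain. This is a sketch only: `KERNEL_NAME_PLAIN`, `KERNEL_NAME_PAREN`, `LAUNCH`, and `scaled` are hypothetical stand-ins, and an ordinary function call takes the place of a real kernel launch. It shows why a wrapper is needed at all (a template name with a comma would otherwise be split into two macro arguments) and how the two expansions differ.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two wrapper forms in the commit message.
#define KERNEL_NAME_PLAIN(...) __VA_ARGS__    // expands to: scaled<int, 3>
#define KERNEL_NAME_PAREN(...) (__VA_ARGS__)  // expands to: (scaled<int, 3>)

// Hypothetical launch macro: the callable is its first argument.
// An ordinary call stands in for a kernel launch here.
#define LAUNCH(kernel, arg) kernel(arg)

// A templated function whose name contains a comma once instantiated.
template <typename T, int N>
constexpr int scaled(T x) { return static_cast<int>(x) * N; }

// LAUNCH(scaled<int, 3>, 7) would not compile: the preprocessor would see
// three arguments ("scaled<int", "3>", "7"). Wrapping the name keeps the
// comma inside one pair of parentheses, so it stays a single argument.
constexpr int launch_plain() {
    return LAUNCH(KERNEL_NAME_PLAIN(scaled<int, 3>), 7);
}

constexpr int launch_paren() {
    return LAUNCH(KERNEL_NAME_PAREN(scaled<int, 3>), 7);
}

static_assert(launch_plain() == launch_paren(), "both forms call the same function");
```

In this plain-C++ stand-in both forms compile and behave identically; the difference described in the commit message concerns how each compiler handles the expanded name in an actual kernel launch.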