* [HIP] Introduce library_types.h as a common header for libs
[Reason]
Currently, hipFFT, hipBLAS and other HIP libs use their own data types, prefixed with HIPFFT or HIPBLAS, whereas in CUDA those types are common and declared in library_types.h
[TODO]
Switch hipFFT, hipBLAS and other HIP libs to use common library_types.h.
* [HIP] Move include for library_types.h to hip_runtime.h
[Reason]
Repeat CUDA's behaviour, where library_types.h is included in cuda_runtime.h
This fixes the usage of an uninitialized cdattr variable in hipDeviceGetAttribute for the CUDA backend when taking the switch default, as detailed in #1317.
Note that the directed_tests/runtimeApi/device/hipGetDeviceAttribute.tst test fails for me, but it already did before applying this patch. Let's see what CI says!
Added new memory API's hipMemAllocPitch, hipMemAllocHost, hipMemsetD16, hipMemsetD16Async, hipMemsetD8Async
Modified to support all scenarios hipMemcpyParam2DAsync, hipMemcpyParam2D.
* occupancy.cpp with Makefile
* occupancy sample changes according tothe comments
* Changes according to the review comments
* Occupancy Sample Changes
* Changes according to review comments
* first cut of the header implementation of cooperative group feature
* add diclarations for device library functions
* fixed various compile time issues in the CG headers
* enabled copy construction and copy assignment
* fixed a minor bug related to conditional compilation macro
* fixed few more CG constructor issues and added a unit testcase
* fixed typo
* extended unit testcase
* compute size of partitioned CG from mask
* bit of code refactoring
* removed boilerplate code
* fixed few of the review comments by Brian
* Changes to the sigantures of few grid and multi-grid related OCKL functions
* changes to declarations of OCKL functions related to CG feature
* removed all the block level support as it is not planned for 2.9
* Have taken care of review comments by Brian
* Have taken care of review comments by Brian
* removed unused functions which were initially intended to use in block level cg support
* [hip] add initial implementation for hipLaunchCooperativeKernel API
* [hip] use total number of work groups to initialize the GWS resource
* [hip] use only one argument for init_gws kernel
* [hip] use the device associated with the stream for checking the device properties
* add default visibility to most APIs in program_state
* remove unwanted C++ headers
* Add symbol visibility pragmas and compiler flags
* Add visibility attribute to APIs in channel_descriptor and hip_hcc
* remove unused headers
* simplify build flags with hcc
* add pragma visibility hidden to functional_grid_launch
* [CMake] add gfx908 back
* Add support for hipFunGetAttribute
* Support NVCC path
* Test using sample module_api_global
* Try fixing CI build failure due to hip_prof_gen scan
* Fix for CI build issue
* Resolve conflict
* Rebase and resolve conflicts with master
* Fix build error
* Fix NVCC path build error
* Enabled gcc for hip host code
* Adding tests for hip code + (gcc & g++), without kernels
* Excluding nvcc platforms for gcc and g++ tests + Addressing review comments
* minor code clean-up
* Add rocm include path
* Added relative path for library
* Hiding non supported functions for gcc
* Incorporating review comments
* all thread local access now through single struct
* clean up old commented-out code, more use of GET_TLS()
* fewer calls to GET_TLS by passing tls as a funtion argument
* revert unnecessary change to printf
* fix failing tests due to TLS change
* fix merge conflicts in ihipOccupancyMaxActiveBlocksPerMultiprocessor