Added new memory APIs: hipMemAllocPitch, hipMemAllocHost, hipMemsetD16, hipMemsetD16Async, hipMemsetD8Async.
Modified hipMemcpyParam2DAsync and hipMemcpyParam2D to support all scenarios.
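
Note on pitched layouts: a pitched allocation such as the one returned by hipMemAllocPitch pads each row to a pitch that may be larger than the requested width, and 2D copies address rows by that pitch. The host-only sketch below illustrates the arithmetic only. The helper names and the 256-byte alignment used in the usage note are hypothetical and are not taken from the HIP sources; the real pitch is chosen by the runtime.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round a row width (in bytes) up to the next
// multiple of `alignment`. The runtime picks the real pitch; this only
// shows why the pitch can exceed the requested width.
std::size_t pitched_row_bytes(std::size_t width_bytes, std::size_t alignment) {
    assert(alignment > 0);
    return (width_bytes + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: byte offset of element (row, col) in a pitched
// buffer. Rows are stepped by the pitch, not by the logical width.
std::size_t pitched_offset(std::size_t row, std::size_t col,
                           std::size_t pitch_bytes, std::size_t elem_size) {
    return row * pitch_bytes + col * elem_size;
}
```

For example, with an assumed 256-byte alignment, a 100-byte row occupies 256 bytes, and element (2, 3) of a 4-byte type sits at byte offset 2 * 256 + 3 * 4.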
* occupancy.cpp with Makefile
* occupancy sample changes according to the comments
* Changes according to the review comments
* Occupancy Sample Changes
* Changes according to review comments
* first cut of the header implementation of cooperative group feature
* add declarations for device library functions
* fixed various compile time issues in the CG headers
* enabled copy construction and copy assignment
* fixed a minor bug related to conditional compilation macro
* fixed a few more CG constructor issues and added a unit testcase
* fixed typo
* extended unit testcase
* compute size of partitioned CG from mask
* bit of code refactoring
* removed boilerplate code
* fixed a few of the review comments by Brian
* Changes to the signatures of a few grid and multi-grid related OCKL functions
* changes to declarations of OCKL functions related to CG feature
* removed all the block level support as it is not planned for 2.9
* Have taken care of review comments by Brian
* Have taken care of review comments by Brian
* removed unused functions which were initially intended for use in block level cg support
* [hip] add initial implementation for hipLaunchCooperativeKernel API
* [hip] use total number of work groups to initialize the GWS resource
* [hip] use only one argument for init_gws kernel
* [hip] use the device associated with the stream for checking the device properties
* add default visibility to most APIs in program_state
* remove unwanted C++ headers
* Add symbol visibility pragmas and compiler flags
* Add visibility attribute to APIs in channel_descriptor and hip_hcc
* remove unused headers
* simplify build flags with hcc
* add pragma visibility hidden to functional_grid_launch
* [CMake] add gfx908 back
* Add support for hipFuncGetAttribute
* Support NVCC path
* Test using sample module_api_global
* Try fixing CI build failure due to hip_prof_gen scan
* Fix for CI build issue
* Resolve conflict
* Rebase and resolve conflicts with master
* Fix build error
* Fix NVCC path build error
* Enabled gcc for hip host code
* Adding tests for hip code + (gcc & g++), without kernels
* Excluding nvcc platforms for gcc and g++ tests + Addressing review comments
* minor code clean-up
* Add rocm include path
* Added relative path for library
* Hiding non supported functions for gcc
* Incorporating review comments
* all thread local access now through single struct
* clean up old commented-out code, more use of GET_TLS()
* fewer calls to GET_TLS by passing tls as a function argument
* revert unnecessary change to printf
* fix failing tests due to TLS change
* fix merge conflicts in ihipOccupancyMaxActiveBlocksPerMultiprocessor
* Added support for the hipOccupancyMaxActiveBlocksPerMultiprocessor and hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags APIs
* Take SGPR usage into account when determining the max active blocks in hipOccupancyMaxActiveBlocksPerMultiprocessor()
* Add Max Texture 1D, 2D, 3D device properties
* Corrected testcase to use enums defined in hipDeviceAttribute_t
* Added texture 1D, 2D and 3D support for NVIDIA path
* UChar and UShort textures as Normalized Float
* UChar and UShort textures as Normalized Float for all float variants
* Handled uninitialized texture format value
[Reason] To be compatible with CUDA [#1133]
Update HIP code, hipify-clang, tests and docs
[TODO] Add support for the corresponding functions on the nvcc fallback path