* occupancy.cpp with Makefile
* occupancy sample changes according to the comments
* Changes according to the review comments
* Occupancy Sample Changes
* Changes according to review comments
* first cut of the header implementation of cooperative group feature
* add declarations for device library functions
* fixed various compile time issues in the CG headers
* enabled copy construction and copy assignment
* fixed a minor bug related to conditional compilation macro
* fixed a few more CG constructor issues and added a unit testcase
* fixed typo
* extended unit testcase
* compute size of partitioned CG from mask
* bit of code refactoring
* removed boilerplate code
* fixed a few of the review comments by Brian
* Changes to the signatures of a few grid and multi-grid related OCKL functions
* changes to declarations of OCKL functions related to CG feature
* removed all the block level support as it is not planned for 2.9
* Have taken care of review comments by Brian
* Have taken care of review comments by Brian
* removed unused functions which were initially intended to use in block level cg support
* [hip] add initial implementation for hipLaunchCooperativeKernel API
* [hip] use total number of work groups to initialize the GWS resource
* [hip] use only one argument for init_gws kernel
* [hip] use the device associated with the stream for checking the device properties
* add default visibility to most APIs in program_state
* remove unwanted C++ headers
* Add symbol visibility pragmas and compiler flags
* Add visibility attribute to APIs in channel_descriptor and hip_hcc
* remove unused headers
* simplify build flags with hcc
* add pragma visibility hidden to functional_grid_launch
* [CMake] add gfx908 back
* Add support for hipFuncGetAttribute
* Support NVCC path
* Test using sample module_api_global
* Try fixing CI build failure due to hip_prof_gen scan
* Fix for CI build issue
* Resolve conflict
* Rebase and resolve conflicts with master
* Fix build error
* Fix NVCC path build error
* Enabled gcc for hip host code
* Adding tests for hip code + (gcc & g++), without kernels
* Excluding nvcc platforms for gcc and g++ tests + Addressing review comments
* minor code clean-up
* Add rocm include path
* Added relative path for library
* Hiding non supported functions for gcc
* Incorporating review comments
* all thread local access now through single struct
* clean up old commented-out code, more use of GET_TLS()
* fewer calls to GET_TLS by passing tls as a function argument
* revert unnecessary change to printf
* fix failing tests due to TLS change
* fix merge conflicts in ihipOccupancyMaxActiveBlocksPerMultiprocessor
* Added support for the hipOccupancyMaxActiveBlocksPerMultiprocessor & hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags APIs
* Take SGPR usage into account to determine the max active blocks in hipOccupancyMaxActiveBlocksPerMultiprocessor()
* Add Max Texture 1D,2D,3D device properties
* Corrected testcase to use enums defined in hipDeviceAttribute_t
* Added texture 1D,2D and 3D support for NVIDIA path
* UChar and UShort textures as Normalized Float
* UChar and UShort textures as Normalized Float for all float variants
* Handled uninitialized texture format value
[Reason] To be compatible with CUDA [#1133]
Update HIP code, hipify-clang, tests and docs
[TODO] Add support of the corresponding functions on nvcc fallback path
Typo introduced here:
commit 67abac1365
Author: Alex Voicu <alexandru.voicu@amd.com>
Date: Mon Jun 24 20:02:09 2019 -0500
Put 3-wide vector types on a ketogenic diet. (#1180)
* Remove flags parameter from hipOccupancyMaxPotentialBlockSize
This commit makes the hipOccupancyMaxPotentialBlockSize method consistent with hcc path and the CUDA API.
* Put 3-wide vector types on a ketogenic diet.
* Remove needless include.
* Do not be narrow-minded.
* Do not be narrow-minded.
* Put the C people on a diet too.
* [hip] implement the hipExtLaunchMultiKernelMultiDevice API
* add a guard to check the HCC version for the acquire_locked_hsa_queue() API, which was introduced in HCC for ROCm 2.5
* modified code based on the requested changes
* changes to lock all streams before launching kernels for each device and unlock them after the dispatches
* check each stream to be valid before starting to lock all the streams
* Implement the hipOccupancyMaxPotentialBlockSize function
* Replaced hipGetDeviceProperties() call by ihipGetDeviceProperties() in ihipOccupancyMaxPotentialBlockSize()
* Add test for hipOccupancyMaxPotentialBlockSize in Module API
* Added extern declaration for ihipGetDeviceProperties() to be accessed inside ihipOccupancyMaxPotentialBlockSize()
* fixed hipOccupancyMaxPotentialBlockSize test build issue
* Fix hipOccupancyMaxPotentialBlockSize dtest
* Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest
* Revert "Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest"
This reverts commit 0480ff56f1441fc515d2c26ce33783e303423938.
* Disable hipOccupancyMaxPotentialBlockSize dtest on NVCC
* move extern declaration of ihipGetDeviceProperties to hip_module.cpp
* Update the limitation of 32 wavefronts per CU and 800/512 SGPRs for VI/pre-VI chips to calculate the occupancy