* [hit] Workaround for %cc and %cxx mappings.
HIP CMakeLists.txt modifies CMAKE_C_COMPILER and CMAKE_CXX_COMPILER.
This messes up any dtests that want to test against cc/c++.
So hardcode %cc to /usr/bin/cc and %cxx to /usr/bin/c++ for now till
we come up with a better solution.
Change-Id: I7dce93ce8360191e612a94e3a735e5612ac27ab5
* [hit] Add auto-variable %hip-path to syntax for BUILD_CMD
Change-Id: Id097a183fbce2b2c9691d0180d3304dd17a4e016
[ROCm/clr commit: af9aae6b4e]
* [HIP][tests] New testcases for module api
* [HIP][Tests]Support for CUDA devices
* Updated tests as per latest master & test GetGlobal to work on all platforms
[ROCm/clr commit: f566bec546]
* Add Max Texture 1D,2D,3D device properties
* Corrected testcase to use enums defined in hipDeviceAttribute_t
* Added texture 1D,2D and 3D support for NVIDIA path
[ROCm/clr commit: 00aa42e05f]
* UChar and UShort textures as Normalized Float
* UChar and UShort textures as Normalized Float for all float variants
* Handled uninitilaized texture format value
[ROCm/clr commit: 849b5ef6af]
+ Source files are the first to go. It is needed for in-place hipification in order to avoid errors with included but already hipified header files.
+ More extensions support for batch processing.
[ROCm/clr commit: ba87bcba5c]
[Reason] To be compatible with CUDA [#1133]
Update HIP code, hipify-clang, tests and docs
[TODO] Add support of the corresponding functions on nvcc fallback path
[ROCm/clr commit: f0832fd968]
+ Add option -print-stats-csv to dump statistics to CSV file
+ If -o-dir is specified, CSV file will be dumped there
+ Generate 1 summary file sum_stat.csv in case of multiple sources
[ROCm/clr commit: 60a9143d6d]
* Remove flags parameter from hipOccupancyMaxPotentialBlockSize
This commit makes the hipOccupancyMaxPotentialBlockSize method consistent with hcc path and the CUDA API.
[ROCm/clr commit: a401997b8e]
module_api_global relies on a HCC only feature which allows host code
to write to device variables. This feature does not exist in CUDA
or hip-clang, which causes the sample not working in CUDA or hip-clang.
This patch fixes the sample by using standard features of CUDA and
hip-clang. The fixed sample works in HCC, CUDA and hip-clang.
[ROCm/clr commit: 3541d18528]
module_api_global relies on a HCC only feature which allows host code
to write to device variables. This feature does not exist in CUDA
or hip-clang, which causes the sample not working in CUDA or hip-clang.
This patch fixes the sample by using standard features of CUDA and
hip-clang. The fixed sample works in HCC, CUDA and hip-clang.
[ROCm/clr commit: 98648828c0]
This fixes a bug where GCC++ on Ubuntu 18.04 creates failing executables compared to GCC++ on 16.04 and clang++. While creating function names on Ubuntu 18.04, dl_phdr_info seems to provide a non-zero value for dlpi_addr on initial iteration, and an empty string in dlpi_name. This is causing failure when linking with g++, since the empty string prevents the kernel function from being loaded. Clang++ and GCC on UB16 provide a zero value for dlpi_addr. To fix this, we need to verify both addr and name exists, so that /proc/self/exe can be properly loaded.
[ROCm/clr commit: f87b900f96]
* Put 3-wide vector types on a ketogenic diet.
* Remove needless include.
* Do not be narrow-minded.
* Do not be narrow-minded.
* Put the C people on a diet too.
[ROCm/clr commit: 87eac86298]
* [hip] implement the hipExtLaunchMultiKernelMultiDevice API
* add a guard to check the HCC version for acquire_locked_hsa_queue() API which was introdued in HCC for ROCm 2.5
* modified code based on the requested changes
* changes to lock all streams before launching kernels for each device and unlock them after the dispatches
* check each stream to be valid before starting to lock all the streams
[ROCm/clr commit: d6ad690cb6]