* Remove flags parameter from hipOccupancyMaxPotentialBlockSize
This commit makes the hipOccupancyMaxPotentialBlockSize method consistent with the HCC path and the CUDA API.
[ROCm/hip commit: 2a1b0ba27d]
module_api_global relies on an HCC-only feature that allows host code
to write to device variables. This feature does not exist in CUDA
or hip-clang, so the sample does not work with either of them.
This patch fixes the sample by using standard features of CUDA and
hip-clang. The fixed sample works with HCC, CUDA, and hip-clang.
[ROCm/hip commit: 502a734ebf]
module_api_global relies on an HCC-only feature that allows host code
to write to device variables. This feature does not exist in CUDA
or hip-clang, so the sample does not work with either of them.
This patch fixes the sample by using standard features of CUDA and
hip-clang. The fixed sample works with HCC, CUDA, and hip-clang.
[ROCm/hip commit: 60e1733afe]
This fixes a bug where executables built with g++ on Ubuntu 18.04 fail, while those built with g++ on Ubuntu 16.04 or with clang++ work. While function names are being created on Ubuntu 18.04, dl_phdr_info seems to provide a non-zero value for dlpi_addr and an empty string in dlpi_name on the initial iteration. The empty string prevents the kernel function from being loaded, so programs linked with g++ fail. clang++ and g++ on Ubuntu 16.04 provide a zero value for dlpi_addr. To fix this, we verify that both the address and the name exist, so that /proc/self/exe can be properly loaded.
[ROCm/hip commit: 77bef86949]
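The selection rule described in this fix can be sketched as follows. This is only an illustration in Python, not the HIP runtime code (which is C++): the function name `object_path` is hypothetical, and falling back to /proc/self/exe whenever either field is missing is an assumption drawn from the description above.

```python
def object_path(dlpi_addr: int, dlpi_name: str) -> str:
    """Pick the file to read symbols from for one dl_phdr_info entry.

    Hypothetical helper: mirrors the check described in the commit message,
    not the actual HIP implementation.
    """
    # Trust the loader-supplied name only when both fields are populated.
    if dlpi_addr != 0 and dlpi_name:
        return dlpi_name
    # Assumed fallback: treat the entry as the main program and read it
    # through procfs.
    return "/proc/self/exe"
```

With this rule, the Ubuntu 18.04 case (non-zero address, empty name) and the Ubuntu 16.04 case (zero address) both resolve to /proc/self/exe, while shared libraries keep their reported path.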
* Put 3-wide vector types on a ketogenic diet.
* Remove needless include.
* Do not be narrow-minded.
* Put the C people on a diet too.
[ROCm/hip commit: 67abac1365]
* [hip] implement the hipExtLaunchMultiKernelMultiDevice API
* add a guard to check the HCC version for the acquire_locked_hsa_queue() API, which was introduced in HCC for ROCm 2.5
* modified code based on the requested changes
* changes to lock all streams before launching kernels for each device and unlock them after the dispatches
* check each stream to be valid before starting to lock all the streams
[ROCm/hip commit: 96dc74897d]
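The ordering described in the last two bullets (validate every stream, lock all of them, dispatch, then unlock) can be sketched as below. This is a Python illustration of the pattern only: `Stream` and `launch_on_all` are hypothetical stand-ins, and the real implementation works on HIP streams in C++.

```python
import threading


class Stream:
    """Hypothetical stand-in for a per-device stream with its own lock."""

    def __init__(self, device):
        self.device = device
        self.lock = threading.Lock()


def launch_on_all(streams, launch):
    # Step 1: check every stream before taking any lock, so an invalid
    # entry cannot leave earlier streams locked.
    if any(s is None for s in streams):
        raise ValueError("invalid stream")
    # Step 2: lock all streams before the first dispatch.
    for s in streams:
        s.lock.acquire()
    try:
        # Step 3: dispatch once per stream (one stream per device).
        return [launch(s) for s in streams]
    finally:
        # Step 4: unlock only after all dispatches have been issued.
        for s in reversed(streams):
            s.lock.release()
```

The sketch assumes each stream appears only once in the list; passing the same stream twice would block on the second acquire.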
* Implement the hipOccupancyMaxPotentialBlockSize function
* Replaced hipGetDeviceProperties() call by ihipGetDeviceProperties() in ihipOccupancyMaxPotentialBlockSize()
* Add test for hipOccupancyMaxPotentialBlockSize in Module API
* Added extern declaration for ihipGetDeviceProperties() to be accessed inside ihipOccupancyMaxPotentialBlockSize()
* fixed hipOccupancyMaxPotentialBlockSize test build issue
* Fix hipOccupancyMaxPotentialBlockSize dtest
* Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest
* Revert "Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest"
This reverts commit 0480ff56f1441fc515d2c26ce33783e303423938.
* Disable hipOccupancyMaxPotentialBlockSize dtest on NVCC
* move extern declaration of ihipGetDeviceProperties to hip_module.cpp
* Update the limits of 32 wavefronts per CU and 800/512 SGPRs for VI/pre-VI chips used to calculate the occupancy
[ROCm/hip commit: d492f1fd6b]
Convert Python 2 constructs to Python 3 compatible ones.
In Python 3, print is a function, so use write methods (which are always functions) instead.
In Python 3, keys() returns a view rather than a list. This means you can't change the data structure while it is being iterated over. Converting the view into a list mimics the Python 2 behavior.
[ROCm/hip commit: cc374b2bd3]
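A minimal sketch of the two conversions, assuming a plain dictionary; the helper names `drop_empty` and `report` are hypothetical and do not come from the converted scripts.

```python
import sys


def drop_empty(table):
    # list(...) snapshots the keys, so deleting entries inside the loop is
    # safe; iterating over table.keys() directly would raise RuntimeError
    # in Python 3 once the dictionary changes size.
    for key in list(table.keys()):
        if not table[key]:
            del table[key]
    return table


def report(table, out=sys.stdout):
    # write() is an ordinary method in both Python 2 and Python 3,
    # unlike print, which is a statement in Python 2.
    for key in sorted(table):
        out.write("%s: %s\n" % (key, table[key]))
```

Both functions behave the same under either interpreter version, which is the point of the conversion.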
- Once HIP_VDI_HOME is defined but HIP_CLANG_INCLUDE_PATH is not,
calculate it directly, whether or not HIP_CLANG_PATH is defined.
Otherwise, if clang is not installed in the official way (so far,
HIP-Clang breaks that), HIP_CLANG_INCLUDE_PATH may be left undefined
before its uses.
[ROCm/hip commit: e32940357f]