* Remove flags parameter from hipOccupancyMaxPotentialBlockSize
This commit makes the hipOccupancyMaxPotentialBlockSize method consistent with hcc path and the CUDA API.
[ROCm/hip commit: 2a1b0ba27d]
module_api_global relies on a HCC only feature which allows host code
to write to device variables. This feature does not exist in CUDA
or hip-clang, which causes the sample not working in CUDA or hip-clang.
This patch fixes the sample by using standard features of CUDA and
hip-clang. The fixed sample works in HCC, CUDA and hip-clang.
[ROCm/hip commit: 502a734ebf]
module_api_global relies on a HCC only feature which allows host code
to write to device variables. This feature does not exist in CUDA
or hip-clang, which causes the sample not working in CUDA or hip-clang.
This patch fixes the sample by using standard features of CUDA and
hip-clang. The fixed sample works in HCC, CUDA and hip-clang.
[ROCm/hip commit: 60e1733afe]
This fixes a bug where GCC++ on Ubuntu 18.04 creates failing executables compared to GCC++ on 16.04 and clang++. While creating function names on Ubuntu 18.04, dl_phdr_info seems to provide a non-zero value for dlpi_addr on initial iteration, and an empty string in dlpi_name. This is causing failure when linking with g++, since the empty string prevents the kernel function from being loaded. Clang++ and GCC on UB16 provide a zero value for dlpi_addr. To fix this, we need to verify both addr and name exists, so that /proc/self/exe can be properly loaded.
[ROCm/hip commit: 77bef86949]
* Put 3-wide vector types on a ketogenic diet.
* Remove needless include.
* Do not be narrow-minded.
* Do not be narrow-minded.
* Put the C people on a diet too.
[ROCm/hip commit: 67abac1365]
* [hip] implement the hipExtLaunchMultiKernelMultiDevice API
* add a guard to check the HCC version for acquire_locked_hsa_queue() API which was introdued in HCC for ROCm 2.5
* modified code based on the requested changes
* changes to lock all streams before launching kernels for each device and unlock them after the dispatches
* check each stream to be valid before starting to lock all the streams
[ROCm/hip commit: 96dc74897d]
* Implement the hipOccupancyMaxPotentialBlockSize function
* Replaced hipGetDeviceProperties() call by ihipGetDeviceProperties() in ihipOccupancyMaxPotentialBlockSize()
* Add test for hipOccupancyMaxPotentialBlockSize in Module API
* Added extern declaration for ihipGetDeviceProperties() to be accessed inside ihipOccupancyMaxPotentialBlockSize()
* fixed hipOccupancyMaxPotentialBlockSize test build issue
* Fix hipOccupancyMaxPotentialBlockSize dtest
* Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest
* Revert "Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest"
This reverts commit 0480ff56f1441fc515d2c26ce33783e303423938.
* Disable hipOccupancyMaxPotentialBlockSize dtest on NVCC
* move extern declaration of ihipGetDeviceProperties to hip_module.cpp
* Update the limiation of 32 wavefronts per CU and 800/512 SGPRs for VI/pre-VI chips to calculate the occupancy
[ROCm/hip commit: d492f1fd6b]
Convert python 2 constructs to python 3 compatible ones.
In python 3, print is a function, so use write methods (which are always functions) instead.
In python3 keys() returns an iterator, rather than a list. This means you can't change the data structure that is being iterated over. Converting this iterator into a list mimics the python 2 behavior.
[ROCm/hip commit: cc374b2bd3]
- Once HIP_VDI_HOME is defined but HIP_CLANG_INCLUDE_PATH is not,
calculate it directly without HIP_CLANG_PATH is defined or not;
Otherwise, we may leave HIP_CLANG_INCLUDE_PATH undefined, if clang is
not installed following the official way (so far, HIP-Clang breaks
that), we may leave HIP_CLANG_INCLUDE_PATH undefined before its uses.
[ROCm/hip commit: e32940357f]
Appended 48 empty bytes to the kernarg area at runtime. The implicit arguments are enabled primarily for the hostcall services
and it is completely abstracted from the user code. Enabled it for both hip-clang and hip-hcc.
[ROCm/hip commit: 9c03a5f948]
There is soft link /opt/rocm/bin/.hipVersion, therefore when hipcc is executed
as /opt/rocm/bin/hipcc, it will set HIP_VDI_HOME to /opt/rocm, which is
incorrect. Check ../lib/bitcode instead to identify HIP_VDI_HOME.
[ROCm/hip commit: 71f6bf4e67]