* Put 3-wide vector types on a ketogenic diet.
* Remove needless include.
* Do not be narrow-minded.
* Do not be narrow-minded.
* Put the C people on a diet too.
[ROCm/hip commit: 67abac1365]
* [hip] implement the hipExtLaunchMultiKernelMultiDevice API
* add a guard to check the HCC version for the acquire_locked_hsa_queue() API, which was introduced in HCC for ROCm 2.5
* modified code based on the requested changes
* changes to lock all streams before launching kernels for each device and unlock them after the dispatches
* check each stream to be valid before starting to lock all the streams
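The ordering described in the last two items (validate every stream, lock all of them, dispatch per device, then unlock) can be sketched as follows. This is only an illustration in Python, not the HIP implementation: `Stream`, `launch_on_all_devices` and the `log` list are hypothetical names, and the sketch assumes one distinct stream per device.

```python
import threading


class Stream:
    """Hypothetical stand-in for a per-device stream guarded by a lock."""

    def __init__(self, device_id, valid=True):
        self.device_id = device_id
        self.valid = valid
        self.lock = threading.Lock()


def launch_on_all_devices(launches, log):
    """launches: list of (stream, kernel_name) pairs, one per device."""
    # 1. Validate every stream before taking any lock.
    for stream, _ in launches:
        if stream is None or not stream.valid:
            raise ValueError("invalid stream in launch list")
    locked = []
    try:
        # 2. Lock all streams up front.
        for stream, _ in launches:
            stream.lock.acquire()
            locked.append(stream)
            log.append(("lock", stream.device_id))
        # 3. Dispatch one kernel per device while every stream is held.
        for stream, kernel in launches:
            log.append(("launch", stream.device_id, kernel))
    finally:
        # 4. Unlock only after all dispatches have been issued.
        for stream in locked:
            stream.lock.release()
            log.append(("unlock", stream.device_id))
```

Validating first means a bad stream is rejected before any lock is held, so nothing has to be rolled back.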
[ROCm/hip commit: 96dc74897d]
* Implement the hipOccupancyMaxPotentialBlockSize function
* Replaced hipGetDeviceProperties() call by ihipGetDeviceProperties() in ihipOccupancyMaxPotentialBlockSize()
* Add test for hipOccupancyMaxPotentialBlockSize in Module API
* Added extern declaration for ihipGetDeviceProperties() to be accessed inside ihipOccupancyMaxPotentialBlockSize()
* fixed hipOccupancyMaxPotentialBlockSize test build issue
* Fix hipOccupancyMaxPotentialBlockSize dtest
* Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest
* Revert "Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest"
This reverts commit 0480ff56f1441fc515d2c26ce33783e303423938.
* Disable hipOccupancyMaxPotentialBlockSize dtest on NVCC
* move extern declaration of ihipGetDeviceProperties to hip_module.cpp
* Update the limitation of 32 wavefronts per CU and 800/512 SGPRs for VI/pre-VI chips used to calculate the occupancy
[ROCm/hip commit: d492f1fd6b]
Convert python 2 constructs to python 3 compatible ones.
In python 3, print is a function, so use write methods (which are always functions) instead.
In Python 3, keys() returns a view rather than a list. This means you can't change the data structure that is being iterated over. Converting the view into a list mimics the Python 2 behavior.
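A minimal sketch of both conversions follows. The function names `report`, `drop_empty` and `drop_empty_unsafe` are made up for illustration and do not come from the converted scripts.

```python
import sys


def report(message, out=sys.stdout):
    # write() is an ordinary method call in both Python 2 and Python 3.
    out.write(message + "\n")


def drop_empty(table):
    """Delete entries whose value is empty, mutating `table` in place."""
    # list(...) snapshots the keys, so deleting inside the loop is safe.
    for key in list(table.keys()):
        if not table[key]:
            del table[key]
    return table


def drop_empty_unsafe(table):
    """Same loop without the snapshot; fails under Python 3."""
    for key in table.keys():  # iterates the live dictionary
        if not table[key]:
            del table[key]  # RuntimeError on the next iteration step
    return table
```

Under Python 3, `drop_empty_unsafe` raises `RuntimeError` as soon as the loop advances after a deletion, which is why the list conversion is needed.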
[ROCm/hip commit: cc374b2bd3]
- Once HIP_VDI_HOME is defined but HIP_CLANG_INCLUDE_PATH is not,
calculate it directly, whether or not HIP_CLANG_PATH is defined.
Otherwise, if clang is not installed in the official layout (so far,
HIP-Clang breaks that), HIP_CLANG_INCLUDE_PATH may be left undefined
before its uses.
[ROCm/hip commit: e32940357f]
Appended 48 empty bytes to the kernarg area at runtime. The implicit arguments are enabled primarily for the hostcall services
and are completely abstracted from user code. Enabled for both hip-clang and hip-hcc.
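As a rough picture of the effect on the argument buffer, here is a sketch in Python. It assumes "empty" means zero-filled and ignores any alignment padding the runtime may apply. `with_implicit_args` is a hypothetical helper, not a HIP function.

```python
IMPLICIT_ARGS_BYTES = 48  # size stated in the commit message


def with_implicit_args(kernarg):
    """Return the user's kernarg bytes followed by 48 zeroed bytes."""
    # The user-visible arguments are untouched; the extra area is appended.
    return bytes(kernarg) + bytes(IMPLICIT_ARGS_BYTES)
```

For example, a 16-byte user argument block becomes a 64-byte kernarg area whose first 16 bytes are unchanged.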
[ROCm/hip commit: 9c03a5f948]
There is a soft link /opt/rocm/bin/.hipVersion, therefore when hipcc is executed
as /opt/rocm/bin/hipcc, it will set HIP_VDI_HOME to /opt/rocm, which is
incorrect. Check ../lib/bitcode instead to identify HIP_VDI_HOME.
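The check can be sketched as below. This is an illustration in Python rather than the hipcc script itself. `detect_vdi_home` is a hypothetical name, and the sketch assumes the path is taken as invoked, without resolving symlinks.

```python
import os


def detect_vdi_home(hipcc_path):
    """Return the install root if it looks like a HIP/VDI install, else None."""
    bin_dir = os.path.dirname(os.path.abspath(hipcc_path))
    root = os.path.dirname(bin_dir)
    # Look for ../lib/bitcode relative to the directory containing hipcc,
    # rather than relying on the presence of .hipVersion in bin/.
    if os.path.isdir(os.path.join(root, "lib", "bitcode")):
        return root
    return None
```

A root that only has bin/.hipVersion but no lib/bitcode directory is no longer treated as HIP_VDI_HOME.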
[ROCm/hip commit: 71f6bf4e67]
* Fix hipcc for hip-clang.
If there is -g, do not add -O3 by default.
If HIP_VDI_HOME is not set, set HIP_VDI_HOME based on hipcc directory for HIP/VDI runtime.
For HIP/VDI runtime, set HIP_CLANG_PATH and DEVICE_LIB_PATH based on HIP_VDI_HOME only if they exist.
This allows using HIP/VDI runtime with hip-clang installed at /opt/rocm/llvm and device lib installed
at /opt/rocm/lib.
* Fix HIP_VDI_HOME for hipcc called from /opt/rocm/bin
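The -g rule from the first item can be sketched as follows. This is a simplified illustration in Python, not the hipcc code. `default_opt_flags` is a hypothetical helper, and it only matches a literal `-g`, not variants such as `-g3` or `-ggdb`, since the commit message does not say how those are handled.

```python
def default_opt_flags(args):
    """Return the optimization flags to add by default for these arguments."""
    # With -g present, leave the optimization level alone so that the
    # debug build is not silently optimized.
    if "-g" in args:
        return []
    return ["-O3"]
```

For example, `default_opt_flags(["-c", "foo.cpp"])` yields `["-O3"]`, while adding `-g` to the argument list yields an empty list.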
[ROCm/hip commit: e17f94e080]