+ Add option -print-stats-csv to dump statistics to CSV file
+ If -o-dir is specified, CSV file will be dumped there
+ Generate 1 summary file sum_stat.csv in case of multiple sources
Typo introduced here:
commit 87eac86298
Author: Alex Voicu <alexandru.voicu@amd.com>
Date: Mon Jun 24 20:02:09 2019 -0500
Put 3-wide vector types on a ketogenic diet. (#1180)
* Remove flags parameter from hipOccupancyMaxPotentialBlockSize
This commit makes the hipOccupancyMaxPotentialBlockSize method consistent with hcc path and the CUDA API.
module_api_global relies on a HCC only feature which allows host code
to write to device variables. This feature does not exist in CUDA
or hip-clang, which causes the sample not working in CUDA or hip-clang.
This patch fixes the sample by using standard features of CUDA and
hip-clang. The fixed sample works in HCC, CUDA and hip-clang.
module_api_global relies on a HCC only feature which allows host code
to write to device variables. This feature does not exist in CUDA
or hip-clang, which causes the sample not working in CUDA or hip-clang.
This patch fixes the sample by using standard features of CUDA and
hip-clang. The fixed sample works in HCC, CUDA and hip-clang.
This fixes a bug where GCC++ on Ubuntu 18.04 creates failing executables compared to GCC++ on 16.04 and clang++. While creating function names on Ubuntu 18.04, dl_phdr_info seems to provide a non-zero value for dlpi_addr on initial iteration, and an empty string in dlpi_name. This is causing failure when linking with g++, since the empty string prevents the kernel function from being loaded. Clang++ and GCC on UB16 provide a zero value for dlpi_addr. To fix this, we need to verify both addr and name exists, so that /proc/self/exe can be properly loaded.
* Put 3-wide vector types on a ketogenic diet.
* Remove needless include.
* Do not be narrow-minded.
* Do not be narrow-minded.
* Put the C people on a diet too.
* [hip] implement the hipExtLaunchMultiKernelMultiDevice API
* add a guard to check the HCC version for acquire_locked_hsa_queue() API which was introdued in HCC for ROCm 2.5
* modified code based on the requested changes
* changes to lock all streams before launching kernels for each device and unlock them after the dispatches
* check each stream to be valid before starting to lock all the streams
* Implement the hipOccupancyMaxPotentialBlockSize function
* Replaced hipGetDeviceProperties() call by ihipGetDeviceProperties() in ihipOccupancyMaxPotentialBlockSize()
* Add test for hipOccupancyMaxPotentialBlockSize in Module API
* Added extern declaration for ihipGetDeviceProperties() to be accessed inside ihipOccupancyMaxPotentialBlockSize()
* fixed hipOccupancyMaxPotentialBlockSize test build issue
* Fix hipOccupancyMaxPotentialBlockSize dtest
* Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest
* Revert "Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest"
This reverts commit 0480ff56f1441fc515d2c26ce33783e303423938.
* Disable hipOccupancyMaxPotentialBlockSize dtest on NVCC
* move extern declaration of ihipGetDeviceProperties to hip_module.cpp
* Update the limiation of 32 wavefronts per CU and 800/512 SGPRs for VI/pre-VI chips to calculate the occupancy
Convert python 2 constructs to python 3 compatible ones.
In python 3, print is a function, so use write methods (which are always functions) instead.
In python3 keys() returns an iterator, rather than a list. This means you can't change the data structure that is being iterated over. Converting this iterator into a list mimics the python 2 behavior.
- Once HIP_VDI_HOME is defined but HIP_CLANG_INCLUDE_PATH is not,
calculate it directly without HIP_CLANG_PATH is defined or not;
Otherwise, we may leave HIP_CLANG_INCLUDE_PATH undefined, if clang is
not installed following the official way (so far, HIP-Clang breaks
that), we may leave HIP_CLANG_INCLUDE_PATH undefined before its uses.
Appended 48 empty bytes to the kernarg area at runtime. The implicit arguments are enabled primarily for the hostcall services
and it is completely abstracted from the user code. Enabled it for both hip-clang and hip-hcc.
There is soft link /opt/rocm/bin/.hipVersion, therefore when hipcc is executed
as /opt/rocm/bin/hipcc, it will set HIP_VDI_HOME to /opt/rocm, which is
incorrect. Check ../lib/bitcode instead to identify HIP_VDI_HOME.