17b734afde
+ Add a corresponding matcher, cudaDeviceFuncCall, that matches calls only to functions that are __device__ or __global__ and not __host__.
+ Add a corresponding device functions mapping:
only unsupported functions are listed, because supported ones are identical to their CUDA counterparts and need no transformation;
keep FindAndReplace for device functions separate from FindAndReplace for host API calls.
+ Add a test that distinguishes device functions from user-defined functions.
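The matching rule and the "only unsupported are listed" mapping can be modeled roughly as below. This is a simplified sketch, not the hipify-clang code. The real matcher works on clang AST declarations and their CUDA attributes. FuncDecl, isDeviceFuncCall, classify, Action and every function name used here are hypothetical stand-ins.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for a callee's declaration; hipify-clang inspects
// clang AST nodes and their CUDA attributes, these flags only model them.
struct FuncDecl {
  std::string name;
  bool isDevice = false;  // declared __device__
  bool isGlobal = false;  // declared __global__
  bool isHost = false;    // declared __host__
};

// Models the cudaDeviceFuncCall condition: (__device__ or __global__) and not __host__.
bool isDeviceFuncCall(const FuncDecl& f) {
  return (f.isDevice || f.isGlobal) && !f.isHost;
}

enum class Action { NotDeviceCall, KeepAsIs, ReportUnsupported };

// The (hypothetical) device-function map lists only unsupported names, so a
// matched call absent from it, including a user-defined __device__ function,
// is left unchanged.
Action classify(const FuncDecl& f, const std::set<std::string>& unsupported) {
  if (!isDeviceFuncCall(f)) return Action::NotDeviceCall;  // host or __host__ __device__ callee
  if (unsupported.count(f.name) != 0) return Action::ReportUnsupported;
  return Action::KeepAsIs;
}

// Self-check using hypothetical function names.
void demo() {
  const std::set<std::string> unsupported = {"hypothetical_unsupported_fn"};
  assert(classify({"hypothetical_unsupported_fn", true, false, false}, unsupported) == Action::ReportUnsupported);
  assert(classify({"user_device_helper", true, false, false}, unsupported) == Action::KeepAsIs);
  assert(classify({"user_kernel", false, true, false}, unsupported) == Action::KeepAsIs);
  assert(classify({"host_device_helper", true, false, true}, unsupported) == Action::NotDeviceCall);
}
```

Listing only the unsupported names keeps the map small. A lookup miss then means "leave the call alone" and does not mean "unknown function".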
[ROCm/hip commit: 6602fadc16]