Older gcc versions (e.g., 5.x) see an ambiguity in some calls to abs.
Example error output as seen with gcc 5.5:
In file included from benchmark_wino.cpp:1:
In file included from ./miopen.hpp:13:
In file included from /usr/lib/gcc/x86_64-linux-gnu/5.5.0/../../../../include/c++/5.5.0/regex:38:
In file included from /usr/lib/gcc/x86_64-linux-gnu/5.5.0/../../../../include/c++/5.5.0/algorithm:62:
In file included from /usr/lib/gcc/x86_64-linux-gnu/5.5.0/../../../../include/c++/5.5.0/bits/stl_algo.h:66:
In file included from /usr/lib/gcc/x86_64-linux-gnu/5.5.0/../../../../include/c++/5.5.0/random:51:
/usr/lib/gcc/x86_64-linux-gnu/5.5.0/../../../../include/c++/5.5.0/bits/random.tcc:1324:27: error: call to 'abs' is ambiguous
const double __y = -std::abs(__n) * __param._M_sm - 1;
^~~~~~~~
/opt/rocm/hip/include/hip/hcc_detail/hip_complex.h:345:31: note: candidate function
__DEFINE_HIP_COMPLEX_REAL_FUN(abs, hipCabs)
hcc_detail/math_functions.h used to include hcc_detail/hip_runtime.h.
Removing that include caused a regression in TensorFlow 1.8.
Put it back for backward compatibility.
Header math_functions.h should be included after __clang_cuda_math_forward_declares.h to avoid the warning "attribute declaration must precede definition".
Reimplement the __any and __all functions using OCKL functions, and reimplement __ballot using the LLVM intrinsic llvm.amdgcn.icmp.i32 instead of calls to __activelanemask_v4_b64_b1, which is not convergent.
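For reference, the intended semantics of the three vote functions can be modeled on the host. This is only a sketch under stated assumptions, not the device implementation: it assumes a 64-lane wavefront and the usual definitions (ballot sets bit i when active lane i has a true predicate, any is "ballot is nonzero", all is "ballot equals the active mask"). The names ballot_model, any_model, all_model, Lanes and kWaveSize are hypothetical and do not appear in the HIP headers.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side model: one predicate per lane of a 64-wide wavefront.
constexpr std::size_t kWaveSize = 64;
using Lanes = std::array<bool, kWaveSize>;

// Bit i of the result is set when lane i is active and its predicate is true.
inline std::uint64_t ballot_model(const Lanes& pred, std::uint64_t active_mask) {
    std::uint64_t mask = 0;
    for (std::size_t i = 0; i < kWaveSize; ++i) {
        const std::uint64_t bit = std::uint64_t{1} << i;
        if ((active_mask & bit) != 0 && pred[i]) {
            mask |= bit;
        }
    }
    return mask;
}

// True when at least one active lane has a true predicate.
inline bool any_model(const Lanes& pred, std::uint64_t active_mask) {
    return ballot_model(pred, active_mask) != 0;
}

// True when every active lane has a true predicate.
inline bool all_model(const Lanes& pred, std::uint64_t active_mask) {
    return ballot_model(pred, active_mask) == active_mask;
}
```

The model makes the dependency on the active mask explicit. On the device that mask is implicit in the execution state, which is why the operation used to build the ballot has to be convergent.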
For now, guard the __to_local function for device compilation only, since a local pointer should be the same size as unsigned int when compiling for the GPU. Also change char* to void*.