With opaque pointers, the suffix of those intrinsics changed. This caused
build failures that should be resolved by using the corresponding
Clang builtins instead of calling the intrinsics directly.
See SWDEV-356581
Change-Id: Icd1d9b9438cac4bef0f7c52d4cd341ac76500890
When printing HIP API function parameters, use the integer format to
print 'char' arguments to avoid printing invalid ASCII characters
(value > 127).
Make sure the roctracer::hip_support::detail operator<< overloads are
used when printing arguments.
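
A minimal sketch of the idea, not the actual roctracer code: the
namespace tracer_sketch::detail and the helper format_arg() are
hypothetical names used only for illustration. The overloads are picked
up only when the << expression is written where unqualified lookup can
see them, which is why the call site has to live in (or pull in) that
namespace.

```cpp
#include <ostream>
#include <sstream>
#include <string>

namespace tracer_sketch {
namespace detail {

// Print character types as integers so that values outside the printable
// ASCII range never reach the output as raw bytes.
inline std::ostream& operator<<(std::ostream& out, char v) {
  return out << static_cast<int>(v);
}
inline std::ostream& operator<<(std::ostream& out, signed char v) {
  return out << static_cast<int>(v);
}
inline std::ostream& operator<<(std::ostream& out, unsigned char v) {
  return out << static_cast<int>(v);
}

// Hypothetical helper: formats one argument. Because the << below is
// written inside 'detail', the overloads above are visible to unqualified
// lookup and, being non-templates, are preferred over the std:: character
// overloads. Other types fall through to the standard operators.
template <typename T>
std::string format_arg(const T& v) {
  std::ostringstream oss;
  oss << v;
  return oss.str();
}

}  // namespace detail
}  // namespace tracer_sketch
```

With these overloads in scope, format_arg('A') yields "65" rather
than "A".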
Change-Id: Id072c2ed19b1b4166108599e393d1cae6c54b6b0
To avoid using the thread local std::stack to remember the phase enter
timestamp, the tracer tool uses the phase data to store the timestamp.
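
The approach can be sketched roughly as follows. All types and names
here (ApiCallData, ApiRecord, on_api_callback) are hypothetical
stand-ins, not the tracer tool's real structures; the only assumption
is that each API call carries a per-call scratch slot that survives
from the enter callback to the exit callback.

```cpp
#include <cstdint>

namespace tracer_sketch {

enum class Phase { Enter, Exit };

// Hypothetical per-call data handed to the callback in both phases.
struct ApiCallData {
  Phase phase = Phase::Enter;
  std::uint64_t phase_data = 0;  // scratch slot owned by this one call
};

// Hypothetical trace record produced when a call completes.
struct ApiRecord {
  std::uint64_t begin_ns = 0;
  std::uint64_t end_ns = 0;
};

// On enter, stash the timestamp in the call's own scratch slot. On exit,
// read it back and fill the record. Returns true when a record was
// produced. No thread-local stack is needed, because nested calls each
// carry their own ApiCallData.
inline bool on_api_callback(ApiCallData& data, std::uint64_t now_ns,
                            ApiRecord& record) {
  if (data.phase == Phase::Enter) {
    data.phase_data = now_ns;
    return false;
  }
  record.begin_ns = data.phase_data;
  record.end_ns = now_ns;
  return true;
}

}  // namespace tracer_sketch
```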
Change-Id: I9e95637b41d6f0b2bd61016062ca07d6ba897652
Fixed an error in the CMakeLists.txt USE_PERF_API option declaration
that was making it always disabled. Fixing this exposed an issue with
the hip_prof_gen.py script's handling of function variants (for example,
_spt functions) and new HIP_INIT_API_* macros.
Also switched the Python interpreter to python3, as python2 may not be
available by default on the build system.
Change-Id: I971fc9edcc746ca63a2bdf4f540e755f9a80fe69
- Add user object APIs for creating, releasing, and retaining user objects
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: I0bf2999c77e44269565b27c31c7c1461f8a160a2
Part 3: Add the missing declaration of wall_clock64() to fix a
compilation error in device code.
Add support for querying hipDeviceAttributeWallClockRate.
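
A sketch of how the two pieces might be combined on the host side,
assuming the attribute reports the wall clock frequency in kHz (that
is, ticks per millisecond) and that the tick values were read with
wall_clock64() in device code. The helper ticks_to_ms() is a
hypothetical name; the rate itself would come from
hipDeviceGetAttribute() with hipDeviceAttributeWallClockRate.

```cpp
#include <cassert>
#include <cstdint>

namespace wall_clock_sketch {

// Convert a pair of wall clock tick readings to elapsed milliseconds.
// Assumption: rate_khz is the wall clock frequency in kHz, so one
// millisecond corresponds to rate_khz ticks.
inline double ticks_to_ms(std::uint64_t start_ticks, std::uint64_t end_ticks,
                          int rate_khz) {
  assert(rate_khz > 0);
  assert(end_ticks >= start_ticks);
  return static_cast<double>(end_ticks - start_ticks) /
         static_cast<double>(rate_khz);
}

}  // namespace wall_clock_sketch
```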
Change-Id: Ie54771c2f58eeaacdc0248bc116ef193f99eb9b9
- hipDeviceGet/SetGraphMemAttr
- hipDeviceGraphMemTrim
- There is currently no memory pool for graphs
Signed-off-by: sdashmiz <shadi.dashmiz@amd.com>
Change-Id: I11db76ea7ea1c7732175fc93264448052357e8dc
Update HIP's unsafeAtomicAdd to:
- Compile properly even when not compiling for gfx90a
- Fall back to safe atomic add on non-gfx90a architectures
- Use flat atomic add for FP64 on gfx90a instead of dynamically
checking memory spaces.
In addition, when the compiler is passed -munsafe-fp-atomics, it
will define __AMDGCN_UNSAFE_FP_ATOMICS__. When this happens, the
compiler is requesting that the HIP headers force all HIP
atomicAdd() calls on floats or doubles to use their unsafe versions.
This patch therefore routes those calls to unsafeAtomicAdd() when that
macro is defined. The same routing applies to atomicSub(), since it
calls atomicAdd() underneath. This is not done for
system-scope atomicAdd because, on gfx90a, system-scope atomic FP
add instructions would need to target fine-grained memory, which is
always unsafe.
This patch also creates safeAtomicAdd() functions for float and double.
These functions will create a standalone safe atomic, even when the
application is compiled with -munsafe-fp-atomics.
Finally, this patch adds wrappers in the Nvidia path of HIP so that
these HIP functions call through to atomicAdd there as well.
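
The dispatch described above can be modeled on the host as follows.
This is only a sketch of the compile-time selection logic, not the HIP
header code: it uses std::atomic<float> in place of device memory, the
namespace atomics_sketch and the function hardwareFpAdd() are
hypothetical, and the real headers would use device builtins where this
model uses a compare-and-swap loop.

```cpp
#include <atomic>

namespace atomics_sketch {

// "Safe" path: a compare-and-swap loop that works on any memory.
// Returns the value stored before the addition.
inline float safeAtomicAdd(std::atomic<float>& addr, float value) {
  float old = addr.load();
  while (!addr.compare_exchange_weak(old, old + value)) {
    // On failure 'old' is refreshed with the current value; retry.
  }
  return old;
}

// Hypothetical stand-in for the hardware FP atomic add instruction.
// On the host it simply reuses the CAS loop.
inline float hardwareFpAdd(std::atomic<float>& addr, float value) {
  return safeAtomicAdd(addr, value);
}

// "Unsafe" path: use the hardware instruction only when compiling for
// gfx90a; on every other target fall back to the safe version.
inline float unsafeAtomicAdd(std::atomic<float>& addr, float value) {
#if defined(__gfx90a__)
  return hardwareFpAdd(addr, value);
#else
  return safeAtomicAdd(addr, value);
#endif
}

// atomicAdd() follows the compiler's request: with -munsafe-fp-atomics
// the macro below is defined and the unsafe version is selected.
inline float atomicAdd(std::atomic<float>& addr, float value) {
#if defined(__AMDGCN_UNSAFE_FP_ATOMICS__)
  return unsafeAtomicAdd(addr, value);
#else
  return safeAtomicAdd(addr, value);
#endif
}

// atomicSub() is built on atomicAdd(), so it inherits the same routing.
inline float atomicSub(std::atomic<float>& addr, float value) {
  return atomicAdd(addr, -value);
}

}  // namespace atomics_sketch
```

In this model, code that must stay safe regardless of compiler flags
calls safeAtomicAdd() directly, while everything else goes through
atomicAdd() and picks up whichever path the build selected.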
Change-Id: I8af0621d3d28ea30c9278bfeea7393d03bbdac6d