The original implementation had the statistics system woven very
tightly into things like PPCallbacks, with counters duplicated
in two places, and all the output code duplicated. This made it
very difficult to alter the structure of the program without
breaking the statistics system.
Since the planned approach for solving the remaining preprocessor
bugs needs the introduction of a custom FrontendAction, and such
a restructure was incompatible with the way the statistics system
was set up, this rewrite was required.
'tis rather simpler now, mind you :D
This commit also fixes an issue where some stats were counted
twice, and allows `-print-stats` to operate independently of
`-stat-output`, allowing you to print stats to a file without
printing them to a terminal (or vice-versa).
operator[] default-constructs the map value if no value exists
for that key. Default-construction of int yields a zero. So all
the manual faffing around is just unnecessary.
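For illustration, a small sketch of the pattern. `countNames` is a hypothetical helper, not a function from this codebase:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: tally how often each name occurs.
std::map<std::string, int> countNames(const std::vector<std::string>& names) {
    std::map<std::string, int> counts;
    for (const std::string& name : names) {
        // operator[] inserts a value-initialized int (0) when the key is
        // missing, so no find-or-insert step is needed before incrementing.
        ++counts[name];
    }
    return counts;
}
```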
See [the documentation](https://cmake.org/cmake/help/v3.0/command/find_package.html)
for exactly how the search procedure works. If you want to use an
LLVM from a specific location, use CMAKE_PREFIX_PATH as normal.
No longer do we have a nonstandard HIPIFY_CLANG_LLVM_DIR variable
for people to learn about.
Since there's now an option for toggling hipify-clang, omitting the
path is no longer something we need to check for. We'll still
abort if LLVM isn't found, due to `REQUIRED`.
Instead of deciding whether to build hipify-clang based on
the presence of an LLVM path on the command line, have an
explicit option.
Do we want this default-on or default-off? I've defaulted it to
on for now, but maybe we want the opposite?
All but the most contrived uses of macros are now properly handled -
have a look at the new testcases this commit adds. You can have
macros in kernel calls and macros spanning chunks of your arguments:
the call, the call parameters, or the callee can each be a macro or
partially a macro.
It seems like there was a lot of machinery here that is no longer
needed now we have hipLaunchKernelGGL (which doesn't require us
to insert an extra argument into kernel functions). We no longer
need to waste cycles scanning the AST for callees.
We can literally just do "Take the callee expression, and dump
it into the first argument of hipLaunchKernelGGL()".
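Roughly, as a string-level sketch. `buildHipLaunch` and its parameters are hypothetical, and the real tool works on source ranges rather than pre-split strings; the argument order (kernel, grid, block, shared memory, stream, then kernel arguments) is the one hipLaunchKernelGGL takes:

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not the tool's actual code: builds replacement text
// for a CUDA launch of the form callee<<<grid, block, shmem, stream>>>(args).
std::string buildHipLaunch(const std::string& callee,
                           const std::string& grid,
                           const std::string& block,
                           const std::string& sharedMem,
                           const std::string& stream,
                           const std::vector<std::string>& args) {
    // The callee text is copied verbatim into the first argument.
    std::string out = "hipLaunchKernelGGL(" + callee;
    out += ", dim3(" + grid + "), dim3(" + block + ")";
    out += ", " + sharedMem + ", " + stream;
    for (const std::string& arg : args) {
        out += ", " + arg;
    }
    out += ")";
    return out;
}
```

Because the callee is treated as opaque text, nothing here needs to know what it refers to, which is why no AST scan for callees is required.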
There are three functions here that all do the same thing...
There was also logic that looked for numeric literals and worked
backwards to find the macro name from which they were expanded.
I previously introduced code that rewrites macro references at
expand-time in the `MacroExpands` callback, so that literal-tracing
logic is no longer doing anything useful.
Calls to macros that were themselves CUDA API calls were often
being missed - this applies the identifier transform to macro
names at the callsites, too.
The source location for a call that's inside a macro body will,
by default, point into the macro definition itself. The original
logic was causing macro invocations to be overwritten, as I
explain here:
https://github.com/ROCm-Developer-Tools/HIP/issues/207#issuecomment-337521851
The existing PPCallbacks code is correctly rewriting macro
definitions, so the practical effect of this change is that AST
rewrites on code that's expanded from macros are no-ops.
It might be a performance optimisation to put a short-circuit at
the top of the AST callbacks to abort when faced with code that
was expanded from macros.
It might yet prove wise to do absolutely everything at lex-time...
A chain of 7 closing braces is never a great sign :D
In the process it became apparent that the unsupported flag
was being silently ignored, causing users to be left with CUDA
API calls in their programs with no warning given. This has been
rectified for consistency.
Instead of having a single, enormous LUT for all CUDA names, let's
have separate ones for different types of entity. We often know
that we're looking at a typename, or a function name, or a macro
name - so we can be more efficient (and resilient to name
collisions) by having smaller lookup tables for each of those
classes of entity.
Here we start that off by having three LUTs:
- Header names
- Type names
- Everything else
Future work could usefully split "everything else" into:
- enum values
- macro names
- function names
- everything else
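A minimal sketch of the three-table layout, assuming plain `std::map` tables. The table names and `translate` are hypothetical, and the entries are a tiny illustrative sample rather than the real tables:

```cpp
#include <map>
#include <string>

// Illustrative entries only; the names of these tables are assumptions.
const std::map<std::string, std::string> kHeaderNames = {
    {"cuda_runtime.h", "hip/hip_runtime.h"},
};
const std::map<std::string, std::string> kTypeNames = {
    {"cudaError_t", "hipError_t"},
    {"cudaStream_t", "hipStream_t"},
};
const std::map<std::string, std::string> kOtherNames = {
    {"cudaMalloc", "hipMalloc"},
    {"cudaFree", "hipFree"},
};

// Returns the HIP name, or the input unchanged when the table has no entry.
std::string translate(const std::map<std::string, std::string>& table,
                      const std::string& name) {
    auto it = table.find(name);
    return it == table.end() ? name : it->second;
}
```

A caller that already knows it is looking at a type only consults `kTypeNames`, so a function name that happens to collide with a type name can't be picked up by mistake.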
It's worth noting that the "needs new matcher" todos I delete here
were actually resolved with the previous commit. It no longer
naively searches for things that start with "cu*" - it will find
exactly those things that are present in our lookup tables.
Previously, there were different AST matchers for each
language construct that contains a type reference, and custom
logic to perform the transformation within each of those
structures.
Since the transformation in all such cases was only replacing
CUDA types with HIP ones, we can instead use an AST matcher
that finds and updates the type references directly.
This simplifies the program considerably, and it won't fail
when it finds a language feature (or complicated type expression)
that nobody wrote custom logic for yet.
Also, rewrote it as a constant variable instead of a function
that imperatively fills a map. It's shorter, faster to compile,
and (depending on how badly the compiler screws it up) maybe
faster to run.
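To show the difference in shape, a sketch with hypothetical names (`fillRenames`, `kRenames`) and two sample entries:

```cpp
#include <map>
#include <string>

// Before (sketch): a function that imperatively fills a map on each call.
std::map<std::string, std::string> fillRenames() {
    std::map<std::string, std::string> m;
    m["cudaMalloc"] = "hipMalloc";
    m["cudaFree"] = "hipFree";
    return m;
}

// After (sketch): the same contents as a constant, built once from an
// initializer list.
const std::map<std::string, std::string> kRenames = {
    {"cudaMalloc", "hipMalloc"},
    {"cudaFree", "hipFree"},
};
```

Both forms hold the same data; the constant just states it declaratively and can't be modified by accident afterwards.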
And, of course, it starts breaking up that giant .cpp file.