+ Add missing cuda_profiler_api.h to hip/hip_profile.h transformation.
NOTE: HIP Profiler API is under development. This is NOT WORKING example.
TODO: Find out a way to generate HIP_SCOPED_MARKER, HIP_BEGIN_MARKER, HIP_END_MARKER, declared in hip/hip_profile.h in particular place (signatures are to obtain).
* Revert "Revert "Use COMgr to read Kernel Args Metadata (#1006)""
This reverts commit a3d118eaa8.
* Revert "Use COMgr to read Kernel Args Metadata (#1006)"
This reverts commit 8a548bf40b.
* Revert "improve program state commentary"
This reverts commit 7aada87cbd.
* Revert "load program state once per agent"
This reverts commit c9117de8eb.
* start moving function_names() into the hip shared lib
* start moving code_object_blobs to a new "state" object
* Consolidate various program state related static objects into a
single program_state object
* minor clean up
* move more stuffs from functional_grid_launch into program_state
* debug make_kernarg
* moving lookup for kernargs size_align into program_state
* clean up old code for kernarg size and alignment
* update hip_module to use newer api in program_state
* Create public member functions for program_state
* move most program state functions into shared library
* Pass the data buffer size to load_executable
Otherwise, it can't figure what the data size is
just from the char* (since the data is not really a string)
* turning free functions in program state into members of program_state_impl
* change the free function globals() into a member of program_state_impl
* replace the static mutex used for populating globals
* moving associate_code_object_symbols_with_host_allocation into
program_state_impl
* move load_code_object_and_freeze_executable into program_state_impl
* moving executables and functions_names into program_state_impl
* moving kernels() into program_state_impl
* moving functions() into program_state_impl
* move get_kernargs into program_state_impl
* moving kernel_descriptor into program_state_impl
* moving kernargs_size_align calculation into program_state_impl
* Changing the handle to program_state_impl to a pointer
* moving program_state_impl into a separate inline source file
* fixing/cleaning up some header file includes
* moving member function for kernargs_size_align into program_state.cpp
* moving Kernel_descriptor into program_state.inl
* add a new class to manage agent globals
* moving all agent globals processing functions into agent_globals_impl
* load program state once per agent
re-merging PR991 against other program state changes
* fix per-agent program state member initialization
* cache executables based on elf name, isa, and agent.
This avoids program state reloading executables after a shared library is dlopened.
re-merging PR1057 against other program state changes
* protect executables cache by a global mutex
* return ref to executables cache
* adapt PR#981 Make hipModuleGetGlobal be in HIP runtime
+ Only a generation of transformation map of CUDA entities is implemented.
+ 2 hipify-clang options are added: -python, -o-python-map-dir.
+ Explicitly set -roc option for cuda_to_hip_mappings.py generation.
+ Generated file already might be used by pytorch team.
+ Only a generation of transformation map of CUDA entities supported by HIP is implemented.
+ 3 hipify-clang options are added: -perl, -o-perl-map, -o-perl-map-dir.
+ OptionsParser mode is changed from OneOrMore to Optional to support hipify-perl generation without actual hipification.
+ Add explicit control of source files specification absence in case of no perl generation.
* Return hipErrorInsufficientDriver status when CPU device not found - no exception thrown
* Return hipErrorInsufficientDriver status when CPU device not found