The current implementation skips this procedure for a given device
object when a global symbol is found in the cache. This is incorrect:
- There could be other undefined globals further down the list that
have not been encountered yet
- If a symbol is found in the cache, it doesn't need to be pinned again
but it still needs to be defined for the current executable
Added special case for the printf buffer symbol (already pinned by HCC)
The bug was exposed by running printf on different GPUs.
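
A minimal sketch of the corrected loop, for illustration only. The types
Executable and GlobalsCache and the function define_undefined_globals are
hypothetical stand-ins, not names from the HIP sources; pinning is modelled
as a counter rather than a real runtime call.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; none of these names come from the HIP sources.
struct Executable {
    std::map<std::string, void*> defined;  // symbols defined for this executable
};

struct GlobalsCache {
    std::map<std::string, void*> pinned;   // symbol name -> pinned host address
    int pin_count = 0;

    void* pin(const std::string& name, void* host_ptr) {
        ++pin_count;                       // stands in for the real pinning call
        pinned[name] = host_ptr;
        return host_ptr;
    }
};

using GlobalList = std::vector<std::pair<std::string, void*>>;

// Define every undefined global for `exe`. A cache hit only skips pinning;
// it never ends the loop early and never skips the definition.
void define_undefined_globals(Executable& exe, GlobalsCache& cache,
                              const GlobalList& undefined) {
    for (const auto& entry : undefined) {
        const std::string& name = entry.first;
        void* host_ptr = entry.second;
        assert(host_ptr != nullptr);

        auto it = cache.pinned.find(name);
        void* address = (it != cache.pinned.end())
            ? it->second                   // already pinned: reuse the address
            : cache.pin(name, host_ptr);   // first sighting: pin exactly once
        exe.defined[name] = address;       // always define for this executable
    }
}
```

With this shape, a second executable that shares one global with the first
still gets both of its globals defined, while the shared one is pinned once.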
* Revert "Revert "Use COMgr to read Kernel Args Metadata (#1006)""
This reverts commit a3d118eaa8.
* Revert "Use COMgr to read Kernel Args Metadata (#1006)"
This reverts commit 8a548bf40b.
* Revert "improve program state commentary"
This reverts commit 7aada87cbd.
* Revert "load program state once per agent"
This reverts commit c9117de8eb.
* start moving function_names() into the hip shared lib
* start moving code_object_blobs to a new "state" object
* Consolidate various program state related static objects into a
single program_state object
* minor clean up
* move more stuff from functional_grid_launch into program_state
* debug make_kernarg
* moving lookup for kernargs size_align into program_state
* clean up old code for kernarg size and alignment
* update hip_module to use newer api in program_state
* Create public member functions for program_state
* move most program state functions into shared library
* Pass the data buffer size to load_executable
Otherwise, it can't figure out what the data size is
just from the char* (since the data is not really a string)
* turning free functions in program state into members of program_state_impl
* change the free function globals() into a member of program_state_impl
* replace the static mutex used for populating globals
* moving associate_code_object_symbols_with_host_allocation into
program_state_impl
* move load_code_object_and_freeze_executable into program_state_impl
* moving executables and functions_names into program_state_impl
* moving kernels() into program_state_impl
* moving functions() into program_state_impl
* move get_kernargs into program_state_impl
* moving kernel_descriptor into program_state_impl
* moving kernargs_size_align calculation into program_state_impl
* Changing the handle to program_state_impl to a pointer
* moving program_state_impl into a separate inline source file
* fixing/cleaning up some header file includes
* moving member function for kernargs_size_align into program_state.cpp
* moving Kernel_descriptor into program_state.inl
* add a new class to manage agent globals
* moving all agent globals processing functions into agent_globals_impl
* load program state once per agent
re-merging PR991 against other program state changes
* fix per-agent program state member initialization
* cache executables based on elf name, isa, and agent.
This avoids program state reloading executables after a shared library is dlopened.
re-merging PR1057 against other program state changes
* protect executables cache by a global mutex
* return ref to executables cache
* adapt PR#981 Make hipModuleGetGlobal be in HIP runtime
* Initial attempt to switch over to internally linked state.
* Add missing CMake update.
* hipLaunchKernelGGLImpl must be inline as well. Ensure internal linkage.
* Ensure global retrieval uses internally linked state.
* Hide HC in the implementation. Minimise ADL woes.
* Strange software exists, and must be catered to.
* Use a less spammy mechanism for ensuring internal linkage / non-export.
* Remove leftover internal detail.
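
The executables cache described in the list above (keyed on elf name, isa,
and agent, guarded by a mutex, and handed out by reference) could look
roughly like the following. This is a sketch under assumptions:
ExecutableCache, AgentHandle, ExecutableList and get_or_load are hypothetical
names, the agent is reduced to an integer handle, and the mutex is a member
here rather than a global.

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical placeholders for the real agent and executable handles.
using AgentHandle = std::uint64_t;
using ExecutableList = std::vector<int>;
// Key order: elf name, isa, agent.
using CacheKey = std::tuple<std::string, std::string, AgentHandle>;

class ExecutableCache {
public:
    // Returns the cached executables for the key, calling `load` only on a
    // miss. The loader runs while the lock is held, so concurrent callers
    // never load the same key twice.
    template <typename Loader>
    const ExecutableList& get_or_load(const std::string& elf,
                                      const std::string& isa,
                                      AgentHandle agent, Loader load) {
        std::lock_guard<std::mutex> lock{mutex_};
        CacheKey key{elf, isa, agent};
        auto it = entries_.find(key);
        if (it == entries_.end()) {
            it = entries_.emplace(std::move(key), load()).first;
        }
        // std::map never relocates its elements on insertion, so the
        // returned reference stays valid as the cache grows.
        return it->second;
    }

private:
    std::mutex mutex_;
    std::map<CacheKey, ExecutableList> entries_;
};
```

A repeated lookup with the same elf name, isa, and agent returns the same
object without reloading, which is the behaviour wanted after a shared
library is dlopened.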
The logic to parse the kernel metadata is unaware that enabling
early finalization can result in multiple code blobs in a
single .kernel section. This teaches the HIP runtime to handle
that.
Change-Id: I1581b42f0da8b30233d7898014f7468728c1d489
When compiling with Early Finalization enabled in HCC,
the resulting .kernel section of the host object may now
contain more than one device code bundle. This change
teaches the HIP runtime to correctly extract all the
bundles from the .kernel section.
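
As a rough illustration of extracting every bundle rather than only the
first, the sketch below splits a section buffer at each occurrence of a
bundle magic string. It assumes each bundle starts with the
clang-offload-bundler magic "__CLANG_OFFLOAD_BUNDLE__" and runs until the
next magic or the end of the section; a real parser would more likely use
the sizes recorded in each bundle header. BundleSpan and find_bundles are
hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Offset and length of one bundle inside the section (hypothetical type).
struct BundleSpan {
    std::size_t offset;
    std::size_t size;
};

// Walk the whole section and record every bundle. The data is binary, so it
// is carried as a sized buffer and never treated as a C string.
std::vector<BundleSpan> find_bundles(const std::vector<char>& section) {
    static const std::string magic = "__CLANG_OFFLOAD_BUNDLE__";
    const auto step = static_cast<std::ptrdiff_t>(magic.size());

    std::vector<BundleSpan> bundles;
    auto it = std::search(section.begin(), section.end(),
                          magic.begin(), magic.end());
    while (it != section.end()) {
        // The next bundle, if any, begins at the next magic string.
        auto next = std::search(it + step, section.end(),
                                magic.begin(), magic.end());
        bundles.push_back({static_cast<std::size_t>(it - section.begin()),
                           static_cast<std::size_t>(next - it)});
        it = next;
    }
    return bundles;
}
```

Each returned span can then be handed to the existing single-bundle
parsing logic.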