* Added support of hipOccupancyMaxActiveBlocksPerMultiprocessor & hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags APIs
* Taking into account of SGPR usage to determine the max active blocks in hipOccupancyMaxActiveBlocksPerMultiprocessor()
* [hip] implement the hipExtLaunchMultiKernelMultiDevice API
* add a guard to check the HCC version for acquire_locked_hsa_queue() API which was introdued in HCC for ROCm 2.5
* modified code based on the requested changes
* changes to lock all streams before launching kernels for each device and unlock them after the dispatches
* check each stream to be valid before starting to lock all the streams
* Implement the hipOccupancyMaxPotentialBlockSize function
* Replaced hipGetDeviceProperties() call by ihipGetDeviceProperties() in ihipOccupancyMaxPotentialBlockSize()
* Add test for hipOccupancyMaxPotentialBlockSize in Module API
* Added extern declaration for ihipGetDeviceProperties() to be accessed inside ihipOccupancyMaxPotentialBlockSize()
* fixed hipOccupancyMaxPotentialBlockSize test build issue
* Fix hipOccupancyMaxPotentialBlockSize dtest
* Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest
* Revert "Add BUILD_CMD in hipOccupancyMaxPotentialBlockSize dtest"
This reverts commit 0480ff56f1441fc515d2c26ce33783e303423938.
* Disable hipOccupancyMaxPotentialBlockSize dtest on NVCC
* move extern declaration of ihipGetDeviceProperties to hip_module.cpp
* Update the limiation of 32 wavefronts per CU and 800/512 SGPRs for VI/pre-VI chips to calculate the occupancy
Appended 48 empty bytes to the kernarg area at runtime. The implicit arguments are enabled primarily for the hostcall services
and it is completely abstracted from the user code. Enabled it for both hip-clang and hip-hcc.
Use the code object manager library to parse the code object metadata. Both
code object v2 and v3 formats are now supported for HCC generated binaries.
* Add ersatz for NVRTC.
* Fix extraneous paren and use correct namespace.
* Use lowerCamelCase (yuck, yuck) consistently.
* Link against FS when building hiprtc lib.
* Correctly mark Manipulators. Fix dual compile.
* Add unit tests. Extend HIT to accept linker options.
* Make sure the HIPRTC library is installed.
* Better logging. Try to auto-detect the target.
* Stop specifying the target explicitly.
* Add missing flavour of `hipModuleLaunchKernel`.
* Program was already destroyed.
* Don't use `--genco`. Fix mangled name trimming.
* Fix HIPRTC breakage due to upstream noise.
* [dtests] Replace RUN -> TEST in hiprtc tests
Change-Id: Ie499e92dfe4e5c94634b1c2b76cf52d241bcfea3
* [hit] Set HIP_PATH to HIP_ROOT_DIR for all tests
Change-Id: Ib0ad1f99bc71c03e363e055dd508a7a4a210680a
* Revert "Revert "Use COMgr to read Kernel Args Metadata (#1006)""
This reverts commit a3d118eaa8.
* Revert "Use COMgr to read Kernel Args Metadata (#1006)"
This reverts commit 8a548bf40b.
* Revert "improve program state commentary"
This reverts commit 7aada87cbd.
* Revert "load program state once per agent"
This reverts commit c9117de8eb.
* start moving function_names() into the hip shared lib
* start moving code_object_blobs to a new "state" object
* Consolidate various program state related static objects into a
single program_state object
* minor clean up
* move more stuffs from functional_grid_launch into program_state
* debug make_kernarg
* moving lookup for kernargs size_align into program_state
* clean up old code for kernarg size and alignment
* update hip_module to use newer api in program_state
* Create public member functions for program_state
* move most program state functions into shared library
* Pass the data buffer size to load_executable
Otherwise, it can't figure what the data size is
just from the char* (since the data is not really a string)
* turning free functions in program state into members of program_state_impl
* change the free function globals() into a member of program_state_impl
* replace the static mutex used for populating globals
* moving associate_code_object_symbols_with_host_allocation into
program_state_impl
* move load_code_object_and_freeze_executable into program_state_impl
* moving executables and functions_names into program_state_impl
* moving kernels() into program_state_impl
* moving functions() into program_state_impl
* move get_kernargs into program_state_impl
* moving kernel_descriptor into program_state_impl
* moving kernargs_size_align calculation into program_state_impl
* Changing the handle to program_state_impl to a pointer
* moving program_state_impl into a separate inline source file
* fixing/cleaning up some header file includes
* moving member function for kernargs_size_align into program_state.cpp
* moving Kernel_descriptor into program_state.inl
* add a new class to manage agent globals
* moving all agent globals processing functions into agent_globals_impl
* load program state once per agent
re-merging PR991 against other program state changes
* fix per-agent program state member initialization
* cache executables based on elf name, isa, and agent.
This avoids program state reloading executables after a shared library is dlopened.
re-merging PR1057 against other program state changes
* protect executables cache by a global mutex
* return ref to executables cache
* adapt PR#981 Make hipModuleGetGlobal be in HIP runtime
__hipRegisterFunction is called during by .init functions during program initialization.
It calls hipModuleGetFunction to locate kernel symbol in code objects. hipModuleGetFunction
assumes current device when locating kernel symbols. This works for HCC but not for hip-clang,
since hip-clang needs to locate kernel symbols for different devices without switching
between devices.
This patch introduces a new hsa agent parameter to ihipModuleGetFunction, which allows
__hipRegisterFunction to choose the correct hsa agent when locating kernel symbols. By
default it uses this_agent(), therefore this patch has no impact on HCC.
* Make hipModuleGetGlobal be in HIP runtime so it can be discovered at runtime
In HIP PR #929, quite a few HIP public APIs were made as inline functions with
hidden visibility. It was necessary to support applications with shared
libraries with GPU kernels launched via hipLaunchKernelGGL(), after HIP runtime
is initialized.
In empirical tests, the implementation has been proved to be a bit too
excessive, especially for hipModuleGetGlobal(). The function is used by another
type of client applications which relies on the existence of this function
within HIP runtime so global symbols from HSA code objects loaded dynamically
at runtime can be retrieved programmtically.
This commit moves hipModuleGetGlobal() back to src/hip_module.cpp, and makes it
visible and not inline, to fulfill requirements for applications
aforementioned. It does not change the behavior of applications depending on
hipLaunchKernelGGL().
* Add HIP_INIT_API into the implementation of hipModuleGetGlobal
Address review comments.
* Fix failing HIP unit tests
Issue: Header uses std::vector<Agent_global> agent_globals which is created by hip_module.cpp
- Move iterator fails to copy Agent_global from library source into header version
- Due to different versions of std::string name in struct Agent_global
Fix: Change Agent_global to use char* name instead of std::string name
Issue: mismatch undefined symbols in different user env
- Binary expects modified return value std::string&
- Fails to match libhip_hcc.so: return value is std::string& but doesn't match modified C++ env
Fix: Change return value to char*, create new key std::string in header from char*
* Initial attempt to switch over to internally linked state.
* Add missing CMake update.
* hipLaunchKernelGGLImpl must be inline as well. Ensure internal linkage.
* Ensure global retrieval uses internally linked state.
* Hide HC in the implementation. Minimise ADL woes.
* Strange software exists, and must be catered to.
* Use a less spammy mechanism for ensuring internal linkage / non-export.
* Remove leftover internal detail.
A hash calculated via FNV-1a algorithm is introduced in ihipModule_t, the
internal of hipModule_t. The hash is used by HIP module APIs such as
- read_agent_global_from_module
to determine whether the agent-scope globals for a module have been iterated.
This commit fixes one issue that applications which load / unload modules
frequently would occasionally fail. After deep investigation of the issue it
turns out the old implementation in read_agent_global_from_module uses
hipModule_t as the key, which is not robust enough, as hipModule_t instances
are allocated dynamically so there are cases that one memory address may be
used by multiple hipModule_t instances. The real solution is to introduce a
uniquely identifiable hash for the code object associated with the HIP module.
And that's the rationale behind this commit.