+ Makes hip_Memcpy2D struct compatible with CUDA_MEMCPY2D struct
+ Add hipMemcpyParam2D support in nvcc fallback path
+ Update hipify-clang, tests and docs accordingly
* Add ersatz for NVRTC.
* Fix extraneous paren and use correct namespace.
* Use lowerCamelCase (yuck, yuck) consistently.
* Link against FS when building hiprtc lib.
* Correctly mark Manipulators. Fix dual compile.
* Add unit tests. Extend HIT to accept linker options.
* Make sure the HIPRTC library is installed.
* Better logging. Try to auto-detect the target.
* Stop specifying the target explicitly.
* Add missing flavour of `hipModuleLaunchKernel`.
* Program was already destroyed.
* Don't use `--genco`. Fix mangled name trimming.
* Fix HIPRTC breakage due to upstream noise.
* [dtests] Replace RUN -> TEST in hiprtc tests
Change-Id: Ie499e92dfe4e5c94634b1c2b76cf52d241bcfea3
* [hit] Set HIP_PATH to HIP_ROOT_DIR for all tests
Change-Id: Ib0ad1f99bc71c03e363e055dd508a7a4a210680a
* Revert "Revert "Use COMgr to read Kernel Args Metadata (#1006)""
This reverts commit a3d118eaa8.
* Revert "Use COMgr to read Kernel Args Metadata (#1006)"
This reverts commit 8a548bf40b.
* Revert "improve program state commentary"
This reverts commit 7aada87cbd.
* Revert "load program state once per agent"
This reverts commit c9117de8eb.
* start moving function_names() into the hip shared lib
* start moving code_object_blobs to a new "state" object
* Consolidate various program state related static objects into a
single program_state object
* minor clean up
* move more stuffs from functional_grid_launch into program_state
* debug make_kernarg
* moving lookup for kernargs size_align into program_state
* clean up old code for kernarg size and alignment
* update hip_module to use newer api in program_state
* Create public member functions for program_state
* move most program state functions into shared library
* Pass the data buffer size to load_executable
Otherwise, it can't figure what the data size is
just from the char* (since the data is not really a string)
* turning free functions in program state into members of program_state_impl
* change the free function globals() into a member of program_state_impl
* replace the static mutex used for populating globals
* moving associate_code_object_symbols_with_host_allocation into
program_state_impl
* move load_code_object_and_freeze_executable into program_state_impl
* moving executables and functions_names into program_state_impl
* moving kernels() into program_state_impl
* moving functions() into program_state_impl
* move get_kernargs into program_state_impl
* moving kernel_descriptor into program_state_impl
* moving kernargs_size_align calculation into program_state_impl
* Changing the handle to program_state_impl to a pointer
* moving program_state_impl into a separate inline source file
* fixing/cleaning up some header file includes
* moving member function for kernargs_size_align into program_state.cpp
* moving Kernel_descriptor into program_state.inl
* add a new class to manage agent globals
* moving all agent globals processing functions into agent_globals_impl
* load program state once per agent
re-merging PR991 against other program state changes
* fix per-agent program state member initialization
* cache executables based on elf name, isa, and agent.
This avoids program state reloading executables after a shared library is dlopened.
re-merging PR1057 against other program state changes
* protect executables cache by a global mutex
* return ref to executables cache
* adapt PR#981 Make hipModuleGetGlobal be in HIP runtime
- It's a common mistake by assuming 1 << shamt would be promoted to
64-bit, if shamt is a 64-bit integer. That's not the case. Replace
that left shift to a 64-bit one to ensure it won't fall into undefined
behavior.
- Fix the host-side implementation as well for device function testing.
- Separate the definition of `__HCC_OR_HIP_CLANG__`, `__HCC_ONLY__`, and
`__HIP_CLANG_ONLY__` into hip_common.h so that it could be included in
hip_fp16.h, which may be included separately in app.