Make hipModuleGetGlobal match cuModuleGetGlobal behavior.
That is, if one of the first two parameters is nullptr, ignore it.
Change-Id: I3fe6dbc35a7b14aa9119df297b7885df83d28048
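
The null-tolerant behavior described above can be sketched as follows. This is an illustration only, not the HIP implementation: `Symbol` and `lookupGlobal` are hypothetical stand-ins for a module symbol lookup, and only the handling of the two output pointers is the point.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record for a global variable found in a loaded module.
struct Symbol {
    void*       address;
    std::size_t size;
};

// Writes the symbol's address and size to the output parameters.
// Either output pointer may be nullptr, in which case it is skipped.
// Returns false only when the symbol itself is missing.
bool lookupGlobal(void** dptr, std::size_t* bytes, const Symbol* sym) {
    if (sym == nullptr) return false;
    if (dptr != nullptr) *dptr = sym->address;
    if (bytes != nullptr) *bytes = sym->size;
    return true;
}

// Usage: a caller that only wants the size passes nullptr for the address.
inline void usageExample() {
    int value = 0;
    Symbol sym{&value, sizeof(value)};
    std::size_t bytes = 0;
    bool ok = lookupGlobal(nullptr, &bytes, &sym);
    assert(ok && bytes == sizeof(int));
}
```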
Remove hip-hcc code from the HIP code base
Simplify hip CMakeLists.txt to exclude hip-hcc
Simplify cmake cmd for hip-rocclr building
Some minor fixes
Change-Id: I1ae357ecfd638d6c25bca293c1724b026be21ecd
There is a build error when building HIP with the latest HCC from GitHub after PR #1935 was merged into the HIP master branch. That PR changed blockDimX to blockDim, but two lines missed the change; the current PR updates them.
* Fix cooperative launch APIs to set hipGetLastError
Previously, the cooperative launch APIs did not properly log their
errors in the global hipGetLastError variable before returning back
to the user. As such, the APIs would leave hipSuccess in the
last error, which would break some use cases.
This fixes that problem by making a trampoline function that does
the HIP_INIT_API and ihipLogStatus.
* Add missing flag to the log of multi-GPU launch
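
A rough sketch of the trampoline idea from the cooperative launch fix above, assuming a per-thread "last error" slot. All names here (`Status`, `launchImpl`, `launch`, `getLastError`) are hypothetical; the real code uses HIP_INIT_API and ihipLogStatus, whose internals are not shown in this log. The sketch records every result, which is a simplification.

```cpp
#include <cassert>

// Hypothetical status codes; real HIP uses hipError_t.
enum Status { kSuccess = 0, kInvalidValue = 1 };

// Per-thread record of the most recent API result, standing in for the
// state that hipGetLastError reads.
thread_local Status g_lastError = kSuccess;

// Internal worker: does the work and returns a status, but does not
// touch g_lastError itself.
Status launchImpl(int gridSize) {
    return gridSize > 0 ? kSuccess : kInvalidValue;
}

// Public trampoline: forwards to the worker, then records the result so a
// later getLastError() call can observe it.
Status launch(int gridSize) {
    Status s = launchImpl(gridSize);
    g_lastError = s;
    return s;
}

// Returns the recorded status and resets it to success.
Status getLastError() {
    Status s = g_lastError;
    g_lastError = kSuccess;
    return s;
}

// Usage: a failed launch is visible through getLastError() afterwards.
inline void usageExample() {
    Status s = launch(0);
    assert(s == kInvalidValue);
    assert(getLastError() == kInvalidValue);
}
```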
This PR ensures that the maxThreadsPerBlock returned by hipFuncGetAttributes is a multiple of the warp size and that the register usage of the maximum block does not exceed the number of available registers.
Fixes #1662
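
The maxThreadsPerBlock adjustment described above could look roughly like this. The function name and the register model (registers per thread times block size, checked against one pool) are assumptions made for illustration; real hardware allocates registers in granules, and the PR's actual computation is not shown here.

```cpp
#include <cassert>

// Largest block size that (a) is a multiple of warpSize, (b) does not exceed
// hardwareMax, and (c) keeps regsPerThread * blockSize within regsAvailable.
// Returns 0 when not even a single warp fits.
unsigned clampMaxThreadsPerBlock(unsigned hardwareMax, unsigned warpSize,
                                 unsigned regsPerThread, unsigned regsAvailable) {
    assert(warpSize > 0);
    unsigned limit = hardwareMax;
    if (regsPerThread > 0) {
        unsigned byRegs = regsAvailable / regsPerThread;  // threads the register pool allows
        if (byRegs < limit) limit = byRegs;
    }
    return (limit / warpSize) * warpSize;  // round down to a whole number of warps
}
```

For example, with a hardware limit of 1024 threads, a warp size of 64, 100 registers per thread and 65536 available registers, the register pool allows 655 threads, which rounds down to 640.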
Fix two errors in hipOccupancyMaxActiveBlocksPerMultiprocessor.
1) Fix a possible segfault if the user passed in a null pointer for
the numBlocks value.
2) Handle the situation when the user is asking for a block size
that is larger than what the target device can hold within a
single block.
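
The two guards listed above can be illustrated with a simplified occupancy query. Everything here is hypothetical: the names, the thread-count-only occupancy model, and the choice to report zero blocks for an oversized block size (the commit does not say which result it returns in that case).

```cpp
#include <cassert>

enum OccStatus { kOccSuccess = 0, kOccInvalidValue = 1 };

// Simplified occupancy query. Only the thread count limits residency here;
// real occupancy also depends on registers and local memory.
OccStatus maxActiveBlocks(unsigned* numBlocks, unsigned blockSize,
                          unsigned maxThreadsPerBlock, unsigned maxThreadsPerCU) {
    if (numBlocks == nullptr) return kOccInvalidValue;  // guard 1: never dereference null
    if (blockSize == 0) return kOccInvalidValue;        // avoid dividing by zero below
    if (blockSize > maxThreadsPerBlock) {               // guard 2: block cannot launch at all
        *numBlocks = 0;
        return kOccSuccess;
    }
    *numBlocks = maxThreadsPerCU / blockSize;
    return kOccSuccess;
}

// Usage: an oversized block yields zero active blocks instead of a crash.
inline void usageExample() {
    unsigned blocks = 99;
    OccStatus s = maxActiveBlocks(&blocks, 2048, 1024, 2560);
    assert(s == kOccSuccess && blocks == 0);
}
```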
* Fix bug in LaunchKernel test
Instead of passing the address of the gpu buffer, pass the address
of the pointer that holds the address of the gpu buffer
* Fix hipLaunchKernel's kernarg buffer construction.
The hipLaunchKernel implementation should rely on ihipModuleLaunchKernel
to construct the kernarg buffer correctly based on kernel metadata.
* Fix a bug in get_functions where the Kernel_descriptor wasn't constructed with the correct kernarg layout information.
* Fix a bug in kernarg layout parsing for kernels without any arguments
* Teach ihipModuleLaunchKernel to handle kernels without any arguments
* Add a more interesting test
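
The LaunchKernel test fix above comes down to what each entry of a `void** args` array must point at. The sketch below uses host memory as a stand-in for a GPU buffer, and `readPointerArg` is a hypothetical helper that mimics how a launcher would copy a pointer-sized argument out of the array; it is not part of HIP.

```cpp
#include <cassert>
#include <cstring>

// Copies one pointer-sized argument out of the args array. Each args[i] must
// point AT the argument value, so for a buffer argument it must be the
// address of the variable that holds the buffer pointer.
void* readPointerArg(void** args, int index) {
    void* value = nullptr;
    std::memcpy(&value, args[index], sizeof(value));
    return value;
}

// Usage: pass &bufPtr (address of the pointer), not bufPtr (the buffer itself).
inline void usageExample() {
    int buffer[4] = {0, 0, 0, 0};  // stands in for a GPU allocation
    int* bufPtr = buffer;
    void* args[] = {&bufPtr};
    assert(readPointerArg(args, 0) == buffer);
}
```

Passing `bufPtr` directly would make the launcher read the buffer's contents as if they were the pointer value, which is the bug the test had.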
This PR is a follow-up on PR #1698 and it makes two more APIs (hipLaunchCooperativeKernel/hipLaunchCooperativeKernelMultiDevice) inline so that they can work correctly with lazy binding.
[Background] It was found that if lazy linking is used for a library that calls the hipExtLaunchMultiKernelMultiDevice API, then this API can get the wrong program_state object for looking up device kernels, leading to a "No device code available" error in this API.
To fix this issue, the API was refactored to be inline and get and pass the correct program_state to an internal hip API to request a multi-device kernel launch.
* [hip] add support for implicit kernel argument for multi-grid sync
* modified code for calculating the prev_sum
* change the impCoopArg type to size_t
* add memory clean up
* launch init_gws and main kernels into two separate loops
* [hip] add initial implementation for hipLaunchCooperativeKernel API
* [hip] use total number of work groups to initialize the GWS resource
* [hip] use only one argument for init_gws kernel
* [hip] use the device associated with the stream for checking the device properties
* Add support for hipFuncGetAttribute
* Support NVCC path
* Test using sample module_api_global
* Try fixing CI build failure due to hip_prof_gen scan
* Fix for CI build issue
* Resolve conflict
* Rebase and resolve conflicts with master
* Fix build error
* Fix NVCC path build error
* all thread local access now through single struct
* clean up old commented-out code, more use of GET_TLS()
* fewer calls to GET_TLS by passing tls as a function argument
* revert unnecessary change to printf
* fix failing tests due to TLS change
* fix merge conflicts in ihipOccupancyMaxActiveBlocksPerMultiprocessor
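
The TLS refactoring above (one struct for all thread-local state, fetched once and then passed down) might be sketched like this. `TlsData`, its fields, `getTls`, `logStatus` and `apiCall` are hypothetical names; the real GET_TLS macro and the contents of the struct are not shown in this log.

```cpp
#include <cassert>

// All per-thread state lives in a single struct (fields are illustrative).
struct TlsData {
    int lastError = 0;
    int apiSeqNum = 0;
};

// Single access point for the thread-local struct.
TlsData* getTls() {
    thread_local TlsData data;
    return &data;
}

// Helpers take the struct as an argument instead of looking it up again.
void logStatus(TlsData* tls, int status) {
    tls->lastError = status;
}

// An API entry point fetches the struct once and passes it down.
int apiCall(int status) {
    TlsData* tls = getTls();
    ++tls->apiSeqNum;
    logStatus(tls, status);
    return status;
}

// Usage: state written during the call is visible through the same struct.
inline void usageExample() {
    int before = getTls()->apiSeqNum;
    apiCall(3);
    assert(getTls()->lastError == 3);
    assert(getTls()->apiSeqNum == before + 1);
}
```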
* Add support for the hipOccupancyMaxActiveBlocksPerMultiprocessor and hipOccupancyMaxActiveBlocksPerMultiprocessorWithFlags APIs
* Take SGPR usage into account when determining the max active blocks in hipOccupancyMaxActiveBlocksPerMultiprocessor()