Fix issues of missing kernel function symbols and missing argument list via
using __hipRegister* functions.
Then the following tests can pass,
directed_tests/runtimeApi/module/hipFuncGetAttributes
directed_tests/runtimeApi/module/hipExtLaunchMultiKernelMultiDevice
directed_tests/gcc/LaunchKernel
Change-Id: I52135b61e8283eb4f9f10f77895151e4e55418d9
Fix compilation error on hip-hcc+clang , hip-vdi+clang
Enabled hipExtLaunchMultiKernelMultiDevice test on hip-vdi path
hipExtLaunchMultiKernelMultiDevice common declaration for all paths
Change-Id: I76031840614fce8e12a8e845548fa43a389a741a
Review comments - generate hiprtc lib everytime when HIP_PLATFORM is hcc
Changes for hip-clang
Removing pre processor directive to simplify
Change-Id: Id38ab368362b58ee0458baeb8051fea709ae6bba
Make __gnu_h2f_ieee and __gnu_f2h_ieee visible so that hipTestHalf
test can succeed in Clang compiler + Hcc RT.
Change-Id: I5f7d5db19e559b3b66356f0170a8dbc1e5505f3e
The existing one can have issues on certain systems, therefore this limits use of direct memcpy via largeBAR to sizes where it is unequivocally better.
Also addresses SWDEV-220030 and SWDEV-222237.
For hipLaunchKernelGGL(), hmod->kernargs is empty, thus we need
insert hmod->kernargs[name_str] which is empty.
Change-Id: I95f818d0525da84452e66c5778f0648a643843c7
Fix the following issues:
1.Ignore hidden arguments of kernel functions.
2.Look up both origial function name and function name with .kd postfix
when argments are retrived from module.
3.Addition, fix compiling issue of LaunchKernel test app.
Change-Id: I9400943f2f02433cb4409b19c0cac3626c2bc454
* Use deque instead of vector for code readers so that the iterators and references will be stable
* Fix compile error
* Assign the iterator
* Add multithreaded test
* Make threads a multiple of hardware concurrency
* Output on failure
* Add setDevice to try and initialize the context on cuda
* Create context for cuda
* Set context on each thread
* Reduce threads on cuda
* Skip test on cuda
* Try to initialize the primary context on cuda
* Push ctx to the stack as current
* Revert "Push ctx to the stack as current"
This reverts commit bff8cbe950.
* Revert "Try to initialize the primary context on cuda"
This reverts commit fd98514113.
* updated test for nvidia path
* Add c++11 option for nvcc
Co-authored-by: satyanveshd <53337087+satyanveshd@users.noreply.github.com>
hipDeviceEnablePeerAccess returns success and adds peer into the list even if it is not accessible which creates problem in hipMalloc when it tries to share the ptr to peer device.
Proposed change is to check the access status before updating the peer list and update only when it can access the peer.
* Device texture functions should not normalize the sampled pixel. This is already done by HW.
* Add support to use h/w capability for normalized float data convertion for driver API's
Co-authored-by: ansurya <50609411+ansurya@users.noreply.github.com>
* Fix bug in LaunchKernel test
Instead of passing the address of the gpu buffer, pass the address
of the pointer that holds the address of the gpu buffer
* Fix hipLaunchKernel's kernarg buffer construction.
The hipLaunchKernel implementation should rely on ihipModuleLaunchKernel
to construct the kernarg buffer correctly based on kernel metadata.
* Fix a bug in get_functions where the Kernel_descriptor wasn't constructed with the correct kernarg layout information.
* Fix a bug in kernarg layout parsing dealing with kernel without any arg
* teach ihipModuleLaunchKernel to handle kernel without any arg
* Add a more interesting test
* Add missing texturePitchAlignment member to the hipDeviceProp_t struct.
* Add missing hipDeviceAttributeTexturePitchAlignment enumerator to the hipDeviceAttribute_t enum.
* Initialize texturePitchAlignment to 256. This works for gfx9+, but is technically overaligned in most cases for pre-gfx9.
* Add the texturePitchAlignment property to the NVCC path.
The current implementation skips this procedure for a given device
object when a global symbol is found in the cache. This is incorrect:
- There could be other undefined globals that have not been previously
encountered further down the list
- If a symbol is found in the cache, it doesn't need to be pinned again
but it still need to be defined for the current executable
Added special case for the printf buffer symbol (already pinned by HCC)
The bug was exposed by running printf on different GPUs.
=> New ROCr calculates pitch as per HSA specification and addrlib is used to check whether HW can support that configuration. Hence few texture tests are failing with HSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED.
=> Determine pitch for linear images and always pass rowpitch to HSA API's.
Fixes SWDEV-218626 and SWDEV-218629
Changes:
- Revert "`static inline` in a header, just like excess sugar in a diet, causes bloat (#1692)"
This reverts commit be70b9f7e7.
- Revert "Fix rocFFT build failure (#1777)"
This reverts commit 753277422a.
This PR is a follow-up on PR# #1698 and it makes two more APIs (hipLaunchCooperativeKernel/hipLaunchCooperativeKernelMultiDevice) inline so that they can work correctly with lazy binding.
SWDEV-151670: Issue with 3D texture with 4 components
SWDEV-151671: Issue with 2D layered texture with 4 components
Fixed memcpy when memory is allocated with driver API's.
Github issues: #1755
Fixed 3D default case when array type is not set during memory allocation.