* Disable device side malloc
Currently device side malloc is not working and takes excessive
device memory.
Disable it for now until a working malloc is implemented.
Change-Id: I1ad908c1c53a83752383b4be96688a848642c699
The existing one can have issues on certain systems, therefore this limits use of direct memcpy via largeBAR to sizes where it is unequivocally better.
Also addresses SWDEV-220030 and SWDEV-222237.
=> New ROCr calculates pitch as per HSA specification and addrlib is used to check whether HW can support that configuration. Hence few texture tests are failing with HSA_EXT_STATUS_ERROR_IMAGE_PITCH_UNSUPPORTED.
=> Determine pitch for linear images and always pass rowpitch to HSA API's.
SWDEV-151670: Issue with 3D texture with 4 components
SWDEV-151671: Issue with 2D layered texture with 4 components
Fixed memcpy when memory is allocated with driver API's.
Github issues: #1755
Fixed 3D default case when array type is not set during memory allocation.
Reverting #1673, #1697 and #1707.
Support for hipMemcpyWithStream and memcpy optimizations, will be brought in again once issues seen with these are resolved independently.
* hipMemset et al can use HSA API directly for synchronous cases
* lock and flush stream in hipMemset, hold lock until complete
* move hipMemset async check to front of conditional
* use hsa_amd_memory_fill for additional sync memset cases
code cleanup/review for all memset calls
* Fix inversion of execution mutating value.
* ihipMemsetSync fall back to kernel if HSA memset fails
* Never fallback, never surrender.
* Allow NULL stream.
* Optimise memset kernel. Remove deadwood.
* Update hip_memory.cpp
* Clean up stream logic in sync memset
* Revert "Clean up stream logic in sync memset"
This reverts commit 6117dedf673367f44cc704192573a117a3d92477.
Texture 2D image mapping for pitched arrays:
github issue: Texture Object's Buffer seems to be Misaligned #886
JIRA ticket: SWDEV-199313
SWDEV-151670 : Fixed issue with 3D texture with 4 components
SWDEV-151671 : Issue with 2D layered texture with 4 components
* [hip] add support for implicit kernel argument for multi-grid sync
* modified code for calculating the prev_sum
* change the impCoopArg type to size_t
* add memory clean up
* launch init_gws and main kernels into two separate loops
Added new memory API's hipMemAllocPitch, hipMemAllocHost, hipMemsetD16, hipMemsetD16Async, hipMemsetD8Async
Modified to support all scenarios hipMemcpyParam2DAsync, hipMemcpyParam2D.
* all thread local access now through single struct
* clean up old commented-out code, more use of GET_TLS()
* fewer calls to GET_TLS by passing tls as a funtion argument
* revert unnecessary change to printf
* fix failing tests due to TLS change
* fix merge conflicts in ihipOccupancyMaxActiveBlocksPerMultiprocessor
[Reason] To be compatible with CUDA [#1133]
Update HIP code, hipify-clang, tests and docs
[TODO] Add support of the corresponding functions on nvcc fallback path
+ Makes hip_Memcpy2D struct compatible with CUDA_MEMCPY2D struct
+ Add hipMemcpyParam2D support in nvcc fallback path
+ Update hipify-clang, tests and docs accordingly
* In hipFree, if memory is associated with a device, synchronize that device's streams.
This changes the behavior from synchronizing the currently set TLS device.
* All devices sync in hipFree for _appId=-1 case.
* Revert "All devices sync in hipFree for _appId=-1 case."
This reverts commit 1efb34d6a8426661e45bc5f763422a1147aeac10.
* add HIP_SYNC_FREE env var