1. In kernel/hipDynamicShared
Fix shared memory size and type mismatch in host and kernel.
2. In kernel/hipDynamicShared2
CUDA kernels that rely on dynamic shared memory allocations over 48 KB
must explicitly set the size using hipFuncSetAttribute().
Change-Id: I4248b6cebd3dc156f9d5d427e1897da22fb964ed
[ROCm/hip commit: 5b739b0373]
Remove __HCC__, __HCC_ONLY__, __HCC_CPP__, __HCC_C__,
__HCC_OR_HIP_CLANG__, __HIP_ROCclr__ and the code they guard.
Remove HCC code from directed_tests and samples.
Remove __HIP_PLATFORM_HCC__ and __HIP_PLATFORM_NVCC__ from
some files where they are not necessary.
Add deprecation notice.
Change-Id: I1ae467eafd749d6c25bca204c1724b026be21fce
[ROCm/hip commit: b34dd95124]
1. Rename include/hip/hcc_detail/ to include/hip/amd_detail/
2. Rename include/hip/nvcc_detail/ to include/hip/nvidia_detail/
3. Create __HIP_PLATFORM_AMD__ to replace __HIP_PLATFORM_HCC__
4. Create __HIP_PLATFORM_NVIDIA__ to replace __HIP_PLATFORM_NVCC__
Once hcc_detail, nvcc_detail, __HIP_PLATFORM_HCC__ and __HIP_PLATFORM_NVCC__
have been removed from upstream, they will be removed from the HIP runtime.
Change-Id: I1ae457effd739d6c25bca203c1724b026be21fce
[ROCm/hip commit: c2adc70d4d]
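During the transition described above, code that must build against both old and new headers can test the new macro first and fall back to the deprecated one. The following is only a sketch: EXAMPLE_PLATFORM_NAME and example_platform_name() are hypothetical names for illustration and are not part of HIP.

```cpp
#include <cassert>
#include <cstring>

// Prefer the new platform macros; accept the deprecated ones as a fallback.
// EXAMPLE_PLATFORM_NAME is a hypothetical helper macro, not a HIP macro.
#if defined(__HIP_PLATFORM_AMD__) || defined(__HIP_PLATFORM_HCC__)
#define EXAMPLE_PLATFORM_NAME "amd"
#elif defined(__HIP_PLATFORM_NVIDIA__) || defined(__HIP_PLATFORM_NVCC__)
#define EXAMPLE_PLATFORM_NAME "nvidia"
#else
#define EXAMPLE_PLATFORM_NAME "unknown"  // no HIP platform macro is defined
#endif

// Returns the platform selected at compile time.
const char* example_platform_name() { return EXAMPLE_PLATFORM_NAME; }
```

When neither macro family is defined (for example, when the file is built by a plain host compiler), the sketch reports "unknown" rather than failing the build.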
This test does not work if block size is greater than wave size
since it relies on lock-step execution of the kernel in the block.
If there is more than one wave in the block, the threads in the block
miss synchronization since one wave may finish before another wave.
Due to this bug, the test fails on GFX10 wave32 mode.
This patch fixes that so that it works for block size greater than
wave size.
Change-Id: Ie0097066081df36cb6fe025a71d0ee5a83ec00a2
[ROCm/hip commit: 78269dcbe5]
This makes hipLaunchKernelGGL take a variable argument list, that will be
expanded before being fed to hipLaunchKernelGGLInternal.
This is different from b2edee4693.
We try to accommodate the case when a kernel template has multiple
type parameters.
Change-Id: I87577d402c92b0f3b51e298f8293f4065e1f6de8
[ROCm/hip commit: affe9ab9b5]
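The reason a variable argument list helps with multi-parameter kernel templates is that the preprocessor splits macro arguments at every top-level comma, including the one inside `<int, long>`. The host-only sketch below illustrates the idea. EXAMPLE_LAUNCH, exampleLaunchInternal and addInto are hypothetical stand-ins, not the actual definitions of hipLaunchKernelGGL or hipLaunchKernelGGLInternal.

```cpp
#include <cassert>

// Hypothetical stand-in for hipLaunchKernelGGLInternal. It simply calls the
// "kernel" on the host and ignores the launch dimensions.
template <typename F, typename... Args>
void exampleLaunchInternal(F kernel, int numBlocks, int numThreads, Args... args) {
    (void)numBlocks;
    (void)numThreads;
    kernel(args...);
}

// Variadic wrapper: every argument, kernel name included, travels through
// __VA_ARGS__, so the comma inside a template argument list is preserved.
// A fixed-arity macro such as M(kernel, blocks, threads, a, b) would instead
// see "addInto<int" and "long>" as two separate macro arguments.
#define EXAMPLE_LAUNCH(...) exampleLaunchInternal(__VA_ARGS__)

// A "kernel" template with two type parameters.
template <typename T, typename U>
void addInto(T value, U* out) {
    *out += static_cast<U>(value);
}

// Launches addInto<int, long> through the variadic macro.
long exampleRun() {
    long result = 40;
    EXAMPLE_LAUNCH(addInto<int, long>, 1, 1, 2, &result);
    return result;
}
```

Calling exampleRun() adds 2 to the initial 40 through the macro and returns the sum.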
This makes hipLaunchKernelGGL take a variable argument list, that will be
expanded before being fed to hipLaunchKernelGGLInternal.
Change-Id: Id76e2bf91acd5d68f56a24fc39f219f2eeb06d33
[ROCm/hip commit: 961717879d]
Call hipMalloc() in main() so that the global variable can be initialized.
Change-Id: I9aa1f0a0bb4fa0825d10af0b58c843e7b928e9a3
[ROCm/hip commit: de4c173c6e]
SWDEV-225266: [HIP-VDI] HIP-VDI disabled tests (p2p_copy_coherency.cpp)
SWDEV-225388: hipTestDeviceSymbol.cpp & hipTestConstant.cpp failed to build on hip-vdi
For the hipTestDeviceSymbol.cpp & hipTestConstant.cpp tests:
Currently the "__HIP_VDI__" flag is enabled in CMakeLists.txt, but when an application is compiled with hipcc,
__HIP_VDI__ is not defined, so the headers cannot tell whether they are being compiled for VDI or HCC.
For ./src/runtimeApi/memory/p2p_copy_coherency.cpp:
Fixed the "<hc_am.hpp> not found" compilation error by including the header only when compiling for the HCC runtime.
Currently the test is disabled on all platforms. When validated on a multi-GPU machine,
memcpy between multiple GPUs via GPU synchronization does not work on the hcc and vdi paths.
It needs to be validated on an NVIDIA machine to know whether the test is valid. Disabled the GPU synchronization test for now.
For ./src/runtimeApi/module/hipModuleTexture2dDrv.cpp:
Updated the test to generate the tex2d_kernel.code object in the build directory, which is where ctest currently looks for it.
Change-Id: I629d395a919c2440d921422716944c7940ed6010
[ROCm/hip commit: 3479847d16]
Temporarily comment out the HCC-specific template functions
hipExtLaunchKernelGGL and hipOccupancyMaxPotentialBlockSize for the Clang
compiler so that all test cases under hip/samples can be built
successfully with Clang + HIP/HCC runtime.
Change-Id: Iafc761257be4a7b34eafa6759a01f369570cd6ce
[ROCm/hip commit: b3f445c0f5]
The current implementation skips this procedure for a given device
object when a global symbol is found in the cache. This is incorrect:
- There could be other undefined globals further down the list that have
not been previously encountered.
- If a symbol is found in the cache, it doesn't need to be pinned again,
but it still needs to be defined for the current executable.
Added a special case for the printf buffer symbol (already pinned by HCC).
The bug was exposed by running printf on different GPUs.
[ROCm/hip commit: 6613a37b3b]
* Removed the unneeded #include of sys/time.h, the gettimeofday() calls and the timeval variables. This also avoids a compilation error on Windows, where no equivalent of gettimeofday() is available.
* Changed the macro name from GPU_PRINT_TIME to MY_LAUNCH_MACRO
[ROCm/hip commit: e94c0592de]
- Uses C++11 features, so C++11 was added to the nvcc options
- Arguments for some kernels exceeded 4096 bytes, which is the limit
imposed by nvcc. Reduced BLOCK_DIM_SIZE to 512 to handle this
- Fixed compilation issues on the nvcc path
Change-Id: I14f6b28afcb7c6b24a085fd707b2104e2ed64627
[ROCm/hip commit: 4c41d62435]
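The argument-size limit mentioned above can be checked at compile time. The sketch below assumes a kernel that takes a by-value struct whose size scales with the block dimension. ExampleKernelArgs, its fields and the value 1024 are hypothetical and are not taken from the test that the commit changed.

```cpp
#include <cassert>
#include <cstddef>

// The 4096-byte kernel argument limit quoted in the commit message.
constexpr std::size_t kExampleMaxKernelArgBytes = 4096;

// Hypothetical by-value kernel argument whose size grows with the block size.
template <std::size_t BlockDimSize>
struct ExampleKernelArgs {
    float perThread[BlockDimSize];
    int count;
};

// True when Args fits within the kernel argument limit.
template <typename Args>
constexpr bool exampleFitsKernelArgLimit() {
    return sizeof(Args) <= kExampleMaxKernelArgBytes;
}

// With 1024 elements the struct is larger than 4096 bytes; with 512 it fits.
static_assert(!exampleFitsKernelArgLimit<ExampleKernelArgs<1024>>(),
              "1024 elements should exceed the argument limit");
static_assert(exampleFitsKernelArgLimit<ExampleKernelArgs<512>>(),
              "512 elements should fit within the argument limit");
```

A check like this turns an oversized argument list into a compile error on every platform instead of a failure that only appears on the nvcc path.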