- The driver code should not re-define `tex` again as it's already
defined in the kernel code. Eventually, the driver code should be as
regular C++ code instad of HIP code.
Change-Id: I8c7cab204b98990619d6e7109b990d7089ea9261
1.Combine libamdhip64_static_base.a and libamdvdi_static.a into libamdhip64_static.a.
2.Let hipcc use -use-staticlib to link libamdhip64_static.a.
3.Add some samples for static lib.
4.Fix compiling failure of code object.
Change-Id: Ic8c95228eb139058da8b5d66ba8439486154ca6f
This reverts commit ed3b0eb391.
Reason for revert: It is causing dkms-no-npi-hipclang broken.
It is top priority to maintain dkms-no-npi-hipclang build, otherwise we lose track of regression analysis.
So revert the change for now and recommit it after fixing it.
Change-Id: Ia5136e888baecb6148c6c18eedbf37066fcb1eaa
1.Combine libamdhip64_static_base.a and libamdvdi_static.a into libamdhip64_static.a.
2.Let hipcc use -use-staticlib to link libamdhip64_static.a.
3.Add some samples for static lib.
4.Fix compiling failure of code object.
Change-Id: Ia2333622a8d05639b90974c4c5d3d85654ba0138
The maxSharedMemoryPerMultiProcessor attribute is meant to describe
the number of bytes of shared memory (LDS space in AMD terminology)
in each SM (CU in AMD terminology). For instance, on AMD GPUs this
is often 64KB per CU, and some Nvidia GPUs it's 96KB per SM.
This shared memory is a different address space from the normal
global memory. However, the current HIP-HCC properties fill this
in with a size that matches the totalGlboalMem property. This gives
a drastically too-high calculation for the amount of LDS space that
each CU has -- tens of GBs vs. 10s of KBs.
This patch fixes this by pulling the maxSharedMemoryPerMultiProcessor
property from the HSA pool that describes how much workgroup-local
space is available on each CU. The HSA runtime eventually pulls
this from the topology information about LDSSizeInKB, defined as
"Size of Local Data Store in Kilobytes per SIMD".
Previously, this HSA query was used to fill in the value of the
sharedMemPerBlock property. On today's AMD GPUs, we know that
the amount of LDS avaialble to the workgroup is identical to the
amount of LDS space in the CU. However, in the future this may
differ. As such, this patch changes around the order and fills
in the "PerMultiProcessor" property from the HSA query (since
what's what the query is defined to return), and then separately
fills in the "PerBlock" property as we know it.
Temporarily comment out Hcc-specific template functions
hipExtLaunchKernelGGL and hipOccupancyMaxPotentialBlockSize for CLang
compiler so that all test cases under hip/samples can be built
successfully for Clang + Hip/Hcc runtime.
Change-Id: Iafc761257be4a7b34eafa6759a01f369570cd6ce
* Use hipExtLaunchKernelGGL in dispatchlatency sample
* Let it run on NVCC path too
* Refactoring
* Add test_kernel source
* Remove ResultDB
* Remove error checks
Fixes SWDEV-203394.
Currently in runTest() returns true, even if the texture reference copy does not happen. Using the existing testResult Flag to return from runTest().
* occupancy.cpp with Makefile
* occupancy sample changes according tothe comments
* Changes according to the review comments
* Occupancy Sample Changes
* Changes according to review comments
* Add support for hipFunGetAttribute
* Support NVCC path
* Test using sample module_api_global
* Try fixing CI build failure due to hip_prof_gen scan
* Fix for CI build issue
* Resolve conflict
* Rebase and resolve conflicts with master
* Fix build error
* Fix NVCC path build error
[Reason] To be compatible with CUDA [#1133]
Update HIP code, hipify-clang, tests and docs
[TODO] Add support of the corresponding functions on nvcc fallback path
module_api_global relies on a HCC only feature which allows host code
to write to device variables. This feature does not exist in CUDA
or hip-clang, which causes the sample not working in CUDA or hip-clang.
This patch fixes the sample by using standard features of CUDA and
hip-clang. The fixed sample works in HCC, CUDA and hip-clang.
module_api_global relies on a HCC only feature which allows host code
to write to device variables. This feature does not exist in CUDA
or hip-clang, which causes the sample not working in CUDA or hip-clang.
This patch fixes the sample by using standard features of CUDA and
hip-clang. The fixed sample works in HCC, CUDA and hip-clang.
+ Makes hip_Memcpy2D struct compatible with CUDA_MEMCPY2D struct
+ Add hipMemcpyParam2D support in nvcc fallback path
+ Update hipify-clang, tests and docs accordingly