Enable as many NV printf DTests as possible.
Fix bugs caused by behavior differences between
HIP-ROCclr and CUDA.
Add hipLimitPrintfFifoSize.
Change-Id: I3fe6dbc35a7a140a9919df197b7885df83d28049
[ROCm/hip commit: 586165ebc2]
Add test on CXX and Fortran build in cmake.
Add test on hip::device interface linking in cmake.
Change-Id: I3fe6dba05a7a140a9a19df107b7885df83d28042
[ROCm/hip commit: 818aa18d59]
HIP supports emitting two types of static libraries. One type
will export host functions and is compatible with host linkers.
The second type exports device functions, but is generated with
ar manually. Also, add a README with steps on how to run these
samples with Makefile or CMake.
Change-Id: I1be15c2884583b370092bc8e4bf04f726f8f5a27
[ROCm/hip commit: cfcf04d502]
This reverts commit 4ca1d84a26.
Hold wall time related updates till direct dispatch is ready.
Change-Id: I53b232f6f51bc2fc71b6b639fe0081e2907e9707
[ROCm/hip commit: 0130a166db]
Remove __HCC__, __HCC_ONLY__, __HCC_CPP__, __HCC_C__,
__HCC_OR_HIP_CLANG__, __HIP_ROCclr__ and the code they guard.
Remove HCC code from directed_tests and samples.
Remove __HIP_PLATFORM_HCC__ and __HIP_PLATFORM_NVCC__ from
some files where they are not necessary.
Add deprecation notice.
Change-Id: I1ae467eafd749d6c25bca204c1724b026be21fce
[ROCm/hip commit: b34dd95124]
1. Rename include/hip/hcc_detail/ to include/hip/amd_detail/
2. Rename include/hip/nvcc_detail/ to include/hip/nvidia_detail/
3. Create __HIP_PLATFORM_AMD__ to replace __HIP_PLATFORM_HCC__
4. Create __HIP_PLATFORM_NVIDIA__ to replace __HIP_PLATFORM_NVCC__
After hcc_detail, nvcc_detail, __HIP_PLATFORM_HCC__ and __HIP_PLATFORM_NVCC__
have been removed from upstream, they will be removed from the HIP runtime.
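
For code that has to build both before and after this rename, a
transition guard could look like the minimal sketch below. It assumes
the build defines one of the platform macros named above.
EXAMPLE_PLATFORM_NAME and example_platform_name() are hypothetical names
used only for illustration; they are not part of HIP.

```cpp
#include <string>

// Prefer the new platform macros, but still accept the deprecated ones
// so the same source builds during the transition period.
#if defined(__HIP_PLATFORM_AMD__) || defined(__HIP_PLATFORM_HCC__)
#define EXAMPLE_PLATFORM_NAME "amd"
#elif defined(__HIP_PLATFORM_NVIDIA__) || defined(__HIP_PLATFORM_NVCC__)
#define EXAMPLE_PLATFORM_NAME "nvidia"
#else
// Neither platform macro is defined, e.g. a plain host-only build.
#define EXAMPLE_PLATFORM_NAME "unknown"
#endif

// Hypothetical helper: reports which branch of the guard was taken.
inline std::string example_platform_name() {
    return EXAMPLE_PLATFORM_NAME;
}
```

Once the deprecated macros are gone, the second operand of each
condition can be dropped without touching the rest of the code.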
Change-Id: I1ae457effd739d6c25bca203c1724b026be21fce
[ROCm/hip commit: c2adc70d4d]
HIP supports compiling kernels from LLVM IR into executable.
The device LLVM IR needs to be compiled into a fat binary
object. This device object is embedded into a host object using
llvm-mc directives. Then, any host linker may link the host and
device objects together into an executable. A README was added.
Change-Id: I8ebb6ae86b7ab4290f7cba2eea5584d73a7c453e
[ROCm/hip commit: 8a5b8a36f2]
HIP supports compiling kernels from assembly into an executable.
The device assembly needs to be compiled into a fat binary
object. This device object is embedded into a host object using
llvm-mc directives. Then, any host linker may link the host and
device objects together into an executable. A README is added.
Change-Id: I59d3a8b5363073810ffc3aa0d57f21b0df272369
[ROCm/hip commit: 33f0a41c7a]
1. Link directed_test apps against the static libs
of hip, rocclr, rocr, roct and amd_comgr.
2. Remove custom_target amdhip64_static_combiner.
3. Support EXCLUDE_HIP_LIB_TYPE <static|shared>.
4. Simplify argument list parsing.
5. Install rocclr when rocm is installed.
6. Fix some small pre-existing bugs.
Revert "Revert "Make directed_test support static libs""
This reverts commit 4a8a95a8e9.
Change-Id: I918eeae94487e5e2ff5bfde083667ac65fb6e702
[ROCm/hip commit: bcd067f462]
Only the CMake build supports the HIP runtime static library.
The samples will therefore support the HIP runtime static
library once this is done.
Change-Id: I70e8d06e85084369a035b42c5d1d56287c874ac9
[ROCm/hip commit: 8f72a6993f]
- The driver code should not redefine `tex`, as it is already
defined in the kernel code. Eventually, the driver code should be
regular C++ code instead of HIP code.
Change-Id: I8c7cab204b98990619d6e7109b990d7089ea9261
[ROCm/hip commit: 74ba25602b]
1. Combine libamdhip64_static_base.a and libamdvdi_static.a into libamdhip64_static.a.
2. Let hipcc use -use-staticlib to link libamdhip64_static.a.
3. Add some samples for static lib.
4. Fix compilation failure of code object.
Change-Id: Ic8c95228eb139058da8b5d66ba8439486154ca6f
[ROCm/hip commit: da27fd2b09]
This reverts commit 8a42ac4d03.
Reason for revert: it breaks the dkms-no-npi-hipclang build.
Keeping the dkms-no-npi-hipclang build working is top priority; otherwise we lose track of regression analysis.
Revert the change for now and recommit it once the breakage is fixed.
Change-Id: Ia5136e888baecb6148c6c18eedbf37066fcb1eaa
[ROCm/hip commit: f246761dee]
1. Combine libamdhip64_static_base.a and libamdvdi_static.a into libamdhip64_static.a.
2. Let hipcc use -use-staticlib to link libamdhip64_static.a.
3. Add some samples for static lib.
4. Fix compilation failure of code object.
Change-Id: Ia2333622a8d05639b90974c4c5d3d85654ba0138
[ROCm/hip commit: 4c2ab3f41e]
The maxSharedMemoryPerMultiProcessor attribute is meant to describe
the number of bytes of shared memory (LDS space in AMD terminology)
in each SM (CU in AMD terminology). For instance, on AMD GPUs this
is often 64KB per CU, and on some Nvidia GPUs it's 96KB per SM.
This shared memory is a different address space from the normal
global memory. However, the current HIP-HCC properties fill this
in with a size that matches the totalGlobalMem property. This gives
a drastically inflated figure for the amount of LDS space that
each CU has: tens of GBs vs. tens of KBs.
This patch fixes this by pulling the maxSharedMemoryPerMultiProcessor
property from the HSA pool that describes how much workgroup-local
space is available on each CU. The HSA runtime eventually pulls
this from the topology information about LDSSizeInKB, defined as
"Size of Local Data Store in Kilobytes per SIMD".
Previously, this HSA query was used to fill in the value of the
sharedMemPerBlock property. On today's AMD GPUs, we know that
the amount of LDS available to the workgroup is identical to the
amount of LDS space in the CU. However, in the future this may
differ. As such, this patch reorders the assignments: it fills
in the "PerMultiProcessor" property from the HSA query (since
that's what the query is defined to return), and then separately
fills in the "PerBlock" property as we know it.
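
The intended ordering can be illustrated with the minimal sketch below.
It is not the runtime's actual code: ExampleDeviceProps and
fillSharedMemProps() are hypothetical stand-ins, and the LDS size is
passed in as a plain byte count rather than obtained from an HSA query.

```cpp
#include <cstddef>

// Hypothetical stand-in for the relevant device-property fields; this is
// not the real hipDeviceProp_t definition.
struct ExampleDeviceProps {
    std::size_t totalGlobalMem = 0;
    std::size_t sharedMemPerBlock = 0;
    std::size_t maxSharedMemoryPerMultiProcessor = 0;
};

// ldsBytesPerCu: size in bytes of the workgroup-local (LDS) pool of one CU,
// assumed to have been queried elsewhere.
inline void fillSharedMemProps(ExampleDeviceProps& props,
                               std::size_t ldsBytesPerCu) {
    // The query is defined per CU, so it feeds the per-multiprocessor
    // property directly. totalGlobalMem plays no part here.
    props.maxSharedMemoryPerMultiProcessor = ldsBytesPerCu;

    // Filled in separately. On today's AMD GPUs a workgroup can use the
    // whole LDS of its CU, so the values match; this is the one line to
    // change if the two ever differ.
    props.sharedMemPerBlock = ldsBytesPerCu;
}
```

With a 64KB LDS pool, both properties come out as 65536 bytes regardless
of how much global memory the device has.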
[ROCm/hip commit: 55e55e78bb]