Fixes GitHub issue: #1754
- When ResourceDesc::resType is hipResourceTypeLinear ignore address mode and filter mode.
- When textureDesc::normalizedCoords is set to zero, AddressModeWrap and AddressModeMirror won't be supported and will be switched to AddressModeClamp.
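The two rules above can be sketched as a small decision function. This is only an illustration: `ResType`, `AddressMode`, `SamplerChoice` and `resolveAddressMode` are hypothetical stand-ins, not the types or functions used in the HIP headers.

```cpp
// Stand-in enums for illustration; the real HIP headers define their own
// resource-type and texture-address-mode enumerations.
enum class ResType { Array, Linear };
enum class AddressMode { Wrap, Clamp, Mirror, Border };

struct SamplerChoice {
    bool addressModeUsed;  // false when the resource is linear
    AddressMode mode;      // meaningful only when addressModeUsed is true
};

// Hypothetical helper: decides which address mode is actually applied.
SamplerChoice resolveAddressMode(ResType resType, bool normalizedCoords,
                                 AddressMode requested) {
    // Rule 1: linear resources ignore the address mode entirely.
    if (resType == ResType::Linear) {
        return {false, requested};
    }
    // Rule 2: without normalized coordinates, Wrap and Mirror fall back to Clamp.
    if (!normalizedCoords &&
        (requested == AddressMode::Wrap || requested == AddressMode::Mirror)) {
        return {true, AddressMode::Clamp};
    }
    return {true, requested};
}
```

For example, `resolveAddressMode(ResType::Array, false, AddressMode::Wrap)` yields Clamp, while the same request with normalized coordinates keeps Wrap.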
There is a build error when building HIP with the latest HCC from GitHub after PR#1935 was merged into the HIP master branch. That PR changed blockDimX to blockDim, and the two lines that missed this change are updated in the current PR.
By default hipcc passes -mllvm options to let HIP-Clang inline all device functions.
--hipcc-func-supp enables function support and disables inline all.
--hipcc-no-func-supp disables function support and enables inline all.
This is a temporary solution to match HCC behavior for performance.
This option is mainly for debugging purposes.
Change-Id: I0c44ac1812bb3cea5c3e5b6e14ebaa45919236f6
* Fix cooperative launch APIs to set hipGetLastError
Previously, the cooperative launch APIs did not properly log their
errors in the global hipGetLastError variable before returning back
to the user. As such, the APIs would leave hipSuccess in the
last error, which would break some use cases.
This fixes that problem by making a trampoline function that does
the HIP_INIT_API and ihipLogStatus.
* Add missing flag to the log of multi-GPU launch
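The trampoline pattern described above for the cooperative launch APIs can be sketched roughly as follows. Every name here is a hypothetical stand-in: the real runtime uses hipError_t, HIP_INIT_API and ihipLogStatus, whose internals are not reproduced.

```cpp
// Stand-in error type and per-thread "last error" slot (illustrative only).
enum Error { Success = 0, ErrorInvalidValue = 1 };
thread_local Error g_lastError = Success;

// Hypothetical internal implementation: returns a status but records nothing.
Error launchCooperativeImpl(int numBlocks) {
    return numBlocks > 0 ? Success : ErrorInvalidValue;
}

// Records the status in the last-error slot and passes it through.
Error logStatus(Error e) {
    g_lastError = e;
    return e;
}

// Public entry point: a thin trampoline that makes sure the status is logged
// before control returns to the caller.
Error launchCooperative(int numBlocks) {
    return logStatus(launchCooperativeImpl(numBlocks));
}

// Simplified query of the recorded status (does not reset it).
Error lastError() {
    return g_lastError;
}
```

With this shape, a failing `launchCooperative(0)` leaves `ErrorInvalidValue` visible through `lastError()` instead of a stale `Success`.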
nvcc treats .c files as C programs and .cpp files as C++ programs.
Currently hipcc treats both .c and .cpp files as HIP programs.
It is desirable for hipcc to behave like nvcc.
Currently it is not feasible to let hipcc treat .cpp files as C++ programs,
since too many HIP applications use .cpp as the extension for HIP programs.
However, we should be able to let hipcc treat .c files as C programs, since
few applications use .c as the extension for HIP programs.
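The resulting policy can be sketched as a classification by file extension. hipcc itself is a driver script, so this C++ fragment is only an illustration; `SourceKind` and `classifySource` are hypothetical names.

```cpp
#include <string>

enum class SourceKind { C, Hip, Other };

// Hypothetical helper: classify an input file by its extension.
// .c is treated as plain C, while .cpp stays HIP for compatibility.
SourceKind classifySource(const std::string& path) {
    const auto dot = path.rfind('.');
    if (dot == std::string::npos) {
        return SourceKind::Other;  // no extension at all
    }
    const std::string ext = path.substr(dot);
    if (ext == ".c") {
        return SourceKind::C;
    }
    if (ext == ".cpp") {
        return SourceKind::Hip;
    }
    return SourceKind::Other;  // objects, libraries, and anything else
}
```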
HIP-Clang cuda_wrapper headers require the clang include path to come before the standard C++ include path.
However, the libc++ include path must come before the clang include path.
To work around this, we pass -isystem with the parent directory of the clang include
path instead of the clang include path itself.
This PR ensures that the maxThreadsPerBlock returned by hipFuncGetAttributes is a multiple of the warp size and that the register usage of the maximum block does not exceed the number of available registers.
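A simplified model of the two constraints is sketched below. The function and parameter names are hypothetical, and the register accounting is deliberately naive (one flat register budget per block), which may differ from how the runtime actually computes it.

```cpp
#include <algorithm>

// Hypothetical helper: largest block size that respects the device limit,
// fits in the register budget, and is a multiple of the warp size.
int clampMaxThreadsPerBlock(int deviceMaxThreads, int warpSize,
                            int regsPerThread, int regsAvailable) {
    int limit = deviceMaxThreads;
    if (regsPerThread > 0) {
        // The whole block must fit in the available registers.
        limit = std::min(limit, regsAvailable / regsPerThread);
    }
    // Round down to a multiple of the warp size.
    return (limit / warpSize) * warpSize;
}
```

For instance, with a 1024-thread device limit, a warp size of 64, 100 registers per thread and 65536 registers available, the register budget allows 655 threads, which rounds down to 640.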
Fixes #1662
GCC emits a warning about using static functions like
hipCUDAErrorTohipError inside this function, because it has an
inline directive, but it's not static. Adding static to this function
to silence warnings (and prevent potential problems in the future).
NVCC warned if you tried to use hipOccupancyMaxActiveBlocksPerMultiprocessor
because when passing in a device function pointer, "const void* func" was
insufficient to describe it accurately. Adding a C++ templated class type
definition for this function.
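The templated overload idea can be sketched as follows. `occupancyUntyped`, `occupancy` and `exampleKernel` are hypothetical stand-ins with simplified signatures and placeholder arithmetic; they are not the declarations in the HIP headers.

```cpp
// Stand-in for an untyped entry point taking "const void* func".
// The occupancy arithmetic is a placeholder, not a real calculation.
int occupancyUntyped(int* numBlocks, const void* func, int blockSize) {
    if (numBlocks == nullptr || func == nullptr || blockSize <= 0) {
        return 1;  // error
    }
    *numBlocks = 1024 / blockSize;
    return 0;  // success
}

// Templated overload: accepts the function with its real type, so callers
// do not have to cast a function pointer to const void* themselves.
template <class T>
int occupancy(int* numBlocks, T func, int blockSize) {
    return occupancyUntyped(numBlocks, reinterpret_cast<const void*>(func),
                            blockSize);
}

// Plain function standing in for a device kernel.
void exampleKernel(float*) {}
```

A caller can then write `occupancy(&n, exampleKernel, 256)` and let the template deduce the function pointer type.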
Fixes SWDEV-207362
The output file name should not influence which flags are chosen for the compiler. This fix addresses cases where the output file has an extension that hipcc would otherwise mistake for a source file extension, causing it to add the flags required for that source type.
PS: The output file is the file name that follows -o.
Example: hipcc test.o -o test.hip would add the flags for .hip compilation, ignoring the fact that test.hip is an output file.
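The idea of skipping the output file while scanning the command line can be sketched as below. hipcc is a driver script, so this C++ fragment is purely illustrative; `inputCandidates` is a hypothetical name, and the parsing is simplified (it does not handle the attached `-ofile` form or options that take a separate argument).

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the arguments that may be input files,
// skipping the file name that follows -o.
std::vector<std::string> inputCandidates(const std::vector<std::string>& args) {
    std::vector<std::string> inputs;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "-o") {
            ++i;  // skip the output file name as well
            continue;
        }
        if (!args[i].empty() && args[i][0] == '-') {
            continue;  // some other option, not an input file
        }
        inputs.push_back(args[i]);
    }
    return inputs;
}
```

For the arguments `test.o -o test.hip`, only `test.o` is left to be classified, so no .hip compilation flags would be added.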
Query ROCr to see if we have the proper lower-level support for
cooperative groups -- GWS support through the firmware, driver,
thunk, and ROCr. ROCr does these checks for us, and presents a
query that allows us to see if GWS entries are available for use.
If so, then we have all the lower-level technologies needed, and
we should enable cooperative groups support for HIP.
Fixes SWDEV-226025
Right now, -x c++ can come before libhip_hcc.so, which forces the compiler to treat libhip_hcc.so as a text file and generates a lot of gibberish Unicode output. This PR changes the order of the flags, ensuring that -x c++ and similar flags come after libhip_hcc.so.
Hopefully, this will not have any negative side effects.
The maxSharedMemoryPerMultiProcessor attribute is meant to describe
the number of bytes of shared memory (LDS space in AMD terminology)
in each SM (CU in AMD terminology). For instance, on AMD GPUs this
is often 64KB per CU, and on some Nvidia GPUs it's 96KB per SM.
This shared memory is a different address space from the normal
global memory. However, the current HIP-HCC properties fill this
in with a size that matches the totalGlobalMem property. This gives
a drastically too-high calculation for the amount of LDS space that
each CU has -- tens of GBs vs. 10s of KBs.
This patch fixes this by pulling the maxSharedMemoryPerMultiProcessor
property from the HSA pool that describes how much workgroup-local
space is available on each CU. The HSA runtime eventually pulls
this from the topology information about LDSSizeInKB, defined as
"Size of Local Data Store in Kilobytes per SIMD".
Previously, this HSA query was used to fill in the value of the
sharedMemPerBlock property. On today's AMD GPUs, we know that
the amount of LDS available to the workgroup is identical to the
amount of LDS space in the CU. However, in the future this may
differ. As such, this patch changes around the order and fills
in the "PerMultiProcessor" property from the HSA query (since
that's what the query is defined to return), and then separately
fills in the "PerBlock" property as we know it.
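The new fill order can be sketched as follows. `DeviceProps` is a stand-in with only the three fields discussed above, and `fillSharedMemProps` and `ldsBytesPerCu` are hypothetical names; the actual HSA pool query is not shown.

```cpp
#include <cstddef>

// Stand-in for the relevant part of the device properties structure.
struct DeviceProps {
    std::size_t totalGlobalMem;
    std::size_t sharedMemPerBlock;
    std::size_t maxSharedMemoryPerMultiProcessor;
};

// Hypothetical helper: ldsBytesPerCu is assumed to be the value returned by
// the HSA query for workgroup-local space per CU.
void fillSharedMemProps(DeviceProps& props, std::size_t ldsBytesPerCu) {
    // The query is defined per CU, so it fills the per-multiprocessor field.
    props.maxSharedMemoryPerMultiProcessor = ldsBytesPerCu;
    // Set separately: identical on today's GPUs, but it may differ later.
    props.sharedMemPerBlock = ldsBytesPerCu;
}
```

With a 64KB LDS per CU, both shared memory fields end up at 65536 bytes, far below totalGlobalMem.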
Fix two errors in hipOccupancyMaxActiveBlocksPerMultiprocessor.
1) Fix a possible segfault if the user passed in a null pointer for
the numBlocks value.
2) Handle the situation when the user is asking for a block size
that is larger than what the target device can hold within a
single block.
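Both checks can be sketched as early exits in a simplified occupancy routine. The names and the thread-count arithmetic are hypothetical, and reporting zero blocks for an oversized block is just one plausible way to handle that case; the text above does not say which behavior the fix chose.

```cpp
enum Status { Ok = 0, InvalidValue = 1 };

// Hypothetical, simplified occupancy calculation based only on thread counts.
Status maxActiveBlocks(int* numBlocks, int blockSize,
                       int maxThreadsPerBlock, int maxThreadsPerCu) {
    // 1) Guard against a null output pointer instead of dereferencing it.
    if (numBlocks == nullptr || blockSize <= 0) {
        return InvalidValue;
    }
    // 2) A block larger than the device limit can never be resident.
    //    Assumption: report zero active blocks in that case.
    if (blockSize > maxThreadsPerBlock) {
        *numBlocks = 0;
        return Ok;
    }
    *numBlocks = maxThreadsPerCu / blockSize;
    return Ok;
}
```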
Currently there is a clang bug on Windows causing duplicate -mllvm options in clang -cc1.
Temporarily disable -mllvm options for HIP-Clang on Windows until the bug is fixed.
Change-Id: I3a4393ba7745989398dc6c6001722837dad18704