SWDEV-132899 - [OCL][GFX10] 70 subtests of Conformance Mipmaps (clCopyImage) test failed for image type 1Darray
This is a follow-up to CL#1517501.
The copyImage1DA blit kernel uses the image2d_array_t type for the src/dst images. On gfx10, the number of arrays/layers is expected in the Z component for a 2Darray image, so a swap is required for 1Darray images when a 2Darray image is used for the image copy. copyImage1DA already has code for swapping the z and y components:
    if (srcOrigin.w != 0) {
        coordsSrc.z = coordsSrc.y;
        coordsSrc.y = 0;
    }
    if (dstOrigin.w != 0) {
        coordsDst.z = coordsDst.y;
        coordsDst.y = 0;
    }
To take this path, force the w component of srcOrigin and dstOrigin to 1 on gfx10 when the image type is 1Darray.
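A minimal host-side sketch of the idea above, mirrored in plain C++. The names `ImageType`, `Int4`, `markSwapFor1DArray`, and `applySwap` are hypothetical stand-ins for illustration and are not the actual palblit.cpp or blit kernel code; only the `w != 0` check and the y/z swap come from the description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the runtime's types (assumptions, not the real API).
enum class ImageType { Image1D, Image1DArray, Image2D, Image2DArray };

struct Int4 {
  int32_t x, y, z, w;
};

// Host side: flag the origin so the kernel takes the swap path.
// Only 1Darray images on gfx10 are flagged; everything else is left untouched.
inline void markSwapFor1DArray(Int4& origin, ImageType type, bool isGfx10) {
  if (isGfx10 && type == ImageType::Image1DArray) {
    origin.w = 1;
  }
}

// Kernel side, mirrored in C++: when the flag is set, move the layer index
// from y to z, because the image is accessed as a 2Darray.
inline Int4 applySwap(Int4 coords, const Int4& origin) {
  if (origin.w != 0) {
    coords.z = coords.y;
    coords.y = 0;
  }
  return coords;
}
```

With the flag set, a 1Darray coordinate (x = 5, layer = 3 in y) becomes (5, 0, 3), so the layer index lands where a 2Darray access expects it.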
ReviewRequestURL = http://ocltc.amd.com/reviews/r/16538/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#28 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Remove mapping of some internal CL formats in PAL backend, since it shouldn't need them anymore.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#27 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldefs.hpp#43 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Use SDMA staging transfers for data upload if pinning fails. Fixes a HIP failure in a test that uses the code segment data for upload.
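A rough sketch of the fallback described above, assuming a simplified model in which device memory is a byte vector and the SDMA copy is simulated with `memcpy`. `UploadPath`, `uploadData`, `pinSucceeded`, and `stagingSize` are hypothetical names for illustration; the real runtime interfaces are not shown in this log.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

enum class UploadPath { Pinned, Staging };

// Hypothetical upload helper. pinSucceeded stands for the result of trying
// to pin the host pointer; stagingSize is the size of the staging buffer.
inline UploadPath uploadData(const unsigned char* src, std::size_t size,
                             std::vector<unsigned char>& deviceMem,
                             bool pinSucceeded, std::size_t stagingSize) {
  assert(stagingSize > 0 && deviceMem.size() >= size);
  if (pinSucceeded) {
    // Direct path: the transfer reads the pinned host memory.
    std::memcpy(deviceMem.data(), src, size);
    return UploadPath::Pinned;
  }
  // Fallback path: copy through a staging buffer, one chunk at a time.
  std::vector<unsigned char> staging(stagingSize);
  for (std::size_t offset = 0; offset < size; offset += stagingSize) {
    const std::size_t chunk = std::min(stagingSize, size - offset);
    // CPU write into the staging buffer.
    std::memcpy(staging.data(), src + offset, chunk);
    // Stands in for the SDMA transfer from staging to device memory.
    std::memcpy(deviceMem.data() + offset, staging.data(), chunk);
  }
  return UploadPath::Staging;
}
```

The point of the fallback is that the source pointer never has to be pinned: the CPU reads it directly, which is what matters when the source is memory that cannot be pinned.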
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#26 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Following CL#1552596. Make sure virtual GPU is set for the internal allocations before the create() call, since the deferred alloc is disabled.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#128 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#416 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#144 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#22 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#96 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.hpp#51 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocblit.cpp#21 edit
SWDEV-151739 - [CQE OCL][DTB][Perf][QR][DTB-BLOCKER][VEGA10] Upto 18% performance drop observed while running Video Composition test sub test of Compubench due to faulty CL#1544622
- Implement customized TS tracking for managed buffers. The common TS tracking mechanism saves the event of the last command, assuming SDMA and compute operations occur in order, but that is not the case for managed buffers. Also, a managed buffer doesn't have to validate the TS for the parent resource.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#21 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palconstbuf.cpp#11 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palconstbuf.hpp#9 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palmemory.hpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#22 edit
SWDEV-132899 - [OCL][GFX10] OCLCreateImage[3] fails for the image type 1Darray
Issue: The FillImage blit kernel does not work properly on gfx10 if the image type is 1Darray (i.e., it only fills the first slice/layer and ignores the rest of the layers when the number of layers is > 1).
Root cause: gfx10 HW expects the number of layers in the Z component.
Fix: Swap the Y and Z components in the image blit kernels if the image type is 1Darray on gfx10+.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/14281/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#17 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#42 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.hpp#14 edit
SWDEV-129129 - [[CQE OCL][Vega vs Fiji] Upto 12% Performance drop observed on VEGA10 compared to FIJI while running BlackMagic Davinci Resolve
The app creates/destroys hundreds of resources each frame. The PAL path was removing the destroyed resources from the resident list, although the resource was kept in the cache. This change does the following:
- Switch TS tracking from a map in VirtualGPU to resource
- Don't remove references until the actual memory destruction
- Add a residency threshold to avoid OS resident/eviction calls
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/blit.hpp#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#14 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.hpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldefs.hpp#19 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#50 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.hpp#17 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palkernel.cpp#35 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palkernel.hpp#13 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palmemory.cpp#14 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#46 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.hpp#19 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#30 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#13 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#52 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.hpp#28 edit
SWDEV-107226 - [SDI] SDISpeedTest Corruption for OCL GPU to SDI RGBA
- A single-step copy using SDMA to a remote SDI buffer seems to cause corruption. This fix is a workaround that does the transfer via a staging buffer, which seems to fix the corruption. The issue is under investigation.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/11882/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#125 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#12 edit
SWDEV-101448 - [CQE OCL][Brahma][PERF][QR] ~21% perf drop is observed with lulesh-cl subtest of ComputeApps tests : Faulty CL # 1306133
- Use the logic for transfer size before CL#1306133
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#124 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#10 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Improve image fill performance with multiple writes in a single thread. The current split has 3 regions
Affected files ...
... //depot/stg/opencl/drivers/opencl/library/common.hsa/src/blitKernels.cl#4 edit
... //depot/stg/opencl/drivers/opencl/library/common/src/blitKernels.cl#4 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#123 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.hpp#40 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#8 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.hpp#4 edit
SWDEV-101206 - [CQE OCL][Perf][G][QR] Upto ~9% Performance drop observed while running Video Composition subtest of Compubench; Faulty CL#1306133
- Use the original logic without DMA flush. Flush on staging write helps with a blocking op only, but currently VDI doesn't have that information.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#122 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#7 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Update staging copy path with a flush so CPU copy and SDMA transfer could run asynchronously.
- Tune chunk size for transfers
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#121 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#6 edit
SWDEV-3 - AMDGPU: Expand unaligned accesses early
Due to visit order problems, in the case of an unaligned copy
the legalized DAG fails to eliminate extra instructions introduced
by the expansion of both unaligned parts.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274397 91177308-0d34-0410-b5e6-96231b3b80d8
GitHash: d4452f8fcf496a2e19c1a1c9792f5f063f4e9703
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/lib/Target/AMDGPU/AMDGPUISelLowering.cpp#3 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/lib/Target/AMDGPU/AMDGPUISelLowering.h#3 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/test/CodeGen/AMDGPU/cvt_f32_ubyte.ll#3 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/test/CodeGen/AMDGPU/sext-in-reg-failure-r600.ll#1 add
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/test/CodeGen/AMDGPU/sext-in-reg.ll#3 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/test/CodeGen/AMDGPU/unaligned-load-store.ll#2 edit
SWDEV-3 - [msan] Fix __msan_maybe_ for non-standard type sizes.
Fix incorrect calculation of the type size for __msan_maybe_warning_N
call that resulted in an invalid (narrowing) zext instruction and
"Assertion `castIsValid(op, S, Ty) && "Invalid cast!"' failed."
Only happens in very large functions (with more than 3500 MSan
checks) operating on integer types that are not power-of-two.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274395 91177308-0d34-0410-b5e6-96231b3b80d8
GitHash: dcfa1b5241a1d0484ad1a67485329b1c7c13b575
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/lib/Transforms/Instrumentation/MemorySanitizer.cpp#4 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/test/Instrumentation/MemorySanitizer/with-call-type-size.ll#1 add
SWDEV-3 - [codeview] Don't record UDTs for anonymous structs
MSVC makes up names for these anonymous structs, but we don't (yet).
Eventually Clang should use getTypedefNameForAnonDecl() to put some name
in the debug info, and we can update the test case when that happens.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274391 91177308-0d34-0410-b5e6-96231b3b80d8
GitHash: 613f19910964eb95a63bd906b0b75d9aa20d9b06
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp#5 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/test/DebugInfo/COFF/udts.ll#2 edit
SWDEV-3 - IR: Set TargetPrefix for some X86 and AArch64 intrinsics where it was missing
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274390 91177308-0d34-0410-b5e6-96231b3b80d8
GitHash: f0a4c116041f7c2aef7796c8b067f0947b69602d
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/include/llvm/IR/IntrinsicsAArch64.td#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/include/llvm/IR/IntrinsicsX86.td#2 edit
SWDEV-3 - Address two correctness issues in LoadStoreVectorizer
Summary:
GetBoundryInstruction returns, as the last instruction, the instruction which follows the boundary set, or end(). Otherwise the last instruction in the boundary set is not tested by isVectorizable().
Partially solve reordering of instructions. More extensive solution to follow.
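The first point is the usual half-open range convention: if the end of the range points at the last member rather than one past it, a loop over [first, last) skips that member. A generic sketch, not the LoadStoreVectorizer code itself; `boundaryRange` and its predicate are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical helper: return the half-open range [first, last) that spans
// from the first to the last element satisfying inSet.
template <typename It, typename Pred>
std::pair<It, It> boundaryRange(It begin, It end, Pred inSet) {
  It first = end;
  It last = end;
  for (It it = begin; it != end; ++it) {
    if (inSet(*it)) {
      if (first == end) first = it;
      last = it;
    }
  }
  if (first == end) return {end, end};  // no members: empty range
  // Return one past the last member (possibly end()), so that a loop over
  // [first, last) also visits the last member.
  return {first, std::next(last)};
}
```

For the sequence {1, 2, 3, 4, 5} with members 2 and 5, the range runs from the element 2 up to end(), so the element 5 is included when the range is walked.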
Reviewers: tstellarAMD, llvm-commits, jlebar
Subscribers: escha, arsenm, mzolotukhin
Differential Revision: http://reviews.llvm.org/D21934
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274389 91177308-0d34-0410-b5e6-96231b3b80d8
GitHash: 1e53a5fcec984e0f1cefe43dba3939e4b72a533f
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/lib/Transforms/Vectorize/LoadStoreVectorizer.cpp#11 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/test/Transforms/LoadStoreVectorizer/AMDGPU/interleaved-mayalias-store.ll#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/test/Transforms/LoadStoreVectorizer/X86/lit.local.cfg#1 add
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/test/Transforms/LoadStoreVectorizer/X86/preserve-order32.ll#1 add
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/test/Transforms/LoadStoreVectorizer/X86/preserve-order64.ll#1 add
SWDEV-3 - [PM] Preparatory cleanups to ArgumentPromotion.
This pulls some obvious changes out of http://reviews.llvm.org/D21921 to
minimize the diff.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274445 91177308-0d34-0410-b5e6-96231b3b80d8
GitHash: 197a7516a32b69da7d1243308cb8eb6c5f29de0c
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/lib/Transforms/IPO/ArgumentPromotion.cpp#2 edit
SWDEV-3 - [PM] Fix a small typo from when I ported JumpThreading
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274440 91177308-0d34-0410-b5e6-96231b3b80d8
GitHash: ea9886a5909183770b8d0baa9061150adf664b1a
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/lib/Transforms/Scalar/JumpThreading.cpp#2 edit
SWDEV-3 - [Hexagon] Create global std::map lazily.
This could of course be a simple binary search with no global state
involved at all if someone cares enough. Just don't make everyone
linking the hexagon backend pay for it on process startup and shutdown.
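One common way to create a global map lazily in C++ is a function-local static, sketched below. Whether the actual change uses this exact form is an assumption; `opcodeTable`, `lookup`, and the table contents are hypothetical and do not reflect the real Hexagon data.

```cpp
#include <cassert>
#include <map>

// Hypothetical lookup table; key/value types and contents are illustrative.
inline const std::map<unsigned, unsigned>& opcodeTable() {
  // Function-local static: constructed on the first call (thread-safe since
  // C++11) instead of during static initialization at process startup.
  // If this function is never called, the map is never built or destroyed.
  static const std::map<unsigned, unsigned> table = {
      {1, 10}, {2, 20}, {3, 30}};
  return table;
}

// Look up a key, returning fallback when the key is not in the table.
inline unsigned lookup(unsigned key, unsigned fallback) {
  const std::map<unsigned, unsigned>& table = opcodeTable();
  auto it = table.find(key);
  return it == table.end() ? fallback : it->second;
}
```

Programs that link this code but never call `lookup` pay nothing for the table, which is the cost the commit message is concerned with.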
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274437 91177308-0d34-0410-b5e6-96231b3b80d8
GitHash: b4e53350f9349677e2a0178bde5b8b0c3b743b5e
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/lib/Target/Hexagon/MCTargetDesc/HexagonMCDuplexInfo.cpp#2 edit
SWDEV-3 - CodeGen: Use MachineInstr& in SlotIndexes.cpp, NFC
Avoid implicit conversions from iterator to pointer by preferring
MachineInstr& and using range-based for loops.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274354 91177308-0d34-0410-b5e6-96231b3b80d8
GitHash: effa4cc200078395a74decd1ae2d1e380c79a2f7
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/lib/CodeGen/SlotIndexes.cpp#2 edit
SWDEV-3 - CodeGen: Use MachineInstr& in RegAllocFast, NFC
Use MachineInstr& instead of MachineInstr* in RegAllocFast to avoid
implicit conversions from MachineInstrBundleIterator. RAFast::spillAll
and RAFast::spillVirtReg still take iterators, since their argument may
be an end iterator from MachineBasicBlock::getFirstTerminator.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274353 91177308-0d34-0410-b5e6-96231b3b80d8
GitHash: ce5fdc00e7ed9f05c643b056d0561a8133b5438b
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/lib/CodeGen/RegAllocFast.cpp#2 edit
SWDEV-3 - [CMake] Add LLVM_BUILD_32_BITS to LLVMConfig.cmake
Previously out-of-tree passes could detect if LLVM was built with
LLVM_BUILD_32_BITS by looking for -m32 in LLVM_DEFINITIONS, but as of r271871
it no longer appears there. Resolve this by instead emitting LLVM_BUILD_32_BITS
in LLVMConfig so it can be checked for directly.
Differential Revision: http://reviews.llvm.org/D21434
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274351 91177308-0d34-0410-b5e6-96231b3b80d8
GitHash: e6124112ab41442b4df11207eaf004bb6066c021
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/cmake/modules/LLVMConfig.cmake.in#6 edit
SWDEV-3 - [ARM] Refactor Thumb2 mul instruction descs
No functional changes. Just created wrapper classes around the 3
and 4 reg mult and mac instruction classes.
Differential Revision: http://reviews.llvm.org/D21549
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@274347 91177308-0d34-0410-b5e6-96231b3b80d8
GitHash: b5755a89959882b64dc9adc3a963b5ba920b392f
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/llvm.git/lib/Target/ARM/ARMInstrThumb2.td#2 edit