SWDEV-189650 - [HIP-CLANG][HIP/VDI/PAL] Hangs on test hip_threadfence_system
1. In HIP + VDI + ROCm, allow SVM atomics on VEGA10 and later ASICs. GFX8 (Tonga) was already enabled.
2. In the HIP + VDI + PAL Linux driver, allow SVM atomics on VEGA10 and later ASICs.
Tests:
1. In HIP + VDI + ROCm, hip_threadfence_system test passed.
2. In HIP + VDI + PAL + Linux, hip_threadfence_system test passed.
3. OpenCL + PAL, clinfo and ocltest runtime test pass.
4. OpenCL + ROCM, clinfo and ocltest runtime test pass.
5. Windows 10, VEGA 10, clinfo and ocltest runtime test pass. The hip_threadfence_system test passes because it is skipped.
Teamcity presubmission test:
http://ocltc.amd.com:8111/viewModification.html?modId=127083&personal=true&tab=vcsModificationBuilds
http://ocltc.amd.com:8111/viewModification.html?modId=127076&personal=true&tab=vcsModificationBuilds
ReviewBoard: http://ocltc.amd.com/reviews/r/18077/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#73 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#171 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#80 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#134 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocmemory.cpp#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#320 edit
SWDEV-192384 - [HIP CQE][HIPonPAL][19.40] hipBindTexRef1DFetch, hipTextureRef2D are failed on all ASICs for both Win/Lnx
The runtime cannot trivially determine all the resources that will be used by a kernel, thus it can fail to make all of them resident.
1. Add new runtime flag PAL_ALWAYS_RESIDENT. Enabling this setting will cause resources to become resident at allocation time.
2. Set the default value of the above flag to true for HIP and false for OCL.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/18054/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#79 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#30 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#100 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.hpp#27 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#153 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#319 edit
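Note on SWDEV-192384 above: a minimal sketch of how an API-dependent default for a flag like PAL_ALWAYS_RESIDENT could drive residency at allocation time. The types `Api`, `Settings`, `Resource` and `Device` are hypothetical stand-ins and are not taken from the runtime sources.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins; none of these names come from the runtime.
enum class Api { Hip, OpenCl };

struct Settings {
  bool alwaysResident;
  // Default mirrors the entry: true for HIP, false for OpenCL.
  explicit Settings(Api api) : alwaysResident(api == Api::Hip) {}
};

struct Resource {
  std::size_t size = 0;
  bool resident = false;
};

class Device {
 public:
  explicit Device(Api api) : settings_(api) {}

  Resource allocate(std::size_t size) const {
    assert(size > 0);
    Resource r;
    r.size = size;
    // With the flag set, the resource becomes resident right away instead of
    // relying on per-kernel tracking, which can miss resources.
    r.resident = settings_.alwaysResident;
    return r;
  }

 private:
  Settings settings_;
};
```

In this model a HIP device returns resident allocations, while an OpenCL device leaves residency to later tracking.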
SWDEV-197488 - [Navi10][Navi14][19.30][CopySurfaceRegion] CopySurfaceRegion Failing Multiple Tests
- Try to optimize the condition for image buffer workaround. The new logic will attempt to validate the custom pitch with the HW requirement and use the backing store only if it doesn't match
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palmemory.cpp#28 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palmemory.hpp#14 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#76 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#29 edit
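Note on SWDEV-197488 above: a rough sketch of the decision described there, assuming the HW requirement can be expressed as a pitch alignment in bytes. The function name and the alignment value used in the usage line are assumptions; the actual check in palmemory.cpp may consider more than alignment.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns true when a caller-supplied row pitch (in bytes)
// does not satisfy the hardware pitch alignment, so the image would need a
// separate backing store instead of aliasing the buffer directly.
bool needsBackingStore(std::size_t pitchBytes, std::size_t hwPitchAlignBytes) {
  assert(hwPitchAlignBytes != 0);
  return (pitchBytes % hwPitchAlignBytes) != 0;
}
```

With an assumed 256-byte requirement, `needsBackingStore(1024, 256)` is false (direct use), while `needsBackingStore(1000, 256)` is true (fall back to the backing store).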
SWDEV-189140 - Add P2P support in PAL path
- PAL requires P2P resource open on the usage device. Add the new interface to open the resource
- Add a hidden P2P device object creation into amd::Memory. It can be activated with OCL context that has a single device.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_p2p_amd.cpp#2 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#337 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#134 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palmemory.cpp#25 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#74 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#28 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#80 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.hpp#23 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#133 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#126 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#93 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/memory.cpp#136 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/memory.hpp#109 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#306 edit
SWDEV-182204 - [CQE OCL][QR][DTB-Blocker][RV2][PCO][Windows][RS] Soft hang is observed with multiple OCL applications on PCO, RV | Faulty CL#1747482
ocltst soft-hung in the OCLCreateBuffer subtest because command buffer submission failed when the OS could not allocate enough memory.
OCL adds a backup GART heap for memory allocation.
http://ocltc.amd.com/reviews/r/17143/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#72 edit
SWDEV-86035 - Fix asserts in PAL after latest integration
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#71 edit
SWDEV-159255 - [CQE OCL][ocltst][WIN] [DTB-Blocker] OCLMemoryInfo[0] a sub-test of ocltst oclruntime module is failed while running whole module and getting pass while running alone this test due to faulty CL#1576247
- Reduce extra size acceptance for the cache look-up, so it will satisfy the test condition.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#69 edit
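Note on SWDEV-159255 above: a minimal sketch of a cache look-up that accepts a cached allocation only if it is at most `maxExtra` bytes larger than the request. Lowering `maxExtra` is the kind of change the entry describes. The function and parameter names are hypothetical, and the real cache stores resource descriptors rather than plain sizes.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical look-up over the sizes of cached allocations.
// Returns the index of the first entry whose size lies in
// [requested, requested + maxExtra], or -1 if none qualifies.
int findCached(const std::vector<std::size_t>& cachedSizes,
               std::size_t requested, std::size_t maxExtra) {
  for (std::size_t i = 0; i < cachedSizes.size(); ++i) {
    const std::size_t size = cachedSizes[i];
    if (size >= requested && size - requested <= maxExtra) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

A smaller tolerance means fewer oversized allocations are reused, so reported memory usage stays closer to what the application asked for.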
SWDEV-132899 - [OCL][GFX10] OCLGLDepthTex[0] and OCLGLDepthTex[4] subtests of OCLTST/OCLGL are failing on gfx10 Emulator
- Extend the Depth24_Stencil8 workaround to gfx10, based on the gfx10 image SRDs.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/15194/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#67 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Persistent memory failure could be propagated to the app without the second attempt after resource cache release. Try to allocate memory again after resource cache release.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#65 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#23 edit
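Note on SWDEV-79445 (palresource.cpp#65) above: the second attempt after a cache release can be sketched as below. `allocateWithRetry` and its callback parameters are hypothetical; the runtime performs the equivalent steps inside the resource creation path.

```cpp
#include <functional>

// Hypothetical helper: try the allocation once; on failure release the
// resource cache and try exactly one more time before reporting failure.
bool allocateWithRetry(const std::function<bool()>& tryAllocate,
                       const std::function<void()>& releaseCache) {
  if (tryAllocate()) {
    return true;
  }
  releaseCache();        // give cached memory back to the driver
  return tryAllocate();  // second and last attempt
}
```

Only when the second attempt also fails does the error propagate to the application.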
SWDEV-150453 - [CQE OCL][DTB][Perf][QR][Vega][DTB-BLOCKER] Performance drop observed on multiple subtests while running Nuke
1. Clean up suballocation chunk creation logic.
2. Try to cache a resource if it wasn't a suballocation.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/14591/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#62 edit
SWDEV-149330 - [CQE OCL][Vega10][PAL] ocltst - OCLFoldLibFunc a sub-test of oclcompiler module fails on Vega10 PAL/HSAIL path | Faulty PAL/HSAIL CL#1524674
- Reset offset to 0 for each Resource::create() call, since runtime could call create() more than once if the initial memtype failed.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#61 edit
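Note on SWDEV-149330 above: a toy model of why the offset has to be reset on every create() call. The `Resource` class and `MemType` enum here are simplified, hypothetical stand-ins; in this model the Local memory type always fails so that the caller retries with another type.

```cpp
#include <cstddef>

enum class MemType { Local, Remote };  // hypothetical memory types

class Resource {
 public:
  // Models one allocation attempt. Local always fails in this toy model,
  // which forces a second create() call with a different memory type.
  bool create(MemType type, std::size_t subOffset) {
    offset_ = 0;           // reset: an earlier failed attempt may have left a value
    offset_ += subOffset;  // apply the suballocation offset for this attempt
    return type != MemType::Local;
  }

  std::size_t offset() const { return offset_; }

 private:
  std::size_t offset_ = 0;
};
```

Without the reset, a failed Local attempt followed by a successful Remote attempt would leave the offset counted twice.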
SWDEV-149178 - [CQE OCL][DTB-BLOCKER][RS4][WF][QR]observed failures while running samplers test due to faulty CL#1529531
- Image alignment requirement could be different from the original buffer chunk alignment and that could cause a failure on the final address alignment. Use fragment size alignment for the chunk and protect suballocations from possible alignment mismatch
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#58 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Fix a regression in the AMF test and reenable the suballoc optimization. Rearrange the locks around cache field access only to avoid calling memory release under the cache lock.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#57 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#53 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#18 edit
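Note on SWDEV-79445 (palresource.cpp#53) above: one common way to keep memory release out of a cache lock is to detach the entries while holding the lock and destroy them after it is dropped. The sketch below assumes a `std::mutex` and `std::unique_ptr` based cache; `ResourceCache` and `Allocation` are hypothetical and do not reflect the runtime's actual classes.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

struct Allocation {
  ~Allocation() { /* stands in for a potentially slow driver release */ }
};

class ResourceCache {
 public:
  void add(std::unique_ptr<Allocation> a) {
    std::lock_guard<std::mutex> lock(mutex_);
    entries_.push_back(std::move(a));
  }

  // Detach all entries under the lock, then destroy them outside of it.
  std::size_t releaseAll() {
    std::vector<std::unique_ptr<Allocation>> victims;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      victims.swap(entries_);  // only the cache field is touched under the lock
    }
    const std::size_t released = victims.size();
    victims.clear();  // destructors (the memory release) run here, lock not held
    return released;
  }

  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return entries_.size();
  }

 private:
  std::mutex mutex_;
  std::vector<std::unique_ptr<Allocation>> entries_;
};
```

Keeping the critical section this small avoids holding the cache lock across calls that may block or take other locks.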
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#287 edit
SWDEV-133818 - PAL support for Linux Pro: Coarse Grain SVM for OpenCL 2.0
1. This change enables OCL 2.0 on Linux for devices using PAL backend.
2. Set the alignment for Coarse Grain SVM allocations to be the gpu fragment size (2MB on Linux).
ReviewBoardURL = http://ocltc.amd.com/reviews/r/14437/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#52 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#46 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/memory.cpp#128 edit
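Note on SWDEV-133818 above: rounding an SVM allocation size up to the fragment size can be sketched with the usual power-of-two align-up. The 2MB value is the Linux fragment size quoted in the entry; the names `kFragmentSize` and `alignUp` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// 2MB fragment size on Linux, as stated in the entry. The constant name is
// an assumption made for this sketch.
constexpr std::size_t kFragmentSize = 2u * 1024 * 1024;

// Rounds value up to the next multiple of alignment (a nonzero power of two).
inline std::size_t alignUp(std::size_t value, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (value + alignment - 1) & ~(alignment - 1);
}
```

For example, a 1-byte request rounds up to one full fragment, and a request of exactly one fragment is left unchanged.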
SWDEV-79445 - OCL generic changes and code clean-up
- Remove pinOffset_ field, since the pinning offset can be combined with global offset_ field
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#75 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palkernel.cpp#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#50 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#16 edit
SWDEV-147487 - DX9/DX11 texture and OpenCL interop for YUY2
- Enable YUY2 support for DX11 and DX9. YUY2 contains just one plane of interleaved Y0 U Y1 V components and can be mapped to (CL_RGBA, CL_UNSIGNED_INT8) with the image width halved. YUY2 provides better quality due to 16 bits of data per pixel.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_d3d11.cpp#23 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_d3d9.cpp#33 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#241 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#49 edit
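Note on SWDEV-147487 above: a small sketch of the YUY2 to (CL_RGBA, CL_UNSIGNED_INT8) view. Each RGBA texel carries two horizontally adjacent pixels (R = Y0, G = U, B = Y1, A = V), which is why the image width is half the pixel width. The struct and function names are hypothetical and only illustrate the layout.

```cpp
#include <cstddef>
#include <cstdint>

struct Rgba8 { std::uint8_t r, g, b, a; };     // one CL_RGBA / CL_UNSIGNED_INT8 texel
struct YuvPixel { std::uint8_t y, u, v; };

// Width in RGBA texels of a YUY2 surface that is yuy2Width pixels wide.
inline std::size_t rgbaWidth(std::size_t yuy2Width) { return yuy2Width / 2; }

// Reads pixel x from a row that is viewed as RGBA texels.
// Even pixels take luma from R (Y0), odd pixels from B (Y1); both pixels of a
// pair share the chroma samples in G (U) and A (V).
inline YuvPixel pixelAt(const Rgba8* row, std::size_t x) {
  const Rgba8& texel = row[x / 2];
  const std::uint8_t luma = (x % 2 == 0) ? texel.r : texel.b;
  return YuvPixel{luma, texel.g, texel.a};
}
```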
SWDEV-145750 - SSG Player drop in performance observed when using the OCL Api in 18.10
- Keep persistent memory mapped at all times on Linux and Win10
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#73 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#48 edit
SWDEV-143822 - [CQE OCL][Vega10][OCLtst][DTB-Blocker][QR] 8 out of 50 failures are observed with OCLPerf 32bit test; due to Faulty CL# 1502648
- Free resource cache if PAL failed memory allocation
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#47 edit
SWDEV-142224 - [OCL] OCL runtime hang in multithread app with extended tests
- Lock the resource cache only if a resource will be placed into the cache. Views, which are allocated and destroyed dynamically on the queues, are never placed into the cache, so the lock should not be taken for them.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#46 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#15 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Don't add the offset, since it's already a part of VM address
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#44 edit
SWDEV-136068 - PAL support for Linux Pro: SSG support on OpenCL 1.2
- Enable DGMA memory allocation for persistent memory under Linux, since it is an SSG requirement
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#39 edit
SWDEV-129129 - [[CQE OCL][Vega vs Fiji] Upto 12% Performance drop observed on VEGA10 compared to FIJI while running BlackMagic Davinci Resolve
More benchmark tuning:
- Keep system memory locked in the resource cache. That removes a huge number of lock/unlock calls to the OS caused by resource creation and destruction
- Reduce the command buffer size to 256 commands and increase the number of CBs to 16
- Increase the number of resident resources to 2048
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#574 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palmemory.cpp#17 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#37 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#58 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.hpp#31 edit
SWDEV-130722 - Channel order in an interop buffer from OpenCL to OpenGL is flipped on Vega
Follow up for CL#1456230. Adding a new table that maps the OGL surface formats (hData.format) returned by wglResourceAttachAMD function into the OCL image format. The hData.format is the internal image surface format created for an interop by OGL and should be used by OCL for cl_gl interop.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/13421/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.hpp#20 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevicegl.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#36 edit
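Note on SWDEV-130722 above: the table-driven mapping can be pictured as below. The enum values are hypothetical placeholders, since the real values carried in hData.format are driver-internal and not listed in this log; only the idea of deriving the CL channel order from the surface format OGL actually created is taken from the entry.

```cpp
// Hypothetical stand-ins for the driver-internal surface formats and the
// resulting CL channel orders.
enum class GlSurfaceFormat { Rgba8, Bgra8, Unknown };
enum class ClChannelOrder { Rgba, Bgra };

struct FormatMapEntry {
  GlSurfaceFormat gl;
  ClChannelOrder cl;
};

static const FormatMapEntry kFormatMap[] = {
    {GlSurfaceFormat::Rgba8, ClChannelOrder::Rgba},
    {GlSurfaceFormat::Bgra8, ClChannelOrder::Bgra},
};

// Looks up the CL channel order for the surface format that OGL really
// created. Returns false when the format is not in the table.
bool mapSurfaceFormat(GlSurfaceFormat gl, ClChannelOrder* out) {
  for (const FormatMapEntry& entry : kFormatMap) {
    if (entry.gl == gl) {
      *out = entry.cl;
      return true;
    }
  }
  return false;
}
```

Driving the CL format from the created surface rather than from the format requested by the app is what keeps the R and B channels from being swapped.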
SWDEV-130722 - Channel order in an interop buffer from OpenCL to OpenGL is flipped on Vega
OCL calls glGetTexLevelParameteriv_ function to get the internal GL format but this format is the one chosen by app in OGL API such as glTexImage2D.
The issue is that OGL sometimes selects a different format than the one specified in glTexImage2D, which breaks cl_gl interop. One example is shown below:
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA/**internal format**/, width, height, 0, GL_BGRA/**external format**/, GL_UNSIGNED_BYTE, NULL);
In this case the app selects GL_RGBA as the internal format, but OGL switches to BGRA8 internally. This causes an issue later in cl_gl interop (i.e., R and B channels are swapped) because OCL gets GL_RGBA as the internal format from the glGetTexLevelParameteriv_ call.
To avoid this issue, OCL needs to query the real internal GL format in wglResourceAttachAMD and adjust the CL format accordingly.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/13408/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.hpp#19 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevicegl.cpp#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#35 edit
SWDEV-131493 - [CQE OCL][Vega10][QR][DTB-Blocker] Soft Hang is observed while running 'Mipmaps-clCopyImage' tests of WF Conformance due to Faulty CL# 1451293
Multiple runtime locks could conflict with each other:
- Remove PAL lock from the resource creation/destruction. PAL should be thread safe for those operations.
- Avoid queue execution lock for a mipmap view destruction in submitUnmapMemory
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#34 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#55 edit
SWDEV-131311 - [CQE OCL][DTB][DTB-BLOCKER][Perf][QR][VEGA] BasemarkCL test are not completing due to faulty CL#1451293
- After a view is destroyed, the original object can no longer be associated with a vgpu
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#33 edit
SWDEV-86035 - Code clean-up
- Use TS check first to avoid LogError
- Reset VirtualGPU reference if resource was cached
- Lock active VirtualGPU on release, since a cached resource can have access to that queue from another thread
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#54 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.hpp#30 edit