SWDEV-198863 - Options for hip-clang-vdi path to provide the chicken bits, or functional equivalents to HCC_DB (phase 2)
Enable the log functions for release build.
Tests:
1. Linux HIP ROCM platform. VEGA10. Driver is release build.
1.1 export LOG_LEVEL=3
./hipModule
There are many logs.
1.2 export GPU_LOG_MASK=0
./hipModule
There is no log
2. Windows HIP PAL platform. VEGA10, Driver is release or fastdbg build.
2.1 set LOG_LEVEL=3
run test hipPrintfKernel
There are many logs
2.2 set GPU_LOG_MASK=0
run test hipPrintfKernel
There is no log
3. http://ocltc.amd.com:8111/viewModification.html?modId=128481&personal=true&tab=vcsModificationBuilds
ReviewBoard: http://ocltc.amd.com/reviews/r/18240/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/debug.hpp#11 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#324 edit
SWDEV-198863 - Options for hip-clang-vdi path to provide the chicken bits, or functional equivalents to HCC_DB (phase 1)
1. The log macros is turned off for release build. So log functions has zero impact to release build.
2. The log macros have level, mask, condition control. So we can have more control to avoid log flooding.
I also adjusted some existing log to use new log functions.
1. To excercise and test the new log functions.
2. To improve performance slightly.
3. The change is mainly for HIP-ROCM, we can move more in next phases for PAL or ORCA.
4. I make these log feature unavailable for release build. We can revert to old log functions for release build in a case by case method.
Tests:
1. http://ocltc.amd.com:8111/viewModification.html?modId=128289&personal=true&tab=vcsModificationBuildshttp://ocltc.amd.com:8111/viewModification.html?modId=128358&personal=true&tab=vcsModificationBuilds
2. release build, run hip program, there is no log
3. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=4294967295
There was a lot of logs.
4. fastdebug build, run hip program,
export LOG_LEVEL=2
export GPU_LOG_MASK=4294967295
There was no logs.
5. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=4294967294
There was much less logs.
6. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=47102
There was even much less logs. The logs was expected according to the mask.
7. Tested step 2 to 6 similarily in Windows and Linux
ReviewBoard: http://ocltc.amd.com/reviews/r/18215
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_internal.hpp#46 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#82 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_stream.cpp#26 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hiprtc_internal.hpp#2 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_svm.cpp#29 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/comgrctx.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devkernel.cpp#29 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devprogram.cpp#68 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#137 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#91 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#100 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/commandqueue.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/runtime.cpp#40 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/debug.hpp#10 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#323 edit
SWDEV-198862 - Options for hip-clang-vdi path to provide the chicken bits, or functional equivalents to HCC_OPT_FLUSH
Add HCC_OPT_FLUSH flag to use fence scope agent when possible for HIP VDI. The flag is defaulted to turn on, similiar to HIP HCC.
Add AMD_OCL_OPT_FLUSH to use fence scope agent when possible for OpenCL. This was tested in Windows and PAL. Default is off.
This flag can be used for future OpenCL test.
Tests:
1. http://ocltc.amd.com:8111/viewModification.html?modId=127189&personal=true&tab=vcsModificationBuilds
The teamcity test includes HIP - VDI - Rocm tests.
2. VEGA10 , Windows, HIP, 110 hiptests PASS.
3. VEGA10 , Linux AMDGPU PRO, HIP - PAL, 110 hiptests PASS.
Newer:
http://ocltc.amd.com:8111/viewModification.html?modId=127193&personal=true&tab=vcsModificationBuilds
Reviewboard: http://ocltc.amd.com/reviews/r/18092/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/device.cpp#247 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#342 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#89 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#29 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#321 edit
SWDEV-189650 - [HIP-CLANG][HIP/VDI/PAL] Hangs on test hip_threadfence_system
1. In HIP + VDI + ROCm, allow SVM atomic in VEGA10 and later ASIC. GFX8 (Tonga) was enabled before.
2. In HIP + VDI + PAL Linux driver, allow SVM atomic in VEGA10 and later ASIC.
Tests:
1. In HIP + VDI + ROCm, hip_threadfence_system test passed.
2. In HIP + VDI + PAL + Linux , hip_threadfence_system test passed.
3. OpenCL + PAL, clinfo and ocltest runtime test pass.
4. OpenCL + ROCM, clinfo and ocltest runtime test pass.
5. Windows 10, VEGA 10, clinfo and and ocltest runtime test pass. hip_threadfence_system test passed by skipping the test.
Teamcity presubmission test:
http://ocltc.amd.com:8111/viewModification.html?modId=127083&personal=true&tab=vcsModificationBuildshttp://ocltc.amd.com:8111/viewModification.html?modId=127076&personal=true&tab=vcsModificationBuilds
ReviewBoard: http://ocltc.amd.com/reviews/r/18077/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#73 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#171 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#80 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#134 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocmemory.cpp#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#320 edit
SWDEV-192384 - [HIP CQE][HIPonPAL][19.40] hipBindTexRef1DFetch, hipTextureRef2D are failed on all ASICs for both Win/Lnx
The runtime cannot trivially determine all the resources that will be used by a kernel, thus it can fail to make all of them resident.
1. Add new runtime flag PAL_ALWAYS_RESIDENT. Enabling this setting will cause resources to become resident at allocation time.
2. Set the default value of the above flag to true for HIP and false for OCL.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/18054/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#79 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#30 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#100 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.hpp#27 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#153 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#319 edit
SWDEV-200614 - [Schneider] Crash in Agisoft when run in mGPU environment
- Add a workaround for memory pinning path. It will perform 2-step copy to make sure memory pinning doesn't occur on the first unaligned page, because in Windows memory manager can have CPU access to the allocation header in another thread and a race condition is possible
- change some default setting for staging and pinned paths, because PCIE gen3 performance.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#96 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#150 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#317 edit
SWDEV-145570 - Rename OCL_DUMP_CODE_OBJECT to GPU_DUMP_CODE_OBJECT.
Since this is used by both OCL and HIP. Rename to avoid confusion.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/devprogram.cpp#59 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#315 edit
SWDEV-193423 - HIP/VDI - Support for lazy hsa queue creation
- Add queue pool support for HSA HW queues. GPU_MAX_HW_QUEUES controls the pool size. The current default value is 4 (the number of active pipes on GPU).
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#132 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#38 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#81 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#24 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#314 edit
SWDEV-195023 - [CQE OCL][Navi10][RESOLVE] corruption seen in thumbnail for mxf clip after enabling temporal denoiser in Davinci resolve app
- Add a workaround for missing custom pitch in gfx10 HW. It can be disabled with GPU_IMAGE_BUFFER_WAR=0. Workaround implements double copy with an image without pitch.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palmemory.cpp#26 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palmemory.hpp#12 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#89 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.hpp#24 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#138 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.hpp#62 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#313 edit
SWDEV-180872 - Runtime support changes for Cooperative Group Features
- Initial implementation of the core functionality. Disabled by default. Use GPU_ENABLE_COOP_GROUPS=1 to enable the feature.
- Runtime uses device queue for cooperative executions with a synchronization on the launched queue.
- The current implementation is pure runtime change and it can work if only one app uses this feature. No ROCr/KFD support was added or tested
- Only inline assembler was tested
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_device.cpp#20 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_device_runtime.cpp#15 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_hcc.def.in#15 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_hcc.map.in#17 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_module.cpp#28 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_platform.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#338 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#606 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#171 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.hpp#9 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#142 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.hpp#39 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palschedcl.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#135 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.hpp#61 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocblit.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocblit.hpp#12 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#127 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#37 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocschedcl.cpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#75 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#23 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#94 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.hpp#92 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#311 edit
SWDEV-188631 - Allocating large buffers produce wrong kernel result on Windows
1. Set a limit for USWC allocations to 2GB on Windows.
2. Allocations larger than the specified limit will get placed into pinned memory instead.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/17407/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#138 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#310 edit
SWDEV-185452 - Offline compilation failing on a VM, producing error CL_PLATFORM_NOT_FOUND_KHR
1. Don't load a platform if there are no devices available for it. If there is no platform that has visible devices, only allow the PAL platform to load.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/17419/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_icd.cpp#34 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.cpp#19 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#309 edit
SWDEV-145570 - Support loading fat binary generated through --genco by hipModuleLoad.
hip-clang --genco generates fat binary instead of code object. To support that
we need to extract code object from fat binary in hipModuleLoadData. This is
needed for hipRTC since multiple GPU archs may be passed.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_module.cpp#27 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_platform.cpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devprogram.cpp#42 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#308 edit
SWDEV-189140 - Add P2P support in PAL path
- PAL requires P2P resource open on the usage device. Add the new interface to open the resource
- Add a hidden P2P device object creation into amd::Memory. It can be activated with OCL context that has a single device.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_p2p_amd.cpp#2 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#337 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#134 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palmemory.cpp#25 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#74 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#28 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#80 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.hpp#23 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#133 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#126 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#93 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/memory.cpp#136 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/memory.hpp#109 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#306 edit
SWDEV-145570 - [HIP] Increase SVM size
In MGPU setups, we need more SVM to accomodate up to 4 GPUs @ 16GB each.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#302 edit
SWDEV-132899 - [OCL][GFX10] report number of WGP by default on gfx10 ASICs
Both HSAIL/SC and LC compilers use WGP mode by default on gfx10 ASICs (i.e., COMPUTE_PGM_RSRC1.WGP_MODE is set to 1 by both compilers) therefore runtime should report number of WGP (i.e., CU/2) on gfx10 ASICs by default.
The new environment variable (GPU_ENABLE_WGP_MODE = 0) can be used to force CU mode on LC (i.e., -mcumode option) if its needed (HSAIL/SC doesn't have any compiler option for forcing the CU mode)
Also, using the new environment variable (GPU_ENABLE_WAVE32_MODE) to control the wave32 mode on gfx10+.
ReviewRequestURL = http://ocltc.amd.com/reviews/r/16435/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#329 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devprogram.cpp#27 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#121 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#65 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#301 edit
SWDEV-159036 - HBCC is off by default in OCL on WX9100
- Disable HBCC, since it causes ocltst failures on TC
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#293 edit
SWDEV-159036 - HBCC is off by default in OCL on WX9100
- Add HBCC size to the global memory. It seems KMD fixed all regressions.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#292 edit
SWDEV-151981 - Removal of CPU support on Windows
- Part 6. Remove obsolete environment variables
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#289 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Fix a regression in the AMF test and reenable the suballoc optimization. Rearrange the locks around cache field access only to avoid calling memory release under the cache lock.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#57 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#53 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#18 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#287 edit
SWDEV-132899 - [gfx10][OCL]- Adding support for forcing WaveSize32 from runtime for testing on gfx10 HW emulator
Motivation: During testing ocltst on Windows on PAL/HSAIL/SC path on gfx10 HW emulator, it was found that SC uses WaveSize64 by default for compute kernels.
SC also has an interface that can be used for forcing the WaveSize to 32 or 64.
- Adding the "-force-wave-size-32" into compiler to be passed down to Finalizer/SC
- Adding environment variable "GPU_FORCE_WAVE_SIZE_32" that can be used from runtime to force WaveSize32 compilation in HSAIL/SC path
ReviewBoardURL = http://ocltc.amd.com/reviews/r/14364/
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/hsail_be.cpp#69 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#138 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#55 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#284 edit
SWDEV-142271 - Performance drop is observed in Ocean Surface Simulation of Compubenchcl in 17.50 when compared to 17.Q4.1
- Rewrite the adaptive mode for waveliimiter. Make sure the performance feedback corresponds to the right wave count. Add the new sampling logic to find the best number, based on average performance.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#295 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.cpp#14 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.hpp#10 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palkernel.hpp#15 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#71 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.hpp#39 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palwavelimiter.cpp#4 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palwavelimiter.hpp#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#80 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.hpp#88 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#282 edit
SWDEV-132238 - [CQE OCL][Vega10][DTB-Blocker][QR] 'Allocation (Single)' test of WF Conformance is failing; Faulty CL# 1451444
- Disable reporting extra HBCC memory by default. Reporting extra memory can be reenabled with GPU_ADD_HBCC_SIZE=1
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#60 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#280 edit
SWDEV-130808 - Add support of two new queries: CL_DEVICE_PREFERRED_WORK_GROUP_SIZE_AMD, CL_DEVICE_MAX_WORK_GROUP_SIZE_AMD.
- Restore the original behavior when setting GPU_MAX_WORKGROUP_SIZE.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusettings.cpp#356 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#33 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocsettings.cpp#24 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#279 edit
SWDEV-129129 - [[CQE OCL][Vega vs Fiji] Upto 12% Performance drop observed on VEGA10 compared to FIJI while running BlackMagic Davinci Resolve
- Force tiny read_only buffers into USWC memory. That will avoid expensive tiny data uploads, which occur every frame.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#59 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#278 edit