SWDEV-94610 - Make sure each kernarg segment sits on a different cache line (align the kernargs on cache lines at minimum). Minor misc cleanups.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#13 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.cpp#14 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.hpp#8 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#27 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#13 edit
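A minimal sketch of the kernarg alignment idea from the SWDEV-94610 entry above, not the code in rocvirtual.cpp. The 64-byte cache line size, the KernargPool type and the reserve() helper are assumptions made for illustration. Each segment starts on a cache line boundary and is padded to the end of its last line, so two segments never share a line.

```cpp
#include <cassert>
#include <cstddef>

// Assumed cache line size in bytes; the real value is device-specific.
constexpr std::size_t kCacheLineSize = 64;

// Round `value` up to the next multiple of `alignment` (a power of two).
constexpr std::size_t alignUp(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical bump allocator over a kernarg pool.
struct KernargPool {
  std::size_t offset = 0;

  // Returns the start offset of a segment of `size` bytes. The segment is
  // aligned to the larger of its own alignment and the cache line size.
  std::size_t reserve(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    const std::size_t align =
        alignment < kCacheLineSize ? kCacheLineSize : alignment;
    const std::size_t start = alignUp(offset, align);
    // Pad the tail as well, so the next segment cannot share this line.
    offset = alignUp(start + size, kCacheLineSize);
    return start;
  }
};
```

With these assumptions, a 24-byte segment followed by a 100-byte segment lands at offsets 0 and 64 rather than 0 and 24.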
SWDEV-101383 - [RS_DVR][MGPU] Slave GPU is blocked from going into BACO when DVR process is active (no recording or instant replay)
- Fix a memory leak
- Also make sure to use the VALIDATE_ONLY flag properly, as bindExternalDevice can be called even during context creation, for which we can't close the adapter
ReviewBoardURL = http://ocltc.amd.com/reviews/r/11330/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#555 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#174 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.h#62 edit
SWDEV-79278 - [OpenCL][PAL] Fix a regression on gfx9 after CL#1309875 that caused all the OCLTST tests to fail on the gfx9 emulator. Don't add any extra entries to the GfxIpDeviceInfo table, as this table must match the GfxIpLevel enum (located in //depot/stg/pal/inc/core/palDevice.h).
ReviewBoardURL = http://ocltc.amd.com/reviews/r/11313/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldefs.hpp#11 edit
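One way to catch the table/enum mismatch described in the SWDEV-79278 entry above at compile time, sketched under assumptions. The IpLevel enum, its Count sentinel and the kDeviceInfo table are hypothetical stand-ins, not the real GfxIpLevel or GfxIpDeviceInfo definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical enum; the trailing Count entry exists only for this sketch.
enum class IpLevel : unsigned { None = 0, LevelA, LevelB, LevelC, Count };

struct DeviceInfo {
  const char* name;
};

// One row per enum value, in enum order.
constexpr DeviceInfo kDeviceInfo[] = {
    {"none"}, {"level-a"}, {"level-b"}, {"level-c"}};

// Adding or removing a row without touching the enum fails the build.
static_assert(sizeof(kDeviceInfo) / sizeof(kDeviceInfo[0]) ==
                  static_cast<std::size_t>(IpLevel::Count),
              "device info table must have one entry per IpLevel value");

// The table is indexed directly by the enum value.
inline const DeviceInfo& infoFor(IpLevel level) {
  assert(level != IpLevel::Count);
  return kDeviceInfo[static_cast<std::size_t>(level)];
}
```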
SWDEV-101448 - [CQE OCL][Brahma][PERF][QR] ~21% perf drop is observed with lulesh-cl subtest of ComputeApps tests : Faulty CL # 1306133
- Use the logic for transfer size before CL#1306133
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#124 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#10 edit
SWDEV-101315 - Fix PerfCounter not working under CodeXL.
1. Need to map ORCA PerfCounter block to PAL PerfCounter block/instance.
2. CodeXL could try to create PerfCounters that don't exist in HW, so need to handle that and return 0 as the result.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palcounters.cpp#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palcounters.hpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#21 edit
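A rough sketch of the two points in the SWDEV-101315 entry above: translate an ORCA block id to a PAL block/instance pair, and report 0 for counters that have no hardware mapping. PalCounterId, CounterTable, SampleFn and readCounter are hypothetical names, and the block ids used with them are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical PAL-side identity of a counter.
struct PalCounterId {
  int block;
  int instance;
};

// Hypothetical ORCA block id -> PAL block/instance table.
class CounterTable {
 public:
  void add(int orcaBlock, PalCounterId id) { map_[orcaBlock] = id; }

  // Returns false when the requested block has no PAL equivalent.
  bool find(int orcaBlock, PalCounterId* out) const {
    assert(out != nullptr);
    const auto it = map_.find(orcaBlock);
    if (it == map_.end()) return false;
    *out = it->second;
    return true;
  }

 private:
  std::map<int, PalCounterId> map_;
};

// Stand-in for whatever actually samples the hardware counter.
using SampleFn = std::uint64_t (*)(PalCounterId);

// Unmapped counters yield 0 instead of an error.
std::uint64_t readCounter(const CounterTable& table, int orcaBlock,
                          SampleFn sample) {
  PalCounterId id{};
  if (!table.find(orcaBlock, &id)) return 0;
  return sample(id);
}
```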
SWDEV-101853 - Fix the build, add a "return NULL" after the assert.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.hpp#7 edit
SWDEV-94610 - Fill the compileSize_ and compileSizeHint_ info from the LC metadata.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.cpp#13 edit
SWDEV-101853 - roc::Kernel cleanups:
- Remove unused classes & member functions/variables.
- Flatten vector arguments for the HSAIL path to remove the need for numElem_.
- Consolidate initArguments in a single loop for the HSAIL path.
- Use the Kernel::Argument to fill the OCL descriptor as much as possible.
- Set the access qualifier for both buffers and images.
- Fix the indentation and coding conventions.
- Add new ROC_ARG_TYPE type for hidden arguments
- Add an index_ field to roc::Kernel::Argument to record the OCL signature index for this argument, or -1 for hidden arguments
- Handle the hidden arguments as any other argument at dispatch (now included in the hsailArgList_)
- roc::Kernel::hsailArgAt(int) now returns the kernel argument for the given position in the OCL signature, not the position in the hsailArgList_.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.cpp#12 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.hpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#12 edit
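The lookup change in the roc::Kernel cleanup above could be sketched as follows. Argument and KernelSketch are simplified, hypothetical stand-ins for the real roc::Kernel types; only the index_ convention (-1 for hidden arguments) comes from the change description.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified argument record.
struct Argument {
  int index_;         // position in the OCL signature, or -1 if hidden
  std::size_t size_;  // argument size in bytes
};

class KernelSketch {
 public:
  explicit KernelSketch(std::vector<Argument> args) : args_(std::move(args)) {}

  // Look up by OCL signature position. Hidden arguments sit in the same
  // list but are never returned, because their index_ is -1.
  const Argument* argAt(int signatureIndex) const {
    if (signatureIndex < 0) return nullptr;
    for (const Argument& arg : args_) {
      if (arg.index_ == signatureIndex) return &arg;
    }
    return nullptr;
  }

  // Number of entries in the dispatch list, hidden arguments included.
  std::size_t listSize() const { return args_.size(); }

 private:
  std::vector<Argument> args_;
};
```

With hidden arguments interleaved in the list, signature position 1 may be the fourth list entry, which is why the lookup matches on index_ instead of indexing the list directly.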
SWDEV-101383 - Back out CL1310033 as it is causing Carrizo Win 10 Sanity test to crash at ocltst module ocldx.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#553 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#172 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.h#61 edit
SWDEV-2 - Change OpenCL version number from 2216 to 2217.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1963 edit
SWDEV-2 - Change OpenCL version number from 2215 to 2216.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1962 edit
SWDEV-101169 - Compile the PCH file from <stdin> instead of a file reference. This removes the requirement to have the original file present when using the PCH file.
Affected files ...
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/headers/build/Makefile.headers#9 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/irif/build/Makefile.irif#6 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/ockl/build/Makefile.ockl#7 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/oclc/build/Makefile.oclc#9 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/ocml/build/Makefile.ocml#7 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/opencl/build/Makefile.opencl#9 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/roccompiler.cpp#14 edit
SWDEV-2 - Change OpenCL version number from 2214 to 2215.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1961 edit
SWDEV-94610 - The spec says that the value returned for HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH does not include the NUL terminator. We should add one before using the string.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#25 edit
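A sketch of the terminator handling described in the entry above. It deliberately avoids the HSA API: FakeSymbol is a hypothetical stand-in for a query that reports a length without the NUL and then copies exactly that many characters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical symbol whose name query behaves like the spec wording:
// the reported length excludes the terminator, and the copy writes none.
struct FakeSymbol {
  const char* raw;
  std::size_t nameLength() const { return std::strlen(raw); }
  void copyName(char* dst) const { std::memcpy(dst, raw, nameLength()); }
};

// Reserve length + 1 bytes so the buffer is a valid C string afterwards.
std::string readSymbolName(const FakeSymbol& sym) {
  const std::size_t len = sym.nameLength();
  std::vector<char> buf(len + 1, '\0');  // extra byte holds the NUL
  sym.copyName(buf.data());
  buf[len] = '\0';  // explicit, so it does not rely on the fill value
  return std::string(buf.data());
}
```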
SWDEV-101383 - [RS_DVR][MGPU] Slave GPU is blocked from going into BACO when DVR process is active (no recording or instant replay)
- If the OS is Win10, there is no need to do extensive adapter init.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/11241/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#552 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#171 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.h#60 edit
SWDEV-101621 - [CQE OCL][OpenCL on PAL] 6 WF Conformance tests are failing
- Make sure the rowPitch is aligned to pixels for images created from buffer
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#20 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#10 edit
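One possible reading of the rowPitch fix above, as a sketch only: the byte pitch of an image created from a buffer is converted to whole pixels, rounding up so that a row never ends in the middle of a pixel. The rowPitchInPixels helper is hypothetical and is not taken from paldevice.cpp or palresource.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Convert a row pitch in bytes to a pitch in pixels, rounding up.
// A byte pitch of 0 is treated as "tightly packed", i.e. `width` pixels.
std::size_t rowPitchInPixels(std::size_t rowPitchBytes,
                             std::size_t elementSize, std::size_t width) {
  assert(elementSize != 0);
  if (rowPitchBytes == 0) return width;
  return (rowPitchBytes + elementSize - 1) / elementSize;
}
```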
SWDEV-79278 - [OpenCL][PAL] force Vega10(gfx9)(aka: Greenland) to use PAL backend
ReviewBoardURL = http://ocltc.amd.com/reviews/r/11279/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#551 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Improve image fill performance with multiple writes in a single thread. The current split has 3 regions
Affected files ...
... //depot/stg/opencl/drivers/opencl/library/common.hsa/src/blitKernels.cl#4 edit
... //depot/stg/opencl/drivers/opencl/library/common/src/blitKernels.cl#4 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#123 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.hpp#40 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#8 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.hpp#4 edit
SWDEV-94610 - Restore the amdgpu_metadata.[ch]pp files. We need to share these files between different projects, and should avoid branching them. Ideally, they would be part of a metadata utility library.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/amdgpu_metadata.cpp#1 add
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/amdgpu_metadata.hpp#1 branch
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.cpp#9 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocmetadata.cpp#3 delete
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocmetadata.hpp#4 delete
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.hpp#11 edit
SWDEV-2 - Change OpenCL version number from 2213 to 2214.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1960 edit
SWDEV-94610 - Use the metadata to set the correct size for pointer arguments. Pointers to different address spaces may be of different sizes.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#11 edit
SWDEV-94610 - Fix the argName length issue. The string returned by the ROCR is already NUL-terminated.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#22 edit
SWDEV-94610 - Fix the API::get_kernel_arg_info conformance test failure. The runtime metadata needs to return references from Name() and TypeName() instead of temporary strings. Name().c_str() should be valid until the program is destroyed.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.cpp#8 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocmetadata.cpp#2 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocmetadata.hpp#3 edit
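The lifetime issue behind the get_kernel_arg_info fix above can be illustrated with a small sketch. ArgMetadata is a hypothetical class, not the runtime metadata type; NameCopy() is included only to show the failure mode.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical argument metadata that owns the strings it hands out.
class ArgMetadata {
 public:
  ArgMetadata(std::string name, std::string typeName)
      : name_(std::move(name)), typeName_(std::move(typeName)) {}

  // Returning const references keeps c_str() valid while *this is alive.
  const std::string& Name() const { return name_; }
  const std::string& TypeName() const { return typeName_; }

  // Returning by value creates a temporary: a pointer obtained from
  // NameCopy().c_str() dangles at the end of the full expression.
  std::string NameCopy() const { return name_; }

 private:
  std::string name_;
  std::string typeName_;
};
```

A caller that stores Name().c_str() gets a pointer into the member string, so it stays valid for the lifetime of the owning object.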
SWDEV-2 - Change OpenCL version number from 2212 to 2213.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1959 edit
SWDEV-94644 - Run prepare-builtins on the control functions.
Affected files ...
... //depot/stg/opencl/drivers/opencl/library/build/Makefile.library#53 edit
... //depot/stg/opencl/drivers/opencl/make/amdgcn.git/oclc/build/Makefile.oclc#7 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#19 edit
SWDEV-86035 - Enable PAL for GFX9 by default
- GPU_ENABLE_PAL=0 will force GSL backend for GFX9
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#550 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#19 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#256 edit
SWDEV-101678 - Create a new instance of the ROCm-OpenCL-Driver for each call to compileImpl and linkImpl.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/device.cpp#202 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/roccompiler.cpp#11 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#12 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#18 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.hpp#9 edit
SWDEV-2 - Change OpenCL version number from 2211 to 2212.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1958 edit
SWDEV-101206 - [CQE OCL][Perf][G][QR] Up to ~9% performance drop observed while running Video Composition subtest of Compubench; Faulty CL#1306133
- Use the original logic without DMA flush. Flush on staging write helps with a blocking op only, but currently VDI doesn't have that information.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#122 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#7 edit
SWDEV-2 - Change OpenCL version number from 2210 to 2211.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1957 edit
SWDEV-2 - Change OpenCL version number from 2209 to 2210.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1956 edit
SWDEV-2 - Change OpenCL version number from 2208 to 2209.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1955 edit
SWDEV-2 - Change OpenCL version number from 2207 to 2208.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1954 edit
SWDEV-94610 - Wait on every kernel dispatch if env.GPU_FLUSH_ON_EXECUTION is set.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#9 edit
SWDEV-101354 - HSA HLC: fix unify metadata pass
When we link multiple modules the metadata is duplicated, so after we link with our library the bitcode is twice as large as it needs to be.
Besides, we have not unified llvm.ident metadata since the llvm 3.6 merge.
Fix that:
1. Add llvm.ident to the processing;
2. Do not duplicate strings within unified metadata;
3. Run the unification pass post link, not before the link.
Now, since our library is compiled for OpenCL 2.0, we would always get OCL version 2.0 as a maximum. That is not really correct, and since
the pass was not really working before, it would lead to a regression, as we would fail to identify the kernel's correct OpenCL version and
perform simplifications for 1.2. Now the pass will pick the first version, which should represent the kernel module. That might not be
100% correct because we may have several kernel modules, but a proper fix would require correctly identifying the library as 1.2, which is
troublesome. In the current state this just keeps the status quo.
Testing: smoke, precheckin
Reviewed by Evgeny Mankov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/linker.cpp#152 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/linker/include/AMDFixupKernelModule.h#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/linker/lib/AMDFixupKernelModule.cpp#7 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/linker/tools/opencl-link/opencl-link.cpp#10 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/Scalar/AMDUnifyMetadata.cpp#2 edit