ECR #304775 - Disable Pre-SI asics initialization in runtime. EG and NI are no longer supported from 15.30
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudefs.hpp#125 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#519 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#135 edit
ECR #304775 - back out 1179705 since it also affects AMDIL path
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusettings.cpp#322 edit
EPR #397491 - Disable 32 bit OCL2.0 on Win 10 for now. isWindows10OrLater() call doesn't work. So, passing a flag from GSL to runtime to disable 32 bit OCL2.0 on Win 10.
http://ocltc.amd.com/reviews/r/8234/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusettings.cpp#321 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#133 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/backend.h#8 edit
ECR #304775 - switch on "denorm" support on VI and make HSAIL default to not generating ftz modifier
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILISelDAGToDAG.cpp#48 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusettings.cpp#320 edit
EPR #419072 - [OpenCL2.0] Enable 16MB large on device queues
- Enable device queue creation up to 12MB. That should allow to run Intel SDK sample from the EPR that requires 6MB queue only.
- Currently a queue with >12.5MB size has a significant performance degradation. Thus the current max possible is 12MB. In general it's preferable to use the queue size more suitable for the task, rather than max possible.
Affected files ...
... //depot/stg/opencl/drivers/opencl/library/hsa/hsail/src/devenq/schedule.cl#10 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#115 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.hpp#38 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudefs.hpp#123 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#517 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusched.hpp#17 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#372 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#131 edit
EPR #421017 - IOMMU2/SVM on CZ Win10, the bit INST_ATC of COMPUTE_PGM_HI needs to be set for device enqueue.
Affected files ...
... //depot/stg/opencl/drivers/opencl/library/hsa/hsail/src/devenq/schedule.cl#9 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusched.hpp#16 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#370 edit
EPR #424562 - Add Averaging algorithm to Wave Limiter.
1. Extract the algorithms to a sub-class of the wave limiter class.
2. Add Averaging algorithm
This averaging algorithm typically improves performance of BasemarkCL wave simulation by 8% on Tonga/Fiji than the current smooth algorithm. This change has not enable the averaging algorithm yet. Follow-up changes should be made to intelligently select which algorithm to use.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.cpp#7 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.hpp#4 edit
EPR #397491 - Back out changelist 1177450. Disable 32 bit generic address space
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#130 edit
EPR #397491 - Enable generic address space for 32 bit on Windows
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#129 edit
ECR #333753 - ORCA RT/Compiler Lib/aoc2: AMD HSA Code Object Import feature (part II) - arbitrary hidden (extra) kernargs support
Only HSAIL path is affected. It doesn't affect blit kernels.
To use offline by aoc2:
aoc2 -hsacodeobject=<importing_code_object_filename> -numhiddenkernargs=<num> -cl-std=CL2.0 -march=hsail(-64) -mdevice=Bonaire <source_cl_filename>
To use online by setting env:
AMD_DEBUG_HSA_NUM_HIDDEN_KERNARGS=<num>
where num >= 0. If num == 0, then no additional arguments will be added on RT for every kernel. The default value is unchanged and equal to 6 for now.
Misc:
+ get rid of PRE & POST defines in Compiler Lib, as they started to conflict with ugl\gl\gs\hwl\ headers with the same defines.
+ minor copy/paste eliminations & typo fixes
+ ocltst complib tests update
Testing: pre check-in, manually based on ocl sdk MatrixMultiplication
Reviewers: Brian Sumner, German Andryeyev, Nikolay Haustov, Artem Tamazov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/v0_8/if_acl.cpp#72 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/hsail_be.cpp#49 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/metadata.cpp#8 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/include/v0_8/aclDefs.h#5 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/include/v0_8/aclEnums.h#19 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/include/v0_8/aclStructs.h#17 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/bif_section_labels.hpp#21 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#10 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.h#20 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/aoc2.cpp#74 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.cpp#181 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#249 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#291 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.hpp#113 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.cpp#199 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#369 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa/hsaprogram.cpp#38 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsakernel.cpp#8 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsakernel.hpp#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsaprogram.cpp#19 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsavirtual.cpp#43 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/complib/CLAssumptionCheck.cpp#43 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/complib/CLEnumCheck.cpp#44 edit
EPR #419362 - Forum [170348]: problem with printf for OpenCL 2.0 kernel build - added a set of missing symbols to flex including space, added a extra backslash to escape sequences that could not be handled by flex and also to colon which is defined in flex as a delimiter!
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/MDParser/AMDILMDParser.l#3 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/MDParser/AMDILMDParser.output#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/MDParser/lex.yy.cpp#4 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/HSAILProducePrintfMetadata.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/Scalar/AMDPrintfRuntimeBinding.cpp#10 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#290 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/compiler/OCLPrintSpecialChars.cpp#1 add
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/compiler/OCLPrintSpecialChars.h#1 add
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/compiler/TestList.cpp#41 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/compiler/build/Makefile.compiler#44 edit
ECR #333753 - cl-denorms-are-zero runtime change - comment out "singleFpDenorm_ = true" for now.
So AMD_GPU_FORCE_SINGLE_FP_DENORM is required to enable this, until a few issues are resolved.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusettings.cpp#316 edit
EPR #394700 - SubAllocation Scheme for ConstantBuffers Implemented in GLL.
This CL introduces sub-allocation for constant buffer for all Asics and currently supported by GLL, it can easily be opted in by OpenCL, OES as well.
OpenCL changes for Sub-allocation scheme for constantbuffers. For more info please see CL#1172125
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.cpp#75 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#126 edit
ECR #304775 - First batch of build fixes for clang.
Fixes hard source errors and a handful of simple warnings, but leaves most other warnings for later. Other errors not fixed here are from adding compile flags that are not understood.
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/clc/src/e2lCommon.h#11 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/opt_level.hpp#4 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/BRIGAsmPrinter.cpp#117 edit
... //depot/stg/opencl/drivers/opencl/opencldefs#162 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#289 edit
EPR #010002 - Change OpenCL version number from 1849 to 1850.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1596 edit
EPR #010002 - Change OpenCL version number from 1848 to 1849.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1595 edit
ECR #333753 - ORCA RT/Compiler Lib: HSA Code Object/RT independent loader introducing/integration into OpenCL.
Changes by Evgeniy Mankov.
Purpose:
Use the same Finalizer & loader for both HSA & ORCA RT.
AMDIL path is not affected.
Changes:
1. The whole BRIG is finalized now instead of per kernel finalization (both in gpuprogram & hsail_be).
2. HSALoader is changed in order to work with CodeObject and new HSA Loader's API <96> Context. Now it is in ORCA<92>s gpuprogram instead of Compiler Lib.
3. brig_loader.cpp is removed from compiler lib, as well as __aclHSALoader function exports from the whole stack.
4. BIF .text section now contains the whole finalized HSA CodeObject instead of separate symbols for finalized kernels.
5. ORCA RT now works directly with amd_kernel_code_t and doesn't need any SC metadata anymore.
6. aoc2 is supplemented with fake offline loader correspondingly.
7. amdocl/complib make sytem changes.
8. test_driver.pl update.
ToDo:
1. Implement disassemble() & BuildLog() functions to support ISA dumping & SC error handling (Konstantin).
2. Global variables initialization by pragma reference (Konstantin). Test to verify: test_basic progvar_prog_scope_init.
3. Code Object without kernels support (Nikolay - ready). Test to verify: test_generic_address_space.exe library_function
testing: windows smoke, pre check-in, ocl conformance 2.0, ocl SDK 2.9
Reviewers: Nikolay Haustov, German Andryeyev
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/amdocl.def.in#13 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/amdocl.map.in#15 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/build/Makefile.api#116 edit
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/amdoclcl.def.in#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/amdoclcl.map.in#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/amdoclcl.def.in#12 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/amdoclcl.map.in#11 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/v0_8/if_acl.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/build/Makefile.gpu#32 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/hsail_be.cpp#44 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/build/Makefile.complib#85 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#9 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.h#18 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/aoc2.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/build/Makefile.aoc2#24 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#248 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudefs.hpp#121 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#288 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.hpp#112 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.cpp#194 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.hpp#59 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuscsi.cpp#33 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#368 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/bin/test_driver.pl#12 edit
ECR #304775 - Fix a crash in memorybandwidth test
- Remove a pinned mem object from the list only if we need a free slot.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#367 edit
ECR #304775 - Mipmaps support. Fix falures after CL#1163104
- don't allocate extra view if level/layer was created already
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#124 edit
ECR #304775 - Mipmaps support
- Fix the view creation for the host path transfers. GSL can ignore the original mipmap surface dimensions and apply the new settings
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#220 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#123 edit
EPR #420701 - WDDM 2.0
- Indicate lite initialization via a new parameter initLite. This is needed as OCL\OGL can query devices first and lot of setup in IOL may not be really needed. On Win10, we create paging\fences for sync which dont get destroyed by the OS unless process is killed. This may prevent slave GPU powerdown in PX cases.
- Disable WriteFence for OpenCL for now.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/7765/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.cpp#72 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#122 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.h#48 edit
EPR #410989 - [Project Brahma] - CL-GL Support
At present, multi-gpu is not supported on Brahma stack. Thus, glxGetContextMVPUINFOAMD hasn't been implemented by OGL.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDeviceGL.cpp#24 edit