ECR #304775 - First batch of build fixes for clang.
Fixes hard source errors and a handful of simple warnings, but leaves most other warnings for later. Other errors not fixed here are from adding compile flags that are not understood.
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/clc/src/e2lCommon.h#11 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/opt_level.hpp#4 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/BRIGAsmPrinter.cpp#117 edit
... //depot/stg/opencl/drivers/opencl/opencldefs#162 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#289 edit
EPR #010002 - Change OpenCL version number from 1849 to 1850.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1596 edit
EPR #010002 - Change OpenCL version number from 1848 to 1849.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1595 edit
ECR #333753 - ORCA RT/Compiler Lib: HSA Code Object/RT independent loader introducing/integration into OpenCL.
Changes by Evgeniy Mankov.
Purpose:
Use the same Finalizer & loader for both HSA & ORCA RT.
AMDIL path is not affected.
Changes:
1. The whole BRIG is finalized now instead of per kernel finalization (both in gpuprogram & hsail_be).
2. HSALoader is changed in order to work with CodeObject and new HSA Loader's API <96> Context. Now it is in ORCA<92>s gpuprogram instead of Compiler Lib.
3. brig_loader.cpp is removed from compiler lib, as well as __aclHSALoader function exports from the whole stack.
4. BIF .text section now contains the whole finalized HSA CodeObject instead of separate symbols for finalized kernels.
5. ORCA RT now works directly with amd_kernel_code_t and doesn't need any SC metadata anymore.
6. aoc2 is supplemented with fake offline loader correspondingly.
7. amdocl/complib make sytem changes.
8. test_driver.pl update.
ToDo:
1. Implement disassemble() & BuildLog() functions to support ISA dumping & SC error handling (Konstantin).
2. Global variables initialization by pragma reference (Konstantin). Test to verify: test_basic progvar_prog_scope_init.
3. Code Object without kernels support (Nikolay - ready). Test to verify: test_generic_address_space.exe library_function
testing: windows smoke, pre check-in, ocl conformance 2.0, ocl SDK 2.9
Reviewers: Nikolay Haustov, German Andryeyev
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/amdocl.def.in#13 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/amdocl.map.in#15 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/build/Makefile.api#116 edit
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/amdoclcl.def.in#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/amdoclcl.map.in#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/amdoclcl.def.in#12 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/amdoclcl.map.in#11 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/v0_8/if_acl.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/build/Makefile.gpu#32 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/hsail_be.cpp#44 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/build/Makefile.complib#85 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#9 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.h#18 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/aoc2.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/build/Makefile.aoc2#24 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#248 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudefs.hpp#121 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#288 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.hpp#112 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.cpp#194 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.hpp#59 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuscsi.cpp#33 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#368 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/bin/test_driver.pl#12 edit
ECR #304775 - Fix a crash in memorybandwidth test
- Remove a pinned mem object from the list only if we need a free slot.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#367 edit
ECR #304775 - Mipmaps support. Fix falures after CL#1163104
- don't allocate extra view if level/layer was created already
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#124 edit
ECR #304775 - Mipmaps support
- Fix the view creation for the host path transfers. GSL can ignore the original mipmap surface dimensions and apply the new settings
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#220 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#123 edit
EPR #420701 - WDDM 2.0
- Indicate lite initialization via a new parameter initLite. This is needed as OCL\OGL can query devices first and lot of setup in IOL may not be really needed. On Win10, we create paging\fences for sync which dont get destroyed by the OS unless process is killed. This may prevent slave GPU powerdown in PX cases.
- Disable WriteFence for OpenCL for now.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/7765/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.cpp#72 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#122 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.h#48 edit
EPR #410989 - [Project Brahma] - CL-GL Support
At present, multi-gpu is not supported on Brahma stack. Thus, glxGetContextMVPUINFOAMD hasn't been implemented by OGL.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDeviceGL.cpp#24 edit
EPR #397491 - Disabling 32 bit generic address space temporarily because of bug 10841.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#121 edit
EPR #420344 - Forum [180211]: enqueueNDRangeKernel crashes to execute device binary if it contains printf statements
This is a temporary workaround to avoid app crash when a kernel has pritntf but the program object is built from a binary (i.e., the printf info is not propagated if the program object is built from a binary).
ReviewBoardURL = http://ocltc.amd.com/reviews/r/7676/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprintf.cpp#36 edit
EPR #420584 - [CQE OCL][ISV][QR][SI] FAHBenchmark application is crashing on all SI cards.
Wave limiter causes FAH crash on SI. Disable wave limiter for SI as a workaround.
Opened bug #10817 to track this issue.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.cpp#6 edit
EPR #419347 - Fix a d3d9 memory leak.
According to https://msdn.microsoft.com/en-us/library/windows/desktop/bb174386(v=vs.85).aspx: Calling IDirect3DDevice9::GetDirect3D will increase the internal reference count on the IDirect3D9 interface. Failure to call IUnknown::Release when finished using this IDirect3D9 interface results in a memory leak..
Although p3d9devEx->Release() has the same effect as p3d9dev->Release(), for clarification we better use p3d9dev->Release() instead.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDeviceD3D9.cpp#13 edit
ECR #304775 - Mipmaps support in OpenCL
- Enable PAD2 bit for miplevel views
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#218 edit
EPR #403782 - IOMMU2/SVM
- Update the caching and hit logic for resource cache to reflect allocation attributes for SVM. Else it can give wrong hits leading to hangs if a regular surface is used for shader upload etc. IOMMUv2 strictly needs shader and command buffers to have EXECUTE attribute.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/7572/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#217 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.hpp#80 edit
EPR #397491 - Disabling generic address space on 32 bit windows too for now.
Back out revision 116 from //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#117 edit
EPR #397491 - According to HSA-Finalizer-ADD, for GPUVM32 private_segment_aperture_base_hi and group_segment_aperture_base_hi should be equal to the 32 bits of the 32 bit private and group segment flat address aperture.
Reviewed by: German
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#362 edit
ECR #333756 - HSA Finalizer: Make sure size of kernarg segment, alignment of kernarg, private and group segments are multiple of 16. Update ORCA runtime assert. [ OpenCL integration of CL 1151953]
Change by Nikolay Haustov
Testing: http://ocltc:8111/viewModification.html?modId=51851&personal=true&init=1&tab=vcsModificationBuilds
Also fix uncovered problem in test.
Testing: pre-checkin
Reviewed by: German
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/sc/HSAIL/hsail-fin/HSAILFinalizer.cpp#16 integrate
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/sc/HSAIL/tests/src/finalizer/features/structural_analysis/short_circuit/short_circuit06.hsail#4 integrate
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#284 edit
EPR #403782 - IOMMU2/SVM
- Disable DX interop on SVM. This is a feature for SVM and may need more work.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/7555/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusettings.cpp#310 edit
ECR #304775 - Mipmaps support
- Following CL#1151650. Change the comparison condition to 1.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#216 edit
ECR #304775 - Mipmaps support
- Enable miplevel flag even for the first mip level when runtime creates a view. Otherwise GSL may change the pitch alignment for the created view.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#215 edit
EPR #397491 - Disable generic address space only on 32 bit Linux
Reviewed by: German
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#116 edit
ECR #333753 - Compiler Lib/RT: libutils.h usage removal due to non-API interface
Utils are to be used only by Compiler Lib itself.
Testing: pre checkin
Reviewers: German Andryeyev, Brian Sumner, Yaxun Liu
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#8 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.h#17 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.cpp#178 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.cpp#193 edit
EPR #419351 - clEnqueueNDRange crash if one doesn't create a device queue and use device enqueue in the kernel
- add a check for defQueue is NULL in case the app didn't create one.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#510 edit
EPR #419065 - [CQE OCL][ISV][QR][G] FAHBenchmark application is crashing.
Two issues:
1. Remove clearing of profileEnabled_ since it may cause incorrect kernel execution time measurement.
2. Blit kernels causes assertion in getWavesPerSH since they do not have wave limiters. Remove the assert. If a kernel has no wave limiter, returns 0 in getWavesPerSH.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#361 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.cpp#5 edit
ECR #304775 - Wave limiter: Fix bug in adaptation.
Dumped waves/simd value is incorrect.
Should exit adptation only after the changed waves/simd value is applied.
Added wave limiter manager to handle situation that one kernel is enqueued to more than one queues. Create wave limiter for each virtual device.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#245 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#283 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.hpp#109 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#360 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.cpp#4 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.hpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.hpp#76 edit
EPR #410821 - Reduced the maxMemAlloc and global memory size on APUs to 75% of uncachedRemoteRAM.
http://ocltc.amd.com/reviews/r/7425/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#509 edit
ECR #304775 - Wave limiter: Fix crash in CompuBenchCL video composition due to profiling data not collected correctly.
Gpuvirtual.cpp only collects profiling data when all events have profiling enabled. Fixed it by adding a member to indicate at least one event has profiling enabled and collect profiling data.
Improved adptation by changing waves/simd only when the last change has been enforced. Also detecting discontinuities in measured data and discard them.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#359 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#129 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.cpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.hpp#2 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#230 edit
ECR #304775 - Fix offline compilation for Hawaii with -cl-fp32-correctly-rounded-divide-sqrt flag
- check for cl-fp32-correctly-rounded-divide-sqrt support uses device info, but device info was never fully updated for offline devices. This change will update device info structure for offline devices as well.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#508 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#143 edit
EPR #403782 - IOMMU2/SVM on CZ Win10, GL Interop changes for SVM.
- Pass flag to GL to disable forceRemoteAllocation during GLDissociate
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDeviceGL.cpp#23 edit
ECR #304775 - Wave limiter: fix bug about some variables not initialized before being used.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.cpp#2 edit