EPR #010002 - Change OpenCL version number from 1632 to 1633.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1379 edit
[ROCm/clr commit: 6fddbe6449]
EPR #405753 - Fixed incorrect value of slicePitch returned from clEnqueueMapImage for a 1D image array.
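Illustration (not part of the change): per the OpenCL specification, the image_slice_pitch returned by clEnqueueMapImage for a 1D image array is the size in bytes of each 1D image in the array, so callers use it to step from one array layer to the next. A minimal sketch of that addressing follows. The helper name and its parameters are hypothetical and are not taken from gpumemory.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the runtime): byte offset of texel `x` in
// array layer `layer` of a mapped 1D image array. `slicePitch` is the value
// returned through image_slice_pitch; `elementSize` is the texel size in bytes.
std::size_t texelOffset1DArray(std::size_t x, std::size_t layer,
                               std::size_t slicePitch, std::size_t elementSize) {
    // A layer holds at least one texel, so a smaller pitch cannot be valid.
    assert(slicePitch >= elementSize);
    return layer * slicePitch + x * elementSize;
}
```

With a wrong slicePitch, every layer after the first would be read from or written to the wrong address, which is why the returned value matters to applications.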
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpumemory.cpp#111 edit
[ROCm/clr commit: edb288692d]
EPR #010002 - Change OpenCL version number from 1631 to 1632.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1378 edit
[ROCm/clr commit: 93ee6bb034]
EPR #405194 - Change unroll threshold to LLVM default to partially work around Linpack performance problem.
Prior to CL 1058428, which increased the unroll threshold to 200, the threshold was only 100, which is lower than the LLVM default. Linpack's new ISA has increased register usage, but decreasing the unroll threshold to the previous level does not reduce the register count to its previous level. The increased register usage is probably a new SC problem, so the threshold should probably be increased again in the future. There is no change in register usage with 100 vs. 150 on Linpack.
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#113 edit
[ROCm/clr commit: d5f7502ca8]
ECR #304775 - Remove _ in hsail_64 triple enum name. The name is inconsistent with its own string form and with most other targets. The string form is already "hsail64", but the target name is sometimes "hsail-64". This change does not remove the - in "hsail-64" for the target name since users could be depending on it, although that should also be fixed.
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/codegen.cpp#57 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/linker.cpp#110 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/ADT/Triple.h#36 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/AMDLLVMContextHook.h#22 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/MC/MCObjectFileInfo.cpp#14 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Support/Triple.cpp#47 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Target/HSAIL/TargetInfo/HSAILTargetInfo.cpp#4 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/SPIR/AMDSPIRLoader.cpp#82 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/Scalar/AMDLowerAtomics.cpp#13 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/Utils/AMDUtils.cpp#2 edit
[ROCm/clr commit: c02dacedb2]
EPR #403493 - Block index error for CI and VI, OCL code change
Problem description: The OCL implementation requires HSA to use different block index values for CI and VI. However, the same index value was used for the same counter block on both CI and VI, which caused a segmentation fault.
Root cause: The HSA implementation was not aware of this requirement beforehand.
Solution: Use a different counter block index on CI from the one used on VI.
Functional area: HSA perf counter implementation...
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa/hsacounters.cpp#4 edit
[ROCm/clr commit: f517eefd51]
ECR #333756 - HSA Finalizer: added runtime option to force buffer instructions for global access
This can be used under ORCA RT.
Testing: smoke, smoke_clang, precheckin, clBLAS dgemm
Reviewed by Nikolay Haustov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/scwrapper/SI/scStateSI.cpp#24 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/OPTIONS.def#112 edit
[ROCm/clr commit: d35be99f01]
EPR #010002 - Change OpenCL version number from 1630 to 1631.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1377 edit
[ROCm/clr commit: 612ed6149b]
EPR #010002 - Change OpenCL version number from 1629 to 1630.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1376 edit
[ROCm/clr commit: 8751996be9]
EPR #400016 - Keep the path of the temp folder if the app is a Windows app
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/os/os_win32.cpp#39 edit
[ROCm/clr commit: 8f5b43ffd1]
EPR #010002 - Change OpenCL version number from 1628 to 1629.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1375 edit
[ROCm/clr commit: c777c3e198]
EPR #405458 - clinfo segfaults when ENABLE_CAL_SHUTDOWN=1.
For the global variables of:
std::map <std::string, int> OptionNameMap[2];
std::map <std::string, int> NoneSeparatorOptionMap[2];
std::map <std::string, int> FOptionMap;
std::map <std::string, int> MOptionMap;
We don't need to call the clear() method explicitly, since the std::map destructor cleans things up (valgrind memcheck doesn't report any leak related to these global variables after this change). Besides, on Linux amd::option::teardown() is called after the global variables' destructors have run, so calling clear() there touches already destroyed objects and causes the segfault.
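Illustration (not part of the change): the sketch below shows only the first half of the argument, that a std::map releases all of its elements in its destructor without an explicit clear(). The Tracked type and the function name are hypothetical and exist only to count live objects. The second half (using a global after its destructor has run) is undefined behavior and is deliberately not reproduced here.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical value type that counts how many instances are alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Fills a map, then lets it go out of scope without calling clear().
// Returns the number of Tracked objects still alive afterwards.
int aliveAfterScopeExit() {
    {
        std::map<std::string, Tracked> options;
        options["-O3"];  // operator[] creates a Tracked value for the new key
        options["-g"];
        assert(Tracked::alive == 2);
        // No options.clear() here: the map destructor destroys every element.
    }
    return Tracked::alive;
}
```

The same holds for objects with static storage duration: their destructors run automatically at program exit, so a separate teardown step that clears them adds nothing.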
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/options.cpp#29 edit
[ROCm/clr commit: b9e695d254]
EPR #010002 - Change OpenCL version number from 1627 to 1628.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1374 edit
[ROCm/clr commit: 97c9f5611c]
EPR #010002 - Change OpenCL version number from 1626 to 1627.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1373 edit
[ROCm/clr commit: 126e8c33e1]
EPR #010002 - Change OpenCL version number from 1625 to 1626.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1372 edit
[ROCm/clr commit: 7bd8bf4f9c]
EPR #398128 - Windows 2015, WDDM2.0, New Residency Model
- Modify MarkUsedInCmdBuf in IOL to make sure that MakeResident is called for OpenCL (Part 2)
ReviewBoardURL = http://ocltc.amd.com/reviews/r/5684/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#99 edit
[ROCm/clr commit: ef1f9267eb]
ECR #304775 - Add batching to the device enqueue for possible asynchronous execution
- Increase the max device queue size to 512KB. This allows passing conformance tests that enqueue more jobs than the queue size.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#459 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusched.hpp#13 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuschedcl.cpp#28 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#333 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.cpp#65 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.h#39 edit
[ROCm/clr commit: 2738b30287]
EPR #010002 - Change OpenCL version number from 1624 to 1625.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1371 edit
[ROCm/clr commit: b2cfd32629]
EPR #010002 - Change OpenCL version number from 1623 to 1624.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1370 edit
[ROCm/clr commit: 467ec09d69]
EPR #010002 - Change OpenCL version number from 1622 to 1623.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1369 edit
[ROCm/clr commit: 6f7a3b20f8]
EPR #010002 - Change OpenCL version number from 1621 to 1622.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1368 edit
[ROCm/clr commit: 17edecc562]
ECR #333753 - unify online/offline linkers
The code for "FixUpModule" from the online linker is now moved to
a common file under llvm/lib. This replaces the copy present in
llvm/tools/llvm-link, thus unifying the two linkers.
Reviewed by Stanislav Mekhanoshin, Yaxun Liu (Sam)
Passes smoke, smoke_clang and precheckin.
Also passes OpenCL 2.0 conformance tests.
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/linker.cpp#109 edit
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/AMDFixupKernelModule.h#1 add
... //depot/stg/opencl/drivers/opencl/compiler/llvm/include/llvm/AMDUtils.h#1 add
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Linker/AMDFixupKernelModule.cpp#1 add
... //depot/stg/opencl/drivers/opencl/compiler/llvm/lib/Transforms/Utils/AMDUtils.cpp#1 add
... //depot/stg/opencl/drivers/opencl/compiler/llvm/tools/llvm-link/AMDFixUpModule.cpp#12 delete
... //depot/stg/opencl/drivers/opencl/compiler/llvm/tools/llvm-link/llvm-link.cpp#48 edit
[ROCm/clr commit: 7f55691ebc]
EPR #010002 - Change OpenCL version number from 1619 to 1620.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1366 edit
[ROCm/clr commit: c018bed751]
ECR #304775 - Optimization for rectangular copies (Part 2). Due to the HW restriction of 14 bits for src and dst pitch, it's advantageous to choose the optimal bpp: the higher the bpp, the larger the byte pitch. This indirectly helps to reduce the number of packets for a buffer copy (line by line vs. a single sub_win raw packet).
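Illustration (not part of the change): one way to read the bpp selection above. The sketch assumes the pitch field counts elements and is 14 bits wide, so the largest encodable pitch is (1 << 14) - 1 elements. It also assumes candidate element sizes of 16, 8, 4, 2 and 1 bytes. Both assumptions, the function name and the parameters are hypothetical and are not taken from gpublit.cpp or gpuresource.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Assumption for this sketch: pitch is encoded as an element count in 14 bits.
constexpr std::size_t kMaxPitchElements = (std::size_t{1} << 14) - 1;

// Hypothetical helper: returns the largest element size in bytes that evenly
// divides every byte quantity of the copy and keeps both pitches within the
// 14-bit limit. Returns 0 when no size fits, in which case a caller would
// fall back to copying line by line.
std::size_t chooseBytesPerElement(std::size_t srcPitchBytes, std::size_t dstPitchBytes,
                                  std::size_t srcOffsetBytes, std::size_t dstOffsetBytes,
                                  std::size_t widthBytes) {
    assert(widthBytes > 0);
    const std::size_t candidates[] = {16, 8, 4, 2, 1};
    for (std::size_t bpe : candidates) {
        const bool aligned = srcPitchBytes % bpe == 0 && dstPitchBytes % bpe == 0 &&
                             srcOffsetBytes % bpe == 0 && dstOffsetBytes % bpe == 0 &&
                             widthBytes % bpe == 0;
        const bool fits = srcPitchBytes / bpe <= kMaxPitchElements &&
                          dstPitchBytes / bpe <= kMaxPitchElements;
        if (aligned && fits) return bpe;
    }
    return 0;
}
```

Under these assumptions a 20000-byte pitch cannot be encoded with 1-byte elements, but it can with 4-byte elements (5000 elements), so the whole rectangle can go out as a single packet instead of one packet per line.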
ReviewBoardURL = http://ocltc.amd.com/reviews/r/5605/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#109 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#191 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.hpp#76 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.cpp#64 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.h#38 edit
[ROCm/clr commit: 5efe63df44]
EPR #010002 - Change OpenCL version number from 1618 to 1619.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1365 edit
[ROCm/clr commit: 61fa04cf2b]
EPR #404714 - [CQE OCL][2.0][DTB]Opencl1.2 WF Conf. Math test failed on Pitcairn and Oland due to CL#1065597
- Fix for TC regression after CL#1069020. Move the lock directly to the gsl flush() calls.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#332 edit
[ROCm/clr commit: 7cc2a2d6e3]
ECR #333753 - Compiler Lib: improve & refactor HSAIL text routines
+ HSAIL text is now always inserted into BIF in one place in the Codegen phase
+ AMDIL & HSAIL paths are unified in the Codegen phase
+ Error handling is improved
Testing: make smoke_clang, pre check-in
Reviewers: Brian Sumner, Nikolay Haustov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/v0_8/if_acl.cpp#47 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/hsail_be.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/hsail_be.hpp#10 edit
[ROCm/clr commit: 762e51bb71]
EPR #010002 - Change OpenCL version number from 1617 to 1618.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1364 edit
[ROCm/clr commit: 3f7a110561]
EPR #404714 - [CQE OCL][2.0][DTB]Opencl1.2 WF Conf. Math test failed on Pitcairn and Oland due to CL#1065597
- Add a new MapCacheLock monitor to separate the map cache from the global lock
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#456 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#131 edit
[ROCm/clr commit: 18b88ee095]
EPR #404714 - [CQE OCL][2.0][DTB]Opencl1.2 WF Conf. Math test failed on Pitcairn and Oland due to CL#1065597
- Add VGPU lock to flush() method, because gsl flush for the same context could be called from multiple threads
- Use new scratchAlloc_ monitor for scratch reallocation
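Illustration (not part of the change): the idea of guarding flush() so that concurrent callers on the same context are serialized. The sketch uses std::mutex as a stand-in for the runtime's own lock type. The class and member names are hypothetical and do not mirror gpuvirtual.cpp.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a virtual GPU object; not the runtime's class.
class VirtualGpuSketch {
public:
    // Serializes flushes so two threads never submit for this object at once.
    void flush() {
        std::lock_guard<std::mutex> guard(lock_);
        ++flushCount_;  // placeholder for the real submission work
    }

    // Reads the counter under the same lock to avoid a data race.
    int flushCount() const {
        std::lock_guard<std::mutex> guard(lock_);
        return flushCount_;
    }

private:
    mutable std::mutex lock_;
    int flushCount_ = 0;
};
```

Keeping the lock inside flush() itself, rather than at each call site, means a new caller cannot forget to take it.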
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#455 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#130 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#331 edit
[ROCm/clr commit: a4bede39eb]
EPR #010002 - Change OpenCL version number from 1616 to 1617.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1363 edit
[ROCm/clr commit: f68129b85b]
ECR #333753 - Compiler Lib & RT: Fix for Compiler's build log printing on RT.
+ RT now correctly requests the compiler's build log via aclGetCompilerLog().
+ BuildLog is added for HSAILKernel by moving it from the NullKernel class to the Kernel class.
+ Compiler Lib's appendLogToCL() is fixed.
+ Uses of the API functions aclExtractSection/aclExtractSymbol/aclInsertSection/aclInsertSymbol inside Compiler Lib itself are replaced by their internal implementations extSec/extSym/insSec/insSym, because the API functions clear the build log unnecessarily.
+ Phase info is added to the build log even if a callback function is not provided for aclCompiler.
How to verify:
set AMD_OCL_BUILD_OPTIONS_APPEND="-print-compile-phases -buildlog=stdout"
test_integer_ops integer_ctz
test_integer_ops integer_ctz cpu
Testing: make smoke_clang, selective OCL conf. tests, pre check-in
Reviewer: Brian Sumner, German Andryeyev
Review board: http://ocltc.amd.com/reviews/r/5582/
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/v0_8/if_acl.cpp#46 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/brig_loader.cpp#13 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/hsail_be.cpp#31 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#4 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#228 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#262 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.hpp#100 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.cpp#174 edit
[ROCm/clr commit: 96c74ba5fd]
EPR #010002 - Change OpenCL version number from 1615 to 1616.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1362 edit
[ROCm/clr commit: 7035548e92]
ECR #304775 - HSAIL: Direct SRD support
- Copy SRD to CB1 for image views to avoid waiting for the SRD resource when an image view is destroyed.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#261 edit
[ROCm/clr commit: 83baaf707e]
EPR #010002 - Change OpenCL version number from 1614 to 1615.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#1361 edit
[ROCm/clr commit: ced3dd9589]
ECR #304775 - Refactor code to do line-by-line copies for read/write Rect. This avoids taking the blit copy path, which may be even slower.
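Illustration (not part of the change): what a line-by-line rectangle copy looks like on host memory. Each row is copied separately because source and destination may have different pitches. The function name and parameters are hypothetical. The actual change in gpublit.cpp operates on device resources rather than raw pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies a rectangle of `rows` lines, each `widthBytes`
// wide, one line at a time. Pitches are the byte distances between the starts
// of consecutive lines in the source and destination.
void copyRectByLines(const unsigned char* src, std::size_t srcPitch,
                     unsigned char* dst, std::size_t dstPitch,
                     std::size_t widthBytes, std::size_t rows) {
    // A line cannot be wider than the distance to the next line.
    assert(widthBytes <= srcPitch && widthBytes <= dstPitch);
    for (std::size_t row = 0; row < rows; ++row) {
        std::memcpy(dst + row * dstPitch, src + row * srcPitch, widthBytes);
    }
}
```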
ReviewBoardURL = http://ocltc.amd.com/reviews/r/5567/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#108 edit
[ROCm/clr commit: a5e788c9f8]
ECR #304775 - Correct a typo where I didn't remove the offset from the condition, which made writeRect take the pinning path.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/5566/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#330 edit
[ROCm/clr commit: d40300fab7]