SWDEV-86170 - Need OCL changes for Compute Unit Reservation
- Add support for RT and Medium priority queues
- Use the new packet for the CU mask programming. It will allow CU reservation for RT queue in KMD.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_command.cpp#11 edit
... //depot/stg/opencl/drivers/opencl/library/hsa/hsail/src/devenq/schedule.cl#12 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#546 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#159 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpumemory.cpp#127 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#402 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#139 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.cpp#81 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.h#52 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#165 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/backend.h#12 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/commandqueue.cpp#22 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/commandqueue.hpp#17 edit
SWDEV-92245 - [CQE OCL][QR][G] Multiple 32/64 bit Bolt sample crashes on all CPUs. FaultyCL#1257532
- Bolt calls map/unmap with the same region twice. Add a counter for the same region to track that case
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/device.cpp#195 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#272 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#401 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#6 edit
SWDEV-76870 - Add write feature to AMD extension (Merging //depot/rel/r6/15.30.1023/stream/opencl/... to //depot/stg/opencl/drivers/opencl/...)
Add clEnqueueReadBufferToFileAMD function to AMD extension (added declaration in cl_context.cpp)
1. Added clEnqueueReadBufferToFileAMD_fn function declaration to cl_ext.h
2. Added LiquidFlashFile::writeBlock method who implements transfer from GPU to SSD. In order to avoid code duplication LiquidFlashFile::readBlock and LiquidFlashFile::writeBlock is called from new method LiquidFlashFile::transferBlock (changes in cl_lqdflash_amd.cpp)
3. clEnqueueWriteBufferFromFileAMD and clEnqueueWriteBufferFromFileAMD call internal function EnqueueTransferBufferFromFileAMD who makes the same preparations as clEnqueueWriteBufferFromFileAMD in the prvious release
4. WriteBufferFromFileCommand class is renamed to TransferBufferFromFileCommand and new class makes the same preparation except assigning transfer direction (read or write)
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_context.cpp#48 integrate
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_lqdflash_amd.cpp#13 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_lqdflash_amd.h#4 integrate
... //depot/stg/opencl/drivers/opencl/api/opencl/khronos/headers/opencl2.0/CL/cl_ext.h#24 integrate
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#267 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#396 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#138 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#74 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.hpp#81 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/memory.hpp#96 edit
SWDEV-78103 - Remove the extra flush from WriteMarker call.
Reviewed by: German
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#389 edit
SWDEV-59579 - using existing function to simplify the code.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#301 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#385 edit
SWDEV-59579 - resubmit the changelist 1193161. refactory the Coare-grained SVM and fine grain buffer SVM code path, so that if the device SVM running on supports fine grain system, then the SVM API operation will be on system memory, no need to go through GPU backend. In addition, added support for PX system with CZ on windows 10, which supports SVM fine grain system.
code review:
http://ocltc.amd.com/reviews/r/8530/
precheckin:
http://ocltc.amd.com:8111/viewModification.html?modId=58913&personal=true&buildTypeId=&tab=vcsModificationBuilds&show_all_builds=true
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_svm.cpp#15 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#527 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#152 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#382 edit
SWDEV-59579 - refactory the Coare-grained SVM and fine grain buffer SVM code path, so that if the device SVM running on supports fine grain system, then the SVM API operation will be on system memory, no need to go through GPU backend. In addition, added support for PX system with CZ on windows 10, which supports SVM fine grain system.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_svm.cpp#13 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#256 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#525 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#150 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#380 edit
EPR #426143 - [AVID] clEnqueueWriteImage with a row pitch different from 0 will fail if we use the pre-pinned path
- pass pitch and slice to the copy functions
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#377 edit
EPR #419072 - [OpenCL2.0] Enable 16MB large on device queues
- Enable device queue creation up to 12MB. That should allow to run Intel SDK sample from the EPR that requires 6MB queue only.
- Currently a queue with >12.5MB size has a significant performance degradation. Thus the current max possible is 12MB. In general it's preferable to use the queue size more suitable for the task, rather than max possible.
Affected files ...
... //depot/stg/opencl/drivers/opencl/library/hsa/hsail/src/devenq/schedule.cl#10 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#115 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.hpp#38 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudefs.hpp#123 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#517 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusched.hpp#17 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#372 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#131 edit
EPR #421017 - IOMMU2/SVM on CZ Win10, the bit INST_ATC of COMPUTE_PGM_HI needs to be set for device enqueue.
Affected files ...
... //depot/stg/opencl/drivers/opencl/library/hsa/hsail/src/devenq/schedule.cl#9 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusched.hpp#16 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#370 edit
ECR #333753 - ORCA RT/Compiler Lib/aoc2: AMD HSA Code Object Import feature (part II) - arbitrary hidden (extra) kernargs support
Only HSAIL path is affected. It doesn't affect blit kernels.
To use offline by aoc2:
aoc2 -hsacodeobject=<importing_code_object_filename> -numhiddenkernargs=<num> -cl-std=CL2.0 -march=hsail(-64) -mdevice=Bonaire <source_cl_filename>
To use online by setting env:
AMD_DEBUG_HSA_NUM_HIDDEN_KERNARGS=<num>
where num >= 0. If num == 0, then no additional arguments will be added on RT for every kernel. The default value is unchanged and equal to 6 for now.
Misc:
+ get rid of PRE & POST defines in Compiler Lib, as they started to conflict with ugl\gl\gs\hwl\ headers with the same defines.
+ minor copy/paste eliminations & typo fixes
+ ocltst complib tests update
Testing: pre check-in, manually based on ocl sdk MatrixMultiplication
Reviewers: Brian Sumner, German Andryeyev, Nikolay Haustov, Artem Tamazov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/v0_8/if_acl.cpp#72 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/hsail_be.cpp#49 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/metadata.cpp#8 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/include/v0_8/aclDefs.h#5 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/include/v0_8/aclEnums.h#19 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/include/v0_8/aclStructs.h#17 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/bif_section_labels.hpp#21 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#10 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.h#20 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/aoc2.cpp#74 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.cpp#181 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#249 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#291 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.hpp#113 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.cpp#199 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#369 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa/hsaprogram.cpp#38 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsakernel.cpp#8 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsakernel.hpp#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsaprogram.cpp#19 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsavirtual.cpp#43 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/complib/CLAssumptionCheck.cpp#43 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/complib/CLEnumCheck.cpp#44 edit
ECR #333753 - ORCA RT/Compiler Lib: HSA Code Object/RT independent loader introducing/integration into OpenCL.
Changes by Evgeniy Mankov.
Purpose:
Use the same Finalizer & loader for both HSA & ORCA RT.
AMDIL path is not affected.
Changes:
1. The whole BRIG is finalized now instead of per kernel finalization (both in gpuprogram & hsail_be).
2. HSALoader is changed in order to work with CodeObject and new HSA Loader's API <96> Context. Now it is in ORCA<92>s gpuprogram instead of Compiler Lib.
3. brig_loader.cpp is removed from compiler lib, as well as __aclHSALoader function exports from the whole stack.
4. BIF .text section now contains the whole finalized HSA CodeObject instead of separate symbols for finalized kernels.
5. ORCA RT now works directly with amd_kernel_code_t and doesn't need any SC metadata anymore.
6. aoc2 is supplemented with fake offline loader correspondingly.
7. amdocl/complib make sytem changes.
8. test_driver.pl update.
ToDo:
1. Implement disassemble() & BuildLog() functions to support ISA dumping & SC error handling (Konstantin).
2. Global variables initialization by pragma reference (Konstantin). Test to verify: test_basic progvar_prog_scope_init.
3. Code Object without kernels support (Nikolay - ready). Test to verify: test_generic_address_space.exe library_function
testing: windows smoke, pre check-in, ocl conformance 2.0, ocl SDK 2.9
Reviewers: Nikolay Haustov, German Andryeyev
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/amdocl.def.in#13 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/amdocl.map.in#15 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/build/Makefile.api#116 edit
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/amdoclcl.def.in#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/amdoclcl.map.in#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/amdoclcl.def.in#12 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/amdoclcl.map.in#11 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/v0_8/if_acl.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/build/Makefile.gpu#32 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/hsail_be.cpp#44 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/build/Makefile.complib#85 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#9 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.h#18 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/aoc2.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/build/Makefile.aoc2#24 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#248 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudefs.hpp#121 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#288 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.hpp#112 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.cpp#194 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.hpp#59 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuscsi.cpp#33 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#368 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/bin/test_driver.pl#12 edit
ECR #304775 - Fix a crash in memorybandwidth test
- Remove a pinned mem object from the list only if we need a free slot.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#367 edit
EPR #397491 - According to HSA-Finalizer-ADD, for GPUVM32 private_segment_aperture_base_hi and group_segment_aperture_base_hi should be equal to the 32 bits of the 32 bit private and group segment flat address aperture.
Reviewed by: German
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#362 edit
EPR #419065 - [CQE OCL][ISV][QR][G] FAHBenchmark application is crashing.
Two issues:
1. Remove clearing of profileEnabled_ since it may cause incorrect kernel execution time measurement.
2. Blit kernels causes assertion in getWavesPerSH since they do not have wave limiters. Remove the assert. If a kernel has no wave limiter, returns 0 in getWavesPerSH.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#361 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.cpp#5 edit
ECR #304775 - Wave limiter: Fix bug in adaptation.
Dumped waves/simd value is incorrect.
Should exit adptation only after the changed waves/simd value is applied.
Added wave limiter manager to handle situation that one kernel is enqueued to more than one queues. Create wave limiter for each virtual device.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#245 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#283 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.hpp#109 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#360 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.cpp#4 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.hpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.hpp#76 edit
ECR #304775 - Wave limiter: Fix crash in CompuBenchCL video composition due to profiling data not collected correctly.
Gpuvirtual.cpp only collects profiling data when all events have profiling enabled. Fixed it by adding a member to indicate at least one event has profiling enabled and collect profiling data.
Improved adptation by changing waves/simd only when the last change has been enforced. Also detecting discontinuities in measured data and discard them.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#359 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#129 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.cpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.hpp#2 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#230 edit
ECR #399840 - OpenCL Runtime HW Debug support development - resolve the TDR issues on Kaveri.
1. update the resource descriptors in the runtime trap handler to match those in the HSA HW debug implementation
2. force to use SDMA for device memory map function, which is called when using clHwDbgSetGlobalMemoryAMD() and clEnqueueMapImage() functions, for HW debug
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#214 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gputrap.hpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#358 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#115 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.h#47 edit
EPR #403782 - IOMMU2/SVM
- Handle case of only one DMA engine available, for example with SVM.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/7284/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#506 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#142 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#357 edit
ECR #399840 - OpenCL Runtime HW Debug support development - add support to the VI asics & support the use case of debug registeration in a pre-dispatch callback function
** Cross branch check-in with CL1131894
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_debugger_amd.cpp#7 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudebugmanager.cpp#9 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#501 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#139 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gputrap.hpp#2 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#355 edit
EPR #394115 - Adding the environment variable "GPU_SELECT_COMPUTE_RINGS_ID" to select a specific compute queue for OCL submission. This EV was requested from KMD team for testing the CWSR demo on CZ.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/7082/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#354 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#111 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.h#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#226 edit