EPR #419072 - [OpenCL2.0] Enable 16MB large on device queues
- Enable device queue creation up to 12MB. That should allow to run Intel SDK sample from the EPR that requires 6MB queue only.
- Currently a queue with >12.5MB size has a significant performance degradation. Thus the current max possible is 12MB. In general it's preferable to use the queue size more suitable for the task, rather than max possible.
Affected files ...
... //depot/stg/opencl/drivers/opencl/library/hsa/hsail/src/devenq/schedule.cl#10 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#115 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.hpp#38 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudefs.hpp#123 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#517 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusched.hpp#17 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#372 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#131 edit
EPR #421017 - IOMMU2/SVM on CZ Win10, the bit INST_ATC of COMPUTE_PGM_HI needs to be set for device enqueue.
Affected files ...
... //depot/stg/opencl/drivers/opencl/library/hsa/hsail/src/devenq/schedule.cl#9 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusched.hpp#16 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#370 edit
ECR #333753 - ORCA RT/Compiler Lib/aoc2: AMD HSA Code Object Import feature (part II) - arbitrary hidden (extra) kernargs support
Only HSAIL path is affected. It doesn't affect blit kernels.
To use offline by aoc2:
aoc2 -hsacodeobject=<importing_code_object_filename> -numhiddenkernargs=<num> -cl-std=CL2.0 -march=hsail(-64) -mdevice=Bonaire <source_cl_filename>
To use online by setting env:
AMD_DEBUG_HSA_NUM_HIDDEN_KERNARGS=<num>
where num >= 0. If num == 0, then no additional arguments will be added on RT for every kernel. The default value is unchanged and equal to 6 for now.
Misc:
+ get rid of PRE & POST defines in Compiler Lib, as they started to conflict with ugl\gl\gs\hwl\ headers with the same defines.
+ minor copy/paste eliminations & typo fixes
+ ocltst complib tests update
Testing: pre check-in, manually based on ocl sdk MatrixMultiplication
Reviewers: Brian Sumner, German Andryeyev, Nikolay Haustov, Artem Tamazov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/v0_8/if_acl.cpp#72 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/hsail_be.cpp#49 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/metadata.cpp#8 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/include/v0_8/aclDefs.h#5 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/include/v0_8/aclEnums.h#19 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/include/v0_8/aclStructs.h#17 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/bif_section_labels.hpp#21 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#10 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.h#20 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/aoc2.cpp#74 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.cpp#181 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#249 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#291 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.hpp#113 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.cpp#199 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#369 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa/hsaprogram.cpp#38 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsakernel.cpp#8 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsakernel.hpp#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsaprogram.cpp#19 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsavirtual.cpp#43 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/complib/CLAssumptionCheck.cpp#43 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/complib/CLEnumCheck.cpp#44 edit
ECR #333753 - ORCA RT/Compiler Lib: HSA Code Object/RT independent loader introducing/integration into OpenCL.
Changes by Evgeniy Mankov.
Purpose:
Use the same Finalizer & loader for both HSA & ORCA RT.
AMDIL path is not affected.
Changes:
1. The whole BRIG is finalized now instead of per kernel finalization (both in gpuprogram & hsail_be).
2. HSALoader is changed in order to work with CodeObject and new HSA Loader's API <96> Context. Now it is in ORCA<92>s gpuprogram instead of Compiler Lib.
3. brig_loader.cpp is removed from compiler lib, as well as __aclHSALoader function exports from the whole stack.
4. BIF .text section now contains the whole finalized HSA CodeObject instead of separate symbols for finalized kernels.
5. ORCA RT now works directly with amd_kernel_code_t and doesn't need any SC metadata anymore.
6. aoc2 is supplemented with fake offline loader correspondingly.
7. amdocl/complib make sytem changes.
8. test_driver.pl update.
ToDo:
1. Implement disassemble() & BuildLog() functions to support ISA dumping & SC error handling (Konstantin).
2. Global variables initialization by pragma reference (Konstantin). Test to verify: test_basic progvar_prog_scope_init.
3. Code Object without kernels support (Nikolay - ready). Test to verify: test_generic_address_space.exe library_function
testing: windows smoke, pre check-in, ocl conformance 2.0, ocl SDK 2.9
Reviewers: Nikolay Haustov, German Andryeyev
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/amdocl.def.in#13 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/amdocl.map.in#15 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/build/Makefile.api#116 edit
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/amdoclcl.def.in#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/amdoclcl.map.in#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/amdoclcl.def.in#12 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/amdoclcl.map.in#11 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/v0_8/if_acl.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/build/Makefile.gpu#32 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/hsail_be.cpp#44 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/build/Makefile.complib#85 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#9 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.h#18 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/aoc2.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/build/Makefile.aoc2#24 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#248 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudefs.hpp#121 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#288 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.hpp#112 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.cpp#194 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.hpp#59 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuscsi.cpp#33 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#368 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/bin/test_driver.pl#12 edit
ECR #304775 - Fix a crash in memorybandwidth test
- Remove a pinned mem object from the list only if we need a free slot.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#367 edit
EPR #397491 - According to HSA-Finalizer-ADD, for GPUVM32 private_segment_aperture_base_hi and group_segment_aperture_base_hi should be equal to the 32 bits of the 32 bit private and group segment flat address aperture.
Reviewed by: German
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#362 edit
EPR #419065 - [CQE OCL][ISV][QR][G] FAHBenchmark application is crashing.
Two issues:
1. Remove clearing of profileEnabled_ since it may cause incorrect kernel execution time measurement.
2. Blit kernels causes assertion in getWavesPerSH since they do not have wave limiters. Remove the assert. If a kernel has no wave limiter, returns 0 in getWavesPerSH.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#361 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.cpp#5 edit
ECR #304775 - Wave limiter: Fix bug in adaptation.
Dumped waves/simd value is incorrect.
Should exit adptation only after the changed waves/simd value is applied.
Added wave limiter manager to handle situation that one kernel is enqueued to more than one queues. Create wave limiter for each virtual device.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#245 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#283 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.hpp#109 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#360 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.cpp#4 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.hpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.hpp#76 edit
ECR #304775 - Wave limiter: Fix crash in CompuBenchCL video composition due to profiling data not collected correctly.
Gpuvirtual.cpp only collects profiling data when all events have profiling enabled. Fixed it by adding a member to indicate at least one event has profiling enabled and collect profiling data.
Improved adptation by changing waves/simd only when the last change has been enforced. Also detecting discontinuities in measured data and discard them.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#359 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#129 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.cpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.hpp#2 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#230 edit
ECR #399840 - OpenCL Runtime HW Debug support development - resolve the TDR issues on Kaveri.
1. update the resource descriptors in the runtime trap handler to match those in the HSA HW debug implementation
2. force to use SDMA for device memory map function, which is called when using clHwDbgSetGlobalMemoryAMD() and clEnqueueMapImage() functions, for HW debug
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#214 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gputrap.hpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#358 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#115 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.h#47 edit
EPR #403782 - IOMMU2/SVM
- Handle case of only one DMA engine available, for example with SVM.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/7284/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#506 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#142 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#357 edit
ECR #399840 - OpenCL Runtime HW Debug support development - add support to the VI asics & support the use case of debug registeration in a pre-dispatch callback function
** Cross branch check-in with CL1131894
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_debugger_amd.cpp#7 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudebugmanager.cpp#9 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#501 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#139 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gputrap.hpp#2 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#355 edit
EPR #394115 - Adding the environment variable "GPU_SELECT_COMPUTE_RINGS_ID" to select a specific compute queue for OCL submission. This EV was requested from KMD team for testing the CWSR demo on CZ.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/7082/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#354 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#111 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.h#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#226 edit
ECR #399840 - OpenCL Runtime HW Debug support development
- use device to control debugger registration and exception notification so that debug event will not be tied to any particular queue.
- use aqlCodeInfo parameter for clHwDbgMapKernelCodeAMD() to be consistent with clHwDbgGetAqlPacketInfoAMD()
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_debugger_amd.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_debugger_amd.h#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudebugmanager.cpp#7 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudebugmanager.hpp#4 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#352 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#127 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.h#43 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hwdebug.cpp#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hwdebug.hpp#6 edit
EPR #403782 - IOMMU2/SVM
Basic changes to enable finegrainsystem.
- OpenCL runtime changes for enabling Fine Grain System on Carrizo
- Check for SVMPointer while unmap, if so skip unmap
ReviewBoardURL = http://ocltc.amd.com/reviews/r/6844/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#494 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#350 edit
ECR #304775 - Don't disable second SDMA if configuration has just 1 compute ring.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#343 edit
EPR #411058 - [CQE OCL][Lnx][QR][CZ]MultiDevice_Context fails in 2.0 conformance wimpyfull due to CL# 1101352
- The detection of different map types is overcomplicated with possibility of multiple maps and multithreading environment. Thus keep USWC indirect map optimization based on the allocation flags.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpumemory.cpp#114 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpumemory.hpp#46 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#342 edit
EPR #410736 - [CQE OCL][ISV][QR][G] FFMPEG app generating corrupted video output; Faulty CL:1101352
- Add detection for AHP allocation.
FFmpeg uses AHP allocations with CL_MAP_READ flag, but actually performs CPU write into the buffer. With indirect map runtime executes useless transfer on map and doesn't write updated memory on unmap, because a wrong flag sent by the app.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpumemory.cpp#113 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpumemory.hpp#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#341 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/perf/TestList.cpp#40 edit
ECR #304775 - Optimize oclBandwidthTest from nVidia SDK
- Cache pinned memory, since the benchmark sends the same transfer in a single batch. Thus we could avoid pin/unpin
- Swap SDMA engine allocation order. Blit manager allocates a queue on device, thus the first app queue was getting the paging second SDMA.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#112 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.hpp#37 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#339 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#121 edit
ECR #333755 - Part 2- Update to foundation spec 1.0 20141019:
- hsa_dispatch_packet_t now becomes hsa_kernel_dispatch_packet_t
- all bit mask in a struct are removed and replaced by enums that indicates the bit position and width.
Test: TC precheckin
Review: Hari, Fan, Shucai, German, Yunjun.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#268 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.hpp#103 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusched.hpp#15 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#338 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsavirtual.cpp#25 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsavirtual.hpp#12 edit
ECR #304775 - Align the queue size to match the multidispatch scheduler requirements
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#337 edit
ECR #304775 - Reduce the total number of renames to 16.
- Use 128KB for CB size on SI+
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusettings.cpp#286 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#334 edit
ECR #304775 - Add batching to the device enqueue for possible asynchronous execution
- Increase the max device queue size to 512KB. That will allow to pass conformance tests that enqueue more jobs than the queue size.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#459 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusched.hpp#13 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuschedcl.cpp#28 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#333 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.cpp#65 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.h#39 edit
EPR #404714 - [CQE OCL][2.0][DTB]Opencl1.2 WF Conf. Math test failedon Pitcairn and Oland due to CL#1065597
- FIx for TC regression after CL#1069020. Move the lock directly to the gsl flush() calls.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#332 edit
EPR #404714 - [CQE OCL][2.0][DTB]Opencl1.2 WF Conf. Math test failedon Pitcairn and Oland due to CL#1065597
- Add VGPU lock to flush() method, because gsl flush for the same context could be called from multiple threads
- Use new scratchAlloc_ monitor for scratch reallocation
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#455 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#130 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#331 edit
ECR #304775 - Correct a typo where I didnt remove the offset from the condition which made the writeRect take pinning path.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/5566/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#330 edit
ECR #304775 - Device enqueuing
- Provide scratch buffer offset for generic address space
- Use single scratch buffer for all available queues. Each queue will have a unique subbuffer in the global buffer
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#454 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#129 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusched.hpp#11 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuschedcl.cpp#24 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#329 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#120 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.cpp#63 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.h#37 edit
ECR #304775 - Use accelerated copy path for read/writeRect if the host memory has offsets. This avoids re-pinning the memory giving nearly a 100% perf boost for such copies.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/5371/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#328 edit
ECR #304775 - Device enqueuing
- Use atomic fetch for enqueue flags
- Switch to a multithreaded scheduler
- Add a workaround for Linux host_multi_queue failures. Linux has only 2 queues, but the test allocates multiple host queues and the same HW ring can be used
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#106 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#449 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#127 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuschedcl.cpp#22 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#325 edit
ECR #304775 - Device enqueuing
- Add L2 cache flush after the scheduler execution. Although CP has to work with L2 cache, it seems some functionality relies on direct memory access and without explicit L2 flush CP can pick old values in the template.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#324 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.cpp#61 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.h#35 edit