SWDEV-59579 - Resubmit changelist 1193161. Refactor the coarse-grained SVM and fine-grained buffer SVM code paths so that, if the device the SVM is running on supports fine-grained system SVM, SVM API operations act directly on system memory and do not need to go through the GPU backend. Also add support for a PX system with CZ on Windows 10, which supports fine-grained system SVM.
code review:
http://ocltc.amd.com/reviews/r/8530/
precheckin:
http://ocltc.amd.com:8111/viewModification.html?modId=58913&personal=true&buildTypeId=&tab=vcsModificationBuilds&show_all_builds=true
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_svm.cpp#15 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#527 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#152 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#382 edit
[ROCm/clr commit: a3074a2a8f]
SWDEV-59579 - Refactor the coarse-grained SVM and fine-grained buffer SVM code paths so that, if the device the SVM is running on supports fine-grained system SVM, SVM API operations act directly on system memory and do not need to go through the GPU backend. Also add support for a PX system with CZ on Windows 10, which supports fine-grained system SVM.
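The dispatch described above can be pictured with a minimal sketch. It is not the code from cl_svm.cpp: DeviceCaps, SvmPath, gpuBackendCopy and svmMemcpy are hypothetical names, and the backend path is a placeholder that only copies so the sketch stays runnable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the device capability named in the changelist.
struct DeviceCaps {
    bool fineGrainSystem;  // true if the device supports fine-grained system SVM
};

// Hypothetical tag so a caller can see which path was taken.
enum class SvmPath { HostDirect, GpuBackend };

// Placeholder for the existing path. A real runtime would enqueue a command
// to the GPU backend here; this only copies so that the sketch runs.
inline void gpuBackendCopy(void* dst, const void* src, std::size_t size) {
    std::memcpy(dst, src, size);
}

// Sketch of an SVM copy: with fine-grained system support the pointers are
// ordinary system memory, so the copy can be done on the host.
inline SvmPath svmMemcpy(const DeviceCaps& caps, void* dst, const void* src,
                         std::size_t size) {
    assert(dst != nullptr && src != nullptr);
    if (caps.fineGrainSystem) {
        std::memcpy(dst, src, size);
        return SvmPath::HostDirect;
    }
    gpuBackendCopy(dst, src, size);
    return SvmPath::GpuBackend;
}
```

The same branch would presumably apply to the other SVM API operations; only a copy is shown here.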
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_svm.cpp#13 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#256 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#525 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#150 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#380 edit
[ROCm/clr commit: 2cd56dc9f0]
EPR #426143 - [AVID] clEnqueueWriteImage with a row pitch different from 0 will fail if we use the pre-pinned path
- pass pitch and slice to the copy functions
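Why the pitch and slice values matter to a copy routine can be shown with a host-only sketch. PitchedRegion and copyPitchedToPacked are hypothetical names, not the runtime's copy functions; the sketch assumes, as in the OpenCL image API, that a pitch of 0 means tightly packed data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical description of a 3D region in host memory.
struct PitchedRegion {
    std::size_t rowBytes;    // bytes of pixel data per row
    std::size_t rows;        // rows per slice
    std::size_t slices;      // number of slices (1 for a 2D image)
    std::size_t rowPitch;    // bytes between row starts; 0 means tightly packed
    std::size_t slicePitch;  // bytes between slice starts; 0 means tightly packed
};

// Copy from a pitched source into a tightly packed destination.
inline void copyPitchedToPacked(unsigned char* dst, const unsigned char* src,
                                const PitchedRegion& r) {
    const std::size_t rowPitch = r.rowPitch ? r.rowPitch : r.rowBytes;
    const std::size_t slicePitch = r.slicePitch ? r.slicePitch : rowPitch * r.rows;
    assert(rowPitch >= r.rowBytes && slicePitch >= rowPitch * r.rows);
    for (std::size_t z = 0; z < r.slices; ++z) {
        for (std::size_t y = 0; y < r.rows; ++y) {
            // Each row starts at a pitch-derived offset, not at y * rowBytes.
            std::memcpy(dst, src + z * slicePitch + y * rowPitch, r.rowBytes);
            dst += r.rowBytes;
        }
    }
}
```

A copy that ignored rowPitch would read the padding bytes as pixel data, which is the kind of failure the EPR describes for a non-zero row pitch.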
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#377 edit
[ROCm/clr commit: 8e3e9dbae5]
EPR #419072 - [OpenCL2.0] Enable 16MB large on device queues
- Enable device queue creation up to 12MB. That should allow the Intel SDK sample from the EPR, which requires only a 6MB queue, to run.
- Currently a queue larger than 12.5MB suffers significant performance degradation, so the current maximum is 12MB. In general it's preferable to use a queue size suited to the task rather than the maximum possible.
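On the application side, the sizing advice can be sketched as below. The 12MB limit comes from the text above; kMaxDeviceQueueBytes and chooseDeviceQueueBytes are hypothetical names, and a real application would query the device's maximum queue size instead of hard-coding it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Upper limit taken from the changelist text (12MB). Hard-coded here only
// for illustration.
constexpr std::size_t kMaxDeviceQueueBytes = 12u * 1024u * 1024u;

// Hypothetical helper: request what the task needs, capped at the maximum.
inline std::size_t chooseDeviceQueueBytes(std::size_t neededBytes) {
    assert(neededBytes > 0);
    return std::min(neededBytes, kMaxDeviceQueueBytes);
}
```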
Affected files ...
... //depot/stg/opencl/drivers/opencl/library/hsa/hsail/src/devenq/schedule.cl#10 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#115 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.hpp#38 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudefs.hpp#123 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#517 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusched.hpp#17 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#372 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#131 edit
[ROCm/clr commit: 1386191b6c]
EPR #421017 - IOMMU2/SVM on CZ Win10, the bit INST_ATC of COMPUTE_PGM_HI needs to be set for device enqueue.
Affected files ...
... //depot/stg/opencl/drivers/opencl/library/hsa/hsail/src/devenq/schedule.cl#9 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusched.hpp#16 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#370 edit
[ROCm/clr commit: 7a54d367f3]
ECR #333753 - ORCA RT/Compiler Lib/aoc2: AMD HSA Code Object Import feature (part II) - arbitrary hidden (extra) kernargs support
Only HSAIL path is affected. It doesn't affect blit kernels.
To use offline by aoc2:
aoc2 -hsacodeobject=<importing_code_object_filename> -numhiddenkernargs=<num> -cl-std=CL2.0 -march=hsail(-64) -mdevice=Bonaire <source_cl_filename>
To use online by setting env:
AMD_DEBUG_HSA_NUM_HIDDEN_KERNARGS=<num>
where num >= 0. If num == 0, then no additional arguments will be added on RT for every kernel. The default value is unchanged and equal to 6 for now.
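How a runtime might read such a setting is sketched below. The variable name and the default of 6 come from the text above; the function names, the fallback to the default on malformed input, and the upper bound of 1024 are assumptions for illustration, not the runtime's actual behavior.

```cpp
#include <cassert>
#include <cstdlib>

// Default taken from the changelist text.
constexpr int kDefaultHiddenKernargs = 6;

// Parse a value such as the one in AMD_DEBUG_HSA_NUM_HIDDEN_KERNARGS.
// Returns the default when the text is missing, malformed, negative, or
// above an arbitrary sanity bound (1024 is an assumption of this sketch).
inline int parseNumHiddenKernargs(const char* text) {
    if (text == nullptr || *text == '\0') return kDefaultHiddenKernargs;
    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    if (*end != '\0' || value < 0 || value > 1024) return kDefaultHiddenKernargs;
    return static_cast<int>(value);
}

// Read the setting from the environment.
inline int numHiddenKernargsFromEnv() {
    return parseNumHiddenKernargs(std::getenv("AMD_DEBUG_HSA_NUM_HIDDEN_KERNARGS"));
}
```

Note that an explicit 0 is kept as 0 rather than replaced by the default, matching the statement that num == 0 adds no extra arguments.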
Misc:
+ get rid of PRE & POST defines in Compiler Lib, as they started to conflict with ugl\gl\gs\hwl\ headers with the same defines.
+ minor copy/paste eliminations & typo fixes
+ ocltst complib tests update
Testing: pre check-in, manually based on ocl sdk MatrixMultiplication
Reviewers: Brian Sumner, German Andryeyev, Nikolay Haustov, Artem Tamazov
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/v0_8/if_acl.cpp#72 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/hsail_be.cpp#49 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/metadata.cpp#8 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/include/v0_8/aclDefs.h#5 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/include/v0_8/aclEnums.h#19 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/include/v0_8/aclStructs.h#17 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/bif_section_labels.hpp#21 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#10 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.h#20 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/aoc2.cpp#74 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.cpp#181 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#249 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#291 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.hpp#113 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.cpp#199 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#369 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa/hsaprogram.cpp#38 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsakernel.cpp#8 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsakernel.hpp#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsaprogram.cpp#19 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsavirtual.cpp#43 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/complib/CLAssumptionCheck.cpp#43 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/complib/CLEnumCheck.cpp#44 edit
[ROCm/clr commit: 81b331f4c5]
ECR #333753 - ORCA RT/Compiler Lib: HSA Code Object/RT independent loader introducing/integration into OpenCL.
Changes by Evgeniy Mankov.
Purpose:
Use the same Finalizer & loader for both HSA & ORCA RT.
AMDIL path is not affected.
Changes:
1. The whole BRIG is finalized now instead of per kernel finalization (both in gpuprogram & hsail_be).
2. HSALoader is changed in order to work with CodeObject and new HSA Loader's API - Context. Now it is in ORCA's gpuprogram instead of Compiler Lib.
3. brig_loader.cpp is removed from compiler lib, as well as __aclHSALoader function exports from the whole stack.
4. BIF .text section now contains the whole finalized HSA CodeObject instead of separate symbols for finalized kernels.
5. ORCA RT now works directly with amd_kernel_code_t and doesn't need any SC metadata anymore.
6. aoc2 is supplemented with fake offline loader correspondingly.
7. amdocl/complib make system changes.
8. test_driver.pl update.
ToDo:
1. Implement disassemble() & BuildLog() functions to support ISA dumping & SC error handling (Konstantin).
2. Global variables initialization by pragma reference (Konstantin). Test to verify: test_basic progvar_prog_scope_init.
3. Code Object without kernels support (Nikolay - ready). Test to verify: test_generic_address_space.exe library_function
testing: windows smoke, pre check-in, ocl conformance 2.0, ocl SDK 2.9
Reviewers: Nikolay Haustov, German Andryeyev
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/amdocl.def.in#13 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/amdocl.map.in#15 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/build/Makefile.api#116 edit
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/amdoclcl.def.in#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/legacy-lib/amdoclcl.map.in#2 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/amdoclcl.def.in#12 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/amdoclcl.map.in#11 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/common/v0_8/if_acl.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/build/Makefile.gpu#32 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/backends/gpu/hsail_be.cpp#44 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/build/Makefile.complib#85 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.cpp#9 edit
... //depot/stg/opencl/drivers/opencl/compiler/lib/utils/v0_8/libUtils.h#18 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/aoc2.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/compiler/tools/aoc2/build/Makefile.aoc2#24 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#248 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudefs.hpp#121 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#288 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.hpp#112 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.cpp#194 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuprogram.hpp#59 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuscsi.cpp#33 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#368 edit
... //depot/stg/opencl/drivers/opencl/tests/hsa/bin/test_driver.pl#12 edit
[ROCm/clr commit: 8cc3f47661]
ECR #304775 - Fix a crash in memorybandwidth test
- Remove a pinned mem object from the list only if we need a free slot.
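The eviction rule can be sketched with a small fixed-capacity cache. PinnedCache is a hypothetical class, not the list used in gpuvirtual.cpp, and evicting the oldest entry is an assumption; the changelist only says that removal happens when a free slot is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical cache of pinned host allocations, keyed by host pointer.
class PinnedCache {
public:
    explicit PinnedCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns true if ptr was already cached, so pinning can be skipped.
    bool findOrAdd(const void* ptr) {
        for (const void* p : entries_) {
            if (p == ptr) return true;
        }
        // Remove an entry only when a free slot is actually needed.
        if (entries_.size() == capacity_) {
            entries_.pop_front();  // oldest entry (an assumption of this sketch)
        }
        entries_.push_back(ptr);
        return false;
    }

    std::size_t size() const { return entries_.size(); }

private:
    std::size_t capacity_;
    std::list<const void*> entries_;
};
```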
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#367 edit
[ROCm/clr commit: 75b2321608]
EPR #397491 - According to HSA-Finalizer-ADD, for GPUVM32 private_segment_aperture_base_hi and group_segment_aperture_base_hi should be equal to the 32 bits of the 32 bit private and group segment flat address aperture.
Reviewed by: German
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#362 edit
[ROCm/clr commit: 9d37ac1fc8]
EPR #419065 - [CQE OCL][ISV][QR][G] FAHBenchmark application is crashing.
Two issues:
1. Remove clearing of profileEnabled_ since it may cause incorrect kernel execution time measurement.
2. Blit kernels cause an assertion in getWavesPerSH because they have no wave limiters. Remove the assert. If a kernel has no wave limiter, getWavesPerSH returns 0.
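The second fix amounts to replacing an assertion with a null check. A minimal sketch, with hypothetical Kernel and WaveLimiter types that only mirror the names in the text:

```cpp
#include <cassert>
#include <memory>

// Hypothetical minimal wave limiter.
struct WaveLimiter {
    unsigned wavesPerSH;
};

struct Kernel {
    std::unique_ptr<WaveLimiter> limiter;  // null for kernels without a limiter

    // Returns 0 instead of asserting when there is no wave limiter.
    unsigned getWavesPerSH() const {
        return limiter ? limiter->wavesPerSH : 0u;
    }
};
```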
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#361 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.cpp#5 edit
[ROCm/clr commit: 7a956a4aa7]
ECR #304775 - Wave limiter: Fix bug in adaptation.
Dumped waves/simd value is incorrect.
Should exit adaptation only after the changed waves/simd value is applied.
Added a wave limiter manager to handle the case where one kernel is enqueued to more than one queue. Create a wave limiter for each virtual device.
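One way to picture a per-kernel manager that keeps one limiter per virtual device is the sketch below. All type and member names are hypothetical, and the real classes in gpuwavelimiter.hpp may be organized differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

struct VirtualDevice {};  // hypothetical stand-in for a queue's virtual device

struct WaveLimiter {
    unsigned wavesPerSimd = 0;
};

// Hypothetical manager owned by a kernel: one limiter per virtual device,
// so enqueues of the same kernel on different queues do not share state.
class WaveLimiterManager {
public:
    // Returns the limiter for this virtual device, creating it on first use.
    WaveLimiter& limiterFor(const VirtualDevice* vdev) {
        assert(vdev != nullptr);
        return limiters_[vdev];
    }

    std::size_t count() const { return limiters_.size(); }

private:
    std::map<const VirtualDevice*, WaveLimiter> limiters_;
};
```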
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#245 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#283 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.hpp#109 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#360 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.cpp#4 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.hpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.hpp#76 edit
[ROCm/clr commit: 51efa976bf]
ECR #304775 - Wave limiter: Fix crash in CompuBenchCL video composition due to profiling data not collected correctly.
Gpuvirtual.cpp collected profiling data only when all events had profiling enabled. Fixed by adding a member that indicates at least one event has profiling enabled, and collecting profiling data in that case.
Improved adaptation by changing waves/simd only when the last change has been enforced. Also detect discontinuities in measured data and discard them.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#359 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#129 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.cpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuwavelimiter.hpp#2 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#230 edit
[ROCm/clr commit: b5a5c65e53]
ECR #399840 - OpenCL Runtime HW Debug support development - resolve the TDR issues on Kaveri.
1. update the resource descriptors in the runtime trap handler to match those in the HSA HW debug implementation
2. force the use of SDMA for the device memory map function, which is called when using the clHwDbgSetGlobalMemoryAMD() and clEnqueueMapImage() functions, for HW debug
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#214 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gputrap.hpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#358 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#115 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.h#47 edit
[ROCm/clr commit: 28a35ae54d]
EPR #403782 - IOMMU2/SVM
- Handle case of only one DMA engine available, for example with SVM.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/7284/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#506 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#142 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#357 edit
[ROCm/clr commit: 32c073c558]
ECR #399840 - OpenCL Runtime HW Debug support development - add support for VI ASICs & support the use case of debug registration in a pre-dispatch callback function
** Cross branch check-in with CL1131894
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_debugger_amd.cpp#7 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudebugmanager.cpp#9 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#501 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#139 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gputrap.hpp#2 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#355 edit
[ROCm/clr commit: 95596795fc]
EPR #394115 - Add the environment variable "GPU_SELECT_COMPUTE_RINGS_ID" to select a specific compute queue for OCL submission. This EV was requested by the KMD team for testing the CWSR demo on CZ.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/7082/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#354 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.cpp#111 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLDevice.h#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#226 edit
[ROCm/clr commit: 0494cd6ace]
ECR #399840 - OpenCL Runtime HW Debug support development
- use device to control debugger registration and exception notification so that debug event will not be tied to any particular queue.
- use aqlCodeInfo parameter for clHwDbgMapKernelCodeAMD() to be consistent with clHwDbgGetAqlPacketInfoAMD()
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_debugger_amd.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_debugger_amd.h#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudebugmanager.cpp#7 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudebugmanager.hpp#4 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#352 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#127 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.h#43 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hwdebug.cpp#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hwdebug.hpp#6 edit
[ROCm/clr commit: 3e53caa02e]
EPR #403782 - IOMMU2/SVM
Basic changes to enable finegrainsystem.
- OpenCL runtime changes for enabling Fine Grain System on Carrizo
- Check for an SVM pointer during unmap; if the pointer is SVM, skip the unmap
ReviewBoardURL = http://ocltc.amd.com/reviews/r/6844/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#494 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#350 edit
[ROCm/clr commit: 546ce7ec2d]
EPR #411058 - [CQE OCL][Lnx][QR][CZ]MultiDevice_Context fails in 2.0 conformance wimpyfull due to CL# 1101352
- Detecting the different map types is overcomplicated given the possibility of multiple maps and a multithreaded environment. Thus keep the USWC indirect map optimization based on the allocation flags.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpumemory.cpp#114 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpumemory.hpp#46 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#342 edit
[ROCm/clr commit: 593d1e3b8d]
EPR #410736 - [CQE OCL][ISV][QR][G] FFMPEG app generating corrupted video output; Faulty CL:1101352
- Add detection for AHP allocation.
FFmpeg uses AHP allocations with the CL_MAP_READ flag, but actually performs CPU writes into the buffer. With an indirect map, the runtime executes a useless transfer on map and doesn't write the updated memory back on unmap, because the app sent the wrong flag.
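The failure mode can be reproduced with a toy model of an indirect map. IndirectBuffer and the flag constants are hypothetical and only mimic the read/write map flags; the model is not the runtime's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flags modeled on the read and write map flags.
constexpr unsigned kMapRead = 1u << 0;
constexpr unsigned kMapWrite = 1u << 1;

// Toy buffer with device storage and a separate host staging copy.
class IndirectBuffer {
public:
    explicit IndirectBuffer(std::size_t n) : device_(n, 0), staging_(n, 0) {}

    int* map(unsigned flags) {
        assert(flags != 0);
        flags_ = flags;
        if (flags & kMapRead) staging_ = device_;  // transfer on map
        return staging_.data();
    }

    void unmap() {
        if (flags_ & kMapWrite) device_ = staging_;  // write back only for write maps
        flags_ = 0;
    }

    int deviceValue(std::size_t i) const { return device_[i]; }

private:
    std::vector<int> device_;
    std::vector<int> staging_;
    unsigned flags_ = 0;
};
```

In this model, a CPU write made through a read-only map never reaches the device copy, which matches the corrupted output described above.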
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpumemory.cpp#113 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpumemory.hpp#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#341 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/perf/TestList.cpp#40 edit
[ROCm/clr commit: f9f5df731e]
ECR #304775 - Optimize oclBandwidthTest from nVidia SDK
- Cache pinned memory, since the benchmark sends the same transfer in a single batch. Thus we could avoid pin/unpin
- Swap SDMA engine allocation order. Blit manager allocates a queue on device, thus the first app queue was getting the paging second SDMA.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#112 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.hpp#37 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#339 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#121 edit
[ROCm/clr commit: dc8a3205ce]
ECR #333755 - Part 2- Update to foundation spec 1.0 20141019:
- hsa_dispatch_packet_t now becomes hsa_kernel_dispatch_packet_t
- all bit masks in structs are removed and replaced by enums that indicate the bit position and width.
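The position-and-width style can be illustrated as follows. The enum names and values are hypothetical and chosen only for the example; they are not copied from the foundation spec headers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical field layout: one enum gives each field's bit position,
// a second enum gives its width in bits.
enum HeaderFieldPos { HEADER_TYPE = 0, HEADER_BARRIER = 8 };
enum HeaderFieldWidth { HEADER_WIDTH_TYPE = 8, HEADER_WIDTH_BARRIER = 1 };

// Write `value` into the field at [pos, pos + width) of a 16-bit word.
inline std::uint16_t setField(std::uint16_t word, unsigned pos, unsigned width,
                              unsigned value) {
    assert(width > 0 && pos + width <= 16);
    const unsigned mask = ((1u << width) - 1u) << pos;
    return static_cast<std::uint16_t>((word & ~mask) | ((value << pos) & mask));
}

// Read the field at [pos, pos + width) of a 16-bit word.
inline unsigned getField(std::uint16_t word, unsigned pos, unsigned width) {
    assert(width > 0 && pos + width <= 16);
    return (word >> pos) & ((1u << width) - 1u);
}
```

Compared with C bit-fields in a struct, explicit positions and widths do not depend on how a compiler lays out bit-fields.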
Test: TC precheckin
Review: Hari, Fan, Shucai, German, Yunjun.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#268 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.hpp#103 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusched.hpp#15 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#338 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsavirtual.cpp#25 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsavirtual.hpp#12 edit
[ROCm/clr commit: c7988f7209]
ECR #304775 - Reduce the total number of renames to 16.
- Use 128KB for CB size on SI+
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusettings.cpp#286 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#334 edit
[ROCm/clr commit: f48b935b43]