SWDEV-198863 - Options for hip-clang-vdi path to provide the chicken bits, or functional equivalents to HCC_DB (phase 3)
Use ClPrint to implement other log functions.
Move some functions to use the new log functions.
This is the final change of the JIRA.
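A minimal sketch of how a level-plus-mask gate for such log functions might look. It is not the runtime's ClPrint implementation: LogSettings, shouldLog, and the exact meaning of the level and mask values are assumptions made for illustration, chosen only to match the test results listed below (LOG_LEVEL=3 prints, GPU_LOG_MASK=0 prints nothing).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical settings holder. In the runtime these values would come from
// the LOG_LEVEL and GPU_LOG_MASK environment variables.
struct LogSettings {
  int level;           // assumed: 0 disables logging, larger is more verbose
  std::uint32_t mask;  // assumed: one bit per log category
};

// Returns true when a message with the given level and category bit
// should be printed under the given settings.
inline bool shouldLog(const LogSettings& s, int msgLevel, std::uint32_t msgCategory) {
  assert(msgLevel > 0);  // level 0 is reserved for "logging off" in this sketch
  return (msgLevel <= s.level) && ((s.mask & msgCategory) != 0);
}
```

With this gate, a zero mask suppresses every message regardless of the level.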
Tests:
1. Linux HIP ROCM platform. VEGA10. Driver is release build.
1.1 export LOG_LEVEL=3
./hipModule
There are many logs.
1.2 export GPU_LOG_MASK=0
./hipModule
There is no log
2. Windows HIP PAL platform. VEGA10, Driver is release build.
2.1 set LOG_LEVEL=3
run test hipPrintfKernel
There are many logs
2.2 set GPU_LOG_MASK=0
run test hipPrintfKernel
There is no log
3. http://ocltc.amd.com:8111/viewModification.html?modId=128588&personal=true&tab=vcsModificationBuilds
ReviewBoard: http://ocltc.amd.com/reviews/r/18259/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#177 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#157 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/debug.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/debug.hpp#14 edit
SWDEV-198859 - Options for hip-clang-vdi path to provide the chicken bits, or functional equivalents to HCC_DB
This change caused regressions in the ocltst tests.
Back out changelist 2026859
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#176 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#156 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/debug.hpp#13 edit
SWDEV-198863 - Options for hip-clang-vdi path to provide the chicken bits, or functional equivalents to HCC_DB (phase 3)
Use ClPrint to implement other log functions.
Move some functions to use the new log functions.
This is the final change of the JIRA.
Tests:
1. Linux HIP ROCM platform. VEGA10. Driver is release build.
1.1 export LOG_LEVEL=3
./hipModule
There are many logs.
1.2 export GPU_LOG_MASK=0
./hipModule
There is no log
2. Windows HIP PAL platform. VEGA10, Driver is release build.
2.1 set LOG_LEVEL=3
run test hipPrintfKernel
There are many logs
2.2 set GPU_LOG_MASK=0
run test hipPrintfKernel
There is no log
3. http://ocltc.amd.com:8111/viewModification.html?modId=128490&personal=true&tab=vcsModificationBuilds
ReviewBoard: http://ocltc.amd.com/reviews/r/18247/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#175 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#155 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/debug.hpp#12 edit
SWDEV-192384 - [HIP CQE][HIPonPAL][19.40] hipBindTexRef1DFetch, hipTextureRef2D are failed on all ASICs for both Win/Lnx
Add undefined memory object in PAL process memory objects.
http://ocltc.amd.com/reviews/r/18055/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/devprogram.hpp#33 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#154 edit
SWDEV-192384 - [HIP CQE][HIPonPAL][19.40] hipBindTexRef1DFetch, hipTextureRef2D are failed on all ASICs for both Win/Lnx
The runtime cannot trivially determine all the resources that will be used by a kernel, thus it can fail to make all of them resident.
1. Add new runtime flag PAL_ALWAYS_RESIDENT. Enabling this setting will cause resources to become resident at allocation time.
2. Set the default value of the above flag to true for HIP and false for OCL.
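A rough sketch of the two points above, assuming a per-API default and a resource that records whether it is resident. Api, Resource, allocate, and defaultAlwaysResident are hypothetical names for illustration, not the runtime's types.

```cpp
#include <cassert>
#include <cstddef>

enum class Api { Hip, OpenCl };  // hypothetical enum for this sketch

// Default described in the change: true for HIP, false for OCL.
constexpr bool defaultAlwaysResident(Api api) { return api == Api::Hip; }

struct Resource {
  std::size_t size;
  bool resident;
};

// Hypothetical allocation helper. With alwaysResident set, the resource is
// made resident immediately; otherwise it stays non-resident until some
// later step (for example a dispatch that references it) makes it resident.
inline Resource allocate(std::size_t size, bool alwaysResident) {
  assert(size > 0);
  Resource r;
  r.size = size;
  r.resident = alwaysResident;
  return r;
}
```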
ReviewBoardURL = http://ocltc.amd.com/reviews/r/18054/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#79 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#30 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#100 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.hpp#27 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#153 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#319 edit
SWDEV-200614 - [Schneider] Crash in Agisoft when run in mGPU environment
- Add a workaround for the memory pinning path. It performs a 2-step copy to make sure memory pinning doesn't occur on the first unaligned page, because on Windows the memory manager can access the allocation header from the CPU in another thread, so a race condition is possible
- Change some default settings for the staging and pinned paths because of PCIe gen3 performance.
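The 2-step copy can be pictured as splitting the host range at the first page boundary. The sketch below only computes that split. The 4096-byte page size, the CopySplit type, the splitForPinning helper, and the idea that the leading part goes through a staging copy are assumptions for illustration, not the runtime's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct CopySplit {
  std::size_t stagedBytes;  // leading bytes on the first unaligned page
  std::size_t pinnedBytes;  // remainder, which starts on a page boundary
};

// Splits a host range so that the pinned part never covers the first
// unaligned page. pageSize must be a power of two.
inline CopySplit splitForPinning(std::uintptr_t hostAddr, std::size_t size,
                                 std::size_t pageSize = 4096) {
  assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0);
  const std::size_t offsetInPage = hostAddr & (pageSize - 1);
  const std::size_t head =
      (offsetInPage == 0) ? 0 : std::min(size, pageSize - offsetInPage);
  return {head, size - head};
}
```

A page-aligned pointer yields an empty first step, so aligned transfers keep the single pinned copy.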
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#96 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#150 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#317 edit
SWDEV-200489 - [CQE OCL][QR][Windows][Vega20][19H1] Performance drop is observed while running Blender on Vega20 due to faulty CL#1981122
- Switch scratch buffer allocation algorithm back to the optimal size calculation with sync mode. Some kernels will run slower if max scratch per queue is programmed unconditionally due to possible lower memory efficiency with fetches
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#160 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.hpp#45 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#149 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Use max number of waves per SIMD in the scratch calculation to allow async kernel execution with the scratch buffer
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#155 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.hpp#43 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#147 edit
SWDEV-199667 - [OpenCL][PAL][LC] bruteforce conformance test causes vm fault
- With HWS, the scratch buffer has to be allocated per Windows scheduling context, because HWS can schedule a CB on any pipe and different pipes can't preserve a unique wave_id. Recycle PAL queues in order to keep one scratch buffer per scheduling context. The change also allows removing the Windows/KMD limit on the number of scheduling contexts per process. GPU_MAX_HW_QUEUES controls the number of unique PAL queues; the default is 4
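One way to picture the queue recycling: a pool creates unique queues until the limit is reached and then hands out existing ones. HwQueuePool and its least-used selection policy are assumptions for this sketch; the change description does not say how the runtime picks the queue to reuse.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical pool of unique hardware queues. Each entry stores how many
// virtual queues currently share that hardware queue (and its scratch buffer).
class HwQueuePool {
 public:
  // The default of 4 mirrors the stated GPU_MAX_HW_QUEUES default.
  explicit HwQueuePool(std::size_t maxQueues = 4) : maxQueues_(maxQueues) {
    assert(maxQueues > 0);
  }

  // Returns the index of the hardware queue backing a new virtual queue.
  std::size_t acquire() {
    if (users_.size() < maxQueues_) {
      users_.push_back(1);  // still below the limit: create a unique queue
      return users_.size() - 1;
    }
    // Limit reached: recycle the queue with the fewest users.
    std::size_t best = 0;
    for (std::size_t i = 1; i < users_.size(); ++i) {
      if (users_[i] < users_[best]) best = i;
    }
    ++users_[best];
    return best;
  }

  // Drops one user from the given hardware queue; the queue itself is kept.
  void release(std::size_t index) {
    assert(index < users_.size() && users_[index] > 0);
    --users_[index];
  }

  std::size_t uniqueQueues() const { return users_.size(); }

 private:
  std::size_t maxQueues_;
  std::vector<std::size_t> users_;
};
```

Because the number of unique queues is bounded, the number of scratch buffers is bounded too, no matter how many virtual queues the application creates.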
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#154 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.hpp#42 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#146 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.hpp#63 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Allow async execution with scratch on the same queue. COMPUTE_TMPRING_SIZE.WAVESIZE should be constant across all dispatches.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#151 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.hpp#40 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#145 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Remove the first queue skipping logic, since KMD no longer reports an extra normal queue
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#144 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Make sure PAL_DISABLE_SDMA is fully functional. Currently CP DMA is used for buffer transfers, and kernels are used for image and buffer rect copies.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#150 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#92 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.hpp#25 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#142 edit
SWDEV-196199 - [Navi10] Corruption is observed when running Premier Pro Benchmarks using Adobe Premier Pro 2019
- Follow-up to CL#1968527. Add an extra transfer for the case where the app reads an image without kernel execution
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#139 edit
SWDEV-195023 - [CQE OCL][Navi10][RESOLVE] corruption seen in thumbnail for mxf clip after enabling temporal denoiser in Davinci resolve app
- Add a workaround for missing custom pitch in gfx10 HW. It can be disabled with GPU_IMAGE_BUFFER_WAR=0. Workaround implements double copy with an image without pitch.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palmemory.cpp#26 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palmemory.hpp#12 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#89 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.hpp#24 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#138 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.hpp#62 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#313 edit
SWDEV-192353 - 12% Performance drop observed while running ROC_OCL_Perf_CompubenchCL_GPU_W64 only on Vega10 with Win7
1. Reduce CmdAllocator sub-allocation size to 4KB.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#137 edit
SWDEV-191674 - Handling cases where p2p memcpy is initiated from device 1. (No Large Bar/P2P staging).
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#136 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#77 edit
SWDEV-180872 - Runtime support changes for Cooperative Group Features
- Initial implementation of the core functionality. Disabled by default. Use GPU_ENABLE_COOP_GROUPS=1 to enable the feature.
- Runtime uses device queue for cooperative executions with a synchronization on the launched queue.
- The current implementation is a pure runtime change and works only if a single app uses the feature. No ROCr/KFD support was added or tested
- Only inline assembler was tested
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_device.cpp#20 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_device_runtime.cpp#15 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_hcc.def.in#15 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_hcc.map.in#17 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_module.cpp#28 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_platform.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#338 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#606 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#171 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.hpp#9 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#142 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.hpp#39 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palschedcl.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#135 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.hpp#61 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocblit.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocblit.hpp#12 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#127 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#37 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocschedcl.cpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#75 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#23 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#94 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.hpp#92 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#311 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Use PAL shader core properties instead of the local device info
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldefs.hpp#53 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#140 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#134 edit
SWDEV-189140 - Add P2P support in PAL path
- PAL requires the P2P resource to be opened on the device that uses it. Add a new interface to open the resource
- Add a hidden P2P device object creation into amd::Memory. It can be activated with an OCL context that has a single device.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_p2p_amd.cpp#2 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#337 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#134 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palmemory.cpp#25 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#74 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#28 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#80 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.hpp#23 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#133 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#126 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#93 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/memory.cpp#136 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/memory.hpp#109 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#306 edit
SWDEV-155310 - Request for OpenCL extension function to set stable pstate
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#128 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Fix a crash with Unity, during RGP capture. Keep local size as 1 if the app didn't provide any
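A small sketch of the fallback described above, assuming the reported local size is a three-component value and that a null pointer means the app did not provide one. effectiveLocalSize is a hypothetical helper, not the function changed in palvirtual.cpp.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Returns the local size to report. Components the app did not provide
// stay at 1, so the result never contains a zero.
inline std::array<std::size_t, 3> effectiveLocalSize(const std::size_t* local,
                                                     unsigned dims) {
  assert(dims >= 1 && dims <= 3);
  std::array<std::size_t, 3> out = {{1, 1, 1}};
  if (local != nullptr) {
    for (unsigned i = 0; i < dims; ++i) out[i] = local[i];
  }
  return out;
}
```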
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#126 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Fix test_basic progvar_prog_scope_uninit with LC. Detect global variable usage in the program and add the code object allocation to the memory dependency tracking
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.hpp#27 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#123 edit
SWDEV-145570 - [HIP] Output Kernel name and mem arguments passed with LOG_LEVEL=3 for PAL and ROCm backends
ReviewBoardURL = http://ocltc.amd.com/reviews/r/15617/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#120 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#62 edit
SWDEV-155434 - Add SQTT instrumentation tokens for OpenCL dispatches for RGP support
- Switch to the workgroup size report for the dispatch info.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#118 edit
SWDEV-79445 - OCL generic changes and code clean-up
1. In SvmCopyMemoryCommand, handle the case when neither the src nor the dst pointer belongs to the SVM space.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/15481/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#424 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#117 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Reset memory dependency if runtime invalidated L1 for the profiling logic workaround. Profiling can be enabled for wave limiter, which could cause L1 invalidation twice.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#116 edit
SWDEV-155438 - Produce RGP Queue Timings chunk for OpenCL RGP files
- Register SDMA queue in order to get SDMA timing. The RGP trace capture with SDMA may cause a HW hang occasionally
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palgpuopen.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palgpuopen.hpp#5 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#115 edit
SWDEV-155433 - Add OpenCL API type for RGP files generated from OpenCL
- Add API type events to the SQTT
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#93 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.hpp#30 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palgpuopen.cpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palgpuopen.hpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#112 edit
SWDEV-155306 - Restore ClockMode when the last queue is destroyed.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#110 edit