SWDEV-2 - Change OpenCL version number from 2922 to 2923.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2670 edit
SWDEV-180872 - Runtime support changes for Cooperative Group Features
- Initial implementation of the core functionality. Disabled by default. Use GPU_ENABLE_COOP_GROUPS=1 to enable the feature.
- The runtime uses a device queue for cooperative executions, with synchronization on the launched queue.
- The current implementation is a pure runtime change and works only if a single app uses this feature. No ROCr/KFD support was added or tested.
- Only inline assembler was tested
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_device.cpp#20 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_device_runtime.cpp#15 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_hcc.def.in#15 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_hcc.map.in#17 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_module.cpp#28 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_platform.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#338 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#606 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#171 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.hpp#9 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#142 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.hpp#39 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palschedcl.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#135 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.hpp#61 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocblit.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocblit.hpp#12 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#127 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#37 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocschedcl.cpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#75 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#23 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#94 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.hpp#92 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#311 edit
SWDEV-187399 - Linux-Pro Pyro_Explosion.hip Image difference is observed specific to W9000 ASIC in OCL mode
The issue disappears when the L1 cache is invalidated after each dispatch.
This change also fixes the ocltst runtime OCLMultiQueue test in the Tahiti Brahma driver.
Tests:
http://ocltc.amd.com:8111/viewModification.html?modId=121519&personal=true&tab=vcsModificationBuilds
Tested in the Houdini app; the symptom disappeared.
ReviewBoard: http://ocltc.amd.com/reviews/r/17509/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#337 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Force LC for HIP, since HIP doesn't support the HSAIL path
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#83 edit
SWDEV-2 - Change OpenCL version number from 2921 to 2922.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2669 edit
SWDEV-190922 - Revert CL 1785925.
CLOCK_MONOTONIC_RAW access latency is too large for high frequency use.
Note: Since clock use must be synchronized with ROCr this patch may cause apparent performance regressions when tested with the amd-master branch of ROCr until the corresponding ROCr patch passes through PSDB. There should be no conflict with the ROCr 2.5 release branch.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/os/os_posix.cpp#46 edit
SWDEV-2 - Change OpenCL version number from 2920 to 2921.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2668 edit
SWDEV-2 - Change OpenCL version number from 2919 to 2920.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2667 edit
SWDEV-2 - Change OpenCL version number from 2918 to 2919.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2666 edit
SWDEV-79445 - OCL generic changes and code clean-up
Optimize the scratch buffer calculation in preparation for coop group launch, since the current limit affects the max waves calculation:
- Switch to 32 waves per CU as the max possible limit
- Use the VGPR count in the waves limit calculation instead of unconditionally assuming the max possible value
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#141 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.hpp#38 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#82 edit
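The VGPR-based wave limit described in the entry above can be illustrated with a minimal sketch. This is not the code from paldevice.cpp: the function name and its parameters (VGPRs available per SIMD, SIMDs per CU) are hypothetical, and only the cap of 32 waves per CU comes from the entry.

```cpp
#include <cstdint>

// Cap taken from the changelog entry: 32 waves per CU as the max possible limit.
constexpr std::uint32_t kMaxWavesPerCu = 32;

// Hypothetical helper (assumption, not the runtime's function).
// vgprsUsed    : VGPRs the kernel needs per work-item
// vgprsPerSimd : VGPRs available per SIMD (hardware dependent, passed in)
// simdsPerCu   : number of SIMDs per compute unit (hardware dependent, passed in)
constexpr std::uint32_t wavesPerCuLimit(std::uint32_t vgprsUsed,
                                        std::uint32_t vgprsPerSimd,
                                        std::uint32_t simdsPerCu) {
  // With no VGPR information, fall back to the unconditional maximum.
  if (vgprsUsed == 0) return kMaxWavesPerCu;
  // Waves that fit on one SIMD by register budget, scaled to the whole CU.
  const std::uint32_t byVgprs = (vgprsPerSimd / vgprsUsed) * simdsPerCu;
  return byVgprs < kMaxWavesPerCu ? byVgprs : kMaxWavesPerCu;
}
```

A kernel with heavy register use then reserves scratch for fewer waves than the unconditional maximum, which is the saving the entry is after.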
SWDEV-79445 - OCL generic changes and code clean-up
- Use PAL shader core properties instead of the local device info
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldefs.hpp#53 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#140 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#134 edit
SWDEV-2 - Change OpenCL version number from 2917 to 2918.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2665 edit
SWDEV-2 - Change OpenCL version number from 2916 to 2917.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2664 edit
SWDEV-132899 - [OCL][GFX10] increase the numScratchWavesPerCu in Wave32 mode and use the actual num of CUs not the total num of WGPs when calculating the scratch buffer size
ReviewBoardURL = http://ocltc.amd.com/reviews/r/17474/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#139 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#81 edit
SWDEV-2 - Change OpenCL version number from 2915 to 2916.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2663 edit
SWDEV-79445 - OCL generic changes and code clean-up
1. Add a flag to amd::memory::create() to force the allocation on all available devices.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/17466/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/platform/memory.cpp#137 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/memory.hpp#110 edit
SWDEV-2 - Change OpenCL version number from 2914 to 2915.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2662 edit
SWDEV-2 - Change OpenCL version number from 2913 to 2914.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2661 edit
SWDEV-187169 - Hotel Lobby scene takes long time to compile
Patch authored by Valery Pykhtin.
Remove "-mllvm -amdgpu-early-inline-all" from the options passed
to the compiler; the option interferes with function call support.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/devprogram.cpp#44 edit
SWDEV-162389 - OpenCL Support for COMgr
- Direct the COMgr log to the buildLog_ buffer
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/devprogram.cpp#43 edit
SWDEV-2 - Change OpenCL version number from 2912 to 2913.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2660 edit
SWDEV-188631 - Allocating large buffers produce wrong kernel result on Windows
1. Set a limit for USWC allocations to 2GB on Windows.
2. Allocations larger than the specified limit will get placed into pinned memory instead.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/17407/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#138 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#310 edit
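The placement rule from SWDEV-188631 can be sketched as a small policy function. The names below are hypothetical and do not come from paldevice.cpp or flags.hpp; the sketch also assumes that "2GB" means 2 GiB and that an allocation exactly at the limit still goes to USWC, since the entry only redirects allocations larger than the limit.

```cpp
#include <cstdint>

// Hypothetical enum for illustration only.
enum class HostAllocKind { Uswc, Pinned };

// Assumption: the 2GB limit from the changelog, interpreted as 2 GiB.
constexpr std::uint64_t kUswcLimitBytes = 2ull * 1024 * 1024 * 1024;

// Allocations larger than the limit are placed into pinned memory;
// everything else stays in USWC.
constexpr HostAllocKind chooseHostAlloc(std::uint64_t sizeBytes,
                                        std::uint64_t limitBytes = kUswcLimitBytes) {
  return sizeBytes > limitBytes ? HostAllocKind::Pinned : HostAllocKind::Uswc;
}
```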
SWDEV-2 - Change OpenCL version number from 2911 to 2912.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2659 edit
SWDEV-185452 - Offline compilation failing on a VM, producing error CL_PLATFORM_NOT_FOUND_KHR
1. Don't load a platform if there are no devices available for it. If there is no platform that has visible devices, only allow the PAL platform to load.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/17419/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_icd.cpp#34 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.cpp#19 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#309 edit
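The platform selection rule from SWDEV-185452 can be sketched as follows. The struct, the function, and the platform name strings are hypothetical; the real logic lives in cl_icd.cpp and is driven by runtime flags that this sketch does not model.

```cpp
#include <string>
#include <vector>

// Hypothetical description of a candidate platform.
struct PlatformInfo {
  std::string name;
  int visibleDevices;
};

// Load only platforms that have visible devices. If none has any, allow
// only the fallback platform (PAL in the changelog) so that offline
// compilation still finds a platform.
std::vector<std::string> platformsToLoad(const std::vector<PlatformInfo>& candidates,
                                         const std::string& fallback = "PAL") {
  std::vector<std::string> result;
  for (const auto& p : candidates) {
    if (p.visibleDevices > 0) result.push_back(p.name);
  }
  if (result.empty()) {
    for (const auto& p : candidates) {
      if (p.name == fallback) result.push_back(p.name);
    }
  }
  return result;
}
```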
SWDEV-2 - Change OpenCL version number from 2910 to 2911.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2658 edit
SWDEV-2 - Change OpenCL version number from 2909 to 2910.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2657 edit
SWDEV-145570 - Support loading fat binary generated through --genco by hipModuleLoad.
hip-clang --genco generates a fat binary instead of a code object. To support that
we need to extract the code object from the fat binary in hipModuleLoadData. This is
needed for hipRTC since multiple GPU archs may be passed.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_module.cpp#27 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_platform.cpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devprogram.cpp#42 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#308 edit
SWDEV-2 - Change OpenCL version number from 2908 to 2909.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2656 edit
SWDEV-2 - Change OpenCL version number from 2907 to 2908.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2655 edit
SWDEV-2 - Change OpenCL version number from 2906 to 2907.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2654 edit
SWDEV-2 - Change OpenCL version number from 2905 to 2906.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2653 edit
SWDEV-190142 - Switch os::TimeNanos to use CLOCK_MONOTONIC_RAW.
Avoids NTP adjustment and re-aligns CPU clock usage with ROCm.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/os/os_posix.cpp#45 edit
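For reference, reading CLOCK_MONOTONIC_RAW as nanoseconds looks roughly like the sketch below. The helper names are hypothetical and this is not the os::timeNanos code from os_posix.cpp; it only uses the POSIX clock_gettime call, with CLOCK_MONOTONIC_RAW being a Linux-specific clock ID that is not subject to NTP rate adjustment.

```cpp
#include <cstdint>
#include <time.h>  // clock_gettime, CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW

// Hypothetical helper: read the given clock and return nanoseconds,
// or 0 if clock_gettime fails.
inline std::uint64_t readClockNanos(clockid_t clock) {
  timespec ts{};
  if (clock_gettime(clock, &ts) != 0) return 0;
  return static_cast<std::uint64_t>(ts.tv_sec) * 1000000000ull +
         static_cast<std::uint64_t>(ts.tv_nsec);
}

// Raw hardware-based time, not slewed by NTP.
inline std::uint64_t timeNanosRaw() { return readClockNanos(CLOCK_MONOTONIC_RAW); }

// NTP-adjusted monotonic time, shown for comparison.
inline std::uint64_t timeNanosMonotonic() { return readClockNanos(CLOCK_MONOTONIC); }
```

Timestamps from the two clocks drift apart over time, so any component that compares them (for example the runtime and ROCm) has to agree on one clock.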
SWDEV-189990, SWDEV-190337 - Update OpenCL/PAL gfx10 counter blocks again to match GPUPerfAPI.
SWDEV-190337:
- Update the GFX10 block list (gfx10BlockIsPal) to match GPUPerfAPI, which expects the OpenCL block lists to match OpenGL (UGL). Note: The list now matches the expectations of GPUPerfAPI. For TA/TD/TCP, nearly all GFX10 ASICs only require 10 or 12 instances (Arden would require 14, Mero would require 8, but not sure if those are supported by OCL), but we are using 16 instances to match UGL.
- Make sure the blockIdToIndexSelect array contains all the blocks supported by PAL (add a static_assert to ensure this)
- Refactor the PCIndexSelect enum. This enum is used to determine how to sum up counters across multiple block instances. The following types are now supported:
Instance -- no autosumming; instances have a one-to-one correlation with PAL
ShaderEngine -- the block is instanced per shader engine, and OpenCL will autosum counters across all PAL instances, providing a single value for all of PAL's instances
ShaderArray -- the block is instanced per shader array, and OpenCL will autosum counters across shader arrays, providing a single value for each instance within a shader array. For example, if a block has four instances per shader array, PAL would expose 16 instances total on Navi10 (2 SEs, 2 SAs per SE), but OpenCL will expose four instances
ComputeUnit -- the block is instanced per compute unit, and OpenCL will autosum counters across shader arrays, providing a single value for each compute-unit-per-shader-array. For example, if a block is instanced per compute unit, then PAL would expose 40 instances on a 40CU Navi10. OpenCL would support 10 instances (2 CUs-per-WGP, 5 WGPs-per-SA), autosummed across shader arrays.
SWDEV-189990:
- Revert the GFX9 and GFX10 tests back to the MCVML2 counter they were using previously (prior to CL 1766829). This is counter index 2, which the test calls "BigK bank 0 hits". In the aforementioned change list, I updated the counter index from 2 to 14, since index 14 is the actual counter that represents "BigK bank 0 hits". Counter index 2 is the number of hits, not "bigK" hits. That change caused the test regression reported in SWDEV-189990. With the code reverted to use counter 2, the expected value in the test should be correct again. Perhaps a better update would be to change the description in the source from "BigK bank 0 hits" to "bank 0 hits", but for now, I'm just going to go back to what the test was doing before.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palcounters.cpp#21 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palcounters.hpp#11 edit
... //depot/stg/opencl/drivers/opencl/tests/ocltst/module/runtime/OCLPerfCounters.cpp#47 edit
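The ShaderArray autosumming described under SWDEV-190337 can be illustrated with a minimal sketch. The function name is hypothetical, and the sketch assumes PAL instances are ordered shader array by shader array, so that PAL instance i maps to per-array instance i modulo the instances per shader array; the real mapping in palcounters.cpp may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: collapse per-PAL-instance counter values into one
// value per instance within a shader array by summing across shader arrays.
// Assumption: palValues is laid out as [SA0 inst0..N-1, SA1 inst0..N-1, ...].
std::vector<std::uint64_t> autosumAcrossShaderArrays(
    const std::vector<std::uint64_t>& palValues,
    std::size_t instancesPerShaderArray) {
  std::vector<std::uint64_t> summed(instancesPerShaderArray, 0);
  if (instancesPerShaderArray == 0) return summed;
  for (std::size_t i = 0; i < palValues.size(); ++i) {
    summed[i % instancesPerShaderArray] += palValues[i];
  }
  return summed;
}
```

With the Navi10 example from the entry (2 SEs, 2 SAs per SE, four instances per shader array), 16 PAL values collapse to the four values that OpenCL exposes.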
SWDEV-2 - Change OpenCL version number from 2904 to 2905.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2652 edit
SWDEV-2 - Change OpenCL version number from 2903 to 2904.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2651 edit
SWDEV-2 - Change OpenCL version number from 2902 to 2903.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2650 edit
SWDEV-189453 - [Navi10][OpenCl][x32][Converter] Process hang
- Use the argument size from the caller. With the LC path on 32-bit, the two sizes differ, and the runtime has to use the caller's size, which matches the host bitness, because the optimized path updates 32-bit values only.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#30 edit
SWDEV-2 - Change OpenCL version number from 2901 to 2902.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2649 edit
SWDEV-2 - Change OpenCL version number from 2900 to 2901.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2648 edit
SWDEV-189541 - [HIP] Make sure maxSvmSize is a power of two.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#136 edit
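One way to enforce a power-of-two size is sketched below. The helper names are hypothetical, and rounding down is an assumption: the entry does not say whether maxSvmSize is rounded up or down, and the actual paldevice.cpp change may differ.

```cpp
#include <cstdint>

// True if v is a non-zero power of two.
constexpr bool isPowerOfTwo(std::uint64_t v) {
  return v != 0 && (v & (v - 1)) == 0;
}

// Largest power of two that is <= v; returns 0 for v == 0.
// Comparing against v / 2 avoids overflow when v is close to 2^64.
constexpr std::uint64_t roundDownToPowerOfTwo(std::uint64_t v) {
  if (v == 0) return 0;
  std::uint64_t p = 1;
  while (p <= v / 2) p <<= 1;
  return p;
}
```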
SWDEV-2 - Change OpenCL version number from 2899 to 2900.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2647 edit
SWDEV-2 - Change OpenCL version number from 2898 to 2899.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/utils/versions.hpp#2646 edit
SWDEV-189140 - Add P2P support in PAL path
- PAL requires a P2P resource to be opened on the device that uses it. Add a new interface to open the resource.
- Add creation of a hidden P2P device object to amd::Memory. It can be activated with an OCL context that has a single device.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_p2p_amd.cpp#2 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#337 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#134 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palmemory.cpp#25 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#74 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#28 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#80 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.hpp#23 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#133 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#126 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#93 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/memory.cpp#136 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/memory.hpp#109 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#306 edit