SWDEV-198863 - Options for hip-clang-vdi path to provide the chicken bits, or functional equivalents to HCC_DB (phase 1)
1. The log macros is turned off for release build. So log functions has zero impact to release build.
2. The log macros have level, mask, condition control. So we can have more control to avoid log flooding.
I also adjusted some existing log to use new log functions.
1. To excercise and test the new log functions.
2. To improve performance slightly.
3. The change is mainly for HIP-ROCM, we can move more in next phases for PAL or ORCA.
4. I make these log feature unavailable for release build. We can revert to old log functions for release build in a case by case method.
Tests:
1. http://ocltc.amd.com:8111/viewModification.html?modId=128289&personal=true&tab=vcsModificationBuildshttp://ocltc.amd.com:8111/viewModification.html?modId=128358&personal=true&tab=vcsModificationBuilds
2. release build, run hip program, there is no log
3. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=4294967295
There was a lot of logs.
4. fastdebug build, run hip program,
export LOG_LEVEL=2
export GPU_LOG_MASK=4294967295
There was no logs.
5. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=4294967294
There was much less logs.
6. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=47102
There was even much less logs. The logs was expected according to the mask.
7. Tested step 2 to 6 similarily in Windows and Linux
ReviewBoard: http://ocltc.amd.com/reviews/r/18215
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_internal.hpp#46 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#82 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_stream.cpp#26 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hiprtc_internal.hpp#2 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_svm.cpp#29 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/comgrctx.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devkernel.cpp#29 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devprogram.cpp#68 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#137 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#91 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#100 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/commandqueue.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/runtime.cpp#40 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/debug.hpp#10 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#323 edit
SWDEV-184710 - Support hipLaunchCooperativeKernelMultiDevice()
- Add support for multi grid launch in hip
- Detect the new hidden argument and pass the required information for the kernel launch
- Memory for synchronization is allocated as a single object and then the offset for each GPU is found
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_module.cpp#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#343 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devkernel.cpp#25 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devkernel.hpp#17 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palkernel.cpp#82 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#136 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#42 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#90 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#30 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#99 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.hpp#97 edit
SWDEV-198862 - Options for hip-clang-vdi path to provide the chicken bits, or functional equivalents to HCC_OPT_FLUSH
Add HCC_OPT_FLUSH flag to use fence scope agent when possible for HIP VDI. The flag is defaulted to turn on, similiar to HIP HCC.
Add AMD_OCL_OPT_FLUSH to use fence scope agent when possible for OpenCL. This was tested in Windows and PAL. Default is off.
This flag can be used for future OpenCL test.
Tests:
1. http://ocltc.amd.com:8111/viewModification.html?modId=127189&personal=true&tab=vcsModificationBuilds
The teamcity test includes HIP - VDI - Rocm tests.
2. VEGA10 , Windows, HIP, 110 hiptests PASS.
3. VEGA10 , Linux AMDGPU PRO, HIP - PAL, 110 hiptests PASS.
Newer:
http://ocltc.amd.com:8111/viewModification.html?modId=127193&personal=true&tab=vcsModificationBuilds
Reviewboard: http://ocltc.amd.com/reviews/r/18092/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/device.cpp#247 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#342 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#89 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#29 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#321 edit
SWDEV-204782 - store extra information per HSA queue
The new struct QueueInfo is used to store metadata about each HSA
queue. For hostcall, this structure will eventually contain a pointer to
the hostcall buffer allocated to each HSA queue.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#135 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#41 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#88 edit
SWDEV-201128 - [HIP] test_snli_cuda failure
Default to sync packet
Make sure GPU_NUM_MEM_DEPENDENCY is 0 for HIP
No sync packet is only used when there are mem dependency check
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_context.cpp#22 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#86 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#28 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Remove the atomic flag from the pool of kernel arguments
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#85 edit
SWDEV-193423 - HIP/VDI - Support for lazy hsa queue creation
- Add queue pool support for HSA HW queues. GPU_MAX_HW_QUEUES controls the pool size. The current default value is 4 (the number of active pipes on GPU).
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#132 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#38 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#81 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#24 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#314 edit
SWDEV-174198 - Put back the fix of segfault running integer_ops.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#80 edit
SWDEV-174198 - Back out changelist 1968648.
Since it causes soft-hang in multiple_device_context test.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#79 edit
SWDEV-191674 - Handling cases where p2p memcpy is initiated from device 1. (No Large Bar/P2P staging).
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#136 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#77 edit
SWDEV-180872 - Runtime support changes for Cooperative Group Features
- Initial implementation of the core functionality. Disabled by default. Use GPU_ENABLE_COOP_GROUPS=1 to enable the feature.
- Runtime uses device queue for cooperative executions with a synchronization on the launched queue.
- The current implementation is pure runtime change and it can work if only one app uses this feature. No ROCr/KFD support was added or tested
- Only inline assembler was tested
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_device.cpp#20 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_device_runtime.cpp#15 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_hcc.def.in#15 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_hcc.map.in#17 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_module.cpp#28 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_platform.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#338 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#606 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#171 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.cpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palblit.hpp#9 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#142 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.hpp#39 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palschedcl.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#135 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.hpp#61 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocblit.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocblit.hpp#12 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#127 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#37 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocschedcl.cpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#75 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#23 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#94 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.hpp#92 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#311 edit
SWDEV-126897 - Remove calling setDynamicParallelFlag() from submitKernelInternal() in LC/ROCm path.
For LC, setDynamicParallelFlag() is now called at Kernel::InitParameters().
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#69 edit
SWDEV-159881 - [OCL][ROCm] Add SVM coarse-grain buffer support with device memory
1. Use the system memory pool for coarse grain allocations on APUs
ReviewBoardURL = http://ocltc.amd.com/reviews/r/15671/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocmemory.cpp#39 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#64 edit
SWDEV-145570 - [HIP] Output Kernel name and mem arguments passed with LOG_LEVEL=3 for PAL and ROCm backends
ReviewBoardURL = http://ocltc.amd.com/reviews/r/15617/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#120 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#62 edit
SWDEV-79445 - OCL generic changes and code clean-up
1. std::bind2nd is deprecated in c++11 and removed in c++17. Use a range based for loop without any functor instead.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/15430/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#59 edit
SWDEV-158585 - [ROCm] OCLPinnedMemory test is giving VM fault
1. If host ptr is present, use the CopyBuffer path instead of the ReadBuffer path
ReviewBoardURL = http://ocltc.amd.com/reviews/r/15403/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#58 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Add reallocation logic for memory dependency. SVM path can send the amount of SVM ptrs over the max size of arguments
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#420 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#105 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#54 edit
SWDEV-137270 - Add findLocalWorkSize to OpenCL runtime ROC device
- Add support for kernel using image argument
- fix the issue of not using the max workgroup size specified by the environment variables "maxWorkGroupSize*"
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.cpp#33 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.hpp#19 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocsettings.cpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#50 edit
SWDEV-142081 - Performance drop while running GE HealthCare - Penality test test due to CL#1498909
- Port the logic to consider cache line size for the local workgroup detection (CL#1496286)
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#49 edit
SWDEV-136633 - Remove correlation handle.
Not used by debugger or profiler, dead code.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#47 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#15 edit
SWDEV-137374 - Collecting counters from some TCC, TCP and TA block instances causes kernel to hang
- fixed by using the new AQLProfile feature to obtain precise size for command/data buffers
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/roccounters.cpp#2 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#46 edit
SWDEV-79445 - Add logic to handle persistent memory separately from the host memory path
- Some of the optimizations applied to the host memory can't be used for persistent
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocblit.cpp#20 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocmemory.cpp#30 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocmemory.hpp#10 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#45 edit
SWDEV-130808 - set the local sizes to preferredWorkGroupSize_ when clEnqueueNDRange is not given and the kernel does not have required workgroup sizes.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpukernel.cpp#320 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#411 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palkernel.cpp#36 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#56 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rockernel.cpp#26 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#43 edit
SWDEV-121486 - Clean up sampler list if we run out of memory.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#42 edit