SWDEV-198863 - Options for hip-clang-vdi path to provide the chicken bits, or functional equivalents to HCC_DB (phase 3)
Use ClPrint to implement other log functions.
Move some functions to the new log functions.
This is the final change of the JIRA.
Tests:
1. Linux HIP ROCM platform. VEGA10. Driver is release build.
1.1 export LOG_LEVEL=3
./hipModule
There are many logs.
1.2 export GPU_LOG_MASK=0
./hipModule
There is no log
2. Windows HIP PAL platform. VEGA10, Driver is release build.
2.1 set LOG_LEVEL=3
run test hipPrintfKernel
There are many logs
2.2 set GPU_LOG_MASK=0
run test hipPrintfKernel
There is no log
3. http://ocltc.amd.com:8111/viewModification.html?modId=128588&personal=true&tab=vcsModificationBuilds
ReviewBoard: http://ocltc.amd.com/reviews/r/18259/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#177 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#157 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/debug.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/debug.hpp#14 edit
SWDEV-198859 - Options for hip-clang-vdi path to provide the chicken bits, or functional equivalents to HCC_DB
This change caused regressions in the ocltst tests.
Back out changelist 2026859
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#176 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#156 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/debug.hpp#13 edit
SWDEV-198863 - Options for hip-clang-vdi path to provide the chicken bits, or functional equivalents to HCC_DB (phase 3)
Use ClPrint to implement other log functions.
Move some functions to the new log functions.
This is the final change of the JIRA.
Tests:
1. Linux HIP ROCM platform. VEGA10. Driver is release build.
1.1 export LOG_LEVEL=3
./hipModule
There are many logs.
1.2 export GPU_LOG_MASK=0
./hipModule
There is no log
2. Windows HIP PAL platform. VEGA10, Driver is release build.
2.1 set LOG_LEVEL=3
run test hipPrintfKernel
There are many logs
2.2 set GPU_LOG_MASK=0
run test hipPrintfKernel
There is no log
3. http://ocltc.amd.com:8111/viewModification.html?modId=128490&personal=true&tab=vcsModificationBuilds
ReviewBoard: http://ocltc.amd.com/reviews/r/18247/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#175 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#155 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/debug.hpp#12 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Fix the detection of all devices for P2P. The previous logic worked only if GPU_ENABLE_PAL was forced to 1.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#174 edit
SWDEV-209969 - SQTT missing event instrumenting token in OpenCL trace
- SQTT trace reports memory clears if program load occurs during the capture. Avoid memory clears with GPU if CPU backing store is available
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palprogram.cpp#100 edit
SWDEV-198863 - Options for hip-clang-vdi path to provide the chicken bits, or functional equivalents to HCC_DB (phase 1)
1. The log macros are turned off for release builds, so the log functions have zero impact on release builds.
2. The log macros have level, mask, and condition controls, which give finer control and help avoid log flooding.
I also adjusted some existing logs to use the new log functions:
1. To exercise and test the new log functions.
2. To improve performance slightly.
3. The change is mainly for HIP-ROCM; more logs can be moved in later phases for PAL or ORCA.
4. These log features are unavailable in release builds. We can revert to the old log functions for release builds on a case-by-case basis.
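For readers unfamiliar with level/mask gating, the idea can be sketched as below. This is a minimal illustration, not the code in debug.hpp: only the LOG_LEVEL and GPU_LOG_MASK variable names come from this change, while LogConfig, envOr, loadLogConfig, shouldLog, SKETCH_LOG, the default values, and the "level <= LOG_LEVEL and category bit set" rule are assumptions chosen to match the test results listed under Tests.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Hypothetical configuration holder; the field meanings are assumptions.
struct LogConfig {
  int level;      // highest message level that is printed
  uint64_t mask;  // one bit per log category
};

// Read an unsigned decimal integer from the environment, or use a fallback.
inline uint64_t envOr(const char* name, uint64_t fallback) {
  const char* value = std::getenv(name);
  return value ? std::strtoull(value, nullptr, 10) : fallback;
}

// The defaults here (level 0, all categories) are assumptions.
inline LogConfig loadLogConfig() {
  return {static_cast<int>(envOr("LOG_LEVEL", 0)), envOr("GPU_LOG_MASK", ~0ull)};
}

// A message is emitted only if its level is enabled AND its category bit is set.
inline bool shouldLog(const LogConfig& cfg, int level, uint64_t category) {
  return level <= cfg.level && (cfg.mask & category) != 0;
}

#ifdef NDEBUG
// Release build: the macro expands to nothing, so logging costs nothing.
#define SKETCH_LOG(cfg, level, category, ...) ((void)0)
#else
#define SKETCH_LOG(cfg, level, category, ...)     \
  do {                                            \
    if (shouldLog((cfg), (level), (category))) {  \
      std::fprintf(stderr, __VA_ARGS__);          \
      std::fputc('\n', stderr);                   \
    }                                             \
  } while (0)
#endif
```

Under these assumptions, a level-3 message in category bit 0 is printed with GPU_LOG_MASK=4294967295 but dropped with GPU_LOG_MASK=4294967294, because that mask clears only bit 0.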
Tests:
1. http://ocltc.amd.com:8111/viewModification.html?modId=128289&personal=true&tab=vcsModificationBuilds
http://ocltc.amd.com:8111/viewModification.html?modId=128358&personal=true&tab=vcsModificationBuilds
2. release build, run hip program, there is no log
3. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=4294967295
There were a lot of logs.
4. fastdebug build, run hip program,
export LOG_LEVEL=2
export GPU_LOG_MASK=4294967295
There were no logs.
5. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=4294967294
There were far fewer logs.
6. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=47102
There were even fewer logs. The logs matched what was expected from the mask.
7. Tested steps 2 to 6 similarly on Windows and Linux.
ReviewBoard: http://ocltc.amd.com/reviews/r/18215
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_internal.hpp#46 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#82 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_stream.cpp#26 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hiprtc_internal.hpp#2 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_svm.cpp#29 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/comgrctx.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devkernel.cpp#29 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devprogram.cpp#68 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#137 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#91 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#100 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/commandqueue.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/runtime.cpp#40 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/debug.hpp#10 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#323 edit
SWDEV-200688 - Correct some misalignment between DeviceInfo and CALtarget, so that VEGAM can be correctly recognized as gfx804 instead of gfx902
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/caltarget.h#10 edit
SWDEV-208424 - ROCr language runtime should not free code object until executable destroy
- Reshuffle the code to make sure HSA runtime can keep the pointer to the code object
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/devprogram.cpp#67 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Fix memory leaks in COMGR path. Don't create binaryData, since it will be overwritten with action_data_get_data() call.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/devprogram.cpp#65 edit
SWDEV-208424 - ROCr language runtime should not free code object until executable destroy
- Keep the code object reader alive until the program destruction. Update HSAIL path only, since LC path already handles it correctly.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocprogram.cpp#107 edit
SWDEV-207662 - [EURI][OPENCL][Forum 244452]: Multiple printf statements inside kernel producing strange output on Vega on Windows
- Correct the printf arguments parsing logic. Don't use local PrintfInfo info, because it can contain some stale data after the first iteration
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/devkernel.cpp#27 edit
SWDEV-204511 - [NV14 XTM] OpenCL Conformance Test Fails
- Handle different ABI versions for LC and HSAIL when a single context with multiple devices is used. LC changed the location of hidden arguments and the HSAIL path requires patching.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/devkernel.cpp#26 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palkernel.cpp#83 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palkernel.hpp#30 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/kernel.hpp#26 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/program.cpp#105 edit
SWDEV-184710 - Support hipLaunchCooperativeKernelMultiDevice()
- Add support for multi grid launch in hip
- Detect the new hidden argument and pass the required information for the kernel launch
- Memory for synchronization is allocated as a single object and then the offset for each GPU is found
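The "single object, then per-GPU offset" scheme can be sketched as follows. This is an illustration only: SyncSlot, alignUp, partitionSyncBuffer, the fixed slot size, and the alignment parameter are hypothetical and do not reflect the actual layout used by the runtime.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view of one GPU's slice of the shared synchronization buffer.
struct SyncSlot {
  uint64_t base;   // start address of the single shared allocation
  size_t offset;   // byte offset of this GPU's slot inside it
  uint64_t address() const { return base + offset; }
};

// Round value up to the next multiple of alignment (alignment must be > 0).
constexpr size_t alignUp(size_t value, size_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// Assumed layout: equally sized, aligned slots packed back to back,
// indexed by the GPU's position in the multi-device launch.
inline std::vector<SyncSlot> partitionSyncBuffer(uint64_t base, size_t numDevices,
                                                 size_t slotSize, size_t alignment) {
  const size_t stride = alignUp(slotSize, alignment);
  std::vector<SyncSlot> slots;
  slots.reserve(numDevices);
  for (size_t i = 0; i < numDevices; ++i) {
    slots.push_back({base, i * stride});
  }
  return slots;
}
```

A single allocation of numDevices * stride bytes then covers every GPU, and each launch only needs its own slot address.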
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_module.cpp#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#343 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devkernel.cpp#25 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devkernel.hpp#17 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palkernel.cpp#82 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#136 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#42 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#90 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#30 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#99 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.hpp#97 edit
SWDEV-79445 - OCL generic changes and code clean-up
- Restore xnack support for Navi1x HW (requires COMGR support). Only Navi2x should have a fix in HW.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldefs.hpp#64 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#173 edit
SWDEV-198862 - Options for hip-clang-vdi path to provide the chicken bits, or functional equivalents to HCC_OPT_FLUSH
Add the HCC_OPT_FLUSH flag to use fence scope agent when possible for HIP VDI. The flag is on by default, similar to HIP HCC.
Add AMD_OCL_OPT_FLUSH to use fence scope agent when possible for OpenCL. This was tested on Windows and PAL. Default is off.
This flag can be used for future OpenCL tests.
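The effect of such a flag can be sketched as a simple scope selection. This is a hedged illustration: FenceScope, selectReleaseScope, and the needsSystemScope input are hypothetical, and the real conditions under which agent scope is "possible" are decided inside the runtime and are not described here.

```cpp
// Hypothetical stand-in for the two fence scopes discussed above.
enum class FenceScope { Agent, System };

// Narrow the fence to agent scope only when the optimization flag is on and
// the caller does not require visibility beyond the GPU (an assumed input).
inline FenceScope selectReleaseScope(bool optFlushEnabled, bool needsSystemScope) {
  if (optFlushEnabled && !needsSystemScope) {
    return FenceScope::Agent;
  }
  return FenceScope::System;
}
```

With the flag off the sketch always falls back to system scope, which corresponds to the conservative behavior.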
Tests:
1. http://ocltc.amd.com:8111/viewModification.html?modId=127189&personal=true&tab=vcsModificationBuilds
The teamcity test includes HIP - VDI - Rocm tests.
2. VEGA10, Windows, HIP, 110 hiptests PASS.
3. VEGA10, Linux AMDGPU PRO, HIP - PAL, 110 hiptests PASS.
Newer:
http://ocltc.amd.com:8111/viewModification.html?modId=127193&personal=true&tab=vcsModificationBuilds
Reviewboard: http://ocltc.amd.com/reviews/r/18092/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/device.cpp#247 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#342 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#89 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#29 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#321 edit
SWDEV-205994 - [CQE OCL][NAVI10][DTB-BLOCKER] ~ 10% -50% performance drop observed while running IndigoBench Benchmark | Faulty CL#2007647
- PAL changed the value reported in numAvailableVgprs on Navi10. Runtime has to switch to vgprsPerSimd for scratch buffer size calculation.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#172 edit
SWDEV-193973 - Update perfcounter info to accommodate PAL interface changes
Gfx103 added perf counters for three new blocks - GeDist, GeSe and Df
1. Update the blockIdToIndexSelect array to reflect these changes.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/18063/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palcounters.cpp#25 edit
SWDEV-204782 - store extra information per HSA queue
The new struct QueueInfo is used to store metadata about each HSA
queue. For hostcall, this structure will eventually contain a pointer to
the hostcall buffer allocated to each HSA queue.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#135 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#41 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#88 edit
SWDEV-189650 - [HIP-CLANG][HIP/VDI/PAL] Hangs on test hip_threadfence_system
1. In HIP + VDI + ROCm, allow SVM atomic in VEGA10 and later ASIC. GFX8 (Tonga) was enabled before.
2. In HIP + VDI + PAL Linux driver, allow SVM atomic in VEGA10 and later ASIC.
Tests:
1. In HIP + VDI + ROCm, hip_threadfence_system test passed.
2. In HIP + VDI + PAL + Linux, hip_threadfence_system test passed.
3. OpenCL + PAL, clinfo and ocltest runtime tests pass.
4. OpenCL + ROCM, clinfo and ocltest runtime tests pass.
5. Windows 10, VEGA 10, clinfo and ocltest runtime tests pass. hip_threadfence_system test passed by skipping the test.
Teamcity presubmission test:
http://ocltc.amd.com:8111/viewModification.html?modId=127083&personal=true&tab=vcsModificationBuilds
http://ocltc.amd.com:8111/viewModification.html?modId=127076&personal=true&tab=vcsModificationBuilds
ReviewBoard: http://ocltc.amd.com/reviews/r/18077/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#73 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#171 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#80 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#134 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocmemory.cpp#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#320 edit
SWDEV-204999 - [hipclang-vdi-rocm] TF unit test tracking.util_xla_test_gpu fails to run
- Change the HSACO detection logic to use e_machine
- Allow loading a binary without any kernels.
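A check based on e_machine can be sketched as below. This is not the code in elf.hpp or devprogram.cpp: looksLikeHsaco and kEmAmdGpu are hypothetical names, and the sketch assumes that HSACO detection reduces to "valid ELF magic and e_machine equal to EM_AMDGPU (224)". Because it inspects only the ELF header, it also accepts a binary that contains no kernel symbols.

```cpp
#include <cstddef>
#include <cstdint>

// EM_AMDGPU machine value from the ELF machine registry.
constexpr uint16_t kEmAmdGpu = 224;

// Return true if the buffer starts with an ELF header whose e_machine field
// identifies an AMD GPU code object. Only the header is inspected.
inline bool looksLikeHsaco(const unsigned char* data, size_t size) {
  // e_machine occupies bytes 18..19, so at least 20 bytes are required.
  if (data == nullptr || size < 20) return false;
  if (data[0] != 0x7f || data[1] != 'E' || data[2] != 'L' || data[3] != 'F') {
    return false;
  }
  uint16_t machine = 0;
  if (data[5] == 1) {         // EI_DATA == ELFDATA2LSB (little endian)
    machine = static_cast<uint16_t>(data[18] | (data[19] << 8));
  } else if (data[5] == 2) {  // EI_DATA == ELFDATA2MSB (big endian)
    machine = static_cast<uint16_t>((data[18] << 8) | data[19]);
  } else {
    return false;             // unknown data encoding
  }
  return machine == kEmAmdGpu;
}
```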
Affected files ...
... //depot/stg/opencl/drivers/opencl/compiler/lib/loaders/elf/elf.hpp#27 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devprogram.cpp#63 edit
SWDEV-86035 - Integrate PAL from //depot/stg/pal_prm/...
- Adjust Gfx9PlusSubDeviceInfo for the new defines
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldefs.hpp#62 edit
SWDEV-79445 - OCL generic changes and code clean-up
- More changes for VanGoghLite support in OCL
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#168 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#103 edit
SWDEV-192384 - [HIP CQE][HIPonPAL][19.40] hipBindTexRef1DFetch, hipTextureRef2D are failed on all ASICs for both Win/Lnx
1. Don't ignore the PAL_ALWAYS_RESIDENT flag for HIP.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/18061/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#101 edit
SWDEV-192384 - [HIP CQE][HIPonPAL][19.40] hipBindTexRef1DFetch, hipTextureRef2D are failed on all ASICs for both Win/Lnx
Add undefined memory object in PAL process memory objects.
http://ocltc.amd.com/reviews/r/18055/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/devprogram.hpp#33 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#154 edit
SWDEV-192384 - [HIP CQE][HIPonPAL][19.40] hipBindTexRef1DFetch, hipTextureRef2D are failed on all ASICs for both Win/Lnx
The runtime cannot trivially determine all the resources that will be used by a kernel, thus it can fail to make all of them resident.
1. Add new runtime flag PAL_ALWAYS_RESIDENT. Enabling this setting will cause resources to become resident at allocation time.
2. Set the default value of the above flag to true for HIP and false for OCL.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/18054/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#79 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#30 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.cpp#100 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palsettings.hpp#27 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palvirtual.cpp#153 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#319 edit
SWDEV-192384 - [HIP CQE][HIPonPAL][19.40] hipBindTexRef1DFetch, hipTextureRef2D are failed on all ASICs for both Win/Lnx
1. Correctly set the image type for textures created from arrays.
2. Allow creating any kind of image from a buffer.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/18051/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_texture.cpp#19 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#166 edit
SWDEV-86035 - Switch PAL to the new interface 533
- Correct the logic for compute queues detection. Linux doesn't support realtime queues with CU reservation.
Affected files ...
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#163 edit
SWDEV-201128 - [HIP] test_snli_cuda failure
Default to the sync packet.
Make sure GPU_NUM_MEM_DEPENDENCY is 0 for HIP.
The no-sync packet is only used when there is a memory dependency check.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_context.cpp#22 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#86 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#28 edit