SWDEV-145570 - [HIP] Fetch properties from the current device instead of the default device 0.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_device_runtime.cpp#21 edit
SWDEV-206239 - [HIP] Return hipErrorMemoryAllocation for fine grained VRAM for now
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#83 edit
SWDEV-198863 - Options for hip-clang-vdi path to provide the chicken bits, or functional equivalents to HCC_DB (phase 1)
1. The log macros are turned off for release builds, so the log functions have zero impact on release builds.
2. The log macros have level, mask, and condition controls, which gives us more control to avoid log flooding.
I also adjusted some existing logs to use the new log functions:
1. To exercise and test the new log functions.
2. To improve performance slightly.
3. The change is mainly for HIP-ROCM; more can be moved in later phases for PAL or ORCA.
4. I made these log features unavailable for release builds. We can revert to the old log functions for release builds on a case-by-case basis.
Tests:
1. http://ocltc.amd.com:8111/viewModification.html?modId=128289&personal=true&tab=vcsModificationBuilds
   http://ocltc.amd.com:8111/viewModification.html?modId=128358&personal=true&tab=vcsModificationBuilds
2. release build, run hip program, there is no log
3. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=4294967295
There were a lot of logs.
4. fastdebug build, run hip program,
export LOG_LEVEL=2
export GPU_LOG_MASK=4294967295
There were no logs.
5. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=4294967294
There were far fewer logs.
6. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=47102
There were even fewer logs. The logs that appeared matched what the mask should enable.
7. Tested steps 2 to 6 similarly on Windows and Linux.
ReviewBoard: http://ocltc.amd.com/reviews/r/18215
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_internal.hpp#46 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#82 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_stream.cpp#26 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hiprtc_internal.hpp#2 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_svm.cpp#29 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/comgrctx.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devkernel.cpp#29 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devprogram.cpp#68 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#137 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#91 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#100 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/commandqueue.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/runtime.cpp#40 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/debug.hpp#10 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#323 edit
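The level and mask gating described for SWDEV-198863 can be sketched roughly as below. This is a minimal illustration, not the runtime's actual macros: `LogSettings`, `shouldLog`, `SKETCH_LOG`, and the `SKETCH_RELEASE_BUILD` define are hypothetical names, and the gating rule (a message is emitted when its level does not exceed the configured level and its category bit is set in the mask) is an assumption chosen to be consistent with the test results listed in that entry.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical settings, standing in for values read from LOG_LEVEL and GPU_LOG_MASK.
struct LogSettings {
  int level;
  std::uint32_t mask;
};

// Assumed rule: emit when the message level is within the configured level
// and the message's category bit is enabled in the mask.
constexpr bool shouldLog(const LogSettings& s, int msgLevel, std::uint32_t categoryBit) {
  return msgLevel <= s.level && (s.mask & categoryBit) != 0;
}

#ifdef SKETCH_RELEASE_BUILD
// Release build: the macro expands to nothing, so its arguments are never evaluated.
#define SKETCH_LOG(settings, level, category, ...) ((void)0)
#else
// Debug build: check level and mask first, then print the formatted message.
#define SKETCH_LOG(settings, level, category, ...)      \
  do {                                                  \
    if (shouldLog((settings), (level), (category))) {   \
      std::fprintf(stderr, __VA_ARGS__);                \
      std::fputc('\n', stderr);                         \
    }                                                   \
  } while (0)
#endif
```

Under these assumptions, GPU_LOG_MASK=4294967295 (0xFFFFFFFF) enables every category, 4294967294 (0xFFFFFFFE) clears only bit 0, and 47102 (0xB7FE) leaves a smaller subset of bits set, which matches the progressively smaller log volume reported in the tests.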
SWDEV-145570 - [HIP] Fix occupancy API prototype.
- They need to be C API, i.e. extern "C".
- Follow the current API and use `uint32_t` instead of `int`.
+ TODO: We need to revert them once the APIs are changed to be
compatible with CUDA.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_platform.cpp#46 edit
SWDEV-197289 - VDI tracing API integration in rocTracer
- Change the names of the functions according to the new interface
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_activity.cpp#2 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_hcc.def.in#34 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_hcc.map.in#32 edit
SWDEV-207366 - [HIP] 'hipErrorInvalidValue' (1011) with hipMemcpy3D
To get the pixel width, we need to divide WidthInBytes by sizeByte, not multiply.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#79 edit
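The arithmetic behind the SWDEV-207366 fix can be illustrated with a small sketch. The helper name `widthInPixels` is hypothetical, and it assumes that `sizeByte` in the description means the size of one element (pixel) in bytes.

```cpp
#include <cstddef>

// Hypothetical helper: convert a row width given in bytes into a width in
// elements (pixels), given the size of one element in bytes.
constexpr std::size_t widthInPixels(std::size_t widthInBytes, std::size_t elementSizeBytes) {
  return widthInBytes / elementSizeBytes;
}
```

For example, a 1024-byte row of 4-byte elements is 256 pixels wide; multiplying instead would give 4096, which is a plausible way to end up with an invalid-value error.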
SWDEV-207100 - [HIP CQE][HIPonPAL][WIN][QR] 5 hiptests failed in 19H1 Windows on all ASICs
1. Reshuffle locations of the hipMemset functions to make them all next to each other.
2. Update the declarations of hipMemsetD8, hipMemsetD8Async, hipMemsetD16, hipMemsetD16Async. These functions are type aware and take in as their third argument the number of elements in the buffer, not the buffer size. Change the name of this argument from sizeBytes to count to align with the above description. Changes for the header are tracked here https://github.com/ROCm-Developer-Tools/HIP/pull/1544
3. Add the actual implementation of hipMemsetD8, hipMemsetD8Async, hipMemsetD16, hipMemsetD16Async.
4. Remove ihipMemset2D() as it is essentially a copy of ihipMemset(). Change hipMemset2D()/hipMemset2DAsync() to use ihipMemset().
5. Implement hipMemset3DAsync().
6. Update the test script to pick up the updated command line options for hipMemset and hipMemset3D.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_hcc.def.in#32 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_hcc.map.in#30 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#78 edit
... //depot/stg/opencl/drivers/opencl/make/hip.git/tests/scripts/hip_runtimeapi_tests.txt#13 edit
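The count-versus-size distinction from item 2 of SWDEV-207100 can be sketched on the host side as follows. `typedFill` and `bytesTouched` are hypothetical stand-ins, not the HIP implementation; they only show that a type-aware memset taking an element count touches `count * sizeof(T)` bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical host-side stand-in for a type-aware memset:
// writes `count` elements of type T, not `count` bytes.
template <typename T>
void typedFill(T* dst, T value, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    dst[i] = value;
  }
}

// Number of bytes a typed fill of `count` elements covers.
template <typename T>
constexpr std::size_t bytesTouched(std::size_t count) {
  return count * sizeof(T);
}
```

With a 16-bit element type, passing a byte size where a count is expected would cover twice the intended range, which is why the argument name matters.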
SWDEV-207449 - [HIP CQE][HIPonPAL][LNX][QR] 6 hiptests failed on all ASICs
hipTestHalf fails to build on Windows due to linker error "unresolved external symbol __gnu_h2f_ieee"
1. Expose __gnu_h2f_ieee() and __gnu_f2h_ieee() for Windows builds.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/18127/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_hcc.def.in#31 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_platform.cpp#45 edit
SWDEV-205925 - Update HIP texture APIs for issue in hipTexRefSetAddress in HIP/PAL on Windows
- Remove the nullptr possibility
http://ocltc.amd.com/reviews/r/18121/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_texture.cpp#23 edit
SWDEV-184710 - Support hipLaunchCooperativeKernelMultiDevice()
- Add support for multi grid launch in hip
- Detect the new hidden argument and pass the required information for the kernel launch
- Memory for synchronization is allocated as a single object and then the offset for each GPU is found
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_module.cpp#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/device.hpp#343 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devkernel.cpp#25 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devkernel.hpp#17 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palkernel.cpp#82 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#136 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.hpp#42 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#90 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#30 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#99 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.hpp#97 edit
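The "single object, then per-GPU offset" layout mentioned for SWDEV-184710 might look something like the sketch below. The function names are hypothetical, and the layout (equal-sized, contiguous slots indexed by device) is an assumption; the entry does not state how the runtime actually sizes or orders the slots.

```cpp
#include <cstddef>

// Assumed layout: one contiguous allocation with an equal-sized slot per device.
constexpr std::size_t syncBufferSize(std::size_t numDevices, std::size_t bytesPerDevice) {
  return numDevices * bytesPerDevice;
}

// Byte offset of a given device's slot inside the shared allocation.
constexpr std::size_t syncOffset(std::size_t deviceIndex, std::size_t bytesPerDevice) {
  return deviceIndex * bytesPerDevice;
}
```

Under this assumption, four devices with 64-byte slots share one 256-byte object, and device 2 uses the range starting at byte 128.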
SWDEV-189650 - [HIP-CLANG][HIP/VDI/PAL] Hangs on test hip_threadfence_system
1. In HIP + VDI + ROCm, allow SVM atomics on VEGA10 and later ASICs. GFX8 (Tonga) was already enabled.
2. In the HIP + VDI + PAL Linux driver, allow SVM atomics on VEGA10 and later ASICs.
Tests:
1. In HIP + VDI + ROCm, hip_threadfence_system test passed.
2. In HIP + VDI + PAL + Linux, hip_threadfence_system test passed.
3. OpenCL + PAL, clinfo and ocltest runtime tests pass.
4. OpenCL + ROCM, clinfo and ocltest runtime tests pass.
5. Windows 10, VEGA 10, clinfo and ocltest runtime tests pass. hip_threadfence_system test passed by skipping the test.
Teamcity presubmission test:
http://ocltc.amd.com:8111/viewModification.html?modId=127083&personal=true&tab=vcsModificationBuilds
http://ocltc.amd.com:8111/viewModification.html?modId=127076&personal=true&tab=vcsModificationBuilds
ReviewBoard: http://ocltc.amd.com/reviews/r/18077/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#73 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#171 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#80 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#134 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocmemory.cpp#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#320 edit
SWDEV-205724 - Issue with hipTexRefSetAddress in HIP/PAL on Windows
Handle nullptr channel format desc
http://ocltc.amd.com/reviews/r/18065/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_texture.cpp#20 edit
SWDEV-144570 - Adding extern var support for dynamically loaded modules for Texture reference.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_internal.hpp#43 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_module.cpp#42 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_platform.cpp#43 edit
SWDEV-192384 - [HIP CQE][HIPonPAL][19.40] hipBindTexRef1DFetch, hipTextureRef2D are failed on all ASICs for both Win/Lnx
1. Correctly set the image type for textures created from arrays.
2. Allow creating any kind of image from a buffer.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/18051/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_texture.cpp#19 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#166 edit
SWDEV-203855 - Segfault when using hipArrayCreate and hipMemcpyParam2D
1. The hipArrayCreate API implementation used the wrong parameter to check the width. That parameter can be a null pointer because it is used to pass the pointer back to the caller.
2. Implement hipMemcpyParam2D similar to HIP-HCC implementation. Reference: https://github.com/ROCm-Developer-Tools/HIP/blob/master/src/hip_memory.cpp
Tests:
1. PRE CHECK-IN build and test(no regression): http://ocltc:8111/viewModification.html?modId=126608&personal=true&init=1&tab=vcsModificationBuilds
2. GPU is VEGA10, OS is Windows 10, CPU is Threadripper 1900X; ran the test. There is no segfault or exit during the hipArrayCreate and hipMemcpyParam2D function calls.
ReviewBoard: http://ocltc.amd.com/reviews/r/18037/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#72 edit
SWDEV-201925 - hipArray3DCreate() not available in HIP/PAL on Windows
1. Implement hipArray3DCreate().
2. Remove the array size calculation from hipArrayCreate() as it is not used.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/18005/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#71 edit
SWDEV-201128 - [HIP] test_snli_cuda failure
Default to the sync packet.
Make sure GPU_NUM_MEM_DEPENDENCY is 0 for HIP.
The no-sync packet is used only when memory dependency checks are enabled.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_context.cpp#22 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#86 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.hpp#28 edit
SWDEV-203438 - [HIP] AllGather RCCL test issue
The test tries to launch a kernel on two devices at once, and the kernels need to communicate with each other.
For that, it uses a custom stream for each device.
The problem is that getNullStream used to call syncStreams every time,
and it synced all the streams, even those on different devices.
That made the second kernel launch (on the second device) wait for the first kernel to finish, which
would never happen because the first kernel was waiting for the second one.
The fix is to not call syncStreams from getNullStream, because we generally sync beforehand anyway.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_context.cpp#21 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_event.cpp#16 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_internal.hpp#40 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_module.cpp#41 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_stream.cpp#24 edit