SWDEV-214490 - Update HIP RT for texture3D in HIP/PAL on Windows
-Update ihipBindTexture
http://ocltc.amd.com/reviews/r/18333/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#89 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_texture.cpp#28 edit
SWDEV-214490 - Update HIP RT for texture3D in HIP/PAL on Windows
- Update function hipMemcpy3D for Texture Array
- Add hipArrayCubemap support in hipMalloc3DArray
http://ocltc.amd.com/reviews/r/18328/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#88 edit
SWDEV-213526 - pytorch tests fail with hipErrorOutOfMemory
There's a bug in ROCr when loading a lot of kernels without syncing.
So for now, if an allocation fails, sync the devices and retry before
returning hipErrorOutOfMemory.
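A minimal sketch of the retry-after-sync idea, not the actual hip_memory.cpp change. Status, allocateWithRetry, tryAlloc and syncAllDevices are hypothetical names used only for illustration; the real runtime calls are not shown in this log.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-ins for hipSuccess / hipErrorOutOfMemory.
enum class Status { Success, OutOfMemory };

// Try once; if that fails, sync all devices and try exactly one more time.
// tryAlloc and syncAllDevices are placeholders for the runtime's real calls.
Status allocateWithRetry(const std::function<void*(std::size_t)>& tryAlloc,
                         const std::function<void()>& syncAllDevices,
                         std::size_t size, void** out) {
  assert(out != nullptr);
  *out = tryAlloc(size);
  if (*out == nullptr) {
    syncAllDevices();       // give pending work a chance to complete
    *out = tryAlloc(size);  // second and final attempt
  }
  return *out != nullptr ? Status::Success : Status::OutOfMemory;
}
```

The retry is bounded to one attempt so a genuine out-of-memory condition is still reported to the caller.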
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#86 edit
SWDEV-212440 - [HIP] Memory access fault observed on Pytorch while running performance tests with Microbenchmarking script
We need to loop through all the default streams and sync them, in case
the app calls hipFree while one stream is current and a different stream
is still using the memory.
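A rough sketch of the sync-before-free loop, assuming one default stream per device. Stream, Device and syncAllDefaultStreamsThenFree are hypothetical stand-ins, not the runtime's types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: each device owns one default (null) stream.
struct Stream {
  int pending = 0;                // outstanding commands
  void finish() { pending = 0; }  // stand-in for a blocking stream sync
};
struct Device { Stream defaultStream; };

// Drain every device's default stream, then release the allocation.
// Returns how many streams actually had pending work.
int syncAllDefaultStreamsThenFree(std::vector<Device>& devices, bool& freed) {
  assert(!devices.empty());
  int drained = 0;
  for (Device& d : devices) {
    if (d.defaultStream.pending > 0) ++drained;
    d.defaultStream.finish();
  }
  freed = true;  // only safe once no default stream can still touch the memory
  return drained;
}
```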
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#85 edit
SWDEV-206239 - [HIP] Return hipErrorMemoryAllocation for fine grained VRAM for now
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#83 edit
SWDEV-198863 - Options for hip-clang-vdi path to provide the chicken bits, or functional equivalents to HCC_DB (phase 1)
1. The log macros are turned off for release builds, so the log functions have zero impact on release builds.
2. The log macros have level, mask, and condition controls, so we have more control to avoid log flooding.
I also adjusted some existing logs to use the new log functions.
1. To exercise and test the new log functions.
2. To improve performance slightly.
3. The change is mainly for HIP-ROCM; we can move more over in later phases for PAL or ORCA.
4. I made these log features unavailable for release builds. We can revert to the old log functions for release builds on a case-by-case basis.
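A minimal sketch of a log macro with level, mask and condition control that compiles to nothing in release builds. SKETCH_LOG, shouldLog and envOr are hypothetical names; only the LOG_LEVEL and GPU_LOG_MASK variable names come from this change, and the use of NDEBUG as the release switch is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// A message is emitted only if its level is enabled and its mask bit is selected.
inline bool shouldLog(int msgLevel, std::uint32_t msgMask,
                      int enabledLevel, std::uint32_t enabledMask) {
  return msgLevel <= enabledLevel && (msgMask & enabledMask) != 0;
}

// Read an unsigned decimal setting from the environment, or use a fallback.
inline std::uint32_t envOr(const char* name, std::uint32_t fallback) {
  assert(name != nullptr);
  const char* v = std::getenv(name);
  return v ? static_cast<std::uint32_t>(std::strtoul(v, nullptr, 10)) : fallback;
}

#ifdef NDEBUG
// Release build: expands to nothing, so the arguments are never evaluated.
#define SKETCH_LOG(level, mask, cond, msg) ((void)0)
#else
#define SKETCH_LOG(level, mask, cond, msg)                            \
  do {                                                                \
    if ((cond) && shouldLog((level), (mask),                          \
                            static_cast<int>(envOr("LOG_LEVEL", 0)),  \
                            envOr("GPU_LOG_MASK", 0))) {              \
      std::fprintf(stderr, "%s\n", (msg));                            \
    }                                                                 \
  } while (0)
#endif
```

With this shape, a mask of 4294967294 (0xFFFFFFFE) drops only messages tagged with bit 0, and 47102 (0xB7FE) keeps a narrower subset of bits.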
Tests:
1. http://ocltc.amd.com:8111/viewModification.html?modId=128289&personal=true&tab=vcsModificationBuilds
   http://ocltc.amd.com:8111/viewModification.html?modId=128358&personal=true&tab=vcsModificationBuilds
2. release build, run hip program, there is no log
3. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=4294967295
There were a lot of logs.
4. fastdebug build, run hip program,
export LOG_LEVEL=2
export GPU_LOG_MASK=4294967295
There were no logs.
5. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=4294967294
There were far fewer logs.
6. fastdebug build, run hip program,
export LOG_LEVEL=3
export GPU_LOG_MASK=47102
There were even fewer logs. The logs were as expected according to the mask.
7. Tested steps 2 to 6 similarly on Windows and Linux
ReviewBoard: http://ocltc.amd.com/reviews/r/18215
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_internal.hpp#46 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#82 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_stream.cpp#26 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hiprtc_internal.hpp#2 edit
... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_svm.cpp#29 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/comgrctx.cpp#6 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devkernel.cpp#29 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/devprogram.cpp#68 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#137 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocvirtual.cpp#91 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/command.cpp#100 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/commandqueue.cpp#32 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/runtime.cpp#40 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/debug.hpp#10 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#323 edit
SWDEV-207366 - [HIP] 'hipErrorInvalidValue' (1011) with hipMemcpy3D
To get the pixel width, we need to divide WidthInBytes by sizeByte, not multiply by it.
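The conversion can be sketched as below. widthInPixels and elementSize are hypothetical names; elementSize plays the role of sizeByte in the note above.

```cpp
#include <cassert>
#include <cstddef>

// Convert a row width given in bytes to a width in elements (pixels).
// elementSize is the size of one pixel in bytes, e.g. 4 for a 32-bit channel.
std::size_t widthInPixels(std::size_t widthInBytes, std::size_t elementSize) {
  assert(elementSize != 0);
  assert(widthInBytes % elementSize == 0);  // rows hold whole pixels
  return widthInBytes / elementSize;        // divide, do not multiply
}
```

For example, a 1024-byte row of 4-byte pixels is 256 pixels wide; multiplying instead would give 4096 and fail the bounds check.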
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#79 edit
SWDEV-207100 - [HIP CQE][HIPonPAL][WIN][QR] 5 hiptests failed in 19H1 Windows on all ASICs
1. Reshuffle locations of the hipMemset functions to make them all next to each other.
2. Update the declarations of hipMemsetD8, hipMemsetD8Async, hipMemsetD16, hipMemsetD16Async. These functions are type aware and take in as their third argument the number of elements in the buffer, not the buffer size. Change the name of this argument from sizeBytes to count to align with the above description. Changes for the header are tracked here https://github.com/ROCm-Developer-Tools/HIP/pull/1544
3. Add the actual implementation of hipMemsetD8, hipMemsetD8Async, hipMemsetD16, hipMemsetD16Async.
4. Remove ihipMemset2D() as it is essentially a copy of ihipMemset(). Change hipMemset2D()/hipMemset2DAsync() to use ihipMemset().
5. Implement hipMemset3DAsync().
6. Update the test script to pick up the updated command line options for hipMemset and hipMemset3D.
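To illustrate the count-versus-bytes distinction from item 2, here is a host-side stand-in for a type-aware 16-bit memset. memsetD16Sketch is a hypothetical name; it is not the device implementation added in item 3.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Host-side stand-in for a type-aware 16-bit memset: `count` is the number of
// 16-bit elements to write, so the call touches count * 2 bytes.
void memsetD16Sketch(std::uint16_t* dst, std::uint16_t value, std::size_t count) {
  assert(dst != nullptr || count == 0);
  for (std::size_t i = 0; i < count; ++i) {
    dst[i] = value;
  }
}
```

Passing a byte size where count is expected would write twice as far as intended for 16-bit elements, which is why the argument was renamed.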
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_hcc.def.in#32 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_hcc.map.in#30 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#78 edit
... //depot/stg/opencl/drivers/opencl/make/hip.git/tests/scripts/hip_runtimeapi_tests.txt#13 edit
SWDEV-189650 - [HIP-CLANG][HIP/VDI/PAL] Hangs on test hip_threadfence_system
1. In HIP + VDI + ROCm, allow SVM atomics on VEGA10 and later ASICs. GFX8 (Tonga) was already enabled.
2. In the HIP + VDI + PAL Linux driver, allow SVM atomics on VEGA10 and later ASICs.
Tests:
1. In HIP + VDI + ROCm, hip_threadfence_system test passed.
2. In HIP + VDI + PAL + Linux, hip_threadfence_system test passed.
3. OpenCL + PAL, clinfo and ocltest runtime test pass.
4. OpenCL + ROCM, clinfo and ocltest runtime test pass.
5. Windows 10, VEGA 10, clinfo and ocltest runtime test pass. hip_threadfence_system test passed by skipping the test.
Teamcity presubmission test:
http://ocltc.amd.com:8111/viewModification.html?modId=127083&personal=true&tab=vcsModificationBuilds
http://ocltc.amd.com:8111/viewModification.html?modId=127076&personal=true&tab=vcsModificationBuilds
ReviewBoard: http://ocltc.amd.com/reviews/r/18077/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#73 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/paldevice.cpp#171 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.cpp#80 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/pal/palresource.hpp#31 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocdevice.cpp#134 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/rocm/rocmemory.cpp#44 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#320 edit
SWDEV-203855 - Segfault when using hipArrayCreate and hipMemcpyParam2D
1. The hipArrayCreate API implementation uses the wrong parameter to check the width. That parameter can be a null pointer because it is used to pass the pointer back to the caller.
2. Implement hipMemcpyParam2D similar to HIP-HCC implementation. Reference: https://github.com/ROCm-Developer-Tools/HIP/blob/master/src/hip_memory.cpp
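A sketch of the validation pattern from item 1, assuming the width should be read from the input descriptor rather than through the output pointer. Status, ArrayDesc, Array, arrayCreateSketch and arrayDestroySketch are all hypothetical stand-ins, not the HIP types.

```cpp
#include <cassert>
#include <cstddef>

enum class Status { Success, InvalidValue };  // hypothetical status codes

struct ArrayDesc { std::size_t width; std::size_t height; };  // input
struct Array { std::size_t width; std::size_t height; };      // output object

// Validate the *input* descriptor. `*out` is only written, never read, because
// the caller may pass a pointer variable that is still null.
Status arrayCreateSketch(Array** out, const ArrayDesc* desc) {
  if (out == nullptr || desc == nullptr || desc->width == 0) {
    return Status::InvalidValue;
  }
  *out = new Array{desc->width, desc->height};
  return Status::Success;
}

// Release an array obtained from arrayCreateSketch.
void arrayDestroySketch(Array* a) {
  assert(a != nullptr);
  delete a;
}
```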
Tests:
1. PRE CHECK-IN build and test(no regression): http://ocltc:8111/viewModification.html?modId=126608&personal=true&init=1&tab=vcsModificationBuilds
2. GPU is VEGA10, OS is Windows 10, CPU is Threadripper 1900X; ran the test. There is no segfault or exit during the hipArrayCreate and hipMemcpyParam2D function calls.
ReviewBoard: http://ocltc.amd.com/reviews/r/18037/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#72 edit
SWDEV-201925 - hipArray3DCreate() not available in HIP/PAL on Windows
1. Implement hipArray3DCreate().
2. Remove the array size calculation from hipArrayCreate() as it is not used.
ReviewBoardURL = http://ocltc.amd.com/reviews/r/18005/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#71 edit
SWDEV-203438 - [HIP] AllGather RCCL test issue
The test tries to launch a kernel on two devices at once, and the kernels need to communicate with each other.
For that, it uses a custom stream for each device.
The problem is that getNullStream used to call syncStreams every time,
and that synced all the streams, even the ones on different devices.
That made the second kernel launch (on the second device) wait for the first kernel to finish, which
would never happen since the first one was waiting for the second one.
The fix is to not call syncStreams from getNullStream, because we generally sync beforehand anyway.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_context.cpp#21 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_event.cpp#16 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_internal.hpp#40 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#70 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_module.cpp#41 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_stream.cpp#24 edit
SWDEV-198556 - [HIP] Use src/dstMemory->getContext instead of host_context.
Also relax the check for P2P copies in case of hipMemcpy(hostMalloced, hipMalloced(dev1), dev0)
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#67 edit
SWDEV-198556 - [HIP] Gnarly bug due to macros:
HIP_RETURN(ret) evaluates ret twice: first when setting the last error,
then again via LogDebugInfo. So if HIP_RETURN has a function call as its argument,
the function gets called twice. So ihipMalloc and ihipMemcpy were
being called twice (and perhaps more functions).
Also logging the pointer returned by ihipMalloc so we can track memory
in logs more easily.
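The double evaluation can be reproduced with a small stand-alone sketch. RETURN_EVAL_TWICE, RETURN_EVAL_ONCE, fakeMalloc and the globals are hypothetical; they only mimic the shape of the problem, not the real HIP_RETURN definition.

```cpp
#include <cassert>

static int g_lastError = 0;  // stand-in for the recorded last error
static int g_logged = 0;     // stand-in for what the debug log records

// Problematic shape: `ret` is expanded twice, so a call passed in runs twice.
#define RETURN_EVAL_TWICE(ret) \
  do { g_lastError = (ret); g_logged = (ret); return g_lastError; } while (0)

// Safer shape: evaluate `ret` once into a local and reuse the local.
#define RETURN_EVAL_ONCE(ret) \
  do { const int r_ = (ret); g_lastError = r_; g_logged = r_; return r_; } while (0)

static int g_allocCalls = 0;
static int fakeMalloc() { ++g_allocCalls; return 0; }  // side-effecting call

static int apiWithBug() { RETURN_EVAL_TWICE(fakeMalloc()); }
static int apiFixed() { RETURN_EVAL_ONCE(fakeMalloc()); }

// Returns how many times fakeMalloc ran for one API call of each flavor.
static int callsFor(bool fixed) {
  g_allocCalls = 0;
  const int rc = fixed ? apiFixed() : apiWithBug();
  assert(rc == 0);
  (void)rc;
  return g_allocCalls;
}
```

Here callsFor(false) reports two calls and callsFor(true) reports one, which is the difference between allocating twice and allocating once.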
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_internal.hpp#33 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#65 edit
SWDEV-197168 - [HIP] handle width or height or src or dst being 0
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#63 edit
SWDEV-189500 - [HIP] Have to force async=false for host to device case as well
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#61 edit
SWDEV-194872 - [HIP] CUDA and HCC sync after a DeviceToHost async copy.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#60 edit
SWDEV-189383 - [HIP CQE][HIPonPAL][WIN] hipDeviceMalloc, hip_test_ldg, hipHostRegister, hipModule, hipStreamSync2 tests failed on VEGA10.
1. For pinned memory allocations add the host pointer and all of its respective device pointers to the memory object map.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#57 edit
SWDEV-189488 - [HIP] Caffe2 TensorTest.TensorSerializationMultiDevices fails
1. Make sure to set attributes->device to the current device for host-malloc'd memory
2. Return hipSuccess for hipDeviceCanAccessPeer
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#56 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_peer.cpp#4 edit
SWDEV-145570 - Check host_context when matching GPU device.
- In CL#1766264, `host_context` is introduced for mGPU support. Need to
match that context specially when trying to match GPU device context.
The following tests passed:
$ python test_dataloader.py TestDictDataLoader.test_pin_memory
.
----------------------------------------------------------------------
Ran 1 test in 0.004s
OK
$ python test_dataloader.py TestDataLoader.test_sequential_pin_memory
.
----------------------------------------------------------------------
Ran 1 test in 0.063s
OK
$ python test_dataloader.py TestDataLoader.test_shuffle_pin_memory
.
----------------------------------------------------------------------
Ran 1 test in 0.174s
OK
$ python test_dataloader.py TestStringDataLoader.test_shuffle_pin_memory
.
----------------------------------------------------------------------
Ran 1 test in 0.104s
OK
$ python test_torch.py TestTorch.test_pin_memory
.
----------------------------------------------------------------------
Ran 1 test in 0.124s
OK
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#52 edit
SWDEV-144570 - Fix pointer attribute query.
- For memory not registered with runtime, return
`hipErrorInvalidValue`. That's the behavior expected to check whether
a host buffer is pinned.
- Return `hipErrorInvalidDevice` in case a registered memory object
cannot find its matching device.
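The two error paths above can be sketched as follows. Status, Registry and queryPointerDevice are hypothetical, and modeling "no matching device" as a negative index is an assumption made for the example.

```cpp
#include <cassert>
#include <map>

enum class Status { Success, InvalidValue, InvalidDevice };  // hypothetical

// Hypothetical registry: pointer -> owning device index, or -1 when the
// memory object is registered but no matching device can be found.
using Registry = std::map<const void*, int>;

Status queryPointerDevice(const Registry& reg, const void* ptr, int* device) {
  assert(device != nullptr);
  auto it = reg.find(ptr);
  if (it == reg.end()) return Status::InvalidValue;  // not registered with the runtime
  if (it->second < 0) return Status::InvalidDevice;  // registered, no matching device
  *device = it->second;
  return Status::Success;
}
```

A caller can then treat InvalidValue on a host pointer as "this buffer is not pinned".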
RB: http://ocltc.amd.com/reviews/r/17094/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#51 edit
SWDEV-145570 - [HIP] Use a context with all devices in system for host register
hipHostRegister and hipMemcpy 0x10 and 0x20 fail in mGPU systems because
we only register the memory on the current device. But in HIP, the registering
needs to happen on all devices.
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_context.cpp#17 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_internal.hpp#26 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#50 edit
SWDEV-145570 - [HIP] - Fix some issues in hip runtime
- Set stream for event
- Free mem needs to be reported in bytes, but the runtime backends report it in KB
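A small sketch of the unit conversion, assuming the backend's unit is 1024 bytes. freeMemBytes is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Convert a free-memory figure reported in kilobytes to bytes.
// Assumes one backend unit is 1024 bytes.
std::size_t freeMemBytes(std::size_t freeMemKiB) {
  assert(freeMemKiB <= SIZE_MAX / 1024);  // guard against overflow
  return freeMemKiB * 1024;
}
```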
ReviewBoardURL = http://ocltc.amd.com/reviews/r/15586/diff/
Affected files ...
... //depot/stg/opencl/drivers/opencl/api/hip/hip_memory.cpp#40 edit
... //depot/stg/opencl/drivers/opencl/api/hip/hip_module.cpp#15 edit