These fixes address regressions caused by http://gerrit-git.amd.com/c/compute/ec/hip/+/337601
Currently we're converting a 1D offset into a 3D offset, which doesn't make much sense once you consider the fact that this offset is relative to a different origin than our current 3D offset.
I traced through our blit kernels in VDI - the copy buffer rect path is able to handle immediate offsets in the 3D buffer via the amd::BufferRect::start_ parameter.
Instead of adjusting the offset, simply adjust the start of the region.
Change-Id: Ic8797a2c8ac0ad106f246f61ff06ca1ca03d3058
This patch is a workaround to support user pitch for hipMemcpy{2D/3D}.
Historically OpenCL didn't support pitch with clEnqueueFillBuffer(), so neither did we in VDI. Adding it now will be slightly nontrivial, since the fill kernel and runtime in many places will need to be modified.
As a temporary workaround for cases when pitch > width, we can just enqueue a fill for each row separately. This implementation is slow, but it satisfies the correctness criteria.
Change-Id: Idfeca349288b51d6ff84a7cf001fb63c6a66818a
- Use symbol value as the qeury key. Compared to the symbol name, the
symbol value is more robust as developers may use unqualified or
qualified identifiers. It also removes the mangling and/or demangling
requirement for the runtime API.
Change-Id: I9d4259f3842612c7cc98551269fc2092d8b5c19e
The only requirment from hipMallocPitch() is that the returned pitch is aligned to the HW image pitch alignment. There is no restriction on the size of the allocation, since the memory might not be used for images.
Change-Id: I97438e5fe4012ca4721b14b85f514dbac803c17c
Instead of using the sampler field force_degamma to perform sRGB->linear conversion during pixel sampling, we use an appropriate image format instead. The overhead of this is having to create an image view when creating a texture object from an array.
Change-Id: I1ca368c312c1fd4b6f784a3a1b35b5eeb28070ff
What Cuda refers to "linear texture memory" is the OpenCL equivalent of CL_MEM_OBJECT_IMAGE1D_BUFFER. For these types of allocations we should create a typed buffer instead of an image.
Currently there is no check in the texture fetch functions as to what kind of SRD is written into the texture object, so any kind of incorrect programming will cause the TA to hang. Fortunately for us, every one writes correct code :)
Change-Id: I80dab85a992f2c0754ebf303d40ac6b5e045c7c1
HIP assumes that image width is in bytes, but VDI assumes that image width in pixels. Need to perform byte -> pixel coversion before doing anything.
Change-Id: Ia9fd1f46d05db3fbe8049add10b4d7e5118a2b9a
Incoming changes from upstream split the struct hipMemcpy3DParms into two separate ones - hipMemcpy3DParms and HIP_MEMCPY3D, which are cudaMemcpy3DParms and CUDA_MEMCPY3D equivalents respectively.
Note that HIP_MEMCPY3D is missing half the members of CUDA_MEMCPY3D (this should be fixed in PR#1887). Work around this by using a substitute _HIP_MEMCPY3D struct for now.
Change-Id: Ic15134e6deb260189b662b3804d2309a9b8473e9
Add hip::syncStreams(dev) to sync blocking streams on a given device.
hip::syncStreams(void) should only sync streams on the current device.
Change-Id: Ib6b0735215fa0ed12c646ebd029e9763ee3712ce
Also make D2H and H2D keep track of the chain of events
when we need to use a different HostQueue.
Change-Id: I1c5da6ea6104b37ad7aac00f0eb8ea9371e6ba1c
The current texture implementation is based off the one for HIP-HCC. There's a lot of problems with it - only creating images from buffers, hard coding logic and ignoring user parameters. This leads to a whole lot of UB even with simple examples (as seen with RedShift's code).
This CL is aimed to bring the HIP-VDI texture implementation closer to what is described by Cuda.
hipMemcpyAtoA() - image to image copy.
hipMemcpyHtoA()/hipMemcpyDtoA() - buffer to image copy.
hipMemcpyAtoH()/hipMemcpyAtoD() - image to buffer copy.
hipArrayCreate()/hipArray3DCreate()/hipMallocArray()/hipMalloc3DArray() - creates 1D/2D/3D/1D Array/2D Array images.
hipCreateTextureObject() - creates sampler, (optional) creates 1D/2D image from buffer, (optional) creates image views.
hipBindTexture() - creates 1D image from buffer (should create a typed buffer, however this is not compatible with HIP-HCC).
hipBindTexture2D() - creates 2D image form buffer.
hipBindTextureToArray() - creates image view.
hipTexRefSetAddress() - creates 1D image from buffer (should create a typed buffer, however this is not compatible with HIP-HCC).
hipTexRefSetAddress2D() - creates 2D image from buffer.
hipTexRefSetArray() - creates image view.
There are still a lot of TODOs in the code, here's a few important ones:
1. VDI doesn't support a lot of sampler flags.
2. VDI doesn't support device to image 2D/3D copy.
3. Mipmaps implementation is incomplete.
4. Image view implementation is incomplete.
Change-Id: Ia374ee27aa14f76451fee7667495036f4419a487