Граф коммитов

40 Коммитов

Автор SHA1 Сообщение Дата
Vlad Sytchenko cf52b5fca4 Handle copy cases where the uses pases a bad ptr
Change-Id: I4490b8519e4d0dde25b845f9ca7e81c1d80d4f28
2020-04-28 11:04:29 -04:00
kjayapra-amd 1f3789d5a2 SWDEV-232008 - Handle cases where Memcpy size is less than Symbol Size.
Change-Id: I046bc70445dfd4f6fdfd06415f53d1fc06f425b4
2020-04-27 10:47:28 -04:00
Michael LIAO a0acf7bdaa [vdi] Revise the symbol management.
- As different modules may have symbols with the same name, each symbol
  needs identifying with a pair of the module handle and the symbol
  name.

Change-Id: I85650a787d9a424545154cc40ebd59e706fa358f
2020-04-24 22:28:31 -04:00
German Andryeyev f7f7337bae SWDEV-232918
hipEventRecord is much slower in hipclang/vdi

- Make sure default streams don't sync each other.
- Add null stream into the list of default streams.
- Code clean-up to simplify queue look-up.

Change-Id: I36e1fc8d86a600e3dce806694d95d146ed8afd03
2020-04-24 14:40:43 -04:00
kjayapra-amd b443172d12 SWDEV - 231874 - Do Dword aligned memset if the total size aligns.
Change-Id: Id05db4cfd9c43b2cffa3cec8b02f1cd07f340dd6
2020-04-23 21:08:50 -04:00
German Andryeyev db70fc66b7 SWDEV-231579 - [hipclang-vdi-rocm][perf]
- HIPPerfDispatchSpeed disparity between HIP/HCC vs HIP/VDI
Insert a wait marker command in the default stream only when
HIP has pending operations on other async streams

Change-Id: I68660a54867fab7571ba57eb1df5feb1bca1c61a
2020-04-21 10:05:26 -04:00
kjayapra-amd 2eb8cc8e90 SWDEV-229480 - Improve error messages in HIP Layer.
Change-Id: I054b979d3aa6cf6ed4ca14a9393bdcba757772ff
2020-04-20 21:23:52 -04:00
Vlad Sytchenko f95cdb09c2 Correctly calculate size of the copy region
Since we adjust we adjust the start of the region, amd::BufferRect::end_ is no longer the size, just the offset as to where the region ends.

The actual size of the region is (amd::BufferRect::end_ - amd::BufferRect::start_).

Change-Id: I8425d8bdfb20f485740863813e762e8923d9ee94
2020-04-14 12:25:17 -04:00
Payam 8c26e0b3b6 updated LOG_LEVEL prints to print pid and tid
Change-Id: I8a9212b26bb7e312408a222823efcfd00344094b
2020-04-06 16:58:25 -04:00
Vladislav Sytchenko 1bd640b659 (SWDEV-228488)
These fixes address regressions caused by http://gerrit-git.amd.com/c/compute/ec/hip/+/337601

Currently we're converting a 1D offset into a 3D offset, which doesn't make much sense once you consider the fact that this offset is relative to a different origin than our current 3D offset.

I traced through our blit kernels in VDI - the copy buffer rect path is able to handle immediate offsets in the 3D buffer via the amd::BufferRect::start_ parameter.

Instead of adjusting the offset, simply adjust the start of the region.

Change-Id: Ic8797a2c8ac0ad106f246f61ff06ca1ca03d3058
2020-04-06 14:17:11 -04:00
Christophe Paquot fa5a9b3810 Default HostMalloc to uncached memory
Change-Id: I72e19c7f7820a77fd5afc09f09cfea9acd0b8e84
2020-04-03 19:19:33 -04:00
Michael LIAO 9e619430f4 [vdi] Add hipFreeHost
Change-Id: I8a5b7ff3f0ab4f5674efd6723c18808ad6ef33f5
2020-04-03 16:34:28 -04:00
Vladislav Sytchenko 5f14ae1161 Take into an account the number of channels...
when querying the element size of an array.

Change-Id: Id57d3374b14d80a59230ec8286704f2fbabb0fae
2020-04-03 15:43:18 -04:00
Vladislav Sytchenko 77223a8eca (SWDEV-229354)
This patch is a workaround to support user pitch for hipMemcpy{2D/3D}.

Historically OpenCL didn't support pitch with clEnqueueFillBuffer(), so neither did we in VDI. Adding it now will be slightly nontrivial, since the fill kernel and runtime in many places will need to be modified.

As a temporary workaround for cases when pitch > width, we can just enqueue a fill for each row separately. This implementation is slow, but it satisfies the correctness criteria.

Change-Id: Idfeca349288b51d6ff84a7cf001fb63c6a66818a
2020-03-31 18:12:56 -04:00
Michael LIAO b72196613a [vdi] Fix hipGetSymbol{Address|Size}
- Use symbol value as the qeury key. Compared to the symbol name, the
  symbol value is more robust as developers may use unqualified or
  qualified identifiers. It also removes the mangling and/or demangling
  requirement for the runtime API.

Change-Id: I9d4259f3842612c7cc98551269fc2092d8b5c19e
2020-03-31 00:26:53 -04:00
Christophe Paquot 94a7ef6ed1 Do not retry to allocate when OOM. Shouldn't be needed since we idle on Free.
SWDEV-229214

Change-Id: I183006f409388e3c7981f2569649d01d6378be46
2020-03-30 12:49:48 -07:00
Vladislav Sytchenko 6ed73f50f7 (SWDEV-228794)
Adjust the origin of the copy if the user passes a pointer that wasn't allocated by the runtime.

Change-Id: I0aeb20195ed730857a461a53f537626ec2573fd1
2020-03-27 16:33:16 -04:00
Vladislav Sytchenko fd7a8f0367 (SWDEV-228794)
Add hipMallocHost()

Change-Id: Ia3c7c5ca94b39fe30f3a51d1b60782d3472259ff
2020-03-27 15:57:48 -04:00
Vladislav Sytchenko a91b82f00e (SWDEV-228782)
The only requirment from hipMallocPitch() is that the returned pitch is aligned to the HW image pitch alignment. There is no restriction on the size of the allocation, since the memory might not be used for images.

Change-Id: I97438e5fe4012ca4721b14b85f514dbac803c17c
2020-03-27 15:52:17 -04:00
Vladislav Sytchenko 2a2b9e47a0 Add support for formating hipExtent objects
Change-Id: Iea54a510e81a856c0c450305b3e5a7179ee48295
2020-03-26 14:45:20 -04:00
Vladislav Sytchenko e0187ba405 Add initial entry points for mipmapped array API
Change-Id: Icd59cc7323ddcb6773da6105260415a1e6f4cdcb
2020-03-26 14:45:20 -04:00
Vladislav Sytchenko 8ddeeb4551 Enable initial sRGB support
Instead of using the sampler field force_degamma to perform sRGB->linear conversion during pixel sampling, we use an appropriate image format instead. The overhead of this is having to create an image view when creating a texture object from an array.

Change-Id: I1ca368c312c1fd4b6f784a3a1b35b5eeb28070ff
2020-03-26 14:45:20 -04:00
Vladislav Sytchenko faf3b83594 Handle offsets for dptr <-> image copies
Change-Id: I7a4a56ee07a26a741d2aac35502446d248f720ad
2020-03-26 14:45:20 -04:00
Saleel Kudchadker 8b39e0b74e Sync streams when freeing or destroying mem
Change-Id: I6932f225a8b932bb2adbd5e37880f7e604496809
2020-03-20 10:53:23 -07:00
Vladislav Sytchenko 4829a7c215 Add support for creating typed buffers
What Cuda refers to "linear texture memory" is the OpenCL equivalent of CL_MEM_OBJECT_IMAGE1D_BUFFER. For these types of allocations we should create a typed buffer instead of an image.

Currently there is no check in the texture fetch functions as to what kind of SRD is written into the texture object, so any kind of incorrect programming will cause the TA to hang. Fortunately for us, every one writes correct code :)

Change-Id: I80dab85a992f2c0754ebf303d40ac6b5e045c7c1
2020-03-18 18:15:17 -04:00
Vladislav Sytchenko ecd7c99b49 Add hipDrvMemcpy3D.
This is the equivalent of cuMemcpy3D.

Change-Id: Ib2e06dbd6f5093c931cdfd36c87617f32acffc2d
2020-03-09 16:11:25 -04:00
Vladislav Sytchenko 2e9d177304 Add hipMemcpy3DParams -> HIP_MEMCPY3D conversion
Change-Id: I66dd1ae722b8c0d7ddb2562c958b14854e6b86bc
2020-03-06 14:33:52 -05:00
Vladislav Sytchenko 3fc50731ae Correct logic in ihipMemcpyAtoA()
HIP assumes that image width is in bytes, but VDI assumes that image width in pixels. Need to perform byte -> pixel coversion before doing anything.

Change-Id: Ia9fd1f46d05db3fbe8049add10b4d7e5118a2b9a
2020-03-05 15:03:18 -05:00
Vladislav Sytchenko f28e2ebb42 Fix typos in ihipMemcpy3D()
Change-Id: I8720f113642f00bb013cf46284e9b13cc932bf4a
2020-03-05 14:11:55 -05:00
agodavar 65ffaf82d8 Enable tests that are passing on hip-vdi
Change-Id: I7de965f7e8bb7e4d0fa61cd584f0cd118c1e212e
2020-03-05 16:08:51 +05:30
Vladislav Sytchenko 87178f9456 Fix hipMemcpy3d (partially)
Incoming changes from upstream split the struct hipMemcpy3DParms into two separate ones - hipMemcpy3DParms and HIP_MEMCPY3D, which are cudaMemcpy3DParms and CUDA_MEMCPY3D equivalents respectively.

Note that HIP_MEMCPY3D is missing half the members of CUDA_MEMCPY3D (this should be fixed in PR#1887). Work around this by using a substitute _HIP_MEMCPY3D struct for now.

Change-Id: Ic15134e6deb260189b662b3804d2309a9b8473e9
2020-03-01 13:52:05 -05:00
Christophe Paquot fa6ed89cff Blocking and default streams' sync:
Add hip::syncStreams(dev) to sync blocking streams on a given device.
hip::syncStreams(void) should only sync streams on the current device.

Change-Id: Ib6b0735215fa0ed12c646ebd029e9763ee3712ce
2020-02-26 08:54:00 -08:00
Saleel Kudchadker 631fd65be4 Use the context variant of getNullStream
Do not create a new queue to call finish in hipFree if none was
created earlier elsewhere.
Change-Id: I87bb191e6b186ddbe607ab29d11e3ae5bc2ac8e6
2020-02-25 00:13:43 -08:00
Christophe Paquot 2203093159 Fixed a few multithreaded potential issues
Also make D2H and H2D keep track of the chain of events
when we need to use a different HostQueue.

Change-Id: I1c5da6ea6104b37ad7aac00f0eb8ea9371e6ba1c
2020-02-24 20:14:10 -08:00
Vladislav Sytchenko c22eb7808d HIP-VDI texture rework
The current texture implementation is based off the one for HIP-HCC. There's a lot of problems with it - only creating images from buffers, hard coding logic and ignoring user parameters. This leads to a whole lot of UB even with simple examples (as seen with RedShift's code).

This CL is aimed to bring the HIP-VDI texture implementation closer to what is described by Cuda.

hipMemcpyAtoA() - image to image copy.
hipMemcpyHtoA()/hipMemcpyDtoA() - buffer to image copy.
hipMemcpyAtoH()/hipMemcpyAtoD() - image to buffer copy.

hipArrayCreate()/hipArray3DCreate()/hipMallocArray()/hipMalloc3DArray() - creates 1D/2D/3D/1D Array/2D Array images.
hipCreateTextureObject() - creates sampler, (optional) creates 1D/2D image from buffer, (optional) creates image views.
hipBindTexture() - creates 1D image from buffer (should create a typed buffer, however this is not compatible with HIP-HCC).
hipBindTexture2D() - creates 2D image form buffer.
hipBindTextureToArray() - creates image view.
hipTexRefSetAddress() - creates 1D image from buffer (should create a typed buffer, however this is not compatible with HIP-HCC).
hipTexRefSetAddress2D() - creates 2D image from buffer.
hipTexRefSetArray() - creates image view.

There are still a lot of  TODOs in the code, here's a few important ones:
1. VDI doesn't support a lot of sampler flags.
2. VDI doesn't support device to image 2D/3D copy.
3. Mipmaps implementation is incomplete.
4. Image view implementation is incomplete.

Change-Id: Ia374ee27aa14f76451fee7667495036f4419a487
2020-02-24 15:23:45 -05:00
Christophe Paquot a9a6cae956 SWDEV-223262
hipMemcpyWithStream is supposed to be synchronous.

Change-Id: Ie44e37ecc9246e26a6b315c01e88a279f9e42fd7
2020-02-19 14:08:12 -08:00
Christophe Paquot 9630e40beb Introducing hip::Device which wraps around amd::Context and deviceId
Change-Id: Ie35a6edb65c001b35eb9f5d2af26e765dc41c00e
2020-02-18 17:18:56 -05:00
Christophe Paquot 6462732e49 SWDEV-220533 - HostMapped should use fine grained.
Change-Id: I4ad2064e8e5ea1cd4ed7df143c778ccb685c4f22
2020-02-10 16:53:06 -05:00
Laurent Morichetti 2764aee583 Update copyright info for VDI files
Change-Id: Ib160fbf89ec89a5895321f73402a33b4d344a68f
2020-02-04 08:47:10 -08:00
Laurent Morichetti e604beeef4 Merge HIP/VDI branch 'amd-staging' into lmoriche/amd-master-next
Change-Id: Iabaab4e72815ba483a1330ec6a1130f2b86676f0
2020-01-29 15:02:13 -08:00