Commit Graph

10 Commits

Author SHA1 Message Date
foreman dc8a3205ce P4 to Git Change 1097200 by gandryey@gera-dev-w7 on 2014/11/14 13:59:46
ECR #304775 - Optimize oclBandwidthTest from nVidia SDK
	- Cache pinned memory, since the benchmark sends the same transfer in a single batch. This avoids repeated pin/unpin
	- Swap SDMA engine allocation order. Blit manager allocates a queue on device, thus the first app queue was getting the paging second SDMA.

Affected files ...

... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#112 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.hpp#37 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#339 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#121 edit
2014-11-14 14:07:55 -05:00
foreman bfc41a18dd P4 to Git Change 1083967 by gandryey@gera-dev-w7 on 2014/10/03 11:20:24
ECR #304775 - Fix for BUG#10330.
	- Add an optimized version for unaligned buffer copy

Affected files ...

... //depot/stg/opencl/drivers/opencl/runtime/device/blitcl.cpp#7 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#111 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa/hsablit.cpp#9 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsablit.cpp#5 edit
2014-10-03 12:04:15 -04:00
foreman b672b6c4da P4 to Git Change 1077444 by gandryey@gera-dev-w7 on 2014/09/16 14:31:35
ECR #304775 - Add capability to enable large allocations >4GB
	- Update the blit kernels to consider a buffer size >4GB

Affected files ...

... //depot/stg/opencl/drivers/opencl/runtime/device/blitcl.cpp#4 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#110 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusettings.cpp#280 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/hsa/hsablit.cpp#8 edit
... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#214 edit
2014-09-16 14:43:17 -04:00
foreman 5efe63df44 P4 to Git Change 1069927 by skudchad@skudchad_test_win_opencl2 on 2014/08/25 14:51:55
ECR #304775 - Optimization for rectangular copies (Part 2). Due to the HW restriction of 14 bits for src and dst pitch, it is advantageous to choose the optimal bpp. The higher the bpp, the larger the byte pitch. This indirectly helps to reduce the number of packets for a buffer copy (line by line vs. a single sub_win raw packet)

	ReviewBoardURL = http://ocltc.amd.com/reviews/r/5605/diff/

Affected files ...

... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#109 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#191 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.hpp#76 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.cpp#64 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.h#38 edit
2014-08-25 15:09:01 -04:00
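
The bpp selection described in change 1069927 can be sketched roughly as follows. This is only an illustration and not the code from gpublit.cpp: the function name `chooseBpp`, the candidate element sizes, and the exact limit of 2^14 - 1 elements per pitch are all assumptions made for the example. The idea is that the pitch field counts elements, so a larger element size lets the same field describe a larger byte pitch, and a rectangle that would otherwise need one packet per line may fit in a single packet.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: the pitch field is 14 bits wide and counts elements,
// so the largest encodable pitch is 2^14 - 1 elements.
constexpr std::size_t kMaxPitchElems = (std::size_t{1} << 14) - 1;

// Hypothetical helper: returns the largest element size in bytes (bpp) that
// divides every byte quantity of the copy and keeps both pitches within the
// 14-bit field. Returns 0 when no element size fits, in which case the caller
// would have to fall back to line-by-line copies.
std::size_t chooseBpp(std::size_t srcOffset, std::size_t dstOffset,
                      std::size_t srcPitch, std::size_t dstPitch,
                      std::size_t widthBytes) {
    // A row cannot be wider than the distance between two rows.
    assert(widthBytes <= srcPitch && widthBytes <= dstPitch);

    const std::size_t candidates[] = {16, 8, 4, 2, 1};
    for (std::size_t bpp : candidates) {
        // Every offset, pitch, and width must be a whole number of elements.
        const bool aligned = srcOffset % bpp == 0 && dstOffset % bpp == 0 &&
                             srcPitch % bpp == 0 && dstPitch % bpp == 0 &&
                             widthBytes % bpp == 0;
        if (!aligned) {
            continue;
        }
        // Pitch expressed in elements must fit the hardware field.
        if (srcPitch / bpp <= kMaxPitchElems && dstPitch / bpp <= kMaxPitchElems) {
            return bpp;
        }
    }
    return 0;
}
```

Under these assumptions, a copy with a 40000-byte pitch does not fit at 1 byte per element (40000 > 16383), but at 4 bytes per element the pitch is 10000 elements and the whole rectangle can be described at once.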
foreman a5e788c9f8 P4 to Git Change 1067573 by skudchad@skudchad_opencl_win_2 on 2014/08/18 16:38:03
ECR #304775 - Refactor code to do line-by-line copies for read/write Rect. This avoids taking the blit copy path, which may be even slower.

	ReviewBoardURL = http://ocltc.amd.com/reviews/r/5567/

Affected files ...

... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#108 edit
2014-08-18 16:46:45 -04:00
foreman 1681dd142f P4 to Git Change 1058007 by rili@rili_opencl_stg_01 on 2014/07/22 17:28:41
EPR #399808 - Fixed wrong conversion of sRGBA when using host copy instead of blit kernel transfer

Affected files ...

... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_memobj.cpp#68 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/blit.cpp#3 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/blit.hpp#2 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#107 edit
2014-07-22 17:42:44 -04:00
foreman d2b905f18e P4 to Git Change 1057998 by gandryey@gera-dev-w7 on 2014/07/22 17:15:58
ECR #304775 - Device enqueuing
	- Use atomic fetch for enqueue flags
	- Switch to a multithreaded scheduler
	- Add a workaround for Linux host_multi_queue failures. Linux has only 2 queues, but the test allocates multiple host queues and the same HW ring can be used

Affected files ...

... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#106 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#449 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#127 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuschedcl.cpp#22 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#325 edit
2014-07-22 17:30:56 -04:00
foreman 1b9e65b27b P4 to Git Change 1057445 by rili@rili_opencl_stg on 2014/07/21 14:11:34
EPR #399808 - Add CL_RGB, CL_UNORM_INT_101010 support

Affected files ...

... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#105 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudefs.hpp#111 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#186 edit
... //depot/stg/opencl/drivers/opencl/runtime/platform/memory.cpp#106 edit
2014-07-21 14:27:24 -04:00
foreman 6314b334ba P4 to Git Change 1055054 by gandryey@gera-dev-w7 on 2014/07/14 20:18:53
ECR #304775 - Device enqueuing
	- Switch to the single-thread scheduler for now (the current version isn't friendly for single thread). Hopefully it's a temporary solution until the synchronization issue with the multithreaded scheduler is identified.

Affected files ...

... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#104 edit
... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuschedcl.cpp#20 edit
2014-07-14 20:24:58 -04:00
foreman 3694ab2ce8 initial commit 2014-07-04 16:17:05 -04:00