rocm-systems

Author	SHA1	Message	Date
foreman	dc8a3205ce	P4 to Git Change 1097200 by gandryey@gera-dev-w7 on 2014/11/14 13:59:46 ECR #304775 - Optimize oclBandwidthTest from nVidia SDK - Cache pinned memory, since the benchmark sends the same transfer in a single batch. Thus we could avoid pin/unpin - Swap SDMA engine allocation order. Blit manager allocates a queue on device, thus the first app queue was getting the paging second SDMA. Affected files ... ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#112 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.hpp#37 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#339 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.hpp#121 edit	2014-11-14 14:07:55 -05:00
foreman	bfc41a18dd	P4 to Git Change 1083967 by gandryey@gera-dev-w7 on 2014/10/03 11:20:24 ECR #304775 - Fix for BUG#10330. - Add an optimized version for unaligned buffer copy Affected files ... ... //depot/stg/opencl/drivers/opencl/runtime/device/blitcl.cpp#7 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#111 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/hsa/hsablit.cpp#9 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/hsa_foundation/hsablit.cpp#5 edit	2014-10-03 12:04:15 -04:00
foreman	b672b6c4da	P4 to Git Change 1077444 by gandryey@gera-dev-w7 on 2014/09/16 14:31:35 ECR #304775 - Add capability to enable large allocations >4GB - Update the blit kernels to consider a buffer size >4GB Affected files ... ... //depot/stg/opencl/drivers/opencl/runtime/device/blitcl.cpp#4 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#110 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpusettings.cpp#280 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/hsa/hsablit.cpp#8 edit ... //depot/stg/opencl/drivers/opencl/runtime/utils/flags.hpp#214 edit	2014-09-16 14:43:17 -04:00
foreman	5efe63df44	P4 to Git Change 1069927 by skudchad@skudchad_test_win_opencl2 on 2014/08/25 14:51:55 ECR #304775 - Optimization for rectangular copies(Part2). Due to HW restriction of 14bits for src and dst pitch, its advantageous to choose optimal bpp. Higher the bpp the larger the byte pitch. This indirectly helps to reduce the number of packets for buffer copy(line by line vs a single sub_win raw packet) ReviewBoardURL = http://ocltc.amd.com/reviews/r/5605/diff/ Affected files ... ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#109 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#191 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.hpp#76 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.cpp#64 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gslbe/src/rt/GSLContext.h#38 edit	2014-08-25 15:09:01 -04:00
foreman	a5e788c9f8	P4 to Git Change 1067573 by skudchad@skudchad_opencl_win_2 on 2014/08/18 16:38:03 ECR #304775 - Refactor code to do line by line copies for read\write Rect. This avoids taking the blit copy path which may be even slower. ReviewBoardURL = http://ocltc.amd.com/reviews/r/5567/ Affected files ... ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#108 edit	2014-08-18 16:46:45 -04:00
foreman	1681dd142f	P4 to Git Change 1058007 by rili@rili_opencl_stg_01 on 2014/07/22 17:28:41 EPR #399808 - Fixed wrong conversion of sRGBA when using host copy instead of blit kernel transfer Affected files ... ... //depot/stg/opencl/drivers/opencl/api/opencl/amdocl/cl_memobj.cpp#68 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/blit.cpp#3 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/blit.hpp#2 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#107 edit	2014-07-22 17:42:44 -04:00
foreman	d2b905f18e	P4 to Git Change 1057998 by gandryey@gera-dev-w7 on 2014/07/22 17:15:58 ECR #304775 - Device enqueuing - Use atomic fetch for enqueue flags - Switch to a multithreaded scheduler - Add a workaround for Linux host_multi_queue failures. Linux has only 2 queues, but the test allocates multiple host queues and the same HW ring can be used Affected files ... ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#106 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.cpp#449 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudevice.hpp#127 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuschedcl.cpp#22 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuvirtual.cpp#325 edit	2014-07-22 17:30:56 -04:00
foreman	1b9e65b27b	P4 to Git Change 1057445 by rili@rili_opencl_stg on 2014/07/21 14:11:34 EPR #399808 - Add CL_RGB, CL_UNORM_INT_101010 support Affected files ... ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#105 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpudefs.hpp#111 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuresource.cpp#186 edit ... //depot/stg/opencl/drivers/opencl/runtime/platform/memory.cpp#106 edit	2014-07-21 14:27:24 -04:00
foreman	6314b334ba	P4 to Git Change 1055054 by gandryey@gera-dev-w7 on 2014/07/14 20:18:53 ECR #304775 - Device enqueuing - Switch to the single thread scheduler for now(the current version isn't friendly for single thread). Hopefully it's a temporary solution until synchronization issue with multithreaded scheduler will be identified. Affected files ... ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpublit.cpp#104 edit ... //depot/stg/opencl/drivers/opencl/runtime/device/gpu/gpuschedcl.cpp#20 edit	2014-07-14 20:24:58 -04:00
foreman	3694ab2ce8	initial commit	2014-07-04 16:17:05 -04:00

10 Commits