rocm-systems

Author	SHA1	Message	Date
pensun	bcbc76470d	Merge branch 'privatestaging' of https://github.com/AMDComputeLibraries/HIP-privatestaging into privatestaging	2016-02-27 04:25:28 -06:00
Aditya Avinash Atluri	9c4819bc29	Merge pull request #4 from AMDComputeLibraries/memtracker hipGetPointerAttrib behavioral changes	2016-02-27 10:51:23 -06:00
Ben Sander	3eb281aeff	disable rocrv2, properly	2016-02-27 03:31:30 -06:00
Aditya Avinash Atluri	2ca6162593	Corrected hipPointerGetAttribute Made hipPointerGetAttribute work same as cudaPointerGetAttribute for HCC	2016-02-26 18:50:40 -06:00
pensun	e21841c152	relsove conflicts	2016-02-26 09:57:40 -06:00
pensun	980ec93f46	fix compiling error	2016-02-26 09:50:00 -06:00
Ben Sander	8105bd636f	fixes for titan platform	2016-02-26 05:25:30 -06:00
Ben Sander	822c7292c9	Disable ROCR_V2	2016-02-26 23:34:45 -06:00
Ben Sander	7a1b4c3878	Merge branch 'memtracker' into privatestaging Conflicts: include/nvcc_detail/hip_runtime_api.h	2016-02-26 06:17:05 -06:00
Ben Sander	4a6173fe58	Merge branch 'privatestaging' of https://github.com/AMDComputeLibraries/HIP-privatestaging into privatestaging	2016-02-26 06:15:09 -06:00
Ben Sander	8d985188dd	Merge branch 'memtracker' of https://github.com/AMDComputeLibraries/HIP-privatestaging into memtracker Conflicts: tests/src/hipMemcpy.cpp	2016-02-25 23:22:51 -06:00
Ben Sander	af97f5e317	Merge branch 'memtracker' into privatestaging Conflicts: src/hip_hcc.cpp	2016-02-25 19:38:46 -06:00
Ben Sander	91ed5c7d78	Improve memory copy and commands switching - Add chicken bits to use host-side dependency management. - Add optional PinInPlace path for unpinned copies - Synchronize before pinned memcpy path. - Add mutex to protect two threads launching to same stream.	2016-02-25 19:19:49 -06:00
Evgeny Mankov	7bb0f17656	Attribute hipDeviceAttributeIsMultiGpuBoard for obtaining Device property isMultiGpuBoard is added. On HIP path property obtaining done through hsa_iterate_agents and counting the devices of HSA_DEVICE_TYPE_GPU type. P.S. On multi-boards systems it might be problems with detection what board a GPU plugged into (not tested).	2016-02-25 23:44:39 +03:00
Ben Sander	836c485d0b	Add tests for multi-threaded streams	2016-02-23 12:08:22 -06:00
Ben Sander	8f98aca124	Sync review. - add calls to ihipInit missing from some routines. - sync before draining a stream.	2016-02-23 04:07:11 -06:00
Ben Sander	28990567fb	Improve async copy implementation. - Add device-side signal waits when transitioning between command classes (Kernel, H2D copy, D2H copy). - Support waiting in staged memory copies as well. - Add several chicken bits to control implementation: - HIP_DISABLE_ENQ_BARRIER - HIP_DISABLE_BIDIR_MEMCPY - HIP_ONESHOT_COPY_DEP - Refactor signal pool to support efficient deallocation based on signsequnm. - Deallocate copy signals on eventSynchronize. - Improve copy tests, add pingpong.	2016-02-22 23:15:24 -06:00
Ben Sander	16b04fc0d3	Merge branch 'memtracker' of https://github.com/AMDComputeLibraries/HIP-privatestaging into memtracker	2016-02-22 08:33:47 -06:00
gargrahul	14508fd0d6	Update for shared atomics support	2016-02-22 16:21:52 +05:30
Ben Sander	d5c777268a	Track last command to a stream. Passing simple tests.	2016-02-20 11:02:07 -06:00
Evgeny Mankov	d4b15399f5	Guard #ifdef USE_ROCR_20 is added for ROCR_20 device properties (memoryClockRate, memoryBusWidth) By default isn't defined. To add ROCR_20 support HIP have to be compiled as follows: make CXX_DEFINES+=-DUSE_ROCR_20	2016-02-19 13:27:03 +03:00
Evgeny Mankov	14ec340746	Formatting, no functional changes.	2016-02-18 18:54:19 +03:00
Evgeny Mankov	da8169dd89	Device property memoryBusWidth implementation. + Device property memoryBusWidth is added to hipDeviceProp_t struct. + Device attribute hipDeviceAttributeMemoryBusWidth is added to hipDeviceAttribute_t struct. + Tests update.	2016-02-18 18:15:01 +03:00
Evgeny Mankov	8aace64dce	Device property memoryClockRate implementation. + Device property memoryClockRate is added to hipDeviceProp_t struct. + Device attribute hipDeviceAttributeMemoryClockRate is added to hipDeviceAttribute_t struct. + Tests update. + Rename hipDevAttrConcurrentKernels to hipDeviceAttributeConcurrentKernels.	2016-02-18 17:25:28 +03:00
Evgeny Mankov	d4bd94e9a0	Attribute hipDevAttrConcurrentKernels for obtaining Device property concurrentKernels is added.	2016-02-18 14:34:18 +03:00
Ben Sander	400dcb8bcb	Enable Tracker and ROCR by default, verify with HCC	2016-02-17 23:03:37 -06:00
Ben Sander	b08e468c06	Remove HIP-local AM tracker (now in HCC)	2016-02-17 21:33:32 -06:00
Ben Sander	354c9f945a	USE_AM_TRACKER=0 works	2016-02-17 21:23:36 -06:00
pensun	c1da0f1e12	1. Bug fix 2. passed initial tests on different sets of HIP_VISIBLE_DEVICES: (0),(1),(0,1),(1,2),(2,3),(1,2,3),(2,3,4),(1,5,2,3) and achieved expected choice of GPU devices at the runtime. 3. Passed HIP test suite.	2016-02-17 09:32:50 -06:00
pensun	43785243a5	Implementation of HIP_VISIBLE_DEVICES in runtime	2016-02-17 06:59:18 -06:00
Ben Sander	0cdbe1ff05	more work on async copies	2016-02-17 00:59:12 -06:00
pensun	7309e9ea6a	modify to add remove invalid devices numbers	2016-02-16 10:00:05 -06:00
pensun	45d863851d	Implement to read HIP_VISIBLE_DEVICES to internal global variable	2016-02-16 07:39:04 -06:00
Ben Sander	5d721a2649	Add per-stream pool for hsa_signals.	2016-02-16 01:59:13 -06:00
Ben Sander	1ed431c0f6	Update before checkin to HCC. Add support for USE_AM_TRACKER=2 (HCC version). Add AM_ALLOC, AM_FREE indirection to ease swapping AM implementations.	2016-02-15 21:16:00 -06:00
Ben Sander	bd7e3b83b9	Move warpSize to header, have shuffles use default warpsize.	2016-02-15 05:41:09 -06:00
Ben Sander	8939b4f0e5	Add multi-threading synchonization on staging buffers and signals. Also pre-allocate a couple signals for copies.	2016-02-13 03:18:01 -06:00
Ben Sander	a002833a89	D2H multi-buffer	2016-02-13 01:15:23 -06:00
Ben Sander	2353cbb028	Improve copy testing	2016-02-12 18:24:08 -06:00
Ben Sander	1128610801	Improve copy testing implementation. - add tests for (unpinned/pinned) x H2H x D2D. - Free memory at end of test.	2016-02-12 18:24:08 -06:00
Ben Sander	90af462b85	Step1 in staging buffer copy. - use StagingBuffer class for copies. - refactor g_device to use array rather than vector. (keeps pointers from moving).	2016-02-12 18:24:08 -06:00
Ben Sander	f464cedcf4	Query tracked memory sizes. Support more accurate hipMemGetInfo. Add test to hipPointerAttrib.	2016-02-12 18:24:08 -06:00
Ben Sander	f2c1bf3bc0	Remove ! USE_PINNED_HOST support	2016-02-12 18:24:08 -06:00
Ben Sander	c04b5d3afb	Use memtracker 'appID' to store deviceID associated with ptr	2016-02-12 18:24:08 -06:00
Ben Sander	7216727fba	Tracker improvements - add API to add / remove user-pointers from the tracker. - test for thread-safety with MultiThreadtest_2 - rapid insertions/removal. - add mutex to provide thread-safety. - rename tracker interface to "memtracker_..." for consistency. - add am_memtracker_reset, connect to hipDeviceReset. -	2016-02-12 18:24:08 -06:00
Ben Sander	721508cc2f	Create address tracker for am_alloc. Tracks device where memory is allocated, pinned-host or device, and more. Uses memory-range-based lookups - so pointers that exist anywhere in the range of hostPtr + size will find the associated AmPointerInfo. The insertions and lookups use a self-balancing binary tree and should support O(logN) lookup speed.	2016-02-12 18:24:08 -06:00
Ben Sander	f1bc9af294	Fix bug in device bounds comparison. Shows up in multi-GPU.	2016-02-12 18:24:08 -06:00
Evgeny Mankov	460b501cbb	Fix typo: maxThreadsPerMultiProcessor -> MaxSharedMemoryPerMultiprocessor Device property MaxSharedMemoryPerMultiprocessor set equal to totalGlobalMem (HIP path). Reason: MaxSharedMemoryPerMultiprocessor should be as the same as group memory size. Group memory will not be paged out, so, the physical memory size = total shared memory size = group region size. NVCC path remains untouched: CUDA's device property MaxSharedMemoryPerMultiprocessor is reported. hipify is updated as well.	2016-02-12 01:29:20 +03:00
Evgeny Mankov	1025341300	Device property maxThreadsPerMultiProcessor set equal to totalGlobalMem (HIP path). Reason: maxThreadsPerMultiProcessor should be as the same as group memory size. Group memory will not be paged out, so, the physical memory size = total shared memory size = group region size. NVCC path remains untouched: CUDA's device property maxThreadsPerMultiProcessor is reported.	2016-02-12 00:04:14 +03:00
Evgeny Mankov	658e9f0484	BDFID (BusID/DeviceID/FunctionID) support. Except FunctionID (or DomainID in CUDA) support, because cudaDeviceProp::pciDomainID is not reported by CUDA.	2016-02-11 22:26:01 +03:00

53 commits