28 Commits

Author SHA1 Message Date
Ben Sander 8b64c0dc62 Improve memory copy and commands switching
- Add chicken bits to use host-side dependency management.
- Add optional PinInPlace path for unpinned copies
- Synchronize before pinned memcpy path.
- Add mutex to protect two threads launching to same stream.
2016-02-25 19:19:49 -06:00
Ben Sander 3886d494f4 Sync review.
- add calls to ihipInit missing from some routines.
- sync before draining a stream.
2016-02-23 04:07:11 -06:00
Ben Sander 549b18ce77 Improve async copy implementation.
- Add device-side signal waits when transitioning between command classes
(Kernel, H2D copy, D2H copy).
- Support waiting in staged memory copies as well.
- Add several chicken bits to control implementation:
    - HIP_DISABLE_ENQ_BARRIER
    - HIP_DISABLE_BIDIR_MEMCPY
    - HIP_ONESHOT_COPY_DEP
- Refactor signal pool to support efficient deallocation based on
signal sequence number.
- Deallocate copy signals on eventSynchronize.
- Improve copy tests, add pingpong.
2016-02-22 23:15:24 -06:00
Ben Sander d33d806a5b Track last command to a stream.
Passing simple tests.
2016-02-20 11:02:07 -06:00
Ben Sander c6f8883b0d Enable Tracker and ROCR by default, verify with HCC 2016-02-17 23:03:37 -06:00
Ben Sander d653782d9d Remove HIP-local AM tracker (now in HCC) 2016-02-17 21:33:32 -06:00
Ben Sander 44f40e171a USE_AM_TRACKER=0 works 2016-02-17 21:23:36 -06:00
Ben Sander 59379ffb44 more work on async copies 2016-02-17 00:59:12 -06:00
Ben Sander caef9b5ced Add per-stream pool for hsa_signals. 2016-02-16 01:59:13 -06:00
Ben Sander 38c735fd1d Update before checkin to HCC.
Add support for USE_AM_TRACKER=2 (HCC version).
Add AM_ALLOC, AM_FREE indirection to ease swapping AM implementations.
2016-02-15 21:16:00 -06:00
Ben Sander db3a63360b Move warpSize to header, have shuffles use default warpsize. 2016-02-15 05:41:09 -06:00
Ben Sander 6420655dc8 Add multi-threading synchronization on staging buffers and signals.
Also pre-allocate a couple signals for copies.
2016-02-13 03:18:01 -06:00
Ben Sander b314777bc1 D2H multi-buffer 2016-02-13 01:15:23 -06:00
Ben Sander 1bfd3cdbd0 Improve copy testing 2016-02-12 18:24:08 -06:00
Ben Sander 134d7975ce Improve copy testing implementation.
- add tests for (unpinned/pinned) x H2H x D2D.
- Free memory at end of test.
2016-02-12 18:24:08 -06:00
Ben Sander 24c1fdb864 Step1 in staging buffer copy.
- use StagingBuffer class for copies.
- refactor g_device to use array rather than vector.
   (keeps pointers from moving).
2016-02-12 18:24:08 -06:00
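The note about keeping pointers from moving can be illustrated with a minimal sketch. It assumes a fixed upper bound on the number of devices; `Device`, `DeviceTable`, and `kMaxDevices` are hypothetical names for illustration and are not taken from the HIP sources. A `std::vector` may reallocate on growth and invalidate pointers to its elements, whereas a fixed-size array never relocates them.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a per-device record (not the real HIP type).
struct Device {
    int id = -1;
};

// Assumed cap on the number of devices; chosen only for this sketch.
constexpr std::size_t kMaxDevices = 8;

// Fixed-capacity table: elements live inside the object and never move,
// so a Device* handed out by add() stays valid for the table's lifetime.
class DeviceTable {
public:
    // Fills the next free slot; returns nullptr when the table is full.
    Device* add(int id) {
        if (count_ == kMaxDevices) return nullptr;
        devices_[count_].id = id;
        return &devices_[count_++];
    }

    // Bounds-checked access to an already-added device.
    Device& at(std::size_t i) {
        assert(i < count_);
        return devices_[i];
    }

    std::size_t size() const { return count_; }

private:
    std::array<Device, kMaxDevices> devices_{};
    std::size_t count_ = 0;
};
```

A pointer returned by the first `add` call still refers to the same record after later additions, which would not be guaranteed with a growing `std::vector`.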
Ben Sander d7396b5af3 Query tracked memory sizes.
Support more accurate hipMemGetInfo.  Add test to hipPointerAttrib.
2016-02-12 18:24:08 -06:00
Ben Sander 0370cd1cfc Remove ! USE_PINNED_HOST support 2016-02-12 18:24:08 -06:00
Ben Sander 00fd172c64 Use memtracker 'appID' to store deviceID associated with ptr 2016-02-12 18:24:08 -06:00
Ben Sander de45e2291e Tracker improvements
- add API to add / remove user-pointers from the tracker.
- test for thread-safety with MultiThreadtest_2 - rapid
  insertions/removal.
- add mutex to provide thread-safety.
- rename tracker interface to "memtracker_..." for consistency.
- add am_memtracker_reset, connect to hipDeviceReset.
2016-02-12 18:24:08 -06:00
Ben Sander 4ee2a5229b Create address tracker for am_alloc.
Tracks the device where memory is allocated, whether it is pinned-host
or device memory, and more.

Uses memory-range-based lookups, so pointers that exist anywhere in
the range of hostPtr + size will find the associated AmPointerInfo.

The insertions and lookups use a self-balancing binary tree and
should support O(log N) lookup speed.
2016-02-12 18:24:08 -06:00
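The range-based lookup described in this commit can be sketched as follows. This is a minimal illustration, not the HCC implementation: `PointerInfo` and `RangeTracker` are hypothetical names, the fields are assumptions, and `std::map` (typically a red-black tree) stands in for the self-balancing tree. It assumes tracked ranges do not overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical record; the real AmPointerInfo has its own fields.
struct PointerInfo {
    const void* base;   // start of the allocation
    std::size_t size;   // length in bytes
    int deviceId;       // device associated with the allocation
    bool isPinnedHost;  // true for pinned-host memory, false for device memory
};

class RangeTracker {
public:
    // Record an allocation, keyed by its base address. O(log N).
    void insert(const PointerInfo& info) {
        assert(info.size > 0);
        ranges_.insert_or_assign(
            reinterpret_cast<std::uintptr_t>(info.base), info);
    }

    // Remove the allocation that starts at base; true if one was removed.
    bool erase(const void* base) {
        return ranges_.erase(reinterpret_cast<std::uintptr_t>(base)) > 0;
    }

    // Find the allocation whose range [base, base + size) contains ptr.
    // O(log N): locate the last range starting at or before ptr, then
    // check that ptr falls inside it.
    std::optional<PointerInfo> find(const void* ptr) const {
        const auto addr = reinterpret_cast<std::uintptr_t>(ptr);
        auto it = ranges_.upper_bound(addr);  // first range starting after addr
        if (it == ranges_.begin()) return std::nullopt;
        --it;
        if (addr - it->first < it->second.size) return it->second;
        return std::nullopt;
    }

private:
    std::map<std::uintptr_t, PointerInfo> ranges_;
};
```

Looking up any interior pointer, such as `base + 10` for a 64-byte allocation, returns the record for the whole allocation, while `base + 64` (one past the end) returns no match.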
Ben Sander e483eea85b Fix bug in device bounds comparison.
Shows up in multi-GPU.
2016-02-12 18:24:08 -06:00
Evgeny Mankov ea8f99702d Fix typo: maxThreadsPerMultiProcessor -> MaxSharedMemoryPerMultiprocessor
Device property MaxSharedMemoryPerMultiprocessor set equal to totalGlobalMem (HIP path).
Reason: MaxSharedMemoryPerMultiprocessor should be the same as the group memory size. Group memory will not be paged out, so the physical memory size = total shared memory size = group region size. NVCC path remains untouched: CUDA's device property MaxSharedMemoryPerMultiprocessor is reported.

hipify is updated as well.
2016-02-12 01:29:20 +03:00
Evgeny Mankov 9f05a52c74 Device property maxThreadsPerMultiProcessor set equal to totalGlobalMem (HIP path).
Reason: maxThreadsPerMultiProcessor should be the same as the group memory size. Group memory will not be paged out, so the physical memory size = total shared memory size = group region size.

NVCC path remains untouched: CUDA's device property maxThreadsPerMultiProcessor is reported.
2016-02-12 00:04:14 +03:00
Evgeny Mankov 33f60c300d BDFID (BusID/DeviceID/FunctionID) support.
FunctionID (DomainID in CUDA) is not supported, because cudaDeviceProp::pciDomainID is not reported by CUDA.
2016-02-11 22:26:01 +03:00
Evgeny Mankov 950c3baacd Device property concurrentKernels is added to hipDeviceProp_t struct.
For the HCC path, concurrentKernels is set to true since all ROCR hardware supports this feature.
For the NVCC path, concurrentKernels is obtained from CUDA's device property cudaDeviceProp::concurrentKernels.
2016-02-09 17:10:35 +03:00
Sam Kolton 0a27507208 Implementation of hipDeviceGetAttribute() 2016-02-04 17:39:27 +03:00
Ben Sander f38e63ff18 Initial commit for GPUOpen Launch 2016-01-26 20:14:33 -06:00