Includes some tricky manipulation of the locks for contexts and streams.
issue is that stealing a stream requires we lock the context to
walk the streams to find a victim. To avoid deadlock, we can't
have a stream locked when we lock the context. This implementation
releases the stream lock, then acquires the context and selects the
victim.
A more stable implemenation might be to copy the stream list
from a context so that a lock is not required to walk all streams.
Smart shared_ptr could be used to prevent the streams from being
deallocated during the walk.
Move HIP_COHERENT_HOST_ALLOC so it is read once at init time.
Add HIP_LAUNCH_BLOCKING_KERNELS, HIP_API_BLOCKING.
Update docs on debug and chicken bits.
Conflicts:
src/hip_hcc.cpp
Prefer use of source-engine for DMA copies, even if user submits copy
in a stream attached to a different device.
The stream is now used only for synchronization, and HIP
makes the most optimal decision for which engine to perform the
copy - typically the source copy engine.
HIP now makes decision on which engine should perform the copy
and passes this to HCC using new apis.
HIP has additional information about peer
visibility and will make a decision which agent should perform
the copy .
Change-Id: I0cf4cfebeae256e6ca795f08a7ed7130f4857d1f
1. Refactored code to use HCC internal APIs rather than HCC copy APIs
2. Added hipMemcpyToSymbolAsync
3. Added test for hipMemcpyToSymbolAsync
4. Added new error hipErrorInvalidSymbol
Change-Id: I0e359b2d0ff5d682bbccdf9c2923e16b35e39497
1. Currently works only for __attribute__((addrspace(1))
2. Need to pass in string for name of the variable
3. Added test to check functionality
Change-Id: I4c3cc1bf151cb5423e4aef59fcc4ad5693b31641
- Minimize time that locks are held.
- Eliminate copy code that locked stream and ctx at same time.
- Stream was locked to ensure thread-safe enqueue to the queue.
- Devices were locked to query peer-lists.
Change-Id: Ibe8880bb7fb995a3da8f90ff911f212d81525018
Remove memory leak with new hc::completion_future.
Implement HIP_LAUNCH_BLOCKING with queue-level wait.
Change-Id: I45975f81c4d239fdeed7776970988d28449865dc
- Bug fix for peer visibility. Now contexts correctly detect when they can use SDMA for P2P vs staging buffers.
- Interface to new HCC copy_ext function.
- Improve context and peer print /debug options.
- Add comments and usage to hipPeerToPeer_simple test.