1. The number of kernels that can use signals is increased to 128
2. The kernel count is now specific to the stream
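
A minimal sketch of what a per-stream kernel count with a 128-entry
signal pool could look like. This is a host-only model, not the actual
HIP implementation: SignalSketch, StreamSignalPoolSketch and the
round-robin reuse of slots are assumptions made for illustration; only
the limit of 128 and the per-stream counter come from the message above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a runtime signal handle.
struct SignalSketch {
    std::uint64_t value = 0;
};

// Limit taken from the commit message.
constexpr std::size_t kSignalsPerStream = 128;

// Hypothetical per-stream state: every stream owns its own pool and its
// own kernel counter, so launches on one stream do not consume slots
// that belong to another stream.
class StreamSignalPoolSketch {
public:
    // Returns the slot for the next kernel launched on this stream.
    // Assumption: slots are reused round-robin once all 128 were handed out.
    SignalSketch& acquire() {
        SignalSketch& slot = signals_[kernelCount_ % kSignalsPerStream];
        ++kernelCount_;
        return slot;
    }

    std::size_t kernelCount() const { return kernelCount_; }

private:
    std::array<SignalSketch, kSignalsPerStream> signals_{};
    std::size_t kernelCount_ = 0;
};
```

With two pool objects, acquiring from one leaves the counter of the
other untouched, which is the "specific to the stream" property.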
Change-Id: Ie6d1aa3f437aad8f08c3333fe48bd3f46e551e60
1. The patch uses HIP signal pools to sync between copy and kernel commands
2. The hsa_signal_create call is removed
3. The redundant enqueueBarrier method is left in place just in case
Change-Id: I3dff3e8ee57fff3cd49bec802ff735ed128e5ca1
ihipStream_t::copySync uses the GPU agent in the memory async copy API
even if the src/dst memory does not belong to the GPU, which causes the
HSA runtime to choose a slower copy engine.
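
One way to read the problem above is that the agent handed to the copy
API should follow where each buffer actually lives. The sketch below
models only that decision. AgentSketch, PointerInfoSketch and
pickCopyAgents are hypothetical names, and the real code would pass HSA
agent handles to the runtime's async copy call rather than an enum.

```cpp
// Hypothetical model of the two kinds of agent a copy can be issued with.
enum class AgentSketch { Cpu, Gpu };

// Hypothetical result of looking up a pointer in the memory tracker.
struct PointerInfoSketch {
    bool isDeviceMemory;
};

struct CopyAgentsSketch {
    AgentSketch dst;
    AgentSketch src;
};

// Picks the agent for each side of the copy from the memory's owner,
// instead of always reporting the GPU agent for both sides.
inline CopyAgentsSketch pickCopyAgents(const PointerInfoSketch& dst,
                                       const PointerInfoSketch& src) {
    CopyAgentsSketch agents;
    agents.dst = dst.isDeviceMemory ? AgentSketch::Gpu : AgentSketch::Cpu;
    agents.src = src.isDeviceMemory ? AgentSketch::Gpu : AgentSketch::Cpu;
    return agents;
}
```

For a host-to-device copy this yields a GPU destination agent and a CPU
source agent, so the runtime is told the truth about both buffers.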
SWDEV-95191
Change-Id: If3cab3d493c0c96ed63721cdcf28247a1193887c
- Use fields from GRID_LAUNCH_20 structure
(see USE_GRID_LAUNCH_20 define, currently set to 0).
Setting it to 1 will require HCC support.
- Remove old DISABLE_GRID_LAUNCH support.
Change-Id: I584ce648d217251789a6283cf27feb24cb7dc8d1
- Complete translation tables for cudaError <-> hipError_t.
- Remove some odd errors that were not correctly translated or not used.
- Add HIPCHECK_API to test infrastructure. Used for negative testing
an API; if a mismatch occurs it shows the expected return error
code. Can also print a warning rather than an error.
- Enable hipMemoryAllocate on NV system, and review error codes.
- Add hipErrorName to nvcc.
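
The negative-testing idea behind HIPCHECK_API can be sketched without
the HIP headers. Everything named below (MockError, mockErrorName,
mockMalloc, CHECK_API_SKETCH, g_mismatches) is a hypothetical stand-in
and not the macro that ships with the tests; the sketch only shows the
compare-against-expected-error pattern and the message that reports the
expected code.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical stand-in for hipError_t; the real enum has many more values.
enum MockError { mockSuccess = 0, mockErrorInvalidValue = 1 };

inline const char* mockErrorName(MockError e) {
    return e == mockSuccess ? "mockSuccess" : "mockErrorInvalidValue";
}

// Hypothetical API under test: rejects a null output pointer.
inline MockError mockMalloc(void** ptr, std::size_t bytes) {
    (void)bytes;
    if (ptr == nullptr) return mockErrorInvalidValue;
    *ptr = nullptr;  // the sketch does not allocate anything
    return mockSuccess;
}

// Number of calls whose return code differed from the expected one.
static int g_mismatches = 0;

// Runs the call once and compares its return code with the expected
// error. On mismatch it prints both names and the call site.
#define CHECK_API_SKETCH(call, expected)                                  \
    do {                                                                  \
        MockError actual_ = (call);                                       \
        if (actual_ != (expected)) {                                      \
            ++g_mismatches;                                               \
            std::fprintf(stderr, "%s returned %s, expected %s (%s:%d)\n", \
                         #call, mockErrorName(actual_),                   \
                         mockErrorName(expected), __FILE__, __LINE__);    \
        }                                                                 \
    } while (0)
```

A negative test then reads as
CHECK_API_SKETCH(mockMalloc(nullptr, 16), mockErrorInvalidValue);
and passes silently because the failure is the expected outcome.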
Change-Id: I680427dcf32a5796d5913cf9e7f3b4c6f6b91599
Conflicts:
tests/src/CMakeLists.txt
Bug fixes and improved docs for hipFree and hipHostFree.
- Passing a NULL pointer initializes the runtime and returns hipSuccess
(not an error like before).
- add negative test for this. (hipMemoryAllocate, improved)
- Match NVCC errors for invalid pointers, add to test.
- Update hipFree and hipHostFree docs.
- hipGetDevicePointer always sets *devicePointer=NULL, even for
invalid flags.
- Gate shared memory usage on specific HCC work-week.
Change-Id: I533b4fd3280a3d6cdbf05eb768976f0c7506c012
The completion future of a particular kernel is lost if there are
multiple kernels in the stream. This can cause a race condition where
the signal associated with the unreferenced completion_future might get
released by the hcc runtime.
- deviceReset would lose track of the default stream, so subsequent
synchronization calls might not actually synchronize.
- Also, deviceReset now correctly frees streams.
- fix waits in P2P staging copy - first phase (Device0-to-Staging) must
wait for second phase (Staging to Device1) to finish draining the
buffer.
- add P2P staging buffer copy.
- If copy device does not have sufficient access permissions, fall back
to staging buffer.
- improve docs for which copy device is used.
- set USE_PEER_TO_PEER=3 (requires HCC "am_memtracker_update_peers")
- when enabling peer, turn it on for previously allocated memory.
- hipDeviceCanAccessPeer is no longer self-aware (self does not qualify
as a peer)
- device peerlist always includes self, so when we call allow_access
we never remove self access.
- hipDeviceReset() removes old peer mappings.
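
The wait described for the P2P staging copy (the Device0-to-Staging
phase must not refill the buffer until the Staging-to-Device1 phase has
drained it) can be modelled on the host with two threads and one
staging buffer. This is a sketch under stated assumptions: stagedCopy
is a hypothetical name, the "devices" are plain vectors, and the real
code would wait on copy completion signals, not on a condition variable.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <thread>
#include <vector>

// Copies src to dst in chunks through one staging buffer of stagingBytes.
inline void stagedCopy(const std::vector<char>& src, std::vector<char>& dst,
                       std::size_t stagingBytes) {
    assert(stagingBytes > 0);
    dst.resize(src.size());
    std::vector<char> staging(stagingBytes);
    std::mutex m;
    std::condition_variable cv;
    std::size_t filled = 0;  // bytes waiting in staging; 0 means drained
    bool done = false;       // set once phase 1 has queued its last chunk

    // Phase 2: staging -> dst. Drains whatever phase 1 has filled.
    std::thread drain([&] {
        std::size_t offset = 0;
        for (;;) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return filled > 0 || done; });
            if (filled == 0) return;  // finished and nothing left to drain
            std::memcpy(dst.data() + offset, staging.data(), filled);
            offset += filled;
            filled = 0;  // buffer is free again
            cv.notify_all();
        }
    });

    // Phase 1: src -> staging. Must wait until the buffer is drained,
    // otherwise it would overwrite bytes phase 2 has not copied yet.
    for (std::size_t offset = 0; offset < src.size(); offset += stagingBytes) {
        const std::size_t n = std::min(stagingBytes, src.size() - offset);
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return filled == 0; });
        std::memcpy(staging.data(), src.data() + offset, n);
        filled = n;
        cv.notify_all();
    }
    {
        std::lock_guard<std::mutex> lock(m);
        done = true;
    }
    cv.notify_all();
    drain.join();
}
```

The sketch holds the lock while copying, so the two phases never
overlap; it models the ordering constraint only, not the throughput of
a real double-buffered pipeline.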
Introduce LockedAccessor option so the destructor does not unlock.
This allows locks to be held across function boundaries, which is
required for the hipLaunchKernel macro, which has several unusual
requirements (including C compatibility, use of a variadic macro, and
more).
Move critical data into a separate class and protect it with the
LockedAccessor wrapper class.
For device, the streams list is the critical data since it is modified when
streams are created or destroyed. The streams list is accessed in
several places including when synchronizing across all streams on the
device (i.e., from the default stream).
Other device data is set once by the device constructor and is not
critical, so it does not need this protection.
All functions which acquire the LockedAccessor are now named with a
"locked_" prefix.
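
The accessor pattern described above can be sketched with the standard
library alone. The names below (DeviceCriticalSketch,
LockedAccessorSketch, locked_addStream, beginLaunchSketch,
endLaunchSketch, isHeld) are hypothetical and chosen for illustration;
the sketch shows a wrapper that locks on construction, optionally skips
the unlock in its destructor, and a "locked_" function that acquires it.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical critical data of a device: the streams list and its mutex.
struct DeviceCriticalSketch {
    std::mutex mutex;
    std::vector<int> streams;
};

// Locks the data's mutex on construction. With autoUnlock == false the
// destructor leaves the mutex locked so the lock can outlive the scope.
template <typename T>
class LockedAccessorSketch {
public:
    explicit LockedAccessorSketch(T& data, bool autoUnlock = true)
        : data_(&data), autoUnlock_(autoUnlock) {
        data_->mutex.lock();
    }
    ~LockedAccessorSketch() {
        if (autoUnlock_) data_->mutex.unlock();
    }
    LockedAccessorSketch(const LockedAccessorSketch&) = delete;
    LockedAccessorSketch& operator=(const LockedAccessorSketch&) = delete;

    void unlock() { data_->mutex.unlock(); }
    T* operator->() { return data_; }  // access to the protected fields

private:
    T* data_;
    bool autoUnlock_;
};

// The "locked_" prefix marks a function that acquires the accessor itself.
inline void locked_addStream(DeviceCriticalSketch& dev, int streamId) {
    LockedAccessorSketch<DeviceCriticalSketch> crit(dev);
    crit->streams.push_back(streamId);
}  // unlocked here by the destructor

// Takes the lock and returns with it still held (autoUnlock == false).
inline void beginLaunchSketch(DeviceCriticalSketch& dev) {
    LockedAccessorSketch<DeviceCriticalSketch> crit(dev, false);
}

// Releases the lock taken by beginLaunchSketch; call from the same thread.
inline void endLaunchSketch(DeviceCriticalSketch& dev) {
    dev.mutex.unlock();
}

// Probes the mutex from a second thread, because calling try_lock on a
// std::mutex the current thread already owns is undefined behavior.
inline bool isHeld(std::mutex& m) {
    bool held = true;
    std::thread probe([&] {
        if (m.try_lock()) {
            held = false;
            m.unlock();
        }
    });
    probe.join();
    return held;
}
```

The begin/end pair mirrors the split a launch macro needs: the lock is
taken in one function and released in another, which a plain scoped
lock guard cannot express.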