fd98ca392d
[ROCm/hip commit: 02b84f231d]
50 γραμμές
2.2 KiB
Plaintext
50 γραμμές
2.2 KiB
Plaintext
/** @page Synchonization
|
|
* @tableofcontents
|
|
|
|
* # Host-synchronous behavior:
|
|
The following commands are "host-asynchronous" - meaning they do not wait for any preceding commands to complete, and may return control to the host thread before the requested operation completes:
|
|
|
|
- Kernel launches (hipLaunchKernel() )
|
|
- Asynchronous memory copies - any memory copy API which contains "Async", such as hipMemcpyAsync())
|
|
- Any memory set (for example, hipMemset());
|
|
- TODO
|
|
|
|
"Host-synchronous" commands have the following properties:
|
|
- wait for all previous commands to complete.
|
|
- will not return control back to host until the command completes.
|
|
|
|
|
|
The following commands are "host-synchronous".
|
|
|
|
- hipMemcpy waits for preceding work in the same stream to complete.
|
|
|
|
|
|
* # Stream synchronization
|
|
|
|
|
|
### Blocking
|
|
|
|
The term "blocking" has two meanings in HIP.
|
|
|
|
The first refers to synchronization commands (ie hipStreamSynchronize, hipEventSynchronize) that cause the host CPU to wait for GPU activity to complete.
|
|
These can either use an active where the host CPU spin-waits on the synchronization variable, or can use an interrupt-based scheme where the core is interrupted
|
|
when the wait completes. The second technique is referred to as "blocking" (ie hipDeviceBlockingSync, hipEventBlockingSync) while the first is referred
|
|
to as "active". Active can be appropriate for short tasks where latency is critical, but comes at the expense of a CPU core dedicated to monitoring the event.
|
|
|
|
### HIP_LAUNCH_BLOCKING (also can use CUDA_LAUNCH_BLOCKING)
|
|
|
|
- The following commands become host-synchronous and will not return until the requested command has completed:
|
|
|
|
- Kernel launches (hipKernelLaunch).
|
|
- Memory set commands (hipMemset, hipMemsetAsync).
|
|
- Memory copy commands (hipMemcpy, hipMemsetAsync).
|
|
|
|
Note CUDA_LAUNCH_BLOCKING does add any pre-serialization to the commands and does not affect the concurrent stream behavior. For example,
|
|
even when CUDA_LAUNCH_BLOCKING is set, kernels or data copy commands launched to separate streams can execute concurrently. Use the NULL
|
|
stream if additional stream synchronization is desired.
|
|
|
|
|
|
|
|
|
|
*/
|