SWDEV-270961 - Update hip_programming_guide.md for event handling

Change-Id: Ieadac9972e5ee13c05ccb42a679866f494f96f47


[ROCm/hip commit: 171551ea8a]
Julia Jiang
2021-04-09 15:00:09 -04:00
committed by Julia Jiang
parent 338c21da14
commit 76aa3fbde4
2 changed files with 14 additions and 5 deletions
@@ -20,15 +20,14 @@ ROCm defines two coherency options for host memory:
- Coherent memory : Supports fine-grain synchronization while the kernel is running.  For example, a kernel can perform atomic operations that are visible to the host CPU or to other (peer) GPUs.  Synchronization instructions include threadfence_system and C++11-style atomic operations.   However, coherent memory cannot be cached by the GPU and thus may have lower performance.
- Non-coherent memory : Can be cached by GPU, but cannot support synchronization while the kernel is running.  Non-coherent memory can be optionally synchronized only at command (end-of-kernel or copy command) boundaries.  This memory is appropriate for high-performance access when fine-grain synchronization is not required.
HIP provides the developer with controls to select which type of memory is used via allocation flags passed to hipHostMalloc and the HIP_HOST_COHERENT environment variable:
- hipHostMallocCoherent=0, hipHostMallocNonCoherent=0: Use HIP_HOST_COHERENT environment variable:
- If HIP_HOST_COHERENT is 1 or undefined, the host memory allocation is coherent.
- If HIP_HOST_COHERENT is defined and 0: the host memory allocation is non-coherent.
HIP provides the developer with controls to select which type of memory is used via allocation flags passed to hipHostMalloc and the HIP_HOST_COHERENT environment variable. By default, the environment variable HIP_HOST_COHERENT is set to 0 in HIP.
- hipHostMallocCoherent=0, hipHostMallocNonCoherent=0: Use the HIP_HOST_COHERENT environment variable:
- If HIP_HOST_COHERENT is defined as 1, the host memory allocation is coherent.
- If HIP_HOST_COHERENT is not defined, or defined as 0, the host memory allocation is non-coherent.
- hipHostMallocCoherent=1, hipHostMallocNonCoherent=0: The host memory allocation will be coherent.  HIP_HOST_COHERENT env variable is ignored.
- hipHostMallocCoherent=0, hipHostMallocNonCoherent=1: The host memory allocation will be non-coherent.  HIP_HOST_COHERENT env variable is ignored.
- hipHostMallocCoherent=1, hipHostMallocNonCoherent=1: Illegal.
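The flag and environment-variable rules in the updated text can be read as a small decision function. The sketch below is a plain C++ model of those rules only, not HIP runtime code: `HostCoherency` and `resolveHostCoherency` are hypothetical names, the two booleans stand in for whether hipHostMallocCoherent or hipHostMallocNonCoherent was passed to hipHostMalloc, and treating any HIP_HOST_COHERENT value other than 1 like 0 is an assumption, since the guide only describes the values 0 and 1.

```cpp
#include <cstdlib>
#include <stdexcept>

// Toy model of the selection rules in the guide; hypothetical names, not HIP API.
enum class HostCoherency { Coherent, NonCoherent };

// coherentFlag / nonCoherentFlag: whether hipHostMallocCoherent /
// hipHostMallocNonCoherent was requested for the allocation.
// envValue: the HIP_HOST_COHERENT string, or nullptr when it is not defined.
HostCoherency resolveHostCoherency(bool coherentFlag, bool nonCoherentFlag,
                                   const char* envValue) {
    if (coherentFlag && nonCoherentFlag) {
        // Both flags set is listed as illegal in the guide.
        throw std::invalid_argument("coherent and non-coherent flags are mutually exclusive");
    }
    if (coherentFlag) {
        return HostCoherency::Coherent;     // environment variable is ignored
    }
    if (nonCoherentFlag) {
        return HostCoherency::NonCoherent;  // environment variable is ignored
    }
    // Neither flag set: the environment variable decides.
    // Assumption: only the value 1 selects coherent memory.
    if (envValue != nullptr && std::atoi(envValue) == 1) {
        return HostCoherency::Coherent;
    }
    return HostCoherency::NonCoherent;      // undefined or 0
}
```

In a real program the third argument would typically come from `std::getenv("HIP_HOST_COHERENT")`; the model takes it as a parameter so each case can be checked in isolation.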
### Visibility of Zero-Copy Host Memory
Coherent host memory is automatically visible at synchronization points.
Non-coherent
@@ -49,6 +48,9 @@ A stronger system-level fence can be specified when the event is created with hi
- hipEventReleaseToSystem : Perform a system-scope release operation when the event is recorded.  This will make both Coherent and Non-Coherent host memory visible to other agents in the system, but may involve heavyweight operations such as cache flushing.  Coherent memory will typically use lighter-weight in-kernel synchronization mechanisms such as an atomic operation and thus does not need to use hipEventReleaseToSystem.
- hipEventDisableTiming: Events created with this flag do not record profiling data and provide the best performance when used for synchronization.
Note: for HIP events passed to a kernel dispatch through hipExtLaunchKernelGGL/hipExtLaunchKernel, the events are not explicitly recorded and should only be used to get the elapsed time for that specific launch.
If events are used across multiple dispatches, for example start and stop events from different hipExtLaunchKernelGGL/hipExtLaunchKernel calls, they are treated as invalid unrecorded events and hipEventElapsedTime returns the error hipErrorInvalidHandle.
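The pairing rule in the note above can be illustrated with a toy model. This is a plain C++ sketch under stated assumptions, not the HIP runtime's implementation: `ModelEvent`, `ModelError`, `modelTimedLaunch`, and `modelElapsedTime` are hypothetical stand-ins for hipEvent_t, hipError_t, a hipExtLaunchKernelGGL call that receives start and stop events, and hipEventElapsedTime. Tagging each event with a launch id is purely an illustration device.

```cpp
#include <cstdint>

// Toy model only: none of these names exist in HIP.
enum class ModelError { Success, InvalidHandle };

struct ModelEvent {
    std::uint64_t launchId = 0;  // 0 means the event was never attached to a launch
    double timestampMs = 0.0;
};

// Stands in for one launch that is given a start and a stop event.
void modelTimedLaunch(std::uint64_t launchId, double beginMs, double endMs,
                      ModelEvent& start, ModelEvent& stop) {
    start.launchId = launchId;
    start.timestampMs = beginMs;
    stop.launchId = launchId;
    stop.timestampMs = endMs;
}

// Stands in for the elapsed-time query: only a start/stop pair that came
// from the same launch yields a time; any other pairing is rejected.
ModelError modelElapsedTime(float* ms, const ModelEvent& start, const ModelEvent& stop) {
    if (start.launchId == 0 || stop.launchId == 0 || start.launchId != stop.launchId) {
        return ModelError::InvalidHandle;
    }
    *ms = static_cast<float>(stop.timestampMs - start.timestampMs);
    return ModelError::Success;
}
```

The practical consequence is that timing a sequence of such launches needs one start/stop pair per launch, with the per-launch times combined by the caller, rather than the start event of the first launch paired with the stop event of the last.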
### Summary and Recommendations:
- Coherent host memory is the default and is the easiest to use since the memory is visible to the CPU at typical synchronization points. This memory allows in-kernel synchronization commands such as threadfence_system to work transparently.
@@ -1524,6 +1524,13 @@ hipError_t hipEventSynchronize(hipEvent_t event);
* recorded on one or both events (that is, hipEventQuery() would return #hipErrorNotReady on at
* least one of the events), then #hipErrorNotReady is returned.
*
* Note: for HIP events passed to a kernel dispatch through hipExtLaunchKernelGGL/hipExtLaunchKernel,
* the events are not explicitly recorded and should only be used to get the elapsed time for that
* specific launch. If events are used across multiple dispatches, for example start and stop
* events from different hipExtLaunchKernelGGL/hipExtLaunchKernel calls, they are treated as
* invalid unrecorded events and hipEventElapsedTime returns #hipErrorInvalidHandle.
*
* @see hipEventCreate, hipEventCreateWithFlags, hipEventQuery, hipEventDestroy, hipEventRecord,
* hipEventSynchronize
*/