From 76aa3fbde4884f1e78341e718d04471e9e80308b Mon Sep 17 00:00:00 2001
From: Julia Jiang
Date: Fri, 9 Apr 2021 15:00:09 -0400
Subject: [PATCH] SWDEV-270961 - Update hip_programming_guide.md for event handling

Change-Id: Ieadac9972e5ee13c05ccb42a679866f494f96f47

[ROCm/hip commit: 171551ea8a8138122c0b76cda02d3f6ea9cd6d12]
---
 projects/hip/docs/markdown/hip_programming_guide.md | 12 +++++++-----
 .../hip/include/hip/amd_detail/hip_runtime_api.h   |  7 +++++++
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/projects/hip/docs/markdown/hip_programming_guide.md b/projects/hip/docs/markdown/hip_programming_guide.md
index 617a5a407d..fb02f9b22d 100644
--- a/projects/hip/docs/markdown/hip_programming_guide.md
+++ b/projects/hip/docs/markdown/hip_programming_guide.md
@@ -20,15 +20,14 @@ ROCm defines two coherency options for host memory:
 - Coherent memory : Supports fine-grain synchronization while the kernel is running.  For example, a kernel can perform atomic operations that are visible to the host CPU or to other (peer) GPUs.  Synchronization instructions include threadfence_system and C++11-style atomic operations.   However, coherent memory cannot be cached by the GPU and thus may have lower performance.
 - Non-coherent memory : Can be cached by GPU, but cannot support synchronization while the kernel is running.  Non-coherent memory can be optionally synchronized only at command (end-of-kernel or copy command) boundaries.  This memory is appropriate for high-performance access when fine-grain synchronization is not required.
 
-IP provides the developer with controls to select which type of memory is used via allocation flags passed to hipHostMalloc and the HIP_HOST_COHERENT environment variable:
-- hipHostllocCoherent=0, hipHostMallocNonCoherent=0: Use HIP_HOST_COHERENT environment variable:
-  - If HIP_HOST_COHERENT is 1 or undefined, the host memory allocation is coherent.
-  - If HIP_HOST_COHERENT is `defined and 0: the host memory allocation is non-coherent.
+HIP provides the developer with controls to select which type of memory is used via allocation flags passed to hipHostMalloc and the HIP_HOST_COHERENT environment variable. By default, HIP behaves as if HIP_HOST_COHERENT were set to 0.
+- hipHostMallocCoherent=0, hipHostMallocNonCoherent=0: Use the HIP_HOST_COHERENT environment variable:
+  - If HIP_HOST_COHERENT is defined as 1, the host memory allocation is coherent.
+  - If HIP_HOST_COHERENT is not defined, or is defined as 0, the host memory allocation is non-coherent.
 - hipHostMallocCoherent=1, hipHostMallocNonCoherent=0: The host memory allocation will be coherent.  HIP_HOST_COHERENT env variable is ignored.
 - hipHostMallocCoherent=0, hipHostMallocNonCoherent=1: The host memory allocation will be non-coherent.  HIP_HOST_COHERENT env variable is ignored.
 - hipHostMallocCoherent=1, hipHostMallocNonCoherent=1: Illegal.
-
 ### Visibility of Zero-Copy Host Memory
 
 Coherent host memory is automatically visible at synchronization points.
 Non-coherent
@@ -49,6 +48,9 @@ A stronger system-level fence can be specified when the event is created with hi
 - hipEventReleaseToSystem : Perform a system-scope release operation when the event is recorded.  This will make both Coherent and Non-Coherent host memory visible to other agents in the system, but may involve heavyweight operations such as cache flushing.  Coherent memory will typically use lighter-weight in-kernel synchronization mechanisms such as an atomic operation and thus does not need to use hipEventReleaseToSystem.
 - hipEventDisableTiming: Events created with this flag would not record profiling data and provide best performance if used for synchronization.
 
+Note: for HIP events used in kernel dispatch with hipExtLaunchKernelGGL/hipExtLaunchKernel, the events passed to the API are not explicitly recorded and should only be used to get the elapsed time for that specific launch.
+If events are used across multiple dispatches (for example, start and stop events from different hipExtLaunchKernelGGL/hipExtLaunchKernel calls), they are treated as invalid, unrecorded events, and hipEventElapsedTime returns "hipErrorInvalidHandle".
+
 ### Summary and Recommendations:
 
 - Coherent host memory is the default and is the easiest to use since the memory is visible to the CPU at typical synchronization points. This memory allows in-kernel synchronization commands such as threadfence_system to work transparently.
diff --git a/projects/hip/include/hip/amd_detail/hip_runtime_api.h b/projects/hip/include/hip/amd_detail/hip_runtime_api.h
index 7739c3b1d0..5cd74f687e 100644
--- a/projects/hip/include/hip/amd_detail/hip_runtime_api.h
+++ b/projects/hip/include/hip/amd_detail/hip_runtime_api.h
@@ -1524,6 +1524,13 @@ hipError_t hipEventSynchronize(hipEvent_t event);
  * recorded on one or both events (that is, hipEventQuery() would return #hipErrorNotReady on at
  * least one of the events), then #hipErrorNotReady is returned.
  *
+ * Note: for HIP events used in kernel dispatch with hipExtLaunchKernelGGL/hipExtLaunchKernel,
+ * the events passed to hipExtLaunchKernelGGL/hipExtLaunchKernel are not explicitly recorded and
+ * should only be used to get the elapsed time for that specific launch. If events are used across
+ * multiple dispatches (for example, start and stop events from different hipExtLaunchKernelGGL/
+ * hipExtLaunchKernel calls), they are treated as invalid, unrecorded events, and
+ * hipEventElapsedTime returns #hipErrorInvalidHandle.
+ *
  * @see hipEventCreate, hipEventCreateWithFlags, hipEventQuery, hipEventDestroy, hipEventRecord,
  * hipEventSynchronize
  */