rocm-systems/shared/amdgpu-windows-interop/pal/inc/gpuUtil/palRenderOpTraceController.h
Scott Todd fa772be675 Reapply amdgpu-windows-interop revert. (#1893)
## Overview and rationale

This reverts https://github.com/ROCm/rocm-systems/pull/1886, which...
* Re-applies https://github.com/ROCm/rocm-systems/pull/1866
* Reverts https://github.com/ROCm/rocm-systems/pull/1728

(So it restores the [`amdgpu-windows-interop/`](https://github.com/ROCm/rocm-systems/tree/develop/shared/amdgpu-windows-interop) folder back to the state from a few weeks ago)

The rationale for this change is at https://github.com/ROCm/rocm-systems/pull/1866:
> Last PAL update broke applications on gfx12 Windows.

## Cross-repository change details

That PR failed to build but was merged with this explanation:

> TheRock CI Windows build fails as expected with this revert.
> 
> References to these PAL members need to be stripped out in a patch on TheRock.
> 
> ```
> 11.3	C:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\device\pal\palubercapturemgr.cpp(152): error C2039: 'RegisterTraceStateChangeCallback': is not a member of 'GpuUtil::TraceSession'
> 11.4	C:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\inc\gpuUtil\palTraceSession.h(372): note: see declaration of 'GpuUtil::TraceSession'
> 11.4	C:\home\runner\_work\rocm-systems\rocm-systems\projects\clr\rocclr\device\pal\palubercapturemgr.cpp(195): error C2039: 'UnregisterTraceStateChangeCallback': is not a member of 'GpuUtil::TraceSession'
> 11.4	C:\home\runner\_work\rocm-systems\rocm-systems\shared\amdgpu-windows-interop\pal\inc\gpuUtil\palTraceSession.h(372): note: see declaration of 'GpuUtil::TraceSession'
> ```

The patch in TheRock was updated in https://github.com/ROCm/TheRock/pull/2154. This rolls forward by updating the ref for TheRock.

That original PR could have been sequenced differently to avoid a build break - perhaps by
* Pointing to a branch in TheRock with the patch rebased
* Deleting the patch in the workflows here but holding a local copy of the patch to be applied in workflows
* Landing the patch as a normal commit instead of carrying it at all

## Test plan

1. Watch TheRock CI here (https://github.com/ROCm/rocm-systems/actions/runs/19447202693/job/55644411119?pr=1893)
2. Build locally:
    
    ```bash
    # In rocm-systems
    git am --whitespace=nowarn D:\projects\TheRock\patches\amd-mainline\rocm-systems\0001-Revert-SWDEV-543498-Some-compute-Ubertrace-profiles-.patch
    git am --whitespace=nowarn D:\projects\TheRock\patches\amd-mainline\rocm-systems\0003-Use-is_versioned-true-consistently-in-both-Comgr-Loa.patch
    git am --whitespace=nowarn D:\projects\TheRock\patches\amd-mainline\rocm-systems\0006-Explicitly-load-libamdhip64.so.7.patch
    # Note: the build fails with the observed errors if patch 0001 is not applied!
    
    # In TheRock
    # IMPORTANT: THEROCK_ROCM_SYSTEMS_SOURCE_DIR points at the patched rocm-systems checkout.
    cmake -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_C_COMPILER=cl.exe -DCMAKE_CXX_COMPILER=cl.exe \
      -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
      -DPython3_EXECUTABLE=d:/projects/TheRock/.venv/Scripts/python \
      -DTHEROCK_ROCM_SYSTEMS_SOURCE_DIR=d:/projects/TheRock/../rocm-systems \
      -DTHEROCK_AMDGPU_FAMILIES=gfx110X-all \
      -DBUILD_TESTING=ON \
      -DTHEROCK_ENABLE_ALL=ON \
      -Damd-llvm_BUILD_TYPE=RelWithDebInfo \
      -S D:/projects/TheRock \
      -B D:/projects/TheRock/build \
      -G Ninja
    
    cmake --build D:/projects/TheRock/build --target hip-clr
    # [build] Build finished with exit code 0
    cmake --build D:/projects/TheRock/build --target ocl-clr+dist
    # [build] Build finished with exit code 0
    ```
2025-11-18 07:17:06 -08:00

151 lines
6.9 KiB
C++

/*
***********************************************************************************************************************
*
* Copyright (c) 2024-2025 Advanced Micro Devices, Inc. All Rights Reserved.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in all
* copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
* SOFTWARE.
*
**********************************************************************************************************************/
#pragma once
#include "palTraceSession.h"
namespace Pal
{
class IPlatform;
class IQueue;
class ICmdBuffer;
class Device;
}
namespace GpuUtil
{
/// Supported render operations used to advance the trace
enum RenderOp : Pal::uint8
{
    RenderOpDraw     = (1u << 0),
    RenderOpDispatch = (1u << 1)
};
/// Structure used to batch submit render operations on queue submission
/// This struct should have a `*Count` field for each @ref RenderOp enumeration above
struct RenderOpCounts
{
    Pal::uint32 drawCount;
    Pal::uint32 dispatchCount;
};
constexpr Pal::uint32 RenderOpTraceControllerVersion = 4;
constexpr char RenderOpTraceControllerName[] = "renderop";
// =====================================================================================================================
class RenderOpTraceController : public ITraceController
{
public:
#if PAL_CLIENT_INTERFACE_MAJOR_VERSION < 896
    using RenderOp = GpuUtil::RenderOp;
#endif
    RenderOpTraceController(Pal::IPlatform* pPlatform, Pal::IDevice* pDevice);
    virtual ~RenderOpTraceController();

    virtual const char* GetName() const override { return RenderOpTraceControllerName; }
    virtual Pal::uint32 GetVersion() const override { return RenderOpTraceControllerVersion; }
    virtual void OnConfigUpdated(DevDriver::StructuredValue* pJsonConfig) override;
    virtual Pal::Result OnTraceRequested() override;
#if PAL_CLIENT_INTERFACE_MAJOR_VERSION >= 908
    virtual Pal::Result OnPreparationGpuWork(Pal::uint32 gpuIndex, Pal::ICmdBuffer** ppCmdBuf) override;
#endif
    virtual Pal::Result OnBeginGpuWork(Pal::uint32 gpuIndex, Pal::ICmdBuffer** ppCmdBuffer) override;
    virtual Pal::Result OnEndGpuWork(Pal::uint32 gpuIndex, Pal::ICmdBuffer** ppCmdBuffer) override;
    virtual Pal::Result OnEndPostambleGpuWork(
        Pal::uint32       gpuIndex,
        Pal::ICmdBuffer** ppCmdBuffer) override;

#if PAL_CLIENT_INTERFACE_MAJOR_VERSION < 896
    void RecordRenderOp(Pal::IQueue* pQueue, RenderOp renderOp);
#endif
    void FinishTrace();

    // Cancel the trace currently in progress.
    virtual Pal::Result OnTraceCanceled() override;

    /// This function must be called by client drivers implementing the RenderOp controller.
    /// On every queue submission, this function is called with the cumulative counts of render operations
    /// recorded into that queue's command buffers.
    /// Based on the controller's internal mask, set by the user during trace configuration,
    /// the trace controller may advance its state.
    void RecordRenderOps(Pal::IQueue* pQueue, const RenderOpCounts& renderOpCounts);

private:
    /// Controls whether the trace proceeds on absolute render op counts or relative
    enum class CaptureMode : Pal::uint8
    {
        Relative = 0, ///< Relative to when the trace request is received
        Absolute      ///< Absolute render op index
    };

    Pal::Result AcceptTrace();
    Pal::Result BeginTrace();
    Pal::Result SubmitBeginTraceGpuWork() const;
    Pal::Result SubmitEndTraceGpuWork();
    Pal::Result SubmitEndPostambleGpuWork();
    Pal::Result WaitForTraceEndGpuWorkCompletion() const;
    Pal::Result CreateFence(Pal::IFence** ppFence) const;
    Pal::Result CreateCommandBuffer(bool traceEnd, Pal::ICmdBuffer** ppCmdBuf) const;
    Pal::Result CreateCmdAllocator();
    void OnRenderOpUpdated(Pal::uint64 countRecorded);
    void FreeResources();
    void AbortTrace();

    Pal::IPlatform* const m_pPlatform;             // Platform associated with this TraceController
    Pal::IDevice*         m_pDevice;               // Device associated with this TraceController
    Pal::ICmdAllocator*   m_pCmdAllocator;         // Command allocator for the TraceController
    TraceSession*         m_pTraceSession;         // TraceSession owning this TraceController
    Pal::uint64           m_supportedGpuMask;      // Bit mask of GPU indices that are capable of participating in the trace
    Pal::uint8            m_renderOpMask;          // Bitmask of RenderOp modes, indicating which are accepted
    CaptureMode           m_captureMode;           // Modality for determining the starting render op index of the trace
    Pal::uint64           m_renderOpCount;         // The "global" count, incremented on every render op
    Pal::uint64           m_prepStartRenderOp;     // Relative or absolute render op number indicating trace begin
    Pal::uint64           m_numPrepRenderOps;      // Number of "warm-up" render ops before the starting render op
    Pal::uint64           m_captureRenderOpCount;  // Number of render ops to wait before ending the trace
    Pal::uint64           m_renderOpTraceAccepted; // The render op count when the trace was accepted
    Util::Mutex           m_renderOpLock;          // Lock over RecordRenderOps/OnRenderOpUpdated
    Pal::IQueue*          m_pQueue;                // The queue being used to submit Begin/End GPU trace command buffers
#if PAL_CLIENT_INTERFACE_MAJOR_VERSION >= 908
    Pal::ICmdBuffer*      m_pCmdBufTracePrepare;   // Command buffer for recording during the prep phase
#endif
    Pal::ICmdBuffer*      m_pCmdBufTraceBegin;     // Command buffer to submit Trace Begin
    Pal::ICmdBuffer*      m_pCmdBufTraceEnd;       // Command buffer to submit Trace End
    Pal::ICmdBuffer*      m_pCmdBufPostambleEnd;   // Command buffer to submit Postamble End
    Pal::IFence*          m_pFenceTraceEnd;        // Fence to wait for Trace End command buffer completion
    Pal::IFence*          m_pFencePostambleEnd;    // Fence to wait for Postamble End command buffer completion
};
} // namespace GpuUtil
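
The header above only declares the interface, so here is a minimal, standalone sketch of how the `RenderOp` mask and the `RenderOpCounts` batch described in the `RecordRenderOps` comment might combine. It is not PAL's implementation. `CountMaskedRenderOps` and `ResolveStartRenderOp` are hypothetical helpers, the types are local stand-ins that use `<cstdint>` instead of `Pal::uint*`, and the Relative/Absolute arithmetic is an assumption inferred from the `CaptureMode` comments.

```cpp
#include <cstdint>

// Local stand-ins mirroring the declarations in the header (no PAL dependency).
enum RenderOp : std::uint8_t
{
    RenderOpDraw     = (1u << 0),
    RenderOpDispatch = (1u << 1)
};

struct RenderOpCounts
{
    std::uint32_t drawCount;
    std::uint32_t dispatchCount;
};

enum class CaptureMode : std::uint8_t
{
    Relative = 0,
    Absolute
};

// Hypothetical helper: sums only the op types whose bit is set in the mask.
// Assumption: ops outside the mask do not advance the trace.
std::uint64_t CountMaskedRenderOps(std::uint8_t renderOpMask, const RenderOpCounts& counts)
{
    std::uint64_t total = 0;
    if ((renderOpMask & RenderOpDraw) != 0)
    {
        total += counts.drawCount;
    }
    if ((renderOpMask & RenderOpDispatch) != 0)
    {
        total += counts.dispatchCount;
    }
    return total;
}

// Hypothetical helper: resolves the render op index at which the trace would start.
// Assumption: in Relative mode the configured index is an offset from the op count
// at which the trace was accepted; in Absolute mode it is used as-is.
std::uint64_t ResolveStartRenderOp(CaptureMode   mode,
                                   std::uint64_t configuredStart,
                                   std::uint64_t acceptedAt)
{
    return (mode == CaptureMode::Relative) ? (acceptedAt + configuredStart) : configuredStart;
}
```

Under these assumptions, a submission with 3 draws and 5 dispatches adds 5 to the running count for a dispatch-only mask, and 8 when both bits are set.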