..
   Commit 946385d0ff: Reverts #1379 and properly migrates the docs.
   Co-authored-by: Matt Williams <matt.williams@amd.com>

.. meta::
  :description: AQLprofile is an open source library that enables advanced GPU profiling and tracing on AMD platforms.
  :keywords: AQLprofile, ROCm, tool, Instinct, accelerator, AMD

What is AQLprofile?
===================

The Architected Queuing Language profiling library (AQLprofile) is an
open source library that enables advanced GPU profiling and tracing on
AMD platforms. It works in conjunction with
`ROCprofiler-SDK <https://github.com/ROCm/rocprofiler-sdk>`__ to
support profiling methods such as `performance counters
(PMC) <https://rocm.docs.amd.com/projects/aqlprofile/en/latest/examples/pmc-workflow.html>`__ and `SQ thread trace
(SQTT) <https://rocm.docs.amd.com/projects/aqlprofile/en/latest/examples/sqtt-workflow.html>`__. AQLprofile provides the
foundational mechanisms for constructing AQL packets and managing
profiling operations across multiple AMD GPU architecture families. The
development of AQLprofile is aligned with ROCprofiler-SDK, ensuring
compatibility and feature support for new GPU architectures and
profiling requirements.

AQLprofile builds on concepts from the Heterogeneous System Architecture
(HSA) and AQL, which define the foundations for GPU command
processing and profiling on AMD platforms. For more information, see:

- `HSA Platform System Architecture
  Specification <http://hsafoundation.com/wp-content/uploads/2021/02/HSA-SysArch-1.2.pdf>`__
- `HSA Runtime Programmer's Reference
  Specification <http://hsafoundation.com/wp-content/uploads/2021/02/HSA-Runtime-1.2.pdf>`__

Features
--------

- Profiling AQL packets for GPU workloads.
- Performance counters and SQ thread traces.
- Support for GFX9, GFX10XX, GFX11XX, and GFX12XX architecture families.
- Verbose tracing and error logging capabilities.
- Thread trace binary data generated by AQLprofile can be decoded using
  `rocprof-trace-decoder <https://github.com/ROCm/rocprof-trace-decoder/releases>`__.

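As a rough illustration of the architecture families listed above, the
following sketch maps an LLVM-style GPU target name (for example
``gfx90a``) to one of those families. The helper ``family_for_target``
and the parsing rule are assumptions made for this example; they are not
part of the AQLprofile API.

.. code-block:: python

   import re
   from typing import Optional

   # Families named in the feature list above, keyed by the major
   # version digits of the target name (assumed naming scheme:
   # gfx<major><minor><stepping>, where minor and stepping are one
   # hexadecimal character each).
   _FAMILY_BY_MAJOR = {
       "9": "GFX9",
       "10": "GFX10XX",
       "11": "GFX11XX",
       "12": "GFX12XX",
   }

   _TARGET_RE = re.compile(r"gfx(\d+)([0-9a-f])([0-9a-f])")


   def family_for_target(target: str) -> Optional[str]:
       """Return the family name for a target, or None if it is not listed."""
       match = _TARGET_RE.fullmatch(target.lower())
       if match is None:
           return None
       return _FAMILY_BY_MAJOR.get(match.group(1))


   print(family_for_target("gfx90a"))   # GFX9
   print(family_for_target("gfx1100"))  # GFX11XX

Targets outside the four listed families, such as ``gfx803``, return
``None`` in this sketch.
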
Who should use this library?
----------------------------

- **End users**: If you want to profile AMD GPUs, use
  `ROCprofiler-SDK <https://github.com/ROCm/rocprofiler-sdk>`__ or
  tools that depend on it. You do *not* need to use AQLprofile
  directly.
- **Developers/integrators**: If you're building profiling tools,
  custom workflows, or need to extend profiling capabilities, you may
  use AQLprofile directly as a backend.

How does AQLprofile fit into the ROCm profiling stack?
------------------------------------------------------

Here's the typical workflow:

Application → ROCprofiler-SDK ⇄ **AQLprofile** ⇄ ROCprofiler-SDK → HSA/ROCR/KFD → AMD GPU hardware

- **AQLprofile** generates profiling command packets (AQL/PM4) tailored to the GPU architecture. It doesn't interact with hardware or drivers directly. It only produces the packets and buffer requirements requested by ``ROCprofiler-SDK``.

- **ROCprofiler-SDK** provides a higher-level API and user-facing tools, using AQLprofile internally. It manages profiling sessions, submits packets to the GPU via `ROCr <https://rocm.docs.amd.com/projects/rocr_debug_agent/en/latest/index.html>`_/HSA/KFD, and collects results.

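The division of responsibilities described above can be sketched as a
small conceptual model. This is not the AQLprofile or ROCprofiler-SDK
API: ``PacketPlan``, ``build_packet_plan``, ``ProfilingSession``, the
byte encoding, and the 8-bytes-per-counter size are all hypothetical
stand-ins, and the counter names are only examples. The point is that
the packet-building layer is a pure function with no hardware access,
while the session layer owns buffers and submission.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Callable, List


   @dataclass(frozen=True)
   class PacketPlan:
       """What the packet-building layer returns: packets and buffer size."""
       start_packet: bytes
       stop_packet: bytes
       output_buffer_bytes: int


   def build_packet_plan(gfx_family: str, counters: List[str]) -> PacketPlan:
       """Stand-in for the AQLprofile role: builds packets, touches no hardware."""
       # Hypothetical encoding; real packets are architecture-specific AQL/PM4.
       header = gfx_family.encode()
       return PacketPlan(
           start_packet=header + b":start:" + ",".join(counters).encode(),
           stop_packet=header + b":stop",
           output_buffer_bytes=8 * len(counters),  # assumed size per counter
       )


   class ProfilingSession:
       """Stand-in for the ROCprofiler-SDK role: owns buffers and submission."""

       def __init__(self, submit: Callable[[bytes], None]) -> None:
           self._submit = submit  # stands in for the HSA/ROCr queue

       def run(self, gfx_family: str, counters: List[str],
               workload: Callable[[], None]) -> bytearray:
           plan = build_packet_plan(gfx_family, counters)
           buffer = bytearray(plan.output_buffer_bytes)  # session allocates
           self._submit(plan.start_packet)
           workload()
           self._submit(plan.stop_packet)
           return buffer  # a real session would read results from here


   submitted: List[bytes] = []
   session = ProfilingSession(submit=submitted.append)
   result = session.run("GFX9", ["SQ_WAVES", "GRBM_COUNT"], workload=lambda: None)

Because the packet builder never submits anything itself, it can be
exercised without a GPU, which mirrors why AQLprofile can serve as a
backend for other tools.
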