Update ROCTracer README for the GitHub link (#1745)
* Update README for the GitHub link

* Updating links to rocm-systems

@@ -339,4 +339,4 @@ Here is the general sequence of events when a code object is loaded and unloaded
 all buffers that might contain references to that code object or kernel symbol identifier before
 deleting the associated data.

-For a sample of code object tracing, see `samples/code_object_tracing <https://github.com/ROCm/rocprofiler-sdk/tree/amd-mainline/samples/code_object_tracing>`_.
+For a sample of code object tracing, see `samples/code_object_tracing <https://github.com/ROCm/rocm-systems/tree/develop/projects/rocprofiler-sdk/samples/code_object_tracing>`_.

@@ -13,7 +13,7 @@ There are two modes of counter collection service:

 - **Device counting**: In this mode, counters are collected on a device level. This mode is useful for collecting device level counters not tied to a specific kernel execution, which encompasses collecting counter values for a specific time range.

-This topic explains how to setup dispatch and device counting and use common counter collection APIs. For details on the APIs including the less commonly used counter collection APIs, see the API library. For fully functional examples of both dispatch and device counting, see `Samples <https://github.com/ROCm/rocprofiler-sdk/tree/amd-mainline/samples>`_.
+This topic explains how to setup dispatch and device counting and use common counter collection APIs. For details on the APIs including the less commonly used counter collection APIs, see the API library. For fully functional examples of both dispatch and device counting, see `Samples <https://github.com/ROCm/rocm-systems/tree/develop/projects/rocprofiler-sdk/samples>`_.

 Definitions
 -----------

@@ -96,4 +96,4 @@ Dispatch table chaining
 ROCprofiler-SDK can save the original values of the function pointers such as ``foo_fn`` in ``impl::construct_dispatch_table()`` and install its own function pointers in its place. This results in the public C API function ``foo`` calling into the ROCprofiler-SDK function pointer, which in turn, calls the original function pointer to ``impl::foo``. This phenomenon is named chaining. Once ROCprofiler-SDK
 makes necessary modifications to the dispatch table, tools requesting access to the raw dispatch table via ``rocprofiler_at_intercept_table_registration`` are provided the pointer to the dispatch table.

-For examples on dispatch table chaining, see `samples/intercept_table <https://github.com/ROCm/rocprofiler-sdk/tree/amd-staging/samples/intercept_table>`_.
+For examples on dispatch table chaining, see `samples/intercept_table <https://github.com/ROCm/rocm-systems/tree/develop/projects/rocprofiler-sdk/samples/intercept_table>`_.

@@ -22,7 +22,7 @@ Program Counter (PC) sampling is a profiling method that uses statistical approx
 ROCprofiler-SDK PC sampling service
 ------------------------------------

-This section describes how to use ROCProfiler-SDK PC sampling API to configure and use PC sampling service. For fully functional examples, see `Samples <https://github.com/ROCm/rocprofiler-sdk/tree/amd-mainline/samples>`_.
+This section describes how to use ROCProfiler-SDK PC sampling API to configure and use PC sampling service. For fully functional examples, see `Samples <https://github.com/ROCm/rocm-systems/tree/develop/projects/rocprofiler-sdk/samples>`_.

 tool_init() setup
 ++++++++++++++++++
@@ -132,7 +132,7 @@ Configure the PC sampling service on an agent with ``agent_id`` to generate samp

 .. note::

-Multiple processes can share the same GPU agent simultaneously, so the following A->B->A problem is possible on shared systems. For example, process A can query available configurations and opt to configure the service with configuration CA. However, if process B manages to finish configuring the service with configuration CB, then process A will fail. Thus, it is advisable for process A to repeat the querying process to observe configuration CB and reuse it for configuring the PC sampling service. For more details, refer to the `Samples <https://github.com/ROCm/rocprofiler-sdk/tree/amd-mainline/samples>`_.
+Multiple processes can share the same GPU agent simultaneously, so the following A->B->A problem is possible on shared systems. For example, process A can query available configurations and opt to configure the service with configuration CA. However, if process B manages to finish configuring the service with configuration CB, then process A will fail. Thus, it is advisable for process A to repeat the querying process to observe configuration CB and reuse it for configuring the PC sampling service. For more details, refer to the `Samples <https://github.com/ROCm/rocm-systems/tree/develop/projects/rocprofiler-sdk/samples>`_.

 Processing PC samples
 ----------------------
@@ -170,7 +170,7 @@ The PC sampling service asynchronously delivers samples via a dedicated callback
 }
 }

-For more information on the data comprising a single sample, see `pc_sampling.h <https://github.com/ROCm/rocprofiler-sdk/blob/amd-mainline/source/include/rocprofiler-sdk/pc_sampling.h>`_.
+For more information on the data comprising a single sample, see `pc_sampling.h <https://github.com/ROCm/rocm-systems/blob/develop/projects/rocprofiler-sdk/source/include/rocprofiler-sdk/pc_sampling.h>`_.

 .. note::
 A user can synchronously flush buffers via ``rocprofiler_buffer_flush`` that triggers ``pc_sampling_callback``.

@@ -346,12 +346,12 @@ The Trace Decoder provides important information about the quality and comprehen

 For more information about the data structures and functions available for thread trace decoding, see the following headers:

-- `trace_decoder.h <https://github.com/ROCm/rocprofiler-sdk/blob/amd-mainline/source/include/rocprofiler-sdk/experimental/thread-trace/trace_decoder.h>`_
+- `trace_decoder.h <https://github.com/ROCm/rocm-systems/blob/develop/projects/rocprofiler-sdk/source/include/rocprofiler-sdk/experimental/thread-trace/trace_decoder.h>`_

-- `trace_decoder_types.h <https://github.com/ROCm/rocprofiler-sdk/blob/amd-mainline/source/include/rocprofiler-sdk/experimental/thread-trace/trace_decoder_types.h>`_
+- `trace_decoder_types.h <https://github.com/ROCm/rocm-systems/blob/develop/projects/rocprofiler-sdk/source/include/rocprofiler-sdk/experimental/thread-trace/trace_decoder_types.h>`_

-- `core.h <https://github.com/ROCm/rocprofiler-sdk/blob/amd-mainline/source/include/rocprofiler-sdk/experimental/thread-trace/core.h>`_
+- `core.h <https://github.com/ROCm/rocm-systems/blob/develop/projects/rocprofiler-sdk/source/include/rocprofiler-sdk/experimental/thread-trace/core.h>`_

-- `dispatch.h <https://github.com/ROCm/rocprofiler-sdk/blob/amd-mainline/source/include/rocprofiler-sdk/experimental/thread-trace/dispatch.h>`_
+- `dispatch.h <https://github.com/ROCm/rocm-systems/blob/develop/projects/rocprofiler-sdk/source/include/rocprofiler-sdk/experimental/thread-trace/dispatch.h>`_

-- `agent.h <https://github.com/ROCm/rocprofiler-sdk/blob/amd-mainline/source/include/rocprofiler-sdk/experimental/thread-trace/agent.h>`_
+- `agent.h <https://github.com/ROCm/rocm-systems/blob/develop/projects/rocprofiler-sdk/source/include/rocprofiler-sdk/experimental/thread-trace/agent.h>`_

@@ -39,7 +39,7 @@ Prerequisites

 * ROCm 7.x build, or

-* Early release can be `built from source <https://github.com/rocm/aqlprofile>`_
+* Early release can be `built from source <https://github.com/ROCm/rocm-systems/tree/develop/projects/aqlprofile>`_

 * Otherwise, ``rocprofv3`` throws error "INVALID_SHADER_DATA" or "Agent not supported".

@@ -48,13 +48,13 @@ Prerequisites
 * For binary files, see `ROCprof trace decoder release page <https://github.com/ROCm/rocprof-trace-decoder/releases>`_.

 * Default install location is ``/opt/rocm/lib``


 * For custom location, use:

 * Parameter ``--att-library-path``, or

 * Environment variable ``ROCPROF_ATT_LIBRARY_PATH``



 .. _thread-trace-parameters:

@@ -163,7 +163,7 @@ If the subsequent kernels are targeted kernels, the profiler will then profile a
 new targeted kernel, so it is possible for a generated ATT file to have more than ``n`` kernels profiled.
 All the profiled kernels are then compiled into a single ATT file.
 If a new targeted kernel is encountered after the ``rocprofv3`` tool has finished profiling a batch of kernels,
-the profiler will restart profiling when encountering this new targeted kernel and create another ATT file with multiple kernels.
+the profiler will restart profiling when encountering this new targeted kernel and create another ATT file with multiple kernels.

 .. _output-files:

@@ -175,17 +175,17 @@ After the application finishes executing, ROCprof Trace Decoder runs automatical
 - stats_*.csv files:

 * Contains a summary of instruction latency per kernel.


 - ui_output_agent_{agent_id}_dispatch_{dispatch_id} directory:


 * Contains detailed tracing information in the form of .json files.


 * This directory can be opened using the `ROCprof Compute Viewer <https://rocm.docs.amd.com/projects/rocprof-compute-viewer/en/amd-mainline/>`_.

 - Raw files:

 * .att - Raw SQTT data. Can be used with the ROCprof Trace Decoder for further analysis.


 * .out - Code object binaries (executable). Can be used with ISA analysis tools.

 .. _csv-content:
@@ -217,28 +217,28 @@ The columns of the stats_*.csv file are described here:

 * **Latency:** Total latency in cycles, defined as "Stall time + Issue time" for gfx9 or "Stall time + Execute time" for gfx10+.

-* **Stall:** The total number of cycles the hardware pipe couldn't issue an instruction.
+* **Stall:** The total number of cycles the hardware pipe couldn't issue an instruction.

 * Usually caused when the hardware unit is busy, such as TCP or LDS backpressure.


 * **Idle:** The total time gap between the completion of previous instruction and the beginning of the current instruction. The idle time can be caused by:

 * Arbiter loss


 * Source or destination register dependency


 * Instruction cache miss


 * **Source:** The original source line of code assigned by the compiler.

 * Requires compiling with debug symbols.



 Troubleshooting
 ===============

 For some applications, stats_*.csv file could be empty even for a valid kernel dispatch.
-Thread trace is limited to a single CU per SE (``att-target-cu``). If a kernel dispatch doesn't launch enough waves to populate the whole GPU, there's a possibility of no wave getting assigned to the ``target_cu``. In such cases, there's nothing to be traced.
+Thread trace is limited to a single CU per SE (``att-target-cu``). If a kernel dispatch doesn't launch enough waves to populate the whole GPU, there's a possibility of no wave getting assigned to the ``target_cu``. In such cases, there's nothing to be traced.
 Here are some options to handle this:

 * Launch more waves.
@@ -248,10 +248,10 @@ Here are some options to handle this:
 * Set the ``--att-shader-engine-mask`` to 0x11111111, or possibly to 0xFFFFFFFF

 * A number too high can cause packet losses and/or lead to a full buffer.


 * Set the ``HSA_CU_MASK`` to mask out all CUs but the target. For more details, see `setting CUs <https://rocm.docs.amd.com/en/latest/how-to/setting-cus.html>`_.

 * If only the ``target_cu`` (or a few CUs) are not masked out, then all or most waves will be assigned to the ``target_cu``.


 * This can potentially cause low performance in high-demanding kernels.



@@ -18,10 +18,10 @@ You can utilize the ROCprofiler-SDK to develop a tool for profiling and tracing
 The code is open source and hosted at `<https://github.com/ROCm/rocm-systems/tree/develop/projects/rocprofiler-sdk>`_.

 .. note::


 The ROCprofiler-SDK repository for ROCm 7.0 and earlier is located at `<https://github.com/ROCm/rocprofiler-sdk>`_.

-ROCprofiler-SDK uses a companion library called `AQLprofile <https://rocm.docs.amd.com/projects/aqlprofile/en/latest/index.html>`__ that generates profiling command packets (AQL/PM4) for performance counters and SQ thread trace. See the `AQLprofile docs <https://rocm.docs.amd.com/projects/aqlprofile/en/latest/index.html>`__ for more info.
+ROCprofiler-SDK uses a companion library called `AQLprofile <https://rocm.docs.amd.com/projects/aqlprofile/en/latest/index.html>`_, that generates profiling command packets (AQL/PM4) for performance counters and SQ thread trace. For details, see the `AQLprofile docs <https://rocm.docs.amd.com/projects/aqlprofile/en/latest/index.html>`_.

 The documentation is structured as follows:


@@ -61,11 +61,11 @@ To use the rocTX API you need the API header and to link your application with `
 `rocTracer` library public API header.

 - `roctx.h`


 `rocTX` library public API header.

 - `src`


 Library sources.

 - `core`
@@ -138,20 +138,20 @@ To use the rocTX API you need the API header and to link your application with `
 - Clone development branch of `roctracer`:

 ```sh
-git clone -b amd-master https://github.com/ROCm-Developer-Tools/roctracer
+git clone -b develop https://github.com/ROCm/rocm-systems.git
 ```

 - To build `roctracer` library:

 ```sh
-cd <your path>/roctracer
+cd <your path>/rocm-systems/projects/roctracer
 ./build.sh
 ```

 - To build and run test:

 ```sh
-cd <your path>/roctracer/build
+cd <your path>/rocm-systems/projects/roctracer/build
 make mytest
 run.sh
 ```
