Adding ROCTx usage doc (#159)
* Adding Roctx usage doc * updated CHANGELOG * dpc update * Fixing Related Pages issue * updating doc * updating docs * Adding Resource naming section * Fixed Formatting * format fix * format fix * Fixing build due to incorrect indentation
This commit is contained in:
@@ -153,6 +153,7 @@ Full documentation for ROCprofiler-SDK is available at [rocm.docs.amd.com/projec
|
||||
- `--collection-period-unit` feature added in rocprofv3, to allow the user to control time units used in collection period option.
|
||||
- Added deprecation notice for rocprofiler(v1) and rocprofiler(v2).
|
||||
- Added support for rocDecode API Tracing
|
||||
- Added usage documentation for ROCTx
|
||||
|
||||
### Changed
|
||||
|
||||
|
||||
@@ -12,6 +12,7 @@ subtrees:
|
||||
- caption: How to
|
||||
entries:
|
||||
- file: how-to/using-rocprofv3
|
||||
- file: how-to/using-rocprofiler-sdk-roctx
|
||||
- file: how-to/samples
|
||||
- caption: API reference
|
||||
entries:
|
||||
|
||||
@@ -0,0 +1,182 @@
|
||||
.. meta::
|
||||
:description: Documentation for the usage of rocprofiler-sdk-roctx library
|
||||
:keywords: ROCprofiler-SDK tool, using-rocprofiler-sdk-roctx library, roctx, markers, ranges, rocprofv3, rocprofv3 tool usage, Using rocprofv3, ROCprofiler-SDK command line tool, marker-trace
|
||||
|
||||
.. _using-rocprofiler-sdk-roctx:
|
||||
|
||||
=============================================
|
||||
Using ``ROCTx`` (AMD Tools Extension Library)
|
||||
=============================================
|
||||
|
||||
``ROCtx`` is AMD's cross platform API for annotating code with markers and ranges. The ``ROCTx`` API is written in C++.
|
||||
In certain situations, such as debugging performance issues in large-scale GPU programs, API-level tracing might be too fine-grained to provide a big picture of the program execution.
|
||||
In such cases, it is helpful to define specific tasks to be traced.To specify the tasks for tracing, enclose the respective source code with the API calls provided by the ``ROCTx`` library.
|
||||
This process is also known as instrumentation.
|
||||
|
||||
What kinds of annotation does ``ROCTx`` provide?
|
||||
+++++++++++++++++++++++++++++++++++++++++++++++++
|
||||
``ROCTx`` provides two types of annotations: markers and ranges.
|
||||
|
||||
Markers:
|
||||
========
|
||||
Helps you inserts a marker in the code with a message. Creating markers help you see when a line of code is executed.
|
||||
|
||||
Ranges:
|
||||
=======
|
||||
The scope of code for instrumentation is defined using the enclosing API calls, it is called a range.
|
||||
A range is a programmer-defined task that has a well-defined start and end code scope.
|
||||
You can also refine the scope specified within a range using further nested ranges. ``rocprofv3`` also reports the timelines for these nested ranges.
|
||||
There are two types of ranges:
|
||||
|
||||
1. **Push/Pop ranges**:
|
||||
- These can be nested to form a stack.
|
||||
- The Pop call is automatically associated with a prior Push call on the same thread.
|
||||
|
||||
2. **Start/End ranges**:
|
||||
- These may overlap with other ranges arbitrarily.
|
||||
- The Start call returns a handle which must be passed to the End call.
|
||||
- These ranges can start and end on different threads.
|
||||
|
||||
List of APIs supported by ``ROCTx``
|
||||
===================================
|
||||
Here is a list of useful APIs for code instrumentation.
|
||||
|
||||
- ``roctxMark``: Inserts a marker in the code with a message. Creating marks help you see when a line of code is executed.
|
||||
- ``roctxRangeStart``: Starts a range. Different threads can start ranges.
|
||||
- ``roctxRangePush``: Starts a new nested range.
|
||||
- ``roctxRangePop``: Stops the current nested range.
|
||||
- ``roctxRangeStop``: Stops the given range.
|
||||
- ``roctxProfilerPause``: Request any currently running profiling tool that it should stop collecting data.
|
||||
- ``roctxProfilerResume``: Request any currently running profiling tool that it should resume collecting data.
|
||||
- ``roctxGetThreadId``: Retrieve a id value for the current thread which will be identical to the id value a profiling tool gets via `rocprofiler_get_thread_id(rocprofiler_thread_id_t*)`.
|
||||
- ``roctxNameOsThread``: Current CPU OS thread to be labeled by the provided name in the output of the profiling tool.
|
||||
- ``roctxNameHsaAgent``: Given HSA agent to be labeled by the provided name in the output of the profiling tool.
|
||||
- ``roctxNameHipDevice``: Given HIP device id to be labeled by the provided name in the output of the profiling tool.
|
||||
- ``roctxNameHipStream``: Given HIP stream to be labeled by the provided name in the output of the profiling tool.
|
||||
|
||||
|
||||
How to use ``ROCTx`` in your application?
|
||||
===========================================
|
||||
|
||||
See how to use ``ROCTx`` APIs in the MatrixTranspose application below:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
#include <rocprofiler-sdk-roctx/roctx.h>
|
||||
|
||||
roctxMark("before hipLaunchKernel");
|
||||
int rangeId = roctxRangeStart("hipLaunchKernel range");
|
||||
roctxRangePush("hipLaunchKernel");
|
||||
|
||||
// Launching kernel from host
|
||||
hipLaunchKernelGGL(matrixTranspose, dim3(WIDTH/THREADS_PER_BLOCK_X, WIDTH/THREADS_PER_BLOCK_Y), dim3(THREADS_PER_BLOCK_X, THREADS_PER_BLOCK_Y), 0,0,gpuTransposeMatrix,gpuMatrix, WIDTH);
|
||||
|
||||
roctxMark("after hipLaunchKernel");
|
||||
|
||||
// Memory transfer from device to host
|
||||
roctxRangePush("hipMemcpy");
|
||||
|
||||
hipMemcpy(TransposeMatrix, gpuTransposeMatrix, NUM * sizeof(float), hipMemcpyDeviceToHost);
|
||||
|
||||
roctxRangePop(); // for "hipMemcpy"
|
||||
roctxRangePop(); // for "hipLaunchKernel"
|
||||
roctxRangeStop(rangeId);
|
||||
|
||||
To trace the API calls enclosed within the range, use:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocprofv3 --marker-trace -- <application_path>
|
||||
|
||||
Running the preceding command generates a ``marker_api_trace.csv`` file prefixed with the process ID.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ cat 210_marker_api_trace.csv
|
||||
|
||||
Here are the contents of ``marker_api_trace.csv`` file:
|
||||
|
||||
.. csv-table:: Marker api trace
|
||||
:file: /data/marker_api_trace.csv
|
||||
:widths: 10,10,10,10,10,20,20
|
||||
:header-rows: 1
|
||||
|
||||
For the description of the fields in the output file, see :ref:`output-file-fields`.
|
||||
|
||||
``roctxProfilerPause`` and ``roctxProfilerResume`` can be used to hide the calls between them. This is useful when you want to hide the calls that are not relevant to your profiling session.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
#include <rocprofiler-sdk-roctx/roctx.h>
|
||||
|
||||
// Memory transfer from host to device
|
||||
HIP_API_CALL(hipMemcpy(gpuMatrix, Matrix, NUM * sizeof(float), hipMemcpyHostToDevice));
|
||||
|
||||
auto tid = roctx_thread_id_t{};
|
||||
roctxGetThreadId(&tid);
|
||||
roctxProfilerPause(tid);
|
||||
// Memory transfer that should be hidden by profiling tool
|
||||
HIP_API_CALL(
|
||||
hipMemcpy(gpuTransposeMatrix, gpuMatrix, NUM * sizeof(float), hipMemcpyDeviceToDevice));
|
||||
roctxProfilerResume(tid);
|
||||
|
||||
// Launching kernel from host
|
||||
hipLaunchKernelGGL(matrixTranspose,
|
||||
dim3(WIDTH / THREADS_PER_BLOCK_X, WIDTH / THREADS_PER_BLOCK_Y),
|
||||
dim3(THREADS_PER_BLOCK_X, THREADS_PER_BLOCK_Y),
|
||||
0,
|
||||
0,
|
||||
gpuTransposeMatrix,
|
||||
gpuMatrix,
|
||||
WIDTH);
|
||||
|
||||
// Memory transfer from device to host
|
||||
HIP_API_CALL(
|
||||
hipMemcpy(TransposeMatrix, gpuTransposeMatrix, NUM * sizeof(float), hipMemcpyDeviceToHost));
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 --marker-trace --hip-trace -- <application_path>
|
||||
|
||||
The above command generates a ``hip_api_trace.csv`` file prefixed with the process ID, which has only 2 `hipMemcpy` calls and the in between ``hipMemcpyDeviceToHost`` is hidden .
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
"Domain","Function","Process_Id","Thread_Id","Correlation_Id","Start_Timestamp","End_Timestamp"
|
||||
"HIP_COMPILER_API","__hipRegisterFatBinary",1643920,1643920,1,320301257609216,320301257636427
|
||||
"HIP_COMPILER_API","__hipRegisterFunction",1643920,1643920,2,320301257650707,320301257678857
|
||||
"HIP_RUNTIME_API","hipGetDevicePropertiesR0600",1643920,1643920,4,320301258114239,320301337764472
|
||||
"HIP_RUNTIME_API","hipMalloc",1643920,1643920,5,320301338073823,320301338247374
|
||||
"HIP_RUNTIME_API","hipMalloc",1643920,1643920,6,320301338248284,320301338399595
|
||||
"HIP_RUNTIME_API","hipMemcpy",1643920,1643920,7,320301338410995,320301631549262
|
||||
"HIP_COMPILER_API","__hipPushCallConfiguration",1643920,1643920,10,320301632131175,320301632134215
|
||||
"HIP_COMPILER_API","__hipPopCallConfiguration",1643920,1643920,11,320301632137745,320301632139735
|
||||
"HIP_RUNTIME_API","hipLaunchKernel",1643920,1643920,12,320301632142615,320301632898289
|
||||
"HIP_RUNTIME_API","hipMemcpy",1643920,1643920,14,320301632901249,320301633934395
|
||||
"HIP_RUNTIME_API","hipFree",1643920,1643920,15,320301643320908,320301643511479
|
||||
"HIP_RUNTIME_API","hipFree",1643920,1643920,16,320301643512629,320301643585639
|
||||
|
||||
Resource Naming:
|
||||
++++++++++++++++
|
||||
|
||||
OS Thread:
|
||||
==========
|
||||
|
||||
:code:`roctxNameOsThread(const char*)` function Current CPU OS thread to be labeled by the provided name in the output of the profiling tool.
|
||||
|
||||
Indicate to a profiling tool that, where possible, you would like the current CPU OS thread to be labeled by the provided name in the output of the profiling tool.
|
||||
Rocprofiler does not provide any explicit support for how profiling tools handle this request:
|
||||
- support for this capability is tool specific.
|
||||
- ROCTx does NOT rename the thread via `pthread_setname_np`.
|
||||
|
||||
HIP Runtime Resources:
|
||||
======================
|
||||
|
||||
:code:`roctxNameHipDevice(const char* name, int device_id)` and :code:`roctxNameHipStream(const char* name, const struct ihipStream_t* stream)` functions indicate to a profiling tool that, where possible, you would like the given HIP device id and HIP stream-id to be labeled by the provided name in the output of the profiling tool. Rocprofiler does not provide any explicit support for how profiling tools handle this request:
|
||||
- support for this capability is tool specific.
|
||||
|
||||
HSA Runtime Resources:
|
||||
======================
|
||||
|
||||
:code:`roctxNameHsaAgent(const char* name, const struct hsa_agent_s*)` function indicates to a profiling tool that, where possible, you would like the given HSA agent to be labeled by the provided name in the output of the profiling tool.
|
||||
- support for this capability is tool specific.
|
||||
@@ -281,129 +281,13 @@ For the description of the fields in the output file, see :ref:`output-file-fiel
|
||||
Marker trace
|
||||
++++++++++++++
|
||||
|
||||
In certain situations, such as debugging performance issues in large-scale GPU programs, API-level tracing might be too fine-grained to provide a big picture of the program execution. In such cases, it is helpful to define specific tasks to be traced.
|
||||
|
||||
To specify the tasks for tracing, enclose the respective source code with the API calls provided by the ``ROCTx`` library. This process is also known as instrumentation. As the scope of code for instrumentation is defined using the enclosing API calls, it is called a range. A range is a programmer-defined task that has a well-defined start and end code scope. You can also refine the scope specified within a range using further nested ranges. ``rocprofv3`` also reports the timelines for these nested ranges.
|
||||
|
||||
Here is a list of useful APIs for code instrumentation.
|
||||
|
||||
- ``roctxMark``: Inserts a marker in the code with a message. Creating marks help you see when a line of code is executed.
|
||||
- ``roctxRangeStart``: Starts a range. Different threads can start ranges.
|
||||
- ``roctxRangePush``: Starts a new nested range.
|
||||
- ``roctxRangePop``: Stops the current nested range.
|
||||
- ``roctxRangeStop``: Stops the given range.
|
||||
- ``roctxProfilerPause``: Request any currently running profiling tool that it should stop collecting data.
|
||||
- ``roctxProfilerResume``: Request any currently running profiling tool that it should resume collecting data.
|
||||
- ``roctxGetThreadId``: Retrieve a id value for the current thread which will be identical to the id value a profiling tool gets via `rocprofiler_get_thread_id(rocprofiler_thread_id_t*)`.
|
||||
- ``roctxNameOsThread``: Current CPU OS thread to be labeled by the provided name in the output of the profiling tool.
|
||||
- ``roctxNameHsaAgent``: Given HSA agent to be labeled by the provided name in the output of the profiling tool.
|
||||
- ``roctxNameHipDevice``: Given HIP device id to be labeled by the provided name in the output of the profiling tool.
|
||||
- ``roctxNameHipStream``: Given HIP stream to be labeled by the provided name in the output of the profiling tool.
|
||||
|
||||
|
||||
.. note::
|
||||
To use ``rocprofv3`` for marker tracing, including and linking to old ROCTx works but it is recommended to switch to new ROCTx because
|
||||
it has been extended with new APIs.
|
||||
To use new ROCTx, please include header ``"rocprofiler-sdk-roctx/roctx.h"`` and link your application with ``librocprofiler-sdk-roctx.so``.
|
||||
Above list of APIs is not exhaustive. See public header file ``"rocprofiler-sdk-roctx/roctx.h"`` for full list.
|
||||
|
||||
See how to use ``ROCTx`` APIs in the MatrixTranspose application below:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
#include <rocprofiler-sdk-roctx/roctx.h>
|
||||
|
||||
roctxMark("before hipLaunchKernel");
|
||||
int rangeId = roctxRangeStart("hipLaunchKernel range");
|
||||
roctxRangePush("hipLaunchKernel");
|
||||
|
||||
// Launching kernel from host
|
||||
hipLaunchKernelGGL(matrixTranspose, dim3(WIDTH/THREADS_PER_BLOCK_X, WIDTH/THREADS_PER_BLOCK_Y), dim3(THREADS_PER_BLOCK_X, THREADS_PER_BLOCK_Y), 0,0,gpuTransposeMatrix,gpuMatrix, WIDTH);
|
||||
|
||||
roctxMark("after hipLaunchKernel");
|
||||
|
||||
// Memory transfer from device to host
|
||||
roctxRangePush("hipMemcpy");
|
||||
|
||||
hipMemcpy(TransposeMatrix, gpuTransposeMatrix, NUM * sizeof(float), hipMemcpyDeviceToHost);
|
||||
|
||||
roctxRangePop(); // for "hipMemcpy"
|
||||
roctxRangePop(); // for "hipLaunchKernel"
|
||||
roctxRangeStop(rangeId);
|
||||
|
||||
To trace the API calls enclosed within the range, use:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocprofv3 --marker-trace -- <application_path>
|
||||
|
||||
Running the preceding command generates a ``marker_api_trace.csv`` file prefixed with the process ID.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ cat 210_marker_api_trace.csv
|
||||
|
||||
Here are the contents of ``marker_api_trace.csv`` file:
|
||||
|
||||
.. csv-table:: Marker api trace
|
||||
:file: /data/marker_api_trace.csv
|
||||
:widths: 10,10,10,10,10,20,20
|
||||
:header-rows: 1
|
||||
|
||||
For the description of the fields in the output file, see :ref:`output-file-fields`.
|
||||
|
||||
``roctxProfilerPause`` and ``roctxProfilerResume`` can be used to hide the calls between them. This is useful when you want to hide the calls that are not relevant to your profiling session.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
#include <rocprofiler-sdk-roctx/roctx.h>
|
||||
|
||||
// Memory transfer from host to device
|
||||
HIP_API_CALL(hipMemcpy(gpuMatrix, Matrix, NUM * sizeof(float), hipMemcpyHostToDevice));
|
||||
|
||||
auto tid = roctx_thread_id_t{};
|
||||
roctxGetThreadId(&tid);
|
||||
roctxProfilerPause(tid);
|
||||
// Memory transfer that should be hidden by profiling tool
|
||||
HIP_API_CALL(
|
||||
hipMemcpy(gpuTransposeMatrix, gpuMatrix, NUM * sizeof(float), hipMemcpyDeviceToDevice));
|
||||
roctxProfilerResume(tid);
|
||||
|
||||
// Lauching kernel from host
|
||||
hipLaunchKernelGGL(matrixTranspose,
|
||||
dim3(WIDTH / THREADS_PER_BLOCK_X, WIDTH / THREADS_PER_BLOCK_Y),
|
||||
dim3(THREADS_PER_BLOCK_X, THREADS_PER_BLOCK_Y),
|
||||
0,
|
||||
0,
|
||||
gpuTransposeMatrix,
|
||||
gpuMatrix,
|
||||
WIDTH);
|
||||
|
||||
// Memory transfer from device to host
|
||||
HIP_API_CALL(
|
||||
hipMemcpy(TransposeMatrix, gpuTransposeMatrix, NUM * sizeof(float), hipMemcpyDeviceToHost));
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 --marker-trace --hip-trace -- <application_path>
|
||||
|
||||
The above command generates a ``hip_api_trace.csv`` file prefixed with the process ID, which has only 2 `hipMemcpy` calls and the in between ``hipMemcpyDeviceToHost`` is hidden .
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
"Domain","Function","Process_Id","Thread_Id","Correlation_Id","Start_Timestamp","End_Timestamp"
|
||||
"HIP_COMPILER_API","__hipRegisterFatBinary",1643920,1643920,1,320301257609216,320301257636427
|
||||
"HIP_COMPILER_API","__hipRegisterFunction",1643920,1643920,2,320301257650707,320301257678857
|
||||
"HIP_RUNTIME_API","hipGetDevicePropertiesR0600",1643920,1643920,4,320301258114239,320301337764472
|
||||
"HIP_RUNTIME_API","hipMalloc",1643920,1643920,5,320301338073823,320301338247374
|
||||
"HIP_RUNTIME_API","hipMalloc",1643920,1643920,6,320301338248284,320301338399595
|
||||
"HIP_RUNTIME_API","hipMemcpy",1643920,1643920,7,320301338410995,320301631549262
|
||||
"HIP_COMPILER_API","__hipPushCallConfiguration",1643920,1643920,10,320301632131175,320301632134215
|
||||
"HIP_COMPILER_API","__hipPopCallConfiguration",1643920,1643920,11,320301632137745,320301632139735
|
||||
"HIP_RUNTIME_API","hipLaunchKernel",1643920,1643920,12,320301632142615,320301632898289
|
||||
"HIP_RUNTIME_API","hipMemcpy",1643920,1643920,14,320301632901249,320301633934395
|
||||
"HIP_RUNTIME_API","hipFree",1643920,1643920,15,320301643320908,320301643511479
|
||||
"HIP_RUNTIME_API","hipFree",1643920,1643920,16,320301643512629,320301643585639
|
||||
|
||||
To see usage of ``ROCTx/marker`` library, see :ref:`using-rocprofiler-sdk-roctx`.
|
||||
|
||||
Kernel Rename
|
||||
++++++++++++++
|
||||
|
||||
@@ -32,6 +32,7 @@ The documentation is structured as follows:
|
||||
.. grid-item-card:: How to
|
||||
|
||||
* :ref:`using-rocprofv3`
|
||||
* :ref:`using-rocprofiler-sdk-roctx`
|
||||
* :doc:`Samples <how-to/samples>`
|
||||
|
||||
.. grid-item-card:: API reference
|
||||
|
||||
@@ -108,7 +108,7 @@ CITE_BIB_FILES =
|
||||
#---------------------------------------------------------------------------
|
||||
# Configuration options related to warning and progress messages
|
||||
#---------------------------------------------------------------------------
|
||||
QUIET = NO
|
||||
QUIET = YES
|
||||
WARNINGS = YES
|
||||
WARN_IF_UNDOCUMENTED = YES
|
||||
WARN_IF_DOC_ERROR = YES
|
||||
@@ -139,12 +139,17 @@ FILE_PATTERNS = *.h \
|
||||
*.tcc \
|
||||
conf.py
|
||||
RECURSIVE = YES
|
||||
EXCLUDE =
|
||||
EXCLUDE = @SOURCE_DIR@/README.md
|
||||
EXCLUDE_SYMLINKS = YES
|
||||
EXCLUDE_PATTERNS = */.git/* \
|
||||
@SOURCE_DIR@/**/tests/* \
|
||||
@SOURCE_DIR@/**/scripts/* \
|
||||
@SOURCE_DIR@/**/docs/*
|
||||
@SOURCE_DIR@/**/docs/* \
|
||||
@SOURCE_DIR@/**/cmake/* \
|
||||
@SOURCE_DIR@/**/external/* \
|
||||
@SOURCE_DIR@/**/RPM/* \
|
||||
@SOURCE_DIR@/**/ISSUE_TEMPLATE/*
|
||||
|
||||
EXCLUDE_SYMBOLS = "std::*" \
|
||||
"ROCPROFILER_ATTRIBUTE" \
|
||||
"ROCPROFILER_API" \
|
||||
@@ -332,7 +337,8 @@ MACRO_EXPANSION = YES
|
||||
EXPAND_ONLY_PREDEF = NO
|
||||
SEARCH_INCLUDES = NO
|
||||
INCLUDE_PATH = @SOURCE_DIR@/source/include
|
||||
INCLUDE_FILE_PATTERNS = *.h
|
||||
INCLUDE_FILE_PATTERNS = *.h \
|
||||
*.hpp
|
||||
PREDEFINED = "ROCPROFILER_API=" \
|
||||
"ROCPROFILER_EXPORT=" \
|
||||
"ROCPROFILER_IMPORT=" \
|
||||
|
||||
Reference in New Issue
Block a user