diff --git a/projects/rocprofiler-sdk/source/docs/api-reference/callback_services.rst b/projects/rocprofiler-sdk/source/docs/api-reference/callback_services.rst
index 27ce735984..810a6bc93d 100644
--- a/projects/rocprofiler-sdk/source/docs/api-reference/callback_services.rst
+++ b/projects/rocprofiler-sdk/source/docs/api-reference/callback_services.rst
@@ -339,4 +339,4 @@ Here is the general sequence of events when a code object is loaded and unloaded
 all buffers that might contain references to that code object or kernel symbol identifier before deleting the associated data.
-For a sample of code object tracing, see `samples/code_object_tracing `_.
+For a sample of code object tracing, see `samples/code_object_tracing `_.
diff --git a/projects/rocprofiler-sdk/source/docs/api-reference/counter_collection_services.rst b/projects/rocprofiler-sdk/source/docs/api-reference/counter_collection_services.rst
index ebb4334aa2..a59ad22283 100644
--- a/projects/rocprofiler-sdk/source/docs/api-reference/counter_collection_services.rst
+++ b/projects/rocprofiler-sdk/source/docs/api-reference/counter_collection_services.rst
@@ -13,7 +13,7 @@ There are two modes of counter collection service:
 - **Device counting**: In this mode, counters are collected on a device level. This mode is useful for collecting device level counters not tied to a specific kernel execution, which encompasses collecting counter values for a specific time range.
-This topic explains how to setup dispatch and device counting and use common counter collection APIs. For details on the APIs including the less commonly used counter collection APIs, see the API library. For fully functional examples of both dispatch and device counting, see `Samples `_.
+This topic explains how to setup dispatch and device counting and use common counter collection APIs. For details on the APIs including the less commonly used counter collection APIs, see the API library.
For fully functional examples of both dispatch and device counting, see `Samples `_.
 Definitions
 -----------
diff --git a/projects/rocprofiler-sdk/source/docs/api-reference/intercept_table.rst b/projects/rocprofiler-sdk/source/docs/api-reference/intercept_table.rst
index fba2d830de..edd2d7f6bb 100644
--- a/projects/rocprofiler-sdk/source/docs/api-reference/intercept_table.rst
+++ b/projects/rocprofiler-sdk/source/docs/api-reference/intercept_table.rst
@@ -96,4 +96,4 @@ Dispatch table chaining
 ROCprofiler-SDK can save the original values of the function pointers such as ``foo_fn`` in ``impl::construct_dispatch_table()`` and install its own function pointers in its place. This results in the public C API function ``foo`` calling into the ROCprofiler-SDK function pointer, which in turn, calls the original function pointer to ``impl::foo``. This phenomenon is named chaining. Once ROCprofiler-SDK makes necessary modifications to the dispatch table, tools requesting access to the raw dispatch table via ``rocprofiler_at_intercept_table_registration`` are provided the pointer to the dispatch table.
-For examples on dispatch table chaining, see `samples/intercept_table `_.
+For examples on dispatch table chaining, see `samples/intercept_table `_.
diff --git a/projects/rocprofiler-sdk/source/docs/api-reference/pc_sampling.rst b/projects/rocprofiler-sdk/source/docs/api-reference/pc_sampling.rst
index b90351e959..45b5c84499 100644
--- a/projects/rocprofiler-sdk/source/docs/api-reference/pc_sampling.rst
+++ b/projects/rocprofiler-sdk/source/docs/api-reference/pc_sampling.rst
@@ -22,7 +22,7 @@ Program Counter (PC) sampling is a profiling method that uses statistical approx
 ROCprofiler-SDK PC sampling service
 ------------------------------------
-This section describes how to use ROCProfiler-SDK PC sampling API to configure and use PC sampling service. For fully functional examples, see `Samples `_.
+This section describes how to use ROCProfiler-SDK PC sampling API to configure and use PC sampling service. For fully functional examples, see `Samples `_.
 tool_init() setup
 ++++++++++++++++++
@@ -132,7 +132,7 @@ Configure the PC sampling service on an agent with ``agent_id`` to generate samp
 .. note::
-   Multiple processes can share the same GPU agent simultaneously, so the following A->B->A problem is possible on shared systems. For example, process A can query available configurations and opt to configure the service with configuration CA. However, if process B manages to finish configuring the service with configuration CB, then process A will fail. Thus, it is advisable for process A to repeat the querying process to observe configuration CB and reuse it for configuring the PC sampling service. For more details, refer to the `Samples `_.
+   Multiple processes can share the same GPU agent simultaneously, so the following A->B->A problem is possible on shared systems. For example, process A can query available configurations and opt to configure the service with configuration CA. However, if process B manages to finish configuring the service with configuration CB, then process A will fail. Thus, it is advisable for process A to repeat the querying process to observe configuration CB and reuse it for configuring the PC sampling service. For more details, refer to the `Samples `_.
 Processing PC samples
 ----------------------
@@ -170,7 +170,7 @@ The PC sampling service asynchronously delivers samples via a dedicated callback
 }
 }
-For more information on the data comprising a single sample, see `pc_sampling.h `_.
+For more information on the data comprising a single sample, see `pc_sampling.h `_.
 .. note::
    A user can synchronously flush buffers via ``rocprofiler_buffer_flush`` that triggers ``pc_sampling_callback``.
diff --git a/projects/rocprofiler-sdk/source/docs/api-reference/thread_trace.rst b/projects/rocprofiler-sdk/source/docs/api-reference/thread_trace.rst
index a543c731d9..9c61cbe6d2 100644
--- a/projects/rocprofiler-sdk/source/docs/api-reference/thread_trace.rst
+++ b/projects/rocprofiler-sdk/source/docs/api-reference/thread_trace.rst
@@ -346,12 +346,12 @@ The Trace Decoder provides important information about the quality and comprehen
 For more information about the data structures and functions available for thread trace decoding, see the following headers:
-- `trace_decoder.h `_
+- `trace_decoder.h `_
-- `trace_decoder_types.h `_
+- `trace_decoder_types.h `_
-- `core.h `_
+- `core.h `_
-- `dispatch.h `_
+- `dispatch.h `_
-- `agent.h `_
+- `agent.h `_
diff --git a/projects/rocprofiler-sdk/source/docs/how-to/using-thread-trace.rst b/projects/rocprofiler-sdk/source/docs/how-to/using-thread-trace.rst
index aa729c6de3..ac19e13f5d 100644
--- a/projects/rocprofiler-sdk/source/docs/how-to/using-thread-trace.rst
+++ b/projects/rocprofiler-sdk/source/docs/how-to/using-thread-trace.rst
@@ -39,7 +39,7 @@ Prerequisites
 * ROCm 7.x build, or
- * Early release can be `built from source `_
+ * Early release can be `built from source `_
 * Otherwise, ``rocprofv3`` throws error "INVALID_SHADER_DATA" or "Agent not supported".
@@ -48,13 +48,13 @@ Prerequisites
 * For binary files, see `ROCprof trace decoder release page `_.
 * Default install location is ``/opt/rocm/lib``
-
+
 * For custom location, use:
 * Parameter ``--att-library-path``, or
 * Environment variable ``ROCPROF_ATT_LIBRARY_PATH``
-
+
 .. _thread-trace-parameters:
@@ -163,7 +163,7 @@ If the subsequent kernels are targeted kernels, the profiler will then profile a
 new targeted kernel, so it is possible for a generated ATT file to have more than ``n`` kernels profiled.
 All the profiled kernels are then compiled into a single ATT file.
 If a new targeted kernel is encountered after the ``rocprofv3`` tool has finished profiling a batch of kernels,
-the profiler will restart profiling when encountering this new targeted kernel and create another ATT file with multiple kernels.
+the profiler will restart profiling when encountering this new targeted kernel and create another ATT file with multiple kernels.
 .. _output-files:
@@ -175,17 +175,17 @@ After the application finishes executing, ROCprof Trace Decoder runs automatical
 - stats_*.csv files:
 * Contains a summary of instruction latency per kernel.
-
+
 - ui_output_agent_{agent_id}_dispatch_{dispatch_id} directory:
-
+
 * Contains detailed tracing information in the form of .json files.
-
+
 * This directory can be opened using the `ROCprof Compute Viewer `_.
 - Raw files:
 * .att - Raw SQTT data. Can be used with the ROCprof Trace Decoder for further analysis.
-
+
 * .out - Code object binaries (executable). Can be used with ISA analysis tools.
 .. _csv-content:
@@ -217,28 +217,28 @@ The columns of the stats_*.csv file are described here:
 * **Latency:** Total latency in cycles, defined as "Stall time + Issue time" for gfx9 or "Stall time + Execute time" for gfx10+.
-* **Stall:** The total number of cycles the hardware pipe couldn't issue an instruction.
+* **Stall:** The total number of cycles the hardware pipe couldn't issue an instruction.
 * Usually caused when the hardware unit is busy, such as TCP or LDS backpressure.
-
+
 * **Idle:** The total time gap between the completion of previous instruction and the beginning of the current instruction. The idle time can be caused by:
 * Arbiter loss
-
+
 * Source or destination register dependency
-
+
 * Instruction cache miss
-
+
 * **Source:** The original source line of code assigned by the compiler.
 * Requires compiling with debug symbols.
-
+
 Troubleshooting
 ===============
 For some applications, stats_*.csv file could be empty even for a valid kernel dispatch.
-Thread trace is limited to a single CU per SE (``att-target-cu``). If a kernel dispatch doesn't launch enough waves to populate the whole GPU, there's a possibility of no wave getting assigned to the ``target_cu``. In such cases, there's nothing to be traced.
+Thread trace is limited to a single CU per SE (``att-target-cu``). If a kernel dispatch doesn't launch enough waves to populate the whole GPU, there's a possibility of no wave getting assigned to the ``target_cu``. In such cases, there's nothing to be traced.
 Here are some options to handle this:
 * Launch more waves.
@@ -248,10 +248,10 @@ Here are some options to handle this:
 * Set the ``--att-shader-engine-mask`` to 0x11111111, or possibly to 0xFFFFFFFF
 * A number too high can cause packet losses and/or lead to a full buffer.
-
+
 * Set the ``HSA_CU_MASK`` to mask out all CUs but the target. For more details, see `setting CUs `_.
 * If only the ``target_cu`` (or a few CUs) are not masked out, then all or most waves will be assigned to the ``target_cu``.
-
+
 * This can potentially cause low performance in high-demanding kernels.
-
+
diff --git a/projects/rocprofiler-sdk/source/docs/index.rst b/projects/rocprofiler-sdk/source/docs/index.rst
index 7ef987cab8..34f2bb4329 100644
--- a/projects/rocprofiler-sdk/source/docs/index.rst
+++ b/projects/rocprofiler-sdk/source/docs/index.rst
@@ -18,10 +18,10 @@ You can utilize the ROCprofiler-SDK to develop a tool for profiling and tracing
 The code is open source and hosted at ``_.
 .. note::
-
+
    The ROCprofiler-SDK repository for ROCm 7.0 and earlier is located at ``_.
-ROCprofiler-SDK uses a companion library called `AQLprofile `__ that generates profiling command packets (AQL/PM4) for performance counters and SQ thread trace. See the `AQLprofile docs `__ for more info.
+ROCprofiler-SDK uses a companion library called `AQLprofile `_, that generates profiling command packets (AQL/PM4) for performance counters and SQ thread trace. For details, see the `AQLprofile docs `_.
 The documentation is structured as follows:
diff --git a/projects/roctracer/README.md b/projects/roctracer/README.md
index dbded634b7..49bb73658f 100644
--- a/projects/roctracer/README.md
+++ b/projects/roctracer/README.md
@@ -61,11 +61,11 @@ To use the rocTX API you need the API header and to link your application with `
 `rocTracer` library public API header.
 - `roctx.h`
-
+
 `rocTX` library public API header.
 - `src`
-
+
 Library sources.
 - `core`
@@ -138,20 +138,20 @@ To use the rocTX API you need the API header and to link your application with `
 - Clone development branch of `roctracer`:
 ```sh
- git clone -b amd-master https://github.com/ROCm-Developer-Tools/roctracer
+ git clone -b develop https://github.com/ROCm/rocm-systems.git
 ```
 - To build `roctracer` library:
 ```sh
- cd /roctracer
+ cd /rocm-systems/projects/roctracer
 ./build.sh
 ```
 - To build and run test:
 ```sh
- cd /roctracer/build
+ cd /rocm-systems/projects/roctracer/build
 make mytest
 run.sh
 ```