SDK doc updates (#1183)
* correcting usage example
* rccl trace
* Adding Navi power state limitation
* Addressed feedback
* kernel-rename
* kokkos trace
* more information on kokkos tracing
* Correcting tool library hardcoding
* summary domains
* Updating domain stats file
* updating images
* rocprofv3 default behavior update
* Removing README from API documentation
* Added missing description in Topics
* Fixed wrong rendering of README in API document
* Fixing Topics in API docs
* Removing API doc for details/rccl.h
* Addressed review comments
[ROCm/rocprofiler-sdk commit: 7ea9ced493]
This commit is contained in:
committed by
GitHub
parent
e11b553a26
commit
5bea1772ea
@@ -73,13 +73,24 @@ Please report in the Github Issues.
|
||||
- **Need for Cold Restart**: In the event of a hardware freeze, you may need to perform a cold restart (turning the hardware off and on) to restore normal operations.
|
||||
Please use this beta feature cautiously. It may affect your system's stability and performance. Proceed at your own risk.
|
||||
|
||||
- At this point, We do not recommend stress-testing the beta implementation.
|
||||
- At this point, we do not recommend stress-testing the beta implementation.
|
||||
|
||||
- Correlation IDs provided by the PC sampling service are verified only for HIP API calls.
|
||||
- Correlation IDs provided by the PC sampling service are verified only for HIP API calls.
|
||||
|
||||
- Timestamps in PC sampling records might not be 100% accurate.
|
||||
- Timestamps in PC sampling records might not be 100% accurate.
|
||||
|
||||
- Using PC sampling on multi-threaded applications might fail with `HSA_STATUS_ERROR_EXCEPTION`.Furthermore, if three or more threads launch operations to the same agent, and if PC sampling is enabled, the `HSA_STATUS_ERROR_EXCEPTION` might appear.
|
||||
- Using PC sampling on multi-threaded applications might fail with `HSA_STATUS_ERROR_EXCEPTION`. Furthermore, if three or more threads launch operations to the same agent while PC sampling is enabled, the `HSA_STATUS_ERROR_EXCEPTION` might appear.
|
||||
|
||||
- Navi3x requires a stable power state for counter collection.
|
||||
Currently, this state needs to be set by the user.
|
||||
To do so, make `power_dpm_force_performance_level` writable for non-root users, then set the performance level to `profile_standard`:
|
||||
|
||||
```bash
|
||||
sudo chmod 777 /sys/class/drm/card0/device/power_dpm_force_performance_level
|
||||
echo profile_standard >> /sys/class/drm/card0/device/power_dpm_force_performance_level
|
||||
```
|
||||
|
||||
Recommended: `profile_standard` for counter collection and `auto` for all other profiling. Use rocm-smi to verify the current power state. For multi-GPU systems (including systems with integrated graphics), replace `card0` with the desired card.
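
As a rough sketch of the steps above for a system with more than one card, the helper below writes a performance level for a chosen card and prints the value back. The function name `set_perf_level` and its optional third argument (an alternative sysfs root, handy for a dry run against a scratch directory) are assumptions made for illustration; they are not part of ROCm.

```shell
# Hypothetical helper (not part of ROCm): set_perf_level <card> <level> [sysfs_root]
set_perf_level() {
    card="$1"
    level="$2"
    root="${3:-/sys/class/drm}"   # assumption: same sysfs location as in the commands above
    file="$root/$card/device/power_dpm_force_performance_level"

    # The file must already be writable, for example after the chmod step above.
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi

    echo "$level" > "$file"
    cat "$file"                   # print the level that is now stored
}

# Example call (commented out so that nothing is changed by pasting this block):
# set_perf_level card1 profile_standard
```

Calling `set_perf_level card0 auto` after counter collection would restore the level recommended above for other profiling.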
|
||||
|
||||
> [!WARNING]
|
||||
> The latest mainline version of AQLprofile can be found at [https://repo.radeon.com/rocm/misc/aqlprofile/](https://repo.radeon.com/rocm/misc/aqlprofile/). However, it's important to note that updates to the public AQLProfile may not occur as frequently as updates to the rocprofiler-sdk. This discrepancy could lead to a potential mismatch between the AQLprofile binary and the rocprofiler-sdk source.
|
||||
|
||||
@@ -7,7 +7,7 @@ myst:
|
||||
|
||||
# ROCprofiler-SDK tool library
|
||||
|
||||
The tool library utilizes APIs from `rocprofiler-sdk` and `rocprofiler-register` libraries for profiling and tracing HIP applications. This document provides information to help you design a tool by utilizing the `rocprofiler-sdk` and `rocprofiler-register` libraries efficiently. The command-line tool `rocprofv3` is also built on `librocprofiler-sdk-tool.so.0.4.0`, which uses these libraries.
|
||||
The tool library utilizes APIs from `rocprofiler-sdk` and `rocprofiler-register` libraries for profiling and tracing HIP applications. This document provides information to help you design a tool by utilizing the `rocprofiler-sdk` and `rocprofiler-register` libraries efficiently. The command-line tool `rocprofv3` is also built on `librocprofiler-sdk-tool.so.X.Y.Z`, which uses these libraries.
|
||||
|
||||
## ROCm runtimes design
|
||||
|
||||
|
||||
+7
-1
@@ -383,4 +383,10 @@ ROCprofiler-SDK introduces a new command-line tool, `rocprofv3`, which is a more
|
||||
Timing Difference Between rocprofv3 and rocprofv1/v2
|
||||
========================================================
|
||||
|
||||
Rocprofv3 has improved the accuracy of timing information by reducing the tool overhead required to collect data and reducing the interference to the timing of the kernel being measured. The result of this work is a reduction in variance of kernel times received for the same kernel execution and more accurate timing in general. These changes have not been backported (and will not be backported) to rocprofv1/v2, so there can be substantial (20%) differences in execution time reported by v1/v2 vs v3 for a single kernel execution. Over a large number of samples of the same kernel, the difference in average execution time is in the low single digit percentage time with a much tighter variance of results on rocprofv3. We have included testing in the test suite to verify the timing information outputted by rocprofv3 to ensure that the values we are returning are accurate.
|
||||
``rocprofv3`` has improved the accuracy of timing information by reducing the tool overhead required to collect data and reducing the interference to the timing of the kernel being measured. The result of this work is a reduction in variance of kernel times received for the same kernel execution and more accurate timing in general. These changes have not been backported (and will not be backported) to rocprofv1/v2, so there can be substantial (20%) differences in execution time reported by v1/v2 vs v3 for a single kernel execution. Over a large number of samples of the same kernel, the difference in average execution time is in the low single digit percentage time with a much tighter variance of results on rocprofv3. We have included testing in the test suite to verify the timing information outputted by rocprofv3 to ensure that the values we are returning are accurate.
|
||||
|
||||
========================================================
|
||||
Default run of rocprofv3 and rocprofv1/v2
|
||||
========================================================
|
||||
|
||||
``rocprofv3`` behaves differently from rocprofv1/v2 when run without any options. By default, rocprofv3 collects information about all available agents on the system and writes it in CSV format, whereas rocprofv1/v2 wrote kernel traces in CSV format. To obtain kernel traces with rocprofv3, use the ``--kernel-trace`` option.
|
||||
|
||||
@@ -0,0 +1,2 @@
|
||||
"Name","Calls","TotalDurationNs","AverageNs","Percentage","MinNs","MaxNs","StdDev"
|
||||
"HIP_API",13,458514859,35270373.769231,100.00,2300,352276613,99315857.546240
|
||||
|
@@ -0,0 +1,22 @@
|
||||
"Domain","Function","Process_Id","Thread_Id","Correlation_Id","Start_Timestamp","End_Timestamp"
|
||||
"RCCL_API","ncclGetVersion",1834151,1834151,416,18413845573432,18413845577374
|
||||
"RCCL_API","ncclGetUniqueId",1834151,1834151,1116,18413961300878,18413963267869
|
||||
"RCCL_API","ncclGetUniqueId",1834151,1834151,1481,18414166449182,18414166720831
|
||||
"RCCL_API","ncclGroupStart",1834151,1834151,1482,18414166723772,18414166726834
|
||||
"RCCL_API","ncclGroupEnd",1834151,1834151,1490,18414166823575,18414380520973
|
||||
"RCCL_API","ncclCommInitAll",1834151,1834151,1477,18414166402665,18414380522536
|
||||
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89098,18414380660695,18414380661652
|
||||
"RCCL_API","ncclAllReduce",1834151,1834151,89097,18414380653860,18414380693574
|
||||
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89108,18414380694631,18414380694659
|
||||
"RCCL_API","ncclAllReduce",1834151,1834151,89107,18414380694212,18414380704722
|
||||
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89117,18414380706650,18414380706677
|
||||
"RCCL_API","ncclAllReduce",1834151,1834151,89116,18414380705574,18414380715055
|
||||
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89126,18414380715749,18414380715774
|
||||
"RCCL_API","ncclAllReduce",1834151,1834151,89125,18414380715463,18414380723944
|
||||
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89135,18414380724688,18414380724715
|
||||
"RCCL_API","ncclAllReduce",1834151,1834151,89134,18414380724395,18414380732209
|
||||
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89154,18414380746383,18414380746411
|
||||
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89157,18414380749863,18414380749889
|
||||
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89160,18414380751671,18414380751696
|
||||
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89163,18414380753326,18414380753353
|
||||
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89166,18414380755128,18414380755154
|
||||
|
Binary data
Binary file not shown.
|
After Width: | Height: | Size: 35 KiB |
Binary data
Binary file not shown.
|
After Width: | Height: | Size: 11 KiB |
Binary file not shown.
|
After Width: | Height: | Size: 51 KiB |
@@ -173,7 +173,7 @@ To use ``rocprofv3`` for application tracing, run:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocprofv3 <tracing_option> -- <app_relative_path>
|
||||
rocprofv3 <tracing_option> -- <application_path>
|
||||
|
||||
HIP trace
|
||||
+++++++++++
|
||||
@@ -184,7 +184,7 @@ To trace HIP runtime APIs, use:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocprofv3 --hip-trace -- < app_relative_path >
|
||||
rocprofv3 --hip-trace -- <application_path>
|
||||
|
||||
The above command generates a ``hip_api_trace.csv`` file prefixed with the process ID.
|
||||
|
||||
@@ -203,7 +203,7 @@ To trace HIP compile time APIs, use:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 --hip-compiler-trace -- < app_relative_path >
|
||||
rocprofv3 --hip-compiler-trace -- <application_path>
|
||||
|
||||
The above command generates a ``hip_api_trace.csv`` file prefixed with the process ID.
|
||||
|
||||
@@ -229,7 +229,7 @@ HSA trace contains the start and end time of HSA runtime API calls and their asy
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocprofv3 --hsa-trace -- < app_relative_path >
|
||||
rocprofv3 --hsa-trace -- <application_path>
|
||||
|
||||
The above command generates a ``hsa_api_trace.csv`` file prefixed with the process ID. Note that the contents of this file have been truncated for demonstration purposes.
|
||||
|
||||
@@ -295,7 +295,7 @@ To trace the API calls enclosed within the range, use:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocprofv3 --marker-trace -- < app_relative_path >
|
||||
rocprofv3 --marker-trace -- <application_path>
|
||||
|
||||
Running the preceding command generates a ``marker_api_trace.csv`` file prefixed with the process ID.
|
||||
|
||||
@@ -312,6 +312,74 @@ Here are the contents of ``marker_api_trace.csv`` file:
|
||||
|
||||
For the description of the fields in the output file, see :ref:`output-file-fields`.
|
||||
|
||||
Kernel Rename
|
||||
++++++++++++++
|
||||
|
||||
The ``--kernel-rename`` option renames kernels with the message of their enclosing ``roctxRangePush``/``roctxRangePop`` range. This option was known as ``--roctx-rename`` in earlier rocprof versions.

The following code snippet shows ROCTx ranges that the ``--kernel-rename`` option can use:

.. code-block:: cpp

    #include <rocprofiler-sdk-roctx/roctx.h>

    roctxRangePush("HIP_Kernel-1");

    // Launching kernel from host
    hipLaunchKernelGGL(matrixTranspose, dim3(WIDTH/THREADS_PER_BLOCK_X, WIDTH/THREADS_PER_BLOCK_Y), dim3(THREADS_PER_BLOCK_X, THREADS_PER_BLOCK_Y), 0,0,gpuTransposeMatrix,gpuMatrix, WIDTH);

    // Memory transfer from device to host
    roctxRangePush("hipMemCpy-DeviceToHost");

    hipMemcpy(TransposeMatrix, gpuTransposeMatrix, NUM * sizeof(float), hipMemcpyDeviceToHost);

    roctxRangePop();  // closes "hipMemCpy-DeviceToHost"
    roctxRangePop();  // closes "HIP_Kernel-1"
    roctxRangeStop(rangeId);  // rangeId comes from a roctxRangeStart call not shown in this snippet
|
||||
|
||||
To rename the kernel, use:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocprofv3 --marker-trace --kernel-rename -- <application_path>
|
||||
|
||||
The above command generates a ``marker-trace`` file prefixed with the process ID.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ cat 210_marker_api_trace.csv
|
||||
"Domain","Function","Process_Id","Thread_Id","Correlation_Id","Start_Timestamp","End_Timestamp"
|
||||
"MARKER_CORE_API","roctxGetThreadId",315155,315155,2,58378843928406,58378843930247
|
||||
"MARKER_CONTROL_API","roctxProfilerPause",315155,315155,3,58378844627184,58378844627502
|
||||
"MARKER_CONTROL_API","roctxProfilerResume",315155,315155,4,58378844638601,58378844639267
|
||||
"MARKER_CORE_API","pre-kernel-launch",315155,315155,5,58378844641787,58378844641787
|
||||
"MARKER_CORE_API","post-kernel-launch",315155,315155,6,58378844936586,58378844936586
|
||||
"MARKER_CORE_API","memCopyDth",315155,315155,7,58378844938371,58378851383270
|
||||
"MARKER_CORE_API","HIP_Kernel-1",315155,315155,1,58378526575735,58378851384485
|
||||
|
||||
|
||||
Kokkos Trace
|
||||
++++++++++++++
|
||||
|
||||
rocprofv3 has a built-in `Kokkos Tools library <https://github.com/kokkos/kokkos-tools>`_ support to trace Kokkos API calls. `Kokkos <https://github.com/kokkos/kokkos>`_ is a C++ library for writing performance portable applications. It is used in many scientific applications to write performance portable code that can run on CPUs, GPUs, and other accelerators.
|
||||
``rocprofv3`` loads a built-in Kokkos Tools library that emits ROCTx ranges with the labels passed through the API. For example, ``Kokkos::parallel_for("MyParallelForLabel", ...);`` internally calls ``roctxRangePush``. The option also enables kernel renaming so that the highly templated kernel names are replaced by the Kokkos labels.
|
||||
To enable built-in marker support, use the ``--kokkos-trace`` option. Internally, this option enables ``--marker-trace`` and ``--kernel-rename``:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rocprofv3 --kokkos-trace -- <application_path>
|
||||
|
||||
The above command generates a ``marker-trace`` file prefixed with the process ID.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ cat 210_marker_api_trace.csv
|
||||
"Domain","Function","Process_Id","Thread_Id","Correlation_Id","Start_Timestamp","End_Timestamp"
|
||||
"MARKER_CORE_API","Kokkos::Initialization Complete",4069256,4069256,1,56728499773965,56728499773965
|
||||
"MARKER_CORE_API","Kokkos::Impl::CombinedFunctorReducer<CountFunctor, Kokkos::Impl::FunctorAnalysis<Kokkos::Impl::FunctorPatternInterface::REDUCE, Kokkos::RangePolicy<Kokkos::Serial>, CountFunctor, long int>::Reducer, void>",4069256,4069256,2,56728501756088,56728501764241
|
||||
"MARKER_CORE_API","Kokkos::parallel_reduce: fence due to result being value, not view",4069256,4069256,4,56728501767957,56728501769600
|
||||
"MARKER_CORE_API","Kokkos::Finalization Complete",4069256,4069256,6,56728502054554,56728502054554
|
||||
|
||||
Kernel trace
|
||||
++++++++++++++
|
||||
|
||||
@@ -319,7 +387,7 @@ To trace kernel dispatch traces, use:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 --kernel-trace -- < app_relative_path >
|
||||
rocprofv3 --kernel-trace -- <application_path>
|
||||
|
||||
The above command generates a ``kernel_trace.csv`` file prefixed with the process ID.
|
||||
|
||||
@@ -343,7 +411,7 @@ To trace memory moves across the application, use:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 –-memory-copy-trace -- < app_relative_path >
|
||||
rocprofv3 --memory-copy-trace -- <application_path>
|
||||
|
||||
The above command generates a ``memory_copy_trace.csv`` file prefixed with the process ID.
|
||||
|
||||
@@ -400,7 +468,7 @@ memory operations (copies and scratch).
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 –-runtime-trace -- < app_relative_path >
|
||||
rocprofv3 --runtime-trace -- <application_path>
|
||||
|
||||
Running the above command generates ``hip_api_trace.csv``, ``kernel_trace.csv``, ``memory_copy_trace.csv``, ``scratch_memory_trace.csv``, ``memory_allocation_trace.csv``, and ``marker_api_trace.csv`` (if ``ROCTx`` APIs are specified in the application) files prefixed with the process ID.
|
||||
|
||||
@@ -411,7 +479,7 @@ This is an all-inclusive option to collect all the above-mentioned traces.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 –-sys-trace -- < app_relative_path >
|
||||
rocprofv3 --sys-trace -- <application_path>
|
||||
|
||||
Running the above command generates ``hip_api_trace.csv``, ``hsa_api_trace.csv``, ``kernel_trace.csv``, ``memory_copy_trace.csv``, ``memory_allocation_trace.csv``, and ``marker_api_trace.csv`` (if ``ROCTx`` APIs are specified in the application) files prefixed with the process ID.
|
||||
|
||||
@@ -422,19 +490,45 @@ This option collects scratch memory operation's traces. Scratch is an address sp
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 --scratch-memory-trace -- < app_relative_path >
|
||||
rocprofv3 --scratch-memory-trace -- <application_path>
|
||||
|
||||
Stats
|
||||
++++++++
|
||||
|
||||
RCCL trace
|
||||
++++++++++++
|
||||
|
||||
`RCCL <https://github.com/ROCm/rccl>`_ (pronounced "Rickle") is a stand-alone library of standard collective communication routines for GPUs. This option traces those communication routines.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 --rccl-trace -- <application_path>
|
||||
|
||||
The above command generates a ``rccl_api_trace.csv`` file prefixed with the process ID.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
$ cat 197_rccl_api_trace.csv
|
||||
|
||||
Here are the contents of ``rccl_api_trace.csv`` file:
|
||||
|
||||
.. csv-table:: RCCL trace
|
||||
:file: /data/rccl_trace.csv
|
||||
:widths: 10,10,10,10,10,20,20
|
||||
:header-rows: 1
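
One way to read a trace file like this is to total the time spent per API function. The sketch below does that with ``awk`` on a three-row sample copied from the RCCL trace data in this change. The helper name ``sum_durations`` is made up for illustration, and the plain comma split assumes that no function name contains a comma, which does not hold for the templated Kokkos labels shown in the Kokkos trace output.

```shell
# Three rows copied from the RCCL trace data, written to a temporary file.
trace=$(mktemp)
cat > "$trace" <<'EOF'
"Domain","Function","Process_Id","Thread_Id","Correlation_Id","Start_Timestamp","End_Timestamp"
"RCCL_API","ncclGetVersion",1834151,1834151,416,18413845573432,18413845577374
"RCCL_API","ncclGetUniqueId",1834151,1834151,1116,18413961300878,18413963267869
"RCCL_API","ncclGetUniqueId",1834151,1834151,1481,18414166449182,18414166720831
EOF

# Hypothetical helper: print "<function> <total duration>" for each function,
# where duration is End_Timestamp minus Start_Timestamp in timestamp units.
sum_durations() {
    awk -F',' '
        NR > 1 {
            gsub(/"/, "", $2)       # strip the quotes around the function name
            total[$2] += $7 - $6    # End_Timestamp - Start_Timestamp
        }
        END { for (f in total) printf "%s %d\n", f, total[f] }
    ' "$1" | sort
}

sum_durations "$trace"
# ncclGetUniqueId 2238640
# ncclGetVersion 3942
```

The ``sort`` at the end is there only because ``awk`` does not guarantee any order when iterating over an array.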
|
||||
|
||||
Post-processing tracing options
|
||||
++++++++++++++++++++++++++++++++
|
||||
|
||||
1. Stats
|
||||
+++++++++
|
||||
|
||||
This option collects statistics for the enabled tracing types. For example, it collects statistics of HIP APIs when HIP trace is enabled.
|
||||
A higher percentage in the statistics helps users focus on the API or function that took the most time:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 --stats --hip-trace -- < app_relative_path >
|
||||
rocprofv3 --stats --hip-trace -- <application_path>
|
||||
|
||||
The above command generates a ``hip_api_stats.csv`` and ``hip_api_trace`` file prefixed with the process ID.
|
||||
The above command generates ``hip_api_stats.csv``, ``domain_stats.csv``, and ``hip_api_trace.csv`` files prefixed with the process ID.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
@@ -447,8 +541,60 @@ Here are the contents of ``hip_api_stats.csv`` file:
|
||||
:widths: 10,10,20,20,10,10,10,10
|
||||
:header-rows: 1
|
||||
|
||||
Here are the contents of ``domain_stats.csv`` file:
|
||||
|
||||
.. csv-table:: Domain stats
|
||||
:file: /data/hip_domain_stats.csv
|
||||
:widths: 10,10,20,20,10,10,10,10
|
||||
:header-rows: 1
|
||||
|
||||
For the description of the fields in the output file, see :ref:`output-file-fields`.
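
The sample row in the domain stats file is consistent with ``AverageNs`` being ``TotalDurationNs`` divided by ``Calls``. The sketch below recomputes that value with ``awk`` as a quick sanity check; the helper name ``recompute_average`` is hypothetical, and the relationship between the columns is inferred from the sample data rather than stated by the tool.

```shell
# Header and data row copied from the domain stats file added in this change.
stats=$(mktemp)
cat > "$stats" <<'EOF'
"Name","Calls","TotalDurationNs","AverageNs","Percentage","MinNs","MaxNs","StdDev"
"HIP_API",13,458514859,35270373.769231,100.00,2300,352276613,99315857.546240
EOF

# Hypothetical helper: print "<domain> <TotalDurationNs / Calls>" for each data row.
recompute_average() {
    awk -F',' 'NR > 1 { gsub(/"/, "", $1); printf "%s %.6f\n", $1, $3 / $2 }' "$1"
}

recompute_average "$stats"
# HIP_API 35270373.769231
```

The printed value matches the ``AverageNs`` column of the sample row.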
|
||||
|
||||
2. Summary
|
||||
+++++++++++
|
||||
|
||||
Outputs a single summary of the tracing data at the conclusion of the profiling session.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 -S --hip-trace -- <application_path>
|
||||
|
||||
.. image:: /data/rocprofv3_summary.png
|
||||
|
||||
|
||||
2.1 Summary per domain
|
||||
++++++++++++++++++++++
|
||||
|
||||
Outputs the summary of each tracing domain at the end of the profiling session.
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 -D --hsa-trace --hip-trace -- <application_path>
|
||||
|
||||
The above command generates ``hip_trace.csv`` and ``hsa_trace.csv`` files prefixed with the process ID, along with a summary of each domain in the terminal.
|
||||
|
||||
2.2 Summary groups
|
||||
+++++++++++++++++++
|
||||
|
||||
Users can create a summary of multiple domains by specifying the domain names in the command line. The summary groups are separated by a pipe (|) symbol.
|
||||
To create a summary for the ``MEMORY_COPY`` domain, use:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 --summary-groups MEMORY_COPY --sys-trace -- <application_path>
|
||||
|
||||
.. image:: /data/rocprofv3_memcpy_summary.png
|
||||
|
||||
|
||||
To create a summary for ``MEMORY_COPY`` and ``HIP_API`` domains, use:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 --summary-groups 'MEMORY_COPY|HIP_API' --sys-trace -- <application_path>
|
||||
|
||||
.. image:: /data/rocprofv3_hip_memcpy_summary.png
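
When the list of domains comes from a script, the pipe-separated argument can be assembled before it is passed to ``--summary-groups``. The sketch below is one way to do that; ``join_groups`` and ``./my_app`` are hypothetical names used only for illustration. It prints the resulting command instead of running it.

```shell
# Hypothetical helper: join domain names with "|" for --summary-groups.
join_groups() {
    # "$*" joins the arguments with the first character of IFS;
    # the subshell keeps the IFS change local to this function.
    ( IFS='|'; printf '%s\n' "$*" )
}

summary_groups=$(join_groups MEMORY_COPY HIP_API)

# Print the command instead of running it; ./my_app is a placeholder path.
# The single quotes keep the shell from treating "|" as a pipe.
printf "rocprofv3 --summary-groups '%s' --sys-trace -- ./my_app\n" "$summary_groups"
```

Quoting matters here: without the quotes, the shell would split the command at ``|`` and try to run ``HIP_API`` as a separate program.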
|
||||
|
||||
|
||||
Kernel profiling
|
||||
-------------------
|
||||
|
||||
@@ -540,7 +686,7 @@ Properties
|
||||
{
|
||||
"jobs": [
|
||||
{
|
||||
"pmc": ["SQ_WAVES", "GRBM_COUNT", "GUI_ACTIVE"]
|
||||
"pmc": ["SQ_WAVES", "GRBM_COUNT", "GRBM_GUI_ACTIVE"]
|
||||
},
|
||||
{
|
||||
"pmc": ["FETCH_SIZE", "WRITE_SIZE"],
|
||||
@@ -564,7 +710,7 @@ Properties
|
||||
- pmc:
|
||||
- SQ_WAVES
|
||||
- GRBM_COUNT
|
||||
- GUI_ACTIVE
|
||||
- GRBM_GUI_ACTIVE
|
||||
- 'TCC_HIT[1]'
|
||||
- 'TCC_HIT[2]'
|
||||
- pmc:
|
||||
@@ -581,7 +727,7 @@ To supply the counters via ``command-line`` options, use:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 --pmc SQ_WAVES GRBM_COUNT GRBM_GUI_ACTIVE -- <app_relative_path>
|
||||
rocprofv3 --pmc SQ_WAVES GRBM_COUNT GRBM_GUI_ACTIVE -- <application_path>
|
||||
|
||||
.. note::
|
||||
1. Multiple counters must be separated by a space or a comma.
|
||||
@@ -594,7 +740,7 @@ To supply the input file for kernel profiling, use:
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 -i input.txt -- <app_relative_path>
|
||||
rocprofv3 -i input.txt -- <application_path>
|
||||
|
||||
Running the above command generates a ``./pmc_n/counter_collection.csv`` file prefixed with the process ID. For each ``pmc`` row, a directory ``pmc_n`` containing a ``counter_collection.csv`` file is generated, where n = 1 for the first row and so on.
|
||||
|
||||
@@ -666,7 +812,7 @@ To collect counters for the kernels matching the filters specified in the preced
|
||||
|
||||
.. code-block:: shell
|
||||
|
||||
rocprofv3 -i input.yml -- <app_relative_path>
|
||||
rocprofv3 -i input.yml -- <application_path>
|
||||
|
||||
$ cat pass_1/312_counter_collection.csv
|
||||
"Correlation_Id","Dispatch_Id","Agent_Id","Queue_Id","Process_Id","Thread_Id","Grid_Size","Kernel_Name","Workgroup_Size","LDS_Block_Size","Scratch_Size","VGPR_Count","SGPR_Count","Counter_Name","Counter_Value","Start_Timestamp","End_Timestamp"
|
||||
|
||||
@@ -139,7 +139,8 @@ FILE_PATTERNS = *.h \
|
||||
*.tcc \
|
||||
conf.py
|
||||
RECURSIVE = YES
|
||||
EXCLUDE =
|
||||
EXCLUDE = @SOURCE_DIR@/README.md \
|
||||
@SOURCE_DIR@/include/rocprofiler-sdk/rccl/details/rccl.h
|
||||
EXCLUDE_SYMLINKS = YES
|
||||
EXCLUDE_PATTERNS = */.git/* \
|
||||
@SOURCE_DIR@/**/tests/* \
|
||||
@@ -266,7 +267,7 @@ LATEX_CMD_NAME = latex
|
||||
MAKEINDEX_CMD_NAME = makeindex
|
||||
LATEX_MAKEINDEX_CMD = makeindex
|
||||
COMPACT_LATEX = NO
|
||||
PAPER_TYPE =
|
||||
PAPER_TYPE = a4
|
||||
EXTRA_PACKAGES = float
|
||||
LATEX_HEADER =
|
||||
LATEX_FOOTER =
|
||||
@@ -332,7 +333,8 @@ MACRO_EXPANSION = YES
|
||||
EXPAND_ONLY_PREDEF = NO
|
||||
SEARCH_INCLUDES = NO
|
||||
INCLUDE_PATH = @SOURCE_DIR@/source/include
|
||||
INCLUDE_FILE_PATTERNS = *.h
|
||||
INCLUDE_FILE_PATTERNS = *.h \
|
||||
*.hpp
|
||||
PREDEFINED = "ROCPROFILER_API=" \
|
||||
"ROCPROFILER_EXPORT=" \
|
||||
"ROCPROFILER_IMPORT=" \
|
||||
|
||||
+15
-13
@@ -47,7 +47,7 @@ typedef struct
|
||||
} ncclUniqueId;
|
||||
|
||||
/*! @defgroup rccl_result_code Result Codes
|
||||
@details The various result codes that RCCL API calls may return
|
||||
@brief The various result codes that RCCL API calls may return
|
||||
@{ */
|
||||
|
||||
/*! @brief Result type
|
||||
@@ -71,12 +71,14 @@ typedef enum
|
||||
#define NCCL_SPLIT_NOCOLOR -1
|
||||
|
||||
/*! @defgroup rccl_config_type Communicator Configuration
|
||||
@details Structure that allows for customizing Communicator behavior via
|
||||
@brief Structure that allows for customizing Communicator behavior via
|
||||
ncclCommInitRankConfig
|
||||
@{ */
|
||||
|
||||
/*! @brief Communicator configuration
|
||||
@details Users can assign value to attributes to specify the behavior of a communicator */
|
||||
/**
|
||||
* @defgroup Communicator configuration
|
||||
* @brief Users can assign value to attributes to specify the behavior of a communicator.
|
||||
*/
|
||||
typedef struct ncclConfig_v21700
|
||||
{
|
||||
/* attributes that users should never touch. */
|
||||
@@ -122,7 +124,7 @@ ncclResult_t
|
||||
pncclMemFree(void* ptr);
|
||||
|
||||
/*! @defgroup rccl_api_version Version Information
|
||||
@details API call that returns RCCL version
|
||||
@brief API call that returns RCCL version
|
||||
@{ */
|
||||
|
||||
/*! @brief Return the RCCL_VERSION_CODE of RCCL in the supplied integer.
|
||||
@@ -140,7 +142,7 @@ pncclGetVersion(int* version);
|
||||
/*! @} */
|
||||
|
||||
/*! @defgroup rccl_api_communicator Communicator Initialization/Destruction
|
||||
@details API calls that operate on communicators.
|
||||
@brief API calls that operate on communicators.
|
||||
Communicator objects are used to launch collective communication
|
||||
operations. Unique ranks between 0 and N-1 must be assigned to
|
||||
each HIP device participating in the same Communicator.
|
||||
@@ -292,7 +294,7 @@ pncclCommSplit(ncclComm_t comm, int color, int key, ncclComm_t* newcomm, ncclCon
|
||||
/*! @} */
|
||||
|
||||
/*! @defgroup rccl_api_errcheck Error Checking Calls
|
||||
@details API calls that check for errors
|
||||
@brief API calls that check for errors
|
||||
@{ */
|
||||
|
||||
/*! @brief Returns a string for each result code.
|
||||
@@ -330,7 +332,7 @@ pncclCommGetAsyncError(ncclComm_t comm, ncclResult_t* asyncError);
|
||||
/*! @} */
|
||||
|
||||
/*! @defgroup rccl_api_comminfo Communicator Information
|
||||
@details API calls that query communicator information
|
||||
@brief API calls that query communicator information
|
||||
@{ */
|
||||
|
||||
/*! @brief Gets the number of ranks in the communicator clique.
|
||||
@@ -391,7 +393,7 @@ pncclCommDeregister(const ncclComm_t comm, void* handle);
|
||||
/*! @endcond */
|
||||
|
||||
/*! @defgroup rccl_api_enumerations API Enumerations
|
||||
@details Enumerations used by collective communication calls
|
||||
@brief Enumerations used by collective communication calls
|
||||
@{ */
|
||||
|
||||
/*! @brief Dummy reduction enumeration
|
||||
@@ -454,7 +456,7 @@ typedef enum
|
||||
/*! @} */
|
||||
|
||||
/*! @defgroup rccl_api_custom_redop Custom Reduction Operator
|
||||
@details API calls relating to creation/destroying custom reduction operator
|
||||
@brief API calls relating to creation/destroying custom reduction operator
|
||||
that pre-multiplies local source arrays prior to reduction
|
||||
@{ */
|
||||
|
||||
@@ -516,7 +518,7 @@ pncclRedOpDestroy(ncclRedOp_t op, ncclComm_t comm);
|
||||
/*! @} */
|
||||
|
||||
/*! @defgroup rccl_collective_api Collective Communication Operations
|
||||
@details Collective communication operations must be called separately for each
|
||||
@brief Collective communication operations must be called separately for each
|
||||
communicator in a communicator clique.
|
||||
|
||||
They return when operations have been enqueued on the HIP stream.
|
||||
@@ -935,7 +937,7 @@ pncclAllToAllv(const void* sendbuff,
|
||||
/*! @} */
|
||||
|
||||
/*! @defgroup msccl_api MSCCL Algorithm
|
||||
@details API calls relating to the optional MSCCL algorithm datapath
|
||||
@brief API calls relating to the optional MSCCL algorithm datapath
|
||||
@{ */
|
||||
|
||||
/*! @brief Opaque handle to MSCCL algorithm */
|
||||
@@ -1030,7 +1032,7 @@ pmscclUnloadAlgo(mscclAlgoHandle_t mscclAlgoHandle);
|
||||
/*! @} */
|
||||
|
||||
/*! @defgroup rccl_group_api Group semantics
|
||||
@details When managing multiple GPUs from a single thread, and since RCCL collective
|
||||
@brief When managing multiple GPUs from a single thread, and since RCCL collective
|
||||
calls may perform inter-CPU synchronization, we need to "group" calls for
|
||||
different ranks/devices into a single call.
|
||||
|
||||
|
||||
@@ -30,7 +30,7 @@ ROCPROFILER_EXTERN_C_INIT
|
||||
/**
|
||||
* @defgroup REGISTRATION_GROUP Tool registration
|
||||
*
|
||||
* Data types and functions for tool registration with rocprofiler
|
||||
* @brief Data types and functions for tool registration with rocprofiler
|
||||
* @{
|
||||
*/
|
||||
|
||||
|
||||
@@ -86,7 +86,7 @@ ROCPROFILER_EXTERN_C_INIT
|
||||
|
||||
/**
|
||||
* @defgroup MISCELLANEOUS_GROUP Miscellaneous Utility Functions
|
||||
*
|
||||
* @brief utility functions for library
|
||||
* @{
|
||||
*/
|
||||
|
||||
|
||||