* correcting usage example

* rccl trace

* Adding Navi power state limitation

* Addressed feedback

* kernel-rename

* kokkos trace

* more information on kookos tracing

* Corecting tool library hardcoding

* summary domains

* Updating domain stats file

* updating images

* rocprofv3 default behavior update

* Removing README from API documentation

* Added missing description in Topics

* Fixed wrong rendering of README in API document

* Fixing Topics in API docs

* Removing API doc for details/rccl.h

* Addressed review comments

[ROCm/rocprofiler-sdk commit: 7ea9ced493]
Этот коммит содержится в:
Gopesh Bhardwaj
2024-11-22 12:05:11 +05:30
коммит произвёл GitHub
родитель e11b553a26
Коммит 5bea1772ea
13 изменённых файлов: 234 добавлений и 43 удалений
+15 -4
Просмотреть файл
@@ -73,13 +73,24 @@ Please report in the Github Issues.
- **Need for Cold Restart**: In the event of a hardware freeze, you may need to perform a cold restart (turning the hardware off and on) to restore normal operations.
Please use this beta feature cautiously. It may affect your system's stability and performance. Proceed at your own risk.
- At this point, We do not recommend stress-testing the beta implementation.
- At this point, We do not recommend stress-testing the beta implementation.
- Correlation IDs provided by the PC sampling service are verified only for HIP API calls.
- Correlation IDs provided by the PC sampling service are verified only for HIP API calls.
- Timestamps in PC sampling records might not be 100% accurate.
- Timestamps in PC sampling records might not be 100% accurate.
- Using PC sampling on multi-threaded applications might fail with `HSA_STATUS_ERROR_EXCEPTION`.Furthermore, if three or more threads launch operations to the same agent, and if PC sampling is enabled, the `HSA_STATUS_ERROR_EXCEPTION` might appear.
- Using PC sampling on multi-threaded applications might fail with `HSA_STATUS_ERROR_EXCEPTION`.Furthermore, if three or more threads launch operations to the same agent, and if PC sampling is enabled, the `HSA_STATUS_ERROR_EXCEPTION` might appear.
- Navi3x requires a stable power state for counter collection.
Currently, this state needs to be set by the user.
To do so, set "power_dpm_force_performance_level" to be writeable for non-root users, then set performance level to profile_standard:
```bash
sudo chmod 777 /sys/class/drm/card0/device/power_dpm_force_performance_level
echo profile_standard >> /sys/class/drm/card0/device/power_dpm_force_performance_level
```
Recommended: "profile_standard" for counter collection and "auto" for all other profiling. Use rocm-smi to verify the current power state. For multiGPU systems (includes integrated graphics), replace "card0" by the desired card.
> [!WARNING]
> The latest mainline version of AQLprofile can be found at [https://repo.radeon.com/rocm/misc/aqlprofile/](https://repo.radeon.com/rocm/misc/aqlprofile/). However, it's important to note that updates to the public AQLProfile may not occur as frequently as updates to the rocprofiler-sdk. This discrepancy could lead to a potential mismatch between the AQLprofile binary and the rocprofiler-sdk source.
+1 -1
Просмотреть файл
@@ -7,7 +7,7 @@ myst:
# ROCprofiler-SDK tool library
The tool library utilizes APIs from `rocprofiler-sdk` and `rocprofiler-register` libraries for profiling and tracing HIP applications. This document provides information to help you design a tool by utilizing the `rocprofiler-sdk` and `rocprofiler-register` libraries efficiently. The command-line tool `rocprofv3` is also built on `librocprofiler-sdk-tool.so.0.4.0`, which uses these libraries.
The tool library utilizes APIs from `rocprofiler-sdk` and `rocprofiler-register` libraries for profiling and tracing HIP applications. This document provides information to help you design a tool by utilizing the `rocprofiler-sdk` and `rocprofiler-register` libraries efficiently. The command-line tool `rocprofv3` is also built on `librocprofiler-sdk-tool.so.X.Y.Z`, which uses these libraries.
## ROCm runtimes design
+7 -1
Просмотреть файл
@@ -383,4 +383,10 @@ ROCprofiler-SDK introduces a new command-line tool, `rocprofv3`, which is a more
Timing Difference Between rocprofv3 and rocprofv1/v2
========================================================
Rocprofv3 has improved the accuracy of timing information by reducing the tool overhead required to collect data and reducing the interference to the timing of the kernel being measured. The result of this work is a reduction in variance of kernel times received for the same kernel execution and more accurate timing in general. These changes have not been backported (and will not be backported) to rocprofv1/v2, so there can be substantial (20%) differences in execution time reported by v1/v2 vs v3 for a single kernel execution. Over a large number of samples of the same kernel, the difference in average execution time is in the low single digit percentage time with a much tighter variance of results on rocprofv3. We have included testing in the test suite to verify the timing information outputted by rocprofv3 to ensure that the values we are returning are accurate.
``rocprofv3`` has improved the accuracy of timing information by reducing the tool overhead required to collect data and reducing the interference to the timing of the kernel being measured. The result of this work is a reduction in variance of kernel times received for the same kernel execution and more accurate timing in general. These changes have not been backported (and will not be backported) to rocprofv1/v2, so there can be substantial (20%) differences in execution time reported by v1/v2 vs v3 for a single kernel execution. Over a large number of samples of the same kernel, the difference in average execution time is in the low single digit percentage time with a much tighter variance of results on rocprofv3. We have included testing in the test suite to verify the timing information outputted by rocprofv3 to ensure that the values we are returning are accurate.
========================================================
Default run of rocprofv3 and rocprofv1/v2
========================================================
``rocprofv3`` has a different default behavior than rocprofv1/v2 when being run without any option. The default behavior of rocprofv3 is to collect all available agents on the system and to output it in ``csv`` format. The default behavior of rocprofv1/v2 was to output the `kernel traces` in CSV format. In rocprofv3, kernel traces can be obtained by using ``--kernel-trace`` option.
+2
Просмотреть файл
@@ -0,0 +1,2 @@
"Name","Calls","TotalDurationNs","AverageNs","Percentage","MinNs","MaxNs","StdDev"
"HIP_API",13,458514859,35270373.769231,100.00,2300,352276613,99315857.546240
1 Name Calls TotalDurationNs AverageNs Percentage MinNs MaxNs StdDev
2 HIP_API 13 458514859 35270373.769231 100.00 2300 352276613 99315857.546240
+22
Просмотреть файл
@@ -0,0 +1,22 @@
"Domain","Function","Process_Id","Thread_Id","Correlation_Id","Start_Timestamp","End_Timestamp"
"RCCL_API","ncclGetVersion",1834151,1834151,416,18413845573432,18413845577374
"RCCL_API","ncclGetUniqueId",1834151,1834151,1116,18413961300878,18413963267869
"RCCL_API","ncclGetUniqueId",1834151,1834151,1481,18414166449182,18414166720831
"RCCL_API","ncclGroupStart",1834151,1834151,1482,18414166723772,18414166726834
"RCCL_API","ncclGroupEnd",1834151,1834151,1490,18414166823575,18414380520973
"RCCL_API","ncclCommInitAll",1834151,1834151,1477,18414166402665,18414380522536
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89098,18414380660695,18414380661652
"RCCL_API","ncclAllReduce",1834151,1834151,89097,18414380653860,18414380693574
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89108,18414380694631,18414380694659
"RCCL_API","ncclAllReduce",1834151,1834151,89107,18414380694212,18414380704722
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89117,18414380706650,18414380706677
"RCCL_API","ncclAllReduce",1834151,1834151,89116,18414380705574,18414380715055
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89126,18414380715749,18414380715774
"RCCL_API","ncclAllReduce",1834151,1834151,89125,18414380715463,18414380723944
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89135,18414380724688,18414380724715
"RCCL_API","ncclAllReduce",1834151,1834151,89134,18414380724395,18414380732209
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89154,18414380746383,18414380746411
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89157,18414380749863,18414380749889
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89160,18414380751671,18414380751696
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89163,18414380753326,18414380753353
"RCCL_API","ncclCommGetAsyncError",1834151,1834151,89166,18414380755128,18414380755154
1 Domain Function Process_Id Thread_Id Correlation_Id Start_Timestamp End_Timestamp
2 RCCL_API ncclGetVersion 1834151 1834151 416 18413845573432 18413845577374
3 RCCL_API ncclGetUniqueId 1834151 1834151 1116 18413961300878 18413963267869
4 RCCL_API ncclGetUniqueId 1834151 1834151 1481 18414166449182 18414166720831
5 RCCL_API ncclGroupStart 1834151 1834151 1482 18414166723772 18414166726834
6 RCCL_API ncclGroupEnd 1834151 1834151 1490 18414166823575 18414380520973
7 RCCL_API ncclCommInitAll 1834151 1834151 1477 18414166402665 18414380522536
8 RCCL_API ncclCommGetAsyncError 1834151 1834151 89098 18414380660695 18414380661652
9 RCCL_API ncclAllReduce 1834151 1834151 89097 18414380653860 18414380693574
10 RCCL_API ncclCommGetAsyncError 1834151 1834151 89108 18414380694631 18414380694659
11 RCCL_API ncclAllReduce 1834151 1834151 89107 18414380694212 18414380704722
12 RCCL_API ncclCommGetAsyncError 1834151 1834151 89117 18414380706650 18414380706677
13 RCCL_API ncclAllReduce 1834151 1834151 89116 18414380705574 18414380715055
14 RCCL_API ncclCommGetAsyncError 1834151 1834151 89126 18414380715749 18414380715774
15 RCCL_API ncclAllReduce 1834151 1834151 89125 18414380715463 18414380723944
16 RCCL_API ncclCommGetAsyncError 1834151 1834151 89135 18414380724688 18414380724715
17 RCCL_API ncclAllReduce 1834151 1834151 89134 18414380724395 18414380732209
18 RCCL_API ncclCommGetAsyncError 1834151 1834151 89154 18414380746383 18414380746411
19 RCCL_API ncclCommGetAsyncError 1834151 1834151 89157 18414380749863 18414380749889
20 RCCL_API ncclCommGetAsyncError 1834151 1834151 89160 18414380751671 18414380751696
21 RCCL_API ncclCommGetAsyncError 1834151 1834151 89163 18414380753326 18414380753353
22 RCCL_API ncclCommGetAsyncError 1834151 1834151 89166 18414380755128 18414380755154
Двоичный файл не отображается.

После

Ширина:  |  Высота:  |  Размер: 35 KiB

Двоичный файл не отображается.

После

Ширина:  |  Высота:  |  Размер: 11 KiB

Двоичный файл не отображается.

После

Ширина:  |  Высота:  |  Размер: 51 KiB

+165 -19
Просмотреть файл
@@ -173,7 +173,7 @@ To use ``rocprofv3`` for application tracing, run:
.. code-block:: bash
rocprofv3 <tracing_option> -- <app_relative_path>
rocprofv3 <tracing_option> -- <application_path>
HIP trace
+++++++++++
@@ -184,7 +184,7 @@ To trace HIP runtime APIs, use:
.. code-block:: bash
rocprofv3 --hip-trace -- < app_relative_path >
rocprofv3 --hip-trace -- <application_path>
The above command generates a ``hip_api_trace.csv`` file prefixed with the process ID.
@@ -203,7 +203,7 @@ To trace HIP compile time APIs, use:
.. code-block:: shell
rocprofv3 --hip-compiler-trace -- < app_relative_path >
rocprofv3 --hip-compiler-trace -- <application_path>
The above command generates a ``hip_api_trace.csv`` file prefixed with the process ID.
@@ -229,7 +229,7 @@ HSA trace contains the start and end time of HSA runtime API calls and their asy
.. code-block:: bash
rocprofv3 --hsa-trace -- < app_relative_path >
rocprofv3 --hsa-trace -- <application_path>
The above command generates a ``hsa_api_trace.csv`` file prefixed with process ID. Note that the contents of this file have been truncated for demonstration purposes.
@@ -295,7 +295,7 @@ To trace the API calls enclosed within the range, use:
.. code-block:: bash
rocprofv3 --marker-trace -- < app_relative_path >
rocprofv3 --marker-trace -- <application_path>
Running the preceding command generates a ``marker_api_trace.csv`` file prefixed with the process ID.
@@ -312,6 +312,74 @@ Here are the contents of ``marker_api_trace.csv`` file:
For the description of the fields in the output file, see :ref:`output-file-fields`.
Kernel Rename
++++++++++++++
To rename kernels with their enclosing roctxRangePush/roctxRangePop message. Known as --roctx-rename in earlier rocprof versions.
See how to use ``--kernel-rename`` option with help of below code snippet:
.. code-block:: bash
#include <rocprofiler-sdk-roctx/roctx.h>
roctxRangePush("HIP_Kernel-1");
// Launching kernel from host
hipLaunchKernelGGL(matrixTranspose, dim3(WIDTH/THREADS_PER_BLOCK_X, WIDTH/THREADS_PER_BLOCK_Y), dim3(THREADS_PER_BLOCK_X, THREADS_PER_BLOCK_Y), 0,0,gpuTransposeMatrix,gpuMatrix, WIDTH);
// Memory transfer from device to host
roctxRangePush("hipMemCpy-DeviceToHost");
hipMemcpy(TransposeMatrix, gpuTransposeMatrix, NUM * sizeof(float), hipMemcpyDeviceToHost);
roctxRangePop(); // for "hipMemcpy"
roctxRangePop(); // for "hipLaunchKernel"
roctxRangeStop(rangeId);
To rename the kernel , use:
.. code-block:: bash
rocprofv3 --marker-trace --kernel-rename -- <application_path>
The above command generates a ``marker-trace`` file prefixed with the process ID.
.. code-block:: shell
$ cat 210_marker_api_trace.csv
"Domain","Function","Process_Id","Thread_Id","Correlation_Id","Start_Timestamp","End_Timestamp"
"MARKER_CORE_API","roctxGetThreadId",315155,315155,2,58378843928406,58378843930247
"MARKER_CONTROL_API","roctxProfilerPause",315155,315155,3,58378844627184,58378844627502
"MARKER_CONTROL_API","roctxProfilerResume",315155,315155,4,58378844638601,58378844639267
"MARKER_CORE_API","pre-kernel-launch",315155,315155,5,58378844641787,58378844641787
"MARKER_CORE_API","post-kernel-launch",315155,315155,6,58378844936586,58378844936586
"MARKER_CORE_API","memCopyDth",315155,315155,7,58378844938371,58378851383270
"MARKER_CORE_API","HIP_Kernel-1",315155,315155,1,58378526575735,58378851384485
Kokkos Trace
++++++++++++++
rocprofv3 has a built-in `Kokkos Tools library <https://github.com/kokkos/kokkos-tools>`_ support to trace Kokkos API calls. `Kokkos <https://github.com/kokkos/kokkos>`_ is a C++ library for writing performance portable applications. It is used in many scientific applications to write performance portable code that can run on CPUs, GPUs, and other accelerators.
rocprofv3 loads a built-in Kokkos tools library which emits roctx ranges with the labels passed through the API, e.g. Kokkos::parallel_for(“MyParallelForLabel”, …); will internally calls for roctxRangePush and enables the kernel renaming option so that the highly templated kernel names are replaced by the Kokkos labels.
To enable built-in marker support, use the ``kokkos-trace`` option. Internally this option enables ``marker-trace`` and ``kernel-rename``.:
.. code-block:: bash
rocprofv3 --kokkos-trace -- <application_path>
The above command generates a ``marker-trace`` file prefixed with the process ID.
.. code-block:: shell
$ cat 210_marker_api_trace.csv
"Domain","Function","Process_Id","Thread_Id","Correlation_Id","Start_Timestamp","End_Timestamp"
"MARKER_CORE_API","Kokkos::Initialization Complete",4069256,4069256,1,56728499773965,56728499773965
"MARKER_CORE_API","Kokkos::Impl::CombinedFunctorReducer<CountFunctor, Kokkos::Impl::FunctorAnalysis<Kokkos::Impl::FunctorPatternInterface::REDUCE, Kokkos::RangePolicy<Kokkos::Serial>, CountFunctor, long int>::Reducer, void>",4069256,4069256,2,56728501756088,56728501764241
"MARKER_CORE_API","Kokkos::parallel_reduce: fence due to result being value, not view",4069256,4069256,4,56728501767957,56728501769600
"MARKER_CORE_API","Kokkos::Finalization Complete",4069256,4069256,6,56728502054554,56728502054554
Kernel trace
++++++++++++++
@@ -319,7 +387,7 @@ To trace kernel dispatch traces, use:
.. code-block:: shell
rocprofv3 --kernel-trace -- < app_relative_path >
rocprofv3 --kernel-trace -- <application_path>
The above command generates a ``kernel_trace.csv`` file prefixed with the process ID.
@@ -343,7 +411,7 @@ To trace memory moves across the application, use:
.. code-block:: shell
rocprofv3 –-memory-copy-trace -- < app_relative_path >
rocprofv3 –-memory-copy-trace -- <application_path>
The above command generates a ``memory_copy_trace.csv`` file prefixed with the process ID.
@@ -400,7 +468,7 @@ memory operations (copies and scratch).
.. code-block:: shell
rocprofv3 –-runtime-trace -- < app_relative_path >
rocprofv3 –-runtime-trace -- <application_path>
Running the above command generates ``hip_api_trace.csv``, ``kernel_trace.csv``, ``memory_copy_trace.csv``, ``scratch_memory_trace.csv``, ``memory_allocation_trace.csv``, and ``marker_api_trace.csv`` (if ``ROCTx`` APIs are specified in the application) files prefixed with the process ID.
@@ -411,7 +479,7 @@ This is an all-inclusive option to collect all the above-mentioned traces.
.. code-block:: shell
rocprofv3 –-sys-trace -- < app_relative_path >
rocprofv3 –-sys-trace -- <application_path>
Running the above command generates ``hip_api_trace.csv``, ``hsa_api_trace.csv``, ``kernel_trace.csv``, ``memory_copy_trace.csv``, ``memory_allocation_trace.csv``, and ``marker_api_trace.csv`` (if ``ROCTx`` APIs are specified in the application) files prefixed with the process ID.
@@ -422,19 +490,45 @@ This option collects scratch memory operation's traces. Scratch is an address sp
.. code-block:: shell
rocprofv3 --scratch-memory-trace -- < app_relative_path >
rocprofv3 --scratch-memory-trace -- <application_path>
Stats
++++++++
RCCL trace
++++++++++++
`RCCL <https://github.com/ROCm/rccl>`_ (pronounced "Rickle") is a stand-alone library of standard collective communication routines for GPUs. This option traces those communication routines.
.. code-block:: shell
rocprofv3 --rccl-trace -- <application_path>
The above command generates a ``rccl_api_trace`` file prefixed with the process ID.
.. code-block:: shell
$ cat 197_rccl_api_trace.csv
Here are the contents of ``rccl_api_trace.csv`` file:
.. csv-table:: RCCL trace
:file: /data/rccl_trace.csv
:widths: 10,10,10,10,10,20,20
:header-rows: 1
Post-processing tracing options
++++++++++++++++++++++++++++++++
1. Stats
+++++++++
This option collects statistics for the enabled tracing types. For example, to collect statistics of HIP APIs, when HIP trace is enabled.
A higher percentage in statistics can help user focus on the API/function that has taken the most time:
.. code-block:: shell
rocprofv3 --stats --hip-trace -- < app_relative_path >
rocprofv3 --stats --hip-trace -- <application_path>
The above command generates a ``hip_api_stats.csv`` and ``hip_api_trace`` file prefixed with the process ID.
The above command generates a ``hip_api_stats.csv``, ``domain_stats.csv`` and ``hip_api_trace.csv`` file prefixed with the process ID.
.. code-block:: shell
@@ -447,8 +541,60 @@ Here are the contents of ``hip_api_stats.csv`` file:
:widths: 10,10,20,20,10,10,10,10
:header-rows: 1
Here are the contents of ``domain_stats.csv`` file:
.. csv-table:: Domain stats
:file: /data/hip_domain_stats.csv
:widths: 10,10,20,20,10,10,10,10
:header-rows: 1
For the description of the fields in the output file, see :ref:`output-file-fields`.
2. Summary
+++++++++++
Output single summary of tracing data at the conclusion of the profiling session
.. code-block:: shell
rocprofv3 -S --hip-trace -- <application_path>
.. image:: /data/rocprofv3_summary.png
2.1 Summary per domain
++++++++++++++++++++++
Outputs the summary of each tracing domain at the end of profiling session.
.. code-block:: shell
rocprofv3 -D --hsa-trace --hip-trace -- <application_path>
The above command generates a ``hip_trace.csv``, ``hsa_trace.csv`` file prefixed with the process ID along with the summary of each domain at the terminal.
2.2 Summary groups
+++++++++++++++++++
Users can create a summary of multiple domains by specifying the domain names in the command line. The summary groups are separated by a pipe (|) symbol.
To create a summary for ``MEMORY_COPY`` domains, use:
.. code-block:: shell
rocprofv3 --summary-groups MEMORY_COPY --sys-trace -- <application_path>
.. image:: /data/rocprofv3_memcpy_summary.png
To create a summary for ``MEMORY_COPY`` and ``HIP_API`` domains, use:
.. code-block:: shell
rocprofv3 --summary-groups 'MEMORY_COPY|HIP_API' --sys-trace -- <application_path>
.. image:: /data/rocprofv3_hip_memcpy_summary.png
Kernel profiling
-------------------
@@ -540,7 +686,7 @@ Properties
{
"jobs": [
{
"pmc": ["SQ_WAVES", "GRBM_COUNT", "GUI_ACTIVE"]
"pmc": ["SQ_WAVES", "GRBM_COUNT", "GRBM_GUI_ACTIVE"]
},
{
"pmc": ["FETCH_SIZE", "WRITE_SIZE"],
@@ -564,7 +710,7 @@ Properties
- pmc:
- SQ_WAVES
- GRBM_COUNT
- GUI_ACTIVE
- GRBM_GUI_ACTIVE
- 'TCC_HIT[1]'
- 'TCC_HIT[2]'
- pmc:
@@ -581,7 +727,7 @@ To supply the counters via ``command-line`` options, use:
.. code-block:: shell
rocprofv3 --pmc SQ_WAVES GRBM_COUNT GRBM_GUI_ACTIVE -- <app_relative_path>
rocprofv3 --pmc SQ_WAVES GRBM_COUNT GRBM_GUI_ACTIVE -- <application_path>
.. note::
1. Please note that more than 1 counters should be separated by a space or a comma.
@@ -594,7 +740,7 @@ To supply the input file for kernel profiling, use:
.. code-block:: shell
rocprofv3 -i input.txt -- <app_relative_path>
rocprofv3 -i input.txt -- <application_path>
Running the above command generates a ``./pmc_n/counter_collection.csv`` file prefixed with the process ID. For each ``pmc`` row, a directory ``pmc_n`` containing a ``counter_collection.csv`` file is generated, where n = 1 for the first row and so on.
@@ -666,7 +812,7 @@ To collect counters for the kernels matching the filters specified in the preced
.. code-block:: shell
rocprofv3 -i input.yml -- <app_relative_path>
rocprofv3 -i input.yml -- <application_path>
$ cat pass_1/312_counter_collection.csv
"Correlation_Id","Dispatch_Id","Agent_Id","Queue_Id","Process_Id","Thread_Id","Grid_Size","Kernel_Name","Workgroup_Size","LDS_Block_Size","Scratch_Size","VGPR_Count","SGPR_Count","Counter_Name","Counter_Value","Start_Timestamp","End_Timestamp"
+5 -3
Просмотреть файл
@@ -139,7 +139,8 @@ FILE_PATTERNS = *.h \
*.tcc \
conf.py
RECURSIVE = YES
EXCLUDE =
EXCLUDE = @SOURCE_DIR@/README.md \
@SOURCE_DIR@/include/rocprofiler-sdk/rccl/details/rccl.h
EXCLUDE_SYMLINKS = YES
EXCLUDE_PATTERNS = */.git/* \
@SOURCE_DIR@/**/tests/* \
@@ -266,7 +267,7 @@ LATEX_CMD_NAME = latex
MAKEINDEX_CMD_NAME = makeindex
LATEX_MAKEINDEX_CMD = makeindex
COMPACT_LATEX = NO
PAPER_TYPE =
PAPER_TYPE = a4
EXTRA_PACKAGES = float
LATEX_HEADER =
LATEX_FOOTER =
@@ -332,7 +333,8 @@ MACRO_EXPANSION = YES
EXPAND_ONLY_PREDEF = NO
SEARCH_INCLUDES = NO
INCLUDE_PATH = @SOURCE_DIR@/source/include
INCLUDE_FILE_PATTERNS = *.h
INCLUDE_FILE_PATTERNS = *.h \
*.hpp
PREDEFINED = "ROCPROFILER_API=" \
"ROCPROFILER_EXPORT=" \
"ROCPROFILER_IMPORT=" \
+15 -13
Просмотреть файл
@@ -47,7 +47,7 @@ typedef struct
} ncclUniqueId;
/*! @defgroup rccl_result_code Result Codes
@details The various result codes that RCCL API calls may return
@brief The various result codes that RCCL API calls may return
@{ */
/*! @brief Result type
@@ -71,12 +71,14 @@ typedef enum
#define NCCL_SPLIT_NOCOLOR -1
/*! @defgroup rccl_config_type Communicator Configuration
@details Structure that allows for customizing Communicator behavior via
@brief Structure that allows for customizing Communicator behavior via
ncclCommInitRankConfig
@{ */
/*! @brief Communicator configuration
@details Users can assign value to attributes to specify the behavior of a communicator */
/**
* @defgroup Communicator configuration
* @brief Users can assign value to attributes to specify the behavior of a communicator.
*/
typedef struct ncclConfig_v21700
{
/* attributes that users should never touch. */
@@ -122,7 +124,7 @@ ncclResult_t
pncclMemFree(void* ptr);
/*! @defgroup rccl_api_version Version Information
@details API call that returns RCCL version
@brief API call that returns RCCL version
@{ */
/*! @brief Return the RCCL_VERSION_CODE of RCCL in the supplied integer.
@@ -140,7 +142,7 @@ pncclGetVersion(int* version);
/*! @} */
/*! @defgroup rccl_api_communicator Communicator Initialization/Destruction
@details API calls that operate on communicators.
@brief API calls that operate on communicators.
Communicators objects are used to launch collective communication
operations. Unique ranks between 0 and N-1 must be assigned to
each HIP device participating in the same Communicator.
@@ -292,7 +294,7 @@ pncclCommSplit(ncclComm_t comm, int color, int key, ncclComm_t* newcomm, ncclCon
/*! @} */
/*! @defgroup rccl_api_errcheck Error Checking Calls
@details API calls that check for errors
@brief API calls that check for errors
@{ */
/*! @brief Returns a string for each result code.
@@ -330,7 +332,7 @@ pncclCommGetAsyncError(ncclComm_t comm, ncclResult_t* asyncError);
/*! @} */
/*! @defgroup rccl_api_comminfo Communicator Information
@details API calls that query communicator information
@brief API calls that query communicator information
@{ */
/*! @brief Gets the number of ranks in the communicator clique.
@@ -391,7 +393,7 @@ pncclCommDeregister(const ncclComm_t comm, void* handle);
/*! @endcond */
/*! @defgroup rccl_api_enumerations API Enumerations
@details Enumerations used by collective communication calls
@brief Enumerations used by collective communication calls
@{ */
/*! @brief Dummy reduction enumeration
@@ -454,7 +456,7 @@ typedef enum
/*! @} */
/*! @defgroup rccl_api_custom_redop Custom Reduction Operator
@details API calls relating to creation/destroying custom reduction operator
@brief API calls relating to creation/destroying custom reduction operator
that pre-multiplies local source arrays prior to reduction
@{ */
@@ -516,7 +518,7 @@ pncclRedOpDestroy(ncclRedOp_t op, ncclComm_t comm);
/*! @} */
/*! @defgroup rccl_collective_api Collective Communication Operations
@details Collective communication operations must be called separately for each
@brief Collective communication operations must be called separately for each
communicator in a communicator clique.
They return when operations have been enqueued on the HIP stream.
@@ -935,7 +937,7 @@ pncclAllToAllv(const void* sendbuff,
/*! @} */
/*! @defgroup msccl_api MSCCL Algorithm
@details API calls relating to the optional MSCCL algorithm datapath
@brief API calls relating to the optional MSCCL algorithm datapath
@{ */
/*! @brief Opaque handle to MSCCL algorithm */
@@ -1030,7 +1032,7 @@ pmscclUnloadAlgo(mscclAlgoHandle_t mscclAlgoHandle);
/*! @} */
/*! @defgroup rccl_group_api Group semantics
@details When managing multiple GPUs from a single thread, and since RCCL collective
@brief When managing multiple GPUs from a single thread, and since RCCL collective
calls may perform inter-CPU synchronization, we need to "group" calls for
different ranks/devices into a single call.
+1 -1
Просмотреть файл
@@ -30,7 +30,7 @@ ROCPROFILER_EXTERN_C_INIT
/**
* @defgroup REGISTRATION_GROUP Tool registration
*
* Data types and functions for tool registration with rocprofiler
* @brief Data types and functions for tool registration with rocprofiler
* @{
*/
+1 -1
Просмотреть файл
@@ -86,7 +86,7 @@ ROCPROFILER_EXTERN_C_INIT
/**
* @defgroup MISCELLANEOUS_GROUP Miscellaneous Utility Functions
*
* @brief utility functions for library
* @{
*/