Add documentation describing ROCPROFSYS_USE_RCCP (#110)

* Add documentation describing ROCPROFSYS_USE_RCCP

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update wordlist

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update CHANGELOGS.md

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
This commit is contained in:
systems-assistant[bot]
2025-08-13 18:01:18 -04:00
committed by GitHub
parent 80b7e6baee
commit dd37d215fd
4 changed files with 19 additions and 0 deletions
@@ -30,6 +30,8 @@ ppc
proc
proto
Pthreads
RCCL
RCCLP
rocDecode
rocdecode
ROCprofiler
@@ -18,6 +18,7 @@ Full documentation for ROCm Systems Profiler is available at [https://rocm.docs.
- Replaced ROCm SMI backend with AMD SMI backend for collecting GPU metrics.
- ROCprofiler-SDK is now used to trace RCCL API and collect communication counters.
- Use the setting `ROCPROFSYS_USE_RCCLP = ON` to enable profiling and tracing of RCCL application data.
- Updated the Dyninst submodule to v13.0.
- Set the default value of `ROCPROFSYS_SAMPLING_CPUS` to `none`.
Binary file not shown.

After

Width:  |  Height:  |  Size: 34 KiB

@@ -225,6 +225,22 @@ and memory copy operations submitted. With the
``ROCPROFSYS_ROCM_GROUP_BY_QUEUE=ON`` setting, the trace will display HSA queues
to which these kernel and memory operations were submitted.

ROCPROFSYS_USE_RCCLP
^^^^^^^^^^^^^^^^^^^^

Use the setting ``ROCPROFSYS_USE_RCCLP = ON`` to enable profiling and tracing of the
ROCm Communication Collectives Library (RCCL, pronounced "Rickle"). When this setting is enabled,
ROCm Systems Profiler traces the RCCL API calls and collects performance metrics related to collective operations.

The image below shows an example of a Perfetto trace with RCCL communication data and API tracing enabled:

.. image:: ../data/rccl-comm-recv.png
   :alt: Perfetto tracks with RCCL Communication Data and API tracing

.. note::

   There is a known issue that causes the application to exit with an error. However, the trace data can still be found in the output directory.
   This issue is being tracked internally.

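
As a minimal sketch, assuming the setting is also honored when supplied as an environment variable (as ROCm Systems Profiler settings generally are), RCCL tracing could be enabled for a shell session as shown below. The application name ``./my_rccl_app`` is hypothetical, and the profiler invocation is left commented out as an illustration only:

```shell
# Enable RCCL profiling and tracing for profiler runs started from this shell.
# Assumption: the setting is read from the environment as well as from a config file.
export ROCPROFSYS_USE_RCCLP=ON

# Illustrative invocation only; ./my_rccl_app is a hypothetical application.
# rocprof-sys-sample -- ./my_rccl_app
```

Because of the known issue noted above, the application may exit with an error even though the trace data has been written to the output directory.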
Exploring GPU Metrics
---------------------