Add documentation describing ROCPROFSYS_USE_RCCLP (#110)

* Add documentation describing ROCPROFSYS_USE_RCCLP

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update wordlist

Signed-off-by: David Galiffi <David.Galiffi@amd.com>

* Update CHANGELOGS.md

---------

Signed-off-by: David Galiffi <David.Galiffi@amd.com>
Co-authored-by: David Galiffi <David.Galiffi@amd.com>
This commit is contained in:
systems-assistant[bot]
2025-08-13 18:01:18 -04:00
committed by GitHub
parent 80b7e6baee
commit dd37d215fd
4 changed files with 19 additions and 0 deletions
+2
@@ -30,6 +30,8 @@ ppc
proc
proto
Pthreads
RCCL
RCCLP
rocDecode
rocdecode
ROCprofiler
+1
@@ -18,6 +18,7 @@ Full documentation for ROCm Systems Profiler is available at [https://rocm.docs.
- Replaced ROCm SMI backend with AMD SMI backend for collecting GPU metrics.
- ROCprofiler-SDK is now used to trace RCCL API and collect communication counters.
- Use the setting `ROCPROFSYS_USE_RCCLP = ON` to enable profiling and tracing of RCCL application data.
- Updated the Dyninst submodule to v13.0.
- Set the default value of `ROCPROFSYS_SAMPLING_CPUS` to `none`.
Binary file not shown (size after change: 34 KiB).

@@ -225,6 +225,22 @@ and memory copy operations submitted. With the
``ROCPROFSYS_ROCM_GROUP_BY_QUEUE=ON`` setting, the trace will display HSA queues
to which these kernel and memory operations were submitted.

ROCPROFSYS_USE_RCCLP
^^^^^^^^^^^^^^^^^^^^

Use the setting ``ROCPROFSYS_USE_RCCLP = ON`` to enable profiling and tracing of
the ROCm Communication Collectives Library (RCCL, pronounced "Rickle"). When this setting is enabled,
ROCm Systems Profiler traces the RCCL API calls and collects performance metrics related to collective operations.

The image below shows an example of a Perfetto trace with RCCL communication data and API tracing enabled:

.. image:: ../data/rccl-comm-recv.png
   :alt: Perfetto tracks with RCCL Communication Data and API tracing

.. note::

   There is a known issue that causes the application to exit with an error. However, the trace data can still be found in the output directory.
   This issue is being tracked internally.

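
As an illustration only (this sketch is not part of the commit), the setting described above could be applied from a small Python launcher. It assumes the setting is read from the process environment as ``ROCPROFSYS_USE_RCCLP``; the helper names ``rccl_profiling_env`` and ``run_with_rccl_profiling`` are hypothetical, and the command to launch is supplied by the caller rather than assumed here.

```python
import os
import subprocess


def rccl_profiling_env(base_env=None):
    """Return a copy of the environment with RCCL profiling switched on.

    Assumption: the profiler picks up ROCPROFSYS_USE_RCCLP from the
    environment of the launched process.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ROCPROFSYS_USE_RCCLP"] = "ON"
    return env


def run_with_rccl_profiling(command):
    """Run `command` (a list of arguments) with the setting exported.

    check=False is deliberate: the note above says the application may
    exit with an error even though trace data is still written, so a
    non-zero exit code is returned to the caller instead of raising.
    """
    return subprocess.run(command, env=rccl_profiling_env(), check=False)
```

The caller can inspect ``returncode`` on the returned object and still look for trace data in the output directory when it is non-zero.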
Exploring GPU Metrics
---------------------