diff --git a/projects/rocprofiler-systems/.wordlist.txt b/projects/rocprofiler-systems/.wordlist.txt
index 19f383ce6e..dd6a6219c0 100644
--- a/projects/rocprofiler-systems/.wordlist.txt
+++ b/projects/rocprofiler-systems/.wordlist.txt
@@ -30,6 +30,8 @@ ppc
 proc
 proto
 Pthreads
+RCCL
+RCCLP
 rocDecode
 rocdecode
 ROCprofiler
diff --git a/projects/rocprofiler-systems/CHANGELOG.md b/projects/rocprofiler-systems/CHANGELOG.md
index 88de0f6a25..dea0591f5c 100644
--- a/projects/rocprofiler-systems/CHANGELOG.md
+++ b/projects/rocprofiler-systems/CHANGELOG.md
@@ -18,6 +18,7 @@ Full documentation for ROCm Systems Profiler is available at [https://rocm.docs.
 
 - Replaced ROCm SMI backend with AMD SMI backend for collecting GPU metrics.
 - ROCprofiler-SDK is now used to trace RCCL API and collect communication counters.
+  - Use the setting `ROCPROFSYS_USE_RCCLP = ON` to enable profiling and tracing of RCCL application data.
 - Updated the Dyninst submodule to v13.0.
 - Set the default value of `ROCPROFSYS_SAMPLING_CPUS` to `none`.
 
diff --git a/projects/rocprofiler-systems/docs/data/rccl-comm-recv.png b/projects/rocprofiler-systems/docs/data/rccl-comm-recv.png
new file mode 100644
index 0000000000..3e597cf9db
Binary files /dev/null and b/projects/rocprofiler-systems/docs/data/rccl-comm-recv.png differ
diff --git a/projects/rocprofiler-systems/docs/how-to/configuring-runtime-options.rst b/projects/rocprofiler-systems/docs/how-to/configuring-runtime-options.rst
index 9e540243f0..7d5420f06f 100644
--- a/projects/rocprofiler-systems/docs/how-to/configuring-runtime-options.rst
+++ b/projects/rocprofiler-systems/docs/how-to/configuring-runtime-options.rst
@@ -225,6 +225,22 @@ and memory copy operations submitted. With the ``ROCPROFSYS_ROCM_GROUP_BY_QUEUE=ON`` setting, the trace will display HSA queues to which these kernel and memory operations were submitted.
+ROCPROFSYS_USE_RCCLP
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Use the setting ``ROCPROFSYS_USE_RCCLP = ON`` to enable profiling and tracing of
+ROCm Communication Collectives Library (RCCL, also pronounced as 'Rickle'). When this setting is enabled,
+ROCm Systems Profiler will trace the RCCL API calls and collect performance metrics related to collective operations.
+
+The image below shows an example of a Perfetto trace with RCCL communication data and API tracing enabled:
+
+.. image:: ../data/rccl-comm-recv.png
+   :alt: Perfetto tracks with RCCL Communication Data and API tracing
+
+.. note::
+   There is a known issue which causes the application to exit with an error. However, the trace data can still be found in the output directory.
+   This issue is being tracked internally.
+
 Exploring GPU Metrics
 ---------------------