From 996202f56033598b66cdc7d7c4473c6c283e6cb4 Mon Sep 17 00:00:00 2001
From: vedithal-amd
Date: Tue, 27 Jan 2026 17:22:41 -0500
Subject: [PATCH] [rocprofiler-compute] Backport documentation changes from ROCm 7.1 release branch (#2894)

* Backport documentation changes from ROCm 7.1 release branch

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Address review comments

---------

Co-authored-by: Pratik Basyal
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 projects/rocprofiler-compute/CHANGELOG.md     | 26 ++++++++++---------
 .../docs/how-to/live_attach_detach.rst        | 16 ++++++------
 projects/rocprofiler-compute/docs/index.rst   |  5 +++-
 3 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/projects/rocprofiler-compute/CHANGELOG.md b/projects/rocprofiler-compute/CHANGELOG.md
index 0b28ffca84..8c33a1f044 100644
--- a/projects/rocprofiler-compute/CHANGELOG.md
+++ b/projects/rocprofiler-compute/CHANGELOG.md
@@ -132,15 +132,9 @@ A proposed long-term solution uses threshold-based clamping, distinguishing betw

 ### Added

-* Improved standalone Roofline plots in profile mode (PDF output) and analyze mode (CLI and GUI visual plots):
-  * Fixed the peak MFMA/VALU lines being cut off.
-  * Cleaned up the overlapping roofline numeric values by moving them into the side legend.
-  * Added AI points chart with respective values, cache level, and compute/memory bound status.
-  * Added full kernel names to symbol chart.
-
-* Add support for multi-kernel applications' pc sampling.
-  * PC Sampling's outputs' instructions are displayed with the name of the kernel that individual instruction belongs to.
-  * Single kernel selection is supported so that the pc samples of selected kernel can be displayed.
+* Add support for PC sampling of multi-kernel applications.
+  * PC Sampling output instructions are displayed with the name of the kernel that individual instruction belongs to.
+  * Single kernel selection is supported so that the PC samples of selected kernel can be displayed.

 ### Changed

@@ -149,15 +143,23 @@ A proposed long-term solution uses threshold-based clamping, distinguishing betw

 ### Optimized

-* Improved Roofline Benchmarking by updating the `flops_benchmark` calculation.
+* Improved roofline benchmarking by updating the `flops_benchmark` calculation.
+
+* Improved standalone roofline plots in profile mode (PDF output) and analyze mode (CLI and GUI visual plots):
+  * Fixed the peak MFMA/VALU lines being cut off.
+  * Cleaned up the overlapping roofline numeric values by moving them into the side legend.
+  * Added AI points chart with respective values, cache level, and compute/memory bound status.
+  * Added full kernel names to symbol chart.

 ### Resolved issues
-* Bugfixes for stability
+
+* Resolved existing issues to improve stability.

 ## ROCm Compute Profiler 3.3.0 for ROCm 7.1.0

 ### Added
-* Live attach/detach feature that allows coupling with a workload process, without controlling its start or end.
+
+* Dynamic process attachment feature that allows coupling with a workload process, without controlling its start or end.
   * Use '--attach-pid' to specify the target process ID.
   * Use '--attach-duration-msec' to specify time duration.

diff --git a/projects/rocprofiler-compute/docs/how-to/live_attach_detach.rst b/projects/rocprofiler-compute/docs/how-to/live_attach_detach.rst
index c88090b811..b28bfa6ce9 100644
--- a/projects/rocprofiler-compute/docs/how-to/live_attach_detach.rst
+++ b/projects/rocprofiler-compute/docs/how-to/live_attach_detach.rst
@@ -1,12 +1,12 @@
 .. meta::
-   :description: ROCm Compute Profiler: using Live Attach Detach
-   :keywords: ROCm Compute Profiler, Attach Detach
+   :description: Dynamic process attachment in ROCm Compute Profiler
+   :keywords: ROCm Compute Profiler, Attach, Detach, Dynamic process attachment

 ***********************************************************
-Using Live Attach/Detach in ROCm Compute Profiler
+Dynamic process attachment in ROCm Compute Profiler
 ***********************************************************

-Live Attach/Detach is a new feature of ROCm Compute Profiler that allows coupling with a workload process, without controlling its start or end. The application can already be running before the profiler application is invoked. The profiler simply attaches to the process, collects the required counters, and then detaches—without altering the lifecycle of the workload.
+Dynamic process attachment is a new feature of ROCm Compute Profiler that allows coupling with a workload process, without controlling its start or end. The application can already be running before the profiler application is invoked. The profiler simply attaches to the process, collects the required counters, and then detaches—without altering the lifecycle of the workload.

 A specific attach is not repeatable, and it can only collect the set of counters that the hardware is capable of capturing in a single run. As such, in the current implementation, you must specify a subset of counter groups that can be collected within one run. This can be done either by using the ``--block`` option (for example, --block 3.1.1 4.1.1 5.1.1) or by providing a predefined set through the use of single pass counter collection ``--set``.

@@ -38,11 +38,11 @@ For using profiling options for PC sampling the configuration needed are:
 -----------------------
 Analysis options
 -----------------------
-The analyze options for attach/detach are completely compatible with the non-attach/detach option.
+
+The analyze options for Dynamic process attachment are completely compatible with other non-Dynamic process attachment options.

 .. note::

-   * Live Attach Detach feature is currently in BETA version. To enable Live/Attach Detach, you need to have the correct supported proper version of ROCprofiler-SDK and rocprofiler-register.
-   * Live Attach/Detach does not work with --iteration-multiplexing option. This is because --iteration-multiplexing uses native counter collection tool which currently does not support attach/detach feature.
-   * To make the Live Attach/Detach feature work, you must restrict the number of counter input files (which determine number of application runs) to one. This can be achieved with options such as: "--block", "--set".
+   * Dynamic process attachment feature is currently in BETA version. To enable Dynamic process attachment, you need to have the correct supported version of ROCprofiler-SDK and rocprofiler-register.
+   * To make the Dynamic process attachment feature work, you must use "--block" or "--set" to limit the number of counter input files to ensure single application run. You can also use "--iteration-multiplexing" to ensure single application run.
    * Due to the limitation of ROCprofiler-SDK, the attach can now only happen before Heterogeneous System Architecture (HSA) initialization. HSA initialization happens before the execution of the first HIP kernel call. It only happens once to save all the kernels' function signature, such as the function name and other launch parameters. Attaching after this stage misses all crucial information of the HIP kernel and makes it impossible to store the output. This limitation will be solved in later releases of ROCprofiler-SDK.
diff --git a/projects/rocprofiler-compute/docs/index.rst b/projects/rocprofiler-compute/docs/index.rst
index 7ea63dac79..a8bdd2df56 100644
--- a/projects/rocprofiler-compute/docs/index.rst
+++ b/projects/rocprofiler-compute/docs/index.rst
@@ -41,8 +41,11 @@ in practice.
  * :doc:`how-to/use`
- * :doc:`how-to/profile/mode`
+ * :doc:`how-to/pc_sampling`
+ * :doc:`how-to/live_attach_detach`
+
+ * :doc:`how-to/profile/mode`
  * :doc:`how-to/analyze/mode`
  * :doc:`how-to/analyze/cli`
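
Reviewer note (not part of the patch): the attach documentation above says an attach needs a target PID and must be restricted to a single collection pass with ``--block`` or ``--set``. The Python sketch below shows one way a wrapper script might assemble such a command line. Only ``--attach-pid``, ``--attach-duration-msec``, ``--block`` and ``--set`` come from the patch text. The base command ``rocprof-compute profile -n NAME``, the helper name ``build_attach_command`` and the default workload name are assumptions made for illustration and should be checked against the installed tool before use.

```python
import shlex
from typing import List, Optional, Sequence


def build_attach_command(
    pid: int,
    blocks: Optional[Sequence[str]] = None,
    counter_set: Optional[str] = None,
    duration_msec: Optional[int] = None,
    workload_name: str = "attach_run",  # hypothetical default name
) -> List[str]:
    """Assemble an argv list for a single-pass attach run (illustrative only)."""
    if pid <= 0:
        raise ValueError("pid must be a positive process ID")
    # The docs require limiting collection to one pass: use --block or --set.
    if bool(blocks) == bool(counter_set):
        raise ValueError("pass exactly one of blocks or counter_set")

    # Assumption: 'rocprof-compute profile -n NAME' is the base invocation.
    cmd = ["rocprof-compute", "profile", "-n", workload_name,
           "--attach-pid", str(pid)]

    if duration_msec is not None:
        if duration_msec <= 0:
            raise ValueError("duration_msec must be positive")
        cmd += ["--attach-duration-msec", str(duration_msec)]

    if blocks:
        cmd += ["--block", *blocks]
    else:
        cmd += ["--set", str(counter_set)]
    return cmd


if __name__ == "__main__":
    # Block IDs taken from the example in the documentation text.
    argv = build_attach_command(
        12345, blocks=["3.1.1", "4.1.1", "5.1.1"], duration_msec=5000
    )
    print(shlex.join(argv))
```

Returning an argv list rather than a shell string keeps quoting out of the picture when the list is later handed to ``subprocess.run``. The check that exactly one of ``blocks`` or ``counter_set`` is given mirrors the documented single-pass restriction; it does not cover the ``--iteration-multiplexing`` alternative mentioned in the updated note.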