From d5ca98baed35629fdcede103bb2be93b1dc7da52 Mon Sep 17 00:00:00 2001
From: "Bhardwaj, Gopesh"
Date: Wed, 2 Jul 2025 12:25:24 +0530
Subject: [PATCH] Adding OpenMP usage with rocprofv3 (#472)

* Adding openmp usage with rocprofv3

* minor changes

* Fixing missing line

[ROCm/rocprofiler-sdk commit: e7616c3aad9ed99dbc2d1fcf97b1dbdc7399905f]
---
 .../rocprofiler-sdk/source/docs/_toc.yml.in   |  1 +
 .../how-to/using-rocprofv3-with-openmp.rst    | 90 +++++++++++++++++++
 .../rocprofiler-sdk/source/docs/index.rst     |  1 +
 3 files changed, 92 insertions(+)
 create mode 100644 projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3-with-openmp.rst

diff --git a/projects/rocprofiler-sdk/source/docs/_toc.yml.in b/projects/rocprofiler-sdk/source/docs/_toc.yml.in
index f47bac65db..34573293e8 100644
--- a/projects/rocprofiler-sdk/source/docs/_toc.yml.in
+++ b/projects/rocprofiler-sdk/source/docs/_toc.yml.in
@@ -18,6 +18,7 @@ subtrees:
   - file: how-to/using-rocprofv3-avail
   - file: how-to/using-rocprofiler-sdk-roctx
   - file: how-to/using-rocprofv3-with-mpi
+  - file: how-to/using-rocprofv3-with-openmp
   - file: how-to/using-pc-sampling
   - file: how-to/using-thread-trace
 - caption: API reference
diff --git a/projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3-with-openmp.rst b/projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3-with-openmp.rst
new file mode 100644
index 0000000000..3ceaa70411
--- /dev/null
+++ b/projects/rocprofiler-sdk/source/docs/how-to/using-rocprofv3-with-openmp.rst
@@ -0,0 +1,90 @@
+.. meta::
+    :description: Documentation for using rocprofv3 with OpenMP applications
+    :keywords: ROCprofiler-SDK tool, OpenMP, rocprofv3, rocprofv3 tool usage, ROCprofiler-SDK command line tool, ROCprofiler-SDK CLI
+
+
+.. _using-rocprofv3-with-openmp:
+
+Using rocprofv3 with OpenMP
++++++++++++++++++++++++++++++
+
+`rocprofv3` does not support profiling CPU-side OpenMP code, such as host parallel regions. However, when OpenMP offloads computation to AMD GPUs (for example, through OpenMP target offload), `rocprofv3` can capture and profile the GPU activity that the offloaded regions initiate.
+
+Example: Vector Addition Using OpenMP Offload on AMD GPUs
+---------------------------------------------------------
+
+The following example performs vector addition with OpenMP target offload so that the workload runs on an AMD GPU.
+
+**Key Steps:**
+
+- Initialize input arrays on the host.
+- Offload the vector addition computation to the GPU using OpenMP directives.
+- Retrieve and verify the results on the host.
+
+.. code-block:: c
+
+    #include <stdio.h>
+    #include <omp.h>
+
+    #define N 1024
+
+    int main() {
+        float a[N], b[N], c[N];
+
+        // Initialize input arrays
+        for (int i = 0; i < N; ++i) {
+            a[i] = i * 1.0f;
+            b[i] = (N - i) * 1.0f;
+        }
+
+        // Offload vector addition to GPU
+        #pragma omp target teams distribute parallel for map(to: a[0:N], b[0:N]) map(from: c[0:N])
+        for (int i = 0; i < N; ++i) {
+            c[i] = a[i] + b[i];
+        }
+
+        // Verify results: every element should equal i + (N - i) = N
+        int errors = 0;
+        for (int i = 0; i < N; ++i) {
+            if (c[i] != N * 1.0f) {
+                errors++;
+            }
+        }
+
+        if (errors == 0) {
+            printf("Vector addition successful!\n");
+        } else {
+            printf("Vector addition failed with %d errors.\n", errors);
+        }
+
+        return 0;
+    }
+
+
+Building the OpenMP Offload Application
+---------------------------------------
+
+To compile the application for AMD GPU offload, use a command like the following. It assumes the source is saved as `vector_add.cpp`; replace the `gfx9xx` placeholder with your GPU's architecture (for example, `gfx90a`):
+
+.. code-block:: bash
+
+    amdclang++ -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa -L/opt/rocm/lib --offload-arch=gfx9xx vector_add.cpp -o vector_add
+
+Profiling the Application with rocprofv3
+----------------------------------------
+
+To profile the GPU activity during execution, run the application with `rocprofv3`:
+
+.. code-block:: bash
+
+    rocprofv3 -s --output-format csv -- ./vector_add
+
+When the run completes, `rocprofv3` generates several CSV trace files, such as:
+
+- `_kernel_trace.csv`
+- `_hsa_api_trace.csv`
+- `_memory_copy_trace.csv`
+- `_memory_allocation_trace.csv`
+- `_scratch_memory_trace.csv`
+
+These files record GPU kernel execution, HSA API calls, memory copies, memory allocations, and scratch memory usage, which together support detailed analysis of the offloaded workload.
diff --git a/projects/rocprofiler-sdk/source/docs/index.rst b/projects/rocprofiler-sdk/source/docs/index.rst
index 383acafe9f..8ef7325bee 100644
--- a/projects/rocprofiler-sdk/source/docs/index.rst
+++ b/projects/rocprofiler-sdk/source/docs/index.rst
@@ -35,6 +35,7 @@ The documentation is structured as follows:
     * :ref:`using-rocpd-output-format`
     * :ref:`using-rocprofiler-sdk-roctx`
     * :ref:`using-rocprofv3-with-mpi`
+    * :ref:`using-rocprofv3-with-openmp`
     * :ref:`using-pc-sampling`
     * :ref:`using-thread-trace`