Welton, Benjamin 4cd121e27b [SDK] Release 1.0 Public API Modifications (#277)
* Make sure all structs/enums can be forward declared

* Updates to counter collection

- consistency updates and cleanup

* Conversion of dimension information to info struct

* Added deprecated folder

* Testing changes

* merge changes

* Fix shadowed variable

* Source code formatting

* Fix shadowed variable

* Update rocprofiler_counter_info_v1_t member names

* Split version.h into version.h and ext_version.h

- ext_version.h contains external version info, e.g. ROCPROFILER_HSA_API_TABLE_MAJOR_VERSION, ROCPROFILER_HSA_RUNTIME_VERSION
- this reduces amount of recompilation after a commit since version.h gets updated with the git revision

* profile_config -> counter_config

* EOF new line

* [Samples] Reduce header includes + reorg counter collection samples

* Misc compilation fixes

- shadowed variables
- use of [[deprecated("...")]] in C code
- unused variables

* Minor misc modifications

- use common:: instead of rocprofiler::common:: when inside rocprofiler namespace
- counters.cpp
  - move local anon namespace functions into rocprofiler::counters:: anon namespace
  - use std::string_view for get_static_string
  - const ref for get_static_ptr
  - misc namespace shortening

* [Public API] rocprofiler_get_version_triplet + rocprofiler_version_triplet_t

- struct rocprofiler_version_triplet_t containing fields for the major, minor, and patch version
- public API function: rocprofiler_get_version_triplet
- define C++ operators for rocprofiler_version_triplet_t
- C++ function compute_version_triplet

* [Tests] Improve async-copy-testing test

- relax constraints
- improve logging

* Update counter_config.h doxygen docs

* ROCPROFILER_SDK_BETA_COMPAT

- ppdef which helps with renaming when set to 1

* Remove spurious include

* Fix includes for cxx/version.hpp

* Doxygen fixes for rocprofiler_get_version and rocprofiler_get_version_triplet

* Public API Experimental Designation

- ROCPROFILER_SDK_EXPERIMENTAL added to experimental function
- "(experimental)" added to doxygen @brief entries

* Fix use of assert instead of static_assert in hip/stream.cpp

* Use typedef instead of define for rocprofiler_profile_config_id_t

* Use inline rocprofiler_{create,destroy}_profile_config instead of ppdef

- added <rocprofiler-sdk/deprecated/profile_config.h>

* Doxygen for rocprofiler_{create,destroy}_profile_config

* ROCPROFILER_SDK_DEPRECATED_WARNINGS

* Temporarily comment out ROCPROFILER_SDK_DEPRECATED_WARNINGS=1

* cmake formatting

* Misc variable renaming in samples and tests

* Fix declarations of types

* Fix hip stream tracing service struct name

- rocprofiler_callback_tracing_stream_handle_data_t renamed to rocprofiler_callback_tracing_hip_stream_api_data_t

* Rename "HIP_STREAM_API" to "HIP_STREAM"

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
2025-03-24 12:07:33 +05:30
2023-08-24 19:19:48 -05:00
2025-02-04 04:05:38 -06:00
2025-01-22 11:34:21 -06:00
2025-01-22 19:11:20 -06:00
2025-02-04 04:05:38 -06:00
2025-02-13 08:51:10 -06:00

ROCprofiler-SDK: Application Profiling, Tracing, and Performance Analysis

Important

We are phasing out development and support for roctracer/rocprofiler/rocprof/rocprofv2 in favor of rocprofiler-sdk/rocprofv3 in upcoming ROCm releases. Going forward, only critical defect fixes will be addressed for older versions of profiling tools and libraries. We encourage all users to upgrade to the latest version, rocprofiler-sdk library and rocprofv3 tool, to ensure continued support and access to new features.

Overview

ROCProfiler-SDK is AMDs new and improved tooling infrastructure, providing a hardware-specific low-level performance analysis interface for profiling and tracing GPU compute applications. To see what's changed Click Here

Note

The published documentation is available at ROCprofiler-SDK documentation in an organized, easy-to-read format, with search and a table of contents. The documentation source files reside in the rocprofiler-sdk/source/docs folder of this repository. As with all ROCm projects, the documentation is open source. For more information on contributing to the documentation, see Contribute to ROCm documentation.

GPU Metrics

  • GPU hardware counters
  • HIP API tracing
  • HIP kernel tracing
  • HSA API tracing
  • HSA operation tracing
  • Marker(ROCTx) tracing
  • PC Sampling (Beta)
  • RCCL tracing
  • Kokkoks tracing

Tool Support

rocprofv3 is the command line tool built using the rocprofiler-sdk library and shipped with the ROCm stack. To see details on the command line options of rocprofv3, please see rocprofv3 user guide Click Here

Documentation

We make use of doxygen to generate API documentation automatically. The generated document can be found in the following path:

<ROCM_PATH>/share/html/rocprofiler-sdk

ROCM_PATH by default is /opt/rocm It can be set by the user in different locations if needed.

Build and Installation

git clone https://github.com/ROCm/rocprofiler-sdk.git rocprofiler-sdk-source
cmake                                         \
      -B rocprofiler-sdk-build                \
      -D ROCPROFILER_BUILD_TESTS=ON           \
      -D ROCPROFILER_BUILD_SAMPLES=ON         \
      -D CMAKE_INSTALL_PREFIX=/opt/rocm       \
       rocprofiler-sdk-source

cmake --build rocprofiler-sdk-build --target all --parallel 8

To install ROCprofiler, run:

cmake --build rocprofiler-sdk-build --target install

Please see the detailed section on build and installation here: Click Here

Support

Please report in the Github Issues.

Limitations

  • Individual XCC mode is not supported.

  • By default, PC sampling API is disabled. To use PC sampling. Setting the ROCPROFILER_PC_SAMPLING_BETA_ENABLED environment variable grants access to the PC Sampling experimental beta feature. This feature is still under development and may not be completely stable.

    • Risk Acknowledgment: By activating this environment variable, you acknowledge and accept the following potential risks:

      • Hardware Freeze: This beta feature could cause your hardware to freeze unexpectedly.
      • Need for Cold Restart: In the event of a hardware freeze, you may need to perform a cold restart (turning the hardware off and on) to restore normal operations. Please use this beta feature cautiously. It may affect your system's stability and performance. Proceed at your own risk.
    • At this point, We do not recommend stress-testing the beta implementation.

    • Correlation IDs provided by the PC sampling service are verified only for HIP API calls.

    • Timestamps in PC sampling records might not be 100% accurate.

    • Using PC sampling on multi-threaded applications might fail with HSA_STATUS_ERROR_EXCEPTION.Furthermore, if three or more threads launch operations to the same agent, and if PC sampling is enabled, the HSA_STATUS_ERROR_EXCEPTION might appear.

  • gfx11 and gfx12 requires a stable power state for counter collection. This includes Radeon 7000 GPUs.

    # For device <N>. Use 'rocm-smi' or 'amd-smi monitor' to see device number.
    sudo amd-smi set -g <N> -l stable_std
    # After profiling, set power state back to 'auto'
    sudo amd-smi set -g <N> -l auto
    

    The gfx version can be found via amd-smi static --asic -g <N> in the TARGET_GRAPHICS_VERSION field:

    $ amd-smi static -a -g 2
    GPU: 2
        ASIC:
            MARKET_NAME: Navi 33 [Radeon Pro W7500]
            VENDOR_ID: 0x1002
            VENDOR_NAME: Advanced Micro Devices Inc. [AMD/ATI]
            SUBVENDOR_ID: 0x1002
            DEVICE_ID: 0x7489
            SUBSYSTEM_ID: 0x0e0d
            REV_ID: 0x00
            ASIC_SERIAL: N/A
            OAM_ID: N/A
            NUM_COMPUTE_UNITS: 28
            TARGET_GRAPHICS_VERSION: gfx1102
    

Warning

The latest mainline version of AQLprofile can be found at https://repo.radeon.com/rocm/misc/aqlprofile/. However, it's important to note that updates to the public AQLProfile may not occur as frequently as updates to the rocprofiler-sdk. This discrepancy could lead to a potential mismatch between the AQLprofile binary and the rocprofiler-sdk source.

S
توضیحات
No description provided
Readme 282 MiB
Languages
C++ 67.5%
C 20.6%
Python 6.6%
CMake 3.4%
Shell 0.6%
دیگر 1.1%