Mythreya 363f85dc72 Report page migration events as start/end (#793)
* Squashed commit of the following:

commit b76f2635f4b65599f03812a73d0cf410f5ada213
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Fri Apr 26 00:29:09 2024 +0000

    Changed for PR feedback

commit bedb8ad566ff42fbf117b19202c26c507abcf8ac
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date:   Thu Apr 25 19:20:06 2024 -0500

    Fix installation

commit a98f8a69459a1450a1be9c98e20b3c1e7f2568c2
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date:   Thu Apr 25 19:16:35 2024 -0500

    Restructure the headers

commit 46489a020ffafdd5f4ce3f580469ff233ef67fe1
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Tue Apr 23 23:31:10 2024 +0000

    Update hsa include

commit 8e795282cce348fc6aa736b7857b21aeb32aa20a
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Tue Apr 23 23:02:32 2024 +0000

    Report page migration events as start/end

    * Updated tests accordingly
    * Page migration events are reported independently

commit 8784e5ad4895a626a2a8e4ac12f8021b34172bd4
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Tue Apr 16 17:01:57 2024 +0000

    Update handling of dropped page migration events

    Previously, we dropped all locally buffered events when we detect that
    KFD has dropped some events. This may drop too many pending events too eagerly.

    When we receive an end event and cannot find the corresponding start,
    we can be sure that KFD has dropped some events in the immediate past.

    When this happens, we look through all locally buffered events and report
    the start events that are older than 10s as partial events --- they have
    no "end" information (we expect that the end events have been dropped).

    We also set the polling timeout to 10s to prevent the local buffer from
    getting too large with events waiting to be paired up.

    Updated tests

commit 2e8e0b07eeda9b5990e1ae8d28dcd3a035ce38e1
Author: Mythreya <mythreya.kuricheti@amd.com>
Date:   Tue Apr 16 17:01:31 2024 +0000

    Docs for triggers

* Fix page migration sample

* Fix hasher, kfd install

* Add hsa include
* Install KFD include dir

* Updates from code review

- single timestamp field
- node_id -> agent_id
- from_node -> from_agent
- to_node -> to_agent

* Misc revisions

* Remove page-migration install target

* Update page-migration pytest

* Tweak to serialization

* Address PR comments

* Update page-migration test

* Add cli args, update iterations

* Address PR comments

* Add abi.cpp for static_asserts
* Update page_migration gtest with only runtime tests
* Moved helpers into utils.hpp

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

ROCprofiler-SDK: Application Profiling, Tracing, and Performance Analysis

Note

rocprofiler-sdk is currently a beta release and is subject to change in future releases.

Overview

ROCprofiler-SDK is AMD’s new and improved tooling infrastructure. It provides a hardware-specific, low-level performance analysis interface for profiling and tracing GPU compute applications. To see what's changed, Click Here.

Note

The published documentation is available at ROCprofiler-SDK documentation in an organized, easy-to-read format, with search and a table of contents. The documentation source files reside in the rocprofiler-sdk/source/docs folder of this repository. As with all ROCm projects, the documentation is open source. For more information on contributing to the documentation, see Contribute to ROCm documentation.

GPU Metrics

  • GPU hardware counters
  • HIP API tracing
  • HIP kernel tracing
  • HSA API tracing
  • HSA operation tracing
  • Marker (ROCTx) tracing
  • PC Sampling (Beta)

Tool Support

rocprofv3 is the command-line tool built on the rocprofiler-sdk library and shipped with the ROCm stack. For details on its command-line options, see the rocprofv3 user guide: Click Here

Documentation

API documentation is generated automatically with Doxygen. The generated documentation can be found at the following path:

<ROCM_PATH>/share/html/rocprofiler-sdk

ROCM_PATH defaults to /opt/rocm. Set it to a different location if ROCm is installed elsewhere.
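
As a minimal sketch of how that default could be applied in a shell session, the helper below falls back to /opt/rocm when ROCM_PATH is unset or empty. The function name rocprofiler_sdk_doc_dir is hypothetical and is not something ROCm ships.

```shell
# Hypothetical helper (not part of ROCm): print the directory that holds the
# generated API documentation. Falls back to /opt/rocm when ROCM_PATH is
# unset or empty.
rocprofiler_sdk_doc_dir() {
    printf '%s\n' "${ROCM_PATH:-/opt/rocm}/share/html/rocprofiler-sdk"
}

# Print the resolved path for the current environment.
rocprofiler_sdk_doc_dir
```

Wrapping the lookup in a function keeps the fallback in one place, so scripts that open or copy the documentation do not each hard-code /opt/rocm.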

Build and Installation

git clone https://github.com/ROCm/rocprofiler-sdk.git rocprofiler-sdk-source
cmake                                         \
      -B rocprofiler-sdk-build                \
      -D ROCPROFILER_BUILD_TESTS=ON           \
      -D ROCPROFILER_BUILD_SAMPLES=ON         \
      -D CMAKE_INSTALL_PREFIX=/opt/rocm       \
       rocprofiler-sdk-source

cmake --build rocprofiler-sdk-build --target all --parallel 8

To install ROCprofiler, run:

cmake --build rocprofiler-sdk-build --target install

For details, see the full build and installation section: Click Here

Support

Please report problems through GitHub Issues.

Limitations

  • Individual XCC mode is not supported.

  • By default, the PC sampling API is disabled. To use PC sampling, set the ROCPROFILER_PC_SAMPLING_BETA_ENABLED environment variable, which grants access to the experimental PC sampling beta feature. This feature is still under development and may not be completely stable.

    • Risk Acknowledgment: By activating this environment variable, you acknowledge and accept the following potential risks:
      • Hardware Freeze: This beta feature could cause your hardware to freeze unexpectedly.
      • Need for Cold Restart: In the event of a hardware freeze, you may need to perform a cold restart (turning the hardware off and on) to restore normal operations. Please use this beta feature cautiously. It may affect your system's stability and performance. Proceed at your own risk.
  • At this point, we do not recommend stress-testing the beta implementation.

  • Correlation IDs provided by the PC sampling service are verified only for HIP API calls.

  • Timestamps in PC sampling records might not be 100% accurate.

  • Using PC sampling on multi-threaded applications might fail with HSA_STATUS_ERROR_EXCEPTION. Furthermore, if three or more threads launch operations to the same agent while PC sampling is enabled, HSA_STATUS_ERROR_EXCEPTION might appear.
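
Given the risks listed above, one option is to scope the PC sampling opt-in to a single command rather than exporting it for the whole session. In the sketch below, the wrapper name with_pc_sampling_beta is hypothetical, and the value 1 is an assumption: this README names the variable but does not list the values it accepts.

```shell
# Hypothetical wrapper: run one command with the PC sampling beta opt-in set
# only in that command's environment. The value "1" is an assumption.
with_pc_sampling_beta() {
    env ROCPROFILER_PC_SAMPLING_BETA_ENABLED=1 "$@"
}

# Demonstration with a harmless command that only prints the variable.
with_pc_sampling_beta sh -c 'echo "$ROCPROFILER_PC_SAMPLING_BETA_ENABLED"'
```

Because the variable is passed through env, it is not left set in the calling shell after the command returns.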

Warning

The latest mainline version of AQLprofile can be found at https://repo.radeon.com/rocm/misc/aqlprofile/. However, the public AQLprofile may be updated less frequently than rocprofiler-sdk, which could lead to a mismatch between the AQLprofile binary and the rocprofiler-sdk source.
