Pull from GitHub

Squashed commit of the following:

commit f029195705a15700380c6f832ba5d15d46fd6de7
Author: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Date:   Thu Jul 13 14:38:56 2023 -0500

    Formatting workflows for source (clang-format) and cmake (cmake-format) (#4)

    * Add .cmake-format.yaml file

    * Add formatting workflow

    * provide base input for creating PR

    * Update scheme for extracting branch name

    - disable running formatting on push to amd-staging branch

    * patch .cmake-format.yaml for find_package signature

    - apparently cmake-format doesn't format the full signature of find_package

    * run formatting (clang-format v11) (#7)

    Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>

    * run cmake formatting (cmake-format) (#6)

    Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>

    ---------

    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

commit bc4d135fdd8a1a9e51235f18a5d575fd2b3735e6
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Thu Jul 13 12:55:17 2023 -0500

    Removing Build cache for potential issues with auto-generated header files (#5)

    Change-Id: I9e2319f4335e2f88585ffa6fac2bd88a1c952e6e

commit ce86dea6a311d44d880fa684eb78f3329295e2a4
Author: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Date:   Thu Jul 13 11:08:58 2023 -0500

    Fix decltype(<hsa-function>) function pointer usage (#3)

    - the following is done in several places:
        decltype(hsa_memory_allocate)* hsa_memory_allocate
    - above can cause compiler errors
    - replace decltype(<hsa-function>) with decltype(::<hsa-function>)
      - this ensures that the type within the decltype is recognized as the global scope HSA function, not the variable
    - in many places, the variable has a "_fn" suffix to prevent this issue but added '::' anyway for consistency
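
The name clash described in this commit message can be sketched as follows. This is a minimal illustration, not code from the repository: `fake_hsa_memory_allocate`, `api_table`, and `allocate_through_table` are hypothetical names, and the stand-in function does not share the real HSA signature. It assumes the colliding declaration is a class member. In that position the member redeclares a name its own type already used, which GCC, for one, reports as a declaration that changes the meaning of the name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <type_traits>

// Hypothetical stand-in for an HSA entry point such as hsa_memory_allocate;
// the real function is declared in hsa.h with a different signature.
int fake_hsa_memory_allocate(std::size_t size, void** ptr)
{
    *ptr = std::malloc(size);
    return (*ptr != nullptr) ? 0 : 1;
}

// Table of function pointers whose members reuse the names of the functions
// they point to (illustrative type, not taken from the rocprofiler sources).
struct api_table
{
    // Problematic form, left commented out:
    //   decltype(fake_hsa_memory_allocate)* fake_hsa_memory_allocate;
    // The member would redeclare the unqualified name used in its own type.

    // '::' pins the name inside decltype to the global-scope function.
    decltype(::fake_hsa_memory_allocate)* fake_hsa_memory_allocate = nullptr;
};

// The member is a plain pointer to the global function's type.
static_assert(std::is_same<decltype(api_table::fake_hsa_memory_allocate),
                           int (*)(std::size_t, void**)>::value,
              "member should be a pointer to the global function's type");

// Fill the table with the global function and call through the pointer.
inline int allocate_through_table(std::size_t size)
{
    api_table table;
    table.fake_hsa_memory_allocate = &::fake_hsa_memory_allocate;
    void*     ptr    = nullptr;
    const int status = table.fake_hsa_memory_allocate(size, &ptr);
    std::free(ptr);
    return status;
}
```

Calling `allocate_through_table(64)` goes through the table entry and returns the stand-in's status code. Giving the member a `_fn` suffix, as the message notes, avoids the clash as well; the `::` qualifier simply makes the intent explicit in both cases.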

commit ac49fdd92a72e9c99394253a02da413a6c2e3b3a
Merge: a07946a 03a0855
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Wed Jul 12 11:36:24 2023 -0500

    Merge pull request #2 from ROCm-Developer-Tools/gerrit-amd-staging

    Pull from gerrit

commit 03a085588cffe863e8f466de67be1cfb205b675a
Merge: c26b32b a07946a
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Wed Jul 12 10:57:30 2023 -0500

    Merge branch 'amd-staging' into gerrit-amd-staging

commit a07946a5cd4c670c83c27ad1a076a9d4567ce6d7
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 15:46:04 2023 +0000

    Enabling Cached Builds

commit 525e494a7f13941077a8fd4ad6840904db4d27d4
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 04:53:54 2023 +0000

    Updating missed GPU Targets

commit 42c75862f628c9bee7cfb7dc04dff2619430efbc
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 04:43:02 2023 +0000

    Adding V1 Testing

commit 9d72fd4aee85e4b0c12e717060d2730fa5b73be1
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 03:34:31 2023 +0000

    Fixing Artifacts directory path

commit f4000cc558b3b2e4676f7994f7ce8c8e6f94518e
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 03:27:26 2023 +0000

    Fixing CMake for test build job

commit 2ce8115d4c33948c3c8f957f545a95a04e1d6cd2
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 03:16:18 2023 +0000

    Fixing Ubuntu CMake for ubuntu test build

commit 6d0ed439191be900748d0c025157f9d689a73ec7
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 01:28:41 2023 +0000

    Removing Navi21

commit e349a7642e5ae5eb03ab9fcd0a0f74f09f78cab5
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 01:14:14 2023 +0000

    Removing Navi21

commit fefd02fe68d2a4bca7ec2e381960ad004ee9fc5b
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 00:42:48 2023 +0000

    Fixing CMake Job

commit 2ea46abf7bf92643efa8c549fa70346ffbd79d65
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 00:35:13 2023 +0000

    Fixing CMake Job

commit d99d681ed1999c5fcf291dc678b11a77205fb0f3
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 00:32:13 2023 +0000

    Fixing Pull Latest Dockers and CMake Jobs

commit dfc4498072d13b4a1df3a63047d34c682c3d9a29
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 23:54:21 2023 +0000

    Fixing CMake job

commit 919efe04de707f7c702031be15c3e2c5f8442cbb
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 23:52:13 2023 +0000

    Adding Pull Last dockers job

commit be1b1256e8b0e05308e8f7e7e69bee3acca55281
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Tue Jul 11 18:25:40 2023 -0500

    Update cmake.yml

commit 212299fa4355ae6ec18f9aaacbb79c51ea6c6f97
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Tue Jul 11 18:23:35 2023 -0500

    Update cmake.yml

commit 7c2c1327086a61466cc6cac39f70865c051a8bc7
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Tue Jul 11 18:18:53 2023 -0500

    Update cmake.yml

commit 191b5ce007e612e814c1d7a3afb4ad398f3852e1
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Tue Jul 11 16:03:22 2023 -0500

    Update cmake.yml

commit 8824113d95f3e13c7ce4d0af8e0d9d8f522a6c4a
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 16:28:09 2023 +0000

    Fixing Pull from Gerrit job name

    Change-Id: I9e7ed9a27a13ca49d62c93bdadb30f0057e4d385

commit cc3d5e4b02ffb439e8cc2b3efa53527c376f9982
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 16:21:43 2023 +0000

    Adding Staging sync job

    Change-Id: I0551f43878b0678ce4b3e74e27d62357cf95ad95

commit b9be2eee71380a2e6dd34d520e92d0c4209277a0
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 15:57:11 2023 +0000

    Fixing build.sh

    Change-Id: Ia987b0244f0875370d5fe69907b3f5e9cea914de

commit 9eee33a95a1abd656a7ac5ca10a9f245e9825431
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 21:39:46 2023 -0500

    Update cmake.yml

commit 7093b85a78497140e8b52632ca2a002bdaeacd62
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 21:33:29 2023 -0500

    Update cmake.yml

commit f54697172c72a67740f9fdfa0c217b6ea6931576
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 21:01:26 2023 -0500

    Update cmake.yml

commit 1b6620e16f8940386b0f4f04e69e2410d21c0e26
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 20:21:02 2023 -0500

    Update cmake.yml

commit a94bec740c6b42c4b79c87bca20fa87b99bf060d
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:46:35 2023 -0500

    Update cmake.yml

commit 85d6b29d4375a69d575c18ece8542c50f2ddfcc3
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:34:39 2023 -0500

    Update cmake.yml

commit 8c004887cf1435f1a6214c3d2455299a8a27bd4c
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:31:17 2023 -0500

    Update cmake.yml

commit a14a9168e17d9348a53c6e9c9a47ba1edb4c4509
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:25:46 2023 -0500

    Update cmake.yml

commit 000f2f40b84e6a2f7d4becdbf5aed01436ca4c83
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:08:18 2023 -0500

    Update cmake.yml

commit a28a53d56731cad848fa9133d1c4dbaa8fc7afa7
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:03:39 2023 -0500

    Update cmake.yml

commit a6a2db01027f0b01fdfbb5997ddb772c7f51b649
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 18:21:53 2023 -0500

    Update cmake.yml

commit 118ef2a88b2d44e3207c31c343da3e5e5ec6f176
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 17:55:57 2023 -0500

    Update cmake.yml

commit 03c4c232396440cd0be6d2dd7baf4ceea1c2589d
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 17:48:49 2023 -0500

    Create cmake.yml

Change-Id: I77992f15694e77cbae49c56f9ff02f4f9079235d


[ROCm/rocprofiler commit: d4a33cf33a]
Ammar ELWazir
2023-07-13 19:48:38 +00:00
Committed by Ammar Elwazir
Parent e599708211
Commit 6eb06cf201
144 changed files with 154836 additions and 166916 deletions
@@ -0,0 +1,98 @@
parse:
  additional_commands:
    find_package:
      flags:
      - EXACT
      - QUIET
      - MODULE
      - REQUIRED
      - CONFIG
      - NO_MODULE
      - GLOBAL
      - NO_POLICY_SCOPE
      - BYPASS_PROVIDER
      - NO_DEFAULT_PATH
      - NO_PACKAGE_ROOT_PATH
      - NO_CMAKE_PATH
      - NO_CMAKE_ENVIRONMENT_PATH
      - NO_SYSTEM_ENVIRONMENT_PATH
      - NO_CMAKE_PACKAGE_REGISTRY
      - NO_CMAKE_BUILDS_PATH
      - NO_CMAKE_SYSTEM_PATH
      - NO_CMAKE_INSTALL_PREFIX
      - NO_CMAKE_SYSTEM_PACKAGE_REGISTRY
      - CMAKE_FIND_ROOT_PATH_BOTH
      - ONLY_CMAKE_FIND_ROOT_PATH
      - NO_CMAKE_FIND_ROOT_PATH
      kwargs:
        COMPONENTS: '*'
        OPTIONAL_COMPONENTS: '*'
        NAMES: '*'
        CONFIGS: '*'
        HINTS: '*'
        PATHS: '*'
        REGISTRY_VIEW: '*'
        PATH_SUFFIXES: '*'
  override_spec: {}
  vartags: []
  proptags: []
format:
  disable: false
  line_width: 90
  tab_size: 4
  use_tabchars: false
  fractional_tab_policy: use-space
  max_subgroups_hwrap: 2
  max_pargs_hwrap: 8
  max_rows_cmdline: 2
  separate_ctrl_name_with_space: false
  separate_fn_name_with_space: false
  dangle_parens: false
  dangle_align: child
  min_prefix_chars: 4
  max_prefix_chars: 10
  max_lines_hwrap: 2
  line_ending: unix
  command_case: lower
  keyword_case: upper
  always_wrap: []
  enable_sort: true
  autosort: false
  require_valid_layout: false
  layout_passes: {}
markup:
  bullet_char: '*'
  enum_char: .
  first_comment_is_literal: true
  literal_comment_pattern: ^#
  fence_pattern: ^\s*([`~]{3}[`~]*)(.*)$
  ruler_pattern: ^\s*[^\w\s]{3}.*[^\w\s]{3}$
  explicit_trailing_pattern: '#<'
  hashruler_min_length: 10
  canonicalize_hashrulers: true
  enable_markup: true
lint:
  disabled_codes: []
  function_pattern: '[0-9a-z_]+'
  macro_pattern: '[0-9A-Z_]+'
  global_var_pattern: '[A-Z][0-9A-Z_]+'
  internal_var_pattern: _[A-Z][0-9A-Z_]+
  local_var_pattern: '[a-z][a-z0-9_]+'
  private_var_pattern: _[0-9a-z_]+
  public_var_pattern: '[A-Z][0-9A-Z_]+'
  argument_var_pattern: '[a-z][a-z0-9_]+'
  keyword_pattern: '[A-Z][0-9A-Z_]+'
  max_conditionals_custom_parser: 2
  min_statement_spacing: 1
  max_statement_spacing: 2
  max_returns: 6
  max_branches: 12
  max_arguments: 5
  max_localvars: 15
  max_statements: 50
encode:
  emit_byteorder_mark: false
  input_encoding: utf-8
  output_encoding: utf-8
misc:
  per_command: {}
+14 -159
@@ -34,16 +34,7 @@ jobs:
steps:
- uses: actions/checkout@v3
-- name: Restore cached Build
-id: cache-build-restore
-uses: actions/cache/restore@v3
-with:
-path: |
-${{github.workspace}}/build
-key: mi200_ubuntu_22_04
- name: Configure CMake
-if: steps.cache-build-restore.outputs.cache-hit != 'false'
# Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -56,14 +47,6 @@ jobs:
- name: Build Tests, Samples, Documentation, Packages
run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests samples doc package
-- name: Save Build
-id: cache-build-save
-uses: actions/cache/save@v3
-with:
-path: |
-${{github.workspace}}/build
-key: mi200_ubuntu_22_04
- name: Testing V1
run: |
cd ${{github.workspace}}/build
@@ -102,16 +85,7 @@ jobs:
steps:
- uses: actions/checkout@v3
-- name: Restore cached Build
-id: cache-build-restore
-uses: actions/cache/restore@v3
-with:
-path: |
-${{github.workspace}}/build
-key: mi200_ubuntu_20_04
- name: Configure CMake
-if: steps.cache-build-restore.outputs.cache-hit != 'false'
# Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -153,16 +127,7 @@ jobs:
steps:
- uses: actions/checkout@v3
-- name: Restore cached Build
-id: cache-build-restore
-uses: actions/cache/restore@v3
-with:
-path: |
-${{github.workspace}}/build
-key: mi200_sles
- name: Configure CMake
-if: steps.cache-build-restore.outputs.cache-hit != 'false'
# Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -175,14 +140,6 @@ jobs:
- name: Build Tests
run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
-- name: Save Build
-id: cache-build-save
-uses: actions/cache/save@v3
-with:
-path: |
-${{github.workspace}}/build
-key: mi200_sles
- name: Testing V1
run: |
cd ${{github.workspace}}/build
@@ -212,16 +169,7 @@ jobs:
steps:
- uses: actions/checkout@v3
-- name: Restore cached Build
-id: cache-build-restore
-uses: actions/cache/restore@v3
-with:
-path: |
-${{github.workspace}}/build
-key: mi200_rhel_8
- name: Configure CMake
-if: steps.cache-build-restore.outputs.cache-hit != 'false'
# Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -234,14 +182,6 @@ jobs:
- name: Build Tests
run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
-- name: Save Build
-id: cache-build-save
-uses: actions/cache/save@v3
-with:
-path: |
-${{github.workspace}}/build
-key: mi200_rhel_8
- name: Testing V1
run: |
cd ${{github.workspace}}/build
@@ -271,16 +211,7 @@ jobs:
steps:
- uses: actions/checkout@v3
-- name: Restore cached Build
-id: cache-build-restore
-uses: actions/cache/restore@v3
-with:
-path: |
-${{github.workspace}}/build
-key: mi200_rhel_9
- name: Configure CMake
-if: steps.cache-build-restore.outputs.cache-hit != 'false'
# Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -293,14 +224,6 @@ jobs:
- name: Build Tests
run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
-- name: Save Build
-id: cache-build-save
-uses: actions/cache/save@v3
-with:
-path: |
-${{github.workspace}}/build
-key: mi200_rhel_9
- name: Testing V1
run: |
cd ${{github.workspace}}/build
@@ -326,16 +249,7 @@ jobs:
steps:
- uses: actions/checkout@v3
-- name: Restore cached Build
-id: cache-build-restore
-uses: actions/cache/restore@v3
-with:
-path: |
-${{github.workspace}}/build
-key: vega20
- name: Configure CMake
-if: steps.cache-build-restore.outputs.cache-hit != 'false'
# Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -348,14 +262,6 @@ jobs:
- name: Build Tests
run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
-- name: Save Build
-id: cache-build-save
-uses: actions/cache/save@v3
-with:
-path: |
-${{github.workspace}}/build
-key: vega20
- name: Testing V1
run: |
cd ${{github.workspace}}/build
@@ -381,16 +287,7 @@ jobs:
steps:
- uses: actions/checkout@v3
-- name: Restore cached Build
-id: cache-build-restore
-uses: actions/cache/restore@v3
-with:
-path: |
-${{github.workspace}}/build
-key: navi32
- name: Configure CMake
-if: steps.cache-build-restore.outputs.cache-hit != 'false'
# Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -403,14 +300,6 @@ jobs:
- name: Build Tests
run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
-- name: Save Build
-id: cache-build-save
-uses: actions/cache/save@v3
-with:
-path: |
-${{github.workspace}}/build
-key: navi32
- name: Testing V1
run: |
cd ${{github.workspace}}/build
@@ -436,16 +325,7 @@ jobs:
steps:
- uses: actions/checkout@v3
-- name: Restore cached Build
-id: cache-build-restore
-uses: actions/cache/restore@v3
-with:
-path: |
-${{github.workspace}}/build
-key: mi100
- name: Configure CMake
-if: steps.cache-build-restore.outputs.cache-hit != 'false'
# Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -458,14 +338,6 @@ jobs:
- name: Build Tests
run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
-- name: Save Build
-id: cache-build-save
-uses: actions/cache/save@v3
-with:
-path: |
-${{github.workspace}}/build
-key: mi100
- name: Testing V1
run: |
cd ${{github.workspace}}/build
@@ -492,16 +364,7 @@ jobs:
# steps:
# - uses: actions/checkout@v3
-# - name: Restore cached Build
-# id: cache-build-restore
-# uses: actions/cache/restore@v3
-# with:
-# path: |
-# ${{github.workspace}}/build
-# key: navi21
# - name: Configure CMake
-# if: steps.cache-build-restore.outputs.cache-hit != 'false'
# # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
# run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -514,26 +377,18 @@ jobs:
# - name: Build Tests
# run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
-# - name: Save Build
-# id: cache-build-save
-# uses: actions/cache/save@v3
-# with:
-# path: |
-# ${{github.workspace}}/build
-# key: navi21
-# - name: Testing V1
-# run: |
-# cd ${{github.workspace}}/build
-# ./run.sh
-# # TODO(aelwazir): Enable this once ctest is fixed
-# # working-directory: ${{github.workspace}}/build/tests-v2
-# # Execute tests defined by the CMake configuration.
-# # See https://cmake.org/cmake/help/latest/manual/ctest.1.html for more detail
-# # TODO(aelwazir): Enable this once ctest is fixed
-# # run: ctest --parallel 16 -C ${{env.BUILD_TYPE}}
+# - name: Testing V1
+# run: |
+# cd ${{github.workspace}}/build
+# ./run.sh
+# # TODO(aelwazir): Enable this once ctest is fixed
+# # working-directory: ${{github.workspace}}/build/tests-v2
+# # Execute tests defined by the CMake configuration.
+# # See https://cmake.org/cmake/help/latest/manual/ctest.1.html for more detail
+# # TODO(aelwazir): Enable this once ctest is fixed
+# # run: ctest --parallel 16 -C ${{env.BUILD_TYPE}}
-# - name: Testing V2
-# run: |
-# cd ${{github.workspace}}/build
-# make -j check
+# - name: Testing V2
+# run: |
+# cd ${{github.workspace}}/build
+# make -j check
+95
@@ -0,0 +1,95 @@
name: Formatting
run-name: formatting
on:
  pull_request:
    branches: [ amd-staging ]
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  cmake:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - name: Extract branch name
        shell: bash
        run: |
          echo "branch=${GITHUB_HEAD_REF:-${GITHUB_HEAD_REF#refs/heads/}}" >> $GITHUB_OUTPUT
        id: extract_branch
      - name: Install dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y python3-pip
          python3 -m pip install -U cmake-format
      - name: Run cmake-format
        run: |
          set +e
          cmake-format -i $(find . -type f | egrep 'CMakeLists.txt|\.cmake$')
          if [ $(git diff | wc -l) -ne 0 ]; then
            echo -e "\nError! CMake code not formatted. Run cmake-format...\n"
            echo -e "\nFiles:\n"
            git diff --name-only
            echo -e "\nFull diff:\n"
            git diff
            exit 1
          fi
      - name: Create pull request
        if: failure()
        uses: peter-evans/create-pull-request@v5
        with:
          commit-message: "run cmake formatting (cmake-format)"
          branch: ${{ steps.extract_branch.outputs.branch }}-cmake-format
          delete-branch: true
          title: "Apply cmake-format to ${{ steps.extract_branch.outputs.branch }}"
          base: ${{ steps.extract_branch.outputs.branch }}
  source:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - name: Install dependencies
        run: |
          DISTRIB_CODENAME=$(cat /etc/lsb-release | grep DISTRIB_CODENAME | awk -F '=' '{print $NF}')
          sudo apt-get update
          sudo apt-get install -y software-properties-common wget curl clang-format-11
      - name: Extract branch name
        shell: bash
        run: |
          echo "branch=${GITHUB_HEAD_REF:-${GITHUB_HEAD_REF#refs/heads/}}" >> $GITHUB_OUTPUT
        id: extract_branch
      - name: Run clang-format
        run: |
          set +e
          FILES=$(find include plugin samples src test tests-v2 -type f | egrep '\.(h|hpp|hh|c|cc|cpp)(|\.in)$')
          FORMAT_OUT=$(clang-format-11 -i ${FILES})
          if [ $(git diff | wc -l) -ne 0 ]; then
            echo -e "\nError! Code not formatted. Run clang-format (version 11)...\n"
            echo -e "\nFiles:\n"
            git diff --name-only
            echo -e "\nFull diff:\n"
            git diff
            exit 1
          fi
      - name: Create pull request
        if: failure()
        uses: peter-evans/create-pull-request@v5
        with:
          commit-message: "run formatting (clang-format v11)"
          branch: ${{ steps.extract_branch.outputs.branch }}-clang-format
          delete-branch: true
          title: "Apply clang-format (v11) to ${{ steps.extract_branch.outputs.branch }}"
          base: ${{ steps.extract_branch.outputs.branch }}
@@ -24,7 +24,7 @@ cmake_minimum_required(VERSION 3.18.0)
# Build is not supported on Windows platform
if(WIN32)
-message(FATAL_ERROR "Windows build is not supported.")
+message(FATAL_ERROR "Windows build is not supported.")
endif()
# Set module name and project name.
@@ -37,9 +37,9 @@ include(GNUInstallDirs)
# set default ROCM_PATH
if(NOT DEFINED ROCM_PATH)
-set(ROCM_PATH
-"/opt/rocm"
-CACHE STRING "Default ROCM installation directory")
+set(ROCM_PATH
+"/opt/rocm"
+CACHE STRING "Default ROCM installation directory")
endif()
set(CMAKE_CXX_STANDARD 17)
@@ -62,8 +62,8 @@ set(BUILD_VERSION_MAJOR ${VERSION_MAJOR})
set(BUILD_VERSION_MINOR ${VERSION_MINOR})
set(BUILD_VERSION_PATCH ${VERSION_PATCH})
if(DEFINED VERSION_BUILD AND NOT ${VERSION_BUILD} STREQUAL "")
-message("VERSION BUILD DEFINED ${VERSION_BUILD}")
-set(BUILD_VERSION_PATCH "${BUILD_VERSION_PATCH}-${VERSION_BUILD}")
+message("VERSION BUILD DEFINED ${VERSION_BUILD}")
+set(BUILD_VERSION_PATCH "${BUILD_VERSION_PATCH}-${VERSION_BUILD}")
endif()
set(BUILD_VERSION_STRING
"${BUILD_VERSION_MAJOR}.${BUILD_VERSION_MINOR}.${BUILD_VERSION_PATCH}")
@@ -71,12 +71,11 @@ set(BUILD_VERSION_STRING
set(LIB_VERSION_MAJOR ${VERSION_MAJOR})
set(LIB_VERSION_MINOR ${VERSION_MINOR})
if(${ROCM_PATCH_VERSION})
-set(LIB_VERSION_PATCH ${ROCM_PATCH_VERSION})
+set(LIB_VERSION_PATCH ${ROCM_PATCH_VERSION})
else()
-set(LIB_VERSION_PATCH ${VERSION_PATCH})
+set(LIB_VERSION_PATCH ${VERSION_PATCH})
endif()
-set(LIB_VERSION_STRING
-"${LIB_VERSION_MAJOR}.${LIB_VERSION_MINOR}.${LIB_VERSION_PATCH}")
+set(LIB_VERSION_STRING "${LIB_VERSION_MAJOR}.${LIB_VERSION_MINOR}.${LIB_VERSION_PATCH}")
message("-- LIB-VERSION STRING: ${LIB_VERSION_STRING}")
# Set target and root/lib/test directory
@@ -86,97 +85,84 @@ set(LIB_DIR "${ROOT_DIR}/src")
set(TEST_DIR "${ROOT_DIR}/test")
find_package(
-amd_comgr
-REQUIRED
-CONFIG
-HINTS
-${CMAKE_INSTALL_PREFIX}
-PATHS
-${ROCM_PATH}
-PATH_SUFFIXES
-lib/cmake/amd_comgr)
+amd_comgr REQUIRED CONFIG
+HINTS ${CMAKE_INSTALL_PREFIX}
+PATHS ${ROCM_PATH}
+PATH_SUFFIXES lib/cmake/amd_comgr)
message(STATUS "Code Object Manager found at ${amd_comgr_DIR}.")
link_libraries(amd_comgr)
find_package(Threads REQUIRED)
find_package(
-hsa-runtime64
-REQUIRED
-CONFIG
-HINTS
-${CMAKE_INSTALL_PREFIX}
-PATHS
-${ROCM_PATH})
+hsa-runtime64 REQUIRED CONFIG
+HINTS ${CMAKE_INSTALL_PREFIX}
+PATHS ${ROCM_PATH})
find_package(
-HIP
-REQUIRED
-CONFIG
-HINTS
-${CMAKE_INSTALL_PREFIX}
-PATHS
-${ROCM_PATH})
+HIP REQUIRED CONFIG
+HINTS ${CMAKE_INSTALL_PREFIX}
+PATHS ${ROCM_PATH})
find_library(NUMA NAME numa REQUIRED)
link_libraries(${NUMA})
-find_program(ROCMINFO_EXEC NAMES "rocminfo"
-PATHS ${ROCM_PATH}
-${CMAKE_INSTALL_PREFIX} "/usr/local" "/usr"
-PATH_SUFFIXES bin)
+find_program(
+ROCMINFO_EXEC
+NAMES "rocminfo"
+PATHS ${ROCM_PATH} ${CMAKE_INSTALL_PREFIX} "/usr/local" "/usr"
+PATH_SUFFIXES bin)
set(ORIGINAL_SCRIPT_PATH ${CMAKE_CURRENT_SOURCE_DIR}/bin/tblextr.py)
set(OUTPUT_SCRIPT_PATH ${CMAKE_CURRENT_BINARY_DIR}/bin/tblextr.py)
configure_file(${ORIGINAL_SCRIPT_PATH} ${OUTPUT_SCRIPT_PATH} @ONLY)
get_property(
-HSA_RUNTIME_INCLUDE_DIRECTORIES
-TARGET hsa-runtime64::hsa-runtime64
-PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
+HSA_RUNTIME_INCLUDE_DIRECTORIES
+TARGET hsa-runtime64::hsa-runtime64
+PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
-HSA_H hsa.h
-PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
-PATH_SUFFIXES hsa
-NO_DEFAULT_PATH REQUIRED)
+HSA_H hsa.h
+PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
+PATH_SUFFIXES hsa
+NO_DEFAULT_PATH REQUIRED)
get_filename_component(HSA_RUNTIME_INC_PATH ${HSA_H} DIRECTORY)
include_directories(${HSA_RUNTIME_INC_PATH})
if(NOT DEFINED LIBRARY_TYPE)
-set(LIBRARY_TYPE SHARED)
+set(LIBRARY_TYPE SHARED)
endif()
# Enable tracing API
if(NOT USE_PROF_API)
-set(USE_PROF_API 1)
+set(USE_PROF_API 1)
endif()
# Protocol header lookup
set(PROF_API_HEADER_NAME prof_protocol.h)
if(USE_PROF_API EQUAL 1)
find_path(
PROF_API_HEADER_DIR ${PROF_API_HEADER_NAME}
HINTS ${PROF_API_HEADER_PATH}
PATHS /opt/rocm/include
PATH_SUFFIXES roctracer/ext)
if(NOT PROF_API_HEADER_DIR)
message(
FATAL_ERROR
"Profiling API header not found. Tracer integration disabled. Use -DPROF_API_HEADER_PATH=<path to ${PROF_API_HEADER_NAME} header>"
)
else()
include_directories(${PROF_API_HEADER_DIR})
message(
STATUS "Profiling API: ${PROF_API_HEADER_DIR}/${PROF_API_HEADER_NAME}")
endif()
find_path(
PROF_API_HEADER_DIR ${PROF_API_HEADER_NAME}
HINTS ${PROF_API_HEADER_PATH}
PATHS /opt/rocm/include
PATH_SUFFIXES roctracer/ext)
if(NOT PROF_API_HEADER_DIR)
message(
FATAL_ERROR
"Profiling API header not found. Tracer integration disabled. Use -DPROF_API_HEADER_PATH=<path to ${PROF_API_HEADER_NAME} header>"
)
else()
include_directories(${PROF_API_HEADER_DIR})
message(STATUS "Profiling API: ${PROF_API_HEADER_DIR}/${PROF_API_HEADER_NAME}")
endif()
endif()
# Build libraries
add_subdirectory(src)
if(${LIBRARY_TYPE} STREQUAL SHARED)
# Build samples
add_subdirectory(samples)
# Build samples
add_subdirectory(samples)
# Build tests
add_subdirectory(tests-v2)
# Build tests
add_subdirectory(tests-v2)
endif()
# Build Plugins
@@ -188,20 +174,20 @@ add_subdirectory(${TEST_DIR} ${PROJECT_BINARY_DIR}/test)
# Installation and packaging
set(DEST_NAME ${ROCPROFILER_NAME})
if(DEFINED CMAKE_INSTALL_PREFIX)
get_filename_component(prefix_name ${CMAKE_INSTALL_PREFIX} NAME)
get_filename_component(prefix_dir ${CMAKE_INSTALL_PREFIX} DIRECTORY)
if(prefix_name STREQUAL ${DEST_NAME})
set(CMAKE_INSTALL_PREFIX ${prefix_dir})
endif()
get_filename_component(prefix_name ${CMAKE_INSTALL_PREFIX} NAME)
get_filename_component(prefix_dir ${CMAKE_INSTALL_PREFIX} DIRECTORY)
if(prefix_name STREQUAL ${DEST_NAME})
set(CMAKE_INSTALL_PREFIX ${prefix_dir})
endif()
endif()
if(DEFINED CPACK_PACKAGING_INSTALL_PREFIX)
get_filename_component(prefix_name ${CPACK_PACKAGING_INSTALL_PREFIX} NAME)
get_filename_component(prefix_dir ${CPACK_PACKAGING_INSTALL_PREFIX} DIRECTORY)
if(prefix_name STREQUAL ${DEST_NAME})
set(CPACK_PACKAGING_INSTALL_PREFIX ${prefix_dir})
endif()
get_filename_component(prefix_name ${CPACK_PACKAGING_INSTALL_PREFIX} NAME)
get_filename_component(prefix_dir ${CPACK_PACKAGING_INSTALL_PREFIX} DIRECTORY)
if(prefix_name STREQUAL ${DEST_NAME})
set(CPACK_PACKAGING_INSTALL_PREFIX ${prefix_dir})
endif()
else()
set(CPACK_PACKAGING_INSTALL_PREFIX ${CMAKE_INSTALL_PREFIX})
set(CPACK_PACKAGING_INSTALL_PREFIX ${CMAKE_INSTALL_PREFIX})
endif()
message("CMake-install-prefix: ${CMAKE_INSTALL_PREFIX}")
message("CPack-install-prefix: ${CPACK_PACKAGING_INSTALL_PREFIX}")
@@ -209,413 +195,395 @@ message("-----------Dest-name: ${DEST_NAME}")
# Install headers
install(
FILES ${CMAKE_CURRENT_SOURCE_DIR}/src/core/activity.h
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}
COMPONENT dev)
FILES ${CMAKE_CURRENT_SOURCE_DIR}/src/core/activity.h
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}
COMPONENT dev)
# rpl_run.sh
install(
FILES ${CMAKE_CURRENT_SOURCE_DIR}/bin/rpl_run.sh
DESTINATION ${CMAKE_INSTALL_BINDIR}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ
WORLD_EXECUTE
RENAME rocprof
COMPONENT runtime)
FILES ${CMAKE_CURRENT_SOURCE_DIR}/bin/rpl_run.sh
DESTINATION ${CMAKE_INSTALL_BINDIR}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE
RENAME rocprof
COMPONENT runtime)
configure_file(bin/rocprofv2 ${PROJECT_BINARY_DIR} COPYONLY)
install(
FILES ${PROJECT_SOURCE_DIR}/bin/rocprofv2
DESTINATION ${CMAKE_INSTALL_BINDIR}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ
WORLD_EXECUTE
COMPONENT runtime)
FILES ${PROJECT_SOURCE_DIR}/bin/rocprofv2
DESTINATION ${CMAKE_INSTALL_BINDIR}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE
COMPONENT runtime)
install(
FILES ${CMAKE_CURRENT_SOURCE_DIR}/bin/txt2xml.sh
${CMAKE_CURRENT_SOURCE_DIR}/bin/merge_traces.sh
${CMAKE_CURRENT_SOURCE_DIR}/bin/txt2params.py
${CMAKE_CURRENT_BINARY_DIR}/bin/tblextr.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/dform.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/mem_manager.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/sqlitedb.py
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/${ROCPROFILER_NAME}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ
WORLD_EXECUTE
COMPONENT runtime)
FILES ${CMAKE_CURRENT_SOURCE_DIR}/bin/txt2xml.sh
${CMAKE_CURRENT_SOURCE_DIR}/bin/merge_traces.sh
${CMAKE_CURRENT_SOURCE_DIR}/bin/txt2params.py
${CMAKE_CURRENT_BINARY_DIR}/bin/tblextr.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/dform.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/mem_manager.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/sqlitedb.py
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/${ROCPROFILER_NAME}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE
COMPONENT runtime)
# gfx_metrics.xml metrics.xml
install(
FILES ${CMAKE_CURRENT_SOURCE_DIR}/test/tool/metrics.xml
${CMAKE_CURRENT_SOURCE_DIR}/test/tool/gfx_metrics.xml
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT runtime)
FILES ${CMAKE_CURRENT_SOURCE_DIR}/test/tool/metrics.xml
${CMAKE_CURRENT_SOURCE_DIR}/test/tool/gfx_metrics.xml
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT runtime)
# librocprof-tool.so
install(
FILES ${PROJECT_BINARY_DIR}/test/librocprof-tool.so
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT runtime)
FILES ${PROJECT_BINARY_DIR}/test/librocprof-tool.so
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT runtime)
install(
FILES ${PROJECT_BINARY_DIR}/test/librocprof-tool.so
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT asan)
FILES ${PROJECT_BINARY_DIR}/test/librocprof-tool.so
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT asan)
install(
FILES ${PROJECT_BINARY_DIR}/test/rocprof-ctrl
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
PERMISSIONS
OWNER_READ
OWNER_WRITE
OWNER_EXECUTE
GROUP_READ
GROUP_EXECUTE
WORLD_READ
WORLD_EXECUTE
COMPONENT runtime)
FILES ${PROJECT_BINARY_DIR}/test/rocprof-ctrl
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
PERMISSIONS OWNER_READ OWNER_WRITE OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ
WORLD_EXECUTE
COMPONENT runtime)
# File reorg backward compatibility for non ASAN packaging
if ( NOT ENABLE_ASAN_PACKAGING )
# File reorg Backward compatibility
option(FILE_REORG_BACKWARD_COMPATIBILITY
"Enable File Reorg with backward compatibility" ON)
if(NOT ENABLE_ASAN_PACKAGING)
# File reorg Backward compatibility
option(FILE_REORG_BACKWARD_COMPATIBILITY
"Enable File Reorg with backward compatibility" ON)
endif()
if(FILE_REORG_BACKWARD_COMPATIBILITY)
# To enable/disable #error in wrapper header files
if(NOT DEFINED ROCM_HEADER_WRAPPER_WERROR)
if(DEFINED ENV{ROCM_HEADER_WRAPPER_WERROR})
set(ROCM_HEADER_WRAPPER_WERROR "$ENV{ROCM_HEADER_WRAPPER_WERROR}"
CACHE STRING "Header wrapper warnings as errors.")
else()
set(ROCM_HEADER_WRAPPER_WERROR "OFF" CACHE STRING "Header wrapper warnings as errors.")
# To enable/disable #error in wrapper header files
if(NOT DEFINED ROCM_HEADER_WRAPPER_WERROR)
if(DEFINED ENV{ROCM_HEADER_WRAPPER_WERROR})
set(ROCM_HEADER_WRAPPER_WERROR
"$ENV{ROCM_HEADER_WRAPPER_WERROR}"
CACHE STRING "Header wrapper warnings as errors.")
else()
set(ROCM_HEADER_WRAPPER_WERROR
"OFF"
CACHE STRING "Header wrapper warnings as errors.")
endif()
endif()
endif()
if(ROCM_HEADER_WRAPPER_WERROR)
set(deprecated_error 1)
else()
set(deprecated_error 0)
endif()
include(rocprofiler-backward-compat.cmake)
endif() #FILE_REORG_BACKWARD_COMPATIBILITY
if(ROCM_HEADER_WRAPPER_WERROR)
set(deprecated_error 1)
else()
set(deprecated_error 0)
endif()
include(rocprofiler-backward-compat.cmake)
endif() # FILE_REORG_BACKWARD_COMPATIBILITY
if(${LIBRARY_TYPE} STREQUAL SHARED)
# Packaging directives
set(CPACK_GENERATOR "DEB" "RPM" "TGZ")
set(ENABLE_LDCONFIG
ON
CACHE BOOL "Set library links and caches using ldconfig.")
set(CPACK_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_PACKAGE_VENDOR "Advanced Micro Devices, Inc.")
set(CPACK_PACKAGE_VERSION_MAJOR ${PROJECT_VERSION_MAJOR})
set(CPACK_PACKAGE_VERSION_MINOR ${PROJECT_VERSION_MINOR})
set(CPACK_PACKAGE_VERSION_PATCH ${PROJECT_VERSION_PATCH})
set(CPACK_PACKAGE_VERSION
"${CPACK_PACKAGE_VERSION_MAJOR}.${CPACK_PACKAGE_VERSION_MINOR}.${CPACK_PACKAGE_VERSION_PATCH}"
)
set(CPACK_PACKAGE_CONTACT
"ROCm Profiler Support <dl.ROCm-Profiler.support@amd.com>")
set(CPACK_PACKAGE_DESCRIPTION_SUMMARY
"ROCPROFILER library for AMD HSA runtime API extension support")
set(CPACK_RESOURCE_FILE_LICENSE "${CMAKE_CURRENT_SOURCE_DIR}/LICENSE")
if(DEFINED ENV{ROCM_LIBPATCH_VERSION})
# Packaging directives
set(CPACK_GENERATOR "DEB" "RPM" "TGZ")
set(ENABLE_LDCONFIG
ON
CACHE BOOL "Set library links and caches using ldconfig.")
set(CPACK_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_PACKAGE_VENDOR "Advanced Micro Devices, Inc.")
set(CPACK_PACKAGE_VERSION_MAJOR ${PROJECT_VERSION_MAJOR})
set(CPACK_PACKAGE_VERSION_MINOR ${PROJECT_VERSION_MINOR})
set(CPACK_PACKAGE_VERSION_PATCH ${PROJECT_VERSION_PATCH})
set(CPACK_PACKAGE_VERSION
"${CPACK_PACKAGE_VERSION}.$ENV{ROCM_LIBPATCH_VERSION}")
message("Using CPACK_PACKAGE_VERSION ${CPACK_PACKAGE_VERSION}")
endif()
"${CPACK_PACKAGE_VERSION_MAJOR}.${CPACK_PACKAGE_VERSION_MINOR}.${CPACK_PACKAGE_VERSION_PATCH}"
)
set(CPACK_PACKAGE_CONTACT "ROCm Profiler Support <dl.ROCm-Profiler.support@amd.com>")
set(CPACK_PACKAGE_DESCRIPTION_SUMMARY
"ROCPROFILER library for AMD HSA runtime API extension support")
set(CPACK_RESOURCE_FILE_LICENSE "${CMAKE_CURRENT_SOURCE_DIR}/LICENSE")
if(DEFINED ENV{ROCM_LIBPATCH_VERSION})
set(CPACK_PACKAGE_VERSION "${CPACK_PACKAGE_VERSION}.$ENV{ROCM_LIBPATCH_VERSION}")
message("Using CPACK_PACKAGE_VERSION ${CPACK_PACKAGE_VERSION}")
endif()
# Debian package specific variable for ASAN
set(CPACK_DEBIAN_ASAN_PACKAGE_NAME "${ROCPROFILER_NAME}-asan")
set(CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS "hsa-rocr-asan, rocm-core-asan")
# Debian package specific variable for ASAN
set ( CPACK_DEBIAN_ASAN_PACKAGE_NAME "${ROCPROFILER_NAME}-asan" )
set ( CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS "hsa-rocr-asan, rocm-core-asan" )
# Install license file
install(
FILES ${CPACK_RESOURCE_FILE_LICENSE}
DESTINATION ${CMAKE_INSTALL_DOCDIR}
COMPONENT runtime)
install(
FILES ${CPACK_RESOURCE_FILE_LICENSE}
DESTINATION ${CMAKE_INSTALL_DOCDIR}-asan
COMPONENT asan)
# Install license file
install(
FILES ${CPACK_RESOURCE_FILE_LICENSE}
DESTINATION ${CMAKE_INSTALL_DOCDIR}
COMPONENT runtime)
install(
FILES ${CPACK_RESOURCE_FILE_LICENSE}
DESTINATION ${CMAKE_INSTALL_DOCDIR}-asan
COMPONENT asan)
# Debian package specific variables
if(DEFINED ENV{CPACK_DEBIAN_PACKAGE_RELEASE})
set(CPACK_DEBIAN_PACKAGE_RELEASE $ENV{CPACK_DEBIAN_PACKAGE_RELEASE})
else()
set(CPACK_DEBIAN_PACKAGE_RELEASE "local")
endif()
# Debian package specific variables
if(DEFINED ENV{CPACK_DEBIAN_PACKAGE_RELEASE})
set(CPACK_DEBIAN_PACKAGE_RELEASE $ENV{CPACK_DEBIAN_PACKAGE_RELEASE})
else()
set(CPACK_DEBIAN_PACKAGE_RELEASE "local")
endif()
message("Using CPACK_DEBIAN_PACKAGE_RELEASE ${CPACK_DEBIAN_PACKAGE_RELEASE}")
set(CPACK_DEB_COMPONENT_INSTALL ON)
set(CPACK_DEBIAN_FILE_NAME "DEB-DEFAULT")
set(CPACK_DEBIAN_RUNTIME_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS
"hsa-rocr-dev, rocm-core, libsystemd-dev, libelf-dev, libnuma-dev, libpciaccess-dev, libxml2-dev"
)
set(CPACK_DEBIAN_DEV_PACKAGE_NAME "${PROJECT_NAME}-dev")
set(CPACK_DEBIAN_DEV_PACKAGE_DEPENDS "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_TESTS_PACKAGE_NAME "${PROJECT_NAME}-tests")
set(CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS "${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_SAMPLES_PACKAGE_NAME "${PROJECT_NAME}-samples")
set(CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS
"${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_DOCS_PACKAGE_NAME "${PROJECT_NAME}-docs")
set(CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS "${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_PLUGINS_PACKAGE_NAME "${PROJECT_NAME}-plugins")
set(CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
message("Using CPACK_DEBIAN_PACKAGE_RELEASE ${CPACK_DEBIAN_PACKAGE_RELEASE}")
set(CPACK_DEB_COMPONENT_INSTALL ON)
set(CPACK_DEBIAN_FILE_NAME "DEB-DEFAULT")
set(CPACK_DEBIAN_RUNTIME_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS "hsa-rocr-dev, rocm-core, libsystemd-dev, libelf-dev, libnuma-dev, libpciaccess-dev, libxml2-dev")
set(CPACK_DEBIAN_DEV_PACKAGE_NAME "${PROJECT_NAME}-dev")
set(CPACK_DEBIAN_DEV_PACKAGE_DEPENDS
"${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_TESTS_PACKAGE_NAME "${PROJECT_NAME}-tests")
set(CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS
"${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_SAMPLES_PACKAGE_NAME "${PROJECT_NAME}-samples")
set(CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS
"${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_DOCS_PACKAGE_NAME "${PROJECT_NAME}-docs")
set(CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS
"${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_PLUGINS_PACKAGE_NAME "${PROJECT_NAME}-plugins")
set(CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS
"${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_CHANGELOG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md")
set ( CPACK_DEBIAN_CHANGELOG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md" )
# RPM package specific variables
if(DEFINED ENV{CPACK_RPM_PACKAGE_RELEASE})
set(CPACK_RPM_PACKAGE_RELEASE $ENV{CPACK_RPM_PACKAGE_RELEASE})
else()
set(CPACK_RPM_PACKAGE_RELEASE "local")
endif()
# RPM package specific variables
if(DEFINED ENV{CPACK_RPM_PACKAGE_RELEASE})
set(CPACK_RPM_PACKAGE_RELEASE $ENV{CPACK_RPM_PACKAGE_RELEASE})
else()
set(CPACK_RPM_PACKAGE_RELEASE "local")
endif()
message("Using CPACK_RPM_PACKAGE_RELEASE ${CPACK_RPM_PACKAGE_RELEASE}")
message("Using CPACK_RPM_PACKAGE_RELEASE ${CPACK_RPM_PACKAGE_RELEASE}")
set(CPACK_RPM_PACKAGE_LICENSE "MIT")
set(CPACK_RPM_PACKAGE_LICENSE "MIT")
# 'dist' breaks manual builds on debian systems due to empty Provides
execute_process(
COMMAND rpm --eval %{?dist}
RESULT_VARIABLE PROC_RESULT
OUTPUT_VARIABLE EVAL_RESULT
OUTPUT_STRIP_TRAILING_WHITESPACE)
message("RESULT_VARIABLE ${PROC_RESULT} OUTPUT_VARIABLE: ${EVAL_RESULT}")
# 'dist' breaks manual builds on debian systems due to empty Provides
execute_process(
COMMAND rpm --eval %{?dist}
RESULT_VARIABLE PROC_RESULT
OUTPUT_VARIABLE EVAL_RESULT
OUTPUT_STRIP_TRAILING_WHITESPACE)
message("RESULT_VARIABLE ${PROC_RESULT} OUTPUT_VARIABLE: ${EVAL_RESULT}")
if(PROC_RESULT EQUAL "0" AND NOT EVAL_RESULT STREQUAL "")
string(APPEND CPACK_RPM_PACKAGE_RELEASE "%{?dist}")
endif()
if(PROC_RESULT EQUAL "0" AND NOT EVAL_RESULT STREQUAL "")
string(APPEND CPACK_RPM_PACKAGE_RELEASE "%{?dist}")
endif()
set(CPACK_RPM_COMPONENT_INSTALL ON)
set(CPACK_RPM_FILE_NAME "RPM-DEFAULT")
set(CPACK_RPM_RUNTIME_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_RPM_RUNTIME_PACKAGE_REQUIRES
"hsa-rocr-dev, rocm-core, systemd-devel, libpciaccess-devel, libxml2-devel")
set(CPACK_RPM_DEV_PACKAGE_NAME "${PROJECT_NAME}-devel")
set(CPACK_RPM_DEV_PACKAGE_REQUIRES "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_DEV_PACKAGE_PROVIDES "${PROJECT_NAME}-dev")
set(CPACK_RPM_DEV_PACKAGE_OBSOLETES "${PROJECT_NAME}-dev")
set(CPACK_RPM_TESTS_PACKAGE_NAME "${PROJECT_NAME}-tests")
set(CPACK_RPM_TESTS_PACKAGE_REQUIRES "${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_DOCS_PACKAGE_NAME "${PROJECT_NAME}-docs")
set(CPACK_RPM_DOCS_PACKAGE_REQUIRES "${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_PLUGINS_PACKAGE_NAME "${PROJECT_NAME}-plugins")
set(CPACK_RPM_PLUGINS_PACKAGE_REQUIRES "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_PACKAGE_AUTOREQ 0)
set(CPACK_RPM_SAMPLES_PACKAGE_NAME "${PROJECT_NAME}-samples")
set(CPACK_RPM_SAMPLES_PACKAGE_REQUIRES
"${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core, hip-runtime-amd")
message("CPACK_RPM_PACKAGE_RELEASE: ${CPACK_RPM_PACKAGE_RELEASE}")
set(CPACK_RPM_COMPONENT_INSTALL ON)
set(CPACK_RPM_FILE_NAME "RPM-DEFAULT")
set(CPACK_RPM_RUNTIME_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_RPM_RUNTIME_PACKAGE_REQUIRES "hsa-rocr-dev, rocm-core, systemd-devel, libpciaccess-devel, libxml2-devel")
set(CPACK_RPM_DEV_PACKAGE_NAME "${PROJECT_NAME}-devel")
set(CPACK_RPM_DEV_PACKAGE_REQUIRES "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_DEV_PACKAGE_PROVIDES "${PROJECT_NAME}-dev")
set(CPACK_RPM_DEV_PACKAGE_OBSOLETES "${PROJECT_NAME}-dev")
set(CPACK_RPM_TESTS_PACKAGE_NAME "${PROJECT_NAME}-tests")
set(CPACK_RPM_TESTS_PACKAGE_REQUIRES
"${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_DOCS_PACKAGE_NAME "${PROJECT_NAME}-docs")
set(CPACK_RPM_DOCS_PACKAGE_REQUIRES
"${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_PLUGINS_PACKAGE_NAME "${PROJECT_NAME}-plugins")
set(CPACK_RPM_PLUGINS_PACKAGE_REQUIRES
"${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_PACKAGE_AUTOREQ 0)
set(CPACK_RPM_SAMPLES_PACKAGE_NAME "${PROJECT_NAME}-samples")
set(CPACK_RPM_SAMPLES_PACKAGE_REQUIRES
"${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core, hip-runtime-amd")
message("CPACK_RPM_PACKAGE_RELEASE: ${CPACK_RPM_PACKAGE_RELEASE}")
#Disable build id for rocprofiler as it is creating a transaction error
set ( CPACK_RPM_SPEC_MORE_DEFINE "%define _build_id_links none
# Disable build id for rocprofiler as it is creating a transaction error
set(CPACK_RPM_SPEC_MORE_DEFINE
"%define _build_id_links none
%global __strip ${CPACK_STRIP_EXECUTABLE}
%global __objdump ${CPACK_OBJDUMP_EXECUTABLE}
%global __objcopy ${CPACK_OBJCOPY_EXECUTABLE}
%global __readelf ${CPACK_READELF_EXECUTABLE}")
# RPM package specific variable for ASAN
set ( CPACK_RPM_ASAN_PACKAGE_NAME "${ROCPROFILER_NAME}-asan" )
set ( CPACK_RPM_ASAN_PACKAGE_REQUIRES "hsa-rocr-asan, rocm-core-asan" )
#set ( CPACK_RPM_CHANGELOG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md" )
# RPM package specific variable for ASAN
set(CPACK_RPM_ASAN_PACKAGE_NAME "${ROCPROFILER_NAME}-asan")
set(CPACK_RPM_ASAN_PACKAGE_REQUIRES "hsa-rocr-asan, rocm-core-asan")
# Remove dependency on rocm-core if -DROCM_DEP_ROCMCORE=ON not given to cmake
if(NOT ROCM_DEP_ROCMCORE)
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_RUNTIME_PACKAGE_REQUIRES
${CPACK_RPM_RUNTIME_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DEV_PACKAGE_REQUIRES
${CPACK_RPM_DEV_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_RPM_ASAN_PACKAGE_REQUIRES
${CPACK_RPM_ASAN_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_TESTS_PACKAGE_REQUIRES
${CPACK_RPM_TESTS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_SAMPLES_PACKAGE_REQUIRES
${CPACK_RPM_SAMPLES_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DOCS_PACKAGE_REQUIRES
${CPACK_RPM_DOCS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_PLUGINS_PACKAGE_REQUIRES
${CPACK_RPM_PLUGINS_PACKAGE_REQUIRES})
string(REGEX
REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS
${CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DEV_PACKAGE_DEPENDS
${CPACK_DEBIAN_DEV_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS
${CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS
${CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS})
string(REGEX
REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS
${CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS
${CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS})
string(REGEX
REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS
${CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS})
endif()
# set ( CPACK_RPM_CHANGELOG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md" )
## set components
if(ENABLE_ASAN_PACKAGING)
# ASAN Package requires only asan component with libraries and license file
set(CPACK_COMPONENTS_ALL asan)
else()
set(CPACK_COMPONENTS_ALL runtime dev tests docs plugins samples)
endif()
# Remove dependency on rocm-core if -DROCM_DEP_ROCMCORE=ON not given to cmake
if(NOT ROCM_DEP_ROCMCORE)
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_RUNTIME_PACKAGE_REQUIRES
${CPACK_RPM_RUNTIME_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DEV_PACKAGE_REQUIRES
${CPACK_RPM_DEV_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_RPM_ASAN_PACKAGE_REQUIRES
${CPACK_RPM_ASAN_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_TESTS_PACKAGE_REQUIRES
${CPACK_RPM_TESTS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_SAMPLES_PACKAGE_REQUIRES
${CPACK_RPM_SAMPLES_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DOCS_PACKAGE_REQUIRES
${CPACK_RPM_DOCS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_PLUGINS_PACKAGE_REQUIRES
${CPACK_RPM_PLUGINS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS
${CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DEV_PACKAGE_DEPENDS
${CPACK_DEBIAN_DEV_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS
${CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS
${CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS
${CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS
${CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS
${CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS})
endif()
include(CPack)
# set components
if(ENABLE_ASAN_PACKAGING)
# ASAN Package requires only asan component with libraries and license file
set(CPACK_COMPONENTS_ALL asan)
else()
set(CPACK_COMPONENTS_ALL runtime dev tests docs plugins samples)
endif()
cpack_add_component(
runtime
DISPLAY_NAME "Runtime"
DESCRIPTION "Dynamic libraries for the ROCProfiler")
include(CPack)
cpack_add_component(
dev
DISPLAY_NAME "Development"
DESCRIPTION "Development needed header files for ROCProfiler"
DEPENDS runtime)
cpack_add_component(
runtime
DISPLAY_NAME "Runtime"
DESCRIPTION "Dynamic libraries for the ROCProfiler")
cpack_add_component(
plugins
DISPLAY_NAME "ROCProfile Plugins"
DESCRIPTION "Plugins for handling ROCProfiler data output"
DEPENDS runtime)
cpack_add_component(
dev
DISPLAY_NAME "Development"
DESCRIPTION "Development needed header files for ROCProfiler"
DEPENDS runtime)
cpack_add_component(
tests
DISPLAY_NAME "Tests"
DESCRIPTION "Tests for the ROCProfiler"
DEPENDS dev)
cpack_add_component(
plugins
DISPLAY_NAME "ROCProfile Plugins"
DESCRIPTION "Plugins for handling ROCProfiler data output"
DEPENDS runtime)
cpack_add_component(
samples
DISPLAY_NAME "Samples"
DESCRIPTION "Samples for the ROCProfiler"
DEPENDS dev)
cpack_add_component(
tests
DISPLAY_NAME "Tests"
DESCRIPTION "Tests for the ROCProfiler"
DEPENDS dev)
cpack_add_component(
docs
DISPLAY_NAME "Documentation"
DESCRIPTION "Documentation for the ROCProfiler API"
DEPENDS dev)
cpack_add_component(
samples
DISPLAY_NAME "Samples"
DESCRIPTION "Samples for the ROCProfiler"
DEPENDS dev)
cpack_add_component(
asan
DISPLAY_NAME "ASAN"
DESCRIPTION "ASAN libraries for the ROCPROFILER"
DEPENDS asan)
cpack_add_component(
docs
DISPLAY_NAME "Documentation"
DESCRIPTION "Documentation for the ROCProfiler API"
DEPENDS dev)
cpack_add_component(
asan
DISPLAY_NAME "ASAN"
DESCRIPTION "ASAN libraries for the ROCPROFILER"
DEPENDS asan)
endif()
find_package(Doxygen)
if(DOXYGEN_FOUND)
# # Set input and output files for API Document
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_API.in)
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/Doxyfile_API)
# # Set input and output files for API Document
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_API.in)
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/Doxyfile_API)
# # Request to configure the file
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
# # Request to configure the file
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/doc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/doc/latex pdf
MAIN_DEPENDENCY ${DOXYGEN_OUT}
${DOXYGEN_IN}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler_plugin.h
${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler.h
COMMENT "Generating API documentation")
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/doc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/doc/latex pdf
MAIN_DEPENDENCY ${DOXYGEN_OUT}
${DOXYGEN_IN}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler_plugin.h
${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler.h
COMMENT "Generating API documentation")
add_custom_target(
doc DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/doc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf)
add_custom_target(doc DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/doc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf)
install(
FILES "${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf"
DESTINATION ${CMAKE_INSTALL_DOCDIR}
RENAME "${PROJECT_NAME}v2_api_spec.pdf"
OPTIONAL
COMPONENT docs)
install(
FILES "${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf"
DESTINATION ${CMAKE_INSTALL_DOCDIR}
RENAME "${PROJECT_NAME}v2_api_spec.pdf"
OPTIONAL
COMPONENT docs)
install(
DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/doc/html/"
DESTINATION ${CMAKE_INSTALL_DOCDIR}/html
OPTIONAL
COMPONENT docs)
install(
DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/doc/html/"
DESTINATION ${CMAKE_INSTALL_DOCDIR}/html
OPTIONAL
COMPONENT docs)
# # Set input and output files for Tools Document
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_Tool.in)
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/Doxyfile_Tool)
# # Set input and output files for Tools Document
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_Tool.in)
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/Doxyfile_Tool)
# # Request to configure the file
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
# # Request to configure the file
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex pdf
MAIN_DEPENDENCY ${DOXYGEN_OUT}
${DOXYGEN_IN}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/doc/rocprofv2_tool.md
COMMENT "Generating Tools documentation")
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex pdf
MAIN_DEPENDENCY ${DOXYGEN_OUT}
${DOXYGEN_IN}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/doc/rocprofv2_tool.md
COMMENT "Generating Tools documentation")
add_custom_target(
doc_tool DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf)
add_custom_target(
doc_tool DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf)
install(
FILES "${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf"
DESTINATION ${CMAKE_INSTALL_DOCDIR}
RENAME "${PROJECT_NAME}v2_tool.pdf"
OPTIONAL
COMPONENT docs)
install(
FILES "${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf"
DESTINATION ${CMAKE_INSTALL_DOCDIR}
RENAME "${PROJECT_NAME}v2_tool.pdf"
OPTIONAL
COMPONENT docs)
install(
DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/"
DESTINATION ${CMAKE_INSTALL_DOCDIR}/html
OPTIONAL
COMPONENT docs)
install(
DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/"
DESTINATION ${CMAKE_INSTALL_DOCDIR}/html
OPTIONAL
COMPONENT docs)
# # Set input and output files for changelog document
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_ChangeLog.in)
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/Doxyfile_ChangeLog)
# # Set input and output files for changelog document
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_ChangeLog.in)
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/Doxyfile_ChangeLog)
# # Request to configure the file
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
# # Request to configure the file
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex pdf
MAIN_DEPENDENCY ${DOXYGEN_OUT}
${DOXYGEN_IN}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md
COMMENT "Generating changelog documentation")
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex pdf
MAIN_DEPENDENCY ${DOXYGEN_OUT}
${DOXYGEN_IN}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md
COMMENT "Generating changelog documentation")
add_custom_target(
doc_changelog DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf)
add_custom_target(doc_changelog
DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf)
install(
FILES "${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf"
DESTINATION ${CMAKE_INSTALL_DOCDIR}
RENAME "${PROJECT_NAME}_ChangeLog.pdf"
OPTIONAL
COMPONENT docs)
install(
FILES "${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf"
DESTINATION ${CMAKE_INSTALL_DOCDIR}
RENAME "${PROJECT_NAME}_ChangeLog.pdf"
OPTIONAL
COMPONENT docs)
add_dependencies(doc doc_changelog)
add_dependencies(doc doc_changelog)
endif()
@@ -5,23 +5,16 @@
# - LIBDW_INCLUDE_DIRS - the libdw include directory
# - LIBDW_LIBRARIES - Link these to use libdw
# - LIBDW_DEFINITIONS - Compiler switches required for using libdw
find_path(FIND_LIBDW_INCLUDES
NAMES
elfutils/libdw.h
PATHS
/usr/include
/usr/local/include)
find_path(
FIND_LIBDW_INCLUDES
NAMES elfutils/libdw.h
PATHS /usr/include /usr/local/include)
find_library(FIND_LIBDW_LIBRARIES
NAMES
dw
PATH
/usr/lib
/usr/local/lib)
find_library(FIND_LIBDW_LIBRARIES NAMES dw PATH /usr/lib /usr/local/lib)
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(LibDw DEFAULT_MSG
FIND_LIBDW_INCLUDES FIND_LIBDW_LIBRARIES)
find_package_handle_standard_args(LibDw DEFAULT_MSG FIND_LIBDW_INCLUDES
FIND_LIBDW_LIBRARIES)
mark_as_advanced(FIND_LIBDW_INCLUDES FIND_LIBDW_LIBRARIES)
set(LIBDW_INCLUDES ${FIND_LIBDW_INCLUDES})
@@ -5,25 +5,16 @@
# - LIBELF_INCLUDE_DIRS - the libelf include directory
# - LIBELF_LIBRARIES - Link these to use libelf
# - LIBELF_DEFINITIONS - Compiler switches required for using libelf
find_path(FIND_LIBELF_INCLUDES
NAMES
libelf.h
PATHS
/usr/include
/usr/include/libelf
/usr/local/include
/usr/local/include/libelf)
find_path(
FIND_LIBELF_INCLUDES
NAMES libelf.h
PATHS /usr/include /usr/include/libelf /usr/local/include /usr/local/include/libelf)
find_library(FIND_LIBELF_LIBRARIES
NAMES
elf
PATH
/usr/lib
/usr/local/lib)
find_library(FIND_LIBELF_LIBRARIES NAMES elf PATH /usr/lib /usr/local/lib)
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(LibElf DEFAULT_MSG
FIND_LIBELF_INCLUDES FIND_LIBELF_LIBRARIES)
find_package_handle_standard_args(LibElf DEFAULT_MSG FIND_LIBELF_INCLUDES
FIND_LIBELF_LIBRARIES)
mark_as_advanced(FIND_LIBELF_INCLUDES FIND_LIBELF_LIBRARIES)
set(LIBELF_INCLUDES ${FIND_LIBELF_INCLUDES})
@@ -20,60 +20,75 @@
# THE SOFTWARE.
################################################################################
## Linux Compiler options
# Linux Compiler options
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fms-extensions")
add_definitions ( -DNEW_TRACE_API=1 )
add_definitions(-DNEW_TRACE_API=1)
## CLANG options
# CLANG options
if("$ENV{CXX}" STREQUAL "/usr/bin/clang++")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -ferror-limit=1000000")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -ferror-limit=1000000")
endif()
## Enable debug trace
if ( DEFINED ENV{CMAKE_DEBUG_TRACE} )
add_definitions ( -DDEBUG_TRACE=1 )
# Enable debug trace
if(DEFINED ENV{CMAKE_DEBUG_TRACE})
add_definitions(-DDEBUG_TRACE=1)
endif()
## Enable AQL-profile new API
if ( NOT DEFINED ENV{CMAKE_CURR_API} )
add_definitions ( -DAQLPROF_NEW_API=1 )
# Enable AQL-profile new API
if(NOT DEFINED ENV{CMAKE_CURR_API})
add_definitions(-DAQLPROF_NEW_API=1)
endif()
## Enable direct loading of AQL-profile HSA extension
if ( DEFINED ENV{CMAKE_LD_AQLPROFILE} )
add_definitions ( -DROCP_LD_AQLPROFILE=1 )
# Enable direct loading of AQL-profile HSA extension
if(DEFINED ENV{CMAKE_LD_AQLPROFILE})
add_definitions(-DROCP_LD_AQLPROFILE=1)
endif()
## Find hsa-runtime
find_package(hsa-runtime64 CONFIG REQUIRED HINTS ${CMAKE_PREFIX_PATH} PATHS /opt/rocm PATH_SUFFIXES lib/cmake/hsa-runtime64 )
# Find hsa-runtime
find_package(
hsa-runtime64 CONFIG REQUIRED
HINTS ${CMAKE_PREFIX_PATH}
PATHS /opt/rocm
PATH_SUFFIXES lib/cmake/hsa-runtime64)
# find KFD thunk
find_package(hsakmt CONFIG REQUIRED HINTS ${CMAKE_PREFIX_PATH} PATHS /opt/rocm PATH_SUFFIXES lib/cmake/hsakmt )
find_package(
hsakmt CONFIG REQUIRED
HINTS ${CMAKE_PREFIX_PATH}
PATHS /opt/rocm
PATH_SUFFIXES lib/cmake/hsakmt)
## Find ROCm
## TODO: Need a better method to find the ROCm path
find_path ( HSA_KMT_INC_PATH "hsakmt/hsakmt.h" HINTS ${CMAKE_PREFIX_PATH} PATHS /opt/rocm PATH_SUFFIXES include )
if ( "${HSA_KMT_INC_PATH}" STREQUAL "" )
get_target_property(HSA_KMT_INC_PATH hsakmt::hsakmt INTERFACE_INCLUDE_DIRECTORIES)
# Find ROCm TODO: Need a better method to find the ROCm path
find_path(
HSA_KMT_INC_PATH "hsakmt/hsakmt.h"
HINTS ${CMAKE_PREFIX_PATH}
PATHS /opt/rocm
PATH_SUFFIXES include)
if("${HSA_KMT_INC_PATH}" STREQUAL "")
get_target_property(HSA_KMT_INC_PATH hsakmt::hsakmt INTERFACE_INCLUDE_DIRECTORIES)
endif()
## Include path: /opt/rocm-ver/include. Go up one level to get ROCm path
get_filename_component ( ROCM_ROOT_DIR "${HSA_KMT_INC_PATH}" DIRECTORY )
# Include path: /opt/rocm-ver/include. Go up one level to get ROCm path
get_filename_component(ROCM_ROOT_DIR "${HSA_KMT_INC_PATH}" DIRECTORY)
## Basic Tool Chain Information
message ( "----------Build-Type: ${CMAKE_BUILD_TYPE}" )
message ( "------------Compiler: ${CMAKE_CXX_COMPILER}" )
message ( "----Compiler-Version: ${CMAKE_CXX_COMPILER_VERSION}" )
message ( "-------ROCM_ROOT_DIR: ${ROCM_ROOT_DIR}" )
message ( "-----CMAKE_CXX_FLAGS: ${CMAKE_CXX_FLAGS}" )
message ( "---CMAKE_PREFIX_PATH: ${CMAKE_PREFIX_PATH}" )
message ( "---------GPU_TARGETS: ${GPU_TARGETS}" )
# Basic Tool Chain Information
message("----------Build-Type: ${CMAKE_BUILD_TYPE}")
message("------------Compiler: ${CMAKE_CXX_COMPILER}")
message("----Compiler-Version: ${CMAKE_CXX_COMPILER_VERSION}")
message("-------ROCM_ROOT_DIR: ${ROCM_ROOT_DIR}")
message("-----CMAKE_CXX_FLAGS: ${CMAKE_CXX_FLAGS}")
message("---CMAKE_PREFIX_PATH: ${CMAKE_PREFIX_PATH}")
message("---------GPU_TARGETS: ${GPU_TARGETS}")
if ( "${ROCM_ROOT_DIR}" STREQUAL "" )
message ( FATAL_ERROR "ROCM_ROOT_DIR is not found." )
endif ()
if("${ROCM_ROOT_DIR}" STREQUAL "")
message(FATAL_ERROR "ROCM_ROOT_DIR is not found.")
endif()
find_library(FIND_AQL_PROFILE_LIB "libhsa-amd-aqlprofile64.so" HINTS ${CMAKE_PREFIX_PATH} PATHS ${ROCM_ROOT_DIR} PATH_SUFFIXES lib REQUIRED)
find_library(
FIND_AQL_PROFILE_LIB "libhsa-amd-aqlprofile64.so"
HINTS ${CMAKE_PREFIX_PATH}
PATHS ${ROCM_ROOT_DIR}
PATH_SUFFIXES lib REQUIRED)
if(NOT FIND_AQL_PROFILE_LIB)
message("AQL_PROFILE not installed. Please install AQL_PROFILE")
message("AQL_PROFILE not installed. Please install AQL_PROFILE")
endif()
@@ -20,77 +20,95 @@
# THE SOFTWARE.
################################################################################
## Parses the VERSION_STRING variable and places
## the first, second and third number values in
## the major, minor and patch variables.
function( parse_version VERSION_STRING )
# Parses the VERSION_STRING variable and places the first, second and third number values
# in the major, minor and patch variables.
function(parse_version VERSION_STRING)
string ( FIND ${VERSION_STRING} "-" STRING_INDEX )
string(FIND ${VERSION_STRING} "-" STRING_INDEX)
if ( ${STRING_INDEX} GREATER -1 )
math ( EXPR STRING_INDEX "${STRING_INDEX} + 1" )
string ( SUBSTRING ${VERSION_STRING} ${STRING_INDEX} -1 VERSION_BUILD )
endif ()
if(${STRING_INDEX} GREATER -1)
math(EXPR STRING_INDEX "${STRING_INDEX} + 1")
string(SUBSTRING ${VERSION_STRING} ${STRING_INDEX} -1 VERSION_BUILD)
endif()
string ( REGEX MATCHALL "[0123456789]+" VERSIONS ${VERSION_STRING} )
list ( LENGTH VERSIONS VERSION_COUNT )
string(REGEX MATCHALL "[0123456789]+" VERSIONS ${VERSION_STRING})
list(LENGTH VERSIONS VERSION_COUNT)
if ( ${VERSION_COUNT} GREATER 0)
list ( GET VERSIONS 0 MAJOR )
set ( VERSION_MAJOR ${MAJOR} PARENT_SCOPE )
set ( TEMP_VERSION_STRING "${MAJOR}" )
endif ()
if(${VERSION_COUNT} GREATER 0)
list(GET VERSIONS 0 MAJOR)
set(VERSION_MAJOR
${MAJOR}
PARENT_SCOPE)
set(TEMP_VERSION_STRING "${MAJOR}")
endif()
if ( ${VERSION_COUNT} GREATER 1 )
list ( GET VERSIONS 1 MINOR )
set ( VERSION_MINOR ${MINOR} PARENT_SCOPE )
set ( TEMP_VERSION_STRING "${TEMP_VERSION_STRING}.${MINOR}" )
endif ()
if(${VERSION_COUNT} GREATER 1)
list(GET VERSIONS 1 MINOR)
set(VERSION_MINOR
${MINOR}
PARENT_SCOPE)
set(TEMP_VERSION_STRING "${TEMP_VERSION_STRING}.${MINOR}")
endif()
if ( ${VERSION_COUNT} GREATER 2 )
list ( GET VERSIONS 2 PATCH )
set ( VERSION_PATCH ${PATCH} PARENT_SCOPE )
set ( TEMP_VERSION_STRING "${TEMP_VERSION_STRING}.${PATCH}" )
endif ()
if(${VERSION_COUNT} GREATER 2)
list(GET VERSIONS 2 PATCH)
set(VERSION_PATCH
${PATCH}
PARENT_SCOPE)
set(TEMP_VERSION_STRING "${TEMP_VERSION_STRING}.${PATCH}")
endif()
if ( DEFINED VERSION_BUILD )
set ( VERSION_BUILD "${VERSION_BUILD}" PARENT_SCOPE )
endif ()
if(DEFINED VERSION_BUILD)
set(VERSION_BUILD
"${VERSION_BUILD}"
PARENT_SCOPE)
endif()
set ( VERSION_STRING "${TEMP_VERSION_STRING}" PARENT_SCOPE )
endfunction ()
## Gets the current version of the repository
## using versioning tags and git describe.
## Passes back a packaging version string
## and a library version string.
function ( get_version DEFAULT_VERSION_STRING )
parse_version ( ${DEFAULT_VERSION_STRING} )
find_program ( GIT NAMES git )
if ( GIT )
execute_process ( COMMAND ${GIT} describe --dirty --long --match "[0-9]*"
ERROR_QUIET
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
OUTPUT_VARIABLE GIT_TAG_STRING
OUTPUT_STRIP_TRAILING_WHITESPACE
RESULT_VARIABLE RESULT )
if ( ${RESULT} EQUAL 0 )
parse_version ( ${GIT_TAG_STRING} )
endif ()
endif ()
set( VERSION_STRING "${VERSION_STRING}" PARENT_SCOPE )
set( VERSION_MAJOR "${VERSION_MAJOR}" PARENT_SCOPE )
set( VERSION_MINOR "${VERSION_MINOR}" PARENT_SCOPE )
set( VERSION_PATCH "${VERSION_PATCH}" PARENT_SCOPE )
set( VERSION_BUILD "${VERSION_BUILD}" PARENT_SCOPE )
set(VERSION_STRING
"${TEMP_VERSION_STRING}"
PARENT_SCOPE)
endfunction()
# Gets the current version of the repository using versioning tags and git describe.
# Passes back a packaging version string and a library version string.
function(get_version DEFAULT_VERSION_STRING)
parse_version(${DEFAULT_VERSION_STRING})
find_program(GIT NAMES git)
if(GIT)
execute_process(
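# execute_process does not run a shell: pass each argument separately and
# use ERROR_QUIET instead of a "2>/dev/null" redirection
COMMAND ${GIT} describe --dirty --long --match "[0-9]*"
ERROR_QUIET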
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
OUTPUT_VARIABLE GIT_TAG_STRING
OUTPUT_STRIP_TRAILING_WHITESPACE
RESULT_VARIABLE RESULT)
if(${RESULT} EQUAL 0)
parse_version(${GIT_TAG_STRING})
endif()
endif()
set(VERSION_STRING
"${VERSION_STRING}"
PARENT_SCOPE)
set(VERSION_MAJOR
"${VERSION_MAJOR}"
PARENT_SCOPE)
set(VERSION_MINOR
"${VERSION_MINOR}"
PARENT_SCOPE)
set(VERSION_PATCH
"${VERSION_PATCH}"
PARENT_SCOPE)
set(VERSION_BUILD
"${VERSION_BUILD}"
PARENT_SCOPE)
endfunction()
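For readers following the parse_version helper above, the same splitting rule can be sketched in C. This is an illustrative sketch only and not part of the commit; the names version_t and parse_version_string are hypothetical. As in the CMake code, the first three runs of digits become major, minor and patch, and everything after the first '-' is kept as the build suffix.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parsed pieces (not part of the repository). */
typedef struct {
  int major, minor, patch; /* -1 when the component is absent */
  const char* build;       /* text after the first '-', or NULL */
} version_t;

/* Mirrors the CMake helper: the first three digit runs found anywhere in the
 * string become major/minor/patch; the build suffix starts after the first '-'. */
static version_t parse_version_string(const char* s) {
  version_t v = {-1, -1, -1, NULL};
  int* slots[3] = {&v.major, &v.minor, &v.patch};
  int found = 0;

  const char* dash = strchr(s, '-');
  if (dash != NULL) v.build = dash + 1;

  for (const char* p = s; *p != '\0' && found < 3;) {
    if (isdigit((unsigned char)*p)) {
      char* end = NULL;
      *slots[found++] = (int)strtol(p, &end, 10);
      p = end; /* skip past the digits just consumed */
    } else {
      ++p;
    }
  }
  return v;
}
```

A string in the `git describe --long` style, such as "2.0.1-45-gabcdef", would give major 2, minor 0, patch 1 and the build suffix "45-gabcdef".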
@@ -164,12 +164,12 @@ typedef struct {
// Profiling feature type
typedef struct {
rocprofiler_feature_kind_t kind; // feature kind
rocprofiler_feature_kind_t kind; // feature kind
union {
const char* name; // feature name
const char* name; // feature name
struct {
const char* block; // counter block name
uint32_t event; // counter event id
const char* block; // counter block name
uint32_t event; // counter event id
} counter;
};
const rocprofiler_parameter_t* parameters; // feature parameters array
@@ -216,23 +216,25 @@ typedef struct {
} rocprofiler_properties_t;
// Create new profiling context
hsa_status_t rocprofiler_open(hsa_agent_t agent, // GPU handle
rocprofiler_feature_t* features, // [in] profiling features array
uint32_t feature_count, // profiling info count
rocprofiler_t** context, // [out] context object
uint32_t mode, // profiling mode mask
hsa_status_t rocprofiler_open(hsa_agent_t agent, // GPU handle
rocprofiler_feature_t* features, // [in] profiling features array
uint32_t feature_count, // profiling info count
rocprofiler_t** context, // [out] context object
uint32_t mode, // profiling mode mask
rocprofiler_properties_t* properties); // profiling properties
// Add feature to a features set
hsa_status_t rocprofiler_add_feature(const rocprofiler_feature_t* feature, // [in]
rocprofiler_feature_set_t* features_set); // [in/out] profiling features set
hsa_status_t rocprofiler_add_feature(
const rocprofiler_feature_t* feature, // [in]
rocprofiler_feature_set_t* features_set); // [in/out] profiling features set
// Create new profiling context
hsa_status_t rocprofiler_features_set_open(hsa_agent_t agent, // GPU handle
rocprofiler_feature_set_t* features_set, // [in] profiling features set
rocprofiler_t** context, // [out] context object
uint32_t mode, // profiling mode mask
rocprofiler_properties_t* properties); // profiling properties
hsa_status_t rocprofiler_features_set_open(
hsa_agent_t agent, // GPU handle
rocprofiler_feature_set_t* features_set, // [in] profiling features set
rocprofiler_t** context, // [out] context object
uint32_t mode, // profiling mode mask
rocprofiler_properties_t* properties); // profiling properties
// Delete profiling info
hsa_status_t rocprofiler_close(rocprofiler_t* context); // [in] profiling context
@@ -242,24 +244,24 @@ hsa_status_t rocprofiler_reset(rocprofiler_t* context, // [in] profiling contex
uint32_t group_index); // group index
// Return context agent
hsa_status_t rocprofiler_get_agent(rocprofiler_t* context, // [in] profiling context
hsa_agent_t* agent); // [out] GPU handle
hsa_status_t rocprofiler_get_agent(rocprofiler_t* context, // [in] profiling context
hsa_agent_t* agent); // [out] GPU handle
// Supported time value ID
typedef enum {
ROCPROFILER_TIME_ID_CLOCK_REALTIME = 0, // Linux realtime clock time
ROCPROFILER_TIME_ID_CLOCK_REALTIME_COARSE = 1, // Linux realtime-coarse clock time
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC = 2, // Linux monotonic clock time
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_COARSE = 3, // Linux monotonic-coarse clock time
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_RAW = 4, // Linux monotonic-raw clock time
ROCPROFILER_TIME_ID_CLOCK_REALTIME = 0, // Linux realtime clock time
ROCPROFILER_TIME_ID_CLOCK_REALTIME_COARSE = 1, // Linux realtime-coarse clock time
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC = 2, // Linux monotonic clock time
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_COARSE = 3, // Linux monotonic-coarse clock time
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_RAW = 4, // Linux monotonic-raw clock time
} rocprofiler_time_id_t;
// Return time value for a given time ID and profiling timestamp
hsa_status_t rocprofiler_get_time(
rocprofiler_time_id_t time_id, // identifier of the particular time to convert the timestamp
uint64_t timestamp, // profiling timestamp
uint64_t* value_ns, // [out] returned time 'ns' value, ignored if NULL
uint64_t* error_ns); // [out] returned time error 'ns' value, ignored if NULL
rocprofiler_time_id_t time_id, // identifier of the particular time to convert the timestamp
uint64_t timestamp, // profiling timestamp
uint64_t* value_ns, // [out] returned time 'ns' value, ignored if NULL
uint64_t* error_ns); // [out] returned time error 'ns' value, ignored if NULL
////////////////////////////////////////////////////////////////////////////////
// Queue callbacks
@@ -269,26 +271,26 @@ hsa_status_t rocprofiler_get_time(
// Dispatch record
typedef struct {
uint64_t dispatch; // dispatch timestamp, ns
uint64_t begin; // kernel begin timestamp, ns
uint64_t end; // kernel end timestamp, ns
uint64_t complete; // completion signal timestamp, ns
uint64_t dispatch; // dispatch timestamp, ns
uint64_t begin; // kernel begin timestamp, ns
uint64_t end; // kernel end timestamp, ns
uint64_t complete; // completion signal timestamp, ns
} rocprofiler_dispatch_record_t;
// Profiling callback data
typedef struct {
hsa_agent_t agent; // GPU agent handle
uint32_t agent_index; // GPU index (GPU Driver Node ID as reported in the sysfs topology)
const hsa_queue_t* queue; // HSA queue
uint64_t queue_index; // Index in the queue
uint32_t queue_id; // Queue id
hsa_signal_t completion_signal; // Completion signal
const hsa_kernel_dispatch_packet_t* packet; // HSA dispatch packet
const char* kernel_name; // Kernel name
uint64_t kernel_object; // Kernel object address
const amd_kernel_code_t* kernel_code; // Kernel code pointer
uint32_t thread_id; // Thread id
const rocprofiler_dispatch_record_t* record; // Dispatch record
hsa_agent_t agent; // GPU agent handle
uint32_t agent_index; // GPU index (GPU Driver Node ID as reported in the sysfs topology)
const hsa_queue_t* queue; // HSA queue
uint64_t queue_index; // Index in the queue
uint32_t queue_id; // Queue id
hsa_signal_t completion_signal; // Completion signal
const hsa_kernel_dispatch_packet_t* packet; // HSA dispatch packet
const char* kernel_name; // Kernel name
uint64_t kernel_object; // Kernel object address
const amd_kernel_code_t* kernel_code; // Kernel code pointer
uint32_t thread_id; // Thread id
const rocprofiler_dispatch_record_t* record; // Dispatch record
} rocprofiler_callback_data_t;
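The four timestamps in the dispatch record are all in nanoseconds, so intervals can be derived by subtraction. The following is a minimal sketch, assuming a local mirror of the record (dispatch_record_t) so that it compiles without the ROCm headers; the helper names are hypothetical and not part of the rocprofiler API.

```c
#include <stdint.h>

/* Local mirror of the timestamp fields of rocprofiler_dispatch_record_t
 * (an assumption made so the sketch is self-contained). */
typedef struct {
  uint64_t dispatch; /* dispatch timestamp, ns */
  uint64_t begin;    /* kernel begin timestamp, ns */
  uint64_t end;      /* kernel end timestamp, ns */
  uint64_t complete; /* completion signal timestamp, ns */
} dispatch_record_t;

/* Hypothetical helper: kernel execution time in ns,
 * or 0 if the timestamps are out of order. */
static uint64_t kernel_duration_ns(const dispatch_record_t* r) {
  return (r->end >= r->begin) ? (r->end - r->begin) : 0;
}

/* Hypothetical helper: delay between dispatch and kernel start in ns,
 * or 0 if the timestamps are out of order. */
static uint64_t launch_latency_ns(const dispatch_record_t* r) {
  return (r->begin >= r->dispatch) ? (r->begin - r->dispatch) : 0;
}
```

The guards avoid unsigned wrap-around when a record is incomplete or its timestamps are inconsistent.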
// Profiling callback type
@@ -299,15 +301,14 @@ typedef hsa_status_t (*rocprofiler_callback_t)(
// Queue callbacks
typedef struct {
rocprofiler_callback_t dispatch; // dispatch callback
hsa_status_t (*create)(hsa_queue_t* queue, void* data); // create callback
hsa_status_t (*destroy)(hsa_queue_t* queue, void* data); // destroy callback
rocprofiler_callback_t dispatch; // dispatch callback
hsa_status_t (*create)(hsa_queue_t* queue, void* data); // create callback
hsa_status_t (*destroy)(hsa_queue_t* queue, void* data); // destroy callback
} rocprofiler_queue_callbacks_t;
// Set queue callbacks
hsa_status_t rocprofiler_set_queue_callbacks(
rocprofiler_queue_callbacks_t callbacks, // callbacks
void* data); // [in/out] passed callbacks data
hsa_status_t rocprofiler_set_queue_callbacks(rocprofiler_queue_callbacks_t callbacks, // callbacks
void* data); // [in/out] passed callbacks data
// Remove queue callbacks
hsa_status_t rocprofiler_remove_queue_callbacks();
@@ -323,20 +324,20 @@ hsa_status_t rocprofiler_stop_queue_callbacks();
// context.invocations' to collect all profiling data
// Start profiling
hsa_status_t rocprofiler_start(rocprofiler_t* context, // [in/out] profiling context
uint32_t group_index); // group index
hsa_status_t rocprofiler_start(rocprofiler_t* context, // [in/out] profiling context
uint32_t group_index); // group index
// Stop profiling
hsa_status_t rocprofiler_stop(rocprofiler_t* context, // [in/out] profiling context
uint32_t group_index); // group index
hsa_status_t rocprofiler_stop(rocprofiler_t* context, // [in/out] profiling context
uint32_t group_index); // group index
// Read profiling
hsa_status_t rocprofiler_read(rocprofiler_t* context, // [in/out] profiling context
uint32_t group_index); // group index
hsa_status_t rocprofiler_read(rocprofiler_t* context, // [in/out] profiling context
uint32_t group_index); // group index
// Read profiling data
hsa_status_t rocprofiler_get_data(rocprofiler_t* context, // [in/out] profiling context
uint32_t group_index); // group index
hsa_status_t rocprofiler_get_data(rocprofiler_t* context, // [in/out] profiling context
uint32_t group_index); // group index
// Get profiling groups count
hsa_status_t rocprofiler_group_count(const rocprofiler_t* context, // [in] profiling context
@@ -379,75 +380,76 @@ hsa_status_t rocprofiler_iterate_trace_data(
// Profiling info kind
typedef enum {
ROCPROFILER_INFO_KIND_METRIC = 0, // metric info
ROCPROFILER_INFO_KIND_METRIC_COUNT = 1, // metric features count, int32
ROCPROFILER_INFO_KIND_TRACE = 2, // trace info
ROCPROFILER_INFO_KIND_TRACE_COUNT = 3, // trace features count, int32
ROCPROFILER_INFO_KIND_TRACE_PARAMETER = 4, // trace parameter info
ROCPROFILER_INFO_KIND_TRACE_PARAMETER_COUNT = 5 // trace parameter count, int32
ROCPROFILER_INFO_KIND_METRIC = 0, // metric info
ROCPROFILER_INFO_KIND_METRIC_COUNT = 1, // metric features count, int32
ROCPROFILER_INFO_KIND_TRACE = 2, // trace info
ROCPROFILER_INFO_KIND_TRACE_COUNT = 3, // trace features count, int32
ROCPROFILER_INFO_KIND_TRACE_PARAMETER = 4, // trace parameter info
ROCPROFILER_INFO_KIND_TRACE_PARAMETER_COUNT = 5 // trace parameter count, int32
} rocprofiler_info_kind_t;
// Profiling info query
typedef union {
rocprofiler_info_kind_t info_kind; // queried profiling info kind
rocprofiler_info_kind_t info_kind; // queried profiling info kind
struct {
const char* trace_name; // queried info trace name
const char* trace_name; // queried info trace name
} trace_parameter;
} rocprofiler_info_query_t;
// Profiling info data
typedef struct {
uint32_t agent_index; // GPU HSA agent index (GPU Driver Node ID as reported in the sysfs topology)
rocprofiler_info_kind_t kind; // info data kind
uint32_t
agent_index; // GPU HSA agent index (GPU Driver Node ID as reported in the sysfs topology)
rocprofiler_info_kind_t kind; // info data kind
union {
struct {
const char* name; // metric name
uint32_t instances; // instances number
const char* expr; // metric expression, NULL for basic counters
const char* description; // metric description
const char* block_name; // block name
uint32_t block_counters; // number of block counters
const char* name; // metric name
uint32_t instances; // instances number
const char* expr; // metric expression, NULL for basic counters
const char* description; // metric description
const char* block_name; // block name
uint32_t block_counters; // number of block counters
} metric;
struct {
const char* name; // trace name
const char* description; // trace description
uint32_t parameter_count; // number of parameters supported by the trace
const char* name; // trace name
const char* description; // trace description
uint32_t parameter_count; // number of parameters supported by the trace
} trace;
struct {
uint32_t code; // parameter code
const char* trace_name; // trace name
const char* parameter_name; // parameter name
const char* description; // trace parameter description
uint32_t code; // parameter code
const char* trace_name; // trace name
const char* parameter_name; // parameter name
const char* description; // trace parameter description
} trace_parameter;
};
} rocprofiler_info_data_t;
// Return the info for a given info kind
hsa_status_t rocprofiler_get_info(
const hsa_agent_t* agent, // [in] GFXIP handle
rocprofiler_info_kind_t kind, // kind of iterated info
void *data); // [in/out] returned data
hsa_status_t rocprofiler_get_info(const hsa_agent_t* agent, // [in] GFXIP handle
rocprofiler_info_kind_t kind, // kind of iterated info
void* data); // [in/out] returned data
// Iterate over the info for a given info kind, and invoke an application-defined callback on every iteration
hsa_status_t rocprofiler_iterate_info(
const hsa_agent_t* agent, // [in] GFXIP handle
rocprofiler_info_kind_t kind, // kind of iterated info
hsa_status_t (*callback)(const rocprofiler_info_data_t info, void *data), // callback
void *data); // [in/out] data passed to callback
// Iterate over the info for a given info kind, and invoke an application-defined callback on every
// iteration
hsa_status_t rocprofiler_iterate_info(const hsa_agent_t* agent, // [in] GFXIP handle
rocprofiler_info_kind_t kind, // kind of iterated info
hsa_status_t (*callback)(const rocprofiler_info_data_t info,
void* data), // callback
void* data); // [in/out] data passed to callback
// Iterate over the info for a given info query, and invoke an application-defined callback on every iteration
hsa_status_t rocprofiler_query_info(
const hsa_agent_t *agent, // [in] GFXIP handle
rocprofiler_info_query_t query, // iterated info query
hsa_status_t (*callback)(const rocprofiler_info_data_t info, void *data), // callback
void *data); // [in/out] data passed to callback
// Iterate over the info for a given info query, and invoke an application-defined callback on every
// iteration
hsa_status_t rocprofiler_query_info(const hsa_agent_t* agent, // [in] GFXIP handle
rocprofiler_info_query_t query, // iterated info query
hsa_status_t (*callback)(const rocprofiler_info_data_t info,
void* data), // callback
void* data); // [in/out] data passed to callback
// Create a profiled queue. All dispatches on this queue will be profiled
hsa_status_t rocprofiler_queue_create_profiled(
hsa_agent_t agent_handle,uint32_t size, hsa_queue_type32_t type,
void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data),
void* data, uint32_t private_segment_size, uint32_t group_segment_size,
hsa_queue_t** queue);
hsa_agent_t agent_handle, uint32_t size, hsa_queue_type32_t type,
void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data), void* data,
uint32_t private_segment_size, uint32_t group_segment_size, hsa_queue_t** queue);
////////////////////////////////////////////////////////////////////////////////
// Profiling pool
@@ -461,8 +463,8 @@ typedef void rocprofiler_pool_t;
// Profiling pool entry
typedef struct {
rocprofiler_t* context; // context object
void* payload; // payload data object
rocprofiler_t* context; // context object
void* payload; // payload data object
} rocprofiler_pool_entry_t;
// Profiling handler, calling on profiling completion
@@ -478,120 +480,118 @@ typedef struct {
// Open profiling pool
hsa_status_t rocprofiler_pool_open(
hsa_agent_t agent, // GPU handle
rocprofiler_feature_t* features, // [in] profiling features array
uint32_t feature_count, // profiling info count
rocprofiler_pool_t** pool, // [out] context object
uint32_t mode, // profiling mode mask
rocprofiler_pool_properties_t*); // pool properties
hsa_agent_t agent, // GPU handle
rocprofiler_feature_t* features, // [in] profiling features array
uint32_t feature_count, // profiling info count
rocprofiler_pool_t** pool, // [out] context object
uint32_t mode, // profiling mode mask
rocprofiler_pool_properties_t*); // pool properties
// Close profiling pool
hsa_status_t rocprofiler_pool_close(
rocprofiler_pool_t* pool); // profiling pool handle
hsa_status_t rocprofiler_pool_close(rocprofiler_pool_t* pool); // profiling pool handle
// Fetch profiling pool entry
hsa_status_t rocprofiler_pool_fetch(
rocprofiler_pool_t* pool, // profiling pool handle
rocprofiler_pool_entry_t* entry); // [out] empty profiling pool entry
rocprofiler_pool_t* pool, // profiling pool handle
rocprofiler_pool_entry_t* entry); // [out] empty profiling pool entry
// Release profiling pool entry
hsa_status_t rocprofiler_pool_release(
rocprofiler_pool_entry_t* entry); // released profiling pool entry
rocprofiler_pool_entry_t* entry); // released profiling pool entry
// Iterate fetched profiling pool entries
hsa_status_t rocprofiler_pool_iterate(
rocprofiler_pool_t* pool, // profiling pool handle
hsa_status_t (*callback)(rocprofiler_pool_entry_t* entry, void* data), // callback
void *data); // [in/out] data passed to callback
hsa_status_t rocprofiler_pool_iterate(rocprofiler_pool_t* pool, // profiling pool handle
hsa_status_t (*callback)(rocprofiler_pool_entry_t* entry,
void* data), // callback
void* data); // [in/out] data passed to callback
// Flush completed entries in profiling pool
hsa_status_t rocprofiler_pool_flush(
rocprofiler_pool_t* pool); // profiling pool handle
hsa_status_t rocprofiler_pool_flush(rocprofiler_pool_t* pool); // profiling pool handle
////////////////////////////////////////////////////////////////////////////////
// HSA intercepting API
// HSA callbacks ID enumeration
typedef enum {
ROCPROFILER_HSA_CB_ID_ALLOCATE = 0, // Memory allocate callback
ROCPROFILER_HSA_CB_ID_DEVICE = 1, // Device assign callback
ROCPROFILER_HSA_CB_ID_MEMCOPY = 2, // Memcopy callback
ROCPROFILER_HSA_CB_ID_SUBMIT = 3, // Packet submit callback
ROCPROFILER_HSA_CB_ID_KSYMBOL = 4, // Loading/unloading of kernel symbol
ROCPROFILER_HSA_CB_ID_CODEOBJ = 5 // Loading/unloading of code object
ROCPROFILER_HSA_CB_ID_ALLOCATE = 0, // Memory allocate callback
ROCPROFILER_HSA_CB_ID_DEVICE = 1, // Device assign callback
ROCPROFILER_HSA_CB_ID_MEMCOPY = 2, // Memcopy callback
ROCPROFILER_HSA_CB_ID_SUBMIT = 3, // Packet submit callback
ROCPROFILER_HSA_CB_ID_KSYMBOL = 4, // Loading/unloading of kernel symbol
ROCPROFILER_HSA_CB_ID_CODEOBJ = 5 // Loading/unloading of code object
} rocprofiler_hsa_cb_id_t;
// HSA callback data type
typedef struct {
union {
struct {
const void* ptr; // allocated area ptr
size_t size; // allocated area size, zero size means 'free' callback
hsa_amd_segment_t segment; // allocated area's memory segment type
const void* ptr; // allocated area ptr
size_t size; // allocated area size, zero size means 'free' callback
hsa_amd_segment_t segment; // allocated area's memory segment type
hsa_amd_memory_pool_global_flag_t global_flag; // allocated area's memory global flag
int is_code; // equal to 1 if code is allocated
} allocate;
struct {
hsa_device_type_t type; // type of assigned device
uint32_t id; // id of assigned device
hsa_agent_t agent; // device HSA agent handle
const void* ptr; // ptr the device is assigned to
hsa_device_type_t type; // type of assigned device
uint32_t id; // id of assigned device
hsa_agent_t agent; // device HSA agent handle
const void* ptr; // ptr the device is assigned to
} device;
struct {
const void* dst; // memcopy dst ptr
const void* src; // memcopy src ptr
size_t size; // memcopy size bytes
const void* dst; // memcopy dst ptr
const void* src; // memcopy src ptr
size_t size; // memcopy size bytes
} memcopy;
struct {
const void* packet; // submitted to GPU packet
const char* kernel_name; // kernel name, not NULL if dispatch
hsa_queue_t* queue; // HSA queue the kernel was submitted to
uint32_t device_type; // type of device the packet is submitted to
uint32_t device_id; // id of device the packet is submitted to
const void* packet; // submitted to GPU packet
const char* kernel_name; // kernel name, not NULL if dispatch
hsa_queue_t* queue; // HSA queue the kernel was submitted to
uint32_t device_type; // type of device the packet is submitted to
uint32_t device_id; // id of device the packet is submitted to
} submit;
struct {
uint64_t object; // kernel symbol object
const char* name; // kernel symbol name
uint32_t name_length; // kernel symbol name length
int unload; // symbol executable destroy
uint64_t object; // kernel symbol object
const char* name; // kernel symbol name
uint32_t name_length; // kernel symbol name length
int unload; // symbol executable destroy
} ksymbol;
struct {
uint32_t storage_type; // code object storage type
int storage_file; // origin file descriptor
uint64_t memory_base; // origin memory base
uint64_t memory_size; // origin memory size
uint64_t load_base; // codeobj load base
uint64_t load_size; // codeobj load size
uint64_t load_delta; // codeobj load delta
uint32_t uri_length; // URI string length
char* uri; // URI string
int unload; // unload flag
uint32_t storage_type; // code object storage type
int storage_file; // origin file descriptor
uint64_t memory_base; // origin memory base
uint64_t memory_size; // origin memory size
uint64_t load_base; // codeobj load base
uint64_t load_size; // codeobj load size
uint64_t load_delta; // codeobj load delta
uint32_t uri_length; // URI string length
char* uri; // URI string
int unload; // unload flag
} codeobj;
};
} rocprofiler_hsa_callback_data_t;
// HSA callback function type
typedef hsa_status_t (*rocprofiler_hsa_callback_fun_t)(
rocprofiler_hsa_cb_id_t id, // callback id
const rocprofiler_hsa_callback_data_t* data, // [in] callback data
void* arg); // [in/out] user passed data
rocprofiler_hsa_cb_id_t id, // callback id
const rocprofiler_hsa_callback_data_t* data, // [in] callback data
void* arg); // [in/out] user passed data
// HSA callbacks structure
typedef struct {
rocprofiler_hsa_callback_fun_t allocate; // memory allocate callback
rocprofiler_hsa_callback_fun_t device; // agent assign callback
rocprofiler_hsa_callback_fun_t memcopy; // memory copy callback
rocprofiler_hsa_callback_fun_t submit; // packet submit callback
rocprofiler_hsa_callback_fun_t ksymbol; // kernel symbol callback
rocprofiler_hsa_callback_fun_t codeobj; // codeobject load/unload callback
rocprofiler_hsa_callback_fun_t allocate; // memory allocate callback
rocprofiler_hsa_callback_fun_t device; // agent assign callback
rocprofiler_hsa_callback_fun_t memcopy; // memory copy callback
rocprofiler_hsa_callback_fun_t submit; // packet submit callback
rocprofiler_hsa_callback_fun_t ksymbol; // kernel symbol callback
rocprofiler_hsa_callback_fun_t codeobj; // codeobject load/unload callback
} rocprofiler_hsa_callbacks_t;
// Set callbacks. If the callback is NULL then it is disabled.
// If callback returns a value that is not HSA_STATUS_SUCCESS the callback
// will be unregistered.
hsa_status_t rocprofiler_set_hsa_callbacks(
const rocprofiler_hsa_callbacks_t callbacks, // HSA callback function
void* arg); // callback user data
const rocprofiler_hsa_callbacks_t callbacks, // HSA callback function
void* arg); // callback user data
#ifdef __cplusplus
} // extern "C" block
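The allocate callback data above uses a zero size to signal a 'free' event, and a callback that returns anything other than HSA_STATUS_SUCCESS is unregistered. A handler built on those two rules might look like the sketch below. It assumes stand-in types (alloc_event_t, alloc_stats_t) and a plain int return in place of hsa_status_t so that it compiles without the HSA headers; none of these names belong to the real API.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the 'allocate' member of rocprofiler_hsa_callback_data_t
 * (only the two fields this sketch needs). */
typedef struct {
  const void* ptr; /* allocated area ptr */
  size_t size;     /* allocated area size, zero means 'free' */
} alloc_event_t;

/* Hypothetical user data, passed through the callback's void* argument. */
typedef struct {
  uint64_t allocs;
  uint64_t frees;
  size_t bytes;
} alloc_stats_t;

/* Counts allocations and frees. Returns 0, which stands in for
 * HSA_STATUS_SUCCESS; a non-success value would unregister the callback. */
static int on_allocate(const alloc_event_t* ev, void* arg) {
  alloc_stats_t* stats = (alloc_stats_t*)arg;
  if (ev->size == 0) {
    stats->frees++;
  } else {
    stats->allocs++;
    stats->bytes += ev->size;
  }
  return 0;
}
```

In real code the handler would take the rocprofiler_hsa_cb_id_t and the rocprofiler_hsa_callback_data_t pointer declared above and be installed through rocprofiler_set_hsa_callbacks.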
@@ -1714,7 +1714,7 @@ typedef enum {
ROCPROFILER_ATT_TOKEN_MASK2 = 4,
ROCPROFILER_ATT_SE_MASK = 5,
ROCPROFILER_ATT_SAMPLE_RATE = 6,
ROCPROFILER_ATT_BUFFER_SIZE = 7, //! ATT collection max data size.
ROCPROFILER_ATT_BUFFER_SIZE = 7, //! ATT collection max data size.
ROCPROFILER_ATT_PERF_MASK = 240,
ROCPROFILER_ATT_PERF_CTRL = 241,
ROCPROFILER_ATT_PERFCOUNTER = 242,
@@ -1,23 +1,23 @@
################################################################################
## Copyright (c) 2022 Advanced Micro Devices, Inc.
##
## Permission is hereby granted, free of charge, to any person obtaining a copy
## of this software and associated documentation files (the "Software"), to
## deal in the Software without restriction, including without limitation the
## rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
## sell copies of the Software, and to permit persons to whom the Software is
## furnished to do so, subject to the following conditions:
##
## The above copyright notice and this permission notice shall be included in
## all copies or substantial portions of the Software.
##
## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
## AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
## LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
## FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
## IN THE SOFTWARE.
# Copyright (c) 2022 Advanced Micro Devices, Inc.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to
# deal in the Software without restriction, including without limitation the
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
# sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
################################################################################
add_subdirectory(file)
@@ -17,10 +17,10 @@
# ##############################################################################
find_library(
ROCPROFV2_ATT rocprofv2_att
HINTS ${CMAKE_INSTALL_PREFIX}
PATHS ${ROCM_PATH}
PATH_SUFFIXES hsa-amd-aqlprofile)
ROCPROFV2_ATT rocprofv2_att
HINTS ${CMAKE_INSTALL_PREFIX}
PATHS ${ROCM_PATH}
PATH_SUFFIXES hsa-amd-aqlprofile)
set(ENV{ROCPROFV2_ATT_LIB_PATH} $ROCPROFV2_ATT)
@@ -30,30 +30,26 @@ file(GLOB FILE_SOURCES att.cpp)
add_library(att_plugin SHARED ${FILE_SOURCES} ${ROCPROFILER_UTIL_SRC_FILES})
set_target_properties(
att_plugin
PROPERTIES CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
att_plugin
PROPERTIES CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
target_compile_definitions(att_plugin PRIVATE HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_HCC__=1)
target_include_directories(
att_plugin PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR})
target_include_directories(att_plugin PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR})
target_link_options(
att_plugin PRIVATE
-Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
-Wl,--no-undefined)
target_link_libraries(att_plugin PRIVATE rocprofiler-v2
hsa-runtime64::hsa-runtime64 stdc++fs)
att_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
-Wl,--no-undefined)
target_link_libraries(att_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64
stdc++fs)
install(TARGETS att_plugin
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT asan)
install(TARGETS att_plugin
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT runtime)
install(TARGETS att_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT asan)
install(TARGETS att_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT runtime)
configure_file(att.py att/att.py COPYONLY)
configure_file(trace_view.py att/trace_view.py COPYONLY)
@@ -64,7 +60,7 @@ configure_file(ui/logo.svg att/ui/logo.svg COPYONLY)
configure_file(ui/styles.css att/ui/styles.css COPYONLY)
configure_file(ui/httpserver.py att/ui/httpserver.py COPYONLY)
install(
DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/att
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/rocprofiler
USE_SOURCE_PERMISSIONS
COMPONENT runtime)
DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/att
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/rocprofiler
USE_SOURCE_PERMISSIONS
COMPONENT runtime)
@@ -54,11 +54,12 @@ class att_plugin_t {
att_plugin_t() {
std::vector<const char*> mpivars = {"MPI_RANK", "OMPI_COMM_WORLD_RANK", "MV2_COMM_WORLD_RANK"};
for (const char* envvar : mpivars) if (const char* env = getenv(envvar)) {
MPI_RANK = atoi(env);
MPI_ENABLE = true;
break;
}
for (const char* envvar : mpivars)
if (const char* env = getenv(envvar)) {
MPI_RANK = atoi(env);
MPI_ENABLE = true;
break;
}
}
bool MPI_ENABLE = false;
@@ -92,16 +93,15 @@ class att_plugin_t {
std::string name_demangled =
rocprofiler::truncate_name(rocprofiler::cxx_demangle(kernel_name_c));
if (name_demangled.size() > ATT_FILENAME_MAXBYTES) // Limit filename size
if (name_demangled.size() > ATT_FILENAME_MAXBYTES) // Limit filename size
name_demangled = name_demangled.substr(0, ATT_FILENAME_MAXBYTES);
std::string outfilepath = ".";
if (const char* env = getenv("OUTPUT_PATH"))
outfilepath = std::string(env);
if (const char* env = getenv("OUTPUT_PATH")) outfilepath = std::string(env);
outfilepath.reserve(outfilepath.size()+128); // Max filename size
outfilepath += '/'+name_demangled;
if (MPI_ENABLE) outfilepath += "_rank"+std::to_string(MPI_RANK);
outfilepath.reserve(outfilepath.size() + 128); // Max filename size
outfilepath += '/' + name_demangled;
if (MPI_ENABLE) outfilepath += "_rank" + std::to_string(MPI_RANK);
outfilepath += "_v";
// Find if this filename already exists. If so, increment vname.
@@ -113,9 +113,9 @@ class att_plugin_t {
auto dispatch_id = att_tracer_record->header.id.handle;
std::string fname = outfilepath + "_kernel.txt";
std::ofstream(fname.c_str()) << name_demangled << " dispatch[" << dispatch_id
<< "] GPU[" << att_tracer_record->gpu_id.handle
<< "]: " << kernel_name_c << '\n';
std::ofstream(fname.c_str()) << name_demangled << " dispatch[" << dispatch_id << "] GPU["
<< att_tracer_record->gpu_id.handle << "]: " << kernel_name_c
<< '\n';
// iterate over each shader engine att trace
int se_num = att_tracer_record->shader_engine_data_count;
@@ -25,23 +25,25 @@ file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
file(GLOB CLI_SOURCES "*.cpp")
add_library(cli_plugin SHARED ${CLI_SOURCES} ${ROCPROFILER_UTIL_SRC_FILES})
set_target_properties(cli_plugin PROPERTIES
CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
set_target_properties(
cli_plugin
PROPERTIES CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
target_compile_definitions(cli_plugin
PRIVATE HIP_PROF_HIP_API_STRING=1 __HIP_PLATFORM_HCC__=1)
target_compile_definitions(cli_plugin PRIVATE HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_HCC__=1)
target_include_directories(cli_plugin PRIVATE ${PROJECT_SOURCE_DIR})
target_link_options(cli_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap -Wl,--no-undefined)
target_link_options(
cli_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
-Wl,--no-undefined)
target_link_libraries(cli_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64 stdc++fs atomic amd_comgr dl)
target_link_libraries(cli_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64
stdc++fs atomic amd_comgr dl)
install(TARGETS cli_plugin LIBRARY
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT asan)
install(TARGETS cli_plugin LIBRARY
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT runtime)
install(TARGETS cli_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT asan)
install(TARGETS cli_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT runtime)
@@ -1,76 +1,84 @@
################################################################################
## Copyright (c) 2022 Advanced Micro Devices, Inc.
##
## Permission is hereby granted, free of charge, to any person obtaining a copy
## of this software and associated documentation files (the "Software"), to
## deal in the Software without restriction, including without limitation the
## rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
## sell copies of the Software, and to permit persons to whom the Software is
## furnished to do so, subject to the following conditions:
##
## The above copyright notice and this permission notice shall be included in
## all copies or substantial portions of the Software.
##
## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
## AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
## LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
## FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
## IN THE SOFTWARE.
# Copyright (c) 2022 Advanced Micro Devices, Inc.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to
# deal in the Software without restriction, including without limitation the
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
# sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
################################################################################
# Plugin shared object.
add_library(ctf_plugin SHARED
ctf.cpp
plugin.cpp
barectf.c "${CMAKE_CURRENT_BINARY_DIR}/barectf.h"
${PROJECT_SOURCE_DIR}/src/utils/helper.cpp
hsa_begin.cpp.i hsa_end.cpp.i
hip_begin.cpp.i hip_end.cpp.i)
set_target_properties(ctf_plugin PROPERTIES
CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/../exportmap"
LIBRARY_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}/lib/rocprofiler")
add_library(
ctf_plugin SHARED
ctf.cpp
plugin.cpp
barectf.c
"${CMAKE_CURRENT_BINARY_DIR}/barectf.h"
${PROJECT_SOURCE_DIR}/src/utils/helper.cpp
hsa_begin.cpp.i
hsa_end.cpp.i
hip_begin.cpp.i
hip_end.cpp.i)
set_target_properties(
ctf_plugin
PROPERTIES CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/../exportmap"
LIBRARY_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}/lib/rocprofiler")
set(METADATA_STREAM_FILE_DIR "${CMAKE_INSTALL_DATADIR}/${PROJECT_NAME}/plugin/ctf")
target_compile_definitions(ctf_plugin PUBLIC AMD_INTERNAL_BUILD PRIVATE
HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_HCC__=1
CTF_PLUGIN_METADATA_FILE_PATH="${CMAKE_INSTALL_PREFIX}/${METADATA_STREAM_FILE_DIR}/metadata")
target_include_directories(ctf_plugin PRIVATE
"${PROJECT_SOURCE_DIR}"
"${CMAKE_BINARY_DIR}/src/api"
"${CMAKE_CURRENT_BINARY_DIR}")
target_link_options(ctf_plugin PRIVATE
"-Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap"
-Wl,--no-undefined)
target_link_libraries(ctf_plugin PRIVATE
rocprofiler-v2
hsa-runtime64::hsa-runtime64
stdc++fs
dl)
install(TARGETS ctf_plugin LIBRARY
DESTINATION "${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}"
COMPONENT plugins)
target_compile_definitions(
ctf_plugin
PUBLIC AMD_INTERNAL_BUILD
PRIVATE
HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_HCC__=1
CTF_PLUGIN_METADATA_FILE_PATH="${CMAKE_INSTALL_PREFIX}/${METADATA_STREAM_FILE_DIR}/metadata"
)
target_include_directories(
ctf_plugin PRIVATE "${PROJECT_SOURCE_DIR}" "${CMAKE_BINARY_DIR}/src/api"
"${CMAKE_CURRENT_BINARY_DIR}")
target_link_options(
ctf_plugin PRIVATE "-Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap"
-Wl,--no-undefined)
target_link_libraries(ctf_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64
stdc++fs dl)
install(TARGETS ctf_plugin LIBRARY DESTINATION "${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}"
COMPONENT plugins)
# `gen_api_files.py` and `gen_env_yaml.py` require Python 3,
# CppHeaderParser, PyYAML, and barectf.
find_package(Python3 COMPONENTS Interpreter REQUIRED)
# `gen_api_files.py` and `gen_env_yaml.py` require Python 3, CppHeaderParser, PyYAML, and
# barectf.
find_package(
Python3
COMPONENTS Interpreter
REQUIRED)
message("Python: ${Python3_EXECUTABLE})")
execute_process(COMMAND Python3::Interpreter -c "print('hello')")
function(check_py3_pkg pkg_name)
execute_process(COMMAND "${Python3_EXECUTABLE}" -c "import ${pkg_name}"
RESULT_VARIABLE PY3_IMPORT_RES
OUTPUT_QUIET)
execute_process(
COMMAND "${Python3_EXECUTABLE}" -c "import ${pkg_name}"
RESULT_VARIABLE PY3_IMPORT_RES
OUTPUT_QUIET)
if(NOT (${PY3_IMPORT_RES} EQUAL 0))
message(FATAL_ERROR "Cannot find Python 3 package `${pkg_name}`")
endif()
if(NOT (${PY3_IMPORT_RES} EQUAL 0))
message(FATAL_ERROR "Cannot find Python 3 package `${pkg_name}`")
endif()
message(STATUS "Found Python 3 package `${pkg_name}`")
message(STATUS "Found Python 3 package `${pkg_name}`")
endfunction()
check_py3_pkg(CppHeaderParser)
@@ -78,82 +86,76 @@ check_py3_pkg(yaml)
find_program(BARECTF_RES barectf REQUIRED HINTS "$ENV{HOME}/.local/bin")
# Generate barectf YAML and C++ files for HSA API.
get_property(HSA_RUNTIME_INCLUDE_DIRS
TARGET hsa-runtime64::hsa-runtime64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRS}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH
REQUIRED)
get_property(
HSA_RUNTIME_INCLUDE_DIRS
TARGET hsa-runtime64::hsa-runtime64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRS}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH REQUIRED)
get_filename_component(HSA_RUNTIME_INC_PATH "${HSA_H}" DIRECTORY)
add_custom_command(
OUTPUT hsa_erts.yaml hsa_begin.cpp.i hsa_end.cpp.i
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa.h" -o hsa.h.i
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
-o hsa_ext_amd.h.i
COMMAND ${CMAKE_COMMAND} -E cat hsa.h.i
hsa_ext_amd.h.i
"${CMAKE_BINARY_DIR}/src/api/hsa_prof_str.h"
> hsa_input.h
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py"
hsa hsa_input.h
BYPRODUCTS hsa.h.i hsa_ext_amd.h.i hsa_input.h
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py"
"${HSA_RUNTIME_INC_PATH}/hsa.h"
"${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
"${CMAKE_BINARY_DIR}/src/api/hsa_prof_str.h"
COMMENT "Generating HSA API files for the `ctf` plugin...")
OUTPUT hsa_erts.yaml hsa_begin.cpp.i hsa_end.cpp.i
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa.h" -o hsa.h.i
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h" -o
hsa_ext_amd.h.i
COMMAND ${CMAKE_COMMAND} -E cat hsa.h.i hsa_ext_amd.h.i
"${CMAKE_BINARY_DIR}/src/api/hsa_prof_str.h" > hsa_input.h
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py" hsa
hsa_input.h
BYPRODUCTS hsa.h.i hsa_ext_amd.h.i hsa_input.h
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py" "${HSA_RUNTIME_INC_PATH}/hsa.h"
"${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
"${CMAKE_BINARY_DIR}/src/api/hsa_prof_str.h"
COMMENT "Generating HSA API files for the `ctf` plugin...")
# Generate barectf YAML and C++ files for HIP API.
get_property(HIP_INCLUDE_DIRS TARGET hip::amdhip64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRS}
PATH_SUFFIXES hip
NO_DEFAULT_PATH
REQUIRED)
find_file(HIP_PROF_STR_H hip_prof_str.h
PATHS ${HIP_INCLUDE_DIRS}
PATH_SUFFIXES hip hip/amd_detail
NO_DEFAULT_PATH
REQUIRED)
get_property(
HIP_INCLUDE_DIRS
TARGET hip::amdhip64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRS}
PATH_SUFFIXES hip
NO_DEFAULT_PATH REQUIRED)
find_file(
HIP_PROF_STR_H hip_prof_str.h
PATHS ${HIP_INCLUDE_DIRS}
PATH_SUFFIXES hip hip/amd_detail
NO_DEFAULT_PATH REQUIRED)
list(TRANSFORM HIP_INCLUDE_DIRS PREPEND -I)
add_custom_command(
OUTPUT hip_erts.yaml hip_begin.cpp.i hip_end.cpp.i
COMMAND ${CMAKE_C_COMPILER} ${HIP_INCLUDE_DIRS}
-E "${HIP_RUNTIME_API_H}"
-D__HIP_PLATFORM_HCC__=1
-D__HIP_ROCclr__=1
-o hip_runtime_api.h.i
COMMAND cat hip_runtime_api.h.i "${HIP_PROF_STR_H}" > hip_input.h
BYPRODUCTS hip_runtime_api.h.i hip_input.h
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py"
hip hip_input.h
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py"
"${HIP_RUNTIME_API_H}"
"${HIP_PROF_STR_H}"
COMMENT "Generating HIP API files for the `ctf` plugin...")
OUTPUT hip_erts.yaml hip_begin.cpp.i hip_end.cpp.i
COMMAND ${CMAKE_C_COMPILER} ${HIP_INCLUDE_DIRS} -E "${HIP_RUNTIME_API_H}"
-D__HIP_PLATFORM_HCC__=1 -D__HIP_ROCclr__=1 -o hip_runtime_api.h.i
COMMAND cat hip_runtime_api.h.i "${HIP_PROF_STR_H}" > hip_input.h
BYPRODUCTS hip_runtime_api.h.i hip_input.h
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py" hip
hip_input.h
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py" "${HIP_RUNTIME_API_H}"
"${HIP_PROF_STR_H}"
COMMENT "Generating HIP API files for the `ctf` plugin...")
# Generate `env.yaml` (trace environment for barectf).
add_custom_command(
OUTPUT env.yaml
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_env_yaml.py"
${PROJECT_VERSION}
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_env_yaml.py"
COMMENT "Generating `env.yaml`...")
OUTPUT env.yaml
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_env_yaml.py"
${PROJECT_VERSION}
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_env_yaml.py"
COMMENT "Generating `env.yaml`...")
# Generate raw CTF tracer with barectf.
add_custom_command(
OUTPUT barectf.c barectf.h barectf-bitfield.h metadata
COMMAND "${BARECTF_RES}" gen "-I${CMAKE_CURRENT_BINARY_DIR}"
"-I${CMAKE_CURRENT_SOURCE_DIR}"
"${CMAKE_CURRENT_SOURCE_DIR}/config.yaml"
DEPENDS hsa_erts.yaml
hip_erts.yaml
env.yaml
"${CMAKE_CURRENT_SOURCE_DIR}/config.yaml"
"${CMAKE_CURRENT_SOURCE_DIR}/dst_base.yaml"
COMMENT "Generating raw CTF tracer with barectf...")
install(FILES "${CMAKE_CURRENT_BINARY_DIR}/metadata"
DESTINATION "${METADATA_STREAM_FILE_DIR}" COMPONENT plugins)
OUTPUT barectf.c barectf.h barectf-bitfield.h metadata
COMMAND "${BARECTF_RES}" gen "-I${CMAKE_CURRENT_BINARY_DIR}"
"-I${CMAKE_CURRENT_SOURCE_DIR}" "${CMAKE_CURRENT_SOURCE_DIR}/config.yaml"
DEPENDS hsa_erts.yaml hip_erts.yaml env.yaml "${CMAKE_CURRENT_SOURCE_DIR}/config.yaml"
"${CMAKE_CURRENT_SOURCE_DIR}/dst_base.yaml"
COMMENT "Generating raw CTF tracer with barectf...")
install(
FILES "${CMAKE_CURRENT_BINARY_DIR}/metadata"
DESTINATION "${METADATA_STREAM_FILE_DIR}"
COMPONENT plugins)
@@ -156,9 +156,8 @@ class HsaApiEventRecord : public TracerEventRecord<barectf_hsa_api_ctx> {
const rocprofiler_session_id_t session_id,
const std::uint64_t clock_val)
: TracerEventRecord<barectf_hsa_api_ctx>{record, clock_val} {
if(record.api_data.hsa)
api_data_ = *(record.api_data.hsa);
}
if (record.api_data.hsa) api_data_ = *(record.api_data.hsa);
}
explicit HsaApiEventRecord(const rocprofiler_record_tracer_t& record,
const std::uint64_t clock_val, hsa_api_data_t& api_data)
: TracerEventRecord<barectf_hsa_api_ctx>{record, clock_val}, api_data_(api_data) {}
@@ -206,7 +205,7 @@ class HipApiEventRecord : public TracerEventRecord<barectf_hip_api_ctx> {
const rocprofiler_session_id_t session_id,
const std::uint64_t clock_val)
: TracerEventRecord<barectf_hip_api_ctx>{record, clock_val},
api_data_{record.api_data.hip? *(record.api_data.hip) : hip_api_data_t{}},
api_data_{record.api_data.hip ? *(record.api_data.hip) : hip_api_data_t{}},
kernel_name_{record.name ? record.name : std::string{}} {}
explicit HipApiEventRecord(const rocprofiler_record_tracer_t& record,
const std::uint64_t clock_val, hip_api_data_t& api_data,
@@ -760,16 +759,11 @@ std::uint64_t GetMetadataClkClsOffset() {
static const char* LOOP_MPI_RANK(const std::vector<const char*>& mpivars) {
for (const char* env : mpivars)
if (const char* envvar = getenv(env))
return envvar;
if (const char* envvar = getenv(env)) return envvar;
return nullptr;
}
static void insert_meta_to_stream(
std::stringstream& stream,
const char* field,
const char* value
) {
static void insert_meta_to_stream(std::stringstream& stream, const char* field, const char* value) {
if (!field || !value) return;
stream << "\n\t" << std::string(field) << " = " << std::string(value) << ';';
}
@@ -802,7 +796,7 @@ void Plugin::CopyAdjustedMetadataStreamFile(const fs::path& metadata_stream_path
std::string data_ins = data_stream.str();
size_t env_pos = metadata.find("env {");
if (env_pos != std::string::npos)
metadata.insert(metadata.begin()+env_pos+5, data_ins.begin(), data_ins.end());
metadata.insert(metadata.begin() + env_pos + 5, data_ins.begin(), data_ins.end());
else
std::cerr << "Failed to insert MPI metadata!" << std::endl;
}
@@ -25,23 +25,25 @@ file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
file(GLOB FILE_SOURCES "*.cpp")
add_library(file_plugin SHARED ${FILE_SOURCES} ${ROCPROFILER_UTIL_SRC_FILES})
set_target_properties(file_plugin PROPERTIES
CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
set_target_properties(
file_plugin
PROPERTIES CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
target_compile_definitions(file_plugin
PRIVATE HIP_PROF_HIP_API_STRING=1 __HIP_PLATFORM_HCC__=1)
target_compile_definitions(file_plugin PRIVATE HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_HCC__=1)
target_include_directories(file_plugin PRIVATE ${PROJECT_SOURCE_DIR})
target_link_options(file_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap -Wl,--no-undefined)
target_link_options(
file_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
-Wl,--no-undefined)
target_link_libraries(file_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64 stdc++fs amd_comgr dl)
target_link_libraries(file_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64
stdc++fs amd_comgr dl)
install(TARGETS file_plugin LIBRARY
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT asan)
install(TARGETS file_plugin LIBRARY
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT runtime)
install(TARGETS file_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT asan)
install(TARGETS file_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT runtime)
@@ -216,8 +216,7 @@ class file_plugin_t {
case ACTIVITY_DOMAIN_HIP_API: {
if (hip_api_header_written_.load(std::memory_order_relaxed)) return;
output_file = get_output_file(output_type_t::TRACER, ACTIVITY_DOMAIN_HIP_API);
*output_file << "Domain,Function,Start_Timestamp,End_Timestamp,Correlation_ID"
<< std::endl;
*output_file << "Domain,Function,Start_Timestamp,End_Timestamp,Correlation_ID" << std::endl;
*output_file << std::endl;
hip_api_header_written_.exchange(true, std::memory_order_release);
return;
@@ -1,27 +1,27 @@
file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
add_library(perfetto_plugin
${LIBRARY_TYPE} ${ROCPROFILER_UTIL_SRC_FILES}
perfetto.cpp perfetto_sdk/sdk/perfetto.cc)
add_library(perfetto_plugin ${LIBRARY_TYPE} ${ROCPROFILER_UTIL_SRC_FILES} perfetto.cpp
perfetto_sdk/sdk/perfetto.cc)
set_target_properties(perfetto_plugin PROPERTIES
CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
set_target_properties(
perfetto_plugin
PROPERTIES CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
target_compile_definitions(perfetto_plugin
PRIVATE HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_HCC__=1)
target_compile_definitions(perfetto_plugin PRIVATE HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_HCC__=1)
target_include_directories(perfetto_plugin
PRIVATE ${PROJECT_SOURCE_DIR}
${PROJECT_SOURCE_DIR}/plugin/perfetto/perfetto_sdk/sdk)
target_include_directories(
perfetto_plugin PRIVATE ${PROJECT_SOURCE_DIR}
${PROJECT_SOURCE_DIR}/plugin/perfetto/perfetto_sdk/sdk)
target_link_options(perfetto_plugin
PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap -Wl,--no-undefined)
target_link_options(
perfetto_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
-Wl,--no-undefined)
target_link_libraries(perfetto_plugin PRIVATE rocprofiler-v2 Threads::Threads stdc++fs amd_comgr)
target_link_libraries(perfetto_plugin PRIVATE rocprofiler-v2 Threads::Threads stdc++fs
amd_comgr)
install(TARGETS perfetto_plugin LIBRARY
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT plugins)
install(TARGETS perfetto_plugin
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME} COMPONENT plugins)
@@ -556,8 +556,7 @@ class perfetto_plugin_t {
if (tracer_record.name) {
kernel_name = rocprofiler::cxx_demangle(tracer_record.name);
TRACE_EVENT_BEGIN(
"HIP_OPS",
perfetto::StaticString(rocprofiler::truncate_name(kernel_name).c_str()),
"HIP_OPS", perfetto::StaticString(rocprofiler::truncate_name(kernel_name).c_str()),
gpu_track, tracer_record.timestamps.begin.value, "Agent ID",
tracer_record.agent_id.handle, "Process ID", GetPid(), "Kernel Name", kernel_name,
perfetto::Flow::ProcessScoped(tracer_record.correlation_id.value));
File diff suppressed because this file is too large.
File diff suppressed because this file is too large.
@@ -36,9 +36,10 @@
#include "src/utils/helper.h"
// Macro to check ROCProfiler calls status
#define CHECK_ROCPROFILER(call) \
#define CHECK_ROCPROFILER(call) \
do { \
if ((call) != ROCPROFILER_STATUS_SUCCESS) rocprofiler::fatal("Error: ROCProfiler API Call Error!"); \
if ((call) != ROCPROFILER_STATUS_SUCCESS) \
rocprofiler::fatal("Error: ROCProfiler API Call Error!"); \
} while (false)
namespace {
@@ -48,8 +49,6 @@ namespace {
return pid;
}
[[maybe_unused]] uint64_t GetMachineID() {
return gethostid();
}
[[maybe_unused]] uint64_t GetMachineID() { return gethostid(); }
} // namespace
@@ -26,9 +26,11 @@ set(ROCPROF_WRAPPER_BIN_DIR ${ROCPROF_WRAPPER_DIR}/bin)
set(ROCPROF_WRAPPER_LIB_DIR ${ROCPROF_WRAPPER_DIR}/lib)
set(ROCPROF_WRAPPER_TOOL_DIR ${ROCPROF_WRAPPER_DIR}/tool)
#Function to generate header template file
# Function to generate header template file
function(create_header_template)
file(WRITE ${ROCPROF_WRAPPER_DIR}/header.hpp.in "/*
file(
WRITE ${ROCPROF_WRAPPER_DIR}/header.hpp.in
"/*
Copyright (c) 2022 Advanced Micro Devices, Inc. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
@@ -69,105 +71,142 @@ function(create_header_template)
#endif")
endfunction()
#use header template file and generate wrapper header files
# use header template file and generate wrapper header files
function(generate_wrapper_header)
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_INC_DIR})
#find all header files from inc
file(GLOB include_files ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/*.h)
#Convert the list of files into #includes
foreach(header_file ${include_files})
#set include guard
get_filename_component(INC_GAURD_NAME ${header_file} NAME_WE)
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_INC_DIR})
# find all header files from inc
file(GLOB include_files ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/*.h)
# Convert the list of files into #includes
foreach(header_file ${include_files})
# set include guard
get_filename_component(INC_GAURD_NAME ${header_file} NAME_WE)
string(TOUPPER ${INC_GAURD_NAME} INC_GAURD_NAME)
set(include_guard "${include_guard}ROCPROF_WRAPPER_INCLUDE_${INC_GAURD_NAME}_H")
# set include statement
get_filename_component(file_name ${header_file} NAME)
set(include_statements
"${include_statements}#include \"../../${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}/${file_name}\"\n"
)
configure_file(${ROCPROF_WRAPPER_DIR}/header.hpp.in
${ROCPROF_WRAPPER_INC_DIR}/${file_name})
unset(include_guard)
unset(include_statements)
endforeach()
# Only single file from ${CMAKE_CURRENT_SOURCE_DIR}/src/core/activity.h is packaged.
# So drectly using that file name
set(file_name "activity.h")
# set include guard
get_filename_component(INC_GAURD_NAME ${file_name} NAME_WE)
string(TOUPPER ${INC_GAURD_NAME} INC_GAURD_NAME)
set(include_guard "${include_guard}ROCPROF_WRAPPER_INCLUDE_${INC_GAURD_NAME}_H")
#set include statement
get_filename_component(file_name ${header_file} NAME)
set(include_statements "${include_statements}#include \"../../${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}/${file_name}\"\n")
configure_file(${ROCPROF_WRAPPER_DIR}/header.hpp.in ${ROCPROF_WRAPPER_INC_DIR}/${file_name})
unset(include_guard)
unset(include_statements)
endforeach()
#Only single file from ${CMAKE_CURRENT_SOURCE_DIR}/src/core/activity.h is packaged. So drectly using that file name
set(file_name "activity.h")
#set include guard
get_filename_component(INC_GAURD_NAME ${file_name} NAME_WE)
string(TOUPPER ${INC_GAURD_NAME} INC_GAURD_NAME)
set(include_guard "${include_guard}ROCPROF_WRAPPER_INCLUDE_${INC_GAURD_NAME}_H")
set(include_statements "${include_statements}#include \"../../${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}/${file_name}\"\n")
configure_file(${ROCPROF_WRAPPER_DIR}/header.hpp.in ${ROCPROF_WRAPPER_INC_DIR}/${file_name})
set(include_statements
"${include_statements}#include \"../../${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}/${file_name}\"\n"
)
configure_file(${ROCPROF_WRAPPER_DIR}/header.hpp.in
${ROCPROF_WRAPPER_INC_DIR}/${file_name})
endfunction()
#function to create symlink to binaries
# function to create symlink to binaries
function(create_binary_symlink)
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_BIN_DIR})
#create symlink for rocprof
set(file_name "rocprof")
add_custom_target(link_${file_name} ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND ${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_BINDIR}/${file_name} ${ROCPROF_WRAPPER_BIN_DIR}/${file_name})
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_BIN_DIR})
# create symlink for rocprof
set(file_name "rocprof")
add_custom_target(
link_${file_name} ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND
${CMAKE_COMMAND} -E create_symlink ../../${CMAKE_INSTALL_BINDIR}/${file_name}
${ROCPROF_WRAPPER_BIN_DIR}/${file_name})
endfunction()
#function to create symlink to libraries
# function to create symlink to libraries
function(create_library_symlink)
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_LIB_DIR})
set(LIB_ROCPROF "${ROCPROFILER_LIBRARY}.so")
set(MAJ_VERSION "${LIB_VERSION_MAJOR}")
set(SO_VERSION "${LIB_VERSION_STRING}")
set(library_files "${LIB_ROCPROF}" "${LIB_ROCPROF}.${MAJ_VERSION}" "${LIB_ROCPROF}.${SO_VERSION}")
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_LIB_DIR})
set(LIB_ROCPROF "${ROCPROFILER_LIBRARY}.so")
set(MAJ_VERSION "${LIB_VERSION_MAJOR}")
set(SO_VERSION "${LIB_VERSION_STRING}")
set(library_files "${LIB_ROCPROF}" "${LIB_ROCPROF}.${MAJ_VERSION}"
"${LIB_ROCPROF}.${SO_VERSION}")
foreach(file_name ${library_files})
add_custom_target(link_${file_name} ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND ${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${file_name} ${ROCPROF_WRAPPER_LIB_DIR}/${file_name})
endforeach()
#create symlink to rocprofiler/tool/libtool.so
# With File reorg,tool renamed to rocprof-tool
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_TOOL_DIR})
set(LIB_TOOL "libtool.so")
set(LIB_ROCPROFTOOL "librocprof-tool.so")
add_custom_target(link_${LIB_TOOL} ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND ${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${LIB_ROCPROFTOOL} ${ROCPROF_WRAPPER_TOOL_DIR}/${LIB_TOOL})
#create symlink to test binary
#since its saved in lib folder , the code for the same is added here
# With File reorg ,binary name changed from ctrl to rocprof-ctrl
set(TEST_CTRL "ctrl")
set(TEST_ROCPROFCTRL "rocprof-ctrl")
add_custom_target(link_${TEST_CTRL} ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND ${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${TEST_ROCPROFCTRL} ${ROCPROF_WRAPPER_TOOL_DIR}/${TEST_CTRL})
set(METRICS "metrics.xml")
add_custom_target(link_metrics ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND ${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${METRICS} ${ROCPROF_WRAPPER_LIB_DIR}/${METRICS})
foreach(file_name ${library_files})
add_custom_target(
link_${file_name} ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND
${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${file_name}
${ROCPROF_WRAPPER_LIB_DIR}/${file_name})
endforeach()
# Create symlink to rocprofiler/tool/libtool.so. With the file reorg, tool was renamed
# to rocprof-tool.
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_TOOL_DIR})
set(LIB_TOOL "libtool.so")
set(LIB_ROCPROFTOOL "librocprof-tool.so")
add_custom_target(
link_${LIB_TOOL} ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND
${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${LIB_ROCPROFTOOL}
${ROCPROF_WRAPPER_TOOL_DIR}/${LIB_TOOL})
# Create symlink to the test binary. Since it is saved in the lib folder, the code for
# it is added here. With the file reorg, the binary name changed from ctrl to
# rocprof-ctrl.
set(TEST_CTRL "ctrl")
set(TEST_ROCPROFCTRL "rocprof-ctrl")
add_custom_target(
link_${TEST_CTRL} ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND
${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${TEST_ROCPROFCTRL}
${ROCPROF_WRAPPER_TOOL_DIR}/${TEST_CTRL})
set(METRICS "metrics.xml")
add_custom_target(
link_metrics ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND
${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${METRICS}
${ROCPROF_WRAPPER_LIB_DIR}/${METRICS})
set(GFX_METRICS "gfx_metrics.xml")
add_custom_target(link_gfx_metrics ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND ${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${GFX_METRICS} ${ROCPROF_WRAPPER_LIB_DIR}/${GFX_METRICS})
set(GFX_METRICS "gfx_metrics.xml")
add_custom_target(
link_gfx_metrics ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND
${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${GFX_METRICS}
${ROCPROF_WRAPPER_LIB_DIR}/${GFX_METRICS})
endfunction()
#Create a template for header file
# Create a template for header file
create_header_template()
#Use template header file and generate wrapper header files
# Use template header file and generate wrapper header files
generate_wrapper_header()
install(DIRECTORY ${ROCPROF_WRAPPER_INC_DIR} DESTINATION ${ROCPROFILER_NAME} COMPONENT dev)
install(
DIRECTORY ${ROCPROF_WRAPPER_INC_DIR}
DESTINATION ${ROCPROFILER_NAME}
COMPONENT dev)
# Create symlink to binaries
create_binary_symlink()
install(DIRECTORY ${ROCPROF_WRAPPER_BIN_DIR} DESTINATION ${ROCPROFILER_NAME} COMPONENT runtime)
install(
DIRECTORY ${ROCPROF_WRAPPER_BIN_DIR}
DESTINATION ${ROCPROFILER_NAME}
COMPONENT runtime)
create_library_symlink()
install(DIRECTORY ${ROCPROF_WRAPPER_LIB_DIR} DESTINATION ${ROCPROFILER_NAME}
COMPONENT runtime
PATTERN ${ROCPROFILER_LIBRARY}.so EXCLUDE)
install(FILES ${ROCPROF_WRAPPER_LIB_DIR}/${ROCPROFILER_LIBRARY}.so DESTINATION ${ROCPROFILER_NAME}/lib
COMPONENT dev)
#install tools directory
install(DIRECTORY ${ROCPROF_WRAPPER_TOOL_DIR} DESTINATION ${ROCPROFILER_NAME} COMPONENT runtime)
install(
DIRECTORY ${ROCPROF_WRAPPER_LIB_DIR}
DESTINATION ${ROCPROFILER_NAME}
COMPONENT runtime
PATTERN ${ROCPROFILER_LIBRARY}.so EXCLUDE)
install(
FILES ${ROCPROF_WRAPPER_LIB_DIR}/${ROCPROFILER_LIBRARY}.so
DESTINATION ${ROCPROFILER_NAME}/lib
COMPONENT dev)
# install tools directory
install(
DIRECTORY ${ROCPROF_WRAPPER_TOOL_DIR}
DESTINATION ${ROCPROFILER_NAME}
COMPONENT runtime)
@@ -1,15 +1,18 @@
include (CheckCSourceCompiles)
# ############################################################################################################################################
# ############################################################################################################################################
include(CheckCSourceCompiles)
# ########################################################################################
# ########################################################################################
# General Requirements
# ############################################################################################################################################
# ############################################################################################################################################
get_property(HSA_RUNTIME_INCLUDE_DIRECTORIES TARGET hsa-runtime64::hsa-runtime64 PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH
REQUIRED)
# ########################################################################################
# ########################################################################################
get_property(
HSA_RUNTIME_INCLUDE_DIRECTORIES
TARGET hsa-runtime64::hsa-runtime64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH REQUIRED)
get_filename_component(HSA_RUNTIME_INC_PATH ${HSA_H} DIRECTORY)
include_directories(${HSA_RUNTIME_INC_PATH})
@@ -22,138 +25,179 @@ set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${ROCM_PATH}/lib/cmake/hip")
set(CMAKE_HIP_ARCHITECTURES OFF)
find_package(HIP REQUIRED MODULE)
find_package(Clang REQUIRED CONFIG
PATHS "${ROCM_PATH}"
PATH_SUFFIXES "llvm/lib/cmake/clang")
find_package(
Clang REQUIRED CONFIG
PATHS "${ROCM_PATH}"
PATH_SUFFIXES "llvm/lib/cmake/clang")
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake/modules" "${ROCM_PATH}/lib/cmake/hip")
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake/modules"
"${ROCM_PATH}/lib/cmake/hip")
find_package(LibElf REQUIRED)
find_package(LibDw REQUIRED)
## Add custom targets to build and run all the tests
# Add custom targets to build and run all the tests
add_custom_target(samples)
add_dependencies(samples rocprofiler-v2)
add_custom_target(run-samples COMMAND ${PROJECT_BINARY_DIR}/samples/run_samples.sh DEPENDS samples)
add_custom_target(
run-samples
COMMAND ${PROJECT_BINARY_DIR}/samples/run_samples.sh
DEPENDS samples)
file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
# ############################################################################################################################################
# ########################################################################################
# ############################################################################################################################################
# ############################################################################################################################################
# ########################################################################################
# ########################################################################################
# Samples Build & Run Script
# ############################################################################################################################################
# ############################################################################################################################################
# ########################################################################################
# ########################################################################################
# ############################################################################################################################################
# ########################################################################################
# Profiler Samples
# ############################################################################################################################################
# ########################################################################################
## Build Kernel No Replay Sample
set_source_files_properties(profiler/kernel_profiling_no_replay_sample.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(profiler_kernel_no_replay profiler/kernel_profiling_no_replay_sample.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(profiler_kernel_no_replay PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
# Build Kernel No Replay Sample
set_source_files_properties(profiler/kernel_profiling_no_replay_sample.cpp
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(
profiler_kernel_no_replay profiler/kernel_profiling_no_replay_sample.cpp
${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(
profiler_kernel_no_replay PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(profiler_kernel_no_replay PRIVATE rocprofiler-v2 amd_comgr)
target_link_options(profiler_kernel_no_replay PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples profiler_kernel_no_replay)
install(TARGETS profiler_kernel_no_replay RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS profiler_kernel_no_replay
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)
## Build Device Profiling Sample
set_source_files_properties(profiler/device_profiling_sample.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(profiler_device_profiling profiler/device_profiling_sample.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(profiler_device_profiling PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
# Build Device Profiling Sample
set_source_files_properties(profiler/device_profiling_sample.cpp
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(profiler_device_profiling profiler/device_profiling_sample.cpp
${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(
profiler_device_profiling PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(profiler_device_profiling PRIVATE rocprofiler-v2 amd_comgr)
target_link_options(profiler_device_profiling PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples profiler_device_profiling)
install(TARGETS profiler_device_profiling RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS profiler_device_profiling
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)
## Build Counters Sampling example
set_source_files_properties(counters_sampler/pcie_counters_example.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(pcie_counters_sampler counters_sampler/pcie_counters_example.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(pcie_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
# Build Counters Sampling example
set_source_files_properties(counters_sampler/pcie_counters_example.cpp
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(pcie_counters_sampler counters_sampler/pcie_counters_example.cpp
${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(
pcie_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(pcie_counters_sampler PRIVATE rocprofiler-v2 systemd amd_comgr)
target_link_options(pcie_counters_sampler PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples pcie_counters_sampler)
install(TARGETS pcie_counters_sampler RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS pcie_counters_sampler
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)
## Build XGMI Counters Sampling example
set_source_files_properties(counters_sampler/xgmi_counters_sampler_example.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(xgmi_counters_sampler counters_sampler/xgmi_counters_sampler_example.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(xgmi_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
# Build XGMI Counters Sampling example
set_source_files_properties(counters_sampler/xgmi_counters_sampler_example.cpp
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(
xgmi_counters_sampler counters_sampler/xgmi_counters_sampler_example.cpp
${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(
xgmi_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(xgmi_counters_sampler PRIVATE rocprofiler-v2 systemd amd_comgr)
target_link_options(xgmi_counters_sampler PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples xgmi_counters_sampler)
install(TARGETS xgmi_counters_sampler RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS xgmi_counters_sampler
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)
# ################################################################################################################
# ########################################################################################
# ############################################################################################################################################
# ########################################################################################
# Tracer Samples
# ############################################################################################################################################
# ########################################################################################
## Build HIP/HSA Trace Sample
# Build HIP/HSA Trace Sample
set_source_files_properties(tracer/sample.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(tracer_hip_hsa tracer/sample.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(tracer_hip_hsa PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
target_include_directories(tracer_hip_hsa PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(tracer_hip_hsa PRIVATE rocprofiler-v2 amd_comgr)
target_link_options(tracer_hip_hsa PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples tracer_hip_hsa)
install(TARGETS tracer_hip_hsa RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS tracer_hip_hsa
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)
## Build HIP/HSA Trace with async output api trace data Sample
set_source_files_properties(tracer/sample_async.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(tracer_hip_hsa_async tracer/sample_async.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(tracer_hip_hsa_async PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
# Build HIP/HSA Trace with async output api trace data Sample
set_source_files_properties(tracer/sample_async.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT
1)
hip_add_executable(tracer_hip_hsa_async tracer/sample_async.cpp
${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(
tracer_hip_hsa_async PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(tracer_hip_hsa_async PRIVATE rocprofiler-v2 amd_comgr)
target_link_options(tracer_hip_hsa_async PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples tracer_hip_hsa_async)
install(TARGETS tracer_hip_hsa_async RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS tracer_hip_hsa_async
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)
# ############################################################################################################################################
# ########################################################################################
# PC Sampling Samples
# ############################################################################################################################################
# ########################################################################################
set(CODE_PRINTING_SAMPLE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/pcsampler/code_printing_sample)
file(GLOB PC_SAMPLING_CODE_PRINTING_FILES ${CODE_PRINTING_SAMPLE_DIR}/*.cpp)
set_source_files_properties(${PC_SAMPLING_CODE_PRINTING_FILES} PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(pc_sampling_code_printing ${PC_SAMPLING_CODE_PRINTING_FILES}
HIPCC_OPTIONS
-std=c++17
# Include debugging symbols and source for the contextual disassembly
-gdwarf-4)
set_source_files_properties(${PC_SAMPLING_CODE_PRINTING_FILES}
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(
pc_sampling_code_printing ${PC_SAMPLING_CODE_PRINTING_FILES} HIPCC_OPTIONS -std=c++17
# Include debugging symbols and source for the contextual disassembly
-gdwarf-4)
check_c_source_compiles("
check_c_source_compiles(
"
#define _GNU_SOURCE
#include <sys/mman.h>
int main() { return memfd_create (\"cmake_test\", 0); }
" HAVE_MEMFD_CREATE)
if (HAVE_MEMFD_CREATE)
target_compile_definitions(pc_sampling_code_printing PRIVATE HAVE_MEMFD_CREATE)
endif()
"
HAVE_MEMFD_CREATE)
if(HAVE_MEMFD_CREATE)
target_compile_definitions(pc_sampling_code_printing PRIVATE HAVE_MEMFD_CREATE)
endif()
target_link_libraries(pc_sampling_code_printing
PRIVATE
rocprofiler-v2
rocm-dbgapi
${LIBELF_LIBRARIES}
${LIBDW_LIBRARIES}
hsa-runtime64::hsa-runtime64 Threads::Threads dl)
target_include_directories(pc_sampling_code_printing
PRIVATE
${TEST_DIR}
${ROOT_DIR}
${HSA_RUNTIME_INC_PATH}
${PROJECT_SOURCE_DIR})
target_link_options(pc_sampling_code_printing PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples pc_sampling_code_printing)
install(TARGETS pc_sampling_code_printing RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
target_link_libraries(
pc_sampling_code_printing
PRIVATE rocprofiler-v2 rocm-dbgapi ${LIBELF_LIBRARIES} ${LIBDW_LIBRARIES}
hsa-runtime64::hsa-runtime64 Threads::Threads dl)
target_include_directories(
pc_sampling_code_printing PRIVATE ${TEST_DIR} ${ROOT_DIR} ${HSA_RUNTIME_INC_PATH}
${PROJECT_SOURCE_DIR})
target_link_options(pc_sampling_code_printing PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples pc_sampling_code_printing)
install(TARGETS pc_sampling_code_printing
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)
install(DIRECTORY "${PROJECT_SOURCE_DIR}/samples/" DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples-src OPTIONAL COMPONENT samples)
install(
DIRECTORY "${PROJECT_SOURCE_DIR}/samples/"
DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples-src
OPTIONAL
COMPONENT samples)
# ############################################################################################################################################
# ########################################################################################
# Scripts to run samples
# ############################################################################################################################################
# ########################################################################################
# Copy run_samples script to samples folder
configure_file(run_samples.sh ${PROJECT_BINARY_DIR}/samples COPYONLY)
# ############################################################################################################################################
# ########################################################################################
@@ -8,15 +8,14 @@ int main(int argc, char** argv) {
"CI_PERF_slv_MemRd_Bandwidth0", "CI_PERF_slv_MemWr_Bandwidth0", "CI_PERF_slv_totalMemRdTx",
"CI_PERF_slv_totalMemWrTx", "CI_PERF_slv_totalTx"};
if(argc > 1) {
if (argc > 1) {
counter_option = atoi(argv[1]);
}
else{
std::cout<< "Please provide one of the counter index options as argument:\n";
for(int i = 0; i < pcie_counters.size(); i++){
std::cout<< "[" << i << "]: " << pcie_counters[i] << std::endl;
} else {
std::cout << "Please provide one of the counter index options as argument:\n";
for (int i = 0; i < pcie_counters.size(); i++) {
std::cout << "[" << i << "]: " << pcie_counters[i] << std::endl;
}
std::cout<< "Example:\n ./pcie_counters_sampler 1\n";
std::cout << "Example:\n ./pcie_counters_sampler 1\n";
exit(0);
}
@@ -55,10 +54,10 @@ int main(int argc, char** argv) {
.sampling_rate = rate,
.sampling_duration = duration,
.gpu_agent_index = 0};
CHECK_ROCPROFILER(
rocprofiler_create_filter(session_id, ROCPROFILER_COUNTERS_SAMPLER,
rocprofiler_filter_data_t{.counters_sampler_parameters = cs_parameters},
0, &filter_id, property));
CHECK_ROCPROFILER(rocprofiler_create_filter(
session_id, ROCPROFILER_COUNTERS_SAMPLER,
rocprofiler_filter_data_t{.counters_sampler_parameters = cs_parameters}, 0, &filter_id,
property));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, filter_id, buffer_id));
// Normal HIP Calls
@@ -40,14 +40,14 @@ int main(int argc, char** argv) {
uint32_t duration = 5000;
rocprofiler_counters_sampler_parameters_t cs_parameters = {.counters = counters_input,
.counters_num = 1,
.sampling_rate = rate,
.sampling_duration = duration,
.gpu_agent_index = 0};
CHECK_ROCPROFILER(
rocprofiler_create_filter(session_id, ROCPROFILER_COUNTERS_SAMPLER,
rocprofiler_filter_data_t{.counters_sampler_parameters = cs_parameters},
0, &filter_id, property));
.counters_num = 1,
.sampling_rate = rate,
.sampling_duration = duration,
.gpu_agent_index = 0};
CHECK_ROCPROFILER(rocprofiler_create_filter(
session_id, ROCPROFILER_COUNTERS_SAMPLER,
rocprofiler_filter_data_t{.counters_sampler_parameters = cs_parameters}, 0, &filter_id,
property));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, filter_id, buffer_id));
// Normal HIP Calls
File diff suppressed because this file is too large.
@@ -31,96 +31,74 @@
namespace amd::debug_agent {
class code_object_t {
struct symbol_info_t {
const std::string m_name;
amd_dbgapi_global_address_t m_value;
amd_dbgapi_size_t m_size;
};
struct symbol_info_t {
const std::string m_name;
amd_dbgapi_global_address_t m_value;
amd_dbgapi_size_t m_size;
};
using symbol_map_t =
std::optional
< std::map
< amd_dbgapi_global_address_t
, std::pair<std::string, amd_dbgapi_size_t>
>
>;
using symbol_map_t = std::optional<
std::map<amd_dbgapi_global_address_t, std::pair<std::string, amd_dbgapi_size_t>>>;
public:
void load_symbol_map();
void load_debug_info();
public:
void load_symbol_map();
void load_debug_info();
std::optional<symbol_info_t>
find_symbol(amd_dbgapi_global_address_t address);
std::optional<symbol_info_t> find_symbol(amd_dbgapi_global_address_t address);
code_object_t(amd_dbgapi_code_object_id_t code_object_id);
code_object_t(code_object_t &&rhs);
code_object_t(amd_dbgapi_code_object_id_t code_object_id);
code_object_t(code_object_t&& rhs);
~code_object_t();
~code_object_t();
void open();
bool is_open() const { return m_fd.has_value(); }
void open();
bool is_open() const { return m_fd.has_value(); }
amd_dbgapi_global_address_t load_address() const { return m_load_address; }
amd_dbgapi_size_t mem_size() const { return m_mem_size; }
// FIXME(?): extra function not in rocr-debug-agent
uint32_t elf_amdgpu_machine() const { return m_elf_amdgpu_machine; }
amd_dbgapi_global_address_t load_address() const { return m_load_address; }
amd_dbgapi_size_t mem_size() const { return m_mem_size; }
// FIXME(?): extra function not in rocr-debug-agent
uint32_t elf_amdgpu_machine() const { return m_elf_amdgpu_machine; }
void disassemble_around(amd_dbgapi_architecture_id_t architecture_id,
amd_dbgapi_global_address_t pc);
void disassemble_around(amd_dbgapi_architecture_id_t architecture_id,
amd_dbgapi_global_address_t pc);
void disassemble_kernel(amd_dbgapi_architecture_id_t architecture_id,
amd_dbgapi_global_address_t start_addr,
bool const print_src = false);
void disassemble_kernel(amd_dbgapi_architecture_id_t architecture_id,
amd_dbgapi_global_address_t start_addr, bool const print_src = false);
bool save(const std::string &directory) const;
bool save(const std::string& directory) const;
amd_dbgapi_global_address_t m_load_address{ 0 };
amd_dbgapi_size_t m_mem_size{ 0 };
std::optional<int> m_fd;
amd_dbgapi_global_address_t m_load_address{0};
amd_dbgapi_size_t m_mem_size{0};
std::optional<int> m_fd;
std::optional
< std::map<amd_dbgapi_global_address_t, std::pair<std::string, size_t>>
>
m_line_number_map;
std::optional<std::map<amd_dbgapi_global_address_t, std::pair<std::string, size_t>>>
m_line_number_map;
std::optional
< std::map<amd_dbgapi_global_address_t, amd_dbgapi_global_address_t>
>
m_pc_ranges_map;
std::optional<std::map<amd_dbgapi_global_address_t, amd_dbgapi_global_address_t>> m_pc_ranges_map;
symbol_map_t m_symbol_map;
std::string m_uri;
amd_dbgapi_code_object_id_t const m_code_object_id;
// FIXME(?): extra field not in rocr-debug-agent
uint32_t m_elf_amdgpu_machine{ 0 };
symbol_map_t m_symbol_map;
std::string m_uri;
amd_dbgapi_code_object_id_t const m_code_object_id;
// FIXME(?): extra field not in rocr-debug-agent
uint32_t m_elf_amdgpu_machine{0};
};
} // namespace amd::debug_agent
} // namespace amd::debug_agent
enum struct disassembly_mode {
AROUND,
KERNEL
};
enum struct disassembly_mode { AROUND, KERNEL };
std::tuple
< amd_dbgapi_process_id_t
, std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>
>
std::tuple<amd_dbgapi_process_id_t,
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>>
init_disassembly();
void
disassemble(
disassembly_mode const mode,
amd_dbgapi_process_id_t const process_id,
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>
&code_object_map,
uint64_t const addr);
void disassemble(
disassembly_mode const mode, amd_dbgapi_process_id_t const process_id,
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>& code_object_map,
uint64_t const addr);
void
print_pc_context(
amd_dbgapi_process_id_t const process_id,
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>
&code_object_map,
amd_dbgapi_global_address_t const pc);
void print_pc_context(
amd_dbgapi_process_id_t const process_id,
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>& code_object_map,
amd_dbgapi_global_address_t const pc);
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_CODE_PRINTING_HPP_
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_CODE_PRINTING_HPP_
@@ -47,169 +47,130 @@
#include "program.hpp"
struct libc_freer {
void operator()(char *p) { free(p); }
void operator()(char* p) { free(p); }
};
namespace util {
template <typename T, typename... Ts>
static void
hash_combine(size_t &hsh, T const& v, Ts const&... rest)
{
hsh ^= std::hash<T>{}(v) + 0x9e3779b9 + (hsh << 6) + (hsh >> 2);
(hash_combine(hsh, rest), ...);
static void hash_combine(size_t& hsh, T const& v, Ts const&... rest) {
hsh ^= std::hash<T>{}(v) + 0x9e3779b9 + (hsh << 6) + (hsh >> 2);
(hash_combine(hsh, rest), ...);
}
} // namespace util
} // namespace util
[[maybe_unused]]
static inline bool
operator==(hsa_executable_t const &l, hsa_executable_t const &r)
{
return l.handle == r.handle;
[[maybe_unused]] static inline bool operator==(hsa_executable_t const& l,
hsa_executable_t const& r) {
return l.handle == r.handle;
}
[[maybe_unused]]
static inline bool
operator==(
rocprofiler_kernel_dispatch_id_t const &l,
rocprofiler_kernel_dispatch_id_t const &r)
{
return l.value == r.value;
[[maybe_unused]] static inline bool operator==(rocprofiler_kernel_dispatch_id_t const& l,
rocprofiler_kernel_dispatch_id_t const& r) {
return l.value == r.value;
}
static inline bool
operator==(amd_dbgapi_process_id_t const &l, amd_dbgapi_process_id_t const &r)
{
return l.handle == r.handle;
static inline bool operator==(amd_dbgapi_process_id_t const& l, amd_dbgapi_process_id_t const& r) {
return l.handle == r.handle;
}
static inline bool
operator!=(amd_dbgapi_process_id_t const &l, amd_dbgapi_process_id_t const &r)
{
return !(l == r);
static inline bool operator!=(amd_dbgapi_process_id_t const& l, amd_dbgapi_process_id_t const& r) {
return !(l == r);
}
namespace std {
template <>
struct hash<hsa_executable_t> {
size_t operator()(hsa_executable_t const &v) const {
size_t ret = 0;
util::hash_combine(ret, v.handle);
return ret;
}
template <> struct hash<hsa_executable_t> {
size_t operator()(hsa_executable_t const& v) const {
size_t ret = 0;
util::hash_combine(ret, v.handle);
return ret;
}
};
template <>
struct hash<rocprofiler_kernel_dispatch_id_t> {
size_t operator()(rocprofiler_kernel_dispatch_id_t const &v) const {
size_t ret = 0;
util::hash_combine(ret, v.value);
return ret;
}
template <> struct hash<rocprofiler_kernel_dispatch_id_t> {
size_t operator()(rocprofiler_kernel_dispatch_id_t const& v) const {
size_t ret = 0;
util::hash_combine(ret, v.value);
return ret;
}
};
} // namespace std
} // namespace std
struct disassembly_ctx_t {
disassembly_ctx_t();
~disassembly_ctx_t();
disassembly_ctx_t();
~disassembly_ctx_t();
void disassemble_kernels(bool const reinitialize);
void init();
bool inited() const;
void reset();
void disassemble_kernels(bool const reinitialize);
void init();
bool inited() const;
void reset();
amd_dbgapi_process_id_t process_id;
std::map
< amd_dbgapi_global_address_t
, amd::debug_agent::code_object_t
> codeobjs;
amd_dbgapi_process_id_t process_id;
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t> codeobjs;
};
disassembly_ctx_t::disassembly_ctx_t()
: process_id(AMD_DBGAPI_PROCESS_NONE)
, codeobjs()
{}
disassembly_ctx_t::disassembly_ctx_t() : process_id(AMD_DBGAPI_PROCESS_NONE), codeobjs() {}
disassembly_ctx_t::~disassembly_ctx_t()
{
disassembly_ctx_t::~disassembly_ctx_t() { reset(); }
void disassembly_ctx_t::disassemble_kernels(bool const reinitialize) {
if (reinitialize) {
reset();
}
}
if (!inited()) {
init();
}
void
disassembly_ctx_t::disassemble_kernels(bool const reinitialize)
{
if (reinitialize) {
reset();
}
if (!inited()) {
init();
auto it = codeobjs.begin();
auto const end = codeobjs.end();
auto const pred = [](decltype(*it)& x) {
/*
* A lame filter for the kernels in the current file, because nothing
* else in this little demo will have the URL prefix of `file://`.
*/
return x.second.m_uri.find("file://", 0, 7) != std::string::npos;
};
while (end != (it = std::find_if(it, end, pred))) {
auto& codeobj = it->second;
codeobj.load_symbol_map();
if (!codeobj.m_symbol_map) {
fputs(PROGNAME ": error: failed to load symbol map\n", stderr);
break;
}
auto it = codeobjs.begin();
auto const end = codeobjs.end();
auto const pred = [](decltype(*it) &x){
/*
* A lame filter for the kernels in the current file, because nothing
* else in this little demo will have the URL prefix of `file://`.
*/
return x.second.m_uri.find("file://", 0, 7) != std::string::npos;
};
while (end != (it = std::find_if(it, end, pred))) {
auto &codeobj = it->second;
codeobj.load_symbol_map();
if (!codeobj.m_symbol_map) {
fputs(PROGNAME ": error: failed to load symbol map\n", stderr);
break;
}
for (auto const &sym : *codeobj.m_symbol_map) {
auto const &addr = sym.first;
::disassemble(disassembly_mode::KERNEL, process_id, codeobjs, addr);
}
++it;
for (auto const& sym : *codeobj.m_symbol_map) {
auto const& addr = sym.first;
::disassemble(disassembly_mode::KERNEL, process_id, codeobjs, addr);
}
++it;
}
}
inline void
disassembly_ctx_t::init()
{
std::tie(process_id, codeobjs) = init_disassembly();
}
inline void disassembly_ctx_t::init() { std::tie(process_id, codeobjs) = init_disassembly(); }
inline bool
disassembly_ctx_t::inited() const
{
return AMD_DBGAPI_PROCESS_NONE != process_id;
}
inline bool disassembly_ctx_t::inited() const { return AMD_DBGAPI_PROCESS_NONE != process_id; }
void
disassembly_ctx_t::reset()
{
codeobjs.clear();
if (AMD_DBGAPI_PROCESS_NONE.handle != process_id.handle) {
amd_dbgapi_process_detach(process_id);
amd_dbgapi_finalize();
process_id = AMD_DBGAPI_PROCESS_NONE;
}
void disassembly_ctx_t::reset() {
codeobjs.clear();
if (AMD_DBGAPI_PROCESS_NONE.handle != process_id.handle) {
amd_dbgapi_process_detach(process_id);
amd_dbgapi_finalize();
process_id = AMD_DBGAPI_PROCESS_NONE;
}
}
static disassembly_ctx_t g_dis;
void
disassembly_disassemble_kernels(bool const reinitialize)
{
g_dis.disassemble_kernels(reinitialize);
void disassembly_disassemble_kernels(bool const reinitialize) {
g_dis.disassemble_kernels(reinitialize);
}
void
disassembly_print_pc_sample_context(amd_dbgapi_global_address_t const pc)
{
if (!g_dis.inited()) {
g_dis.init();
}
print_pc_context(g_dis.process_id, g_dis.codeobjs, pc);
void disassembly_print_pc_sample_context(amd_dbgapi_global_address_t const pc) {
if (!g_dis.inited()) {
g_dis.init();
}
print_pc_context(g_dis.process_id, g_dis.codeobjs, pc);
}
@@ -23,10 +23,8 @@
#include <amd-dbgapi/amd-dbgapi.h>
void
disassembly_disassemble_kernels(bool const);
void disassembly_disassemble_kernels(bool const);
void
disassembly_print_pc_sample_context(amd_dbgapi_global_address_t const);
void disassembly_print_pc_sample_context(amd_dbgapi_global_address_t const);
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_DISASSEMBLY_HPP_
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_DISASSEMBLY_HPP_
@@ -46,274 +46,227 @@
namespace util {
struct hipMalloc_freer {
void operator()(void * const ptr) { (void)hipFree(ptr); }
void operator()(void* const ptr) { (void)hipFree(ptr); }
};
} // namespace util
} // namespace util
namespace prng {
static uint64_t
splitmix64_next(uint64_t * const sm64_state)
{
uint64_t z = (*sm64_state += 0x9e3779b97f4a7c15);
z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9;
z = (z ^ (z >> 27)) * 0x94d049bb133111eb;
return z ^ (z >> 31);
static uint64_t splitmix64_next(uint64_t* const sm64_state) {
uint64_t z = (*sm64_state += 0x9e3779b97f4a7c15);
z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9;
z = (z ^ (z >> 27)) * 0x94d049bb133111eb;
return z ^ (z >> 31);
}
static inline uint64_t
rotl64(const uint64_t x, int k)
{
return (x << k) | (x >> (64 - k));
static inline uint64_t rotl64(const uint64_t x, int k) { return (x << k) | (x >> (64 - k)); }
static uint64_t xrs_next(uint64_t* const xrs_state) {
const uint64_t result = rotl64(xrs_state[0] + xrs_state[3], 23) + xrs_state[0];
const uint64_t t = xrs_state[1] << 17;
xrs_state[2] ^= xrs_state[0];
xrs_state[3] ^= xrs_state[1];
xrs_state[1] ^= xrs_state[2];
xrs_state[0] ^= xrs_state[3];
xrs_state[2] ^= t;
xrs_state[3] = rotl64(xrs_state[3], 45);
return result;
}
static uint64_t
xrs_next(uint64_t * const xrs_state)
{
const uint64_t result =
rotl64(xrs_state[0] + xrs_state[3], 23) + xrs_state[0];
const uint64_t t = xrs_state[1] << 17;
xrs_state[2] ^= xrs_state[0];
xrs_state[3] ^= xrs_state[1];
xrs_state[1] ^= xrs_state[2];
xrs_state[0] ^= xrs_state[3];
xrs_state[2] ^= t;
xrs_state[3] = rotl64(xrs_state[3], 45);
return result;
}
} // namespace prng
} // namespace prng
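Review note on the `prng` helpers above: the update in `xrs_next` (rotations of 23 and 45, shift of 17, four 64-bit state words) matches the published xoshiro256++ step, and `splitmix64_next` is used only to expand one seed into those four words. A minimal standalone sketch of that seeding scheme follows; the `sketch_*` names are illustrative and not part of the sample.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative names (sketch_*); not part of the sample.
// SplitMix64 step: advances the state and returns a mixed 64-bit value.
static uint64_t sketch_splitmix64(uint64_t& state) {
  uint64_t z = (state += 0x9e3779b97f4a7c15ULL);
  z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
  z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
  return z ^ (z >> 31);
}

// Rotate left. Only valid for 0 < k < 64, because k == 0 would shift
// right by 64 bits, which is undefined behaviour.
static uint64_t sketch_rotl64(uint64_t x, int k) {
  assert(k > 0 && k < 64);
  return (x << k) | (x >> (64 - k));
}

struct sketch_rng {
  uint64_t s[4];

  // Expand a single seed into the four state words via SplitMix64.
  explicit sketch_rng(uint64_t seed) {
    for (auto& word : s) word = sketch_splitmix64(seed);
  }

  // Same update order as xrs_next in the diff.
  uint64_t next() {
    uint64_t const result = sketch_rotl64(s[0] + s[3], 23) + s[0];
    uint64_t const t = s[1] << 17;
    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = sketch_rotl64(s[3], 45);
    return result;
  }
};
```

Two generators built from the same seed produce the same sequence, which is why the sample prints `seed` before running.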
namespace kernel {
template <typename T>
__global__ static void
memset_gpu(T * const s, T const c, size_t const n)
{
size_t i_start = threadIdx.x + blockIdx.x * blockDim.x;
size_t i_shift = blockDim.x * gridDim.x;
for (size_t i = i_start; i < n; i += i_shift) {
s[i] = c;
}
template <typename T> __global__ static void memset_gpu(T* const s, T const c, size_t const n) {
size_t i_start = threadIdx.x + blockIdx.x * blockDim.x;
size_t i_shift = blockDim.x * gridDim.x;
for (size_t i = i_start; i < n; i += i_shift) {
s[i] = c;
}
}
template <typename T>
__global__ static void
count_gpu(
T const * const xs,
T * const out,
size_t const n,
size_t const nblocks,
T const gt)
{
size_t i_start = threadIdx.x + blockIdx.x * blockDim.x;
size_t i_shift = blockDim.x * gridDim.x;
for (size_t i = i_start; i < n; i += i_shift) {
if (xs[i] > gt) {
atomicAdd(&out[i % nblocks], 1);
}
__global__ static void count_gpu(T const* const xs, T* const out, size_t const n,
size_t const nblocks, T const gt) {
size_t i_start = threadIdx.x + blockIdx.x * blockDim.x;
size_t i_shift = blockDim.x * gridDim.x;
for (size_t i = i_start; i < n; i += i_shift) {
if (xs[i] > gt) {
atomicAdd(&out[i % nblocks], 1);
}
}
}
} // namespace kernel
} // namespace kernel
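Review note on `count_gpu`: each thread walks the input with a grid-stride loop and bumps one of `nblocks` bins chosen by `i % nblocks`; the host later sums the bins. The host-only sketch below emulates that access pattern so it can be checked without a GPU. It assumes `nblocks > 0`, replaces `atomicAdd` with a plain increment, and uses hypothetical `sketch_*` names.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Host-side emulation of the grid-stride loop in count_gpu.
// Assumes nblocks > 0 and nthreads > 0.
static std::vector<size_t> sketch_count_subtotals(std::vector<uint64_t> const& xs,
                                                  uint64_t const gt, size_t const nblocks,
                                                  size_t const nthreads) {
  std::vector<size_t> out(nblocks, 0);
  size_t const stride = nblocks * nthreads;  // blockDim.x * gridDim.x
  // tid plays the role of threadIdx.x + blockIdx.x * blockDim.x.
  for (size_t tid = 0; tid < stride; ++tid) {
    for (size_t i = tid; i < xs.size(); i += stride) {
      if (xs[i] > gt) {
        ++out[i % nblocks];  // stands in for atomicAdd on the device
      }
    }
  }
  return out;
}

// Final reduction, as done on the host with std::accumulate in run_kernel.
static size_t sketch_total(std::vector<size_t> const& subtotals) {
  return std::accumulate(subtotals.cbegin(), subtotals.cend(), static_cast<size_t>(0));
}
```

Because every index is visited exactly once across all simulated threads, the sum of the bins equals the number of elements greater than `gt`, regardless of how the grid is sized.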
static char const GETOPT_ARGS[] = "cd:mn:DP";
static void
usage()
{
fputs("usage: " PROGNAME " [OPTION]... MIN [SEED]\n"
" -d DEV\tHIP device number\n"
" -n LEN\tLength of random integer array\n"
" -D\t\tPrint kernel disassembly\n"
" -P\t\tPrint source and disassembly of sampled PC locations\n"
"where\n"
" DEV : i32\n"
" MIN : u64\n"
" LEN : u64\n"
" SEED : u64\n",
stderr);
static void usage() {
fputs("usage: " PROGNAME
" [OPTION]... MIN [SEED]\n"
" -d DEV\tHIP device number\n"
" -n LEN\tLength of random integer array\n"
" -D\t\tPrint kernel disassembly\n"
" -P\t\tPrint source and disassembly of sampled PC locations\n"
"where\n"
" DEV : i32\n"
" MIN : u64\n"
" LEN : u64\n"
" SEED : u64\n",
stderr);
}
static int
get_options(int argc, char **argv, program_options * const opts)
{
int opt;
static int get_options(int argc, char** argv, program_options* const opts) {
int opt;
while (-1 != (opt = getopt(argc, argv, GETOPT_ARGS))) {
switch (opt) {
case 'd':
// TODO error checking
opts->device = strtol(optarg, nullptr, 10);
break;
case 'n':
// TODO error checking
opts->rands_len = strtoul(optarg, nullptr, 10);
break;
case 'D':
opts->disassemble = true;
break;
case 'P':
opts->pc_sampling = true;
break;
default:
usage();
return EXIT_FAILURE;
}
}
auto const optcount = argc - optind;
if (!(1 == optcount || 2 == optcount)) {
while (-1 != (opt = getopt(argc, argv, GETOPT_ARGS))) {
switch (opt) {
case 'd':
// TODO error checking
opts->device = strtol(optarg, nullptr, 10);
break;
case 'n':
// TODO error checking
opts->rands_len = strtoul(optarg, nullptr, 10);
break;
case 'D':
opts->disassemble = true;
break;
case 'P':
opts->pc_sampling = true;
break;
default:
usage();
return EXIT_FAILURE;
}
}
// TODO error checking
opts->gt = strtoul(argv[optind], nullptr, 10);
if (2 == argc - optind) {
opts->seed = strtoull(argv[optind + 1], nullptr, 10);
}
auto const optcount = argc - optind;
if (!(1 == optcount || 2 == optcount)) {
usage();
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
// TODO error checking
opts->gt = strtoul(argv[optind], nullptr, 10);
if (2 == argc - optind) {
opts->seed = strtoull(argv[optind + 1], nullptr, 10);
}
return EXIT_SUCCESS;
}
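Review note on the `// TODO error checking` markers in `get_options`: `strtol`, `strtoul`, and `strtoull` report failures only through `errno` and the end pointer, so the current calls accept input such as `12x` or an empty string without complaint. One possible shape for a checked parse is sketched below; `sketch_parse_u64` is a hypothetical helper, not something this commit adds.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper, not part of the sample.
// Parses a base-10 unsigned integer; returns false on any malformed input.
static bool sketch_parse_u64(char const* const text, uint64_t* const out) {
  if (nullptr == text || nullptr == out || '\0' == *text) {
    return false;
  }
  // strtoull accepts a leading '-' and negates the result modulo 2^64,
  // so reject any minus sign up front.
  for (char const* p = text; '\0' != *p; ++p) {
    if ('-' == *p) {
      return false;
    }
  }
  errno = 0;
  char* end = nullptr;
  unsigned long long const value = strtoull(text, &end, 10);
  if (end == text || '\0' != *end) {
    return false;  // no digits consumed, or trailing garbage
  }
  if (ERANGE == errno) {
    return false;  // value does not fit in unsigned long long
  }
  *out = static_cast<uint64_t>(value);
  return true;
}
```

A caller could then print `usage()` and return `EXIT_FAILURE` when the helper returns false, matching how the `default:` branch already behaves.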
static program_options g_opts;
static void
callback_flush_fn(
rocprofiler_record_header_t const *record,
rocprofiler_record_header_t const *end_record,
rocprofiler_session_id_t session_id,
rocprofiler_buffer_id_t buffer_id)
{
while (record < end_record) {
if (nullptr == record) {
break;
}
if (ROCPROFILER_PC_SAMPLING_RECORD == record->kind) {
auto const &pcr = (rocprofiler_record_pc_sample_t &)*record;
printf(
"dispatch[%" PRIu64 "] timestamp(%" PRIu64
") gpu_id(%#" PRIx64 ") pc-sample(%#" PRIx64
") se(%" PRIu32 ")\n",
pcr.pc_sample.dispatch_id.value,
pcr.pc_sample.timestamp.value,
pcr.pc_sample.gpu_id.handle,
pcr.pc_sample.pc,
pcr.pc_sample.se);
if (g_opts.pc_sampling) {
disassembly_print_pc_sample_context(pcr.pc_sample.pc);
}
}
rocprofiler_next_record(record, &record, session_id, buffer_id);
static void callback_flush_fn(rocprofiler_record_header_t const* record,
rocprofiler_record_header_t const* end_record,
rocprofiler_session_id_t session_id,
rocprofiler_buffer_id_t buffer_id) {
while (record < end_record) {
if (nullptr == record) {
break;
}
if (ROCPROFILER_PC_SAMPLING_RECORD == record->kind) {
auto const& pcr = (rocprofiler_record_pc_sample_t&)*record;
printf("dispatch[%" PRIu64 "] timestamp(%" PRIu64 ") gpu_id(%#" PRIx64 ") pc-sample(%#" PRIx64
") se(%" PRIu32 ")\n",
pcr.pc_sample.dispatch_id.value, pcr.pc_sample.timestamp.value,
pcr.pc_sample.gpu_id.handle, pcr.pc_sample.pc, pcr.pc_sample.se);
if (g_opts.pc_sampling) {
disassembly_print_pc_sample_context(pcr.pc_sample.pc);
}
}
rocprofiler_next_record(record, &record, session_id, buffer_id);
}
}
static int
run_kernel(program_options const &opts)
{
rocprofiler_session_id_t sid;
rocprofiler_filter_id_t fid, fid2;
rocprofiler_buffer_id_t bid;
auto rocprofiler_ok = ROCPROFILER_STATUS_SUCCESS;
static int run_kernel(program_options const& opts) {
rocprofiler_session_id_t sid;
rocprofiler_filter_id_t fid, fid2;
rocprofiler_buffer_id_t bid;
auto rocprofiler_ok = ROCPROFILER_STATUS_SUCCESS;
if (opts.pc_sampling) {
ROCPROFILER_CHECK(
rocprofiler_create_session(ROCPROFILER_NONE_REPLAY_MODE, &sid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
fputs("error: failed to create rocprofiler session\n", stderr);
return EXIT_FAILURE;
}
rocprofiler_filter_property_t property{};
ROCPROFILER_CHECK(
rocprofiler_create_buffer(
sid, callback_flush_fn, static_cast<size_t>(0x1000), &bid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
fputs("error: failed to add PC sampling session mode\n", stderr);
goto out;
}
ROCPROFILER_CHECK(
rocprofiler_create_filter(
sid, ROCPROFILER_PC_SAMPLING_COLLECTION,
rocprofiler_filter_data_t{},
0, &fid, property),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
ROCPROFILER_CHECK(
rocprofiler_create_filter(
sid, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION,
rocprofiler_filter_data_t{},
0, &fid2, property),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
ROCPROFILER_CHECK(
rocprofiler_set_filter_buffer(sid, fid, bid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
ROCPROFILER_CHECK(
rocprofiler_set_filter_buffer(sid, fid2, bid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
ROCPROFILER_CHECK(
rocprofiler_start_session(sid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
if (opts.pc_sampling) {
ROCPROFILER_CHECK(rocprofiler_create_session(ROCPROFILER_NONE_REPLAY_MODE, &sid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
fputs("error: failed to create rocprofiler session\n", stderr);
return EXIT_FAILURE;
}
{
rocprofiler_filter_property_t property{};
ROCPROFILER_CHECK(
rocprofiler_create_buffer(sid, callback_flush_fn, static_cast<size_t>(0x1000), &bid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
fputs("error: failed to add PC sampling session mode\n", stderr);
goto out;
}
ROCPROFILER_CHECK(rocprofiler_create_filter(sid, ROCPROFILER_PC_SAMPLING_COLLECTION,
rocprofiler_filter_data_t{}, 0, &fid, property),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
ROCPROFILER_CHECK(rocprofiler_create_filter(sid, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION,
rocprofiler_filter_data_t{}, 0, &fid2, property),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
ROCPROFILER_CHECK(rocprofiler_set_filter_buffer(sid, fid, bid), rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
ROCPROFILER_CHECK(rocprofiler_set_filter_buffer(sid, fid2, bid), rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
ROCPROFILER_CHECK(rocprofiler_start_session(sid), rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
}
{
printf("seed = %" PRIu64 "\n", opts.seed);
std::vector<uint64_t> rands(opts.rands_len);
using rands_elt_t = decltype(rands)::value_type;
uint64_t
sm64_state = opts.seed,
xrs_state[4];
uint64_t sm64_state = opts.seed, xrs_state[4];
{
using prng::splitmix64_next;
using prng::xrs_next;
using prng::splitmix64_next;
using prng::xrs_next;
// Initialize the Xoroshiro PRNG
xrs_state[0] = splitmix64_next(&sm64_state);
xrs_state[1] = splitmix64_next(&sm64_state);
xrs_state[2] = splitmix64_next(&sm64_state);
xrs_state[3] = splitmix64_next(&sm64_state);
// Initialize the Xoroshiro PRNG
xrs_state[0] = splitmix64_next(&sm64_state);
xrs_state[1] = splitmix64_next(&sm64_state);
xrs_state[2] = splitmix64_next(&sm64_state);
xrs_state[3] = splitmix64_next(&sm64_state);
// Fill rands with random integers
for (auto &i : rands) {
i = xrs_next(xrs_state);
}
// Fill rands with random integers
for (auto& i : rands) {
i = xrs_next(xrs_state);
}
}
struct tm {
using monoclk = std::chrono::steady_clock;
using dur = std::chrono::duration<double>;
using monoclk = std::chrono::steady_clock;
using dur = std::chrono::duration<double>;
};
using util::hipMalloc_freer;
@@ -322,126 +275,109 @@ run_kernel(program_options const &opts)
auto hip_ok = hipSuccess;
do {
HIP_CHECK_BREAK(hipSetDevice(opts.device), hip_ok);
HIP_CHECK_BREAK(hipSetDevice(opts.device), hip_ok);
auto const rands_nbytes = rands.size() * sizeof(rands_elt_t);
std::unique_ptr<rands_elt_t, hipMalloc_freer> rands_gpu;
{
rands_elt_t *rands_gpu_ptr;
HIP_CHECK_BREAK(hipMalloc(&rands_gpu_ptr, rands_nbytes), hip_ok);
rands_gpu.reset(rands_gpu_ptr);
}
auto const rands_nbytes = rands.size() * sizeof(rands_elt_t);
std::unique_ptr<rands_elt_t, hipMalloc_freer> rands_gpu;
{
rands_elt_t* rands_gpu_ptr;
HIP_CHECK_BREAK(hipMalloc(&rands_gpu_ptr, rands_nbytes), hip_ok);
rands_gpu.reset(rands_gpu_ptr);
}
HIP_CHECK_BREAK(
hipMemcpy(rands_gpu.get(), rands.data(), rands_nbytes,
hipMemcpyHostToDevice),
hip_ok);
(void)hipDeviceSynchronize();
HIP_CHECK_BREAK(hipMemcpy(rands_gpu.get(), rands.data(), rands_nbytes, hipMemcpyHostToDevice),
hip_ok);
(void)hipDeviceSynchronize();
uint32_t constexpr nthreads = 256U;
uint32_t const nblocks = (rands.size() + nthreads - 1) / nthreads;
uint32_t constexpr nthreads = 256U;
uint32_t const nblocks = (rands.size() + nthreads - 1) / nthreads;
using count_elt_t = size_t;
using count_elt_t = size_t;
auto const count_subtotals_nbytes = nblocks * sizeof(count_elt_t);
std::unique_ptr<count_elt_t, hipMalloc_freer> count_subtotals_gpu;
{
count_elt_t *count_subtotals_gpu_ptr;
HIP_CHECK_BREAK(
hipMalloc(&count_subtotals_gpu_ptr, count_subtotals_nbytes),
hip_ok);
count_subtotals_gpu.reset(count_subtotals_gpu_ptr);
}
auto const count_subtotals_nbytes = nblocks * sizeof(count_elt_t);
std::unique_ptr<count_elt_t, hipMalloc_freer> count_subtotals_gpu;
{
count_elt_t* count_subtotals_gpu_ptr;
HIP_CHECK_BREAK(hipMalloc(&count_subtotals_gpu_ptr, count_subtotals_nbytes), hip_ok);
count_subtotals_gpu.reset(count_subtotals_gpu_ptr);
}
hipLaunchKernelGGL(
kernel::memset_gpu, nblocks, nthreads, 0, 0,
count_subtotals_gpu.get(), 0UL, static_cast<size_t>(nblocks));
HIP_CHECK_BREAK(hipGetLastError(), hip_ok);
(void)hipDeviceSynchronize();
hipLaunchKernelGGL(kernel::memset_gpu, nblocks, nthreads, 0, 0, count_subtotals_gpu.get(),
0UL, static_cast<size_t>(nblocks));
HIP_CHECK_BREAK(hipGetLastError(), hip_ok);
(void)hipDeviceSynchronize();
auto const kernel_begin_time = tm::monoclk::now();
auto const kernel_begin_time = tm::monoclk::now();
hipLaunchKernelGGL(
kernel::count_gpu, nblocks, nthreads, 0, 0,
rands_gpu.get(), count_subtotals_gpu.get(), rands.size(),
static_cast<size_t>(nblocks), opts.gt);
HIP_CHECK_BREAK(hipGetLastError(), hip_ok);
(void)hipDeviceSynchronize();
hipLaunchKernelGGL(kernel::count_gpu, nblocks, nthreads, 0, 0, rands_gpu.get(),
count_subtotals_gpu.get(), rands.size(), static_cast<size_t>(nblocks),
opts.gt);
HIP_CHECK_BREAK(hipGetLastError(), hip_ok);
(void)hipDeviceSynchronize();
auto const kernel_end_time = tm::monoclk::now();
auto const kernel_end_time = tm::monoclk::now();
std::vector<size_t> count_subtotals(nblocks);
HIP_CHECK_BREAK(
hipMemcpy(count_subtotals.data(), count_subtotals_gpu.get(),
count_subtotals_nbytes, hipMemcpyDeviceToHost),
hip_ok);
(void)hipDeviceSynchronize();
std::vector<size_t> count_subtotals(nblocks);
HIP_CHECK_BREAK(hipMemcpy(count_subtotals.data(), count_subtotals_gpu.get(),
count_subtotals_nbytes, hipMemcpyDeviceToHost),
hip_ok);
(void)hipDeviceSynchronize();
// TODO parallel sum on GPU
auto const total =
std::accumulate(
count_subtotals.cbegin(), count_subtotals.cend(),
static_cast<size_t>(0));
// TODO parallel sum on GPU
auto const total =
std::accumulate(count_subtotals.cbegin(), count_subtotals.cend(), static_cast<size_t>(0));
auto const all_end_time = tm::monoclk::now();
auto const all_end_time = tm::monoclk::now();
tm::dur const kernel_time(kernel_end_time - kernel_begin_time);
auto total_time(all_end_time - begin_time);
tm::dur const total_time_without_tool_init(total_time);
printf("len(rands) = %zu; gt = %zu; count(rands, gt) = %zu\n"
"main kernel time elapsed: %" DBL_FMT "\n"
"full time elapsed: %" DBL_FMT "\n",
rands.size(), opts.gt, total,
kernel_time.count(),
total_time_without_tool_init.count());
tm::dur const kernel_time(kernel_end_time - kernel_begin_time);
auto total_time(all_end_time - begin_time);
tm::dur const total_time_without_tool_init(total_time);
printf(
"len(rands) = %zu; gt = %zu; count(rands, gt) = %zu\n"
"main kernel time elapsed: %" DBL_FMT
"\n"
"full time elapsed: %" DBL_FMT "\n",
rands.size(), opts.gt, total, kernel_time.count(), total_time_without_tool_init.count());
} while (false);
if (opts.disassemble) {
disassembly_disassemble_kernels(false);
}
disassembly_disassemble_kernels(false);
}
}
cleanup:
if (opts.pc_sampling) {
rocprofiler_terminate_session(sid);
rocprofiler_flush_data(sid, bid);
rocprofiler_destroy_session(sid);
}
if (opts.pc_sampling) {
rocprofiler_terminate_session(sid);
rocprofiler_flush_data(sid, bid);
rocprofiler_destroy_session(sid);
}
out:
return ROCPROFILER_STATUS_SUCCESS == rocprofiler_ok
? EXIT_SUCCESS
: EXIT_FAILURE;
return ROCPROFILER_STATUS_SUCCESS == rocprofiler_ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
int
main(int argc, char **argv)
{
if (auto const ret = get_options(argc, argv, &g_opts);
EXIT_SUCCESS != ret)
{
return ret;
}
int main(int argc, char** argv) {
if (auto const ret = get_options(argc, argv, &g_opts); EXIT_SUCCESS != ret) {
return ret;
}
if (hsa_init() != HSA_STATUS_SUCCESS){
return EXIT_FAILURE;
}
if (hsa_init() != HSA_STATUS_SUCCESS) {
return EXIT_FAILURE;
}
int ret = EXIT_FAILURE;
auto ok = ROCPROFILER_STATUS_SUCCESS;
int ret = EXIT_FAILURE;
auto ok = ROCPROFILER_STATUS_SUCCESS;
ROCPROFILER_CHECK(rocprofiler_initialize(), ok);
if (ROCPROFILER_STATUS_SUCCESS == ok) {
ret = run_kernel(g_opts);
} else {
goto out;
}
ROCPROFILER_CHECK(rocprofiler_initialize(), ok);
if (ROCPROFILER_STATUS_SUCCESS == ok) {
ret = run_kernel(g_opts);
} else {
goto out;
}
rocprofiler_finalize();
rocprofiler_finalize();
out:
hsa_shut_down();
return ROCPROFILER_STATUS_SUCCESS == ok && EXIT_FAILURE != ret
? EXIT_SUCCESS
: EXIT_FAILURE;
hsa_shut_down();
return ROCPROFILER_STATUS_SUCCESS == ok && EXIT_FAILURE != ret ? EXIT_SUCCESS : EXIT_FAILURE;
}
@@ -23,32 +23,30 @@
#define PROGNAME "code_printing_sample"
#define HIP_ERROR(code) \
do { \
fprintf(stderr, \
PROGNAME ": Assertion failed at %s:%d, HIP error: %s\n", \
__FILE__, __LINE__, hipGetErrorString((code))); \
fflush(stderr); \
} while (false);
#define HIP_ERROR(code) \
do { \
fprintf(stderr, PROGNAME ": Assertion failed at %s:%d, HIP error: %s\n", __FILE__, __LINE__, \
hipGetErrorString((code))); \
fflush(stderr); \
} while (false);
#define HIP_CHECK_BREAK(expr, var) \
if (auto const code = (expr); hipSuccess != code) { \
HIP_ERROR(code); \
(var) = code; \
break; \
}
#define HIP_CHECK_BREAK(expr, var) \
if (auto const code = (expr); hipSuccess != code) { \
HIP_ERROR(code); \
(var) = code; \
break; \
}
#define ROCPROFILER_ERROR(code) \
do { \
fprintf(stderr, \
PROGNAME ": Assertion failed at %s:%d, ROCProfiler error: %s\n", \
__FILE__, __LINE__, rocprofiler_error_str(code)); \
fflush(stderr); \
} while (false);
#define ROCPROFILER_ERROR(code) \
do { \
fprintf(stderr, PROGNAME ": Assertion failed at %s:%d, ROCProfiler error: %s\n", __FILE__, \
__LINE__, rocprofiler_error_str(code)); \
fflush(stderr); \
} while (false);
#define ROCPROFILER_CHECK(expr, var) \
if ((var) = (expr); ROCPROFILER_STATUS_SUCCESS != (var)) { \
ROCPROFILER_ERROR((var)); \
}
#define ROCPROFILER_CHECK(expr, var) \
if ((var) = (expr); ROCPROFILER_STATUS_SUCCESS != (var)) { \
ROCPROFILER_ERROR((var)); \
}
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_PROGRAM_HPP_
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_PROGRAM_HPP_
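Review note on `HIP_ERROR` and `ROCPROFILER_ERROR`: both end in `while (false);` with the semicolon inside the macro. The usual point of the `do { ... } while (false)` wrapper is that the caller supplies the semicolon, so the macro behaves as a single statement even in an unbraced `if`/`else`. The sketch below shows that form with a hypothetical `SKETCH_ERROR` macro. Note that `HIP_CHECK_BREAK` cannot be wrapped this way, since its `break` would then leave the wrapper loop instead of the caller's.

```cpp
#include <cstdio>

// Hypothetical macro; mirrors the shape of HIP_ERROR without the trailing ';'.
#define SKETCH_ERROR(msg)                          \
  do {                                             \
    fprintf(stderr, "sketch: error: %s\n", (msg)); \
    fflush(stderr);                                \
  } while (false)

// With the semicolon kept out of the macro, this unbraced if/else compiles.
// If the macro ended in ';', the extra empty statement would detach the else.
static int sketch_report(bool const failed) {
  if (failed)
    SKETCH_ERROR("something failed");
  else
    return 0;
  return 1;
}
```

The existing call sites in this sample always sit inside braces, so the current macros work as written; the difference only shows up in unbraced conditionals.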
@@ -25,25 +25,24 @@
#include <cstdint>
struct program_options {
program_options()
: device(0)
, no_gpu(false)
, hip_memset(false)
, rands_len(1024 * 1024 * 4)
, gt(0)
, seed(std::chrono::steady_clock::now().time_since_epoch().count())
, disassemble(false)
, pc_sampling(false)
{}
program_options()
: device(0),
no_gpu(false),
hip_memset(false),
rands_len(1024 * 1024 * 4),
gt(0),
seed(std::chrono::steady_clock::now().time_since_epoch().count()),
disassemble(false),
pc_sampling(false) {}
int device;
bool no_gpu;
bool hip_memset;
size_t rands_len;
uint64_t gt;
uint64_t seed;
bool disassemble;
bool pc_sampling;
int device;
bool no_gpu;
bool hip_memset;
size_t rands_len;
uint64_t gt;
uint64_t seed;
bool disassemble;
bool pc_sampling;
};
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_PROGRAM_OPTIONS_HPP_
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_PROGRAM_OPTIONS_HPP_
@@ -23,8 +23,8 @@ int main(int argc, char** argv) {
int gpu_agent = 0;
int cpu_agent = 0;
CHECK_ROCPROFILER(rocprofiler_device_profiling_session_create(&counters[0], counters.size(),
&dp_session_id, gpu_agent, cpu_agent));
CHECK_ROCPROFILER(rocprofiler_device_profiling_session_create(
&counters[0], counters.size(), &dp_session_id, gpu_agent, cpu_agent));
printf("session start \n");
// start GPU device profiling
@@ -25,9 +25,10 @@ int main(int argc, char** argv) {
counters.emplace_back("GRBM_COUNT");
rocprofiler_filter_id_t filter_id;
[[maybe_unused]] rocprofiler_filter_property_t property = {};
CHECK_ROCPROFILER(rocprofiler_create_filter(session_id, ROCPROFILER_COUNTERS_COLLECTION,
rocprofiler_filter_data_t{.counters_names = &counters[0]},
counters.size(), &filter_id, property));
CHECK_ROCPROFILER(
rocprofiler_create_filter(session_id, ROCPROFILER_COUNTERS_COLLECTION,
rocprofiler_filter_data_t{.counters_names = &counters[0]},
counters.size(), &filter_id, property));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, filter_id, buffer_id));
// Normal HIP Calls
@@ -40,9 +40,9 @@ int main(int argc, char** argv) {
// Kernel Tracing
rocprofiler_filter_id_t kernel_tracing_filter_id;
CHECK_ROCPROFILER(rocprofiler_create_filter(session_id, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION,
rocprofiler_filter_data_t{}, 0, &kernel_tracing_filter_id,
rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_create_filter(
session_id, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION, rocprofiler_filter_data_t{}, 0,
&kernel_tracing_filter_id, rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, kernel_tracing_filter_id, buffer_id));
// Normal HIP Calls won't be traced
@@ -35,9 +35,9 @@ int main(int argc, char** argv) {
// Kernel Tracing
rocprofiler_filter_id_t kernel_tracing_filter_id;
CHECK_ROCPROFILER(rocprofiler_create_filter(session_id, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION,
rocprofiler_filter_data_t{}, 0, &kernel_tracing_filter_id,
rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_create_filter(
session_id, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION, rocprofiler_filter_data_t{}, 0,
&kernel_tracing_filter_id, rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, kernel_tracing_filter_id, buffer_id));
// Normal HIP Calls won't be traced
@@ -1,25 +1,34 @@
# ############################################################################################################################################
# ROCProfiler General Requirements
# ############################################################################################################################################
find_package(Python3 COMPONENTS Interpreter REQUIRED)
find_package(
Python3
COMPONENTS Interpreter
REQUIRED)
execute_process(COMMAND ${Python3_EXECUTABLE} -c "import lxml"
RESULT_VARIABLE CPP_HEADER_PARSER
OUTPUT_QUIET)
execute_process(
COMMAND ${Python3_EXECUTABLE} -c "import lxml"
RESULT_VARIABLE CPP_HEADER_PARSER
OUTPUT_QUIET)
if(NOT ${CPP_HEADER_PARSER} EQUAL 0)
message(FATAL_ERROR "\
message(
FATAL_ERROR
"\
The \"lxml\" Python3 package is not installed. \
Please install it using the following command: \"${Python3_EXECUTABLE} -m pip install lxml\".\
")
endif()
execute_process(COMMAND ${Python3_EXECUTABLE} -c "import CppHeaderParser"
RESULT_VARIABLE CPP_HEADER_PARSER
OUTPUT_QUIET)
execute_process(
COMMAND ${Python3_EXECUTABLE} -c "import CppHeaderParser"
RESULT_VARIABLE CPP_HEADER_PARSER
OUTPUT_QUIET)
if(NOT ${CPP_HEADER_PARSER} EQUAL 0)
message(FATAL_ERROR "\
message(
FATAL_ERROR
"\
The \"CppHeaderParser\" Python3 package is not installed. \
Please install it using the following command: \"${Python3_EXECUTABLE} -m pip install CppHeaderParser\".\
")
@@ -29,134 +38,157 @@ endif()
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR})
# Getting HSA Include Directory
get_property(HSA_RUNTIME_INCLUDE_DIRECTORIES TARGET hsa-runtime64::hsa-runtime64 PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH
REQUIRED)
get_property(
HSA_RUNTIME_INCLUDE_DIRECTORIES
TARGET hsa-runtime64::hsa-runtime64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH REQUIRED)
get_filename_component(HSA_RUNTIME_INC_PATH ${HSA_H} DIRECTORY)
find_library(AQLPROFILE_LIB "libhsa-amd-aqlprofile64.so" HINTS ${CMAKE_PREFIX_PATH} PATHS ${ROCM_PATH} PATH_SUFFIXES lib)
find_library(
AQLPROFILE_LIB "libhsa-amd-aqlprofile64.so"
HINTS ${CMAKE_PREFIX_PATH}
PATHS ${ROCM_PATH}
PATH_SUFFIXES lib)
if(NOT AQLPROFILE_LIB)
message(FATAL_ERROR "AQL_PROFILE not installed. Please install hsa-amd-aqlprofile!")
message(FATAL_ERROR "AQL_PROFILE not installed. Please install hsa-amd-aqlprofile!")
endif()
# ############################################################################################################################################
# ########################################################################################
# Adding Old Library Files
# ############################################################################################################################################
set (OLD_LIB_SRC
${LIB_DIR}/core/rocprofiler.cpp
${LIB_DIR}/core/gpu_command.cpp
${LIB_DIR}/core/proxy_queue.cpp
${LIB_DIR}/core/simple_proxy_queue.cpp
${LIB_DIR}/core/intercept_queue.cpp
${LIB_DIR}/core/metrics.cpp
${LIB_DIR}/core/activity.cpp
${LIB_DIR}/util/hsa_rsrc_factory.cpp
)
# ########################################################################################
set(OLD_LIB_SRC
${LIB_DIR}/core/rocprofiler.cpp
${LIB_DIR}/core/gpu_command.cpp
${LIB_DIR}/core/proxy_queue.cpp
${LIB_DIR}/core/simple_proxy_queue.cpp
${LIB_DIR}/core/intercept_queue.cpp
${LIB_DIR}/core/metrics.cpp
${LIB_DIR}/core/activity.cpp
${LIB_DIR}/util/hsa_rsrc_factory.cpp)
# ############################################################################################################################################
# ########################################################################################
# Configuring Basic/Derived Counters
# ############################################################################################################################################
# ########################################################################################
set(COUNTERS_DIR ${PROJECT_SOURCE_DIR}/src/core/counters)
execute_process(
COMMAND ${Python3_EXECUTABLE} ${COUNTERS_DIR}/basic/xml_parser_basic.py ${COUNTERS_DIR}/basic ${CMAKE_CURRENT_BINARY_DIR}/basic_counter.cpp
COMMENT "Generating basic_counter.cpp...")
COMMAND
${Python3_EXECUTABLE} ${COUNTERS_DIR}/basic/xml_parser_basic.py
${COUNTERS_DIR}/basic ${CMAKE_CURRENT_BINARY_DIR}/basic_counter.cpp COMMENT
"Generating basic_counter.cpp...")
# execute_process(
# COMMAND ${Python3_EXECUTABLE} ${COUNTERS_DIR}/derived/xml_parser_derived.py ${COUNTERS_DIR}/derived ${CMAKE_CURRENT_BINARY_DIR}/derived_counter.cpp
# COMMENT "Generating derived_counter.cpp...")
# execute_process( COMMAND ${Python3_EXECUTABLE}
# ${COUNTERS_DIR}/derived/xml_parser_derived.py ${COUNTERS_DIR}/derived
# ${CMAKE_CURRENT_BINARY_DIR}/derived_counter.cpp COMMENT "Generating
# derived_counter.cpp...")
# ############################################################################################################################################
# ########################################################################################
# ROCProfiler Tracer HIP/HSA Parsing
# ############################################################################################################################################
get_property(HIP_INCLUDE_DIRECTORIES TARGET hip::amdhip64 PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hip
NO_DEFAULT_PATH
REQUIRED)
# ########################################################################################
get_property(
HIP_INCLUDE_DIRECTORIES
TARGET hip::amdhip64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hip
NO_DEFAULT_PATH REQUIRED)
# # Generate the HSA wrapper functions header
add_custom_command(
OUTPUT hsa_prof_str.h hsa_prof_str.inline.h
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/hsaap.py ${CMAKE_CURRENT_BINARY_DIR} "${HSA_RUNTIME_INC_PATH}" > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/hsaap.py
"${HSA_RUNTIME_INC_PATH}/hsa.h" "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
"${HSA_RUNTIME_INC_PATH}/hsa_ext_image.h" "${HSA_RUNTIME_INC_PATH}/hsa_api_trace.h"
COMMENT "Generating hsa_prof_str.h,hsa_prof_str.inline.h...")
OUTPUT hsa_prof_str.h hsa_prof_str.inline.h
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/hsaap.py
${CMAKE_CURRENT_BINARY_DIR} "${HSA_RUNTIME_INC_PATH}" > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/hsaap.py
"${HSA_RUNTIME_INC_PATH}/hsa.h"
"${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
"${HSA_RUNTIME_INC_PATH}/hsa_ext_image.h"
"${HSA_RUNTIME_INC_PATH}/hsa_api_trace.h"
COMMENT "Generating hsa_prof_str.h,hsa_prof_str.inline.h...")
# # Generate the HSA pretty printers
add_custom_command(
OUTPUT hsa_ostream_ops.h
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa.h" -o hsa.h.i
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h" -o hsa_ext_amd.h.i
BYPRODUCTS hsa.h.i hsa_ext_amd.h.i
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py
-in hsa.h.i,hsa_ext_amd.h.i -out hsa_ostream_ops.h > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py
"${HSA_RUNTIME_INC_PATH}/hsa.h" "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
COMMENT "Generating hsa_ostream_ops.h...")
OUTPUT hsa_ostream_ops.h
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa.h" -o hsa.h.i
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h" -o
hsa_ext_amd.h.i
BYPRODUCTS hsa.h.i hsa_ext_amd.h.i
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py -in
hsa.h.i,hsa_ext_amd.h.i -out hsa_ostream_ops.h > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py
"${HSA_RUNTIME_INC_PATH}/hsa.h" "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
COMMENT "Generating hsa_ostream_ops.h...")
get_property(HIP_INCLUDE_DIRECTORIES TARGET hip::amdhip64 PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hip
NO_DEFAULT_PATH
REQUIRED)
get_property(
HIP_INCLUDE_DIRECTORIES
TARGET hip::amdhip64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hip
NO_DEFAULT_PATH REQUIRED)
## Generate the HIP pretty printers
# Generate the HIP pretty printers
add_custom_command(
OUTPUT hip_ostream_ops.h
COMMAND ${CMAKE_C_COMPILER} "$<$<BOOL:${HIP_INCLUDE_DIRECTORIES}>:-I$<JOIN:${HIP_INCLUDE_DIRECTORIES},$<SEMICOLON>-I>>"
-E "${HIP_RUNTIME_API_H}" -D__HIP_PLATFORM_HCC__=1 -D__HIP_ROCclr__=1 -o hip_runtime_api.h.i
BYPRODUCTS hip_runtime_api.h.i
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py
-in hip_runtime_api.h.i -out hip_ostream_ops.h > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py "${HIP_RUNTIME_API_H}"
COMMENT "Generating hip_ostream_ops.h..."
COMMAND_EXPAND_LISTS)
OUTPUT hip_ostream_ops.h
COMMAND
${CMAKE_C_COMPILER}
"$<$<BOOL:${HIP_INCLUDE_DIRECTORIES}>:-I$<JOIN:${HIP_INCLUDE_DIRECTORIES},$<SEMICOLON>-I>>"
-E "${HIP_RUNTIME_API_H}" -D__HIP_PLATFORM_HCC__=1 -D__HIP_ROCclr__=1 -o
hip_runtime_api.h.i
BYPRODUCTS hip_runtime_api.h.i
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py -in
hip_runtime_api.h.i -out hip_ostream_ops.h > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py "${HIP_RUNTIME_API_H}"
COMMENT "Generating hip_ostream_ops.h..."
COMMAND_EXPAND_LISTS)
set(GENERATED_SOURCES
hip_ostream_ops.h
hsa_prof_str.h
hsa_ostream_ops.h
hsa_prof_str.inline.h)
set(GENERATED_SOURCES hip_ostream_ops.h hsa_prof_str.h hsa_ostream_ops.h
hsa_prof_str.inline.h)
# ############################################################################################################################################
# ########################################################################################
# ROCProfiler API
# ############################################################################################################################################
# PC sampling uses libpciaccess as a fallback if the debugfs ioctl is
# unavailable
# ########################################################################################
# PC sampling uses libpciaccess as a fallback if the debugfs ioctl is unavailable
find_path(PCIACCESS_INCLUDE_DIR pciaccess.h REQUIRED)
find_library(PCIACCESS_LIBRARIES pciaccess REQUIRED)
set(PUBLIC_HEADERS rocprofiler.h)
foreach(header ${PUBLIC_HEADERS})
install(FILES ${PROJECT_SOURCE_DIR}/include/rocprofiler/${header}
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}
COMPONENT dev)
endforeach()
install(DIRECTORY ${PROJECT_SOURCE_DIR}/include/rocprofiler/v2
install(
FILES ${PROJECT_SOURCE_DIR}/include/rocprofiler/${header}
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}
COMPONENT dev)
endforeach()
install(
DIRECTORY ${PROJECT_SOURCE_DIR}/include/rocprofiler/v2
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}
COMPONENT dev)
# Getting Source files for ROCProfiler, Hardware, HSA, Memory, Session, Counters, Utils
file(GLOB ROCPROFILER_SRC_FILES ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)
file(GLOB ROCPROFILER_PROFILER_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/profiler/profiler.cpp)
file(GLOB ROCPROFILER_TRACER_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/tracer/*.cpp)
file(GLOB ROCPROFILER_ROCTRACER_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/tracer/src/*.cpp)
file(GLOB ROCPROFILER_PROFILER_SRC_FILES
${PROJECT_SOURCE_DIR}/src/core/session/profiler/profiler.cpp)
file(GLOB ROCPROFILER_TRACER_SRC_FILES
${PROJECT_SOURCE_DIR}/src/core/session/tracer/*.cpp)
file(GLOB ROCPROFILER_ROCTRACER_SRC_FILES
${PROJECT_SOURCE_DIR}/src/core/session/tracer/src/*.cpp)
file(GLOB ROCPROFILER_ATT_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/att/att.cpp)
file(GLOB ROCPROFILER_CLASS_SRC_FILES ${CMAKE_CURRENT_SOURCE_DIR}/rocprofiler_singleton.cpp)
file(GLOB ROCPROFILER_CLASS_SRC_FILES
${CMAKE_CURRENT_SOURCE_DIR}/rocprofiler_singleton.cpp)
file(GLOB ROCPROFILER_SPM_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/spm/spm.cpp)
set(CORE_HARDWARE_DIR ${PROJECT_SOURCE_DIR}/src/core/hardware)
file(GLOB CORE_HARDWARE_SRC_FILES ${CORE_HARDWARE_DIR}/*.cpp)
@@ -180,148 +212,202 @@ file(GLOB CORE_COUNTERS_SAMPLER_SRC_FILES ${CORE_SESSION_DIR}/counters_sampler.c
file(GLOB CORE_COUNTERS_SRC_FILES ${PROJECT_BINARY_DIR}/src/api/*_counter.cpp)
file(GLOB CORE_COUNTERS_PARENT_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/counters/*.cpp)
file(GLOB CORE_COUNTERS_METRICS_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/counters/metrics/*.cpp)
file(GLOB CORE_COUNTERS_METRICS_SRC_FILES
${PROJECT_SOURCE_DIR}/src/core/counters/metrics/*.cpp)
file(GLOB CORE_COUNTERS_MMIO_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/counters/mmio/*.cpp)
set(CORE_UTILS_DIR ${PROJECT_SOURCE_DIR}/src/utils)
file(GLOB CORE_UTILS_SRC_FILES ${CORE_UTILS_DIR}/*.cpp)
set(CORE_PC_SAMPLING_DIR ${PROJECT_SOURCE_DIR}/src/pcsampler)
file(GLOB CORE_PC_SAMPLING_FILES ${CORE_PC_SAMPLING_DIR}/core/*.cpp ${CORE_PC_SAMPLING_DIR}/gfxip/*.cpp ${CORE_PC_SAMPLING_DIR}/session/*.cpp)
file(GLOB CORE_PC_SAMPLING_FILES ${CORE_PC_SAMPLING_DIR}/core/*.cpp
${CORE_PC_SAMPLING_DIR}/gfxip/*.cpp ${CORE_PC_SAMPLING_DIR}/session/*.cpp)
#### V1 Library
# Compiling/Installing ROCProfiler API V1
# V1 Library: Compiling/Installing ROCProfiler API V1
add_library(${ROCPROFILER_TARGET} SHARED ${OLD_LIB_SRC})
set_target_properties(${ROCPROFILER_TARGET} PROPERTIES
CXX_VISIBILITY_PRESET hidden
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib
VERSION 1.0.0
SOVERSION 1)
set_target_properties(
${ROCPROFILER_TARGET}
PROPERTIES CXX_VISIBILITY_PRESET hidden
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib
VERSION 1.0.0
SOVERSION 1)
# As ROCR hsa_api_trace header file is not usable unless AMD_INTERNAL_BUILD is defined
target_compile_definitions(${ROCPROFILER_TARGET} PUBLIC AMD_INTERNAL_BUILD)
target_include_directories(${ROCPROFILER_TARGET}
PUBLIC
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include/rocprofiler>
PRIVATE
${LIB_DIR} ${ROOT_DIR}
${PROJECT_SOURCE_DIR}/include/rocprofiler)
target_link_libraries(${ROCPROFILER_TARGET} PRIVATE ${AQLPROFILE_LIB} hsa-runtime64::hsa-runtime64 c stdc++)
target_include_directories(
${ROCPROFILER_TARGET}
PUBLIC $<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include/rocprofiler>
PRIVATE ${LIB_DIR} ${ROOT_DIR} ${PROJECT_SOURCE_DIR}/include/rocprofiler)
target_link_libraries(${ROCPROFILER_TARGET} PRIVATE ${AQLPROFILE_LIB}
hsa-runtime64::hsa-runtime64 c stdc++)
get_target_property(ROCPROFILER_LIBRARY_V1_NAME ${ROCPROFILER_TARGET} NAME)
get_target_property(ROCPROFILER_LIBRARY_V1_VERSION ${ROCPROFILER_TARGET} VERSION)
get_target_property(ROCPROFILER_LIBRARY_V1_SOVERSION ${ROCPROFILER_TARGET} SOVERSION)
## Install libraries: Non versioned lib file in dev package
## Skipping NameLink as it will be installed using symlinks
install ( TARGETS ${ROCPROFILER_TARGET} LIBRARY NAMELINK_SKIP DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT runtime)
install ( TARGETS ${ROCPROFILER_TARGET} LIBRARY NAMELINK_SKIP DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT asan)
# Install libraries: non-versioned lib file in dev package. Skipping NameLink as it will be
# installed using symlinks.
install(
TARGETS ${ROCPROFILER_TARGET}
LIBRARY NAMELINK_SKIP
DESTINATION ${CMAKE_INSTALL_LIBDIR}
COMPONENT runtime)
install(
TARGETS ${ROCPROFILER_TARGET}
LIBRARY NAMELINK_SKIP
DESTINATION ${CMAKE_INSTALL_LIBDIR}
COMPONENT asan)
#### V2 Library
# Compiling/Installing ROCProfiler API
add_library(rocprofiler-v2 SHARED
${ROCPROFILER_SRC_FILES}
${ROCPROFILER_CLASS_SRC_FILES}
${ROCPROFILER_PROFILER_SRC_FILES}
${ROCPROFILER_ATT_SRC_FILES}
${CORE_HARDWARE_SRC_FILES}
${CORE_HSA_SRC_FILES}
${ROCPROFILER_SPM_SRC_FILES}
${CORE_MEMORY_SRC_FILES}
${CORE_SESSION_SRC_FILES}
${CORE_FILTER_SRC_FILES}
${CORE_DEVICE_PROFILING_SRC_FILES}
${CORE_COUNTERS_SAMPLER_SRC_FILES}
${CORE_COUNTERS_PARENT_SRC_FILES}
${CORE_COUNTERS_METRICS_SRC_FILES}
${CORE_COUNTERS_MMIO_SRC_FILES}
${CORE_UTILS_SRC_FILES}
${CORE_HSA_PACKETS_SRC_FILES}
${CORE_HSA_QUEUES_SRC_FILES}
${ROCPROFILER_TRACER_SRC_FILES}
${ROCPROFILER_ROCTRACER_SRC_FILES}
${GENERATED_SOURCES}
${CORE_COUNTERS_SRC_FILES}
${CORE_PC_SAMPLING_FILES})
set_target_properties(rocprofiler-v2 PROPERTIES
CXX_VISIBILITY_PRESET hidden
DEFINE_SYMBOL "ROCPROFILER_EXPORTS"
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/exportmap
OUTPUT_NAME rocprofiler64
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib
VERSION ${PROJECT_VERSION}
SOVERSION ${PROJECT_VERSION_MAJOR})
# V2 Library: Compiling/Installing ROCProfiler API
add_library(
rocprofiler-v2 SHARED
${ROCPROFILER_SRC_FILES}
${ROCPROFILER_CLASS_SRC_FILES}
${ROCPROFILER_PROFILER_SRC_FILES}
${ROCPROFILER_ATT_SRC_FILES}
${CORE_HARDWARE_SRC_FILES}
${CORE_HSA_SRC_FILES}
${ROCPROFILER_SPM_SRC_FILES}
${CORE_MEMORY_SRC_FILES}
${CORE_SESSION_SRC_FILES}
${CORE_FILTER_SRC_FILES}
${CORE_DEVICE_PROFILING_SRC_FILES}
${CORE_COUNTERS_SAMPLER_SRC_FILES}
${CORE_COUNTERS_PARENT_SRC_FILES}
${CORE_COUNTERS_METRICS_SRC_FILES}
${CORE_COUNTERS_MMIO_SRC_FILES}
${CORE_UTILS_SRC_FILES}
${CORE_HSA_PACKETS_SRC_FILES}
${CORE_HSA_QUEUES_SRC_FILES}
${ROCPROFILER_TRACER_SRC_FILES}
${ROCPROFILER_ROCTRACER_SRC_FILES}
${GENERATED_SOURCES}
${CORE_COUNTERS_SRC_FILES}
${CORE_PC_SAMPLING_FILES})
set_target_properties(
rocprofiler-v2
PROPERTIES CXX_VISIBILITY_PRESET hidden
DEFINE_SYMBOL "ROCPROFILER_EXPORTS"
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/exportmap
OUTPUT_NAME rocprofiler64
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib
VERSION ${PROJECT_VERSION}
SOVERSION ${PROJECT_VERSION_MAJOR})
target_compile_definitions(rocprofiler-v2
# As ROCR hsa_api_trace header file is not usable unless AMD_INTERNAL_BUILD is defined
PRIVATE AMD_INTERNAL_BUILD
PROF_API_IMPL HIP_PROF_HIP_API_STRING=1 __HIP_PLATFORM_AMD__=1)
target_include_directories(rocprofiler-v2
PUBLIC
${HIP_INCLUDE_DIRECTORIES} ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include/rocprofiler/v2>
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include>
PRIVATE
${LIB_DIR} ${ROOT_DIR}
${CMAKE_CURRENT_BINARY_DIR}
${PROJECT_SOURCE_DIR}
${PROJECT_SOURCE_DIR}/tools)
target_compile_definitions(
rocprofiler-v2
# As ROCR hsa_api_trace header file is not usable unless AMD_INTERNAL_BUILD is defined
PRIVATE AMD_INTERNAL_BUILD PROF_API_IMPL HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_AMD__=1)
target_include_directories(
rocprofiler-v2
PUBLIC ${HIP_INCLUDE_DIRECTORIES}
${HSA_RUNTIME_INCLUDE_DIRECTORIES}
$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include/rocprofiler/v2>
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include>
PRIVATE ${LIB_DIR} ${ROOT_DIR} ${CMAKE_CURRENT_BINARY_DIR} ${PROJECT_SOURCE_DIR}
${PROJECT_SOURCE_DIR}/tools)
if(ASAN)
target_compile_options(rocprofiler-v2 PRIVATE -fsanitize=address)
target_link_options(rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap -Wl,--no-undefined,-fsanitize=address)
target_link_libraries(rocprofiler-v2 PRIVATE ${AQLPROFILE_LIB} hsa-runtime64::hsa-runtime64 Threads::Threads atomic numa asan dl c stdc++ stdc++fs amd_comgr ${PCIACCESS_LIBRARIES})
target_compile_options(rocprofiler-v2 PRIVATE -fsanitize=address)
target_link_options(
rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap
-Wl,--no-undefined,-fsanitize=address)
target_link_libraries(
rocprofiler-v2
PRIVATE ${AQLPROFILE_LIB}
hsa-runtime64::hsa-runtime64
Threads::Threads
atomic
numa
asan
dl
c
stdc++
stdc++fs
amd_comgr
${PCIACCESS_LIBRARIES})
else()
target_link_options(rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap -Wl,--no-undefined)
target_link_libraries(rocprofiler-v2 PRIVATE ${AQLPROFILE_LIB} hsa-runtime64::hsa-runtime64 Threads::Threads atomic numa dl c stdc++ stdc++fs amd_comgr ${PCIACCESS_LIBRARIES})
target_link_options(
rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap
-Wl,--no-undefined)
target_link_libraries(
rocprofiler-v2
PRIVATE ${AQLPROFILE_LIB}
hsa-runtime64::hsa-runtime64
Threads::Threads
atomic
numa
dl
c
stdc++
stdc++fs
amd_comgr
${PCIACCESS_LIBRARIES})
endif()
get_target_property(ROCPROFILER_LIBRARY_V2_NAME rocprofiler-v2 OUTPUT_NAME)
get_target_property(ROCPROFILER_LIBRARY_V2_VERSION rocprofiler-v2 VERSION)
get_target_property(ROCPROFILER_LIBRARY_V2_SOVERSION rocprofiler-v2 SOVERSION)
## Prepare Name Link SO files for V1 & V2 Libraries
add_custom_command(TARGET rocprofiler-v2 POST_BUILD
COMMAND ${CMAKE_COMMAND} -E rm -f ${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so
COMMAND ${CMAKE_COMMAND} -E create_symlink
lib${ROCPROFILER_LIBRARY_V1_NAME}.so.${ROCPROFILER_LIBRARY_V1_SOVERSION}
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so
COMMAND ${CMAKE_COMMAND} -E create_symlink
lib${ROCPROFILER_LIBRARY_V2_NAME}.so.${ROCPROFILER_LIBRARY_V2_SOVERSION}
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V2_NAME}v2.so
)
# Prepare Name Link SO files for V1 & V2 Libraries
add_custom_command(
TARGET rocprofiler-v2
POST_BUILD
COMMAND ${CMAKE_COMMAND} -E rm -f
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so
COMMAND
${CMAKE_COMMAND} -E create_symlink
lib${ROCPROFILER_LIBRARY_V1_NAME}.so.${ROCPROFILER_LIBRARY_V1_SOVERSION}
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so
COMMAND
${CMAKE_COMMAND} -E create_symlink
lib${ROCPROFILER_LIBRARY_V2_NAME}.so.${ROCPROFILER_LIBRARY_V2_SOVERSION}
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V2_NAME}v2.so)
# Add custom target to trigger the create_symlink command
add_custom_target(create_rocprofiler_lib DEPENDS rocprofiler-v2 ${ROCPROFILER_TARGET})
## Install libraries: Non versioned lib file in dev package
## Skipping NameLink as it will be installed using symlinks
install(TARGETS rocprofiler-v2 LIBRARY NAMELINK_SKIP DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT runtime)
install(TARGETS rocprofiler-v2 LIBRARY NAMELINK_SKIP DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT asan)
# Install libraries: non-versioned lib file in dev package. Skipping NameLink as it will be
# installed using symlinks.
install(
TARGETS rocprofiler-v2
LIBRARY NAMELINK_SKIP
DESTINATION ${CMAKE_INSTALL_LIBDIR}
COMPONENT runtime)
install(
TARGETS rocprofiler-v2
LIBRARY NAMELINK_SKIP
DESTINATION ${CMAKE_INSTALL_LIBDIR}
COMPONENT asan)
## Installing NameLinks for V1 & V2
## librocprofiler64.so links to V1 library
## librocprofiler64v2.so links to V2 library
install(CODE "execute_process( \
# Installing NameLinks for V1 & V2: librocprofiler64.so links to V1 library,
# librocprofiler64v2.so links to V2 library
install(
CODE "execute_process( \
COMMAND ${CMAKE_COMMAND} -E create_symlink \
lib${ROCPROFILER_LIBRARY_V1_NAME}.so.${ROCPROFILER_LIBRARY_V1_SOVERSION} \
${CMAKE_INSTALL_PREFIX}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so \
)" COMPONENT dev
)
install(CODE "execute_process( \
)"
COMPONENT dev)
install(
CODE "execute_process( \
COMMAND ${CMAKE_COMMAND} -E create_symlink \
lib${ROCPROFILER_LIBRARY_V2_NAME}.so.${ROCPROFILER_LIBRARY_V2_SOVERSION} \
${CMAKE_INSTALL_PREFIX}/lib/lib${ROCPROFILER_LIBRARY_V2_NAME}v2.so \
)" COMPONENT dev
)
)"
COMPONENT dev)
configure_file(${PROJECT_SOURCE_DIR}/src/core/counters/metrics/basic_counters.xml ${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters/basic_counters.xml COPYONLY)
configure_file(${PROJECT_SOURCE_DIR}/src/core/counters/metrics/derived_counters.xml ${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters/derived_counters.xml COPYONLY)
configure_file(
${PROJECT_SOURCE_DIR}/src/core/counters/metrics/basic_counters.xml
${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters/basic_counters.xml COPYONLY)
configure_file(
${PROJECT_SOURCE_DIR}/src/core/counters/metrics/derived_counters.xml
${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters/derived_counters.xml COPYONLY)
install(DIRECTORY
${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/${PROJECT_NAME}
USE_SOURCE_PERMISSIONS
COMPONENT runtime)
install(
DIRECTORY ${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/${PROJECT_NAME}
USE_SOURCE_PERMISSIONS
COMPONENT runtime)
# ############################################################################################################################################
# ########################################################################################
@@ -74,13 +74,14 @@ class ROCProfiler_Singleton {
// Device Profiling Session
bool FindDeviceProfilingSession(rocprofiler_session_id_t session_id);
rocprofiler_session_id_t CreateDeviceProfilingSession(std::vector<std::string> counters,
int cpu_agent_index, int gpu_agent_index);
int cpu_agent_index, int gpu_agent_index);
void DestroyDeviceProfilingSession(rocprofiler_session_id_t session_id);
DeviceProfileSession* GetDeviceProfilingSession(rocprofiler_session_id_t session_id);
// Generic
bool CheckFilterData(rocprofiler_filter_kind_t filter_kind, rocprofiler_filter_data_t filter_data);
bool CheckFilterData(rocprofiler_filter_kind_t filter_kind,
rocprofiler_filter_data_t filter_data);
uint64_t GetUniqueRecordId();
uint64_t GetUniqueKernelDispatchId();
@@ -11,8 +11,7 @@
// TODO(aelwazir): change that to adapt with our own Exception
// What about outside exceptions and callbacks exceptions!!
#define API_METHOD_PREFIX \
try {
#define API_METHOD_PREFIX try {
#define API_METHOD_SUFFIX \
} \
catch (rocprofiler::Exception & e) { \
@@ -61,11 +61,11 @@ void check_status(hsa_status_t status) {
namespace activity_prim {
// PC sampling callback data
struct pcsmp_callback_data_t {
const char* kernel_name; // sampled kernel name
void* data_buffer; // host buffer for tracing data
uint64_t id; // sample id
uint64_t cycle; // sample cycle
uint64_t pc; // sample PC
const char* kernel_name; // sampled kernel name
void* data_buffer; // host buffer for tracing data
uint64_t id; // sample id
uint64_t cycle; // sample cycle
uint64_t pc; // sample PC
};
uint32_t activity_op = UINT32_MAX;
@@ -74,9 +74,8 @@ std::atomic<activity_async_callback_t> activity_callback{NULL};
rocprofiler_t* context = NULL;
hsa_status_t trace_data_cb(hsa_ven_amd_aqlprofile_info_type_t info_type,
hsa_ven_amd_aqlprofile_info_data_t* info_data,
void* data) {
const pcsmp_callback_data_t* pcsmp_data = (pcsmp_callback_data_t*) data;
hsa_ven_amd_aqlprofile_info_data_t* info_data, void* data) {
const pcsmp_callback_data_t* pcsmp_data = (pcsmp_callback_data_t*)data;
activity_record_t record{};
record.op = activity_op;
@@ -96,11 +95,13 @@ bool context_handler(rocprofiler_group_t group, void* arg) {
hsa_agent_t agent{};
hsa_status_t status = rocprofiler_get_agent(group.context, &agent);
check_status(status);
const rocprofiler::util::AgentInfo* agent_info = rocprofiler::util::HsaRsrcFactory::Instance().GetAgentInfo(agent);
const rocprofiler::util::AgentInfo* agent_info =
rocprofiler::util::HsaRsrcFactory::Instance().GetAgentInfo(agent);
pcsmp_callback_data_t pcsmp_data{};
pcsmp_data.kernel_name = (const char*)arg;
pcsmp_data.data_buffer = rocprofiler::util::HsaRsrcFactory::Instance().AllocateSysMemory(agent_info, rocprofiler::TraceProfile::GetSize());
pcsmp_data.data_buffer = rocprofiler::util::HsaRsrcFactory::Instance().AllocateSysMemory(
agent_info, rocprofiler::TraceProfile::GetSize());
status = rocprofiler_iterate_trace_data(group.context, trace_data_cb, &pcsmp_data);
check_status(status);
return false;
@@ -110,8 +111,8 @@ bool context_handler(rocprofiler_group_t group, void* arg) {
hsa_status_t dispatch_callback(const rocprofiler_callback_data_t* callback_data, void* user_data,
rocprofiler_group_t* group) {
// context features
const rocprofiler_feature_kind_t trace_kind =
(rocprofiler_feature_kind_t)(ROCPROFILER_FEATURE_KIND_TRACE | ROCPROFILER_FEATURE_KIND_PCSMP_MOD);
const rocprofiler_feature_kind_t trace_kind = (rocprofiler_feature_kind_t)(
ROCPROFILER_FEATURE_KIND_TRACE | ROCPROFILER_FEATURE_KIND_PCSMP_MOD);
const uint32_t feature_count = 1;
const uint32_t parameter_count = 1;
rocprofiler_feature_t* features = new rocprofiler_feature_t[feature_count];
@@ -131,8 +132,8 @@ hsa_status_t dispatch_callback(const rocprofiler_callback_data_t* callback_data,
properties.handler_arg = (void*)strdup(callback_data->kernel_name);
// Open profiling context
hsa_status_t status = rocprofiler_open(callback_data->agent, features, feature_count,
&context, 0 /*ROCPROFILER_MODE_SINGLEGROUP*/, &properties);
hsa_status_t status = rocprofiler_open(callback_data->agent, features, feature_count, &context,
0 /*ROCPROFILER_MODE_SINGLEGROUP*/, &properties);
check_status(status);
// Get group[0]
@@ -141,7 +142,7 @@ hsa_status_t dispatch_callback(const rocprofiler_callback_data_t* callback_data,
return status;
}
} // namespace activity_prim
} // namespace activity_prim
extern "C" {
PUBLIC_API const char* GetOpName(uint32_t op) { return strdup("PCSAMPLE"); }
@@ -152,7 +153,8 @@ PUBLIC_API bool RemoveApiCallback(uint32_t op) { return true; }
PUBLIC_API bool InitActivityCallback(void* callback, void* arg) {
activity_prim::activity_arg = arg;
activity_prim::activity_callback.store((activity_async_callback_t)callback, std::memory_order_release);
activity_prim::activity_callback.store((activity_async_callback_t)callback,
std::memory_order_release);
rocprofiler_queue_callbacks_t queue_callbacks{};
queue_callbacks.dispatch = activity_prim::dispatch_callback;
@@ -191,11 +193,8 @@ struct evt_cb_entry_t {
};
evt_cb_entry_t evt_cb_table[HSA_EVT_ID_NUMBER];
hsa_status_t codeobj_evt_callback(
rocprofiler_hsa_cb_id_t id,
const rocprofiler_hsa_callback_data_t* cb_data,
void* arg)
{
hsa_status_t codeobj_evt_callback(rocprofiler_hsa_cb_id_t id,
const rocprofiler_hsa_callback_data_t* cb_data, void* arg) {
const auto evt = evt_cb_table[id].get();
activity_rtapi_callback_t evt_callback = (activity_rtapi_callback_t)evt.first;
if (evt_callback != NULL) evt_callback(ACTIVITY_DOMAIN_HSA_EVT, id, cb_data, evt.second);
@@ -19,4 +19,4 @@ enum hsa_evt_id_t {
// HSA EVT callback data type
typedef rocprofiler_hsa_callback_data_t hsa_evt_data_t;
#endif // _SRC_CORE_ACTIVITY_H
#endif // _SRC_CORE_ACTIVITY_H
@@ -27,7 +27,7 @@ THE SOFTWARE.
#include <hsa/hsa.h>
#include <hsa/hsa_ext_amd.h>
#include <unistd.h> // usleep
#include <unistd.h> // usleep
#include <atomic>
#include <list>
#include <map>
@@ -91,8 +91,7 @@ class Group {
barrier_signal_{},
dispatch_signal_{},
orig_signal_{},
record_{}
{}
record_{} {}
void Insert(const profile_info_t& info) {
const rocprofiler_feature_kind_t kind = info.rinfo->kind;
@@ -110,11 +109,10 @@ class Group {
}
hsa_status_t Finalize(const bool is_concurrent = false) {
hsa_status_t status = pmc_profile_.Finalize(start_vector_, stop_vector_,
read_vector_, is_concurrent);
hsa_status_t status =
pmc_profile_.Finalize(start_vector_, stop_vector_, read_vector_, is_concurrent);
if (status == HSA_STATUS_SUCCESS) {
status = trace_profile_.Finalize(start_vector_, stop_vector_,
read_vector_, is_concurrent);
status = trace_profile_.Finalize(start_vector_, stop_vector_, read_vector_, is_concurrent);
}
if (status == HSA_STATUS_SUCCESS) {
if (!pmc_profile_.Empty()) ++n_profiles_;
@@ -137,32 +135,20 @@ class Group {
Context* GetContext() { return context_; }
uint32_t GetIndex() const { return index_; }
void SetBarrierSignal(const hsa_signal_t &signal) {
barrier_signal_ = signal;
}
hsa_signal_t& GetBarrierSignal() {
return barrier_signal_;
}
void SetDispatchSignal(const hsa_signal_t &signal) {
dispatch_signal_ = signal;
}
hsa_signal_t& GetDispatchSignal() {
return dispatch_signal_;
}
void SetOrigSignal(const hsa_signal_t &signal) {
orig_signal_ = signal;
}
const hsa_signal_t& GetOrigSignal() const {
return orig_signal_;
}
rocprofiler_dispatch_record_t* GetRecord() {
return &record_;
}
void SetBarrierSignal(const hsa_signal_t& signal) { barrier_signal_ = signal; }
hsa_signal_t& GetBarrierSignal() { return barrier_signal_; }
void SetDispatchSignal(const hsa_signal_t& signal) { dispatch_signal_ = signal; }
hsa_signal_t& GetDispatchSignal() { return dispatch_signal_; }
void SetOrigSignal(const hsa_signal_t& signal) { orig_signal_ = signal; }
const hsa_signal_t& GetOrigSignal() const { return orig_signal_; }
rocprofiler_dispatch_record_t* GetRecord() { return &record_; }
atomic_refs_t* AtomicRefsCount() { return reinterpret_cast<atomic_refs_t*>(&refs_); }
void ResetRefsCount() { AtomicRefsCount()->store(n_profiles_, std::memory_order_release); }
void IncrRefsCount() { AtomicRefsCount()->fetch_add(1, std::memory_order_acq_rel); }
uint32_t FetchDecrRefsCount() { return AtomicRefsCount()->fetch_sub(1, std::memory_order_acq_rel); }
uint32_t FetchDecrRefsCount() {
return AtomicRefsCount()->fetch_sub(1, std::memory_order_acq_rel);
}
private:
PmcProfile pmc_profile_;
@@ -188,23 +174,23 @@ class Context {
public:
typedef std::map<std::string, rocprofiler_feature_t*> info_map_t;
static void Create(Context* obj, const util::AgentInfo* agent_info, Queue* queue, rocprofiler_feature_t* info,
const uint32_t info_count, rocprofiler_handler_t handler, void* handler_arg)
{
static void Create(Context* obj, const util::AgentInfo* agent_info, Queue* queue,
rocprofiler_feature_t* info, const uint32_t info_count,
rocprofiler_handler_t handler, void* handler_arg) {
new (obj) Context(agent_info, queue, info, info_count, handler, handler_arg);
obj->Construct(agent_info, queue, info, info_count, handler, handler_arg);
}
static void Release(Context* obj) { obj->Destruct(); }
static Context* Create(const util::AgentInfo* agent_info, Queue* queue, rocprofiler_feature_t* info,
const uint32_t info_count, rocprofiler_handler_t handler, void* handler_arg)
{
static Context* Create(const util::AgentInfo* agent_info, Queue* queue,
rocprofiler_feature_t* info, const uint32_t info_count,
rocprofiler_handler_t handler, void* handler_arg) {
Context* obj = new Context(agent_info, queue, info, info_count, handler, handler_arg);
if (obj == NULL) EXC_RAISING(HSA_STATUS_ERROR, "allocation error");
try {
obj->Construct(agent_info, queue, info, info_count, handler, handler_arg);
} catch(...) {
} catch (...) {
delete obj;
obj = NULL;
std::cerr << "Error: Context Create failed" << std::endl;
@@ -213,7 +199,9 @@ class Context {
return obj;
}
static void Destroy(Context* obj) { if (obj != NULL) delete obj; }
static void Destroy(Context* obj) {
if (obj != NULL) delete obj;
}
void Reset(const uint32_t& group_index) { set_[group_index].ResetRefsCount(); }
@@ -293,8 +281,10 @@ class Context {
hsa_rsrc_->SignalWaitRestore(tuple.completion_signal, 1);
// Restore other signals
RestoreSignals(tuple);
for (rocprofiler_feature_t* rinfo : *(tuple.info_vector)) rinfo->data.kind = ROCPROFILER_DATA_KIND_UNINIT;
callback_data_t callback_data{tuple.profile, tuple.info_vector, tuple.info_vector->size(), NULL};
for (rocprofiler_feature_t* rinfo : *(tuple.info_vector))
rinfo->data.kind = ROCPROFILER_DATA_KIND_UNINIT;
callback_data_t callback_data{tuple.profile, tuple.info_vector, tuple.info_vector->size(),
NULL};
const hsa_status_t status =
api_->hsa_ven_amd_aqlprofile_iterate_data(tuple.profile, DataCallback, &callback_data);
if (status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "context iterate data failed");
@@ -310,7 +300,8 @@ class Context {
if (expr) {
auto it = info_map_.find(name);
if (it == info_map_.end())
EXC_RAISING(HSA_STATUS_ERROR, "metric '" << name << "', rocprofiler info is not found " << this);
EXC_RAISING(HSA_STATUS_ERROR,
"metric '" << name << "', rocprofiler info is not found " << this);
rocprofiler_feature_t* info = it->second;
info->data.result_double = expr->Eval(args);
info->data.kind = ROCPROFILER_DATA_KIND_DOUBLE;
@@ -324,7 +315,7 @@ class Context {
for (auto& tuple : profile_vector) {
if (pcsmp_mode_) const_cast<profile_t*>(tuple.profile)->event_count = UINT32_MAX;
const hsa_status_t status =
api_->hsa_ven_amd_aqlprofile_iterate_data(tuple.profile, callback, data);
api_->hsa_ven_amd_aqlprofile_iterate_data(tuple.profile, callback, data);
if (status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "context iterate data failed");
}
}
@@ -342,7 +333,10 @@ class Context {
hsa_agent_t GetAgent() const { return agent_; }
Group* GetGroup(const uint32_t& index) { return &set_[index]; }
rocprofiler_handler_t GetHandler(void** arg) const { *arg = handler_arg_; return handler_; }
rocprofiler_handler_t GetHandler(void** arg) const {
*arg = handler_arg_;
return handler_;
}
// Concurrent profiling mode
static bool k_concurrent_;
@@ -358,8 +352,7 @@ class Context {
metrics_(NULL),
handler_(handler),
handler_arg_(handler_arg),
pcsmp_mode_(false)
{}
pcsmp_mode_(false) {}
~Context() { Destruct(); }
@@ -375,8 +368,7 @@ class Context {
}
void Construct(const util::AgentInfo* agent_info, Queue* queue, rocprofiler_feature_t* info,
const uint32_t info_count, rocprofiler_handler_t handler, void* handler_arg)
{
const uint32_t info_count, rocprofiler_handler_t handler, void* handler_arg) {
if (info_count == 0) {
set_.push_back(Group(agent_info_, this, 0));
return;
@@ -386,9 +378,11 @@ class Context {
if (metrics_ == NULL) EXC_RAISING(HSA_STATUS_ERROR, "MetricsDict create failed");
if (Initialize(info, info_count) == false) {
fprintf(stdout, "\nInput metrics out of HW limit. Proposed metrics group set:\n"); fflush(stdout);
fprintf(stdout, "\nInput metrics out of HW limit. Proposed metrics group set:\n");
fflush(stdout);
MetricsGroupSet(agent_info, info, info_count).Print(stdout);
fprintf(stdout, "\n"); fflush(stdout);
fprintf(stdout, "\n");
fflush(stdout);
EXC_RAISING(HSA_STATUS_ERROR, "Metrics list exceeds HW limits");
}
Finalize();
@@ -420,8 +414,8 @@ class Context {
info_map_[name] = info;
auto ret = metrics_map_.insert({name, NULL});
if (!ret.second)
EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name
<< "' is registered more than once");
EXC_RAISING(HSA_STATUS_ERROR,
"input metric '" << name << "' is registered more than once");
}
}
@@ -437,8 +431,9 @@ class Context {
if (kind == ROCPROFILER_FEATURE_KIND_METRIC) { // Processing metrics features
const Metric* metric = metrics_->Get(name);
if (metric == NULL)
EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name << "' is not supported on this hardware: "
<< agent_info_->name);
EXC_RAISING(HSA_STATUS_ERROR,
"input metric '"
<< name << "' is not supported on this hardware: " << agent_info_->name);
#if 0
std::cout << " " << name << (metric->GetExpr() ? " = " + metric->GetExpr()->String() : " counter") << std::endl;
#endif
@@ -493,9 +488,9 @@ class Context {
info->kind = ROCPROFILER_FEATURE_KIND_TRACE;
const event_t* event = NULL;
if (kind & ROCPROFILER_FEATURE_KIND_PCSMP_MOD) { // PC sampling
if (kind & ROCPROFILER_FEATURE_KIND_PCSMP_MOD) { // PC sampling
pcsmp_mode_ = true;
} else if (kind & ROCPROFILER_FEATURE_KIND_SPM_MOD) { // SPM trace
} else if (kind & ROCPROFILER_FEATURE_KIND_SPM_MOD) { // SPM trace
const Metric* metric = metrics_->Get(name);
if (metric == NULL)
EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name << "' is not found");
@@ -559,14 +554,14 @@ class Context {
const bool trace_local = TraceProfile::IsLocal();
util::HsaRsrcFactory* hsa_rsrc = &util::HsaRsrcFactory::Instance();
if (sample_id == 0) {
const uint32_t output_buffer_size = profile->output_buffer.size;
const uint32_t output_buffer_size64 = profile->output_buffer.size / sizeof(uint64_t);
const util::AgentInfo* agent_info = hsa_rsrc->GetAgentInfo(profile->agent);
void* ptr = (trace_local) ? hsa_rsrc->AllocateSysMemory(agent_info, output_buffer_size) :
calloc(output_buffer_size64, sizeof(uint64_t));
rinfo->data.result_bytes.size = output_buffer_size;
rinfo->data.result_bytes.ptr = ptr;
callback_data->ptr = reinterpret_cast<char*>(ptr);
const uint32_t output_buffer_size = profile->output_buffer.size;
const uint32_t output_buffer_size64 = profile->output_buffer.size / sizeof(uint64_t);
const util::AgentInfo* agent_info = hsa_rsrc->GetAgentInfo(profile->agent);
void* ptr = (trace_local) ? hsa_rsrc->AllocateSysMemory(agent_info, output_buffer_size)
: calloc(output_buffer_size64, sizeof(uint64_t));
rinfo->data.result_bytes.size = output_buffer_size;
rinfo->data.result_bytes.ptr = ptr;
callback_data->ptr = reinterpret_cast<char*>(ptr);
}
char* result_bytes_ptr = reinterpret_cast<char*>(rinfo->data.result_bytes.ptr);
const char* end = result_bytes_ptr + rinfo->data.result_bytes.size;
@@ -577,8 +572,10 @@ class Context {
char* dest = ptr + sizeof(*header);
if ((dest + size) >= end) {
if (dest < end) size = end - dest;
else EXC_RAISING(HSA_STATUS_ERROR, "Trace data out of output buffer");
if (dest < end)
size = end - dest;
else
EXC_RAISING(HSA_STATUS_ERROR, "Trace data out of output buffer");
}
bool suc = true;
@@ -593,7 +590,9 @@ class Context {
rinfo->data.result_bytes.instance_count = sample_id + 1;
rinfo->data.kind = ROCPROFILER_DATA_KIND_BYTES;
} else
EXC_RAISING(HSA_STATUS_ERROR, "Agent Memcpy failed, dst(" << (void*)dest << ") src(" << (void*)src << ") size(" << size << ")");
EXC_RAISING(HSA_STATUS_ERROR,
"Agent Memcpy failed, dst(" << (void*)dest << ") src(" << (void*)src
<< ") size(" << size << ")");
} else {
if (sample_id == 0) {
rinfo->data.result_bytes.ptr = profile->output_buffer.ptr;
@@ -647,8 +646,7 @@ class Context {
bool pcsmp_mode_;
};
#define CONTEXT_INSTANTIATE() \
bool rocprofiler::Context::k_concurrent_ = false;
#define CONTEXT_INSTANTIATE() bool rocprofiler::Context::k_concurrent_ = false;
} // namespace rocprofiler
@@ -31,7 +31,7 @@ THE SOFTWARE.
namespace rocprofiler {
class ContextPool {
public:
public:
typedef uint64_t index_t;
typedef std::mutex mutex_t;
@@ -41,16 +41,12 @@ class ContextPool {
std::atomic<bool> completed;
};
static ContextPool* Create(
uint32_t num_entries,
uint32_t payload_bytes,
const util::AgentInfo* agent_info,
rocprofiler_feature_t* info,
const uint32_t info_count,
rocprofiler_pool_handler_t handler,
void* handler_arg)
{
ContextPool* obj = new ContextPool(num_entries, payload_bytes, agent_info, info, info_count, handler, handler_arg);
static ContextPool* Create(uint32_t num_entries, uint32_t payload_bytes,
const util::AgentInfo* agent_info, rocprofiler_feature_t* info,
const uint32_t info_count, rocprofiler_pool_handler_t handler,
void* handler_arg) {
ContextPool* obj = new ContextPool(num_entries, payload_bytes, agent_info, info, info_count,
handler, handler_arg);
if (obj == NULL) EXC_RAISING(HSA_STATUS_ERROR, "allocation error");
return obj;
}
@@ -61,18 +57,18 @@ class ContextPool {
if (constructed_ == false) {
Construct(agent_info_, info_, info_count_);
}
const index_t write_index = write_index_.fetch_add(entry_size_bytes_, std::memory_order_relaxed);
const index_t write_index =
write_index_.fetch_add(entry_size_bytes_, std::memory_order_relaxed);
while (write_index >= (read_index_.load(std::memory_order_acquire) + array_size_bytes_)) {
check_completed();
std::this_thread::yield();
}
entry_t* entry = GetPoolEntry(write_index, pool_entry);
if (entry->completed.load(std::memory_order_relaxed) != false) EXC_RAISING(HSA_STATUS_ERROR, "Corrupted pool entry");
if (entry->completed.load(std::memory_order_relaxed) != false)
EXC_RAISING(HSA_STATUS_ERROR, "Corrupted pool entry");
}
void Flush() {
check_completed();
}
void Flush() { check_completed(); }
#if 0
template <class F>
F for_each(const F& f_p) {
@@ -95,7 +91,7 @@ class ContextPool {
return f;
}
#endif
private:
private:
static unsigned aligned64(const unsigned& size) { return (size + 0x3f) & ~0x3fu; }
static bool context_handler(rocprofiler_group_t group, void* arg) {
@@ -105,45 +101,41 @@ class ContextPool {
return true;
}
ContextPool(
uint32_t num_entries,
uint32_t payload_bytes,
const util::AgentInfo* agent_info,
rocprofiler_feature_t* info,
const uint32_t info_count,
rocprofiler_pool_handler_t pool_handler,
void* pool_handler_arg
) :
payload_off_(aligned64(sizeof(entry_t))),
entry_size_bytes_(payload_off_ + aligned64(payload_bytes)),
array_size_bytes_(entry_size_bytes_ * num_entries),
array_(NULL),
read_index_(0),
write_index_(0),
sync_flag_(false),
ContextPool(uint32_t num_entries, uint32_t payload_bytes, const util::AgentInfo* agent_info,
rocprofiler_feature_t* info, const uint32_t info_count,
rocprofiler_pool_handler_t pool_handler, void* pool_handler_arg)
: payload_off_(aligned64(sizeof(entry_t))),
entry_size_bytes_(payload_off_ + aligned64(payload_bytes)),
array_size_bytes_(entry_size_bytes_ * num_entries),
array_(NULL),
read_index_(0),
write_index_(0),
sync_flag_(false),
agent_info_(agent_info),
info_(info),
info_count_(info_count),
pool_handler_(pool_handler),
pool_handler_arg_(pool_handler_arg),
constructed_(false)
{}
agent_info_(agent_info),
info_(info),
info_count_(info_count),
pool_handler_(pool_handler),
pool_handler_arg_(pool_handler_arg),
constructed_(false) {}
void Construct(const util::AgentInfo* agent_info, rocprofiler_feature_t* info, const uint32_t info_count) {
void Construct(const util::AgentInfo* agent_info, rocprofiler_feature_t* info,
const uint32_t info_count) {
std::lock_guard<mutex_t> lck(mutex_);
if (constructed_ == false) {
array_data_ = (char*) malloc(array_size_bytes_ + 0x3f);
array_data_ = (char*)malloc(array_size_bytes_ + 0x3f);
array_ = reinterpret_cast<char*>(((intptr_t)array_data_ + 0x3f) >> 6 << 6);
if (((intptr_t)array_ & 0x3f) != 0) EXC_RAISING(HSA_STATUS_ERROR, "Pool array is not aligned");
if (((intptr_t)array_ & 0x3f) != 0)
EXC_RAISING(HSA_STATUS_ERROR, "Pool array is not aligned");
memset(array_, 0, array_size_bytes_);
const char* end = array_ + array_size_bytes_;
for (char* ptr = array_; ptr < end; ptr += entry_size_bytes_) {
entry_t* entry = reinterpret_cast<entry_t*>(ptr);
entry->pool = this;
entry->context = Context::Create(agent_info, NULL, info, info_count, ContextPool::context_handler, ptr);
entry->context =
Context::Create(agent_info, NULL, info, info_count, ContextPool::context_handler, ptr);
}
constructed_ = true;
@@ -175,7 +167,7 @@ class ContextPool {
if (sync_flag_.test_and_set(std::memory_order_acquire) == false) {
index_t read_index = read_index_.load(std::memory_order_relaxed);
const index_t write_index = write_index_.load(std::memory_order_relaxed);
while(read_index < write_index) {
while (read_index < write_index) {
rocprofiler_pool_entry_t pool_entry{};
entry_t* entry = GetPoolEntry(read_index, &pool_entry);
if (entry->completed.load(std::memory_order_acquire) == true) {
@@ -1,8 +1,7 @@
#ifndef _CORE_TIMER_H_
#define _CORE_TIMER_H_
template <int Size>
class CoreTimer {
template <int Size> class CoreTimer {
CoreTimer() {
index_ = 0;
freq_in_100mhz_ = MeasureTSCFreqHz();
@@ -20,15 +19,15 @@ class CoreTimer {
// AMD Linux timing
unsigned int unused;
n = __rdtscp(&unused);
data_[index_] = 10 * n / freq_in_100mhz_; // unit is ns
data_[index_] = 10 * n / freq_in_100mhz_; // unit is ns
index_ += 1;
}
double Print()
double Print()
private:
// timer data
double data_[Size];
private :
// timer data
double data_[Size];
// data index
uint32_t index_;
// frequency
@@ -40,20 +39,20 @@ class CoreTimer {
clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
return uint64_t(ts.tv_sec) * 1000000 + ts.tv_nsec / 1000;
}
uint64_t CoreTimer::MeasureTSCFreqHz() {
// Make a coarse interval measurement of TSC ticks for 1 gigacycles.
unsigned int unused;
uint64_t tscTicksEnd;
uint64_t coarseBeginUs = CoarseTimestampUs();
uint64_t tscTicksBegin = __rdtscp(&unused);
do {
tscTicksEnd = __rdtscp(&unused);
} while (tscTicksEnd - tscTicksBegin < 1000000000);
uint64_t coarseEndUs = CoarseTimestampUs();
// Compute the TSC frequency and round to nearest 100MHz.
uint64_t coarseIntervalNs = (coarseEndUs - coarseBeginUs) * 1000;
uint64_t tscIntervalTicks = tscTicksEnd - tscTicksBegin;
@@ -61,4 +60,4 @@ class CoreTimer {
}
};
#endif // _CORE_TIMER_H_
#endif // _CORE_TIMER_H_
@@ -27,8 +27,7 @@ namespace Counter {
static std::atomic<uint64_t> COUNTER_COUNTER{0};
DerivedCounter::DerivedCounter(std::string name, std::string description,
std::string gpu_name)
DerivedCounter::DerivedCounter(std::string name, std::string description, std::string gpu_name)
: Counter(name, description, gpu_name) {
metric_id_ = COUNTER_COUNTER.fetch_add(1, std::memory_order_release);
addCounterToCounterMap();
@@ -41,20 +40,17 @@ DerivedCounter::~DerivedCounter() {
uint64_t DerivedCounter::getMetricId() { return metric_id_; }
std::map<uint64_t, BasicCounter*> *DerivedCounter::getAllCounters() {
return &counters_;
}
std::map<uint64_t, BasicCounter*>* DerivedCounter::getAllCounters() { return &counters_; }
BasicCounter *DerivedCounter::getBasicCounterFromDerived(uint64_t counter_id) {
BasicCounter* DerivedCounter::getBasicCounterFromDerived(uint64_t counter_id) {
return counters_[counter_id];
}
void DerivedCounter::addBasicCounter(uint64_t counter_id,
BasicCounter *counter) {
void DerivedCounter::addBasicCounter(uint64_t counter_id, BasicCounter* counter) {
counters_.emplace(counter_id, counter);
}
@DERIVED_XML_PARSE_RESULT@
@DERIVED_XML_PARSE_RESULT @
} // namespace Counter
@@ -39,8 +39,7 @@ namespace Counter {
class DerivedCounter : Counter {
public:
std::function<uint64_t()> evaluate_metric;
DerivedCounter(std::string name, std::string description,
std::string gpu_name);
DerivedCounter(std::string name, std::string description, std::string gpu_name);
~DerivedCounter();
uint64_t getMetricId();
@@ -108,7 +108,7 @@ bool metrics::ExtractMetricEvents(
// adding result object for derived metric
std::lock_guard<std::mutex> lock(extract_metric_events_lock);
if(metric_names[i].compare("KERNEL_DURATION")==0) {
if (metric_names[i].compare("KERNEL_DURATION") == 0) {
if (results_map.find(metric_names[i]) == results_map.end()) {
results_map[metric_names[i]] = new results_t(metric_names[i], {}, xcc_count);
}
@@ -192,7 +192,7 @@ bool metrics::GetMetricsData(std::map<std::string, results_t*>& results_map,
auto it = results_map.find(metric->GetName());
if (it == results_map.end()) rocprofiler::fatal("metric results not found ");
results_t* res = it->second;
if(metric->GetName().compare("KERNEL_DURATION") == 0) {
if (metric->GetName().compare("KERNEL_DURATION") == 0) {
res->val_double = kernel_duration;
continue;
}
@@ -206,7 +206,8 @@ bool metrics::GetMetricsData(std::map<std::string, results_t*>& results_map,
void metrics::GetCountersAndMetricResultsByXcc(uint32_t xcc_index,
std::vector<results_t*>& results_list,
std::map<std::string, results_t*>& results_map,
std::vector<const Metric*>& metrics_list, uint64_t kernel_duration) {
std::vector<const Metric*>& metrics_list,
uint64_t kernel_duration) {
for (auto it = results_list.begin(); it != results_list.end(); it++) {
(*it)->val_double =
(*it)->xcc_vals[xcc_index]; // set val_double to hold value for specific xcc
@@ -35,10 +35,10 @@ namespace rocprofiler {
typedef std::vector<double> xcc_results_t;
class results_t{
public:
results_t(std::string in_name, event_t in_event, uint32_t xcc_count):
name(in_name), val_double(0), event(in_event) {
class results_t {
public:
results_t(std::string in_name, event_t in_event, uint32_t xcc_count)
: name(in_name), val_double(0), event(in_event) {
xcc_vals.resize(xcc_count);
std::fill(xcc_vals.begin(), xcc_vals.end(), 0);
}
@@ -78,8 +78,9 @@ bool GetMetricsData(std::map<std::string, results_t*>& results_map,
std::vector<const Metric*>& metrics_list, uint64_t kernel_duration = 0);
void GetCountersAndMetricResultsByXcc(uint32_t xcc_index, std::vector<results_t*>& results_list,
std::map<std::string, results_t*>& results_map,
std::vector<const Metric*>& metrics_list, uint64_t kernel_duration = 0);
std::map<std::string, results_t*>& results_map,
std::vector<const Metric*>& metrics_list,
uint64_t kernel_duration = 0);
} // namespace metrics
} // namespace rocprofiler
@@ -45,7 +45,7 @@ THE SOFTWARE.
do { \
std::ostringstream oss; \
oss << __FUNCTION__ << "(), " << stream; \
throw rocprofiler::util::exception(error, oss.str()); \
throw rocprofiler::util::exception(error, oss.str()); \
} while (0)
#define AQL_EXC_RAISING(error, stream) \
+3 -6
@@ -221,14 +221,11 @@ class MetricsDict {
agent_name_ = agent_name_.substr(0, agent_name_.find(':'));
std::unordered_set<std::string> supported_agent_names = {
"gfx906",
"gfx908",
"gfx906", "gfx908",
"gfx90a", // Vega
"gfx940",
"gfx941",
"gfx940", "gfx941",
"gfx942", // Mi300
"gfx1030",
"gfx1031",
"gfx1030", "gfx1031",
"gfx1032", // Navi2x
"gfx1100",
"gfx1101" // Navi3x
@@ -17,8 +17,8 @@ class DFPerfMonMI200 : public PerfMon {
DFPerfMonMI200(const Agent::AgentInfo& info);
~DFPerfMonMI200();
void Start() override;
void Stop() {};
void Read(std::vector<rocprofiler_counters_sampler_counter_output_t>& values) {};
void Stop(){};
void Read(std::vector<rocprofiler_counters_sampler_counter_output_t>& values){};
void SetCounterNames(std::vector<std::string>& counter_names);
mmio::mmap_type_t Type() override { return mmio::mmap_type_t::DF_PERFMON; }
@@ -31,7 +31,6 @@ class DFPerfMonMI200 : public PerfMon {
uint64_t GetFicaNodeOutboundBw(uint32_t ficaa_val);
private:
mmio::DFPerfmonMMIO* mmio_;
static std::mutex mutex_; // should be an MMIO member
@@ -13,12 +13,12 @@ PciePerfMonMI200::~PciePerfMonMI200() {
mmio::MMIOManager::DestroyMMIOInstance(dynamic_cast<mmio::MMIO*>(mmio_));
}
void PciePerfMonMI200::writeRegister(uint32_t reg_offset, uint32_t value){
void PciePerfMonMI200::writeRegister(uint32_t reg_offset, uint32_t value) {
// mmio or ioctl approaches
mmio_->RegisterWriteAPI(reg_offset, value);
mmio_->RegisterWriteAPI(reg_offset, value);
}
void PciePerfMonMI200::readRegister(uint32_t reg_offset, uint32_t& value){
void PciePerfMonMI200::readRegister(uint32_t reg_offset, uint32_t& value) {
// mmio or ioctl approaches
mmio_->RegisterReadAPI(reg_offset, value);
}
@@ -35,44 +35,40 @@ void PciePerfMonMI200::SetCounterNames(std::vector<std::string>& counter_names)
}
}
void PciePerfMonMI200::Start(){
void PciePerfMonMI200::Start() {
// TODO: make sure values stored in table
// in registers header are dec and not hex
Start_RX_TILE_SCLK(event_id_);
}
void PciePerfMonMI200::Stop(){
void PciePerfMonMI200::Stop() {
// TODO: revisit correct value to stop
writeRegister(PCIE_MI200::PCIE_PERF_COUNT_CNTL, 0x2); // stop
writeRegister(PCIE_MI200::PCIE_PERF_COUNT_CNTL, 0x2); // stop
}
void PciePerfMonMI200::Read(std::vector<rocprofiler_counters_sampler_counter_output_t>& values){
uint64_t val=0;
void PciePerfMonMI200::Read(std::vector<rocprofiler_counters_sampler_counter_output_t>& values) {
uint64_t val = 0;
Read_RX_TILE_SCLK(val);
rocprofiler_counters_sampler_counter_output_t value = {
ROCPROFILER_COUNTERS_SAMPLER_PCIE_COUNTERS,
static_cast<double>(val)
};
rocprofiler_counters_sampler_counter_output_t value = {ROCPROFILER_COUNTERS_SAMPLER_PCIE_COUNTERS,
static_cast<double>(val)};
values.push_back(value);
}
void PciePerfMonMI200::Start_RX_TILE_TXCLK(uint32_t event){
void PciePerfMonMI200::Start_RX_TILE_TXCLK(uint32_t event) {
// Step 1: PORT SEL update
writeRegister(PCIE_MI200::PCIE_PERF_CNTL_EVENT_CI_PORT_SEL, 0x0);
// Step 2: EVENT SEL update
uint32_t value = event; // last 8 bits for event
uint32_t value = event; // last 8 bits for event
writeRegister(PCIE_MI200::PCIE_PERF_CNTL_TXCLK3, value);
// Steps 3 & 4: Performance counters initialization, enable:
// TODO: revisit. Just a single write with 0x3 might be enough (check with pcie team)
writeRegister(PCIE_MI200::PCIE_PERF_COUNT_CNTL, 0x5);
}
void PciePerfMonMI200::Read_RX_TILE_TXCLK(uint64_t& result){
void PciePerfMonMI200::Read_RX_TILE_TXCLK(uint64_t& result) {
// Step 5: Performance counters read:
uint32_t lo_val, hi_val;
readRegister(PCIE_MI200::PCIE_PERF_COUNT0_TXCLK3, lo_val);
@@ -84,22 +80,20 @@ void PciePerfMonMI200::Read_RX_TILE_TXCLK(uint64_t& result){
result = val | lo_val;
}
void PciePerfMonMI200::Start_RX_TILE_SCLK(uint32_t event){
void PciePerfMonMI200::Start_RX_TILE_SCLK(uint32_t event) {
// Step 1: PORT SEL update
writeRegister(PCIE_MI200::PCIE_PERF_CNTL_EVENT_CI_PORT_SEL, 0x0);
// Step 2: EVENT SEL update
uint32_t value = event; // last 8 bits for event
uint32_t value = event; // last 8 bits for event
writeRegister(PCIE_MI200::PCIE_PERF_CNTL_LCLK1, value);
// Steps 3 & 4: Performance counters initialization, enable:
// TODO: revisit. Just a single write with 0x3 might be enough (check with pcie team)
writeRegister(PCIE_MI200::PCIE_PERF_COUNT_CNTL, 0x5);
}
void PciePerfMonMI200::Read_RX_TILE_SCLK(uint64_t& result){
void PciePerfMonMI200::Read_RX_TILE_SCLK(uint64_t& result) {
// Step 5: Performance counters read:
uint32_t lo_val, hi_val;
readRegister(PCIE_MI200::PCIE_PERF_COUNT0_LCLK1, lo_val);
@@ -111,6 +105,4 @@ void PciePerfMonMI200::Read_RX_TILE_SCLK(uint64_t& result){
result = val | lo_val;
}
} // namespace rocprofiler
} // namespace rocprofiler
@@ -22,7 +22,7 @@ class PciePerfMonMI200 : public PerfMon {
mmio::mmap_type_t Type() override { return mmio::mmap_type_t::PCIE_PERFMON; }
private:
// TODO : check google coding std
// TODO : check google coding std
void writeRegister(uint32_t reg_offset, uint32_t value);
void readRegister(uint32_t reg_offset, uint32_t& value);
@@ -4,70 +4,70 @@
#include <stdint.h>
namespace PCIE_MI200 {
// -------- RX Tile TXCLK Start --------
// Step 1: PORT SEL update
const static uint32_t PCIE_PERF_CNTL_EVENT_CI_PORT_SEL = 0x11180250;
// Step 2: EVENT SEL update
const static uint32_t PCIE_PERF_CNTL_TXCLK1 = 0x11180204;
const static uint32_t PCIE_PERF_CNTL_TXCLK2 = 0x11180210;
const static uint32_t PCIE_PERF_CNTL_TXCLK3 = 0x1118021C; //#
const static uint32_t PCIE_PERF_CNTL_TXCLK4 = 0x11180228; //#
const static uint32_t PCIE_PERF_CNTL_TXCLK5 = 0x11180258;
const static uint32_t PCIE_PERF_CNTL_TXCLK6 = 0x11180264;
const static uint32_t PCIE_PERF_CNTL_TXCLK7 = 0x11180888;
const static uint32_t PCIE_PERF_CNTL_TXCLK8 = 0x11180894;
const static uint32_t PCIE_PERF_CNTL_TXCLK9 = 0x111808A0;
const static uint32_t PCIE_PERF_CNTL_TXCLK10 = 0x111808AC;
const static uint32_t PCIE_PERF_CNTL_TXCLK1 = 0x11180204;
const static uint32_t PCIE_PERF_CNTL_TXCLK2 = 0x11180210;
const static uint32_t PCIE_PERF_CNTL_TXCLK3 = 0x1118021C; //#
const static uint32_t PCIE_PERF_CNTL_TXCLK4 = 0x11180228; //#
const static uint32_t PCIE_PERF_CNTL_TXCLK5 = 0x11180258;
const static uint32_t PCIE_PERF_CNTL_TXCLK6 = 0x11180264;
const static uint32_t PCIE_PERF_CNTL_TXCLK7 = 0x11180888;
const static uint32_t PCIE_PERF_CNTL_TXCLK8 = 0x11180894;
const static uint32_t PCIE_PERF_CNTL_TXCLK9 = 0x111808A0;
const static uint32_t PCIE_PERF_CNTL_TXCLK10 = 0x111808AC;
// Steps 3 & 4: Performance counters initialization, enable:
const static uint32_t PCIE_PERF_COUNT_CNTL = 0x11180200;
// Step 5: Performance counters read:
const static uint32_t PCIE_PERF_COUNT0_TXCLK1 = 0x11180208;
const static uint32_t PCIE_PERF_COUNT0_TXCLK2 = 0x11180214;
const static uint32_t PCIE_PERF_COUNT0_TXCLK3 = 0x11180220; //#
const static uint32_t PCIE_PERF_COUNT0_TXCLK4 = 0x1118022C; //#
const static uint32_t PCIE_PERF_COUNT0_TXCLK5 = 0x1118025C;
const static uint32_t PCIE_PERF_COUNT0_TXCLK6 = 0x11180268;
const static uint32_t PCIE_PERF_COUNT0_TXCLK7 = 0x1118088C;
const static uint32_t PCIE_PERF_COUNT0_TXCLK8 = 0x11180898;
const static uint32_t PCIE_PERF_COUNT0_TXCLK9 = 0x111808A4;
const static uint32_t PCIE_PERF_COUNT0_TXCLK10 = 0x111808B0;
const static uint32_t PCIE_PERF_COUNT0_TXCLK1 = 0x11180208;
const static uint32_t PCIE_PERF_COUNT0_TXCLK2 = 0x11180214;
const static uint32_t PCIE_PERF_COUNT0_TXCLK3 = 0x11180220; //#
const static uint32_t PCIE_PERF_COUNT0_TXCLK4 = 0x1118022C; //#
const static uint32_t PCIE_PERF_COUNT0_TXCLK5 = 0x1118025C;
const static uint32_t PCIE_PERF_COUNT0_TXCLK6 = 0x11180268;
const static uint32_t PCIE_PERF_COUNT0_TXCLK7 = 0x1118088C;
const static uint32_t PCIE_PERF_COUNT0_TXCLK8 = 0x11180898;
const static uint32_t PCIE_PERF_COUNT0_TXCLK9 = 0x111808A4;
const static uint32_t PCIE_PERF_COUNT0_TXCLK10 = 0x111808B0;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK1 = 0x111808E8;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK2 = 0x111808F0;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK3 = 0x111808F8; //#
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK4 = 0x11180900; //#
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK5 = 0x11180908;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK6 = 0x11180910;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK7 = 0x11180918;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK8 = 0x11180920;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK9 = 0x11180928;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK1 = 0x111808E8;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK2 = 0x111808F0;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK3 = 0x111808F8; //#
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK4 = 0x11180900; //#
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK5 = 0x11180908;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK6 = 0x11180910;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK7 = 0x11180918;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK8 = 0x11180920;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK9 = 0x11180928;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK10 = 0x11180930;
const static uint32_t PCIE_PERF_COUNT1_TXCLK1 = 0x1118020C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK2 = 0x11180218;
const static uint32_t PCIE_PERF_COUNT1_TXCLK3 = 0x11180224; //#
const static uint32_t PCIE_PERF_COUNT1_TXCLK4 = 0x11180230; //#
const static uint32_t PCIE_PERF_COUNT1_TXCLK5 = 0x11180260;
const static uint32_t PCIE_PERF_COUNT1_TXCLK6 = 0x1118026C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK7 = 0x11180890;
const static uint32_t PCIE_PERF_COUNT1_TXCLK8 = 0x1118089C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK9 = 0x111808A8;
const static uint32_t PCIE_PERF_COUNT1_TXCLK1 = 0x1118020C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK2 = 0x11180218;
const static uint32_t PCIE_PERF_COUNT1_TXCLK3 = 0x11180224; //#
const static uint32_t PCIE_PERF_COUNT1_TXCLK4 = 0x11180230; //#
const static uint32_t PCIE_PERF_COUNT1_TXCLK5 = 0x11180260;
const static uint32_t PCIE_PERF_COUNT1_TXCLK6 = 0x1118026C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK7 = 0x11180890;
const static uint32_t PCIE_PERF_COUNT1_TXCLK8 = 0x1118089C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK9 = 0x111808A8;
const static uint32_t PCIE_PERF_COUNT1_TXCLK10 = 0x111808B4;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK1 = 0x111808EC;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK2 = 0x111808F4;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK3 = 0x111808FC; //#
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK4 = 0x11180904; //#
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK5 = 0x1118090C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK6 = 0x11180914;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK7 = 0x1118091C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK8 = 0x11180924;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK9 = 0x1118092C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK1 = 0x111808EC;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK2 = 0x111808F4;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK3 = 0x111808FC; //#
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK4 = 0x11180904; //#
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK5 = 0x1118090C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK6 = 0x11180914;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK7 = 0x1118091C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK8 = 0x11180924;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK9 = 0x1118092C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK10 = 0x11180934;
@@ -127,201 +127,200 @@ const static uint32_t PCIE_PERF_COUNT1_UPVAL_LCLK8 = 0x11180974;
// -------- RX Tile SCLK End ----------
typedef enum{
TX_TILE_TXCLK = 0,
TX_TILE_SCLK = 1,
RX_TILE_TXCLK = 2,
RX_TILE_SCLK = 3,
LC_TILE_TXCLK = 4
}pcie_event_category_t;
typedef enum {
TX_TILE_TXCLK = 0,
TX_TILE_SCLK = 1,
RX_TILE_TXCLK = 2,
RX_TILE_SCLK = 3,
LC_TILE_TXCLK = 4
} pcie_event_category_t;
struct pcie_event_t{
pcie_event_t(int id, pcie_event_category_t cat): event_id(id), event_category(cat){}
int event_id;
pcie_event_category_t event_category;
struct pcie_event_t {
pcie_event_t(int id, pcie_event_category_t cat) : event_id(id), event_category(cat) {}
int event_id;
pcie_event_category_t event_category;
};
const static std::map<std::string, pcie_event_t> pcie_events_table = {
{"RX_PERF_RXP_RX_TailEdb_A[0]", {2, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[1]", {3, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[2]", {4, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[3]", {5, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[0]", {6, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[1]", {7, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[2]", {8, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[3]", {9, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[0]", {10, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[1]", {11, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[2]", {12, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[3]", {13, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[0]", {14, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[1]", {15, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[2]", {16, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[3]", {17, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_nullified_tlp_A", {18, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_valid_crc_A", {19, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_invalid_crc_A", {20, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_vendor_type1_A", {21, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_vendor_type0_A", {22, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_set_slot_power_limit_A", {23, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_unlock_A", {24, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_fatal_A", {25, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_nonfatal_A", {26, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_corr_A", {27, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pme_to_ack_A", {28, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pme_turn_off_A", {29, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pm_pme_A", {30, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pm_active_state_nak_A", {31, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intd_A", {32, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intc_A", {33, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intb_A", {34, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_inta_A", {35, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intd_A", {36, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intc_A", {37, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intb_A", {38, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_inta_A", {39, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_valid_A", {40, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_unsupported_A", {41, RX_TILE_TXCLK}},
{"RX_PERF_RCB_unexpected_cpl_A", {42, RX_TILE_TXCLK}},
{"RX_PERF_RCB_timeout_cpl_A", {43, RX_TILE_TXCLK}},
{"RX_PERF_HDS_tlphdrvalid_A", {44, RX_TILE_TXCLK}},
{"RX_PERF_HDS_tlpdatavalid_A", {45, RX_TILE_TXCLK}},
{"RX_PERF_GAN_bad_tlp_A", {46, RX_TILE_TXCLK}},
{"RX_PERF_GAN_nak_A", {47, RX_TILE_TXCLK}},
{"RX_PERF_GAN_ack_A", {48, RX_TILE_TXCLK}},
{"RX_PERF_FE_unsupported_req_A", {49, RX_TILE_TXCLK}},
{"RX_PERF_FE_unsupported_cpl_A", {50, RX_TILE_TXCLK}},
{"RX_PERF_FE_unexpected_cpl_A", {51, RX_TILE_TXCLK}},
{"RX_PERF_FE_poisoned_tlp_A", {52, RX_TILE_TXCLK}},
{"RX_PERF_FE_poisoned_cpl_A", {53, RX_TILE_TXCLK}},
{"RX_PERF_FE_malformed_tlp_A", {54, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_abort_A", {55, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MSG_A", {56, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_CFG_WR_A", {57, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_CFG_RD_A", {58, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_IO_WR_A", {59, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_IO_RD_A", {60, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MEM_WR_A", {61, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MEM_RD_A", {62, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_gt16_A", {63, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_9to16_A", {64, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_5to8_A", {65, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_2to4_A", {66, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_1_A", {67, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_gt32_A", {68, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_17to32_A", {69, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_9to16_A", {70, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_5to8_A", {71, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_2to4_A", {72, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_1_A", {73, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_CA_A", {74, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_CRS_A", {75, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_UR_A", {76, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_SC_A", {77, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_active_state_request_l1_A", {78, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_request_ack_A", {79, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_enter_l23_A", {80, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_enter_l1_A", {81, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_error_A", {82, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_crc_err_A", {83, RX_TILE_TXCLK}},
{"SB_PERF_FCC_npd_0", {84, RX_TILE_TXCLK}},
{"SB_PERF_FCC_pd_0", {85, RX_TILE_TXCLK}},
{"SB_PERF_FCC_nph_0", {86, RX_TILE_TXCLK}},
{"SB_PERF_FCC_ph_0", {87, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_rd_hdr_0", {88, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_rd_hdr_0", {89, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_wr_hdr_0", {90, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_wr_hdr_0", {91, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_data_0", {92, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_data_0", {93, RX_TILE_TXCLK}},
{"SB_PERF_invalid_crc_0", {94, RX_TILE_TXCLK}},
{"SB_PERF_valid_crc_0", {95, RX_TILE_TXCLK}},
{"SB_PERF_rd_hdr_WEN_0", {96, RX_TILE_TXCLK}},
{"SB_PERF_wr_hdr_WEN_0", {97, RX_TILE_TXCLK}},
{"SB_PERF_data_WEN_0", {98, RX_TILE_TXCLK}},
{"SB_PERF_non_post_rd_from_FE", {99, RX_TILE_TXCLK}},
{"SB_PERF_non_post_wr_from_FE", {100, RX_TILE_TXCLK}},
{"SB_PERF_post_req_from_FE", {101, RX_TILE_TXCLK}},
{"SB_PERF_non_post_rd_from_FE_0", {102, RX_TILE_TXCLK}},
{"SB_PERF_non_post_wr_from_FE_0", {103, RX_TILE_TXCLK}},
{"SB_PERF_post_req_from_FE_0", {104, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_nak_A", {111, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_ack_A", {112, RX_TILE_TXCLK}},
{"RX_PERF_allErrors_A", {113, RX_TILE_TXCLK}},
{"perf_PG_COUNT", {175, RX_TILE_TXCLK}},
{"perf_NOT_POWER_GATED", {176, RX_TILE_TXCLK}},
{"perf_POWER_GATED", {177, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[0]", {2, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[1]", {3, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[2]", {4, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[3]", {5, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[0]", {6, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[1]", {7, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[2]", {8, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[3]", {9, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[0]", {10, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[1]", {11, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[2]", {12, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[3]", {13, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[0]", {14, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[1]", {15, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[2]", {16, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[3]", {17, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_nullified_tlp_A", {18, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_valid_crc_A", {19, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_invalid_crc_A", {20, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_vendor_type1_A", {21, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_vendor_type0_A", {22, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_set_slot_power_limit_A", {23, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_unlock_A", {24, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_fatal_A", {25, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_nonfatal_A", {26, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_corr_A", {27, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pme_to_ack_A", {28, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pme_turn_off_A", {29, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pm_pme_A", {30, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pm_active_state_nak_A", {31, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intd_A", {32, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intc_A", {33, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intb_A", {34, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_inta_A", {35, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intd_A", {36, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intc_A", {37, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intb_A", {38, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_inta_A", {39, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_valid_A", {40, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_unsupported_A", {41, RX_TILE_TXCLK}},
{"RX_PERF_RCB_unexpected_cpl_A", {42, RX_TILE_TXCLK}},
{"RX_PERF_RCB_timeout_cpl_A", {43, RX_TILE_TXCLK}},
{"RX_PERF_HDS_tlphdrvalid_A", {44, RX_TILE_TXCLK}},
{"RX_PERF_HDS_tlpdatavalid_A", {45, RX_TILE_TXCLK}},
{"RX_PERF_GAN_bad_tlp_A", {46, RX_TILE_TXCLK}},
{"RX_PERF_GAN_nak_A", {47, RX_TILE_TXCLK}},
{"RX_PERF_GAN_ack_A", {48, RX_TILE_TXCLK}},
{"RX_PERF_FE_unsupported_req_A", {49, RX_TILE_TXCLK}},
{"RX_PERF_FE_unsupported_cpl_A", {50, RX_TILE_TXCLK}},
{"RX_PERF_FE_unexpected_cpl_A", {51, RX_TILE_TXCLK}},
{"RX_PERF_FE_poisoned_tlp_A", {52, RX_TILE_TXCLK}},
{"RX_PERF_FE_poisoned_cpl_A", {53, RX_TILE_TXCLK}},
{"RX_PERF_FE_malformed_tlp_A", {54, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_abort_A", {55, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MSG_A", {56, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_CFG_WR_A", {57, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_CFG_RD_A", {58, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_IO_WR_A", {59, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_IO_RD_A", {60, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MEM_WR_A", {61, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MEM_RD_A", {62, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_gt16_A", {63, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_9to16_A", {64, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_5to8_A", {65, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_2to4_A", {66, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_1_A", {67, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_gt32_A", {68, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_17to32_A", {69, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_9to16_A", {70, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_5to8_A", {71, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_2to4_A", {72, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_1_A", {73, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_CA_A", {74, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_CRS_A", {75, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_UR_A", {76, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_SC_A", {77, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_active_state_request_l1_A", {78, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_request_ack_A", {79, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_enter_l23_A", {80, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_enter_l1_A", {81, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_error_A", {82, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_crc_err_A", {83, RX_TILE_TXCLK}},
{"SB_PERF_FCC_npd_0", {84, RX_TILE_TXCLK}},
{"SB_PERF_FCC_pd_0", {85, RX_TILE_TXCLK}},
{"SB_PERF_FCC_nph_0", {86, RX_TILE_TXCLK}},
{"SB_PERF_FCC_ph_0", {87, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_rd_hdr_0", {88, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_rd_hdr_0", {89, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_wr_hdr_0", {90, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_wr_hdr_0", {91, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_data_0", {92, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_data_0", {93, RX_TILE_TXCLK}},
{"SB_PERF_invalid_crc_0", {94, RX_TILE_TXCLK}},
{"SB_PERF_valid_crc_0", {95, RX_TILE_TXCLK}},
{"SB_PERF_rd_hdr_WEN_0", {96, RX_TILE_TXCLK}},
{"SB_PERF_wr_hdr_WEN_0", {97, RX_TILE_TXCLK}},
{"SB_PERF_data_WEN_0", {98, RX_TILE_TXCLK}},
{"SB_PERF_non_post_rd_from_FE", {99, RX_TILE_TXCLK}},
{"SB_PERF_non_post_wr_from_FE", {100, RX_TILE_TXCLK}},
{"SB_PERF_post_req_from_FE", {101, RX_TILE_TXCLK}},
{"SB_PERF_non_post_rd_from_FE_0", {102, RX_TILE_TXCLK}},
{"SB_PERF_non_post_wr_from_FE_0", {103, RX_TILE_TXCLK}},
{"SB_PERF_post_req_from_FE_0", {104, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_nak_A", {111, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_ack_A", {112, RX_TILE_TXCLK}},
{"RX_PERF_allErrors_A", {113, RX_TILE_TXCLK}},
{"perf_PG_COUNT", {175, RX_TILE_TXCLK}},
{"perf_NOT_POWER_GATED", {176, RX_TILE_TXCLK}},
{"perf_POWER_GATED", {177, RX_TILE_TXCLK}},
{"SB_PERF_non_post_rd_to_HI", {2, RX_TILE_SCLK}},
{"SB_PERF_non_post_wr_to_HI", {3, RX_TILE_SCLK}},
{"SB_PERF_post_req_to_HI", {4, RX_TILE_SCLK}},
{"SB_PERF_non_post_rd_to_HI_0", {5, RX_TILE_SCLK}},
{"SB_PERF_non_post_wr_to_HI_0", {6, RX_TILE_SCLK}},
{"SB_PERF_post_req_to_HI_0", {7, RX_TILE_SCLK}},
{"SB_PERF_rd_hdr_REN_0", {8, RX_TILE_SCLK}},
{"SB_PERF_wr_hdr_REN_0", {9, RX_TILE_SCLK}},
{"SB_PERF_data_REN_0", {10, RX_TILE_SCLK}},
{"SB_PERF_rd_hdr_empty_0", {11, RX_TILE_SCLK}},
{"SB_PERF_wr_hdr_empty_0", {12, RX_TILE_SCLK}},
{"SB_PERF_data_empty_0", {13, RX_TILE_SCLK}},
{"CI_PERF_slv_total128BRdCpl", {29, RX_TILE_SCLK}},
{"CI_PERF_slv_total32BMemRdTx", {30, RX_TILE_SCLK}},
{"CI_PERF_slv_total64BMemRdTx", {31, RX_TILE_SCLK}},
{"CI_PERF_slv_total16BMemWrTx", {32, RX_TILE_SCLK}},
{"CI_PERF_slv_total32BMemWrTx", {33, RX_TILE_SCLK}},
{"CI_PERF_slv_total64BMemWrTx", {34, RX_TILE_SCLK}},
{"CI_PERF_slv_totalTx", {35, RX_TILE_SCLK}},
{"CI_PERF_slv_stallGrantGen", {36, RX_TILE_SCLK}},
{"CI_PERF_slv_totalGrant", {37, RX_TILE_SCLK}},
{"CI_PERF_slv_txPending", {38, RX_TILE_SCLK}},
{"CI_PERF_slv_numMemRdLT32B", {39, RX_TILE_SCLK}},
{"CI_PERF_slv_numMemRdLT16B", {40, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemTx", {41, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemRdTx", {42, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemWrTx", {43, RX_TILE_SCLK}},
{"CI_PERF_slv_numGrant0", {44, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow_ns0", {45, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow_ns0", {46, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow_s0", {47, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow_s0", {48, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow0", {49, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow0", {50, RX_TILE_SCLK}},
{"CI_PERF_slv_npNotAccepted_ns0", {51, RX_TILE_SCLK}},
{"CI_PERF_slv_npNotAccepted_s0", {52, RX_TILE_SCLK}},
{"CI_PERF_slv_num128BRdCpl0", {53, RX_TILE_SCLK}},
{"CI_PERF_slv_num32BMemRdTx0", {54, RX_TILE_SCLK}},
{"CI_PERF_slv_num64BMemRdTx0", {55, RX_TILE_SCLK}},
{"CI_PERF_slv_num16BMemWrTx0", {56, RX_TILE_SCLK}},
{"CI_PERF_slv_num32BMemWrTx0", {57, RX_TILE_SCLK}},
{"CI_PERF_slv_num64BMemWrTx0", {58, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_Bandwidth0", {59, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_Bandwidth0", {60, RX_TILE_SCLK}},
{"TX_PERF_S_RCLK_s_tag_buf_empty", {61, RX_TILE_SCLK}},
{"P_request_latency_500ns_or_more", {62, RX_TILE_SCLK}},
{"P_request_latency_250_to_500ns", {63, RX_TILE_SCLK}},
{"P_request_latency_100_to_250ns", {64, RX_TILE_SCLK}},
{"P_request_latency_100ns_or_less", {65, RX_TILE_SCLK}},
{"NP_request_latency_500ns_or_more", {66, RX_TILE_SCLK}},
{"NP_request_latency_250_to_500ns", {67, RX_TILE_SCLK}},
{"NP_request_latency_100_to_250ns", {68, RX_TILE_SCLK}},
{"NP_request_latency_100ns_or_less", {69, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_cpl_slot[0]", {70, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_tag[0]", {71, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_d_credit[0]", {72, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_h_credit[0]", {73, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_tag[0]", {74, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_d_credit[0]", {75, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_h_credit[0]", {76, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_no_tags_q", {77, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_data_credits_q", {78, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_req_credits_q", {79, RX_TILE_SCLK}},
{"CISLV_PERF_no_cpl_slots_q[0]", {80, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_no_tags_q", {81, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_data_credits_q", {82, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_req_credits_q", {83, RX_TILE_SCLK}}
};
{"SB_PERF_non_post_rd_to_HI", {2, RX_TILE_SCLK}},
{"SB_PERF_non_post_wr_to_HI", {3, RX_TILE_SCLK}},
{"SB_PERF_post_req_to_HI", {4, RX_TILE_SCLK}},
{"SB_PERF_non_post_rd_to_HI_0", {5, RX_TILE_SCLK}},
{"SB_PERF_non_post_wr_to_HI_0", {6, RX_TILE_SCLK}},
{"SB_PERF_post_req_to_HI_0", {7, RX_TILE_SCLK}},
{"SB_PERF_rd_hdr_REN_0", {8, RX_TILE_SCLK}},
{"SB_PERF_wr_hdr_REN_0", {9, RX_TILE_SCLK}},
{"SB_PERF_data_REN_0", {10, RX_TILE_SCLK}},
{"SB_PERF_rd_hdr_empty_0", {11, RX_TILE_SCLK}},
{"SB_PERF_wr_hdr_empty_0", {12, RX_TILE_SCLK}},
{"SB_PERF_data_empty_0", {13, RX_TILE_SCLK}},
{"CI_PERF_slv_total128BRdCpl", {29, RX_TILE_SCLK}},
{"CI_PERF_slv_total32BMemRdTx", {30, RX_TILE_SCLK}},
{"CI_PERF_slv_total64BMemRdTx", {31, RX_TILE_SCLK}},
{"CI_PERF_slv_total16BMemWrTx", {32, RX_TILE_SCLK}},
{"CI_PERF_slv_total32BMemWrTx", {33, RX_TILE_SCLK}},
{"CI_PERF_slv_total64BMemWrTx", {34, RX_TILE_SCLK}},
{"CI_PERF_slv_totalTx", {35, RX_TILE_SCLK}},
{"CI_PERF_slv_stallGrantGen", {36, RX_TILE_SCLK}},
{"CI_PERF_slv_totalGrant", {37, RX_TILE_SCLK}},
{"CI_PERF_slv_txPending", {38, RX_TILE_SCLK}},
{"CI_PERF_slv_numMemRdLT32B", {39, RX_TILE_SCLK}},
{"CI_PERF_slv_numMemRdLT16B", {40, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemTx", {41, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemRdTx", {42, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemWrTx", {43, RX_TILE_SCLK}},
{"CI_PERF_slv_numGrant0", {44, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow_ns0", {45, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow_ns0", {46, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow_s0", {47, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow_s0", {48, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow0", {49, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow0", {50, RX_TILE_SCLK}},
{"CI_PERF_slv_npNotAccepted_ns0", {51, RX_TILE_SCLK}},
{"CI_PERF_slv_npNotAccepted_s0", {52, RX_TILE_SCLK}},
{"CI_PERF_slv_num128BRdCpl0", {53, RX_TILE_SCLK}},
{"CI_PERF_slv_num32BMemRdTx0", {54, RX_TILE_SCLK}},
{"CI_PERF_slv_num64BMemRdTx0", {55, RX_TILE_SCLK}},
{"CI_PERF_slv_num16BMemWrTx0", {56, RX_TILE_SCLK}},
{"CI_PERF_slv_num32BMemWrTx0", {57, RX_TILE_SCLK}},
{"CI_PERF_slv_num64BMemWrTx0", {58, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_Bandwidth0", {59, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_Bandwidth0", {60, RX_TILE_SCLK}},
{"TX_PERF_S_RCLK_s_tag_buf_empty", {61, RX_TILE_SCLK}},
{"P_request_latency_500ns_or_more", {62, RX_TILE_SCLK}},
{"P_request_latency_250_to_500ns", {63, RX_TILE_SCLK}},
{"P_request_latency_100_to_250ns", {64, RX_TILE_SCLK}},
{"P_request_latency_100ns_or_less", {65, RX_TILE_SCLK}},
{"NP_request_latency_500ns_or_more", {66, RX_TILE_SCLK}},
{"NP_request_latency_250_to_500ns", {67, RX_TILE_SCLK}},
{"NP_request_latency_100_to_250ns", {68, RX_TILE_SCLK}},
{"NP_request_latency_100ns_or_less", {69, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_cpl_slot[0]", {70, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_tag[0]", {71, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_d_credit[0]", {72, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_h_credit[0]", {73, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_tag[0]", {74, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_d_credit[0]", {75, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_h_credit[0]", {76, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_no_tags_q", {77, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_data_credits_q", {78, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_req_credits_q", {79, RX_TILE_SCLK}},
{"CISLV_PERF_no_cpl_slots_q[0]", {80, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_no_tags_q", {81, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_data_credits_q", {82, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_req_credits_q", {83, RX_TILE_SCLK}}};
}
} // namespace PCIE_MI200
#endif
@@ -42,6 +42,6 @@ class PerfMon {
std::vector<std::string> counter_names_;
};
} // namespace rocprofiler
} // namespace rocprofiler
#endif
@@ -31,10 +31,8 @@ THE SOFTWARE.
#include "util/hsa_rsrc_factory.h"
namespace rocprofiler {
size_t CreateGpuCommand(gpu_cmd_op_t op,
const rocprofiler::util::AgentInfo* agent_info,
packet_t* command,
const size_t& slot_count) {
size_t CreateGpuCommand(gpu_cmd_op_t op, const rocprofiler::util::AgentInfo* agent_info,
packet_t* command, const size_t& slot_count) {
if (op >= NUMBER_GPU_CMD_OP) EXC_RAISING(HSA_STATUS_ERROR, "bad op value (" << op << ")");
const bool is_legacy = (strncmp(agent_info->name, "gfx8", 4) == 0);
@@ -49,14 +47,15 @@ size_t CreateGpuCommand(gpu_cmd_op_t op,
profile.agent = agent_info->dev_id;
// Query for cmd buffer size
hsa_ven_amd_aqlprofile_info_type_t info_type =
(hsa_ven_amd_aqlprofile_info_type_t)((int)HSA_VEN_AMD_AQLPROFILE_INFO_ENABLE_CMD + (int)op);
hsa_status_t status = hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, NULL);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "get_info(ENABLE_CMD ).size exc, op(" << int(op) << ")");
(hsa_ven_amd_aqlprofile_info_type_t)((int)HSA_VEN_AMD_AQLPROFILE_INFO_ENABLE_CMD + (int)op);
hsa_status_t status =
hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, NULL);
if (status != HSA_STATUS_SUCCESS)
EXC_RAISING(status, "get_info(ENABLE_CMD ).size exc, op(" << int(op) << ")");
if (profile.command_buffer.size == 0) EXC_RAISING(status, "get_info(ENABLE_CMD).size == 0");
// Allocate cmd buffer
const size_t aligment_mask = 0x100 - 1;
profile.command_buffer.ptr =
hsa_rsrc->AllocateSysMemory(agent_info, profile.command_buffer.size);
profile.command_buffer.ptr = hsa_rsrc->AllocateSysMemory(agent_info, profile.command_buffer.size);
if ((reinterpret_cast<uintptr_t>(profile.command_buffer.ptr) & aligment_mask) != 0) {
EXC_RAISING(status, "profile.command_buffer.ptr bad alignment");
}
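The alignment test in the hunk above uses a mask of `0x100 - 1`, which checks that the low 8 bits of the address are zero, i.e. that the buffer is 256-byte aligned. A minimal sketch of the same check in isolation follows; `IsAligned` is a hypothetical helper that does not appear in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the patch): returns true when `ptr` is a
// multiple of `alignment`. `alignment` must be a non-zero power of two.
inline bool IsAligned(const void* ptr, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  const std::uintptr_t mask = alignment - 1;  // 0x100 -> 0xFF
  return (reinterpret_cast<std::uintptr_t>(ptr) & mask) == 0;
}
```

Under this sketch, `IsAligned(profile.command_buffer.ptr, 0x100)` would express the same condition that the patch negates before raising the "bad alignment" exception.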
@@ -66,15 +65,18 @@ size_t CreateGpuCommand(gpu_cmd_op_t op,
packet_t packet{};
// Query for cmd buffer data
status = hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, &packet);
status =
hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, &packet);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "get_info(ENABLE_CMD).data exc");
// Check for legacy GFXIP
status = hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_legacy_get_pm4(&packet, command);
if (status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "hsa_ven_amd_aqlprofile_legacy_get_pm4");
if (status != HSA_STATUS_SUCCESS)
AQL_EXC_RAISING(status, "hsa_ven_amd_aqlprofile_legacy_get_pm4");
} else {
// Query for cmd buffer data
status = hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, command);
status =
hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, command);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "get_info(ENABLE_CMD).data exc");
}
@@ -91,15 +93,14 @@ struct gpu_cmd_key_t {
uint32_t node_id;
};
struct gpu_cmd_fncomp_t {
bool operator() (const gpu_cmd_key_t& a, const gpu_cmd_key_t& b) const {
bool operator()(const gpu_cmd_key_t& a, const gpu_cmd_key_t& b) const {
return (a.op < b.op) || ((a.op == b.op) && (a.node_id < b.node_id));
}
};
typedef std::map<gpu_cmd_key_t, gpu_cmd_entry_t, gpu_cmd_fncomp_t> gpu_cmd_map_t;
size_t GetGpuCommand(gpu_cmd_op_t op,
const rocprofiler::util::AgentInfo* agent_info,
packet_t** command_out) {
size_t GetGpuCommand(gpu_cmd_op_t op, const rocprofiler::util::AgentInfo* agent_info,
packet_t** command_out) {
thread_local gpu_cmd_map_t map;
// Getting NUMA node id
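The comparator `gpu_cmd_fncomp_t` in the hunk above orders keys lexicographically by `op` and then `node_id`, which is what `std::map` needs to treat two keys as equal only when both fields match. A minimal sketch of the same ordering written with `std::tie` follows; `Key`, `KeyLess`, and `InsertOnce` are illustrative names, not identifiers from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>

// Illustrative stand-in for gpu_cmd_key_t (field types are assumptions).
struct Key {
  int op;
  uint32_t node_id;
};

// Lexicographic strict weak ordering: compare op first, then node_id.
struct KeyLess {
  bool operator()(const Key& a, const Key& b) const {
    return std::tie(a.op, a.node_id) < std::tie(b.op, b.node_id);
  }
};

// Returns true only the first time a given (op, node_id) pair is inserted,
// mirroring the `ret.second` check that guards CreateGpuCommand.
inline bool InsertOnce(std::map<Key, int, KeyLess>& cache, const Key& key) {
  return cache.insert({key, 0}).second;
}
```

The tuple comparison is equivalent to the hand-written `(a.op < b.op) || ((a.op == b.op) && (a.node_id < b.node_id))` and is harder to get wrong when a key gains more fields.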
@@ -112,7 +113,8 @@ size_t GetGpuCommand(gpu_cmd_op_t op,
auto ret = map.insert({gpu_cmd_key_t{op, node_id}, gpu_cmd_entry_t{}});
gpu_cmd_map_t::iterator it = ret.first;
if (ret.second) {
it->second.size = CreateGpuCommand(op, agent_info, it->second.command, Profile::LEGACY_SLOT_SIZE_PKT);
it->second.size =
CreateGpuCommand(op, agent_info, it->second.command, Profile::LEGACY_SLOT_SIZE_PKT);
}
*command_out = it->second.command;
@@ -37,9 +37,8 @@ enum gpu_cmd_op_t {
NUMBER_GPU_CMD_OP
};
size_t GetGpuCommand(gpu_cmd_op_t op,
const rocprofiler::util::AgentInfo* agent_info,
packet_t** command_out);
size_t GetGpuCommand(gpu_cmd_op_t op, const rocprofiler::util::AgentInfo* agent_info,
packet_t** command_out);
static inline size_t IssueGpuCommand(gpu_cmd_op_t op,
const rocprofiler::util::AgentInfo* agent_info,
@@ -55,9 +54,7 @@ static inline size_t IssueGpuCommand(gpu_cmd_op_t op,
return HSA_STATUS_SUCCESS;
}
static inline size_t IssueGpuCommand(gpu_cmd_op_t op,
hsa_agent_t agent,
hsa_queue_t* queue) {
static inline size_t IssueGpuCommand(gpu_cmd_op_t op, hsa_agent_t agent, hsa_queue_t* queue) {
rocprofiler::util::HsaRsrcFactory* hsa_rsrc = &rocprofiler::util::HsaRsrcFactory::Instance();
const rocprofiler::util::AgentInfo* agent_info = hsa_rsrc->GetAgentInfo(agent);
return IssueGpuCommand(op, agent_info, queue);
@@ -55,31 +55,30 @@ struct block_status_t {
// Metrics set class
class MetricsGroup {
public:
public:
// Info map type
typedef std::map<std::string, const Metric*> info_map_t;
// Blocks map type
typedef std::map<block_des_t, block_status_t, lt_block_des> blocks_map_t;
MetricsGroup(const util::AgentInfo* agent_info) :
agent_info_(agent_info)
{
MetricsGroup(const util::AgentInfo* agent_info) : agent_info_(agent_info) {
metrics_ = MetricsDict::Create(agent_info);
if (metrics_ == NULL) EXC_RAISING(HSA_STATUS_ERROR, "MetricsDict create failed");
}
void Print(FILE* file) const {
for (const Metric* metric : metrics_vec_) {
fprintf(file, " %s", metric->GetName().c_str()); fflush(stdout);
fprintf(file, " %s", metric->GetName().c_str());
fflush(stdout);
}
fprintf(file, "\n"); fflush(stdout);
fprintf(file, "\n");
fflush(stdout);
}
static const Metric* GetMetric(const MetricsDict* metrics, const std::string& name) {
// Metric object
const Metric* metric = metrics->Get(name);
if (metric == NULL)
EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name << "' is not found");
if (metric == NULL) EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name << "' is not found");
return metric;
}
@@ -95,9 +94,7 @@ class MetricsGroup {
}
// Add metric
bool AddMetric(const rocprofiler_feature_t* info) {
return AddMetric(GetMetric(metrics_, info));
}
bool AddMetric(const rocprofiler_feature_t* info) { return AddMetric(GetMetric(metrics_, info)); }
bool AddMetric(const Metric* metric) {
// Blocks utilization delta
@@ -125,8 +122,9 @@ class MetricsGroup {
query.events = event;
uint32_t block_counters;
hsa_status_t status = util::HsaRsrcFactory::Instance().AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(
&query, HSA_VEN_AMD_AQLPROFILE_INFO_BLOCK_COUNTERS, &block_counters);
hsa_status_t status =
util::HsaRsrcFactory::Instance().AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(
&query, HSA_VEN_AMD_AQLPROFILE_INFO_BLOCK_COUNTERS, &block_counters);
if (status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "get block_counters info");
block_status.max_counters = block_counters;
}
@@ -141,7 +139,8 @@ class MetricsGroup {
metrics_vec_.push_back(metric);
info_map_[metric->GetName()] = metric;
for (const counter_t* counter : counters_vec) {
if (info_map_.find(counter->name) == info_map_.end()) info_map_[counter->name] = NewCounterInfo(counter->name);
if (info_map_.find(counter->name) == info_map_.end())
info_map_[counter->name] = NewCounterInfo(counter->name);
}
for (const auto& entry : blocks_delta) {
blocks_map_[entry.first] = entry.second;
@@ -150,10 +149,8 @@ class MetricsGroup {
return true;
}
private:
const Metric* NewCounterInfo(const std::string& name) const {
return GetMetric(metrics_, name);
}
private:
const Metric* NewCounterInfo(const std::string& name) const { return GetMetric(metrics_, name); }
// Agent info
const util::AgentInfo* const agent_info_;
@@ -169,10 +166,10 @@ class MetricsGroup {
// Metrics groups class
class MetricsGroupSet {
public:
MetricsGroupSet(const util::AgentInfo* agent_info, const rocprofiler_feature_t* info_array, const uint32_t info_count) :
agent_info_(agent_info)
{
public:
MetricsGroupSet(const util::AgentInfo* agent_info, const rocprofiler_feature_t* info_array,
const uint32_t info_count)
: agent_info_(agent_info) {
metrics_ = MetricsDict::Create(agent_info);
if (metrics_ == NULL) EXC_RAISING(HSA_STATUS_ERROR, "MetricsDict create failed");
Initialize(info_array, info_count);
@@ -186,12 +183,13 @@ class MetricsGroupSet {
void Print(FILE* file) const {
for (const auto* group : groups_) {
fprintf(stdout, " pmc : "); fflush(stdout);
fprintf(stdout, " pmc : ");
fflush(stdout);
group->Print(file);
}
}
private:
private:
void Initialize(const rocprofiler_feature_t* info_array, const uint32_t info_count) {
std::multimap<uint32_t, const Metric*, std::greater<uint32_t> > input_metrics;
for (unsigned i = 0; i < info_count; ++i) {
@@ -202,7 +200,8 @@ class MetricsGroupSet {
input_metrics.insert({counters_num, metric});
if (MetricsGroup(agent_info_).AddMetric(metric) == false) {
AQL_EXC_RAISING(HSA_STATUS_ERROR, "Metric '" << metric->GetName() << "' doesn't fit in one group");
AQL_EXC_RAISING(HSA_STATUS_ERROR,
"Metric '" << metric->GetName() << "' doesn't fit in one group");
}
}
#if 0
@@ -239,4 +238,4 @@ class MetricsGroupSet {
} // namespace rocprofiler
#endif // SRC_CORE_GROUP_SET_H_
#endif // SRC_CORE_GROUP_SET_H_
@@ -62,33 +62,28 @@ AgentInfo::AgentInfo(const hsa_agent_t agent, ::CoreApiTable* table) : handle_(a
table->hsa_agent_get_info_fn(
agent, static_cast<hsa_agent_info_t>(HSA_AMD_AGENT_INFO_NUM_SHADER_ENGINES), &se_num_);
if (table->hsa_agent_get_info_fn(
agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_NUM_SHADER_ARRAYS_PER_SE,
&shader_arrays_per_se_) != HSA_STATUS_SUCCESS ||
table->hsa_agent_get_info_fn(
agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_MAX_WAVES_PER_CU,
&waves_per_cu_) != HSA_STATUS_SUCCESS)
{
rocprofiler::fatal("hsa_agent_get_info for gfxip hardware configuration failed");
if (table->hsa_agent_get_info_fn(agent,
(hsa_agent_info_t)HSA_AMD_AGENT_INFO_NUM_SHADER_ARRAYS_PER_SE,
&shader_arrays_per_se_) != HSA_STATUS_SUCCESS ||
table->hsa_agent_get_info_fn(agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_MAX_WAVES_PER_CU,
&waves_per_cu_) != HSA_STATUS_SUCCESS) {
rocprofiler::fatal("hsa_agent_get_info for gfxip hardware configuration failed");
}
compute_units_per_sh_ = cu_num_ / (se_num_ * shader_arrays_per_se_);
wave_slots_per_simd_ = waves_per_cu_ / simds_per_cu_;
if (table->hsa_agent_get_info_fn(
agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_DOMAIN,
&pci_domain_) != HSA_STATUS_SUCCESS ||
table->hsa_agent_get_info_fn(
agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_BDFID,
&pci_location_id_) != HSA_STATUS_SUCCESS)
{
rocprofiler::fatal("hsa_agent_get_info for PCI info failed");
if (table->hsa_agent_get_info_fn(agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_DOMAIN,
&pci_domain_) != HSA_STATUS_SUCCESS ||
table->hsa_agent_get_info_fn(agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_BDFID,
&pci_location_id_) != HSA_STATUS_SUCCESS) {
rocprofiler::fatal("hsa_agent_get_info for PCI info failed");
}
// TODO(saurabh, giovanni): Remove this in 5.7
if (table->hsa_agent_get_info_fn(agent,
static_cast<hsa_agent_info_t>(HSA_AMD_AGENT_INFO_NUM_XCC), &xcc_num_) != HSA_STATUS_SUCCESS) {
xcc_num_ = 1;
if (table->hsa_agent_get_info_fn(agent, static_cast<hsa_agent_info_t>(HSA_AMD_AGENT_INFO_NUM_XCC),
&xcc_num_) != HSA_STATUS_SUCCESS) {
xcc_num_ = 1;
}
}
@@ -33,8 +33,8 @@ Agent::AgentInfo& GetAgentInfo(decltype(hsa_agent_t::handle) handle) {
if (agent_info_map.find(handle) != agent_info_map.end()) {
return agent_info_map.at(handle);
} else {
std::cerr << std::string("Error: Can't find Agent with handle(") << std::to_string(handle) <<
") in this system" << std::endl;
std::cerr << std::string("Error: Can't find Agent with handle(") << std::to_string(handle)
<< ") in this system" << std::endl;
abort();
}
}
@@ -49,9 +49,7 @@ void SetAgentInfo(decltype(hsa_agent_t::handle) handle, const Agent::AgentInfo&
}
}
std::vector<hsa_agent_t>& GetCPUAgentList() {
return cpu_agents_list;
}
std::vector<hsa_agent_t>& GetCPUAgentList() { return cpu_agents_list; }
hsa_agent_t GetAgentByIndex(uint64_t agent_index) {
std::lock_guard<std::mutex> lock(agents_map_lock);
@@ -60,8 +58,8 @@ hsa_agent_t GetAgentByIndex(uint64_t agent_index) {
return hsa_agent_t{agent_info.second.getHandle()};
}
}
std::cerr << std::string("Error: Can't find Agent with Index(") << std::to_string(agent_index) <<
") in this system" << std::endl;
std::cerr << std::string("Error: Can't find Agent with Index(") << std::to_string(agent_index)
<< ") in this system" << std::endl;
abort();
}
@@ -95,7 +95,7 @@ namespace rocprofiler {
namespace hsa_support {
void Initialize(HsaApiTable* Table);
hsa_status_t hsa_iterate_agents_cb(hsa_agent_t agent, void *data);
hsa_status_t hsa_iterate_agents_cb(hsa_agent_t agent, void* data);
void Finalize();
bool IterateCounters(rocprofiler_counters_info_callback_t counters_info_callback);
@@ -181,7 +181,7 @@ InitializeAqlPackets(hsa_agent_t cpu_agent, hsa_agent_t gpu_agent,
// TODO: validate needs to be called on each events_list[i]
// Validating the events array for the specified gpu agent
if(events_list.size() > 0) {
if (events_list.size() > 0) {
bool validate_event_result;
status =
hsa_ven_amd_aqlprofile_validate_event(gpu_agent, &events_list[0], &validate_event_result);
@@ -234,9 +234,10 @@ InitializeAqlPackets(hsa_agent_t cpu_agent, hsa_agent_t gpu_agent,
}
}
for(auto& cname : counter_names) {
if(cname.compare("KERNEL_DURATION")==0) {
rocprofiler::Metric* metric = const_cast<rocprofiler::Metric*>(metricsDict[gpu_agent.handle]->Get(cname));
for (auto& cname : counter_names) {
if (cname.compare("KERNEL_DURATION") == 0) {
rocprofiler::Metric* metric =
const_cast<rocprofiler::Metric*>(metricsDict[gpu_agent.handle]->Get(cname));
if (metric == nullptr) std::cout << cname << " not found in metricsDict\n";
context->metrics_list.push_back(metric);
}
@@ -315,7 +316,7 @@ InitializeAqlPackets(hsa_agent_t cpu_agent, hsa_agent_t gpu_agent,
hsa_agent_t ag_list[ag_list_count];
ag_list[0] = gpu_agent;
if(context->events_list.size() > 0) {
if (context->events_list.size() > 0) {
// Preparing an Getting the size of the command and output buffers
status = hsa_ven_amd_aqlprofile_start(profile, NULL);
// CHECK_HSA_STATUS("Error: Getting Buffers Size", status);
@@ -510,7 +511,8 @@ uint8_t* AllocateLocalMemory(size_t size, hsa_amd_memory_pool_t* gpu_pool) {
return ptr;
}
hsa_status_t Allocate(hsa_agent_t gpu_agent, hsa_ven_amd_aqlprofile_profile_t* profile, size_t att_buffer_size) {
hsa_status_t Allocate(hsa_agent_t gpu_agent, hsa_ven_amd_aqlprofile_profile_t* profile,
size_t att_buffer_size) {
Agent::AgentInfo& agentInfo = rocprofiler::hsa_support::GetAgentInfo(gpu_agent.handle);
profile->command_buffer.ptr =
AllocateSysMemory(gpu_agent, profile->command_buffer.size, &agentInfo.cpu_pool);
@@ -435,16 +435,18 @@ bool AsyncSignalHandler(hsa_signal_value_t signal_value, void* data) {
pending->session_id = GetROCProfilerSingleton()->GetCurrentSessionId();
}
if (pending->counters_count > 0) {
if (xcc_id == 0 && pending->context && pending->context->metrics_list.size() > 0 && pending->profile) // call to GetCounterData() is required only once for a dispatch
if (xcc_id == 0 && pending->context && pending->context->metrics_list.size() > 0 &&
pending->profile) // call to GetCounterData() is required only once for a dispatch
rocprofiler::metrics::GetCounterData(pending->profile, queue_info_session->agent,
pending->context->results_list);
if (is_individual_xcc_mode)
rocprofiler::metrics::GetCountersAndMetricResultsByXcc(
xcc_id, pending->context->results_list, pending->context->results_map,
pending->context->metrics_list, time.end-time.start);
pending->context->metrics_list, time.end - time.start);
else
rocprofiler::metrics::GetMetricsData(pending->context->results_map,
pending->context->metrics_list, time.end-time.start);
pending->context->metrics_list,
time.end - time.start);
AddRecordCounters(&record, pending);
} else {
if (session->FindBuffer(pending->buffer_id)) {
@@ -652,8 +654,8 @@ void CheckNeededProfileConfigs() {
att_counters_names = filter->GetCounterData();
kernel_profile_names = std::get<std::vector<std::string>>(
filter->GetProperty(ROCPROFILER_FILTER_KERNEL_NAMES));
kernel_profile_dispatch_ids = std::get<std::vector<uint64_t>>(
filter->GetProperty(ROCPROFILER_FILTER_DISPATCH_IDS));
kernel_profile_dispatch_ids =
std::get<std::vector<uint64_t>>(filter->GetProperty(ROCPROFILER_FILTER_DISPATCH_IDS));
} else if (session && session->FindFilterWithKind(ROCPROFILER_PC_SAMPLING_COLLECTION)) {
is_pc_sampling_collection_mode = true;
}
@@ -685,23 +687,20 @@ std::pair<std::vector<bool>, bool> GetAllowedProfilesList(const void* packets, i
auto& kdispatch = static_cast<const hsa_kernel_dispatch_packet_s*>(packets)[i];
// If Dispatch IDs specified, profile based on dispatch ID
for (auto id : kernel_profile_dispatch_ids)
b_profile_this_object |= id == current_writer_id;
for (auto id : kernel_profile_dispatch_ids) b_profile_this_object |= id == current_writer_id;
try {
// Can throw
const std::string& kernel_name = ksymbols->at(kdispatch.kernel_object);
// If no filters specified, auto profile this kernel
if (kernel_profile_names.size() == 0 &&
kernel_profile_dispatch_ids.size() == 0 &&
if (kernel_profile_names.size() == 0 && kernel_profile_dispatch_ids.size() == 0 &&
kernel_name.find("__amd_rocclr_") == std::string::npos)
b_profile_this_object = true;
b_profile_this_object = true;
// Try to match the mangled kernel name with given matches in input.txt
// We want to initiate att profiling if a match exists
for (const std::string& kernel_matches : kernel_profile_names)
if (kernel_name.find(kernel_matches) != std::string::npos)
b_profile_this_object = true;
if (kernel_name.find(kernel_matches) != std::string::npos) b_profile_this_object = true;
} catch (...) {
printf("Warning: Unknown name for object %lu\n", kdispatch.kernel_object);
}
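The selection logic in the hunk above can be read as three rules: a dispatch-ID match selects the packet, an empty filter set selects every kernel except runtime-internal `__amd_rocclr_` kernels, and any name filter that occurs as a substring of the kernel name selects it. A minimal sketch of those rules as a pure function follows; `ShouldProfile` and its parameters are hypothetical and only approximate the state the patch reads from globals.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical pure-function version of the per-packet decision.
// matched_dispatch_id: result of comparing the writer ID with the ID filters.
inline bool ShouldProfile(const std::string& kernel_name,
                          const std::vector<std::string>& name_filters,
                          bool has_dispatch_id_filters, bool matched_dispatch_id) {
  bool profile = matched_dispatch_id;

  // No filters at all: profile everything except runtime-internal kernels.
  if (name_filters.empty() && !has_dispatch_id_filters &&
      kernel_name.find("__amd_rocclr_") == std::string::npos)
    profile = true;

  // Any substring match against the (mangled) kernel name selects the packet.
  for (const std::string& filter : name_filters)
    if (kernel_name.find(filter) != std::string::npos) profile = true;

  return profile;
}
```

The sketch leaves out the `try`/`catch` around the symbol lookup; in the patch, a kernel object with no known name only prints a warning and keeps whatever the dispatch-ID check decided.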
@@ -711,17 +710,13 @@ std::pair<std::vector<bool>, bool> GetAllowedProfilesList(const void* packets, i
can_profile_packet.push_back(b_profile_this_object);
}
// If we're going to skip all packets, need to update writer ID
if (!b_can_profile_anypacket)
WRITER_ID.store(current_writer_id, std::memory_order_release);
if (!b_can_profile_anypacket) WRITER_ID.store(current_writer_id, std::memory_order_release);
return {can_profile_packet, b_can_profile_anypacket};
}
hsa_ven_amd_aqlprofile_profile_t* ProcessATTParams(
Packet::packet_t& start_packet,
Packet::packet_t& stop_packet,
Queue& queue_info,
Agent::AgentInfo& agentInfo
) {
hsa_ven_amd_aqlprofile_profile_t* ProcessATTParams(Packet::packet_t& start_packet,
Packet::packet_t& stop_packet, Queue& queue_info,
Agent::AgentInfo& agentInfo) {
std::vector<hsa_ven_amd_aqlprofile_parameter_t> att_params;
int num_att_counters = 0;
uint32_t att_buffer_size = DEFAULT_ATT_BUFFER_SIZE;
@@ -731,15 +726,16 @@ hsa_ven_amd_aqlprofile_profile_t* ProcessATTParams(
case ROCPROFILER_ATT_PERFCOUNTER_NAME:
break;
case ROCPROFILER_ATT_BUFFER_SIZE:
att_buffer_size = std::max(96l<<10l, std::min(int64_t(param.value)<<20l, (1l<<32l)-(3l<<20)));
break; // Clip to [96KB, 4GB)
att_buffer_size =
std::max(96l << 10l, std::min(int64_t(param.value) << 20l, (1l << 32l) - (3l << 20)));
break; // Clip to [96KB, 4GB)
case ROCPROFILER_ATT_PERFCOUNTER:
num_att_counters += 1;
break;
default:
att_params.push_back(
{static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(param.parameter_name)),
param.value});
{static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(param.parameter_name)),
param.value});
}
}
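The `ROCPROFILER_ATT_BUFFER_SIZE` case above converts a value given in megabytes to bytes and clips it to a lower bound of 96 KB and an upper bound of 4 GB minus 3 MB, so the result still fits the 32-bit `att_buffer_size`. A minimal sketch of that clamp using `std::clamp` (C++17) follows; `ClampAttBufferSize` is a hypothetical name, and the pre-limit on the input is an addition that keeps the left shift from overflowing, not something the patch does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts a size in MiB to bytes, clipped to
// [96 KiB, 4 GiB - 3 MiB] so it is representable as uint32_t.
inline uint32_t ClampAttBufferSize(uint64_t megabytes) {
  constexpr int64_t kMinBytes = int64_t{96} << 10;
  constexpr int64_t kMaxBytes = (int64_t{1} << 32) - (int64_t{3} << 20);

  // Assumption: cap the input first so that `<< 20` cannot overflow int64_t.
  const uint64_t capped = std::min<uint64_t>(megabytes, 4096);
  const int64_t requested = static_cast<int64_t>(capped) << 20;

  return static_cast<uint32_t>(std::clamp(requested, kMinBytes, kMaxBytes));
}
```

Naming the two bounds makes the "Clip to [96KB, 4GB)" comment checkable against the code, which is harder with the nested `std::max`/`std::min` expression.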
@@ -760,22 +756,21 @@ hsa_ven_amd_aqlprofile_profile_t* ProcessATTParams(
printf("Only events from the SQ block can be selected for ATT.");
exit(1);
}
att_params.push_back({static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(
int(ROCPROFILER_ATT_PERFCOUNTER)),
event.counter_id | (event.counter_id ? (0xF << 24) : 0)});
att_params.push_back(
{static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(ROCPROFILER_ATT_PERFCOUNTER)),
event.counter_id | (event.counter_id ? (0xF << 24) : 0)});
num_att_counters += 1;
}
hsa_ven_amd_aqlprofile_parameter_t zero_perf = {
static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(ROCPROFILER_ATT_PERFCOUNTER)),
0};
static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(ROCPROFILER_ATT_PERFCOUNTER)), 0};
// Fill other perfcounters with 0's
for (; num_att_counters < 16; num_att_counters++) att_params.push_back(zero_perf);
}
// Get the PM4 Packets using packets_generator
return Packet::GenerateATTPackets(queue_info.GetCPUAgent(), queue_info.GetGPUAgent(),
att_params, &start_packet, &stop_packet, att_buffer_size);
return Packet::GenerateATTPackets(queue_info.GetCPUAgent(), queue_info.GetGPUAgent(), att_params,
&start_packet, &stop_packet, att_buffer_size);
}
/**
@@ -866,14 +861,16 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt
record_id);
if (session_data_count > 0 && profile.second) {
session->GetProfiler()->AddPendingSignals(
writer_id, record_id, original_packet.completion_signal, dispatch_packet.completion_signal, session_id, buffer_id,
profile.first, session_data_count, profile.second, kernel_properties,
(uint32_t)syscall(__NR_gettid), user_pkt_index, correlation_id);
writer_id, record_id, original_packet.completion_signal,
dispatch_packet.completion_signal, session_id, buffer_id, profile.first,
session_data_count, profile.second, kernel_properties, (uint32_t)syscall(__NR_gettid),
user_pkt_index, correlation_id);
} else {
session->GetProfiler()->AddPendingSignals(
writer_id, record_id, original_packet.completion_signal, dispatch_packet.completion_signal, session_id, buffer_id,
nullptr, session_data_count, nullptr, kernel_properties, (uint32_t)syscall(__NR_gettid),
user_pkt_index, correlation_id);
writer_id, record_id, original_packet.completion_signal,
dispatch_packet.completion_signal, session_id, buffer_id, nullptr, session_data_count,
nullptr, kernel_properties, (uint32_t)syscall(__NR_gettid), user_pkt_index,
correlation_id);
}
}
@@ -893,7 +890,8 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt
CreateSignal(0, &interrupt_signal);
// Adding Stop and Read PM4 Packets
if (session_data_count > 0 && is_counter_collection_mode && profiles.size() > 0 && profile.first && profile.first->stop_packet) {
if (session_data_count > 0 && is_counter_collection_mode && profiles.size() > 0 &&
profile.first && profile.first->stop_packet) {
hsa_signal_t dummy_signal{};
profile.first->stop_packet->header = HSA_PACKET_TYPE_VENDOR_SPECIFIC
<< HSA_PACKET_HEADER_TYPE;
@@ -937,7 +935,8 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt
bool can_profile_anypacket = false;
std::vector<bool> can_profile_packet;
std::tie(can_profile_packet, can_profile_anypacket) = GetAllowedProfilesList(packets, pkt_count);
std::tie(can_profile_packet, can_profile_anypacket) =
GetAllowedProfilesList(packets, pkt_count);
if (!can_profile_anypacket) {
/* Write the original packets to the hardware if no patch will be profiled */
@@ -964,8 +963,9 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt
// increment writer ID for every packet
if (bit_extract(original_packet.header, HSA_PACKET_HEADER_TYPE,
HSA_PACKET_HEADER_TYPE+HSA_PACKET_HEADER_WIDTH_TYPE-1) == HSA_PACKET_TYPE_KERNEL_DISPATCH)
writer_id = WRITER_ID.fetch_add(1, std::memory_order_release);
HSA_PACKET_HEADER_TYPE + HSA_PACKET_HEADER_WIDTH_TYPE - 1) ==
HSA_PACKET_TYPE_KERNEL_DISPATCH)
writer_id = WRITER_ID.fetch_add(1, std::memory_order_release);
continue;
}
@@ -37,33 +37,37 @@ SOFTWARE.
#include "util/exception.h"
#include "util/hsa_rsrc_factory.h"
#define HSA_RT(call) \
do { \
const hsa_status_t status = call; \
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, #call); \
} while(0)
#define IS_HSA_CALLBACK(ID) \
const auto __id = ID; (void)__id; \
void *__arg = arg_.load(); (void)__arg; \
rocprofiler_hsa_callback_fun_t __callback = \
(ID == ROCPROFILER_HSA_CB_ID_ALLOCATE) ? callbacks_.allocate: \
(ID == ROCPROFILER_HSA_CB_ID_DEVICE) ? callbacks_.device: \
(ID == ROCPROFILER_HSA_CB_ID_MEMCOPY) ? callbacks_.memcopy: \
(ID == ROCPROFILER_HSA_CB_ID_SUBMIT) ? callbacks_.submit: \
(ID == ROCPROFILER_HSA_CB_ID_KSYMBOL) ? callbacks_.ksymbol: \
callbacks_.codeobj; \
if ((__callback != NULL) && (recursion_ == false))
#define DO_HSA_CALLBACK \
do { \
recursion_ = true; \
__callback(__id, &data, __arg); \
recursion_ = false; \
#define HSA_RT(call) \
do { \
const hsa_status_t status = call; \
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, #call); \
} while (0)
#define ISSUE_HSA_CALLBACK(ID) \
do { IS_HSA_CALLBACK(ID) { DO_HSA_CALLBACK; } } while(0)
#define IS_HSA_CALLBACK(ID) \
const auto __id = ID; \
(void)__id; \
void* __arg = arg_.load(); \
(void)__arg; \
rocprofiler_hsa_callback_fun_t __callback = (ID == ROCPROFILER_HSA_CB_ID_ALLOCATE) \
? callbacks_.allocate \
: (ID == ROCPROFILER_HSA_CB_ID_DEVICE) ? callbacks_.device \
: (ID == ROCPROFILER_HSA_CB_ID_MEMCOPY) ? callbacks_.memcopy \
: (ID == ROCPROFILER_HSA_CB_ID_SUBMIT) ? callbacks_.submit \
: (ID == ROCPROFILER_HSA_CB_ID_KSYMBOL) ? callbacks_.ksymbol \
: callbacks_.codeobj; \
if ((__callback != NULL) && (recursion_ == false))
#define DO_HSA_CALLBACK \
do { \
recursion_ = true; \
__callback(__id, &data, __arg); \
recursion_ = false; \
} while (0)
#define ISSUE_HSA_CALLBACK(ID) \
do { \
IS_HSA_CALLBACK(ID) { DO_HSA_CALLBACK; } \
} while (0)
// Demangle C++ symbol name
static const char* cpp_demangle(const char* symname) {
@@ -74,15 +78,15 @@ static const char* cpp_demangle(const char* symname) {
}
namespace rocprofiler {
extern decltype(hsa_memory_allocate)* hsa_memory_allocate_fn;
extern decltype(hsa_memory_assign_agent)* hsa_memory_assign_agent_fn;
extern decltype(hsa_memory_copy)* hsa_memory_copy_fn;
extern decltype(hsa_amd_memory_pool_allocate)* hsa_amd_memory_pool_allocate_fn;
extern decltype(hsa_amd_memory_pool_free)* hsa_amd_memory_pool_free_fn;
extern decltype(hsa_amd_agents_allow_access)* hsa_amd_agents_allow_access_fn;
extern decltype(hsa_amd_memory_async_copy)* hsa_amd_memory_async_copy_fn;
extern decltype(hsa_executable_freeze)* hsa_executable_freeze_fn;
extern decltype(hsa_executable_destroy)* hsa_executable_destroy_fn;
extern decltype(::hsa_memory_allocate)* hsa_memory_allocate_fn;
extern decltype(::hsa_memory_assign_agent)* hsa_memory_assign_agent_fn;
extern decltype(::hsa_memory_copy)* hsa_memory_copy_fn;
extern decltype(::hsa_amd_memory_pool_allocate)* hsa_amd_memory_pool_allocate_fn;
extern decltype(::hsa_amd_memory_pool_free)* hsa_amd_memory_pool_free_fn;
extern decltype(::hsa_amd_agents_allow_access)* hsa_amd_agents_allow_access_fn;
extern decltype(::hsa_amd_memory_async_copy)* hsa_amd_memory_async_copy_fn;
extern decltype(::hsa_executable_freeze)* hsa_executable_freeze_fn;
extern decltype(::hsa_executable_destroy)* hsa_executable_destroy_fn;
class HsaInterceptor {
public:
@@ -95,10 +99,7 @@ class HsaInterceptor {
if (enable_) {
// Fetching AMD Loader HSA extension API
HSA_RT(hsa_system_get_major_extension_table(
HSA_EXTENSION_AMD_LOADER,
1,
sizeof(hsa_ven_amd_loader_1_01_pfn_t),
&LoaderApiTable));
HSA_EXTENSION_AMD_LOADER, 1, sizeof(hsa_ven_amd_loader_1_01_pfn_t), &LoaderApiTable));
// Saving original API functions
hsa_memory_allocate_fn = table->core_->hsa_memory_allocate_fn;
@@ -131,10 +132,7 @@ class HsaInterceptor {
}
private:
static hsa_status_t HSA_API MemoryAllocate(hsa_region_t region,
size_t size,
void** ptr)
{
static hsa_status_t HSA_API MemoryAllocate(hsa_region_t region, size_t size, void** ptr) {
hsa_status_t status = HSA_STATUS_SUCCESS;
HSA_RT(hsa_memory_allocate_fn(region, size, ptr));
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_ALLOCATE) {
@@ -150,11 +148,8 @@ class HsaInterceptor {
return status;
}
static hsa_status_t MemoryAssignAgent(
void *ptr,
hsa_agent_t agent,
hsa_access_permission_t access)
{
static hsa_status_t MemoryAssignAgent(void* ptr, hsa_agent_t agent,
hsa_access_permission_t access) {
hsa_status_t status = HSA_STATUS_SUCCESS;
HSA_RT(hsa_memory_assign_agent_fn(ptr, agent, access));
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_DEVICE) {
@@ -169,11 +164,7 @@ class HsaInterceptor {
}
// Spawn device allow access callback
static void DeviceCallback(
uint32_t num_agents,
const hsa_agent_t* agents,
const void* ptr)
{
static void DeviceCallback(uint32_t num_agents, const hsa_agent_t* agents, const void* ptr) {
for (const hsa_agent_t* agent_p = agents; agent_p < (agents + num_agents); ++agent_p) {
hsa_agent_t agent = *agent_p;
rocprofiler_hsa_callback_data_t data{};
@@ -188,17 +179,11 @@ class HsaInterceptor {
}
// Agent allow access callback 'hsa_amd_agents_allow_access'
static hsa_status_t AgentsAllowAccess(
uint32_t num_agents,
const hsa_agent_t* agents,
const uint32_t* flags,
const void* ptr)
{
static hsa_status_t AgentsAllowAccess(uint32_t num_agents, const hsa_agent_t* agents,
const uint32_t* flags, const void* ptr) {
hsa_status_t status = HSA_STATUS_SUCCESS;
HSA_RT(hsa_amd_agents_allow_access_fn(num_agents, agents, flags, ptr));
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_DEVICE) {
DeviceCallback(num_agents, agents, ptr);
}
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_DEVICE) { DeviceCallback(num_agents, agents, ptr); }
return status;
}
@@ -218,12 +203,8 @@ class HsaInterceptor {
return HSA_STATUS_SUCCESS;
}
static hsa_status_t MemoryPoolAllocate(
hsa_amd_memory_pool_t pool,
size_t size,
uint32_t flags,
void** ptr)
{
static hsa_status_t MemoryPoolAllocate(hsa_amd_memory_pool_t pool, size_t size, uint32_t flags,
void** ptr) {
hsa_status_t status = HSA_STATUS_SUCCESS;
HSA_RT(hsa_amd_memory_pool_allocate_fn(pool, size, flags, ptr));
if (size != 0) {
@@ -232,8 +213,10 @@ class HsaInterceptor {
data.allocate.ptr = *ptr;
data.allocate.size = size;
HSA_RT(hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_SEGMENT, &data.allocate.segment));
HSA_RT(hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_GLOBAL_FLAGS, &data.allocate.global_flag));
HSA_RT(hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_SEGMENT,
&data.allocate.segment));
HSA_RT(hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_GLOBAL_FLAGS,
&data.allocate.global_flag));
DO_HSA_CALLBACK;
@@ -246,9 +229,7 @@ class HsaInterceptor {
}
return status;
}
static hsa_status_t MemoryPoolFree(
void* ptr)
{
static hsa_status_t MemoryPoolFree(void* ptr) {
hsa_status_t status = HSA_STATUS_SUCCESS;
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_ALLOCATE) {
rocprofiler_hsa_callback_data_t data{};
@@ -260,11 +241,7 @@ class HsaInterceptor {
return status;
}
static hsa_status_t MemoryCopy(
void *dst,
const void *src,
size_t size)
{
static hsa_status_t MemoryCopy(void* dst, const void* src, size_t size) {
hsa_status_t status = HSA_STATUS_SUCCESS;
HSA_RT(hsa_memory_copy_fn(dst, src, size));
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_MEMCOPY) {
@@ -277,17 +254,13 @@ class HsaInterceptor {
return status;
}
static hsa_status_t MemoryAsyncCopy(
void* dst, hsa_agent_t dst_agent, const void* src,
hsa_agent_t src_agent, size_t size,
uint32_t num_dep_signals,
const hsa_signal_t* dep_signals,
hsa_signal_t completion_signal)
{
static hsa_status_t MemoryAsyncCopy(void* dst, hsa_agent_t dst_agent, const void* src,
hsa_agent_t src_agent, size_t size, uint32_t num_dep_signals,
const hsa_signal_t* dep_signals,
hsa_signal_t completion_signal) {
hsa_status_t status = HSA_STATUS_SUCCESS;
HSA_RT(hsa_amd_memory_async_copy_fn(
dst, dst_agent, src, src_agent, size,
num_dep_signals, dep_signals, completion_signal));
HSA_RT(hsa_amd_memory_async_copy_fn(dst, dst_agent, src, src_agent, size, num_dep_signals,
dep_signals, completion_signal));
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_MEMCOPY) {
rocprofiler_hsa_callback_data_t data{};
data.memcopy.dst = dst;
@@ -298,14 +271,11 @@ class HsaInterceptor {
return status;
}
static hsa_status_t CodeObjectCallback(
hsa_executable_t executable,
hsa_loaded_code_object_t loaded_code_object,
void* arg)
{
static hsa_status_t CodeObjectCallback(hsa_executable_t executable,
hsa_loaded_code_object_t loaded_code_object, void* arg) {
const int free_flag = reinterpret_cast<long>(arg);
hsa_ven_amd_loader_code_object_storage_type_t storage_type =
HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_NONE;
HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_NONE;
int storage_fd = -1;
uint64_t memory_base = 0;
uint64_t memory_size = 0;
@@ -316,56 +286,45 @@ class HsaInterceptor {
char* uri_str = NULL;
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_TYPE,
&storage_type));
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_TYPE,
&storage_type));
if (storage_type == HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_FILE) {
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_FILE,
&storage_fd));
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_FILE,
&storage_fd));
if (storage_fd == -1) {
printf("CodeObjectCallback: fd == -1\n"); fflush(stdout);
abort();
printf("CodeObjectCallback: fd == -1\n");
fflush(stdout);
abort();
}
} else if (storage_type == HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_MEMORY) {
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_BASE,
&memory_base));
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_BASE,
&memory_base));
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_SIZE,
&memory_size));
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_SIZE,
&memory_size));
}
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_BASE,
&load_base));
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_BASE, &load_base));
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_SIZE,
&load_size));
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_SIZE, &load_size));
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_DELTA,
&load_delta));
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_DELTA, &load_delta));
// Getting URI
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI_LENGTH,
&uri_len));
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI_LENGTH, &uri_len));
uri_str = (char*)calloc(uri_len + 1, sizeof(char));
if (!uri_str) EXC_ABORT(HSA_STATUS_ERROR, "URI allocation");
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI,
uri_str));
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI, uri_str));
if (storage_type != HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_NONE) {
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_CODEOBJ) {
@@ -377,8 +336,8 @@ class HsaInterceptor {
data.codeobj.load_base = load_base;
data.codeobj.load_size = load_size;
data.codeobj.load_delta = load_delta;
data.codeobj.uri_length = uri_len;
data.codeobj.uri = uri_str;
data.codeobj.uri_length = uri_len;
data.codeobj.uri = uri_str;
data.codeobj.unload = free_flag;
DO_HSA_CALLBACK;
@@ -406,12 +365,8 @@ class HsaInterceptor {
uint32_t num_agents = 0;
hsa_agent_t* agents = NULL;
pointer_info.size = sizeof(hsa_amd_pointer_info_t);
HSA_RT(hsa_amd_pointer_info(
reinterpret_cast<void*>(load_base),
&pointer_info,
malloc,
&num_agents,
&agents));
HSA_RT(hsa_amd_pointer_info(reinterpret_cast<void*>(load_base), &pointer_info, malloc,
&num_agents, &agents));
DeviceCallback(num_agents, agents, reinterpret_cast<void*>(load_base));
}
@@ -420,11 +375,8 @@ class HsaInterceptor {
return HSA_STATUS_SUCCESS;
}
static hsa_status_t KernelSymbolCallback(
hsa_executable_t executable,
hsa_executable_symbol_t symbol,
void *arg)
{
static hsa_status_t KernelSymbolCallback(hsa_executable_t executable,
hsa_executable_symbol_t symbol, void* arg) {
const int free_flag = reinterpret_cast<long>(arg);
hsa_symbol_kind_t kind = (hsa_symbol_kind_t)0;
HSA_RT(hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_TYPE, &kind));
@@ -433,9 +385,11 @@ class HsaInterceptor {
const char* name = NULL;
uint32_t len = 0;
uint64_t obj = 0;
HSA_RT(hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &obj));
HSA_RT(
hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &obj));
if (free_flag == 0) {
HSA_RT(hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH, &len));
HSA_RT(
hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH, &len));
char sym_name[len + 1];
HSA_RT(hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_NAME, sym_name));
name = cpp_demangle(sym_name);
@@ -453,10 +407,7 @@ class HsaInterceptor {
return HSA_STATUS_SUCCESS;
}
static hsa_status_t ExecutableFreeze(
hsa_executable_t executable,
const char *options)
{
static hsa_status_t ExecutableFreeze(hsa_executable_t executable, const char* options) {
hsa_status_t status = HSA_STATUS_SUCCESS;
HSA_RT(hsa_executable_freeze_fn(executable, options));
@@ -466,39 +417,29 @@ class HsaInterceptor {
{ IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_ALLOCATE) is_codeobj_cb |= 1; }
if (is_codeobj_cb) {
LoaderApiTable.hsa_ven_amd_loader_executable_iterate_loaded_code_objects(
executable,
CodeObjectCallback,
reinterpret_cast<void*>(0));
executable, CodeObjectCallback, reinterpret_cast<void*>(0));
}
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_KSYMBOL) {
HSA_RT(hsa_executable_iterate_symbols(
executable,
KernelSymbolCallback,
reinterpret_cast<void*>(0)));
HSA_RT(hsa_executable_iterate_symbols(executable, KernelSymbolCallback,
reinterpret_cast<void*>(0)));
}
return status;
}
static hsa_status_t ExecutableDestroy(
hsa_executable_t executable)
{
static hsa_status_t ExecutableDestroy(hsa_executable_t executable) {
hsa_status_t status = HSA_STATUS_SUCCESS;
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_ALLOCATE) {
LoaderApiTable.hsa_ven_amd_loader_executable_iterate_loaded_code_objects(
executable,
CodeObjectCallback,
reinterpret_cast<void*>(1));
executable, CodeObjectCallback, reinterpret_cast<void*>(1));
}
{
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_KSYMBOL) {
HSA_RT(hsa_executable_iterate_symbols(
executable,
KernelSymbolCallback,
reinterpret_cast<void*>(1)));
HSA_RT(hsa_executable_iterate_symbols(executable, KernelSymbolCallback,
reinterpret_cast<void*>(1)));
}
}
@@ -33,9 +33,9 @@ THE SOFTWARE.
#include "util/hsa_rsrc_factory.h"
namespace rocprofiler {
extern decltype(hsa_queue_destroy)* hsa_queue_destroy_fn;
extern decltype(hsa_amd_queue_intercept_create)* hsa_amd_queue_intercept_create_fn;
extern decltype(hsa_amd_queue_intercept_register)* hsa_amd_queue_intercept_register_fn;
extern decltype(::hsa_queue_destroy)* hsa_queue_destroy_fn;
extern decltype(::hsa_amd_queue_intercept_create)* hsa_amd_queue_intercept_create_fn;
extern decltype(::hsa_amd_queue_intercept_register)* hsa_amd_queue_intercept_register_fn;
class HsaProxyQueue : public ProxyQueue {
public:
@@ -40,16 +40,13 @@ THE SOFTWARE.
#include "util/hsa_rsrc_factory.h"
namespace rocprofiler {
enum {
K_CONC_OFF = 0,
K_CONC_PMC = 1,
K_CONC_TRACE = 2
};
enum { K_CONC_OFF = 0, K_CONC_PMC = 1, K_CONC_TRACE = 2 };
extern decltype(hsa_queue_create)* hsa_queue_create_fn;
extern decltype(hsa_queue_destroy)* hsa_queue_destroy_fn;
extern decltype(::hsa_queue_create)* hsa_queue_create_fn;
extern decltype(::hsa_queue_destroy)* hsa_queue_destroy_fn;
static inline void print_packet(const void* in_p, const uint32_t& in_n, const uint32_t& w_n = UINT32_MAX) {
static inline void print_packet(const void* in_p, const uint32_t& in_n,
const uint32_t& w_n = UINT32_MAX) {
const uint32_t size32 = util::HsaRsrcFactory::CMD_SLOT_SIZE_B / 4;
const uint32_t* beg = (const uint32_t*)in_p;
const uint32_t* end = beg + (in_n * size32);
@@ -85,31 +82,33 @@ class InterceptQueue {
typedef std::recursive_mutex mutex_t;
typedef std::map<uint64_t, InterceptQueue*> obj_map_t;
typedef hsa_status_t (*queue_callback_t)(hsa_queue_t*, void* data);
typedef void (*queue_event_callback_t)(hsa_status_t status, hsa_queue_t *queue, void *arg);
typedef void (*queue_event_callback_t)(hsa_status_t status, hsa_queue_t* queue, void* arg);
typedef uint32_t queue_id_t;
static void HsaIntercept(HsaApiTable* table);
static hsa_status_t InterceptQueueCreate(hsa_agent_t agent, uint32_t size, hsa_queue_type32_t type,
void (*callback)(hsa_status_t status, hsa_queue_t* source,
void* data),
void* data, uint32_t private_segment_size,
uint32_t group_segment_size, hsa_queue_t** queue,
const bool& tracker_on) {
static hsa_status_t InterceptQueueCreate(
hsa_agent_t agent, uint32_t size, hsa_queue_type32_t type,
void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data), void* data,
uint32_t private_segment_size, uint32_t group_segment_size, hsa_queue_t** queue,
const bool& tracker_on) {
std::lock_guard<mutex_t> lck(mutex_);
hsa_status_t status = HSA_STATUS_ERROR;
if (in_create_call_) EXC_ABORT(status, "recursive InterceptQueueCreate()");
in_create_call_ = true;
ProxyQueue* proxy = ProxyQueue::Create(agent, size, type, queue_event_callback, data, private_segment_size,
group_segment_size, queue, &status);
ProxyQueue* proxy =
ProxyQueue::Create(agent, size, type, queue_event_callback, data, private_segment_size,
group_segment_size, queue, &status);
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, "ProxyQueue::Create()");
if (tracker_on || tracker_on_) {
if (tracker_ == NULL) tracker_ = &Tracker::Instance();
status = rocprofiler::util::HsaRsrcFactory::HsaApi()->hsa_amd_profiling_set_profiler_enabled(*queue, true);
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, "hsa_amd_profiling_set_profiler_enabled()");
status = rocprofiler::util::HsaRsrcFactory::HsaApi()->hsa_amd_profiling_set_profiler_enabled(
*queue, true);
if (status != HSA_STATUS_SUCCESS)
EXC_ABORT(status, "hsa_amd_profiling_set_profiler_enabled()");
}
InterceptQueue* obj = new InterceptQueue(agent, *queue, proxy);
@@ -138,15 +137,17 @@ class InterceptQueue {
void* data),
void* data, uint32_t private_segment_size,
uint32_t group_segment_size, hsa_queue_t** queue) {
return InterceptQueueCreate(agent, size, type, callback, data, private_segment_size, group_segment_size, queue, false);
return InterceptQueueCreate(agent, size, type, callback, data, private_segment_size,
group_segment_size, queue, false);
}
static hsa_status_t QueueCreateTracked(hsa_agent_t agent, uint32_t size, hsa_queue_type32_t type,
void (*callback)(hsa_status_t status, hsa_queue_t* source,
void* data),
void* data, uint32_t private_segment_size,
uint32_t group_segment_size, hsa_queue_t** queue) {
return InterceptQueueCreate(agent, size, type, callback, data, private_segment_size, group_segment_size, queue, true);
void (*callback)(hsa_status_t status, hsa_queue_t* source,
void* data),
void* data, uint32_t private_segment_size,
uint32_t group_segment_size, hsa_queue_t** queue) {
return InterceptQueueCreate(agent, size, type, callback, data, private_segment_size,
group_segment_size, queue, true);
}
static hsa_status_t QueueDestroy(hsa_queue_t* queue) {
@@ -170,8 +171,8 @@ class InterceptQueue {
return status;
}
static void OnSubmitCB_opt(const void* in_packets, uint64_t count, uint64_t user_que_idx, void* data,
hsa_amd_queue_intercept_packet_writer writer) {
static void OnSubmitCB_opt(const void* in_packets, uint64_t count, uint64_t user_que_idx,
void* data, hsa_amd_queue_intercept_packet_writer writer) {
const packet_t* packets_arr = reinterpret_cast<const packet_t*>(in_packets);
InterceptQueue* obj = reinterpret_cast<InterceptQueue*>(data);
Queue* proxy = obj->proxy_;
@@ -195,10 +196,10 @@ class InterceptQueue {
obj->queue_id,
completion_signal,
dispatch_packet,
NULL, // kernel_name
0, // kernel_object
NULL, // kernel_code
0, // (uint32_t)syscall(__NR_gettid),
NULL, // kernel_name
0, // kernel_object
NULL, // kernel_code
0, // (uint32_t)syscall(__NR_gettid),
NULL}; // record
// Calling dispatch callback
@@ -210,7 +211,8 @@ class InterceptQueue {
if (group.feature_count != 0) {
if (tracker_ != NULL) {
Group* context_group = context->GetGroup(group.index);
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal = context_group->GetDispatchSignal();
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal =
context_group->GetDispatchSignal();
Tracker::Enable_opt(context_group, completion_signal);
context_group->IncrRefsCount();
}
@@ -254,8 +256,9 @@ class InterceptQueue {
const uint32_t tid = syscall(__NR_gettid);
hsa_queue_t* qptr = obj->queue_;
const void* slot_ptr = util::HsaRsrcFactory::GetSlotPointer(qptr, user_que_idx);
printf("OnSubmitCB: %u:%u queue(%p:%lu) in(%p, %p, %lu) hdr(%u)\n",
pid, tid, qptr, user_que_idx, in_packets, slot_ptr, count, header_val); fflush(stdout);
printf("OnSubmitCB: %u:%u queue(%p:%lu) in(%p, %p, %lu) hdr(%u)\n", pid, tid, qptr,
user_que_idx, in_packets, slot_ptr, count, header_val);
fflush(stdout);
print_packet(in_packets, count);
abort();
#endif
@@ -277,8 +280,9 @@ class InterceptQueue {
if (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH) {
uint64_t kernel_object = dispatch_packet->kernel_object;
const amd_kernel_code_t* kernel_code = GetKernelCode(kernel_object);
kernel_name = (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH) ?
QueryKernelName(kernel_object, kernel_code) : NULL;
kernel_name = (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH)
? QueryKernelName(kernel_object, kernel_code)
: NULL;
}
// Prepareing submit callback data
@@ -311,8 +315,11 @@ class InterceptQueue {
const bool is_serial = (k_concurrent_ == K_CONC_OFF);
if (tracker_ != NULL) {
tracker_entry = tracker_->Alloc(obj->agent_info_->dev_id, dispatch_packet->completion_signal, is_serial);
if (is_serial) const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal = tracker_entry->signal;
tracker_entry = tracker_->Alloc(obj->agent_info_->dev_id,
dispatch_packet->completion_signal, is_serial);
if (is_serial)
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal =
tracker_entry->signal;
}
// Prepareing dispatch callback data
@@ -339,7 +346,9 @@ class InterceptQueue {
// Injecting profiling start/stop/read packets
if ((status != HSA_STATUS_SUCCESS) || (group.context == NULL)) {
if (tracker_entry != NULL) {
if (is_serial) const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal = tracker_entry->orig;
if (is_serial)
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal =
tracker_entry->orig;
tracker_->Delete(tracker_entry);
}
} else {
@@ -351,11 +360,11 @@ class InterceptQueue {
const pkt_vector_t& read_vector = context->ReadPackets(group.index);
pkt_vector_t packets;
if (is_serial) { // serial
if (is_serial) { // serial
packets = start_vector;
packets.insert(packets.end(), *packet);
packets.insert(packets.end(), stop_vector.begin(), stop_vector.end());
} else { // concurrent
} else { // concurrent
// Insert start packets once
auto inject_start = [&packets](const pkt_vector_t& starts) mutable {
packets = starts;
@@ -363,14 +372,15 @@ class InterceptQueue {
std::call_once(once_flag_, inject_start, start_vector);
// Reads at both kernel start and end (also with barriers)
assert(read_vector.size() >= 2 * start_vector.size());
auto mid = read_vector.begin() + read_vector.size()/2;
auto mid = read_vector.begin() + read_vector.size() / 2;
// Read at kernel start
packets.insert(packets.end(), read_vector.begin(), mid);
// Kernel dispatch packet
assert(tracker_entry != NULL);
// Bind dispatch and barrier signals with tracker entry
tracker_->SetHandler(tracker_entry, context->GetGroup(group.index));
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal = context->GetGroup(group.index)->GetDispatchSignal();
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal =
context->GetGroup(group.index)->GetDispatchSignal();
packets.insert(packets.end(), *packet);
// Read at kernel end
packets.insert(packets.end(), mid, read_vector.end());
@@ -379,7 +389,8 @@ class InterceptQueue {
if (tracker_entry != NULL) {
Group* context_group = context->GetGroup(group.index);
context_group->IncrRefsCount();
tracker_->EnableContext(tracker_entry, Context::Handler, reinterpret_cast<void*>(context_group));
tracker_->EnableContext(tracker_entry, Context::Handler,
reinterpret_cast<void*>(context_group));
}
if (writer != NULL) {
@@ -409,8 +420,8 @@ class InterceptQueue {
}
}
static void OnSubmitCB_ctrace(const void* in_packets, uint64_t count, uint64_t user_que_idx, void* data,
hsa_amd_queue_intercept_packet_writer writer) {
static void OnSubmitCB_ctrace(const void* in_packets, uint64_t count, uint64_t user_que_idx,
void* data, hsa_amd_queue_intercept_packet_writer writer) {
const packet_t* packets_arr = reinterpret_cast<const packet_t*>(in_packets);
InterceptQueue* obj = reinterpret_cast<InterceptQueue*>(data);
Queue* proxy = obj->proxy_;
@@ -431,8 +442,9 @@ class InterceptQueue {
if (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH) {
uint64_t kernel_object = dispatch_packet->kernel_object;
const amd_kernel_code_t* kernel_code = GetKernelCode(kernel_object);
kernel_name = (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH) ?
QueryKernelName(kernel_object, kernel_code) : NULL;
kernel_name = (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH)
? QueryKernelName(kernel_object, kernel_code)
: NULL;
}
// Prepareing submit callback data
@@ -529,7 +541,9 @@ class InterceptQueue {
Stop();
}
static inline void Start() { dispatch_callback_.store(callbacks_.dispatch, std::memory_order_release); }
static inline void Start() {
dispatch_callback_.store(callbacks_.dispatch, std::memory_order_release);
}
static inline void Stop() { dispatch_callback_.store(NULL, std::memory_order_relaxed); }
static void SetSubmitCallback(rocprofiler_hsa_callback_fun_t fun, void* arg) {
@@ -545,7 +559,7 @@ class InterceptQueue {
static uint32_t k_concurrent_;
private:
static void queue_event_callback(hsa_status_t status, hsa_queue_t *queue, void *arg) {
static void queue_event_callback(hsa_status_t status, hsa_queue_t* queue, void* arg) {
if (status != HSA_STATUS_SUCCESS) {
uint32_t* read_ptr32 = (uint32_t*)util::HsaRsrcFactory::GetReadPointer(queue);
print_packet(read_ptr32, 1);
@@ -582,12 +596,13 @@ class InterceptQueue {
const uint16_t kernel_object_flag = *((uint64_t*)kernel_code + 1);
if (kernel_object_flag == 0) {
if (!util::HsaRsrcFactory::IsExecutableTracking()) {
EXC_ABORT(HSA_STATUS_ERROR, "Error: V3 code object detected - code objects tracking should be enabled\n");
EXC_ABORT(HSA_STATUS_ERROR,
"Error: V3 code object detected - code objects tracking should be enabled\n");
}
}
const char* kernel_symname = (util::HsaRsrcFactory::IsExecutableTracking()) ?
util::HsaRsrcFactory::GetKernelNameRef(kernel_object) :
GetKernelName(kernel_code->runtime_loader_kernel_symbol);
const char* kernel_symname = (util::HsaRsrcFactory::IsExecutableTracking())
? util::HsaRsrcFactory::GetKernelNameRef(kernel_object)
: GetKernelName(kernel_code->runtime_loader_kernel_symbol);
return kernel_symname;
}
@@ -618,17 +633,13 @@ class InterceptQueue {
return status;
}
InterceptQueue(const hsa_agent_t& agent, hsa_queue_t* const queue, ProxyQueue* proxy) :
queue_(queue),
proxy_(proxy)
{
InterceptQueue(const hsa_agent_t& agent, hsa_queue_t* const queue, ProxyQueue* proxy)
: queue_(queue), proxy_(proxy) {
agent_info_ = util::HsaRsrcFactory::Instance().GetAgentInfo(agent);
queue_event_callback_ = NULL;
}
~InterceptQueue() {
ProxyQueue::Destroy(proxy_);
}
~InterceptQueue() { ProxyQueue::Destroy(proxy_); }
static const packet_word_t header_type_mask = (1ul << HSA_PACKET_HEADER_WIDTH_TYPE) - 1;
@@ -25,4 +25,4 @@ THE SOFTWARE.
namespace rocprofiler {
MetricsDict::map_t* MetricsDict::map_ = NULL;
MetricsDict::mutex_t MetricsDict::mutex_;
}
} // namespace rocprofiler
@@ -202,15 +202,15 @@ class MetricsDict {
xml_->AddConst("top.const.metric", "SE_NUM", agent_info->se_num);
ImportMetrics(agent_info, "const");
agent_name_ = agent_info->name;
if (agent_name_.find(':') != std::string::npos) // Remove compiler flags from the agent_name
if (agent_name_.find(':') != std::string::npos) // Remove compiler flags from the agent_name
agent_name_ = agent_name_.substr(0, agent_name_.find(':'));
std::unordered_set<std::string> supported_agent_names = {
"gfx906", "gfx908", "gfx90a", // Vega
"gfx940", "gfx941", "gfx942", // Mi300
"gfx906", "gfx908", "gfx90a", // Vega
"gfx940", "gfx941", "gfx942", // Mi300
"gfx1030", "gfx1031", "gfx1032", // Navi2x
"gfx1100", "gfx1101" // Navi3x
"gfx1100", "gfx1101" // Navi3x
};
if (supported_agent_names.find(agent_name_) != supported_agent_names.end()) {
ImportMetrics(agent_info, agent_name_);
@@ -140,7 +140,7 @@ class Profile {
static void SetConcurrent(profile_t* profile) {
// Check whether conconcurrent has been set
for (const parameter_t* p = profile->parameters;
p < (profile->parameters + profile->parameter_count); ++p) {
p < (profile->parameters + profile->parameter_count); ++p) {
// If yes, stop here
if (p->parameter_name == HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_K_CONCURRENT) {
return;
@@ -148,7 +148,7 @@ class Profile {
}
// Otherwise, try to set
parameter_t* parameters = new parameter_t[profile->parameter_count+1];
parameter_t* parameters = new parameter_t[profile->parameter_count + 1];
for (unsigned i = 0; i < profile->parameter_count; ++i) {
parameters[i].parameter_name = profile->parameters[i].parameter_name;
parameters[i].value = profile->parameters[i].value;
@@ -162,15 +162,16 @@ class Profile {
}
void BarrierPacket(packet_t* packet, const hsa_signal_t& prior_signal) {
hsa_barrier_and_packet_t* barrier =
reinterpret_cast<hsa_barrier_and_packet_t*>(packet);
hsa_barrier_and_packet_t* barrier = reinterpret_cast<hsa_barrier_and_packet_t*>(packet);
barrier->header = HSA_PACKET_TYPE_BARRIER_AND;
if (prior_signal.handle) barrier->dep_signal[0] = prior_signal; // set packet dependency
else barrier->header |= 1 << HSA_PACKET_HEADER_BARRIER; // set barrier bit
if (prior_signal.handle)
barrier->dep_signal[0] = prior_signal; // set packet dependency
else
barrier->header |= 1 << HSA_PACKET_HEADER_BARRIER; // set barrier bit
}
hsa_status_t Finalize(pkt_vector_t& start_vector, pkt_vector_t& stop_vector,
pkt_vector_t& read_vector, bool is_concurrent = false) {
pkt_vector_t& read_vector, bool is_concurrent = false) {
if (is_concurrent) SetConcurrent(&profile_);
hsa_status_t status = HSA_STATUS_SUCCESS;
@@ -180,8 +181,8 @@ class Profile {
const pfn_t* api = rsrc->AqlProfileApi();
packet_t start{};
packet_t stop{};
packet_t read{}; // read at kernel start
packet_t read2{}; // read at kernel end
packet_t read{}; // read at kernel start
packet_t read2{}; // read at kernel end
// Check the profile buffer sizes
status = api->hsa_ven_amd_aqlprofile_start(&profile_, NULL);
@@ -200,12 +201,12 @@ class Profile {
#ifdef AQLPROF_NEW_API
if (profile_.type == HSA_VEN_AMD_AQLPROFILE_EVENT_TYPE_PMC) {
rd_status = api->hsa_ven_amd_aqlprofile_read(&profile_, &read);
if (is_concurrent){ // concurrent: one more read
if (is_concurrent) { // concurrent: one more read
if (rd_status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "aqlprofile_read");
rd_status = api->hsa_ven_amd_aqlprofile_read(&profile_, &read2);
}
}
#if 0 // Read API returns error if disabled
#if 0 // Read API returns error if disabled
if (rd_status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "aqlprofile_read");
#endif
#endif
@@ -220,7 +221,8 @@ class Profile {
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "signal_create " << std::hex << status);
if (is_concurrent) {
status = hsa_signal_create(1, 0, NULL, &read_signal_);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "signal_create " << std::hex << status);
if (status != HSA_STATUS_SUCCESS)
EXC_RAISING(status, "signal_create " << std::hex << status);
read.completion_signal = read_signal_;
read2.completion_signal = completion_signal_;
} else {
@@ -239,7 +241,8 @@ class Profile {
BarrierPacket(&barrier_rd, read.completion_signal);
BarrierPacket(&barrier_rd2, dispatch_signal_);
status = hsa_signal_create(1, 0, NULL, &(barrier_signal_));
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "signal_create " << std::hex << status);
if (status != HSA_STATUS_SUCCESS)
EXC_RAISING(status, "signal_create " << std::hex << status);
barrier_rd2.completion_signal = barrier_signal_;
}
@@ -297,8 +300,8 @@ class Profile {
void GetProfiles(profile_vector_t& vec) {
if (!info_vector_.empty()) {
vec.push_back(profile_tuple_t{&profile_, &info_vector_, completion_signal_,
dispatch_signal_, barrier_signal_, read_signal_});
vec.push_back(profile_tuple_t{&profile_, &info_vector_, completion_signal_, dispatch_signal_,
barrier_signal_, read_signal_});
}
}
@@ -330,11 +333,12 @@ class PmcProfile : public Profile {
hsa_status_t Allocate(util::HsaRsrcFactory* rsrc) {
profile_.command_buffer.ptr =
rsrc->AllocateSysMemory(agent_info_, profile_.command_buffer.size);
rsrc->AllocateSysMemory(agent_info_, profile_.command_buffer.size);
// Allocate profile output buffer from kernarg memory pool since kernarg
// memory buffer is uncached. So when GPU copies performance counter values
// to this buffer they are guaranteed to be visible to CPU.
profile_.output_buffer.ptr = rsrc->AllocateKernArgMemory(agent_info_, profile_.output_buffer.size);
profile_.output_buffer.ptr =
rsrc->AllocateKernArgMemory(agent_info_, profile_.output_buffer.size);
return (profile_.command_buffer.ptr && profile_.output_buffer.ptr) ? HSA_STATUS_SUCCESS
: HSA_STATUS_ERROR;
}
@@ -366,11 +370,11 @@ class TraceProfile : public Profile {
hsa_status_t Allocate(util::HsaRsrcFactory* rsrc) {
profile_.command_buffer.ptr =
rsrc->AllocateSysMemory(agent_info_, profile_.command_buffer.size);
rsrc->AllocateSysMemory(agent_info_, profile_.command_buffer.size);
profile_.output_buffer.size = output_buffer_size_;
profile_.output_buffer.ptr = (output_buffer_local_) ?
rsrc->AllocateLocalMemory(agent_info_, profile_.output_buffer.size) :
rsrc->AllocateSysMemory(agent_info_, profile_.output_buffer.size);
profile_.output_buffer.ptr = (output_buffer_local_)
? rsrc->AllocateLocalMemory(agent_info_, profile_.output_buffer.size)
: rsrc->AllocateSysMemory(agent_info_, profile_.output_buffer.size);
return (profile_.command_buffer.ptr && profile_.output_buffer.ptr) ? HSA_STATUS_SUCCESS
: HSA_STATUS_ERROR;
}
@@ -38,10 +38,10 @@ ProxyQueue* ProxyQueue::Create(hsa_agent_t agent, uint32_t size, hsa_queue_type3
hsa_status_t* status) {
hsa_status_t suc = HSA_STATUS_ERROR;
ProxyQueue* instance =
(rocp_type_) ? (ProxyQueue*) new SimpleProxyQueue() : (ProxyQueue*) new HsaProxyQueue();
(rocp_type_) ? (ProxyQueue*)new SimpleProxyQueue() : (ProxyQueue*)new HsaProxyQueue();
if (instance != NULL) {
suc = instance->Init(agent, size, type, callback, data, private_segment_size,
group_segment_size, queue);
group_segment_size, queue);
if (suc != HSA_STATUS_SUCCESS) {
delete instance;
instance = NULL;
@@ -75,34 +75,34 @@ hsa_status_t CreateQueuePro(hsa_agent_t agent, uint32_t size, hsa_queue_type32_t
void* data, uint32_t private_segment_size, uint32_t group_segment_size,
hsa_queue_t** queue);
decltype(hsa_queue_create)* hsa_queue_create_fn;
decltype(hsa_queue_destroy)* hsa_queue_destroy_fn;
decltype(::hsa_queue_create)* hsa_queue_create_fn;
decltype(::hsa_queue_destroy)* hsa_queue_destroy_fn;
decltype(hsa_signal_store_relaxed)* hsa_signal_store_relaxed_fn;
decltype(hsa_signal_store_relaxed)* hsa_signal_store_screlease_fn;
decltype(::hsa_signal_store_relaxed)* hsa_signal_store_relaxed_fn;
decltype(::hsa_signal_store_relaxed)* hsa_signal_store_screlease_fn;
decltype(hsa_queue_load_write_index_relaxed)* hsa_queue_load_write_index_relaxed_fn;
decltype(hsa_queue_store_write_index_relaxed)* hsa_queue_store_write_index_relaxed_fn;
decltype(hsa_queue_load_read_index_relaxed)* hsa_queue_load_read_index_relaxed_fn;
decltype(hsa_queue_add_write_index_scacq_screl)* hsa_queue_add_write_index_scacq_screl_fn;
decltype(::hsa_queue_load_write_index_relaxed)* hsa_queue_load_write_index_relaxed_fn;
decltype(::hsa_queue_store_write_index_relaxed)* hsa_queue_store_write_index_relaxed_fn;
decltype(::hsa_queue_load_read_index_relaxed)* hsa_queue_load_read_index_relaxed_fn;
decltype(::hsa_queue_add_write_index_scacq_screl)* hsa_queue_add_write_index_scacq_screl_fn;
decltype(hsa_queue_load_write_index_scacquire)* hsa_queue_load_write_index_scacquire_fn;
decltype(hsa_queue_store_write_index_screlease)* hsa_queue_store_write_index_screlease_fn;
decltype(hsa_queue_load_read_index_scacquire)* hsa_queue_load_read_index_scacquire_fn;
decltype(::hsa_queue_load_write_index_scacquire)* hsa_queue_load_write_index_scacquire_fn;
decltype(::hsa_queue_store_write_index_screlease)* hsa_queue_store_write_index_screlease_fn;
decltype(::hsa_queue_load_read_index_scacquire)* hsa_queue_load_read_index_scacquire_fn;
decltype(hsa_amd_queue_intercept_create)* hsa_amd_queue_intercept_create_fn;
decltype(hsa_amd_queue_intercept_register)* hsa_amd_queue_intercept_register_fn;
decltype(::hsa_amd_queue_intercept_create)* hsa_amd_queue_intercept_create_fn;
decltype(::hsa_amd_queue_intercept_register)* hsa_amd_queue_intercept_register_fn;
decltype(hsa_memory_allocate)* hsa_memory_allocate_fn;
decltype(hsa_memory_assign_agent)* hsa_memory_assign_agent_fn;
decltype(hsa_memory_copy)* hsa_memory_copy_fn;
decltype(hsa_amd_memory_pool_allocate)* hsa_amd_memory_pool_allocate_fn;
decltype(hsa_amd_memory_pool_free)* hsa_amd_memory_pool_free_fn;
decltype(hsa_amd_agents_allow_access)* hsa_amd_agents_allow_access_fn;
decltype(hsa_amd_memory_async_copy)* hsa_amd_memory_async_copy_fn;
decltype(hsa_amd_memory_async_copy_rect)* hsa_amd_memory_async_copy_rect_fn;
decltype(hsa_executable_freeze)* hsa_executable_freeze_fn;
decltype(hsa_executable_destroy)* hsa_executable_destroy_fn;
decltype(::hsa_memory_allocate)* hsa_memory_allocate_fn;
decltype(::hsa_memory_assign_agent)* hsa_memory_assign_agent_fn;
decltype(::hsa_memory_copy)* hsa_memory_copy_fn;
decltype(::hsa_amd_memory_pool_allocate)* hsa_amd_memory_pool_allocate_fn;
decltype(::hsa_amd_memory_pool_free)* hsa_amd_memory_pool_free_fn;
decltype(::hsa_amd_agents_allow_access)* hsa_amd_agents_allow_access_fn;
decltype(::hsa_amd_memory_async_copy)* hsa_amd_memory_async_copy_fn;
decltype(::hsa_amd_memory_async_copy_rect)* hsa_amd_memory_async_copy_rect_fn;
decltype(::hsa_executable_freeze)* hsa_executable_freeze_fn;
decltype(::hsa_executable_destroy)* hsa_executable_destroy_fn;
::HsaApiTable* kHsaApiTable;
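The `::` qualifier added in this hunk can be illustrated with a small standalone sketch. It assumes a hypothetical global function `demo_memory_allocate` and a namespace `tool` (neither is part of the HSA runtime or this repository) and shows a pointer variable that reuses the function's name. Once the variable is declared, the unqualified name inside the namespace finds the variable, so only `decltype(::name)` reliably yields the function type.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a C API function declared at global scope.
int demo_memory_allocate(std::size_t size, void** ptr) {
  static char storage[64];
  if (ptr == nullptr || size > sizeof(storage)) return 1;
  *ptr = storage;
  return 0;
}

namespace tool {
// Header-style declaration: a pointer variable that reuses the function's name.
extern decltype(::demo_memory_allocate)* demo_memory_allocate;

// Definition. At this point the unqualified name inside this namespace finds
// the variable declared above, so decltype(demo_memory_allocate)* would be a
// pointer-to-pointer and would conflict with the declaration. The leading ::
// always selects the global function instead.
decltype(::demo_memory_allocate)* demo_memory_allocate = &::demo_memory_allocate;

static_assert(std::is_same<decltype(demo_memory_allocate),
                           int (*)(std::size_t, void**)>::value,
              "the variable is a plain pointer to the global function");
}  // namespace tool
```

Calling `tool::demo_memory_allocate(8, &p)` then forwards to the global function through the saved pointer. Variables with a `_fn` suffix do not collide with the function name, so for them the qualifier is a consistency choice rather than a fix.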
@@ -393,80 +393,80 @@ ROCPROFILER_EXPORT extern const uint32_t HSA_AMD_TOOL_PRIORITY = 25;
PUBLIC_API bool OnLoad(HsaApiTable* table, uint64_t runtime_version, uint64_t failed_tool_count,
const char* const* failed_tool_names) {
ONLOAD_TRACE_BEG();
rocprofiler::SaveHsaApi(table);
rocprofiler::ProxyQueue::InitFactory();
rocprofiler::SaveHsaApi(table);
rocprofiler::ProxyQueue::InitFactory();
// Checking environment to enable intercept mode
const char* intercept_env = getenv("ROCP_HSA_INTERCEPT");
// Checking environment to enable intercept mode
const char* intercept_env = getenv("ROCP_HSA_INTERCEPT");
int intercept_env_value = 0;
if (intercept_env != NULL) {
intercept_env_value = atoi(intercept_env);
int intercept_env_value = 0;
if (intercept_env != NULL) {
intercept_env_value = atoi(intercept_env);
switch (intercept_env_value) {
case 0:
case 1:
// 0: Intercepting disabled
// 1: Intercepting enabled without timestamping
rocprofiler::InterceptQueue::TrackerOn(false);
break;
case 2:
// Intercepting enabled with timestamping
rocprofiler::InterceptQueue::TrackerOn(true);
break;
default:
ERR_LOGGING("Bad ROCP_HSA_INTERCEPT env var value ("
<< intercept_env << "): "
<< "valid values are 0 (standalone), 1 (intercepting without timestamp), 2 "
"(intercepting with timestamp)");
return false;
}
switch (intercept_env_value) {
case 0:
case 1:
// 0: Intercepting disabled
// 1: Intercepting enabled without timestamping
rocprofiler::InterceptQueue::TrackerOn(false);
break;
case 2:
// Intercepting enabled with timestamping
rocprofiler::InterceptQueue::TrackerOn(true);
break;
default:
ERR_LOGGING("Bad ROCP_HSA_INTERCEPT env var value ("
<< intercept_env << "): "
<< "valid values are 0 (standalone), 1 (intercepting without timestamp), 2 "
"(intercepting with timestamp)");
return false;
}
}
// always enable executable tracking
rocprofiler::util::HsaRsrcFactory::EnableExecutableTracking(table);
// always enable executable tracking
rocprofiler::util::HsaRsrcFactory::EnableExecutableTracking(table);
// Loading a tool lib and setting of intercept mode
const uint32_t intercept_mode_mask = rocprofiler::LoadTool();
// Loading a tool lib and setting of intercept mode
const uint32_t intercept_mode_mask = rocprofiler::LoadTool();
if (intercept_mode_mask & rocprofiler::MEMCOPY_INTERCEPT_MODE) {
hsa_status_t status = hsa_amd_profiling_async_copy_enable(true);
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, "hsa_amd_profiling_async_copy_enable");
rocprofiler::hsa_amd_memory_async_copy_fn = table->amd_ext_->hsa_amd_memory_async_copy_fn;
rocprofiler::hsa_amd_memory_async_copy_rect_fn =
table->amd_ext_->hsa_amd_memory_async_copy_rect_fn;
table->amd_ext_->hsa_amd_memory_async_copy_fn =
rocprofiler::hsa_amd_memory_async_copy_interceptor;
table->amd_ext_->hsa_amd_memory_async_copy_rect_fn =
rocprofiler::hsa_amd_memory_async_copy_rect_interceptor;
}
if (intercept_mode_mask & rocprofiler::HSA_INTERCEPT_MODE) {
if (intercept_mode_mask & rocprofiler::MEMCOPY_INTERCEPT_MODE) {
hsa_status_t status = hsa_amd_profiling_async_copy_enable(true);
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, "hsa_amd_profiling_async_copy_enable");
rocprofiler::hsa_amd_memory_async_copy_fn = table->amd_ext_->hsa_amd_memory_async_copy_fn;
rocprofiler::hsa_amd_memory_async_copy_rect_fn =
table->amd_ext_->hsa_amd_memory_async_copy_rect_fn;
table->amd_ext_->hsa_amd_memory_async_copy_fn =
rocprofiler::hsa_amd_memory_async_copy_interceptor;
table->amd_ext_->hsa_amd_memory_async_copy_rect_fn =
rocprofiler::hsa_amd_memory_async_copy_rect_interceptor;
}
if (intercept_mode_mask & rocprofiler::HSA_INTERCEPT_MODE) {
if (intercept_mode_mask & rocprofiler::MEMCOPY_INTERCEPT_MODE) {
EXC_ABORT(HSA_STATUS_ERROR, "HSA_INTERCEPT and MEMCOPY_INTERCEPT conflict");
}
rocprofiler::HsaInterceptor::Enable(true);
rocprofiler::HsaInterceptor::HsaIntercept(table);
EXC_ABORT(HSA_STATUS_ERROR, "HSA_INTERCEPT and MEMCOPY_INTERCEPT conflict");
}
rocprofiler::HsaInterceptor::Enable(true);
rocprofiler::HsaInterceptor::HsaIntercept(table);
}
// HSA intercepting
if (intercept_env_value != 0) {
rocprofiler::ProxyQueue::HsaIntercept(table);
rocprofiler::InterceptQueue::HsaIntercept(table);
} else {
rocprofiler::StandaloneIntercept();
}
// HSA intercepting
if (intercept_env_value != 0) {
rocprofiler::ProxyQueue::HsaIntercept(table);
rocprofiler::InterceptQueue::HsaIntercept(table);
} else {
rocprofiler::StandaloneIntercept();
}
ONLOAD_TRACE("end intercept_mode(" << std::hex << intercept_env_value << ")"
<< " intercept_mode_mask(" << std::hex << intercept_mode_mask
<< ")" << std::dec);
ONLOAD_TRACE("end intercept_mode(" << std::hex << intercept_env_value << ")"
<< " intercept_mode_mask(" << std::hex << intercept_mode_mask
<< ")" << std::dec);
return true;
}
// HSA-runtime tool on-unload method
PUBLIC_API void OnUnload() {
ONLOAD_TRACE_BEG();
rocprofiler::UnloadTool();
rocprofiler::RestoreHsaApi();
rocprofiler::UnloadTool();
rocprofiler::RestoreHsaApi();
ONLOAD_TRACE_END();
}
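The `ROCP_HSA_INTERCEPT` handling in `OnLoad` amounts to a three-way mode selection. The sketch below restates that decision as a pure function so it can be checked in isolation. `InterceptConfig` and `ParseInterceptMode` are hypothetical names introduced for illustration; only the meanings of the values 0, 1 and 2 are taken from the hunk above.

```cpp
#include <cstdlib>
#include <optional>

// Hypothetical result type; not part of the rocprofiler sources.
struct InterceptConfig {
  bool intercept;   // queue interception enabled (values 1 and 2)
  bool timestamps;  // dispatch timestamp tracking enabled (value 2 only)
};

// Maps the raw environment string to a configuration.
// An unset variable behaves like "0" (standalone mode); values outside
// 0..2 are rejected, mirroring the error branch in OnLoad.
std::optional<InterceptConfig> ParseInterceptMode(const char* text) {
  if (text == nullptr) return InterceptConfig{false, false};
  switch (std::atoi(text)) {
    case 0:
      return InterceptConfig{false, false};
    case 1:
      return InterceptConfig{true, false};
    case 2:
      return InterceptConfig{true, true};
    default:
      return std::nullopt;
  }
}
```

A caller would pass `std::getenv("ROCP_HSA_INTERCEPT")` and treat an empty optional as a load failure. Like the original, this relies on `atoi`, so non-numeric text is read as 0 rather than rejected.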
@@ -27,22 +27,20 @@ namespace rocprofiler {
namespace att {
AttTracer::AttTracer(rocprofiler_buffer_id_t buffer_id, rocprofiler_filter_id_t filter_id,
rocprofiler_session_id_t session_id)
rocprofiler_session_id_t session_id)
: buffer_id_(buffer_id), filter_id_(filter_id), session_id_(session_id) {}
void AttTracer::AddPendingSignals(uint32_t writer_id, uint64_t kernel_object,
const hsa_signal_t& original_completion_signal, const hsa_signal_t& new_completion_signal,
rocprofiler_session_id_t session_id,
rocprofiler_buffer_id_t buffer_id,
hsa_ven_amd_aqlprofile_profile_t* profile,
rocprofiler_kernel_properties_t kernel_properties,
uint32_t thread_id, uint64_t queue_index) {
void AttTracer::AddPendingSignals(
uint32_t writer_id, uint64_t kernel_object, const hsa_signal_t& original_completion_signal,
const hsa_signal_t& new_completion_signal, rocprofiler_session_id_t session_id,
rocprofiler_buffer_id_t buffer_id, hsa_ven_amd_aqlprofile_profile_t* profile,
rocprofiler_kernel_properties_t kernel_properties, uint32_t thread_id, uint64_t queue_index) {
std::lock_guard<std::mutex> lock(sessions_pending_signals_lock_);
if (sessions_pending_signals_.find(writer_id) == sessions_pending_signals_.end())
sessions_pending_signals_.emplace(writer_id, std::vector<att_pending_signal_t>());
sessions_pending_signals_.at(writer_id).emplace_back(
att_pending_signal_t{kernel_object, original_completion_signal, new_completion_signal, session_id_, buffer_id, profile,
kernel_properties, thread_id, queue_index});
sessions_pending_signals_.at(writer_id).emplace_back(att_pending_signal_t{
kernel_object, original_completion_signal, new_completion_signal, session_id_, buffer_id,
profile, kernel_properties, thread_id, queue_index});
std::atomic_thread_fence(std::memory_order_release);
}
@@ -40,7 +40,7 @@ Filter::Filter(rocprofiler_filter_id_t id, rocprofiler_filter_kind_t filter_kind
}
break;
}
case ROCPROFILER_PC_SAMPLING_COLLECTION:{
case ROCPROFILER_PC_SAMPLING_COLLECTION: {
break;
}
case ROCPROFILER_ATT_TRACE_COLLECTION: {
@@ -62,8 +62,8 @@ Filter::Filter(rocprofiler_filter_id_t id, rocprofiler_filter_kind_t filter_kind
}
case ROCPROFILER_API_TRACE: {
tracer_apis_.clear();
for (uint32_t j = 0; j < data_count; j++){
tracer_apis_.emplace_back(filter_data.trace_apis[j]);
for (uint32_t j = 0; j < data_count; j++) {
tracer_apis_.emplace_back(filter_data.trace_apis[j]);
}
break;
}
@@ -195,7 +195,7 @@ void Filter::SetProperty(rocprofiler_filter_property_t property) {
case ROCPROFILER_FILTER_DISPATCH_IDS:
dispatch_id_filter_.clear();
for (uint32_t j = 0; j < property.data_count; j++)
dispatch_id_filter_.emplace_back(property.dispatch_ids[j]);
dispatch_id_filter_.emplace_back(property.dispatch_ids[j]);
break;
default:
break;
@@ -249,9 +249,7 @@ void Filter::SetCallback(rocprofiler_sync_callback_t& callback) {
bool Filter::HasCallback() { return has_sync_callback_; }
rocprofiler_sync_callback_t& Filter::GetCallback() {
return callback_;
}
rocprofiler_sync_callback_t& Filter::GetCallback() { return callback_; }
size_t Filter::GetPropertiesCount(rocprofiler_filter_property_kind_t kind) {
switch (kind) {
@@ -53,11 +53,8 @@ class Filter {
bool HasCallback();
void SetProperty(rocprofiler_filter_property_t property);
std::variant<
std::vector<std::string>,
uint32_t*,
std::vector<uint64_t>
> GetProperty(rocprofiler_filter_property_kind_t kind);
std::variant<std::vector<std::string>, uint32_t*, std::vector<uint64_t> > GetProperty(
rocprofiler_filter_property_kind_t kind);
size_t GetPropertiesCount(rocprofiler_filter_property_kind_t kind);
rocprofiler_spm_parameter_t* GetSpmParameterData();
@@ -74,11 +71,12 @@ class Filter {
std::vector<std::string> kernel_names_; // HIP/HSA API Functions
uint32_t dispatch_range_[2]; // Kernel Dispatches OR API Range
std::vector<std::string> profiler_counter_names_; // Counter Names to collect
std::vector<std::string> profiler_counter_names_; // Counter Names to collect
std::vector<rocprofiler_tracer_activity_domain_t> tracer_apis_; // ROCTX/HIP/HSA API
rocprofiler_spm_parameter_t* spm_parameter_; // spm parameter
std::vector<rocprofiler_att_parameter_t> att_parameters_; // ATT Parameters
rocprofiler_counters_sampler_parameters_t counters_sampler_parameters_; // sampled counters parameters
std::vector<rocprofiler_att_parameter_t> att_parameters_; // ATT Parameters
rocprofiler_counters_sampler_parameters_t
counters_sampler_parameters_; // sampled counters parameters
std::vector<uint64_t> dispatch_id_filter_;
bool has_sync_callback_{false};
@@ -125,17 +125,19 @@ bool Profiler::HasActivePass() {
}
void Profiler::AddPendingSignals(
uint32_t writer_id, uint64_t kernel_object, const hsa_signal_t& original_completion_signal, const hsa_signal_t& new_completion_signal,
rocprofiler_session_id_t session_id, rocprofiler_buffer_id_t buffer_id,
rocprofiler::profiling_context_t* context, uint64_t session_data_count,
hsa_ven_amd_aqlprofile_profile_t* profile, rocprofiler_kernel_properties_t kernel_properties,
uint32_t thread_id, uint64_t queue_index, uint64_t correlation_id) {
uint32_t writer_id, uint64_t kernel_object, const hsa_signal_t& original_completion_signal,
const hsa_signal_t& new_completion_signal, rocprofiler_session_id_t session_id,
rocprofiler_buffer_id_t buffer_id, rocprofiler::profiling_context_t* context,
uint64_t session_data_count, hsa_ven_amd_aqlprofile_profile_t* profile,
rocprofiler_kernel_properties_t kernel_properties, uint32_t thread_id, uint64_t queue_index,
uint64_t correlation_id) {
std::lock_guard<std::mutex> lock(sessions_pending_signals_lock_);
if (sessions_pending_signals_->find(writer_id) == sessions_pending_signals_->end())
sessions_pending_signals_->emplace(writer_id, std::vector<pending_signal_t*>());
sessions_pending_signals_->at(writer_id).emplace_back(new pending_signal_t{
kernel_object, original_completion_signal, new_completion_signal, session_id_, buffer_id, context, session_data_count,
profile, kernel_properties, thread_id, queue_index, correlation_id});
sessions_pending_signals_->at(writer_id).emplace_back(
new pending_signal_t{kernel_object, original_completion_signal, new_completion_signal,
session_id_, buffer_id, context, session_data_count, profile,
kernel_properties, thread_id, queue_index, correlation_id});
}
const std::vector<pending_signal_t*>& Profiler::GetPendingSignals(uint32_t writer_id) {
@@ -36,7 +36,7 @@
#include "src/core/counters/metrics/eval_metrics.h"
typedef void (*rocprofiler_add_profiler_record_t)(rocprofiler_record_profiler_t&& record,
rocprofiler_session_id_t session_id);
rocprofiler_session_id_t session_id);
typedef rocprofiler_timestamp_t (*rocprofiler_get_timestamp_t)();
@@ -68,12 +68,13 @@ class Profiler {
~Profiler();
void AddPendingSignals(uint32_t writer_id, uint64_t kernel_object,
const hsa_signal_t& original_completion_signal, const hsa_signal_t& new_completion_signal, rocprofiler_session_id_t session_id,
rocprofiler_buffer_id_t buffer_id,
rocprofiler::profiling_context_t* context, uint64_t session_data_count,
hsa_ven_amd_aqlprofile_profile_t* profile,
rocprofiler_kernel_properties_t kernel_properties, uint32_t thread_id,
uint64_t queue_index, uint64_t correlation_id);
const hsa_signal_t& original_completion_signal,
const hsa_signal_t& new_completion_signal,
rocprofiler_session_id_t session_id, rocprofiler_buffer_id_t buffer_id,
rocprofiler::profiling_context_t* context, uint64_t session_data_count,
hsa_ven_amd_aqlprofile_profile_t* profile,
rocprofiler_kernel_properties_t kernel_properties, uint32_t thread_id,
uint64_t queue_index, uint64_t correlation_id);
const std::vector<pending_signal_t*>& GetPendingSignals(uint32_t writer_id);
bool CheckPendingSignalsIsEmpty();
@@ -83,8 +84,10 @@ class Profiler {
std::string& GetCounterName(rocprofiler_counter_id_t handler);
bool FindCounter(rocprofiler_counter_id_t counter_id);
size_t GetCounterInfoSize(rocprofiler_counter_info_kind_t kind, rocprofiler_counter_id_t counter_id);
const char* GetCounterInfo(rocprofiler_counter_info_kind_t kind, rocprofiler_counter_id_t counter_id);
size_t GetCounterInfoSize(rocprofiler_counter_info_kind_t kind,
rocprofiler_counter_id_t counter_id);
const char* GetCounterInfo(rocprofiler_counter_info_kind_t kind,
rocprofiler_counter_id_t counter_id);
void StartReplayPass(rocprofiler_session_id_t session_id);
void EndReplayPass();
@@ -67,8 +67,8 @@ class Session {
// Filter
rocprofiler_filter_id_t CreateFilter(rocprofiler_filter_kind_t filter_kind,
rocprofiler_filter_data_t filter_data, uint64_t data_count,
rocprofiler_filter_property_t property);
rocprofiler_filter_data_t filter_data, uint64_t data_count,
rocprofiler_filter_property_t property);
bool FindFilter(rocprofiler_filter_id_t filter_id);
void DestroyFilter(rocprofiler_filter_id_t filter_id);
Filter* GetFilter(rocprofiler_filter_id_t filter_id);
@@ -83,7 +83,7 @@ class Session {
// Buffer
rocprofiler_buffer_id_t CreateBuffer(rocprofiler_buffer_callback_t buffer_callback,
size_t buffer_size);
size_t buffer_size);
bool FindBuffer(rocprofiler_buffer_id_t buffer_id);
void DestroyBuffer(rocprofiler_buffer_id_t buffer_id);
Memory::GenericBuffer* GetBuffer(rocprofiler_buffer_id_t buffer_id);
@@ -112,8 +112,7 @@ const char* roctracer_op_string(uint32_t domain, uint32_t op) {
case ACTIVITY_DOMAIN_EXT_API:
return "EXT_API";
default:
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
"Invalid domain ID");
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
}
}
@@ -178,8 +177,7 @@ constexpr uint32_t get_op_begin(activity_domain_t domain) {
case ACTIVITY_DOMAIN_EXT_API:
return 0;
default:
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
"Invalid domain ID");
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
}
}
@@ -200,8 +198,7 @@ constexpr uint32_t get_op_end(activity_domain_t domain) {
case ACTIVITY_DOMAIN_EXT_API:
return get_op_begin(ACTIVITY_DOMAIN_EXT_API);
default:
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
"Invalid domain ID");
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
}
}
@@ -476,11 +473,10 @@ int TracerCallback(activity_domain_t domain, uint32_t operation_id, void* data)
rocprofiler::GetROCProfilerSingleton()
->GetSession((*pool)->session_id)
->GetBuffer((*pool)->buffer_id)
->AddRecord(
rocprofiler_record, record->kernel_name, kernel_name_size,
[](auto& rocprofiler_record, const void* data) {
rocprofiler_record.name = static_cast<const char*>(data);
});
->AddRecord(rocprofiler_record, record->kernel_name, kernel_name_size,
[](auto& rocprofiler_record, const void* data) {
rocprofiler_record.name = static_cast<const char*>(data);
});
} else {
rocprofiler::GetROCProfilerSingleton()
->GetSession((*pool)->session_id)
@@ -584,8 +580,7 @@ static void roctracer_enable_op_callback(activity_domain_t domain, uint32_t oper
user_data);
break;
default:
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
"Invalid domain ID");
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
}
}
@@ -623,8 +618,7 @@ void roctracer_disable_op_callback(activity_domain_t domain, uint32_t operation_
ROCTX_registration_group.Unregister(roctx_api_callback_table, operation_id);
break;
default:
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
"Invalid domain ID");
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
}
}
@@ -667,8 +661,7 @@ void roctracer_enable_op_activity(activity_domain_t domain, uint32_t op,
case ACTIVITY_DOMAIN_ROCTX:
break;
default:
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
"Invalid domain ID");
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
}
}
@@ -710,8 +703,7 @@ void roctracer_disable_activity(activity_domain_t domain, uint32_t op) {
case ACTIVITY_DOMAIN_ROCTX:
break;
default:
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
"Invalid domain ID");
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
}
}
@@ -774,8 +766,7 @@ void roctracer_set_properties(activity_domain_t domain, void* properties) {
break;
}
default:
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
"Invalid domain ID");
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
}
}
@@ -791,9 +782,7 @@ static std::string getKernelNameMultiKernelMultiDevice(hipLaunchParams* launchPa
return name_str.str();
}
template <typename... Ts> struct Overloaded : Ts... {
using Ts::operator()...;
};
template <typename... Ts> struct Overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;
std::optional<std::string> GetHipKernelName(uint32_t cid, hip_api_data_t* data) {
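The `Overloaded` helper reformatted in this hunk is the usual C++17 idiom for building a visitor out of lambdas. A minimal sketch of how it is typically combined with `std::visit` follows. The `Property` alias and `CountEntries` function are hypothetical and only loosely modelled on a variant of string and integer vectors; they are not taken from the repository.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Inherits operator() from every lambda passed in.
template <typename... Ts> struct Overloaded : Ts... { using Ts::operator()...; };
// Deduction guide so that Overloaded{lambda1, lambda2} works without template arguments.
template <class... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;

// Hypothetical variant type used only for this illustration.
using Property = std::variant<std::vector<std::string>, std::vector<std::uint64_t>>;

// Dispatches on the active alternative and returns its element count.
std::size_t CountEntries(const Property& property) {
  return std::visit(
      Overloaded{[](const std::vector<std::string>& names) { return names.size(); },
                 [](const std::vector<std::uint64_t>& ids) { return ids.size(); }},
      property);
}
```

Each lambda handles exactly one alternative, so adding an alternative to the variant without a matching lambda becomes a compile-time error.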
@@ -27,13 +27,19 @@ void SimpleProxyQueue::HsaIntercept(HsaApiTable* table) {
table->core_->hsa_signal_store_relaxed_fn = rocprofiler::SimpleProxyQueue::SignalStore;
table->core_->hsa_signal_store_screlease_fn = rocprofiler::SimpleProxyQueue::SignalStore;
table->core_->hsa_queue_load_write_index_relaxed_fn = rocprofiler::SimpleProxyQueue::GetQueueIndex;
table->core_->hsa_queue_store_write_index_relaxed_fn = rocprofiler::SimpleProxyQueue::SetQueueIndex;
table->core_->hsa_queue_load_read_index_relaxed_fn = rocprofiler::SimpleProxyQueue::GetSubmitIndex;
table->core_->hsa_queue_load_write_index_relaxed_fn =
rocprofiler::SimpleProxyQueue::GetQueueIndex;
table->core_->hsa_queue_store_write_index_relaxed_fn =
rocprofiler::SimpleProxyQueue::SetQueueIndex;
table->core_->hsa_queue_load_read_index_relaxed_fn =
rocprofiler::SimpleProxyQueue::GetSubmitIndex;
table->core_->hsa_queue_load_write_index_scacquire_fn = rocprofiler::SimpleProxyQueue::GetQueueIndex;
table->core_->hsa_queue_store_write_index_screlease_fn = rocprofiler::SimpleProxyQueue::SetQueueIndex;
table->core_->hsa_queue_load_read_index_scacquire_fn = rocprofiler::SimpleProxyQueue::GetSubmitIndex;
table->core_->hsa_queue_load_write_index_scacquire_fn =
rocprofiler::SimpleProxyQueue::GetQueueIndex;
table->core_->hsa_queue_store_write_index_screlease_fn =
rocprofiler::SimpleProxyQueue::SetQueueIndex;
table->core_->hsa_queue_load_read_index_scacquire_fn =
rocprofiler::SimpleProxyQueue::GetSubmitIndex;
}
SimpleProxyQueue::queue_map_t* SimpleProxyQueue::queue_map_ = NULL;
@@ -33,23 +33,23 @@ THE SOFTWARE.
#include "util/hsa_rsrc_factory.h"
#ifndef ROCP_PROXY_LOCK
# define ROCP_PROXY_LOCK 1
#define ROCP_PROXY_LOCK 1
#endif
namespace rocprofiler {
extern decltype(hsa_queue_create)* hsa_queue_create_fn;
extern decltype(hsa_queue_destroy)* hsa_queue_destroy_fn;
extern decltype(::hsa_queue_create)* hsa_queue_create_fn;
extern decltype(::hsa_queue_destroy)* hsa_queue_destroy_fn;
extern decltype(hsa_signal_store_relaxed)* hsa_signal_store_relaxed_fn;
extern decltype(hsa_signal_store_relaxed)* hsa_signal_store_screlease_fn;
extern decltype(::hsa_signal_store_relaxed)* hsa_signal_store_relaxed_fn;
extern decltype(::hsa_signal_store_relaxed)* hsa_signal_store_screlease_fn;
extern decltype(hsa_queue_load_write_index_relaxed)* hsa_queue_load_write_index_relaxed_fn;
extern decltype(hsa_queue_store_write_index_relaxed)* hsa_queue_store_write_index_relaxed_fn;
extern decltype(hsa_queue_load_read_index_relaxed)* hsa_queue_load_read_index_relaxed_fn;
extern decltype(::hsa_queue_load_write_index_relaxed)* hsa_queue_load_write_index_relaxed_fn;
extern decltype(::hsa_queue_store_write_index_relaxed)* hsa_queue_store_write_index_relaxed_fn;
extern decltype(::hsa_queue_load_read_index_relaxed)* hsa_queue_load_read_index_relaxed_fn;
extern decltype(hsa_queue_load_write_index_scacquire)* hsa_queue_load_write_index_scacquire_fn;
extern decltype(hsa_queue_store_write_index_screlease)* hsa_queue_store_write_index_screlease_fn;
extern decltype(hsa_queue_load_read_index_scacquire)* hsa_queue_load_read_index_scacquire_fn;
extern decltype(::hsa_queue_load_write_index_scacquire)* hsa_queue_load_write_index_scacquire_fn;
extern decltype(::hsa_queue_store_write_index_screlease)* hsa_queue_store_write_index_screlease_fn;
extern decltype(::hsa_queue_load_read_index_scacquire)* hsa_queue_load_read_index_scacquire_fn;
typedef decltype(hsa_signal_t::handle) signal_handle_t;
@@ -128,7 +128,8 @@ class SimpleProxyQueue : public ProxyQueue {
const uint64_t que_idx = hsa_queue_load_write_index_relaxed_fn(queue_);
// Waiting until there is free space in the queue
while (que_idx >= (hsa_queue_load_read_index_relaxed_fn(queue_) + size_));
while (que_idx >= (hsa_queue_load_read_index_relaxed_fn(queue_) + size_))
;
// Increment the write index
hsa_queue_store_write_index_relaxed_fn(queue_, que_idx + 1);
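The full-queue check in this hunk (`write index >= read index + size`) can be sketched without the HSA API. The example below is a non-blocking variant that reports failure instead of spinning, which makes it easy to test. `RingIndices` and `TryReserve` are hypothetical names, and the sketch assumes a single producer.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical ring bookkeeping; not the HSA queue structure.
struct RingIndices {
  std::atomic<std::uint64_t> read_index{0};   // advanced by the consumer
  std::atomic<std::uint64_t> write_index{0};  // advanced by the producer
  std::uint64_t size;                         // number of slots
  explicit RingIndices(std::uint64_t slots) : size(slots) {}
};

// Reserves the next slot if one is free and stores its position in *slot.
// Returns false when the ring is full, where the original code would keep
// polling the read index.
bool TryReserve(RingIndices& ring, std::uint64_t* slot) {
  const std::uint64_t write = ring.write_index.load(std::memory_order_relaxed);
  const std::uint64_t read = ring.read_index.load(std::memory_order_acquire);
  if (write >= read + ring.size) return false;  // no free slot
  ring.write_index.store(write + 1, std::memory_order_relaxed);
  *slot = write % ring.size;
  return true;
}
```

Both indices grow monotonically and are reduced modulo the size only when addressing a slot, which is why the fullness test compares the raw indices.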
@@ -163,8 +164,7 @@ class SimpleProxyQueue : public ProxyQueue {
queue_mask_(0),
submit_index_(0),
on_submit_cb_(NULL),
on_submit_cb_data_(NULL)
{
on_submit_cb_data_(NULL) {
printf("ROCProfiler: SimpleProxyQueue is enabled\n");
fflush(stdout);
}
@@ -203,8 +203,8 @@ class SimpleProxyQueue : public ProxyQueue {
if (queue_map_ == NULL) queue_map_ = new queue_map_t;
(*queue_map_)[queue_->doorbell_signal.handle] = this;
}
else abort();
} else
abort();
}
}
if (status != HSA_STATUS_SUCCESS) abort();
@@ -40,7 +40,7 @@ THE SOFTWARE.
namespace rocprofiler {
class Tracker {
public:
public:
typedef std::mutex mutex_t;
typedef util::HsaRsrcFactory::timestamp_t timestamp_t;
typedef rocprofiler_dispatch_record_t record_t;
@@ -89,7 +89,7 @@ class Tracker {
}
// Add tracker entry
entry_t* Alloc(const hsa_agent_t& agent, const hsa_signal_t& orig, bool proxy=true) {
entry_t* Alloc(const hsa_agent_t& agent, const hsa_signal_t& orig, bool proxy = true) {
hsa_status_t status = HSA_STATUS_ERROR;
// Creating a new tracker entry
@@ -108,10 +108,12 @@ class Tracker {
// Creating a proxy signal
if (proxy) {
entry->is_proxy = true;
const hsa_signal_value_t signal_value = (orig.handle) ? hsa_api_.hsa_signal_load_relaxed(orig) : 1;
const hsa_signal_value_t signal_value =
(orig.handle) ? hsa_api_.hsa_signal_load_relaxed(orig) : 1;
status = hsa_api_.hsa_signal_create(signal_value, 0, NULL, &(entry->signal));
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_signal_create");
status = hsa_api_.hsa_amd_signal_async_handler(entry->signal, HSA_SIGNAL_CONDITION_LT, signal_value, Handler, entry);
status = hsa_api_.hsa_amd_signal_async_handler(entry->signal, HSA_SIGNAL_CONDITION_LT,
signal_value, Handler, entry);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_signal_async_handler");
}
@@ -128,7 +130,8 @@ class Tracker {
hsa_signal_t& dispatch_signal = group->GetDispatchSignal();
hsa_signal_t& handler_signal = group->GetBarrierSignal();
entry->signal = dispatch_signal;
hsa_status_t status = hsa_api_.hsa_amd_signal_async_handler(handler_signal, HSA_SIGNAL_CONDITION_LT, 1, Handler, entry);
hsa_status_t status = hsa_api_.hsa_amd_signal_async_handler(
handler_signal, HSA_SIGNAL_CONDITION_LT, 1, Handler, entry);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_signal_async_handler");
}
@@ -150,7 +153,8 @@ class Tracker {
// Debug trace
if (trace_on_) {
auto outstanding = outstanding_.fetch_add(1);
fprintf(stdout, "Tracker::Enable: entry %p, record %p, outst %lu\n", entry, entry->record, outstanding);
fprintf(stdout, "Tracker::Enable: entry %p, record %p, outst %lu\n", entry, entry->record,
outstanding);
fflush(stdout);
}
}
@@ -173,12 +177,14 @@ class Tracker {
group->GetRecord()->dispatch = util::HsaRsrcFactory::Instance().TimestampNs();
// Creating a proxy signal
const hsa_signal_value_t signal_value = (orig_signal.handle) ?
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_load_relaxed(orig_signal) : 1;
const hsa_signal_value_t signal_value = (orig_signal.handle)
? util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_load_relaxed(orig_signal)
: 1;
hsa_signal_t& dispatch_signal = group->GetDispatchSignal();
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_store_screlease(dispatch_signal, signal_value);
hsa_status_t status =
util::HsaRsrcFactory::Instance().HsaApi()->hsa_amd_signal_async_handler(dispatch_signal, HSA_SIGNAL_CONDITION_LT, signal_value, Handler_opt, group);
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_store_screlease(dispatch_signal,
signal_value);
hsa_status_t status = util::HsaRsrcFactory::Instance().HsaApi()->hsa_amd_signal_async_handler(
dispatch_signal, HSA_SIGNAL_CONDITION_LT, signal_value, Handler_opt, group);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_signal_async_handler");
}
@@ -190,7 +196,8 @@ class Tracker {
record_t* record = group->GetRecord();
hsa_amd_profiling_dispatch_time_t dispatch_time{};
hsa_status_t status =
util::HsaRsrcFactory::Instance().HsaApi()->hsa_amd_profiling_get_dispatch_time(context->GetAgent(), dispatch_signal, &dispatch_time);
util::HsaRsrcFactory::Instance().HsaApi()->hsa_amd_profiling_get_dispatch_time(
context->GetAgent(), dispatch_signal, &dispatch_time);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_profiling_get_dispatch_time");
record->begin = util::HsaRsrcFactory::Instance().SysclockToNs(dispatch_time.start);
record->end = util::HsaRsrcFactory::Instance().SysclockToNs(dispatch_time.end);
@@ -203,22 +210,23 @@ class Tracker {
amd_signal_t* prof_signal_ptr = reinterpret_cast<amd_signal_t*>(dispatch_signal.handle);
orig_signal_ptr->start_ts = prof_signal_ptr->start_ts;
orig_signal_ptr->end_ts = prof_signal_ptr->end_ts;
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_store_screlease(orig_signal, signal_value);
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_store_screlease(orig_signal,
signal_value);
}
return Context::Handler(signal_value, arg);
}
private:
Tracker() :
outstanding_(0),
hsa_rsrc_(&(util::HsaRsrcFactory::Instance())),
hsa_api_(*(hsa_rsrc_->HsaApi()))
{}
private:
Tracker()
: outstanding_(0),
hsa_rsrc_(&(util::HsaRsrcFactory::Instance())),
hsa_api_(*(hsa_rsrc_->HsaApi())) {}
~Tracker() {
if (trace_on_) {
fprintf(stdout, "Tracker::DESTR: sig list %d, outst %lu\n", (int)(sig_list_.size()), outstanding_.load());
fprintf(stdout, "Tracker::DESTR: sig list %d, outst %lu\n", (int)(sig_list_.size()),
outstanding_.load());
fflush(stdout);
}
@@ -226,8 +234,8 @@ class Tracker {
auto end = sig_list_.end();
while (it != end) {
auto cur = it++;
// The wait should be optiona as there possible some inter kernel dependencies and it possible to wait for
// the kernels will never be lunched as the application was finished by some reason.
// The wait should be optiona as there possible some inter kernel dependencies and it possible to
// wait for the kernels will never be lunched as the application was finished by some reason.
#if 0
// FIXME: currently the signal value for tracking signals are taken from original application signal
hsa_rsrc_->SignalWait((*cur)->signal, 1);
@@ -246,20 +254,24 @@ class Tracker {
// Debug trace
if (trace_on_) {
auto outstanding = outstanding_.fetch_sub(1);
fprintf(stdout, "Tracker::Complete: entry %p, record %p, outst %lu\n", entry, entry->record, outstanding);
fprintf(stdout, "Tracker::Complete: entry %p, record %p, outst %lu\n", entry, entry->record,
outstanding);
fflush(stdout);
}
// Query begin/end and complete timestamps
if (entry->is_memcopy) {
hsa_amd_profiling_async_copy_time_t async_copy_time{};
hsa_status_t status = hsa_api_.hsa_amd_profiling_get_async_copy_time(entry->signal, &async_copy_time);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_profiling_get_async_copy_time");
hsa_status_t status =
hsa_api_.hsa_amd_profiling_get_async_copy_time(entry->signal, &async_copy_time);
if (status != HSA_STATUS_SUCCESS)
EXC_RAISING(status, "hsa_amd_profiling_get_async_copy_time");
record->begin = hsa_rsrc_->SysclockToNs(async_copy_time.start);
record->end = hsa_rsrc_->SysclockToNs(async_copy_time.end);
} else {
hsa_amd_profiling_dispatch_time_t dispatch_time{};
hsa_status_t status = hsa_api_.hsa_amd_profiling_get_dispatch_time(entry->agent, entry->signal, &dispatch_time);
hsa_status_t status =
hsa_api_.hsa_amd_profiling_get_dispatch_time(entry->agent, entry->signal, &dispatch_time);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_profiling_get_dispatch_time");
record->begin = hsa_rsrc_->SysclockToNs(dispatch_time.start);
record->end = hsa_rsrc_->SysclockToNs(dispatch_time.end);
@@ -349,6 +361,6 @@ class Tracker {
static const bool trace_on_ = false;
};
} // namespace rocprofiler
} // namespace rocprofiler
#endif // SRC_CORE_TRACKER_H_
#endif // SRC_CORE_TRACKER_H_
@@ -36,11 +36,12 @@ typedef hsa_ext_amd_aql_pm4_packet_t packet_t;
typedef uint32_t packet_word_t;
typedef uint64_t timestamp_t;
inline std::ostream& operator<< (std::ostream& out, const event_t& event) {
out << "[block_name(" << event.block_name << "). block_index(" << event.block_index << "). counter_id(" << event.counter_id << ")]";
inline std::ostream& operator<<(std::ostream& out, const event_t& event) {
out << "[block_name(" << event.block_name << "). block_index(" << event.block_index
<< "). counter_id(" << event.counter_id << ")]";
return out;
}
inline std::ostream& operator<< (std::ostream& out, const parameter_t& parameter) {
inline std::ostream& operator<<(std::ostream& out, const parameter_t& parameter) {
out << "[parameter_name(" << parameter.parameter_name << "). value(" << parameter.value << ")]";
return out;
}
@@ -35,15 +35,12 @@
namespace rocprofiler::pc_sampler {
PCSampler::PCSampler(
rocprofiler_buffer_id_t buffer_id,
rocprofiler_filter_id_t filter_id,
rocprofiler_session_id_t session_id)
: buffer_id_(buffer_id)
, filter_id_(filter_id)
, session_id_(session_id)
, pci_system_initialized_(pci_system_init() == 0)
{}
PCSampler::PCSampler(rocprofiler_buffer_id_t buffer_id, rocprofiler_filter_id_t filter_id,
rocprofiler_session_id_t session_id)
: buffer_id_(buffer_id),
filter_id_(filter_id),
session_id_(session_id),
pci_system_initialized_(pci_system_init() == 0) {}
PCSampler::~PCSampler() {
if (pci_system_initialized_) {
@@ -53,7 +50,9 @@ PCSampler::~PCSampler() {
}
void PCSampler::Start() {
if (sampler_thread_.joinable()) { return; }
if (sampler_thread_.joinable()) {
return;
}
devices_.clear();
@@ -61,15 +60,15 @@ void PCSampler::Start() {
agents_t agents;
rocprofiler::hsa_support::GetCoreApiTable().hsa_iterate_agents_fn(
[](hsa_agent_t agent, void *arg){
auto &agents = *reinterpret_cast<agents_t *>(arg);
agents.emplace_back(agent);
return HSA_STATUS_SUCCESS;
},
&agents);
[](hsa_agent_t agent, void* arg) {
auto& agents = *reinterpret_cast<agents_t*>(arg);
agents.emplace_back(agent);
return HSA_STATUS_SUCCESS;
},
&agents);
for (const auto &agent : agents) {
const auto &ai = rocprofiler::hsa_support::GetAgentInfo(agent.handle);
for (const auto& agent : agents) {
const auto& ai = rocprofiler::hsa_support::GetAgentInfo(agent.handle);
if (ai.getType() != HSA_DEVICE_TYPE_GPU) {
continue;
}
@@ -81,31 +80,30 @@ void PCSampler::Start() {
}
void PCSampler::Stop() {
if (!sampler_thread_.joinable()) { return; }
if (!sampler_thread_.joinable()) {
return;
}
keep_running_ = false;
sampler_thread_.join();
}
void PCSampler::AddRecord(rocprofiler_record_pc_sample_t &record) {
void PCSampler::AddRecord(rocprofiler_record_pc_sample_t& record) {
const auto tool = rocprofiler::GetROCProfilerSingleton();
const auto session = tool->GetSession(session_id_);
const auto buffer = session->GetBuffer(buffer_id_);
std::lock_guard<std::mutex> lk(session->GetSessionLock());
record.header = {
ROCPROFILER_PC_SAMPLING_RECORD,
{ tool->GetUniqueRecordId() }
};
record.header = {ROCPROFILER_PC_SAMPLING_RECORD, {tool->GetUniqueRecordId()}};
buffer->AddRecord(record);
}
void PCSampler::SamplerLoop() {
while (keep_running_) {
auto next_tick = std::chrono::steady_clock::now() + std::chrono::milliseconds(10);
for (auto &agent : devices_) {
auto &device = agent.second;
for (auto& agent : devices_) {
auto& device = agent.second;
if (device.fd_.mmio2.get() >= 0) {
gfxip::read_pc_samples_v9_ioctl(device, this);
} else {
@@ -116,4 +114,4 @@ void PCSampler::SamplerLoop() {
}
}
} // namespace rocprofiler::pc_sampler
} // namespace rocprofiler::pc_sampler
File diff not shown because this file is too large.
File diff not shown because this file is too large.
File diff not shown because this file is too large.
File diff not shown because this file is too large.
@@ -23,244 +23,244 @@
// addressBlock: gc_grbmdec
// base address: 0x8000
#define mmGRBM_CNTL 0x0000
#define mmGRBM_CNTL_BASE_IDX 0
#define mmGRBM_SKEW_CNTL 0x0001
#define mmGRBM_SKEW_CNTL_BASE_IDX 0
#define mmGRBM_STATUS2 0x0002
#define mmGRBM_STATUS2_BASE_IDX 0
#define mmGRBM_PWR_CNTL 0x0003
#define mmGRBM_PWR_CNTL_BASE_IDX 0
#define mmGRBM_STATUS 0x0004
#define mmGRBM_STATUS_BASE_IDX 0
#define mmGRBM_STATUS_SE0 0x0005
#define mmGRBM_STATUS_SE0_BASE_IDX 0
#define mmGRBM_STATUS_SE1 0x0006
#define mmGRBM_STATUS_SE1_BASE_IDX 0
#define mmGRBM_SOFT_RESET 0x0008
#define mmGRBM_SOFT_RESET_BASE_IDX 0
#define mmGRBM_GFX_CLKEN_CNTL 0x000c
#define mmGRBM_GFX_CLKEN_CNTL_BASE_IDX 0
#define mmGRBM_WAIT_IDLE_CLOCKS 0x000d
#define mmGRBM_WAIT_IDLE_CLOCKS_BASE_IDX 0
#define mmGRBM_STATUS_SE2 0x000e
#define mmGRBM_STATUS_SE2_BASE_IDX 0
#define mmGRBM_STATUS_SE3 0x000f
#define mmGRBM_STATUS_SE3_BASE_IDX 0
#define mmGRBM_READ_ERROR 0x0016
#define mmGRBM_READ_ERROR_BASE_IDX 0
#define mmGRBM_READ_ERROR2 0x0017
#define mmGRBM_READ_ERROR2_BASE_IDX 0
#define mmGRBM_INT_CNTL 0x0018
#define mmGRBM_INT_CNTL_BASE_IDX 0
#define mmGRBM_TRAP_OP 0x0019
#define mmGRBM_TRAP_OP_BASE_IDX 0
#define mmGRBM_TRAP_ADDR 0x001a
#define mmGRBM_TRAP_ADDR_BASE_IDX 0
#define mmGRBM_TRAP_ADDR_MSK 0x001b
#define mmGRBM_TRAP_ADDR_MSK_BASE_IDX 0
#define mmGRBM_TRAP_WD 0x001c
#define mmGRBM_TRAP_WD_BASE_IDX 0
#define mmGRBM_TRAP_WD_MSK 0x001d
#define mmGRBM_TRAP_WD_MSK_BASE_IDX 0
#define mmGRBM_DSM_BYPASS 0x001e
#define mmGRBM_DSM_BYPASS_BASE_IDX 0
#define mmGRBM_WRITE_ERROR 0x001f
#define mmGRBM_WRITE_ERROR_BASE_IDX 0
#define mmGRBM_IOV_ERROR 0x0020
#define mmGRBM_IOV_ERROR_BASE_IDX 0
#define mmGRBM_CHIP_REVISION 0x0021
#define mmGRBM_CHIP_REVISION_BASE_IDX 0
#define mmGRBM_GFX_CNTL 0x0022
#define mmGRBM_GFX_CNTL_BASE_IDX 0
#define mmGRBM_RSMU_CFG 0x0023
#define mmGRBM_RSMU_CFG_BASE_IDX 0
#define mmGRBM_IH_CREDIT 0x0024
#define mmGRBM_IH_CREDIT_BASE_IDX 0
#define mmGRBM_PWR_CNTL2 0x0025
#define mmGRBM_PWR_CNTL2_BASE_IDX 0
#define mmGRBM_UTCL2_INVAL_RANGE_START 0x0026
#define mmGRBM_UTCL2_INVAL_RANGE_START_BASE_IDX 0
#define mmGRBM_UTCL2_INVAL_RANGE_END 0x0027
#define mmGRBM_UTCL2_INVAL_RANGE_END_BASE_IDX 0
#define mmGRBM_RSMU_READ_ERROR 0x0028
#define mmGRBM_RSMU_READ_ERROR_BASE_IDX 0
#define mmGRBM_CHICKEN_BITS 0x0029
#define mmGRBM_CHICKEN_BITS_BASE_IDX 0
#define mmGRBM_FENCE_RANGE0 0x002a
#define mmGRBM_FENCE_RANGE0_BASE_IDX 0
#define mmGRBM_FENCE_RANGE1 0x002b
#define mmGRBM_FENCE_RANGE1_BASE_IDX 0
#define mmGRBM_NOWHERE 0x003f
#define mmGRBM_NOWHERE_BASE_IDX 0
#define mmGRBM_SCRATCH_REG0 0x0040
#define mmGRBM_SCRATCH_REG0_BASE_IDX 0
#define mmGRBM_SCRATCH_REG1 0x0041
#define mmGRBM_SCRATCH_REG1_BASE_IDX 0
#define mmGRBM_SCRATCH_REG2 0x0042
#define mmGRBM_SCRATCH_REG2_BASE_IDX 0
#define mmGRBM_SCRATCH_REG3 0x0043
#define mmGRBM_SCRATCH_REG3_BASE_IDX 0
#define mmGRBM_SCRATCH_REG4 0x0044
#define mmGRBM_SCRATCH_REG4_BASE_IDX 0
#define mmGRBM_SCRATCH_REG5 0x0045
#define mmGRBM_SCRATCH_REG5_BASE_IDX 0
#define mmGRBM_SCRATCH_REG6 0x0046
#define mmGRBM_SCRATCH_REG6_BASE_IDX 0
#define mmGRBM_SCRATCH_REG7 0x0047
#define mmGRBM_SCRATCH_REG7_BASE_IDX 0
#define mmGRBM_CNTL 0x0000
#define mmGRBM_CNTL_BASE_IDX 0
#define mmGRBM_SKEW_CNTL 0x0001
#define mmGRBM_SKEW_CNTL_BASE_IDX 0
#define mmGRBM_STATUS2 0x0002
#define mmGRBM_STATUS2_BASE_IDX 0
#define mmGRBM_PWR_CNTL 0x0003
#define mmGRBM_PWR_CNTL_BASE_IDX 0
#define mmGRBM_STATUS 0x0004
#define mmGRBM_STATUS_BASE_IDX 0
#define mmGRBM_STATUS_SE0 0x0005
#define mmGRBM_STATUS_SE0_BASE_IDX 0
#define mmGRBM_STATUS_SE1 0x0006
#define mmGRBM_STATUS_SE1_BASE_IDX 0
#define mmGRBM_SOFT_RESET 0x0008
#define mmGRBM_SOFT_RESET_BASE_IDX 0
#define mmGRBM_GFX_CLKEN_CNTL 0x000c
#define mmGRBM_GFX_CLKEN_CNTL_BASE_IDX 0
#define mmGRBM_WAIT_IDLE_CLOCKS 0x000d
#define mmGRBM_WAIT_IDLE_CLOCKS_BASE_IDX 0
#define mmGRBM_STATUS_SE2 0x000e
#define mmGRBM_STATUS_SE2_BASE_IDX 0
#define mmGRBM_STATUS_SE3 0x000f
#define mmGRBM_STATUS_SE3_BASE_IDX 0
#define mmGRBM_READ_ERROR 0x0016
#define mmGRBM_READ_ERROR_BASE_IDX 0
#define mmGRBM_READ_ERROR2 0x0017
#define mmGRBM_READ_ERROR2_BASE_IDX 0
#define mmGRBM_INT_CNTL 0x0018
#define mmGRBM_INT_CNTL_BASE_IDX 0
#define mmGRBM_TRAP_OP 0x0019
#define mmGRBM_TRAP_OP_BASE_IDX 0
#define mmGRBM_TRAP_ADDR 0x001a
#define mmGRBM_TRAP_ADDR_BASE_IDX 0
#define mmGRBM_TRAP_ADDR_MSK 0x001b
#define mmGRBM_TRAP_ADDR_MSK_BASE_IDX 0
#define mmGRBM_TRAP_WD 0x001c
#define mmGRBM_TRAP_WD_BASE_IDX 0
#define mmGRBM_TRAP_WD_MSK 0x001d
#define mmGRBM_TRAP_WD_MSK_BASE_IDX 0
#define mmGRBM_DSM_BYPASS 0x001e
#define mmGRBM_DSM_BYPASS_BASE_IDX 0
#define mmGRBM_WRITE_ERROR 0x001f
#define mmGRBM_WRITE_ERROR_BASE_IDX 0
#define mmGRBM_IOV_ERROR 0x0020
#define mmGRBM_IOV_ERROR_BASE_IDX 0
#define mmGRBM_CHIP_REVISION 0x0021
#define mmGRBM_CHIP_REVISION_BASE_IDX 0
#define mmGRBM_GFX_CNTL 0x0022
#define mmGRBM_GFX_CNTL_BASE_IDX 0
#define mmGRBM_RSMU_CFG 0x0023
#define mmGRBM_RSMU_CFG_BASE_IDX 0
#define mmGRBM_IH_CREDIT 0x0024
#define mmGRBM_IH_CREDIT_BASE_IDX 0
#define mmGRBM_PWR_CNTL2 0x0025
#define mmGRBM_PWR_CNTL2_BASE_IDX 0
#define mmGRBM_UTCL2_INVAL_RANGE_START 0x0026
#define mmGRBM_UTCL2_INVAL_RANGE_START_BASE_IDX 0
#define mmGRBM_UTCL2_INVAL_RANGE_END 0x0027
#define mmGRBM_UTCL2_INVAL_RANGE_END_BASE_IDX 0
#define mmGRBM_RSMU_READ_ERROR 0x0028
#define mmGRBM_RSMU_READ_ERROR_BASE_IDX 0
#define mmGRBM_CHICKEN_BITS 0x0029
#define mmGRBM_CHICKEN_BITS_BASE_IDX 0
#define mmGRBM_FENCE_RANGE0 0x002a
#define mmGRBM_FENCE_RANGE0_BASE_IDX 0
#define mmGRBM_FENCE_RANGE1 0x002b
#define mmGRBM_FENCE_RANGE1_BASE_IDX 0
#define mmGRBM_NOWHERE 0x003f
#define mmGRBM_NOWHERE_BASE_IDX 0
#define mmGRBM_SCRATCH_REG0 0x0040
#define mmGRBM_SCRATCH_REG0_BASE_IDX 0
#define mmGRBM_SCRATCH_REG1 0x0041
#define mmGRBM_SCRATCH_REG1_BASE_IDX 0
#define mmGRBM_SCRATCH_REG2 0x0042
#define mmGRBM_SCRATCH_REG2_BASE_IDX 0
#define mmGRBM_SCRATCH_REG3 0x0043
#define mmGRBM_SCRATCH_REG3_BASE_IDX 0
#define mmGRBM_SCRATCH_REG4 0x0044
#define mmGRBM_SCRATCH_REG4_BASE_IDX 0
#define mmGRBM_SCRATCH_REG5 0x0045
#define mmGRBM_SCRATCH_REG5_BASE_IDX 0
#define mmGRBM_SCRATCH_REG6 0x0046
#define mmGRBM_SCRATCH_REG6_BASE_IDX 0
#define mmGRBM_SCRATCH_REG7 0x0047
#define mmGRBM_SCRATCH_REG7_BASE_IDX 0
// addressBlock: gc_cppdec2
// base address: 0xc600
#define mmCPF_EDC_TAG_CNT 0x1189
#define mmCPF_EDC_TAG_CNT_BASE_IDX 0
#define mmCPF_EDC_ROQ_CNT 0x118a
#define mmCPF_EDC_ROQ_CNT_BASE_IDX 0
#define mmCPG_EDC_TAG_CNT 0x118b
#define mmCPG_EDC_TAG_CNT_BASE_IDX 0
#define mmCPG_EDC_DMA_CNT 0x118d
#define mmCPG_EDC_DMA_CNT_BASE_IDX 0
#define mmCPC_EDC_SCRATCH_CNT 0x118e
#define mmCPC_EDC_SCRATCH_CNT_BASE_IDX 0
#define mmCPC_EDC_UCODE_CNT 0x118f
#define mmCPC_EDC_UCODE_CNT_BASE_IDX 0
#define mmDC_EDC_STATE_CNT 0x1191
#define mmDC_EDC_STATE_CNT_BASE_IDX 0
#define mmDC_EDC_CSINVOC_CNT 0x1192
#define mmDC_EDC_CSINVOC_CNT_BASE_IDX 0
#define mmDC_EDC_RESTORE_CNT 0x1193
#define mmDC_EDC_RESTORE_CNT_BASE_IDX 0
#define mmCPF_EDC_TAG_CNT 0x1189
#define mmCPF_EDC_TAG_CNT_BASE_IDX 0
#define mmCPF_EDC_ROQ_CNT 0x118a
#define mmCPF_EDC_ROQ_CNT_BASE_IDX 0
#define mmCPG_EDC_TAG_CNT 0x118b
#define mmCPG_EDC_TAG_CNT_BASE_IDX 0
#define mmCPG_EDC_DMA_CNT 0x118d
#define mmCPG_EDC_DMA_CNT_BASE_IDX 0
#define mmCPC_EDC_SCRATCH_CNT 0x118e
#define mmCPC_EDC_SCRATCH_CNT_BASE_IDX 0
#define mmCPC_EDC_UCODE_CNT 0x118f
#define mmCPC_EDC_UCODE_CNT_BASE_IDX 0
#define mmDC_EDC_STATE_CNT 0x1191
#define mmDC_EDC_STATE_CNT_BASE_IDX 0
#define mmDC_EDC_CSINVOC_CNT 0x1192
#define mmDC_EDC_CSINVOC_CNT_BASE_IDX 0
#define mmDC_EDC_RESTORE_CNT 0x1193
#define mmDC_EDC_RESTORE_CNT_BASE_IDX 0
// addressBlock: gc_gdsdec
// base address: 0x9700
#define mmGDS_EDC_CNT 0x05c5
#define mmGDS_EDC_CNT_BASE_IDX 0
#define mmGDS_EDC_GRBM_CNT 0x05c6
#define mmGDS_EDC_GRBM_CNT_BASE_IDX 0
#define mmGDS_EDC_OA_DED 0x05c7
#define mmGDS_EDC_OA_DED_BASE_IDX 0
#define mmGDS_EDC_OA_PHY_CNT 0x05cb
#define mmGDS_EDC_OA_PHY_CNT_BASE_IDX 0
#define mmGDS_EDC_OA_PIPE_CNT 0x05cc
#define mmGDS_EDC_OA_PIPE_CNT_BASE_IDX 0
#define mmGDS_EDC_CNT 0x05c5
#define mmGDS_EDC_CNT_BASE_IDX 0
#define mmGDS_EDC_GRBM_CNT 0x05c6
#define mmGDS_EDC_GRBM_CNT_BASE_IDX 0
#define mmGDS_EDC_OA_DED 0x05c7
#define mmGDS_EDC_OA_DED_BASE_IDX 0
#define mmGDS_EDC_OA_PHY_CNT 0x05cb
#define mmGDS_EDC_OA_PHY_CNT_BASE_IDX 0
#define mmGDS_EDC_OA_PIPE_CNT 0x05cc
#define mmGDS_EDC_OA_PIPE_CNT_BASE_IDX 0
// addressBlock: gc_shsdec
// base address: 0x9000
#define mmSPI_EDC_CNT 0x0445
#define mmSPI_EDC_CNT_BASE_IDX 0
#define mmSPI_EDC_CNT 0x0445
#define mmSPI_EDC_CNT_BASE_IDX 0
// addressBlock: gc_sqdec
// base address: 0x8c00
#define mmSQC_EDC_CNT2 0x032c
#define mmSQC_EDC_CNT2_BASE_IDX 0
#define mmSQC_EDC_CNT3 0x032d
#define mmSQC_EDC_CNT3_BASE_IDX 0
#define mmSQC_EDC_PARITY_CNT3 0x032e
#define mmSQC_EDC_PARITY_CNT3_BASE_IDX 0
#define mmSQC_EDC_CNT 0x03a2
#define mmSQC_EDC_CNT_BASE_IDX 0
#define mmSQ_EDC_SEC_CNT 0x03a3
#define mmSQ_EDC_SEC_CNT_BASE_IDX 0
#define mmSQ_EDC_DED_CNT 0x03a4
#define mmSQ_EDC_DED_CNT_BASE_IDX 0
#define mmSQ_EDC_INFO 0x03a5
#define mmSQ_EDC_INFO_BASE_IDX 0
#define mmSQ_EDC_CNT 0x03a6
#define mmSQ_EDC_CNT_BASE_IDX 0
#define mmSQC_EDC_CNT2 0x032c
#define mmSQC_EDC_CNT2_BASE_IDX 0
#define mmSQC_EDC_CNT3 0x032d
#define mmSQC_EDC_CNT3_BASE_IDX 0
#define mmSQC_EDC_PARITY_CNT3 0x032e
#define mmSQC_EDC_PARITY_CNT3_BASE_IDX 0
#define mmSQC_EDC_CNT 0x03a2
#define mmSQC_EDC_CNT_BASE_IDX 0
#define mmSQ_EDC_SEC_CNT 0x03a3
#define mmSQ_EDC_SEC_CNT_BASE_IDX 0
#define mmSQ_EDC_DED_CNT 0x03a4
#define mmSQ_EDC_DED_CNT_BASE_IDX 0
#define mmSQ_EDC_INFO 0x03a5
#define mmSQ_EDC_INFO_BASE_IDX 0
#define mmSQ_EDC_CNT 0x03a6
#define mmSQ_EDC_CNT_BASE_IDX 0
// addressBlock: gc_tpdec
// base address: 0x9400
#define mmTA_EDC_CNT 0x0586
#define mmTA_EDC_CNT_BASE_IDX 0
#define mmTA_EDC_CNT 0x0586
#define mmTA_EDC_CNT_BASE_IDX 0
// addressBlock: gc_tcdec
// base address: 0xac00
#define mmTCP_EDC_CNT 0x0b17
#define mmTCP_EDC_CNT_BASE_IDX 0
#define mmTCP_EDC_CNT_NEW 0x0b18
#define mmTCP_EDC_CNT_NEW_BASE_IDX 0
#define mmTCP_ATC_EDC_GATCL1_CNT 0x12b1
#define mmTCP_ATC_EDC_GATCL1_CNT_BASE_IDX 0
#define mmTCI_EDC_CNT 0x0b60
#define mmTCI_EDC_CNT_BASE_IDX 0
#define mmTCC_EDC_CNT 0x0b82
#define mmTCC_EDC_CNT_BASE_IDX 0
#define mmTCC_EDC_CNT2 0x0b83
#define mmTCC_EDC_CNT2_BASE_IDX 0
#define mmTCA_EDC_CNT 0x0bc5
#define mmTCA_EDC_CNT_BASE_IDX 0
#define mmTCP_EDC_CNT 0x0b17
#define mmTCP_EDC_CNT_BASE_IDX 0
#define mmTCP_EDC_CNT_NEW 0x0b18
#define mmTCP_EDC_CNT_NEW_BASE_IDX 0
#define mmTCP_ATC_EDC_GATCL1_CNT 0x12b1
#define mmTCP_ATC_EDC_GATCL1_CNT_BASE_IDX 0
#define mmTCI_EDC_CNT 0x0b60
#define mmTCI_EDC_CNT_BASE_IDX 0
#define mmTCC_EDC_CNT 0x0b82
#define mmTCC_EDC_CNT_BASE_IDX 0
#define mmTCC_EDC_CNT2 0x0b83
#define mmTCC_EDC_CNT2_BASE_IDX 0
#define mmTCA_EDC_CNT 0x0bc5
#define mmTCA_EDC_CNT_BASE_IDX 0
// addressBlock: gc_tpdec
// base address: 0x9400
#define mmTD_EDC_CNT 0x052e
#define mmTD_EDC_CNT_BASE_IDX 0
#define mmTA_EDC_CNT 0x0586
#define mmTA_EDC_CNT_BASE_IDX 0
#define mmTD_EDC_CNT 0x052e
#define mmTD_EDC_CNT_BASE_IDX 0
#define mmTA_EDC_CNT 0x0586
#define mmTA_EDC_CNT_BASE_IDX 0
// addressBlock: gc_ea_gceadec2
// base address: 0x9c00
#define mmGCEA_EDC_CNT 0x0706
#define mmGCEA_EDC_CNT_BASE_IDX 0
#define mmGCEA_EDC_CNT2 0x0707
#define mmGCEA_EDC_CNT2_BASE_IDX 0
#define mmGCEA_EDC_CNT3 0x071b
#define mmGCEA_EDC_CNT3_BASE_IDX 0
#define mmGCEA_ERR_STATUS 0x0712
#define mmGCEA_ERR_STATUS_BASE_IDX 0
#define mmGCEA_EDC_CNT 0x0706
#define mmGCEA_EDC_CNT_BASE_IDX 0
#define mmGCEA_EDC_CNT2 0x0707
#define mmGCEA_EDC_CNT2_BASE_IDX 0
#define mmGCEA_EDC_CNT3 0x071b
#define mmGCEA_EDC_CNT3_BASE_IDX 0
#define mmGCEA_ERR_STATUS 0x0712
#define mmGCEA_ERR_STATUS_BASE_IDX 0
// addressBlock: gc_gfxudec
// base address: 0x30000
#define mmSCRATCH_REG0 0x2040
#define mmSCRATCH_REG0_BASE_IDX 1
#define mmSCRATCH_REG1 0x2041
#define mmSCRATCH_REG1_BASE_IDX 1
#define mmSCRATCH_REG2 0x2042
#define mmSCRATCH_REG2_BASE_IDX 1
#define mmSCRATCH_REG3 0x2043
#define mmSCRATCH_REG3_BASE_IDX 1
#define mmSCRATCH_REG4 0x2044
#define mmSCRATCH_REG4_BASE_IDX 1
#define mmSCRATCH_REG5 0x2045
#define mmSCRATCH_REG5_BASE_IDX 1
#define mmSCRATCH_REG6 0x2046
#define mmSCRATCH_REG6_BASE_IDX 1
#define mmSCRATCH_REG7 0x2047
#define mmSCRATCH_REG7_BASE_IDX 1
#define mmGRBM_GFX_INDEX 0x2200
#define mmGRBM_GFX_INDEX_BASE_IDX 1
#define mmSCRATCH_REG0 0x2040
#define mmSCRATCH_REG0_BASE_IDX 1
#define mmSCRATCH_REG1 0x2041
#define mmSCRATCH_REG1_BASE_IDX 1
#define mmSCRATCH_REG2 0x2042
#define mmSCRATCH_REG2_BASE_IDX 1
#define mmSCRATCH_REG3 0x2043
#define mmSCRATCH_REG3_BASE_IDX 1
#define mmSCRATCH_REG4 0x2044
#define mmSCRATCH_REG4_BASE_IDX 1
#define mmSCRATCH_REG5 0x2045
#define mmSCRATCH_REG5_BASE_IDX 1
#define mmSCRATCH_REG6 0x2046
#define mmSCRATCH_REG6_BASE_IDX 1
#define mmSCRATCH_REG7 0x2047
#define mmSCRATCH_REG7_BASE_IDX 1
#define mmGRBM_GFX_INDEX 0x2200
#define mmGRBM_GFX_INDEX_BASE_IDX 1
// addressBlock: gc_utcl2_atcl2dec
// base address: 0xa000
#define mmATC_L2_CACHE_4K_DSM_INDEX 0x080e
#define mmATC_L2_CACHE_4K_DSM_INDEX_BASE_IDX 0
#define mmATC_L2_CACHE_2M_DSM_INDEX 0x080f
#define mmATC_L2_CACHE_2M_DSM_INDEX_BASE_IDX 0
#define mmATC_L2_CACHE_4K_DSM_CNTL 0x0810
#define mmATC_L2_CACHE_4K_DSM_CNTL_BASE_IDX 0
#define mmATC_L2_CACHE_2M_DSM_CNTL 0x0811
#define mmATC_L2_CACHE_2M_DSM_CNTL_BASE_IDX 0
#define mmATC_L2_CACHE_4K_DSM_INDEX 0x080e
#define mmATC_L2_CACHE_4K_DSM_INDEX_BASE_IDX 0
#define mmATC_L2_CACHE_2M_DSM_INDEX 0x080f
#define mmATC_L2_CACHE_2M_DSM_INDEX_BASE_IDX 0
#define mmATC_L2_CACHE_4K_DSM_CNTL 0x0810
#define mmATC_L2_CACHE_4K_DSM_CNTL_BASE_IDX 0
#define mmATC_L2_CACHE_2M_DSM_CNTL 0x0811
#define mmATC_L2_CACHE_2M_DSM_CNTL_BASE_IDX 0
// addressBlock: gc_utcl2_vml2pfdec
// base address: 0xa100
#define mmVML2_MEM_ECC_INDEX 0x0860
#define mmVML2_MEM_ECC_INDEX_BASE_IDX 0
#define mmVML2_WALKER_MEM_ECC_INDEX 0x0861
#define mmVML2_WALKER_MEM_ECC_INDEX_BASE_IDX 0
#define mmUTCL2_MEM_ECC_INDEX 0x0862
#define mmUTCL2_MEM_ECC_INDEX_BASE_IDX 0
#define mmVML2_MEM_ECC_INDEX 0x0860
#define mmVML2_MEM_ECC_INDEX_BASE_IDX 0
#define mmVML2_WALKER_MEM_ECC_INDEX 0x0861
#define mmVML2_WALKER_MEM_ECC_INDEX_BASE_IDX 0
#define mmUTCL2_MEM_ECC_INDEX 0x0862
#define mmUTCL2_MEM_ECC_INDEX_BASE_IDX 0
#define mmVML2_MEM_ECC_CNTL 0x0863
#define mmVML2_MEM_ECC_CNTL_BASE_IDX 0
#define mmVML2_WALKER_MEM_ECC_CNTL 0x0864
#define mmVML2_WALKER_MEM_ECC_CNTL_BASE_IDX 0
#define mmUTCL2_MEM_ECC_CNTL 0x0865
#define mmUTCL2_MEM_ECC_CNTL_BASE_IDX 0
#define mmVML2_MEM_ECC_CNTL 0x0863
#define mmVML2_MEM_ECC_CNTL_BASE_IDX 0
#define mmVML2_WALKER_MEM_ECC_CNTL 0x0864
#define mmVML2_WALKER_MEM_ECC_CNTL_BASE_IDX 0
#define mmUTCL2_MEM_ECC_CNTL 0x0865
#define mmUTCL2_MEM_ECC_CNTL_BASE_IDX 0
// addressBlock: gc_rlcpdec
// base address: 0x3b000
#define mmRLC_EDC_CNT 0x4d40
#define mmRLC_EDC_CNT_BASE_IDX 1
#define mmRLC_EDC_CNT2 0x4d41
#define mmRLC_EDC_CNT2_BASE_IDX 1
#define mmRLC_EDC_CNT 0x4d40
#define mmRLC_EDC_CNT_BASE_IDX 1
#define mmRLC_EDC_CNT2 0x4d41
#define mmRLC_EDC_CNT2_BASE_IDX 1
#endif
File diff not shown because this file is too large.
File diff not shown because this file is too large.
File diff not shown because this file is too large.
@@ -41,18 +41,17 @@ namespace rocprofiler::pc_sampler::gfxip {
namespace {
static int find_pci_instance(const std::string &pci_string) {
static int find_pci_instance(const std::string& pci_string) {
rocprofiler::handle_t<DIR*, util::dir_closer> dir(opendir(DEBUG_DRI_PATH));
if (dir.get() == nullptr) {
char *errstr = strerror(errno);
char* errstr = strerror(errno);
warning("Can't open debugfs dri directory: %s\n", errstr);
goto fail;
}
struct dirent *dent;
struct dirent* dent;
while ((dent = readdir(dir.get())) != nullptr) {
if (strcmp(dent->d_name, ".") == 0 || strcmp(dent->d_name, "..") == 0)
continue;
if (strcmp(dent->d_name, ".") == 0 || strcmp(dent->d_name, "..") == 0) continue;
std::string name(DEBUG_DRI_PATH);
name += dent->d_name;
@@ -66,8 +65,7 @@ static int find_pci_instance(const std::string &pci_string) {
ifs >> device;
}
if (device.empty()) continue;
if (auto p = device.find(DEV_PFX); p != device.npos)
device.erase(p, strlen(DEV_PFX));
if (auto p = device.find(DEV_PFX); p != device.npos) device.erase(p, strlen(DEV_PFX));
if (pci_string == device) return std::stoi(dent->d_name);
}
@@ -75,7 +73,7 @@ fail:
return -1;
}
} // namespace
} // namespace
uint32_t pasid() {
static std::optional<uint32_t> pasid;
@@ -89,9 +87,7 @@ uint32_t pasid() {
return *pasid;
}
int debugfs_ioctl_set_state(
const device_t& dev,
const struct amdgpu_debugfs_regs2_iocdata &ioc) {
int debugfs_ioctl_set_state(const device_t& dev, const struct amdgpu_debugfs_regs2_iocdata& ioc) {
int ret = ioctl(dev.fd_.mmio2.get(), AMDGPU_DEBUGFS_REGS2_IOC_SET_STATE, &ioc);
if (ret < 0) {
fatal("Couldn't set register ioctl state\n");
@@ -99,11 +95,9 @@ int debugfs_ioctl_set_state(
return ret;
}
int debugfs_ioctl_write_register(
const device_t &dev,
const struct amdgpu_debugfs_regs2_iocdata &ioc,
const uint64_t addr,
const uint32_t value) {
int debugfs_ioctl_write_register(const device_t& dev,
const struct amdgpu_debugfs_regs2_iocdata& ioc,
const uint64_t addr, const uint32_t value) {
debugfs_ioctl_set_state(dev, ioc);
if (lseek(dev.fd_.mmio2.get(), addr * 4, SEEK_SET) < 0) {
fatal("Cannot seek to MMIO address for write\n");
@@ -115,10 +109,9 @@ int debugfs_ioctl_write_register(
return r;
}
uint32_t debugfs_ioctl_read_register(
const device_t& dev,
const struct amdgpu_debugfs_regs2_iocdata &ioc,
const uint64_t addr) {
uint32_t debugfs_ioctl_read_register(const device_t& dev,
const struct amdgpu_debugfs_regs2_iocdata& ioc,
const uint64_t addr) {
// Select the SE, SH, and CU.
debugfs_ioctl_set_state(dev, ioc);
@@ -134,20 +127,17 @@ uint32_t debugfs_ioctl_read_register(
return value;
}
device_t::device_t(const bool pci_inited, const Agent::AgentInfo &info)
: agent_info_(info)
, pci_memory_(nullptr)
{
device_t::device_t(const bool pci_inited, const Agent::AgentInfo& info)
: agent_info_(info), pci_memory_(nullptr) {
const auto pci_domain = agent_info_.getPCIDomain();
const auto pci_location_id = agent_info_.getPCILocationID();
std::string name([pci_domain, pci_location_id]() {
std::ostringstream out;
out.fill('0');
out << std::hex << std::setw(4) << pci_domain << ':'
<< std::hex << std::setw(2) << (pci_location_id >> 8) << ':'
<< std::hex << std::setw(2) << (pci_location_id & 0xFF) << '.'
<< 0;
out << std::hex << std::setw(4) << pci_domain << ':' << std::hex << std::setw(2)
<< (pci_location_id >> 8) << ':' << std::hex << std::setw(2) << (pci_location_id & 0xFF)
<< '.' << 0;
return out.str();
}());
@@ -162,8 +152,7 @@ device_t::device_t(const bool pci_inited, const Agent::AgentInfo &info)
if (fd_.mmio2.get() < 0) {
warning("Couldn't open amdgpu_regs2 debugfs file\n");
if (!pci_inited) {
constexpr char msg[] =
"PCI system uninitialized; no PC sampling methods available\n";
constexpr char msg[] = "PCI system uninitialized; no PC sampling methods available\n";
fatal(msg);
}
} else {
@@ -173,8 +162,7 @@ device_t::device_t(const bool pci_inited, const Agent::AgentInfo &info)
pci_device_ =
pci_device_find_by_slot(pci_domain, pci_location_id >> 8, pci_location_id & 0xFF, 0);
if (!pci_device_ || pci_device_probe(pci_device_))
fatal("failed to probe the GPU device\n");
if (!pci_device_ || pci_device_probe(pci_device_)) fatal("failed to probe the GPU device\n");
// Look for a region between 256KB and 4096KB, 32-bit, non IO, and non prefetchable.
for (size_t region = 0; region < sizeof(pci_device::regions) / sizeof(pci_device::regions[0]);
@@ -199,11 +187,9 @@ device_specific_init:
}
device_t::~device_t() {
if (pci_memory_ &&
pci_device_unmap_range(pci_device_, pci_memory_, pci_memory_size_))
{
if (pci_memory_ && pci_device_unmap_range(pci_device_, pci_memory_, pci_memory_size_)) {
warning("failed to unmap the pci memory\n");
}
}
} // namespace rocprofiler::pc_sampler::gfxip
} // namespace rocprofiler::pc_sampler::gfxip
@@ -52,14 +52,18 @@ namespace gfxip {
namespace util {
struct dir_closer {
void operator()(DIR *dir) { if (dir != nullptr) closedir(dir); }
void operator()(DIR* dir) {
if (dir != nullptr) closedir(dir);
}
};
struct fd_closer {
void operator()(int fd) { if (fd >= 0) close(fd); }
void operator()(int fd) {
if (fd >= 0) close(fd);
}
};
} // namespace rocprofiler::pc_sampler::gfxip::util
} // namespace util
struct amdgpu_debugfs_regs2_iocdata {
__u32 use_srbm, use_grbm, pg_lock;
@@ -71,11 +75,10 @@ struct amdgpu_debugfs_regs2_iocdata {
} srbm;
};
enum AMDGPU_DEBUGFS_REGS2_CMDS {
AMDGPU_DEBUGFS_REGS2_CMD_SET_STATE = 0
};
enum AMDGPU_DEBUGFS_REGS2_CMDS { AMDGPU_DEBUGFS_REGS2_CMD_SET_STATE = 0 };
#define AMDGPU_DEBUGFS_REGS2_IOC_SET_STATE _IOWR(0x20, AMDGPU_DEBUGFS_REGS2_CMD_SET_STATE, struct amdgpu_debugfs_regs2_iocdata)
#define AMDGPU_DEBUGFS_REGS2_IOC_SET_STATE \
_IOWR(0x20, AMDGPU_DEBUGFS_REGS2_CMD_SET_STATE, struct amdgpu_debugfs_regs2_iocdata)
enum {
GC_HWIP = 1, // Graphics Core IP
@@ -96,14 +99,14 @@ static constexpr int HWIP_MAX_INSTANCE = 11;
(REG_FIELD_MASK(reg, field) & ((field_val) << REG_FIELD_SHIFT(reg, field))))
struct device_t {
device_t(const bool pci_inited, const Agent::AgentInfo &agent_info);
device_t(const bool pci_inited, const Agent::AgentInfo& agent_info);
~device_t();
device_t(const device_t&) = delete;
device_t& operator=(const device_t&) = delete;
device_t(device_t&&) = default;
const Agent::AgentInfo &agent_info_;
const Agent::AgentInfo& agent_info_;
struct pci_device* pci_device_;
size_t pci_memory_size_;
@@ -120,19 +123,23 @@ struct device_t {
uint32_t pasid();
int debugfs_ioctl_set_state(const device_t& dev, const struct amdgpu_debugfs_regs2_iocdata &ioc);
int debugfs_ioctl_write_register(const device_t &dev, const struct amdgpu_debugfs_regs2_iocdata &ioc, const uint64_t addr, const uint32_t value);
uint32_t debugfs_ioctl_read_register(const device_t& dev, const struct amdgpu_debugfs_regs2_iocdata &ioc, const uint64_t addr);
int debugfs_ioctl_set_state(const device_t& dev, const struct amdgpu_debugfs_regs2_iocdata& ioc);
int debugfs_ioctl_write_register(const device_t& dev,
const struct amdgpu_debugfs_regs2_iocdata& ioc,
const uint64_t addr, const uint32_t value);
uint32_t debugfs_ioctl_read_register(const device_t& dev,
const struct amdgpu_debugfs_regs2_iocdata& ioc,
const uint64_t addr);
void vega10_reg_offset_init(device_t& dev);
void vega20_reg_offset_init(device_t& dev);
void arct_reg_offset_init(device_t& dev);
void aldebaran_reg_offset_init(device_t& dev);
void read_pc_samples_v9(const device_t& dev, PCSampler *sampler);
void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler);
void read_pc_samples_v9(const device_t& dev, PCSampler* sampler);
void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler* sampler);
} // namespace rocprofiler::pc_sampler::gfxip
} // namespace gfxip
} // namespace rocprofiler::pc_sampler
#endif // SRC_PCSAMPLER_GFXIP_GFXIP_H_
@@ -54,12 +54,10 @@ uint32_t read_sq_register(const device_t& dev, uint32_t simd, uint32_t wave_id,
return dev.pci_memory_[REG_OFFSET(GC, 0, mmSQ_IND_DATA)];
}
uint32_t debugfs_ioctl_read_sq_register(
const device_t &dev,
const struct amdgpu_debugfs_regs2_iocdata &ioc,
const uint32_t simd,
const uint32_t wave_id,
const uint32_t register_address) {
uint32_t debugfs_ioctl_read_sq_register(const device_t& dev,
const struct amdgpu_debugfs_regs2_iocdata& ioc,
const uint32_t simd, const uint32_t wave_id,
const uint32_t register_address) {
uint32_t data = REG_SET_FIELD(0, SQ_IND_INDEX, WAVE_ID, wave_id);
data = REG_SET_FIELD(data, SQ_IND_INDEX, SIMD_ID, simd);
data = REG_SET_FIELD(data, SQ_IND_INDEX, INDEX, register_address);
@@ -67,21 +65,15 @@ uint32_t debugfs_ioctl_read_sq_register(
return debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmSQ_IND_DATA));
}
void fill_record(
const device_t &dev,
rocprofiler_record_pc_sample_t *record,
uint32_t se,
uint64_t pc,
hsa_kernel_dispatch_packet_t *pkt) {
void fill_record(const device_t& dev, rocprofiler_record_pc_sample_t* record, uint32_t se,
uint64_t pc, hsa_kernel_dispatch_packet_t* pkt) {
/*
* XXX: Use of the reserved2 field in the HSA dispatch packet to uniquely
* identify kernel dispatches for PC sampling is an internal implementation
* detail which is subject to change. See the comment associated with
* rocprofiler::rocprofiler::kernel_dispatch_counter_.
*/
record->pc_sample.dispatch_id =
rocprofiler_kernel_dispatch_id_t{pkt->reserved2};
record->pc_sample.dispatch_id = rocprofiler_kernel_dispatch_id_t{pkt->reserved2};
/*
* TODO: Fill this with gpu_clock_counter via AMDKFD_IOC_GET_CLOCK_COUNTERS,
@@ -98,12 +90,12 @@ void fill_record(
* Future sampling methods may fill this in automatically from the GPU's
* real-time counter.
*/
//record->pc_sample.cycle = 0;
// record->pc_sample.cycle = 0;
rocprofiler_get_timestamp(&record->pc_sample.timestamp);
record->pc_sample.pc = pc;
record->pc_sample.se = se;
const auto &hdl = dev.agent_info_.getHandle();
const auto& hdl = dev.agent_info_.getHandle();
/*
* XXX FIXME: For consistency, this is the same method as used by
@@ -112,17 +104,16 @@ void fill_record(
* comment in rocprofiler::hsa_support::Initialize about using KFD's gpu_id for
* more information.
*/
record->pc_sample.gpu_id = rocprofiler_agent_id_t{
(uint64_t)rocprofiler::hsa_support::GetAgentInfo(hdl).getIndex()};
record->pc_sample.gpu_id =
rocprofiler_agent_id_t{(uint64_t)rocprofiler::hsa_support::GetAgentInfo(hdl).getIndex()};
}
} // namespace
void read_pc_samples_v9(const device_t& dev, PCSampler *sampler) {
void read_pc_samples_v9(const device_t& dev, PCSampler* sampler) {
assert(sampler);
uint32_t saved_grbm_gfx_index =
dev.pci_memory_[REG_OFFSET(GC, 0, mmGRBM_GFX_INDEX)];
uint32_t saved_grbm_gfx_index = dev.pci_memory_[REG_OFFSET(GC, 0, mmGRBM_GFX_INDEX)];
uint32_t data;
for (uint32_t se = 0; se < dev.agent_info_.getShaderEngineCount(); ++se)
@@ -174,19 +165,16 @@ void read_pc_samples_v9(const device_t& dev, PCSampler *sampler) {
data = REG_SET_FIELD(data, GRBM_GFX_CNTL, VMID, vm_id);
dev.pci_memory_[REG_OFFSET(GC, 0, mmGRBM_GFX_CNTL)] = data;
uint32_t pq_base_lo =
dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE)];
uint32_t pq_base_hi =
dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)] & 0xff;
uint32_t pq_base_lo = dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE)];
uint32_t pq_base_hi = dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)] & 0xff;
uint64_t pq_base = (uint64_t)pq_base_hi << 40 | (uint64_t)pq_base_lo << 8;
uint32_t cp_hqd_pq_control_queue_size =
dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_CONTROL)] & 0x3f;
uint32_t queue_size = 1 << (cp_hqd_pq_control_queue_size + 1);
auto pkt = (hsa_kernel_dispatch_packet_t*)(
pq_base + disp_idx % queue_size *
sizeof(hsa_kernel_dispatch_packet_t)
);
auto pkt = (hsa_kernel_dispatch_packet_t*)(pq_base +
disp_idx % queue_size *
sizeof(hsa_kernel_dispatch_packet_t));
fill_record(dev, &record, se, *pc, pkt);
}
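The address arithmetic in this hunk can be isolated from the register reads. The sketch below mirrors the expressions in the diff (`hi << 40 | lo << 8`, `1 << (field + 1)`, index modulo size times packet size) as plain functions. The function names are hypothetical, and the 64-byte constant stands in for `sizeof(hsa_kernel_dispatch_packet_t)`, relying on AQL packets being 64 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for sizeof(hsa_kernel_dispatch_packet_t); AQL packets are 64 bytes.
constexpr uint64_t kPacketBytes = 64;

// Combine the two base registers the way the diff does: the low register
// holds address bits 8 and up, the low byte of the high register holds
// bits 40 and up.
constexpr uint64_t queue_base(uint32_t base_hi, uint32_t base_lo) {
  return (uint64_t)(base_hi & 0xff) << 40 | (uint64_t)base_lo << 8;
}

// Address of the packet for a given dispatch index, following the diff:
// the size field is decoded as 1 << (field + 1) and the index wraps
// around that value.
inline uint64_t packet_address(uint64_t base, uint32_t size_field, uint32_t disp_idx) {
  assert(size_field < 63);  // keep the shift below the width of uint64_t
  const uint64_t slots = uint64_t{1} << (size_field + 1);
  return base + (disp_idx % slots) * kPacketBytes;
}
```

With a size field of 3 the index wraps at 16, so dispatch index 17 maps to the second slot, 64 bytes past the base.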
@@ -208,10 +196,10 @@ void read_pc_samples_v9(const device_t& dev, PCSampler *sampler) {
dev.pci_memory_[REG_OFFSET(GC, 0, mmGRBM_GFX_INDEX)] = saved_grbm_gfx_index;
}
void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler) {
void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler* sampler) {
assert(sampler);
struct amdgpu_debugfs_regs2_iocdata ioc{};
struct amdgpu_debugfs_regs2_iocdata ioc {};
ioc.use_grbm = 1;
uint32_t data;
@@ -236,11 +224,13 @@ void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler) {
// Skip this slot if the wave is not valid.
debugfs_ioctl_set_state(dev, ioc);
uint32_t status = debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_STATUS);
uint32_t status =
debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_STATUS);
if (!REG_GET_FIELD(status, SQ_WAVE_STATUS, VALID)) continue;
debugfs_ioctl_set_state(dev, ioc);
uint32_t hw_id = debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_HW_ID);
uint32_t hw_id =
debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_HW_ID);
uint32_t vm_id = REG_GET_FIELD(hw_id, SQ_WAVE_HW_ID, VM_ID);
rocprofiler_record_pc_sample_t record;
@@ -248,12 +238,16 @@ void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler) {
// If the wave's PASID matches the process', read and report the PC
// and dispatch packet for the wave.
std::optional<uint64_t> pc;
if (debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(OSSSYS, 0, mmIH_VMID_0_LUT) + vm_id) == pasid()) {
pc = (uint64_t)debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_PC_HI) << 32 |
if (debugfs_ioctl_read_register(
dev, ioc, REG_OFFSET(OSSSYS, 0, mmIH_VMID_0_LUT) + vm_id) == pasid()) {
pc =
(uint64_t)debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_PC_HI)
<< 32 |
debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_PC_LO);
// The dispatch index into the queue
uint32_t disp_idx = debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_TTMP6);
uint32_t disp_idx =
debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_TTMP6);
// Set up reading CP_HQD_PQ_BASE and CP_HQD_PQ_BASE_HI
uint32_t pipe_id = REG_GET_FIELD(hw_id, SQ_WAVE_HW_ID, PIPE_ID);
@@ -266,18 +260,19 @@ void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler) {
debugfs_ioctl_write_register(dev, ioc, REG_OFFSET(GC, 0, mmGRBM_GFX_CNTL), data);
uint32_t pq_base_lo =
debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE));
uint32_t pq_base_hi =
debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)) & 0xff;
debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)) &
0xff;
uint64_t pq_base = (uint64_t)pq_base_hi << 40 | (uint64_t)pq_base_lo << 8;
uint32_t cp_hqd_pq_control_queue_size =
debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_CONTROL)) & 0x3f;
debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_CONTROL)) &
0x3f;
uint32_t queue_size = 1 << (cp_hqd_pq_control_queue_size + 1);
auto pkt = (hsa_kernel_dispatch_packet_t*)(
pq_base + disp_idx % queue_size *
sizeof(hsa_kernel_dispatch_packet_t)
);
auto pkt = (hsa_kernel_dispatch_packet_t*)(pq_base +
disp_idx % queue_size *
sizeof(hsa_kernel_dispatch_packet_t));
fill_record(dev, &record, se, *pc, pkt);
}
@@ -22,306 +22,305 @@
#define _osssys_4_0_OFFSET_HEADER
// addressBlock: osssys_osssysdec
// base address: 0x4280
#define mmIH_VMID_0_LUT 0x0000
#define mmIH_VMID_0_LUT_BASE_IDX 0
#define mmIH_VMID_1_LUT 0x0001
#define mmIH_VMID_1_LUT_BASE_IDX 0
#define mmIH_VMID_2_LUT 0x0002
#define mmIH_VMID_2_LUT_BASE_IDX 0
#define mmIH_VMID_3_LUT 0x0003
#define mmIH_VMID_3_LUT_BASE_IDX 0
#define mmIH_VMID_4_LUT 0x0004
#define mmIH_VMID_4_LUT_BASE_IDX 0
#define mmIH_VMID_5_LUT 0x0005
#define mmIH_VMID_5_LUT_BASE_IDX 0
#define mmIH_VMID_6_LUT 0x0006
#define mmIH_VMID_6_LUT_BASE_IDX 0
#define mmIH_VMID_7_LUT 0x0007
#define mmIH_VMID_7_LUT_BASE_IDX 0
#define mmIH_VMID_8_LUT 0x0008
#define mmIH_VMID_8_LUT_BASE_IDX 0
#define mmIH_VMID_9_LUT 0x0009
#define mmIH_VMID_9_LUT_BASE_IDX 0
#define mmIH_VMID_10_LUT 0x000a
#define mmIH_VMID_10_LUT_BASE_IDX 0
#define mmIH_VMID_11_LUT 0x000b
#define mmIH_VMID_11_LUT_BASE_IDX 0
#define mmIH_VMID_12_LUT 0x000c
#define mmIH_VMID_12_LUT_BASE_IDX 0
#define mmIH_VMID_13_LUT 0x000d
#define mmIH_VMID_13_LUT_BASE_IDX 0
#define mmIH_VMID_14_LUT 0x000e
#define mmIH_VMID_14_LUT_BASE_IDX 0
#define mmIH_VMID_15_LUT 0x000f
#define mmIH_VMID_15_LUT_BASE_IDX 0
#define mmIH_VMID_0_LUT_MM 0x0010
#define mmIH_VMID_0_LUT_MM_BASE_IDX 0
#define mmIH_VMID_1_LUT_MM 0x0011
#define mmIH_VMID_1_LUT_MM_BASE_IDX 0
#define mmIH_VMID_2_LUT_MM 0x0012
#define mmIH_VMID_2_LUT_MM_BASE_IDX 0
#define mmIH_VMID_3_LUT_MM 0x0013
#define mmIH_VMID_3_LUT_MM_BASE_IDX 0
#define mmIH_VMID_4_LUT_MM 0x0014
#define mmIH_VMID_4_LUT_MM_BASE_IDX 0
#define mmIH_VMID_5_LUT_MM 0x0015
#define mmIH_VMID_5_LUT_MM_BASE_IDX 0
#define mmIH_VMID_6_LUT_MM 0x0016
#define mmIH_VMID_6_LUT_MM_BASE_IDX 0
#define mmIH_VMID_7_LUT_MM 0x0017
#define mmIH_VMID_7_LUT_MM_BASE_IDX 0
#define mmIH_VMID_8_LUT_MM 0x0018
#define mmIH_VMID_8_LUT_MM_BASE_IDX 0
#define mmIH_VMID_9_LUT_MM 0x0019
#define mmIH_VMID_9_LUT_MM_BASE_IDX 0
#define mmIH_VMID_10_LUT_MM 0x001a
#define mmIH_VMID_10_LUT_MM_BASE_IDX 0
#define mmIH_VMID_11_LUT_MM 0x001b
#define mmIH_VMID_11_LUT_MM_BASE_IDX 0
#define mmIH_VMID_12_LUT_MM 0x001c
#define mmIH_VMID_12_LUT_MM_BASE_IDX 0
#define mmIH_VMID_13_LUT_MM 0x001d
#define mmIH_VMID_13_LUT_MM_BASE_IDX 0
#define mmIH_VMID_14_LUT_MM 0x001e
#define mmIH_VMID_14_LUT_MM_BASE_IDX 0
#define mmIH_VMID_15_LUT_MM 0x001f
#define mmIH_VMID_15_LUT_MM_BASE_IDX 0
#define mmIH_COOKIE_0 0x0020
#define mmIH_COOKIE_0_BASE_IDX 0
#define mmIH_COOKIE_1 0x0021
#define mmIH_COOKIE_1_BASE_IDX 0
#define mmIH_COOKIE_2 0x0022
#define mmIH_COOKIE_2_BASE_IDX 0
#define mmIH_COOKIE_3 0x0023
#define mmIH_COOKIE_3_BASE_IDX 0
#define mmIH_COOKIE_4 0x0024
#define mmIH_COOKIE_4_BASE_IDX 0
#define mmIH_COOKIE_5 0x0025
#define mmIH_COOKIE_5_BASE_IDX 0
#define mmIH_COOKIE_6 0x0026
#define mmIH_COOKIE_6_BASE_IDX 0
#define mmIH_COOKIE_7 0x0027
#define mmIH_COOKIE_7_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART0 0x003f
#define mmIH_REGISTER_LAST_PART0_BASE_IDX 0
#define mmSEM_REQ_INPUT_0 0x0040
#define mmSEM_REQ_INPUT_0_BASE_IDX 0
#define mmSEM_REQ_INPUT_1 0x0041
#define mmSEM_REQ_INPUT_1_BASE_IDX 0
#define mmSEM_REQ_INPUT_2 0x0042
#define mmSEM_REQ_INPUT_2_BASE_IDX 0
#define mmSEM_REQ_INPUT_3 0x0043
#define mmSEM_REQ_INPUT_3_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART0 0x007f
#define mmSEM_REGISTER_LAST_PART0_BASE_IDX 0
#define mmIH_RB_CNTL 0x0080
#define mmIH_RB_CNTL_BASE_IDX 0
#define mmIH_RB_BASE 0x0081
#define mmIH_RB_BASE_BASE_IDX 0
#define mmIH_RB_BASE_HI 0x0082
#define mmIH_RB_BASE_HI_BASE_IDX 0
#define mmIH_RB_RPTR 0x0083
#define mmIH_RB_RPTR_BASE_IDX 0
#define mmIH_RB_WPTR 0x0084
#define mmIH_RB_WPTR_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_HI 0x0085
#define mmIH_RB_WPTR_ADDR_HI_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_LO 0x0086
#define mmIH_RB_WPTR_ADDR_LO_BASE_IDX 0
#define mmIH_DOORBELL_RPTR 0x0087
#define mmIH_DOORBELL_RPTR_BASE_IDX 0
#define mmIH_RB_CNTL_RING1 0x0088
#define mmIH_RB_CNTL_RING1_BASE_IDX 0
#define mmIH_RB_BASE_RING1 0x0089
#define mmIH_RB_BASE_RING1_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING1 0x008a
#define mmIH_RB_BASE_HI_RING1_BASE_IDX 0
#define mmIH_RB_RPTR_RING1 0x008b
#define mmIH_RB_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_WPTR_RING1 0x008c
#define mmIH_RB_WPTR_RING1_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING1 0x008f
#define mmIH_DOORBELL_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_CNTL_RING2 0x0090
#define mmIH_RB_CNTL_RING2_BASE_IDX 0
#define mmIH_RB_BASE_RING2 0x0091
#define mmIH_RB_BASE_RING2_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING2 0x0092
#define mmIH_RB_BASE_HI_RING2_BASE_IDX 0
#define mmIH_RB_RPTR_RING2 0x0093
#define mmIH_RB_RPTR_RING2_BASE_IDX 0
#define mmIH_RB_WPTR_RING2 0x0094
#define mmIH_RB_WPTR_RING2_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING2 0x0097
#define mmIH_DOORBELL_RPTR_RING2_BASE_IDX 0
#define mmIH_VERSION 0x0098
#define mmIH_VERSION_BASE_IDX 0
#define mmIH_CNTL 0x00c0
#define mmIH_CNTL_BASE_IDX 0
#define mmIH_CNTL2 0x00c1
#define mmIH_CNTL2_BASE_IDX 0
#define mmIH_STATUS 0x00c2
#define mmIH_STATUS_BASE_IDX 0
#define mmIH_PERFMON_CNTL 0x00c3
#define mmIH_PERFMON_CNTL_BASE_IDX 0
#define mmIH_PERFCOUNTER0_RESULT 0x00c4
#define mmIH_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmIH_PERFCOUNTER1_RESULT 0x00c5
#define mmIH_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_31_0 0x00c7
#define mmIH_DSM_MATCH_VALUE_BIT_31_0_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_63_32 0x00c8
#define mmIH_DSM_MATCH_VALUE_BIT_63_32_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_95_64 0x00c9
#define mmIH_DSM_MATCH_VALUE_BIT_95_64_BASE_IDX 0
#define mmIH_DSM_MATCH_FIELD_CONTROL 0x00ca
#define mmIH_DSM_MATCH_FIELD_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_DATA_CONTROL 0x00cb
#define mmIH_DSM_MATCH_DATA_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_FCN_ID 0x00cc
#define mmIH_DSM_MATCH_FCN_ID_BASE_IDX 0
#define mmIH_LIMIT_INT_RATE_CNTL 0x00cd
#define mmIH_LIMIT_INT_RATE_CNTL_BASE_IDX 0
#define mmIH_VF_RB_STATUS 0x00ce
#define mmIH_VF_RB_STATUS_BASE_IDX 0
#define mmIH_VF_RB_STATUS2 0x00cf
#define mmIH_VF_RB_STATUS2_BASE_IDX 0
#define mmIH_VF_RB1_STATUS 0x00d0
#define mmIH_VF_RB1_STATUS_BASE_IDX 0
#define mmIH_VF_RB1_STATUS2 0x00d1
#define mmIH_VF_RB1_STATUS2_BASE_IDX 0
#define mmIH_VF_RB2_STATUS 0x00d2
#define mmIH_VF_RB2_STATUS_BASE_IDX 0
#define mmIH_VF_RB2_STATUS2 0x00d3
#define mmIH_VF_RB2_STATUS2_BASE_IDX 0
#define mmIH_INT_FLOOD_CNTL 0x00d5
#define mmIH_INT_FLOOD_CNTL_BASE_IDX 0
#define mmIH_RB0_INT_FLOOD_STATUS 0x00d6
#define mmIH_RB0_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB1_INT_FLOOD_STATUS 0x00d7
#define mmIH_RB1_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB2_INT_FLOOD_STATUS 0x00d8
#define mmIH_RB2_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_INT_FLOOD_STATUS 0x00d9
#define mmIH_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_STORM_CLIENT_LIST_CNTL 0x00da
#define mmIH_STORM_CLIENT_LIST_CNTL_BASE_IDX 0
#define mmIH_CLK_CTRL 0x00db
#define mmIH_CLK_CTRL_BASE_IDX 0
#define mmIH_INT_FLAGS 0x00dc
#define mmIH_INT_FLAGS_BASE_IDX 0
#define mmIH_LAST_INT_INFO0 0x00dd
#define mmIH_LAST_INT_INFO0_BASE_IDX 0
#define mmIH_LAST_INT_INFO1 0x00de
#define mmIH_LAST_INT_INFO1_BASE_IDX 0
#define mmIH_LAST_INT_INFO2 0x00df
#define mmIH_LAST_INT_INFO2_BASE_IDX 0
#define mmIH_SCRATCH 0x00e0
#define mmIH_SCRATCH_BASE_IDX 0
#define mmIH_CLIENT_CREDIT_ERROR 0x00e1
#define mmIH_CLIENT_CREDIT_ERROR_BASE_IDX 0
#define mmIH_GPU_IOV_VIOLATION_LOG 0x00e2
#define mmIH_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmIH_COOKIE_REC_VIOLATION_LOG 0x00e3
#define mmIH_COOKIE_REC_VIOLATION_LOG_BASE_IDX 0
#define mmIH_CREDIT_STATUS 0x00e4
#define mmIH_CREDIT_STATUS_BASE_IDX 0
#define mmIH_MMHUB_ERROR 0x00e5
#define mmIH_MMHUB_ERROR_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART2 0x00ff
#define mmIH_REGISTER_LAST_PART2_BASE_IDX 0
#define mmSEM_CLK_CTRL 0x0100
#define mmSEM_CLK_CTRL_BASE_IDX 0
#define mmSEM_UTC_CREDIT 0x0101
#define mmSEM_UTC_CREDIT_BASE_IDX 0
#define mmSEM_UTC_CONFIG 0x0102
#define mmSEM_UTC_CONFIG_BASE_IDX 0
#define mmSEM_UTCL2_TRAN_EN_LUT 0x0103
#define mmSEM_UTCL2_TRAN_EN_LUT_BASE_IDX 0
#define mmSEM_MCIF_CONFIG 0x0104
#define mmSEM_MCIF_CONFIG_BASE_IDX 0
#define mmSEM_PERFMON_CNTL 0x0105
#define mmSEM_PERFMON_CNTL_BASE_IDX 0
#define mmSEM_PERFCOUNTER0_RESULT 0x0106
#define mmSEM_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmSEM_PERFCOUNTER1_RESULT 0x0107
#define mmSEM_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmSEM_STATUS 0x0108
#define mmSEM_STATUS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG 0x0109
#define mmSEM_MAILBOX_CLIENTCONFIG_BASE_IDX 0
#define mmSEM_MAILBOX 0x010a
#define mmSEM_MAILBOX_BASE_IDX 0
#define mmSEM_MAILBOX_CONTROL 0x010b
#define mmSEM_MAILBOX_CONTROL_BASE_IDX 0
#define mmSEM_CHICKEN_BITS 0x010c
#define mmSEM_CHICKEN_BITS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA 0x010d
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA_BASE_IDX 0
#define mmSEM_GPU_IOV_VIOLATION_LOG 0x010e
#define mmSEM_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmSEM_OUTSTANDING_THRESHOLD 0x010f
#define mmSEM_OUTSTANDING_THRESHOLD_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART2 0x017f
#define mmSEM_REGISTER_LAST_PART2_BASE_IDX 0
#define mmIH_ACTIVE_FCN_ID 0x0180
#define mmIH_ACTIVE_FCN_ID_BASE_IDX 0
#define mmIH_VIRT_RESET_REQ 0x0181
#define mmIH_VIRT_RESET_REQ_BASE_IDX 0
#define mmIH_CLIENT_CFG 0x0184
#define mmIH_CLIENT_CFG_BASE_IDX 0
#define mmIH_CLIENT_CFG_INDEX 0x0188
#define mmIH_CLIENT_CFG_INDEX_BASE_IDX 0
#define mmIH_CLIENT_CFG_DATA 0x0189
#define mmIH_CLIENT_CFG_DATA_BASE_IDX 0
#define mmIH_CID_REMAP_INDEX 0x018a
#define mmIH_CID_REMAP_INDEX_BASE_IDX 0
#define mmIH_CID_REMAP_DATA 0x018b
#define mmIH_CID_REMAP_DATA_BASE_IDX 0
#define mmIH_CHICKEN 0x018c
#define mmIH_CHICKEN_BASE_IDX 0
#define mmIH_MMHUB_CNTL 0x018d
#define mmIH_MMHUB_CNTL_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART1 0x019f
#define mmIH_REGISTER_LAST_PART1_BASE_IDX 0
#define mmSEM_ACTIVE_FCN_ID 0x01a0
#define mmSEM_ACTIVE_FCN_ID_BASE_IDX 0
#define mmSEM_VIRT_RESET_REQ 0x01a1
#define mmSEM_VIRT_RESET_REQ_BASE_IDX 0
#define mmSEM_RESP_SDMA0 0x01a4
#define mmSEM_RESP_SDMA0_BASE_IDX 0
#define mmSEM_RESP_SDMA1 0x01a5
#define mmSEM_RESP_SDMA1_BASE_IDX 0
#define mmSEM_RESP_UVD 0x01a6
#define mmSEM_RESP_UVD_BASE_IDX 0
#define mmSEM_RESP_VCE_0 0x01a7
#define mmSEM_RESP_VCE_0_BASE_IDX 0
#define mmSEM_RESP_ACP 0x01a8
#define mmSEM_RESP_ACP_BASE_IDX 0
#define mmSEM_RESP_ISP 0x01a9
#define mmSEM_RESP_ISP_BASE_IDX 0
#define mmSEM_RESP_VCE_1 0x01aa
#define mmSEM_RESP_VCE_1_BASE_IDX 0
#define mmSEM_RESP_VP8 0x01ab
#define mmSEM_RESP_VP8_BASE_IDX 0
#define mmSEM_RESP_GC 0x01ac
#define mmSEM_RESP_GC_BASE_IDX 0
#define mmSEM_CID_REMAP_INDEX 0x01b0
#define mmSEM_CID_REMAP_INDEX_BASE_IDX 0
#define mmSEM_CID_REMAP_DATA 0x01b1
#define mmSEM_CID_REMAP_DATA_BASE_IDX 0
#define mmSEM_ATOMIC_OP_LUT 0x01b2
#define mmSEM_ATOMIC_OP_LUT_BASE_IDX 0
#define mmSEM_EDC_CONFIG 0x01b3
#define mmSEM_EDC_CONFIG_BASE_IDX 0
#define mmSEM_CHICKEN_BITS2 0x01b4
#define mmSEM_CHICKEN_BITS2_BASE_IDX 0
#define mmSEM_MMHUB_CNTL 0x01b5
#define mmSEM_MMHUB_CNTL_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART1 0x01bf
#define mmSEM_REGISTER_LAST_PART1_BASE_IDX 0
#endif
File diff not shown because the file is too large.
@@ -24,322 +24,321 @@
#define _osssys_4_2_0_OFFSET_HEADER
// addressBlock: osssys_osssysdec
// base address: 0x4280
#define mmIH_VMID_0_LUT 0x0000
#define mmIH_VMID_0_LUT_BASE_IDX 0
#define mmIH_VMID_1_LUT 0x0001
#define mmIH_VMID_1_LUT_BASE_IDX 0
#define mmIH_VMID_2_LUT 0x0002
#define mmIH_VMID_2_LUT_BASE_IDX 0
#define mmIH_VMID_3_LUT 0x0003
#define mmIH_VMID_3_LUT_BASE_IDX 0
#define mmIH_VMID_4_LUT 0x0004
#define mmIH_VMID_4_LUT_BASE_IDX 0
#define mmIH_VMID_5_LUT 0x0005
#define mmIH_VMID_5_LUT_BASE_IDX 0
#define mmIH_VMID_6_LUT 0x0006
#define mmIH_VMID_6_LUT_BASE_IDX 0
#define mmIH_VMID_7_LUT 0x0007
#define mmIH_VMID_7_LUT_BASE_IDX 0
#define mmIH_VMID_8_LUT 0x0008
#define mmIH_VMID_8_LUT_BASE_IDX 0
#define mmIH_VMID_9_LUT 0x0009
#define mmIH_VMID_9_LUT_BASE_IDX 0
#define mmIH_VMID_10_LUT 0x000a
#define mmIH_VMID_10_LUT_BASE_IDX 0
#define mmIH_VMID_11_LUT 0x000b
#define mmIH_VMID_11_LUT_BASE_IDX 0
#define mmIH_VMID_12_LUT 0x000c
#define mmIH_VMID_12_LUT_BASE_IDX 0
#define mmIH_VMID_13_LUT 0x000d
#define mmIH_VMID_13_LUT_BASE_IDX 0
#define mmIH_VMID_14_LUT 0x000e
#define mmIH_VMID_14_LUT_BASE_IDX 0
#define mmIH_VMID_15_LUT 0x000f
#define mmIH_VMID_15_LUT_BASE_IDX 0
#define mmIH_VMID_0_LUT_MM 0x0010
#define mmIH_VMID_0_LUT_MM_BASE_IDX 0
#define mmIH_VMID_1_LUT_MM 0x0011
#define mmIH_VMID_1_LUT_MM_BASE_IDX 0
#define mmIH_VMID_2_LUT_MM 0x0012
#define mmIH_VMID_2_LUT_MM_BASE_IDX 0
#define mmIH_VMID_3_LUT_MM 0x0013
#define mmIH_VMID_3_LUT_MM_BASE_IDX 0
#define mmIH_VMID_4_LUT_MM 0x0014
#define mmIH_VMID_4_LUT_MM_BASE_IDX 0
#define mmIH_VMID_5_LUT_MM 0x0015
#define mmIH_VMID_5_LUT_MM_BASE_IDX 0
#define mmIH_VMID_6_LUT_MM 0x0016
#define mmIH_VMID_6_LUT_MM_BASE_IDX 0
#define mmIH_VMID_7_LUT_MM 0x0017
#define mmIH_VMID_7_LUT_MM_BASE_IDX 0
#define mmIH_VMID_8_LUT_MM 0x0018
#define mmIH_VMID_8_LUT_MM_BASE_IDX 0
#define mmIH_VMID_9_LUT_MM 0x0019
#define mmIH_VMID_9_LUT_MM_BASE_IDX 0
#define mmIH_VMID_10_LUT_MM 0x001a
#define mmIH_VMID_10_LUT_MM_BASE_IDX 0
#define mmIH_VMID_11_LUT_MM 0x001b
#define mmIH_VMID_11_LUT_MM_BASE_IDX 0
#define mmIH_VMID_12_LUT_MM 0x001c
#define mmIH_VMID_12_LUT_MM_BASE_IDX 0
#define mmIH_VMID_13_LUT_MM 0x001d
#define mmIH_VMID_13_LUT_MM_BASE_IDX 0
#define mmIH_VMID_14_LUT_MM 0x001e
#define mmIH_VMID_14_LUT_MM_BASE_IDX 0
#define mmIH_VMID_15_LUT_MM 0x001f
#define mmIH_VMID_15_LUT_MM_BASE_IDX 0
#define mmIH_COOKIE_0 0x0020
#define mmIH_COOKIE_0_BASE_IDX 0
#define mmIH_COOKIE_1 0x0021
#define mmIH_COOKIE_1_BASE_IDX 0
#define mmIH_COOKIE_2 0x0022
#define mmIH_COOKIE_2_BASE_IDX 0
#define mmIH_COOKIE_3 0x0023
#define mmIH_COOKIE_3_BASE_IDX 0
#define mmIH_COOKIE_4 0x0024
#define mmIH_COOKIE_4_BASE_IDX 0
#define mmIH_COOKIE_5 0x0025
#define mmIH_COOKIE_5_BASE_IDX 0
#define mmIH_COOKIE_6 0x0026
#define mmIH_COOKIE_6_BASE_IDX 0
#define mmIH_COOKIE_7 0x0027
#define mmIH_COOKIE_7_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART0 0x003f
#define mmIH_REGISTER_LAST_PART0_BASE_IDX 0
#define mmSEM_REQ_INPUT_0 0x0040
#define mmSEM_REQ_INPUT_0_BASE_IDX 0
#define mmSEM_REQ_INPUT_1 0x0041
#define mmSEM_REQ_INPUT_1_BASE_IDX 0
#define mmSEM_REQ_INPUT_2 0x0042
#define mmSEM_REQ_INPUT_2_BASE_IDX 0
#define mmSEM_REQ_INPUT_3 0x0043
#define mmSEM_REQ_INPUT_3_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART0 0x007f
#define mmSEM_REGISTER_LAST_PART0_BASE_IDX 0
#define mmIH_RB_CNTL 0x0080
#define mmIH_RB_CNTL_BASE_IDX 0
#define mmIH_RB_BASE 0x0081
#define mmIH_RB_BASE_BASE_IDX 0
#define mmIH_RB_BASE_HI 0x0082
#define mmIH_RB_BASE_HI_BASE_IDX 0
#define mmIH_RB_RPTR 0x0083
#define mmIH_RB_RPTR_BASE_IDX 0
#define mmIH_RB_WPTR 0x0084
#define mmIH_RB_WPTR_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_HI 0x0085
#define mmIH_RB_WPTR_ADDR_HI_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_LO 0x0086
#define mmIH_RB_WPTR_ADDR_LO_BASE_IDX 0
#define mmIH_DOORBELL_RPTR 0x0087
#define mmIH_DOORBELL_RPTR_BASE_IDX 0
#define mmIH_RB_CNTL_RING1 0x008c
#define mmIH_RB_CNTL_RING1_BASE_IDX 0
#define mmIH_RB_BASE_RING1 0x008d
#define mmIH_RB_BASE_RING1_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING1 0x008e
#define mmIH_RB_BASE_HI_RING1_BASE_IDX 0
#define mmIH_RB_RPTR_RING1 0x008f
#define mmIH_RB_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_WPTR_RING1 0x0090
#define mmIH_RB_WPTR_RING1_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING1 0x0093
#define mmIH_DOORBELL_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_CNTL_RING2 0x0098
#define mmIH_RB_CNTL_RING2_BASE_IDX 0
#define mmIH_RB_BASE_RING2 0x0099
#define mmIH_RB_BASE_RING2_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING2 0x009a
#define mmIH_RB_BASE_HI_RING2_BASE_IDX 0
#define mmIH_RB_RPTR_RING2 0x009b
#define mmIH_RB_RPTR_RING2_BASE_IDX 0
#define mmIH_RB_WPTR_RING2 0x009c
#define mmIH_RB_WPTR_RING2_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING2 0x009f
#define mmIH_DOORBELL_RPTR_RING2_BASE_IDX 0
#define mmIH_VERSION 0x00a5
#define mmIH_VERSION_BASE_IDX 0
#define mmIH_CNTL 0x00c0
#define mmIH_CNTL_BASE_IDX 0
#define mmIH_CNTL2 0x00c1
#define mmIH_CNTL2_BASE_IDX 0
#define mmIH_STATUS 0x00c2
#define mmIH_STATUS_BASE_IDX 0
#define mmIH_PERFMON_CNTL 0x00c3
#define mmIH_PERFMON_CNTL_BASE_IDX 0
#define mmIH_PERFCOUNTER0_RESULT 0x00c4
#define mmIH_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmIH_PERFCOUNTER1_RESULT 0x00c5
#define mmIH_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_31_0 0x00c7
#define mmIH_DSM_MATCH_VALUE_BIT_31_0_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_63_32 0x00c8
#define mmIH_DSM_MATCH_VALUE_BIT_63_32_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_95_64 0x00c9
#define mmIH_DSM_MATCH_VALUE_BIT_95_64_BASE_IDX 0
#define mmIH_DSM_MATCH_FIELD_CONTROL 0x00ca
#define mmIH_DSM_MATCH_FIELD_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_DATA_CONTROL 0x00cb
#define mmIH_DSM_MATCH_DATA_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_FCN_ID 0x00cc
#define mmIH_DSM_MATCH_FCN_ID_BASE_IDX 0
#define mmIH_LIMIT_INT_RATE_CNTL 0x00cd
#define mmIH_LIMIT_INT_RATE_CNTL_BASE_IDX 0
#define mmIH_VF_RB_STATUS 0x00ce
#define mmIH_VF_RB_STATUS_BASE_IDX 0
#define mmIH_VF_RB_STATUS2 0x00cf
#define mmIH_VF_RB_STATUS2_BASE_IDX 0
#define mmIH_VF_RB1_STATUS 0x00d0
#define mmIH_VF_RB1_STATUS_BASE_IDX 0
#define mmIH_VF_RB1_STATUS2 0x00d1
#define mmIH_VF_RB1_STATUS2_BASE_IDX 0
#define mmIH_VF_RB2_STATUS 0x00d2
#define mmIH_VF_RB2_STATUS_BASE_IDX 0
#define mmIH_VF_RB2_STATUS2 0x00d3
#define mmIH_VF_RB2_STATUS2_BASE_IDX 0
#define mmIH_INT_FLOOD_CNTL 0x00d5
#define mmIH_INT_FLOOD_CNTL_BASE_IDX 0
#define mmIH_RB0_INT_FLOOD_STATUS 0x00d6
#define mmIH_RB0_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB1_INT_FLOOD_STATUS 0x00d7
#define mmIH_RB1_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB2_INT_FLOOD_STATUS 0x00d8
#define mmIH_RB2_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_INT_FLOOD_STATUS 0x00d9
#define mmIH_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_STORM_CLIENT_LIST_CNTL 0x00da
#define mmIH_STORM_CLIENT_LIST_CNTL_BASE_IDX 0
#define mmIH_CLK_CTRL 0x00db
#define mmIH_CLK_CTRL_BASE_IDX 0
#define mmIH_INT_FLAGS 0x00dc
#define mmIH_INT_FLAGS_BASE_IDX 0
#define mmIH_LAST_INT_INFO0 0x00dd
#define mmIH_LAST_INT_INFO0_BASE_IDX 0
#define mmIH_LAST_INT_INFO1 0x00de
#define mmIH_LAST_INT_INFO1_BASE_IDX 0
#define mmIH_LAST_INT_INFO2 0x00df
#define mmIH_LAST_INT_INFO2_BASE_IDX 0
#define mmIH_SCRATCH 0x00e0
#define mmIH_SCRATCH_BASE_IDX 0
#define mmIH_CLIENT_CREDIT_ERROR 0x00e1
#define mmIH_CLIENT_CREDIT_ERROR_BASE_IDX 0
#define mmIH_GPU_IOV_VIOLATION_LOG 0x00e2
#define mmIH_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmIH_COOKIE_REC_VIOLATION_LOG 0x00e3
#define mmIH_COOKIE_REC_VIOLATION_LOG_BASE_IDX 0
#define mmIH_CREDIT_STATUS 0x00e4
#define mmIH_CREDIT_STATUS_BASE_IDX 0
#define mmIH_MMHUB_ERROR 0x00e5
#define mmIH_MMHUB_ERROR_BASE_IDX 0
#define mmIH_MEM_POWER_CTRL 0x00e8
#define mmIH_MEM_POWER_CTRL_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART2 0x00ff
#define mmIH_REGISTER_LAST_PART2_BASE_IDX 0
#define mmSEM_CLK_CTRL 0x0100
#define mmSEM_CLK_CTRL_BASE_IDX 0
#define mmSEM_UTC_CREDIT 0x0101
#define mmSEM_UTC_CREDIT_BASE_IDX 0
#define mmSEM_UTC_CONFIG 0x0102
#define mmSEM_UTC_CONFIG_BASE_IDX 0
#define mmSEM_UTCL2_TRAN_EN_LUT 0x0103
#define mmSEM_UTCL2_TRAN_EN_LUT_BASE_IDX 0
#define mmSEM_MCIF_CONFIG 0x0104
#define mmSEM_MCIF_CONFIG_BASE_IDX 0
#define mmSEM_PERFMON_CNTL 0x0105
#define mmSEM_PERFMON_CNTL_BASE_IDX 0
#define mmSEM_PERFCOUNTER0_RESULT 0x0106
#define mmSEM_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmSEM_PERFCOUNTER1_RESULT 0x0107
#define mmSEM_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmSEM_STATUS 0x0108
#define mmSEM_STATUS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG 0x0109
#define mmSEM_MAILBOX_CLIENTCONFIG_BASE_IDX 0
#define mmSEM_MAILBOX 0x010a
#define mmSEM_MAILBOX_BASE_IDX 0
#define mmSEM_MAILBOX_CONTROL 0x010b
#define mmSEM_MAILBOX_CONTROL_BASE_IDX 0
#define mmSEM_CHICKEN_BITS 0x010c
#define mmSEM_CHICKEN_BITS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA 0x010d
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA_BASE_IDX 0
#define mmSEM_GPU_IOV_VIOLATION_LOG 0x010e
#define mmSEM_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmSEM_OUTSTANDING_THRESHOLD 0x010f
#define mmSEM_OUTSTANDING_THRESHOLD_BASE_IDX 0
#define mmSEM_MEM_POWER_CTRL 0x0110
#define mmSEM_MEM_POWER_CTRL_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART2 0x017f
#define mmSEM_REGISTER_LAST_PART2_BASE_IDX 0
#define mmIH_ACTIVE_FCN_ID 0x0180
#define mmIH_ACTIVE_FCN_ID_BASE_IDX 0
#define mmIH_VIRT_RESET_REQ 0x0181
#define mmIH_VIRT_RESET_REQ_BASE_IDX 0
#define mmIH_CLIENT_CFG 0x0184
#define mmIH_CLIENT_CFG_BASE_IDX 0
#define mmIH_CLIENT_CFG_INDEX 0x0188
#define mmIH_CLIENT_CFG_INDEX_BASE_IDX 0
#define mmIH_CLIENT_CFG_DATA 0x0189
#define mmIH_CLIENT_CFG_DATA_BASE_IDX 0
#define mmIH_CID_REMAP_INDEX 0x018a
#define mmIH_CID_REMAP_INDEX_BASE_IDX 0
#define mmIH_CID_REMAP_DATA 0x018b
#define mmIH_CID_REMAP_DATA_BASE_IDX 0
#define mmIH_CHICKEN 0x018c
#define mmIH_CHICKEN_BASE_IDX 0
#define mmIH_MMHUB_CNTL 0x018d
#define mmIH_MMHUB_CNTL_BASE_IDX 0
#define mmIH_INT_DROP_CNTL 0x018e
#define mmIH_INT_DROP_CNTL_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_VALUE0 0x018f
#define mmIH_INT_DROP_MATCH_VALUE0_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_VALUE1 0x0190
#define mmIH_INT_DROP_MATCH_VALUE1_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_MASK0 0x0191
#define mmIH_INT_DROP_MATCH_MASK0_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_MASK1 0x0192
#define mmIH_INT_DROP_MATCH_MASK1_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART1 0x019f
#define mmIH_REGISTER_LAST_PART1_BASE_IDX 0
#define mmSEM_ACTIVE_FCN_ID 0x01a0
#define mmSEM_ACTIVE_FCN_ID_BASE_IDX 0
#define mmSEM_VIRT_RESET_REQ 0x01a1
#define mmSEM_VIRT_RESET_REQ_BASE_IDX 0
#define mmSEM_RESP_SDMA0 0x01a4
#define mmSEM_RESP_SDMA0_BASE_IDX 0
#define mmSEM_RESP_SDMA1 0x01a5
#define mmSEM_RESP_SDMA1_BASE_IDX 0
#define mmSEM_RESP_UVD 0x01a6
#define mmSEM_RESP_UVD_BASE_IDX 0
#define mmSEM_RESP_VCE_0 0x01a7
#define mmSEM_RESP_VCE_0_BASE_IDX 0
#define mmSEM_RESP_ACP 0x01a8
#define mmSEM_RESP_ACP_BASE_IDX 0
#define mmSEM_RESP_ISP 0x01a9
#define mmSEM_RESP_ISP_BASE_IDX 0
#define mmSEM_RESP_VCE_1 0x01aa
#define mmSEM_RESP_VCE_1_BASE_IDX 0
#define mmSEM_RESP_VP8 0x01ab
#define mmSEM_RESP_VP8_BASE_IDX 0
#define mmSEM_RESP_GC 0x01ac
#define mmSEM_RESP_GC_BASE_IDX 0
#define mmSEM_RESP_UVD_1 0x01ad
#define mmSEM_RESP_UVD_1_BASE_IDX 0
#define mmSEM_CID_REMAP_INDEX 0x01b0
#define mmSEM_CID_REMAP_INDEX_BASE_IDX 0
#define mmSEM_CID_REMAP_DATA 0x01b1
#define mmSEM_CID_REMAP_DATA_BASE_IDX 0
#define mmSEM_ATOMIC_OP_LUT 0x01b2
#define mmSEM_ATOMIC_OP_LUT_BASE_IDX 0
#define mmSEM_EDC_CONFIG 0x01b3
#define mmSEM_EDC_CONFIG_BASE_IDX 0
#define mmSEM_CHICKEN_BITS2 0x01b4
#define mmSEM_CHICKEN_BITS2_BASE_IDX 0
#define mmSEM_MMHUB_CNTL 0x01b5
#define mmSEM_MMHUB_CNTL_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART1 0x01bf
#define mmSEM_REGISTER_LAST_PART1_BASE_IDX 0
#define mmIH_VMID_0_LUT 0x0000
#define mmIH_VMID_0_LUT_BASE_IDX 0
#define mmIH_VMID_1_LUT 0x0001
#define mmIH_VMID_1_LUT_BASE_IDX 0
#define mmIH_VMID_2_LUT 0x0002
#define mmIH_VMID_2_LUT_BASE_IDX 0
#define mmIH_VMID_3_LUT 0x0003
#define mmIH_VMID_3_LUT_BASE_IDX 0
#define mmIH_VMID_4_LUT 0x0004
#define mmIH_VMID_4_LUT_BASE_IDX 0
#define mmIH_VMID_5_LUT 0x0005
#define mmIH_VMID_5_LUT_BASE_IDX 0
#define mmIH_VMID_6_LUT 0x0006
#define mmIH_VMID_6_LUT_BASE_IDX 0
#define mmIH_VMID_7_LUT 0x0007
#define mmIH_VMID_7_LUT_BASE_IDX 0
#define mmIH_VMID_8_LUT 0x0008
#define mmIH_VMID_8_LUT_BASE_IDX 0
#define mmIH_VMID_9_LUT 0x0009
#define mmIH_VMID_9_LUT_BASE_IDX 0
#define mmIH_VMID_10_LUT 0x000a
#define mmIH_VMID_10_LUT_BASE_IDX 0
#define mmIH_VMID_11_LUT 0x000b
#define mmIH_VMID_11_LUT_BASE_IDX 0
#define mmIH_VMID_12_LUT 0x000c
#define mmIH_VMID_12_LUT_BASE_IDX 0
#define mmIH_VMID_13_LUT 0x000d
#define mmIH_VMID_13_LUT_BASE_IDX 0
#define mmIH_VMID_14_LUT 0x000e
#define mmIH_VMID_14_LUT_BASE_IDX 0
#define mmIH_VMID_15_LUT 0x000f
#define mmIH_VMID_15_LUT_BASE_IDX 0
#define mmIH_VMID_0_LUT_MM 0x0010
#define mmIH_VMID_0_LUT_MM_BASE_IDX 0
#define mmIH_VMID_1_LUT_MM 0x0011
#define mmIH_VMID_1_LUT_MM_BASE_IDX 0
#define mmIH_VMID_2_LUT_MM 0x0012
#define mmIH_VMID_2_LUT_MM_BASE_IDX 0
#define mmIH_VMID_3_LUT_MM 0x0013
#define mmIH_VMID_3_LUT_MM_BASE_IDX 0
#define mmIH_VMID_4_LUT_MM 0x0014
#define mmIH_VMID_4_LUT_MM_BASE_IDX 0
#define mmIH_VMID_5_LUT_MM 0x0015
#define mmIH_VMID_5_LUT_MM_BASE_IDX 0
#define mmIH_VMID_6_LUT_MM 0x0016
#define mmIH_VMID_6_LUT_MM_BASE_IDX 0
#define mmIH_VMID_7_LUT_MM 0x0017
#define mmIH_VMID_7_LUT_MM_BASE_IDX 0
#define mmIH_VMID_8_LUT_MM 0x0018
#define mmIH_VMID_8_LUT_MM_BASE_IDX 0
#define mmIH_VMID_9_LUT_MM 0x0019
#define mmIH_VMID_9_LUT_MM_BASE_IDX 0
#define mmIH_VMID_10_LUT_MM 0x001a
#define mmIH_VMID_10_LUT_MM_BASE_IDX 0
#define mmIH_VMID_11_LUT_MM 0x001b
#define mmIH_VMID_11_LUT_MM_BASE_IDX 0
#define mmIH_VMID_12_LUT_MM 0x001c
#define mmIH_VMID_12_LUT_MM_BASE_IDX 0
#define mmIH_VMID_13_LUT_MM 0x001d
#define mmIH_VMID_13_LUT_MM_BASE_IDX 0
#define mmIH_VMID_14_LUT_MM 0x001e
#define mmIH_VMID_14_LUT_MM_BASE_IDX 0
#define mmIH_VMID_15_LUT_MM 0x001f
#define mmIH_VMID_15_LUT_MM_BASE_IDX 0
#define mmIH_COOKIE_0 0x0020
#define mmIH_COOKIE_0_BASE_IDX 0
#define mmIH_COOKIE_1 0x0021
#define mmIH_COOKIE_1_BASE_IDX 0
#define mmIH_COOKIE_2 0x0022
#define mmIH_COOKIE_2_BASE_IDX 0
#define mmIH_COOKIE_3 0x0023
#define mmIH_COOKIE_3_BASE_IDX 0
#define mmIH_COOKIE_4 0x0024
#define mmIH_COOKIE_4_BASE_IDX 0
#define mmIH_COOKIE_5 0x0025
#define mmIH_COOKIE_5_BASE_IDX 0
#define mmIH_COOKIE_6 0x0026
#define mmIH_COOKIE_6_BASE_IDX 0
#define mmIH_COOKIE_7 0x0027
#define mmIH_COOKIE_7_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART0 0x003f
#define mmIH_REGISTER_LAST_PART0_BASE_IDX 0
#define mmSEM_REQ_INPUT_0 0x0040
#define mmSEM_REQ_INPUT_0_BASE_IDX 0
#define mmSEM_REQ_INPUT_1 0x0041
#define mmSEM_REQ_INPUT_1_BASE_IDX 0
#define mmSEM_REQ_INPUT_2 0x0042
#define mmSEM_REQ_INPUT_2_BASE_IDX 0
#define mmSEM_REQ_INPUT_3 0x0043
#define mmSEM_REQ_INPUT_3_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART0 0x007f
#define mmSEM_REGISTER_LAST_PART0_BASE_IDX 0
#define mmIH_RB_CNTL 0x0080
#define mmIH_RB_CNTL_BASE_IDX 0
#define mmIH_RB_BASE 0x0081
#define mmIH_RB_BASE_BASE_IDX 0
#define mmIH_RB_BASE_HI 0x0082
#define mmIH_RB_BASE_HI_BASE_IDX 0
#define mmIH_RB_RPTR 0x0083
#define mmIH_RB_RPTR_BASE_IDX 0
#define mmIH_RB_WPTR 0x0084
#define mmIH_RB_WPTR_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_HI 0x0085
#define mmIH_RB_WPTR_ADDR_HI_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_LO 0x0086
#define mmIH_RB_WPTR_ADDR_LO_BASE_IDX 0
#define mmIH_DOORBELL_RPTR 0x0087
#define mmIH_DOORBELL_RPTR_BASE_IDX 0
#define mmIH_RB_CNTL_RING1 0x008c
#define mmIH_RB_CNTL_RING1_BASE_IDX 0
#define mmIH_RB_BASE_RING1 0x008d
#define mmIH_RB_BASE_RING1_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING1 0x008e
#define mmIH_RB_BASE_HI_RING1_BASE_IDX 0
#define mmIH_RB_RPTR_RING1 0x008f
#define mmIH_RB_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_WPTR_RING1 0x0090
#define mmIH_RB_WPTR_RING1_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING1 0x0093
#define mmIH_DOORBELL_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_CNTL_RING2 0x0098
#define mmIH_RB_CNTL_RING2_BASE_IDX 0
#define mmIH_RB_BASE_RING2 0x0099
#define mmIH_RB_BASE_RING2_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING2 0x009a
#define mmIH_RB_BASE_HI_RING2_BASE_IDX 0
#define mmIH_RB_RPTR_RING2 0x009b
#define mmIH_RB_RPTR_RING2_BASE_IDX 0
#define mmIH_RB_WPTR_RING2 0x009c
#define mmIH_RB_WPTR_RING2_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING2 0x009f
#define mmIH_DOORBELL_RPTR_RING2_BASE_IDX 0
#define mmIH_VERSION 0x00a5
#define mmIH_VERSION_BASE_IDX 0
#define mmIH_CNTL 0x00c0
#define mmIH_CNTL_BASE_IDX 0
#define mmIH_CNTL2 0x00c1
#define mmIH_CNTL2_BASE_IDX 0
#define mmIH_STATUS 0x00c2
#define mmIH_STATUS_BASE_IDX 0
#define mmIH_PERFMON_CNTL 0x00c3
#define mmIH_PERFMON_CNTL_BASE_IDX 0
#define mmIH_PERFCOUNTER0_RESULT 0x00c4
#define mmIH_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmIH_PERFCOUNTER1_RESULT 0x00c5
#define mmIH_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_31_0 0x00c7
#define mmIH_DSM_MATCH_VALUE_BIT_31_0_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_63_32 0x00c8
#define mmIH_DSM_MATCH_VALUE_BIT_63_32_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_95_64 0x00c9
#define mmIH_DSM_MATCH_VALUE_BIT_95_64_BASE_IDX 0
#define mmIH_DSM_MATCH_FIELD_CONTROL 0x00ca
#define mmIH_DSM_MATCH_FIELD_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_DATA_CONTROL 0x00cb
#define mmIH_DSM_MATCH_DATA_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_FCN_ID 0x00cc
#define mmIH_DSM_MATCH_FCN_ID_BASE_IDX 0
#define mmIH_LIMIT_INT_RATE_CNTL 0x00cd
#define mmIH_LIMIT_INT_RATE_CNTL_BASE_IDX 0
#define mmIH_VF_RB_STATUS 0x00ce
#define mmIH_VF_RB_STATUS_BASE_IDX 0
#define mmIH_VF_RB_STATUS2 0x00cf
#define mmIH_VF_RB_STATUS2_BASE_IDX 0
#define mmIH_VF_RB1_STATUS 0x00d0
#define mmIH_VF_RB1_STATUS_BASE_IDX 0
#define mmIH_VF_RB1_STATUS2 0x00d1
#define mmIH_VF_RB1_STATUS2_BASE_IDX 0
#define mmIH_VF_RB2_STATUS 0x00d2
#define mmIH_VF_RB2_STATUS_BASE_IDX 0
#define mmIH_VF_RB2_STATUS2 0x00d3
#define mmIH_VF_RB2_STATUS2_BASE_IDX 0
#define mmIH_INT_FLOOD_CNTL 0x00d5
#define mmIH_INT_FLOOD_CNTL_BASE_IDX 0
#define mmIH_RB0_INT_FLOOD_STATUS 0x00d6
#define mmIH_RB0_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB1_INT_FLOOD_STATUS 0x00d7
#define mmIH_RB1_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB2_INT_FLOOD_STATUS 0x00d8
#define mmIH_RB2_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_INT_FLOOD_STATUS 0x00d9
#define mmIH_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_STORM_CLIENT_LIST_CNTL 0x00da
#define mmIH_STORM_CLIENT_LIST_CNTL_BASE_IDX 0
#define mmIH_CLK_CTRL 0x00db
#define mmIH_CLK_CTRL_BASE_IDX 0
#define mmIH_INT_FLAGS 0x00dc
#define mmIH_INT_FLAGS_BASE_IDX 0
#define mmIH_LAST_INT_INFO0 0x00dd
#define mmIH_LAST_INT_INFO0_BASE_IDX 0
#define mmIH_LAST_INT_INFO1 0x00de
#define mmIH_LAST_INT_INFO1_BASE_IDX 0
#define mmIH_LAST_INT_INFO2 0x00df
#define mmIH_LAST_INT_INFO2_BASE_IDX 0
#define mmIH_SCRATCH 0x00e0
#define mmIH_SCRATCH_BASE_IDX 0
#define mmIH_CLIENT_CREDIT_ERROR 0x00e1
#define mmIH_CLIENT_CREDIT_ERROR_BASE_IDX 0
#define mmIH_GPU_IOV_VIOLATION_LOG 0x00e2
#define mmIH_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmIH_COOKIE_REC_VIOLATION_LOG 0x00e3
#define mmIH_COOKIE_REC_VIOLATION_LOG_BASE_IDX 0
#define mmIH_CREDIT_STATUS 0x00e4
#define mmIH_CREDIT_STATUS_BASE_IDX 0
#define mmIH_MMHUB_ERROR 0x00e5
#define mmIH_MMHUB_ERROR_BASE_IDX 0
#define mmIH_MEM_POWER_CTRL 0x00e8
#define mmIH_MEM_POWER_CTRL_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART2 0x00ff
#define mmIH_REGISTER_LAST_PART2_BASE_IDX 0
#define mmSEM_CLK_CTRL 0x0100
#define mmSEM_CLK_CTRL_BASE_IDX 0
#define mmSEM_UTC_CREDIT 0x0101
#define mmSEM_UTC_CREDIT_BASE_IDX 0
#define mmSEM_UTC_CONFIG 0x0102
#define mmSEM_UTC_CONFIG_BASE_IDX 0
#define mmSEM_UTCL2_TRAN_EN_LUT 0x0103
#define mmSEM_UTCL2_TRAN_EN_LUT_BASE_IDX 0
#define mmSEM_MCIF_CONFIG 0x0104
#define mmSEM_MCIF_CONFIG_BASE_IDX 0
#define mmSEM_PERFMON_CNTL 0x0105
#define mmSEM_PERFMON_CNTL_BASE_IDX 0
#define mmSEM_PERFCOUNTER0_RESULT 0x0106
#define mmSEM_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmSEM_PERFCOUNTER1_RESULT 0x0107
#define mmSEM_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmSEM_STATUS 0x0108
#define mmSEM_STATUS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG 0x0109
#define mmSEM_MAILBOX_CLIENTCONFIG_BASE_IDX 0
#define mmSEM_MAILBOX 0x010a
#define mmSEM_MAILBOX_BASE_IDX 0
#define mmSEM_MAILBOX_CONTROL 0x010b
#define mmSEM_MAILBOX_CONTROL_BASE_IDX 0
#define mmSEM_CHICKEN_BITS 0x010c
#define mmSEM_CHICKEN_BITS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA 0x010d
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA_BASE_IDX 0
#define mmSEM_GPU_IOV_VIOLATION_LOG 0x010e
#define mmSEM_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmSEM_OUTSTANDING_THRESHOLD 0x010f
#define mmSEM_OUTSTANDING_THRESHOLD_BASE_IDX 0
#define mmSEM_MEM_POWER_CTRL 0x0110
#define mmSEM_MEM_POWER_CTRL_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART2 0x017f
#define mmSEM_REGISTER_LAST_PART2_BASE_IDX 0
#define mmIH_ACTIVE_FCN_ID 0x0180
#define mmIH_ACTIVE_FCN_ID_BASE_IDX 0
#define mmIH_VIRT_RESET_REQ 0x0181
#define mmIH_VIRT_RESET_REQ_BASE_IDX 0
#define mmIH_CLIENT_CFG 0x0184
#define mmIH_CLIENT_CFG_BASE_IDX 0
#define mmIH_CLIENT_CFG_INDEX 0x0188
#define mmIH_CLIENT_CFG_INDEX_BASE_IDX 0
#define mmIH_CLIENT_CFG_DATA 0x0189
#define mmIH_CLIENT_CFG_DATA_BASE_IDX 0
#define mmIH_CID_REMAP_INDEX 0x018a
#define mmIH_CID_REMAP_INDEX_BASE_IDX 0
#define mmIH_CID_REMAP_DATA 0x018b
#define mmIH_CID_REMAP_DATA_BASE_IDX 0
#define mmIH_CHICKEN 0x018c
#define mmIH_CHICKEN_BASE_IDX 0
#define mmIH_MMHUB_CNTL 0x018d
#define mmIH_MMHUB_CNTL_BASE_IDX 0
#define mmIH_INT_DROP_CNTL 0x018e
#define mmIH_INT_DROP_CNTL_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_VALUE0 0x018f
#define mmIH_INT_DROP_MATCH_VALUE0_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_VALUE1 0x0190
#define mmIH_INT_DROP_MATCH_VALUE1_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_MASK0 0x0191
#define mmIH_INT_DROP_MATCH_MASK0_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_MASK1 0x0192
#define mmIH_INT_DROP_MATCH_MASK1_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART1 0x019f
#define mmIH_REGISTER_LAST_PART1_BASE_IDX 0
#define mmSEM_ACTIVE_FCN_ID 0x01a0
#define mmSEM_ACTIVE_FCN_ID_BASE_IDX 0
#define mmSEM_VIRT_RESET_REQ 0x01a1
#define mmSEM_VIRT_RESET_REQ_BASE_IDX 0
#define mmSEM_RESP_SDMA0 0x01a4
#define mmSEM_RESP_SDMA0_BASE_IDX 0
#define mmSEM_RESP_SDMA1 0x01a5
#define mmSEM_RESP_SDMA1_BASE_IDX 0
#define mmSEM_RESP_UVD 0x01a6
#define mmSEM_RESP_UVD_BASE_IDX 0
#define mmSEM_RESP_VCE_0 0x01a7
#define mmSEM_RESP_VCE_0_BASE_IDX 0
#define mmSEM_RESP_ACP 0x01a8
#define mmSEM_RESP_ACP_BASE_IDX 0
#define mmSEM_RESP_ISP 0x01a9
#define mmSEM_RESP_ISP_BASE_IDX 0
#define mmSEM_RESP_VCE_1 0x01aa
#define mmSEM_RESP_VCE_1_BASE_IDX 0
#define mmSEM_RESP_VP8 0x01ab
#define mmSEM_RESP_VP8_BASE_IDX 0
#define mmSEM_RESP_GC 0x01ac
#define mmSEM_RESP_GC_BASE_IDX 0
#define mmSEM_RESP_UVD_1 0x01ad
#define mmSEM_RESP_UVD_1_BASE_IDX 0
#define mmSEM_CID_REMAP_INDEX 0x01b0
#define mmSEM_CID_REMAP_INDEX_BASE_IDX 0
#define mmSEM_CID_REMAP_DATA 0x01b1
#define mmSEM_CID_REMAP_DATA_BASE_IDX 0
#define mmSEM_ATOMIC_OP_LUT 0x01b2
#define mmSEM_ATOMIC_OP_LUT_BASE_IDX 0
#define mmSEM_EDC_CONFIG 0x01b3
#define mmSEM_EDC_CONFIG_BASE_IDX 0
#define mmSEM_CHICKEN_BITS2 0x01b4
#define mmSEM_CHICKEN_BITS2_BASE_IDX 0
#define mmSEM_MMHUB_CNTL 0x01b5
#define mmSEM_MMHUB_CNTL_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART1 0x01bf
#define mmSEM_REGISTER_LAST_PART1_BASE_IDX 0
#endif