Squashed commit of the following:

commit f029195705a15700380c6f832ba5d15d46fd6de7
Author: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Date:   Thu Jul 13 14:38:56 2023 -0500

    Formatting workflows for source (clang-format) and cmake (cmake-format) (#4)

    * Add .cmake-format.yaml file

    * Add formatting workflow

    * provide base input for creating PR

    * Update scheme for extracting branch name

    - disable running formatting on push to amd-staging branch

    * patch .cmake-format.yaml for find_package signature

    - apparently cmake-format doesn't format the full signature of find_package

    * run formatting (clang-format v11) (#7)

    Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>

    * run cmake formatting (cmake-format) (#6)

    Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>

    ---------

    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

commit bc4d135fdd8a1a9e51235f18a5d575fd2b3735e6
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Thu Jul 13 12:55:17 2023 -0500

    Removing Build cache for potential issues with auto-generated header files (#5)

    Change-Id: I9e2319f4335e2f88585ffa6fac2bd88a1c952e6e

commit ce86dea6a311d44d880fa684eb78f3329295e2a4
Author: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Date:   Thu Jul 13 11:08:58 2023 -0500

    Fix decltype(<hsa-function>) function pointer usage (#3)

    - the following is done in several places:
        decltype(hsa_memory_allocate)* hsa_memory_allocate
    - the above can cause compiler errors (e.g. when it declares a class member, the new
      name changes the meaning of the unqualified name used inside the decltype)
    - replace decltype(<hsa-function>) with decltype(::<hsa-function>)
      - this ensures that the name within the decltype is resolved to the global-scope HSA function, not the variable
    - in many places, the variable already has an "_fn" suffix that avoids this issue, but '::' was added there anyway for consistency
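A minimal sketch of the problem and the fix follows. It uses a hypothetical global function `api_allocate` in place of a real HSA function. The name, its signature, the `api_table` struct and `allocate_via_table` are illustrative assumptions, not rocprofiler code. The unqualified form is left commented out because GCC typically rejects it in class scope.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a global C API function such as hsa_memory_allocate.
// Returns 0 on success and 1 on failure (illustrative convention only).
int api_allocate(std::size_t size, void** ptr) {
    static unsigned char buffer[64];
    if (ptr == nullptr || size > sizeof(buffer)) return 1;
    *ptr = buffer;
    return 0;
}

// Hypothetical dispatch table holding a pointer to the global function.
struct api_table {
    // Original pattern: GCC typically rejects this inside a class, because the
    // member declaration changes the meaning of the unqualified name 'api_allocate'.
    //   decltype(api_allocate)* api_allocate = nullptr;

    // Fixed pattern: '::' forces lookup of the global function, not the member.
    decltype(::api_allocate)* api_allocate = &::api_allocate;
};

// The member is a plain function pointer with the global function's signature.
static_assert(std::is_same<decltype(api_table::api_allocate),
                           int (*)(std::size_t, void**)>::value,
              "member must point to the global function's type");

// Call the global function through the table, as an interception layer might.
int allocate_via_table(const api_table& table, std::size_t size, void** ptr) {
    assert(table.api_allocate != nullptr);
    return table.api_allocate(size, ptr);
}
```

Calling `allocate_via_table` dispatches to the global function. A request larger than the stand-in's 64-byte buffer returns the error code 1.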

commit ac49fdd92a72e9c99394253a02da413a6c2e3b3a
Merge: a07946a 03a0855
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Wed Jul 12 11:36:24 2023 -0500

    Merge pull request #2 from ROCm-Developer-Tools/gerrit-amd-staging

    Pull from gerrit

commit 03a085588cffe863e8f466de67be1cfb205b675a
Merge: c26b32b a07946a
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Wed Jul 12 10:57:30 2023 -0500

    Merge branch 'amd-staging' into gerrit-amd-staging

commit a07946a5cd4c670c83c27ad1a076a9d4567ce6d7
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 15:46:04 2023 +0000

    Enabling Cached Builds

commit 525e494a7f13941077a8fd4ad6840904db4d27d4
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 04:53:54 2023 +0000

    Updating missed GPU Targets

commit 42c75862f628c9bee7cfb7dc04dff2619430efbc
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 04:43:02 2023 +0000

    Adding V1 Testing

commit 9d72fd4aee85e4b0c12e717060d2730fa5b73be1
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 03:34:31 2023 +0000

    Fixing Artifacts directory path

commit f4000cc558b3b2e4676f7994f7ce8c8e6f94518e
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 03:27:26 2023 +0000

    Fixing CMake for test build job

commit 2ce8115d4c33948c3c8f957f545a95a04e1d6cd2
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 03:16:18 2023 +0000

    Fixing Ubuntu CMake for ubuntu test build

commit 6d0ed439191be900748d0c025157f9d689a73ec7
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 01:28:41 2023 +0000

    Removing Navi21

commit e349a7642e5ae5eb03ab9fcd0a0f74f09f78cab5
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 01:14:14 2023 +0000

    Removing Navi21

commit fefd02fe68d2a4bca7ec2e381960ad004ee9fc5b
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 00:42:48 2023 +0000

    Fixing CMake Job

commit 2ea46abf7bf92643efa8c549fa70346ffbd79d65
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 00:35:13 2023 +0000

    Fixing CMake Job

commit d99d681ed1999c5fcf291dc678b11a77205fb0f3
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 00:32:13 2023 +0000

    Fixing Pull Latest Dockers and CMake Jobs

commit dfc4498072d13b4a1df3a63047d34c682c3d9a29
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 23:54:21 2023 +0000

    Fixing CMake job

commit 919efe04de707f7c702031be15c3e2c5f8442cbb
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 23:52:13 2023 +0000

    Adding Pull Last dockers job

commit be1b1256e8b0e05308e8f7e7e69bee3acca55281
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Tue Jul 11 18:25:40 2023 -0500

    Update cmake.yml

commit 212299fa4355ae6ec18f9aaacbb79c51ea6c6f97
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Tue Jul 11 18:23:35 2023 -0500

    Update cmake.yml

commit 7c2c1327086a61466cc6cac39f70865c051a8bc7
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Tue Jul 11 18:18:53 2023 -0500

    Update cmake.yml

commit 191b5ce007e612e814c1d7a3afb4ad398f3852e1
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Tue Jul 11 16:03:22 2023 -0500

    Update cmake.yml

commit 8824113d95f3e13c7ce4d0af8e0d9d8f522a6c4a
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 16:28:09 2023 +0000

    Fixing Pull from Gerrit job name

    Change-Id: I9e7ed9a27a13ca49d62c93bdadb30f0057e4d385

commit cc3d5e4b02ffb439e8cc2b3efa53527c376f9982
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 16:21:43 2023 +0000

    Adding Staging sync job

    Change-Id: I0551f43878b0678ce4b3e74e27d62357cf95ad95

commit b9be2eee71380a2e6dd34d520e92d0c4209277a0
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 15:57:11 2023 +0000

    Fixing build.sh

    Change-Id: Ia987b0244f0875370d5fe69907b3f5e9cea914de

commit 9eee33a95a1abd656a7ac5ca10a9f245e9825431
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 21:39:46 2023 -0500

    Update cmake.yml

commit 7093b85a78497140e8b52632ca2a002bdaeacd62
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 21:33:29 2023 -0500

    Update cmake.yml

commit f54697172c72a67740f9fdfa0c217b6ea6931576
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 21:01:26 2023 -0500

    Update cmake.yml

commit 1b6620e16f8940386b0f4f04e69e2410d21c0e26
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 20:21:02 2023 -0500

    Update cmake.yml

commit a94bec740c6b42c4b79c87bca20fa87b99bf060d
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:46:35 2023 -0500

    Update cmake.yml

commit 85d6b29d4375a69d575c18ece8542c50f2ddfcc3
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:34:39 2023 -0500

    Update cmake.yml

commit 8c004887cf1435f1a6214c3d2455299a8a27bd4c
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:31:17 2023 -0500

    Update cmake.yml

commit a14a9168e17d9348a53c6e9c9a47ba1edb4c4509
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:25:46 2023 -0500

    Update cmake.yml

commit 000f2f40b84e6a2f7d4becdbf5aed01436ca4c83
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:08:18 2023 -0500

    Update cmake.yml

commit a28a53d56731cad848fa9133d1c4dbaa8fc7afa7
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:03:39 2023 -0500

    Update cmake.yml

commit a6a2db01027f0b01fdfbb5997ddb772c7f51b649
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 18:21:53 2023 -0500

    Update cmake.yml

commit 118ef2a88b2d44e3207c31c343da3e5e5ec6f176
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 17:55:57 2023 -0500

    Update cmake.yml

commit 03c4c232396440cd0be6d2dd7baf4ceea1c2589d
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 17:48:49 2023 -0500

    Create cmake.yml

Change-Id: I77992f15694e77cbae49c56f9ff02f4f9079235d


[ROCm/rocprofiler commit: d4a33cf33a]
Ammar ELWazir
2023-07-13 19:48:38 +00:00
committed by Ammar Elwazir
parent e599708211
commit 6eb06cf201
144 changed files with 154836 additions and 166916 deletions
+98
@@ -0,0 +1,98 @@
parse:
  additional_commands:
    find_package:
      flags:
        - EXACT
        - QUIET
        - MODULE
        - REQUIRED
        - CONFIG
        - NO_MODULE
        - GLOBAL
        - NO_POLICY_SCOPE
        - BYPASS_PROVIDER
        - NO_DEFAULT_PATH
        - NO_PACKAGE_ROOT_PATH
        - NO_CMAKE_PATH
        - NO_CMAKE_ENVIRONMENT_PATH
        - NO_SYSTEM_ENVIRONMENT_PATH
        - NO_CMAKE_PACKAGE_REGISTRY
        - NO_CMAKE_BUILDS_PATH
        - NO_CMAKE_SYSTEM_PATH
        - NO_CMAKE_INSTALL_PREFIX
        - NO_CMAKE_SYSTEM_PACKAGE_REGISTRY
        - CMAKE_FIND_ROOT_PATH_BOTH
        - ONLY_CMAKE_FIND_ROOT_PATH
        - NO_CMAKE_FIND_ROOT_PATH
      kwargs:
        COMPONENTS: '*'
        OPTIONAL_COMPONENTS: '*'
        NAMES: '*'
        CONFIGS: '*'
        HINTS: '*'
        PATHS: '*'
        REGISTRY_VIEW: '*'
        PATH_SUFFIXES: '*'
  override_spec: {}
  vartags: []
  proptags: []
format:
  disable: false
  line_width: 90
  tab_size: 4
  use_tabchars: false
  fractional_tab_policy: use-space
  max_subgroups_hwrap: 2
  max_pargs_hwrap: 8
  max_rows_cmdline: 2
  separate_ctrl_name_with_space: false
  separate_fn_name_with_space: false
  dangle_parens: false
  dangle_align: child
  min_prefix_chars: 4
  max_prefix_chars: 10
  max_lines_hwrap: 2
  line_ending: unix
  command_case: lower
  keyword_case: upper
  always_wrap: []
  enable_sort: true
  autosort: false
  require_valid_layout: false
  layout_passes: {}
markup:
  bullet_char: '*'
  enum_char: .
  first_comment_is_literal: true
  literal_comment_pattern: ^#
  fence_pattern: ^\s*([`~]{3}[`~]*)(.*)$
  ruler_pattern: ^\s*[^\w\s]{3}.*[^\w\s]{3}$
  explicit_trailing_pattern: '#<'
  hashruler_min_length: 10
  canonicalize_hashrulers: true
  enable_markup: true
lint:
  disabled_codes: []
  function_pattern: '[0-9a-z_]+'
  macro_pattern: '[0-9A-Z_]+'
  global_var_pattern: '[A-Z][0-9A-Z_]+'
  internal_var_pattern: _[A-Z][0-9A-Z_]+
  local_var_pattern: '[a-z][a-z0-9_]+'
  private_var_pattern: _[0-9a-z_]+
  public_var_pattern: '[A-Z][0-9A-Z_]+'
  argument_var_pattern: '[a-z][a-z0-9_]+'
  keyword_pattern: '[A-Z][0-9A-Z_]+'
  max_conditionals_custom_parser: 2
  min_statement_spacing: 1
  max_statement_spacing: 2
  max_returns: 6
  max_branches: 12
  max_arguments: 5
  max_localvars: 15
  max_statements: 50
encode:
  emit_byteorder_mark: false
  input_encoding: utf-8
  output_encoding: utf-8
misc:
  per_command: {}
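The sketch below illustrates the `lint` naming patterns in the config above. It checks a few names that appear in this commit's CMakeLists.txt (`ROCM_PATH`, `prefix_name`) against those patterns using `std::regex_match`. It assumes each pattern must match the whole name. That is our reading of how cmake-lint applies these settings, not something the config itself states.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Patterns copied from the lint section of .cmake-format.yaml.
const std::regex function_pattern("[0-9a-z_]+");
const std::regex macro_pattern("[0-9A-Z_]+");
const std::regex global_var_pattern("[A-Z][0-9A-Z_]+");
const std::regex local_var_pattern("[a-z][a-z0-9_]+");

// Assumption: a name is valid only if the entire name matches the pattern.
bool name_matches(const std::regex& pattern, const std::string& name) {
    assert(!name.empty());
    return std::regex_match(name, pattern);
}
```

Under this reading, a single-letter local variable such as `x` would be flagged, because `local_var_pattern` requires at least two characters.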
+14 -159
@@ -34,16 +34,7 @@ jobs:
steps:
- uses: actions/checkout@v3
- name: Restore cached Build
id: cache-build-restore
uses: actions/cache/restore@v3
with:
path: |
${{github.workspace}}/build
key: mi200_ubuntu_22_04
- name: Configure CMake
if: steps.cache-build-restore.outputs.cache-hit != 'false'
# Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -56,14 +47,6 @@ jobs:
- name: Build Tests, Samples, Documentation, Packages
run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests samples doc package
- name: Save Build
id: cache-build-save
uses: actions/cache/save@v3
with:
path: |
${{github.workspace}}/build
key: mi200_ubuntu_22_04
- name: Testing V1
run: |
cd ${{github.workspace}}/build
@@ -102,16 +85,7 @@ jobs:
steps:
- uses: actions/checkout@v3
- name: Restore cached Build
id: cache-build-restore
uses: actions/cache/restore@v3
with:
path: |
${{github.workspace}}/build
key: mi200_ubuntu_20_04
- name: Configure CMake
if: steps.cache-build-restore.outputs.cache-hit != 'false'
# Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -153,16 +127,7 @@ jobs:
steps:
- uses: actions/checkout@v3
- name: Restore cached Build
id: cache-build-restore
uses: actions/cache/restore@v3
with:
path: |
${{github.workspace}}/build
key: mi200_sles
- name: Configure CMake
if: steps.cache-build-restore.outputs.cache-hit != 'false'
# Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -175,14 +140,6 @@ jobs:
- name: Build Tests
run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
- name: Save Build
id: cache-build-save
uses: actions/cache/save@v3
with:
path: |
${{github.workspace}}/build
key: mi200_sles
- name: Testing V1
run: |
cd ${{github.workspace}}/build
@@ -212,16 +169,7 @@ jobs:
steps:
- uses: actions/checkout@v3
- name: Restore cached Build
id: cache-build-restore
uses: actions/cache/restore@v3
with:
path: |
${{github.workspace}}/build
key: mi200_rhel_8
- name: Configure CMake
if: steps.cache-build-restore.outputs.cache-hit != 'false'
# Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -234,14 +182,6 @@ jobs:
- name: Build Tests
run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
- name: Save Build
id: cache-build-save
uses: actions/cache/save@v3
with:
path: |
${{github.workspace}}/build
key: mi200_rhel_8
- name: Testing V1
run: |
cd ${{github.workspace}}/build
@@ -271,16 +211,7 @@ jobs:
steps:
- uses: actions/checkout@v3
- name: Restore cached Build
id: cache-build-restore
uses: actions/cache/restore@v3
with:
path: |
${{github.workspace}}/build
key: mi200_rhel_9
- name: Configure CMake
if: steps.cache-build-restore.outputs.cache-hit != 'false'
# Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -293,14 +224,6 @@ jobs:
- name: Build Tests
run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
- name: Save Build
id: cache-build-save
uses: actions/cache/save@v3
with:
path: |
${{github.workspace}}/build
key: mi200_rhel_9
- name: Testing V1
run: |
cd ${{github.workspace}}/build
@@ -326,16 +249,7 @@ jobs:
steps:
- uses: actions/checkout@v3
- name: Restore cached Build
id: cache-build-restore
uses: actions/cache/restore@v3
with:
path: |
${{github.workspace}}/build
key: vega20
- name: Configure CMake
if: steps.cache-build-restore.outputs.cache-hit != 'false'
# Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -348,14 +262,6 @@ jobs:
- name: Build Tests
run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
- name: Save Build
id: cache-build-save
uses: actions/cache/save@v3
with:
path: |
${{github.workspace}}/build
key: vega20
- name: Testing V1
run: |
cd ${{github.workspace}}/build
@@ -381,16 +287,7 @@ jobs:
steps:
- uses: actions/checkout@v3
- name: Restore cached Build
id: cache-build-restore
uses: actions/cache/restore@v3
with:
path: |
${{github.workspace}}/build
key: navi32
- name: Configure CMake
if: steps.cache-build-restore.outputs.cache-hit != 'false'
# Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -403,14 +300,6 @@ jobs:
- name: Build Tests
run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
- name: Save Build
id: cache-build-save
uses: actions/cache/save@v3
with:
path: |
${{github.workspace}}/build
key: navi32
- name: Testing V1
run: |
cd ${{github.workspace}}/build
@@ -436,16 +325,7 @@ jobs:
steps:
- uses: actions/checkout@v3
- name: Restore cached Build
id: cache-build-restore
uses: actions/cache/restore@v3
with:
path: |
${{github.workspace}}/build
key: mi100
- name: Configure CMake
if: steps.cache-build-restore.outputs.cache-hit != 'false'
# Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -458,14 +338,6 @@ jobs:
- name: Build Tests
run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
- name: Save Build
id: cache-build-save
uses: actions/cache/save@v3
with:
path: |
${{github.workspace}}/build
key: mi100
- name: Testing V1
run: |
cd ${{github.workspace}}/build
@@ -492,16 +364,7 @@ jobs:
# steps:
# - uses: actions/checkout@v3
# - name: Restore cached Build
# id: cache-build-restore
# uses: actions/cache/restore@v3
# with:
# path: |
# ${{github.workspace}}/build
# key: navi21
# - name: Configure CMake
# if: steps.cache-build-restore.outputs.cache-hit != 'false'
# # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
# run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -514,26 +377,18 @@ jobs:
# - name: Build Tests
# run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
# - name: Save Build
# id: cache-build-save
# uses: actions/cache/save@v3
# with:
# path: |
# ${{github.workspace}}/build
# key: navi21
# - name: Testing V1
# run: |
# cd ${{github.workspace}}/build
# ./run.sh
# # TODO(aelwazir): Enable this once ctest is fixed
# # working-directory: ${{github.workspace}}/build/tests-v2
# # Execute tests defined by the CMake configuration.
# # See https://cmake.org/cmake/help/latest/manual/ctest.1.html for more detail
# # TODO(aelwazir): Enable this once ctest is fixed
# # run: ctest --parallel 16 -C ${{env.BUILD_TYPE}}
# - name: Testing V1
# run: |
# cd ${{github.workspace}}/build
# ./run.sh
# # TODO(aelwazir): Enable this once ctest is fixed
# # working-directory: ${{github.workspace}}/build/tests-v2
# # Execute tests defined by the CMake configuration.
# # See https://cmake.org/cmake/help/latest/manual/ctest.1.html for more detail
# # TODO(aelwazir): Enable this once ctest is fixed
# # run: ctest --parallel 16 -C ${{env.BUILD_TYPE}}
# - name: Testing V2
# run: |
# cd ${{github.workspace}}/build
# make -j check
# - name: Testing V2
# run: |
# cd ${{github.workspace}}/build
# make -j check
+95
@@ -0,0 +1,95 @@
name: Formatting
run-name: formatting
on:
  pull_request:
    branches: [ amd-staging ]
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  cmake:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - name: Extract branch name
        shell: bash
        run: |
          echo "branch=${GITHUB_HEAD_REF:-${GITHUB_HEAD_REF#refs/heads/}}" >> $GITHUB_OUTPUT
        id: extract_branch
      - name: Install dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y python3-pip
          python3 -m pip install -U cmake-format
      - name: Run cmake-format
        run: |
          set +e
          cmake-format -i $(find . -type f | egrep 'CMakeLists.txt|\.cmake$')
          if [ $(git diff | wc -l) -ne 0 ]; then
            echo -e "\nError! CMake code not formatted. Run cmake-format...\n"
            echo -e "\nFiles:\n"
            git diff --name-only
            echo -e "\nFull diff:\n"
            git diff
            exit 1
          fi
      - name: Create pull request
        if: failure()
        uses: peter-evans/create-pull-request@v5
        with:
          commit-message: "run cmake formatting (cmake-format)"
          branch: ${{ steps.extract_branch.outputs.branch }}-cmake-format
          delete-branch: true
          title: "Apply cmake-format to ${{ steps.extract_branch.outputs.branch }}"
          base: ${{ steps.extract_branch.outputs.branch }}
  source:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - name: Install dependencies
        run: |
          DISTRIB_CODENAME=$(cat /etc/lsb-release | grep DISTRIB_CODENAME | awk -F '=' '{print $NF}')
          sudo apt-get update
          sudo apt-get install -y software-properties-common wget curl clang-format-11
      - name: Extract branch name
        shell: bash
        run: |
          echo "branch=${GITHUB_HEAD_REF:-${GITHUB_HEAD_REF#refs/heads/}}" >> $GITHUB_OUTPUT
        id: extract_branch
      - name: Run clang-format
        run: |
          set +e
          FILES=$(find include plugin samples src test tests-v2 -type f | egrep '\.(h|hpp|hh|c|cc|cpp)(|\.in)$')
          FORMAT_OUT=$(clang-format-11 -i ${FILES})
          if [ $(git diff | wc -l) -ne 0 ]; then
            echo -e "\nError! Code not formatted. Run clang-format (version 11)...\n"
            echo -e "\nFiles:\n"
            git diff --name-only
            echo -e "\nFull diff:\n"
            git diff
            exit 1
          fi
      - name: Create pull request
        if: failure()
        uses: peter-evans/create-pull-request@v5
        with:
          commit-message: "run formatting (clang-format v11)"
          branch: ${{ steps.extract_branch.outputs.branch }}-clang-format
          delete-branch: true
          title: "Apply clang-format (v11) to ${{ steps.extract_branch.outputs.branch }}"
          base: ${{ steps.extract_branch.outputs.branch }}
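Both formatting jobs in the workflow above select files by piping `find` through `egrep`. The sketch below reproduces the two filters with `std::regex_search`, which, like `egrep`, accepts a match anywhere in the path. It shows which paths each job would touch; the example paths are hypothetical. The optional `.in` suffix, written `(|\.in)` in the workflow, is expressed here as the equivalent `(\.in)?`.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Filter from the clang-format job: C/C++ sources and headers, optionally with
// a trailing ".in" (configure_file templates).
const std::regex source_filter(R"(\.(h|hpp|hh|c|cc|cpp)(\.in)?$)");

// Filter from the cmake-format job, copied as written. "CMakeLists.txt" is not
// anchored and its '.' is unescaped, so it matches more than the exact file name.
const std::regex cmake_filter(R"(CMakeLists.txt|\.cmake$)");

// egrep selects a line if the pattern matches anywhere in it, hence regex_search.
bool selected_by(const std::regex& filter, const std::string& path) {
    assert(!path.empty());
    return std::regex_search(path, filter);
}
```

As written, the cmake-format filter also picks up a path such as `CMakeLists.txt.orig`, because that part of the pattern is unanchored. By contrast, `.cu` sources and `*.cmake.in` templates are skipped by both jobs.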
+375 -407
@@ -24,7 +24,7 @@ cmake_minimum_required(VERSION 3.18.0)
# Build is not supported on Windows plaform
if(WIN32)
message(FATAL_ERROR "Windows build is not supported.")
message(FATAL_ERROR "Windows build is not supported.")
endif()
# Set module name and project name.
@@ -37,9 +37,9 @@ include(GNUInstallDirs)
# set default ROCM_PATH
if(NOT DEFINED ROCM_PATH)
set(ROCM_PATH
"/opt/rocm"
CACHE STRING "Default ROCM installation directory")
set(ROCM_PATH
"/opt/rocm"
CACHE STRING "Default ROCM installation directory")
endif()
set(CMAKE_CXX_STANDARD 17)
@@ -62,8 +62,8 @@ set(BUILD_VERSION_MAJOR ${VERSION_MAJOR})
set(BUILD_VERSION_MINOR ${VERSION_MINOR})
set(BUILD_VERSION_PATCH ${VERSION_PATCH})
if(DEFINED VERSION_BUILD AND NOT ${VERSION_BUILD} STREQUAL "")
message("VERSION BUILD DEFINED ${VERSION_BUILD}")
set(BUILD_VERSION_PATCH "${BUILD_VERSION_PATCH}-${VERSION_BUILD}")
message("VERSION BUILD DEFINED ${VERSION_BUILD}")
set(BUILD_VERSION_PATCH "${BUILD_VERSION_PATCH}-${VERSION_BUILD}")
endif()
set(BUILD_VERSION_STRING
"${BUILD_VERSION_MAJOR}.${BUILD_VERSION_MINOR}.${BUILD_VERSION_PATCH}")
@@ -71,12 +71,11 @@ set(BUILD_VERSION_STRING
set(LIB_VERSION_MAJOR ${VERSION_MAJOR})
set(LIB_VERSION_MINOR ${VERSION_MINOR})
if(${ROCM_PATCH_VERSION})
set(LIB_VERSION_PATCH ${ROCM_PATCH_VERSION})
set(LIB_VERSION_PATCH ${ROCM_PATCH_VERSION})
else()
set(LIB_VERSION_PATCH ${VERSION_PATCH})
set(LIB_VERSION_PATCH ${VERSION_PATCH})
endif()
set(LIB_VERSION_STRING
"${LIB_VERSION_MAJOR}.${LIB_VERSION_MINOR}.${LIB_VERSION_PATCH}")
set(LIB_VERSION_STRING "${LIB_VERSION_MAJOR}.${LIB_VERSION_MINOR}.${LIB_VERSION_PATCH}")
message("-- LIB-VERSION STRING: ${LIB_VERSION_STRING}")
# Set target and root/lib/test directory
@@ -86,97 +85,84 @@ set(LIB_DIR "${ROOT_DIR}/src")
set(TEST_DIR "${ROOT_DIR}/test")
find_package(
amd_comgr
REQUIRED
CONFIG
HINTS
${CMAKE_INSTALL_PREFIX}
PATHS
${ROCM_PATH}
PATH_SUFFIXES
lib/cmake/amd_comgr)
amd_comgr REQUIRED CONFIG
HINTS ${CMAKE_INSTALL_PREFIX}
PATHS ${ROCM_PATH}
PATH_SUFFIXES lib/cmake/amd_comgr)
message(STATUS "Code Object Manager found at ${amd_comgr_DIR}.")
link_libraries(amd_comgr)
find_package(Threads REQUIRED)
find_package(
hsa-runtime64
REQUIRED
CONFIG
HINTS
${CMAKE_INSTALL_PREFIX}
PATHS
${ROCM_PATH})
hsa-runtime64 REQUIRED CONFIG
HINTS ${CMAKE_INSTALL_PREFIX}
PATHS ${ROCM_PATH})
find_package(
HIP
REQUIRED
CONFIG
HINTS
${CMAKE_INSTALL_PREFIX}
PATHS
${ROCM_PATH})
HIP REQUIRED CONFIG
HINTS ${CMAKE_INSTALL_PREFIX}
PATHS ${ROCM_PATH})
find_library(NUMA NAME numa REQUIRED)
link_libraries(${NUMA})
find_program(ROCMINFO_EXEC NAMES "rocminfo"
PATHS ${ROCM_PATH}
${CMAKE_INSTALL_PREFIX} "/usr/local" "/usr"
PATH_SUFFIXES bin)
find_program(
ROCMINFO_EXEC
NAMES "rocminfo"
PATHS ${ROCM_PATH} ${CMAKE_INSTALL_PREFIX} "/usr/local" "/usr"
PATH_SUFFIXES bin)
set(ORIGINAL_SCRIPT_PATH ${CMAKE_CURRENT_SOURCE_DIR}/bin/tblextr.py)
set(OUTPUT_SCRIPT_PATH ${CMAKE_CURRENT_BINARY_DIR}/bin/tblextr.py)
configure_file(${ORIGINAL_SCRIPT_PATH} ${OUTPUT_SCRIPT_PATH} @ONLY)
get_property(
HSA_RUNTIME_INCLUDE_DIRECTORIES
TARGET hsa-runtime64::hsa-runtime64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
HSA_RUNTIME_INCLUDE_DIRECTORIES
TARGET hsa-runtime64::hsa-runtime64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH REQUIRED)
HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH REQUIRED)
get_filename_component(HSA_RUNTIME_INC_PATH ${HSA_H} DIRECTORY)
include_directories(${HSA_RUNTIME_INC_PATH})
if(NOT DEFINED LIBRARY_TYPE)
set(LIBRARY_TYPE SHARED)
set(LIBRARY_TYPE SHARED)
endif()
# Enable tracing API
if(NOT USE_PROF_API)
set(USE_PROF_API 1)
set(USE_PROF_API 1)
endif()
# Protocol header lookup
set(PROF_API_HEADER_NAME prof_protocol.h)
if(USE_PROF_API EQUAL 1)
find_path(
PROF_API_HEADER_DIR ${PROF_API_HEADER_NAME}
HINTS ${PROF_API_HEADER_PATH}
PATHS /opt/rocm/include
PATH_SUFFIXES roctracer/ext)
if(NOT PROF_API_HEADER_DIR)
message(
FATAL_ERROR
"Profiling API header not found. Tracer integration disabled. Use -DPROF_API_HEADER_PATH=<path to ${PROF_API_HEADER_NAME} header>"
)
else()
include_directories(${PROF_API_HEADER_DIR})
message(
STATUS "Profiling API: ${PROF_API_HEADER_DIR}/${PROF_API_HEADER_NAME}")
endif()
find_path(
PROF_API_HEADER_DIR ${PROF_API_HEADER_NAME}
HINTS ${PROF_API_HEADER_PATH}
PATHS /opt/rocm/include
PATH_SUFFIXES roctracer/ext)
if(NOT PROF_API_HEADER_DIR)
message(
FATAL_ERROR
"Profiling API header not found. Tracer integration disabled. Use -DPROF_API_HEADER_PATH=<path to ${PROF_API_HEADER_NAME} header>"
)
else()
include_directories(${PROF_API_HEADER_DIR})
message(STATUS "Profiling API: ${PROF_API_HEADER_DIR}/${PROF_API_HEADER_NAME}")
endif()
endif()
# Build libraries
add_subdirectory(src)
if(${LIBRARY_TYPE} STREQUAL SHARED)
# Build samples
add_subdirectory(samples)
# Build samples
add_subdirectory(samples)
# Build tests
add_subdirectory(tests-v2)
# Build tests
add_subdirectory(tests-v2)
endif()
# Build Plugins
@@ -188,20 +174,20 @@ add_subdirectory(${TEST_DIR} ${PROJECT_BINARY_DIR}/test)
# Installation and packaging
set(DEST_NAME ${ROCPROFILER_NAME})
if(DEFINED CMAKE_INSTALL_PREFIX)
get_filename_component(prefix_name ${CMAKE_INSTALL_PREFIX} NAME)
get_filename_component(prefix_dir ${CMAKE_INSTALL_PREFIX} DIRECTORY)
if(prefix_name STREQUAL ${DEST_NAME})
set(CMAKE_INSTALL_PREFIX ${prefix_dir})
endif()
get_filename_component(prefix_name ${CMAKE_INSTALL_PREFIX} NAME)
get_filename_component(prefix_dir ${CMAKE_INSTALL_PREFIX} DIRECTORY)
if(prefix_name STREQUAL ${DEST_NAME})
set(CMAKE_INSTALL_PREFIX ${prefix_dir})
endif()
endif()
if(DEFINED CPACK_PACKAGING_INSTALL_PREFIX)
get_filename_component(prefix_name ${CPACK_PACKAGING_INSTALL_PREFIX} NAME)
get_filename_component(prefix_dir ${CPACK_PACKAGING_INSTALL_PREFIX} DIRECTORY)
if(prefix_name STREQUAL ${DEST_NAME})
set(CPACK_PACKAGING_INSTALL_PREFIX ${prefix_dir})
endif()
get_filename_component(prefix_name ${CPACK_PACKAGING_INSTALL_PREFIX} NAME)
get_filename_component(prefix_dir ${CPACK_PACKAGING_INSTALL_PREFIX} DIRECTORY)
if(prefix_name STREQUAL ${DEST_NAME})
set(CPACK_PACKAGING_INSTALL_PREFIX ${prefix_dir})
endif()
else()
set(CPACK_PACKAGING_INSTALL_PREFIX ${CMAKE_INSTALL_PREFIX})
set(CPACK_PACKAGING_INSTALL_PREFIX ${CMAKE_INSTALL_PREFIX})
endif()
message("CMake-install-prefix: ${CMAKE_INSTALL_PREFIX}")
message("CPack-install-prefix: ${CPACK_PACKAGING_INSTALL_PREFIX}")
@@ -209,413 +195,395 @@ message("-----------Dest-name: ${DEST_NAME}")
# Install headers
install(
FILES ${CMAKE_CURRENT_SOURCE_DIR}/src/core/activity.h
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}
COMPONENT dev)
FILES ${CMAKE_CURRENT_SOURCE_DIR}/src/core/activity.h
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}
COMPONENT dev)
# rpl_run.sh
install(
FILES ${CMAKE_CURRENT_SOURCE_DIR}/bin/rpl_run.sh
DESTINATION ${CMAKE_INSTALL_BINDIR}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ
WORLD_EXECUTE
RENAME rocprof
COMPONENT runtime)
FILES ${CMAKE_CURRENT_SOURCE_DIR}/bin/rpl_run.sh
DESTINATION ${CMAKE_INSTALL_BINDIR}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE
RENAME rocprof
COMPONENT runtime)
configure_file(bin/rocprofv2 ${PROJECT_BINARY_DIR} COPYONLY)
install(
FILES ${PROJECT_SOURCE_DIR}/bin/rocprofv2
DESTINATION ${CMAKE_INSTALL_BINDIR}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ
WORLD_EXECUTE
COMPONENT runtime)
FILES ${PROJECT_SOURCE_DIR}/bin/rocprofv2
DESTINATION ${CMAKE_INSTALL_BINDIR}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE
COMPONENT runtime)
install(
FILES ${CMAKE_CURRENT_SOURCE_DIR}/bin/txt2xml.sh
${CMAKE_CURRENT_SOURCE_DIR}/bin/merge_traces.sh
${CMAKE_CURRENT_SOURCE_DIR}/bin/txt2params.py
${CMAKE_CURRENT_BINARY_DIR}/bin/tblextr.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/dform.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/mem_manager.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/sqlitedb.py
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/${ROCPROFILER_NAME}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ
WORLD_EXECUTE
COMPONENT runtime)
FILES ${CMAKE_CURRENT_SOURCE_DIR}/bin/txt2xml.sh
${CMAKE_CURRENT_SOURCE_DIR}/bin/merge_traces.sh
${CMAKE_CURRENT_SOURCE_DIR}/bin/txt2params.py
${CMAKE_CURRENT_BINARY_DIR}/bin/tblextr.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/dform.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/mem_manager.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/sqlitedb.py
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/${ROCPROFILER_NAME}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE
COMPONENT runtime)
# gfx_metrics.xml metrics.xml
install(
FILES ${CMAKE_CURRENT_SOURCE_DIR}/test/tool/metrics.xml
${CMAKE_CURRENT_SOURCE_DIR}/test/tool/gfx_metrics.xml
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT runtime)
FILES ${CMAKE_CURRENT_SOURCE_DIR}/test/tool/metrics.xml
${CMAKE_CURRENT_SOURCE_DIR}/test/tool/gfx_metrics.xml
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT runtime)
# librocprof-tool.so
install(
FILES ${PROJECT_BINARY_DIR}/test/librocprof-tool.so
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT runtime)
FILES ${PROJECT_BINARY_DIR}/test/librocprof-tool.so
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT runtime)
install(
FILES ${PROJECT_BINARY_DIR}/test/librocprof-tool.so
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT asan)
FILES ${PROJECT_BINARY_DIR}/test/librocprof-tool.so
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT asan)
install(
FILES ${PROJECT_BINARY_DIR}/test/rocprof-ctrl
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
PERMISSIONS
OWNER_READ
OWNER_WRITE
OWNER_EXECUTE
GROUP_READ
GROUP_EXECUTE
WORLD_READ
WORLD_EXECUTE
COMPONENT runtime)
FILES ${PROJECT_BINARY_DIR}/test/rocprof-ctrl
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
PERMISSIONS OWNER_READ OWNER_WRITE OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ
WORLD_EXECUTE
COMPONENT runtime)
# File reorg backward compatibility for non ASAN packaging
if ( NOT ENABLE_ASAN_PACKAGING )
# File reorg Backward compatibility
option(FILE_REORG_BACKWARD_COMPATIBILITY
"Enable File Reorg with backward compatibility" ON)
if(NOT ENABLE_ASAN_PACKAGING)
# File reorg Backward compatibility
option(FILE_REORG_BACKWARD_COMPATIBILITY
"Enable File Reorg with backward compatibility" ON)
endif()
if(FILE_REORG_BACKWARD_COMPATIBILITY)
# To enabe/disable #error in wrapper header files
if(NOT DEFINED ROCM_HEADER_WRAPPER_WERROR)
if(DEFINED ENV{ROCM_HEADER_WRAPPER_WERROR})
set(ROCM_HEADER_WRAPPER_WERROR "$ENV{ROCM_HEADER_WRAPPER_WERROR}"
CACHE STRING "Header wrapper warnings as errors.")
else()
set(ROCM_HEADER_WRAPPER_WERROR "OFF" CACHE STRING "Header wrapper warnings as errors.")
# To enabe/disable #error in wrapper header files
if(NOT DEFINED ROCM_HEADER_WRAPPER_WERROR)
if(DEFINED ENV{ROCM_HEADER_WRAPPER_WERROR})
set(ROCM_HEADER_WRAPPER_WERROR
"$ENV{ROCM_HEADER_WRAPPER_WERROR}"
CACHE STRING "Header wrapper warnings as errors.")
else()
set(ROCM_HEADER_WRAPPER_WERROR
"OFF"
CACHE STRING "Header wrapper warnings as errors.")
endif()
endif()
endif()
if(ROCM_HEADER_WRAPPER_WERROR)
set(deprecated_error 1)
else()
set(deprecated_error 0)
endif()
include(rocprofiler-backward-compat.cmake)
endif() #FILE_REORG_BACKWARD_COMPATIBILITY
if(ROCM_HEADER_WRAPPER_WERROR)
set(deprecated_error 1)
else()
set(deprecated_error 0)
endif()
include(rocprofiler-backward-compat.cmake)
endif() # FILE_REORG_BACKWARD_COMPATIBILITY
if(${LIBRARY_TYPE} STREQUAL SHARED)
# Packaging directives
set(CPACK_GENERATOR "DEB" "RPM" "TGZ")
set(ENABLE_LDCONFIG
ON
CACHE BOOL "Set library links and caches using ldconfig.")
set(CPACK_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_PACKAGE_VENDOR "Advanced Micro Devices, Inc.")
set(CPACK_PACKAGE_VERSION_MAJOR ${PROJECT_VERSION_MAJOR})
set(CPACK_PACKAGE_VERSION_MINOR ${PROJECT_VERSION_MINOR})
set(CPACK_PACKAGE_VERSION_PATCH ${PROJECT_VERSION_PATCH})
set(CPACK_PACKAGE_VERSION
"${CPACK_PACKAGE_VERSION_MAJOR}.${CPACK_PACKAGE_VERSION_MINOR}.${CPACK_PACKAGE_VERSION_PATCH}"
)
set(CPACK_PACKAGE_CONTACT
"ROCm Profiler Support <dl.ROCm-Profiler.support@amd.com>")
set(CPACK_PACKAGE_DESCRIPTION_SUMMARY
"ROCPROFILER library for AMD HSA runtime API extension support")
set(CPACK_RESOURCE_FILE_LICENSE "${CMAKE_CURRENT_SOURCE_DIR}/LICENSE")
if(DEFINED ENV{ROCM_LIBPATCH_VERSION})
# Packaging directives
set(CPACK_GENERATOR "DEB" "RPM" "TGZ")
set(ENABLE_LDCONFIG
ON
CACHE BOOL "Set library links and caches using ldconfig.")
set(CPACK_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_PACKAGE_VENDOR "Advanced Micro Devices, Inc.")
set(CPACK_PACKAGE_VERSION_MAJOR ${PROJECT_VERSION_MAJOR})
set(CPACK_PACKAGE_VERSION_MINOR ${PROJECT_VERSION_MINOR})
set(CPACK_PACKAGE_VERSION_PATCH ${PROJECT_VERSION_PATCH})
set(CPACK_PACKAGE_VERSION
"${CPACK_PACKAGE_VERSION}.$ENV{ROCM_LIBPATCH_VERSION}")
message("Using CPACK_PACKAGE_VERSION ${CPACK_PACKAGE_VERSION}")
endif()
"${CPACK_PACKAGE_VERSION_MAJOR}.${CPACK_PACKAGE_VERSION_MINOR}.${CPACK_PACKAGE_VERSION_PATCH}"
)
set(CPACK_PACKAGE_CONTACT "ROCm Profiler Support <dl.ROCm-Profiler.support@amd.com>")
set(CPACK_PACKAGE_DESCRIPTION_SUMMARY
"ROCPROFILER library for AMD HSA runtime API extension support")
set(CPACK_RESOURCE_FILE_LICENSE "${CMAKE_CURRENT_SOURCE_DIR}/LICENSE")
if(DEFINED ENV{ROCM_LIBPATCH_VERSION})
set(CPACK_PACKAGE_VERSION "${CPACK_PACKAGE_VERSION}.$ENV{ROCM_LIBPATCH_VERSION}")
message("Using CPACK_PACKAGE_VERSION ${CPACK_PACKAGE_VERSION}")
endif()
# Debian package specific variable for ASAN
set(CPACK_DEBIAN_ASAN_PACKAGE_NAME "${ROCPROFILER_NAME}-asan")
set(CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS "hsa-rocr-asan, rocm-core-asan")
# Debian package specific variable for ASAN
set ( CPACK_DEBIAN_ASAN_PACKAGE_NAME "${ROCPROFILER_NAME}-asan" )
set ( CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS "hsa-rocr-asan, rocm-core-asan" )
# Install license file
install(
FILES ${CPACK_RESOURCE_FILE_LICENSE}
DESTINATION ${CMAKE_INSTALL_DOCDIR}
COMPONENT runtime)
install(
FILES ${CPACK_RESOURCE_FILE_LICENSE}
DESTINATION ${CMAKE_INSTALL_DOCDIR}-asan
COMPONENT asan)
# Install license file
install(
FILES ${CPACK_RESOURCE_FILE_LICENSE}
DESTINATION ${CMAKE_INSTALL_DOCDIR}
COMPONENT runtime)
install(
FILES ${CPACK_RESOURCE_FILE_LICENSE}
DESTINATION ${CMAKE_INSTALL_DOCDIR}-asan
COMPONENT asan)
# Debian package specific variables
if(DEFINED ENV{CPACK_DEBIAN_PACKAGE_RELEASE})
set(CPACK_DEBIAN_PACKAGE_RELEASE $ENV{CPACK_DEBIAN_PACKAGE_RELEASE})
else()
set(CPACK_DEBIAN_PACKAGE_RELEASE "local")
endif()
# Debian package specific variables
if(DEFINED ENV{CPACK_DEBIAN_PACKAGE_RELEASE})
set(CPACK_DEBIAN_PACKAGE_RELEASE $ENV{CPACK_DEBIAN_PACKAGE_RELEASE})
else()
set(CPACK_DEBIAN_PACKAGE_RELEASE "local")
endif()
message("Using CPACK_DEBIAN_PACKAGE_RELEASE ${CPACK_DEBIAN_PACKAGE_RELEASE}")
set(CPACK_DEB_COMPONENT_INSTALL ON)
set(CPACK_DEBIAN_FILE_NAME "DEB-DEFAULT")
set(CPACK_DEBIAN_RUNTIME_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS
"hsa-rocr-dev, rocm-core, libsystemd-dev, libelf-dev, libnuma-dev, libpciaccess-dev, libxml2-dev"
)
set(CPACK_DEBIAN_DEV_PACKAGE_NAME "${PROJECT_NAME}-dev")
set(CPACK_DEBIAN_DEV_PACKAGE_DEPENDS "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_TESTS_PACKAGE_NAME "${PROJECT_NAME}-tests")
set(CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS "${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_SAMPLES_PACKAGE_NAME "${PROJECT_NAME}-samples")
set(CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS
"${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_DOCS_PACKAGE_NAME "${PROJECT_NAME}-docs")
set(CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS "${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_PLUGINS_PACKAGE_NAME "${PROJECT_NAME}-plugins")
set(CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
message("Using CPACK_DEBIAN_PACKAGE_RELEASE ${CPACK_DEBIAN_PACKAGE_RELEASE}")
set(CPACK_DEB_COMPONENT_INSTALL ON)
set(CPACK_DEBIAN_FILE_NAME "DEB-DEFAULT")
set(CPACK_DEBIAN_RUNTIME_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS "hsa-rocr-dev, rocm-core, libsystemd-dev, libelf-dev, libnuma-dev, libpciaccess-dev, libxml2-dev")
set(CPACK_DEBIAN_DEV_PACKAGE_NAME "${PROJECT_NAME}-dev")
set(CPACK_DEBIAN_DEV_PACKAGE_DEPENDS
"${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_TESTS_PACKAGE_NAME "${PROJECT_NAME}-tests")
set(CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS
"${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_SAMPLES_PACKAGE_NAME "${PROJECT_NAME}-samples")
set(CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS
"${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_DOCS_PACKAGE_NAME "${PROJECT_NAME}-docs")
set(CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS
"${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_PLUGINS_PACKAGE_NAME "${PROJECT_NAME}-plugins")
set(CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS
"${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_CHANGELOG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md")
set ( CPACK_DEBIAN_CHANGELOG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md" )
# RPM package specific variables
if(DEFINED ENV{CPACK_RPM_PACKAGE_RELEASE})
set(CPACK_RPM_PACKAGE_RELEASE $ENV{CPACK_RPM_PACKAGE_RELEASE})
else()
set(CPACK_RPM_PACKAGE_RELEASE "local")
endif()
# RPM package specific variables
if(DEFINED ENV{CPACK_RPM_PACKAGE_RELEASE})
set(CPACK_RPM_PACKAGE_RELEASE $ENV{CPACK_RPM_PACKAGE_RELEASE})
else()
set(CPACK_RPM_PACKAGE_RELEASE "local")
endif()
message("Using CPACK_RPM_PACKAGE_RELEASE ${CPACK_RPM_PACKAGE_RELEASE}")
message("Using CPACK_RPM_PACKAGE_RELEASE ${CPACK_RPM_PACKAGE_RELEASE}")
set(CPACK_RPM_PACKAGE_LICENSE "MIT")
set(CPACK_RPM_PACKAGE_LICENSE "MIT")
# 'dist' breaks manual builds on debian systems due to empty Provides
execute_process(
COMMAND rpm --eval %{?dist}
RESULT_VARIABLE PROC_RESULT
OUTPUT_VARIABLE EVAL_RESULT
OUTPUT_STRIP_TRAILING_WHITESPACE)
message("RESULT_VARIABLE ${PROC_RESULT} OUTPUT_VARIABLE: ${EVAL_RESULT}")
# 'dist' breaks manual builds on debian systems due to empty Provides
execute_process(
COMMAND rpm --eval %{?dist}
RESULT_VARIABLE PROC_RESULT
OUTPUT_VARIABLE EVAL_RESULT
OUTPUT_STRIP_TRAILING_WHITESPACE)
message("RESULT_VARIABLE ${PROC_RESULT} OUTPUT_VARIABLE: ${EVAL_RESULT}")
if(PROC_RESULT EQUAL "0" AND NOT EVAL_RESULT STREQUAL "")
string(APPEND CPACK_RPM_PACKAGE_RELEASE "%{?dist}")
endif()
if(PROC_RESULT EQUAL "0" AND NOT EVAL_RESULT STREQUAL "")
string(APPEND CPACK_RPM_PACKAGE_RELEASE "%{?dist}")
endif()
set(CPACK_RPM_COMPONENT_INSTALL ON)
set(CPACK_RPM_FILE_NAME "RPM-DEFAULT")
set(CPACK_RPM_RUNTIME_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_RPM_RUNTIME_PACKAGE_REQUIRES
"hsa-rocr-dev, rocm-core, systemd-devel, libpciaccess-devel, libxml2-devel")
set(CPACK_RPM_DEV_PACKAGE_NAME "${PROJECT_NAME}-devel")
set(CPACK_RPM_DEV_PACKAGE_REQUIRES "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_DEV_PACKAGE_PROVIDES "${PROJECT_NAME}-dev")
set(CPACK_RPM_DEV_PACKAGE_OBSOLETES "${PROJECT_NAME}-dev")
set(CPACK_RPM_TESTS_PACKAGE_NAME "${PROJECT_NAME}-tests")
set(CPACK_RPM_TESTS_PACKAGE_REQUIRES "${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_DOCS_PACKAGE_NAME "${PROJECT_NAME}-docs")
set(CPACK_RPM_DOCS_PACKAGE_REQUIRES "${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_PLUGINS_PACKAGE_NAME "${PROJECT_NAME}-plugins")
set(CPACK_RPM_PLUGINS_PACKAGE_REQUIRES "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_PACKAGE_AUTOREQ 0)
set(CPACK_RPM_SAMPLES_PACKAGE_NAME "${PROJECT_NAME}-samples")
set(CPACK_RPM_SAMPLES_PACKAGE_REQUIRES
"${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core, hip-runtime-amd")
message("CPACK_RPM_PACKAGE_RELEASE: ${CPACK_RPM_PACKAGE_RELEASE}")
set(CPACK_RPM_COMPONENT_INSTALL ON)
set(CPACK_RPM_FILE_NAME "RPM-DEFAULT")
set(CPACK_RPM_RUNTIME_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_RPM_RUNTIME_PACKAGE_REQUIRES "hsa-rocr-dev, rocm-core, systemd-devel, libpciaccess-devel, libxml2-devel")
set(CPACK_RPM_DEV_PACKAGE_NAME "${PROJECT_NAME}-devel")
set(CPACK_RPM_DEV_PACKAGE_REQUIRES "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_DEV_PACKAGE_PROVIDES "${PROJECT_NAME}-dev")
set(CPACK_RPM_DEV_PACKAGE_OBSOLETES "${PROJECT_NAME}-dev")
set(CPACK_RPM_TESTS_PACKAGE_NAME "${PROJECT_NAME}-tests")
set(CPACK_RPM_TESTS_PACKAGE_REQUIRES
"${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_DOCS_PACKAGE_NAME "${PROJECT_NAME}-docs")
set(CPACK_RPM_DOCS_PACKAGE_REQUIRES
"${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_PLUGINS_PACKAGE_NAME "${PROJECT_NAME}-plugins")
set(CPACK_RPM_PLUGINS_PACKAGE_REQUIRES
"${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_PACKAGE_AUTOREQ 0)
set(CPACK_RPM_SAMPLES_PACKAGE_NAME "${PROJECT_NAME}-samples")
set(CPACK_RPM_SAMPLES_PACKAGE_REQUIRES
"${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core, hip-runtime-amd")
message("CPACK_RPM_PACKAGE_RELEASE: ${CPACK_RPM_PACKAGE_RELEASE}")
#Disable build id for rocprofiler as its creating transaction error
set ( CPACK_RPM_SPEC_MORE_DEFINE "%define _build_id_links none
# Disable build id for rocprofiler as its creating transaction error
set(CPACK_RPM_SPEC_MORE_DEFINE
"%define _build_id_links none
%global __strip ${CPACK_STRIP_EXECUTABLE}
%global __objdump ${CPACK_OBJDUMP_EXECUTABLE}
%global __objcopy ${CPACK_OBJCOPY_EXECUTABLE}
%global __readelf ${CPACK_READELF_EXECUTABLE}")
# RPM package specific variable for ASAN
set ( CPACK_RPM_ASAN_PACKAGE_NAME "${ROCPROFILER_NAME}-asan" )
set ( CPACK_RPM_ASAN_PACKAGE_REQUIRES "hsa-rocr-asan, rocm-core-asan" )
#set ( CPACK_RPM_CHANGELOG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md" )
# RPM package specific variable for ASAN
set(CPACK_RPM_ASAN_PACKAGE_NAME "${ROCPROFILER_NAME}-asan")
set(CPACK_RPM_ASAN_PACKAGE_REQUIRES "hsa-rocr-asan, rocm-core-asan")
# Remove dependency on rocm-core if -DROCM_DEP_ROCMCORE=ON not given to cmake
if(NOT ROCM_DEP_ROCMCORE)
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_RUNTIME_PACKAGE_REQUIRES
${CPACK_RPM_RUNTIME_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DEV_PACKAGE_REQUIRES
${CPACK_RPM_DEV_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_RPM_ASAN_PACKAGE_REQUIRES
${CPACK_RPM_ASAN_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_TESTS_PACKAGE_REQUIRES
${CPACK_RPM_TESTS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_SAMPLES_PACKAGE_REQUIRES
${CPACK_RPM_SAMPLES_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DOCS_PACKAGE_REQUIRES
${CPACK_RPM_DOCS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_PLUGINS_PACKAGE_REQUIRES
${CPACK_RPM_PLUGINS_PACKAGE_REQUIRES})
string(REGEX
REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS
${CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DEV_PACKAGE_DEPENDS
${CPACK_DEBIAN_DEV_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS
${CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS
${CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS})
string(REGEX
REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS
${CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS
${CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS})
string(REGEX
REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS
${CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS})
endif()
# set ( CPACK_RPM_CHANGELOG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md" )
## set components
if(ENABLE_ASAN_PACKAGING)
# ASAN Package requires only asan component with libraries and license file
set(CPACK_COMPONENTS_ALL asan)
else()
set(CPACK_COMPONENTS_ALL runtime dev tests docs plugins samples)
endif()
# Remove dependency on rocm-core if -DROCM_DEP_ROCMCORE=ON not given to cmake
if(NOT ROCM_DEP_ROCMCORE)
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_RUNTIME_PACKAGE_REQUIRES
${CPACK_RPM_RUNTIME_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DEV_PACKAGE_REQUIRES
${CPACK_RPM_DEV_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_RPM_ASAN_PACKAGE_REQUIRES
${CPACK_RPM_ASAN_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_TESTS_PACKAGE_REQUIRES
${CPACK_RPM_TESTS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_SAMPLES_PACKAGE_REQUIRES
${CPACK_RPM_SAMPLES_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DOCS_PACKAGE_REQUIRES
${CPACK_RPM_DOCS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_PLUGINS_PACKAGE_REQUIRES
${CPACK_RPM_PLUGINS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS
${CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DEV_PACKAGE_DEPENDS
${CPACK_DEBIAN_DEV_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS
${CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS
${CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS
${CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS
${CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS
${CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS})
endif()
include(CPack)
# set components
if(ENABLE_ASAN_PACKAGING)
# ASAN Package requires only asan component with libraries and license file
set(CPACK_COMPONENTS_ALL asan)
else()
set(CPACK_COMPONENTS_ALL runtime dev tests docs plugins samples)
endif()
cpack_add_component(
runtime
DISPLAY_NAME "Runtime"
DESCRIPTION "Dynamic libraries for the ROCProfiler")
include(CPack)
cpack_add_component(
dev
DISPLAY_NAME "Development"
DESCRIPTION "Development needed header files for ROCProfiler"
DEPENDS runtime)
cpack_add_component(
runtime
DISPLAY_NAME "Runtime"
DESCRIPTION "Dynamic libraries for the ROCProfiler")
cpack_add_component(
plugins
DISPLAY_NAME "ROCProfile Plugins"
DESCRIPTION "Plugins for handling ROCProfiler data output"
DEPENDS runtime)
cpack_add_component(
dev
DISPLAY_NAME "Development"
DESCRIPTION "Development needed header files for ROCProfiler"
DEPENDS runtime)
cpack_add_component(
tests
DISPLAY_NAME "Tests"
DESCRIPTION "Tests for the ROCProfiler"
DEPENDS dev)
cpack_add_component(
plugins
DISPLAY_NAME "ROCProfile Plugins"
DESCRIPTION "Plugins for handling ROCProfiler data output"
DEPENDS runtime)
cpack_add_component(
samples
DISPLAY_NAME "Samples"
DESCRIPTION "Samples for the ROCProfiler"
DEPENDS dev)
cpack_add_component(
tests
DISPLAY_NAME "Tests"
DESCRIPTION "Tests for the ROCProfiler"
DEPENDS dev)
cpack_add_component(
docs
DISPLAY_NAME "Documentation"
DESCRIPTION "Documentation for the ROCProfiler API"
DEPENDS dev)
cpack_add_component(
samples
DISPLAY_NAME "Samples"
DESCRIPTION "Samples for the ROCProfiler"
DEPENDS dev)
cpack_add_component(
asan
DISPLAY_NAME "ASAN"
DESCRIPTION "ASAN libraries for the ROCPROFILER"
DEPENDS asan)
cpack_add_component(
docs
DISPLAY_NAME "Documentation"
DESCRIPTION "Documentation for the ROCProfiler API"
DEPENDS dev)
cpack_add_component(
asan
DISPLAY_NAME "ASAN"
DESCRIPTION "ASAN libraries for the ROCPROFILER"
DEPENDS asan)
endif()
find_package(Doxygen)
if(DOXYGEN_FOUND)
# # Set input and output files for API Document
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_API.in)
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/Doxyfile_API)
# # Set input and output files for API Document
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_API.in)
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/Doxyfile_API)
# # Request to configure the file
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
# # Request to configure the file
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/doc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/doc/latex pdf
MAIN_DEPENDENCY ${DOXYGEN_OUT}
${DOXYGEN_IN}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler_plugin.h
${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler.h
COMMENT "Generating API documentation")
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/doc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/doc/latex pdf
MAIN_DEPENDENCY ${DOXYGEN_OUT}
${DOXYGEN_IN}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler_plugin.h
${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler.h
COMMENT "Generating API documentation")
add_custom_target(
doc DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/doc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf)
add_custom_target(doc DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/doc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf)
install(
FILES "${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf"
DESTINATION ${CMAKE_INSTALL_DOCDIR}
RENAME "${PROJECT_NAME}v2_api_spec.pdf"
OPTIONAL
COMPONENT docs)
install(
FILES "${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf"
DESTINATION ${CMAKE_INSTALL_DOCDIR}
RENAME "${PROJECT_NAME}v2_api_spec.pdf"
OPTIONAL
COMPONENT docs)
install(
DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/doc/html/"
DESTINATION ${CMAKE_INSTALL_DOCDIR}/html
OPTIONAL
COMPONENT docs)
install(
DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/doc/html/"
DESTINATION ${CMAKE_INSTALL_DOCDIR}/html
OPTIONAL
COMPONENT docs)
# # Set input and output files for Tools Document
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_Tool.in)
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/Doxyfile_Tool)
# # Set input and output files for Tools Document
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_Tool.in)
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/Doxyfile_Tool)
# # Request to configure the file
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
# # Request to configure the file
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex pdf
MAIN_DEPENDENCY ${DOXYGEN_OUT}
${DOXYGEN_IN}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/doc/rocprofv2_tool.md
COMMENT "Generating Tools documentation")
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex pdf
MAIN_DEPENDENCY ${DOXYGEN_OUT}
${DOXYGEN_IN}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/doc/rocprofv2_tool.md
COMMENT "Generating Tools documentation")
add_custom_target(
doc_tool DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf)
add_custom_target(
doc_tool DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf)
install(
FILES "${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf"
DESTINATION ${CMAKE_INSTALL_DOCDIR}
RENAME "${PROJECT_NAME}v2_tool.pdf"
OPTIONAL
COMPONENT docs)
install(
FILES "${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf"
DESTINATION ${CMAKE_INSTALL_DOCDIR}
RENAME "${PROJECT_NAME}v2_tool.pdf"
OPTIONAL
COMPONENT docs)
install(
DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/"
DESTINATION ${CMAKE_INSTALL_DOCDIR}/html
OPTIONAL
COMPONENT docs)
install(
DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/"
DESTINATION ${CMAKE_INSTALL_DOCDIR}/html
OPTIONAL
COMPONENT docs)
# # Set input and output files for changelog document
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_ChangeLog.in)
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/Doxyfile_ChangeLog)
# # Set input and output files for changelog document
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_ChangeLog.in)
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/Doxyfile_ChangeLog)
# # Request to configure the file
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
# # Request to configure the file
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex pdf
MAIN_DEPENDENCY ${DOXYGEN_OUT}
${DOXYGEN_IN}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md
COMMENT "Generating changelog documentation")
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex pdf
MAIN_DEPENDENCY ${DOXYGEN_OUT}
${DOXYGEN_IN}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md
COMMENT "Generating changelog documentation")
add_custom_target(
doc_changelog DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf)
add_custom_target(doc_changelog
DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf)
install(
FILES "${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf"
DESTINATION ${CMAKE_INSTALL_DOCDIR}
RENAME "${PROJECT_NAME}_ChangeLog.pdf"
OPTIONAL
COMPONENT docs)
install(
FILES "${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf"
DESTINATION ${CMAKE_INSTALL_DOCDIR}
RENAME "${PROJECT_NAME}_ChangeLog.pdf"
OPTIONAL
COMPONENT docs)
add_dependencies(doc doc_changelog)
add_dependencies(doc doc_changelog)
endif()
+7 -14
@@ -5,23 +5,16 @@
# - LIBDW_INCLUDE_DIRS - the libelf include directory
# - LIBDW_LIBRARIES - Link these to use libelf
# - LIBDW_DEFINITIONS - Compiler switches required for using libelf
find_path(FIND_LIBDW_INCLUDES
NAMES
elfutils/libdw.h
PATHS
/usr/include
/usr/local/include)
find_path(
FIND_LIBDW_INCLUDES
NAMES elfutils/libdw.h
PATHS /usr/include /usr/local/include)
find_library(FIND_LIBDW_LIBRARIES
NAMES
dw
PATH
/usr/lib
/usr/local/lib)
find_library(FIND_LIBDW_LIBRARIES NAMES dw PATH /usr/lib /usr/local/lib)
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(LibDw DEFAULT_MSG
FIND_LIBDW_INCLUDES FIND_LIBDW_LIBRARIES)
find_package_handle_standard_args(LibDw DEFAULT_MSG FIND_LIBDW_INCLUDES
FIND_LIBDW_LIBRARIES)
mark_as_advanced(FIND_LIBDW_INCLUDES FIND_LIBDW_LIBRARIES)
set(LIBDW_INCLUDES ${FIND_LIBDW_INCLUDES})
+7 -16
@@ -5,25 +5,16 @@
# - LIBELF_INCLUDE_DIRS - the libelf include directory
# - LIBELF_LIBRARIES - Link these to use libelf
# - LIBELF_DEFINITIONS - Compiler switches required for using libelf
find_path(FIND_LIBELF_INCLUDES
NAMES
libelf.h
PATHS
/usr/include
/usr/include/libelf
/usr/local/include
/usr/local/include/libelf)
find_path(
FIND_LIBELF_INCLUDES
NAMES libelf.h
PATHS /usr/include /usr/include/libelf /usr/local/include /usr/local/include/libelf)
find_library(FIND_LIBELF_LIBRARIES
NAMES
elf
PATH
/usr/lib
/usr/local/lib)
find_library(FIND_LIBELF_LIBRARIES NAMES elf PATH /usr/lib /usr/local/lib)
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(LibElf DEFAULT_MSG
FIND_LIBELF_INCLUDES FIND_LIBELF_LIBRARIES)
find_package_handle_standard_args(LibElf DEFAULT_MSG FIND_LIBELF_INCLUDES
FIND_LIBELF_LIBRARIES)
mark_as_advanced(FIND_LIBELF_INCLUDES FIND_LIBELF_LIBRARIES)
set(LIBELF_INCLUDES ${FIND_LIBELF_INCLUDES})
+51 -36
@@ -20,60 +20,75 @@
# THE SOFTWARE.
################################################################################
## Linux Compiler options
# Linux Compiler options
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fms-extensions")
add_definitions ( -DNEW_TRACE_API=1 )
add_definitions(-DNEW_TRACE_API=1)
## CLANG options
# CLANG options
if("$ENV{CXX}" STREQUAL "/usr/bin/clang++")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -ferror-limit=1000000")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -ferror-limit=1000000")
endif()
## Enable debug trace
if ( DEFINED ENV{CMAKE_DEBUG_TRACE} )
add_definitions ( -DDEBUG_TRACE=1 )
# Enable debug trace
if(DEFINED ENV{CMAKE_DEBUG_TRACE})
add_definitions(-DDEBUG_TRACE=1)
endif()
## Enable AQL-profile new API
if ( NOT DEFINED ENV{CMAKE_CURR_API} )
add_definitions ( -DAQLPROF_NEW_API=1 )
# Enable AQL-profile new API
if(NOT DEFINED ENV{CMAKE_CURR_API})
add_definitions(-DAQLPROF_NEW_API=1)
endif()
## Enable direct loading of AQL-profile HSA extension
if ( DEFINED ENV{CMAKE_LD_AQLPROFILE} )
add_definitions ( -DROCP_LD_AQLPROFILE=1 )
# Enable direct loading of AQL-profile HSA extension
if(DEFINED ENV{CMAKE_LD_AQLPROFILE})
add_definitions(-DROCP_LD_AQLPROFILE=1)
endif()
## Find hsa-runtime
find_package(hsa-runtime64 CONFIG REQUIRED HINTS ${CMAKE_PREFIX_PATH} PATHS /opt/rocm PATH_SUFFIXES lib/cmake/hsa-runtime64 )
# Find hsa-runtime
find_package(
hsa-runtime64 CONFIG REQUIRED
HINTS ${CMAKE_PREFIX_PATH}
PATHS /opt/rocm
PATH_SUFFIXES lib/cmake/hsa-runtime64)
# find KFD thunk
find_package(hsakmt CONFIG REQUIRED HINTS ${CMAKE_PREFIX_PATH} PATHS /opt/rocm PATH_SUFFIXES lib/cmake/hsakmt )
find_package(
hsakmt CONFIG REQUIRED
HINTS ${CMAKE_PREFIX_PATH}
PATHS /opt/rocm
PATH_SUFFIXES lib/cmake/hsakmt)
## Find ROCm
## TODO: Need a better method to find the ROCm path
find_path ( HSA_KMT_INC_PATH "hsakmt/hsakmt.h" HINTS ${CMAKE_PREFIX_PATH} PATHS /opt/rocm PATH_SUFFIXES include )
if ( "${HSA_KMT_INC_PATH}" STREQUAL "" )
get_target_property(HSA_KMT_INC_PATH hsakmt::hsakmt INTERFACE_INCLUDE_DIRECTORIES)
# Find ROCm. TODO: need a better method to find the ROCm path
find_path(
HSA_KMT_INC_PATH "hsakmt/hsakmt.h"
HINTS ${CMAKE_PREFIX_PATH}
PATHS /opt/rocm
PATH_SUFFIXES include)
if("${HSA_KMT_INC_PATH}" STREQUAL "")
get_target_property(HSA_KMT_INC_PATH hsakmt::hsakmt INTERFACE_INCLUDE_DIRECTORIES)
endif()
## Include path: /opt/rocm-ver/include. Go up one level to get ROCm path
get_filename_component ( ROCM_ROOT_DIR "${HSA_KMT_INC_PATH}" DIRECTORY )
# Include path: /opt/rocm-ver/include. Go up one level to get ROCm path
get_filename_component(ROCM_ROOT_DIR "${HSA_KMT_INC_PATH}" DIRECTORY)
## Basic Tool Chain Information
message ( "----------Build-Type: ${CMAKE_BUILD_TYPE}" )
message ( "------------Compiler: ${CMAKE_CXX_COMPILER}" )
message ( "----Compiler-Version: ${CMAKE_CXX_COMPILER_VERSION}" )
message ( "-------ROCM_ROOT_DIR: ${ROCM_ROOT_DIR}" )
message ( "-----CMAKE_CXX_FLAGS: ${CMAKE_CXX_FLAGS}" )
message ( "---CMAKE_PREFIX_PATH: ${CMAKE_PREFIX_PATH}" )
message ( "---------GPU_TARGETS: ${GPU_TARGETS}" )
# Basic Tool Chain Information
message("----------Build-Type: ${CMAKE_BUILD_TYPE}")
message("------------Compiler: ${CMAKE_CXX_COMPILER}")
message("----Compiler-Version: ${CMAKE_CXX_COMPILER_VERSION}")
message("-------ROCM_ROOT_DIR: ${ROCM_ROOT_DIR}")
message("-----CMAKE_CXX_FLAGS: ${CMAKE_CXX_FLAGS}")
message("---CMAKE_PREFIX_PATH: ${CMAKE_PREFIX_PATH}")
message("---------GPU_TARGETS: ${GPU_TARGETS}")
if ( "${ROCM_ROOT_DIR}" STREQUAL "" )
message ( FATAL_ERROR "ROCM_ROOT_DIR is not found." )
endif ()
if("${ROCM_ROOT_DIR}" STREQUAL "")
message(FATAL_ERROR "ROCM_ROOT_DIR is not found.")
endif()
find_library(FIND_AQL_PROFILE_LIB "libhsa-amd-aqlprofile64.so" HINTS ${CMAKE_PREFIX_PATH} PATHS ${ROCM_ROOT_DIR} PATH_SUFFIXES lib REQUIRED)
find_library(
FIND_AQL_PROFILE_LIB "libhsa-amd-aqlprofile64.so"
HINTS ${CMAKE_PREFIX_PATH}
PATHS ${ROCM_ROOT_DIR}
PATH_SUFFIXES lib REQUIRED)
if(NOT FIND_AQL_PROFILE_LIB)
message("AQL_PROFILE not installed. Please install AQL_PROFILE")
message("AQL_PROFILE not installed. Please install AQL_PROFILE")
endif()
+82 -64
@@ -20,77 +20,95 @@
# THE SOFTWARE.
################################################################################
## Parses the VERSION_STRING variable and places
## the first, second and third number values in
## the major, minor and patch variables.
function( parse_version VERSION_STRING )
# Parses the VERSION_STRING variable and places the first, second and third number values
# in the major, minor and patch variables.
function(parse_version VERSION_STRING)
string ( FIND ${VERSION_STRING} "-" STRING_INDEX )
string(FIND ${VERSION_STRING} "-" STRING_INDEX)
if ( ${STRING_INDEX} GREATER -1 )
math ( EXPR STRING_INDEX "${STRING_INDEX} + 1" )
string ( SUBSTRING ${VERSION_STRING} ${STRING_INDEX} -1 VERSION_BUILD )
endif ()
if(${STRING_INDEX} GREATER -1)
math(EXPR STRING_INDEX "${STRING_INDEX} + 1")
string(SUBSTRING ${VERSION_STRING} ${STRING_INDEX} -1 VERSION_BUILD)
endif()
string ( REGEX MATCHALL "[0123456789]+" VERSIONS ${VERSION_STRING} )
list ( LENGTH VERSIONS VERSION_COUNT )
string(REGEX MATCHALL "[0123456789]+" VERSIONS ${VERSION_STRING})
list(LENGTH VERSIONS VERSION_COUNT)
if ( ${VERSION_COUNT} GREATER 0)
list ( GET VERSIONS 0 MAJOR )
set ( VERSION_MAJOR ${MAJOR} PARENT_SCOPE )
set ( TEMP_VERSION_STRING "${MAJOR}" )
endif ()
if(${VERSION_COUNT} GREATER 0)
list(GET VERSIONS 0 MAJOR)
set(VERSION_MAJOR
${MAJOR}
PARENT_SCOPE)
set(TEMP_VERSION_STRING "${MAJOR}")
endif()
if ( ${VERSION_COUNT} GREATER 1 )
list ( GET VERSIONS 1 MINOR )
set ( VERSION_MINOR ${MINOR} PARENT_SCOPE )
set ( TEMP_VERSION_STRING "${TEMP_VERSION_STRING}.${MINOR}" )
endif ()
if(${VERSION_COUNT} GREATER 1)
list(GET VERSIONS 1 MINOR)
set(VERSION_MINOR
${MINOR}
PARENT_SCOPE)
set(TEMP_VERSION_STRING "${TEMP_VERSION_STRING}.${MINOR}")
endif()
if ( ${VERSION_COUNT} GREATER 2 )
list ( GET VERSIONS 2 PATCH )
set ( VERSION_PATCH ${PATCH} PARENT_SCOPE )
set ( TEMP_VERSION_STRING "${TEMP_VERSION_STRING}.${PATCH}" )
endif ()
if(${VERSION_COUNT} GREATER 2)
list(GET VERSIONS 2 PATCH)
set(VERSION_PATCH
${PATCH}
PARENT_SCOPE)
set(TEMP_VERSION_STRING "${TEMP_VERSION_STRING}.${PATCH}")
endif()
if ( DEFINED VERSION_BUILD )
set ( VERSION_BUILD "${VERSION_BUILD}" PARENT_SCOPE )
endif ()
if(DEFINED VERSION_BUILD)
set(VERSION_BUILD
"${VERSION_BUILD}"
PARENT_SCOPE)
endif()
set ( VERSION_STRING "${TEMP_VERSION_STRING}" PARENT_SCOPE )
endfunction ()
## Gets the current version of the repository
## using versioning tags and git describe.
## Passes back a packaging version string
## and a library version string.
function ( get_version DEFAULT_VERSION_STRING )
parse_version ( ${DEFAULT_VERSION_STRING} )
find_program ( GIT NAMES git )
if ( GIT )
execute_process ( COMMAND "git describe --dirty --long --match [0-9]* 2>/dev/null"
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
OUTPUT_VARIABLE GIT_TAG_STRING
OUTPUT_STRIP_TRAILING_WHITESPACE
RESULT_VARIABLE RESULT )
if ( ${RESULT} EQUAL 0 )
parse_version ( ${GIT_TAG_STRING} )
endif ()
endif ()
set( VERSION_STRING "${VERSION_STRING}" PARENT_SCOPE )
set( VERSION_MAJOR "${VERSION_MAJOR}" PARENT_SCOPE )
set( VERSION_MINOR "${VERSION_MINOR}" PARENT_SCOPE )
set( VERSION_PATCH "${VERSION_PATCH}" PARENT_SCOPE )
set( VERSION_BUILD "${VERSION_BUILD}" PARENT_SCOPE )
set(VERSION_STRING
"${TEMP_VERSION_STRING}"
PARENT_SCOPE)
endfunction()
# Gets the current version of the repository using versioning tags and git describe.
# Passes back a packaging version string and a library version string.
function(get_version DEFAULT_VERSION_STRING)
parse_version(${DEFAULT_VERSION_STRING})
find_program(GIT NAMES git)
if(GIT)
execute_process(
COMMAND ${GIT} describe --dirty --long --match "[0-9]*"
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
OUTPUT_VARIABLE GIT_TAG_STRING
OUTPUT_STRIP_TRAILING_WHITESPACE
ERROR_QUIET
RESULT_VARIABLE RESULT)
if(RESULT EQUAL 0)
parse_version(${GIT_TAG_STRING})
endif()
endif()
set(VERSION_STRING
"${VERSION_STRING}"
PARENT_SCOPE)
set(VERSION_MAJOR
"${VERSION_MAJOR}"
PARENT_SCOPE)
set(VERSION_MINOR
"${VERSION_MINOR}"
PARENT_SCOPE)
set(VERSION_PATCH
"${VERSION_PATCH}"
PARENT_SCOPE)
set(VERSION_BUILD
"${VERSION_BUILD}"
PARENT_SCOPE)
endfunction()
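For readers tracing what parse_version() extracts, its logic can be mirrored in a few lines of C. This is a hypothetical standalone port (parsed_version_t and the field layout are illustrative, not part of the project): every run of digits becomes the next of major/minor/patch, and everything after the first '-' is kept as the build suffix, as in the CMake function. Like the CMake code, it scans the whole string, so digits in the build suffix only count if fewer than three numbers appeared before it. One deliberate difference: fields that are not found are left at 0 here, whereas the CMake function leaves the corresponding variable unset.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C mirror of the CMake parse_version() function above. */
typedef struct {
  int count;           /* numeric fields found, capped at 3 */
  unsigned long major; /* first digit run  (VERSION_MAJOR), 0 if absent */
  unsigned long minor; /* second digit run (VERSION_MINOR), 0 if absent */
  unsigned long patch; /* third digit run  (VERSION_PATCH), 0 if absent */
  char build[64];      /* text after the first '-' (VERSION_BUILD), "" if none */
  char version[64];    /* "major[.minor[.patch]]" (VERSION_STRING) */
} parsed_version_t;

static void parse_version(const char* s, parsed_version_t* out) {
  unsigned long fields[3] = {0, 0, 0};
  const char* dash = strchr(s, '-');
  const char* p = s;

  memset(out, 0, sizeof(*out));

  /* string(FIND ... "-") + string(SUBSTRING ...): keep everything after the first '-' */
  if (dash != NULL) {
    snprintf(out->build, sizeof(out->build), "%s", dash + 1);
  }

  /* string(REGEX MATCHALL "[0-9]+" ...): collect the first three digit runs */
  while (*p != '\0' && out->count < 3) {
    if (*p >= '0' && *p <= '9') {
      char* end = NULL;
      fields[out->count++] = strtoul(p, &end, 10);
      p = end; /* continue after the digit run */
    } else {
      ++p;
    }
  }

  out->major = fields[0];
  out->minor = fields[1];
  out->patch = fields[2];

  /* Rebuild VERSION_STRING from however many fields were found */
  switch (out->count) {
    case 0:
      out->version[0] = '\0';
      break;
    case 1:
      snprintf(out->version, sizeof(out->version), "%lu", fields[0]);
      break;
    case 2:
      snprintf(out->version, sizeof(out->version), "%lu.%lu", fields[0], fields[1]);
      break;
    default:
      snprintf(out->version, sizeof(out->version), "%lu.%lu.%lu", fields[0], fields[1],
               fields[2]);
      break;
  }
}
```

For example, the `git describe --dirty --long` style string `1.2.3-4-gdeadbee-dirty` yields version `1.2.3` and build `4-gdeadbee-dirty`.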
+165 -165
@@ -164,12 +164,12 @@ typedef struct {
// Profiling feature type
typedef struct {
rocprofiler_feature_kind_t kind; // feature kind
rocprofiler_feature_kind_t kind; // feature kind
union {
const char* name; // feature name
const char* name; // feature name
struct {
const char* block; // counter block name
uint32_t event; // counter event id
const char* block; // counter block name
uint32_t event; // counter event id
} counter;
};
const rocprofiler_parameter_t* parameters; // feature parameters array
@@ -216,23 +216,25 @@ typedef struct {
} rocprofiler_properties_t;
// Create new profiling context
hsa_status_t rocprofiler_open(hsa_agent_t agent, // GPU handle
rocprofiler_feature_t* features, // [in] profiling features array
uint32_t feature_count, // profiling info count
rocprofiler_t** context, // [out] context object
uint32_t mode, // profiling mode mask
hsa_status_t rocprofiler_open(hsa_agent_t agent, // GPU handle
rocprofiler_feature_t* features, // [in] profiling features array
uint32_t feature_count, // profiling info count
rocprofiler_t** context, // [out] context object
uint32_t mode, // profiling mode mask
rocprofiler_properties_t* properties); // profiling properties
// Add feature to a features set
hsa_status_t rocprofiler_add_feature(const rocprofiler_feature_t* feature, // [in]
rocprofiler_feature_set_t* features_set); // [in/out] profiling features set
hsa_status_t rocprofiler_add_feature(
const rocprofiler_feature_t* feature, // [in]
rocprofiler_feature_set_t* features_set); // [in/out] profiling features set
// Create new profiling context
hsa_status_t rocprofiler_features_set_open(hsa_agent_t agent, // GPU handle
rocprofiler_feature_set_t* features_set, // [in] profiling features set
rocprofiler_t** context, // [out] context object
uint32_t mode, // profiling mode mask
rocprofiler_properties_t* properties); // profiling properties
hsa_status_t rocprofiler_features_set_open(
hsa_agent_t agent, // GPU handle
rocprofiler_feature_set_t* features_set, // [in] profiling features set
rocprofiler_t** context, // [out] context object
uint32_t mode, // profiling mode mask
rocprofiler_properties_t* properties); // profiling properties
// Delete profiling info
hsa_status_t rocprofiler_close(rocprofiler_t* context); // [in] profiling context
@@ -242,24 +244,24 @@ hsa_status_t rocprofiler_reset(rocprofiler_t* context, // [in] profiling contex
uint32_t group_index); // group index
// Return context agent
hsa_status_t rocprofiler_get_agent(rocprofiler_t* context, // [in] profiling context
hsa_agent_t* agent); // [out] GPU handle
hsa_status_t rocprofiler_get_agent(rocprofiler_t* context, // [in] profiling context
hsa_agent_t* agent); // [out] GPU handle
// Supported time value ID
typedef enum {
ROCPROFILER_TIME_ID_CLOCK_REALTIME = 0, // Linux realtime clock time
ROCPROFILER_TIME_ID_CLOCK_REALTIME_COARSE = 1, // Linux realtime-coarse clock time
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC = 2, // Linux monotonic clock time
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_COARSE = 3, // Linux monotonic-coarse clock time
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_RAW = 4, // Linux monotonic-raw clock time
ROCPROFILER_TIME_ID_CLOCK_REALTIME = 0, // Linux realtime clock time
ROCPROFILER_TIME_ID_CLOCK_REALTIME_COARSE = 1, // Linux realtime-coarse clock time
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC = 2, // Linux monotonic clock time
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_COARSE = 3, // Linux monotonic-coarse clock time
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_RAW = 4, // Linux monotonic-raw clock time
} rocprofiler_time_id_t;
// Return time value for a given time ID and profiling timestamp
hsa_status_t rocprofiler_get_time(
rocprofiler_time_id_t time_id, // identifier of the particular time to convert the timesatmp
uint64_t timestamp, // profiling timestamp
uint64_t* value_ns, // [out] returned time 'ns' value, ignored if NULL
uint64_t* error_ns); // [out] returned time error 'ns' value, ignored if NULL
rocprofiler_time_id_t time_id, // identifier of the clock to convert the timestamp to
uint64_t timestamp, // profiling timestamp
uint64_t* value_ns, // [out] returned time 'ns' value, ignored if NULL
uint64_t* error_ns); // [out] returned time error 'ns' value, ignored if NULL
////////////////////////////////////////////////////////////////////////////////
// Queue callbacks
@@ -269,26 +271,26 @@ hsa_status_t rocprofiler_get_time(
// Dispatch record
typedef struct {
uint64_t dispatch; // dispatch timestamp, ns
uint64_t begin; // kernel begin timestamp, ns
uint64_t end; // kernel end timestamp, ns
uint64_t complete; // completion signal timestamp, ns
uint64_t dispatch; // dispatch timestamp, ns
uint64_t begin; // kernel begin timestamp, ns
uint64_t end; // kernel end timestamp, ns
uint64_t complete; // completion signal timestamp, ns
} rocprofiler_dispatch_record_t;
// Profiling callback data
typedef struct {
hsa_agent_t agent; // GPU agent handle
uint32_t agent_index; // GPU index (GPU Driver Node ID as reported in the sysfs topology)
const hsa_queue_t* queue; // HSA queue
uint64_t queue_index; // Index in the queue
uint32_t queue_id; // Queue id
hsa_signal_t completion_signal; // Completion signal
const hsa_kernel_dispatch_packet_t* packet; // HSA dispatch packet
const char* kernel_name; // Kernel name
uint64_t kernel_object; // Kernel object address
const amd_kernel_code_t* kernel_code; // Kernel code pointer
uint32_t thread_id; // Thread id
const rocprofiler_dispatch_record_t* record; // Dispatch record
hsa_agent_t agent; // GPU agent handle
uint32_t agent_index; // GPU index (GPU Driver Node ID as reported in the sysfs topology)
const hsa_queue_t* queue; // HSA queue
uint64_t queue_index; // Index in the queue
uint32_t queue_id; // Queue id
hsa_signal_t completion_signal; // Completion signal
const hsa_kernel_dispatch_packet_t* packet; // HSA dispatch packet
const char* kernel_name; // Kernel name
uint64_t kernel_object; // Kernel object address
const amd_kernel_code_t* kernel_code; // Kernel code pointer
uint32_t thread_id; // Thread id
const rocprofiler_dispatch_record_t* record; // Dispatch record
} rocprofiler_callback_data_t;
// Profiling callback type
@@ -299,15 +301,14 @@ typedef hsa_status_t (*rocprofiler_callback_t)(
// Queue callbacks
typedef struct {
rocprofiler_callback_t dispatch; // dispatch callback
hsa_status_t (*create)(hsa_queue_t* queue, void* data); // create callback
hsa_status_t (*destroy)(hsa_queue_t* queue, void* data); // destroy callback
rocprofiler_callback_t dispatch; // dispatch callback
hsa_status_t (*create)(hsa_queue_t* queue, void* data); // create callback
hsa_status_t (*destroy)(hsa_queue_t* queue, void* data); // destroy callback
} rocprofiler_queue_callbacks_t;
// Set queue callbacks
hsa_status_t rocprofiler_set_queue_callbacks(
rocprofiler_queue_callbacks_t callbacks, // callbacks
void* data); // [in/out] passed callbacks data
hsa_status_t rocprofiler_set_queue_callbacks(rocprofiler_queue_callbacks_t callbacks, // callbacks
void* data); // [in/out] passed callbacks data
// Remove queue callbacks
hsa_status_t rocprofiler_remove_queue_callbacks();
@@ -323,20 +324,20 @@ hsa_status_t rocprofiler_stop_queue_callbacks();
// context.invocations' to collect all profiling data
// Start profiling
hsa_status_t rocprofiler_start(rocprofiler_t* context, // [in/out] profiling context
uint32_t group_index); // group index
hsa_status_t rocprofiler_start(rocprofiler_t* context, // [in/out] profiling context
uint32_t group_index); // group index
// Stop profiling
hsa_status_t rocprofiler_stop(rocprofiler_t* context, // [in/out] profiling context
uint32_t group_index); // group index
hsa_status_t rocprofiler_stop(rocprofiler_t* context, // [in/out] profiling context
uint32_t group_index); // group index
// Read profiling
hsa_status_t rocprofiler_read(rocprofiler_t* context, // [in/out] profiling context
uint32_t group_index); // group index
hsa_status_t rocprofiler_read(rocprofiler_t* context, // [in/out] profiling context
uint32_t group_index); // group index
// Read profiling data
hsa_status_t rocprofiler_get_data(rocprofiler_t* context, // [in/out] profiling context
uint32_t group_index); // group index
hsa_status_t rocprofiler_get_data(rocprofiler_t* context, // [in/out] profiling context
uint32_t group_index); // group index
// Get profiling groups count
hsa_status_t rocprofiler_group_count(const rocprofiler_t* context, // [in] profiling context
@@ -379,75 +380,76 @@ hsa_status_t rocprofiler_iterate_trace_data(
// Profiling info kind
typedef enum {
ROCPROFILER_INFO_KIND_METRIC = 0, // metric info
ROCPROFILER_INFO_KIND_METRIC_COUNT = 1, // metric features count, int32
ROCPROFILER_INFO_KIND_TRACE = 2, // trace info
ROCPROFILER_INFO_KIND_TRACE_COUNT = 3, // trace features count, int32
ROCPROFILER_INFO_KIND_TRACE_PARAMETER = 4, // trace parameter info
ROCPROFILER_INFO_KIND_TRACE_PARAMETER_COUNT = 5 // trace parameter count, int32
ROCPROFILER_INFO_KIND_METRIC = 0, // metric info
ROCPROFILER_INFO_KIND_METRIC_COUNT = 1, // metric features count, int32
ROCPROFILER_INFO_KIND_TRACE = 2, // trace info
ROCPROFILER_INFO_KIND_TRACE_COUNT = 3, // trace features count, int32
ROCPROFILER_INFO_KIND_TRACE_PARAMETER = 4, // trace parameter info
ROCPROFILER_INFO_KIND_TRACE_PARAMETER_COUNT = 5 // trace parameter count, int32
} rocprofiler_info_kind_t;
// Profiling info query
typedef union {
rocprofiler_info_kind_t info_kind; // queried profiling info kind
rocprofiler_info_kind_t info_kind; // queried profiling info kind
struct {
const char* trace_name; // queried info trace name
const char* trace_name; // queried info trace name
} trace_parameter;
} rocprofiler_info_query_t;
// Profiling info data
typedef struct {
uint32_t agent_index; // GPU HSA agent index (GPU Driver Node ID as reported in the sysfs topology)
rocprofiler_info_kind_t kind; // info data kind
uint32_t
agent_index; // GPU HSA agent index (GPU Driver Node ID as reported in the sysfs topology)
rocprofiler_info_kind_t kind; // info data kind
union {
struct {
const char* name; // metric name
uint32_t instances; // instances number
const char* expr; // metric expression, NULL for basic counters
const char* description; // metric description
const char* block_name; // block name
uint32_t block_counters; // number of block counters
const char* name; // metric name
uint32_t instances; // instances number
const char* expr; // metric expression, NULL for basic counters
const char* description; // metric description
const char* block_name; // block name
uint32_t block_counters; // number of block counters
} metric;
struct {
const char* name; // trace name
const char* description; // trace description
uint32_t parameter_count; // supported by the trace number parameters
const char* name; // trace name
const char* description; // trace description
uint32_t parameter_count; // number of parameters supported by the trace
} trace;
struct {
uint32_t code; // parameter code
const char* trace_name; // trace name
const char* parameter_name; // parameter name
const char* description; // trace parameter description
uint32_t code; // parameter code
const char* trace_name; // trace name
const char* parameter_name; // parameter name
const char* description; // trace parameter description
} trace_parameter;
};
} rocprofiler_info_data_t;
// Return the info for a given info kind
hsa_status_t rocprofiler_get_info(
const hsa_agent_t* agent, // [in] GFXIP handle
rocprofiler_info_kind_t kind, // kind of iterated info
void *data); // [in/out] returned data
hsa_status_t rocprofiler_get_info(const hsa_agent_t* agent, // [in] GFXIP handle
rocprofiler_info_kind_t kind, // kind of iterated info
void* data); // [in/out] returned data
// Iterate over the info for a given info kind, and invoke an application-defined callback on every iteration
hsa_status_t rocprofiler_iterate_info(
const hsa_agent_t* agent, // [in] GFXIP handle
rocprofiler_info_kind_t kind, // kind of iterated info
hsa_status_t (*callback)(const rocprofiler_info_data_t info, void *data), // callback
void *data); // [in/out] data passed to callback
// Iterate over the info for a given info kind, and invoke an application-defined callback on every
// iteration
hsa_status_t rocprofiler_iterate_info(const hsa_agent_t* agent, // [in] GFXIP handle
rocprofiler_info_kind_t kind, // kind of iterated info
hsa_status_t (*callback)(const rocprofiler_info_data_t info,
void* data), // callback
void* data); // [in/out] data passed to callback
// Iterate over the info for a given info query, and invoke an application-defined callback on every iteration
hsa_status_t rocprofiler_query_info(
const hsa_agent_t *agent, // [in] GFXIP handle
rocprofiler_info_query_t query, // iterated info query
hsa_status_t (*callback)(const rocprofiler_info_data_t info, void *data), // callback
void *data); // [in/out] data passed to callback
// Iterate over the info for a given info query, and invoke an application-defined callback on every
// iteration
hsa_status_t rocprofiler_query_info(const hsa_agent_t* agent, // [in] GFXIP handle
rocprofiler_info_query_t query, // iterated info query
hsa_status_t (*callback)(const rocprofiler_info_data_t info,
void* data), // callback
void* data); // [in/out] data passed to callback
// Create a profiled queue. All dispatches on this queue will be profiled
hsa_status_t rocprofiler_queue_create_profiled(
hsa_agent_t agent_handle,uint32_t size, hsa_queue_type32_t type,
void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data),
void* data, uint32_t private_segment_size, uint32_t group_segment_size,
hsa_queue_t** queue);
hsa_agent_t agent_handle, uint32_t size, hsa_queue_type32_t type,
void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data), void* data,
uint32_t private_segment_size, uint32_t group_segment_size, hsa_queue_t** queue);
////////////////////////////////////////////////////////////////////////////////
// Profiling pool
@@ -461,8 +463,8 @@ typedef void rocprofiler_pool_t;
// Profiling pool entry
typedef struct {
rocprofiler_t* context; // context object
void* payload; // payload data object
rocprofiler_t* context; // context object
void* payload; // payload data object
} rocprofiler_pool_entry_t;
// Profiling handler, called on profiling completion
@@ -478,120 +480,118 @@ typedef struct {
// Open profiling pool
hsa_status_t rocprofiler_pool_open(
hsa_agent_t agent, // GPU handle
rocprofiler_feature_t* features, // [in] profiling features array
uint32_t feature_count, // profiling info count
rocprofiler_pool_t** pool, // [out] context object
uint32_t mode, // profiling mode mask
rocprofiler_pool_properties_t*); // pool properties
hsa_agent_t agent, // GPU handle
rocprofiler_feature_t* features, // [in] profiling features array
uint32_t feature_count, // profiling info count
rocprofiler_pool_t** pool, // [out] context object
uint32_t mode, // profiling mode mask
rocprofiler_pool_properties_t*); // pool properties
// Close profiling pool
hsa_status_t rocprofiler_pool_close(
rocprofiler_pool_t* pool); // profiling pool handle
hsa_status_t rocprofiler_pool_close(rocprofiler_pool_t* pool); // profiling pool handle
// Fetch profiling pool entry
hsa_status_t rocprofiler_pool_fetch(
rocprofiler_pool_t* pool, // profiling pool handle
rocprofiler_pool_entry_t* entry); // [out] empty profiling pool entry
rocprofiler_pool_t* pool, // profiling pool handle
rocprofiler_pool_entry_t* entry); // [out] empty profiling pool entry
// Release profiling pool entry
hsa_status_t rocprofiler_pool_release(
rocprofiler_pool_entry_t* entry); // released profiling pool entry
rocprofiler_pool_entry_t* entry); // released profiling pool entry
// Iterate fetched profiling pool entries
hsa_status_t rocprofiler_pool_iterate(
rocprofiler_pool_t* pool, // profiling pool handle
hsa_status_t (*callback)(rocprofiler_pool_entry_t* entry, void* data), // callback
void *data); // [in/out] data passed to callback
hsa_status_t rocprofiler_pool_iterate(rocprofiler_pool_t* pool, // profiling pool handle
hsa_status_t (*callback)(rocprofiler_pool_entry_t* entry,
void* data), // callback
void* data); // [in/out] data passed to callback
// Flush completed entries in profiling pool
hsa_status_t rocprofiler_pool_flush(
rocprofiler_pool_t* pool); // profiling pool handle
hsa_status_t rocprofiler_pool_flush(rocprofiler_pool_t* pool); // profiling pool handle
////////////////////////////////////////////////////////////////////////////////
// HSA intercepting API
// HSA callbacks ID enumeration
typedef enum {
ROCPROFILER_HSA_CB_ID_ALLOCATE = 0, // Memory allocate callback
ROCPROFILER_HSA_CB_ID_DEVICE = 1, // Device assign callback
ROCPROFILER_HSA_CB_ID_MEMCOPY = 2, // Memcopy callback
ROCPROFILER_HSA_CB_ID_SUBMIT = 3, // Packet submit callback
ROCPROFILER_HSA_CB_ID_KSYMBOL = 4, // Loading/unloading of kernel symbol
ROCPROFILER_HSA_CB_ID_CODEOBJ = 5 // Loading/unloading of kernel symbol
ROCPROFILER_HSA_CB_ID_ALLOCATE = 0, // Memory allocate callback
ROCPROFILER_HSA_CB_ID_DEVICE = 1, // Device assign callback
ROCPROFILER_HSA_CB_ID_MEMCOPY = 2, // Memcopy callback
ROCPROFILER_HSA_CB_ID_SUBMIT = 3, // Packet submit callback
ROCPROFILER_HSA_CB_ID_KSYMBOL = 4, // Loading/unloading of kernel symbol
ROCPROFILER_HSA_CB_ID_CODEOBJ = 5 // Loading/unloading of code object
} rocprofiler_hsa_cb_id_t;
// HSA callback data type
typedef struct {
union {
struct {
const void* ptr; // allocated area ptr
size_t size; // allocated area size, zero size means 'free' callback
hsa_amd_segment_t segment; // allocated area's memory segment type
const void* ptr; // allocated area ptr
size_t size; // allocated area size, zero size means 'free' callback
hsa_amd_segment_t segment; // allocated area's memory segment type
hsa_amd_memory_pool_global_flag_t global_flag; // allocated area's memory global flag
int is_code; // equal to 1 if code is allocated
} allocate;
struct {
hsa_device_type_t type; // type of assigned device
uint32_t id; // id of assigned device
hsa_agent_t agent; // device HSA agent handle
const void* ptr; // ptr the device is assigned to
hsa_device_type_t type; // type of assigned device
uint32_t id; // id of assigned device
hsa_agent_t agent; // device HSA agent handle
const void* ptr; // ptr the device is assigned to
} device;
struct {
const void* dst; // memcopy dst ptr
const void* src; // memcopy src ptr
size_t size; // memcopy size bytes
const void* dst; // memcopy dst ptr
const void* src; // memcopy src ptr
size_t size; // memcopy size bytes
} memcopy;
struct {
const void* packet; // submitted to GPU packet
const char* kernel_name; // kernel name, not NULL if dispatch
hsa_queue_t* queue; // HSA queue the kernel was submitted to
uint32_t device_type; // type of device the packed is submitted to
uint32_t device_id; // id of device the packed is submitted to
const void* packet; // submitted to GPU packet
const char* kernel_name; // kernel name, not NULL if dispatch
hsa_queue_t* queue; // HSA queue the kernel was submitted to
uint32_t device_type; // type of device the packet is submitted to
uint32_t device_id; // id of device the packet is submitted to
} submit;
struct {
uint64_t object; // kernel symbol object
const char* name; // kernel symbol name
uint32_t name_length; // kernel symbol name length
int unload; // symbol executable destroy
uint64_t object; // kernel symbol object
const char* name; // kernel symbol name
uint32_t name_length; // kernel symbol name length
int unload; // symbol executable destroy
} ksymbol;
struct {
uint32_t storage_type; // code object storage type
int storage_file; // origin file descriptor
uint64_t memory_base; // origin memory base
uint64_t memory_size; // origin memory size
uint64_t load_base; // codeobj load base
uint64_t load_size; // codeobj load size
uint64_t load_delta; // codeobj load size
uint32_t uri_length; // URI string length
char* uri; // URI string
int unload; // unload flag
uint32_t storage_type; // code object storage type
int storage_file; // origin file descriptor
uint64_t memory_base; // origin memory base
uint64_t memory_size; // origin memory size
uint64_t load_base; // codeobj load base
uint64_t load_size; // codeobj load size
uint64_t load_delta; // codeobj load delta
uint32_t uri_length; // URI string length
char* uri; // URI string
int unload; // unload flag
} codeobj;
};
} rocprofiler_hsa_callback_data_t;
// HSA callback function type
typedef hsa_status_t (*rocprofiler_hsa_callback_fun_t)(
rocprofiler_hsa_cb_id_t id, // callback id
const rocprofiler_hsa_callback_data_t* data, // [in] callback data
void* arg); // [in/out] user passed data
rocprofiler_hsa_cb_id_t id, // callback id
const rocprofiler_hsa_callback_data_t* data, // [in] callback data
void* arg); // [in/out] user passed data
// HSA callbacks structure
typedef struct {
rocprofiler_hsa_callback_fun_t allocate; // memory allocate callback
rocprofiler_hsa_callback_fun_t device; // agent assign callback
rocprofiler_hsa_callback_fun_t memcopy; // memory copy callback
rocprofiler_hsa_callback_fun_t submit; // packet submit callback
rocprofiler_hsa_callback_fun_t ksymbol; // kernel symbol callback
rocprofiler_hsa_callback_fun_t codeobj; // codeobject load/unload callback
rocprofiler_hsa_callback_fun_t allocate; // memory allocate callback
rocprofiler_hsa_callback_fun_t device; // agent assign callback
rocprofiler_hsa_callback_fun_t memcopy; // memory copy callback
rocprofiler_hsa_callback_fun_t submit; // packet submit callback
rocprofiler_hsa_callback_fun_t ksymbol; // kernel symbol callback
rocprofiler_hsa_callback_fun_t codeobj; // codeobject load/unload callback
} rocprofiler_hsa_callbacks_t;
// Set callbacks. If the callback is NULL then it is disabled.
// If callback returns a value that is not HSA_STATUS_SUCCESS the callback
// will be unregistered.
hsa_status_t rocprofiler_set_hsa_callbacks(
const rocprofiler_hsa_callbacks_t callbacks, // HSA callback function
void* arg); // callback user data
const rocprofiler_hsa_callbacks_t callbacks, // HSA callback function
void* arg); // callback user data
#ifdef __cplusplus
} // extern "C" block
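The timing types declared in this header can be exercised without the library. The sketch below copies rocprofiler_dispatch_record_t and rocprofiler_time_id_t locally so it compiles on its own; dispatch_phases(), span_ns() and time_id_to_clockid() are hypothetical helpers, not rocprofiler API. The phase split follows the field comments (dispatch, kernel begin, kernel end and completion-signal timestamps, all in ns), and the clock mapping is inferred from the enum comments that name the Linux clocks; it is not a description of how rocprofiler_get_time() is implemented. It assumes Linux/glibc for CLOCK_REALTIME_COARSE, CLOCK_MONOTONIC_COARSE and CLOCK_MONOTONIC_RAW.

```c
#define _GNU_SOURCE /* must precede all includes; exposes the Linux-specific CLOCK_* IDs */
#include <stdint.h>
#include <time.h>

/* Local copies of the two timing types from the header above. */
typedef struct {
  uint64_t dispatch; /* dispatch timestamp, ns */
  uint64_t begin;    /* kernel begin timestamp, ns */
  uint64_t end;      /* kernel end timestamp, ns */
  uint64_t complete; /* completion signal timestamp, ns */
} rocprofiler_dispatch_record_t;

typedef enum {
  ROCPROFILER_TIME_ID_CLOCK_REALTIME = 0,
  ROCPROFILER_TIME_ID_CLOCK_REALTIME_COARSE = 1,
  ROCPROFILER_TIME_ID_CLOCK_MONOTONIC = 2,
  ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_COARSE = 3,
  ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_RAW = 4,
} rocprofiler_time_id_t;

/* Hypothetical result type: one dispatch record split into three phases. */
typedef struct {
  uint64_t queue_wait_ns; /* dispatch -> kernel begin */
  uint64_t kernel_ns;     /* kernel begin -> kernel end */
  uint64_t completion_ns; /* kernel end -> completion signal */
} dispatch_phases_t;

/* Saturating difference: an out-of-order record yields 0 instead of wrapping around. */
static uint64_t span_ns(uint64_t from, uint64_t to) { return to >= from ? to - from : 0; }

static dispatch_phases_t dispatch_phases(const rocprofiler_dispatch_record_t* r) {
  dispatch_phases_t p;
  p.queue_wait_ns = span_ns(r->dispatch, r->begin);
  p.kernel_ns = span_ns(r->begin, r->end);
  p.completion_ns = span_ns(r->end, r->complete);
  return p;
}

/* Hypothetical mapping from each time ID to the Linux clock its comment names.
 * Returns 0 on success, -1 for an unknown ID. */
static int time_id_to_clockid(rocprofiler_time_id_t id, clockid_t* out) {
  switch (id) {
    case ROCPROFILER_TIME_ID_CLOCK_REALTIME: *out = CLOCK_REALTIME; return 0;
    case ROCPROFILER_TIME_ID_CLOCK_REALTIME_COARSE: *out = CLOCK_REALTIME_COARSE; return 0;
    case ROCPROFILER_TIME_ID_CLOCK_MONOTONIC: *out = CLOCK_MONOTONIC; return 0;
    case ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_COARSE: *out = CLOCK_MONOTONIC_COARSE; return 0;
    case ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_RAW: *out = CLOCK_MONOTONIC_RAW; return 0;
    default: return -1;
  }
}
```

For instance, a record of {100, 250, 1250, 1300} splits into 150 ns of queue wait, 1000 ns of kernel execution and 50 ns until the completion signal.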
+1 -1
@@ -1714,7 +1714,7 @@ typedef enum {
ROCPROFILER_ATT_TOKEN_MASK2 = 4,
ROCPROFILER_ATT_SE_MASK = 5,
ROCPROFILER_ATT_SAMPLE_RATE = 6,
ROCPROFILER_ATT_BUFFER_SIZE = 7, //! ATT collection max data size.
ROCPROFILER_ATT_BUFFER_SIZE = 7, //! ATT collection max data size.
ROCPROFILER_ATT_PERF_MASK = 240,
ROCPROFILER_ATT_PERF_CTRL = 241,
ROCPROFILER_ATT_PERFCOUNTER = 242,
+19 -19
@@ -1,23 +1,23 @@
################################################################################
## Copyright (c) 2022 Advanced Micro Devices, Inc.
##
## Permission is hereby granted, free of charge, to any person obtaining a copy
## of this software and associated documentation files (the "Software"), to
## deal in the Software without restriction, including without limitation the
## rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
## sell copies of the Software, and to permit persons to whom the Software is
## furnished to do so, subject to the following conditions:
##
## The above copyright notice and this permission notice shall be included in
## all copies or substantial portions of the Software.
##
## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
## AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
## LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
## FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
## IN THE SOFTWARE.
# Copyright (c) 2022 Advanced Micro Devices, Inc.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to
# deal in the Software without restriction, including without limitation the
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
# sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
################################################################################
add_subdirectory(file)
+22 -26
@@ -17,10 +17,10 @@
# ##############################################################################
find_library(
ROCPROFV2_ATT rocprofv2_att
HINTS ${CMAKE_INSTALL_PREFIX}
PATHS ${ROCM_PATH}
PATH_SUFFIXES hsa-amd-aqlprofile)
ROCPROFV2_ATT rocprofv2_att
HINTS ${CMAKE_INSTALL_PREFIX}
PATHS ${ROCM_PATH}
PATH_SUFFIXES hsa-amd-aqlprofile)
set(ENV{ROCPROFV2_ATT_LIB_PATH} $ROCPROFV2_ATT)
@@ -30,30 +30,26 @@ file(GLOB FILE_SOURCES att.cpp)
add_library(att_plugin SHARED ${FILE_SOURCES} ${ROCPROFILER_UTIL_SRC_FILES})
set_target_properties(
att_plugin
PROPERTIES CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
att_plugin
PROPERTIES CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
target_compile_definitions(att_plugin PRIVATE HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_HCC__=1)
target_include_directories(
att_plugin PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR})
target_include_directories(att_plugin PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR})
target_link_options(
att_plugin PRIVATE
-Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
-Wl,--no-undefined)
target_link_libraries(att_plugin PRIVATE rocprofiler-v2
hsa-runtime64::hsa-runtime64 stdc++fs)
att_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
-Wl,--no-undefined)
target_link_libraries(att_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64
stdc++fs)
install(TARGETS att_plugin
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT asan)
install(TARGETS att_plugin
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT runtime)
install(TARGETS att_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT asan)
install(TARGETS att_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT runtime)
configure_file(att.py att/att.py COPYONLY)
configure_file(trace_view.py att/trace_view.py COPYONLY)
@@ -64,7 +60,7 @@ configure_file(ui/logo.svg att/ui/logo.svg COPYONLY)
configure_file(ui/styles.css att/ui/styles.css COPYONLY)
configure_file(ui/httpserver.py att/ui/httpserver.py COPYONLY)
install(
DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/att
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/rocprofiler
USE_SOURCE_PERMISSIONS
COMPONENT runtime)
DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/att
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/rocprofiler
USE_SOURCE_PERMISSIONS
COMPONENT runtime)
+14 -14
@@ -54,11 +54,12 @@ class att_plugin_t {
att_plugin_t() {
std::vector<const char*> mpivars = {"MPI_RANK", "OMPI_COMM_WORLD_RANK", "MV2_COMM_WORLD_RANK"};
for (const char* envvar : mpivars) if (const char* env = getenv(envvar)) {
MPI_RANK = atoi(env);
MPI_ENABLE = true;
break;
}
for (const char* envvar : mpivars)
if (const char* env = getenv(envvar)) {
MPI_RANK = atoi(env);
MPI_ENABLE = true;
break;
}
}
bool MPI_ENABLE = false;
@@ -92,16 +93,15 @@ class att_plugin_t {
std::string name_demangled =
rocprofiler::truncate_name(rocprofiler::cxx_demangle(kernel_name_c));
if (name_demangled.size() > ATT_FILENAME_MAXBYTES) // Limit filename size
if (name_demangled.size() > ATT_FILENAME_MAXBYTES) // Limit filename size
name_demangled = name_demangled.substr(0, ATT_FILENAME_MAXBYTES);
std::string outfilepath = ".";
if (const char* env = getenv("OUTPUT_PATH"))
outfilepath = std::string(env);
if (const char* env = getenv("OUTPUT_PATH")) outfilepath = std::string(env);
outfilepath.reserve(outfilepath.size()+128); // Max filename size
outfilepath += '/'+name_demangled;
if (MPI_ENABLE) outfilepath += "_rank"+std::to_string(MPI_RANK);
outfilepath.reserve(outfilepath.size() + 128); // Max filename size
outfilepath += '/' + name_demangled;
if (MPI_ENABLE) outfilepath += "_rank" + std::to_string(MPI_RANK);
outfilepath += "_v";
// Find if this filename already exists. If so, increment vname.
@@ -113,9 +113,9 @@ class att_plugin_t {
auto dispatch_id = att_tracer_record->header.id.handle;
std::string fname = outfilepath + "_kernel.txt";
std::ofstream(fname.c_str()) << name_demangled << " dispatch[" << dispatch_id
<< "] GPU[" << att_tracer_record->gpu_id.handle
<< "]: " << kernel_name_c << '\n';
std::ofstream(fname.c_str()) << name_demangled << " dispatch[" << dispatch_id << "] GPU["
<< att_tracer_record->gpu_id.handle << "]: " << kernel_name_c
<< '\n';
// iterate over each shader engine att trace
int se_num = att_tracer_record->shader_engine_data_count;
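The att_plugin hunks above take the MPI rank from the first launcher variable that is set (`MPI_RANK`, `OMPI_COMM_WORLD_RANK`, `MV2_COMM_WORLD_RANK`) and build a `<OUTPUT_PATH>/<kernel>[_rank<N>]_v` file prefix. Below is a minimal standalone sketch of that logic. `detect_mpi_rank`, `make_output_prefix`, and the value 90 for `kFilenameMaxBytes` are hypothetical stand-ins: the real `ATT_FILENAME_MAXBYTES` value and the `vname` increment are not part of this diff. The sketch also returns -1 in place of the plugin's separate `MPI_ENABLE` flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical limit: the real ATT_FILENAME_MAXBYTES value is not shown in this diff.
constexpr std::size_t kFilenameMaxBytes = 90;

// Returns the rank from the first MPI launcher variable that is set, or -1 if none is.
int detect_mpi_rank() {
  const std::vector<const char*> mpivars = {"MPI_RANK", "OMPI_COMM_WORLD_RANK",
                                            "MV2_COMM_WORLD_RANK"};
  for (const char* envvar : mpivars)
    if (const char* env = std::getenv(envvar)) return std::atoi(env);
  return -1;
}

// Builds "<OUTPUT_PATH or .>/<kernel>[_rank<N>]_v", truncating long kernel names first.
std::string make_output_prefix(std::string kernel_name, int mpi_rank) {
  if (kernel_name.size() > kFilenameMaxBytes) kernel_name.resize(kFilenameMaxBytes);
  std::string path = ".";
  if (const char* env = std::getenv("OUTPUT_PATH")) path = env;
  path += '/' + kernel_name;
  if (mpi_rank >= 0) path += "_rank" + std::to_string(mpi_rank);
  path += "_v";
  return path;
}
```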
+16 -14
@@ -25,23 +25,25 @@ file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
file(GLOB CLI_SOURCES "*.cpp")
add_library(cli_plugin SHARED ${CLI_SOURCES} ${ROCPROFILER_UTIL_SRC_FILES})
set_target_properties(cli_plugin PROPERTIES
CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
set_target_properties(
cli_plugin
PROPERTIES CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
target_compile_definitions(cli_plugin
PRIVATE HIP_PROF_HIP_API_STRING=1 __HIP_PLATFORM_HCC__=1)
target_compile_definitions(cli_plugin PRIVATE HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_HCC__=1)
target_include_directories(cli_plugin PRIVATE ${PROJECT_SOURCE_DIR})
target_link_options(cli_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap -Wl,--no-undefined)
target_link_options(
cli_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
-Wl,--no-undefined)
target_link_libraries(cli_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64 stdc++fs atomic amd_comgr dl)
target_link_libraries(cli_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64
stdc++fs atomic amd_comgr dl)
install(TARGETS cli_plugin LIBRARY
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT asan)
install(TARGETS cli_plugin LIBRARY
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT runtime)
install(TARGETS cli_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT asan)
install(TARGETS cli_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT runtime)
+128 -126
@@ -1,76 +1,84 @@
################################################################################
## Copyright (c) 2022 Advanced Micro Devices, Inc.
##
## Permission is hereby granted, free of charge, to any person obtaining a copy
## of this software and associated documentation files (the "Software"), to
## deal in the Software without restriction, including without limitation the
## rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
## sell copies of the Software, and to permit persons to whom the Software is
## furnished to do so, subject to the following conditions:
##
## The above copyright notice and this permission notice shall be included in
## all copies or substantial portions of the Software.
##
## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
## AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
## LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
## FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
## IN THE SOFTWARE.
# Copyright (c) 2022 Advanced Micro Devices, Inc.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to
# deal in the Software without restriction, including without limitation the
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
# sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
################################################################################
# Plugin shared object.
add_library(ctf_plugin SHARED
ctf.cpp
plugin.cpp
barectf.c "${CMAKE_CURRENT_BINARY_DIR}/barectf.h"
${PROJECT_SOURCE_DIR}/src/utils/helper.cpp
hsa_begin.cpp.i hsa_end.cpp.i
hip_begin.cpp.i hip_end.cpp.i)
set_target_properties(ctf_plugin PROPERTIES
CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/../exportmap"
LIBRARY_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}/lib/rocprofiler")
add_library(
ctf_plugin SHARED
ctf.cpp
plugin.cpp
barectf.c
"${CMAKE_CURRENT_BINARY_DIR}/barectf.h"
${PROJECT_SOURCE_DIR}/src/utils/helper.cpp
hsa_begin.cpp.i
hsa_end.cpp.i
hip_begin.cpp.i
hip_end.cpp.i)
set_target_properties(
ctf_plugin
PROPERTIES CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/../exportmap"
LIBRARY_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}/lib/rocprofiler")
set(METADATA_STREAM_FILE_DIR "${CMAKE_INSTALL_DATADIR}/${PROJECT_NAME}/plugin/ctf")
target_compile_definitions(ctf_plugin PUBLIC AMD_INTERNAL_BUILD PRIVATE
HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_HCC__=1
CTF_PLUGIN_METADATA_FILE_PATH="${CMAKE_INSTALL_PREFIX}/${METADATA_STREAM_FILE_DIR}/metadata")
target_include_directories(ctf_plugin PRIVATE
"${PROJECT_SOURCE_DIR}"
"${CMAKE_BINARY_DIR}/src/api"
"${CMAKE_CURRENT_BINARY_DIR}")
target_link_options(ctf_plugin PRIVATE
"-Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap"
-Wl,--no-undefined)
target_link_libraries(ctf_plugin PRIVATE
rocprofiler-v2
hsa-runtime64::hsa-runtime64
stdc++fs
dl)
install(TARGETS ctf_plugin LIBRARY
DESTINATION "${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}"
COMPONENT plugins)
target_compile_definitions(
ctf_plugin
PUBLIC AMD_INTERNAL_BUILD
PRIVATE
HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_HCC__=1
CTF_PLUGIN_METADATA_FILE_PATH="${CMAKE_INSTALL_PREFIX}/${METADATA_STREAM_FILE_DIR}/metadata"
)
target_include_directories(
ctf_plugin PRIVATE "${PROJECT_SOURCE_DIR}" "${CMAKE_BINARY_DIR}/src/api"
"${CMAKE_CURRENT_BINARY_DIR}")
target_link_options(
ctf_plugin PRIVATE "-Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap"
-Wl,--no-undefined)
target_link_libraries(ctf_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64
stdc++fs dl)
install(TARGETS ctf_plugin LIBRARY DESTINATION "${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}"
COMPONENT plugins)
# `gen_api_files.py` and `gen_env_yaml.py` require Python 3,
# CppHeaderParser, PyYAML, and barectf.
find_package(Python3 COMPONENTS Interpreter REQUIRED)
# `gen_api_files.py` and `gen_env_yaml.py` require Python 3, CppHeaderParser, PyYAML, and
# barectf.
find_package(
Python3
COMPONENTS Interpreter
REQUIRED)
message("Python: ${Python3_EXECUTABLE})")
execute_process(COMMAND Python3::Interpreter -c "print('hello')")
function(check_py3_pkg pkg_name)
execute_process(COMMAND "${Python3_EXECUTABLE}" -c "import ${pkg_name}"
RESULT_VARIABLE PY3_IMPORT_RES
OUTPUT_QUIET)
execute_process(
COMMAND "${Python3_EXECUTABLE}" -c "import ${pkg_name}"
RESULT_VARIABLE PY3_IMPORT_RES
OUTPUT_QUIET)
if(NOT (${PY3_IMPORT_RES} EQUAL 0))
message(FATAL_ERROR "Cannot find Python 3 package `${pkg_name}`")
endif()
if(NOT (${PY3_IMPORT_RES} EQUAL 0))
message(FATAL_ERROR "Cannot find Python 3 package `${pkg_name}`")
endif()
message(STATUS "Found Python 3 package `${pkg_name}`")
message(STATUS "Found Python 3 package `${pkg_name}`")
endfunction()
check_py3_pkg(CppHeaderParser)
@@ -78,82 +86,76 @@ check_py3_pkg(yaml)
find_program(BARECTF_RES barectf REQUIRED HINTS "$ENV{HOME}/.local/bin")
# Generate barectf YAML and C++ files for HSA API.
get_property(HSA_RUNTIME_INCLUDE_DIRS
TARGET hsa-runtime64::hsa-runtime64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRS}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH
REQUIRED)
get_property(
HSA_RUNTIME_INCLUDE_DIRS
TARGET hsa-runtime64::hsa-runtime64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRS}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH REQUIRED)
get_filename_component(HSA_RUNTIME_INC_PATH "${HSA_H}" DIRECTORY)
add_custom_command(
OUTPUT hsa_erts.yaml hsa_begin.cpp.i hsa_end.cpp.i
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa.h" -o hsa.h.i
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
-o hsa_ext_amd.h.i
COMMAND ${CMAKE_COMMAND} -E cat hsa.h.i
hsa_ext_amd.h.i
"${CMAKE_BINARY_DIR}/src/api/hsa_prof_str.h"
> hsa_input.h
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py"
hsa hsa_input.h
BYPRODUCTS hsa.h.i hsa_ext_amd.h.i hsa_input.h
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py"
"${HSA_RUNTIME_INC_PATH}/hsa.h"
"${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
"${CMAKE_BINARY_DIR}/src/api/hsa_prof_str.h"
COMMENT "Generating HSA API files for the `ctf` plugin...")
OUTPUT hsa_erts.yaml hsa_begin.cpp.i hsa_end.cpp.i
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa.h" -o hsa.h.i
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h" -o
hsa_ext_amd.h.i
COMMAND ${CMAKE_COMMAND} -E cat hsa.h.i hsa_ext_amd.h.i
"${CMAKE_BINARY_DIR}/src/api/hsa_prof_str.h" > hsa_input.h
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py" hsa
hsa_input.h
BYPRODUCTS hsa.h.i hsa_ext_amd.h.i hsa_input.h
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py" "${HSA_RUNTIME_INC_PATH}/hsa.h"
"${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
"${CMAKE_BINARY_DIR}/src/api/hsa_prof_str.h"
COMMENT "Generating HSA API files for the `ctf` plugin...")
# Generate barectf YAML and C++ files for HIP API.
get_property(HIP_INCLUDE_DIRS TARGET hip::amdhip64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRS}
PATH_SUFFIXES hip
NO_DEFAULT_PATH
REQUIRED)
find_file(HIP_PROF_STR_H hip_prof_str.h
PATHS ${HIP_INCLUDE_DIRS}
PATH_SUFFIXES hip hip/amd_detail
NO_DEFAULT_PATH
REQUIRED)
get_property(
HIP_INCLUDE_DIRS
TARGET hip::amdhip64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRS}
PATH_SUFFIXES hip
NO_DEFAULT_PATH REQUIRED)
find_file(
HIP_PROF_STR_H hip_prof_str.h
PATHS ${HIP_INCLUDE_DIRS}
PATH_SUFFIXES hip hip/amd_detail
NO_DEFAULT_PATH REQUIRED)
list(TRANSFORM HIP_INCLUDE_DIRS PREPEND -I)
add_custom_command(
OUTPUT hip_erts.yaml hip_begin.cpp.i hip_end.cpp.i
COMMAND ${CMAKE_C_COMPILER} ${HIP_INCLUDE_DIRS}
-E "${HIP_RUNTIME_API_H}"
-D__HIP_PLATFORM_HCC__=1
-D__HIP_ROCclr__=1
-o hip_runtime_api.h.i
COMMAND cat hip_runtime_api.h.i "${HIP_PROF_STR_H}" > hip_input.h
BYPRODUCTS hip_runtime_api.h.i hip_input.h
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py"
hip hip_input.h
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py"
"${HIP_RUNTIME_API_H}"
"${HIP_PROF_STR_H}"
COMMENT "Generating HIP API files for the `ctf` plugin...")
OUTPUT hip_erts.yaml hip_begin.cpp.i hip_end.cpp.i
COMMAND ${CMAKE_C_COMPILER} ${HIP_INCLUDE_DIRS} -E "${HIP_RUNTIME_API_H}"
-D__HIP_PLATFORM_HCC__=1 -D__HIP_ROCclr__=1 -o hip_runtime_api.h.i
COMMAND cat hip_runtime_api.h.i "${HIP_PROF_STR_H}" > hip_input.h
BYPRODUCTS hip_runtime_api.h.i hip_input.h
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py" hip
hip_input.h
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py" "${HIP_RUNTIME_API_H}"
"${HIP_PROF_STR_H}"
COMMENT "Generating HIP API files for the `ctf` plugin...")
# Generate `env.yaml` (trace environment for barectf).
add_custom_command(
OUTPUT env.yaml
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_env_yaml.py"
${PROJECT_VERSION}
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_env_yaml.py"
COMMENT "Generating `env.yaml`...")
OUTPUT env.yaml
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_env_yaml.py"
${PROJECT_VERSION}
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_env_yaml.py"
COMMENT "Generating `env.yaml`...")
# Generate raw CTF tracer with barectf.
add_custom_command(
OUTPUT barectf.c barectf.h barectf-bitfield.h metadata
COMMAND "${BARECTF_RES}" gen "-I${CMAKE_CURRENT_BINARY_DIR}"
"-I${CMAKE_CURRENT_SOURCE_DIR}"
"${CMAKE_CURRENT_SOURCE_DIR}/config.yaml"
DEPENDS hsa_erts.yaml
hip_erts.yaml
env.yaml
"${CMAKE_CURRENT_SOURCE_DIR}/config.yaml"
"${CMAKE_CURRENT_SOURCE_DIR}/dst_base.yaml"
COMMENT "Generating raw CTF tracer with barectf...")
install(FILES "${CMAKE_CURRENT_BINARY_DIR}/metadata"
DESTINATION "${METADATA_STREAM_FILE_DIR}" COMPONENT plugins)
OUTPUT barectf.c barectf.h barectf-bitfield.h metadata
COMMAND "${BARECTF_RES}" gen "-I${CMAKE_CURRENT_BINARY_DIR}"
"-I${CMAKE_CURRENT_SOURCE_DIR}" "${CMAKE_CURRENT_SOURCE_DIR}/config.yaml"
DEPENDS hsa_erts.yaml hip_erts.yaml env.yaml "${CMAKE_CURRENT_SOURCE_DIR}/config.yaml"
"${CMAKE_CURRENT_SOURCE_DIR}/dst_base.yaml"
COMMENT "Generating raw CTF tracer with barectf...")
install(
FILES "${CMAKE_CURRENT_BINARY_DIR}/metadata"
DESTINATION "${METADATA_STREAM_FILE_DIR}"
COMPONENT plugins)
+6 -12
@@ -156,9 +156,8 @@ class HsaApiEventRecord : public TracerEventRecord<barectf_hsa_api_ctx> {
const rocprofiler_session_id_t session_id,
const std::uint64_t clock_val)
: TracerEventRecord<barectf_hsa_api_ctx>{record, clock_val} {
if(record.api_data.hsa)
api_data_ = *(record.api_data.hsa);
}
if (record.api_data.hsa) api_data_ = *(record.api_data.hsa);
}
explicit HsaApiEventRecord(const rocprofiler_record_tracer_t& record,
const std::uint64_t clock_val, hsa_api_data_t& api_data)
: TracerEventRecord<barectf_hsa_api_ctx>{record, clock_val}, api_data_(api_data) {}
@@ -206,7 +205,7 @@ class HipApiEventRecord : public TracerEventRecord<barectf_hip_api_ctx> {
const rocprofiler_session_id_t session_id,
const std::uint64_t clock_val)
: TracerEventRecord<barectf_hip_api_ctx>{record, clock_val},
api_data_{record.api_data.hip? *(record.api_data.hip) : hip_api_data_t{}},
api_data_{record.api_data.hip ? *(record.api_data.hip) : hip_api_data_t{}},
kernel_name_{record.name ? record.name : std::string{}} {}
explicit HipApiEventRecord(const rocprofiler_record_tracer_t& record,
const std::uint64_t clock_val, hip_api_data_t& api_data,
@@ -760,16 +759,11 @@ std::uint64_t GetMetadataClkClsOffset() {
static const char* LOOP_MPI_RANK(const std::vector<const char*>& mpivars) {
for (const char* env : mpivars)
if (const char* envvar = getenv(env))
return envvar;
if (const char* envvar = getenv(env)) return envvar;
return nullptr;
}
static void insert_meta_to_stream(
std::stringstream& stream,
const char* field,
const char* value
) {
static void insert_meta_to_stream(std::stringstream& stream, const char* field, const char* value) {
if (!field || !value) return;
stream << "\n\t" << std::string(field) << " = " << std::string(value) << ';';
}
@@ -802,7 +796,7 @@ void Plugin::CopyAdjustedMetadataStreamFile(const fs::path& metadata_stream_path
std::string data_ins = data_stream.str();
size_t env_pos = metadata.find("env {");
if (env_pos != std::string::npos)
metadata.insert(metadata.begin()+env_pos+5, data_ins.begin(), data_ins.end());
metadata.insert(metadata.begin() + env_pos + 5, data_ins.begin(), data_ins.end());
else
std::cerr << "Failed to insert MPI metadata!" << std::endl;
}
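In the ctf plugin hunks above, `insert_meta_to_stream` formats each field as `\n\t<field> = <value>;` and `CopyAdjustedMetadataStreamFile` splices the collected text directly after `env {` in the CTF metadata. Below is a minimal sketch of both steps, assuming the metadata is held in a plain `std::string`. `splice_env_fields` is a hypothetical helper name, and it uses `marker.size()` where the hunk hard-codes `+ 5` (the length of `env {`).

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Appends one "\n\t<field> = <value>;" entry and skips null inputs, as in the hunk above.
static void insert_meta_to_stream(std::stringstream& stream, const char* field,
                                  const char* value) {
  if (!field || !value) return;
  stream << "\n\t" << field << " = " << value << ';';
}

// Splices the collected entries right after the first "env {" in the metadata text.
// Returns false (leaving metadata untouched) when no "env {" block exists.
bool splice_env_fields(std::string& metadata, const std::string& fields) {
  const std::string marker = "env {";
  const std::size_t env_pos = metadata.find(marker);
  if (env_pos == std::string::npos) return false;
  metadata.insert(env_pos + marker.size(), fields);
  return true;
}
```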
+16 -14
@@ -25,23 +25,25 @@ file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
file(GLOB FILE_SOURCES "*.cpp")
add_library(file_plugin SHARED ${FILE_SOURCES} ${ROCPROFILER_UTIL_SRC_FILES})
set_target_properties(file_plugin PROPERTIES
CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
set_target_properties(
file_plugin
PROPERTIES CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
target_compile_definitions(file_plugin
PRIVATE HIP_PROF_HIP_API_STRING=1 __HIP_PLATFORM_HCC__=1)
target_compile_definitions(file_plugin PRIVATE HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_HCC__=1)
target_include_directories(file_plugin PRIVATE ${PROJECT_SOURCE_DIR})
target_link_options(file_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap -Wl,--no-undefined)
target_link_options(
file_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
-Wl,--no-undefined)
target_link_libraries(file_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64 stdc++fs amd_comgr dl)
target_link_libraries(file_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64
stdc++fs amd_comgr dl)
install(TARGETS file_plugin LIBRARY
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT asan)
install(TARGETS file_plugin LIBRARY
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT runtime)
install(TARGETS file_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT asan)
install(TARGETS file_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT runtime)
+1 -2
@@ -216,8 +216,7 @@ class file_plugin_t {
case ACTIVITY_DOMAIN_HIP_API: {
if (hip_api_header_written_.load(std::memory_order_relaxed)) return;
output_file = get_output_file(output_type_t::TRACER, ACTIVITY_DOMAIN_HIP_API);
*output_file << "Domain,Function,Start_Timestamp,End_Timestamp,Correlation_ID"
<< std::endl;
*output_file << "Domain,Function,Start_Timestamp,End_Timestamp,Correlation_ID" << std::endl;
*output_file << std::endl;
hip_api_header_written_.exchange(true, std::memory_order_release);
return;
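The file_plugin hunk above guards the HIP API CSV header with a `load` at the start and an `exchange` after writing. If that path can be reached by several threads without an outer lock (not visible in this hunk), both could observe `false` and write the header twice. One way to guarantee a single writer is to claim the flag first with `compare_exchange_strong`. The sketch below is an illustrative alternative, not the plugin's code; `header_once_t` and `write_if_first` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <ostream>
#include <sstream>

// Writes the HIP API CSV header at most once, even when several callers race here.
class header_once_t {
 public:
  bool write_if_first(std::ostream& out) {
    bool expected = false;
    // Exactly one caller can flip the flag from false to true; all others bail out.
    if (!written_.compare_exchange_strong(expected, true, std::memory_order_acq_rel))
      return false;
    out << "Domain,Function,Start_Timestamp,End_Timestamp,Correlation_ID\n";
    return true;
  }

 private:
  std::atomic<bool> written_{false};
};
```

Claiming before writing means a losing caller may return before the winner has finished writing the header, so row ordering still depends on whatever synchronizes the output stream itself.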
+19 -19
@@ -1,27 +1,27 @@
file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
add_library(perfetto_plugin
${LIBRARY_TYPE} ${ROCPROFILER_UTIL_SRC_FILES}
perfetto.cpp perfetto_sdk/sdk/perfetto.cc)
add_library(perfetto_plugin ${LIBRARY_TYPE} ${ROCPROFILER_UTIL_SRC_FILES} perfetto.cpp
perfetto_sdk/sdk/perfetto.cc)
set_target_properties(perfetto_plugin PROPERTIES
CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
set_target_properties(
perfetto_plugin
PROPERTIES CXX_VISIBILITY_PRESET hidden
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
target_compile_definitions(perfetto_plugin
PRIVATE HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_HCC__=1)
target_compile_definitions(perfetto_plugin PRIVATE HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_HCC__=1)
target_include_directories(perfetto_plugin
PRIVATE ${PROJECT_SOURCE_DIR}
${PROJECT_SOURCE_DIR}/plugin/perfetto/perfetto_sdk/sdk)
target_include_directories(
perfetto_plugin PRIVATE ${PROJECT_SOURCE_DIR}
${PROJECT_SOURCE_DIR}/plugin/perfetto/perfetto_sdk/sdk)
target_link_options(perfetto_plugin
PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap -Wl,--no-undefined)
target_link_options(
perfetto_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
-Wl,--no-undefined)
target_link_libraries(perfetto_plugin PRIVATE rocprofiler-v2 Threads::Threads stdc++fs amd_comgr)
target_link_libraries(perfetto_plugin PRIVATE rocprofiler-v2 Threads::Threads stdc++fs
amd_comgr)
install(TARGETS perfetto_plugin LIBRARY
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
COMPONENT plugins)
install(TARGETS perfetto_plugin
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME} COMPONENT plugins)
+1 -2
@@ -556,8 +556,7 @@ class perfetto_plugin_t {
if (tracer_record.name) {
kernel_name = rocprofiler::cxx_demangle(tracer_record.name);
TRACE_EVENT_BEGIN(
"HIP_OPS",
perfetto::StaticString(rocprofiler::truncate_name(kernel_name).c_str()),
"HIP_OPS", perfetto::StaticString(rocprofiler::truncate_name(kernel_name).c_str()),
gpu_track, tracer_record.timestamps.begin.value, "Agent ID",
tracer_record.agent_id.handle, "Process ID", GetPid(), "Kernel Name", kernel_name,
perfetto::Flow::ProcessScoped(tracer_record.correlation_id.value));
Diff for this file not shown due to its large size.
Diff for this file not shown due to its large size.
+4 -5
@@ -36,9 +36,10 @@
#include "src/utils/helper.h"
// Macro to check ROCProfiler calls status
#define CHECK_ROCPROFILER(call) \
#define CHECK_ROCPROFILER(call) \
do { \
if ((call) != ROCPROFILER_STATUS_SUCCESS) rocprofiler::fatal("Error: ROCProfiler API Call Error!"); \
if ((call) != ROCPROFILER_STATUS_SUCCESS) \
rocprofiler::fatal("Error: ROCProfiler API Call Error!"); \
} while (false)
namespace {
@@ -48,8 +49,6 @@ namespace {
return pid;
}
[[maybe_unused]] uint64_t GetMachineID() {
return gethostid();
}
[[maybe_unused]] uint64_t GetMachineID() { return gethostid(); }
} // namespace
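`CHECK_ROCPROFILER` wraps its test in `do { ... } while (false)`. This makes the macro expand to exactly one statement, so it stays safe inside an unbraced `if`/`else` and evaluates `call` only once. The sketch below shows the same pattern against a hypothetical `demo_status_t` and a `demo_fatal` that counts failures instead of aborting. Neither is part of the rocprofiler API.

```cpp
#include <cassert>
#include <cstdio>

// Stand-in status type; the real rocprofiler_status_t values are not reproduced here.
enum demo_status_t { DEMO_STATUS_SUCCESS = 0, DEMO_STATUS_ERROR = 1 };

// Counts failures instead of terminating so the pattern can be exercised.
static int g_failures = 0;
static void demo_fatal(const char* msg) {
  std::fprintf(stderr, "%s\n", msg);
  ++g_failures;
}

// do { ... } while (false) turns the body into a single statement that still
// requires a trailing semicolon, so "if (x) CHECK_DEMO(a); else ..." parses correctly.
#define CHECK_DEMO(call)                                  \
  do {                                                    \
    if ((call) != DEMO_STATUS_SUCCESS)                    \
      demo_fatal("Error: API Call Error!");               \
  } while (false)
```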
+124 -85
@@ -26,9 +26,11 @@ set(ROCPROF_WRAPPER_BIN_DIR ${ROCPROF_WRAPPER_DIR}/bin)
set(ROCPROF_WRAPPER_LIB_DIR ${ROCPROF_WRAPPER_DIR}/lib)
set(ROCPROF_WRAPPER_TOOL_DIR ${ROCPROF_WRAPPER_DIR}/tool)
#Function to generate header template file
# Function to generate header template file
function(create_header_template)
file(WRITE ${ROCPROF_WRAPPER_DIR}/header.hpp.in "/*
file(
WRITE ${ROCPROF_WRAPPER_DIR}/header.hpp.in
"/*
Copyright (c) 2022 Advanced Micro Devices, Inc. All rights reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy
@@ -69,105 +71,142 @@ function(create_header_template)
#endif")
endfunction()
#use header template file and generate wrapper header files
# use header template file and generate wrapper header files
function(generate_wrapper_header)
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_INC_DIR})
#find all header files from inc
file(GLOB include_files ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/*.h)
#Convert the list of files into #includes
foreach(header_file ${include_files})
#set include guard
get_filename_component(INC_GAURD_NAME ${header_file} NAME_WE)
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_INC_DIR})
# find all header files from inc
file(GLOB include_files ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/*.h)
# Convert the list of files into #includes
foreach(header_file ${include_files})
# set include guard
get_filename_component(INC_GAURD_NAME ${header_file} NAME_WE)
string(TOUPPER ${INC_GAURD_NAME} INC_GAURD_NAME)
set(include_guard "${include_guard}ROCPROF_WRAPPER_INCLUDE_${INC_GAURD_NAME}_H")
# set include statement
get_filename_component(file_name ${header_file} NAME)
set(include_statements
"${include_statements}#include \"../../${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}/${file_name}\"\n"
)
configure_file(${ROCPROF_WRAPPER_DIR}/header.hpp.in
${ROCPROF_WRAPPER_INC_DIR}/${file_name})
unset(include_guard)
unset(include_statements)
endforeach()
# Only single file from ${CMAKE_CURRENT_SOURCE_DIR}/src/core/activity.h is packaged.
# So drectly using that file name
set(file_name "activity.h")
# set include guard
get_filename_component(INC_GAURD_NAME ${file_name} NAME_WE)
string(TOUPPER ${INC_GAURD_NAME} INC_GAURD_NAME)
set(include_guard "${include_guard}ROCPROF_WRAPPER_INCLUDE_${INC_GAURD_NAME}_H")
#set include statement
get_filename_component(file_name ${header_file} NAME)
set(include_statements "${include_statements}#include \"../../${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}/${file_name}\"\n")
configure_file(${ROCPROF_WRAPPER_DIR}/header.hpp.in ${ROCPROF_WRAPPER_INC_DIR}/${file_name})
unset(include_guard)
unset(include_statements)
endforeach()
#Only single file from ${CMAKE_CURRENT_SOURCE_DIR}/src/core/activity.h is packaged. So drectly using that file name
set(file_name "activity.h")
#set include guard
get_filename_component(INC_GAURD_NAME ${file_name} NAME_WE)
string(TOUPPER ${INC_GAURD_NAME} INC_GAURD_NAME)
set(include_guard "${include_guard}ROCPROF_WRAPPER_INCLUDE_${INC_GAURD_NAME}_H")
set(include_statements "${include_statements}#include \"../../${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}/${file_name}\"\n")
configure_file(${ROCPROF_WRAPPER_DIR}/header.hpp.in ${ROCPROF_WRAPPER_INC_DIR}/${file_name})
set(include_statements
"${include_statements}#include \"../../${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}/${file_name}\"\n"
)
configure_file(${ROCPROF_WRAPPER_DIR}/header.hpp.in
${ROCPROF_WRAPPER_INC_DIR}/${file_name})
endfunction()
#function to create symlink to binaries
# function to create symlink to binaries
function(create_binary_symlink)
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_BIN_DIR})
#create symlink for rocprof
set(file_name "rocprof")
add_custom_target(link_${file_name} ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND ${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_BINDIR}/${file_name} ${ROCPROF_WRAPPER_BIN_DIR}/${file_name})
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_BIN_DIR})
# create symlink for rocprof
set(file_name "rocprof")
add_custom_target(
link_${file_name} ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND
${CMAKE_COMMAND} -E create_symlink ../../${CMAKE_INSTALL_BINDIR}/${file_name}
${ROCPROF_WRAPPER_BIN_DIR}/${file_name})
endfunction()
#function to create symlink to libraries
# function to create symlink to libraries
function(create_library_symlink)
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_LIB_DIR})
set(LIB_ROCPROF "${ROCPROFILER_LIBRARY}.so")
set(MAJ_VERSION "${LIB_VERSION_MAJOR}")
set(SO_VERSION "${LIB_VERSION_STRING}")
set(library_files "${LIB_ROCPROF}" "${LIB_ROCPROF}.${MAJ_VERSION}" "${LIB_ROCPROF}.${SO_VERSION}")
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_LIB_DIR})
set(LIB_ROCPROF "${ROCPROFILER_LIBRARY}.so")
set(MAJ_VERSION "${LIB_VERSION_MAJOR}")
set(SO_VERSION "${LIB_VERSION_STRING}")
set(library_files "${LIB_ROCPROF}" "${LIB_ROCPROF}.${MAJ_VERSION}"
"${LIB_ROCPROF}.${SO_VERSION}")
foreach(file_name ${library_files})
add_custom_target(link_${file_name} ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND ${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${file_name} ${ROCPROF_WRAPPER_LIB_DIR}/${file_name})
endforeach()
#create symlink to rocprofiler/tool/libtool.so
# With File reorg,tool renamed to rocprof-tool
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_TOOL_DIR})
set(LIB_TOOL "libtool.so")
set(LIB_ROCPROFTOOL "librocprof-tool.so")
add_custom_target(link_${LIB_TOOL} ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND ${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${LIB_ROCPROFTOOL} ${ROCPROF_WRAPPER_TOOL_DIR}/${LIB_TOOL})
#create symlink to test binary
#since its saved in lib folder , the code for the same is added here
# With File reorg ,binary name changed from ctrl to rocprof-ctrl
set(TEST_CTRL "ctrl")
set(TEST_ROCPROFCTRL "rocprof-ctrl")
add_custom_target(link_${TEST_CTRL} ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND ${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${TEST_ROCPROFCTRL} ${ROCPROF_WRAPPER_TOOL_DIR}/${TEST_CTRL})
set(METRICS "metrics.xml")
add_custom_target(link_metrics ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND ${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${METRICS} ${ROCPROF_WRAPPER_LIB_DIR}/${METRICS})
foreach(file_name ${library_files})
add_custom_target(
link_${file_name} ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND
${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${file_name}
${ROCPROF_WRAPPER_LIB_DIR}/${file_name})
endforeach()
# create symlink to rocprofiler/tool/libtool.so With File reorg,tool renamed to
# rocprof-tool
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_TOOL_DIR})
set(LIB_TOOL "libtool.so")
set(LIB_ROCPROFTOOL "librocprof-tool.so")
add_custom_target(
link_${LIB_TOOL} ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND
${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${LIB_ROCPROFTOOL}
${ROCPROF_WRAPPER_TOOL_DIR}/${LIB_TOOL})
# create symlink to test binary since its saved in lib folder , the code for the same
# is added here With File reorg ,binary name changed from ctrl to rocprof-ctrl
set(TEST_CTRL "ctrl")
set(TEST_ROCPROFCTRL "rocprof-ctrl")
add_custom_target(
link_${TEST_CTRL} ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND
${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${TEST_ROCPROFCTRL}
${ROCPROF_WRAPPER_TOOL_DIR}/${TEST_CTRL})
set(METRICS "metrics.xml")
add_custom_target(
link_metrics ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND
${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${METRICS}
${ROCPROF_WRAPPER_LIB_DIR}/${METRICS})
set(GFX_METRICS "gfx_metrics.xml")
add_custom_target(link_gfx_metrics ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND ${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${GFX_METRICS} ${ROCPROF_WRAPPER_LIB_DIR}/${GFX_METRICS})
set(GFX_METRICS "gfx_metrics.xml")
add_custom_target(
link_gfx_metrics ALL
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
COMMAND
${CMAKE_COMMAND} -E create_symlink
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${GFX_METRICS}
${ROCPROF_WRAPPER_LIB_DIR}/${GFX_METRICS})
endfunction()
#Creater a template for header file
# Creater a template for header file
create_header_template()
#Use template header file and generater wrapper header files
# Use template header file and generater wrapper header files
generate_wrapper_header()
install(DIRECTORY ${ROCPROF_WRAPPER_INC_DIR} DESTINATION ${ROCPROFILER_NAME} COMPONENT dev)
install(
DIRECTORY ${ROCPROF_WRAPPER_INC_DIR}
DESTINATION ${ROCPROFILER_NAME}
COMPONENT dev)
# Create symlink to binaries
create_binary_symlink()
install(DIRECTORY ${ROCPROF_WRAPPER_BIN_DIR} DESTINATION ${ROCPROFILER_NAME} COMPONENT runtime)
install(
DIRECTORY ${ROCPROF_WRAPPER_BIN_DIR}
DESTINATION ${ROCPROFILER_NAME}
COMPONENT runtime)
create_library_symlink()
install(DIRECTORY ${ROCPROF_WRAPPER_LIB_DIR} DESTINATION ${ROCPROFILER_NAME}
COMPONENT runtime
PATTERN ${ROCPROFILER_LIBRARY}.so EXCLUDE)
install(FILES ${ROCPROF_WRAPPER_LIB_DIR}/${ROCPROFILER_LIBRARY}.so DESTINATION ${ROCPROFILER_NAME}/lib
COMPONENT dev)
#install tools directory
install(DIRECTORY ${ROCPROF_WRAPPER_TOOL_DIR} DESTINATION ${ROCPROFILER_NAME} COMPONENT runtime)
install(
DIRECTORY ${ROCPROF_WRAPPER_LIB_DIR}
DESTINATION ${ROCPROFILER_NAME}
COMPONENT runtime
PATTERN ${ROCPROFILER_LIBRARY}.so EXCLUDE)
install(
FILES ${ROCPROF_WRAPPER_LIB_DIR}/${ROCPROFILER_LIBRARY}.so
DESTINATION ${ROCPROFILER_NAME}/lib
COMPONENT dev)
# install tools directory
install(
DIRECTORY ${ROCPROF_WRAPPER_TOOL_DIR}
DESTINATION ${ROCPROFILER_NAME}
COMPONENT runtime)
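The `create_binary_symlink()` and `create_library_symlink()` targets above store *relative* link targets (`../../${CMAKE_INSTALL_BINDIR}/...`, `../../${CMAKE_INSTALL_LIBDIR}/...`). The link text is stored verbatim and resolved against the directory that contains the link, not against `WORKING_DIRECTORY`. A minimal C++17 sketch of that resolution with `std::filesystem`, assuming an install layout of `<prefix>/bin/rocprof` with a wrapper link at `<prefix>/rocprofiler/bin/rocprof`. The directory names and the helpers `make_wrapper_link` and `make_fake_install` are illustrative, not taken from the build:

```cpp
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Mirrors `cmake -E create_symlink ../../bin/<name> <prefix>/<wrapper>/bin/<name>`
// and returns the fully resolved path that the new link points to.
fs::path make_wrapper_link(const fs::path& prefix, const std::string& wrapper,
                           const std::string& name) {
  const fs::path link_dir = prefix / wrapper / "bin";
  fs::create_directories(link_dir);
  const fs::path link = link_dir / name;
  fs::remove(link);  // create_symlink throws if the link already exists
  // The target text is stored as-is; the OS resolves it relative to link_dir.
  fs::create_symlink(fs::path("..") / ".." / "bin" / name, link);
  return fs::canonical(link);
}

// Builds a throwaway <prefix>/bin/<name> so the link has something to point at.
fs::path make_fake_install(const std::string& name) {
  const fs::path prefix = fs::temp_directory_path() / "rocprof_symlink_sketch";
  fs::remove_all(prefix);
  fs::create_directories(prefix / "bin");
  std::ofstream(prefix / "bin" / name) << "#!/bin/sh\n";
  return prefix;
}
```

Because the stored target is relative, the wrapper directory keeps working after `install(DIRECTORY ...)` copies it under `${ROCPROFILER_NAME}`, as long as the installed tree keeps the same two-level relationship.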
+132 -88
@@ -1,15 +1,18 @@
include (CheckCSourceCompiles)
# ############################################################################################################################################
# ############################################################################################################################################
include(CheckCSourceCompiles)
# ########################################################################################
# ########################################################################################
# General Requirements
# ############################################################################################################################################
# ############################################################################################################################################
get_property(HSA_RUNTIME_INCLUDE_DIRECTORIES TARGET hsa-runtime64::hsa-runtime64 PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH
REQUIRED)
# ########################################################################################
# ########################################################################################
get_property(
HSA_RUNTIME_INCLUDE_DIRECTORIES
TARGET hsa-runtime64::hsa-runtime64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH REQUIRED)
get_filename_component(HSA_RUNTIME_INC_PATH ${HSA_H} DIRECTORY)
include_directories(${HSA_RUNTIME_INC_PATH})
@@ -22,138 +25,179 @@ set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${ROCM_PATH}/lib/cmake/hip")
set(CMAKE_HIP_ARCHITECTURES OFF)
find_package(HIP REQUIRED MODULE)
find_package(Clang REQUIRED CONFIG
PATHS "${ROCM_PATH}"
PATH_SUFFIXES "llvm/lib/cmake/clang")
find_package(
Clang REQUIRED CONFIG
PATHS "${ROCM_PATH}"
PATH_SUFFIXES "llvm/lib/cmake/clang")
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake/modules" "${ROCM_PATH}/lib/cmake/hip")
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake/modules"
"${ROCM_PATH}/lib/cmake/hip")
find_package(LibElf REQUIRED)
find_package(LibDw REQUIRED)
## Add a custom targets to build and run all the tests
# Add a custom targets to build and run all the tests
add_custom_target(samples)
add_dependencies(samples rocprofiler-v2)
add_custom_target(run-samples COMMAND ${PROJECT_BINARY_DIR}/samples/run_samples.sh DEPENDS samples)
add_custom_target(
run-samples
COMMAND ${PROJECT_BINARY_DIR}/samples/run_samples.sh
DEPENDS samples)
file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
# ############################################################################################################################################
# ########################################################################################
# ############################################################################################################################################
# ############################################################################################################################################
# ########################################################################################
# ########################################################################################
# Samples Build & Run Script
# ############################################################################################################################################
# ############################################################################################################################################
# ########################################################################################
# ########################################################################################
# ############################################################################################################################################
# ########################################################################################
# Profiler Samples
# ############################################################################################################################################
# ########################################################################################
## Build Kernel No Replay Sample
set_source_files_properties(profiler/kernel_profiling_no_replay_sample.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(profiler_kernel_no_replay profiler/kernel_profiling_no_replay_sample.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(profiler_kernel_no_replay PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
# Build Kernel No Replay Sample
set_source_files_properties(profiler/kernel_profiling_no_replay_sample.cpp
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(
profiler_kernel_no_replay profiler/kernel_profiling_no_replay_sample.cpp
${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(
profiler_kernel_no_replay PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(profiler_kernel_no_replay PRIVATE rocprofiler-v2 amd_comgr)
target_link_options(profiler_kernel_no_replay PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples profiler_kernel_no_replay)
install(TARGETS profiler_kernel_no_replay RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS profiler_kernel_no_replay
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)
## Build Device Profiling Sample
set_source_files_properties(profiler/device_profiling_sample.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(profiler_device_profiling profiler/device_profiling_sample.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(profiler_device_profiling PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
# Build Device Profiling Sample
set_source_files_properties(profiler/device_profiling_sample.cpp
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(profiler_device_profiling profiler/device_profiling_sample.cpp
${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(
profiler_device_profiling PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(profiler_device_profiling PRIVATE rocprofiler-v2 amd_comgr)
target_link_options(profiler_device_profiling PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples profiler_device_profiling)
install(TARGETS profiler_device_profiling RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS profiler_device_profiling
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)
## Build Counters Sampling example
set_source_files_properties(counters_sampler/pcie_counters_example.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(pcie_counters_sampler counters_sampler/pcie_counters_example.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(pcie_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
# Build Counters Sampling example
set_source_files_properties(counters_sampler/pcie_counters_example.cpp
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(pcie_counters_sampler counters_sampler/pcie_counters_example.cpp
${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(
pcie_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(pcie_counters_sampler PRIVATE rocprofiler-v2 systemd amd_comgr)
target_link_options(pcie_counters_sampler PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples pcie_counters_sampler)
install(TARGETS pcie_counters_sampler RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS pcie_counters_sampler
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)
## Build XGMI Counters Sampling example
set_source_files_properties(counters_sampler/xgmi_counters_sampler_example.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(xgmi_counters_sampler counters_sampler/xgmi_counters_sampler_example.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(xgmi_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
# Build XGMI Counters Sampling example
set_source_files_properties(counters_sampler/xgmi_counters_sampler_example.cpp
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(
xgmi_counters_sampler counters_sampler/xgmi_counters_sampler_example.cpp
${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(
xgmi_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(xgmi_counters_sampler PRIVATE rocprofiler-v2 systemd amd_comgr)
target_link_options(xgmi_counters_sampler PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples xgmi_counters_sampler)
install(TARGETS xgmi_counters_sampler RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS xgmi_counters_sampler
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)
# ################################################################################################################
# ########################################################################################
# ############################################################################################################################################
# ########################################################################################
# Tracer Samples
# ############################################################################################################################################
# ########################################################################################
## Build HIP/HSA Trace Sample
# Build HIP/HSA Trace Sample
set_source_files_properties(tracer/sample.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(tracer_hip_hsa tracer/sample.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(tracer_hip_hsa PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
target_include_directories(tracer_hip_hsa PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(tracer_hip_hsa PRIVATE rocprofiler-v2 amd_comgr)
target_link_options(tracer_hip_hsa PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples tracer_hip_hsa)
install(TARGETS tracer_hip_hsa RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS tracer_hip_hsa
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)
## Build HIP/HSA Trace with async output api trace data Sample
set_source_files_properties(tracer/sample_async.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(tracer_hip_hsa_async tracer/sample_async.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(tracer_hip_hsa_async PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
# Build HIP/HSA Trace with async output api trace data Sample
set_source_files_properties(tracer/sample_async.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT
1)
hip_add_executable(tracer_hip_hsa_async tracer/sample_async.cpp
${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(
tracer_hip_hsa_async PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(tracer_hip_hsa_async PRIVATE rocprofiler-v2 amd_comgr)
target_link_options(tracer_hip_hsa_async PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples tracer_hip_hsa_async)
install(TARGETS tracer_hip_hsa_async RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS tracer_hip_hsa_async
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)
# ############################################################################################################################################
# ########################################################################################
# PC Sampling Samples
# ############################################################################################################################################
# ########################################################################################
set(CODE_PRINTING_SAMPLE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/pcsampler/code_printing_sample)
file(GLOB PC_SAMPLING_CODE_PRINTING_FILES ${CODE_PRINTING_SAMPLE_DIR}/*.cpp)
set_source_files_properties(${PC_SAMPLING_CODE_PRINTING_FILES} PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(pc_sampling_code_printing ${PC_SAMPLING_CODE_PRINTING_FILES}
HIPCC_OPTIONS
-std=c++17
# Include debugging symbols and source for the contextual disassembly
-gdwarf-4)
set_source_files_properties(${PC_SAMPLING_CODE_PRINTING_FILES}
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(
pc_sampling_code_printing ${PC_SAMPLING_CODE_PRINTING_FILES} HIPCC_OPTIONS -std=c++17
# Include debugging symbols and source for the contextual disassembly
-gdwarf-4)
check_c_source_compiles("
check_c_source_compiles(
"
#define _GNU_SOURCE
#include <sys/mman.h>
int main() { return memfd_create (\"cmake_test\", 0); }
" HAVE_MEMFD_CREATE)
if (HAVE_MEMFD_CREATE)
target_compile_definitions(pc_sampling_code_printing PRIVATE HAVE_MEMFD_CREATE)
endif()
"
HAVE_MEMFD_CREATE)
if(HAVE_MEMFD_CREATE)
target_compile_definitions(pc_sampling_code_printing PRIVATE HAVE_MEMFD_CREATE)
endif()
target_link_libraries(pc_sampling_code_printing
PRIVATE
rocprofiler-v2
rocm-dbgapi
${LIBELF_LIBRARIES}
${LIBDW_LIBRARIES}
hsa-runtime64::hsa-runtime64 Threads::Threads dl)
target_include_directories(pc_sampling_code_printing
PRIVATE
${TEST_DIR}
${ROOT_DIR}
${HSA_RUNTIME_INC_PATH}
${PROJECT_SOURCE_DIR})
target_link_options(pc_sampling_code_printing PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples pc_sampling_code_printing)
install(TARGETS pc_sampling_code_printing RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
target_link_libraries(
pc_sampling_code_printing
PRIVATE rocprofiler-v2 rocm-dbgapi ${LIBELF_LIBRARIES} ${LIBDW_LIBRARIES}
hsa-runtime64::hsa-runtime64 Threads::Threads dl)
target_include_directories(
pc_sampling_code_printing PRIVATE ${TEST_DIR} ${ROOT_DIR} ${HSA_RUNTIME_INC_PATH}
${PROJECT_SOURCE_DIR})
target_link_options(pc_sampling_code_printing PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples pc_sampling_code_printing)
install(TARGETS pc_sampling_code_printing
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)
install(DIRECTORY "${PROJECT_SOURCE_DIR}/samples/" DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples-src OPTIONAL COMPONENT samples)
install(
DIRECTORY "${PROJECT_SOURCE_DIR}/samples/"
DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples-src
OPTIONAL
COMPONENT samples)
# ############################################################################################################################################
# ########################################################################################
# Scripts to run samples
# ############################################################################################################################################
# ########################################################################################
# Copy run_samples script to samples folder
configure_file(run_samples.sh ${PROJECT_BINARY_DIR}/samples COPYONLY)
# ############################################################################################################################################
# ########################################################################################
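The `check_c_source_compiles()` probe above only defines `HAVE_MEMFD_CREATE` on `pc_sampling_code_printing`. The source that consumes the macro is not part of this diff. The following sketch shows how such a guard is typically used: it creates an anonymous file descriptor with `memfd_create` when the macro is defined and otherwise falls back to an unlinked `mkstemp` file. The fallback and the helper name `make_anonymous_fd` are assumptions, not the sample's code:

```cpp
// memfd_create is a GNU extension; g++ already defines _GNU_SOURCE on glibc.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdlib.h>    // mkstemp
#include <sys/mman.h>  // memfd_create (glibc >= 2.27)
#include <unistd.h>    // unlink, write, read, lseek, close

// Returns a readable/writable descriptor with no visible name on disk,
// or -1 on failure.
int make_anonymous_fd(const char* name) {
#ifdef HAVE_MEMFD_CREATE
  return memfd_create(name, 0);
#else
  (void)name;
  char path[] = "/tmp/anon_fd_XXXXXX";
  const int fd = mkstemp(path);  // opened O_RDWR
  if (fd != -1) unlink(path);    // the file disappears once fd is closed
  return fd;
#endif
}
```

Compiling with `-DHAVE_MEMFD_CREATE` selects the first branch, which is what the `target_compile_definitions()` call arranges when the probe succeeds.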
+10 -11
@@ -8,15 +8,14 @@ int main(int argc, char** argv) {
"CI_PERF_slv_MemRd_Bandwidth0", "CI_PERF_slv_MemWr_Bandwidth0", "CI_PERF_slv_totalMemRdTx",
"CI_PERF_slv_totalMemWrTx", "CI_PERF_slv_totalTx"};
if(argc > 1) {
if (argc > 1) {
counter_option = atoi(argv[1]);
}
else{
std::cout<< "Please provide one of the counter index options as argument:\n";
for(int i = 0; i < pcie_counters.size(); i++){
std::cout<< "[" << i << "]: " << pcie_counters[i] << std::endl;
} else {
std::cout << "Please provide one of the counter index options as argument:\n";
for (int i = 0; i < pcie_counters.size(); i++) {
std::cout << "[" << i << "]: " << pcie_counters[i] << std::endl;
}
std::cout<< "Example:\n ./pcie_counters_sampler 1\n";
std::cout << "Example:\n ./pcie_counters_sampler 1\n";
exit(0);
}
@@ -55,10 +54,10 @@ int main(int argc, char** argv) {
.sampling_rate = rate,
.sampling_duration = duration,
.gpu_agent_index = 0};
CHECK_ROCPROFILER(
rocprofiler_create_filter(session_id, ROCPROFILER_COUNTERS_SAMPLER,
rocprofiler_filter_data_t{.counters_sampler_parameters = cs_parameters},
0, &filter_id, property));
CHECK_ROCPROFILER(rocprofiler_create_filter(
session_id, ROCPROFILER_COUNTERS_SAMPLER,
rocprofiler_filter_data_t{.counters_sampler_parameters = cs_parameters}, 0, &filter_id,
property));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, filter_id, buffer_id));
// Normal HIP Calls
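In the PCIe counter sample, `counter_option = atoi(argv[1])` accepts any string. For example, `atoi("abc")` returns 0, and nothing in the visible hunk checks the value against `pcie_counters.size()`. The hunk does not show how `counter_option` is used later, so treat the following as a hedged sketch of a stricter parser. `parse_counter_index` is a hypothetical helper:

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <optional>

// Parses `arg` as a base-10 index into a list of `count` counters.
// Rejects empty input, trailing characters ("2x"), negatives, overflow,
// and indices >= count; atoi() reports none of these errors.
std::optional<std::size_t> parse_counter_index(const char* arg, std::size_t count) {
  if (arg == nullptr || *arg == '\0') return std::nullopt;
  errno = 0;
  char* end = nullptr;
  const long value = std::strtol(arg, &end, 10);
  if (errno != 0 || *end != '\0' || value < 0 ||
      static_cast<unsigned long>(value) >= count) {
    return std::nullopt;
  }
  return static_cast<std::size_t>(value);
}
```

In `main`, the result could replace the `atoi` call: take `*idx` when `parse_counter_index(argv[1], pcie_counters.size())` returns a value, and otherwise fall through to the existing usage message.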
+8 -8
@@ -40,14 +40,14 @@ int main(int argc, char** argv) {
uint32_t duration = 5000;
rocprofiler_counters_sampler_parameters_t cs_parameters = {.counters = counters_input,
.counters_num = 1,
.sampling_rate = rate,
.sampling_duration = duration,
.gpu_agent_index = 0};
CHECK_ROCPROFILER(
rocprofiler_create_filter(session_id, ROCPROFILER_COUNTERS_SAMPLER,
rocprofiler_filter_data_t{.counters_sampler_parameters = cs_parameters},
0, &filter_id, property));
.counters_num = 1,
.sampling_rate = rate,
.sampling_duration = duration,
.gpu_agent_index = 0};
CHECK_ROCPROFILER(rocprofiler_create_filter(
session_id, ROCPROFILER_COUNTERS_SAMPLER,
rocprofiler_filter_data_t{.counters_sampler_parameters = cs_parameters}, 0, &filter_id,
property));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, filter_id, buffer_id));
// Normal HIP Calls
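Both counter samplers fill `rocprofiler_counters_sampler_parameters_t` with designated initializers. In C++ these are standard only from C++20 and must follow the members' declaration order; GCC and Clang also accept them in earlier language modes as an extension. The sketch below uses a stand-in struct. `sampler_params_t`, its member types, and `make_params` are hypothetical, because the real definition is not shown in this diff:

```cpp
#include <cstdint>

// Stand-in mirroring the member names used in the samples; the real
// rocprofiler type and its member types may differ.
struct sampler_params_t {
  const char** counters;
  std::uint32_t counters_num;
  std::uint32_t sampling_rate;
  std::uint32_t sampling_duration;
  std::uint32_t gpu_agent_index;
};

// Designators must appear in declaration order; omitted members
// (here gpu_agent_index) are value-initialized, i.e. zero.
sampler_params_t make_params(const char** counters, std::uint32_t rate,
                             std::uint32_t duration) {
  return sampler_params_t{.counters = counters,
                          .counters_num = 1,
                          .sampling_rate = rate,
                          .sampling_duration = duration};
}
```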
The diff for this file is not shown because of its large size.
+49 -71
@@ -31,96 +31,74 @@
namespace amd::debug_agent {
class code_object_t {
struct symbol_info_t {
const std::string m_name;
amd_dbgapi_global_address_t m_value;
amd_dbgapi_size_t m_size;
};
struct symbol_info_t {
const std::string m_name;
amd_dbgapi_global_address_t m_value;
amd_dbgapi_size_t m_size;
};
using symbol_map_t =
std::optional
< std::map
< amd_dbgapi_global_address_t
, std::pair<std::string, amd_dbgapi_size_t>
>
>;
using symbol_map_t = std::optional<
std::map<amd_dbgapi_global_address_t, std::pair<std::string, amd_dbgapi_size_t>>>;
public:
void load_symbol_map();
void load_debug_info();
public:
void load_symbol_map();
void load_debug_info();
std::optional<symbol_info_t>
find_symbol(amd_dbgapi_global_address_t address);
std::optional<symbol_info_t> find_symbol(amd_dbgapi_global_address_t address);
code_object_t(amd_dbgapi_code_object_id_t code_object_id);
code_object_t(code_object_t &&rhs);
code_object_t(amd_dbgapi_code_object_id_t code_object_id);
code_object_t(code_object_t&& rhs);
~code_object_t();
~code_object_t();
void open();
bool is_open() const { return m_fd.has_value(); }
void open();
bool is_open() const { return m_fd.has_value(); }
amd_dbgapi_global_address_t load_address() const { return m_load_address; }
amd_dbgapi_size_t mem_size() const { return m_mem_size; }
// FIXME(?): extra function not in rocr-debug-agent
uint32_t elf_amdgpu_machine() const { return m_elf_amdgpu_machine; }
amd_dbgapi_global_address_t load_address() const { return m_load_address; }
amd_dbgapi_size_t mem_size() const { return m_mem_size; }
// FIXME(?): extra function not in rocr-debug-agent
uint32_t elf_amdgpu_machine() const { return m_elf_amdgpu_machine; }
void disassemble_around(amd_dbgapi_architecture_id_t architecture_id,
amd_dbgapi_global_address_t pc);
void disassemble_around(amd_dbgapi_architecture_id_t architecture_id,
amd_dbgapi_global_address_t pc);
void disassemble_kernel(amd_dbgapi_architecture_id_t architecture_id,
amd_dbgapi_global_address_t start_addr,
bool const print_src = false);
void disassemble_kernel(amd_dbgapi_architecture_id_t architecture_id,
amd_dbgapi_global_address_t start_addr, bool const print_src = false);
bool save(const std::string &directory) const;
bool save(const std::string& directory) const;
amd_dbgapi_global_address_t m_load_address{ 0 };
amd_dbgapi_size_t m_mem_size{ 0 };
std::optional<int> m_fd;
amd_dbgapi_global_address_t m_load_address{0};
amd_dbgapi_size_t m_mem_size{0};
std::optional<int> m_fd;
std::optional
< std::map<amd_dbgapi_global_address_t, std::pair<std::string, size_t>>
>
m_line_number_map;
std::optional<std::map<amd_dbgapi_global_address_t, std::pair<std::string, size_t>>>
m_line_number_map;
std::optional
< std::map<amd_dbgapi_global_address_t, amd_dbgapi_global_address_t>
>
m_pc_ranges_map;
std::optional<std::map<amd_dbgapi_global_address_t, amd_dbgapi_global_address_t>> m_pc_ranges_map;
symbol_map_t m_symbol_map;
std::string m_uri;
amd_dbgapi_code_object_id_t const m_code_object_id;
// FIXME(?): extra field not in rocr-debug-agent
uint32_t m_elf_amdgpu_machine{ 0 };
symbol_map_t m_symbol_map;
std::string m_uri;
amd_dbgapi_code_object_id_t const m_code_object_id;
// FIXME(?): extra field not in rocr-debug-agent
uint32_t m_elf_amdgpu_machine{0};
};
} // namespace amd::debug_agent
} // namespace amd::debug_agent
enum struct disassembly_mode {
AROUND,
KERNEL
};
enum struct disassembly_mode { AROUND, KERNEL };
std::tuple
< amd_dbgapi_process_id_t
, std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>
>
std::tuple<amd_dbgapi_process_id_t,
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>>
init_disassembly();
void
disassemble(
disassembly_mode const mode,
amd_dbgapi_process_id_t const process_id,
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>
&code_object_map,
uint64_t const addr);
void disassemble(
disassembly_mode const mode, amd_dbgapi_process_id_t const process_id,
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>& code_object_map,
uint64_t const addr);
void
print_pc_context(
amd_dbgapi_process_id_t const process_id,
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>
&code_object_map,
amd_dbgapi_global_address_t const pc);
void print_pc_context(
amd_dbgapi_process_id_t const process_id,
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>& code_object_map,
amd_dbgapi_global_address_t const pc);
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_CODE_PRINTING_HPP_
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_CODE_PRINTING_HPP_
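`code_object_t` keeps its symbols in `symbol_map_t`, an optional `std::map` keyed by start address with a `(name, size)` value, and declares `find_symbol(address)`. The implementation is not part of this diff. One common way to answer "which symbol contains this PC?" with such a map is `upper_bound`, followed by a step back and a size check. The sketch below uses plain integer stand-ins for the dbgapi address and size types and a free function instead of the member:

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

using address_t = std::uint64_t;  // stand-in for amd_dbgapi_global_address_t
using size_type = std::uint64_t;  // stand-in for amd_dbgapi_size_t

struct symbol_info_t {
  std::string m_name;
  address_t m_value;
  size_type m_size;
};

// start address -> (symbol name, symbol size), as in code_object_t::symbol_map_t
using symbol_map_t = std::optional<std::map<address_t, std::pair<std::string, size_type>>>;

// Returns the symbol whose [start, start + size) range contains `address`.
std::optional<symbol_info_t> find_symbol(const symbol_map_t& symbols, address_t address) {
  if (!symbols) return std::nullopt;
  auto it = symbols->upper_bound(address);  // first symbol starting after address
  if (it == symbols->begin()) return std::nullopt;
  --it;  // last symbol starting at or before address
  const address_t start = it->first;
  const auto& [name, size] = it->second;
  if (address - start >= size) return std::nullopt;  // past the end of that symbol
  return symbol_info_t{name, start, size};
}
```

Each lookup is logarithmic in the number of symbols, which matters when every PC sample needs to be attributed to a kernel.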
+82 -121
@@ -47,169 +47,130 @@
#include "program.hpp"
struct libc_freer {
void operator()(char *p) { free(p); }
void operator()(char* p) { free(p); }
};
namespace util {
template <typename T, typename... Ts>
static void
hash_combine(size_t &hsh, T const& v, Ts const&... rest)
{
hsh ^= std::hash<T>{}(v) + 0x9e3779b9 + (hsh << 6) + (hsh >> 2);
(hash_combine(hsh, rest), ...);
static void hash_combine(size_t& hsh, T const& v, Ts const&... rest) {
hsh ^= std::hash<T>{}(v) + 0x9e3779b9 + (hsh << 6) + (hsh >> 2);
(hash_combine(hsh, rest), ...);
}
} // namespace util
} // namespace util
[[maybe_unused]]
static inline bool
operator==(hsa_executable_t const &l, hsa_executable_t const &r)
{
return l.handle == r.handle;
[[maybe_unused]] static inline bool operator==(hsa_executable_t const& l,
hsa_executable_t const& r) {
return l.handle == r.handle;
}
[[maybe_unused]]
static inline bool
operator==(
rocprofiler_kernel_dispatch_id_t const &l,
rocprofiler_kernel_dispatch_id_t const &r)
{
return l.value == r.value;
[[maybe_unused]] static inline bool operator==(rocprofiler_kernel_dispatch_id_t const& l,
rocprofiler_kernel_dispatch_id_t const& r) {
return l.value == r.value;
}
static inline bool
operator==(amd_dbgapi_process_id_t const &l, amd_dbgapi_process_id_t const &r)
{
return l.handle == r.handle;
static inline bool operator==(amd_dbgapi_process_id_t const& l, amd_dbgapi_process_id_t const& r) {
return l.handle == r.handle;
}
static inline bool
operator!=(amd_dbgapi_process_id_t const &l, amd_dbgapi_process_id_t const &r)
{
return !(l == r);
static inline bool operator!=(amd_dbgapi_process_id_t const& l, amd_dbgapi_process_id_t const& r) {
return !(l == r);
}
namespace std {
template <>
struct hash<hsa_executable_t> {
size_t operator()(hsa_executable_t const &v) const {
size_t ret = 0;
util::hash_combine(ret, v.handle);
return ret;
}
template <> struct hash<hsa_executable_t> {
size_t operator()(hsa_executable_t const& v) const {
size_t ret = 0;
util::hash_combine(ret, v.handle);
return ret;
}
};
template <>
struct hash<rocprofiler_kernel_dispatch_id_t> {
size_t operator()(rocprofiler_kernel_dispatch_id_t const &v) const {
size_t ret = 0;
util::hash_combine(ret, v.value);
return ret;
}
template <> struct hash<rocprofiler_kernel_dispatch_id_t> {
size_t operator()(rocprofiler_kernel_dispatch_id_t const& v) const {
size_t ret = 0;
util::hash_combine(ret, v.value);
return ret;
}
};
} // namespace std
} // namespace std
struct disassembly_ctx_t {
disassembly_ctx_t();
~disassembly_ctx_t();
disassembly_ctx_t();
~disassembly_ctx_t();
void disassemble_kernels(bool const reinitialize);
void init();
bool inited() const;
void reset();
void disassemble_kernels(bool const reinitialize);
void init();
bool inited() const;
void reset();
amd_dbgapi_process_id_t process_id;
std::map
< amd_dbgapi_global_address_t
, amd::debug_agent::code_object_t
> codeobjs;
amd_dbgapi_process_id_t process_id;
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t> codeobjs;
};
disassembly_ctx_t::disassembly_ctx_t()
: process_id(AMD_DBGAPI_PROCESS_NONE)
, codeobjs()
{}
disassembly_ctx_t::disassembly_ctx_t() : process_id(AMD_DBGAPI_PROCESS_NONE), codeobjs() {}
disassembly_ctx_t::~disassembly_ctx_t()
{
disassembly_ctx_t::~disassembly_ctx_t() { reset(); }
void disassembly_ctx_t::disassemble_kernels(bool const reinitialize) {
if (reinitialize) {
reset();
}
}
if (!inited()) {
init();
}
void
disassembly_ctx_t::disassemble_kernels(bool const reinitialize)
{
if (reinitialize) {
reset();
}
if (!inited()) {
init();
auto it = codeobjs.begin();
auto const end = codeobjs.end();
auto const pred = [](decltype(*it)& x) {
/*
* A lame filter for the kernels in the current file, because nothing
* else in this little demo will have the URL prefix of `file://`.
*/
return x.second.m_uri.find("file://", 0, 7) != std::string::npos;
};
while (end != (it = std::find_if(it, end, pred))) {
auto& codeobj = it->second;
codeobj.load_symbol_map();
if (!codeobj.m_symbol_map) {
fputs(PROGNAME ": error: failed to load symbol map\n", stderr);
break;
}
auto it = codeobjs.begin();
auto const end = codeobjs.end();
auto const pred = [](decltype(*it) &x){
/*
* A lame filter for the kernels in the current file, because nothing
* else in this little demo will have the URL prefix of `file://`.
*/
return x.second.m_uri.find("file://", 0, 7) != std::string::npos;
};
while (end != (it = std::find_if(it, end, pred))) {
auto &codeobj = it->second;
codeobj.load_symbol_map();
if (!codeobj.m_symbol_map) {
fputs(PROGNAME ": error: failed to load symbol map\n", stderr);
break;
}
for (auto const &sym : *codeobj.m_symbol_map) {
auto const &addr = sym.first;
::disassemble(disassembly_mode::KERNEL, process_id, codeobjs, addr);
}
++it;
for (auto const& sym : *codeobj.m_symbol_map) {
auto const& addr = sym.first;
::disassemble(disassembly_mode::KERNEL, process_id, codeobjs, addr);
}
++it;
}
}
inline void
disassembly_ctx_t::init()
{
std::tie(process_id, codeobjs) = init_disassembly();
}
inline void disassembly_ctx_t::init() { std::tie(process_id, codeobjs) = init_disassembly(); }
inline bool
disassembly_ctx_t::inited() const
{
return AMD_DBGAPI_PROCESS_NONE != process_id;
}
inline bool disassembly_ctx_t::inited() const { return AMD_DBGAPI_PROCESS_NONE != process_id; }
void
disassembly_ctx_t::reset()
{
codeobjs.clear();
if (AMD_DBGAPI_PROCESS_NONE.handle != process_id.handle) {
amd_dbgapi_process_detach(process_id);
amd_dbgapi_finalize();
process_id = AMD_DBGAPI_PROCESS_NONE;
}
void disassembly_ctx_t::reset() {
codeobjs.clear();
if (AMD_DBGAPI_PROCESS_NONE.handle != process_id.handle) {
amd_dbgapi_process_detach(process_id);
amd_dbgapi_finalize();
process_id = AMD_DBGAPI_PROCESS_NONE;
}
}
static disassembly_ctx_t g_dis;
void
disassembly_disassemble_kernels(bool const reinitialize)
{
g_dis.disassemble_kernels(reinitialize);
void disassembly_disassemble_kernels(bool const reinitialize) {
g_dis.disassemble_kernels(reinitialize);
}
void
disassembly_print_pc_sample_context(amd_dbgapi_global_address_t const pc)
{
if (!g_dis.inited()) {
g_dis.init();
}
print_pc_context(g_dis.process_id, g_dis.codeobjs, pc);
void disassembly_print_pc_sample_context(amd_dbgapi_global_address_t const pc) {
if (!g_dis.inited()) {
g_dis.init();
}
print_pc_context(g_dis.process_id, g_dis.codeobjs, pc);
}
+3 -5
@@ -23,10 +23,8 @@
#include <amd-dbgapi/amd-dbgapi.h>
void
disassembly_disassemble_kernels(bool const);
void disassembly_disassemble_kernels(bool const);
void
disassembly_print_pc_sample_context(amd_dbgapi_global_address_t const);
void disassembly_print_pc_sample_context(amd_dbgapi_global_address_t const);
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_DISASSEMBLY_HPP_
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_DISASSEMBLY_HPP_
+246 -310
@@ -46,274 +46,227 @@
namespace util {
struct hipMalloc_freer {
void operator()(void * const ptr) { (void)hipFree(ptr); }
void operator()(void* const ptr) { (void)hipFree(ptr); }
};
} // namespace util
} // namespace util
namespace prng {
static uint64_t
splitmix64_next(uint64_t * const sm64_state)
{
uint64_t z = (*sm64_state += 0x9e3779b97f4a7c15);
z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9;
z = (z ^ (z >> 27)) * 0x94d049bb133111eb;
return z ^ (z >> 31);
static uint64_t splitmix64_next(uint64_t* const sm64_state) {
uint64_t z = (*sm64_state += 0x9e3779b97f4a7c15);
z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9;
z = (z ^ (z >> 27)) * 0x94d049bb133111eb;
return z ^ (z >> 31);
}
static inline uint64_t
rotl64(const uint64_t x, int k)
{
return (x << k) | (x >> (64 - k));
static inline uint64_t rotl64(const uint64_t x, int k) { return (x << k) | (x >> (64 - k)); }
static uint64_t xrs_next(uint64_t* const xrs_state) {
const uint64_t result = rotl64(xrs_state[0] + xrs_state[3], 23) + xrs_state[0];
const uint64_t t = xrs_state[1] << 17;
xrs_state[2] ^= xrs_state[0];
xrs_state[3] ^= xrs_state[1];
xrs_state[1] ^= xrs_state[2];
xrs_state[0] ^= xrs_state[3];
xrs_state[2] ^= t;
xrs_state[3] = rotl64(xrs_state[3], 45);
return result;
}
static uint64_t
xrs_next(uint64_t * const xrs_state)
{
const uint64_t result =
rotl64(xrs_state[0] + xrs_state[3], 23) + xrs_state[0];
const uint64_t t = xrs_state[1] << 17;
xrs_state[2] ^= xrs_state[0];
xrs_state[3] ^= xrs_state[1];
xrs_state[1] ^= xrs_state[2];
xrs_state[0] ^= xrs_state[3];
xrs_state[2] ^= t;
xrs_state[3] = rotl64(xrs_state[3], 45);
return result;
}
} // namespace prng
} // namespace prng
namespace kernel {
template <typename T>
__global__ static void
memset_gpu(T * const s, T const c, size_t const n)
{
size_t i_start = threadIdx.x + blockIdx.x * blockDim.x;
size_t i_shift = blockDim.x * gridDim.x;
for (size_t i = i_start; i < n; i += i_shift) {
s[i] = c;
}
template <typename T> __global__ static void memset_gpu(T* const s, T const c, size_t const n) {
size_t i_start = threadIdx.x + blockIdx.x * blockDim.x;
size_t i_shift = blockDim.x * gridDim.x;
for (size_t i = i_start; i < n; i += i_shift) {
s[i] = c;
}
}
template <typename T>
__global__ static void
count_gpu(
T const * const xs,
T * const out,
size_t const n,
size_t const nblocks,
T const gt)
{
size_t i_start = threadIdx.x + blockIdx.x * blockDim.x;
size_t i_shift = blockDim.x * gridDim.x;
for (size_t i = i_start; i < n; i += i_shift) {
if (xs[i] > gt) {
atomicAdd(&out[i % nblocks], 1);
}
__global__ static void count_gpu(T const* const xs, T* const out, size_t const n,
size_t const nblocks, T const gt) {
size_t i_start = threadIdx.x + blockIdx.x * blockDim.x;
size_t i_shift = blockDim.x * gridDim.x;
for (size_t i = i_start; i < n; i += i_shift) {
if (xs[i] > gt) {
atomicAdd(&out[i % nblocks], 1);
}
}
}
} // namespace kernel
} // namespace kernel
static char const GETOPT_ARGS[] = "cd:mn:DP";
static void
usage()
{
fputs("usage: " PROGNAME " [OPTION]... MIN [SEED]\n"
" -d DEV\tHIP device number\n"
" -n LEN\tLength of random integer array\n"
" -D\t\tPrint kernel disassembly\n"
" -P\t\tPrint source and disassembly of sampled PC locations\n"
"where\n"
" DEV : i32\n"
" MIN : u64\n"
" LEN : u64\n"
" SEED : u64\n",
stderr);
static void usage() {
fputs("usage: " PROGNAME
" [OPTION]... MIN [SEED]\n"
" -d DEV\tHIP device number\n"
" -n LEN\tLength of random integer array\n"
" -D\t\tPrint kernel disassembly\n"
" -P\t\tPrint source and disassembly of sampled PC locations\n"
"where\n"
" DEV : i32\n"
" MIN : u64\n"
" LEN : u64\n"
" SEED : u64\n",
stderr);
}
static int
get_options(int argc, char **argv, program_options * const opts)
{
int opt;
static int get_options(int argc, char** argv, program_options* const opts) {
int opt;
while (-1 != (opt = getopt(argc, argv, GETOPT_ARGS))) {
switch (opt) {
case 'd':
// TODO error checking
opts->device = strtol(optarg, nullptr, 10);
break;
case 'n':
// TODO error checking
opts->rands_len = strtoul(optarg, nullptr, 10);
break;
case 'D':
opts->disassemble = true;
break;
case 'P':
opts->pc_sampling = true;
break;
default:
usage();
return EXIT_FAILURE;
}
}
auto const optcount = argc - optind;
if (!(1 == optcount || 2 == optcount)) {
while (-1 != (opt = getopt(argc, argv, GETOPT_ARGS))) {
switch (opt) {
case 'd':
// TODO error checking
opts->device = strtol(optarg, nullptr, 10);
break;
case 'n':
// TODO error checking
opts->rands_len = strtoul(optarg, nullptr, 10);
break;
case 'D':
opts->disassemble = true;
break;
case 'P':
opts->pc_sampling = true;
break;
default:
usage();
return EXIT_FAILURE;
}
}
// TODO error checking
opts->gt = strtoul(argv[optind], nullptr, 10);
if (2 == argc - optind) {
opts->seed = strtoull(argv[optind + 1], nullptr, 10);
}
auto const optcount = argc - optind;
if (!(1 == optcount || 2 == optcount)) {
usage();
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
// TODO error checking
opts->gt = strtoul(argv[optind], nullptr, 10);
if (2 == argc - optind) {
opts->seed = strtoull(argv[optind + 1], nullptr, 10);
}
return EXIT_SUCCESS;
}
static program_options g_opts;
static void
callback_flush_fn(
rocprofiler_record_header_t const *record,
rocprofiler_record_header_t const *end_record,
rocprofiler_session_id_t session_id,
rocprofiler_buffer_id_t buffer_id)
{
while (record < end_record) {
if (nullptr == record) {
break;
}
if (ROCPROFILER_PC_SAMPLING_RECORD == record->kind) {
auto const &pcr = (rocprofiler_record_pc_sample_t &)*record;
printf(
"dispatch[%" PRIu64 "] timestamp(%" PRIu64
") gpu_id(%#" PRIx64 ") pc-sample(%#" PRIx64
") se(%" PRIu32 ")\n",
pcr.pc_sample.dispatch_id.value,
pcr.pc_sample.timestamp.value,
pcr.pc_sample.gpu_id.handle,
pcr.pc_sample.pc,
pcr.pc_sample.se);
if (g_opts.pc_sampling) {
disassembly_print_pc_sample_context(pcr.pc_sample.pc);
}
}
rocprofiler_next_record(record, &record, session_id, buffer_id);
static void callback_flush_fn(rocprofiler_record_header_t const* record,
rocprofiler_record_header_t const* end_record,
rocprofiler_session_id_t session_id,
rocprofiler_buffer_id_t buffer_id) {
while (record < end_record) {
if (nullptr == record) {
break;
}
if (ROCPROFILER_PC_SAMPLING_RECORD == record->kind) {
auto const& pcr = (rocprofiler_record_pc_sample_t&)*record;
printf("dispatch[%" PRIu64 "] timestamp(%" PRIu64 ") gpu_id(%#" PRIx64 ") pc-sample(%#" PRIx64
") se(%" PRIu32 ")\n",
pcr.pc_sample.dispatch_id.value, pcr.pc_sample.timestamp.value,
pcr.pc_sample.gpu_id.handle, pcr.pc_sample.pc, pcr.pc_sample.se);
if (g_opts.pc_sampling) {
disassembly_print_pc_sample_context(pcr.pc_sample.pc);
}
}
rocprofiler_next_record(record, &record, session_id, buffer_id);
}
}
static int
run_kernel(program_options const &opts)
{
rocprofiler_session_id_t sid;
rocprofiler_filter_id_t fid, fid2;
rocprofiler_buffer_id_t bid;
auto rocprofiler_ok = ROCPROFILER_STATUS_SUCCESS;
static int run_kernel(program_options const& opts) {
rocprofiler_session_id_t sid;
rocprofiler_filter_id_t fid, fid2;
rocprofiler_buffer_id_t bid;
auto rocprofiler_ok = ROCPROFILER_STATUS_SUCCESS;
if (opts.pc_sampling) {
ROCPROFILER_CHECK(
rocprofiler_create_session(ROCPROFILER_NONE_REPLAY_MODE, &sid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
fputs("error: failed to create rocprofiler session\n", stderr);
return EXIT_FAILURE;
}
rocprofiler_filter_property_t property{};
ROCPROFILER_CHECK(
rocprofiler_create_buffer(
sid, callback_flush_fn, static_cast<size_t>(0x1000), &bid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
fputs("error: failed to add PC sampling session mode\n", stderr);
goto out;
}
ROCPROFILER_CHECK(
rocprofiler_create_filter(
sid, ROCPROFILER_PC_SAMPLING_COLLECTION,
rocprofiler_filter_data_t{},
0, &fid, property),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
ROCPROFILER_CHECK(
rocprofiler_create_filter(
sid, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION,
rocprofiler_filter_data_t{},
0, &fid2, property),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
ROCPROFILER_CHECK(
rocprofiler_set_filter_buffer(sid, fid, bid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
ROCPROFILER_CHECK(
rocprofiler_set_filter_buffer(sid, fid2, bid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
ROCPROFILER_CHECK(
rocprofiler_start_session(sid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
if (opts.pc_sampling) {
ROCPROFILER_CHECK(rocprofiler_create_session(ROCPROFILER_NONE_REPLAY_MODE, &sid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
fputs("error: failed to create rocprofiler session\n", stderr);
return EXIT_FAILURE;
}
{
rocprofiler_filter_property_t property{};
ROCPROFILER_CHECK(
rocprofiler_create_buffer(sid, callback_flush_fn, static_cast<size_t>(0x1000), &bid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
fputs("error: failed to add PC sampling session mode\n", stderr);
goto out;
}
ROCPROFILER_CHECK(rocprofiler_create_filter(sid, ROCPROFILER_PC_SAMPLING_COLLECTION,
rocprofiler_filter_data_t{}, 0, &fid, property),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
ROCPROFILER_CHECK(rocprofiler_create_filter(sid, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION,
rocprofiler_filter_data_t{}, 0, &fid2, property),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
ROCPROFILER_CHECK(rocprofiler_set_filter_buffer(sid, fid, bid), rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
ROCPROFILER_CHECK(rocprofiler_set_filter_buffer(sid, fid2, bid), rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
ROCPROFILER_CHECK(rocprofiler_start_session(sid), rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
}
{
printf("seed = %" PRIu64 "\n", opts.seed);
std::vector<uint64_t> rands(opts.rands_len);
using rands_elt_t = decltype(rands)::value_type;
uint64_t
sm64_state = opts.seed,
xrs_state[4];
uint64_t sm64_state = opts.seed, xrs_state[4];
{
using prng::splitmix64_next;
using prng::xrs_next;
using prng::splitmix64_next;
using prng::xrs_next;
// Initialize the Xoroshiro PRNG
xrs_state[0] = splitmix64_next(&sm64_state);
xrs_state[1] = splitmix64_next(&sm64_state);
xrs_state[2] = splitmix64_next(&sm64_state);
xrs_state[3] = splitmix64_next(&sm64_state);
// Initialize the Xoroshiro PRNG
xrs_state[0] = splitmix64_next(&sm64_state);
xrs_state[1] = splitmix64_next(&sm64_state);
xrs_state[2] = splitmix64_next(&sm64_state);
xrs_state[3] = splitmix64_next(&sm64_state);
// Fill rands with random integers
for (auto &i : rands) {
i = xrs_next(xrs_state);
}
// Fill rands with random integers
for (auto& i : rands) {
i = xrs_next(xrs_state);
}
}
struct tm {
using monoclk = std::chrono::steady_clock;
using dur = std::chrono::duration<double>;
using monoclk = std::chrono::steady_clock;
using dur = std::chrono::duration<double>;
};
using util::hipMalloc_freer;
@@ -322,126 +275,109 @@ run_kernel(program_options const &opts)
auto hip_ok = hipSuccess;
do {
HIP_CHECK_BREAK(hipSetDevice(opts.device), hip_ok);
HIP_CHECK_BREAK(hipSetDevice(opts.device), hip_ok);
auto const rands_nbytes = rands.size() * sizeof(rands_elt_t);
std::unique_ptr<rands_elt_t, hipMalloc_freer> rands_gpu;
{
rands_elt_t *rands_gpu_ptr;
HIP_CHECK_BREAK(hipMalloc(&rands_gpu_ptr, rands_nbytes), hip_ok);
rands_gpu.reset(rands_gpu_ptr);
}
auto const rands_nbytes = rands.size() * sizeof(rands_elt_t);
std::unique_ptr<rands_elt_t, hipMalloc_freer> rands_gpu;
{
rands_elt_t* rands_gpu_ptr;
HIP_CHECK_BREAK(hipMalloc(&rands_gpu_ptr, rands_nbytes), hip_ok);
rands_gpu.reset(rands_gpu_ptr);
}
HIP_CHECK_BREAK(
hipMemcpy(rands_gpu.get(), rands.data(), rands_nbytes,
hipMemcpyHostToDevice),
hip_ok);
(void)hipDeviceSynchronize();
HIP_CHECK_BREAK(hipMemcpy(rands_gpu.get(), rands.data(), rands_nbytes, hipMemcpyHostToDevice),
hip_ok);
(void)hipDeviceSynchronize();
uint32_t constexpr nthreads = 256U;
uint32_t const nblocks = (rands.size() + nthreads - 1) / nthreads;
uint32_t constexpr nthreads = 256U;
uint32_t const nblocks = (rands.size() + nthreads - 1) / nthreads;
using count_elt_t = size_t;
using count_elt_t = size_t;
auto const count_subtotals_nbytes = nblocks * sizeof(count_elt_t);
std::unique_ptr<count_elt_t, hipMalloc_freer> count_subtotals_gpu;
{
count_elt_t *count_subtotals_gpu_ptr;
HIP_CHECK_BREAK(
hipMalloc(&count_subtotals_gpu_ptr, count_subtotals_nbytes),
hip_ok);
count_subtotals_gpu.reset(count_subtotals_gpu_ptr);
}
auto const count_subtotals_nbytes = nblocks * sizeof(count_elt_t);
std::unique_ptr<count_elt_t, hipMalloc_freer> count_subtotals_gpu;
{
count_elt_t* count_subtotals_gpu_ptr;
HIP_CHECK_BREAK(hipMalloc(&count_subtotals_gpu_ptr, count_subtotals_nbytes), hip_ok);
count_subtotals_gpu.reset(count_subtotals_gpu_ptr);
}
hipLaunchKernelGGL(
kernel::memset_gpu, nblocks, nthreads, 0, 0,
count_subtotals_gpu.get(), 0UL, static_cast<size_t>(nblocks));
HIP_CHECK_BREAK(hipGetLastError(), hip_ok);
(void)hipDeviceSynchronize();
hipLaunchKernelGGL(kernel::memset_gpu, nblocks, nthreads, 0, 0, count_subtotals_gpu.get(),
0UL, static_cast<size_t>(nblocks));
HIP_CHECK_BREAK(hipGetLastError(), hip_ok);
(void)hipDeviceSynchronize();
auto const kernel_begin_time = tm::monoclk::now();
auto const kernel_begin_time = tm::monoclk::now();
hipLaunchKernelGGL(
kernel::count_gpu, nblocks, nthreads, 0, 0,
rands_gpu.get(), count_subtotals_gpu.get(), rands.size(),
static_cast<size_t>(nblocks), opts.gt);
HIP_CHECK_BREAK(hipGetLastError(), hip_ok);
(void)hipDeviceSynchronize();
hipLaunchKernelGGL(kernel::count_gpu, nblocks, nthreads, 0, 0, rands_gpu.get(),
count_subtotals_gpu.get(), rands.size(), static_cast<size_t>(nblocks),
opts.gt);
HIP_CHECK_BREAK(hipGetLastError(), hip_ok);
(void)hipDeviceSynchronize();
auto const kernel_end_time = tm::monoclk::now();
auto const kernel_end_time = tm::monoclk::now();
std::vector<size_t> count_subtotals(nblocks);
HIP_CHECK_BREAK(
hipMemcpy(count_subtotals.data(), count_subtotals_gpu.get(),
count_subtotals_nbytes, hipMemcpyDeviceToHost),
hip_ok);
(void)hipDeviceSynchronize();
std::vector<size_t> count_subtotals(nblocks);
HIP_CHECK_BREAK(hipMemcpy(count_subtotals.data(), count_subtotals_gpu.get(),
count_subtotals_nbytes, hipMemcpyDeviceToHost),
hip_ok);
(void)hipDeviceSynchronize();
// TODO parallel sum on GPU
auto const total =
std::accumulate(
count_subtotals.cbegin(), count_subtotals.cend(),
static_cast<size_t>(0));
// TODO parallel sum on GPU
auto const total =
std::accumulate(count_subtotals.cbegin(), count_subtotals.cend(), static_cast<size_t>(0));
auto const all_end_time = tm::monoclk::now();
auto const all_end_time = tm::monoclk::now();
tm::dur const kernel_time(kernel_end_time - kernel_begin_time);
auto total_time(all_end_time - begin_time);
tm::dur const total_time_without_tool_init(total_time);
printf("len(rands) = %zu; gt = %zu; count(rands, gt) = %zu\n"
"main kernel time elapsed: %" DBL_FMT "\n"
"full time elapsed: %" DBL_FMT "\n",
rands.size(), opts.gt, total,
kernel_time.count(),
total_time_without_tool_init.count());
tm::dur const kernel_time(kernel_end_time - kernel_begin_time);
auto total_time(all_end_time - begin_time);
tm::dur const total_time_without_tool_init(total_time);
printf(
"len(rands) = %zu; gt = %zu; count(rands, gt) = %zu\n"
"main kernel time elapsed: %" DBL_FMT
"\n"
"full time elapsed: %" DBL_FMT "\n",
rands.size(), opts.gt, total, kernel_time.count(), total_time_without_tool_init.count());
} while (false);
if (opts.disassemble) {
disassembly_disassemble_kernels(false);
}
disassembly_disassemble_kernels(false);
}
}
cleanup:
if (opts.pc_sampling) {
rocprofiler_terminate_session(sid);
rocprofiler_flush_data(sid, bid);
rocprofiler_destroy_session(sid);
}
if (opts.pc_sampling) {
rocprofiler_terminate_session(sid);
rocprofiler_flush_data(sid, bid);
rocprofiler_destroy_session(sid);
}
out:
return ROCPROFILER_STATUS_SUCCESS == rocprofiler_ok
? EXIT_SUCCESS
: EXIT_FAILURE;
return ROCPROFILER_STATUS_SUCCESS == rocprofiler_ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
int
main(int argc, char **argv)
{
if (auto const ret = get_options(argc, argv, &g_opts);
EXIT_SUCCESS != ret)
{
return ret;
}
int main(int argc, char** argv) {
if (auto const ret = get_options(argc, argv, &g_opts); EXIT_SUCCESS != ret) {
return ret;
}
if (hsa_init() != HSA_STATUS_SUCCESS){
return EXIT_FAILURE;
}
if (hsa_init() != HSA_STATUS_SUCCESS) {
return EXIT_FAILURE;
}
int ret = EXIT_FAILURE;
auto ok = ROCPROFILER_STATUS_SUCCESS;
int ret = EXIT_FAILURE;
auto ok = ROCPROFILER_STATUS_SUCCESS;
ROCPROFILER_CHECK(rocprofiler_initialize(), ok);
if (ROCPROFILER_STATUS_SUCCESS == ok) {
ret = run_kernel(g_opts);
} else {
goto out;
}
ROCPROFILER_CHECK(rocprofiler_initialize(), ok);
if (ROCPROFILER_STATUS_SUCCESS == ok) {
ret = run_kernel(g_opts);
} else {
goto out;
}
rocprofiler_finalize();
rocprofiler_finalize();
out:
hsa_shut_down();
return ROCPROFILER_STATUS_SUCCESS == ok && EXIT_FAILURE != ret
? EXIT_SUCCESS
: EXIT_FAILURE;
hsa_shut_down();
return ROCPROFILER_STATUS_SUCCESS == ok && EXIT_FAILURE != ret ? EXIT_SUCCESS : EXIT_FAILURE;
}
+23 -25
@@ -23,32 +23,30 @@
#define PROGNAME "code_printing_sample"
#define HIP_ERROR(code) \
do { \
fprintf(stderr, \
PROGNAME ": Assertion failed at %s:%d, HIP error: %s\n", \
__FILE__, __LINE__, hipGetErrorString((code))); \
fflush(stderr); \
} while (false);
#define HIP_ERROR(code) \
do { \
fprintf(stderr, PROGNAME ": Assertion failed at %s:%d, HIP error: %s\n", __FILE__, __LINE__, \
hipGetErrorString((code))); \
fflush(stderr); \
} while (false);
#define HIP_CHECK_BREAK(expr, var) \
if (auto const code = (expr); hipSuccess != code) { \
HIP_ERROR(code); \
(var) = code; \
break; \
}
#define HIP_CHECK_BREAK(expr, var) \
if (auto const code = (expr); hipSuccess != code) { \
HIP_ERROR(code); \
(var) = code; \
break; \
}
#define ROCPROFILER_ERROR(code) \
do { \
fprintf(stderr, \
PROGNAME ": Assertion failed at %s:%d, ROCProfiler error: %s\n", \
__FILE__, __LINE__, rocprofiler_error_str(code)); \
fflush(stderr); \
} while (false);
#define ROCPROFILER_ERROR(code) \
do { \
fprintf(stderr, PROGNAME ": Assertion failed at %s:%d, ROCProfiler error: %s\n", __FILE__, \
__LINE__, rocprofiler_error_str(code)); \
fflush(stderr); \
} while (false);
#define ROCPROFILER_CHECK(expr, var) \
if ((var) = (expr); ROCPROFILER_STATUS_SUCCESS != (var)) { \
ROCPROFILER_ERROR((var)); \
}
#define ROCPROFILER_CHECK(expr, var) \
if ((var) = (expr); ROCPROFILER_STATUS_SUCCESS != (var)) { \
ROCPROFILER_ERROR((var)); \
}
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_PROGRAM_HPP_
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_PROGRAM_HPP_
+18 -19
@@ -25,25 +25,24 @@
#include <cstdint>
struct program_options {
program_options()
: device(0)
, no_gpu(false)
, hip_memset(false)
, rands_len(1024 * 1024 * 4)
, gt(0)
, seed(std::chrono::steady_clock::now().time_since_epoch().count())
, disassemble(false)
, pc_sampling(false)
{}
program_options()
: device(0),
no_gpu(false),
hip_memset(false),
rands_len(1024 * 1024 * 4),
gt(0),
seed(std::chrono::steady_clock::now().time_since_epoch().count()),
disassemble(false),
pc_sampling(false) {}
int device;
bool no_gpu;
bool hip_memset;
size_t rands_len;
uint64_t gt;
uint64_t seed;
bool disassemble;
bool pc_sampling;
int device;
bool no_gpu;
bool hip_memset;
size_t rands_len;
uint64_t gt;
uint64_t seed;
bool disassemble;
bool pc_sampling;
};
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_PROGRAM_OPTIONS_HPP_
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_PROGRAM_OPTIONS_HPP_
+2 -2
@@ -23,8 +23,8 @@ int main(int argc, char** argv) {
int gpu_agent = 0;
int cpu_agent = 0;
CHECK_ROCPROFILER(rocprofiler_device_profiling_session_create(&counters[0], counters.size(),
&dp_session_id, gpu_agent, cpu_agent));
CHECK_ROCPROFILER(rocprofiler_device_profiling_session_create(
&counters[0], counters.size(), &dp_session_id, gpu_agent, cpu_agent));
printf("session start \n");
// start GPU device profiling
+4 -3
@@ -25,9 +25,10 @@ int main(int argc, char** argv) {
counters.emplace_back("GRBM_COUNT");
rocprofiler_filter_id_t filter_id;
[[maybe_unused]] rocprofiler_filter_property_t property = {};
CHECK_ROCPROFILER(rocprofiler_create_filter(session_id, ROCPROFILER_COUNTERS_COLLECTION,
rocprofiler_filter_data_t{.counters_names = &counters[0]},
counters.size(), &filter_id, property));
CHECK_ROCPROFILER(
rocprofiler_create_filter(session_id, ROCPROFILER_COUNTERS_COLLECTION,
rocprofiler_filter_data_t{.counters_names = &counters[0]},
counters.size(), &filter_id, property));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, filter_id, buffer_id));
// Normal HIP Calls
+3 -3
@@ -40,9 +40,9 @@ int main(int argc, char** argv) {
// Kernel Tracing
rocprofiler_filter_id_t kernel_tracing_filter_id;
CHECK_ROCPROFILER(rocprofiler_create_filter(session_id, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION,
rocprofiler_filter_data_t{}, 0, &kernel_tracing_filter_id,
rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_create_filter(
session_id, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION, rocprofiler_filter_data_t{}, 0,
&kernel_tracing_filter_id, rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, kernel_tracing_filter_id, buffer_id));
// Normal HIP Calls won't be traced
+3 -3
@@ -35,9 +35,9 @@ int main(int argc, char** argv) {
// Kernel Tracing
rocprofiler_filter_id_t kernel_tracing_filter_id;
CHECK_ROCPROFILER(rocprofiler_create_filter(session_id, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION,
rocprofiler_filter_data_t{}, 0, &kernel_tracing_filter_id,
rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_create_filter(
session_id, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION, rocprofiler_filter_data_t{}, 0,
&kernel_tracing_filter_id, rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, kernel_tracing_filter_id, buffer_id));
// Normal HIP Calls won't be traced
+287 -201
@@ -1,25 +1,34 @@
# ############################################################################################################################################
# ROCProfiler General Requirements
# ############################################################################################################################################
find_package(Python3 COMPONENTS Interpreter REQUIRED)
find_package(
Python3
COMPONENTS Interpreter
REQUIRED)
execute_process(COMMAND ${Python3_EXECUTABLE} -c "import lxml"
RESULT_VARIABLE CPP_HEADER_PARSER
OUTPUT_QUIET)
execute_process(
COMMAND ${Python3_EXECUTABLE} -c "import lxml"
RESULT_VARIABLE CPP_HEADER_PARSER
OUTPUT_QUIET)
if(NOT ${CPP_HEADER_PARSER} EQUAL 0)
message(FATAL_ERROR "\
message(
FATAL_ERROR
"\
The \"lxml\" Python3 package is not installed. \
Please install it using the following command: \"${Python3_EXECUTABLE} -m pip install lxml\".\
")
endif()
execute_process(COMMAND ${Python3_EXECUTABLE} -c "import CppHeaderParser"
RESULT_VARIABLE CPP_HEADER_PARSER
OUTPUT_QUIET)
execute_process(
COMMAND ${Python3_EXECUTABLE} -c "import CppHeaderParser"
RESULT_VARIABLE CPP_HEADER_PARSER
OUTPUT_QUIET)
if(NOT ${CPP_HEADER_PARSER} EQUAL 0)
message(FATAL_ERROR "\
message(
FATAL_ERROR
"\
The \"CppHeaderParser\" Python3 package is not installed. \
Please install it using the following command: \"${Python3_EXECUTABLE} -m pip install CppHeaderParser\".\
")
@@ -29,134 +38,157 @@ endif()
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR})
# Getting HSA Include Directory
get_property(HSA_RUNTIME_INCLUDE_DIRECTORIES TARGET hsa-runtime64::hsa-runtime64 PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH
REQUIRED)
get_property(
HSA_RUNTIME_INCLUDE_DIRECTORIES
TARGET hsa-runtime64::hsa-runtime64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH REQUIRED)
get_filename_component(HSA_RUNTIME_INC_PATH ${HSA_H} DIRECTORY)
find_library(AQLPROFILE_LIB "libhsa-amd-aqlprofile64.so" HINTS ${CMAKE_PREFIX_PATH} PATHS ${ROCM_PATH} PATH_SUFFIXES lib)
find_library(
AQLPROFILE_LIB "libhsa-amd-aqlprofile64.so"
HINTS ${CMAKE_PREFIX_PATH}
PATHS ${ROCM_PATH}
PATH_SUFFIXES lib)
if(NOT AQLPROFILE_LIB)
message(FATAL_ERROR "AQL_PROFILE not installed. Please install hsa-amd-aqlprofile!")
message(FATAL_ERROR "AQL_PROFILE not installed. Please install hsa-amd-aqlprofile!")
endif()
# ############################################################################################################################################
# ########################################################################################
# Adding Old Library Files
# ############################################################################################################################################
set (OLD_LIB_SRC
${LIB_DIR}/core/rocprofiler.cpp
${LIB_DIR}/core/gpu_command.cpp
${LIB_DIR}/core/proxy_queue.cpp
${LIB_DIR}/core/simple_proxy_queue.cpp
${LIB_DIR}/core/intercept_queue.cpp
${LIB_DIR}/core/metrics.cpp
${LIB_DIR}/core/activity.cpp
${LIB_DIR}/util/hsa_rsrc_factory.cpp
)
# ########################################################################################
set(OLD_LIB_SRC
${LIB_DIR}/core/rocprofiler.cpp
${LIB_DIR}/core/gpu_command.cpp
${LIB_DIR}/core/proxy_queue.cpp
${LIB_DIR}/core/simple_proxy_queue.cpp
${LIB_DIR}/core/intercept_queue.cpp
${LIB_DIR}/core/metrics.cpp
${LIB_DIR}/core/activity.cpp
${LIB_DIR}/util/hsa_rsrc_factory.cpp)
# ############################################################################################################################################
# ########################################################################################
# Configuring Basic/Derived Counters
# ############################################################################################################################################
# ########################################################################################
set(COUNTERS_DIR ${PROJECT_SOURCE_DIR}/src/core/counters)
execute_process(
COMMAND ${Python3_EXECUTABLE} ${COUNTERS_DIR}/basic/xml_parser_basic.py ${COUNTERS_DIR}/basic ${CMAKE_CURRENT_BINARY_DIR}/basic_counter.cpp
COMMENT "Generating basic_counter.cpp...")
COMMAND
${Python3_EXECUTABLE} ${COUNTERS_DIR}/basic/xml_parser_basic.py
${COUNTERS_DIR}/basic ${CMAKE_CURRENT_BINARY_DIR}/basic_counter.cpp COMMENT
"Generating basic_counter.cpp...")
# execute_process(
# COMMAND ${Python3_EXECUTABLE} ${COUNTERS_DIR}/derived/xml_parser_derived.py ${COUNTERS_DIR}/derived ${CMAKE_CURRENT_BINARY_DIR}/derived_counter.cpp
# COMMENT "Generating derived_counter.cpp...")
# execute_process( COMMAND ${Python3_EXECUTABLE}
# ${COUNTERS_DIR}/derived/xml_parser_derived.py ${COUNTERS_DIR}/derived
# ${CMAKE_CURRENT_BINARY_DIR}/derived_counter.cpp COMMENT "Generating
# derived_counter.cpp...")
# ############################################################################################################################################
# ########################################################################################
# ROCProfiler Tracer HIP/HSA Parsing
# ############################################################################################################################################
get_property(HIP_INCLUDE_DIRECTORIES TARGET hip::amdhip64 PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hip
NO_DEFAULT_PATH
REQUIRED)
# ########################################################################################
get_property(
HIP_INCLUDE_DIRECTORIES
TARGET hip::amdhip64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hip
NO_DEFAULT_PATH REQUIRED)
# # Generate the HSA wrapper functions header
add_custom_command(
OUTPUT hsa_prof_str.h hsa_prof_str.inline.h
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/hsaap.py ${CMAKE_CURRENT_BINARY_DIR} "${HSA_RUNTIME_INC_PATH}" > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/hsaap.py
"${HSA_RUNTIME_INC_PATH}/hsa.h" "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
"${HSA_RUNTIME_INC_PATH}/hsa_ext_image.h" "${HSA_RUNTIME_INC_PATH}/hsa_api_trace.h"
COMMENT "Generating hsa_prof_str.h,hsa_prof_str.inline.h...")
OUTPUT hsa_prof_str.h hsa_prof_str.inline.h
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/hsaap.py
${CMAKE_CURRENT_BINARY_DIR} "${HSA_RUNTIME_INC_PATH}" > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/hsaap.py
"${HSA_RUNTIME_INC_PATH}/hsa.h"
"${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
"${HSA_RUNTIME_INC_PATH}/hsa_ext_image.h"
"${HSA_RUNTIME_INC_PATH}/hsa_api_trace.h"
COMMENT "Generating hsa_prof_str.h,hsa_prof_str.inline.h...")
# # Generate the HSA pretty printers
add_custom_command(
OUTPUT hsa_ostream_ops.h
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa.h" -o hsa.h.i
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h" -o hsa_ext_amd.h.i
BYPRODUCTS hsa.h.i hsa_ext_amd.h.i
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py
-in hsa.h.i,hsa_ext_amd.h.i -out hsa_ostream_ops.h > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py
"${HSA_RUNTIME_INC_PATH}/hsa.h" "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
COMMENT "Generating hsa_ostream_ops.h...")
OUTPUT hsa_ostream_ops.h
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa.h" -o hsa.h.i
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h" -o
hsa_ext_amd.h.i
BYPRODUCTS hsa.h.i hsa_ext_amd.h.i
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py -in
hsa.h.i,hsa_ext_amd.h.i -out hsa_ostream_ops.h > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py
"${HSA_RUNTIME_INC_PATH}/hsa.h" "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
COMMENT "Generating hsa_ostream_ops.h...")
get_property(HIP_INCLUDE_DIRECTORIES TARGET hip::amdhip64 PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hip
NO_DEFAULT_PATH
REQUIRED)
get_property(
HIP_INCLUDE_DIRECTORIES
TARGET hip::amdhip64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hip
NO_DEFAULT_PATH REQUIRED)
## Generate the HIP pretty printers
# Generate the HIP pretty printers
add_custom_command(
OUTPUT hip_ostream_ops.h
COMMAND ${CMAKE_C_COMPILER} "$<$<BOOL:${HIP_INCLUDE_DIRECTORIES}>:-I$<JOIN:${HIP_INCLUDE_DIRECTORIES},$<SEMICOLON>-I>>"
-E "${HIP_RUNTIME_API_H}" -D__HIP_PLATFORM_HCC__=1 -D__HIP_ROCclr__=1 -o hip_runtime_api.h.i
BYPRODUCTS hip_runtime_api.h.i
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py
-in hip_runtime_api.h.i -out hip_ostream_ops.h > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py "${HIP_RUNTIME_API_H}"
COMMENT "Generating hip_ostream_ops.h..."
COMMAND_EXPAND_LISTS)
OUTPUT hip_ostream_ops.h
COMMAND
${CMAKE_C_COMPILER}
"$<$<BOOL:${HIP_INCLUDE_DIRECTORIES}>:-I$<JOIN:${HIP_INCLUDE_DIRECTORIES},$<SEMICOLON>-I>>"
-E "${HIP_RUNTIME_API_H}" -D__HIP_PLATFORM_HCC__=1 -D__HIP_ROCclr__=1 -o
hip_runtime_api.h.i
BYPRODUCTS hip_runtime_api.h.i
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py -in
hip_runtime_api.h.i -out hip_ostream_ops.h > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py "${HIP_RUNTIME_API_H}"
COMMENT "Generating hip_ostream_ops.h..."
COMMAND_EXPAND_LISTS)
set(GENERATED_SOURCES
hip_ostream_ops.h
hsa_prof_str.h
hsa_ostream_ops.h
hsa_prof_str.inline.h)
set(GENERATED_SOURCES hip_ostream_ops.h hsa_prof_str.h hsa_ostream_ops.h
hsa_prof_str.inline.h)
# ############################################################################################################################################
# ########################################################################################
# ROCProfiler API
# ############################################################################################################################################
# PC sampling uses libpciaccess as a fallback if the debugfs ioctl is
# unavailable
# ########################################################################################
# PC sampling uses libpciaccess as a fallback if the debugfs ioctl is unavailable
find_path(PCIACCESS_INCLUDE_DIR pciaccess.h REQUIRED)
find_library(PCIACCESS_LIBRARIES pciaccess REQUIRED)
set(PUBLIC_HEADERS rocprofiler.h)
foreach(header ${PUBLIC_HEADERS})
install(FILES ${PROJECT_SOURCE_DIR}/include/rocprofiler/${header}
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}
COMPONENT dev)
endforeach()
install(DIRECTORY ${PROJECT_SOURCE_DIR}/include/rocprofiler/v2
install(
FILES ${PROJECT_SOURCE_DIR}/include/rocprofiler/${header}
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}
COMPONENT dev)
endforeach()
install(
DIRECTORY ${PROJECT_SOURCE_DIR}/include/rocprofiler/v2
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}
COMPONENT dev)
# Getting Source files for ROCProfiler, Hardware, HSA, Memory, Session, Counters, Utils
file(GLOB ROCPROFILER_SRC_FILES ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)
file(GLOB ROCPROFILER_PROFILER_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/profiler/profiler.cpp)
file(GLOB ROCPROFILER_TRACER_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/tracer/*.cpp)
file(GLOB ROCPROFILER_ROCTRACER_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/tracer/src/*.cpp)
file(GLOB ROCPROFILER_PROFILER_SRC_FILES
${PROJECT_SOURCE_DIR}/src/core/session/profiler/profiler.cpp)
file(GLOB ROCPROFILER_TRACER_SRC_FILES
${PROJECT_SOURCE_DIR}/src/core/session/tracer/*.cpp)
file(GLOB ROCPROFILER_ROCTRACER_SRC_FILES
${PROJECT_SOURCE_DIR}/src/core/session/tracer/src/*.cpp)
file(GLOB ROCPROFILER_ATT_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/att/att.cpp)
file(GLOB ROCPROFILER_CLASS_SRC_FILES ${CMAKE_CURRENT_SOURCE_DIR}/rocprofiler_singleton.cpp)
file(GLOB ROCPROFILER_CLASS_SRC_FILES
${CMAKE_CURRENT_SOURCE_DIR}/rocprofiler_singleton.cpp)
file(GLOB ROCPROFILER_SPM_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/spm/spm.cpp)
set(CORE_HARDWARE_DIR ${PROJECT_SOURCE_DIR}/src/core/hardware)
file(GLOB CORE_HARDWARE_SRC_FILES ${CORE_HARDWARE_DIR}/*.cpp)
@@ -180,148 +212,202 @@ file(GLOB CORE_COUNTERS_SAMPLER_SRC_FILES ${CORE_SESSION_DIR}/counters_sampler.c
file(GLOB CORE_COUNTERS_SRC_FILES ${PROJECT_BINARY_DIR}/src/api/*_counter.cpp)
file(GLOB CORE_COUNTERS_PARENT_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/counters/*.cpp)
file(GLOB CORE_COUNTERS_METRICS_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/counters/metrics/*.cpp)
file(GLOB CORE_COUNTERS_METRICS_SRC_FILES
${PROJECT_SOURCE_DIR}/src/core/counters/metrics/*.cpp)
file(GLOB CORE_COUNTERS_MMIO_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/counters/mmio/*.cpp)
set(CORE_UTILS_DIR ${PROJECT_SOURCE_DIR}/src/utils)
file(GLOB CORE_UTILS_SRC_FILES ${CORE_UTILS_DIR}/*.cpp)
set(CORE_PC_SAMPLING_DIR ${PROJECT_SOURCE_DIR}/src/pcsampler)
file(GLOB CORE_PC_SAMPLING_FILES ${CORE_PC_SAMPLING_DIR}/core/*.cpp ${CORE_PC_SAMPLING_DIR}/gfxip/*.cpp ${CORE_PC_SAMPLING_DIR}/session/*.cpp)
file(GLOB CORE_PC_SAMPLING_FILES ${CORE_PC_SAMPLING_DIR}/core/*.cpp
${CORE_PC_SAMPLING_DIR}/gfxip/*.cpp ${CORE_PC_SAMPLING_DIR}/session/*.cpp)
#### V1 Library
# Compiling/Installing ROCProfiler API V1
# V1 Library Compiling/Installing ROCProfiler API V1
add_library(${ROCPROFILER_TARGET} SHARED ${OLD_LIB_SRC})
set_target_properties(${ROCPROFILER_TARGET} PROPERTIES
CXX_VISIBILITY_PRESET hidden
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib
VERSION 1.0.0
SOVERSION 1)
set_target_properties(
${ROCPROFILER_TARGET}
PROPERTIES CXX_VISIBILITY_PRESET hidden
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib
VERSION 1.0.0
SOVERSION 1)
# As ROCR hsa_api_trace header file is not usable unless AMD_INTERNAL_BUILD is defined
target_compile_definitions(${ROCPROFILER_TARGET} PUBLIC AMD_INTERNAL_BUILD)
target_include_directories(${ROCPROFILER_TARGET}
PUBLIC
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include/rocprofiler>
PRIVATE
${LIB_DIR} ${ROOT_DIR}
${PROJECT_SOURCE_DIR}/include/rocprofiler)
target_link_libraries(${ROCPROFILER_TARGET} PRIVATE ${AQLPROFILE_LIB} hsa-runtime64::hsa-runtime64 c stdc++)
target_include_directories(
${ROCPROFILER_TARGET}
PUBLIC $<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include/rocprofiler>
PRIVATE ${LIB_DIR} ${ROOT_DIR} ${PROJECT_SOURCE_DIR}/include/rocprofiler)
target_link_libraries(${ROCPROFILER_TARGET} PRIVATE ${AQLPROFILE_LIB}
hsa-runtime64::hsa-runtime64 c stdc++)
get_target_property(ROCPROFILER_LIBRARY_V1_NAME ${ROCPROFILER_TARGET} NAME)
get_target_property(ROCPROFILER_LIBRARY_V1_VERSION ${ROCPROFILER_TARGET} VERSION)
get_target_property(ROCPROFILER_LIBRARY_V1_SOVERSION ${ROCPROFILER_TARGET} SOVERSION)
## Install libraries: Non versioned lib file in dev package
## Skipping NameLink as it will be installed using symlinks
install ( TARGETS ${ROCPROFILER_TARGET} LIBRARY NAMELINK_SKIP DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT runtime)
install ( TARGETS ${ROCPROFILER_TARGET} LIBRARY NAMELINK_SKIP DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT asan)
# Install libraries: Non versioned lib file in dev package Skipping NameLink as it will be
# installed using symlinks
install(
TARGETS ${ROCPROFILER_TARGET}
LIBRARY NAMELINK_SKIP
DESTINATION ${CMAKE_INSTALL_LIBDIR}
COMPONENT runtime)
install(
TARGETS ${ROCPROFILER_TARGET}
LIBRARY NAMELINK_SKIP
DESTINATION ${CMAKE_INSTALL_LIBDIR}
COMPONENT asan)
#### V2 Library
# Compiling/Installing ROCProfiler API
add_library(rocprofiler-v2 SHARED
${ROCPROFILER_SRC_FILES}
${ROCPROFILER_CLASS_SRC_FILES}
${ROCPROFILER_PROFILER_SRC_FILES}
${ROCPROFILER_ATT_SRC_FILES}
${CORE_HARDWARE_SRC_FILES}
${CORE_HSA_SRC_FILES}
${ROCPROFILER_SPM_SRC_FILES}
${CORE_MEMORY_SRC_FILES}
${CORE_SESSION_SRC_FILES}
${CORE_FILTER_SRC_FILES}
${CORE_DEVICE_PROFILING_SRC_FILES}
${CORE_COUNTERS_SAMPLER_SRC_FILES}
${CORE_COUNTERS_PARENT_SRC_FILES}
${CORE_COUNTERS_METRICS_SRC_FILES}
${CORE_COUNTERS_MMIO_SRC_FILES}
${CORE_UTILS_SRC_FILES}
${CORE_HSA_PACKETS_SRC_FILES}
${CORE_HSA_QUEUES_SRC_FILES}
${ROCPROFILER_TRACER_SRC_FILES}
${ROCPROFILER_ROCTRACER_SRC_FILES}
${GENERATED_SOURCES}
${CORE_COUNTERS_SRC_FILES}
${CORE_PC_SAMPLING_FILES})
set_target_properties(rocprofiler-v2 PROPERTIES
CXX_VISIBILITY_PRESET hidden
DEFINE_SYMBOL "ROCPROFILER_EXPORTS"
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/exportmap
OUTPUT_NAME rocprofiler64
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib
VERSION ${PROJECT_VERSION}
SOVERSION ${PROJECT_VERSION_MAJOR})
# V2 Library Compiling/Installing ROCProfiler API
add_library(
rocprofiler-v2 SHARED
${ROCPROFILER_SRC_FILES}
${ROCPROFILER_CLASS_SRC_FILES}
${ROCPROFILER_PROFILER_SRC_FILES}
${ROCPROFILER_ATT_SRC_FILES}
${CORE_HARDWARE_SRC_FILES}
${CORE_HSA_SRC_FILES}
${ROCPROFILER_SPM_SRC_FILES}
${CORE_MEMORY_SRC_FILES}
${CORE_SESSION_SRC_FILES}
${CORE_FILTER_SRC_FILES}
${CORE_DEVICE_PROFILING_SRC_FILES}
${CORE_COUNTERS_SAMPLER_SRC_FILES}
${CORE_COUNTERS_PARENT_SRC_FILES}
${CORE_COUNTERS_METRICS_SRC_FILES}
${CORE_COUNTERS_MMIO_SRC_FILES}
${CORE_UTILS_SRC_FILES}
${CORE_HSA_PACKETS_SRC_FILES}
${CORE_HSA_QUEUES_SRC_FILES}
${ROCPROFILER_TRACER_SRC_FILES}
${ROCPROFILER_ROCTRACER_SRC_FILES}
${GENERATED_SOURCES}
${CORE_COUNTERS_SRC_FILES}
${CORE_PC_SAMPLING_FILES})
set_target_properties(
rocprofiler-v2
PROPERTIES CXX_VISIBILITY_PRESET hidden
DEFINE_SYMBOL "ROCPROFILER_EXPORTS"
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/exportmap
OUTPUT_NAME rocprofiler64
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib
VERSION ${PROJECT_VERSION}
SOVERSION ${PROJECT_VERSION_MAJOR})
target_compile_definitions(rocprofiler-v2
# As ROCR hsa_api_trace header file is not usable unless AMD_INTERNAL_BUILD is defined
PRIVATE AMD_INTERNAL_BUILD
PROF_API_IMPL HIP_PROF_HIP_API_STRING=1 __HIP_PLATFORM_AMD__=1)
target_include_directories(rocprofiler-v2
PUBLIC
${HIP_INCLUDE_DIRECTORIES} ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include/rocprofiler/v2>
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include>
PRIVATE
${LIB_DIR} ${ROOT_DIR}
${CMAKE_CURRENT_BINARY_DIR}
${PROJECT_SOURCE_DIR}
${PROJECT_SOURCE_DIR}/tools)
target_compile_definitions(
rocprofiler-v2
# As ROCR hsa_api_trace header file is not usable unless AMD_INTERNAL_BUILD is defined
PRIVATE AMD_INTERNAL_BUILD PROF_API_IMPL HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_AMD__=1)
target_include_directories(
rocprofiler-v2
PUBLIC ${HIP_INCLUDE_DIRECTORIES}
${HSA_RUNTIME_INCLUDE_DIRECTORIES}
$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include/rocprofiler/v2>
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include>
PRIVATE ${LIB_DIR} ${ROOT_DIR} ${CMAKE_CURRENT_BINARY_DIR} ${PROJECT_SOURCE_DIR}
${PROJECT_SOURCE_DIR}/tools)
if(ASAN)
target_compile_options(rocprofiler-v2 PRIVATE -fsanitize=address)
target_link_options(rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap -Wl,--no-undefined,-fsanitize=address)
target_link_libraries(rocprofiler-v2 PRIVATE ${AQLPROFILE_LIB} hsa-runtime64::hsa-runtime64 Threads::Threads atomic numa asan dl c stdc++ stdc++fs amd_comgr ${PCIACCESS_LIBRARIES})
target_compile_options(rocprofiler-v2 PRIVATE -fsanitize=address)
target_link_options(
rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap
-Wl,--no-undefined,-fsanitize=address)
target_link_libraries(
rocprofiler-v2
PRIVATE ${AQLPROFILE_LIB}
hsa-runtime64::hsa-runtime64
Threads::Threads
atomic
numa
asan
dl
c
stdc++
stdc++fs
amd_comgr
${PCIACCESS_LIBRARIES})
else()
target_link_options(rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap -Wl,--no-undefined)
target_link_libraries(rocprofiler-v2 PRIVATE ${AQLPROFILE_LIB} hsa-runtime64::hsa-runtime64 Threads::Threads atomic numa dl c stdc++ stdc++fs amd_comgr ${PCIACCESS_LIBRARIES})
target_link_options(
rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap
-Wl,--no-undefined)
target_link_libraries(
rocprofiler-v2
PRIVATE ${AQLPROFILE_LIB}
hsa-runtime64::hsa-runtime64
Threads::Threads
atomic
numa
dl
c
stdc++
stdc++fs
amd_comgr
${PCIACCESS_LIBRARIES})
endif()
get_target_property(ROCPROFILER_LIBRARY_V2_NAME rocprofiler-v2 OUTPUT_NAME)
get_target_property(ROCPROFILER_LIBRARY_V2_VERSION rocprofiler-v2 VERSION)
get_target_property(ROCPROFILER_LIBRARY_V2_SOVERSION rocprofiler-v2 SOVERSION)
## Prepare Name Link SO files for V1 & V2 Libraries
add_custom_command(TARGET rocprofiler-v2 POST_BUILD
COMMAND ${CMAKE_COMMAND} -E rm -f ${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so
COMMAND ${CMAKE_COMMAND} -E create_symlink
lib${ROCPROFILER_LIBRARY_V1_NAME}.so.${ROCPROFILER_LIBRARY_V1_SOVERSION}
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so
COMMAND ${CMAKE_COMMAND} -E create_symlink
lib${ROCPROFILER_LIBRARY_V2_NAME}.so.${ROCPROFILER_LIBRARY_V2_SOVERSION}
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V2_NAME}v2.so
)
# Prepare Name Link SO files for V1 & V2 Libraries
add_custom_command(
TARGET rocprofiler-v2
POST_BUILD
COMMAND ${CMAKE_COMMAND} -E rm -f
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so
COMMAND
${CMAKE_COMMAND} -E create_symlink
lib${ROCPROFILER_LIBRARY_V1_NAME}.so.${ROCPROFILER_LIBRARY_V1_SOVERSION}
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so
COMMAND
${CMAKE_COMMAND} -E create_symlink
lib${ROCPROFILER_LIBRARY_V2_NAME}.so.${ROCPROFILER_LIBRARY_V2_SOVERSION}
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V2_NAME}v2.so)
# Add custom target to trigger the create_symlink command
add_custom_target(create_rocprofiler_lib DEPENDS rocprofiler-v2 ${ROCPROFILER_TARGET})
## Install libraries: Non versioned lib file in dev package
## Skipping NameLink as it will be installed using symlinks
install(TARGETS rocprofiler-v2 LIBRARY NAMELINK_SKIP DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT runtime)
install(TARGETS rocprofiler-v2 LIBRARY NAMELINK_SKIP DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT asan)
# Install libraries: Non versioned lib file in dev package Skipping NameLink as it will be
# installed using symlinks
install(
TARGETS rocprofiler-v2
LIBRARY NAMELINK_SKIP
DESTINATION ${CMAKE_INSTALL_LIBDIR}
COMPONENT runtime)
install(
TARGETS rocprofiler-v2
LIBRARY NAMELINK_SKIP
DESTINATION ${CMAKE_INSTALL_LIBDIR}
COMPONENT asan)
## Installing NameLinks for V1 & V2
## librocprofiler64.so links to V1 library
## librocprofiler64v2.so links to V2 library
install(CODE "execute_process( \
# Installing NameLinks for V1 & V2 librocprofiler64.so links to V1 library
# librocprofiler64v2.so links to V2 library
install(
CODE "execute_process( \
COMMAND ${CMAKE_COMMAND} -E create_symlink \
lib${ROCPROFILER_LIBRARY_V1_NAME}.so.${ROCPROFILER_LIBRARY_V1_SOVERSION} \
${CMAKE_INSTALL_PREFIX}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so \
)" COMPONENT dev
)
install(CODE "execute_process( \
)"
COMPONENT dev)
install(
CODE "execute_process( \
COMMAND ${CMAKE_COMMAND} -E create_symlink \
lib${ROCPROFILER_LIBRARY_V2_NAME}.so.${ROCPROFILER_LIBRARY_V2_SOVERSION} \
${CMAKE_INSTALL_PREFIX}/lib/lib${ROCPROFILER_LIBRARY_V2_NAME}v2.so \
)" COMPONENT dev
)
)"
COMPONENT dev)
configure_file(${PROJECT_SOURCE_DIR}/src/core/counters/metrics/basic_counters.xml ${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters/basic_counters.xml COPYONLY)
configure_file(${PROJECT_SOURCE_DIR}/src/core/counters/metrics/derived_counters.xml ${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters/derived_counters.xml COPYONLY)
configure_file(
${PROJECT_SOURCE_DIR}/src/core/counters/metrics/basic_counters.xml
${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters/basic_counters.xml COPYONLY)
configure_file(
${PROJECT_SOURCE_DIR}/src/core/counters/metrics/derived_counters.xml
${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters/derived_counters.xml COPYONLY)
install(DIRECTORY
${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/${PROJECT_NAME}
USE_SOURCE_PERMISSIONS
COMPONENT runtime)
install(
DIRECTORY ${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/${PROJECT_NAME}
USE_SOURCE_PERMISSIONS
COMPONENT runtime)
# ############################################################################################################################################
# ########################################################################################
+3 -2
@@ -74,13 +74,14 @@ class ROCProfiler_Singleton {
// Device Profiling Session
bool FindDeviceProfilingSession(rocprofiler_session_id_t session_id);
rocprofiler_session_id_t CreateDeviceProfilingSession(std::vector<std::string> counters,
int cpu_agent_index, int gpu_agent_index);
int cpu_agent_index, int gpu_agent_index);
void DestroyDeviceProfilingSession(rocprofiler_session_id_t session_id);
DeviceProfileSession* GetDeviceProfilingSession(rocprofiler_session_id_t session_id);
// Generic
bool CheckFilterData(rocprofiler_filter_kind_t filter_kind, rocprofiler_filter_data_t filter_data);
bool CheckFilterData(rocprofiler_filter_kind_t filter_kind,
rocprofiler_filter_data_t filter_data);
uint64_t GetUniqueRecordId();
uint64_t GetUniqueKernelDispatchId();
+1 -2
@@ -11,8 +11,7 @@
// TODO(aelwazir): change that to adapt with our own Exception
// What about outside exceptions and callbacks exceptions!!
#define API_METHOD_PREFIX \
try {
#define API_METHOD_PREFIX try {
#define API_METHOD_SUFFIX \
} \
catch (rocprofiler::Exception & e) { \
+20 -21
@@ -61,11 +61,11 @@ void check_status(hsa_status_t status) {
namespace activity_prim {
// PC sampling callback data
struct pcsmp_callback_data_t {
const char* kernel_name; // sampled kernel name
void* data_buffer; // host buffer for tracing data
uint64_t id; // sample id
uint64_t cycle; // sample cycle
uint64_t pc; // sample PC
const char* kernel_name; // sampled kernel name
void* data_buffer; // host buffer for tracing data
uint64_t id; // sample id
uint64_t cycle; // sample cycle
uint64_t pc; // sample PC
};
uint32_t activity_op = UINT32_MAX;
@@ -74,9 +74,8 @@ std::atomic<activity_async_callback_t> activity_callback{NULL};
rocprofiler_t* context = NULL;
hsa_status_t trace_data_cb(hsa_ven_amd_aqlprofile_info_type_t info_type,
hsa_ven_amd_aqlprofile_info_data_t* info_data,
void* data) {
const pcsmp_callback_data_t* pcsmp_data = (pcsmp_callback_data_t*) data;
hsa_ven_amd_aqlprofile_info_data_t* info_data, void* data) {
const pcsmp_callback_data_t* pcsmp_data = (pcsmp_callback_data_t*)data;
activity_record_t record{};
record.op = activity_op;
@@ -96,11 +95,13 @@ bool context_handler(rocprofiler_group_t group, void* arg) {
hsa_agent_t agent{};
hsa_status_t status = rocprofiler_get_agent(group.context, &agent);
check_status(status);
const rocprofiler::util::AgentInfo* agent_info = rocprofiler::util::HsaRsrcFactory::Instance().GetAgentInfo(agent);
const rocprofiler::util::AgentInfo* agent_info =
rocprofiler::util::HsaRsrcFactory::Instance().GetAgentInfo(agent);
pcsmp_callback_data_t pcsmp_data{};
pcsmp_data.kernel_name = (const char*)arg;
pcsmp_data.data_buffer = rocprofiler::util::HsaRsrcFactory::Instance().AllocateSysMemory(agent_info, rocprofiler::TraceProfile::GetSize());
pcsmp_data.data_buffer = rocprofiler::util::HsaRsrcFactory::Instance().AllocateSysMemory(
agent_info, rocprofiler::TraceProfile::GetSize());
status = rocprofiler_iterate_trace_data(group.context, trace_data_cb, &pcsmp_data);
check_status(status);
return false;
@@ -110,8 +111,8 @@ bool context_handler(rocprofiler_group_t group, void* arg) {
hsa_status_t dispatch_callback(const rocprofiler_callback_data_t* callback_data, void* user_data,
rocprofiler_group_t* group) {
// context features
const rocprofiler_feature_kind_t trace_kind =
(rocprofiler_feature_kind_t)(ROCPROFILER_FEATURE_KIND_TRACE | ROCPROFILER_FEATURE_KIND_PCSMP_MOD);
const rocprofiler_feature_kind_t trace_kind = (rocprofiler_feature_kind_t)(
ROCPROFILER_FEATURE_KIND_TRACE | ROCPROFILER_FEATURE_KIND_PCSMP_MOD);
const uint32_t feature_count = 1;
const uint32_t parameter_count = 1;
rocprofiler_feature_t* features = new rocprofiler_feature_t[feature_count];
@@ -131,8 +132,8 @@ hsa_status_t dispatch_callback(const rocprofiler_callback_data_t* callback_data,
properties.handler_arg = (void*)strdup(callback_data->kernel_name);
// Open profiling context
hsa_status_t status = rocprofiler_open(callback_data->agent, features, feature_count,
&context, 0 /*ROCPROFILER_MODE_SINGLEGROUP*/, &properties);
hsa_status_t status = rocprofiler_open(callback_data->agent, features, feature_count, &context,
0 /*ROCPROFILER_MODE_SINGLEGROUP*/, &properties);
check_status(status);
// Get group[0]
@@ -141,7 +142,7 @@ hsa_status_t dispatch_callback(const rocprofiler_callback_data_t* callback_data,
return status;
}
} // namespace activity_prim
} // namespace activity_prim
extern "C" {
PUBLIC_API const char* GetOpName(uint32_t op) { return strdup("PCSAMPLE"); }
@@ -152,7 +153,8 @@ PUBLIC_API bool RemoveApiCallback(uint32_t op) { return true; }
PUBLIC_API bool InitActivityCallback(void* callback, void* arg) {
activity_prim::activity_arg = arg;
activity_prim::activity_callback.store((activity_async_callback_t)callback, std::memory_order_release);
activity_prim::activity_callback.store((activity_async_callback_t)callback,
std::memory_order_release);
rocprofiler_queue_callbacks_t queue_callbacks{};
queue_callbacks.dispatch = activity_prim::dispatch_callback;
@@ -191,11 +193,8 @@ struct evt_cb_entry_t {
};
evt_cb_entry_t evt_cb_table[HSA_EVT_ID_NUMBER];
hsa_status_t codeobj_evt_callback(
rocprofiler_hsa_cb_id_t id,
const rocprofiler_hsa_callback_data_t* cb_data,
void* arg)
{
hsa_status_t codeobj_evt_callback(rocprofiler_hsa_cb_id_t id,
const rocprofiler_hsa_callback_data_t* cb_data, void* arg) {
const auto evt = evt_cb_table[id].get();
activity_rtapi_callback_t evt_callback = (activity_rtapi_callback_t)evt.first;
if (evt_callback != NULL) evt_callback(ACTIVITY_DOMAIN_HSA_EVT, id, cb_data, evt.second);
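The `InitActivityCallback` hunk above first sets `activity_arg` and then publishes the callback through a `std::atomic<activity_async_callback_t>` using `std::memory_order_release`. The sketch below illustrates that publish/consume pattern. It assumes the reader side performs an acquire load, since the load site is not part of this diff. `async_callback_t`, `publish`, `fire`, and the callback signature are hypothetical stand-ins, not the rocprofiler API.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for activity_async_callback_t (signature assumed).
using async_callback_t = void (*)(uint32_t op, void* record, void* arg);

// Shared state, modeled on activity_callback / activity_arg in the diff.
std::atomic<async_callback_t> g_callback{nullptr};
void* g_arg = nullptr;

// Writer: set the plain argument first, then publish the pointer with release
// ordering, so any reader that observes the new pointer also observes g_arg.
void publish(async_callback_t cb, void* arg) {
  g_arg = arg;
  g_callback.store(cb, std::memory_order_release);
}

// Reader (assumed): acquire-load the pointer and call it only if one is set.
// Returns true if a callback was invoked.
bool fire(uint32_t op, void* record) {
  async_callback_t cb = g_callback.load(std::memory_order_acquire);
  if (cb == nullptr) return false;
  cb(op, record, g_arg);
  return true;
}
```

Only the callback pointer is atomic. The release store paired with an acquire load is what makes the earlier plain write to the argument visible to the thread that invokes the callback.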
+1 -1
@@ -19,4 +19,4 @@ enum hsa_evt_id_t {
// HSA EVT callback data type
typedef rocprofiler_hsa_callback_data_t hsa_evt_data_t;
#endif // _SRC_CORE_ACTIVITY_H
#endif // _SRC_CORE_ACTIVITY_H
+65 -67
@@ -27,7 +27,7 @@ THE SOFTWARE.
#include <hsa/hsa.h>
#include <hsa/hsa_ext_amd.h>
#include <unistd.h> // usleep
#include <unistd.h> // usleep
#include <atomic>
#include <list>
#include <map>
@@ -91,8 +91,7 @@ class Group {
barrier_signal_{},
dispatch_signal_{},
orig_signal_{},
record_{}
{}
record_{} {}
void Insert(const profile_info_t& info) {
const rocprofiler_feature_kind_t kind = info.rinfo->kind;
@@ -110,11 +109,10 @@ class Group {
}
hsa_status_t Finalize(const bool is_concurrent = false) {
hsa_status_t status = pmc_profile_.Finalize(start_vector_, stop_vector_,
read_vector_, is_concurrent);
hsa_status_t status =
pmc_profile_.Finalize(start_vector_, stop_vector_, read_vector_, is_concurrent);
if (status == HSA_STATUS_SUCCESS) {
status = trace_profile_.Finalize(start_vector_, stop_vector_,
read_vector_, is_concurrent);
status = trace_profile_.Finalize(start_vector_, stop_vector_, read_vector_, is_concurrent);
}
if (status == HSA_STATUS_SUCCESS) {
if (!pmc_profile_.Empty()) ++n_profiles_;
@@ -137,32 +135,20 @@ class Group {
Context* GetContext() { return context_; }
uint32_t GetIndex() const { return index_; }
void SetBarrierSignal(const hsa_signal_t &signal) {
barrier_signal_ = signal;
}
hsa_signal_t& GetBarrierSignal() {
return barrier_signal_;
}
void SetDispatchSignal(const hsa_signal_t &signal) {
dispatch_signal_ = signal;
}
hsa_signal_t& GetDispatchSignal() {
return dispatch_signal_;
}
void SetOrigSignal(const hsa_signal_t &signal) {
orig_signal_ = signal;
}
const hsa_signal_t& GetOrigSignal() const {
return orig_signal_;
}
rocprofiler_dispatch_record_t* GetRecord() {
return &record_;
}
void SetBarrierSignal(const hsa_signal_t& signal) { barrier_signal_ = signal; }
hsa_signal_t& GetBarrierSignal() { return barrier_signal_; }
void SetDispatchSignal(const hsa_signal_t& signal) { dispatch_signal_ = signal; }
hsa_signal_t& GetDispatchSignal() { return dispatch_signal_; }
void SetOrigSignal(const hsa_signal_t& signal) { orig_signal_ = signal; }
const hsa_signal_t& GetOrigSignal() const { return orig_signal_; }
rocprofiler_dispatch_record_t* GetRecord() { return &record_; }
atomic_refs_t* AtomicRefsCount() { return reinterpret_cast<atomic_refs_t*>(&refs_); }
void ResetRefsCount() { AtomicRefsCount()->store(n_profiles_, std::memory_order_release); }
void IncrRefsCount() { AtomicRefsCount()->fetch_add(1, std::memory_order_acq_rel); }
uint32_t FetchDecrRefsCount() { return AtomicRefsCount()->fetch_sub(1, std::memory_order_acq_rel); }
uint32_t FetchDecrRefsCount() {
return AtomicRefsCount()->fetch_sub(1, std::memory_order_acq_rel);
}
private:
PmcProfile pmc_profile_;
@@ -188,23 +174,23 @@ class Context {
public:
typedef std::map<std::string, rocprofiler_feature_t*> info_map_t;
static void Create(Context* obj, const util::AgentInfo* agent_info, Queue* queue, rocprofiler_feature_t* info,
const uint32_t info_count, rocprofiler_handler_t handler, void* handler_arg)
{
static void Create(Context* obj, const util::AgentInfo* agent_info, Queue* queue,
rocprofiler_feature_t* info, const uint32_t info_count,
rocprofiler_handler_t handler, void* handler_arg) {
new (obj) Context(agent_info, queue, info, info_count, handler, handler_arg);
obj->Construct(agent_info, queue, info, info_count, handler, handler_arg);
}
static void Release(Context* obj) { obj->Destruct(); }
static Context* Create(const util::AgentInfo* agent_info, Queue* queue, rocprofiler_feature_t* info,
const uint32_t info_count, rocprofiler_handler_t handler, void* handler_arg)
{
static Context* Create(const util::AgentInfo* agent_info, Queue* queue,
rocprofiler_feature_t* info, const uint32_t info_count,
rocprofiler_handler_t handler, void* handler_arg) {
Context* obj = new Context(agent_info, queue, info, info_count, handler, handler_arg);
if (obj == NULL) EXC_RAISING(HSA_STATUS_ERROR, "allocation error");
try {
obj->Construct(agent_info, queue, info, info_count, handler, handler_arg);
} catch(...) {
} catch (...) {
delete obj;
obj = NULL;
std::cerr << "Error: Context Create failed" << std::endl;
@@ -213,7 +199,9 @@ class Context {
return obj;
}
static void Destroy(Context* obj) { if (obj != NULL) delete obj; }
static void Destroy(Context* obj) {
if (obj != NULL) delete obj;
}
void Reset(const uint32_t& group_index) { set_[group_index].ResetRefsCount(); }
@@ -293,8 +281,10 @@ class Context {
hsa_rsrc_->SignalWaitRestore(tuple.completion_signal, 1);
// Restore other signals
RestoreSignals(tuple);
for (rocprofiler_feature_t* rinfo : *(tuple.info_vector)) rinfo->data.kind = ROCPROFILER_DATA_KIND_UNINIT;
callback_data_t callback_data{tuple.profile, tuple.info_vector, tuple.info_vector->size(), NULL};
for (rocprofiler_feature_t* rinfo : *(tuple.info_vector))
rinfo->data.kind = ROCPROFILER_DATA_KIND_UNINIT;
callback_data_t callback_data{tuple.profile, tuple.info_vector, tuple.info_vector->size(),
NULL};
const hsa_status_t status =
api_->hsa_ven_amd_aqlprofile_iterate_data(tuple.profile, DataCallback, &callback_data);
if (status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "context iterate data failed");
@@ -310,7 +300,8 @@ class Context {
if (expr) {
auto it = info_map_.find(name);
if (it == info_map_.end())
EXC_RAISING(HSA_STATUS_ERROR, "metric '" << name << "', rocprofiler info is not found " << this);
EXC_RAISING(HSA_STATUS_ERROR,
"metric '" << name << "', rocprofiler info is not found " << this);
rocprofiler_feature_t* info = it->second;
info->data.result_double = expr->Eval(args);
info->data.kind = ROCPROFILER_DATA_KIND_DOUBLE;
@@ -324,7 +315,7 @@ class Context {
for (auto& tuple : profile_vector) {
if (pcsmp_mode_) const_cast<profile_t*>(tuple.profile)->event_count = UINT32_MAX;
const hsa_status_t status =
api_->hsa_ven_amd_aqlprofile_iterate_data(tuple.profile, callback, data);
api_->hsa_ven_amd_aqlprofile_iterate_data(tuple.profile, callback, data);
if (status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "context iterate data failed");
}
}
@@ -342,7 +333,10 @@ class Context {
hsa_agent_t GetAgent() const { return agent_; }
Group* GetGroup(const uint32_t& index) { return &set_[index]; }
rocprofiler_handler_t GetHandler(void** arg) const { *arg = handler_arg_; return handler_; }
rocprofiler_handler_t GetHandler(void** arg) const {
*arg = handler_arg_;
return handler_;
}
// Concurrent profiling mode
static bool k_concurrent_;
@@ -358,8 +352,7 @@ class Context {
metrics_(NULL),
handler_(handler),
handler_arg_(handler_arg),
pcsmp_mode_(false)
{}
pcsmp_mode_(false) {}
~Context() { Destruct(); }
@@ -375,8 +368,7 @@ class Context {
}
void Construct(const util::AgentInfo* agent_info, Queue* queue, rocprofiler_feature_t* info,
const uint32_t info_count, rocprofiler_handler_t handler, void* handler_arg)
{
const uint32_t info_count, rocprofiler_handler_t handler, void* handler_arg) {
if (info_count == 0) {
set_.push_back(Group(agent_info_, this, 0));
return;
@@ -386,9 +378,11 @@ class Context {
if (metrics_ == NULL) EXC_RAISING(HSA_STATUS_ERROR, "MetricsDict create failed");
if (Initialize(info, info_count) == false) {
fprintf(stdout, "\nInput metrics out of HW limit. Proposed metrics group set:\n"); fflush(stdout);
fprintf(stdout, "\nInput metrics out of HW limit. Proposed metrics group set:\n");
fflush(stdout);
MetricsGroupSet(agent_info, info, info_count).Print(stdout);
fprintf(stdout, "\n"); fflush(stdout);
fprintf(stdout, "\n");
fflush(stdout);
EXC_RAISING(HSA_STATUS_ERROR, "Metrics list exceeds HW limits");
}
Finalize();
@@ -420,8 +414,8 @@ class Context {
info_map_[name] = info;
auto ret = metrics_map_.insert({name, NULL});
if (!ret.second)
EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name
<< "' is registered more then once");
EXC_RAISING(HSA_STATUS_ERROR,
"input metric '" << name << "' is registered more then once");
}
}
@@ -437,8 +431,9 @@ class Context {
if (kind == ROCPROFILER_FEATURE_KIND_METRIC) { // Processing metrics features
const Metric* metric = metrics_->Get(name);
if (metric == NULL)
EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name << "' is not supported on this hardware: "
<< agent_info_->name);
EXC_RAISING(HSA_STATUS_ERROR,
"input metric '"
<< name << "' is not supported on this hardware: " << agent_info_->name);
#if 0
std::cout << " " << name << (metric->GetExpr() ? " = " + metric->GetExpr()->String() : " counter") << std::endl;
#endif
@@ -493,9 +488,9 @@ class Context {
info->kind = ROCPROFILER_FEATURE_KIND_TRACE;
const event_t* event = NULL;
if (kind & ROCPROFILER_FEATURE_KIND_PCSMP_MOD) { // PC sampling
if (kind & ROCPROFILER_FEATURE_KIND_PCSMP_MOD) { // PC sampling
pcsmp_mode_ = true;
} else if (kind & ROCPROFILER_FEATURE_KIND_SPM_MOD) { // SPM trace
} else if (kind & ROCPROFILER_FEATURE_KIND_SPM_MOD) { // SPM trace
const Metric* metric = metrics_->Get(name);
if (metric == NULL)
EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name << "' is not found");
@@ -559,14 +554,14 @@ class Context {
const bool trace_local = TraceProfile::IsLocal();
util::HsaRsrcFactory* hsa_rsrc = &util::HsaRsrcFactory::Instance();
if (sample_id == 0) {
const uint32_t output_buffer_size = profile->output_buffer.size;
const uint32_t output_buffer_size64 = profile->output_buffer.size / sizeof(uint64_t);
const util::AgentInfo* agent_info = hsa_rsrc->GetAgentInfo(profile->agent);
void* ptr = (trace_local) ? hsa_rsrc->AllocateSysMemory(agent_info, output_buffer_size) :
calloc(output_buffer_size64, sizeof(uint64_t));
rinfo->data.result_bytes.size = output_buffer_size;
rinfo->data.result_bytes.ptr = ptr;
callback_data->ptr = reinterpret_cast<char*>(ptr);
const uint32_t output_buffer_size = profile->output_buffer.size;
const uint32_t output_buffer_size64 = profile->output_buffer.size / sizeof(uint64_t);
const util::AgentInfo* agent_info = hsa_rsrc->GetAgentInfo(profile->agent);
void* ptr = (trace_local) ? hsa_rsrc->AllocateSysMemory(agent_info, output_buffer_size)
: calloc(output_buffer_size64, sizeof(uint64_t));
rinfo->data.result_bytes.size = output_buffer_size;
rinfo->data.result_bytes.ptr = ptr;
callback_data->ptr = reinterpret_cast<char*>(ptr);
}
char* result_bytes_ptr = reinterpret_cast<char*>(rinfo->data.result_bytes.ptr);
const char* end = result_bytes_ptr + rinfo->data.result_bytes.size;
@@ -577,8 +572,10 @@ class Context {
char* dest = ptr + sizeof(*header);
if ((dest + size) >= end) {
if (dest < end) size = end - dest;
else EXC_RAISING(HSA_STATUS_ERROR, "Trace data out of output buffer");
if (dest < end)
size = end - dest;
else
EXC_RAISING(HSA_STATUS_ERROR, "Trace data out of output buffer");
}
bool suc = true;
@@ -593,7 +590,9 @@ class Context {
rinfo->data.result_bytes.instance_count = sample_id + 1;
rinfo->data.kind = ROCPROFILER_DATA_KIND_BYTES;
} else
EXC_RAISING(HSA_STATUS_ERROR, "Agent Memcpy failed, dst(" << (void*)dest << ") src(" << (void*)src << ") size(" << size << ")");
EXC_RAISING(HSA_STATUS_ERROR,
"Agent Memcpy failed, dst(" << (void*)dest << ") src(" << (void*)src
<< ") size(" << size << ")");
} else {
if (sample_id == 0) {
rinfo->data.result_bytes.ptr = profile->output_buffer.ptr;
@@ -647,8 +646,7 @@ class Context {
bool pcsmp_mode_;
};
#define CONTEXT_INSTANTIATE() \
bool rocprofiler::Context::k_concurrent_ = false;
#define CONTEXT_INSTANTIATE() bool rocprofiler::Context::k_concurrent_ = false;
} // namespace rocprofiler
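The trace-copy path in `Context` clamps a record that would run past the end of the output buffer. If the destination still lies inside the buffer, the size is truncated to the bytes that remain. Otherwise the copy is rejected. Below is a minimal standalone sketch of that bounds check, under these assumptions: `clamp_copy_size` is a hypothetical helper, `std::out_of_range` stands in for `EXC_RAISING(HSA_STATUS_ERROR, ...)`, and the check is written as a pointer difference so no out-of-range pointer is ever formed.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper mirroring the bounds check in the trace-copy path:
// returns how many bytes may be copied to `dest` without reaching past `end`.
// std::out_of_range stands in for EXC_RAISING(HSA_STATUS_ERROR, ...).
std::size_t clamp_copy_size(const char* dest, std::size_t size, const char* end) {
  // The record starts at or beyond the end of the output buffer: reject it.
  if (dest >= end) throw std::out_of_range("Trace data out of output buffer");
  // Otherwise copy at most the bytes that remain in the buffer.
  const std::size_t remaining = static_cast<std::size_t>(end - dest);
  return size < remaining ? size : remaining;
}
```

This is equivalent to the original `if ((dest + size) >= end) { if (dest < end) size = end - dest; else ... }`. In both versions, a record that exactly fills the remaining space keeps its size.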
+37 -45
@@ -31,7 +31,7 @@ THE SOFTWARE.
namespace rocprofiler {
class ContextPool {
public:
public:
typedef uint64_t index_t;
typedef std::mutex mutex_t;
@@ -41,16 +41,12 @@ class ContextPool {
std::atomic<bool> completed;
};
static ContextPool* Create(
uint32_t num_entries,
uint32_t payload_bytes,
const util::AgentInfo* agent_info,
rocprofiler_feature_t* info,
const uint32_t info_count,
rocprofiler_pool_handler_t handler,
void* handler_arg)
{
ContextPool* obj = new ContextPool(num_entries, payload_bytes, agent_info, info, info_count, handler, handler_arg);
static ContextPool* Create(uint32_t num_entries, uint32_t payload_bytes,
const util::AgentInfo* agent_info, rocprofiler_feature_t* info,
const uint32_t info_count, rocprofiler_pool_handler_t handler,
void* handler_arg) {
ContextPool* obj = new ContextPool(num_entries, payload_bytes, agent_info, info, info_count,
handler, handler_arg);
if (obj == NULL) EXC_RAISING(HSA_STATUS_ERROR, "allocation error");
return obj;
}
@@ -61,18 +57,18 @@ class ContextPool {
if (constructed_ == false) {
Construct(agent_info_, info_, info_count_);
}
const index_t write_index = write_index_.fetch_add(entry_size_bytes_, std::memory_order_relaxed);
const index_t write_index =
write_index_.fetch_add(entry_size_bytes_, std::memory_order_relaxed);
while (write_index >= (read_index_.load(std::memory_order_acquire) + array_size_bytes_)) {
check_completed();
std::this_thread::yield();
}
entry_t* entry = GetPoolEntry(write_index, pool_entry);
if (entry->completed.load(std::memory_order_relaxed) != false) EXC_RAISING(HSA_STATUS_ERROR, "Corrupted pool entry");
if (entry->completed.load(std::memory_order_relaxed) != false)
EXC_RAISING(HSA_STATUS_ERROR, "Corrupted pool entry");
}
void Flush() {
check_completed();
}
void Flush() { check_completed(); }
#if 0
template <class F>
F for_each(const F& f_p) {
@@ -95,7 +91,7 @@ class ContextPool {
return f;
}
#endif
private:
private:
static unsigned aligned64(const unsigned& size) { return (size + 0x3f) & ~0x3fu; }
static bool context_handler(rocprofiler_group_t group, void* arg) {
@@ -105,45 +101,41 @@ class ContextPool {
return true;
}
ContextPool(
uint32_t num_entries,
uint32_t payload_bytes,
const util::AgentInfo* agent_info,
rocprofiler_feature_t* info,
const uint32_t info_count,
rocprofiler_pool_handler_t pool_handler,
void* pool_handler_arg
) :
payload_off_(aligned64(sizeof(entry_t))),
entry_size_bytes_(payload_off_ + aligned64(payload_bytes)),
array_size_bytes_(entry_size_bytes_ * num_entries),
array_(NULL),
read_index_(0),
write_index_(0),
sync_flag_(false),
ContextPool(uint32_t num_entries, uint32_t payload_bytes, const util::AgentInfo* agent_info,
rocprofiler_feature_t* info, const uint32_t info_count,
rocprofiler_pool_handler_t pool_handler, void* pool_handler_arg)
: payload_off_(aligned64(sizeof(entry_t))),
entry_size_bytes_(payload_off_ + aligned64(payload_bytes)),
array_size_bytes_(entry_size_bytes_ * num_entries),
array_(NULL),
read_index_(0),
write_index_(0),
sync_flag_(false),
agent_info_(agent_info),
info_(info),
info_count_(info_count),
pool_handler_(pool_handler),
pool_handler_arg_(pool_handler_arg),
constructed_(false)
{}
agent_info_(agent_info),
info_(info),
info_count_(info_count),
pool_handler_(pool_handler),
pool_handler_arg_(pool_handler_arg),
constructed_(false) {}
void Construct(const util::AgentInfo* agent_info, rocprofiler_feature_t* info, const uint32_t info_count) {
void Construct(const util::AgentInfo* agent_info, rocprofiler_feature_t* info,
const uint32_t info_count) {
std::lock_guard<mutex_t> lck(mutex_);
if (constructed_ == false) {
array_data_ = (char*) malloc(array_size_bytes_ + 0x3f);
array_data_ = (char*)malloc(array_size_bytes_ + 0x3f);
array_ = reinterpret_cast<char*>(((intptr_t)array_data_ + 0x3f) >> 6 << 6);
if (((intptr_t)array_ & 0x3f) != 0) EXC_RAISING(HSA_STATUS_ERROR, "Pool array is not aligned");
if (((intptr_t)array_ & 0x3f) != 0)
EXC_RAISING(HSA_STATUS_ERROR, "Pool array is not aligned");
memset(array_, 0, array_size_bytes_);
const char* end = array_ + array_size_bytes_;
for (char* ptr = array_; ptr < end; ptr += entry_size_bytes_) {
entry_t* entry = reinterpret_cast<entry_t*>(ptr);
entry->pool = this;
entry->context = Context::Create(agent_info, NULL, info, info_count, ContextPool::context_handler, ptr);
entry->context =
Context::Create(agent_info, NULL, info, info_count, ContextPool::context_handler, ptr);
}
constructed_ = true;
@@ -175,7 +167,7 @@ class ContextPool {
if (sync_flag_.test_and_set(std::memory_order_acquire) == false) {
index_t read_index = read_index_.load(std::memory_order_relaxed);
const index_t write_index = write_index_.load(std::memory_order_relaxed);
while(read_index < write_index) {
while (read_index < write_index) {
rocprofiler_pool_entry_t pool_entry{};
entry_t* entry = GetPoolEntry(read_index, &pool_entry);
if (entry->completed.load(std::memory_order_acquire) == true) {
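`ContextPool` lays out its entries on 64-byte boundaries. `aligned64` rounds sizes up to a multiple of 64. `Construct` over-allocates by `0x3f` bytes and rounds the returned address up with `((intptr_t)p + 0x3f) >> 6 << 6`. The sketch below reproduces both steps as a standalone illustration. `align_up_64` is a hypothetical name, and it uses `std::uintptr_t` with a mask, which for non-negative addresses matches the shift pair in the original.

```cpp
#include <cstdint>
#include <cstdlib>

// Round a byte count up to the next multiple of 64 (one cache line),
// the same expression ContextPool::aligned64 uses.
constexpr unsigned aligned64(unsigned size) { return (size + 0x3f) & ~0x3fu; }

// Round an address up to the next 64-byte boundary. Callers allocate 0x3f
// extra bytes (as Construct does with malloc(array_size_bytes_ + 0x3f)) so
// the rounded-up pointer still has the full array behind it; the raw pointer
// must be kept for free().
char* align_up_64(char* raw) {
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw);
  return reinterpret_cast<char*>((addr + 0x3f) & ~static_cast<std::uintptr_t>(0x3f));
}
```

For example, with a 24-byte entry header and a 100-byte payload, the per-entry stride `payload_off_ + aligned64(payload_bytes)` comes out to 64 + 128 = 192 bytes.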
+11 -12
@@ -1,8 +1,7 @@
#ifndef _CORE_TIMER_H_
#define _CORE_TIMER_H_
template <int Size>
class CoreTimer {
template <int Size> class CoreTimer {
CoreTimer() {
index_ = 0;
freq_in_100mhz_ = MeasureTSCFreqHz();
@@ -20,15 +19,15 @@ class CoreTimer {
// AMD Linux timing
unsigned int unused;
n = __rdtscp(&unused);
data_[index_] = 10 * n / freq_in_100mhz_; // unit is ns
data_[index_] = 10 * n / freq_in_100mhz_; // unit is ns
index_ += 1;
}
double Print()
double Print()
private:
// timer data
double data_[Size];
private :
// timer data
double data_[Size];
// data index
uint32_t index_;
// frequency
@@ -40,20 +39,20 @@ class CoreTimer {
clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
return uint64_t(ts.tv_sec) * 1000000 + ts.tv_nsec / 1000;
}
uint64_t CoreTimer::MeasureTSCFreqHz() {
// Make a coarse interval measurement of TSC ticks for 1 gigacycles.
unsigned int unused;
uint64_t tscTicksEnd;
uint64_t coarseBeginUs = CoarseTimestampUs();
uint64_t tscTicksBegin = __rdtscp(&unused);
do {
tscTicksEnd = __rdtscp(&unused);
} while (tscTicksEnd - tscTicksBegin < 1000000000);
uint64_t coarseEndUs = CoarseTimestampUs();
// Compute the TSC frequency and round to nearest 100MHz.
uint64_t coarseIntervalNs = (coarseEndUs - coarseBeginUs) * 1000;
uint64_t tscIntervalTicks = tscTicksEnd - tscTicksBegin;
@@ -61,4 +60,4 @@ class CoreTimer {
}
};
#endif // _CORE_TIMER_H_
#endif // _CORE_TIMER_H_
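`MeasureTSCFreqHz` spins for about 10^9 TSC ticks and measures the elapsed `CLOCK_MONOTONIC_RAW` time. Its comment says it then rounds the frequency to the nearest 100 MHz. The timer later converts a raw reading to nanoseconds with `10 * n / freq_in_100mhz_`. The rounding code itself is elided from this diff, so the arithmetic below is an assumption consistent with those comments, and both function names are hypothetical. Because f_Hz = ticks / (ns · 10^-9), the frequency in 100 MHz units is 10 · ticks / ns, and a reading converts back as ns = 10 · ticks / f.

```cpp
#include <cstdint>

// Assumed rounding step: TSC frequency in units of 100 MHz, rounded to nearest.
// freq_hz = ticks / (interval_ns * 1e-9), so freq_hz / 1e8 = 10 * ticks / interval_ns.
std::uint64_t tsc_freq_in_100mhz(std::uint64_t tsc_ticks, std::uint64_t interval_ns) {
  return (10 * tsc_ticks + interval_ns / 2) / interval_ns;
}

// Convert a raw TSC reading to nanoseconds, as in `10 * n / freq_in_100mhz_`.
// Done in double here; the original's operand types are not visible in the diff.
double ticks_to_ns(std::uint64_t ticks, std::uint64_t freq_in_100mhz) {
  return 10.0 * static_cast<double>(ticks) / static_cast<double>(freq_in_100mhz);
}
```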
+5 -9
@@ -27,8 +27,7 @@ namespace Counter {
static std::atomic<uint64_t> COUNTER_COUNTER{0};
DerivedCounter::DerivedCounter(std::string name, std::string description,
std::string gpu_name)
DerivedCounter::DerivedCounter(std::string name, std::string description, std::string gpu_name)
: Counter(name, description, gpu_name) {
metric_id_ = COUNTER_COUNTER.fetch_add(1, std::memory_order_release);
addCounterToCounterMap();
@@ -41,20 +40,17 @@ DerivedCounter::~DerivedCounter() {
uint64_t DerivedCounter::getMetricId() { return metric_id_; }
std::map<uint64_t, BasicCounter*> *DerivedCounter::getAllCounters() {
return &counters_;
}
std::map<uint64_t, BasicCounter*>* DerivedCounter::getAllCounters() { return &counters_; }
BasicCounter *DerivedCounter::getBasicCounterFromDerived(uint64_t counter_id) {
BasicCounter* DerivedCounter::getBasicCounterFromDerived(uint64_t counter_id) {
return counters_[counter_id];
}
void DerivedCounter::addBasicCounter(uint64_t counter_id,
BasicCounter *counter) {
void DerivedCounter::addBasicCounter(uint64_t counter_id, BasicCounter* counter) {
counters_.emplace(counter_id, counter);
}
@DERIVED_XML_PARSE_RESULT@
@DERIVED_XML_PARSE_RESULT @
} // namespace Counter
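`DerivedCounter` takes its `metric_id_` from a process-wide atomic (`COUNTER_COUNTER.fetch_add(1, ...)`). A minimal sketch of that id allocator follows, with hypothetical names. A single `fetch_add` is an atomic read-modify-write, so concurrent callers never receive the same value even with `memory_order_relaxed`. The original's `memory_order_release` is also valid; a stronger ordering only matters if the id is used to publish other data.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical process-wide source of unique metric ids, in the style of
// COUNTER_COUNTER in derived_counter.cpp.
namespace {
std::atomic<std::uint64_t> next_metric_id{0};
}

// Each call returns a distinct id; atomicity alone guarantees uniqueness.
std::uint64_t allocate_metric_id() {
  return next_metric_id.fetch_add(1, std::memory_order_relaxed);
}
```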
+1 -2
@@ -39,8 +39,7 @@ namespace Counter {
class DerivedCounter : Counter {
public:
std::function<uint64_t()> evaluate_metric;
DerivedCounter(std::string name, std::string description,
std::string gpu_name);
DerivedCounter(std::string name, std::string description, std::string gpu_name);
~DerivedCounter();
uint64_t getMetricId();
+4 -3
@@ -108,7 +108,7 @@ bool metrics::ExtractMetricEvents(
// adding result object for derived metric
std::lock_guard<std::mutex> lock(extract_metric_events_lock);
if(metric_names[i].compare("KERNEL_DURATION")==0) {
if (metric_names[i].compare("KERNEL_DURATION") == 0) {
if (results_map.find(metric_names[i]) == results_map.end()) {
results_map[metric_names[i]] = new results_t(metric_names[i], {}, xcc_count);
}
@@ -192,7 +192,7 @@ bool metrics::GetMetricsData(std::map<std::string, results_t*>& results_map,
auto it = results_map.find(metric->GetName());
if (it == results_map.end()) rocprofiler::fatal("metric results not found ");
results_t* res = it->second;
if(metric->GetName().compare("KERNEL_DURATION") == 0) {
if (metric->GetName().compare("KERNEL_DURATION") == 0) {
res->val_double = kernel_duration;
continue;
}
@@ -206,7 +206,8 @@ bool metrics::GetMetricsData(std::map<std::string, results_t*>& results_map,
void metrics::GetCountersAndMetricResultsByXcc(uint32_t xcc_index,
std::vector<results_t*>& results_list,
std::map<std::string, results_t*>& results_map,
std::vector<const Metric*>& metrics_list, uint64_t kernel_duration) {
std::vector<const Metric*>& metrics_list,
uint64_t kernel_duration) {
for (auto it = results_list.begin(); it != results_list.end(); it++) {
(*it)->val_double =
(*it)->xcc_vals[xcc_index]; // set val_double to hold value for specific xcc
+7 -6
@@ -35,10 +35,10 @@ namespace rocprofiler {
typedef std::vector<double> xcc_results_t;
class results_t{
public:
results_t(std::string in_name, event_t in_event, uint32_t xcc_count):
name(in_name), val_double(0), event(in_event) {
class results_t {
public:
results_t(std::string in_name, event_t in_event, uint32_t xcc_count)
: name(in_name), val_double(0), event(in_event) {
xcc_vals.resize(xcc_count);
std::fill(xcc_vals.begin(), xcc_vals.end(), 0);
}
@@ -78,8 +78,9 @@ bool GetMetricsData(std::map<std::string, results_t*>& results_map,
std::vector<const Metric*>& metrics_list, uint64_t kernel_duration = 0);
void GetCountersAndMetricResultsByXcc(uint32_t xcc_index, std::vector<results_t*>& results_list,
std::map<std::string, results_t*>& results_map,
std::vector<const Metric*>& metrics_list, uint64_t kernel_duration = 0);
std::map<std::string, results_t*>& results_map,
std::vector<const Metric*>& metrics_list,
uint64_t kernel_duration = 0);
} // namespace metrics
} // namespace rocprofiler
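`results_t` stores one value per XCC in `xcc_vals`, zero-initialized in the constructor. `GetCountersAndMetricResultsByXcc` copies the slot for a single XCC into `val_double` before the metrics are evaluated. The reduced type below is a hypothetical sketch of that pattern, not the real class. It builds the vector with the `(count, value)` constructor instead of `resize` followed by `std::fill`, and it uses `.at()` to add a bounds check that the original indexing does not have.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, reduced version of results_t: one slot per XCC plus a scalar
// view that points at a single XCC's value.
struct xcc_results {
  std::string name;
  double val_double = 0;
  std::vector<double> xcc_vals;

  xcc_results(std::string in_name, std::uint32_t xcc_count)
      : name(std::move(in_name)), xcc_vals(xcc_count, 0.0) {}

  // Mirror of `val_double = xcc_vals[xcc_index]`, with a bounds check.
  void select_xcc(std::uint32_t xcc_index) { val_double = xcc_vals.at(xcc_index); }
};
```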
+1 -1
@@ -45,7 +45,7 @@ THE SOFTWARE.
do { \
std::ostringstream oss; \
oss << __FUNCTION__ << "(), " << stream; \
throw rocprofiler::util::exception(error, oss.str()); \
throw rocprofiler::util::exception(error, oss.str()); \
} while (0)
#define AQL_EXC_RAISING(error, stream) \
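`EXC_RAISING` splices its `stream` argument into an `std::ostringstream` expression, which is why call sites can write `"size(" << size << ")"`. The `do { ... } while (0)` wrapper makes the macro act as one statement, so it works after an unbraced `if`/`else` such as the trace-buffer check in `Context`. The sketch below imitates that shape. `status_exception` and `EXC_RAISING_SKETCH` are hypothetical stand-ins for `rocprofiler::util::exception` and the real macro. The sketch uses the standard `__func__`, where the original uses the common compiler extension `__FUNCTION__`.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for rocprofiler::util::exception: a message plus a status code.
class status_exception : public std::runtime_error {
 public:
  status_exception(int status, const std::string& msg)
      : std::runtime_error(msg), status_(status) {}
  int status() const { return status_; }

 private:
  int status_;
};

// Same shape as EXC_RAISING: prefix the function name, stream the caller's
// message, then throw. do/while(0) keeps it a single statement.
#define EXC_RAISING_SKETCH(error, stream)    \
  do {                                       \
    std::ostringstream oss;                  \
    oss << __func__ << "(), " << stream;     \
    throw status_exception(error, oss.str()); \
  } while (0)

void check_size(int size) {
  if (size < 0)
    EXC_RAISING_SKETCH(1, "invalid size(" << size << ")");
  else
    return;
}
```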
Executable file → Regular file
+3 -6
@@ -221,14 +221,11 @@ class MetricsDict {
agent_name_ = agent_name_.substr(0, agent_name_.find(':'));
std::unordered_set<std::string> supported_agent_names = {
"gfx906",
"gfx908",
"gfx906", "gfx908",
"gfx90a", // Vega
"gfx940",
"gfx941",
"gfx940", "gfx941",
"gfx942", // Mi300
"gfx1030",
"gfx1031",
"gfx1030", "gfx1031",
"gfx1032", // Navi2x
"gfx1100",
"gfx1101" // Navi3x
+2 -3
@@ -17,8 +17,8 @@ class DFPerfMonMI200 : public PerfMon {
DFPerfMonMI200(const Agent::AgentInfo& info);
~DFPerfMonMI200();
void Start() override;
void Stop() {};
void Read(std::vector<rocprofiler_counters_sampler_counter_output_t>& values) {};
void Stop(){};
void Read(std::vector<rocprofiler_counters_sampler_counter_output_t>& values){};
void SetCounterNames(std::vector<std::string>& counter_names);
mmio::mmap_type_t Type() override { return mmio::mmap_type_t::DF_PERFMON; }
@@ -31,7 +31,6 @@ class DFPerfMonMI200 : public PerfMon {
uint64_t GetFicaNodeOutboundBw(uint32_t ficaa_val);
private:
mmio::DFPerfmonMMIO* mmio_;
static std::mutex mutex_; // should be an MMIO member
+17 -25
@@ -13,12 +13,12 @@ PciePerfMonMI200::~PciePerfMonMI200() {
mmio::MMIOManager::DestroyMMIOInstance(dynamic_cast<mmio::MMIO*>(mmio_));
}
void PciePerfMonMI200::writeRegister(uint32_t reg_offset, uint32_t value){
void PciePerfMonMI200::writeRegister(uint32_t reg_offset, uint32_t value) {
// mmio or ioctl approaches
mmio_->RegisterWriteAPI(reg_offset, value);
mmio_->RegisterWriteAPI(reg_offset, value);
}
void PciePerfMonMI200::readRegister(uint32_t reg_offset, uint32_t& value){
void PciePerfMonMI200::readRegister(uint32_t reg_offset, uint32_t& value) {
// mmio or ioctl approaches
mmio_->RegisterReadAPI(reg_offset, value);
}
@@ -35,44 +35,40 @@ void PciePerfMonMI200::SetCounterNames(std::vector<std::string>& counter_names)
}
}
void PciePerfMonMI200::Start(){
void PciePerfMonMI200::Start() {
// TODO: make sure values stored in table
// in registers header are dec and not hex
Start_RX_TILE_SCLK(event_id_);
}
void PciePerfMonMI200::Stop(){
void PciePerfMonMI200::Stop() {
// TODO: revisit correct value to stop
writeRegister(PCIE_MI200::PCIE_PERF_COUNT_CNTL, 0x2); // stop
writeRegister(PCIE_MI200::PCIE_PERF_COUNT_CNTL, 0x2); // stop
}
void PciePerfMonMI200::Read(std::vector<rocprofiler_counters_sampler_counter_output_t>& values){
uint64_t val=0;
void PciePerfMonMI200::Read(std::vector<rocprofiler_counters_sampler_counter_output_t>& values) {
uint64_t val = 0;
Read_RX_TILE_SCLK(val);
rocprofiler_counters_sampler_counter_output_t value = {
ROCPROFILER_COUNTERS_SAMPLER_PCIE_COUNTERS,
static_cast<double>(val)
};
rocprofiler_counters_sampler_counter_output_t value = {ROCPROFILER_COUNTERS_SAMPLER_PCIE_COUNTERS,
static_cast<double>(val)};
values.push_back(value);
}
void PciePerfMonMI200::Start_RX_TILE_TXCLK(uint32_t event){
void PciePerfMonMI200::Start_RX_TILE_TXCLK(uint32_t event) {
// Step 1: PORT SEL update
writeRegister(PCIE_MI200::PCIE_PERF_CNTL_EVENT_CI_PORT_SEL, 0x0);
// Step 2: EVENT SEL update
uint32_t value = event; // last 8 bits for event
uint32_t value = event; // last 8 bits for event
writeRegister(PCIE_MI200::PCIE_PERF_CNTL_TXCLK3, value);
// Steps 3 & 4: Performance counters initialization, enable:
// TODO: revisit. Just a single write with 0x3 might be enough (check with pcie team)
writeRegister(PCIE_MI200::PCIE_PERF_COUNT_CNTL, 0x5);
}
void PciePerfMonMI200::Read_RX_TILE_TXCLK(uint64_t& result){
void PciePerfMonMI200::Read_RX_TILE_TXCLK(uint64_t& result) {
// Step 5: Performance counters read:
uint32_t lo_val, hi_val;
readRegister(PCIE_MI200::PCIE_PERF_COUNT0_TXCLK3, lo_val);
@@ -84,22 +80,20 @@ void PciePerfMonMI200::Read_RX_TILE_TXCLK(uint64_t& result){
result = val | lo_val;
}
void PciePerfMonMI200::Start_RX_TILE_SCLK(uint32_t event){
void PciePerfMonMI200::Start_RX_TILE_SCLK(uint32_t event) {
// Step 1: PORT SEL update
writeRegister(PCIE_MI200::PCIE_PERF_CNTL_EVENT_CI_PORT_SEL, 0x0);
// Step 2: EVENT SEL update
uint32_t value = event; // last 8 bits for event
uint32_t value = event; // last 8 bits for event
writeRegister(PCIE_MI200::PCIE_PERF_CNTL_LCLK1, value);
// Steps 3 & 4: Performance counters initialization, enable:
// TODO: revisit. Just a single write with 0x3 might be enough (check with pcie team)
writeRegister(PCIE_MI200::PCIE_PERF_COUNT_CNTL, 0x5);
}
void PciePerfMonMI200::Read_RX_TILE_SCLK(uint64_t& result){
void PciePerfMonMI200::Read_RX_TILE_SCLK(uint64_t& result) {
// Step 5: Performance counters read:
uint32_t lo_val, hi_val;
readRegister(PCIE_MI200::PCIE_PERF_COUNT0_LCLK1, lo_val);
@@ -111,6 +105,4 @@ void PciePerfMonMI200::Read_RX_TILE_SCLK(uint64_t& result){
result = val | lo_val;
}
} // namespace rocprofiler
} // namespace rocprofiler
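`Read_RX_TILE_TXCLK` and `Read_RX_TILE_SCLK` read two 32-bit registers and finish with `result = val | lo_val;`. The lines that build `val` are elided from the hunk. The sketch below assumes `val` is the high word, taken from the matching `*_UPVAL_*` register, widened to 64 bits and shifted left by 32. That assumption is not confirmed by the diff, and `combine_counter` is a hypothetical name.

```cpp
#include <cstdint>

// Assumed composition of a 64-bit counter from two 32-bit register reads.
// The high word is widened before shifting so its bits are not lost.
std::uint64_t combine_counter(std::uint32_t hi_val, std::uint32_t lo_val) {
  const std::uint64_t val = static_cast<std::uint64_t>(hi_val) << 32;
  return val | lo_val;
}
```

If the counter keeps running while it is read, the two halves can come from different instants. A common remedy is to read high, low, and high again, and retry when the two high reads differ.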
+1 -1
@@ -22,7 +22,7 @@ class PciePerfMonMI200 : public PerfMon {
mmio::mmap_type_t Type() override { return mmio::mmap_type_t::PCIE_PERFMON; }
private:
// TODO : check google coding std
// TODO : check google coding std
void writeRegister(uint32_t reg_offset, uint32_t value);
void readRegister(uint32_t reg_offset, uint32_t& value);
+236 -237
@@ -4,70 +4,70 @@
#include <stdint.h>
namespace PCIE_MI200 {
// -------- RX Tile TXCLK Start --------
// Step 1: PORT SEL update
const static uint32_t PCIE_PERF_CNTL_EVENT_CI_PORT_SEL = 0x11180250;
// Step 2: EVENT SEL update
const static uint32_t PCIE_PERF_CNTL_TXCLK1 = 0x11180204;
const static uint32_t PCIE_PERF_CNTL_TXCLK2 = 0x11180210;
const static uint32_t PCIE_PERF_CNTL_TXCLK3 = 0x1118021C; //#
const static uint32_t PCIE_PERF_CNTL_TXCLK4 = 0x11180228; //#
const static uint32_t PCIE_PERF_CNTL_TXCLK5 = 0x11180258;
const static uint32_t PCIE_PERF_CNTL_TXCLK6 = 0x11180264;
const static uint32_t PCIE_PERF_CNTL_TXCLK7 = 0x11180888;
const static uint32_t PCIE_PERF_CNTL_TXCLK8 = 0x11180894;
const static uint32_t PCIE_PERF_CNTL_TXCLK9 = 0x111808A0;
const static uint32_t PCIE_PERF_CNTL_TXCLK10 = 0x111808AC;
const static uint32_t PCIE_PERF_CNTL_TXCLK1 = 0x11180204;
const static uint32_t PCIE_PERF_CNTL_TXCLK2 = 0x11180210;
const static uint32_t PCIE_PERF_CNTL_TXCLK3 = 0x1118021C; //#
const static uint32_t PCIE_PERF_CNTL_TXCLK4 = 0x11180228; //#
const static uint32_t PCIE_PERF_CNTL_TXCLK5 = 0x11180258;
const static uint32_t PCIE_PERF_CNTL_TXCLK6 = 0x11180264;
const static uint32_t PCIE_PERF_CNTL_TXCLK7 = 0x11180888;
const static uint32_t PCIE_PERF_CNTL_TXCLK8 = 0x11180894;
const static uint32_t PCIE_PERF_CNTL_TXCLK9 = 0x111808A0;
const static uint32_t PCIE_PERF_CNTL_TXCLK10 = 0x111808AC;
// Steps 3 & 4: Performance counters initialization, enable:
const static uint32_t PCIE_PERF_COUNT_CNTL = 0x11180200;
// Step 5: Performance counters read:
const static uint32_t PCIE_PERF_COUNT0_TXCLK1 = 0x11180208;
const static uint32_t PCIE_PERF_COUNT0_TXCLK2 = 0x11180214;
const static uint32_t PCIE_PERF_COUNT0_TXCLK3 = 0x11180220; //#
const static uint32_t PCIE_PERF_COUNT0_TXCLK4 = 0x1118022C; //#
const static uint32_t PCIE_PERF_COUNT0_TXCLK5 = 0x1118025C;
const static uint32_t PCIE_PERF_COUNT0_TXCLK6 = 0x11180268;
const static uint32_t PCIE_PERF_COUNT0_TXCLK7 = 0x1118088C;
const static uint32_t PCIE_PERF_COUNT0_TXCLK8 = 0x11180898;
const static uint32_t PCIE_PERF_COUNT0_TXCLK9 = 0x111808A4;
const static uint32_t PCIE_PERF_COUNT0_TXCLK10 = 0x111808B0;
const static uint32_t PCIE_PERF_COUNT0_TXCLK1 = 0x11180208;
const static uint32_t PCIE_PERF_COUNT0_TXCLK2 = 0x11180214;
const static uint32_t PCIE_PERF_COUNT0_TXCLK3 = 0x11180220; //#
const static uint32_t PCIE_PERF_COUNT0_TXCLK4 = 0x1118022C; //#
const static uint32_t PCIE_PERF_COUNT0_TXCLK5 = 0x1118025C;
const static uint32_t PCIE_PERF_COUNT0_TXCLK6 = 0x11180268;
const static uint32_t PCIE_PERF_COUNT0_TXCLK7 = 0x1118088C;
const static uint32_t PCIE_PERF_COUNT0_TXCLK8 = 0x11180898;
const static uint32_t PCIE_PERF_COUNT0_TXCLK9 = 0x111808A4;
const static uint32_t PCIE_PERF_COUNT0_TXCLK10 = 0x111808B0;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK1 = 0x111808E8;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK2 = 0x111808F0;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK3 = 0x111808F8; //#
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK4 = 0x11180900; //#
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK5 = 0x11180908;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK6 = 0x11180910;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK7 = 0x11180918;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK8 = 0x11180920;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK9 = 0x11180928;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK1 = 0x111808E8;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK2 = 0x111808F0;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK3 = 0x111808F8; //#
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK4 = 0x11180900; //#
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK5 = 0x11180908;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK6 = 0x11180910;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK7 = 0x11180918;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK8 = 0x11180920;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK9 = 0x11180928;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK10 = 0x11180930;
const static uint32_t PCIE_PERF_COUNT1_TXCLK1 = 0x1118020C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK2 = 0x11180218;
const static uint32_t PCIE_PERF_COUNT1_TXCLK3 = 0x11180224; //#
const static uint32_t PCIE_PERF_COUNT1_TXCLK4 = 0x11180230; //#
const static uint32_t PCIE_PERF_COUNT1_TXCLK5 = 0x11180260;
const static uint32_t PCIE_PERF_COUNT1_TXCLK6 = 0x1118026C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK7 = 0x11180890;
const static uint32_t PCIE_PERF_COUNT1_TXCLK8 = 0x1118089C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK9 = 0x111808A8;
const static uint32_t PCIE_PERF_COUNT1_TXCLK1 = 0x1118020C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK2 = 0x11180218;
const static uint32_t PCIE_PERF_COUNT1_TXCLK3 = 0x11180224; //#
const static uint32_t PCIE_PERF_COUNT1_TXCLK4 = 0x11180230; //#
const static uint32_t PCIE_PERF_COUNT1_TXCLK5 = 0x11180260;
const static uint32_t PCIE_PERF_COUNT1_TXCLK6 = 0x1118026C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK7 = 0x11180890;
const static uint32_t PCIE_PERF_COUNT1_TXCLK8 = 0x1118089C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK9 = 0x111808A8;
const static uint32_t PCIE_PERF_COUNT1_TXCLK10 = 0x111808B4;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK1 = 0x111808EC;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK2 = 0x111808F4;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK3 = 0x111808FC; //#
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK4 = 0x11180904; //#
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK5 = 0x1118090C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK6 = 0x11180914;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK7 = 0x1118091C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK8 = 0x11180924;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK9 = 0x1118092C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK1 = 0x111808EC;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK2 = 0x111808F4;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK3 = 0x111808FC; //#
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK4 = 0x11180904; //#
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK5 = 0x1118090C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK6 = 0x11180914;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK7 = 0x1118091C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK8 = 0x11180924;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK9 = 0x1118092C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK10 = 0x11180934;
@@ -127,201 +127,200 @@ const static uint32_t PCIE_PERF_COUNT1_UPVAL_LCLK8 = 0x11180974;
// -------- RX Tile SCLK End ----------
typedef enum{
TX_TILE_TXCLK = 0,
TX_TILE_SCLK = 1,
RX_TILE_TXCLK = 2,
RX_TILE_SCLK = 3,
LC_TILE_TXCLK = 4
}pcie_event_category_t;
typedef enum {
TX_TILE_TXCLK = 0,
TX_TILE_SCLK = 1,
RX_TILE_TXCLK = 2,
RX_TILE_SCLK = 3,
LC_TILE_TXCLK = 4
} pcie_event_category_t;
struct pcie_event_t{
pcie_event_t(int id, pcie_event_category_t cat): event_id(id), event_category(cat){}
int event_id;
pcie_event_category_t event_category;
struct pcie_event_t {
pcie_event_t(int id, pcie_event_category_t cat) : event_id(id), event_category(cat) {}
int event_id;
pcie_event_category_t event_category;
};
const static std::map<std::string, pcie_event_t> pcie_events_table = {
{"RX_PERF_RXP_RX_TailEdb_A[0]", {2, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[1]", {3, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[2]", {4, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[3]", {5, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[0]", {6, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[1]", {7, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[2]", {8, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[3]", {9, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[0]", {10, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[1]", {11, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[2]", {12, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[3]", {13, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[0]", {14, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[1]", {15, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[2]", {16, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[3]", {17, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_nullified_tlp_A", {18, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_valid_crc_A", {19, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_invalid_crc_A", {20, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_vendor_type1_A", {21, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_vendor_type0_A", {22, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_set_slot_power_limit_A", {23, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_unlock_A", {24, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_fatal_A", {25, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_nonfatal_A", {26, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_corr_A", {27, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pme_to_ack_A", {28, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pme_turn_off_A", {29, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pm_pme_A", {30, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pm_active_state_nak_A", {31, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intd_A", {32, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intc_A", {33, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intb_A", {34, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_inta_A", {35, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intd_A", {36, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intc_A", {37, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intb_A", {38, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_inta_A", {39, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_valid_A", {40, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_unsupported_A", {41, RX_TILE_TXCLK}},
{"RX_PERF_RCB_unexpected_cpl_A", {42, RX_TILE_TXCLK}},
{"RX_PERF_RCB_timeout_cpl_A", {43, RX_TILE_TXCLK}},
{"RX_PERF_HDS_tlphdrvalid_A", {44, RX_TILE_TXCLK}},
{"RX_PERF_HDS_tlpdatavalid_A", {45, RX_TILE_TXCLK}},
{"RX_PERF_GAN_bad_tlp_A", {46, RX_TILE_TXCLK}},
{"RX_PERF_GAN_nak_A", {47, RX_TILE_TXCLK}},
{"RX_PERF_GAN_ack_A", {48, RX_TILE_TXCLK}},
{"RX_PERF_FE_unsupported_req_A", {49, RX_TILE_TXCLK}},
{"RX_PERF_FE_unsupported_cpl_A", {50, RX_TILE_TXCLK}},
{"RX_PERF_FE_unexpected_cpl_A", {51, RX_TILE_TXCLK}},
{"RX_PERF_FE_poisoned_tlp_A", {52, RX_TILE_TXCLK}},
{"RX_PERF_FE_poisoned_cpl_A", {53, RX_TILE_TXCLK}},
{"RX_PERF_FE_malformed_tlp_A", {54, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_abort_A", {55, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MSG_A", {56, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_CFG_WR_A", {57, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_CFG_RD_A", {58, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_IO_WR_A", {59, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_IO_RD_A", {60, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MEM_WR_A", {61, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MEM_RD_A", {62, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_gt16_A", {63, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_9to16_A", {64, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_5to8_A", {65, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_2to4_A", {66, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_1_A", {67, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_gt32_A", {68, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_17to32_A", {69, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_9to16_A", {70, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_5to8_A", {71, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_2to4_A", {72, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_1_A", {73, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_CA_A", {74, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_CRS_A", {75, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_UR_A", {76, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_SC_A", {77, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_active_state_request_l1_A", {78, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_request_ack_A", {79, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_enter_l23_A", {80, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_enter_l1_A", {81, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_error_A", {82, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_crc_err_A", {83, RX_TILE_TXCLK}},
{"SB_PERF_FCC_npd_0", {84, RX_TILE_TXCLK}},
{"SB_PERF_FCC_pd_0", {85, RX_TILE_TXCLK}},
{"SB_PERF_FCC_nph_0", {86, RX_TILE_TXCLK}},
{"SB_PERF_FCC_ph_0", {87, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_rd_hdr_0", {88, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_rd_hdr_0", {89, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_wr_hdr_0", {90, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_wr_hdr_0", {91, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_data_0", {92, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_data_0", {93, RX_TILE_TXCLK}},
{"SB_PERF_invalid_crc_0", {94, RX_TILE_TXCLK}},
{"SB_PERF_valid_crc_0", {95, RX_TILE_TXCLK}},
{"SB_PERF_rd_hdr_WEN_0", {96, RX_TILE_TXCLK}},
{"SB_PERF_wr_hdr_WEN_0", {97, RX_TILE_TXCLK}},
{"SB_PERF_data_WEN_0", {98, RX_TILE_TXCLK}},
{"SB_PERF_non_post_rd_from_FE", {99, RX_TILE_TXCLK}},
{"SB_PERF_non_post_wr_from_FE", {100, RX_TILE_TXCLK}},
{"SB_PERF_post_req_from_FE", {101, RX_TILE_TXCLK}},
{"SB_PERF_non_post_rd_from_FE_0", {102, RX_TILE_TXCLK}},
{"SB_PERF_non_post_wr_from_FE_0", {103, RX_TILE_TXCLK}},
{"SB_PERF_post_req_from_FE_0", {104, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_nak_A", {111, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_ack_A", {112, RX_TILE_TXCLK}},
{"RX_PERF_allErrors_A", {113, RX_TILE_TXCLK}},
{"perf_PG_COUNT", {175, RX_TILE_TXCLK}},
{"perf_NOT_POWER_GATED", {176, RX_TILE_TXCLK}},
{"perf_POWER_GATED", {177, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[0]", {2, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[1]", {3, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[2]", {4, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[3]", {5, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[0]", {6, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[1]", {7, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[2]", {8, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[3]", {9, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[0]", {10, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[1]", {11, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[2]", {12, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[3]", {13, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[0]", {14, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[1]", {15, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[2]", {16, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[3]", {17, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_nullified_tlp_A", {18, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_valid_crc_A", {19, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_invalid_crc_A", {20, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_vendor_type1_A", {21, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_vendor_type0_A", {22, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_set_slot_power_limit_A", {23, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_unlock_A", {24, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_fatal_A", {25, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_nonfatal_A", {26, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_corr_A", {27, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pme_to_ack_A", {28, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pme_turn_off_A", {29, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pm_pme_A", {30, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pm_active_state_nak_A", {31, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intd_A", {32, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intc_A", {33, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intb_A", {34, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_inta_A", {35, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intd_A", {36, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intc_A", {37, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intb_A", {38, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_inta_A", {39, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_valid_A", {40, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_unsupported_A", {41, RX_TILE_TXCLK}},
{"RX_PERF_RCB_unexpected_cpl_A", {42, RX_TILE_TXCLK}},
{"RX_PERF_RCB_timeout_cpl_A", {43, RX_TILE_TXCLK}},
{"RX_PERF_HDS_tlphdrvalid_A", {44, RX_TILE_TXCLK}},
{"RX_PERF_HDS_tlpdatavalid_A", {45, RX_TILE_TXCLK}},
{"RX_PERF_GAN_bad_tlp_A", {46, RX_TILE_TXCLK}},
{"RX_PERF_GAN_nak_A", {47, RX_TILE_TXCLK}},
{"RX_PERF_GAN_ack_A", {48, RX_TILE_TXCLK}},
{"RX_PERF_FE_unsupported_req_A", {49, RX_TILE_TXCLK}},
{"RX_PERF_FE_unsupported_cpl_A", {50, RX_TILE_TXCLK}},
{"RX_PERF_FE_unexpected_cpl_A", {51, RX_TILE_TXCLK}},
{"RX_PERF_FE_poisoned_tlp_A", {52, RX_TILE_TXCLK}},
{"RX_PERF_FE_poisoned_cpl_A", {53, RX_TILE_TXCLK}},
{"RX_PERF_FE_malformed_tlp_A", {54, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_abort_A", {55, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MSG_A", {56, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_CFG_WR_A", {57, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_CFG_RD_A", {58, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_IO_WR_A", {59, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_IO_RD_A", {60, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MEM_WR_A", {61, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MEM_RD_A", {62, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_gt16_A", {63, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_9to16_A", {64, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_5to8_A", {65, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_2to4_A", {66, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_1_A", {67, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_gt32_A", {68, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_17to32_A", {69, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_9to16_A", {70, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_5to8_A", {71, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_2to4_A", {72, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_1_A", {73, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_CA_A", {74, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_CRS_A", {75, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_UR_A", {76, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_SC_A", {77, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_active_state_request_l1_A", {78, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_request_ack_A", {79, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_enter_l23_A", {80, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_enter_l1_A", {81, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_error_A", {82, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_crc_err_A", {83, RX_TILE_TXCLK}},
{"SB_PERF_FCC_npd_0", {84, RX_TILE_TXCLK}},
{"SB_PERF_FCC_pd_0", {85, RX_TILE_TXCLK}},
{"SB_PERF_FCC_nph_0", {86, RX_TILE_TXCLK}},
{"SB_PERF_FCC_ph_0", {87, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_rd_hdr_0", {88, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_rd_hdr_0", {89, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_wr_hdr_0", {90, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_wr_hdr_0", {91, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_data_0", {92, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_data_0", {93, RX_TILE_TXCLK}},
{"SB_PERF_invalid_crc_0", {94, RX_TILE_TXCLK}},
{"SB_PERF_valid_crc_0", {95, RX_TILE_TXCLK}},
{"SB_PERF_rd_hdr_WEN_0", {96, RX_TILE_TXCLK}},
{"SB_PERF_wr_hdr_WEN_0", {97, RX_TILE_TXCLK}},
{"SB_PERF_data_WEN_0", {98, RX_TILE_TXCLK}},
{"SB_PERF_non_post_rd_from_FE", {99, RX_TILE_TXCLK}},
{"SB_PERF_non_post_wr_from_FE", {100, RX_TILE_TXCLK}},
{"SB_PERF_post_req_from_FE", {101, RX_TILE_TXCLK}},
{"SB_PERF_non_post_rd_from_FE_0", {102, RX_TILE_TXCLK}},
{"SB_PERF_non_post_wr_from_FE_0", {103, RX_TILE_TXCLK}},
{"SB_PERF_post_req_from_FE_0", {104, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_nak_A", {111, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_ack_A", {112, RX_TILE_TXCLK}},
{"RX_PERF_allErrors_A", {113, RX_TILE_TXCLK}},
{"perf_PG_COUNT", {175, RX_TILE_TXCLK}},
{"perf_NOT_POWER_GATED", {176, RX_TILE_TXCLK}},
{"perf_POWER_GATED", {177, RX_TILE_TXCLK}},
{"SB_PERF_non_post_rd_to_HI", {2, RX_TILE_SCLK}},
{"SB_PERF_non_post_wr_to_HI", {3, RX_TILE_SCLK}},
{"SB_PERF_post_req_to_HI", {4, RX_TILE_SCLK}},
{"SB_PERF_non_post_rd_to_HI_0", {5, RX_TILE_SCLK}},
{"SB_PERF_non_post_wr_to_HI_0", {6, RX_TILE_SCLK}},
{"SB_PERF_post_req_to_HI_0", {7, RX_TILE_SCLK}},
{"SB_PERF_rd_hdr_REN_0", {8, RX_TILE_SCLK}},
{"SB_PERF_wr_hdr_REN_0", {9, RX_TILE_SCLK}},
{"SB_PERF_data_REN_0", {10, RX_TILE_SCLK}},
{"SB_PERF_rd_hdr_empty_0", {11, RX_TILE_SCLK}},
{"SB_PERF_wr_hdr_empty_0", {12, RX_TILE_SCLK}},
{"SB_PERF_data_empty_0", {13, RX_TILE_SCLK}},
{"CI_PERF_slv_total128BRdCpl", {29, RX_TILE_SCLK}},
{"CI_PERF_slv_total32BMemRdTx", {30, RX_TILE_SCLK}},
{"CI_PERF_slv_total64BMemRdTx", {31, RX_TILE_SCLK}},
{"CI_PERF_slv_total16BMemWrTx", {32, RX_TILE_SCLK}},
{"CI_PERF_slv_total32BMemWrTx", {33, RX_TILE_SCLK}},
{"CI_PERF_slv_total64BMemWrTx", {34, RX_TILE_SCLK}},
{"CI_PERF_slv_totalTx", {35, RX_TILE_SCLK}},
{"CI_PERF_slv_stallGrantGen", {36, RX_TILE_SCLK}},
{"CI_PERF_slv_totalGrant", {37, RX_TILE_SCLK}},
{"CI_PERF_slv_txPending", {38, RX_TILE_SCLK}},
{"CI_PERF_slv_numMemRdLT32B", {39, RX_TILE_SCLK}},
{"CI_PERF_slv_numMemRdLT16B", {40, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemTx", {41, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemRdTx", {42, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemWrTx", {43, RX_TILE_SCLK}},
{"CI_PERF_slv_numGrant0", {44, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow_ns0", {45, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow_ns0", {46, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow_s0", {47, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow_s0", {48, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow0", {49, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow0", {50, RX_TILE_SCLK}},
{"CI_PERF_slv_npNotAccepted_ns0", {51, RX_TILE_SCLK}},
{"CI_PERF_slv_npNotAccepted_s0", {52, RX_TILE_SCLK}},
{"CI_PERF_slv_num128BRdCpl0", {53, RX_TILE_SCLK}},
{"CI_PERF_slv_num32BMemRdTx0", {54, RX_TILE_SCLK}},
{"CI_PERF_slv_num64BMemRdTx0", {55, RX_TILE_SCLK}},
{"CI_PERF_slv_num16BMemWrTx0", {56, RX_TILE_SCLK}},
{"CI_PERF_slv_num32BMemWrTx0", {57, RX_TILE_SCLK}},
{"CI_PERF_slv_num64BMemWrTx0", {58, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_Bandwidth0", {59, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_Bandwidth0", {60, RX_TILE_SCLK}},
{"TX_PERF_S_RCLK_s_tag_buf_empty", {61, RX_TILE_SCLK}},
{"P_request_latency_500ns_or_more", {62, RX_TILE_SCLK}},
{"P_request_latency_250_to_500ns", {63, RX_TILE_SCLK}},
{"P_request_latency_100_to_250ns", {64, RX_TILE_SCLK}},
{"P_request_latency_100ns_or_less", {65, RX_TILE_SCLK}},
{"NP_request_latency_500ns_or_more", {66, RX_TILE_SCLK}},
{"NP_request_latency_250_to_500ns", {67, RX_TILE_SCLK}},
{"NP_request_latency_100_to_250ns", {68, RX_TILE_SCLK}},
{"NP_request_latency_100ns_or_less", {69, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_cpl_slot[0]", {70, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_tag[0]", {71, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_d_credit[0]", {72, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_h_credit[0]", {73, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_tag[0]", {74, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_d_credit[0]", {75, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_h_credit[0]", {76, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_no_tags_q", {77, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_data_credits_q", {78, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_req_credits_q", {79, RX_TILE_SCLK}},
{"CISLV_PERF_no_cpl_slots_q[0]", {80, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_no_tags_q", {81, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_data_credits_q", {82, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_req_credits_q", {83, RX_TILE_SCLK}}
};
{"SB_PERF_non_post_rd_to_HI", {2, RX_TILE_SCLK}},
{"SB_PERF_non_post_wr_to_HI", {3, RX_TILE_SCLK}},
{"SB_PERF_post_req_to_HI", {4, RX_TILE_SCLK}},
{"SB_PERF_non_post_rd_to_HI_0", {5, RX_TILE_SCLK}},
{"SB_PERF_non_post_wr_to_HI_0", {6, RX_TILE_SCLK}},
{"SB_PERF_post_req_to_HI_0", {7, RX_TILE_SCLK}},
{"SB_PERF_rd_hdr_REN_0", {8, RX_TILE_SCLK}},
{"SB_PERF_wr_hdr_REN_0", {9, RX_TILE_SCLK}},
{"SB_PERF_data_REN_0", {10, RX_TILE_SCLK}},
{"SB_PERF_rd_hdr_empty_0", {11, RX_TILE_SCLK}},
{"SB_PERF_wr_hdr_empty_0", {12, RX_TILE_SCLK}},
{"SB_PERF_data_empty_0", {13, RX_TILE_SCLK}},
{"CI_PERF_slv_total128BRdCpl", {29, RX_TILE_SCLK}},
{"CI_PERF_slv_total32BMemRdTx", {30, RX_TILE_SCLK}},
{"CI_PERF_slv_total64BMemRdTx", {31, RX_TILE_SCLK}},
{"CI_PERF_slv_total16BMemWrTx", {32, RX_TILE_SCLK}},
{"CI_PERF_slv_total32BMemWrTx", {33, RX_TILE_SCLK}},
{"CI_PERF_slv_total64BMemWrTx", {34, RX_TILE_SCLK}},
{"CI_PERF_slv_totalTx", {35, RX_TILE_SCLK}},
{"CI_PERF_slv_stallGrantGen", {36, RX_TILE_SCLK}},
{"CI_PERF_slv_totalGrant", {37, RX_TILE_SCLK}},
{"CI_PERF_slv_txPending", {38, RX_TILE_SCLK}},
{"CI_PERF_slv_numMemRdLT32B", {39, RX_TILE_SCLK}},
{"CI_PERF_slv_numMemRdLT16B", {40, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemTx", {41, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemRdTx", {42, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemWrTx", {43, RX_TILE_SCLK}},
{"CI_PERF_slv_numGrant0", {44, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow_ns0", {45, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow_ns0", {46, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow_s0", {47, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow_s0", {48, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow0", {49, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow0", {50, RX_TILE_SCLK}},
{"CI_PERF_slv_npNotAccepted_ns0", {51, RX_TILE_SCLK}},
{"CI_PERF_slv_npNotAccepted_s0", {52, RX_TILE_SCLK}},
{"CI_PERF_slv_num128BRdCpl0", {53, RX_TILE_SCLK}},
{"CI_PERF_slv_num32BMemRdTx0", {54, RX_TILE_SCLK}},
{"CI_PERF_slv_num64BMemRdTx0", {55, RX_TILE_SCLK}},
{"CI_PERF_slv_num16BMemWrTx0", {56, RX_TILE_SCLK}},
{"CI_PERF_slv_num32BMemWrTx0", {57, RX_TILE_SCLK}},
{"CI_PERF_slv_num64BMemWrTx0", {58, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_Bandwidth0", {59, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_Bandwidth0", {60, RX_TILE_SCLK}},
{"TX_PERF_S_RCLK_s_tag_buf_empty", {61, RX_TILE_SCLK}},
{"P_request_latency_500ns_or_more", {62, RX_TILE_SCLK}},
{"P_request_latency_250_to_500ns", {63, RX_TILE_SCLK}},
{"P_request_latency_100_to_250ns", {64, RX_TILE_SCLK}},
{"P_request_latency_100ns_or_less", {65, RX_TILE_SCLK}},
{"NP_request_latency_500ns_or_more", {66, RX_TILE_SCLK}},
{"NP_request_latency_250_to_500ns", {67, RX_TILE_SCLK}},
{"NP_request_latency_100_to_250ns", {68, RX_TILE_SCLK}},
{"NP_request_latency_100ns_or_less", {69, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_cpl_slot[0]", {70, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_tag[0]", {71, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_d_credit[0]", {72, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_h_credit[0]", {73, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_tag[0]", {74, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_d_credit[0]", {75, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_h_credit[0]", {76, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_no_tags_q", {77, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_data_credits_q", {78, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_req_credits_q", {79, RX_TILE_SCLK}},
{"CISLV_PERF_no_cpl_slots_q[0]", {80, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_no_tags_q", {81, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_data_credits_q", {82, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_req_credits_q", {83, RX_TILE_SCLK}}};
}
} // namespace PCIE_MI200
#endif
+1 -1
@@ -42,6 +42,6 @@ class PerfMon {
std::vector<std::string> counter_names_;
};
} // namespace rocprofiler
} // namespace rocprofiler
#endif
+19 -17
@@ -31,10 +31,8 @@ THE SOFTWARE.
#include "util/hsa_rsrc_factory.h"
namespace rocprofiler {
size_t CreateGpuCommand(gpu_cmd_op_t op,
const rocprofiler::util::AgentInfo* agent_info,
packet_t* command,
const size_t& slot_count) {
size_t CreateGpuCommand(gpu_cmd_op_t op, const rocprofiler::util::AgentInfo* agent_info,
packet_t* command, const size_t& slot_count) {
if (op >= NUMBER_GPU_CMD_OP) EXC_RAISING(HSA_STATUS_ERROR, "bad op value (" << op << ")");
const bool is_legacy = (strncmp(agent_info->name, "gfx8", 4) == 0);
@@ -49,14 +47,15 @@ size_t CreateGpuCommand(gpu_cmd_op_t op,
profile.agent = agent_info->dev_id;
// Query for cmd buffer size
hsa_ven_amd_aqlprofile_info_type_t info_type =
(hsa_ven_amd_aqlprofile_info_type_t)((int)HSA_VEN_AMD_AQLPROFILE_INFO_ENABLE_CMD + (int)op);
hsa_status_t status = hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, NULL);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "get_info(ENABLE_CMD ).size exc, op(" << int(op) << ")");
(hsa_ven_amd_aqlprofile_info_type_t)((int)HSA_VEN_AMD_AQLPROFILE_INFO_ENABLE_CMD + (int)op);
hsa_status_t status =
hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, NULL);
if (status != HSA_STATUS_SUCCESS)
EXC_RAISING(status, "get_info(ENABLE_CMD ).size exc, op(" << int(op) << ")");
if (profile.command_buffer.size == 0) EXC_RAISING(status, "get_info(ENABLE_CMD).size == 0");
// Allocate cmd buffer
const size_t aligment_mask = 0x100 - 1;
profile.command_buffer.ptr =
hsa_rsrc->AllocateSysMemory(agent_info, profile.command_buffer.size);
profile.command_buffer.ptr = hsa_rsrc->AllocateSysMemory(agent_info, profile.command_buffer.size);
if ((reinterpret_cast<uintptr_t>(profile.command_buffer.ptr) & aligment_mask) != 0) {
EXC_RAISING(status, "profile.command_buffer.ptr bad alignment");
}
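The hunk above allocates the command buffer, then rejects it unless `(ptr & (0x100 - 1)) == 0`. In other words, the pointer must be 256-byte aligned. Below is a minimal standalone sketch of that mask test. `IsAligned` is a hypothetical helper, not part of rocprofiler, and it assumes the alignment is a power of two.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when `ptr` is a multiple of `alignment` bytes.
// Assumes `alignment` is a power of two, so (alignment - 1) masks exactly the
// low bits that must be zero (0x100 - 1 == 0xFF for 256-byte alignment).
inline bool IsAligned(const void* ptr, std::size_t alignment) {
  const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
  return (reinterpret_cast<std::uintptr_t>(ptr) & mask) == 0;
}
```

For example, a pointer to a buffer declared `alignas(256)` passes this check. The same pointer offset by 64 bytes fails.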
@@ -66,15 +65,18 @@ size_t CreateGpuCommand(gpu_cmd_op_t op,
packet_t packet{};
// Query for cmd buffer data
status = hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, &packet);
status =
hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, &packet);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "get_info(ENABLE_CMD).data exc");
// Check for legacy GFXIP
status = hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_legacy_get_pm4(&packet, command);
if (status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "hsa_ven_amd_aqlprofile_legacy_get_pm4");
if (status != HSA_STATUS_SUCCESS)
AQL_EXC_RAISING(status, "hsa_ven_amd_aqlprofile_legacy_get_pm4");
} else {
// Query for cmd buffer data
status = hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, command);
status =
hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, command);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "get_info(ENABLE_CMD).data exc");
}
@@ -91,15 +93,14 @@ struct gpu_cmd_key_t {
uint32_t node_id;
};
struct gpu_cmd_fncomp_t {
bool operator() (const gpu_cmd_key_t& a, const gpu_cmd_key_t& b) const {
bool operator()(const gpu_cmd_key_t& a, const gpu_cmd_key_t& b) const {
return (a.op < b.op) || ((a.op == b.op) && (a.node_id < b.node_id));
}
};
typedef std::map<gpu_cmd_key_t, gpu_cmd_entry_t, gpu_cmd_fncomp_t> gpu_cmd_map_t;
size_t GetGpuCommand(gpu_cmd_op_t op,
const rocprofiler::util::AgentInfo* agent_info,
packet_t** command_out) {
size_t GetGpuCommand(gpu_cmd_op_t op, const rocprofiler::util::AgentInfo* agent_info,
packet_t** command_out) {
thread_local gpu_cmd_map_t map;
// Getting NUMA node id
@@ -112,7 +113,8 @@ size_t GetGpuCommand(gpu_cmd_op_t op,
auto ret = map.insert({gpu_cmd_key_t{op, node_id}, gpu_cmd_entry_t{}});
gpu_cmd_map_t::iterator it = ret.first;
if (ret.second) {
it->second.size = CreateGpuCommand(op, agent_info, it->second.command, Profile::LEGACY_SLOT_SIZE_PKT);
it->second.size =
CreateGpuCommand(op, agent_info, it->second.command, Profile::LEGACY_SLOT_SIZE_PKT);
}
*command_out = it->second.command;
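`GetGpuCommand` caches one command per `(op, node_id)` pair in a `thread_local std::map`. The comparator `gpu_cmd_fncomp_t` orders keys by `op` first, then by `node_id`. The sketch below implements the same idea with `std::tie`, which produces the identical lexicographic ordering. `CmdOp`, `CmdKey`, `CmdKeyLess`, `CmdEntry` and `GetCachedSize` are hypothetical stand-ins, and the `create` callback stands in for `CreateGpuCommand`.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical stand-ins for gpu_cmd_op_t / gpu_cmd_key_t / gpu_cmd_entry_t.
enum class CmdOp : int { Enable = 0, Disable = 1 };

struct CmdKey {
  CmdOp op;
  uint32_t node_id;
};

// Strict weak ordering: compare by op first, then by node_id. Equivalent to
// (a.op < b.op) || ((a.op == b.op) && (a.node_id < b.node_id)).
struct CmdKeyLess {
  bool operator()(const CmdKey& a, const CmdKey& b) const {
    return std::tie(a.op, a.node_id) < std::tie(b.op, b.node_id);
  }
};

struct CmdEntry {
  std::size_t size = 0;
};

// Per-thread cache: insert() leaves an existing entry untouched, so `create`
// runs only the first time a given (op, node_id) key is seen on this thread.
std::size_t GetCachedSize(CmdOp op, uint32_t node_id, std::size_t (*create)(CmdOp)) {
  thread_local std::map<CmdKey, CmdEntry, CmdKeyLess> cache;
  auto ret = cache.insert({CmdKey{op, node_id}, CmdEntry{}});
  if (ret.second) ret.first->second.size = create(op);  // first lookup builds the entry
  return ret.first->second.size;
}
```

`map::insert` does not overwrite an existing key. As a result, `ret.second` is true only on the first lookup for a key, and the expensive creation step runs once per thread and key.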
+3 -6
@@ -37,9 +37,8 @@ enum gpu_cmd_op_t {
NUMBER_GPU_CMD_OP
};
size_t GetGpuCommand(gpu_cmd_op_t op,
const rocprofiler::util::AgentInfo* agent_info,
packet_t** command_out);
size_t GetGpuCommand(gpu_cmd_op_t op, const rocprofiler::util::AgentInfo* agent_info,
packet_t** command_out);
static inline size_t IssueGpuCommand(gpu_cmd_op_t op,
const rocprofiler::util::AgentInfo* agent_info,
@@ -55,9 +54,7 @@ static inline size_t IssueGpuCommand(gpu_cmd_op_t op,
return HSA_STATUS_SUCCESS;
}
static inline size_t IssueGpuCommand(gpu_cmd_op_t op,
hsa_agent_t agent,
hsa_queue_t* queue) {
static inline size_t IssueGpuCommand(gpu_cmd_op_t op, hsa_agent_t agent, hsa_queue_t* queue) {
rocprofiler::util::HsaRsrcFactory* hsa_rsrc = &rocprofiler::util::HsaRsrcFactory::Instance();
const rocprofiler::util::AgentInfo* agent_info = hsa_rsrc->GetAgentInfo(agent);
return IssueGpuCommand(op, agent_info, queue);
+25 -26
@@ -55,31 +55,30 @@ struct block_status_t {
// Metrics set class
class MetricsGroup {
public:
public:
// Info map type
typedef std::map<std::string, const Metric*> info_map_t;
// Blocks map type
typedef std::map<block_des_t, block_status_t, lt_block_des> blocks_map_t;
MetricsGroup(const util::AgentInfo* agent_info) :
agent_info_(agent_info)
{
MetricsGroup(const util::AgentInfo* agent_info) : agent_info_(agent_info) {
metrics_ = MetricsDict::Create(agent_info);
if (metrics_ == NULL) EXC_RAISING(HSA_STATUS_ERROR, "MetricsDict create failed");
}
void Print(FILE* file) const {
for (const Metric* metric : metrics_vec_) {
fprintf(file, " %s", metric->GetName().c_str()); fflush(stdout);
fprintf(file, " %s", metric->GetName().c_str());
fflush(stdout);
}
fprintf(file, "\n"); fflush(stdout);
fprintf(file, "\n");
fflush(stdout);
}
static const Metric* GetMetric(const MetricsDict* metrics, const std::string& name) {
// Metric object
const Metric* metric = metrics->Get(name);
if (metric == NULL)
EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name << "' is not found");
if (metric == NULL) EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name << "' is not found");
return metric;
}
@@ -95,9 +94,7 @@ class MetricsGroup {
}
// Add metric
bool AddMetric(const rocprofiler_feature_t* info) {
return AddMetric(GetMetric(metrics_, info));
}
bool AddMetric(const rocprofiler_feature_t* info) { return AddMetric(GetMetric(metrics_, info)); }
bool AddMetric(const Metric* metric) {
// Blocks utilization delta
@@ -125,8 +122,9 @@ class MetricsGroup {
query.events = event;
uint32_t block_counters;
hsa_status_t status = util::HsaRsrcFactory::Instance().AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(
&query, HSA_VEN_AMD_AQLPROFILE_INFO_BLOCK_COUNTERS, &block_counters);
hsa_status_t status =
util::HsaRsrcFactory::Instance().AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(
&query, HSA_VEN_AMD_AQLPROFILE_INFO_BLOCK_COUNTERS, &block_counters);
if (status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "get block_counters info");
block_status.max_counters = block_counters;
}
@@ -141,7 +139,8 @@ class MetricsGroup {
metrics_vec_.push_back(metric);
info_map_[metric->GetName()] = metric;
for (const counter_t* counter : counters_vec) {
if (info_map_.find(counter->name) == info_map_.end()) info_map_[counter->name] = NewCounterInfo(counter->name);
if (info_map_.find(counter->name) == info_map_.end())
info_map_[counter->name] = NewCounterInfo(counter->name);
}
for (const auto& entry : blocks_delta) {
blocks_map_[entry.first] = entry.second;
@@ -150,10 +149,8 @@ class MetricsGroup {
return true;
}
private:
const Metric* NewCounterInfo(const std::string& name) const {
return GetMetric(metrics_, name);
}
private:
const Metric* NewCounterInfo(const std::string& name) const { return GetMetric(metrics_, name); }
// Agent info
const util::AgentInfo* const agent_info_;
@@ -169,10 +166,10 @@ class MetricsGroup {
// Metrics groups class
class MetricsGroupSet {
public:
MetricsGroupSet(const util::AgentInfo* agent_info, const rocprofiler_feature_t* info_array, const uint32_t info_count) :
agent_info_(agent_info)
{
public:
MetricsGroupSet(const util::AgentInfo* agent_info, const rocprofiler_feature_t* info_array,
const uint32_t info_count)
: agent_info_(agent_info) {
metrics_ = MetricsDict::Create(agent_info);
if (metrics_ == NULL) EXC_RAISING(HSA_STATUS_ERROR, "MetricsDict create failed");
Initialize(info_array, info_count);
@@ -186,12 +183,13 @@ class MetricsGroupSet {
void Print(FILE* file) const {
for (const auto* group : groups_) {
fprintf(stdout, " pmc : "); fflush(stdout);
fprintf(stdout, " pmc : ");
fflush(stdout);
group->Print(file);
}
}
private:
private:
void Initialize(const rocprofiler_feature_t* info_array, const uint32_t info_count) {
std::multimap<uint32_t, const Metric*, std::greater<uint32_t> > input_metrics;
for (unsigned i = 0; i < info_count; ++i) {
@@ -202,7 +200,8 @@ class MetricsGroupSet {
input_metrics.insert({counters_num, metric});
if (MetricsGroup(agent_info_).AddMetric(metric) == false) {
AQL_EXC_RAISING(HSA_STATUS_ERROR, "Metric '" << metric->GetName() << "' doesn't fit in one group");
AQL_EXC_RAISING(HSA_STATUS_ERROR,
"Metric '" << metric->GetName() << "' doesn't fit in one group");
}
}
#if 0
@@ -239,4 +238,4 @@ class MetricsGroupSet {
} // namespace rocprofiler
#endif // SRC_CORE_GROUP_SET_H_
#endif // SRC_CORE_GROUP_SET_H_
+14 -19
@@ -62,33 +62,28 @@ AgentInfo::AgentInfo(const hsa_agent_t agent, ::CoreApiTable* table) : handle_(a
table->hsa_agent_get_info_fn(
agent, static_cast<hsa_agent_info_t>(HSA_AMD_AGENT_INFO_NUM_SHADER_ENGINES), &se_num_);
if (table->hsa_agent_get_info_fn(
agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_NUM_SHADER_ARRAYS_PER_SE,
&shader_arrays_per_se_) != HSA_STATUS_SUCCESS ||
table->hsa_agent_get_info_fn(
agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_MAX_WAVES_PER_CU,
&waves_per_cu_) != HSA_STATUS_SUCCESS)
{
rocprofiler::fatal("hsa_agent_get_info for gfxip hardware configuration failed");
if (table->hsa_agent_get_info_fn(agent,
(hsa_agent_info_t)HSA_AMD_AGENT_INFO_NUM_SHADER_ARRAYS_PER_SE,
&shader_arrays_per_se_) != HSA_STATUS_SUCCESS ||
table->hsa_agent_get_info_fn(agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_MAX_WAVES_PER_CU,
&waves_per_cu_) != HSA_STATUS_SUCCESS) {
rocprofiler::fatal("hsa_agent_get_info for gfxip hardware configuration failed");
}
compute_units_per_sh_ = cu_num_ / (se_num_ * shader_arrays_per_se_);
wave_slots_per_simd_ = waves_per_cu_ / simds_per_cu_;
if (table->hsa_agent_get_info_fn(
agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_DOMAIN,
&pci_domain_) != HSA_STATUS_SUCCESS ||
table->hsa_agent_get_info_fn(
agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_BDFID,
&pci_location_id_) != HSA_STATUS_SUCCESS)
{
rocprofiler::fatal("hsa_agent_get_info for PCI info failed");
if (table->hsa_agent_get_info_fn(agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_DOMAIN,
&pci_domain_) != HSA_STATUS_SUCCESS ||
table->hsa_agent_get_info_fn(agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_BDFID,
&pci_location_id_) != HSA_STATUS_SUCCESS) {
rocprofiler::fatal("hsa_agent_get_info for PCI info failed");
}
// TODO(saurabh, giovanni): Remove this in 5.7
if (table->hsa_agent_get_info_fn(agent,
static_cast<hsa_agent_info_t>(HSA_AMD_AGENT_INFO_NUM_XCC), &xcc_num_) != HSA_STATUS_SUCCESS) {
xcc_num_ = 1;
if (table->hsa_agent_get_info_fn(agent, static_cast<hsa_agent_info_t>(HSA_AMD_AGENT_INFO_NUM_XCC),
&xcc_num_) != HSA_STATUS_SUCCESS) {
xcc_num_ = 1;
}
}
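The `AgentInfo` constructor derives two values by integer division:

- compute units per shader array: `cu_num_ / (se_num_ * shader_arrays_per_se_)`
- wave slots per SIMD: `waves_per_cu_ / simds_per_cu_`

Below is a sketch of that arithmetic. `RawAgentInfo`, `DerivedTopology` and `Derive` are hypothetical names. The zero-divisor guard is an addition of this sketch; the original only checks the `hsa_agent_get_info` return codes.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical container for the raw values queried via hsa_agent_get_info.
struct RawAgentInfo {
  uint32_t cu_num;
  uint32_t se_num;
  uint32_t shader_arrays_per_se;
  uint32_t waves_per_cu;
  uint32_t simds_per_cu;
};

struct DerivedTopology {
  uint32_t compute_units_per_sh;
  uint32_t wave_slots_per_simd;
};

// Same integer divisions as the constructor; returns nullopt instead of
// dividing by zero (sketch-only guard).
std::optional<DerivedTopology> Derive(const RawAgentInfo& r) {
  if (r.se_num == 0 || r.shader_arrays_per_se == 0 || r.simds_per_cu == 0) return std::nullopt;
  return DerivedTopology{r.cu_num / (r.se_num * r.shader_arrays_per_se),
                         r.waves_per_cu / r.simds_per_cu};
}
```

Take hypothetical inputs of 104 CUs, 8 shader engines, 1 shader array per engine, 32 waves per CU and 4 SIMDs per CU. These give 13 CUs per shader array and 8 wave slots per SIMD. Because the division is integer division, any remainder is truncated.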
+5 -7
@@ -33,8 +33,8 @@ Agent::AgentInfo& GetAgentInfo(decltype(hsa_agent_t::handle) handle) {
if (agent_info_map.find(handle) != agent_info_map.end()) {
return agent_info_map.at(handle);
} else {
std::cerr << std::string("Error: Can't find Agent with handle(") << std::to_string(handle) <<
") in this system" << std::endl;
std::cerr << std::string("Error: Can't find Agent with handle(") << std::to_string(handle)
<< ") in this system" << std::endl;
abort();
}
}
@@ -49,9 +49,7 @@ void SetAgentInfo(decltype(hsa_agent_t::handle) handle, const Agent::AgentInfo&
}
}
std::vector<hsa_agent_t>& GetCPUAgentList() {
return cpu_agents_list;
}
std::vector<hsa_agent_t>& GetCPUAgentList() { return cpu_agents_list; }
hsa_agent_t GetAgentByIndex(uint64_t agent_index) {
std::lock_guard<std::mutex> lock(agents_map_lock);
@@ -60,8 +58,8 @@ hsa_agent_t GetAgentByIndex(uint64_t agent_index) {
return hsa_agent_t{agent_info.second.getHandle()};
}
}
std::cerr << std::string("Error: Can't find Agent with Index(") << std::to_string(agent_index) <<
") in this system" << std::endl;
std::cerr << std::string("Error: Can't find Agent with Index(") << std::to_string(agent_index)
<< ") in this system" << std::endl;
abort();
}
+1 -1
@@ -95,7 +95,7 @@ namespace rocprofiler {
namespace hsa_support {
void Initialize(HsaApiTable* Table);
hsa_status_t hsa_iterate_agents_cb(hsa_agent_t agent, void *data);
hsa_status_t hsa_iterate_agents_cb(hsa_agent_t agent, void* data);
void Finalize();
bool IterateCounters(rocprofiler_counters_info_callback_t counters_info_callback);
+8 -6
@@ -181,7 +181,7 @@ InitializeAqlPackets(hsa_agent_t cpu_agent, hsa_agent_t gpu_agent,
// TODO: validate needs to be called on each events_list[i]
// Validating the events array for the specified gpu agent
if(events_list.size() > 0) {
if (events_list.size() > 0) {
bool validate_event_result;
status =
hsa_ven_amd_aqlprofile_validate_event(gpu_agent, &events_list[0], &validate_event_result);
@@ -234,9 +234,10 @@ InitializeAqlPackets(hsa_agent_t cpu_agent, hsa_agent_t gpu_agent,
}
}
for(auto& cname : counter_names) {
if(cname.compare("KERNEL_DURATION")==0) {
rocprofiler::Metric* metric = const_cast<rocprofiler::Metric*>(metricsDict[gpu_agent.handle]->Get(cname));
for (auto& cname : counter_names) {
if (cname.compare("KERNEL_DURATION") == 0) {
rocprofiler::Metric* metric =
const_cast<rocprofiler::Metric*>(metricsDict[gpu_agent.handle]->Get(cname));
if (metric == nullptr) std::cout << cname << " not found in metricsDict\n";
context->metrics_list.push_back(metric);
}
@@ -315,7 +316,7 @@ InitializeAqlPackets(hsa_agent_t cpu_agent, hsa_agent_t gpu_agent,
hsa_agent_t ag_list[ag_list_count];
ag_list[0] = gpu_agent;
if(context->events_list.size() > 0) {
if (context->events_list.size() > 0) {
// Preparing an Getting the size of the command and output buffers
status = hsa_ven_amd_aqlprofile_start(profile, NULL);
// CHECK_HSA_STATUS("Error: Getting Buffers Size", status);
@@ -510,7 +511,8 @@ uint8_t* AllocateLocalMemory(size_t size, hsa_amd_memory_pool_t* gpu_pool) {
return ptr;
}
hsa_status_t Allocate(hsa_agent_t gpu_agent, hsa_ven_amd_aqlprofile_profile_t* profile, size_t att_buffer_size) {
hsa_status_t Allocate(hsa_agent_t gpu_agent, hsa_ven_amd_aqlprofile_profile_t* profile,
size_t att_buffer_size) {
Agent::AgentInfo& agentInfo = rocprofiler::hsa_support::GetAgentInfo(gpu_agent.handle);
profile->command_buffer.ptr =
AllocateSysMemory(gpu_agent, profile->command_buffer.size, &agentInfo.cpu_pool);
+41 -41
@@ -435,16 +435,18 @@ bool AsyncSignalHandler(hsa_signal_value_t signal_value, void* data) {
pending->session_id = GetROCProfilerSingleton()->GetCurrentSessionId();
}
if (pending->counters_count > 0) {
if (xcc_id == 0 && pending->context && pending->context->metrics_list.size() > 0 && pending->profile) // call to GetCounterData() is required only once for a dispatch
if (xcc_id == 0 && pending->context && pending->context->metrics_list.size() > 0 &&
pending->profile) // call to GetCounterData() is required only once for a dispatch
rocprofiler::metrics::GetCounterData(pending->profile, queue_info_session->agent,
pending->context->results_list);
if (is_individual_xcc_mode)
rocprofiler::metrics::GetCountersAndMetricResultsByXcc(
xcc_id, pending->context->results_list, pending->context->results_map,
pending->context->metrics_list, time.end-time.start);
pending->context->metrics_list, time.end - time.start);
else
rocprofiler::metrics::GetMetricsData(pending->context->results_map,
pending->context->metrics_list, time.end-time.start);
pending->context->metrics_list,
time.end - time.start);
AddRecordCounters(&record, pending);
} else {
if (session->FindBuffer(pending->buffer_id)) {
@@ -652,8 +654,8 @@ void CheckNeededProfileConfigs() {
att_counters_names = filter->GetCounterData();
kernel_profile_names = std::get<std::vector<std::string>>(
filter->GetProperty(ROCPROFILER_FILTER_KERNEL_NAMES));
kernel_profile_dispatch_ids = std::get<std::vector<uint64_t>>(
filter->GetProperty(ROCPROFILER_FILTER_DISPATCH_IDS));
kernel_profile_dispatch_ids =
std::get<std::vector<uint64_t>>(filter->GetProperty(ROCPROFILER_FILTER_DISPATCH_IDS));
} else if (session && session->FindFilterWithKind(ROCPROFILER_PC_SAMPLING_COLLECTION)) {
is_pc_sampling_collection_mode = true;
}
@@ -685,23 +687,20 @@ std::pair<std::vector<bool>, bool> GetAllowedProfilesList(const void* packets, i
auto& kdispatch = static_cast<const hsa_kernel_dispatch_packet_s*>(packets)[i];
// If Dispatch IDs specified, profile based on dispatch ID
for (auto id : kernel_profile_dispatch_ids)
b_profile_this_object |= id == current_writer_id;
for (auto id : kernel_profile_dispatch_ids) b_profile_this_object |= id == current_writer_id;
try {
// Can throw
const std::string& kernel_name = ksymbols->at(kdispatch.kernel_object);
// If no filters specified, auto profile this kernel
if (kernel_profile_names.size() == 0 &&
kernel_profile_dispatch_ids.size() == 0 &&
if (kernel_profile_names.size() == 0 && kernel_profile_dispatch_ids.size() == 0 &&
kernel_name.find("__amd_rocclr_") == std::string::npos)
b_profile_this_object = true;
b_profile_this_object = true;
// Try to match the mangled kernel name with given matches in input.txt
// We want to initiate att profiling if a match exists
for (const std::string& kernel_matches : kernel_profile_names)
if (kernel_name.find(kernel_matches) != std::string::npos)
b_profile_this_object = true;
if (kernel_name.find(kernel_matches) != std::string::npos) b_profile_this_object = true;
} catch (...) {
printf("Warning: Unknown name for object %lu\n", kdispatch.kernel_object);
}
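In `GetAllowedProfilesList`, a dispatch packet is profiled if any of these holds:

- its dispatch ID is listed;
- no filters are set and the kernel is not an internal `__amd_rocclr_` kernel;
- the kernel name contains one of the requested substrings.

Below is a standalone sketch of that decision using a hypothetical `ShouldProfile` function. In the original, the kernel-name lookup (`ksymbols->at`) can throw. When it does, only the dispatch-ID rule applies and a warning is printed.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical re-statement of the per-packet decision in GetAllowedProfilesList.
bool ShouldProfile(uint64_t dispatch_id, const std::string& kernel_name,
                   const std::vector<uint64_t>& dispatch_ids,
                   const std::vector<std::string>& name_filters) {
  bool profile = false;
  // Rule 1: explicit dispatch IDs.
  for (uint64_t id : dispatch_ids) profile |= (id == dispatch_id);
  // Rule 2: no filters at all -> profile everything except internal ROCclr kernels.
  if (name_filters.empty() && dispatch_ids.empty() &&
      kernel_name.find("__amd_rocclr_") == std::string::npos)
    profile = true;
  // Rule 3: substring match against the (possibly mangled) kernel name.
  for (const std::string& match : name_filters)
    if (kernel_name.find(match) != std::string::npos) profile = true;
  return profile;
}
```

Matching uses `std::string::find`, so a filter such as `matmul` also selects a mangled name like `_Z6matmulPfS_`.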
@@ -711,17 +710,13 @@ std::pair<std::vector<bool>, bool> GetAllowedProfilesList(const void* packets, i
can_profile_packet.push_back(b_profile_this_object);
}
// If we're going to skip all packets, need to update writer ID
if (!b_can_profile_anypacket)
WRITER_ID.store(current_writer_id, std::memory_order_release);
if (!b_can_profile_anypacket) WRITER_ID.store(current_writer_id, std::memory_order_release);
return {can_profile_packet, b_can_profile_anypacket};
}
hsa_ven_amd_aqlprofile_profile_t* ProcessATTParams(
Packet::packet_t& start_packet,
Packet::packet_t& stop_packet,
Queue& queue_info,
Agent::AgentInfo& agentInfo
) {
hsa_ven_amd_aqlprofile_profile_t* ProcessATTParams(Packet::packet_t& start_packet,
Packet::packet_t& stop_packet, Queue& queue_info,
Agent::AgentInfo& agentInfo) {
std::vector<hsa_ven_amd_aqlprofile_parameter_t> att_params;
int num_att_counters = 0;
uint32_t att_buffer_size = DEFAULT_ATT_BUFFER_SIZE;
@@ -731,15 +726,16 @@ hsa_ven_amd_aqlprofile_profile_t* ProcessATTParams(
case ROCPROFILER_ATT_PERFCOUNTER_NAME:
break;
case ROCPROFILER_ATT_BUFFER_SIZE:
att_buffer_size = std::max(96l<<10l, std::min(int64_t(param.value)<<20l, (1l<<32l)-(3l<<20)));
break; // Clip to [96KB, 4GB)
att_buffer_size =
std::max(96l << 10l, std::min(int64_t(param.value) << 20l, (1l << 32l) - (3l << 20)));
break; // Clip to [96KB, 4GB)
case ROCPROFILER_ATT_PERFCOUNTER:
num_att_counters += 1;
break;
default:
att_params.push_back(
{static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(param.parameter_name)),
param.value});
{static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(param.parameter_name)),
param.value});
}
}
@@ -760,22 +756,21 @@ hsa_ven_amd_aqlprofile_profile_t* ProcessATTParams(
printf("Only events from the SQ block can be selected for ATT.");
exit(1);
}
att_params.push_back({static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(
int(ROCPROFILER_ATT_PERFCOUNTER)),
event.counter_id | (event.counter_id ? (0xF << 24) : 0)});
att_params.push_back(
{static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(ROCPROFILER_ATT_PERFCOUNTER)),
event.counter_id | (event.counter_id ? (0xF << 24) : 0)});
num_att_counters += 1;
}
hsa_ven_amd_aqlprofile_parameter_t zero_perf = {
static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(ROCPROFILER_ATT_PERFCOUNTER)),
0};
static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(ROCPROFILER_ATT_PERFCOUNTER)), 0};
// Fill other perfcounters with 0's
for (; num_att_counters < 16; num_att_counters++) att_params.push_back(zero_perf);
}
// Get the PM4 Packets using packets_generator
return Packet::GenerateATTPackets(queue_info.GetCPUAgent(), queue_info.GetGPUAgent(),
att_params, &start_packet, &stop_packet, att_buffer_size);
return Packet::GenerateATTPackets(queue_info.GetCPUAgent(), queue_info.GetGPUAgent(), att_params,
&start_packet, &stop_packet, att_buffer_size);
}
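`ProcessATTParams` treats `ROCPROFILER_ATT_BUFFER_SIZE` as a size in MiB (`value << 20`) and clips it to a byte range:

- The lower bound is 96 KiB.
- The upper bound is 4 GiB minus 3 MiB (`(1l << 32l) - (3l << 20)`). This is slightly below the 4 GB in the inline comment, and it keeps the result within `uint32_t`.

Each requested SQ perfcounter is encoded as `counter_id | (counter_id ? (0xF << 24) : 0)`, and the list is padded with zero entries up to 16. Below is a sketch of both steps with hypothetical function names. Two caveats apply:

- The early clamp of `requested_mib` is an addition that prevents the shift from overflowing for very large inputs.
- The diff does not say what bits 24..27 select.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical: clip a requested ATT buffer size (in MiB) to [96 KiB, 4 GiB - 3 MiB] bytes.
uint32_t ClipAttBufferSize(uint64_t requested_mib) {
  const int64_t kMinBytes = int64_t{96} << 10;                        // 96 KiB = 98304
  const int64_t kMaxBytes = (int64_t{1} << 32) - (int64_t{3} << 20);  // 4291821568
  // Sketch-only guard: 4096 MiB already exceeds kMaxBytes, so clamping first
  // keeps the shift below from overflowing without changing the result.
  const int64_t mib = static_cast<int64_t>(std::min<uint64_t>(requested_mib, 4096));
  const int64_t requested_bytes = mib << 20;
  return static_cast<uint32_t>(std::max(kMinBytes, std::min(requested_bytes, kMaxBytes)));
}

// Hypothetical: encode an ATT perfcounter value; nonzero IDs also get bits 24..27 set.
uint32_t EncodeAttPerfcounter(uint32_t counter_id) {
  return counter_id | (counter_id ? (0xFu << 24) : 0u);
}

// Hypothetical: pad the encoded perfcounter list with zero entries up to `slots`.
std::vector<uint32_t> PadPerfcounters(std::vector<uint32_t> encoded, std::size_t slots = 16) {
  while (encoded.size() < slots) encoded.push_back(0);
  return encoded;
}
```

For instance, a request of 1 MiB yields 1048576 bytes, 0 MiB is raised to 98304 bytes, and anything from 4096 MiB upward is capped at 4291821568 bytes.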
/**
@@ -866,14 +861,16 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt
record_id);
if (session_data_count > 0 && profile.second) {
session->GetProfiler()->AddPendingSignals(
writer_id, record_id, original_packet.completion_signal, dispatch_packet.completion_signal, session_id, buffer_id,
profile.first, session_data_count, profile.second, kernel_properties,
(uint32_t)syscall(__NR_gettid), user_pkt_index, correlation_id);
writer_id, record_id, original_packet.completion_signal,
dispatch_packet.completion_signal, session_id, buffer_id, profile.first,
session_data_count, profile.second, kernel_properties, (uint32_t)syscall(__NR_gettid),
user_pkt_index, correlation_id);
} else {
session->GetProfiler()->AddPendingSignals(
writer_id, record_id, original_packet.completion_signal, dispatch_packet.completion_signal, session_id, buffer_id,
nullptr, session_data_count, nullptr, kernel_properties, (uint32_t)syscall(__NR_gettid),
user_pkt_index, correlation_id);
writer_id, record_id, original_packet.completion_signal,
dispatch_packet.completion_signal, session_id, buffer_id, nullptr, session_data_count,
nullptr, kernel_properties, (uint32_t)syscall(__NR_gettid), user_pkt_index,
correlation_id);
}
}
@@ -893,7 +890,8 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt
CreateSignal(0, &interrupt_signal);
// Adding Stop and Read PM4 Packets
if (session_data_count > 0 && is_counter_collection_mode && profiles.size() > 0 && profile.first && profile.first->stop_packet) {
if (session_data_count > 0 && is_counter_collection_mode && profiles.size() > 0 &&
profile.first && profile.first->stop_packet) {
hsa_signal_t dummy_signal{};
profile.first->stop_packet->header = HSA_PACKET_TYPE_VENDOR_SPECIFIC
<< HSA_PACKET_HEADER_TYPE;
@@ -937,7 +935,8 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt
bool can_profile_anypacket = false;
std::vector<bool> can_profile_packet;
std::tie(can_profile_packet, can_profile_anypacket) = GetAllowedProfilesList(packets, pkt_count);
std::tie(can_profile_packet, can_profile_anypacket) =
GetAllowedProfilesList(packets, pkt_count);
if (!can_profile_anypacket) {
/* Write the original packets to the hardware if no patch will be profiled */
@@ -964,8 +963,9 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt
// increment writer ID for every packet
if (bit_extract(original_packet.header, HSA_PACKET_HEADER_TYPE,
HSA_PACKET_HEADER_TYPE+HSA_PACKET_HEADER_WIDTH_TYPE-1) == HSA_PACKET_TYPE_KERNEL_DISPATCH)
writer_id = WRITER_ID.fetch_add(1, std::memory_order_release);
HSA_PACKET_HEADER_TYPE + HSA_PACKET_HEADER_WIDTH_TYPE - 1) ==
HSA_PACKET_TYPE_KERNEL_DISPATCH)
writer_id = WRITER_ID.fetch_add(1, std::memory_order_release);
continue;
}
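The last `WriteInterceptor` hunk above increments `WRITER_ID` only for kernel-dispatch packets. It identifies these packets by extracting the type field from the AQL packet header. The sketch below shows that check. It assumes the HSA header layout in which the type occupies bits 0 to 7 (`HSA_PACKET_HEADER_TYPE == 0`, `HSA_PACKET_HEADER_WIDTH_TYPE == 8`) and the barrier bit is bit 8. It also assumes `HSA_PACKET_TYPE_KERNEL_DISPATCH == 2` and `HSA_PACKET_TYPE_BARRIER_AND == 3`. The local constants and the `BitExtract` helper are stand-ins, not rocprofiler's own `bit_extract`.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for HSA header constants (values assumed from hsa.h).
constexpr uint32_t kHeaderTypeShift = 0;           // HSA_PACKET_HEADER_TYPE
constexpr uint32_t kHeaderTypeWidth = 8;           // HSA_PACKET_HEADER_WIDTH_TYPE
constexpr uint32_t kHeaderBarrierShift = 8;        // HSA_PACKET_HEADER_BARRIER
constexpr uint32_t kPacketTypeKernelDispatch = 2;  // HSA_PACKET_TYPE_KERNEL_DISPATCH
constexpr uint32_t kPacketTypeBarrierAnd = 3;      // HSA_PACKET_TYPE_BARRIER_AND

// Hypothetical helper: return bits [low, high] (inclusive) of value.
inline uint32_t BitExtract(uint32_t value, uint32_t low, uint32_t high) {
  const uint32_t width = high - low + 1;
  const uint32_t mask = (width >= 32) ? ~0u : ((1u << width) - 1u);
  return (value >> low) & mask;
}

// True when the header's type field says "kernel dispatch", whatever the other bits hold.
inline bool IsKernelDispatch(uint16_t header) {
  return BitExtract(header, kHeaderTypeShift, kHeaderTypeShift + kHeaderTypeWidth - 1) ==
         kPacketTypeKernelDispatch;
}
```

Masking out only the type field means that a dispatch packet with its barrier or fence-scope bits set is still counted.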
+99 -158
@@ -37,33 +37,37 @@ SOFTWARE.
#include "util/exception.h"
#include "util/hsa_rsrc_factory.h"
#define HSA_RT(call) \
do { \
const hsa_status_t status = call; \
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, #call); \
} while(0)
#define IS_HSA_CALLBACK(ID) \
const auto __id = ID; (void)__id; \
void *__arg = arg_.load(); (void)__arg; \
rocprofiler_hsa_callback_fun_t __callback = \
(ID == ROCPROFILER_HSA_CB_ID_ALLOCATE) ? callbacks_.allocate: \
(ID == ROCPROFILER_HSA_CB_ID_DEVICE) ? callbacks_.device: \
(ID == ROCPROFILER_HSA_CB_ID_MEMCOPY) ? callbacks_.memcopy: \
(ID == ROCPROFILER_HSA_CB_ID_SUBMIT) ? callbacks_.submit: \
(ID == ROCPROFILER_HSA_CB_ID_KSYMBOL) ? callbacks_.ksymbol: \
callbacks_.codeobj; \
if ((__callback != NULL) && (recursion_ == false))
#define DO_HSA_CALLBACK \
do { \
recursion_ = true; \
__callback(__id, &data, __arg); \
recursion_ = false; \
#define HSA_RT(call) \
do { \
const hsa_status_t status = call; \
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, #call); \
} while (0)
#define ISSUE_HSA_CALLBACK(ID) \
do { IS_HSA_CALLBACK(ID) { DO_HSA_CALLBACK; } } while(0)
#define IS_HSA_CALLBACK(ID) \
const auto __id = ID; \
(void)__id; \
void* __arg = arg_.load(); \
(void)__arg; \
rocprofiler_hsa_callback_fun_t __callback = (ID == ROCPROFILER_HSA_CB_ID_ALLOCATE) \
? callbacks_.allocate \
: (ID == ROCPROFILER_HSA_CB_ID_DEVICE) ? callbacks_.device \
: (ID == ROCPROFILER_HSA_CB_ID_MEMCOPY) ? callbacks_.memcopy \
: (ID == ROCPROFILER_HSA_CB_ID_SUBMIT) ? callbacks_.submit \
: (ID == ROCPROFILER_HSA_CB_ID_KSYMBOL) ? callbacks_.ksymbol \
: callbacks_.codeobj; \
if ((__callback != NULL) && (recursion_ == false))
#define DO_HSA_CALLBACK \
do { \
recursion_ = true; \
__callback(__id, &data, __arg); \
recursion_ = false; \
} while (0)
#define ISSUE_HSA_CALLBACK(ID) \
do { \
IS_HSA_CALLBACK(ID) { DO_HSA_CALLBACK; } \
} while (0)
// Demangle C++ symbol name
static const char* cpp_demangle(const char* symname) {
@@ -74,15 +78,15 @@ static const char* cpp_demangle(const char* symname) {
}
namespace rocprofiler {
extern decltype(hsa_memory_allocate)* hsa_memory_allocate_fn;
extern decltype(hsa_memory_assign_agent)* hsa_memory_assign_agent_fn;
extern decltype(hsa_memory_copy)* hsa_memory_copy_fn;
extern decltype(hsa_amd_memory_pool_allocate)* hsa_amd_memory_pool_allocate_fn;
extern decltype(hsa_amd_memory_pool_free)* hsa_amd_memory_pool_free_fn;
extern decltype(hsa_amd_agents_allow_access)* hsa_amd_agents_allow_access_fn;
extern decltype(hsa_amd_memory_async_copy)* hsa_amd_memory_async_copy_fn;
extern decltype(hsa_executable_freeze)* hsa_executable_freeze_fn;
extern decltype(hsa_executable_destroy)* hsa_executable_destroy_fn;
extern decltype(::hsa_memory_allocate)* hsa_memory_allocate_fn;
extern decltype(::hsa_memory_assign_agent)* hsa_memory_assign_agent_fn;
extern decltype(::hsa_memory_copy)* hsa_memory_copy_fn;
extern decltype(::hsa_amd_memory_pool_allocate)* hsa_amd_memory_pool_allocate_fn;
extern decltype(::hsa_amd_memory_pool_free)* hsa_amd_memory_pool_free_fn;
extern decltype(::hsa_amd_agents_allow_access)* hsa_amd_agents_allow_access_fn;
extern decltype(::hsa_amd_memory_async_copy)* hsa_amd_memory_async_copy_fn;
extern decltype(::hsa_executable_freeze)* hsa_executable_freeze_fn;
extern decltype(::hsa_executable_destroy)* hsa_executable_destroy_fn;
class HsaInterceptor {
public:
@@ -95,10 +99,7 @@ class HsaInterceptor {
if (enable_) {
// Fetching AMD Loader HSA extension API
HSA_RT(hsa_system_get_major_extension_table(
HSA_EXTENSION_AMD_LOADER,
1,
sizeof(hsa_ven_amd_loader_1_01_pfn_t),
&LoaderApiTable));
HSA_EXTENSION_AMD_LOADER, 1, sizeof(hsa_ven_amd_loader_1_01_pfn_t), &LoaderApiTable));
// Saving original API functions
hsa_memory_allocate_fn = table->core_->hsa_memory_allocate_fn;
@@ -131,10 +132,7 @@ class HsaInterceptor {
}
private:
static hsa_status_t HSA_API MemoryAllocate(hsa_region_t region,
size_t size,
void** ptr)
{
static hsa_status_t HSA_API MemoryAllocate(hsa_region_t region, size_t size, void** ptr) {
hsa_status_t status = HSA_STATUS_SUCCESS;
HSA_RT(hsa_memory_allocate_fn(region, size, ptr));
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_ALLOCATE) {
@@ -150,11 +148,8 @@ class HsaInterceptor {
return status;
}
static hsa_status_t MemoryAssignAgent(
void *ptr,
hsa_agent_t agent,
hsa_access_permission_t access)
{
static hsa_status_t MemoryAssignAgent(void* ptr, hsa_agent_t agent,
hsa_access_permission_t access) {
hsa_status_t status = HSA_STATUS_SUCCESS;
HSA_RT(hsa_memory_assign_agent_fn(ptr, agent, access));
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_DEVICE) {
@@ -169,11 +164,7 @@ class HsaInterceptor {
}
// Spawn device allow access callback
static void DeviceCallback(
uint32_t num_agents,
const hsa_agent_t* agents,
const void* ptr)
{
static void DeviceCallback(uint32_t num_agents, const hsa_agent_t* agents, const void* ptr) {
for (const hsa_agent_t* agent_p = agents; agent_p < (agents + num_agents); ++agent_p) {
hsa_agent_t agent = *agent_p;
rocprofiler_hsa_callback_data_t data{};
@@ -188,17 +179,11 @@ class HsaInterceptor {
}
// Agent allow access callback 'hsa_amd_agents_allow_access'
static hsa_status_t AgentsAllowAccess(
uint32_t num_agents,
const hsa_agent_t* agents,
const uint32_t* flags,
const void* ptr)
{
static hsa_status_t AgentsAllowAccess(uint32_t num_agents, const hsa_agent_t* agents,
const uint32_t* flags, const void* ptr) {
hsa_status_t status = HSA_STATUS_SUCCESS;
HSA_RT(hsa_amd_agents_allow_access_fn(num_agents, agents, flags, ptr));
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_DEVICE) {
DeviceCallback(num_agents, agents, ptr);
}
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_DEVICE) { DeviceCallback(num_agents, agents, ptr); }
return status;
}
@@ -218,12 +203,8 @@ class HsaInterceptor {
return HSA_STATUS_SUCCESS;
}
static hsa_status_t MemoryPoolAllocate(
hsa_amd_memory_pool_t pool,
size_t size,
uint32_t flags,
void** ptr)
{
static hsa_status_t MemoryPoolAllocate(hsa_amd_memory_pool_t pool, size_t size, uint32_t flags,
void** ptr) {
hsa_status_t status = HSA_STATUS_SUCCESS;
HSA_RT(hsa_amd_memory_pool_allocate_fn(pool, size, flags, ptr));
if (size != 0) {
@@ -232,8 +213,10 @@ class HsaInterceptor {
data.allocate.ptr = *ptr;
data.allocate.size = size;
HSA_RT(hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_SEGMENT, &data.allocate.segment));
HSA_RT(hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_GLOBAL_FLAGS, &data.allocate.global_flag));
HSA_RT(hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_SEGMENT,
&data.allocate.segment));
HSA_RT(hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_GLOBAL_FLAGS,
&data.allocate.global_flag));
DO_HSA_CALLBACK;
@@ -246,9 +229,7 @@ class HsaInterceptor {
}
return status;
}
static hsa_status_t MemoryPoolFree(
void* ptr)
{
static hsa_status_t MemoryPoolFree(void* ptr) {
hsa_status_t status = HSA_STATUS_SUCCESS;
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_ALLOCATE) {
rocprofiler_hsa_callback_data_t data{};
@@ -260,11 +241,7 @@ class HsaInterceptor {
return status;
}
static hsa_status_t MemoryCopy(
void *dst,
const void *src,
size_t size)
{
static hsa_status_t MemoryCopy(void* dst, const void* src, size_t size) {
hsa_status_t status = HSA_STATUS_SUCCESS;
HSA_RT(hsa_memory_copy_fn(dst, src, size));
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_MEMCOPY) {
@@ -277,17 +254,13 @@ class HsaInterceptor {
return status;
}
static hsa_status_t MemoryAsyncCopy(
void* dst, hsa_agent_t dst_agent, const void* src,
hsa_agent_t src_agent, size_t size,
uint32_t num_dep_signals,
const hsa_signal_t* dep_signals,
hsa_signal_t completion_signal)
{
static hsa_status_t MemoryAsyncCopy(void* dst, hsa_agent_t dst_agent, const void* src,
hsa_agent_t src_agent, size_t size, uint32_t num_dep_signals,
const hsa_signal_t* dep_signals,
hsa_signal_t completion_signal) {
hsa_status_t status = HSA_STATUS_SUCCESS;
HSA_RT(hsa_amd_memory_async_copy_fn(
dst, dst_agent, src, src_agent, size,
num_dep_signals, dep_signals, completion_signal));
HSA_RT(hsa_amd_memory_async_copy_fn(dst, dst_agent, src, src_agent, size, num_dep_signals,
dep_signals, completion_signal));
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_MEMCOPY) {
rocprofiler_hsa_callback_data_t data{};
data.memcopy.dst = dst;
@@ -298,14 +271,11 @@ class HsaInterceptor {
return status;
}
static hsa_status_t CodeObjectCallback(
hsa_executable_t executable,
hsa_loaded_code_object_t loaded_code_object,
void* arg)
{
static hsa_status_t CodeObjectCallback(hsa_executable_t executable,
hsa_loaded_code_object_t loaded_code_object, void* arg) {
const int free_flag = reinterpret_cast<long>(arg);
hsa_ven_amd_loader_code_object_storage_type_t storage_type =
HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_NONE;
HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_NONE;
int storage_fd = -1;
uint64_t memory_base = 0;
uint64_t memory_size = 0;
@@ -316,56 +286,45 @@ class HsaInterceptor {
char* uri_str = NULL;
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_TYPE,
&storage_type));
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_TYPE,
&storage_type));
if (storage_type == HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_FILE) {
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_FILE,
&storage_fd));
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_FILE,
&storage_fd));
if (storage_fd == -1) {
printf("CodeObjectCallback: fd == -1\n"); fflush(stdout);
abort();
printf("CodeObjectCallback: fd == -1\n");
fflush(stdout);
abort();
}
} else if (storage_type == HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_MEMORY) {
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_BASE,
&memory_base));
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_BASE,
&memory_base));
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_SIZE,
&memory_size));
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_SIZE,
&memory_size));
}
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_BASE,
&load_base));
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_BASE, &load_base));
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_SIZE,
&load_size));
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_SIZE, &load_size));
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_DELTA,
&load_delta));
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_DELTA, &load_delta));
// Getting URI
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI_LENGTH,
&uri_len));
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI_LENGTH, &uri_len));
uri_str = (char*)calloc(uri_len + 1, sizeof(char));
if (!uri_str) EXC_ABORT(HSA_STATUS_ERROR, "URI allocation");
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
loaded_code_object,
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI,
uri_str));
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI, uri_str));
if (storage_type != HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_NONE) {
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_CODEOBJ) {
@@ -377,8 +336,8 @@ class HsaInterceptor {
data.codeobj.load_base = load_base;
data.codeobj.load_size = load_size;
data.codeobj.load_delta = load_delta;
data.codeobj.uri_length = uri_len;
data.codeobj.uri = uri_str;
data.codeobj.uri_length = uri_len;
data.codeobj.uri = uri_str;
data.codeobj.unload = free_flag;
DO_HSA_CALLBACK;
@@ -406,12 +365,8 @@ class HsaInterceptor {
uint32_t num_agents = 0;
hsa_agent_t* agents = NULL;
pointer_info.size = sizeof(hsa_amd_pointer_info_t);
HSA_RT(hsa_amd_pointer_info(
reinterpret_cast<void*>(load_base),
&pointer_info,
malloc,
&num_agents,
&agents));
HSA_RT(hsa_amd_pointer_info(reinterpret_cast<void*>(load_base), &pointer_info, malloc,
&num_agents, &agents));
DeviceCallback(num_agents, agents, reinterpret_cast<void*>(load_base));
}
@@ -420,11 +375,8 @@ class HsaInterceptor {
return HSA_STATUS_SUCCESS;
}
static hsa_status_t KernelSymbolCallback(
hsa_executable_t executable,
hsa_executable_symbol_t symbol,
void *arg)
{
static hsa_status_t KernelSymbolCallback(hsa_executable_t executable,
hsa_executable_symbol_t symbol, void* arg) {
const int free_flag = reinterpret_cast<long>(arg);
hsa_symbol_kind_t kind = (hsa_symbol_kind_t)0;
HSA_RT(hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_TYPE, &kind));
@@ -433,9 +385,11 @@ class HsaInterceptor {
const char* name = NULL;
uint32_t len = 0;
uint64_t obj = 0;
HSA_RT(hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &obj));
HSA_RT(
hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &obj));
if (free_flag == 0) {
HSA_RT(hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH, &len));
HSA_RT(
hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH, &len));
char sym_name[len + 1];
HSA_RT(hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_NAME, sym_name));
name = cpp_demangle(sym_name);
@@ -453,10 +407,7 @@ class HsaInterceptor {
return HSA_STATUS_SUCCESS;
}
static hsa_status_t ExecutableFreeze(
hsa_executable_t executable,
const char *options)
{
static hsa_status_t ExecutableFreeze(hsa_executable_t executable, const char* options) {
hsa_status_t status = HSA_STATUS_SUCCESS;
HSA_RT(hsa_executable_freeze_fn(executable, options));
@@ -466,39 +417,29 @@ class HsaInterceptor {
{ IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_ALLOCATE) is_codeobj_cb |= 1; }
if (is_codeobj_cb) {
LoaderApiTable.hsa_ven_amd_loader_executable_iterate_loaded_code_objects(
executable,
CodeObjectCallback,
reinterpret_cast<void*>(0));
executable, CodeObjectCallback, reinterpret_cast<void*>(0));
}
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_KSYMBOL) {
HSA_RT(hsa_executable_iterate_symbols(
executable,
KernelSymbolCallback,
reinterpret_cast<void*>(0)));
HSA_RT(hsa_executable_iterate_symbols(executable, KernelSymbolCallback,
reinterpret_cast<void*>(0)));
}
return status;
}
static hsa_status_t ExecutableDestroy(
hsa_executable_t executable)
{
static hsa_status_t ExecutableDestroy(hsa_executable_t executable) {
hsa_status_t status = HSA_STATUS_SUCCESS;
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_ALLOCATE) {
LoaderApiTable.hsa_ven_amd_loader_executable_iterate_loaded_code_objects(
executable,
CodeObjectCallback,
reinterpret_cast<void*>(1));
executable, CodeObjectCallback, reinterpret_cast<void*>(1));
}
{
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_KSYMBOL) {
HSA_RT(hsa_executable_iterate_symbols(
executable,
KernelSymbolCallback,
reinterpret_cast<void*>(1)));
HSA_RT(hsa_executable_iterate_symbols(executable, KernelSymbolCallback,
reinterpret_cast<void*>(1)));
}
}
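The reformatted `IS_HSA_CALLBACK` / `DO_HSA_CALLBACK` macros in this file select a callback by ID. When the ID matches none of the earlier cases, they fall back to `callbacks_.codeobj`. They skip the call while `recursion_` is set, so HSA calls made from inside a user callback are not reported a second time. The sketch below expresses the same guard as plain functions. `CallbackTable`, `Select`, `Dispatch`, and the enum are hypothetical names. `recursion_` is modelled here as a `thread_local` flag. That is an assumption, because the diff does not show how `recursion_` is declared.

```cpp
#include <cassert>

enum CallbackId { kAllocate, kDevice, kMemcopy, kSubmit, kKsymbol, kCodeobj };
using CallbackFn = void (*)(CallbackId id, const void* data, void* arg);

struct CallbackTable {
  CallbackFn allocate = nullptr, device = nullptr, memcopy = nullptr,
             submit = nullptr, ksymbol = nullptr, codeobj = nullptr;
};

// Mirrors the ternary chain in IS_HSA_CALLBACK: codeobj is the fallback.
inline CallbackFn Select(const CallbackTable& t, CallbackId id) {
  switch (id) {
    case kAllocate: return t.allocate;
    case kDevice:   return t.device;
    case kMemcopy:  return t.memcopy;
    case kSubmit:   return t.submit;
    case kKsymbol:  return t.ksymbol;
    default:        return t.codeobj;
  }
}

thread_local bool in_callback = false;  // stands in for recursion_ (assumption)

// Returns true if a callback was invoked.
inline bool Dispatch(const CallbackTable& t, CallbackId id, const void* data, void* arg) {
  CallbackFn fn = Select(t, id);
  if (fn == nullptr || in_callback) return false;  // the IS_HSA_CALLBACK condition
  in_callback = true;                              // DO_HSA_CALLBACK: block re-entry
  fn(id, data, arg);
  in_callback = false;
  return true;
}
```

Because of the guard, an HSA call issued from inside the user's callback re-enters the interceptor but does not fire a second, nested callback.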
+3 -3
@@ -33,9 +33,9 @@ THE SOFTWARE.
#include "util/hsa_rsrc_factory.h"
namespace rocprofiler {
extern decltype(hsa_queue_destroy)* hsa_queue_destroy_fn;
extern decltype(hsa_amd_queue_intercept_create)* hsa_amd_queue_intercept_create_fn;
extern decltype(hsa_amd_queue_intercept_register)* hsa_amd_queue_intercept_register_fn;
extern decltype(::hsa_queue_destroy)* hsa_queue_destroy_fn;
extern decltype(::hsa_amd_queue_intercept_create)* hsa_amd_queue_intercept_create_fn;
extern decltype(::hsa_amd_queue_intercept_register)* hsa_amd_queue_intercept_register_fn;
class HsaProxyQueue : public ProxyQueue {
public:
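The `extern decltype(::hsa_...)* ..._fn;` declarations in these hunks qualify the HSA function name with `::`. These variables carry an `_fn` suffix, so in this case the qualifier mainly keeps the declarations consistent. The qualifier becomes necessary when a pointer variable has the same name as the function it wraps. Once such a variable has been declared in the namespace, an unqualified `decltype(name)` finds the variable, whose type is already a pointer. A second `decltype(name)* name` would then declare a pointer-to-pointer and conflict with the first declaration. The minimal illustration below uses a hypothetical `api_alloc` in place of an HSA function.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical global API function standing in for an HSA entry point.
int api_alloc(int size) { return size * 2; }

namespace wrapper {
// First declaration: the variable is not in scope until its declarator is
// complete, so unqualified lookup inside decltype still finds ::api_alloc.
extern decltype(api_alloc)* api_alloc;

// Repeating the unqualified form here would find wrapper::api_alloc
// (type int (*)(int)) and declare int (**)(int), a conflicting type:
//   decltype(api_alloc)* api_alloc = nullptr;  // error
// Qualifying with :: always names the global function.
decltype(::api_alloc)* api_alloc = &::api_alloc;
}  // namespace wrapper

static_assert(std::is_same<decltype(wrapper::api_alloc), int (*)(int)>::value,
              "wrapper::api_alloc is a plain function pointer");
```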
+72 -61
@@ -40,16 +40,13 @@ THE SOFTWARE.
#include "util/hsa_rsrc_factory.h"
namespace rocprofiler {
enum {
K_CONC_OFF = 0,
K_CONC_PMC = 1,
K_CONC_TRACE = 2
};
enum { K_CONC_OFF = 0, K_CONC_PMC = 1, K_CONC_TRACE = 2 };
extern decltype(hsa_queue_create)* hsa_queue_create_fn;
extern decltype(hsa_queue_destroy)* hsa_queue_destroy_fn;
extern decltype(::hsa_queue_create)* hsa_queue_create_fn;
extern decltype(::hsa_queue_destroy)* hsa_queue_destroy_fn;
static inline void print_packet(const void* in_p, const uint32_t& in_n, const uint32_t& w_n = UINT32_MAX) {
static inline void print_packet(const void* in_p, const uint32_t& in_n,
const uint32_t& w_n = UINT32_MAX) {
const uint32_t size32 = util::HsaRsrcFactory::CMD_SLOT_SIZE_B / 4;
const uint32_t* beg = (const uint32_t*)in_p;
const uint32_t* end = beg + (in_n * size32);
@@ -85,31 +82,33 @@ class InterceptQueue {
typedef std::recursive_mutex mutex_t;
typedef std::map<uint64_t, InterceptQueue*> obj_map_t;
typedef hsa_status_t (*queue_callback_t)(hsa_queue_t*, void* data);
typedef void (*queue_event_callback_t)(hsa_status_t status, hsa_queue_t *queue, void *arg);
typedef void (*queue_event_callback_t)(hsa_status_t status, hsa_queue_t* queue, void* arg);
typedef uint32_t queue_id_t;
static void HsaIntercept(HsaApiTable* table);
static hsa_status_t InterceptQueueCreate(hsa_agent_t agent, uint32_t size, hsa_queue_type32_t type,
void (*callback)(hsa_status_t status, hsa_queue_t* source,
void* data),
void* data, uint32_t private_segment_size,
uint32_t group_segment_size, hsa_queue_t** queue,
const bool& tracker_on) {
static hsa_status_t InterceptQueueCreate(
hsa_agent_t agent, uint32_t size, hsa_queue_type32_t type,
void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data), void* data,
uint32_t private_segment_size, uint32_t group_segment_size, hsa_queue_t** queue,
const bool& tracker_on) {
std::lock_guard<mutex_t> lck(mutex_);
hsa_status_t status = HSA_STATUS_ERROR;
if (in_create_call_) EXC_ABORT(status, "recursive InterceptQueueCreate()");
in_create_call_ = true;
ProxyQueue* proxy = ProxyQueue::Create(agent, size, type, queue_event_callback, data, private_segment_size,
group_segment_size, queue, &status);
ProxyQueue* proxy =
ProxyQueue::Create(agent, size, type, queue_event_callback, data, private_segment_size,
group_segment_size, queue, &status);
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, "ProxyQueue::Create()");
if (tracker_on || tracker_on_) {
if (tracker_ == NULL) tracker_ = &Tracker::Instance();
status = rocprofiler::util::HsaRsrcFactory::HsaApi()->hsa_amd_profiling_set_profiler_enabled(*queue, true);
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, "hsa_amd_profiling_set_profiler_enabled()");
status = rocprofiler::util::HsaRsrcFactory::HsaApi()->hsa_amd_profiling_set_profiler_enabled(
*queue, true);
if (status != HSA_STATUS_SUCCESS)
EXC_ABORT(status, "hsa_amd_profiling_set_profiler_enabled()");
}
InterceptQueue* obj = new InterceptQueue(agent, *queue, proxy);
@@ -138,15 +137,17 @@ class InterceptQueue {
void* data),
void* data, uint32_t private_segment_size,
uint32_t group_segment_size, hsa_queue_t** queue) {
return InterceptQueueCreate(agent, size, type, callback, data, private_segment_size, group_segment_size, queue, false);
return InterceptQueueCreate(agent, size, type, callback, data, private_segment_size,
group_segment_size, queue, false);
}
static hsa_status_t QueueCreateTracked(hsa_agent_t agent, uint32_t size, hsa_queue_type32_t type,
void (*callback)(hsa_status_t status, hsa_queue_t* source,
void* data),
void* data, uint32_t private_segment_size,
uint32_t group_segment_size, hsa_queue_t** queue) {
return InterceptQueueCreate(agent, size, type, callback, data, private_segment_size, group_segment_size, queue, true);
void (*callback)(hsa_status_t status, hsa_queue_t* source,
void* data),
void* data, uint32_t private_segment_size,
uint32_t group_segment_size, hsa_queue_t** queue) {
return InterceptQueueCreate(agent, size, type, callback, data, private_segment_size,
group_segment_size, queue, true);
}
static hsa_status_t QueueDestroy(hsa_queue_t* queue) {
@@ -170,8 +171,8 @@ class InterceptQueue {
return status;
}
static void OnSubmitCB_opt(const void* in_packets, uint64_t count, uint64_t user_que_idx, void* data,
hsa_amd_queue_intercept_packet_writer writer) {
static void OnSubmitCB_opt(const void* in_packets, uint64_t count, uint64_t user_que_idx,
void* data, hsa_amd_queue_intercept_packet_writer writer) {
const packet_t* packets_arr = reinterpret_cast<const packet_t*>(in_packets);
InterceptQueue* obj = reinterpret_cast<InterceptQueue*>(data);
Queue* proxy = obj->proxy_;
@@ -195,10 +196,10 @@ class InterceptQueue {
obj->queue_id,
completion_signal,
dispatch_packet,
NULL, // kernel_name
0, // kernel_object
NULL, // kernel_code
0, // (uint32_t)syscall(__NR_gettid),
NULL, // kernel_name
0, // kernel_object
NULL, // kernel_code
0, // (uint32_t)syscall(__NR_gettid),
NULL}; // record
// Calling dispatch callback
@@ -210,7 +211,8 @@ class InterceptQueue {
if (group.feature_count != 0) {
if (tracker_ != NULL) {
Group* context_group = context->GetGroup(group.index);
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal = context_group->GetDispatchSignal();
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal =
context_group->GetDispatchSignal();
Tracker::Enable_opt(context_group, completion_signal);
context_group->IncrRefsCount();
}
@@ -254,8 +256,9 @@ class InterceptQueue {
const uint32_t tid = syscall(__NR_gettid);
hsa_queue_t* qptr = obj->queue_;
const void* slot_ptr = util::HsaRsrcFactory::GetSlotPointer(qptr, user_que_idx);
printf("OnSubmitCB: %u:%u queue(%p:%lu) in(%p, %p, %lu) hdr(%u)\n",
pid, tid, qptr, user_que_idx, in_packets, slot_ptr, count, header_val); fflush(stdout);
printf("OnSubmitCB: %u:%u queue(%p:%lu) in(%p, %p, %lu) hdr(%u)\n", pid, tid, qptr,
user_que_idx, in_packets, slot_ptr, count, header_val);
fflush(stdout);
print_packet(in_packets, count);
abort();
#endif
@@ -277,8 +280,9 @@ class InterceptQueue {
if (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH) {
uint64_t kernel_object = dispatch_packet->kernel_object;
const amd_kernel_code_t* kernel_code = GetKernelCode(kernel_object);
kernel_name = (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH) ?
QueryKernelName(kernel_object, kernel_code) : NULL;
kernel_name = (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH)
? QueryKernelName(kernel_object, kernel_code)
: NULL;
}
// Prepareing submit callback data
@@ -311,8 +315,11 @@ class InterceptQueue {
const bool is_serial = (k_concurrent_ == K_CONC_OFF);
if (tracker_ != NULL) {
tracker_entry = tracker_->Alloc(obj->agent_info_->dev_id, dispatch_packet->completion_signal, is_serial);
if (is_serial) const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal = tracker_entry->signal;
tracker_entry = tracker_->Alloc(obj->agent_info_->dev_id,
dispatch_packet->completion_signal, is_serial);
if (is_serial)
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal =
tracker_entry->signal;
}
// Prepareing dispatch callback data
@@ -339,7 +346,9 @@ class InterceptQueue {
// Injecting profiling start/stop/read packets
if ((status != HSA_STATUS_SUCCESS) || (group.context == NULL)) {
if (tracker_entry != NULL) {
if (is_serial) const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal = tracker_entry->orig;
if (is_serial)
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal =
tracker_entry->orig;
tracker_->Delete(tracker_entry);
}
} else {
@@ -351,11 +360,11 @@ class InterceptQueue {
const pkt_vector_t& read_vector = context->ReadPackets(group.index);
pkt_vector_t packets;
if (is_serial) { // serial
if (is_serial) { // serial
packets = start_vector;
packets.insert(packets.end(), *packet);
packets.insert(packets.end(), stop_vector.begin(), stop_vector.end());
} else { // concurrent
} else { // concurrent
// Insert start packets once
auto inject_start = [&packets](const pkt_vector_t& starts) mutable {
packets = starts;
@@ -363,14 +372,15 @@ class InterceptQueue {
std::call_once(once_flag_, inject_start, start_vector);
// Reads at both kernel start and end (also with barriers)
assert(read_vector.size() >= 2 * start_vector.size());
auto mid = read_vector.begin() + read_vector.size()/2;
auto mid = read_vector.begin() + read_vector.size() / 2;
// Read at kernel start
packets.insert(packets.end(), read_vector.begin(), mid);
// Kernel dispatch packet
assert(tracker_entry != NULL);
// Bind dispatch and barrier signals with tracker entry
tracker_->SetHandler(tracker_entry, context->GetGroup(group.index));
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal = context->GetGroup(group.index)->GetDispatchSignal();
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal =
context->GetGroup(group.index)->GetDispatchSignal();
packets.insert(packets.end(), *packet);
// Read at kernel end
packets.insert(packets.end(), mid, read_vector.end());
@@ -379,7 +389,8 @@ class InterceptQueue {
if (tracker_entry != NULL) {
Group* context_group = context->GetGroup(group.index);
context_group->IncrRefsCount();
tracker_->EnableContext(tracker_entry, Context::Handler, reinterpret_cast<void*>(context_group));
tracker_->EnableContext(tracker_entry, Context::Handler,
reinterpret_cast<void*>(context_group));
}
if (writer != NULL) {
@@ -409,8 +420,8 @@ class InterceptQueue {
}
}
static void OnSubmitCB_ctrace(const void* in_packets, uint64_t count, uint64_t user_que_idx, void* data,
hsa_amd_queue_intercept_packet_writer writer) {
static void OnSubmitCB_ctrace(const void* in_packets, uint64_t count, uint64_t user_que_idx,
void* data, hsa_amd_queue_intercept_packet_writer writer) {
const packet_t* packets_arr = reinterpret_cast<const packet_t*>(in_packets);
InterceptQueue* obj = reinterpret_cast<InterceptQueue*>(data);
Queue* proxy = obj->proxy_;
@@ -431,8 +442,9 @@ class InterceptQueue {
if (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH) {
uint64_t kernel_object = dispatch_packet->kernel_object;
const amd_kernel_code_t* kernel_code = GetKernelCode(kernel_object);
kernel_name = (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH) ?
QueryKernelName(kernel_object, kernel_code) : NULL;
kernel_name = (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH)
? QueryKernelName(kernel_object, kernel_code)
: NULL;
}
// Prepareing submit callback data
@@ -529,7 +541,9 @@ class InterceptQueue {
Stop();
}
static inline void Start() { dispatch_callback_.store(callbacks_.dispatch, std::memory_order_release); }
static inline void Start() {
dispatch_callback_.store(callbacks_.dispatch, std::memory_order_release);
}
static inline void Stop() { dispatch_callback_.store(NULL, std::memory_order_relaxed); }
static void SetSubmitCallback(rocprofiler_hsa_callback_fun_t fun, void* arg) {
@@ -545,7 +559,7 @@ class InterceptQueue {
static uint32_t k_concurrent_;
private:
static void queue_event_callback(hsa_status_t status, hsa_queue_t *queue, void *arg) {
static void queue_event_callback(hsa_status_t status, hsa_queue_t* queue, void* arg) {
if (status != HSA_STATUS_SUCCESS) {
uint32_t* read_ptr32 = (uint32_t*)util::HsaRsrcFactory::GetReadPointer(queue);
print_packet(read_ptr32, 1);
@@ -582,12 +596,13 @@ class InterceptQueue {
const uint16_t kernel_object_flag = *((uint64_t*)kernel_code + 1);
if (kernel_object_flag == 0) {
if (!util::HsaRsrcFactory::IsExecutableTracking()) {
EXC_ABORT(HSA_STATUS_ERROR, "Error: V3 code object detected - code objects tracking should be enabled\n");
EXC_ABORT(HSA_STATUS_ERROR,
"Error: V3 code object detected - code objects tracking should be enabled\n");
}
}
const char* kernel_symname = (util::HsaRsrcFactory::IsExecutableTracking()) ?
util::HsaRsrcFactory::GetKernelNameRef(kernel_object) :
GetKernelName(kernel_code->runtime_loader_kernel_symbol);
const char* kernel_symname = (util::HsaRsrcFactory::IsExecutableTracking())
? util::HsaRsrcFactory::GetKernelNameRef(kernel_object)
: GetKernelName(kernel_code->runtime_loader_kernel_symbol);
return kernel_symname;
}
@@ -618,17 +633,13 @@ class InterceptQueue {
return status;
}
InterceptQueue(const hsa_agent_t& agent, hsa_queue_t* const queue, ProxyQueue* proxy) :
queue_(queue),
proxy_(proxy)
{
InterceptQueue(const hsa_agent_t& agent, hsa_queue_t* const queue, ProxyQueue* proxy)
: queue_(queue), proxy_(proxy) {
agent_info_ = util::HsaRsrcFactory::Instance().GetAgentInfo(agent);
queue_event_callback_ = NULL;
}
~InterceptQueue() {
ProxyQueue::Destroy(proxy_);
}
~InterceptQueue() { ProxyQueue::Destroy(proxy_); }
static const packet_word_t header_type_mask = (1ul << HSA_PACKET_HEADER_WIDTH_TYPE) - 1;
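In the `InterceptQueue` hunks above, the injected packet sequence depends on `k_concurrent_`:

- In serial mode (`K_CONC_OFF`), each dispatch is wrapped as the start packets, then the dispatch, then the stop packets.
- In concurrent mode, the start packets are injected only once (via `std::call_once` on `once_flag_`). The read packets are split in half around the dispatch, so counters are read at both kernel start and kernel end.

The simplified sketch below shows that ordering, with strings in place of AQL packets. `BuildPackets`, `PacketList`, and the labels are hypothetical names. The sketch makes two further simplifications:

- A plain `bool` replaces `std::call_once`, so, unlike the original, the sketch is not thread-safe.
- It omits the signal rebinding and the rest of the concurrent branch, which this diff does not fully show.

```cpp
#include <cassert>
#include <string>
#include <vector>

using PacketList = std::vector<std::string>;  // strings stand in for AQL packets

// Hypothetical sketch of the packet ordering for one intercepted dispatch.
inline PacketList BuildPackets(bool is_serial, bool& starts_injected, const PacketList& starts,
                               const PacketList& stops, const PacketList& reads,
                               const std::string& dispatch) {
  PacketList packets;
  if (is_serial) {
    // start -> kernel dispatch -> stop, repeated for every dispatch
    packets = starts;
    packets.push_back(dispatch);
    packets.insert(packets.end(), stops.begin(), stops.end());
  } else {
    // start packets precede only the first concurrent dispatch (call_once in the original)
    if (!starts_injected) {
      packets = starts;
      starts_injected = true;
    }
    // first half of the reads before the kernel, second half after it
    const auto mid = reads.begin() + reads.size() / 2;
    packets.insert(packets.end(), reads.begin(), mid);
    packets.push_back(dispatch);
    packets.insert(packets.end(), mid, reads.end());
  }
  return packets;
}
```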
+1 -1
@@ -25,4 +25,4 @@ THE SOFTWARE.
namespace rocprofiler {
MetricsDict::map_t* MetricsDict::map_ = NULL;
MetricsDict::mutex_t MetricsDict::mutex_;
}
} // namespace rocprofiler
Executable file → Regular file
+5 -5
@@ -202,15 +202,15 @@ class MetricsDict {
xml_->AddConst("top.const.metric", "SE_NUM", agent_info->se_num);
ImportMetrics(agent_info, "const");
agent_name_ = agent_info->name;
if (agent_name_.find(':') != std::string::npos) // Remove compiler flags from the agent_name
if (agent_name_.find(':') != std::string::npos) // Remove compiler flags from the agent_name
agent_name_ = agent_name_.substr(0, agent_name_.find(':'));
std::unordered_set<std::string> supported_agent_names = {
"gfx906", "gfx908", "gfx90a", // Vega
"gfx940", "gfx941", "gfx942", // Mi300
"gfx906", "gfx908", "gfx90a", // Vega
"gfx940", "gfx941", "gfx942", // Mi300
"gfx1030", "gfx1031", "gfx1032", // Navi2x
"gfx1100", "gfx1101" // Navi3x
"gfx1100", "gfx1101" // Navi3x
};
if (supported_agent_names.find(agent_name_) != supported_agent_names.end()) {
ImportMetrics(agent_info, agent_name_);
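The hunk above cuts the agent name at the first ':' so that target-feature suffixes (for example `gfx90a:sramecc+:xnack-`) do not defeat the lookup, then checks the result against the supported gfx set. Below is a minimal standalone sketch of the same idea. `NormalizeAgentName` and `IsSupportedAgent` are hypothetical helpers, not rocprofiler API.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper: drop everything from the first ':' onward,
// e.g. "gfx90a:sramecc+:xnack-" -> "gfx90a".
std::string NormalizeAgentName(const std::string& name) {
  const auto pos = name.find(':');  // search once, reuse the position
  return (pos != std::string::npos) ? name.substr(0, pos) : name;
}

// Hypothetical check mirroring supported_agent_names in MetricsDict.
bool IsSupportedAgent(const std::string& name) {
  static const std::unordered_set<std::string> kSupported = {
      "gfx906",  "gfx908",  "gfx90a",              // Vega
      "gfx940",  "gfx941",  "gfx942",              // MI300
      "gfx1030", "gfx1031", "gfx1032",             // Navi2x
      "gfx1100", "gfx1101"};                       // Navi3x
  return kSupported.count(NormalizeAgentName(name)) != 0;
}
```

Unlike the original, the sketch calls `find(':')` only once and reuses the position.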
+25 -21
@@ -140,7 +140,7 @@ class Profile {
static void SetConcurrent(profile_t* profile) {
// Check whether concurrent has been set
for (const parameter_t* p = profile->parameters;
p < (profile->parameters + profile->parameter_count); ++p) {
p < (profile->parameters + profile->parameter_count); ++p) {
// If yes, stop here
if (p->parameter_name == HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_K_CONCURRENT) {
return;
@@ -148,7 +148,7 @@ class Profile {
}
// Otherwise, try to set
parameter_t* parameters = new parameter_t[profile->parameter_count+1];
parameter_t* parameters = new parameter_t[profile->parameter_count + 1];
for (unsigned i = 0; i < profile->parameter_count; ++i) {
parameters[i].parameter_name = profile->parameters[i].parameter_name;
parameters[i].value = profile->parameters[i].value;
@@ -162,15 +162,16 @@ class Profile {
}
void BarrierPacket(packet_t* packet, const hsa_signal_t& prior_signal) {
hsa_barrier_and_packet_t* barrier =
reinterpret_cast<hsa_barrier_and_packet_t*>(packet);
hsa_barrier_and_packet_t* barrier = reinterpret_cast<hsa_barrier_and_packet_t*>(packet);
barrier->header = HSA_PACKET_TYPE_BARRIER_AND;
if (prior_signal.handle) barrier->dep_signal[0] = prior_signal; // set packet dependency
else barrier->header |= 1 << HSA_PACKET_HEADER_BARRIER; // set barrier bit
if (prior_signal.handle)
barrier->dep_signal[0] = prior_signal; // set packet dependency
else
barrier->header |= 1 << HSA_PACKET_HEADER_BARRIER; // set barrier bit
}
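`Profile::BarrierPacket` writes the packet type into the header and then does one of two things. If a prior signal exists, it becomes the packet's dependency. Otherwise the barrier bit is set, so the packet waits for all preceding packets in the queue to finish. The sketch below reproduces that branching on a simplified, hypothetical packet struct. The constant values are assumed to mirror `hsa.h` (`HSA_PACKET_TYPE_BARRIER_AND`, `HSA_PACKET_HEADER_BARRIER`, `HSA_PACKET_HEADER_WIDTH_TYPE`), so check them against your HSA headers.

```cpp
#include <cstdint>

// Values assumed to mirror hsa.h; verify against your installed headers.
constexpr uint16_t kPacketTypeBarrierAnd = 3;  // HSA_PACKET_TYPE_BARRIER_AND
constexpr int kHeaderBarrierBit = 8;           // HSA_PACKET_HEADER_BARRIER
constexpr int kHeaderWidthType = 8;            // HSA_PACKET_HEADER_WIDTH_TYPE

// Hypothetical, simplified stand-in for hsa_barrier_and_packet_t.
struct BarrierAndPacket {
  uint16_t header = 0;
  uint64_t dep_signal[5] = {};  // signal handles; 0 means "no dependency"
};

// Same branching as Profile::BarrierPacket. The type field sits at bit 0
// of the header, so assigning the type without a shift is enough.
void MakeBarrier(BarrierAndPacket* pkt, uint64_t prior_signal_handle) {
  pkt->header = kPacketTypeBarrierAnd;
  if (prior_signal_handle != 0)
    pkt->dep_signal[0] = prior_signal_handle;  // depend on the prior packet
  else
    pkt->header |= static_cast<uint16_t>(1u << kHeaderBarrierBit);  // barrier bit
}

// Extract the packet type, like header_type_mask in InterceptQueue.
uint16_t PacketType(uint16_t header) {
  return static_cast<uint16_t>(header & ((1u << kHeaderWidthType) - 1));
}
```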
hsa_status_t Finalize(pkt_vector_t& start_vector, pkt_vector_t& stop_vector,
pkt_vector_t& read_vector, bool is_concurrent = false) {
pkt_vector_t& read_vector, bool is_concurrent = false) {
if (is_concurrent) SetConcurrent(&profile_);
hsa_status_t status = HSA_STATUS_SUCCESS;
@@ -180,8 +181,8 @@ class Profile {
const pfn_t* api = rsrc->AqlProfileApi();
packet_t start{};
packet_t stop{};
packet_t read{}; // read at kernel start
packet_t read2{}; // read at kernel end
packet_t read{}; // read at kernel start
packet_t read2{}; // read at kernel end
// Check the profile buffer sizes
status = api->hsa_ven_amd_aqlprofile_start(&profile_, NULL);
@@ -200,12 +201,12 @@ class Profile {
#ifdef AQLPROF_NEW_API
if (profile_.type == HSA_VEN_AMD_AQLPROFILE_EVENT_TYPE_PMC) {
rd_status = api->hsa_ven_amd_aqlprofile_read(&profile_, &read);
if (is_concurrent){ // concurrent: one more read
if (is_concurrent) { // concurrent: one more read
if (rd_status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "aqlprofile_read");
rd_status = api->hsa_ven_amd_aqlprofile_read(&profile_, &read2);
}
}
#if 0 // Read API returns error if disabled
#if 0 // Read API returns error if disabled
if (rd_status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "aqlprofile_read");
#endif
#endif
@@ -220,7 +221,8 @@ class Profile {
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "signal_create " << std::hex << status);
if (is_concurrent) {
status = hsa_signal_create(1, 0, NULL, &read_signal_);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "signal_create " << std::hex << status);
if (status != HSA_STATUS_SUCCESS)
EXC_RAISING(status, "signal_create " << std::hex << status);
read.completion_signal = read_signal_;
read2.completion_signal = completion_signal_;
} else {
@@ -239,7 +241,8 @@ class Profile {
BarrierPacket(&barrier_rd, read.completion_signal);
BarrierPacket(&barrier_rd2, dispatch_signal_);
status = hsa_signal_create(1, 0, NULL, &(barrier_signal_));
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "signal_create " << std::hex << status);
if (status != HSA_STATUS_SUCCESS)
EXC_RAISING(status, "signal_create " << std::hex << status);
barrier_rd2.completion_signal = barrier_signal_;
}
@@ -297,8 +300,8 @@ class Profile {
void GetProfiles(profile_vector_t& vec) {
if (!info_vector_.empty()) {
vec.push_back(profile_tuple_t{&profile_, &info_vector_, completion_signal_,
dispatch_signal_, barrier_signal_, read_signal_});
vec.push_back(profile_tuple_t{&profile_, &info_vector_, completion_signal_, dispatch_signal_,
barrier_signal_, read_signal_});
}
}
@@ -330,11 +333,12 @@ class PmcProfile : public Profile {
hsa_status_t Allocate(util::HsaRsrcFactory* rsrc) {
profile_.command_buffer.ptr =
rsrc->AllocateSysMemory(agent_info_, profile_.command_buffer.size);
rsrc->AllocateSysMemory(agent_info_, profile_.command_buffer.size);
// Allocate profile output buffer from kernarg memory pool since kernarg
// memory buffer is uncached. So when GPU copies performance counter values
// to this buffer they are guaranteed to be visible to CPU.
profile_.output_buffer.ptr = rsrc->AllocateKernArgMemory(agent_info_, profile_.output_buffer.size);
profile_.output_buffer.ptr =
rsrc->AllocateKernArgMemory(agent_info_, profile_.output_buffer.size);
return (profile_.command_buffer.ptr && profile_.output_buffer.ptr) ? HSA_STATUS_SUCCESS
: HSA_STATUS_ERROR;
}
@@ -366,11 +370,11 @@ class TraceProfile : public Profile {
hsa_status_t Allocate(util::HsaRsrcFactory* rsrc) {
profile_.command_buffer.ptr =
rsrc->AllocateSysMemory(agent_info_, profile_.command_buffer.size);
rsrc->AllocateSysMemory(agent_info_, profile_.command_buffer.size);
profile_.output_buffer.size = output_buffer_size_;
profile_.output_buffer.ptr = (output_buffer_local_) ?
rsrc->AllocateLocalMemory(agent_info_, profile_.output_buffer.size) :
rsrc->AllocateSysMemory(agent_info_, profile_.output_buffer.size);
profile_.output_buffer.ptr = (output_buffer_local_)
? rsrc->AllocateLocalMemory(agent_info_, profile_.output_buffer.size)
: rsrc->AllocateSysMemory(agent_info_, profile_.output_buffer.size);
return (profile_.command_buffer.ptr && profile_.output_buffer.ptr) ? HSA_STATUS_SUCCESS
: HSA_STATUS_ERROR;
}
+2 -2
@@ -38,10 +38,10 @@ ProxyQueue* ProxyQueue::Create(hsa_agent_t agent, uint32_t size, hsa_queue_type3
hsa_status_t* status) {
hsa_status_t suc = HSA_STATUS_ERROR;
ProxyQueue* instance =
(rocp_type_) ? (ProxyQueue*) new SimpleProxyQueue() : (ProxyQueue*) new HsaProxyQueue();
(rocp_type_) ? (ProxyQueue*)new SimpleProxyQueue() : (ProxyQueue*)new HsaProxyQueue();
if (instance != NULL) {
suc = instance->Init(agent, size, type, callback, data, private_segment_size,
group_segment_size, queue);
group_segment_size, queue);
if (suc != HSA_STATUS_SUCCESS) {
delete instance;
instance = NULL;
+80 -80
@@ -75,34 +75,34 @@ hsa_status_t CreateQueuePro(hsa_agent_t agent, uint32_t size, hsa_queue_type32_t
void* data, uint32_t private_segment_size, uint32_t group_segment_size,
hsa_queue_t** queue);
decltype(hsa_queue_create)* hsa_queue_create_fn;
decltype(hsa_queue_destroy)* hsa_queue_destroy_fn;
decltype(::hsa_queue_create)* hsa_queue_create_fn;
decltype(::hsa_queue_destroy)* hsa_queue_destroy_fn;
decltype(hsa_signal_store_relaxed)* hsa_signal_store_relaxed_fn;
decltype(hsa_signal_store_relaxed)* hsa_signal_store_screlease_fn;
decltype(::hsa_signal_store_relaxed)* hsa_signal_store_relaxed_fn;
decltype(::hsa_signal_store_relaxed)* hsa_signal_store_screlease_fn;
decltype(hsa_queue_load_write_index_relaxed)* hsa_queue_load_write_index_relaxed_fn;
decltype(hsa_queue_store_write_index_relaxed)* hsa_queue_store_write_index_relaxed_fn;
decltype(hsa_queue_load_read_index_relaxed)* hsa_queue_load_read_index_relaxed_fn;
decltype(hsa_queue_add_write_index_scacq_screl)* hsa_queue_add_write_index_scacq_screl_fn;
decltype(::hsa_queue_load_write_index_relaxed)* hsa_queue_load_write_index_relaxed_fn;
decltype(::hsa_queue_store_write_index_relaxed)* hsa_queue_store_write_index_relaxed_fn;
decltype(::hsa_queue_load_read_index_relaxed)* hsa_queue_load_read_index_relaxed_fn;
decltype(::hsa_queue_add_write_index_scacq_screl)* hsa_queue_add_write_index_scacq_screl_fn;
decltype(hsa_queue_load_write_index_scacquire)* hsa_queue_load_write_index_scacquire_fn;
decltype(hsa_queue_store_write_index_screlease)* hsa_queue_store_write_index_screlease_fn;
decltype(hsa_queue_load_read_index_scacquire)* hsa_queue_load_read_index_scacquire_fn;
decltype(::hsa_queue_load_write_index_scacquire)* hsa_queue_load_write_index_scacquire_fn;
decltype(::hsa_queue_store_write_index_screlease)* hsa_queue_store_write_index_screlease_fn;
decltype(::hsa_queue_load_read_index_scacquire)* hsa_queue_load_read_index_scacquire_fn;
decltype(hsa_amd_queue_intercept_create)* hsa_amd_queue_intercept_create_fn;
decltype(hsa_amd_queue_intercept_register)* hsa_amd_queue_intercept_register_fn;
decltype(::hsa_amd_queue_intercept_create)* hsa_amd_queue_intercept_create_fn;
decltype(::hsa_amd_queue_intercept_register)* hsa_amd_queue_intercept_register_fn;
decltype(hsa_memory_allocate)* hsa_memory_allocate_fn;
decltype(hsa_memory_assign_agent)* hsa_memory_assign_agent_fn;
decltype(hsa_memory_copy)* hsa_memory_copy_fn;
decltype(hsa_amd_memory_pool_allocate)* hsa_amd_memory_pool_allocate_fn;
decltype(hsa_amd_memory_pool_free)* hsa_amd_memory_pool_free_fn;
decltype(hsa_amd_agents_allow_access)* hsa_amd_agents_allow_access_fn;
decltype(hsa_amd_memory_async_copy)* hsa_amd_memory_async_copy_fn;
decltype(hsa_amd_memory_async_copy_rect)* hsa_amd_memory_async_copy_rect_fn;
decltype(hsa_executable_freeze)* hsa_executable_freeze_fn;
decltype(hsa_executable_destroy)* hsa_executable_destroy_fn;
decltype(::hsa_memory_allocate)* hsa_memory_allocate_fn;
decltype(::hsa_memory_assign_agent)* hsa_memory_assign_agent_fn;
decltype(::hsa_memory_copy)* hsa_memory_copy_fn;
decltype(::hsa_amd_memory_pool_allocate)* hsa_amd_memory_pool_allocate_fn;
decltype(::hsa_amd_memory_pool_free)* hsa_amd_memory_pool_free_fn;
decltype(::hsa_amd_agents_allow_access)* hsa_amd_agents_allow_access_fn;
decltype(::hsa_amd_memory_async_copy)* hsa_amd_memory_async_copy_fn;
decltype(::hsa_amd_memory_async_copy_rect)* hsa_amd_memory_async_copy_rect_fn;
decltype(::hsa_executable_freeze)* hsa_executable_freeze_fn;
decltype(::hsa_executable_destroy)* hsa_executable_destroy_fn;
::HsaApiTable* kHsaApiTable;
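The commit message for ce86dea says that `decltype(hsa_memory_allocate)* hsa_memory_allocate` can fail to compile and that the fix is `decltype(::<hsa-function>)`. The sketch below shows one way the unqualified form breaks: a namespace-scope pointer that reuses the function's name and was already declared `extern`, as the header in this diff does for the `_fn` pointers. The names `add` and `tool` are hypothetical stand-ins for an HSA entry point and the `rocprofiler` namespace.

```cpp
#include <type_traits>

// A global function standing in for an HSA entry point.
int add(int a, int b) { return a + b; }

namespace tool {
// Pointer variable that deliberately reuses the function's name.
extern decltype(::add)* add;
// Writing decltype(add) here would find tool::add (declared just above,
// type int (*)(int, int)), yielding int (**)(int, int) and a conflicting
// redeclaration. '::add' always names the global function.
decltype(::add)* add = ::add;
}  // namespace tool

static_assert(std::is_same<decltype(tool::add), int (*)(int, int)>::value,
              "tool::add is a plain function pointer");
```

In this diff the `_fn` suffix already avoids the clash, so the `::` is mostly for consistency, as the commit message notes.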
@@ -393,80 +393,80 @@ ROCPROFILER_EXPORT extern const uint32_t HSA_AMD_TOOL_PRIORITY = 25;
PUBLIC_API bool OnLoad(HsaApiTable* table, uint64_t runtime_version, uint64_t failed_tool_count,
const char* const* failed_tool_names) {
ONLOAD_TRACE_BEG();
rocprofiler::SaveHsaApi(table);
rocprofiler::ProxyQueue::InitFactory();
rocprofiler::SaveHsaApi(table);
rocprofiler::ProxyQueue::InitFactory();
// Checking environment to enable intercept mode
const char* intercept_env = getenv("ROCP_HSA_INTERCEPT");
// Checking environment to enable intercept mode
const char* intercept_env = getenv("ROCP_HSA_INTERCEPT");
int intercept_env_value = 0;
if (intercept_env != NULL) {
intercept_env_value = atoi(intercept_env);
int intercept_env_value = 0;
if (intercept_env != NULL) {
intercept_env_value = atoi(intercept_env);
switch (intercept_env_value) {
case 0:
case 1:
// 0: Intercepting disabled
// 1: Intercepting enabled without timestamping
rocprofiler::InterceptQueue::TrackerOn(false);
break;
case 2:
// Intercepting enabled with timestamping
rocprofiler::InterceptQueue::TrackerOn(true);
break;
default:
ERR_LOGGING("Bad ROCP_HSA_INTERCEPT env var value ("
<< intercept_env << "): "
<< "valid values are 0 (standalone), 1 (intercepting without timestamp), 2 "
"(intercepting with timestamp)");
return false;
}
switch (intercept_env_value) {
case 0:
case 1:
// 0: Intercepting disabled
// 1: Intercepting enabled without timestamping
rocprofiler::InterceptQueue::TrackerOn(false);
break;
case 2:
// Intercepting enabled with timestamping
rocprofiler::InterceptQueue::TrackerOn(true);
break;
default:
ERR_LOGGING("Bad ROCP_HSA_INTERCEPT env var value ("
<< intercept_env << "): "
<< "valid values are 0 (standalone), 1 (intercepting without timestamp), 2 "
"(intercepting with timestamp)");
return false;
}
}
// always enable executable tracking
rocprofiler::util::HsaRsrcFactory::EnableExecutableTracking(table);
// always enable executable tracking
rocprofiler::util::HsaRsrcFactory::EnableExecutableTracking(table);
// Loading a tool lib and setting of intercept mode
const uint32_t intercept_mode_mask = rocprofiler::LoadTool();
// Loading a tool lib and setting of intercept mode
const uint32_t intercept_mode_mask = rocprofiler::LoadTool();
if (intercept_mode_mask & rocprofiler::MEMCOPY_INTERCEPT_MODE) {
hsa_status_t status = hsa_amd_profiling_async_copy_enable(true);
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, "hsa_amd_profiling_async_copy_enable");
rocprofiler::hsa_amd_memory_async_copy_fn = table->amd_ext_->hsa_amd_memory_async_copy_fn;
rocprofiler::hsa_amd_memory_async_copy_rect_fn =
table->amd_ext_->hsa_amd_memory_async_copy_rect_fn;
table->amd_ext_->hsa_amd_memory_async_copy_fn =
rocprofiler::hsa_amd_memory_async_copy_interceptor;
table->amd_ext_->hsa_amd_memory_async_copy_rect_fn =
rocprofiler::hsa_amd_memory_async_copy_rect_interceptor;
}
if (intercept_mode_mask & rocprofiler::HSA_INTERCEPT_MODE) {
if (intercept_mode_mask & rocprofiler::MEMCOPY_INTERCEPT_MODE) {
hsa_status_t status = hsa_amd_profiling_async_copy_enable(true);
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, "hsa_amd_profiling_async_copy_enable");
rocprofiler::hsa_amd_memory_async_copy_fn = table->amd_ext_->hsa_amd_memory_async_copy_fn;
rocprofiler::hsa_amd_memory_async_copy_rect_fn =
table->amd_ext_->hsa_amd_memory_async_copy_rect_fn;
table->amd_ext_->hsa_amd_memory_async_copy_fn =
rocprofiler::hsa_amd_memory_async_copy_interceptor;
table->amd_ext_->hsa_amd_memory_async_copy_rect_fn =
rocprofiler::hsa_amd_memory_async_copy_rect_interceptor;
}
if (intercept_mode_mask & rocprofiler::HSA_INTERCEPT_MODE) {
if (intercept_mode_mask & rocprofiler::MEMCOPY_INTERCEPT_MODE) {
EXC_ABORT(HSA_STATUS_ERROR, "HSA_INTERCEPT and MEMCOPY_INTERCEPT conflict");
}
rocprofiler::HsaInterceptor::Enable(true);
rocprofiler::HsaInterceptor::HsaIntercept(table);
EXC_ABORT(HSA_STATUS_ERROR, "HSA_INTERCEPT and MEMCOPY_INTERCEPT conflict");
}
rocprofiler::HsaInterceptor::Enable(true);
rocprofiler::HsaInterceptor::HsaIntercept(table);
}
// HSA intercepting
if (intercept_env_value != 0) {
rocprofiler::ProxyQueue::HsaIntercept(table);
rocprofiler::InterceptQueue::HsaIntercept(table);
} else {
rocprofiler::StandaloneIntercept();
}
// HSA intercepting
if (intercept_env_value != 0) {
rocprofiler::ProxyQueue::HsaIntercept(table);
rocprofiler::InterceptQueue::HsaIntercept(table);
} else {
rocprofiler::StandaloneIntercept();
}
ONLOAD_TRACE("end intercept_mode(" << std::hex << intercept_env_value << ")"
<< " intercept_mode_mask(" << std::hex << intercept_mode_mask
<< ")" << std::dec);
ONLOAD_TRACE("end intercept_mode(" << std::hex << intercept_env_value << ")"
<< " intercept_mode_mask(" << std::hex << intercept_mode_mask
<< ")" << std::dec);
return true;
}
// HSA-runtime tool on-unload method
PUBLIC_API void OnUnload() {
ONLOAD_TRACE_BEG();
rocprofiler::UnloadTool();
rocprofiler::RestoreHsaApi();
rocprofiler::UnloadTool();
rocprofiler::RestoreHsaApi();
ONLOAD_TRACE_END();
}
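`OnLoad` reads `ROCP_HSA_INTERCEPT` with `atoi`. The values 0 and 1 call `TrackerOn(false)`, 2 calls `TrackerOn(true)`, and anything else logs an error and returns `false`. Below is a minimal sketch of that mapping as a pure function, which makes it easy to test. `ParseInterceptTimestamping` is a hypothetical name. Unlike `OnLoad`, which skips `TrackerOn` entirely when the variable is unset, the sketch treats an unset variable as 0.

```cpp
#include <cstdlib>
#include <optional>

// Hypothetical helper: maps ROCP_HSA_INTERCEPT to the timestamping flag
// passed to InterceptQueue::TrackerOn; std::nullopt means "reject".
// Assumption: an unset variable is treated as 0 here.
std::optional<bool> ParseInterceptTimestamping(const char* env) {
  const int value = (env != nullptr) ? std::atoi(env) : 0;
  switch (value) {
    case 0:  // intercepting disabled (standalone)
    case 1:  // intercepting enabled without timestamps
      return false;
    case 2:  // intercepting enabled with timestamps
      return true;
    default:
      return std::nullopt;
  }
}
```

`atoi` returns 0 for non-numeric text, so a value such as `yes` silently selects standalone mode instead of reaching the error branch. `strtol` with an end-pointer check would catch that.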
+9 -11
@@ -27,22 +27,20 @@ namespace rocprofiler {
namespace att {
AttTracer::AttTracer(rocprofiler_buffer_id_t buffer_id, rocprofiler_filter_id_t filter_id,
rocprofiler_session_id_t session_id)
rocprofiler_session_id_t session_id)
: buffer_id_(buffer_id), filter_id_(filter_id), session_id_(session_id) {}
void AttTracer::AddPendingSignals(uint32_t writer_id, uint64_t kernel_object,
const hsa_signal_t& original_completion_signal, const hsa_signal_t& new_completion_signal,
rocprofiler_session_id_t session_id,
rocprofiler_buffer_id_t buffer_id,
hsa_ven_amd_aqlprofile_profile_t* profile,
rocprofiler_kernel_properties_t kernel_properties,
uint32_t thread_id, uint64_t queue_index) {
void AttTracer::AddPendingSignals(
uint32_t writer_id, uint64_t kernel_object, const hsa_signal_t& original_completion_signal,
const hsa_signal_t& new_completion_signal, rocprofiler_session_id_t session_id,
rocprofiler_buffer_id_t buffer_id, hsa_ven_amd_aqlprofile_profile_t* profile,
rocprofiler_kernel_properties_t kernel_properties, uint32_t thread_id, uint64_t queue_index) {
std::lock_guard<std::mutex> lock(sessions_pending_signals_lock_);
if (sessions_pending_signals_.find(writer_id) == sessions_pending_signals_.end())
sessions_pending_signals_.emplace(writer_id, std::vector<att_pending_signal_t>());
sessions_pending_signals_.at(writer_id).emplace_back(
att_pending_signal_t{kernel_object, original_completion_signal, new_completion_signal, session_id_, buffer_id, profile,
kernel_properties, thread_id, queue_index});
sessions_pending_signals_.at(writer_id).emplace_back(att_pending_signal_t{
kernel_object, original_completion_signal, new_completion_signal, session_id_, buffer_id,
profile, kernel_properties, thread_id, queue_index});
std::atomic_thread_fence(std::memory_order_release);
}
+5 -7
@@ -40,7 +40,7 @@ Filter::Filter(rocprofiler_filter_id_t id, rocprofiler_filter_kind_t filter_kind
}
break;
}
case ROCPROFILER_PC_SAMPLING_COLLECTION:{
case ROCPROFILER_PC_SAMPLING_COLLECTION: {
break;
}
case ROCPROFILER_ATT_TRACE_COLLECTION: {
@@ -62,8 +62,8 @@ Filter::Filter(rocprofiler_filter_id_t id, rocprofiler_filter_kind_t filter_kind
}
case ROCPROFILER_API_TRACE: {
tracer_apis_.clear();
for (uint32_t j = 0; j < data_count; j++){
tracer_apis_.emplace_back(filter_data.trace_apis[j]);
for (uint32_t j = 0; j < data_count; j++) {
tracer_apis_.emplace_back(filter_data.trace_apis[j]);
}
break;
}
@@ -195,7 +195,7 @@ void Filter::SetProperty(rocprofiler_filter_property_t property) {
case ROCPROFILER_FILTER_DISPATCH_IDS:
dispatch_id_filter_.clear();
for (uint32_t j = 0; j < property.data_count; j++)
dispatch_id_filter_.emplace_back(property.dispatch_ids[j]);
dispatch_id_filter_.emplace_back(property.dispatch_ids[j]);
break;
default:
break;
@@ -249,9 +249,7 @@ void Filter::SetCallback(rocprofiler_sync_callback_t& callback) {
bool Filter::HasCallback() { return has_sync_callback_; }
rocprofiler_sync_callback_t& Filter::GetCallback() {
return callback_;
}
rocprofiler_sync_callback_t& Filter::GetCallback() { return callback_; }
size_t Filter::GetPropertiesCount(rocprofiler_filter_property_kind_t kind) {
switch (kind) {
+6 -8
@@ -53,11 +53,8 @@ class Filter {
bool HasCallback();
void SetProperty(rocprofiler_filter_property_t property);
std::variant<
std::vector<std::string>,
uint32_t*,
std::vector<uint64_t>
> GetProperty(rocprofiler_filter_property_kind_t kind);
std::variant<std::vector<std::string>, uint32_t*, std::vector<uint64_t> > GetProperty(
rocprofiler_filter_property_kind_t kind);
size_t GetPropertiesCount(rocprofiler_filter_property_kind_t kind);
rocprofiler_spm_parameter_t* GetSpmParameterData();
@@ -74,11 +71,12 @@ class Filter {
std::vector<std::string> kernel_names_; // HIP/HSA API Functions
uint32_t dispatch_range_[2]; // Kernel Dispatches OR API Range
std::vector<std::string> profiler_counter_names_; // Counter Names to collect
std::vector<std::string> profiler_counter_names_; // Counter Names to collect
std::vector<rocprofiler_tracer_activity_domain_t> tracer_apis_; // ROCTX/HIP/HSA API
rocprofiler_spm_parameter_t* spm_parameter_; // spm parameter
std::vector<rocprofiler_att_parameter_t> att_parameters_; // ATT Parameters
rocprofiler_counters_sampler_parameters_t counters_sampler_parameters_; // sampled counters parameters
std::vector<rocprofiler_att_parameter_t> att_parameters_; // ATT Parameters
rocprofiler_counters_sampler_parameters_t
counters_sampler_parameters_; // sampled counters parameters
std::vector<uint64_t> dispatch_id_filter_;
bool has_sync_callback_{false};
+10 -8
@@ -125,17 +125,19 @@ bool Profiler::HasActivePass() {
}
void Profiler::AddPendingSignals(
uint32_t writer_id, uint64_t kernel_object, const hsa_signal_t& original_completion_signal, const hsa_signal_t& new_completion_signal,
rocprofiler_session_id_t session_id, rocprofiler_buffer_id_t buffer_id,
rocprofiler::profiling_context_t* context, uint64_t session_data_count,
hsa_ven_amd_aqlprofile_profile_t* profile, rocprofiler_kernel_properties_t kernel_properties,
uint32_t thread_id, uint64_t queue_index, uint64_t correlation_id) {
uint32_t writer_id, uint64_t kernel_object, const hsa_signal_t& original_completion_signal,
const hsa_signal_t& new_completion_signal, rocprofiler_session_id_t session_id,
rocprofiler_buffer_id_t buffer_id, rocprofiler::profiling_context_t* context,
uint64_t session_data_count, hsa_ven_amd_aqlprofile_profile_t* profile,
rocprofiler_kernel_properties_t kernel_properties, uint32_t thread_id, uint64_t queue_index,
uint64_t correlation_id) {
std::lock_guard<std::mutex> lock(sessions_pending_signals_lock_);
if (sessions_pending_signals_->find(writer_id) == sessions_pending_signals_->end())
sessions_pending_signals_->emplace(writer_id, std::vector<pending_signal_t*>());
sessions_pending_signals_->at(writer_id).emplace_back(new pending_signal_t{
kernel_object, original_completion_signal, new_completion_signal, session_id_, buffer_id, context, session_data_count,
profile, kernel_properties, thread_id, queue_index, correlation_id});
sessions_pending_signals_->at(writer_id).emplace_back(
new pending_signal_t{kernel_object, original_completion_signal, new_completion_signal,
session_id_, buffer_id, context, session_data_count, profile,
kernel_properties, thread_id, queue_index, correlation_id});
}
const std::vector<pending_signal_t*>& Profiler::GetPendingSignals(uint32_t writer_id) {
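`Profiler::AddPendingSignals` keeps one list of pending signals per `writer_id`, guarded by a mutex. It uses `find`, then `emplace` of an empty vector, then `at(...).emplace_back(...)`. Below is a trimmed sketch of the same structure. `PendingSignal` and `PendingSignals` are hypothetical types. The sketch also makes a swapped-in choice: it stores records by value, whereas the diff stores heap-allocated `pending_signal_t*`.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical, trimmed-down pending-signal record.
struct PendingSignal {
  uint64_t kernel_object;
  uint64_t correlation_id;
};

class PendingSignals {
 public:
  void Add(uint32_t writer_id, PendingSignal sig) {
    std::lock_guard<std::mutex> lock(mutex_);
    // operator[] default-constructs the vector on first use, which is
    // equivalent to the find()/emplace()/at() sequence in the diff.
    pending_[writer_id].push_back(sig);
  }

  std::size_t Count(uint32_t writer_id) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = pending_.find(writer_id);
    return it == pending_.end() ? 0 : it->second.size();
  }

 private:
  std::mutex mutex_;
  std::unordered_map<uint32_t, std::vector<PendingSignal>> pending_;
};
```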
+12 -9
@@ -36,7 +36,7 @@
#include "src/core/counters/metrics/eval_metrics.h"
typedef void (*rocprofiler_add_profiler_record_t)(rocprofiler_record_profiler_t&& record,
rocprofiler_session_id_t session_id);
rocprofiler_session_id_t session_id);
typedef rocprofiler_timestamp_t (*rocprofiler_get_timestamp_t)();
@@ -68,12 +68,13 @@ class Profiler {
~Profiler();
void AddPendingSignals(uint32_t writer_id, uint64_t kernel_object,
const hsa_signal_t& original_completion_signal, const hsa_signal_t& new_completion_signal, rocprofiler_session_id_t session_id,
rocprofiler_buffer_id_t buffer_id,
rocprofiler::profiling_context_t* context, uint64_t session_data_count,
hsa_ven_amd_aqlprofile_profile_t* profile,
rocprofiler_kernel_properties_t kernel_properties, uint32_t thread_id,
uint64_t queue_index, uint64_t correlation_id);
const hsa_signal_t& original_completion_signal,
const hsa_signal_t& new_completion_signal,
rocprofiler_session_id_t session_id, rocprofiler_buffer_id_t buffer_id,
rocprofiler::profiling_context_t* context, uint64_t session_data_count,
hsa_ven_amd_aqlprofile_profile_t* profile,
rocprofiler_kernel_properties_t kernel_properties, uint32_t thread_id,
uint64_t queue_index, uint64_t correlation_id);
const std::vector<pending_signal_t*>& GetPendingSignals(uint32_t writer_id);
bool CheckPendingSignalsIsEmpty();
@@ -83,8 +84,10 @@ class Profiler {
std::string& GetCounterName(rocprofiler_counter_id_t handler);
bool FindCounter(rocprofiler_counter_id_t counter_id);
size_t GetCounterInfoSize(rocprofiler_counter_info_kind_t kind, rocprofiler_counter_id_t counter_id);
const char* GetCounterInfo(rocprofiler_counter_info_kind_t kind, rocprofiler_counter_id_t counter_id);
size_t GetCounterInfoSize(rocprofiler_counter_info_kind_t kind,
rocprofiler_counter_id_t counter_id);
const char* GetCounterInfo(rocprofiler_counter_info_kind_t kind,
rocprofiler_counter_id_t counter_id);
void StartReplayPass(rocprofiler_session_id_t session_id);
void EndReplayPass();
+3 -3
@@ -67,8 +67,8 @@ class Session {
// Filter
rocprofiler_filter_id_t CreateFilter(rocprofiler_filter_kind_t filter_kind,
rocprofiler_filter_data_t filter_data, uint64_t data_count,
rocprofiler_filter_property_t property);
rocprofiler_filter_data_t filter_data, uint64_t data_count,
rocprofiler_filter_property_t property);
bool FindFilter(rocprofiler_filter_id_t filter_id);
void DestroyFilter(rocprofiler_filter_id_t filter_id);
Filter* GetFilter(rocprofiler_filter_id_t filter_id);
@@ -83,7 +83,7 @@ class Session {
// Buffer
rocprofiler_buffer_id_t CreateBuffer(rocprofiler_buffer_callback_t buffer_callback,
size_t buffer_size);
size_t buffer_size);
bool FindBuffer(rocprofiler_buffer_id_t buffer_id);
void DestroyBuffer(rocprofiler_buffer_id_t buffer_id);
Memory::GenericBuffer* GetBuffer(rocprofiler_buffer_id_t buffer_id);
+13 -24
@@ -112,8 +112,7 @@ const char* roctracer_op_string(uint32_t domain, uint32_t op) {
case ACTIVITY_DOMAIN_EXT_API:
return "EXT_API";
default:
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
"Invalid domain ID");
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
}
}
@@ -178,8 +177,7 @@ constexpr uint32_t get_op_begin(activity_domain_t domain) {
case ACTIVITY_DOMAIN_EXT_API:
return 0;
default:
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
"Invalid domain ID");
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
}
}
@@ -200,8 +198,7 @@ constexpr uint32_t get_op_end(activity_domain_t domain) {
case ACTIVITY_DOMAIN_EXT_API:
return get_op_begin(ACTIVITY_DOMAIN_EXT_API);
default:
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
"Invalid domain ID");
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
}
}
@@ -476,11 +473,10 @@ int TracerCallback(activity_domain_t domain, uint32_t operation_id, void* data)
rocprofiler::GetROCProfilerSingleton()
->GetSession((*pool)->session_id)
->GetBuffer((*pool)->buffer_id)
->AddRecord(
rocprofiler_record, record->kernel_name, kernel_name_size,
[](auto& rocprofiler_record, const void* data) {
rocprofiler_record.name = static_cast<const char*>(data);
});
->AddRecord(rocprofiler_record, record->kernel_name, kernel_name_size,
[](auto& rocprofiler_record, const void* data) {
rocprofiler_record.name = static_cast<const char*>(data);
});
} else {
rocprofiler::GetROCProfilerSingleton()
->GetSession((*pool)->session_id)
@@ -584,8 +580,7 @@ static void roctracer_enable_op_callback(activity_domain_t domain, uint32_t oper
user_data);
break;
default:
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
"Invalid domain ID");
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
}
}
@@ -623,8 +618,7 @@ void roctracer_disable_op_callback(activity_domain_t domain, uint32_t operation_
ROCTX_registration_group.Unregister(roctx_api_callback_table, operation_id);
break;
default:
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
"Invalid domain ID");
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
}
}
@@ -667,8 +661,7 @@ void roctracer_enable_op_activity(activity_domain_t domain, uint32_t op,
case ACTIVITY_DOMAIN_ROCTX:
break;
default:
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
"Invalid domain ID");
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
}
}
@@ -710,8 +703,7 @@ void roctracer_disable_activity(activity_domain_t domain, uint32_t op) {
case ACTIVITY_DOMAIN_ROCTX:
break;
default:
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
"Invalid domain ID");
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
}
}
@@ -774,8 +766,7 @@ void roctracer_set_properties(activity_domain_t domain, void* properties) {
break;
}
default:
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
"Invalid domain ID");
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
}
}
@@ -791,9 +782,7 @@ static std::string getKernelNameMultiKernelMultiDevice(hipLaunchParams* launchPa
return name_str.str();
}
template <typename... Ts> struct Overloaded : Ts... {
using Ts::operator()...;
};
template <typename... Ts> struct Overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;
std::optional<std::string> GetHipKernelName(uint32_t cid, hip_api_data_t* data) {
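The `Overloaded` helper reformatted above is the usual C++17 idiom for passing several lambdas to `std::visit`. Below is a sketch that applies it to the variant type returned by `Filter::GetProperty` in this diff. `property_t` and `Describe` are hypothetical names used only for illustration.

```cpp
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

template <typename... Ts> struct Overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;

// Same alternatives as the value returned by Filter::GetProperty.
using property_t =
    std::variant<std::vector<std::string>, uint32_t*, std::vector<uint64_t>>;

// Hypothetical helper: one lambda per alternative, all returning std::string.
std::string Describe(const property_t& prop) {
  return std::visit(
      Overloaded{
          [](const std::vector<std::string>& v) { return "names:" + std::to_string(v.size()); },
          [](uint32_t* p) { return std::string(p ? "range" : "null-range"); },
          [](const std::vector<uint64_t>& v) { return "ids:" + std::to_string(v.size()); }},
      prop);
}
```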
+12 -6
@@ -27,13 +27,19 @@ void SimpleProxyQueue::HsaIntercept(HsaApiTable* table) {
table->core_->hsa_signal_store_relaxed_fn = rocprofiler::SimpleProxyQueue::SignalStore;
table->core_->hsa_signal_store_screlease_fn = rocprofiler::SimpleProxyQueue::SignalStore;
table->core_->hsa_queue_load_write_index_relaxed_fn = rocprofiler::SimpleProxyQueue::GetQueueIndex;
table->core_->hsa_queue_store_write_index_relaxed_fn = rocprofiler::SimpleProxyQueue::SetQueueIndex;
table->core_->hsa_queue_load_read_index_relaxed_fn = rocprofiler::SimpleProxyQueue::GetSubmitIndex;
table->core_->hsa_queue_load_write_index_relaxed_fn =
rocprofiler::SimpleProxyQueue::GetQueueIndex;
table->core_->hsa_queue_store_write_index_relaxed_fn =
rocprofiler::SimpleProxyQueue::SetQueueIndex;
table->core_->hsa_queue_load_read_index_relaxed_fn =
rocprofiler::SimpleProxyQueue::GetSubmitIndex;
table->core_->hsa_queue_load_write_index_scacquire_fn = rocprofiler::SimpleProxyQueue::GetQueueIndex;
table->core_->hsa_queue_store_write_index_screlease_fn = rocprofiler::SimpleProxyQueue::SetQueueIndex;
table->core_->hsa_queue_load_read_index_scacquire_fn = rocprofiler::SimpleProxyQueue::GetSubmitIndex;
table->core_->hsa_queue_load_write_index_scacquire_fn =
rocprofiler::SimpleProxyQueue::GetQueueIndex;
table->core_->hsa_queue_store_write_index_screlease_fn =
rocprofiler::SimpleProxyQueue::SetQueueIndex;
table->core_->hsa_queue_load_read_index_scacquire_fn =
rocprofiler::SimpleProxyQueue::GetSubmitIndex;
}
SimpleProxyQueue::queue_map_t* SimpleProxyQueue::queue_map_ = NULL;
+16 -16
@@ -33,23 +33,23 @@ THE SOFTWARE.
#include "util/hsa_rsrc_factory.h"
#ifndef ROCP_PROXY_LOCK
# define ROCP_PROXY_LOCK 1
#define ROCP_PROXY_LOCK 1
#endif
namespace rocprofiler {
extern decltype(hsa_queue_create)* hsa_queue_create_fn;
extern decltype(hsa_queue_destroy)* hsa_queue_destroy_fn;
extern decltype(::hsa_queue_create)* hsa_queue_create_fn;
extern decltype(::hsa_queue_destroy)* hsa_queue_destroy_fn;
extern decltype(hsa_signal_store_relaxed)* hsa_signal_store_relaxed_fn;
extern decltype(hsa_signal_store_relaxed)* hsa_signal_store_screlease_fn;
extern decltype(::hsa_signal_store_relaxed)* hsa_signal_store_relaxed_fn;
extern decltype(::hsa_signal_store_relaxed)* hsa_signal_store_screlease_fn;
extern decltype(hsa_queue_load_write_index_relaxed)* hsa_queue_load_write_index_relaxed_fn;
extern decltype(hsa_queue_store_write_index_relaxed)* hsa_queue_store_write_index_relaxed_fn;
extern decltype(hsa_queue_load_read_index_relaxed)* hsa_queue_load_read_index_relaxed_fn;
extern decltype(::hsa_queue_load_write_index_relaxed)* hsa_queue_load_write_index_relaxed_fn;
extern decltype(::hsa_queue_store_write_index_relaxed)* hsa_queue_store_write_index_relaxed_fn;
extern decltype(::hsa_queue_load_read_index_relaxed)* hsa_queue_load_read_index_relaxed_fn;
extern decltype(hsa_queue_load_write_index_scacquire)* hsa_queue_load_write_index_scacquire_fn;
extern decltype(hsa_queue_store_write_index_screlease)* hsa_queue_store_write_index_screlease_fn;
extern decltype(hsa_queue_load_read_index_scacquire)* hsa_queue_load_read_index_scacquire_fn;
extern decltype(::hsa_queue_load_write_index_scacquire)* hsa_queue_load_write_index_scacquire_fn;
extern decltype(::hsa_queue_store_write_index_screlease)* hsa_queue_store_write_index_screlease_fn;
extern decltype(::hsa_queue_load_read_index_scacquire)* hsa_queue_load_read_index_scacquire_fn;
typedef decltype(hsa_signal_t::handle) signal_handle_t;
@@ -128,7 +128,8 @@ class SimpleProxyQueue : public ProxyQueue {
const uint64_t que_idx = hsa_queue_load_write_index_relaxed_fn(queue_);
// Waiting until there is free space in the queue
while (que_idx >= (hsa_queue_load_read_index_relaxed_fn(queue_) + size_));
while (que_idx >= (hsa_queue_load_read_index_relaxed_fn(queue_) + size_))
;
// Increment the write index
hsa_queue_store_write_index_relaxed_fn(queue_, que_idx + 1);
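The proxy queue reserves a packet slot in three steps. It reads the write index, spins while `que_idx >= read_index + size_` (that is, while the queue is full), and then advances the write index. The sketch below applies the same index arithmetic to a single-producer/single-consumer ring built on `std::atomic`. `Ring`, `Push`, and `Pop` are hypothetical names. The power-of-two size mirrors the proxy queue's `queue_mask_` member, but it is an assumption here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical single-producer/single-consumer ring using the proxy queue's
// index scheme: indices only grow, and a slot is index & (size - 1).
class Ring {
 public:
  explicit Ring(uint64_t size) : slots_(size), size_(size) {
    assert(size != 0 && (size & (size - 1)) == 0 && "size must be a power of two");
  }

  // Producer side. The separate load and store on write_ are only safe
  // with a single producer.
  void Push(uint64_t value) {
    const uint64_t idx = write_.load(std::memory_order_relaxed);
    // Wait until there is free space: at most size_ entries may be in flight.
    while (idx >= read_.load(std::memory_order_acquire) + size_) {
    }
    slots_[idx & (size_ - 1)] = value;
    write_.store(idx + 1, std::memory_order_release);  // publish the slot
  }

  // Consumer side. Returns false when the ring is empty.
  bool Pop(uint64_t& out) {
    const uint64_t idx = read_.load(std::memory_order_relaxed);
    if (idx >= write_.load(std::memory_order_acquire)) return false;
    out = slots_[idx & (size_ - 1)];
    read_.store(idx + 1, std::memory_order_release);  // free the slot
    return true;
  }

 private:
  std::vector<uint64_t> slots_;
  const uint64_t size_;
  std::atomic<uint64_t> write_{0};
  std::atomic<uint64_t> read_{0};
};
```

With multiple producers, the separate load and store would not be enough. The reservation would need an atomic read-modify-write instead, such as `fetch_add` or a compare-exchange loop.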
@@ -163,8 +164,7 @@ class SimpleProxyQueue : public ProxyQueue {
queue_mask_(0),
submit_index_(0),
on_submit_cb_(NULL),
on_submit_cb_data_(NULL)
{
on_submit_cb_data_(NULL) {
printf("ROCProfiler: SimpleProxyQueue is enabled\n");
fflush(stdout);
}
@@ -203,8 +203,8 @@ class SimpleProxyQueue : public ProxyQueue {
if (queue_map_ == NULL) queue_map_ = new queue_map_t;
(*queue_map_)[queue_->doorbell_signal.handle] = this;
}
else abort();
} else
abort();
}
}
if (status != HSA_STATUS_SUCCESS) abort();
+40 -28
@@ -40,7 +40,7 @@ THE SOFTWARE.
namespace rocprofiler {
class Tracker {
public:
public:
typedef std::mutex mutex_t;
typedef util::HsaRsrcFactory::timestamp_t timestamp_t;
typedef rocprofiler_dispatch_record_t record_t;
@@ -89,7 +89,7 @@ class Tracker {
}
// Add tracker entry
entry_t* Alloc(const hsa_agent_t& agent, const hsa_signal_t& orig, bool proxy=true) {
entry_t* Alloc(const hsa_agent_t& agent, const hsa_signal_t& orig, bool proxy = true) {
hsa_status_t status = HSA_STATUS_ERROR;
// Creating a new tracker entry
@@ -108,10 +108,12 @@ class Tracker {
// Creating a proxy signal
if (proxy) {
entry->is_proxy = true;
const hsa_signal_value_t signal_value = (orig.handle) ? hsa_api_.hsa_signal_load_relaxed(orig) : 1;
const hsa_signal_value_t signal_value =
(orig.handle) ? hsa_api_.hsa_signal_load_relaxed(orig) : 1;
status = hsa_api_.hsa_signal_create(signal_value, 0, NULL, &(entry->signal));
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_signal_create");
status = hsa_api_.hsa_amd_signal_async_handler(entry->signal, HSA_SIGNAL_CONDITION_LT, signal_value, Handler, entry);
status = hsa_api_.hsa_amd_signal_async_handler(entry->signal, HSA_SIGNAL_CONDITION_LT,
signal_value, Handler, entry);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_signal_async_handler");
}
@@ -128,7 +130,8 @@ class Tracker {
hsa_signal_t& dispatch_signal = group->GetDispatchSignal();
hsa_signal_t& handler_signal = group->GetBarrierSignal();
entry->signal = dispatch_signal;
hsa_status_t status = hsa_api_.hsa_amd_signal_async_handler(handler_signal, HSA_SIGNAL_CONDITION_LT, 1, Handler, entry);
hsa_status_t status = hsa_api_.hsa_amd_signal_async_handler(
handler_signal, HSA_SIGNAL_CONDITION_LT, 1, Handler, entry);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_signal_async_handler");
}
@@ -150,7 +153,8 @@ class Tracker {
// Debug trace
if (trace_on_) {
auto outstanding = outstanding_.fetch_add(1);
fprintf(stdout, "Tracker::Enable: entry %p, record %p, outst %lu\n", entry, entry->record, outstanding);
fprintf(stdout, "Tracker::Enable: entry %p, record %p, outst %lu\n", entry, entry->record,
outstanding);
fflush(stdout);
}
}
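The tracker's debug trace prints the result of `outstanding_.fetch_add(1)`, and the later `Tracker::Complete` trace does the same with `fetch_sub(1)`. Both calls return the value held *before* the update. So the logged `outst` count excludes the current entry on enable and includes it on complete. A minimal sketch with a hypothetical counter type:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical counter mirroring Tracker's outstanding_ bookkeeping.
// fetch_add/fetch_sub return the value *before* the update.
struct OutstandingCounter {
  std::atomic<uint64_t> count{0};
  uint64_t OnEnable() { return count.fetch_add(1); }    // previous value
  uint64_t OnComplete() { return count.fetch_sub(1); }  // previous value
};
```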
@@ -173,12 +177,14 @@ class Tracker {
group->GetRecord()->dispatch = util::HsaRsrcFactory::Instance().TimestampNs();
// Creating a proxy signal
const hsa_signal_value_t signal_value = (orig_signal.handle) ?
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_load_relaxed(orig_signal) : 1;
const hsa_signal_value_t signal_value = (orig_signal.handle)
? util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_load_relaxed(orig_signal)
: 1;
hsa_signal_t& dispatch_signal = group->GetDispatchSignal();
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_store_screlease(dispatch_signal, signal_value);
hsa_status_t status =
util::HsaRsrcFactory::Instance().HsaApi()->hsa_amd_signal_async_handler(dispatch_signal, HSA_SIGNAL_CONDITION_LT, signal_value, Handler_opt, group);
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_store_screlease(dispatch_signal,
signal_value);
hsa_status_t status = util::HsaRsrcFactory::Instance().HsaApi()->hsa_amd_signal_async_handler(
dispatch_signal, HSA_SIGNAL_CONDITION_LT, signal_value, Handler_opt, group);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_signal_async_handler");
}
@@ -190,7 +196,8 @@ class Tracker {
record_t* record = group->GetRecord();
hsa_amd_profiling_dispatch_time_t dispatch_time{};
hsa_status_t status =
util::HsaRsrcFactory::Instance().HsaApi()->hsa_amd_profiling_get_dispatch_time(context->GetAgent(), dispatch_signal, &dispatch_time);
util::HsaRsrcFactory::Instance().HsaApi()->hsa_amd_profiling_get_dispatch_time(
context->GetAgent(), dispatch_signal, &dispatch_time);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_profiling_get_dispatch_time");
record->begin = util::HsaRsrcFactory::Instance().SysclockToNs(dispatch_time.start);
record->end = util::HsaRsrcFactory::Instance().SysclockToNs(dispatch_time.end);
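`SysclockToNs` is not defined in this diff. The sketch below assumes it scales system-clock ticks by a known timestamp frequency and shows an overflow-safe version of that conversion. `TicksToNs` and its `freq_hz` parameter are hypothetical. In HSA, the frequency would typically be queried via `HSA_SYSTEM_INFO_TIMESTAMP_FREQUENCY`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tick-to-nanosecond conversion; freq_hz is passed in here.
// Splitting ticks into whole seconds plus a remainder avoids overflowing
// ticks * 1e9 in 64 bits for large tick counts.
constexpr uint64_t TicksToNs(uint64_t ticks, uint64_t freq_hz) {
  constexpr uint64_t kNsPerSec = 1000000000ull;
  return (ticks / freq_hz) * kNsPerSec + (ticks % freq_hz) * kNsPerSec / freq_hz;
}
```

Computing `ticks * 1000000000` directly overflows 64 bits once `ticks` exceeds about 1.8e10. At 100 MHz, that is roughly three minutes of uptime. The remainder term stays in range as long as the frequency is below about 18 GHz.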
@@ -203,22 +210,23 @@ class Tracker {
amd_signal_t* prof_signal_ptr = reinterpret_cast<amd_signal_t*>(dispatch_signal.handle);
orig_signal_ptr->start_ts = prof_signal_ptr->start_ts;
orig_signal_ptr->end_ts = prof_signal_ptr->end_ts;
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_store_screlease(orig_signal, signal_value);
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_store_screlease(orig_signal,
signal_value);
}
return Context::Handler(signal_value, arg);
}
private:
Tracker() :
outstanding_(0),
hsa_rsrc_(&(util::HsaRsrcFactory::Instance())),
hsa_api_(*(hsa_rsrc_->HsaApi()))
{}
private:
Tracker()
: outstanding_(0),
hsa_rsrc_(&(util::HsaRsrcFactory::Instance())),
hsa_api_(*(hsa_rsrc_->HsaApi())) {}
~Tracker() {
if (trace_on_) {
fprintf(stdout, "Tracker::DESTR: sig list %d, outst %lu\n", (int)(sig_list_.size()), outstanding_.load());
fprintf(stdout, "Tracker::DESTR: sig list %d, outst %lu\n", (int)(sig_list_.size()),
outstanding_.load());
fflush(stdout);
}
@@ -226,8 +234,8 @@ class Tracker {
auto end = sig_list_.end();
while (it != end) {
auto cur = it++;
// The wait should be optional, as there may be inter-kernel dependencies and it is possible to wait
// for kernels that will never be launched because the application finished for some reason.
// The wait should be optional, as there may be inter-kernel dependencies and it is possible to
// wait for kernels that will never be launched because the application finished for some reason.
#if 0
// FIXME: currently the signal value for tracking signals is taken from the original application signal
hsa_rsrc_->SignalWait((*cur)->signal, 1);
@@ -246,20 +254,24 @@ class Tracker {
// Debug trace
if (trace_on_) {
auto outstanding = outstanding_.fetch_sub(1);
fprintf(stdout, "Tracker::Complete: entry %p, record %p, outst %lu\n", entry, entry->record, outstanding);
fprintf(stdout, "Tracker::Complete: entry %p, record %p, outst %lu\n", entry, entry->record,
outstanding);
fflush(stdout);
}
// Query begin/end and complete timestamps
if (entry->is_memcopy) {
hsa_amd_profiling_async_copy_time_t async_copy_time{};
hsa_status_t status = hsa_api_.hsa_amd_profiling_get_async_copy_time(entry->signal, &async_copy_time);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_profiling_get_async_copy_time");
hsa_status_t status =
hsa_api_.hsa_amd_profiling_get_async_copy_time(entry->signal, &async_copy_time);
if (status != HSA_STATUS_SUCCESS)
EXC_RAISING(status, "hsa_amd_profiling_get_async_copy_time");
record->begin = hsa_rsrc_->SysclockToNs(async_copy_time.start);
record->end = hsa_rsrc_->SysclockToNs(async_copy_time.end);
} else {
hsa_amd_profiling_dispatch_time_t dispatch_time{};
hsa_status_t status = hsa_api_.hsa_amd_profiling_get_dispatch_time(entry->agent, entry->signal, &dispatch_time);
hsa_status_t status =
hsa_api_.hsa_amd_profiling_get_dispatch_time(entry->agent, entry->signal, &dispatch_time);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_profiling_get_dispatch_time");
record->begin = hsa_rsrc_->SysclockToNs(dispatch_time.start);
record->end = hsa_rsrc_->SysclockToNs(dispatch_time.end);
@@ -349,6 +361,6 @@ class Tracker {
static const bool trace_on_ = false;
};
} // namespace rocprofiler
} // namespace rocprofiler
#endif // SRC_CORE_TRACKER_H_
#endif // SRC_CORE_TRACKER_H_
+4 -3
@@ -36,11 +36,12 @@ typedef hsa_ext_amd_aql_pm4_packet_t packet_t;
typedef uint32_t packet_word_t;
typedef uint64_t timestamp_t;
inline std::ostream& operator<< (std::ostream& out, const event_t& event) {
out << "[block_name(" << event.block_name << "). block_index(" << event.block_index << "). counter_id(" << event.counter_id << ")]";
inline std::ostream& operator<<(std::ostream& out, const event_t& event) {
out << "[block_name(" << event.block_name << "). block_index(" << event.block_index
<< "). counter_id(" << event.counter_id << ")]";
return out;
}
inline std::ostream& operator<< (std::ostream& out, const parameter_t& parameter) {
inline std::ostream& operator<<(std::ostream& out, const parameter_t& parameter) {
out << "[parameter_name(" << parameter.parameter_name << "). value(" << parameter.value << ")]";
return out;
}
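The reformatted `operator<<` overloads follow the usual stream-insertion shape. Each is a free function that takes the stream by non-const reference and returns it, so insertions can be chained. The self-contained sketch below uses a hypothetical `Event` struct in place of `event_t`, whose definition is not shown.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for event_t; its fields are inferred from the
// printing code only.
struct Event {
  const char* block_name;
  uint32_t block_index;
  uint32_t counter_id;
};

// Same output shape as the overload above. Returning the stream enables chaining.
inline std::ostream& operator<<(std::ostream& out, const Event& e) {
  out << "[block_name(" << e.block_name << "). block_index(" << e.block_index
      << "). counter_id(" << e.counter_id << ")]";
  return out;
}
```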
+25 -27
@@ -35,15 +35,12 @@
namespace rocprofiler::pc_sampler {
PCSampler::PCSampler(
rocprofiler_buffer_id_t buffer_id,
rocprofiler_filter_id_t filter_id,
rocprofiler_session_id_t session_id)
: buffer_id_(buffer_id)
, filter_id_(filter_id)
, session_id_(session_id)
, pci_system_initialized_(pci_system_init() == 0)
{}
PCSampler::PCSampler(rocprofiler_buffer_id_t buffer_id, rocprofiler_filter_id_t filter_id,
rocprofiler_session_id_t session_id)
: buffer_id_(buffer_id),
filter_id_(filter_id),
session_id_(session_id),
pci_system_initialized_(pci_system_init() == 0) {}
PCSampler::~PCSampler() {
if (pci_system_initialized_) {
@@ -53,7 +50,9 @@ PCSampler::~PCSampler() {
}
void PCSampler::Start() {
if (sampler_thread_.joinable()) { return; }
if (sampler_thread_.joinable()) {
return;
}
devices_.clear();
@@ -61,15 +60,15 @@ void PCSampler::Start() {
agents_t agents;
rocprofiler::hsa_support::GetCoreApiTable().hsa_iterate_agents_fn(
[](hsa_agent_t agent, void *arg){
auto &agents = *reinterpret_cast<agents_t *>(arg);
agents.emplace_back(agent);
return HSA_STATUS_SUCCESS;
},
&agents);
[](hsa_agent_t agent, void* arg) {
auto& agents = *reinterpret_cast<agents_t*>(arg);
agents.emplace_back(agent);
return HSA_STATUS_SUCCESS;
},
&agents);
for (const auto &agent : agents) {
const auto &ai = rocprofiler::hsa_support::GetAgentInfo(agent.handle);
for (const auto& agent : agents) {
const auto& ai = rocprofiler::hsa_support::GetAgentInfo(agent.handle);
if (ai.getType() != HSA_DEVICE_TYPE_GPU) {
continue;
}
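`PCSampler::Start` collects agents by passing a captureless lambda to `hsa_iterate_agents_fn`. The output vector travels through the `void* arg` parameter. The sketch below shows the same pattern, with a hypothetical `iterate_items` in place of the HSA iterator so it compiles without the runtime.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

using status_t = int;
constexpr status_t kSuccess = 0;

// Hypothetical C-style iterator shaped like hsa_iterate_agents: it calls
// callback(item, data) per item and stops early on a non-success status.
status_t iterate_items(status_t (*callback)(int item, void* data), void* data) {
  for (int item : {10, 20, 30}) {
    if (status_t s = callback(item, data); s != kSuccess) return s;
  }
  return kSuccess;
}

// A captureless lambda converts to a plain function pointer, so it can be
// passed as a C callback; the caller's state travels through void*.
std::vector<int> collect_items() {
  std::vector<int> items;
  iterate_items(
      [](int item, void* data) -> status_t {
        static_cast<std::vector<int>*>(data)->push_back(item);
        return kSuccess;
      },
      &items);
  return items;
}
```

`static_cast` is sufficient to recover a typed pointer from `void*`. The `reinterpret_cast` in the original also works, but it is stronger than needed.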
@@ -81,31 +80,30 @@ void PCSampler::Start() {
}
void PCSampler::Stop() {
if (!sampler_thread_.joinable()) { return; }
if (!sampler_thread_.joinable()) {
return;
}
keep_running_ = false;
sampler_thread_.join();
}
void PCSampler::AddRecord(rocprofiler_record_pc_sample_t &record) {
void PCSampler::AddRecord(rocprofiler_record_pc_sample_t& record) {
const auto tool = rocprofiler::GetROCProfilerSingleton();
const auto session = tool->GetSession(session_id_);
const auto buffer = session->GetBuffer(buffer_id_);
std::lock_guard<std::mutex> lk(session->GetSessionLock());
record.header = {
ROCPROFILER_PC_SAMPLING_RECORD,
{ tool->GetUniqueRecordId() }
};
record.header = {ROCPROFILER_PC_SAMPLING_RECORD, {tool->GetUniqueRecordId()}};
buffer->AddRecord(record);
}
void PCSampler::SamplerLoop() {
while (keep_running_) {
auto next_tick = std::chrono::steady_clock::now() + std::chrono::milliseconds(10);
for (auto &agent : devices_) {
auto &device = agent.second;
for (auto& agent : devices_) {
auto& device = agent.second;
if (device.fd_.mmio2.get() >= 0) {
gfxip::read_pc_samples_v9_ioctl(device, this);
} else {
@@ -116,4 +114,4 @@ void PCSampler::SamplerLoop() {
}
}
} // namespace rocprofiler::pc_sampler
} // namespace rocprofiler::pc_sampler
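`Start`, `Stop`, and `SamplerLoop` form a common pattern: a guarded `std::thread`, an atomic run flag, and a fixed-period loop driven by `std::chrono::steady_clock`. The sleep step of `SamplerLoop` is not visible in this hunk. The sketch below assumes it waits with `std::this_thread::sleep_until(next_tick)`. `PeriodicSampler` and its tick counter are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical sampler with the same Start/Stop/loop shape as PCSampler.
class PeriodicSampler {
 public:
  ~PeriodicSampler() { Stop(); }

  void Start() {
    if (thread_.joinable()) return;  // already running
    keep_running_ = true;
    thread_ = std::thread([this] { Loop(); });
  }

  void Stop() {
    if (!thread_.joinable()) return;  // not running
    keep_running_ = false;
    thread_.join();
  }

  unsigned long ticks() const { return ticks_.load(); }

 private:
  void Loop() {
    while (keep_running_) {
      // The deadline is taken before the work, so the period does not drift
      // by however long each iteration takes.
      auto next_tick = std::chrono::steady_clock::now() + std::chrono::milliseconds(10);
      ticks_.fetch_add(1);  // placeholder for reading PC samples
      std::this_thread::sleep_until(next_tick);
    }
  }

  std::atomic<bool> keep_running_{false};
  std::atomic<unsigned long> ticks_{0};
  std::thread thread_;
};
```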
Diff not shown due to its large size.
Diff not shown due to its large size.
Diff not shown due to its large size.
Diff not shown due to its large size.
+202 -202
@@ -23,244 +23,244 @@
// addressBlock: gc_grbmdec
// base address: 0x8000
#define mmGRBM_CNTL 0x0000
#define mmGRBM_CNTL_BASE_IDX 0
#define mmGRBM_SKEW_CNTL 0x0001
#define mmGRBM_SKEW_CNTL_BASE_IDX 0
#define mmGRBM_STATUS2 0x0002
#define mmGRBM_STATUS2_BASE_IDX 0
#define mmGRBM_PWR_CNTL 0x0003
#define mmGRBM_PWR_CNTL_BASE_IDX 0
#define mmGRBM_STATUS 0x0004
#define mmGRBM_STATUS_BASE_IDX 0
#define mmGRBM_STATUS_SE0 0x0005
#define mmGRBM_STATUS_SE0_BASE_IDX 0
#define mmGRBM_STATUS_SE1 0x0006
#define mmGRBM_STATUS_SE1_BASE_IDX 0
#define mmGRBM_SOFT_RESET 0x0008
#define mmGRBM_SOFT_RESET_BASE_IDX 0
#define mmGRBM_GFX_CLKEN_CNTL 0x000c
#define mmGRBM_GFX_CLKEN_CNTL_BASE_IDX 0
#define mmGRBM_WAIT_IDLE_CLOCKS 0x000d
#define mmGRBM_WAIT_IDLE_CLOCKS_BASE_IDX 0
#define mmGRBM_STATUS_SE2 0x000e
#define mmGRBM_STATUS_SE2_BASE_IDX 0
#define mmGRBM_STATUS_SE3 0x000f
#define mmGRBM_STATUS_SE3_BASE_IDX 0
#define mmGRBM_READ_ERROR 0x0016
#define mmGRBM_READ_ERROR_BASE_IDX 0
#define mmGRBM_READ_ERROR2 0x0017
#define mmGRBM_READ_ERROR2_BASE_IDX 0
#define mmGRBM_INT_CNTL 0x0018
#define mmGRBM_INT_CNTL_BASE_IDX 0
#define mmGRBM_TRAP_OP 0x0019
#define mmGRBM_TRAP_OP_BASE_IDX 0
#define mmGRBM_TRAP_ADDR 0x001a
#define mmGRBM_TRAP_ADDR_BASE_IDX 0
#define mmGRBM_TRAP_ADDR_MSK 0x001b
#define mmGRBM_TRAP_ADDR_MSK_BASE_IDX 0
#define mmGRBM_TRAP_WD 0x001c
#define mmGRBM_TRAP_WD_BASE_IDX 0
#define mmGRBM_TRAP_WD_MSK 0x001d
#define mmGRBM_TRAP_WD_MSK_BASE_IDX 0
#define mmGRBM_DSM_BYPASS 0x001e
#define mmGRBM_DSM_BYPASS_BASE_IDX 0
#define mmGRBM_WRITE_ERROR 0x001f
#define mmGRBM_WRITE_ERROR_BASE_IDX 0
#define mmGRBM_IOV_ERROR 0x0020
#define mmGRBM_IOV_ERROR_BASE_IDX 0
#define mmGRBM_CHIP_REVISION 0x0021
#define mmGRBM_CHIP_REVISION_BASE_IDX 0
#define mmGRBM_GFX_CNTL 0x0022
#define mmGRBM_GFX_CNTL_BASE_IDX 0
#define mmGRBM_RSMU_CFG 0x0023
#define mmGRBM_RSMU_CFG_BASE_IDX 0
#define mmGRBM_IH_CREDIT 0x0024
#define mmGRBM_IH_CREDIT_BASE_IDX 0
#define mmGRBM_PWR_CNTL2 0x0025
#define mmGRBM_PWR_CNTL2_BASE_IDX 0
#define mmGRBM_UTCL2_INVAL_RANGE_START 0x0026
#define mmGRBM_UTCL2_INVAL_RANGE_START_BASE_IDX 0
#define mmGRBM_UTCL2_INVAL_RANGE_END 0x0027
#define mmGRBM_UTCL2_INVAL_RANGE_END_BASE_IDX 0
#define mmGRBM_RSMU_READ_ERROR 0x0028
#define mmGRBM_RSMU_READ_ERROR_BASE_IDX 0
#define mmGRBM_CHICKEN_BITS 0x0029
#define mmGRBM_CHICKEN_BITS_BASE_IDX 0
#define mmGRBM_FENCE_RANGE0 0x002a
#define mmGRBM_FENCE_RANGE0_BASE_IDX 0
#define mmGRBM_FENCE_RANGE1 0x002b
#define mmGRBM_FENCE_RANGE1_BASE_IDX 0
#define mmGRBM_NOWHERE 0x003f
#define mmGRBM_NOWHERE_BASE_IDX 0
#define mmGRBM_SCRATCH_REG0 0x0040
#define mmGRBM_SCRATCH_REG0_BASE_IDX 0
#define mmGRBM_SCRATCH_REG1 0x0041
#define mmGRBM_SCRATCH_REG1_BASE_IDX 0
#define mmGRBM_SCRATCH_REG2 0x0042
#define mmGRBM_SCRATCH_REG2_BASE_IDX 0
#define mmGRBM_SCRATCH_REG3 0x0043
#define mmGRBM_SCRATCH_REG3_BASE_IDX 0
#define mmGRBM_SCRATCH_REG4 0x0044
#define mmGRBM_SCRATCH_REG4_BASE_IDX 0
#define mmGRBM_SCRATCH_REG5 0x0045
#define mmGRBM_SCRATCH_REG5_BASE_IDX 0
#define mmGRBM_SCRATCH_REG6 0x0046
#define mmGRBM_SCRATCH_REG6_BASE_IDX 0
#define mmGRBM_SCRATCH_REG7 0x0047
#define mmGRBM_SCRATCH_REG7_BASE_IDX 0
#define mmGRBM_CNTL 0x0000
#define mmGRBM_CNTL_BASE_IDX 0
#define mmGRBM_SKEW_CNTL 0x0001
#define mmGRBM_SKEW_CNTL_BASE_IDX 0
#define mmGRBM_STATUS2 0x0002
#define mmGRBM_STATUS2_BASE_IDX 0
#define mmGRBM_PWR_CNTL 0x0003
#define mmGRBM_PWR_CNTL_BASE_IDX 0
#define mmGRBM_STATUS 0x0004
#define mmGRBM_STATUS_BASE_IDX 0
#define mmGRBM_STATUS_SE0 0x0005
#define mmGRBM_STATUS_SE0_BASE_IDX 0
#define mmGRBM_STATUS_SE1 0x0006
#define mmGRBM_STATUS_SE1_BASE_IDX 0
#define mmGRBM_SOFT_RESET 0x0008
#define mmGRBM_SOFT_RESET_BASE_IDX 0
#define mmGRBM_GFX_CLKEN_CNTL 0x000c
#define mmGRBM_GFX_CLKEN_CNTL_BASE_IDX 0
#define mmGRBM_WAIT_IDLE_CLOCKS 0x000d
#define mmGRBM_WAIT_IDLE_CLOCKS_BASE_IDX 0
#define mmGRBM_STATUS_SE2 0x000e
#define mmGRBM_STATUS_SE2_BASE_IDX 0
#define mmGRBM_STATUS_SE3 0x000f
#define mmGRBM_STATUS_SE3_BASE_IDX 0
#define mmGRBM_READ_ERROR 0x0016
#define mmGRBM_READ_ERROR_BASE_IDX 0
#define mmGRBM_READ_ERROR2 0x0017
#define mmGRBM_READ_ERROR2_BASE_IDX 0
#define mmGRBM_INT_CNTL 0x0018
#define mmGRBM_INT_CNTL_BASE_IDX 0
#define mmGRBM_TRAP_OP 0x0019
#define mmGRBM_TRAP_OP_BASE_IDX 0
#define mmGRBM_TRAP_ADDR 0x001a
#define mmGRBM_TRAP_ADDR_BASE_IDX 0
#define mmGRBM_TRAP_ADDR_MSK 0x001b
#define mmGRBM_TRAP_ADDR_MSK_BASE_IDX 0
#define mmGRBM_TRAP_WD 0x001c
#define mmGRBM_TRAP_WD_BASE_IDX 0
#define mmGRBM_TRAP_WD_MSK 0x001d
#define mmGRBM_TRAP_WD_MSK_BASE_IDX 0
#define mmGRBM_DSM_BYPASS 0x001e
#define mmGRBM_DSM_BYPASS_BASE_IDX 0
#define mmGRBM_WRITE_ERROR 0x001f
#define mmGRBM_WRITE_ERROR_BASE_IDX 0
#define mmGRBM_IOV_ERROR 0x0020
#define mmGRBM_IOV_ERROR_BASE_IDX 0
#define mmGRBM_CHIP_REVISION 0x0021
#define mmGRBM_CHIP_REVISION_BASE_IDX 0
#define mmGRBM_GFX_CNTL 0x0022
#define mmGRBM_GFX_CNTL_BASE_IDX 0
#define mmGRBM_RSMU_CFG 0x0023
#define mmGRBM_RSMU_CFG_BASE_IDX 0
#define mmGRBM_IH_CREDIT 0x0024
#define mmGRBM_IH_CREDIT_BASE_IDX 0
#define mmGRBM_PWR_CNTL2 0x0025
#define mmGRBM_PWR_CNTL2_BASE_IDX 0
#define mmGRBM_UTCL2_INVAL_RANGE_START 0x0026
#define mmGRBM_UTCL2_INVAL_RANGE_START_BASE_IDX 0
#define mmGRBM_UTCL2_INVAL_RANGE_END 0x0027
#define mmGRBM_UTCL2_INVAL_RANGE_END_BASE_IDX 0
#define mmGRBM_RSMU_READ_ERROR 0x0028
#define mmGRBM_RSMU_READ_ERROR_BASE_IDX 0
#define mmGRBM_CHICKEN_BITS 0x0029
#define mmGRBM_CHICKEN_BITS_BASE_IDX 0
#define mmGRBM_FENCE_RANGE0 0x002a
#define mmGRBM_FENCE_RANGE0_BASE_IDX 0
#define mmGRBM_FENCE_RANGE1 0x002b
#define mmGRBM_FENCE_RANGE1_BASE_IDX 0
#define mmGRBM_NOWHERE 0x003f
#define mmGRBM_NOWHERE_BASE_IDX 0
#define mmGRBM_SCRATCH_REG0 0x0040
#define mmGRBM_SCRATCH_REG0_BASE_IDX 0
#define mmGRBM_SCRATCH_REG1 0x0041
#define mmGRBM_SCRATCH_REG1_BASE_IDX 0
#define mmGRBM_SCRATCH_REG2 0x0042
#define mmGRBM_SCRATCH_REG2_BASE_IDX 0
#define mmGRBM_SCRATCH_REG3 0x0043
#define mmGRBM_SCRATCH_REG3_BASE_IDX 0
#define mmGRBM_SCRATCH_REG4 0x0044
#define mmGRBM_SCRATCH_REG4_BASE_IDX 0
#define mmGRBM_SCRATCH_REG5 0x0045
#define mmGRBM_SCRATCH_REG5_BASE_IDX 0
#define mmGRBM_SCRATCH_REG6 0x0046
#define mmGRBM_SCRATCH_REG6_BASE_IDX 0
#define mmGRBM_SCRATCH_REG7 0x0047
#define mmGRBM_SCRATCH_REG7_BASE_IDX 0
// addressBlock: gc_cppdec2
// base address: 0xc600
#define mmCPF_EDC_TAG_CNT 0x1189
#define mmCPF_EDC_TAG_CNT_BASE_IDX 0
#define mmCPF_EDC_ROQ_CNT 0x118a
#define mmCPF_EDC_ROQ_CNT_BASE_IDX 0
#define mmCPG_EDC_TAG_CNT 0x118b
#define mmCPG_EDC_TAG_CNT_BASE_IDX 0
#define mmCPG_EDC_DMA_CNT 0x118d
#define mmCPG_EDC_DMA_CNT_BASE_IDX 0
#define mmCPC_EDC_SCRATCH_CNT 0x118e
#define mmCPC_EDC_SCRATCH_CNT_BASE_IDX 0
#define mmCPC_EDC_UCODE_CNT 0x118f
#define mmCPC_EDC_UCODE_CNT_BASE_IDX 0
#define mmDC_EDC_STATE_CNT 0x1191
#define mmDC_EDC_STATE_CNT_BASE_IDX 0
#define mmDC_EDC_CSINVOC_CNT 0x1192
#define mmDC_EDC_CSINVOC_CNT_BASE_IDX 0
#define mmDC_EDC_RESTORE_CNT 0x1193
#define mmDC_EDC_RESTORE_CNT_BASE_IDX 0
#define mmCPF_EDC_TAG_CNT 0x1189
#define mmCPF_EDC_TAG_CNT_BASE_IDX 0
#define mmCPF_EDC_ROQ_CNT 0x118a
#define mmCPF_EDC_ROQ_CNT_BASE_IDX 0
#define mmCPG_EDC_TAG_CNT 0x118b
#define mmCPG_EDC_TAG_CNT_BASE_IDX 0
#define mmCPG_EDC_DMA_CNT 0x118d
#define mmCPG_EDC_DMA_CNT_BASE_IDX 0
#define mmCPC_EDC_SCRATCH_CNT 0x118e
#define mmCPC_EDC_SCRATCH_CNT_BASE_IDX 0
#define mmCPC_EDC_UCODE_CNT 0x118f
#define mmCPC_EDC_UCODE_CNT_BASE_IDX 0
#define mmDC_EDC_STATE_CNT 0x1191
#define mmDC_EDC_STATE_CNT_BASE_IDX 0
#define mmDC_EDC_CSINVOC_CNT 0x1192
#define mmDC_EDC_CSINVOC_CNT_BASE_IDX 0
#define mmDC_EDC_RESTORE_CNT 0x1193
#define mmDC_EDC_RESTORE_CNT_BASE_IDX 0
// addressBlock: gc_gdsdec
// base address: 0x9700
#define mmGDS_EDC_CNT 0x05c5
#define mmGDS_EDC_CNT_BASE_IDX 0
#define mmGDS_EDC_GRBM_CNT 0x05c6
#define mmGDS_EDC_GRBM_CNT_BASE_IDX 0
#define mmGDS_EDC_OA_DED 0x05c7
#define mmGDS_EDC_OA_DED_BASE_IDX 0
#define mmGDS_EDC_OA_PHY_CNT 0x05cb
#define mmGDS_EDC_OA_PHY_CNT_BASE_IDX 0
#define mmGDS_EDC_OA_PIPE_CNT 0x05cc
#define mmGDS_EDC_OA_PIPE_CNT_BASE_IDX 0
#define mmGDS_EDC_CNT 0x05c5
#define mmGDS_EDC_CNT_BASE_IDX 0
#define mmGDS_EDC_GRBM_CNT 0x05c6
#define mmGDS_EDC_GRBM_CNT_BASE_IDX 0
#define mmGDS_EDC_OA_DED 0x05c7
#define mmGDS_EDC_OA_DED_BASE_IDX 0
#define mmGDS_EDC_OA_PHY_CNT 0x05cb
#define mmGDS_EDC_OA_PHY_CNT_BASE_IDX 0
#define mmGDS_EDC_OA_PIPE_CNT 0x05cc
#define mmGDS_EDC_OA_PIPE_CNT_BASE_IDX 0
// addressBlock: gc_shsdec
// base address: 0x9000
#define mmSPI_EDC_CNT 0x0445
#define mmSPI_EDC_CNT_BASE_IDX 0
#define mmSPI_EDC_CNT 0x0445
#define mmSPI_EDC_CNT_BASE_IDX 0
// addressBlock: gc_sqdec
// base address: 0x8c00
#define mmSQC_EDC_CNT2 0x032c
#define mmSQC_EDC_CNT2_BASE_IDX 0
#define mmSQC_EDC_CNT3 0x032d
#define mmSQC_EDC_CNT3_BASE_IDX 0
#define mmSQC_EDC_PARITY_CNT3 0x032e
#define mmSQC_EDC_PARITY_CNT3_BASE_IDX 0
#define mmSQC_EDC_CNT 0x03a2
#define mmSQC_EDC_CNT_BASE_IDX 0
#define mmSQ_EDC_SEC_CNT 0x03a3
#define mmSQ_EDC_SEC_CNT_BASE_IDX 0
#define mmSQ_EDC_DED_CNT 0x03a4
#define mmSQ_EDC_DED_CNT_BASE_IDX 0
#define mmSQ_EDC_INFO 0x03a5
#define mmSQ_EDC_INFO_BASE_IDX 0
#define mmSQ_EDC_CNT 0x03a6
#define mmSQ_EDC_CNT_BASE_IDX 0
#define mmSQC_EDC_CNT2 0x032c
#define mmSQC_EDC_CNT2_BASE_IDX 0
#define mmSQC_EDC_CNT3 0x032d
#define mmSQC_EDC_CNT3_BASE_IDX 0
#define mmSQC_EDC_PARITY_CNT3 0x032e
#define mmSQC_EDC_PARITY_CNT3_BASE_IDX 0
#define mmSQC_EDC_CNT 0x03a2
#define mmSQC_EDC_CNT_BASE_IDX 0
#define mmSQ_EDC_SEC_CNT 0x03a3
#define mmSQ_EDC_SEC_CNT_BASE_IDX 0
#define mmSQ_EDC_DED_CNT 0x03a4
#define mmSQ_EDC_DED_CNT_BASE_IDX 0
#define mmSQ_EDC_INFO 0x03a5
#define mmSQ_EDC_INFO_BASE_IDX 0
#define mmSQ_EDC_CNT 0x03a6
#define mmSQ_EDC_CNT_BASE_IDX 0
// addressBlock: gc_tpdec
// base address: 0x9400
#define mmTA_EDC_CNT 0x0586
#define mmTA_EDC_CNT_BASE_IDX 0
#define mmTA_EDC_CNT 0x0586
#define mmTA_EDC_CNT_BASE_IDX 0
// addressBlock: gc_tcdec
// base address: 0xac00
#define mmTCP_EDC_CNT 0x0b17
#define mmTCP_EDC_CNT_BASE_IDX 0
#define mmTCP_EDC_CNT_NEW 0x0b18
#define mmTCP_EDC_CNT_NEW_BASE_IDX 0
#define mmTCP_ATC_EDC_GATCL1_CNT 0x12b1
#define mmTCP_ATC_EDC_GATCL1_CNT_BASE_IDX 0
#define mmTCI_EDC_CNT 0x0b60
#define mmTCI_EDC_CNT_BASE_IDX 0
#define mmTCC_EDC_CNT 0x0b82
#define mmTCC_EDC_CNT_BASE_IDX 0
#define mmTCC_EDC_CNT2 0x0b83
#define mmTCC_EDC_CNT2_BASE_IDX 0
#define mmTCA_EDC_CNT 0x0bc5
#define mmTCA_EDC_CNT_BASE_IDX 0
#define mmTCP_EDC_CNT 0x0b17
#define mmTCP_EDC_CNT_BASE_IDX 0
#define mmTCP_EDC_CNT_NEW 0x0b18
#define mmTCP_EDC_CNT_NEW_BASE_IDX 0
#define mmTCP_ATC_EDC_GATCL1_CNT 0x12b1
#define mmTCP_ATC_EDC_GATCL1_CNT_BASE_IDX 0
#define mmTCI_EDC_CNT 0x0b60
#define mmTCI_EDC_CNT_BASE_IDX 0
#define mmTCC_EDC_CNT 0x0b82
#define mmTCC_EDC_CNT_BASE_IDX 0
#define mmTCC_EDC_CNT2 0x0b83
#define mmTCC_EDC_CNT2_BASE_IDX 0
#define mmTCA_EDC_CNT 0x0bc5
#define mmTCA_EDC_CNT_BASE_IDX 0
// addressBlock: gc_tpdec
// base address: 0x9400
#define mmTD_EDC_CNT 0x052e
#define mmTD_EDC_CNT_BASE_IDX 0
#define mmTA_EDC_CNT 0x0586
#define mmTA_EDC_CNT_BASE_IDX 0
#define mmTD_EDC_CNT 0x052e
#define mmTD_EDC_CNT_BASE_IDX 0
#define mmTA_EDC_CNT 0x0586
#define mmTA_EDC_CNT_BASE_IDX 0
// addressBlock: gc_ea_gceadec2
// base address: 0x9c00
#define mmGCEA_EDC_CNT 0x0706
#define mmGCEA_EDC_CNT_BASE_IDX 0
#define mmGCEA_EDC_CNT2 0x0707
#define mmGCEA_EDC_CNT2_BASE_IDX 0
#define mmGCEA_EDC_CNT3 0x071b
#define mmGCEA_EDC_CNT3_BASE_IDX 0
#define mmGCEA_ERR_STATUS 0x0712
#define mmGCEA_ERR_STATUS_BASE_IDX 0
#define mmGCEA_EDC_CNT 0x0706
#define mmGCEA_EDC_CNT_BASE_IDX 0
#define mmGCEA_EDC_CNT2 0x0707
#define mmGCEA_EDC_CNT2_BASE_IDX 0
#define mmGCEA_EDC_CNT3 0x071b
#define mmGCEA_EDC_CNT3_BASE_IDX 0
#define mmGCEA_ERR_STATUS 0x0712
#define mmGCEA_ERR_STATUS_BASE_IDX 0
// addressBlock: gc_gfxudec
// base address: 0x30000
#define mmSCRATCH_REG0 0x2040
#define mmSCRATCH_REG0_BASE_IDX 1
#define mmSCRATCH_REG1 0x2041
#define mmSCRATCH_REG1_BASE_IDX 1
#define mmSCRATCH_REG2 0x2042
#define mmSCRATCH_REG2_BASE_IDX 1
#define mmSCRATCH_REG3 0x2043
#define mmSCRATCH_REG3_BASE_IDX 1
#define mmSCRATCH_REG4 0x2044
#define mmSCRATCH_REG4_BASE_IDX 1
#define mmSCRATCH_REG5 0x2045
#define mmSCRATCH_REG5_BASE_IDX 1
#define mmSCRATCH_REG6 0x2046
#define mmSCRATCH_REG6_BASE_IDX 1
#define mmSCRATCH_REG7 0x2047
#define mmSCRATCH_REG7_BASE_IDX 1
#define mmGRBM_GFX_INDEX 0x2200
#define mmGRBM_GFX_INDEX_BASE_IDX 1
#define mmSCRATCH_REG0 0x2040
#define mmSCRATCH_REG0_BASE_IDX 1
#define mmSCRATCH_REG1 0x2041
#define mmSCRATCH_REG1_BASE_IDX 1
#define mmSCRATCH_REG2 0x2042
#define mmSCRATCH_REG2_BASE_IDX 1
#define mmSCRATCH_REG3 0x2043
#define mmSCRATCH_REG3_BASE_IDX 1
#define mmSCRATCH_REG4 0x2044
#define mmSCRATCH_REG4_BASE_IDX 1
#define mmSCRATCH_REG5 0x2045
#define mmSCRATCH_REG5_BASE_IDX 1
#define mmSCRATCH_REG6 0x2046
#define mmSCRATCH_REG6_BASE_IDX 1
#define mmSCRATCH_REG7 0x2047
#define mmSCRATCH_REG7_BASE_IDX 1
#define mmGRBM_GFX_INDEX 0x2200
#define mmGRBM_GFX_INDEX_BASE_IDX 1
// addressBlock: gc_utcl2_atcl2dec
// base address: 0xa000
#define mmATC_L2_CACHE_4K_DSM_INDEX 0x080e
#define mmATC_L2_CACHE_4K_DSM_INDEX_BASE_IDX 0
#define mmATC_L2_CACHE_2M_DSM_INDEX 0x080f
#define mmATC_L2_CACHE_2M_DSM_INDEX_BASE_IDX 0
#define mmATC_L2_CACHE_4K_DSM_CNTL 0x0810
#define mmATC_L2_CACHE_4K_DSM_CNTL_BASE_IDX 0
#define mmATC_L2_CACHE_2M_DSM_CNTL 0x0811
#define mmATC_L2_CACHE_2M_DSM_CNTL_BASE_IDX 0
#define mmATC_L2_CACHE_4K_DSM_INDEX 0x080e
#define mmATC_L2_CACHE_4K_DSM_INDEX_BASE_IDX 0
#define mmATC_L2_CACHE_2M_DSM_INDEX 0x080f
#define mmATC_L2_CACHE_2M_DSM_INDEX_BASE_IDX 0
#define mmATC_L2_CACHE_4K_DSM_CNTL 0x0810
#define mmATC_L2_CACHE_4K_DSM_CNTL_BASE_IDX 0
#define mmATC_L2_CACHE_2M_DSM_CNTL 0x0811
#define mmATC_L2_CACHE_2M_DSM_CNTL_BASE_IDX 0
// addressBlock: gc_utcl2_vml2pfdec
// base address: 0xa100
#define mmVML2_MEM_ECC_INDEX 0x0860
#define mmVML2_MEM_ECC_INDEX_BASE_IDX 0
#define mmVML2_WALKER_MEM_ECC_INDEX 0x0861
#define mmVML2_WALKER_MEM_ECC_INDEX_BASE_IDX 0
#define mmUTCL2_MEM_ECC_INDEX 0x0862
#define mmUTCL2_MEM_ECC_INDEX_BASE_IDX 0
#define mmVML2_MEM_ECC_INDEX 0x0860
#define mmVML2_MEM_ECC_INDEX_BASE_IDX 0
#define mmVML2_WALKER_MEM_ECC_INDEX 0x0861
#define mmVML2_WALKER_MEM_ECC_INDEX_BASE_IDX 0
#define mmUTCL2_MEM_ECC_INDEX 0x0862
#define mmUTCL2_MEM_ECC_INDEX_BASE_IDX 0
#define mmVML2_MEM_ECC_CNTL 0x0863
#define mmVML2_MEM_ECC_CNTL_BASE_IDX 0
#define mmVML2_WALKER_MEM_ECC_CNTL 0x0864
#define mmVML2_WALKER_MEM_ECC_CNTL_BASE_IDX 0
#define mmUTCL2_MEM_ECC_CNTL 0x0865
#define mmUTCL2_MEM_ECC_CNTL_BASE_IDX 0
#define mmVML2_MEM_ECC_CNTL 0x0863
#define mmVML2_MEM_ECC_CNTL_BASE_IDX 0
#define mmVML2_WALKER_MEM_ECC_CNTL 0x0864
#define mmVML2_WALKER_MEM_ECC_CNTL_BASE_IDX 0
#define mmUTCL2_MEM_ECC_CNTL 0x0865
#define mmUTCL2_MEM_ECC_CNTL_BASE_IDX 0
// addressBlock: gc_rlcpdec
// base address: 0x3b000
#define mmRLC_EDC_CNT 0x4d40
#define mmRLC_EDC_CNT_BASE_IDX 1
#define mmRLC_EDC_CNT2 0x4d41
#define mmRLC_EDC_CNT2_BASE_IDX 1
#define mmRLC_EDC_CNT 0x4d40
#define mmRLC_EDC_CNT_BASE_IDX 1
#define mmRLC_EDC_CNT2 0x4d41
#define mmRLC_EDC_CNT2_BASE_IDX 1
#endif
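These register headers follow the amdgpu convention: each `mmREG` value is a dword offset, and `mmREG_BASE_IDX` selects the IP segment base it is relative to. `debugfs_ioctl_read_register` and `debugfs_ioctl_write_register` seek to `addr * 4`, which turns a dword address into a byte offset. The segment bases in the sketch below are assumptions. They are consistent with the "base address" comments in this header, but they are not defined here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical GC segment bases in dwords, indexed by *_BASE_IDX (assumed values).
constexpr std::array<uint32_t, 2> kGcSegmentBase = {0x2000, 0xA000};

// Absolute dword address of a register from its offset and base index.
constexpr uint64_t RegDwordAddr(uint32_t reg, uint32_t base_idx) {
  return static_cast<uint64_t>(kGcSegmentBase[base_idx]) + reg;
}

// Byte offset for seeking in the MMIO debugfs file: one register = 4 bytes.
constexpr uint64_t RegByteOffset(uint32_t reg, uint32_t base_idx) {
  return RegDwordAddr(reg, base_idx) * 4;
}
```

Under these assumptions, `mmCPF_EDC_TAG_CNT` (0x1189, base index 0) lands at byte 0xC624. That falls inside the `gc_cppdec2` block, whose comment gives base address 0xc600.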
Diff not shown due to its large size.
Diff not shown due to its large size.
Diff not shown due to its large size.
+22 -36
@@ -41,18 +41,17 @@ namespace rocprofiler::pc_sampler::gfxip {
namespace {
static int find_pci_instance(const std::string &pci_string) {
static int find_pci_instance(const std::string& pci_string) {
rocprofiler::handle_t<DIR*, util::dir_closer> dir(opendir(DEBUG_DRI_PATH));
if (dir.get() == nullptr) {
char *errstr = strerror(errno);
char* errstr = strerror(errno);
warning("Can't open debugfs dri directory: %s\n", errstr);
goto fail;
}
struct dirent *dent;
struct dirent* dent;
while ((dent = readdir(dir.get())) != nullptr) {
if (strcmp(dent->d_name, ".") == 0 || strcmp(dent->d_name, "..") == 0)
continue;
if (strcmp(dent->d_name, ".") == 0 || strcmp(dent->d_name, "..") == 0) continue;
std::string name(DEBUG_DRI_PATH);
name += dent->d_name;
@@ -66,8 +65,7 @@ static int find_pci_instance(const std::string &pci_string) {
ifs >> device;
}
if (device.empty()) continue;
if (auto p = device.find(DEV_PFX); p != device.npos)
device.erase(p, strlen(DEV_PFX));
if (auto p = device.find(DEV_PFX); p != device.npos) device.erase(p, strlen(DEV_PFX));
if (pci_string == device) return std::stoi(dent->d_name);
}
@@ -75,7 +73,7 @@ fail:
return -1;
}
} // namespace
} // namespace
uint32_t pasid() {
static std::optional<uint32_t> pasid;
@@ -89,9 +87,7 @@ uint32_t pasid() {
return *pasid;
}
int debugfs_ioctl_set_state(
const device_t& dev,
const struct amdgpu_debugfs_regs2_iocdata &ioc) {
int debugfs_ioctl_set_state(const device_t& dev, const struct amdgpu_debugfs_regs2_iocdata& ioc) {
int ret = ioctl(dev.fd_.mmio2.get(), AMDGPU_DEBUGFS_REGS2_IOC_SET_STATE, &ioc);
if (ret < 0) {
fatal("Couldn't set register ioctl state\n");
@@ -99,11 +95,9 @@ int debugfs_ioctl_set_state(
return ret;
}
int debugfs_ioctl_write_register(
const device_t &dev,
const struct amdgpu_debugfs_regs2_iocdata &ioc,
const uint64_t addr,
const uint32_t value) {
int debugfs_ioctl_write_register(const device_t& dev,
const struct amdgpu_debugfs_regs2_iocdata& ioc,
const uint64_t addr, const uint32_t value) {
debugfs_ioctl_set_state(dev, ioc);
if (lseek(dev.fd_.mmio2.get(), addr * 4, SEEK_SET) < 0) {
fatal("Cannot seek to MMIO address for write\n");
@@ -115,10 +109,9 @@ int debugfs_ioctl_write_register(
return r;
}
uint32_t debugfs_ioctl_read_register(
const device_t& dev,
const struct amdgpu_debugfs_regs2_iocdata &ioc,
const uint64_t addr) {
uint32_t debugfs_ioctl_read_register(const device_t& dev,
const struct amdgpu_debugfs_regs2_iocdata& ioc,
const uint64_t addr) {
// Select the SE, SH, and CU.
debugfs_ioctl_set_state(dev, ioc);
@@ -134,20 +127,17 @@ uint32_t debugfs_ioctl_read_register(
return value;
}
device_t::device_t(const bool pci_inited, const Agent::AgentInfo &info)
: agent_info_(info)
, pci_memory_(nullptr)
{
device_t::device_t(const bool pci_inited, const Agent::AgentInfo& info)
: agent_info_(info), pci_memory_(nullptr) {
const auto pci_domain = agent_info_.getPCIDomain();
const auto pci_location_id = agent_info_.getPCILocationID();
std::string name([pci_domain, pci_location_id]() {
std::ostringstream out;
out.fill('0');
out << std::hex << std::setw(4) << pci_domain << ':'
<< std::hex << std::setw(2) << (pci_location_id >> 8) << ':'
<< std::hex << std::setw(2) << (pci_location_id & 0xFF) << '.'
<< 0;
out << std::hex << std::setw(4) << pci_domain << ':' << std::hex << std::setw(2)
<< (pci_location_id >> 8) << ':' << std::hex << std::setw(2) << (pci_location_id & 0xFF)
<< '.' << 0;
return out.str();
}());
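The constructor builds the device name as a `DDDD:BB:DD.F` string from the PCI domain and location ID. It takes the high byte as the bus, the low byte as the device, and a fixed function number of 0. A standalone sketch of that formatting, where `FormatPciAddress` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Mirrors the constructor's lambda: high byte = bus, low byte = device,
// function fixed at 0, all fields zero-padded hexadecimal.
std::string FormatPciAddress(uint32_t domain, uint32_t location_id) {
  std::ostringstream out;
  out.fill('0');
  out << std::hex << std::setw(4) << domain << ':' << std::setw(2) << (location_id >> 8) << ':'
      << std::setw(2) << (location_id & 0xFF) << ".0";
  return out.str();
}
```

`std::hex` is sticky, so one manipulator covers every field, and the original's repeated `std::hex` is harmless. `std::setw`, by contrast, applies only to the next insertion and must be repeated.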
@@ -162,8 +152,7 @@ device_t::device_t(const bool pci_inited, const Agent::AgentInfo &info)
if (fd_.mmio2.get() < 0) {
warning("Couldn't open amdgpu_regs2 debugfs file\n");
if (!pci_inited) {
constexpr char msg[] =
"PCI system uninitialized; no PC sampling methods available\n";
constexpr char msg[] = "PCI system uninitialized; no PC sampling methods available\n";
fatal(msg);
}
} else {
@@ -173,8 +162,7 @@ device_t::device_t(const bool pci_inited, const Agent::AgentInfo &info)
pci_device_ =
pci_device_find_by_slot(pci_domain, pci_location_id >> 8, pci_location_id & 0xFF, 0);
if (!pci_device_ || pci_device_probe(pci_device_))
fatal("failed to probe the GPU device\n");
if (!pci_device_ || pci_device_probe(pci_device_)) fatal("failed to probe the GPU device\n");
// Look for a region between 256KB and 4096KB, 32-bit, non IO, and non prefetchable.
for (size_t region = 0; region < sizeof(pci_device::regions) / sizeof(pci_device::regions[0]);
@@ -199,11 +187,9 @@ device_specific_init:
}
device_t::~device_t() {
if (pci_memory_ &&
pci_device_unmap_range(pci_device_, pci_memory_, pci_memory_size_))
{
if (pci_memory_ && pci_device_unmap_range(pci_device_, pci_memory_, pci_memory_size_)) {
warning("failed to unmap the pci memory\n");
}
}
} // namespace rocprofiler::pc_sampler::gfxip
} // namespace rocprofiler::pc_sampler::gfxip
+23 -16
@@ -52,14 +52,18 @@ namespace gfxip {
namespace util {
struct dir_closer {
void operator()(DIR *dir) { if (dir != nullptr) closedir(dir); }
void operator()(DIR* dir) {
if (dir != nullptr) closedir(dir);
}
};
struct fd_closer {
void operator()(int fd) { if (fd >= 0) close(fd); }
void operator()(int fd) {
if (fd >= 0) close(fd);
}
};
} // namespace rocprofiler::pc_sampler::gfxip::util
} // namespace util
struct amdgpu_debugfs_regs2_iocdata {
__u32 use_srbm, use_grbm, pg_lock;
@@ -71,11 +75,10 @@ struct amdgpu_debugfs_regs2_iocdata {
} srbm;
};
enum AMDGPU_DEBUGFS_REGS2_CMDS {
AMDGPU_DEBUGFS_REGS2_CMD_SET_STATE = 0
};
enum AMDGPU_DEBUGFS_REGS2_CMDS { AMDGPU_DEBUGFS_REGS2_CMD_SET_STATE = 0 };
#define AMDGPU_DEBUGFS_REGS2_IOC_SET_STATE _IOWR(0x20, AMDGPU_DEBUGFS_REGS2_CMD_SET_STATE, struct amdgpu_debugfs_regs2_iocdata)
#define AMDGPU_DEBUGFS_REGS2_IOC_SET_STATE \
_IOWR(0x20, AMDGPU_DEBUGFS_REGS2_CMD_SET_STATE, struct amdgpu_debugfs_regs2_iocdata)
enum {
GC_HWIP = 1, // Graphics Core IP
@@ -96,14 +99,14 @@ static constexpr int HWIP_MAX_INSTANCE = 11;
(REG_FIELD_MASK(reg, field) & ((field_val) << REG_FIELD_SHIFT(reg, field))))
struct device_t {
device_t(const bool pci_inited, const Agent::AgentInfo &agent_info);
device_t(const bool pci_inited, const Agent::AgentInfo& agent_info);
~device_t();
device_t(const device_t&) = delete;
device_t& operator=(const device_t&) = delete;
device_t(device_t&&) = default;
const Agent::AgentInfo &agent_info_;
const Agent::AgentInfo& agent_info_;
struct pci_device* pci_device_;
size_t pci_memory_size_;
@@ -120,19 +123,23 @@ struct device_t {
uint32_t pasid();
int debugfs_ioctl_set_state(const device_t& dev, const struct amdgpu_debugfs_regs2_iocdata &ioc);
int debugfs_ioctl_write_register(const device_t &dev, const struct amdgpu_debugfs_regs2_iocdata &ioc, const uint64_t addr, const uint32_t value);
uint32_t debugfs_ioctl_read_register(const device_t& dev, const struct amdgpu_debugfs_regs2_iocdata &ioc, const uint64_t addr);
int debugfs_ioctl_set_state(const device_t& dev, const struct amdgpu_debugfs_regs2_iocdata& ioc);
int debugfs_ioctl_write_register(const device_t& dev,
const struct amdgpu_debugfs_regs2_iocdata& ioc,
const uint64_t addr, const uint32_t value);
uint32_t debugfs_ioctl_read_register(const device_t& dev,
const struct amdgpu_debugfs_regs2_iocdata& ioc,
const uint64_t addr);
void vega10_reg_offset_init(device_t& dev);
void vega20_reg_offset_init(device_t& dev);
void arct_reg_offset_init(device_t& dev);
void aldebaran_reg_offset_init(device_t& dev);
void read_pc_samples_v9(const device_t& dev, PCSampler *sampler);
void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler);
void read_pc_samples_v9(const device_t& dev, PCSampler* sampler);
void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler* sampler);
} // namespace rocprofiler::pc_sampler::gfxip
} // namespace gfxip
} // namespace rocprofiler::pc_sampler
} // namespace rocprofiler::pc_sampler
#endif // SRC_PCSAMPLER_GFXIP_GFXIP_H_
+40 -45
@@ -54,12 +54,10 @@ uint32_t read_sq_register(const device_t& dev, uint32_t simd, uint32_t wave_id,
return dev.pci_memory_[REG_OFFSET(GC, 0, mmSQ_IND_DATA)];
}
uint32_t debugfs_ioctl_read_sq_register(
const device_t &dev,
const struct amdgpu_debugfs_regs2_iocdata &ioc,
const uint32_t simd,
const uint32_t wave_id,
const uint32_t register_address) {
uint32_t debugfs_ioctl_read_sq_register(const device_t& dev,
const struct amdgpu_debugfs_regs2_iocdata& ioc,
const uint32_t simd, const uint32_t wave_id,
const uint32_t register_address) {
uint32_t data = REG_SET_FIELD(0, SQ_IND_INDEX, WAVE_ID, wave_id);
data = REG_SET_FIELD(data, SQ_IND_INDEX, SIMD_ID, simd);
data = REG_SET_FIELD(data, SQ_IND_INDEX, INDEX, register_address);
@@ -67,21 +65,15 @@ uint32_t debugfs_ioctl_read_sq_register(
return debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmSQ_IND_DATA));
}
void fill_record(
const device_t &dev,
rocprofiler_record_pc_sample_t *record,
uint32_t se,
uint64_t pc,
hsa_kernel_dispatch_packet_t *pkt) {
void fill_record(const device_t& dev, rocprofiler_record_pc_sample_t* record, uint32_t se,
uint64_t pc, hsa_kernel_dispatch_packet_t* pkt) {
/*
* XXX: Use of the reserved2 field in the HSA dispatch packet to uniquely
* identify kernel dispatches for PC sampling is an internal implementation
* detail which is subject to change. See the comment associated with
* rocprofiler::rocprofiler::kernel_dispatch_counter_.
*/
record->pc_sample.dispatch_id =
rocprofiler_kernel_dispatch_id_t{pkt->reserved2};
record->pc_sample.dispatch_id = rocprofiler_kernel_dispatch_id_t{pkt->reserved2};
/*
* TODO: Fill this with gpu_clock_counter via AMDKFD_IOC_GET_CLOCK_COUNTERS,
@@ -98,12 +90,12 @@ void fill_record(
* Future sampling methods may fill this in automatically from the GPU's
* real-time counter.
*/
//record->pc_sample.cycle = 0;
// record->pc_sample.cycle = 0;
rocprofiler_get_timestamp(&record->pc_sample.timestamp);
record->pc_sample.pc = pc;
record->pc_sample.se = se;
const auto &hdl = dev.agent_info_.getHandle();
const auto& hdl = dev.agent_info_.getHandle();
/*
* XXX FIXME: For consistency, this is the same method as used by
@@ -112,17 +104,16 @@ void fill_record(
* comment in rocprofiler::hsa_support::Initialize about using KFD's gpu_id for
* more information.
*/
record->pc_sample.gpu_id = rocprofiler_agent_id_t{
(uint64_t)rocprofiler::hsa_support::GetAgentInfo(hdl).getIndex()};
record->pc_sample.gpu_id =
rocprofiler_agent_id_t{(uint64_t)rocprofiler::hsa_support::GetAgentInfo(hdl).getIndex()};
}
} // namespace
void read_pc_samples_v9(const device_t& dev, PCSampler *sampler) {
void read_pc_samples_v9(const device_t& dev, PCSampler* sampler) {
assert(sampler);
uint32_t saved_grbm_gfx_index =
dev.pci_memory_[REG_OFFSET(GC, 0, mmGRBM_GFX_INDEX)];
uint32_t saved_grbm_gfx_index = dev.pci_memory_[REG_OFFSET(GC, 0, mmGRBM_GFX_INDEX)];
uint32_t data;
for (uint32_t se = 0; se < dev.agent_info_.getShaderEngineCount(); ++se)
@@ -174,19 +165,16 @@ void read_pc_samples_v9(const device_t& dev, PCSampler *sampler) {
data = REG_SET_FIELD(data, GRBM_GFX_CNTL, VMID, vm_id);
dev.pci_memory_[REG_OFFSET(GC, 0, mmGRBM_GFX_CNTL)] = data;
uint32_t pq_base_lo =
dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE)];
uint32_t pq_base_hi =
dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)] & 0xff;
uint32_t pq_base_lo = dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE)];
uint32_t pq_base_hi = dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)] & 0xff;
uint64_t pq_base = (uint64_t)pq_base_hi << 40 | (uint64_t)pq_base_lo << 8;
uint32_t cp_hqd_pq_control_queue_size =
dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_CONTROL)] & 0x3f;
dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_CONTROL)] & 0x3f;
uint32_t queue_size = 1 << (cp_hqd_pq_control_queue_size + 1);
auto pkt = (hsa_kernel_dispatch_packet_t*)(
pq_base + disp_idx % queue_size *
sizeof(hsa_kernel_dispatch_packet_t)
);
auto pkt = (hsa_kernel_dispatch_packet_t*)(pq_base +
disp_idx % queue_size *
sizeof(hsa_kernel_dispatch_packet_t));
fill_record(dev, &record, se, *pc, pkt);
}
@@ -208,10 +196,10 @@ void read_pc_samples_v9(const device_t& dev, PCSampler *sampler) {
dev.pci_memory_[REG_OFFSET(GC, 0, mmGRBM_GFX_INDEX)] = saved_grbm_gfx_index;
}
void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler) {
void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler* sampler) {
assert(sampler);
struct amdgpu_debugfs_regs2_iocdata ioc{};
struct amdgpu_debugfs_regs2_iocdata ioc {};
ioc.use_grbm = 1;
uint32_t data;
@@ -236,11 +224,13 @@ void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler) {
// Skip this slot if the wave is not valid.
debugfs_ioctl_set_state(dev, ioc);
uint32_t status = debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_STATUS);
uint32_t status =
debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_STATUS);
if (!REG_GET_FIELD(status, SQ_WAVE_STATUS, VALID)) continue;
debugfs_ioctl_set_state(dev, ioc);
uint32_t hw_id = debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_HW_ID);
uint32_t hw_id =
debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_HW_ID);
uint32_t vm_id = REG_GET_FIELD(hw_id, SQ_WAVE_HW_ID, VM_ID);
rocprofiler_record_pc_sample_t record;
@@ -248,12 +238,16 @@ void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler) {
// If the wave's PASID matches the process', read and report the PC
// and dispatch packet for the wave.
std::optional<uint64_t> pc;
if (debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(OSSSYS, 0, mmIH_VMID_0_LUT) + vm_id) == pasid()) {
pc = (uint64_t)debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_PC_HI) << 32 |
if (debugfs_ioctl_read_register(
dev, ioc, REG_OFFSET(OSSSYS, 0, mmIH_VMID_0_LUT) + vm_id) == pasid()) {
pc =
(uint64_t)debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_PC_HI)
<< 32 |
debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_PC_LO);
// The dispatch index into the queue
uint32_t disp_idx = debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_TTMP6);
uint32_t disp_idx =
debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_TTMP6);
// Set up reading CP_HQD_PQ_BASE and CP_HQD_PQ_BASE_HI
uint32_t pipe_id = REG_GET_FIELD(hw_id, SQ_WAVE_HW_ID, PIPE_ID);
@@ -266,18 +260,19 @@ void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler) {
debugfs_ioctl_write_register(dev, ioc, REG_OFFSET(GC, 0, mmGRBM_GFX_CNTL), data);
uint32_t pq_base_lo =
debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE));
debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE));
uint32_t pq_base_hi =
debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)) & 0xff;
debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)) &
0xff;
uint64_t pq_base = (uint64_t)pq_base_hi << 40 | (uint64_t)pq_base_lo << 8;
uint32_t cp_hqd_pq_control_queue_size =
debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_CONTROL)) & 0x3f;
debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_CONTROL)) &
0x3f;
uint32_t queue_size = 1 << (cp_hqd_pq_control_queue_size + 1);
auto pkt = (hsa_kernel_dispatch_packet_t*)(
pq_base + disp_idx % queue_size *
sizeof(hsa_kernel_dispatch_packet_t)
);
auto pkt = (hsa_kernel_dispatch_packet_t*)(pq_base +
disp_idx % queue_size *
sizeof(hsa_kernel_dispatch_packet_t));
fill_record(dev, &record, se, *pc, pkt);
}
+298 -299
@@ -22,306 +22,305 @@
#define _osssys_4_0_OFFSET_HEADER
// addressBlock: osssys_osssysdec
// base address: 0x4280
#define mmIH_VMID_0_LUT 0x0000
#define mmIH_VMID_0_LUT_BASE_IDX 0
#define mmIH_VMID_1_LUT 0x0001
#define mmIH_VMID_1_LUT_BASE_IDX 0
#define mmIH_VMID_2_LUT 0x0002
#define mmIH_VMID_2_LUT_BASE_IDX 0
#define mmIH_VMID_3_LUT 0x0003
#define mmIH_VMID_3_LUT_BASE_IDX 0
#define mmIH_VMID_4_LUT 0x0004
#define mmIH_VMID_4_LUT_BASE_IDX 0
#define mmIH_VMID_5_LUT 0x0005
#define mmIH_VMID_5_LUT_BASE_IDX 0
#define mmIH_VMID_6_LUT 0x0006
#define mmIH_VMID_6_LUT_BASE_IDX 0
#define mmIH_VMID_7_LUT 0x0007
#define mmIH_VMID_7_LUT_BASE_IDX 0
#define mmIH_VMID_8_LUT 0x0008
#define mmIH_VMID_8_LUT_BASE_IDX 0
#define mmIH_VMID_9_LUT 0x0009
#define mmIH_VMID_9_LUT_BASE_IDX 0
#define mmIH_VMID_10_LUT 0x000a
#define mmIH_VMID_10_LUT_BASE_IDX 0
#define mmIH_VMID_11_LUT 0x000b
#define mmIH_VMID_11_LUT_BASE_IDX 0
#define mmIH_VMID_12_LUT 0x000c
#define mmIH_VMID_12_LUT_BASE_IDX 0
#define mmIH_VMID_13_LUT 0x000d
#define mmIH_VMID_13_LUT_BASE_IDX 0
#define mmIH_VMID_14_LUT 0x000e
#define mmIH_VMID_14_LUT_BASE_IDX 0
#define mmIH_VMID_15_LUT 0x000f
#define mmIH_VMID_15_LUT_BASE_IDX 0
#define mmIH_VMID_0_LUT_MM 0x0010
#define mmIH_VMID_0_LUT_MM_BASE_IDX 0
#define mmIH_VMID_1_LUT_MM 0x0011
#define mmIH_VMID_1_LUT_MM_BASE_IDX 0
#define mmIH_VMID_2_LUT_MM 0x0012
#define mmIH_VMID_2_LUT_MM_BASE_IDX 0
#define mmIH_VMID_3_LUT_MM 0x0013
#define mmIH_VMID_3_LUT_MM_BASE_IDX 0
#define mmIH_VMID_4_LUT_MM 0x0014
#define mmIH_VMID_4_LUT_MM_BASE_IDX 0
#define mmIH_VMID_5_LUT_MM 0x0015
#define mmIH_VMID_5_LUT_MM_BASE_IDX 0
#define mmIH_VMID_6_LUT_MM 0x0016
#define mmIH_VMID_6_LUT_MM_BASE_IDX 0
#define mmIH_VMID_7_LUT_MM 0x0017
#define mmIH_VMID_7_LUT_MM_BASE_IDX 0
#define mmIH_VMID_8_LUT_MM 0x0018
#define mmIH_VMID_8_LUT_MM_BASE_IDX 0
#define mmIH_VMID_9_LUT_MM 0x0019
#define mmIH_VMID_9_LUT_MM_BASE_IDX 0
#define mmIH_VMID_10_LUT_MM 0x001a
#define mmIH_VMID_10_LUT_MM_BASE_IDX 0
#define mmIH_VMID_11_LUT_MM 0x001b
#define mmIH_VMID_11_LUT_MM_BASE_IDX 0
#define mmIH_VMID_12_LUT_MM 0x001c
#define mmIH_VMID_12_LUT_MM_BASE_IDX 0
#define mmIH_VMID_13_LUT_MM 0x001d
#define mmIH_VMID_13_LUT_MM_BASE_IDX 0
#define mmIH_VMID_14_LUT_MM 0x001e
#define mmIH_VMID_14_LUT_MM_BASE_IDX 0
#define mmIH_VMID_15_LUT_MM 0x001f
#define mmIH_VMID_15_LUT_MM_BASE_IDX 0
#define mmIH_COOKIE_0 0x0020
#define mmIH_COOKIE_0_BASE_IDX 0
#define mmIH_COOKIE_1 0x0021
#define mmIH_COOKIE_1_BASE_IDX 0
#define mmIH_COOKIE_2 0x0022
#define mmIH_COOKIE_2_BASE_IDX 0
#define mmIH_COOKIE_3 0x0023
#define mmIH_COOKIE_3_BASE_IDX 0
#define mmIH_COOKIE_4 0x0024
#define mmIH_COOKIE_4_BASE_IDX 0
#define mmIH_COOKIE_5 0x0025
#define mmIH_COOKIE_5_BASE_IDX 0
#define mmIH_COOKIE_6 0x0026
#define mmIH_COOKIE_6_BASE_IDX 0
#define mmIH_COOKIE_7 0x0027
#define mmIH_COOKIE_7_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART0 0x003f
#define mmIH_REGISTER_LAST_PART0_BASE_IDX 0
#define mmSEM_REQ_INPUT_0 0x0040
#define mmSEM_REQ_INPUT_0_BASE_IDX 0
#define mmSEM_REQ_INPUT_1 0x0041
#define mmSEM_REQ_INPUT_1_BASE_IDX 0
#define mmSEM_REQ_INPUT_2 0x0042
#define mmSEM_REQ_INPUT_2_BASE_IDX 0
#define mmSEM_REQ_INPUT_3 0x0043
#define mmSEM_REQ_INPUT_3_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART0 0x007f
#define mmSEM_REGISTER_LAST_PART0_BASE_IDX 0
#define mmIH_RB_CNTL 0x0080
#define mmIH_RB_CNTL_BASE_IDX 0
#define mmIH_RB_BASE 0x0081
#define mmIH_RB_BASE_BASE_IDX 0
#define mmIH_RB_BASE_HI 0x0082
#define mmIH_RB_BASE_HI_BASE_IDX 0
#define mmIH_RB_RPTR 0x0083
#define mmIH_RB_RPTR_BASE_IDX 0
#define mmIH_RB_WPTR 0x0084
#define mmIH_RB_WPTR_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_HI 0x0085
#define mmIH_RB_WPTR_ADDR_HI_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_LO 0x0086
#define mmIH_RB_WPTR_ADDR_LO_BASE_IDX 0
#define mmIH_DOORBELL_RPTR 0x0087
#define mmIH_DOORBELL_RPTR_BASE_IDX 0
#define mmIH_RB_CNTL_RING1 0x0088
#define mmIH_RB_CNTL_RING1_BASE_IDX 0
#define mmIH_RB_BASE_RING1 0x0089
#define mmIH_RB_BASE_RING1_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING1 0x008a
#define mmIH_RB_BASE_HI_RING1_BASE_IDX 0
#define mmIH_RB_RPTR_RING1 0x008b
#define mmIH_RB_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_WPTR_RING1 0x008c
#define mmIH_RB_WPTR_RING1_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING1 0x008f
#define mmIH_DOORBELL_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_CNTL_RING2 0x0090
#define mmIH_RB_CNTL_RING2_BASE_IDX 0
#define mmIH_RB_BASE_RING2 0x0091
#define mmIH_RB_BASE_RING2_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING2 0x0092
#define mmIH_RB_BASE_HI_RING2_BASE_IDX 0
#define mmIH_RB_RPTR_RING2 0x0093
#define mmIH_RB_RPTR_RING2_BASE_IDX 0
#define mmIH_RB_WPTR_RING2 0x0094
#define mmIH_RB_WPTR_RING2_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING2 0x0097
#define mmIH_DOORBELL_RPTR_RING2_BASE_IDX 0
#define mmIH_VERSION 0x0098
#define mmIH_VERSION_BASE_IDX 0
#define mmIH_CNTL 0x00c0
#define mmIH_CNTL_BASE_IDX 0
#define mmIH_CNTL2 0x00c1
#define mmIH_CNTL2_BASE_IDX 0
#define mmIH_STATUS 0x00c2
#define mmIH_STATUS_BASE_IDX 0
#define mmIH_PERFMON_CNTL 0x00c3
#define mmIH_PERFMON_CNTL_BASE_IDX 0
#define mmIH_PERFCOUNTER0_RESULT 0x00c4
#define mmIH_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmIH_PERFCOUNTER1_RESULT 0x00c5
#define mmIH_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_31_0 0x00c7
#define mmIH_DSM_MATCH_VALUE_BIT_31_0_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_63_32 0x00c8
#define mmIH_DSM_MATCH_VALUE_BIT_63_32_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_95_64 0x00c9
#define mmIH_DSM_MATCH_VALUE_BIT_95_64_BASE_IDX 0
#define mmIH_DSM_MATCH_FIELD_CONTROL 0x00ca
#define mmIH_DSM_MATCH_FIELD_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_DATA_CONTROL 0x00cb
#define mmIH_DSM_MATCH_DATA_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_FCN_ID 0x00cc
#define mmIH_DSM_MATCH_FCN_ID_BASE_IDX 0
#define mmIH_LIMIT_INT_RATE_CNTL 0x00cd
#define mmIH_LIMIT_INT_RATE_CNTL_BASE_IDX 0
#define mmIH_VF_RB_STATUS 0x00ce
#define mmIH_VF_RB_STATUS_BASE_IDX 0
#define mmIH_VF_RB_STATUS2 0x00cf
#define mmIH_VF_RB_STATUS2_BASE_IDX 0
#define mmIH_VF_RB1_STATUS 0x00d0
#define mmIH_VF_RB1_STATUS_BASE_IDX 0
#define mmIH_VF_RB1_STATUS2 0x00d1
#define mmIH_VF_RB1_STATUS2_BASE_IDX 0
#define mmIH_VF_RB2_STATUS 0x00d2
#define mmIH_VF_RB2_STATUS_BASE_IDX 0
#define mmIH_VF_RB2_STATUS2 0x00d3
#define mmIH_VF_RB2_STATUS2_BASE_IDX 0
#define mmIH_INT_FLOOD_CNTL 0x00d5
#define mmIH_INT_FLOOD_CNTL_BASE_IDX 0
#define mmIH_RB0_INT_FLOOD_STATUS 0x00d6
#define mmIH_RB0_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB1_INT_FLOOD_STATUS 0x00d7
#define mmIH_RB1_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB2_INT_FLOOD_STATUS 0x00d8
#define mmIH_RB2_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_INT_FLOOD_STATUS 0x00d9
#define mmIH_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_STORM_CLIENT_LIST_CNTL 0x00da
#define mmIH_STORM_CLIENT_LIST_CNTL_BASE_IDX 0
#define mmIH_CLK_CTRL 0x00db
#define mmIH_CLK_CTRL_BASE_IDX 0
#define mmIH_INT_FLAGS 0x00dc
#define mmIH_INT_FLAGS_BASE_IDX 0
#define mmIH_LAST_INT_INFO0 0x00dd
#define mmIH_LAST_INT_INFO0_BASE_IDX 0
#define mmIH_LAST_INT_INFO1 0x00de
#define mmIH_LAST_INT_INFO1_BASE_IDX 0
#define mmIH_LAST_INT_INFO2 0x00df
#define mmIH_LAST_INT_INFO2_BASE_IDX 0
#define mmIH_SCRATCH 0x00e0
#define mmIH_SCRATCH_BASE_IDX 0
#define mmIH_CLIENT_CREDIT_ERROR 0x00e1
#define mmIH_CLIENT_CREDIT_ERROR_BASE_IDX 0
#define mmIH_GPU_IOV_VIOLATION_LOG 0x00e2
#define mmIH_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmIH_COOKIE_REC_VIOLATION_LOG 0x00e3
#define mmIH_COOKIE_REC_VIOLATION_LOG_BASE_IDX 0
#define mmIH_CREDIT_STATUS 0x00e4
#define mmIH_CREDIT_STATUS_BASE_IDX 0
#define mmIH_MMHUB_ERROR 0x00e5
#define mmIH_MMHUB_ERROR_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART2 0x00ff
#define mmIH_REGISTER_LAST_PART2_BASE_IDX 0
#define mmSEM_CLK_CTRL 0x0100
#define mmSEM_CLK_CTRL_BASE_IDX 0
#define mmSEM_UTC_CREDIT 0x0101
#define mmSEM_UTC_CREDIT_BASE_IDX 0
#define mmSEM_UTC_CONFIG 0x0102
#define mmSEM_UTC_CONFIG_BASE_IDX 0
#define mmSEM_UTCL2_TRAN_EN_LUT 0x0103
#define mmSEM_UTCL2_TRAN_EN_LUT_BASE_IDX 0
#define mmSEM_MCIF_CONFIG 0x0104
#define mmSEM_MCIF_CONFIG_BASE_IDX 0
#define mmSEM_PERFMON_CNTL 0x0105
#define mmSEM_PERFMON_CNTL_BASE_IDX 0
#define mmSEM_PERFCOUNTER0_RESULT 0x0106
#define mmSEM_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmSEM_PERFCOUNTER1_RESULT 0x0107
#define mmSEM_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmSEM_STATUS 0x0108
#define mmSEM_STATUS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG 0x0109
#define mmSEM_MAILBOX_CLIENTCONFIG_BASE_IDX 0
#define mmSEM_MAILBOX 0x010a
#define mmSEM_MAILBOX_BASE_IDX 0
#define mmSEM_MAILBOX_CONTROL 0x010b
#define mmSEM_MAILBOX_CONTROL_BASE_IDX 0
#define mmSEM_CHICKEN_BITS 0x010c
#define mmSEM_CHICKEN_BITS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA 0x010d
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA_BASE_IDX 0
#define mmSEM_GPU_IOV_VIOLATION_LOG 0x010e
#define mmSEM_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmSEM_OUTSTANDING_THRESHOLD 0x010f
#define mmSEM_OUTSTANDING_THRESHOLD_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART2 0x017f
#define mmSEM_REGISTER_LAST_PART2_BASE_IDX 0
#define mmIH_ACTIVE_FCN_ID 0x0180
#define mmIH_ACTIVE_FCN_ID_BASE_IDX 0
#define mmIH_VIRT_RESET_REQ 0x0181
#define mmIH_VIRT_RESET_REQ_BASE_IDX 0
#define mmIH_CLIENT_CFG 0x0184
#define mmIH_CLIENT_CFG_BASE_IDX 0
#define mmIH_CLIENT_CFG_INDEX 0x0188
#define mmIH_CLIENT_CFG_INDEX_BASE_IDX 0
#define mmIH_CLIENT_CFG_DATA 0x0189
#define mmIH_CLIENT_CFG_DATA_BASE_IDX 0
#define mmIH_CID_REMAP_INDEX 0x018a
#define mmIH_CID_REMAP_INDEX_BASE_IDX 0
#define mmIH_CID_REMAP_DATA 0x018b
#define mmIH_CID_REMAP_DATA_BASE_IDX 0
#define mmIH_CHICKEN 0x018c
#define mmIH_CHICKEN_BASE_IDX 0
#define mmIH_MMHUB_CNTL 0x018d
#define mmIH_MMHUB_CNTL_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART1 0x019f
#define mmIH_REGISTER_LAST_PART1_BASE_IDX 0
#define mmSEM_ACTIVE_FCN_ID 0x01a0
#define mmSEM_ACTIVE_FCN_ID_BASE_IDX 0
#define mmSEM_VIRT_RESET_REQ 0x01a1
#define mmSEM_VIRT_RESET_REQ_BASE_IDX 0
#define mmSEM_RESP_SDMA0 0x01a4
#define mmSEM_RESP_SDMA0_BASE_IDX 0
#define mmSEM_RESP_SDMA1 0x01a5
#define mmSEM_RESP_SDMA1_BASE_IDX 0
#define mmSEM_RESP_UVD 0x01a6
#define mmSEM_RESP_UVD_BASE_IDX 0
#define mmSEM_RESP_VCE_0 0x01a7
#define mmSEM_RESP_VCE_0_BASE_IDX 0
#define mmSEM_RESP_ACP 0x01a8
#define mmSEM_RESP_ACP_BASE_IDX 0
#define mmSEM_RESP_ISP 0x01a9
#define mmSEM_RESP_ISP_BASE_IDX 0
#define mmSEM_RESP_VCE_1 0x01aa
#define mmSEM_RESP_VCE_1_BASE_IDX 0
#define mmSEM_RESP_VP8 0x01ab
#define mmSEM_RESP_VP8_BASE_IDX 0
#define mmSEM_RESP_GC 0x01ac
#define mmSEM_RESP_GC_BASE_IDX 0
#define mmSEM_CID_REMAP_INDEX 0x01b0
#define mmSEM_CID_REMAP_INDEX_BASE_IDX 0
#define mmSEM_CID_REMAP_DATA 0x01b1
#define mmSEM_CID_REMAP_DATA_BASE_IDX 0
#define mmSEM_ATOMIC_OP_LUT 0x01b2
#define mmSEM_ATOMIC_OP_LUT_BASE_IDX 0
#define mmSEM_EDC_CONFIG 0x01b3
#define mmSEM_EDC_CONFIG_BASE_IDX 0
#define mmSEM_CHICKEN_BITS2 0x01b4
#define mmSEM_CHICKEN_BITS2_BASE_IDX 0
#define mmSEM_MMHUB_CNTL 0x01b5
#define mmSEM_MMHUB_CNTL_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART1 0x01bf
#define mmSEM_REGISTER_LAST_PART1_BASE_IDX 0
#define mmIH_VMID_0_LUT 0x0000
#define mmIH_VMID_0_LUT_BASE_IDX 0
#define mmIH_VMID_1_LUT 0x0001
#define mmIH_VMID_1_LUT_BASE_IDX 0
#define mmIH_VMID_2_LUT 0x0002
#define mmIH_VMID_2_LUT_BASE_IDX 0
#define mmIH_VMID_3_LUT 0x0003
#define mmIH_VMID_3_LUT_BASE_IDX 0
#define mmIH_VMID_4_LUT 0x0004
#define mmIH_VMID_4_LUT_BASE_IDX 0
#define mmIH_VMID_5_LUT 0x0005
#define mmIH_VMID_5_LUT_BASE_IDX 0
#define mmIH_VMID_6_LUT 0x0006
#define mmIH_VMID_6_LUT_BASE_IDX 0
#define mmIH_VMID_7_LUT 0x0007
#define mmIH_VMID_7_LUT_BASE_IDX 0
#define mmIH_VMID_8_LUT 0x0008
#define mmIH_VMID_8_LUT_BASE_IDX 0
#define mmIH_VMID_9_LUT 0x0009
#define mmIH_VMID_9_LUT_BASE_IDX 0
#define mmIH_VMID_10_LUT 0x000a
#define mmIH_VMID_10_LUT_BASE_IDX 0
#define mmIH_VMID_11_LUT 0x000b
#define mmIH_VMID_11_LUT_BASE_IDX 0
#define mmIH_VMID_12_LUT 0x000c
#define mmIH_VMID_12_LUT_BASE_IDX 0
#define mmIH_VMID_13_LUT 0x000d
#define mmIH_VMID_13_LUT_BASE_IDX 0
#define mmIH_VMID_14_LUT 0x000e
#define mmIH_VMID_14_LUT_BASE_IDX 0
#define mmIH_VMID_15_LUT 0x000f
#define mmIH_VMID_15_LUT_BASE_IDX 0
#define mmIH_VMID_0_LUT_MM 0x0010
#define mmIH_VMID_0_LUT_MM_BASE_IDX 0
#define mmIH_VMID_1_LUT_MM 0x0011
#define mmIH_VMID_1_LUT_MM_BASE_IDX 0
#define mmIH_VMID_2_LUT_MM 0x0012
#define mmIH_VMID_2_LUT_MM_BASE_IDX 0
#define mmIH_VMID_3_LUT_MM 0x0013
#define mmIH_VMID_3_LUT_MM_BASE_IDX 0
#define mmIH_VMID_4_LUT_MM 0x0014
#define mmIH_VMID_4_LUT_MM_BASE_IDX 0
#define mmIH_VMID_5_LUT_MM 0x0015
#define mmIH_VMID_5_LUT_MM_BASE_IDX 0
#define mmIH_VMID_6_LUT_MM 0x0016
#define mmIH_VMID_6_LUT_MM_BASE_IDX 0
#define mmIH_VMID_7_LUT_MM 0x0017
#define mmIH_VMID_7_LUT_MM_BASE_IDX 0
#define mmIH_VMID_8_LUT_MM 0x0018
#define mmIH_VMID_8_LUT_MM_BASE_IDX 0
#define mmIH_VMID_9_LUT_MM 0x0019
#define mmIH_VMID_9_LUT_MM_BASE_IDX 0
#define mmIH_VMID_10_LUT_MM 0x001a
#define mmIH_VMID_10_LUT_MM_BASE_IDX 0
#define mmIH_VMID_11_LUT_MM 0x001b
#define mmIH_VMID_11_LUT_MM_BASE_IDX 0
#define mmIH_VMID_12_LUT_MM 0x001c
#define mmIH_VMID_12_LUT_MM_BASE_IDX 0
#define mmIH_VMID_13_LUT_MM 0x001d
#define mmIH_VMID_13_LUT_MM_BASE_IDX 0
#define mmIH_VMID_14_LUT_MM 0x001e
#define mmIH_VMID_14_LUT_MM_BASE_IDX 0
#define mmIH_VMID_15_LUT_MM 0x001f
#define mmIH_VMID_15_LUT_MM_BASE_IDX 0
#define mmIH_COOKIE_0 0x0020
#define mmIH_COOKIE_0_BASE_IDX 0
#define mmIH_COOKIE_1 0x0021
#define mmIH_COOKIE_1_BASE_IDX 0
#define mmIH_COOKIE_2 0x0022
#define mmIH_COOKIE_2_BASE_IDX 0
#define mmIH_COOKIE_3 0x0023
#define mmIH_COOKIE_3_BASE_IDX 0
#define mmIH_COOKIE_4 0x0024
#define mmIH_COOKIE_4_BASE_IDX 0
#define mmIH_COOKIE_5 0x0025
#define mmIH_COOKIE_5_BASE_IDX 0
#define mmIH_COOKIE_6 0x0026
#define mmIH_COOKIE_6_BASE_IDX 0
#define mmIH_COOKIE_7 0x0027
#define mmIH_COOKIE_7_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART0 0x003f
#define mmIH_REGISTER_LAST_PART0_BASE_IDX 0
#define mmSEM_REQ_INPUT_0 0x0040
#define mmSEM_REQ_INPUT_0_BASE_IDX 0
#define mmSEM_REQ_INPUT_1 0x0041
#define mmSEM_REQ_INPUT_1_BASE_IDX 0
#define mmSEM_REQ_INPUT_2 0x0042
#define mmSEM_REQ_INPUT_2_BASE_IDX 0
#define mmSEM_REQ_INPUT_3 0x0043
#define mmSEM_REQ_INPUT_3_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART0 0x007f
#define mmSEM_REGISTER_LAST_PART0_BASE_IDX 0
#define mmIH_RB_CNTL 0x0080
#define mmIH_RB_CNTL_BASE_IDX 0
#define mmIH_RB_BASE 0x0081
#define mmIH_RB_BASE_BASE_IDX 0
#define mmIH_RB_BASE_HI 0x0082
#define mmIH_RB_BASE_HI_BASE_IDX 0
#define mmIH_RB_RPTR 0x0083
#define mmIH_RB_RPTR_BASE_IDX 0
#define mmIH_RB_WPTR 0x0084
#define mmIH_RB_WPTR_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_HI 0x0085
#define mmIH_RB_WPTR_ADDR_HI_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_LO 0x0086
#define mmIH_RB_WPTR_ADDR_LO_BASE_IDX 0
#define mmIH_DOORBELL_RPTR 0x0087
#define mmIH_DOORBELL_RPTR_BASE_IDX 0
#define mmIH_RB_CNTL_RING1 0x0088
#define mmIH_RB_CNTL_RING1_BASE_IDX 0
#define mmIH_RB_BASE_RING1 0x0089
#define mmIH_RB_BASE_RING1_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING1 0x008a
#define mmIH_RB_BASE_HI_RING1_BASE_IDX 0
#define mmIH_RB_RPTR_RING1 0x008b
#define mmIH_RB_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_WPTR_RING1 0x008c
#define mmIH_RB_WPTR_RING1_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING1 0x008f
#define mmIH_DOORBELL_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_CNTL_RING2 0x0090
#define mmIH_RB_CNTL_RING2_BASE_IDX 0
#define mmIH_RB_BASE_RING2 0x0091
#define mmIH_RB_BASE_RING2_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING2 0x0092
#define mmIH_RB_BASE_HI_RING2_BASE_IDX 0
#define mmIH_RB_RPTR_RING2 0x0093
#define mmIH_RB_RPTR_RING2_BASE_IDX 0
#define mmIH_RB_WPTR_RING2 0x0094
#define mmIH_RB_WPTR_RING2_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING2 0x0097
#define mmIH_DOORBELL_RPTR_RING2_BASE_IDX 0
#define mmIH_VERSION 0x0098
#define mmIH_VERSION_BASE_IDX 0
#define mmIH_CNTL 0x00c0
#define mmIH_CNTL_BASE_IDX 0
#define mmIH_CNTL2 0x00c1
#define mmIH_CNTL2_BASE_IDX 0
#define mmIH_STATUS 0x00c2
#define mmIH_STATUS_BASE_IDX 0
#define mmIH_PERFMON_CNTL 0x00c3
#define mmIH_PERFMON_CNTL_BASE_IDX 0
#define mmIH_PERFCOUNTER0_RESULT 0x00c4
#define mmIH_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmIH_PERFCOUNTER1_RESULT 0x00c5
#define mmIH_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_31_0 0x00c7
#define mmIH_DSM_MATCH_VALUE_BIT_31_0_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_63_32 0x00c8
#define mmIH_DSM_MATCH_VALUE_BIT_63_32_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_95_64 0x00c9
#define mmIH_DSM_MATCH_VALUE_BIT_95_64_BASE_IDX 0
#define mmIH_DSM_MATCH_FIELD_CONTROL 0x00ca
#define mmIH_DSM_MATCH_FIELD_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_DATA_CONTROL 0x00cb
#define mmIH_DSM_MATCH_DATA_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_FCN_ID 0x00cc
#define mmIH_DSM_MATCH_FCN_ID_BASE_IDX 0
#define mmIH_LIMIT_INT_RATE_CNTL 0x00cd
#define mmIH_LIMIT_INT_RATE_CNTL_BASE_IDX 0
#define mmIH_VF_RB_STATUS 0x00ce
#define mmIH_VF_RB_STATUS_BASE_IDX 0
#define mmIH_VF_RB_STATUS2 0x00cf
#define mmIH_VF_RB_STATUS2_BASE_IDX 0
#define mmIH_VF_RB1_STATUS 0x00d0
#define mmIH_VF_RB1_STATUS_BASE_IDX 0
#define mmIH_VF_RB1_STATUS2 0x00d1
#define mmIH_VF_RB1_STATUS2_BASE_IDX 0
#define mmIH_VF_RB2_STATUS 0x00d2
#define mmIH_VF_RB2_STATUS_BASE_IDX 0
#define mmIH_VF_RB2_STATUS2 0x00d3
#define mmIH_VF_RB2_STATUS2_BASE_IDX 0
#define mmIH_INT_FLOOD_CNTL 0x00d5
#define mmIH_INT_FLOOD_CNTL_BASE_IDX 0
#define mmIH_RB0_INT_FLOOD_STATUS 0x00d6
#define mmIH_RB0_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB1_INT_FLOOD_STATUS 0x00d7
#define mmIH_RB1_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB2_INT_FLOOD_STATUS 0x00d8
#define mmIH_RB2_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_INT_FLOOD_STATUS 0x00d9
#define mmIH_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_STORM_CLIENT_LIST_CNTL 0x00da
#define mmIH_STORM_CLIENT_LIST_CNTL_BASE_IDX 0
#define mmIH_CLK_CTRL 0x00db
#define mmIH_CLK_CTRL_BASE_IDX 0
#define mmIH_INT_FLAGS 0x00dc
#define mmIH_INT_FLAGS_BASE_IDX 0
#define mmIH_LAST_INT_INFO0 0x00dd
#define mmIH_LAST_INT_INFO0_BASE_IDX 0
#define mmIH_LAST_INT_INFO1 0x00de
#define mmIH_LAST_INT_INFO1_BASE_IDX 0
#define mmIH_LAST_INT_INFO2 0x00df
#define mmIH_LAST_INT_INFO2_BASE_IDX 0
#define mmIH_SCRATCH 0x00e0
#define mmIH_SCRATCH_BASE_IDX 0
#define mmIH_CLIENT_CREDIT_ERROR 0x00e1
#define mmIH_CLIENT_CREDIT_ERROR_BASE_IDX 0
#define mmIH_GPU_IOV_VIOLATION_LOG 0x00e2
#define mmIH_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmIH_COOKIE_REC_VIOLATION_LOG 0x00e3
#define mmIH_COOKIE_REC_VIOLATION_LOG_BASE_IDX 0
#define mmIH_CREDIT_STATUS 0x00e4
#define mmIH_CREDIT_STATUS_BASE_IDX 0
#define mmIH_MMHUB_ERROR 0x00e5
#define mmIH_MMHUB_ERROR_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART2 0x00ff
#define mmIH_REGISTER_LAST_PART2_BASE_IDX 0
#define mmSEM_CLK_CTRL 0x0100
#define mmSEM_CLK_CTRL_BASE_IDX 0
#define mmSEM_UTC_CREDIT 0x0101
#define mmSEM_UTC_CREDIT_BASE_IDX 0
#define mmSEM_UTC_CONFIG 0x0102
#define mmSEM_UTC_CONFIG_BASE_IDX 0
#define mmSEM_UTCL2_TRAN_EN_LUT 0x0103
#define mmSEM_UTCL2_TRAN_EN_LUT_BASE_IDX 0
#define mmSEM_MCIF_CONFIG 0x0104
#define mmSEM_MCIF_CONFIG_BASE_IDX 0
#define mmSEM_PERFMON_CNTL 0x0105
#define mmSEM_PERFMON_CNTL_BASE_IDX 0
#define mmSEM_PERFCOUNTER0_RESULT 0x0106
#define mmSEM_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmSEM_PERFCOUNTER1_RESULT 0x0107
#define mmSEM_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmSEM_STATUS 0x0108
#define mmSEM_STATUS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG 0x0109
#define mmSEM_MAILBOX_CLIENTCONFIG_BASE_IDX 0
#define mmSEM_MAILBOX 0x010a
#define mmSEM_MAILBOX_BASE_IDX 0
#define mmSEM_MAILBOX_CONTROL 0x010b
#define mmSEM_MAILBOX_CONTROL_BASE_IDX 0
#define mmSEM_CHICKEN_BITS 0x010c
#define mmSEM_CHICKEN_BITS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA 0x010d
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA_BASE_IDX 0
#define mmSEM_GPU_IOV_VIOLATION_LOG 0x010e
#define mmSEM_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmSEM_OUTSTANDING_THRESHOLD 0x010f
#define mmSEM_OUTSTANDING_THRESHOLD_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART2 0x017f
#define mmSEM_REGISTER_LAST_PART2_BASE_IDX 0
#define mmIH_ACTIVE_FCN_ID 0x0180
#define mmIH_ACTIVE_FCN_ID_BASE_IDX 0
#define mmIH_VIRT_RESET_REQ 0x0181
#define mmIH_VIRT_RESET_REQ_BASE_IDX 0
#define mmIH_CLIENT_CFG 0x0184
#define mmIH_CLIENT_CFG_BASE_IDX 0
#define mmIH_CLIENT_CFG_INDEX 0x0188
#define mmIH_CLIENT_CFG_INDEX_BASE_IDX 0
#define mmIH_CLIENT_CFG_DATA 0x0189
#define mmIH_CLIENT_CFG_DATA_BASE_IDX 0
#define mmIH_CID_REMAP_INDEX 0x018a
#define mmIH_CID_REMAP_INDEX_BASE_IDX 0
#define mmIH_CID_REMAP_DATA 0x018b
#define mmIH_CID_REMAP_DATA_BASE_IDX 0
#define mmIH_CHICKEN 0x018c
#define mmIH_CHICKEN_BASE_IDX 0
#define mmIH_MMHUB_CNTL 0x018d
#define mmIH_MMHUB_CNTL_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART1 0x019f
#define mmIH_REGISTER_LAST_PART1_BASE_IDX 0
#define mmSEM_ACTIVE_FCN_ID 0x01a0
#define mmSEM_ACTIVE_FCN_ID_BASE_IDX 0
#define mmSEM_VIRT_RESET_REQ 0x01a1
#define mmSEM_VIRT_RESET_REQ_BASE_IDX 0
#define mmSEM_RESP_SDMA0 0x01a4
#define mmSEM_RESP_SDMA0_BASE_IDX 0
#define mmSEM_RESP_SDMA1 0x01a5
#define mmSEM_RESP_SDMA1_BASE_IDX 0
#define mmSEM_RESP_UVD 0x01a6
#define mmSEM_RESP_UVD_BASE_IDX 0
#define mmSEM_RESP_VCE_0 0x01a7
#define mmSEM_RESP_VCE_0_BASE_IDX 0
#define mmSEM_RESP_ACP 0x01a8
#define mmSEM_RESP_ACP_BASE_IDX 0
#define mmSEM_RESP_ISP 0x01a9
#define mmSEM_RESP_ISP_BASE_IDX 0
#define mmSEM_RESP_VCE_1 0x01aa
#define mmSEM_RESP_VCE_1_BASE_IDX 0
#define mmSEM_RESP_VP8 0x01ab
#define mmSEM_RESP_VP8_BASE_IDX 0
#define mmSEM_RESP_GC 0x01ac
#define mmSEM_RESP_GC_BASE_IDX 0
#define mmSEM_CID_REMAP_INDEX 0x01b0
#define mmSEM_CID_REMAP_INDEX_BASE_IDX 0
#define mmSEM_CID_REMAP_DATA 0x01b1
#define mmSEM_CID_REMAP_DATA_BASE_IDX 0
#define mmSEM_ATOMIC_OP_LUT 0x01b2
#define mmSEM_ATOMIC_OP_LUT_BASE_IDX 0
#define mmSEM_EDC_CONFIG 0x01b3
#define mmSEM_EDC_CONFIG_BASE_IDX 0
#define mmSEM_CHICKEN_BITS2 0x01b4
#define mmSEM_CHICKEN_BITS2_BASE_IDX 0
#define mmSEM_MMHUB_CNTL 0x01b5
#define mmSEM_MMHUB_CNTL_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART1 0x01bf
#define mmSEM_REGISTER_LAST_PART1_BASE_IDX 0
#endif
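In these headers every register comes as a pair: `mmREG` is (by the usual amdgpu convention) a dword offset, and `mmREG_BASE_IDX` names the segment that offset is relative to (always segment 0 in this section). The sketch below resolves such a pair to an absolute dword offset and a byte address. It assumes, for illustration, that the `// base address: 0x4280` comment on the `osssys_osssysdec` block in the 4.2.0 header is a byte address, so the segment base is 0x4280 / 4 = 0x10A0 dwords. The table `osssys_seg_base` and the helpers `osssys_reg_dword`, `OSSSYS_REG_DWORD` and `reg_byte_addr` are hypothetical. They are written in the spirit of the `SOC15_REG_OFFSET` pattern used by the Linux amdgpu driver rather than copied from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two register pairs copied from the headers in this diff. */
#define mmIH_CLIENT_CFG 0x0184
#define mmIH_CLIENT_CFG_BASE_IDX 0
#define mmSEM_MAILBOX_CONTROL 0x010b
#define mmSEM_MAILBOX_CONTROL_BASE_IDX 0

/* Hypothetical per-segment base table, in dwords. Entry 0 ASSUMES the
 * osssys segment starts at byte 0x4280, i.e. dword 0x4280 / 4 = 0x10A0. */
static const uint32_t osssys_seg_base[] = { 0x10A0 };

/* Resolve an (offset, base index) pair to an absolute dword offset. */
static uint32_t osssys_reg_dword(uint32_t offset, size_t base_idx)
{
    /* Every BASE_IDX shown here is 0; reject indices the table lacks. */
    assert(base_idx < sizeof osssys_seg_base / sizeof osssys_seg_base[0]);
    return osssys_seg_base[base_idx] + offset;
}

/* Token pasting lets REG pick up its own REG_BASE_IDX companion.
 * Hypothetical helper, similar in spirit to amdgpu's SOC15_REG_OFFSET. */
#define OSSSYS_REG_DWORD(reg) osssys_reg_dword((reg), reg##_BASE_IDX)

/* Registers are 32 bits wide, so byte address = dword offset * 4. */
static uint32_t reg_byte_addr(uint32_t dword_offset)
{
    return dword_offset * 4u;
}
```

Under these assumptions `OSSSYS_REG_DWORD(mmIH_CLIENT_CFG)` resolves to dword 0x1224, i.e. byte address 0x4890 (0x4280 + 0x184 * 4). Pasting `_BASE_IDX` onto the register name inside the macro keeps each offset and its segment index from drifting apart at call sites.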
@@ -24,322 +24,321 @@
#define _osssys_4_2_0_OFFSET_HEADER
// addressBlock: osssys_osssysdec
// base address: 0x4280
#define mmIH_VMID_0_LUT 0x0000
#define mmIH_VMID_0_LUT_BASE_IDX 0
#define mmIH_VMID_1_LUT 0x0001
#define mmIH_VMID_1_LUT_BASE_IDX 0
#define mmIH_VMID_2_LUT 0x0002
#define mmIH_VMID_2_LUT_BASE_IDX 0
#define mmIH_VMID_3_LUT 0x0003
#define mmIH_VMID_3_LUT_BASE_IDX 0
#define mmIH_VMID_4_LUT 0x0004
#define mmIH_VMID_4_LUT_BASE_IDX 0
#define mmIH_VMID_5_LUT 0x0005
#define mmIH_VMID_5_LUT_BASE_IDX 0
#define mmIH_VMID_6_LUT 0x0006
#define mmIH_VMID_6_LUT_BASE_IDX 0
#define mmIH_VMID_7_LUT 0x0007
#define mmIH_VMID_7_LUT_BASE_IDX 0
#define mmIH_VMID_8_LUT 0x0008
#define mmIH_VMID_8_LUT_BASE_IDX 0
#define mmIH_VMID_9_LUT 0x0009
#define mmIH_VMID_9_LUT_BASE_IDX 0
#define mmIH_VMID_10_LUT 0x000a
#define mmIH_VMID_10_LUT_BASE_IDX 0
#define mmIH_VMID_11_LUT 0x000b
#define mmIH_VMID_11_LUT_BASE_IDX 0
#define mmIH_VMID_12_LUT 0x000c
#define mmIH_VMID_12_LUT_BASE_IDX 0
#define mmIH_VMID_13_LUT 0x000d
#define mmIH_VMID_13_LUT_BASE_IDX 0
#define mmIH_VMID_14_LUT 0x000e
#define mmIH_VMID_14_LUT_BASE_IDX 0
#define mmIH_VMID_15_LUT 0x000f
#define mmIH_VMID_15_LUT_BASE_IDX 0
#define mmIH_VMID_0_LUT_MM 0x0010
#define mmIH_VMID_0_LUT_MM_BASE_IDX 0
#define mmIH_VMID_1_LUT_MM 0x0011
#define mmIH_VMID_1_LUT_MM_BASE_IDX 0
#define mmIH_VMID_2_LUT_MM 0x0012
#define mmIH_VMID_2_LUT_MM_BASE_IDX 0
#define mmIH_VMID_3_LUT_MM 0x0013
#define mmIH_VMID_3_LUT_MM_BASE_IDX 0
#define mmIH_VMID_4_LUT_MM 0x0014
#define mmIH_VMID_4_LUT_MM_BASE_IDX 0
#define mmIH_VMID_5_LUT_MM 0x0015
#define mmIH_VMID_5_LUT_MM_BASE_IDX 0
#define mmIH_VMID_6_LUT_MM 0x0016
#define mmIH_VMID_6_LUT_MM_BASE_IDX 0
#define mmIH_VMID_7_LUT_MM 0x0017
#define mmIH_VMID_7_LUT_MM_BASE_IDX 0
#define mmIH_VMID_8_LUT_MM 0x0018
#define mmIH_VMID_8_LUT_MM_BASE_IDX 0
#define mmIH_VMID_9_LUT_MM 0x0019
#define mmIH_VMID_9_LUT_MM_BASE_IDX 0
#define mmIH_VMID_10_LUT_MM 0x001a
#define mmIH_VMID_10_LUT_MM_BASE_IDX 0
#define mmIH_VMID_11_LUT_MM 0x001b
#define mmIH_VMID_11_LUT_MM_BASE_IDX 0
#define mmIH_VMID_12_LUT_MM 0x001c
#define mmIH_VMID_12_LUT_MM_BASE_IDX 0
#define mmIH_VMID_13_LUT_MM 0x001d
#define mmIH_VMID_13_LUT_MM_BASE_IDX 0
#define mmIH_VMID_14_LUT_MM 0x001e
#define mmIH_VMID_14_LUT_MM_BASE_IDX 0
#define mmIH_VMID_15_LUT_MM 0x001f
#define mmIH_VMID_15_LUT_MM_BASE_IDX 0
#define mmIH_COOKIE_0 0x0020
#define mmIH_COOKIE_0_BASE_IDX 0
#define mmIH_COOKIE_1 0x0021
#define mmIH_COOKIE_1_BASE_IDX 0
#define mmIH_COOKIE_2 0x0022
#define mmIH_COOKIE_2_BASE_IDX 0
#define mmIH_COOKIE_3 0x0023
#define mmIH_COOKIE_3_BASE_IDX 0
#define mmIH_COOKIE_4 0x0024
#define mmIH_COOKIE_4_BASE_IDX 0
#define mmIH_COOKIE_5 0x0025
#define mmIH_COOKIE_5_BASE_IDX 0
#define mmIH_COOKIE_6 0x0026
#define mmIH_COOKIE_6_BASE_IDX 0
#define mmIH_COOKIE_7 0x0027
#define mmIH_COOKIE_7_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART0 0x003f
#define mmIH_REGISTER_LAST_PART0_BASE_IDX 0
#define mmSEM_REQ_INPUT_0 0x0040
#define mmSEM_REQ_INPUT_0_BASE_IDX 0
#define mmSEM_REQ_INPUT_1 0x0041
#define mmSEM_REQ_INPUT_1_BASE_IDX 0
#define mmSEM_REQ_INPUT_2 0x0042
#define mmSEM_REQ_INPUT_2_BASE_IDX 0
#define mmSEM_REQ_INPUT_3 0x0043
#define mmSEM_REQ_INPUT_3_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART0 0x007f
#define mmSEM_REGISTER_LAST_PART0_BASE_IDX 0
#define mmIH_RB_CNTL 0x0080
#define mmIH_RB_CNTL_BASE_IDX 0
#define mmIH_RB_BASE 0x0081
#define mmIH_RB_BASE_BASE_IDX 0
#define mmIH_RB_BASE_HI 0x0082
#define mmIH_RB_BASE_HI_BASE_IDX 0
#define mmIH_RB_RPTR 0x0083
#define mmIH_RB_RPTR_BASE_IDX 0
#define mmIH_RB_WPTR 0x0084
#define mmIH_RB_WPTR_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_HI 0x0085
#define mmIH_RB_WPTR_ADDR_HI_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_LO 0x0086
#define mmIH_RB_WPTR_ADDR_LO_BASE_IDX 0
#define mmIH_DOORBELL_RPTR 0x0087
#define mmIH_DOORBELL_RPTR_BASE_IDX 0
#define mmIH_RB_CNTL_RING1 0x008c
#define mmIH_RB_CNTL_RING1_BASE_IDX 0
#define mmIH_RB_BASE_RING1 0x008d
#define mmIH_RB_BASE_RING1_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING1 0x008e
#define mmIH_RB_BASE_HI_RING1_BASE_IDX 0
#define mmIH_RB_RPTR_RING1 0x008f
#define mmIH_RB_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_WPTR_RING1 0x0090
#define mmIH_RB_WPTR_RING1_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING1 0x0093
#define mmIH_DOORBELL_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_CNTL_RING2 0x0098
#define mmIH_RB_CNTL_RING2_BASE_IDX 0
#define mmIH_RB_BASE_RING2 0x0099
#define mmIH_RB_BASE_RING2_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING2 0x009a
#define mmIH_RB_BASE_HI_RING2_BASE_IDX 0
#define mmIH_RB_RPTR_RING2 0x009b
#define mmIH_RB_RPTR_RING2_BASE_IDX 0
#define mmIH_RB_WPTR_RING2 0x009c
#define mmIH_RB_WPTR_RING2_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING2 0x009f
#define mmIH_DOORBELL_RPTR_RING2_BASE_IDX 0
#define mmIH_VERSION 0x00a5
#define mmIH_VERSION_BASE_IDX 0
#define mmIH_CNTL 0x00c0
#define mmIH_CNTL_BASE_IDX 0
#define mmIH_CNTL2 0x00c1
#define mmIH_CNTL2_BASE_IDX 0
#define mmIH_STATUS 0x00c2
#define mmIH_STATUS_BASE_IDX 0
#define mmIH_PERFMON_CNTL 0x00c3
#define mmIH_PERFMON_CNTL_BASE_IDX 0
#define mmIH_PERFCOUNTER0_RESULT 0x00c4
#define mmIH_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmIH_PERFCOUNTER1_RESULT 0x00c5
#define mmIH_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_31_0 0x00c7
#define mmIH_DSM_MATCH_VALUE_BIT_31_0_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_63_32 0x00c8
#define mmIH_DSM_MATCH_VALUE_BIT_63_32_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_95_64 0x00c9
#define mmIH_DSM_MATCH_VALUE_BIT_95_64_BASE_IDX 0
#define mmIH_DSM_MATCH_FIELD_CONTROL 0x00ca
#define mmIH_DSM_MATCH_FIELD_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_DATA_CONTROL 0x00cb
#define mmIH_DSM_MATCH_DATA_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_FCN_ID 0x00cc
#define mmIH_DSM_MATCH_FCN_ID_BASE_IDX 0
#define mmIH_LIMIT_INT_RATE_CNTL 0x00cd
#define mmIH_LIMIT_INT_RATE_CNTL_BASE_IDX 0
#define mmIH_VF_RB_STATUS 0x00ce
#define mmIH_VF_RB_STATUS_BASE_IDX 0
#define mmIH_VF_RB_STATUS2 0x00cf
#define mmIH_VF_RB_STATUS2_BASE_IDX 0
#define mmIH_VF_RB1_STATUS 0x00d0
#define mmIH_VF_RB1_STATUS_BASE_IDX 0
#define mmIH_VF_RB1_STATUS2 0x00d1
#define mmIH_VF_RB1_STATUS2_BASE_IDX 0
#define mmIH_VF_RB2_STATUS 0x00d2
#define mmIH_VF_RB2_STATUS_BASE_IDX 0
#define mmIH_VF_RB2_STATUS2 0x00d3
#define mmIH_VF_RB2_STATUS2_BASE_IDX 0
#define mmIH_INT_FLOOD_CNTL 0x00d5
#define mmIH_INT_FLOOD_CNTL_BASE_IDX 0
#define mmIH_RB0_INT_FLOOD_STATUS 0x00d6
#define mmIH_RB0_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB1_INT_FLOOD_STATUS 0x00d7
#define mmIH_RB1_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB2_INT_FLOOD_STATUS 0x00d8
#define mmIH_RB2_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_INT_FLOOD_STATUS 0x00d9
#define mmIH_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_STORM_CLIENT_LIST_CNTL 0x00da
#define mmIH_STORM_CLIENT_LIST_CNTL_BASE_IDX 0
#define mmIH_CLK_CTRL 0x00db
#define mmIH_CLK_CTRL_BASE_IDX 0
#define mmIH_INT_FLAGS 0x00dc
#define mmIH_INT_FLAGS_BASE_IDX 0
#define mmIH_LAST_INT_INFO0 0x00dd
#define mmIH_LAST_INT_INFO0_BASE_IDX 0
#define mmIH_LAST_INT_INFO1 0x00de
#define mmIH_LAST_INT_INFO1_BASE_IDX 0
#define mmIH_LAST_INT_INFO2 0x00df
#define mmIH_LAST_INT_INFO2_BASE_IDX 0
#define mmIH_SCRATCH 0x00e0
#define mmIH_SCRATCH_BASE_IDX 0
#define mmIH_CLIENT_CREDIT_ERROR 0x00e1
#define mmIH_CLIENT_CREDIT_ERROR_BASE_IDX 0
#define mmIH_GPU_IOV_VIOLATION_LOG 0x00e2
#define mmIH_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmIH_COOKIE_REC_VIOLATION_LOG 0x00e3
#define mmIH_COOKIE_REC_VIOLATION_LOG_BASE_IDX 0
#define mmIH_CREDIT_STATUS 0x00e4
#define mmIH_CREDIT_STATUS_BASE_IDX 0
#define mmIH_MMHUB_ERROR 0x00e5
#define mmIH_MMHUB_ERROR_BASE_IDX 0
#define mmIH_MEM_POWER_CTRL 0x00e8
#define mmIH_MEM_POWER_CTRL_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART2 0x00ff
#define mmIH_REGISTER_LAST_PART2_BASE_IDX 0
#define mmSEM_CLK_CTRL 0x0100
#define mmSEM_CLK_CTRL_BASE_IDX 0
#define mmSEM_UTC_CREDIT 0x0101
#define mmSEM_UTC_CREDIT_BASE_IDX 0
#define mmSEM_UTC_CONFIG 0x0102
#define mmSEM_UTC_CONFIG_BASE_IDX 0
#define mmSEM_UTCL2_TRAN_EN_LUT 0x0103
#define mmSEM_UTCL2_TRAN_EN_LUT_BASE_IDX 0
#define mmSEM_MCIF_CONFIG 0x0104
#define mmSEM_MCIF_CONFIG_BASE_IDX 0
#define mmSEM_PERFMON_CNTL 0x0105
#define mmSEM_PERFMON_CNTL_BASE_IDX 0
#define mmSEM_PERFCOUNTER0_RESULT 0x0106
#define mmSEM_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmSEM_PERFCOUNTER1_RESULT 0x0107
#define mmSEM_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmSEM_STATUS 0x0108
#define mmSEM_STATUS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG 0x0109
#define mmSEM_MAILBOX_CLIENTCONFIG_BASE_IDX 0
#define mmSEM_MAILBOX 0x010a
#define mmSEM_MAILBOX_BASE_IDX 0
#define mmSEM_MAILBOX_CONTROL 0x010b
#define mmSEM_MAILBOX_CONTROL_BASE_IDX 0
#define mmSEM_CHICKEN_BITS 0x010c
#define mmSEM_CHICKEN_BITS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA 0x010d
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA_BASE_IDX 0
#define mmSEM_GPU_IOV_VIOLATION_LOG 0x010e
#define mmSEM_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmSEM_OUTSTANDING_THRESHOLD 0x010f
#define mmSEM_OUTSTANDING_THRESHOLD_BASE_IDX 0
#define mmSEM_MEM_POWER_CTRL 0x0110
#define mmSEM_MEM_POWER_CTRL_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART2 0x017f
#define mmSEM_REGISTER_LAST_PART2_BASE_IDX 0
#define mmIH_ACTIVE_FCN_ID 0x0180
#define mmIH_ACTIVE_FCN_ID_BASE_IDX 0
#define mmIH_VIRT_RESET_REQ 0x0181
#define mmIH_VIRT_RESET_REQ_BASE_IDX 0
#define mmIH_CLIENT_CFG 0x0184
#define mmIH_CLIENT_CFG_BASE_IDX 0
#define mmIH_CLIENT_CFG_INDEX 0x0188
#define mmIH_CLIENT_CFG_INDEX_BASE_IDX 0
#define mmIH_CLIENT_CFG_DATA 0x0189
#define mmIH_CLIENT_CFG_DATA_BASE_IDX 0
#define mmIH_CID_REMAP_INDEX 0x018a
#define mmIH_CID_REMAP_INDEX_BASE_IDX 0
#define mmIH_CID_REMAP_DATA 0x018b
#define mmIH_CID_REMAP_DATA_BASE_IDX 0
#define mmIH_CHICKEN 0x018c
#define mmIH_CHICKEN_BASE_IDX 0
#define mmIH_MMHUB_CNTL 0x018d
#define mmIH_MMHUB_CNTL_BASE_IDX 0
#define mmIH_INT_DROP_CNTL 0x018e
#define mmIH_INT_DROP_CNTL_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_VALUE0 0x018f
#define mmIH_INT_DROP_MATCH_VALUE0_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_VALUE1 0x0190
#define mmIH_INT_DROP_MATCH_VALUE1_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_MASK0 0x0191
#define mmIH_INT_DROP_MATCH_MASK0_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_MASK1 0x0192
#define mmIH_INT_DROP_MATCH_MASK1_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART1 0x019f
#define mmIH_REGISTER_LAST_PART1_BASE_IDX 0
#define mmSEM_ACTIVE_FCN_ID 0x01a0
#define mmSEM_ACTIVE_FCN_ID_BASE_IDX 0
#define mmSEM_VIRT_RESET_REQ 0x01a1
#define mmSEM_VIRT_RESET_REQ_BASE_IDX 0
#define mmSEM_RESP_SDMA0 0x01a4
#define mmSEM_RESP_SDMA0_BASE_IDX 0
#define mmSEM_RESP_SDMA1 0x01a5
#define mmSEM_RESP_SDMA1_BASE_IDX 0
#define mmSEM_RESP_UVD 0x01a6
#define mmSEM_RESP_UVD_BASE_IDX 0
#define mmSEM_RESP_VCE_0 0x01a7
#define mmSEM_RESP_VCE_0_BASE_IDX 0
#define mmSEM_RESP_ACP 0x01a8
#define mmSEM_RESP_ACP_BASE_IDX 0
#define mmSEM_RESP_ISP 0x01a9
#define mmSEM_RESP_ISP_BASE_IDX 0
#define mmSEM_RESP_VCE_1 0x01aa
#define mmSEM_RESP_VCE_1_BASE_IDX 0
#define mmSEM_RESP_VP8 0x01ab
#define mmSEM_RESP_VP8_BASE_IDX 0
#define mmSEM_RESP_GC 0x01ac
#define mmSEM_RESP_GC_BASE_IDX 0
#define mmSEM_RESP_UVD_1 0x01ad
#define mmSEM_RESP_UVD_1_BASE_IDX 0
#define mmSEM_CID_REMAP_INDEX 0x01b0
#define mmSEM_CID_REMAP_INDEX_BASE_IDX 0
#define mmSEM_CID_REMAP_DATA 0x01b1
#define mmSEM_CID_REMAP_DATA_BASE_IDX 0
#define mmSEM_ATOMIC_OP_LUT 0x01b2
#define mmSEM_ATOMIC_OP_LUT_BASE_IDX 0
#define mmSEM_EDC_CONFIG 0x01b3
#define mmSEM_EDC_CONFIG_BASE_IDX 0
#define mmSEM_CHICKEN_BITS2 0x01b4
#define mmSEM_CHICKEN_BITS2_BASE_IDX 0
#define mmSEM_MMHUB_CNTL 0x01b5
#define mmSEM_MMHUB_CNTL_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART1 0x01bf
#define mmSEM_REGISTER_LAST_PART1_BASE_IDX 0
#endif
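The `*_REGISTER_LAST_PART*` markers suggest that the 4.2.0 osssys block interleaves IH and SEM registers in fixed windows. Each marker can be read as the inclusive end of a window that starts one dword past the previous marker. That reading is an inference from the offsets, not something the header states. On that basis the layout is IH 0x000-0x03f, SEM 0x040-0x07f, IH 0x080-0x0ff, SEM 0x100-0x17f, IH 0x180-0x19f and SEM 0x1a0-0x1bf. The sketch below classifies an offset by window. It also captures the regular strides visible in the defines: VMID LUTs, cookies, and the per-ring IH_RB registers. The enum and all helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Window end markers copied from the 4.2.0 header. */
#define mmIH_REGISTER_LAST_PART0 0x003f
#define mmSEM_REGISTER_LAST_PART0 0x007f
#define mmIH_REGISTER_LAST_PART2 0x00ff
#define mmSEM_REGISTER_LAST_PART2 0x017f
#define mmIH_REGISTER_LAST_PART1 0x019f
#define mmSEM_REGISTER_LAST_PART1 0x01bf

/* Hypothetical classification of an osssys offset. */
enum osssys_sub { OSSSYS_IH, OSSSYS_SEM, OSSSYS_NONE };

/* Windows in ascending address order. ASSUMPTION: each window starts
 * one dword past the previous window's LAST_PART marker. */
static const struct { uint32_t last; enum osssys_sub sub; } osssys_windows[] = {
    { mmIH_REGISTER_LAST_PART0,  OSSSYS_IH  },
    { mmSEM_REGISTER_LAST_PART0, OSSSYS_SEM },
    { mmIH_REGISTER_LAST_PART2,  OSSSYS_IH  },
    { mmSEM_REGISTER_LAST_PART2, OSSSYS_SEM },
    { mmIH_REGISTER_LAST_PART1,  OSSSYS_IH  },
    { mmSEM_REGISTER_LAST_PART1, OSSSYS_SEM },
};

/* Return the sub-block owning a dword offset within the osssys block. */
static enum osssys_sub osssys_classify(uint32_t offset)
{
    for (size_t i = 0; i < sizeof osssys_windows / sizeof osssys_windows[0]; i++)
        if (offset <= osssys_windows[i].last)
            return osssys_windows[i].sub;
    return OSSSYS_NONE; /* past the last marker: outside this block */
}

/* Indexed families, one dword per instance, as listed in the header. */
static uint32_t ih_vmid_lut(unsigned vmid)    { assert(vmid < 16); return 0x0000u + vmid; }
static uint32_t ih_vmid_lut_mm(unsigned vmid) { assert(vmid < 16); return 0x0010u + vmid; }
static uint32_t ih_cookie(unsigned n)         { assert(n < 8);     return 0x0020u + n; }

/* Ring 1 and ring 2 copies of IH_RB_CNTL, IH_RB_BASE, IH_RB_BASE_HI,
 * IH_RB_RPTR, IH_RB_WPTR and IH_DOORBELL_RPTR sit 0x0c dwords apart.
 * Only valid for registers that have _RING1/_RING2 variants. */
static uint32_t ih_ring_reg(uint32_t ring0_offset, unsigned ring)
{
    assert(ring <= 2);
    return ring0_offset + 0x0cu * ring;
}
```

The table is ordered by marker value rather than by PART number because the numbering is not monotonic: IH part 2 (ending at 0x0ff) lies below IH part 1 (ending at 0x19f). `ih_ring_reg` only applies to registers that have `_RING1`/`_RING2` variants. `mmIH_RB_WPTR_ADDR_HI` and `mmIH_RB_WPTR_ADDR_LO`, for example, have none.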