Pull from Github
Squashed commit of the following:

commit f029195705a15700380c6f832ba5d15d46fd6de7
Author: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Date:   Thu Jul 13 14:38:56 2023 -0500

    Formatting workflows for source (clang-format) and cmake (cmake-format) (#4)

    * Add .cmake-format.yaml file
    * Add formatting workflow
    * provide base input for creating PR
    * Update scheme for extracting branch name
      - disable running formatting on push to amd-staging branch
    * patch .cmake-format.yaml for find_package signature
      - apparently cmake-format doesn't format the full signature of find_package
    * run formatting (clang-format v11) (#7)
      Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>
    * run cmake formatting (cmake-format) (#6)
      Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>
    ---------
    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

commit bc4d135fdd8a1a9e51235f18a5d575fd2b3735e6
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Thu Jul 13 12:55:17 2023 -0500

    Removing Build cache for potential issues with auto-generated header files (#5)

    Change-Id: I9e2319f4335e2f88585ffa6fac2bd88a1c952e6e

commit ce86dea6a311d44d880fa684eb78f3329295e2a4
Author: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Date:   Thu Jul 13 11:08:58 2023 -0500

    Fix decltype(<hsa-function>) function pointer usage (#3)

    - the following is done in several places:
        decltype(hsa_memory_allocate)* hsa_memory_allocate
    - above can cause compiler errors
    - replace decltype(<hsa-function>) with decltype(::<hsa-function>)
    - this ensures that the type within the decltype is recognized as the global scope HSA function, not the variable
    - in many places, the variable has a "_fn" suffix to prevent this issue but added '::' anyway for consistency

commit ac49fdd92a72e9c99394253a02da413a6c2e3b3a
Merge: a07946a 03a0855
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Wed Jul 12 11:36:24 2023 -0500

    Merge pull request #2 from ROCm-Developer-Tools/gerrit-amd-staging

    Pull from gerrit

commit 03a085588cffe863e8f466de67be1cfb205b675a
Merge: c26b32b a07946a
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Wed Jul 12 10:57:30 2023 -0500

    Merge branch 'amd-staging' into gerrit-amd-staging

commit a07946a5cd4c670c83c27ad1a076a9d4567ce6d7
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 15:46:04 2023 +0000

    Enabling Cached Builds

commit 525e494a7f13941077a8fd4ad6840904db4d27d4
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 04:53:54 2023 +0000

    Updating missed GPU Targets

commit 42c75862f628c9bee7cfb7dc04dff2619430efbc
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 04:43:02 2023 +0000

    Adding V1 Testing

commit 9d72fd4aee85e4b0c12e717060d2730fa5b73be1
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 03:34:31 2023 +0000

    Fixing Artifacts directory path

commit f4000cc558b3b2e4676f7994f7ce8c8e6f94518e
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 03:27:26 2023 +0000

    Fixing CMake for test build job

commit 2ce8115d4c33948c3c8f957f545a95a04e1d6cd2
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 03:16:18 2023 +0000

    Fixing Ubuntu CMake for ubuntu test build

commit 6d0ed439191be900748d0c025157f9d689a73ec7
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 01:28:41 2023 +0000

    Removing Navi21

commit e349a7642e5ae5eb03ab9fcd0a0f74f09f78cab5
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 01:14:14 2023 +0000

    Removing Navi21

commit fefd02fe68d2a4bca7ec2e381960ad004ee9fc5b
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 00:42:48 2023 +0000

    Fixing CMake Job

commit 2ea46abf7bf92643efa8c549fa70346ffbd79d65
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 00:35:13 2023 +0000

    Fixing CMake Job

commit d99d681ed1999c5fcf291dc678b11a77205fb0f3
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 00:32:13 2023 +0000

    Fixing Pull Latest Dockers and CMake Jobs

commit dfc4498072d13b4a1df3a63047d34c682c3d9a29
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 23:54:21 2023 +0000

    Fixing CMake job

commit 919efe04de707f7c702031be15c3e2c5f8442cbb
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 23:52:13 2023 +0000

    Adding Pull Last dockers job

commit be1b1256e8b0e05308e8f7e7e69bee3acca55281
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Tue Jul 11 18:25:40 2023 -0500

    Update cmake.yml

commit 212299fa4355ae6ec18f9aaacbb79c51ea6c6f97
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Tue Jul 11 18:23:35 2023 -0500

    Update cmake.yml

commit 7c2c1327086a61466cc6cac39f70865c051a8bc7
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Tue Jul 11 18:18:53 2023 -0500

    Update cmake.yml

commit 191b5ce007e612e814c1d7a3afb4ad398f3852e1
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Tue Jul 11 16:03:22 2023 -0500

    Update cmake.yml

commit 8824113d95f3e13c7ce4d0af8e0d9d8f522a6c4a
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 16:28:09 2023 +0000

    Fixing Pull from Gerrit job name

    Change-Id: I9e7ed9a27a13ca49d62c93bdadb30f0057e4d385

commit cc3d5e4b02ffb439e8cc2b3efa53527c376f9982
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 16:21:43 2023 +0000

    Adding Staging sync job

    Change-Id: I0551f43878b0678ce4b3e74e27d62357cf95ad95

commit b9be2eee71380a2e6dd34d520e92d0c4209277a0
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 15:57:11 2023 +0000

    Fixing build.sh

    Change-Id: Ia987b0244f0875370d5fe69907b3f5e9cea914de

commit 9eee33a95a1abd656a7ac5ca10a9f245e9825431
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 21:39:46 2023 -0500

    Update cmake.yml

commit 7093b85a78497140e8b52632ca2a002bdaeacd62
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 21:33:29 2023 -0500

    Update cmake.yml

commit f54697172c72a67740f9fdfa0c217b6ea6931576
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 21:01:26 2023 -0500

    Update cmake.yml

commit 1b6620e16f8940386b0f4f04e69e2410d21c0e26
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 20:21:02 2023 -0500

    Update cmake.yml

commit a94bec740c6b42c4b79c87bca20fa87b99bf060d
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:46:35 2023 -0500

    Update cmake.yml

commit 85d6b29d4375a69d575c18ece8542c50f2ddfcc3
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:34:39 2023 -0500

    Update cmake.yml

commit 8c004887cf1435f1a6214c3d2455299a8a27bd4c
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:31:17 2023 -0500

    Update cmake.yml

commit a14a9168e17d9348a53c6e9c9a47ba1edb4c4509
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:25:46 2023 -0500

    Update cmake.yml

commit 000f2f40b84e6a2f7d4becdbf5aed01436ca4c83
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:08:18 2023 -0500

    Update cmake.yml

commit a28a53d56731cad848fa9133d1c4dbaa8fc7afa7
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:03:39 2023 -0500

    Update cmake.yml

commit a6a2db01027f0b01fdfbb5997ddb772c7f51b649
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 18:21:53 2023 -0500

    Update cmake.yml

commit 118ef2a88b2d44e3207c31c343da3e5e5ec6f176
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 17:55:57 2023 -0500

    Update cmake.yml

commit 03c4c232396440cd0be6d2dd7baf4ceea1c2589d
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 17:48:49 2023 -0500

    Create cmake.yml

Change-Id: I77992f15694e77cbae49c56f9ff02f4f9079235d

[ROCm/rocprofiler commit:d4a33cf33a]
Committed by Ammar Elwazir
Parent: e599708211
Commit: 6eb06cf201
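The "Extract branch name" step of the formatting workflow in this commit combines two shell parameter expansions. The following is a minimal sketch of how they behave, not the workflow itself: `branch_from_head_ref` and `strip_heads_prefix` are hypothetical helper names, the argument stands in for `GITHUB_HEAD_REF`, and the sample values are made up.

```shell
# Minimal sketch of the expansions behind the "Extract branch name" step.
# Helper names and sample values are assumptions made for illustration.

# Mirrors the workflow expression, with $1 standing in for GITHUB_HEAD_REF.
branch_from_head_ref() {
    head_ref="$1"
    # ${var:-fallback} yields $var unless it is unset or empty;
    # ${var#refs/heads/} removes a leading "refs/heads/" if present.
    printf '%s\n' "${head_ref:-${head_ref#refs/heads/}}"
}

# Shows the prefix removal on its own.
strip_heads_prefix() {
    ref="$1"
    printf '%s\n' "${ref#refs/heads/}"
}

branch_from_head_ref "feature/formatting"     # prints feature/formatting
strip_heads_prefix "refs/heads/amd-staging"   # prints amd-staging
branch_from_head_ref ""                       # prints an empty line
```

Because the fallback expands the same variable as the primary value, the expression only produces a branch name when `GITHUB_HEAD_REF` is set. GitHub sets it for `pull_request` events, which is the only trigger this workflow declares.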
@@ -0,0 +1,98 @@
parse:
  additional_commands:
    find_package:
      flags:
      - EXACT
      - QUIET
      - MODULE
      - REQUIRED
      - CONFIG
      - NO_MODULE
      - GLOBAL
      - NO_POLICY_SCOPE
      - BYPASS_PROVIDER
      - NO_DEFAULT_PATH
      - NO_PACKAGE_ROOT_PATH
      - NO_CMAKE_PATH
      - NO_CMAKE_ENVIRONMENT_PATH
      - NO_SYSTEM_ENVIRONMENT_PATH
      - NO_CMAKE_PACKAGE_REGISTRY
      - NO_CMAKE_BUILDS_PATH
      - NO_CMAKE_SYSTEM_PATH
      - NO_CMAKE_INSTALL_PREFIX
      - NO_CMAKE_SYSTEM_PACKAGE_REGISTRY
      - CMAKE_FIND_ROOT_PATH_BOTH
      - ONLY_CMAKE_FIND_ROOT_PATH
      - NO_CMAKE_FIND_ROOT_PATH
      kwargs:
        COMPONENTS: '*'
        OPTIONAL_COMPONENTS: '*'
        NAMES: '*'
        CONFIGS: '*'
        HINTS: '*'
        PATHS: '*'
        REGISTRY_VIEW: '*'
        PATH_SUFFIXES: '*'
  override_spec: {}
  vartags: []
  proptags: []
format:
  disable: false
  line_width: 90
  tab_size: 4
  use_tabchars: false
  fractional_tab_policy: use-space
  max_subgroups_hwrap: 2
  max_pargs_hwrap: 8
  max_rows_cmdline: 2
  separate_ctrl_name_with_space: false
  separate_fn_name_with_space: false
  dangle_parens: false
  dangle_align: child
  min_prefix_chars: 4
  max_prefix_chars: 10
  max_lines_hwrap: 2
  line_ending: unix
  command_case: lower
  keyword_case: upper
  always_wrap: []
  enable_sort: true
  autosort: false
  require_valid_layout: false
  layout_passes: {}
markup:
  bullet_char: '*'
  enum_char: .
  first_comment_is_literal: true
  literal_comment_pattern: ^#
  fence_pattern: ^\s*([`~]{3}[`~]*)(.*)$
  ruler_pattern: ^\s*[^\w\s]{3}.*[^\w\s]{3}$
  explicit_trailing_pattern: '#<'
  hashruler_min_length: 10
  canonicalize_hashrulers: true
  enable_markup: true
lint:
  disabled_codes: []
  function_pattern: '[0-9a-z_]+'
  macro_pattern: '[0-9A-Z_]+'
  global_var_pattern: '[A-Z][0-9A-Z_]+'
  internal_var_pattern: _[A-Z][0-9A-Z_]+
  local_var_pattern: '[a-z][a-z0-9_]+'
  private_var_pattern: _[0-9a-z_]+
  public_var_pattern: '[A-Z][0-9A-Z_]+'
  argument_var_pattern: '[a-z][a-z0-9_]+'
  keyword_pattern: '[A-Z][0-9A-Z_]+'
  max_conditionals_custom_parser: 2
  min_statement_spacing: 1
  max_statement_spacing: 2
  max_returns: 6
  max_branches: 12
  max_arguments: 5
  max_localvars: 15
  max_statements: 50
encode:
  emit_byteorder_mark: false
  input_encoding: utf-8
  output_encoding: utf-8
misc:
  per_command: {}
Vendored: +14 -159
@@ -34,16 +34,7 @@ jobs:
    steps:
    - uses: actions/checkout@v3

    - name: Restore cached Build
      id: cache-build-restore
      uses: actions/cache/restore@v3
      with:
        path: |
          ${{github.workspace}}/build
        key: mi200_ubuntu_22_04

    - name: Configure CMake
      if: steps.cache-build-restore.outputs.cache-hit != 'false'
      # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
      # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
      run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -56,14 +47,6 @@ jobs:
    - name: Build Tests, Samples, Documentation, Packages
      run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests samples doc package

    - name: Save Build
      id: cache-build-save
      uses: actions/cache/save@v3
      with:
        path: |
          ${{github.workspace}}/build
        key: mi200_ubuntu_22_04

    - name: Testing V1
      run: |
        cd ${{github.workspace}}/build
@@ -102,16 +85,7 @@ jobs:
    steps:
    - uses: actions/checkout@v3

    - name: Restore cached Build
      id: cache-build-restore
      uses: actions/cache/restore@v3
      with:
        path: |
          ${{github.workspace}}/build
        key: mi200_ubuntu_20_04

    - name: Configure CMake
      if: steps.cache-build-restore.outputs.cache-hit != 'false'
      # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
      # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
      run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -153,16 +127,7 @@ jobs:
    steps:
    - uses: actions/checkout@v3

    - name: Restore cached Build
      id: cache-build-restore
      uses: actions/cache/restore@v3
      with:
        path: |
          ${{github.workspace}}/build
        key: mi200_sles

    - name: Configure CMake
      if: steps.cache-build-restore.outputs.cache-hit != 'false'
      # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
      # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
      run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -175,14 +140,6 @@ jobs:
    - name: Build Tests
      run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests

    - name: Save Build
      id: cache-build-save
      uses: actions/cache/save@v3
      with:
        path: |
          ${{github.workspace}}/build
        key: mi200_sles

    - name: Testing V1
      run: |
        cd ${{github.workspace}}/build
@@ -212,16 +169,7 @@ jobs:
    steps:
    - uses: actions/checkout@v3

    - name: Restore cached Build
      id: cache-build-restore
      uses: actions/cache/restore@v3
      with:
        path: |
          ${{github.workspace}}/build
        key: mi200_rhel_8

    - name: Configure CMake
      if: steps.cache-build-restore.outputs.cache-hit != 'false'
      # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
      # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
      run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -234,14 +182,6 @@ jobs:
    - name: Build Tests
      run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests

    - name: Save Build
      id: cache-build-save
      uses: actions/cache/save@v3
      with:
        path: |
          ${{github.workspace}}/build
        key: mi200_rhel_8

    - name: Testing V1
      run: |
        cd ${{github.workspace}}/build
@@ -271,16 +211,7 @@ jobs:
    steps:
    - uses: actions/checkout@v3

    - name: Restore cached Build
      id: cache-build-restore
      uses: actions/cache/restore@v3
      with:
        path: |
          ${{github.workspace}}/build
        key: mi200_rhel_9

    - name: Configure CMake
      if: steps.cache-build-restore.outputs.cache-hit != 'false'
      # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
      # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
      run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -293,14 +224,6 @@ jobs:
    - name: Build Tests
      run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests

    - name: Save Build
      id: cache-build-save
      uses: actions/cache/save@v3
      with:
        path: |
          ${{github.workspace}}/build
        key: mi200_rhel_9

    - name: Testing V1
      run: |
        cd ${{github.workspace}}/build
@@ -326,16 +249,7 @@ jobs:
    steps:
    - uses: actions/checkout@v3

    - name: Restore cached Build
      id: cache-build-restore
      uses: actions/cache/restore@v3
      with:
        path: |
          ${{github.workspace}}/build
        key: vega20

    - name: Configure CMake
      if: steps.cache-build-restore.outputs.cache-hit != 'false'
      # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
      # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
      run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -348,14 +262,6 @@ jobs:
    - name: Build Tests
      run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests

    - name: Save Build
      id: cache-build-save
      uses: actions/cache/save@v3
      with:
        path: |
          ${{github.workspace}}/build
        key: vega20

    - name: Testing V1
      run: |
        cd ${{github.workspace}}/build
@@ -381,16 +287,7 @@ jobs:
    steps:
    - uses: actions/checkout@v3

    - name: Restore cached Build
      id: cache-build-restore
      uses: actions/cache/restore@v3
      with:
        path: |
          ${{github.workspace}}/build
        key: navi32

    - name: Configure CMake
      if: steps.cache-build-restore.outputs.cache-hit != 'false'
      # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
      # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
      run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -403,14 +300,6 @@ jobs:
    - name: Build Tests
      run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests

    - name: Save Build
      id: cache-build-save
      uses: actions/cache/save@v3
      with:
        path: |
          ${{github.workspace}}/build
        key: navi32

    - name: Testing V1
      run: |
        cd ${{github.workspace}}/build
@@ -436,16 +325,7 @@ jobs:
    steps:
    - uses: actions/checkout@v3

    - name: Restore cached Build
      id: cache-build-restore
      uses: actions/cache/restore@v3
      with:
        path: |
          ${{github.workspace}}/build
        key: mi100

    - name: Configure CMake
      if: steps.cache-build-restore.outputs.cache-hit != 'false'
      # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
      # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
      run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -458,14 +338,6 @@ jobs:
    - name: Build Tests
      run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests

    - name: Save Build
      id: cache-build-save
      uses: actions/cache/save@v3
      with:
        path: |
          ${{github.workspace}}/build
        key: mi100

    - name: Testing V1
      run: |
        cd ${{github.workspace}}/build
@@ -492,16 +364,7 @@ jobs:
# steps:
# - uses: actions/checkout@v3

# - name: Restore cached Build
# id: cache-build-restore
# uses: actions/cache/restore@v3
# with:
# path: |
# ${{github.workspace}}/build
# key: navi21

# - name: Configure CMake
# if: steps.cache-build-restore.outputs.cache-hit != 'false'
# # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
# # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
# run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -514,26 +377,18 @@ jobs:
# - name: Build Tests
# run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests

# - name: Save Build
# id: cache-build-save
# uses: actions/cache/save@v3
# with:
# path: |
# ${{github.workspace}}/build
# key: navi21
# - name: Testing V1
# run: |
# cd ${{github.workspace}}/build
# ./run.sh
# # TODO(aelwazir): Enable this once ctest is fixed
# # working-directory: ${{github.workspace}}/build/tests-v2
# # Execute tests defined by the CMake configuration.
# # See https://cmake.org/cmake/help/latest/manual/ctest.1.html for more detail
# # TODO(aelwazir): Enable this once ctest is fixed
# # run: ctest --parallel 16 -C ${{env.BUILD_TYPE}}

# - name: Testing V1
# run: |
# cd ${{github.workspace}}/build
# ./run.sh
# # TODO(aelwazir): Enable this once ctest is fixed
# # working-directory: ${{github.workspace}}/build/tests-v2
# # Execute tests defined by the CMake configuration.
# # See https://cmake.org/cmake/help/latest/manual/ctest.1.html for more detail
# # TODO(aelwazir): Enable this once ctest is fixed
# # run: ctest --parallel 16 -C ${{env.BUILD_TYPE}}

# - name: Testing V2
# run: |
# cd ${{github.workspace}}/build
# make -j check
# - name: Testing V2
# run: |
# cd ${{github.workspace}}/build
# make -j check
+95
@@ -0,0 +1,95 @@
name: Formatting
run-name: formatting

on:
  pull_request:
    branches: [ amd-staging ]

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  cmake:
    runs-on: ubuntu-22.04

    steps:
    - uses: actions/checkout@v3

    - name: Extract branch name
      shell: bash
      run: |
        echo "branch=${GITHUB_HEAD_REF:-${GITHUB_HEAD_REF#refs/heads/}}" >> $GITHUB_OUTPUT
      id: extract_branch

    - name: Install dependencies
      run: |
        sudo apt-get update
        sudo apt-get install -y python3-pip
        python3 -m pip install -U cmake-format

    - name: Run cmake-format
      run: |
        set +e
        cmake-format -i $(find . -type f | egrep 'CMakeLists.txt|\.cmake$')
        if [ $(git diff | wc -l) -ne 0 ]; then
          echo -e "\nError! CMake code not formatted. Run cmake-format...\n"
          echo -e "\nFiles:\n"
          git diff --name-only
          echo -e "\nFull diff:\n"
          git diff
          exit 1
        fi

    - name: Create pull request
      if: failure()
      uses: peter-evans/create-pull-request@v5
      with:
        commit-message: "run cmake formatting (cmake-format)"
        branch: ${{ steps.extract_branch.outputs.branch }}-cmake-format
        delete-branch: true
        title: "Apply cmake-format to ${{ steps.extract_branch.outputs.branch }}"
        base: ${{ steps.extract_branch.outputs.branch }}

  source:
    runs-on: ubuntu-22.04

    steps:
    - uses: actions/checkout@v3

    - name: Install dependencies
      run: |
        DISTRIB_CODENAME=$(cat /etc/lsb-release | grep DISTRIB_CODENAME | awk -F '=' '{print $NF}')
        sudo apt-get update
        sudo apt-get install -y software-properties-common wget curl clang-format-11

    - name: Extract branch name
      shell: bash
      run: |
        echo "branch=${GITHUB_HEAD_REF:-${GITHUB_HEAD_REF#refs/heads/}}" >> $GITHUB_OUTPUT
      id: extract_branch

    - name: Run clang-format
      run: |
        set +e
        FILES=$(find include plugin samples src test tests-v2 -type f | egrep '\.(h|hpp|hh|c|cc|cpp)(|\.in)$')
        FORMAT_OUT=$(clang-format-11 -i ${FILES})
        if [ $(git diff | wc -l) -ne 0 ]; then
          echo -e "\nError! Code not formatted. Run clang-format (version 11)...\n"
          echo -e "\nFiles:\n"
          git diff --name-only
          echo -e "\nFull diff:\n"
          git diff
          exit 1
        fi

    - name: Create pull request
      if: failure()
      uses: peter-evans/create-pull-request@v5
      with:
        commit-message: "run formatting (clang-format v11)"
        branch: ${{ steps.extract_branch.outputs.branch }}-clang-format
        delete-branch: true
        title: "Apply clang-format (v11) to ${{ steps.extract_branch.outputs.branch }}"
        base: ${{ steps.extract_branch.outputs.branch }}
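Both formatting jobs above choose their input files by piping `find` output through an extended regular expression. The sketch below applies the same two filters to a handful of made-up paths so the selection can be checked without a repository checkout. `select_cmake` and `select_sources` are hypothetical helper names, and the source filter writes the optional suffix as `(\.in)?`, an equivalent spelling of the workflow's `(|\.in)` that avoids the empty alternative.

```shell
# Sketch of the file-selection filters used by the two formatting jobs,
# run against made-up paths instead of real `find` output.
# Helper names and sample paths are assumptions made for illustration.

select_cmake() {
    # Same pattern as the cmake job (egrep is the older spelling of grep -E).
    grep -E 'CMakeLists.txt|\.cmake$'
}

select_sources() {
    # C/C++ sources and headers, optionally carrying a trailing ".in".
    grep -E '\.(h|hpp|hh|c|cc|cpp)(\.in)?$'
}

paths='./CMakeLists.txt
./cmake_modules/utils.cmake
./src/core/session.cpp
./src/api/version.h.in
./README.md'

echo "cmake-format would receive:"
printf '%s\n' "$paths" | select_cmake

echo "clang-format would receive:"
printf '%s\n' "$paths" | select_sources
```

With these sample paths the first filter keeps the two CMake files and the second keeps `session.cpp` and `version.h.in`; `README.md` matches neither.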
@@ -24,7 +24,7 @@ cmake_minimum_required(VERSION 3.18.0)
# Build is not supported on Windows plaform
if(WIN32)
message(FATAL_ERROR "Windows build is not supported.")
message(FATAL_ERROR "Windows build is not supported.")
endif()

# Set module name and project name.
@@ -37,9 +37,9 @@ include(GNUInstallDirs)
# set default ROCM_PATH
if(NOT DEFINED ROCM_PATH)
set(ROCM_PATH
"/opt/rocm"
CACHE STRING "Default ROCM installation directory")
set(ROCM_PATH
"/opt/rocm"
CACHE STRING "Default ROCM installation directory")
endif()

set(CMAKE_CXX_STANDARD 17)
@@ -62,8 +62,8 @@ set(BUILD_VERSION_MAJOR ${VERSION_MAJOR})
set(BUILD_VERSION_MINOR ${VERSION_MINOR})
set(BUILD_VERSION_PATCH ${VERSION_PATCH})
if(DEFINED VERSION_BUILD AND NOT ${VERSION_BUILD} STREQUAL "")
message("VERSION BUILD DEFINED ${VERSION_BUILD}")
set(BUILD_VERSION_PATCH "${BUILD_VERSION_PATCH}-${VERSION_BUILD}")
message("VERSION BUILD DEFINED ${VERSION_BUILD}")
set(BUILD_VERSION_PATCH "${BUILD_VERSION_PATCH}-${VERSION_BUILD}")
endif()
set(BUILD_VERSION_STRING
"${BUILD_VERSION_MAJOR}.${BUILD_VERSION_MINOR}.${BUILD_VERSION_PATCH}")
@@ -71,12 +71,11 @@ set(BUILD_VERSION_STRING
|
||||
set(LIB_VERSION_MAJOR ${VERSION_MAJOR})
|
||||
set(LIB_VERSION_MINOR ${VERSION_MINOR})
|
||||
if(${ROCM_PATCH_VERSION})
|
||||
set(LIB_VERSION_PATCH ${ROCM_PATCH_VERSION})
|
||||
set(LIB_VERSION_PATCH ${ROCM_PATCH_VERSION})
|
||||
else()
|
||||
set(LIB_VERSION_PATCH ${VERSION_PATCH})
|
||||
set(LIB_VERSION_PATCH ${VERSION_PATCH})
|
||||
endif()
|
||||
set(LIB_VERSION_STRING
|
||||
"${LIB_VERSION_MAJOR}.${LIB_VERSION_MINOR}.${LIB_VERSION_PATCH}")
|
||||
set(LIB_VERSION_STRING "${LIB_VERSION_MAJOR}.${LIB_VERSION_MINOR}.${LIB_VERSION_PATCH}")
|
||||
message("-- LIB-VERSION STRING: ${LIB_VERSION_STRING}")
|
||||
|
||||
# Set target and root/lib/test directory
|
||||
@@ -86,97 +85,84 @@ set(LIB_DIR "${ROOT_DIR}/src")
set(TEST_DIR "${ROOT_DIR}/test")

find_package(
amd_comgr
REQUIRED
CONFIG
HINTS
${CMAKE_INSTALL_PREFIX}
PATHS
${ROCM_PATH}
PATH_SUFFIXES
lib/cmake/amd_comgr)
amd_comgr REQUIRED CONFIG
HINTS ${CMAKE_INSTALL_PREFIX}
PATHS ${ROCM_PATH}
PATH_SUFFIXES lib/cmake/amd_comgr)
message(STATUS "Code Object Manager found at ${amd_comgr_DIR}.")
link_libraries(amd_comgr)

find_package(Threads REQUIRED)
find_package(
hsa-runtime64
REQUIRED
CONFIG
HINTS
${CMAKE_INSTALL_PREFIX}
PATHS
${ROCM_PATH})
hsa-runtime64 REQUIRED CONFIG
HINTS ${CMAKE_INSTALL_PREFIX}
PATHS ${ROCM_PATH})
find_package(
HIP
REQUIRED
CONFIG
HINTS
${CMAKE_INSTALL_PREFIX}
PATHS
${ROCM_PATH})
HIP REQUIRED CONFIG
HINTS ${CMAKE_INSTALL_PREFIX}
PATHS ${ROCM_PATH})

find_library(NUMA NAME numa REQUIRED)
link_libraries(${NUMA})

find_program(ROCMINFO_EXEC NAMES "rocminfo"
PATHS ${ROCM_PATH}
${CMAKE_INSTALL_PREFIX} "/usr/local" "/usr"
PATH_SUFFIXES bin)
find_program(
ROCMINFO_EXEC
NAMES "rocminfo"
PATHS ${ROCM_PATH} ${CMAKE_INSTALL_PREFIX} "/usr/local" "/usr"
PATH_SUFFIXES bin)
set(ORIGINAL_SCRIPT_PATH ${CMAKE_CURRENT_SOURCE_DIR}/bin/tblextr.py)
set(OUTPUT_SCRIPT_PATH ${CMAKE_CURRENT_BINARY_DIR}/bin/tblextr.py)
configure_file(${ORIGINAL_SCRIPT_PATH} ${OUTPUT_SCRIPT_PATH} @ONLY)

get_property(
HSA_RUNTIME_INCLUDE_DIRECTORIES
TARGET hsa-runtime64::hsa-runtime64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
HSA_RUNTIME_INCLUDE_DIRECTORIES
TARGET hsa-runtime64::hsa-runtime64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH REQUIRED)
HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH REQUIRED)
get_filename_component(HSA_RUNTIME_INC_PATH ${HSA_H} DIRECTORY)
include_directories(${HSA_RUNTIME_INC_PATH})

if(NOT DEFINED LIBRARY_TYPE)
set(LIBRARY_TYPE SHARED)
set(LIBRARY_TYPE SHARED)
endif()

# Enable tracing API
if(NOT USE_PROF_API)
set(USE_PROF_API 1)
set(USE_PROF_API 1)
endif()

# Protocol header lookup
set(PROF_API_HEADER_NAME prof_protocol.h)
if(USE_PROF_API EQUAL 1)
find_path(
PROF_API_HEADER_DIR ${PROF_API_HEADER_NAME}
HINTS ${PROF_API_HEADER_PATH}
PATHS /opt/rocm/include
PATH_SUFFIXES roctracer/ext)
if(NOT PROF_API_HEADER_DIR)
message(
FATAL_ERROR
"Profiling API header not found. Tracer integration disabled. Use -DPROF_API_HEADER_PATH=<path to ${PROF_API_HEADER_NAME} header>"
)
else()
include_directories(${PROF_API_HEADER_DIR})
message(
STATUS "Profiling API: ${PROF_API_HEADER_DIR}/${PROF_API_HEADER_NAME}")
endif()
find_path(
PROF_API_HEADER_DIR ${PROF_API_HEADER_NAME}
HINTS ${PROF_API_HEADER_PATH}
PATHS /opt/rocm/include
PATH_SUFFIXES roctracer/ext)
if(NOT PROF_API_HEADER_DIR)
message(
FATAL_ERROR
"Profiling API header not found. Tracer integration disabled. Use -DPROF_API_HEADER_PATH=<path to ${PROF_API_HEADER_NAME} header>"
)
else()
include_directories(${PROF_API_HEADER_DIR})
message(STATUS "Profiling API: ${PROF_API_HEADER_DIR}/${PROF_API_HEADER_NAME}")
endif()
endif()

# Build libraries
add_subdirectory(src)

if(${LIBRARY_TYPE} STREQUAL SHARED)
# Build samples
add_subdirectory(samples)
# Build samples
add_subdirectory(samples)

# Build tests
add_subdirectory(tests-v2)
# Build tests
add_subdirectory(tests-v2)
endif()

# Build Plugins
@@ -188,20 +174,20 @@ add_subdirectory(${TEST_DIR} ${PROJECT_BINARY_DIR}/test)
# Installation and packaging
set(DEST_NAME ${ROCPROFILER_NAME})
if(DEFINED CMAKE_INSTALL_PREFIX)
get_filename_component(prefix_name ${CMAKE_INSTALL_PREFIX} NAME)
get_filename_component(prefix_dir ${CMAKE_INSTALL_PREFIX} DIRECTORY)
if(prefix_name STREQUAL ${DEST_NAME})
set(CMAKE_INSTALL_PREFIX ${prefix_dir})
endif()
get_filename_component(prefix_name ${CMAKE_INSTALL_PREFIX} NAME)
get_filename_component(prefix_dir ${CMAKE_INSTALL_PREFIX} DIRECTORY)
if(prefix_name STREQUAL ${DEST_NAME})
set(CMAKE_INSTALL_PREFIX ${prefix_dir})
endif()
endif()
if(DEFINED CPACK_PACKAGING_INSTALL_PREFIX)
get_filename_component(prefix_name ${CPACK_PACKAGING_INSTALL_PREFIX} NAME)
get_filename_component(prefix_dir ${CPACK_PACKAGING_INSTALL_PREFIX} DIRECTORY)
if(prefix_name STREQUAL ${DEST_NAME})
set(CPACK_PACKAGING_INSTALL_PREFIX ${prefix_dir})
endif()
get_filename_component(prefix_name ${CPACK_PACKAGING_INSTALL_PREFIX} NAME)
get_filename_component(prefix_dir ${CPACK_PACKAGING_INSTALL_PREFIX} DIRECTORY)
if(prefix_name STREQUAL ${DEST_NAME})
set(CPACK_PACKAGING_INSTALL_PREFIX ${prefix_dir})
endif()
else()
set(CPACK_PACKAGING_INSTALL_PREFIX ${CMAKE_INSTALL_PREFIX})
set(CPACK_PACKAGING_INSTALL_PREFIX ${CMAKE_INSTALL_PREFIX})
endif()
message("CMake-install-prefix: ${CMAKE_INSTALL_PREFIX}")
message("CPack-install-prefix: ${CPACK_PACKAGING_INSTALL_PREFIX}")
@@ -209,413 +195,395 @@ message("-----------Dest-name: ${DEST_NAME}")

# Install headers
install(
FILES ${CMAKE_CURRENT_SOURCE_DIR}/src/core/activity.h
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}
COMPONENT dev)
FILES ${CMAKE_CURRENT_SOURCE_DIR}/src/core/activity.h
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}
COMPONENT dev)

# rpl_run.sh
install(
FILES ${CMAKE_CURRENT_SOURCE_DIR}/bin/rpl_run.sh
DESTINATION ${CMAKE_INSTALL_BINDIR}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ
WORLD_EXECUTE
RENAME rocprof
COMPONENT runtime)
FILES ${CMAKE_CURRENT_SOURCE_DIR}/bin/rpl_run.sh
DESTINATION ${CMAKE_INSTALL_BINDIR}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE
RENAME rocprof
COMPONENT runtime)

configure_file(bin/rocprofv2 ${PROJECT_BINARY_DIR} COPYONLY)
install(
FILES ${PROJECT_SOURCE_DIR}/bin/rocprofv2
DESTINATION ${CMAKE_INSTALL_BINDIR}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ
WORLD_EXECUTE
COMPONENT runtime)
FILES ${PROJECT_SOURCE_DIR}/bin/rocprofv2
DESTINATION ${CMAKE_INSTALL_BINDIR}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE
COMPONENT runtime)

install(
FILES ${CMAKE_CURRENT_SOURCE_DIR}/bin/txt2xml.sh
${CMAKE_CURRENT_SOURCE_DIR}/bin/merge_traces.sh
${CMAKE_CURRENT_SOURCE_DIR}/bin/txt2params.py
${CMAKE_CURRENT_BINARY_DIR}/bin/tblextr.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/dform.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/mem_manager.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/sqlitedb.py
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/${ROCPROFILER_NAME}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ
WORLD_EXECUTE
COMPONENT runtime)
FILES ${CMAKE_CURRENT_SOURCE_DIR}/bin/txt2xml.sh
${CMAKE_CURRENT_SOURCE_DIR}/bin/merge_traces.sh
${CMAKE_CURRENT_SOURCE_DIR}/bin/txt2params.py
${CMAKE_CURRENT_BINARY_DIR}/bin/tblextr.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/dform.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/mem_manager.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/sqlitedb.py
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/${ROCPROFILER_NAME}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE
COMPONENT runtime)

# gfx_metrics.xml metrics.xml
install(
FILES ${CMAKE_CURRENT_SOURCE_DIR}/test/tool/metrics.xml
${CMAKE_CURRENT_SOURCE_DIR}/test/tool/gfx_metrics.xml
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT runtime)
FILES ${CMAKE_CURRENT_SOURCE_DIR}/test/tool/metrics.xml
${CMAKE_CURRENT_SOURCE_DIR}/test/tool/gfx_metrics.xml
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT runtime)

# librocprof-tool.so
install(
FILES ${PROJECT_BINARY_DIR}/test/librocprof-tool.so
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT runtime)
FILES ${PROJECT_BINARY_DIR}/test/librocprof-tool.so
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT runtime)

install(
FILES ${PROJECT_BINARY_DIR}/test/librocprof-tool.so
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT asan)
FILES ${PROJECT_BINARY_DIR}/test/librocprof-tool.so
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT asan)

install(
FILES ${PROJECT_BINARY_DIR}/test/rocprof-ctrl
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
PERMISSIONS
OWNER_READ
OWNER_WRITE
OWNER_EXECUTE
GROUP_READ
GROUP_EXECUTE
WORLD_READ
WORLD_EXECUTE
COMPONENT runtime)
FILES ${PROJECT_BINARY_DIR}/test/rocprof-ctrl
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
PERMISSIONS OWNER_READ OWNER_WRITE OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ
WORLD_EXECUTE
COMPONENT runtime)

# File reorg backward compatibility for non ASAN packaging
if ( NOT ENABLE_ASAN_PACKAGING )
# File reorg Backward compatibility
option(FILE_REORG_BACKWARD_COMPATIBILITY
"Enable File Reorg with backward compatibility" ON)
if(NOT ENABLE_ASAN_PACKAGING)
# File reorg Backward compatibility
option(FILE_REORG_BACKWARD_COMPATIBILITY
"Enable File Reorg with backward compatibility" ON)
endif()

if(FILE_REORG_BACKWARD_COMPATIBILITY)
# To enable/disable #error in wrapper header files
if(NOT DEFINED ROCM_HEADER_WRAPPER_WERROR)
if(DEFINED ENV{ROCM_HEADER_WRAPPER_WERROR})
set(ROCM_HEADER_WRAPPER_WERROR "$ENV{ROCM_HEADER_WRAPPER_WERROR}"
CACHE STRING "Header wrapper warnings as errors.")
else()
set(ROCM_HEADER_WRAPPER_WERROR "OFF" CACHE STRING "Header wrapper warnings as errors.")
# To enable/disable #error in wrapper header files
if(NOT DEFINED ROCM_HEADER_WRAPPER_WERROR)
if(DEFINED ENV{ROCM_HEADER_WRAPPER_WERROR})
set(ROCM_HEADER_WRAPPER_WERROR
"$ENV{ROCM_HEADER_WRAPPER_WERROR}"
CACHE STRING "Header wrapper warnings as errors.")
else()
set(ROCM_HEADER_WRAPPER_WERROR
"OFF"
CACHE STRING "Header wrapper warnings as errors.")
endif()
endif()
endif()

if(ROCM_HEADER_WRAPPER_WERROR)
set(deprecated_error 1)
else()
set(deprecated_error 0)
endif()
include(rocprofiler-backward-compat.cmake)
endif() #FILE_REORG_BACKWARD_COMPATIBILITY
if(ROCM_HEADER_WRAPPER_WERROR)
set(deprecated_error 1)
else()
set(deprecated_error 0)
endif()
include(rocprofiler-backward-compat.cmake)
endif() # FILE_REORG_BACKWARD_COMPATIBILITY

if(${LIBRARY_TYPE} STREQUAL SHARED)
# Packaging directives
set(CPACK_GENERATOR "DEB" "RPM" "TGZ")
set(ENABLE_LDCONFIG
ON
CACHE BOOL "Set library links and caches using ldconfig.")
set(CPACK_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_PACKAGE_VENDOR "Advanced Micro Devices, Inc.")
set(CPACK_PACKAGE_VERSION_MAJOR ${PROJECT_VERSION_MAJOR})
set(CPACK_PACKAGE_VERSION_MINOR ${PROJECT_VERSION_MINOR})
set(CPACK_PACKAGE_VERSION_PATCH ${PROJECT_VERSION_PATCH})
set(CPACK_PACKAGE_VERSION
"${CPACK_PACKAGE_VERSION_MAJOR}.${CPACK_PACKAGE_VERSION_MINOR}.${CPACK_PACKAGE_VERSION_PATCH}"
)
set(CPACK_PACKAGE_CONTACT
"ROCm Profiler Support <dl.ROCm-Profiler.support@amd.com>")
set(CPACK_PACKAGE_DESCRIPTION_SUMMARY
"ROCPROFILER library for AMD HSA runtime API extension support")
set(CPACK_RESOURCE_FILE_LICENSE "${CMAKE_CURRENT_SOURCE_DIR}/LICENSE")

if(DEFINED ENV{ROCM_LIBPATCH_VERSION})
# Packaging directives
set(CPACK_GENERATOR "DEB" "RPM" "TGZ")
set(ENABLE_LDCONFIG
ON
CACHE BOOL "Set library links and caches using ldconfig.")
set(CPACK_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_PACKAGE_VENDOR "Advanced Micro Devices, Inc.")
set(CPACK_PACKAGE_VERSION_MAJOR ${PROJECT_VERSION_MAJOR})
set(CPACK_PACKAGE_VERSION_MINOR ${PROJECT_VERSION_MINOR})
set(CPACK_PACKAGE_VERSION_PATCH ${PROJECT_VERSION_PATCH})
set(CPACK_PACKAGE_VERSION
"${CPACK_PACKAGE_VERSION}.$ENV{ROCM_LIBPATCH_VERSION}")
message("Using CPACK_PACKAGE_VERSION ${CPACK_PACKAGE_VERSION}")
endif()
"${CPACK_PACKAGE_VERSION_MAJOR}.${CPACK_PACKAGE_VERSION_MINOR}.${CPACK_PACKAGE_VERSION_PATCH}"
)
set(CPACK_PACKAGE_CONTACT "ROCm Profiler Support <dl.ROCm-Profiler.support@amd.com>")
set(CPACK_PACKAGE_DESCRIPTION_SUMMARY
"ROCPROFILER library for AMD HSA runtime API extension support")
set(CPACK_RESOURCE_FILE_LICENSE "${CMAKE_CURRENT_SOURCE_DIR}/LICENSE")

if(DEFINED ENV{ROCM_LIBPATCH_VERSION})
set(CPACK_PACKAGE_VERSION "${CPACK_PACKAGE_VERSION}.$ENV{ROCM_LIBPATCH_VERSION}")
message("Using CPACK_PACKAGE_VERSION ${CPACK_PACKAGE_VERSION}")
endif()

# Debian package specific variable for ASAN
set(CPACK_DEBIAN_ASAN_PACKAGE_NAME "${ROCPROFILER_NAME}-asan")
set(CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS "hsa-rocr-asan, rocm-core-asan")

# Debian package specific variable for ASAN
set ( CPACK_DEBIAN_ASAN_PACKAGE_NAME "${ROCPROFILER_NAME}-asan" )
set ( CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS "hsa-rocr-asan, rocm-core-asan" )
# Install license file
install(
FILES ${CPACK_RESOURCE_FILE_LICENSE}
DESTINATION ${CMAKE_INSTALL_DOCDIR}
COMPONENT runtime)
install(
FILES ${CPACK_RESOURCE_FILE_LICENSE}
DESTINATION ${CMAKE_INSTALL_DOCDIR}-asan
COMPONENT asan)

# Install license file
install(
FILES ${CPACK_RESOURCE_FILE_LICENSE}
DESTINATION ${CMAKE_INSTALL_DOCDIR}
COMPONENT runtime)
install(
FILES ${CPACK_RESOURCE_FILE_LICENSE}
DESTINATION ${CMAKE_INSTALL_DOCDIR}-asan
COMPONENT asan)
# Debian package specific variables
if(DEFINED ENV{CPACK_DEBIAN_PACKAGE_RELEASE})
set(CPACK_DEBIAN_PACKAGE_RELEASE $ENV{CPACK_DEBIAN_PACKAGE_RELEASE})
else()
set(CPACK_DEBIAN_PACKAGE_RELEASE "local")
endif()

# Debian package specific variables
if(DEFINED ENV{CPACK_DEBIAN_PACKAGE_RELEASE})
set(CPACK_DEBIAN_PACKAGE_RELEASE $ENV{CPACK_DEBIAN_PACKAGE_RELEASE})
else()
set(CPACK_DEBIAN_PACKAGE_RELEASE "local")
endif()
message("Using CPACK_DEBIAN_PACKAGE_RELEASE ${CPACK_DEBIAN_PACKAGE_RELEASE}")
set(CPACK_DEB_COMPONENT_INSTALL ON)
set(CPACK_DEBIAN_FILE_NAME "DEB-DEFAULT")
set(CPACK_DEBIAN_RUNTIME_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS
"hsa-rocr-dev, rocm-core, libsystemd-dev, libelf-dev, libnuma-dev, libpciaccess-dev, libxml2-dev"
)
set(CPACK_DEBIAN_DEV_PACKAGE_NAME "${PROJECT_NAME}-dev")
set(CPACK_DEBIAN_DEV_PACKAGE_DEPENDS "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_TESTS_PACKAGE_NAME "${PROJECT_NAME}-tests")
set(CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS "${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_SAMPLES_PACKAGE_NAME "${PROJECT_NAME}-samples")
set(CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS
"${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_DOCS_PACKAGE_NAME "${PROJECT_NAME}-docs")
set(CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS "${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_PLUGINS_PACKAGE_NAME "${PROJECT_NAME}-plugins")
set(CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")

message("Using CPACK_DEBIAN_PACKAGE_RELEASE ${CPACK_DEBIAN_PACKAGE_RELEASE}")
set(CPACK_DEB_COMPONENT_INSTALL ON)
set(CPACK_DEBIAN_FILE_NAME "DEB-DEFAULT")
set(CPACK_DEBIAN_RUNTIME_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS "hsa-rocr-dev, rocm-core, libsystemd-dev, libelf-dev, libnuma-dev, libpciaccess-dev, libxml2-dev")
set(CPACK_DEBIAN_DEV_PACKAGE_NAME "${PROJECT_NAME}-dev")
set(CPACK_DEBIAN_DEV_PACKAGE_DEPENDS
"${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_TESTS_PACKAGE_NAME "${PROJECT_NAME}-tests")
set(CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS
"${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_SAMPLES_PACKAGE_NAME "${PROJECT_NAME}-samples")
set(CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS
"${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_DOCS_PACKAGE_NAME "${PROJECT_NAME}-docs")
set(CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS
"${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_PLUGINS_PACKAGE_NAME "${PROJECT_NAME}-plugins")
set(CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS
"${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_CHANGELOG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md")

set ( CPACK_DEBIAN_CHANGELOG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md" )
# RPM package specific variables
if(DEFINED ENV{CPACK_RPM_PACKAGE_RELEASE})
set(CPACK_RPM_PACKAGE_RELEASE $ENV{CPACK_RPM_PACKAGE_RELEASE})
else()
set(CPACK_RPM_PACKAGE_RELEASE "local")
endif()

# RPM package specific variables
if(DEFINED ENV{CPACK_RPM_PACKAGE_RELEASE})
set(CPACK_RPM_PACKAGE_RELEASE $ENV{CPACK_RPM_PACKAGE_RELEASE})
else()
set(CPACK_RPM_PACKAGE_RELEASE "local")
endif()
message("Using CPACK_RPM_PACKAGE_RELEASE ${CPACK_RPM_PACKAGE_RELEASE}")

message("Using CPACK_RPM_PACKAGE_RELEASE ${CPACK_RPM_PACKAGE_RELEASE}")
set(CPACK_RPM_PACKAGE_LICENSE "MIT")

set(CPACK_RPM_PACKAGE_LICENSE "MIT")
# 'dist' breaks manual builds on debian systems due to empty Provides
execute_process(
COMMAND rpm --eval %{?dist}
RESULT_VARIABLE PROC_RESULT
OUTPUT_VARIABLE EVAL_RESULT
OUTPUT_STRIP_TRAILING_WHITESPACE)
message("RESULT_VARIABLE ${PROC_RESULT} OUTPUT_VARIABLE: ${EVAL_RESULT}")

# 'dist' breaks manual builds on debian systems due to empty Provides
execute_process(
COMMAND rpm --eval %{?dist}
RESULT_VARIABLE PROC_RESULT
OUTPUT_VARIABLE EVAL_RESULT
OUTPUT_STRIP_TRAILING_WHITESPACE)
message("RESULT_VARIABLE ${PROC_RESULT} OUTPUT_VARIABLE: ${EVAL_RESULT}")
if(PROC_RESULT EQUAL "0" AND NOT EVAL_RESULT STREQUAL "")
string(APPEND CPACK_RPM_PACKAGE_RELEASE "%{?dist}")
endif()

if(PROC_RESULT EQUAL "0" AND NOT EVAL_RESULT STREQUAL "")
string(APPEND CPACK_RPM_PACKAGE_RELEASE "%{?dist}")
endif()
set(CPACK_RPM_COMPONENT_INSTALL ON)
set(CPACK_RPM_FILE_NAME "RPM-DEFAULT")
set(CPACK_RPM_RUNTIME_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_RPM_RUNTIME_PACKAGE_REQUIRES
"hsa-rocr-dev, rocm-core, systemd-devel, libpciaccess-devel, libxml2-devel")
set(CPACK_RPM_DEV_PACKAGE_NAME "${PROJECT_NAME}-devel")
set(CPACK_RPM_DEV_PACKAGE_REQUIRES "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_DEV_PACKAGE_PROVIDES "${PROJECT_NAME}-dev")
set(CPACK_RPM_DEV_PACKAGE_OBSOLETES "${PROJECT_NAME}-dev")
set(CPACK_RPM_TESTS_PACKAGE_NAME "${PROJECT_NAME}-tests")
set(CPACK_RPM_TESTS_PACKAGE_REQUIRES "${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_DOCS_PACKAGE_NAME "${PROJECT_NAME}-docs")
set(CPACK_RPM_DOCS_PACKAGE_REQUIRES "${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_PLUGINS_PACKAGE_NAME "${PROJECT_NAME}-plugins")
set(CPACK_RPM_PLUGINS_PACKAGE_REQUIRES "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_PACKAGE_AUTOREQ 0)
set(CPACK_RPM_SAMPLES_PACKAGE_NAME "${PROJECT_NAME}-samples")
set(CPACK_RPM_SAMPLES_PACKAGE_REQUIRES
"${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core, hip-runtime-amd")
message("CPACK_RPM_PACKAGE_RELEASE: ${CPACK_RPM_PACKAGE_RELEASE}")

set(CPACK_RPM_COMPONENT_INSTALL ON)
set(CPACK_RPM_FILE_NAME "RPM-DEFAULT")
set(CPACK_RPM_RUNTIME_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_RPM_RUNTIME_PACKAGE_REQUIRES "hsa-rocr-dev, rocm-core, systemd-devel, libpciaccess-devel, libxml2-devel")
set(CPACK_RPM_DEV_PACKAGE_NAME "${PROJECT_NAME}-devel")
set(CPACK_RPM_DEV_PACKAGE_REQUIRES "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_DEV_PACKAGE_PROVIDES "${PROJECT_NAME}-dev")
set(CPACK_RPM_DEV_PACKAGE_OBSOLETES "${PROJECT_NAME}-dev")
set(CPACK_RPM_TESTS_PACKAGE_NAME "${PROJECT_NAME}-tests")
set(CPACK_RPM_TESTS_PACKAGE_REQUIRES
"${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_DOCS_PACKAGE_NAME "${PROJECT_NAME}-docs")
set(CPACK_RPM_DOCS_PACKAGE_REQUIRES
"${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_PLUGINS_PACKAGE_NAME "${PROJECT_NAME}-plugins")
set(CPACK_RPM_PLUGINS_PACKAGE_REQUIRES
"${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_PACKAGE_AUTOREQ 0)
set(CPACK_RPM_SAMPLES_PACKAGE_NAME "${PROJECT_NAME}-samples")
set(CPACK_RPM_SAMPLES_PACKAGE_REQUIRES
"${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core, hip-runtime-amd")
message("CPACK_RPM_PACKAGE_RELEASE: ${CPACK_RPM_PACKAGE_RELEASE}")

#Disable build id for rocprofiler as its creating transaction error
set ( CPACK_RPM_SPEC_MORE_DEFINE "%define _build_id_links none
# Disable build id for rocprofiler as its creating transaction error
set(CPACK_RPM_SPEC_MORE_DEFINE
"%define _build_id_links none
%global __strip ${CPACK_STRIP_EXECUTABLE}
%global __objdump ${CPACK_OBJDUMP_EXECUTABLE}
%global __objcopy ${CPACK_OBJCOPY_EXECUTABLE}
%global __readelf ${CPACK_READELF_EXECUTABLE}")

# RPM package specific variable for ASAN
set ( CPACK_RPM_ASAN_PACKAGE_NAME "${ROCPROFILER_NAME}-asan" )
set ( CPACK_RPM_ASAN_PACKAGE_REQUIRES "hsa-rocr-asan, rocm-core-asan" )

#set ( CPACK_RPM_CHANGELOG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md" )
# RPM package specific variable for ASAN
set(CPACK_RPM_ASAN_PACKAGE_NAME "${ROCPROFILER_NAME}-asan")
set(CPACK_RPM_ASAN_PACKAGE_REQUIRES "hsa-rocr-asan, rocm-core-asan")

# Remove dependency on rocm-core if -DROCM_DEP_ROCMCORE=ON not given to cmake
if(NOT ROCM_DEP_ROCMCORE)
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_RUNTIME_PACKAGE_REQUIRES
${CPACK_RPM_RUNTIME_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DEV_PACKAGE_REQUIRES
${CPACK_RPM_DEV_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_RPM_ASAN_PACKAGE_REQUIRES
${CPACK_RPM_ASAN_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_TESTS_PACKAGE_REQUIRES
${CPACK_RPM_TESTS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_SAMPLES_PACKAGE_REQUIRES
${CPACK_RPM_SAMPLES_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DOCS_PACKAGE_REQUIRES
${CPACK_RPM_DOCS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_PLUGINS_PACKAGE_REQUIRES
${CPACK_RPM_PLUGINS_PACKAGE_REQUIRES})
string(REGEX
REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS
${CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DEV_PACKAGE_DEPENDS
${CPACK_DEBIAN_DEV_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS
${CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS
${CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS})
string(REGEX
REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS
${CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS
${CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS})
string(REGEX
REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS
${CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS})
endif()
# set ( CPACK_RPM_CHANGELOG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md" )

## set components
if(ENABLE_ASAN_PACKAGING)
# ASAN Package requires only asan component with libraries and license file
set(CPACK_COMPONENTS_ALL asan)
else()
set(CPACK_COMPONENTS_ALL runtime dev tests docs plugins samples)
endif()
# Remove dependency on rocm-core if -DROCM_DEP_ROCMCORE=ON not given to cmake
if(NOT ROCM_DEP_ROCMCORE)
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_RUNTIME_PACKAGE_REQUIRES
${CPACK_RPM_RUNTIME_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DEV_PACKAGE_REQUIRES
${CPACK_RPM_DEV_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_RPM_ASAN_PACKAGE_REQUIRES
${CPACK_RPM_ASAN_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_TESTS_PACKAGE_REQUIRES
${CPACK_RPM_TESTS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_SAMPLES_PACKAGE_REQUIRES
${CPACK_RPM_SAMPLES_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DOCS_PACKAGE_REQUIRES
${CPACK_RPM_DOCS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_PLUGINS_PACKAGE_REQUIRES
${CPACK_RPM_PLUGINS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS
${CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DEV_PACKAGE_DEPENDS
${CPACK_DEBIAN_DEV_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS
${CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS
${CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS
${CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS
${CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS
${CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS})
endif()

include(CPack)
# set components
if(ENABLE_ASAN_PACKAGING)
# ASAN Package requires only asan component with libraries and license file
set(CPACK_COMPONENTS_ALL asan)
else()
set(CPACK_COMPONENTS_ALL runtime dev tests docs plugins samples)
endif()

cpack_add_component(
runtime
DISPLAY_NAME "Runtime"
DESCRIPTION "Dynamic libraries for the ROCProfiler")
include(CPack)

cpack_add_component(
dev
DISPLAY_NAME "Development"
DESCRIPTION "Development needed header files for ROCProfiler"
DEPENDS runtime)
cpack_add_component(
runtime
DISPLAY_NAME "Runtime"
DESCRIPTION "Dynamic libraries for the ROCProfiler")

cpack_add_component(
plugins
DISPLAY_NAME "ROCProfile Plugins"
DESCRIPTION "Plugins for handling ROCProfiler data output"
DEPENDS runtime)
cpack_add_component(
dev
DISPLAY_NAME "Development"
DESCRIPTION "Development needed header files for ROCProfiler"
DEPENDS runtime)

cpack_add_component(
tests
DISPLAY_NAME "Tests"
DESCRIPTION "Tests for the ROCProfiler"
DEPENDS dev)
cpack_add_component(
plugins
DISPLAY_NAME "ROCProfile Plugins"
DESCRIPTION "Plugins for handling ROCProfiler data output"
DEPENDS runtime)

cpack_add_component(
samples
DISPLAY_NAME "Samples"
DESCRIPTION "Samples for the ROCProfiler"
DEPENDS dev)
cpack_add_component(
tests
DISPLAY_NAME "Tests"
DESCRIPTION "Tests for the ROCProfiler"
DEPENDS dev)

cpack_add_component(
docs
DISPLAY_NAME "Documentation"
DESCRIPTION "Documentation for the ROCProfiler API"
DEPENDS dev)
cpack_add_component(
samples
DISPLAY_NAME "Samples"
DESCRIPTION "Samples for the ROCProfiler"
DEPENDS dev)

cpack_add_component(
asan
DISPLAY_NAME "ASAN"
DESCRIPTION "ASAN libraries for the ROCPROFILER"
DEPENDS asan)
cpack_add_component(
docs
DISPLAY_NAME "Documentation"
DESCRIPTION "Documentation for the ROCProfiler API"
DEPENDS dev)

cpack_add_component(
asan
DISPLAY_NAME "ASAN"
DESCRIPTION "ASAN libraries for the ROCPROFILER"
DEPENDS asan)
endif()

find_package(Doxygen)

if(DOXYGEN_FOUND)
# # Set input and output files for API Document
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_API.in)
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/Doxyfile_API)
# # Set input and output files for API Document
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_API.in)
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/Doxyfile_API)

# # Request to configure the file
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
# # Request to configure the file
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)

add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/doc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/doc/latex pdf
MAIN_DEPENDENCY ${DOXYGEN_OUT}
${DOXYGEN_IN}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler_plugin.h
${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler.h
COMMENT "Generating API documentation")
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/doc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/doc/latex pdf
MAIN_DEPENDENCY ${DOXYGEN_OUT}
${DOXYGEN_IN}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler_plugin.h
${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler.h
COMMENT "Generating API documentation")

add_custom_target(
doc DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/doc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf)
add_custom_target(doc DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/doc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf)

install(
FILES "${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf"
DESTINATION ${CMAKE_INSTALL_DOCDIR}
RENAME "${PROJECT_NAME}v2_api_spec.pdf"
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
install(
|
||||
FILES "${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}
|
||||
RENAME "${PROJECT_NAME}v2_api_spec.pdf"
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
|
||||
install(
|
||||
DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/doc/html/"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}/html
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
install(
|
||||
DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/doc/html/"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}/html
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
|
||||
# # Set input and output files for Tools Document
|
||||
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_Tool.in)
|
||||
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/Doxyfile_Tool)
|
||||
# # Set input and output files for Tools Document
|
||||
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_Tool.in)
|
||||
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/Doxyfile_Tool)
|
||||
|
||||
# # Request to configure the file
|
||||
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
|
||||
# # Request to configure the file
|
||||
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
|
||||
|
||||
add_custom_command(
|
||||
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/index.html
|
||||
${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf
|
||||
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
|
||||
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex pdf
|
||||
MAIN_DEPENDENCY ${DOXYGEN_OUT}
|
||||
${DOXYGEN_IN}
|
||||
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/doc/rocprofv2_tool.md
|
||||
COMMENT "Generating Tools documentation")
|
||||
add_custom_command(
|
||||
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/index.html
|
||||
${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf
|
||||
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
|
||||
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex pdf
|
||||
MAIN_DEPENDENCY ${DOXYGEN_OUT}
|
||||
${DOXYGEN_IN}
|
||||
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/doc/rocprofv2_tool.md
|
||||
COMMENT "Generating Tools documentation")
|
||||
|
||||
add_custom_target(
|
||||
doc_tool DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/index.html
|
||||
${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf)
|
||||
add_custom_target(
|
||||
doc_tool DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/index.html
|
||||
${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf)
|
||||
|
||||
install(
|
||||
FILES "${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}
|
||||
RENAME "${PROJECT_NAME}v2_tool.pdf"
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
install(
|
||||
FILES "${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}
|
||||
RENAME "${PROJECT_NAME}v2_tool.pdf"
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
|
||||
install(
|
||||
DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}/html
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
install(
|
||||
DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}/html
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
|
||||
# # Set input and output files for changelog document
|
||||
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_ChangeLog.in)
|
||||
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/Doxyfile_ChangeLog)
|
||||
# # Set input and output files for changelog document
|
||||
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_ChangeLog.in)
|
||||
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/Doxyfile_ChangeLog)
|
||||
|
||||
# # Request to configure the file
|
||||
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
|
||||
# # Request to configure the file
|
||||
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
|
||||
|
||||
add_custom_command(
|
||||
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf
|
||||
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
|
||||
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex pdf
|
||||
MAIN_DEPENDENCY ${DOXYGEN_OUT}
|
||||
${DOXYGEN_IN}
|
||||
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md
|
||||
COMMENT "Generating changelog documentation")
|
||||
add_custom_command(
|
||||
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf
|
||||
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
|
||||
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex pdf
|
||||
MAIN_DEPENDENCY ${DOXYGEN_OUT}
|
||||
${DOXYGEN_IN}
|
||||
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md
|
||||
COMMENT "Generating changelog documentation")
|
||||
|
||||
add_custom_target(
|
||||
doc_changelog DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf)
|
||||
add_custom_target(doc_changelog
|
||||
DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf)
|
||||
|
||||
install(
|
||||
FILES "${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}
|
||||
RENAME "${PROJECT_NAME}_ChangeLog.pdf"
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
install(
|
||||
FILES "${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}
|
||||
RENAME "${PROJECT_NAME}_ChangeLog.pdf"
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
|
||||
add_dependencies(doc doc_changelog)
|
||||
add_dependencies(doc doc_changelog)
|
||||
endif()
|
||||
|
||||
|
||||
@@ -5,23 +5,16 @@
 # - LIBDW_INCLUDE_DIRS - the libelf include directory
 # - LIBDW_LIBRARIES - Link these to use libelf
 # - LIBDW_DEFINITIONS - Compiler switches required for using libelf
-find_path(FIND_LIBDW_INCLUDES
-NAMES
-elfutils/libdw.h
-PATHS
-/usr/include
-/usr/local/include)
+find_path(
+FIND_LIBDW_INCLUDES
+NAMES elfutils/libdw.h
+PATHS /usr/include /usr/local/include)

-find_library(FIND_LIBDW_LIBRARIES
-NAMES
-dw
-PATH
-/usr/lib
-/usr/local/lib)
+find_library(FIND_LIBDW_LIBRARIES NAMES dw PATH /usr/lib /usr/local/lib)

 include(FindPackageHandleStandardArgs)
-find_package_handle_standard_args(LibDw DEFAULT_MSG
-FIND_LIBDW_INCLUDES FIND_LIBDW_LIBRARIES)
+find_package_handle_standard_args(LibDw DEFAULT_MSG FIND_LIBDW_INCLUDES
+FIND_LIBDW_LIBRARIES)
 mark_as_advanced(FIND_LIBDW_INCLUDES FIND_LIBDW_LIBRARIES)

 set(LIBDW_INCLUDES ${FIND_LIBDW_INCLUDES})

@@ -5,25 +5,16 @@
 # - LIBELF_INCLUDE_DIRS - the libelf include directory
 # - LIBELF_LIBRARIES - Link these to use libelf
 # - LIBELF_DEFINITIONS - Compiler switches required for using libelf
-find_path(FIND_LIBELF_INCLUDES
-NAMES
-libelf.h
-PATHS
-/usr/include
-/usr/include/libelf
-/usr/local/include
-/usr/local/include/libelf)
+find_path(
+FIND_LIBELF_INCLUDES
+NAMES libelf.h
+PATHS /usr/include /usr/include/libelf /usr/local/include /usr/local/include/libelf)

-find_library(FIND_LIBELF_LIBRARIES
-NAMES
-elf
-PATH
-/usr/lib
-/usr/local/lib)
+find_library(FIND_LIBELF_LIBRARIES NAMES elf PATH /usr/lib /usr/local/lib)

 include(FindPackageHandleStandardArgs)
-find_package_handle_standard_args(LibElf DEFAULT_MSG
-FIND_LIBELF_INCLUDES FIND_LIBELF_LIBRARIES)
+find_package_handle_standard_args(LibElf DEFAULT_MSG FIND_LIBELF_INCLUDES
+FIND_LIBELF_LIBRARIES)
 mark_as_advanced(FIND_LIBELF_INCLUDES FIND_LIBELF_LIBRARIES)

 set(LIBELF_INCLUDES ${FIND_LIBELF_INCLUDES})

@@ -20,60 +20,75 @@
 # THE SOFTWARE.
 ################################################################################

-## Linux Compiler options
+# Linux Compiler options
 set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fms-extensions")

-add_definitions ( -DNEW_TRACE_API=1 )
+add_definitions(-DNEW_TRACE_API=1)

-## CLANG options
+# CLANG options
 if("$ENV{CXX}" STREQUAL "/usr/bin/clang++")
-set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -ferror-limit=1000000")
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -ferror-limit=1000000")
 endif()

-## Enable debug trace
-if ( DEFINED ENV{CMAKE_DEBUG_TRACE} )
-add_definitions ( -DDEBUG_TRACE=1 )
+# Enable debug trace
+if(DEFINED ENV{CMAKE_DEBUG_TRACE})
+add_definitions(-DDEBUG_TRACE=1)
 endif()

-## Enable AQL-profile new API
-if ( NOT DEFINED ENV{CMAKE_CURR_API} )
-add_definitions ( -DAQLPROF_NEW_API=1 )
+# Enable AQL-profile new API
+if(NOT DEFINED ENV{CMAKE_CURR_API})
+add_definitions(-DAQLPROF_NEW_API=1)
 endif()

-## Enable direct loading of AQL-profile HSA extension
-if ( DEFINED ENV{CMAKE_LD_AQLPROFILE} )
-add_definitions ( -DROCP_LD_AQLPROFILE=1 )
+# Enable direct loading of AQL-profile HSA extension
+if(DEFINED ENV{CMAKE_LD_AQLPROFILE})
+add_definitions(-DROCP_LD_AQLPROFILE=1)
 endif()

-## Find hsa-runtime
-find_package(hsa-runtime64 CONFIG REQUIRED HINTS ${CMAKE_PREFIX_PATH} PATHS /opt/rocm PATH_SUFFIXES lib/cmake/hsa-runtime64 )
+# Find hsa-runtime
+find_package(
+hsa-runtime64 CONFIG REQUIRED
+HINTS ${CMAKE_PREFIX_PATH}
+PATHS /opt/rocm
+PATH_SUFFIXES lib/cmake/hsa-runtime64)

 # find KFD thunk
-find_package(hsakmt CONFIG REQUIRED HINTS ${CMAKE_PREFIX_PATH} PATHS /opt/rocm PATH_SUFFIXES lib/cmake/hsakmt )
+find_package(
+hsakmt CONFIG REQUIRED
+HINTS ${CMAKE_PREFIX_PATH}
+PATHS /opt/rocm
+PATH_SUFFIXES lib/cmake/hsakmt)

-## Find ROCm
-## TODO: Need a better method to find the ROCm path
-find_path ( HSA_KMT_INC_PATH "hsakmt/hsakmt.h" HINTS ${CMAKE_PREFIX_PATH} PATHS /opt/rocm PATH_SUFFIXES include )
-if ( "${HSA_KMT_INC_PATH}" STREQUAL "" )
-get_target_property(HSA_KMT_INC_PATH hsakmt::hsakmt INTERFACE_INCLUDE_DIRECTORIES)
+# Find ROCm TODO: Need a better method to find the ROCm path
+find_path(
+HSA_KMT_INC_PATH "hsakmt/hsakmt.h"
+HINTS ${CMAKE_PREFIX_PATH}
+PATHS /opt/rocm
+PATH_SUFFIXES include)
+if("${HSA_KMT_INC_PATH}" STREQUAL "")
+get_target_property(HSA_KMT_INC_PATH hsakmt::hsakmt INTERFACE_INCLUDE_DIRECTORIES)
 endif()
-## Include path: /opt/rocm-ver/include. Go up one level to get ROCm path
-get_filename_component ( ROCM_ROOT_DIR "${HSA_KMT_INC_PATH}" DIRECTORY )
+# Include path: /opt/rocm-ver/include. Go up one level to get ROCm path
+get_filename_component(ROCM_ROOT_DIR "${HSA_KMT_INC_PATH}" DIRECTORY)

-## Basic Tool Chain Information
-message ( "----------Build-Type: ${CMAKE_BUILD_TYPE}" )
-message ( "------------Compiler: ${CMAKE_CXX_COMPILER}" )
-message ( "----Compiler-Version: ${CMAKE_CXX_COMPILER_VERSION}" )
-message ( "-------ROCM_ROOT_DIR: ${ROCM_ROOT_DIR}" )
-message ( "-----CMAKE_CXX_FLAGS: ${CMAKE_CXX_FLAGS}" )
-message ( "---CMAKE_PREFIX_PATH: ${CMAKE_PREFIX_PATH}" )
-message ( "---------GPU_TARGETS: ${GPU_TARGETS}" )
+# Basic Tool Chain Information
+message("----------Build-Type: ${CMAKE_BUILD_TYPE}")
+message("------------Compiler: ${CMAKE_CXX_COMPILER}")
+message("----Compiler-Version: ${CMAKE_CXX_COMPILER_VERSION}")
+message("-------ROCM_ROOT_DIR: ${ROCM_ROOT_DIR}")
+message("-----CMAKE_CXX_FLAGS: ${CMAKE_CXX_FLAGS}")
+message("---CMAKE_PREFIX_PATH: ${CMAKE_PREFIX_PATH}")
+message("---------GPU_TARGETS: ${GPU_TARGETS}")

-if ( "${ROCM_ROOT_DIR}" STREQUAL "" )
-message ( FATAL_ERROR "ROCM_ROOT_DIR is not found." )
-endif ()
+if("${ROCM_ROOT_DIR}" STREQUAL "")
+message(FATAL_ERROR "ROCM_ROOT_DIR is not found.")
+endif()

-find_library(FIND_AQL_PROFILE_LIB "libhsa-amd-aqlprofile64.so" HINTS ${CMAKE_PREFIX_PATH} PATHS ${ROCM_ROOT_DIR} PATH_SUFFIXES lib REQUIRED)
+find_library(
+FIND_AQL_PROFILE_LIB "libhsa-amd-aqlprofile64.so"
+HINTS ${CMAKE_PREFIX_PATH}
+PATHS ${ROCM_ROOT_DIR}
+PATH_SUFFIXES lib REQUIRED)
 if(NOT FIND_AQL_PROFILE_LIB)
-message("AQL_PROFILE not installed. Please install AQL_PROFILE")
+message("AQL_PROFILE not installed. Please install AQL_PROFILE")
 endif()

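The hunk above derives `ROCM_ROOT_DIR` by taking the directory that contains the hsakmt include path ("Go up one level to get ROCm path"). As a rough illustration of that single step, here is a minimal C sketch. `parent_directory` is a hypothetical helper written for this note, not part of the build scripts, and it only handles simple paths without a trailing slash.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from the repository): copies the parent directory
 * of `path` into `out`, roughly what the hunk above does with
 * get_filename_component(... DIRECTORY) on /opt/rocm-ver/include.
 * Returns 0 on success, -1 if `out` is too small. A path with no '/' yields
 * an empty string; a path directly under the root yields "/". */
int parent_directory(const char* path, char* out, size_t out_size) {
    assert(path != NULL && out != NULL);

    const char* slash = strrchr(path, '/');
    size_t len = 0;
    if (slash != NULL) {
        /* Keep the leading '/' when the parent is the root itself. */
        len = (slash == path) ? 1 : (size_t)(slash - path);
    }
    if (len + 1 > out_size) {
        return -1;
    }
    memcpy(out, path, len);
    out[len] = '\0';
    return 0;
}
```

Called with "/opt/rocm-ver/include" and a large enough buffer, the helper writes "/opt/rocm-ver", which is the value the CMake code then checks for emptiness before failing with `FATAL_ERROR`.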
@@ -20,77 +20,95 @@
 # THE SOFTWARE.
 ################################################################################

-## Parses the VERSION_STRING variable and places
-## the first, second and third number values in
-## the major, minor and patch variables.
-function( parse_version VERSION_STRING )
+# Parses the VERSION_STRING variable and places the first, second and third number values
+# in the major, minor and patch variables.
+function(parse_version VERSION_STRING)

-string ( FIND ${VERSION_STRING} "-" STRING_INDEX )
+string(FIND ${VERSION_STRING} "-" STRING_INDEX)

-if ( ${STRING_INDEX} GREATER -1 )
-math ( EXPR STRING_INDEX "${STRING_INDEX} + 1" )
-string ( SUBSTRING ${VERSION_STRING} ${STRING_INDEX} -1 VERSION_BUILD )
-endif ()
+if(${STRING_INDEX} GREATER -1)
+math(EXPR STRING_INDEX "${STRING_INDEX} + 1")
+string(SUBSTRING ${VERSION_STRING} ${STRING_INDEX} -1 VERSION_BUILD)
+endif()

-string ( REGEX MATCHALL "[0123456789]+" VERSIONS ${VERSION_STRING} )
-list ( LENGTH VERSIONS VERSION_COUNT )
+string(REGEX MATCHALL "[0123456789]+" VERSIONS ${VERSION_STRING})
+list(LENGTH VERSIONS VERSION_COUNT)

-if ( ${VERSION_COUNT} GREATER 0)
-list ( GET VERSIONS 0 MAJOR )
-set ( VERSION_MAJOR ${MAJOR} PARENT_SCOPE )
-set ( TEMP_VERSION_STRING "${MAJOR}" )
-endif ()
+if(${VERSION_COUNT} GREATER 0)
+list(GET VERSIONS 0 MAJOR)
+set(VERSION_MAJOR
+${MAJOR}
+PARENT_SCOPE)
+set(TEMP_VERSION_STRING "${MAJOR}")
+endif()

-if ( ${VERSION_COUNT} GREATER 1 )
-list ( GET VERSIONS 1 MINOR )
-set ( VERSION_MINOR ${MINOR} PARENT_SCOPE )
-set ( TEMP_VERSION_STRING "${TEMP_VERSION_STRING}.${MINOR}" )
-endif ()
+if(${VERSION_COUNT} GREATER 1)
+list(GET VERSIONS 1 MINOR)
+set(VERSION_MINOR
+${MINOR}
+PARENT_SCOPE)
+set(TEMP_VERSION_STRING "${TEMP_VERSION_STRING}.${MINOR}")
+endif()

-if ( ${VERSION_COUNT} GREATER 2 )
-list ( GET VERSIONS 2 PATCH )
-set ( VERSION_PATCH ${PATCH} PARENT_SCOPE )
-set ( TEMP_VERSION_STRING "${TEMP_VERSION_STRING}.${PATCH}" )
-endif ()
+if(${VERSION_COUNT} GREATER 2)
+list(GET VERSIONS 2 PATCH)
+set(VERSION_PATCH
+${PATCH}
+PARENT_SCOPE)
+set(TEMP_VERSION_STRING "${TEMP_VERSION_STRING}.${PATCH}")
+endif()

-if ( DEFINED VERSION_BUILD )
-set ( VERSION_BUILD "${VERSION_BUILD}" PARENT_SCOPE )
-endif ()
+if(DEFINED VERSION_BUILD)
+set(VERSION_BUILD
+"${VERSION_BUILD}"
+PARENT_SCOPE)
+endif()

-set ( VERSION_STRING "${TEMP_VERSION_STRING}" PARENT_SCOPE )
-
-endfunction ()
-
-## Gets the current version of the repository
-## using versioning tags and git describe.
-## Passes back a packaging version string
-## and a library version string.
-function ( get_version DEFAULT_VERSION_STRING )
-
-parse_version ( ${DEFAULT_VERSION_STRING} )
-
-find_program ( GIT NAMES git )
-
-if ( GIT )
-
-execute_process ( COMMAND "git describe --dirty --long --match [0-9]* 2>/dev/null"
-WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
-OUTPUT_VARIABLE GIT_TAG_STRING
-OUTPUT_STRIP_TRAILING_WHITESPACE
-RESULT_VARIABLE RESULT )
-
-if ( ${RESULT} EQUAL 0 )
-
-parse_version ( ${GIT_TAG_STRING} )
-
-endif ()
-
-endif ()
-
-set( VERSION_STRING "${VERSION_STRING}" PARENT_SCOPE )
-set( VERSION_MAJOR "${VERSION_MAJOR}" PARENT_SCOPE )
-set( VERSION_MINOR "${VERSION_MINOR}" PARENT_SCOPE )
-set( VERSION_PATCH "${VERSION_PATCH}" PARENT_SCOPE )
-set( VERSION_BUILD "${VERSION_BUILD}" PARENT_SCOPE )
+set(VERSION_STRING
+"${TEMP_VERSION_STRING}"
+PARENT_SCOPE)
+
+endfunction()
+
+# Gets the current version of the repository using versioning tags and git describe.
+# Passes back a packaging version string and a library version string.
+function(get_version DEFAULT_VERSION_STRING)
+
+parse_version(${DEFAULT_VERSION_STRING})
+
+find_program(GIT NAMES git)
+
+if(GIT)
+
+execute_process(
+COMMAND "git describe --dirty --long --match [0-9]* 2>/dev/null"
+WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
+OUTPUT_VARIABLE GIT_TAG_STRING
+OUTPUT_STRIP_TRAILING_WHITESPACE
+RESULT_VARIABLE RESULT)
+
+if(${RESULT} EQUAL 0)
+
+parse_version(${GIT_TAG_STRING})
+
+endif()
+
+endif()
+
+set(VERSION_STRING
+"${VERSION_STRING}"
+PARENT_SCOPE)
+set(VERSION_MAJOR
+"${VERSION_MAJOR}"
+PARENT_SCOPE)
+set(VERSION_MINOR
+"${VERSION_MINOR}"
+PARENT_SCOPE)
+set(VERSION_PATCH
+"${VERSION_PATCH}"
+PARENT_SCOPE)
+set(VERSION_BUILD
+"${VERSION_BUILD}"
+PARENT_SCOPE)

 endfunction()

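The `parse_version` function in the hunk above takes the text after the first `-` as the build suffix, collects every run of digits in the string, and uses the first three runs as major, minor and patch. The following C sketch restates that logic for readers less familiar with CMake string commands. It is an illustration under assumptions: the `version_info` type and its field names are invented here, and the numeric conversion drops leading zeros, which the CMake string handling would keep.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result type; the names loosely echo the CMake variables. */
typedef struct {
    int count;         /* how many of major/minor/patch were found (0..3) */
    long number[3];    /* major, minor, patch */
    const char* build; /* text after the first '-', or NULL if there is none */
    char string[64];   /* rebuilt "major.minor.patch" */
} version_info;

version_info parse_version(const char* text) {
    assert(text != NULL);
    version_info v = {0, {0, 0, 0}, NULL, ""};

    /* Like string(FIND ... "-"): the build suffix starts after the first dash. */
    const char* dash = strchr(text, '-');
    if (dash != NULL) {
        v.build = dash + 1;
    }

    /* Like string(REGEX MATCHALL "[0123456789]+"): scan the whole string for
     * digit runs, but only the first three are used. */
    const char* p = text;
    while (*p != '\0' && v.count < 3) {
        if (isdigit((unsigned char)*p)) {
            long value = 0;
            while (isdigit((unsigned char)*p)) {
                value = value * 10 + (*p - '0');
                ++p;
            }
            v.number[v.count++] = value;
        } else {
            ++p;
        }
    }

    /* Rebuild the dotted string from however many components were found. */
    size_t used = 0;
    for (int i = 0; i < v.count; ++i) {
        used += (size_t)snprintf(v.string + used, sizeof v.string - used,
                                 i == 0 ? "%ld" : ".%ld", v.number[i]);
    }
    return v;
}
```

Because the digit scan covers the whole string, an input such as `2.1-rc3` would pick up the `3` from the suffix as the patch number. The CMake version shares this behavior, since its regex is also applied to the full `VERSION_STRING`.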
@@ -164,12 +164,12 @@ typedef struct {

 // Profiling feature type
 typedef struct {
-rocprofiler_feature_kind_t kind; // feature kind
+rocprofiler_feature_kind_t kind; // feature kind
 union {
-const char* name; // feature name
+const char* name; // feature name
 struct {
-const char* block; // counter block name
-uint32_t event; // counter event id
+const char* block; // counter block name
+uint32_t event; // counter event id
 } counter;
 };
 const rocprofiler_parameter_t* parameters; // feature parameters array
@@ -216,23 +216,25 @@ typedef struct {
 } rocprofiler_properties_t;

 // Create new profiling context
-hsa_status_t rocprofiler_open(hsa_agent_t agent, // GPU handle
-rocprofiler_feature_t* features, // [in] profiling features array
-uint32_t feature_count, // profiling info count
-rocprofiler_t** context, // [out] context object
-uint32_t mode, // profiling mode mask
+hsa_status_t rocprofiler_open(hsa_agent_t agent, // GPU handle
+rocprofiler_feature_t* features, // [in] profiling features array
+uint32_t feature_count, // profiling info count
+rocprofiler_t** context, // [out] context object
+uint32_t mode, // profiling mode mask
 rocprofiler_properties_t* properties); // profiling properties

 // Add feature to a features set
-hsa_status_t rocprofiler_add_feature(const rocprofiler_feature_t* feature, // [in]
-rocprofiler_feature_set_t* features_set); // [in/out] profiling features set
+hsa_status_t rocprofiler_add_feature(
+const rocprofiler_feature_t* feature, // [in]
+rocprofiler_feature_set_t* features_set); // [in/out] profiling features set

 // Create new profiling context
-hsa_status_t rocprofiler_features_set_open(hsa_agent_t agent, // GPU handle
-rocprofiler_feature_set_t* features_set, // [in] profiling features set
-rocprofiler_t** context, // [out] context object
-uint32_t mode, // profiling mode mask
-rocprofiler_properties_t* properties); // profiling properties
+hsa_status_t rocprofiler_features_set_open(
+hsa_agent_t agent, // GPU handle
+rocprofiler_feature_set_t* features_set, // [in] profiling features set
+rocprofiler_t** context, // [out] context object
+uint32_t mode, // profiling mode mask
+rocprofiler_properties_t* properties); // profiling properties

 // Delete profiling info
 hsa_status_t rocprofiler_close(rocprofiler_t* context); // [in] profiling context
@@ -242,24 +244,24 @@ hsa_status_t rocprofiler_reset(rocprofiler_t* context, // [in] profiling contex
 uint32_t group_index); // group index

 // Return context agent
-hsa_status_t rocprofiler_get_agent(rocprofiler_t* context, // [in] profiling context
-hsa_agent_t* agent); // [out] GPU handle
+hsa_status_t rocprofiler_get_agent(rocprofiler_t* context, // [in] profiling context
+hsa_agent_t* agent); // [out] GPU handle

 // Supported time value ID
 typedef enum {
-ROCPROFILER_TIME_ID_CLOCK_REALTIME = 0, // Linux realtime clock time
-ROCPROFILER_TIME_ID_CLOCK_REALTIME_COARSE = 1, // Linux realtime-coarse clock time
-ROCPROFILER_TIME_ID_CLOCK_MONOTONIC = 2, // Linux monotonic clock time
-ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_COARSE = 3, // Linux monotonic-coarse clock time
-ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_RAW = 4, // Linux monotonic-raw clock time
+ROCPROFILER_TIME_ID_CLOCK_REALTIME = 0, // Linux realtime clock time
+ROCPROFILER_TIME_ID_CLOCK_REALTIME_COARSE = 1, // Linux realtime-coarse clock time
+ROCPROFILER_TIME_ID_CLOCK_MONOTONIC = 2, // Linux monotonic clock time
+ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_COARSE = 3, // Linux monotonic-coarse clock time
+ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_RAW = 4, // Linux monotonic-raw clock time
 } rocprofiler_time_id_t;

 // Return time value for a given time ID and profiling timestamp
 hsa_status_t rocprofiler_get_time(
-rocprofiler_time_id_t time_id, // identifier of the particular time to convert the timesatmp
-uint64_t timestamp, // profiling timestamp
-uint64_t* value_ns, // [out] returned time 'ns' value, ignored if NULL
-uint64_t* error_ns); // [out] returned time error 'ns' value, ignored if NULL
+rocprofiler_time_id_t time_id, // identifier of the particular time to convert the timesatmp
+uint64_t timestamp, // profiling timestamp
+uint64_t* value_ns, // [out] returned time 'ns' value, ignored if NULL
+uint64_t* error_ns); // [out] returned time error 'ns' value, ignored if NULL

 ////////////////////////////////////////////////////////////////////////////////
 // Queue callbacks
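The comments on `rocprofiler_time_id_t` in the hunk above name five Linux clocks. Note that the enum values (0 to 4) do not line up with the Linux `clockid_t` constants, so any code that reads those clocks needs an explicit mapping. The sketch below shows one such mapping. It is an assumption based only on the comments: it does not reproduce what `rocprofiler_get_time` does internally, and `time_id_to_clockid` and `read_clock_ns` are hypothetical helper names.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes the Linux-specific CLOCK_*_COARSE and _RAW ids */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Local copy of the enum from the hunk above, so the sketch builds without
 * the rocprofiler headers. */
typedef enum {
    ROCPROFILER_TIME_ID_CLOCK_REALTIME = 0,
    ROCPROFILER_TIME_ID_CLOCK_REALTIME_COARSE = 1,
    ROCPROFILER_TIME_ID_CLOCK_MONOTONIC = 2,
    ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_COARSE = 3,
    ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_RAW = 4,
} rocprofiler_time_id_t;

/* Hypothetical helper: pick the Linux clock that each ID's comment names. */
clockid_t time_id_to_clockid(rocprofiler_time_id_t id) {
    assert(id <= ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_RAW);
    switch (id) {
        case ROCPROFILER_TIME_ID_CLOCK_REALTIME: return CLOCK_REALTIME;
        case ROCPROFILER_TIME_ID_CLOCK_REALTIME_COARSE: return CLOCK_REALTIME_COARSE;
        case ROCPROFILER_TIME_ID_CLOCK_MONOTONIC: return CLOCK_MONOTONIC;
        case ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_COARSE: return CLOCK_MONOTONIC_COARSE;
        case ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_RAW: return CLOCK_MONOTONIC_RAW;
    }
    return CLOCK_REALTIME; /* not reached for valid IDs */
}

/* Hypothetical helper: read the selected clock in nanoseconds, 0 on failure. */
uint64_t read_clock_ns(rocprofiler_time_id_t id) {
    struct timespec ts;
    if (clock_gettime(time_id_to_clockid(id), &ts) != 0) {
        return 0;
    }
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

The sketch is Linux-only because of the coarse and raw clock IDs.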
@@ -269,26 +271,26 @@ hsa_status_t rocprofiler_get_time(

 // Dispatch record
 typedef struct {
-uint64_t dispatch; // dispatch timestamp, ns
-uint64_t begin; // kernel begin timestamp, ns
-uint64_t end; // kernel end timestamp, ns
-uint64_t complete; // completion signal timestamp, ns
+uint64_t dispatch; // dispatch timestamp, ns
+uint64_t begin; // kernel begin timestamp, ns
+uint64_t end; // kernel end timestamp, ns
+uint64_t complete; // completion signal timestamp, ns
 } rocprofiler_dispatch_record_t;

 // Profiling callback data
 typedef struct {
-hsa_agent_t agent; // GPU agent handle
-uint32_t agent_index; // GPU index (GPU Driver Node ID as reported in the sysfs topology)
-const hsa_queue_t* queue; // HSA queue
-uint64_t queue_index; // Index in the queue
-uint32_t queue_id; // Queue id
-hsa_signal_t completion_signal; // Completion signal
-const hsa_kernel_dispatch_packet_t* packet; // HSA dispatch packet
-const char* kernel_name; // Kernel name
-uint64_t kernel_object; // Kernel object address
-const amd_kernel_code_t* kernel_code; // Kernel code pointer
-uint32_t thread_id; // Thread id
-const rocprofiler_dispatch_record_t* record; // Dispatch record
+hsa_agent_t agent; // GPU agent handle
+uint32_t agent_index; // GPU index (GPU Driver Node ID as reported in the sysfs topology)
+const hsa_queue_t* queue; // HSA queue
+uint64_t queue_index; // Index in the queue
+uint32_t queue_id; // Queue id
+hsa_signal_t completion_signal; // Completion signal
+const hsa_kernel_dispatch_packet_t* packet; // HSA dispatch packet
+const char* kernel_name; // Kernel name
+uint64_t kernel_object; // Kernel object address
+const amd_kernel_code_t* kernel_code; // Kernel code pointer
+uint32_t thread_id; // Thread id
+const rocprofiler_dispatch_record_t* record; // Dispatch record
 } rocprofiler_callback_data_t;

 // Profiling callback type
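The dispatch record in the hunk above carries four nanosecond timestamps: dispatch, kernel begin, kernel end, and completion signal. A tool would typically turn these into intervals. The C sketch below shows that arithmetic on a local copy of the struct. The `dispatch_intervals` type and the helper names are hypothetical, and treating an out-of-order pair as a zero-length interval is a choice made for this sketch rather than documented library behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the record from the hunk above (all timestamps in ns). */
typedef struct {
    uint64_t dispatch; /* dispatch timestamp */
    uint64_t begin;    /* kernel begin timestamp */
    uint64_t end;      /* kernel end timestamp */
    uint64_t complete; /* completion signal timestamp */
} rocprofiler_dispatch_record_t;

/* Hypothetical summary of the three gaps between the four timestamps. */
typedef struct {
    uint64_t queue_wait_ns;   /* dispatch -> kernel begin */
    uint64_t kernel_ns;       /* kernel begin -> kernel end */
    uint64_t signal_delay_ns; /* kernel end -> completion signal */
} dispatch_intervals;

/* Difference that never underflows: returns 0 if `to` precedes `from`. */
uint64_t elapsed_ns(uint64_t from, uint64_t to) {
    return to >= from ? to - from : 0;
}

dispatch_intervals split_intervals(const rocprofiler_dispatch_record_t* record) {
    assert(record != NULL);
    dispatch_intervals out;
    out.queue_wait_ns = elapsed_ns(record->dispatch, record->begin);
    out.kernel_ns = elapsed_ns(record->begin, record->end);
    out.signal_delay_ns = elapsed_ns(record->end, record->complete);
    return out;
}
```

In a real dispatch callback the record would come from the `record` member of `rocprofiler_callback_data_t`, assuming that pointer is populated for the dispatch being inspected.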
@@ -299,15 +301,14 @@ typedef hsa_status_t (*rocprofiler_callback_t)(

 // Queue callbacks
 typedef struct {
-rocprofiler_callback_t dispatch; // dispatch callback
-hsa_status_t (*create)(hsa_queue_t* queue, void* data); // create callback
-hsa_status_t (*destroy)(hsa_queue_t* queue, void* data); // destroy callback
+rocprofiler_callback_t dispatch; // dispatch callback
+hsa_status_t (*create)(hsa_queue_t* queue, void* data); // create callback
+hsa_status_t (*destroy)(hsa_queue_t* queue, void* data); // destroy callback
 } rocprofiler_queue_callbacks_t;

 // Set queue callbacks
-hsa_status_t rocprofiler_set_queue_callbacks(
-rocprofiler_queue_callbacks_t callbacks, // callbacks
-void* data); // [in/out] passed callbacks data
+hsa_status_t rocprofiler_set_queue_callbacks(rocprofiler_queue_callbacks_t callbacks, // callbacks
+void* data); // [in/out] passed callbacks data

 // Remove queue callbacks
 hsa_status_t rocprofiler_remove_queue_callbacks();
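The callback table in the hunk above pairs function pointers with a user `data` pointer that is handed back on every call. The C sketch below illustrates that pattern for the `create` and `destroy` members only. Everything outside those two signatures is an assumption: the HSA types are stand-ins so the code compiles without the HSA headers, `queue_callbacks_sketch` is a reduced copy that leaves out `dispatch` (its callback signature is not shown in this excerpt), and `queue_stats` is a hypothetical user state.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles without the HSA headers. */
typedef enum { HSA_STATUS_SUCCESS = 0 } hsa_status_t;
typedef struct hsa_queue_s hsa_queue_t; /* opaque here */

/* Reduced copy of rocprofiler_queue_callbacks_t: create/destroy only. */
typedef struct {
    hsa_status_t (*create)(hsa_queue_t* queue, void* data);
    hsa_status_t (*destroy)(hsa_queue_t* queue, void* data);
} queue_callbacks_sketch;

/* Hypothetical user state passed through the `data` pointer. */
typedef struct {
    int live_queues;   /* queues created and not yet destroyed */
    int total_created; /* queues created since the callbacks were set */
} queue_stats;

hsa_status_t on_queue_create(hsa_queue_t* queue, void* data) {
    (void)queue; /* the sketch only counts, it does not inspect the queue */
    queue_stats* stats = data;
    assert(stats != NULL);
    stats->live_queues++;
    stats->total_created++;
    return HSA_STATUS_SUCCESS;
}

hsa_status_t on_queue_destroy(hsa_queue_t* queue, void* data) {
    (void)queue;
    queue_stats* stats = data;
    assert(stats != NULL);
    stats->live_queues--;
    return HSA_STATUS_SUCCESS;
}

/* Builds the table a tool would fill in before registering its callbacks. */
queue_callbacks_sketch make_queue_callbacks(void) {
    queue_callbacks_sketch callbacks = {on_queue_create, on_queue_destroy};
    return callbacks;
}
```

With the real headers, the equivalent table would presumably be passed to `rocprofiler_set_queue_callbacks(callbacks, &stats)`, with the same `&stats` pointer arriving as `data` in each callback.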
@@ -323,20 +324,20 @@ hsa_status_t rocprofiler_stop_queue_callbacks();
 // contect.invocations' to collect all profiling data

 // Start profiling
-hsa_status_t rocprofiler_start(rocprofiler_t* context, // [in/out] profiling context
-uint32_t group_index); // group index
+hsa_status_t rocprofiler_start(rocprofiler_t* context, // [in/out] profiling context
+uint32_t group_index); // group index

 // Stop profiling
-hsa_status_t rocprofiler_stop(rocprofiler_t* context, // [in/out] profiling context
-uint32_t group_index); // group index
+hsa_status_t rocprofiler_stop(rocprofiler_t* context, // [in/out] profiling context
+uint32_t group_index); // group index

 // Read profiling
-hsa_status_t rocprofiler_read(rocprofiler_t* context, // [in/out] profiling context
-uint32_t group_index); // group index
+hsa_status_t rocprofiler_read(rocprofiler_t* context, // [in/out] profiling context
+uint32_t group_index); // group index

 // Read profiling data
-hsa_status_t rocprofiler_get_data(rocprofiler_t* context, // [in/out] profiling context
-uint32_t group_index); // group index
+hsa_status_t rocprofiler_get_data(rocprofiler_t* context, // [in/out] profiling context
+uint32_t group_index); // group index

 // Get profiling groups count
 hsa_status_t rocprofiler_group_count(const rocprofiler_t* context, // [in] profiling context
@@ -379,75 +380,76 @@ hsa_status_t rocprofiler_iterate_trace_data(
|
||||
|
||||
// Profiling info kind
|
||||
typedef enum {
|
||||
ROCPROFILER_INFO_KIND_METRIC = 0, // metric info
|
||||
ROCPROFILER_INFO_KIND_METRIC_COUNT = 1, // metric features count, int32
|
||||
ROCPROFILER_INFO_KIND_TRACE = 2, // trace info
|
||||
ROCPROFILER_INFO_KIND_TRACE_COUNT = 3, // trace features count, int32
|
||||
ROCPROFILER_INFO_KIND_TRACE_PARAMETER = 4, // trace parameter info
|
||||
ROCPROFILER_INFO_KIND_TRACE_PARAMETER_COUNT = 5 // trace parameter count, int32
|
||||
ROCPROFILER_INFO_KIND_METRIC = 0, // metric info
|
||||
ROCPROFILER_INFO_KIND_METRIC_COUNT = 1, // metric features count, int32
|
||||
ROCPROFILER_INFO_KIND_TRACE = 2, // trace info
|
||||
ROCPROFILER_INFO_KIND_TRACE_COUNT = 3, // trace features count, int32
|
||||
ROCPROFILER_INFO_KIND_TRACE_PARAMETER = 4, // trace parameter info
|
||||
ROCPROFILER_INFO_KIND_TRACE_PARAMETER_COUNT = 5 // trace parameter count, int32
|
||||
} rocprofiler_info_kind_t;
|
||||
|
||||
// Profiling info query
|
||||
typedef union {
|
||||
rocprofiler_info_kind_t info_kind; // queried profiling info kind
|
||||
rocprofiler_info_kind_t info_kind; // queried profiling info kind
|
||||
struct {
|
||||
const char* trace_name; // queried info trace name
|
||||
const char* trace_name; // queried info trace name
|
||||
} trace_parameter;
|
||||
} rocprofiler_info_query_t;
|
||||
|
||||
// Profiling info data
|
||||
typedef struct {
|
||||
uint32_t agent_index; // GPU HSA agent index (GPU Driver Node ID as reported in the sysfs topology)
|
||||
rocprofiler_info_kind_t kind; // info data kind
|
||||
uint32_t
|
||||
agent_index; // GPU HSA agent index (GPU Driver Node ID as reported in the sysfs topology)
|
||||
rocprofiler_info_kind_t kind; // info data kind
|
||||
union {
|
||||
struct {
|
||||
const char* name; // metric name
|
||||
uint32_t instances; // instances number
|
||||
const char* expr; // metric expression, NULL for basic counters
|
||||
const char* description; // metric description
|
||||
const char* block_name; // block name
|
||||
uint32_t block_counters; // number of block counters
|
||||
const char* name; // metric name
|
||||
uint32_t instances; // instances number
|
||||
const char* expr; // metric expression, NULL for basic counters
|
||||
const char* description; // metric description
|
||||
const char* block_name; // block name
|
||||
uint32_t block_counters; // number of block counters
|
||||
} metric;
|
||||
struct {
|
||||
const char* name; // trace name
|
||||
const char* description; // trace description
|
||||
uint32_t parameter_count; // number of parameters supported by the trace
|
||||
const char* name; // trace name
|
||||
const char* description; // trace description
|
||||
uint32_t parameter_count; // number of parameters supported by the trace
|
||||
} trace;
|
||||
struct {
|
||||
uint32_t code; // parameter code
|
||||
const char* trace_name; // trace name
|
||||
const char* parameter_name; // parameter name
|
||||
const char* description; // trace parameter description
|
||||
uint32_t code; // parameter code
|
||||
const char* trace_name; // trace name
|
||||
const char* parameter_name; // parameter name
|
||||
const char* description; // trace parameter description
|
||||
} trace_parameter;
|
||||
};
|
||||
} rocprofiler_info_data_t;
|
||||
|
||||
// Return the info for a given info kind
|
||||
hsa_status_t rocprofiler_get_info(
|
||||
const hsa_agent_t* agent, // [in] GFXIP handle
|
||||
rocprofiler_info_kind_t kind, // kind of iterated info
|
||||
void *data); // [in/out] returned data
|
||||
hsa_status_t rocprofiler_get_info(const hsa_agent_t* agent, // [in] GFXIP handle
|
||||
rocprofiler_info_kind_t kind, // kind of iterated info
|
||||
void* data); // [in/out] returned data
|
||||
|
||||
// Iterate over the info for a given info kind, and invoke an application-defined callback on every iteration
|
||||
hsa_status_t rocprofiler_iterate_info(
|
||||
const hsa_agent_t* agent, // [in] GFXIP handle
|
||||
rocprofiler_info_kind_t kind, // kind of iterated info
|
||||
hsa_status_t (*callback)(const rocprofiler_info_data_t info, void *data), // callback
|
||||
void *data); // [in/out] data passed to callback
|
||||
// Iterate over the info for a given info kind, and invoke an application-defined callback on every
|
||||
// iteration
|
||||
hsa_status_t rocprofiler_iterate_info(const hsa_agent_t* agent, // [in] GFXIP handle
|
||||
rocprofiler_info_kind_t kind, // kind of iterated info
|
||||
hsa_status_t (*callback)(const rocprofiler_info_data_t info,
|
||||
void* data), // callback
|
||||
void* data); // [in/out] data passed to callback
|
||||
|
||||
// Iterate over the info for a given info query, and invoke an application-defined callback on every iteration
|
||||
hsa_status_t rocprofiler_query_info(
|
||||
const hsa_agent_t *agent, // [in] GFXIP handle
|
||||
rocprofiler_info_query_t query, // iterated info query
|
||||
hsa_status_t (*callback)(const rocprofiler_info_data_t info, void *data), // callback
|
||||
void *data); // [in/out] data passed to callback
|
||||
// Iterate over the info for a given info query, and invoke an application-defined callback on every
|
||||
// iteration
|
||||
hsa_status_t rocprofiler_query_info(const hsa_agent_t* agent, // [in] GFXIP handle
|
||||
rocprofiler_info_query_t query, // iterated info query
|
||||
hsa_status_t (*callback)(const rocprofiler_info_data_t info,
|
||||
void* data), // callback
|
||||
void* data); // [in/out] data passed to callback
|
||||
|
||||
// Create a profiled queue. All dispatches on this queue will be profiled
|
||||
hsa_status_t rocprofiler_queue_create_profiled(
|
||||
hsa_agent_t agent_handle,uint32_t size, hsa_queue_type32_t type,
|
||||
void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data),
|
||||
void* data, uint32_t private_segment_size, uint32_t group_segment_size,
|
||||
hsa_queue_t** queue);
|
||||
hsa_agent_t agent_handle, uint32_t size, hsa_queue_type32_t type,
|
||||
void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data), void* data,
|
||||
uint32_t private_segment_size, uint32_t group_segment_size, hsa_queue_t** queue);
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
// Profiling pool
|
||||
@@ -461,8 +463,8 @@ typedef void rocprofiler_pool_t;
|
||||
|
||||
// Profiling pool entry
|
||||
typedef struct {
|
||||
rocprofiler_t* context; // context object
|
||||
void* payload; // payload data object
|
||||
rocprofiler_t* context; // context object
|
||||
void* payload; // payload data object
|
||||
} rocprofiler_pool_entry_t;
|
||||
|
||||
// Profiling handler, called on profiling completion
|
||||
@@ -478,120 +480,118 @@ typedef struct {
|
||||
|
||||
// Open profiling pool
|
||||
hsa_status_t rocprofiler_pool_open(
|
||||
hsa_agent_t agent, // GPU handle
|
||||
rocprofiler_feature_t* features, // [in] profiling features array
|
||||
uint32_t feature_count, // profiling info count
|
||||
rocprofiler_pool_t** pool, // [out] context object
|
||||
uint32_t mode, // profiling mode mask
|
||||
rocprofiler_pool_properties_t*); // pool properties
|
||||
hsa_agent_t agent, // GPU handle
|
||||
rocprofiler_feature_t* features, // [in] profiling features array
|
||||
uint32_t feature_count, // profiling info count
|
||||
rocprofiler_pool_t** pool, // [out] context object
|
||||
uint32_t mode, // profiling mode mask
|
||||
rocprofiler_pool_properties_t*); // pool properties
|
||||
|
||||
// Close profiling pool
|
||||
hsa_status_t rocprofiler_pool_close(
|
||||
rocprofiler_pool_t* pool); // profiling pool handle
|
||||
hsa_status_t rocprofiler_pool_close(rocprofiler_pool_t* pool); // profiling pool handle
|
||||
|
||||
// Fetch profiling pool entry
|
||||
hsa_status_t rocprofiler_pool_fetch(
|
||||
rocprofiler_pool_t* pool, // profiling pool handle
|
||||
rocprofiler_pool_entry_t* entry); // [out] empty profiling pool entry
|
||||
rocprofiler_pool_t* pool, // profiling pool handle
|
||||
rocprofiler_pool_entry_t* entry); // [out] empty profiling pool entry
|
||||
|
||||
// Release profiling pool entry
|
||||
hsa_status_t rocprofiler_pool_release(
|
||||
rocprofiler_pool_entry_t* entry); // released profiling pool entry
|
||||
rocprofiler_pool_entry_t* entry); // released profiling pool entry
|
||||
|
||||
// Iterate fetched profiling pool entries
|
||||
hsa_status_t rocprofiler_pool_iterate(
|
||||
rocprofiler_pool_t* pool, // profiling pool handle
|
||||
hsa_status_t (*callback)(rocprofiler_pool_entry_t* entry, void* data), // callback
|
||||
void *data); // [in/out] data passed to callback
|
||||
hsa_status_t rocprofiler_pool_iterate(rocprofiler_pool_t* pool, // profiling pool handle
|
||||
hsa_status_t (*callback)(rocprofiler_pool_entry_t* entry,
|
||||
void* data), // callback
|
||||
void* data); // [in/out] data passed to callback
|
||||
|
||||
// Flush completed entries in profiling pool
|
||||
hsa_status_t rocprofiler_pool_flush(
|
||||
rocprofiler_pool_t* pool); // profiling pool handle
|
||||
hsa_status_t rocprofiler_pool_flush(rocprofiler_pool_t* pool); // profiling pool handle
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
// HSA intercepting API
|
||||
|
||||
// HSA callbacks ID enumeration
|
||||
typedef enum {
|
||||
ROCPROFILER_HSA_CB_ID_ALLOCATE = 0, // Memory allocate callback
|
||||
ROCPROFILER_HSA_CB_ID_DEVICE = 1, // Device assign callback
|
||||
ROCPROFILER_HSA_CB_ID_MEMCOPY = 2, // Memcopy callback
|
||||
ROCPROFILER_HSA_CB_ID_SUBMIT = 3, // Packet submit callback
|
||||
ROCPROFILER_HSA_CB_ID_KSYMBOL = 4, // Loading/unloading of kernel symbol
|
||||
ROCPROFILER_HSA_CB_ID_CODEOBJ = 5 // Loading/unloading of code object
|
||||
ROCPROFILER_HSA_CB_ID_ALLOCATE = 0, // Memory allocate callback
|
||||
ROCPROFILER_HSA_CB_ID_DEVICE = 1, // Device assign callback
|
||||
ROCPROFILER_HSA_CB_ID_MEMCOPY = 2, // Memcopy callback
|
||||
ROCPROFILER_HSA_CB_ID_SUBMIT = 3, // Packet submit callback
|
||||
ROCPROFILER_HSA_CB_ID_KSYMBOL = 4, // Loading/unloading of kernel symbol
|
||||
ROCPROFILER_HSA_CB_ID_CODEOBJ = 5 // Loading/unloading of code object
|
||||
} rocprofiler_hsa_cb_id_t;
|
||||
|
||||
// HSA callback data type
|
||||
typedef struct {
|
||||
union {
|
||||
struct {
|
||||
const void* ptr; // allocated area ptr
|
||||
size_t size; // allocated area size, zero size means 'free' callback
|
||||
hsa_amd_segment_t segment; // allocated area's memory segment type
|
||||
const void* ptr; // allocated area ptr
|
||||
size_t size; // allocated area size, zero size means 'free' callback
|
||||
hsa_amd_segment_t segment; // allocated area's memory segment type
|
||||
hsa_amd_memory_pool_global_flag_t global_flag; // allocated area's memory global flag
|
||||
int is_code; // equal to 1 if code is allocated
|
||||
} allocate;
|
||||
struct {
|
||||
hsa_device_type_t type; // type of assigned device
|
||||
uint32_t id; // id of assigned device
|
||||
hsa_agent_t agent; // device HSA agent handle
|
||||
const void* ptr; // ptr the device is assigned to
|
||||
hsa_device_type_t type; // type of assigned device
|
||||
uint32_t id; // id of assigned device
|
||||
hsa_agent_t agent; // device HSA agent handle
|
||||
const void* ptr; // ptr the device is assigned to
|
||||
} device;
|
||||
struct {
|
||||
const void* dst; // memcopy dst ptr
|
||||
const void* src; // memcopy src ptr
|
||||
size_t size; // memcopy size bytes
|
||||
const void* dst; // memcopy dst ptr
|
||||
const void* src; // memcopy src ptr
|
||||
size_t size; // memcopy size bytes
|
||||
} memcopy;
|
||||
struct {
|
||||
const void* packet; // packet submitted to the GPU
|
||||
const char* kernel_name; // kernel name, not NULL if dispatch
|
||||
hsa_queue_t* queue; // HSA queue the kernel was submitted to
|
||||
uint32_t device_type; // type of device the packet is submitted to
|
||||
uint32_t device_id; // id of device the packet is submitted to
|
||||
const void* packet; // packet submitted to the GPU
|
||||
const char* kernel_name; // kernel name, not NULL if dispatch
|
||||
hsa_queue_t* queue; // HSA queue the kernel was submitted to
|
||||
uint32_t device_type; // type of device the packet is submitted to
|
||||
uint32_t device_id; // id of device the packet is submitted to
|
||||
} submit;
|
||||
struct {
|
||||
uint64_t object; // kernel symbol object
|
||||
const char* name; // kernel symbol name
|
||||
uint32_t name_length; // kernel symbol name length
|
||||
int unload; // symbol executable destroy
|
||||
uint64_t object; // kernel symbol object
|
||||
const char* name; // kernel symbol name
|
||||
uint32_t name_length; // kernel symbol name length
|
||||
int unload; // symbol executable destroy
|
||||
} ksymbol;
|
||||
struct {
|
||||
uint32_t storage_type; // code object storage type
|
||||
int storage_file; // origin file descriptor
|
||||
uint64_t memory_base; // origin memory base
|
||||
uint64_t memory_size; // origin memory size
|
||||
uint64_t load_base; // codeobj load base
|
||||
uint64_t load_size; // codeobj load size
|
||||
uint64_t load_delta; // codeobj load delta
|
||||
uint32_t uri_length; // URI string length
|
||||
char* uri; // URI string
|
||||
int unload; // unload flag
|
||||
uint32_t storage_type; // code object storage type
|
||||
int storage_file; // origin file descriptor
|
||||
uint64_t memory_base; // origin memory base
|
||||
uint64_t memory_size; // origin memory size
|
||||
uint64_t load_base; // codeobj load base
|
||||
uint64_t load_size; // codeobj load size
|
||||
uint64_t load_delta; // codeobj load delta
|
||||
uint32_t uri_length; // URI string length
|
||||
char* uri; // URI string
|
||||
int unload; // unload flag
|
||||
} codeobj;
|
||||
};
|
||||
} rocprofiler_hsa_callback_data_t;
|
||||
|
||||
// HSA callback function type
|
||||
typedef hsa_status_t (*rocprofiler_hsa_callback_fun_t)(
|
||||
rocprofiler_hsa_cb_id_t id, // callback id
|
||||
const rocprofiler_hsa_callback_data_t* data, // [in] callback data
|
||||
void* arg); // [in/out] user passed data
|
||||
rocprofiler_hsa_cb_id_t id, // callback id
|
||||
const rocprofiler_hsa_callback_data_t* data, // [in] callback data
|
||||
void* arg); // [in/out] user passed data
|
||||
|
||||
// HSA callbacks structure
|
||||
typedef struct {
|
||||
rocprofiler_hsa_callback_fun_t allocate; // memory allocate callback
|
||||
rocprofiler_hsa_callback_fun_t device; // agent assign callback
|
||||
rocprofiler_hsa_callback_fun_t memcopy; // memory copy callback
|
||||
rocprofiler_hsa_callback_fun_t submit; // packet submit callback
|
||||
rocprofiler_hsa_callback_fun_t ksymbol; // kernel symbol callback
|
||||
rocprofiler_hsa_callback_fun_t codeobj; // codeobject load/unload callback
|
||||
rocprofiler_hsa_callback_fun_t allocate; // memory allocate callback
|
||||
rocprofiler_hsa_callback_fun_t device; // agent assign callback
|
||||
rocprofiler_hsa_callback_fun_t memcopy; // memory copy callback
|
||||
rocprofiler_hsa_callback_fun_t submit; // packet submit callback
|
||||
rocprofiler_hsa_callback_fun_t ksymbol; // kernel symbol callback
|
||||
rocprofiler_hsa_callback_fun_t codeobj; // codeobject load/unload callback
|
||||
} rocprofiler_hsa_callbacks_t;
|
||||
|
||||
// Set callbacks. If the callback is NULL then it is disabled.
|
||||
// If a callback returns a value other than HSA_STATUS_SUCCESS, the callback
|
||||
// will be unregistered.
|
||||
hsa_status_t rocprofiler_set_hsa_callbacks(
|
||||
const rocprofiler_hsa_callbacks_t callbacks, // HSA callback function
|
||||
void* arg); // callback user data
|
||||
const rocprofiler_hsa_callbacks_t callbacks, // HSA callback function
|
||||
void* arg); // callback user data
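The comment on rocprofiler_set_hsa_callbacks states two rules: a NULL callback is disabled, and a callback that returns anything other than HSA_STATUS_SUCCESS is unregistered. A minimal sketch of those semantics follows. It assumes hypothetical stand-in types (`status_t`, `cb_id_t`, `callback_fun_t`, `callback_table_t`) so that it compiles without the HSA or rocprofiler headers; it is an illustration of the documented behavior, not the library's implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for hsa_status_t and rocprofiler_hsa_cb_id_t.
enum status_t { STATUS_SUCCESS = 0, STATUS_ERROR = 1 };
enum cb_id_t { CB_ID_ALLOCATE = 0, CB_ID_SUBMIT = 1, CB_ID_COUNT = 2 };

// Same shape as the callback type in the header: id, payload, user argument.
using callback_fun_t = status_t (*)(cb_id_t id, const void* data, void* arg);

struct callback_table_t {
  std::array<callback_fun_t, CB_ID_COUNT> slots{};  // nullptr means "disabled"
  void* arg = nullptr;                              // user data passed to every callback

  // Invoke the callback registered for `id`, if any. Returns true if it ran.
  bool dispatch(cb_id_t id, const void* data) {
    assert(id < CB_ID_COUNT);
    callback_fun_t fn = slots[id];
    if (fn == nullptr) return false;  // disabled: nothing to call
    if (fn(id, data, arg) != STATUS_SUCCESS) {
      slots[id] = nullptr;  // a failing callback is unregistered
    }
    return true;
  }
};

// Example callbacks: both count invocations through the user argument.
inline status_t count_calls(cb_id_t, const void*, void* arg) {
  ++*static_cast<int*>(arg);
  return STATUS_SUCCESS;
}

inline status_t fail_once(cb_id_t, const void*, void* arg) {
  ++*static_cast<int*>(arg);
  return STATUS_ERROR;
}
```

In this sketch a callback that fails runs exactly once, and later dispatches for the same id become no-ops until a callback is registered again.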
|
||||
|
||||
#ifdef __cplusplus
|
||||
} // extern "C" block
|
||||
|
||||
@@ -1714,7 +1714,7 @@ typedef enum {
|
||||
ROCPROFILER_ATT_TOKEN_MASK2 = 4,
|
||||
ROCPROFILER_ATT_SE_MASK = 5,
|
||||
ROCPROFILER_ATT_SAMPLE_RATE = 6,
|
||||
ROCPROFILER_ATT_BUFFER_SIZE = 7, //! ATT collection max data size.
|
||||
ROCPROFILER_ATT_BUFFER_SIZE = 7, //! ATT collection max data size.
|
||||
ROCPROFILER_ATT_PERF_MASK = 240,
|
||||
ROCPROFILER_ATT_PERF_CTRL = 241,
|
||||
ROCPROFILER_ATT_PERFCOUNTER = 242,
|
||||
|
||||
@@ -1,23 +1,23 @@
|
||||
################################################################################
|
||||
## Copyright (c) 2022 Advanced Micro Devices, Inc.
|
||||
##
|
||||
## Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
## of this software and associated documentation files (the "Software"), to
|
||||
## deal in the Software without restriction, including without limitation the
|
||||
## rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
|
||||
## sell copies of the Software, and to permit persons to whom the Software is
|
||||
## furnished to do so, subject to the following conditions:
|
||||
##
|
||||
## The above copyright notice and this permission notice shall be included in
|
||||
## all copies or substantial portions of the Software.
|
||||
##
|
||||
## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
## AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
## LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
## FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
||||
## IN THE SOFTWARE.
|
||||
# Copyright (c) 2022 Advanced Micro Devices, Inc.
|
||||
#
|
||||
# Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
# of this software and associated documentation files (the "Software"), to
|
||||
# deal in the Software without restriction, including without limitation the
|
||||
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
|
||||
# sell copies of the Software, and to permit persons to whom the Software is
|
||||
# furnished to do so, subject to the following conditions:
|
||||
#
|
||||
# The above copyright notice and this permission notice shall be included in
|
||||
# all copies or substantial portions of the Software.
|
||||
#
|
||||
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
||||
# IN THE SOFTWARE.
|
||||
################################################################################
|
||||
|
||||
add_subdirectory(file)
|
||||
|
||||
@@ -17,10 +17,10 @@
|
||||
# ##############################################################################
|
||||
|
||||
find_library(
|
||||
ROCPROFV2_ATT rocprofv2_att
|
||||
HINTS ${CMAKE_INSTALL_PREFIX}
|
||||
PATHS ${ROCM_PATH}
|
||||
PATH_SUFFIXES hsa-amd-aqlprofile)
|
||||
ROCPROFV2_ATT rocprofv2_att
|
||||
HINTS ${CMAKE_INSTALL_PREFIX}
|
||||
PATHS ${ROCM_PATH}
|
||||
PATH_SUFFIXES hsa-amd-aqlprofile)
|
||||
|
||||
set(ENV{ROCPROFV2_ATT_LIB_PATH} ${ROCPROFV2_ATT})
|
||||
|
||||
@@ -30,30 +30,26 @@ file(GLOB FILE_SOURCES att.cpp)
|
||||
add_library(att_plugin SHARED ${FILE_SOURCES} ${ROCPROFILER_UTIL_SRC_FILES})
|
||||
|
||||
set_target_properties(
|
||||
att_plugin
|
||||
PROPERTIES CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
|
||||
att_plugin
|
||||
PROPERTIES CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
|
||||
|
||||
target_compile_definitions(att_plugin PRIVATE HIP_PROF_HIP_API_STRING=1
|
||||
__HIP_PLATFORM_HCC__=1)
|
||||
|
||||
target_include_directories(
|
||||
att_plugin PRIVATE ${PROJECT_SOURCE_DIR}
|
||||
${CMAKE_CURRENT_SOURCE_DIR})
|
||||
target_include_directories(att_plugin PRIVATE ${PROJECT_SOURCE_DIR}
|
||||
${CMAKE_CURRENT_SOURCE_DIR})
|
||||
target_link_options(
|
||||
att_plugin PRIVATE
|
||||
-Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
-Wl,--no-undefined)
|
||||
target_link_libraries(att_plugin PRIVATE rocprofiler-v2
|
||||
hsa-runtime64::hsa-runtime64 stdc++fs)
|
||||
att_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
-Wl,--no-undefined)
|
||||
target_link_libraries(att_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64
|
||||
stdc++fs)
|
||||
|
||||
install(TARGETS att_plugin
|
||||
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT asan)
|
||||
install(TARGETS att_plugin
|
||||
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT runtime)
|
||||
install(TARGETS att_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT asan)
|
||||
install(TARGETS att_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT runtime)
|
||||
|
||||
configure_file(att.py att/att.py COPYONLY)
|
||||
configure_file(trace_view.py att/trace_view.py COPYONLY)
|
||||
@@ -64,7 +60,7 @@ configure_file(ui/logo.svg att/ui/logo.svg COPYONLY)
|
||||
configure_file(ui/styles.css att/ui/styles.css COPYONLY)
|
||||
configure_file(ui/httpserver.py att/ui/httpserver.py COPYONLY)
|
||||
install(
|
||||
DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/att
|
||||
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/rocprofiler
|
||||
USE_SOURCE_PERMISSIONS
|
||||
COMPONENT runtime)
|
||||
DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/att
|
||||
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/rocprofiler
|
||||
USE_SOURCE_PERMISSIONS
|
||||
COMPONENT runtime)
|
||||
|
||||
@@ -54,11 +54,12 @@ class att_plugin_t {
|
||||
att_plugin_t() {
|
||||
std::vector<const char*> mpivars = {"MPI_RANK", "OMPI_COMM_WORLD_RANK", "MV2_COMM_WORLD_RANK"};
|
||||
|
||||
for (const char* envvar : mpivars) if (const char* env = getenv(envvar)) {
|
||||
MPI_RANK = atoi(env);
|
||||
MPI_ENABLE = true;
|
||||
break;
|
||||
}
|
||||
for (const char* envvar : mpivars)
|
||||
if (const char* env = getenv(envvar)) {
|
||||
MPI_RANK = atoi(env);
|
||||
MPI_ENABLE = true;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
bool MPI_ENABLE = false;
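The constructor in this hunk scans a list of environment variables and takes the rank from the first one that is set. The same lookup can be sketched as a standalone function. The name `detect_mpi_rank` and the use of `std::optional` (C++17) are assumptions made for illustration; the plugin itself stores the result in `MPI_RANK` and `MPI_ENABLE`.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <vector>

// Returns the rank parsed from the first variable in `names` that is set,
// or std::nullopt when none of them is present in the environment.
inline std::optional<int> detect_mpi_rank(const std::vector<const char*>& names) {
  for (const char* name : names) {
    assert(name != nullptr);
    if (const char* value = std::getenv(name)) {
      return std::atoi(value);  // like the original, non-numeric text yields 0
    }
  }
  return std::nullopt;
}
```

Order matters: with the list used in the diff, `MPI_RANK` takes precedence over `OMPI_COMM_WORLD_RANK` and `MV2_COMM_WORLD_RANK` when several are set.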
|
||||
@@ -92,16 +93,15 @@ class att_plugin_t {
|
||||
std::string name_demangled =
|
||||
rocprofiler::truncate_name(rocprofiler::cxx_demangle(kernel_name_c));
|
||||
|
||||
if (name_demangled.size() > ATT_FILENAME_MAXBYTES) // Limit filename size
|
||||
if (name_demangled.size() > ATT_FILENAME_MAXBYTES) // Limit filename size
|
||||
name_demangled = name_demangled.substr(0, ATT_FILENAME_MAXBYTES);
|
||||
|
||||
std::string outfilepath = ".";
|
||||
if (const char* env = getenv("OUTPUT_PATH"))
|
||||
outfilepath = std::string(env);
|
||||
if (const char* env = getenv("OUTPUT_PATH")) outfilepath = std::string(env);
|
||||
|
||||
outfilepath.reserve(outfilepath.size()+128); // Max filename size
|
||||
outfilepath += '/'+name_demangled;
|
||||
if (MPI_ENABLE) outfilepath += "_rank"+std::to_string(MPI_RANK);
|
||||
outfilepath.reserve(outfilepath.size() + 128); // Max filename size
|
||||
outfilepath += '/' + name_demangled;
|
||||
if (MPI_ENABLE) outfilepath += "_rank" + std::to_string(MPI_RANK);
|
||||
outfilepath += "_v";
|
||||
|
||||
// Find if this filename already exists. If so, increment vname.
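The lines in this hunk build the output file prefix in four steps: truncate the demangled kernel name, prepend the output directory, append the MPI rank when one was detected, and end with "_v" so a version number can follow. A minimal sketch of that composition is below. The function name `make_output_prefix` and its parameters are hypothetical; the plugin reads the directory from the OUTPUT_PATH environment variable and the limit from ATT_FILENAME_MAXBYTES, whereas this sketch takes both as arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Compose "<out_dir>/<kernel_name>[_rank<N>]_v", truncating the kernel name
// to at most `max_bytes` bytes first.
inline std::string make_output_prefix(std::string kernel_name, const std::string& out_dir,
                                      std::optional<int> mpi_rank, std::size_t max_bytes) {
  assert(max_bytes > 0);
  if (kernel_name.size() > max_bytes) {
    kernel_name = kernel_name.substr(0, max_bytes);  // limit filename size
  }
  std::string path = out_dir;
  path += '/' + kernel_name;
  if (mpi_rank) path += "_rank" + std::to_string(*mpi_rank);
  path += "_v";  // the caller appends a version number after this suffix
  return path;
}
```

For example, `make_output_prefix("vectorAdd", ".", std::nullopt, 64)` yields `./vectorAdd_v`.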
|
||||
@@ -113,9 +113,9 @@ class att_plugin_t {
|
||||
auto dispatch_id = att_tracer_record->header.id.handle;
|
||||
|
||||
std::string fname = outfilepath + "_kernel.txt";
|
||||
std::ofstream(fname.c_str()) << name_demangled << " dispatch[" << dispatch_id
|
||||
<< "] GPU[" << att_tracer_record->gpu_id.handle
|
||||
<< "]: " << kernel_name_c << '\n';
|
||||
std::ofstream(fname.c_str()) << name_demangled << " dispatch[" << dispatch_id << "] GPU["
|
||||
<< att_tracer_record->gpu_id.handle << "]: " << kernel_name_c
|
||||
<< '\n';
|
||||
|
||||
// iterate over each shader engine att trace
|
||||
int se_num = att_tracer_record->shader_engine_data_count;
|
||||
|
||||
@@ -25,23 +25,25 @@ file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
|
||||
file(GLOB CLI_SOURCES "*.cpp")
|
||||
add_library(cli_plugin SHARED ${CLI_SOURCES} ${ROCPROFILER_UTIL_SRC_FILES})
|
||||
|
||||
set_target_properties(cli_plugin PROPERTIES
|
||||
CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
|
||||
set_target_properties(
|
||||
cli_plugin
|
||||
PROPERTIES CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
|
||||
|
||||
target_compile_definitions(cli_plugin
|
||||
PRIVATE HIP_PROF_HIP_API_STRING=1 __HIP_PLATFORM_HCC__=1)
|
||||
target_compile_definitions(cli_plugin PRIVATE HIP_PROF_HIP_API_STRING=1
|
||||
__HIP_PLATFORM_HCC__=1)
|
||||
|
||||
target_include_directories(cli_plugin PRIVATE ${PROJECT_SOURCE_DIR})
|
||||
|
||||
target_link_options(cli_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap -Wl,--no-undefined)
|
||||
target_link_options(
|
||||
cli_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
-Wl,--no-undefined)
|
||||
|
||||
target_link_libraries(cli_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64 stdc++fs atomic amd_comgr dl)
|
||||
target_link_libraries(cli_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64
|
||||
stdc++fs atomic amd_comgr dl)
|
||||
|
||||
install(TARGETS cli_plugin LIBRARY
|
||||
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT asan)
|
||||
install(TARGETS cli_plugin LIBRARY
|
||||
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT runtime)
|
||||
install(TARGETS cli_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT asan)
|
||||
install(TARGETS cli_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT runtime)
|
||||
|
||||
@@ -1,76 +1,84 @@
|
||||
################################################################################
|
||||
## Copyright (c) 2022 Advanced Micro Devices, Inc.
|
||||
##
|
||||
## Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
## of this software and associated documentation files (the "Software"), to
|
||||
## deal in the Software without restriction, including without limitation the
|
||||
## rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
|
||||
## sell copies of the Software, and to permit persons to whom the Software is
|
||||
## furnished to do so, subject to the following conditions:
|
||||
##
|
||||
## The above copyright notice and this permission notice shall be included in
|
||||
## all copies or substantial portions of the Software.
|
||||
##
|
||||
## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
## AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
## LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
## FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
||||
## IN THE SOFTWARE.
|
||||
# Copyright (c) 2022 Advanced Micro Devices, Inc.
|
||||
#
|
||||
# Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
# of this software and associated documentation files (the "Software"), to
|
||||
# deal in the Software without restriction, including without limitation the
|
||||
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
|
||||
# sell copies of the Software, and to permit persons to whom the Software is
|
||||
# furnished to do so, subject to the following conditions:
|
||||
#
|
||||
# The above copyright notice and this permission notice shall be included in
|
||||
# all copies or substantial portions of the Software.
|
||||
#
|
||||
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
||||
# IN THE SOFTWARE.
|
||||
################################################################################
|
||||
|
||||
# Plugin shared object.
|
||||
add_library(ctf_plugin SHARED
|
||||
ctf.cpp
|
||||
plugin.cpp
|
||||
barectf.c "${CMAKE_CURRENT_BINARY_DIR}/barectf.h"
|
||||
${PROJECT_SOURCE_DIR}/src/utils/helper.cpp
|
||||
hsa_begin.cpp.i hsa_end.cpp.i
|
||||
hip_begin.cpp.i hip_end.cpp.i)
|
||||
set_target_properties(ctf_plugin PROPERTIES
|
||||
CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/../exportmap"
|
||||
LIBRARY_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}/lib/rocprofiler")
|
||||
add_library(
|
||||
ctf_plugin SHARED
|
||||
ctf.cpp
|
||||
plugin.cpp
|
||||
barectf.c
|
||||
"${CMAKE_CURRENT_BINARY_DIR}/barectf.h"
|
||||
${PROJECT_SOURCE_DIR}/src/utils/helper.cpp
|
||||
hsa_begin.cpp.i
|
||||
hsa_end.cpp.i
|
||||
hip_begin.cpp.i
|
||||
hip_end.cpp.i)
|
||||
set_target_properties(
|
||||
ctf_plugin
|
||||
PROPERTIES CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/../exportmap"
|
||||
LIBRARY_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}/lib/rocprofiler")
|
||||
set(METADATA_STREAM_FILE_DIR "${CMAKE_INSTALL_DATADIR}/${PROJECT_NAME}/plugin/ctf")
|
||||
target_compile_definitions(ctf_plugin PUBLIC AMD_INTERNAL_BUILD PRIVATE
|
||||
HIP_PROF_HIP_API_STRING=1
|
||||
__HIP_PLATFORM_HCC__=1
|
||||
CTF_PLUGIN_METADATA_FILE_PATH="${CMAKE_INSTALL_PREFIX}/${METADATA_STREAM_FILE_DIR}/metadata")
|
||||
target_include_directories(ctf_plugin PRIVATE
|
||||
"${PROJECT_SOURCE_DIR}"
|
||||
"${CMAKE_BINARY_DIR}/src/api"
|
||||
"${CMAKE_CURRENT_BINARY_DIR}")
|
||||
target_link_options(ctf_plugin PRIVATE
|
||||
"-Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap"
|
||||
-Wl,--no-undefined)
|
||||
target_link_libraries(ctf_plugin PRIVATE
|
||||
rocprofiler-v2
|
||||
hsa-runtime64::hsa-runtime64
|
||||
stdc++fs
|
||||
dl)
|
||||
install(TARGETS ctf_plugin LIBRARY
|
||||
DESTINATION "${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}"
|
||||
COMPONENT plugins)
|
||||
target_compile_definitions(
|
||||
ctf_plugin
|
||||
PUBLIC AMD_INTERNAL_BUILD
|
||||
PRIVATE
|
||||
HIP_PROF_HIP_API_STRING=1
|
||||
__HIP_PLATFORM_HCC__=1
|
||||
CTF_PLUGIN_METADATA_FILE_PATH="${CMAKE_INSTALL_PREFIX}/${METADATA_STREAM_FILE_DIR}/metadata"
|
||||
)
|
||||
target_include_directories(
|
||||
ctf_plugin PRIVATE "${PROJECT_SOURCE_DIR}" "${CMAKE_BINARY_DIR}/src/api"
|
||||
"${CMAKE_CURRENT_BINARY_DIR}")
|
||||
target_link_options(
|
||||
ctf_plugin PRIVATE "-Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap"
|
||||
-Wl,--no-undefined)
|
||||
target_link_libraries(ctf_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64
|
||||
stdc++fs dl)
|
||||
install(TARGETS ctf_plugin LIBRARY DESTINATION "${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}"
|
||||
COMPONENT plugins)
|
||||
|
||||
# `gen_api_files.py` and `gen_env_yaml.py` require Python 3,
|
||||
# CppHeaderParser, PyYAML, and barectf.
|
||||
find_package(Python3 COMPONENTS Interpreter REQUIRED)
|
||||
# `gen_api_files.py` and `gen_env_yaml.py` require Python 3, CppHeaderParser, PyYAML, and
|
||||
# barectf.
|
||||
find_package(
|
||||
Python3
|
||||
COMPONENTS Interpreter
|
||||
REQUIRED)
|
||||
|
||||
message("Python: ${Python3_EXECUTABLE}")
|
||||
|
||||
execute_process(COMMAND "${Python3_EXECUTABLE}" -c "print('hello')")
|
||||
|
||||
function(check_py3_pkg pkg_name)
|
||||
execute_process(COMMAND "${Python3_EXECUTABLE}" -c "import ${pkg_name}"
|
||||
RESULT_VARIABLE PY3_IMPORT_RES
|
||||
OUTPUT_QUIET)
|
||||
execute_process(
|
||||
COMMAND "${Python3_EXECUTABLE}" -c "import ${pkg_name}"
|
||||
RESULT_VARIABLE PY3_IMPORT_RES
|
||||
OUTPUT_QUIET)
|
||||
|
||||
if(NOT (${PY3_IMPORT_RES} EQUAL 0))
|
||||
message(FATAL_ERROR "Cannot find Python 3 package `${pkg_name}`")
|
||||
endif()
|
||||
if(NOT (${PY3_IMPORT_RES} EQUAL 0))
|
||||
message(FATAL_ERROR "Cannot find Python 3 package `${pkg_name}`")
|
||||
endif()
|
||||
|
||||
message(STATUS "Found Python 3 package `${pkg_name}`")
|
||||
message(STATUS "Found Python 3 package `${pkg_name}`")
|
||||
endfunction()
|
||||
|
||||
check_py3_pkg(CppHeaderParser)
|
||||
@@ -78,82 +86,76 @@ check_py3_pkg(yaml)
|
||||
find_program(BARECTF_RES barectf REQUIRED HINTS "$ENV{HOME}/.local/bin")
|
||||
|
||||
# Generate barectf YAML and C++ files for HSA API.
|
||||
get_property(HSA_RUNTIME_INCLUDE_DIRS
|
||||
TARGET hsa-runtime64::hsa-runtime64
|
||||
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
|
||||
find_file(HSA_H hsa.h
|
||||
PATHS ${HSA_RUNTIME_INCLUDE_DIRS}
|
||||
PATH_SUFFIXES hsa
|
||||
NO_DEFAULT_PATH
|
||||
REQUIRED)
|
||||
get_property(
|
||||
HSA_RUNTIME_INCLUDE_DIRS
|
||||
TARGET hsa-runtime64::hsa-runtime64
|
||||
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
|
||||
find_file(
|
||||
HSA_H hsa.h
|
||||
PATHS ${HSA_RUNTIME_INCLUDE_DIRS}
|
||||
PATH_SUFFIXES hsa
|
||||
NO_DEFAULT_PATH REQUIRED)
|
||||
get_filename_component(HSA_RUNTIME_INC_PATH "${HSA_H}" DIRECTORY)
|
||||
add_custom_command(
|
||||
OUTPUT hsa_erts.yaml hsa_begin.cpp.i hsa_end.cpp.i
|
||||
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa.h" -o hsa.h.i
|
||||
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
|
||||
-o hsa_ext_amd.h.i
|
||||
COMMAND ${CMAKE_COMMAND} -E cat hsa.h.i
|
||||
hsa_ext_amd.h.i
|
||||
"${CMAKE_BINARY_DIR}/src/api/hsa_prof_str.h"
|
||||
> hsa_input.h
|
||||
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py"
|
||||
hsa hsa_input.h
|
||||
BYPRODUCTS hsa.h.i hsa_ext_amd.h.i hsa_input.h
|
||||
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py"
|
||||
"${HSA_RUNTIME_INC_PATH}/hsa.h"
|
||||
"${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
|
||||
"${CMAKE_BINARY_DIR}/src/api/hsa_prof_str.h"
|
||||
COMMENT "Generating HSA API files for the `ctf` plugin...")
|
||||
OUTPUT hsa_erts.yaml hsa_begin.cpp.i hsa_end.cpp.i
|
||||
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa.h" -o hsa.h.i
|
||||
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h" -o
|
||||
hsa_ext_amd.h.i
|
||||
COMMAND ${CMAKE_COMMAND} -E cat hsa.h.i hsa_ext_amd.h.i
|
||||
"${CMAKE_BINARY_DIR}/src/api/hsa_prof_str.h" > hsa_input.h
|
||||
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py" hsa
|
||||
hsa_input.h
|
||||
BYPRODUCTS hsa.h.i hsa_ext_amd.h.i hsa_input.h
|
||||
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py" "${HSA_RUNTIME_INC_PATH}/hsa.h"
|
||||
"${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
|
||||
"${CMAKE_BINARY_DIR}/src/api/hsa_prof_str.h"
|
||||
COMMENT "Generating HSA API files for the `ctf` plugin...")
|
||||
|
||||
# Generate barectf YAML and C++ files for HIP API.
|
||||
get_property(HIP_INCLUDE_DIRS TARGET hip::amdhip64
|
||||
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
|
||||
find_file(HIP_RUNTIME_API_H hip_runtime_api.h
|
||||
PATHS ${HIP_INCLUDE_DIRS}
|
||||
PATH_SUFFIXES hip
|
||||
NO_DEFAULT_PATH
|
||||
REQUIRED)
|
||||
find_file(HIP_PROF_STR_H hip_prof_str.h
|
||||
PATHS ${HIP_INCLUDE_DIRS}
|
||||
PATH_SUFFIXES hip hip/amd_detail
|
||||
NO_DEFAULT_PATH
|
||||
REQUIRED)
|
||||
get_property(
|
||||
HIP_INCLUDE_DIRS
|
||||
TARGET hip::amdhip64
|
||||
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
|
||||
find_file(
|
||||
HIP_RUNTIME_API_H hip_runtime_api.h
|
||||
PATHS ${HIP_INCLUDE_DIRS}
|
||||
PATH_SUFFIXES hip
|
||||
NO_DEFAULT_PATH REQUIRED)
|
||||
find_file(
|
||||
HIP_PROF_STR_H hip_prof_str.h
|
||||
PATHS ${HIP_INCLUDE_DIRS}
|
||||
PATH_SUFFIXES hip hip/amd_detail
|
||||
NO_DEFAULT_PATH REQUIRED)
|
||||
list(TRANSFORM HIP_INCLUDE_DIRS PREPEND -I)
|
||||
add_custom_command(
|
||||
OUTPUT hip_erts.yaml hip_begin.cpp.i hip_end.cpp.i
|
||||
COMMAND ${CMAKE_C_COMPILER} ${HIP_INCLUDE_DIRS}
|
||||
-E "${HIP_RUNTIME_API_H}"
|
||||
-D__HIP_PLATFORM_HCC__=1
|
||||
-D__HIP_ROCclr__=1
|
||||
-o hip_runtime_api.h.i
|
||||
COMMAND cat hip_runtime_api.h.i "${HIP_PROF_STR_H}" > hip_input.h
|
||||
BYPRODUCTS hip_runtime_api.h.i hip_input.h
|
||||
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py"
|
||||
hip hip_input.h
|
||||
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py"
|
||||
"${HIP_RUNTIME_API_H}"
|
||||
"${HIP_PROF_STR_H}"
|
||||
COMMENT "Generating HIP API files for the `ctf` plugin...")
|
||||
OUTPUT hip_erts.yaml hip_begin.cpp.i hip_end.cpp.i
|
||||
COMMAND ${CMAKE_C_COMPILER} ${HIP_INCLUDE_DIRS} -E "${HIP_RUNTIME_API_H}"
|
||||
-D__HIP_PLATFORM_HCC__=1 -D__HIP_ROCclr__=1 -o hip_runtime_api.h.i
|
||||
COMMAND cat hip_runtime_api.h.i "${HIP_PROF_STR_H}" > hip_input.h
|
||||
BYPRODUCTS hip_runtime_api.h.i hip_input.h
|
||||
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py" hip
|
||||
hip_input.h
|
||||
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py" "${HIP_RUNTIME_API_H}"
|
||||
"${HIP_PROF_STR_H}"
|
||||
COMMENT "Generating HIP API files for the `ctf` plugin...")
|
||||
|
||||
# Generate `env.yaml` (trace environment for barectf).
|
||||
add_custom_command(
|
||||
OUTPUT env.yaml
|
||||
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_env_yaml.py"
|
||||
${PROJECT_VERSION}
|
||||
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_env_yaml.py"
|
||||
COMMENT "Generating `env.yaml`...")
|
||||
OUTPUT env.yaml
|
||||
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_env_yaml.py"
|
||||
${PROJECT_VERSION}
|
||||
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_env_yaml.py"
|
||||
COMMENT "Generating `env.yaml`...")
|
||||
|
||||
# Generate raw CTF tracer with barectf.
|
||||
add_custom_command(
|
||||
OUTPUT barectf.c barectf.h barectf-bitfield.h metadata
|
||||
COMMAND "${BARECTF_RES}" gen "-I${CMAKE_CURRENT_BINARY_DIR}"
|
||||
"-I${CMAKE_CURRENT_SOURCE_DIR}"
|
||||
"${CMAKE_CURRENT_SOURCE_DIR}/config.yaml"
|
||||
DEPENDS hsa_erts.yaml
|
||||
hip_erts.yaml
|
||||
env.yaml
|
||||
"${CMAKE_CURRENT_SOURCE_DIR}/config.yaml"
|
||||
"${CMAKE_CURRENT_SOURCE_DIR}/dst_base.yaml"
|
||||
COMMENT "Generating raw CTF tracer with barectf...")
|
||||
install(FILES "${CMAKE_CURRENT_BINARY_DIR}/metadata"
|
||||
DESTINATION "${METADATA_STREAM_FILE_DIR}" COMPONENT plugins)
|
||||
OUTPUT barectf.c barectf.h barectf-bitfield.h metadata
|
||||
COMMAND "${BARECTF_RES}" gen "-I${CMAKE_CURRENT_BINARY_DIR}"
|
||||
"-I${CMAKE_CURRENT_SOURCE_DIR}" "${CMAKE_CURRENT_SOURCE_DIR}/config.yaml"
|
||||
DEPENDS hsa_erts.yaml hip_erts.yaml env.yaml "${CMAKE_CURRENT_SOURCE_DIR}/config.yaml"
|
||||
"${CMAKE_CURRENT_SOURCE_DIR}/dst_base.yaml"
|
||||
COMMENT "Generating raw CTF tracer with barectf...")
|
||||
install(
|
||||
FILES "${CMAKE_CURRENT_BINARY_DIR}/metadata"
|
||||
DESTINATION "${METADATA_STREAM_FILE_DIR}"
|
||||
COMPONENT plugins)
|
||||
|
||||
@@ -156,9 +156,8 @@ class HsaApiEventRecord : public TracerEventRecord<barectf_hsa_api_ctx> {
|
||||
const rocprofiler_session_id_t session_id,
|
||||
const std::uint64_t clock_val)
|
||||
: TracerEventRecord<barectf_hsa_api_ctx>{record, clock_val} {
|
||||
if(record.api_data.hsa)
|
||||
api_data_ = *(record.api_data.hsa);
|
||||
}
|
||||
if (record.api_data.hsa) api_data_ = *(record.api_data.hsa);
|
||||
}
|
||||
explicit HsaApiEventRecord(const rocprofiler_record_tracer_t& record,
|
||||
const std::uint64_t clock_val, hsa_api_data_t& api_data)
|
||||
: TracerEventRecord<barectf_hsa_api_ctx>{record, clock_val}, api_data_(api_data) {}
|
||||
@@ -206,7 +205,7 @@ class HipApiEventRecord : public TracerEventRecord<barectf_hip_api_ctx> {
|
||||
const rocprofiler_session_id_t session_id,
|
||||
const std::uint64_t clock_val)
|
||||
: TracerEventRecord<barectf_hip_api_ctx>{record, clock_val},
|
||||
api_data_{record.api_data.hip? *(record.api_data.hip) : hip_api_data_t{}},
|
||||
api_data_{record.api_data.hip ? *(record.api_data.hip) : hip_api_data_t{}},
|
||||
kernel_name_{record.name ? record.name : std::string{}} {}
|
||||
explicit HipApiEventRecord(const rocprofiler_record_tracer_t& record,
|
||||
const std::uint64_t clock_val, hip_api_data_t& api_data,
|
||||
@@ -760,16 +759,11 @@ std::uint64_t GetMetadataClkClsOffset() {
|
||||
|
||||
static const char* LOOP_MPI_RANK(const std::vector<const char*>& mpivars) {
|
||||
for (const char* env : mpivars)
|
||||
if (const char* envvar = getenv(env))
|
||||
return envvar;
|
||||
if (const char* envvar = getenv(env)) return envvar;
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
static void insert_meta_to_stream(
|
||||
std::stringstream& stream,
|
||||
const char* field,
|
||||
const char* value
|
||||
) {
|
||||
static void insert_meta_to_stream(std::stringstream& stream, const char* field, const char* value) {
|
||||
if (!field || !value) return;
|
||||
stream << "\n\t" << std::string(field) << " = " << std::string(value) << ';';
|
||||
}
|
||||
@@ -802,7 +796,7 @@ void Plugin::CopyAdjustedMetadataStreamFile(const fs::path& metadata_stream_path
|
||||
std::string data_ins = data_stream.str();
|
||||
size_t env_pos = metadata.find("env {");
|
||||
if (env_pos != std::string::npos)
|
||||
metadata.insert(metadata.begin()+env_pos+5, data_ins.begin(), data_ins.end());
|
||||
metadata.insert(metadata.begin() + env_pos + 5, data_ins.begin(), data_ins.end());
|
||||
else
|
||||
std::cerr << "Failed to insert MPI metadata!" << std::endl;
|
||||
}
|
||||
|
||||
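The hunks above pick the first MPI-related environment variable that is set and splice `field = value;` entries into the metadata text directly after the `env {` opener. A minimal standalone sketch of that flow follows. The helper names `first_set_env`, `append_meta_field` and `insert_after_env_open` are hypothetical and not part of the plugin; only the overall shape mirrors the diff.

```cpp
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: return the value of the first variable in `names`
// that is set in the environment, or nullptr when none is set.
const char* first_set_env(const std::vector<const char*>& names) {
  for (const char* name : names)
    if (const char* value = std::getenv(name)) return value;
  return nullptr;
}

// Hypothetical helper: append "\n\tfield = value;" to the stream,
// skipping the entry when either pointer is null.
void append_meta_field(std::stringstream& stream, const char* field, const char* value) {
  if (!field || !value) return;
  stream << "\n\t" << field << " = " << value << ';';
}

// Hypothetical helper: insert `fields` right after the first "env {".
// Returns false and leaves `metadata` untouched when the marker is absent.
bool insert_after_env_open(std::string& metadata, const std::string& fields) {
  static const std::string marker = "env {";
  const std::size_t pos = metadata.find(marker);
  if (pos == std::string::npos) return false;
  metadata.insert(pos + marker.size(), fields);
  return true;
}
```

Deriving the offset from `marker.size()` rather than a literal `5` keeps the insertion point correct if the marker string ever changes, and returning a `bool` lets the caller decide how to report a missing `env {` block.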
@@ -25,23 +25,25 @@ file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
|
||||
file(GLOB FILE_SOURCES "*.cpp")
|
||||
add_library(file_plugin SHARED ${FILE_SOURCES} ${ROCPROFILER_UTIL_SRC_FILES})
|
||||
|
||||
set_target_properties(file_plugin PROPERTIES
|
||||
CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
|
||||
set_target_properties(
|
||||
file_plugin
|
||||
PROPERTIES CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
|
||||
|
||||
target_compile_definitions(file_plugin
|
||||
PRIVATE HIP_PROF_HIP_API_STRING=1 __HIP_PLATFORM_HCC__=1)
|
||||
target_compile_definitions(file_plugin PRIVATE HIP_PROF_HIP_API_STRING=1
|
||||
__HIP_PLATFORM_HCC__=1)
|
||||
|
||||
target_include_directories(file_plugin PRIVATE ${PROJECT_SOURCE_DIR})
|
||||
|
||||
target_link_options(file_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap -Wl,--no-undefined)
|
||||
target_link_options(
|
||||
file_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
-Wl,--no-undefined)
|
||||
|
||||
target_link_libraries(file_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64 stdc++fs amd_comgr dl)
|
||||
target_link_libraries(file_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64
|
||||
stdc++fs amd_comgr dl)
|
||||
|
||||
install(TARGETS file_plugin LIBRARY
|
||||
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT asan)
|
||||
install(TARGETS file_plugin LIBRARY
|
||||
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT runtime)
|
||||
install(TARGETS file_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT asan)
|
||||
install(TARGETS file_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT runtime)
|
||||
|
||||
@@ -216,8 +216,7 @@ class file_plugin_t {
|
||||
case ACTIVITY_DOMAIN_HIP_API: {
|
||||
if (hip_api_header_written_.load(std::memory_order_relaxed)) return;
|
||||
output_file = get_output_file(output_type_t::TRACER, ACTIVITY_DOMAIN_HIP_API);
|
||||
*output_file << "Domain,Function,Start_Timestamp,End_Timestamp,Correlation_ID"
|
||||
<< std::endl;
|
||||
*output_file << "Domain,Function,Start_Timestamp,End_Timestamp,Correlation_ID" << std::endl;
|
||||
*output_file << std::endl;
|
||||
hip_api_header_written_.exchange(true, std::memory_order_release);
|
||||
return;
|
||||
|
||||
@@ -1,27 +1,27 @@
|
||||
file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
|
||||
|
||||
add_library(perfetto_plugin
|
||||
${LIBRARY_TYPE} ${ROCPROFILER_UTIL_SRC_FILES}
|
||||
perfetto.cpp perfetto_sdk/sdk/perfetto.cc)
|
||||
add_library(perfetto_plugin ${LIBRARY_TYPE} ${ROCPROFILER_UTIL_SRC_FILES} perfetto.cpp
|
||||
perfetto_sdk/sdk/perfetto.cc)
|
||||
|
||||
set_target_properties(perfetto_plugin PROPERTIES
|
||||
CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
|
||||
set_target_properties(
|
||||
perfetto_plugin
|
||||
PROPERTIES CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
|
||||
|
||||
target_compile_definitions(perfetto_plugin
|
||||
PRIVATE HIP_PROF_HIP_API_STRING=1
|
||||
__HIP_PLATFORM_HCC__=1)
|
||||
target_compile_definitions(perfetto_plugin PRIVATE HIP_PROF_HIP_API_STRING=1
|
||||
__HIP_PLATFORM_HCC__=1)
|
||||
|
||||
target_include_directories(perfetto_plugin
|
||||
PRIVATE ${PROJECT_SOURCE_DIR}
|
||||
${PROJECT_SOURCE_DIR}/plugin/perfetto/perfetto_sdk/sdk)
|
||||
target_include_directories(
|
||||
perfetto_plugin PRIVATE ${PROJECT_SOURCE_DIR}
|
||||
${PROJECT_SOURCE_DIR}/plugin/perfetto/perfetto_sdk/sdk)
|
||||
|
||||
target_link_options(perfetto_plugin
|
||||
PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap -Wl,--no-undefined)
|
||||
target_link_options(
|
||||
perfetto_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
-Wl,--no-undefined)
|
||||
|
||||
target_link_libraries(perfetto_plugin PRIVATE rocprofiler-v2 Threads::Threads stdc++fs amd_comgr)
|
||||
target_link_libraries(perfetto_plugin PRIVATE rocprofiler-v2 Threads::Threads stdc++fs
|
||||
amd_comgr)
|
||||
|
||||
install(TARGETS perfetto_plugin LIBRARY
|
||||
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT plugins)
|
||||
install(TARGETS perfetto_plugin
|
||||
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME} COMPONENT plugins)
|
||||
|
||||
@@ -556,8 +556,7 @@ class perfetto_plugin_t {
|
||||
if (tracer_record.name) {
|
||||
kernel_name = rocprofiler::cxx_demangle(tracer_record.name);
|
||||
TRACE_EVENT_BEGIN(
|
||||
"HIP_OPS",
|
||||
perfetto::StaticString(rocprofiler::truncate_name(kernel_name).c_str()),
|
||||
"HIP_OPS", perfetto::StaticString(rocprofiler::truncate_name(kernel_name).c_str()),
|
||||
gpu_track, tracer_record.timestamps.begin.value, "Agent ID",
|
||||
tracer_record.agent_id.handle, "Process ID", GetPid(), "Kernel Name", kernel_name,
|
||||
perfetto::Flow::ProcessScoped(tracer_record.correlation_id.value));
|
||||
|
||||
Diff not shown because the file is too large
Diff not shown because the file is too large
@@ -36,9 +36,10 @@
|
||||
#include "src/utils/helper.h"
|
||||
|
||||
// Macro to check ROCProfiler calls status
|
||||
#define CHECK_ROCPROFILER(call) \
|
||||
#define CHECK_ROCPROFILER(call) \
|
||||
do { \
|
||||
if ((call) != ROCPROFILER_STATUS_SUCCESS) rocprofiler::fatal("Error: ROCProfiler API Call Error!"); \
|
||||
if ((call) != ROCPROFILER_STATUS_SUCCESS) \
|
||||
rocprofiler::fatal("Error: ROCProfiler API Call Error!"); \
|
||||
} while (false)
|
||||
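The reformatted `CHECK_ROCPROFILER` macro keeps its body inside `do { ... } while (false)`. A minimal sketch of why that wrapper matters follows. `Status`, `fatal` and `CHECK_STATUS` are hypothetical stand-ins, not the ROCProfiler types, and `fatal` throws here only so the behavior can be observed.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for the status enum and the fatal handler.
enum class Status { Success, Failure };

[[noreturn]] inline void fatal(const char* msg) { throw std::runtime_error(msg); }

#define CHECK_STATUS(call)                                       \
  do {                                                           \
    if ((call) != Status::Success) fatal("status check failed"); \
  } while (false)

// Because the macro expands to exactly one statement, it can be used in an
// unbraced if/else without the `else` binding to the macro's inner `if`.
int run(bool enabled, Status status) {
  if (enabled)
    CHECK_STATUS(status);
  else
    return -1;
  return 0;
}
```

Without the `do`/`while (false)` wrapper, the trailing semicolon at the call site or the macro's inner `if` would change how the surrounding `else` parses.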
|
||||
namespace {
|
||||
@@ -48,8 +49,6 @@ namespace {
|
||||
return pid;
|
||||
}
|
||||
|
||||
[[maybe_unused]] uint64_t GetMachineID() {
|
||||
return gethostid();
|
||||
}
|
||||
[[maybe_unused]] uint64_t GetMachineID() { return gethostid(); }
|
||||
|
||||
} // namespace
|
||||
|
||||
@@ -26,9 +26,11 @@ set(ROCPROF_WRAPPER_BIN_DIR ${ROCPROF_WRAPPER_DIR}/bin)
|
||||
set(ROCPROF_WRAPPER_LIB_DIR ${ROCPROF_WRAPPER_DIR}/lib)
|
||||
set(ROCPROF_WRAPPER_TOOL_DIR ${ROCPROF_WRAPPER_DIR}/tool)
|
||||
|
||||
#Function to generate header template file
|
||||
# Function to generate header template file
|
||||
function(create_header_template)
|
||||
file(WRITE ${ROCPROF_WRAPPER_DIR}/header.hpp.in "/*
|
||||
file(
|
||||
WRITE ${ROCPROF_WRAPPER_DIR}/header.hpp.in
|
||||
"/*
|
||||
Copyright (c) 2022 Advanced Micro Devices, Inc. All rights reserved.
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
@@ -69,105 +71,142 @@ function(create_header_template)
|
||||
#endif")
|
||||
endfunction()
|
||||
|
||||
#use header template file and generate wrapper header files
|
||||
# use header template file and generate wrapper header files
|
||||
function(generate_wrapper_header)
|
||||
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_INC_DIR})
|
||||
#find all header files from inc
|
||||
file(GLOB include_files ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/*.h)
|
||||
#Convert the list of files into #includes
|
||||
foreach(header_file ${include_files})
|
||||
#set include guard
|
||||
get_filename_component(INC_GAURD_NAME ${header_file} NAME_WE)
|
||||
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_INC_DIR})
|
||||
# find all header files from inc
|
||||
file(GLOB include_files ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/*.h)
|
||||
# Convert the list of files into #includes
|
||||
foreach(header_file ${include_files})
|
||||
# set include guard
|
||||
get_filename_component(INC_GAURD_NAME ${header_file} NAME_WE)
|
||||
string(TOUPPER ${INC_GAURD_NAME} INC_GAURD_NAME)
|
||||
set(include_guard "${include_guard}ROCPROF_WRAPPER_INCLUDE_${INC_GAURD_NAME}_H")
|
||||
# set include statement
|
||||
get_filename_component(file_name ${header_file} NAME)
|
||||
set(include_statements
|
||||
"${include_statements}#include \"../../${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}/${file_name}\"\n"
|
||||
)
|
||||
configure_file(${ROCPROF_WRAPPER_DIR}/header.hpp.in
|
||||
${ROCPROF_WRAPPER_INC_DIR}/${file_name})
|
||||
unset(include_guard)
|
||||
unset(include_statements)
|
||||
endforeach()
|
||||
|
||||
# Only single file from ${CMAKE_CURRENT_SOURCE_DIR}/src/core/activity.h is packaged.
|
||||
# So directly using that file name
|
||||
set(file_name "activity.h")
|
||||
# set include guard
|
||||
get_filename_component(INC_GAURD_NAME ${file_name} NAME_WE)
|
||||
string(TOUPPER ${INC_GAURD_NAME} INC_GAURD_NAME)
|
||||
set(include_guard "${include_guard}ROCPROF_WRAPPER_INCLUDE_${INC_GAURD_NAME}_H")
|
||||
#set include statement
|
||||
get_filename_component(file_name ${header_file} NAME)
|
||||
set(include_statements "${include_statements}#include \"../../${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}/${file_name}\"\n")
|
||||
configure_file(${ROCPROF_WRAPPER_DIR}/header.hpp.in ${ROCPROF_WRAPPER_INC_DIR}/${file_name})
|
||||
unset(include_guard)
|
||||
unset(include_statements)
|
||||
endforeach()
|
||||
|
||||
#Only single file from ${CMAKE_CURRENT_SOURCE_DIR}/src/core/activity.h is packaged. So directly using that file name
|
||||
set(file_name "activity.h")
|
||||
#set include guard
|
||||
get_filename_component(INC_GAURD_NAME ${file_name} NAME_WE)
|
||||
string(TOUPPER ${INC_GAURD_NAME} INC_GAURD_NAME)
|
||||
set(include_guard "${include_guard}ROCPROF_WRAPPER_INCLUDE_${INC_GAURD_NAME}_H")
|
||||
set(include_statements "${include_statements}#include \"../../${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}/${file_name}\"\n")
|
||||
configure_file(${ROCPROF_WRAPPER_DIR}/header.hpp.in ${ROCPROF_WRAPPER_INC_DIR}/${file_name})
|
||||
set(include_statements
|
||||
"${include_statements}#include \"../../${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}/${file_name}\"\n"
|
||||
)
|
||||
configure_file(${ROCPROF_WRAPPER_DIR}/header.hpp.in
|
||||
${ROCPROF_WRAPPER_INC_DIR}/${file_name})
|
||||
endfunction()
|
||||
|
||||
#function to create symlink to binaries
|
||||
# function to create symlink to binaries
|
||||
function(create_binary_symlink)
|
||||
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_BIN_DIR})
|
||||
#create symlink for rocprof
|
||||
set(file_name "rocprof")
|
||||
add_custom_target(link_${file_name} ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND ${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_BINDIR}/${file_name} ${ROCPROF_WRAPPER_BIN_DIR}/${file_name})
|
||||
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_BIN_DIR})
|
||||
# create symlink for rocprof
|
||||
set(file_name "rocprof")
|
||||
add_custom_target(
|
||||
link_${file_name} ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND
|
||||
${CMAKE_COMMAND} -E create_symlink ../../${CMAKE_INSTALL_BINDIR}/${file_name}
|
||||
${ROCPROF_WRAPPER_BIN_DIR}/${file_name})
|
||||
|
||||
endfunction()
|
||||
|
||||
#function to create symlink to libraries
|
||||
# function to create symlink to libraries
|
||||
function(create_library_symlink)
|
||||
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_LIB_DIR})
|
||||
set(LIB_ROCPROF "${ROCPROFILER_LIBRARY}.so")
|
||||
set(MAJ_VERSION "${LIB_VERSION_MAJOR}")
|
||||
set(SO_VERSION "${LIB_VERSION_STRING}")
|
||||
set(library_files "${LIB_ROCPROF}" "${LIB_ROCPROF}.${MAJ_VERSION}" "${LIB_ROCPROF}.${SO_VERSION}")
|
||||
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_LIB_DIR})
|
||||
set(LIB_ROCPROF "${ROCPROFILER_LIBRARY}.so")
|
||||
set(MAJ_VERSION "${LIB_VERSION_MAJOR}")
|
||||
set(SO_VERSION "${LIB_VERSION_STRING}")
|
||||
set(library_files "${LIB_ROCPROF}" "${LIB_ROCPROF}.${MAJ_VERSION}"
|
||||
"${LIB_ROCPROF}.${SO_VERSION}")
|
||||
|
||||
foreach(file_name ${library_files})
|
||||
add_custom_target(link_${file_name} ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND ${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${file_name} ${ROCPROF_WRAPPER_LIB_DIR}/${file_name})
|
||||
endforeach()
|
||||
#create symlink to rocprofiler/tool/libtool.so
|
||||
# With File reorg,tool renamed to rocprof-tool
|
||||
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_TOOL_DIR})
|
||||
set(LIB_TOOL "libtool.so")
|
||||
set(LIB_ROCPROFTOOL "librocprof-tool.so")
|
||||
add_custom_target(link_${LIB_TOOL} ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND ${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${LIB_ROCPROFTOOL} ${ROCPROF_WRAPPER_TOOL_DIR}/${LIB_TOOL})
|
||||
#create symlink to test binary
|
||||
#since its saved in lib folder , the code for the same is added here
|
||||
# With File reorg ,binary name changed from ctrl to rocprof-ctrl
|
||||
set(TEST_CTRL "ctrl")
|
||||
set(TEST_ROCPROFCTRL "rocprof-ctrl")
|
||||
add_custom_target(link_${TEST_CTRL} ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND ${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${TEST_ROCPROFCTRL} ${ROCPROF_WRAPPER_TOOL_DIR}/${TEST_CTRL})
|
||||
set(METRICS "metrics.xml")
|
||||
add_custom_target(link_metrics ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND ${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${METRICS} ${ROCPROF_WRAPPER_LIB_DIR}/${METRICS})
|
||||
foreach(file_name ${library_files})
|
||||
add_custom_target(
|
||||
link_${file_name} ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND
|
||||
${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${file_name}
|
||||
${ROCPROF_WRAPPER_LIB_DIR}/${file_name})
|
||||
endforeach()
|
||||
# create symlink to rocprofiler/tool/libtool.so With File reorg,tool renamed to
|
||||
# rocprof-tool
|
||||
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_TOOL_DIR})
|
||||
set(LIB_TOOL "libtool.so")
|
||||
set(LIB_ROCPROFTOOL "librocprof-tool.so")
|
||||
add_custom_target(
|
||||
link_${LIB_TOOL} ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND
|
||||
${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${LIB_ROCPROFTOOL}
|
||||
${ROCPROF_WRAPPER_TOOL_DIR}/${LIB_TOOL})
|
||||
# create symlink to test binary since its saved in lib folder , the code for the same
|
||||
# is added here With File reorg ,binary name changed from ctrl to rocprof-ctrl
|
||||
set(TEST_CTRL "ctrl")
|
||||
set(TEST_ROCPROFCTRL "rocprof-ctrl")
|
||||
add_custom_target(
|
||||
link_${TEST_CTRL} ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND
|
||||
${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${TEST_ROCPROFCTRL}
|
||||
${ROCPROF_WRAPPER_TOOL_DIR}/${TEST_CTRL})
|
||||
set(METRICS "metrics.xml")
|
||||
add_custom_target(
|
||||
link_metrics ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND
|
||||
${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${METRICS}
|
||||
${ROCPROF_WRAPPER_LIB_DIR}/${METRICS})
|
||||
|
||||
set(GFX_METRICS "gfx_metrics.xml")
|
||||
add_custom_target(link_gfx_metrics ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND ${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${GFX_METRICS} ${ROCPROF_WRAPPER_LIB_DIR}/${GFX_METRICS})
|
||||
set(GFX_METRICS "gfx_metrics.xml")
|
||||
add_custom_target(
|
||||
link_gfx_metrics ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND
|
||||
${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${GFX_METRICS}
|
||||
${ROCPROF_WRAPPER_LIB_DIR}/${GFX_METRICS})
|
||||
endfunction()
|
||||
|
||||
#Create a template for header file
|
||||
# Create a template for header file
|
||||
create_header_template()
|
||||
#Use template header file and generate wrapper header files
|
||||
# Use template header file and generate wrapper header files
|
||||
generate_wrapper_header()
|
||||
install(DIRECTORY ${ROCPROF_WRAPPER_INC_DIR} DESTINATION ${ROCPROFILER_NAME} COMPONENT dev)
|
||||
install(
|
||||
DIRECTORY ${ROCPROF_WRAPPER_INC_DIR}
|
||||
DESTINATION ${ROCPROFILER_NAME}
|
||||
COMPONENT dev)
|
||||
# Create symlink to binaries
|
||||
create_binary_symlink()
|
||||
install(DIRECTORY ${ROCPROF_WRAPPER_BIN_DIR} DESTINATION ${ROCPROFILER_NAME} COMPONENT runtime)
|
||||
install(
|
||||
DIRECTORY ${ROCPROF_WRAPPER_BIN_DIR}
|
||||
DESTINATION ${ROCPROFILER_NAME}
|
||||
COMPONENT runtime)
|
||||
create_library_symlink()
|
||||
install(DIRECTORY ${ROCPROF_WRAPPER_LIB_DIR} DESTINATION ${ROCPROFILER_NAME}
|
||||
COMPONENT runtime
|
||||
PATTERN ${ROCPROFILER_LIBRARY}.so EXCLUDE)
|
||||
install(FILES ${ROCPROF_WRAPPER_LIB_DIR}/${ROCPROFILER_LIBRARY}.so DESTINATION ${ROCPROFILER_NAME}/lib
|
||||
COMPONENT dev)
|
||||
#install tools directory
|
||||
install(DIRECTORY ${ROCPROF_WRAPPER_TOOL_DIR} DESTINATION ${ROCPROFILER_NAME} COMPONENT runtime)
|
||||
install(
|
||||
DIRECTORY ${ROCPROF_WRAPPER_LIB_DIR}
|
||||
DESTINATION ${ROCPROFILER_NAME}
|
||||
COMPONENT runtime
|
||||
PATTERN ${ROCPROFILER_LIBRARY}.so EXCLUDE)
|
||||
install(
|
||||
FILES ${ROCPROF_WRAPPER_LIB_DIR}/${ROCPROFILER_LIBRARY}.so
|
||||
DESTINATION ${ROCPROFILER_NAME}/lib
|
||||
COMPONENT dev)
|
||||
# install tools directory
|
||||
install(
|
||||
DIRECTORY ${ROCPROF_WRAPPER_TOOL_DIR}
|
||||
DESTINATION ${ROCPROFILER_NAME}
|
||||
COMPONENT runtime)
|
||||
|
||||
@@ -1,15 +1,18 @@
|
||||
include (CheckCSourceCompiles)
|
||||
# ############################################################################################################################################
|
||||
# ############################################################################################################################################
|
||||
include(CheckCSourceCompiles)
|
||||
# ########################################################################################
|
||||
# ########################################################################################
|
||||
# General Requirements
|
||||
# ############################################################################################################################################
|
||||
# ############################################################################################################################################
|
||||
get_property(HSA_RUNTIME_INCLUDE_DIRECTORIES TARGET hsa-runtime64::hsa-runtime64 PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
|
||||
find_file(HSA_H hsa.h
|
||||
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
|
||||
PATH_SUFFIXES hsa
|
||||
NO_DEFAULT_PATH
|
||||
REQUIRED)
|
||||
# ########################################################################################
|
||||
# ########################################################################################
|
||||
get_property(
|
||||
HSA_RUNTIME_INCLUDE_DIRECTORIES
|
||||
TARGET hsa-runtime64::hsa-runtime64
|
||||
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
|
||||
find_file(
|
||||
HSA_H hsa.h
|
||||
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
|
||||
PATH_SUFFIXES hsa
|
||||
NO_DEFAULT_PATH REQUIRED)
|
||||
get_filename_component(HSA_RUNTIME_INC_PATH ${HSA_H} DIRECTORY)
|
||||
include_directories(${HSA_RUNTIME_INC_PATH})
|
||||
|
||||
@@ -22,138 +25,179 @@ set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${ROCM_PATH}/lib/cmake/hip")
|
||||
set(CMAKE_HIP_ARCHITECTURES OFF)
|
||||
find_package(HIP REQUIRED MODULE)
|
||||
|
||||
find_package(Clang REQUIRED CONFIG
|
||||
PATHS "${ROCM_PATH}"
|
||||
PATH_SUFFIXES "llvm/lib/cmake/clang")
|
||||
find_package(
|
||||
Clang REQUIRED CONFIG
|
||||
PATHS "${ROCM_PATH}"
|
||||
PATH_SUFFIXES "llvm/lib/cmake/clang")
|
||||
|
||||
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake/modules" "${ROCM_PATH}/lib/cmake/hip")
|
||||
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake/modules"
|
||||
"${ROCM_PATH}/lib/cmake/hip")
|
||||
find_package(LibElf REQUIRED)
|
||||
find_package(LibDw REQUIRED)
|
||||
|
||||
## Add a custom targets to build and run all the tests
|
||||
# Add a custom targets to build and run all the tests
|
||||
add_custom_target(samples)
|
||||
add_dependencies(samples rocprofiler-v2)
|
||||
add_custom_target(run-samples COMMAND ${PROJECT_BINARY_DIR}/samples/run_samples.sh DEPENDS samples)
|
||||
add_custom_target(
|
||||
run-samples
|
||||
COMMAND ${PROJECT_BINARY_DIR}/samples/run_samples.sh
|
||||
DEPENDS samples)
|
||||
|
||||
file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
|
||||
# ############################################################################################################################################
|
||||
# ########################################################################################
|
||||
|
||||
# ############################################################################################################################################
|
||||
# ############################################################################################################################################
|
||||
# ########################################################################################
|
||||
# ########################################################################################
|
||||
# Samples Build & Run Script
|
||||
# ############################################################################################################################################
|
||||
# ############################################################################################################################################
|
||||
# ########################################################################################
|
||||
# ########################################################################################
|
||||
|
||||
# ############################################################################################################################################
|
||||
# ########################################################################################
# Profiler Samples
# ############################################################################################################################################
# ########################################################################################

## Build Kernel No Replay Sample
set_source_files_properties(profiler/kernel_profiling_no_replay_sample.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(profiler_kernel_no_replay profiler/kernel_profiling_no_replay_sample.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(profiler_kernel_no_replay PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
# Build Kernel No Replay Sample
set_source_files_properties(profiler/kernel_profiling_no_replay_sample.cpp
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(
profiler_kernel_no_replay profiler/kernel_profiling_no_replay_sample.cpp
${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(
profiler_kernel_no_replay PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(profiler_kernel_no_replay PRIVATE rocprofiler-v2 amd_comgr)
target_link_options(profiler_kernel_no_replay PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples profiler_kernel_no_replay)
install(TARGETS profiler_kernel_no_replay RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS profiler_kernel_no_replay
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)

## Build Device Profiling Sample
set_source_files_properties(profiler/device_profiling_sample.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(profiler_device_profiling profiler/device_profiling_sample.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(profiler_device_profiling PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
# Build Device Profiling Sample
set_source_files_properties(profiler/device_profiling_sample.cpp
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(profiler_device_profiling profiler/device_profiling_sample.cpp
${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(
profiler_device_profiling PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(profiler_device_profiling PRIVATE rocprofiler-v2 amd_comgr)
target_link_options(profiler_device_profiling PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples profiler_device_profiling)
install(TARGETS profiler_device_profiling RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS profiler_device_profiling
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)

## Build Counters Sampling example
set_source_files_properties(counters_sampler/pcie_counters_example.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(pcie_counters_sampler counters_sampler/pcie_counters_example.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(pcie_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
# Build Counters Sampling example
set_source_files_properties(counters_sampler/pcie_counters_example.cpp
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(pcie_counters_sampler counters_sampler/pcie_counters_example.cpp
${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(
pcie_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(pcie_counters_sampler PRIVATE rocprofiler-v2 systemd amd_comgr)
target_link_options(pcie_counters_sampler PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples pcie_counters_sampler)
install(TARGETS pcie_counters_sampler RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS pcie_counters_sampler
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)

## Build XGMI Counters Sampling example
set_source_files_properties(counters_sampler/xgmi_counters_sampler_example.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(xgmi_counters_sampler counters_sampler/xgmi_counters_sampler_example.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(xgmi_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
# Build XGMI Counters Sampling example
set_source_files_properties(counters_sampler/xgmi_counters_sampler_example.cpp
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(
xgmi_counters_sampler counters_sampler/xgmi_counters_sampler_example.cpp
${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(
xgmi_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(xgmi_counters_sampler PRIVATE rocprofiler-v2 systemd amd_comgr)
target_link_options(xgmi_counters_sampler PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples xgmi_counters_sampler)
install(TARGETS xgmi_counters_sampler RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS xgmi_counters_sampler
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)

# ################################################################################################################
# ########################################################################################

# ############################################################################################################################################
# ########################################################################################
# Tracer Samples
# ############################################################################################################################################
# ########################################################################################

## Build HIP/HSA Trace Sample
# Build HIP/HSA Trace Sample
set_source_files_properties(tracer/sample.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(tracer_hip_hsa tracer/sample.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(tracer_hip_hsa PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
target_include_directories(tracer_hip_hsa PRIVATE ${PROJECT_SOURCE_DIR}
${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(tracer_hip_hsa PRIVATE rocprofiler-v2 amd_comgr)
target_link_options(tracer_hip_hsa PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples tracer_hip_hsa)
install(TARGETS tracer_hip_hsa RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS tracer_hip_hsa
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)

## Build HIP/HSA Trace with async output api trace data Sample
set_source_files_properties(tracer/sample_async.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(tracer_hip_hsa_async tracer/sample_async.cpp ${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(tracer_hip_hsa_async PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
# Build HIP/HSA Trace with async output api trace data Sample
set_source_files_properties(tracer/sample_async.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT
1)
hip_add_executable(tracer_hip_hsa_async tracer/sample_async.cpp
${ROCPROFILER_UTIL_SRC_FILES})
target_include_directories(
tracer_hip_hsa_async PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
target_link_libraries(tracer_hip_hsa_async PRIVATE rocprofiler-v2 amd_comgr)
target_link_options(tracer_hip_hsa_async PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples tracer_hip_hsa_async)
install(TARGETS tracer_hip_hsa_async RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
install(TARGETS tracer_hip_hsa_async
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)

# ############################################################################################################################################
# ########################################################################################
# PC Sampling Samples
# ############################################################################################################################################
# ########################################################################################

set(CODE_PRINTING_SAMPLE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/pcsampler/code_printing_sample)
file(GLOB PC_SAMPLING_CODE_PRINTING_FILES ${CODE_PRINTING_SAMPLE_DIR}/*.cpp)
set_source_files_properties(${PC_SAMPLING_CODE_PRINTING_FILES} PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(pc_sampling_code_printing ${PC_SAMPLING_CODE_PRINTING_FILES}
HIPCC_OPTIONS
-std=c++17
# Include debugging symbols and source for the contextual disassembly
-gdwarf-4)
set_source_files_properties(${PC_SAMPLING_CODE_PRINTING_FILES}
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
hip_add_executable(
pc_sampling_code_printing ${PC_SAMPLING_CODE_PRINTING_FILES} HIPCC_OPTIONS -std=c++17
# Include debugging symbols and source for the contextual disassembly
-gdwarf-4)

check_c_source_compiles("
check_c_source_compiles(
"
#define _GNU_SOURCE
#include <sys/mman.h>
int main() { return memfd_create (\"cmake_test\", 0); }
" HAVE_MEMFD_CREATE)
if (HAVE_MEMFD_CREATE)
target_compile_definitions(pc_sampling_code_printing PRIVATE HAVE_MEMFD_CREATE)
endif()
"
HAVE_MEMFD_CREATE)
if(HAVE_MEMFD_CREATE)
target_compile_definitions(pc_sampling_code_printing PRIVATE HAVE_MEMFD_CREATE)
endif()

target_link_libraries(pc_sampling_code_printing
PRIVATE
rocprofiler-v2
rocm-dbgapi
${LIBELF_LIBRARIES}
${LIBDW_LIBRARIES}
hsa-runtime64::hsa-runtime64 Threads::Threads dl)
target_include_directories(pc_sampling_code_printing
PRIVATE
${TEST_DIR}
${ROOT_DIR}
${HSA_RUNTIME_INC_PATH}
${PROJECT_SOURCE_DIR})
target_link_options(pc_sampling_code_printing PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples pc_sampling_code_printing)
install(TARGETS pc_sampling_code_printing RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
target_link_libraries(
pc_sampling_code_printing
PRIVATE rocprofiler-v2 rocm-dbgapi ${LIBELF_LIBRARIES} ${LIBDW_LIBRARIES}
hsa-runtime64::hsa-runtime64 Threads::Threads dl)
target_include_directories(
pc_sampling_code_printing PRIVATE ${TEST_DIR} ${ROOT_DIR} ${HSA_RUNTIME_INC_PATH}
${PROJECT_SOURCE_DIR})
target_link_options(pc_sampling_code_printing PRIVATE "-Wl,--build-id=md5")
add_dependencies(samples pc_sampling_code_printing)
install(TARGETS pc_sampling_code_printing
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
COMPONENT samples)

install(DIRECTORY "${PROJECT_SOURCE_DIR}/samples/" DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples-src OPTIONAL COMPONENT samples)
install(
DIRECTORY "${PROJECT_SOURCE_DIR}/samples/"
DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples-src
OPTIONAL
COMPONENT samples)

# ############################################################################################################################################
# ########################################################################################
# Scripts to run samples
# ############################################################################################################################################
# ########################################################################################

# Copy run_samples script to samples folder
configure_file(run_samples.sh ${PROJECT_BINARY_DIR}/samples COPYONLY)

# ############################################################################################################################################
# ########################################################################################

@@ -8,15 +8,14 @@ int main(int argc, char** argv) {
"CI_PERF_slv_MemRd_Bandwidth0", "CI_PERF_slv_MemWr_Bandwidth0", "CI_PERF_slv_totalMemRdTx",
"CI_PERF_slv_totalMemWrTx", "CI_PERF_slv_totalTx"};

if(argc > 1) {
if (argc > 1) {
counter_option = atoi(argv[1]);
}
else{
std::cout<< "Please provide one of the counter index options as argument:\n";
for(int i = 0; i < pcie_counters.size(); i++){
std::cout<< "[" << i << "]: " << pcie_counters[i] << std::endl;
} else {
std::cout << "Please provide one of the counter index options as argument:\n";
for (int i = 0; i < pcie_counters.size(); i++) {
std::cout << "[" << i << "]: " << pcie_counters[i] << std::endl;
}
std::cout<< "Example:\n ./pcie_counters_sampler 1\n";
std::cout << "Example:\n ./pcie_counters_sampler 1\n";
exit(0);
}

@@ -55,10 +54,10 @@ int main(int argc, char** argv) {
.sampling_rate = rate,
.sampling_duration = duration,
.gpu_agent_index = 0};
CHECK_ROCPROFILER(
rocprofiler_create_filter(session_id, ROCPROFILER_COUNTERS_SAMPLER,
rocprofiler_filter_data_t{.counters_sampler_parameters = cs_parameters},
0, &filter_id, property));
CHECK_ROCPROFILER(rocprofiler_create_filter(
session_id, ROCPROFILER_COUNTERS_SAMPLER,
rocprofiler_filter_data_t{.counters_sampler_parameters = cs_parameters}, 0, &filter_id,
property));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, filter_id, buffer_id));

// Normal HIP Calls
+8
-8
@@ -40,14 +40,14 @@ int main(int argc, char** argv) {
uint32_t duration = 5000;

rocprofiler_counters_sampler_parameters_t cs_parameters = {.counters = counters_input,
.counters_num = 1,
.sampling_rate = rate,
.sampling_duration = duration,
.gpu_agent_index = 0};
CHECK_ROCPROFILER(
rocprofiler_create_filter(session_id, ROCPROFILER_COUNTERS_SAMPLER,
rocprofiler_filter_data_t{.counters_sampler_parameters = cs_parameters},
0, &filter_id, property));
.counters_num = 1,
.sampling_rate = rate,
.sampling_duration = duration,
.gpu_agent_index = 0};
CHECK_ROCPROFILER(rocprofiler_create_filter(
session_id, ROCPROFILER_COUNTERS_SAMPLER,
rocprofiler_filter_data_t{.counters_sampler_parameters = cs_parameters}, 0, &filter_id,
property));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, filter_id, buffer_id));

// Normal HIP Calls
+878
-1037
File diff suppressed because it is too large
+49
-71
@@ -31,96 +31,74 @@
namespace amd::debug_agent {

class code_object_t {
struct symbol_info_t {
const std::string m_name;
amd_dbgapi_global_address_t m_value;
amd_dbgapi_size_t m_size;
};
struct symbol_info_t {
const std::string m_name;
amd_dbgapi_global_address_t m_value;
amd_dbgapi_size_t m_size;
};

using symbol_map_t =
std::optional
< std::map
< amd_dbgapi_global_address_t
, std::pair<std::string, amd_dbgapi_size_t>
>
>;
using symbol_map_t = std::optional<
std::map<amd_dbgapi_global_address_t, std::pair<std::string, amd_dbgapi_size_t>>>;

public:
void load_symbol_map();
void load_debug_info();
public:
void load_symbol_map();
void load_debug_info();

std::optional<symbol_info_t>
find_symbol(amd_dbgapi_global_address_t address);
std::optional<symbol_info_t> find_symbol(amd_dbgapi_global_address_t address);

code_object_t(amd_dbgapi_code_object_id_t code_object_id);
code_object_t(code_object_t &&rhs);
code_object_t(amd_dbgapi_code_object_id_t code_object_id);
code_object_t(code_object_t&& rhs);

~code_object_t();
~code_object_t();

void open();
bool is_open() const { return m_fd.has_value(); }
void open();
bool is_open() const { return m_fd.has_value(); }

amd_dbgapi_global_address_t load_address() const { return m_load_address; }
amd_dbgapi_size_t mem_size() const { return m_mem_size; }
// FIXME(?): extra function not in rocr-debug-agent
uint32_t elf_amdgpu_machine() const { return m_elf_amdgpu_machine; }
amd_dbgapi_global_address_t load_address() const { return m_load_address; }
amd_dbgapi_size_t mem_size() const { return m_mem_size; }
// FIXME(?): extra function not in rocr-debug-agent
uint32_t elf_amdgpu_machine() const { return m_elf_amdgpu_machine; }

void disassemble_around(amd_dbgapi_architecture_id_t architecture_id,
amd_dbgapi_global_address_t pc);
void disassemble_around(amd_dbgapi_architecture_id_t architecture_id,
amd_dbgapi_global_address_t pc);

void disassemble_kernel(amd_dbgapi_architecture_id_t architecture_id,
amd_dbgapi_global_address_t start_addr,
bool const print_src = false);
void disassemble_kernel(amd_dbgapi_architecture_id_t architecture_id,
amd_dbgapi_global_address_t start_addr, bool const print_src = false);

bool save(const std::string &directory) const;
bool save(const std::string& directory) const;

amd_dbgapi_global_address_t m_load_address{ 0 };
amd_dbgapi_size_t m_mem_size{ 0 };
std::optional<int> m_fd;
amd_dbgapi_global_address_t m_load_address{0};
amd_dbgapi_size_t m_mem_size{0};
std::optional<int> m_fd;

std::optional
< std::map<amd_dbgapi_global_address_t, std::pair<std::string, size_t>>
>
m_line_number_map;
std::optional<std::map<amd_dbgapi_global_address_t, std::pair<std::string, size_t>>>
m_line_number_map;

std::optional
< std::map<amd_dbgapi_global_address_t, amd_dbgapi_global_address_t>
>
m_pc_ranges_map;
std::optional<std::map<amd_dbgapi_global_address_t, amd_dbgapi_global_address_t>> m_pc_ranges_map;

symbol_map_t m_symbol_map;
std::string m_uri;
amd_dbgapi_code_object_id_t const m_code_object_id;
// FIXME(?): extra field not in rocr-debug-agent
uint32_t m_elf_amdgpu_machine{ 0 };
symbol_map_t m_symbol_map;
std::string m_uri;
amd_dbgapi_code_object_id_t const m_code_object_id;
// FIXME(?): extra field not in rocr-debug-agent
uint32_t m_elf_amdgpu_machine{0};
};

} // namespace amd::debug_agent
} // namespace amd::debug_agent

enum struct disassembly_mode {
AROUND,
KERNEL
};
enum struct disassembly_mode { AROUND, KERNEL };

std::tuple
< amd_dbgapi_process_id_t
, std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>
>
std::tuple<amd_dbgapi_process_id_t,
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>>
init_disassembly();

void
disassemble(
disassembly_mode const mode,
amd_dbgapi_process_id_t const process_id,
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>
&code_object_map,
uint64_t const addr);
void disassemble(
disassembly_mode const mode, amd_dbgapi_process_id_t const process_id,
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>& code_object_map,
uint64_t const addr);

void
print_pc_context(
amd_dbgapi_process_id_t const process_id,
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>
&code_object_map,
amd_dbgapi_global_address_t const pc);
void print_pc_context(
amd_dbgapi_process_id_t const process_id,
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>& code_object_map,
amd_dbgapi_global_address_t const pc);

#endif  // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_CODE_PRINTING_HPP_
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_CODE_PRINTING_HPP_
+82
-121
@@ -47,169 +47,130 @@
#include "program.hpp"

struct libc_freer {
void operator()(char *p) { free(p); }
void operator()(char* p) { free(p); }
};

namespace util {

template <typename T, typename... Ts>
static void
hash_combine(size_t &hsh, T const& v, Ts const&... rest)
{
hsh ^= std::hash<T>{}(v) + 0x9e3779b9 + (hsh << 6) + (hsh >> 2);
(hash_combine(hsh, rest), ...);
static void hash_combine(size_t& hsh, T const& v, Ts const&... rest) {
hsh ^= std::hash<T>{}(v) + 0x9e3779b9 + (hsh << 6) + (hsh >> 2);
(hash_combine(hsh, rest), ...);
}

} // namespace util
} // namespace util

[[maybe_unused]]
static inline bool
operator==(hsa_executable_t const &l, hsa_executable_t const &r)
{
return l.handle == r.handle;
[[maybe_unused]] static inline bool operator==(hsa_executable_t const& l,
hsa_executable_t const& r) {
return l.handle == r.handle;
}

[[maybe_unused]]
static inline bool
operator==(
rocprofiler_kernel_dispatch_id_t const &l,
rocprofiler_kernel_dispatch_id_t const &r)
{
return l.value == r.value;
[[maybe_unused]] static inline bool operator==(rocprofiler_kernel_dispatch_id_t const& l,
rocprofiler_kernel_dispatch_id_t const& r) {
return l.value == r.value;
}

static inline bool
operator==(amd_dbgapi_process_id_t const &l, amd_dbgapi_process_id_t const &r)
{
return l.handle == r.handle;
static inline bool operator==(amd_dbgapi_process_id_t const& l, amd_dbgapi_process_id_t const& r) {
return l.handle == r.handle;
}

static inline bool
operator!=(amd_dbgapi_process_id_t const &l, amd_dbgapi_process_id_t const &r)
{
return !(l == r);
static inline bool operator!=(amd_dbgapi_process_id_t const& l, amd_dbgapi_process_id_t const& r) {
return !(l == r);
}

namespace std {

template <>
struct hash<hsa_executable_t> {
size_t operator()(hsa_executable_t const &v) const {
size_t ret = 0;
util::hash_combine(ret, v.handle);
return ret;
}
template <> struct hash<hsa_executable_t> {
size_t operator()(hsa_executable_t const& v) const {
size_t ret = 0;
util::hash_combine(ret, v.handle);
return ret;
}
};

template <>
struct hash<rocprofiler_kernel_dispatch_id_t> {
size_t operator()(rocprofiler_kernel_dispatch_id_t const &v) const {
size_t ret = 0;
util::hash_combine(ret, v.value);
return ret;
}
template <> struct hash<rocprofiler_kernel_dispatch_id_t> {
size_t operator()(rocprofiler_kernel_dispatch_id_t const& v) const {
size_t ret = 0;
util::hash_combine(ret, v.value);
return ret;
}
};

}  // namespace std
} // namespace std

struct disassembly_ctx_t {
disassembly_ctx_t();
~disassembly_ctx_t();
disassembly_ctx_t();
~disassembly_ctx_t();

void disassemble_kernels(bool const reinitialize);
void init();
bool inited() const;
void reset();
void disassemble_kernels(bool const reinitialize);
void init();
bool inited() const;
void reset();

amd_dbgapi_process_id_t process_id;
std::map
< amd_dbgapi_global_address_t
, amd::debug_agent::code_object_t
> codeobjs;
amd_dbgapi_process_id_t process_id;
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t> codeobjs;
};

disassembly_ctx_t::disassembly_ctx_t()
: process_id(AMD_DBGAPI_PROCESS_NONE)
, codeobjs()
{}
disassembly_ctx_t::disassembly_ctx_t() : process_id(AMD_DBGAPI_PROCESS_NONE), codeobjs() {}

disassembly_ctx_t::~disassembly_ctx_t()
{
disassembly_ctx_t::~disassembly_ctx_t() { reset(); }

void disassembly_ctx_t::disassemble_kernels(bool const reinitialize) {
if (reinitialize) {
reset();
}
}
if (!inited()) {
init();
}

void
disassembly_ctx_t::disassemble_kernels(bool const reinitialize)
{
if (reinitialize) {
reset();
}
if (!inited()) {
init();
auto it = codeobjs.begin();
auto const end = codeobjs.end();
auto const pred = [](decltype(*it)& x) {
/*
* A lame filter for the kernels in the current file, because nothing
* else in this little demo will have the URL prefix of `file://`.
*/
return x.second.m_uri.find("file://", 0, 7) != std::string::npos;
};
while (end != (it = std::find_if(it, end, pred))) {
auto& codeobj = it->second;
codeobj.load_symbol_map();
if (!codeobj.m_symbol_map) {
fputs(PROGNAME ": error: failed to load symbol map\n", stderr);
break;
}

auto it = codeobjs.begin();
auto const end = codeobjs.end();
auto const pred = [](decltype(*it) &x){
/*
* A lame filter for the kernels in the current file, because nothing
* else in this little demo will have the URL prefix of `file://`.
*/
return x.second.m_uri.find("file://", 0, 7) != std::string::npos;
};
while (end != (it = std::find_if(it, end, pred))) {
auto &codeobj = it->second;
codeobj.load_symbol_map();
if (!codeobj.m_symbol_map) {
fputs(PROGNAME ": error: failed to load symbol map\n", stderr);
break;
}

for (auto const &sym : *codeobj.m_symbol_map) {
auto const &addr = sym.first;
::disassemble(disassembly_mode::KERNEL, process_id, codeobjs, addr);
}

++it;
for (auto const& sym : *codeobj.m_symbol_map) {
auto const& addr = sym.first;
::disassemble(disassembly_mode::KERNEL, process_id, codeobjs, addr);
}

++it;
}
}

inline void
disassembly_ctx_t::init()
{
std::tie(process_id, codeobjs) = init_disassembly();
}
inline void disassembly_ctx_t::init() { std::tie(process_id, codeobjs) = init_disassembly(); }

inline bool
disassembly_ctx_t::inited() const
{
return AMD_DBGAPI_PROCESS_NONE != process_id;
}
inline bool disassembly_ctx_t::inited() const { return AMD_DBGAPI_PROCESS_NONE != process_id; }

void
disassembly_ctx_t::reset()
{
codeobjs.clear();
if (AMD_DBGAPI_PROCESS_NONE.handle != process_id.handle) {
amd_dbgapi_process_detach(process_id);
amd_dbgapi_finalize();
process_id = AMD_DBGAPI_PROCESS_NONE;
}
void disassembly_ctx_t::reset() {
codeobjs.clear();
if (AMD_DBGAPI_PROCESS_NONE.handle != process_id.handle) {
amd_dbgapi_process_detach(process_id);
amd_dbgapi_finalize();
process_id = AMD_DBGAPI_PROCESS_NONE;
}
}

static disassembly_ctx_t g_dis;

void
disassembly_disassemble_kernels(bool const reinitialize)
{
g_dis.disassemble_kernels(reinitialize);
void disassembly_disassemble_kernels(bool const reinitialize) {
g_dis.disassemble_kernels(reinitialize);
}

void
disassembly_print_pc_sample_context(amd_dbgapi_global_address_t const pc)
{
if (!g_dis.inited()) {
g_dis.init();
}
print_pc_context(g_dis.process_id, g_dis.codeobjs, pc);
void disassembly_print_pc_sample_context(amd_dbgapi_global_address_t const pc) {
if (!g_dis.inited()) {
g_dis.init();
}
print_pc_context(g_dis.process_id, g_dis.codeobjs, pc);
}

@@ -23,10 +23,8 @@

#include <amd-dbgapi/amd-dbgapi.h>

void
disassembly_disassemble_kernels(bool const);
void disassembly_disassemble_kernels(bool const);

void
disassembly_print_pc_sample_context(amd_dbgapi_global_address_t const);
void disassembly_print_pc_sample_context(amd_dbgapi_global_address_t const);

#endif  // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_DISASSEMBLY_HPP_
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_DISASSEMBLY_HPP_

@@ -46,274 +46,227 @@
namespace util {

struct hipMalloc_freer {
void operator()(void * const ptr) { (void)hipFree(ptr); }
void operator()(void* const ptr) { (void)hipFree(ptr); }
};

}  // namespace util
} // namespace util

namespace prng {

static uint64_t
splitmix64_next(uint64_t * const sm64_state)
{
uint64_t z = (*sm64_state += 0x9e3779b97f4a7c15);
z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9;
z = (z ^ (z >> 27)) * 0x94d049bb133111eb;
return z ^ (z >> 31);
static uint64_t splitmix64_next(uint64_t* const sm64_state) {
uint64_t z = (*sm64_state += 0x9e3779b97f4a7c15);
z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9;
z = (z ^ (z >> 27)) * 0x94d049bb133111eb;
return z ^ (z >> 31);
}

static inline uint64_t
rotl64(const uint64_t x, int k)
{
return (x << k) | (x >> (64 - k));
static inline uint64_t rotl64(const uint64_t x, int k) { return (x << k) | (x >> (64 - k)); }

static uint64_t xrs_next(uint64_t* const xrs_state) {
const uint64_t result = rotl64(xrs_state[0] + xrs_state[3], 23) + xrs_state[0];

const uint64_t t = xrs_state[1] << 17;

xrs_state[2] ^= xrs_state[0];
xrs_state[3] ^= xrs_state[1];
xrs_state[1] ^= xrs_state[2];
xrs_state[0] ^= xrs_state[3];

xrs_state[2] ^= t;

xrs_state[3] = rotl64(xrs_state[3], 45);

return result;
}

static uint64_t
xrs_next(uint64_t * const xrs_state)
{
const uint64_t result =
rotl64(xrs_state[0] + xrs_state[3], 23) + xrs_state[0];

const uint64_t t = xrs_state[1] << 17;

xrs_state[2] ^= xrs_state[0];
xrs_state[3] ^= xrs_state[1];
xrs_state[1] ^= xrs_state[2];
xrs_state[0] ^= xrs_state[3];

xrs_state[2] ^= t;

xrs_state[3] = rotl64(xrs_state[3], 45);

return result;
}

}  // namespace prng
} // namespace prng

namespace kernel {

template <typename T>
__global__ static void
memset_gpu(T * const s, T const c, size_t const n)
{
size_t i_start = threadIdx.x + blockIdx.x * blockDim.x;
size_t i_shift = blockDim.x * gridDim.x;
for (size_t i = i_start; i < n; i += i_shift) {
s[i] = c;
}
template <typename T> __global__ static void memset_gpu(T* const s, T const c, size_t const n) {
size_t i_start = threadIdx.x + blockIdx.x * blockDim.x;
size_t i_shift = blockDim.x * gridDim.x;
for (size_t i = i_start; i < n; i += i_shift) {
s[i] = c;
}
}

template <typename T>
__global__ static void
count_gpu(
|
||||
T const * const xs,
|
||||
T * const out,
|
||||
size_t const n,
|
||||
size_t const nblocks,
|
||||
T const gt)
|
||||
{
|
||||
size_t i_start = threadIdx.x + blockIdx.x * blockDim.x;
|
||||
size_t i_shift = blockDim.x * gridDim.x;
|
||||
for (size_t i = i_start; i < n; i += i_shift) {
|
||||
if (xs[i] > gt) {
|
||||
atomicAdd(&out[i % nblocks], 1);
|
||||
}
|
||||
__global__ static void count_gpu(T const* const xs, T* const out, size_t const n,
|
||||
size_t const nblocks, T const gt) {
|
||||
size_t i_start = threadIdx.x + blockIdx.x * blockDim.x;
|
||||
size_t i_shift = blockDim.x * gridDim.x;
|
||||
for (size_t i = i_start; i < n; i += i_shift) {
|
||||
if (xs[i] > gt) {
|
||||
atomicAdd(&out[i % nblocks], 1);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
} // namespace kernel
|
||||
} // namespace kernel
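As a cross-check for `count_gpu`, a host-side reference can reproduce the same bucketing: element `i` increments subtotal `i % nblocks` whenever `xs[i] > gt`, and the subtotals are summed afterwards. The sketch below is a plain C++ illustration and not part of the sample; `count_subtotals_host` and `count_total_host` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical host reference: one subtotal per bucket, using the same
// i % nbuckets mapping as the kernel's atomicAdd target.
static std::vector<size_t> count_subtotals_host(std::vector<uint64_t> const& xs,
                                                size_t const nbuckets,
                                                uint64_t const gt) {
  assert(nbuckets > 0);
  std::vector<size_t> out(nbuckets, 0);
  for (size_t i = 0; i < xs.size(); ++i) {
    if (xs[i] > gt) {
      ++out[i % nbuckets];
    }
  }
  return out;
}

// Sum the subtotals, as run_kernel does on the host with std::accumulate.
static size_t count_total_host(std::vector<uint64_t> const& xs,
                               size_t const nbuckets,
                               uint64_t const gt) {
  auto const sub = count_subtotals_host(xs, nbuckets, gt);
  return std::accumulate(sub.cbegin(), sub.cend(), static_cast<size_t>(0));
}
```

The total does not depend on the number of buckets; spreading the increments over `nblocks` slots only reduces contention on any single atomic counter.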

static char const GETOPT_ARGS[] = "cd:mn:DP";

static void
usage()
{
fputs("usage: " PROGNAME " [OPTION]... MIN [SEED]\n"
" -d DEV\tHIP device number\n"
" -n LEN\tLength of random integer array\n"
" -D\t\tPrint kernel disassembly\n"
" -P\t\tPrint source and disassembly of sampled PC locations\n"
"where\n"
" DEV : i32\n"
" MIN : u64\n"
" LEN : u64\n"
" SEED : u64\n",
stderr);
static void usage() {
fputs("usage: " PROGNAME
" [OPTION]... MIN [SEED]\n"
" -d DEV\tHIP device number\n"
" -n LEN\tLength of random integer array\n"
" -D\t\tPrint kernel disassembly\n"
" -P\t\tPrint source and disassembly of sampled PC locations\n"
"where\n"
" DEV : i32\n"
" MIN : u64\n"
" LEN : u64\n"
" SEED : u64\n",
stderr);
}

static int
get_options(int argc, char **argv, program_options * const opts)
{
int opt;
static int get_options(int argc, char** argv, program_options* const opts) {
int opt;

while (-1 != (opt = getopt(argc, argv, GETOPT_ARGS))) {
switch (opt) {
case 'd':
// TODO error checking
opts->device = strtol(optarg, nullptr, 10);
break;
case 'n':
// TODO error checking
opts->rands_len = strtoul(optarg, nullptr, 10);
break;
case 'D':
opts->disassemble = true;
break;
case 'P':
opts->pc_sampling = true;
break;
default:
usage();
return EXIT_FAILURE;
}
}

auto const optcount = argc - optind;
if (!(1 == optcount || 2 == optcount)) {
while (-1 != (opt = getopt(argc, argv, GETOPT_ARGS))) {
switch (opt) {
case 'd':
// TODO error checking
opts->device = strtol(optarg, nullptr, 10);
break;
case 'n':
// TODO error checking
opts->rands_len = strtoul(optarg, nullptr, 10);
break;
case 'D':
opts->disassemble = true;
break;
case 'P':
opts->pc_sampling = true;
break;
default:
usage();
return EXIT_FAILURE;
}
}

// TODO error checking
opts->gt = strtoul(argv[optind], nullptr, 10);
if (2 == argc - optind) {
opts->seed = strtoull(argv[optind + 1], nullptr, 10);
}
auto const optcount = argc - optind;
if (!(1 == optcount || 2 == optcount)) {
usage();
return EXIT_FAILURE;
}

return EXIT_SUCCESS;
// TODO error checking
opts->gt = strtoul(argv[optind], nullptr, 10);
if (2 == argc - optind) {
opts->seed = strtoull(argv[optind + 1], nullptr, 10);
}

return EXIT_SUCCESS;
}
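The `// TODO error checking` comments in `get_options` mark places where `strtol`, `strtoul` and `strtoull` results are stored without validation, so input such as `12x`, `-1` or an out-of-range value is accepted silently. A minimal sketch of a checked parser for the `u64` arguments follows; `parse_u64` is a hypothetical helper, not something the sample defines.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: parse a base-10 unsigned 64-bit value.
// Returns false (leaving *out untouched) on empty input, a leading sign
// or whitespace, trailing characters, or overflow.
static bool parse_u64(char const* const text, uint64_t* const out) {
  assert(out != nullptr);
  // strtoull would skip whitespace and accept '-', so require a digit first.
  if (text == nullptr || !std::isdigit(static_cast<unsigned char>(text[0]))) {
    return false;
  }
  errno = 0;
  char* end = nullptr;
  unsigned long long const value = std::strtoull(text, &end, 10);
  if (errno == ERANGE || *end != '\0') {
    return false;
  }
  *out = static_cast<uint64_t>(value);
  return true;
}
```

In `get_options`, a failed parse could then be handled the same way as an unknown option: call `usage()` and return `EXIT_FAILURE`.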

static program_options g_opts;

static void
callback_flush_fn(
rocprofiler_record_header_t const *record,
rocprofiler_record_header_t const *end_record,
rocprofiler_session_id_t session_id,
rocprofiler_buffer_id_t buffer_id)
{
while (record < end_record) {
if (nullptr == record) {
break;
}
if (ROCPROFILER_PC_SAMPLING_RECORD == record->kind) {
auto const &pcr = (rocprofiler_record_pc_sample_t &)*record;
printf(
"dispatch[%" PRIu64 "] timestamp(%" PRIu64
") gpu_id(%#" PRIx64 ") pc-sample(%#" PRIx64
") se(%" PRIu32 ")\n",
pcr.pc_sample.dispatch_id.value,
pcr.pc_sample.timestamp.value,
pcr.pc_sample.gpu_id.handle,
pcr.pc_sample.pc,
pcr.pc_sample.se);
if (g_opts.pc_sampling) {
disassembly_print_pc_sample_context(pcr.pc_sample.pc);
}
}
rocprofiler_next_record(record, &record, session_id, buffer_id);
static void callback_flush_fn(rocprofiler_record_header_t const* record,
rocprofiler_record_header_t const* end_record,
rocprofiler_session_id_t session_id,
rocprofiler_buffer_id_t buffer_id) {
while (record < end_record) {
if (nullptr == record) {
break;
}
if (ROCPROFILER_PC_SAMPLING_RECORD == record->kind) {
auto const& pcr = (rocprofiler_record_pc_sample_t&)*record;
printf("dispatch[%" PRIu64 "] timestamp(%" PRIu64 ") gpu_id(%#" PRIx64 ") pc-sample(%#" PRIx64
") se(%" PRIu32 ")\n",
pcr.pc_sample.dispatch_id.value, pcr.pc_sample.timestamp.value,
pcr.pc_sample.gpu_id.handle, pcr.pc_sample.pc, pcr.pc_sample.se);
if (g_opts.pc_sampling) {
disassembly_print_pc_sample_context(pcr.pc_sample.pc);
}
}
rocprofiler_next_record(record, &record, session_id, buffer_id);
}
}

static int
run_kernel(program_options const &opts)
{
rocprofiler_session_id_t sid;
rocprofiler_filter_id_t fid, fid2;
rocprofiler_buffer_id_t bid;
auto rocprofiler_ok = ROCPROFILER_STATUS_SUCCESS;
static int run_kernel(program_options const& opts) {
rocprofiler_session_id_t sid;
rocprofiler_filter_id_t fid, fid2;
rocprofiler_buffer_id_t bid;
auto rocprofiler_ok = ROCPROFILER_STATUS_SUCCESS;

if (opts.pc_sampling) {
ROCPROFILER_CHECK(
rocprofiler_create_session(ROCPROFILER_NONE_REPLAY_MODE, &sid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
fputs("error: failed to create rocprofiler session\n", stderr);
return EXIT_FAILURE;
}

rocprofiler_filter_property_t property{};

ROCPROFILER_CHECK(
rocprofiler_create_buffer(
sid, callback_flush_fn, static_cast<size_t>(0x1000), &bid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
fputs("error: failed to add PC sampling session mode\n", stderr);
goto out;
}

ROCPROFILER_CHECK(
rocprofiler_create_filter(
sid, ROCPROFILER_PC_SAMPLING_COLLECTION,
rocprofiler_filter_data_t{},
0, &fid, property),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}

ROCPROFILER_CHECK(
rocprofiler_create_filter(
sid, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION,
rocprofiler_filter_data_t{},
0, &fid2, property),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}

ROCPROFILER_CHECK(
rocprofiler_set_filter_buffer(sid, fid, bid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}

ROCPROFILER_CHECK(
rocprofiler_set_filter_buffer(sid, fid2, bid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}

ROCPROFILER_CHECK(
rocprofiler_start_session(sid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
if (opts.pc_sampling) {
ROCPROFILER_CHECK(rocprofiler_create_session(ROCPROFILER_NONE_REPLAY_MODE, &sid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
fputs("error: failed to create rocprofiler session\n", stderr);
return EXIT_FAILURE;
}

{
rocprofiler_filter_property_t property{};

ROCPROFILER_CHECK(
rocprofiler_create_buffer(sid, callback_flush_fn, static_cast<size_t>(0x1000), &bid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
fputs("error: failed to add PC sampling session mode\n", stderr);
goto out;
}

ROCPROFILER_CHECK(rocprofiler_create_filter(sid, ROCPROFILER_PC_SAMPLING_COLLECTION,
rocprofiler_filter_data_t{}, 0, &fid, property),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}

ROCPROFILER_CHECK(rocprofiler_create_filter(sid, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION,
rocprofiler_filter_data_t{}, 0, &fid2, property),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}

ROCPROFILER_CHECK(rocprofiler_set_filter_buffer(sid, fid, bid), rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}

ROCPROFILER_CHECK(rocprofiler_set_filter_buffer(sid, fid2, bid), rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}

ROCPROFILER_CHECK(rocprofiler_start_session(sid), rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
}

{
printf("seed = %" PRIu64 "\n", opts.seed);

std::vector<uint64_t> rands(opts.rands_len);
using rands_elt_t = decltype(rands)::value_type;

uint64_t
sm64_state = opts.seed,
xrs_state[4];
uint64_t sm64_state = opts.seed, xrs_state[4];

{
using prng::splitmix64_next;
using prng::xrs_next;
using prng::splitmix64_next;
using prng::xrs_next;

// Initialize the Xoroshiro PRNG
xrs_state[0] = splitmix64_next(&sm64_state);
xrs_state[1] = splitmix64_next(&sm64_state);
xrs_state[2] = splitmix64_next(&sm64_state);
xrs_state[3] = splitmix64_next(&sm64_state);
// Initialize the Xoroshiro PRNG
xrs_state[0] = splitmix64_next(&sm64_state);
xrs_state[1] = splitmix64_next(&sm64_state);
xrs_state[2] = splitmix64_next(&sm64_state);
xrs_state[3] = splitmix64_next(&sm64_state);

// Fill rands with random integers
for (auto &i : rands) {
i = xrs_next(xrs_state);
}
// Fill rands with random integers
for (auto& i : rands) {
i = xrs_next(xrs_state);
}
}

struct tm {
using monoclk = std::chrono::steady_clock;
using dur = std::chrono::duration<double>;
using monoclk = std::chrono::steady_clock;
using dur = std::chrono::duration<double>;
};

using util::hipMalloc_freer;
@@ -322,126 +275,109 @@ run_kernel(program_options const &opts)

auto hip_ok = hipSuccess;
do {
HIP_CHECK_BREAK(hipSetDevice(opts.device), hip_ok);
HIP_CHECK_BREAK(hipSetDevice(opts.device), hip_ok);

auto const rands_nbytes = rands.size() * sizeof(rands_elt_t);
std::unique_ptr<rands_elt_t, hipMalloc_freer> rands_gpu;
{
rands_elt_t *rands_gpu_ptr;
HIP_CHECK_BREAK(hipMalloc(&rands_gpu_ptr, rands_nbytes), hip_ok);
rands_gpu.reset(rands_gpu_ptr);
}
auto const rands_nbytes = rands.size() * sizeof(rands_elt_t);
std::unique_ptr<rands_elt_t, hipMalloc_freer> rands_gpu;
{
rands_elt_t* rands_gpu_ptr;
HIP_CHECK_BREAK(hipMalloc(&rands_gpu_ptr, rands_nbytes), hip_ok);
rands_gpu.reset(rands_gpu_ptr);
}

HIP_CHECK_BREAK(
hipMemcpy(rands_gpu.get(), rands.data(), rands_nbytes,
hipMemcpyHostToDevice),
hip_ok);
(void)hipDeviceSynchronize();
HIP_CHECK_BREAK(hipMemcpy(rands_gpu.get(), rands.data(), rands_nbytes, hipMemcpyHostToDevice),
hip_ok);
(void)hipDeviceSynchronize();

uint32_t constexpr nthreads = 256U;
uint32_t const nblocks = (rands.size() + nthreads - 1) / nthreads;
uint32_t constexpr nthreads = 256U;
uint32_t const nblocks = (rands.size() + nthreads - 1) / nthreads;

using count_elt_t = size_t;
using count_elt_t = size_t;

auto const count_subtotals_nbytes = nblocks * sizeof(count_elt_t);
std::unique_ptr<count_elt_t, hipMalloc_freer> count_subtotals_gpu;
{
count_elt_t *count_subtotals_gpu_ptr;
HIP_CHECK_BREAK(
hipMalloc(&count_subtotals_gpu_ptr, count_subtotals_nbytes),
hip_ok);
count_subtotals_gpu.reset(count_subtotals_gpu_ptr);
}
auto const count_subtotals_nbytes = nblocks * sizeof(count_elt_t);
std::unique_ptr<count_elt_t, hipMalloc_freer> count_subtotals_gpu;
{
count_elt_t* count_subtotals_gpu_ptr;
HIP_CHECK_BREAK(hipMalloc(&count_subtotals_gpu_ptr, count_subtotals_nbytes), hip_ok);
count_subtotals_gpu.reset(count_subtotals_gpu_ptr);
}

hipLaunchKernelGGL(
kernel::memset_gpu, nblocks, nthreads, 0, 0,
count_subtotals_gpu.get(), 0UL, static_cast<size_t>(nblocks));
HIP_CHECK_BREAK(hipGetLastError(), hip_ok);
(void)hipDeviceSynchronize();
hipLaunchKernelGGL(kernel::memset_gpu, nblocks, nthreads, 0, 0, count_subtotals_gpu.get(),
0UL, static_cast<size_t>(nblocks));
HIP_CHECK_BREAK(hipGetLastError(), hip_ok);
(void)hipDeviceSynchronize();

auto const kernel_begin_time = tm::monoclk::now();
auto const kernel_begin_time = tm::monoclk::now();

hipLaunchKernelGGL(
kernel::count_gpu, nblocks, nthreads, 0, 0,
rands_gpu.get(), count_subtotals_gpu.get(), rands.size(),
static_cast<size_t>(nblocks), opts.gt);
HIP_CHECK_BREAK(hipGetLastError(), hip_ok);
(void)hipDeviceSynchronize();
hipLaunchKernelGGL(kernel::count_gpu, nblocks, nthreads, 0, 0, rands_gpu.get(),
count_subtotals_gpu.get(), rands.size(), static_cast<size_t>(nblocks),
opts.gt);
HIP_CHECK_BREAK(hipGetLastError(), hip_ok);
(void)hipDeviceSynchronize();

auto const kernel_end_time = tm::monoclk::now();
auto const kernel_end_time = tm::monoclk::now();

std::vector<size_t> count_subtotals(nblocks);
HIP_CHECK_BREAK(
hipMemcpy(count_subtotals.data(), count_subtotals_gpu.get(),
count_subtotals_nbytes, hipMemcpyDeviceToHost),
hip_ok);
(void)hipDeviceSynchronize();
std::vector<size_t> count_subtotals(nblocks);
HIP_CHECK_BREAK(hipMemcpy(count_subtotals.data(), count_subtotals_gpu.get(),
count_subtotals_nbytes, hipMemcpyDeviceToHost),
hip_ok);
(void)hipDeviceSynchronize();

// TODO parallel sum on GPU
auto const total =
std::accumulate(
count_subtotals.cbegin(), count_subtotals.cend(),
static_cast<size_t>(0));
// TODO parallel sum on GPU
auto const total =
std::accumulate(count_subtotals.cbegin(), count_subtotals.cend(), static_cast<size_t>(0));

auto const all_end_time = tm::monoclk::now();
auto const all_end_time = tm::monoclk::now();

tm::dur const kernel_time(kernel_end_time - kernel_begin_time);
auto total_time(all_end_time - begin_time);
tm::dur const total_time_without_tool_init(total_time);
printf("len(rands) = %zu; gt = %zu; count(rands, gt) = %zu\n"
"main kernel time elapsed: %" DBL_FMT "\n"
"full time elapsed: %" DBL_FMT "\n",
rands.size(), opts.gt, total,
kernel_time.count(),
total_time_without_tool_init.count());
tm::dur const kernel_time(kernel_end_time - kernel_begin_time);
auto total_time(all_end_time - begin_time);
tm::dur const total_time_without_tool_init(total_time);
printf(
"len(rands) = %zu; gt = %zu; count(rands, gt) = %zu\n"
"main kernel time elapsed: %" DBL_FMT
"\n"
"full time elapsed: %" DBL_FMT "\n",
rands.size(), opts.gt, total, kernel_time.count(), total_time_without_tool_init.count());
} while (false);

if (opts.disassemble) {
disassembly_disassemble_kernels(false);
}
disassembly_disassemble_kernels(false);
}
}

cleanup:
if (opts.pc_sampling) {
rocprofiler_terminate_session(sid);
rocprofiler_flush_data(sid, bid);
rocprofiler_destroy_session(sid);
}
if (opts.pc_sampling) {
rocprofiler_terminate_session(sid);
rocprofiler_flush_data(sid, bid);
rocprofiler_destroy_session(sid);
}

out:
return ROCPROFILER_STATUS_SUCCESS == rocprofiler_ok
? EXIT_SUCCESS
: EXIT_FAILURE;
return ROCPROFILER_STATUS_SUCCESS == rocprofiler_ok ? EXIT_SUCCESS : EXIT_FAILURE;
}
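The kernel timing in `run_kernel` brackets the launch and the following `hipDeviceSynchronize()` with two `steady_clock` readings and converts the difference to `std::chrono::duration<double>`, i.e. seconds as a floating-point value. A host-only sketch of that pattern, with a hypothetical `time_seconds` helper standing in for the GPU work:

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed time in seconds.
template <typename F>
static double time_seconds(F&& work) {
  using monoclk = std::chrono::steady_clock;
  auto const begin = monoclk::now();
  std::forward<F>(work)();
  auto const end = monoclk::now();
  // duration<double> defaults to a period of one second.
  std::chrono::duration<double> const elapsed(end - begin);
  assert(elapsed.count() >= 0.0);  // steady_clock is monotonic
  return elapsed.count();
}
```

Because kernel launches are asynchronous, the callable passed to such a helper would need to include the synchronization call; otherwise only the launch overhead is measured.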

int
main(int argc, char **argv)
{
if (auto const ret = get_options(argc, argv, &g_opts);
EXIT_SUCCESS != ret)
{
return ret;
}
int main(int argc, char** argv) {
if (auto const ret = get_options(argc, argv, &g_opts); EXIT_SUCCESS != ret) {
return ret;
}

if (hsa_init() != HSA_STATUS_SUCCESS){
return EXIT_FAILURE;
}
if (hsa_init() != HSA_STATUS_SUCCESS) {
return EXIT_FAILURE;
}

int ret = EXIT_FAILURE;
auto ok = ROCPROFILER_STATUS_SUCCESS;
int ret = EXIT_FAILURE;
auto ok = ROCPROFILER_STATUS_SUCCESS;

ROCPROFILER_CHECK(rocprofiler_initialize(), ok);
if (ROCPROFILER_STATUS_SUCCESS == ok) {
ret = run_kernel(g_opts);
} else {
goto out;
}
ROCPROFILER_CHECK(rocprofiler_initialize(), ok);
if (ROCPROFILER_STATUS_SUCCESS == ok) {
ret = run_kernel(g_opts);
} else {
goto out;
}

rocprofiler_finalize();
rocprofiler_finalize();

out:
hsa_shut_down();
return ROCPROFILER_STATUS_SUCCESS == ok && EXIT_FAILURE != ret
? EXIT_SUCCESS
: EXIT_FAILURE;
hsa_shut_down();
return ROCPROFILER_STATUS_SUCCESS == ok && EXIT_FAILURE != ret ? EXIT_SUCCESS : EXIT_FAILURE;
}

@@ -23,32 +23,30 @@

#define PROGNAME "code_printing_sample"

#define HIP_ERROR(code) \
do { \
fprintf(stderr, \
PROGNAME ": Assertion failed at %s:%d, HIP error: %s\n", \
__FILE__, __LINE__, hipGetErrorString((code))); \
fflush(stderr); \
} while (false);
#define HIP_ERROR(code) \
do { \
fprintf(stderr, PROGNAME ": Assertion failed at %s:%d, HIP error: %s\n", __FILE__, __LINE__, \
hipGetErrorString((code))); \
fflush(stderr); \
} while (false);

#define HIP_CHECK_BREAK(expr, var) \
if (auto const code = (expr); hipSuccess != code) { \
HIP_ERROR(code); \
(var) = code; \
break; \
}
#define HIP_CHECK_BREAK(expr, var) \
if (auto const code = (expr); hipSuccess != code) { \
HIP_ERROR(code); \
(var) = code; \
break; \
}

#define ROCPROFILER_ERROR(code) \
do { \
fprintf(stderr, \
PROGNAME ": Assertion failed at %s:%d, ROCProfiler error: %s\n", \
__FILE__, __LINE__, rocprofiler_error_str(code)); \
fflush(stderr); \
} while (false);
#define ROCPROFILER_ERROR(code) \
do { \
fprintf(stderr, PROGNAME ": Assertion failed at %s:%d, ROCProfiler error: %s\n", __FILE__, \
__LINE__, rocprofiler_error_str(code)); \
fflush(stderr); \
} while (false);

#define ROCPROFILER_CHECK(expr, var) \
if ((var) = (expr); ROCPROFILER_STATUS_SUCCESS != (var)) { \
ROCPROFILER_ERROR((var)); \
}
#define ROCPROFILER_CHECK(expr, var) \
if ((var) = (expr); ROCPROFILER_STATUS_SUCCESS != (var)) { \
ROCPROFILER_ERROR((var)); \
}

#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_PROGRAM_HPP_
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_PROGRAM_HPP_
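Two details of these macros are easy to miss. `HIP_ERROR` and `ROCPROFILER_ERROR` end in `while (false);`, so `HIP_ERROR(code);` expands to two statements, which defeats the usual purpose of the `do { } while (false)` wrapper. `HIP_CHECK_BREAK`, on the other hand, has to remain a bare `if`, because its `break` must leave the caller's enclosing `do { ... } while (false)` block. The host-only sketch below illustrates the pattern with hypothetical names (`status_t`, `REPORT_ERROR`, `CHECK_BREAK`, `run_steps`); it assumes C++17 for the `if` with initializer, as the original macros do.

```cpp
#include <cassert>
#include <cstdio>

enum status_t { STATUS_OK = 0, STATUS_FAILED = 1 };  // hypothetical status type

// No semicolon after while (false): `REPORT_ERROR(code);` is one statement.
#define REPORT_ERROR(code)                                              \
  do {                                                                  \
    std::fprintf(stderr, "error %d at %s:%d\n", static_cast<int>(code), \
                 __FILE__, __LINE__);                                   \
  } while (false)

// Deliberately a bare `if`: the `break` must exit the caller's
// do { ... } while (false), not a loop hidden inside the macro.
#define CHECK_BREAK(expr, var)                         \
  if (auto const code_ = (expr); STATUS_OK != code_) { \
    REPORT_ERROR(code_);                               \
    (var) = code_;                                     \
    break;                                             \
  }

// Runs up to two steps; stops at the first failing status.
static status_t run_steps(status_t first, status_t second, int* const steps_run) {
  assert(steps_run != nullptr);
  status_t ok = STATUS_OK;
  do {
    CHECK_BREAK(first, ok);
    ++*steps_run;
    CHECK_BREAK(second, ok);
    ++*steps_run;
  } while (false);
  return ok;
}
```

A consequence of the bare `if` is that such a macro should not be used directly before an `else`; wrapping the call site in braces avoids the ambiguity.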

+18
-19
@@ -25,25 +25,24 @@
#include <cstdint>

struct program_options {
program_options()
: device(0)
, no_gpu(false)
, hip_memset(false)
, rands_len(1024 * 1024 * 4)
, gt(0)
, seed(std::chrono::steady_clock::now().time_since_epoch().count())
, disassemble(false)
, pc_sampling(false)
{}
program_options()
: device(0),
no_gpu(false),
hip_memset(false),
rands_len(1024 * 1024 * 4),
gt(0),
seed(std::chrono::steady_clock::now().time_since_epoch().count()),
disassemble(false),
pc_sampling(false) {}

int device;
bool no_gpu;
bool hip_memset;
size_t rands_len;
uint64_t gt;
uint64_t seed;
bool disassemble;
bool pc_sampling;
int device;
bool no_gpu;
bool hip_memset;
size_t rands_len;
uint64_t gt;
uint64_t seed;
bool disassemble;
bool pc_sampling;
};

#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_PROGRAM_OPTIONS_HPP_
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_PROGRAM_OPTIONS_HPP_

@@ -23,8 +23,8 @@ int main(int argc, char** argv) {

int gpu_agent = 0;
int cpu_agent = 0;
CHECK_ROCPROFILER(rocprofiler_device_profiling_session_create(&counters[0], counters.size(),
&dp_session_id, gpu_agent, cpu_agent));
CHECK_ROCPROFILER(rocprofiler_device_profiling_session_create(
&counters[0], counters.size(), &dp_session_id, gpu_agent, cpu_agent));

printf("session start \n");
// start GPU device profiling

@@ -25,9 +25,10 @@ int main(int argc, char** argv) {
counters.emplace_back("GRBM_COUNT");
rocprofiler_filter_id_t filter_id;
[[maybe_unused]] rocprofiler_filter_property_t property = {};
CHECK_ROCPROFILER(rocprofiler_create_filter(session_id, ROCPROFILER_COUNTERS_COLLECTION,
rocprofiler_filter_data_t{.counters_names = &counters[0]},
counters.size(), &filter_id, property));
CHECK_ROCPROFILER(
rocprofiler_create_filter(session_id, ROCPROFILER_COUNTERS_COLLECTION,
rocprofiler_filter_data_t{.counters_names = &counters[0]},
counters.size(), &filter_id, property));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, filter_id, buffer_id));

// Normal HIP Calls

@@ -40,9 +40,9 @@ int main(int argc, char** argv) {

// Kernel Tracing
rocprofiler_filter_id_t kernel_tracing_filter_id;
CHECK_ROCPROFILER(rocprofiler_create_filter(session_id, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION,
rocprofiler_filter_data_t{}, 0, &kernel_tracing_filter_id,
rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_create_filter(
session_id, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION, rocprofiler_filter_data_t{}, 0,
&kernel_tracing_filter_id, rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, kernel_tracing_filter_id, buffer_id));

// Normal HIP Calls won't be traced

@@ -35,9 +35,9 @@ int main(int argc, char** argv) {

// Kernel Tracing
rocprofiler_filter_id_t kernel_tracing_filter_id;
CHECK_ROCPROFILER(rocprofiler_create_filter(session_id, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION,
rocprofiler_filter_data_t{}, 0, &kernel_tracing_filter_id,
rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_create_filter(
session_id, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION, rocprofiler_filter_data_t{}, 0,
&kernel_tracing_filter_id, rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, kernel_tracing_filter_id, buffer_id));

// Normal HIP Calls won't be traced

@@ -1,25 +1,34 @@
# ############################################################################################################################################
# ROCProfiler General Requirements
# ############################################################################################################################################
find_package(Python3 COMPONENTS Interpreter REQUIRED)
find_package(
Python3
COMPONENTS Interpreter
REQUIRED)

execute_process(COMMAND ${Python3_EXECUTABLE} -c "import lxml"
RESULT_VARIABLE CPP_HEADER_PARSER
OUTPUT_QUIET)
execute_process(
COMMAND ${Python3_EXECUTABLE} -c "import lxml"
RESULT_VARIABLE CPP_HEADER_PARSER
OUTPUT_QUIET)

if(NOT ${CPP_HEADER_PARSER} EQUAL 0)
message(FATAL_ERROR "\
message(
FATAL_ERROR
"\
The \"lxml\" Python3 package is not installed. \
Please install it using the following command: \"${Python3_EXECUTABLE} -m pip install lxml\".\
")
endif()

execute_process(COMMAND ${Python3_EXECUTABLE} -c "import CppHeaderParser"
RESULT_VARIABLE CPP_HEADER_PARSER
OUTPUT_QUIET)
execute_process(
COMMAND ${Python3_EXECUTABLE} -c "import CppHeaderParser"
RESULT_VARIABLE CPP_HEADER_PARSER
OUTPUT_QUIET)

if(NOT ${CPP_HEADER_PARSER} EQUAL 0)
message(FATAL_ERROR "\
message(
FATAL_ERROR
"\
The \"CppHeaderParser\" Python3 package is not installed. \
Please install it using the following command: \"${Python3_EXECUTABLE} -m pip install CppHeaderParser\".\
")
@@ -29,134 +38,157 @@ endif()
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR})

# Getting HSA Include Directory
get_property(HSA_RUNTIME_INCLUDE_DIRECTORIES TARGET hsa-runtime64::hsa-runtime64 PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH
REQUIRED)
get_property(
HSA_RUNTIME_INCLUDE_DIRECTORIES
TARGET hsa-runtime64::hsa-runtime64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH REQUIRED)
get_filename_component(HSA_RUNTIME_INC_PATH ${HSA_H} DIRECTORY)

find_library(AQLPROFILE_LIB "libhsa-amd-aqlprofile64.so" HINTS ${CMAKE_PREFIX_PATH} PATHS ${ROCM_PATH} PATH_SUFFIXES lib)
find_library(
AQLPROFILE_LIB "libhsa-amd-aqlprofile64.so"
HINTS ${CMAKE_PREFIX_PATH}
PATHS ${ROCM_PATH}
PATH_SUFFIXES lib)

if(NOT AQLPROFILE_LIB)
message(FATAL_ERROR "AQL_PROFILE not installed. Please install hsa-amd-aqlprofile!")
message(FATAL_ERROR "AQL_PROFILE not installed. Please install hsa-amd-aqlprofile!")
endif()

# ############################################################################################################################################
# ########################################################################################
# Adding Old Library Files
# ############################################################################################################################################
set (OLD_LIB_SRC
${LIB_DIR}/core/rocprofiler.cpp
${LIB_DIR}/core/gpu_command.cpp
${LIB_DIR}/core/proxy_queue.cpp
${LIB_DIR}/core/simple_proxy_queue.cpp
${LIB_DIR}/core/intercept_queue.cpp
${LIB_DIR}/core/metrics.cpp
${LIB_DIR}/core/activity.cpp
${LIB_DIR}/util/hsa_rsrc_factory.cpp
)
# ########################################################################################
set(OLD_LIB_SRC
${LIB_DIR}/core/rocprofiler.cpp
${LIB_DIR}/core/gpu_command.cpp
${LIB_DIR}/core/proxy_queue.cpp
${LIB_DIR}/core/simple_proxy_queue.cpp
${LIB_DIR}/core/intercept_queue.cpp
${LIB_DIR}/core/metrics.cpp
${LIB_DIR}/core/activity.cpp
${LIB_DIR}/util/hsa_rsrc_factory.cpp)

# ############################################################################################################################################
# ########################################################################################
# Configuring Basic/Derived Counters
# ############################################################################################################################################
# ########################################################################################
set(COUNTERS_DIR ${PROJECT_SOURCE_DIR}/src/core/counters)

execute_process(
COMMAND ${Python3_EXECUTABLE} ${COUNTERS_DIR}/basic/xml_parser_basic.py ${COUNTERS_DIR}/basic ${CMAKE_CURRENT_BINARY_DIR}/basic_counter.cpp
COMMENT "Generating basic_counter.cpp...")
COMMAND
${Python3_EXECUTABLE} ${COUNTERS_DIR}/basic/xml_parser_basic.py
${COUNTERS_DIR}/basic ${CMAKE_CURRENT_BINARY_DIR}/basic_counter.cpp COMMENT
"Generating basic_counter.cpp...")

# execute_process(
# COMMAND ${Python3_EXECUTABLE} ${COUNTERS_DIR}/derived/xml_parser_derived.py ${COUNTERS_DIR}/derived ${CMAKE_CURRENT_BINARY_DIR}/derived_counter.cpp
# COMMENT "Generating derived_counter.cpp...")
# execute_process( COMMAND ${Python3_EXECUTABLE}
# ${COUNTERS_DIR}/derived/xml_parser_derived.py ${COUNTERS_DIR}/derived
# ${CMAKE_CURRENT_BINARY_DIR}/derived_counter.cpp COMMENT "Generating
# derived_counter.cpp...")

# ############################################################################################################################################
# ########################################################################################
# ROCProfiler Tracer HIP/HSA Parsing
# ############################################################################################################################################
get_property(HIP_INCLUDE_DIRECTORIES TARGET hip::amdhip64 PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hip
NO_DEFAULT_PATH
REQUIRED)
# ########################################################################################
get_property(
HIP_INCLUDE_DIRECTORIES
TARGET hip::amdhip64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hip
NO_DEFAULT_PATH REQUIRED)

# # Generate the HSA wrapper functions header
add_custom_command(
OUTPUT hsa_prof_str.h hsa_prof_str.inline.h
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/hsaap.py ${CMAKE_CURRENT_BINARY_DIR} "${HSA_RUNTIME_INC_PATH}" > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/hsaap.py
"${HSA_RUNTIME_INC_PATH}/hsa.h" "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
"${HSA_RUNTIME_INC_PATH}/hsa_ext_image.h" "${HSA_RUNTIME_INC_PATH}/hsa_api_trace.h"
COMMENT "Generating hsa_prof_str.h,hsa_prof_str.inline.h...")
OUTPUT hsa_prof_str.h hsa_prof_str.inline.h
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/hsaap.py
${CMAKE_CURRENT_BINARY_DIR} "${HSA_RUNTIME_INC_PATH}" > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/hsaap.py
"${HSA_RUNTIME_INC_PATH}/hsa.h"
"${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
"${HSA_RUNTIME_INC_PATH}/hsa_ext_image.h"
"${HSA_RUNTIME_INC_PATH}/hsa_api_trace.h"
COMMENT "Generating hsa_prof_str.h,hsa_prof_str.inline.h...")

# # Generate the HSA pretty printers
|
||||
add_custom_command(
|
||||
OUTPUT hsa_ostream_ops.h
|
||||
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa.h" -o hsa.h.i
|
||||
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h" -o hsa_ext_amd.h.i
|
||||
BYPRODUCTS hsa.h.i hsa_ext_amd.h.i
|
||||
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py
|
||||
-in hsa.h.i,hsa_ext_amd.h.i -out hsa_ostream_ops.h > /dev/null
|
||||
DEPENDS ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py
|
||||
"${HSA_RUNTIME_INC_PATH}/hsa.h" "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
|
||||
COMMENT "Generating hsa_ostream_ops.h...")
OUTPUT hsa_ostream_ops.h
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa.h" -o hsa.h.i
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h" -o
hsa_ext_amd.h.i
BYPRODUCTS hsa.h.i hsa_ext_amd.h.i
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py -in
hsa.h.i,hsa_ext_amd.h.i -out hsa_ostream_ops.h > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py
"${HSA_RUNTIME_INC_PATH}/hsa.h" "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
COMMENT "Generating hsa_ostream_ops.h...")

get_property(HIP_INCLUDE_DIRECTORIES TARGET hip::amdhip64 PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hip
NO_DEFAULT_PATH
REQUIRED)
get_property(
HIP_INCLUDE_DIRECTORIES
TARGET hip::amdhip64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hip
NO_DEFAULT_PATH REQUIRED)

## Generate the HIP pretty printers
# Generate the HIP pretty printers
add_custom_command(
OUTPUT hip_ostream_ops.h
COMMAND ${CMAKE_C_COMPILER} "$<$<BOOL:${HIP_INCLUDE_DIRECTORIES}>:-I$<JOIN:${HIP_INCLUDE_DIRECTORIES},$<SEMICOLON>-I>>"
-E "${HIP_RUNTIME_API_H}" -D__HIP_PLATFORM_HCC__=1 -D__HIP_ROCclr__=1 -o hip_runtime_api.h.i
BYPRODUCTS hip_runtime_api.h.i
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py
-in hip_runtime_api.h.i -out hip_ostream_ops.h > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py "${HIP_RUNTIME_API_H}"
COMMENT "Generating hip_ostream_ops.h..."
COMMAND_EXPAND_LISTS)
OUTPUT hip_ostream_ops.h
COMMAND
${CMAKE_C_COMPILER}
"$<$<BOOL:${HIP_INCLUDE_DIRECTORIES}>:-I$<JOIN:${HIP_INCLUDE_DIRECTORIES},$<SEMICOLON>-I>>"
-E "${HIP_RUNTIME_API_H}" -D__HIP_PLATFORM_HCC__=1 -D__HIP_ROCclr__=1 -o
hip_runtime_api.h.i
BYPRODUCTS hip_runtime_api.h.i
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py -in
hip_runtime_api.h.i -out hip_ostream_ops.h > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py "${HIP_RUNTIME_API_H}"
COMMENT "Generating hip_ostream_ops.h..."
COMMAND_EXPAND_LISTS)

set(GENERATED_SOURCES
hip_ostream_ops.h
hsa_prof_str.h
hsa_ostream_ops.h
hsa_prof_str.inline.h)
set(GENERATED_SOURCES hip_ostream_ops.h hsa_prof_str.h hsa_ostream_ops.h
hsa_prof_str.inline.h)

# ############################################################################################################################################
# ########################################################################################
# ROCProfiler API
# ############################################################################################################################################
# PC sampling uses libpciaccess as a fallback if the debugfs ioctl is
# unavailable
# ########################################################################################
# PC sampling uses libpciaccess as a fallback if the debugfs ioctl is unavailable
find_path(PCIACCESS_INCLUDE_DIR pciaccess.h REQUIRED)
find_library(PCIACCESS_LIBRARIES pciaccess REQUIRED)

set(PUBLIC_HEADERS rocprofiler.h)

foreach(header ${PUBLIC_HEADERS})
install(FILES ${PROJECT_SOURCE_DIR}/include/rocprofiler/${header}
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}
COMPONENT dev)
endforeach()

install(DIRECTORY ${PROJECT_SOURCE_DIR}/include/rocprofiler/v2
install(
FILES ${PROJECT_SOURCE_DIR}/include/rocprofiler/${header}
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}
COMPONENT dev)
endforeach()

install(
DIRECTORY ${PROJECT_SOURCE_DIR}/include/rocprofiler/v2
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}
COMPONENT dev)

# Getting Source files for ROCProfiler, Hardware, HSA, Memory, Session, Counters, Utils
file(GLOB ROCPROFILER_SRC_FILES ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)

file(GLOB ROCPROFILER_PROFILER_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/profiler/profiler.cpp)
file(GLOB ROCPROFILER_TRACER_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/tracer/*.cpp)
file(GLOB ROCPROFILER_ROCTRACER_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/tracer/src/*.cpp)
file(GLOB ROCPROFILER_PROFILER_SRC_FILES
${PROJECT_SOURCE_DIR}/src/core/session/profiler/profiler.cpp)
file(GLOB ROCPROFILER_TRACER_SRC_FILES
${PROJECT_SOURCE_DIR}/src/core/session/tracer/*.cpp)
file(GLOB ROCPROFILER_ROCTRACER_SRC_FILES
${PROJECT_SOURCE_DIR}/src/core/session/tracer/src/*.cpp)
file(GLOB ROCPROFILER_ATT_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/att/att.cpp)
file(GLOB ROCPROFILER_CLASS_SRC_FILES ${CMAKE_CURRENT_SOURCE_DIR}/rocprofiler_singleton.cpp)
file(GLOB ROCPROFILER_CLASS_SRC_FILES
${CMAKE_CURRENT_SOURCE_DIR}/rocprofiler_singleton.cpp)
file(GLOB ROCPROFILER_SPM_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/spm/spm.cpp)


set(CORE_HARDWARE_DIR ${PROJECT_SOURCE_DIR}/src/core/hardware)
file(GLOB CORE_HARDWARE_SRC_FILES ${CORE_HARDWARE_DIR}/*.cpp)

@@ -180,148 +212,202 @@ file(GLOB CORE_COUNTERS_SAMPLER_SRC_FILES ${CORE_SESSION_DIR}/counters_sampler.c

file(GLOB CORE_COUNTERS_SRC_FILES ${PROJECT_BINARY_DIR}/src/api/*_counter.cpp)
file(GLOB CORE_COUNTERS_PARENT_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/counters/*.cpp)
file(GLOB CORE_COUNTERS_METRICS_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/counters/metrics/*.cpp)
file(GLOB CORE_COUNTERS_METRICS_SRC_FILES
${PROJECT_SOURCE_DIR}/src/core/counters/metrics/*.cpp)
file(GLOB CORE_COUNTERS_MMIO_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/counters/mmio/*.cpp)

set(CORE_UTILS_DIR ${PROJECT_SOURCE_DIR}/src/utils)
file(GLOB CORE_UTILS_SRC_FILES ${CORE_UTILS_DIR}/*.cpp)

set(CORE_PC_SAMPLING_DIR ${PROJECT_SOURCE_DIR}/src/pcsampler)
file(GLOB CORE_PC_SAMPLING_FILES ${CORE_PC_SAMPLING_DIR}/core/*.cpp ${CORE_PC_SAMPLING_DIR}/gfxip/*.cpp ${CORE_PC_SAMPLING_DIR}/session/*.cpp)
file(GLOB CORE_PC_SAMPLING_FILES ${CORE_PC_SAMPLING_DIR}/core/*.cpp
${CORE_PC_SAMPLING_DIR}/gfxip/*.cpp ${CORE_PC_SAMPLING_DIR}/session/*.cpp)


#### V1 Library
# Compiling/Installing ROCProfiler API V1
# V1 Library Compiling/Installing ROCProfiler API V1
add_library(${ROCPROFILER_TARGET} SHARED ${OLD_LIB_SRC})
set_target_properties(${ROCPROFILER_TARGET} PROPERTIES
CXX_VISIBILITY_PRESET hidden
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib
VERSION 1.0.0
SOVERSION 1)
set_target_properties(
${ROCPROFILER_TARGET}
PROPERTIES CXX_VISIBILITY_PRESET hidden
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib
VERSION 1.0.0
SOVERSION 1)

# As ROCR hsa_api_trace header file is not usable unless AMD_INTERNAL_BUILD is defined
target_compile_definitions(${ROCPROFILER_TARGET} PUBLIC AMD_INTERNAL_BUILD)
target_include_directories(${ROCPROFILER_TARGET}
PUBLIC
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include/rocprofiler>
PRIVATE
${LIB_DIR} ${ROOT_DIR}
${PROJECT_SOURCE_DIR}/include/rocprofiler)
target_link_libraries(${ROCPROFILER_TARGET} PRIVATE ${AQLPROFILE_LIB} hsa-runtime64::hsa-runtime64 c stdc++)
target_include_directories(
${ROCPROFILER_TARGET}
PUBLIC $<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include/rocprofiler>
PRIVATE ${LIB_DIR} ${ROOT_DIR} ${PROJECT_SOURCE_DIR}/include/rocprofiler)
target_link_libraries(${ROCPROFILER_TARGET} PRIVATE ${AQLPROFILE_LIB}
hsa-runtime64::hsa-runtime64 c stdc++)

get_target_property(ROCPROFILER_LIBRARY_V1_NAME ${ROCPROFILER_TARGET} NAME)
get_target_property(ROCPROFILER_LIBRARY_V1_VERSION ${ROCPROFILER_TARGET} VERSION)
get_target_property(ROCPROFILER_LIBRARY_V1_SOVERSION ${ROCPROFILER_TARGET} SOVERSION)

## Install libraries: Non versioned lib file in dev package
## Skipping NameLink as it will be installed using symlinks
install ( TARGETS ${ROCPROFILER_TARGET} LIBRARY NAMELINK_SKIP DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT runtime)
install ( TARGETS ${ROCPROFILER_TARGET} LIBRARY NAMELINK_SKIP DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT asan)
# Install libraries: Non versioned lib file in dev package Skipping NameLink as it will be
# installed using symlinks
install(
TARGETS ${ROCPROFILER_TARGET}
LIBRARY NAMELINK_SKIP
DESTINATION ${CMAKE_INSTALL_LIBDIR}
COMPONENT runtime)
install(
TARGETS ${ROCPROFILER_TARGET}
LIBRARY NAMELINK_SKIP
DESTINATION ${CMAKE_INSTALL_LIBDIR}
COMPONENT asan)

#### V2 Library
# Compiling/Installing ROCProfiler API
add_library(rocprofiler-v2 SHARED
${ROCPROFILER_SRC_FILES}
${ROCPROFILER_CLASS_SRC_FILES}
${ROCPROFILER_PROFILER_SRC_FILES}
${ROCPROFILER_ATT_SRC_FILES}
${CORE_HARDWARE_SRC_FILES}
${CORE_HSA_SRC_FILES}
${ROCPROFILER_SPM_SRC_FILES}
${CORE_MEMORY_SRC_FILES}
${CORE_SESSION_SRC_FILES}
${CORE_FILTER_SRC_FILES}
${CORE_DEVICE_PROFILING_SRC_FILES}
${CORE_COUNTERS_SAMPLER_SRC_FILES}
${CORE_COUNTERS_PARENT_SRC_FILES}
${CORE_COUNTERS_METRICS_SRC_FILES}
${CORE_COUNTERS_MMIO_SRC_FILES}
${CORE_UTILS_SRC_FILES}
${CORE_HSA_PACKETS_SRC_FILES}
${CORE_HSA_QUEUES_SRC_FILES}
${ROCPROFILER_TRACER_SRC_FILES}
${ROCPROFILER_ROCTRACER_SRC_FILES}
${GENERATED_SOURCES}
${CORE_COUNTERS_SRC_FILES}
${CORE_PC_SAMPLING_FILES})
set_target_properties(rocprofiler-v2 PROPERTIES
CXX_VISIBILITY_PRESET hidden
DEFINE_SYMBOL "ROCPROFILER_EXPORTS"
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/exportmap
OUTPUT_NAME rocprofiler64
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib
VERSION ${PROJECT_VERSION}
SOVERSION ${PROJECT_VERSION_MAJOR})
# V2 Library Compiling/Installing ROCProfiler API
add_library(
rocprofiler-v2 SHARED
${ROCPROFILER_SRC_FILES}
${ROCPROFILER_CLASS_SRC_FILES}
${ROCPROFILER_PROFILER_SRC_FILES}
${ROCPROFILER_ATT_SRC_FILES}
${CORE_HARDWARE_SRC_FILES}
${CORE_HSA_SRC_FILES}
${ROCPROFILER_SPM_SRC_FILES}
${CORE_MEMORY_SRC_FILES}
${CORE_SESSION_SRC_FILES}
${CORE_FILTER_SRC_FILES}
${CORE_DEVICE_PROFILING_SRC_FILES}
${CORE_COUNTERS_SAMPLER_SRC_FILES}
${CORE_COUNTERS_PARENT_SRC_FILES}
${CORE_COUNTERS_METRICS_SRC_FILES}
${CORE_COUNTERS_MMIO_SRC_FILES}
${CORE_UTILS_SRC_FILES}
${CORE_HSA_PACKETS_SRC_FILES}
${CORE_HSA_QUEUES_SRC_FILES}
${ROCPROFILER_TRACER_SRC_FILES}
${ROCPROFILER_ROCTRACER_SRC_FILES}
${GENERATED_SOURCES}
${CORE_COUNTERS_SRC_FILES}
${CORE_PC_SAMPLING_FILES})
set_target_properties(
rocprofiler-v2
PROPERTIES CXX_VISIBILITY_PRESET hidden
DEFINE_SYMBOL "ROCPROFILER_EXPORTS"
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/exportmap
OUTPUT_NAME rocprofiler64
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib
VERSION ${PROJECT_VERSION}
SOVERSION ${PROJECT_VERSION_MAJOR})

target_compile_definitions(rocprofiler-v2
# As ROCR hsa_api_trace header file is not usable unless AMD_INTERNAL_BUILD is defined
PRIVATE AMD_INTERNAL_BUILD
PROF_API_IMPL HIP_PROF_HIP_API_STRING=1 __HIP_PLATFORM_AMD__=1)
target_include_directories(rocprofiler-v2
PUBLIC
${HIP_INCLUDE_DIRECTORIES} ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include/rocprofiler/v2>
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include>
PRIVATE
${LIB_DIR} ${ROOT_DIR}
${CMAKE_CURRENT_BINARY_DIR}
${PROJECT_SOURCE_DIR}
${PROJECT_SOURCE_DIR}/tools)
target_compile_definitions(
rocprofiler-v2
# As ROCR hsa_api_trace header file is not usable unless AMD_INTERNAL_BUILD is defined
PRIVATE AMD_INTERNAL_BUILD PROF_API_IMPL HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_AMD__=1)
target_include_directories(
rocprofiler-v2
PUBLIC ${HIP_INCLUDE_DIRECTORIES}
${HSA_RUNTIME_INCLUDE_DIRECTORIES}
$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include/rocprofiler/v2>
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include>
PRIVATE ${LIB_DIR} ${ROOT_DIR} ${CMAKE_CURRENT_BINARY_DIR} ${PROJECT_SOURCE_DIR}
${PROJECT_SOURCE_DIR}/tools)
if(ASAN)
target_compile_options(rocprofiler-v2 PRIVATE -fsanitize=address)
target_link_options(rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap -Wl,--no-undefined,-fsanitize=address)
target_link_libraries(rocprofiler-v2 PRIVATE ${AQLPROFILE_LIB} hsa-runtime64::hsa-runtime64 Threads::Threads atomic numa asan dl c stdc++ stdc++fs amd_comgr ${PCIACCESS_LIBRARIES})
target_compile_options(rocprofiler-v2 PRIVATE -fsanitize=address)
target_link_options(
rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap
-Wl,--no-undefined,-fsanitize=address)
target_link_libraries(
rocprofiler-v2
PRIVATE ${AQLPROFILE_LIB}
hsa-runtime64::hsa-runtime64
Threads::Threads
atomic
numa
asan
dl
c
stdc++
stdc++fs
amd_comgr
${PCIACCESS_LIBRARIES})
else()
target_link_options(rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap -Wl,--no-undefined)
target_link_libraries(rocprofiler-v2 PRIVATE ${AQLPROFILE_LIB} hsa-runtime64::hsa-runtime64 Threads::Threads atomic numa dl c stdc++ stdc++fs amd_comgr ${PCIACCESS_LIBRARIES})
target_link_options(
rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap
-Wl,--no-undefined)
target_link_libraries(
rocprofiler-v2
PRIVATE ${AQLPROFILE_LIB}
hsa-runtime64::hsa-runtime64
Threads::Threads
atomic
numa
dl
c
stdc++
stdc++fs
amd_comgr
${PCIACCESS_LIBRARIES})
endif()

get_target_property(ROCPROFILER_LIBRARY_V2_NAME rocprofiler-v2 OUTPUT_NAME)
get_target_property(ROCPROFILER_LIBRARY_V2_VERSION rocprofiler-v2 VERSION)
get_target_property(ROCPROFILER_LIBRARY_V2_SOVERSION rocprofiler-v2 SOVERSION)

## Prepare Name Link SO files for V1 & V2 Libraries
add_custom_command(TARGET rocprofiler-v2 POST_BUILD
COMMAND ${CMAKE_COMMAND} -E rm -f ${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so
COMMAND ${CMAKE_COMMAND} -E create_symlink
lib${ROCPROFILER_LIBRARY_V1_NAME}.so.${ROCPROFILER_LIBRARY_V1_SOVERSION}
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so
COMMAND ${CMAKE_COMMAND} -E create_symlink
lib${ROCPROFILER_LIBRARY_V2_NAME}.so.${ROCPROFILER_LIBRARY_V2_SOVERSION}
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V2_NAME}v2.so
)
# Prepare Name Link SO files for V1 & V2 Libraries
add_custom_command(
TARGET rocprofiler-v2
POST_BUILD
COMMAND ${CMAKE_COMMAND} -E rm -f
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so
COMMAND
${CMAKE_COMMAND} -E create_symlink
lib${ROCPROFILER_LIBRARY_V1_NAME}.so.${ROCPROFILER_LIBRARY_V1_SOVERSION}
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so
COMMAND
${CMAKE_COMMAND} -E create_symlink
lib${ROCPROFILER_LIBRARY_V2_NAME}.so.${ROCPROFILER_LIBRARY_V2_SOVERSION}
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V2_NAME}v2.so)
# Add custom target to trigger the create_symlink command
add_custom_target(create_rocprofiler_lib DEPENDS rocprofiler-v2 ${ROCPROFILER_TARGET})

## Install libraries: Non versioned lib file in dev package
## Skipping NameLink as it will be installed using symlinks
install(TARGETS rocprofiler-v2 LIBRARY NAMELINK_SKIP DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT runtime)
install(TARGETS rocprofiler-v2 LIBRARY NAMELINK_SKIP DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT asan)
# Install libraries: Non versioned lib file in dev package Skipping NameLink as it will be
# installed using symlinks
install(
TARGETS rocprofiler-v2
LIBRARY NAMELINK_SKIP
DESTINATION ${CMAKE_INSTALL_LIBDIR}
COMPONENT runtime)
install(
TARGETS rocprofiler-v2
LIBRARY NAMELINK_SKIP
DESTINATION ${CMAKE_INSTALL_LIBDIR}
COMPONENT asan)

## Installing NameLinks for V1 & V2
## librocprofiler64.so links to V1 library
## librocprofiler64v2.so links to V2 library
install(CODE "execute_process( \
# Installing NameLinks for V1 & V2 librocprofiler64.so links to V1 library
# librocprofiler64v2.so links to V2 library
install(
CODE "execute_process( \
COMMAND ${CMAKE_COMMAND} -E create_symlink \
lib${ROCPROFILER_LIBRARY_V1_NAME}.so.${ROCPROFILER_LIBRARY_V1_SOVERSION} \
${CMAKE_INSTALL_PREFIX}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so \
)" COMPONENT dev
)
install(CODE "execute_process( \
)"
COMPONENT dev)
install(
CODE "execute_process( \
COMMAND ${CMAKE_COMMAND} -E create_symlink \
lib${ROCPROFILER_LIBRARY_V2_NAME}.so.${ROCPROFILER_LIBRARY_V2_SOVERSION} \
${CMAKE_INSTALL_PREFIX}/lib/lib${ROCPROFILER_LIBRARY_V2_NAME}v2.so \
)" COMPONENT dev
)
)"
COMPONENT dev)

configure_file(${PROJECT_SOURCE_DIR}/src/core/counters/metrics/basic_counters.xml ${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters/basic_counters.xml COPYONLY)
configure_file(${PROJECT_SOURCE_DIR}/src/core/counters/metrics/derived_counters.xml ${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters/derived_counters.xml COPYONLY)
configure_file(
${PROJECT_SOURCE_DIR}/src/core/counters/metrics/basic_counters.xml
${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters/basic_counters.xml COPYONLY)
configure_file(
${PROJECT_SOURCE_DIR}/src/core/counters/metrics/derived_counters.xml
${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters/derived_counters.xml COPYONLY)

install(DIRECTORY
${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/${PROJECT_NAME}
USE_SOURCE_PERMISSIONS
COMPONENT runtime)
install(
DIRECTORY ${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/${PROJECT_NAME}
USE_SOURCE_PERMISSIONS
COMPONENT runtime)

# ############################################################################################################################################
# ########################################################################################

@@ -74,13 +74,14 @@ class ROCProfiler_Singleton {
// Device Profiling Session
bool FindDeviceProfilingSession(rocprofiler_session_id_t session_id);
rocprofiler_session_id_t CreateDeviceProfilingSession(std::vector<std::string> counters,
int cpu_agent_index, int gpu_agent_index);
int cpu_agent_index, int gpu_agent_index);
void DestroyDeviceProfilingSession(rocprofiler_session_id_t session_id);
DeviceProfileSession* GetDeviceProfilingSession(rocprofiler_session_id_t session_id);


// Generic
bool CheckFilterData(rocprofiler_filter_kind_t filter_kind, rocprofiler_filter_data_t filter_data);
bool CheckFilterData(rocprofiler_filter_kind_t filter_kind,
rocprofiler_filter_data_t filter_data);
uint64_t GetUniqueRecordId();
uint64_t GetUniqueKernelDispatchId();


@@ -11,8 +11,7 @@

// TODO(aelwazir): change that to adapt with our own Exception
// What about outside exceptions and callbacks exceptions!!
#define API_METHOD_PREFIX \
try {
#define API_METHOD_PREFIX try {
#define API_METHOD_SUFFIX \
} \
catch (rocprofiler::Exception & e) { \

@@ -61,11 +61,11 @@ void check_status(hsa_status_t status) {
namespace activity_prim {
// PC sampling callback data
struct pcsmp_callback_data_t {
const char* kernel_name; // sampled kernel name
void* data_buffer; // host buffer for tracing data
uint64_t id; // sample id
uint64_t cycle; // sample cycle
uint64_t pc; // sample PC
const char* kernel_name; // sampled kernel name
void* data_buffer; // host buffer for tracing data
uint64_t id; // sample id
uint64_t cycle; // sample cycle
uint64_t pc; // sample PC
};

uint32_t activity_op = UINT32_MAX;
@@ -74,9 +74,8 @@ std::atomic<activity_async_callback_t> activity_callback{NULL};
rocprofiler_t* context = NULL;

hsa_status_t trace_data_cb(hsa_ven_amd_aqlprofile_info_type_t info_type,
hsa_ven_amd_aqlprofile_info_data_t* info_data,
void* data) {
const pcsmp_callback_data_t* pcsmp_data = (pcsmp_callback_data_t*) data;
hsa_ven_amd_aqlprofile_info_data_t* info_data, void* data) {
const pcsmp_callback_data_t* pcsmp_data = (pcsmp_callback_data_t*)data;

activity_record_t record{};
record.op = activity_op;
@@ -96,11 +95,13 @@ bool context_handler(rocprofiler_group_t group, void* arg) {
hsa_agent_t agent{};
hsa_status_t status = rocprofiler_get_agent(group.context, &agent);
check_status(status);
const rocprofiler::util::AgentInfo* agent_info = rocprofiler::util::HsaRsrcFactory::Instance().GetAgentInfo(agent);
const rocprofiler::util::AgentInfo* agent_info =
rocprofiler::util::HsaRsrcFactory::Instance().GetAgentInfo(agent);

pcsmp_callback_data_t pcsmp_data{};
pcsmp_data.kernel_name = (const char*)arg;
pcsmp_data.data_buffer = rocprofiler::util::HsaRsrcFactory::Instance().AllocateSysMemory(agent_info, rocprofiler::TraceProfile::GetSize());
pcsmp_data.data_buffer = rocprofiler::util::HsaRsrcFactory::Instance().AllocateSysMemory(
agent_info, rocprofiler::TraceProfile::GetSize());
status = rocprofiler_iterate_trace_data(group.context, trace_data_cb, &pcsmp_data);
check_status(status);
return false;
@@ -110,8 +111,8 @@ bool context_handler(rocprofiler_group_t group, void* arg) {
hsa_status_t dispatch_callback(const rocprofiler_callback_data_t* callback_data, void* user_data,
rocprofiler_group_t* group) {
// context features
const rocprofiler_feature_kind_t trace_kind =
(rocprofiler_feature_kind_t)(ROCPROFILER_FEATURE_KIND_TRACE | ROCPROFILER_FEATURE_KIND_PCSMP_MOD);
const rocprofiler_feature_kind_t trace_kind = (rocprofiler_feature_kind_t)(
ROCPROFILER_FEATURE_KIND_TRACE | ROCPROFILER_FEATURE_KIND_PCSMP_MOD);
const uint32_t feature_count = 1;
const uint32_t parameter_count = 1;
rocprofiler_feature_t* features = new rocprofiler_feature_t[feature_count];
@@ -131,8 +132,8 @@ hsa_status_t dispatch_callback(const rocprofiler_callback_data_t* callback_data,
properties.handler_arg = (void*)strdup(callback_data->kernel_name);

// Open profiling context
hsa_status_t status = rocprofiler_open(callback_data->agent, features, feature_count,
&context, 0 /*ROCPROFILER_MODE_SINGLEGROUP*/, &properties);
hsa_status_t status = rocprofiler_open(callback_data->agent, features, feature_count, &context,
0 /*ROCPROFILER_MODE_SINGLEGROUP*/, &properties);
check_status(status);

// Get group[0]
@@ -141,7 +142,7 @@ hsa_status_t dispatch_callback(const rocprofiler_callback_data_t* callback_data,

return status;
}
} // namespace activity_prim
} // namespace activity_prim

extern "C" {
PUBLIC_API const char* GetOpName(uint32_t op) { return strdup("PCSAMPLE"); }
@@ -152,7 +153,8 @@ PUBLIC_API bool RemoveApiCallback(uint32_t op) { return true; }

PUBLIC_API bool InitActivityCallback(void* callback, void* arg) {
activity_prim::activity_arg = arg;
activity_prim::activity_callback.store((activity_async_callback_t)callback, std::memory_order_release);
activity_prim::activity_callback.store((activity_async_callback_t)callback,
std::memory_order_release);

rocprofiler_queue_callbacks_t queue_callbacks{};
queue_callbacks.dispatch = activity_prim::dispatch_callback;
@@ -191,11 +193,8 @@ struct evt_cb_entry_t {
};
evt_cb_entry_t evt_cb_table[HSA_EVT_ID_NUMBER];

hsa_status_t codeobj_evt_callback(
rocprofiler_hsa_cb_id_t id,
const rocprofiler_hsa_callback_data_t* cb_data,
void* arg)
{
hsa_status_t codeobj_evt_callback(rocprofiler_hsa_cb_id_t id,
const rocprofiler_hsa_callback_data_t* cb_data, void* arg) {
const auto evt = evt_cb_table[id].get();
activity_rtapi_callback_t evt_callback = (activity_rtapi_callback_t)evt.first;
if (evt_callback != NULL) evt_callback(ACTIVITY_DOMAIN_HSA_EVT, id, cb_data, evt.second);

@@ -19,4 +19,4 @@ enum hsa_evt_id_t {
// HSA EVT callback data type
typedef rocprofiler_hsa_callback_data_t hsa_evt_data_t;

#endif // _SRC_CORE_ACTIVITY_H
#endif // _SRC_CORE_ACTIVITY_H

@@ -27,7 +27,7 @@ THE SOFTWARE.

#include <hsa/hsa.h>
#include <hsa/hsa_ext_amd.h>
#include <unistd.h> // usleep
#include <unistd.h> // usleep
#include <atomic>
#include <list>
#include <map>
@@ -91,8 +91,7 @@ class Group {
barrier_signal_{},
dispatch_signal_{},
orig_signal_{},
record_{}
{}
record_{} {}

void Insert(const profile_info_t& info) {
const rocprofiler_feature_kind_t kind = info.rinfo->kind;
@@ -110,11 +109,10 @@ class Group {
}

hsa_status_t Finalize(const bool is_concurrent = false) {
hsa_status_t status = pmc_profile_.Finalize(start_vector_, stop_vector_,
read_vector_, is_concurrent);
hsa_status_t status =
pmc_profile_.Finalize(start_vector_, stop_vector_, read_vector_, is_concurrent);
if (status == HSA_STATUS_SUCCESS) {
status = trace_profile_.Finalize(start_vector_, stop_vector_,
read_vector_, is_concurrent);
status = trace_profile_.Finalize(start_vector_, stop_vector_, read_vector_, is_concurrent);
}
if (status == HSA_STATUS_SUCCESS) {
if (!pmc_profile_.Empty()) ++n_profiles_;
@@ -137,32 +135,20 @@ class Group {
Context* GetContext() { return context_; }
uint32_t GetIndex() const { return index_; }

void SetBarrierSignal(const hsa_signal_t &signal) {
barrier_signal_ = signal;
}
hsa_signal_t& GetBarrierSignal() {
return barrier_signal_;
}
void SetDispatchSignal(const hsa_signal_t &signal) {
dispatch_signal_ = signal;
}
hsa_signal_t& GetDispatchSignal() {
return dispatch_signal_;
}
void SetOrigSignal(const hsa_signal_t &signal) {
orig_signal_ = signal;
}
const hsa_signal_t& GetOrigSignal() const {
return orig_signal_;
}
rocprofiler_dispatch_record_t* GetRecord() {
return &record_;
}
void SetBarrierSignal(const hsa_signal_t& signal) { barrier_signal_ = signal; }
hsa_signal_t& GetBarrierSignal() { return barrier_signal_; }
void SetDispatchSignal(const hsa_signal_t& signal) { dispatch_signal_ = signal; }
hsa_signal_t& GetDispatchSignal() { return dispatch_signal_; }
void SetOrigSignal(const hsa_signal_t& signal) { orig_signal_ = signal; }
const hsa_signal_t& GetOrigSignal() const { return orig_signal_; }
rocprofiler_dispatch_record_t* GetRecord() { return &record_; }

atomic_refs_t* AtomicRefsCount() { return reinterpret_cast<atomic_refs_t*>(&refs_); }
void ResetRefsCount() { AtomicRefsCount()->store(n_profiles_, std::memory_order_release); }
void IncrRefsCount() { AtomicRefsCount()->fetch_add(1, std::memory_order_acq_rel); }
uint32_t FetchDecrRefsCount() { return AtomicRefsCount()->fetch_sub(1, std::memory_order_acq_rel); }
uint32_t FetchDecrRefsCount() {
return AtomicRefsCount()->fetch_sub(1, std::memory_order_acq_rel);
}

private:
PmcProfile pmc_profile_;
@@ -188,23 +174,23 @@ class Context {
public:
typedef std::map<std::string, rocprofiler_feature_t*> info_map_t;

static void Create(Context* obj, const util::AgentInfo* agent_info, Queue* queue, rocprofiler_feature_t* info,
const uint32_t info_count, rocprofiler_handler_t handler, void* handler_arg)
{
static void Create(Context* obj, const util::AgentInfo* agent_info, Queue* queue,
rocprofiler_feature_t* info, const uint32_t info_count,
rocprofiler_handler_t handler, void* handler_arg) {
new (obj) Context(agent_info, queue, info, info_count, handler, handler_arg);
obj->Construct(agent_info, queue, info, info_count, handler, handler_arg);
}

static void Release(Context* obj) { obj->Destruct(); }

static Context* Create(const util::AgentInfo* agent_info, Queue* queue, rocprofiler_feature_t* info,
const uint32_t info_count, rocprofiler_handler_t handler, void* handler_arg)
{
|
||||
static Context* Create(const util::AgentInfo* agent_info, Queue* queue,
|
||||
rocprofiler_feature_t* info, const uint32_t info_count,
|
||||
rocprofiler_handler_t handler, void* handler_arg) {
|
||||
Context* obj = new Context(agent_info, queue, info, info_count, handler, handler_arg);
|
||||
if (obj == NULL) EXC_RAISING(HSA_STATUS_ERROR, "allocation error");
|
||||
try {
|
||||
obj->Construct(agent_info, queue, info, info_count, handler, handler_arg);
|
||||
} catch(...) {
|
||||
} catch (...) {
|
||||
delete obj;
|
||||
obj = NULL;
|
||||
std::cerr << "Error: Context Create failed" << std::endl;
|
||||
@@ -213,7 +199,9 @@ class Context {
return obj;
}

static void Destroy(Context* obj) { if (obj != NULL) delete obj; }
static void Destroy(Context* obj) {
if (obj != NULL) delete obj;
}

void Reset(const uint32_t& group_index) { set_[group_index].ResetRefsCount(); }
@@ -293,8 +281,10 @@ class Context {
hsa_rsrc_->SignalWaitRestore(tuple.completion_signal, 1);
// Restore other signals
RestoreSignals(tuple);
for (rocprofiler_feature_t* rinfo : *(tuple.info_vector)) rinfo->data.kind = ROCPROFILER_DATA_KIND_UNINIT;
callback_data_t callback_data{tuple.profile, tuple.info_vector, tuple.info_vector->size(), NULL};
for (rocprofiler_feature_t* rinfo : *(tuple.info_vector))
rinfo->data.kind = ROCPROFILER_DATA_KIND_UNINIT;
callback_data_t callback_data{tuple.profile, tuple.info_vector, tuple.info_vector->size(),
NULL};
const hsa_status_t status =
api_->hsa_ven_amd_aqlprofile_iterate_data(tuple.profile, DataCallback, &callback_data);
if (status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "context iterate data failed");
@@ -310,7 +300,8 @@ class Context {
if (expr) {
auto it = info_map_.find(name);
if (it == info_map_.end())
EXC_RAISING(HSA_STATUS_ERROR, "metric '" << name << "', rocprofiler info is not found " << this);
EXC_RAISING(HSA_STATUS_ERROR,
"metric '" << name << "', rocprofiler info is not found " << this);
rocprofiler_feature_t* info = it->second;
info->data.result_double = expr->Eval(args);
info->data.kind = ROCPROFILER_DATA_KIND_DOUBLE;
@@ -324,7 +315,7 @@ class Context {
for (auto& tuple : profile_vector) {
if (pcsmp_mode_) const_cast<profile_t*>(tuple.profile)->event_count = UINT32_MAX;
const hsa_status_t status =
api_->hsa_ven_amd_aqlprofile_iterate_data(tuple.profile, callback, data);
api_->hsa_ven_amd_aqlprofile_iterate_data(tuple.profile, callback, data);
if (status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "context iterate data failed");
}
}
@@ -342,7 +333,10 @@ class Context {

hsa_agent_t GetAgent() const { return agent_; }
Group* GetGroup(const uint32_t& index) { return &set_[index]; }
rocprofiler_handler_t GetHandler(void** arg) const { *arg = handler_arg_; return handler_; }
rocprofiler_handler_t GetHandler(void** arg) const {
*arg = handler_arg_;
return handler_;
}

// Concurrent profiling mode
static bool k_concurrent_;
@@ -358,8 +352,7 @@ class Context {
metrics_(NULL),
handler_(handler),
handler_arg_(handler_arg),
pcsmp_mode_(false)
{}
pcsmp_mode_(false) {}

~Context() { Destruct(); }
@@ -375,8 +368,7 @@ class Context {
}

void Construct(const util::AgentInfo* agent_info, Queue* queue, rocprofiler_feature_t* info,
const uint32_t info_count, rocprofiler_handler_t handler, void* handler_arg)
{
const uint32_t info_count, rocprofiler_handler_t handler, void* handler_arg) {
if (info_count == 0) {
set_.push_back(Group(agent_info_, this, 0));
return;
@@ -386,9 +378,11 @@ class Context {
if (metrics_ == NULL) EXC_RAISING(HSA_STATUS_ERROR, "MetricsDict create failed");

if (Initialize(info, info_count) == false) {
fprintf(stdout, "\nInput metrics out of HW limit. Proposed metrics group set:\n"); fflush(stdout);
fprintf(stdout, "\nInput metrics out of HW limit. Proposed metrics group set:\n");
fflush(stdout);
MetricsGroupSet(agent_info, info, info_count).Print(stdout);
fprintf(stdout, "\n"); fflush(stdout);
fprintf(stdout, "\n");
fflush(stdout);
EXC_RAISING(HSA_STATUS_ERROR, "Metrics list exceeds HW limits");
}
Finalize();
@@ -420,8 +414,8 @@ class Context {
info_map_[name] = info;
auto ret = metrics_map_.insert({name, NULL});
if (!ret.second)
EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name
<< "' is registered more then once");
EXC_RAISING(HSA_STATUS_ERROR,
"input metric '" << name << "' is registered more then once");
}
}
@@ -437,8 +431,9 @@ class Context {
if (kind == ROCPROFILER_FEATURE_KIND_METRIC) { // Processing metrics features
const Metric* metric = metrics_->Get(name);
if (metric == NULL)
EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name << "' is not supported on this hardware: "
<< agent_info_->name);
EXC_RAISING(HSA_STATUS_ERROR,
"input metric '"
<< name << "' is not supported on this hardware: " << agent_info_->name);
#if 0
std::cout << " " << name << (metric->GetExpr() ? " = " + metric->GetExpr()->String() : " counter") << std::endl;
#endif
@@ -493,9 +488,9 @@ class Context {
info->kind = ROCPROFILER_FEATURE_KIND_TRACE;

const event_t* event = NULL;
if (kind & ROCPROFILER_FEATURE_KIND_PCSMP_MOD) { // PC sampling
if (kind & ROCPROFILER_FEATURE_KIND_PCSMP_MOD) { // PC sampling
pcsmp_mode_ = true;
} else if (kind & ROCPROFILER_FEATURE_KIND_SPM_MOD) { // SPM trace
} else if (kind & ROCPROFILER_FEATURE_KIND_SPM_MOD) { // SPM trace
const Metric* metric = metrics_->Get(name);
if (metric == NULL)
EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name << "' is not found");
@@ -559,14 +554,14 @@ class Context {
const bool trace_local = TraceProfile::IsLocal();
util::HsaRsrcFactory* hsa_rsrc = &util::HsaRsrcFactory::Instance();
if (sample_id == 0) {
const uint32_t output_buffer_size = profile->output_buffer.size;
const uint32_t output_buffer_size64 = profile->output_buffer.size / sizeof(uint64_t);
const util::AgentInfo* agent_info = hsa_rsrc->GetAgentInfo(profile->agent);
void* ptr = (trace_local) ? hsa_rsrc->AllocateSysMemory(agent_info, output_buffer_size) :
calloc(output_buffer_size64, sizeof(uint64_t));
rinfo->data.result_bytes.size = output_buffer_size;
rinfo->data.result_bytes.ptr = ptr;
callback_data->ptr = reinterpret_cast<char*>(ptr);
const uint32_t output_buffer_size = profile->output_buffer.size;
const uint32_t output_buffer_size64 = profile->output_buffer.size / sizeof(uint64_t);
const util::AgentInfo* agent_info = hsa_rsrc->GetAgentInfo(profile->agent);
void* ptr = (trace_local) ? hsa_rsrc->AllocateSysMemory(agent_info, output_buffer_size)
: calloc(output_buffer_size64, sizeof(uint64_t));
rinfo->data.result_bytes.size = output_buffer_size;
rinfo->data.result_bytes.ptr = ptr;
callback_data->ptr = reinterpret_cast<char*>(ptr);
}
char* result_bytes_ptr = reinterpret_cast<char*>(rinfo->data.result_bytes.ptr);
const char* end = result_bytes_ptr + rinfo->data.result_bytes.size;
@@ -577,8 +572,10 @@ class Context {
char* dest = ptr + sizeof(*header);

if ((dest + size) >= end) {
if (dest < end) size = end - dest;
else EXC_RAISING(HSA_STATUS_ERROR, "Trace data out of output buffer");
if (dest < end)
size = end - dest;
else
EXC_RAISING(HSA_STATUS_ERROR, "Trace data out of output buffer");
}

bool suc = true;
@@ -593,7 +590,9 @@ class Context {
rinfo->data.result_bytes.instance_count = sample_id + 1;
rinfo->data.kind = ROCPROFILER_DATA_KIND_BYTES;
} else
EXC_RAISING(HSA_STATUS_ERROR, "Agent Memcpy failed, dst(" << (void*)dest << ") src(" << (void*)src << ") size(" << size << ")");
EXC_RAISING(HSA_STATUS_ERROR,
"Agent Memcpy failed, dst(" << (void*)dest << ") src(" << (void*)src
<< ") size(" << size << ")");
} else {
if (sample_id == 0) {
rinfo->data.result_bytes.ptr = profile->output_buffer.ptr;
@@ -647,8 +646,7 @@ class Context {
bool pcsmp_mode_;
};

#define CONTEXT_INSTANTIATE() \
bool rocprofiler::Context::k_concurrent_ = false;
#define CONTEXT_INSTANTIATE() bool rocprofiler::Context::k_concurrent_ = false;

} // namespace rocprofiler

@@ -31,7 +31,7 @@ THE SOFTWARE.

namespace rocprofiler {
class ContextPool {
public:
public:
typedef uint64_t index_t;
typedef std::mutex mutex_t;
@@ -41,16 +41,12 @@ class ContextPool {
std::atomic<bool> completed;
};

static ContextPool* Create(
uint32_t num_entries,
uint32_t payload_bytes,
const util::AgentInfo* agent_info,
rocprofiler_feature_t* info,
const uint32_t info_count,
rocprofiler_pool_handler_t handler,
void* handler_arg)
{
ContextPool* obj = new ContextPool(num_entries, payload_bytes, agent_info, info, info_count, handler, handler_arg);
static ContextPool* Create(uint32_t num_entries, uint32_t payload_bytes,
const util::AgentInfo* agent_info, rocprofiler_feature_t* info,
const uint32_t info_count, rocprofiler_pool_handler_t handler,
void* handler_arg) {
ContextPool* obj = new ContextPool(num_entries, payload_bytes, agent_info, info, info_count,
handler, handler_arg);
if (obj == NULL) EXC_RAISING(HSA_STATUS_ERROR, "allocation error");
return obj;
}
@@ -61,18 +57,18 @@ class ContextPool {
if (constructed_ == false) {
Construct(agent_info_, info_, info_count_);
}
const index_t write_index = write_index_.fetch_add(entry_size_bytes_, std::memory_order_relaxed);
const index_t write_index =
write_index_.fetch_add(entry_size_bytes_, std::memory_order_relaxed);
while (write_index >= (read_index_.load(std::memory_order_acquire) + array_size_bytes_)) {
check_completed();
std::this_thread::yield();
}
entry_t* entry = GetPoolEntry(write_index, pool_entry);
if (entry->completed.load(std::memory_order_relaxed) != false) EXC_RAISING(HSA_STATUS_ERROR, "Corrupted pool entry");
if (entry->completed.load(std::memory_order_relaxed) != false)
EXC_RAISING(HSA_STATUS_ERROR, "Corrupted pool entry");
}

void Flush() {
check_completed();
}
void Flush() { check_completed(); }
#if 0
template <class F>
F for_each(const F& f_p) {
@@ -95,7 +91,7 @@ class ContextPool {
return f;
}
#endif
private:
private:
static unsigned aligned64(const unsigned& size) { return (size + 0x3f) & ~0x3fu; }

static bool context_handler(rocprofiler_group_t group, void* arg) {
@@ -105,45 +101,41 @@ class ContextPool {
return true;
}

ContextPool(
uint32_t num_entries,
uint32_t payload_bytes,
const util::AgentInfo* agent_info,
rocprofiler_feature_t* info,
const uint32_t info_count,
rocprofiler_pool_handler_t pool_handler,
void* pool_handler_arg
) :
payload_off_(aligned64(sizeof(entry_t))),
entry_size_bytes_(payload_off_ + aligned64(payload_bytes)),
array_size_bytes_(entry_size_bytes_ * num_entries),
array_(NULL),
read_index_(0),
write_index_(0),
sync_flag_(false),
ContextPool(uint32_t num_entries, uint32_t payload_bytes, const util::AgentInfo* agent_info,
rocprofiler_feature_t* info, const uint32_t info_count,
rocprofiler_pool_handler_t pool_handler, void* pool_handler_arg)
: payload_off_(aligned64(sizeof(entry_t))),
entry_size_bytes_(payload_off_ + aligned64(payload_bytes)),
array_size_bytes_(entry_size_bytes_ * num_entries),
array_(NULL),
read_index_(0),
write_index_(0),
sync_flag_(false),

agent_info_(agent_info),
info_(info),
info_count_(info_count),
pool_handler_(pool_handler),
pool_handler_arg_(pool_handler_arg),
constructed_(false)
{}
agent_info_(agent_info),
info_(info),
info_count_(info_count),
pool_handler_(pool_handler),
pool_handler_arg_(pool_handler_arg),
constructed_(false) {}

void Construct(const util::AgentInfo* agent_info, rocprofiler_feature_t* info, const uint32_t info_count) {
void Construct(const util::AgentInfo* agent_info, rocprofiler_feature_t* info,
const uint32_t info_count) {
std::lock_guard<mutex_t> lck(mutex_);

if (constructed_ == false) {
array_data_ = (char*) malloc(array_size_bytes_ + 0x3f);
array_data_ = (char*)malloc(array_size_bytes_ + 0x3f);
array_ = reinterpret_cast<char*>(((intptr_t)array_data_ + 0x3f) >> 6 << 6);
if (((intptr_t)array_ & 0x3f) != 0) EXC_RAISING(HSA_STATUS_ERROR, "Pool array is not aligned");
if (((intptr_t)array_ & 0x3f) != 0)
EXC_RAISING(HSA_STATUS_ERROR, "Pool array is not aligned");
memset(array_, 0, array_size_bytes_);

const char* end = array_ + array_size_bytes_;
for (char* ptr = array_; ptr < end; ptr += entry_size_bytes_) {
entry_t* entry = reinterpret_cast<entry_t*>(ptr);
entry->pool = this;
entry->context = Context::Create(agent_info, NULL, info, info_count, ContextPool::context_handler, ptr);
entry->context =
Context::Create(agent_info, NULL, info, info_count, ContextPool::context_handler, ptr);
}

constructed_ = true;
@@ -175,7 +167,7 @@ class ContextPool {
if (sync_flag_.test_and_set(std::memory_order_acquire) == false) {
index_t read_index = read_index_.load(std::memory_order_relaxed);
const index_t write_index = write_index_.load(std::memory_order_relaxed);
while(read_index < write_index) {
while (read_index < write_index) {
rocprofiler_pool_entry_t pool_entry{};
entry_t* entry = GetPoolEntry(read_index, &pool_entry);
if (entry->completed.load(std::memory_order_acquire) == true) {

@@ -1,8 +1,7 @@
#ifndef _CORE_TIMER_H_
#define _CORE_TIMER_H_

template <int Size>
class CoreTimer {
template <int Size> class CoreTimer {
CoreTimer() {
index_ = 0;
freq_in_100mhz_ = MeasureTSCFreqHz();
@@ -20,15 +19,15 @@ class CoreTimer {
// AMD Linux timing
unsigned int unused;
n = __rdtscp(&unused);
data_[index_] = 10 * n / freq_in_100mhz_; // unit is ns
data_[index_] = 10 * n / freq_in_100mhz_; // unit is ns
index_ += 1;
}

double Print()
double Print()

private:
// timer data
double data_[Size];
private :
// timer data
double data_[Size];
// data index
uint32_t index_;
// frequency
@@ -40,20 +39,20 @@ class CoreTimer {
clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
return uint64_t(ts.tv_sec) * 1000000 + ts.tv_nsec / 1000;
}

uint64_t CoreTimer::MeasureTSCFreqHz() {
// Make a coarse interval measurement of TSC ticks for 1 gigacycles.
unsigned int unused;
uint64_t tscTicksEnd;

uint64_t coarseBeginUs = CoarseTimestampUs();
uint64_t tscTicksBegin = __rdtscp(&unused);
do {
tscTicksEnd = __rdtscp(&unused);
} while (tscTicksEnd - tscTicksBegin < 1000000000);

uint64_t coarseEndUs = CoarseTimestampUs();

// Compute the TSC frequency and round to nearest 100MHz.
uint64_t coarseIntervalNs = (coarseEndUs - coarseBeginUs) * 1000;
uint64_t tscIntervalTicks = tscTicksEnd - tscTicksBegin;
@@ -61,4 +60,4 @@ class CoreTimer {
}
};

#endif // _CORE_TIMER_H_
#endif // _CORE_TIMER_H_

@@ -27,8 +27,7 @@ namespace Counter {

static std::atomic<uint64_t> COUNTER_COUNTER{0};

DerivedCounter::DerivedCounter(std::string name, std::string description,
std::string gpu_name)
DerivedCounter::DerivedCounter(std::string name, std::string description, std::string gpu_name)
: Counter(name, description, gpu_name) {
metric_id_ = COUNTER_COUNTER.fetch_add(1, std::memory_order_release);
addCounterToCounterMap();
@@ -41,20 +40,17 @@ DerivedCounter::~DerivedCounter() {

uint64_t DerivedCounter::getMetricId() { return metric_id_; }

std::map<uint64_t, BasicCounter*> *DerivedCounter::getAllCounters() {
return &counters_;
}
std::map<uint64_t, BasicCounter*>* DerivedCounter::getAllCounters() { return &counters_; }

BasicCounter *DerivedCounter::getBasicCounterFromDerived(uint64_t counter_id) {
BasicCounter* DerivedCounter::getBasicCounterFromDerived(uint64_t counter_id) {
return counters_[counter_id];
}

void DerivedCounter::addBasicCounter(uint64_t counter_id,
BasicCounter *counter) {
void DerivedCounter::addBasicCounter(uint64_t counter_id, BasicCounter* counter) {
counters_.emplace(counter_id, counter);
}

@DERIVED_XML_PARSE_RESULT@
@DERIVED_XML_PARSE_RESULT @

} // namespace Counter

@@ -39,8 +39,7 @@ namespace Counter {
class DerivedCounter : Counter {
public:
std::function<uint64_t()> evaluate_metric;
DerivedCounter(std::string name, std::string description,
std::string gpu_name);
DerivedCounter(std::string name, std::string description, std::string gpu_name);
~DerivedCounter();

uint64_t getMetricId();

@@ -108,7 +108,7 @@ bool metrics::ExtractMetricEvents(
// adding result object for derived metric
std::lock_guard<std::mutex> lock(extract_metric_events_lock);

if(metric_names[i].compare("KERNEL_DURATION")==0) {
if (metric_names[i].compare("KERNEL_DURATION") == 0) {
if (results_map.find(metric_names[i]) == results_map.end()) {
results_map[metric_names[i]] = new results_t(metric_names[i], {}, xcc_count);
}
@@ -192,7 +192,7 @@ bool metrics::GetMetricsData(std::map<std::string, results_t*>& results_map,
auto it = results_map.find(metric->GetName());
if (it == results_map.end()) rocprofiler::fatal("metric results not found ");
results_t* res = it->second;
if(metric->GetName().compare("KERNEL_DURATION") == 0) {
if (metric->GetName().compare("KERNEL_DURATION") == 0) {
res->val_double = kernel_duration;
continue;
}
@@ -206,7 +206,8 @@ bool metrics::GetMetricsData(std::map<std::string, results_t*>& results_map,
void metrics::GetCountersAndMetricResultsByXcc(uint32_t xcc_index,
std::vector<results_t*>& results_list,
std::map<std::string, results_t*>& results_map,
std::vector<const Metric*>& metrics_list, uint64_t kernel_duration) {
std::vector<const Metric*>& metrics_list,
uint64_t kernel_duration) {
for (auto it = results_list.begin(); it != results_list.end(); it++) {
(*it)->val_double =
(*it)->xcc_vals[xcc_index]; // set val_double to hold value for specific xcc

@@ -35,10 +35,10 @@ namespace rocprofiler {

typedef std::vector<double> xcc_results_t;

class results_t{
public:
results_t(std::string in_name, event_t in_event, uint32_t xcc_count):
name(in_name), val_double(0), event(in_event) {
class results_t {
public:
results_t(std::string in_name, event_t in_event, uint32_t xcc_count)
: name(in_name), val_double(0), event(in_event) {
xcc_vals.resize(xcc_count);
std::fill(xcc_vals.begin(), xcc_vals.end(), 0);
}
@@ -78,8 +78,9 @@ bool GetMetricsData(std::map<std::string, results_t*>& results_map,
std::vector<const Metric*>& metrics_list, uint64_t kernel_duration = 0);

void GetCountersAndMetricResultsByXcc(uint32_t xcc_index, std::vector<results_t*>& results_list,
std::map<std::string, results_t*>& results_map,
std::vector<const Metric*>& metrics_list, uint64_t kernel_duration = 0);
std::map<std::string, results_t*>& results_map,
std::vector<const Metric*>& metrics_list,
uint64_t kernel_duration = 0);

} // namespace metrics
} // namespace rocprofiler

@@ -45,7 +45,7 @@ THE SOFTWARE.
do { \
std::ostringstream oss; \
oss << __FUNCTION__ << "(), " << stream; \
throw rocprofiler::util::exception(error, oss.str()); \
throw rocprofiler::util::exception(error, oss.str()); \
} while (0)

#define AQL_EXC_RAISING(error, stream) \

Executable file → Normal file
+3
-6
@@ -221,14 +221,11 @@ class MetricsDict {
agent_name_ = agent_name_.substr(0, agent_name_.find(':'));

std::unordered_set<std::string> supported_agent_names = {
"gfx906",
"gfx908",
"gfx906", "gfx908",
"gfx90a", // Vega
"gfx940",
"gfx941",
"gfx940", "gfx941",
"gfx942", // Mi300
"gfx1030",
"gfx1031",
"gfx1030", "gfx1031",
"gfx1032", // Navi2x
"gfx1100",
"gfx1101" // Navi3x

@@ -17,8 +17,8 @@ class DFPerfMonMI200 : public PerfMon {
DFPerfMonMI200(const Agent::AgentInfo& info);
~DFPerfMonMI200();
void Start() override;
void Stop() {};
void Read(std::vector<rocprofiler_counters_sampler_counter_output_t>& values) {};
void Stop(){};
void Read(std::vector<rocprofiler_counters_sampler_counter_output_t>& values){};
void SetCounterNames(std::vector<std::string>& counter_names);
mmio::mmap_type_t Type() override { return mmio::mmap_type_t::DF_PERFMON; }

@@ -31,7 +31,6 @@ class DFPerfMonMI200 : public PerfMon {
uint64_t GetFicaNodeOutboundBw(uint32_t ficaa_val);

private:
mmio::DFPerfmonMMIO* mmio_;
static std::mutex mutex_; // should be an MMIO member

@@ -13,12 +13,12 @@ PciePerfMonMI200::~PciePerfMonMI200() {
mmio::MMIOManager::DestroyMMIOInstance(dynamic_cast<mmio::MMIO*>(mmio_));
}

void PciePerfMonMI200::writeRegister(uint32_t reg_offset, uint32_t value){
void PciePerfMonMI200::writeRegister(uint32_t reg_offset, uint32_t value) {
// mmio or ioctl approaches
mmio_->RegisterWriteAPI(reg_offset, value);
mmio_->RegisterWriteAPI(reg_offset, value);
}

void PciePerfMonMI200::readRegister(uint32_t reg_offset, uint32_t& value){
void PciePerfMonMI200::readRegister(uint32_t reg_offset, uint32_t& value) {
// mmio or ioctl approaches
mmio_->RegisterReadAPI(reg_offset, value);
}
@@ -35,44 +35,40 @@ void PciePerfMonMI200::SetCounterNames(std::vector<std::string>& counter_names)
}
}

void PciePerfMonMI200::Start(){
void PciePerfMonMI200::Start() {
// TODO: make sure values stored in table
// in registers header are dec and not hex

Start_RX_TILE_SCLK(event_id_);
}

void PciePerfMonMI200::Stop(){
void PciePerfMonMI200::Stop() {
// TODO: revisit correct value to stop
writeRegister(PCIE_MI200::PCIE_PERF_COUNT_CNTL, 0x2); // stop
writeRegister(PCIE_MI200::PCIE_PERF_COUNT_CNTL, 0x2); // stop
}

void PciePerfMonMI200::Read(std::vector<rocprofiler_counters_sampler_counter_output_t>& values){
uint64_t val=0;
void PciePerfMonMI200::Read(std::vector<rocprofiler_counters_sampler_counter_output_t>& values) {
uint64_t val = 0;
Read_RX_TILE_SCLK(val);
rocprofiler_counters_sampler_counter_output_t value = {
ROCPROFILER_COUNTERS_SAMPLER_PCIE_COUNTERS,
static_cast<double>(val)
};
rocprofiler_counters_sampler_counter_output_t value = {ROCPROFILER_COUNTERS_SAMPLER_PCIE_COUNTERS,
static_cast<double>(val)};
values.push_back(value);
}

void PciePerfMonMI200::Start_RX_TILE_TXCLK(uint32_t event){

void PciePerfMonMI200::Start_RX_TILE_TXCLK(uint32_t event) {
// Step 1: PORT SEL update
writeRegister(PCIE_MI200::PCIE_PERF_CNTL_EVENT_CI_PORT_SEL, 0x0);

// Step 2: EVENT SEL update
uint32_t value = event; // last 8 bits for event
uint32_t value = event; // last 8 bits for event
writeRegister(PCIE_MI200::PCIE_PERF_CNTL_TXCLK3, value);

// Steps 3 & 4: Performance counters initialization, enable:
// TODO: revisit. Just a single write with 0x3 might be enough (check with pcie team)
writeRegister(PCIE_MI200::PCIE_PERF_COUNT_CNTL, 0x5);

}

void PciePerfMonMI200::Read_RX_TILE_TXCLK(uint64_t& result){
void PciePerfMonMI200::Read_RX_TILE_TXCLK(uint64_t& result) {
// Step 5: Performance counters read:
uint32_t lo_val, hi_val;
readRegister(PCIE_MI200::PCIE_PERF_COUNT0_TXCLK3, lo_val);
@@ -84,22 +80,20 @@ void PciePerfMonMI200::Read_RX_TILE_TXCLK(uint64_t& result){
result = val | lo_val;
}

void PciePerfMonMI200::Start_RX_TILE_SCLK(uint32_t event){

void PciePerfMonMI200::Start_RX_TILE_SCLK(uint32_t event) {
// Step 1: PORT SEL update
writeRegister(PCIE_MI200::PCIE_PERF_CNTL_EVENT_CI_PORT_SEL, 0x0);

// Step 2: EVENT SEL update
uint32_t value = event; // last 8 bits for event
uint32_t value = event; // last 8 bits for event
writeRegister(PCIE_MI200::PCIE_PERF_CNTL_LCLK1, value);

// Steps 3 & 4: Performance counters initialization, enable:
// TODO: revisit. Just a single write with 0x3 might be enough (check with pcie team)
writeRegister(PCIE_MI200::PCIE_PERF_COUNT_CNTL, 0x5);

}

void PciePerfMonMI200::Read_RX_TILE_SCLK(uint64_t& result){
void PciePerfMonMI200::Read_RX_TILE_SCLK(uint64_t& result) {
// Step 5: Performance counters read:
uint32_t lo_val, hi_val;
readRegister(PCIE_MI200::PCIE_PERF_COUNT0_LCLK1, lo_val);
@@ -111,6 +105,4 @@ void PciePerfMonMI200::Read_RX_TILE_SCLK(uint64_t& result){
result = val | lo_val;
}

} // namespace rocprofiler

} // namespace rocprofiler

@@ -22,7 +22,7 @@ class PciePerfMonMI200 : public PerfMon {
mmio::mmap_type_t Type() override { return mmio::mmap_type_t::PCIE_PERFMON; }

private:
// TODO : check google coding std
// TODO : check google coding std
void writeRegister(uint32_t reg_offset, uint32_t value);
void readRegister(uint32_t reg_offset, uint32_t& value);

+236
-237
@@ -4,70 +4,70 @@
#include <stdint.h>

namespace PCIE_MI200 {

// -------- RX Tile TXCLK Start --------

// Step 1: PORT SEL update
const static uint32_t PCIE_PERF_CNTL_EVENT_CI_PORT_SEL = 0x11180250;

// Step 2: EVENT SEL update
const static uint32_t PCIE_PERF_CNTL_TXCLK1 = 0x11180204;
const static uint32_t PCIE_PERF_CNTL_TXCLK2 = 0x11180210;
const static uint32_t PCIE_PERF_CNTL_TXCLK3 = 0x1118021C; //#
const static uint32_t PCIE_PERF_CNTL_TXCLK4 = 0x11180228; //#
const static uint32_t PCIE_PERF_CNTL_TXCLK5 = 0x11180258;
const static uint32_t PCIE_PERF_CNTL_TXCLK6 = 0x11180264;
const static uint32_t PCIE_PERF_CNTL_TXCLK7 = 0x11180888;
const static uint32_t PCIE_PERF_CNTL_TXCLK8 = 0x11180894;
const static uint32_t PCIE_PERF_CNTL_TXCLK9 = 0x111808A0;
const static uint32_t PCIE_PERF_CNTL_TXCLK10 = 0x111808AC;
const static uint32_t PCIE_PERF_CNTL_TXCLK1 = 0x11180204;
const static uint32_t PCIE_PERF_CNTL_TXCLK2 = 0x11180210;
const static uint32_t PCIE_PERF_CNTL_TXCLK3 = 0x1118021C; //#
const static uint32_t PCIE_PERF_CNTL_TXCLK4 = 0x11180228; //#
const static uint32_t PCIE_PERF_CNTL_TXCLK5 = 0x11180258;
const static uint32_t PCIE_PERF_CNTL_TXCLK6 = 0x11180264;
const static uint32_t PCIE_PERF_CNTL_TXCLK7 = 0x11180888;
const static uint32_t PCIE_PERF_CNTL_TXCLK8 = 0x11180894;
const static uint32_t PCIE_PERF_CNTL_TXCLK9 = 0x111808A0;
const static uint32_t PCIE_PERF_CNTL_TXCLK10 = 0x111808AC;

// Steps 3 & 4: Performance counters initialization, enable:
const static uint32_t PCIE_PERF_COUNT_CNTL = 0x11180200;

// Step 5: Performance counters read:
const static uint32_t PCIE_PERF_COUNT0_TXCLK1 = 0x11180208;
const static uint32_t PCIE_PERF_COUNT0_TXCLK2 = 0x11180214;
const static uint32_t PCIE_PERF_COUNT0_TXCLK3 = 0x11180220; //#
const static uint32_t PCIE_PERF_COUNT0_TXCLK4 = 0x1118022C; //#
const static uint32_t PCIE_PERF_COUNT0_TXCLK5 = 0x1118025C;
const static uint32_t PCIE_PERF_COUNT0_TXCLK6 = 0x11180268;
const static uint32_t PCIE_PERF_COUNT0_TXCLK7 = 0x1118088C;
const static uint32_t PCIE_PERF_COUNT0_TXCLK8 = 0x11180898;
const static uint32_t PCIE_PERF_COUNT0_TXCLK9 = 0x111808A4;
const static uint32_t PCIE_PERF_COUNT0_TXCLK10 = 0x111808B0;
const static uint32_t PCIE_PERF_COUNT0_TXCLK1 = 0x11180208;
const static uint32_t PCIE_PERF_COUNT0_TXCLK2 = 0x11180214;
const static uint32_t PCIE_PERF_COUNT0_TXCLK3 = 0x11180220; //#
const static uint32_t PCIE_PERF_COUNT0_TXCLK4 = 0x1118022C; //#
const static uint32_t PCIE_PERF_COUNT0_TXCLK5 = 0x1118025C;
const static uint32_t PCIE_PERF_COUNT0_TXCLK6 = 0x11180268;
const static uint32_t PCIE_PERF_COUNT0_TXCLK7 = 0x1118088C;
const static uint32_t PCIE_PERF_COUNT0_TXCLK8 = 0x11180898;
const static uint32_t PCIE_PERF_COUNT0_TXCLK9 = 0x111808A4;
const static uint32_t PCIE_PERF_COUNT0_TXCLK10 = 0x111808B0;

const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK1 = 0x111808E8;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK2 = 0x111808F0;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK3 = 0x111808F8; //#
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK4 = 0x11180900; //#
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK5 = 0x11180908;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK6 = 0x11180910;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK7 = 0x11180918;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK8 = 0x11180920;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK9 = 0x11180928;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK1 = 0x111808E8;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK2 = 0x111808F0;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK3 = 0x111808F8; //#
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK4 = 0x11180900; //#
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK5 = 0x11180908;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK6 = 0x11180910;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK7 = 0x11180918;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK8 = 0x11180920;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK9 = 0x11180928;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK10 = 0x11180930;

const static uint32_t PCIE_PERF_COUNT1_TXCLK1 = 0x1118020C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK2 = 0x11180218;
const static uint32_t PCIE_PERF_COUNT1_TXCLK3 = 0x11180224; //#
const static uint32_t PCIE_PERF_COUNT1_TXCLK4 = 0x11180230; //#
const static uint32_t PCIE_PERF_COUNT1_TXCLK5 = 0x11180260;
const static uint32_t PCIE_PERF_COUNT1_TXCLK6 = 0x1118026C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK7 = 0x11180890;
const static uint32_t PCIE_PERF_COUNT1_TXCLK8 = 0x1118089C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK9 = 0x111808A8;
const static uint32_t PCIE_PERF_COUNT1_TXCLK1 = 0x1118020C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK2 = 0x11180218;
const static uint32_t PCIE_PERF_COUNT1_TXCLK3 = 0x11180224; //#
const static uint32_t PCIE_PERF_COUNT1_TXCLK4 = 0x11180230; //#
const static uint32_t PCIE_PERF_COUNT1_TXCLK5 = 0x11180260;
const static uint32_t PCIE_PERF_COUNT1_TXCLK6 = 0x1118026C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK7 = 0x11180890;
const static uint32_t PCIE_PERF_COUNT1_TXCLK8 = 0x1118089C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK9 = 0x111808A8;
const static uint32_t PCIE_PERF_COUNT1_TXCLK10 = 0x111808B4;

const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK1 = 0x111808EC;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK2 = 0x111808F4;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK3 = 0x111808FC; //#
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK4 = 0x11180904; //#
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK5 = 0x1118090C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK6 = 0x11180914;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK7 = 0x1118091C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK8 = 0x11180924;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK9 = 0x1118092C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK1 = 0x111808EC;
|
||||
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK2 = 0x111808F4;
|
||||
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK3 = 0x111808FC; //#
|
||||
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK4 = 0x11180904; //#
|
||||
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK5 = 0x1118090C;
|
||||
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK6 = 0x11180914;
|
||||
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK7 = 0x1118091C;
|
||||
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK8 = 0x11180924;
|
||||
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK9 = 0x1118092C;
|
||||
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK10 = 0x11180934;
|
||||
|
||||
|
||||
@@ -127,201 +127,200 @@ const static uint32_t PCIE_PERF_COUNT1_UPVAL_LCLK8 = 0x11180974;
 
 // -------- RX Tile SCLK End ----------
 
-typedef enum{
-TX_TILE_TXCLK = 0,
-TX_TILE_SCLK = 1,
-RX_TILE_TXCLK = 2,
-RX_TILE_SCLK = 3,
-LC_TILE_TXCLK = 4
-}pcie_event_category_t;
+typedef enum {
+TX_TILE_TXCLK = 0,
+TX_TILE_SCLK = 1,
+RX_TILE_TXCLK = 2,
+RX_TILE_SCLK = 3,
+LC_TILE_TXCLK = 4
+} pcie_event_category_t;
 
-struct pcie_event_t{
-pcie_event_t(int id, pcie_event_category_t cat): event_id(id), event_category(cat){}
-int event_id;
-pcie_event_category_t event_category;
+struct pcie_event_t {
+pcie_event_t(int id, pcie_event_category_t cat) : event_id(id), event_category(cat) {}
+int event_id;
+pcie_event_category_t event_category;
 };
 
 const static std::map<std::string, pcie_event_t> pcie_events_table = {
-{"RX_PERF_RXP_RX_TailEdb_A[0]", {2, RX_TILE_TXCLK}},
-{"RX_PERF_RXP_RX_TailEdb_A[1]", {3, RX_TILE_TXCLK}},
-{"RX_PERF_RXP_RX_TailEdb_A[2]", {4, RX_TILE_TXCLK}},
-{"RX_PERF_RXP_RX_TailEdb_A[3]", {5, RX_TILE_TXCLK}},
-{"RX_PERF_RXP_RX_TailEnd_A[0]", {6, RX_TILE_TXCLK}},
-{"RX_PERF_RXP_RX_TailEnd_A[1]", {7, RX_TILE_TXCLK}},
-{"RX_PERF_RXP_RX_TailEnd_A[2]", {8, RX_TILE_TXCLK}},
-{"RX_PERF_RXP_RX_TailEnd_A[3]", {9, RX_TILE_TXCLK}},
-{"RX_PERF_RXP_RX_HeadSdp_A[0]", {10, RX_TILE_TXCLK}},
-{"RX_PERF_RXP_RX_HeadSdp_A[1]", {11, RX_TILE_TXCLK}},
-{"RX_PERF_RXP_RX_HeadSdp_A[2]", {12, RX_TILE_TXCLK}},
-{"RX_PERF_RXP_RX_HeadSdp_A[3]", {13, RX_TILE_TXCLK}},
-{"RX_PERF_RXP_RX_HeadStp_A[0]", {14, RX_TILE_TXCLK}},
-{"RX_PERF_RXP_RX_HeadStp_A[1]", {15, RX_TILE_TXCLK}},
-{"RX_PERF_RXP_RX_HeadStp_A[2]", {16, RX_TILE_TXCLK}},
-{"RX_PERF_RXP_RX_HeadStp_A[3]", {17, RX_TILE_TXCLK}},
-{"RX_PERF_RXCRC_nullified_tlp_A", {18, RX_TILE_TXCLK}},
-{"RX_PERF_RXCRC_valid_crc_A", {19, RX_TILE_TXCLK}},
-{"RX_PERF_RXCRC_invalid_crc_A", {20, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_vendor_type1_A", {21, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_vendor_type0_A", {22, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_set_slot_power_limit_A", {23, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_unlock_A", {24, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_err_fatal_A", {25, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_err_nonfatal_A", {26, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_err_corr_A", {27, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_pme_to_ack_A", {28, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_pme_turn_off_A", {29, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_pm_pme_A", {30, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_pm_active_state_nak_A", {31, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_deassert_intd_A", {32, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_deassert_intc_A", {33, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_deassert_intb_A", {34, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_deassert_inta_A", {35, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_assert_intd_A", {36, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_assert_intc_A", {37, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_assert_intb_A", {38, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_assert_inta_A", {39, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_valid_A", {40, RX_TILE_TXCLK}},
-{"RX_PERF_RMSG_unsupported_A", {41, RX_TILE_TXCLK}},
-{"RX_PERF_RCB_unexpected_cpl_A", {42, RX_TILE_TXCLK}},
-{"RX_PERF_RCB_timeout_cpl_A", {43, RX_TILE_TXCLK}},
-{"RX_PERF_HDS_tlphdrvalid_A", {44, RX_TILE_TXCLK}},
-{"RX_PERF_HDS_tlpdatavalid_A", {45, RX_TILE_TXCLK}},
-{"RX_PERF_GAN_bad_tlp_A", {46, RX_TILE_TXCLK}},
-{"RX_PERF_GAN_nak_A", {47, RX_TILE_TXCLK}},
-{"RX_PERF_GAN_ack_A", {48, RX_TILE_TXCLK}},
-{"RX_PERF_FE_unsupported_req_A", {49, RX_TILE_TXCLK}},
-{"RX_PERF_FE_unsupported_cpl_A", {50, RX_TILE_TXCLK}},
-{"RX_PERF_FE_unexpected_cpl_A", {51, RX_TILE_TXCLK}},
-{"RX_PERF_FE_poisoned_tlp_A", {52, RX_TILE_TXCLK}},
-{"RX_PERF_FE_poisoned_cpl_A", {53, RX_TILE_TXCLK}},
-{"RX_PERF_FE_malformed_tlp_A", {54, RX_TILE_TXCLK}},
-{"RX_PERF_FE_cpl_abort_A", {55, RX_TILE_TXCLK}},
-{"RX_PERF_FE_request_MSG_A", {56, RX_TILE_TXCLK}},
-{"RX_PERF_FE_request_CFG_WR_A", {57, RX_TILE_TXCLK}},
-{"RX_PERF_FE_request_CFG_RD_A", {58, RX_TILE_TXCLK}},
-{"RX_PERF_FE_request_IO_WR_A", {59, RX_TILE_TXCLK}},
-{"RX_PERF_FE_request_IO_RD_A", {60, RX_TILE_TXCLK}},
-{"RX_PERF_FE_request_MEM_WR_A", {61, RX_TILE_TXCLK}},
-{"RX_PERF_FE_request_MEM_RD_A", {62, RX_TILE_TXCLK}},
-{"RX_PERF_FE_length_MST_gt16_A", {63, RX_TILE_TXCLK}},
-{"RX_PERF_FE_length_MST_9to16_A", {64, RX_TILE_TXCLK}},
-{"RX_PERF_FE_length_MST_5to8_A", {65, RX_TILE_TXCLK}},
-{"RX_PERF_FE_length_MST_2to4_A", {66, RX_TILE_TXCLK}},
-{"RX_PERF_FE_length_MST_1_A", {67, RX_TILE_TXCLK}},
-{"RX_PERF_FE_length_SLV_gt32_A", {68, RX_TILE_TXCLK}},
-{"RX_PERF_FE_length_SLV_17to32_A", {69, RX_TILE_TXCLK}},
-{"RX_PERF_FE_length_SLV_9to16_A", {70, RX_TILE_TXCLK}},
-{"RX_PERF_FE_length_SLV_5to8_A", {71, RX_TILE_TXCLK}},
-{"RX_PERF_FE_length_SLV_2to4_A", {72, RX_TILE_TXCLK}},
-{"RX_PERF_FE_length_SLV_1_A", {73, RX_TILE_TXCLK}},
-{"RX_PERF_FE_cpl_status_CA_A", {74, RX_TILE_TXCLK}},
-{"RX_PERF_FE_cpl_status_CRS_A", {75, RX_TILE_TXCLK}},
-{"RX_PERF_FE_cpl_status_UR_A", {76, RX_TILE_TXCLK}},
-{"RX_PERF_FE_cpl_status_SC_A", {77, RX_TILE_TXCLK}},
-{"RX_PERF_DLLP_pm_active_state_request_l1_A", {78, RX_TILE_TXCLK}},
-{"RX_PERF_DLLP_pm_request_ack_A", {79, RX_TILE_TXCLK}},
-{"RX_PERF_DLLP_pm_enter_l23_A", {80, RX_TILE_TXCLK}},
-{"RX_PERF_DLLP_pm_enter_l1_A", {81, RX_TILE_TXCLK}},
-{"RX_PERF_DLLP_error_A", {82, RX_TILE_TXCLK}},
-{"RX_PERF_DLLP_crc_err_A", {83, RX_TILE_TXCLK}},
-{"SB_PERF_FCC_npd_0", {84, RX_TILE_TXCLK}},
-{"SB_PERF_FCC_pd_0", {85, RX_TILE_TXCLK}},
-{"SB_PERF_FCC_nph_0", {86, RX_TILE_TXCLK}},
-{"SB_PERF_FCC_ph_0", {87, RX_TILE_TXCLK}},
-{"SB_PERF_fail_crc_rd_hdr_0", {88, RX_TILE_TXCLK}},
-{"SB_PERF_pass_crc_rd_hdr_0", {89, RX_TILE_TXCLK}},
-{"SB_PERF_fail_crc_wr_hdr_0", {90, RX_TILE_TXCLK}},
-{"SB_PERF_pass_crc_wr_hdr_0", {91, RX_TILE_TXCLK}},
-{"SB_PERF_fail_crc_data_0", {92, RX_TILE_TXCLK}},
-{"SB_PERF_pass_crc_data_0", {93, RX_TILE_TXCLK}},
-{"SB_PERF_invalid_crc_0", {94, RX_TILE_TXCLK}},
-{"SB_PERF_valid_crc_0", {95, RX_TILE_TXCLK}},
-{"SB_PERF_rd_hdr_WEN_0", {96, RX_TILE_TXCLK}},
-{"SB_PERF_wr_hdr_WEN_0", {97, RX_TILE_TXCLK}},
-{"SB_PERF_data_WEN_0", {98, RX_TILE_TXCLK}},
-{"SB_PERF_non_post_rd_from_FE", {99, RX_TILE_TXCLK}},
-{"SB_PERF_non_post_wr_from_FE", {100, RX_TILE_TXCLK}},
-{"SB_PERF_post_req_from_FE", {101, RX_TILE_TXCLK}},
-{"SB_PERF_non_post_rd_from_FE_0", {102, RX_TILE_TXCLK}},
-{"SB_PERF_non_post_wr_from_FE_0", {103, RX_TILE_TXCLK}},
-{"SB_PERF_post_req_from_FE_0", {104, RX_TILE_TXCLK}},
-{"RX_PERF_DLLP_nak_A", {111, RX_TILE_TXCLK}},
-{"RX_PERF_DLLP_ack_A", {112, RX_TILE_TXCLK}},
-{"RX_PERF_allErrors_A", {113, RX_TILE_TXCLK}},
-{"perf_PG_COUNT", {175, RX_TILE_TXCLK}},
-{"perf_NOT_POWER_GATED", {176, RX_TILE_TXCLK}},
-{"perf_POWER_GATED", {177, RX_TILE_TXCLK}},
+{"RX_PERF_RXP_RX_TailEdb_A[0]", {2, RX_TILE_TXCLK}},
+{"RX_PERF_RXP_RX_TailEdb_A[1]", {3, RX_TILE_TXCLK}},
+{"RX_PERF_RXP_RX_TailEdb_A[2]", {4, RX_TILE_TXCLK}},
+{"RX_PERF_RXP_RX_TailEdb_A[3]", {5, RX_TILE_TXCLK}},
+{"RX_PERF_RXP_RX_TailEnd_A[0]", {6, RX_TILE_TXCLK}},
+{"RX_PERF_RXP_RX_TailEnd_A[1]", {7, RX_TILE_TXCLK}},
+{"RX_PERF_RXP_RX_TailEnd_A[2]", {8, RX_TILE_TXCLK}},
+{"RX_PERF_RXP_RX_TailEnd_A[3]", {9, RX_TILE_TXCLK}},
+{"RX_PERF_RXP_RX_HeadSdp_A[0]", {10, RX_TILE_TXCLK}},
+{"RX_PERF_RXP_RX_HeadSdp_A[1]", {11, RX_TILE_TXCLK}},
+{"RX_PERF_RXP_RX_HeadSdp_A[2]", {12, RX_TILE_TXCLK}},
+{"RX_PERF_RXP_RX_HeadSdp_A[3]", {13, RX_TILE_TXCLK}},
+{"RX_PERF_RXP_RX_HeadStp_A[0]", {14, RX_TILE_TXCLK}},
+{"RX_PERF_RXP_RX_HeadStp_A[1]", {15, RX_TILE_TXCLK}},
+{"RX_PERF_RXP_RX_HeadStp_A[2]", {16, RX_TILE_TXCLK}},
+{"RX_PERF_RXP_RX_HeadStp_A[3]", {17, RX_TILE_TXCLK}},
+{"RX_PERF_RXCRC_nullified_tlp_A", {18, RX_TILE_TXCLK}},
+{"RX_PERF_RXCRC_valid_crc_A", {19, RX_TILE_TXCLK}},
+{"RX_PERF_RXCRC_invalid_crc_A", {20, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_vendor_type1_A", {21, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_vendor_type0_A", {22, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_set_slot_power_limit_A", {23, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_unlock_A", {24, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_err_fatal_A", {25, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_err_nonfatal_A", {26, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_err_corr_A", {27, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_pme_to_ack_A", {28, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_pme_turn_off_A", {29, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_pm_pme_A", {30, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_pm_active_state_nak_A", {31, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_deassert_intd_A", {32, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_deassert_intc_A", {33, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_deassert_intb_A", {34, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_deassert_inta_A", {35, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_assert_intd_A", {36, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_assert_intc_A", {37, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_assert_intb_A", {38, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_assert_inta_A", {39, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_valid_A", {40, RX_TILE_TXCLK}},
+{"RX_PERF_RMSG_unsupported_A", {41, RX_TILE_TXCLK}},
+{"RX_PERF_RCB_unexpected_cpl_A", {42, RX_TILE_TXCLK}},
+{"RX_PERF_RCB_timeout_cpl_A", {43, RX_TILE_TXCLK}},
+{"RX_PERF_HDS_tlphdrvalid_A", {44, RX_TILE_TXCLK}},
+{"RX_PERF_HDS_tlpdatavalid_A", {45, RX_TILE_TXCLK}},
+{"RX_PERF_GAN_bad_tlp_A", {46, RX_TILE_TXCLK}},
+{"RX_PERF_GAN_nak_A", {47, RX_TILE_TXCLK}},
+{"RX_PERF_GAN_ack_A", {48, RX_TILE_TXCLK}},
+{"RX_PERF_FE_unsupported_req_A", {49, RX_TILE_TXCLK}},
+{"RX_PERF_FE_unsupported_cpl_A", {50, RX_TILE_TXCLK}},
+{"RX_PERF_FE_unexpected_cpl_A", {51, RX_TILE_TXCLK}},
+{"RX_PERF_FE_poisoned_tlp_A", {52, RX_TILE_TXCLK}},
+{"RX_PERF_FE_poisoned_cpl_A", {53, RX_TILE_TXCLK}},
+{"RX_PERF_FE_malformed_tlp_A", {54, RX_TILE_TXCLK}},
+{"RX_PERF_FE_cpl_abort_A", {55, RX_TILE_TXCLK}},
+{"RX_PERF_FE_request_MSG_A", {56, RX_TILE_TXCLK}},
+{"RX_PERF_FE_request_CFG_WR_A", {57, RX_TILE_TXCLK}},
+{"RX_PERF_FE_request_CFG_RD_A", {58, RX_TILE_TXCLK}},
+{"RX_PERF_FE_request_IO_WR_A", {59, RX_TILE_TXCLK}},
+{"RX_PERF_FE_request_IO_RD_A", {60, RX_TILE_TXCLK}},
+{"RX_PERF_FE_request_MEM_WR_A", {61, RX_TILE_TXCLK}},
+{"RX_PERF_FE_request_MEM_RD_A", {62, RX_TILE_TXCLK}},
+{"RX_PERF_FE_length_MST_gt16_A", {63, RX_TILE_TXCLK}},
+{"RX_PERF_FE_length_MST_9to16_A", {64, RX_TILE_TXCLK}},
+{"RX_PERF_FE_length_MST_5to8_A", {65, RX_TILE_TXCLK}},
+{"RX_PERF_FE_length_MST_2to4_A", {66, RX_TILE_TXCLK}},
+{"RX_PERF_FE_length_MST_1_A", {67, RX_TILE_TXCLK}},
+{"RX_PERF_FE_length_SLV_gt32_A", {68, RX_TILE_TXCLK}},
+{"RX_PERF_FE_length_SLV_17to32_A", {69, RX_TILE_TXCLK}},
+{"RX_PERF_FE_length_SLV_9to16_A", {70, RX_TILE_TXCLK}},
+{"RX_PERF_FE_length_SLV_5to8_A", {71, RX_TILE_TXCLK}},
+{"RX_PERF_FE_length_SLV_2to4_A", {72, RX_TILE_TXCLK}},
+{"RX_PERF_FE_length_SLV_1_A", {73, RX_TILE_TXCLK}},
+{"RX_PERF_FE_cpl_status_CA_A", {74, RX_TILE_TXCLK}},
+{"RX_PERF_FE_cpl_status_CRS_A", {75, RX_TILE_TXCLK}},
+{"RX_PERF_FE_cpl_status_UR_A", {76, RX_TILE_TXCLK}},
+{"RX_PERF_FE_cpl_status_SC_A", {77, RX_TILE_TXCLK}},
+{"RX_PERF_DLLP_pm_active_state_request_l1_A", {78, RX_TILE_TXCLK}},
+{"RX_PERF_DLLP_pm_request_ack_A", {79, RX_TILE_TXCLK}},
+{"RX_PERF_DLLP_pm_enter_l23_A", {80, RX_TILE_TXCLK}},
+{"RX_PERF_DLLP_pm_enter_l1_A", {81, RX_TILE_TXCLK}},
+{"RX_PERF_DLLP_error_A", {82, RX_TILE_TXCLK}},
+{"RX_PERF_DLLP_crc_err_A", {83, RX_TILE_TXCLK}},
+{"SB_PERF_FCC_npd_0", {84, RX_TILE_TXCLK}},
+{"SB_PERF_FCC_pd_0", {85, RX_TILE_TXCLK}},
+{"SB_PERF_FCC_nph_0", {86, RX_TILE_TXCLK}},
+{"SB_PERF_FCC_ph_0", {87, RX_TILE_TXCLK}},
+{"SB_PERF_fail_crc_rd_hdr_0", {88, RX_TILE_TXCLK}},
+{"SB_PERF_pass_crc_rd_hdr_0", {89, RX_TILE_TXCLK}},
+{"SB_PERF_fail_crc_wr_hdr_0", {90, RX_TILE_TXCLK}},
+{"SB_PERF_pass_crc_wr_hdr_0", {91, RX_TILE_TXCLK}},
+{"SB_PERF_fail_crc_data_0", {92, RX_TILE_TXCLK}},
+{"SB_PERF_pass_crc_data_0", {93, RX_TILE_TXCLK}},
+{"SB_PERF_invalid_crc_0", {94, RX_TILE_TXCLK}},
+{"SB_PERF_valid_crc_0", {95, RX_TILE_TXCLK}},
+{"SB_PERF_rd_hdr_WEN_0", {96, RX_TILE_TXCLK}},
+{"SB_PERF_wr_hdr_WEN_0", {97, RX_TILE_TXCLK}},
+{"SB_PERF_data_WEN_0", {98, RX_TILE_TXCLK}},
+{"SB_PERF_non_post_rd_from_FE", {99, RX_TILE_TXCLK}},
+{"SB_PERF_non_post_wr_from_FE", {100, RX_TILE_TXCLK}},
+{"SB_PERF_post_req_from_FE", {101, RX_TILE_TXCLK}},
+{"SB_PERF_non_post_rd_from_FE_0", {102, RX_TILE_TXCLK}},
+{"SB_PERF_non_post_wr_from_FE_0", {103, RX_TILE_TXCLK}},
+{"SB_PERF_post_req_from_FE_0", {104, RX_TILE_TXCLK}},
+{"RX_PERF_DLLP_nak_A", {111, RX_TILE_TXCLK}},
+{"RX_PERF_DLLP_ack_A", {112, RX_TILE_TXCLK}},
+{"RX_PERF_allErrors_A", {113, RX_TILE_TXCLK}},
+{"perf_PG_COUNT", {175, RX_TILE_TXCLK}},
+{"perf_NOT_POWER_GATED", {176, RX_TILE_TXCLK}},
+{"perf_POWER_GATED", {177, RX_TILE_TXCLK}},
 
-{"SB_PERF_non_post_rd_to_HI", {2, RX_TILE_SCLK}},
-{"SB_PERF_non_post_wr_to_HI", {3, RX_TILE_SCLK}},
-{"SB_PERF_post_req_to_HI", {4, RX_TILE_SCLK}},
-{"SB_PERF_non_post_rd_to_HI_0", {5, RX_TILE_SCLK}},
-{"SB_PERF_non_post_wr_to_HI_0", {6, RX_TILE_SCLK}},
-{"SB_PERF_post_req_to_HI_0", {7, RX_TILE_SCLK}},
-{"SB_PERF_rd_hdr_REN_0", {8, RX_TILE_SCLK}},
-{"SB_PERF_wr_hdr_REN_0", {9, RX_TILE_SCLK}},
-{"SB_PERF_data_REN_0", {10, RX_TILE_SCLK}},
-{"SB_PERF_rd_hdr_empty_0", {11, RX_TILE_SCLK}},
-{"SB_PERF_wr_hdr_empty_0", {12, RX_TILE_SCLK}},
-{"SB_PERF_data_empty_0", {13, RX_TILE_SCLK}},
-{"CI_PERF_slv_total128BRdCpl", {29, RX_TILE_SCLK}},
-{"CI_PERF_slv_total32BMemRdTx", {30, RX_TILE_SCLK}},
-{"CI_PERF_slv_total64BMemRdTx", {31, RX_TILE_SCLK}},
-{"CI_PERF_slv_total16BMemWrTx", {32, RX_TILE_SCLK}},
-{"CI_PERF_slv_total32BMemWrTx", {33, RX_TILE_SCLK}},
-{"CI_PERF_slv_total64BMemWrTx", {34, RX_TILE_SCLK}},
-{"CI_PERF_slv_totalTx", {35, RX_TILE_SCLK}},
-{"CI_PERF_slv_stallGrantGen", {36, RX_TILE_SCLK}},
-{"CI_PERF_slv_totalGrant", {37, RX_TILE_SCLK}},
-{"CI_PERF_slv_txPending", {38, RX_TILE_SCLK}},
-{"CI_PERF_slv_numMemRdLT32B", {39, RX_TILE_SCLK}},
-{"CI_PERF_slv_numMemRdLT16B", {40, RX_TILE_SCLK}},
-{"CI_PERF_slv_totalMemTx", {41, RX_TILE_SCLK}},
-{"CI_PERF_slv_totalMemRdTx", {42, RX_TILE_SCLK}},
-{"CI_PERF_slv_totalMemWrTx", {43, RX_TILE_SCLK}},
-{"CI_PERF_slv_numGrant0", {44, RX_TILE_SCLK}},
-{"CI_PERF_slv_portCntOverFlow_ns0", {45, RX_TILE_SCLK}},
-{"CI_PERF_slv_portCntUnderFlow_ns0", {46, RX_TILE_SCLK}},
-{"CI_PERF_slv_portCntOverFlow_s0", {47, RX_TILE_SCLK}},
-{"CI_PERF_slv_portCntUnderFlow_s0", {48, RX_TILE_SCLK}},
-{"CI_PERF_slv_portCntOverFlow0", {49, RX_TILE_SCLK}},
-{"CI_PERF_slv_portCntUnderFlow0", {50, RX_TILE_SCLK}},
-{"CI_PERF_slv_npNotAccepted_ns0", {51, RX_TILE_SCLK}},
-{"CI_PERF_slv_npNotAccepted_s0", {52, RX_TILE_SCLK}},
-{"CI_PERF_slv_num128BRdCpl0", {53, RX_TILE_SCLK}},
-{"CI_PERF_slv_num32BMemRdTx0", {54, RX_TILE_SCLK}},
-{"CI_PERF_slv_num64BMemRdTx0", {55, RX_TILE_SCLK}},
-{"CI_PERF_slv_num16BMemWrTx0", {56, RX_TILE_SCLK}},
-{"CI_PERF_slv_num32BMemWrTx0", {57, RX_TILE_SCLK}},
-{"CI_PERF_slv_num64BMemWrTx0", {58, RX_TILE_SCLK}},
-{"CI_PERF_slv_MemRd_Bandwidth0", {59, RX_TILE_SCLK}},
-{"CI_PERF_slv_MemWr_Bandwidth0", {60, RX_TILE_SCLK}},
-{"TX_PERF_S_RCLK_s_tag_buf_empty", {61, RX_TILE_SCLK}},
-{"P_request_latency_500ns_or_more", {62, RX_TILE_SCLK}},
-{"P_request_latency_250_to_500ns", {63, RX_TILE_SCLK}},
-{"P_request_latency_100_to_250ns", {64, RX_TILE_SCLK}},
-{"P_request_latency_100ns_or_less", {65, RX_TILE_SCLK}},
-{"NP_request_latency_500ns_or_more", {66, RX_TILE_SCLK}},
-{"NP_request_latency_250_to_500ns", {67, RX_TILE_SCLK}},
-{"NP_request_latency_100_to_250ns", {68, RX_TILE_SCLK}},
-{"NP_request_latency_100ns_or_less", {69, RX_TILE_SCLK}},
-{"CI_PERF_slv_MemRd_wait_for_cpl_slot[0]", {70, RX_TILE_SCLK}},
-{"CI_PERF_slv_MemRd_wait_for_tag[0]", {71, RX_TILE_SCLK}},
-{"CI_PERF_slv_MemRd_wait_for_d_credit[0]", {72, RX_TILE_SCLK}},
-{"CI_PERF_slv_MemRd_wait_for_h_credit[0]", {73, RX_TILE_SCLK}},
-{"CI_PERF_slv_MemWr_wait_for_tag[0]", {74, RX_TILE_SCLK}},
-{"CI_PERF_slv_MemWr_wait_for_d_credit[0]", {75, RX_TILE_SCLK}},
-{"CI_PERF_slv_MemWr_wait_for_h_credit[0]", {76, RX_TILE_SCLK}},
-{"CISLV_PERF_no_VC1_no_tags_q", {77, RX_TILE_SCLK}},
-{"CISLV_PERF_no_VC1_data_credits_q", {78, RX_TILE_SCLK}},
-{"CISLV_PERF_no_VC1_req_credits_q", {79, RX_TILE_SCLK}},
-{"CISLV_PERF_no_cpl_slots_q[0]", {80, RX_TILE_SCLK}},
-{"CISLV_PERF_no_VC0_no_tags_q", {81, RX_TILE_SCLK}},
-{"CISLV_PERF_no_VC0_data_credits_q", {82, RX_TILE_SCLK}},
-{"CISLV_PERF_no_VC0_req_credits_q", {83, RX_TILE_SCLK}}
-};
+{"SB_PERF_non_post_rd_to_HI", {2, RX_TILE_SCLK}},
+{"SB_PERF_non_post_wr_to_HI", {3, RX_TILE_SCLK}},
+{"SB_PERF_post_req_to_HI", {4, RX_TILE_SCLK}},
+{"SB_PERF_non_post_rd_to_HI_0", {5, RX_TILE_SCLK}},
+{"SB_PERF_non_post_wr_to_HI_0", {6, RX_TILE_SCLK}},
+{"SB_PERF_post_req_to_HI_0", {7, RX_TILE_SCLK}},
+{"SB_PERF_rd_hdr_REN_0", {8, RX_TILE_SCLK}},
+{"SB_PERF_wr_hdr_REN_0", {9, RX_TILE_SCLK}},
+{"SB_PERF_data_REN_0", {10, RX_TILE_SCLK}},
+{"SB_PERF_rd_hdr_empty_0", {11, RX_TILE_SCLK}},
+{"SB_PERF_wr_hdr_empty_0", {12, RX_TILE_SCLK}},
+{"SB_PERF_data_empty_0", {13, RX_TILE_SCLK}},
+{"CI_PERF_slv_total128BRdCpl", {29, RX_TILE_SCLK}},
+{"CI_PERF_slv_total32BMemRdTx", {30, RX_TILE_SCLK}},
+{"CI_PERF_slv_total64BMemRdTx", {31, RX_TILE_SCLK}},
+{"CI_PERF_slv_total16BMemWrTx", {32, RX_TILE_SCLK}},
+{"CI_PERF_slv_total32BMemWrTx", {33, RX_TILE_SCLK}},
+{"CI_PERF_slv_total64BMemWrTx", {34, RX_TILE_SCLK}},
+{"CI_PERF_slv_totalTx", {35, RX_TILE_SCLK}},
+{"CI_PERF_slv_stallGrantGen", {36, RX_TILE_SCLK}},
+{"CI_PERF_slv_totalGrant", {37, RX_TILE_SCLK}},
+{"CI_PERF_slv_txPending", {38, RX_TILE_SCLK}},
+{"CI_PERF_slv_numMemRdLT32B", {39, RX_TILE_SCLK}},
+{"CI_PERF_slv_numMemRdLT16B", {40, RX_TILE_SCLK}},
+{"CI_PERF_slv_totalMemTx", {41, RX_TILE_SCLK}},
+{"CI_PERF_slv_totalMemRdTx", {42, RX_TILE_SCLK}},
+{"CI_PERF_slv_totalMemWrTx", {43, RX_TILE_SCLK}},
+{"CI_PERF_slv_numGrant0", {44, RX_TILE_SCLK}},
+{"CI_PERF_slv_portCntOverFlow_ns0", {45, RX_TILE_SCLK}},
+{"CI_PERF_slv_portCntUnderFlow_ns0", {46, RX_TILE_SCLK}},
+{"CI_PERF_slv_portCntOverFlow_s0", {47, RX_TILE_SCLK}},
+{"CI_PERF_slv_portCntUnderFlow_s0", {48, RX_TILE_SCLK}},
+{"CI_PERF_slv_portCntOverFlow0", {49, RX_TILE_SCLK}},
+{"CI_PERF_slv_portCntUnderFlow0", {50, RX_TILE_SCLK}},
+{"CI_PERF_slv_npNotAccepted_ns0", {51, RX_TILE_SCLK}},
+{"CI_PERF_slv_npNotAccepted_s0", {52, RX_TILE_SCLK}},
+{"CI_PERF_slv_num128BRdCpl0", {53, RX_TILE_SCLK}},
+{"CI_PERF_slv_num32BMemRdTx0", {54, RX_TILE_SCLK}},
+{"CI_PERF_slv_num64BMemRdTx0", {55, RX_TILE_SCLK}},
+{"CI_PERF_slv_num16BMemWrTx0", {56, RX_TILE_SCLK}},
+{"CI_PERF_slv_num32BMemWrTx0", {57, RX_TILE_SCLK}},
+{"CI_PERF_slv_num64BMemWrTx0", {58, RX_TILE_SCLK}},
+{"CI_PERF_slv_MemRd_Bandwidth0", {59, RX_TILE_SCLK}},
+{"CI_PERF_slv_MemWr_Bandwidth0", {60, RX_TILE_SCLK}},
+{"TX_PERF_S_RCLK_s_tag_buf_empty", {61, RX_TILE_SCLK}},
+{"P_request_latency_500ns_or_more", {62, RX_TILE_SCLK}},
+{"P_request_latency_250_to_500ns", {63, RX_TILE_SCLK}},
+{"P_request_latency_100_to_250ns", {64, RX_TILE_SCLK}},
+{"P_request_latency_100ns_or_less", {65, RX_TILE_SCLK}},
+{"NP_request_latency_500ns_or_more", {66, RX_TILE_SCLK}},
+{"NP_request_latency_250_to_500ns", {67, RX_TILE_SCLK}},
+{"NP_request_latency_100_to_250ns", {68, RX_TILE_SCLK}},
+{"NP_request_latency_100ns_or_less", {69, RX_TILE_SCLK}},
+{"CI_PERF_slv_MemRd_wait_for_cpl_slot[0]", {70, RX_TILE_SCLK}},
+{"CI_PERF_slv_MemRd_wait_for_tag[0]", {71, RX_TILE_SCLK}},
+{"CI_PERF_slv_MemRd_wait_for_d_credit[0]", {72, RX_TILE_SCLK}},
+{"CI_PERF_slv_MemRd_wait_for_h_credit[0]", {73, RX_TILE_SCLK}},
+{"CI_PERF_slv_MemWr_wait_for_tag[0]", {74, RX_TILE_SCLK}},
+{"CI_PERF_slv_MemWr_wait_for_d_credit[0]", {75, RX_TILE_SCLK}},
+{"CI_PERF_slv_MemWr_wait_for_h_credit[0]", {76, RX_TILE_SCLK}},
+{"CISLV_PERF_no_VC1_no_tags_q", {77, RX_TILE_SCLK}},
+{"CISLV_PERF_no_VC1_data_credits_q", {78, RX_TILE_SCLK}},
+{"CISLV_PERF_no_VC1_req_credits_q", {79, RX_TILE_SCLK}},
+{"CISLV_PERF_no_cpl_slots_q[0]", {80, RX_TILE_SCLK}},
+{"CISLV_PERF_no_VC0_no_tags_q", {81, RX_TILE_SCLK}},
+{"CISLV_PERF_no_VC0_data_credits_q", {82, RX_TILE_SCLK}},
+{"CISLV_PERF_no_VC0_req_credits_q", {83, RX_TILE_SCLK}}};
 
-}
+} // namespace PCIE_MI200
 
 
 #endif
@@ -42,6 +42,6 @@ class PerfMon {
 std::vector<std::string> counter_names_;
 };
 
-} // namespace rocprofiler
+} // namespace rocprofiler
 
 #endif
@@ -31,10 +31,8 @@ THE SOFTWARE.
 #include "util/hsa_rsrc_factory.h"
 
 namespace rocprofiler {
-size_t CreateGpuCommand(gpu_cmd_op_t op,
-const rocprofiler::util::AgentInfo* agent_info,
-packet_t* command,
-const size_t& slot_count) {
+size_t CreateGpuCommand(gpu_cmd_op_t op, const rocprofiler::util::AgentInfo* agent_info,
+packet_t* command, const size_t& slot_count) {
 if (op >= NUMBER_GPU_CMD_OP) EXC_RAISING(HSA_STATUS_ERROR, "bad op value (" << op << ")");
 
 const bool is_legacy = (strncmp(agent_info->name, "gfx8", 4) == 0);
@@ -49,14 +47,15 @@ size_t CreateGpuCommand(gpu_cmd_op_t op,
 profile.agent = agent_info->dev_id;
 // Query for cmd buffer size
 hsa_ven_amd_aqlprofile_info_type_t info_type =
-(hsa_ven_amd_aqlprofile_info_type_t)((int)HSA_VEN_AMD_AQLPROFILE_INFO_ENABLE_CMD + (int)op);
-hsa_status_t status = hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, NULL);
-if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "get_info(ENABLE_CMD ).size exc, op(" << int(op) << ")");
+(hsa_ven_amd_aqlprofile_info_type_t)((int)HSA_VEN_AMD_AQLPROFILE_INFO_ENABLE_CMD + (int)op);
+hsa_status_t status =
+hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, NULL);
+if (status != HSA_STATUS_SUCCESS)
+EXC_RAISING(status, "get_info(ENABLE_CMD ).size exc, op(" << int(op) << ")");
 if (profile.command_buffer.size == 0) EXC_RAISING(status, "get_info(ENABLE_CMD).size == 0");
 // Allocate cmd buffer
 const size_t aligment_mask = 0x100 - 1;
-profile.command_buffer.ptr =
-hsa_rsrc->AllocateSysMemory(agent_info, profile.command_buffer.size);
+profile.command_buffer.ptr = hsa_rsrc->AllocateSysMemory(agent_info, profile.command_buffer.size);
 if ((reinterpret_cast<uintptr_t>(profile.command_buffer.ptr) & aligment_mask) != 0) {
 EXC_RAISING(status, "profile.command_buffer.ptr bad alignment");
 }
@@ -66,15 +65,18 @@ size_t CreateGpuCommand(gpu_cmd_op_t op,
 packet_t packet{};
 
 // Query for cmd buffer data
-status = hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, &packet);
+status =
+hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, &packet);
 if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "get_info(ENABLE_CMD).data exc");
 
 // Check for legacy GFXIP
 status = hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_legacy_get_pm4(&packet, command);
-if (status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "hsa_ven_amd_aqlprofile_legacy_get_pm4");
+if (status != HSA_STATUS_SUCCESS)
+AQL_EXC_RAISING(status, "hsa_ven_amd_aqlprofile_legacy_get_pm4");
 } else {
 // Query for cmd buffer data
-status = hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, command);
+status =
+hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, command);
 if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "get_info(ENABLE_CMD).data exc");
 }
 
@@ -91,15 +93,14 @@ struct gpu_cmd_key_t {
 uint32_t node_id;
 };
 struct gpu_cmd_fncomp_t {
-bool operator() (const gpu_cmd_key_t& a, const gpu_cmd_key_t& b) const {
+bool operator()(const gpu_cmd_key_t& a, const gpu_cmd_key_t& b) const {
 return (a.op < b.op) || ((a.op == b.op) && (a.node_id < b.node_id));
 }
 };
 typedef std::map<gpu_cmd_key_t, gpu_cmd_entry_t, gpu_cmd_fncomp_t> gpu_cmd_map_t;
 
-size_t GetGpuCommand(gpu_cmd_op_t op,
-const rocprofiler::util::AgentInfo* agent_info,
-packet_t** command_out) {
+size_t GetGpuCommand(gpu_cmd_op_t op, const rocprofiler::util::AgentInfo* agent_info,
+packet_t** command_out) {
 thread_local gpu_cmd_map_t map;
 
 // Getting NUMA node id
@@ -112,7 +113,8 @@ size_t GetGpuCommand(gpu_cmd_op_t op,
 auto ret = map.insert({gpu_cmd_key_t{op, node_id}, gpu_cmd_entry_t{}});
 gpu_cmd_map_t::iterator it = ret.first;
 if (ret.second) {
-it->second.size = CreateGpuCommand(op, agent_info, it->second.command, Profile::LEGACY_SLOT_SIZE_PKT);
+it->second.size =
+CreateGpuCommand(op, agent_info, it->second.command, Profile::LEGACY_SLOT_SIZE_PKT);
 }
 
 *command_out = it->second.command;
@@ -37,9 +37,8 @@ enum gpu_cmd_op_t {
 NUMBER_GPU_CMD_OP
 };
 
-size_t GetGpuCommand(gpu_cmd_op_t op,
-const rocprofiler::util::AgentInfo* agent_info,
-packet_t** command_out);
+size_t GetGpuCommand(gpu_cmd_op_t op, const rocprofiler::util::AgentInfo* agent_info,
+packet_t** command_out);
 
 static inline size_t IssueGpuCommand(gpu_cmd_op_t op,
 const rocprofiler::util::AgentInfo* agent_info,
@@ -55,9 +54,7 @@ static inline size_t IssueGpuCommand(gpu_cmd_op_t op,
 return HSA_STATUS_SUCCESS;
 }
 
-static inline size_t IssueGpuCommand(gpu_cmd_op_t op,
-hsa_agent_t agent,
-hsa_queue_t* queue) {
+static inline size_t IssueGpuCommand(gpu_cmd_op_t op, hsa_agent_t agent, hsa_queue_t* queue) {
 rocprofiler::util::HsaRsrcFactory* hsa_rsrc = &rocprofiler::util::HsaRsrcFactory::Instance();
 const rocprofiler::util::AgentInfo* agent_info = hsa_rsrc->GetAgentInfo(agent);
 return IssueGpuCommand(op, agent_info, queue);
@@ -55,31 +55,30 @@ struct block_status_t {
 
 // Metrics set class
 class MetricsGroup {
-public:
+public:
 // Info map type
 typedef std::map<std::string, const Metric*> info_map_t;
 // Blocks map type
 typedef std::map<block_des_t, block_status_t, lt_block_des> blocks_map_t;
 
-MetricsGroup(const util::AgentInfo* agent_info) :
-agent_info_(agent_info)
-{
+MetricsGroup(const util::AgentInfo* agent_info) : agent_info_(agent_info) {
 metrics_ = MetricsDict::Create(agent_info);
 if (metrics_ == NULL) EXC_RAISING(HSA_STATUS_ERROR, "MetricsDict create failed");
 }
 
 void Print(FILE* file) const {
 for (const Metric* metric : metrics_vec_) {
-fprintf(file, " %s", metric->GetName().c_str()); fflush(stdout);
+fprintf(file, " %s", metric->GetName().c_str());
+fflush(stdout);
 }
-fprintf(file, "\n"); fflush(stdout);
+fprintf(file, "\n");
+fflush(stdout);
 }
 
 static const Metric* GetMetric(const MetricsDict* metrics, const std::string& name) {
 // Metric object
 const Metric* metric = metrics->Get(name);
-if (metric == NULL)
-EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name << "' is not found");
+if (metric == NULL) EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name << "' is not found");
 return metric;
 }
 
@@ -95,9 +94,7 @@ class MetricsGroup {
 }

 // Add metric
-bool AddMetric(const rocprofiler_feature_t* info) {
-return AddMetric(GetMetric(metrics_, info));
-}
+bool AddMetric(const rocprofiler_feature_t* info) { return AddMetric(GetMetric(metrics_, info)); }

 bool AddMetric(const Metric* metric) {
 // Blocks utilization delta
@@ -125,8 +122,9 @@ class MetricsGroup {
 query.events = event;

 uint32_t block_counters;
-hsa_status_t status = util::HsaRsrcFactory::Instance().AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(
-&query, HSA_VEN_AMD_AQLPROFILE_INFO_BLOCK_COUNTERS, &block_counters);
+hsa_status_t status =
+util::HsaRsrcFactory::Instance().AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(
+&query, HSA_VEN_AMD_AQLPROFILE_INFO_BLOCK_COUNTERS, &block_counters);
 if (status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "get block_counters info");
 block_status.max_counters = block_counters;
 }
@@ -141,7 +139,8 @@ class MetricsGroup {
 metrics_vec_.push_back(metric);
 info_map_[metric->GetName()] = metric;
 for (const counter_t* counter : counters_vec) {
-if (info_map_.find(counter->name) == info_map_.end()) info_map_[counter->name] = NewCounterInfo(counter->name);
+if (info_map_.find(counter->name) == info_map_.end())
+info_map_[counter->name] = NewCounterInfo(counter->name);
 }
 for (const auto& entry : blocks_delta) {
 blocks_map_[entry.first] = entry.second;
@@ -150,10 +149,8 @@ class MetricsGroup {
 return true;
 }

-private:
-const Metric* NewCounterInfo(const std::string& name) const {
-return GetMetric(metrics_, name);
-}
+private:
+const Metric* NewCounterInfo(const std::string& name) const { return GetMetric(metrics_, name); }

 // Agent info
 const util::AgentInfo* const agent_info_;
@@ -169,10 +166,10 @@ class MetricsGroup {

 // Metrics groups class
 class MetricsGroupSet {
-public:
-MetricsGroupSet(const util::AgentInfo* agent_info, const rocprofiler_feature_t* info_array, const uint32_t info_count) :
-agent_info_(agent_info)
-{
+public:
+MetricsGroupSet(const util::AgentInfo* agent_info, const rocprofiler_feature_t* info_array,
+const uint32_t info_count)
+: agent_info_(agent_info) {
 metrics_ = MetricsDict::Create(agent_info);
 if (metrics_ == NULL) EXC_RAISING(HSA_STATUS_ERROR, "MetricsDict create failed");
 Initialize(info_array, info_count);
@@ -186,12 +183,13 @@ class MetricsGroupSet {

 void Print(FILE* file) const {
 for (const auto* group : groups_) {
-fprintf(stdout, " pmc : "); fflush(stdout);
+fprintf(stdout, " pmc : ");
+fflush(stdout);
 group->Print(file);
 }
 }

-private:
+private:
 void Initialize(const rocprofiler_feature_t* info_array, const uint32_t info_count) {
 std::multimap<uint32_t, const Metric*, std::greater<uint32_t> > input_metrics;
 for (unsigned i = 0; i < info_count; ++i) {
@@ -202,7 +200,8 @@ class MetricsGroupSet {
 input_metrics.insert({counters_num, metric});

 if (MetricsGroup(agent_info_).AddMetric(metric) == false) {
-AQL_EXC_RAISING(HSA_STATUS_ERROR, "Metric '" << metric->GetName() << "' doesn't fit in one group");
+AQL_EXC_RAISING(HSA_STATUS_ERROR,
+"Metric '" << metric->GetName() << "' doesn't fit in one group");
 }
 }
 #if 0
@@ -239,4 +238,4 @@ class MetricsGroupSet {

 } // namespace rocprofiler

-#endif // SRC_CORE_GROUP_SET_H_
+#endif // SRC_CORE_GROUP_SET_H_

@@ -62,33 +62,28 @@ AgentInfo::AgentInfo(const hsa_agent_t agent, ::CoreApiTable* table) : handle_(a
 table->hsa_agent_get_info_fn(
 agent, static_cast<hsa_agent_info_t>(HSA_AMD_AGENT_INFO_NUM_SHADER_ENGINES), &se_num_);

-if (table->hsa_agent_get_info_fn(
-agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_NUM_SHADER_ARRAYS_PER_SE,
-&shader_arrays_per_se_) != HSA_STATUS_SUCCESS ||
-table->hsa_agent_get_info_fn(
-agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_MAX_WAVES_PER_CU,
-&waves_per_cu_) != HSA_STATUS_SUCCESS)
-{
-rocprofiler::fatal("hsa_agent_get_info for gfxip hardware configuration failed");
+if (table->hsa_agent_get_info_fn(agent,
+(hsa_agent_info_t)HSA_AMD_AGENT_INFO_NUM_SHADER_ARRAYS_PER_SE,
+&shader_arrays_per_se_) != HSA_STATUS_SUCCESS ||
+table->hsa_agent_get_info_fn(agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_MAX_WAVES_PER_CU,
+&waves_per_cu_) != HSA_STATUS_SUCCESS) {
+rocprofiler::fatal("hsa_agent_get_info for gfxip hardware configuration failed");
 }

 compute_units_per_sh_ = cu_num_ / (se_num_ * shader_arrays_per_se_);
 wave_slots_per_simd_ = waves_per_cu_ / simds_per_cu_;

-if (table->hsa_agent_get_info_fn(
-agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_DOMAIN,
-&pci_domain_) != HSA_STATUS_SUCCESS ||
-table->hsa_agent_get_info_fn(
-agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_BDFID,
-&pci_location_id_) != HSA_STATUS_SUCCESS)
-{
-rocprofiler::fatal("hsa_agent_get_info for PCI info failed");
+if (table->hsa_agent_get_info_fn(agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_DOMAIN,
+&pci_domain_) != HSA_STATUS_SUCCESS ||
+table->hsa_agent_get_info_fn(agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_BDFID,
+&pci_location_id_) != HSA_STATUS_SUCCESS) {
+rocprofiler::fatal("hsa_agent_get_info for PCI info failed");
 }

 // TODO(saurabh, giovanni): Remove this in 5.7
-if (table->hsa_agent_get_info_fn(agent,
-static_cast<hsa_agent_info_t>(HSA_AMD_AGENT_INFO_NUM_XCC), &xcc_num_) != HSA_STATUS_SUCCESS) {
-xcc_num_ = 1;
+if (table->hsa_agent_get_info_fn(agent, static_cast<hsa_agent_info_t>(HSA_AMD_AGENT_INFO_NUM_XCC),
+&xcc_num_) != HSA_STATUS_SUCCESS) {
+xcc_num_ = 1;
 }
 }

@@ -33,8 +33,8 @@ Agent::AgentInfo& GetAgentInfo(decltype(hsa_agent_t::handle) handle) {
 if (agent_info_map.find(handle) != agent_info_map.end()) {
 return agent_info_map.at(handle);
 } else {
-std::cerr << std::string("Error: Can't find Agent with handle(") << std::to_string(handle) <<
-") in this system" << std::endl;
+std::cerr << std::string("Error: Can't find Agent with handle(") << std::to_string(handle)
+<< ") in this system" << std::endl;
 abort();
 }
 }
@@ -49,9 +49,7 @@ void SetAgentInfo(decltype(hsa_agent_t::handle) handle, const Agent::AgentInfo&
 }
 }

-std::vector<hsa_agent_t>& GetCPUAgentList() {
-return cpu_agents_list;
-}
+std::vector<hsa_agent_t>& GetCPUAgentList() { return cpu_agents_list; }

 hsa_agent_t GetAgentByIndex(uint64_t agent_index) {
 std::lock_guard<std::mutex> lock(agents_map_lock);
@@ -60,8 +58,8 @@ hsa_agent_t GetAgentByIndex(uint64_t agent_index) {
 return hsa_agent_t{agent_info.second.getHandle()};
 }
 }
-std::cerr << std::string("Error: Can't find Agent with Index(") << std::to_string(agent_index) <<
-") in this system" << std::endl;
+std::cerr << std::string("Error: Can't find Agent with Index(") << std::to_string(agent_index)
+<< ") in this system" << std::endl;
 abort();
 }

@@ -95,7 +95,7 @@ namespace rocprofiler {
 namespace hsa_support {

 void Initialize(HsaApiTable* Table);
-hsa_status_t hsa_iterate_agents_cb(hsa_agent_t agent, void *data);
+hsa_status_t hsa_iterate_agents_cb(hsa_agent_t agent, void* data);
 void Finalize();

 bool IterateCounters(rocprofiler_counters_info_callback_t counters_info_callback);

@@ -181,7 +181,7 @@ InitializeAqlPackets(hsa_agent_t cpu_agent, hsa_agent_t gpu_agent,

 // TODO: validate needs to be called on each events_list[i]
 // Validating the events array for the specified gpu agent
-if(events_list.size() > 0) {
+if (events_list.size() > 0) {
 bool validate_event_result;
 status =
 hsa_ven_amd_aqlprofile_validate_event(gpu_agent, &events_list[0], &validate_event_result);
@@ -234,9 +234,10 @@ InitializeAqlPackets(hsa_agent_t cpu_agent, hsa_agent_t gpu_agent,
 }
 }

-for(auto& cname : counter_names) {
-if(cname.compare("KERNEL_DURATION")==0) {
-rocprofiler::Metric* metric = const_cast<rocprofiler::Metric*>(metricsDict[gpu_agent.handle]->Get(cname));
+for (auto& cname : counter_names) {
+if (cname.compare("KERNEL_DURATION") == 0) {
+rocprofiler::Metric* metric =
+const_cast<rocprofiler::Metric*>(metricsDict[gpu_agent.handle]->Get(cname));
 if (metric == nullptr) std::cout << cname << " not found in metricsDict\n";
 context->metrics_list.push_back(metric);
 }
@@ -315,7 +316,7 @@ InitializeAqlPackets(hsa_agent_t cpu_agent, hsa_agent_t gpu_agent,
 hsa_agent_t ag_list[ag_list_count];
 ag_list[0] = gpu_agent;

-if(context->events_list.size() > 0) {
+if (context->events_list.size() > 0) {
 // Preparing an Getting the size of the command and output buffers
 status = hsa_ven_amd_aqlprofile_start(profile, NULL);
 // CHECK_HSA_STATUS("Error: Getting Buffers Size", status);
@@ -510,7 +511,8 @@ uint8_t* AllocateLocalMemory(size_t size, hsa_amd_memory_pool_t* gpu_pool) {
 return ptr;
 }

-hsa_status_t Allocate(hsa_agent_t gpu_agent, hsa_ven_amd_aqlprofile_profile_t* profile, size_t att_buffer_size) {
+hsa_status_t Allocate(hsa_agent_t gpu_agent, hsa_ven_amd_aqlprofile_profile_t* profile,
+size_t att_buffer_size) {
 Agent::AgentInfo& agentInfo = rocprofiler::hsa_support::GetAgentInfo(gpu_agent.handle);
 profile->command_buffer.ptr =
 AllocateSysMemory(gpu_agent, profile->command_buffer.size, &agentInfo.cpu_pool);

@@ -435,16 +435,18 @@ bool AsyncSignalHandler(hsa_signal_value_t signal_value, void* data) {
 pending->session_id = GetROCProfilerSingleton()->GetCurrentSessionId();
 }
 if (pending->counters_count > 0) {
-if (xcc_id == 0 && pending->context && pending->context->metrics_list.size() > 0 && pending->profile) // call to GetCounterData() is required only once for a dispatch
+if (xcc_id == 0 && pending->context && pending->context->metrics_list.size() > 0 &&
+pending->profile) // call to GetCounterData() is required only once for a dispatch
 rocprofiler::metrics::GetCounterData(pending->profile, queue_info_session->agent,
 pending->context->results_list);
 if (is_individual_xcc_mode)
 rocprofiler::metrics::GetCountersAndMetricResultsByXcc(
 xcc_id, pending->context->results_list, pending->context->results_map,
-pending->context->metrics_list, time.end-time.start);
+pending->context->metrics_list, time.end - time.start);
 else
 rocprofiler::metrics::GetMetricsData(pending->context->results_map,
-pending->context->metrics_list, time.end-time.start);
+pending->context->metrics_list,
+time.end - time.start);
 AddRecordCounters(&record, pending);
 } else {
 if (session->FindBuffer(pending->buffer_id)) {
@@ -652,8 +654,8 @@ void CheckNeededProfileConfigs() {
 att_counters_names = filter->GetCounterData();
 kernel_profile_names = std::get<std::vector<std::string>>(
 filter->GetProperty(ROCPROFILER_FILTER_KERNEL_NAMES));
-kernel_profile_dispatch_ids = std::get<std::vector<uint64_t>>(
-filter->GetProperty(ROCPROFILER_FILTER_DISPATCH_IDS));
+kernel_profile_dispatch_ids =
+std::get<std::vector<uint64_t>>(filter->GetProperty(ROCPROFILER_FILTER_DISPATCH_IDS));
 } else if (session && session->FindFilterWithKind(ROCPROFILER_PC_SAMPLING_COLLECTION)) {
 is_pc_sampling_collection_mode = true;
 }
@@ -685,23 +687,20 @@ std::pair<std::vector<bool>, bool> GetAllowedProfilesList(const void* packets, i
 auto& kdispatch = static_cast<const hsa_kernel_dispatch_packet_s*>(packets)[i];

 // If Dispatch IDs specified, profile based on dispatch ID
-for (auto id : kernel_profile_dispatch_ids)
-b_profile_this_object |= id == current_writer_id;
+for (auto id : kernel_profile_dispatch_ids) b_profile_this_object |= id == current_writer_id;
 try {
 // Can throw
 const std::string& kernel_name = ksymbols->at(kdispatch.kernel_object);

 // If no filters specified, auto profile this kernel
-if (kernel_profile_names.size() == 0 &&
-kernel_profile_dispatch_ids.size() == 0 &&
+if (kernel_profile_names.size() == 0 && kernel_profile_dispatch_ids.size() == 0 &&
 kernel_name.find("__amd_rocclr_") == std::string::npos)
-b_profile_this_object = true;
+b_profile_this_object = true;

 // Try to match the mangled kernel name with given matches in input.txt
 // We want to initiate att profiling if a match exists
 for (const std::string& kernel_matches : kernel_profile_names)
-if (kernel_name.find(kernel_matches) != std::string::npos)
-b_profile_this_object = true;
+if (kernel_name.find(kernel_matches) != std::string::npos) b_profile_this_object = true;
 } catch (...) {
 printf("Warning: Unknown name for object %lu\n", kdispatch.kernel_object);
 }
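Note on the hunk above: the name-based part of the profiling decision can be read as a small predicate. The following is a minimal sketch under stated assumptions, not the project's code: `ShouldProfileKernel` and its parameters are hypothetical names, and the dispatch-ID match (which the real function ORs into the same flag) is deliberately left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical free-function version of the name-based decision only.
// name_filters stands in for kernel_profile_names; has_dispatch_id_filters
// stands in for "kernel_profile_dispatch_ids is non-empty".
bool ShouldProfileKernel(const std::string& kernel_name,
                         const std::vector<std::string>& name_filters,
                         bool has_dispatch_id_filters) {
  // No filters at all: profile every kernel whose name does not contain
  // the "__amd_rocclr_" marker.
  if (name_filters.empty() && !has_dispatch_id_filters &&
      kernel_name.find("__amd_rocclr_") == std::string::npos)
    return true;

  // Otherwise a substring match against any requested name enables profiling.
  for (const std::string& filter : name_filters)
    if (kernel_name.find(filter) != std::string::npos) return true;

  return false;
}
```

Matching is by substring, so a filter such as `vec_add` would also select a mangled name that merely contains it.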
@@ -711,17 +710,13 @@ std::pair<std::vector<bool>, bool> GetAllowedProfilesList(const void* packets, i
 can_profile_packet.push_back(b_profile_this_object);
 }
 // If we're going to skip all packets, need to update writer ID
-if (!b_can_profile_anypacket)
-WRITER_ID.store(current_writer_id, std::memory_order_release);
+if (!b_can_profile_anypacket) WRITER_ID.store(current_writer_id, std::memory_order_release);
 return {can_profile_packet, b_can_profile_anypacket};
 }

-hsa_ven_amd_aqlprofile_profile_t* ProcessATTParams(
-Packet::packet_t& start_packet,
-Packet::packet_t& stop_packet,
-Queue& queue_info,
-Agent::AgentInfo& agentInfo
-) {
+hsa_ven_amd_aqlprofile_profile_t* ProcessATTParams(Packet::packet_t& start_packet,
+Packet::packet_t& stop_packet, Queue& queue_info,
+Agent::AgentInfo& agentInfo) {
 std::vector<hsa_ven_amd_aqlprofile_parameter_t> att_params;
 int num_att_counters = 0;
 uint32_t att_buffer_size = DEFAULT_ATT_BUFFER_SIZE;
@@ -731,15 +726,16 @@ hsa_ven_amd_aqlprofile_profile_t* ProcessATTParams(
 case ROCPROFILER_ATT_PERFCOUNTER_NAME:
 break;
 case ROCPROFILER_ATT_BUFFER_SIZE:
-att_buffer_size = std::max(96l<<10l, std::min(int64_t(param.value)<<20l, (1l<<32l)-(3l<<20)));
-break; // Clip to [96KB, 4GB)
+att_buffer_size =
+std::max(96l << 10l, std::min(int64_t(param.value) << 20l, (1l << 32l) - (3l << 20)));
+break; // Clip to [96KB, 4GB)
 case ROCPROFILER_ATT_PERFCOUNTER:
 num_att_counters += 1;
 break;
 default:
 att_params.push_back(
-{static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(param.parameter_name)),
-param.value});
+{static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(param.parameter_name)),
+param.value});
 }
 }

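Note on the hunk above: the `ROCPROFILER_ATT_BUFFER_SIZE` case clips the requested size to the range given in its comment. A minimal sketch of the same arithmetic with explicit 64-bit types follows. `ClampAttBufferSize` is a hypothetical helper name, and reading `param.value` as a size in MiB is an assumption inferred from the `<< 20` shift.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the clipping expression: the request is
// assumed to be in MiB and the result is in bytes.
std::uint32_t ClampAttBufferSize(std::int64_t requested_mib) {
  const std::int64_t lower = std::int64_t{96} << 10;  // 96 KiB
  // Upper bound used by the expression: 4 GiB minus 3 MiB, which still
  // fits in the uint32_t that receives the result.
  const std::int64_t upper = (std::int64_t{1} << 32) - (std::int64_t{3} << 20);
  const std::int64_t bytes = requested_mib << 20;  // MiB -> bytes
  return static_cast<std::uint32_t>(std::max(lower, std::min(bytes, upper)));
}
```

Using `std::int64_t` for every operand avoids relying on `long` being 64 bits wide, which the `96l` and `1l << 32l` literals in the hunk implicitly do.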
@@ -760,22 +756,21 @@ hsa_ven_amd_aqlprofile_profile_t* ProcessATTParams(
 printf("Only events from the SQ block can be selected for ATT.");
 exit(1);
 }
-att_params.push_back({static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(
-int(ROCPROFILER_ATT_PERFCOUNTER)),
-event.counter_id | (event.counter_id ? (0xF << 24) : 0)});
+att_params.push_back(
+{static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(ROCPROFILER_ATT_PERFCOUNTER)),
+event.counter_id | (event.counter_id ? (0xF << 24) : 0)});
 num_att_counters += 1;
 }

 hsa_ven_amd_aqlprofile_parameter_t zero_perf = {
-static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(ROCPROFILER_ATT_PERFCOUNTER)),
-0};
+static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(ROCPROFILER_ATT_PERFCOUNTER)), 0};

 // Fill other perfcounters with 0's
 for (; num_att_counters < 16; num_att_counters++) att_params.push_back(zero_perf);
 }
 // Get the PM4 Packets using packets_generator
-return Packet::GenerateATTPackets(queue_info.GetCPUAgent(), queue_info.GetGPUAgent(),
-att_params, &start_packet, &stop_packet, att_buffer_size);
+return Packet::GenerateATTPackets(queue_info.GetCPUAgent(), queue_info.GetGPUAgent(), att_params,
+&start_packet, &stop_packet, att_buffer_size);
 }

 /**
@@ -866,14 +861,16 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt
 record_id);
 if (session_data_count > 0 && profile.second) {
 session->GetProfiler()->AddPendingSignals(
-writer_id, record_id, original_packet.completion_signal, dispatch_packet.completion_signal, session_id, buffer_id,
-profile.first, session_data_count, profile.second, kernel_properties,
-(uint32_t)syscall(__NR_gettid), user_pkt_index, correlation_id);
+writer_id, record_id, original_packet.completion_signal,
+dispatch_packet.completion_signal, session_id, buffer_id, profile.first,
+session_data_count, profile.second, kernel_properties, (uint32_t)syscall(__NR_gettid),
+user_pkt_index, correlation_id);
 } else {
 session->GetProfiler()->AddPendingSignals(
-writer_id, record_id, original_packet.completion_signal, dispatch_packet.completion_signal, session_id, buffer_id,
-nullptr, session_data_count, nullptr, kernel_properties, (uint32_t)syscall(__NR_gettid),
-user_pkt_index, correlation_id);
+writer_id, record_id, original_packet.completion_signal,
+dispatch_packet.completion_signal, session_id, buffer_id, nullptr, session_data_count,
+nullptr, kernel_properties, (uint32_t)syscall(__NR_gettid), user_pkt_index,
+correlation_id);
 }
 }

@@ -893,7 +890,8 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt
 CreateSignal(0, &interrupt_signal);

 // Adding Stop and Read PM4 Packets
-if (session_data_count > 0 && is_counter_collection_mode && profiles.size() > 0 && profile.first && profile.first->stop_packet) {
+if (session_data_count > 0 && is_counter_collection_mode && profiles.size() > 0 &&
+profile.first && profile.first->stop_packet) {
 hsa_signal_t dummy_signal{};
 profile.first->stop_packet->header = HSA_PACKET_TYPE_VENDOR_SPECIFIC
 << HSA_PACKET_HEADER_TYPE;
@@ -937,7 +935,8 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt

 bool can_profile_anypacket = false;
 std::vector<bool> can_profile_packet;
-std::tie(can_profile_packet, can_profile_anypacket) = GetAllowedProfilesList(packets, pkt_count);
+std::tie(can_profile_packet, can_profile_anypacket) =
+GetAllowedProfilesList(packets, pkt_count);

 if (!can_profile_anypacket) {
 /* Write the original packets to the hardware if no patch will be profiled */
@@ -964,8 +963,9 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt

 // increment writer ID for every packet
 if (bit_extract(original_packet.header, HSA_PACKET_HEADER_TYPE,
-HSA_PACKET_HEADER_TYPE+HSA_PACKET_HEADER_WIDTH_TYPE-1) == HSA_PACKET_TYPE_KERNEL_DISPATCH)
-writer_id = WRITER_ID.fetch_add(1, std::memory_order_release);
+HSA_PACKET_HEADER_TYPE + HSA_PACKET_HEADER_WIDTH_TYPE - 1) ==
+HSA_PACKET_TYPE_KERNEL_DISPATCH)
+writer_id = WRITER_ID.fetch_add(1, std::memory_order_release);

 continue;
 }

@@ -37,33 +37,37 @@ SOFTWARE.
 #include "util/exception.h"
 #include "util/hsa_rsrc_factory.h"

-#define HSA_RT(call) \
-do { \
-const hsa_status_t status = call; \
-if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, #call); \
-} while(0)
-
-#define IS_HSA_CALLBACK(ID) \
-const auto __id = ID; (void)__id; \
-void *__arg = arg_.load(); (void)__arg; \
-rocprofiler_hsa_callback_fun_t __callback = \
-(ID == ROCPROFILER_HSA_CB_ID_ALLOCATE) ? callbacks_.allocate: \
-(ID == ROCPROFILER_HSA_CB_ID_DEVICE) ? callbacks_.device: \
-(ID == ROCPROFILER_HSA_CB_ID_MEMCOPY) ? callbacks_.memcopy: \
-(ID == ROCPROFILER_HSA_CB_ID_SUBMIT) ? callbacks_.submit: \
-(ID == ROCPROFILER_HSA_CB_ID_KSYMBOL) ? callbacks_.ksymbol: \
-callbacks_.codeobj; \
-if ((__callback != NULL) && (recursion_ == false))
-
-#define DO_HSA_CALLBACK \
-do { \
-recursion_ = true; \
-__callback(__id, &data, __arg); \
-recursion_ = false; \
+#define HSA_RT(call) \
+do { \
+const hsa_status_t status = call; \
+if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, #call); \
 } while (0)

-#define ISSUE_HSA_CALLBACK(ID) \
-do { IS_HSA_CALLBACK(ID) { DO_HSA_CALLBACK; } } while(0)
+#define IS_HSA_CALLBACK(ID) \
+const auto __id = ID; \
+(void)__id; \
+void* __arg = arg_.load(); \
+(void)__arg; \
+rocprofiler_hsa_callback_fun_t __callback = (ID == ROCPROFILER_HSA_CB_ID_ALLOCATE) \
+? callbacks_.allocate \
+: (ID == ROCPROFILER_HSA_CB_ID_DEVICE) ? callbacks_.device \
+: (ID == ROCPROFILER_HSA_CB_ID_MEMCOPY) ? callbacks_.memcopy \
+: (ID == ROCPROFILER_HSA_CB_ID_SUBMIT) ? callbacks_.submit \
+: (ID == ROCPROFILER_HSA_CB_ID_KSYMBOL) ? callbacks_.ksymbol \
+: callbacks_.codeobj; \
+if ((__callback != NULL) && (recursion_ == false))
+
+#define DO_HSA_CALLBACK \
+do { \
+recursion_ = true; \
+__callback(__id, &data, __arg); \
+recursion_ = false; \
+} while (0)
+
+#define ISSUE_HSA_CALLBACK(ID) \
+do { \
+IS_HSA_CALLBACK(ID) { DO_HSA_CALLBACK; } \
+} while (0)

 // Demangle C++ symbol name
 static const char* cpp_demangle(const char* symname) {
@@ -74,15 +78,15 @@ static const char* cpp_demangle(const char* symname) {
 }

 namespace rocprofiler {
-extern decltype(hsa_memory_allocate)* hsa_memory_allocate_fn;
-extern decltype(hsa_memory_assign_agent)* hsa_memory_assign_agent_fn;
-extern decltype(hsa_memory_copy)* hsa_memory_copy_fn;
-extern decltype(hsa_amd_memory_pool_allocate)* hsa_amd_memory_pool_allocate_fn;
-extern decltype(hsa_amd_memory_pool_free)* hsa_amd_memory_pool_free_fn;
-extern decltype(hsa_amd_agents_allow_access)* hsa_amd_agents_allow_access_fn;
-extern decltype(hsa_amd_memory_async_copy)* hsa_amd_memory_async_copy_fn;
-extern decltype(hsa_executable_freeze)* hsa_executable_freeze_fn;
-extern decltype(hsa_executable_destroy)* hsa_executable_destroy_fn;
+extern decltype(::hsa_memory_allocate)* hsa_memory_allocate_fn;
+extern decltype(::hsa_memory_assign_agent)* hsa_memory_assign_agent_fn;
+extern decltype(::hsa_memory_copy)* hsa_memory_copy_fn;
+extern decltype(::hsa_amd_memory_pool_allocate)* hsa_amd_memory_pool_allocate_fn;
+extern decltype(::hsa_amd_memory_pool_free)* hsa_amd_memory_pool_free_fn;
+extern decltype(::hsa_amd_agents_allow_access)* hsa_amd_agents_allow_access_fn;
+extern decltype(::hsa_amd_memory_async_copy)* hsa_amd_memory_async_copy_fn;
+extern decltype(::hsa_executable_freeze)* hsa_executable_freeze_fn;
+extern decltype(::hsa_executable_destroy)* hsa_executable_destroy_fn;

 class HsaInterceptor {
 public:
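Note on the hunk above: qualifying the name with `::` makes `decltype` refer to the global function even where a variable with the same identifier is visible. A minimal sketch of that situation, using a hypothetical stand-in function `demo_memory_allocate` that is not part of HSA:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a global C API function such as hsa_memory_allocate.
int demo_memory_allocate(std::size_t size) { return size > 0 ? 0 : 1; }

namespace demo {
// A function pointer that reuses the function's name. Once this variable is
// declared, an unqualified use of the name inside namespace demo finds the
// variable, so '::' keeps decltype tied to the global function.
decltype(::demo_memory_allocate)* demo_memory_allocate = &::demo_memory_allocate;

// With a "_fn" suffix there is no name clash; '::' is kept for consistency.
decltype(::demo_memory_allocate)* demo_memory_allocate_fn = &::demo_memory_allocate;

// Both calls go through the pointers declared above.
int call_through_pointers() {
  return demo_memory_allocate(16) + demo_memory_allocate_fn(0);
}
}  // namespace demo
```

Both pointers have the same type, a pointer to `int(std::size_t)`, because each `decltype` names the global function explicitly.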
@@ -95,10 +99,7 @@ class HsaInterceptor {
 if (enable_) {
 // Fetching AMD Loader HSA extension API
 HSA_RT(hsa_system_get_major_extension_table(
-HSA_EXTENSION_AMD_LOADER,
-1,
-sizeof(hsa_ven_amd_loader_1_01_pfn_t),
-&LoaderApiTable));
+HSA_EXTENSION_AMD_LOADER, 1, sizeof(hsa_ven_amd_loader_1_01_pfn_t), &LoaderApiTable));

 // Saving original API functions
 hsa_memory_allocate_fn = table->core_->hsa_memory_allocate_fn;
@@ -131,10 +132,7 @@ class HsaInterceptor {
 }

 private:
-static hsa_status_t HSA_API MemoryAllocate(hsa_region_t region,
-size_t size,
-void** ptr)
-{
+static hsa_status_t HSA_API MemoryAllocate(hsa_region_t region, size_t size, void** ptr) {
 hsa_status_t status = HSA_STATUS_SUCCESS;
 HSA_RT(hsa_memory_allocate_fn(region, size, ptr));
 IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_ALLOCATE) {
@@ -150,11 +148,8 @@ class HsaInterceptor {
 return status;
 }

-static hsa_status_t MemoryAssignAgent(
-void *ptr,
-hsa_agent_t agent,
-hsa_access_permission_t access)
-{
+static hsa_status_t MemoryAssignAgent(void* ptr, hsa_agent_t agent,
+hsa_access_permission_t access) {
 hsa_status_t status = HSA_STATUS_SUCCESS;
 HSA_RT(hsa_memory_assign_agent_fn(ptr, agent, access));
 IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_DEVICE) {
@@ -169,11 +164,7 @@ class HsaInterceptor {
 }

 // Spawn device allow access callback
-static void DeviceCallback(
-uint32_t num_agents,
-const hsa_agent_t* agents,
-const void* ptr)
-{
+static void DeviceCallback(uint32_t num_agents, const hsa_agent_t* agents, const void* ptr) {
 for (const hsa_agent_t* agent_p = agents; agent_p < (agents + num_agents); ++agent_p) {
 hsa_agent_t agent = *agent_p;
 rocprofiler_hsa_callback_data_t data{};
@@ -188,17 +179,11 @@ class HsaInterceptor {
 }

 // Agent allow access callback 'hsa_amd_agents_allow_access'
-static hsa_status_t AgentsAllowAccess(
-uint32_t num_agents,
-const hsa_agent_t* agents,
-const uint32_t* flags,
-const void* ptr)
-{
+static hsa_status_t AgentsAllowAccess(uint32_t num_agents, const hsa_agent_t* agents,
+const uint32_t* flags, const void* ptr) {
 hsa_status_t status = HSA_STATUS_SUCCESS;
 HSA_RT(hsa_amd_agents_allow_access_fn(num_agents, agents, flags, ptr));
-IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_DEVICE) {
-DeviceCallback(num_agents, agents, ptr);
-}
+IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_DEVICE) { DeviceCallback(num_agents, agents, ptr); }
 return status;
 }

@@ -218,12 +203,8 @@ class HsaInterceptor {
 return HSA_STATUS_SUCCESS;
 }

-static hsa_status_t MemoryPoolAllocate(
-hsa_amd_memory_pool_t pool,
-size_t size,
-uint32_t flags,
-void** ptr)
-{
+static hsa_status_t MemoryPoolAllocate(hsa_amd_memory_pool_t pool, size_t size, uint32_t flags,
+void** ptr) {
 hsa_status_t status = HSA_STATUS_SUCCESS;
 HSA_RT(hsa_amd_memory_pool_allocate_fn(pool, size, flags, ptr));
 if (size != 0) {
@@ -232,8 +213,10 @@ class HsaInterceptor {
 data.allocate.ptr = *ptr;
 data.allocate.size = size;

-HSA_RT(hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_SEGMENT, &data.allocate.segment));
-HSA_RT(hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_GLOBAL_FLAGS, &data.allocate.global_flag));
+HSA_RT(hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_SEGMENT,
+&data.allocate.segment));
+HSA_RT(hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_GLOBAL_FLAGS,
+&data.allocate.global_flag));

 DO_HSA_CALLBACK;

@@ -246,9 +229,7 @@ class HsaInterceptor {
 }
 return status;
 }
-static hsa_status_t MemoryPoolFree(
-void* ptr)
-{
+static hsa_status_t MemoryPoolFree(void* ptr) {
 hsa_status_t status = HSA_STATUS_SUCCESS;
 IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_ALLOCATE) {
 rocprofiler_hsa_callback_data_t data{};
@@ -260,11 +241,7 @@ class HsaInterceptor {
 return status;
 }

-static hsa_status_t MemoryCopy(
-void *dst,
-const void *src,
-size_t size)
-{
+static hsa_status_t MemoryCopy(void* dst, const void* src, size_t size) {
 hsa_status_t status = HSA_STATUS_SUCCESS;
 HSA_RT(hsa_memory_copy_fn(dst, src, size));
 IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_MEMCOPY) {
@@ -277,17 +254,13 @@ class HsaInterceptor {
 return status;
 }

-static hsa_status_t MemoryAsyncCopy(
-void* dst, hsa_agent_t dst_agent, const void* src,
-hsa_agent_t src_agent, size_t size,
-uint32_t num_dep_signals,
-const hsa_signal_t* dep_signals,
-hsa_signal_t completion_signal)
-{
+static hsa_status_t MemoryAsyncCopy(void* dst, hsa_agent_t dst_agent, const void* src,
+hsa_agent_t src_agent, size_t size, uint32_t num_dep_signals,
+const hsa_signal_t* dep_signals,
+hsa_signal_t completion_signal) {
 hsa_status_t status = HSA_STATUS_SUCCESS;
-HSA_RT(hsa_amd_memory_async_copy_fn(
-dst, dst_agent, src, src_agent, size,
-num_dep_signals, dep_signals, completion_signal));
+HSA_RT(hsa_amd_memory_async_copy_fn(dst, dst_agent, src, src_agent, size, num_dep_signals,
+dep_signals, completion_signal));
 IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_MEMCOPY) {
 rocprofiler_hsa_callback_data_t data{};
 data.memcopy.dst = dst;
@@ -298,14 +271,11 @@ class HsaInterceptor {
 return status;
 }

-static hsa_status_t CodeObjectCallback(
-hsa_executable_t executable,
-hsa_loaded_code_object_t loaded_code_object,
-void* arg)
-{
+static hsa_status_t CodeObjectCallback(hsa_executable_t executable,
+hsa_loaded_code_object_t loaded_code_object, void* arg) {
 const int free_flag = reinterpret_cast<long>(arg);
 hsa_ven_amd_loader_code_object_storage_type_t storage_type =
-HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_NONE;
+HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_NONE;
 int storage_fd = -1;
 uint64_t memory_base = 0;
 uint64_t memory_size = 0;
@@ -316,56 +286,45 @@ class HsaInterceptor {
 char* uri_str = NULL;

 HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
-loaded_code_object,
-HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_TYPE,
-&storage_type));
+loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_TYPE,
+&storage_type));

 if (storage_type == HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_FILE) {
 HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
-loaded_code_object,
-HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_FILE,
-&storage_fd));
+loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_FILE,
+&storage_fd));
 if (storage_fd == -1) {
-printf("CodeObjectCallback: fd == -1\n"); fflush(stdout);
-abort();
+printf("CodeObjectCallback: fd == -1\n");
+fflush(stdout);
+abort();
 }
 } else if (storage_type == HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_MEMORY) {
 HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
-loaded_code_object,
-HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_BASE,
-&memory_base));
+loaded_code_object,
+HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_BASE,
+&memory_base));
 HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
-loaded_code_object,
-HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_SIZE,
-&memory_size));
+loaded_code_object,
+HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_SIZE,
+&memory_size));
 }

 HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
-loaded_code_object,
-HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_BASE,
-&load_base));
+loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_BASE, &load_base));
 HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
-loaded_code_object,
-HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_SIZE,
-&load_size));
+loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_SIZE, &load_size));
 HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
-loaded_code_object,
-HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_DELTA,
-&load_delta));
+loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_DELTA, &load_delta));

 // Getting URI
 HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
-loaded_code_object,
-HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI_LENGTH,
-&uri_len));
+loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI_LENGTH, &uri_len));

 uri_str = (char*)calloc(uri_len + 1, sizeof(char));
 if (!uri_str) EXC_ABORT(HSA_STATUS_ERROR, "URI allocation");
|
||||
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
|
||||
loaded_code_object,
|
||||
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI,
|
||||
uri_str));
|
||||
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI, uri_str));
|
||||
|
||||
if (storage_type != HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_NONE) {
|
||||
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_CODEOBJ) {
|
||||
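Note on the URI query in the hunk above: the callback reads the URI in two steps. It first asks for the length, then allocates length + 1 zeroed bytes, then asks for the characters. Below is a minimal sketch of that two-step pattern, assuming a hypothetical `get_info` helper and `Info` enum that stand in for `hsa_ven_amd_loader_loaded_code_object_get_info` and its attribute selectors. None of these names come from the HSA loader API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical attribute selectors, standing in for the loader's
// ..._INFO_URI_LENGTH and ..._INFO_URI attributes.
enum class Info { kUriLength, kUri };

// Hypothetical query function: writes either the length (as uint32_t) or
// the characters (without a terminating NUL) into `out`.
inline bool get_info(const std::string& uri, Info what, void* out) {
  if (out == nullptr) return false;
  if (what == Info::kUriLength) {
    *static_cast<uint32_t*>(out) = static_cast<uint32_t>(uri.size());
  } else {
    std::memcpy(out, uri.data(), uri.size());
  }
  return true;
}

// Two-step read: ask for the length, allocate length + 1 zeroed bytes so the
// buffer is always NUL-terminated, then ask for the characters.
inline std::string read_uri(const std::string& source) {
  uint32_t len = 0;
  if (!get_info(source, Info::kUriLength, &len)) return {};
  std::string buf(len + 1, '\0');
  if (!get_info(source, Info::kUri, &buf[0])) return {};
  buf.resize(len);  // drop the spare terminator byte
  return buf;
}
```

The extra zeroed byte plays the same role as `calloc(uri_len + 1, sizeof(char))` in the diff: the buffer stays terminated even if the query writes exactly `uri_len` characters.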
@@ -377,8 +336,8 @@ class HsaInterceptor {
|
||||
data.codeobj.load_base = load_base;
|
||||
data.codeobj.load_size = load_size;
|
||||
data.codeobj.load_delta = load_delta;
|
||||
data.codeobj.uri_length = uri_len;
|
||||
data.codeobj.uri = uri_str;
|
||||
data.codeobj.uri_length = uri_len;
|
||||
data.codeobj.uri = uri_str;
|
||||
data.codeobj.unload = free_flag;
|
||||
|
||||
DO_HSA_CALLBACK;
|
||||
@@ -406,12 +365,8 @@ class HsaInterceptor {
|
||||
uint32_t num_agents = 0;
|
||||
hsa_agent_t* agents = NULL;
|
||||
pointer_info.size = sizeof(hsa_amd_pointer_info_t);
|
||||
HSA_RT(hsa_amd_pointer_info(
|
||||
reinterpret_cast<void*>(load_base),
|
||||
&pointer_info,
|
||||
malloc,
|
||||
&num_agents,
|
||||
&agents));
|
||||
HSA_RT(hsa_amd_pointer_info(reinterpret_cast<void*>(load_base), &pointer_info, malloc,
|
||||
&num_agents, &agents));
|
||||
|
||||
DeviceCallback(num_agents, agents, reinterpret_cast<void*>(load_base));
|
||||
}
|
||||
@@ -420,11 +375,8 @@ class HsaInterceptor {
|
||||
return HSA_STATUS_SUCCESS;
|
||||
}
|
||||
|
||||
static hsa_status_t KernelSymbolCallback(
|
||||
hsa_executable_t executable,
|
||||
hsa_executable_symbol_t symbol,
|
||||
void *arg)
|
||||
{
|
||||
static hsa_status_t KernelSymbolCallback(hsa_executable_t executable,
|
||||
hsa_executable_symbol_t symbol, void* arg) {
|
||||
const int free_flag = reinterpret_cast<long>(arg);
|
||||
hsa_symbol_kind_t kind = (hsa_symbol_kind_t)0;
|
||||
HSA_RT(hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_TYPE, &kind));
|
||||
@@ -433,9 +385,11 @@ class HsaInterceptor {
|
||||
const char* name = NULL;
|
||||
uint32_t len = 0;
|
||||
uint64_t obj = 0;
|
||||
HSA_RT(hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &obj));
|
||||
HSA_RT(
|
||||
hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &obj));
|
||||
if (free_flag == 0) {
|
||||
HSA_RT(hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH, &len));
|
||||
HSA_RT(
|
||||
hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH, &len));
|
||||
char sym_name[len + 1];
|
||||
HSA_RT(hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_NAME, sym_name));
|
||||
name = cpp_demangle(sym_name);
|
||||
@@ -453,10 +407,7 @@ class HsaInterceptor {
|
||||
return HSA_STATUS_SUCCESS;
|
||||
}
|
||||
|
||||
static hsa_status_t ExecutableFreeze(
|
||||
hsa_executable_t executable,
|
||||
const char *options)
|
||||
{
|
||||
static hsa_status_t ExecutableFreeze(hsa_executable_t executable, const char* options) {
|
||||
hsa_status_t status = HSA_STATUS_SUCCESS;
|
||||
|
||||
HSA_RT(hsa_executable_freeze_fn(executable, options));
|
||||
@@ -466,39 +417,29 @@ class HsaInterceptor {
|
||||
{ IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_ALLOCATE) is_codeobj_cb |= 1; }
|
||||
if (is_codeobj_cb) {
|
||||
LoaderApiTable.hsa_ven_amd_loader_executable_iterate_loaded_code_objects(
|
||||
executable,
|
||||
CodeObjectCallback,
|
||||
reinterpret_cast<void*>(0));
|
||||
executable, CodeObjectCallback, reinterpret_cast<void*>(0));
|
||||
}
|
||||
|
||||
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_KSYMBOL) {
|
||||
HSA_RT(hsa_executable_iterate_symbols(
|
||||
executable,
|
||||
KernelSymbolCallback,
|
||||
reinterpret_cast<void*>(0)));
|
||||
HSA_RT(hsa_executable_iterate_symbols(executable, KernelSymbolCallback,
|
||||
reinterpret_cast<void*>(0)));
|
||||
}
|
||||
|
||||
return status;
|
||||
}
|
||||
|
||||
static hsa_status_t ExecutableDestroy(
|
||||
hsa_executable_t executable)
|
||||
{
|
||||
static hsa_status_t ExecutableDestroy(hsa_executable_t executable) {
|
||||
hsa_status_t status = HSA_STATUS_SUCCESS;
|
||||
|
||||
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_ALLOCATE) {
|
||||
LoaderApiTable.hsa_ven_amd_loader_executable_iterate_loaded_code_objects(
|
||||
executable,
|
||||
CodeObjectCallback,
|
||||
reinterpret_cast<void*>(1));
|
||||
executable, CodeObjectCallback, reinterpret_cast<void*>(1));
|
||||
}
|
||||
|
||||
{
|
||||
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_KSYMBOL) {
|
||||
HSA_RT(hsa_executable_iterate_symbols(
|
||||
executable,
|
||||
KernelSymbolCallback,
|
||||
reinterpret_cast<void*>(1)));
|
||||
HSA_RT(hsa_executable_iterate_symbols(executable, KernelSymbolCallback,
|
||||
reinterpret_cast<void*>(1)));
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -33,9 +33,9 @@ THE SOFTWARE.
|
||||
#include "util/hsa_rsrc_factory.h"
|
||||
|
||||
namespace rocprofiler {
|
||||
extern decltype(hsa_queue_destroy)* hsa_queue_destroy_fn;
|
||||
extern decltype(hsa_amd_queue_intercept_create)* hsa_amd_queue_intercept_create_fn;
|
||||
extern decltype(hsa_amd_queue_intercept_register)* hsa_amd_queue_intercept_register_fn;
|
||||
extern decltype(::hsa_queue_destroy)* hsa_queue_destroy_fn;
|
||||
extern decltype(::hsa_amd_queue_intercept_create)* hsa_amd_queue_intercept_create_fn;
|
||||
extern decltype(::hsa_amd_queue_intercept_register)* hsa_amd_queue_intercept_register_fn;
|
||||
|
||||
class HsaProxyQueue : public ProxyQueue {
|
||||
public:
|
||||
|
||||
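Note on the `decltype(::...)` change in the hunk above: the commit message says the leading `::` makes sure the name inside `decltype` refers to the global HSA function and not to a variable of the same name. Below is a minimal sketch of the pattern. `api_close` is a hypothetical stand-in for a global C API function such as `hsa_queue_destroy`, and `profiler` is a hypothetical namespace.

```cpp
#include <cassert>

// Hypothetical stand-in for a global C API function.
int api_close(int handle) { return handle + 1; }

namespace profiler {

// `::api_close` always names the global function, so the declared type stays
// unambiguous even if a variable in this namespace were also called
// `api_close`. The `_fn` suffix already avoids the clash in this sketch.
decltype(::api_close)* api_close_fn = nullptr;

// Save the original entry point, as an interceptor would before replacing
// the entry in an API table.
void save_original() { api_close_fn = &::api_close; }

// Forward a call through the saved pointer.
int call_original(int handle) {
  assert(api_close_fn != nullptr);
  return api_close_fn(handle);
}

}  // namespace profiler
```

Calling `profiler::save_original()` followed by `profiler::call_original(41)` forwards to the global `api_close`.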
@@ -40,16 +40,13 @@ THE SOFTWARE.
|
||||
#include "util/hsa_rsrc_factory.h"
|
||||
|
||||
namespace rocprofiler {
|
||||
enum {
|
||||
K_CONC_OFF = 0,
|
||||
K_CONC_PMC = 1,
|
||||
K_CONC_TRACE = 2
|
||||
};
|
||||
enum { K_CONC_OFF = 0, K_CONC_PMC = 1, K_CONC_TRACE = 2 };
|
||||
|
||||
extern decltype(hsa_queue_create)* hsa_queue_create_fn;
|
||||
extern decltype(hsa_queue_destroy)* hsa_queue_destroy_fn;
|
||||
extern decltype(::hsa_queue_create)* hsa_queue_create_fn;
|
||||
extern decltype(::hsa_queue_destroy)* hsa_queue_destroy_fn;
|
||||
|
||||
static inline void print_packet(const void* in_p, const uint32_t& in_n, const uint32_t& w_n = UINT32_MAX) {
|
||||
static inline void print_packet(const void* in_p, const uint32_t& in_n,
|
||||
const uint32_t& w_n = UINT32_MAX) {
|
||||
const uint32_t size32 = util::HsaRsrcFactory::CMD_SLOT_SIZE_B / 4;
|
||||
const uint32_t* beg = (const uint32_t*)in_p;
|
||||
const uint32_t* end = beg + (in_n * size32);
|
||||
@@ -85,31 +82,33 @@ class InterceptQueue {
|
||||
typedef std::recursive_mutex mutex_t;
|
||||
typedef std::map<uint64_t, InterceptQueue*> obj_map_t;
|
||||
typedef hsa_status_t (*queue_callback_t)(hsa_queue_t*, void* data);
|
||||
typedef void (*queue_event_callback_t)(hsa_status_t status, hsa_queue_t *queue, void *arg);
|
||||
typedef void (*queue_event_callback_t)(hsa_status_t status, hsa_queue_t* queue, void* arg);
|
||||
typedef uint32_t queue_id_t;
|
||||
|
||||
static void HsaIntercept(HsaApiTable* table);
|
||||
|
||||
static hsa_status_t InterceptQueueCreate(hsa_agent_t agent, uint32_t size, hsa_queue_type32_t type,
|
||||
void (*callback)(hsa_status_t status, hsa_queue_t* source,
|
||||
void* data),
|
||||
void* data, uint32_t private_segment_size,
|
||||
uint32_t group_segment_size, hsa_queue_t** queue,
|
||||
const bool& tracker_on) {
|
||||
static hsa_status_t InterceptQueueCreate(
|
||||
hsa_agent_t agent, uint32_t size, hsa_queue_type32_t type,
|
||||
void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data), void* data,
|
||||
uint32_t private_segment_size, uint32_t group_segment_size, hsa_queue_t** queue,
|
||||
const bool& tracker_on) {
|
||||
std::lock_guard<mutex_t> lck(mutex_);
|
||||
hsa_status_t status = HSA_STATUS_ERROR;
|
||||
|
||||
if (in_create_call_) EXC_ABORT(status, "recursive InterceptQueueCreate()");
|
||||
in_create_call_ = true;
|
||||
|
||||
ProxyQueue* proxy = ProxyQueue::Create(agent, size, type, queue_event_callback, data, private_segment_size,
|
||||
group_segment_size, queue, &status);
|
||||
ProxyQueue* proxy =
|
||||
ProxyQueue::Create(agent, size, type, queue_event_callback, data, private_segment_size,
|
||||
group_segment_size, queue, &status);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, "ProxyQueue::Create()");
|
||||
|
||||
if (tracker_on || tracker_on_) {
|
||||
if (tracker_ == NULL) tracker_ = &Tracker::Instance();
|
||||
status = rocprofiler::util::HsaRsrcFactory::HsaApi()->hsa_amd_profiling_set_profiler_enabled(*queue, true);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, "hsa_amd_profiling_set_profiler_enabled()");
|
||||
status = rocprofiler::util::HsaRsrcFactory::HsaApi()->hsa_amd_profiling_set_profiler_enabled(
|
||||
*queue, true);
|
||||
if (status != HSA_STATUS_SUCCESS)
|
||||
EXC_ABORT(status, "hsa_amd_profiling_set_profiler_enabled()");
|
||||
}
|
||||
|
||||
InterceptQueue* obj = new InterceptQueue(agent, *queue, proxy);
|
||||
@@ -138,15 +137,17 @@ class InterceptQueue {
|
||||
void* data),
|
||||
void* data, uint32_t private_segment_size,
|
||||
uint32_t group_segment_size, hsa_queue_t** queue) {
|
||||
return InterceptQueueCreate(agent, size, type, callback, data, private_segment_size, group_segment_size, queue, false);
|
||||
return InterceptQueueCreate(agent, size, type, callback, data, private_segment_size,
|
||||
group_segment_size, queue, false);
|
||||
}
|
||||
|
||||
static hsa_status_t QueueCreateTracked(hsa_agent_t agent, uint32_t size, hsa_queue_type32_t type,
|
||||
void (*callback)(hsa_status_t status, hsa_queue_t* source,
|
||||
void* data),
|
||||
void* data, uint32_t private_segment_size,
|
||||
uint32_t group_segment_size, hsa_queue_t** queue) {
|
||||
return InterceptQueueCreate(agent, size, type, callback, data, private_segment_size, group_segment_size, queue, true);
|
||||
void (*callback)(hsa_status_t status, hsa_queue_t* source,
|
||||
void* data),
|
||||
void* data, uint32_t private_segment_size,
|
||||
uint32_t group_segment_size, hsa_queue_t** queue) {
|
||||
return InterceptQueueCreate(agent, size, type, callback, data, private_segment_size,
|
||||
group_segment_size, queue, true);
|
||||
}
|
||||
|
||||
static hsa_status_t QueueDestroy(hsa_queue_t* queue) {
|
||||
@@ -170,8 +171,8 @@ class InterceptQueue {
|
||||
return status;
|
||||
}
|
||||
|
||||
static void OnSubmitCB_opt(const void* in_packets, uint64_t count, uint64_t user_que_idx, void* data,
|
||||
hsa_amd_queue_intercept_packet_writer writer) {
|
||||
static void OnSubmitCB_opt(const void* in_packets, uint64_t count, uint64_t user_que_idx,
|
||||
void* data, hsa_amd_queue_intercept_packet_writer writer) {
|
||||
const packet_t* packets_arr = reinterpret_cast<const packet_t*>(in_packets);
|
||||
InterceptQueue* obj = reinterpret_cast<InterceptQueue*>(data);
|
||||
Queue* proxy = obj->proxy_;
|
||||
@@ -195,10 +196,10 @@ class InterceptQueue {
|
||||
obj->queue_id,
|
||||
completion_signal,
|
||||
dispatch_packet,
|
||||
NULL, // kernel_name
|
||||
0, // kernel_object
|
||||
NULL, // kernel_code
|
||||
0, // (uint32_t)syscall(__NR_gettid),
|
||||
NULL, // kernel_name
|
||||
0, // kernel_object
|
||||
NULL, // kernel_code
|
||||
0, // (uint32_t)syscall(__NR_gettid),
|
||||
NULL}; // record
|
||||
|
||||
// Calling dispatch callback
|
||||
@@ -210,7 +211,8 @@ class InterceptQueue {
|
||||
if (group.feature_count != 0) {
|
||||
if (tracker_ != NULL) {
|
||||
Group* context_group = context->GetGroup(group.index);
|
||||
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal = context_group->GetDispatchSignal();
|
||||
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal =
|
||||
context_group->GetDispatchSignal();
|
||||
Tracker::Enable_opt(context_group, completion_signal);
|
||||
context_group->IncrRefsCount();
|
||||
}
|
||||
@@ -254,8 +256,9 @@ class InterceptQueue {
|
||||
const uint32_t tid = syscall(__NR_gettid);
|
||||
hsa_queue_t* qptr = obj->queue_;
|
||||
const void* slot_ptr = util::HsaRsrcFactory::GetSlotPointer(qptr, user_que_idx);
|
||||
printf("OnSubmitCB: %u:%u queue(%p:%lu) in(%p, %p, %lu) hdr(%u)\n",
|
||||
pid, tid, qptr, user_que_idx, in_packets, slot_ptr, count, header_val); fflush(stdout);
|
||||
printf("OnSubmitCB: %u:%u queue(%p:%lu) in(%p, %p, %lu) hdr(%u)\n", pid, tid, qptr,
|
||||
user_que_idx, in_packets, slot_ptr, count, header_val);
|
||||
fflush(stdout);
|
||||
print_packet(in_packets, count);
|
||||
abort();
|
||||
#endif
|
||||
@@ -277,8 +280,9 @@ class InterceptQueue {
|
||||
if (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH) {
|
||||
uint64_t kernel_object = dispatch_packet->kernel_object;
|
||||
const amd_kernel_code_t* kernel_code = GetKernelCode(kernel_object);
|
||||
kernel_name = (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH) ?
|
||||
QueryKernelName(kernel_object, kernel_code) : NULL;
|
||||
kernel_name = (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH)
|
||||
? QueryKernelName(kernel_object, kernel_code)
|
||||
: NULL;
|
||||
}
|
||||
|
||||
// Preparing submit callback data
|
||||
@@ -311,8 +315,11 @@ class InterceptQueue {
|
||||
|
||||
const bool is_serial = (k_concurrent_ == K_CONC_OFF);
|
||||
if (tracker_ != NULL) {
|
||||
tracker_entry = tracker_->Alloc(obj->agent_info_->dev_id, dispatch_packet->completion_signal, is_serial);
|
||||
if (is_serial) const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal = tracker_entry->signal;
|
||||
tracker_entry = tracker_->Alloc(obj->agent_info_->dev_id,
|
||||
dispatch_packet->completion_signal, is_serial);
|
||||
if (is_serial)
|
||||
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal =
|
||||
tracker_entry->signal;
|
||||
}
|
||||
|
||||
// Preparing dispatch callback data
|
||||
@@ -339,7 +346,9 @@ class InterceptQueue {
|
||||
// Injecting profiling start/stop/read packets
|
||||
if ((status != HSA_STATUS_SUCCESS) || (group.context == NULL)) {
|
||||
if (tracker_entry != NULL) {
|
||||
if (is_serial) const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal = tracker_entry->orig;
|
||||
if (is_serial)
|
||||
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal =
|
||||
tracker_entry->orig;
|
||||
tracker_->Delete(tracker_entry);
|
||||
}
|
||||
} else {
|
||||
@@ -351,11 +360,11 @@ class InterceptQueue {
|
||||
const pkt_vector_t& read_vector = context->ReadPackets(group.index);
|
||||
pkt_vector_t packets;
|
||||
|
||||
if (is_serial) { // serial
|
||||
if (is_serial) { // serial
|
||||
packets = start_vector;
|
||||
packets.insert(packets.end(), *packet);
|
||||
packets.insert(packets.end(), stop_vector.begin(), stop_vector.end());
|
||||
} else { // concurrent
|
||||
} else { // concurrent
|
||||
// Insert start packets once
|
||||
auto inject_start = [&packets](const pkt_vector_t& starts) mutable {
|
||||
packets = starts;
|
||||
@@ -363,14 +372,15 @@ class InterceptQueue {
|
||||
std::call_once(once_flag_, inject_start, start_vector);
|
||||
// Reads at both kernel start and end (also with barriers)
|
||||
assert(read_vector.size() >= 2 * start_vector.size());
|
||||
auto mid = read_vector.begin() + read_vector.size()/2;
|
||||
auto mid = read_vector.begin() + read_vector.size() / 2;
|
||||
// Read at kernel start
|
||||
packets.insert(packets.end(), read_vector.begin(), mid);
|
||||
// Kernel dispatch packet
|
||||
assert(tracker_entry != NULL);
|
||||
// Bind dispatch and barrier signals with tracker entry
|
||||
tracker_->SetHandler(tracker_entry, context->GetGroup(group.index));
|
||||
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal = context->GetGroup(group.index)->GetDispatchSignal();
|
||||
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal =
|
||||
context->GetGroup(group.index)->GetDispatchSignal();
|
||||
packets.insert(packets.end(), *packet);
|
||||
// Read at kernel end
|
||||
packets.insert(packets.end(), mid, read_vector.end());
|
||||
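Note on packet ordering in the hunk above: in serial mode the dispatch is wrapped by the start and stop packets, while in concurrent mode the read packets are split in half around the dispatch. Below is a minimal sketch of both orderings, using strings as stand-ins for AQL packets. `Packets`, `build_serial` and `build_concurrent` are hypothetical names, and the one-time start injection via `std::call_once` is left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Packets = std::vector<std::string>;  // stand-in for pkt_vector_t

// Serial mode: start packets, the dispatch, then stop packets.
inline Packets build_serial(const Packets& start, const std::string& dispatch,
                            const Packets& stop) {
  Packets out = start;
  out.push_back(dispatch);
  out.insert(out.end(), stop.begin(), stop.end());
  return out;
}

// Concurrent mode: the first half of the read packets goes before the
// dispatch and the second half goes after it.
inline Packets build_concurrent(const Packets& reads, const std::string& dispatch) {
  assert(reads.size() >= 2);
  auto mid = reads.begin() + reads.size() / 2;
  Packets out(reads.begin(), mid);
  out.push_back(dispatch);
  out.insert(out.end(), mid, reads.end());
  return out;
}
```

Splitting at `size() / 2` relies on the read vector holding matching halves for kernel start and kernel end, which is what the `assert` in the diff checks against the start vector.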
@@ -379,7 +389,8 @@ class InterceptQueue {
|
||||
if (tracker_entry != NULL) {
|
||||
Group* context_group = context->GetGroup(group.index);
|
||||
context_group->IncrRefsCount();
|
||||
tracker_->EnableContext(tracker_entry, Context::Handler, reinterpret_cast<void*>(context_group));
|
||||
tracker_->EnableContext(tracker_entry, Context::Handler,
|
||||
reinterpret_cast<void*>(context_group));
|
||||
}
|
||||
|
||||
if (writer != NULL) {
|
||||
@@ -409,8 +420,8 @@ class InterceptQueue {
|
||||
}
|
||||
}
|
||||
|
||||
static void OnSubmitCB_ctrace(const void* in_packets, uint64_t count, uint64_t user_que_idx, void* data,
|
||||
hsa_amd_queue_intercept_packet_writer writer) {
|
||||
static void OnSubmitCB_ctrace(const void* in_packets, uint64_t count, uint64_t user_que_idx,
|
||||
void* data, hsa_amd_queue_intercept_packet_writer writer) {
|
||||
const packet_t* packets_arr = reinterpret_cast<const packet_t*>(in_packets);
|
||||
InterceptQueue* obj = reinterpret_cast<InterceptQueue*>(data);
|
||||
Queue* proxy = obj->proxy_;
|
||||
@@ -431,8 +442,9 @@ class InterceptQueue {
|
||||
if (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH) {
|
||||
uint64_t kernel_object = dispatch_packet->kernel_object;
|
||||
const amd_kernel_code_t* kernel_code = GetKernelCode(kernel_object);
|
||||
kernel_name = (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH) ?
|
||||
QueryKernelName(kernel_object, kernel_code) : NULL;
|
||||
kernel_name = (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH)
|
||||
? QueryKernelName(kernel_object, kernel_code)
|
||||
: NULL;
|
||||
}
|
||||
|
||||
// Preparing submit callback data
|
||||
@@ -529,7 +541,9 @@ class InterceptQueue {
|
||||
Stop();
|
||||
}
|
||||
|
||||
static inline void Start() { dispatch_callback_.store(callbacks_.dispatch, std::memory_order_release); }
|
||||
static inline void Start() {
|
||||
dispatch_callback_.store(callbacks_.dispatch, std::memory_order_release);
|
||||
}
|
||||
static inline void Stop() { dispatch_callback_.store(NULL, std::memory_order_relaxed); }
|
||||
|
||||
static void SetSubmitCallback(rocprofiler_hsa_callback_fun_t fun, void* arg) {
|
||||
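Note on `Start()` and `Stop()` in the hunk above: both toggle an atomic callback pointer, with a release store to publish the callback and a relaxed store of NULL to clear it. Below is a minimal sketch of that gate. `CallbackGate`, `dispatch_fn` and `twice` are hypothetical names, and the acquire load on the reader side is an assumption, since the reading code is not part of this diff.

```cpp
#include <atomic>
#include <cassert>

using dispatch_fn = int (*)(int);  // hypothetical callback signature

class CallbackGate {
 public:
  explicit CallbackGate(dispatch_fn fn) : registered_(fn) {}

  // Publish the registered callback to readers.
  void Start() { active_.store(registered_, std::memory_order_release); }

  // Clear the callback; readers then see a null pointer.
  void Stop() { active_.store(nullptr, std::memory_order_relaxed); }

  // Call the active callback, or return `fallback` while stopped.
  int Dispatch(int arg, int fallback) const {
    dispatch_fn fn = active_.load(std::memory_order_acquire);
    return fn ? fn(arg) : fallback;
  }

 private:
  dispatch_fn registered_;
  std::atomic<dispatch_fn> active_{nullptr};
};

// Example callback used to exercise the gate.
inline int twice(int x) { return 2 * x; }
```

Loading the pointer once into a local before the null check avoids calling through a pointer that another thread cleared between the check and the call.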
@@ -545,7 +559,7 @@ class InterceptQueue {
|
||||
static uint32_t k_concurrent_;
|
||||
|
||||
private:
|
||||
static void queue_event_callback(hsa_status_t status, hsa_queue_t *queue, void *arg) {
|
||||
static void queue_event_callback(hsa_status_t status, hsa_queue_t* queue, void* arg) {
|
||||
if (status != HSA_STATUS_SUCCESS) {
|
||||
uint32_t* read_ptr32 = (uint32_t*)util::HsaRsrcFactory::GetReadPointer(queue);
|
||||
print_packet(read_ptr32, 1);
|
||||
@@ -582,12 +596,13 @@ class InterceptQueue {
|
||||
const uint16_t kernel_object_flag = *((uint64_t*)kernel_code + 1);
|
||||
if (kernel_object_flag == 0) {
|
||||
if (!util::HsaRsrcFactory::IsExecutableTracking()) {
|
||||
EXC_ABORT(HSA_STATUS_ERROR, "Error: V3 code object detected - code objects tracking should be enabled\n");
|
||||
EXC_ABORT(HSA_STATUS_ERROR,
|
||||
"Error: V3 code object detected - code objects tracking should be enabled\n");
|
||||
}
|
||||
}
|
||||
const char* kernel_symname = (util::HsaRsrcFactory::IsExecutableTracking()) ?
|
||||
util::HsaRsrcFactory::GetKernelNameRef(kernel_object) :
|
||||
GetKernelName(kernel_code->runtime_loader_kernel_symbol);
|
||||
const char* kernel_symname = (util::HsaRsrcFactory::IsExecutableTracking())
|
||||
? util::HsaRsrcFactory::GetKernelNameRef(kernel_object)
|
||||
: GetKernelName(kernel_code->runtime_loader_kernel_symbol);
|
||||
return kernel_symname;
|
||||
}
|
||||
|
||||
@@ -618,17 +633,13 @@ class InterceptQueue {
|
||||
return status;
|
||||
}
|
||||
|
||||
InterceptQueue(const hsa_agent_t& agent, hsa_queue_t* const queue, ProxyQueue* proxy) :
|
||||
queue_(queue),
|
||||
proxy_(proxy)
|
||||
{
|
||||
InterceptQueue(const hsa_agent_t& agent, hsa_queue_t* const queue, ProxyQueue* proxy)
|
||||
: queue_(queue), proxy_(proxy) {
|
||||
agent_info_ = util::HsaRsrcFactory::Instance().GetAgentInfo(agent);
|
||||
queue_event_callback_ = NULL;
|
||||
}
|
||||
|
||||
~InterceptQueue() {
|
||||
ProxyQueue::Destroy(proxy_);
|
||||
}
|
||||
~InterceptQueue() { ProxyQueue::Destroy(proxy_); }
|
||||
|
||||
static const packet_word_t header_type_mask = (1ul << HSA_PACKET_HEADER_WIDTH_TYPE) - 1;
|
||||
|
||||
|
||||
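Note on `header_type_mask` in the hunk above: `(1ul << HSA_PACKET_HEADER_WIDTH_TYPE) - 1` builds a mask with the low bits set, which is then used to pull the packet type out of the 16-bit header. Below is a minimal sketch of that extraction. The constants are local stand-ins whose values are assumed to mirror `hsa.h`, so check them against the installed headers.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values mirroring hsa.h; verify against your HSA headers.
constexpr unsigned kHeaderTypeShift = 0;     // HSA_PACKET_HEADER_TYPE
constexpr unsigned kHeaderTypeWidth = 8;     // HSA_PACKET_HEADER_WIDTH_TYPE
constexpr uint16_t kTypeKernelDispatch = 2;  // HSA_PACKET_TYPE_KERNEL_DISPATCH

// (1 << width) - 1 sets the `width` low bits: 0xFF for a width of 8.
constexpr uint16_t kHeaderTypeMask = (1u << kHeaderTypeWidth) - 1;

// Extract the packet type field from a 16-bit AQL packet header.
constexpr uint16_t header_type(uint16_t header) {
  return (header >> kHeaderTypeShift) & kHeaderTypeMask;
}
```

With these values, a header that has the barrier bit (bit 8) set and type 2 in the low byte still decodes as a kernel dispatch, because the mask discards everything above the type field.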
@@ -25,4 +25,4 @@ THE SOFTWARE.
|
||||
namespace rocprofiler {
|
||||
MetricsDict::map_t* MetricsDict::map_ = NULL;
|
||||
MetricsDict::mutex_t MetricsDict::mutex_;
|
||||
}
|
||||
} // namespace rocprofiler
|
||||
|
||||
Executable file → Normal file
+5
-5
@@ -202,15 +202,15 @@ class MetricsDict {
|
||||
xml_->AddConst("top.const.metric", "SE_NUM", agent_info->se_num);
|
||||
ImportMetrics(agent_info, "const");
|
||||
agent_name_ = agent_info->name;
|
||||
|
||||
if (agent_name_.find(':') != std::string::npos) // Remove compiler flags from the agent_name
|
||||
|
||||
if (agent_name_.find(':') != std::string::npos) // Remove compiler flags from the agent_name
|
||||
agent_name_ = agent_name_.substr(0, agent_name_.find(':'));
|
||||
|
||||
std::unordered_set<std::string> supported_agent_names = {
|
||||
"gfx906", "gfx908", "gfx90a", // Vega
|
||||
"gfx940", "gfx941", "gfx942", // Mi300
|
||||
"gfx906", "gfx908", "gfx90a", // Vega
|
||||
"gfx940", "gfx941", "gfx942", // Mi300
|
||||
"gfx1030", "gfx1031", "gfx1032", // Navi2x
|
||||
"gfx1100", "gfx1101" // Navi3x
|
||||
"gfx1100", "gfx1101" // Navi3x
|
||||
};
|
||||
if (supported_agent_names.find(agent_name_) != supported_agent_names.end()) {
|
||||
ImportMetrics(agent_info, agent_name_);
|
||||
|
||||
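Note on the agent name handling in the hunk above: anything after the first ':' is cut off before the name is looked up in the supported set. Below is a minimal sketch of that lookup. `base_agent_name` and `is_supported_agent` are hypothetical helper names, and the sample name "gfx90a:sramecc+:xnack-" is only an illustration of a name that carries flags.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Keep only the part before the first ':' (for example "gfx90a" from
// "gfx90a:sramecc+:xnack-"); names without ':' are returned unchanged.
inline std::string base_agent_name(const std::string& name) {
  return name.substr(0, name.find(':'));
}

// Look the trimmed name up in the list shown in the diff.
inline bool is_supported_agent(const std::string& name) {
  static const std::unordered_set<std::string> supported = {
      "gfx906",  "gfx908",  "gfx90a",   // Vega
      "gfx940",  "gfx941",  "gfx942",   // Mi300
      "gfx1030", "gfx1031", "gfx1032",  // Navi2x
      "gfx1100", "gfx1101"              // Navi3x
  };
  return supported.count(base_agent_name(name)) != 0;
}
```

`substr(0, npos)` returns the whole string, so the explicit `find(':') != npos` check from the diff is not needed in this sketch.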
@@ -140,7 +140,7 @@ class Profile {
|
||||
static void SetConcurrent(profile_t* profile) {
|
||||
// Check whether concurrent has been set
|
||||
for (const parameter_t* p = profile->parameters;
|
||||
p < (profile->parameters + profile->parameter_count); ++p) {
|
||||
p < (profile->parameters + profile->parameter_count); ++p) {
|
||||
// If yes, stop here
|
||||
if (p->parameter_name == HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_K_CONCURRENT) {
|
||||
return;
|
||||
@@ -148,7 +148,7 @@ class Profile {
|
||||
}
|
||||
|
||||
// Otherwise, try to set
|
||||
parameter_t* parameters = new parameter_t[profile->parameter_count+1];
|
||||
parameter_t* parameters = new parameter_t[profile->parameter_count + 1];
|
||||
for (unsigned i = 0; i < profile->parameter_count; ++i) {
|
||||
parameters[i].parameter_name = profile->parameters[i].parameter_name;
|
||||
parameters[i].value = profile->parameters[i].value;
|
||||
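Note on `SetConcurrent` in the hunks above: the function returns early if the concurrent parameter is already present, and otherwise copies the parameters into an array with one extra slot. Below is a minimal sketch of the same logic with `std::vector`. `Param`, `kParamConcurrent`, `with_concurrent` and the `mode` argument are hypothetical, and the value written by the real code is not visible in this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the aqlprofile parameter type and the
// K_CONCURRENT parameter name.
struct Param {
  uint32_t name;
  uint32_t value;
};
constexpr uint32_t kParamConcurrent = 100;

// Return the parameter list with a K_CONCURRENT entry appended, leaving the
// list untouched if the caller already set one.
inline std::vector<Param> with_concurrent(std::vector<Param> params, uint32_t mode) {
  for (const Param& p : params) {
    if (p.name == kParamConcurrent) return params;  // already set
  }
  params.push_back(Param{kParamConcurrent, mode});
  return params;
}
```

A vector owns its storage, so this sketch has no counterpart to the `new parameter_t[...]` array in the diff, whose release is not shown here.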
@@ -162,15 +162,16 @@ class Profile {
|
||||
}
|
||||
|
||||
void BarrierPacket(packet_t* packet, const hsa_signal_t& prior_signal) {
|
||||
hsa_barrier_and_packet_t* barrier =
|
||||
reinterpret_cast<hsa_barrier_and_packet_t*>(packet);
|
||||
hsa_barrier_and_packet_t* barrier = reinterpret_cast<hsa_barrier_and_packet_t*>(packet);
|
||||
barrier->header = HSA_PACKET_TYPE_BARRIER_AND;
|
||||
if (prior_signal.handle) barrier->dep_signal[0] = prior_signal; // set packet dependency
|
||||
else barrier->header |= 1 << HSA_PACKET_HEADER_BARRIER; // set barrier bit
|
||||
if (prior_signal.handle)
|
||||
barrier->dep_signal[0] = prior_signal; // set packet dependency
|
||||
else
|
||||
barrier->header |= 1 << HSA_PACKET_HEADER_BARRIER; // set barrier bit
|
||||
}
|
||||
|
||||
hsa_status_t Finalize(pkt_vector_t& start_vector, pkt_vector_t& stop_vector,
|
||||
pkt_vector_t& read_vector, bool is_concurrent = false) {
|
||||
pkt_vector_t& read_vector, bool is_concurrent = false) {
|
||||
if (is_concurrent) SetConcurrent(&profile_);
|
||||
|
||||
hsa_status_t status = HSA_STATUS_SUCCESS;
|
||||
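Note on `BarrierPacket` in the hunk above: the header always carries the barrier-AND packet type, and the barrier bit is added only when there is no prior signal to depend on. Below is a minimal sketch of that header logic. `BarrierSketch` and `make_barrier` are hypothetical, and the constants are local stand-ins whose values are assumed to mirror `hsa.h`.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values mirroring hsa.h; verify against your HSA headers.
constexpr uint16_t kPacketTypeBarrierAnd = 3;  // HSA_PACKET_TYPE_BARRIER_AND
constexpr unsigned kHeaderBarrierBit = 8;      // HSA_PACKET_HEADER_BARRIER

// Simplified stand-in for hsa_barrier_and_packet_t.
struct BarrierSketch {
  uint16_t header = 0;
  uint64_t dep_signal0 = 0;  // handle of the first dependency signal
};

// With a prior signal, depend on that signal; without one, set the barrier
// bit so that earlier packets in the queue complete first.
inline BarrierSketch make_barrier(uint64_t prior_signal_handle) {
  BarrierSketch b;
  b.header = kPacketTypeBarrierAnd;
  if (prior_signal_handle != 0) {
    b.dep_signal0 = prior_signal_handle;
  } else {
    b.header |= static_cast<uint16_t>(1u << kHeaderBarrierBit);
  }
  return b;
}
```

Under these assumed values, a barrier without a prior signal gets header `0x103` and a barrier with one gets header `0x003`.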
@@ -180,8 +181,8 @@ class Profile {
|
||||
const pfn_t* api = rsrc->AqlProfileApi();
|
||||
packet_t start{};
|
||||
packet_t stop{};
|
||||
packet_t read{}; // read at kernel start
|
||||
packet_t read2{}; // read at kernel end
|
||||
packet_t read{}; // read at kernel start
|
||||
packet_t read2{}; // read at kernel end
|
||||
|
||||
// Check the profile buffer sizes
|
||||
status = api->hsa_ven_amd_aqlprofile_start(&profile_, NULL);
|
||||
@@ -200,12 +201,12 @@ class Profile {
|
||||
#ifdef AQLPROF_NEW_API
|
||||
if (profile_.type == HSA_VEN_AMD_AQLPROFILE_EVENT_TYPE_PMC) {
|
||||
rd_status = api->hsa_ven_amd_aqlprofile_read(&profile_, &read);
|
||||
if (is_concurrent){ // concurrent: one more read
|
||||
if (is_concurrent) { // concurrent: one more read
|
||||
if (rd_status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "aqlprofile_read");
|
||||
rd_status = api->hsa_ven_amd_aqlprofile_read(&profile_, &read2);
|
||||
}
|
||||
}
|
||||
#if 0 // Read API returns error if disabled
|
||||
#if 0 // Read API returns error if disabled
|
||||
if (rd_status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "aqlprofile_read");
|
||||
#endif
|
||||
#endif
|
||||
@@ -220,7 +221,8 @@ class Profile {
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "signal_create " << std::hex << status);
|
||||
if (is_concurrent) {
|
||||
status = hsa_signal_create(1, 0, NULL, &read_signal_);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "signal_create " << std::hex << status);
|
||||
if (status != HSA_STATUS_SUCCESS)
|
||||
EXC_RAISING(status, "signal_create " << std::hex << status);
|
||||
read.completion_signal = read_signal_;
|
||||
read2.completion_signal = completion_signal_;
|
||||
} else {
|
||||
@@ -239,7 +241,8 @@ class Profile {
|
||||
BarrierPacket(&barrier_rd, read.completion_signal);
|
||||
BarrierPacket(&barrier_rd2, dispatch_signal_);
|
||||
status = hsa_signal_create(1, 0, NULL, &(barrier_signal_));
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "signal_create " << std::hex << status);
|
||||
if (status != HSA_STATUS_SUCCESS)
|
||||
EXC_RAISING(status, "signal_create " << std::hex << status);
|
||||
barrier_rd2.completion_signal = barrier_signal_;
|
||||
}
|
||||
|
||||
@@ -297,8 +300,8 @@ class Profile {
|
||||
|
||||
void GetProfiles(profile_vector_t& vec) {
|
||||
if (!info_vector_.empty()) {
|
||||
vec.push_back(profile_tuple_t{&profile_, &info_vector_, completion_signal_,
|
||||
dispatch_signal_, barrier_signal_, read_signal_});
|
||||
vec.push_back(profile_tuple_t{&profile_, &info_vector_, completion_signal_, dispatch_signal_,
|
||||
barrier_signal_, read_signal_});
|
||||
}
|
||||
}
|
||||
|
||||
@@ -330,11 +333,12 @@ class PmcProfile : public Profile {
|
||||
|
||||
hsa_status_t Allocate(util::HsaRsrcFactory* rsrc) {
|
||||
profile_.command_buffer.ptr =
|
||||
rsrc->AllocateSysMemory(agent_info_, profile_.command_buffer.size);
|
||||
rsrc->AllocateSysMemory(agent_info_, profile_.command_buffer.size);
|
||||
// Allocate profile output buffer from kernarg memory pool since kernarg
|
||||
// memory buffer is uncached. So when GPU copies performance counter values
|
||||
// to this buffer they are guaranteed to be visible to CPU.
|
||||
profile_.output_buffer.ptr = rsrc->AllocateKernArgMemory(agent_info_, profile_.output_buffer.size);
|
||||
profile_.output_buffer.ptr =
|
||||
rsrc->AllocateKernArgMemory(agent_info_, profile_.output_buffer.size);
|
||||
return (profile_.command_buffer.ptr && profile_.output_buffer.ptr) ? HSA_STATUS_SUCCESS
|
||||
: HSA_STATUS_ERROR;
|
||||
}
|
||||
@@ -366,11 +370,11 @@ class TraceProfile : public Profile {
|
||||
|
||||
hsa_status_t Allocate(util::HsaRsrcFactory* rsrc) {
|
||||
profile_.command_buffer.ptr =
|
||||
rsrc->AllocateSysMemory(agent_info_, profile_.command_buffer.size);
|
||||
rsrc->AllocateSysMemory(agent_info_, profile_.command_buffer.size);
|
||||
profile_.output_buffer.size = output_buffer_size_;
|
||||
profile_.output_buffer.ptr = (output_buffer_local_) ?
|
||||
rsrc->AllocateLocalMemory(agent_info_, profile_.output_buffer.size) :
|
||||
rsrc->AllocateSysMemory(agent_info_, profile_.output_buffer.size);
|
||||
profile_.output_buffer.ptr = (output_buffer_local_)
|
||||
? rsrc->AllocateLocalMemory(agent_info_, profile_.output_buffer.size)
|
||||
: rsrc->AllocateSysMemory(agent_info_, profile_.output_buffer.size);
|
||||
return (profile_.command_buffer.ptr && profile_.output_buffer.ptr) ? HSA_STATUS_SUCCESS
|
||||
: HSA_STATUS_ERROR;
|
||||
}
|
||||
|
||||
@@ -38,10 +38,10 @@ ProxyQueue* ProxyQueue::Create(hsa_agent_t agent, uint32_t size, hsa_queue_type3
|
||||
hsa_status_t* status) {
|
||||
hsa_status_t suc = HSA_STATUS_ERROR;
|
||||
ProxyQueue* instance =
|
||||
(rocp_type_) ? (ProxyQueue*) new SimpleProxyQueue() : (ProxyQueue*) new HsaProxyQueue();
|
||||
(rocp_type_) ? (ProxyQueue*)new SimpleProxyQueue() : (ProxyQueue*)new HsaProxyQueue();
|
||||
if (instance != NULL) {
|
||||
suc = instance->Init(agent, size, type, callback, data, private_segment_size,
|
||||
group_segment_size, queue);
|
||||
group_segment_size, queue);
|
||||
if (suc != HSA_STATUS_SUCCESS) {
|
||||
delete instance;
|
||||
instance = NULL;
|
||||
|
||||
@@ -75,34 +75,34 @@ hsa_status_t CreateQueuePro(hsa_agent_t agent, uint32_t size, hsa_queue_type32_t
|
||||
void* data, uint32_t private_segment_size, uint32_t group_segment_size,
|
||||
hsa_queue_t** queue);
|
||||
|
||||
decltype(hsa_queue_create)* hsa_queue_create_fn;
|
||||
decltype(hsa_queue_destroy)* hsa_queue_destroy_fn;
|
||||
decltype(::hsa_queue_create)* hsa_queue_create_fn;
|
||||
decltype(::hsa_queue_destroy)* hsa_queue_destroy_fn;
|
||||
|
||||
decltype(hsa_signal_store_relaxed)* hsa_signal_store_relaxed_fn;
|
||||
decltype(hsa_signal_store_relaxed)* hsa_signal_store_screlease_fn;
|
||||
decltype(::hsa_signal_store_relaxed)* hsa_signal_store_relaxed_fn;
|
||||
decltype(::hsa_signal_store_relaxed)* hsa_signal_store_screlease_fn;
|
||||
|
||||
decltype(hsa_queue_load_write_index_relaxed)* hsa_queue_load_write_index_relaxed_fn;
|
||||
decltype(hsa_queue_store_write_index_relaxed)* hsa_queue_store_write_index_relaxed_fn;
|
||||
decltype(hsa_queue_load_read_index_relaxed)* hsa_queue_load_read_index_relaxed_fn;
|
||||
decltype(hsa_queue_add_write_index_scacq_screl)* hsa_queue_add_write_index_scacq_screl_fn;
|
||||
decltype(::hsa_queue_load_write_index_relaxed)* hsa_queue_load_write_index_relaxed_fn;
|
||||
decltype(::hsa_queue_store_write_index_relaxed)* hsa_queue_store_write_index_relaxed_fn;
|
||||
decltype(::hsa_queue_load_read_index_relaxed)* hsa_queue_load_read_index_relaxed_fn;
|
||||
decltype(::hsa_queue_add_write_index_scacq_screl)* hsa_queue_add_write_index_scacq_screl_fn;
|
||||
|
||||
decltype(hsa_queue_load_write_index_scacquire)* hsa_queue_load_write_index_scacquire_fn;
|
||||
decltype(hsa_queue_store_write_index_screlease)* hsa_queue_store_write_index_screlease_fn;
|
||||
decltype(hsa_queue_load_read_index_scacquire)* hsa_queue_load_read_index_scacquire_fn;
|
||||
decltype(::hsa_queue_load_write_index_scacquire)* hsa_queue_load_write_index_scacquire_fn;
|
||||
decltype(::hsa_queue_store_write_index_screlease)* hsa_queue_store_write_index_screlease_fn;
|
||||
decltype(::hsa_queue_load_read_index_scacquire)* hsa_queue_load_read_index_scacquire_fn;
|
||||
|
||||
decltype(hsa_amd_queue_intercept_create)* hsa_amd_queue_intercept_create_fn;
|
||||
decltype(hsa_amd_queue_intercept_register)* hsa_amd_queue_intercept_register_fn;
|
||||
decltype(::hsa_amd_queue_intercept_create)* hsa_amd_queue_intercept_create_fn;
|
||||
decltype(::hsa_amd_queue_intercept_register)* hsa_amd_queue_intercept_register_fn;
|
||||
|
||||
decltype(hsa_memory_allocate)* hsa_memory_allocate_fn;
|
||||
decltype(hsa_memory_assign_agent)* hsa_memory_assign_agent_fn;
|
||||
decltype(hsa_memory_copy)* hsa_memory_copy_fn;
|
||||
decltype(hsa_amd_memory_pool_allocate)* hsa_amd_memory_pool_allocate_fn;
|
||||
decltype(hsa_amd_memory_pool_free)* hsa_amd_memory_pool_free_fn;
|
||||
decltype(hsa_amd_agents_allow_access)* hsa_amd_agents_allow_access_fn;
|
||||
decltype(hsa_amd_memory_async_copy)* hsa_amd_memory_async_copy_fn;
|
||||
decltype(hsa_amd_memory_async_copy_rect)* hsa_amd_memory_async_copy_rect_fn;
|
||||
decltype(hsa_executable_freeze)* hsa_executable_freeze_fn;
|
||||
decltype(hsa_executable_destroy)* hsa_executable_destroy_fn;
|
||||
decltype(::hsa_memory_allocate)* hsa_memory_allocate_fn;
|
||||
decltype(::hsa_memory_assign_agent)* hsa_memory_assign_agent_fn;
|
||||
decltype(::hsa_memory_copy)* hsa_memory_copy_fn;
|
||||
decltype(::hsa_amd_memory_pool_allocate)* hsa_amd_memory_pool_allocate_fn;
|
||||
decltype(::hsa_amd_memory_pool_free)* hsa_amd_memory_pool_free_fn;
|
||||
decltype(::hsa_amd_agents_allow_access)* hsa_amd_agents_allow_access_fn;
|
||||
decltype(::hsa_amd_memory_async_copy)* hsa_amd_memory_async_copy_fn;
|
||||
decltype(::hsa_amd_memory_async_copy_rect)* hsa_amd_memory_async_copy_rect_fn;
|
||||
decltype(::hsa_executable_freeze)* hsa_executable_freeze_fn;
|
||||
decltype(::hsa_executable_destroy)* hsa_executable_destroy_fn;
|
||||
|
||||
::HsaApiTable* kHsaApiTable;
|
||||
|
||||
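Reviewer note on the `decltype(::<hsa-function>)` change in the hunk above: the sketch below illustrates why the global-scope qualifier matters once a pointer variable shares the function's name. It is a minimal illustration, not code from this repository; `api_call`, `tool`, and `forward` are hypothetical stand-ins for an HSA function, the `rocprofiler` namespace, and a caller.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a global C API function such as hsa_memory_allocate.
int api_call(int x) { return x + 1; }

namespace tool {
// First declaration (for example in a header). The variable has the same
// name as the global function it points to.
extern decltype(::api_call)* api_call;

// Redeclaration (for example in the .cpp file). Without the leading '::',
// decltype(api_call) would now name tool::api_call, which is already a
// pointer, so the declared type would become int (**)(int) and conflict
// with the extern declaration above. '::' always selects the global function.
decltype(::api_call)* api_call = &::api_call;

// Calls through the saved pointer.
int forward(int x) { return api_call(x); }

static_assert(std::is_same_v<decltype(api_call), int (*)(int)>,
              "pointer type is derived from the global function");
}  // namespace tool
```

Variables carrying a `_fn` suffix do not collide with the function name, which is presumably why the commit message describes the qualifier there as a consistency change only.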
@@ -393,80 +393,80 @@ ROCPROFILER_EXPORT extern const uint32_t HSA_AMD_TOOL_PRIORITY = 25;
|
||||
PUBLIC_API bool OnLoad(HsaApiTable* table, uint64_t runtime_version, uint64_t failed_tool_count,
|
||||
const char* const* failed_tool_names) {
|
||||
ONLOAD_TRACE_BEG();
|
||||
rocprofiler::SaveHsaApi(table);
|
||||
rocprofiler::ProxyQueue::InitFactory();
|
||||
rocprofiler::SaveHsaApi(table);
|
||||
rocprofiler::ProxyQueue::InitFactory();
|
||||
|
||||
// Checking environment to enable intercept mode
|
||||
const char* intercept_env = getenv("ROCP_HSA_INTERCEPT");
|
||||
// Checking environment to enable intercept mode
|
||||
const char* intercept_env = getenv("ROCP_HSA_INTERCEPT");
|
||||
|
||||
int intercept_env_value = 0;
|
||||
if (intercept_env != NULL) {
|
||||
intercept_env_value = atoi(intercept_env);
|
||||
int intercept_env_value = 0;
|
||||
if (intercept_env != NULL) {
|
||||
intercept_env_value = atoi(intercept_env);
|
||||
|
||||
switch (intercept_env_value) {
|
||||
case 0:
|
||||
case 1:
|
||||
// 0: Intercepting disabled
|
||||
// 1: Intercepting enabled without timestamping
|
||||
rocprofiler::InterceptQueue::TrackerOn(false);
|
||||
break;
|
||||
case 2:
|
||||
// Intercepting enabled with timestamping
|
||||
rocprofiler::InterceptQueue::TrackerOn(true);
|
||||
break;
|
||||
default:
|
||||
ERR_LOGGING("Bad ROCP_HSA_INTERCEPT env var value ("
|
||||
<< intercept_env << "): "
|
||||
<< "valid values are 0 (standalone), 1 (intercepting without timestamp), 2 "
|
||||
"(intercepting with timestamp)");
|
||||
return false;
|
||||
}
|
||||
switch (intercept_env_value) {
|
||||
case 0:
|
||||
case 1:
|
||||
// 0: Intercepting disabled
|
||||
// 1: Intercepting enabled without timestamping
|
||||
rocprofiler::InterceptQueue::TrackerOn(false);
|
||||
break;
|
||||
case 2:
|
||||
// Intercepting enabled with timestamping
|
||||
rocprofiler::InterceptQueue::TrackerOn(true);
|
||||
break;
|
||||
default:
|
||||
ERR_LOGGING("Bad ROCP_HSA_INTERCEPT env var value ("
|
||||
<< intercept_env << "): "
|
||||
<< "valid values are 0 (standalone), 1 (intercepting without timestamp), 2 "
|
||||
"(intercepting with timestamp)");
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
// always enable executable tracking
|
||||
rocprofiler::util::HsaRsrcFactory::EnableExecutableTracking(table);
|
||||
// always enable executable tracking
|
||||
rocprofiler::util::HsaRsrcFactory::EnableExecutableTracking(table);
|
||||
|
||||
// Loading a tool lib and setting of intercept mode
|
||||
const uint32_t intercept_mode_mask = rocprofiler::LoadTool();
|
||||
// Loading a tool lib and setting of intercept mode
|
||||
const uint32_t intercept_mode_mask = rocprofiler::LoadTool();
|
||||
|
||||
if (intercept_mode_mask & rocprofiler::MEMCOPY_INTERCEPT_MODE) {
|
||||
hsa_status_t status = hsa_amd_profiling_async_copy_enable(true);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, "hsa_amd_profiling_async_copy_enable");
|
||||
rocprofiler::hsa_amd_memory_async_copy_fn = table->amd_ext_->hsa_amd_memory_async_copy_fn;
|
||||
rocprofiler::hsa_amd_memory_async_copy_rect_fn =
|
||||
table->amd_ext_->hsa_amd_memory_async_copy_rect_fn;
|
||||
table->amd_ext_->hsa_amd_memory_async_copy_fn =
|
||||
rocprofiler::hsa_amd_memory_async_copy_interceptor;
|
||||
table->amd_ext_->hsa_amd_memory_async_copy_rect_fn =
|
||||
rocprofiler::hsa_amd_memory_async_copy_rect_interceptor;
|
||||
}
|
||||
if (intercept_mode_mask & rocprofiler::HSA_INTERCEPT_MODE) {
|
||||
if (intercept_mode_mask & rocprofiler::MEMCOPY_INTERCEPT_MODE) {
|
||||
hsa_status_t status = hsa_amd_profiling_async_copy_enable(true);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, "hsa_amd_profiling_async_copy_enable");
|
||||
rocprofiler::hsa_amd_memory_async_copy_fn = table->amd_ext_->hsa_amd_memory_async_copy_fn;
|
||||
rocprofiler::hsa_amd_memory_async_copy_rect_fn =
|
||||
table->amd_ext_->hsa_amd_memory_async_copy_rect_fn;
|
||||
table->amd_ext_->hsa_amd_memory_async_copy_fn =
|
||||
rocprofiler::hsa_amd_memory_async_copy_interceptor;
|
||||
table->amd_ext_->hsa_amd_memory_async_copy_rect_fn =
|
||||
rocprofiler::hsa_amd_memory_async_copy_rect_interceptor;
|
||||
}
|
||||
if (intercept_mode_mask & rocprofiler::HSA_INTERCEPT_MODE) {
|
||||
if (intercept_mode_mask & rocprofiler::MEMCOPY_INTERCEPT_MODE) {
|
||||
EXC_ABORT(HSA_STATUS_ERROR, "HSA_INTERCEPT and MEMCOPY_INTERCEPT conflict");
|
||||
}
|
||||
rocprofiler::HsaInterceptor::Enable(true);
|
||||
rocprofiler::HsaInterceptor::HsaIntercept(table);
|
||||
EXC_ABORT(HSA_STATUS_ERROR, "HSA_INTERCEPT and MEMCOPY_INTERCEPT conflict");
|
||||
}
|
||||
rocprofiler::HsaInterceptor::Enable(true);
|
||||
rocprofiler::HsaInterceptor::HsaIntercept(table);
|
||||
}
|
||||
|
||||
// HSA intercepting
|
||||
if (intercept_env_value != 0) {
|
||||
rocprofiler::ProxyQueue::HsaIntercept(table);
|
||||
rocprofiler::InterceptQueue::HsaIntercept(table);
|
||||
} else {
|
||||
rocprofiler::StandaloneIntercept();
|
||||
}
|
||||
// HSA intercepting
|
||||
if (intercept_env_value != 0) {
|
||||
rocprofiler::ProxyQueue::HsaIntercept(table);
|
||||
rocprofiler::InterceptQueue::HsaIntercept(table);
|
||||
} else {
|
||||
rocprofiler::StandaloneIntercept();
|
||||
}
|
||||
|
||||
ONLOAD_TRACE("end intercept_mode(" << std::hex << intercept_env_value << ")"
|
||||
<< " intercept_mode_mask(" << std::hex << intercept_mode_mask
|
||||
<< ")" << std::dec);
|
||||
ONLOAD_TRACE("end intercept_mode(" << std::hex << intercept_env_value << ")"
|
||||
<< " intercept_mode_mask(" << std::hex << intercept_mode_mask
|
||||
<< ")" << std::dec);
|
||||
return true;
|
||||
}
|
||||
|
||||
// HSA-runtime tool on-unload method
|
||||
PUBLIC_API void OnUnload() {
|
||||
ONLOAD_TRACE_BEG();
|
||||
rocprofiler::UnloadTool();
|
||||
rocprofiler::RestoreHsaApi();
|
||||
rocprofiler::UnloadTool();
|
||||
rocprofiler::RestoreHsaApi();
|
||||
ONLOAD_TRACE_END();
|
||||
}
|
||||
|
||||
|
||||
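Reviewer note on the `OnLoad` hunk above: the memcopy branch saves the original table entry and then overwrites it with an interceptor. The sketch below shows that save-then-replace pattern in isolation, under simplifying assumptions. `ApiTable`, `RealCopy`, `CopyInterceptor`, and `intercepted_calls` are hypothetical names and do not correspond to the real `HsaApiTable` layout.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for an API dispatch table.
struct ApiTable {
  int (*copy_fn)(int bytes);
};

// Hypothetical "runtime" implementation that the table initially points to.
int RealCopy(int bytes) { return bytes; }

namespace tool {
// Saved original entry, filled in by Intercept().
decltype(::RealCopy)* copy_fn = nullptr;
int intercepted_calls = 0;

// Interceptor: records the call, then forwards to the saved original.
int CopyInterceptor(int bytes) {
  ++intercepted_calls;
  return copy_fn(bytes);
}

// Save the current table entry first, then install the interceptor.
void Intercept(ApiTable* table) {
  copy_fn = table->copy_fn;
  table->copy_fn = CopyInterceptor;
}
}  // namespace tool
```

The ordering is the important part: if the entry were overwritten before being saved, the interceptor would forward to itself and recurse without end.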
@@ -27,22 +27,20 @@ namespace rocprofiler {
|
||||
namespace att {
|
||||
|
||||
AttTracer::AttTracer(rocprofiler_buffer_id_t buffer_id, rocprofiler_filter_id_t filter_id,
|
||||
rocprofiler_session_id_t session_id)
|
||||
rocprofiler_session_id_t session_id)
|
||||
: buffer_id_(buffer_id), filter_id_(filter_id), session_id_(session_id) {}
|
||||
|
||||
void AttTracer::AddPendingSignals(uint32_t writer_id, uint64_t kernel_object,
|
||||
const hsa_signal_t& original_completion_signal, const hsa_signal_t& new_completion_signal,
|
||||
rocprofiler_session_id_t session_id,
|
||||
rocprofiler_buffer_id_t buffer_id,
|
||||
hsa_ven_amd_aqlprofile_profile_t* profile,
|
||||
rocprofiler_kernel_properties_t kernel_properties,
|
||||
uint32_t thread_id, uint64_t queue_index) {
|
||||
void AttTracer::AddPendingSignals(
|
||||
uint32_t writer_id, uint64_t kernel_object, const hsa_signal_t& original_completion_signal,
|
||||
const hsa_signal_t& new_completion_signal, rocprofiler_session_id_t session_id,
|
||||
rocprofiler_buffer_id_t buffer_id, hsa_ven_amd_aqlprofile_profile_t* profile,
|
||||
rocprofiler_kernel_properties_t kernel_properties, uint32_t thread_id, uint64_t queue_index) {
|
||||
std::lock_guard<std::mutex> lock(sessions_pending_signals_lock_);
|
||||
if (sessions_pending_signals_.find(writer_id) == sessions_pending_signals_.end())
|
||||
sessions_pending_signals_.emplace(writer_id, std::vector<att_pending_signal_t>());
|
||||
sessions_pending_signals_.at(writer_id).emplace_back(
|
||||
att_pending_signal_t{kernel_object, original_completion_signal, new_completion_signal, session_id_, buffer_id, profile,
|
||||
kernel_properties, thread_id, queue_index});
|
||||
sessions_pending_signals_.at(writer_id).emplace_back(att_pending_signal_t{
|
||||
kernel_object, original_completion_signal, new_completion_signal, session_id_, buffer_id,
|
||||
profile, kernel_properties, thread_id, queue_index});
|
||||
std::atomic_thread_fence(std::memory_order_release);
|
||||
}
|
||||
|
||||
|
||||
@@ -40,7 +40,7 @@ Filter::Filter(rocprofiler_filter_id_t id, rocprofiler_filter_kind_t filter_kind
|
||||
}
|
||||
break;
|
||||
}
|
||||
case ROCPROFILER_PC_SAMPLING_COLLECTION:{
|
||||
case ROCPROFILER_PC_SAMPLING_COLLECTION: {
|
||||
break;
|
||||
}
|
||||
case ROCPROFILER_ATT_TRACE_COLLECTION: {
|
||||
@@ -62,8 +62,8 @@ Filter::Filter(rocprofiler_filter_id_t id, rocprofiler_filter_kind_t filter_kind
|
||||
}
|
||||
case ROCPROFILER_API_TRACE: {
|
||||
tracer_apis_.clear();
|
||||
for (uint32_t j = 0; j < data_count; j++){
|
||||
tracer_apis_.emplace_back(filter_data.trace_apis[j]);
|
||||
for (uint32_t j = 0; j < data_count; j++) {
|
||||
tracer_apis_.emplace_back(filter_data.trace_apis[j]);
|
||||
}
|
||||
break;
|
||||
}
|
||||
@@ -195,7 +195,7 @@ void Filter::SetProperty(rocprofiler_filter_property_t property) {
|
||||
case ROCPROFILER_FILTER_DISPATCH_IDS:
|
||||
dispatch_id_filter_.clear();
|
||||
for (uint32_t j = 0; j < property.data_count; j++)
|
||||
dispatch_id_filter_.emplace_back(property.dispatch_ids[j]);
|
||||
dispatch_id_filter_.emplace_back(property.dispatch_ids[j]);
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
@@ -249,9 +249,7 @@ void Filter::SetCallback(rocprofiler_sync_callback_t& callback) {
|
||||
|
||||
bool Filter::HasCallback() { return has_sync_callback_; }
|
||||
|
||||
rocprofiler_sync_callback_t& Filter::GetCallback() {
|
||||
return callback_;
|
||||
}
|
||||
rocprofiler_sync_callback_t& Filter::GetCallback() { return callback_; }
|
||||
|
||||
size_t Filter::GetPropertiesCount(rocprofiler_filter_property_kind_t kind) {
|
||||
switch (kind) {
|
||||
|
||||
@@ -53,11 +53,8 @@ class Filter {
|
||||
bool HasCallback();
|
||||
|
||||
void SetProperty(rocprofiler_filter_property_t property);
|
||||
std::variant<
|
||||
std::vector<std::string>,
|
||||
uint32_t*,
|
||||
std::vector<uint64_t>
|
||||
> GetProperty(rocprofiler_filter_property_kind_t kind);
|
||||
std::variant<std::vector<std::string>, uint32_t*, std::vector<uint64_t> > GetProperty(
|
||||
rocprofiler_filter_property_kind_t kind);
|
||||
|
||||
size_t GetPropertiesCount(rocprofiler_filter_property_kind_t kind);
|
||||
rocprofiler_spm_parameter_t* GetSpmParameterData();
|
||||
@@ -74,11 +71,12 @@ class Filter {
|
||||
std::vector<std::string> kernel_names_; // HIP/HSA API Functions
|
||||
uint32_t dispatch_range_[2]; // Kernel Dispatches OR API Range
|
||||
|
||||
std::vector<std::string> profiler_counter_names_; // Counter Names to collect
|
||||
std::vector<std::string> profiler_counter_names_; // Counter Names to collect
|
||||
std::vector<rocprofiler_tracer_activity_domain_t> tracer_apis_; // ROCTX/HIP/HSA API
|
||||
rocprofiler_spm_parameter_t* spm_parameter_; // spm parameter
|
||||
std::vector<rocprofiler_att_parameter_t> att_parameters_; // ATT Parameters
|
||||
rocprofiler_counters_sampler_parameters_t counters_sampler_parameters_; // sampled counters parameters
|
||||
std::vector<rocprofiler_att_parameter_t> att_parameters_; // ATT Parameters
|
||||
rocprofiler_counters_sampler_parameters_t
|
||||
counters_sampler_parameters_; // sampled counters parameters
|
||||
std::vector<uint64_t> dispatch_id_filter_;
|
||||
|
||||
bool has_sync_callback_{false};
|
||||
|
||||
@@ -125,17 +125,19 @@ bool Profiler::HasActivePass() {
|
||||
}
|
||||
|
||||
void Profiler::AddPendingSignals(
|
||||
uint32_t writer_id, uint64_t kernel_object, const hsa_signal_t& original_completion_signal, const hsa_signal_t& new_completion_signal,
|
||||
rocprofiler_session_id_t session_id, rocprofiler_buffer_id_t buffer_id,
|
||||
rocprofiler::profiling_context_t* context, uint64_t session_data_count,
|
||||
hsa_ven_amd_aqlprofile_profile_t* profile, rocprofiler_kernel_properties_t kernel_properties,
|
||||
uint32_t thread_id, uint64_t queue_index, uint64_t correlation_id) {
|
||||
uint32_t writer_id, uint64_t kernel_object, const hsa_signal_t& original_completion_signal,
|
||||
const hsa_signal_t& new_completion_signal, rocprofiler_session_id_t session_id,
|
||||
rocprofiler_buffer_id_t buffer_id, rocprofiler::profiling_context_t* context,
|
||||
uint64_t session_data_count, hsa_ven_amd_aqlprofile_profile_t* profile,
|
||||
rocprofiler_kernel_properties_t kernel_properties, uint32_t thread_id, uint64_t queue_index,
|
||||
uint64_t correlation_id) {
|
||||
std::lock_guard<std::mutex> lock(sessions_pending_signals_lock_);
|
||||
if (sessions_pending_signals_->find(writer_id) == sessions_pending_signals_->end())
|
||||
sessions_pending_signals_->emplace(writer_id, std::vector<pending_signal_t*>());
|
||||
sessions_pending_signals_->at(writer_id).emplace_back(new pending_signal_t{
|
||||
kernel_object, original_completion_signal, new_completion_signal, session_id_, buffer_id, context, session_data_count,
|
||||
profile, kernel_properties, thread_id, queue_index, correlation_id});
|
||||
sessions_pending_signals_->at(writer_id).emplace_back(
|
||||
new pending_signal_t{kernel_object, original_completion_signal, new_completion_signal,
|
||||
session_id_, buffer_id, context, session_data_count, profile,
|
||||
kernel_properties, thread_id, queue_index, correlation_id});
|
||||
}
|
||||
|
||||
const std::vector<pending_signal_t*>& Profiler::GetPendingSignals(uint32_t writer_id) {
|
||||
|
||||
@@ -36,7 +36,7 @@
|
||||
#include "src/core/counters/metrics/eval_metrics.h"
|
||||
|
||||
typedef void (*rocprofiler_add_profiler_record_t)(rocprofiler_record_profiler_t&& record,
|
||||
rocprofiler_session_id_t session_id);
|
||||
rocprofiler_session_id_t session_id);
|
||||
|
||||
typedef rocprofiler_timestamp_t (*rocprofiler_get_timestamp_t)();
|
||||
|
||||
@@ -68,12 +68,13 @@ class Profiler {
|
||||
~Profiler();
|
||||
|
||||
void AddPendingSignals(uint32_t writer_id, uint64_t kernel_object,
|
||||
const hsa_signal_t& original_completion_signal, const hsa_signal_t& new_completion_signal, rocprofiler_session_id_t session_id,
|
||||
rocprofiler_buffer_id_t buffer_id,
|
||||
rocprofiler::profiling_context_t* context, uint64_t session_data_count,
|
||||
hsa_ven_amd_aqlprofile_profile_t* profile,
|
||||
rocprofiler_kernel_properties_t kernel_properties, uint32_t thread_id,
|
||||
uint64_t queue_index, uint64_t correlation_id);
|
||||
const hsa_signal_t& original_completion_signal,
|
||||
const hsa_signal_t& new_completion_signal,
|
||||
rocprofiler_session_id_t session_id, rocprofiler_buffer_id_t buffer_id,
|
||||
rocprofiler::profiling_context_t* context, uint64_t session_data_count,
|
||||
hsa_ven_amd_aqlprofile_profile_t* profile,
|
||||
rocprofiler_kernel_properties_t kernel_properties, uint32_t thread_id,
|
||||
uint64_t queue_index, uint64_t correlation_id);
|
||||
|
||||
const std::vector<pending_signal_t*>& GetPendingSignals(uint32_t writer_id);
|
||||
bool CheckPendingSignalsIsEmpty();
|
||||
@@ -83,8 +84,10 @@ class Profiler {
|
||||
std::string& GetCounterName(rocprofiler_counter_id_t handler);
|
||||
|
||||
bool FindCounter(rocprofiler_counter_id_t counter_id);
|
||||
size_t GetCounterInfoSize(rocprofiler_counter_info_kind_t kind, rocprofiler_counter_id_t counter_id);
|
||||
const char* GetCounterInfo(rocprofiler_counter_info_kind_t kind, rocprofiler_counter_id_t counter_id);
|
||||
size_t GetCounterInfoSize(rocprofiler_counter_info_kind_t kind,
|
||||
rocprofiler_counter_id_t counter_id);
|
||||
const char* GetCounterInfo(rocprofiler_counter_info_kind_t kind,
|
||||
rocprofiler_counter_id_t counter_id);
|
||||
|
||||
void StartReplayPass(rocprofiler_session_id_t session_id);
|
||||
void EndReplayPass();
|
||||
|
||||
@@ -67,8 +67,8 @@ class Session {
|
||||
|
||||
// Filter
|
||||
rocprofiler_filter_id_t CreateFilter(rocprofiler_filter_kind_t filter_kind,
|
||||
rocprofiler_filter_data_t filter_data, uint64_t data_count,
|
||||
rocprofiler_filter_property_t property);
|
||||
rocprofiler_filter_data_t filter_data, uint64_t data_count,
|
||||
rocprofiler_filter_property_t property);
|
||||
bool FindFilter(rocprofiler_filter_id_t filter_id);
|
||||
void DestroyFilter(rocprofiler_filter_id_t filter_id);
|
||||
Filter* GetFilter(rocprofiler_filter_id_t filter_id);
|
||||
@@ -83,7 +83,7 @@ class Session {
|
||||
|
||||
// Buffer
|
||||
rocprofiler_buffer_id_t CreateBuffer(rocprofiler_buffer_callback_t buffer_callback,
|
||||
size_t buffer_size);
|
||||
size_t buffer_size);
|
||||
bool FindBuffer(rocprofiler_buffer_id_t buffer_id);
|
||||
void DestroyBuffer(rocprofiler_buffer_id_t buffer_id);
|
||||
Memory::GenericBuffer* GetBuffer(rocprofiler_buffer_id_t buffer_id);
|
||||
|
||||
@@ -112,8 +112,7 @@ const char* roctracer_op_string(uint32_t domain, uint32_t op) {
|
||||
case ACTIVITY_DOMAIN_EXT_API:
|
||||
return "EXT_API";
|
||||
default:
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
|
||||
"Invalid domain ID");
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -178,8 +177,7 @@ constexpr uint32_t get_op_begin(activity_domain_t domain) {
|
||||
case ACTIVITY_DOMAIN_EXT_API:
|
||||
return 0;
|
||||
default:
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
|
||||
"Invalid domain ID");
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -200,8 +198,7 @@ constexpr uint32_t get_op_end(activity_domain_t domain) {
|
||||
case ACTIVITY_DOMAIN_EXT_API:
|
||||
return get_op_begin(ACTIVITY_DOMAIN_EXT_API);
|
||||
default:
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
|
||||
"Invalid domain ID");
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -476,11 +473,10 @@ int TracerCallback(activity_domain_t domain, uint32_t operation_id, void* data)
|
||||
rocprofiler::GetROCProfilerSingleton()
|
||||
->GetSession((*pool)->session_id)
|
||||
->GetBuffer((*pool)->buffer_id)
|
||||
->AddRecord(
|
||||
rocprofiler_record, record->kernel_name, kernel_name_size,
|
||||
[](auto& rocprofiler_record, const void* data) {
|
||||
rocprofiler_record.name = static_cast<const char*>(data);
|
||||
});
|
||||
->AddRecord(rocprofiler_record, record->kernel_name, kernel_name_size,
|
||||
[](auto& rocprofiler_record, const void* data) {
|
||||
rocprofiler_record.name = static_cast<const char*>(data);
|
||||
});
|
||||
} else {
|
||||
rocprofiler::GetROCProfilerSingleton()
|
||||
->GetSession((*pool)->session_id)
|
||||
@@ -584,8 +580,7 @@ static void roctracer_enable_op_callback(activity_domain_t domain, uint32_t oper
|
||||
user_data);
|
||||
break;
|
||||
default:
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
|
||||
"Invalid domain ID");
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -623,8 +618,7 @@ void roctracer_disable_op_callback(activity_domain_t domain, uint32_t operation_
|
||||
ROCTX_registration_group.Unregister(roctx_api_callback_table, operation_id);
|
||||
break;
|
||||
default:
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
|
||||
"Invalid domain ID");
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -667,8 +661,7 @@ void roctracer_enable_op_activity(activity_domain_t domain, uint32_t op,
|
||||
case ACTIVITY_DOMAIN_ROCTX:
|
||||
break;
|
||||
default:
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
|
||||
"Invalid domain ID");
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -710,8 +703,7 @@ void roctracer_disable_activity(activity_domain_t domain, uint32_t op) {
|
||||
case ACTIVITY_DOMAIN_ROCTX:
|
||||
break;
|
||||
default:
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
|
||||
"Invalid domain ID");
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -774,8 +766,7 @@ void roctracer_set_properties(activity_domain_t domain, void* properties) {
|
||||
break;
|
||||
}
|
||||
default:
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
|
||||
"Invalid domain ID");
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -791,9 +782,7 @@ static std::string getKernelNameMultiKernelMultiDevice(hipLaunchParams* launchPa
|
||||
return name_str.str();
|
||||
}
|
||||
|
||||
template <typename... Ts> struct Overloaded : Ts... {
|
||||
using Ts::operator()...;
|
||||
};
|
||||
template <typename... Ts> struct Overloaded : Ts... { using Ts::operator()...; };
|
||||
template <class... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;
|
||||
|
||||
std::optional<std::string> GetHipKernelName(uint32_t cid, hip_api_data_t* data) {
|
||||
|
||||
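Reviewer note on the `Overloaded` helper reformatted in the hunk above: this is the common C++17 idiom for building one visitor out of several lambdas for `std::visit`. The sketch below shows how such a helper is typically used; `Value` and `Describe` are hypothetical and are not taken from this file.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Same shape as the helper in the diff: inherit every lambda's operator().
template <typename... Ts> struct Overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;

// Hypothetical variant type, for illustration only.
using Value = std::variant<int, std::string>;

// Dispatches on the alternative currently held by the variant.
std::string Describe(const Value& v) {
  return std::visit(Overloaded{
                        [](int i) { return "int:" + std::to_string(i); },
                        [](const std::string& s) { return "str:" + s; }},
                    v);
}
```

Every lambda has to return the same type, since `std::visit` requires a single result type across all alternatives.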
@@ -27,13 +27,19 @@ void SimpleProxyQueue::HsaIntercept(HsaApiTable* table) {
|
||||
table->core_->hsa_signal_store_relaxed_fn = rocprofiler::SimpleProxyQueue::SignalStore;
|
||||
table->core_->hsa_signal_store_screlease_fn = rocprofiler::SimpleProxyQueue::SignalStore;
|
||||
|
||||
table->core_->hsa_queue_load_write_index_relaxed_fn = rocprofiler::SimpleProxyQueue::GetQueueIndex;
|
||||
table->core_->hsa_queue_store_write_index_relaxed_fn = rocprofiler::SimpleProxyQueue::SetQueueIndex;
|
||||
table->core_->hsa_queue_load_read_index_relaxed_fn = rocprofiler::SimpleProxyQueue::GetSubmitIndex;
|
||||
table->core_->hsa_queue_load_write_index_relaxed_fn =
|
||||
rocprofiler::SimpleProxyQueue::GetQueueIndex;
|
||||
table->core_->hsa_queue_store_write_index_relaxed_fn =
|
||||
rocprofiler::SimpleProxyQueue::SetQueueIndex;
|
||||
table->core_->hsa_queue_load_read_index_relaxed_fn =
|
||||
rocprofiler::SimpleProxyQueue::GetSubmitIndex;
|
||||
|
||||
table->core_->hsa_queue_load_write_index_scacquire_fn = rocprofiler::SimpleProxyQueue::GetQueueIndex;
|
||||
table->core_->hsa_queue_store_write_index_screlease_fn = rocprofiler::SimpleProxyQueue::SetQueueIndex;
|
||||
table->core_->hsa_queue_load_read_index_scacquire_fn = rocprofiler::SimpleProxyQueue::GetSubmitIndex;
|
||||
table->core_->hsa_queue_load_write_index_scacquire_fn =
|
||||
rocprofiler::SimpleProxyQueue::GetQueueIndex;
|
||||
table->core_->hsa_queue_store_write_index_screlease_fn =
|
||||
rocprofiler::SimpleProxyQueue::SetQueueIndex;
|
||||
table->core_->hsa_queue_load_read_index_scacquire_fn =
|
||||
rocprofiler::SimpleProxyQueue::GetSubmitIndex;
|
||||
}
|
||||
|
||||
SimpleProxyQueue::queue_map_t* SimpleProxyQueue::queue_map_ = NULL;
|
||||
|
||||
@@ -33,23 +33,23 @@ THE SOFTWARE.
|
||||
#include "util/hsa_rsrc_factory.h"
|
||||
|
||||
#ifndef ROCP_PROXY_LOCK
|
||||
# define ROCP_PROXY_LOCK 1
|
||||
#define ROCP_PROXY_LOCK 1
|
||||
#endif
|
||||
|
||||
namespace rocprofiler {
|
||||
extern decltype(hsa_queue_create)* hsa_queue_create_fn;
|
||||
extern decltype(hsa_queue_destroy)* hsa_queue_destroy_fn;
|
||||
extern decltype(::hsa_queue_create)* hsa_queue_create_fn;
|
||||
extern decltype(::hsa_queue_destroy)* hsa_queue_destroy_fn;
|
||||
|
||||
extern decltype(hsa_signal_store_relaxed)* hsa_signal_store_relaxed_fn;
|
||||
extern decltype(hsa_signal_store_relaxed)* hsa_signal_store_screlease_fn;
|
||||
extern decltype(::hsa_signal_store_relaxed)* hsa_signal_store_relaxed_fn;
|
||||
extern decltype(::hsa_signal_store_relaxed)* hsa_signal_store_screlease_fn;
|
||||
|
||||
extern decltype(hsa_queue_load_write_index_relaxed)* hsa_queue_load_write_index_relaxed_fn;
|
||||
extern decltype(hsa_queue_store_write_index_relaxed)* hsa_queue_store_write_index_relaxed_fn;
|
||||
extern decltype(hsa_queue_load_read_index_relaxed)* hsa_queue_load_read_index_relaxed_fn;
|
||||
extern decltype(::hsa_queue_load_write_index_relaxed)* hsa_queue_load_write_index_relaxed_fn;
|
||||
extern decltype(::hsa_queue_store_write_index_relaxed)* hsa_queue_store_write_index_relaxed_fn;
|
||||
extern decltype(::hsa_queue_load_read_index_relaxed)* hsa_queue_load_read_index_relaxed_fn;
|
||||
|
||||
extern decltype(hsa_queue_load_write_index_scacquire)* hsa_queue_load_write_index_scacquire_fn;
|
||||
extern decltype(hsa_queue_store_write_index_screlease)* hsa_queue_store_write_index_screlease_fn;
|
||||
extern decltype(hsa_queue_load_read_index_scacquire)* hsa_queue_load_read_index_scacquire_fn;
|
||||
extern decltype(::hsa_queue_load_write_index_scacquire)* hsa_queue_load_write_index_scacquire_fn;
|
||||
extern decltype(::hsa_queue_store_write_index_screlease)* hsa_queue_store_write_index_screlease_fn;
|
||||
extern decltype(::hsa_queue_load_read_index_scacquire)* hsa_queue_load_read_index_scacquire_fn;
|
||||
|
||||
typedef decltype(hsa_signal_t::handle) signal_handle_t;
|
||||
|
||||
@@ -128,7 +128,8 @@ class SimpleProxyQueue : public ProxyQueue {
|
||||
const uint64_t que_idx = hsa_queue_load_write_index_relaxed_fn(queue_);
|
||||
|
||||
// Waiting until there is free space in the queue
|
||||
while (que_idx >= (hsa_queue_load_read_index_relaxed_fn(queue_) + size_));
|
||||
while (que_idx >= (hsa_queue_load_read_index_relaxed_fn(queue_) + size_))
|
||||
;
|
||||
|
||||
// Increment the write index
|
||||
hsa_queue_store_write_index_relaxed_fn(queue_, que_idx + 1);
|
||||
@@ -163,8 +164,7 @@ class SimpleProxyQueue : public ProxyQueue {
|
||||
queue_mask_(0),
|
||||
submit_index_(0),
|
||||
on_submit_cb_(NULL),
|
||||
on_submit_cb_data_(NULL)
|
||||
{
|
||||
on_submit_cb_data_(NULL) {
|
||||
printf("ROCProfiler: SimpleProxyQueue is enabled\n");
|
||||
fflush(stdout);
|
||||
}
|
||||
@@ -203,8 +203,8 @@ class SimpleProxyQueue : public ProxyQueue {
|
||||
|
||||
if (queue_map_ == NULL) queue_map_ = new queue_map_t;
|
||||
(*queue_map_)[queue_->doorbell_signal.handle] = this;
|
||||
}
|
||||
else abort();
|
||||
} else
|
||||
abort();
|
||||
}
|
||||
}
|
||||
if (status != HSA_STATUS_SUCCESS) abort();
|
||||
|
||||
@@ -40,7 +40,7 @@ THE SOFTWARE.
|
||||
namespace rocprofiler {
|
||||
|
||||
class Tracker {
|
||||
public:
|
||||
public:
|
||||
typedef std::mutex mutex_t;
|
||||
typedef util::HsaRsrcFactory::timestamp_t timestamp_t;
|
||||
typedef rocprofiler_dispatch_record_t record_t;
|
||||
@@ -89,7 +89,7 @@ class Tracker {
|
||||
}
|
||||
|
||||
// Add tracker entry
|
||||
entry_t* Alloc(const hsa_agent_t& agent, const hsa_signal_t& orig, bool proxy=true) {
|
||||
entry_t* Alloc(const hsa_agent_t& agent, const hsa_signal_t& orig, bool proxy = true) {
|
||||
hsa_status_t status = HSA_STATUS_ERROR;
|
||||
|
||||
// Creating a new tracker entry
|
||||
@@ -108,10 +108,12 @@ class Tracker {
|
||||
// Creating a proxy signal
|
||||
if (proxy) {
|
||||
entry->is_proxy = true;
|
||||
const hsa_signal_value_t signal_value = (orig.handle) ? hsa_api_.hsa_signal_load_relaxed(orig) : 1;
|
||||
const hsa_signal_value_t signal_value =
|
||||
(orig.handle) ? hsa_api_.hsa_signal_load_relaxed(orig) : 1;
|
||||
status = hsa_api_.hsa_signal_create(signal_value, 0, NULL, &(entry->signal));
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_signal_create");
|
||||
status = hsa_api_.hsa_amd_signal_async_handler(entry->signal, HSA_SIGNAL_CONDITION_LT, signal_value, Handler, entry);
|
||||
status = hsa_api_.hsa_amd_signal_async_handler(entry->signal, HSA_SIGNAL_CONDITION_LT,
|
||||
signal_value, Handler, entry);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_signal_async_handler");
|
||||
}
|
||||
|
||||
@@ -128,7 +130,8 @@ class Tracker {
|
||||
hsa_signal_t& dispatch_signal = group->GetDispatchSignal();
|
||||
hsa_signal_t& handler_signal = group->GetBarrierSignal();
|
||||
entry->signal = dispatch_signal;
|
||||
hsa_status_t status = hsa_api_.hsa_amd_signal_async_handler(handler_signal, HSA_SIGNAL_CONDITION_LT, 1, Handler, entry);
|
||||
hsa_status_t status = hsa_api_.hsa_amd_signal_async_handler(
|
||||
handler_signal, HSA_SIGNAL_CONDITION_LT, 1, Handler, entry);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_signal_async_handler");
|
||||
}
|
||||
|
||||
@@ -150,7 +153,8 @@ class Tracker {
|
||||
// Debug trace
|
||||
if (trace_on_) {
|
||||
auto outstanding = outstanding_.fetch_add(1);
|
||||
fprintf(stdout, "Tracker::Enable: entry %p, record %p, outst %lu\n", entry, entry->record, outstanding);
|
||||
fprintf(stdout, "Tracker::Enable: entry %p, record %p, outst %lu\n", entry, entry->record,
|
||||
outstanding);
|
||||
fflush(stdout);
|
||||
}
|
||||
}
|
||||
@@ -173,12 +177,14 @@ class Tracker {
|
||||
group->GetRecord()->dispatch = util::HsaRsrcFactory::Instance().TimestampNs();
|
||||
|
||||
// Creating a proxy signal
|
||||
const hsa_signal_value_t signal_value = (orig_signal.handle) ?
|
||||
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_load_relaxed(orig_signal) : 1;
|
||||
const hsa_signal_value_t signal_value = (orig_signal.handle)
|
||||
? util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_load_relaxed(orig_signal)
|
||||
: 1;
|
||||
hsa_signal_t& dispatch_signal = group->GetDispatchSignal();
|
||||
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_store_screlease(dispatch_signal, signal_value);
|
||||
hsa_status_t status =
|
||||
util::HsaRsrcFactory::Instance().HsaApi()->hsa_amd_signal_async_handler(dispatch_signal, HSA_SIGNAL_CONDITION_LT, signal_value, Handler_opt, group);
|
||||
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_store_screlease(dispatch_signal,
|
||||
signal_value);
|
||||
hsa_status_t status = util::HsaRsrcFactory::Instance().HsaApi()->hsa_amd_signal_async_handler(
|
||||
dispatch_signal, HSA_SIGNAL_CONDITION_LT, signal_value, Handler_opt, group);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_signal_async_handler");
|
||||
}
|
||||
|
||||
@@ -190,7 +196,8 @@ class Tracker {
|
||||
record_t* record = group->GetRecord();
|
||||
hsa_amd_profiling_dispatch_time_t dispatch_time{};
|
||||
hsa_status_t status =
|
||||
util::HsaRsrcFactory::Instance().HsaApi()->hsa_amd_profiling_get_dispatch_time(context->GetAgent(), dispatch_signal, &dispatch_time);
|
||||
util::HsaRsrcFactory::Instance().HsaApi()->hsa_amd_profiling_get_dispatch_time(
|
||||
context->GetAgent(), dispatch_signal, &dispatch_time);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_profiling_get_dispatch_time");
|
||||
record->begin = util::HsaRsrcFactory::Instance().SysclockToNs(dispatch_time.start);
|
||||
record->end = util::HsaRsrcFactory::Instance().SysclockToNs(dispatch_time.end);
|
||||
@@ -203,22 +210,23 @@ class Tracker {
|
||||
amd_signal_t* prof_signal_ptr = reinterpret_cast<amd_signal_t*>(dispatch_signal.handle);
|
||||
orig_signal_ptr->start_ts = prof_signal_ptr->start_ts;
|
||||
orig_signal_ptr->end_ts = prof_signal_ptr->end_ts;
|
||||
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_store_screlease(orig_signal, signal_value);
|
||||
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_store_screlease(orig_signal,
|
||||
signal_value);
|
||||
}
|
||||
|
||||
return Context::Handler(signal_value, arg);
|
||||
}
|
||||
|
||||
private:
|
||||
Tracker() :
|
||||
outstanding_(0),
|
||||
hsa_rsrc_(&(util::HsaRsrcFactory::Instance())),
|
||||
hsa_api_(*(hsa_rsrc_->HsaApi()))
|
||||
{}
|
||||
private:
|
||||
Tracker()
|
||||
: outstanding_(0),
|
||||
hsa_rsrc_(&(util::HsaRsrcFactory::Instance())),
|
||||
hsa_api_(*(hsa_rsrc_->HsaApi())) {}
|
||||
|
||||
~Tracker() {
|
||||
if (trace_on_) {
|
||||
fprintf(stdout, "Tracker::DESTR: sig list %d, outst %lu\n", (int)(sig_list_.size()), outstanding_.load());
|
||||
fprintf(stdout, "Tracker::DESTR: sig list %d, outst %lu\n", (int)(sig_list_.size()),
|
||||
outstanding_.load());
|
||||
fflush(stdout);
|
||||
}
|
||||
|
||||
@@ -226,8 +234,8 @@ class Tracker {
|
||||
auto end = sig_list_.end();
|
||||
while (it != end) {
|
||||
auto cur = it++;
|
||||
// The wait should be optional, as there may be inter-kernel dependencies and it is possible to wait for
|
||||
// kernels that will never be launched because the application finished for some reason.
|
||||
// The wait should be optional, as there may be inter-kernel dependencies and it is possible to
|
||||
// wait for kernels that will never be launched because the application finished for some reason.
|
||||
#if 0
|
||||
// FIXME: currently the signal value for tracking signals are taken from original application signal
|
||||
hsa_rsrc_->SignalWait((*cur)->signal, 1);
|
||||
@@ -246,20 +254,24 @@ class Tracker {
|
||||
// Debug trace
|
||||
if (trace_on_) {
|
||||
auto outstanding = outstanding_.fetch_sub(1);
|
||||
fprintf(stdout, "Tracker::Complete: entry %p, record %p, outst %lu\n", entry, entry->record, outstanding);
|
||||
fprintf(stdout, "Tracker::Complete: entry %p, record %p, outst %lu\n", entry, entry->record,
|
||||
outstanding);
|
||||
fflush(stdout);
|
||||
}
|
||||
|
||||
// Query begin/end and complete timestamps
|
||||
if (entry->is_memcopy) {
|
||||
hsa_amd_profiling_async_copy_time_t async_copy_time{};
|
||||
hsa_status_t status = hsa_api_.hsa_amd_profiling_get_async_copy_time(entry->signal, &async_copy_time);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_profiling_get_async_copy_time");
|
||||
hsa_status_t status =
|
||||
hsa_api_.hsa_amd_profiling_get_async_copy_time(entry->signal, &async_copy_time);
|
||||
if (status != HSA_STATUS_SUCCESS)
|
||||
EXC_RAISING(status, "hsa_amd_profiling_get_async_copy_time");
|
||||
record->begin = hsa_rsrc_->SysclockToNs(async_copy_time.start);
|
||||
record->end = hsa_rsrc_->SysclockToNs(async_copy_time.end);
|
||||
} else {
|
||||
hsa_amd_profiling_dispatch_time_t dispatch_time{};
|
||||
hsa_status_t status = hsa_api_.hsa_amd_profiling_get_dispatch_time(entry->agent, entry->signal, &dispatch_time);
|
||||
hsa_status_t status =
|
||||
hsa_api_.hsa_amd_profiling_get_dispatch_time(entry->agent, entry->signal, &dispatch_time);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_profiling_get_dispatch_time");
|
||||
record->begin = hsa_rsrc_->SysclockToNs(dispatch_time.start);
|
||||
record->end = hsa_rsrc_->SysclockToNs(dispatch_time.end);
|
||||
@@ -349,6 +361,6 @@ class Tracker {
|
||||
static const bool trace_on_ = false;
|
||||
};
|
||||
|
||||
} // namespace rocprofiler
|
||||
} // namespace rocprofiler
|
||||
|
||||
#endif // SRC_CORE_TRACKER_H_
|
||||
#endif // SRC_CORE_TRACKER_H_
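A note on the debug trace in `Tracker::Enable` and `Tracker::Complete` above: `fetch_add` and `fetch_sub` on a `std::atomic` return the value held before the update, so the `outst` figure that gets printed is the count prior to the current entry. In the hunks shown, the counter is also only touched inside the `if (trace_on_)` branch. The sketch below illustrates the return-value behaviour in isolation; `OutstandingCounter` and its method names are hypothetical and not part of rocprofiler.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the outstanding_ member used in the trace code.
struct OutstandingCounter {
  std::atomic<uint64_t> outstanding{0};

  // Mirrors outstanding_.fetch_add(1): returns the value BEFORE the increment.
  uint64_t OnEnable() { return outstanding.fetch_add(1); }

  // Mirrors outstanding_.fetch_sub(1): returns the value BEFORE the decrement.
  uint64_t OnComplete() { return outstanding.fetch_sub(1); }

  // Current value of the counter.
  uint64_t Load() const { return outstanding.load(); }
};
```

If the post-update count were wanted in the trace instead, adding or subtracting one from the returned value would give it; the diff above does not do this.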
|
||||
|
||||
@@ -36,11 +36,12 @@ typedef hsa_ext_amd_aql_pm4_packet_t packet_t;
|
||||
typedef uint32_t packet_word_t;
|
||||
typedef uint64_t timestamp_t;
|
||||
|
||||
inline std::ostream& operator<< (std::ostream& out, const event_t& event) {
|
||||
out << "[block_name(" << event.block_name << "). block_index(" << event.block_index << "). counter_id(" << event.counter_id << ")]";
|
||||
inline std::ostream& operator<<(std::ostream& out, const event_t& event) {
|
||||
out << "[block_name(" << event.block_name << "). block_index(" << event.block_index
|
||||
<< "). counter_id(" << event.counter_id << ")]";
|
||||
return out;
|
||||
}
|
||||
inline std::ostream& operator<< (std::ostream& out, const parameter_t& parameter) {
|
||||
inline std::ostream& operator<<(std::ostream& out, const parameter_t& parameter) {
|
||||
out << "[parameter_name(" << parameter.parameter_name << "). value(" << parameter.value << ")]";
|
||||
return out;
|
||||
}
|
||||
|
||||
@@ -35,15 +35,12 @@
|
||||
|
||||
namespace rocprofiler::pc_sampler {
|
||||
|
||||
PCSampler::PCSampler(
|
||||
rocprofiler_buffer_id_t buffer_id,
|
||||
rocprofiler_filter_id_t filter_id,
|
||||
rocprofiler_session_id_t session_id)
|
||||
: buffer_id_(buffer_id)
|
||||
, filter_id_(filter_id)
|
||||
, session_id_(session_id)
|
||||
, pci_system_initialized_(pci_system_init() == 0)
|
||||
{}
|
||||
PCSampler::PCSampler(rocprofiler_buffer_id_t buffer_id, rocprofiler_filter_id_t filter_id,
|
||||
rocprofiler_session_id_t session_id)
|
||||
: buffer_id_(buffer_id),
|
||||
filter_id_(filter_id),
|
||||
session_id_(session_id),
|
||||
pci_system_initialized_(pci_system_init() == 0) {}
|
||||
|
||||
PCSampler::~PCSampler() {
|
||||
if (pci_system_initialized_) {
|
||||
@@ -53,7 +50,9 @@ PCSampler::~PCSampler() {
|
||||
}
|
||||
|
||||
void PCSampler::Start() {
|
||||
if (sampler_thread_.joinable()) { return; }
|
||||
if (sampler_thread_.joinable()) {
|
||||
return;
|
||||
}
|
||||
|
||||
devices_.clear();
|
||||
|
||||
@@ -61,15 +60,15 @@ void PCSampler::Start() {
|
||||
|
||||
agents_t agents;
|
||||
rocprofiler::hsa_support::GetCoreApiTable().hsa_iterate_agents_fn(
|
||||
[](hsa_agent_t agent, void *arg){
|
||||
auto &agents = *reinterpret_cast<agents_t *>(arg);
|
||||
agents.emplace_back(agent);
|
||||
return HSA_STATUS_SUCCESS;
|
||||
},
|
||||
&agents);
|
||||
[](hsa_agent_t agent, void* arg) {
|
||||
auto& agents = *reinterpret_cast<agents_t*>(arg);
|
||||
agents.emplace_back(agent);
|
||||
return HSA_STATUS_SUCCESS;
|
||||
},
|
||||
&agents);
|
||||
|
||||
for (const auto &agent : agents) {
|
||||
const auto &ai = rocprofiler::hsa_support::GetAgentInfo(agent.handle);
|
||||
for (const auto& agent : agents) {
|
||||
const auto& ai = rocprofiler::hsa_support::GetAgentInfo(agent.handle);
|
||||
if (ai.getType() != HSA_DEVICE_TYPE_GPU) {
|
||||
continue;
|
||||
}
|
||||
@@ -81,31 +80,30 @@ void PCSampler::Start() {
|
||||
}
|
||||
|
||||
void PCSampler::Stop() {
|
||||
if (!sampler_thread_.joinable()) { return; }
|
||||
if (!sampler_thread_.joinable()) {
|
||||
return;
|
||||
}
|
||||
|
||||
keep_running_ = false;
|
||||
sampler_thread_.join();
|
||||
}
|
||||
|
||||
void PCSampler::AddRecord(rocprofiler_record_pc_sample_t &record) {
|
||||
void PCSampler::AddRecord(rocprofiler_record_pc_sample_t& record) {
|
||||
const auto tool = rocprofiler::GetROCProfilerSingleton();
|
||||
const auto session = tool->GetSession(session_id_);
|
||||
const auto buffer = session->GetBuffer(buffer_id_);
|
||||
|
||||
std::lock_guard<std::mutex> lk(session->GetSessionLock());
|
||||
|
||||
record.header = {
|
||||
ROCPROFILER_PC_SAMPLING_RECORD,
|
||||
{ tool->GetUniqueRecordId() }
|
||||
};
|
||||
record.header = {ROCPROFILER_PC_SAMPLING_RECORD, {tool->GetUniqueRecordId()}};
|
||||
buffer->AddRecord(record);
|
||||
}
|
||||
|
||||
void PCSampler::SamplerLoop() {
|
||||
while (keep_running_) {
|
||||
auto next_tick = std::chrono::steady_clock::now() + std::chrono::milliseconds(10);
|
||||
for (auto &agent : devices_) {
|
||||
auto &device = agent.second;
|
||||
for (auto& agent : devices_) {
|
||||
auto& device = agent.second;
|
||||
if (device.fd_.mmio2.get() >= 0) {
|
||||
gfxip::read_pc_samples_v9_ioctl(device, this);
|
||||
} else {
|
||||
@@ -116,4 +114,4 @@ void PCSampler::SamplerLoop() {
|
||||
}
|
||||
}
|
||||
|
||||
} // namespace rocprofiler::pc_sampler
|
||||
} // namespace rocprofiler::pc_sampler
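The `Start`/`Stop` guards and the `next_tick` computation in `PCSampler` above follow a common fixed-period worker pattern. The sketch below shows that pattern on its own, under stated assumptions: the hunk for `SamplerLoop` is cut off before the end of the loop body, so the `sleep_until(next_tick)` call is an assumption about how the tick is consumed, and `PeriodicSampler`, `ticks_`, and `ticks()` are hypothetical names that stand in for the real sampling work.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical fixed-period worker; not the rocprofiler PCSampler class.
class PeriodicSampler {
 public:
  ~PeriodicSampler() { Stop(); }

  // Idempotent: a second Start() while the thread is running does nothing.
  void Start() {
    if (thread_.joinable()) {
      return;
    }
    keep_running_ = true;
    thread_ = std::thread([this] { Loop(); });
  }

  // Idempotent: Stop() without a running thread does nothing.
  void Stop() {
    if (!thread_.joinable()) {
      return;
    }
    keep_running_ = false;
    thread_.join();
  }

  // Number of loop iterations executed so far.
  unsigned ticks() const { return ticks_.load(); }

 private:
  void Loop() {
    while (keep_running_) {
      // Same 10 ms period as in the diff above.
      auto next_tick = std::chrono::steady_clock::now() + std::chrono::milliseconds(10);
      ++ticks_;  // placeholder for the per-device sampling work
      // Assumption: sleep for the remainder of the period.
      std::this_thread::sleep_until(next_tick);
    }
  }

  std::atomic<bool> keep_running_{false};
  std::atomic<unsigned> ticks_{0};
  std::thread thread_;
};
```

Computing `next_tick` before the work and sleeping until that time point keeps the period close to 10 ms regardless of how long the work takes, as long as the work itself finishes within the period.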
|
||||
|
||||
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -23,244 +23,244 @@
|
||||
|
||||
// addressBlock: gc_grbmdec
|
||||
// base address: 0x8000
|
||||
#define mmGRBM_CNTL 0x0000
|
||||
#define mmGRBM_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_SKEW_CNTL 0x0001
|
||||
#define mmGRBM_SKEW_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_STATUS2 0x0002
|
||||
#define mmGRBM_STATUS2_BASE_IDX 0
|
||||
#define mmGRBM_PWR_CNTL 0x0003
|
||||
#define mmGRBM_PWR_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_STATUS 0x0004
|
||||
#define mmGRBM_STATUS_BASE_IDX 0
|
||||
#define mmGRBM_STATUS_SE0 0x0005
|
||||
#define mmGRBM_STATUS_SE0_BASE_IDX 0
|
||||
#define mmGRBM_STATUS_SE1 0x0006
|
||||
#define mmGRBM_STATUS_SE1_BASE_IDX 0
|
||||
#define mmGRBM_SOFT_RESET 0x0008
|
||||
#define mmGRBM_SOFT_RESET_BASE_IDX 0
|
||||
#define mmGRBM_GFX_CLKEN_CNTL 0x000c
|
||||
#define mmGRBM_GFX_CLKEN_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_WAIT_IDLE_CLOCKS 0x000d
|
||||
#define mmGRBM_WAIT_IDLE_CLOCKS_BASE_IDX 0
|
||||
#define mmGRBM_STATUS_SE2 0x000e
|
||||
#define mmGRBM_STATUS_SE2_BASE_IDX 0
|
||||
#define mmGRBM_STATUS_SE3 0x000f
|
||||
#define mmGRBM_STATUS_SE3_BASE_IDX 0
|
||||
#define mmGRBM_READ_ERROR 0x0016
|
||||
#define mmGRBM_READ_ERROR_BASE_IDX 0
|
||||
#define mmGRBM_READ_ERROR2 0x0017
|
||||
#define mmGRBM_READ_ERROR2_BASE_IDX 0
|
||||
#define mmGRBM_INT_CNTL 0x0018
|
||||
#define mmGRBM_INT_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_OP 0x0019
|
||||
#define mmGRBM_TRAP_OP_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_ADDR 0x001a
|
||||
#define mmGRBM_TRAP_ADDR_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_ADDR_MSK 0x001b
|
||||
#define mmGRBM_TRAP_ADDR_MSK_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_WD 0x001c
|
||||
#define mmGRBM_TRAP_WD_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_WD_MSK 0x001d
|
||||
#define mmGRBM_TRAP_WD_MSK_BASE_IDX 0
|
||||
#define mmGRBM_DSM_BYPASS 0x001e
|
||||
#define mmGRBM_DSM_BYPASS_BASE_IDX 0
|
||||
#define mmGRBM_WRITE_ERROR 0x001f
|
||||
#define mmGRBM_WRITE_ERROR_BASE_IDX 0
|
||||
#define mmGRBM_IOV_ERROR 0x0020
|
||||
#define mmGRBM_IOV_ERROR_BASE_IDX 0
|
||||
#define mmGRBM_CHIP_REVISION 0x0021
|
||||
#define mmGRBM_CHIP_REVISION_BASE_IDX 0
|
||||
#define mmGRBM_GFX_CNTL 0x0022
|
||||
#define mmGRBM_GFX_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_RSMU_CFG 0x0023
|
||||
#define mmGRBM_RSMU_CFG_BASE_IDX 0
|
||||
#define mmGRBM_IH_CREDIT 0x0024
|
||||
#define mmGRBM_IH_CREDIT_BASE_IDX 0
|
||||
#define mmGRBM_PWR_CNTL2 0x0025
|
||||
#define mmGRBM_PWR_CNTL2_BASE_IDX 0
|
||||
#define mmGRBM_UTCL2_INVAL_RANGE_START 0x0026
|
||||
#define mmGRBM_UTCL2_INVAL_RANGE_START_BASE_IDX 0
|
||||
#define mmGRBM_UTCL2_INVAL_RANGE_END 0x0027
|
||||
#define mmGRBM_UTCL2_INVAL_RANGE_END_BASE_IDX 0
|
||||
#define mmGRBM_RSMU_READ_ERROR 0x0028
|
||||
#define mmGRBM_RSMU_READ_ERROR_BASE_IDX 0
|
||||
#define mmGRBM_CHICKEN_BITS 0x0029
|
||||
#define mmGRBM_CHICKEN_BITS_BASE_IDX 0
|
||||
#define mmGRBM_FENCE_RANGE0 0x002a
|
||||
#define mmGRBM_FENCE_RANGE0_BASE_IDX 0
|
||||
#define mmGRBM_FENCE_RANGE1 0x002b
|
||||
#define mmGRBM_FENCE_RANGE1_BASE_IDX 0
|
||||
#define mmGRBM_NOWHERE 0x003f
|
||||
#define mmGRBM_NOWHERE_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG0 0x0040
|
||||
#define mmGRBM_SCRATCH_REG0_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG1 0x0041
|
||||
#define mmGRBM_SCRATCH_REG1_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG2 0x0042
|
||||
#define mmGRBM_SCRATCH_REG2_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG3 0x0043
|
||||
#define mmGRBM_SCRATCH_REG3_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG4 0x0044
|
||||
#define mmGRBM_SCRATCH_REG4_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG5 0x0045
|
||||
#define mmGRBM_SCRATCH_REG5_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG6 0x0046
|
||||
#define mmGRBM_SCRATCH_REG6_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG7 0x0047
|
||||
#define mmGRBM_SCRATCH_REG7_BASE_IDX 0
|
||||
#define mmGRBM_CNTL 0x0000
|
||||
#define mmGRBM_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_SKEW_CNTL 0x0001
|
||||
#define mmGRBM_SKEW_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_STATUS2 0x0002
|
||||
#define mmGRBM_STATUS2_BASE_IDX 0
|
||||
#define mmGRBM_PWR_CNTL 0x0003
|
||||
#define mmGRBM_PWR_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_STATUS 0x0004
|
||||
#define mmGRBM_STATUS_BASE_IDX 0
|
||||
#define mmGRBM_STATUS_SE0 0x0005
|
||||
#define mmGRBM_STATUS_SE0_BASE_IDX 0
|
||||
#define mmGRBM_STATUS_SE1 0x0006
|
||||
#define mmGRBM_STATUS_SE1_BASE_IDX 0
|
||||
#define mmGRBM_SOFT_RESET 0x0008
|
||||
#define mmGRBM_SOFT_RESET_BASE_IDX 0
|
||||
#define mmGRBM_GFX_CLKEN_CNTL 0x000c
|
||||
#define mmGRBM_GFX_CLKEN_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_WAIT_IDLE_CLOCKS 0x000d
|
||||
#define mmGRBM_WAIT_IDLE_CLOCKS_BASE_IDX 0
|
||||
#define mmGRBM_STATUS_SE2 0x000e
|
||||
#define mmGRBM_STATUS_SE2_BASE_IDX 0
|
||||
#define mmGRBM_STATUS_SE3 0x000f
|
||||
#define mmGRBM_STATUS_SE3_BASE_IDX 0
|
||||
#define mmGRBM_READ_ERROR 0x0016
|
||||
#define mmGRBM_READ_ERROR_BASE_IDX 0
|
||||
#define mmGRBM_READ_ERROR2 0x0017
|
||||
#define mmGRBM_READ_ERROR2_BASE_IDX 0
|
||||
#define mmGRBM_INT_CNTL 0x0018
|
||||
#define mmGRBM_INT_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_OP 0x0019
|
||||
#define mmGRBM_TRAP_OP_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_ADDR 0x001a
|
||||
#define mmGRBM_TRAP_ADDR_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_ADDR_MSK 0x001b
|
||||
#define mmGRBM_TRAP_ADDR_MSK_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_WD 0x001c
|
||||
#define mmGRBM_TRAP_WD_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_WD_MSK 0x001d
|
||||
#define mmGRBM_TRAP_WD_MSK_BASE_IDX 0
|
||||
#define mmGRBM_DSM_BYPASS 0x001e
|
||||
#define mmGRBM_DSM_BYPASS_BASE_IDX 0
|
||||
#define mmGRBM_WRITE_ERROR 0x001f
|
||||
#define mmGRBM_WRITE_ERROR_BASE_IDX 0
|
||||
#define mmGRBM_IOV_ERROR 0x0020
|
||||
#define mmGRBM_IOV_ERROR_BASE_IDX 0
|
||||
#define mmGRBM_CHIP_REVISION 0x0021
|
||||
#define mmGRBM_CHIP_REVISION_BASE_IDX 0
|
||||
#define mmGRBM_GFX_CNTL 0x0022
|
||||
#define mmGRBM_GFX_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_RSMU_CFG 0x0023
|
||||
#define mmGRBM_RSMU_CFG_BASE_IDX 0
|
||||
#define mmGRBM_IH_CREDIT 0x0024
|
||||
#define mmGRBM_IH_CREDIT_BASE_IDX 0
|
||||
#define mmGRBM_PWR_CNTL2 0x0025
|
||||
#define mmGRBM_PWR_CNTL2_BASE_IDX 0
|
||||
#define mmGRBM_UTCL2_INVAL_RANGE_START 0x0026
|
||||
#define mmGRBM_UTCL2_INVAL_RANGE_START_BASE_IDX 0
|
||||
#define mmGRBM_UTCL2_INVAL_RANGE_END 0x0027
|
||||
#define mmGRBM_UTCL2_INVAL_RANGE_END_BASE_IDX 0
|
||||
#define mmGRBM_RSMU_READ_ERROR 0x0028
|
||||
#define mmGRBM_RSMU_READ_ERROR_BASE_IDX 0
|
||||
#define mmGRBM_CHICKEN_BITS 0x0029
|
||||
#define mmGRBM_CHICKEN_BITS_BASE_IDX 0
|
||||
#define mmGRBM_FENCE_RANGE0 0x002a
|
||||
#define mmGRBM_FENCE_RANGE0_BASE_IDX 0
|
||||
#define mmGRBM_FENCE_RANGE1 0x002b
|
||||
#define mmGRBM_FENCE_RANGE1_BASE_IDX 0
|
||||
#define mmGRBM_NOWHERE 0x003f
|
||||
#define mmGRBM_NOWHERE_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG0 0x0040
|
||||
#define mmGRBM_SCRATCH_REG0_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG1 0x0041
|
||||
#define mmGRBM_SCRATCH_REG1_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG2 0x0042
|
||||
#define mmGRBM_SCRATCH_REG2_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG3 0x0043
|
||||
#define mmGRBM_SCRATCH_REG3_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG4 0x0044
|
||||
#define mmGRBM_SCRATCH_REG4_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG5 0x0045
|
||||
#define mmGRBM_SCRATCH_REG5_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG6 0x0046
|
||||
#define mmGRBM_SCRATCH_REG6_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG7 0x0047
|
||||
#define mmGRBM_SCRATCH_REG7_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_cppdec2
|
||||
// base address: 0xc600
|
||||
#define mmCPF_EDC_TAG_CNT 0x1189
|
||||
#define mmCPF_EDC_TAG_CNT_BASE_IDX 0
|
||||
#define mmCPF_EDC_ROQ_CNT 0x118a
|
||||
#define mmCPF_EDC_ROQ_CNT_BASE_IDX 0
|
||||
#define mmCPG_EDC_TAG_CNT 0x118b
|
||||
#define mmCPG_EDC_TAG_CNT_BASE_IDX 0
|
||||
#define mmCPG_EDC_DMA_CNT 0x118d
|
||||
#define mmCPG_EDC_DMA_CNT_BASE_IDX 0
|
||||
#define mmCPC_EDC_SCRATCH_CNT 0x118e
|
||||
#define mmCPC_EDC_SCRATCH_CNT_BASE_IDX 0
|
||||
#define mmCPC_EDC_UCODE_CNT 0x118f
|
||||
#define mmCPC_EDC_UCODE_CNT_BASE_IDX 0
|
||||
#define mmDC_EDC_STATE_CNT 0x1191
|
||||
#define mmDC_EDC_STATE_CNT_BASE_IDX 0
|
||||
#define mmDC_EDC_CSINVOC_CNT 0x1192
|
||||
#define mmDC_EDC_CSINVOC_CNT_BASE_IDX 0
|
||||
#define mmDC_EDC_RESTORE_CNT 0x1193
|
||||
#define mmDC_EDC_RESTORE_CNT_BASE_IDX 0
|
||||
#define mmCPF_EDC_TAG_CNT 0x1189
|
||||
#define mmCPF_EDC_TAG_CNT_BASE_IDX 0
|
||||
#define mmCPF_EDC_ROQ_CNT 0x118a
|
||||
#define mmCPF_EDC_ROQ_CNT_BASE_IDX 0
|
||||
#define mmCPG_EDC_TAG_CNT 0x118b
|
||||
#define mmCPG_EDC_TAG_CNT_BASE_IDX 0
|
||||
#define mmCPG_EDC_DMA_CNT 0x118d
|
||||
#define mmCPG_EDC_DMA_CNT_BASE_IDX 0
|
||||
#define mmCPC_EDC_SCRATCH_CNT 0x118e
|
||||
#define mmCPC_EDC_SCRATCH_CNT_BASE_IDX 0
|
||||
#define mmCPC_EDC_UCODE_CNT 0x118f
|
||||
#define mmCPC_EDC_UCODE_CNT_BASE_IDX 0
|
||||
#define mmDC_EDC_STATE_CNT 0x1191
|
||||
#define mmDC_EDC_STATE_CNT_BASE_IDX 0
|
||||
#define mmDC_EDC_CSINVOC_CNT 0x1192
|
||||
#define mmDC_EDC_CSINVOC_CNT_BASE_IDX 0
|
||||
#define mmDC_EDC_RESTORE_CNT 0x1193
|
||||
#define mmDC_EDC_RESTORE_CNT_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_gdsdec
|
||||
// base address: 0x9700
|
||||
#define mmGDS_EDC_CNT 0x05c5
|
||||
#define mmGDS_EDC_CNT_BASE_IDX 0
|
||||
#define mmGDS_EDC_GRBM_CNT 0x05c6
|
||||
#define mmGDS_EDC_GRBM_CNT_BASE_IDX 0
|
||||
#define mmGDS_EDC_OA_DED 0x05c7
|
||||
#define mmGDS_EDC_OA_DED_BASE_IDX 0
|
||||
#define mmGDS_EDC_OA_PHY_CNT 0x05cb
|
||||
#define mmGDS_EDC_OA_PHY_CNT_BASE_IDX 0
|
||||
#define mmGDS_EDC_OA_PIPE_CNT 0x05cc
|
||||
#define mmGDS_EDC_OA_PIPE_CNT_BASE_IDX 0
|
||||
#define mmGDS_EDC_CNT 0x05c5
|
||||
#define mmGDS_EDC_CNT_BASE_IDX 0
|
||||
#define mmGDS_EDC_GRBM_CNT 0x05c6
|
||||
#define mmGDS_EDC_GRBM_CNT_BASE_IDX 0
|
||||
#define mmGDS_EDC_OA_DED 0x05c7
|
||||
#define mmGDS_EDC_OA_DED_BASE_IDX 0
|
||||
#define mmGDS_EDC_OA_PHY_CNT 0x05cb
|
||||
#define mmGDS_EDC_OA_PHY_CNT_BASE_IDX 0
|
||||
#define mmGDS_EDC_OA_PIPE_CNT 0x05cc
|
||||
#define mmGDS_EDC_OA_PIPE_CNT_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_shsdec
|
||||
// base address: 0x9000
|
||||
#define mmSPI_EDC_CNT 0x0445
|
||||
#define mmSPI_EDC_CNT_BASE_IDX 0
|
||||
#define mmSPI_EDC_CNT 0x0445
|
||||
#define mmSPI_EDC_CNT_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_sqdec
|
||||
// base address: 0x8c00
|
||||
#define mmSQC_EDC_CNT2 0x032c
|
||||
#define mmSQC_EDC_CNT2_BASE_IDX 0
|
||||
#define mmSQC_EDC_CNT3 0x032d
|
||||
#define mmSQC_EDC_CNT3_BASE_IDX 0
|
||||
#define mmSQC_EDC_PARITY_CNT3 0x032e
|
||||
#define mmSQC_EDC_PARITY_CNT3_BASE_IDX 0
|
||||
#define mmSQC_EDC_CNT 0x03a2
|
||||
#define mmSQC_EDC_CNT_BASE_IDX 0
|
||||
#define mmSQ_EDC_SEC_CNT 0x03a3
|
||||
#define mmSQ_EDC_SEC_CNT_BASE_IDX 0
|
||||
#define mmSQ_EDC_DED_CNT 0x03a4
|
||||
#define mmSQ_EDC_DED_CNT_BASE_IDX 0
|
||||
#define mmSQ_EDC_INFO 0x03a5
|
||||
#define mmSQ_EDC_INFO_BASE_IDX 0
|
||||
#define mmSQ_EDC_CNT 0x03a6
|
||||
#define mmSQ_EDC_CNT_BASE_IDX 0
|
||||
#define mmSQC_EDC_CNT2 0x032c
|
||||
#define mmSQC_EDC_CNT2_BASE_IDX 0
|
||||
#define mmSQC_EDC_CNT3 0x032d
|
||||
#define mmSQC_EDC_CNT3_BASE_IDX 0
|
||||
#define mmSQC_EDC_PARITY_CNT3 0x032e
|
||||
#define mmSQC_EDC_PARITY_CNT3_BASE_IDX 0
|
||||
#define mmSQC_EDC_CNT 0x03a2
|
||||
#define mmSQC_EDC_CNT_BASE_IDX 0
|
||||
#define mmSQ_EDC_SEC_CNT 0x03a3
|
||||
#define mmSQ_EDC_SEC_CNT_BASE_IDX 0
|
||||
#define mmSQ_EDC_DED_CNT 0x03a4
|
||||
#define mmSQ_EDC_DED_CNT_BASE_IDX 0
|
||||
#define mmSQ_EDC_INFO 0x03a5
|
||||
#define mmSQ_EDC_INFO_BASE_IDX 0
|
||||
#define mmSQ_EDC_CNT 0x03a6
|
||||
#define mmSQ_EDC_CNT_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_tpdec
|
||||
// base address: 0x9400
|
||||
#define mmTA_EDC_CNT 0x0586
|
||||
#define mmTA_EDC_CNT_BASE_IDX 0
|
||||
#define mmTA_EDC_CNT 0x0586
|
||||
#define mmTA_EDC_CNT_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_tcdec
|
||||
// base address: 0xac00
|
||||
#define mmTCP_EDC_CNT 0x0b17
|
||||
#define mmTCP_EDC_CNT_BASE_IDX 0
|
||||
#define mmTCP_EDC_CNT_NEW 0x0b18
|
||||
#define mmTCP_EDC_CNT_NEW_BASE_IDX 0
|
||||
#define mmTCP_ATC_EDC_GATCL1_CNT 0x12b1
|
||||
#define mmTCP_ATC_EDC_GATCL1_CNT_BASE_IDX 0
|
||||
#define mmTCI_EDC_CNT 0x0b60
|
||||
#define mmTCI_EDC_CNT_BASE_IDX 0
|
||||
#define mmTCC_EDC_CNT 0x0b82
|
||||
#define mmTCC_EDC_CNT_BASE_IDX 0
|
||||
#define mmTCC_EDC_CNT2 0x0b83
|
||||
#define mmTCC_EDC_CNT2_BASE_IDX 0
|
||||
#define mmTCA_EDC_CNT 0x0bc5
|
||||
#define mmTCA_EDC_CNT_BASE_IDX 0
|
||||
#define mmTCP_EDC_CNT 0x0b17
|
||||
#define mmTCP_EDC_CNT_BASE_IDX 0
|
||||
#define mmTCP_EDC_CNT_NEW 0x0b18
|
||||
#define mmTCP_EDC_CNT_NEW_BASE_IDX 0
|
||||
#define mmTCP_ATC_EDC_GATCL1_CNT 0x12b1
|
||||
#define mmTCP_ATC_EDC_GATCL1_CNT_BASE_IDX 0
|
||||
#define mmTCI_EDC_CNT 0x0b60
|
||||
#define mmTCI_EDC_CNT_BASE_IDX 0
|
||||
#define mmTCC_EDC_CNT 0x0b82
|
||||
#define mmTCC_EDC_CNT_BASE_IDX 0
|
||||
#define mmTCC_EDC_CNT2 0x0b83
|
||||
#define mmTCC_EDC_CNT2_BASE_IDX 0
|
||||
#define mmTCA_EDC_CNT 0x0bc5
|
||||
#define mmTCA_EDC_CNT_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_tpdec
|
||||
// base address: 0x9400
|
||||
#define mmTD_EDC_CNT 0x052e
|
||||
#define mmTD_EDC_CNT_BASE_IDX 0
|
||||
#define mmTA_EDC_CNT 0x0586
|
||||
#define mmTA_EDC_CNT_BASE_IDX 0
|
||||
#define mmTD_EDC_CNT 0x052e
|
||||
#define mmTD_EDC_CNT_BASE_IDX 0
|
||||
#define mmTA_EDC_CNT 0x0586
|
||||
#define mmTA_EDC_CNT_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_ea_gceadec2
|
||||
// base address: 0x9c00
|
||||
#define mmGCEA_EDC_CNT 0x0706
|
||||
#define mmGCEA_EDC_CNT_BASE_IDX 0
|
||||
#define mmGCEA_EDC_CNT2 0x0707
|
||||
#define mmGCEA_EDC_CNT2_BASE_IDX 0
|
||||
#define mmGCEA_EDC_CNT3 0x071b
|
||||
#define mmGCEA_EDC_CNT3_BASE_IDX 0
|
||||
#define mmGCEA_ERR_STATUS 0x0712
|
||||
#define mmGCEA_ERR_STATUS_BASE_IDX 0
|
||||
#define mmGCEA_EDC_CNT 0x0706
|
||||
#define mmGCEA_EDC_CNT_BASE_IDX 0
|
||||
#define mmGCEA_EDC_CNT2 0x0707
|
||||
#define mmGCEA_EDC_CNT2_BASE_IDX 0
|
||||
#define mmGCEA_EDC_CNT3 0x071b
|
||||
#define mmGCEA_EDC_CNT3_BASE_IDX 0
|
||||
#define mmGCEA_ERR_STATUS 0x0712
|
||||
#define mmGCEA_ERR_STATUS_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_gfxudec
|
||||
// base address: 0x30000
|
||||
#define mmSCRATCH_REG0 0x2040
|
||||
#define mmSCRATCH_REG0_BASE_IDX 1
|
||||
#define mmSCRATCH_REG1 0x2041
|
||||
#define mmSCRATCH_REG1_BASE_IDX 1
|
||||
#define mmSCRATCH_REG2 0x2042
|
||||
#define mmSCRATCH_REG2_BASE_IDX 1
|
||||
#define mmSCRATCH_REG3 0x2043
|
||||
#define mmSCRATCH_REG3_BASE_IDX 1
|
||||
#define mmSCRATCH_REG4 0x2044
|
||||
#define mmSCRATCH_REG4_BASE_IDX 1
|
||||
#define mmSCRATCH_REG5 0x2045
|
||||
#define mmSCRATCH_REG5_BASE_IDX 1
|
||||
#define mmSCRATCH_REG6 0x2046
|
||||
#define mmSCRATCH_REG6_BASE_IDX 1
|
||||
#define mmSCRATCH_REG7 0x2047
|
||||
#define mmSCRATCH_REG7_BASE_IDX 1
|
||||
#define mmGRBM_GFX_INDEX 0x2200
|
||||
#define mmGRBM_GFX_INDEX_BASE_IDX 1
|
||||
#define mmSCRATCH_REG0 0x2040
|
||||
#define mmSCRATCH_REG0_BASE_IDX 1
|
||||
#define mmSCRATCH_REG1 0x2041
|
||||
#define mmSCRATCH_REG1_BASE_IDX 1
|
||||
#define mmSCRATCH_REG2 0x2042
|
||||
#define mmSCRATCH_REG2_BASE_IDX 1
|
||||
#define mmSCRATCH_REG3 0x2043
|
||||
#define mmSCRATCH_REG3_BASE_IDX 1
|
||||
#define mmSCRATCH_REG4 0x2044
|
||||
#define mmSCRATCH_REG4_BASE_IDX 1
|
||||
#define mmSCRATCH_REG5 0x2045
|
||||
#define mmSCRATCH_REG5_BASE_IDX 1
|
||||
#define mmSCRATCH_REG6 0x2046
|
||||
#define mmSCRATCH_REG6_BASE_IDX 1
|
||||
#define mmSCRATCH_REG7 0x2047
|
||||
#define mmSCRATCH_REG7_BASE_IDX 1
|
||||
#define mmGRBM_GFX_INDEX 0x2200
|
||||
#define mmGRBM_GFX_INDEX_BASE_IDX 1
|
||||
|
||||
// addressBlock: gc_utcl2_atcl2dec
|
||||
// base address: 0xa000
|
||||
#define mmATC_L2_CACHE_4K_DSM_INDEX 0x080e
|
||||
#define mmATC_L2_CACHE_4K_DSM_INDEX_BASE_IDX 0
|
||||
#define mmATC_L2_CACHE_2M_DSM_INDEX 0x080f
|
||||
#define mmATC_L2_CACHE_2M_DSM_INDEX_BASE_IDX 0
|
||||
#define mmATC_L2_CACHE_4K_DSM_CNTL 0x0810
|
||||
#define mmATC_L2_CACHE_4K_DSM_CNTL_BASE_IDX 0
|
||||
#define mmATC_L2_CACHE_2M_DSM_CNTL 0x0811
|
||||
#define mmATC_L2_CACHE_2M_DSM_CNTL_BASE_IDX 0
|
||||
#define mmATC_L2_CACHE_4K_DSM_INDEX 0x080e
|
||||
#define mmATC_L2_CACHE_4K_DSM_INDEX_BASE_IDX 0
|
||||
#define mmATC_L2_CACHE_2M_DSM_INDEX 0x080f
|
||||
#define mmATC_L2_CACHE_2M_DSM_INDEX_BASE_IDX 0
|
||||
#define mmATC_L2_CACHE_4K_DSM_CNTL 0x0810
|
||||
#define mmATC_L2_CACHE_4K_DSM_CNTL_BASE_IDX 0
|
||||
#define mmATC_L2_CACHE_2M_DSM_CNTL 0x0811
|
||||
#define mmATC_L2_CACHE_2M_DSM_CNTL_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_utcl2_vml2pfdec
|
||||
// base address: 0xa100
|
||||
#define mmVML2_MEM_ECC_INDEX 0x0860
|
||||
#define mmVML2_MEM_ECC_INDEX_BASE_IDX 0
|
||||
#define mmVML2_WALKER_MEM_ECC_INDEX 0x0861
|
||||
#define mmVML2_WALKER_MEM_ECC_INDEX_BASE_IDX 0
|
||||
#define mmUTCL2_MEM_ECC_INDEX 0x0862
|
||||
#define mmUTCL2_MEM_ECC_INDEX_BASE_IDX 0
|
||||
#define mmVML2_MEM_ECC_INDEX 0x0860
|
||||
#define mmVML2_MEM_ECC_INDEX_BASE_IDX 0
|
||||
#define mmVML2_WALKER_MEM_ECC_INDEX 0x0861
|
||||
#define mmVML2_WALKER_MEM_ECC_INDEX_BASE_IDX 0
|
||||
#define mmUTCL2_MEM_ECC_INDEX 0x0862
|
||||
#define mmUTCL2_MEM_ECC_INDEX_BASE_IDX 0
|
||||
|
||||
#define mmVML2_MEM_ECC_CNTL 0x0863
|
||||
#define mmVML2_MEM_ECC_CNTL_BASE_IDX 0
|
||||
#define mmVML2_WALKER_MEM_ECC_CNTL 0x0864
|
||||
#define mmVML2_WALKER_MEM_ECC_CNTL_BASE_IDX 0
|
||||
#define mmUTCL2_MEM_ECC_CNTL 0x0865
|
||||
#define mmUTCL2_MEM_ECC_CNTL_BASE_IDX 0
|
||||
#define mmVML2_MEM_ECC_CNTL 0x0863
|
||||
#define mmVML2_MEM_ECC_CNTL_BASE_IDX 0
|
||||
#define mmVML2_WALKER_MEM_ECC_CNTL 0x0864
|
||||
#define mmVML2_WALKER_MEM_ECC_CNTL_BASE_IDX 0
|
||||
#define mmUTCL2_MEM_ECC_CNTL 0x0865
|
||||
#define mmUTCL2_MEM_ECC_CNTL_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_rlcpdec
|
||||
// base address: 0x3b000
|
||||
#define mmRLC_EDC_CNT 0x4d40
|
||||
#define mmRLC_EDC_CNT_BASE_IDX 1
|
||||
#define mmRLC_EDC_CNT2 0x4d41
|
||||
#define mmRLC_EDC_CNT2_BASE_IDX 1
|
||||
#define mmRLC_EDC_CNT 0x4d40
|
||||
#define mmRLC_EDC_CNT_BASE_IDX 1
|
||||
#define mmRLC_EDC_CNT2 0x4d41
|
||||
#define mmRLC_EDC_CNT2_BASE_IDX 1
|
||||
|
||||
#endif
|
||||
|
||||
File diff suppressed because it is too large
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -41,18 +41,17 @@ namespace rocprofiler::pc_sampler::gfxip {
|
||||
|
||||
namespace {
|
||||
|
||||
static int find_pci_instance(const std::string &pci_string) {
|
||||
static int find_pci_instance(const std::string& pci_string) {
|
||||
rocprofiler::handle_t<DIR*, util::dir_closer> dir(opendir(DEBUG_DRI_PATH));
|
||||
if (dir.get() == nullptr) {
|
||||
char *errstr = strerror(errno);
|
||||
char* errstr = strerror(errno);
|
||||
warning("Can't open debugfs dri directory: %s\n", errstr);
|
||||
goto fail;
|
||||
}
|
||||
|
||||
struct dirent *dent;
|
||||
struct dirent* dent;
|
||||
while ((dent = readdir(dir.get())) != nullptr) {
|
||||
if (strcmp(dent->d_name, ".") == 0 || strcmp(dent->d_name, "..") == 0)
|
||||
continue;
|
||||
if (strcmp(dent->d_name, ".") == 0 || strcmp(dent->d_name, "..") == 0) continue;
|
||||
|
||||
std::string name(DEBUG_DRI_PATH);
|
||||
name += dent->d_name;
|
||||
@@ -66,8 +65,7 @@ static int find_pci_instance(const std::string &pci_string) {
|
||||
ifs >> device;
|
||||
}
|
||||
if (device.empty()) continue;
|
||||
if (auto p = device.find(DEV_PFX); p != device.npos)
|
||||
device.erase(p, strlen(DEV_PFX));
|
||||
if (auto p = device.find(DEV_PFX); p != device.npos) device.erase(p, strlen(DEV_PFX));
|
||||
if (pci_string == device) return std::stoi(dent->d_name);
|
||||
}
|
||||
|
||||
@@ -75,7 +73,7 @@ fail:
|
||||
return -1;
|
||||
}
|
||||
|
||||
} // namespace
|
||||
} // namespace
|
||||
|
||||
uint32_t pasid() {
|
||||
static std::optional<uint32_t> pasid;
|
||||
@@ -89,9 +87,7 @@ uint32_t pasid() {
|
||||
return *pasid;
|
||||
}
|
||||
|
||||
int debugfs_ioctl_set_state(
|
||||
const device_t& dev,
|
||||
const struct amdgpu_debugfs_regs2_iocdata &ioc) {
|
||||
int debugfs_ioctl_set_state(const device_t& dev, const struct amdgpu_debugfs_regs2_iocdata& ioc) {
|
||||
int ret = ioctl(dev.fd_.mmio2.get(), AMDGPU_DEBUGFS_REGS2_IOC_SET_STATE, &ioc);
|
||||
if (ret < 0) {
|
||||
fatal("Couldn't set register ioctl state\n");
|
||||
@@ -99,11 +95,9 @@ int debugfs_ioctl_set_state(
|
||||
return ret;
|
||||
}
|
||||
|
||||
int debugfs_ioctl_write_register(
|
||||
const device_t &dev,
|
||||
const struct amdgpu_debugfs_regs2_iocdata &ioc,
|
||||
const uint64_t addr,
|
||||
const uint32_t value) {
|
||||
int debugfs_ioctl_write_register(const device_t& dev,
|
||||
const struct amdgpu_debugfs_regs2_iocdata& ioc,
|
||||
const uint64_t addr, const uint32_t value) {
|
||||
debugfs_ioctl_set_state(dev, ioc);
|
||||
if (lseek(dev.fd_.mmio2.get(), addr * 4, SEEK_SET) < 0) {
|
||||
fatal("Cannot seek to MMIO address for write\n");
|
||||
@@ -115,10 +109,9 @@ int debugfs_ioctl_write_register(
|
||||
return r;
|
||||
}
|
||||
|
||||
uint32_t debugfs_ioctl_read_register(
|
||||
const device_t& dev,
|
||||
const struct amdgpu_debugfs_regs2_iocdata &ioc,
|
||||
const uint64_t addr) {
|
||||
uint32_t debugfs_ioctl_read_register(const device_t& dev,
|
||||
const struct amdgpu_debugfs_regs2_iocdata& ioc,
|
||||
const uint64_t addr) {
|
||||
// Select the SE, SH, and CU.
|
||||
debugfs_ioctl_set_state(dev, ioc);
|
||||
|
||||
@@ -134,20 +127,17 @@ uint32_t debugfs_ioctl_read_register(
 return value;
 }

-device_t::device_t(const bool pci_inited, const Agent::AgentInfo &info)
-: agent_info_(info)
-, pci_memory_(nullptr)
-{
+device_t::device_t(const bool pci_inited, const Agent::AgentInfo& info)
+: agent_info_(info), pci_memory_(nullptr) {
 const auto pci_domain = agent_info_.getPCIDomain();
 const auto pci_location_id = agent_info_.getPCILocationID();

 std::string name([pci_domain, pci_location_id]() {
 std::ostringstream out;
 out.fill('0');
-out << std::hex << std::setw(4) << pci_domain << ':'
-<< std::hex << std::setw(2) << (pci_location_id >> 8) << ':'
-<< std::hex << std::setw(2) << (pci_location_id & 0xFF) << '.'
-<< 0;
+out << std::hex << std::setw(4) << pci_domain << ':' << std::hex << std::setw(2)
+<< (pci_location_id >> 8) << ':' << std::hex << std::setw(2) << (pci_location_id & 0xFF)
+<< '.' << 0;
 return out.str();
 }());

@@ -162,8 +152,7 @@ device_t::device_t(const bool pci_inited, const Agent::AgentInfo &info)
 if (fd_.mmio2.get() < 0) {
 warning("Couldn't open amdgpu_regs2 debugfs file\n");
 if (!pci_inited) {
-constexpr char msg[] =
-"PCI system uninitialized; no PC sampling methods available\n";
+constexpr char msg[] = "PCI system uninitialized; no PC sampling methods available\n";
 fatal(msg);
 }
 } else {
@@ -173,8 +162,7 @@ device_t::device_t(const bool pci_inited, const Agent::AgentInfo &info)

 pci_device_ =
 pci_device_find_by_slot(pci_domain, pci_location_id >> 8, pci_location_id & 0xFF, 0);
-if (!pci_device_ || pci_device_probe(pci_device_))
-fatal("failed to probe the GPU device\n");
+if (!pci_device_ || pci_device_probe(pci_device_)) fatal("failed to probe the GPU device\n");

 // Look for a region between 256KB and 4096KB, 32-bit, non IO, and non prefetchable.
 for (size_t region = 0; region < sizeof(pci_device::regions) / sizeof(pci_device::regions[0]);
@@ -199,11 +187,9 @@ device_specific_init:
 }

 device_t::~device_t() {
-if (pci_memory_ &&
-pci_device_unmap_range(pci_device_, pci_memory_, pci_memory_size_))
-{
+if (pci_memory_ && pci_device_unmap_range(pci_device_, pci_memory_, pci_memory_size_)) {
 warning("failed to unmap the pci memory\n");
 }
 }

-} // namespace rocprofiler::pc_sampler::gfxip
+} // namespace rocprofiler::pc_sampler::gfxip
|
||||
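The `device_t` constructor hunks above build the PCI slot name with an immediately invoked lambda. The following is a minimal standalone sketch of that formatting, not code from the commit: the helper name `format_pci_name` and the `std::uint32_t` parameter types are assumptions for illustration, since the diff does not show the return types of `getPCIDomain()` or `getPCILocationID()`.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper mirroring the lambda in device_t::device_t.
// Produces "dddd:bb:dd.0": domain, then the high and low bytes of the
// location id, all zero-padded hexadecimal, with a fixed function of 0.
std::string format_pci_name(std::uint32_t pci_domain, std::uint32_t pci_location_id) {
  std::ostringstream out;
  out.fill('0');  // the fill character persists for the whole stream
  out << std::hex << std::setw(4) << pci_domain << ':'  // std::hex is sticky
      << std::setw(2) << (pci_location_id >> 8) << ':'  // setw applies to the next field only
      << std::setw(2) << (pci_location_id & 0xFF) << '.' << 0;
  return out.str();
}
```

Because `std::hex` stays in effect once set, the repeated `std::hex` manipulators in the original lambda are redundant but harmless; `std::setw` does have to be repeated before every padded field.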
@@ -52,14 +52,18 @@ namespace gfxip {
 namespace util {

 struct dir_closer {
-void operator()(DIR *dir) { if (dir != nullptr) closedir(dir); }
+void operator()(DIR* dir) {
+if (dir != nullptr) closedir(dir);
+}
 };

 struct fd_closer {
-void operator()(int fd) { if (fd >= 0) close(fd); }
+void operator()(int fd) {
+if (fd >= 0) close(fd);
+}
 };

-} // namespace rocprofiler::pc_sampler::gfxip::util
+} // namespace util

 struct amdgpu_debugfs_regs2_iocdata {
 __u32 use_srbm, use_grbm, pg_lock;
@@ -71,11 +75,10 @@ struct amdgpu_debugfs_regs2_iocdata {
 } srbm;
 };

-enum AMDGPU_DEBUGFS_REGS2_CMDS {
-AMDGPU_DEBUGFS_REGS2_CMD_SET_STATE = 0
-};
+enum AMDGPU_DEBUGFS_REGS2_CMDS { AMDGPU_DEBUGFS_REGS2_CMD_SET_STATE = 0 };

-#define AMDGPU_DEBUGFS_REGS2_IOC_SET_STATE _IOWR(0x20, AMDGPU_DEBUGFS_REGS2_CMD_SET_STATE, struct amdgpu_debugfs_regs2_iocdata)
+#define AMDGPU_DEBUGFS_REGS2_IOC_SET_STATE \
+_IOWR(0x20, AMDGPU_DEBUGFS_REGS2_CMD_SET_STATE, struct amdgpu_debugfs_regs2_iocdata)

 enum {
 GC_HWIP = 1, // Graphics Core IP
@@ -96,14 +99,14 @@ static constexpr int HWIP_MAX_INSTANCE = 11;
 (REG_FIELD_MASK(reg, field) & ((field_val) << REG_FIELD_SHIFT(reg, field))))

 struct device_t {
-device_t(const bool pci_inited, const Agent::AgentInfo &agent_info);
+device_t(const bool pci_inited, const Agent::AgentInfo& agent_info);
 ~device_t();

 device_t(const device_t&) = delete;
 device_t& operator=(const device_t&) = delete;
 device_t(device_t&&) = default;

-const Agent::AgentInfo &agent_info_;
+const Agent::AgentInfo& agent_info_;

 struct pci_device* pci_device_;
 size_t pci_memory_size_;
@@ -120,19 +123,23 @@ struct device_t {

 uint32_t pasid();

-int debugfs_ioctl_set_state(const device_t& dev, const struct amdgpu_debugfs_regs2_iocdata &ioc);
-int debugfs_ioctl_write_register(const device_t &dev, const struct amdgpu_debugfs_regs2_iocdata &ioc, const uint64_t addr, const uint32_t value);
-uint32_t debugfs_ioctl_read_register(const device_t& dev, const struct amdgpu_debugfs_regs2_iocdata &ioc, const uint64_t addr);
+int debugfs_ioctl_set_state(const device_t& dev, const struct amdgpu_debugfs_regs2_iocdata& ioc);
+int debugfs_ioctl_write_register(const device_t& dev,
+const struct amdgpu_debugfs_regs2_iocdata& ioc,
+const uint64_t addr, const uint32_t value);
+uint32_t debugfs_ioctl_read_register(const device_t& dev,
+const struct amdgpu_debugfs_regs2_iocdata& ioc,
+const uint64_t addr);

 void vega10_reg_offset_init(device_t& dev);
 void vega20_reg_offset_init(device_t& dev);
 void arct_reg_offset_init(device_t& dev);
 void aldebaran_reg_offset_init(device_t& dev);

-void read_pc_samples_v9(const device_t& dev, PCSampler *sampler);
-void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler);
+void read_pc_samples_v9(const device_t& dev, PCSampler* sampler);
+void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler* sampler);

-} // namespace rocprofiler::pc_sampler::gfxip
+} // namespace gfxip

-} // namespace rocprofiler::pc_sampler
+} // namespace rocprofiler::pc_sampler
 #endif // SRC_PCSAMPLER_GFXIP_GFXIP_H_
|
||||
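Only the tail of the `REG_SET_FIELD` macro is visible in the header hunk above: the shifted field value is ANDed with the field mask. A minimal sketch of the conventional read-modify-write form follows, assuming the elided first half of the macro clears the field's bits before ORing in the new value. The function names and the explicit `mask`/`shift` parameters are hypothetical stand-ins for `REG_FIELD_MASK` and `REG_FIELD_SHIFT`, and the values used with them are not real register layouts.

```cpp
#include <cstdint>

// Hypothetical stand-in for REG_SET_FIELD with mask and shift passed explicitly.
// Clears the field in `orig`, then inserts `value`, truncated to the field width.
constexpr std::uint32_t set_field(std::uint32_t orig, std::uint32_t mask, unsigned shift,
                                  std::uint32_t value) {
  return (orig & ~mask) | (mask & (value << shift));
}

// Hypothetical stand-in for REG_GET_FIELD: isolate the field, then right-align it.
constexpr std::uint32_t get_field(std::uint32_t reg, std::uint32_t mask, unsigned shift) {
  return (reg & mask) >> shift;
}
```

Masking after the shift is what keeps an oversized `value` from spilling into neighbouring fields, which matters when several fields are packed into one index register as in `debugfs_ioctl_read_sq_register`.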
@@ -54,12 +54,10 @@ uint32_t read_sq_register(const device_t& dev, uint32_t simd, uint32_t wave_id,
 return dev.pci_memory_[REG_OFFSET(GC, 0, mmSQ_IND_DATA)];
 }

-uint32_t debugfs_ioctl_read_sq_register(
-const device_t &dev,
-const struct amdgpu_debugfs_regs2_iocdata &ioc,
-const uint32_t simd,
-const uint32_t wave_id,
-const uint32_t register_address) {
+uint32_t debugfs_ioctl_read_sq_register(const device_t& dev,
+const struct amdgpu_debugfs_regs2_iocdata& ioc,
+const uint32_t simd, const uint32_t wave_id,
+const uint32_t register_address) {
 uint32_t data = REG_SET_FIELD(0, SQ_IND_INDEX, WAVE_ID, wave_id);
 data = REG_SET_FIELD(data, SQ_IND_INDEX, SIMD_ID, simd);
 data = REG_SET_FIELD(data, SQ_IND_INDEX, INDEX, register_address);
@@ -67,21 +65,15 @@ uint32_t debugfs_ioctl_read_sq_register(
 return debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmSQ_IND_DATA));
 }

-void fill_record(
-const device_t &dev,
-rocprofiler_record_pc_sample_t *record,
-uint32_t se,
-uint64_t pc,
-hsa_kernel_dispatch_packet_t *pkt) {
-
+void fill_record(const device_t& dev, rocprofiler_record_pc_sample_t* record, uint32_t se,
+uint64_t pc, hsa_kernel_dispatch_packet_t* pkt) {
 /*
 * XXX: Use of the reserved2 field in the HSA dispatch packet to uniquely
 * identify kernel dispatches for PC sampling is an internal implementation
 * detail which is subject to change. See the comment associated with
 * rocprofiler::rocprofiler::kernel_dispatch_counter_.
 */
-record->pc_sample.dispatch_id =
-rocprofiler_kernel_dispatch_id_t{pkt->reserved2};
+record->pc_sample.dispatch_id = rocprofiler_kernel_dispatch_id_t{pkt->reserved2};

 /*
 * TODO: Fill this with gpu_clock_counter via AMDKFD_IOC_GET_CLOCK_COUNTERS,
@@ -98,12 +90,12 @@ void fill_record(
 * Future sampling methods may fill this in automatically from the GPU's
 * real-time counter.
 */
-//record->pc_sample.cycle = 0;
+// record->pc_sample.cycle = 0;
 rocprofiler_get_timestamp(&record->pc_sample.timestamp);

 record->pc_sample.pc = pc;
 record->pc_sample.se = se;
-const auto &hdl = dev.agent_info_.getHandle();
+const auto& hdl = dev.agent_info_.getHandle();

 /*
 * XXX FIXME: For consistency, this is the same method as used by
@@ -112,17 +104,16 @@ void fill_record(
 * comment in rocprofiler::hsa_support::Initialize about using KFD's gpu_id for
 * more information.
 */
-record->pc_sample.gpu_id = rocprofiler_agent_id_t{
-(uint64_t)rocprofiler::hsa_support::GetAgentInfo(hdl).getIndex()};
+record->pc_sample.gpu_id =
+rocprofiler_agent_id_t{(uint64_t)rocprofiler::hsa_support::GetAgentInfo(hdl).getIndex()};
 }

 } // namespace

-void read_pc_samples_v9(const device_t& dev, PCSampler *sampler) {
+void read_pc_samples_v9(const device_t& dev, PCSampler* sampler) {
 assert(sampler);

-uint32_t saved_grbm_gfx_index =
-dev.pci_memory_[REG_OFFSET(GC, 0, mmGRBM_GFX_INDEX)];
+uint32_t saved_grbm_gfx_index = dev.pci_memory_[REG_OFFSET(GC, 0, mmGRBM_GFX_INDEX)];
 uint32_t data;

 for (uint32_t se = 0; se < dev.agent_info_.getShaderEngineCount(); ++se)
@@ -174,19 +165,16 @@ void read_pc_samples_v9(const device_t& dev, PCSampler *sampler) {
 data = REG_SET_FIELD(data, GRBM_GFX_CNTL, VMID, vm_id);
 dev.pci_memory_[REG_OFFSET(GC, 0, mmGRBM_GFX_CNTL)] = data;

-uint32_t pq_base_lo =
-dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE)];
-uint32_t pq_base_hi =
-dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)] & 0xff;
+uint32_t pq_base_lo = dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE)];
+uint32_t pq_base_hi = dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)] & 0xff;
 uint64_t pq_base = (uint64_t)pq_base_hi << 40 | (uint64_t)pq_base_lo << 8;
 uint32_t cp_hqd_pq_control_queue_size =
-dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_CONTROL)] & 0x3f;
+dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_CONTROL)] & 0x3f;
 uint32_t queue_size = 1 << (cp_hqd_pq_control_queue_size + 1);

-auto pkt = (hsa_kernel_dispatch_packet_t*)(
-pq_base + disp_idx % queue_size *
-sizeof(hsa_kernel_dispatch_packet_t)
-);
+auto pkt = (hsa_kernel_dispatch_packet_t*)(pq_base +
+disp_idx % queue_size *
+sizeof(hsa_kernel_dispatch_packet_t));
 fill_record(dev, &record, se, *pc, pkt);
 }

@@ -208,10 +196,10 @@ void read_pc_samples_v9(const device_t& dev, PCSampler *sampler) {
 dev.pci_memory_[REG_OFFSET(GC, 0, mmGRBM_GFX_INDEX)] = saved_grbm_gfx_index;
 }

-void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler) {
+void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler* sampler) {
 assert(sampler);

-struct amdgpu_debugfs_regs2_iocdata ioc{};
+struct amdgpu_debugfs_regs2_iocdata ioc {};
 ioc.use_grbm = 1;

 uint32_t data;
@@ -236,11 +224,13 @@ void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler) {

 // Skip this slot if the wave is not valid.
 debugfs_ioctl_set_state(dev, ioc);
-uint32_t status = debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_STATUS);
+uint32_t status =
+debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_STATUS);
 if (!REG_GET_FIELD(status, SQ_WAVE_STATUS, VALID)) continue;

 debugfs_ioctl_set_state(dev, ioc);
-uint32_t hw_id = debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_HW_ID);
+uint32_t hw_id =
+debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_HW_ID);
 uint32_t vm_id = REG_GET_FIELD(hw_id, SQ_WAVE_HW_ID, VM_ID);

 rocprofiler_record_pc_sample_t record;
@@ -248,12 +238,16 @@ void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler) {
 // If the wave's PASID matches the process', read and report the PC
 // and dispatch packet for the wave.
 std::optional<uint64_t> pc;
-if (debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(OSSSYS, 0, mmIH_VMID_0_LUT) + vm_id) == pasid()) {
-pc = (uint64_t)debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_PC_HI) << 32 |
+if (debugfs_ioctl_read_register(
+dev, ioc, REG_OFFSET(OSSSYS, 0, mmIH_VMID_0_LUT) + vm_id) == pasid()) {
+pc =
+(uint64_t)debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_PC_HI)
+<< 32 |
 debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_PC_LO);

 // The dispatch index into the queue
-uint32_t disp_idx = debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_TTMP6);
+uint32_t disp_idx =
+debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_TTMP6);

 // Set up reading CP_HQD_PQ_BASE and CP_HQD_PQ_BASE_HI
 uint32_t pipe_id = REG_GET_FIELD(hw_id, SQ_WAVE_HW_ID, PIPE_ID);
@@ -266,18 +260,19 @@ void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler) {
 debugfs_ioctl_write_register(dev, ioc, REG_OFFSET(GC, 0, mmGRBM_GFX_CNTL), data);

 uint32_t pq_base_lo =
-debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE));
+debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE));
 uint32_t pq_base_hi =
-debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)) & 0xff;
+debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)) &
+0xff;
 uint64_t pq_base = (uint64_t)pq_base_hi << 40 | (uint64_t)pq_base_lo << 8;
 uint32_t cp_hqd_pq_control_queue_size =
-debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_CONTROL)) & 0x3f;
+debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_CONTROL)) &
+0x3f;
 uint32_t queue_size = 1 << (cp_hqd_pq_control_queue_size + 1);

-auto pkt = (hsa_kernel_dispatch_packet_t*)(
-pq_base + disp_idx % queue_size *
-sizeof(hsa_kernel_dispatch_packet_t)
-);
+auto pkt = (hsa_kernel_dispatch_packet_t*)(pq_base +
+disp_idx % queue_size *
+sizeof(hsa_kernel_dispatch_packet_t));
 fill_record(dev, &record, se, *pc, pkt);
 }

|
||||
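Both `read_pc_samples_v9` and `read_pc_samples_v9_ioctl` locate a wave's dispatch packet with the same arithmetic on three register values. The sketch below isolates that arithmetic so it can be checked by hand. The function names are hypothetical, and `kPacketSize` stands in for `sizeof(hsa_kernel_dispatch_packet_t)` on the assumption that the packet is 64 bytes, as AQL packets are.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: stands in for sizeof(hsa_kernel_dispatch_packet_t).
constexpr std::size_t kPacketSize = 64;

// CP_HQD_PQ_BASE holds address bits 8..39 and the low byte of
// CP_HQD_PQ_BASE_HI holds bits 40..47, matching the shifts in the diff.
constexpr std::uint64_t queue_base(std::uint32_t pq_base_lo, std::uint32_t pq_base_hi) {
  return (std::uint64_t)(pq_base_hi & 0xff) << 40 | (std::uint64_t)pq_base_lo << 8;
}

// The low 6 bits of CP_HQD_PQ_CONTROL encode log2(slot count) - 1.
// Precondition for this sketch: the field value is at most 62.
constexpr std::uint64_t queue_slots(std::uint32_t pq_control) {
  return 1ull << ((pq_control & 0x3f) + 1);
}

// The dispatch index wraps around the ring before it is scaled to a byte
// offset; '%' and '*' associate left to right, so no parentheses are needed
// in the original expression.
constexpr std::uint64_t packet_address(std::uint64_t base, std::uint32_t disp_idx,
                                       std::uint64_t slots) {
  return base + disp_idx % slots * kPacketSize;
}
```

For example, with `pq_base_lo = 0x12345`, `pq_base_hi = 0x1`, and a size field of 5, the queue has 64 slots starting at `0x10001234500`, and dispatch index 70 wraps to slot 6 at `0x10001234680`.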
@@ -22,306 +22,305 @@
|
||||
#define _osssys_4_0_OFFSET_HEADER
|
||||
|
||||
|
||||
|
||||
// addressBlock: osssys_osssysdec
|
||||
// base address: 0x4280
|
||||
#define mmIH_VMID_0_LUT 0x0000
|
||||
#define mmIH_VMID_0_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_1_LUT 0x0001
|
||||
#define mmIH_VMID_1_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_2_LUT 0x0002
|
||||
#define mmIH_VMID_2_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_3_LUT 0x0003
|
||||
#define mmIH_VMID_3_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_4_LUT 0x0004
|
||||
#define mmIH_VMID_4_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_5_LUT 0x0005
|
||||
#define mmIH_VMID_5_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_6_LUT 0x0006
|
||||
#define mmIH_VMID_6_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_7_LUT 0x0007
|
||||
#define mmIH_VMID_7_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_8_LUT 0x0008
|
||||
#define mmIH_VMID_8_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_9_LUT 0x0009
|
||||
#define mmIH_VMID_9_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_10_LUT 0x000a
|
||||
#define mmIH_VMID_10_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_11_LUT 0x000b
|
||||
#define mmIH_VMID_11_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_12_LUT 0x000c
|
||||
#define mmIH_VMID_12_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_13_LUT 0x000d
|
||||
#define mmIH_VMID_13_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_14_LUT 0x000e
|
||||
#define mmIH_VMID_14_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_15_LUT 0x000f
|
||||
#define mmIH_VMID_15_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_0_LUT_MM 0x0010
|
||||
#define mmIH_VMID_0_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_1_LUT_MM 0x0011
|
||||
#define mmIH_VMID_1_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_2_LUT_MM 0x0012
|
||||
#define mmIH_VMID_2_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_3_LUT_MM 0x0013
|
||||
#define mmIH_VMID_3_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_4_LUT_MM 0x0014
|
||||
#define mmIH_VMID_4_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_5_LUT_MM 0x0015
|
||||
#define mmIH_VMID_5_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_6_LUT_MM 0x0016
|
||||
#define mmIH_VMID_6_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_7_LUT_MM 0x0017
|
||||
#define mmIH_VMID_7_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_8_LUT_MM 0x0018
|
||||
#define mmIH_VMID_8_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_9_LUT_MM 0x0019
|
||||
#define mmIH_VMID_9_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_10_LUT_MM 0x001a
|
||||
#define mmIH_VMID_10_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_11_LUT_MM 0x001b
|
||||
#define mmIH_VMID_11_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_12_LUT_MM 0x001c
|
||||
#define mmIH_VMID_12_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_13_LUT_MM 0x001d
|
||||
#define mmIH_VMID_13_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_14_LUT_MM 0x001e
|
||||
#define mmIH_VMID_14_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_15_LUT_MM 0x001f
|
||||
#define mmIH_VMID_15_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_COOKIE_0 0x0020
|
||||
#define mmIH_COOKIE_0_BASE_IDX 0
|
||||
#define mmIH_COOKIE_1 0x0021
|
||||
#define mmIH_COOKIE_1_BASE_IDX 0
|
||||
#define mmIH_COOKIE_2 0x0022
|
||||
#define mmIH_COOKIE_2_BASE_IDX 0
|
||||
#define mmIH_COOKIE_3 0x0023
|
||||
#define mmIH_COOKIE_3_BASE_IDX 0
|
||||
#define mmIH_COOKIE_4 0x0024
|
||||
#define mmIH_COOKIE_4_BASE_IDX 0
|
||||
#define mmIH_COOKIE_5 0x0025
|
||||
#define mmIH_COOKIE_5_BASE_IDX 0
|
||||
#define mmIH_COOKIE_6 0x0026
|
||||
#define mmIH_COOKIE_6_BASE_IDX 0
|
||||
#define mmIH_COOKIE_7 0x0027
|
||||
#define mmIH_COOKIE_7_BASE_IDX 0
|
||||
#define mmIH_REGISTER_LAST_PART0 0x003f
|
||||
#define mmIH_REGISTER_LAST_PART0_BASE_IDX 0
|
||||
#define mmSEM_REQ_INPUT_0 0x0040
|
||||
#define mmSEM_REQ_INPUT_0_BASE_IDX 0
|
||||
#define mmSEM_REQ_INPUT_1 0x0041
|
||||
#define mmSEM_REQ_INPUT_1_BASE_IDX 0
|
||||
#define mmSEM_REQ_INPUT_2 0x0042
|
||||
#define mmSEM_REQ_INPUT_2_BASE_IDX 0
|
||||
#define mmSEM_REQ_INPUT_3 0x0043
|
||||
#define mmSEM_REQ_INPUT_3_BASE_IDX 0
|
||||
#define mmSEM_REGISTER_LAST_PART0 0x007f
|
||||
#define mmSEM_REGISTER_LAST_PART0_BASE_IDX 0
|
||||
#define mmIH_RB_CNTL 0x0080
|
||||
#define mmIH_RB_CNTL_BASE_IDX 0
|
||||
#define mmIH_RB_BASE 0x0081
|
||||
#define mmIH_RB_BASE_BASE_IDX 0
|
||||
#define mmIH_RB_BASE_HI 0x0082
|
||||
#define mmIH_RB_BASE_HI_BASE_IDX 0
|
||||
#define mmIH_RB_RPTR 0x0083
|
||||
#define mmIH_RB_RPTR_BASE_IDX 0
|
||||
#define mmIH_RB_WPTR 0x0084
|
||||
#define mmIH_RB_WPTR_BASE_IDX 0
|
||||
#define mmIH_RB_WPTR_ADDR_HI 0x0085
|
||||
#define mmIH_RB_WPTR_ADDR_HI_BASE_IDX 0
|
||||
#define mmIH_RB_WPTR_ADDR_LO 0x0086
|
||||
#define mmIH_RB_WPTR_ADDR_LO_BASE_IDX 0
|
||||
#define mmIH_DOORBELL_RPTR 0x0087
|
||||
#define mmIH_DOORBELL_RPTR_BASE_IDX 0
|
||||
#define mmIH_RB_CNTL_RING1 0x0088
|
||||
#define mmIH_RB_CNTL_RING1_BASE_IDX 0
|
||||
#define mmIH_RB_BASE_RING1 0x0089
|
||||
#define mmIH_RB_BASE_RING1_BASE_IDX 0
|
||||
#define mmIH_RB_BASE_HI_RING1 0x008a
|
||||
#define mmIH_RB_BASE_HI_RING1_BASE_IDX 0
|
||||
#define mmIH_RB_RPTR_RING1 0x008b
|
||||
#define mmIH_RB_RPTR_RING1_BASE_IDX 0
|
||||
#define mmIH_RB_WPTR_RING1 0x008c
|
||||
#define mmIH_RB_WPTR_RING1_BASE_IDX 0
|
||||
#define mmIH_DOORBELL_RPTR_RING1 0x008f
|
||||
#define mmIH_DOORBELL_RPTR_RING1_BASE_IDX 0
|
||||
#define mmIH_RB_CNTL_RING2 0x0090
|
||||
#define mmIH_RB_CNTL_RING2_BASE_IDX 0
|
||||
#define mmIH_RB_BASE_RING2 0x0091
|
||||
#define mmIH_RB_BASE_RING2_BASE_IDX 0
|
||||
#define mmIH_RB_BASE_HI_RING2 0x0092
|
||||
#define mmIH_RB_BASE_HI_RING2_BASE_IDX 0
|
||||
#define mmIH_RB_RPTR_RING2 0x0093
|
||||
#define mmIH_RB_RPTR_RING2_BASE_IDX 0
|
||||
#define mmIH_RB_WPTR_RING2 0x0094
|
||||
#define mmIH_RB_WPTR_RING2_BASE_IDX 0
|
||||
#define mmIH_DOORBELL_RPTR_RING2 0x0097
|
||||
#define mmIH_DOORBELL_RPTR_RING2_BASE_IDX 0
|
||||
#define mmIH_VERSION 0x0098
|
||||
#define mmIH_VERSION_BASE_IDX 0
|
||||
#define mmIH_CNTL 0x00c0
|
||||
#define mmIH_CNTL_BASE_IDX 0
|
||||
#define mmIH_CNTL2 0x00c1
|
||||
#define mmIH_CNTL2_BASE_IDX 0
|
||||
#define mmIH_STATUS 0x00c2
|
||||
#define mmIH_STATUS_BASE_IDX 0
|
||||
#define mmIH_PERFMON_CNTL 0x00c3
|
||||
#define mmIH_PERFMON_CNTL_BASE_IDX 0
|
||||
#define mmIH_PERFCOUNTER0_RESULT 0x00c4
|
||||
#define mmIH_PERFCOUNTER0_RESULT_BASE_IDX 0
|
||||
#define mmIH_PERFCOUNTER1_RESULT 0x00c5
|
||||
#define mmIH_PERFCOUNTER1_RESULT_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_31_0 0x00c7
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_31_0_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_63_32 0x00c8
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_63_32_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_95_64 0x00c9
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_95_64_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_FIELD_CONTROL 0x00ca
|
||||
#define mmIH_DSM_MATCH_FIELD_CONTROL_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_DATA_CONTROL 0x00cb
|
||||
#define mmIH_DSM_MATCH_DATA_CONTROL_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_FCN_ID 0x00cc
|
||||
#define mmIH_DSM_MATCH_FCN_ID_BASE_IDX 0
|
||||
#define mmIH_LIMIT_INT_RATE_CNTL 0x00cd
|
||||
#define mmIH_LIMIT_INT_RATE_CNTL_BASE_IDX 0
|
||||
#define mmIH_VF_RB_STATUS 0x00ce
|
||||
#define mmIH_VF_RB_STATUS_BASE_IDX 0
|
||||
#define mmIH_VF_RB_STATUS2 0x00cf
|
||||
#define mmIH_VF_RB_STATUS2_BASE_IDX 0
|
||||
#define mmIH_VF_RB1_STATUS 0x00d0
|
||||
#define mmIH_VF_RB1_STATUS_BASE_IDX 0
|
||||
#define mmIH_VF_RB1_STATUS2 0x00d1
|
||||
#define mmIH_VF_RB1_STATUS2_BASE_IDX 0
|
||||
#define mmIH_VF_RB2_STATUS 0x00d2
|
||||
#define mmIH_VF_RB2_STATUS_BASE_IDX 0
|
||||
#define mmIH_VF_RB2_STATUS2 0x00d3
|
||||
#define mmIH_VF_RB2_STATUS2_BASE_IDX 0
|
||||
#define mmIH_INT_FLOOD_CNTL 0x00d5
|
||||
#define mmIH_INT_FLOOD_CNTL_BASE_IDX 0
|
||||
#define mmIH_RB0_INT_FLOOD_STATUS 0x00d6
|
||||
#define mmIH_RB0_INT_FLOOD_STATUS_BASE_IDX 0
|
||||
#define mmIH_RB1_INT_FLOOD_STATUS 0x00d7
|
||||
#define mmIH_RB1_INT_FLOOD_STATUS_BASE_IDX 0
|
||||
#define mmIH_RB2_INT_FLOOD_STATUS 0x00d8
|
||||
#define mmIH_RB2_INT_FLOOD_STATUS_BASE_IDX 0
|
||||
#define mmIH_INT_FLOOD_STATUS 0x00d9
|
||||
#define mmIH_INT_FLOOD_STATUS_BASE_IDX 0
|
||||
#define mmIH_STORM_CLIENT_LIST_CNTL 0x00da
|
||||
#define mmIH_STORM_CLIENT_LIST_CNTL_BASE_IDX 0
|
||||
#define mmIH_CLK_CTRL 0x00db
|
||||
#define mmIH_CLK_CTRL_BASE_IDX 0
|
||||
#define mmIH_INT_FLAGS 0x00dc
|
||||
#define mmIH_INT_FLAGS_BASE_IDX 0
|
||||
#define mmIH_LAST_INT_INFO0 0x00dd
|
||||
#define mmIH_LAST_INT_INFO0_BASE_IDX 0
|
||||
#define mmIH_LAST_INT_INFO1 0x00de
|
||||
#define mmIH_LAST_INT_INFO1_BASE_IDX 0
|
||||
#define mmIH_LAST_INT_INFO2 0x00df
|
||||
#define mmIH_LAST_INT_INFO2_BASE_IDX 0
|
||||
#define mmIH_SCRATCH 0x00e0
|
||||
#define mmIH_SCRATCH_BASE_IDX 0
|
||||
#define mmIH_CLIENT_CREDIT_ERROR 0x00e1
|
||||
#define mmIH_CLIENT_CREDIT_ERROR_BASE_IDX 0
|
||||
#define mmIH_GPU_IOV_VIOLATION_LOG 0x00e2
|
||||
#define mmIH_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
|
||||
#define mmIH_COOKIE_REC_VIOLATION_LOG 0x00e3
|
||||
#define mmIH_COOKIE_REC_VIOLATION_LOG_BASE_IDX 0
|
||||
#define mmIH_CREDIT_STATUS 0x00e4
|
||||
#define mmIH_CREDIT_STATUS_BASE_IDX 0
|
||||
#define mmIH_MMHUB_ERROR 0x00e5
|
||||
#define mmIH_MMHUB_ERROR_BASE_IDX 0
|
||||
#define mmIH_REGISTER_LAST_PART2 0x00ff
|
||||
#define mmIH_REGISTER_LAST_PART2_BASE_IDX 0
|
||||
#define mmSEM_CLK_CTRL 0x0100
|
||||
#define mmSEM_CLK_CTRL_BASE_IDX 0
|
||||
#define mmSEM_UTC_CREDIT 0x0101
|
||||
#define mmSEM_UTC_CREDIT_BASE_IDX 0
|
||||
#define mmSEM_UTC_CONFIG 0x0102
|
||||
#define mmSEM_UTC_CONFIG_BASE_IDX 0
|
||||
#define mmSEM_UTCL2_TRAN_EN_LUT 0x0103
|
||||
#define mmSEM_UTCL2_TRAN_EN_LUT_BASE_IDX 0
|
||||
#define mmSEM_MCIF_CONFIG 0x0104
|
||||
#define mmSEM_MCIF_CONFIG_BASE_IDX 0
|
||||
#define mmSEM_PERFMON_CNTL 0x0105
|
||||
#define mmSEM_PERFMON_CNTL_BASE_IDX 0
|
||||
#define mmSEM_PERFCOUNTER0_RESULT 0x0106
|
||||
#define mmSEM_PERFCOUNTER0_RESULT_BASE_IDX 0
|
||||
#define mmSEM_PERFCOUNTER1_RESULT 0x0107
|
||||
#define mmSEM_PERFCOUNTER1_RESULT_BASE_IDX 0
|
||||
#define mmSEM_STATUS 0x0108
|
||||
#define mmSEM_STATUS_BASE_IDX 0
|
||||
#define mmSEM_MAILBOX_CLIENTCONFIG 0x0109
|
||||
#define mmSEM_MAILBOX_CLIENTCONFIG_BASE_IDX 0
|
||||
#define mmSEM_MAILBOX 0x010a
|
||||
#define mmSEM_MAILBOX_BASE_IDX 0
|
||||
#define mmSEM_MAILBOX_CONTROL 0x010b
|
||||
#define mmSEM_MAILBOX_CONTROL_BASE_IDX 0
|
||||
#define mmSEM_CHICKEN_BITS 0x010c
|
||||
#define mmSEM_CHICKEN_BITS_BASE_IDX 0
|
||||
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA 0x010d
|
||||
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA_BASE_IDX 0
|
||||
#define mmSEM_GPU_IOV_VIOLATION_LOG 0x010e
|
||||
#define mmSEM_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
|
||||
#define mmSEM_OUTSTANDING_THRESHOLD 0x010f
|
||||
#define mmSEM_OUTSTANDING_THRESHOLD_BASE_IDX 0
|
||||
#define mmSEM_REGISTER_LAST_PART2 0x017f
|
||||
#define mmSEM_REGISTER_LAST_PART2_BASE_IDX 0
|
||||
#define mmIH_ACTIVE_FCN_ID 0x0180
|
||||
#define mmIH_ACTIVE_FCN_ID_BASE_IDX 0
|
||||
#define mmIH_VIRT_RESET_REQ 0x0181
|
||||
#define mmIH_VIRT_RESET_REQ_BASE_IDX 0
|
||||
#define mmIH_CLIENT_CFG 0x0184
|
||||
#define mmIH_CLIENT_CFG_BASE_IDX 0
|
||||
#define mmIH_CLIENT_CFG_INDEX 0x0188
|
||||
#define mmIH_CLIENT_CFG_INDEX_BASE_IDX 0
|
||||
#define mmIH_CLIENT_CFG_DATA 0x0189
|
||||
#define mmIH_CLIENT_CFG_DATA_BASE_IDX 0
|
||||
#define mmIH_CID_REMAP_INDEX 0x018a
|
||||
#define mmIH_CID_REMAP_INDEX_BASE_IDX 0
|
||||
#define mmIH_CID_REMAP_DATA 0x018b
|
||||
#define mmIH_CID_REMAP_DATA_BASE_IDX 0
|
||||
#define mmIH_CHICKEN 0x018c
|
||||
#define mmIH_CHICKEN_BASE_IDX 0
|
||||
#define mmIH_MMHUB_CNTL 0x018d
|
||||
#define mmIH_MMHUB_CNTL_BASE_IDX 0
|
||||
#define mmIH_REGISTER_LAST_PART1 0x019f
|
||||
#define mmIH_REGISTER_LAST_PART1_BASE_IDX 0
|
||||
#define mmSEM_ACTIVE_FCN_ID 0x01a0
|
||||
#define mmSEM_ACTIVE_FCN_ID_BASE_IDX 0
|
||||
#define mmSEM_VIRT_RESET_REQ 0x01a1
|
||||
#define mmSEM_VIRT_RESET_REQ_BASE_IDX 0
|
||||
#define mmSEM_RESP_SDMA0 0x01a4
|
||||
#define mmSEM_RESP_SDMA0_BASE_IDX 0
|
||||
#define mmSEM_RESP_SDMA1 0x01a5
|
||||
#define mmSEM_RESP_SDMA1_BASE_IDX 0
|
||||
#define mmSEM_RESP_UVD 0x01a6
|
||||
#define mmSEM_RESP_UVD_BASE_IDX 0
|
||||
#define mmSEM_RESP_VCE_0 0x01a7
|
||||
#define mmSEM_RESP_VCE_0_BASE_IDX 0
|
||||
#define mmSEM_RESP_ACP 0x01a8
|
||||
#define mmSEM_RESP_ACP_BASE_IDX 0
|
||||
#define mmSEM_RESP_ISP 0x01a9
|
||||
#define mmSEM_RESP_ISP_BASE_IDX 0
|
||||
#define mmSEM_RESP_VCE_1 0x01aa
|
||||
#define mmSEM_RESP_VCE_1_BASE_IDX 0
|
||||
#define mmSEM_RESP_VP8 0x01ab
|
||||
#define mmSEM_RESP_VP8_BASE_IDX 0
|
||||
#define mmSEM_RESP_GC 0x01ac
|
||||
#define mmSEM_RESP_GC_BASE_IDX 0
|
||||
#define mmSEM_CID_REMAP_INDEX 0x01b0
|
||||
#define mmSEM_CID_REMAP_INDEX_BASE_IDX 0
|
||||
#define mmSEM_CID_REMAP_DATA 0x01b1
|
||||
#define mmSEM_CID_REMAP_DATA_BASE_IDX 0
|
||||
#define mmSEM_ATOMIC_OP_LUT 0x01b2
|
||||
#define mmSEM_ATOMIC_OP_LUT_BASE_IDX 0
|
||||
#define mmSEM_EDC_CONFIG 0x01b3
|
||||
#define mmSEM_EDC_CONFIG_BASE_IDX 0
|
||||
#define mmSEM_CHICKEN_BITS2 0x01b4
|
||||
#define mmSEM_CHICKEN_BITS2_BASE_IDX 0
|
||||
#define mmSEM_MMHUB_CNTL 0x01b5
|
||||
#define mmSEM_MMHUB_CNTL_BASE_IDX 0
|
||||
#define mmSEM_REGISTER_LAST_PART1 0x01bf
|
||||
#define mmSEM_REGISTER_LAST_PART1_BASE_IDX 0
|
||||
#define mmIH_VMID_0_LUT 0x0000
|
||||
#define mmIH_VMID_0_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_1_LUT 0x0001
|
||||
#define mmIH_VMID_1_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_2_LUT 0x0002
|
||||
#define mmIH_VMID_2_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_3_LUT 0x0003
|
||||
#define mmIH_VMID_3_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_4_LUT 0x0004
|
||||
#define mmIH_VMID_4_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_5_LUT 0x0005
|
||||
#define mmIH_VMID_5_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_6_LUT 0x0006
|
||||
#define mmIH_VMID_6_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_7_LUT 0x0007
|
||||
#define mmIH_VMID_7_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_8_LUT 0x0008
|
||||
#define mmIH_VMID_8_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_9_LUT 0x0009
|
||||
#define mmIH_VMID_9_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_10_LUT 0x000a
|
||||
#define mmIH_VMID_10_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_11_LUT 0x000b
|
||||
#define mmIH_VMID_11_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_12_LUT 0x000c
|
||||
#define mmIH_VMID_12_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_13_LUT 0x000d
|
||||
#define mmIH_VMID_13_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_14_LUT 0x000e
|
||||
#define mmIH_VMID_14_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_15_LUT 0x000f
|
||||
#define mmIH_VMID_15_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_0_LUT_MM 0x0010
|
||||
#define mmIH_VMID_0_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_1_LUT_MM 0x0011
|
||||
#define mmIH_VMID_1_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_2_LUT_MM 0x0012
|
||||
#define mmIH_VMID_2_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_3_LUT_MM 0x0013
|
||||
#define mmIH_VMID_3_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_4_LUT_MM 0x0014
|
||||
#define mmIH_VMID_4_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_5_LUT_MM 0x0015
|
||||
#define mmIH_VMID_5_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_6_LUT_MM 0x0016
|
||||
#define mmIH_VMID_6_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_7_LUT_MM 0x0017
|
||||
#define mmIH_VMID_7_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_8_LUT_MM 0x0018
|
||||
#define mmIH_VMID_8_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_9_LUT_MM 0x0019
|
||||
#define mmIH_VMID_9_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_10_LUT_MM 0x001a
|
||||
#define mmIH_VMID_10_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_11_LUT_MM 0x001b
|
||||
#define mmIH_VMID_11_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_12_LUT_MM 0x001c
|
||||
#define mmIH_VMID_12_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_13_LUT_MM 0x001d
|
||||
#define mmIH_VMID_13_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_14_LUT_MM 0x001e
|
||||
#define mmIH_VMID_14_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_15_LUT_MM 0x001f
|
||||
#define mmIH_VMID_15_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_COOKIE_0 0x0020
|
||||
#define mmIH_COOKIE_0_BASE_IDX 0
|
||||
#define mmIH_COOKIE_1 0x0021
|
||||
#define mmIH_COOKIE_1_BASE_IDX 0
|
||||
#define mmIH_COOKIE_2 0x0022
|
||||
#define mmIH_COOKIE_2_BASE_IDX 0
|
||||
#define mmIH_COOKIE_3 0x0023
|
||||
#define mmIH_COOKIE_3_BASE_IDX 0
|
||||
#define mmIH_COOKIE_4 0x0024
|
||||
#define mmIH_COOKIE_4_BASE_IDX 0
|
||||
#define mmIH_COOKIE_5 0x0025
|
||||
#define mmIH_COOKIE_5_BASE_IDX 0
|
||||
#define mmIH_COOKIE_6 0x0026
|
||||
#define mmIH_COOKIE_6_BASE_IDX 0
|
||||
#define mmIH_COOKIE_7 0x0027
|
||||
#define mmIH_COOKIE_7_BASE_IDX 0
|
||||
#define mmIH_REGISTER_LAST_PART0 0x003f
|
||||
#define mmIH_REGISTER_LAST_PART0_BASE_IDX 0
|
||||
#define mmSEM_REQ_INPUT_0 0x0040
|
||||
#define mmSEM_REQ_INPUT_0_BASE_IDX 0
|
||||
#define mmSEM_REQ_INPUT_1 0x0041
|
||||
#define mmSEM_REQ_INPUT_1_BASE_IDX 0
|
||||
#define mmSEM_REQ_INPUT_2 0x0042
|
||||
#define mmSEM_REQ_INPUT_2_BASE_IDX 0
|
||||
#define mmSEM_REQ_INPUT_3 0x0043
|
||||
#define mmSEM_REQ_INPUT_3_BASE_IDX 0
|
||||
#define mmSEM_REGISTER_LAST_PART0 0x007f
|
||||
#define mmSEM_REGISTER_LAST_PART0_BASE_IDX 0
#define mmIH_RB_CNTL 0x0080
#define mmIH_RB_CNTL_BASE_IDX 0
#define mmIH_RB_BASE 0x0081
#define mmIH_RB_BASE_BASE_IDX 0
#define mmIH_RB_BASE_HI 0x0082
#define mmIH_RB_BASE_HI_BASE_IDX 0
#define mmIH_RB_RPTR 0x0083
#define mmIH_RB_RPTR_BASE_IDX 0
#define mmIH_RB_WPTR 0x0084
#define mmIH_RB_WPTR_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_HI 0x0085
#define mmIH_RB_WPTR_ADDR_HI_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_LO 0x0086
#define mmIH_RB_WPTR_ADDR_LO_BASE_IDX 0
#define mmIH_DOORBELL_RPTR 0x0087
#define mmIH_DOORBELL_RPTR_BASE_IDX 0
#define mmIH_RB_CNTL_RING1 0x0088
#define mmIH_RB_CNTL_RING1_BASE_IDX 0
#define mmIH_RB_BASE_RING1 0x0089
#define mmIH_RB_BASE_RING1_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING1 0x008a
#define mmIH_RB_BASE_HI_RING1_BASE_IDX 0
#define mmIH_RB_RPTR_RING1 0x008b
#define mmIH_RB_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_WPTR_RING1 0x008c
#define mmIH_RB_WPTR_RING1_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING1 0x008f
#define mmIH_DOORBELL_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_CNTL_RING2 0x0090
#define mmIH_RB_CNTL_RING2_BASE_IDX 0
#define mmIH_RB_BASE_RING2 0x0091
#define mmIH_RB_BASE_RING2_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING2 0x0092
#define mmIH_RB_BASE_HI_RING2_BASE_IDX 0
#define mmIH_RB_RPTR_RING2 0x0093
#define mmIH_RB_RPTR_RING2_BASE_IDX 0
#define mmIH_RB_WPTR_RING2 0x0094
#define mmIH_RB_WPTR_RING2_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING2 0x0097
#define mmIH_DOORBELL_RPTR_RING2_BASE_IDX 0
#define mmIH_VERSION 0x0098
#define mmIH_VERSION_BASE_IDX 0
#define mmIH_CNTL 0x00c0
#define mmIH_CNTL_BASE_IDX 0
#define mmIH_CNTL2 0x00c1
#define mmIH_CNTL2_BASE_IDX 0
#define mmIH_STATUS 0x00c2
#define mmIH_STATUS_BASE_IDX 0
#define mmIH_PERFMON_CNTL 0x00c3
#define mmIH_PERFMON_CNTL_BASE_IDX 0
#define mmIH_PERFCOUNTER0_RESULT 0x00c4
#define mmIH_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmIH_PERFCOUNTER1_RESULT 0x00c5
#define mmIH_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_31_0 0x00c7
#define mmIH_DSM_MATCH_VALUE_BIT_31_0_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_63_32 0x00c8
#define mmIH_DSM_MATCH_VALUE_BIT_63_32_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_95_64 0x00c9
#define mmIH_DSM_MATCH_VALUE_BIT_95_64_BASE_IDX 0
#define mmIH_DSM_MATCH_FIELD_CONTROL 0x00ca
#define mmIH_DSM_MATCH_FIELD_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_DATA_CONTROL 0x00cb
#define mmIH_DSM_MATCH_DATA_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_FCN_ID 0x00cc
#define mmIH_DSM_MATCH_FCN_ID_BASE_IDX 0
#define mmIH_LIMIT_INT_RATE_CNTL 0x00cd
#define mmIH_LIMIT_INT_RATE_CNTL_BASE_IDX 0
#define mmIH_VF_RB_STATUS 0x00ce
#define mmIH_VF_RB_STATUS_BASE_IDX 0
#define mmIH_VF_RB_STATUS2 0x00cf
#define mmIH_VF_RB_STATUS2_BASE_IDX 0
#define mmIH_VF_RB1_STATUS 0x00d0
#define mmIH_VF_RB1_STATUS_BASE_IDX 0
#define mmIH_VF_RB1_STATUS2 0x00d1
#define mmIH_VF_RB1_STATUS2_BASE_IDX 0
#define mmIH_VF_RB2_STATUS 0x00d2
#define mmIH_VF_RB2_STATUS_BASE_IDX 0
#define mmIH_VF_RB2_STATUS2 0x00d3
#define mmIH_VF_RB2_STATUS2_BASE_IDX 0
#define mmIH_INT_FLOOD_CNTL 0x00d5
#define mmIH_INT_FLOOD_CNTL_BASE_IDX 0
#define mmIH_RB0_INT_FLOOD_STATUS 0x00d6
#define mmIH_RB0_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB1_INT_FLOOD_STATUS 0x00d7
#define mmIH_RB1_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB2_INT_FLOOD_STATUS 0x00d8
#define mmIH_RB2_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_INT_FLOOD_STATUS 0x00d9
#define mmIH_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_STORM_CLIENT_LIST_CNTL 0x00da
#define mmIH_STORM_CLIENT_LIST_CNTL_BASE_IDX 0
#define mmIH_CLK_CTRL 0x00db
#define mmIH_CLK_CTRL_BASE_IDX 0
#define mmIH_INT_FLAGS 0x00dc
#define mmIH_INT_FLAGS_BASE_IDX 0
#define mmIH_LAST_INT_INFO0 0x00dd
#define mmIH_LAST_INT_INFO0_BASE_IDX 0
#define mmIH_LAST_INT_INFO1 0x00de
#define mmIH_LAST_INT_INFO1_BASE_IDX 0
#define mmIH_LAST_INT_INFO2 0x00df
#define mmIH_LAST_INT_INFO2_BASE_IDX 0
#define mmIH_SCRATCH 0x00e0
#define mmIH_SCRATCH_BASE_IDX 0
#define mmIH_CLIENT_CREDIT_ERROR 0x00e1
#define mmIH_CLIENT_CREDIT_ERROR_BASE_IDX 0
#define mmIH_GPU_IOV_VIOLATION_LOG 0x00e2
#define mmIH_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmIH_COOKIE_REC_VIOLATION_LOG 0x00e3
#define mmIH_COOKIE_REC_VIOLATION_LOG_BASE_IDX 0
#define mmIH_CREDIT_STATUS 0x00e4
#define mmIH_CREDIT_STATUS_BASE_IDX 0
#define mmIH_MMHUB_ERROR 0x00e5
#define mmIH_MMHUB_ERROR_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART2 0x00ff
#define mmIH_REGISTER_LAST_PART2_BASE_IDX 0
#define mmSEM_CLK_CTRL 0x0100
#define mmSEM_CLK_CTRL_BASE_IDX 0
#define mmSEM_UTC_CREDIT 0x0101
#define mmSEM_UTC_CREDIT_BASE_IDX 0
#define mmSEM_UTC_CONFIG 0x0102
#define mmSEM_UTC_CONFIG_BASE_IDX 0
#define mmSEM_UTCL2_TRAN_EN_LUT 0x0103
#define mmSEM_UTCL2_TRAN_EN_LUT_BASE_IDX 0
#define mmSEM_MCIF_CONFIG 0x0104
#define mmSEM_MCIF_CONFIG_BASE_IDX 0
#define mmSEM_PERFMON_CNTL 0x0105
#define mmSEM_PERFMON_CNTL_BASE_IDX 0
#define mmSEM_PERFCOUNTER0_RESULT 0x0106
#define mmSEM_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmSEM_PERFCOUNTER1_RESULT 0x0107
#define mmSEM_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmSEM_STATUS 0x0108
#define mmSEM_STATUS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG 0x0109
#define mmSEM_MAILBOX_CLIENTCONFIG_BASE_IDX 0
#define mmSEM_MAILBOX 0x010a
#define mmSEM_MAILBOX_BASE_IDX 0
#define mmSEM_MAILBOX_CONTROL 0x010b
#define mmSEM_MAILBOX_CONTROL_BASE_IDX 0
#define mmSEM_CHICKEN_BITS 0x010c
#define mmSEM_CHICKEN_BITS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA 0x010d
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA_BASE_IDX 0
#define mmSEM_GPU_IOV_VIOLATION_LOG 0x010e
#define mmSEM_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmSEM_OUTSTANDING_THRESHOLD 0x010f
#define mmSEM_OUTSTANDING_THRESHOLD_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART2 0x017f
#define mmSEM_REGISTER_LAST_PART2_BASE_IDX 0
#define mmIH_ACTIVE_FCN_ID 0x0180
#define mmIH_ACTIVE_FCN_ID_BASE_IDX 0
#define mmIH_VIRT_RESET_REQ 0x0181
#define mmIH_VIRT_RESET_REQ_BASE_IDX 0
#define mmIH_CLIENT_CFG 0x0184
#define mmIH_CLIENT_CFG_BASE_IDX 0
#define mmIH_CLIENT_CFG_INDEX 0x0188
#define mmIH_CLIENT_CFG_INDEX_BASE_IDX 0
#define mmIH_CLIENT_CFG_DATA 0x0189
#define mmIH_CLIENT_CFG_DATA_BASE_IDX 0
#define mmIH_CID_REMAP_INDEX 0x018a
#define mmIH_CID_REMAP_INDEX_BASE_IDX 0
#define mmIH_CID_REMAP_DATA 0x018b
#define mmIH_CID_REMAP_DATA_BASE_IDX 0
#define mmIH_CHICKEN 0x018c
#define mmIH_CHICKEN_BASE_IDX 0
#define mmIH_MMHUB_CNTL 0x018d
#define mmIH_MMHUB_CNTL_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART1 0x019f
#define mmIH_REGISTER_LAST_PART1_BASE_IDX 0
#define mmSEM_ACTIVE_FCN_ID 0x01a0
#define mmSEM_ACTIVE_FCN_ID_BASE_IDX 0
#define mmSEM_VIRT_RESET_REQ 0x01a1
#define mmSEM_VIRT_RESET_REQ_BASE_IDX 0
#define mmSEM_RESP_SDMA0 0x01a4
#define mmSEM_RESP_SDMA0_BASE_IDX 0
#define mmSEM_RESP_SDMA1 0x01a5
#define mmSEM_RESP_SDMA1_BASE_IDX 0
#define mmSEM_RESP_UVD 0x01a6
#define mmSEM_RESP_UVD_BASE_IDX 0
#define mmSEM_RESP_VCE_0 0x01a7
#define mmSEM_RESP_VCE_0_BASE_IDX 0
#define mmSEM_RESP_ACP 0x01a8
#define mmSEM_RESP_ACP_BASE_IDX 0
#define mmSEM_RESP_ISP 0x01a9
#define mmSEM_RESP_ISP_BASE_IDX 0
#define mmSEM_RESP_VCE_1 0x01aa
#define mmSEM_RESP_VCE_1_BASE_IDX 0
#define mmSEM_RESP_VP8 0x01ab
#define mmSEM_RESP_VP8_BASE_IDX 0
#define mmSEM_RESP_GC 0x01ac
#define mmSEM_RESP_GC_BASE_IDX 0
#define mmSEM_CID_REMAP_INDEX 0x01b0
#define mmSEM_CID_REMAP_INDEX_BASE_IDX 0
#define mmSEM_CID_REMAP_DATA 0x01b1
#define mmSEM_CID_REMAP_DATA_BASE_IDX 0
#define mmSEM_ATOMIC_OP_LUT 0x01b2
#define mmSEM_ATOMIC_OP_LUT_BASE_IDX 0
#define mmSEM_EDC_CONFIG 0x01b3
#define mmSEM_EDC_CONFIG_BASE_IDX 0
#define mmSEM_CHICKEN_BITS2 0x01b4
#define mmSEM_CHICKEN_BITS2_BASE_IDX 0
#define mmSEM_MMHUB_CNTL 0x01b5
#define mmSEM_MMHUB_CNTL_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART1 0x01bf
#define mmSEM_REGISTER_LAST_PART1_BASE_IDX 0

#endif
File diff suppressed because it is too large
@@ -24,322 +24,321 @@
#define _osssys_4_2_0_OFFSET_HEADER



// addressBlock: osssys_osssysdec
// base address: 0x4280
#define mmIH_VMID_0_LUT 0x0000
#define mmIH_VMID_0_LUT_BASE_IDX 0
#define mmIH_VMID_1_LUT 0x0001
#define mmIH_VMID_1_LUT_BASE_IDX 0
#define mmIH_VMID_2_LUT 0x0002
#define mmIH_VMID_2_LUT_BASE_IDX 0
#define mmIH_VMID_3_LUT 0x0003
#define mmIH_VMID_3_LUT_BASE_IDX 0
#define mmIH_VMID_4_LUT 0x0004
#define mmIH_VMID_4_LUT_BASE_IDX 0
#define mmIH_VMID_5_LUT 0x0005
#define mmIH_VMID_5_LUT_BASE_IDX 0
#define mmIH_VMID_6_LUT 0x0006
#define mmIH_VMID_6_LUT_BASE_IDX 0
#define mmIH_VMID_7_LUT 0x0007
#define mmIH_VMID_7_LUT_BASE_IDX 0
#define mmIH_VMID_8_LUT 0x0008
#define mmIH_VMID_8_LUT_BASE_IDX 0
#define mmIH_VMID_9_LUT 0x0009
#define mmIH_VMID_9_LUT_BASE_IDX 0
#define mmIH_VMID_10_LUT 0x000a
#define mmIH_VMID_10_LUT_BASE_IDX 0
#define mmIH_VMID_11_LUT 0x000b
#define mmIH_VMID_11_LUT_BASE_IDX 0
#define mmIH_VMID_12_LUT 0x000c
#define mmIH_VMID_12_LUT_BASE_IDX 0
#define mmIH_VMID_13_LUT 0x000d
#define mmIH_VMID_13_LUT_BASE_IDX 0
#define mmIH_VMID_14_LUT 0x000e
#define mmIH_VMID_14_LUT_BASE_IDX 0
#define mmIH_VMID_15_LUT 0x000f
#define mmIH_VMID_15_LUT_BASE_IDX 0
#define mmIH_VMID_0_LUT_MM 0x0010
#define mmIH_VMID_0_LUT_MM_BASE_IDX 0
#define mmIH_VMID_1_LUT_MM 0x0011
#define mmIH_VMID_1_LUT_MM_BASE_IDX 0
#define mmIH_VMID_2_LUT_MM 0x0012
#define mmIH_VMID_2_LUT_MM_BASE_IDX 0
#define mmIH_VMID_3_LUT_MM 0x0013
#define mmIH_VMID_3_LUT_MM_BASE_IDX 0
#define mmIH_VMID_4_LUT_MM 0x0014
#define mmIH_VMID_4_LUT_MM_BASE_IDX 0
#define mmIH_VMID_5_LUT_MM 0x0015
#define mmIH_VMID_5_LUT_MM_BASE_IDX 0
#define mmIH_VMID_6_LUT_MM 0x0016
#define mmIH_VMID_6_LUT_MM_BASE_IDX 0
#define mmIH_VMID_7_LUT_MM 0x0017
#define mmIH_VMID_7_LUT_MM_BASE_IDX 0
#define mmIH_VMID_8_LUT_MM 0x0018
#define mmIH_VMID_8_LUT_MM_BASE_IDX 0
#define mmIH_VMID_9_LUT_MM 0x0019
#define mmIH_VMID_9_LUT_MM_BASE_IDX 0
#define mmIH_VMID_10_LUT_MM 0x001a
#define mmIH_VMID_10_LUT_MM_BASE_IDX 0
#define mmIH_VMID_11_LUT_MM 0x001b
#define mmIH_VMID_11_LUT_MM_BASE_IDX 0
#define mmIH_VMID_12_LUT_MM 0x001c
#define mmIH_VMID_12_LUT_MM_BASE_IDX 0
#define mmIH_VMID_13_LUT_MM 0x001d
#define mmIH_VMID_13_LUT_MM_BASE_IDX 0
#define mmIH_VMID_14_LUT_MM 0x001e
#define mmIH_VMID_14_LUT_MM_BASE_IDX 0
#define mmIH_VMID_15_LUT_MM 0x001f
#define mmIH_VMID_15_LUT_MM_BASE_IDX 0
#define mmIH_COOKIE_0 0x0020
#define mmIH_COOKIE_0_BASE_IDX 0
#define mmIH_COOKIE_1 0x0021
#define mmIH_COOKIE_1_BASE_IDX 0
#define mmIH_COOKIE_2 0x0022
#define mmIH_COOKIE_2_BASE_IDX 0
#define mmIH_COOKIE_3 0x0023
#define mmIH_COOKIE_3_BASE_IDX 0
#define mmIH_COOKIE_4 0x0024
#define mmIH_COOKIE_4_BASE_IDX 0
#define mmIH_COOKIE_5 0x0025
#define mmIH_COOKIE_5_BASE_IDX 0
#define mmIH_COOKIE_6 0x0026
#define mmIH_COOKIE_6_BASE_IDX 0
#define mmIH_COOKIE_7 0x0027
#define mmIH_COOKIE_7_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART0 0x003f
#define mmIH_REGISTER_LAST_PART0_BASE_IDX 0
#define mmSEM_REQ_INPUT_0 0x0040
#define mmSEM_REQ_INPUT_0_BASE_IDX 0
#define mmSEM_REQ_INPUT_1 0x0041
#define mmSEM_REQ_INPUT_1_BASE_IDX 0
#define mmSEM_REQ_INPUT_2 0x0042
#define mmSEM_REQ_INPUT_2_BASE_IDX 0
#define mmSEM_REQ_INPUT_3 0x0043
#define mmSEM_REQ_INPUT_3_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART0 0x007f
#define mmSEM_REGISTER_LAST_PART0_BASE_IDX 0
#define mmIH_RB_CNTL 0x0080
#define mmIH_RB_CNTL_BASE_IDX 0
#define mmIH_RB_BASE 0x0081
#define mmIH_RB_BASE_BASE_IDX 0
#define mmIH_RB_BASE_HI 0x0082
#define mmIH_RB_BASE_HI_BASE_IDX 0
#define mmIH_RB_RPTR 0x0083
#define mmIH_RB_RPTR_BASE_IDX 0
#define mmIH_RB_WPTR 0x0084
#define mmIH_RB_WPTR_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_HI 0x0085
#define mmIH_RB_WPTR_ADDR_HI_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_LO 0x0086
#define mmIH_RB_WPTR_ADDR_LO_BASE_IDX 0
#define mmIH_DOORBELL_RPTR 0x0087
#define mmIH_DOORBELL_RPTR_BASE_IDX 0
#define mmIH_RB_CNTL_RING1 0x008c
#define mmIH_RB_CNTL_RING1_BASE_IDX 0
#define mmIH_RB_BASE_RING1 0x008d
#define mmIH_RB_BASE_RING1_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING1 0x008e
#define mmIH_RB_BASE_HI_RING1_BASE_IDX 0
#define mmIH_RB_RPTR_RING1 0x008f
#define mmIH_RB_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_WPTR_RING1 0x0090
#define mmIH_RB_WPTR_RING1_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING1 0x0093
#define mmIH_DOORBELL_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_CNTL_RING2 0x0098
#define mmIH_RB_CNTL_RING2_BASE_IDX 0
#define mmIH_RB_BASE_RING2 0x0099
#define mmIH_RB_BASE_RING2_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING2 0x009a
#define mmIH_RB_BASE_HI_RING2_BASE_IDX 0
#define mmIH_RB_RPTR_RING2 0x009b
#define mmIH_RB_RPTR_RING2_BASE_IDX 0
#define mmIH_RB_WPTR_RING2 0x009c
#define mmIH_RB_WPTR_RING2_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING2 0x009f
#define mmIH_DOORBELL_RPTR_RING2_BASE_IDX 0
#define mmIH_VERSION 0x00a5
#define mmIH_VERSION_BASE_IDX 0
#define mmIH_CNTL 0x00c0
#define mmIH_CNTL_BASE_IDX 0
#define mmIH_CNTL2 0x00c1
#define mmIH_CNTL2_BASE_IDX 0
#define mmIH_STATUS 0x00c2
#define mmIH_STATUS_BASE_IDX 0
#define mmIH_PERFMON_CNTL 0x00c3
#define mmIH_PERFMON_CNTL_BASE_IDX 0
#define mmIH_PERFCOUNTER0_RESULT 0x00c4
#define mmIH_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmIH_PERFCOUNTER1_RESULT 0x00c5
#define mmIH_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_31_0 0x00c7
#define mmIH_DSM_MATCH_VALUE_BIT_31_0_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_63_32 0x00c8
#define mmIH_DSM_MATCH_VALUE_BIT_63_32_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_95_64 0x00c9
#define mmIH_DSM_MATCH_VALUE_BIT_95_64_BASE_IDX 0
#define mmIH_DSM_MATCH_FIELD_CONTROL 0x00ca
#define mmIH_DSM_MATCH_FIELD_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_DATA_CONTROL 0x00cb
#define mmIH_DSM_MATCH_DATA_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_FCN_ID 0x00cc
#define mmIH_DSM_MATCH_FCN_ID_BASE_IDX 0
#define mmIH_LIMIT_INT_RATE_CNTL 0x00cd
#define mmIH_LIMIT_INT_RATE_CNTL_BASE_IDX 0
#define mmIH_VF_RB_STATUS 0x00ce
#define mmIH_VF_RB_STATUS_BASE_IDX 0
#define mmIH_VF_RB_STATUS2 0x00cf
#define mmIH_VF_RB_STATUS2_BASE_IDX 0
#define mmIH_VF_RB1_STATUS 0x00d0
#define mmIH_VF_RB1_STATUS_BASE_IDX 0
#define mmIH_VF_RB1_STATUS2 0x00d1
#define mmIH_VF_RB1_STATUS2_BASE_IDX 0
#define mmIH_VF_RB2_STATUS 0x00d2
#define mmIH_VF_RB2_STATUS_BASE_IDX 0
#define mmIH_VF_RB2_STATUS2 0x00d3
#define mmIH_VF_RB2_STATUS2_BASE_IDX 0
#define mmIH_INT_FLOOD_CNTL 0x00d5
#define mmIH_INT_FLOOD_CNTL_BASE_IDX 0
#define mmIH_RB0_INT_FLOOD_STATUS 0x00d6
#define mmIH_RB0_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB1_INT_FLOOD_STATUS 0x00d7
#define mmIH_RB1_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB2_INT_FLOOD_STATUS 0x00d8
#define mmIH_RB2_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_INT_FLOOD_STATUS 0x00d9
#define mmIH_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_STORM_CLIENT_LIST_CNTL 0x00da
#define mmIH_STORM_CLIENT_LIST_CNTL_BASE_IDX 0
#define mmIH_CLK_CTRL 0x00db
#define mmIH_CLK_CTRL_BASE_IDX 0
#define mmIH_INT_FLAGS 0x00dc
#define mmIH_INT_FLAGS_BASE_IDX 0
#define mmIH_LAST_INT_INFO0 0x00dd
#define mmIH_LAST_INT_INFO0_BASE_IDX 0
#define mmIH_LAST_INT_INFO1 0x00de
#define mmIH_LAST_INT_INFO1_BASE_IDX 0
#define mmIH_LAST_INT_INFO2 0x00df
#define mmIH_LAST_INT_INFO2_BASE_IDX 0
#define mmIH_SCRATCH 0x00e0
#define mmIH_SCRATCH_BASE_IDX 0
#define mmIH_CLIENT_CREDIT_ERROR 0x00e1
#define mmIH_CLIENT_CREDIT_ERROR_BASE_IDX 0
#define mmIH_GPU_IOV_VIOLATION_LOG 0x00e2
#define mmIH_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmIH_COOKIE_REC_VIOLATION_LOG 0x00e3
#define mmIH_COOKIE_REC_VIOLATION_LOG_BASE_IDX 0
#define mmIH_CREDIT_STATUS 0x00e4
#define mmIH_CREDIT_STATUS_BASE_IDX 0
#define mmIH_MMHUB_ERROR 0x00e5
#define mmIH_MMHUB_ERROR_BASE_IDX 0
#define mmIH_MEM_POWER_CTRL 0x00e8
#define mmIH_MEM_POWER_CTRL_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART2 0x00ff
#define mmIH_REGISTER_LAST_PART2_BASE_IDX 0
#define mmSEM_CLK_CTRL 0x0100
#define mmSEM_CLK_CTRL_BASE_IDX 0
#define mmSEM_UTC_CREDIT 0x0101
#define mmSEM_UTC_CREDIT_BASE_IDX 0
#define mmSEM_UTC_CONFIG 0x0102
#define mmSEM_UTC_CONFIG_BASE_IDX 0
#define mmSEM_UTCL2_TRAN_EN_LUT 0x0103
#define mmSEM_UTCL2_TRAN_EN_LUT_BASE_IDX 0
#define mmSEM_MCIF_CONFIG 0x0104
#define mmSEM_MCIF_CONFIG_BASE_IDX 0
#define mmSEM_PERFMON_CNTL 0x0105
#define mmSEM_PERFMON_CNTL_BASE_IDX 0
#define mmSEM_PERFCOUNTER0_RESULT 0x0106
#define mmSEM_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmSEM_PERFCOUNTER1_RESULT 0x0107
#define mmSEM_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmSEM_STATUS 0x0108
#define mmSEM_STATUS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG 0x0109
#define mmSEM_MAILBOX_CLIENTCONFIG_BASE_IDX 0
#define mmSEM_MAILBOX 0x010a
#define mmSEM_MAILBOX_BASE_IDX 0
#define mmSEM_MAILBOX_CONTROL 0x010b
#define mmSEM_MAILBOX_CONTROL_BASE_IDX 0
#define mmSEM_CHICKEN_BITS 0x010c
#define mmSEM_CHICKEN_BITS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA 0x010d
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA_BASE_IDX 0
#define mmSEM_GPU_IOV_VIOLATION_LOG 0x010e
#define mmSEM_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmSEM_OUTSTANDING_THRESHOLD 0x010f
#define mmSEM_OUTSTANDING_THRESHOLD_BASE_IDX 0
#define mmSEM_MEM_POWER_CTRL 0x0110
#define mmSEM_MEM_POWER_CTRL_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART2 0x017f
#define mmSEM_REGISTER_LAST_PART2_BASE_IDX 0
#define mmIH_ACTIVE_FCN_ID 0x0180
#define mmIH_ACTIVE_FCN_ID_BASE_IDX 0
#define mmIH_VIRT_RESET_REQ 0x0181
#define mmIH_VIRT_RESET_REQ_BASE_IDX 0
#define mmIH_CLIENT_CFG 0x0184
#define mmIH_CLIENT_CFG_BASE_IDX 0
#define mmIH_CLIENT_CFG_INDEX 0x0188
#define mmIH_CLIENT_CFG_INDEX_BASE_IDX 0
#define mmIH_CLIENT_CFG_DATA 0x0189
#define mmIH_CLIENT_CFG_DATA_BASE_IDX 0
#define mmIH_CID_REMAP_INDEX 0x018a
#define mmIH_CID_REMAP_INDEX_BASE_IDX 0
#define mmIH_CID_REMAP_DATA 0x018b
#define mmIH_CID_REMAP_DATA_BASE_IDX 0
#define mmIH_CHICKEN 0x018c
#define mmIH_CHICKEN_BASE_IDX 0
#define mmIH_MMHUB_CNTL 0x018d
#define mmIH_MMHUB_CNTL_BASE_IDX 0
#define mmIH_INT_DROP_CNTL 0x018e
#define mmIH_INT_DROP_CNTL_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_VALUE0 0x018f
#define mmIH_INT_DROP_MATCH_VALUE0_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_VALUE1 0x0190
#define mmIH_INT_DROP_MATCH_VALUE1_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_MASK0 0x0191
#define mmIH_INT_DROP_MATCH_MASK0_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_MASK1 0x0192
#define mmIH_INT_DROP_MATCH_MASK1_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART1 0x019f
#define mmIH_REGISTER_LAST_PART1_BASE_IDX 0
#define mmSEM_ACTIVE_FCN_ID 0x01a0
#define mmSEM_ACTIVE_FCN_ID_BASE_IDX 0
#define mmSEM_VIRT_RESET_REQ 0x01a1
#define mmSEM_VIRT_RESET_REQ_BASE_IDX 0
#define mmSEM_RESP_SDMA0 0x01a4
#define mmSEM_RESP_SDMA0_BASE_IDX 0
#define mmSEM_RESP_SDMA1 0x01a5
#define mmSEM_RESP_SDMA1_BASE_IDX 0
#define mmSEM_RESP_UVD 0x01a6
#define mmSEM_RESP_UVD_BASE_IDX 0
#define mmSEM_RESP_VCE_0 0x01a7
#define mmSEM_RESP_VCE_0_BASE_IDX 0
#define mmSEM_RESP_ACP 0x01a8
#define mmSEM_RESP_ACP_BASE_IDX 0
#define mmSEM_RESP_ISP 0x01a9
#define mmSEM_RESP_ISP_BASE_IDX 0
#define mmSEM_RESP_VCE_1 0x01aa
#define mmSEM_RESP_VCE_1_BASE_IDX 0
#define mmSEM_RESP_VP8 0x01ab
#define mmSEM_RESP_VP8_BASE_IDX 0
#define mmSEM_RESP_GC 0x01ac
#define mmSEM_RESP_GC_BASE_IDX 0
#define mmSEM_RESP_UVD_1 0x01ad
#define mmSEM_RESP_UVD_1_BASE_IDX 0
#define mmSEM_CID_REMAP_INDEX 0x01b0
#define mmSEM_CID_REMAP_INDEX_BASE_IDX 0
#define mmSEM_CID_REMAP_DATA 0x01b1
#define mmSEM_CID_REMAP_DATA_BASE_IDX 0
#define mmSEM_ATOMIC_OP_LUT 0x01b2
#define mmSEM_ATOMIC_OP_LUT_BASE_IDX 0
#define mmSEM_EDC_CONFIG 0x01b3
#define mmSEM_EDC_CONFIG_BASE_IDX 0
#define mmSEM_CHICKEN_BITS2 0x01b4
#define mmSEM_CHICKEN_BITS2_BASE_IDX 0
#define mmSEM_MMHUB_CNTL 0x01b5
#define mmSEM_MMHUB_CNTL_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART1 0x01bf
#define mmSEM_REGISTER_LAST_PART1_BASE_IDX 0
#define mmIH_VMID_0_LUT 0x0000
#define mmIH_VMID_0_LUT_BASE_IDX 0
#define mmIH_VMID_1_LUT 0x0001
#define mmIH_VMID_1_LUT_BASE_IDX 0
#define mmIH_VMID_2_LUT 0x0002
#define mmIH_VMID_2_LUT_BASE_IDX 0
#define mmIH_VMID_3_LUT 0x0003
#define mmIH_VMID_3_LUT_BASE_IDX 0
#define mmIH_VMID_4_LUT 0x0004
#define mmIH_VMID_4_LUT_BASE_IDX 0
#define mmIH_VMID_5_LUT 0x0005
#define mmIH_VMID_5_LUT_BASE_IDX 0
#define mmIH_VMID_6_LUT 0x0006
#define mmIH_VMID_6_LUT_BASE_IDX 0
#define mmIH_VMID_7_LUT 0x0007
#define mmIH_VMID_7_LUT_BASE_IDX 0
#define mmIH_VMID_8_LUT 0x0008
#define mmIH_VMID_8_LUT_BASE_IDX 0
#define mmIH_VMID_9_LUT 0x0009
#define mmIH_VMID_9_LUT_BASE_IDX 0
#define mmIH_VMID_10_LUT 0x000a
#define mmIH_VMID_10_LUT_BASE_IDX 0
#define mmIH_VMID_11_LUT 0x000b
#define mmIH_VMID_11_LUT_BASE_IDX 0
#define mmIH_VMID_12_LUT 0x000c
#define mmIH_VMID_12_LUT_BASE_IDX 0
#define mmIH_VMID_13_LUT 0x000d
#define mmIH_VMID_13_LUT_BASE_IDX 0
#define mmIH_VMID_14_LUT 0x000e
#define mmIH_VMID_14_LUT_BASE_IDX 0
#define mmIH_VMID_15_LUT 0x000f
#define mmIH_VMID_15_LUT_BASE_IDX 0
#define mmIH_VMID_0_LUT_MM 0x0010
#define mmIH_VMID_0_LUT_MM_BASE_IDX 0
#define mmIH_VMID_1_LUT_MM 0x0011
#define mmIH_VMID_1_LUT_MM_BASE_IDX 0
#define mmIH_VMID_2_LUT_MM 0x0012
#define mmIH_VMID_2_LUT_MM_BASE_IDX 0
#define mmIH_VMID_3_LUT_MM 0x0013
#define mmIH_VMID_3_LUT_MM_BASE_IDX 0
#define mmIH_VMID_4_LUT_MM 0x0014
#define mmIH_VMID_4_LUT_MM_BASE_IDX 0
#define mmIH_VMID_5_LUT_MM 0x0015
#define mmIH_VMID_5_LUT_MM_BASE_IDX 0
#define mmIH_VMID_6_LUT_MM 0x0016
#define mmIH_VMID_6_LUT_MM_BASE_IDX 0
#define mmIH_VMID_7_LUT_MM 0x0017
#define mmIH_VMID_7_LUT_MM_BASE_IDX 0
#define mmIH_VMID_8_LUT_MM 0x0018
#define mmIH_VMID_8_LUT_MM_BASE_IDX 0
#define mmIH_VMID_9_LUT_MM 0x0019
#define mmIH_VMID_9_LUT_MM_BASE_IDX 0
#define mmIH_VMID_10_LUT_MM 0x001a
#define mmIH_VMID_10_LUT_MM_BASE_IDX 0
#define mmIH_VMID_11_LUT_MM 0x001b
#define mmIH_VMID_11_LUT_MM_BASE_IDX 0
#define mmIH_VMID_12_LUT_MM 0x001c
#define mmIH_VMID_12_LUT_MM_BASE_IDX 0
#define mmIH_VMID_13_LUT_MM 0x001d
#define mmIH_VMID_13_LUT_MM_BASE_IDX 0
#define mmIH_VMID_14_LUT_MM 0x001e
#define mmIH_VMID_14_LUT_MM_BASE_IDX 0
#define mmIH_VMID_15_LUT_MM 0x001f
#define mmIH_VMID_15_LUT_MM_BASE_IDX 0
#define mmIH_COOKIE_0 0x0020
#define mmIH_COOKIE_0_BASE_IDX 0
#define mmIH_COOKIE_1 0x0021
#define mmIH_COOKIE_1_BASE_IDX 0
#define mmIH_COOKIE_2 0x0022
#define mmIH_COOKIE_2_BASE_IDX 0
#define mmIH_COOKIE_3 0x0023
#define mmIH_COOKIE_3_BASE_IDX 0
#define mmIH_COOKIE_4 0x0024
#define mmIH_COOKIE_4_BASE_IDX 0
#define mmIH_COOKIE_5 0x0025
#define mmIH_COOKIE_5_BASE_IDX 0
#define mmIH_COOKIE_6 0x0026
#define mmIH_COOKIE_6_BASE_IDX 0
#define mmIH_COOKIE_7 0x0027
#define mmIH_COOKIE_7_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART0 0x003f
#define mmIH_REGISTER_LAST_PART0_BASE_IDX 0
#define mmSEM_REQ_INPUT_0 0x0040
#define mmSEM_REQ_INPUT_0_BASE_IDX 0
#define mmSEM_REQ_INPUT_1 0x0041
#define mmSEM_REQ_INPUT_1_BASE_IDX 0
#define mmSEM_REQ_INPUT_2 0x0042
#define mmSEM_REQ_INPUT_2_BASE_IDX 0
#define mmSEM_REQ_INPUT_3 0x0043
#define mmSEM_REQ_INPUT_3_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART0 0x007f
#define mmSEM_REGISTER_LAST_PART0_BASE_IDX 0
#define mmIH_RB_CNTL 0x0080
#define mmIH_RB_CNTL_BASE_IDX 0
#define mmIH_RB_BASE 0x0081
#define mmIH_RB_BASE_BASE_IDX 0
#define mmIH_RB_BASE_HI 0x0082
#define mmIH_RB_BASE_HI_BASE_IDX 0
#define mmIH_RB_RPTR 0x0083
#define mmIH_RB_RPTR_BASE_IDX 0
#define mmIH_RB_WPTR 0x0084
#define mmIH_RB_WPTR_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_HI 0x0085
#define mmIH_RB_WPTR_ADDR_HI_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_LO 0x0086
#define mmIH_RB_WPTR_ADDR_LO_BASE_IDX 0
#define mmIH_DOORBELL_RPTR 0x0087
#define mmIH_DOORBELL_RPTR_BASE_IDX 0
#define mmIH_RB_CNTL_RING1 0x008c
#define mmIH_RB_CNTL_RING1_BASE_IDX 0
#define mmIH_RB_BASE_RING1 0x008d
#define mmIH_RB_BASE_RING1_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING1 0x008e
#define mmIH_RB_BASE_HI_RING1_BASE_IDX 0
#define mmIH_RB_RPTR_RING1 0x008f
#define mmIH_RB_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_WPTR_RING1 0x0090
#define mmIH_RB_WPTR_RING1_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING1 0x0093
#define mmIH_DOORBELL_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_CNTL_RING2 0x0098
#define mmIH_RB_CNTL_RING2_BASE_IDX 0
#define mmIH_RB_BASE_RING2 0x0099
#define mmIH_RB_BASE_RING2_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING2 0x009a
#define mmIH_RB_BASE_HI_RING2_BASE_IDX 0
#define mmIH_RB_RPTR_RING2 0x009b
#define mmIH_RB_RPTR_RING2_BASE_IDX 0
#define mmIH_RB_WPTR_RING2 0x009c
#define mmIH_RB_WPTR_RING2_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING2 0x009f
#define mmIH_DOORBELL_RPTR_RING2_BASE_IDX 0
#define mmIH_VERSION 0x00a5
#define mmIH_VERSION_BASE_IDX 0
#define mmIH_CNTL 0x00c0
#define mmIH_CNTL_BASE_IDX 0
#define mmIH_CNTL2 0x00c1
#define mmIH_CNTL2_BASE_IDX 0
#define mmIH_STATUS 0x00c2
#define mmIH_STATUS_BASE_IDX 0
#define mmIH_PERFMON_CNTL 0x00c3
#define mmIH_PERFMON_CNTL_BASE_IDX 0
#define mmIH_PERFCOUNTER0_RESULT 0x00c4
#define mmIH_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmIH_PERFCOUNTER1_RESULT 0x00c5
#define mmIH_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_31_0 0x00c7
#define mmIH_DSM_MATCH_VALUE_BIT_31_0_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_63_32 0x00c8
#define mmIH_DSM_MATCH_VALUE_BIT_63_32_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_95_64 0x00c9
#define mmIH_DSM_MATCH_VALUE_BIT_95_64_BASE_IDX 0
#define mmIH_DSM_MATCH_FIELD_CONTROL 0x00ca
#define mmIH_DSM_MATCH_FIELD_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_DATA_CONTROL 0x00cb
#define mmIH_DSM_MATCH_DATA_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_FCN_ID 0x00cc
#define mmIH_DSM_MATCH_FCN_ID_BASE_IDX 0
#define mmIH_LIMIT_INT_RATE_CNTL 0x00cd
#define mmIH_LIMIT_INT_RATE_CNTL_BASE_IDX 0
#define mmIH_VF_RB_STATUS 0x00ce
#define mmIH_VF_RB_STATUS_BASE_IDX 0
#define mmIH_VF_RB_STATUS2 0x00cf
#define mmIH_VF_RB_STATUS2_BASE_IDX 0
#define mmIH_VF_RB1_STATUS 0x00d0
#define mmIH_VF_RB1_STATUS_BASE_IDX 0
#define mmIH_VF_RB1_STATUS2 0x00d1
#define mmIH_VF_RB1_STATUS2_BASE_IDX 0
#define mmIH_VF_RB2_STATUS 0x00d2
#define mmIH_VF_RB2_STATUS_BASE_IDX 0
#define mmIH_VF_RB2_STATUS2 0x00d3
#define mmIH_VF_RB2_STATUS2_BASE_IDX 0
#define mmIH_INT_FLOOD_CNTL 0x00d5
#define mmIH_INT_FLOOD_CNTL_BASE_IDX 0
#define mmIH_RB0_INT_FLOOD_STATUS 0x00d6
#define mmIH_RB0_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB1_INT_FLOOD_STATUS 0x00d7
#define mmIH_RB1_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB2_INT_FLOOD_STATUS 0x00d8
#define mmIH_RB2_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_INT_FLOOD_STATUS 0x00d9
#define mmIH_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_STORM_CLIENT_LIST_CNTL 0x00da
#define mmIH_STORM_CLIENT_LIST_CNTL_BASE_IDX 0
#define mmIH_CLK_CTRL 0x00db
#define mmIH_CLK_CTRL_BASE_IDX 0
#define mmIH_INT_FLAGS 0x00dc
#define mmIH_INT_FLAGS_BASE_IDX 0
#define mmIH_LAST_INT_INFO0 0x00dd
#define mmIH_LAST_INT_INFO0_BASE_IDX 0
#define mmIH_LAST_INT_INFO1 0x00de
#define mmIH_LAST_INT_INFO1_BASE_IDX 0
#define mmIH_LAST_INT_INFO2 0x00df
#define mmIH_LAST_INT_INFO2_BASE_IDX 0
#define mmIH_SCRATCH 0x00e0
#define mmIH_SCRATCH_BASE_IDX 0
#define mmIH_CLIENT_CREDIT_ERROR 0x00e1
#define mmIH_CLIENT_CREDIT_ERROR_BASE_IDX 0
#define mmIH_GPU_IOV_VIOLATION_LOG 0x00e2
#define mmIH_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmIH_COOKIE_REC_VIOLATION_LOG 0x00e3
#define mmIH_COOKIE_REC_VIOLATION_LOG_BASE_IDX 0
#define mmIH_CREDIT_STATUS 0x00e4
#define mmIH_CREDIT_STATUS_BASE_IDX 0
#define mmIH_MMHUB_ERROR 0x00e5
#define mmIH_MMHUB_ERROR_BASE_IDX 0
#define mmIH_MEM_POWER_CTRL 0x00e8
#define mmIH_MEM_POWER_CTRL_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART2 0x00ff
#define mmIH_REGISTER_LAST_PART2_BASE_IDX 0
#define mmSEM_CLK_CTRL 0x0100
#define mmSEM_CLK_CTRL_BASE_IDX 0
#define mmSEM_UTC_CREDIT 0x0101
#define mmSEM_UTC_CREDIT_BASE_IDX 0
#define mmSEM_UTC_CONFIG 0x0102
#define mmSEM_UTC_CONFIG_BASE_IDX 0
#define mmSEM_UTCL2_TRAN_EN_LUT 0x0103
#define mmSEM_UTCL2_TRAN_EN_LUT_BASE_IDX 0
#define mmSEM_MCIF_CONFIG 0x0104
#define mmSEM_MCIF_CONFIG_BASE_IDX 0
#define mmSEM_PERFMON_CNTL 0x0105
#define mmSEM_PERFMON_CNTL_BASE_IDX 0
#define mmSEM_PERFCOUNTER0_RESULT 0x0106
#define mmSEM_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmSEM_PERFCOUNTER1_RESULT 0x0107
#define mmSEM_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmSEM_STATUS 0x0108
#define mmSEM_STATUS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG 0x0109
#define mmSEM_MAILBOX_CLIENTCONFIG_BASE_IDX 0
#define mmSEM_MAILBOX 0x010a
#define mmSEM_MAILBOX_BASE_IDX 0
#define mmSEM_MAILBOX_CONTROL 0x010b
#define mmSEM_MAILBOX_CONTROL_BASE_IDX 0
#define mmSEM_CHICKEN_BITS 0x010c
#define mmSEM_CHICKEN_BITS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA 0x010d
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA_BASE_IDX 0
#define mmSEM_GPU_IOV_VIOLATION_LOG 0x010e
#define mmSEM_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmSEM_OUTSTANDING_THRESHOLD 0x010f
#define mmSEM_OUTSTANDING_THRESHOLD_BASE_IDX 0
#define mmSEM_MEM_POWER_CTRL 0x0110
#define mmSEM_MEM_POWER_CTRL_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART2 0x017f
#define mmSEM_REGISTER_LAST_PART2_BASE_IDX 0
|
||||
#define mmIH_ACTIVE_FCN_ID 0x0180
|
||||
#define mmIH_ACTIVE_FCN_ID_BASE_IDX 0
|
||||
#define mmIH_VIRT_RESET_REQ 0x0181
|
||||
#define mmIH_VIRT_RESET_REQ_BASE_IDX 0
|
||||
#define mmIH_CLIENT_CFG 0x0184
|
||||
#define mmIH_CLIENT_CFG_BASE_IDX 0
|
||||
#define mmIH_CLIENT_CFG_INDEX 0x0188
|
||||
#define mmIH_CLIENT_CFG_INDEX_BASE_IDX 0
|
||||
#define mmIH_CLIENT_CFG_DATA 0x0189
|
||||
#define mmIH_CLIENT_CFG_DATA_BASE_IDX 0
|
||||
#define mmIH_CID_REMAP_INDEX 0x018a
|
||||
#define mmIH_CID_REMAP_INDEX_BASE_IDX 0
|
||||
#define mmIH_CID_REMAP_DATA 0x018b
|
||||
#define mmIH_CID_REMAP_DATA_BASE_IDX 0
|
||||
#define mmIH_CHICKEN 0x018c
|
||||
#define mmIH_CHICKEN_BASE_IDX 0
|
||||
#define mmIH_MMHUB_CNTL 0x018d
|
||||
#define mmIH_MMHUB_CNTL_BASE_IDX 0
|
||||
#define mmIH_INT_DROP_CNTL 0x018e
|
||||
#define mmIH_INT_DROP_CNTL_BASE_IDX 0
|
||||
#define mmIH_INT_DROP_MATCH_VALUE0 0x018f
|
||||
#define mmIH_INT_DROP_MATCH_VALUE0_BASE_IDX 0
|
||||
#define mmIH_INT_DROP_MATCH_VALUE1 0x0190
|
||||
#define mmIH_INT_DROP_MATCH_VALUE1_BASE_IDX 0
|
||||
#define mmIH_INT_DROP_MATCH_MASK0 0x0191
|
||||
#define mmIH_INT_DROP_MATCH_MASK0_BASE_IDX 0
|
||||
#define mmIH_INT_DROP_MATCH_MASK1 0x0192
|
||||
#define mmIH_INT_DROP_MATCH_MASK1_BASE_IDX 0
|
||||
#define mmIH_REGISTER_LAST_PART1 0x019f
|
||||
#define mmIH_REGISTER_LAST_PART1_BASE_IDX 0
|
||||
#define mmSEM_ACTIVE_FCN_ID 0x01a0
|
||||
#define mmSEM_ACTIVE_FCN_ID_BASE_IDX 0
|
||||
#define mmSEM_VIRT_RESET_REQ 0x01a1
|
||||
#define mmSEM_VIRT_RESET_REQ_BASE_IDX 0
|
||||
#define mmSEM_RESP_SDMA0 0x01a4
|
||||
#define mmSEM_RESP_SDMA0_BASE_IDX 0
|
||||
#define mmSEM_RESP_SDMA1 0x01a5
|
||||
#define mmSEM_RESP_SDMA1_BASE_IDX 0
|
||||
#define mmSEM_RESP_UVD 0x01a6
|
||||
#define mmSEM_RESP_UVD_BASE_IDX 0
|
||||
#define mmSEM_RESP_VCE_0 0x01a7
|
||||
#define mmSEM_RESP_VCE_0_BASE_IDX 0
|
||||
#define mmSEM_RESP_ACP 0x01a8
|
||||
#define mmSEM_RESP_ACP_BASE_IDX 0
|
||||
#define mmSEM_RESP_ISP 0x01a9
|
||||
#define mmSEM_RESP_ISP_BASE_IDX 0
|
||||
#define mmSEM_RESP_VCE_1 0x01aa
|
||||
#define mmSEM_RESP_VCE_1_BASE_IDX 0
|
||||
#define mmSEM_RESP_VP8 0x01ab
|
||||
#define mmSEM_RESP_VP8_BASE_IDX 0
|
||||
#define mmSEM_RESP_GC 0x01ac
|
||||
#define mmSEM_RESP_GC_BASE_IDX 0
|
||||
#define mmSEM_RESP_UVD_1 0x01ad
|
||||
#define mmSEM_RESP_UVD_1_BASE_IDX 0
|
||||
#define mmSEM_CID_REMAP_INDEX 0x01b0
|
||||
#define mmSEM_CID_REMAP_INDEX_BASE_IDX 0
|
||||
#define mmSEM_CID_REMAP_DATA 0x01b1
|
||||
#define mmSEM_CID_REMAP_DATA_BASE_IDX 0
|
||||
#define mmSEM_ATOMIC_OP_LUT 0x01b2
|
||||
#define mmSEM_ATOMIC_OP_LUT_BASE_IDX 0
|
||||
#define mmSEM_EDC_CONFIG 0x01b3
|
||||
#define mmSEM_EDC_CONFIG_BASE_IDX 0
|
||||
#define mmSEM_CHICKEN_BITS2 0x01b4
|
||||
#define mmSEM_CHICKEN_BITS2_BASE_IDX 0
|
||||
#define mmSEM_MMHUB_CNTL 0x01b5
|
||||
#define mmSEM_MMHUB_CNTL_BASE_IDX 0
|
||||
#define mmSEM_REGISTER_LAST_PART1 0x01bf
|
||||
#define mmSEM_REGISTER_LAST_PART1_BASE_IDX 0
|
||||
|
||||
#endif
|
||||
|
||||