Pull from Github
Squashed commit of the following:

commit f029195705a15700380c6f832ba5d15d46fd6de7
Author: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Date:   Thu Jul 13 14:38:56 2023 -0500

    Formatting workflows for source (clang-format) and cmake (cmake-format) (#4)

    * Add .cmake-format.yaml file
    * Add formatting workflow
    * provide base input for creating PR
    * Update scheme for extracting branch name
      - disable running formatting on push to amd-staging branch
    * patch .cmake-format.yaml for find_package signature
      - apparently cmake-format doesn't format the full signature of find_package
    * run formatting (clang-format v11) (#7)
      Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>
    * run cmake formatting (cmake-format) (#6)
      Co-authored-by: jrmadsen <jrmadsen@users.noreply.github.com>

    ---------

    Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

commit bc4d135fdd8a1a9e51235f18a5d575fd2b3735e6
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Thu Jul 13 12:55:17 2023 -0500

    Removing Build cache for potential issues with auto-generated header files (#5)

    Change-Id: I9e2319f4335e2f88585ffa6fac2bd88a1c952e6e

commit ce86dea6a311d44d880fa684eb78f3329295e2a4
Author: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Date:   Thu Jul 13 11:08:58 2023 -0500

    Fix decltype(<hsa-function>) function pointer usage (#3)

    - the following is done in several places:
        decltype(hsa_memory_allocate)* hsa_memory_allocate
    - above can cause compiler errors
    - replace decltype(<hsa-function>) with decltype(::<hsa-function>)
      - this ensures that the type within the decltype is recognized as the
        global scope HSA function, not the variable
    - in many places, the variable has a "_fn" suffix to prevent this issue
      but added '::' anyway for consistency

commit ac49fdd92a72e9c99394253a02da413a6c2e3b3a
Merge: a07946a 03a0855
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Wed Jul 12 11:36:24 2023 -0500

    Merge pull request #2 from ROCm-Developer-Tools/gerrit-amd-staging

    Pull from gerrit

commit 03a085588cffe863e8f466de67be1cfb205b675a
Merge: c26b32b a07946a
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Wed Jul 12 10:57:30 2023 -0500

    Merge branch 'amd-staging' into gerrit-amd-staging

commit a07946a5cd4c670c83c27ad1a076a9d4567ce6d7
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 15:46:04 2023 +0000

    Enabling Cached Builds

commit 525e494a7f13941077a8fd4ad6840904db4d27d4
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 04:53:54 2023 +0000

    Updating missed GPU Targets

commit 42c75862f628c9bee7cfb7dc04dff2619430efbc
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 04:43:02 2023 +0000

    Adding V1 Testing

commit 9d72fd4aee85e4b0c12e717060d2730fa5b73be1
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 03:34:31 2023 +0000

    Fixing Artifacts directory path

commit f4000cc558b3b2e4676f7994f7ce8c8e6f94518e
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 03:27:26 2023 +0000

    Fixing CMake for test build job

commit 2ce8115d4c33948c3c8f957f545a95a04e1d6cd2
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 03:16:18 2023 +0000

    Fixing Ubuntu CMake for ubuntu test build

commit 6d0ed439191be900748d0c025157f9d689a73ec7
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 01:28:41 2023 +0000

    Removing Navi21

commit e349a7642e5ae5eb03ab9fcd0a0f74f09f78cab5
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 01:14:14 2023 +0000

    Removing Navi21

commit fefd02fe68d2a4bca7ec2e381960ad004ee9fc5b
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 00:42:48 2023 +0000

    Fixing CMake Job

commit 2ea46abf7bf92643efa8c549fa70346ffbd79d65
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 00:35:13 2023 +0000

    Fixing CMake Job

commit d99d681ed1999c5fcf291dc678b11a77205fb0f3
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Wed Jul 12 00:32:13 2023 +0000

    Fixing Pull Latest Dockers and CMake Jobs

commit dfc4498072d13b4a1df3a63047d34c682c3d9a29
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 23:54:21 2023 +0000

    Fixing CMake job

commit 919efe04de707f7c702031be15c3e2c5f8442cbb
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 23:52:13 2023 +0000

    Adding Pull Last dockers job

commit be1b1256e8b0e05308e8f7e7e69bee3acca55281
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Tue Jul 11 18:25:40 2023 -0500

    Update cmake.yml

commit 212299fa4355ae6ec18f9aaacbb79c51ea6c6f97
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Tue Jul 11 18:23:35 2023 -0500

    Update cmake.yml

commit 7c2c1327086a61466cc6cac39f70865c051a8bc7
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Tue Jul 11 18:18:53 2023 -0500

    Update cmake.yml

commit 191b5ce007e612e814c1d7a3afb4ad398f3852e1
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Tue Jul 11 16:03:22 2023 -0500

    Update cmake.yml

commit 8824113d95f3e13c7ce4d0af8e0d9d8f522a6c4a
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 16:28:09 2023 +0000

    Fixing Pull from Gerrit job name

    Change-Id: I9e7ed9a27a13ca49d62c93bdadb30f0057e4d385

commit cc3d5e4b02ffb439e8cc2b3efa53527c376f9982
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 16:21:43 2023 +0000

    Adding Staging sync job

    Change-Id: I0551f43878b0678ce4b3e74e27d62357cf95ad95

commit b9be2eee71380a2e6dd34d520e92d0c4209277a0
Author: Ammar ELWazir <Ammar.ELWazir@amd.com>
Date:   Tue Jul 11 15:57:11 2023 +0000

    Fixing build.sh

    Change-Id: Ia987b0244f0875370d5fe69907b3f5e9cea914de

commit 9eee33a95a1abd656a7ac5ca10a9f245e9825431
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 21:39:46 2023 -0500

    Update cmake.yml

commit 7093b85a78497140e8b52632ca2a002bdaeacd62
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 21:33:29 2023 -0500

    Update cmake.yml

commit f54697172c72a67740f9fdfa0c217b6ea6931576
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 21:01:26 2023 -0500

    Update cmake.yml

commit 1b6620e16f8940386b0f4f04e69e2410d21c0e26
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 20:21:02 2023 -0500

    Update cmake.yml

commit a94bec740c6b42c4b79c87bca20fa87b99bf060d
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:46:35 2023 -0500

    Update cmake.yml

commit 85d6b29d4375a69d575c18ece8542c50f2ddfcc3
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:34:39 2023 -0500

    Update cmake.yml

commit 8c004887cf1435f1a6214c3d2455299a8a27bd4c
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:31:17 2023 -0500

    Update cmake.yml

commit a14a9168e17d9348a53c6e9c9a47ba1edb4c4509
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:25:46 2023 -0500

    Update cmake.yml

commit 000f2f40b84e6a2f7d4becdbf5aed01436ca4c83
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:08:18 2023 -0500

    Update cmake.yml

commit a28a53d56731cad848fa9133d1c4dbaa8fc7afa7
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 19:03:39 2023 -0500

    Update cmake.yml

commit a6a2db01027f0b01fdfbb5997ddb772c7f51b649
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 18:21:53 2023 -0500

    Update cmake.yml

commit 118ef2a88b2d44e3207c31c343da3e5e5ec6f176
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 17:55:57 2023 -0500

    Update cmake.yml

commit 03c4c232396440cd0be6d2dd7baf4ceea1c2589d
Author: Ammar ELWazir <aelwazir@amd.com>
Date:   Mon Jul 10 17:48:49 2023 -0500

    Create cmake.yml

Change-Id: I77992f15694e77cbae49c56f9ff02f4f9079235d

[ROCm/rocprofiler commit:d4a33cf33a]
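The `decltype(::<hsa-function>)` fix described in the commit message above can be illustrated with a small sketch. Everything named here (`demo_allocate`, `demo_api_table`, `allocate_via_table`) is a hypothetical stand-in, not an HSA or rocprofiler identifier, and the struct layout is only an assumption about how such function pointers might be stored. When a class member shares its name with a global function, writing `decltype(demo_allocate)* demo_allocate` inside the class can be rejected by some compilers (GCC, for instance, may report that the declaration "changes meaning of" the name). Qualifying the function with `::` makes the intent explicit:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a global HSA entry point such as
// hsa_memory_allocate; it hands out a pointer into a small static buffer.
int
demo_allocate(std::size_t size, void** ptr)
{
    static char buffer[64];
    if(ptr == nullptr || size > sizeof(buffer)) return 1;
    *ptr = buffer;
    return 0;
}

// Hypothetical function-pointer table (an assumption for illustration).
struct demo_api_table
{
    // The member deliberately has the same name as the global function.
    // '::' forces the name inside decltype to refer to the global function,
    // not to the member being declared.
    decltype(::demo_allocate)* demo_allocate = &::demo_allocate;
};

// The member is a pointer to the global function's type.
static_assert(std::is_same<decltype(demo_api_table::demo_allocate),
                           int (*)(std::size_t, void**)>::value,
              "member should be a pointer to the global function type");

// Calls the global function through the table; returns its status code.
int
allocate_via_table(std::size_t size)
{
    demo_api_table table;
    void*          ptr    = nullptr;
    int            status = table.demo_allocate(size, &ptr);
    assert(status != 0 || ptr != nullptr);
    return status;
}
```

Calling `allocate_via_table(16)` goes through the member pointer to the global function. The commit message notes that a `_fn` suffix on the variable avoids the clash as well; the `::` qualifier is the variant that keeps the member name unchanged.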
Committed by: Ammar Elwazir
Parent: e599708211
Commit: 6eb06cf201
@@ -0,0 +1,98 @@
+parse:
+  additional_commands:
+    find_package:
+      flags:
+      - EXACT
+      - QUIET
+      - MODULE
+      - REQUIRED
+      - CONFIG
+      - NO_MODULE
+      - GLOBAL
+      - NO_POLICY_SCOPE
+      - BYPASS_PROVIDER
+      - NO_DEFAULT_PATH
+      - NO_PACKAGE_ROOT_PATH
+      - NO_CMAKE_PATH
+      - NO_CMAKE_ENVIRONMENT_PATH
+      - NO_SYSTEM_ENVIRONMENT_PATH
+      - NO_CMAKE_PACKAGE_REGISTRY
+      - NO_CMAKE_BUILDS_PATH
+      - NO_CMAKE_SYSTEM_PATH
+      - NO_CMAKE_INSTALL_PREFIX
+      - NO_CMAKE_SYSTEM_PACKAGE_REGISTRY
+      - CMAKE_FIND_ROOT_PATH_BOTH
+      - ONLY_CMAKE_FIND_ROOT_PATH
+      - NO_CMAKE_FIND_ROOT_PATH
+      kwargs:
+        COMPONENTS: '*'
+        OPTIONAL_COMPONENTS: '*'
+        NAMES: '*'
+        CONFIGS: '*'
+        HINTS: '*'
+        PATHS: '*'
+        REGISTRY_VIEW: '*'
+        PATH_SUFFIXES: '*'
+  override_spec: {}
+  vartags: []
+  proptags: []
+format:
+  disable: false
+  line_width: 90
+  tab_size: 4
+  use_tabchars: false
+  fractional_tab_policy: use-space
+  max_subgroups_hwrap: 2
+  max_pargs_hwrap: 8
+  max_rows_cmdline: 2
+  separate_ctrl_name_with_space: false
+  separate_fn_name_with_space: false
+  dangle_parens: false
+  dangle_align: child
+  min_prefix_chars: 4
+  max_prefix_chars: 10
+  max_lines_hwrap: 2
+  line_ending: unix
+  command_case: lower
+  keyword_case: upper
+  always_wrap: []
+  enable_sort: true
+  autosort: false
+  require_valid_layout: false
+  layout_passes: {}
+markup:
+  bullet_char: '*'
+  enum_char: .
+  first_comment_is_literal: true
+  literal_comment_pattern: ^#
+  fence_pattern: ^\s*([`~]{3}[`~]*)(.*)$
+  ruler_pattern: ^\s*[^\w\s]{3}.*[^\w\s]{3}$
+  explicit_trailing_pattern: '#<'
+  hashruler_min_length: 10
+  canonicalize_hashrulers: true
+  enable_markup: true
+lint:
+  disabled_codes: []
+  function_pattern: '[0-9a-z_]+'
+  macro_pattern: '[0-9A-Z_]+'
+  global_var_pattern: '[A-Z][0-9A-Z_]+'
+  internal_var_pattern: _[A-Z][0-9A-Z_]+
+  local_var_pattern: '[a-z][a-z0-9_]+'
+  private_var_pattern: _[0-9a-z_]+
+  public_var_pattern: '[A-Z][0-9A-Z_]+'
+  argument_var_pattern: '[a-z][a-z0-9_]+'
+  keyword_pattern: '[A-Z][0-9A-Z_]+'
+  max_conditionals_custom_parser: 2
+  min_statement_spacing: 1
+  max_statement_spacing: 2
+  max_returns: 6
+  max_branches: 12
+  max_arguments: 5
+  max_localvars: 15
+  max_statements: 50
+encode:
+  emit_byteorder_mark: false
+  input_encoding: utf-8
+  output_encoding: utf-8
+misc:
+  per_command: {}
+14
-159
@@ -34,16 +34,7 @@ jobs:
     steps:
     - uses: actions/checkout@v3
 
-    - name: Restore cached Build
-      id: cache-build-restore
-      uses: actions/cache/restore@v3
-      with:
-        path: |
-          ${{github.workspace}}/build
-        key: mi200_ubuntu_22_04
-
     - name: Configure CMake
-      if: steps.cache-build-restore.outputs.cache-hit != 'false'
       # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
       # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
       run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -56,14 +47,6 @@ jobs:
     - name: Build Tests, Samples, Documentation, Packages
       run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests samples doc package
 
-    - name: Save Build
-      id: cache-build-save
-      uses: actions/cache/save@v3
-      with:
-        path: |
-          ${{github.workspace}}/build
-        key: mi200_ubuntu_22_04
-
     - name: Testing V1
       run: |
         cd ${{github.workspace}}/build
@@ -102,16 +85,7 @@ jobs:
     steps:
     - uses: actions/checkout@v3
 
-    - name: Restore cached Build
-      id: cache-build-restore
-      uses: actions/cache/restore@v3
-      with:
-        path: |
-          ${{github.workspace}}/build
-        key: mi200_ubuntu_20_04
-
     - name: Configure CMake
-      if: steps.cache-build-restore.outputs.cache-hit != 'false'
       # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
       # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
       run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -153,16 +127,7 @@ jobs:
     steps:
     - uses: actions/checkout@v3
 
-    - name: Restore cached Build
-      id: cache-build-restore
-      uses: actions/cache/restore@v3
-      with:
-        path: |
-          ${{github.workspace}}/build
-        key: mi200_sles
-
     - name: Configure CMake
-      if: steps.cache-build-restore.outputs.cache-hit != 'false'
       # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
       # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
       run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -175,14 +140,6 @@ jobs:
     - name: Build Tests
       run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
 
-    - name: Save Build
-      id: cache-build-save
-      uses: actions/cache/save@v3
-      with:
-        path: |
-          ${{github.workspace}}/build
-        key: mi200_sles
-
     - name: Testing V1
       run: |
         cd ${{github.workspace}}/build
@@ -212,16 +169,7 @@ jobs:
     steps:
     - uses: actions/checkout@v3
 
-    - name: Restore cached Build
-      id: cache-build-restore
-      uses: actions/cache/restore@v3
-      with:
-        path: |
-          ${{github.workspace}}/build
-        key: mi200_rhel_8
-
     - name: Configure CMake
-      if: steps.cache-build-restore.outputs.cache-hit != 'false'
       # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
       # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
       run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -234,14 +182,6 @@ jobs:
     - name: Build Tests
       run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
 
-    - name: Save Build
-      id: cache-build-save
-      uses: actions/cache/save@v3
-      with:
-        path: |
-          ${{github.workspace}}/build
-        key: mi200_rhel_8
-
     - name: Testing V1
       run: |
         cd ${{github.workspace}}/build
@@ -271,16 +211,7 @@ jobs:
     steps:
     - uses: actions/checkout@v3
 
-    - name: Restore cached Build
-      id: cache-build-restore
-      uses: actions/cache/restore@v3
-      with:
-        path: |
-          ${{github.workspace}}/build
-        key: mi200_rhel_9
-
     - name: Configure CMake
-      if: steps.cache-build-restore.outputs.cache-hit != 'false'
       # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
       # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
       run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -293,14 +224,6 @@ jobs:
     - name: Build Tests
       run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
 
-    - name: Save Build
-      id: cache-build-save
-      uses: actions/cache/save@v3
-      with:
-        path: |
-          ${{github.workspace}}/build
-        key: mi200_rhel_9
-
     - name: Testing V1
       run: |
         cd ${{github.workspace}}/build
@@ -326,16 +249,7 @@ jobs:
     steps:
     - uses: actions/checkout@v3
 
-    - name: Restore cached Build
-      id: cache-build-restore
-      uses: actions/cache/restore@v3
-      with:
-        path: |
-          ${{github.workspace}}/build
-        key: vega20
-
     - name: Configure CMake
-      if: steps.cache-build-restore.outputs.cache-hit != 'false'
       # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
       # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
       run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -348,14 +262,6 @@ jobs:
     - name: Build Tests
       run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
 
-    - name: Save Build
-      id: cache-build-save
-      uses: actions/cache/save@v3
-      with:
-        path: |
-          ${{github.workspace}}/build
-        key: vega20
-
     - name: Testing V1
       run: |
         cd ${{github.workspace}}/build
@@ -381,16 +287,7 @@ jobs:
     steps:
     - uses: actions/checkout@v3
 
-    - name: Restore cached Build
-      id: cache-build-restore
-      uses: actions/cache/restore@v3
-      with:
-        path: |
-          ${{github.workspace}}/build
-        key: navi32
-
     - name: Configure CMake
-      if: steps.cache-build-restore.outputs.cache-hit != 'false'
       # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
       # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
       run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -403,14 +300,6 @@ jobs:
     - name: Build Tests
       run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
 
-    - name: Save Build
-      id: cache-build-save
-      uses: actions/cache/save@v3
-      with:
-        path: |
-          ${{github.workspace}}/build
-        key: navi32
-
     - name: Testing V1
       run: |
         cd ${{github.workspace}}/build
@@ -436,16 +325,7 @@ jobs:
     steps:
     - uses: actions/checkout@v3
 
-    - name: Restore cached Build
-      id: cache-build-restore
-      uses: actions/cache/restore@v3
-      with:
-        path: |
-          ${{github.workspace}}/build
-        key: mi100
-
     - name: Configure CMake
-      if: steps.cache-build-restore.outputs.cache-hit != 'false'
       # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
       # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
       run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -458,14 +338,6 @@ jobs:
     - name: Build Tests
       run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
 
-    - name: Save Build
-      id: cache-build-save
-      uses: actions/cache/save@v3
-      with:
-        path: |
-          ${{github.workspace}}/build
-        key: mi100
-
     - name: Testing V1
       run: |
         cd ${{github.workspace}}/build
@@ -492,16 +364,7 @@ jobs:
     # steps:
     # - uses: actions/checkout@v3
 
-    # - name: Restore cached Build
-      # id: cache-build-restore
-      # uses: actions/cache/restore@v3
-      # with:
-        # path: |
-          # ${{github.workspace}}/build
-        # key: navi21
-
     # - name: Configure CMake
-      # if: steps.cache-build-restore.outputs.cache-hit != 'false'
       # # Configure CMake in a 'build' subdirectory. `CMAKE_BUILD_TYPE` is only required if you are using a single-configuration generator such as make.
       # # See https://cmake.org/cmake/help/latest/variable/CMAKE_BUILD_TYPE.html?highlight=cmake_build_type
       # run: cmake -B ${{github.workspace}}/build -DCMAKE_BUILD_TYPE=${{env.BUILD_TYPE}} -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE -DCMAKE_MODULE_PATH="${{env.ROCM_PATH}}/hip/cmake;${{env.ROCM_PATH}}/lib/cmake" -DCMAKE_PREFIX_PATH="${{env.PREFIX_PATH}}" -DCMAKE_INSTALL_PREFIX="${{env.ROCM_PATH}}" -DCMAKE_SHARED_LINKER_FLAGS="${{env.LD_RUNPATH_FLAG}}" -DCMAKE_INSTALL_RPATH=${{env.ROCM_RPATH}} -DCMAKE_INSTALL_RPATH_USE_LINK_PATH=FALSE -DGPU_TARGETS="${{env.GPU_LIST}}" -DCPACK_PACKAGING_INSTALL_PREFIX=${{env.ROCM_PATH}} -DCPACK_GENERATOR='DEB;RPM;TGZ' -DCPACK_OBJCOPY_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objcopy" -DCPACK_READELF_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-readelf" -DCPACK_STRIP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-strip" -DCPACK_OBJDUMP_EXECUTABLE="${{env.ROCM_PATH}}/llvm/bin/llvm-objdump"
@@ -514,26 +377,18 @@ jobs:
     # - name: Build Tests
       # run: cmake --build ${{github.workspace}}/build --config ${{env.BUILD_TYPE}} -- -j 16 tests
 
-    # - name: Save Build
-      # id: cache-build-save
-      # uses: actions/cache/save@v3
-      # with:
-        # path: |
-          # ${{github.workspace}}/build
-        # key: navi21
-    # - name: Testing V1
-      # run: |
-        # cd ${{github.workspace}}/build
-        # ./run.sh
-      # # TODO(aelwazir): Enable this once ctest is fixed
-      # # working-directory: ${{github.workspace}}/build/tests-v2
-      # # Execute tests defined by the CMake configuration.
-      # # See https://cmake.org/cmake/help/latest/manual/ctest.1.html for more detail
-      # # TODO(aelwazir): Enable this once ctest is fixed
-      # # run: ctest --parallel 16 -C ${{env.BUILD_TYPE}}
-
+    # - name: Testing V1
+      # run: |
+        # cd ${{github.workspace}}/build
+        # ./run.sh
+      # # TODO(aelwazir): Enable this once ctest is fixed
+      # # working-directory: ${{github.workspace}}/build/tests-v2
+      # # Execute tests defined by the CMake configuration.
+      # # See https://cmake.org/cmake/help/latest/manual/ctest.1.html for more detail
+      # # TODO(aelwazir): Enable this once ctest is fixed
+      # # run: ctest --parallel 16 -C ${{env.BUILD_TYPE}}
 
-    # - name: Testing V2
-      # run: |
-        # cd ${{github.workspace}}/build
-        # make -j check
+    # - name: Testing V2
+      # run: |
+        # cd ${{github.workspace}}/build
+        # make -j check
@@ -0,0 +1,95 @@
+
+name: Formatting
+run-name: formatting
+
+on:
+  pull_request:
+    branches: [ amd-staging ]
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  cmake:
+    runs-on: ubuntu-22.04
+
+    steps:
+    - uses: actions/checkout@v3
+
+    - name: Extract branch name
+      shell: bash
+      run: |
+        echo "branch=${GITHUB_HEAD_REF:-${GITHUB_HEAD_REF#refs/heads/}}" >> $GITHUB_OUTPUT
+      id: extract_branch
+
+    - name: Install dependencies
+      run: |
+        sudo apt-get update
+        sudo apt-get install -y python3-pip
+        python3 -m pip install -U cmake-format
+
+    - name: Run cmake-format
+      run: |
+        set +e
+        cmake-format -i $(find . -type f | egrep 'CMakeLists.txt|\.cmake$')
+        if [ $(git diff | wc -l) -ne 0 ]; then
+          echo -e "\nError! CMake code not formatted. Run cmake-format...\n"
+          echo -e "\nFiles:\n"
+          git diff --name-only
+          echo -e "\nFull diff:\n"
+          git diff
+          exit 1
+        fi
+
+    - name: Create pull request
+      if: failure()
+      uses: peter-evans/create-pull-request@v5
+      with:
+        commit-message: "run cmake formatting (cmake-format)"
+        branch: ${{ steps.extract_branch.outputs.branch }}-cmake-format
+        delete-branch: true
+        title: "Apply cmake-format to ${{ steps.extract_branch.outputs.branch }}"
+        base: ${{ steps.extract_branch.outputs.branch }}
+
+  source:
+    runs-on: ubuntu-22.04
+
+    steps:
+    - uses: actions/checkout@v3
+
+    - name: Install dependencies
+      run: |
+        DISTRIB_CODENAME=$(cat /etc/lsb-release | grep DISTRIB_CODENAME | awk -F '=' '{print $NF}')
+        sudo apt-get update
+        sudo apt-get install -y software-properties-common wget curl clang-format-11
+
+    - name: Extract branch name
+      shell: bash
+      run: |
+        echo "branch=${GITHUB_HEAD_REF:-${GITHUB_HEAD_REF#refs/heads/}}" >> $GITHUB_OUTPUT
+      id: extract_branch
+
+    - name: Run clang-format
+      run: |
+        set +e
+        FILES=$(find include plugin samples src test tests-v2 -type f | egrep '\.(h|hpp|hh|c|cc|cpp)(|\.in)$')
+        FORMAT_OUT=$(clang-format-11 -i ${FILES})
+        if [ $(git diff | wc -l) -ne 0 ]; then
+          echo -e "\nError! Code not formatted. Run clang-format (version 11)...\n"
+          echo -e "\nFiles:\n"
+          git diff --name-only
+          echo -e "\nFull diff:\n"
+          git diff
+          exit 1
+        fi
+
+    - name: Create pull request
+      if: failure()
+      uses: peter-evans/create-pull-request@v5
+      with:
+        commit-message: "run formatting (clang-format v11)"
+        branch: ${{ steps.extract_branch.outputs.branch }}-clang-format
+        delete-branch: true
+        title: "Apply clang-format (v11) to ${{ steps.extract_branch.outputs.branch }}"
+        base: ${{ steps.extract_branch.outputs.branch }}
@@ -24,7 +24,7 @@ cmake_minimum_required(VERSION 3.18.0)
 
 # Build is not supported on Windows plaform
 if(WIN32)
-  message(FATAL_ERROR "Windows build is not supported.")
+    message(FATAL_ERROR "Windows build is not supported.")
 endif()
 
 # Set module name and project name.
@@ -37,9 +37,9 @@ include(GNUInstallDirs)
 
 # set default ROCM_PATH
 if(NOT DEFINED ROCM_PATH)
-  set(ROCM_PATH
-      "/opt/rocm"
-      CACHE STRING "Default ROCM installation directory")
+    set(ROCM_PATH
+        "/opt/rocm"
+        CACHE STRING "Default ROCM installation directory")
 endif()
 
 set(CMAKE_CXX_STANDARD 17)
@@ -62,8 +62,8 @@ set(BUILD_VERSION_MAJOR ${VERSION_MAJOR})
 set(BUILD_VERSION_MINOR ${VERSION_MINOR})
 set(BUILD_VERSION_PATCH ${VERSION_PATCH})
 if(DEFINED VERSION_BUILD AND NOT ${VERSION_BUILD} STREQUAL "")
-  message("VERSION BUILD DEFINED ${VERSION_BUILD}")
-  set(BUILD_VERSION_PATCH "${BUILD_VERSION_PATCH}-${VERSION_BUILD}")
+    message("VERSION BUILD DEFINED ${VERSION_BUILD}")
+    set(BUILD_VERSION_PATCH "${BUILD_VERSION_PATCH}-${VERSION_BUILD}")
 endif()
 set(BUILD_VERSION_STRING
     "${BUILD_VERSION_MAJOR}.${BUILD_VERSION_MINOR}.${BUILD_VERSION_PATCH}")
@@ -71,12 +71,11 @@ set(BUILD_VERSION_STRING
 set(LIB_VERSION_MAJOR ${VERSION_MAJOR})
 set(LIB_VERSION_MINOR ${VERSION_MINOR})
 if(${ROCM_PATCH_VERSION})
-set(LIB_VERSION_PATCH ${ROCM_PATCH_VERSION})
+set(LIB_VERSION_PATCH ${ROCM_PATCH_VERSION})
 else()
-set(LIB_VERSION_PATCH ${VERSION_PATCH})
+set(LIB_VERSION_PATCH ${VERSION_PATCH})
 endif()
-set(LIB_VERSION_STRING
-"${LIB_VERSION_MAJOR}.${LIB_VERSION_MINOR}.${LIB_VERSION_PATCH}")
+set(LIB_VERSION_STRING "${LIB_VERSION_MAJOR}.${LIB_VERSION_MINOR}.${LIB_VERSION_PATCH}")
 message("-- LIB-VERSION STRING: ${LIB_VERSION_STRING}")

 # Set target and root/lib/test directory
@@ -86,97 +85,84 @@ set(LIB_DIR "${ROOT_DIR}/src")
 set(TEST_DIR "${ROOT_DIR}/test")

 find_package(
-amd_comgr
-REQUIRED
-CONFIG
-HINTS
-${CMAKE_INSTALL_PREFIX}
-PATHS
-${ROCM_PATH}
-PATH_SUFFIXES
-lib/cmake/amd_comgr)
+amd_comgr REQUIRED CONFIG
+HINTS ${CMAKE_INSTALL_PREFIX}
+PATHS ${ROCM_PATH}
+PATH_SUFFIXES lib/cmake/amd_comgr)
 message(STATUS "Code Object Manager found at ${amd_comgr_DIR}.")
 link_libraries(amd_comgr)

 find_package(Threads REQUIRED)
 find_package(
-hsa-runtime64
-REQUIRED
-CONFIG
-HINTS
-${CMAKE_INSTALL_PREFIX}
-PATHS
-${ROCM_PATH})
+hsa-runtime64 REQUIRED CONFIG
+HINTS ${CMAKE_INSTALL_PREFIX}
+PATHS ${ROCM_PATH})
 find_package(
-HIP
-REQUIRED
-CONFIG
-HINTS
-${CMAKE_INSTALL_PREFIX}
-PATHS
-${ROCM_PATH})
+HIP REQUIRED CONFIG
+HINTS ${CMAKE_INSTALL_PREFIX}
+PATHS ${ROCM_PATH})

 find_library(NUMA NAME numa REQUIRED)
 link_libraries(${NUMA})

-find_program(ROCMINFO_EXEC NAMES "rocminfo"
-PATHS ${ROCM_PATH}
-${CMAKE_INSTALL_PREFIX} "/usr/local" "/usr"
-PATH_SUFFIXES bin)
+find_program(
+ROCMINFO_EXEC
+NAMES "rocminfo"
+PATHS ${ROCM_PATH} ${CMAKE_INSTALL_PREFIX} "/usr/local" "/usr"
+PATH_SUFFIXES bin)
 set(ORIGINAL_SCRIPT_PATH ${CMAKE_CURRENT_SOURCE_DIR}/bin/tblextr.py)
 set(OUTPUT_SCRIPT_PATH ${CMAKE_CURRENT_BINARY_DIR}/bin/tblextr.py)
 configure_file(${ORIGINAL_SCRIPT_PATH} ${OUTPUT_SCRIPT_PATH} @ONLY)

 get_property(
-HSA_RUNTIME_INCLUDE_DIRECTORIES
-TARGET hsa-runtime64::hsa-runtime64
-PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
+HSA_RUNTIME_INCLUDE_DIRECTORIES
+TARGET hsa-runtime64::hsa-runtime64
+PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
 find_file(
-HSA_H hsa.h
-PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
-PATH_SUFFIXES hsa
-NO_DEFAULT_PATH REQUIRED)
+HSA_H hsa.h
+PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
+PATH_SUFFIXES hsa
+NO_DEFAULT_PATH REQUIRED)
 get_filename_component(HSA_RUNTIME_INC_PATH ${HSA_H} DIRECTORY)
 include_directories(${HSA_RUNTIME_INC_PATH})

 if(NOT DEFINED LIBRARY_TYPE)
-set(LIBRARY_TYPE SHARED)
+set(LIBRARY_TYPE SHARED)
 endif()

 # Enable tracing API
 if(NOT USE_PROF_API)
-set(USE_PROF_API 1)
+set(USE_PROF_API 1)
 endif()

 # Protocol header lookup
 set(PROF_API_HEADER_NAME prof_protocol.h)
 if(USE_PROF_API EQUAL 1)
-find_path(
-PROF_API_HEADER_DIR ${PROF_API_HEADER_NAME}
-HINTS ${PROF_API_HEADER_PATH}
-PATHS /opt/rocm/include
-PATH_SUFFIXES roctracer/ext)
-if(NOT PROF_API_HEADER_DIR)
-message(
-FATAL_ERROR
-"Profiling API header not found. Tracer integration disabled. Use -DPROF_API_HEADER_PATH=<path to ${PROF_API_HEADER_NAME} header>"
-)
-else()
-include_directories(${PROF_API_HEADER_DIR})
-message(
-STATUS "Profiling API: ${PROF_API_HEADER_DIR}/${PROF_API_HEADER_NAME}")
-endif()
+find_path(
+PROF_API_HEADER_DIR ${PROF_API_HEADER_NAME}
+HINTS ${PROF_API_HEADER_PATH}
+PATHS /opt/rocm/include
+PATH_SUFFIXES roctracer/ext)
+if(NOT PROF_API_HEADER_DIR)
+message(
+FATAL_ERROR
+"Profiling API header not found. Tracer integration disabled. Use -DPROF_API_HEADER_PATH=<path to ${PROF_API_HEADER_NAME} header>"
+)
+else()
+include_directories(${PROF_API_HEADER_DIR})
+message(STATUS "Profiling API: ${PROF_API_HEADER_DIR}/${PROF_API_HEADER_NAME}")
+endif()
 endif()

 # Build libraries
 add_subdirectory(src)

 if(${LIBRARY_TYPE} STREQUAL SHARED)
-# Build samples
-add_subdirectory(samples)
+# Build samples
+add_subdirectory(samples)

-# Build tests
-add_subdirectory(tests-v2)
+# Build tests
+add_subdirectory(tests-v2)
 endif()

 # Build Plugins
@@ -188,20 +174,20 @@ add_subdirectory(${TEST_DIR} ${PROJECT_BINARY_DIR}/test)
 # Installation and packaging
 set(DEST_NAME ${ROCPROFILER_NAME})
 if(DEFINED CMAKE_INSTALL_PREFIX)
-get_filename_component(prefix_name ${CMAKE_INSTALL_PREFIX} NAME)
-get_filename_component(prefix_dir ${CMAKE_INSTALL_PREFIX} DIRECTORY)
-if(prefix_name STREQUAL ${DEST_NAME})
-set(CMAKE_INSTALL_PREFIX ${prefix_dir})
-endif()
+get_filename_component(prefix_name ${CMAKE_INSTALL_PREFIX} NAME)
+get_filename_component(prefix_dir ${CMAKE_INSTALL_PREFIX} DIRECTORY)
+if(prefix_name STREQUAL ${DEST_NAME})
+set(CMAKE_INSTALL_PREFIX ${prefix_dir})
+endif()
 endif()
 if(DEFINED CPACK_PACKAGING_INSTALL_PREFIX)
-get_filename_component(prefix_name ${CPACK_PACKAGING_INSTALL_PREFIX} NAME)
-get_filename_component(prefix_dir ${CPACK_PACKAGING_INSTALL_PREFIX} DIRECTORY)
-if(prefix_name STREQUAL ${DEST_NAME})
-set(CPACK_PACKAGING_INSTALL_PREFIX ${prefix_dir})
-endif()
+get_filename_component(prefix_name ${CPACK_PACKAGING_INSTALL_PREFIX} NAME)
+get_filename_component(prefix_dir ${CPACK_PACKAGING_INSTALL_PREFIX} DIRECTORY)
+if(prefix_name STREQUAL ${DEST_NAME})
+set(CPACK_PACKAGING_INSTALL_PREFIX ${prefix_dir})
+endif()
 else()
-set(CPACK_PACKAGING_INSTALL_PREFIX ${CMAKE_INSTALL_PREFIX})
+set(CPACK_PACKAGING_INSTALL_PREFIX ${CMAKE_INSTALL_PREFIX})
 endif()
 message("CMake-install-prefix: ${CMAKE_INSTALL_PREFIX}")
 message("CPack-install-prefix: ${CPACK_PACKAGING_INSTALL_PREFIX}")
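The two prefix blocks in the hunk above apply the same rule: if the last path component of the install prefix equals `DEST_NAME`, that component is dropped. A minimal shell sketch of the rule follows, using `basename` and `dirname` as stand-ins for `get_filename_component(... NAME)` and `get_filename_component(... DIRECTORY)`. The helper name `normalize_prefix` and the example value `rocprofiler` are illustrative assumptions, since this section does not show what `ROCPROFILER_NAME` expands to.

```shell
# Hypothetical helper mirroring the CMake prefix check; not part of the commit.
# $1: install prefix, $2: destination (package) name
normalize_prefix() {
    prefix_name=$(basename "$1")   # last path component
    prefix_dir=$(dirname "$1")     # everything before it
    if [ "$prefix_name" = "$2" ]; then
        # Prefix already ends in the package name: use the parent directory.
        printf '%s\n' "$prefix_dir"
    else
        printf '%s\n' "$1"
    fi
}

normalize_prefix /opt/rocm/rocprofiler rocprofiler
normalize_prefix /opt/rocm rocprofiler
```

Both calls resolve to the same directory, which is the point of the check: a prefix given with or without the trailing package directory installs to one location.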
@@ -209,413 +195,395 @@ message("-----------Dest-name: ${DEST_NAME}")

# Install headers
install(
FILES ${CMAKE_CURRENT_SOURCE_DIR}/src/core/activity.h
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}
COMPONENT dev)
FILES ${CMAKE_CURRENT_SOURCE_DIR}/src/core/activity.h
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}
COMPONENT dev)

# rpl_run.sh
install(
FILES ${CMAKE_CURRENT_SOURCE_DIR}/bin/rpl_run.sh
DESTINATION ${CMAKE_INSTALL_BINDIR}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ
WORLD_EXECUTE
RENAME rocprof
COMPONENT runtime)
FILES ${CMAKE_CURRENT_SOURCE_DIR}/bin/rpl_run.sh
DESTINATION ${CMAKE_INSTALL_BINDIR}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE
RENAME rocprof
COMPONENT runtime)

configure_file(bin/rocprofv2 ${PROJECT_BINARY_DIR} COPYONLY)
install(
FILES ${PROJECT_SOURCE_DIR}/bin/rocprofv2
DESTINATION ${CMAKE_INSTALL_BINDIR}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ
WORLD_EXECUTE
COMPONENT runtime)
FILES ${PROJECT_SOURCE_DIR}/bin/rocprofv2
DESTINATION ${CMAKE_INSTALL_BINDIR}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE
COMPONENT runtime)

install(
FILES ${CMAKE_CURRENT_SOURCE_DIR}/bin/txt2xml.sh
${CMAKE_CURRENT_SOURCE_DIR}/bin/merge_traces.sh
${CMAKE_CURRENT_SOURCE_DIR}/bin/txt2params.py
${CMAKE_CURRENT_BINARY_DIR}/bin/tblextr.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/dform.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/mem_manager.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/sqlitedb.py
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/${ROCPROFILER_NAME}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ
WORLD_EXECUTE
COMPONENT runtime)
FILES ${CMAKE_CURRENT_SOURCE_DIR}/bin/txt2xml.sh
${CMAKE_CURRENT_SOURCE_DIR}/bin/merge_traces.sh
${CMAKE_CURRENT_SOURCE_DIR}/bin/txt2params.py
${CMAKE_CURRENT_BINARY_DIR}/bin/tblextr.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/dform.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/mem_manager.py
${CMAKE_CURRENT_SOURCE_DIR}/bin/sqlitedb.py
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/${ROCPROFILER_NAME}
PERMISSIONS OWNER_READ OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE
COMPONENT runtime)

# gfx_metrics.xml metrics.xml
install(
FILES ${CMAKE_CURRENT_SOURCE_DIR}/test/tool/metrics.xml
${CMAKE_CURRENT_SOURCE_DIR}/test/tool/gfx_metrics.xml
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT runtime)
FILES ${CMAKE_CURRENT_SOURCE_DIR}/test/tool/metrics.xml
${CMAKE_CURRENT_SOURCE_DIR}/test/tool/gfx_metrics.xml
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT runtime)

# librocprof-tool.so
install(
FILES ${PROJECT_BINARY_DIR}/test/librocprof-tool.so
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT runtime)
FILES ${PROJECT_BINARY_DIR}/test/librocprof-tool.so
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT runtime)

install(
FILES ${PROJECT_BINARY_DIR}/test/librocprof-tool.so
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT asan)
FILES ${PROJECT_BINARY_DIR}/test/librocprof-tool.so
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
COMPONENT asan)

install(
FILES ${PROJECT_BINARY_DIR}/test/rocprof-ctrl
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
PERMISSIONS
OWNER_READ
OWNER_WRITE
OWNER_EXECUTE
GROUP_READ
GROUP_EXECUTE
WORLD_READ
WORLD_EXECUTE
COMPONENT runtime)
FILES ${PROJECT_BINARY_DIR}/test/rocprof-ctrl
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}
PERMISSIONS OWNER_READ OWNER_WRITE OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ
WORLD_EXECUTE
COMPONENT runtime)

# File reorg backward compatibility for non ASAN packaging
if ( NOT ENABLE_ASAN_PACKAGING )
# File reorg Backward compatibility
option(FILE_REORG_BACKWARD_COMPATIBILITY
"Enable File Reorg with backward compatibility" ON)
if(NOT ENABLE_ASAN_PACKAGING)
# File reorg Backward compatibility
option(FILE_REORG_BACKWARD_COMPATIBILITY
"Enable File Reorg with backward compatibility" ON)
endif()

if(FILE_REORG_BACKWARD_COMPATIBILITY)
# To enabe/disable #error in wrapper header files
if(NOT DEFINED ROCM_HEADER_WRAPPER_WERROR)
if(DEFINED ENV{ROCM_HEADER_WRAPPER_WERROR})
set(ROCM_HEADER_WRAPPER_WERROR "$ENV{ROCM_HEADER_WRAPPER_WERROR}"
CACHE STRING "Header wrapper warnings as errors.")
else()
set(ROCM_HEADER_WRAPPER_WERROR "OFF" CACHE STRING "Header wrapper warnings as errors.")
# To enabe/disable #error in wrapper header files
if(NOT DEFINED ROCM_HEADER_WRAPPER_WERROR)
if(DEFINED ENV{ROCM_HEADER_WRAPPER_WERROR})
set(ROCM_HEADER_WRAPPER_WERROR
"$ENV{ROCM_HEADER_WRAPPER_WERROR}"
CACHE STRING "Header wrapper warnings as errors.")
else()
set(ROCM_HEADER_WRAPPER_WERROR
"OFF"
CACHE STRING "Header wrapper warnings as errors.")
endif()
endif()
endif()

if(ROCM_HEADER_WRAPPER_WERROR)
set(deprecated_error 1)
else()
set(deprecated_error 0)
endif()
include(rocprofiler-backward-compat.cmake)
endif() #FILE_REORG_BACKWARD_COMPATIBILITY
if(ROCM_HEADER_WRAPPER_WERROR)
set(deprecated_error 1)
else()
set(deprecated_error 0)
endif()
include(rocprofiler-backward-compat.cmake)
endif() # FILE_REORG_BACKWARD_COMPATIBILITY

if(${LIBRARY_TYPE} STREQUAL SHARED)
# Packaging directives
set(CPACK_GENERATOR "DEB" "RPM" "TGZ")
set(ENABLE_LDCONFIG
ON
CACHE BOOL "Set library links and caches using ldconfig.")
set(CPACK_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_PACKAGE_VENDOR "Advanced Micro Devices, Inc.")
set(CPACK_PACKAGE_VERSION_MAJOR ${PROJECT_VERSION_MAJOR})
set(CPACK_PACKAGE_VERSION_MINOR ${PROJECT_VERSION_MINOR})
set(CPACK_PACKAGE_VERSION_PATCH ${PROJECT_VERSION_PATCH})
set(CPACK_PACKAGE_VERSION
"${CPACK_PACKAGE_VERSION_MAJOR}.${CPACK_PACKAGE_VERSION_MINOR}.${CPACK_PACKAGE_VERSION_PATCH}"
)
set(CPACK_PACKAGE_CONTACT
"ROCm Profiler Support <dl.ROCm-Profiler.support@amd.com>")
set(CPACK_PACKAGE_DESCRIPTION_SUMMARY
"ROCPROFILER library for AMD HSA runtime API extension support")
set(CPACK_RESOURCE_FILE_LICENSE "${CMAKE_CURRENT_SOURCE_DIR}/LICENSE")

if(DEFINED ENV{ROCM_LIBPATCH_VERSION})
# Packaging directives
set(CPACK_GENERATOR "DEB" "RPM" "TGZ")
set(ENABLE_LDCONFIG
ON
CACHE BOOL "Set library links and caches using ldconfig.")
set(CPACK_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_PACKAGE_VENDOR "Advanced Micro Devices, Inc.")
set(CPACK_PACKAGE_VERSION_MAJOR ${PROJECT_VERSION_MAJOR})
set(CPACK_PACKAGE_VERSION_MINOR ${PROJECT_VERSION_MINOR})
set(CPACK_PACKAGE_VERSION_PATCH ${PROJECT_VERSION_PATCH})
set(CPACK_PACKAGE_VERSION
"${CPACK_PACKAGE_VERSION}.$ENV{ROCM_LIBPATCH_VERSION}")
message("Using CPACK_PACKAGE_VERSION ${CPACK_PACKAGE_VERSION}")
endif()
"${CPACK_PACKAGE_VERSION_MAJOR}.${CPACK_PACKAGE_VERSION_MINOR}.${CPACK_PACKAGE_VERSION_PATCH}"
)
set(CPACK_PACKAGE_CONTACT "ROCm Profiler Support <dl.ROCm-Profiler.support@amd.com>")
set(CPACK_PACKAGE_DESCRIPTION_SUMMARY
"ROCPROFILER library for AMD HSA runtime API extension support")
set(CPACK_RESOURCE_FILE_LICENSE "${CMAKE_CURRENT_SOURCE_DIR}/LICENSE")

if(DEFINED ENV{ROCM_LIBPATCH_VERSION})
set(CPACK_PACKAGE_VERSION "${CPACK_PACKAGE_VERSION}.$ENV{ROCM_LIBPATCH_VERSION}")
message("Using CPACK_PACKAGE_VERSION ${CPACK_PACKAGE_VERSION}")
endif()

# Debian package specific variable for ASAN
set(CPACK_DEBIAN_ASAN_PACKAGE_NAME "${ROCPROFILER_NAME}-asan")
set(CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS "hsa-rocr-asan, rocm-core-asan")

# Debian package specific variable for ASAN
set ( CPACK_DEBIAN_ASAN_PACKAGE_NAME "${ROCPROFILER_NAME}-asan" )
set ( CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS "hsa-rocr-asan, rocm-core-asan" )
# Install license file
install(
FILES ${CPACK_RESOURCE_FILE_LICENSE}
DESTINATION ${CMAKE_INSTALL_DOCDIR}
COMPONENT runtime)
install(
FILES ${CPACK_RESOURCE_FILE_LICENSE}
DESTINATION ${CMAKE_INSTALL_DOCDIR}-asan
COMPONENT asan)

# Install license file
install(
FILES ${CPACK_RESOURCE_FILE_LICENSE}
DESTINATION ${CMAKE_INSTALL_DOCDIR}
COMPONENT runtime)
install(
FILES ${CPACK_RESOURCE_FILE_LICENSE}
DESTINATION ${CMAKE_INSTALL_DOCDIR}-asan
COMPONENT asan)
# Debian package specific variables
if(DEFINED ENV{CPACK_DEBIAN_PACKAGE_RELEASE})
set(CPACK_DEBIAN_PACKAGE_RELEASE $ENV{CPACK_DEBIAN_PACKAGE_RELEASE})
else()
set(CPACK_DEBIAN_PACKAGE_RELEASE "local")
endif()

# Debian package specific variables
if(DEFINED ENV{CPACK_DEBIAN_PACKAGE_RELEASE})
set(CPACK_DEBIAN_PACKAGE_RELEASE $ENV{CPACK_DEBIAN_PACKAGE_RELEASE})
else()
set(CPACK_DEBIAN_PACKAGE_RELEASE "local")
endif()
message("Using CPACK_DEBIAN_PACKAGE_RELEASE ${CPACK_DEBIAN_PACKAGE_RELEASE}")
set(CPACK_DEB_COMPONENT_INSTALL ON)
set(CPACK_DEBIAN_FILE_NAME "DEB-DEFAULT")
set(CPACK_DEBIAN_RUNTIME_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS
"hsa-rocr-dev, rocm-core, libsystemd-dev, libelf-dev, libnuma-dev, libpciaccess-dev, libxml2-dev"
)
set(CPACK_DEBIAN_DEV_PACKAGE_NAME "${PROJECT_NAME}-dev")
set(CPACK_DEBIAN_DEV_PACKAGE_DEPENDS "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_TESTS_PACKAGE_NAME "${PROJECT_NAME}-tests")
set(CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS "${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_SAMPLES_PACKAGE_NAME "${PROJECT_NAME}-samples")
set(CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS
"${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_DOCS_PACKAGE_NAME "${PROJECT_NAME}-docs")
set(CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS "${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_PLUGINS_PACKAGE_NAME "${PROJECT_NAME}-plugins")
set(CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")

message("Using CPACK_DEBIAN_PACKAGE_RELEASE ${CPACK_DEBIAN_PACKAGE_RELEASE}")
set(CPACK_DEB_COMPONENT_INSTALL ON)
set(CPACK_DEBIAN_FILE_NAME "DEB-DEFAULT")
set(CPACK_DEBIAN_RUNTIME_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS "hsa-rocr-dev, rocm-core, libsystemd-dev, libelf-dev, libnuma-dev, libpciaccess-dev, libxml2-dev")
set(CPACK_DEBIAN_DEV_PACKAGE_NAME "${PROJECT_NAME}-dev")
set(CPACK_DEBIAN_DEV_PACKAGE_DEPENDS
"${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_TESTS_PACKAGE_NAME "${PROJECT_NAME}-tests")
set(CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS
"${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_SAMPLES_PACKAGE_NAME "${PROJECT_NAME}-samples")
set(CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS
"${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_DOCS_PACKAGE_NAME "${PROJECT_NAME}-docs")
set(CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS
"${PROJECT_NAME}-dev, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_PLUGINS_PACKAGE_NAME "${PROJECT_NAME}-plugins")
set(CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS
"${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_DEBIAN_CHANGELOG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md")

set ( CPACK_DEBIAN_CHANGELOG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md" )
# RPM package specific variables
if(DEFINED ENV{CPACK_RPM_PACKAGE_RELEASE})
set(CPACK_RPM_PACKAGE_RELEASE $ENV{CPACK_RPM_PACKAGE_RELEASE})
else()
set(CPACK_RPM_PACKAGE_RELEASE "local")
endif()

# RPM package specific variables
if(DEFINED ENV{CPACK_RPM_PACKAGE_RELEASE})
set(CPACK_RPM_PACKAGE_RELEASE $ENV{CPACK_RPM_PACKAGE_RELEASE})
else()
set(CPACK_RPM_PACKAGE_RELEASE "local")
endif()
message("Using CPACK_RPM_PACKAGE_RELEASE ${CPACK_RPM_PACKAGE_RELEASE}")

message("Using CPACK_RPM_PACKAGE_RELEASE ${CPACK_RPM_PACKAGE_RELEASE}")
set(CPACK_RPM_PACKAGE_LICENSE "MIT")

set(CPACK_RPM_PACKAGE_LICENSE "MIT")
# 'dist' breaks manual builds on debian systems due to empty Provides
execute_process(
COMMAND rpm --eval %{?dist}
RESULT_VARIABLE PROC_RESULT
OUTPUT_VARIABLE EVAL_RESULT
OUTPUT_STRIP_TRAILING_WHITESPACE)
message("RESULT_VARIABLE ${PROC_RESULT} OUTPUT_VARIABLE: ${EVAL_RESULT}")

# 'dist' breaks manual builds on debian systems due to empty Provides
execute_process(
COMMAND rpm --eval %{?dist}
RESULT_VARIABLE PROC_RESULT
OUTPUT_VARIABLE EVAL_RESULT
OUTPUT_STRIP_TRAILING_WHITESPACE)
message("RESULT_VARIABLE ${PROC_RESULT} OUTPUT_VARIABLE: ${EVAL_RESULT}")
if(PROC_RESULT EQUAL "0" AND NOT EVAL_RESULT STREQUAL "")
string(APPEND CPACK_RPM_PACKAGE_RELEASE "%{?dist}")
endif()

if(PROC_RESULT EQUAL "0" AND NOT EVAL_RESULT STREQUAL "")
string(APPEND CPACK_RPM_PACKAGE_RELEASE "%{?dist}")
endif()
set(CPACK_RPM_COMPONENT_INSTALL ON)
set(CPACK_RPM_FILE_NAME "RPM-DEFAULT")
set(CPACK_RPM_RUNTIME_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_RPM_RUNTIME_PACKAGE_REQUIRES
"hsa-rocr-dev, rocm-core, systemd-devel, libpciaccess-devel, libxml2-devel")
set(CPACK_RPM_DEV_PACKAGE_NAME "${PROJECT_NAME}-devel")
set(CPACK_RPM_DEV_PACKAGE_REQUIRES "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_DEV_PACKAGE_PROVIDES "${PROJECT_NAME}-dev")
set(CPACK_RPM_DEV_PACKAGE_OBSOLETES "${PROJECT_NAME}-dev")
set(CPACK_RPM_TESTS_PACKAGE_NAME "${PROJECT_NAME}-tests")
set(CPACK_RPM_TESTS_PACKAGE_REQUIRES "${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_DOCS_PACKAGE_NAME "${PROJECT_NAME}-docs")
set(CPACK_RPM_DOCS_PACKAGE_REQUIRES "${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_PLUGINS_PACKAGE_NAME "${PROJECT_NAME}-plugins")
set(CPACK_RPM_PLUGINS_PACKAGE_REQUIRES "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_PACKAGE_AUTOREQ 0)
set(CPACK_RPM_SAMPLES_PACKAGE_NAME "${PROJECT_NAME}-samples")
set(CPACK_RPM_SAMPLES_PACKAGE_REQUIRES
"${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core, hip-runtime-amd")
message("CPACK_RPM_PACKAGE_RELEASE: ${CPACK_RPM_PACKAGE_RELEASE}")

set(CPACK_RPM_COMPONENT_INSTALL ON)
set(CPACK_RPM_FILE_NAME "RPM-DEFAULT")
set(CPACK_RPM_RUNTIME_PACKAGE_NAME "${PROJECT_NAME}")
set(CPACK_RPM_RUNTIME_PACKAGE_REQUIRES "hsa-rocr-dev, rocm-core, systemd-devel, libpciaccess-devel, libxml2-devel")
set(CPACK_RPM_DEV_PACKAGE_NAME "${PROJECT_NAME}-devel")
set(CPACK_RPM_DEV_PACKAGE_REQUIRES "${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_DEV_PACKAGE_PROVIDES "${PROJECT_NAME}-dev")
set(CPACK_RPM_DEV_PACKAGE_OBSOLETES "${PROJECT_NAME}-dev")
set(CPACK_RPM_TESTS_PACKAGE_NAME "${PROJECT_NAME}-tests")
set(CPACK_RPM_TESTS_PACKAGE_REQUIRES
"${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_DOCS_PACKAGE_NAME "${PROJECT_NAME}-docs")
set(CPACK_RPM_DOCS_PACKAGE_REQUIRES
"${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_PLUGINS_PACKAGE_NAME "${PROJECT_NAME}-plugins")
set(CPACK_RPM_PLUGINS_PACKAGE_REQUIRES
"${PROJECT_NAME}, hsa-rocr-dev, rocm-core")
set(CPACK_RPM_PACKAGE_AUTOREQ 0)
set(CPACK_RPM_SAMPLES_PACKAGE_NAME "${PROJECT_NAME}-samples")
set(CPACK_RPM_SAMPLES_PACKAGE_REQUIRES
"${PROJECT_NAME}-devel, hsa-rocr-dev, rocm-core, hip-runtime-amd")
message("CPACK_RPM_PACKAGE_RELEASE: ${CPACK_RPM_PACKAGE_RELEASE}")

#Disable build id for rocprofiler as its creating transaction error
set ( CPACK_RPM_SPEC_MORE_DEFINE "%define _build_id_links none
# Disable build id for rocprofiler as its creating transaction error
set(CPACK_RPM_SPEC_MORE_DEFINE
"%define _build_id_links none
%global __strip ${CPACK_STRIP_EXECUTABLE}
%global __objdump ${CPACK_OBJDUMP_EXECUTABLE}
%global __objcopy ${CPACK_OBJCOPY_EXECUTABLE}
%global __readelf ${CPACK_READELF_EXECUTABLE}")

# RPM package specific variable for ASAN
set ( CPACK_RPM_ASAN_PACKAGE_NAME "${ROCPROFILER_NAME}-asan" )
set ( CPACK_RPM_ASAN_PACKAGE_REQUIRES "hsa-rocr-asan, rocm-core-asan" )

#set ( CPACK_RPM_CHANGELOG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md" )
# RPM package specific variable for ASAN
set(CPACK_RPM_ASAN_PACKAGE_NAME "${ROCPROFILER_NAME}-asan")
set(CPACK_RPM_ASAN_PACKAGE_REQUIRES "hsa-rocr-asan, rocm-core-asan")

# Remove dependency on rocm-core if -DROCM_DEP_ROCMCORE=ON not given to cmake
if(NOT ROCM_DEP_ROCMCORE)
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_RUNTIME_PACKAGE_REQUIRES
${CPACK_RPM_RUNTIME_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DEV_PACKAGE_REQUIRES
${CPACK_RPM_DEV_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_RPM_ASAN_PACKAGE_REQUIRES
${CPACK_RPM_ASAN_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_TESTS_PACKAGE_REQUIRES
${CPACK_RPM_TESTS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_SAMPLES_PACKAGE_REQUIRES
${CPACK_RPM_SAMPLES_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DOCS_PACKAGE_REQUIRES
${CPACK_RPM_DOCS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_PLUGINS_PACKAGE_REQUIRES
${CPACK_RPM_PLUGINS_PACKAGE_REQUIRES})
string(REGEX
REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS
${CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DEV_PACKAGE_DEPENDS
${CPACK_DEBIAN_DEV_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS
${CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS
${CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS})
string(REGEX
REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS
${CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS
${CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS})
string(REGEX
REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS
${CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS})
endif()
# set ( CPACK_RPM_CHANGELOG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md" )

## set components
if(ENABLE_ASAN_PACKAGING)
# ASAN Package requires only asan component with libraries and license file
set(CPACK_COMPONENTS_ALL asan)
else()
set(CPACK_COMPONENTS_ALL runtime dev tests docs plugins samples)
endif()
# Remove dependency on rocm-core if -DROCM_DEP_ROCMCORE=ON not given to cmake
if(NOT ROCM_DEP_ROCMCORE)
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_RUNTIME_PACKAGE_REQUIRES
${CPACK_RPM_RUNTIME_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DEV_PACKAGE_REQUIRES
${CPACK_RPM_DEV_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_RPM_ASAN_PACKAGE_REQUIRES
${CPACK_RPM_ASAN_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_TESTS_PACKAGE_REQUIRES
${CPACK_RPM_TESTS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_SAMPLES_PACKAGE_REQUIRES
${CPACK_RPM_SAMPLES_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_DOCS_PACKAGE_REQUIRES
${CPACK_RPM_DOCS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_RPM_PLUGINS_PACKAGE_REQUIRES
${CPACK_RPM_PLUGINS_PACKAGE_REQUIRES})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS
${CPACK_DEBIAN_RUNTIME_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DEV_PACKAGE_DEPENDS
${CPACK_DEBIAN_DEV_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core-asan" "" CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS
${CPACK_DEBIAN_ASAN_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS
${CPACK_DEBIAN_TESTS_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS
${CPACK_DEBIAN_SAMPLES_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS
${CPACK_DEBIAN_DOCS_PACKAGE_DEPENDS})
string(REGEX REPLACE ",? ?rocm-core" "" CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS
${CPACK_DEBIAN_PLUGINS_PACKAGE_DEPENDS})
endif()

include(CPack)
# set components
if(ENABLE_ASAN_PACKAGING)
# ASAN Package requires only asan component with libraries and license file
set(CPACK_COMPONENTS_ALL asan)
else()
set(CPACK_COMPONENTS_ALL runtime dev tests docs plugins samples)
endif()

cpack_add_component(
runtime
DISPLAY_NAME "Runtime"
DESCRIPTION "Dynamic libraries for the ROCProfiler")
include(CPack)

cpack_add_component(
dev
DISPLAY_NAME "Development"
DESCRIPTION "Development needed header files for ROCProfiler"
DEPENDS runtime)
cpack_add_component(
runtime
DISPLAY_NAME "Runtime"
DESCRIPTION "Dynamic libraries for the ROCProfiler")

cpack_add_component(
plugins
DISPLAY_NAME "ROCProfile Plugins"
DESCRIPTION "Plugins for handling ROCProfiler data output"
DEPENDS runtime)
cpack_add_component(
dev
DISPLAY_NAME "Development"
DESCRIPTION "Development needed header files for ROCProfiler"
DEPENDS runtime)

cpack_add_component(
tests
DISPLAY_NAME "Tests"
DESCRIPTION "Tests for the ROCProfiler"
DEPENDS dev)
cpack_add_component(
plugins
DISPLAY_NAME "ROCProfile Plugins"
DESCRIPTION "Plugins for handling ROCProfiler data output"
DEPENDS runtime)

cpack_add_component(
samples
DISPLAY_NAME "Samples"
DESCRIPTION "Samples for the ROCProfiler"
DEPENDS dev)
cpack_add_component(
tests
DISPLAY_NAME "Tests"
DESCRIPTION "Tests for the ROCProfiler"
DEPENDS dev)

cpack_add_component(
docs
DISPLAY_NAME "Documentation"
DESCRIPTION "Documentation for the ROCProfiler API"
DEPENDS dev)
cpack_add_component(
samples
DISPLAY_NAME "Samples"
DESCRIPTION "Samples for the ROCProfiler"
DEPENDS dev)

cpack_add_component(
asan
DISPLAY_NAME "ASAN"
DESCRIPTION "ASAN libraries for the ROCPROFILER"
DEPENDS asan)
cpack_add_component(
docs
DISPLAY_NAME "Documentation"
DESCRIPTION "Documentation for the ROCProfiler API"
DEPENDS dev)

cpack_add_component(
asan
DISPLAY_NAME "ASAN"
DESCRIPTION "ASAN libraries for the ROCPROFILER"
DEPENDS asan)
endif()

find_package(Doxygen)

if(DOXYGEN_FOUND)
# # Set input and output files for API Document
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_API.in)
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/Doxyfile_API)
# # Set input and output files for API Document
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_API.in)
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/Doxyfile_API)

# # Request to configure the file
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
# # Request to configure the file
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)

add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/doc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/doc/latex pdf
MAIN_DEPENDENCY ${DOXYGEN_OUT}
${DOXYGEN_IN}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler_plugin.h
${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler.h
COMMENT "Generating API documentation")
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/doc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/doc/latex pdf
MAIN_DEPENDENCY ${DOXYGEN_OUT}
${DOXYGEN_IN}
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler_plugin.h
${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/v2/rocprofiler.h
COMMENT "Generating API documentation")

add_custom_target(
doc DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/doc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf)
add_custom_target(doc DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/doc/html/index.html
${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf)

install(
FILES "${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf"
DESTINATION ${CMAKE_INSTALL_DOCDIR}
RENAME "${PROJECT_NAME}v2_api_spec.pdf"
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
install(
|
||||
FILES "${CMAKE_CURRENT_BINARY_DIR}/doc/latex/refman.pdf"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}
|
||||
RENAME "${PROJECT_NAME}v2_api_spec.pdf"
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
|
||||
install(
|
||||
DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/doc/html/"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}/html
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
install(
|
||||
DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/doc/html/"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}/html
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
|
||||
# # Set input and output files for Tools Document
|
||||
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_Tool.in)
|
||||
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/Doxyfile_Tool)
|
||||
# # Set input and output files for Tools Document
|
||||
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_Tool.in)
|
||||
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/Doxyfile_Tool)
|
||||
|
||||
# # Request to configure the file
|
||||
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
|
||||
# # Request to configure the file
|
||||
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
|
||||
|
||||
add_custom_command(
|
||||
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/index.html
|
||||
${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf
|
||||
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
|
||||
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex pdf
|
||||
MAIN_DEPENDENCY ${DOXYGEN_OUT}
|
||||
${DOXYGEN_IN}
|
||||
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/doc/rocprofv2_tool.md
|
||||
COMMENT "Generating Tools documentation")
|
||||
add_custom_command(
|
||||
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/index.html
|
||||
${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf
|
||||
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
|
||||
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex pdf
|
||||
MAIN_DEPENDENCY ${DOXYGEN_OUT}
|
||||
${DOXYGEN_IN}
|
||||
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/doc/rocprofv2_tool.md
|
||||
COMMENT "Generating Tools documentation")
|
||||
|
||||
add_custom_target(
|
||||
doc_tool DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/index.html
|
||||
${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf)
|
||||
add_custom_target(
|
||||
doc_tool DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/index.html
|
||||
${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf)
|
||||
|
||||
install(
|
||||
FILES "${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}
|
||||
RENAME "${PROJECT_NAME}v2_tool.pdf"
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
install(
|
||||
FILES "${CMAKE_CURRENT_BINARY_DIR}/tooldoc/latex/refman.pdf"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}
|
||||
RENAME "${PROJECT_NAME}v2_tool.pdf"
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
|
||||
install(
|
||||
DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}/html
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
install(
|
||||
DIRECTORY "${CMAKE_CURRENT_BINARY_DIR}/tooldoc/html/"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}/html
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
|
||||
# # Set input and output files for changelog document
|
||||
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_ChangeLog.in)
|
||||
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/Doxyfile_ChangeLog)
|
||||
# # Set input and output files for changelog document
|
||||
set(DOXYGEN_IN ${CMAKE_CURRENT_SOURCE_DIR}/doc/Doxyfile_ChangeLog.in)
|
||||
set(DOXYGEN_OUT ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/Doxyfile_ChangeLog)
|
||||
|
||||
# # Request to configure the file
|
||||
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
|
||||
# # Request to configure the file
|
||||
configure_file(${DOXYGEN_IN} ${DOXYGEN_OUT} @ONLY)
|
||||
|
||||
add_custom_command(
|
||||
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf
|
||||
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
|
||||
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex pdf
|
||||
MAIN_DEPENDENCY ${DOXYGEN_OUT}
|
||||
${DOXYGEN_IN}
|
||||
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md
|
||||
COMMENT "Generating changelog documentation")
|
||||
add_custom_command(
|
||||
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf
|
||||
COMMAND ${DOXYGEN_EXECUTABLE} ${DOXYGEN_OUT}
|
||||
COMMAND make -C ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex pdf
|
||||
MAIN_DEPENDENCY ${DOXYGEN_OUT}
|
||||
${DOXYGEN_IN}
|
||||
DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/CHANGELOG.md
|
||||
COMMENT "Generating changelog documentation")
|
||||
|
||||
add_custom_target(
|
||||
doc_changelog DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf)
|
||||
add_custom_target(doc_changelog
|
||||
DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf)
|
||||
|
||||
install(
|
||||
FILES "${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}
|
||||
RENAME "${PROJECT_NAME}_ChangeLog.pdf"
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
install(
|
||||
FILES "${CMAKE_CURRENT_BINARY_DIR}/doc/changelog/latex/refman.pdf"
|
||||
DESTINATION ${CMAKE_INSTALL_DOCDIR}
|
||||
RENAME "${PROJECT_NAME}_ChangeLog.pdf"
|
||||
OPTIONAL
|
||||
COMPONENT docs)
|
||||
|
||||
add_dependencies(doc doc_changelog)
|
||||
add_dependencies(doc doc_changelog)
|
||||
endif()
|
||||
|
||||
|
||||
@@ -5,23 +5,16 @@
 # - LIBDW_INCLUDE_DIRS - the libelf include directory
 # - LIBDW_LIBRARIES - Link these to use libelf
 # - LIBDW_DEFINITIONS - Compiler switches required for using libelf
-find_path(FIND_LIBDW_INCLUDES
-    NAMES
-    elfutils/libdw.h
-    PATHS
-    /usr/include
-    /usr/local/include)
+find_path(
+    FIND_LIBDW_INCLUDES
+    NAMES elfutils/libdw.h
+    PATHS /usr/include /usr/local/include)
 
-find_library(FIND_LIBDW_LIBRARIES
-    NAMES
-    dw
-    PATH
-    /usr/lib
-    /usr/local/lib)
+find_library(FIND_LIBDW_LIBRARIES NAMES dw PATH /usr/lib /usr/local/lib)
 
 include(FindPackageHandleStandardArgs)
-find_package_handle_standard_args(LibDw DEFAULT_MSG
-    FIND_LIBDW_INCLUDES FIND_LIBDW_LIBRARIES)
+find_package_handle_standard_args(LibDw DEFAULT_MSG FIND_LIBDW_INCLUDES
+    FIND_LIBDW_LIBRARIES)
 mark_as_advanced(FIND_LIBDW_INCLUDES FIND_LIBDW_LIBRARIES)
 
 set(LIBDW_INCLUDES ${FIND_LIBDW_INCLUDES})

@@ -5,25 +5,16 @@
 # - LIBELF_INCLUDE_DIRS - the libelf include directory
 # - LIBELF_LIBRARIES - Link these to use libelf
 # - LIBELF_DEFINITIONS - Compiler switches required for using libelf
-find_path(FIND_LIBELF_INCLUDES
-    NAMES
-    libelf.h
-    PATHS
-    /usr/include
-    /usr/include/libelf
-    /usr/local/include
-    /usr/local/include/libelf)
+find_path(
+    FIND_LIBELF_INCLUDES
+    NAMES libelf.h
+    PATHS /usr/include /usr/include/libelf /usr/local/include /usr/local/include/libelf)
 
-find_library(FIND_LIBELF_LIBRARIES
-    NAMES
-    elf
-    PATH
-    /usr/lib
-    /usr/local/lib)
+find_library(FIND_LIBELF_LIBRARIES NAMES elf PATH /usr/lib /usr/local/lib)
 
 include(FindPackageHandleStandardArgs)
-find_package_handle_standard_args(LibElf DEFAULT_MSG
-    FIND_LIBELF_INCLUDES FIND_LIBELF_LIBRARIES)
+find_package_handle_standard_args(LibElf DEFAULT_MSG FIND_LIBELF_INCLUDES
+    FIND_LIBELF_LIBRARIES)
 mark_as_advanced(FIND_LIBELF_INCLUDES FIND_LIBELF_LIBRARIES)
 
 set(LIBELF_INCLUDES ${FIND_LIBELF_INCLUDES})

@@ -20,60 +20,75 @@
 # THE SOFTWARE.
 ################################################################################
 
-## Linux Compiler options
+# Linux Compiler options
 set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fms-extensions")
 
-add_definitions ( -DNEW_TRACE_API=1 )
+add_definitions(-DNEW_TRACE_API=1)
 
-## CLANG options
+# CLANG options
 if("$ENV{CXX}" STREQUAL "/usr/bin/clang++")
-  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -ferror-limit=1000000")
+    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -ferror-limit=1000000")
 endif()
 
-## Enable debug trace
-if ( DEFINED ENV{CMAKE_DEBUG_TRACE} )
-  add_definitions ( -DDEBUG_TRACE=1 )
+# Enable debug trace
+if(DEFINED ENV{CMAKE_DEBUG_TRACE})
+    add_definitions(-DDEBUG_TRACE=1)
 endif()
 
-## Enable AQL-profile new API
-if ( NOT DEFINED ENV{CMAKE_CURR_API} )
-  add_definitions ( -DAQLPROF_NEW_API=1 )
+# Enable AQL-profile new API
+if(NOT DEFINED ENV{CMAKE_CURR_API})
+    add_definitions(-DAQLPROF_NEW_API=1)
 endif()
 
-## Enable direct loading of AQL-profile HSA extension
-if ( DEFINED ENV{CMAKE_LD_AQLPROFILE} )
-  add_definitions ( -DROCP_LD_AQLPROFILE=1 )
+# Enable direct loading of AQL-profile HSA extension
+if(DEFINED ENV{CMAKE_LD_AQLPROFILE})
+    add_definitions(-DROCP_LD_AQLPROFILE=1)
 endif()
 
-## Find hsa-runtime
-find_package(hsa-runtime64 CONFIG REQUIRED HINTS ${CMAKE_PREFIX_PATH} PATHS /opt/rocm PATH_SUFFIXES lib/cmake/hsa-runtime64 )
+# Find hsa-runtime
+find_package(
+    hsa-runtime64 CONFIG REQUIRED
+    HINTS ${CMAKE_PREFIX_PATH}
+    PATHS /opt/rocm
+    PATH_SUFFIXES lib/cmake/hsa-runtime64)
 
 # find KFD thunk
-find_package(hsakmt CONFIG REQUIRED HINTS ${CMAKE_PREFIX_PATH} PATHS /opt/rocm PATH_SUFFIXES lib/cmake/hsakmt )
+find_package(
+    hsakmt CONFIG REQUIRED
+    HINTS ${CMAKE_PREFIX_PATH}
+    PATHS /opt/rocm
+    PATH_SUFFIXES lib/cmake/hsakmt)
 
-## Find ROCm
-## TODO: Need a better method to find the ROCm path
-find_path ( HSA_KMT_INC_PATH "hsakmt/hsakmt.h" HINTS ${CMAKE_PREFIX_PATH} PATHS /opt/rocm PATH_SUFFIXES include )
-if ( "${HSA_KMT_INC_PATH}" STREQUAL "" )
-  get_target_property(HSA_KMT_INC_PATH hsakmt::hsakmt INTERFACE_INCLUDE_DIRECTORIES)
+# Find ROCm TODO: Need a better method to find the ROCm path
+find_path(
+    HSA_KMT_INC_PATH "hsakmt/hsakmt.h"
+    HINTS ${CMAKE_PREFIX_PATH}
+    PATHS /opt/rocm
+    PATH_SUFFIXES include)
+if("${HSA_KMT_INC_PATH}" STREQUAL "")
+    get_target_property(HSA_KMT_INC_PATH hsakmt::hsakmt INTERFACE_INCLUDE_DIRECTORIES)
 endif()
-## Include path: /opt/rocm-ver/include. Go up one level to get ROCm path
-get_filename_component ( ROCM_ROOT_DIR "${HSA_KMT_INC_PATH}" DIRECTORY )
+# Include path: /opt/rocm-ver/include. Go up one level to get ROCm path
+get_filename_component(ROCM_ROOT_DIR "${HSA_KMT_INC_PATH}" DIRECTORY)
 
-## Basic Tool Chain Information
-message ( "----------Build-Type: ${CMAKE_BUILD_TYPE}" )
-message ( "------------Compiler: ${CMAKE_CXX_COMPILER}" )
-message ( "----Compiler-Version: ${CMAKE_CXX_COMPILER_VERSION}" )
-message ( "-------ROCM_ROOT_DIR: ${ROCM_ROOT_DIR}" )
-message ( "-----CMAKE_CXX_FLAGS: ${CMAKE_CXX_FLAGS}" )
-message ( "---CMAKE_PREFIX_PATH: ${CMAKE_PREFIX_PATH}" )
-message ( "---------GPU_TARGETS: ${GPU_TARGETS}" )
+# Basic Tool Chain Information
+message("----------Build-Type: ${CMAKE_BUILD_TYPE}")
+message("------------Compiler: ${CMAKE_CXX_COMPILER}")
+message("----Compiler-Version: ${CMAKE_CXX_COMPILER_VERSION}")
+message("-------ROCM_ROOT_DIR: ${ROCM_ROOT_DIR}")
+message("-----CMAKE_CXX_FLAGS: ${CMAKE_CXX_FLAGS}")
+message("---CMAKE_PREFIX_PATH: ${CMAKE_PREFIX_PATH}")
+message("---------GPU_TARGETS: ${GPU_TARGETS}")
 
-if ( "${ROCM_ROOT_DIR}" STREQUAL "" )
-  message ( FATAL_ERROR "ROCM_ROOT_DIR is not found." )
-endif ()
+if("${ROCM_ROOT_DIR}" STREQUAL "")
+    message(FATAL_ERROR "ROCM_ROOT_DIR is not found.")
+endif()
 
-find_library(FIND_AQL_PROFILE_LIB "libhsa-amd-aqlprofile64.so" HINTS ${CMAKE_PREFIX_PATH} PATHS ${ROCM_ROOT_DIR} PATH_SUFFIXES lib REQUIRED)
+find_library(
+    FIND_AQL_PROFILE_LIB "libhsa-amd-aqlprofile64.so"
+    HINTS ${CMAKE_PREFIX_PATH}
+    PATHS ${ROCM_ROOT_DIR}
+    PATH_SUFFIXES lib REQUIRED)
 if(NOT FIND_AQL_PROFILE_LIB)
-  message("AQL_PROFILE not installed. Please install AQL_PROFILE")
+    message("AQL_PROFILE not installed. Please install AQL_PROFILE")
 endif()

@@ -20,77 +20,95 @@
|
||||
# THE SOFTWARE.
|
||||
################################################################################
|
||||
|
||||
## Parses the VERSION_STRING variable and places
|
||||
## the first, second and third number values in
|
||||
## the major, minor and patch variables.
|
||||
function( parse_version VERSION_STRING )
|
||||
# Parses the VERSION_STRING variable and places the first, second and third number values
|
||||
# in the major, minor and patch variables.
|
||||
function(parse_version VERSION_STRING)
|
||||
|
||||
string ( FIND ${VERSION_STRING} "-" STRING_INDEX )
|
||||
string(FIND ${VERSION_STRING} "-" STRING_INDEX)
|
||||
|
||||
if ( ${STRING_INDEX} GREATER -1 )
|
||||
math ( EXPR STRING_INDEX "${STRING_INDEX} + 1" )
|
||||
string ( SUBSTRING ${VERSION_STRING} ${STRING_INDEX} -1 VERSION_BUILD )
|
||||
endif ()
|
||||
if(${STRING_INDEX} GREATER -1)
|
||||
math(EXPR STRING_INDEX "${STRING_INDEX} + 1")
|
||||
string(SUBSTRING ${VERSION_STRING} ${STRING_INDEX} -1 VERSION_BUILD)
|
||||
endif()
|
||||
|
||||
string ( REGEX MATCHALL "[0123456789]+" VERSIONS ${VERSION_STRING} )
|
||||
list ( LENGTH VERSIONS VERSION_COUNT )
|
||||
string(REGEX MATCHALL "[0123456789]+" VERSIONS ${VERSION_STRING})
|
||||
list(LENGTH VERSIONS VERSION_COUNT)
|
||||
|
||||
if ( ${VERSION_COUNT} GREATER 0)
|
||||
list ( GET VERSIONS 0 MAJOR )
|
||||
set ( VERSION_MAJOR ${MAJOR} PARENT_SCOPE )
|
||||
set ( TEMP_VERSION_STRING "${MAJOR}" )
|
||||
endif ()
|
||||
if(${VERSION_COUNT} GREATER 0)
|
||||
list(GET VERSIONS 0 MAJOR)
|
||||
set(VERSION_MAJOR
|
||||
${MAJOR}
|
||||
PARENT_SCOPE)
|
||||
set(TEMP_VERSION_STRING "${MAJOR}")
|
||||
endif()
|
||||
|
||||
if ( ${VERSION_COUNT} GREATER 1 )
|
||||
list ( GET VERSIONS 1 MINOR )
|
||||
set ( VERSION_MINOR ${MINOR} PARENT_SCOPE )
|
||||
set ( TEMP_VERSION_STRING "${TEMP_VERSION_STRING}.${MINOR}" )
|
||||
endif ()
|
||||
if(${VERSION_COUNT} GREATER 1)
|
||||
list(GET VERSIONS 1 MINOR)
|
||||
set(VERSION_MINOR
|
||||
${MINOR}
|
||||
PARENT_SCOPE)
|
||||
set(TEMP_VERSION_STRING "${TEMP_VERSION_STRING}.${MINOR}")
|
||||
endif()
|
||||
|
||||
if ( ${VERSION_COUNT} GREATER 2 )
|
||||
list ( GET VERSIONS 2 PATCH )
|
||||
set ( VERSION_PATCH ${PATCH} PARENT_SCOPE )
|
||||
set ( TEMP_VERSION_STRING "${TEMP_VERSION_STRING}.${PATCH}" )
|
||||
endif ()
|
||||
if(${VERSION_COUNT} GREATER 2)
|
||||
list(GET VERSIONS 2 PATCH)
|
||||
set(VERSION_PATCH
|
||||
${PATCH}
|
||||
PARENT_SCOPE)
|
||||
set(TEMP_VERSION_STRING "${TEMP_VERSION_STRING}.${PATCH}")
|
||||
endif()
|
||||
|
||||
if ( DEFINED VERSION_BUILD )
|
||||
set ( VERSION_BUILD "${VERSION_BUILD}" PARENT_SCOPE )
|
||||
endif ()
|
||||
if(DEFINED VERSION_BUILD)
|
||||
set(VERSION_BUILD
|
||||
"${VERSION_BUILD}"
|
||||
PARENT_SCOPE)
|
||||
endif()
|
||||
|
||||
set ( VERSION_STRING "${TEMP_VERSION_STRING}" PARENT_SCOPE )
|
||||
|
||||
endfunction ()
|
||||
|
||||
## Gets the current version of the repository
|
||||
## using versioning tags and git describe.
|
||||
## Passes back a packaging version string
|
||||
## and a library version string.
|
||||
function ( get_version DEFAULT_VERSION_STRING )
|
||||
|
||||
parse_version ( ${DEFAULT_VERSION_STRING} )
|
||||
|
||||
find_program ( GIT NAMES git )
|
||||
|
||||
if ( GIT )
|
||||
|
||||
execute_process ( COMMAND "git describe --dirty --long --match [0-9]* 2>/dev/null"
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
|
||||
OUTPUT_VARIABLE GIT_TAG_STRING
|
||||
OUTPUT_STRIP_TRAILING_WHITESPACE
|
||||
RESULT_VARIABLE RESULT )
|
||||
|
||||
if ( ${RESULT} EQUAL 0 )
|
||||
|
||||
parse_version ( ${GIT_TAG_STRING} )
|
||||
|
||||
endif ()
|
||||
|
||||
endif ()
|
||||
|
||||
set( VERSION_STRING "${VERSION_STRING}" PARENT_SCOPE )
|
||||
set( VERSION_MAJOR "${VERSION_MAJOR}" PARENT_SCOPE )
|
||||
set( VERSION_MINOR "${VERSION_MINOR}" PARENT_SCOPE )
|
||||
set( VERSION_PATCH "${VERSION_PATCH}" PARENT_SCOPE )
|
||||
set( VERSION_BUILD "${VERSION_BUILD}" PARENT_SCOPE )
|
||||
set(VERSION_STRING
|
||||
"${TEMP_VERSION_STRING}"
|
||||
PARENT_SCOPE)
|
||||
|
||||
endfunction()
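
The parsing steps of `parse_version` above can be restated outside CMake. The following is only an illustrative C sketch of the same logic, not part of the repository: `version_info` and the C function `parse_version` are hypothetical names. Like the CMake function, it takes the text after the first `-` as the build suffix and then collects the first three digit runs from the whole string, so digits inside the build suffix can end up as minor or patch when the tag has fewer than three numeric fields.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical result type, introduced only for this sketch. */
typedef struct {
  int count;         /* number of numeric fields found (0..3) */
  long fields[3];    /* major, minor, patch, in that order */
  const char* build; /* text after the first '-', or NULL if there is none */
} version_info;

/* Mirrors the CMake logic: split off the build suffix at the first '-',
 * then scan the whole string for up to three runs of decimal digits. */
static version_info parse_version(const char* s) {
  version_info v = {0, {0, 0, 0}, NULL};
  const char* dash = strchr(s, '-');
  if (dash != NULL) {
    v.build = dash + 1;
  }
  const char* p = s;
  while (*p != '\0' && v.count < 3) {
    if (isdigit((unsigned char)*p)) {
      long n = 0;
      while (isdigit((unsigned char)*p)) {
        n = n * 10 + (*p - '0');
        ++p;
      }
      v.fields[v.count++] = n;
    } else {
      ++p;
    }
  }
  return v;
}
```

Calling `parse_version("2.0.0-15-gabc1234")` fills the three fields with 2, 0 and 0 and points `build` at `15-gabc1234`.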
|
||||
|
||||
# Gets the current version of the repository using versioning tags and git describe.
|
||||
# Passes back a packaging version string and a library version string.
|
||||
function(get_version DEFAULT_VERSION_STRING)
|
||||
|
||||
parse_version(${DEFAULT_VERSION_STRING})
|
||||
|
||||
find_program(GIT NAMES git)
|
||||
|
||||
if(GIT)
|
||||
|
||||
execute_process(
|
||||
COMMAND git describe --dirty --long --match "[0-9]*"
ERROR_QUIET
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
|
||||
OUTPUT_VARIABLE GIT_TAG_STRING
|
||||
OUTPUT_STRIP_TRAILING_WHITESPACE
|
||||
RESULT_VARIABLE RESULT)
|
||||
|
||||
if(${RESULT} EQUAL 0)
|
||||
|
||||
parse_version(${GIT_TAG_STRING})
|
||||
|
||||
endif()
|
||||
|
||||
endif()
|
||||
|
||||
set(VERSION_STRING
|
||||
"${VERSION_STRING}"
|
||||
PARENT_SCOPE)
|
||||
set(VERSION_MAJOR
|
||||
"${VERSION_MAJOR}"
|
||||
PARENT_SCOPE)
|
||||
set(VERSION_MINOR
|
||||
"${VERSION_MINOR}"
|
||||
PARENT_SCOPE)
|
||||
set(VERSION_PATCH
|
||||
"${VERSION_PATCH}"
|
||||
PARENT_SCOPE)
|
||||
set(VERSION_BUILD
|
||||
"${VERSION_BUILD}"
|
||||
PARENT_SCOPE)
|
||||
|
||||
endfunction()
|
||||
|
||||
@@ -164,12 +164,12 @@ typedef struct {
|
||||
|
||||
// Profiling feature type
|
||||
typedef struct {
|
||||
rocprofiler_feature_kind_t kind; // feature kind
|
||||
rocprofiler_feature_kind_t kind; // feature kind
|
||||
union {
|
||||
const char* name; // feature name
|
||||
const char* name; // feature name
|
||||
struct {
|
||||
const char* block; // counter block name
|
||||
uint32_t event; // counter event id
|
||||
const char* block; // counter block name
|
||||
uint32_t event; // counter event id
|
||||
} counter;
|
||||
};
|
||||
const rocprofiler_parameter_t* parameters; // feature parameters array
|
||||
@@ -216,23 +216,25 @@ typedef struct {
|
||||
} rocprofiler_properties_t;
|
||||
|
||||
// Create new profiling context
|
||||
hsa_status_t rocprofiler_open(hsa_agent_t agent, // GPU handle
|
||||
rocprofiler_feature_t* features, // [in] profiling features array
|
||||
uint32_t feature_count, // profiling info count
|
||||
rocprofiler_t** context, // [out] context object
|
||||
uint32_t mode, // profiling mode mask
|
||||
hsa_status_t rocprofiler_open(hsa_agent_t agent, // GPU handle
|
||||
rocprofiler_feature_t* features, // [in] profiling features array
|
||||
uint32_t feature_count, // profiling info count
|
||||
rocprofiler_t** context, // [out] context object
|
||||
uint32_t mode, // profiling mode mask
|
||||
rocprofiler_properties_t* properties); // profiling properties
|
||||
|
||||
// Add feature to a features set
|
||||
hsa_status_t rocprofiler_add_feature(const rocprofiler_feature_t* feature, // [in]
|
||||
rocprofiler_feature_set_t* features_set); // [in/out] profiling features set
|
||||
hsa_status_t rocprofiler_add_feature(
|
||||
const rocprofiler_feature_t* feature, // [in]
|
||||
rocprofiler_feature_set_t* features_set); // [in/out] profiling features set
|
||||
|
||||
// Create new profiling context
|
||||
hsa_status_t rocprofiler_features_set_open(hsa_agent_t agent, // GPU handle
|
||||
rocprofiler_feature_set_t* features_set, // [in] profiling features set
|
||||
rocprofiler_t** context, // [out] context object
|
||||
uint32_t mode, // profiling mode mask
|
||||
rocprofiler_properties_t* properties); // profiling properties
|
||||
hsa_status_t rocprofiler_features_set_open(
|
||||
hsa_agent_t agent, // GPU handle
|
||||
rocprofiler_feature_set_t* features_set, // [in] profiling features set
|
||||
rocprofiler_t** context, // [out] context object
|
||||
uint32_t mode, // profiling mode mask
|
||||
rocprofiler_properties_t* properties); // profiling properties
|
||||
|
||||
// Delete profiling info
|
||||
hsa_status_t rocprofiler_close(rocprofiler_t* context); // [in] profiling context
|
||||
@@ -242,24 +244,24 @@ hsa_status_t rocprofiler_reset(rocprofiler_t* context, // [in] profiling contex
|
||||
uint32_t group_index); // group index
|
||||
|
||||
// Return context agent
|
||||
hsa_status_t rocprofiler_get_agent(rocprofiler_t* context, // [in] profiling context
|
||||
hsa_agent_t* agent); // [out] GPU handle
|
||||
hsa_status_t rocprofiler_get_agent(rocprofiler_t* context, // [in] profiling context
|
||||
hsa_agent_t* agent); // [out] GPU handle
|
||||
|
||||
// Supported time value ID
|
||||
typedef enum {
|
||||
ROCPROFILER_TIME_ID_CLOCK_REALTIME = 0, // Linux realtime clock time
|
||||
ROCPROFILER_TIME_ID_CLOCK_REALTIME_COARSE = 1, // Linux realtime-coarse clock time
|
||||
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC = 2, // Linux monotonic clock time
|
||||
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_COARSE = 3, // Linux monotonic-coarse clock time
|
||||
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_RAW = 4, // Linux monotonic-raw clock time
|
||||
ROCPROFILER_TIME_ID_CLOCK_REALTIME = 0, // Linux realtime clock time
|
||||
ROCPROFILER_TIME_ID_CLOCK_REALTIME_COARSE = 1, // Linux realtime-coarse clock time
|
||||
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC = 2, // Linux monotonic clock time
|
||||
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_COARSE = 3, // Linux monotonic-coarse clock time
|
||||
ROCPROFILER_TIME_ID_CLOCK_MONOTONIC_RAW = 4, // Linux monotonic-raw clock time
|
||||
} rocprofiler_time_id_t;
|
||||
|
||||
// Return time value for a given time ID and profiling timestamp
|
||||
hsa_status_t rocprofiler_get_time(
|
||||
rocprofiler_time_id_t time_id, // identifier of the particular time to convert the timestamp
|
||||
uint64_t timestamp, // profiling timestamp
|
||||
uint64_t* value_ns, // [out] returned time 'ns' value, ignored if NULL
|
||||
uint64_t* error_ns); // [out] returned time error 'ns' value, ignored if NULL
|
||||
rocprofiler_time_id_t time_id, // identifier of the particular time to convert the timestamp
|
||||
uint64_t timestamp, // profiling timestamp
|
||||
uint64_t* value_ns, // [out] returned time 'ns' value, ignored if NULL
|
||||
uint64_t* error_ns); // [out] returned time error 'ns' value, ignored if NULL
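
The time IDs declared above are described as Linux clocks. The sketch below shows one way such IDs could be mapped to POSIX `clockid_t` values and read in nanoseconds. It is an assumption-laden illustration: `sketch_time_id`, `to_clockid` and `read_clock_ns` are hypothetical names, and the code only reads the current value of a clock. It does not reproduce the timestamp conversion that `rocprofiler_get_time` performs.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Local stand-in that mirrors the numeric values of rocprofiler_time_id_t. */
typedef enum {
  SKETCH_TIME_ID_CLOCK_REALTIME = 0,
  SKETCH_TIME_ID_CLOCK_REALTIME_COARSE = 1,
  SKETCH_TIME_ID_CLOCK_MONOTONIC = 2,
  SKETCH_TIME_ID_CLOCK_MONOTONIC_COARSE = 3,
  SKETCH_TIME_ID_CLOCK_MONOTONIC_RAW = 4
} sketch_time_id;

/* Map a sketch ID to the Linux clock it names; returns -1 for unknown IDs. */
static int to_clockid(sketch_time_id id, clockid_t* out) {
  switch (id) {
    case SKETCH_TIME_ID_CLOCK_REALTIME:         *out = CLOCK_REALTIME;         return 0;
    case SKETCH_TIME_ID_CLOCK_REALTIME_COARSE:  *out = CLOCK_REALTIME_COARSE;  return 0;
    case SKETCH_TIME_ID_CLOCK_MONOTONIC:        *out = CLOCK_MONOTONIC;        return 0;
    case SKETCH_TIME_ID_CLOCK_MONOTONIC_COARSE: *out = CLOCK_MONOTONIC_COARSE; return 0;
    case SKETCH_TIME_ID_CLOCK_MONOTONIC_RAW:    *out = CLOCK_MONOTONIC_RAW;    return 0;
    default:                                    return -1;
  }
}

/* Read the selected clock in nanoseconds; value_ns is ignored if NULL,
 * following the "ignored if NULL" convention of the declaration above. */
static int read_clock_ns(sketch_time_id id, uint64_t* value_ns) {
  clockid_t clk;
  struct timespec ts;
  if (to_clockid(id, &clk) != 0) return -1;
  if (clock_gettime(clk, &ts) != 0) return -1;
  if (value_ns != NULL) {
    *value_ns = (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
  }
  return 0;
}
```

The explicit `switch` is needed because the Linux `CLOCK_*` constants do not share the numbering used by the enum, so the ID cannot simply be cast to `clockid_t`.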
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
// Queue callbacks
|
||||
@@ -269,26 +271,26 @@ hsa_status_t rocprofiler_get_time(
|
||||
|
||||
// Dispatch record
|
||||
typedef struct {
|
||||
uint64_t dispatch; // dispatch timestamp, ns
|
||||
uint64_t begin; // kernel begin timestamp, ns
|
||||
uint64_t end; // kernel end timestamp, ns
|
||||
uint64_t complete; // completion signal timestamp, ns
|
||||
uint64_t dispatch; // dispatch timestamp, ns
|
||||
uint64_t begin; // kernel begin timestamp, ns
|
||||
uint64_t end; // kernel end timestamp, ns
|
||||
uint64_t complete; // completion signal timestamp, ns
|
||||
} rocprofiler_dispatch_record_t;
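
As a rough illustration of how the four timestamps in a dispatch record might be used, the sketch below derives a kernel duration and a dispatch-to-begin latency. `dispatch_record_sketch` is a local stand-in with the same field names, the helper functions are hypothetical, and the ordering dispatch <= begin <= end <= complete is an assumption that the header itself does not state.

```c
#include <stdint.h>

/* Local stand-in mirroring the fields of rocprofiler_dispatch_record_t. */
typedef struct {
  uint64_t dispatch; /* dispatch timestamp, ns */
  uint64_t begin;    /* kernel begin timestamp, ns */
  uint64_t end;      /* kernel end timestamp, ns */
  uint64_t complete; /* completion signal timestamp, ns */
} dispatch_record_sketch;

/* Kernel execution time; returns 0 if the timestamps are out of order. */
static uint64_t kernel_duration_ns(const dispatch_record_sketch* r) {
  return r->end >= r->begin ? r->end - r->begin : 0;
}

/* Time between dispatch and kernel start; returns 0 if out of order. */
static uint64_t launch_latency_ns(const dispatch_record_sketch* r) {
  return r->begin >= r->dispatch ? r->begin - r->dispatch : 0;
}
```

Clamping to zero avoids unsigned wrap-around if a record is only partly filled in.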
|
||||
|
||||
// Profiling callback data
|
||||
typedef struct {
|
||||
hsa_agent_t agent; // GPU agent handle
|
||||
uint32_t agent_index; // GPU index (GPU Driver Node ID as reported in the sysfs topology)
|
||||
const hsa_queue_t* queue; // HSA queue
|
||||
uint64_t queue_index; // Index in the queue
|
||||
uint32_t queue_id; // Queue id
|
||||
hsa_signal_t completion_signal; // Completion signal
|
||||
const hsa_kernel_dispatch_packet_t* packet; // HSA dispatch packet
|
||||
const char* kernel_name; // Kernel name
|
||||
uint64_t kernel_object; // Kernel object address
|
||||
const amd_kernel_code_t* kernel_code; // Kernel code pointer
|
||||
uint32_t thread_id; // Thread id
|
||||
const rocprofiler_dispatch_record_t* record; // Dispatch record
|
||||
hsa_agent_t agent; // GPU agent handle
|
||||
uint32_t agent_index; // GPU index (GPU Driver Node ID as reported in the sysfs topology)
|
||||
const hsa_queue_t* queue; // HSA queue
|
||||
uint64_t queue_index; // Index in the queue
|
||||
uint32_t queue_id; // Queue id
|
||||
hsa_signal_t completion_signal; // Completion signal
|
||||
const hsa_kernel_dispatch_packet_t* packet; // HSA dispatch packet
|
||||
const char* kernel_name; // Kernel name
|
||||
uint64_t kernel_object; // Kernel object address
|
||||
const amd_kernel_code_t* kernel_code; // Kernel code pointer
|
||||
uint32_t thread_id; // Thread id
|
||||
const rocprofiler_dispatch_record_t* record; // Dispatch record
|
||||
} rocprofiler_callback_data_t;
|
||||
|
||||
// Profiling callback type
|
||||
@@ -299,15 +301,14 @@ typedef hsa_status_t (*rocprofiler_callback_t)(
|
||||
|
||||
// Queue callbacks
|
||||
typedef struct {
|
||||
rocprofiler_callback_t dispatch; // dispatch callback
|
||||
hsa_status_t (*create)(hsa_queue_t* queue, void* data); // create callback
|
||||
hsa_status_t (*destroy)(hsa_queue_t* queue, void* data); // destroy callback
|
||||
rocprofiler_callback_t dispatch; // dispatch callback
|
||||
hsa_status_t (*create)(hsa_queue_t* queue, void* data); // create callback
|
||||
hsa_status_t (*destroy)(hsa_queue_t* queue, void* data); // destroy callback
|
||||
} rocprofiler_queue_callbacks_t;
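
The callbacks structure above pairs function pointers with a user `data` pointer that is handed back on each call. The sketch below shows that pattern in isolation, with stand-in types so that it compiles without the HSA headers. All `sketch_` names and the `notify_` helpers are hypothetical and do not describe how ROCProfiler invokes the callbacks internally.

```c
#include <stddef.h>

typedef struct { int id; } sketch_queue; /* stand-in for hsa_queue_t */
typedef int sketch_status;               /* stand-in for hsa_status_t, 0 = success */

/* Same shape as the create/destroy members of the callbacks struct. */
typedef struct {
  sketch_status (*create)(sketch_queue* queue, void* data);
  sketch_status (*destroy)(sketch_queue* queue, void* data);
} sketch_queue_callbacks;

/* Example callbacks: 'data' points at an int counting live queues. */
static sketch_status on_create(sketch_queue* queue, void* data) {
  (void)queue;
  ++*(int*)data;
  return 0;
}

static sketch_status on_destroy(sketch_queue* queue, void* data) {
  (void)queue;
  --*(int*)data;
  return 0;
}

/* Hypothetical dispatchers: call a callback only if it was provided. */
static sketch_status notify_create(const sketch_queue_callbacks* cb,
                                   sketch_queue* queue, void* data) {
  return cb->create != NULL ? cb->create(queue, data) : 0;
}

static sketch_status notify_destroy(const sketch_queue_callbacks* cb,
                                    sketch_queue* queue, void* data) {
  return cb->destroy != NULL ? cb->destroy(queue, data) : 0;
}
```

Passing state through `data` rather than through globals lets one set of callbacks serve several independent consumers.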
|
||||
|
||||
// Set queue callbacks
|
||||
hsa_status_t rocprofiler_set_queue_callbacks(
|
||||
rocprofiler_queue_callbacks_t callbacks, // callbacks
|
||||
void* data); // [in/out] passed callbacks data
|
||||
hsa_status_t rocprofiler_set_queue_callbacks(rocprofiler_queue_callbacks_t callbacks, // callbacks
|
||||
void* data); // [in/out] passed callbacks data
|
||||
|
||||
// Remove queue callbacks
|
||||
hsa_status_t rocprofiler_remove_queue_callbacks();
|
||||
@@ -323,20 +324,20 @@ hsa_status_t rocprofiler_stop_queue_callbacks();
|
||||
// context.invocations' to collect all profiling data
|
||||
|
||||
// Start profiling
|
||||
hsa_status_t rocprofiler_start(rocprofiler_t* context, // [in/out] profiling context
|
||||
uint32_t group_index); // group index
|
||||
hsa_status_t rocprofiler_start(rocprofiler_t* context, // [in/out] profiling context
|
||||
uint32_t group_index); // group index
|
||||
|
||||
// Stop profiling
|
||||
hsa_status_t rocprofiler_stop(rocprofiler_t* context, // [in/out] profiling context
|
||||
uint32_t group_index); // group index
|
||||
hsa_status_t rocprofiler_stop(rocprofiler_t* context, // [in/out] profiling context
|
||||
uint32_t group_index); // group index
|
||||
|
||||
// Read profiling
|
||||
hsa_status_t rocprofiler_read(rocprofiler_t* context, // [in/out] profiling context
|
||||
uint32_t group_index); // group index
|
||||
hsa_status_t rocprofiler_read(rocprofiler_t* context, // [in/out] profiling context
|
||||
uint32_t group_index); // group index
|
||||
|
||||
// Read profiling data
|
||||
hsa_status_t rocprofiler_get_data(rocprofiler_t* context, // [in/out] profiling context
|
||||
uint32_t group_index); // group index
|
||||
hsa_status_t rocprofiler_get_data(rocprofiler_t* context, // [in/out] profiling context
|
||||
uint32_t group_index); // group index
|
||||
|
||||
// Get profiling groups count
|
||||
hsa_status_t rocprofiler_group_count(const rocprofiler_t* context, // [in] profiling context
|
||||
@@ -379,75 +380,76 @@ hsa_status_t rocprofiler_iterate_trace_data(
|
||||
|
||||
// Profiling info kind
|
||||
typedef enum {
|
||||
ROCPROFILER_INFO_KIND_METRIC = 0, // metric info
|
||||
ROCPROFILER_INFO_KIND_METRIC_COUNT = 1, // metric features count, int32
|
||||
ROCPROFILER_INFO_KIND_TRACE = 2, // trace info
|
||||
ROCPROFILER_INFO_KIND_TRACE_COUNT = 3, // trace features count, int32
|
||||
ROCPROFILER_INFO_KIND_TRACE_PARAMETER = 4, // trace parameter info
|
||||
ROCPROFILER_INFO_KIND_TRACE_PARAMETER_COUNT = 5 // trace parameter count, int32
|
||||
ROCPROFILER_INFO_KIND_METRIC = 0, // metric info
|
||||
ROCPROFILER_INFO_KIND_METRIC_COUNT = 1, // metric features count, int32
|
||||
ROCPROFILER_INFO_KIND_TRACE = 2, // trace info
|
||||
ROCPROFILER_INFO_KIND_TRACE_COUNT = 3, // trace features count, int32
|
||||
ROCPROFILER_INFO_KIND_TRACE_PARAMETER = 4, // trace parameter info
|
||||
ROCPROFILER_INFO_KIND_TRACE_PARAMETER_COUNT = 5 // trace parameter count, int32
|
||||
} rocprofiler_info_kind_t;
|
||||
|
||||
// Profiling info query
|
||||
typedef union {
|
||||
rocprofiler_info_kind_t info_kind; // queried profiling info kind
|
||||
rocprofiler_info_kind_t info_kind; // queried profiling info kind
|
||||
struct {
|
||||
const char* trace_name; // queried info trace name
|
||||
const char* trace_name; // queried info trace name
|
||||
} trace_parameter;
|
||||
} rocprofiler_info_query_t;
|
||||
|
||||
// Profiling info data
|
||||
typedef struct {
|
||||
uint32_t agent_index; // GPU HSA agent index (GPU Driver Node ID as reported in the sysfs topology)
|
||||
rocprofiler_info_kind_t kind; // info data kind
|
||||
uint32_t
|
||||
agent_index; // GPU HSA agent index (GPU Driver Node ID as reported in the sysfs topology)
|
||||
rocprofiler_info_kind_t kind; // info data kind
|
||||
union {
|
||||
struct {
|
||||
const char* name; // metric name
|
||||
uint32_t instances; // instances number
|
||||
const char* expr; // metric expression, NULL for basic counters
|
||||
const char* description; // metric description
|
||||
const char* block_name; // block name
|
||||
uint32_t block_counters; // number of block counters
|
||||
const char* name; // metric name
|
||||
uint32_t instances; // instances number
|
||||
const char* expr; // metric expression, NULL for basic counters
|
||||
const char* description; // metric description
|
||||
const char* block_name; // block name
|
||||
uint32_t block_counters; // number of block counters
|
||||
} metric;
|
||||
struct {
|
||||
const char* name; // trace name
|
||||
const char* description; // trace description
|
||||
uint32_t parameter_count; // number of parameters supported by the trace
|
||||
const char* name; // trace name
|
||||
const char* description; // trace description
|
||||
uint32_t parameter_count; // number of parameters supported by the trace
|
||||
} trace;
|
||||
struct {
|
||||
uint32_t code; // parameter code
|
||||
const char* trace_name; // trace name
|
||||
const char* parameter_name; // parameter name
|
||||
const char* description; // trace parameter description
|
||||
uint32_t code; // parameter code
|
||||
const char* trace_name; // trace name
|
||||
const char* parameter_name; // parameter name
|
||||
const char* description; // trace parameter description
|
||||
} trace_parameter;
|
||||
};
|
||||
} rocprofiler_info_data_t;
|
||||
|
||||
// Return the info for a given info kind
|
||||
hsa_status_t rocprofiler_get_info(
|
||||
const hsa_agent_t* agent, // [in] GFXIP handle
|
||||
rocprofiler_info_kind_t kind, // kind of iterated info
|
||||
void *data); // [in/out] returned data
|
||||
hsa_status_t rocprofiler_get_info(const hsa_agent_t* agent, // [in] GFXIP handle
|
||||
rocprofiler_info_kind_t kind, // kind of iterated info
|
||||
void* data); // [in/out] returned data
|
||||
|
||||
// Iterate over the info for a given info kind, and invoke an application-defined callback on every iteration
|
||||
hsa_status_t rocprofiler_iterate_info(
|
||||
const hsa_agent_t* agent, // [in] GFXIP handle
|
||||
rocprofiler_info_kind_t kind, // kind of iterated info
|
||||
hsa_status_t (*callback)(const rocprofiler_info_data_t info, void *data), // callback
|
||||
void *data); // [in/out] data passed to callback
|
||||
// Iterate over the info for a given info kind, and invoke an application-defined callback on every
|
||||
// iteration
|
||||
hsa_status_t rocprofiler_iterate_info(const hsa_agent_t* agent, // [in] GFXIP handle
|
||||
rocprofiler_info_kind_t kind, // kind of iterated info
|
||||
hsa_status_t (*callback)(const rocprofiler_info_data_t info,
|
||||
void* data), // callback
|
||||
void* data); // [in/out] data passed to callback
|
||||
|
||||
// Iterate over the info for a given info query, and invoke an application-defined callback on every iteration
|
||||
hsa_status_t rocprofiler_query_info(
|
||||
const hsa_agent_t *agent, // [in] GFXIP handle
|
||||
rocprofiler_info_query_t query, // iterated info query
|
||||
hsa_status_t (*callback)(const rocprofiler_info_data_t info, void *data), // callback
|
||||
void *data); // [in/out] data passed to callback
|
||||
// Iterate over the info for a given info query, and invoke an application-defined callback on every
|
||||
// iteration
|
||||
hsa_status_t rocprofiler_query_info(const hsa_agent_t* agent, // [in] GFXIP handle
|
||||
rocprofiler_info_query_t query, // iterated info query
|
||||
hsa_status_t (*callback)(const rocprofiler_info_data_t info,
|
||||
void* data), // callback
|
||||
void* data); // [in/out] data passed to callback
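The iterate/query declarations above share one pattern: the caller passes a callback plus an opaque `void* data` pointer, and the callback recovers its own state from that pointer. A minimal sketch of that pattern follows. It does not use the ROCm headers: every `sketch_` name is a hypothetical, simplified stand-in for the types declared above, and the stop-on-first-failure behaviour of the stand-in iterator is an assumption, not something this diff confirms about the library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the metric part of rocprofiler_info_data_t.
struct sketch_metric_info_t {
  const char* name;    // metric name
  uint32_t instances;  // instances number
  const char* expr;    // metric expression, NULL for basic counters
};

struct sketch_info_data_t {
  uint32_t agent_index;
  sketch_metric_info_t metric;
};

// Same shape as the callbacks declared above: info record by value, opaque user pointer.
using sketch_info_callback_t = int (*)(const sketch_info_data_t info, void* data);

// Stand-in iterator (assumption): calls the callback once per record and stops at the
// first non-zero status, returning that status to the caller.
int sketch_iterate_info(const std::vector<sketch_info_data_t>& records,
                        sketch_info_callback_t callback, void* data) {
  for (const sketch_info_data_t& record : records) {
    const int status = callback(record, data);
    if (status != 0) return status;
  }
  return 0;
}

// Application callback: keep only basic counters, i.e. records whose expr is NULL.
int collect_basic_counters(const sketch_info_data_t info, void* data) {
  auto* names = static_cast<std::vector<std::string>*>(data);
  if (info.metric.expr == nullptr) names->push_back(info.metric.name);
  return 0;
}

// Convenience wrapper that owns the state handed to the callback.
std::vector<std::string> basic_counter_names(const std::vector<sketch_info_data_t>& records) {
  std::vector<std::string> names;
  const int status = sketch_iterate_info(records, collect_basic_counters, &names);
  assert(status == 0);
  (void)status;
  return names;
}
```

Keeping the state in a caller-owned object and passing its address through `data` avoids globals, which matters if several queries run concurrently.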
|
||||
|
||||
// Create a profiled queue. All dispatches on this queue will be profiled
|
||||
hsa_status_t rocprofiler_queue_create_profiled(
|
||||
hsa_agent_t agent_handle,uint32_t size, hsa_queue_type32_t type,
|
||||
void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data),
|
||||
void* data, uint32_t private_segment_size, uint32_t group_segment_size,
|
||||
hsa_queue_t** queue);
|
||||
hsa_agent_t agent_handle, uint32_t size, hsa_queue_type32_t type,
|
||||
void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data), void* data,
|
||||
uint32_t private_segment_size, uint32_t group_segment_size, hsa_queue_t** queue);
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
// Profiling pool
|
||||
@@ -461,8 +463,8 @@ typedef void rocprofiler_pool_t;
|
||||
|
||||
// Profiling pool entry
|
||||
typedef struct {
|
||||
rocprofiler_t* context; // context object
|
||||
void* payload; // payload data object
|
||||
rocprofiler_t* context; // context object
|
||||
void* payload; // payload data object
|
||||
} rocprofiler_pool_entry_t;
|
||||
|
||||
// Profiling handler, called on profiling completion
|
||||
@@ -478,120 +480,118 @@ typedef struct {
|
||||
|
||||
// Open profiling pool
|
||||
hsa_status_t rocprofiler_pool_open(
|
||||
hsa_agent_t agent, // GPU handle
|
||||
rocprofiler_feature_t* features, // [in] profiling features array
|
||||
uint32_t feature_count, // profiling info count
|
||||
rocprofiler_pool_t** pool, // [out] context object
|
||||
uint32_t mode, // profiling mode mask
|
||||
rocprofiler_pool_properties_t*); // pool properties
|
||||
hsa_agent_t agent, // GPU handle
|
||||
rocprofiler_feature_t* features, // [in] profiling features array
|
||||
uint32_t feature_count, // profiling info count
|
||||
rocprofiler_pool_t** pool, // [out] context object
|
||||
uint32_t mode, // profiling mode mask
|
||||
rocprofiler_pool_properties_t*); // pool properties
|
||||
|
||||
// Close profiling pool
|
||||
hsa_status_t rocprofiler_pool_close(
|
||||
rocprofiler_pool_t* pool); // profiling pool handle
|
||||
hsa_status_t rocprofiler_pool_close(rocprofiler_pool_t* pool); // profiling pool handle
|
||||
|
||||
// Fetch profiling pool entry
|
||||
hsa_status_t rocprofiler_pool_fetch(
|
||||
rocprofiler_pool_t* pool, // profiling pool handle
|
||||
rocprofiler_pool_entry_t* entry); // [out] empty profiling pool entry
|
||||
rocprofiler_pool_t* pool, // profiling pool handle
|
||||
rocprofiler_pool_entry_t* entry); // [out] empty profiling pool entry
|
||||
|
||||
// Release profiling pool entry
|
||||
hsa_status_t rocprofiler_pool_release(
|
||||
rocprofiler_pool_entry_t* entry); // released profiling pool entry
|
||||
rocprofiler_pool_entry_t* entry); // released profiling pool entry
|
||||
|
||||
// Iterate fetched profiling pool entries
|
||||
hsa_status_t rocprofiler_pool_iterate(
|
||||
rocprofiler_pool_t* pool, // profiling pool handle
|
||||
hsa_status_t (*callback)(rocprofiler_pool_entry_t* entry, void* data), // callback
|
||||
void *data); // [in/out] data passed to callback
|
||||
hsa_status_t rocprofiler_pool_iterate(rocprofiler_pool_t* pool, // profiling pool handle
|
||||
hsa_status_t (*callback)(rocprofiler_pool_entry_t* entry,
|
||||
void* data), // callback
|
||||
void* data); // [in/out] data passed to callback
|
||||
|
||||
// Flush completed entries in profiling pool
|
||||
hsa_status_t rocprofiler_pool_flush(
|
||||
rocprofiler_pool_t* pool); // profiling pool handle
|
||||
hsa_status_t rocprofiler_pool_flush(rocprofiler_pool_t* pool); // profiling pool handle
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
// HSA intercepting API
|
||||
|
||||
// HSA callbacks ID enumeration
|
||||
typedef enum {
|
||||
ROCPROFILER_HSA_CB_ID_ALLOCATE = 0, // Memory allocate callback
|
||||
ROCPROFILER_HSA_CB_ID_DEVICE = 1, // Device assign callback
|
||||
ROCPROFILER_HSA_CB_ID_MEMCOPY = 2, // Memcopy callback
|
||||
ROCPROFILER_HSA_CB_ID_SUBMIT = 3, // Packet submit callback
|
||||
ROCPROFILER_HSA_CB_ID_KSYMBOL = 4, // Loading/unloading of kernel symbol
|
||||
ROCPROFILER_HSA_CB_ID_CODEOBJ = 5 // Loading/unloading of code object
|
||||
ROCPROFILER_HSA_CB_ID_ALLOCATE = 0, // Memory allocate callback
|
||||
ROCPROFILER_HSA_CB_ID_DEVICE = 1, // Device assign callback
|
||||
ROCPROFILER_HSA_CB_ID_MEMCOPY = 2, // Memcopy callback
|
||||
ROCPROFILER_HSA_CB_ID_SUBMIT = 3, // Packet submit callback
|
||||
ROCPROFILER_HSA_CB_ID_KSYMBOL = 4, // Loading/unloading of kernel symbol
|
||||
ROCPROFILER_HSA_CB_ID_CODEOBJ = 5 // Loading/unloading of code object
|
||||
} rocprofiler_hsa_cb_id_t;
|
||||
|
||||
// HSA callback data type
|
||||
typedef struct {
|
||||
union {
|
||||
struct {
|
||||
const void* ptr; // allocated area ptr
|
||||
size_t size; // allocated area size, zero size means 'free' callback
|
||||
hsa_amd_segment_t segment; // allocated area's memory segment type
|
||||
const void* ptr; // allocated area ptr
|
||||
size_t size; // allocated area size, zero size means 'free' callback
|
||||
hsa_amd_segment_t segment; // allocated area's memory segment type
|
||||
hsa_amd_memory_pool_global_flag_t global_flag; // allocated area's memory global flag
|
||||
int is_code; // equal to 1 if code is allocated
|
||||
} allocate;
|
||||
struct {
|
||||
hsa_device_type_t type; // type of assigned device
|
||||
uint32_t id; // id of assigned device
|
||||
hsa_agent_t agent; // device HSA agent handle
|
||||
const void* ptr; // ptr the device is assigned to
|
||||
hsa_device_type_t type; // type of assigned device
|
||||
uint32_t id; // id of assigned device
|
||||
hsa_agent_t agent; // device HSA agent handle
|
||||
const void* ptr; // ptr the device is assigned to
|
||||
} device;
|
||||
struct {
|
||||
const void* dst; // memcopy dst ptr
|
||||
const void* src; // memcopy src ptr
|
||||
size_t size; // memcopy size bytes
|
||||
const void* dst; // memcopy dst ptr
|
||||
const void* src; // memcopy src ptr
|
||||
size_t size; // memcopy size bytes
|
||||
} memcopy;
|
||||
struct {
|
||||
const void* packet; // packet submitted to GPU
|
||||
const char* kernel_name; // kernel name, not NULL if dispatch
|
||||
hsa_queue_t* queue; // HSA queue the kernel was submitted to
|
||||
uint32_t device_type; // type of device the packet is submitted to
|
||||
uint32_t device_id; // id of device the packet is submitted to
|
||||
const void* packet; // packet submitted to GPU
|
||||
const char* kernel_name; // kernel name, not NULL if dispatch
|
||||
hsa_queue_t* queue; // HSA queue the kernel was submitted to
|
||||
uint32_t device_type; // type of device the packet is submitted to
|
||||
uint32_t device_id; // id of device the packet is submitted to
|
||||
} submit;
|
||||
struct {
|
||||
uint64_t object; // kernel symbol object
|
||||
const char* name; // kernel symbol name
|
||||
uint32_t name_length; // kernel symbol name length
|
||||
int unload; // symbol executable destroy
|
||||
uint64_t object; // kernel symbol object
|
||||
const char* name; // kernel symbol name
|
||||
uint32_t name_length; // kernel symbol name length
|
||||
int unload; // symbol executable destroy
|
||||
} ksymbol;
|
||||
struct {
|
||||
uint32_t storage_type; // code object storage type
|
||||
int storage_file; // origin file descriptor
|
||||
uint64_t memory_base; // origin memory base
|
||||
uint64_t memory_size; // origin memory size
|
||||
uint64_t load_base; // codeobj load base
|
||||
uint64_t load_size; // codeobj load size
|
||||
uint64_t load_delta; // codeobj load delta
|
||||
uint32_t uri_length; // URI string length
|
||||
char* uri; // URI string
|
||||
int unload; // unload flag
|
||||
uint32_t storage_type; // code object storage type
|
||||
int storage_file; // origin file descriptor
|
||||
uint64_t memory_base; // origin memory base
|
||||
uint64_t memory_size; // origin memory size
|
||||
uint64_t load_base; // codeobj load base
|
||||
uint64_t load_size; // codeobj load size
|
||||
uint64_t load_delta; // codeobj load delta
|
||||
uint32_t uri_length; // URI string length
|
||||
char* uri; // URI string
|
||||
int unload; // unload flag
|
||||
} codeobj;
|
||||
};
|
||||
} rocprofiler_hsa_callback_data_t;
|
||||
|
||||
// HSA callback function type
|
||||
typedef hsa_status_t (*rocprofiler_hsa_callback_fun_t)(
|
||||
rocprofiler_hsa_cb_id_t id, // callback id
|
||||
const rocprofiler_hsa_callback_data_t* data, // [in] callback data
|
||||
void* arg); // [in/out] user passed data
|
||||
rocprofiler_hsa_cb_id_t id, // callback id
|
||||
const rocprofiler_hsa_callback_data_t* data, // [in] callback data
|
||||
void* arg); // [in/out] user passed data
|
||||
|
||||
// HSA callbacks structure
|
||||
typedef struct {
|
||||
rocprofiler_hsa_callback_fun_t allocate; // memory allocate callback
|
||||
rocprofiler_hsa_callback_fun_t device; // agent assign callback
|
||||
rocprofiler_hsa_callback_fun_t memcopy; // memory copy callback
|
||||
rocprofiler_hsa_callback_fun_t submit; // packet submit callback
|
||||
rocprofiler_hsa_callback_fun_t ksymbol; // kernel symbol callback
|
||||
rocprofiler_hsa_callback_fun_t codeobj; // codeobject load/unload callback
|
||||
rocprofiler_hsa_callback_fun_t allocate; // memory allocate callback
|
||||
rocprofiler_hsa_callback_fun_t device; // agent assign callback
|
||||
rocprofiler_hsa_callback_fun_t memcopy; // memory copy callback
|
||||
rocprofiler_hsa_callback_fun_t submit; // packet submit callback
|
||||
rocprofiler_hsa_callback_fun_t ksymbol; // kernel symbol callback
|
||||
rocprofiler_hsa_callback_fun_t codeobj; // codeobject load/unload callback
|
||||
} rocprofiler_hsa_callbacks_t;
|
||||
|
||||
// Set callbacks. If the callback is NULL then it is disabled.
|
||||
// If callback returns a value that is not HSA_STATUS_SUCCESS the callback
|
||||
// will be unregistered.
|
||||
hsa_status_t rocprofiler_set_hsa_callbacks(
|
||||
const rocprofiler_hsa_callbacks_t callbacks, // HSA callback functions
|
||||
void* arg); // callback user data
|
||||
const rocprofiler_hsa_callbacks_t callbacks, // HSA callback functions
|
||||
void* arg); // callback user data
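The comment on `rocprofiler_set_hsa_callbacks` states two rules: a NULL callback is disabled, and a callback that returns anything other than `HSA_STATUS_SUCCESS` is unregistered. The sketch below illustrates those two rules with a small, self-contained dispatcher. All names here (`sketch_registry_t`, `sketch_cb_id_t`, and so on) are hypothetical, and the dispatcher is only an illustration of the documented semantics, not the library's actual implementation; `0` stands in for `HSA_STATUS_SUCCESS`.

```cpp
#include <array>
#include <cassert>

// Hypothetical mirror of the six callback ids listed above, plus a count sentinel.
enum sketch_cb_id_t {
  CB_ALLOCATE = 0,
  CB_DEVICE,
  CB_MEMCOPY,
  CB_SUBMIT,
  CB_KSYMBOL,
  CB_CODEOBJ,
  CB_COUNT
};

// Same shape as rocprofiler_hsa_callback_fun_t: id, callback data, user argument.
using sketch_callback_t = int (*)(sketch_cb_id_t id, const void* data, void* arg);

struct sketch_registry_t {
  std::array<sketch_callback_t, CB_COUNT> slots{};  // all slots start as NULL (disabled)
  void* arg = nullptr;                              // user data shared by all callbacks

  // Returns true if a callback was invoked for this id.
  bool dispatch(sketch_cb_id_t id, const void* data) {
    assert(id < CB_COUNT);
    sketch_callback_t callback = slots[id];
    if (callback == nullptr) return false;  // NULL slot: disabled
    if (callback(id, data, arg) != 0) {
      slots[id] = nullptr;                  // non-success status: unregister
    }
    return true;
  }
};

// Example callback: counts invocations and reports failure on the second one.
int count_then_fail(sketch_cb_id_t, const void*, void* arg) {
  int* count = static_cast<int*>(arg);
  ++*count;
  return (*count >= 2) ? 1 : 0;
}
```

A practical consequence of the second rule is that a callback should return a failure status only when it genuinely wants to stop receiving events.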
|
||||
|
||||
#ifdef __cplusplus
|
||||
} // extern "C" block
|
||||
|
||||
@@ -1714,7 +1714,7 @@ typedef enum {
|
||||
ROCPROFILER_ATT_TOKEN_MASK2 = 4,
|
||||
ROCPROFILER_ATT_SE_MASK = 5,
|
||||
ROCPROFILER_ATT_SAMPLE_RATE = 6,
|
||||
ROCPROFILER_ATT_BUFFER_SIZE = 7, //! ATT collection max data size.
|
||||
ROCPROFILER_ATT_BUFFER_SIZE = 7, //! ATT collection max data size.
|
||||
ROCPROFILER_ATT_PERF_MASK = 240,
|
||||
ROCPROFILER_ATT_PERF_CTRL = 241,
|
||||
ROCPROFILER_ATT_PERFCOUNTER = 242,
|
||||
|
||||
@@ -1,23 +1,23 @@
|
||||
################################################################################
|
||||
## Copyright (c) 2022 Advanced Micro Devices, Inc.
|
||||
##
|
||||
## Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
## of this software and associated documentation files (the "Software"), to
|
||||
## deal in the Software without restriction, including without limitation the
|
||||
## rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
|
||||
## sell copies of the Software, and to permit persons to whom the Software is
|
||||
## furnished to do so, subject to the following conditions:
|
||||
##
|
||||
## The above copyright notice and this permission notice shall be included in
|
||||
## all copies or substantial portions of the Software.
|
||||
##
|
||||
## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
## AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
## LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
## FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
||||
## IN THE SOFTWARE.
|
||||
# Copyright (c) 2022 Advanced Micro Devices, Inc.
|
||||
#
|
||||
# Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
# of this software and associated documentation files (the "Software"), to
|
||||
# deal in the Software without restriction, including without limitation the
|
||||
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
|
||||
# sell copies of the Software, and to permit persons to whom the Software is
|
||||
# furnished to do so, subject to the following conditions:
|
||||
#
|
||||
# The above copyright notice and this permission notice shall be included in
|
||||
# all copies or substantial portions of the Software.
|
||||
#
|
||||
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
||||
# IN THE SOFTWARE.
|
||||
################################################################################
|
||||
|
||||
add_subdirectory(file)
|
||||
|
||||
@@ -17,10 +17,10 @@
|
||||
# ##############################################################################
|
||||
|
||||
find_library(
|
||||
ROCPROFV2_ATT rocprofv2_att
|
||||
HINTS ${CMAKE_INSTALL_PREFIX}
|
||||
PATHS ${ROCM_PATH}
|
||||
PATH_SUFFIXES hsa-amd-aqlprofile)
|
||||
ROCPROFV2_ATT rocprofv2_att
|
||||
HINTS ${CMAKE_INSTALL_PREFIX}
|
||||
PATHS ${ROCM_PATH}
|
||||
PATH_SUFFIXES hsa-amd-aqlprofile)
|
||||
|
||||
set(ENV{ROCPROFV2_ATT_LIB_PATH} ${ROCPROFV2_ATT})
|
||||
|
||||
@@ -30,30 +30,26 @@ file(GLOB FILE_SOURCES att.cpp)
|
||||
add_library(att_plugin SHARED ${FILE_SOURCES} ${ROCPROFILER_UTIL_SRC_FILES})
|
||||
|
||||
set_target_properties(
|
||||
att_plugin
|
||||
PROPERTIES CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
|
||||
att_plugin
|
||||
PROPERTIES CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
|
||||
|
||||
target_compile_definitions(att_plugin PRIVATE HIP_PROF_HIP_API_STRING=1
|
||||
__HIP_PLATFORM_HCC__=1)
|
||||
|
||||
target_include_directories(
|
||||
att_plugin PRIVATE ${PROJECT_SOURCE_DIR}
|
||||
${CMAKE_CURRENT_SOURCE_DIR})
|
||||
target_include_directories(att_plugin PRIVATE ${PROJECT_SOURCE_DIR}
|
||||
${CMAKE_CURRENT_SOURCE_DIR})
|
||||
target_link_options(
|
||||
att_plugin PRIVATE
|
||||
-Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
-Wl,--no-undefined)
|
||||
target_link_libraries(att_plugin PRIVATE rocprofiler-v2
|
||||
hsa-runtime64::hsa-runtime64 stdc++fs)
|
||||
att_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
-Wl,--no-undefined)
|
||||
target_link_libraries(att_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64
|
||||
stdc++fs)
|
||||
|
||||
install(TARGETS att_plugin
|
||||
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT asan)
|
||||
install(TARGETS att_plugin
|
||||
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT runtime)
|
||||
install(TARGETS att_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT asan)
|
||||
install(TARGETS att_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT runtime)
|
||||
|
||||
configure_file(att.py att/att.py COPYONLY)
|
||||
configure_file(trace_view.py att/trace_view.py COPYONLY)
|
||||
@@ -64,7 +60,7 @@ configure_file(ui/logo.svg att/ui/logo.svg COPYONLY)
|
||||
configure_file(ui/styles.css att/ui/styles.css COPYONLY)
|
||||
configure_file(ui/httpserver.py att/ui/httpserver.py COPYONLY)
|
||||
install(
|
||||
DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/att
|
||||
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/rocprofiler
|
||||
USE_SOURCE_PERMISSIONS
|
||||
COMPONENT runtime)
|
||||
DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/att
|
||||
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/rocprofiler
|
||||
USE_SOURCE_PERMISSIONS
|
||||
COMPONENT runtime)
|
||||
|
||||
@@ -54,11 +54,12 @@ class att_plugin_t {
|
||||
att_plugin_t() {
|
||||
std::vector<const char*> mpivars = {"MPI_RANK", "OMPI_COMM_WORLD_RANK", "MV2_COMM_WORLD_RANK"};
|
||||
|
||||
for (const char* envvar : mpivars) if (const char* env = getenv(envvar)) {
|
||||
MPI_RANK = atoi(env);
|
||||
MPI_ENABLE = true;
|
||||
break;
|
||||
}
|
||||
for (const char* envvar : mpivars)
|
||||
if (const char* env = getenv(envvar)) {
|
||||
MPI_RANK = atoi(env);
|
||||
MPI_ENABLE = true;
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
bool MPI_ENABLE = false;
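The constructor above probes three environment variables in order and takes the first one that is set as the MPI rank. A standalone sketch of the same lookup follows. The function name `detect_mpi_rank` is hypothetical, and the sketch deliberately differs from the plugin in one respect: it parses with `strtol` and skips values that are not a non-negative integer, whereas the code in the diff uses `atoi` and stops at the first variable that is set.

```cpp
#include <climits>
#include <cstdlib>
#include <optional>

// Returns the rank from the first launcher variable holding a valid non-negative
// integer, or std::nullopt when none does (i.e. the process is not an MPI rank).
std::optional<int> detect_mpi_rank() {
  static const char* const kRankVars[] = {"MPI_RANK", "OMPI_COMM_WORLD_RANK",
                                          "MV2_COMM_WORLD_RANK"};
  for (const char* var : kRankVars) {
    const char* value = std::getenv(var);
    if (value == nullptr) continue;  // variable not set, try the next launcher

    char* end = nullptr;
    const long rank = std::strtol(value, &end, 10);
    const bool parsed_all = (end != value) && (*end == '\0');
    if (parsed_all && rank >= 0 && rank <= INT_MAX) return static_cast<int>(rank);
    // Set but malformed: fall through and try the remaining variables.
  }
  return std::nullopt;
}
```

Returning an optional folds the `MPI_ENABLE` flag and the `MPI_RANK` value into a single result, so a caller cannot read a rank that was never set.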
|
||||
@@ -92,16 +93,15 @@ class att_plugin_t {
|
||||
std::string name_demangled =
|
||||
rocprofiler::truncate_name(rocprofiler::cxx_demangle(kernel_name_c));
|
||||
|
||||
if (name_demangled.size() > ATT_FILENAME_MAXBYTES) // Limit filename size
|
||||
if (name_demangled.size() > ATT_FILENAME_MAXBYTES) // Limit filename size
|
||||
name_demangled = name_demangled.substr(0, ATT_FILENAME_MAXBYTES);
|
||||
|
||||
std::string outfilepath = ".";
|
||||
if (const char* env = getenv("OUTPUT_PATH"))
|
||||
outfilepath = std::string(env);
|
||||
if (const char* env = getenv("OUTPUT_PATH")) outfilepath = std::string(env);
|
||||
|
||||
outfilepath.reserve(outfilepath.size()+128); // Max filename size
|
||||
outfilepath += '/'+name_demangled;
|
||||
if (MPI_ENABLE) outfilepath += "_rank"+std::to_string(MPI_RANK);
|
||||
outfilepath.reserve(outfilepath.size() + 128); // Max filename size
|
||||
outfilepath += '/' + name_demangled;
|
||||
if (MPI_ENABLE) outfilepath += "_rank" + std::to_string(MPI_RANK);
|
||||
outfilepath += "_v";
|
||||
|
||||
// Find if this filename already exists. If so, increment vname.
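The version search itself falls outside this hunk, so the following is only a sketch of the idea the comment describes: truncate the kernel name, append `_v`, and increment a version number until no file with that prefix exists. The function name `claim_versioned_prefix`, the use of `std::filesystem`, and the choice of the `_kernel.txt` file as the existence marker are all assumptions made for illustration; the MPI rank suffix is left out for brevity.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Builds "<dir>/<name>_v<N>" using the lowest N for which "<prefix>_kernel.txt" does
// not exist yet, then creates that marker file so the next call picks N + 1.
std::string claim_versioned_prefix(const fs::path& dir, std::string kernel_name,
                                   std::size_t max_name_bytes) {
  // Limit the filename size, as the plugin does with ATT_FILENAME_MAXBYTES.
  if (kernel_name.size() > max_name_bytes) kernel_name.resize(max_name_bytes);

  const std::string base = (dir / kernel_name).string() + "_v";

  int version = 0;
  while (fs::exists(base + std::to_string(version) + "_kernel.txt")) ++version;

  const std::string prefix = base + std::to_string(version);
  std::ofstream(prefix + "_kernel.txt") << kernel_name << '\n';
  return prefix;
}
```

The check-then-create sequence is not atomic, so two processes writing to the same directory could still pick the same version; the per-rank suffix used by the plugin reduces that risk for MPI jobs.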
|
||||
@@ -113,9 +113,9 @@ class att_plugin_t {
|
||||
auto dispatch_id = att_tracer_record->header.id.handle;
|
||||
|
||||
std::string fname = outfilepath + "_kernel.txt";
|
||||
std::ofstream(fname.c_str()) << name_demangled << " dispatch[" << dispatch_id
|
||||
<< "] GPU[" << att_tracer_record->gpu_id.handle
|
||||
<< "]: " << kernel_name_c << '\n';
|
||||
std::ofstream(fname.c_str()) << name_demangled << " dispatch[" << dispatch_id << "] GPU["
|
||||
<< att_tracer_record->gpu_id.handle << "]: " << kernel_name_c
|
||||
<< '\n';
|
||||
|
||||
// iterate over each shader engine att trace
|
||||
int se_num = att_tracer_record->shader_engine_data_count;
|
||||
|
||||
@@ -25,23 +25,25 @@ file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
|
||||
file(GLOB CLI_SOURCES "*.cpp")
|
||||
add_library(cli_plugin SHARED ${CLI_SOURCES} ${ROCPROFILER_UTIL_SRC_FILES})
|
||||
|
||||
set_target_properties(cli_plugin PROPERTIES
|
||||
CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
|
||||
set_target_properties(
|
||||
cli_plugin
|
||||
PROPERTIES CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
|
||||
|
||||
target_compile_definitions(cli_plugin
|
||||
PRIVATE HIP_PROF_HIP_API_STRING=1 __HIP_PLATFORM_HCC__=1)
|
||||
target_compile_definitions(cli_plugin PRIVATE HIP_PROF_HIP_API_STRING=1
|
||||
__HIP_PLATFORM_HCC__=1)
|
||||
|
||||
target_include_directories(cli_plugin PRIVATE ${PROJECT_SOURCE_DIR})
|
||||
|
||||
target_link_options(cli_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap -Wl,--no-undefined)
|
||||
target_link_options(
|
||||
cli_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
-Wl,--no-undefined)
|
||||
|
||||
target_link_libraries(cli_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64 stdc++fs atomic amd_comgr dl)
|
||||
target_link_libraries(cli_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64
|
||||
stdc++fs atomic amd_comgr dl)
|
||||
|
||||
install(TARGETS cli_plugin LIBRARY
|
||||
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT asan)
|
||||
install(TARGETS cli_plugin LIBRARY
|
||||
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT runtime)
|
||||
install(TARGETS cli_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT asan)
|
||||
install(TARGETS cli_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT runtime)
|
||||
|
||||
@@ -1,76 +1,84 @@
|
||||
################################################################################
|
||||
## Copyright (c) 2022 Advanced Micro Devices, Inc.
|
||||
##
|
||||
## Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
## of this software and associated documentation files (the "Software"), to
|
||||
## deal in the Software without restriction, including without limitation the
|
||||
## rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
|
||||
## sell copies of the Software, and to permit persons to whom the Software is
|
||||
## furnished to do so, subject to the following conditions:
|
||||
##
|
||||
## The above copyright notice and this permission notice shall be included in
|
||||
## all copies or substantial portions of the Software.
|
||||
##
|
||||
## THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
## IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
## FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
## AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
## LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
## FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
||||
## IN THE SOFTWARE.
|
||||
# Copyright (c) 2022 Advanced Micro Devices, Inc.
|
||||
#
|
||||
# Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
# of this software and associated documentation files (the "Software"), to
|
||||
# deal in the Software without restriction, including without limitation the
|
||||
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
|
||||
# sell copies of the Software, and to permit persons to whom the Software is
|
||||
# furnished to do so, subject to the following conditions:
|
||||
#
|
||||
# The above copyright notice and this permission notice shall be included in
|
||||
# all copies or substantial portions of the Software.
|
||||
#
|
||||
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
||||
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
||||
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
||||
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
||||
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
||||
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
||||
# IN THE SOFTWARE.
|
||||
################################################################################
|
||||
|
||||
# Plugin shared object.
|
||||
add_library(ctf_plugin SHARED
|
||||
ctf.cpp
|
||||
plugin.cpp
|
||||
barectf.c "${CMAKE_CURRENT_BINARY_DIR}/barectf.h"
|
||||
${PROJECT_SOURCE_DIR}/src/utils/helper.cpp
|
||||
hsa_begin.cpp.i hsa_end.cpp.i
|
||||
hip_begin.cpp.i hip_end.cpp.i)
|
||||
set_target_properties(ctf_plugin PROPERTIES
|
||||
CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/../exportmap"
|
||||
LIBRARY_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}/lib/rocprofiler")
|
||||
add_library(
|
||||
ctf_plugin SHARED
|
||||
ctf.cpp
|
||||
plugin.cpp
|
||||
barectf.c
|
||||
"${CMAKE_CURRENT_BINARY_DIR}/barectf.h"
|
||||
${PROJECT_SOURCE_DIR}/src/utils/helper.cpp
|
||||
hsa_begin.cpp.i
|
||||
hsa_end.cpp.i
|
||||
hip_begin.cpp.i
|
||||
hip_end.cpp.i)
|
||||
set_target_properties(
|
||||
ctf_plugin
|
||||
PROPERTIES CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/../exportmap"
|
||||
LIBRARY_OUTPUT_DIRECTORY "${PROJECT_BINARY_DIR}/lib/rocprofiler")
|
||||
set(METADATA_STREAM_FILE_DIR "${CMAKE_INSTALL_DATADIR}/${PROJECT_NAME}/plugin/ctf")
|
||||
target_compile_definitions(ctf_plugin PUBLIC AMD_INTERNAL_BUILD PRIVATE
|
||||
HIP_PROF_HIP_API_STRING=1
|
||||
__HIP_PLATFORM_HCC__=1
|
||||
CTF_PLUGIN_METADATA_FILE_PATH="${CMAKE_INSTALL_PREFIX}/${METADATA_STREAM_FILE_DIR}/metadata")
|
||||
target_include_directories(ctf_plugin PRIVATE
|
||||
"${PROJECT_SOURCE_DIR}"
|
||||
"${CMAKE_BINARY_DIR}/src/api"
|
||||
"${CMAKE_CURRENT_BINARY_DIR}")
|
||||
target_link_options(ctf_plugin PRIVATE
|
||||
"-Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap"
|
||||
-Wl,--no-undefined)
|
||||
target_link_libraries(ctf_plugin PRIVATE
|
||||
rocprofiler-v2
|
||||
hsa-runtime64::hsa-runtime64
|
||||
stdc++fs
|
||||
dl)
|
||||
install(TARGETS ctf_plugin LIBRARY
|
||||
DESTINATION "${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}"
|
||||
COMPONENT plugins)
|
||||
target_compile_definitions(
|
||||
ctf_plugin
|
||||
PUBLIC AMD_INTERNAL_BUILD
|
||||
PRIVATE
|
||||
HIP_PROF_HIP_API_STRING=1
|
||||
__HIP_PLATFORM_HCC__=1
|
||||
CTF_PLUGIN_METADATA_FILE_PATH="${CMAKE_INSTALL_PREFIX}/${METADATA_STREAM_FILE_DIR}/metadata"
|
||||
)
|
||||
target_include_directories(
|
||||
ctf_plugin PRIVATE "${PROJECT_SOURCE_DIR}" "${CMAKE_BINARY_DIR}/src/api"
|
||||
"${CMAKE_CURRENT_BINARY_DIR}")
|
||||
target_link_options(
|
||||
ctf_plugin PRIVATE "-Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap"
|
||||
-Wl,--no-undefined)
|
||||
target_link_libraries(ctf_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64
|
||||
stdc++fs dl)
|
||||
install(TARGETS ctf_plugin LIBRARY DESTINATION "${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}"
|
||||
COMPONENT plugins)
|
||||
|
||||
# `gen_api_files.py` and `gen_env_yaml.py` require Python 3,
|
||||
# CppHeaderParser, PyYAML, and barectf.
|
||||
find_package(Python3 COMPONENTS Interpreter REQUIRED)
|
||||
# `gen_api_files.py` and `gen_env_yaml.py` require Python 3, CppHeaderParser, PyYAML, and
|
||||
# barectf.
|
||||
find_package(
|
||||
Python3
|
||||
COMPONENTS Interpreter
|
||||
REQUIRED)
|
||||
|
||||
message("Python: ${Python3_EXECUTABLE}")
|
||||
|
||||
execute_process(COMMAND "${Python3_EXECUTABLE}" -c "print('hello')")
|
||||
|
||||
function(check_py3_pkg pkg_name)
|
||||
execute_process(COMMAND "${Python3_EXECUTABLE}" -c "import ${pkg_name}"
|
||||
RESULT_VARIABLE PY3_IMPORT_RES
|
||||
OUTPUT_QUIET)
|
||||
execute_process(
|
||||
COMMAND "${Python3_EXECUTABLE}" -c "import ${pkg_name}"
|
||||
RESULT_VARIABLE PY3_IMPORT_RES
|
||||
OUTPUT_QUIET)
|
||||
|
||||
if(NOT (${PY3_IMPORT_RES} EQUAL 0))
|
||||
message(FATAL_ERROR "Cannot find Python 3 package `${pkg_name}`")
|
||||
endif()
|
||||
if(NOT (${PY3_IMPORT_RES} EQUAL 0))
|
||||
message(FATAL_ERROR "Cannot find Python 3 package `${pkg_name}`")
|
||||
endif()
|
||||
|
||||
message(STATUS "Found Python 3 package `${pkg_name}`")
|
||||
message(STATUS "Found Python 3 package `${pkg_name}`")
|
||||
endfunction()
|
||||
|
||||
check_py3_pkg(CppHeaderParser)
|
||||
@@ -78,82 +86,76 @@ check_py3_pkg(yaml)
|
||||
find_program(BARECTF_RES barectf REQUIRED HINTS "$ENV{HOME}/.local/bin")
|
||||
|
||||
# Generate barectf YAML and C++ files for HSA API.
|
||||
get_property(HSA_RUNTIME_INCLUDE_DIRS
|
||||
TARGET hsa-runtime64::hsa-runtime64
|
||||
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
|
||||
find_file(HSA_H hsa.h
|
||||
PATHS ${HSA_RUNTIME_INCLUDE_DIRS}
|
||||
PATH_SUFFIXES hsa
|
||||
NO_DEFAULT_PATH
|
||||
REQUIRED)
|
||||
get_property(
|
||||
HSA_RUNTIME_INCLUDE_DIRS
|
||||
TARGET hsa-runtime64::hsa-runtime64
|
||||
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
|
||||
find_file(
|
||||
HSA_H hsa.h
|
||||
PATHS ${HSA_RUNTIME_INCLUDE_DIRS}
|
||||
PATH_SUFFIXES hsa
|
||||
NO_DEFAULT_PATH REQUIRED)
|
||||
get_filename_component(HSA_RUNTIME_INC_PATH "${HSA_H}" DIRECTORY)
|
||||
add_custom_command(
|
||||
OUTPUT hsa_erts.yaml hsa_begin.cpp.i hsa_end.cpp.i
|
||||
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa.h" -o hsa.h.i
|
||||
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
|
||||
-o hsa_ext_amd.h.i
|
||||
COMMAND ${CMAKE_COMMAND} -E cat hsa.h.i
|
||||
hsa_ext_amd.h.i
|
||||
"${CMAKE_BINARY_DIR}/src/api/hsa_prof_str.h"
|
||||
> hsa_input.h
|
||||
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py"
|
||||
hsa hsa_input.h
|
||||
BYPRODUCTS hsa.h.i hsa_ext_amd.h.i hsa_input.h
|
||||
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py"
|
||||
"${HSA_RUNTIME_INC_PATH}/hsa.h"
|
||||
"${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
|
||||
"${CMAKE_BINARY_DIR}/src/api/hsa_prof_str.h"
|
||||
COMMENT "Generating HSA API files for the `ctf` plugin...")
|
||||
OUTPUT hsa_erts.yaml hsa_begin.cpp.i hsa_end.cpp.i
|
||||
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa.h" -o hsa.h.i
|
||||
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h" -o
|
||||
hsa_ext_amd.h.i
|
||||
COMMAND ${CMAKE_COMMAND} -E cat hsa.h.i hsa_ext_amd.h.i
|
||||
"${CMAKE_BINARY_DIR}/src/api/hsa_prof_str.h" > hsa_input.h
|
||||
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py" hsa
|
||||
hsa_input.h
|
||||
BYPRODUCTS hsa.h.i hsa_ext_amd.h.i hsa_input.h
|
||||
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py" "${HSA_RUNTIME_INC_PATH}/hsa.h"
|
||||
"${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
|
||||
"${CMAKE_BINARY_DIR}/src/api/hsa_prof_str.h"
|
||||
COMMENT "Generating HSA API files for the `ctf` plugin...")
|
||||
|
||||
# Generate barectf YAML and C++ files for HIP API.
|
||||
get_property(HIP_INCLUDE_DIRS TARGET hip::amdhip64
|
||||
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
|
||||
find_file(HIP_RUNTIME_API_H hip_runtime_api.h
|
||||
PATHS ${HIP_INCLUDE_DIRS}
|
||||
PATH_SUFFIXES hip
|
||||
NO_DEFAULT_PATH
|
||||
REQUIRED)
|
||||
find_file(HIP_PROF_STR_H hip_prof_str.h
|
||||
PATHS ${HIP_INCLUDE_DIRS}
|
||||
PATH_SUFFIXES hip hip/amd_detail
|
||||
NO_DEFAULT_PATH
|
||||
REQUIRED)
|
||||
get_property(
|
||||
HIP_INCLUDE_DIRS
|
||||
TARGET hip::amdhip64
|
||||
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
|
||||
find_file(
|
||||
HIP_RUNTIME_API_H hip_runtime_api.h
|
||||
PATHS ${HIP_INCLUDE_DIRS}
|
||||
PATH_SUFFIXES hip
|
||||
NO_DEFAULT_PATH REQUIRED)
|
||||
find_file(
|
||||
HIP_PROF_STR_H hip_prof_str.h
|
||||
PATHS ${HIP_INCLUDE_DIRS}
|
||||
PATH_SUFFIXES hip hip/amd_detail
|
||||
NO_DEFAULT_PATH REQUIRED)
|
||||
list(TRANSFORM HIP_INCLUDE_DIRS PREPEND -I)
|
||||
add_custom_command(
|
||||
OUTPUT hip_erts.yaml hip_begin.cpp.i hip_end.cpp.i
|
||||
COMMAND ${CMAKE_C_COMPILER} ${HIP_INCLUDE_DIRS}
|
||||
-E "${HIP_RUNTIME_API_H}"
|
||||
-D__HIP_PLATFORM_HCC__=1
|
||||
-D__HIP_ROCclr__=1
|
||||
-o hip_runtime_api.h.i
|
||||
COMMAND cat hip_runtime_api.h.i "${HIP_PROF_STR_H}" > hip_input.h
|
||||
BYPRODUCTS hip_runtime_api.h.i hip_input.h
|
||||
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py"
|
||||
hip hip_input.h
|
||||
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py"
|
||||
"${HIP_RUNTIME_API_H}"
|
||||
"${HIP_PROF_STR_H}"
|
||||
COMMENT "Generating HIP API files for the `ctf` plugin...")
|
||||
OUTPUT hip_erts.yaml hip_begin.cpp.i hip_end.cpp.i
|
||||
COMMAND ${CMAKE_C_COMPILER} ${HIP_INCLUDE_DIRS} -E "${HIP_RUNTIME_API_H}"
|
||||
-D__HIP_PLATFORM_HCC__=1 -D__HIP_ROCclr__=1 -o hip_runtime_api.h.i
|
||||
COMMAND cat hip_runtime_api.h.i "${HIP_PROF_STR_H}" > hip_input.h
|
||||
BYPRODUCTS hip_runtime_api.h.i hip_input.h
|
||||
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py" hip
|
||||
hip_input.h
|
||||
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_api_files.py" "${HIP_RUNTIME_API_H}"
|
||||
"${HIP_PROF_STR_H}"
|
||||
COMMENT "Generating HIP API files for the `ctf` plugin...")
|
||||
|
||||
# Generate `env.yaml` (trace environment for barectf).
|
||||
add_custom_command(
|
||||
OUTPUT env.yaml
|
||||
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_env_yaml.py"
|
||||
${PROJECT_VERSION}
|
||||
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_env_yaml.py"
|
||||
COMMENT "Generating `env.yaml`...")
|
||||
OUTPUT env.yaml
|
||||
COMMAND "${Python3_EXECUTABLE}" "${CMAKE_CURRENT_SOURCE_DIR}/gen_env_yaml.py"
|
||||
${PROJECT_VERSION}
|
||||
DEPENDS "${CMAKE_CURRENT_SOURCE_DIR}/gen_env_yaml.py"
|
||||
COMMENT "Generating `env.yaml`...")
|
||||
|
||||
# Generate raw CTF tracer with barectf.
|
||||
add_custom_command(
|
||||
OUTPUT barectf.c barectf.h barectf-bitfield.h metadata
|
||||
COMMAND "${BARECTF_RES}" gen "-I${CMAKE_CURRENT_BINARY_DIR}"
|
||||
"-I${CMAKE_CURRENT_SOURCE_DIR}"
|
||||
"${CMAKE_CURRENT_SOURCE_DIR}/config.yaml"
|
||||
DEPENDS hsa_erts.yaml
|
||||
hip_erts.yaml
|
||||
env.yaml
|
||||
"${CMAKE_CURRENT_SOURCE_DIR}/config.yaml"
|
||||
"${CMAKE_CURRENT_SOURCE_DIR}/dst_base.yaml"
|
||||
COMMENT "Generating raw CTF tracer with barectf...")
|
||||
install(FILES "${CMAKE_CURRENT_BINARY_DIR}/metadata"
|
||||
DESTINATION "${METADATA_STREAM_FILE_DIR}" COMPONENT plugins)
|
||||
OUTPUT barectf.c barectf.h barectf-bitfield.h metadata
|
||||
COMMAND "${BARECTF_RES}" gen "-I${CMAKE_CURRENT_BINARY_DIR}"
|
||||
"-I${CMAKE_CURRENT_SOURCE_DIR}" "${CMAKE_CURRENT_SOURCE_DIR}/config.yaml"
|
||||
DEPENDS hsa_erts.yaml hip_erts.yaml env.yaml "${CMAKE_CURRENT_SOURCE_DIR}/config.yaml"
|
||||
"${CMAKE_CURRENT_SOURCE_DIR}/dst_base.yaml"
|
||||
COMMENT "Generating raw CTF tracer with barectf...")
|
||||
install(
|
||||
FILES "${CMAKE_CURRENT_BINARY_DIR}/metadata"
|
||||
DESTINATION "${METADATA_STREAM_FILE_DIR}"
|
||||
COMPONENT plugins)
|
||||
|
||||
@@ -156,9 +156,8 @@ class HsaApiEventRecord : public TracerEventRecord<barectf_hsa_api_ctx> {
 const rocprofiler_session_id_t session_id,
 const std::uint64_t clock_val)
 : TracerEventRecord<barectf_hsa_api_ctx>{record, clock_val} {
-if(record.api_data.hsa)
-api_data_ = *(record.api_data.hsa);
-}
+if (record.api_data.hsa) api_data_ = *(record.api_data.hsa);
+}
 explicit HsaApiEventRecord(const rocprofiler_record_tracer_t& record,
 const std::uint64_t clock_val, hsa_api_data_t& api_data)
 : TracerEventRecord<barectf_hsa_api_ctx>{record, clock_val}, api_data_(api_data) {}
|
||||
@@ -206,7 +205,7 @@ class HipApiEventRecord : public TracerEventRecord<barectf_hip_api_ctx> {
 const rocprofiler_session_id_t session_id,
 const std::uint64_t clock_val)
 : TracerEventRecord<barectf_hip_api_ctx>{record, clock_val},
-api_data_{record.api_data.hip? *(record.api_data.hip) : hip_api_data_t{}},
+api_data_{record.api_data.hip ? *(record.api_data.hip) : hip_api_data_t{}},
 kernel_name_{record.name ? record.name : std::string{}} {}
 explicit HipApiEventRecord(const rocprofiler_record_tracer_t& record,
 const std::uint64_t clock_val, hip_api_data_t& api_data,
|
||||
@@ -760,16 +759,11 @@ std::uint64_t GetMetadataClkClsOffset() {

 static const char* LOOP_MPI_RANK(const std::vector<const char*>& mpivars) {
 for (const char* env : mpivars)
-if (const char* envvar = getenv(env))
-return envvar;
+if (const char* envvar = getenv(env)) return envvar;
 return nullptr;
 }

-static void insert_meta_to_stream(
-std::stringstream& stream,
-const char* field,
-const char* value
-) {
+static void insert_meta_to_stream(std::stringstream& stream, const char* field, const char* value) {
 if (!field || !value) return;
 stream << "\n\t" << std::string(field) << " = " << std::string(value) << ';';
 }
|
||||
@@ -802,7 +796,7 @@ void Plugin::CopyAdjustedMetadataStreamFile(const fs::path& metadata_stream_path
 std::string data_ins = data_stream.str();
 size_t env_pos = metadata.find("env {");
 if (env_pos != std::string::npos)
-metadata.insert(metadata.begin()+env_pos+5, data_ins.begin(), data_ins.end());
+metadata.insert(metadata.begin() + env_pos + 5, data_ins.begin(), data_ins.end());
 else
 std::cerr << "Failed to insert MPI metadata!" << std::endl;
 }
|
||||
|
||||
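The hunk above splices extra fields into the metadata text immediately after the `env {` marker; the literal `5` is the length of that marker. A minimal sketch of the same idea, assuming hypothetical helper names (`append_env_field`, `insert_into_env_block`) and a hypothetical field name, with the marker length derived instead of hard-coded:

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: formats one `field = value;` entry and skips null
// inputs, mirroring the null check in insert_meta_to_stream.
void append_env_field(std::stringstream& stream, const char* field, const char* value) {
  if (!field || !value) return;
  stream << "\n\t" << field << " = " << value << ';';
}

// Hypothetical helper: inserts `fields` right after the first "env {" in
// `metadata`. Returns false and leaves `metadata` untouched when no such
// block exists.
bool insert_into_env_block(std::string& metadata, const std::string& fields) {
  static const std::string marker = "env {";
  const std::size_t pos = metadata.find(marker);
  if (pos == std::string::npos) return false;
  metadata.insert(pos + marker.size(), fields);
  return true;
}
```

Returning a `bool` rather than printing inside the helper is a design choice of this sketch only; the commit itself reports the failure on `std::cerr`.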
@@ -25,23 +25,25 @@ file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
|
||||
file(GLOB FILE_SOURCES "*.cpp")
|
||||
add_library(file_plugin SHARED ${FILE_SOURCES} ${ROCPROFILER_UTIL_SRC_FILES})
|
||||
|
||||
set_target_properties(file_plugin PROPERTIES
|
||||
CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
|
||||
set_target_properties(
|
||||
file_plugin
|
||||
PROPERTIES CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
|
||||
|
||||
target_compile_definitions(file_plugin
|
||||
PRIVATE HIP_PROF_HIP_API_STRING=1 __HIP_PLATFORM_HCC__=1)
|
||||
target_compile_definitions(file_plugin PRIVATE HIP_PROF_HIP_API_STRING=1
|
||||
__HIP_PLATFORM_HCC__=1)
|
||||
|
||||
target_include_directories(file_plugin PRIVATE ${PROJECT_SOURCE_DIR})
|
||||
|
||||
target_link_options(file_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap -Wl,--no-undefined)
|
||||
target_link_options(
|
||||
file_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
-Wl,--no-undefined)
|
||||
|
||||
target_link_libraries(file_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64 stdc++fs amd_comgr dl)
|
||||
target_link_libraries(file_plugin PRIVATE rocprofiler-v2 hsa-runtime64::hsa-runtime64
|
||||
stdc++fs amd_comgr dl)
|
||||
|
||||
install(TARGETS file_plugin LIBRARY
|
||||
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT asan)
|
||||
install(TARGETS file_plugin LIBRARY
|
||||
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT runtime)
|
||||
install(TARGETS file_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT asan)
|
||||
install(TARGETS file_plugin LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT runtime)
|
||||
|
||||
@@ -216,8 +216,7 @@ class file_plugin_t {
 case ACTIVITY_DOMAIN_HIP_API: {
 if (hip_api_header_written_.load(std::memory_order_relaxed)) return;
 output_file = get_output_file(output_type_t::TRACER, ACTIVITY_DOMAIN_HIP_API);
-*output_file << "Domain,Function,Start_Timestamp,End_Timestamp,Correlation_ID"
-<< std::endl;
+*output_file << "Domain,Function,Start_Timestamp,End_Timestamp,Correlation_ID" << std::endl;
 *output_file << std::endl;
 hip_api_header_written_.exchange(true, std::memory_order_release);
 return;
|
||||
|
||||
@@ -1,27 +1,27 @@
|
||||
file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
|
||||
|
||||
add_library(perfetto_plugin
|
||||
${LIBRARY_TYPE} ${ROCPROFILER_UTIL_SRC_FILES}
|
||||
perfetto.cpp perfetto_sdk/sdk/perfetto.cc)
|
||||
add_library(perfetto_plugin ${LIBRARY_TYPE} ${ROCPROFILER_UTIL_SRC_FILES} perfetto.cpp
|
||||
perfetto_sdk/sdk/perfetto.cc)
|
||||
|
||||
set_target_properties(perfetto_plugin PROPERTIES
|
||||
CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
|
||||
set_target_properties(
|
||||
perfetto_plugin
|
||||
PROPERTIES CXX_VISIBILITY_PRESET hidden
|
||||
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR}/lib/rocprofiler)
|
||||
|
||||
target_compile_definitions(perfetto_plugin
|
||||
PRIVATE HIP_PROF_HIP_API_STRING=1
|
||||
__HIP_PLATFORM_HCC__=1)
|
||||
target_compile_definitions(perfetto_plugin PRIVATE HIP_PROF_HIP_API_STRING=1
|
||||
__HIP_PLATFORM_HCC__=1)
|
||||
|
||||
target_include_directories(perfetto_plugin
|
||||
PRIVATE ${PROJECT_SOURCE_DIR}
|
||||
${PROJECT_SOURCE_DIR}/plugin/perfetto/perfetto_sdk/sdk)
|
||||
target_include_directories(
|
||||
perfetto_plugin PRIVATE ${PROJECT_SOURCE_DIR}
|
||||
${PROJECT_SOURCE_DIR}/plugin/perfetto/perfetto_sdk/sdk)
|
||||
|
||||
target_link_options(perfetto_plugin
|
||||
PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap -Wl,--no-undefined)
|
||||
target_link_options(
|
||||
perfetto_plugin PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/../exportmap
|
||||
-Wl,--no-undefined)
|
||||
|
||||
target_link_libraries(perfetto_plugin PRIVATE rocprofiler-v2 Threads::Threads stdc++fs amd_comgr)
|
||||
target_link_libraries(perfetto_plugin PRIVATE rocprofiler-v2 Threads::Threads stdc++fs
|
||||
amd_comgr)
|
||||
|
||||
install(TARGETS perfetto_plugin LIBRARY
|
||||
DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME}
|
||||
COMPONENT plugins)
|
||||
install(TARGETS perfetto_plugin
|
||||
LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}/${PROJECT_NAME} COMPONENT plugins)
|
||||
|
||||
@@ -556,8 +556,7 @@ class perfetto_plugin_t {
 if (tracer_record.name) {
 kernel_name = rocprofiler::cxx_demangle(tracer_record.name);
 TRACE_EVENT_BEGIN(
-"HIP_OPS",
-perfetto::StaticString(rocprofiler::truncate_name(kernel_name).c_str()),
+"HIP_OPS", perfetto::StaticString(rocprofiler::truncate_name(kernel_name).c_str()),
 gpu_track, tracer_record.timestamps.begin.value, "Agent ID",
 tracer_record.agent_id.handle, "Process ID", GetPid(), "Kernel Name", kernel_name,
 perfetto::Flow::ProcessScoped(tracer_record.correlation_id.value));
|
||||
|
||||
File diff suppressed because the file is too large.
File diff suppressed because the file is too large.
@@ -36,9 +36,10 @@
 #include "src/utils/helper.h"

 // Macro to check ROCProfiler calls status
-#define CHECK_ROCPROFILER(call) \
+#define CHECK_ROCPROFILER(call) \
 do { \
-if ((call) != ROCPROFILER_STATUS_SUCCESS) rocprofiler::fatal("Error: ROCProfiler API Call Error!"); \
+if ((call) != ROCPROFILER_STATUS_SUCCESS) \
+rocprofiler::fatal("Error: ROCProfiler API Call Error!"); \
 } while (false)

 namespace {
|
||||
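The `CHECK_ROCPROFILER` macro in the hunk above wraps its body in `do { ... } while (false)`, which makes the expansion a single statement that still requires a trailing semicolon. A minimal sketch of why that matters, using a hypothetical `Status` enum and `CHECK_STATUS` macro, and throwing an exception where the original calls `rocprofiler::fatal`:

```cpp
#include <stdexcept>

// Hypothetical status type standing in for the real API's status enum.
enum class Status { kSuccess, kError };

// The do { ... } while (false) wrapper makes the macro expand to exactly one
// statement, so it composes safely with an unbraced if/else at the call site.
#define CHECK_STATUS(call)                                 \
  do {                                                     \
    if ((call) != Status::kSuccess)                        \
      throw std::runtime_error("API call failed: " #call); \
  } while (false)

// Returns true when the check passes or is skipped, false when it throws.
bool passes_check(Status status, bool enabled) {
  try {
    if (enabled)
      CHECK_STATUS(status);  // one statement, so the else below binds to `if (enabled)`
    else
      return true;
  } catch (const std::runtime_error&) {
    return false;
  }
  return true;
}
```

Without the wrapper, a macro that expands to a bare `if` would capture the caller's `else`, and one that expands to a braced block followed by `;` would not compile in this position.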
@@ -48,8 +49,6 @@ namespace {
 return pid;
 }

-[[maybe_unused]] uint64_t GetMachineID() {
-return gethostid();
-}
+[[maybe_unused]] uint64_t GetMachineID() { return gethostid(); }

 } // namespace
|
||||
|
||||
@@ -26,9 +26,11 @@ set(ROCPROF_WRAPPER_BIN_DIR ${ROCPROF_WRAPPER_DIR}/bin)
|
||||
set(ROCPROF_WRAPPER_LIB_DIR ${ROCPROF_WRAPPER_DIR}/lib)
|
||||
set(ROCPROF_WRAPPER_TOOL_DIR ${ROCPROF_WRAPPER_DIR}/tool)
|
||||
|
||||
#Function to generate header template file
|
||||
# Function to generate header template file
|
||||
function(create_header_template)
|
||||
file(WRITE ${ROCPROF_WRAPPER_DIR}/header.hpp.in "/*
|
||||
file(
|
||||
WRITE ${ROCPROF_WRAPPER_DIR}/header.hpp.in
|
||||
"/*
|
||||
Copyright (c) 2022 Advanced Micro Devices, Inc. All rights reserved.
|
||||
|
||||
Permission is hereby granted, free of charge, to any person obtaining a copy
|
||||
@@ -69,105 +71,142 @@ function(create_header_template)
|
||||
#endif")
|
||||
endfunction()
|
||||
|
||||
#use header template file and generate wrapper header files
|
||||
# use header template file and generate wrapper header files
|
||||
function(generate_wrapper_header)
|
||||
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_INC_DIR})
|
||||
#find all header files from inc
|
||||
file(GLOB include_files ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/*.h)
|
||||
#Convert the list of files into #includes
|
||||
foreach(header_file ${include_files})
|
||||
#set include guard
|
||||
get_filename_component(INC_GAURD_NAME ${header_file} NAME_WE)
|
||||
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_INC_DIR})
|
||||
# find all header files from inc
|
||||
file(GLOB include_files ${CMAKE_CURRENT_SOURCE_DIR}/include/rocprofiler/*.h)
|
||||
# Convert the list of files into #includes
|
||||
foreach(header_file ${include_files})
|
||||
# set include guard
|
||||
get_filename_component(INC_GAURD_NAME ${header_file} NAME_WE)
|
||||
string(TOUPPER ${INC_GAURD_NAME} INC_GAURD_NAME)
|
||||
set(include_guard "${include_guard}ROCPROF_WRAPPER_INCLUDE_${INC_GAURD_NAME}_H")
|
||||
# set include statement
|
||||
get_filename_component(file_name ${header_file} NAME)
|
||||
set(include_statements
|
||||
"${include_statements}#include \"../../${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}/${file_name}\"\n"
|
||||
)
|
||||
configure_file(${ROCPROF_WRAPPER_DIR}/header.hpp.in
|
||||
${ROCPROF_WRAPPER_INC_DIR}/${file_name})
|
||||
unset(include_guard)
|
||||
unset(include_statements)
|
||||
endforeach()
|
||||
|
||||
# Only single file from ${CMAKE_CURRENT_SOURCE_DIR}/src/core/activity.h is packaged.
|
||||
# So drectly using that file name
|
||||
set(file_name "activity.h")
|
||||
# set include guard
|
||||
get_filename_component(INC_GAURD_NAME ${file_name} NAME_WE)
|
||||
string(TOUPPER ${INC_GAURD_NAME} INC_GAURD_NAME)
|
||||
set(include_guard "${include_guard}ROCPROF_WRAPPER_INCLUDE_${INC_GAURD_NAME}_H")
|
||||
#set include statement
|
||||
get_filename_component(file_name ${header_file} NAME)
|
||||
set(include_statements "${include_statements}#include \"../../${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}/${file_name}\"\n")
|
||||
configure_file(${ROCPROF_WRAPPER_DIR}/header.hpp.in ${ROCPROF_WRAPPER_INC_DIR}/${file_name})
|
||||
unset(include_guard)
|
||||
unset(include_statements)
|
||||
endforeach()
|
||||
|
||||
#Only single file from ${CMAKE_CURRENT_SOURCE_DIR}/src/core/activity.h is packaged. So drectly using that file name
|
||||
set(file_name "activity.h")
|
||||
#set include guard
|
||||
get_filename_component(INC_GAURD_NAME ${file_name} NAME_WE)
|
||||
string(TOUPPER ${INC_GAURD_NAME} INC_GAURD_NAME)
|
||||
set(include_guard "${include_guard}ROCPROF_WRAPPER_INCLUDE_${INC_GAURD_NAME}_H")
|
||||
set(include_statements "${include_statements}#include \"../../${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}/${file_name}\"\n")
|
||||
configure_file(${ROCPROF_WRAPPER_DIR}/header.hpp.in ${ROCPROF_WRAPPER_INC_DIR}/${file_name})
|
||||
set(include_statements
|
||||
"${include_statements}#include \"../../${CMAKE_INSTALL_INCLUDEDIR}/${ROCPROFILER_NAME}/${file_name}\"\n"
|
||||
)
|
||||
configure_file(${ROCPROF_WRAPPER_DIR}/header.hpp.in
|
||||
${ROCPROF_WRAPPER_INC_DIR}/${file_name})
|
||||
endfunction()
|
||||
|
||||
#function to create symlink to binaries
|
||||
# function to create symlink to binaries
|
||||
function(create_binary_symlink)
|
||||
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_BIN_DIR})
|
||||
#create symlink for rocprof
|
||||
set(file_name "rocprof")
|
||||
add_custom_target(link_${file_name} ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND ${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_BINDIR}/${file_name} ${ROCPROF_WRAPPER_BIN_DIR}/${file_name})
|
||||
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_BIN_DIR})
|
||||
# create symlink for rocprof
|
||||
set(file_name "rocprof")
|
||||
add_custom_target(
|
||||
link_${file_name} ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND
|
||||
${CMAKE_COMMAND} -E create_symlink ../../${CMAKE_INSTALL_BINDIR}/${file_name}
|
||||
${ROCPROF_WRAPPER_BIN_DIR}/${file_name})
|
||||
|
||||
endfunction()
|
||||
|
||||
#function to create symlink to libraries
|
||||
# function to create symlink to libraries
|
||||
function(create_library_symlink)
|
||||
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_LIB_DIR})
|
||||
set(LIB_ROCPROF "${ROCPROFILER_LIBRARY}.so")
|
||||
set(MAJ_VERSION "${LIB_VERSION_MAJOR}")
|
||||
set(SO_VERSION "${LIB_VERSION_STRING}")
|
||||
set(library_files "${LIB_ROCPROF}" "${LIB_ROCPROF}.${MAJ_VERSION}" "${LIB_ROCPROF}.${SO_VERSION}")
|
||||
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_LIB_DIR})
|
||||
set(LIB_ROCPROF "${ROCPROFILER_LIBRARY}.so")
|
||||
set(MAJ_VERSION "${LIB_VERSION_MAJOR}")
|
||||
set(SO_VERSION "${LIB_VERSION_STRING}")
|
||||
set(library_files "${LIB_ROCPROF}" "${LIB_ROCPROF}.${MAJ_VERSION}"
|
||||
"${LIB_ROCPROF}.${SO_VERSION}")
|
||||
|
||||
foreach(file_name ${library_files})
|
||||
add_custom_target(link_${file_name} ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND ${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${file_name} ${ROCPROF_WRAPPER_LIB_DIR}/${file_name})
|
||||
endforeach()
|
||||
#create symlink to rocprofiler/tool/libtool.so
|
||||
# With File reorg,tool renamed to rocprof-tool
|
||||
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_TOOL_DIR})
|
||||
set(LIB_TOOL "libtool.so")
|
||||
set(LIB_ROCPROFTOOL "librocprof-tool.so")
|
||||
add_custom_target(link_${LIB_TOOL} ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND ${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${LIB_ROCPROFTOOL} ${ROCPROF_WRAPPER_TOOL_DIR}/${LIB_TOOL})
|
||||
#create symlink to test binary
|
||||
#since its saved in lib folder , the code for the same is added here
|
||||
# With File reorg ,binary name changed from ctrl to rocprof-ctrl
|
||||
set(TEST_CTRL "ctrl")
|
||||
set(TEST_ROCPROFCTRL "rocprof-ctrl")
|
||||
add_custom_target(link_${TEST_CTRL} ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND ${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${TEST_ROCPROFCTRL} ${ROCPROF_WRAPPER_TOOL_DIR}/${TEST_CTRL})
|
||||
set(METRICS "metrics.xml")
|
||||
add_custom_target(link_metrics ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND ${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${METRICS} ${ROCPROF_WRAPPER_LIB_DIR}/${METRICS})
|
||||
foreach(file_name ${library_files})
|
||||
add_custom_target(
|
||||
link_${file_name} ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND
|
||||
${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${file_name}
|
||||
${ROCPROF_WRAPPER_LIB_DIR}/${file_name})
|
||||
endforeach()
|
||||
# create symlink to rocprofiler/tool/libtool.so With File reorg,tool renamed to
|
||||
# rocprof-tool
|
||||
file(MAKE_DIRECTORY ${ROCPROF_WRAPPER_TOOL_DIR})
|
||||
set(LIB_TOOL "libtool.so")
|
||||
set(LIB_ROCPROFTOOL "librocprof-tool.so")
|
||||
add_custom_target(
|
||||
link_${LIB_TOOL} ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND
|
||||
${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${LIB_ROCPROFTOOL}
|
||||
${ROCPROF_WRAPPER_TOOL_DIR}/${LIB_TOOL})
|
||||
# create symlink to test binary since its saved in lib folder , the code for the same
|
||||
# is added here With File reorg ,binary name changed from ctrl to rocprof-ctrl
|
||||
set(TEST_CTRL "ctrl")
|
||||
set(TEST_ROCPROFCTRL "rocprof-ctrl")
|
||||
add_custom_target(
|
||||
link_${TEST_CTRL} ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND
|
||||
${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${TEST_ROCPROFCTRL}
|
||||
${ROCPROF_WRAPPER_TOOL_DIR}/${TEST_CTRL})
|
||||
set(METRICS "metrics.xml")
|
||||
add_custom_target(
|
||||
link_metrics ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND
|
||||
${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${METRICS}
|
||||
${ROCPROF_WRAPPER_LIB_DIR}/${METRICS})
|
||||
|
||||
set(GFX_METRICS "gfx_metrics.xml")
|
||||
add_custom_target(link_gfx_metrics ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND ${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${GFX_METRICS} ${ROCPROF_WRAPPER_LIB_DIR}/${GFX_METRICS})
|
||||
set(GFX_METRICS "gfx_metrics.xml")
|
||||
add_custom_target(
|
||||
link_gfx_metrics ALL
|
||||
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
|
||||
COMMAND
|
||||
${CMAKE_COMMAND} -E create_symlink
|
||||
../../${CMAKE_INSTALL_LIBDIR}/${ROCPROFILER_NAME}/${GFX_METRICS}
|
||||
${ROCPROF_WRAPPER_LIB_DIR}/${GFX_METRICS})
|
||||
endfunction()
|
||||
|
||||
#Creater a template for header file
|
||||
# Creater a template for header file
|
||||
create_header_template()
|
||||
#Use template header file and generater wrapper header files
|
||||
# Use template header file and generater wrapper header files
|
||||
generate_wrapper_header()
|
||||
install(DIRECTORY ${ROCPROF_WRAPPER_INC_DIR} DESTINATION ${ROCPROFILER_NAME} COMPONENT dev)
|
||||
install(
|
||||
DIRECTORY ${ROCPROF_WRAPPER_INC_DIR}
|
||||
DESTINATION ${ROCPROFILER_NAME}
|
||||
COMPONENT dev)
|
||||
# Create symlink to binaries
|
||||
create_binary_symlink()
|
||||
install(DIRECTORY ${ROCPROF_WRAPPER_BIN_DIR} DESTINATION ${ROCPROFILER_NAME} COMPONENT runtime)
|
||||
install(
|
||||
DIRECTORY ${ROCPROF_WRAPPER_BIN_DIR}
|
||||
DESTINATION ${ROCPROFILER_NAME}
|
||||
COMPONENT runtime)
|
||||
create_library_symlink()
|
||||
install(DIRECTORY ${ROCPROF_WRAPPER_LIB_DIR} DESTINATION ${ROCPROFILER_NAME}
|
||||
COMPONENT runtime
|
||||
PATTERN ${ROCPROFILER_LIBRARY}.so EXCLUDE)
|
||||
install(FILES ${ROCPROF_WRAPPER_LIB_DIR}/${ROCPROFILER_LIBRARY}.so DESTINATION ${ROCPROFILER_NAME}/lib
|
||||
COMPONENT dev)
|
||||
#install tools directory
|
||||
install(DIRECTORY ${ROCPROF_WRAPPER_TOOL_DIR} DESTINATION ${ROCPROFILER_NAME} COMPONENT runtime)
|
||||
install(
|
||||
DIRECTORY ${ROCPROF_WRAPPER_LIB_DIR}
|
||||
DESTINATION ${ROCPROFILER_NAME}
|
||||
COMPONENT runtime
|
||||
PATTERN ${ROCPROFILER_LIBRARY}.so EXCLUDE)
|
||||
install(
|
||||
FILES ${ROCPROF_WRAPPER_LIB_DIR}/${ROCPROFILER_LIBRARY}.so
|
||||
DESTINATION ${ROCPROFILER_NAME}/lib
|
||||
COMPONENT dev)
|
||||
# install tools directory
|
||||
install(
|
||||
DIRECTORY ${ROCPROF_WRAPPER_TOOL_DIR}
|
||||
DESTINATION ${ROCPROFILER_NAME}
|
||||
COMPONENT runtime)
|
||||
|
||||
@@ -1,15 +1,18 @@
|
||||
include (CheckCSourceCompiles)
|
||||
# ############################################################################################################################################
|
||||
# ############################################################################################################################################
|
||||
include(CheckCSourceCompiles)
|
||||
# ########################################################################################
|
||||
# ########################################################################################
|
||||
# General Requirements
|
||||
# ############################################################################################################################################
|
||||
# ############################################################################################################################################
|
||||
get_property(HSA_RUNTIME_INCLUDE_DIRECTORIES TARGET hsa-runtime64::hsa-runtime64 PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
|
||||
find_file(HSA_H hsa.h
|
||||
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
|
||||
PATH_SUFFIXES hsa
|
||||
NO_DEFAULT_PATH
|
||||
REQUIRED)
|
||||
# ########################################################################################
|
||||
# ########################################################################################
|
||||
get_property(
|
||||
HSA_RUNTIME_INCLUDE_DIRECTORIES
|
||||
TARGET hsa-runtime64::hsa-runtime64
|
||||
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
|
||||
find_file(
|
||||
HSA_H hsa.h
|
||||
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
|
||||
PATH_SUFFIXES hsa
|
||||
NO_DEFAULT_PATH REQUIRED)
|
||||
get_filename_component(HSA_RUNTIME_INC_PATH ${HSA_H} DIRECTORY)
|
||||
include_directories(${HSA_RUNTIME_INC_PATH})
|
||||
|
||||
@@ -22,138 +25,179 @@ set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${ROCM_PATH}/lib/cmake/hip")
|
||||
set(CMAKE_HIP_ARCHITECTURES OFF)
|
||||
find_package(HIP REQUIRED MODULE)
|
||||
|
||||
find_package(Clang REQUIRED CONFIG
|
||||
PATHS "${ROCM_PATH}"
|
||||
PATH_SUFFIXES "llvm/lib/cmake/clang")
|
||||
find_package(
|
||||
Clang REQUIRED CONFIG
|
||||
PATHS "${ROCM_PATH}"
|
||||
PATH_SUFFIXES "llvm/lib/cmake/clang")
|
||||
|
||||
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake/modules" "${ROCM_PATH}/lib/cmake/hip")
|
||||
set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} "${CMAKE_SOURCE_DIR}/cmake/modules"
|
||||
"${ROCM_PATH}/lib/cmake/hip")
|
||||
find_package(LibElf REQUIRED)
|
||||
find_package(LibDw REQUIRED)
|
||||
|
||||
## Add a custom targets to build and run all the tests
|
||||
# Add a custom targets to build and run all the tests
|
||||
add_custom_target(samples)
|
||||
add_dependencies(samples rocprofiler-v2)
|
||||
add_custom_target(run-samples COMMAND ${PROJECT_BINARY_DIR}/samples/run_samples.sh DEPENDS samples)
|
||||
add_custom_target(
|
||||
run-samples
|
||||
COMMAND ${PROJECT_BINARY_DIR}/samples/run_samples.sh
|
||||
DEPENDS samples)
|
||||
|
||||
file(GLOB ROCPROFILER_UTIL_SRC_FILES ${PROJECT_SOURCE_DIR}/src/utils/helper.cpp)
|
||||
# ############################################################################################################################################
|
||||
# ########################################################################################
|
||||
|
||||
# ############################################################################################################################################
|
||||
# ############################################################################################################################################
|
||||
# ########################################################################################
|
||||
# ########################################################################################
|
||||
# Samples Build & Run Script
|
||||
# ############################################################################################################################################
|
||||
# ############################################################################################################################################
|
||||
# ########################################################################################
|
||||
# ########################################################################################
|
||||
|
||||
# ############################################################################################################################################
|
||||
# ########################################################################################
|
||||
# Profiler Samples
|
||||
# ############################################################################################################################################
|
||||
# ########################################################################################
|
||||
|
||||
## Build Kernel No Replay Sample
|
||||
set_source_files_properties(profiler/kernel_profiling_no_replay_sample.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
|
||||
hip_add_executable(profiler_kernel_no_replay profiler/kernel_profiling_no_replay_sample.cpp ${ROCPROFILER_UTIL_SRC_FILES})
|
||||
target_include_directories(profiler_kernel_no_replay PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
|
||||
# Build Kernel No Replay Sample
|
||||
set_source_files_properties(profiler/kernel_profiling_no_replay_sample.cpp
|
||||
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
|
||||
hip_add_executable(
|
||||
profiler_kernel_no_replay profiler/kernel_profiling_no_replay_sample.cpp
|
||||
${ROCPROFILER_UTIL_SRC_FILES})
|
||||
target_include_directories(
|
||||
profiler_kernel_no_replay PRIVATE ${PROJECT_SOURCE_DIR}
|
||||
${CMAKE_CURRENT_SOURCE_DIR}/common)
|
||||
target_link_libraries(profiler_kernel_no_replay PRIVATE rocprofiler-v2 amd_comgr)
|
||||
target_link_options(profiler_kernel_no_replay PRIVATE "-Wl,--build-id=md5")
|
||||
add_dependencies(samples profiler_kernel_no_replay)
|
||||
install(TARGETS profiler_kernel_no_replay RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
|
||||
install(TARGETS profiler_kernel_no_replay
|
||||
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
|
||||
COMPONENT samples)
|
||||
|
||||
## Build Device Profiling Sample
|
||||
set_source_files_properties(profiler/device_profiling_sample.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
|
||||
hip_add_executable(profiler_device_profiling profiler/device_profiling_sample.cpp ${ROCPROFILER_UTIL_SRC_FILES})
|
||||
target_include_directories(profiler_device_profiling PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
|
||||
# Build Device Profiling Sample
|
||||
set_source_files_properties(profiler/device_profiling_sample.cpp
|
||||
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
|
||||
hip_add_executable(profiler_device_profiling profiler/device_profiling_sample.cpp
|
||||
${ROCPROFILER_UTIL_SRC_FILES})
|
||||
target_include_directories(
|
||||
profiler_device_profiling PRIVATE ${PROJECT_SOURCE_DIR}
|
||||
${CMAKE_CURRENT_SOURCE_DIR}/common)
|
||||
target_link_libraries(profiler_device_profiling PRIVATE rocprofiler-v2 amd_comgr)
|
||||
target_link_options(profiler_device_profiling PRIVATE "-Wl,--build-id=md5")
|
||||
add_dependencies(samples profiler_device_profiling)
|
||||
install(TARGETS profiler_device_profiling RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
|
||||
install(TARGETS profiler_device_profiling
|
||||
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
|
||||
COMPONENT samples)
|
||||
|
||||
## Build Counters Sampling example
|
||||
set_source_files_properties(counters_sampler/pcie_counters_example.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
|
||||
hip_add_executable(pcie_counters_sampler counters_sampler/pcie_counters_example.cpp ${ROCPROFILER_UTIL_SRC_FILES})
|
||||
target_include_directories(pcie_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
|
||||
# Build Counters Sampling example
|
||||
set_source_files_properties(counters_sampler/pcie_counters_example.cpp
|
||||
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
|
||||
hip_add_executable(pcie_counters_sampler counters_sampler/pcie_counters_example.cpp
|
||||
${ROCPROFILER_UTIL_SRC_FILES})
|
||||
target_include_directories(
|
||||
pcie_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR}
|
||||
${CMAKE_CURRENT_SOURCE_DIR}/common)
|
||||
target_link_libraries(pcie_counters_sampler PRIVATE rocprofiler-v2 systemd amd_comgr)
|
||||
target_link_options(pcie_counters_sampler PRIVATE "-Wl,--build-id=md5")
|
||||
add_dependencies(samples pcie_counters_sampler)
|
||||
install(TARGETS pcie_counters_sampler RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
|
||||
install(TARGETS pcie_counters_sampler
|
||||
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
|
||||
COMPONENT samples)
|
||||
|
||||
## Build XGMI Counters Sampling example
|
||||
set_source_files_properties(counters_sampler/xgmi_counters_sampler_example.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
|
||||
hip_add_executable(xgmi_counters_sampler counters_sampler/xgmi_counters_sampler_example.cpp ${ROCPROFILER_UTIL_SRC_FILES})
|
||||
target_include_directories(xgmi_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
|
||||
# Build XGMI Counters Sampling example
|
||||
set_source_files_properties(counters_sampler/xgmi_counters_sampler_example.cpp
|
||||
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
|
||||
hip_add_executable(
|
||||
xgmi_counters_sampler counters_sampler/xgmi_counters_sampler_example.cpp
|
||||
${ROCPROFILER_UTIL_SRC_FILES})
|
||||
target_include_directories(
|
||||
xgmi_counters_sampler PRIVATE ${PROJECT_SOURCE_DIR}
|
||||
${CMAKE_CURRENT_SOURCE_DIR}/common)
|
||||
target_link_libraries(xgmi_counters_sampler PRIVATE rocprofiler-v2 systemd amd_comgr)
|
||||
target_link_options(xgmi_counters_sampler PRIVATE "-Wl,--build-id=md5")
|
||||
add_dependencies(samples xgmi_counters_sampler)
|
||||
install(TARGETS xgmi_counters_sampler RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
|
||||
install(TARGETS xgmi_counters_sampler
|
||||
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
|
||||
COMPONENT samples)
|
||||
|
||||
# ################################################################################################################
|
||||
# ########################################################################################
|
||||
|
||||
# ############################################################################################################################################
|
||||
# ########################################################################################
|
||||
# Tracer Samples
|
||||
# ############################################################################################################################################
|
||||
# ########################################################################################
|
||||
|
||||
## Build HIP/HSA Trace Sample
|
||||
# Build HIP/HSA Trace Sample
|
||||
set_source_files_properties(tracer/sample.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
|
||||
hip_add_executable(tracer_hip_hsa tracer/sample.cpp ${ROCPROFILER_UTIL_SRC_FILES})
|
||||
target_include_directories(tracer_hip_hsa PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
|
||||
target_include_directories(tracer_hip_hsa PRIVATE ${PROJECT_SOURCE_DIR}
|
||||
${CMAKE_CURRENT_SOURCE_DIR}/common)
|
||||
target_link_libraries(tracer_hip_hsa PRIVATE rocprofiler-v2 amd_comgr)
|
||||
target_link_options(tracer_hip_hsa PRIVATE "-Wl,--build-id=md5")
|
||||
add_dependencies(samples tracer_hip_hsa)
|
||||
install(TARGETS tracer_hip_hsa RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
|
||||
install(TARGETS tracer_hip_hsa
|
||||
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
|
||||
COMPONENT samples)
|
||||
|
||||
## Build HIP/HSA Trace with async output api trace data Sample
|
||||
set_source_files_properties(tracer/sample_async.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
|
||||
hip_add_executable(tracer_hip_hsa_async tracer/sample_async.cpp ${ROCPROFILER_UTIL_SRC_FILES})
|
||||
target_include_directories(tracer_hip_hsa_async PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
|
||||
# Build HIP/HSA Trace with async output api trace data Sample
|
||||
set_source_files_properties(tracer/sample_async.cpp PROPERTIES HIP_SOURCE_PROPERTY_FORMAT
|
||||
1)
|
||||
hip_add_executable(tracer_hip_hsa_async tracer/sample_async.cpp
|
||||
${ROCPROFILER_UTIL_SRC_FILES})
|
||||
target_include_directories(
|
||||
tracer_hip_hsa_async PRIVATE ${PROJECT_SOURCE_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/common)
|
||||
target_link_libraries(tracer_hip_hsa_async PRIVATE rocprofiler-v2 amd_comgr)
|
||||
target_link_options(tracer_hip_hsa_async PRIVATE "-Wl,--build-id=md5")
|
||||
add_dependencies(samples tracer_hip_hsa_async)
|
||||
install(TARGETS tracer_hip_hsa_async RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
|
||||
install(TARGETS tracer_hip_hsa_async
|
||||
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
|
||||
COMPONENT samples)
|
||||
|
||||
# ############################################################################################################################################
|
||||
# ########################################################################################
|
||||
# PC Sampling Samples
|
||||
# ############################################################################################################################################
|
||||
# ########################################################################################
|
||||
|
||||
set(CODE_PRINTING_SAMPLE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/pcsampler/code_printing_sample)
|
||||
file(GLOB PC_SAMPLING_CODE_PRINTING_FILES ${CODE_PRINTING_SAMPLE_DIR}/*.cpp)
|
||||
set_source_files_properties(${PC_SAMPLING_CODE_PRINTING_FILES} PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
|
||||
hip_add_executable(pc_sampling_code_printing ${PC_SAMPLING_CODE_PRINTING_FILES}
|
||||
HIPCC_OPTIONS
|
||||
-std=c++17
|
||||
# Include debugging symbols and source for the contextual disassembly
|
||||
-gdwarf-4)
|
||||
set_source_files_properties(${PC_SAMPLING_CODE_PRINTING_FILES}
|
||||
PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
|
||||
hip_add_executable(
|
||||
pc_sampling_code_printing ${PC_SAMPLING_CODE_PRINTING_FILES} HIPCC_OPTIONS -std=c++17
|
||||
# Include debugging symbols and source for the contextual disassembly
|
||||
-gdwarf-4)
|
||||
|
||||
check_c_source_compiles("
|
||||
check_c_source_compiles(
|
||||
"
|
||||
#define _GNU_SOURCE
|
||||
#include <sys/mman.h>
|
||||
int main() { return memfd_create (\"cmake_test\", 0); }
|
||||
" HAVE_MEMFD_CREATE)
|
||||
if (HAVE_MEMFD_CREATE)
|
||||
target_compile_definitions(pc_sampling_code_printing PRIVATE HAVE_MEMFD_CREATE)
|
||||
endif()
|
||||
"
|
||||
HAVE_MEMFD_CREATE)
|
||||
if(HAVE_MEMFD_CREATE)
|
||||
target_compile_definitions(pc_sampling_code_printing PRIVATE HAVE_MEMFD_CREATE)
|
||||
endif()
|
||||
|
||||
target_link_libraries(pc_sampling_code_printing
|
||||
PRIVATE
|
||||
rocprofiler-v2
|
||||
rocm-dbgapi
|
||||
${LIBELF_LIBRARIES}
|
||||
${LIBDW_LIBRARIES}
|
||||
hsa-runtime64::hsa-runtime64 Threads::Threads dl)
|
||||
target_include_directories(pc_sampling_code_printing
|
||||
PRIVATE
|
||||
${TEST_DIR}
|
||||
${ROOT_DIR}
|
||||
${HSA_RUNTIME_INC_PATH}
|
||||
${PROJECT_SOURCE_DIR})
|
||||
target_link_options(pc_sampling_code_printing PRIVATE "-Wl,--build-id=md5")
|
||||
add_dependencies(samples pc_sampling_code_printing)
|
||||
install(TARGETS pc_sampling_code_printing RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples COMPONENT samples)
|
||||
target_link_libraries(
|
||||
pc_sampling_code_printing
|
||||
PRIVATE rocprofiler-v2 rocm-dbgapi ${LIBELF_LIBRARIES} ${LIBDW_LIBRARIES}
|
||||
hsa-runtime64::hsa-runtime64 Threads::Threads dl)
|
||||
target_include_directories(
|
||||
pc_sampling_code_printing PRIVATE ${TEST_DIR} ${ROOT_DIR} ${HSA_RUNTIME_INC_PATH}
|
||||
${PROJECT_SOURCE_DIR})
|
||||
target_link_options(pc_sampling_code_printing PRIVATE "-Wl,--build-id=md5")
|
||||
add_dependencies(samples pc_sampling_code_printing)
|
||||
install(TARGETS pc_sampling_code_printing
|
||||
RUNTIME DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples
|
||||
COMPONENT samples)
|
||||
|
||||
install(DIRECTORY "${PROJECT_SOURCE_DIR}/samples/" DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples-src OPTIONAL COMPONENT samples)
|
||||
install(
|
||||
DIRECTORY "${PROJECT_SOURCE_DIR}/samples/"
|
||||
DESTINATION ${CMAKE_INSTALL_DATAROOTDIR}/${PROJECT_NAME}/samples-src
|
||||
OPTIONAL
|
||||
COMPONENT samples)
|
||||
|
||||
# ############################################################################################################################################
|
||||
# ########################################################################################
|
||||
# Scripts to run samples
|
||||
# ############################################################################################################################################
|
||||
# ########################################################################################
|
||||
|
||||
# Copy run_samples script to samples folder
|
||||
configure_file(run_samples.sh ${PROJECT_BINARY_DIR}/samples COPYONLY)
|
||||
|
||||
# ############################################################################################################################################
|
||||
# ########################################################################################
|
||||
|
||||
@@ -8,15 +8,14 @@ int main(int argc, char** argv) {
       "CI_PERF_slv_MemRd_Bandwidth0", "CI_PERF_slv_MemWr_Bandwidth0", "CI_PERF_slv_totalMemRdTx",
       "CI_PERF_slv_totalMemWrTx", "CI_PERF_slv_totalTx"};

-  if(argc > 1) {
+  if (argc > 1) {
     counter_option = atoi(argv[1]);
-  }
-  else{
-    std::cout<< "Please provide one of the counter index options as argument:\n";
-    for(int i = 0; i < pcie_counters.size(); i++){
-      std::cout<< "[" << i << "]: " << pcie_counters[i] << std::endl;
+  } else {
+    std::cout << "Please provide one of the counter index options as argument:\n";
+    for (int i = 0; i < pcie_counters.size(); i++) {
+      std::cout << "[" << i << "]: " << pcie_counters[i] << std::endl;
     }
-    std::cout<< "Example:\n ./pcie_counters_sampler 1\n";
+    std::cout << "Example:\n ./pcie_counters_sampler 1\n";
     exit(0);
   }
|
||||
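Review note: the hunk above reads the counter index with `atoi(argv[1])`, which returns 0 for non-numeric input, and the lines shown contain no range check against `pcie_counters`. A minimal sketch of a stricter parse, assuming C++17 is available to the sample; `parse_counter_index` is a hypothetical helper and not part of the sample or of rocprofiler:

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstring>
#include <optional>
#include <system_error>

// Hypothetical helper (not part of the sample): parse a decimal counter index
// and accept it only if the whole string is a number below counter_count.
inline std::optional<std::size_t> parse_counter_index(const char* text,
                                                      std::size_t counter_count) {
  if (text == nullptr) return std::nullopt;
  const char* const end = text + std::strlen(text);
  std::size_t value = 0;
  // std::from_chars reports failures through `ec` instead of returning 0, and
  // for unsigned types it rejects a leading '+', '-' or whitespace.
  const auto [ptr, ec] = std::from_chars(text, end, value);
  if (ec != std::errc{}) return std::nullopt;       // not a number, or overflow
  if (ptr != end) return std::nullopt;              // trailing characters ("2x")
  if (value >= counter_count) return std::nullopt;  // index past the counter table
  return value;
}
```

In the sample, the result could gate the existing usage message: print the index list and exit when the returned optional is empty, otherwise assign its value to `counter_option`.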
@@ -55,10 +54,10 @@ int main(int argc, char** argv) {
       .sampling_rate = rate,
       .sampling_duration = duration,
       .gpu_agent_index = 0};
-  CHECK_ROCPROFILER(
-      rocprofiler_create_filter(session_id, ROCPROFILER_COUNTERS_SAMPLER,
-                                rocprofiler_filter_data_t{.counters_sampler_parameters = cs_parameters},
-                                0, &filter_id, property));
+  CHECK_ROCPROFILER(rocprofiler_create_filter(
+      session_id, ROCPROFILER_COUNTERS_SAMPLER,
+      rocprofiler_filter_data_t{.counters_sampler_parameters = cs_parameters}, 0, &filter_id,
+      property));
   CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, filter_id, buffer_id));

   // Normal HIP Calls
|
||||
@@ -40,14 +40,14 @@ int main(int argc, char** argv) {
   uint32_t duration = 5000;

   rocprofiler_counters_sampler_parameters_t cs_parameters = {.counters = counters_input,
-      .counters_num = 1,
-      .sampling_rate = rate,
-      .sampling_duration = duration,
-      .gpu_agent_index = 0};
-  CHECK_ROCPROFILER(
-      rocprofiler_create_filter(session_id, ROCPROFILER_COUNTERS_SAMPLER,
-                                rocprofiler_filter_data_t{.counters_sampler_parameters = cs_parameters},
-                                0, &filter_id, property));
+      .counters_num = 1,
+      .sampling_rate = rate,
+      .sampling_duration = duration,
+      .gpu_agent_index = 0};
+  CHECK_ROCPROFILER(rocprofiler_create_filter(
+      session_id, ROCPROFILER_COUNTERS_SAMPLER,
+      rocprofiler_filter_data_t{.counters_sampler_parameters = cs_parameters}, 0, &filter_id,
+      property));
   CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, filter_id, buffer_id));

   // Normal HIP Calls
|
||||
+878
-1037
File diff suppressed because the file is too large
@@ -31,96 +31,74 @@
|
||||
namespace amd::debug_agent {
|
||||
|
||||
class code_object_t {
|
||||
struct symbol_info_t {
|
||||
const std::string m_name;
|
||||
amd_dbgapi_global_address_t m_value;
|
||||
amd_dbgapi_size_t m_size;
|
||||
};
|
||||
struct symbol_info_t {
|
||||
const std::string m_name;
|
||||
amd_dbgapi_global_address_t m_value;
|
||||
amd_dbgapi_size_t m_size;
|
||||
};
|
||||
|
||||
using symbol_map_t =
|
||||
std::optional
|
||||
< std::map
|
||||
< amd_dbgapi_global_address_t
|
||||
, std::pair<std::string, amd_dbgapi_size_t>
|
||||
>
|
||||
>;
|
||||
using symbol_map_t = std::optional<
|
||||
std::map<amd_dbgapi_global_address_t, std::pair<std::string, amd_dbgapi_size_t>>>;
|
||||
|
||||
public:
|
||||
void load_symbol_map();
|
||||
void load_debug_info();
|
||||
public:
|
||||
void load_symbol_map();
|
||||
void load_debug_info();
|
||||
|
||||
std::optional<symbol_info_t>
|
||||
find_symbol(amd_dbgapi_global_address_t address);
|
||||
std::optional<symbol_info_t> find_symbol(amd_dbgapi_global_address_t address);
|
||||
|
||||
code_object_t(amd_dbgapi_code_object_id_t code_object_id);
|
||||
code_object_t(code_object_t &&rhs);
|
||||
code_object_t(amd_dbgapi_code_object_id_t code_object_id);
|
||||
code_object_t(code_object_t&& rhs);
|
||||
|
||||
~code_object_t();
|
||||
~code_object_t();
|
||||
|
||||
void open();
|
||||
bool is_open() const { return m_fd.has_value(); }
|
||||
void open();
|
||||
bool is_open() const { return m_fd.has_value(); }
|
||||
|
||||
amd_dbgapi_global_address_t load_address() const { return m_load_address; }
|
||||
amd_dbgapi_size_t mem_size() const { return m_mem_size; }
|
||||
// FIXME(?): extra function not in rocr-debug-agent
|
||||
uint32_t elf_amdgpu_machine() const { return m_elf_amdgpu_machine; }
|
||||
amd_dbgapi_global_address_t load_address() const { return m_load_address; }
|
||||
amd_dbgapi_size_t mem_size() const { return m_mem_size; }
|
||||
// FIXME(?): extra function not in rocr-debug-agent
|
||||
uint32_t elf_amdgpu_machine() const { return m_elf_amdgpu_machine; }
|
||||
|
||||
void disassemble_around(amd_dbgapi_architecture_id_t architecture_id,
|
||||
amd_dbgapi_global_address_t pc);
|
||||
void disassemble_around(amd_dbgapi_architecture_id_t architecture_id,
|
||||
amd_dbgapi_global_address_t pc);
|
||||
|
||||
void disassemble_kernel(amd_dbgapi_architecture_id_t architecture_id,
|
||||
amd_dbgapi_global_address_t start_addr,
|
||||
bool const print_src = false);
|
||||
void disassemble_kernel(amd_dbgapi_architecture_id_t architecture_id,
|
||||
amd_dbgapi_global_address_t start_addr, bool const print_src = false);
|
||||
|
||||
bool save(const std::string &directory) const;
|
||||
bool save(const std::string& directory) const;
|
||||
|
||||
amd_dbgapi_global_address_t m_load_address{ 0 };
|
||||
amd_dbgapi_size_t m_mem_size{ 0 };
|
||||
std::optional<int> m_fd;
|
||||
amd_dbgapi_global_address_t m_load_address{0};
|
||||
amd_dbgapi_size_t m_mem_size{0};
|
||||
std::optional<int> m_fd;
|
||||
|
||||
std::optional
|
||||
< std::map<amd_dbgapi_global_address_t, std::pair<std::string, size_t>>
|
||||
>
|
||||
m_line_number_map;
|
||||
std::optional<std::map<amd_dbgapi_global_address_t, std::pair<std::string, size_t>>>
|
||||
m_line_number_map;
|
||||
|
||||
std::optional
|
||||
< std::map<amd_dbgapi_global_address_t, amd_dbgapi_global_address_t>
|
||||
>
|
||||
m_pc_ranges_map;
|
||||
std::optional<std::map<amd_dbgapi_global_address_t, amd_dbgapi_global_address_t>> m_pc_ranges_map;
|
||||
|
||||
symbol_map_t m_symbol_map;
|
||||
std::string m_uri;
|
||||
amd_dbgapi_code_object_id_t const m_code_object_id;
|
||||
// FIXME(?): extra field not in rocr-debug-agent
|
||||
uint32_t m_elf_amdgpu_machine{ 0 };
|
||||
symbol_map_t m_symbol_map;
|
||||
std::string m_uri;
|
||||
amd_dbgapi_code_object_id_t const m_code_object_id;
|
||||
// FIXME(?): extra field not in rocr-debug-agent
|
||||
uint32_t m_elf_amdgpu_machine{0};
|
||||
};
|
||||
|
||||
} // namespace amd::debug_agent
|
||||
} // namespace amd::debug_agent
|
||||
|
||||
enum struct disassembly_mode {
|
||||
AROUND,
|
||||
KERNEL
|
||||
};
|
||||
enum struct disassembly_mode { AROUND, KERNEL };
|
||||
|
||||
std::tuple
|
||||
< amd_dbgapi_process_id_t
|
||||
, std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>
|
||||
>
|
||||
std::tuple<amd_dbgapi_process_id_t,
|
||||
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>>
|
||||
init_disassembly();
|
||||
|
||||
void
|
||||
disassemble(
|
||||
disassembly_mode const mode,
|
||||
amd_dbgapi_process_id_t const process_id,
|
||||
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>
|
||||
&code_object_map,
|
||||
uint64_t const addr);
|
||||
void disassemble(
|
||||
disassembly_mode const mode, amd_dbgapi_process_id_t const process_id,
|
||||
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>& code_object_map,
|
||||
uint64_t const addr);
|
||||
|
||||
void
|
||||
print_pc_context(
|
||||
amd_dbgapi_process_id_t const process_id,
|
||||
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>
|
||||
&code_object_map,
|
||||
amd_dbgapi_global_address_t const pc);
|
||||
void print_pc_context(
|
||||
amd_dbgapi_process_id_t const process_id,
|
||||
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t>& code_object_map,
|
||||
amd_dbgapi_global_address_t const pc);
|
||||
|
||||
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_CODE_PRINTING_HPP_
|
||||
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_CODE_PRINTING_HPP_
|
||||
|
||||
@@ -47,169 +47,130 @@
|
||||
#include "program.hpp"
|
||||
|
||||
struct libc_freer {
|
||||
void operator()(char *p) { free(p); }
|
||||
void operator()(char* p) { free(p); }
|
||||
};
|
||||
|
||||
namespace util {
|
||||
|
||||
template <typename T, typename... Ts>
|
||||
static void
|
||||
hash_combine(size_t &hsh, T const& v, Ts const&... rest)
|
||||
{
|
||||
hsh ^= std::hash<T>{}(v) + 0x9e3779b9 + (hsh << 6) + (hsh >> 2);
|
||||
(hash_combine(hsh, rest), ...);
|
||||
static void hash_combine(size_t& hsh, T const& v, Ts const&... rest) {
|
||||
hsh ^= std::hash<T>{}(v) + 0x9e3779b9 + (hsh << 6) + (hsh >> 2);
|
||||
(hash_combine(hsh, rest), ...);
|
||||
}
|
||||
|
||||
} // namespace util
|
||||
} // namespace util
|
||||
|
||||
[[maybe_unused]]
|
||||
static inline bool
|
||||
operator==(hsa_executable_t const &l, hsa_executable_t const &r)
|
||||
{
|
||||
return l.handle == r.handle;
|
||||
[[maybe_unused]] static inline bool operator==(hsa_executable_t const& l,
|
||||
hsa_executable_t const& r) {
|
||||
return l.handle == r.handle;
|
||||
}
|
||||
|
||||
[[maybe_unused]]
|
||||
static inline bool
|
||||
operator==(
|
||||
rocprofiler_kernel_dispatch_id_t const &l,
|
||||
rocprofiler_kernel_dispatch_id_t const &r)
|
||||
{
|
||||
return l.value == r.value;
|
||||
[[maybe_unused]] static inline bool operator==(rocprofiler_kernel_dispatch_id_t const& l,
|
||||
rocprofiler_kernel_dispatch_id_t const& r) {
|
||||
return l.value == r.value;
|
||||
}
|
||||
|
||||
static inline bool
|
||||
operator==(amd_dbgapi_process_id_t const &l, amd_dbgapi_process_id_t const &r)
|
||||
{
|
||||
return l.handle == r.handle;
|
||||
static inline bool operator==(amd_dbgapi_process_id_t const& l, amd_dbgapi_process_id_t const& r) {
|
||||
return l.handle == r.handle;
|
||||
}
|
||||
|
||||
static inline bool
|
||||
operator!=(amd_dbgapi_process_id_t const &l, amd_dbgapi_process_id_t const &r)
|
||||
{
|
||||
return !(l == r);
|
||||
static inline bool operator!=(amd_dbgapi_process_id_t const& l, amd_dbgapi_process_id_t const& r) {
|
||||
return !(l == r);
|
||||
}
|
||||
|
||||
namespace std {
|
||||
|
||||
template <>
|
||||
struct hash<hsa_executable_t> {
|
||||
size_t operator()(hsa_executable_t const &v) const {
|
||||
size_t ret = 0;
|
||||
util::hash_combine(ret, v.handle);
|
||||
return ret;
|
||||
}
|
||||
template <> struct hash<hsa_executable_t> {
|
||||
size_t operator()(hsa_executable_t const& v) const {
|
||||
size_t ret = 0;
|
||||
util::hash_combine(ret, v.handle);
|
||||
return ret;
|
||||
}
|
||||
};
|
||||
|
||||
template <>
|
||||
struct hash<rocprofiler_kernel_dispatch_id_t> {
|
||||
size_t operator()(rocprofiler_kernel_dispatch_id_t const &v) const {
|
||||
size_t ret = 0;
|
||||
util::hash_combine(ret, v.value);
|
||||
return ret;
|
||||
}
|
||||
template <> struct hash<rocprofiler_kernel_dispatch_id_t> {
|
||||
size_t operator()(rocprofiler_kernel_dispatch_id_t const& v) const {
|
||||
size_t ret = 0;
|
||||
util::hash_combine(ret, v.value);
|
||||
return ret;
|
||||
}
|
||||
};
|
||||
|
||||
} // namespace std
|
||||
} // namespace std
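Review note: the `std::hash` specializations above delegate to `util::hash_combine`, which mixes each field into a running seed, presumably so the handle types can serve as keys in unordered containers. A self-contained sketch of the same pattern follows; `util_sketch` and `sample_key_t` (with its two fields) are hypothetical stand-ins, not HSA or rocprofiler types:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>

namespace util_sketch {

// Mix the hash of `v` into `seed`, then fold the remaining values in order.
// With an empty `rest` pack the comma fold expands to nothing.
template <typename T, typename... Ts>
void hash_combine(std::size_t& seed, T const& v, Ts const&... rest) {
  seed ^= std::hash<T>{}(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
  (hash_combine(seed, rest), ...);
}

}  // namespace util_sketch

// Hypothetical key type, used only for illustration.
struct sample_key_t {
  std::uint64_t handle;
  std::uint32_t queue;
};

inline bool operator==(sample_key_t const& l, sample_key_t const& r) {
  return l.handle == r.handle && l.queue == r.queue;
}

namespace std {
// Specializing std::hash for a program-defined type is permitted.
template <> struct hash<sample_key_t> {
  size_t operator()(sample_key_t const& v) const {
    size_t ret = 0;
    util_sketch::hash_combine(ret, v.handle, v.queue);
    return ret;
  }
};
}  // namespace std
```

With the equality operator and the hash specialization in place, `std::unordered_set<sample_key_t>` and `std::unordered_map<sample_key_t, V>` work without extra template arguments.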
|
||||
|
||||
struct disassembly_ctx_t {
|
||||
disassembly_ctx_t();
|
||||
~disassembly_ctx_t();
|
||||
disassembly_ctx_t();
|
||||
~disassembly_ctx_t();
|
||||
|
||||
void disassemble_kernels(bool const reinitialize);
|
||||
void init();
|
||||
bool inited() const;
|
||||
void reset();
|
||||
void disassemble_kernels(bool const reinitialize);
|
||||
void init();
|
||||
bool inited() const;
|
||||
void reset();
|
||||
|
||||
amd_dbgapi_process_id_t process_id;
|
||||
std::map
|
||||
< amd_dbgapi_global_address_t
|
||||
, amd::debug_agent::code_object_t
|
||||
> codeobjs;
|
||||
amd_dbgapi_process_id_t process_id;
|
||||
std::map<amd_dbgapi_global_address_t, amd::debug_agent::code_object_t> codeobjs;
|
||||
};
|
||||
|
||||
disassembly_ctx_t::disassembly_ctx_t()
|
||||
: process_id(AMD_DBGAPI_PROCESS_NONE)
|
||||
, codeobjs()
|
||||
{}
|
||||
disassembly_ctx_t::disassembly_ctx_t() : process_id(AMD_DBGAPI_PROCESS_NONE), codeobjs() {}
|
||||
|
||||
disassembly_ctx_t::~disassembly_ctx_t()
|
||||
{
|
||||
disassembly_ctx_t::~disassembly_ctx_t() { reset(); }
|
||||
|
||||
void disassembly_ctx_t::disassemble_kernels(bool const reinitialize) {
|
||||
if (reinitialize) {
|
||||
reset();
|
||||
}
|
||||
}
|
||||
if (!inited()) {
|
||||
init();
|
||||
}
|
||||
|
||||
void
|
||||
disassembly_ctx_t::disassemble_kernels(bool const reinitialize)
|
||||
{
|
||||
if (reinitialize) {
|
||||
reset();
|
||||
}
|
||||
if (!inited()) {
|
||||
init();
|
||||
auto it = codeobjs.begin();
|
||||
auto const end = codeobjs.end();
|
||||
auto const pred = [](decltype(*it)& x) {
|
||||
/*
|
||||
* A lame filter for the kernels in the current file, because nothing
|
||||
* else in this little demo will have the URL prefix of `file://`.
|
||||
*/
|
||||
return x.second.m_uri.find("file://", 0, 7) != std::string::npos;
|
||||
};
|
||||
while (end != (it = std::find_if(it, end, pred))) {
|
||||
auto& codeobj = it->second;
|
||||
codeobj.load_symbol_map();
|
||||
if (!codeobj.m_symbol_map) {
|
||||
fputs(PROGNAME ": error: failed to load symbol map\n", stderr);
|
||||
break;
|
||||
}
|
||||
|
||||
auto it = codeobjs.begin();
|
||||
auto const end = codeobjs.end();
|
||||
auto const pred = [](decltype(*it) &x){
|
||||
/*
|
||||
* A lame filter for the kernels in the current file, because nothing
|
||||
* else in this little demo will have the URL prefix of `file://`.
|
||||
*/
|
||||
return x.second.m_uri.find("file://", 0, 7) != std::string::npos;
|
||||
};
|
||||
while (end != (it = std::find_if(it, end, pred))) {
|
||||
auto &codeobj = it->second;
|
||||
codeobj.load_symbol_map();
|
||||
if (!codeobj.m_symbol_map) {
|
||||
fputs(PROGNAME ": error: failed to load symbol map\n", stderr);
|
||||
break;
|
||||
}
|
||||
|
||||
for (auto const &sym : *codeobj.m_symbol_map) {
|
||||
auto const &addr = sym.first;
|
||||
::disassemble(disassembly_mode::KERNEL, process_id, codeobjs, addr);
|
||||
}
|
||||
|
||||
++it;
|
||||
for (auto const& sym : *codeobj.m_symbol_map) {
|
||||
auto const& addr = sym.first;
|
||||
::disassemble(disassembly_mode::KERNEL, process_id, codeobjs, addr);
|
||||
}
|
||||
|
||||
++it;
|
||||
}
|
||||
}
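Review note: the comment in the predicate above describes a filter on the URI prefix `file://`, while `std::string::find` reports a match at any position in the string. For the URIs this demo encounters the two may well coincide. The sketch below shows a check anchored at position 0; `has_uri_prefix` and `file_backed_addresses` are hypothetical names, and a plain `std::map<std::uint64_t, std::string>` stands in for the code-object map:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// True only when `uri` starts with `prefix` (C++17; C++20 offers starts_with).
inline bool has_uri_prefix(std::string_view uri, std::string_view prefix) {
  return uri.size() >= prefix.size() && uri.compare(0, prefix.size(), prefix) == 0;
}

// Collect the load addresses whose URI begins with "file://".
// The map type is an assumption standing in for the sample's code-object map.
inline std::vector<std::uint64_t> file_backed_addresses(
    const std::map<std::uint64_t, std::string>& uris) {
  std::vector<std::uint64_t> out;
  for (auto const& [addr, uri] : uris) {
    if (has_uri_prefix(uri, "file://")) out.push_back(addr);
  }
  return out;
}
```

A URI such as `memory://42#note=file://x` passes the `find`-based predicate but is rejected by the anchored check.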
|
||||
|
||||
inline void
|
||||
disassembly_ctx_t::init()
|
||||
{
|
||||
std::tie(process_id, codeobjs) = init_disassembly();
|
||||
}
|
||||
inline void disassembly_ctx_t::init() { std::tie(process_id, codeobjs) = init_disassembly(); }
|
||||
|
||||
inline bool
|
||||
disassembly_ctx_t::inited() const
|
||||
{
|
||||
return AMD_DBGAPI_PROCESS_NONE != process_id;
|
||||
}
|
||||
inline bool disassembly_ctx_t::inited() const { return AMD_DBGAPI_PROCESS_NONE != process_id; }
|
||||
|
||||
void
|
||||
disassembly_ctx_t::reset()
|
||||
{
|
||||
codeobjs.clear();
|
||||
if (AMD_DBGAPI_PROCESS_NONE.handle != process_id.handle) {
|
||||
amd_dbgapi_process_detach(process_id);
|
||||
amd_dbgapi_finalize();
|
||||
process_id = AMD_DBGAPI_PROCESS_NONE;
|
||||
}
|
||||
void disassembly_ctx_t::reset() {
|
||||
codeobjs.clear();
|
||||
if (AMD_DBGAPI_PROCESS_NONE.handle != process_id.handle) {
|
||||
amd_dbgapi_process_detach(process_id);
|
||||
amd_dbgapi_finalize();
|
||||
process_id = AMD_DBGAPI_PROCESS_NONE;
|
||||
}
|
||||
}
|
||||
|
||||
static disassembly_ctx_t g_dis;
|
||||
|
||||
void
|
||||
disassembly_disassemble_kernels(bool const reinitialize)
|
||||
{
|
||||
g_dis.disassemble_kernels(reinitialize);
|
||||
void disassembly_disassemble_kernels(bool const reinitialize) {
|
||||
g_dis.disassemble_kernels(reinitialize);
|
||||
}
|
||||
|
||||
void
|
||||
disassembly_print_pc_sample_context(amd_dbgapi_global_address_t const pc)
|
||||
{
|
||||
if (!g_dis.inited()) {
|
||||
g_dis.init();
|
||||
}
|
||||
print_pc_context(g_dis.process_id, g_dis.codeobjs, pc);
|
||||
void disassembly_print_pc_sample_context(amd_dbgapi_global_address_t const pc) {
|
||||
if (!g_dis.inited()) {
|
||||
g_dis.init();
|
||||
}
|
||||
print_pc_context(g_dis.process_id, g_dis.codeobjs, pc);
|
||||
}
|
||||
|
||||
@@ -23,10 +23,8 @@

 #include <amd-dbgapi/amd-dbgapi.h>

-void
-disassembly_disassemble_kernels(bool const);
+void disassembly_disassemble_kernels(bool const);

-void
-disassembly_print_pc_sample_context(amd_dbgapi_global_address_t const);
+void disassembly_print_pc_sample_context(amd_dbgapi_global_address_t const);

-#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_DISASSEMBLY_HPP_
+#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_DISASSEMBLY_HPP_
|
||||
@@ -46,274 +46,227 @@
|
||||
namespace util {
|
||||
|
||||
struct hipMalloc_freer {
|
||||
void operator()(void * const ptr) { (void)hipFree(ptr); }
|
||||
void operator()(void* const ptr) { (void)hipFree(ptr); }
|
||||
};
|
||||
|
||||
} // namespace util
|
||||
} // namespace util
|
||||
|
||||
namespace prng {
|
||||
|
||||
static uint64_t
|
||||
splitmix64_next(uint64_t * const sm64_state)
|
||||
{
|
||||
uint64_t z = (*sm64_state += 0x9e3779b97f4a7c15);
|
||||
z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9;
|
||||
z = (z ^ (z >> 27)) * 0x94d049bb133111eb;
|
||||
return z ^ (z >> 31);
|
||||
static uint64_t splitmix64_next(uint64_t* const sm64_state) {
|
||||
uint64_t z = (*sm64_state += 0x9e3779b97f4a7c15);
|
||||
z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9;
|
||||
z = (z ^ (z >> 27)) * 0x94d049bb133111eb;
|
||||
return z ^ (z >> 31);
|
||||
}
|
||||
|
||||
static inline uint64_t
|
||||
rotl64(const uint64_t x, int k)
|
||||
{
|
||||
return (x << k) | (x >> (64 - k));
|
||||
static inline uint64_t rotl64(const uint64_t x, int k) { return (x << k) | (x >> (64 - k)); }
|
||||
|
||||
static uint64_t xrs_next(uint64_t* const xrs_state) {
|
||||
const uint64_t result = rotl64(xrs_state[0] + xrs_state[3], 23) + xrs_state[0];
|
||||
|
||||
const uint64_t t = xrs_state[1] << 17;
|
||||
|
||||
xrs_state[2] ^= xrs_state[0];
|
||||
xrs_state[3] ^= xrs_state[1];
|
||||
xrs_state[1] ^= xrs_state[2];
|
||||
xrs_state[0] ^= xrs_state[3];
|
||||
|
||||
xrs_state[2] ^= t;
|
||||
|
||||
xrs_state[3] = rotl64(xrs_state[3], 45);
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
static uint64_t
|
||||
xrs_next(uint64_t * const xrs_state)
|
||||
{
|
||||
const uint64_t result =
|
||||
rotl64(xrs_state[0] + xrs_state[3], 23) + xrs_state[0];
|
||||
|
||||
const uint64_t t = xrs_state[1] << 17;
|
||||
|
||||
xrs_state[2] ^= xrs_state[0];
|
||||
xrs_state[3] ^= xrs_state[1];
|
||||
xrs_state[1] ^= xrs_state[2];
|
||||
xrs_state[0] ^= xrs_state[3];
|
||||
|
||||
xrs_state[2] ^= t;
|
||||
|
||||
xrs_state[3] = rotl64(xrs_state[3], 45);
|
||||
|
||||
return result;
|
||||
}
|
||||
|
||||
} // namespace prng
|
||||
} // namespace prng
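Review note: the `prng` helpers above pair a SplitMix64 step with a generator whose update (`rotl64(s[0] + s[3], 23) + s[0]`, a shift by 17, a rotate by 45) matches the published xoshiro256++ algorithm. How the sample seeds the four state words is not visible in this hunk; filling them from SplitMix64 is a common approach and is assumed here. The names `prng_sketch`, `xrs_state_t` and `xrs_seed` are hypothetical:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

namespace prng_sketch {

using xrs_state_t = std::array<std::uint64_t, 4>;

// One SplitMix64 step: advance the state and return a mixed 64-bit value.
inline std::uint64_t splitmix64_next(std::uint64_t& state) {
  std::uint64_t z = (state += 0x9e3779b97f4a7c15ULL);
  z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
  z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
  return z ^ (z >> 31);
}

// Rotate left; k must stay in (0, 64) because shifting by 64 is undefined.
inline std::uint64_t rotl64(std::uint64_t x, int k) {
  assert(k > 0 && k < 64);
  return (x << k) | (x >> (64 - k));
}

// Assumed seeding scheme: expand one 64-bit seed into four state words.
// SplitMix64's output step is a bijection and the four inputs are distinct,
// so at most one word can be zero and the all-zero state is never produced.
inline xrs_state_t xrs_seed(std::uint64_t seed) {
  xrs_state_t s{};
  for (auto& word : s) word = splitmix64_next(seed);
  return s;
}

// Same update as xrs_next in the hunk, operating on std::array.
inline std::uint64_t xrs_next(xrs_state_t& s) {
  const std::uint64_t result = rotl64(s[0] + s[3], 23) + s[0];
  const std::uint64_t t = s[1] << 17;
  s[2] ^= s[0];
  s[3] ^= s[1];
  s[1] ^= s[2];
  s[0] ^= s[3];
  s[2] ^= t;
  s[3] = rotl64(s[3], 45);
  return result;
}

}  // namespace prng_sketch
```

Seeding through SplitMix64 keeps runs reproducible from a single `SEED` argument while avoiding the all-zero state, on which the xoshiro update would stay stuck.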
|
||||

namespace kernel {

template <typename T>
__global__ static void
memset_gpu(T * const s, T const c, size_t const n)
{
size_t i_start = threadIdx.x + blockIdx.x * blockDim.x;
size_t i_shift = blockDim.x * gridDim.x;
for (size_t i = i_start; i < n; i += i_shift) {
s[i] = c;
}
template <typename T> __global__ static void memset_gpu(T* const s, T const c, size_t const n) {
size_t i_start = threadIdx.x + blockIdx.x * blockDim.x;
size_t i_shift = blockDim.x * gridDim.x;
for (size_t i = i_start; i < n; i += i_shift) {
s[i] = c;
}
}

template <typename T>
__global__ static void
count_gpu(
T const * const xs,
T * const out,
size_t const n,
size_t const nblocks,
T const gt)
{
size_t i_start = threadIdx.x + blockIdx.x * blockDim.x;
size_t i_shift = blockDim.x * gridDim.x;
for (size_t i = i_start; i < n; i += i_shift) {
if (xs[i] > gt) {
atomicAdd(&out[i % nblocks], 1);
}
__global__ static void count_gpu(T const* const xs, T* const out, size_t const n,
size_t const nblocks, T const gt) {
size_t i_start = threadIdx.x + blockIdx.x * blockDim.x;
size_t i_shift = blockDim.x * gridDim.x;
for (size_t i = i_start; i < n; i += i_shift) {
if (xs[i] > gt) {
atomicAdd(&out[i % nblocks], 1);
}
}
}

} // namespace kernel
} // namespace kernel
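
Both kernels above use a grid-stride loop: each thread starts at its global index and advances by the total number of launched threads, so any launch size covers all `n` elements. As a rough illustration only, the sketch below emulates `count_gpu` on the host in plain C++. `count_grid_stride` and its explicit `grid_dim` / `block_dim` parameters are hypothetical stand-ins for the HIP built-ins, and the sequential loops take the place of the `atomicAdd` that concurrent GPU threads need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side emulation of the grid-stride counting kernel (hypothetical helper, not HIP code).
// Requires grid_dim > 0 and block_dim > 0.
static std::vector<size_t> count_grid_stride(std::vector<uint64_t> const& xs, uint64_t gt,
                                             size_t grid_dim, size_t block_dim) {
  std::vector<size_t> out(grid_dim, 0);
  size_t const stride = block_dim * grid_dim;  // plays the role of i_shift
  for (size_t block = 0; block < grid_dim; ++block) {
    for (size_t thread = 0; thread < block_dim; ++thread) {
      // plays the role of i_start = threadIdx.x + blockIdx.x * blockDim.x
      for (size_t i = thread + block * block_dim; i < xs.size(); i += stride) {
        if (xs[i] > gt) {
          out[i % grid_dim] += 1;  // the kernel uses atomicAdd here
        }
      }
    }
  }
  return out;
}
```

Summing the returned subtotals gives the same total as a plain linear scan, which is what the sample does on the host with `std::accumulate`.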

static char const GETOPT_ARGS[] = "cd:mn:DP";

static void
usage()
{
fputs("usage: " PROGNAME " [OPTION]... MIN [SEED]\n"
" -d DEV\tHIP device number\n"
" -n LEN\tLength of random integer array\n"
" -D\t\tPrint kernel disassembly\n"
" -P\t\tPrint source and disassembly of sampled PC locations\n"
"where\n"
" DEV : i32\n"
" MIN : u64\n"
" LEN : u64\n"
" SEED : u64\n",
stderr);
static void usage() {
fputs("usage: " PROGNAME
" [OPTION]... MIN [SEED]\n"
" -d DEV\tHIP device number\n"
" -n LEN\tLength of random integer array\n"
" -D\t\tPrint kernel disassembly\n"
" -P\t\tPrint source and disassembly of sampled PC locations\n"
"where\n"
" DEV : i32\n"
" MIN : u64\n"
" LEN : u64\n"
" SEED : u64\n",
stderr);
}

static int
get_options(int argc, char **argv, program_options * const opts)
{
int opt;
static int get_options(int argc, char** argv, program_options* const opts) {
int opt;

while (-1 != (opt = getopt(argc, argv, GETOPT_ARGS))) {
switch (opt) {
case 'd':
// TODO error checking
opts->device = strtol(optarg, nullptr, 10);
break;
case 'n':
// TODO error checking
opts->rands_len = strtoul(optarg, nullptr, 10);
break;
case 'D':
opts->disassemble = true;
break;
case 'P':
opts->pc_sampling = true;
break;
default:
usage();
return EXIT_FAILURE;
}
}

auto const optcount = argc - optind;
if (!(1 == optcount || 2 == optcount)) {
while (-1 != (opt = getopt(argc, argv, GETOPT_ARGS))) {
switch (opt) {
case 'd':
// TODO error checking
opts->device = strtol(optarg, nullptr, 10);
break;
case 'n':
// TODO error checking
opts->rands_len = strtoul(optarg, nullptr, 10);
break;
case 'D':
opts->disassemble = true;
break;
case 'P':
opts->pc_sampling = true;
break;
default:
usage();
return EXIT_FAILURE;
}
}

// TODO error checking
opts->gt = strtoul(argv[optind], nullptr, 10);
if (2 == argc - optind) {
opts->seed = strtoull(argv[optind + 1], nullptr, 10);
}
auto const optcount = argc - optind;
if (!(1 == optcount || 2 == optcount)) {
usage();
return EXIT_FAILURE;
}

return EXIT_SUCCESS;
// TODO error checking
opts->gt = strtoul(argv[optind], nullptr, 10);
if (2 == argc - optind) {
opts->seed = strtoull(argv[optind + 1], nullptr, 10);
}

return EXIT_SUCCESS;
}

static program_options g_opts;

static void
callback_flush_fn(
rocprofiler_record_header_t const *record,
rocprofiler_record_header_t const *end_record,
rocprofiler_session_id_t session_id,
rocprofiler_buffer_id_t buffer_id)
{
while (record < end_record) {
if (nullptr == record) {
break;
}
if (ROCPROFILER_PC_SAMPLING_RECORD == record->kind) {
auto const &pcr = (rocprofiler_record_pc_sample_t &)*record;
printf(
"dispatch[%" PRIu64 "] timestamp(%" PRIu64
") gpu_id(%#" PRIx64 ") pc-sample(%#" PRIx64
") se(%" PRIu32 ")\n",
pcr.pc_sample.dispatch_id.value,
pcr.pc_sample.timestamp.value,
pcr.pc_sample.gpu_id.handle,
pcr.pc_sample.pc,
pcr.pc_sample.se);
if (g_opts.pc_sampling) {
disassembly_print_pc_sample_context(pcr.pc_sample.pc);
}
}
rocprofiler_next_record(record, &record, session_id, buffer_id);
static void callback_flush_fn(rocprofiler_record_header_t const* record,
rocprofiler_record_header_t const* end_record,
rocprofiler_session_id_t session_id,
rocprofiler_buffer_id_t buffer_id) {
while (record < end_record) {
if (nullptr == record) {
break;
}
if (ROCPROFILER_PC_SAMPLING_RECORD == record->kind) {
auto const& pcr = (rocprofiler_record_pc_sample_t&)*record;
printf("dispatch[%" PRIu64 "] timestamp(%" PRIu64 ") gpu_id(%#" PRIx64 ") pc-sample(%#" PRIx64
") se(%" PRIu32 ")\n",
pcr.pc_sample.dispatch_id.value, pcr.pc_sample.timestamp.value,
pcr.pc_sample.gpu_id.handle, pcr.pc_sample.pc, pcr.pc_sample.se);
if (g_opts.pc_sampling) {
disassembly_print_pc_sample_context(pcr.pc_sample.pc);
}
}
rocprofiler_next_record(record, &record, session_id, buffer_id);
}
}

static int
run_kernel(program_options const &opts)
{
rocprofiler_session_id_t sid;
rocprofiler_filter_id_t fid, fid2;
rocprofiler_buffer_id_t bid;
auto rocprofiler_ok = ROCPROFILER_STATUS_SUCCESS;
static int run_kernel(program_options const& opts) {
rocprofiler_session_id_t sid;
rocprofiler_filter_id_t fid, fid2;
rocprofiler_buffer_id_t bid;
auto rocprofiler_ok = ROCPROFILER_STATUS_SUCCESS;

if (opts.pc_sampling) {
ROCPROFILER_CHECK(
rocprofiler_create_session(ROCPROFILER_NONE_REPLAY_MODE, &sid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
fputs("error: failed to create rocprofiler session\n", stderr);
return EXIT_FAILURE;
}

rocprofiler_filter_property_t property{};

ROCPROFILER_CHECK(
rocprofiler_create_buffer(
sid, callback_flush_fn, static_cast<size_t>(0x1000), &bid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
fputs("error: failed to add PC sampling session mode\n", stderr);
goto out;
}

ROCPROFILER_CHECK(
rocprofiler_create_filter(
sid, ROCPROFILER_PC_SAMPLING_COLLECTION,
rocprofiler_filter_data_t{},
0, &fid, property),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}

ROCPROFILER_CHECK(
rocprofiler_create_filter(
sid, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION,
rocprofiler_filter_data_t{},
0, &fid2, property),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}

ROCPROFILER_CHECK(
rocprofiler_set_filter_buffer(sid, fid, bid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}

ROCPROFILER_CHECK(
rocprofiler_set_filter_buffer(sid, fid2, bid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}

ROCPROFILER_CHECK(
rocprofiler_start_session(sid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
if (opts.pc_sampling) {
ROCPROFILER_CHECK(rocprofiler_create_session(ROCPROFILER_NONE_REPLAY_MODE, &sid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
fputs("error: failed to create rocprofiler session\n", stderr);
return EXIT_FAILURE;
}

{
rocprofiler_filter_property_t property{};

ROCPROFILER_CHECK(
rocprofiler_create_buffer(sid, callback_flush_fn, static_cast<size_t>(0x1000), &bid),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
fputs("error: failed to add PC sampling session mode\n", stderr);
goto out;
}

ROCPROFILER_CHECK(rocprofiler_create_filter(sid, ROCPROFILER_PC_SAMPLING_COLLECTION,
rocprofiler_filter_data_t{}, 0, &fid, property),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}

ROCPROFILER_CHECK(rocprofiler_create_filter(sid, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION,
rocprofiler_filter_data_t{}, 0, &fid2, property),
rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}

ROCPROFILER_CHECK(rocprofiler_set_filter_buffer(sid, fid, bid), rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}

ROCPROFILER_CHECK(rocprofiler_set_filter_buffer(sid, fid2, bid), rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}

ROCPROFILER_CHECK(rocprofiler_start_session(sid), rocprofiler_ok);
if (ROCPROFILER_STATUS_SUCCESS != rocprofiler_ok) {
goto cleanup;
}
}

{
printf("seed = %" PRIu64 "\n", opts.seed);

std::vector<uint64_t> rands(opts.rands_len);
using rands_elt_t = decltype(rands)::value_type;

uint64_t
sm64_state = opts.seed,
xrs_state[4];
uint64_t sm64_state = opts.seed, xrs_state[4];

{
using prng::splitmix64_next;
using prng::xrs_next;
using prng::splitmix64_next;
using prng::xrs_next;

// Initialize the Xoroshiro PRNG
xrs_state[0] = splitmix64_next(&sm64_state);
xrs_state[1] = splitmix64_next(&sm64_state);
xrs_state[2] = splitmix64_next(&sm64_state);
xrs_state[3] = splitmix64_next(&sm64_state);
// Initialize the Xoroshiro PRNG
xrs_state[0] = splitmix64_next(&sm64_state);
xrs_state[1] = splitmix64_next(&sm64_state);
xrs_state[2] = splitmix64_next(&sm64_state);
xrs_state[3] = splitmix64_next(&sm64_state);

// Fill rands with random integers
for (auto &i : rands) {
i = xrs_next(xrs_state);
}
// Fill rands with random integers
for (auto& i : rands) {
i = xrs_next(xrs_state);
}
}

struct tm {
using monoclk = std::chrono::steady_clock;
using dur = std::chrono::duration<double>;
using monoclk = std::chrono::steady_clock;
using dur = std::chrono::duration<double>;
};

using util::hipMalloc_freer;
@@ -322,126 +275,109 @@ run_kernel(program_options const &opts)

auto hip_ok = hipSuccess;
do {
HIP_CHECK_BREAK(hipSetDevice(opts.device), hip_ok);
HIP_CHECK_BREAK(hipSetDevice(opts.device), hip_ok);

auto const rands_nbytes = rands.size() * sizeof(rands_elt_t);
std::unique_ptr<rands_elt_t, hipMalloc_freer> rands_gpu;
{
rands_elt_t *rands_gpu_ptr;
HIP_CHECK_BREAK(hipMalloc(&rands_gpu_ptr, rands_nbytes), hip_ok);
rands_gpu.reset(rands_gpu_ptr);
}
auto const rands_nbytes = rands.size() * sizeof(rands_elt_t);
std::unique_ptr<rands_elt_t, hipMalloc_freer> rands_gpu;
{
rands_elt_t* rands_gpu_ptr;
HIP_CHECK_BREAK(hipMalloc(&rands_gpu_ptr, rands_nbytes), hip_ok);
rands_gpu.reset(rands_gpu_ptr);
}

HIP_CHECK_BREAK(
hipMemcpy(rands_gpu.get(), rands.data(), rands_nbytes,
hipMemcpyHostToDevice),
hip_ok);
(void)hipDeviceSynchronize();
HIP_CHECK_BREAK(hipMemcpy(rands_gpu.get(), rands.data(), rands_nbytes, hipMemcpyHostToDevice),
hip_ok);
(void)hipDeviceSynchronize();

uint32_t constexpr nthreads = 256U;
uint32_t const nblocks = (rands.size() + nthreads - 1) / nthreads;
uint32_t constexpr nthreads = 256U;
uint32_t const nblocks = (rands.size() + nthreads - 1) / nthreads;

using count_elt_t = size_t;
using count_elt_t = size_t;

auto const count_subtotals_nbytes = nblocks * sizeof(count_elt_t);
std::unique_ptr<count_elt_t, hipMalloc_freer> count_subtotals_gpu;
{
count_elt_t *count_subtotals_gpu_ptr;
HIP_CHECK_BREAK(
hipMalloc(&count_subtotals_gpu_ptr, count_subtotals_nbytes),
hip_ok);
count_subtotals_gpu.reset(count_subtotals_gpu_ptr);
}
auto const count_subtotals_nbytes = nblocks * sizeof(count_elt_t);
std::unique_ptr<count_elt_t, hipMalloc_freer> count_subtotals_gpu;
{
count_elt_t* count_subtotals_gpu_ptr;
HIP_CHECK_BREAK(hipMalloc(&count_subtotals_gpu_ptr, count_subtotals_nbytes), hip_ok);
count_subtotals_gpu.reset(count_subtotals_gpu_ptr);
}

hipLaunchKernelGGL(
kernel::memset_gpu, nblocks, nthreads, 0, 0,
count_subtotals_gpu.get(), 0UL, static_cast<size_t>(nblocks));
HIP_CHECK_BREAK(hipGetLastError(), hip_ok);
(void)hipDeviceSynchronize();
hipLaunchKernelGGL(kernel::memset_gpu, nblocks, nthreads, 0, 0, count_subtotals_gpu.get(),
0UL, static_cast<size_t>(nblocks));
HIP_CHECK_BREAK(hipGetLastError(), hip_ok);
(void)hipDeviceSynchronize();

auto const kernel_begin_time = tm::monoclk::now();
auto const kernel_begin_time = tm::monoclk::now();

hipLaunchKernelGGL(
kernel::count_gpu, nblocks, nthreads, 0, 0,
rands_gpu.get(), count_subtotals_gpu.get(), rands.size(),
static_cast<size_t>(nblocks), opts.gt);
HIP_CHECK_BREAK(hipGetLastError(), hip_ok);
(void)hipDeviceSynchronize();
hipLaunchKernelGGL(kernel::count_gpu, nblocks, nthreads, 0, 0, rands_gpu.get(),
count_subtotals_gpu.get(), rands.size(), static_cast<size_t>(nblocks),
opts.gt);
HIP_CHECK_BREAK(hipGetLastError(), hip_ok);
(void)hipDeviceSynchronize();

auto const kernel_end_time = tm::monoclk::now();
auto const kernel_end_time = tm::monoclk::now();

std::vector<size_t> count_subtotals(nblocks);
HIP_CHECK_BREAK(
hipMemcpy(count_subtotals.data(), count_subtotals_gpu.get(),
count_subtotals_nbytes, hipMemcpyDeviceToHost),
hip_ok);
(void)hipDeviceSynchronize();
std::vector<size_t> count_subtotals(nblocks);
HIP_CHECK_BREAK(hipMemcpy(count_subtotals.data(), count_subtotals_gpu.get(),
count_subtotals_nbytes, hipMemcpyDeviceToHost),
hip_ok);
(void)hipDeviceSynchronize();

// TODO parallel sum on GPU
auto const total =
std::accumulate(
count_subtotals.cbegin(), count_subtotals.cend(),
static_cast<size_t>(0));
// TODO parallel sum on GPU
auto const total =
std::accumulate(count_subtotals.cbegin(), count_subtotals.cend(), static_cast<size_t>(0));

auto const all_end_time = tm::monoclk::now();
auto const all_end_time = tm::monoclk::now();

tm::dur const kernel_time(kernel_end_time - kernel_begin_time);
auto total_time(all_end_time - begin_time);
tm::dur const total_time_without_tool_init(total_time);
printf("len(rands) = %zu; gt = %zu; count(rands, gt) = %zu\n"
"main kernel time elapsed: %" DBL_FMT "\n"
"full time elapsed: %" DBL_FMT "\n",
rands.size(), opts.gt, total,
kernel_time.count(),
total_time_without_tool_init.count());
tm::dur const kernel_time(kernel_end_time - kernel_begin_time);
auto total_time(all_end_time - begin_time);
tm::dur const total_time_without_tool_init(total_time);
printf(
"len(rands) = %zu; gt = %zu; count(rands, gt) = %zu\n"
"main kernel time elapsed: %" DBL_FMT
"\n"
"full time elapsed: %" DBL_FMT "\n",
rands.size(), opts.gt, total, kernel_time.count(), total_time_without_tool_init.count());
} while (false);

if (opts.disassemble) {
disassembly_disassemble_kernels(false);
}
disassembly_disassemble_kernels(false);
}
}

cleanup:
if (opts.pc_sampling) {
rocprofiler_terminate_session(sid);
rocprofiler_flush_data(sid, bid);
rocprofiler_destroy_session(sid);
}
if (opts.pc_sampling) {
rocprofiler_terminate_session(sid);
rocprofiler_flush_data(sid, bid);
rocprofiler_destroy_session(sid);
}

out:
return ROCPROFILER_STATUS_SUCCESS == rocprofiler_ok
? EXIT_SUCCESS
: EXIT_FAILURE;
return ROCPROFILER_STATUS_SUCCESS == rocprofiler_ok ? EXIT_SUCCESS : EXIT_FAILURE;
}

int
main(int argc, char **argv)
{
if (auto const ret = get_options(argc, argv, &g_opts);
EXIT_SUCCESS != ret)
{
return ret;
}
int main(int argc, char** argv) {
if (auto const ret = get_options(argc, argv, &g_opts); EXIT_SUCCESS != ret) {
return ret;
}

if (hsa_init() != HSA_STATUS_SUCCESS){
return EXIT_FAILURE;
}
if (hsa_init() != HSA_STATUS_SUCCESS) {
return EXIT_FAILURE;
}

int ret = EXIT_FAILURE;
auto ok = ROCPROFILER_STATUS_SUCCESS;
int ret = EXIT_FAILURE;
auto ok = ROCPROFILER_STATUS_SUCCESS;

ROCPROFILER_CHECK(rocprofiler_initialize(), ok);
if (ROCPROFILER_STATUS_SUCCESS == ok) {
ret = run_kernel(g_opts);
} else {
goto out;
}
ROCPROFILER_CHECK(rocprofiler_initialize(), ok);
if (ROCPROFILER_STATUS_SUCCESS == ok) {
ret = run_kernel(g_opts);
} else {
goto out;
}

rocprofiler_finalize();
rocprofiler_finalize();

out:
hsa_shut_down();
return ROCPROFILER_STATUS_SUCCESS == ok && EXIT_FAILURE != ret
? EXIT_SUCCESS
: EXIT_FAILURE;
hsa_shut_down();
return ROCPROFILER_STATUS_SUCCESS == ok && EXIT_FAILURE != ret ? EXIT_SUCCESS : EXIT_FAILURE;
}

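
`get_options` in the file above leaves several `// TODO error checking` notes next to its `strtol` / `strtoul` / `strtoull` calls. One possible way to close them, assuming that rejecting malformed input is the desired behaviour, is a small wrapper that checks `errno`, the end pointer, and the first character. `parse_u64` is a hypothetical helper and not part of the sample.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: parse a base-10 unsigned 64-bit value.
// Rejects empty input, leading whitespace or sign, trailing characters, and overflow.
static bool parse_u64(char const* text, uint64_t* out) {
  if (text == nullptr || *text == '\0') {
    return false;
  }
  // strtoull itself accepts leading whitespace and '-', so require a leading digit.
  if (*text < '0' || *text > '9') {
    return false;
  }
  errno = 0;
  char* end = nullptr;
  unsigned long long const value = std::strtoull(text, &end, 10);
  if (errno == ERANGE || end == text || *end != '\0') {
    return false;
  }
  *out = static_cast<uint64_t>(value);
  return true;
}
```

In `get_options`, a failed parse could then call `usage()` and return `EXIT_FAILURE`, mirroring what the `default:` branch already does.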
@@ -23,32 +23,30 @@

#define PROGNAME "code_printing_sample"

#define HIP_ERROR(code) \
do { \
fprintf(stderr, \
PROGNAME ": Assertion failed at %s:%d, HIP error: %s\n", \
__FILE__, __LINE__, hipGetErrorString((code))); \
fflush(stderr); \
} while (false);
#define HIP_ERROR(code) \
do { \
fprintf(stderr, PROGNAME ": Assertion failed at %s:%d, HIP error: %s\n", __FILE__, __LINE__, \
hipGetErrorString((code))); \
fflush(stderr); \
} while (false);

#define HIP_CHECK_BREAK(expr, var) \
if (auto const code = (expr); hipSuccess != code) { \
HIP_ERROR(code); \
(var) = code; \
break; \
}
#define HIP_CHECK_BREAK(expr, var) \
if (auto const code = (expr); hipSuccess != code) { \
HIP_ERROR(code); \
(var) = code; \
break; \
}

#define ROCPROFILER_ERROR(code) \
do { \
fprintf(stderr, \
PROGNAME ": Assertion failed at %s:%d, ROCProfiler error: %s\n", \
__FILE__, __LINE__, rocprofiler_error_str(code)); \
fflush(stderr); \
} while (false);
#define ROCPROFILER_ERROR(code) \
do { \
fprintf(stderr, PROGNAME ": Assertion failed at %s:%d, ROCProfiler error: %s\n", __FILE__, \
__LINE__, rocprofiler_error_str(code)); \
fflush(stderr); \
} while (false);

#define ROCPROFILER_CHECK(expr, var) \
if ((var) = (expr); ROCPROFILER_STATUS_SUCCESS != (var)) { \
ROCPROFILER_ERROR((var)); \
}
#define ROCPROFILER_CHECK(expr, var) \
if ((var) = (expr); ROCPROFILER_STATUS_SUCCESS != (var)) { \
ROCPROFILER_ERROR((var)); \
}

#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_PROGRAM_HPP_
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_PROGRAM_HPP_

+18
-19
@@ -25,25 +25,24 @@
#include <cstdint>

struct program_options {
program_options()
: device(0)
, no_gpu(false)
, hip_memset(false)
, rands_len(1024 * 1024 * 4)
, gt(0)
, seed(std::chrono::steady_clock::now().time_since_epoch().count())
, disassemble(false)
, pc_sampling(false)
{}
program_options()
: device(0),
no_gpu(false),
hip_memset(false),
rands_len(1024 * 1024 * 4),
gt(0),
seed(std::chrono::steady_clock::now().time_since_epoch().count()),
disassemble(false),
pc_sampling(false) {}

int device;
bool no_gpu;
bool hip_memset;
size_t rands_len;
uint64_t gt;
uint64_t seed;
bool disassemble;
bool pc_sampling;
int device;
bool no_gpu;
bool hip_memset;
size_t rands_len;
uint64_t gt;
uint64_t seed;
bool disassemble;
bool pc_sampling;
};

#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_PROGRAM_OPTIONS_HPP_
#endif // SAMPLES_PCSAMPLER_CODE_PRINTING_SAMPLE_PROGRAM_OPTIONS_HPP_

@@ -23,8 +23,8 @@ int main(int argc, char** argv) {

int gpu_agent = 0;
int cpu_agent = 0;
CHECK_ROCPROFILER(rocprofiler_device_profiling_session_create(&counters[0], counters.size(),
&dp_session_id, gpu_agent, cpu_agent));
CHECK_ROCPROFILER(rocprofiler_device_profiling_session_create(
&counters[0], counters.size(), &dp_session_id, gpu_agent, cpu_agent));

printf("session start \n");
// start GPU device profiling

@@ -25,9 +25,10 @@ int main(int argc, char** argv) {
counters.emplace_back("GRBM_COUNT");
rocprofiler_filter_id_t filter_id;
[[maybe_unused]] rocprofiler_filter_property_t property = {};
CHECK_ROCPROFILER(rocprofiler_create_filter(session_id, ROCPROFILER_COUNTERS_COLLECTION,
rocprofiler_filter_data_t{.counters_names = &counters[0]},
counters.size(), &filter_id, property));
CHECK_ROCPROFILER(
rocprofiler_create_filter(session_id, ROCPROFILER_COUNTERS_COLLECTION,
rocprofiler_filter_data_t{.counters_names = &counters[0]},
counters.size(), &filter_id, property));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, filter_id, buffer_id));

// Normal HIP Calls

@@ -40,9 +40,9 @@ int main(int argc, char** argv) {

// Kernel Tracing
rocprofiler_filter_id_t kernel_tracing_filter_id;
CHECK_ROCPROFILER(rocprofiler_create_filter(session_id, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION,
rocprofiler_filter_data_t{}, 0, &kernel_tracing_filter_id,
rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_create_filter(
session_id, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION, rocprofiler_filter_data_t{}, 0,
&kernel_tracing_filter_id, rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, kernel_tracing_filter_id, buffer_id));

// Normal HIP Calls won't be traced

@@ -35,9 +35,9 @@ int main(int argc, char** argv) {

// Kernel Tracing
rocprofiler_filter_id_t kernel_tracing_filter_id;
CHECK_ROCPROFILER(rocprofiler_create_filter(session_id, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION,
rocprofiler_filter_data_t{}, 0, &kernel_tracing_filter_id,
rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_create_filter(
session_id, ROCPROFILER_DISPATCH_TIMESTAMPS_COLLECTION, rocprofiler_filter_data_t{}, 0,
&kernel_tracing_filter_id, rocprofiler_filter_property_t{}));
CHECK_ROCPROFILER(rocprofiler_set_filter_buffer(session_id, kernel_tracing_filter_id, buffer_id));

// Normal HIP Calls won't be traced

@@ -1,25 +1,34 @@
# ############################################################################################################################################
# ROCProfiler General Requirements
# ############################################################################################################################################
find_package(Python3 COMPONENTS Interpreter REQUIRED)
find_package(
Python3
COMPONENTS Interpreter
REQUIRED)

execute_process(COMMAND ${Python3_EXECUTABLE} -c "import lxml"
RESULT_VARIABLE CPP_HEADER_PARSER
OUTPUT_QUIET)
execute_process(
COMMAND ${Python3_EXECUTABLE} -c "import lxml"
RESULT_VARIABLE CPP_HEADER_PARSER
OUTPUT_QUIET)

if(NOT ${CPP_HEADER_PARSER} EQUAL 0)
message(FATAL_ERROR "\
message(
FATAL_ERROR
"\
The \"lxml\" Python3 package is not installed. \
Please install it using the following command: \"${Python3_EXECUTABLE} -m pip install lxml\".\
")
endif()

execute_process(COMMAND ${Python3_EXECUTABLE} -c "import CppHeaderParser"
RESULT_VARIABLE CPP_HEADER_PARSER
OUTPUT_QUIET)
execute_process(
COMMAND ${Python3_EXECUTABLE} -c "import CppHeaderParser"
RESULT_VARIABLE CPP_HEADER_PARSER
OUTPUT_QUIET)

if(NOT ${CPP_HEADER_PARSER} EQUAL 0)
message(FATAL_ERROR "\
message(
FATAL_ERROR
"\
The \"CppHeaderParser\" Python3 package is not installed. \
Please install it using the following command: \"${Python3_EXECUTABLE} -m pip install CppHeaderParser\".\
")
@@ -29,134 +38,157 @@ endif()
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${PROJECT_BINARY_DIR})

# Getting HSA Include Directory
get_property(HSA_RUNTIME_INCLUDE_DIRECTORIES TARGET hsa-runtime64::hsa-runtime64 PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH
REQUIRED)
get_property(
HSA_RUNTIME_INCLUDE_DIRECTORIES
TARGET hsa-runtime64::hsa-runtime64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HSA_H hsa.h
PATHS ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hsa
NO_DEFAULT_PATH REQUIRED)
get_filename_component(HSA_RUNTIME_INC_PATH ${HSA_H} DIRECTORY)

find_library(AQLPROFILE_LIB "libhsa-amd-aqlprofile64.so" HINTS ${CMAKE_PREFIX_PATH} PATHS ${ROCM_PATH} PATH_SUFFIXES lib)
find_library(
AQLPROFILE_LIB "libhsa-amd-aqlprofile64.so"
HINTS ${CMAKE_PREFIX_PATH}
PATHS ${ROCM_PATH}
PATH_SUFFIXES lib)

if(NOT AQLPROFILE_LIB)
message(FATAL_ERROR "AQL_PROFILE not installed. Please install hsa-amd-aqlprofile!")
message(FATAL_ERROR "AQL_PROFILE not installed. Please install hsa-amd-aqlprofile!")
endif()

# ############################################################################################################################################
# ########################################################################################
# Adding Old Library Files
# ############################################################################################################################################
set (OLD_LIB_SRC
${LIB_DIR}/core/rocprofiler.cpp
${LIB_DIR}/core/gpu_command.cpp
${LIB_DIR}/core/proxy_queue.cpp
${LIB_DIR}/core/simple_proxy_queue.cpp
${LIB_DIR}/core/intercept_queue.cpp
${LIB_DIR}/core/metrics.cpp
${LIB_DIR}/core/activity.cpp
${LIB_DIR}/util/hsa_rsrc_factory.cpp
)
# ########################################################################################
set(OLD_LIB_SRC
${LIB_DIR}/core/rocprofiler.cpp
${LIB_DIR}/core/gpu_command.cpp
${LIB_DIR}/core/proxy_queue.cpp
${LIB_DIR}/core/simple_proxy_queue.cpp
${LIB_DIR}/core/intercept_queue.cpp
${LIB_DIR}/core/metrics.cpp
${LIB_DIR}/core/activity.cpp
${LIB_DIR}/util/hsa_rsrc_factory.cpp)

# ############################################################################################################################################
# ########################################################################################
# Configuring Basic/Derived Counters
# ############################################################################################################################################
# ########################################################################################
set(COUNTERS_DIR ${PROJECT_SOURCE_DIR}/src/core/counters)

execute_process(
COMMAND ${Python3_EXECUTABLE} ${COUNTERS_DIR}/basic/xml_parser_basic.py ${COUNTERS_DIR}/basic ${CMAKE_CURRENT_BINARY_DIR}/basic_counter.cpp
COMMENT "Generating basic_counter.cpp...")
COMMAND
${Python3_EXECUTABLE} ${COUNTERS_DIR}/basic/xml_parser_basic.py
${COUNTERS_DIR}/basic ${CMAKE_CURRENT_BINARY_DIR}/basic_counter.cpp COMMENT
"Generating basic_counter.cpp...")

# execute_process(
# COMMAND ${Python3_EXECUTABLE} ${COUNTERS_DIR}/derived/xml_parser_derived.py ${COUNTERS_DIR}/derived ${CMAKE_CURRENT_BINARY_DIR}/derived_counter.cpp
# COMMENT "Generating derived_counter.cpp...")
# execute_process( COMMAND ${Python3_EXECUTABLE}
# ${COUNTERS_DIR}/derived/xml_parser_derived.py ${COUNTERS_DIR}/derived
# ${CMAKE_CURRENT_BINARY_DIR}/derived_counter.cpp COMMENT "Generating
# derived_counter.cpp...")
|
||||
# ############################################################################################################################################
|
||||
# ########################################################################################
|
||||
# ROCProfiler Tracer HIP/HSA Parsing
|
||||
# ############################################################################################################################################
|
||||
get_property(HIP_INCLUDE_DIRECTORIES TARGET hip::amdhip64 PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
|
||||
find_file(HIP_RUNTIME_API_H hip_runtime_api.h
|
||||
PATHS ${HIP_INCLUDE_DIRECTORIES}
|
||||
PATH_SUFFIXES hip
|
||||
NO_DEFAULT_PATH
|
||||
REQUIRED)
|
||||
# ########################################################################################
|
||||
get_property(
|
||||
HIP_INCLUDE_DIRECTORIES
|
||||
TARGET hip::amdhip64
|
||||
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
|
||||
find_file(
|
||||
HIP_RUNTIME_API_H hip_runtime_api.h
|
||||
PATHS ${HIP_INCLUDE_DIRECTORIES}
|
||||
PATH_SUFFIXES hip
|
||||
NO_DEFAULT_PATH REQUIRED)
|
||||
|
||||
# # Generate the HSA wrapper functions header
|
||||
add_custom_command(
OUTPUT hsa_prof_str.h hsa_prof_str.inline.h
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/hsaap.py ${CMAKE_CURRENT_BINARY_DIR} "${HSA_RUNTIME_INC_PATH}" > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/hsaap.py
"${HSA_RUNTIME_INC_PATH}/hsa.h" "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
"${HSA_RUNTIME_INC_PATH}/hsa_ext_image.h" "${HSA_RUNTIME_INC_PATH}/hsa_api_trace.h"
COMMENT "Generating hsa_prof_str.h,hsa_prof_str.inline.h...")
OUTPUT hsa_prof_str.h hsa_prof_str.inline.h
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/hsaap.py
${CMAKE_CURRENT_BINARY_DIR} "${HSA_RUNTIME_INC_PATH}" > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/hsaap.py
"${HSA_RUNTIME_INC_PATH}/hsa.h"
"${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
"${HSA_RUNTIME_INC_PATH}/hsa_ext_image.h"
"${HSA_RUNTIME_INC_PATH}/hsa_api_trace.h"
COMMENT "Generating hsa_prof_str.h,hsa_prof_str.inline.h...")

# # Generate the HSA pretty printers
add_custom_command(
OUTPUT hsa_ostream_ops.h
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa.h" -o hsa.h.i
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h" -o hsa_ext_amd.h.i
BYPRODUCTS hsa.h.i hsa_ext_amd.h.i
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py
-in hsa.h.i,hsa_ext_amd.h.i -out hsa_ostream_ops.h > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py
"${HSA_RUNTIME_INC_PATH}/hsa.h" "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
COMMENT "Generating hsa_ostream_ops.h...")
OUTPUT hsa_ostream_ops.h
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa.h" -o hsa.h.i
COMMAND ${CMAKE_C_COMPILER} -E "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h" -o
hsa_ext_amd.h.i
BYPRODUCTS hsa.h.i hsa_ext_amd.h.i
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py -in
hsa.h.i,hsa_ext_amd.h.i -out hsa_ostream_ops.h > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py
"${HSA_RUNTIME_INC_PATH}/hsa.h" "${HSA_RUNTIME_INC_PATH}/hsa_ext_amd.h"
COMMENT "Generating hsa_ostream_ops.h...")

get_property(HIP_INCLUDE_DIRECTORIES TARGET hip::amdhip64 PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hip
NO_DEFAULT_PATH
REQUIRED)
get_property(
HIP_INCLUDE_DIRECTORIES
TARGET hip::amdhip64
PROPERTY INTERFACE_INCLUDE_DIRECTORIES)
find_file(
HIP_RUNTIME_API_H hip_runtime_api.h
PATHS ${HIP_INCLUDE_DIRECTORIES}
PATH_SUFFIXES hip
NO_DEFAULT_PATH REQUIRED)

## Generate the HIP pretty printers
# Generate the HIP pretty printers
add_custom_command(
OUTPUT hip_ostream_ops.h
COMMAND ${CMAKE_C_COMPILER} "$<$<BOOL:${HIP_INCLUDE_DIRECTORIES}>:-I$<JOIN:${HIP_INCLUDE_DIRECTORIES},$<SEMICOLON>-I>>"
-E "${HIP_RUNTIME_API_H}" -D__HIP_PLATFORM_HCC__=1 -D__HIP_ROCclr__=1 -o hip_runtime_api.h.i
BYPRODUCTS hip_runtime_api.h.i
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py
-in hip_runtime_api.h.i -out hip_ostream_ops.h > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py "${HIP_RUNTIME_API_H}"
COMMENT "Generating hip_ostream_ops.h..."
COMMAND_EXPAND_LISTS)
OUTPUT hip_ostream_ops.h
COMMAND
${CMAKE_C_COMPILER}
"$<$<BOOL:${HIP_INCLUDE_DIRECTORIES}>:-I$<JOIN:${HIP_INCLUDE_DIRECTORIES},$<SEMICOLON>-I>>"
-E "${HIP_RUNTIME_API_H}" -D__HIP_PLATFORM_HCC__=1 -D__HIP_ROCclr__=1 -o
hip_runtime_api.h.i
BYPRODUCTS hip_runtime_api.h.i
COMMAND ${Python3_EXECUTABLE} ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py -in
hip_runtime_api.h.i -out hip_ostream_ops.h > /dev/null
DEPENDS ${PROJECT_SOURCE_DIR}/script/gen_ostream_ops.py "${HIP_RUNTIME_API_H}"
COMMENT "Generating hip_ostream_ops.h..."
COMMAND_EXPAND_LISTS)

set(GENERATED_SOURCES
hip_ostream_ops.h
hsa_prof_str.h
hsa_ostream_ops.h
hsa_prof_str.inline.h)
set(GENERATED_SOURCES hip_ostream_ops.h hsa_prof_str.h hsa_ostream_ops.h
hsa_prof_str.inline.h)

# ############################################################################################################################################
# ########################################################################################
# ROCProfiler API
# ############################################################################################################################################
# PC sampling uses libpciaccess as a fallback if the debugfs ioctl is
# unavailable
# ########################################################################################
# PC sampling uses libpciaccess as a fallback if the debugfs ioctl is unavailable
find_path(PCIACCESS_INCLUDE_DIR pciaccess.h REQUIRED)
find_library(PCIACCESS_LIBRARIES pciaccess REQUIRED)

set(PUBLIC_HEADERS rocprofiler.h)

foreach(header ${PUBLIC_HEADERS})
install(FILES ${PROJECT_SOURCE_DIR}/include/rocprofiler/${header}
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}
COMPONENT dev)
endforeach()

install(DIRECTORY ${PROJECT_SOURCE_DIR}/include/rocprofiler/v2
install(
FILES ${PROJECT_SOURCE_DIR}/include/rocprofiler/${header}
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}
COMPONENT dev)
endforeach()

install(
DIRECTORY ${PROJECT_SOURCE_DIR}/include/rocprofiler/v2
DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}/${PROJECT_NAME}
COMPONENT dev)

# Getting Source files for ROCProfiler, Hardware, HSA, Memory, Session, Counters, Utils
file(GLOB ROCPROFILER_SRC_FILES ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp)

file(GLOB ROCPROFILER_PROFILER_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/profiler/profiler.cpp)
file(GLOB ROCPROFILER_TRACER_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/tracer/*.cpp)
file(GLOB ROCPROFILER_ROCTRACER_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/tracer/src/*.cpp)
file(GLOB ROCPROFILER_PROFILER_SRC_FILES
${PROJECT_SOURCE_DIR}/src/core/session/profiler/profiler.cpp)
file(GLOB ROCPROFILER_TRACER_SRC_FILES
${PROJECT_SOURCE_DIR}/src/core/session/tracer/*.cpp)
file(GLOB ROCPROFILER_ROCTRACER_SRC_FILES
${PROJECT_SOURCE_DIR}/src/core/session/tracer/src/*.cpp)
file(GLOB ROCPROFILER_ATT_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/att/att.cpp)
file(GLOB ROCPROFILER_CLASS_SRC_FILES ${CMAKE_CURRENT_SOURCE_DIR}/rocprofiler_singleton.cpp)
file(GLOB ROCPROFILER_CLASS_SRC_FILES
${CMAKE_CURRENT_SOURCE_DIR}/rocprofiler_singleton.cpp)
file(GLOB ROCPROFILER_SPM_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/session/spm/spm.cpp)


set(CORE_HARDWARE_DIR ${PROJECT_SOURCE_DIR}/src/core/hardware)
file(GLOB CORE_HARDWARE_SRC_FILES ${CORE_HARDWARE_DIR}/*.cpp)

@@ -180,148 +212,202 @@ file(GLOB CORE_COUNTERS_SAMPLER_SRC_FILES ${CORE_SESSION_DIR}/counters_sampler.c

file(GLOB CORE_COUNTERS_SRC_FILES ${PROJECT_BINARY_DIR}/src/api/*_counter.cpp)
file(GLOB CORE_COUNTERS_PARENT_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/counters/*.cpp)
file(GLOB CORE_COUNTERS_METRICS_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/counters/metrics/*.cpp)
file(GLOB CORE_COUNTERS_METRICS_SRC_FILES
${PROJECT_SOURCE_DIR}/src/core/counters/metrics/*.cpp)
file(GLOB CORE_COUNTERS_MMIO_SRC_FILES ${PROJECT_SOURCE_DIR}/src/core/counters/mmio/*.cpp)

set(CORE_UTILS_DIR ${PROJECT_SOURCE_DIR}/src/utils)
file(GLOB CORE_UTILS_SRC_FILES ${CORE_UTILS_DIR}/*.cpp)

set(CORE_PC_SAMPLING_DIR ${PROJECT_SOURCE_DIR}/src/pcsampler)
file(GLOB CORE_PC_SAMPLING_FILES ${CORE_PC_SAMPLING_DIR}/core/*.cpp ${CORE_PC_SAMPLING_DIR}/gfxip/*.cpp ${CORE_PC_SAMPLING_DIR}/session/*.cpp)
file(GLOB CORE_PC_SAMPLING_FILES ${CORE_PC_SAMPLING_DIR}/core/*.cpp
${CORE_PC_SAMPLING_DIR}/gfxip/*.cpp ${CORE_PC_SAMPLING_DIR}/session/*.cpp)


#### V1 Library
# Compiling/Installing ROCProfiler API V1
# V1 Library Compiling/Installing ROCProfiler API V1
add_library(${ROCPROFILER_TARGET} SHARED ${OLD_LIB_SRC})
set_target_properties(${ROCPROFILER_TARGET} PROPERTIES
CXX_VISIBILITY_PRESET hidden
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib
VERSION 1.0.0
SOVERSION 1)
set_target_properties(
${ROCPROFILER_TARGET}
PROPERTIES CXX_VISIBILITY_PRESET hidden
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib
VERSION 1.0.0
SOVERSION 1)

# As ROCR hsa_api_trace header file is not usable unless AMD_INTERNAL_BUILD is defined
target_compile_definitions(${ROCPROFILER_TARGET} PUBLIC AMD_INTERNAL_BUILD)
target_include_directories(${ROCPROFILER_TARGET}
PUBLIC
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include/rocprofiler>
PRIVATE
${LIB_DIR} ${ROOT_DIR}
${PROJECT_SOURCE_DIR}/include/rocprofiler)
target_link_libraries(${ROCPROFILER_TARGET} PRIVATE ${AQLPROFILE_LIB} hsa-runtime64::hsa-runtime64 c stdc++)
target_include_directories(
${ROCPROFILER_TARGET}
PUBLIC $<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include/rocprofiler>
PRIVATE ${LIB_DIR} ${ROOT_DIR} ${PROJECT_SOURCE_DIR}/include/rocprofiler)
target_link_libraries(${ROCPROFILER_TARGET} PRIVATE ${AQLPROFILE_LIB}
hsa-runtime64::hsa-runtime64 c stdc++)

get_target_property(ROCPROFILER_LIBRARY_V1_NAME ${ROCPROFILER_TARGET} NAME)
get_target_property(ROCPROFILER_LIBRARY_V1_VERSION ${ROCPROFILER_TARGET} VERSION)
get_target_property(ROCPROFILER_LIBRARY_V1_SOVERSION ${ROCPROFILER_TARGET} SOVERSION)

## Install libraries: Non versioned lib file in dev package
## Skipping NameLink as it will be installed using symlinks
install ( TARGETS ${ROCPROFILER_TARGET} LIBRARY NAMELINK_SKIP DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT runtime)
install ( TARGETS ${ROCPROFILER_TARGET} LIBRARY NAMELINK_SKIP DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT asan)
# Install libraries: Non versioned lib file in dev package Skipping NameLink as it will be
# installed using symlinks
install(
TARGETS ${ROCPROFILER_TARGET}
LIBRARY NAMELINK_SKIP
DESTINATION ${CMAKE_INSTALL_LIBDIR}
COMPONENT runtime)
install(
TARGETS ${ROCPROFILER_TARGET}
LIBRARY NAMELINK_SKIP
DESTINATION ${CMAKE_INSTALL_LIBDIR}
COMPONENT asan)

#### V2 Library
# Compiling/Installing ROCProfiler API
add_library(rocprofiler-v2 SHARED
${ROCPROFILER_SRC_FILES}
${ROCPROFILER_CLASS_SRC_FILES}
${ROCPROFILER_PROFILER_SRC_FILES}
${ROCPROFILER_ATT_SRC_FILES}
${CORE_HARDWARE_SRC_FILES}
${CORE_HSA_SRC_FILES}
${ROCPROFILER_SPM_SRC_FILES}
${CORE_MEMORY_SRC_FILES}
${CORE_SESSION_SRC_FILES}
${CORE_FILTER_SRC_FILES}
${CORE_DEVICE_PROFILING_SRC_FILES}
${CORE_COUNTERS_SAMPLER_SRC_FILES}
${CORE_COUNTERS_PARENT_SRC_FILES}
${CORE_COUNTERS_METRICS_SRC_FILES}
${CORE_COUNTERS_MMIO_SRC_FILES}
${CORE_UTILS_SRC_FILES}
${CORE_HSA_PACKETS_SRC_FILES}
${CORE_HSA_QUEUES_SRC_FILES}
${ROCPROFILER_TRACER_SRC_FILES}
${ROCPROFILER_ROCTRACER_SRC_FILES}
${GENERATED_SOURCES}
${CORE_COUNTERS_SRC_FILES}
${CORE_PC_SAMPLING_FILES})
set_target_properties(rocprofiler-v2 PROPERTIES
CXX_VISIBILITY_PRESET hidden
DEFINE_SYMBOL "ROCPROFILER_EXPORTS"
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/exportmap
OUTPUT_NAME rocprofiler64
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib
VERSION ${PROJECT_VERSION}
SOVERSION ${PROJECT_VERSION_MAJOR})
# V2 Library Compiling/Installing ROCProfiler API
add_library(
rocprofiler-v2 SHARED
${ROCPROFILER_SRC_FILES}
${ROCPROFILER_CLASS_SRC_FILES}
${ROCPROFILER_PROFILER_SRC_FILES}
${ROCPROFILER_ATT_SRC_FILES}
${CORE_HARDWARE_SRC_FILES}
${CORE_HSA_SRC_FILES}
${ROCPROFILER_SPM_SRC_FILES}
${CORE_MEMORY_SRC_FILES}
${CORE_SESSION_SRC_FILES}
${CORE_FILTER_SRC_FILES}
${CORE_DEVICE_PROFILING_SRC_FILES}
${CORE_COUNTERS_SAMPLER_SRC_FILES}
${CORE_COUNTERS_PARENT_SRC_FILES}
${CORE_COUNTERS_METRICS_SRC_FILES}
${CORE_COUNTERS_MMIO_SRC_FILES}
${CORE_UTILS_SRC_FILES}
${CORE_HSA_PACKETS_SRC_FILES}
${CORE_HSA_QUEUES_SRC_FILES}
${ROCPROFILER_TRACER_SRC_FILES}
${ROCPROFILER_ROCTRACER_SRC_FILES}
${GENERATED_SOURCES}
${CORE_COUNTERS_SRC_FILES}
${CORE_PC_SAMPLING_FILES})
set_target_properties(
rocprofiler-v2
PROPERTIES CXX_VISIBILITY_PRESET hidden
DEFINE_SYMBOL "ROCPROFILER_EXPORTS"
LINK_DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/exportmap
OUTPUT_NAME rocprofiler64
LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib
VERSION ${PROJECT_VERSION}
SOVERSION ${PROJECT_VERSION_MAJOR})

target_compile_definitions(rocprofiler-v2
# As ROCR hsa_api_trace header file is not usable unless AMD_INTERNAL_BUILD is defined
PRIVATE AMD_INTERNAL_BUILD
PROF_API_IMPL HIP_PROF_HIP_API_STRING=1 __HIP_PLATFORM_AMD__=1)
target_include_directories(rocprofiler-v2
PUBLIC
${HIP_INCLUDE_DIRECTORIES} ${HSA_RUNTIME_INCLUDE_DIRECTORIES}
$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include/rocprofiler/v2>
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include>
PRIVATE
${LIB_DIR} ${ROOT_DIR}
${CMAKE_CURRENT_BINARY_DIR}
${PROJECT_SOURCE_DIR}
${PROJECT_SOURCE_DIR}/tools)
target_compile_definitions(
rocprofiler-v2
# As ROCR hsa_api_trace header file is not usable unless AMD_INTERNAL_BUILD is defined
PRIVATE AMD_INTERNAL_BUILD PROF_API_IMPL HIP_PROF_HIP_API_STRING=1
__HIP_PLATFORM_AMD__=1)
target_include_directories(
rocprofiler-v2
PUBLIC ${HIP_INCLUDE_DIRECTORIES}
${HSA_RUNTIME_INCLUDE_DIRECTORIES}
$<BUILD_INTERFACE:${CMAKE_CURRENT_BINARY_DIR}>
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include/rocprofiler/v2>
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/include>
PRIVATE ${LIB_DIR} ${ROOT_DIR} ${CMAKE_CURRENT_BINARY_DIR} ${PROJECT_SOURCE_DIR}
${PROJECT_SOURCE_DIR}/tools)
if(ASAN)
target_compile_options(rocprofiler-v2 PRIVATE -fsanitize=address)
target_link_options(rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap -Wl,--no-undefined,-fsanitize=address)
target_link_libraries(rocprofiler-v2 PRIVATE ${AQLPROFILE_LIB} hsa-runtime64::hsa-runtime64 Threads::Threads atomic numa asan dl c stdc++ stdc++fs amd_comgr ${PCIACCESS_LIBRARIES})
target_compile_options(rocprofiler-v2 PRIVATE -fsanitize=address)
target_link_options(
rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap
-Wl,--no-undefined,-fsanitize=address)
target_link_libraries(
rocprofiler-v2
PRIVATE ${AQLPROFILE_LIB}
hsa-runtime64::hsa-runtime64
Threads::Threads
atomic
numa
asan
dl
c
stdc++
stdc++fs
amd_comgr
${PCIACCESS_LIBRARIES})
else()
target_link_options(rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap -Wl,--no-undefined)
target_link_libraries(rocprofiler-v2 PRIVATE ${AQLPROFILE_LIB} hsa-runtime64::hsa-runtime64 Threads::Threads atomic numa dl c stdc++ stdc++fs amd_comgr ${PCIACCESS_LIBRARIES})
target_link_options(
rocprofiler-v2 PRIVATE -Wl,--version-script=${CMAKE_CURRENT_SOURCE_DIR}/exportmap
-Wl,--no-undefined)
target_link_libraries(
rocprofiler-v2
PRIVATE ${AQLPROFILE_LIB}
hsa-runtime64::hsa-runtime64
Threads::Threads
atomic
numa
dl
c
stdc++
stdc++fs
amd_comgr
${PCIACCESS_LIBRARIES})
endif()

get_target_property(ROCPROFILER_LIBRARY_V2_NAME rocprofiler-v2 OUTPUT_NAME)
get_target_property(ROCPROFILER_LIBRARY_V2_VERSION rocprofiler-v2 VERSION)
get_target_property(ROCPROFILER_LIBRARY_V2_SOVERSION rocprofiler-v2 SOVERSION)

## Prepare Name Link SO files for V1 & V2 Libraries
add_custom_command(TARGET rocprofiler-v2 POST_BUILD
COMMAND ${CMAKE_COMMAND} -E rm -f ${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so
COMMAND ${CMAKE_COMMAND} -E create_symlink
lib${ROCPROFILER_LIBRARY_V1_NAME}.so.${ROCPROFILER_LIBRARY_V1_SOVERSION}
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so
COMMAND ${CMAKE_COMMAND} -E create_symlink
lib${ROCPROFILER_LIBRARY_V2_NAME}.so.${ROCPROFILER_LIBRARY_V2_SOVERSION}
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V2_NAME}v2.so
)
# Prepare Name Link SO files for V1 & V2 Libraries
add_custom_command(
TARGET rocprofiler-v2
POST_BUILD
COMMAND ${CMAKE_COMMAND} -E rm -f
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so
COMMAND
${CMAKE_COMMAND} -E create_symlink
lib${ROCPROFILER_LIBRARY_V1_NAME}.so.${ROCPROFILER_LIBRARY_V1_SOVERSION}
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so
COMMAND
${CMAKE_COMMAND} -E create_symlink
lib${ROCPROFILER_LIBRARY_V2_NAME}.so.${ROCPROFILER_LIBRARY_V2_SOVERSION}
${CMAKE_BINARY_DIR}/lib/lib${ROCPROFILER_LIBRARY_V2_NAME}v2.so)
# Add custom target to trigger the create_symlink command
add_custom_target(create_rocprofiler_lib DEPENDS rocprofiler-v2 ${ROCPROFILER_TARGET})

## Install libraries: Non versioned lib file in dev package
## Skipping NameLink as it will be installed using symlinks
install(TARGETS rocprofiler-v2 LIBRARY NAMELINK_SKIP DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT runtime)
install(TARGETS rocprofiler-v2 LIBRARY NAMELINK_SKIP DESTINATION ${CMAKE_INSTALL_LIBDIR} COMPONENT asan)
# Install libraries: Non versioned lib file in dev package Skipping NameLink as it will be
# installed using symlinks
install(
TARGETS rocprofiler-v2
LIBRARY NAMELINK_SKIP
DESTINATION ${CMAKE_INSTALL_LIBDIR}
COMPONENT runtime)
install(
TARGETS rocprofiler-v2
LIBRARY NAMELINK_SKIP
DESTINATION ${CMAKE_INSTALL_LIBDIR}
COMPONENT asan)

## Installing NameLinks for V1 & V2
## librocprofiler64.so links to V1 library
## librocprofiler64v2.so links to V2 library
install(CODE "execute_process( \
# Installing NameLinks for V1 & V2 librocprofiler64.so links to V1 library
# librocprofiler64v2.so links to V2 library
install(
CODE "execute_process( \
COMMAND ${CMAKE_COMMAND} -E create_symlink \
lib${ROCPROFILER_LIBRARY_V1_NAME}.so.${ROCPROFILER_LIBRARY_V1_SOVERSION} \
${CMAKE_INSTALL_PREFIX}/lib/lib${ROCPROFILER_LIBRARY_V1_NAME}.so \
)" COMPONENT dev
)
install(CODE "execute_process( \
)"
COMPONENT dev)
install(
CODE "execute_process( \
COMMAND ${CMAKE_COMMAND} -E create_symlink \
lib${ROCPROFILER_LIBRARY_V2_NAME}.so.${ROCPROFILER_LIBRARY_V2_SOVERSION} \
${CMAKE_INSTALL_PREFIX}/lib/lib${ROCPROFILER_LIBRARY_V2_NAME}v2.so \
)" COMPONENT dev
)
)"
COMPONENT dev)

configure_file(${PROJECT_SOURCE_DIR}/src/core/counters/metrics/basic_counters.xml ${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters/basic_counters.xml COPYONLY)
configure_file(${PROJECT_SOURCE_DIR}/src/core/counters/metrics/derived_counters.xml ${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters/derived_counters.xml COPYONLY)
configure_file(
${PROJECT_SOURCE_DIR}/src/core/counters/metrics/basic_counters.xml
${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters/basic_counters.xml COPYONLY)
configure_file(
${PROJECT_SOURCE_DIR}/src/core/counters/metrics/derived_counters.xml
${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters/derived_counters.xml COPYONLY)

install(DIRECTORY
${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/${PROJECT_NAME}
USE_SOURCE_PERMISSIONS
COMPONENT runtime)
install(
DIRECTORY ${PROJECT_BINARY_DIR}/libexec/rocprofiler/counters
DESTINATION ${CMAKE_INSTALL_LIBEXECDIR}/${PROJECT_NAME}
USE_SOURCE_PERMISSIONS
COMPONENT runtime)

# ############################################################################################################################################
# ########################################################################################

@@ -74,13 +74,14 @@ class ROCProfiler_Singleton {
// Device Profiling Session
bool FindDeviceProfilingSession(rocprofiler_session_id_t session_id);
rocprofiler_session_id_t CreateDeviceProfilingSession(std::vector<std::string> counters,
int cpu_agent_index, int gpu_agent_index);
int cpu_agent_index, int gpu_agent_index);
void DestroyDeviceProfilingSession(rocprofiler_session_id_t session_id);
DeviceProfileSession* GetDeviceProfilingSession(rocprofiler_session_id_t session_id);


// Generic
bool CheckFilterData(rocprofiler_filter_kind_t filter_kind, rocprofiler_filter_data_t filter_data);
bool CheckFilterData(rocprofiler_filter_kind_t filter_kind,
rocprofiler_filter_data_t filter_data);
uint64_t GetUniqueRecordId();
uint64_t GetUniqueKernelDispatchId();


@@ -11,8 +11,7 @@

// TODO(aelwazir): change that to adapt with our own Exception
// What about outside exceptions and callbacks exceptions!!
#define API_METHOD_PREFIX \
try {
#define API_METHOD_PREFIX try {
#define API_METHOD_SUFFIX \
} \
catch (rocprofiler::Exception & e) { \

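The `API_METHOD_PREFIX` / `API_METHOD_SUFFIX` pair in the hunk above wraps an API body in a `try` block. The suffix is cut off by the hunk, so what follows is only a minimal sketch of the general pattern, not the project's macro. `status_t`, `Exception`, the `SKETCH_*` macros and `demo_api` are all hypothetical stand-ins.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical status codes; the real API's status type is not shown in this hunk.
enum status_t { STATUS_SUCCESS = 0, STATUS_ERROR = 1, STATUS_UNKNOWN = 2 };

// Hypothetical exception type carrying a status code (stand-in for rocprofiler::Exception).
class Exception : public std::runtime_error {
 public:
  Exception(status_t status, const std::string& msg)
      : std::runtime_error(msg), status_(status) {}
  status_t status() const { return status_; }

 private:
  status_t status_;
};

// Open a try block and declare the status that the suffix will return.
#define SKETCH_API_PREFIX               \
  status_t api_status = STATUS_SUCCESS; \
  try {
// Close the try block, map any exception to a status code, and return it.
#define SKETCH_API_SUFFIX        \
  }                              \
  catch (const Exception& e) {   \
    api_status = e.status();     \
  }                              \
  catch (...) {                  \
    api_status = STATUS_UNKNOWN; \
  }                              \
  return api_status;

// Example API entry point: no exception can escape across the boundary.
status_t demo_api(int mode) {
  SKETCH_API_PREFIX
  if (mode == 1) throw Exception(STATUS_ERROR, "bad argument");
  if (mode == 2) throw 42;  // a foreign exception type
  SKETCH_API_SUFFIX
}
```

The trailing `catch (...)` is one possible answer to the TODO's question about outside and callback exceptions: anything that is not the library's own exception type still becomes a status code.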
@@ -61,11 +61,11 @@ void check_status(hsa_status_t status) {
namespace activity_prim {
// PC sampling callback data
struct pcsmp_callback_data_t {
const char* kernel_name; // sampled kernel name
void* data_buffer; // host buffer for tracing data
uint64_t id; // sample id
uint64_t cycle; // sample cycle
uint64_t pc; // sample PC
const char* kernel_name; // sampled kernel name
void* data_buffer; // host buffer for tracing data
uint64_t id; // sample id
uint64_t cycle; // sample cycle
uint64_t pc; // sample PC
};

uint32_t activity_op = UINT32_MAX;
@@ -74,9 +74,8 @@ std::atomic<activity_async_callback_t> activity_callback{NULL};
rocprofiler_t* context = NULL;

hsa_status_t trace_data_cb(hsa_ven_amd_aqlprofile_info_type_t info_type,
hsa_ven_amd_aqlprofile_info_data_t* info_data,
void* data) {
const pcsmp_callback_data_t* pcsmp_data = (pcsmp_callback_data_t*) data;
hsa_ven_amd_aqlprofile_info_data_t* info_data, void* data) {
const pcsmp_callback_data_t* pcsmp_data = (pcsmp_callback_data_t*)data;

activity_record_t record{};
record.op = activity_op;
@@ -96,11 +95,13 @@ bool context_handler(rocprofiler_group_t group, void* arg) {
hsa_agent_t agent{};
hsa_status_t status = rocprofiler_get_agent(group.context, &agent);
check_status(status);
const rocprofiler::util::AgentInfo* agent_info = rocprofiler::util::HsaRsrcFactory::Instance().GetAgentInfo(agent);
const rocprofiler::util::AgentInfo* agent_info =
rocprofiler::util::HsaRsrcFactory::Instance().GetAgentInfo(agent);

pcsmp_callback_data_t pcsmp_data{};
pcsmp_data.kernel_name = (const char*)arg;
pcsmp_data.data_buffer = rocprofiler::util::HsaRsrcFactory::Instance().AllocateSysMemory(agent_info, rocprofiler::TraceProfile::GetSize());
pcsmp_data.data_buffer = rocprofiler::util::HsaRsrcFactory::Instance().AllocateSysMemory(
agent_info, rocprofiler::TraceProfile::GetSize());
status = rocprofiler_iterate_trace_data(group.context, trace_data_cb, &pcsmp_data);
check_status(status);
return false;
@@ -110,8 +111,8 @@ bool context_handler(rocprofiler_group_t group, void* arg) {
hsa_status_t dispatch_callback(const rocprofiler_callback_data_t* callback_data, void* user_data,
rocprofiler_group_t* group) {
// context features
const rocprofiler_feature_kind_t trace_kind =
(rocprofiler_feature_kind_t)(ROCPROFILER_FEATURE_KIND_TRACE | ROCPROFILER_FEATURE_KIND_PCSMP_MOD);
const rocprofiler_feature_kind_t trace_kind = (rocprofiler_feature_kind_t)(
ROCPROFILER_FEATURE_KIND_TRACE | ROCPROFILER_FEATURE_KIND_PCSMP_MOD);
const uint32_t feature_count = 1;
const uint32_t parameter_count = 1;
rocprofiler_feature_t* features = new rocprofiler_feature_t[feature_count];
@@ -131,8 +132,8 @@ hsa_status_t dispatch_callback(const rocprofiler_callback_data_t* callback_data,
properties.handler_arg = (void*)strdup(callback_data->kernel_name);

// Open profiling context
hsa_status_t status = rocprofiler_open(callback_data->agent, features, feature_count,
&context, 0 /*ROCPROFILER_MODE_SINGLEGROUP*/, &properties);
hsa_status_t status = rocprofiler_open(callback_data->agent, features, feature_count, &context,
0 /*ROCPROFILER_MODE_SINGLEGROUP*/, &properties);
check_status(status);

// Get group[0]
@@ -141,7 +142,7 @@ hsa_status_t dispatch_callback(const rocprofiler_callback_data_t* callback_data,

return status;
}
} // namespace activity_prim
} // namespace activity_prim

extern "C" {
PUBLIC_API const char* GetOpName(uint32_t op) { return strdup("PCSAMPLE"); }
@@ -152,7 +153,8 @@ PUBLIC_API bool RemoveApiCallback(uint32_t op) { return true; }

PUBLIC_API bool InitActivityCallback(void* callback, void* arg) {
activity_prim::activity_arg = arg;
activity_prim::activity_callback.store((activity_async_callback_t)callback, std::memory_order_release);
activity_prim::activity_callback.store((activity_async_callback_t)callback,
std::memory_order_release);

rocprofiler_queue_callbacks_t queue_callbacks{};
queue_callbacks.dispatch = activity_prim::dispatch_callback;
@@ -191,11 +193,8 @@ struct evt_cb_entry_t {
};
evt_cb_entry_t evt_cb_table[HSA_EVT_ID_NUMBER];

hsa_status_t codeobj_evt_callback(
rocprofiler_hsa_cb_id_t id,
const rocprofiler_hsa_callback_data_t* cb_data,
void* arg)
{
hsa_status_t codeobj_evt_callback(rocprofiler_hsa_cb_id_t id,
const rocprofiler_hsa_callback_data_t* cb_data, void* arg) {
const auto evt = evt_cb_table[id].get();
activity_rtapi_callback_t evt_callback = (activity_rtapi_callback_t)evt.first;
if (evt_callback != NULL) evt_callback(ACTIVITY_DOMAIN_HSA_EVT, id, cb_data, evt.second);

@@ -19,4 +19,4 @@ enum hsa_evt_id_t {
// HSA EVT callback data type
typedef rocprofiler_hsa_callback_data_t hsa_evt_data_t;

#endif // _SRC_CORE_ACTIVITY_H
#endif // _SRC_CORE_ACTIVITY_H

@@ -27,7 +27,7 @@ THE SOFTWARE.

#include <hsa/hsa.h>
#include <hsa/hsa_ext_amd.h>
#include <unistd.h> // usleep
#include <unistd.h> // usleep
#include <atomic>
#include <list>
#include <map>
@@ -91,8 +91,7 @@ class Group {
barrier_signal_{},
dispatch_signal_{},
orig_signal_{},
record_{}
{}
record_{} {}

void Insert(const profile_info_t& info) {
const rocprofiler_feature_kind_t kind = info.rinfo->kind;
@@ -110,11 +109,10 @@ class Group {
}

hsa_status_t Finalize(const bool is_concurrent = false) {
hsa_status_t status = pmc_profile_.Finalize(start_vector_, stop_vector_,
read_vector_, is_concurrent);
hsa_status_t status =
pmc_profile_.Finalize(start_vector_, stop_vector_, read_vector_, is_concurrent);
if (status == HSA_STATUS_SUCCESS) {
status = trace_profile_.Finalize(start_vector_, stop_vector_,
read_vector_, is_concurrent);
status = trace_profile_.Finalize(start_vector_, stop_vector_, read_vector_, is_concurrent);
}
if (status == HSA_STATUS_SUCCESS) {
if (!pmc_profile_.Empty()) ++n_profiles_;
@@ -137,32 +135,20 @@ class Group {
Context* GetContext() { return context_; }
uint32_t GetIndex() const { return index_; }

void SetBarrierSignal(const hsa_signal_t &signal) {
barrier_signal_ = signal;
}
hsa_signal_t& GetBarrierSignal() {
return barrier_signal_;
}
void SetDispatchSignal(const hsa_signal_t &signal) {
dispatch_signal_ = signal;
}
hsa_signal_t& GetDispatchSignal() {
return dispatch_signal_;
}
void SetOrigSignal(const hsa_signal_t &signal) {
orig_signal_ = signal;
}
const hsa_signal_t& GetOrigSignal() const {
return orig_signal_;
}
rocprofiler_dispatch_record_t* GetRecord() {
return &record_;
}
void SetBarrierSignal(const hsa_signal_t& signal) { barrier_signal_ = signal; }
hsa_signal_t& GetBarrierSignal() { return barrier_signal_; }
void SetDispatchSignal(const hsa_signal_t& signal) { dispatch_signal_ = signal; }
hsa_signal_t& GetDispatchSignal() { return dispatch_signal_; }
void SetOrigSignal(const hsa_signal_t& signal) { orig_signal_ = signal; }
const hsa_signal_t& GetOrigSignal() const { return orig_signal_; }
rocprofiler_dispatch_record_t* GetRecord() { return &record_; }

atomic_refs_t* AtomicRefsCount() { return reinterpret_cast<atomic_refs_t*>(&refs_); }
void ResetRefsCount() { AtomicRefsCount()->store(n_profiles_, std::memory_order_release); }
void IncrRefsCount() { AtomicRefsCount()->fetch_add(1, std::memory_order_acq_rel); }
uint32_t FetchDecrRefsCount() { return AtomicRefsCount()->fetch_sub(1, std::memory_order_acq_rel); }
uint32_t FetchDecrRefsCount() {
return AtomicRefsCount()->fetch_sub(1, std::memory_order_acq_rel);
}

private:
PmcProfile pmc_profile_;
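The reference-count helpers in the hunk above rely on `fetch_sub` returning the value held before the decrement. A minimal sketch of that idiom follows. It assumes a plain `std::atomic<uint32_t>` member instead of the `reinterpret_cast` to `atomic_refs_t` used in the diff, and the names `RefCounted` and `ReleaseIsLast` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical miniature of the Group reference-count helpers.
class RefCounted {
 public:
  // Publish a fresh count, e.g. the number of outstanding profiles.
  void Reset(uint32_t n) { refs_.store(n, std::memory_order_release); }

  void Incr() { refs_.fetch_add(1, std::memory_order_acq_rel); }

  // fetch_sub returns the value held *before* the decrement.
  uint32_t FetchDecr() { return refs_.fetch_sub(1, std::memory_order_acq_rel); }

  // True only for the caller that drops the last reference.
  bool ReleaseIsLast() { return FetchDecr() == 1; }

 private:
  std::atomic<uint32_t> refs_{0};
};
```

Because the previous value is returned, exactly one caller observes `1` and can safely run the completion step, however many threads release concurrently.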
@@ -188,23 +174,23 @@ class Context {
public:
typedef std::map<std::string, rocprofiler_feature_t*> info_map_t;

static void Create(Context* obj, const util::AgentInfo* agent_info, Queue* queue, rocprofiler_feature_t* info,
const uint32_t info_count, rocprofiler_handler_t handler, void* handler_arg)
{
static void Create(Context* obj, const util::AgentInfo* agent_info, Queue* queue,
rocprofiler_feature_t* info, const uint32_t info_count,
rocprofiler_handler_t handler, void* handler_arg) {
new (obj) Context(agent_info, queue, info, info_count, handler, handler_arg);
obj->Construct(agent_info, queue, info, info_count, handler, handler_arg);
}

static void Release(Context* obj) { obj->Destruct(); }

static Context* Create(const util::AgentInfo* agent_info, Queue* queue, rocprofiler_feature_t* info,
const uint32_t info_count, rocprofiler_handler_t handler, void* handler_arg)
{
static Context* Create(const util::AgentInfo* agent_info, Queue* queue,
rocprofiler_feature_t* info, const uint32_t info_count,
rocprofiler_handler_t handler, void* handler_arg) {
Context* obj = new Context(agent_info, queue, info, info_count, handler, handler_arg);
if (obj == NULL) EXC_RAISING(HSA_STATUS_ERROR, "allocation error");
try {
obj->Construct(agent_info, queue, info, info_count, handler, handler_arg);
} catch(...) {
} catch (...) {
delete obj;
obj = NULL;
std::cerr << "Error: Context Create failed" << std::endl;
@@ -213,7 +199,9 @@ class Context {
|
||||
return obj;
|
||||
}
|
||||
|
||||
static void Destroy(Context* obj) { if (obj != NULL) delete obj; }
|
||||
static void Destroy(Context* obj) {
|
||||
if (obj != NULL) delete obj;
|
||||
}
|
||||
|
||||
void Reset(const uint32_t& group_index) { set_[group_index].ResetRefsCount(); }
|
||||
|
||||
@@ -293,8 +281,10 @@ class Context {
|
||||
hsa_rsrc_->SignalWaitRestore(tuple.completion_signal, 1);
|
||||
// Restore other signals
|
||||
RestoreSignals(tuple);
|
||||
for (rocprofiler_feature_t* rinfo : *(tuple.info_vector)) rinfo->data.kind = ROCPROFILER_DATA_KIND_UNINIT;
|
||||
callback_data_t callback_data{tuple.profile, tuple.info_vector, tuple.info_vector->size(), NULL};
|
||||
for (rocprofiler_feature_t* rinfo : *(tuple.info_vector))
|
||||
rinfo->data.kind = ROCPROFILER_DATA_KIND_UNINIT;
|
||||
callback_data_t callback_data{tuple.profile, tuple.info_vector, tuple.info_vector->size(),
|
||||
NULL};
|
||||
const hsa_status_t status =
|
||||
api_->hsa_ven_amd_aqlprofile_iterate_data(tuple.profile, DataCallback, &callback_data);
|
||||
if (status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "context iterate data failed");
|
||||
@@ -310,7 +300,8 @@ class Context {
|
||||
if (expr) {
|
||||
auto it = info_map_.find(name);
|
||||
if (it == info_map_.end())
|
||||
EXC_RAISING(HSA_STATUS_ERROR, "metric '" << name << "', rocprofiler info is not found " << this);
|
||||
EXC_RAISING(HSA_STATUS_ERROR,
|
||||
"metric '" << name << "', rocprofiler info is not found " << this);
|
||||
rocprofiler_feature_t* info = it->second;
|
||||
info->data.result_double = expr->Eval(args);
|
||||
info->data.kind = ROCPROFILER_DATA_KIND_DOUBLE;
|
||||
@@ -324,7 +315,7 @@ class Context {
|
||||
for (auto& tuple : profile_vector) {
|
||||
if (pcsmp_mode_) const_cast<profile_t*>(tuple.profile)->event_count = UINT32_MAX;
|
||||
const hsa_status_t status =
|
||||
api_->hsa_ven_amd_aqlprofile_iterate_data(tuple.profile, callback, data);
|
||||
api_->hsa_ven_amd_aqlprofile_iterate_data(tuple.profile, callback, data);
|
||||
if (status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "context iterate data failed");
|
||||
}
|
||||
}
|
||||
@@ -342,7 +333,10 @@ class Context {
|
||||
|
||||
hsa_agent_t GetAgent() const { return agent_; }
|
||||
Group* GetGroup(const uint32_t& index) { return &set_[index]; }
|
||||
rocprofiler_handler_t GetHandler(void** arg) const { *arg = handler_arg_; return handler_; }
|
||||
rocprofiler_handler_t GetHandler(void** arg) const {
|
||||
*arg = handler_arg_;
|
||||
return handler_;
|
||||
}
|
||||
|
||||
// Concurrent profiling mode
|
||||
static bool k_concurrent_;
|
||||
@@ -358,8 +352,7 @@ class Context {
|
||||
metrics_(NULL),
|
||||
handler_(handler),
|
||||
handler_arg_(handler_arg),
|
||||
pcsmp_mode_(false)
|
||||
{}
|
||||
pcsmp_mode_(false) {}
|
||||
|
||||
~Context() { Destruct(); }
|
||||
|
||||
@@ -375,8 +368,7 @@ class Context {
|
||||
}
|
||||
|
||||
void Construct(const util::AgentInfo* agent_info, Queue* queue, rocprofiler_feature_t* info,
|
||||
const uint32_t info_count, rocprofiler_handler_t handler, void* handler_arg)
|
||||
{
|
||||
const uint32_t info_count, rocprofiler_handler_t handler, void* handler_arg) {
|
||||
if (info_count == 0) {
|
||||
set_.push_back(Group(agent_info_, this, 0));
|
||||
return;
|
||||
@@ -386,9 +378,11 @@ class Context {
|
||||
if (metrics_ == NULL) EXC_RAISING(HSA_STATUS_ERROR, "MetricsDict create failed");
|
||||
|
||||
if (Initialize(info, info_count) == false) {
|
||||
fprintf(stdout, "\nInput metrics out of HW limit. Proposed metrics group set:\n"); fflush(stdout);
|
||||
fprintf(stdout, "\nInput metrics out of HW limit. Proposed metrics group set:\n");
|
||||
fflush(stdout);
|
||||
MetricsGroupSet(agent_info, info, info_count).Print(stdout);
|
||||
fprintf(stdout, "\n"); fflush(stdout);
|
||||
fprintf(stdout, "\n");
|
||||
fflush(stdout);
|
||||
EXC_RAISING(HSA_STATUS_ERROR, "Metrics list exceeds HW limits");
|
||||
}
|
||||
Finalize();
|
||||
@@ -420,8 +414,8 @@ class Context {
|
||||
info_map_[name] = info;
|
||||
auto ret = metrics_map_.insert({name, NULL});
|
||||
if (!ret.second)
|
||||
EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name
|
||||
<< "' is registered more then once");
|
||||
EXC_RAISING(HSA_STATUS_ERROR,
|
||||
"input metric '" << name << "' is registered more then once");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -437,8 +431,9 @@ class Context {
|
||||
if (kind == ROCPROFILER_FEATURE_KIND_METRIC) { // Processing metrics features
|
||||
const Metric* metric = metrics_->Get(name);
|
||||
if (metric == NULL)
|
||||
EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name << "' is not supported on this hardware: "
|
||||
<< agent_info_->name);
|
||||
EXC_RAISING(HSA_STATUS_ERROR,
|
||||
"input metric '"
|
||||
<< name << "' is not supported on this hardware: " << agent_info_->name);
|
||||
#if 0
|
||||
std::cout << " " << name << (metric->GetExpr() ? " = " + metric->GetExpr()->String() : " counter") << std::endl;
|
||||
#endif
|
||||
@@ -493,9 +488,9 @@ class Context {
|
||||
info->kind = ROCPROFILER_FEATURE_KIND_TRACE;
|
||||
|
||||
const event_t* event = NULL;
|
||||
if (kind & ROCPROFILER_FEATURE_KIND_PCSMP_MOD) { // PC sampling
|
||||
if (kind & ROCPROFILER_FEATURE_KIND_PCSMP_MOD) { // PC sampling
|
||||
pcsmp_mode_ = true;
|
||||
} else if (kind & ROCPROFILER_FEATURE_KIND_SPM_MOD) { // SPM trace
|
||||
} else if (kind & ROCPROFILER_FEATURE_KIND_SPM_MOD) { // SPM trace
|
||||
const Metric* metric = metrics_->Get(name);
|
||||
if (metric == NULL)
|
||||
EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name << "' is not found");
|
||||
@@ -559,14 +554,14 @@ class Context {
|
||||
const bool trace_local = TraceProfile::IsLocal();
|
||||
util::HsaRsrcFactory* hsa_rsrc = &util::HsaRsrcFactory::Instance();
|
||||
if (sample_id == 0) {
|
||||
const uint32_t output_buffer_size = profile->output_buffer.size;
|
||||
const uint32_t output_buffer_size64 = profile->output_buffer.size / sizeof(uint64_t);
|
||||
const util::AgentInfo* agent_info = hsa_rsrc->GetAgentInfo(profile->agent);
|
||||
void* ptr = (trace_local) ? hsa_rsrc->AllocateSysMemory(agent_info, output_buffer_size) :
|
||||
calloc(output_buffer_size64, sizeof(uint64_t));
|
||||
rinfo->data.result_bytes.size = output_buffer_size;
|
||||
rinfo->data.result_bytes.ptr = ptr;
|
||||
callback_data->ptr = reinterpret_cast<char*>(ptr);
|
||||
const uint32_t output_buffer_size = profile->output_buffer.size;
|
||||
const uint32_t output_buffer_size64 = profile->output_buffer.size / sizeof(uint64_t);
|
||||
const util::AgentInfo* agent_info = hsa_rsrc->GetAgentInfo(profile->agent);
|
||||
void* ptr = (trace_local) ? hsa_rsrc->AllocateSysMemory(agent_info, output_buffer_size)
|
||||
: calloc(output_buffer_size64, sizeof(uint64_t));
|
||||
rinfo->data.result_bytes.size = output_buffer_size;
|
||||
rinfo->data.result_bytes.ptr = ptr;
|
||||
callback_data->ptr = reinterpret_cast<char*>(ptr);
|
||||
}
|
||||
char* result_bytes_ptr = reinterpret_cast<char*>(rinfo->data.result_bytes.ptr);
|
||||
const char* end = result_bytes_ptr + rinfo->data.result_bytes.size;
|
||||
@@ -577,8 +572,10 @@ class Context {
|
||||
char* dest = ptr + sizeof(*header);
|
||||
|
||||
if ((dest + size) >= end) {
|
||||
if (dest < end) size = end - dest;
|
||||
else EXC_RAISING(HSA_STATUS_ERROR, "Trace data out of output buffer");
|
||||
if (dest < end)
|
||||
size = end - dest;
|
||||
else
|
||||
EXC_RAISING(HSA_STATUS_ERROR, "Trace data out of output buffer");
|
||||
}
|
||||
|
||||
bool suc = true;
|
||||
@@ -593,7 +590,9 @@ class Context {
|
||||
rinfo->data.result_bytes.instance_count = sample_id + 1;
|
||||
rinfo->data.kind = ROCPROFILER_DATA_KIND_BYTES;
|
||||
} else
|
||||
EXC_RAISING(HSA_STATUS_ERROR, "Agent Memcpy failed, dst(" << (void*)dest << ") src(" << (void*)src << ") size(" << size << ")");
|
||||
EXC_RAISING(HSA_STATUS_ERROR,
|
||||
"Agent Memcpy failed, dst(" << (void*)dest << ") src(" << (void*)src
|
||||
<< ") size(" << size << ")");
|
||||
} else {
|
||||
if (sample_id == 0) {
|
||||
rinfo->data.result_bytes.ptr = profile->output_buffer.ptr;
|
||||
@@ -647,8 +646,7 @@ class Context {
|
||||
bool pcsmp_mode_;
|
||||
};
|
||||
|
||||
#define CONTEXT_INSTANTIATE() \
|
||||
bool rocprofiler::Context::k_concurrent_ = false;
|
||||
#define CONTEXT_INSTANTIATE() bool rocprofiler::Context::k_concurrent_ = false;
|
||||
|
||||
} // namespace rocprofiler
|
||||
|
||||
|
||||
@@ -31,7 +31,7 @@ THE SOFTWARE.
|
||||
|
||||
namespace rocprofiler {
|
||||
class ContextPool {
|
||||
public:
|
||||
public:
|
||||
typedef uint64_t index_t;
|
||||
typedef std::mutex mutex_t;
|
||||
|
||||
@@ -41,16 +41,12 @@ class ContextPool {
|
||||
std::atomic<bool> completed;
|
||||
};
|
||||
|
||||
static ContextPool* Create(
|
||||
uint32_t num_entries,
|
||||
uint32_t payload_bytes,
|
||||
const util::AgentInfo* agent_info,
|
||||
rocprofiler_feature_t* info,
|
||||
const uint32_t info_count,
|
||||
rocprofiler_pool_handler_t handler,
|
||||
void* handler_arg)
|
||||
{
|
||||
ContextPool* obj = new ContextPool(num_entries, payload_bytes, agent_info, info, info_count, handler, handler_arg);
|
||||
static ContextPool* Create(uint32_t num_entries, uint32_t payload_bytes,
|
||||
const util::AgentInfo* agent_info, rocprofiler_feature_t* info,
|
||||
const uint32_t info_count, rocprofiler_pool_handler_t handler,
|
||||
void* handler_arg) {
|
||||
ContextPool* obj = new ContextPool(num_entries, payload_bytes, agent_info, info, info_count,
|
||||
handler, handler_arg);
|
||||
if (obj == NULL) EXC_RAISING(HSA_STATUS_ERROR, "allocation error");
|
||||
return obj;
|
||||
}
|
||||
@@ -61,18 +57,18 @@ class ContextPool {
|
||||
if (constructed_ == false) {
|
||||
Construct(agent_info_, info_, info_count_);
|
||||
}
|
||||
const index_t write_index = write_index_.fetch_add(entry_size_bytes_, std::memory_order_relaxed);
|
||||
const index_t write_index =
|
||||
write_index_.fetch_add(entry_size_bytes_, std::memory_order_relaxed);
|
||||
while (write_index >= (read_index_.load(std::memory_order_acquire) + array_size_bytes_)) {
|
||||
check_completed();
|
||||
std::this_thread::yield();
|
||||
}
|
||||
entry_t* entry = GetPoolEntry(write_index, pool_entry);
|
||||
if (entry->completed.load(std::memory_order_relaxed) != false) EXC_RAISING(HSA_STATUS_ERROR, "Corrupted pool entry");
|
||||
if (entry->completed.load(std::memory_order_relaxed) != false)
|
||||
EXC_RAISING(HSA_STATUS_ERROR, "Corrupted pool entry");
|
||||
}
|
||||
|
||||
void Flush() {
|
||||
check_completed();
|
||||
}
|
||||
void Flush() { check_completed(); }
|
||||
#if 0
|
||||
template <class F>
|
||||
F for_each(const F& f_p) {
|
||||
@@ -95,7 +91,7 @@ class ContextPool {
|
||||
return f;
|
||||
}
|
||||
#endif
|
||||
private:
|
||||
private:
|
||||
static unsigned aligned64(const unsigned& size) { return (size + 0x3f) & ~0x3fu; }
|
||||
|
||||
static bool context_handler(rocprofiler_group_t group, void* arg) {
|
||||
@@ -105,45 +101,41 @@ class ContextPool {
|
||||
return true;
|
||||
}
|
||||
|
||||
ContextPool(
|
||||
uint32_t num_entries,
|
||||
uint32_t payload_bytes,
|
||||
const util::AgentInfo* agent_info,
|
||||
rocprofiler_feature_t* info,
|
||||
const uint32_t info_count,
|
||||
rocprofiler_pool_handler_t pool_handler,
|
||||
void* pool_handler_arg
|
||||
) :
|
||||
payload_off_(aligned64(sizeof(entry_t))),
|
||||
entry_size_bytes_(payload_off_ + aligned64(payload_bytes)),
|
||||
array_size_bytes_(entry_size_bytes_ * num_entries),
|
||||
array_(NULL),
|
||||
read_index_(0),
|
||||
write_index_(0),
|
||||
sync_flag_(false),
|
||||
ContextPool(uint32_t num_entries, uint32_t payload_bytes, const util::AgentInfo* agent_info,
|
||||
rocprofiler_feature_t* info, const uint32_t info_count,
|
||||
rocprofiler_pool_handler_t pool_handler, void* pool_handler_arg)
|
||||
: payload_off_(aligned64(sizeof(entry_t))),
|
||||
entry_size_bytes_(payload_off_ + aligned64(payload_bytes)),
|
||||
array_size_bytes_(entry_size_bytes_ * num_entries),
|
||||
array_(NULL),
|
||||
read_index_(0),
|
||||
write_index_(0),
|
||||
sync_flag_(false),
|
||||
|
||||
agent_info_(agent_info),
|
||||
info_(info),
|
||||
info_count_(info_count),
|
||||
pool_handler_(pool_handler),
|
||||
pool_handler_arg_(pool_handler_arg),
|
||||
constructed_(false)
|
||||
{}
|
||||
agent_info_(agent_info),
|
||||
info_(info),
|
||||
info_count_(info_count),
|
||||
pool_handler_(pool_handler),
|
||||
pool_handler_arg_(pool_handler_arg),
|
||||
constructed_(false) {}
|
||||
|
||||
void Construct(const util::AgentInfo* agent_info, rocprofiler_feature_t* info, const uint32_t info_count) {
|
||||
void Construct(const util::AgentInfo* agent_info, rocprofiler_feature_t* info,
|
||||
const uint32_t info_count) {
|
||||
std::lock_guard<mutex_t> lck(mutex_);
|
||||
|
||||
if (constructed_ == false) {
|
||||
array_data_ = (char*) malloc(array_size_bytes_ + 0x3f);
|
||||
array_data_ = (char*)malloc(array_size_bytes_ + 0x3f);
|
||||
array_ = reinterpret_cast<char*>(((intptr_t)array_data_ + 0x3f) >> 6 << 6);
|
||||
if (((intptr_t)array_ & 0x3f) != 0) EXC_RAISING(HSA_STATUS_ERROR, "Pool array is not aligned");
|
||||
if (((intptr_t)array_ & 0x3f) != 0)
|
||||
EXC_RAISING(HSA_STATUS_ERROR, "Pool array is not aligned");
|
||||
memset(array_, 0, array_size_bytes_);
|
||||
|
||||
const char* end = array_ + array_size_bytes_;
|
||||
for (char* ptr = array_; ptr < end; ptr += entry_size_bytes_) {
|
||||
entry_t* entry = reinterpret_cast<entry_t*>(ptr);
|
||||
entry->pool = this;
|
||||
entry->context = Context::Create(agent_info, NULL, info, info_count, ContextPool::context_handler, ptr);
|
||||
entry->context =
|
||||
Context::Create(agent_info, NULL, info, info_count, ContextPool::context_handler, ptr);
|
||||
}
|
||||
|
||||
constructed_ = true;
|
||||
@@ -175,7 +167,7 @@ class ContextPool {
|
||||
if (sync_flag_.test_and_set(std::memory_order_acquire) == false) {
|
||||
index_t read_index = read_index_.load(std::memory_order_relaxed);
|
||||
const index_t write_index = write_index_.load(std::memory_order_relaxed);
|
||||
while(read_index < write_index) {
|
||||
while (read_index < write_index) {
|
||||
rocprofiler_pool_entry_t pool_entry{};
|
||||
entry_t* entry = GetPoolEntry(read_index, &pool_entry);
|
||||
if (entry->completed.load(std::memory_order_acquire) == true) {
|
||||
|
||||
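The pool code above rounds sizes up to a multiple of 64 with `(size + 0x3f) & ~0x3fu` and aligns the `malloc` result by over-allocating `0x3f` bytes. The sketch below isolates those two steps so they can be checked on their own. `align_up64`, `AlignedBuffer` and `allocate_aligned64` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Round a size up to the next multiple of 64 (same bit trick as aligned64 in the diff).
static unsigned align_up64(unsigned size) { return (size + 0x3f) & ~0x3fu; }

// Hypothetical holder: keep the raw pointer for free() and the aligned pointer for use.
struct AlignedBuffer {
  char* raw;      // pointer returned by malloc
  char* aligned;  // first 64-byte aligned address inside the allocation
};

// Over-allocate by 63 bytes, then round the address up to a 64-byte boundary.
static AlignedBuffer allocate_aligned64(size_t bytes) {
  AlignedBuffer buf{nullptr, nullptr};
  buf.raw = static_cast<char*>(std::malloc(bytes + 0x3f));
  if (buf.raw == nullptr) return buf;  // allocation failed, caller must check
  const uintptr_t addr = (reinterpret_cast<uintptr_t>(buf.raw) + 0x3f) & ~uintptr_t{0x3f};
  buf.aligned = reinterpret_cast<char*>(addr);
  std::memset(buf.aligned, 0, bytes);  // zero the usable region, as the diff does
  return buf;
}
```

The raw pointer has to be kept alongside the aligned one because only the raw pointer may be passed to `free`. The diff keeps both as `array_data_` and `array_`.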
@@ -1,8 +1,7 @@
|
||||
#ifndef _CORE_TIMER_H_
|
||||
#define _CORE_TIMER_H_
|
||||
|
||||
template <int Size>
|
||||
class CoreTimer {
|
||||
template <int Size> class CoreTimer {
|
||||
CoreTimer() {
|
||||
index_ = 0;
|
||||
freq_in_100mhz_ = MeasureTSCFreqHz();
|
||||
@@ -20,15 +19,15 @@ class CoreTimer {
|
||||
// AMD Linux timing
|
||||
unsigned int unused;
|
||||
n = __rdtscp(&unused);
|
||||
data_[index_] = 10 * n / freq_in_100mhz_; // unit is ns
|
||||
data_[index_] = 10 * n / freq_in_100mhz_; // unit is ns
|
||||
index_ += 1;
|
||||
}
|
||||
|
||||
double Print()
|
||||
double Print()
|
||||
|
||||
private:
|
||||
// timer data
|
||||
double data_[Size];
|
||||
private :
|
||||
// timer data
|
||||
double data_[Size];
|
||||
// data index
|
||||
uint32_t index_;
|
||||
// frequency
|
||||
@@ -40,20 +39,20 @@ class CoreTimer {
|
||||
clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
|
||||
return uint64_t(ts.tv_sec) * 1000000 + ts.tv_nsec / 1000;
|
||||
}
|
||||
|
||||
|
||||
uint64_t CoreTimer::MeasureTSCFreqHz() {
|
||||
// Make a coarse interval measurement of TSC ticks for 1 gigacycles.
|
||||
unsigned int unused;
|
||||
uint64_t tscTicksEnd;
|
||||
|
||||
|
||||
uint64_t coarseBeginUs = CoarseTimestampUs();
|
||||
uint64_t tscTicksBegin = __rdtscp(&unused);
|
||||
do {
|
||||
tscTicksEnd = __rdtscp(&unused);
|
||||
} while (tscTicksEnd - tscTicksBegin < 1000000000);
|
||||
|
||||
|
||||
uint64_t coarseEndUs = CoarseTimestampUs();
|
||||
|
||||
|
||||
// Compute the TSC frequency and round to nearest 100MHz.
|
||||
uint64_t coarseIntervalNs = (coarseEndUs - coarseBeginUs) * 1000;
|
||||
uint64_t tscIntervalTicks = tscTicksEnd - tscTicksBegin;
|
||||
@@ -61,4 +60,4 @@ class CoreTimer {
|
||||
}
|
||||
};
|
||||
|
||||
#endif // _CORE_TIMER_H_
|
||||
#endif // _CORE_TIMER_H_
|
||||
|
||||
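The timer above stores its frequency in units of 100 MHz and converts ticks to nanoseconds with `10 * n / freq_in_100mhz_`. The lines that actually compute the rounded frequency fall between hunks and are not shown, so the rounding formula below is an assumption. The function names are hypothetical. It is a sketch of the arithmetic only and does not read the TSC.

```cpp
#include <cassert>
#include <cstdint>

// ticks / ns gives GHz; multiplying by 10 expresses that in units of 100 MHz.
// Adding half the divisor before dividing rounds to the nearest unit.
static uint64_t tsc_freq_in_100mhz(uint64_t interval_ticks, uint64_t interval_ns) {
  assert(interval_ns > 0);
  return (interval_ticks * 10 + interval_ns / 2) / interval_ns;
}

// Same conversion as in the diff: ns = 10 * ticks / (frequency in 100 MHz units).
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_in_100mhz) {
  assert(freq_in_100mhz > 0);
  return 10 * ticks / freq_in_100mhz;
}
```

For example, one gigacycle measured over 400,000,000 ns corresponds to 2.5 GHz, which is 25 in these units, and 2500 ticks at that rate convert to 1000 ns.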
@@ -27,8 +27,7 @@ namespace Counter {
|
||||
|
||||
static std::atomic<uint64_t> COUNTER_COUNTER{0};
|
||||
|
||||
DerivedCounter::DerivedCounter(std::string name, std::string description,
|
||||
std::string gpu_name)
|
||||
DerivedCounter::DerivedCounter(std::string name, std::string description, std::string gpu_name)
|
||||
: Counter(name, description, gpu_name) {
|
||||
metric_id_ = COUNTER_COUNTER.fetch_add(1, std::memory_order_release);
|
||||
addCounterToCounterMap();
|
||||
@@ -41,20 +40,17 @@ DerivedCounter::~DerivedCounter() {
|
||||
|
||||
uint64_t DerivedCounter::getMetricId() { return metric_id_; }
|
||||
|
||||
std::map<uint64_t, BasicCounter*> *DerivedCounter::getAllCounters() {
|
||||
return &counters_;
|
||||
}
|
||||
std::map<uint64_t, BasicCounter*>* DerivedCounter::getAllCounters() { return &counters_; }
|
||||
|
||||
BasicCounter *DerivedCounter::getBasicCounterFromDerived(uint64_t counter_id) {
|
||||
BasicCounter* DerivedCounter::getBasicCounterFromDerived(uint64_t counter_id) {
|
||||
return counters_[counter_id];
|
||||
}
|
||||
|
||||
void DerivedCounter::addBasicCounter(uint64_t counter_id,
|
||||
BasicCounter *counter) {
|
||||
void DerivedCounter::addBasicCounter(uint64_t counter_id, BasicCounter* counter) {
|
||||
counters_.emplace(counter_id, counter);
|
||||
}
|
||||
|
||||
@DERIVED_XML_PARSE_RESULT@
|
||||
@DERIVED_XML_PARSE_RESULT @
|
||||
|
||||
} // namespace Counter
|
||||
|
||||
|
||||
@@ -39,8 +39,7 @@ namespace Counter {
|
||||
class DerivedCounter : Counter {
|
||||
public:
|
||||
std::function<uint64_t()> evaluate_metric;
|
||||
DerivedCounter(std::string name, std::string description,
|
||||
std::string gpu_name);
|
||||
DerivedCounter(std::string name, std::string description, std::string gpu_name);
|
||||
~DerivedCounter();
|
||||
|
||||
uint64_t getMetricId();
|
||||
|
||||
@@ -108,7 +108,7 @@ bool metrics::ExtractMetricEvents(
|
||||
// adding result object for derived metric
|
||||
std::lock_guard<std::mutex> lock(extract_metric_events_lock);
|
||||
|
||||
if(metric_names[i].compare("KERNEL_DURATION")==0) {
|
||||
if (metric_names[i].compare("KERNEL_DURATION") == 0) {
|
||||
if (results_map.find(metric_names[i]) == results_map.end()) {
|
||||
results_map[metric_names[i]] = new results_t(metric_names[i], {}, xcc_count);
|
||||
}
|
||||
@@ -192,7 +192,7 @@ bool metrics::GetMetricsData(std::map<std::string, results_t*>& results_map,
|
||||
auto it = results_map.find(metric->GetName());
|
||||
if (it == results_map.end()) rocprofiler::fatal("metric results not found ");
|
||||
results_t* res = it->second;
|
||||
if(metric->GetName().compare("KERNEL_DURATION") == 0) {
|
||||
if (metric->GetName().compare("KERNEL_DURATION") == 0) {
|
||||
res->val_double = kernel_duration;
|
||||
continue;
|
||||
}
|
||||
@@ -206,7 +206,8 @@ bool metrics::GetMetricsData(std::map<std::string, results_t*>& results_map,
|
||||
void metrics::GetCountersAndMetricResultsByXcc(uint32_t xcc_index,
|
||||
std::vector<results_t*>& results_list,
|
||||
std::map<std::string, results_t*>& results_map,
|
||||
std::vector<const Metric*>& metrics_list, uint64_t kernel_duration) {
|
||||
std::vector<const Metric*>& metrics_list,
|
||||
uint64_t kernel_duration) {
|
||||
for (auto it = results_list.begin(); it != results_list.end(); it++) {
|
||||
(*it)->val_double =
|
||||
(*it)->xcc_vals[xcc_index]; // set val_double to hold value for specific xcc
|
||||
|
||||
@@ -35,10 +35,10 @@ namespace rocprofiler {
|
||||
|
||||
typedef std::vector<double> xcc_results_t;
|
||||
|
||||
class results_t{
|
||||
public:
|
||||
results_t(std::string in_name, event_t in_event, uint32_t xcc_count):
|
||||
name(in_name), val_double(0), event(in_event) {
|
||||
class results_t {
|
||||
public:
|
||||
results_t(std::string in_name, event_t in_event, uint32_t xcc_count)
|
||||
: name(in_name), val_double(0), event(in_event) {
|
||||
xcc_vals.resize(xcc_count);
|
||||
std::fill(xcc_vals.begin(), xcc_vals.end(), 0);
|
||||
}
|
||||
@@ -78,8 +78,9 @@ bool GetMetricsData(std::map<std::string, results_t*>& results_map,
|
||||
std::vector<const Metric*>& metrics_list, uint64_t kernel_duration = 0);
|
||||
|
||||
void GetCountersAndMetricResultsByXcc(uint32_t xcc_index, std::vector<results_t*>& results_list,
|
||||
std::map<std::string, results_t*>& results_map,
|
||||
std::vector<const Metric*>& metrics_list, uint64_t kernel_duration = 0);
|
||||
std::map<std::string, results_t*>& results_map,
|
||||
std::vector<const Metric*>& metrics_list,
|
||||
uint64_t kernel_duration = 0);
|
||||
|
||||
} // namespace metrics
|
||||
} // namespace rocprofiler
|
||||
|
||||
@@ -45,7 +45,7 @@ THE SOFTWARE.
|
||||
do { \
|
||||
std::ostringstream oss; \
|
||||
oss << __FUNCTION__ << "(), " << stream; \
|
||||
throw rocprofiler::util::exception(error, oss.str()); \
|
||||
throw rocprofiler::util::exception(error, oss.str()); \
|
||||
} while (0)
|
||||
|
||||
#define AQL_EXC_RAISING(error, stream) \
|
||||
|
||||
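The `EXC_RAISING` macro above takes a stream expression as its second argument, so call sites can write `"metric '" << name << "' ..."` directly. The sketch below reproduces that pattern in a self-contained form. `demo_error`, `DEMO_RAISE`, `check_metric` and `describe_failure` are hypothetical stand-ins, and `__FUNCTION__` is a widely supported compiler extension rather than standard C++.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical exception type carrying a status code plus a message.
class demo_error : public std::runtime_error {
 public:
  demo_error(int status, const std::string& msg) : std::runtime_error(msg), status_(status) {}
  int status() const { return status_; }

 private:
  int status_;
};

// The macro argument `stream` is pasted after `<<`, so it may itself contain `<<`.
// The do { } while (0) wrapper makes the macro behave as a single statement.
#define DEMO_RAISE(status, stream)           \
  do {                                       \
    std::ostringstream oss;                  \
    oss << __FUNCTION__ << "(), " << stream; \
    throw demo_error(status, oss.str());     \
  } while (0)

static void check_metric(const std::string& name, bool found) {
  if (!found) DEMO_RAISE(1, "metric '" << name << "' is not found");
}

// Helper that turns the thrown message into a string for inspection.
static std::string describe_failure(const std::string& name) {
  try {
    check_metric(name, false);
  } catch (const demo_error& e) {
    return e.what();
  }
  return "";
}
```

Because of the `do { } while (0)` wrapper, the macro can sit after an unbraced `if`, which is how the diff uses it throughout.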
Executable → Regular
+3
-6
@@ -221,14 +221,11 @@ class MetricsDict {
|
||||
agent_name_ = agent_name_.substr(0, agent_name_.find(':'));
|
||||
|
||||
std::unordered_set<std::string> supported_agent_names = {
|
||||
"gfx906",
|
||||
"gfx908",
|
||||
"gfx906", "gfx908",
|
||||
"gfx90a", // Vega
|
||||
"gfx940",
|
||||
"gfx941",
|
||||
"gfx940", "gfx941",
|
||||
"gfx942", // Mi300
|
||||
"gfx1030",
|
||||
"gfx1031",
|
||||
"gfx1030", "gfx1031",
|
||||
"gfx1032", // Navi2x
|
||||
"gfx1100",
|
||||
"gfx1101" // Navi3x
|
||||
|
||||
@@ -17,8 +17,8 @@ class DFPerfMonMI200 : public PerfMon {
|
||||
DFPerfMonMI200(const Agent::AgentInfo& info);
|
||||
~DFPerfMonMI200();
|
||||
void Start() override;
|
||||
void Stop() {};
|
||||
void Read(std::vector<rocprofiler_counters_sampler_counter_output_t>& values) {};
|
||||
void Stop(){};
|
||||
void Read(std::vector<rocprofiler_counters_sampler_counter_output_t>& values){};
|
||||
void SetCounterNames(std::vector<std::string>& counter_names);
|
||||
mmio::mmap_type_t Type() override { return mmio::mmap_type_t::DF_PERFMON; }
|
||||
|
||||
@@ -31,7 +31,6 @@ class DFPerfMonMI200 : public PerfMon {
|
||||
uint64_t GetFicaNodeOutboundBw(uint32_t ficaa_val);
|
||||
|
||||
|
||||
|
||||
private:
|
||||
mmio::DFPerfmonMMIO* mmio_;
|
||||
static std::mutex mutex_; // should be an MMIO member
|
||||
|
||||
@@ -13,12 +13,12 @@ PciePerfMonMI200::~PciePerfMonMI200() {
|
||||
mmio::MMIOManager::DestroyMMIOInstance(dynamic_cast<mmio::MMIO*>(mmio_));
|
||||
}
|
||||
|
||||
void PciePerfMonMI200::writeRegister(uint32_t reg_offset, uint32_t value){
|
||||
void PciePerfMonMI200::writeRegister(uint32_t reg_offset, uint32_t value) {
|
||||
// mmio or ioctl approaches
|
||||
mmio_->RegisterWriteAPI(reg_offset, value);
|
||||
mmio_->RegisterWriteAPI(reg_offset, value);
|
||||
}
|
||||
|
||||
void PciePerfMonMI200::readRegister(uint32_t reg_offset, uint32_t& value){
|
||||
void PciePerfMonMI200::readRegister(uint32_t reg_offset, uint32_t& value) {
|
||||
// mmio or ioctl approaches
|
||||
mmio_->RegisterReadAPI(reg_offset, value);
|
||||
}
|
||||
@@ -35,44 +35,40 @@ void PciePerfMonMI200::SetCounterNames(std::vector<std::string>& counter_names)
|
||||
}
|
||||
}
|
||||
|
||||
void PciePerfMonMI200::Start(){
|
||||
void PciePerfMonMI200::Start() {
|
||||
// TODO: make sure values stored in table
|
||||
// in registers header are dec and not hex
|
||||
|
||||
Start_RX_TILE_SCLK(event_id_);
|
||||
}
|
||||
|
||||
void PciePerfMonMI200::Stop(){
|
||||
void PciePerfMonMI200::Stop() {
|
||||
// TODO: revisit correct value to stop
|
||||
writeRegister(PCIE_MI200::PCIE_PERF_COUNT_CNTL, 0x2); // stop
|
||||
writeRegister(PCIE_MI200::PCIE_PERF_COUNT_CNTL, 0x2); // stop
|
||||
}
|
||||
|
||||
void PciePerfMonMI200::Read(std::vector<rocprofiler_counters_sampler_counter_output_t>& values){
|
||||
uint64_t val=0;
|
||||
void PciePerfMonMI200::Read(std::vector<rocprofiler_counters_sampler_counter_output_t>& values) {
|
||||
uint64_t val = 0;
|
||||
Read_RX_TILE_SCLK(val);
|
||||
rocprofiler_counters_sampler_counter_output_t value = {
|
||||
ROCPROFILER_COUNTERS_SAMPLER_PCIE_COUNTERS,
|
||||
static_cast<double>(val)
|
||||
};
|
||||
rocprofiler_counters_sampler_counter_output_t value = {ROCPROFILER_COUNTERS_SAMPLER_PCIE_COUNTERS,
|
||||
static_cast<double>(val)};
|
||||
values.push_back(value);
|
||||
}
|
||||
|
||||
void PciePerfMonMI200::Start_RX_TILE_TXCLK(uint32_t event){
|
||||
|
||||
void PciePerfMonMI200::Start_RX_TILE_TXCLK(uint32_t event) {
|
||||
// Step 1: PORT SEL update
|
||||
writeRegister(PCIE_MI200::PCIE_PERF_CNTL_EVENT_CI_PORT_SEL, 0x0);
|
||||
|
||||
// Step 2: EVENT SEL update
|
||||
uint32_t value = event; // last 8 bits for event
|
||||
uint32_t value = event; // last 8 bits for event
|
||||
writeRegister(PCIE_MI200::PCIE_PERF_CNTL_TXCLK3, value);
|
||||
|
||||
// Steps 3 & 4: Performance counters initialization, enable:
|
||||
// TODO: revisit. Just a single write with 0x3 might be enough (check with pcie team)
|
||||
writeRegister(PCIE_MI200::PCIE_PERF_COUNT_CNTL, 0x5);
|
||||
|
||||
}
|
||||
|
||||
void PciePerfMonMI200::Read_RX_TILE_TXCLK(uint64_t& result){
|
||||
void PciePerfMonMI200::Read_RX_TILE_TXCLK(uint64_t& result) {
|
||||
// Step 5: Performance counters read:
|
||||
uint32_t lo_val, hi_val;
|
||||
readRegister(PCIE_MI200::PCIE_PERF_COUNT0_TXCLK3, lo_val);
|
||||
@@ -84,22 +80,20 @@ void PciePerfMonMI200::Read_RX_TILE_TXCLK(uint64_t& result){
|
||||
result = val | lo_val;
|
||||
}
|
||||
|
||||
void PciePerfMonMI200::Start_RX_TILE_SCLK(uint32_t event){
|
||||
|
||||
void PciePerfMonMI200::Start_RX_TILE_SCLK(uint32_t event) {
|
||||
// Step 1: PORT SEL update
|
||||
writeRegister(PCIE_MI200::PCIE_PERF_CNTL_EVENT_CI_PORT_SEL, 0x0);
|
||||
|
||||
// Step 2: EVENT SEL update
|
||||
uint32_t value = event; // last 8 bits for event
|
||||
uint32_t value = event; // last 8 bits for event
|
||||
writeRegister(PCIE_MI200::PCIE_PERF_CNTL_LCLK1, value);
|
||||
|
||||
// Steps 3 & 4: Performance counters initialization, enable:
|
||||
// TODO: revisit. Just a single write with 0x3 might be enough (check with pcie team)
|
||||
writeRegister(PCIE_MI200::PCIE_PERF_COUNT_CNTL, 0x5);
|
||||
|
||||
}
|
||||
|
||||
void PciePerfMonMI200::Read_RX_TILE_SCLK(uint64_t& result){
|
||||
void PciePerfMonMI200::Read_RX_TILE_SCLK(uint64_t& result) {
|
||||
// Step 5: Performance counters read:
|
||||
uint32_t lo_val, hi_val;
|
||||
readRegister(PCIE_MI200::PCIE_PERF_COUNT0_LCLK1, lo_val);
|
||||
@@ -111,6 +105,4 @@ void PciePerfMonMI200::Read_RX_TILE_SCLK(uint64_t& result){
|
||||
result = val | lo_val;
|
||||
}
|
||||
|
||||
} // namespace rocprofiler
|
||||
|
||||
|
||||
} // namespace rocprofiler
|
||||
|
||||
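The read functions above end with `result = val | lo_val;` after reading two 32-bit registers into `lo_val` and `hi_val`. The lines that derive `val` are hidden between hunks, so the sketch below assumes `val` is the upper half shifted left by 32 bits. `combine_counter_halves` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Combine two 32-bit register reads into one 64-bit counter value.
static uint64_t combine_counter_halves(uint32_t lo_val, uint32_t hi_val) {
  // Widen to 64 bits before shifting; shifting a 32-bit value by 32 is undefined.
  const uint64_t val = static_cast<uint64_t>(hi_val) << 32;
  return val | lo_val;
}
```

The cast before the shift is the part that matters: without it the shift would be performed on a 32-bit operand.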
@@ -22,7 +22,7 @@ class PciePerfMonMI200 : public PerfMon {
|
||||
mmio::mmap_type_t Type() override { return mmio::mmap_type_t::PCIE_PERFMON; }
|
||||
|
||||
private:
|
||||
// TODO : check google coding std
|
||||
// TODO : check google coding std
|
||||
void writeRegister(uint32_t reg_offset, uint32_t value);
|
||||
void readRegister(uint32_t reg_offset, uint32_t& value);
|
||||
|
||||
|
||||
@@ -4,70 +4,70 @@
|
||||
#include <stdint.h>
|
||||
|
||||
namespace PCIE_MI200 {
|
||||
|
||||
|
||||
// -------- RX Tile TXCLK Start --------
|
||||
|
||||
// Step 1: PORT SEL update
|
||||
const static uint32_t PCIE_PERF_CNTL_EVENT_CI_PORT_SEL = 0x11180250;
|
||||
|
||||
// Step 2: EVENT SEL update
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK1 = 0x11180204;
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK2 = 0x11180210;
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK3 = 0x1118021C; //#
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK4 = 0x11180228; //#
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK5 = 0x11180258;
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK6 = 0x11180264;
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK7 = 0x11180888;
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK8 = 0x11180894;
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK9 = 0x111808A0;
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK10 = 0x111808AC;
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK1 = 0x11180204;
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK2 = 0x11180210;
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK3 = 0x1118021C; //#
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK4 = 0x11180228; //#
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK5 = 0x11180258;
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK6 = 0x11180264;
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK7 = 0x11180888;
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK8 = 0x11180894;
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK9 = 0x111808A0;
|
||||
const static uint32_t PCIE_PERF_CNTL_TXCLK10 = 0x111808AC;
|
||||
|
||||
// Steps 3 & 4: Performance counters initialization, enable:
|
||||
const static uint32_t PCIE_PERF_COUNT_CNTL = 0x11180200;
|
||||
|
||||
// Step 5: Performance counters read:
|
||||
const static uint32_t PCIE_PERF_COUNT0_TXCLK1 = 0x11180208;
|
||||
const static uint32_t PCIE_PERF_COUNT0_TXCLK2 = 0x11180214;
|
||||
const static uint32_t PCIE_PERF_COUNT0_TXCLK3 = 0x11180220; //#
|
||||
const static uint32_t PCIE_PERF_COUNT0_TXCLK4 = 0x1118022C; //#
|
||||
const static uint32_t PCIE_PERF_COUNT0_TXCLK5 = 0x1118025C;
|
||||
const static uint32_t PCIE_PERF_COUNT0_TXCLK6 = 0x11180268;
|
||||
const static uint32_t PCIE_PERF_COUNT0_TXCLK7 = 0x1118088C;
|
||||
const static uint32_t PCIE_PERF_COUNT0_TXCLK8 = 0x11180898;
|
||||
const static uint32_t PCIE_PERF_COUNT0_TXCLK9 = 0x111808A4;
|
||||
const static uint32_t PCIE_PERF_COUNT0_TXCLK10 = 0x111808B0;
|
||||
const static uint32_t PCIE_PERF_COUNT0_TXCLK1 = 0x11180208;
|
||||
const static uint32_t PCIE_PERF_COUNT0_TXCLK2 = 0x11180214;
|
||||
const static uint32_t PCIE_PERF_COUNT0_TXCLK3 = 0x11180220; //#
const static uint32_t PCIE_PERF_COUNT0_TXCLK4 = 0x1118022C; //#
const static uint32_t PCIE_PERF_COUNT0_TXCLK5 = 0x1118025C;
const static uint32_t PCIE_PERF_COUNT0_TXCLK6 = 0x11180268;
const static uint32_t PCIE_PERF_COUNT0_TXCLK7 = 0x1118088C;
const static uint32_t PCIE_PERF_COUNT0_TXCLK8 = 0x11180898;
const static uint32_t PCIE_PERF_COUNT0_TXCLK9 = 0x111808A4;
const static uint32_t PCIE_PERF_COUNT0_TXCLK10 = 0x111808B0;

const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK1 = 0x111808E8;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK2 = 0x111808F0;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK3 = 0x111808F8; //#
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK4 = 0x11180900; //#
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK5 = 0x11180908;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK6 = 0x11180910;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK7 = 0x11180918;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK8 = 0x11180920;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK9 = 0x11180928;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK1 = 0x111808E8;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK2 = 0x111808F0;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK3 = 0x111808F8; //#
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK4 = 0x11180900; //#
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK5 = 0x11180908;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK6 = 0x11180910;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK7 = 0x11180918;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK8 = 0x11180920;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK9 = 0x11180928;
const static uint32_t PCIE_PERF_COUNT0_UPVAL_TXCLK10 = 0x11180930;

const static uint32_t PCIE_PERF_COUNT1_TXCLK1 = 0x1118020C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK2 = 0x11180218;
const static uint32_t PCIE_PERF_COUNT1_TXCLK3 = 0x11180224; //#
const static uint32_t PCIE_PERF_COUNT1_TXCLK4 = 0x11180230; //#
const static uint32_t PCIE_PERF_COUNT1_TXCLK5 = 0x11180260;
const static uint32_t PCIE_PERF_COUNT1_TXCLK6 = 0x1118026C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK7 = 0x11180890;
const static uint32_t PCIE_PERF_COUNT1_TXCLK8 = 0x1118089C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK9 = 0x111808A8;
const static uint32_t PCIE_PERF_COUNT1_TXCLK1 = 0x1118020C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK2 = 0x11180218;
const static uint32_t PCIE_PERF_COUNT1_TXCLK3 = 0x11180224; //#
const static uint32_t PCIE_PERF_COUNT1_TXCLK4 = 0x11180230; //#
const static uint32_t PCIE_PERF_COUNT1_TXCLK5 = 0x11180260;
const static uint32_t PCIE_PERF_COUNT1_TXCLK6 = 0x1118026C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK7 = 0x11180890;
const static uint32_t PCIE_PERF_COUNT1_TXCLK8 = 0x1118089C;
const static uint32_t PCIE_PERF_COUNT1_TXCLK9 = 0x111808A8;
const static uint32_t PCIE_PERF_COUNT1_TXCLK10 = 0x111808B4;

const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK1 = 0x111808EC;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK2 = 0x111808F4;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK3 = 0x111808FC; //#
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK4 = 0x11180904; //#
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK5 = 0x1118090C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK6 = 0x11180914;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK7 = 0x1118091C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK8 = 0x11180924;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK9 = 0x1118092C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK1 = 0x111808EC;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK2 = 0x111808F4;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK3 = 0x111808FC; //#
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK4 = 0x11180904; //#
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK5 = 0x1118090C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK6 = 0x11180914;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK7 = 0x1118091C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK8 = 0x11180924;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK9 = 0x1118092C;
const static uint32_t PCIE_PERF_COUNT1_UPVAL_TXCLK10 = 0x11180934;
@@ -127,201 +127,200 @@ const static uint32_t PCIE_PERF_COUNT1_UPVAL_LCLK8 = 0x11180974;

// -------- RX Tile SCLK End ----------

typedef enum{
TX_TILE_TXCLK = 0,
TX_TILE_SCLK = 1,
RX_TILE_TXCLK = 2,
RX_TILE_SCLK = 3,
LC_TILE_TXCLK = 4
}pcie_event_category_t;
typedef enum {
TX_TILE_TXCLK = 0,
TX_TILE_SCLK = 1,
RX_TILE_TXCLK = 2,
RX_TILE_SCLK = 3,
LC_TILE_TXCLK = 4
} pcie_event_category_t;

struct pcie_event_t{
pcie_event_t(int id, pcie_event_category_t cat): event_id(id), event_category(cat){}
int event_id;
pcie_event_category_t event_category;
struct pcie_event_t {
pcie_event_t(int id, pcie_event_category_t cat) : event_id(id), event_category(cat) {}
int event_id;
pcie_event_category_t event_category;
};
const static std::map<std::string, pcie_event_t> pcie_events_table = {
{"RX_PERF_RXP_RX_TailEdb_A[0]", {2, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[1]", {3, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[2]", {4, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[3]", {5, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[0]", {6, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[1]", {7, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[2]", {8, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[3]", {9, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[0]", {10, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[1]", {11, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[2]", {12, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[3]", {13, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[0]", {14, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[1]", {15, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[2]", {16, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[3]", {17, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_nullified_tlp_A", {18, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_valid_crc_A", {19, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_invalid_crc_A", {20, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_vendor_type1_A", {21, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_vendor_type0_A", {22, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_set_slot_power_limit_A", {23, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_unlock_A", {24, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_fatal_A", {25, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_nonfatal_A", {26, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_corr_A", {27, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pme_to_ack_A", {28, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pme_turn_off_A", {29, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pm_pme_A", {30, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pm_active_state_nak_A", {31, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intd_A", {32, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intc_A", {33, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intb_A", {34, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_inta_A", {35, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intd_A", {36, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intc_A", {37, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intb_A", {38, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_inta_A", {39, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_valid_A", {40, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_unsupported_A", {41, RX_TILE_TXCLK}},
{"RX_PERF_RCB_unexpected_cpl_A", {42, RX_TILE_TXCLK}},
{"RX_PERF_RCB_timeout_cpl_A", {43, RX_TILE_TXCLK}},
{"RX_PERF_HDS_tlphdrvalid_A", {44, RX_TILE_TXCLK}},
{"RX_PERF_HDS_tlpdatavalid_A", {45, RX_TILE_TXCLK}},
{"RX_PERF_GAN_bad_tlp_A", {46, RX_TILE_TXCLK}},
{"RX_PERF_GAN_nak_A", {47, RX_TILE_TXCLK}},
{"RX_PERF_GAN_ack_A", {48, RX_TILE_TXCLK}},
{"RX_PERF_FE_unsupported_req_A", {49, RX_TILE_TXCLK}},
{"RX_PERF_FE_unsupported_cpl_A", {50, RX_TILE_TXCLK}},
{"RX_PERF_FE_unexpected_cpl_A", {51, RX_TILE_TXCLK}},
{"RX_PERF_FE_poisoned_tlp_A", {52, RX_TILE_TXCLK}},
{"RX_PERF_FE_poisoned_cpl_A", {53, RX_TILE_TXCLK}},
{"RX_PERF_FE_malformed_tlp_A", {54, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_abort_A", {55, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MSG_A", {56, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_CFG_WR_A", {57, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_CFG_RD_A", {58, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_IO_WR_A", {59, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_IO_RD_A", {60, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MEM_WR_A", {61, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MEM_RD_A", {62, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_gt16_A", {63, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_9to16_A", {64, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_5to8_A", {65, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_2to4_A", {66, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_1_A", {67, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_gt32_A", {68, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_17to32_A", {69, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_9to16_A", {70, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_5to8_A", {71, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_2to4_A", {72, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_1_A", {73, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_CA_A", {74, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_CRS_A", {75, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_UR_A", {76, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_SC_A", {77, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_active_state_request_l1_A", {78, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_request_ack_A", {79, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_enter_l23_A", {80, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_enter_l1_A", {81, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_error_A", {82, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_crc_err_A", {83, RX_TILE_TXCLK}},
{"SB_PERF_FCC_npd_0", {84, RX_TILE_TXCLK}},
{"SB_PERF_FCC_pd_0", {85, RX_TILE_TXCLK}},
{"SB_PERF_FCC_nph_0", {86, RX_TILE_TXCLK}},
{"SB_PERF_FCC_ph_0", {87, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_rd_hdr_0", {88, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_rd_hdr_0", {89, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_wr_hdr_0", {90, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_wr_hdr_0", {91, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_data_0", {92, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_data_0", {93, RX_TILE_TXCLK}},
{"SB_PERF_invalid_crc_0", {94, RX_TILE_TXCLK}},
{"SB_PERF_valid_crc_0", {95, RX_TILE_TXCLK}},
{"SB_PERF_rd_hdr_WEN_0", {96, RX_TILE_TXCLK}},
{"SB_PERF_wr_hdr_WEN_0", {97, RX_TILE_TXCLK}},
{"SB_PERF_data_WEN_0", {98, RX_TILE_TXCLK}},
{"SB_PERF_non_post_rd_from_FE", {99, RX_TILE_TXCLK}},
{"SB_PERF_non_post_wr_from_FE", {100, RX_TILE_TXCLK}},
{"SB_PERF_post_req_from_FE", {101, RX_TILE_TXCLK}},
{"SB_PERF_non_post_rd_from_FE_0", {102, RX_TILE_TXCLK}},
{"SB_PERF_non_post_wr_from_FE_0", {103, RX_TILE_TXCLK}},
{"SB_PERF_post_req_from_FE_0", {104, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_nak_A", {111, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_ack_A", {112, RX_TILE_TXCLK}},
{"RX_PERF_allErrors_A", {113, RX_TILE_TXCLK}},
{"perf_PG_COUNT", {175, RX_TILE_TXCLK}},
{"perf_NOT_POWER_GATED", {176, RX_TILE_TXCLK}},
{"perf_POWER_GATED", {177, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[0]", {2, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[1]", {3, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[2]", {4, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEdb_A[3]", {5, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[0]", {6, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[1]", {7, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[2]", {8, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_TailEnd_A[3]", {9, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[0]", {10, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[1]", {11, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[2]", {12, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadSdp_A[3]", {13, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[0]", {14, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[1]", {15, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[2]", {16, RX_TILE_TXCLK}},
{"RX_PERF_RXP_RX_HeadStp_A[3]", {17, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_nullified_tlp_A", {18, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_valid_crc_A", {19, RX_TILE_TXCLK}},
{"RX_PERF_RXCRC_invalid_crc_A", {20, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_vendor_type1_A", {21, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_vendor_type0_A", {22, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_set_slot_power_limit_A", {23, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_unlock_A", {24, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_fatal_A", {25, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_nonfatal_A", {26, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_err_corr_A", {27, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pme_to_ack_A", {28, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pme_turn_off_A", {29, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pm_pme_A", {30, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_pm_active_state_nak_A", {31, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intd_A", {32, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intc_A", {33, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_intb_A", {34, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_deassert_inta_A", {35, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intd_A", {36, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intc_A", {37, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_intb_A", {38, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_assert_inta_A", {39, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_valid_A", {40, RX_TILE_TXCLK}},
{"RX_PERF_RMSG_unsupported_A", {41, RX_TILE_TXCLK}},
{"RX_PERF_RCB_unexpected_cpl_A", {42, RX_TILE_TXCLK}},
{"RX_PERF_RCB_timeout_cpl_A", {43, RX_TILE_TXCLK}},
{"RX_PERF_HDS_tlphdrvalid_A", {44, RX_TILE_TXCLK}},
{"RX_PERF_HDS_tlpdatavalid_A", {45, RX_TILE_TXCLK}},
{"RX_PERF_GAN_bad_tlp_A", {46, RX_TILE_TXCLK}},
{"RX_PERF_GAN_nak_A", {47, RX_TILE_TXCLK}},
{"RX_PERF_GAN_ack_A", {48, RX_TILE_TXCLK}},
{"RX_PERF_FE_unsupported_req_A", {49, RX_TILE_TXCLK}},
{"RX_PERF_FE_unsupported_cpl_A", {50, RX_TILE_TXCLK}},
{"RX_PERF_FE_unexpected_cpl_A", {51, RX_TILE_TXCLK}},
{"RX_PERF_FE_poisoned_tlp_A", {52, RX_TILE_TXCLK}},
{"RX_PERF_FE_poisoned_cpl_A", {53, RX_TILE_TXCLK}},
{"RX_PERF_FE_malformed_tlp_A", {54, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_abort_A", {55, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MSG_A", {56, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_CFG_WR_A", {57, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_CFG_RD_A", {58, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_IO_WR_A", {59, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_IO_RD_A", {60, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MEM_WR_A", {61, RX_TILE_TXCLK}},
{"RX_PERF_FE_request_MEM_RD_A", {62, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_gt16_A", {63, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_9to16_A", {64, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_5to8_A", {65, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_2to4_A", {66, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_MST_1_A", {67, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_gt32_A", {68, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_17to32_A", {69, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_9to16_A", {70, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_5to8_A", {71, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_2to4_A", {72, RX_TILE_TXCLK}},
{"RX_PERF_FE_length_SLV_1_A", {73, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_CA_A", {74, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_CRS_A", {75, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_UR_A", {76, RX_TILE_TXCLK}},
{"RX_PERF_FE_cpl_status_SC_A", {77, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_active_state_request_l1_A", {78, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_request_ack_A", {79, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_enter_l23_A", {80, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_pm_enter_l1_A", {81, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_error_A", {82, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_crc_err_A", {83, RX_TILE_TXCLK}},
{"SB_PERF_FCC_npd_0", {84, RX_TILE_TXCLK}},
{"SB_PERF_FCC_pd_0", {85, RX_TILE_TXCLK}},
{"SB_PERF_FCC_nph_0", {86, RX_TILE_TXCLK}},
{"SB_PERF_FCC_ph_0", {87, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_rd_hdr_0", {88, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_rd_hdr_0", {89, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_wr_hdr_0", {90, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_wr_hdr_0", {91, RX_TILE_TXCLK}},
{"SB_PERF_fail_crc_data_0", {92, RX_TILE_TXCLK}},
{"SB_PERF_pass_crc_data_0", {93, RX_TILE_TXCLK}},
{"SB_PERF_invalid_crc_0", {94, RX_TILE_TXCLK}},
{"SB_PERF_valid_crc_0", {95, RX_TILE_TXCLK}},
{"SB_PERF_rd_hdr_WEN_0", {96, RX_TILE_TXCLK}},
{"SB_PERF_wr_hdr_WEN_0", {97, RX_TILE_TXCLK}},
{"SB_PERF_data_WEN_0", {98, RX_TILE_TXCLK}},
{"SB_PERF_non_post_rd_from_FE", {99, RX_TILE_TXCLK}},
{"SB_PERF_non_post_wr_from_FE", {100, RX_TILE_TXCLK}},
{"SB_PERF_post_req_from_FE", {101, RX_TILE_TXCLK}},
{"SB_PERF_non_post_rd_from_FE_0", {102, RX_TILE_TXCLK}},
{"SB_PERF_non_post_wr_from_FE_0", {103, RX_TILE_TXCLK}},
{"SB_PERF_post_req_from_FE_0", {104, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_nak_A", {111, RX_TILE_TXCLK}},
{"RX_PERF_DLLP_ack_A", {112, RX_TILE_TXCLK}},
{"RX_PERF_allErrors_A", {113, RX_TILE_TXCLK}},
{"perf_PG_COUNT", {175, RX_TILE_TXCLK}},
{"perf_NOT_POWER_GATED", {176, RX_TILE_TXCLK}},
{"perf_POWER_GATED", {177, RX_TILE_TXCLK}},

{"SB_PERF_non_post_rd_to_HI", {2, RX_TILE_SCLK}},
{"SB_PERF_non_post_wr_to_HI", {3, RX_TILE_SCLK}},
{"SB_PERF_post_req_to_HI", {4, RX_TILE_SCLK}},
{"SB_PERF_non_post_rd_to_HI_0", {5, RX_TILE_SCLK}},
{"SB_PERF_non_post_wr_to_HI_0", {6, RX_TILE_SCLK}},
{"SB_PERF_post_req_to_HI_0", {7, RX_TILE_SCLK}},
{"SB_PERF_rd_hdr_REN_0", {8, RX_TILE_SCLK}},
{"SB_PERF_wr_hdr_REN_0", {9, RX_TILE_SCLK}},
{"SB_PERF_data_REN_0", {10, RX_TILE_SCLK}},
{"SB_PERF_rd_hdr_empty_0", {11, RX_TILE_SCLK}},
{"SB_PERF_wr_hdr_empty_0", {12, RX_TILE_SCLK}},
{"SB_PERF_data_empty_0", {13, RX_TILE_SCLK}},
{"CI_PERF_slv_total128BRdCpl", {29, RX_TILE_SCLK}},
{"CI_PERF_slv_total32BMemRdTx", {30, RX_TILE_SCLK}},
{"CI_PERF_slv_total64BMemRdTx", {31, RX_TILE_SCLK}},
{"CI_PERF_slv_total16BMemWrTx", {32, RX_TILE_SCLK}},
{"CI_PERF_slv_total32BMemWrTx", {33, RX_TILE_SCLK}},
{"CI_PERF_slv_total64BMemWrTx", {34, RX_TILE_SCLK}},
{"CI_PERF_slv_totalTx", {35, RX_TILE_SCLK}},
{"CI_PERF_slv_stallGrantGen", {36, RX_TILE_SCLK}},
{"CI_PERF_slv_totalGrant", {37, RX_TILE_SCLK}},
{"CI_PERF_slv_txPending", {38, RX_TILE_SCLK}},
{"CI_PERF_slv_numMemRdLT32B", {39, RX_TILE_SCLK}},
{"CI_PERF_slv_numMemRdLT16B", {40, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemTx", {41, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemRdTx", {42, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemWrTx", {43, RX_TILE_SCLK}},
{"CI_PERF_slv_numGrant0", {44, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow_ns0", {45, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow_ns0", {46, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow_s0", {47, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow_s0", {48, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow0", {49, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow0", {50, RX_TILE_SCLK}},
{"CI_PERF_slv_npNotAccepted_ns0", {51, RX_TILE_SCLK}},
{"CI_PERF_slv_npNotAccepted_s0", {52, RX_TILE_SCLK}},
{"CI_PERF_slv_num128BRdCpl0", {53, RX_TILE_SCLK}},
{"CI_PERF_slv_num32BMemRdTx0", {54, RX_TILE_SCLK}},
{"CI_PERF_slv_num64BMemRdTx0", {55, RX_TILE_SCLK}},
{"CI_PERF_slv_num16BMemWrTx0", {56, RX_TILE_SCLK}},
{"CI_PERF_slv_num32BMemWrTx0", {57, RX_TILE_SCLK}},
{"CI_PERF_slv_num64BMemWrTx0", {58, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_Bandwidth0", {59, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_Bandwidth0", {60, RX_TILE_SCLK}},
{"TX_PERF_S_RCLK_s_tag_buf_empty", {61, RX_TILE_SCLK}},
{"P_request_latency_500ns_or_more", {62, RX_TILE_SCLK}},
{"P_request_latency_250_to_500ns", {63, RX_TILE_SCLK}},
{"P_request_latency_100_to_250ns", {64, RX_TILE_SCLK}},
{"P_request_latency_100ns_or_less", {65, RX_TILE_SCLK}},
{"NP_request_latency_500ns_or_more", {66, RX_TILE_SCLK}},
{"NP_request_latency_250_to_500ns", {67, RX_TILE_SCLK}},
{"NP_request_latency_100_to_250ns", {68, RX_TILE_SCLK}},
{"NP_request_latency_100ns_or_less", {69, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_cpl_slot[0]", {70, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_tag[0]", {71, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_d_credit[0]", {72, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_h_credit[0]", {73, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_tag[0]", {74, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_d_credit[0]", {75, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_h_credit[0]", {76, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_no_tags_q", {77, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_data_credits_q", {78, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_req_credits_q", {79, RX_TILE_SCLK}},
{"CISLV_PERF_no_cpl_slots_q[0]", {80, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_no_tags_q", {81, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_data_credits_q", {82, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_req_credits_q", {83, RX_TILE_SCLK}}
};
{"SB_PERF_non_post_rd_to_HI", {2, RX_TILE_SCLK}},
{"SB_PERF_non_post_wr_to_HI", {3, RX_TILE_SCLK}},
{"SB_PERF_post_req_to_HI", {4, RX_TILE_SCLK}},
{"SB_PERF_non_post_rd_to_HI_0", {5, RX_TILE_SCLK}},
{"SB_PERF_non_post_wr_to_HI_0", {6, RX_TILE_SCLK}},
{"SB_PERF_post_req_to_HI_0", {7, RX_TILE_SCLK}},
{"SB_PERF_rd_hdr_REN_0", {8, RX_TILE_SCLK}},
{"SB_PERF_wr_hdr_REN_0", {9, RX_TILE_SCLK}},
{"SB_PERF_data_REN_0", {10, RX_TILE_SCLK}},
{"SB_PERF_rd_hdr_empty_0", {11, RX_TILE_SCLK}},
{"SB_PERF_wr_hdr_empty_0", {12, RX_TILE_SCLK}},
{"SB_PERF_data_empty_0", {13, RX_TILE_SCLK}},
{"CI_PERF_slv_total128BRdCpl", {29, RX_TILE_SCLK}},
{"CI_PERF_slv_total32BMemRdTx", {30, RX_TILE_SCLK}},
{"CI_PERF_slv_total64BMemRdTx", {31, RX_TILE_SCLK}},
{"CI_PERF_slv_total16BMemWrTx", {32, RX_TILE_SCLK}},
{"CI_PERF_slv_total32BMemWrTx", {33, RX_TILE_SCLK}},
{"CI_PERF_slv_total64BMemWrTx", {34, RX_TILE_SCLK}},
{"CI_PERF_slv_totalTx", {35, RX_TILE_SCLK}},
{"CI_PERF_slv_stallGrantGen", {36, RX_TILE_SCLK}},
{"CI_PERF_slv_totalGrant", {37, RX_TILE_SCLK}},
{"CI_PERF_slv_txPending", {38, RX_TILE_SCLK}},
{"CI_PERF_slv_numMemRdLT32B", {39, RX_TILE_SCLK}},
{"CI_PERF_slv_numMemRdLT16B", {40, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemTx", {41, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemRdTx", {42, RX_TILE_SCLK}},
{"CI_PERF_slv_totalMemWrTx", {43, RX_TILE_SCLK}},
{"CI_PERF_slv_numGrant0", {44, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow_ns0", {45, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow_ns0", {46, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow_s0", {47, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow_s0", {48, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntOverFlow0", {49, RX_TILE_SCLK}},
{"CI_PERF_slv_portCntUnderFlow0", {50, RX_TILE_SCLK}},
{"CI_PERF_slv_npNotAccepted_ns0", {51, RX_TILE_SCLK}},
{"CI_PERF_slv_npNotAccepted_s0", {52, RX_TILE_SCLK}},
{"CI_PERF_slv_num128BRdCpl0", {53, RX_TILE_SCLK}},
{"CI_PERF_slv_num32BMemRdTx0", {54, RX_TILE_SCLK}},
{"CI_PERF_slv_num64BMemRdTx0", {55, RX_TILE_SCLK}},
{"CI_PERF_slv_num16BMemWrTx0", {56, RX_TILE_SCLK}},
{"CI_PERF_slv_num32BMemWrTx0", {57, RX_TILE_SCLK}},
{"CI_PERF_slv_num64BMemWrTx0", {58, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_Bandwidth0", {59, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_Bandwidth0", {60, RX_TILE_SCLK}},
{"TX_PERF_S_RCLK_s_tag_buf_empty", {61, RX_TILE_SCLK}},
{"P_request_latency_500ns_or_more", {62, RX_TILE_SCLK}},
{"P_request_latency_250_to_500ns", {63, RX_TILE_SCLK}},
{"P_request_latency_100_to_250ns", {64, RX_TILE_SCLK}},
{"P_request_latency_100ns_or_less", {65, RX_TILE_SCLK}},
{"NP_request_latency_500ns_or_more", {66, RX_TILE_SCLK}},
{"NP_request_latency_250_to_500ns", {67, RX_TILE_SCLK}},
{"NP_request_latency_100_to_250ns", {68, RX_TILE_SCLK}},
{"NP_request_latency_100ns_or_less", {69, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_cpl_slot[0]", {70, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_tag[0]", {71, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_d_credit[0]", {72, RX_TILE_SCLK}},
{"CI_PERF_slv_MemRd_wait_for_h_credit[0]", {73, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_tag[0]", {74, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_d_credit[0]", {75, RX_TILE_SCLK}},
{"CI_PERF_slv_MemWr_wait_for_h_credit[0]", {76, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_no_tags_q", {77, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_data_credits_q", {78, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC1_req_credits_q", {79, RX_TILE_SCLK}},
{"CISLV_PERF_no_cpl_slots_q[0]", {80, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_no_tags_q", {81, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_data_credits_q", {82, RX_TILE_SCLK}},
{"CISLV_PERF_no_VC0_req_credits_q", {83, RX_TILE_SCLK}}};

}
} // namespace PCIE_MI200


#endif
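The table in the hunk above maps a counter name to an event id and a tile category. A minimal sketch of a lookup by name follows, assuming a trimmed-down copy of the table (three entries taken from the diff) and a hypothetical helper `find_pcie_event`; neither `sample_events` nor the helper appears in the source.

```cpp
#include <cassert>
#include <map>
#include <string>

// Mirrors the enum and struct shown in the diff.
typedef enum {
  TX_TILE_TXCLK = 0,
  TX_TILE_SCLK = 1,
  RX_TILE_TXCLK = 2,
  RX_TILE_SCLK = 3,
  LC_TILE_TXCLK = 4
} pcie_event_category_t;

struct pcie_event_t {
  pcie_event_t(int id, pcie_event_category_t cat) : event_id(id), event_category(cat) {}
  int event_id;
  pcie_event_category_t event_category;
};

// Three entries copied from the table; the real table is much longer.
static const std::map<std::string, pcie_event_t> sample_events = {
    {"RX_PERF_GAN_ack_A", {48, RX_TILE_TXCLK}},
    {"SB_PERF_data_REN_0", {10, RX_TILE_SCLK}},
    {"CI_PERF_slv_totalTx", {35, RX_TILE_SCLK}}};

// Hypothetical helper: returns nullptr when the name is not in the table.
const pcie_event_t* find_pcie_event(const std::string& name) {
  auto it = sample_events.find(name);
  return it == sample_events.end() ? nullptr : &it->second;
}
```

`find` is used rather than `operator[]` because the map is `const` and `pcie_event_t` has no default constructor.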
@@ -42,6 +42,6 @@ class PerfMon {
std::vector<std::string> counter_names_;
};

} // namespace rocprofiler
} // namespace rocprofiler

#endif
@@ -31,10 +31,8 @@ THE SOFTWARE.
#include "util/hsa_rsrc_factory.h"

namespace rocprofiler {
size_t CreateGpuCommand(gpu_cmd_op_t op,
const rocprofiler::util::AgentInfo* agent_info,
packet_t* command,
const size_t& slot_count) {
size_t CreateGpuCommand(gpu_cmd_op_t op, const rocprofiler::util::AgentInfo* agent_info,
packet_t* command, const size_t& slot_count) {
if (op >= NUMBER_GPU_CMD_OP) EXC_RAISING(HSA_STATUS_ERROR, "bad op value (" << op << ")");

const bool is_legacy = (strncmp(agent_info->name, "gfx8", 4) == 0);
@@ -49,14 +47,15 @@ size_t CreateGpuCommand(gpu_cmd_op_t op,
profile.agent = agent_info->dev_id;
// Query for cmd buffer size
hsa_ven_amd_aqlprofile_info_type_t info_type =
(hsa_ven_amd_aqlprofile_info_type_t)((int)HSA_VEN_AMD_AQLPROFILE_INFO_ENABLE_CMD + (int)op);
hsa_status_t status = hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, NULL);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "get_info(ENABLE_CMD ).size exc, op(" << int(op) << ")");
(hsa_ven_amd_aqlprofile_info_type_t)((int)HSA_VEN_AMD_AQLPROFILE_INFO_ENABLE_CMD + (int)op);
hsa_status_t status =
hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, NULL);
if (status != HSA_STATUS_SUCCESS)
EXC_RAISING(status, "get_info(ENABLE_CMD ).size exc, op(" << int(op) << ")");
if (profile.command_buffer.size == 0) EXC_RAISING(status, "get_info(ENABLE_CMD).size == 0");
// Allocate cmd buffer
const size_t aligment_mask = 0x100 - 1;
profile.command_buffer.ptr =
hsa_rsrc->AllocateSysMemory(agent_info, profile.command_buffer.size);
profile.command_buffer.ptr = hsa_rsrc->AllocateSysMemory(agent_info, profile.command_buffer.size);
if ((reinterpret_cast<uintptr_t>(profile.command_buffer.ptr) & aligment_mask) != 0) {
EXC_RAISING(status, "profile.command_buffer.ptr bad alignment");
}
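The hunk above rejects a command buffer whose address is not a multiple of 0x100 by masking the pointer with `0x100 - 1`. A minimal sketch of that mask test in isolation, assuming a hypothetical helper `is_aligned` that is not part of the source:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when ptr is a multiple of alignment.
// The mask trick is only valid when alignment is a power of two.
bool is_aligned(const void* ptr, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  const std::uintptr_t mask = alignment - 1;  // e.g. 0x100 -> 0xFF
  return (reinterpret_cast<std::uintptr_t>(ptr) & mask) == 0;
}
```

With `alignment = 0x100` this is the same condition the diff checks before raising "bad alignment": any set bit in the low eight bits of the address means the buffer is misaligned.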
@@ -66,15 +65,18 @@ size_t CreateGpuCommand(gpu_cmd_op_t op,
packet_t packet{};

// Query for cmd buffer data
status = hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, &packet);
status =
hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, &packet);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "get_info(ENABLE_CMD).data exc");

// Check for legacy GFXIP
status = hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_legacy_get_pm4(&packet, command);
if (status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "hsa_ven_amd_aqlprofile_legacy_get_pm4");
if (status != HSA_STATUS_SUCCESS)
AQL_EXC_RAISING(status, "hsa_ven_amd_aqlprofile_legacy_get_pm4");
} else {
// Query for cmd buffer data
status = hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, command);
status =
hsa_rsrc->AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(&profile, info_type, command);
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "get_info(ENABLE_CMD).data exc");
}
@@ -91,15 +93,14 @@ struct gpu_cmd_key_t {
uint32_t node_id;
};
struct gpu_cmd_fncomp_t {
bool operator() (const gpu_cmd_key_t& a, const gpu_cmd_key_t& b) const {
bool operator()(const gpu_cmd_key_t& a, const gpu_cmd_key_t& b) const {
return (a.op < b.op) || ((a.op == b.op) && (a.node_id < b.node_id));
}
};
typedef std::map<gpu_cmd_key_t, gpu_cmd_entry_t, gpu_cmd_fncomp_t> gpu_cmd_map_t;

size_t GetGpuCommand(gpu_cmd_op_t op,
const rocprofiler::util::AgentInfo* agent_info,
packet_t** command_out) {
size_t GetGpuCommand(gpu_cmd_op_t op, const rocprofiler::util::AgentInfo* agent_info,
packet_t** command_out) {
thread_local gpu_cmd_map_t map;

// Getting NUMA node id
@@ -112,7 +113,8 @@ size_t GetGpuCommand(gpu_cmd_op_t op,
auto ret = map.insert({gpu_cmd_key_t{op, node_id}, gpu_cmd_entry_t{}});
gpu_cmd_map_t::iterator it = ret.first;
if (ret.second) {
it->second.size = CreateGpuCommand(op, agent_info, it->second.command, Profile::LEGACY_SLOT_SIZE_PKT);
it->second.size =
CreateGpuCommand(op, agent_info, it->second.command, Profile::LEGACY_SLOT_SIZE_PKT);
}

*command_out = it->second.command;
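`GetGpuCommand` above keys a `thread_local` map on `(op, node_id)` and builds a command only when `insert` reports that the entry is new. The sketch below reproduces that insert-then-fill pattern with simplified stand-in types; `cmd_key_t`, `cmd_key_less`, `build_command`, and `get_command` are hypothetical names for illustration, and the payload is a plain `int` rather than a packet buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <tuple>

// Simplified stand-in for gpu_cmd_key_t (assumption for illustration).
struct cmd_key_t {
  int op;
  std::uint32_t node_id;
};

// Same lexicographic ordering as gpu_cmd_fncomp_t, expressed with std::tie.
struct cmd_key_less {
  bool operator()(const cmd_key_t& a, const cmd_key_t& b) const {
    return std::tie(a.op, a.node_id) < std::tie(b.op, b.node_id);
  }
};

// Counts how often the (notionally expensive) build step runs.
static int build_calls = 0;

int build_command(int op, std::uint32_t node_id) {
  ++build_calls;
  return op * 100 + static_cast<int>(node_id);  // placeholder payload
}

// Insert-then-fill: the entry is built only when insert() created it.
int get_command(int op, std::uint32_t node_id) {
  thread_local std::map<cmd_key_t, int, cmd_key_less> cache;
  auto ret = cache.insert({cmd_key_t{op, node_id}, 0});
  if (ret.second) ret.first->second = build_command(op, node_id);
  return ret.first->second;
}
```

Because the cache is `thread_local`, each thread builds its own entries and no locking is needed, at the cost of repeating the build once per thread.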
@@ -37,9 +37,8 @@ enum gpu_cmd_op_t {
NUMBER_GPU_CMD_OP
};

size_t GetGpuCommand(gpu_cmd_op_t op,
const rocprofiler::util::AgentInfo* agent_info,
packet_t** command_out);
size_t GetGpuCommand(gpu_cmd_op_t op, const rocprofiler::util::AgentInfo* agent_info,
packet_t** command_out);

static inline size_t IssueGpuCommand(gpu_cmd_op_t op,
const rocprofiler::util::AgentInfo* agent_info,
@@ -55,9 +54,7 @@ static inline size_t IssueGpuCommand(gpu_cmd_op_t op,
return HSA_STATUS_SUCCESS;
}

static inline size_t IssueGpuCommand(gpu_cmd_op_t op,
hsa_agent_t agent,
hsa_queue_t* queue) {
static inline size_t IssueGpuCommand(gpu_cmd_op_t op, hsa_agent_t agent, hsa_queue_t* queue) {
rocprofiler::util::HsaRsrcFactory* hsa_rsrc = &rocprofiler::util::HsaRsrcFactory::Instance();
const rocprofiler::util::AgentInfo* agent_info = hsa_rsrc->GetAgentInfo(agent);
return IssueGpuCommand(op, agent_info, queue);
@@ -55,31 +55,30 @@ struct block_status_t {

// Metrics set class
class MetricsGroup {
public:
public:
// Info map type
typedef std::map<std::string, const Metric*> info_map_t;
// Blocks map type
typedef std::map<block_des_t, block_status_t, lt_block_des> blocks_map_t;

MetricsGroup(const util::AgentInfo* agent_info) :
agent_info_(agent_info)
{
MetricsGroup(const util::AgentInfo* agent_info) : agent_info_(agent_info) {
metrics_ = MetricsDict::Create(agent_info);
if (metrics_ == NULL) EXC_RAISING(HSA_STATUS_ERROR, "MetricsDict create failed");
}

void Print(FILE* file) const {
for (const Metric* metric : metrics_vec_) {
fprintf(file, " %s", metric->GetName().c_str()); fflush(stdout);
fprintf(file, " %s", metric->GetName().c_str());
fflush(stdout);
}
fprintf(file, "\n"); fflush(stdout);
fprintf(file, "\n");
fflush(stdout);
}

static const Metric* GetMetric(const MetricsDict* metrics, const std::string& name) {
// Metric object
const Metric* metric = metrics->Get(name);
if (metric == NULL)
EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name << "' is not found");
if (metric == NULL) EXC_RAISING(HSA_STATUS_ERROR, "input metric '" << name << "' is not found");
return metric;
}
@@ -95,9 +94,7 @@ class MetricsGroup {
|
||||
}
|
||||
|
||||
// Add metric
|
||||
bool AddMetric(const rocprofiler_feature_t* info) {
|
||||
return AddMetric(GetMetric(metrics_, info));
|
||||
}
|
||||
bool AddMetric(const rocprofiler_feature_t* info) { return AddMetric(GetMetric(metrics_, info)); }
|
||||
|
||||
bool AddMetric(const Metric* metric) {
|
||||
// Blocks utilization delta
|
||||
@@ -125,8 +122,9 @@ class MetricsGroup {
|
||||
query.events = event;
|
||||
|
||||
uint32_t block_counters;
|
||||
hsa_status_t status = util::HsaRsrcFactory::Instance().AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(
|
||||
&query, HSA_VEN_AMD_AQLPROFILE_INFO_BLOCK_COUNTERS, &block_counters);
|
||||
hsa_status_t status =
|
||||
util::HsaRsrcFactory::Instance().AqlProfileApi()->hsa_ven_amd_aqlprofile_get_info(
|
||||
&query, HSA_VEN_AMD_AQLPROFILE_INFO_BLOCK_COUNTERS, &block_counters);
|
||||
if (status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "get block_counters info");
|
||||
block_status.max_counters = block_counters;
|
||||
}
|
||||
@@ -141,7 +139,8 @@ class MetricsGroup {
|
||||
metrics_vec_.push_back(metric);
|
||||
info_map_[metric->GetName()] = metric;
|
||||
for (const counter_t* counter : counters_vec) {
|
||||
if (info_map_.find(counter->name) == info_map_.end()) info_map_[counter->name] = NewCounterInfo(counter->name);
|
||||
if (info_map_.find(counter->name) == info_map_.end())
|
||||
info_map_[counter->name] = NewCounterInfo(counter->name);
|
||||
}
|
||||
for (const auto& entry : blocks_delta) {
|
||||
blocks_map_[entry.first] = entry.second;
|
||||
@@ -150,10 +149,8 @@ class MetricsGroup {
|
||||
return true;
|
||||
}
|
||||
|
||||
private:
|
||||
const Metric* NewCounterInfo(const std::string& name) const {
|
||||
return GetMetric(metrics_, name);
|
||||
}
|
||||
private:
|
||||
const Metric* NewCounterInfo(const std::string& name) const { return GetMetric(metrics_, name); }
|
||||
|
||||
// Agent info
|
||||
const util::AgentInfo* const agent_info_;
|
||||
@@ -169,10 +166,10 @@ class MetricsGroup {
|
||||
|
||||
// Metrics groups class
|
||||
class MetricsGroupSet {
|
||||
public:
|
||||
MetricsGroupSet(const util::AgentInfo* agent_info, const rocprofiler_feature_t* info_array, const uint32_t info_count) :
|
||||
agent_info_(agent_info)
|
||||
{
|
||||
public:
|
||||
MetricsGroupSet(const util::AgentInfo* agent_info, const rocprofiler_feature_t* info_array,
|
||||
const uint32_t info_count)
|
||||
: agent_info_(agent_info) {
|
||||
metrics_ = MetricsDict::Create(agent_info);
|
||||
if (metrics_ == NULL) EXC_RAISING(HSA_STATUS_ERROR, "MetricsDict create failed");
|
||||
Initialize(info_array, info_count);
|
||||
@@ -186,12 +183,13 @@ class MetricsGroupSet {
|
||||
|
||||
void Print(FILE* file) const {
|
||||
for (const auto* group : groups_) {
|
||||
fprintf(stdout, " pmc : "); fflush(stdout);
|
||||
fprintf(stdout, " pmc : ");
|
||||
fflush(stdout);
|
||||
group->Print(file);
|
||||
}
|
||||
}
|
||||
|
||||
private:
|
||||
private:
|
||||
void Initialize(const rocprofiler_feature_t* info_array, const uint32_t info_count) {
|
||||
std::multimap<uint32_t, const Metric*, std::greater<uint32_t> > input_metrics;
|
||||
for (unsigned i = 0; i < info_count; ++i) {
|
||||
@@ -202,7 +200,8 @@ class MetricsGroupSet {
|
||||
input_metrics.insert({counters_num, metric});
|
||||
|
||||
if (MetricsGroup(agent_info_).AddMetric(metric) == false) {
|
||||
AQL_EXC_RAISING(HSA_STATUS_ERROR, "Metric '" << metric->GetName() << "' doesn't fit in one group");
|
||||
AQL_EXC_RAISING(HSA_STATUS_ERROR,
|
||||
"Metric '" << metric->GetName() << "' doesn't fit in one group");
|
||||
}
|
||||
}
|
||||
#if 0
|
||||
@@ -239,4 +238,4 @@ class MetricsGroupSet {
|
||||
|
||||
} // namespace rocprofiler
|
||||
|
||||
#endif // SRC_CORE_GROUP_SET_H_
|
||||
#endif // SRC_CORE_GROUP_SET_H_
|
||||
|
||||
@@ -62,33 +62,28 @@ AgentInfo::AgentInfo(const hsa_agent_t agent, ::CoreApiTable* table) : handle_(a
|
||||
table->hsa_agent_get_info_fn(
|
||||
agent, static_cast<hsa_agent_info_t>(HSA_AMD_AGENT_INFO_NUM_SHADER_ENGINES), &se_num_);
|
||||
|
||||
if (table->hsa_agent_get_info_fn(
|
||||
agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_NUM_SHADER_ARRAYS_PER_SE,
|
||||
&shader_arrays_per_se_) != HSA_STATUS_SUCCESS ||
|
||||
table->hsa_agent_get_info_fn(
|
||||
agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_MAX_WAVES_PER_CU,
|
||||
&waves_per_cu_) != HSA_STATUS_SUCCESS)
|
||||
{
|
||||
rocprofiler::fatal("hsa_agent_get_info for gfxip hardware configuration failed");
|
||||
if (table->hsa_agent_get_info_fn(agent,
|
||||
(hsa_agent_info_t)HSA_AMD_AGENT_INFO_NUM_SHADER_ARRAYS_PER_SE,
|
||||
&shader_arrays_per_se_) != HSA_STATUS_SUCCESS ||
|
||||
table->hsa_agent_get_info_fn(agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_MAX_WAVES_PER_CU,
|
||||
&waves_per_cu_) != HSA_STATUS_SUCCESS) {
|
||||
rocprofiler::fatal("hsa_agent_get_info for gfxip hardware configuration failed");
|
||||
}
|
||||
|
||||
compute_units_per_sh_ = cu_num_ / (se_num_ * shader_arrays_per_se_);
|
||||
wave_slots_per_simd_ = waves_per_cu_ / simds_per_cu_;
|
||||
|
||||
if (table->hsa_agent_get_info_fn(
|
||||
agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_DOMAIN,
|
||||
&pci_domain_) != HSA_STATUS_SUCCESS ||
|
||||
table->hsa_agent_get_info_fn(
|
||||
agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_BDFID,
|
||||
&pci_location_id_) != HSA_STATUS_SUCCESS)
|
||||
{
|
||||
rocprofiler::fatal("hsa_agent_get_info for PCI info failed");
|
||||
if (table->hsa_agent_get_info_fn(agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_DOMAIN,
|
||||
&pci_domain_) != HSA_STATUS_SUCCESS ||
|
||||
table->hsa_agent_get_info_fn(agent, (hsa_agent_info_t)HSA_AMD_AGENT_INFO_BDFID,
|
||||
&pci_location_id_) != HSA_STATUS_SUCCESS) {
|
||||
rocprofiler::fatal("hsa_agent_get_info for PCI info failed");
|
||||
}
|
||||
|
||||
// TODO(saurabh, giovanni): Remove this in 5.7
|
||||
if (table->hsa_agent_get_info_fn(agent,
|
||||
static_cast<hsa_agent_info_t>(HSA_AMD_AGENT_INFO_NUM_XCC), &xcc_num_) != HSA_STATUS_SUCCESS) {
|
||||
xcc_num_ = 1;
|
||||
if (table->hsa_agent_get_info_fn(agent, static_cast<hsa_agent_info_t>(HSA_AMD_AGENT_INFO_NUM_XCC),
|
||||
&xcc_num_) != HSA_STATUS_SUCCESS) {
|
||||
xcc_num_ = 1;
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -33,8 +33,8 @@ Agent::AgentInfo& GetAgentInfo(decltype(hsa_agent_t::handle) handle) {
|
||||
if (agent_info_map.find(handle) != agent_info_map.end()) {
|
||||
return agent_info_map.at(handle);
|
||||
} else {
|
||||
std::cerr << std::string("Error: Can't find Agent with handle(") << std::to_string(handle) <<
|
||||
") in this system" << std::endl;
|
||||
std::cerr << std::string("Error: Can't find Agent with handle(") << std::to_string(handle)
|
||||
<< ") in this system" << std::endl;
|
||||
abort();
|
||||
}
|
||||
}
|
||||
@@ -49,9 +49,7 @@ void SetAgentInfo(decltype(hsa_agent_t::handle) handle, const Agent::AgentInfo&
|
||||
}
|
||||
}
|
||||
|
||||
std::vector<hsa_agent_t>& GetCPUAgentList() {
|
||||
return cpu_agents_list;
|
||||
}
|
||||
std::vector<hsa_agent_t>& GetCPUAgentList() { return cpu_agents_list; }
|
||||
|
||||
hsa_agent_t GetAgentByIndex(uint64_t agent_index) {
|
||||
std::lock_guard<std::mutex> lock(agents_map_lock);
|
||||
@@ -60,8 +58,8 @@ hsa_agent_t GetAgentByIndex(uint64_t agent_index) {
|
||||
return hsa_agent_t{agent_info.second.getHandle()};
|
||||
}
|
||||
}
|
||||
std::cerr << std::string("Error: Can't find Agent with Index(") << std::to_string(agent_index) <<
|
||||
") in this system" << std::endl;
|
||||
std::cerr << std::string("Error: Can't find Agent with Index(") << std::to_string(agent_index)
|
||||
<< ") in this system" << std::endl;
|
||||
abort();
|
||||
}
|
||||
|
||||
|
||||
@@ -95,7 +95,7 @@ namespace rocprofiler {
 namespace hsa_support {
 
 void Initialize(HsaApiTable* Table);
-hsa_status_t hsa_iterate_agents_cb(hsa_agent_t agent, void *data);
+hsa_status_t hsa_iterate_agents_cb(hsa_agent_t agent, void* data);
 void Finalize();
 
 bool IterateCounters(rocprofiler_counters_info_callback_t counters_info_callback);
@@ -181,7 +181,7 @@ InitializeAqlPackets(hsa_agent_t cpu_agent, hsa_agent_t gpu_agent,
|
||||
|
||||
// TODO: validate needs to be called on each events_list[i]
|
||||
// Validating the events array for the specified gpu agent
|
||||
if(events_list.size() > 0) {
|
||||
if (events_list.size() > 0) {
|
||||
bool validate_event_result;
|
||||
status =
|
||||
hsa_ven_amd_aqlprofile_validate_event(gpu_agent, &events_list[0], &validate_event_result);
|
||||
@@ -234,9 +234,10 @@ InitializeAqlPackets(hsa_agent_t cpu_agent, hsa_agent_t gpu_agent,
|
||||
}
|
||||
}
|
||||
|
||||
for(auto& cname : counter_names) {
|
||||
if(cname.compare("KERNEL_DURATION")==0) {
|
||||
rocprofiler::Metric* metric = const_cast<rocprofiler::Metric*>(metricsDict[gpu_agent.handle]->Get(cname));
|
||||
for (auto& cname : counter_names) {
|
||||
if (cname.compare("KERNEL_DURATION") == 0) {
|
||||
rocprofiler::Metric* metric =
|
||||
const_cast<rocprofiler::Metric*>(metricsDict[gpu_agent.handle]->Get(cname));
|
||||
if (metric == nullptr) std::cout << cname << " not found in metricsDict\n";
|
||||
context->metrics_list.push_back(metric);
|
||||
}
|
||||
@@ -315,7 +316,7 @@ InitializeAqlPackets(hsa_agent_t cpu_agent, hsa_agent_t gpu_agent,
|
||||
hsa_agent_t ag_list[ag_list_count];
|
||||
ag_list[0] = gpu_agent;
|
||||
|
||||
if(context->events_list.size() > 0) {
|
||||
if (context->events_list.size() > 0) {
|
||||
// Preparing an Getting the size of the command and output buffers
|
||||
status = hsa_ven_amd_aqlprofile_start(profile, NULL);
|
||||
// CHECK_HSA_STATUS("Error: Getting Buffers Size", status);
|
||||
@@ -510,7 +511,8 @@ uint8_t* AllocateLocalMemory(size_t size, hsa_amd_memory_pool_t* gpu_pool) {
|
||||
return ptr;
|
||||
}
|
||||
|
||||
hsa_status_t Allocate(hsa_agent_t gpu_agent, hsa_ven_amd_aqlprofile_profile_t* profile, size_t att_buffer_size) {
|
||||
hsa_status_t Allocate(hsa_agent_t gpu_agent, hsa_ven_amd_aqlprofile_profile_t* profile,
|
||||
size_t att_buffer_size) {
|
||||
Agent::AgentInfo& agentInfo = rocprofiler::hsa_support::GetAgentInfo(gpu_agent.handle);
|
||||
profile->command_buffer.ptr =
|
||||
AllocateSysMemory(gpu_agent, profile->command_buffer.size, &agentInfo.cpu_pool);
|
||||
|
||||
@@ -435,16 +435,18 @@ bool AsyncSignalHandler(hsa_signal_value_t signal_value, void* data) {
|
||||
pending->session_id = GetROCProfilerSingleton()->GetCurrentSessionId();
|
||||
}
|
||||
if (pending->counters_count > 0) {
|
||||
if (xcc_id == 0 && pending->context && pending->context->metrics_list.size() > 0 && pending->profile) // call to GetCounterData() is required only once for a dispatch
|
||||
if (xcc_id == 0 && pending->context && pending->context->metrics_list.size() > 0 &&
|
||||
pending->profile) // call to GetCounterData() is required only once for a dispatch
|
||||
rocprofiler::metrics::GetCounterData(pending->profile, queue_info_session->agent,
|
||||
pending->context->results_list);
|
||||
if (is_individual_xcc_mode)
|
||||
rocprofiler::metrics::GetCountersAndMetricResultsByXcc(
|
||||
xcc_id, pending->context->results_list, pending->context->results_map,
|
||||
pending->context->metrics_list, time.end-time.start);
|
||||
pending->context->metrics_list, time.end - time.start);
|
||||
else
|
||||
rocprofiler::metrics::GetMetricsData(pending->context->results_map,
|
||||
pending->context->metrics_list, time.end-time.start);
|
||||
pending->context->metrics_list,
|
||||
time.end - time.start);
|
||||
AddRecordCounters(&record, pending);
|
||||
} else {
|
||||
if (session->FindBuffer(pending->buffer_id)) {
|
||||
@@ -652,8 +654,8 @@ void CheckNeededProfileConfigs() {
|
||||
att_counters_names = filter->GetCounterData();
|
||||
kernel_profile_names = std::get<std::vector<std::string>>(
|
||||
filter->GetProperty(ROCPROFILER_FILTER_KERNEL_NAMES));
|
||||
kernel_profile_dispatch_ids = std::get<std::vector<uint64_t>>(
|
||||
filter->GetProperty(ROCPROFILER_FILTER_DISPATCH_IDS));
|
||||
kernel_profile_dispatch_ids =
|
||||
std::get<std::vector<uint64_t>>(filter->GetProperty(ROCPROFILER_FILTER_DISPATCH_IDS));
|
||||
} else if (session && session->FindFilterWithKind(ROCPROFILER_PC_SAMPLING_COLLECTION)) {
|
||||
is_pc_sampling_collection_mode = true;
|
||||
}
|
||||
@@ -685,23 +687,20 @@ std::pair<std::vector<bool>, bool> GetAllowedProfilesList(const void* packets, i
|
||||
auto& kdispatch = static_cast<const hsa_kernel_dispatch_packet_s*>(packets)[i];
|
||||
|
||||
// If Dispatch IDs specified, profile based on dispatch ID
|
||||
for (auto id : kernel_profile_dispatch_ids)
|
||||
b_profile_this_object |= id == current_writer_id;
|
||||
for (auto id : kernel_profile_dispatch_ids) b_profile_this_object |= id == current_writer_id;
|
||||
try {
|
||||
// Can throw
|
||||
const std::string& kernel_name = ksymbols->at(kdispatch.kernel_object);
|
||||
|
||||
// If no filters specified, auto profile this kernel
|
||||
if (kernel_profile_names.size() == 0 &&
|
||||
kernel_profile_dispatch_ids.size() == 0 &&
|
||||
if (kernel_profile_names.size() == 0 && kernel_profile_dispatch_ids.size() == 0 &&
|
||||
kernel_name.find("__amd_rocclr_") == std::string::npos)
|
||||
b_profile_this_object = true;
|
||||
b_profile_this_object = true;
|
||||
|
||||
// Try to match the mangled kernel name with given matches in input.txt
|
||||
// We want to initiate att profiling if a match exists
|
||||
for (const std::string& kernel_matches : kernel_profile_names)
|
||||
if (kernel_name.find(kernel_matches) != std::string::npos)
|
||||
b_profile_this_object = true;
|
||||
if (kernel_name.find(kernel_matches) != std::string::npos) b_profile_this_object = true;
|
||||
} catch (...) {
|
||||
printf("Warning: Unknown name for object %lu\n", kdispatch.kernel_object);
|
||||
}
|
||||
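The hunk above only reflows the filtering logic in `GetAllowedProfilesList`; the decision it makes is unchanged. As a rough sketch of that decision in isolation, the function below takes the filters as parameters instead of reading the file-level `kernel_profile_names` and `kernel_profile_dispatch_ids`. The name `ShouldProfile` and its signature are assumptions made for illustration and do not appear in the source.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of rocprofiler): mirrors the order of checks
// visible in the diff for a single kernel dispatch.
inline bool ShouldProfile(const std::string& kernel_name, uint64_t dispatch_id,
                          const std::vector<std::string>& name_filters,
                          const std::vector<uint64_t>& id_filters) {
  bool profile = false;

  // 1. An explicit dispatch-ID match always selects the dispatch.
  for (uint64_t id : id_filters) profile |= (id == dispatch_id);

  // 2. With no filters at all, profile everything except runtime-internal
  //    kernels whose name contains "__amd_rocclr_".
  if (name_filters.empty() && id_filters.empty() &&
      kernel_name.find("__amd_rocclr_") == std::string::npos)
    profile = true;

  // 3. Any substring match against a name filter selects the dispatch.
  for (const std::string& filter : name_filters)
    if (kernel_name.find(filter) != std::string::npos) profile = true;

  return profile;
}
```

For example, `ShouldProfile("vec_add", 5, {"add"}, {})` selects the kernel by name, while an unfiltered run skips `"__amd_rocclr_copyBuffer"`.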
@@ -711,17 +710,13 @@ std::pair<std::vector<bool>, bool> GetAllowedProfilesList(const void* packets, i
|
||||
can_profile_packet.push_back(b_profile_this_object);
|
||||
}
|
||||
// If we're going to skip all packets, need to update writer ID
|
||||
if (!b_can_profile_anypacket)
|
||||
WRITER_ID.store(current_writer_id, std::memory_order_release);
|
||||
if (!b_can_profile_anypacket) WRITER_ID.store(current_writer_id, std::memory_order_release);
|
||||
return {can_profile_packet, b_can_profile_anypacket};
|
||||
}
|
||||
|
||||
hsa_ven_amd_aqlprofile_profile_t* ProcessATTParams(
|
||||
Packet::packet_t& start_packet,
|
||||
Packet::packet_t& stop_packet,
|
||||
Queue& queue_info,
|
||||
Agent::AgentInfo& agentInfo
|
||||
) {
|
||||
hsa_ven_amd_aqlprofile_profile_t* ProcessATTParams(Packet::packet_t& start_packet,
|
||||
Packet::packet_t& stop_packet, Queue& queue_info,
|
||||
Agent::AgentInfo& agentInfo) {
|
||||
std::vector<hsa_ven_amd_aqlprofile_parameter_t> att_params;
|
||||
int num_att_counters = 0;
|
||||
uint32_t att_buffer_size = DEFAULT_ATT_BUFFER_SIZE;
|
||||
@@ -731,15 +726,16 @@ hsa_ven_amd_aqlprofile_profile_t* ProcessATTParams(
|
||||
case ROCPROFILER_ATT_PERFCOUNTER_NAME:
|
||||
break;
|
||||
case ROCPROFILER_ATT_BUFFER_SIZE:
|
||||
att_buffer_size = std::max(96l<<10l, std::min(int64_t(param.value)<<20l, (1l<<32l)-(3l<<20)));
|
||||
break; // Clip to [96KB, 4GB)
|
||||
att_buffer_size =
|
||||
std::max(96l << 10l, std::min(int64_t(param.value) << 20l, (1l << 32l) - (3l << 20)));
|
||||
break; // Clip to [96KB, 4GB)
|
||||
case ROCPROFILER_ATT_PERFCOUNTER:
|
||||
num_att_counters += 1;
|
||||
break;
|
||||
default:
|
||||
att_params.push_back(
|
||||
{static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(param.parameter_name)),
|
||||
param.value});
|
||||
{static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(param.parameter_name)),
|
||||
param.value});
|
||||
}
|
||||
}
|
||||
|
||||
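The `ROCPROFILER_ATT_BUFFER_SIZE` case in the hunk above treats the parameter as a size in MiB and clips the resulting byte count to the range noted in its comment. A minimal sketch of the same arithmetic follows, using fixed-width types throughout; the helper name `ClampAttBufferSize` and the two constants are hypothetical, introduced only for this illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Bounds taken from the expression in the diff: 96 KiB and 4 GiB - 3 MiB.
constexpr int64_t kMinAttBufferBytes = int64_t{96} << 10;
constexpr int64_t kMaxAttBufferBytes = (int64_t{1} << 32) - (int64_t{3} << 20);

// Hypothetical helper: converts a request in MiB to bytes and clamps it.
// The upper bound fits in 32 bits, so the narrowing cast is lossless.
inline uint32_t ClampAttBufferSize(uint32_t requested_mib) {
  const int64_t requested_bytes = int64_t{requested_mib} * (int64_t{1} << 20);
  const int64_t clamped =
      std::max(kMinAttBufferBytes, std::min(requested_bytes, kMaxAttBufferBytes));
  return static_cast<uint32_t>(clamped);
}
```

Spelling every operand as `int64_t` keeps `std::min` and `std::max` deducing a single type regardless of how wide `long` is on the target platform.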
@@ -760,22 +756,21 @@ hsa_ven_amd_aqlprofile_profile_t* ProcessATTParams(
|
||||
printf("Only events from the SQ block can be selected for ATT.");
|
||||
exit(1);
|
||||
}
|
||||
att_params.push_back({static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(
|
||||
int(ROCPROFILER_ATT_PERFCOUNTER)),
|
||||
event.counter_id | (event.counter_id ? (0xF << 24) : 0)});
|
||||
att_params.push_back(
|
||||
{static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(ROCPROFILER_ATT_PERFCOUNTER)),
|
||||
event.counter_id | (event.counter_id ? (0xF << 24) : 0)});
|
||||
num_att_counters += 1;
|
||||
}
|
||||
|
||||
hsa_ven_amd_aqlprofile_parameter_t zero_perf = {
|
||||
static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(ROCPROFILER_ATT_PERFCOUNTER)),
|
||||
0};
|
||||
static_cast<hsa_ven_amd_aqlprofile_parameter_name_t>(int(ROCPROFILER_ATT_PERFCOUNTER)), 0};
|
||||
|
||||
// Fill other perfcounters with 0's
|
||||
for (; num_att_counters < 16; num_att_counters++) att_params.push_back(zero_perf);
|
||||
}
|
||||
// Get the PM4 Packets using packets_generator
|
||||
return Packet::GenerateATTPackets(queue_info.GetCPUAgent(), queue_info.GetGPUAgent(),
|
||||
att_params, &start_packet, &stop_packet, att_buffer_size);
|
||||
return Packet::GenerateATTPackets(queue_info.GetCPUAgent(), queue_info.GetGPUAgent(), att_params,
|
||||
&start_packet, &stop_packet, att_buffer_size);
|
||||
}
|
||||
|
||||
/**
|
||||
@@ -866,14 +861,16 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt
|
||||
record_id);
|
||||
if (session_data_count > 0 && profile.second) {
|
||||
session->GetProfiler()->AddPendingSignals(
|
||||
writer_id, record_id, original_packet.completion_signal, dispatch_packet.completion_signal, session_id, buffer_id,
|
||||
profile.first, session_data_count, profile.second, kernel_properties,
|
||||
(uint32_t)syscall(__NR_gettid), user_pkt_index, correlation_id);
|
||||
writer_id, record_id, original_packet.completion_signal,
|
||||
dispatch_packet.completion_signal, session_id, buffer_id, profile.first,
|
||||
session_data_count, profile.second, kernel_properties, (uint32_t)syscall(__NR_gettid),
|
||||
user_pkt_index, correlation_id);
|
||||
} else {
|
||||
session->GetProfiler()->AddPendingSignals(
|
||||
writer_id, record_id, original_packet.completion_signal, dispatch_packet.completion_signal, session_id, buffer_id,
|
||||
nullptr, session_data_count, nullptr, kernel_properties, (uint32_t)syscall(__NR_gettid),
|
||||
user_pkt_index, correlation_id);
|
||||
writer_id, record_id, original_packet.completion_signal,
|
||||
dispatch_packet.completion_signal, session_id, buffer_id, nullptr, session_data_count,
|
||||
nullptr, kernel_properties, (uint32_t)syscall(__NR_gettid), user_pkt_index,
|
||||
correlation_id);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -893,7 +890,8 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt
|
||||
CreateSignal(0, &interrupt_signal);
|
||||
|
||||
// Adding Stop and Read PM4 Packets
|
||||
if (session_data_count > 0 && is_counter_collection_mode && profiles.size() > 0 && profile.first && profile.first->stop_packet) {
|
||||
if (session_data_count > 0 && is_counter_collection_mode && profiles.size() > 0 &&
|
||||
profile.first && profile.first->stop_packet) {
|
||||
hsa_signal_t dummy_signal{};
|
||||
profile.first->stop_packet->header = HSA_PACKET_TYPE_VENDOR_SPECIFIC
|
||||
<< HSA_PACKET_HEADER_TYPE;
|
||||
@@ -937,7 +935,8 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt
|
||||
|
||||
bool can_profile_anypacket = false;
|
||||
std::vector<bool> can_profile_packet;
|
||||
std::tie(can_profile_packet, can_profile_anypacket) = GetAllowedProfilesList(packets, pkt_count);
|
||||
std::tie(can_profile_packet, can_profile_anypacket) =
|
||||
GetAllowedProfilesList(packets, pkt_count);
|
||||
|
||||
if (!can_profile_anypacket) {
|
||||
/* Write the original packets to the hardware if no patch will be profiled */
|
||||
@@ -964,8 +963,9 @@ void WriteInterceptor(const void* packets, uint64_t pkt_count, uint64_t user_pkt
|
||||
|
||||
// increment writer ID for every packet
|
||||
if (bit_extract(original_packet.header, HSA_PACKET_HEADER_TYPE,
|
||||
HSA_PACKET_HEADER_TYPE+HSA_PACKET_HEADER_WIDTH_TYPE-1) == HSA_PACKET_TYPE_KERNEL_DISPATCH)
|
||||
writer_id = WRITER_ID.fetch_add(1, std::memory_order_release);
|
||||
HSA_PACKET_HEADER_TYPE + HSA_PACKET_HEADER_WIDTH_TYPE - 1) ==
|
||||
HSA_PACKET_TYPE_KERNEL_DISPATCH)
|
||||
writer_id = WRITER_ID.fetch_add(1, std::memory_order_release);
|
||||
|
||||
continue;
|
||||
}
|
||||
|
||||
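The writer-ID increment above depends on extracting the packet-type field from the AQL packet header. `bit_extract` is not shown in this diff, so the sketch below assumes it returns the bits from `lsb` through `msb` inclusive, which matches how it is called. The constants are stand-ins with assumed values for `HSA_PACKET_HEADER_TYPE`, `HSA_PACKET_HEADER_WIDTH_TYPE`, and `HSA_PACKET_TYPE_KERNEL_DISPATCH`; check them against `hsa.h` before relying on them.

```cpp
#include <cstdint>

// Hypothetical equivalent of bit_extract(value, lsb, msb): inclusive bit range.
inline uint32_t BitExtract(uint32_t value, unsigned lsb, unsigned msb) {
  const unsigned width = msb - lsb + 1;
  const uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((uint32_t{1} << width) - 1u);
  return (value >> lsb) & mask;
}

// Assumed values, standing in for the HSA header constants.
constexpr unsigned kHeaderTypeShift = 0;     // HSA_PACKET_HEADER_TYPE (assumed)
constexpr unsigned kHeaderTypeWidth = 8;     // HSA_PACKET_HEADER_WIDTH_TYPE (assumed)
constexpr uint32_t kKernelDispatchType = 2;  // HSA_PACKET_TYPE_KERNEL_DISPATCH (assumed)

// True when the header's type field marks a kernel dispatch packet.
inline bool IsKernelDispatch(uint16_t header) {
  return BitExtract(header, kHeaderTypeShift,
                    kHeaderTypeShift + kHeaderTypeWidth - 1) == kKernelDispatchType;
}
```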
@@ -37,33 +37,37 @@ SOFTWARE.
|
||||
#include "util/exception.h"
|
||||
#include "util/hsa_rsrc_factory.h"
|
||||
|
||||
#define HSA_RT(call) \
|
||||
do { \
|
||||
const hsa_status_t status = call; \
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, #call); \
|
||||
} while(0)
|
||||
|
||||
#define IS_HSA_CALLBACK(ID) \
|
||||
const auto __id = ID; (void)__id; \
|
||||
void *__arg = arg_.load(); (void)__arg; \
|
||||
rocprofiler_hsa_callback_fun_t __callback = \
|
||||
(ID == ROCPROFILER_HSA_CB_ID_ALLOCATE) ? callbacks_.allocate: \
|
||||
(ID == ROCPROFILER_HSA_CB_ID_DEVICE) ? callbacks_.device: \
|
||||
(ID == ROCPROFILER_HSA_CB_ID_MEMCOPY) ? callbacks_.memcopy: \
|
||||
(ID == ROCPROFILER_HSA_CB_ID_SUBMIT) ? callbacks_.submit: \
|
||||
(ID == ROCPROFILER_HSA_CB_ID_KSYMBOL) ? callbacks_.ksymbol: \
|
||||
callbacks_.codeobj; \
|
||||
if ((__callback != NULL) && (recursion_ == false))
|
||||
|
||||
#define DO_HSA_CALLBACK \
|
||||
do { \
|
||||
recursion_ = true; \
|
||||
__callback(__id, &data, __arg); \
|
||||
recursion_ = false; \
|
||||
#define HSA_RT(call) \
|
||||
do { \
|
||||
const hsa_status_t status = call; \
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, #call); \
|
||||
} while (0)
|
||||
|
||||
#define ISSUE_HSA_CALLBACK(ID) \
|
||||
do { IS_HSA_CALLBACK(ID) { DO_HSA_CALLBACK; } } while(0)
|
||||
#define IS_HSA_CALLBACK(ID) \
|
||||
const auto __id = ID; \
|
||||
(void)__id; \
|
||||
void* __arg = arg_.load(); \
|
||||
(void)__arg; \
|
||||
rocprofiler_hsa_callback_fun_t __callback = (ID == ROCPROFILER_HSA_CB_ID_ALLOCATE) \
|
||||
? callbacks_.allocate \
|
||||
: (ID == ROCPROFILER_HSA_CB_ID_DEVICE) ? callbacks_.device \
|
||||
: (ID == ROCPROFILER_HSA_CB_ID_MEMCOPY) ? callbacks_.memcopy \
|
||||
: (ID == ROCPROFILER_HSA_CB_ID_SUBMIT) ? callbacks_.submit \
|
||||
: (ID == ROCPROFILER_HSA_CB_ID_KSYMBOL) ? callbacks_.ksymbol \
|
||||
: callbacks_.codeobj; \
|
||||
if ((__callback != NULL) && (recursion_ == false))
|
||||
|
||||
#define DO_HSA_CALLBACK \
|
||||
do { \
|
||||
recursion_ = true; \
|
||||
__callback(__id, &data, __arg); \
|
||||
recursion_ = false; \
|
||||
} while (0)
|
||||
|
||||
#define ISSUE_HSA_CALLBACK(ID) \
|
||||
do { \
|
||||
IS_HSA_CALLBACK(ID) { DO_HSA_CALLBACK; } \
|
||||
} while (0)
|
||||
|
||||
// Demangle C++ symbol name
|
||||
static const char* cpp_demangle(const char* symname) {
|
||||
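`HSA_RT`, `DO_HSA_CALLBACK`, and `ISSUE_HSA_CALLBACK` in the hunk above all wrap their bodies in `do { ... } while (0)`, which makes a multi-statement macro behave as a single statement. The sketch below illustrates that idiom with a stand-in status type; `sketch_status_t`, `SKETCH_CHECK`, and the helper functions are hypothetical. It counts failures rather than aborting as `EXC_ABORT` does, so the behavior is easy to observe.

```cpp
#include <cstdio>

// Stand-in status type (hypothetical; not hsa_status_t).
enum sketch_status_t { SKETCH_SUCCESS = 0, SKETCH_ERROR = 1 };

static int g_failures = 0;

// The do/while(0) wrapper lets the macro be followed by ';' and used as the
// sole statement of an if/else branch without braces.
#define SKETCH_CHECK(call)                                               \
  do {                                                                   \
    const sketch_status_t status_ = (call);                              \
    if (status_ != SKETCH_SUCCESS) {                                     \
      std::fprintf(stderr, "%s failed (%d)\n", #call, int(status_));     \
      ++g_failures;                                                      \
    }                                                                    \
  } while (0)

inline sketch_status_t Succeeds() { return SKETCH_SUCCESS; }
inline sketch_status_t Fails() { return SKETCH_ERROR; }

// Uses the macro in an unbraced if/else; without the wrapper the trailing
// ';' after the first branch would detach the else and fail to compile.
inline int CountFailures(bool run_failing) {
  g_failures = 0;
  if (run_failing)
    SKETCH_CHECK(Fails());
  else
    SKETCH_CHECK(Succeeds());
  return g_failures;
}
```

`IS_HSA_CALLBACK` is deliberately not wrapped this way: it declares locals and ends in an `if`, so the caller supplies the braced body that follows it.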
@@ -74,15 +78,15 @@ static const char* cpp_demangle(const char* symname) {
 }
 
 namespace rocprofiler {
-extern decltype(hsa_memory_allocate)* hsa_memory_allocate_fn;
-extern decltype(hsa_memory_assign_agent)* hsa_memory_assign_agent_fn;
-extern decltype(hsa_memory_copy)* hsa_memory_copy_fn;
-extern decltype(hsa_amd_memory_pool_allocate)* hsa_amd_memory_pool_allocate_fn;
-extern decltype(hsa_amd_memory_pool_free)* hsa_amd_memory_pool_free_fn;
-extern decltype(hsa_amd_agents_allow_access)* hsa_amd_agents_allow_access_fn;
-extern decltype(hsa_amd_memory_async_copy)* hsa_amd_memory_async_copy_fn;
-extern decltype(hsa_executable_freeze)* hsa_executable_freeze_fn;
-extern decltype(hsa_executable_destroy)* hsa_executable_destroy_fn;
+extern decltype(::hsa_memory_allocate)* hsa_memory_allocate_fn;
+extern decltype(::hsa_memory_assign_agent)* hsa_memory_assign_agent_fn;
+extern decltype(::hsa_memory_copy)* hsa_memory_copy_fn;
+extern decltype(::hsa_amd_memory_pool_allocate)* hsa_amd_memory_pool_allocate_fn;
+extern decltype(::hsa_amd_memory_pool_free)* hsa_amd_memory_pool_free_fn;
+extern decltype(::hsa_amd_agents_allow_access)* hsa_amd_agents_allow_access_fn;
+extern decltype(::hsa_amd_memory_async_copy)* hsa_amd_memory_async_copy_fn;
+extern decltype(::hsa_executable_freeze)* hsa_executable_freeze_fn;
+extern decltype(::hsa_executable_destroy)* hsa_executable_destroy_fn;
 
 class HsaInterceptor {
 public:
|
||||
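The `::` qualifier added in the hunk above forces the name inside `decltype` to be looked up at global scope, so it resolves to the HSA function even where a same-named variable is visible. The declarations here carry a `_fn` suffix and would not collide anyway; the commit message says the qualifier was added for consistency. The sketch below shows the case the qualifier protects against, using a stand-in function; `fake_hsa_memory_allocate` and the `sketch` namespace are hypothetical and unrelated to the real HSA declarations.

```cpp
#include <type_traits>

// Stand-in for a global-scope C API function (hypothetical signature).
int fake_hsa_memory_allocate(int size) { return size; }

namespace sketch {

// The variable has the same name as the global function. Qualifying with
// '::' guarantees the decltype operand is the function, so the variable's
// type is "pointer to that function".
decltype(::fake_hsa_memory_allocate)* fake_hsa_memory_allocate =
    &::fake_hsa_memory_allocate;

// From here on, the unqualified name refers to the variable, not the function.
static_assert(std::is_same<decltype(fake_hsa_memory_allocate), int (*)(int)>::value,
              "variable should be a pointer to the global function");

// Calls the global function through the namespace-scope pointer.
inline int CallThroughPointer(int size) { return fake_hsa_memory_allocate(size); }

}  // namespace sketch
```

Once the variable is visible, any later unqualified `decltype(fake_hsa_memory_allocate)` inside `sketch` names the variable, so a redeclaration written without `::` would yield a pointer-to-pointer type and conflict with the first declaration.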
@@ -95,10 +99,7 @@ class HsaInterceptor {
|
||||
if (enable_) {
|
||||
// Fetching AMD Loader HSA extension API
|
||||
HSA_RT(hsa_system_get_major_extension_table(
|
||||
HSA_EXTENSION_AMD_LOADER,
|
||||
1,
|
||||
sizeof(hsa_ven_amd_loader_1_01_pfn_t),
|
||||
&LoaderApiTable));
|
||||
HSA_EXTENSION_AMD_LOADER, 1, sizeof(hsa_ven_amd_loader_1_01_pfn_t), &LoaderApiTable));
|
||||
|
||||
// Saving original API functions
|
||||
hsa_memory_allocate_fn = table->core_->hsa_memory_allocate_fn;
|
||||
@@ -131,10 +132,7 @@ class HsaInterceptor {
|
||||
}
|
||||
|
||||
private:
|
||||
static hsa_status_t HSA_API MemoryAllocate(hsa_region_t region,
|
||||
size_t size,
|
||||
void** ptr)
|
||||
{
|
||||
static hsa_status_t HSA_API MemoryAllocate(hsa_region_t region, size_t size, void** ptr) {
|
||||
hsa_status_t status = HSA_STATUS_SUCCESS;
|
||||
HSA_RT(hsa_memory_allocate_fn(region, size, ptr));
|
||||
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_ALLOCATE) {
|
||||
@@ -150,11 +148,8 @@ class HsaInterceptor {
|
||||
return status;
|
||||
}
|
||||
|
||||
static hsa_status_t MemoryAssignAgent(
|
||||
void *ptr,
|
||||
hsa_agent_t agent,
|
||||
hsa_access_permission_t access)
|
||||
{
|
||||
static hsa_status_t MemoryAssignAgent(void* ptr, hsa_agent_t agent,
|
||||
hsa_access_permission_t access) {
|
||||
hsa_status_t status = HSA_STATUS_SUCCESS;
|
||||
HSA_RT(hsa_memory_assign_agent_fn(ptr, agent, access));
|
||||
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_DEVICE) {
|
||||
@@ -169,11 +164,7 @@ class HsaInterceptor {
|
||||
}
|
||||
|
||||
// Spawn device allow access callback
|
||||
static void DeviceCallback(
|
||||
uint32_t num_agents,
|
||||
const hsa_agent_t* agents,
|
||||
const void* ptr)
|
||||
{
|
||||
static void DeviceCallback(uint32_t num_agents, const hsa_agent_t* agents, const void* ptr) {
|
||||
for (const hsa_agent_t* agent_p = agents; agent_p < (agents + num_agents); ++agent_p) {
|
||||
hsa_agent_t agent = *agent_p;
|
||||
rocprofiler_hsa_callback_data_t data{};
|
||||
@@ -188,17 +179,11 @@ class HsaInterceptor {
|
||||
}
|
||||
|
||||
// Agent allow access callback 'hsa_amd_agents_allow_access'
|
||||
static hsa_status_t AgentsAllowAccess(
|
||||
uint32_t num_agents,
|
||||
const hsa_agent_t* agents,
|
||||
const uint32_t* flags,
|
||||
const void* ptr)
|
||||
{
|
||||
static hsa_status_t AgentsAllowAccess(uint32_t num_agents, const hsa_agent_t* agents,
|
||||
const uint32_t* flags, const void* ptr) {
|
||||
hsa_status_t status = HSA_STATUS_SUCCESS;
|
||||
HSA_RT(hsa_amd_agents_allow_access_fn(num_agents, agents, flags, ptr));
|
||||
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_DEVICE) {
|
||||
DeviceCallback(num_agents, agents, ptr);
|
||||
}
|
||||
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_DEVICE) { DeviceCallback(num_agents, agents, ptr); }
|
||||
return status;
|
||||
}
|
||||
|
||||
@@ -218,12 +203,8 @@ class HsaInterceptor {
|
||||
return HSA_STATUS_SUCCESS;
|
||||
}
|
||||
|
||||
static hsa_status_t MemoryPoolAllocate(
|
||||
hsa_amd_memory_pool_t pool,
|
||||
size_t size,
|
||||
uint32_t flags,
|
||||
void** ptr)
|
||||
{
|
||||
static hsa_status_t MemoryPoolAllocate(hsa_amd_memory_pool_t pool, size_t size, uint32_t flags,
|
||||
void** ptr) {
|
||||
hsa_status_t status = HSA_STATUS_SUCCESS;
|
||||
HSA_RT(hsa_amd_memory_pool_allocate_fn(pool, size, flags, ptr));
|
||||
if (size != 0) {
|
||||
@@ -232,8 +213,10 @@ class HsaInterceptor {
|
||||
data.allocate.ptr = *ptr;
|
||||
data.allocate.size = size;
|
||||
|
||||
HSA_RT(hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_SEGMENT, &data.allocate.segment));
|
||||
HSA_RT(hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_GLOBAL_FLAGS, &data.allocate.global_flag));
|
||||
HSA_RT(hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_SEGMENT,
|
||||
&data.allocate.segment));
|
||||
HSA_RT(hsa_amd_memory_pool_get_info(pool, HSA_AMD_MEMORY_POOL_INFO_GLOBAL_FLAGS,
|
||||
&data.allocate.global_flag));
|
||||
|
||||
DO_HSA_CALLBACK;
|
||||
|
||||
@@ -246,9 +229,7 @@ class HsaInterceptor {
|
||||
}
|
||||
return status;
|
||||
}
|
||||
static hsa_status_t MemoryPoolFree(
|
||||
void* ptr)
|
||||
{
|
||||
static hsa_status_t MemoryPoolFree(void* ptr) {
|
||||
hsa_status_t status = HSA_STATUS_SUCCESS;
|
||||
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_ALLOCATE) {
|
||||
rocprofiler_hsa_callback_data_t data{};
|
||||
@@ -260,11 +241,7 @@ class HsaInterceptor {
|
||||
return status;
|
||||
}
|
||||
|
||||
static hsa_status_t MemoryCopy(
|
||||
void *dst,
|
||||
const void *src,
|
||||
size_t size)
|
||||
{
|
||||
static hsa_status_t MemoryCopy(void* dst, const void* src, size_t size) {
|
||||
hsa_status_t status = HSA_STATUS_SUCCESS;
|
||||
HSA_RT(hsa_memory_copy_fn(dst, src, size));
|
||||
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_MEMCOPY) {
|
||||
@@ -277,17 +254,13 @@ class HsaInterceptor {
|
||||
return status;
|
||||
}
|
||||
|
||||
static hsa_status_t MemoryAsyncCopy(
|
||||
void* dst, hsa_agent_t dst_agent, const void* src,
|
||||
hsa_agent_t src_agent, size_t size,
|
||||
uint32_t num_dep_signals,
|
||||
const hsa_signal_t* dep_signals,
|
||||
hsa_signal_t completion_signal)
|
||||
{
|
||||
static hsa_status_t MemoryAsyncCopy(void* dst, hsa_agent_t dst_agent, const void* src,
|
||||
hsa_agent_t src_agent, size_t size, uint32_t num_dep_signals,
|
||||
const hsa_signal_t* dep_signals,
|
||||
hsa_signal_t completion_signal) {
|
||||
hsa_status_t status = HSA_STATUS_SUCCESS;
|
||||
HSA_RT(hsa_amd_memory_async_copy_fn(
|
||||
dst, dst_agent, src, src_agent, size,
|
||||
num_dep_signals, dep_signals, completion_signal));
|
||||
HSA_RT(hsa_amd_memory_async_copy_fn(dst, dst_agent, src, src_agent, size, num_dep_signals,
|
||||
dep_signals, completion_signal));
|
||||
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_MEMCOPY) {
|
||||
rocprofiler_hsa_callback_data_t data{};
|
||||
data.memcopy.dst = dst;
|
||||
@@ -298,14 +271,11 @@ class HsaInterceptor {
|
||||
return status;
|
||||
}
|
||||
|
||||
static hsa_status_t CodeObjectCallback(
|
||||
hsa_executable_t executable,
|
||||
hsa_loaded_code_object_t loaded_code_object,
|
||||
void* arg)
|
||||
{
|
||||
static hsa_status_t CodeObjectCallback(hsa_executable_t executable,
|
||||
hsa_loaded_code_object_t loaded_code_object, void* arg) {
|
||||
const int free_flag = reinterpret_cast<long>(arg);
|
||||
hsa_ven_amd_loader_code_object_storage_type_t storage_type =
|
||||
HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_NONE;
|
||||
HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_NONE;
|
||||
int storage_fd = -1;
|
||||
uint64_t memory_base = 0;
|
||||
uint64_t memory_size = 0;
|
||||
@@ -316,56 +286,45 @@ class HsaInterceptor {
|
||||
char* uri_str = NULL;
|
||||
|
||||
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
|
||||
loaded_code_object,
|
||||
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_TYPE,
|
||||
&storage_type));
|
||||
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_TYPE,
|
||||
&storage_type));
|
||||
|
||||
if (storage_type == HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_FILE) {
|
||||
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
|
||||
loaded_code_object,
|
||||
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_FILE,
|
||||
&storage_fd));
|
||||
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_FILE,
|
||||
&storage_fd));
|
||||
if (storage_fd == -1) {
|
||||
printf("CodeObjectCallback: fd == -1\n"); fflush(stdout);
|
||||
abort();
|
||||
printf("CodeObjectCallback: fd == -1\n");
|
||||
fflush(stdout);
|
||||
abort();
|
||||
}
|
||||
} else if (storage_type == HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_MEMORY) {
|
||||
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
|
||||
loaded_code_object,
|
||||
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_BASE,
|
||||
&memory_base));
|
||||
loaded_code_object,
|
||||
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_BASE,
|
||||
&memory_base));
|
||||
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
|
||||
loaded_code_object,
|
||||
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_SIZE,
|
||||
&memory_size));
|
||||
loaded_code_object,
|
||||
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_SIZE,
|
||||
&memory_size));
|
||||
}
|
||||
|
||||
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
|
||||
loaded_code_object,
|
||||
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_BASE,
|
||||
&load_base));
|
||||
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_BASE, &load_base));
|
||||
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
|
||||
loaded_code_object,
|
||||
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_SIZE,
|
||||
&load_size));
|
||||
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_SIZE, &load_size));
|
||||
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
|
||||
loaded_code_object,
|
||||
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_DELTA,
|
||||
&load_delta));
|
||||
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_DELTA, &load_delta));
|
||||
|
||||
// Getting URI
|
||||
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
|
||||
loaded_code_object,
|
||||
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI_LENGTH,
|
||||
&uri_len));
|
||||
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI_LENGTH, &uri_len));
|
||||
|
||||
uri_str = (char*)calloc(uri_len + 1, sizeof(char));
|
||||
if (!uri_str) EXC_ABORT(HSA_STATUS_ERROR, "URI allocation");
|
||||
|
||||
HSA_RT(LoaderApiTable.hsa_ven_amd_loader_loaded_code_object_get_info(
|
||||
loaded_code_object,
|
||||
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI,
|
||||
uri_str));
|
||||
loaded_code_object, HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI, uri_str));
|
||||
|
||||
if (storage_type != HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_NONE) {
|
||||
IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_CODEOBJ) {
|
||||
@@ -377,8 +336,8 @@ class HsaInterceptor {
|
||||
data.codeobj.load_base = load_base;
|
||||
data.codeobj.load_size = load_size;
|
||||
data.codeobj.load_delta = load_delta;
|
||||
data.codeobj.uri_length = uri_len;
|
||||
data.codeobj.uri = uri_str;
|
||||
data.codeobj.uri_length = uri_len;
|
||||
data.codeobj.uri = uri_str;
|
||||
data.codeobj.unload = free_flag;
|
||||
|
||||
DO_HSA_CALLBACK;
|
||||
@@ -406,12 +365,8 @@ class HsaInterceptor {
|
||||
uint32_t num_agents = 0;
|
||||
hsa_agent_t* agents = NULL;
|
||||
pointer_info.size = sizeof(hsa_amd_pointer_info_t);
|
||||
HSA_RT(hsa_amd_pointer_info(
|
||||
reinterpret_cast<void*>(load_base),
|
||||
&pointer_info,
|
||||
malloc,
|
||||
&num_agents,
|
||||
&agents));
|
||||
HSA_RT(hsa_amd_pointer_info(reinterpret_cast<void*>(load_base), &pointer_info, malloc,
|
||||
&num_agents, &agents));
|
||||
|
||||
DeviceCallback(num_agents, agents, reinterpret_cast<void*>(load_base));
|
||||
}
|
||||
@@ -420,11 +375,8 @@ class HsaInterceptor {
|
||||
return HSA_STATUS_SUCCESS;
|
||||
}
|
||||
|
||||
static hsa_status_t KernelSymbolCallback(
|
||||
hsa_executable_t executable,
|
||||
hsa_executable_symbol_t symbol,
|
||||
void *arg)
|
||||
{
|
||||
static hsa_status_t KernelSymbolCallback(hsa_executable_t executable,
|
||||
hsa_executable_symbol_t symbol, void* arg) {
|
||||
const int free_flag = reinterpret_cast<long>(arg);
|
||||
hsa_symbol_kind_t kind = (hsa_symbol_kind_t)0;
|
||||
HSA_RT(hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_TYPE, &kind));
|
||||
@@ -433,9 +385,11 @@ class HsaInterceptor {
|
||||
const char* name = NULL;
|
||||
uint32_t len = 0;
|
||||
uint64_t obj = 0;
|
||||
HSA_RT(hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &obj));
|
||||
HSA_RT(
|
||||
hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT, &obj));
|
||||
if (free_flag == 0) {
|
||||
HSA_RT(hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH, &len));
|
||||
HSA_RT(
|
||||
hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH, &len));
|
||||
char sym_name[len + 1];
|
||||
HSA_RT(hsa_executable_symbol_get_info(symbol, HSA_EXECUTABLE_SYMBOL_INFO_NAME, sym_name));
|
||||
name = cpp_demangle(sym_name);
|
||||
@@ -453,10 +407,7 @@ class HsaInterceptor {
|
||||
return HSA_STATUS_SUCCESS;
|
||||
}
|
||||
|
||||
static hsa_status_t ExecutableFreeze(
|
||||
hsa_executable_t executable,
|
||||
const char *options)
|
||||
{
|
||||
static hsa_status_t ExecutableFreeze(hsa_executable_t executable, const char* options) {
|
||||
hsa_status_t status = HSA_STATUS_SUCCESS;
|
||||
|
||||
HSA_RT(hsa_executable_freeze_fn(executable, options));
|
||||
@@ -466,39 +417,29 @@ class HsaInterceptor {
 { IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_ALLOCATE) is_codeobj_cb |= 1; }
 if (is_codeobj_cb) {
 LoaderApiTable.hsa_ven_amd_loader_executable_iterate_loaded_code_objects(
-executable,
-CodeObjectCallback,
-reinterpret_cast<void*>(0));
+executable, CodeObjectCallback, reinterpret_cast<void*>(0));
 }

 IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_KSYMBOL) {
-HSA_RT(hsa_executable_iterate_symbols(
-executable,
-KernelSymbolCallback,
-reinterpret_cast<void*>(0)));
+HSA_RT(hsa_executable_iterate_symbols(executable, KernelSymbolCallback,
+reinterpret_cast<void*>(0)));
 }

 return status;
 }

-static hsa_status_t ExecutableDestroy(
-hsa_executable_t executable)
-{
+static hsa_status_t ExecutableDestroy(hsa_executable_t executable) {
 hsa_status_t status = HSA_STATUS_SUCCESS;

 IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_ALLOCATE) {
 LoaderApiTable.hsa_ven_amd_loader_executable_iterate_loaded_code_objects(
-executable,
-CodeObjectCallback,
-reinterpret_cast<void*>(1));
+executable, CodeObjectCallback, reinterpret_cast<void*>(1));
 }

 {
 IS_HSA_CALLBACK(ROCPROFILER_HSA_CB_ID_KSYMBOL) {
-HSA_RT(hsa_executable_iterate_symbols(
-executable,
-KernelSymbolCallback,
-reinterpret_cast<void*>(1)));
+HSA_RT(hsa_executable_iterate_symbols(executable, KernelSymbolCallback,
+reinterpret_cast<void*>(1)));
 }
 }

||||
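Note on the load/unload flag: the interceptor code in this file passes `reinterpret_cast<void*>(0)` or `reinterpret_cast<void*>(1)` as the iteration callbacks' user-data argument, and the callbacks recover it with `reinterpret_cast<long>(arg)` into `free_flag`. The sketch below shows the same pattern in isolation. The iteration function, callback, and event log are hypothetical stand-ins rather than HSA APIs, and `std::intptr_t` is used instead of `long` as an assumption about what a portable version might look like.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical C-style iteration API: calls cb(item, arg) for every item.
using item_callback_t = int (*)(int item, void* arg);

inline int iterate_items(const std::vector<int>& items, item_callback_t cb, void* arg) {
  for (int item : items) {
    const int rc = cb(item, arg);
    if (rc != 0) return rc;  // stop at the first failing callback
  }
  return 0;
}

// Records what each callback invocation observed (for demonstration only).
struct Event {
  int item;
  bool unload;
};

inline std::vector<Event>& event_log() {
  static std::vector<Event> log;
  return log;
}

// The callback decodes the flag from the pointer-sized integer in `arg`.
inline int on_item(int item, void* arg) {
  const bool unload = reinterpret_cast<std::intptr_t>(arg) != 0;
  event_log().push_back({item, unload});
  return 0;
}

// Flag value 0 means "loaded", 1 means "about to be unloaded".
inline void notify_load(const std::vector<int>& items) {
  iterate_items(items, on_item, reinterpret_cast<void*>(std::intptr_t{0}));
}

inline void notify_unload(const std::vector<int>& items) {
  iterate_items(items, on_item, reinterpret_cast<void*>(std::intptr_t{1}));
}
```

Encoding the flag in the pointer value avoids allocating a context object whose lifetime would have to outlast the iteration call.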
@@ -33,9 +33,9 @@ THE SOFTWARE.
 #include "util/hsa_rsrc_factory.h"

 namespace rocprofiler {
-extern decltype(hsa_queue_destroy)* hsa_queue_destroy_fn;
-extern decltype(hsa_amd_queue_intercept_create)* hsa_amd_queue_intercept_create_fn;
-extern decltype(hsa_amd_queue_intercept_register)* hsa_amd_queue_intercept_register_fn;
+extern decltype(::hsa_queue_destroy)* hsa_queue_destroy_fn;
+extern decltype(::hsa_amd_queue_intercept_create)* hsa_amd_queue_intercept_create_fn;
+extern decltype(::hsa_amd_queue_intercept_register)* hsa_amd_queue_intercept_register_fn;

 class HsaProxyQueue : public ProxyQueue {
 public:
||||
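Note on the `decltype(::...)` change: the commit message explains that the leading `::` makes the name inside `decltype` resolve to the global-scope HSA function rather than to a same-named pointer variable. The sketch below illustrates the effect with a hypothetical `api_call` function and `prof` namespace (stand-ins, not names from this repository), where the pointer deliberately shares the function's name.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a C API entry point declared at global scope.
inline int api_call(int x) { return x * 2; }

namespace prof {
// Header-style declaration followed by the definition. Once the first line
// has been seen, the unqualified name `api_call` inside this namespace means
// the pointer variable, so an unqualified decltype(api_call)* on the second
// line would name a pointer-to-pointer type and conflict with the first.
// Qualifying with `::` always selects the global function.
extern decltype(::api_call)* api_call;
decltype(::api_call)* api_call = nullptr;

// Unqualified lookup finds the variable; qualified lookup finds the function.
static_assert(std::is_same<decltype(api_call), int (*)(int)>::value,
              "unqualified name is the function pointer variable");
static_assert(std::is_same<decltype(::api_call), int(int)>::value,
              "qualified name is the global function");

// Bind the pointer to the real entry point, as an API-table setup step might.
inline void bind() { api_call = &::api_call; }
}  // namespace prof
```

In the declarations shown in this diff the pointers carry a `_fn` suffix, so no clash occurs there; per the commit message, the `::` was added in those places for consistency.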
@@ -40,16 +40,13 @@ THE SOFTWARE.
 #include "util/hsa_rsrc_factory.h"

 namespace rocprofiler {
-enum {
-K_CONC_OFF = 0,
-K_CONC_PMC = 1,
-K_CONC_TRACE = 2
-};
+enum { K_CONC_OFF = 0, K_CONC_PMC = 1, K_CONC_TRACE = 2 };

-extern decltype(hsa_queue_create)* hsa_queue_create_fn;
-extern decltype(hsa_queue_destroy)* hsa_queue_destroy_fn;
+extern decltype(::hsa_queue_create)* hsa_queue_create_fn;
+extern decltype(::hsa_queue_destroy)* hsa_queue_destroy_fn;

-static inline void print_packet(const void* in_p, const uint32_t& in_n, const uint32_t& w_n = UINT32_MAX) {
+static inline void print_packet(const void* in_p, const uint32_t& in_n,
+const uint32_t& w_n = UINT32_MAX) {
 const uint32_t size32 = util::HsaRsrcFactory::CMD_SLOT_SIZE_B / 4;
 const uint32_t* beg = (const uint32_t*)in_p;
 const uint32_t* end = beg + (in_n * size32);
@@ -85,31 +82,33 @@ class InterceptQueue {
|
||||
typedef std::recursive_mutex mutex_t;
|
||||
typedef std::map<uint64_t, InterceptQueue*> obj_map_t;
|
||||
typedef hsa_status_t (*queue_callback_t)(hsa_queue_t*, void* data);
|
||||
typedef void (*queue_event_callback_t)(hsa_status_t status, hsa_queue_t *queue, void *arg);
|
||||
typedef void (*queue_event_callback_t)(hsa_status_t status, hsa_queue_t* queue, void* arg);
|
||||
typedef uint32_t queue_id_t;
|
||||
|
||||
static void HsaIntercept(HsaApiTable* table);
|
||||
|
||||
static hsa_status_t InterceptQueueCreate(hsa_agent_t agent, uint32_t size, hsa_queue_type32_t type,
|
||||
void (*callback)(hsa_status_t status, hsa_queue_t* source,
|
||||
void* data),
|
||||
void* data, uint32_t private_segment_size,
|
||||
uint32_t group_segment_size, hsa_queue_t** queue,
|
||||
const bool& tracker_on) {
|
||||
static hsa_status_t InterceptQueueCreate(
|
||||
hsa_agent_t agent, uint32_t size, hsa_queue_type32_t type,
|
||||
void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data), void* data,
|
||||
uint32_t private_segment_size, uint32_t group_segment_size, hsa_queue_t** queue,
|
||||
const bool& tracker_on) {
|
||||
std::lock_guard<mutex_t> lck(mutex_);
|
||||
hsa_status_t status = HSA_STATUS_ERROR;
|
||||
|
||||
if (in_create_call_) EXC_ABORT(status, "recursive InterceptQueueCreate()");
|
||||
in_create_call_ = true;
|
||||
|
||||
ProxyQueue* proxy = ProxyQueue::Create(agent, size, type, queue_event_callback, data, private_segment_size,
|
||||
group_segment_size, queue, &status);
|
||||
ProxyQueue* proxy =
|
||||
ProxyQueue::Create(agent, size, type, queue_event_callback, data, private_segment_size,
|
||||
group_segment_size, queue, &status);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, "ProxyQueue::Create()");
|
||||
|
||||
if (tracker_on || tracker_on_) {
|
||||
if (tracker_ == NULL) tracker_ = &Tracker::Instance();
|
||||
status = rocprofiler::util::HsaRsrcFactory::HsaApi()->hsa_amd_profiling_set_profiler_enabled(*queue, true);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, "hsa_amd_profiling_set_profiler_enabled()");
|
||||
status = rocprofiler::util::HsaRsrcFactory::HsaApi()->hsa_amd_profiling_set_profiler_enabled(
|
||||
*queue, true);
|
||||
if (status != HSA_STATUS_SUCCESS)
|
||||
EXC_ABORT(status, "hsa_amd_profiling_set_profiler_enabled()");
|
||||
}
|
||||
|
||||
InterceptQueue* obj = new InterceptQueue(agent, *queue, proxy);
|
||||
@@ -138,15 +137,17 @@ class InterceptQueue {
|
||||
void* data),
|
||||
void* data, uint32_t private_segment_size,
|
||||
uint32_t group_segment_size, hsa_queue_t** queue) {
|
||||
return InterceptQueueCreate(agent, size, type, callback, data, private_segment_size, group_segment_size, queue, false);
|
||||
return InterceptQueueCreate(agent, size, type, callback, data, private_segment_size,
|
||||
group_segment_size, queue, false);
|
||||
}
|
||||
|
||||
static hsa_status_t QueueCreateTracked(hsa_agent_t agent, uint32_t size, hsa_queue_type32_t type,
|
||||
void (*callback)(hsa_status_t status, hsa_queue_t* source,
|
||||
void* data),
|
||||
void* data, uint32_t private_segment_size,
|
||||
uint32_t group_segment_size, hsa_queue_t** queue) {
|
||||
return InterceptQueueCreate(agent, size, type, callback, data, private_segment_size, group_segment_size, queue, true);
|
||||
void (*callback)(hsa_status_t status, hsa_queue_t* source,
|
||||
void* data),
|
||||
void* data, uint32_t private_segment_size,
|
||||
uint32_t group_segment_size, hsa_queue_t** queue) {
|
||||
return InterceptQueueCreate(agent, size, type, callback, data, private_segment_size,
|
||||
group_segment_size, queue, true);
|
||||
}
|
||||
|
||||
static hsa_status_t QueueDestroy(hsa_queue_t* queue) {
|
||||
@@ -170,8 +171,8 @@ class InterceptQueue {
|
||||
return status;
|
||||
}
|
||||
|
||||
static void OnSubmitCB_opt(const void* in_packets, uint64_t count, uint64_t user_que_idx, void* data,
|
||||
hsa_amd_queue_intercept_packet_writer writer) {
|
||||
static void OnSubmitCB_opt(const void* in_packets, uint64_t count, uint64_t user_que_idx,
|
||||
void* data, hsa_amd_queue_intercept_packet_writer writer) {
|
||||
const packet_t* packets_arr = reinterpret_cast<const packet_t*>(in_packets);
|
||||
InterceptQueue* obj = reinterpret_cast<InterceptQueue*>(data);
|
||||
Queue* proxy = obj->proxy_;
|
||||
@@ -195,10 +196,10 @@ class InterceptQueue {
|
||||
obj->queue_id,
|
||||
completion_signal,
|
||||
dispatch_packet,
|
||||
NULL, // kernel_name
|
||||
0, // kernel_object
|
||||
NULL, // kernel_code
|
||||
0, // (uint32_t)syscall(__NR_gettid),
|
||||
NULL, // kernel_name
|
||||
0, // kernel_object
|
||||
NULL, // kernel_code
|
||||
0, // (uint32_t)syscall(__NR_gettid),
|
||||
NULL}; // record
|
||||
|
||||
// Calling dispatch callback
|
||||
@@ -210,7 +211,8 @@ class InterceptQueue {
|
||||
if (group.feature_count != 0) {
|
||||
if (tracker_ != NULL) {
|
||||
Group* context_group = context->GetGroup(group.index);
|
||||
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal = context_group->GetDispatchSignal();
|
||||
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal =
|
||||
context_group->GetDispatchSignal();
|
||||
Tracker::Enable_opt(context_group, completion_signal);
|
||||
context_group->IncrRefsCount();
|
||||
}
|
||||
@@ -254,8 +256,9 @@ class InterceptQueue {
|
||||
const uint32_t tid = syscall(__NR_gettid);
|
||||
hsa_queue_t* qptr = obj->queue_;
|
||||
const void* slot_ptr = util::HsaRsrcFactory::GetSlotPointer(qptr, user_que_idx);
|
||||
printf("OnSubmitCB: %u:%u queue(%p:%lu) in(%p, %p, %lu) hdr(%u)\n",
|
||||
pid, tid, qptr, user_que_idx, in_packets, slot_ptr, count, header_val); fflush(stdout);
|
||||
printf("OnSubmitCB: %u:%u queue(%p:%lu) in(%p, %p, %lu) hdr(%u)\n", pid, tid, qptr,
|
||||
user_que_idx, in_packets, slot_ptr, count, header_val);
|
||||
fflush(stdout);
|
||||
print_packet(in_packets, count);
|
||||
abort();
|
||||
#endif
|
||||
@@ -277,8 +280,9 @@ class InterceptQueue {
|
||||
if (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH) {
|
||||
uint64_t kernel_object = dispatch_packet->kernel_object;
|
||||
const amd_kernel_code_t* kernel_code = GetKernelCode(kernel_object);
|
||||
kernel_name = (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH) ?
|
||||
QueryKernelName(kernel_object, kernel_code) : NULL;
|
||||
kernel_name = (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH)
|
||||
? QueryKernelName(kernel_object, kernel_code)
|
||||
: NULL;
|
||||
}
|
||||
|
||||
// Preparing submit callback data
|
||||
@@ -311,8 +315,11 @@ class InterceptQueue {
|
||||
|
||||
const bool is_serial = (k_concurrent_ == K_CONC_OFF);
|
||||
if (tracker_ != NULL) {
|
||||
tracker_entry = tracker_->Alloc(obj->agent_info_->dev_id, dispatch_packet->completion_signal, is_serial);
|
||||
if (is_serial) const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal = tracker_entry->signal;
|
||||
tracker_entry = tracker_->Alloc(obj->agent_info_->dev_id,
|
||||
dispatch_packet->completion_signal, is_serial);
|
||||
if (is_serial)
|
||||
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal =
|
||||
tracker_entry->signal;
|
||||
}
|
||||
|
||||
// Preparing dispatch callback data
|
||||
@@ -339,7 +346,9 @@ class InterceptQueue {
|
||||
// Injecting profiling start/stop/read packets
|
||||
if ((status != HSA_STATUS_SUCCESS) || (group.context == NULL)) {
|
||||
if (tracker_entry != NULL) {
|
||||
if (is_serial) const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal = tracker_entry->orig;
|
||||
if (is_serial)
|
||||
const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal =
|
||||
tracker_entry->orig;
|
||||
tracker_->Delete(tracker_entry);
|
||||
}
|
||||
} else {
|
||||
@@ -351,11 +360,11 @@ class InterceptQueue {
|
||||
const pkt_vector_t& read_vector = context->ReadPackets(group.index);
|
||||
pkt_vector_t packets;
|
||||
|
||||
if (is_serial) { // serial
|
||||
if (is_serial) { // serial
|
||||
packets = start_vector;
|
||||
packets.insert(packets.end(), *packet);
|
||||
packets.insert(packets.end(), stop_vector.begin(), stop_vector.end());
|
||||
} else { // concurrent
|
||||
} else { // concurrent
|
||||
// Insert start packets once
|
||||
auto inject_start = [&packets](const pkt_vector_t& starts) mutable {
|
||||
packets = starts;
|
||||
@@ -363,14 +372,15 @@ class InterceptQueue {
 std::call_once(once_flag_, inject_start, start_vector);
 // Reads at both kernel start and end (also with barriers)
 assert(read_vector.size() >= 2 * start_vector.size());
-auto mid = read_vector.begin() + read_vector.size()/2;
+auto mid = read_vector.begin() + read_vector.size() / 2;
 // Read at kernel start
 packets.insert(packets.end(), read_vector.begin(), mid);
 // Kernel dispatch packet
 assert(tracker_entry != NULL);
 // Bind dispatch and barrier signals with tracker entry
 tracker_->SetHandler(tracker_entry, context->GetGroup(group.index));
-const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal = context->GetGroup(group.index)->GetDispatchSignal();
+const_cast<hsa_kernel_dispatch_packet_t*>(dispatch_packet)->completion_signal =
+context->GetGroup(group.index)->GetDispatchSignal();
 packets.insert(packets.end(), *packet);
 // Read at kernel end
 packets.insert(packets.end(), mid, read_vector.end());
||||
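Note on the concurrent path: the hunk above splits `read_vector` at its midpoint and emits the first half before the kernel dispatch packet and the second half after it. A minimal sketch of that ordering follows, using plain integers as hypothetical stand-ins for AQL packets. `wrap_dispatch` is an illustrative name, and the one-time start-packet injection via `std::call_once` is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using packet_t = int;  // hypothetical stand-in for an AQL packet
using pkt_vector_t = std::vector<packet_t>;

// Builds [first half of reads] + [dispatch] + [second half of reads].
inline pkt_vector_t wrap_dispatch(const pkt_vector_t& read_vector, packet_t dispatch) {
  pkt_vector_t packets;
  const auto mid =
      read_vector.begin() + static_cast<std::ptrdiff_t>(read_vector.size() / 2);
  // Read at kernel start
  packets.insert(packets.end(), read_vector.begin(), mid);
  // Kernel dispatch packet
  packets.insert(packets.end(), dispatch);
  // Read at kernel end
  packets.insert(packets.end(), mid, read_vector.end());
  return packets;
}
```

With an odd number of read packets the integer division places the extra packet in the second half.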
@@ -379,7 +389,8 @@ class InterceptQueue {
|
||||
if (tracker_entry != NULL) {
|
||||
Group* context_group = context->GetGroup(group.index);
|
||||
context_group->IncrRefsCount();
|
||||
tracker_->EnableContext(tracker_entry, Context::Handler, reinterpret_cast<void*>(context_group));
|
||||
tracker_->EnableContext(tracker_entry, Context::Handler,
|
||||
reinterpret_cast<void*>(context_group));
|
||||
}
|
||||
|
||||
if (writer != NULL) {
|
||||
@@ -409,8 +420,8 @@ class InterceptQueue {
|
||||
}
|
||||
}
|
||||
|
||||
static void OnSubmitCB_ctrace(const void* in_packets, uint64_t count, uint64_t user_que_idx, void* data,
|
||||
hsa_amd_queue_intercept_packet_writer writer) {
|
||||
static void OnSubmitCB_ctrace(const void* in_packets, uint64_t count, uint64_t user_que_idx,
|
||||
void* data, hsa_amd_queue_intercept_packet_writer writer) {
|
||||
const packet_t* packets_arr = reinterpret_cast<const packet_t*>(in_packets);
|
||||
InterceptQueue* obj = reinterpret_cast<InterceptQueue*>(data);
|
||||
Queue* proxy = obj->proxy_;
|
||||
@@ -431,8 +442,9 @@ class InterceptQueue {
|
||||
if (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH) {
|
||||
uint64_t kernel_object = dispatch_packet->kernel_object;
|
||||
const amd_kernel_code_t* kernel_code = GetKernelCode(kernel_object);
|
||||
kernel_name = (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH) ?
|
||||
QueryKernelName(kernel_object, kernel_code) : NULL;
|
||||
kernel_name = (GetHeaderType(packet) == HSA_PACKET_TYPE_KERNEL_DISPATCH)
|
||||
? QueryKernelName(kernel_object, kernel_code)
|
||||
: NULL;
|
||||
}
|
||||
|
||||
// Preparing submit callback data
|
||||
@@ -529,7 +541,9 @@ class InterceptQueue {
 Stop();
 }

-static inline void Start() { dispatch_callback_.store(callbacks_.dispatch, std::memory_order_release); }
+static inline void Start() {
+dispatch_callback_.store(callbacks_.dispatch, std::memory_order_release);
+}
 static inline void Stop() { dispatch_callback_.store(NULL, std::memory_order_relaxed); }

 static void SetSubmitCallback(rocprofiler_hsa_callback_fun_t fun, void* arg) {
||||
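Note on `Start()`/`Stop()`: both toggle an atomic callback pointer, with a release store to enable dispatch and a relaxed store of `NULL` to disable it. The sketch below shows that switch in isolation. The reader side is not visible in this hunk, so the acquire load in `on_dispatch` is an assumption, and all names here are hypothetical.

```cpp
#include <atomic>
#include <cassert>

using dispatch_fn_t = int (*)(int);

// Single shared slot holding the currently active callback (or nullptr).
inline std::atomic<dispatch_fn_t>& active_callback() {
  static std::atomic<dispatch_fn_t> cb{nullptr};
  return cb;
}

// Release store: writes made before start() become visible to a thread
// that subsequently observes the pointer with an acquire load.
inline void start(dispatch_fn_t fn) {
  active_callback().store(fn, std::memory_order_release);
}

// Disabling publishes no new data, so a relaxed store is enough here.
inline void stop() { active_callback().store(nullptr, std::memory_order_relaxed); }

// Reader side (assumed): returns -1 when no callback is installed.
inline int on_dispatch(int value) {
  const dispatch_fn_t fn = active_callback().load(std::memory_order_acquire);
  return fn ? fn(value) : -1;
}

inline int twice(int v) { return 2 * v; }
```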
@@ -545,7 +559,7 @@ class InterceptQueue {
 static uint32_t k_concurrent_;

 private:
-static void queue_event_callback(hsa_status_t status, hsa_queue_t *queue, void *arg) {
+static void queue_event_callback(hsa_status_t status, hsa_queue_t* queue, void* arg) {
 if (status != HSA_STATUS_SUCCESS) {
 uint32_t* read_ptr32 = (uint32_t*)util::HsaRsrcFactory::GetReadPointer(queue);
 print_packet(read_ptr32, 1);
@@ -582,12 +596,13 @@ class InterceptQueue {
|
||||
const uint16_t kernel_object_flag = *((uint64_t*)kernel_code + 1);
|
||||
if (kernel_object_flag == 0) {
|
||||
if (!util::HsaRsrcFactory::IsExecutableTracking()) {
|
||||
EXC_ABORT(HSA_STATUS_ERROR, "Error: V3 code object detected - code objects tracking should be enabled\n");
|
||||
EXC_ABORT(HSA_STATUS_ERROR,
|
||||
"Error: V3 code object detected - code objects tracking should be enabled\n");
|
||||
}
|
||||
}
|
||||
const char* kernel_symname = (util::HsaRsrcFactory::IsExecutableTracking()) ?
|
||||
util::HsaRsrcFactory::GetKernelNameRef(kernel_object) :
|
||||
GetKernelName(kernel_code->runtime_loader_kernel_symbol);
|
||||
const char* kernel_symname = (util::HsaRsrcFactory::IsExecutableTracking())
|
||||
? util::HsaRsrcFactory::GetKernelNameRef(kernel_object)
|
||||
: GetKernelName(kernel_code->runtime_loader_kernel_symbol);
|
||||
return kernel_symname;
|
||||
}
|
||||
|
||||
@@ -618,17 +633,13 @@ class InterceptQueue {
|
||||
return status;
|
||||
}
|
||||
|
||||
InterceptQueue(const hsa_agent_t& agent, hsa_queue_t* const queue, ProxyQueue* proxy) :
|
||||
queue_(queue),
|
||||
proxy_(proxy)
|
||||
{
|
||||
InterceptQueue(const hsa_agent_t& agent, hsa_queue_t* const queue, ProxyQueue* proxy)
|
||||
: queue_(queue), proxy_(proxy) {
|
||||
agent_info_ = util::HsaRsrcFactory::Instance().GetAgentInfo(agent);
|
||||
queue_event_callback_ = NULL;
|
||||
}
|
||||
|
||||
~InterceptQueue() {
|
||||
ProxyQueue::Destroy(proxy_);
|
||||
}
|
||||
~InterceptQueue() { ProxyQueue::Destroy(proxy_); }
|
||||
|
||||
static const packet_word_t header_type_mask = (1ul << HSA_PACKET_HEADER_WIDTH_TYPE) - 1;
|
||||
|
||||
|
||||
@@ -25,4 +25,4 @@ THE SOFTWARE.
 namespace rocprofiler {
 MetricsDict::map_t* MetricsDict::map_ = NULL;
 MetricsDict::mutex_t MetricsDict::mutex_;
-}
+} // namespace rocprofiler
Executable → Regular
+5
-5
@@ -202,15 +202,15 @@ class MetricsDict {
 xml_->AddConst("top.const.metric", "SE_NUM", agent_info->se_num);
 ImportMetrics(agent_info, "const");
 agent_name_ = agent_info->name;
-
-if (agent_name_.find(':') != std::string::npos) // Remove compiler flags from the agent_name
+
+if (agent_name_.find(':') != std::string::npos)  // Remove compiler flags from the agent_name
 agent_name_ = agent_name_.substr(0, agent_name_.find(':'));

 std::unordered_set<std::string> supported_agent_names = {
-"gfx906", "gfx908", "gfx90a", // Vega
-"gfx940", "gfx941", "gfx942", // Mi300
+"gfx906", "gfx908", "gfx90a", // Vega
+"gfx940", "gfx941", "gfx942", // Mi300
 "gfx1030", "gfx1031", "gfx1032", // Navi2x
-"gfx1100", "gfx1101" // Navi3x
+"gfx1100", "gfx1101" // Navi3x
 };
 if (supported_agent_names.find(agent_name_) != supported_agent_names.end()) {
 ImportMetrics(agent_info, agent_name_);
||||
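Note on the agent-name handling: the hunk above truncates the agent name at the first `:` (dropping the compiler flags) and then looks the result up in a fixed set of supported names. A self-contained sketch of those two steps follows. The helper names `base_agent_name` and `is_supported_agent` are hypothetical, the set copies the names listed in the hunk, and the sample name in the usage note is only an illustrative input.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Returns the part of the agent name before the first ':' (or the whole
// name when no ':' is present).
inline std::string base_agent_name(const std::string& name) {
  const auto pos = name.find(':');
  return (pos == std::string::npos) ? name : name.substr(0, pos);
}

// Checks the stripped name against the set of names listed in the hunk.
inline bool is_supported_agent(const std::string& name) {
  static const std::unordered_set<std::string> supported = {
      "gfx906",  "gfx908",  "gfx90a",   // Vega
      "gfx940",  "gfx941",  "gfx942",   // Mi300
      "gfx1030", "gfx1031", "gfx1032",  // Navi2x
      "gfx1100", "gfx1101"              // Navi3x
  };
  return supported.count(base_agent_name(name)) != 0;
}
```

For example, `base_agent_name("gfx90a:sramecc+:xnack-")` yields `"gfx90a"`, which the lookup then accepts.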
@@ -140,7 +140,7 @@ class Profile {
|
||||
static void SetConcurrent(profile_t* profile) {
|
||||
// Check whether concurrent has been set
|
||||
for (const parameter_t* p = profile->parameters;
|
||||
p < (profile->parameters + profile->parameter_count); ++p) {
|
||||
p < (profile->parameters + profile->parameter_count); ++p) {
|
||||
// If yes, stop here
|
||||
if (p->parameter_name == HSA_VEN_AMD_AQLPROFILE_PARAMETER_NAME_K_CONCURRENT) {
|
||||
return;
|
||||
@@ -148,7 +148,7 @@ class Profile {
|
||||
}
|
||||
|
||||
// Otherwise, try to set
|
||||
parameter_t* parameters = new parameter_t[profile->parameter_count+1];
|
||||
parameter_t* parameters = new parameter_t[profile->parameter_count + 1];
|
||||
for (unsigned i = 0; i < profile->parameter_count; ++i) {
|
||||
parameters[i].parameter_name = profile->parameters[i].parameter_name;
|
||||
parameters[i].value = profile->parameters[i].value;
|
||||
@@ -162,15 +162,16 @@ class Profile {
|
||||
}
|
||||
|
||||
void BarrierPacket(packet_t* packet, const hsa_signal_t& prior_signal) {
|
||||
hsa_barrier_and_packet_t* barrier =
|
||||
reinterpret_cast<hsa_barrier_and_packet_t*>(packet);
|
||||
hsa_barrier_and_packet_t* barrier = reinterpret_cast<hsa_barrier_and_packet_t*>(packet);
|
||||
barrier->header = HSA_PACKET_TYPE_BARRIER_AND;
|
||||
if (prior_signal.handle) barrier->dep_signal[0] = prior_signal; // set packet dependency
|
||||
else barrier->header |= 1 << HSA_PACKET_HEADER_BARRIER; // set barrier bit
|
||||
if (prior_signal.handle)
|
||||
barrier->dep_signal[0] = prior_signal; // set packet dependency
|
||||
else
|
||||
barrier->header |= 1 << HSA_PACKET_HEADER_BARRIER; // set barrier bit
|
||||
}
|
||||
|
||||
hsa_status_t Finalize(pkt_vector_t& start_vector, pkt_vector_t& stop_vector,
|
||||
pkt_vector_t& read_vector, bool is_concurrent = false) {
|
||||
pkt_vector_t& read_vector, bool is_concurrent = false) {
|
||||
if (is_concurrent) SetConcurrent(&profile_);
|
||||
|
||||
hsa_status_t status = HSA_STATUS_SUCCESS;
|
||||
@@ -180,8 +181,8 @@ class Profile {
|
||||
const pfn_t* api = rsrc->AqlProfileApi();
|
||||
packet_t start{};
|
||||
packet_t stop{};
|
||||
packet_t read{}; // read at kernel start
|
||||
packet_t read2{}; // read at kernel end
|
||||
packet_t read{}; // read at kernel start
|
||||
packet_t read2{}; // read at kernel end
|
||||
|
||||
// Check the profile buffer sizes
|
||||
status = api->hsa_ven_amd_aqlprofile_start(&profile_, NULL);
|
||||
@@ -200,12 +201,12 @@ class Profile {
|
||||
#ifdef AQLPROF_NEW_API
|
||||
if (profile_.type == HSA_VEN_AMD_AQLPROFILE_EVENT_TYPE_PMC) {
|
||||
rd_status = api->hsa_ven_amd_aqlprofile_read(&profile_, &read);
|
||||
if (is_concurrent){ // concurrent: one more read
|
||||
if (is_concurrent) { // concurrent: one more read
|
||||
if (rd_status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "aqlprofile_read");
|
||||
rd_status = api->hsa_ven_amd_aqlprofile_read(&profile_, &read2);
|
||||
}
|
||||
}
|
||||
#if 0 // Read API returns error if disabled
|
||||
#if 0 // Read API returns error if disabled
|
||||
if (rd_status != HSA_STATUS_SUCCESS) AQL_EXC_RAISING(status, "aqlprofile_read");
|
||||
#endif
|
||||
#endif
|
||||
@@ -220,7 +221,8 @@ class Profile {
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "signal_create " << std::hex << status);
|
||||
if (is_concurrent) {
|
||||
status = hsa_signal_create(1, 0, NULL, &read_signal_);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "signal_create " << std::hex << status);
|
||||
if (status != HSA_STATUS_SUCCESS)
|
||||
EXC_RAISING(status, "signal_create " << std::hex << status);
|
||||
read.completion_signal = read_signal_;
|
||||
read2.completion_signal = completion_signal_;
|
||||
} else {
|
||||
@@ -239,7 +241,8 @@ class Profile {
|
||||
BarrierPacket(&barrier_rd, read.completion_signal);
|
||||
BarrierPacket(&barrier_rd2, dispatch_signal_);
|
||||
status = hsa_signal_create(1, 0, NULL, &(barrier_signal_));
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "signal_create " << std::hex << status);
|
||||
if (status != HSA_STATUS_SUCCESS)
|
||||
EXC_RAISING(status, "signal_create " << std::hex << status);
|
||||
barrier_rd2.completion_signal = barrier_signal_;
|
||||
}
|
||||
|
||||
@@ -297,8 +300,8 @@ class Profile {
|
||||
|
||||
void GetProfiles(profile_vector_t& vec) {
|
||||
if (!info_vector_.empty()) {
|
||||
vec.push_back(profile_tuple_t{&profile_, &info_vector_, completion_signal_,
|
||||
dispatch_signal_, barrier_signal_, read_signal_});
|
||||
vec.push_back(profile_tuple_t{&profile_, &info_vector_, completion_signal_, dispatch_signal_,
|
||||
barrier_signal_, read_signal_});
|
||||
}
|
||||
}
|
||||
|
||||
@@ -330,11 +333,12 @@ class PmcProfile : public Profile {
|
||||
|
||||
hsa_status_t Allocate(util::HsaRsrcFactory* rsrc) {
|
||||
profile_.command_buffer.ptr =
|
||||
rsrc->AllocateSysMemory(agent_info_, profile_.command_buffer.size);
|
||||
rsrc->AllocateSysMemory(agent_info_, profile_.command_buffer.size);
|
||||
// Allocate profile output buffer from kernarg memory pool since kernarg
|
||||
// memory buffer is uncached. So when GPU copies performance counter values
|
||||
// to this buffer they are guaranteed to be visible to CPU.
|
||||
profile_.output_buffer.ptr = rsrc->AllocateKernArgMemory(agent_info_, profile_.output_buffer.size);
|
||||
profile_.output_buffer.ptr =
|
||||
rsrc->AllocateKernArgMemory(agent_info_, profile_.output_buffer.size);
|
||||
return (profile_.command_buffer.ptr && profile_.output_buffer.ptr) ? HSA_STATUS_SUCCESS
|
||||
: HSA_STATUS_ERROR;
|
||||
}
|
||||
@@ -366,11 +370,11 @@ class TraceProfile : public Profile {
|
||||
|
||||
hsa_status_t Allocate(util::HsaRsrcFactory* rsrc) {
|
||||
profile_.command_buffer.ptr =
|
||||
rsrc->AllocateSysMemory(agent_info_, profile_.command_buffer.size);
|
||||
rsrc->AllocateSysMemory(agent_info_, profile_.command_buffer.size);
|
||||
profile_.output_buffer.size = output_buffer_size_;
|
||||
profile_.output_buffer.ptr = (output_buffer_local_) ?
|
||||
rsrc->AllocateLocalMemory(agent_info_, profile_.output_buffer.size) :
|
||||
rsrc->AllocateSysMemory(agent_info_, profile_.output_buffer.size);
|
||||
profile_.output_buffer.ptr = (output_buffer_local_)
|
||||
? rsrc->AllocateLocalMemory(agent_info_, profile_.output_buffer.size)
|
||||
: rsrc->AllocateSysMemory(agent_info_, profile_.output_buffer.size);
|
||||
return (profile_.command_buffer.ptr && profile_.output_buffer.ptr) ? HSA_STATUS_SUCCESS
|
||||
: HSA_STATUS_ERROR;
|
||||
}
|
||||
|
||||
@@ -38,10 +38,10 @@ ProxyQueue* ProxyQueue::Create(hsa_agent_t agent, uint32_t size, hsa_queue_type3
 hsa_status_t* status) {
 hsa_status_t suc = HSA_STATUS_ERROR;
 ProxyQueue* instance =
-(rocp_type_) ? (ProxyQueue*) new SimpleProxyQueue() : (ProxyQueue*) new HsaProxyQueue();
+(rocp_type_) ? (ProxyQueue*)new SimpleProxyQueue() : (ProxyQueue*)new HsaProxyQueue();
 if (instance != NULL) {
 suc = instance->Init(agent, size, type, callback, data, private_segment_size,
-group_segment_size, queue);
+group_segment_size, queue);
 if (suc != HSA_STATUS_SUCCESS) {
 delete instance;
 instance = NULL;
@@ -75,34 +75,34 @@ hsa_status_t CreateQueuePro(hsa_agent_t agent, uint32_t size, hsa_queue_type32_t
 void* data, uint32_t private_segment_size, uint32_t group_segment_size,
 hsa_queue_t** queue);

-decltype(hsa_queue_create)* hsa_queue_create_fn;
-decltype(hsa_queue_destroy)* hsa_queue_destroy_fn;
+decltype(::hsa_queue_create)* hsa_queue_create_fn;
+decltype(::hsa_queue_destroy)* hsa_queue_destroy_fn;

-decltype(hsa_signal_store_relaxed)* hsa_signal_store_relaxed_fn;
-decltype(hsa_signal_store_relaxed)* hsa_signal_store_screlease_fn;
+decltype(::hsa_signal_store_relaxed)* hsa_signal_store_relaxed_fn;
+decltype(::hsa_signal_store_relaxed)* hsa_signal_store_screlease_fn;

-decltype(hsa_queue_load_write_index_relaxed)* hsa_queue_load_write_index_relaxed_fn;
-decltype(hsa_queue_store_write_index_relaxed)* hsa_queue_store_write_index_relaxed_fn;
-decltype(hsa_queue_load_read_index_relaxed)* hsa_queue_load_read_index_relaxed_fn;
-decltype(hsa_queue_add_write_index_scacq_screl)* hsa_queue_add_write_index_scacq_screl_fn;
+decltype(::hsa_queue_load_write_index_relaxed)* hsa_queue_load_write_index_relaxed_fn;
+decltype(::hsa_queue_store_write_index_relaxed)* hsa_queue_store_write_index_relaxed_fn;
+decltype(::hsa_queue_load_read_index_relaxed)* hsa_queue_load_read_index_relaxed_fn;
+decltype(::hsa_queue_add_write_index_scacq_screl)* hsa_queue_add_write_index_scacq_screl_fn;

-decltype(hsa_queue_load_write_index_scacquire)* hsa_queue_load_write_index_scacquire_fn;
-decltype(hsa_queue_store_write_index_screlease)* hsa_queue_store_write_index_screlease_fn;
-decltype(hsa_queue_load_read_index_scacquire)* hsa_queue_load_read_index_scacquire_fn;
+decltype(::hsa_queue_load_write_index_scacquire)* hsa_queue_load_write_index_scacquire_fn;
+decltype(::hsa_queue_store_write_index_screlease)* hsa_queue_store_write_index_screlease_fn;
+decltype(::hsa_queue_load_read_index_scacquire)* hsa_queue_load_read_index_scacquire_fn;

-decltype(hsa_amd_queue_intercept_create)* hsa_amd_queue_intercept_create_fn;
-decltype(hsa_amd_queue_intercept_register)* hsa_amd_queue_intercept_register_fn;
+decltype(::hsa_amd_queue_intercept_create)* hsa_amd_queue_intercept_create_fn;
+decltype(::hsa_amd_queue_intercept_register)* hsa_amd_queue_intercept_register_fn;

-decltype(hsa_memory_allocate)* hsa_memory_allocate_fn;
-decltype(hsa_memory_assign_agent)* hsa_memory_assign_agent_fn;
-decltype(hsa_memory_copy)* hsa_memory_copy_fn;
-decltype(hsa_amd_memory_pool_allocate)* hsa_amd_memory_pool_allocate_fn;
-decltype(hsa_amd_memory_pool_free)* hsa_amd_memory_pool_free_fn;
-decltype(hsa_amd_agents_allow_access)* hsa_amd_agents_allow_access_fn;
-decltype(hsa_amd_memory_async_copy)* hsa_amd_memory_async_copy_fn;
-decltype(hsa_amd_memory_async_copy_rect)* hsa_amd_memory_async_copy_rect_fn;
-decltype(hsa_executable_freeze)* hsa_executable_freeze_fn;
-decltype(hsa_executable_destroy)* hsa_executable_destroy_fn;
+decltype(::hsa_memory_allocate)* hsa_memory_allocate_fn;
|
||||
decltype(::hsa_memory_assign_agent)* hsa_memory_assign_agent_fn;
|
||||
decltype(::hsa_memory_copy)* hsa_memory_copy_fn;
|
||||
decltype(::hsa_amd_memory_pool_allocate)* hsa_amd_memory_pool_allocate_fn;
|
||||
decltype(::hsa_amd_memory_pool_free)* hsa_amd_memory_pool_free_fn;
|
||||
decltype(::hsa_amd_agents_allow_access)* hsa_amd_agents_allow_access_fn;
|
||||
decltype(::hsa_amd_memory_async_copy)* hsa_amd_memory_async_copy_fn;
|
||||
decltype(::hsa_amd_memory_async_copy_rect)* hsa_amd_memory_async_copy_rect_fn;
|
||||
decltype(::hsa_executable_freeze)* hsa_executable_freeze_fn;
|
||||
decltype(::hsa_executable_destroy)* hsa_executable_destroy_fn;
|
||||
|
||||
::HsaApiTable* kHsaApiTable;
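
The `decltype(::<hsa-function>)` change in this hunk can be illustrated with a small standalone sketch. It assumes a hypothetical global function `demo_allocate` standing in for an HSA entry point; `tool`, `ApiTable` and `SaveApi` are illustrative names only and do not come from the repository. The point it shows: when a pointer variable or struct member shares the function's name, the leading `::` keeps `decltype` looking at the global function rather than at the variable being declared.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a global-scope C API function (e.g. an HSA entry point).
int demo_allocate(int size) { return size * 2; }

namespace tool {

// Table of saved function pointers. The member reuses the function's name, so the
// leading '::' is what keeps decltype referring to the global function.
struct ApiTable {
  decltype(::demo_allocate)* demo_allocate;
};

// Namespace-scope pointer with a "_fn" suffix, in the style of the declarations above.
decltype(::demo_allocate)* demo_allocate_fn = nullptr;

// Saves the original entry point in both places and returns the filled table.
inline ApiTable SaveApi() {
  demo_allocate_fn = &::demo_allocate;
  assert(demo_allocate_fn != nullptr);
  return ApiTable{&::demo_allocate};
}

// The variable's type is a plain pointer to the function type, not a pointer to a pointer.
static_assert(std::is_same<decltype(demo_allocate_fn), int (*)(int)>::value,
              "pointer to the global function type");

}  // namespace tool
```

With the `_fn` suffix the unqualified form also compiles, which matches the commit note that `::` was added there for consistency rather than necessity.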
|
||||
|
||||
@@ -393,80 +393,80 @@ ROCPROFILER_EXPORT extern const uint32_t HSA_AMD_TOOL_PRIORITY = 25;
|
||||
PUBLIC_API bool OnLoad(HsaApiTable* table, uint64_t runtime_version, uint64_t failed_tool_count,
|
||||
const char* const* failed_tool_names) {
|
||||
ONLOAD_TRACE_BEG();
|
||||
rocprofiler::SaveHsaApi(table);
|
||||
rocprofiler::ProxyQueue::InitFactory();
|
||||
rocprofiler::SaveHsaApi(table);
|
||||
rocprofiler::ProxyQueue::InitFactory();
|
||||
|
||||
// Checking environment to enable intercept mode
|
||||
const char* intercept_env = getenv("ROCP_HSA_INTERCEPT");
|
||||
// Checking environment to enable intercept mode
|
||||
const char* intercept_env = getenv("ROCP_HSA_INTERCEPT");
|
||||
|
||||
int intercept_env_value = 0;
|
||||
if (intercept_env != NULL) {
|
||||
intercept_env_value = atoi(intercept_env);
|
||||
int intercept_env_value = 0;
|
||||
if (intercept_env != NULL) {
|
||||
intercept_env_value = atoi(intercept_env);
|
||||
|
||||
switch (intercept_env_value) {
|
||||
case 0:
|
||||
case 1:
|
||||
// 0: Intercepting disabled
|
||||
// 1: Intercepting enabled without timestamping
|
||||
rocprofiler::InterceptQueue::TrackerOn(false);
|
||||
break;
|
||||
case 2:
|
||||
// Intercepting enabled with timestamping
|
||||
rocprofiler::InterceptQueue::TrackerOn(true);
|
||||
break;
|
||||
default:
|
||||
ERR_LOGGING("Bad ROCP_HSA_INTERCEPT env var value ("
|
||||
<< intercept_env << "): "
|
||||
<< "valid values are 0 (standalone), 1 (intercepting without timestamp), 2 "
|
||||
"(intercepting with timestamp)");
|
||||
return false;
|
||||
}
|
||||
switch (intercept_env_value) {
|
||||
case 0:
|
||||
case 1:
|
||||
// 0: Intercepting disabled
|
||||
// 1: Intercepting enabled without timestamping
|
||||
rocprofiler::InterceptQueue::TrackerOn(false);
|
||||
break;
|
||||
case 2:
|
||||
// Intercepting enabled with timestamping
|
||||
rocprofiler::InterceptQueue::TrackerOn(true);
|
||||
break;
|
||||
default:
|
||||
ERR_LOGGING("Bad ROCP_HSA_INTERCEPT env var value ("
|
||||
<< intercept_env << "): "
|
||||
<< "valid values are 0 (standalone), 1 (intercepting without timestamp), 2 "
|
||||
"(intercepting with timestamp)");
|
||||
return false;
|
||||
}
|
||||
}
|
||||
|
||||
// always enable executable tracking
|
||||
rocprofiler::util::HsaRsrcFactory::EnableExecutableTracking(table);
|
||||
// always enable executable tracking
|
||||
rocprofiler::util::HsaRsrcFactory::EnableExecutableTracking(table);
|
||||
|
||||
// Loading a tool lib and setting of intercept mode
|
||||
const uint32_t intercept_mode_mask = rocprofiler::LoadTool();
|
||||
// Loading a tool lib and setting of intercept mode
|
||||
const uint32_t intercept_mode_mask = rocprofiler::LoadTool();
|
||||
|
||||
if (intercept_mode_mask & rocprofiler::MEMCOPY_INTERCEPT_MODE) {
|
||||
hsa_status_t status = hsa_amd_profiling_async_copy_enable(true);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, "hsa_amd_profiling_async_copy_enable");
|
||||
rocprofiler::hsa_amd_memory_async_copy_fn = table->amd_ext_->hsa_amd_memory_async_copy_fn;
|
||||
rocprofiler::hsa_amd_memory_async_copy_rect_fn =
|
||||
table->amd_ext_->hsa_amd_memory_async_copy_rect_fn;
|
||||
table->amd_ext_->hsa_amd_memory_async_copy_fn =
|
||||
rocprofiler::hsa_amd_memory_async_copy_interceptor;
|
||||
table->amd_ext_->hsa_amd_memory_async_copy_rect_fn =
|
||||
rocprofiler::hsa_amd_memory_async_copy_rect_interceptor;
|
||||
}
|
||||
if (intercept_mode_mask & rocprofiler::HSA_INTERCEPT_MODE) {
|
||||
if (intercept_mode_mask & rocprofiler::MEMCOPY_INTERCEPT_MODE) {
|
||||
hsa_status_t status = hsa_amd_profiling_async_copy_enable(true);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_ABORT(status, "hsa_amd_profiling_async_copy_enable");
|
||||
rocprofiler::hsa_amd_memory_async_copy_fn = table->amd_ext_->hsa_amd_memory_async_copy_fn;
|
||||
rocprofiler::hsa_amd_memory_async_copy_rect_fn =
|
||||
table->amd_ext_->hsa_amd_memory_async_copy_rect_fn;
|
||||
table->amd_ext_->hsa_amd_memory_async_copy_fn =
|
||||
rocprofiler::hsa_amd_memory_async_copy_interceptor;
|
||||
table->amd_ext_->hsa_amd_memory_async_copy_rect_fn =
|
||||
rocprofiler::hsa_amd_memory_async_copy_rect_interceptor;
|
||||
}
|
||||
if (intercept_mode_mask & rocprofiler::HSA_INTERCEPT_MODE) {
|
||||
if (intercept_mode_mask & rocprofiler::MEMCOPY_INTERCEPT_MODE) {
|
||||
EXC_ABORT(HSA_STATUS_ERROR, "HSA_INTERCEPT and MEMCOPY_INTERCEPT conflict");
|
||||
}
|
||||
rocprofiler::HsaInterceptor::Enable(true);
|
||||
rocprofiler::HsaInterceptor::HsaIntercept(table);
|
||||
EXC_ABORT(HSA_STATUS_ERROR, "HSA_INTERCEPT and MEMCOPY_INTERCEPT conflict");
|
||||
}
|
||||
rocprofiler::HsaInterceptor::Enable(true);
|
||||
rocprofiler::HsaInterceptor::HsaIntercept(table);
|
||||
}
|
||||
|
||||
// HSA intercepting
|
||||
if (intercept_env_value != 0) {
|
||||
rocprofiler::ProxyQueue::HsaIntercept(table);
|
||||
rocprofiler::InterceptQueue::HsaIntercept(table);
|
||||
} else {
|
||||
rocprofiler::StandaloneIntercept();
|
||||
}
|
||||
// HSA intercepting
|
||||
if (intercept_env_value != 0) {
|
||||
rocprofiler::ProxyQueue::HsaIntercept(table);
|
||||
rocprofiler::InterceptQueue::HsaIntercept(table);
|
||||
} else {
|
||||
rocprofiler::StandaloneIntercept();
|
||||
}
|
||||
|
||||
ONLOAD_TRACE("end intercept_mode(" << std::hex << intercept_env_value << ")"
|
||||
<< " intercept_mode_mask(" << std::hex << intercept_mode_mask
|
||||
<< ")" << std::dec);
|
||||
ONLOAD_TRACE("end intercept_mode(" << std::hex << intercept_env_value << ")"
|
||||
<< " intercept_mode_mask(" << std::hex << intercept_mode_mask
|
||||
<< ")" << std::dec);
|
||||
return true;
|
||||
}
|
||||
|
||||
// HSA-runtime tool on-unload method
|
||||
PUBLIC_API void OnUnload() {
|
||||
ONLOAD_TRACE_BEG();
|
||||
rocprofiler::UnloadTool();
|
||||
rocprofiler::RestoreHsaApi();
|
||||
rocprofiler::UnloadTool();
|
||||
rocprofiler::RestoreHsaApi();
|
||||
ONLOAD_TRACE_END();
|
||||
}
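
`OnLoad` above reads `ROCP_HSA_INTERCEPT` with `atoi`, which returns 0 for non-numeric text, so a value such as `abc` is treated as mode 0 instead of reaching the error branch. The sketch below is a stricter alternative, not what the diff does: `ParseInterceptMode` is a hypothetical helper built on `std::strtol`, and the accepted range 0 to 2 is taken from the error message in the hunk.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>

// Hypothetical helper: returns 0, 1 or 2 for a valid setting, std::nullopt otherwise.
// A null pointer (variable unset) maps to 0, matching the default used in OnLoad.
inline std::optional<int> ParseInterceptMode(const char* text) {
  if (text == nullptr) return std::optional<int>{0};
  char* end = nullptr;
  const long value = std::strtol(text, &end, 10);
  if (end == text || *end != '\0') return std::nullopt;  // empty input or trailing junk
  if (value < 0 || value > 2) return std::nullopt;       // outside the documented modes
  return static_cast<int>(value);
}
```

A caller could pass `getenv("ROCP_HSA_INTERCEPT")` directly and treat an empty result the same way as the `default:` case above.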
|
||||
|
||||
|
||||
@@ -27,22 +27,20 @@ namespace rocprofiler {
|
||||
namespace att {
|
||||
|
||||
AttTracer::AttTracer(rocprofiler_buffer_id_t buffer_id, rocprofiler_filter_id_t filter_id,
|
||||
rocprofiler_session_id_t session_id)
|
||||
rocprofiler_session_id_t session_id)
|
||||
: buffer_id_(buffer_id), filter_id_(filter_id), session_id_(session_id) {}
|
||||
|
||||
void AttTracer::AddPendingSignals(uint32_t writer_id, uint64_t kernel_object,
|
||||
const hsa_signal_t& original_completion_signal, const hsa_signal_t& new_completion_signal,
|
||||
rocprofiler_session_id_t session_id,
|
||||
rocprofiler_buffer_id_t buffer_id,
|
||||
hsa_ven_amd_aqlprofile_profile_t* profile,
|
||||
rocprofiler_kernel_properties_t kernel_properties,
|
||||
uint32_t thread_id, uint64_t queue_index) {
|
||||
void AttTracer::AddPendingSignals(
|
||||
uint32_t writer_id, uint64_t kernel_object, const hsa_signal_t& original_completion_signal,
|
||||
const hsa_signal_t& new_completion_signal, rocprofiler_session_id_t session_id,
|
||||
rocprofiler_buffer_id_t buffer_id, hsa_ven_amd_aqlprofile_profile_t* profile,
|
||||
rocprofiler_kernel_properties_t kernel_properties, uint32_t thread_id, uint64_t queue_index) {
|
||||
std::lock_guard<std::mutex> lock(sessions_pending_signals_lock_);
|
||||
if (sessions_pending_signals_.find(writer_id) == sessions_pending_signals_.end())
|
||||
sessions_pending_signals_.emplace(writer_id, std::vector<att_pending_signal_t>());
|
||||
sessions_pending_signals_.at(writer_id).emplace_back(
|
||||
att_pending_signal_t{kernel_object, original_completion_signal, new_completion_signal, session_id_, buffer_id, profile,
|
||||
kernel_properties, thread_id, queue_index});
|
||||
sessions_pending_signals_.at(writer_id).emplace_back(att_pending_signal_t{
|
||||
kernel_object, original_completion_signal, new_completion_signal, session_id_, buffer_id,
|
||||
profile, kernel_properties, thread_id, queue_index});
|
||||
std::atomic_thread_fence(std::memory_order_release);
|
||||
}
|
||||
|
||||
|
||||
@@ -40,7 +40,7 @@ Filter::Filter(rocprofiler_filter_id_t id, rocprofiler_filter_kind_t filter_kind
|
||||
}
|
||||
break;
|
||||
}
|
||||
case ROCPROFILER_PC_SAMPLING_COLLECTION:{
|
||||
case ROCPROFILER_PC_SAMPLING_COLLECTION: {
|
||||
break;
|
||||
}
|
||||
case ROCPROFILER_ATT_TRACE_COLLECTION: {
|
||||
@@ -62,8 +62,8 @@ Filter::Filter(rocprofiler_filter_id_t id, rocprofiler_filter_kind_t filter_kind
|
||||
}
|
||||
case ROCPROFILER_API_TRACE: {
|
||||
tracer_apis_.clear();
|
||||
for (uint32_t j = 0; j < data_count; j++){
|
||||
tracer_apis_.emplace_back(filter_data.trace_apis[j]);
|
||||
for (uint32_t j = 0; j < data_count; j++) {
|
||||
tracer_apis_.emplace_back(filter_data.trace_apis[j]);
|
||||
}
|
||||
break;
|
||||
}
|
||||
@@ -195,7 +195,7 @@ void Filter::SetProperty(rocprofiler_filter_property_t property) {
|
||||
case ROCPROFILER_FILTER_DISPATCH_IDS:
|
||||
dispatch_id_filter_.clear();
|
||||
for (uint32_t j = 0; j < property.data_count; j++)
|
||||
dispatch_id_filter_.emplace_back(property.dispatch_ids[j]);
|
||||
dispatch_id_filter_.emplace_back(property.dispatch_ids[j]);
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
@@ -249,9 +249,7 @@ void Filter::SetCallback(rocprofiler_sync_callback_t& callback) {
|
||||
|
||||
bool Filter::HasCallback() { return has_sync_callback_; }
|
||||
|
||||
rocprofiler_sync_callback_t& Filter::GetCallback() {
|
||||
return callback_;
|
||||
}
|
||||
rocprofiler_sync_callback_t& Filter::GetCallback() { return callback_; }
|
||||
|
||||
size_t Filter::GetPropertiesCount(rocprofiler_filter_property_kind_t kind) {
|
||||
switch (kind) {
|
||||
|
||||
@@ -53,11 +53,8 @@ class Filter {
|
||||
bool HasCallback();
|
||||
|
||||
void SetProperty(rocprofiler_filter_property_t property);
|
||||
std::variant<
|
||||
std::vector<std::string>,
|
||||
uint32_t*,
|
||||
std::vector<uint64_t>
|
||||
> GetProperty(rocprofiler_filter_property_kind_t kind);
|
||||
std::variant<std::vector<std::string>, uint32_t*, std::vector<uint64_t> > GetProperty(
|
||||
rocprofiler_filter_property_kind_t kind);
|
||||
|
||||
size_t GetPropertiesCount(rocprofiler_filter_property_kind_t kind);
|
||||
rocprofiler_spm_parameter_t* GetSpmParameterData();
|
||||
@@ -74,11 +71,12 @@ class Filter {
|
||||
std::vector<std::string> kernel_names_; // HIP/HSA API Functions
|
||||
uint32_t dispatch_range_[2]; // Kernel Dispatches OR API Range
|
||||
|
||||
std::vector<std::string> profiler_counter_names_; // Counter Names to collect
|
||||
std::vector<std::string> profiler_counter_names_; // Counter Names to collect
|
||||
std::vector<rocprofiler_tracer_activity_domain_t> tracer_apis_; // ROCTX/HIP/HSA API
|
||||
rocprofiler_spm_parameter_t* spm_parameter_; // spm parameter
|
||||
std::vector<rocprofiler_att_parameter_t> att_parameters_; // ATT Parameters
|
||||
rocprofiler_counters_sampler_parameters_t counters_sampler_parameters_; // sampled counters parameters
|
||||
std::vector<rocprofiler_att_parameter_t> att_parameters_; // ATT Parameters
|
||||
rocprofiler_counters_sampler_parameters_t
|
||||
counters_sampler_parameters_; // sampled counters parameters
|
||||
std::vector<uint64_t> dispatch_id_filter_;
|
||||
|
||||
bool has_sync_callback_{false};
|
||||
|
||||
@@ -125,17 +125,19 @@ bool Profiler::HasActivePass() {
|
||||
}
|
||||
|
||||
void Profiler::AddPendingSignals(
|
||||
uint32_t writer_id, uint64_t kernel_object, const hsa_signal_t& original_completion_signal, const hsa_signal_t& new_completion_signal,
|
||||
rocprofiler_session_id_t session_id, rocprofiler_buffer_id_t buffer_id,
|
||||
rocprofiler::profiling_context_t* context, uint64_t session_data_count,
|
||||
hsa_ven_amd_aqlprofile_profile_t* profile, rocprofiler_kernel_properties_t kernel_properties,
|
||||
uint32_t thread_id, uint64_t queue_index, uint64_t correlation_id) {
|
||||
uint32_t writer_id, uint64_t kernel_object, const hsa_signal_t& original_completion_signal,
|
||||
const hsa_signal_t& new_completion_signal, rocprofiler_session_id_t session_id,
|
||||
rocprofiler_buffer_id_t buffer_id, rocprofiler::profiling_context_t* context,
|
||||
uint64_t session_data_count, hsa_ven_amd_aqlprofile_profile_t* profile,
|
||||
rocprofiler_kernel_properties_t kernel_properties, uint32_t thread_id, uint64_t queue_index,
|
||||
uint64_t correlation_id) {
|
||||
std::lock_guard<std::mutex> lock(sessions_pending_signals_lock_);
|
||||
if (sessions_pending_signals_->find(writer_id) == sessions_pending_signals_->end())
|
||||
sessions_pending_signals_->emplace(writer_id, std::vector<pending_signal_t*>());
|
||||
sessions_pending_signals_->at(writer_id).emplace_back(new pending_signal_t{
|
||||
kernel_object, original_completion_signal, new_completion_signal, session_id_, buffer_id, context, session_data_count,
|
||||
profile, kernel_properties, thread_id, queue_index, correlation_id});
|
||||
sessions_pending_signals_->at(writer_id).emplace_back(
|
||||
new pending_signal_t{kernel_object, original_completion_signal, new_completion_signal,
|
||||
session_id_, buffer_id, context, session_data_count, profile,
|
||||
kernel_properties, thread_id, queue_index, correlation_id});
|
||||
}
|
||||
|
||||
const std::vector<pending_signal_t*>& Profiler::GetPendingSignals(uint32_t writer_id) {
|
||||
|
||||
@@ -36,7 +36,7 @@
|
||||
#include "src/core/counters/metrics/eval_metrics.h"
|
||||
|
||||
typedef void (*rocprofiler_add_profiler_record_t)(rocprofiler_record_profiler_t&& record,
|
||||
rocprofiler_session_id_t session_id);
|
||||
rocprofiler_session_id_t session_id);
|
||||
|
||||
typedef rocprofiler_timestamp_t (*rocprofiler_get_timestamp_t)();
|
||||
|
||||
@@ -68,12 +68,13 @@ class Profiler {
|
||||
~Profiler();
|
||||
|
||||
void AddPendingSignals(uint32_t writer_id, uint64_t kernel_object,
|
||||
const hsa_signal_t& original_completion_signal, const hsa_signal_t& new_completion_signal, rocprofiler_session_id_t session_id,
|
||||
rocprofiler_buffer_id_t buffer_id,
|
||||
rocprofiler::profiling_context_t* context, uint64_t session_data_count,
|
||||
hsa_ven_amd_aqlprofile_profile_t* profile,
|
||||
rocprofiler_kernel_properties_t kernel_properties, uint32_t thread_id,
|
||||
uint64_t queue_index, uint64_t correlation_id);
|
||||
const hsa_signal_t& original_completion_signal,
|
||||
const hsa_signal_t& new_completion_signal,
|
||||
rocprofiler_session_id_t session_id, rocprofiler_buffer_id_t buffer_id,
|
||||
rocprofiler::profiling_context_t* context, uint64_t session_data_count,
|
||||
hsa_ven_amd_aqlprofile_profile_t* profile,
|
||||
rocprofiler_kernel_properties_t kernel_properties, uint32_t thread_id,
|
||||
uint64_t queue_index, uint64_t correlation_id);
|
||||
|
||||
const std::vector<pending_signal_t*>& GetPendingSignals(uint32_t writer_id);
|
||||
bool CheckPendingSignalsIsEmpty();
|
||||
@@ -83,8 +84,10 @@ class Profiler {
|
||||
std::string& GetCounterName(rocprofiler_counter_id_t handler);
|
||||
|
||||
bool FindCounter(rocprofiler_counter_id_t counter_id);
|
||||
size_t GetCounterInfoSize(rocprofiler_counter_info_kind_t kind, rocprofiler_counter_id_t counter_id);
|
||||
const char* GetCounterInfo(rocprofiler_counter_info_kind_t kind, rocprofiler_counter_id_t counter_id);
|
||||
size_t GetCounterInfoSize(rocprofiler_counter_info_kind_t kind,
|
||||
rocprofiler_counter_id_t counter_id);
|
||||
const char* GetCounterInfo(rocprofiler_counter_info_kind_t kind,
|
||||
rocprofiler_counter_id_t counter_id);
|
||||
|
||||
void StartReplayPass(rocprofiler_session_id_t session_id);
|
||||
void EndReplayPass();
|
||||
|
||||
@@ -67,8 +67,8 @@ class Session {
|
||||
|
||||
// Filter
|
||||
rocprofiler_filter_id_t CreateFilter(rocprofiler_filter_kind_t filter_kind,
|
||||
rocprofiler_filter_data_t filter_data, uint64_t data_count,
|
||||
rocprofiler_filter_property_t property);
|
||||
rocprofiler_filter_data_t filter_data, uint64_t data_count,
|
||||
rocprofiler_filter_property_t property);
|
||||
bool FindFilter(rocprofiler_filter_id_t filter_id);
|
||||
void DestroyFilter(rocprofiler_filter_id_t filter_id);
|
||||
Filter* GetFilter(rocprofiler_filter_id_t filter_id);
|
||||
@@ -83,7 +83,7 @@ class Session {
|
||||
|
||||
// Buffer
|
||||
rocprofiler_buffer_id_t CreateBuffer(rocprofiler_buffer_callback_t buffer_callback,
|
||||
size_t buffer_size);
|
||||
size_t buffer_size);
|
||||
bool FindBuffer(rocprofiler_buffer_id_t buffer_id);
|
||||
void DestroyBuffer(rocprofiler_buffer_id_t buffer_id);
|
||||
Memory::GenericBuffer* GetBuffer(rocprofiler_buffer_id_t buffer_id);
|
||||
|
||||
@@ -112,8 +112,7 @@ const char* roctracer_op_string(uint32_t domain, uint32_t op) {
|
||||
case ACTIVITY_DOMAIN_EXT_API:
|
||||
return "EXT_API";
|
||||
default:
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
|
||||
"Invalid domain ID");
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -178,8 +177,7 @@ constexpr uint32_t get_op_begin(activity_domain_t domain) {
|
||||
case ACTIVITY_DOMAIN_EXT_API:
|
||||
return 0;
|
||||
default:
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
|
||||
"Invalid domain ID");
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -200,8 +198,7 @@ constexpr uint32_t get_op_end(activity_domain_t domain) {
|
||||
case ACTIVITY_DOMAIN_EXT_API:
|
||||
return get_op_begin(ACTIVITY_DOMAIN_EXT_API);
|
||||
default:
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
|
||||
"Invalid domain ID");
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -476,11 +473,10 @@ int TracerCallback(activity_domain_t domain, uint32_t operation_id, void* data)
|
||||
rocprofiler::GetROCProfilerSingleton()
|
||||
->GetSession((*pool)->session_id)
|
||||
->GetBuffer((*pool)->buffer_id)
|
||||
->AddRecord(
|
||||
rocprofiler_record, record->kernel_name, kernel_name_size,
|
||||
[](auto& rocprofiler_record, const void* data) {
|
||||
rocprofiler_record.name = static_cast<const char*>(data);
|
||||
});
|
||||
->AddRecord(rocprofiler_record, record->kernel_name, kernel_name_size,
|
||||
[](auto& rocprofiler_record, const void* data) {
|
||||
rocprofiler_record.name = static_cast<const char*>(data);
|
||||
});
|
||||
} else {
|
||||
rocprofiler::GetROCProfilerSingleton()
|
||||
->GetSession((*pool)->session_id)
|
||||
@@ -584,8 +580,7 @@ static void roctracer_enable_op_callback(activity_domain_t domain, uint32_t oper
|
||||
user_data);
|
||||
break;
|
||||
default:
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
|
||||
"Invalid domain ID");
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -623,8 +618,7 @@ void roctracer_disable_op_callback(activity_domain_t domain, uint32_t operation_
|
||||
ROCTX_registration_group.Unregister(roctx_api_callback_table, operation_id);
|
||||
break;
|
||||
default:
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
|
||||
"Invalid domain ID");
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -667,8 +661,7 @@ void roctracer_enable_op_activity(activity_domain_t domain, uint32_t op,
|
||||
case ACTIVITY_DOMAIN_ROCTX:
|
||||
break;
|
||||
default:
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
|
||||
"Invalid domain ID");
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -710,8 +703,7 @@ void roctracer_disable_activity(activity_domain_t domain, uint32_t op) {
|
||||
case ACTIVITY_DOMAIN_ROCTX:
|
||||
break;
|
||||
default:
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
|
||||
"Invalid domain ID");
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -774,8 +766,7 @@ void roctracer_set_properties(activity_domain_t domain, void* properties) {
|
||||
break;
|
||||
}
|
||||
default:
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID,
|
||||
"Invalid domain ID");
|
||||
throw rocprofiler::Exception(ROCPROFILER_STATUS_ERROR_INVALID_DOMAIN_ID, "Invalid domain ID");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -791,9 +782,7 @@ static std::string getKernelNameMultiKernelMultiDevice(hipLaunchParams* launchPa
|
||||
return name_str.str();
|
||||
}
|
||||
|
||||
template <typename... Ts> struct Overloaded : Ts... {
|
||||
using Ts::operator()...;
|
||||
};
|
||||
template <typename... Ts> struct Overloaded : Ts... { using Ts::operator()...; };
|
||||
template <class... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;
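
For readers unfamiliar with the `Overloaded` helper reformatted in this hunk, here is a minimal sketch of how such a type is commonly used with `std::visit` (C++17). The `Property` alias mirrors the shape of the variant returned by `Filter::GetProperty` elsewhere in this diff; `Describe` and its output strings are hypothetical and only serve the illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Inherits the call operators of every lambda passed in, forming one overload set.
template <typename... Ts> struct Overloaded : Ts... { using Ts::operator()...; };
template <class... Ts> Overloaded(Ts...) -> Overloaded<Ts...>;

// Hypothetical alias with the same alternatives as the Filter::GetProperty return type.
using Property = std::variant<std::vector<std::string>, uint32_t*, std::vector<uint64_t>>;

// Returns a short description of whichever alternative is currently held.
inline std::string Describe(const Property& property) {
  return std::visit(
      Overloaded{
          [](const std::vector<std::string>& names) {
            return std::to_string(names.size()) + " name(s)";
          },
          [](uint32_t* range) { return std::string(range ? "range" : "no range"); },
          [](const std::vector<uint64_t>& ids) {
            return std::to_string(ids.size()) + " id(s)";
          },
      },
      property);
}
```

Each lambda handles exactly one alternative, so adding a new alternative to the variant without a matching lambda fails at compile time instead of falling through silently.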
|
||||
|
||||
std::optional<std::string> GetHipKernelName(uint32_t cid, hip_api_data_t* data) {
|
||||
|
||||
@@ -27,13 +27,19 @@ void SimpleProxyQueue::HsaIntercept(HsaApiTable* table) {
|
||||
table->core_->hsa_signal_store_relaxed_fn = rocprofiler::SimpleProxyQueue::SignalStore;
|
||||
table->core_->hsa_signal_store_screlease_fn = rocprofiler::SimpleProxyQueue::SignalStore;
|
||||
|
||||
table->core_->hsa_queue_load_write_index_relaxed_fn = rocprofiler::SimpleProxyQueue::GetQueueIndex;
|
||||
table->core_->hsa_queue_store_write_index_relaxed_fn = rocprofiler::SimpleProxyQueue::SetQueueIndex;
|
||||
table->core_->hsa_queue_load_read_index_relaxed_fn = rocprofiler::SimpleProxyQueue::GetSubmitIndex;
|
||||
table->core_->hsa_queue_load_write_index_relaxed_fn =
|
||||
rocprofiler::SimpleProxyQueue::GetQueueIndex;
|
||||
table->core_->hsa_queue_store_write_index_relaxed_fn =
|
||||
rocprofiler::SimpleProxyQueue::SetQueueIndex;
|
||||
table->core_->hsa_queue_load_read_index_relaxed_fn =
|
||||
rocprofiler::SimpleProxyQueue::GetSubmitIndex;
|
||||
|
||||
table->core_->hsa_queue_load_write_index_scacquire_fn = rocprofiler::SimpleProxyQueue::GetQueueIndex;
|
||||
table->core_->hsa_queue_store_write_index_screlease_fn = rocprofiler::SimpleProxyQueue::SetQueueIndex;
|
||||
table->core_->hsa_queue_load_read_index_scacquire_fn = rocprofiler::SimpleProxyQueue::GetSubmitIndex;
|
||||
table->core_->hsa_queue_load_write_index_scacquire_fn =
|
||||
rocprofiler::SimpleProxyQueue::GetQueueIndex;
|
||||
table->core_->hsa_queue_store_write_index_screlease_fn =
|
||||
rocprofiler::SimpleProxyQueue::SetQueueIndex;
|
||||
table->core_->hsa_queue_load_read_index_scacquire_fn =
|
||||
rocprofiler::SimpleProxyQueue::GetSubmitIndex;
|
||||
}
|
||||
|
||||
SimpleProxyQueue::queue_map_t* SimpleProxyQueue::queue_map_ = NULL;
|
||||
|
||||
@@ -33,23 +33,23 @@ THE SOFTWARE.
|
||||
#include "util/hsa_rsrc_factory.h"
|
||||
|
||||
#ifndef ROCP_PROXY_LOCK
|
||||
# define ROCP_PROXY_LOCK 1
|
||||
#define ROCP_PROXY_LOCK 1
|
||||
#endif
|
||||
|
||||
namespace rocprofiler {
|
||||
extern decltype(hsa_queue_create)* hsa_queue_create_fn;
|
||||
extern decltype(hsa_queue_destroy)* hsa_queue_destroy_fn;
|
||||
extern decltype(::hsa_queue_create)* hsa_queue_create_fn;
|
||||
extern decltype(::hsa_queue_destroy)* hsa_queue_destroy_fn;
|
||||
|
||||
extern decltype(hsa_signal_store_relaxed)* hsa_signal_store_relaxed_fn;
|
||||
extern decltype(hsa_signal_store_relaxed)* hsa_signal_store_screlease_fn;
|
||||
extern decltype(::hsa_signal_store_relaxed)* hsa_signal_store_relaxed_fn;
|
||||
extern decltype(::hsa_signal_store_relaxed)* hsa_signal_store_screlease_fn;
|
||||
|
||||
extern decltype(hsa_queue_load_write_index_relaxed)* hsa_queue_load_write_index_relaxed_fn;
|
||||
extern decltype(hsa_queue_store_write_index_relaxed)* hsa_queue_store_write_index_relaxed_fn;
|
||||
extern decltype(hsa_queue_load_read_index_relaxed)* hsa_queue_load_read_index_relaxed_fn;
|
||||
extern decltype(::hsa_queue_load_write_index_relaxed)* hsa_queue_load_write_index_relaxed_fn;
|
||||
extern decltype(::hsa_queue_store_write_index_relaxed)* hsa_queue_store_write_index_relaxed_fn;
|
||||
extern decltype(::hsa_queue_load_read_index_relaxed)* hsa_queue_load_read_index_relaxed_fn;
|
||||
|
||||
extern decltype(hsa_queue_load_write_index_scacquire)* hsa_queue_load_write_index_scacquire_fn;
|
||||
extern decltype(hsa_queue_store_write_index_screlease)* hsa_queue_store_write_index_screlease_fn;
|
||||
extern decltype(hsa_queue_load_read_index_scacquire)* hsa_queue_load_read_index_scacquire_fn;
|
||||
extern decltype(::hsa_queue_load_write_index_scacquire)* hsa_queue_load_write_index_scacquire_fn;
|
||||
extern decltype(::hsa_queue_store_write_index_screlease)* hsa_queue_store_write_index_screlease_fn;
|
||||
extern decltype(::hsa_queue_load_read_index_scacquire)* hsa_queue_load_read_index_scacquire_fn;
|
||||
|
||||
typedef decltype(hsa_signal_t::handle) signal_handle_t;
|
||||
|
||||
@@ -128,7 +128,8 @@ class SimpleProxyQueue : public ProxyQueue {
|
||||
const uint64_t que_idx = hsa_queue_load_write_index_relaxed_fn(queue_);
|
||||
|
||||
// Waiting until there is free space in the queue
|
||||
while (que_idx >= (hsa_queue_load_read_index_relaxed_fn(queue_) + size_));
|
||||
while (que_idx >= (hsa_queue_load_read_index_relaxed_fn(queue_) + size_))
|
||||
;
|
||||
|
||||
// Increment the write index
|
||||
hsa_queue_store_write_index_relaxed_fn(queue_, que_idx + 1);
|
||||
@@ -163,8 +164,7 @@ class SimpleProxyQueue : public ProxyQueue {
|
||||
queue_mask_(0),
|
||||
submit_index_(0),
|
||||
on_submit_cb_(NULL),
|
||||
on_submit_cb_data_(NULL)
|
||||
{
|
||||
on_submit_cb_data_(NULL) {
|
||||
printf("ROCProfiler: SimpleProxyQueue is enabled\n");
|
||||
fflush(stdout);
|
||||
}
|
||||
@@ -203,8 +203,8 @@ class SimpleProxyQueue : public ProxyQueue {
|
||||
|
||||
if (queue_map_ == NULL) queue_map_ = new queue_map_t;
|
||||
(*queue_map_)[queue_->doorbell_signal.handle] = this;
|
||||
}
|
||||
else abort();
|
||||
} else
|
||||
abort();
|
||||
}
|
||||
}
|
||||
if (status != HSA_STATUS_SUCCESS) abort();
|
||||
|
||||
@@ -40,7 +40,7 @@ THE SOFTWARE.
|
||||
namespace rocprofiler {
|
||||
|
||||
class Tracker {
|
||||
public:
|
||||
public:
|
||||
typedef std::mutex mutex_t;
|
||||
typedef util::HsaRsrcFactory::timestamp_t timestamp_t;
|
||||
typedef rocprofiler_dispatch_record_t record_t;
|
||||
@@ -89,7 +89,7 @@ class Tracker {
|
||||
}
|
||||
|
||||
// Add tracker entry
|
||||
entry_t* Alloc(const hsa_agent_t& agent, const hsa_signal_t& orig, bool proxy=true) {
|
||||
entry_t* Alloc(const hsa_agent_t& agent, const hsa_signal_t& orig, bool proxy = true) {
|
||||
hsa_status_t status = HSA_STATUS_ERROR;
|
||||
|
||||
// Creating a new tracker entry
|
||||
@@ -108,10 +108,12 @@ class Tracker {
|
||||
// Creating a proxy signal
|
||||
if (proxy) {
|
||||
entry->is_proxy = true;
|
||||
const hsa_signal_value_t signal_value = (orig.handle) ? hsa_api_.hsa_signal_load_relaxed(orig) : 1;
|
||||
const hsa_signal_value_t signal_value =
|
||||
(orig.handle) ? hsa_api_.hsa_signal_load_relaxed(orig) : 1;
|
||||
status = hsa_api_.hsa_signal_create(signal_value, 0, NULL, &(entry->signal));
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_signal_create");
|
||||
status = hsa_api_.hsa_amd_signal_async_handler(entry->signal, HSA_SIGNAL_CONDITION_LT, signal_value, Handler, entry);
|
||||
status = hsa_api_.hsa_amd_signal_async_handler(entry->signal, HSA_SIGNAL_CONDITION_LT,
|
||||
signal_value, Handler, entry);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_signal_async_handler");
|
||||
}
|
||||
|
||||
@@ -128,7 +130,8 @@ class Tracker {
|
||||
hsa_signal_t& dispatch_signal = group->GetDispatchSignal();
|
||||
hsa_signal_t& handler_signal = group->GetBarrierSignal();
|
||||
entry->signal = dispatch_signal;
|
||||
hsa_status_t status = hsa_api_.hsa_amd_signal_async_handler(handler_signal, HSA_SIGNAL_CONDITION_LT, 1, Handler, entry);
|
||||
hsa_status_t status = hsa_api_.hsa_amd_signal_async_handler(
|
||||
handler_signal, HSA_SIGNAL_CONDITION_LT, 1, Handler, entry);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_signal_async_handler");
|
||||
}
|
||||
|
||||
@@ -150,7 +153,8 @@ class Tracker {
|
||||
// Debug trace
|
||||
if (trace_on_) {
|
||||
auto outstanding = outstanding_.fetch_add(1);
|
||||
fprintf(stdout, "Tracker::Enable: entry %p, record %p, outst %lu\n", entry, entry->record, outstanding);
|
||||
fprintf(stdout, "Tracker::Enable: entry %p, record %p, outst %lu\n", entry, entry->record,
|
||||
outstanding);
|
||||
fflush(stdout);
|
||||
}
|
||||
}
|
||||
@@ -173,12 +177,14 @@ class Tracker {
|
||||
group->GetRecord()->dispatch = util::HsaRsrcFactory::Instance().TimestampNs();
|
||||
|
||||
// Creating a proxy signal
|
||||
const hsa_signal_value_t signal_value = (orig_signal.handle) ?
|
||||
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_load_relaxed(orig_signal) : 1;
|
||||
const hsa_signal_value_t signal_value = (orig_signal.handle)
|
||||
? util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_load_relaxed(orig_signal)
|
||||
: 1;
|
||||
hsa_signal_t& dispatch_signal = group->GetDispatchSignal();
|
||||
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_store_screlease(dispatch_signal, signal_value);
|
||||
hsa_status_t status =
|
||||
util::HsaRsrcFactory::Instance().HsaApi()->hsa_amd_signal_async_handler(dispatch_signal, HSA_SIGNAL_CONDITION_LT, signal_value, Handler_opt, group);
|
||||
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_store_screlease(dispatch_signal,
|
||||
signal_value);
|
||||
hsa_status_t status = util::HsaRsrcFactory::Instance().HsaApi()->hsa_amd_signal_async_handler(
|
||||
dispatch_signal, HSA_SIGNAL_CONDITION_LT, signal_value, Handler_opt, group);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_signal_async_handler");
|
||||
}
|
||||
|
||||
@@ -190,7 +196,8 @@ class Tracker {
|
||||
record_t* record = group->GetRecord();
|
||||
hsa_amd_profiling_dispatch_time_t dispatch_time{};
|
||||
hsa_status_t status =
|
||||
util::HsaRsrcFactory::Instance().HsaApi()->hsa_amd_profiling_get_dispatch_time(context->GetAgent(), dispatch_signal, &dispatch_time);
|
||||
util::HsaRsrcFactory::Instance().HsaApi()->hsa_amd_profiling_get_dispatch_time(
|
||||
context->GetAgent(), dispatch_signal, &dispatch_time);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_profiling_get_dispatch_time");
|
||||
record->begin = util::HsaRsrcFactory::Instance().SysclockToNs(dispatch_time.start);
|
||||
record->end = util::HsaRsrcFactory::Instance().SysclockToNs(dispatch_time.end);
|
||||
@@ -203,22 +210,23 @@ class Tracker {
|
||||
amd_signal_t* prof_signal_ptr = reinterpret_cast<amd_signal_t*>(dispatch_signal.handle);
|
||||
orig_signal_ptr->start_ts = prof_signal_ptr->start_ts;
|
||||
orig_signal_ptr->end_ts = prof_signal_ptr->end_ts;
|
||||
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_store_screlease(orig_signal, signal_value);
|
||||
util::HsaRsrcFactory::Instance().HsaApi()->hsa_signal_store_screlease(orig_signal,
|
||||
signal_value);
|
||||
}
|
||||
|
||||
return Context::Handler(signal_value, arg);
|
||||
}
|
||||
|
||||
private:
|
||||
Tracker() :
|
||||
outstanding_(0),
|
||||
hsa_rsrc_(&(util::HsaRsrcFactory::Instance())),
|
||||
hsa_api_(*(hsa_rsrc_->HsaApi()))
|
||||
{}
|
||||
private:
|
||||
Tracker()
|
||||
: outstanding_(0),
|
||||
hsa_rsrc_(&(util::HsaRsrcFactory::Instance())),
|
||||
hsa_api_(*(hsa_rsrc_->HsaApi())) {}
|
||||
|
||||
~Tracker() {
|
||||
if (trace_on_) {
|
||||
fprintf(stdout, "Tracker::DESTR: sig list %d, outst %lu\n", (int)(sig_list_.size()), outstanding_.load());
|
||||
fprintf(stdout, "Tracker::DESTR: sig list %d, outst %lu\n", (int)(sig_list_.size()),
|
||||
outstanding_.load());
|
||||
fflush(stdout);
|
||||
}
|
||||
|
||||
@@ -226,8 +234,8 @@ class Tracker {
|
||||
auto end = sig_list_.end();
|
||||
while (it != end) {
|
||||
auto cur = it++;
|
||||
// The wait should be optional, as there may be inter-kernel dependencies and it is possible to wait for
|
||||
// kernels that will never be launched because the application finished for some reason.
|
||||
// The wait should be optional, as there may be inter-kernel dependencies and it is possible to
|
||||
// wait for kernels that will never be launched because the application finished for some reason.
|
||||
#if 0
|
||||
// FIXME: currently the signal value for tracking signals is taken from the original application signal
|
||||
hsa_rsrc_->SignalWait((*cur)->signal, 1);
|
||||
@@ -246,20 +254,24 @@ class Tracker {
|
||||
// Debug trace
|
||||
if (trace_on_) {
|
||||
auto outstanding = outstanding_.fetch_sub(1);
|
||||
fprintf(stdout, "Tracker::Complete: entry %p, record %p, outst %lu\n", entry, entry->record, outstanding);
|
||||
fprintf(stdout, "Tracker::Complete: entry %p, record %p, outst %lu\n", entry, entry->record,
|
||||
outstanding);
|
||||
fflush(stdout);
|
||||
}
|
||||
|
||||
// Query begin/end and complete timestamps
|
||||
if (entry->is_memcopy) {
|
||||
hsa_amd_profiling_async_copy_time_t async_copy_time{};
|
||||
hsa_status_t status = hsa_api_.hsa_amd_profiling_get_async_copy_time(entry->signal, &async_copy_time);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_profiling_get_async_copy_time");
|
||||
hsa_status_t status =
|
||||
hsa_api_.hsa_amd_profiling_get_async_copy_time(entry->signal, &async_copy_time);
|
||||
if (status != HSA_STATUS_SUCCESS)
|
||||
EXC_RAISING(status, "hsa_amd_profiling_get_async_copy_time");
|
||||
record->begin = hsa_rsrc_->SysclockToNs(async_copy_time.start);
|
||||
record->end = hsa_rsrc_->SysclockToNs(async_copy_time.end);
|
||||
} else {
|
||||
hsa_amd_profiling_dispatch_time_t dispatch_time{};
|
||||
hsa_status_t status = hsa_api_.hsa_amd_profiling_get_dispatch_time(entry->agent, entry->signal, &dispatch_time);
|
||||
hsa_status_t status =
|
||||
hsa_api_.hsa_amd_profiling_get_dispatch_time(entry->agent, entry->signal, &dispatch_time);
|
||||
if (status != HSA_STATUS_SUCCESS) EXC_RAISING(status, "hsa_amd_profiling_get_dispatch_time");
|
||||
record->begin = hsa_rsrc_->SysclockToNs(dispatch_time.start);
|
||||
record->end = hsa_rsrc_->SysclockToNs(dispatch_time.end);
|
||||
@@ -349,6 +361,6 @@ class Tracker {
|
||||
static const bool trace_on_ = false;
|
||||
};
|
||||
|
||||
} // namespace rocprofiler
|
||||
} // namespace rocprofiler
|
||||
|
||||
#endif // SRC_CORE_TRACKER_H_
|
||||
#endif // SRC_CORE_TRACKER_H_
|
||||
|
||||
@@ -36,11 +36,12 @@ typedef hsa_ext_amd_aql_pm4_packet_t packet_t;
|
||||
typedef uint32_t packet_word_t;
|
||||
typedef uint64_t timestamp_t;
|
||||
|
||||
inline std::ostream& operator<< (std::ostream& out, const event_t& event) {
|
||||
out << "[block_name(" << event.block_name << "). block_index(" << event.block_index << "). counter_id(" << event.counter_id << ")]";
|
||||
inline std::ostream& operator<<(std::ostream& out, const event_t& event) {
|
||||
out << "[block_name(" << event.block_name << "). block_index(" << event.block_index
|
||||
<< "). counter_id(" << event.counter_id << ")]";
|
||||
return out;
|
||||
}
|
||||
inline std::ostream& operator<< (std::ostream& out, const parameter_t& parameter) {
|
||||
inline std::ostream& operator<<(std::ostream& out, const parameter_t& parameter) {
|
||||
out << "[parameter_name(" << parameter.parameter_name << "). value(" << parameter.value << ")]";
|
||||
return out;
|
||||
}
|
||||
|
||||
@@ -35,15 +35,12 @@
|
||||
|
||||
namespace rocprofiler::pc_sampler {
|
||||
|
||||
PCSampler::PCSampler(
|
||||
rocprofiler_buffer_id_t buffer_id,
|
||||
rocprofiler_filter_id_t filter_id,
|
||||
rocprofiler_session_id_t session_id)
|
||||
: buffer_id_(buffer_id)
|
||||
, filter_id_(filter_id)
|
||||
, session_id_(session_id)
|
||||
, pci_system_initialized_(pci_system_init() == 0)
|
||||
{}
|
||||
PCSampler::PCSampler(rocprofiler_buffer_id_t buffer_id, rocprofiler_filter_id_t filter_id,
|
||||
rocprofiler_session_id_t session_id)
|
||||
: buffer_id_(buffer_id),
|
||||
filter_id_(filter_id),
|
||||
session_id_(session_id),
|
||||
pci_system_initialized_(pci_system_init() == 0) {}
|
||||
|
||||
PCSampler::~PCSampler() {
|
||||
if (pci_system_initialized_) {
|
||||
@@ -53,7 +50,9 @@ PCSampler::~PCSampler() {
|
||||
}
|
||||
|
||||
void PCSampler::Start() {
|
||||
if (sampler_thread_.joinable()) { return; }
|
||||
if (sampler_thread_.joinable()) {
|
||||
return;
|
||||
}
|
||||
|
||||
devices_.clear();
|
||||
|
||||
@@ -61,15 +60,15 @@ void PCSampler::Start() {
|
||||
|
||||
agents_t agents;
|
||||
rocprofiler::hsa_support::GetCoreApiTable().hsa_iterate_agents_fn(
|
||||
[](hsa_agent_t agent, void *arg){
|
||||
auto &agents = *reinterpret_cast<agents_t *>(arg);
|
||||
agents.emplace_back(agent);
|
||||
return HSA_STATUS_SUCCESS;
|
||||
},
|
||||
&agents);
|
||||
[](hsa_agent_t agent, void* arg) {
|
||||
auto& agents = *reinterpret_cast<agents_t*>(arg);
|
||||
agents.emplace_back(agent);
|
||||
return HSA_STATUS_SUCCESS;
|
||||
},
|
||||
&agents);
|
||||
|
||||
for (const auto &agent : agents) {
|
||||
const auto &ai = rocprofiler::hsa_support::GetAgentInfo(agent.handle);
|
||||
for (const auto& agent : agents) {
|
||||
const auto& ai = rocprofiler::hsa_support::GetAgentInfo(agent.handle);
|
||||
if (ai.getType() != HSA_DEVICE_TYPE_GPU) {
|
||||
continue;
|
||||
}
|
||||
@@ -81,31 +80,30 @@ void PCSampler::Start() {
|
||||
}
|
||||
|
||||
void PCSampler::Stop() {
|
||||
if (!sampler_thread_.joinable()) { return; }
|
||||
if (!sampler_thread_.joinable()) {
|
||||
return;
|
||||
}
|
||||
|
||||
keep_running_ = false;
|
||||
sampler_thread_.join();
|
||||
}
|
||||
|
||||
void PCSampler::AddRecord(rocprofiler_record_pc_sample_t &record) {
|
||||
void PCSampler::AddRecord(rocprofiler_record_pc_sample_t& record) {
|
||||
const auto tool = rocprofiler::GetROCProfilerSingleton();
|
||||
const auto session = tool->GetSession(session_id_);
|
||||
const auto buffer = session->GetBuffer(buffer_id_);
|
||||
|
||||
std::lock_guard<std::mutex> lk(session->GetSessionLock());
|
||||
|
||||
record.header = {
|
||||
ROCPROFILER_PC_SAMPLING_RECORD,
|
||||
{ tool->GetUniqueRecordId() }
|
||||
};
|
||||
record.header = {ROCPROFILER_PC_SAMPLING_RECORD, {tool->GetUniqueRecordId()}};
|
||||
buffer->AddRecord(record);
|
||||
}
|
||||
|
||||
void PCSampler::SamplerLoop() {
|
||||
while (keep_running_) {
|
||||
auto next_tick = std::chrono::steady_clock::now() + std::chrono::milliseconds(10);
|
||||
for (auto &agent : devices_) {
|
||||
auto &device = agent.second;
|
||||
for (auto& agent : devices_) {
|
||||
auto& device = agent.second;
|
||||
if (device.fd_.mmio2.get() >= 0) {
|
||||
gfxip::read_pc_samples_v9_ioctl(device, this);
|
||||
} else {
|
||||
@@ -116,4 +114,4 @@ void PCSampler::SamplerLoop() {
|
||||
}
|
||||
}
|
||||
|
||||
} // namespace rocprofiler::pc_sampler
|
||||
} // namespace rocprofiler::pc_sampler
|
||||
|
||||
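The `PCSampler::Start`, `PCSampler::Stop`, and `PCSampler::SamplerLoop` hunks above follow a common background-thread pattern: a joinable-thread guard, a `keep_running_` flag, and a fixed tick computed from `std::chrono::steady_clock`. The following is a minimal, self-contained sketch of that pattern only. `PeriodicSampler`, its callback, and the `sleep_until` call are assumptions made for illustration; the diff does not show how the real loop waits for `next_tick`, nor how `keep_running_` is declared.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical stand-in for PCSampler: only the threading pattern is modeled.
class PeriodicSampler {
 public:
  PeriodicSampler(std::function<void()> sample, std::chrono::milliseconds period)
      : sample_(std::move(sample)), period_(period) {
    assert(sample_ && "a sampling callback is required");
  }

  ~PeriodicSampler() { Stop(); }

  void Start() {
    // Same guard as in the diff: a joinable thread means we are already running.
    if (thread_.joinable()) {
      return;
    }
    keep_running_ = true;
    thread_ = std::thread([this] { Loop(); });
  }

  void Stop() {
    if (!thread_.joinable()) {
      return;
    }
    keep_running_ = false;
    thread_.join();
  }

 private:
  void Loop() {
    while (keep_running_) {
      // The diff uses a fixed 10 ms tick; here the period is a parameter.
      auto next_tick = std::chrono::steady_clock::now() + period_;
      sample_();
      // Assumption: sleep until the next tick (not visible in the diff).
      std::this_thread::sleep_until(next_tick);
    }
  }

  std::function<void()> sample_;
  std::chrono::milliseconds period_;
  std::atomic<bool> keep_running_{false};
  std::thread thread_;
};
```

Calling `Start()` or `Stop()` twice in a row is harmless in this sketch, which mirrors the early returns in the diff. An atomic flag is used here so that the write in `Stop()` is visible to the loop thread without extra locking.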
(Diffs for four files are not shown because the files are too large.)
@@ -23,244 +23,244 @@
|
||||
|
||||
// addressBlock: gc_grbmdec
|
||||
// base address: 0x8000
|
||||
#define mmGRBM_CNTL 0x0000
|
||||
#define mmGRBM_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_SKEW_CNTL 0x0001
|
||||
#define mmGRBM_SKEW_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_STATUS2 0x0002
|
||||
#define mmGRBM_STATUS2_BASE_IDX 0
|
||||
#define mmGRBM_PWR_CNTL 0x0003
|
||||
#define mmGRBM_PWR_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_STATUS 0x0004
|
||||
#define mmGRBM_STATUS_BASE_IDX 0
|
||||
#define mmGRBM_STATUS_SE0 0x0005
|
||||
#define mmGRBM_STATUS_SE0_BASE_IDX 0
|
||||
#define mmGRBM_STATUS_SE1 0x0006
|
||||
#define mmGRBM_STATUS_SE1_BASE_IDX 0
|
||||
#define mmGRBM_SOFT_RESET 0x0008
|
||||
#define mmGRBM_SOFT_RESET_BASE_IDX 0
|
||||
#define mmGRBM_GFX_CLKEN_CNTL 0x000c
|
||||
#define mmGRBM_GFX_CLKEN_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_WAIT_IDLE_CLOCKS 0x000d
|
||||
#define mmGRBM_WAIT_IDLE_CLOCKS_BASE_IDX 0
|
||||
#define mmGRBM_STATUS_SE2 0x000e
|
||||
#define mmGRBM_STATUS_SE2_BASE_IDX 0
|
||||
#define mmGRBM_STATUS_SE3 0x000f
|
||||
#define mmGRBM_STATUS_SE3_BASE_IDX 0
|
||||
#define mmGRBM_READ_ERROR 0x0016
|
||||
#define mmGRBM_READ_ERROR_BASE_IDX 0
|
||||
#define mmGRBM_READ_ERROR2 0x0017
|
||||
#define mmGRBM_READ_ERROR2_BASE_IDX 0
|
||||
#define mmGRBM_INT_CNTL 0x0018
|
||||
#define mmGRBM_INT_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_OP 0x0019
|
||||
#define mmGRBM_TRAP_OP_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_ADDR 0x001a
|
||||
#define mmGRBM_TRAP_ADDR_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_ADDR_MSK 0x001b
|
||||
#define mmGRBM_TRAP_ADDR_MSK_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_WD 0x001c
|
||||
#define mmGRBM_TRAP_WD_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_WD_MSK 0x001d
|
||||
#define mmGRBM_TRAP_WD_MSK_BASE_IDX 0
|
||||
#define mmGRBM_DSM_BYPASS 0x001e
|
||||
#define mmGRBM_DSM_BYPASS_BASE_IDX 0
|
||||
#define mmGRBM_WRITE_ERROR 0x001f
|
||||
#define mmGRBM_WRITE_ERROR_BASE_IDX 0
|
||||
#define mmGRBM_IOV_ERROR 0x0020
|
||||
#define mmGRBM_IOV_ERROR_BASE_IDX 0
|
||||
#define mmGRBM_CHIP_REVISION 0x0021
|
||||
#define mmGRBM_CHIP_REVISION_BASE_IDX 0
|
||||
#define mmGRBM_GFX_CNTL 0x0022
|
||||
#define mmGRBM_GFX_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_RSMU_CFG 0x0023
|
||||
#define mmGRBM_RSMU_CFG_BASE_IDX 0
|
||||
#define mmGRBM_IH_CREDIT 0x0024
|
||||
#define mmGRBM_IH_CREDIT_BASE_IDX 0
|
||||
#define mmGRBM_PWR_CNTL2 0x0025
|
||||
#define mmGRBM_PWR_CNTL2_BASE_IDX 0
|
||||
#define mmGRBM_UTCL2_INVAL_RANGE_START 0x0026
|
||||
#define mmGRBM_UTCL2_INVAL_RANGE_START_BASE_IDX 0
|
||||
#define mmGRBM_UTCL2_INVAL_RANGE_END 0x0027
|
||||
#define mmGRBM_UTCL2_INVAL_RANGE_END_BASE_IDX 0
|
||||
#define mmGRBM_RSMU_READ_ERROR 0x0028
|
||||
#define mmGRBM_RSMU_READ_ERROR_BASE_IDX 0
|
||||
#define mmGRBM_CHICKEN_BITS 0x0029
|
||||
#define mmGRBM_CHICKEN_BITS_BASE_IDX 0
|
||||
#define mmGRBM_FENCE_RANGE0 0x002a
|
||||
#define mmGRBM_FENCE_RANGE0_BASE_IDX 0
|
||||
#define mmGRBM_FENCE_RANGE1 0x002b
|
||||
#define mmGRBM_FENCE_RANGE1_BASE_IDX 0
|
||||
#define mmGRBM_NOWHERE 0x003f
|
||||
#define mmGRBM_NOWHERE_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG0 0x0040
|
||||
#define mmGRBM_SCRATCH_REG0_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG1 0x0041
|
||||
#define mmGRBM_SCRATCH_REG1_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG2 0x0042
|
||||
#define mmGRBM_SCRATCH_REG2_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG3 0x0043
|
||||
#define mmGRBM_SCRATCH_REG3_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG4 0x0044
|
||||
#define mmGRBM_SCRATCH_REG4_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG5 0x0045
|
||||
#define mmGRBM_SCRATCH_REG5_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG6 0x0046
|
||||
#define mmGRBM_SCRATCH_REG6_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG7 0x0047
|
||||
#define mmGRBM_SCRATCH_REG7_BASE_IDX 0
|
||||
#define mmGRBM_CNTL 0x0000
|
||||
#define mmGRBM_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_SKEW_CNTL 0x0001
|
||||
#define mmGRBM_SKEW_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_STATUS2 0x0002
|
||||
#define mmGRBM_STATUS2_BASE_IDX 0
|
||||
#define mmGRBM_PWR_CNTL 0x0003
|
||||
#define mmGRBM_PWR_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_STATUS 0x0004
|
||||
#define mmGRBM_STATUS_BASE_IDX 0
|
||||
#define mmGRBM_STATUS_SE0 0x0005
|
||||
#define mmGRBM_STATUS_SE0_BASE_IDX 0
|
||||
#define mmGRBM_STATUS_SE1 0x0006
|
||||
#define mmGRBM_STATUS_SE1_BASE_IDX 0
|
||||
#define mmGRBM_SOFT_RESET 0x0008
|
||||
#define mmGRBM_SOFT_RESET_BASE_IDX 0
|
||||
#define mmGRBM_GFX_CLKEN_CNTL 0x000c
|
||||
#define mmGRBM_GFX_CLKEN_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_WAIT_IDLE_CLOCKS 0x000d
|
||||
#define mmGRBM_WAIT_IDLE_CLOCKS_BASE_IDX 0
|
||||
#define mmGRBM_STATUS_SE2 0x000e
|
||||
#define mmGRBM_STATUS_SE2_BASE_IDX 0
|
||||
#define mmGRBM_STATUS_SE3 0x000f
|
||||
#define mmGRBM_STATUS_SE3_BASE_IDX 0
|
||||
#define mmGRBM_READ_ERROR 0x0016
|
||||
#define mmGRBM_READ_ERROR_BASE_IDX 0
|
||||
#define mmGRBM_READ_ERROR2 0x0017
|
||||
#define mmGRBM_READ_ERROR2_BASE_IDX 0
|
||||
#define mmGRBM_INT_CNTL 0x0018
|
||||
#define mmGRBM_INT_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_OP 0x0019
|
||||
#define mmGRBM_TRAP_OP_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_ADDR 0x001a
|
||||
#define mmGRBM_TRAP_ADDR_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_ADDR_MSK 0x001b
|
||||
#define mmGRBM_TRAP_ADDR_MSK_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_WD 0x001c
|
||||
#define mmGRBM_TRAP_WD_BASE_IDX 0
|
||||
#define mmGRBM_TRAP_WD_MSK 0x001d
|
||||
#define mmGRBM_TRAP_WD_MSK_BASE_IDX 0
|
||||
#define mmGRBM_DSM_BYPASS 0x001e
|
||||
#define mmGRBM_DSM_BYPASS_BASE_IDX 0
|
||||
#define mmGRBM_WRITE_ERROR 0x001f
|
||||
#define mmGRBM_WRITE_ERROR_BASE_IDX 0
|
||||
#define mmGRBM_IOV_ERROR 0x0020
|
||||
#define mmGRBM_IOV_ERROR_BASE_IDX 0
|
||||
#define mmGRBM_CHIP_REVISION 0x0021
|
||||
#define mmGRBM_CHIP_REVISION_BASE_IDX 0
|
||||
#define mmGRBM_GFX_CNTL 0x0022
|
||||
#define mmGRBM_GFX_CNTL_BASE_IDX 0
|
||||
#define mmGRBM_RSMU_CFG 0x0023
|
||||
#define mmGRBM_RSMU_CFG_BASE_IDX 0
|
||||
#define mmGRBM_IH_CREDIT 0x0024
|
||||
#define mmGRBM_IH_CREDIT_BASE_IDX 0
|
||||
#define mmGRBM_PWR_CNTL2 0x0025
|
||||
#define mmGRBM_PWR_CNTL2_BASE_IDX 0
|
||||
#define mmGRBM_UTCL2_INVAL_RANGE_START 0x0026
|
||||
#define mmGRBM_UTCL2_INVAL_RANGE_START_BASE_IDX 0
|
||||
#define mmGRBM_UTCL2_INVAL_RANGE_END 0x0027
|
||||
#define mmGRBM_UTCL2_INVAL_RANGE_END_BASE_IDX 0
|
||||
#define mmGRBM_RSMU_READ_ERROR 0x0028
|
||||
#define mmGRBM_RSMU_READ_ERROR_BASE_IDX 0
|
||||
#define mmGRBM_CHICKEN_BITS 0x0029
|
||||
#define mmGRBM_CHICKEN_BITS_BASE_IDX 0
|
||||
#define mmGRBM_FENCE_RANGE0 0x002a
|
||||
#define mmGRBM_FENCE_RANGE0_BASE_IDX 0
|
||||
#define mmGRBM_FENCE_RANGE1 0x002b
|
||||
#define mmGRBM_FENCE_RANGE1_BASE_IDX 0
|
||||
#define mmGRBM_NOWHERE 0x003f
|
||||
#define mmGRBM_NOWHERE_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG0 0x0040
|
||||
#define mmGRBM_SCRATCH_REG0_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG1 0x0041
|
||||
#define mmGRBM_SCRATCH_REG1_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG2 0x0042
|
||||
#define mmGRBM_SCRATCH_REG2_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG3 0x0043
|
||||
#define mmGRBM_SCRATCH_REG3_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG4 0x0044
|
||||
#define mmGRBM_SCRATCH_REG4_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG5 0x0045
|
||||
#define mmGRBM_SCRATCH_REG5_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG6 0x0046
|
||||
#define mmGRBM_SCRATCH_REG6_BASE_IDX 0
|
||||
#define mmGRBM_SCRATCH_REG7 0x0047
|
||||
#define mmGRBM_SCRATCH_REG7_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_cppdec2
|
||||
// base address: 0xc600
|
||||
#define mmCPF_EDC_TAG_CNT 0x1189
|
||||
#define mmCPF_EDC_TAG_CNT_BASE_IDX 0
|
||||
#define mmCPF_EDC_ROQ_CNT 0x118a
|
||||
#define mmCPF_EDC_ROQ_CNT_BASE_IDX 0
|
||||
#define mmCPG_EDC_TAG_CNT 0x118b
|
||||
#define mmCPG_EDC_TAG_CNT_BASE_IDX 0
|
||||
#define mmCPG_EDC_DMA_CNT 0x118d
|
||||
#define mmCPG_EDC_DMA_CNT_BASE_IDX 0
|
||||
#define mmCPC_EDC_SCRATCH_CNT 0x118e
|
||||
#define mmCPC_EDC_SCRATCH_CNT_BASE_IDX 0
|
||||
#define mmCPC_EDC_UCODE_CNT 0x118f
|
||||
#define mmCPC_EDC_UCODE_CNT_BASE_IDX 0
|
||||
#define mmDC_EDC_STATE_CNT 0x1191
|
||||
#define mmDC_EDC_STATE_CNT_BASE_IDX 0
|
||||
#define mmDC_EDC_CSINVOC_CNT 0x1192
|
||||
#define mmDC_EDC_CSINVOC_CNT_BASE_IDX 0
|
||||
#define mmDC_EDC_RESTORE_CNT 0x1193
|
||||
#define mmDC_EDC_RESTORE_CNT_BASE_IDX 0
|
||||
#define mmCPF_EDC_TAG_CNT 0x1189
|
||||
#define mmCPF_EDC_TAG_CNT_BASE_IDX 0
|
||||
#define mmCPF_EDC_ROQ_CNT 0x118a
|
||||
#define mmCPF_EDC_ROQ_CNT_BASE_IDX 0
|
||||
#define mmCPG_EDC_TAG_CNT 0x118b
|
||||
#define mmCPG_EDC_TAG_CNT_BASE_IDX 0
|
||||
#define mmCPG_EDC_DMA_CNT 0x118d
|
||||
#define mmCPG_EDC_DMA_CNT_BASE_IDX 0
|
||||
#define mmCPC_EDC_SCRATCH_CNT 0x118e
|
||||
#define mmCPC_EDC_SCRATCH_CNT_BASE_IDX 0
|
||||
#define mmCPC_EDC_UCODE_CNT 0x118f
|
||||
#define mmCPC_EDC_UCODE_CNT_BASE_IDX 0
|
||||
#define mmDC_EDC_STATE_CNT 0x1191
|
||||
#define mmDC_EDC_STATE_CNT_BASE_IDX 0
|
||||
#define mmDC_EDC_CSINVOC_CNT 0x1192
|
||||
#define mmDC_EDC_CSINVOC_CNT_BASE_IDX 0
|
||||
#define mmDC_EDC_RESTORE_CNT 0x1193
|
||||
#define mmDC_EDC_RESTORE_CNT_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_gdsdec
|
||||
// base address: 0x9700
|
||||
#define mmGDS_EDC_CNT 0x05c5
|
||||
#define mmGDS_EDC_CNT_BASE_IDX 0
|
||||
#define mmGDS_EDC_GRBM_CNT 0x05c6
|
||||
#define mmGDS_EDC_GRBM_CNT_BASE_IDX 0
|
||||
#define mmGDS_EDC_OA_DED 0x05c7
|
||||
#define mmGDS_EDC_OA_DED_BASE_IDX 0
|
||||
#define mmGDS_EDC_OA_PHY_CNT 0x05cb
|
||||
#define mmGDS_EDC_OA_PHY_CNT_BASE_IDX 0
|
||||
#define mmGDS_EDC_OA_PIPE_CNT 0x05cc
|
||||
#define mmGDS_EDC_OA_PIPE_CNT_BASE_IDX 0
|
||||
#define mmGDS_EDC_CNT 0x05c5
|
||||
#define mmGDS_EDC_CNT_BASE_IDX 0
|
||||
#define mmGDS_EDC_GRBM_CNT 0x05c6
|
||||
#define mmGDS_EDC_GRBM_CNT_BASE_IDX 0
|
||||
#define mmGDS_EDC_OA_DED 0x05c7
|
||||
#define mmGDS_EDC_OA_DED_BASE_IDX 0
|
||||
#define mmGDS_EDC_OA_PHY_CNT 0x05cb
|
||||
#define mmGDS_EDC_OA_PHY_CNT_BASE_IDX 0
|
||||
#define mmGDS_EDC_OA_PIPE_CNT 0x05cc
|
||||
#define mmGDS_EDC_OA_PIPE_CNT_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_shsdec
|
||||
// base address: 0x9000
|
||||
#define mmSPI_EDC_CNT 0x0445
|
||||
#define mmSPI_EDC_CNT_BASE_IDX 0
|
||||
#define mmSPI_EDC_CNT 0x0445
|
||||
#define mmSPI_EDC_CNT_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_sqdec
|
||||
// base address: 0x8c00
|
||||
#define mmSQC_EDC_CNT2 0x032c
|
||||
#define mmSQC_EDC_CNT2_BASE_IDX 0
|
||||
#define mmSQC_EDC_CNT3 0x032d
|
||||
#define mmSQC_EDC_CNT3_BASE_IDX 0
|
||||
#define mmSQC_EDC_PARITY_CNT3 0x032e
|
||||
#define mmSQC_EDC_PARITY_CNT3_BASE_IDX 0
|
||||
#define mmSQC_EDC_CNT 0x03a2
|
||||
#define mmSQC_EDC_CNT_BASE_IDX 0
|
||||
#define mmSQ_EDC_SEC_CNT 0x03a3
|
||||
#define mmSQ_EDC_SEC_CNT_BASE_IDX 0
|
||||
#define mmSQ_EDC_DED_CNT 0x03a4
|
||||
#define mmSQ_EDC_DED_CNT_BASE_IDX 0
|
||||
#define mmSQ_EDC_INFO 0x03a5
|
||||
#define mmSQ_EDC_INFO_BASE_IDX 0
|
||||
#define mmSQ_EDC_CNT 0x03a6
|
||||
#define mmSQ_EDC_CNT_BASE_IDX 0
|
||||
#define mmSQC_EDC_CNT2 0x032c
|
||||
#define mmSQC_EDC_CNT2_BASE_IDX 0
|
||||
#define mmSQC_EDC_CNT3 0x032d
|
||||
#define mmSQC_EDC_CNT3_BASE_IDX 0
|
||||
#define mmSQC_EDC_PARITY_CNT3 0x032e
|
||||
#define mmSQC_EDC_PARITY_CNT3_BASE_IDX 0
|
||||
#define mmSQC_EDC_CNT 0x03a2
|
||||
#define mmSQC_EDC_CNT_BASE_IDX 0
|
||||
#define mmSQ_EDC_SEC_CNT 0x03a3
|
||||
#define mmSQ_EDC_SEC_CNT_BASE_IDX 0
|
||||
#define mmSQ_EDC_DED_CNT 0x03a4
|
||||
#define mmSQ_EDC_DED_CNT_BASE_IDX 0
|
||||
#define mmSQ_EDC_INFO 0x03a5
|
||||
#define mmSQ_EDC_INFO_BASE_IDX 0
|
||||
#define mmSQ_EDC_CNT 0x03a6
|
||||
#define mmSQ_EDC_CNT_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_tpdec
|
||||
// base address: 0x9400
|
||||
#define mmTA_EDC_CNT 0x0586
|
||||
#define mmTA_EDC_CNT_BASE_IDX 0
|
||||
#define mmTA_EDC_CNT 0x0586
|
||||
#define mmTA_EDC_CNT_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_tcdec
|
||||
// base address: 0xac00
|
||||
#define mmTCP_EDC_CNT 0x0b17
|
||||
#define mmTCP_EDC_CNT_BASE_IDX 0
|
||||
#define mmTCP_EDC_CNT_NEW 0x0b18
|
||||
#define mmTCP_EDC_CNT_NEW_BASE_IDX 0
|
||||
#define mmTCP_ATC_EDC_GATCL1_CNT 0x12b1
|
||||
#define mmTCP_ATC_EDC_GATCL1_CNT_BASE_IDX 0
|
||||
#define mmTCI_EDC_CNT 0x0b60
|
||||
#define mmTCI_EDC_CNT_BASE_IDX 0
|
||||
#define mmTCC_EDC_CNT 0x0b82
|
||||
#define mmTCC_EDC_CNT_BASE_IDX 0
|
||||
#define mmTCC_EDC_CNT2 0x0b83
|
||||
#define mmTCC_EDC_CNT2_BASE_IDX 0
|
||||
#define mmTCA_EDC_CNT 0x0bc5
|
||||
#define mmTCA_EDC_CNT_BASE_IDX 0
|
||||
#define mmTCP_EDC_CNT 0x0b17
|
||||
#define mmTCP_EDC_CNT_BASE_IDX 0
|
||||
#define mmTCP_EDC_CNT_NEW 0x0b18
|
||||
#define mmTCP_EDC_CNT_NEW_BASE_IDX 0
|
||||
#define mmTCP_ATC_EDC_GATCL1_CNT 0x12b1
|
||||
#define mmTCP_ATC_EDC_GATCL1_CNT_BASE_IDX 0
|
||||
#define mmTCI_EDC_CNT 0x0b60
|
||||
#define mmTCI_EDC_CNT_BASE_IDX 0
|
||||
#define mmTCC_EDC_CNT 0x0b82
|
||||
#define mmTCC_EDC_CNT_BASE_IDX 0
|
||||
#define mmTCC_EDC_CNT2 0x0b83
|
||||
#define mmTCC_EDC_CNT2_BASE_IDX 0
|
||||
#define mmTCA_EDC_CNT 0x0bc5
|
||||
#define mmTCA_EDC_CNT_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_tpdec
|
||||
// base address: 0x9400
|
||||
#define mmTD_EDC_CNT 0x052e
|
||||
#define mmTD_EDC_CNT_BASE_IDX 0
|
||||
#define mmTA_EDC_CNT 0x0586
|
||||
#define mmTA_EDC_CNT_BASE_IDX 0
|
||||
#define mmTD_EDC_CNT 0x052e
|
||||
#define mmTD_EDC_CNT_BASE_IDX 0
|
||||
#define mmTA_EDC_CNT 0x0586
|
||||
#define mmTA_EDC_CNT_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_ea_gceadec2
|
||||
// base address: 0x9c00
|
||||
#define mmGCEA_EDC_CNT 0x0706
|
||||
#define mmGCEA_EDC_CNT_BASE_IDX 0
|
||||
#define mmGCEA_EDC_CNT2 0x0707
|
||||
#define mmGCEA_EDC_CNT2_BASE_IDX 0
|
||||
#define mmGCEA_EDC_CNT3 0x071b
|
||||
#define mmGCEA_EDC_CNT3_BASE_IDX 0
|
||||
#define mmGCEA_ERR_STATUS 0x0712
|
||||
#define mmGCEA_ERR_STATUS_BASE_IDX 0
|
||||
#define mmGCEA_EDC_CNT 0x0706
|
||||
#define mmGCEA_EDC_CNT_BASE_IDX 0
|
||||
#define mmGCEA_EDC_CNT2 0x0707
|
||||
#define mmGCEA_EDC_CNT2_BASE_IDX 0
|
||||
#define mmGCEA_EDC_CNT3 0x071b
|
||||
#define mmGCEA_EDC_CNT3_BASE_IDX 0
|
||||
#define mmGCEA_ERR_STATUS 0x0712
|
||||
#define mmGCEA_ERR_STATUS_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_gfxudec
|
||||
// base address: 0x30000
|
||||
#define mmSCRATCH_REG0 0x2040
|
||||
#define mmSCRATCH_REG0_BASE_IDX 1
|
||||
#define mmSCRATCH_REG1 0x2041
|
||||
#define mmSCRATCH_REG1_BASE_IDX 1
|
||||
#define mmSCRATCH_REG2 0x2042
|
||||
#define mmSCRATCH_REG2_BASE_IDX 1
|
||||
#define mmSCRATCH_REG3 0x2043
|
||||
#define mmSCRATCH_REG3_BASE_IDX 1
|
||||
#define mmSCRATCH_REG4 0x2044
|
||||
#define mmSCRATCH_REG4_BASE_IDX 1
|
||||
#define mmSCRATCH_REG5 0x2045
|
||||
#define mmSCRATCH_REG5_BASE_IDX 1
|
||||
#define mmSCRATCH_REG6 0x2046
|
||||
#define mmSCRATCH_REG6_BASE_IDX 1
|
||||
#define mmSCRATCH_REG7 0x2047
|
||||
#define mmSCRATCH_REG7_BASE_IDX 1
|
||||
#define mmGRBM_GFX_INDEX 0x2200
|
||||
#define mmGRBM_GFX_INDEX_BASE_IDX 1
|
||||
#define mmSCRATCH_REG0 0x2040
|
||||
#define mmSCRATCH_REG0_BASE_IDX 1
|
||||
#define mmSCRATCH_REG1 0x2041
|
||||
#define mmSCRATCH_REG1_BASE_IDX 1
|
||||
#define mmSCRATCH_REG2 0x2042
|
||||
#define mmSCRATCH_REG2_BASE_IDX 1
|
||||
#define mmSCRATCH_REG3 0x2043
|
||||
#define mmSCRATCH_REG3_BASE_IDX 1
|
||||
#define mmSCRATCH_REG4 0x2044
|
||||
#define mmSCRATCH_REG4_BASE_IDX 1
|
||||
#define mmSCRATCH_REG5 0x2045
|
||||
#define mmSCRATCH_REG5_BASE_IDX 1
|
||||
#define mmSCRATCH_REG6 0x2046
|
||||
#define mmSCRATCH_REG6_BASE_IDX 1
|
||||
#define mmSCRATCH_REG7 0x2047
|
||||
#define mmSCRATCH_REG7_BASE_IDX 1
|
||||
#define mmGRBM_GFX_INDEX 0x2200
|
||||
#define mmGRBM_GFX_INDEX_BASE_IDX 1
|
||||
|
||||
// addressBlock: gc_utcl2_atcl2dec
|
||||
// base address: 0xa000
|
||||
#define mmATC_L2_CACHE_4K_DSM_INDEX 0x080e
|
||||
#define mmATC_L2_CACHE_4K_DSM_INDEX_BASE_IDX 0
|
||||
#define mmATC_L2_CACHE_2M_DSM_INDEX 0x080f
|
||||
#define mmATC_L2_CACHE_2M_DSM_INDEX_BASE_IDX 0
|
||||
#define mmATC_L2_CACHE_4K_DSM_CNTL 0x0810
|
||||
#define mmATC_L2_CACHE_4K_DSM_CNTL_BASE_IDX 0
|
||||
#define mmATC_L2_CACHE_2M_DSM_CNTL 0x0811
|
||||
#define mmATC_L2_CACHE_2M_DSM_CNTL_BASE_IDX 0
|
||||
#define mmATC_L2_CACHE_4K_DSM_INDEX 0x080e
|
||||
#define mmATC_L2_CACHE_4K_DSM_INDEX_BASE_IDX 0
|
||||
#define mmATC_L2_CACHE_2M_DSM_INDEX 0x080f
|
||||
#define mmATC_L2_CACHE_2M_DSM_INDEX_BASE_IDX 0
|
||||
#define mmATC_L2_CACHE_4K_DSM_CNTL 0x0810
|
||||
#define mmATC_L2_CACHE_4K_DSM_CNTL_BASE_IDX 0
|
||||
#define mmATC_L2_CACHE_2M_DSM_CNTL 0x0811
|
||||
#define mmATC_L2_CACHE_2M_DSM_CNTL_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_utcl2_vml2pfdec
|
||||
// base address: 0xa100
|
||||
#define mmVML2_MEM_ECC_INDEX 0x0860
|
||||
#define mmVML2_MEM_ECC_INDEX_BASE_IDX 0
|
||||
#define mmVML2_WALKER_MEM_ECC_INDEX 0x0861
|
||||
#define mmVML2_WALKER_MEM_ECC_INDEX_BASE_IDX 0
|
||||
#define mmUTCL2_MEM_ECC_INDEX 0x0862
|
||||
#define mmUTCL2_MEM_ECC_INDEX_BASE_IDX 0
|
||||
#define mmVML2_MEM_ECC_INDEX 0x0860
|
||||
#define mmVML2_MEM_ECC_INDEX_BASE_IDX 0
|
||||
#define mmVML2_WALKER_MEM_ECC_INDEX 0x0861
|
||||
#define mmVML2_WALKER_MEM_ECC_INDEX_BASE_IDX 0
|
||||
#define mmUTCL2_MEM_ECC_INDEX 0x0862
|
||||
#define mmUTCL2_MEM_ECC_INDEX_BASE_IDX 0
|
||||
|
||||
#define mmVML2_MEM_ECC_CNTL 0x0863
|
||||
#define mmVML2_MEM_ECC_CNTL_BASE_IDX 0
|
||||
#define mmVML2_WALKER_MEM_ECC_CNTL 0x0864
|
||||
#define mmVML2_WALKER_MEM_ECC_CNTL_BASE_IDX 0
|
||||
#define mmUTCL2_MEM_ECC_CNTL 0x0865
|
||||
#define mmUTCL2_MEM_ECC_CNTL_BASE_IDX 0
|
||||
#define mmVML2_MEM_ECC_CNTL 0x0863
|
||||
#define mmVML2_MEM_ECC_CNTL_BASE_IDX 0
|
||||
#define mmVML2_WALKER_MEM_ECC_CNTL 0x0864
|
||||
#define mmVML2_WALKER_MEM_ECC_CNTL_BASE_IDX 0
|
||||
#define mmUTCL2_MEM_ECC_CNTL 0x0865
|
||||
#define mmUTCL2_MEM_ECC_CNTL_BASE_IDX 0
|
||||
|
||||
// addressBlock: gc_rlcpdec
|
||||
// base address: 0x3b000
|
||||
#define mmRLC_EDC_CNT 0x4d40
|
||||
#define mmRLC_EDC_CNT_BASE_IDX 1
|
||||
#define mmRLC_EDC_CNT2 0x4d41
|
||||
#define mmRLC_EDC_CNT2_BASE_IDX 1
|
||||
#define mmRLC_EDC_CNT 0x4d40
|
||||
#define mmRLC_EDC_CNT_BASE_IDX 1
|
||||
#define mmRLC_EDC_CNT2 0x4d41
|
||||
#define mmRLC_EDC_CNT2_BASE_IDX 1
|
||||
|
||||
#endif
|
||||
|
||||
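Each register in the header above is defined as a pair: a dword offset (`mmNAME`) and a segment index (`mmNAME_BASE_IDX`). The sketch below shows one way such a pair might be turned into a byte address, assuming the usual convention of adding the offset to a per-segment base and multiplying by 4 (the same dword-to-byte scaling as the `addr * 4` seek in `debugfs_ioctl_write_register`). The segment base values are assumptions for illustration: `0x2000` is chosen so that `mmGRBM_CNTL` lands on the `0x8000` base address in the header comment, and `0xA000` is a placeholder for segment 1. Real values would come from the driver's IP offset tables.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Two register pairs copied from the header in the diff.
#define mmGRBM_CNTL 0x0000
#define mmGRBM_CNTL_BASE_IDX 0
#define mmGRBM_GFX_INDEX 0x2200
#define mmGRBM_GFX_INDEX_BASE_IDX 1

// Hypothetical segment bases, in dwords (assumed values, see the text above).
constexpr std::array<uint32_t, 2> kSegmentBaseDwords = {0x2000, 0xA000};

// Hypothetical helper: dword offset + segment index -> byte address.
inline uint64_t reg_byte_offset(uint32_t reg, uint32_t base_idx) {
  assert(base_idx < kSegmentBaseDwords.size());
  const uint64_t dword_addr = static_cast<uint64_t>(kSegmentBaseDwords[base_idx]) + reg;
  return dword_addr * 4;  // registers are 32 bits wide
}
```

With these assumed bases, `reg_byte_offset(mmGRBM_CNTL, mmGRBM_CNTL_BASE_IDX)` evaluates to `0x8000`, which agrees with the `gc_grbmdec` base address comment.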
(Diffs for three files are not shown because the files are too large.)
@@ -41,18 +41,17 @@ namespace rocprofiler::pc_sampler::gfxip {
|
||||
|
||||
namespace {
|
||||
|
||||
static int find_pci_instance(const std::string &pci_string) {
|
||||
static int find_pci_instance(const std::string& pci_string) {
|
||||
rocprofiler::handle_t<DIR*, util::dir_closer> dir(opendir(DEBUG_DRI_PATH));
|
||||
if (dir.get() == nullptr) {
|
||||
char *errstr = strerror(errno);
|
||||
char* errstr = strerror(errno);
|
||||
warning("Can't open debugfs dri directory: %s\n", errstr);
|
||||
goto fail;
|
||||
}
|
||||
|
||||
struct dirent *dent;
|
||||
struct dirent* dent;
|
||||
while ((dent = readdir(dir.get())) != nullptr) {
|
||||
if (strcmp(dent->d_name, ".") == 0 || strcmp(dent->d_name, "..") == 0)
|
||||
continue;
|
||||
if (strcmp(dent->d_name, ".") == 0 || strcmp(dent->d_name, "..") == 0) continue;
|
||||
|
||||
std::string name(DEBUG_DRI_PATH);
|
||||
name += dent->d_name;
|
||||
@@ -66,8 +65,7 @@ static int find_pci_instance(const std::string &pci_string) {
|
||||
ifs >> device;
|
||||
}
|
||||
if (device.empty()) continue;
|
||||
if (auto p = device.find(DEV_PFX); p != device.npos)
|
||||
device.erase(p, strlen(DEV_PFX));
|
||||
if (auto p = device.find(DEV_PFX); p != device.npos) device.erase(p, strlen(DEV_PFX));
|
||||
if (pci_string == device) return std::stoi(dent->d_name);
|
||||
}
|
||||
|
||||
@@ -75,7 +73,7 @@ fail:
|
||||
return -1;
|
||||
}
|
||||
|
||||
} // namespace
|
||||
} // namespace
|
||||
|
||||
uint32_t pasid() {
|
||||
static std::optional<uint32_t> pasid;
|
||||
@@ -89,9 +87,7 @@ uint32_t pasid() {
|
||||
return *pasid;
|
||||
}
|
||||
|
||||
int debugfs_ioctl_set_state(
|
||||
const device_t& dev,
|
||||
const struct amdgpu_debugfs_regs2_iocdata &ioc) {
|
||||
int debugfs_ioctl_set_state(const device_t& dev, const struct amdgpu_debugfs_regs2_iocdata& ioc) {
|
||||
int ret = ioctl(dev.fd_.mmio2.get(), AMDGPU_DEBUGFS_REGS2_IOC_SET_STATE, &ioc);
|
||||
if (ret < 0) {
|
||||
fatal("Couldn't set register ioctl state\n");
|
||||
@@ -99,11 +95,9 @@ int debugfs_ioctl_set_state(
|
||||
return ret;
|
||||
}
|
||||
|
||||
int debugfs_ioctl_write_register(
|
||||
const device_t &dev,
|
||||
const struct amdgpu_debugfs_regs2_iocdata &ioc,
|
||||
const uint64_t addr,
|
||||
const uint32_t value) {
|
||||
int debugfs_ioctl_write_register(const device_t& dev,
|
||||
const struct amdgpu_debugfs_regs2_iocdata& ioc,
|
||||
const uint64_t addr, const uint32_t value) {
|
||||
debugfs_ioctl_set_state(dev, ioc);
|
||||
if (lseek(dev.fd_.mmio2.get(), addr * 4, SEEK_SET) < 0) {
|
||||
fatal("Cannot seek to MMIO address for write\n");
|
||||
@@ -115,10 +109,9 @@ int debugfs_ioctl_write_register(
|
||||
return r;
|
||||
}
|
||||
|
||||
uint32_t debugfs_ioctl_read_register(
|
||||
const device_t& dev,
|
||||
const struct amdgpu_debugfs_regs2_iocdata &ioc,
|
||||
const uint64_t addr) {
|
||||
uint32_t debugfs_ioctl_read_register(const device_t& dev,
|
||||
const struct amdgpu_debugfs_regs2_iocdata& ioc,
|
||||
const uint64_t addr) {
|
||||
// Select the SE, SH, and CU.
|
||||
debugfs_ioctl_set_state(dev, ioc);
|
||||
|
||||
@@ -134,20 +127,17 @@ uint32_t debugfs_ioctl_read_register(
|
||||
return value;
}

device_t::device_t(const bool pci_inited, const Agent::AgentInfo &info)
: agent_info_(info)
, pci_memory_(nullptr)
{
device_t::device_t(const bool pci_inited, const Agent::AgentInfo& info)
: agent_info_(info), pci_memory_(nullptr) {
const auto pci_domain = agent_info_.getPCIDomain();
const auto pci_location_id = agent_info_.getPCILocationID();

std::string name([pci_domain, pci_location_id]() {
std::ostringstream out;
out.fill('0');
out << std::hex << std::setw(4) << pci_domain << ':'
<< std::hex << std::setw(2) << (pci_location_id >> 8) << ':'
<< std::hex << std::setw(2) << (pci_location_id & 0xFF) << '.'
<< 0;
out << std::hex << std::setw(4) << pci_domain << ':' << std::hex << std::setw(2)
<< (pci_location_id >> 8) << ':' << std::hex << std::setw(2) << (pci_location_id & 0xFF)
<< '.' << 0;
return out.str();
}());

@@ -162,8 +152,7 @@ device_t::device_t(const bool pci_inited, const Agent::AgentInfo &info)
if (fd_.mmio2.get() < 0) {
warning("Couldn't open amdgpu_regs2 debugfs file\n");
if (!pci_inited) {
constexpr char msg[] =
"PCI system uninitialized; no PC sampling methods available\n";
constexpr char msg[] = "PCI system uninitialized; no PC sampling methods available\n";
fatal(msg);
}
} else {
@@ -173,8 +162,7 @@ device_t::device_t(const bool pci_inited, const Agent::AgentInfo &info)

pci_device_ =
pci_device_find_by_slot(pci_domain, pci_location_id >> 8, pci_location_id & 0xFF, 0);
if (!pci_device_ || pci_device_probe(pci_device_))
fatal("failed to probe the GPU device\n");
if (!pci_device_ || pci_device_probe(pci_device_)) fatal("failed to probe the GPU device\n");

// Look for a region between 256KB and 4096KB, 32-bit, non IO, and non prefetchable.
for (size_t region = 0; region < sizeof(pci_device::regions) / sizeof(pci_device::regions[0]);
@@ -199,11 +187,9 @@ device_specific_init:
}

device_t::~device_t() {
if (pci_memory_ &&
pci_device_unmap_range(pci_device_, pci_memory_, pci_memory_size_))
{
if (pci_memory_ && pci_device_unmap_range(pci_device_, pci_memory_, pci_memory_size_)) {
warning("failed to unmap the pci memory\n");
}
}

} // namespace rocprofiler::pc_sampler::gfxip
} // namespace rocprofiler::pc_sampler::gfxip

@@ -52,14 +52,18 @@ namespace gfxip {
namespace util {

struct dir_closer {
void operator()(DIR *dir) { if (dir != nullptr) closedir(dir); }
void operator()(DIR* dir) {
if (dir != nullptr) closedir(dir);
}
};

struct fd_closer {
void operator()(int fd) { if (fd >= 0) close(fd); }
void operator()(int fd) {
if (fd >= 0) close(fd);
}
};

} // namespace rocprofiler::pc_sampler::gfxip::util
} // namespace util

struct amdgpu_debugfs_regs2_iocdata {
__u32 use_srbm, use_grbm, pg_lock;
@@ -71,11 +75,10 @@ struct amdgpu_debugfs_regs2_iocdata {
} srbm;
};

enum AMDGPU_DEBUGFS_REGS2_CMDS {
AMDGPU_DEBUGFS_REGS2_CMD_SET_STATE = 0
};
enum AMDGPU_DEBUGFS_REGS2_CMDS { AMDGPU_DEBUGFS_REGS2_CMD_SET_STATE = 0 };

#define AMDGPU_DEBUGFS_REGS2_IOC_SET_STATE _IOWR(0x20, AMDGPU_DEBUGFS_REGS2_CMD_SET_STATE, struct amdgpu_debugfs_regs2_iocdata)
#define AMDGPU_DEBUGFS_REGS2_IOC_SET_STATE \
_IOWR(0x20, AMDGPU_DEBUGFS_REGS2_CMD_SET_STATE, struct amdgpu_debugfs_regs2_iocdata)

enum {
GC_HWIP = 1, // Graphics Core IP
@@ -96,14 +99,14 @@ static constexpr int HWIP_MAX_INSTANCE = 11;
(REG_FIELD_MASK(reg, field) & ((field_val) << REG_FIELD_SHIFT(reg, field))))

struct device_t {
device_t(const bool pci_inited, const Agent::AgentInfo &agent_info);
device_t(const bool pci_inited, const Agent::AgentInfo& agent_info);
~device_t();

device_t(const device_t&) = delete;
device_t& operator=(const device_t&) = delete;
device_t(device_t&&) = default;

const Agent::AgentInfo &agent_info_;
const Agent::AgentInfo& agent_info_;

struct pci_device* pci_device_;
size_t pci_memory_size_;
@@ -120,19 +123,23 @@ struct device_t {

uint32_t pasid();

int debugfs_ioctl_set_state(const device_t& dev, const struct amdgpu_debugfs_regs2_iocdata &ioc);
int debugfs_ioctl_write_register(const device_t &dev, const struct amdgpu_debugfs_regs2_iocdata &ioc, const uint64_t addr, const uint32_t value);
uint32_t debugfs_ioctl_read_register(const device_t& dev, const struct amdgpu_debugfs_regs2_iocdata &ioc, const uint64_t addr);
int debugfs_ioctl_set_state(const device_t& dev, const struct amdgpu_debugfs_regs2_iocdata& ioc);
int debugfs_ioctl_write_register(const device_t& dev,
const struct amdgpu_debugfs_regs2_iocdata& ioc,
const uint64_t addr, const uint32_t value);
uint32_t debugfs_ioctl_read_register(const device_t& dev,
const struct amdgpu_debugfs_regs2_iocdata& ioc,
const uint64_t addr);

void vega10_reg_offset_init(device_t& dev);
void vega20_reg_offset_init(device_t& dev);
void arct_reg_offset_init(device_t& dev);
void aldebaran_reg_offset_init(device_t& dev);

void read_pc_samples_v9(const device_t& dev, PCSampler *sampler);
void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler);
void read_pc_samples_v9(const device_t& dev, PCSampler* sampler);
void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler* sampler);

} // namespace rocprofiler::pc_sampler::gfxip
} // namespace gfxip

} // namespace rocprofiler::pc_sampler
} // namespace rocprofiler::pc_sampler
#endif // SRC_PCSAMPLER_GFXIP_GFXIP_H_

@@ -54,12 +54,10 @@ uint32_t read_sq_register(const device_t& dev, uint32_t simd, uint32_t wave_id,
return dev.pci_memory_[REG_OFFSET(GC, 0, mmSQ_IND_DATA)];
}

uint32_t debugfs_ioctl_read_sq_register(
const device_t &dev,
const struct amdgpu_debugfs_regs2_iocdata &ioc,
const uint32_t simd,
const uint32_t wave_id,
const uint32_t register_address) {
uint32_t debugfs_ioctl_read_sq_register(const device_t& dev,
const struct amdgpu_debugfs_regs2_iocdata& ioc,
const uint32_t simd, const uint32_t wave_id,
const uint32_t register_address) {
uint32_t data = REG_SET_FIELD(0, SQ_IND_INDEX, WAVE_ID, wave_id);
data = REG_SET_FIELD(data, SQ_IND_INDEX, SIMD_ID, simd);
data = REG_SET_FIELD(data, SQ_IND_INDEX, INDEX, register_address);
@@ -67,21 +65,15 @@ uint32_t debugfs_ioctl_read_sq_register(
return debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmSQ_IND_DATA));
}

void fill_record(
const device_t &dev,
rocprofiler_record_pc_sample_t *record,
uint32_t se,
uint64_t pc,
hsa_kernel_dispatch_packet_t *pkt) {

void fill_record(const device_t& dev, rocprofiler_record_pc_sample_t* record, uint32_t se,
uint64_t pc, hsa_kernel_dispatch_packet_t* pkt) {
/*
* XXX: Use of the reserved2 field in the HSA dispatch packet to uniquely
* identify kernel dispatches for PC sampling is an internal implementation
* detail which is subject to change. See the comment associated with
* rocprofiler::rocprofiler::kernel_dispatch_counter_.
*/
record->pc_sample.dispatch_id =
rocprofiler_kernel_dispatch_id_t{pkt->reserved2};
record->pc_sample.dispatch_id = rocprofiler_kernel_dispatch_id_t{pkt->reserved2};

/*
* TODO: Fill this with gpu_clock_counter via AMDKFD_IOC_GET_CLOCK_COUNTERS,
@@ -98,12 +90,12 @@ void fill_record(
* Future sampling methods may fill this in automatically from the GPU's
* real-time counter.
*/
//record->pc_sample.cycle = 0;
// record->pc_sample.cycle = 0;
rocprofiler_get_timestamp(&record->pc_sample.timestamp);

record->pc_sample.pc = pc;
record->pc_sample.se = se;
const auto &hdl = dev.agent_info_.getHandle();
const auto& hdl = dev.agent_info_.getHandle();

/*
* XXX FIXME: For consistency, this is the same method as used by
@@ -112,17 +104,16 @@ void fill_record(
* comment in rocprofiler::hsa_support::Initialize about using KFD's gpu_id for
* more information.
*/
record->pc_sample.gpu_id = rocprofiler_agent_id_t{
(uint64_t)rocprofiler::hsa_support::GetAgentInfo(hdl).getIndex()};
record->pc_sample.gpu_id =
rocprofiler_agent_id_t{(uint64_t)rocprofiler::hsa_support::GetAgentInfo(hdl).getIndex()};
}

} // namespace

void read_pc_samples_v9(const device_t& dev, PCSampler *sampler) {
void read_pc_samples_v9(const device_t& dev, PCSampler* sampler) {
assert(sampler);

uint32_t saved_grbm_gfx_index =
dev.pci_memory_[REG_OFFSET(GC, 0, mmGRBM_GFX_INDEX)];
uint32_t saved_grbm_gfx_index = dev.pci_memory_[REG_OFFSET(GC, 0, mmGRBM_GFX_INDEX)];
uint32_t data;

for (uint32_t se = 0; se < dev.agent_info_.getShaderEngineCount(); ++se)
@@ -174,19 +165,16 @@ void read_pc_samples_v9(const device_t& dev, PCSampler *sampler) {
data = REG_SET_FIELD(data, GRBM_GFX_CNTL, VMID, vm_id);
dev.pci_memory_[REG_OFFSET(GC, 0, mmGRBM_GFX_CNTL)] = data;

uint32_t pq_base_lo =
dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE)];
uint32_t pq_base_hi =
dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)] & 0xff;
uint32_t pq_base_lo = dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE)];
uint32_t pq_base_hi = dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)] & 0xff;
uint64_t pq_base = (uint64_t)pq_base_hi << 40 | (uint64_t)pq_base_lo << 8;
uint32_t cp_hqd_pq_control_queue_size =
dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_CONTROL)] & 0x3f;
dev.pci_memory_[REG_OFFSET(GC, 0, mmCP_HQD_PQ_CONTROL)] & 0x3f;
uint32_t queue_size = 1 << (cp_hqd_pq_control_queue_size + 1);

auto pkt = (hsa_kernel_dispatch_packet_t*)(
pq_base + disp_idx % queue_size *
sizeof(hsa_kernel_dispatch_packet_t)
);
auto pkt = (hsa_kernel_dispatch_packet_t*)(pq_base +
disp_idx % queue_size *
sizeof(hsa_kernel_dispatch_packet_t));
fill_record(dev, &record, se, *pc, pkt);
}

@@ -208,10 +196,10 @@ void read_pc_samples_v9(const device_t& dev, PCSampler *sampler) {
dev.pci_memory_[REG_OFFSET(GC, 0, mmGRBM_GFX_INDEX)] = saved_grbm_gfx_index;
}

void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler) {
void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler* sampler) {
assert(sampler);

struct amdgpu_debugfs_regs2_iocdata ioc{};
struct amdgpu_debugfs_regs2_iocdata ioc {};
ioc.use_grbm = 1;

uint32_t data;
@@ -236,11 +224,13 @@ void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler) {

// Skip this slot if the wave is not valid.
debugfs_ioctl_set_state(dev, ioc);
uint32_t status = debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_STATUS);
uint32_t status =
debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_STATUS);
if (!REG_GET_FIELD(status, SQ_WAVE_STATUS, VALID)) continue;

debugfs_ioctl_set_state(dev, ioc);
uint32_t hw_id = debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_HW_ID);
uint32_t hw_id =
debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_HW_ID);
uint32_t vm_id = REG_GET_FIELD(hw_id, SQ_WAVE_HW_ID, VM_ID);

rocprofiler_record_pc_sample_t record;
@@ -248,12 +238,16 @@ void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler) {
// If the wave's PASID matches the process', read and report the PC
// and dispatch packet for the wave.
std::optional<uint64_t> pc;
if (debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(OSSSYS, 0, mmIH_VMID_0_LUT) + vm_id) == pasid()) {
pc = (uint64_t)debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_PC_HI) << 32 |
if (debugfs_ioctl_read_register(
dev, ioc, REG_OFFSET(OSSSYS, 0, mmIH_VMID_0_LUT) + vm_id) == pasid()) {
pc =
(uint64_t)debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_PC_HI)
<< 32 |
debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_PC_LO);

// The dispatch index into the queue
uint32_t disp_idx = debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_TTMP6);
uint32_t disp_idx =
debugfs_ioctl_read_sq_register(dev, ioc, simd, wave_id, ixSQ_WAVE_TTMP6);

// Set up reading CP_HQD_PQ_BASE and CP_HQD_PQ_BASE_HI
uint32_t pipe_id = REG_GET_FIELD(hw_id, SQ_WAVE_HW_ID, PIPE_ID);
@@ -266,18 +260,19 @@ void read_pc_samples_v9_ioctl(const device_t& dev, PCSampler *sampler) {
debugfs_ioctl_write_register(dev, ioc, REG_OFFSET(GC, 0, mmGRBM_GFX_CNTL), data);

uint32_t pq_base_lo =
debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE));
debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE));
uint32_t pq_base_hi =
debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)) & 0xff;
debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)) &
0xff;
uint64_t pq_base = (uint64_t)pq_base_hi << 40 | (uint64_t)pq_base_lo << 8;
uint32_t cp_hqd_pq_control_queue_size =
debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_CONTROL)) & 0x3f;
debugfs_ioctl_read_register(dev, ioc, REG_OFFSET(GC, 0, mmCP_HQD_PQ_CONTROL)) &
0x3f;
uint32_t queue_size = 1 << (cp_hqd_pq_control_queue_size + 1);

auto pkt = (hsa_kernel_dispatch_packet_t*)(
pq_base + disp_idx % queue_size *
sizeof(hsa_kernel_dispatch_packet_t)
);
auto pkt = (hsa_kernel_dispatch_packet_t*)(pq_base +
disp_idx % queue_size *
sizeof(hsa_kernel_dispatch_packet_t));
fill_record(dev, &record, se, *pc, pkt);
}


@@ -22,306 +22,305 @@
#define _osssys_4_0_OFFSET_HEADER



// addressBlock: osssys_osssysdec
// base address: 0x4280
#define mmIH_VMID_0_LUT 0x0000
#define mmIH_VMID_0_LUT_BASE_IDX 0
#define mmIH_VMID_1_LUT 0x0001
#define mmIH_VMID_1_LUT_BASE_IDX 0
#define mmIH_VMID_2_LUT 0x0002
#define mmIH_VMID_2_LUT_BASE_IDX 0
#define mmIH_VMID_3_LUT 0x0003
#define mmIH_VMID_3_LUT_BASE_IDX 0
#define mmIH_VMID_4_LUT 0x0004
#define mmIH_VMID_4_LUT_BASE_IDX 0
#define mmIH_VMID_5_LUT 0x0005
#define mmIH_VMID_5_LUT_BASE_IDX 0
#define mmIH_VMID_6_LUT 0x0006
#define mmIH_VMID_6_LUT_BASE_IDX 0
#define mmIH_VMID_7_LUT 0x0007
#define mmIH_VMID_7_LUT_BASE_IDX 0
#define mmIH_VMID_8_LUT 0x0008
#define mmIH_VMID_8_LUT_BASE_IDX 0
#define mmIH_VMID_9_LUT 0x0009
#define mmIH_VMID_9_LUT_BASE_IDX 0
#define mmIH_VMID_10_LUT 0x000a
#define mmIH_VMID_10_LUT_BASE_IDX 0
#define mmIH_VMID_11_LUT 0x000b
#define mmIH_VMID_11_LUT_BASE_IDX 0
#define mmIH_VMID_12_LUT 0x000c
#define mmIH_VMID_12_LUT_BASE_IDX 0
#define mmIH_VMID_13_LUT 0x000d
#define mmIH_VMID_13_LUT_BASE_IDX 0
#define mmIH_VMID_14_LUT 0x000e
#define mmIH_VMID_14_LUT_BASE_IDX 0
#define mmIH_VMID_15_LUT 0x000f
#define mmIH_VMID_15_LUT_BASE_IDX 0
#define mmIH_VMID_0_LUT_MM 0x0010
#define mmIH_VMID_0_LUT_MM_BASE_IDX 0
#define mmIH_VMID_1_LUT_MM 0x0011
#define mmIH_VMID_1_LUT_MM_BASE_IDX 0
#define mmIH_VMID_2_LUT_MM 0x0012
#define mmIH_VMID_2_LUT_MM_BASE_IDX 0
#define mmIH_VMID_3_LUT_MM 0x0013
#define mmIH_VMID_3_LUT_MM_BASE_IDX 0
#define mmIH_VMID_4_LUT_MM 0x0014
#define mmIH_VMID_4_LUT_MM_BASE_IDX 0
#define mmIH_VMID_5_LUT_MM 0x0015
#define mmIH_VMID_5_LUT_MM_BASE_IDX 0
#define mmIH_VMID_6_LUT_MM 0x0016
#define mmIH_VMID_6_LUT_MM_BASE_IDX 0
#define mmIH_VMID_7_LUT_MM 0x0017
#define mmIH_VMID_7_LUT_MM_BASE_IDX 0
#define mmIH_VMID_8_LUT_MM 0x0018
#define mmIH_VMID_8_LUT_MM_BASE_IDX 0
#define mmIH_VMID_9_LUT_MM 0x0019
#define mmIH_VMID_9_LUT_MM_BASE_IDX 0
#define mmIH_VMID_10_LUT_MM 0x001a
#define mmIH_VMID_10_LUT_MM_BASE_IDX 0
#define mmIH_VMID_11_LUT_MM 0x001b
#define mmIH_VMID_11_LUT_MM_BASE_IDX 0
#define mmIH_VMID_12_LUT_MM 0x001c
#define mmIH_VMID_12_LUT_MM_BASE_IDX 0
#define mmIH_VMID_13_LUT_MM 0x001d
#define mmIH_VMID_13_LUT_MM_BASE_IDX 0
#define mmIH_VMID_14_LUT_MM 0x001e
#define mmIH_VMID_14_LUT_MM_BASE_IDX 0
#define mmIH_VMID_15_LUT_MM 0x001f
#define mmIH_VMID_15_LUT_MM_BASE_IDX 0
#define mmIH_COOKIE_0 0x0020
#define mmIH_COOKIE_0_BASE_IDX 0
#define mmIH_COOKIE_1 0x0021
#define mmIH_COOKIE_1_BASE_IDX 0
#define mmIH_COOKIE_2 0x0022
#define mmIH_COOKIE_2_BASE_IDX 0
#define mmIH_COOKIE_3 0x0023
#define mmIH_COOKIE_3_BASE_IDX 0
#define mmIH_COOKIE_4 0x0024
#define mmIH_COOKIE_4_BASE_IDX 0
#define mmIH_COOKIE_5 0x0025
#define mmIH_COOKIE_5_BASE_IDX 0
#define mmIH_COOKIE_6 0x0026
#define mmIH_COOKIE_6_BASE_IDX 0
#define mmIH_COOKIE_7 0x0027
#define mmIH_COOKIE_7_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART0 0x003f
#define mmIH_REGISTER_LAST_PART0_BASE_IDX 0
#define mmSEM_REQ_INPUT_0 0x0040
#define mmSEM_REQ_INPUT_0_BASE_IDX 0
#define mmSEM_REQ_INPUT_1 0x0041
#define mmSEM_REQ_INPUT_1_BASE_IDX 0
#define mmSEM_REQ_INPUT_2 0x0042
#define mmSEM_REQ_INPUT_2_BASE_IDX 0
#define mmSEM_REQ_INPUT_3 0x0043
#define mmSEM_REQ_INPUT_3_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART0 0x007f
#define mmSEM_REGISTER_LAST_PART0_BASE_IDX 0
#define mmIH_RB_CNTL 0x0080
#define mmIH_RB_CNTL_BASE_IDX 0
#define mmIH_RB_BASE 0x0081
#define mmIH_RB_BASE_BASE_IDX 0
#define mmIH_RB_BASE_HI 0x0082
#define mmIH_RB_BASE_HI_BASE_IDX 0
#define mmIH_RB_RPTR 0x0083
#define mmIH_RB_RPTR_BASE_IDX 0
#define mmIH_RB_WPTR 0x0084
#define mmIH_RB_WPTR_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_HI 0x0085
#define mmIH_RB_WPTR_ADDR_HI_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_LO 0x0086
#define mmIH_RB_WPTR_ADDR_LO_BASE_IDX 0
#define mmIH_DOORBELL_RPTR 0x0087
#define mmIH_DOORBELL_RPTR_BASE_IDX 0
#define mmIH_RB_CNTL_RING1 0x0088
#define mmIH_RB_CNTL_RING1_BASE_IDX 0
#define mmIH_RB_BASE_RING1 0x0089
#define mmIH_RB_BASE_RING1_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING1 0x008a
#define mmIH_RB_BASE_HI_RING1_BASE_IDX 0
#define mmIH_RB_RPTR_RING1 0x008b
#define mmIH_RB_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_WPTR_RING1 0x008c
#define mmIH_RB_WPTR_RING1_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING1 0x008f
#define mmIH_DOORBELL_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_CNTL_RING2 0x0090
#define mmIH_RB_CNTL_RING2_BASE_IDX 0
#define mmIH_RB_BASE_RING2 0x0091
#define mmIH_RB_BASE_RING2_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING2 0x0092
#define mmIH_RB_BASE_HI_RING2_BASE_IDX 0
#define mmIH_RB_RPTR_RING2 0x0093
#define mmIH_RB_RPTR_RING2_BASE_IDX 0
#define mmIH_RB_WPTR_RING2 0x0094
#define mmIH_RB_WPTR_RING2_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING2 0x0097
#define mmIH_DOORBELL_RPTR_RING2_BASE_IDX 0
#define mmIH_VERSION 0x0098
#define mmIH_VERSION_BASE_IDX 0
#define mmIH_CNTL 0x00c0
#define mmIH_CNTL_BASE_IDX 0
#define mmIH_CNTL2 0x00c1
#define mmIH_CNTL2_BASE_IDX 0
#define mmIH_STATUS 0x00c2
#define mmIH_STATUS_BASE_IDX 0
#define mmIH_PERFMON_CNTL 0x00c3
#define mmIH_PERFMON_CNTL_BASE_IDX 0
#define mmIH_PERFCOUNTER0_RESULT 0x00c4
#define mmIH_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmIH_PERFCOUNTER1_RESULT 0x00c5
#define mmIH_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_31_0 0x00c7
#define mmIH_DSM_MATCH_VALUE_BIT_31_0_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_63_32 0x00c8
#define mmIH_DSM_MATCH_VALUE_BIT_63_32_BASE_IDX 0
#define mmIH_DSM_MATCH_VALUE_BIT_95_64 0x00c9
#define mmIH_DSM_MATCH_VALUE_BIT_95_64_BASE_IDX 0
#define mmIH_DSM_MATCH_FIELD_CONTROL 0x00ca
#define mmIH_DSM_MATCH_FIELD_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_DATA_CONTROL 0x00cb
#define mmIH_DSM_MATCH_DATA_CONTROL_BASE_IDX 0
#define mmIH_DSM_MATCH_FCN_ID 0x00cc
#define mmIH_DSM_MATCH_FCN_ID_BASE_IDX 0
#define mmIH_LIMIT_INT_RATE_CNTL 0x00cd
#define mmIH_LIMIT_INT_RATE_CNTL_BASE_IDX 0
#define mmIH_VF_RB_STATUS 0x00ce
#define mmIH_VF_RB_STATUS_BASE_IDX 0
#define mmIH_VF_RB_STATUS2 0x00cf
#define mmIH_VF_RB_STATUS2_BASE_IDX 0
#define mmIH_VF_RB1_STATUS 0x00d0
#define mmIH_VF_RB1_STATUS_BASE_IDX 0
#define mmIH_VF_RB1_STATUS2 0x00d1
#define mmIH_VF_RB1_STATUS2_BASE_IDX 0
#define mmIH_VF_RB2_STATUS 0x00d2
#define mmIH_VF_RB2_STATUS_BASE_IDX 0
#define mmIH_VF_RB2_STATUS2 0x00d3
#define mmIH_VF_RB2_STATUS2_BASE_IDX 0
#define mmIH_INT_FLOOD_CNTL 0x00d5
#define mmIH_INT_FLOOD_CNTL_BASE_IDX 0
#define mmIH_RB0_INT_FLOOD_STATUS 0x00d6
#define mmIH_RB0_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB1_INT_FLOOD_STATUS 0x00d7
#define mmIH_RB1_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_RB2_INT_FLOOD_STATUS 0x00d8
#define mmIH_RB2_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_INT_FLOOD_STATUS 0x00d9
#define mmIH_INT_FLOOD_STATUS_BASE_IDX 0
#define mmIH_STORM_CLIENT_LIST_CNTL 0x00da
#define mmIH_STORM_CLIENT_LIST_CNTL_BASE_IDX 0
#define mmIH_CLK_CTRL 0x00db
#define mmIH_CLK_CTRL_BASE_IDX 0
#define mmIH_INT_FLAGS 0x00dc
#define mmIH_INT_FLAGS_BASE_IDX 0
#define mmIH_LAST_INT_INFO0 0x00dd
#define mmIH_LAST_INT_INFO0_BASE_IDX 0
#define mmIH_LAST_INT_INFO1 0x00de
#define mmIH_LAST_INT_INFO1_BASE_IDX 0
#define mmIH_LAST_INT_INFO2 0x00df
#define mmIH_LAST_INT_INFO2_BASE_IDX 0
#define mmIH_SCRATCH 0x00e0
#define mmIH_SCRATCH_BASE_IDX 0
#define mmIH_CLIENT_CREDIT_ERROR 0x00e1
#define mmIH_CLIENT_CREDIT_ERROR_BASE_IDX 0
#define mmIH_GPU_IOV_VIOLATION_LOG 0x00e2
#define mmIH_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmIH_COOKIE_REC_VIOLATION_LOG 0x00e3
#define mmIH_COOKIE_REC_VIOLATION_LOG_BASE_IDX 0
#define mmIH_CREDIT_STATUS 0x00e4
#define mmIH_CREDIT_STATUS_BASE_IDX 0
#define mmIH_MMHUB_ERROR 0x00e5
#define mmIH_MMHUB_ERROR_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART2 0x00ff
#define mmIH_REGISTER_LAST_PART2_BASE_IDX 0
#define mmSEM_CLK_CTRL 0x0100
#define mmSEM_CLK_CTRL_BASE_IDX 0
#define mmSEM_UTC_CREDIT 0x0101
#define mmSEM_UTC_CREDIT_BASE_IDX 0
#define mmSEM_UTC_CONFIG 0x0102
#define mmSEM_UTC_CONFIG_BASE_IDX 0
#define mmSEM_UTCL2_TRAN_EN_LUT 0x0103
#define mmSEM_UTCL2_TRAN_EN_LUT_BASE_IDX 0
#define mmSEM_MCIF_CONFIG 0x0104
#define mmSEM_MCIF_CONFIG_BASE_IDX 0
#define mmSEM_PERFMON_CNTL 0x0105
#define mmSEM_PERFMON_CNTL_BASE_IDX 0
#define mmSEM_PERFCOUNTER0_RESULT 0x0106
#define mmSEM_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmSEM_PERFCOUNTER1_RESULT 0x0107
#define mmSEM_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmSEM_STATUS 0x0108
#define mmSEM_STATUS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG 0x0109
#define mmSEM_MAILBOX_CLIENTCONFIG_BASE_IDX 0
#define mmSEM_MAILBOX 0x010a
#define mmSEM_MAILBOX_BASE_IDX 0
#define mmSEM_MAILBOX_CONTROL 0x010b
#define mmSEM_MAILBOX_CONTROL_BASE_IDX 0
#define mmSEM_CHICKEN_BITS 0x010c
#define mmSEM_CHICKEN_BITS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA 0x010d
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA_BASE_IDX 0
#define mmSEM_GPU_IOV_VIOLATION_LOG 0x010e
#define mmSEM_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmSEM_OUTSTANDING_THRESHOLD 0x010f
#define mmSEM_OUTSTANDING_THRESHOLD_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART2 0x017f
#define mmSEM_REGISTER_LAST_PART2_BASE_IDX 0
#define mmIH_ACTIVE_FCN_ID 0x0180
#define mmIH_ACTIVE_FCN_ID_BASE_IDX 0
#define mmIH_VIRT_RESET_REQ 0x0181
#define mmIH_VIRT_RESET_REQ_BASE_IDX 0
#define mmIH_CLIENT_CFG 0x0184
#define mmIH_CLIENT_CFG_BASE_IDX 0
#define mmIH_CLIENT_CFG_INDEX 0x0188
#define mmIH_CLIENT_CFG_INDEX_BASE_IDX 0
#define mmIH_CLIENT_CFG_DATA 0x0189
#define mmIH_CLIENT_CFG_DATA_BASE_IDX 0
#define mmIH_CID_REMAP_INDEX 0x018a
#define mmIH_CID_REMAP_INDEX_BASE_IDX 0
#define mmIH_CID_REMAP_DATA 0x018b
#define mmIH_CID_REMAP_DATA_BASE_IDX 0
#define mmIH_CHICKEN 0x018c
#define mmIH_CHICKEN_BASE_IDX 0
#define mmIH_MMHUB_CNTL 0x018d
#define mmIH_MMHUB_CNTL_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART1 0x019f
#define mmIH_REGISTER_LAST_PART1_BASE_IDX 0
#define mmSEM_ACTIVE_FCN_ID 0x01a0
#define mmSEM_ACTIVE_FCN_ID_BASE_IDX 0
#define mmSEM_VIRT_RESET_REQ 0x01a1
#define mmSEM_VIRT_RESET_REQ_BASE_IDX 0
#define mmSEM_RESP_SDMA0 0x01a4
#define mmSEM_RESP_SDMA0_BASE_IDX 0
#define mmSEM_RESP_SDMA1 0x01a5
#define mmSEM_RESP_SDMA1_BASE_IDX 0
#define mmSEM_RESP_UVD 0x01a6
#define mmSEM_RESP_UVD_BASE_IDX 0
#define mmSEM_RESP_VCE_0 0x01a7
#define mmSEM_RESP_VCE_0_BASE_IDX 0
#define mmSEM_RESP_ACP 0x01a8
#define mmSEM_RESP_ACP_BASE_IDX 0
#define mmSEM_RESP_ISP 0x01a9
#define mmSEM_RESP_ISP_BASE_IDX 0
#define mmSEM_RESP_VCE_1 0x01aa
#define mmSEM_RESP_VCE_1_BASE_IDX 0
#define mmSEM_RESP_VP8 0x01ab
#define mmSEM_RESP_VP8_BASE_IDX 0
#define mmSEM_RESP_GC 0x01ac
#define mmSEM_RESP_GC_BASE_IDX 0
#define mmSEM_CID_REMAP_INDEX 0x01b0
#define mmSEM_CID_REMAP_INDEX_BASE_IDX 0
#define mmSEM_CID_REMAP_DATA 0x01b1
#define mmSEM_CID_REMAP_DATA_BASE_IDX 0
#define mmSEM_ATOMIC_OP_LUT 0x01b2
#define mmSEM_ATOMIC_OP_LUT_BASE_IDX 0
#define mmSEM_EDC_CONFIG 0x01b3
#define mmSEM_EDC_CONFIG_BASE_IDX 0
#define mmSEM_CHICKEN_BITS2 0x01b4
#define mmSEM_CHICKEN_BITS2_BASE_IDX 0
#define mmSEM_MMHUB_CNTL 0x01b5
#define mmSEM_MMHUB_CNTL_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART1 0x01bf
#define mmSEM_REGISTER_LAST_PART1_BASE_IDX 0
#define mmIH_VMID_0_LUT 0x0000
#define mmIH_VMID_0_LUT_BASE_IDX 0
#define mmIH_VMID_1_LUT 0x0001
#define mmIH_VMID_1_LUT_BASE_IDX 0
#define mmIH_VMID_2_LUT 0x0002
#define mmIH_VMID_2_LUT_BASE_IDX 0
#define mmIH_VMID_3_LUT 0x0003
#define mmIH_VMID_3_LUT_BASE_IDX 0
#define mmIH_VMID_4_LUT 0x0004
#define mmIH_VMID_4_LUT_BASE_IDX 0
#define mmIH_VMID_5_LUT 0x0005
#define mmIH_VMID_5_LUT_BASE_IDX 0
#define mmIH_VMID_6_LUT 0x0006
#define mmIH_VMID_6_LUT_BASE_IDX 0
#define mmIH_VMID_7_LUT 0x0007
#define mmIH_VMID_7_LUT_BASE_IDX 0
#define mmIH_VMID_8_LUT 0x0008
#define mmIH_VMID_8_LUT_BASE_IDX 0
#define mmIH_VMID_9_LUT 0x0009
#define mmIH_VMID_9_LUT_BASE_IDX 0
#define mmIH_VMID_10_LUT 0x000a
#define mmIH_VMID_10_LUT_BASE_IDX 0
#define mmIH_VMID_11_LUT 0x000b
#define mmIH_VMID_11_LUT_BASE_IDX 0
#define mmIH_VMID_12_LUT 0x000c
#define mmIH_VMID_12_LUT_BASE_IDX 0
#define mmIH_VMID_13_LUT 0x000d
#define mmIH_VMID_13_LUT_BASE_IDX 0
#define mmIH_VMID_14_LUT 0x000e
#define mmIH_VMID_14_LUT_BASE_IDX 0
#define mmIH_VMID_15_LUT 0x000f
#define mmIH_VMID_15_LUT_BASE_IDX 0
#define mmIH_VMID_0_LUT_MM 0x0010
#define mmIH_VMID_0_LUT_MM_BASE_IDX 0
#define mmIH_VMID_1_LUT_MM 0x0011
#define mmIH_VMID_1_LUT_MM_BASE_IDX 0
#define mmIH_VMID_2_LUT_MM 0x0012
#define mmIH_VMID_2_LUT_MM_BASE_IDX 0
#define mmIH_VMID_3_LUT_MM 0x0013
#define mmIH_VMID_3_LUT_MM_BASE_IDX 0
#define mmIH_VMID_4_LUT_MM 0x0014
#define mmIH_VMID_4_LUT_MM_BASE_IDX 0
#define mmIH_VMID_5_LUT_MM 0x0015
#define mmIH_VMID_5_LUT_MM_BASE_IDX 0
#define mmIH_VMID_6_LUT_MM 0x0016
#define mmIH_VMID_6_LUT_MM_BASE_IDX 0
#define mmIH_VMID_7_LUT_MM 0x0017
#define mmIH_VMID_7_LUT_MM_BASE_IDX 0
#define mmIH_VMID_8_LUT_MM 0x0018
#define mmIH_VMID_8_LUT_MM_BASE_IDX 0
#define mmIH_VMID_9_LUT_MM 0x0019
#define mmIH_VMID_9_LUT_MM_BASE_IDX 0
#define mmIH_VMID_10_LUT_MM 0x001a
#define mmIH_VMID_10_LUT_MM_BASE_IDX 0
#define mmIH_VMID_11_LUT_MM 0x001b
#define mmIH_VMID_11_LUT_MM_BASE_IDX 0
#define mmIH_VMID_12_LUT_MM 0x001c
#define mmIH_VMID_12_LUT_MM_BASE_IDX 0
#define mmIH_VMID_13_LUT_MM 0x001d
#define mmIH_VMID_13_LUT_MM_BASE_IDX 0
#define mmIH_VMID_14_LUT_MM 0x001e
#define mmIH_VMID_14_LUT_MM_BASE_IDX 0
#define mmIH_VMID_15_LUT_MM 0x001f
#define mmIH_VMID_15_LUT_MM_BASE_IDX 0
#define mmIH_COOKIE_0 0x0020
#define mmIH_COOKIE_0_BASE_IDX 0
#define mmIH_COOKIE_1 0x0021
#define mmIH_COOKIE_1_BASE_IDX 0
#define mmIH_COOKIE_2 0x0022
#define mmIH_COOKIE_2_BASE_IDX 0
#define mmIH_COOKIE_3 0x0023
#define mmIH_COOKIE_3_BASE_IDX 0
#define mmIH_COOKIE_4 0x0024
#define mmIH_COOKIE_4_BASE_IDX 0
#define mmIH_COOKIE_5 0x0025
#define mmIH_COOKIE_5_BASE_IDX 0
#define mmIH_COOKIE_6 0x0026
#define mmIH_COOKIE_6_BASE_IDX 0
#define mmIH_COOKIE_7 0x0027
#define mmIH_COOKIE_7_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART0 0x003f
#define mmIH_REGISTER_LAST_PART0_BASE_IDX 0
#define mmSEM_REQ_INPUT_0 0x0040
#define mmSEM_REQ_INPUT_0_BASE_IDX 0
#define mmSEM_REQ_INPUT_1 0x0041
#define mmSEM_REQ_INPUT_1_BASE_IDX 0
#define mmSEM_REQ_INPUT_2 0x0042
#define mmSEM_REQ_INPUT_2_BASE_IDX 0
#define mmSEM_REQ_INPUT_3 0x0043
#define mmSEM_REQ_INPUT_3_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART0 0x007f
#define mmSEM_REGISTER_LAST_PART0_BASE_IDX 0
#define mmIH_RB_CNTL 0x0080
#define mmIH_RB_CNTL_BASE_IDX 0
#define mmIH_RB_BASE 0x0081
#define mmIH_RB_BASE_BASE_IDX 0
#define mmIH_RB_BASE_HI 0x0082
#define mmIH_RB_BASE_HI_BASE_IDX 0
#define mmIH_RB_RPTR 0x0083
#define mmIH_RB_RPTR_BASE_IDX 0
#define mmIH_RB_WPTR 0x0084
#define mmIH_RB_WPTR_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_HI 0x0085
#define mmIH_RB_WPTR_ADDR_HI_BASE_IDX 0
#define mmIH_RB_WPTR_ADDR_LO 0x0086
#define mmIH_RB_WPTR_ADDR_LO_BASE_IDX 0
#define mmIH_DOORBELL_RPTR 0x0087
#define mmIH_DOORBELL_RPTR_BASE_IDX 0
#define mmIH_RB_CNTL_RING1 0x0088
#define mmIH_RB_CNTL_RING1_BASE_IDX 0
#define mmIH_RB_BASE_RING1 0x0089
#define mmIH_RB_BASE_RING1_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING1 0x008a
#define mmIH_RB_BASE_HI_RING1_BASE_IDX 0
#define mmIH_RB_RPTR_RING1 0x008b
#define mmIH_RB_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_WPTR_RING1 0x008c
#define mmIH_RB_WPTR_RING1_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING1 0x008f
#define mmIH_DOORBELL_RPTR_RING1_BASE_IDX 0
#define mmIH_RB_CNTL_RING2 0x0090
#define mmIH_RB_CNTL_RING2_BASE_IDX 0
#define mmIH_RB_BASE_RING2 0x0091
#define mmIH_RB_BASE_RING2_BASE_IDX 0
#define mmIH_RB_BASE_HI_RING2 0x0092
#define mmIH_RB_BASE_HI_RING2_BASE_IDX 0
#define mmIH_RB_RPTR_RING2 0x0093
#define mmIH_RB_RPTR_RING2_BASE_IDX 0
#define mmIH_RB_WPTR_RING2 0x0094
#define mmIH_RB_WPTR_RING2_BASE_IDX 0
#define mmIH_DOORBELL_RPTR_RING2 0x0097
#define mmIH_DOORBELL_RPTR_RING2_BASE_IDX 0
#define mmIH_VERSION 0x0098
#define mmIH_VERSION_BASE_IDX 0
#define mmIH_CNTL 0x00c0
|
||||
#define mmIH_CNTL_BASE_IDX 0
|
||||
#define mmIH_CNTL2 0x00c1
|
||||
#define mmIH_CNTL2_BASE_IDX 0
|
||||
#define mmIH_STATUS 0x00c2
|
||||
#define mmIH_STATUS_BASE_IDX 0
|
||||
#define mmIH_PERFMON_CNTL 0x00c3
|
||||
#define mmIH_PERFMON_CNTL_BASE_IDX 0
|
||||
#define mmIH_PERFCOUNTER0_RESULT 0x00c4
|
||||
#define mmIH_PERFCOUNTER0_RESULT_BASE_IDX 0
|
||||
#define mmIH_PERFCOUNTER1_RESULT 0x00c5
|
||||
#define mmIH_PERFCOUNTER1_RESULT_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_31_0 0x00c7
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_31_0_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_63_32 0x00c8
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_63_32_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_95_64 0x00c9
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_95_64_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_FIELD_CONTROL 0x00ca
|
||||
#define mmIH_DSM_MATCH_FIELD_CONTROL_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_DATA_CONTROL 0x00cb
|
||||
#define mmIH_DSM_MATCH_DATA_CONTROL_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_FCN_ID 0x00cc
|
||||
#define mmIH_DSM_MATCH_FCN_ID_BASE_IDX 0
|
||||
#define mmIH_LIMIT_INT_RATE_CNTL 0x00cd
|
||||
#define mmIH_LIMIT_INT_RATE_CNTL_BASE_IDX 0
|
||||
#define mmIH_VF_RB_STATUS 0x00ce
|
||||
#define mmIH_VF_RB_STATUS_BASE_IDX 0
|
||||
#define mmIH_VF_RB_STATUS2 0x00cf
|
||||
#define mmIH_VF_RB_STATUS2_BASE_IDX 0
|
||||
#define mmIH_VF_RB1_STATUS 0x00d0
|
||||
#define mmIH_VF_RB1_STATUS_BASE_IDX 0
|
||||
#define mmIH_VF_RB1_STATUS2 0x00d1
|
||||
#define mmIH_VF_RB1_STATUS2_BASE_IDX 0
|
||||
#define mmIH_VF_RB2_STATUS 0x00d2
|
||||
#define mmIH_VF_RB2_STATUS_BASE_IDX 0
|
||||
#define mmIH_VF_RB2_STATUS2 0x00d3
|
||||
#define mmIH_VF_RB2_STATUS2_BASE_IDX 0
|
||||
#define mmIH_INT_FLOOD_CNTL 0x00d5
|
||||
#define mmIH_INT_FLOOD_CNTL_BASE_IDX 0
|
||||
#define mmIH_RB0_INT_FLOOD_STATUS 0x00d6
|
||||
#define mmIH_RB0_INT_FLOOD_STATUS_BASE_IDX 0
|
||||
#define mmIH_RB1_INT_FLOOD_STATUS 0x00d7
|
||||
#define mmIH_RB1_INT_FLOOD_STATUS_BASE_IDX 0
|
||||
#define mmIH_RB2_INT_FLOOD_STATUS 0x00d8
|
||||
#define mmIH_RB2_INT_FLOOD_STATUS_BASE_IDX 0
|
||||
#define mmIH_INT_FLOOD_STATUS 0x00d9
|
||||
#define mmIH_INT_FLOOD_STATUS_BASE_IDX 0
|
||||
#define mmIH_STORM_CLIENT_LIST_CNTL 0x00da
|
||||
#define mmIH_STORM_CLIENT_LIST_CNTL_BASE_IDX 0
|
||||
#define mmIH_CLK_CTRL 0x00db
|
||||
#define mmIH_CLK_CTRL_BASE_IDX 0
|
||||
#define mmIH_INT_FLAGS 0x00dc
|
||||
#define mmIH_INT_FLAGS_BASE_IDX 0
|
||||
#define mmIH_LAST_INT_INFO0 0x00dd
|
||||
#define mmIH_LAST_INT_INFO0_BASE_IDX 0
|
||||
#define mmIH_LAST_INT_INFO1 0x00de
|
||||
#define mmIH_LAST_INT_INFO1_BASE_IDX 0
|
||||
#define mmIH_LAST_INT_INFO2 0x00df
|
||||
#define mmIH_LAST_INT_INFO2_BASE_IDX 0
|
||||
#define mmIH_SCRATCH 0x00e0
|
||||
#define mmIH_SCRATCH_BASE_IDX 0
|
||||
#define mmIH_CLIENT_CREDIT_ERROR 0x00e1
|
||||
#define mmIH_CLIENT_CREDIT_ERROR_BASE_IDX 0
|
||||
#define mmIH_GPU_IOV_VIOLATION_LOG 0x00e2
|
||||
#define mmIH_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
|
||||
#define mmIH_COOKIE_REC_VIOLATION_LOG 0x00e3
|
||||
#define mmIH_COOKIE_REC_VIOLATION_LOG_BASE_IDX 0
|
||||
#define mmIH_CREDIT_STATUS 0x00e4
|
||||
#define mmIH_CREDIT_STATUS_BASE_IDX 0
|
||||
#define mmIH_MMHUB_ERROR 0x00e5
|
||||
#define mmIH_MMHUB_ERROR_BASE_IDX 0
|
||||
#define mmIH_REGISTER_LAST_PART2 0x00ff
|
||||
#define mmIH_REGISTER_LAST_PART2_BASE_IDX 0
|
||||
#define mmSEM_CLK_CTRL 0x0100
|
||||
#define mmSEM_CLK_CTRL_BASE_IDX 0
|
||||
#define mmSEM_UTC_CREDIT 0x0101
|
||||
#define mmSEM_UTC_CREDIT_BASE_IDX 0
|
||||
#define mmSEM_UTC_CONFIG 0x0102
|
||||
#define mmSEM_UTC_CONFIG_BASE_IDX 0
|
||||
#define mmSEM_UTCL2_TRAN_EN_LUT 0x0103
|
||||
#define mmSEM_UTCL2_TRAN_EN_LUT_BASE_IDX 0
|
||||
#define mmSEM_MCIF_CONFIG 0x0104
|
||||
#define mmSEM_MCIF_CONFIG_BASE_IDX 0
|
||||
#define mmSEM_PERFMON_CNTL 0x0105
|
||||
#define mmSEM_PERFMON_CNTL_BASE_IDX 0
|
||||
#define mmSEM_PERFCOUNTER0_RESULT 0x0106
|
||||
#define mmSEM_PERFCOUNTER0_RESULT_BASE_IDX 0
|
||||
#define mmSEM_PERFCOUNTER1_RESULT 0x0107
|
||||
#define mmSEM_PERFCOUNTER1_RESULT_BASE_IDX 0
|
||||
#define mmSEM_STATUS 0x0108
|
||||
#define mmSEM_STATUS_BASE_IDX 0
|
||||
#define mmSEM_MAILBOX_CLIENTCONFIG 0x0109
|
||||
#define mmSEM_MAILBOX_CLIENTCONFIG_BASE_IDX 0
|
||||
#define mmSEM_MAILBOX 0x010a
|
||||
#define mmSEM_MAILBOX_BASE_IDX 0
|
||||
#define mmSEM_MAILBOX_CONTROL 0x010b
|
||||
#define mmSEM_MAILBOX_CONTROL_BASE_IDX 0
|
||||
#define mmSEM_CHICKEN_BITS 0x010c
|
||||
#define mmSEM_CHICKEN_BITS_BASE_IDX 0
|
||||
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA 0x010d
|
||||
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA_BASE_IDX 0
|
||||
#define mmSEM_GPU_IOV_VIOLATION_LOG 0x010e
|
||||
#define mmSEM_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
|
||||
#define mmSEM_OUTSTANDING_THRESHOLD 0x010f
|
||||
#define mmSEM_OUTSTANDING_THRESHOLD_BASE_IDX 0
|
||||
#define mmSEM_REGISTER_LAST_PART2 0x017f
|
||||
#define mmSEM_REGISTER_LAST_PART2_BASE_IDX 0
|
||||
#define mmIH_ACTIVE_FCN_ID 0x0180
|
||||
#define mmIH_ACTIVE_FCN_ID_BASE_IDX 0
|
||||
#define mmIH_VIRT_RESET_REQ 0x0181
|
||||
#define mmIH_VIRT_RESET_REQ_BASE_IDX 0
|
||||
#define mmIH_CLIENT_CFG 0x0184
|
||||
#define mmIH_CLIENT_CFG_BASE_IDX 0
|
||||
#define mmIH_CLIENT_CFG_INDEX 0x0188
|
||||
#define mmIH_CLIENT_CFG_INDEX_BASE_IDX 0
|
||||
#define mmIH_CLIENT_CFG_DATA 0x0189
|
||||
#define mmIH_CLIENT_CFG_DATA_BASE_IDX 0
|
||||
#define mmIH_CID_REMAP_INDEX 0x018a
|
||||
#define mmIH_CID_REMAP_INDEX_BASE_IDX 0
|
||||
#define mmIH_CID_REMAP_DATA 0x018b
|
||||
#define mmIH_CID_REMAP_DATA_BASE_IDX 0
|
||||
#define mmIH_CHICKEN 0x018c
|
||||
#define mmIH_CHICKEN_BASE_IDX 0
|
||||
#define mmIH_MMHUB_CNTL 0x018d
|
||||
#define mmIH_MMHUB_CNTL_BASE_IDX 0
|
||||
#define mmIH_REGISTER_LAST_PART1 0x019f
|
||||
#define mmIH_REGISTER_LAST_PART1_BASE_IDX 0
|
||||
#define mmSEM_ACTIVE_FCN_ID 0x01a0
|
||||
#define mmSEM_ACTIVE_FCN_ID_BASE_IDX 0
|
||||
#define mmSEM_VIRT_RESET_REQ 0x01a1
|
||||
#define mmSEM_VIRT_RESET_REQ_BASE_IDX 0
|
||||
#define mmSEM_RESP_SDMA0 0x01a4
|
||||
#define mmSEM_RESP_SDMA0_BASE_IDX 0
|
||||
#define mmSEM_RESP_SDMA1 0x01a5
|
||||
#define mmSEM_RESP_SDMA1_BASE_IDX 0
|
||||
#define mmSEM_RESP_UVD 0x01a6
|
||||
#define mmSEM_RESP_UVD_BASE_IDX 0
|
||||
#define mmSEM_RESP_VCE_0 0x01a7
|
||||
#define mmSEM_RESP_VCE_0_BASE_IDX 0
|
||||
#define mmSEM_RESP_ACP 0x01a8
|
||||
#define mmSEM_RESP_ACP_BASE_IDX 0
|
||||
#define mmSEM_RESP_ISP 0x01a9
|
||||
#define mmSEM_RESP_ISP_BASE_IDX 0
|
||||
#define mmSEM_RESP_VCE_1 0x01aa
|
||||
#define mmSEM_RESP_VCE_1_BASE_IDX 0
|
||||
#define mmSEM_RESP_VP8 0x01ab
|
||||
#define mmSEM_RESP_VP8_BASE_IDX 0
|
||||
#define mmSEM_RESP_GC 0x01ac
|
||||
#define mmSEM_RESP_GC_BASE_IDX 0
|
||||
#define mmSEM_CID_REMAP_INDEX 0x01b0
|
||||
#define mmSEM_CID_REMAP_INDEX_BASE_IDX 0
|
||||
#define mmSEM_CID_REMAP_DATA 0x01b1
|
||||
#define mmSEM_CID_REMAP_DATA_BASE_IDX 0
|
||||
#define mmSEM_ATOMIC_OP_LUT 0x01b2
|
||||
#define mmSEM_ATOMIC_OP_LUT_BASE_IDX 0
|
||||
#define mmSEM_EDC_CONFIG 0x01b3
|
||||
#define mmSEM_EDC_CONFIG_BASE_IDX 0
|
||||
#define mmSEM_CHICKEN_BITS2 0x01b4
|
||||
#define mmSEM_CHICKEN_BITS2_BASE_IDX 0
|
||||
#define mmSEM_MMHUB_CNTL 0x01b5
|
||||
#define mmSEM_MMHUB_CNTL_BASE_IDX 0
|
||||
#define mmSEM_REGISTER_LAST_PART1 0x01bf
|
||||
#define mmSEM_REGISTER_LAST_PART1_BASE_IDX 0
|
||||
|
||||
#endif
|
||||
|
||||
File diff suppressed because the file is too large.
@@ -24,322 +24,321 @@
|
||||
#define _osssys_4_2_0_OFFSET_HEADER
|
||||
|
||||
|
||||
|
||||
// addressBlock: osssys_osssysdec
|
||||
// base address: 0x4280
|
||||
#define mmIH_VMID_0_LUT 0x0000
|
||||
#define mmIH_VMID_0_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_1_LUT 0x0001
|
||||
#define mmIH_VMID_1_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_2_LUT 0x0002
|
||||
#define mmIH_VMID_2_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_3_LUT 0x0003
|
||||
#define mmIH_VMID_3_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_4_LUT 0x0004
|
||||
#define mmIH_VMID_4_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_5_LUT 0x0005
|
||||
#define mmIH_VMID_5_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_6_LUT 0x0006
|
||||
#define mmIH_VMID_6_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_7_LUT 0x0007
|
||||
#define mmIH_VMID_7_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_8_LUT 0x0008
|
||||
#define mmIH_VMID_8_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_9_LUT 0x0009
|
||||
#define mmIH_VMID_9_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_10_LUT 0x000a
|
||||
#define mmIH_VMID_10_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_11_LUT 0x000b
|
||||
#define mmIH_VMID_11_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_12_LUT 0x000c
|
||||
#define mmIH_VMID_12_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_13_LUT 0x000d
|
||||
#define mmIH_VMID_13_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_14_LUT 0x000e
|
||||
#define mmIH_VMID_14_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_15_LUT 0x000f
|
||||
#define mmIH_VMID_15_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_0_LUT_MM 0x0010
|
||||
#define mmIH_VMID_0_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_1_LUT_MM 0x0011
|
||||
#define mmIH_VMID_1_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_2_LUT_MM 0x0012
|
||||
#define mmIH_VMID_2_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_3_LUT_MM 0x0013
|
||||
#define mmIH_VMID_3_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_4_LUT_MM 0x0014
|
||||
#define mmIH_VMID_4_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_5_LUT_MM 0x0015
|
||||
#define mmIH_VMID_5_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_6_LUT_MM 0x0016
|
||||
#define mmIH_VMID_6_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_7_LUT_MM 0x0017
|
||||
#define mmIH_VMID_7_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_8_LUT_MM 0x0018
|
||||
#define mmIH_VMID_8_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_9_LUT_MM 0x0019
|
||||
#define mmIH_VMID_9_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_10_LUT_MM 0x001a
|
||||
#define mmIH_VMID_10_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_11_LUT_MM 0x001b
|
||||
#define mmIH_VMID_11_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_12_LUT_MM 0x001c
|
||||
#define mmIH_VMID_12_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_13_LUT_MM 0x001d
|
||||
#define mmIH_VMID_13_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_14_LUT_MM 0x001e
|
||||
#define mmIH_VMID_14_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_15_LUT_MM 0x001f
|
||||
#define mmIH_VMID_15_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_COOKIE_0 0x0020
|
||||
#define mmIH_COOKIE_0_BASE_IDX 0
|
||||
#define mmIH_COOKIE_1 0x0021
|
||||
#define mmIH_COOKIE_1_BASE_IDX 0
|
||||
#define mmIH_COOKIE_2 0x0022
|
||||
#define mmIH_COOKIE_2_BASE_IDX 0
|
||||
#define mmIH_COOKIE_3 0x0023
|
||||
#define mmIH_COOKIE_3_BASE_IDX 0
|
||||
#define mmIH_COOKIE_4 0x0024
|
||||
#define mmIH_COOKIE_4_BASE_IDX 0
|
||||
#define mmIH_COOKIE_5 0x0025
|
||||
#define mmIH_COOKIE_5_BASE_IDX 0
|
||||
#define mmIH_COOKIE_6 0x0026
|
||||
#define mmIH_COOKIE_6_BASE_IDX 0
|
||||
#define mmIH_COOKIE_7 0x0027
|
||||
#define mmIH_COOKIE_7_BASE_IDX 0
|
||||
#define mmIH_REGISTER_LAST_PART0 0x003f
|
||||
#define mmIH_REGISTER_LAST_PART0_BASE_IDX 0
|
||||
#define mmSEM_REQ_INPUT_0 0x0040
|
||||
#define mmSEM_REQ_INPUT_0_BASE_IDX 0
|
||||
#define mmSEM_REQ_INPUT_1 0x0041
|
||||
#define mmSEM_REQ_INPUT_1_BASE_IDX 0
|
||||
#define mmSEM_REQ_INPUT_2 0x0042
|
||||
#define mmSEM_REQ_INPUT_2_BASE_IDX 0
|
||||
#define mmSEM_REQ_INPUT_3 0x0043
|
||||
#define mmSEM_REQ_INPUT_3_BASE_IDX 0
|
||||
#define mmSEM_REGISTER_LAST_PART0 0x007f
|
||||
#define mmSEM_REGISTER_LAST_PART0_BASE_IDX 0
|
||||
#define mmIH_RB_CNTL 0x0080
|
||||
#define mmIH_RB_CNTL_BASE_IDX 0
|
||||
#define mmIH_RB_BASE 0x0081
|
||||
#define mmIH_RB_BASE_BASE_IDX 0
|
||||
#define mmIH_RB_BASE_HI 0x0082
|
||||
#define mmIH_RB_BASE_HI_BASE_IDX 0
|
||||
#define mmIH_RB_RPTR 0x0083
|
||||
#define mmIH_RB_RPTR_BASE_IDX 0
|
||||
#define mmIH_RB_WPTR 0x0084
|
||||
#define mmIH_RB_WPTR_BASE_IDX 0
|
||||
#define mmIH_RB_WPTR_ADDR_HI 0x0085
|
||||
#define mmIH_RB_WPTR_ADDR_HI_BASE_IDX 0
|
||||
#define mmIH_RB_WPTR_ADDR_LO 0x0086
|
||||
#define mmIH_RB_WPTR_ADDR_LO_BASE_IDX 0
|
||||
#define mmIH_DOORBELL_RPTR 0x0087
|
||||
#define mmIH_DOORBELL_RPTR_BASE_IDX 0
|
||||
#define mmIH_RB_CNTL_RING1 0x008c
|
||||
#define mmIH_RB_CNTL_RING1_BASE_IDX 0
|
||||
#define mmIH_RB_BASE_RING1 0x008d
|
||||
#define mmIH_RB_BASE_RING1_BASE_IDX 0
|
||||
#define mmIH_RB_BASE_HI_RING1 0x008e
|
||||
#define mmIH_RB_BASE_HI_RING1_BASE_IDX 0
|
||||
#define mmIH_RB_RPTR_RING1 0x008f
|
||||
#define mmIH_RB_RPTR_RING1_BASE_IDX 0
|
||||
#define mmIH_RB_WPTR_RING1 0x0090
|
||||
#define mmIH_RB_WPTR_RING1_BASE_IDX 0
|
||||
#define mmIH_DOORBELL_RPTR_RING1 0x0093
|
||||
#define mmIH_DOORBELL_RPTR_RING1_BASE_IDX 0
|
||||
#define mmIH_RB_CNTL_RING2 0x0098
|
||||
#define mmIH_RB_CNTL_RING2_BASE_IDX 0
|
||||
#define mmIH_RB_BASE_RING2 0x0099
|
||||
#define mmIH_RB_BASE_RING2_BASE_IDX 0
|
||||
#define mmIH_RB_BASE_HI_RING2 0x009a
|
||||
#define mmIH_RB_BASE_HI_RING2_BASE_IDX 0
|
||||
#define mmIH_RB_RPTR_RING2 0x009b
|
||||
#define mmIH_RB_RPTR_RING2_BASE_IDX 0
|
||||
#define mmIH_RB_WPTR_RING2 0x009c
|
||||
#define mmIH_RB_WPTR_RING2_BASE_IDX 0
|
||||
#define mmIH_DOORBELL_RPTR_RING2 0x009f
|
||||
#define mmIH_DOORBELL_RPTR_RING2_BASE_IDX 0
|
||||
#define mmIH_VERSION 0x00a5
|
||||
#define mmIH_VERSION_BASE_IDX 0
|
||||
#define mmIH_CNTL 0x00c0
|
||||
#define mmIH_CNTL_BASE_IDX 0
|
||||
#define mmIH_CNTL2 0x00c1
|
||||
#define mmIH_CNTL2_BASE_IDX 0
|
||||
#define mmIH_STATUS 0x00c2
|
||||
#define mmIH_STATUS_BASE_IDX 0
|
||||
#define mmIH_PERFMON_CNTL 0x00c3
|
||||
#define mmIH_PERFMON_CNTL_BASE_IDX 0
|
||||
#define mmIH_PERFCOUNTER0_RESULT 0x00c4
|
||||
#define mmIH_PERFCOUNTER0_RESULT_BASE_IDX 0
|
||||
#define mmIH_PERFCOUNTER1_RESULT 0x00c5
|
||||
#define mmIH_PERFCOUNTER1_RESULT_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_31_0 0x00c7
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_31_0_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_63_32 0x00c8
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_63_32_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_95_64 0x00c9
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_95_64_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_FIELD_CONTROL 0x00ca
|
||||
#define mmIH_DSM_MATCH_FIELD_CONTROL_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_DATA_CONTROL 0x00cb
|
||||
#define mmIH_DSM_MATCH_DATA_CONTROL_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_FCN_ID 0x00cc
|
||||
#define mmIH_DSM_MATCH_FCN_ID_BASE_IDX 0
|
||||
#define mmIH_LIMIT_INT_RATE_CNTL 0x00cd
|
||||
#define mmIH_LIMIT_INT_RATE_CNTL_BASE_IDX 0
|
||||
#define mmIH_VF_RB_STATUS 0x00ce
|
||||
#define mmIH_VF_RB_STATUS_BASE_IDX 0
|
||||
#define mmIH_VF_RB_STATUS2 0x00cf
|
||||
#define mmIH_VF_RB_STATUS2_BASE_IDX 0
|
||||
#define mmIH_VF_RB1_STATUS 0x00d0
|
||||
#define mmIH_VF_RB1_STATUS_BASE_IDX 0
|
||||
#define mmIH_VF_RB1_STATUS2 0x00d1
|
||||
#define mmIH_VF_RB1_STATUS2_BASE_IDX 0
|
||||
#define mmIH_VF_RB2_STATUS 0x00d2
|
||||
#define mmIH_VF_RB2_STATUS_BASE_IDX 0
|
||||
#define mmIH_VF_RB2_STATUS2 0x00d3
|
||||
#define mmIH_VF_RB2_STATUS2_BASE_IDX 0
|
||||
#define mmIH_INT_FLOOD_CNTL 0x00d5
|
||||
#define mmIH_INT_FLOOD_CNTL_BASE_IDX 0
|
||||
#define mmIH_RB0_INT_FLOOD_STATUS 0x00d6
|
||||
#define mmIH_RB0_INT_FLOOD_STATUS_BASE_IDX 0
|
||||
#define mmIH_RB1_INT_FLOOD_STATUS 0x00d7
|
||||
#define mmIH_RB1_INT_FLOOD_STATUS_BASE_IDX 0
|
||||
#define mmIH_RB2_INT_FLOOD_STATUS 0x00d8
|
||||
#define mmIH_RB2_INT_FLOOD_STATUS_BASE_IDX 0
|
||||
#define mmIH_INT_FLOOD_STATUS 0x00d9
|
||||
#define mmIH_INT_FLOOD_STATUS_BASE_IDX 0
|
||||
#define mmIH_STORM_CLIENT_LIST_CNTL 0x00da
|
||||
#define mmIH_STORM_CLIENT_LIST_CNTL_BASE_IDX 0
|
||||
#define mmIH_CLK_CTRL 0x00db
|
||||
#define mmIH_CLK_CTRL_BASE_IDX 0
|
||||
#define mmIH_INT_FLAGS 0x00dc
|
||||
#define mmIH_INT_FLAGS_BASE_IDX 0
|
||||
#define mmIH_LAST_INT_INFO0 0x00dd
|
||||
#define mmIH_LAST_INT_INFO0_BASE_IDX 0
|
||||
#define mmIH_LAST_INT_INFO1 0x00de
|
||||
#define mmIH_LAST_INT_INFO1_BASE_IDX 0
|
||||
#define mmIH_LAST_INT_INFO2 0x00df
|
||||
#define mmIH_LAST_INT_INFO2_BASE_IDX 0
|
||||
#define mmIH_SCRATCH 0x00e0
|
||||
#define mmIH_SCRATCH_BASE_IDX 0
|
||||
#define mmIH_CLIENT_CREDIT_ERROR 0x00e1
|
||||
#define mmIH_CLIENT_CREDIT_ERROR_BASE_IDX 0
|
||||
#define mmIH_GPU_IOV_VIOLATION_LOG 0x00e2
|
||||
#define mmIH_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
|
||||
#define mmIH_COOKIE_REC_VIOLATION_LOG 0x00e3
|
||||
#define mmIH_COOKIE_REC_VIOLATION_LOG_BASE_IDX 0
|
||||
#define mmIH_CREDIT_STATUS 0x00e4
|
||||
#define mmIH_CREDIT_STATUS_BASE_IDX 0
|
||||
#define mmIH_MMHUB_ERROR 0x00e5
|
||||
#define mmIH_MMHUB_ERROR_BASE_IDX 0
|
||||
#define mmIH_MEM_POWER_CTRL 0x00e8
|
||||
#define mmIH_MEM_POWER_CTRL_BASE_IDX 0
|
||||
#define mmIH_REGISTER_LAST_PART2 0x00ff
|
||||
#define mmIH_REGISTER_LAST_PART2_BASE_IDX 0
|
||||
#define mmSEM_CLK_CTRL 0x0100
|
||||
#define mmSEM_CLK_CTRL_BASE_IDX 0
|
||||
#define mmSEM_UTC_CREDIT 0x0101
|
||||
#define mmSEM_UTC_CREDIT_BASE_IDX 0
|
||||
#define mmSEM_UTC_CONFIG 0x0102
|
||||
#define mmSEM_UTC_CONFIG_BASE_IDX 0
|
||||
#define mmSEM_UTCL2_TRAN_EN_LUT 0x0103
|
||||
#define mmSEM_UTCL2_TRAN_EN_LUT_BASE_IDX 0
|
||||
#define mmSEM_MCIF_CONFIG 0x0104
|
||||
#define mmSEM_MCIF_CONFIG_BASE_IDX 0
|
||||
#define mmSEM_PERFMON_CNTL 0x0105
|
||||
#define mmSEM_PERFMON_CNTL_BASE_IDX 0
|
||||
#define mmSEM_PERFCOUNTER0_RESULT 0x0106
|
||||
#define mmSEM_PERFCOUNTER0_RESULT_BASE_IDX 0
|
||||
#define mmSEM_PERFCOUNTER1_RESULT 0x0107
|
||||
#define mmSEM_PERFCOUNTER1_RESULT_BASE_IDX 0
|
||||
#define mmSEM_STATUS 0x0108
|
||||
#define mmSEM_STATUS_BASE_IDX 0
|
||||
#define mmSEM_MAILBOX_CLIENTCONFIG 0x0109
|
||||
#define mmSEM_MAILBOX_CLIENTCONFIG_BASE_IDX 0
|
||||
#define mmSEM_MAILBOX 0x010a
|
||||
#define mmSEM_MAILBOX_BASE_IDX 0
|
||||
#define mmSEM_MAILBOX_CONTROL 0x010b
|
||||
#define mmSEM_MAILBOX_CONTROL_BASE_IDX 0
|
||||
#define mmSEM_CHICKEN_BITS 0x010c
|
||||
#define mmSEM_CHICKEN_BITS_BASE_IDX 0
|
||||
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA 0x010d
|
||||
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA_BASE_IDX 0
|
||||
#define mmSEM_GPU_IOV_VIOLATION_LOG 0x010e
|
||||
#define mmSEM_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
|
||||
#define mmSEM_OUTSTANDING_THRESHOLD 0x010f
|
||||
#define mmSEM_OUTSTANDING_THRESHOLD_BASE_IDX 0
|
||||
#define mmSEM_MEM_POWER_CTRL 0x0110
|
||||
#define mmSEM_MEM_POWER_CTRL_BASE_IDX 0
|
||||
#define mmSEM_REGISTER_LAST_PART2 0x017f
|
||||
#define mmSEM_REGISTER_LAST_PART2_BASE_IDX 0
|
||||
#define mmIH_ACTIVE_FCN_ID 0x0180
|
||||
#define mmIH_ACTIVE_FCN_ID_BASE_IDX 0
|
||||
#define mmIH_VIRT_RESET_REQ 0x0181
|
||||
#define mmIH_VIRT_RESET_REQ_BASE_IDX 0
|
||||
#define mmIH_CLIENT_CFG 0x0184
|
||||
#define mmIH_CLIENT_CFG_BASE_IDX 0
|
||||
#define mmIH_CLIENT_CFG_INDEX 0x0188
|
||||
#define mmIH_CLIENT_CFG_INDEX_BASE_IDX 0
|
||||
#define mmIH_CLIENT_CFG_DATA 0x0189
|
||||
#define mmIH_CLIENT_CFG_DATA_BASE_IDX 0
|
||||
#define mmIH_CID_REMAP_INDEX 0x018a
|
||||
#define mmIH_CID_REMAP_INDEX_BASE_IDX 0
|
||||
#define mmIH_CID_REMAP_DATA 0x018b
|
||||
#define mmIH_CID_REMAP_DATA_BASE_IDX 0
|
||||
#define mmIH_CHICKEN 0x018c
|
||||
#define mmIH_CHICKEN_BASE_IDX 0
|
||||
#define mmIH_MMHUB_CNTL 0x018d
|
||||
#define mmIH_MMHUB_CNTL_BASE_IDX 0
|
||||
#define mmIH_INT_DROP_CNTL 0x018e
|
||||
#define mmIH_INT_DROP_CNTL_BASE_IDX 0
|
||||
#define mmIH_INT_DROP_MATCH_VALUE0 0x018f
|
||||
#define mmIH_INT_DROP_MATCH_VALUE0_BASE_IDX 0
|
||||
#define mmIH_INT_DROP_MATCH_VALUE1 0x0190
|
||||
#define mmIH_INT_DROP_MATCH_VALUE1_BASE_IDX 0
|
||||
#define mmIH_INT_DROP_MATCH_MASK0 0x0191
|
||||
#define mmIH_INT_DROP_MATCH_MASK0_BASE_IDX 0
|
||||
#define mmIH_INT_DROP_MATCH_MASK1 0x0192
|
||||
#define mmIH_INT_DROP_MATCH_MASK1_BASE_IDX 0
|
||||
#define mmIH_REGISTER_LAST_PART1 0x019f
|
||||
#define mmIH_REGISTER_LAST_PART1_BASE_IDX 0
|
||||
#define mmSEM_ACTIVE_FCN_ID 0x01a0
|
||||
#define mmSEM_ACTIVE_FCN_ID_BASE_IDX 0
|
||||
#define mmSEM_VIRT_RESET_REQ 0x01a1
|
||||
#define mmSEM_VIRT_RESET_REQ_BASE_IDX 0
|
||||
#define mmSEM_RESP_SDMA0 0x01a4
|
||||
#define mmSEM_RESP_SDMA0_BASE_IDX 0
|
||||
#define mmSEM_RESP_SDMA1 0x01a5
|
||||
#define mmSEM_RESP_SDMA1_BASE_IDX 0
|
||||
#define mmSEM_RESP_UVD 0x01a6
|
||||
#define mmSEM_RESP_UVD_BASE_IDX 0
|
||||
#define mmSEM_RESP_VCE_0 0x01a7
|
||||
#define mmSEM_RESP_VCE_0_BASE_IDX 0
|
||||
#define mmSEM_RESP_ACP 0x01a8
|
||||
#define mmSEM_RESP_ACP_BASE_IDX 0
|
||||
#define mmSEM_RESP_ISP 0x01a9
|
||||
#define mmSEM_RESP_ISP_BASE_IDX 0
|
||||
#define mmSEM_RESP_VCE_1 0x01aa
|
||||
#define mmSEM_RESP_VCE_1_BASE_IDX 0
|
||||
#define mmSEM_RESP_VP8 0x01ab
|
||||
#define mmSEM_RESP_VP8_BASE_IDX 0
|
||||
#define mmSEM_RESP_GC 0x01ac
|
||||
#define mmSEM_RESP_GC_BASE_IDX 0
|
||||
#define mmSEM_RESP_UVD_1 0x01ad
|
||||
#define mmSEM_RESP_UVD_1_BASE_IDX 0
|
||||
#define mmSEM_CID_REMAP_INDEX 0x01b0
|
||||
#define mmSEM_CID_REMAP_INDEX_BASE_IDX 0
|
||||
#define mmSEM_CID_REMAP_DATA 0x01b1
|
||||
#define mmSEM_CID_REMAP_DATA_BASE_IDX 0
|
||||
#define mmSEM_ATOMIC_OP_LUT 0x01b2
|
||||
#define mmSEM_ATOMIC_OP_LUT_BASE_IDX 0
|
||||
#define mmSEM_EDC_CONFIG 0x01b3
|
||||
#define mmSEM_EDC_CONFIG_BASE_IDX 0
|
||||
#define mmSEM_CHICKEN_BITS2 0x01b4
|
||||
#define mmSEM_CHICKEN_BITS2_BASE_IDX 0
|
||||
#define mmSEM_MMHUB_CNTL 0x01b5
|
||||
#define mmSEM_MMHUB_CNTL_BASE_IDX 0
|
||||
#define mmSEM_REGISTER_LAST_PART1 0x01bf
|
||||
#define mmSEM_REGISTER_LAST_PART1_BASE_IDX 0
|
||||
#define mmIH_VMID_0_LUT 0x0000
|
||||
#define mmIH_VMID_0_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_1_LUT 0x0001
|
||||
#define mmIH_VMID_1_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_2_LUT 0x0002
|
||||
#define mmIH_VMID_2_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_3_LUT 0x0003
|
||||
#define mmIH_VMID_3_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_4_LUT 0x0004
|
||||
#define mmIH_VMID_4_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_5_LUT 0x0005
|
||||
#define mmIH_VMID_5_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_6_LUT 0x0006
|
||||
#define mmIH_VMID_6_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_7_LUT 0x0007
|
||||
#define mmIH_VMID_7_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_8_LUT 0x0008
|
||||
#define mmIH_VMID_8_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_9_LUT 0x0009
|
||||
#define mmIH_VMID_9_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_10_LUT 0x000a
|
||||
#define mmIH_VMID_10_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_11_LUT 0x000b
|
||||
#define mmIH_VMID_11_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_12_LUT 0x000c
|
||||
#define mmIH_VMID_12_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_13_LUT 0x000d
|
||||
#define mmIH_VMID_13_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_14_LUT 0x000e
|
||||
#define mmIH_VMID_14_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_15_LUT 0x000f
|
||||
#define mmIH_VMID_15_LUT_BASE_IDX 0
|
||||
#define mmIH_VMID_0_LUT_MM 0x0010
|
||||
#define mmIH_VMID_0_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_1_LUT_MM 0x0011
|
||||
#define mmIH_VMID_1_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_2_LUT_MM 0x0012
|
||||
#define mmIH_VMID_2_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_3_LUT_MM 0x0013
|
||||
#define mmIH_VMID_3_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_4_LUT_MM 0x0014
|
||||
#define mmIH_VMID_4_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_5_LUT_MM 0x0015
|
||||
#define mmIH_VMID_5_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_6_LUT_MM 0x0016
|
||||
#define mmIH_VMID_6_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_7_LUT_MM 0x0017
|
||||
#define mmIH_VMID_7_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_8_LUT_MM 0x0018
|
||||
#define mmIH_VMID_8_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_9_LUT_MM 0x0019
|
||||
#define mmIH_VMID_9_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_10_LUT_MM 0x001a
|
||||
#define mmIH_VMID_10_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_11_LUT_MM 0x001b
|
||||
#define mmIH_VMID_11_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_12_LUT_MM 0x001c
|
||||
#define mmIH_VMID_12_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_13_LUT_MM 0x001d
|
||||
#define mmIH_VMID_13_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_14_LUT_MM 0x001e
|
||||
#define mmIH_VMID_14_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_VMID_15_LUT_MM 0x001f
|
||||
#define mmIH_VMID_15_LUT_MM_BASE_IDX 0
|
||||
#define mmIH_COOKIE_0 0x0020
|
||||
#define mmIH_COOKIE_0_BASE_IDX 0
|
||||
#define mmIH_COOKIE_1 0x0021
|
||||
#define mmIH_COOKIE_1_BASE_IDX 0
|
||||
#define mmIH_COOKIE_2 0x0022
|
||||
#define mmIH_COOKIE_2_BASE_IDX 0
|
||||
#define mmIH_COOKIE_3 0x0023
|
||||
#define mmIH_COOKIE_3_BASE_IDX 0
|
||||
#define mmIH_COOKIE_4 0x0024
|
||||
#define mmIH_COOKIE_4_BASE_IDX 0
|
||||
#define mmIH_COOKIE_5 0x0025
|
||||
#define mmIH_COOKIE_5_BASE_IDX 0
|
||||
#define mmIH_COOKIE_6 0x0026
|
||||
#define mmIH_COOKIE_6_BASE_IDX 0
|
||||
#define mmIH_COOKIE_7 0x0027
|
||||
#define mmIH_COOKIE_7_BASE_IDX 0
|
||||
#define mmIH_REGISTER_LAST_PART0 0x003f
|
||||
#define mmIH_REGISTER_LAST_PART0_BASE_IDX 0
|
||||
#define mmSEM_REQ_INPUT_0 0x0040
|
||||
#define mmSEM_REQ_INPUT_0_BASE_IDX 0
|
||||
#define mmSEM_REQ_INPUT_1 0x0041
|
||||
#define mmSEM_REQ_INPUT_1_BASE_IDX 0
|
||||
#define mmSEM_REQ_INPUT_2 0x0042
|
||||
#define mmSEM_REQ_INPUT_2_BASE_IDX 0
|
||||
#define mmSEM_REQ_INPUT_3 0x0043
|
||||
#define mmSEM_REQ_INPUT_3_BASE_IDX 0
|
||||
#define mmSEM_REGISTER_LAST_PART0 0x007f
|
||||
#define mmSEM_REGISTER_LAST_PART0_BASE_IDX 0
|
||||
#define mmIH_RB_CNTL 0x0080
|
||||
#define mmIH_RB_CNTL_BASE_IDX 0
|
||||
#define mmIH_RB_BASE 0x0081
|
||||
#define mmIH_RB_BASE_BASE_IDX 0
|
||||
#define mmIH_RB_BASE_HI 0x0082
|
||||
#define mmIH_RB_BASE_HI_BASE_IDX 0
|
||||
#define mmIH_RB_RPTR 0x0083
|
||||
#define mmIH_RB_RPTR_BASE_IDX 0
|
||||
#define mmIH_RB_WPTR 0x0084
|
||||
#define mmIH_RB_WPTR_BASE_IDX 0
|
||||
#define mmIH_RB_WPTR_ADDR_HI 0x0085
|
||||
#define mmIH_RB_WPTR_ADDR_HI_BASE_IDX 0
|
||||
#define mmIH_RB_WPTR_ADDR_LO 0x0086
|
||||
#define mmIH_RB_WPTR_ADDR_LO_BASE_IDX 0
|
||||
#define mmIH_DOORBELL_RPTR 0x0087
|
||||
#define mmIH_DOORBELL_RPTR_BASE_IDX 0
|
||||
#define mmIH_RB_CNTL_RING1 0x008c
|
||||
#define mmIH_RB_CNTL_RING1_BASE_IDX 0
|
||||
#define mmIH_RB_BASE_RING1 0x008d
|
||||
#define mmIH_RB_BASE_RING1_BASE_IDX 0
|
||||
#define mmIH_RB_BASE_HI_RING1 0x008e
|
||||
#define mmIH_RB_BASE_HI_RING1_BASE_IDX 0
|
||||
#define mmIH_RB_RPTR_RING1 0x008f
|
||||
#define mmIH_RB_RPTR_RING1_BASE_IDX 0
|
||||
#define mmIH_RB_WPTR_RING1 0x0090
|
||||
#define mmIH_RB_WPTR_RING1_BASE_IDX 0
|
||||
#define mmIH_DOORBELL_RPTR_RING1 0x0093
|
||||
#define mmIH_DOORBELL_RPTR_RING1_BASE_IDX 0
|
||||
#define mmIH_RB_CNTL_RING2 0x0098
|
||||
#define mmIH_RB_CNTL_RING2_BASE_IDX 0
|
||||
#define mmIH_RB_BASE_RING2 0x0099
|
||||
#define mmIH_RB_BASE_RING2_BASE_IDX 0
|
||||
#define mmIH_RB_BASE_HI_RING2 0x009a
|
||||
#define mmIH_RB_BASE_HI_RING2_BASE_IDX 0
|
||||
#define mmIH_RB_RPTR_RING2 0x009b
|
||||
#define mmIH_RB_RPTR_RING2_BASE_IDX 0
|
||||
#define mmIH_RB_WPTR_RING2 0x009c
|
||||
#define mmIH_RB_WPTR_RING2_BASE_IDX 0
|
||||
#define mmIH_DOORBELL_RPTR_RING2 0x009f
|
||||
#define mmIH_DOORBELL_RPTR_RING2_BASE_IDX 0
|
||||
#define mmIH_VERSION 0x00a5
|
||||
#define mmIH_VERSION_BASE_IDX 0
|
||||
#define mmIH_CNTL 0x00c0
|
||||
#define mmIH_CNTL_BASE_IDX 0
|
||||
#define mmIH_CNTL2 0x00c1
|
||||
#define mmIH_CNTL2_BASE_IDX 0
|
||||
#define mmIH_STATUS 0x00c2
|
||||
#define mmIH_STATUS_BASE_IDX 0
|
||||
#define mmIH_PERFMON_CNTL 0x00c3
|
||||
#define mmIH_PERFMON_CNTL_BASE_IDX 0
|
||||
#define mmIH_PERFCOUNTER0_RESULT 0x00c4
|
||||
#define mmIH_PERFCOUNTER0_RESULT_BASE_IDX 0
|
||||
#define mmIH_PERFCOUNTER1_RESULT 0x00c5
|
||||
#define mmIH_PERFCOUNTER1_RESULT_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_31_0 0x00c7
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_31_0_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_63_32 0x00c8
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_63_32_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_95_64 0x00c9
|
||||
#define mmIH_DSM_MATCH_VALUE_BIT_95_64_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_FIELD_CONTROL 0x00ca
|
||||
#define mmIH_DSM_MATCH_FIELD_CONTROL_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_DATA_CONTROL 0x00cb
|
||||
#define mmIH_DSM_MATCH_DATA_CONTROL_BASE_IDX 0
|
||||
#define mmIH_DSM_MATCH_FCN_ID 0x00cc
|
||||
#define mmIH_DSM_MATCH_FCN_ID_BASE_IDX 0
|
||||
#define mmIH_LIMIT_INT_RATE_CNTL 0x00cd
|
||||
#define mmIH_LIMIT_INT_RATE_CNTL_BASE_IDX 0
|
||||
#define mmIH_VF_RB_STATUS 0x00ce
|
||||
#define mmIH_VF_RB_STATUS_BASE_IDX 0
|
||||
#define mmIH_VF_RB_STATUS2 0x00cf
|
||||
#define mmIH_VF_RB_STATUS2_BASE_IDX 0
|
||||
#define mmIH_VF_RB1_STATUS 0x00d0
|
||||
#define mmIH_VF_RB1_STATUS_BASE_IDX 0
|
||||
#define mmIH_VF_RB1_STATUS2 0x00d1
|
||||
#define mmIH_VF_RB1_STATUS2_BASE_IDX 0
|
||||
#define mmIH_VF_RB2_STATUS 0x00d2
|
||||
#define mmIH_VF_RB2_STATUS_BASE_IDX 0
|
||||
#define mmIH_VF_RB2_STATUS2 0x00d3
|
||||
#define mmIH_VF_RB2_STATUS2_BASE_IDX 0
|
||||
#define mmIH_INT_FLOOD_CNTL 0x00d5
|
||||
#define mmIH_INT_FLOOD_CNTL_BASE_IDX 0
|
||||
#define mmIH_RB0_INT_FLOOD_STATUS 0x00d6
|
||||
#define mmIH_RB0_INT_FLOOD_STATUS_BASE_IDX 0
|
||||
#define mmIH_RB1_INT_FLOOD_STATUS 0x00d7
|
||||
#define mmIH_RB1_INT_FLOOD_STATUS_BASE_IDX 0
|
||||
#define mmIH_RB2_INT_FLOOD_STATUS 0x00d8
|
||||
#define mmIH_RB2_INT_FLOOD_STATUS_BASE_IDX 0
|
||||
#define mmIH_INT_FLOOD_STATUS 0x00d9
|
||||
#define mmIH_INT_FLOOD_STATUS_BASE_IDX 0
|
||||
#define mmIH_STORM_CLIENT_LIST_CNTL 0x00da
|
||||
#define mmIH_STORM_CLIENT_LIST_CNTL_BASE_IDX 0
|
||||
#define mmIH_CLK_CTRL 0x00db
|
||||
#define mmIH_CLK_CTRL_BASE_IDX 0
|
||||
#define mmIH_INT_FLAGS 0x00dc
|
||||
#define mmIH_INT_FLAGS_BASE_IDX 0
|
||||
#define mmIH_LAST_INT_INFO0 0x00dd
|
||||
#define mmIH_LAST_INT_INFO0_BASE_IDX 0
|
||||
#define mmIH_LAST_INT_INFO1 0x00de
|
||||
#define mmIH_LAST_INT_INFO1_BASE_IDX 0
|
||||
#define mmIH_LAST_INT_INFO2 0x00df
|
||||
#define mmIH_LAST_INT_INFO2_BASE_IDX 0
|
||||
#define mmIH_SCRATCH 0x00e0
|
||||
#define mmIH_SCRATCH_BASE_IDX 0
|
||||
#define mmIH_CLIENT_CREDIT_ERROR 0x00e1
|
||||
#define mmIH_CLIENT_CREDIT_ERROR_BASE_IDX 0
|
||||
#define mmIH_GPU_IOV_VIOLATION_LOG 0x00e2
|
||||
#define mmIH_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
|
||||
#define mmIH_COOKIE_REC_VIOLATION_LOG 0x00e3
|
||||
#define mmIH_COOKIE_REC_VIOLATION_LOG_BASE_IDX 0
|
||||
#define mmIH_CREDIT_STATUS 0x00e4
|
||||
#define mmIH_CREDIT_STATUS_BASE_IDX 0
|
||||
#define mmIH_MMHUB_ERROR 0x00e5
|
||||
#define mmIH_MMHUB_ERROR_BASE_IDX 0
|
||||
#define mmIH_MEM_POWER_CTRL 0x00e8
|
||||
#define mmIH_MEM_POWER_CTRL_BASE_IDX 0
|
||||
#define mmIH_REGISTER_LAST_PART2 0x00ff
|
||||
#define mmIH_REGISTER_LAST_PART2_BASE_IDX 0
|
||||
#define mmSEM_CLK_CTRL 0x0100
#define mmSEM_CLK_CTRL_BASE_IDX 0
#define mmSEM_UTC_CREDIT 0x0101
#define mmSEM_UTC_CREDIT_BASE_IDX 0
#define mmSEM_UTC_CONFIG 0x0102
#define mmSEM_UTC_CONFIG_BASE_IDX 0
#define mmSEM_UTCL2_TRAN_EN_LUT 0x0103
#define mmSEM_UTCL2_TRAN_EN_LUT_BASE_IDX 0
#define mmSEM_MCIF_CONFIG 0x0104
#define mmSEM_MCIF_CONFIG_BASE_IDX 0
#define mmSEM_PERFMON_CNTL 0x0105
#define mmSEM_PERFMON_CNTL_BASE_IDX 0
#define mmSEM_PERFCOUNTER0_RESULT 0x0106
#define mmSEM_PERFCOUNTER0_RESULT_BASE_IDX 0
#define mmSEM_PERFCOUNTER1_RESULT 0x0107
#define mmSEM_PERFCOUNTER1_RESULT_BASE_IDX 0
#define mmSEM_STATUS 0x0108
#define mmSEM_STATUS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG 0x0109
#define mmSEM_MAILBOX_CLIENTCONFIG_BASE_IDX 0
#define mmSEM_MAILBOX 0x010a
#define mmSEM_MAILBOX_BASE_IDX 0
#define mmSEM_MAILBOX_CONTROL 0x010b
#define mmSEM_MAILBOX_CONTROL_BASE_IDX 0
#define mmSEM_CHICKEN_BITS 0x010c
#define mmSEM_CHICKEN_BITS_BASE_IDX 0
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA 0x010d
#define mmSEM_MAILBOX_CLIENTCONFIG_EXTRA_BASE_IDX 0
#define mmSEM_GPU_IOV_VIOLATION_LOG 0x010e
#define mmSEM_GPU_IOV_VIOLATION_LOG_BASE_IDX 0
#define mmSEM_OUTSTANDING_THRESHOLD 0x010f
#define mmSEM_OUTSTANDING_THRESHOLD_BASE_IDX 0
#define mmSEM_MEM_POWER_CTRL 0x0110
#define mmSEM_MEM_POWER_CTRL_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART2 0x017f
#define mmSEM_REGISTER_LAST_PART2_BASE_IDX 0
#define mmIH_ACTIVE_FCN_ID 0x0180
#define mmIH_ACTIVE_FCN_ID_BASE_IDX 0
#define mmIH_VIRT_RESET_REQ 0x0181
#define mmIH_VIRT_RESET_REQ_BASE_IDX 0
#define mmIH_CLIENT_CFG 0x0184
#define mmIH_CLIENT_CFG_BASE_IDX 0
#define mmIH_CLIENT_CFG_INDEX 0x0188
#define mmIH_CLIENT_CFG_INDEX_BASE_IDX 0
#define mmIH_CLIENT_CFG_DATA 0x0189
#define mmIH_CLIENT_CFG_DATA_BASE_IDX 0
#define mmIH_CID_REMAP_INDEX 0x018a
#define mmIH_CID_REMAP_INDEX_BASE_IDX 0
#define mmIH_CID_REMAP_DATA 0x018b
#define mmIH_CID_REMAP_DATA_BASE_IDX 0
#define mmIH_CHICKEN 0x018c
#define mmIH_CHICKEN_BASE_IDX 0
#define mmIH_MMHUB_CNTL 0x018d
#define mmIH_MMHUB_CNTL_BASE_IDX 0
#define mmIH_INT_DROP_CNTL 0x018e
#define mmIH_INT_DROP_CNTL_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_VALUE0 0x018f
#define mmIH_INT_DROP_MATCH_VALUE0_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_VALUE1 0x0190
#define mmIH_INT_DROP_MATCH_VALUE1_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_MASK0 0x0191
#define mmIH_INT_DROP_MATCH_MASK0_BASE_IDX 0
#define mmIH_INT_DROP_MATCH_MASK1 0x0192
#define mmIH_INT_DROP_MATCH_MASK1_BASE_IDX 0
#define mmIH_REGISTER_LAST_PART1 0x019f
#define mmIH_REGISTER_LAST_PART1_BASE_IDX 0
#define mmSEM_ACTIVE_FCN_ID 0x01a0
#define mmSEM_ACTIVE_FCN_ID_BASE_IDX 0
#define mmSEM_VIRT_RESET_REQ 0x01a1
#define mmSEM_VIRT_RESET_REQ_BASE_IDX 0
#define mmSEM_RESP_SDMA0 0x01a4
#define mmSEM_RESP_SDMA0_BASE_IDX 0
#define mmSEM_RESP_SDMA1 0x01a5
#define mmSEM_RESP_SDMA1_BASE_IDX 0
#define mmSEM_RESP_UVD 0x01a6
#define mmSEM_RESP_UVD_BASE_IDX 0
#define mmSEM_RESP_VCE_0 0x01a7
#define mmSEM_RESP_VCE_0_BASE_IDX 0
#define mmSEM_RESP_ACP 0x01a8
#define mmSEM_RESP_ACP_BASE_IDX 0
#define mmSEM_RESP_ISP 0x01a9
#define mmSEM_RESP_ISP_BASE_IDX 0
#define mmSEM_RESP_VCE_1 0x01aa
#define mmSEM_RESP_VCE_1_BASE_IDX 0
#define mmSEM_RESP_VP8 0x01ab
#define mmSEM_RESP_VP8_BASE_IDX 0
#define mmSEM_RESP_GC 0x01ac
#define mmSEM_RESP_GC_BASE_IDX 0
#define mmSEM_RESP_UVD_1 0x01ad
#define mmSEM_RESP_UVD_1_BASE_IDX 0
#define mmSEM_CID_REMAP_INDEX 0x01b0
#define mmSEM_CID_REMAP_INDEX_BASE_IDX 0
#define mmSEM_CID_REMAP_DATA 0x01b1
#define mmSEM_CID_REMAP_DATA_BASE_IDX 0
#define mmSEM_ATOMIC_OP_LUT 0x01b2
#define mmSEM_ATOMIC_OP_LUT_BASE_IDX 0
#define mmSEM_EDC_CONFIG 0x01b3
#define mmSEM_EDC_CONFIG_BASE_IDX 0
#define mmSEM_CHICKEN_BITS2 0x01b4
#define mmSEM_CHICKEN_BITS2_BASE_IDX 0
#define mmSEM_MMHUB_CNTL 0x01b5
#define mmSEM_MMHUB_CNTL_BASE_IDX 0
#define mmSEM_REGISTER_LAST_PART1 0x01bf
#define mmSEM_REGISTER_LAST_PART1_BASE_IDX 0

#endif
File diff not shown because this file is too large.
Some files were not shown because too many files changed in this diff.