rocm-systems/source/lib/rocprofiler-sdk/tests/agent.cpp
Ammar ELWazir 987ae3cc47 PC Sampling Support (#715)
* cmake formatting (cmake-format) (#188)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* source formatting (clang-format v11) (#189)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: design of the pc sampling data struct; guarding parts of code that uses ROCr marker packets

* source formatting (clang-format v11) (#191)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* cmake formatting (cmake-format) (#192)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: shadow variable fix

* pcs: fix for compiler errors reported by CI/CD

* source formatting (clang-format v11) (#193)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: docs fix; samples uses rocprofiler::rocprofiler library

* cmake formatting (cmake-format) (#195)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: client in samples folder fixed

* pcs: client requires rocprofiler package as dependency

* pcs: client uses single context

* source formatting (clang-format v11) (#196)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: client using single buffer; no buffer destroy in client

* pcs: client::setup explicitly called from the example

* pcs: rocprofiler_pc_sample_record_t updated

* pcs: fixed init of external correlation id

* source formatting (clang-format v11) (#198)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: remove outdated files; update CMakeLists

* cmake formatting (cmake-format) (#212)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: using rocprofiler_agent_id_t

* pcs: Removing trailing whitespaces

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

* source formatting (clang-format v11) (#214)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: mapping agent_id to the agent

* source formatting (clang-format v11) (#215)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: const while iterating over agents

* source formatting (clang-format v11) (#216)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: calling get_buffer instead of get_buffers

* pcs: workgroup typo

* pcs: documentation for the public PC sampling API

* pcs: queue_cb_t signature adaptation

* pcs: mocks removed

* pcs: updating HsaApiTable with HSA/ROCr PC sampling API

* pcs: querying available PC sampling configs through IOCTL

* pcs: create the PCS session in IOCTL

* pcs: first actual PC samples delivered to the rocprofiler's client :)

* pcs: works with marker packet too

* pcs: using HSA table to call pc sampling related functions

* pcs: using ioctl instead of kfd in naming

* pcs: configuration service test fixed

* pcs: sample processing test fixed

* pcs: marker packet macro wrapper removed

* pcs: marker packet is part of the rocprofiler_packet union

* pcs: one fixme added

* pcs: client that uses pc-sampling and code obj tracing

* pcs: client that supports PC sampling and code obj tracing refactored

* pcs: show more info for each PC sample

* pcs: hex output for the samples that do not belong to the matmul kernel

* pcs: querying avail configuration happens immediately before configuring

* pcs: hsa_ven_amd_pcs_create_from_id renamed

* pcs: using hsa_stop; accessing a buffer by id from parser

* pcs: includes reworked, tests returned to life

* pcs: rocprofiler dir removed as outdated

* cmake formatting (cmake-format) (#271)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* source formatting (clang-format v11) (#272)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: some warnings fixed

* source formatting (clang-format v11) (#273)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* cmake formatting (cmake-format) (#274)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: show MI200 relevant information in the sample

* pcs: queue cb fixed; rocr.h include fixed

* source formatting (clang-format v11) (#296)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: getting hsa_agent and the doorbell_id from hsa_queue

* source formatting (clang-format v11) (#297)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: correlation ID logic fixed

* source formatting (clang-format v11) (#303)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: pure pc sampling example fixed

* source formatting (clang-format v11) (#307)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* cmake formatting (cmake-format) (#308)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: interval value if the PC sampling is already configured

* pcs: ROCPROFILER_STATUS_ERROR_PC_SAMPLING_ALREADY_CONFIGURED

New status code if another process configured PC sampling service with different configuration.
Samples are extended to consider this case and retry if it happens.

* pcs: hsa_amd_queue_get_info mocked in tests

* source formatting (clang-format v11) (#328)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs (tests): query configs after configuring service

* source formatting (clang-format v11) (#329)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: sample checks workgroup_id_* and wave_id

* source formatting (clang-format v11) (#330)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs samples: running samples on the device 0

* pcs: kfd_ioctl updated

* pcs: ioctl config struct changed fields names

* pcs: status when PC sampling is configured by another process is renamed

* pcs: HSA PC sampling API table fixed

* pcs: tmp hack to be able to use HSA pc sampling table

* source formatting (clang-format v11) (#443)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs service use CIDs generated by HIP API tracing service

* source formatting (clang-format v11) (#455)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* cmake formatting (cmake-format) (#456)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: CID manager

* pcs: explicit flush with no delivered data executes retirement logic

* source formatting (clang-format v11) (#464)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: rocprofiler_query_pc_sampling_agent_configurations docs update

* source formatting (clang-format v11) (#465)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: rocprofiler_configure_pc_sampling_service docs update

* pcs: explicit sync introduced in PCSCIDManager

* pcs: new logic for retiring CIDs in PC sampling service documented

* pcs: queue interception cb signature updated

* source formatting (clang-format v11) (#471)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: if no agents supports PC sampling, fail gracefully

* elaborating when KFD returns EBUSY and EEXIST

* pcs: the second PC sampling examples fails gracefully

* code samples use only single kernel for now

* pcs: CID manager refactored

* source formatting (clang-format v11) (#481)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: ioctl update

* source formatting (clang-format v11) (#531)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs:code sample to test PC sampling applied on concurrent kernels

* source formatting (clang-format v11) (#533)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: pc sampling stress test included

* cmake formatting (cmake-format) (#539)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* source formatting (clang-format v11) (#540)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: standalone benchmark

* cmake formatting (cmake-format) (#555)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: glance in external correlation IDs

* source formatting (clang-format v11) (#557)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* another change in ioctl interface

* pcs: update queue interceptor callbacks and samples according to the agent 0 version

* source formatting (clang-format v11) (#611)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: avoid running problematic PC sampling test

* pcs: guarding tests not to fail on architectures not supporting PC sampling

* source formatting (clang-format v11) (#617)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: check IOCTL version prior to each KFD call

* pcs: ioctl refactoring

* pcs: PC sampling service increases the ref_count of the correlation ID of the kernel dispatch

* cmake formatting (cmake-format) (#631)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* source formatting (clang-format v11) (#632)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: PC sampling service provides external correlation IDs

* source formatting (clang-format v11) (#644)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: use rocprofiler_dim3_t for workgrou_ip

* source formatting (clang-format v11) (#645)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: minor fixes

* pcs: updating the documentation for the pc sampling API functions

* pcs: api table and queue controller fix

* pcs: don't generate marker packets for the agent if PC sampling is not configured on it

* pcs: multi-GPU and single-GPU clients

* source formatting (clang-format v11) (#700)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: warning and errors fixed

* source formatting (clang-format v11) (#702)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: clang compiler errors and warnings fixed

* source formatting (clang-format v11) (#716)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: const reference in cid manager

* source formatting (clang-format v11) (#717)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: const & func in manager explicit

* pcs: test to cover creating PC sampling service of agent that does not exist

* pcs: generate marker packets if service is active

* source formatting (clang-format v11) (#719)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: refactoring hsa_adapter; use the correlation_id->thread_idx

* Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/utils.cpp

* Update utils.cpp

* moving pc-sampling tests and samples to pc-sampling label

* Format fix

* pcs: use configured instead of active service

* Update source/lib/rocprofiler-sdk/pc_sampling/service.cpp

* pcs: ensure configuring PC sampling on the HSA level is called only once

* pcs: minor fix

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* pcs: refactoring IOCTL integration

* Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt

Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: reverting back what bot doubled

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: retesting the bot

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: why bot fails on this IOCTL status

* pcs: why failing on <vector>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: returning commits removed by bot

* pcs: formatting locally

* pcs: clients are flushing buffers inside the tool_fini

* pcs: sync function in public API

* pcs: sync prior to unloading the code object

* pcs: sync function requires context

* pcs: client uses CID retirement service

* pcs: test for flushing internal ROCr buffers

* pcs: source formatting

* Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: code samples refactoring

* pcs: public API header refactored

* pcs: rocprofiler_buffer_flush drains internal PC sampling buffers too

* pcs: remove unnecessary functions

* pcs: do not call hsa's copytables

* pcs: include reordering

* pcs: using ROCP_ERROR inside PC sampling implementation

* pcs: pc_sampling sample uses ostream instead of printfs

* pcs: pc_sampling_codeobj tracing using ostream instead of prints

* pcs: registering once for interceptor callbacks

* pcs: do not generate internal CIDs if not in debug mode

* pcs: rebasing fixed; missing external correlation IDs

* pcs: code formatting

* enable kernel tracing service to receive external correlation IDs

* pcs: using ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL

* pcs: polishing parser

* formatting

* updating parser to use workgroup_id

* kfd_ioctl.h extracted in details folder

* refactoring

* pcs: preparing to generate code object information

* flush internal buffers prior to unloading code object

* pcs: generating marker records

* pcs: wrap code_object's shutdown function

* ROCR_VISIBLE_DEVICES and HIP_VISIBLE_DEVICES unsupported at the moment

* documenting the ignorance of ROCR/HIP_VISIBLE_DEVICES

* pcs: separate structs for code object loading/unloading markers

* pcs: inst_pkt_t changed the namespace

* pcs: removing wrapper around the shutdown function

* pcs: size in record field

* pcs: documentation refactoring + typedefs

* renaming PCSAgentConfig to PCSAgentSession

* pcs: service does not keep a pointer to the context

* pcs: static assertions related to the versioning

* pcs: rocprofiler_pc_sampling_configuration_t size field

* pcs: report API unimplemented unless explicitly enabled

* pcs: skip tests if KFD does not support PC sampling

* pcs: if ROCr hides some devices, no PC samples will be delivered for it

* pcs: hip error check after kernel launch

* formatting

* removing PCS info from agent.h

* fix based on review

* Update continuous integration workflow

- use mi200 runner for code coverage (supports PC sampling)
- split sanitizer jobs across navi3, vega20, and mi300

* Updating pc sampling test labels

* ROCP_PC_SAMPLING_ENABLED env in CI

* ROCP_PC_SAMPLING_ENABLED for all CI mi200 jobs

* Rearrange sanitizer assignments

* fixes according to review

* removed unused functions

* pcs: rocprofiler_agent_id_t instead of handle as a key in map

* Update source/lib/rocprofiler-sdk/context/context.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* removing drm_fd from the agent.h

* pcs: removing one sample due to complexity

* pcs: refactoring sample

* simplifying sample

* new lines

* Improve queue_control enable interceptor logic

* Update lib/rocprofiler-sdk/hsa/types.hpp

- handle amd_ext size for HSA 1.12.0

* ROCP_PC_SAMPLING_ENABLED -> ROCPROFILER_PC_SAMPLING_BETA_ENABLED

* Update hsa_adapter.cpp

- anonymous namespace + remove debug

* parser update

* Apply suggestions from code review

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
Co-authored-by: vlaindic <vladimir.indic@amd.com>
Co-authored-by: vlaindic <vlaindic@amd.com>
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: gobhardw <gopesh.bhardwaj@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-05-24 09:49:44 -05:00

287 lines
14 KiB
C++

// MIT License
//
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.

#include <rocprofiler-sdk/agent.h>
#include <rocprofiler-sdk/fwd.h>
#include <rocprofiler-sdk/registration.h>

#include "lib/rocprofiler-sdk/agent.hpp"
#include "lib/rocprofiler-sdk/registration.hpp"
#include "lib/rocprofiler-sdk/tests/details/agent.hpp"

#include <fmt/core.h>
#include <gtest/gtest.h>
#include <hsa/hsa.h>
#include <hsa/hsa_api_trace.h>
#include <pthread.h>

#include <cstddef>  // offsetof, size_t
#include <cstdint>
#include <cstdlib>
#include <cstring>  // memset
#include <iostream>
#include <random>
#include <sstream>
#include <string_view>
#include <type_traits>
#include <typeinfo>
#include <vector>

TEST(rocprofiler_lib, agent_abi)
{
    constexpr auto msg = "ABI break. NEW FIELDS MAY ONLY BE ADDED AT END OF STRUCT";

    EXPECT_EQ(offsetof(rocprofiler_agent_t, size), 0) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, id), 8) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, type), 16) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, cpu_cores_count), 20) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, simd_count), 24) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, mem_banks_count), 28) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, caches_count), 32) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, io_links_count), 36) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, cpu_core_id_base), 40) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, simd_id_base), 44) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, max_waves_per_simd), 48) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, lds_size_in_kb), 52) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, gds_size_in_kb), 56) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, num_gws), 60) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, wave_front_size), 64) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, num_xcc), 68) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, cu_count), 72) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, array_count), 76) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, num_shader_banks), 80) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, simd_arrays_per_engine), 84) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, cu_per_simd_array), 88) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, simd_per_cu), 92) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, max_slots_scratch_cu), 96) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, gfx_target_version), 100) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, vendor_id), 104) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, device_id), 106) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, location_id), 108) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, domain), 112) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, drm_render_minor), 116) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, num_sdma_engines), 120) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, num_sdma_xgmi_engines), 124) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, num_sdma_queues_per_engine), 128) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, num_cp_queues), 132) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, max_engine_clk_ccompute), 136) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, max_engine_clk_fcompute), 140) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, sdma_fw_version), 144) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, fw_version), 148) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, capability), 152) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, cu_per_engine), 156) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, max_waves_per_cu), 160) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, family_id), 164) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, workgroup_max_size), 168) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, grid_max_size), 172) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, local_mem_size), 176) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, hive_id), 184) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, gpu_id), 192) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, workgroup_max_dim), 200) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, grid_max_dim), 212) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, mem_banks), 224) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, caches), 232) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, io_links), 240) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, name), 248) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, vendor_name), 256) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, product_name), 264) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, model_name), 272) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, node_id), 280) << msg;
    EXPECT_EQ(offsetof(rocprofiler_agent_t, logical_node_id), 284) << msg;
    // Add test for offset of new field above this. Do NOT change any existing values!

    constexpr auto expected_rocp_agent_size = 288;
    // If a new field is added, increase this value by the size of the new field(s)
    EXPECT_EQ(sizeof(rocprofiler_agent_t), expected_rocp_agent_size)
        << "ABI break. If you added a new field, make sure that this is the only new check that "
           "failed. Please add a check for the new field at the offset and update this test to the "
           "new size";
    static_assert(sizeof(rocprofiler_agent_t) == expected_rocp_agent_size, "Update agent size!");
}

TEST(rocprofiler_lib, agent)
{
    rocprofiler::registration::init_logging();

    auto info_ret = std::system("/usr/bin/rocminfo");
    if(info_ret != 0) info_ret = std::system("rocminfo");
    if(info_ret != 0) info_ret = std::system("/opt/rocm/bin/rocminfo");

    std::cout << "# Data from '/sys/class/kfd/kfd/topology/nodes': \n" << std::flush;
    auto sys_ret_kfd = std::system(
        "/bin/bash -c 'for i in $(find /sys/class/kfd/kfd/topology/nodes -maxdepth 2 -type f | "
        "grep properties | sort); do echo -e \"\n##### ${i} #####\n\"; cat ${i}; echo \"\"; done'");
    EXPECT_EQ(sys_ret_kfd, 0);

    std::cout << "# Data from '/sys/devices/virtual/kfd/kfd/topology/nodes': \n" << std::flush;
    auto sys_ret_virt =
        std::system("/bin/bash -c 'for i in $(find /sys/devices/virtual/kfd/kfd/topology/nodes "
                    "-maxdepth 2 -type f | grep properties | sort); do echo -e \"\n##### ${i} "
                    "#####\n\"; cat ${i}; echo \"\"; done'");
    EXPECT_EQ(sys_ret_virt, 0);

    static_assert(std::is_same<rocprofiler_agent_t, rocprofiler_agent_v0_t>::value,
                  "update test to support new agent struct version");

    auto agents = std::vector<const rocprofiler_agent_t*>{};

    rocprofiler_query_available_agents_cb_t iterate_cb = [](rocprofiler_agent_version_t agents_ver,
                                                            const void** agents_arr,
                                                            size_t       num_agents,
                                                            void*        user_data) {
        EXPECT_EQ(agents_ver, ROCPROFILER_AGENT_INFO_VERSION_0);
        if(agents_ver != ROCPROFILER_AGENT_INFO_VERSION_0) return ROCPROFILER_STATUS_ERROR;

        auto* agents_v = static_cast<std::vector<const rocprofiler_agent_t*>*>(user_data);
        for(size_t i = 0; i < num_agents; ++i)
        {
            const auto* agent = static_cast<const rocprofiler_agent_t*>(agents_arr[i]);
            agents_v->emplace_back(agent);
        }
        return ROCPROFILER_STATUS_SUCCESS;
    };

    hsa_init();

    {
        auto table         = ::HsaApiTable{};
        auto core_table    = ::CoreApiTable{};
        auto amd_ext_table = ::AmdExtTable{};

        memset(&table, 0, sizeof(table));
        memset(&core_table, 0, sizeof(core_table));
        memset(&amd_ext_table, 0, sizeof(amd_ext_table));

        core_table.hsa_iterate_agents_fn = &hsa_iterate_agents;
        core_table.hsa_status_string_fn  = &hsa_status_string;
        core_table.hsa_agent_get_info_fn = &hsa_agent_get_info;
        amd_ext_table.hsa_amd_agent_iterate_memory_pools_fn = &hsa_amd_agent_iterate_memory_pools;
        amd_ext_table.hsa_amd_memory_pool_get_info_fn       = &hsa_amd_memory_pool_get_info;

        table.core_    = &core_table;
        table.amd_ext_ = &amd_ext_table;

        rocprofiler::agent::construct_agent_cache(&table);
    }

    std::cout << "# querying available agents...\n" << std::flush;
    auto status =
        rocprofiler_query_available_agents(ROCPROFILER_AGENT_INFO_VERSION_0,
                                           iterate_cb,
                                           sizeof(rocprofiler_agent_t),
                                           const_cast<void*>(static_cast<const void*>(&agents)));
    EXPECT_EQ(status, ROCPROFILER_STATUS_SUCCESS);

    auto _rocm_info = rocprofiler::test::rocm_info{};
    EXPECT_EQ(rocprofiler::test::get_info(_rocm_info), 0);

    auto& hsa_agents_v = _rocm_info.agents;
    EXPECT_GE(agents.size(), hsa_agents_v.size());

    uint64_t skipped = 0;
    for(const auto* agent : agents)
    {
        ASSERT_NE(agent, nullptr);

        auto msg = fmt::format("name={}, model={}, gfx version={}, id={}, type={}",
                               agent->name,
                               agent->model_name,
                               agent->gfx_target_version,
                               agent->node_id,
                               agent->type == ROCPROFILER_AGENT_TYPE_CPU ? "CPU" : "GPU");

        rocprofiler::test::agent_info_t* hsa_agent = nullptr;
        {
            auto _hsa_agent = rocprofiler::agent::get_hsa_agent(agent);
            if(!_hsa_agent)
            {
                ++skipped;
                continue;
            }

            for(auto& hitr : hsa_agents_v)
            {
                if(_hsa_agent && _hsa_agent->handle == hitr.hsa_agent.handle)
                {
                    hsa_agent = &hitr;
                    break;
                }
            }
            ASSERT_NE(hsa_agent, nullptr) << msg;
        }

        if(agent->type == ROCPROFILER_AGENT_TYPE_CPU)
        {
            EXPECT_EQ(hsa_agent->device_type, HSA_DEVICE_CPU) << msg;
        }
        else if(agent->type == ROCPROFILER_AGENT_TYPE_GPU)
        {
            EXPECT_EQ(hsa_agent->device_type, HSA_DEVICE_GPU) << msg;
        }
        else
        {
            EXPECT_TRUE(false) << msg << " :: agent-type != CPU|GPU :: " << agent->type;
        }

        EXPECT_EQ(std::string_view{agent->name}, std::string_view{hsa_agent->name}) << msg;
        EXPECT_EQ(std::string_view{agent->vendor_name}, std::string_view{hsa_agent->vendor_name})
            << msg;
        EXPECT_EQ(std::string_view{agent->product_name},
                  std::string_view{hsa_agent->device_mkt_name})
            << msg;
        // TODO(aelwazir): To be changed back to use node id once ROCR fixes the hsa_agents to use
        // the real node id
        EXPECT_EQ(agent->logical_node_id, hsa_agent->internal_node_id) << msg;
        EXPECT_EQ(agent->location_id, hsa_agent->bdf_id) << msg;
        EXPECT_EQ(agent->device_id, hsa_agent->chip_id) << msg;
        EXPECT_EQ(agent->simd_count, hsa_agent->compute_unit * hsa_agent->simds_per_cu) << msg;
        EXPECT_EQ(agent->cu_count, hsa_agent->compute_unit) << msg;
        EXPECT_EQ(agent->simd_per_cu, hsa_agent->simds_per_cu) << msg;
        EXPECT_EQ(agent->wave_front_size, hsa_agent->wavefront_size) << msg;
        EXPECT_EQ(agent->simd_arrays_per_engine, hsa_agent->shader_arrs_per_sh_eng) << msg;
        EXPECT_EQ(agent->max_waves_per_cu, hsa_agent->max_waves_per_cu) << msg;
        EXPECT_EQ(agent->num_shader_banks, hsa_agent->shader_engs) << msg;
        EXPECT_EQ(agent->workgroup_max_size, hsa_agent->workgroup_max_size) << msg;
        EXPECT_EQ(agent->workgroup_max_dim.x, hsa_agent->workgroup_max_dim[0]) << msg;
        EXPECT_EQ(agent->workgroup_max_dim.y, hsa_agent->workgroup_max_dim[1]) << msg;
        EXPECT_EQ(agent->workgroup_max_dim.z, hsa_agent->workgroup_max_dim[2]) << msg;
        EXPECT_EQ(agent->grid_max_size, hsa_agent->grid_max_size) << msg;
        EXPECT_EQ(agent->grid_max_dim.x, hsa_agent->grid_max_dim.x) << msg;
        EXPECT_EQ(agent->grid_max_dim.y, hsa_agent->grid_max_dim.y) << msg;
        EXPECT_EQ(agent->grid_max_dim.z, hsa_agent->grid_max_dim.z) << msg;

        if(agent->type == ROCPROFILER_AGENT_TYPE_GPU)
        {
            // HSA lib doesn't set family ID for CPU-only but we do
            EXPECT_EQ(agent->family_id, hsa_agent->family_id) << msg;
        }

        EXPECT_EQ(agent->fw_version.ui32.uCode, hsa_agent->ucode_version) << msg;
        EXPECT_EQ(agent->sdma_fw_version.uCodeSDMA, hsa_agent->sdma_ucode_version) << msg;

        if(hsa_agent->shader_engs > 0)
        {
            EXPECT_EQ(agent->cu_per_engine, hsa_agent->compute_unit / hsa_agent->shader_engs)
                << msg;
        }
    }

    EXPECT_EQ(skipped, (agents.size() - hsa_agents_v.size()));

    // clean up memory leak
    for(auto& itr : _rocm_info.isas)
        delete[] itr.name_str;
}
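
The `agent_abi` test above pins every field of `rocprofiler_agent_t` to a fixed byte offset and pins the total struct size. The same technique can be sketched on a small struct, shown here as a minimal illustration only. `example_agent_t`, `verify_layout`, and `has_field` are hypothetical names that do not exist in rocprofiler-sdk, and the idea that a leading `size` member lets a consumer detect which fields a producer filled in is an assumption about why such a member is useful, not a statement about the library's behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct for illustration; NOT the real rocprofiler_agent_t.
struct example_agent_t
{
    uint64_t size;      // size in bytes of the struct as written by the producer
    uint64_t id;        // opaque identifier
    uint32_t type;      // e.g. 1 = CPU, 2 = GPU (made-up encoding)
    uint32_t cu_count;  // appended last; later fields would go after this one
};

// Compile-time layout checks: a reordered or resized field fails the build.
static_assert(offsetof(example_agent_t, size) == 0, "ABI break");
static_assert(offsetof(example_agent_t, id) == 8, "ABI break");
static_assert(offsetof(example_agent_t, type) == 16, "ABI break");
static_assert(offsetof(example_agent_t, cu_count) == 20, "ABI break");
static_assert(sizeof(example_agent_t) == 24, "update the expected size");

// Runtime mirror of the checks, in the style of the EXPECT_EQ calls above.
inline void
verify_layout()
{
    assert(offsetof(example_agent_t, cu_count) == 20);
    assert(sizeof(example_agent_t) == 24);
}

// Assumed usage of the leading size member: a field is readable only if the
// producer's struct was large enough to contain it completely.
inline bool
has_field(uint64_t producer_size, size_t field_offset, size_t field_size)
{
    return producer_size >= field_offset + field_size;
}
```

Because fields are only appended, every existing offset stays valid for binaries compiled against an older header, and only the final size check needs updating when a field is added.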