Files
rocm-systems/source/lib/rocprofiler-sdk/hsa/queue_controller.cpp
T
Ammar ELWazir 987ae3cc47 PC Sampling Support (#715)
* cmake formatting (cmake-format) (#188)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* source formatting (clang-format v11) (#189)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: design of the pc sampling data struct; guarding parts of code that uses ROCr marker packets

* source formatting (clang-format v11) (#191)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* cmake formatting (cmake-format) (#192)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: shadow variable fix

* pcs: fix for compiler errors reported by CI/CD

* source formatting (clang-format v11) (#193)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: docs fix; samples uses rocprofiler::rocprofiler library

* cmake formatting (cmake-format) (#195)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: client in samples folder fixed

* pcs: client requires rocprofiler package as dependency

* pcs: client uses single context

* source formatting (clang-format v11) (#196)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: client using single buffer; no buffer destroy in client

* pcs: client::setup explicitly called from the example

* pcs: rocprofiler_pc_sample_record_t updated

* pcs: fixed init of external correlation id

* source formatting (clang-format v11) (#198)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: remove outdated files; update CMakeLists

* cmake formatting (cmake-format) (#212)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: using rocprofiler_agent_id_t

* pcs: Removing trailing whitespaces

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

* source formatting (clang-format v11) (#214)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: mapping agent_id to the agent

* source formatting (clang-format v11) (#215)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: const while iterating over agents

* source formatting (clang-format v11) (#216)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: calling get_buffer instead of get_buffers

* pcs: workgroup typo

* pcs: documentation for the public PC sampling API

* pcs: queue_cb_t signature adaptation

* pcs: mocks removed

* pcs: updating HsaApiTable with HSA/ROCr PC sampling API

* pcs: querying available PC sampling configs through IOCTL

* pcs: create the PCS session in IOCTL

* pcs: first actual PC samples delivered to the rocprofiler's client :)

* pcs: works with marker packet too

* pcs: using HSA table to call pc sampling related functions

* pcs: using ioctl instead of kfd in naming

* pcs: configuration service test fixed

* pcs: sample processing test fixed

* pcs: marker packet macro wrapper removed

* pcs: marker packet is part of the rocprofiler_packet union

* pcs: one fixme added

* pcs: client that uses pc-sampling and code obj tracing

* pcs: client that supprts PC sampling and code obj tracing refactored

* pcs: show more info for each PC sample

* pcs: hex output for the samples that do not belong to the matmul kernel

* pcs: querying avail configuration happens immediately before configuring

* pcs: hsa_ven_amd_pcs_create_from_id renamed

* pcs: using hsa_stop; accessing a buffer by id from parser

* pcs: includes reworked, tests returned to life

* pcs: rocrofiler dir removed as outdated

* cmake formatting (cmake-format) (#271)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* source formatting (clang-format v11) (#272)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: some warnings fixed

* source formatting (clang-format v11) (#273)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* cmake formatting (cmake-format) (#274)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: show MI200 relevant information in the sample

* pcs: queue cb fixed; rocr.h include fixed

* source formatting (clang-format v11) (#296)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: getting hsa_agent and the doorbell_id from hsa_queue

* source formatting (clang-format v11) (#297)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: correlation ID logic fixed

* source formatting (clang-format v11) (#303)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: pure pc sampling example fixed

* source formatting (clang-format v11) (#307)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* cmake formatting (cmake-format) (#308)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: interval value if the PC sampling is already configured

* pcs: ROCPROFILER_STATUS_ERROR_PC_SAMPLING_ALREADY_CONFIGURED

New status code if another process configured PC sampling service with different configuration.
Samples are extended to consider this case and retry if it happens.

* pcs: hsa_amd_queue_get_info mocked in tests

* source formatting (clang-format v11) (#328)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs (tests): query configs after configuring service

* source formatting (clang-format v11) (#329)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: sample checks workgroup_id_* and wave_id

* source formatting (clang-format v11) (#330)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs samples: running samples on the device 0

* pcs: kfd_ioctl updated

* pcs: ioctl config struct changed fields names

* pcs: status when PC sampling is configured by another process is renamed

* pcs: HSA PC sampling API table fixed

* pcs: tmp hack to be able to use HSA pc sampling table

* source formatting (clang-format v11) (#443)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs service use CIDs generated by HIP API tracing service

* source formatting (clang-format v11) (#455)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* cmake formatting (cmake-format) (#456)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: CID manager

* pcs: explicit flush with no delivered data executes retirement logic

* source formatting (clang-format v11) (#464)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: rocprofiler_query_pc_sampling_agent_configurations docs update

* source formatting (clang-format v11) (#465)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: rocprofiler_configure_pc_sampling_service docs update

* pcs: explicit sync introduced in PCSCIDManager

* pcs: new logic for retiring CIDs in PC sampling service documented

* pcs: queue interception cb signature updated

* source formatting (clang-format v11) (#471)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: if no agents supports PC sampling, fail gracefully

* elaborating when KFD returns EBUSY and EEXIST

* pcs: the second PC sampling examples fails gracefully

* code samples use only single kernel for now

* pcs: CID manager refactored

* source formatting (clang-format v11) (#481)

Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>

* pcs: ioctl update

* source formatting (clang-format v11) (#531)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs:code sample to test PC sampling applied on concurrent kernels

* source formatting (clang-format v11) (#533)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: pc sampling strest test included

* cmake formatting (cmake-format) (#539)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* source formatting (clang-format v11) (#540)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: standalone benchmark

* cmake formatting (cmake-format) (#555)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: glance in external correlation IDs

* source formatting (clang-format v11) (#557)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* another change in ioctl interface

* pcs: update queue interceptor callbacks and samples accroding to the agent 0 version

* source formatting (clang-format v11) (#611)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: avoid running problematic PC sampling test

* pcs: guarding tests not to fail on architectures not supporting PC sampling

* source formatting (clang-format v11) (#617)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: check IOCTL version prior to each KFD call

* pcs: ioctl refactoring

* pcs: PC sampling service increases the ref_count of the correlation ID of the kernel dispatch

* cmake formatting (cmake-format) (#631)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* source formatting (clang-format v11) (#632)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: PC sampling service provides external correlation IDs

* source formatting (clang-format v11) (#644)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: use rocprofiler_dim3_t for workgrou_ip

* source formatting (clang-format v11) (#645)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: minor fixes

* pcs: updating the documentation for the pc sampling API functions

* pcs: api table and queue controller fix

* pcs: don't generate marker packets for the agent if PC sampling is not configured on it

* pcs: multi-GPU and single-GPU clients

* source formatting (clang-format v11) (#700)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: warning and errors fixed

* source formatting (clang-format v11) (#702)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: clang compiler errors and warnings fixed

* source formatting (clang-format v11) (#716)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: const reference in cid manager

* source formatting (clang-format v11) (#717)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: const & func in manager explicit

* pcs: test to cover creating PC sampling service of agent that does not exist

* pcs: generate marker packets if service is active

* source formatting (clang-format v11) (#719)

Co-authored-by: vlaindic <139573562+vlaindic@users.noreply.github.com>

* pcs: refactoring hsa_adapter; use the correlation_id->thread_idx

* Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/cid_manager.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp

* Update source/lib/rocprofiler-sdk/pc_sampling/utils.cpp

* Update utils.cpp

* moving pc-sampling tests and samples to pc-sampling label

* Format fix

* pcs: use configured instead of active service

* Update source/lib/rocprofiler-sdk/pc_sampling/service.cpp

* pcs: ensure configuring PC sampling on the HSA level is called only once

* pcs: minor fix

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update CMakeLists.txt

* pcs: refactoring IOCTL integration

* Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt

Co-authored-by: Ammar ELWazir <ammar.elwazir@amd.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: reverting back what bot doubled

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: retesting the bot

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter_types.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: why bot fails on this IOCTL status

* pcs: why failing on <vector>

* Update source/lib/rocprofiler-sdk/pc_sampling/ioctl/ioctl_adapter.cpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: returning commits removed by bot

* pcs: formatting locally

* pcs: clients are flushing buffers inside the tool_fini

* pcs: sync function in public API

* pcs: sync prior to unloading the code object

* pcs: sync function requires context

* pcs: client uses CID retirement service

* pcs: test for flusing internal ROCr buffers

* pcs: source formatting

* Update source/lib/rocprofiler-sdk/pc_sampling/tests/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* pcs: code samples refactoring

* pcs: public API header refactored

* pcs: rocprofiler_buffer_flush drains internal PC sampling buffers too

* pcs: remove unnecessary functions

* pcs: do not call hsa's copytables

* pcs: include reordering

* pcs: using ROCP_ERROR inside PC sampling implementation

* pcs: pc_sampling sample uses ostream instean of printfs

* pcs: pc_sampling_codeobj tracing using ostream instead of prints

* pcs: registering once for interceptor callbacks

* pcs: do not generate internal CIDs if not in debug mode

* pcs: rebasing fixed; missing external correlation IDs

* pcs: code formatting

* enable kernel tracing service to receive external correlation IDs

* pcs: using ROCPROFILER_STATUS_ERROR_INCOMPATIBLE_KERNEL

* pcs: polishing parser

* formatting

* updating parser to use workgroup_id

* kfd_ioctl.h extracted in details folder

* refactoring

* pcs: preparing to generate code object information

* flush internal buffers prior to unloading code object

* pcs: generating marker records

* pcs: wrap code_object's shutdown function

* ROCR_VISIBLE_DEVICES and HIP_VISISBLE_DEVICES unsupported at the moment

* documenting the ignorance of ROCR/HIP_VISIBLE_DEVICES

* pcs: separate structs for code object loading/unloading markers

* pcs: inst_pkt_t changed the namespace

* pcs: removing wrapper around the shutdown function

* pcs: size in record field

* pcs: documentation refactoring + typdefs

* renaming PCSAgentConfig to PCSAgentSession

* pcs: service does not keep a pointer to the context

* pcs: static assertions related to the versioning

* pcs: rocprofiler_pc_sampling_configuration_t size field

* pcs: report API unimplemented unleass explicitly enabled

* pcs: skip tests if KFD does not support PC sampling

* pcs: if ROCr hides some devices, no PC samples will be delivered for it

* pcs: hip error check after kernel launch

* formatting

* removing PCS info from agent.h

* fix based on review

* Update continuous integration workflow

- use mi200 runner for code coverage (supports PC sampling)
- split sanitizer jobs across navi3, vega20, and mi300

* Updating pc sampling test labels

* ROCP_PC_SAMPLING_ENABLED env in CI

* ROCP_PC_SAMPLING_ENABLED for all CI mi200 jobs

* Rearrange sanitizer assignments

* fixes according to review

* removed unused functions

* pcs: rocprofiler_agent_id_t instead of handle as a key in map

* Update source/lib/rocprofiler-sdk/context/context.hpp

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* removing drm_fd from the agent.h

* pcs: removing one sample due to complexity

* pcs: refactoring sample

* simplifying sample

* new lines

* Improve queue_control enable intercepter logic

* Update lib/rocprofiler-sdk/hsa/types.hpp

- handle amd_ext size for HSA 1.12.0

* ROCP_PC_SAMPLING_ENABLED -> ROCPROFILER_PC_SAMPLING_BETA_ENABLED

* Update hsa_adapter.cpp

- anonymous namespace + remove debug

* parser update

* Apply suggestions from code review

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: vlaindic <vlaindic@users.noreply.github.com>
Co-authored-by: vlaindic <vladimir.indic@amd.com>
Co-authored-by: vlaindic <vlaindic@amd.com>
Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: gobhardw <gopesh.bhardwaj@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-05-24 09:49:44 -05:00

404 rader
14 KiB
C++

// MIT License
//
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
#include "lib/rocprofiler-sdk/hsa/queue_controller.hpp"
#include "lib/common/static_object.hpp"
#include "lib/rocprofiler-sdk/agent.hpp"
#include "lib/rocprofiler-sdk/context/context.hpp"
#include "lib/rocprofiler-sdk/hsa/agent_cache.hpp"
#include "lib/rocprofiler-sdk/registration.hpp"
#include <rocprofiler-sdk/fwd.h>
namespace rocprofiler
{
namespace hsa
{
namespace
{
// HSA Intercept Functions (create_queue/destroy_queue)
hsa_status_t
create_queue(hsa_agent_t agent,
uint32_t size,
hsa_queue_type32_t type,
void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data),
void* data,
uint32_t private_segment_size,
uint32_t group_segment_size,
hsa_queue_t** queue)
{
auto* controller = CHECK_NOTNULL(get_queue_controller());
for(const auto& [_, agent_info] : controller->get_supported_agents())
{
if(agent_info.get_hsa_agent().handle == agent.handle)
{
auto new_queue = std::make_unique<Queue>(agent_info,
size,
type,
callback,
data,
private_segment_size,
group_segment_size,
controller->get_core_table(),
controller->get_ext_table(),
queue);
controller->serializer().wlock(
[&](auto& serializer) { serializer.add_queue(queue, *new_queue); });
controller->add_queue(*queue, std::move(new_queue));
return HSA_STATUS_SUCCESS;
}
}
ROCP_FATAL << "Could not find agent - " << agent.handle;
return HSA_STATUS_ERROR_FATAL;
}
hsa_status_t
destroy_queue(hsa_queue_t* hsa_queue)
{
if(get_queue_controller()) get_queue_controller()->destroy_queue(hsa_queue);
return HSA_STATUS_SUCCESS;
}
constexpr rocprofiler_agent_t default_agent =
rocprofiler_agent_t{.size = sizeof(rocprofiler_agent_t),
.id = rocprofiler_agent_id_t{std::numeric_limits<uint64_t>::max()},
.type = ROCPROFILER_AGENT_TYPE_NONE,
.cpu_cores_count = 0,
.simd_count = 0,
.mem_banks_count = 0,
.caches_count = 0,
.io_links_count = 0,
.cpu_core_id_base = 0,
.simd_id_base = 0,
.max_waves_per_simd = 0,
.lds_size_in_kb = 0,
.gds_size_in_kb = 0,
.num_gws = 0,
.wave_front_size = 0,
.num_xcc = 0,
.cu_count = 0,
.array_count = 0,
.num_shader_banks = 0,
.simd_arrays_per_engine = 0,
.cu_per_simd_array = 0,
.simd_per_cu = 0,
.max_slots_scratch_cu = 0,
.gfx_target_version = 0,
.vendor_id = 0,
.device_id = 0,
.location_id = 0,
.domain = 0,
.drm_render_minor = 0,
.num_sdma_engines = 0,
.num_sdma_xgmi_engines = 0,
.num_sdma_queues_per_engine = 0,
.num_cp_queues = 0,
.max_engine_clk_ccompute = 0,
.max_engine_clk_fcompute = 0,
.sdma_fw_version = {},
.fw_version = {},
.capability = {},
.cu_per_engine = 0,
.max_waves_per_cu = 0,
.family_id = 0,
.workgroup_max_size = 0,
.grid_max_size = 0,
.local_mem_size = 0,
.hive_id = 0,
.gpu_id = 0,
.workgroup_max_dim = {0, 0, 0},
.grid_max_dim = {0, 0, 0},
.mem_banks = nullptr,
.caches = nullptr,
.io_links = nullptr,
.name = nullptr,
.vendor_name = nullptr,
.product_name = nullptr,
.model_name = nullptr,
.node_id = 0,
.logical_node_id = 0};
} // namespace
void
QueueController::add_queue(hsa_queue_t* id, std::unique_ptr<Queue> queue)
{
for(auto& pre_initialize_fn : pre_initialize)
pre_initialize_fn(queue->get_agent(), get_core_table(), get_ext_table());
CHECK(queue);
_callback_cache.wlock([&](auto& callbacks) {
_queues.wlock([&](auto& map) {
const auto agent_id = queue->get_agent().get_rocp_agent()->id.handle;
map[id] = std::move(queue);
for(const auto& [cbid, cb_tuple] : callbacks)
{
auto& [agent, qcb, ccb] = cb_tuple;
if(agent.id.handle == default_agent.id.handle || agent.id.handle == agent_id)
{
map[id]->register_callback(cbid, qcb, ccb);
}
}
});
});
}
void
QueueController::destroy_queue(hsa_queue_t* id)
{
if(!id) return;
_queues.wlock([&](auto& map) {
for(auto& deinitialize_fn : pre_deinitialize)
if(map.find(id) != map.end())
deinitialize_fn(map.at(id)->get_agent(), get_core_table(), get_ext_table());
});
const auto* queue = get_queue(*id);
// return if queue does not exist
if(!queue) return;
ROCP_INFO << "destroying queue...";
queue->sync();
if(queue->block_signal.handle != 0) get_core_table().hsa_signal_destroy_fn(queue->block_signal);
_queues.wlock([&](auto& map) { map.erase(id); });
ROCP_INFO << "queue destroyed";
}
ClientID
QueueController::add_callback(std::optional<rocprofiler_agent_t> agent,
Queue::queue_cb_t qcb,
Queue::completed_cb_t ccb)
{
static std::atomic<ClientID> client_id = 1;
ClientID return_id;
_callback_cache.wlock([&](auto& cb_cache) {
return_id = client_id;
if(agent)
{
cb_cache[client_id] = std::tuple(*agent, qcb, ccb);
}
else
{
cb_cache[client_id] = std::tuple(default_agent, qcb, ccb);
}
client_id++;
_queues.wlock([&](auto& map) {
for(auto& [_, queue] : map)
{
if(!agent || queue->get_agent().get_rocp_agent()->id.handle == agent->id.handle)
{
queue->register_callback(return_id, qcb, ccb);
}
}
});
});
return return_id;
}
void
QueueController::remove_callback(ClientID id)
{
_callback_cache.wlock([&](auto& cb_cache) {
cb_cache.erase(id);
_queues.wlock([&](auto& map) {
for(auto& [_, queue] : map)
{
queue->remove_callback(id);
}
});
});
}
void
QueueController::init(CoreApiTable& core_table, AmdExtTable& ext_table)
{
_core_table = core_table;
_ext_table = ext_table;
auto agents = agent::get_agents();
// Generate supported agents
for(const auto* itr : agents)
{
const auto* cached_agent = agent::get_agent_cache(itr);
if(cached_agent && cached_agent->get_rocp_agent()->type == ROCPROFILER_AGENT_TYPE_GPU)
{
get_supported_agents().emplace(cached_agent->index(), *cached_agent);
}
}
auto enable_intercepter = false;
for(const auto& itr : context::get_registered_contexts())
{
constexpr auto expected_context_size = 200UL;
static_assert(
sizeof(context::context) ==
expected_context_size + sizeof(std::shared_ptr<rocprofiler::ThreadTracer>),
"If you added a new field to context struct, make sure there is a check here if it "
"requires queue interception. Once you have done so, increment expected_context_size");
bool has_kernel_tracing =
(itr->callback_tracer &&
itr->callback_tracer->domains(ROCPROFILER_CALLBACK_TRACING_KERNEL_DISPATCH)) ||
(itr->buffered_tracer &&
itr->buffered_tracer->domains(ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH));
if(itr->counter_collection || itr->pc_sampler || has_kernel_tracing)
{
enable_intercepter = true;
break;
}
else if(itr->thread_trace)
{
enable_intercepter = true;
std::weak_ptr<rocprofiler::ThreadTracer> trace = itr->thread_trace;
pre_initialize.emplace_back(
[trace](const AgentCache& cache, const CoreApiTable& core, const AmdExtTable& ext) {
if(auto locked = trace.lock()) locked->resource_init(cache, core, ext);
});
pre_deinitialize.emplace_back(
[trace](const AgentCache& cache, const CoreApiTable&, const AmdExtTable&) {
if(auto locked = trace.lock()) locked->resource_deinit(cache);
});
}
}
if(enable_intercepter)
{
core_table.hsa_queue_create_fn = hsa::create_queue;
core_table.hsa_queue_destroy_fn = hsa::destroy_queue;
}
}
const Queue*
QueueController::get_queue(const hsa_queue_t& _hsa_queue) const
{
return _queues.rlock(
[](const queue_map_t& _data, const hsa_queue_t& _inp) -> const Queue* {
for(const auto& itr : _data)
{
if(itr.first->id == _inp.id) return itr.second.get();
}
return nullptr;
},
_hsa_queue);
}
void
QueueController::disable_serialization()
{
_queues.rlock([](const queue_map_t& _queues_v) {
if(get_queue_controller())
get_queue_controller()->serializer().wlock(
[&](auto& serializer) { serializer.disable(_queues_v); });
});
}
void
QueueController::enable_serialization()
{
_queues.rlock([](const queue_map_t& _queues_v) {
if(get_queue_controller())
get_queue_controller()->serializer().wlock(
[&](auto& serializer) { serializer.enable(_queues_v); });
});
}
void
QueueController::print_debug_signals() const
{
#if !defined(NDEBUG)
_debug_signals.rlock([&](const auto& signals) {
for(const auto& [id, signal] : signals)
{
ROCP_ERROR << "Signal " << signal.handle << " "
<< get_core_table().hsa_signal_load_scacquire_fn(signal);
}
});
#endif
_queues.rlock([&](const auto& queues) {
for(const auto& [_, queue] : queues)
{
ROCP_ERROR << "Queue " << queue->get_id().handle << " " << queue->ready_signal.handle
<< ":" << get_core_table().hsa_signal_load_scacquire_fn(queue->ready_signal)
<< " " << queue->block_signal.handle << ":"
<< get_core_table().hsa_signal_load_scacquire_fn(queue->block_signal);
}
});
}
void
QueueController::set_queue_state(queue_state state, hsa_queue_t* hsa_queue)
{
_queues.wlock([&](auto& map) { map[hsa_queue]->set_state(state); });
}
void
QueueController::iterate_queues(const queue_iterator_cb_t& cb) const
{
_queues.rlock([&cb](const queue_map_t& _queues_v) {
for(const auto& itr : _queues_v)
{
if(itr.second) cb(itr.second.get());
}
});
}
void
QueueController::iterate_callbacks(const callback_iterator_cb_t& cb) const
{
_callback_cache.rlock([&cb](const auto& map) {
for(const auto& [cid, tuple] : map)
{
cb(cid, tuple);
}
});
}
QueueController*
get_queue_controller()
{
static auto*& controller = common::static_object<QueueController>::construct();
return controller;
}
void
queue_controller_init(HsaApiTable* table)
{
CHECK_NOTNULL(get_queue_controller())->init(*table->core_, *table->amd_ext_);
}
void
queue_controller_fini()
{
if(get_queue_controller())
get_queue_controller()->iterate_queues([](const Queue* _queue) { _queue->sync(); });
}
} // namespace hsa
} // namespace rocprofiler