#include "lib/rocprofiler/aql/packet_construct.hpp"

#include <fmt/core.h>
#include <hsa/hsa_ext_amd.h>
#include "glog/logging.h"

#include <cstdlib>
#include <cstring>
#include <map>
#include <stdexcept>

namespace rocprofiler
{
namespace aql
{
AQLPacketConstruct::AQLPacketConstruct(const hsa::AgentCache&               agent,
                                       const std::vector<counters::Metric>& metrics)
: _agent(agent)
{
    if(metrics.empty())
    {
        throw std::runtime_error("No metrics supplied");
    }

    // Validate that the counter exists and construct the block instances
    // for the counter.
    for(const auto& x : metrics)
    {
        auto query_info = get_query_info(_agent.get_agent(), x);
        _metrics.emplace_back().metric = x;
        auto event_id = static_cast<uint32_t>(std::atoi(x.event().c_str()));
        for(unsigned block_index = 0; block_index < query_info.instance_count; ++block_index)
        {
            _metrics.back().instances.push_back(
                {static_cast<hsa_ven_amd_aqlprofile_block_name_t>(query_info.id),
                 block_index,
                 event_id});
            bool validate_event_result = false;
            LOG_IF(FATAL,
                   hsa_ven_amd_aqlprofile_validate_event(_agent.get_agent(),
                                                         &_metrics.back().instances.back(),
                                                         &validate_event_result) !=
                       HSA_STATUS_SUCCESS);
            LOG_IF(FATAL, !validate_event_result)
                << "Invalid Metric: " << block_index << " " << event_id;
        }
    }
    // Check that we can collect all of the metrics in a single execution
    // with a single AQL packet.
    can_collect();
    _events = get_all_events();
}

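/*
Illustrative sketch (hypothetical, not part of rocprofiler). The constructor
above converts each metric's event string with std::atoi. std::atoi returns 0
for non-numeric input and cannot report errors, so a malformed event string
quietly becomes counter id 0. One stricter option is std::from_chars (C++17).
parse_event_id is a hypothetical helper that shows this approach.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse a decimal event id. Unlike std::atoi it rejects
// empty input, signs, leading whitespace, trailing characters, and values that
// do not fit in std::uint32_t.
std::optional<std::uint32_t>
parse_event_id(std::string_view text)
{
    std::uint32_t value = 0;
    const char*   first = text.data();
    const char*   last  = text.data() + text.size();

    auto [ptr, ec] = std::from_chars(first, last, value);
    if(ec != std::errc{} || ptr != last)
    {
        return std::nullopt;  // not a complete, in-range unsigned number
    }
    return value;
}
```

A caller could treat std::nullopt as an invalid metric definition rather than
passing counter id 0 on to hsa_ven_amd_aqlprofile_validate_event.
*/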
std::unique_ptr<hsa::AQLPacket>
AQLPacketConstruct::construct_packet(const AmdExtTable& ext) const
{
    constexpr size_t MEM_PAGE_MASK = 0x1000 - 1;
    auto             pkt_ptr = std::make_unique<hsa::AQLPacket>(ext.hsa_amd_memory_pool_free_fn);
    auto&            pkt     = *pkt_ptr;
    if(_events.empty())
    {
        throw std::runtime_error("Constructing packet with no events");
    }

    pkt.profile = hsa_ven_amd_aqlprofile_profile_t{
        _agent.get_agent(),
        HSA_VEN_AMD_AQLPROFILE_EVENT_TYPE_PMC,  // SPM?
        _events.data(),
        static_cast<uint32_t>(_events.size()),
        nullptr,
        0u,
        hsa_ven_amd_aqlprofile_descriptor_t{.ptr = nullptr, .size = 0},
        hsa_ven_amd_aqlprofile_descriptor_t{.ptr = nullptr, .size = 0}};
    auto& profile = pkt.profile;

    hsa_amd_memory_pool_access_t _access = HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED;
    ext.hsa_amd_agent_memory_pool_get_info_fn(_agent.get_agent(),
                                              _agent.kernarg_pool(),
                                              HSA_AMD_AGENT_MEMORY_POOL_INFO_ACCESS,
                                              static_cast<void*>(&_access));
    // The output buffer is allocated from the kernarg pool below, so the agent
    // must be allowed to access that pool.
    if(_access == HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED)
    {
        throw std::runtime_error(
            fmt::format("Agent {} does not allow memory pool access for counter collection",
                        _agent.get_agent().handle));
    }

    auto throw_if_failed = [](auto status, auto& message) {
        if(status != HSA_STATUS_SUCCESS)
        {
            throw std::runtime_error(message);
        }
    };

    // Calling start with a null packet only computes the required command and
    // output buffer sizes.
    throw_if_failed(hsa_ven_amd_aqlprofile_start(&profile, nullptr),
                    "could not generate packet sizes");

    if(profile.command_buffer.size == 0 || profile.output_buffer.size == 0)
    {
        throw std::runtime_error(
            fmt::format("No command or output buffer size set. CMD_BUF={} PROFILE_BUF={}",
                        profile.command_buffer.size,
                        profile.output_buffer.size));
    }

    // Allocate a page-aligned buffer from the given pool, falling back to
    // malloc if the pool allocation fails. Returns true if malloc was used.
    auto alloc_and_check = [&](auto& pool, auto** mem_loc, auto size) -> bool {
        bool   malloced     = false;
        size_t page_aligned = (size + MEM_PAGE_MASK) & ~MEM_PAGE_MASK;
        if(ext.hsa_amd_memory_pool_allocate_fn(
               pool, page_aligned, 0, static_cast<void**>(mem_loc)) != HSA_STATUS_SUCCESS)
        {
            *mem_loc = malloc(page_aligned);
            CHECK(*mem_loc) << "Error: malloc fallback allocation failed";
            malloced = true;
        }
        else
        {
            CHECK(*mem_loc);
            hsa_agent_t agent = _agent.get_agent();
            // Allow the GPU agent to access the pool allocation so the buffer
            // can be shared between the CPU and the GPU.
            LOG_IF(FATAL,
                   ext.hsa_amd_agents_allow_access_fn(1, &agent, nullptr, *mem_loc) !=
                       HSA_STATUS_SUCCESS)
                << "Error: Allowing agent access to buffer";
        }
        return malloced;
    };

    // Build command and output buffers
    pkt.command_buf_mallocd = alloc_and_check(
        _agent.cpu_pool(), &profile.command_buffer.ptr, profile.command_buffer.size);
    pkt.output_buffer_malloced = alloc_and_check(
        _agent.kernarg_pool(), &profile.output_buffer.ptr, profile.output_buffer.size);
    memset(profile.output_buffer.ptr, 0x0, profile.output_buffer.size);

    // Throw if we do not construct the packets correctly.
    throw_if_failed(hsa_ven_amd_aqlprofile_start(&profile, &pkt.start),
                    "could not generate start packet");
    throw_if_failed(hsa_ven_amd_aqlprofile_stop(&profile, &pkt.stop),
                    "could not generate stop packet");
    throw_if_failed(hsa_ven_amd_aqlprofile_read(&profile, &pkt.read),
                    "could not generate read packet");
    return pkt_ptr;
}

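/*
Illustrative sketch (hypothetical, not part of rocprofiler). alloc_and_check
rounds each request up to a 4 KiB page with (size + MASK) & ~MASK. If the pool
allocation fails, it falls back to malloc and records which allocator was used,
presumably so the packet can later release each buffer with the matching
deallocator. The sketch below shows the same pattern. pool_alloc_fn is a
caller-supplied callback standing in for hsa_amd_memory_pool_allocate.
Allocation, page_align and alloc_with_fallback are hypothetical names. The mask
trick assumes the page size is a power of two.

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>

// Round size up to the next multiple of page_size. The mask arithmetic only
// works when page_size is a power of two (0x1000 = 4 KiB by default).
constexpr std::size_t
page_align(std::size_t size, std::size_t page_size = 0x1000)
{
    const std::size_t mask = page_size - 1;
    return (size + mask) & ~mask;
}

// Hypothetical stand-in for a pool allocator: returns true and writes the
// allocation to *out on success, false on failure.
using pool_alloc_fn = std::function<bool(std::size_t bytes, void** out)>;

struct Allocation
{
    void*       ptr      = nullptr;
    std::size_t size     = 0;
    bool        malloced = false;  // true: release with std::free, not the pool
};

// Try the pool first. On failure fall back to std::malloc and record it so
// the caller can pick the matching deallocator later.
Allocation
alloc_with_fallback(const pool_alloc_fn& pool_alloc, std::size_t size)
{
    Allocation a;
    a.size = page_align(size);
    if(!pool_alloc(a.size, &a.ptr))
    {
        a.ptr      = std::malloc(a.size);
        a.malloced = true;
    }
    return a;
}
```

In alloc_and_check above, only the pool-allocated branch calls
hsa_amd_agents_allow_access_fn. The malloc branch does not.
*/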
std::vector<hsa_ven_amd_aqlprofile_event_t>
AQLPacketConstruct::get_all_events() const
{
    std::vector<hsa_ven_amd_aqlprofile_event_t> ret;
    for(const auto& metric : _metrics)
    {
        ret.insert(ret.end(), metric.instances.begin(), metric.instances.end());
    }
    return ret;
}

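/*
Illustrative sketch (hypothetical, not part of rocprofiler). get_all_events
concatenates every metric's block instances into one contiguous array, which
construct_packet passes to the profile through _events.data(). The sketch below
performs the same flattening. Event and MetricInstances are hypothetical
stand-ins for hsa_ven_amd_aqlprofile_event_t and the _metrics entries. The
sketch also reserves capacity up front.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for hsa_ven_amd_aqlprofile_event_t and for one
// entry of AQLPacketConstruct::_metrics.
struct Event
{
    int           block_name;
    std::uint32_t block_index;
    std::uint32_t counter_id;
};

struct MetricInstances
{
    std::vector<Event> instances;
};

// Concatenate every metric's instances, in order, into one contiguous vector.
// Reserving the total first avoids repeated reallocation while inserting.
std::vector<Event>
flatten_events(const std::vector<MetricInstances>& metrics)
{
    std::size_t total = 0;
    for(const auto& m : metrics)
        total += m.instances.size();

    std::vector<Event> out;
    out.reserve(total);
    for(const auto& m : metrics)
        out.insert(out.end(), m.instances.begin(), m.instances.end());
    return out;
}
```

construct_packet stores _events.data() in the profile. Therefore _events must
outlive the packet and must not be resized while the packet is in use.
*/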
void
AQLPacketConstruct::can_collect()
{
    // Verify that the counters fit within hardware limits
    std::map<std::pair<hsa_ven_amd_aqlprofile_block_name_t, uint32_t>, int64_t> counter_count;
    std::map<std::pair<hsa_ven_amd_aqlprofile_block_name_t, uint32_t>, int64_t> max_allowed;
    for(auto& metric : _metrics)
    {
        for(auto& instance : metric.instances)
        {
            auto block_pair       = std::make_pair(instance.block_name, instance.block_index);
            auto [iter, inserted] = counter_count.emplace(block_pair, 0);
            iter->second++;
            if(inserted)
            {
                max_allowed.emplace(block_pair, get_block_counters(_agent.get_agent(), instance));
            }
        }
    }

    // Throw if any block instance requests more counters than the hardware supports
    for(auto& [block_name, count] : counter_count)
    {
        if(auto* max = CHECK_NOTNULL(common::get_val(max_allowed, block_name)); count > *max)
        {
            throw std::runtime_error(
                fmt::format("Block {} exceeds max number of hardware counters ({} > {})",
                            static_cast<int64_t>(block_name.first),
                            count,
                            *max));
        }
    }
}
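
/*
Illustrative sketch (hypothetical, not part of rocprofiler). can_collect counts
the requested counters for each (block, instance) pair. It rejects the
configuration if any count exceeds the limit reported by get_block_counters.
The sketch below implements that check on its own. BlockEvent, BlockKey and
check_block_limits are hypothetical names. A callback stands in for querying
the agent.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified event: which hardware block and which instance of it.
struct BlockEvent
{
    int           block;
    std::uint32_t index;
};

using BlockKey = std::pair<int, std::uint32_t>;

// Count requested counters per (block, instance) and compare each count with
// the limit returned by max_counters, a stand-in for querying the agent.
// Throws std::runtime_error for the first key (in map order) over its limit.
void
check_block_limits(const std::vector<BlockEvent>&                 events,
                   const std::function<std::int64_t(int block)>& max_counters)
{
    std::map<BlockKey, std::int64_t> counts;
    for(const auto& e : events)
        ++counts[BlockKey{e.block, e.index}];

    for(const auto& [key, count] : counts)
    {
        const std::int64_t limit = max_counters(key.first);
        if(count > limit)
        {
            throw std::runtime_error("block " + std::to_string(key.first) + " instance " +
                                     std::to_string(key.second) + " exceeds hardware counters (" +
                                     std::to_string(count) + " > " + std::to_string(limit) + ")");
        }
    }
}
```

As in can_collect, the limit applies per (block, instance) key. Requesting one
counter on each of two instances of a block counts once against each instance.
*/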
}  // namespace aql
}  // namespace rocprofiler