rocm-systems/source/lib/rocprofiler/aql/packet_construct.cpp
Benjamin Welton 010693b795 Agent, Counters, and AQL (#55)
* Migrate XML counter defs and reader from v1/v2

* Current Working Set

* Modified parser

* Evaluate AST Start

* Update lib/common/xml

- move definitions out of class declaration

* Update lib/rocprofiler/counters/parser

- update build of bison and flex build
  - reproducible generation
- add ROCPROFILER_REGENERATE_COUNTERS_PARSER option
- fix namespacing

* Update lib/rocprofiler/counters/xml

- change location of XML files and install them

* Update lib/rocprofiler/counter/tests

- normalize the test names
- improve test failures (more clear about where failure is)

* Update lib/rocprofiler/counters

- fix namespace
- update to new XML metrics directory

* Update lib/rocprofiler/CMakeLists.txt

- link to object library

* Update lib/rocprofiler/hsa/types.hpp

- reorganize includes

* Add metric loading class/printers

* Agent Implementation

* Queue Implementation (#79)

* Queue Implementation

* API Implementation For Counters (part 1) (#80)

* API Implementation For Counters

* Bewelton/counter collection 3 (#84)

* Added counter sample

* More changes

* More changes

* Update samples/counter_collection

- mostly formatting

* Update include/rocprofiler/counters.h

- formatting

* Add lib.common/synchronized.hpp

- Synchronized struct

* Update lib/rocprofiler/counters/xml/basic_counters.xml

- whitespace

* Update scripts/patch-parser.cmake

- tweaks for consistency

* Update lib/rocprofiler/counters/parser/tests/parser_tests.cpp

- formatting

* Update lib/rocprofiler/counters/parser

- improve consistency in rocprofiler-expr-parser-patch
- update parser.{h,cpp} and scanner.cpp
  - formatting + regenerated

* Update lib/rocprofiler/aql

- formatting
- clang-tidy fixes
- guard against memory pool access errors

* Update lib/rocprofiler/aql/tests

- formatting
- update use of get_val
- normalize test names

* Update lib/rocprofiler/counters/tests

- formatting
- patch basic_counters and derived_counters
- normalize test names

* Update lib/rocprofiler/aql/tests

- set_tests_properties

* Update test labels

- fix minor issue with gtest labels

* Update lib/rocprofiler/counters

- formatting
- clang-tidy fixes

* Update lib/rocprofiler/hsa

- fix includes
- formatting
- clang-tidy fixes
- tweak to queue_controller_init interface

* Update lib/rocprofiler

- include fixes
- namespace fixes
- clang-tidy fixes
- formatting

* Update scripts/run-ci.py

- exclude counters/parser from code coverage (generated files)

* Update include/rocprofiler/counters.h

- fix doxygen comment

* Update lib/rocprofiler/aql/packet_construct.cpp

- guard against HSA_AMD_MEMORY_POOL_ACCESS_DISALLOWED_BY_DEFAULT and HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED

* Update lib/rocprofiler/counters/parser/raw_ast.hpp

- clang-tidy fixes

* Update lib/rocprofiler/counters/evaluate_ast.hpp

- clang-tidy fixes

* Update lib/rocprofiler/aql/tests

- disable packet_generation_single and packet_generation_multi tests
  - the entire implementation rocprofiler::get_ext_table() is incorrect

* Minor fixes before cleanup

* More changes

* More fixes

* More fixes

* source formatting (clang-format v11) (#99)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Revert PTL submodule

* Update scripts/run-ci.py

- exclude counters/parser from code coverage (generated files)

* Migrating counters state to context

* Linting

* source formatting (clang-format v11) (#101)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* revert run-ci

* Testing fixes

* More test changes

* Fix minor typo

* Small queue change

* Small queue change

* source formatting (clang-format v11) (#102)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* source formatting (clang-format v11) (#105)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Documentation Change

* More documentation fixes

* source formatting (clang-format v11) (#106)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Threading fixes

* Threading fixes

* source formatting (clang-format v11) (#107)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Threading fixes

* More test fixes

* More agent fixes

* More build fixes

* source formatting (clang-format v11) (#109)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* changed test timeouts

* Build fix

* Build fix

* Updates to agent

* source formatting (clang-format v11) (#114)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* cmake formatting (cmake-format) (#113)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* remove git worktree folder

* Doc update

* testing fix

* Another test fix

* More test changes

* Rebase

* source formatting (clang-format v11) (#116)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Documentation

* source formatting (clang-format v11) (#119)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* PTL Changes

* Minor agent fix for empty labels

* source formatting (clang-format v11) (#120)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Minor agent fix for empty labels

* Refactor read_map

* source formatting (clang-format v11) (#121)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

* Refactor read_map

* Cache fixes

* source formatting (clang-format v11) (#122)

Co-authored-by: bwelton <bwelton@users.noreply.github.com>

---------

Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bwelton <bwelton@users.noreply.github.com>
2023-10-16 15:41:40 -05:00

190 lines
7.1 KiB
C++

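#include "lib/rocprofiler/aql/packet_construct.hpp"
#include <fmt/core.h>
#include <hsa/hsa_ext_amd.h>
#include "glog/logging.h"
// Standard headers for facilities used directly in this file
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <map>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>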
namespace rocprofiler
{
namespace aql
{
AQLPacketConstruct::AQLPacketConstruct(const hsa::AgentCache& agent,
                                       const std::vector<counters::Metric>& metrics)
: _agent(agent)
{
    if(metrics.empty())
    {
        throw std::runtime_error("No metrics supplied");
    }
    // Validate that each counter exists and construct one event per block
    // instance of the counter.
    for(const auto& x : metrics)
    {
        auto query_info = get_query_info(_agent.get_agent(), x);
        _metrics.emplace_back().metric = x;
        uint32_t event_id = std::atoi(x.event().c_str());
        for(unsigned block_index = 0; block_index < query_info.instance_count; ++block_index)
        {
            _metrics.back().instances.push_back(
                {static_cast<hsa_ven_amd_aqlprofile_block_name_t>(query_info.id),
                 block_index,
                 event_id});
            // Initialized so the second check is well defined even if the
            // validation call does not write a result.
            bool validate_event_result = false;
            LOG_IF(FATAL,
                   hsa_ven_amd_aqlprofile_validate_event(_agent.get_agent(),
                                                         &_metrics.back().instances.back(),
                                                         &validate_event_result) !=
                       HSA_STATUS_SUCCESS);
            LOG_IF(FATAL, !validate_event_result)
                << "Invalid Metric: " << block_index << " " << event_id;
        }
    }
    // Check that we can collect all of the metrics in a single execution
    // with a single AQL packet
    can_collect();
    _events = get_all_events();
}
std::unique_ptr<hsa::AQLPacket>
AQLPacketConstruct::construct_packet(const AmdExtTable& ext) const
{
    const size_t MEM_PAGE_MASK = 0x1000 - 1;
    auto pkt_ptr = std::make_unique<hsa::AQLPacket>(ext.hsa_amd_memory_pool_free_fn);
    auto& pkt = *pkt_ptr;
    if(_events.empty())
    {
        throw std::runtime_error("Constructing packet with no events");
    }
    pkt.profile = hsa_ven_amd_aqlprofile_profile_t{
        _agent.get_agent(),
        HSA_VEN_AMD_AQLPROFILE_EVENT_TYPE_PMC,  // SPM?
        _events.data(),
        static_cast<uint32_t>(_events.size()),
        nullptr,
        0u,
        hsa_ven_amd_aqlprofile_descriptor_t{.ptr = nullptr, .size = 0},
        hsa_ven_amd_aqlprofile_descriptor_t{.ptr = nullptr, .size = 0}};
    auto& profile = pkt.profile;
    // Query how this agent may access the kernarg pool, which backs the output
    // buffer below. The default value means a failed query is treated the same
    // as "never allowed".
    hsa_amd_memory_pool_access_t _access = HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED;
    ext.hsa_amd_agent_memory_pool_get_info_fn(_agent.get_agent(),
                                              _agent.kernarg_pool(),
                                              HSA_AMD_AGENT_MEMORY_POOL_INFO_ACCESS,
                                              static_cast<void*>(&_access));
    // Counter collection cannot work if the agent can never access the pool.
    if(_access == HSA_AMD_MEMORY_POOL_ACCESS_NEVER_ALLOWED)
    {
        throw std::runtime_error(
            fmt::format("Agent {} does not allow memory pool access for counter collection",
                        _agent.get_agent().handle));
    }
    auto throw_if_failed = [](auto status, auto& message) {
        if(status != HSA_STATUS_SUCCESS)
        {
            throw std::runtime_error(message);
        }
    };
    // First call with a null packet: only fills in the required buffer sizes.
    throw_if_failed(hsa_ven_amd_aqlprofile_start(&profile, nullptr),
                    "could not generate packet sizes");
    if(profile.command_buffer.size == 0 || profile.output_buffer.size == 0)
    {
        throw std::runtime_error(
            fmt::format("No command or output buffer size set. CMD_BUF={} PROFILE_BUF={}",
                        profile.command_buffer.size,
                        profile.output_buffer.size));
    }
    // Allocate a page-aligned buffer from the pool, falling back to malloc if
    // the pool allocation fails. Returns true if the malloc fallback was used.
    auto alloc_and_check = [&](auto& pool, auto** mem_loc, auto size) -> bool {
        bool malloced = false;
        size_t page_aligned = (size + MEM_PAGE_MASK) & ~MEM_PAGE_MASK;
        if(ext.hsa_amd_memory_pool_allocate_fn(
               pool, page_aligned, 0, static_cast<void**>(mem_loc)) != HSA_STATUS_SUCCESS)
        {
            *mem_loc = malloc(page_aligned);
            CHECK(*mem_loc) << "Error: fallback host allocation failed";
            malloced = true;
        }
        else
        {
            CHECK(*mem_loc);
            hsa_agent_t agent = _agent.get_agent();
            // Memory is accessible by both the GPU and CPU, unlock the buffer
            // for sharing.
            LOG_IF(FATAL,
                   ext.hsa_amd_agents_allow_access_fn(1, &agent, nullptr, *mem_loc) !=
                       HSA_STATUS_SUCCESS)
                << "Error: Allowing access to Command Buffer";
        }
        return malloced;
    };
    // Build command and output buffers
    pkt.command_buf_mallocd = alloc_and_check(
        _agent.cpu_pool(), &profile.command_buffer.ptr, profile.command_buffer.size);
    pkt.output_buffer_malloced = alloc_and_check(
        _agent.kernarg_pool(), &profile.output_buffer.ptr, profile.output_buffer.size);
    memset(profile.output_buffer.ptr, 0x0, profile.output_buffer.size);
    // throw if we do not construct the packets correctly.
    throw_if_failed(hsa_ven_amd_aqlprofile_start(&profile, &pkt.start),
                    "could not generate start packet");
    throw_if_failed(hsa_ven_amd_aqlprofile_stop(&profile, &pkt.stop),
                    "could not generate stop packet");
    throw_if_failed(hsa_ven_amd_aqlprofile_read(&profile, &pkt.read),
                    "could not generate read packet");
    return pkt_ptr;
}
std::vector<hsa_ven_amd_aqlprofile_event_t>
AQLPacketConstruct::get_all_events() const
{
    std::vector<hsa_ven_amd_aqlprofile_event_t> ret;
    for(const auto& metric : _metrics)
    {
        ret.insert(ret.end(), metric.instances.begin(), metric.instances.end());
    }
    return ret;
}
void
AQLPacketConstruct::can_collect()
{
    // Verify that the counters fit within hardware limits
    std::map<std::pair<hsa_ven_amd_aqlprofile_block_name_t, uint32_t>, int64_t> counter_count;
    std::map<std::pair<hsa_ven_amd_aqlprofile_block_name_t, uint32_t>, int64_t> max_allowed;
    for(auto& metric : _metrics)
    {
        for(auto& instance : metric.instances)
        {
            auto block_pair = std::make_pair(instance.block_name, instance.block_index);
            auto [iter, inserted] = counter_count.emplace(block_pair, 0);
            iter->second++;
            // Look up the hardware limit only once per (block, index) pair
            if(inserted)
            {
                max_allowed.emplace(block_pair, get_block_counters(_agent.get_agent(), instance));
            }
        }
    }
    // Check if the block count > max count
    for(auto& [block_name, count] : counter_count)
    {
        if(auto* max = CHECK_NOTNULL(common::get_val(max_allowed, block_name)); count > *max)
        {
            throw std::runtime_error(
                fmt::format("Block {} exceeds max number of hardware counters ({} > {})",
                            static_cast<int64_t>(block_name.first),
                            count,
                            *max));
        }
    }
}
} // namespace aql
} // namespace rocprofiler
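
The buffer sizes in construct_packet are rounded up to a 4 KiB page with the expression (size + MEM_PAGE_MASK) & ~MEM_PAGE_MASK. The following is a minimal standalone sketch of that arithmetic, not code from rocprofiler: the helper name round_up_to_page and its page_size parameter are hypothetical, and the trick is only valid when the page size is a power of two.

```cpp
#include <cstddef>

// Hypothetical helper (not part of rocprofiler): round `size` up to the next
// multiple of `page_size`. Assumes `page_size` is a power of two, so that
// `page_size - 1` is a mask of the low bits.
constexpr std::size_t
round_up_to_page(std::size_t size, std::size_t page_size = 0x1000)
{
    const std::size_t mask = page_size - 1;
    // Adding the mask carries into the next page unless `size` is already
    // aligned; clearing the low bits then drops the remainder.
    return (size + mask) & ~mask;
}

// Compile-time checks of the boundary cases.
static_assert(round_up_to_page(0) == 0);
static_assert(round_up_to_page(1) == 0x1000);
static_assert(round_up_to_page(0x1000) == 0x1000);
static_assert(round_up_to_page(0x1001) == 0x2000);
```

An already aligned size is returned unchanged, so a request of exactly 0x1000 bytes is not padded to two pages.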