rocm-systems/source/lib/rocprofiler-sdk/code_object/code_object.cpp
Madsen, Jonathan 00c46fd5e5 SDK: OMPT Support (#22)
* Ability to select alternative compiler per file

Implementation of ompt interface to rocprofiler SDK. task_create and task_schedule are not supported.

Misc updates

Update OpenMP target sample

- samples/ompt -> samples/openmp_target
- fix sample test of openmp-target
- reorganize files

Rework OpenMP implementation

Minor OpenMP implementation cleanup

Rename samples/openmp_target CMake targets

Add tests/bin/openmp

- OpenMP target test app in tests/bin/openmp/target

Format samples/openmp_target CMakeLists.txt

Misc lib/rocprofiler-sdk/openmp cleanup

- fix includes
- convert_arg

Update openmp.def.cpp

- tweak includes
- remove lots of temporary variables

Update samples

- common::get_callback_id_names() -> common::get_callback_tracing_names()
- add kernel dispatch, memory copy, scratch memory buffered tracing to openmp target sample

Fix code object operation names

- add "CODE_OBJECT_" prefix
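The prefix change can be illustrated with a small sketch. The enum `demo_operation`, the struct `demo_info`, and the macro `DEMO_SPECIALIZE_INFO` are hypothetical stand-ins rather than the SDK's definitions; they only show how token pasting and stringification can produce names such as `CODE_OBJECT_LOAD`.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for an operation enum (names are illustrative only).
enum demo_operation : std::size_t
{
    DEMO_CODE_OBJECT_NONE = 0,
    DEMO_CODE_OBJECT_LOAD,
    DEMO_CODE_OBJECT_LAST,
};

template <std::size_t OpIdx>
struct demo_info;

// `##` pastes the enumerator name together; `#` stringifies the operation so the
// adjacent string literals concatenate into "CODE_OBJECT_<OPERATION>".
#define DEMO_SPECIALIZE_INFO(OPERATION)                                                  \
    template <>                                                                          \
    struct demo_info<DEMO_CODE_OBJECT_##OPERATION>                                       \
    {                                                                                    \
        static constexpr auto name = "CODE_OBJECT_" #OPERATION;                          \
    };

DEMO_SPECIALIZE_INFO(NONE)
DEMO_SPECIALIZE_INFO(LOAD)
#undef DEMO_SPECIALIZE_INFO

// Returns the prefixed name for a compile-time operation index.
template <std::size_t OpIdx>
constexpr std::string_view
demo_name()
{
    return std::string_view{demo_info<OpIdx>::name};
}

static_assert(demo_name<DEMO_CODE_OBJECT_LOAD>() == "CODE_OBJECT_LOAD",
              "stringified name carries the prefix");
```

Because the prefix lives in the macro, every specialization picks it up without touching the individual operation entries.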

Update include/rocprofiler-sdk/openmp/api_id.h

- remove spurious comment

Miscellaneous openmp updates

- similar API for openmp_begin and openmp_end
- move implementations of ompt callbacks to openmp.cpp
- ompt_{thread_begin,thread_end,parallel_begin,parallel_end}_callbacks are openmp_events

[SWDEV-484495] Fix int truncation in CSV output (#1098)

CSV output truncates doubles to ints when it shouldn't. Derived metrics
are (mostly) doubles and lose precision (or become worthless) if treated
as an int. Converted these to double to match the format we return from
rocprof-sdk.

Co-authored-by: Benjamin Welton <ben@amd.com>
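The effect of the truncation described in the CSV fix above can be sketched as follows. The helpers `csv_row_truncated` and `csv_row_double` are hypothetical and are not the tool's actual output code; they only contrast an integer cast with writing the value as a double.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats one CSV row as "<name>,<value>".
// Casting a derived metric to an integer drops the fractional part.
inline std::string
csv_row_truncated(const std::string& name, double value)
{
    auto oss = std::ostringstream{};
    oss << name << "," << static_cast<long long>(value);
    return oss.str();
}

// Writing the value as a double preserves the fractional part.
inline std::string
csv_row_double(const std::string& name, double value)
{
    auto oss = std::ostringstream{};
    oss << name << "," << value;
    return oss.str();
}
```

For a ratio-style metric such as 0.75, the first helper writes `0` while the second writes `0.75`, which is why a derived metric can become worthless when treated as an int.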

Update limit for max counter records in rocprof-tool (#1073)

A fixed-size std::array is used to store counter records in rocprofiler SDK. This limit was breached in SWDEV-484742. Raising the limit to 512 makes it less likely to be reached again.
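A minimal sketch of a fixed-capacity record store, assuming a hypothetical `fixed_records` type; the default capacity of 512 mirrors the limit mentioned above, but the interface is illustrative and not the SDK's.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-capacity record store backed by std::array.
template <typename Tp, std::size_t Capacity = 512>
class fixed_records
{
public:
    // Returns false instead of writing out of bounds once the capacity is reached.
    bool push(const Tp& value)
    {
        if(m_size >= Capacity) return false;
        m_data[m_size++] = value;
        return true;
    }

    std::size_t                  size() const { return m_size; }
    static constexpr std::size_t capacity() { return Capacity; }

private:
    std::array<Tp, Capacity> m_data = {};
    std::size_t              m_size = 0;
};
```

A fixed capacity avoids dynamic allocation, at the cost of a hard upper bound that callers have to check for.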

adding proxy ompt_data_t * arguments

fixes for proxy pointers

- Implement proxy ompt_data_t* pointers for clients
- Add ompt_data_t* arguments back to callback API
- Modify openmp sample to illustrate use of proxy pointers

formatting

SWDEV-467350: Skipping tool counter iteration for unsupported hardware (#1083)

Fixing some accumulate metrics (#1089)

* Fixing some accumulate metrics

* Fixing some more accumulate metrics

---------

Co-authored-by: Benjamin Welton <bewelton@amd.com>

updating rocprofv3 help options (#1113)

* updating rocprofv3 help options

* updating CHANGELOG

Fixing installed package tests in CI (#1119)

* Fixing installed package tests in CI

* Formatted rocprofv3.py with black formatter

SWDEV-488948: PC Sampling - Correlation class to provide some thread safety. Adding multithread tests. (#1112)

* SWDEV-488948: PC Sampling - Correlation class to provide some thread safety. Adding multithread tests.

* Update source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/parser/correlation.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Adding backlog for codeobj changes

* Formatting

* Update source/lib/rocprofiler-sdk/pc_sampling/code_object.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

* Update source/lib/rocprofiler-sdk/pc_sampling/code_object.hpp

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

---------

Co-authored-by: Vladimir Indic <139573562+vlaindic@users.noreply.github.com>

SWDEV-487621: Fixes for metric definitions (#1118)

* Fixes for metric definitions

* Removing gfx8

* Update changelog

* Fixing unit tests

* Small fixes

* Fix for write size

Fix PSDB change (#1120)

Reverts change to `source/include/rocprofiler-sdk/callback_tracing.h`
from commit 9b2ece76c3

clang-18 build fix for RCCL (#1123)

Removes ambiguity on const usage, which clang-18 complains about
(preventing build with warn error).

mem copy direction field update (#1124)

Adding Node-id for debugging with log level trace (#1090)

fix botched rebase

Per Jonathan to remove -rdynamic warning so CI will continue

pedantic formatting

Correct the package name of rocprofiler-sdk (#1126)

* Correct the package name of rocprofiler-sdk

ROCM VERSION(for ex: 60300) was missing in the package name.
Added the same

* Use cmake cache string while setting the variable for ROCm Version

* correct the cmake-format

---------

Co-authored-by: Ranjith Ramakrishnan <Ranjith.Ramakrishnan@amd.com>

Fixing kokkosp tool library packaging (#1121)

* Fixing kokkosp tool library packaging

* Update source/lib/rocprofiler-sdk-tool/kokkosp/CMakeLists.txt

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update CMakeLists.txt

* Update CMakeLists.txt

* Component Requirement in CPack

* Adding package dependency

* Update CMakeLists.txt

* Update rocprofiler_config_packaging.cmake

* Fix rocprofiler-sdk-tool-kokkosp BUILD/INSTALL RPATH

- CMAKE_INSTALL_LIBDIR doesn't help

* Add BUILD/INSTALL RPATH to rocprofv3-trigger-list-metrics

- fixes packaging issues

* Update packaging

- core depends on rocprofiler-sdk-roctx
- add CPACK_DEBIAN_PACKAGE_SHLIBDEPS_PRIVATE_DIRS to resolve inter-package dependencies

* Fix package depends version format

* Improve tests/rocprofv3/summary/validate logging

* Update CI workflow

- prioritize roctx package in Install Packages step

* Remove setting <package-name>_VERSION in config.cmake.in

- this is automatically handled by existence of <package-name>-config-version.cmake

* Update rocprofiler-sdk-config.cmake

- relax find_package versioning requirements to same major and minor version

* Update rocprofiler-sdk-config.cmake

- relax find_package versioning requirements (remove EXACT, specify range)

* Tweak CI workflow

* Update perfetto_reader.py

- better handle failure to load trace processor

* Misc cleanup for config packaging

* Update config packaging

* Update config packaging

* Revert perfetto for core-rpm packages

* Revert perfetto for core-rpm packages

- perfetto < 0.9.0

* Tweak tests/rocprofv3/summary/validate.py

- reorder some checks

---------

Co-authored-by: Ammar Elwazir <aelwazir@useocpm2m-387-013.amd.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>

Clang Warning Fixes (#1131)

Builds prevented on clang-18

Adding start and end timestamp columns in csv (#1128)

* Adding start and end timestamp columns in csv

* Adding assert check for the counter timestamps

---------

Co-authored-by: Gopesh Bhardwaj <gopesh.bhardwaj@amd.com>

rocprofv3: docs and help menu updates (#1129)

* doc updates

* Correcting ROCtx information

* Making ROCTx string consistent

* missing occurrence

Renamed agent profiling service to device counting service (#1132)

* Renamed agent profiling service to device counting service

Name more aptly represents what agent profiling did (device wide
counter collection). Conversion of existing user code can be
performed by the following find/sed command:

find . -type f -exec sed -i 's/rocprofiler_agent_profile_callback_t/rocprofiler_device_counting_service_callback_t/g; s/rocprofiler_configure_agent_profile_counting_service/rocprofiler_configure_device_counting_service/g; s/agent_profile.h/device_counting_service.h/g; s/rocprofiler_sample_agent_profile_counting_service/rocprofiler_sample_device_counting_service/g' {} +

* Converted dispatch profile to dispatch counting service

* Debug for functional counters test

* Minor changes for CI

* Minor fix

* More fixes for CI

* Update evaluate_ast.cpp

---------

Co-authored-by: Benjamin Welton <ben@amd.com>

Testing updated RPM dockers (#1136)

* Testing updated RPM dockers

* Trying to fix PSDB for test package dependency

Agent Profiling Fixes for Broken/Improper API Usage (#1122)

Prevents multiple setups of agent profiling on the same agent.

Fixes agent read context to only read agents that were set up.

Prevent copy of agent profiling internal data struct and reset
hsa_signal on move to prevent inadvertent delete.
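The copy and move behaviour described above can be sketched with a move-only owner. `demo_signal`, `signal_owner`, and `destroy_count` are hypothetical names for illustration; a real owner would call the appropriate HSA destroy function where the counter is incremented.

```cpp
#include <cstdint>
#include <utility>

// Hypothetical stand-in for an owned handle; a value of 0 means "nothing to destroy".
struct demo_signal
{
    std::uint64_t handle = 0;
};

// Counts destroy calls so the ownership behaviour can be observed.
inline int&
destroy_count()
{
    static int _v = 0;
    return _v;
}

class signal_owner
{
public:
    explicit signal_owner(demo_signal sig)
    : m_signal{sig}
    {}
    ~signal_owner() { reset(); }

    // Copying would let two objects destroy the same handle.
    signal_owner(const signal_owner&) = delete;
    signal_owner& operator=(const signal_owner&) = delete;

    // Moving transfers ownership and clears the source handle.
    signal_owner(signal_owner&& rhs) noexcept
    : m_signal{std::exchange(rhs.m_signal, demo_signal{})}
    {}

    signal_owner& operator=(signal_owner&& rhs) noexcept
    {
        if(this != &rhs)
        {
            reset();
            m_signal = std::exchange(rhs.m_signal, demo_signal{});
        }
        return *this;
    }

    std::uint64_t handle() const { return m_signal.handle; }

private:
    void reset()
    {
        // a real owner would destroy the underlying handle here
        if(m_signal.handle != 0) ++destroy_count();
        m_signal = demo_signal{};
    }

    demo_signal m_signal = {};
};
```

Clearing the moved-from handle means the source object's destructor has nothing left to destroy, so the handle is released exactly once.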

Simplifying PR template (#1139)

delete unused files

added arguments to some OMPT buffer records

* Fix cmake issues

Remove rocprofiler_ompt_finalize_tool

- a public API function is not necessary: should just finalize rocprofiler-sdk

Fix duplicate ROCPROFILER_{BUFFER,CALLBACK}_TRACING_KIND_STRING

Add lib/rocprofiler-sdk/ompt.hpp

- declares rocprofiler::sdk::finalize_ompt

Remove change to tests/rocprofv3/summary/conftest.py

Add set_fini_status(1) back to registration.cpp

Deleted unneeded files

Incorporate OpenMP code and sample

Fix merge issues with amd-staging

Add push_correlation_id for OpenMP tasking; improve debuggability

fixup bad merge

* Suppress OpenMP data race

* Fix openmp_target sample

* Enum and struct name changes + source code reorg

- remove mix of ompt and openmp
  - opted for ompt
- changes made for consistency
  - ompt_api -> ompt
  - openmp_api -> ompt
  - OPENMP -> OMPT

* Update tests and more renaming

- dest_device_num -> dst_device_num
- src_addr -> src_address
- dest_addr -> dst_address
- remove info_type::begin
- require OMP_TARGET_OFFLOAD

* Update openmp-target test/sample env and labels

* Formatting

* Tweaks to cmake for openmp target

- Disable for thread sanitizers due to preloading issue

* OpenMP target cmake updates

- remove gfx1010 (fails on mi300)
- OPENMP_GPU_TARGETS

* Remove device_unload and target_map_emi support

- these are never supported by AMD OpenMP compilers

* Update CI workflow

- exclude openmp-target tests from navi3 and vega20

---------

Co-authored-by: Larry Meadows <Lawrence.Meadows@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2024-12-05 22:48:19 -06:00

1029 lines
39 KiB
C++

// MIT License
//
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.
#include "lib/rocprofiler-sdk/code_object/code_object.hpp"
#include "lib/common/scope_destructor.hpp"
#include "lib/common/static_object.hpp"
#include "lib/common/string_entry.hpp"
#include "lib/common/synchronized.hpp"
#include "lib/common/utility.hpp"
#include "lib/rocprofiler-sdk/agent.hpp"
#include "lib/rocprofiler-sdk/code_object/hsa/code_object.hpp"
#include "lib/rocprofiler-sdk/code_object/hsa/kernel_symbol.hpp"
#include "lib/rocprofiler-sdk/context/context.hpp"
#include "lib/rocprofiler-sdk/hsa/hsa.hpp"
#include <rocprofiler-sdk/callback_tracing.h>
#include <rocprofiler-sdk/fwd.h>
#include <rocprofiler-sdk/hsa.h>
#include <hsa/hsa.h>
#include <hsa/hsa_api_trace.h>
#include <hsa/hsa_ven_amd_loader.h>
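#include <algorithm>
#include <atomic>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <limits>
#include <memory>
#include <mutex>
#include <regex>
#include <string>
#include <string_view>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>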
namespace rocprofiler
{
namespace code_object
{
namespace
{
using context_t = context::context;
using context_array_t = common::container::small_vector<const context_t*>;
using external_corr_id_map_t = std::unordered_map<const context_t*, rocprofiler_user_data_t>;
template <size_t OpIdx>
struct code_object_info;
#define SPECIALIZE_CODE_OBJECT_INFO(OPERATION) \
template <> \
struct code_object_info<ROCPROFILER_CODE_OBJECT_##OPERATION> \
{ \
static constexpr auto operation_idx = ROCPROFILER_CODE_OBJECT_##OPERATION; \
static constexpr auto name = "CODE_OBJECT_" #OPERATION; \
};
SPECIALIZE_CODE_OBJECT_INFO(NONE)
SPECIALIZE_CODE_OBJECT_INFO(LOAD)
SPECIALIZE_CODE_OBJECT_INFO(DEVICE_KERNEL_SYMBOL_REGISTER)
#undef SPECIALIZE_CODE_OBJECT_INFO
template <size_t Idx, size_t... IdxTail>
const char*
name_by_id(const uint32_t id, std::index_sequence<Idx, IdxTail...>)
{
if(Idx == id) return code_object_info<Idx>::name;
if constexpr(sizeof...(IdxTail) > 0)
return name_by_id(id, std::index_sequence<IdxTail...>{});
else
return nullptr;
}
template <size_t Idx, size_t... IdxTail>
uint32_t
id_by_name(const char* name, std::index_sequence<Idx, IdxTail...>)
{
if(std::string_view{code_object_info<Idx>::name} == std::string_view{name})
return code_object_info<Idx>::operation_idx;
if constexpr(sizeof...(IdxTail) > 0)
return id_by_name(name, std::index_sequence<IdxTail...>{});
else
return ROCPROFILER_CODE_OBJECT_NONE;
}
template <size_t... Idx>
void
get_ids(std::vector<uint32_t>& _id_list, std::index_sequence<Idx...>)
{
auto _emplace = [](auto& _vec, uint32_t _v) {
if(_v < static_cast<uint32_t>(ROCPROFILER_CODE_OBJECT_LAST)) _vec.emplace_back(_v);
};
(_emplace(_id_list, code_object_info<Idx>::operation_idx), ...);
}
template <size_t... Idx>
void
get_names(std::vector<const char*>& _name_list, std::index_sequence<Idx...>)
{
auto _emplace = [](auto& _vec, const char* _v) {
if(_v != nullptr && strnlen(_v, 1) > 0) _vec.emplace_back(_v);
};
(_emplace(_name_list, code_object_info<Idx>::name), ...);
}
} // namespace
// check out the assembly here... this compiles to a switch statement
const char*
name_by_id(uint32_t id)
{
return name_by_id(id, std::make_index_sequence<ROCPROFILER_CODE_OBJECT_LAST>{});
}
uint32_t
id_by_name(const char* name)
{
return id_by_name(name, std::make_index_sequence<ROCPROFILER_CODE_OBJECT_LAST>{});
}
std::vector<uint32_t>
get_ids()
{
auto _data = std::vector<uint32_t>{};
_data.reserve(ROCPROFILER_CODE_OBJECT_LAST);
get_ids(_data, std::make_index_sequence<ROCPROFILER_CODE_OBJECT_LAST>{});
return _data;
}
std::vector<const char*>
get_names()
{
auto _data = std::vector<const char*>{};
_data.reserve(ROCPROFILER_CODE_OBJECT_LAST);
get_names(_data, std::make_index_sequence<ROCPROFILER_CODE_OBJECT_LAST>{});
return _data;
}
namespace
{
using hsa_loader_table_t = hsa_ven_amd_loader_1_01_pfn_t;
using context_t = context::context;
using user_data_t = rocprofiler_user_data_t;
using context_array_t = context::context_array_t;
using context_user_data_map_t = std::unordered_map<const context_t*, user_data_t>;
using amd_compute_pgm_rsrc_three32_t = uint32_t;
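// 64-byte kernel descriptor layout; only the fields read below are named
struct kernel_descriptor_t
{
    uint8_t  reserved0[16];                  // bytes  0-15
    int64_t  kernel_code_entry_byte_offset;  // bytes 16-23
    uint8_t  reserved1[20];                  // bytes 24-43
    uint32_t compute_pgm_rsrc3;              // bytes 44-47
    uint32_t compute_pgm_rsrc1;              // bytes 48-51
    uint32_t compute_pgm_rsrc2;              // bytes 52-55
    uint16_t kernel_code_properties;         // bytes 56-57
    uint8_t  reserved2[6];                   // bytes 58-63
};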
// AMD Compute Program Resource Register Three.
enum amd_compute_gfx9_pgm_rsrc_three_t
{
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_THREE_ACCUM_OFFSET, 0, 5),
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_THREE_TG_SPLIT, 16, 1)
};
enum amd_compute_gfx10_gfx11_pgm_rsrc_three_t
{
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_THREE_SHARED_VGPR_COUNT, 0, 4),
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_THREE_INST_PREF_SIZE, 4, 6),
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_THREE_TRAP_ON_START, 10, 1),
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_THREE_TRAP_ON_END, 11, 1),
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_COMPUTE_PGM_RSRC_THREE_IMAGE_OP, 31, 1)
};
// Kernel code properties.
enum amd_kernel_code_property_t
{
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER,
0,
1),
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_PTR, 1, 1),
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTY_ENABLE_SGPR_QUEUE_PTR, 2, 1),
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTY_ENABLE_SGPR_KERNARG_SEGMENT_PTR,
3,
1),
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTY_ENABLE_SGPR_DISPATCH_ID, 4, 1),
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTY_ENABLE_SGPR_FLAT_SCRATCH_INIT, 5, 1),
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTY_ENABLE_SGPR_PRIVATE_SEGMENT_SIZE,
6,
1),
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTY_RESERVED0, 7, 3),
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32,
10,
1), // GFX10+
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTY_USES_DYNAMIC_STACK, 11, 1),
AMD_HSA_BITS_CREATE_ENUM_ENTRIES(AMD_KERNEL_CODE_PROPERTY_RESERVED1, 12, 4),
};
uint32_t
arch_vgpr_count(std::string_view name, kernel_descriptor_t kernel_code)
{
if(name == "gfx90a" || name.find("gfx94") == 0)
return (AMD_HSA_BITS_GET(kernel_code.compute_pgm_rsrc3,
AMD_COMPUTE_PGM_RSRC_THREE_ACCUM_OFFSET) +
1) *
4;
return (AMD_HSA_BITS_GET(kernel_code.compute_pgm_rsrc1,
AMD_COMPUTE_PGM_RSRC_ONE_GRANULATED_WORKITEM_VGPR_COUNT) +
1) *
(AMD_HSA_BITS_GET(kernel_code.kernel_code_properties,
AMD_KERNEL_CODE_PROPERTY_ENABLE_WAVEFRONT_SIZE32)
? 8
: 4);
}
uint32_t
accum_vgpr_count(std::string_view name, kernel_descriptor_t kernel_code)
{
    if(name == "gfx908")
        return arch_vgpr_count(name, kernel_code);
    else if(name == "gfx90a" || name.find("gfx94") == 0)
        // unified register file: accumulation VGPRs are the total VGPR allocation
        // (granules of 8) minus the architectural VGPRs
        return (AMD_HSA_BITS_GET(kernel_code.compute_pgm_rsrc1,
                                 AMD_COMPUTE_PGM_RSRC_ONE_GRANULATED_WORKITEM_VGPR_COUNT) +
                1) *
                   8 -
               arch_vgpr_count(name, kernel_code);
    // warn only once per unsupported architecture name
    bool emplaced = false;
    {
        static auto warned = std::unordered_set<std::string>{};
        static auto mtx    = std::mutex{};
        auto        lk     = std::unique_lock<std::mutex>{mtx};
        emplaced           = warned.emplace(name).second;
    }
    ROCP_INFO_IF(emplaced) << "Missing support for accum_vgpr_count for " << name;
    return 0;
}
uint32_t
sgpr_count(std::string_view name, kernel_descriptor_t kernel_code)
{
    // GFX10 and later always allocate 128 sgprs.
    constexpr uint32_t gfx10_sgprs = 128;
    auto               begp        = name.find_first_of("0123456789");
    if(!name.empty() && begp != std::string_view::npos)
    {
        // extract only the leading run of digits, e.g. "gfx90a" -> "90", "gfx1030" -> "1030"
        auto endp      = name.find_first_not_of("0123456789", begp);
        auto lenp      = (endp == std::string_view::npos) ? endp : (endp - begp);
        auto gfxip_str = name.substr(begp, lenp);
        auto gfxip_n   = int32_t{0};
        if(!gfxip_str.empty()) gfxip_n = std::stoi(std::string{gfxip_str});
        if(gfxip_n >= 1000)
        {
            return gfx10_sgprs;
        }
        else
        {
            return (AMD_HSA_BITS_GET(kernel_code.compute_pgm_rsrc1,
                                     AMD_COMPUTE_PGM_RSRC_ONE_GRANULATED_WAVEFRONT_SGPR_COUNT) /
                        2 +
                    1) *
                   16;
        }
    }
    // warn only once per unsupported architecture name
    bool emplaced = false;
    {
        static auto warned = std::unordered_set<std::string>{};
        static auto mtx    = std::mutex{};
        auto        lk     = std::unique_lock<std::mutex>{mtx};
        emplaced           = warned.emplace(name).second;
    }
    ROCP_INFO_IF(emplaced) << "Missing support for sgpr_count for " << name;
    return 0;
}
hsa_loader_table_t&
get_loader_table()
{
static auto _v = []() {
auto _val = hsa_loader_table_t{};
memset(&_val, 0, sizeof(hsa_loader_table_t));
return _val;
}();
return _v;
}
auto*&
get_status_string_function()
{
static decltype(::hsa_status_string)* _v = nullptr;
return _v;
}
std::string_view
get_status_string(hsa_status_t _status)
{
const char* _msg = nullptr;
if(get_status_string_function() &&
get_status_string_function()(_status, &_msg) == HSA_STATUS_SUCCESS && _msg)
return std::string_view{_msg};
return std::string_view{"(unknown HSA error)"};
}
const kernel_descriptor_t*
get_kernel_descriptor(uint64_t kernel_object)
{
const kernel_descriptor_t* kernel_code = nullptr;
if(get_loader_table().hsa_ven_amd_loader_query_host_address == nullptr) return kernel_code;
hsa_status_t status = get_loader_table().hsa_ven_amd_loader_query_host_address(
reinterpret_cast<const void*>(kernel_object), // NOLINT(performance-no-int-to-ptr)
reinterpret_cast<const void**>(&kernel_code));
if(status == HSA_STATUS_SUCCESS) return kernel_code;
ROCP_WARNING << "hsa_ven_amd_loader_query_host_address(kernel_object=" << kernel_object
<< ") returned " << status << ": " << get_status_string(status);
// NOLINTNEXTLINE(performance-no-int-to-ptr)
return reinterpret_cast<kernel_descriptor_t*>(kernel_object);
}
auto&
get_code_object_id()
{
static auto _v = std::atomic<uint64_t>{};
return _v;
}
auto&
get_kernel_symbol_id()
{
static auto _v = std::atomic<uint64_t>{};
return _v;
}
using kernel_object_map_t = std::unordered_map<uint64_t, uint64_t>;
using executable_array_t = std::vector<hsa_executable_t>;
using code_object_unload_array_t = std::vector<hsa::code_object_unload>;
std::vector<hsa::code_object_unload>
shutdown(hsa_executable_t executable);
bool is_shutdown = false;
auto*
get_executables()
{
static auto*& _v = common::static_object<common::Synchronized<executable_array_t>>::construct();
return _v;
}
auto*
get_code_objects()
{
static auto*& _v =
common::static_object<common::Synchronized<code_object_array_t>>::construct();
return _v;
}
auto*
get_kernel_object_map()
{
static auto*& _v =
common::static_object<common::Synchronized<kernel_object_map_t>>::construct();
return _v;
}
hsa_status_t
executable_iterate_agent_symbols_load_callback(hsa_executable_t executable,
hsa_agent_t agent,
hsa_executable_symbol_t symbol,
void* args)
{
#define ROCP_HSA_CORE_GET_EXE_SYMBOL_INFO(...) \
{ \
auto _status = core_table.hsa_executable_symbol_get_info_fn(symbol, __VA_ARGS__); \
ROCP_ERROR_IF(_status != HSA_STATUS_SUCCESS) \
<< "core_table.hsa_executable_symbol_get_info_fn(hsa_executable_symbol_t{.handle=" \
<< symbol.handle << "}, " << #__VA_ARGS__ << ") failed"; \
if(_status != HSA_STATUS_SUCCESS) return _status; \
}
auto& core_table = *::rocprofiler::hsa::get_core_table();
auto* code_obj_v = static_cast<hsa::code_object*>(args);
auto symbol_v = hsa::kernel_symbol{};
auto& data = symbol_v.rocp_data;
symbol_v.hsa_executable = executable;
symbol_v.hsa_agent = agent;
symbol_v.hsa_symbol = symbol;
auto exists = std::any_of(code_obj_v->symbols.begin(),
code_obj_v->symbols.end(),
[&symbol_v](auto& itr) { return (itr && symbol_v == *itr); });
// if there is an existing matching kernel symbol, return success and move onto next symbol
if(exists) return HSA_STATUS_SUCCESS;
ROCP_FATAL_IF(data.size == 0) << "kernel symbol did not properly initialize the size field "
"upon construction (this is likely a compiler bug)";
auto type = hsa_symbol_kind_t{};
ROCP_HSA_CORE_GET_EXE_SYMBOL_INFO(HSA_EXECUTABLE_SYMBOL_INFO_TYPE, &type);
if(type != HSA_SYMBOL_KIND_KERNEL) return HSA_STATUS_SUCCESS;
// set the code object id
data.code_object_id = code_obj_v->rocp_data.code_object_id;
// compute the kernel name length
constexpr auto name_length_max = std::numeric_limits<uint32_t>::max();
uint32_t _name_length = 0;
ROCP_HSA_CORE_GET_EXE_SYMBOL_INFO(HSA_EXECUTABLE_SYMBOL_INFO_NAME_LENGTH, &_name_length);
ROCP_CI_LOG_IF(WARNING, _name_length > name_length_max / 2)
<< "kernel symbol name length is extremely large: " << _name_length;
// set the kernel name
if(_name_length > 0 && _name_length < name_length_max)
{
auto _name = std::string(_name_length + 1, '\0');
ROCP_HSA_CORE_GET_EXE_SYMBOL_INFO(HSA_EXECUTABLE_SYMBOL_INFO_NAME, _name.data());
symbol_v.name = common::get_string_entry(_name.substr(0, _name.find_first_of('\0')));
}
data.kernel_name = (symbol_v.name) ? symbol_v.name->c_str() : nullptr;
// these should all be self-explanatory
ROCP_HSA_CORE_GET_EXE_SYMBOL_INFO(HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_OBJECT,
&data.kernel_object);
ROCP_HSA_CORE_GET_EXE_SYMBOL_INFO(HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_SIZE,
&data.kernarg_segment_size);
ROCP_HSA_CORE_GET_EXE_SYMBOL_INFO(HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_KERNARG_SEGMENT_ALIGNMENT,
&data.kernarg_segment_alignment);
ROCP_HSA_CORE_GET_EXE_SYMBOL_INFO(HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_GROUP_SEGMENT_SIZE,
&data.group_segment_size);
ROCP_HSA_CORE_GET_EXE_SYMBOL_INFO(HSA_EXECUTABLE_SYMBOL_INFO_KERNEL_PRIVATE_SEGMENT_SIZE,
&data.private_segment_size);
// This works for gfx9 but may not for Navi arch
const auto* kernel_descript = get_kernel_descriptor(data.kernel_object);
if(CHECK_NOTNULL(code_obj_v) && CHECK_NOTNULL(kernel_descript))
{
const auto* rocp_agent = agent::get_agent(code_obj_v->rocp_data.rocp_agent);
if(CHECK_NOTNULL(rocp_agent))
{
data.arch_vgpr_count = arch_vgpr_count(rocp_agent->name, *kernel_descript);
data.accum_vgpr_count = accum_vgpr_count(rocp_agent->name, *kernel_descript);
data.sgpr_count = sgpr_count(rocp_agent->name, *kernel_descript);
}
}
// if we have reached this point (i.e. there were no HSA errors returned within macro) then we
// generate a unique kernel symbol id
data.kernel_id = ++get_kernel_symbol_id();
CHECK_NOTNULL(get_kernel_object_map())
->wlock(
[](kernel_object_map_t& object_map, uint64_t _kern_obj, uint64_t _kern_id) {
object_map[_kern_obj] = _kern_id;
},
data.kernel_object,
data.kernel_id);
code_obj_v->symbols.emplace_back(std::make_unique<hsa::kernel_symbol>(std::move(symbol_v)));
return HSA_STATUS_SUCCESS;
#undef ROCP_HSA_CORE_GET_EXE_SYMBOL_INFO
}
hsa_status_t
executable_iterate_agent_symbols_unload_callback(hsa_executable_t executable,
hsa_agent_t agent,
hsa_executable_symbol_t symbol,
void* args)
{
auto symbol_v = hsa::kernel_symbol{};
symbol_v.hsa_executable = executable;
symbol_v.hsa_agent = agent;
symbol_v.hsa_symbol = symbol;
auto* code_obj_v = static_cast<hsa::code_object_unload*>(args);
CHECK_NOTNULL(code_obj_v);
CHECK_NOTNULL(code_obj_v->object);
for(const auto& itr : code_obj_v->object->symbols)
{
if(itr && *itr == symbol_v) code_obj_v->symbols.emplace_back(itr.get());
}
return HSA_STATUS_SUCCESS;
}
hsa_status_t
code_object_load_callback(hsa_executable_t executable,
hsa_loaded_code_object_t loaded_code_object,
void* cb_data)
{
#define ROCP_HSA_VEN_LOADER_GET_CODE_OBJECT_INFO(...) \
{ \
auto _status = loader_table.hsa_ven_amd_loader_loaded_code_object_get_info( \
loaded_code_object, __VA_ARGS__); \
ROCP_ERROR_IF(_status != HSA_STATUS_SUCCESS) \
<< "loader_table.hsa_ven_amd_loader_loaded_code_object_get_info(loaded_code_object, " \
<< #__VA_ARGS__ << ") failed"; \
if(_status != HSA_STATUS_SUCCESS) return _status; \
}
auto& loader_table = get_loader_table();
auto code_obj_v = hsa::code_object{};
auto& data = code_obj_v.rocp_data;
uint32_t _storage_type = ROCPROFILER_CODE_OBJECT_STORAGE_TYPE_NONE;
ROCP_FATAL_IF(data.size == 0) << "code object did not properly initialize the size field upon "
"construction (this is likely a compiler bug)";
code_obj_v.hsa_executable = executable;
code_obj_v.hsa_code_object = loaded_code_object;
auto* code_obj_vec = static_cast<code_object_array_t*>(cb_data);
auto exists = std::any_of(code_obj_vec->begin(), code_obj_vec->end(), [&code_obj_v](auto& itr) {
return (itr && code_obj_v == *itr);
});
// if there is an existing matching code object, check for any new symbols and then return
// success and move onto next code object
if(exists)
{
for(auto& itr : *code_obj_vec)
{
if(itr && *itr == code_obj_v)
{
::rocprofiler::hsa::get_core_table()->hsa_executable_iterate_agent_symbols_fn(
executable,
data.hsa_agent,
executable_iterate_agent_symbols_load_callback,
itr.get());
}
}
return HSA_STATUS_SUCCESS;
}
ROCP_HSA_VEN_LOADER_GET_CODE_OBJECT_INFO(
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_TYPE, &_storage_type);
ROCP_FATAL_IF(_storage_type >= ROCPROFILER_CODE_OBJECT_STORAGE_TYPE_LAST)
<< "HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_TYPE returned an "
"unsupported code object storage type. Expected 0=none, 1=file, or 2=memory but "
"received a value of "
<< _storage_type;
data.storage_type = static_cast<rocprofiler_code_object_storage_type_t>(_storage_type);
if(_storage_type == HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_FILE)
{
ROCP_HSA_VEN_LOADER_GET_CODE_OBJECT_INFO(
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_FILE,
&data.storage_file);
}
else if(_storage_type == HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_MEMORY)
{
ROCP_HSA_VEN_LOADER_GET_CODE_OBJECT_INFO(
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_BASE,
&data.memory_base);
ROCP_HSA_VEN_LOADER_GET_CODE_OBJECT_INFO(
HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_CODE_OBJECT_STORAGE_MEMORY_SIZE,
&data.memory_size);
}
else if(_storage_type == HSA_VEN_AMD_LOADER_CODE_OBJECT_STORAGE_TYPE_NONE)
{
ROCP_WARNING << "Code object storage type of none was ignored";
return HSA_STATUS_SUCCESS;
}
ROCP_HSA_VEN_LOADER_GET_CODE_OBJECT_INFO(HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_BASE,
&data.load_base);
ROCP_HSA_VEN_LOADER_GET_CODE_OBJECT_INFO(HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_SIZE,
&data.load_size);
ROCP_HSA_VEN_LOADER_GET_CODE_OBJECT_INFO(HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_LOAD_DELTA,
&data.load_delta);
constexpr auto uri_length_max = std::numeric_limits<uint32_t>::max();
auto _uri_length = uint32_t{0};
ROCP_HSA_VEN_LOADER_GET_CODE_OBJECT_INFO(HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI_LENGTH,
&_uri_length);
ROCP_CI_LOG_IF(WARNING, _uri_length > uri_length_max / 2)
<< "code object uri length is extremely large: " << _uri_length;
if(_uri_length > 0 && _uri_length < uri_length_max)
{
auto _uri = std::string(_uri_length + 1, '\0');
ROCP_HSA_VEN_LOADER_GET_CODE_OBJECT_INFO(HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_URI,
_uri.data());
code_obj_v.uri = common::get_string_entry(_uri);
}
data.uri = (code_obj_v.uri) ? code_obj_v.uri->data() : nullptr;
ROCP_HSA_VEN_LOADER_GET_CODE_OBJECT_INFO(HSA_VEN_AMD_LOADER_LOADED_CODE_OBJECT_INFO_AGENT,
&data.hsa_agent);
const auto* _rocp_agent = agent::get_rocprofiler_agent(data.hsa_agent);
if(!_rocp_agent)
{
// report the agent handle that was actually queried from the loader
ROCP_ERROR << "hsa agent (handle=" << data.hsa_agent.handle
<< ") did not map to a rocprofiler agent";
return HSA_STATUS_ERROR_INVALID_AGENT;
}
data.rocp_agent = _rocp_agent->id;
// if we have reached this point (i.e. there were no HSA errors returned within macro) then we
// generate a unique code object id
data.code_object_id = ++get_code_object_id();
auto _status = ::rocprofiler::hsa::get_core_table()->hsa_executable_iterate_agent_symbols_fn(
executable, data.hsa_agent, executable_iterate_agent_symbols_load_callback, &code_obj_v);
if(_status == HSA_STATUS_SUCCESS)
{
code_obj_vec->emplace_back(std::make_unique<hsa::code_object>(std::move(code_obj_v)));
}
else
{
// data.uri may be null when the loader reported an empty URI
ROCP_ERROR << "hsa_executable_iterate_agent_symbols failed for "
<< (data.uri ? data.uri : "(no uri)");
}
return _status;
#undef ROCP_HSA_VEN_LOADER_GET_CODE_OBJECT_INFO
}
hsa_status_t
code_object_unload_callback(hsa_executable_t executable,
hsa_loaded_code_object_t loaded_code_object,
void* args)
{
auto code_obj_v = hsa::code_object{};
code_obj_v.hsa_executable = executable;
code_obj_v.hsa_code_object = loaded_code_object;
auto* code_obj_arr = static_cast<code_object_unload_array_t*>(args);
CHECK_NOTNULL(code_obj_arr);
// auto _size = CHECK_NOTNULL(get_code_objects())->rlock([](const auto& data) { return
// data.size(); }); ROCP_INFO << "[inp] executable=" << executable.handle
// << ", code_object=" << loaded_code_object.handle << " vs. " << _size;
CHECK_NOTNULL(get_code_objects())->rlock([&](const code_object_array_t& arr) {
for(const auto& itr : arr)
{
// ROCP_INFO << "[cmp] executable=" << itr->hsa_executable.handle
// << ", code_object=" << itr->hsa_code_object.handle;
if(itr && itr->hsa_executable.handle == executable.handle &&
itr->hsa_code_object.handle == loaded_code_object.handle)
// if(itr && *itr == code_obj_v)
{
auto& _last =
code_obj_arr->emplace_back(hsa::code_object_unload{.object = itr.get()});
auto agent = itr->rocp_data.hsa_agent;
::rocprofiler::hsa::get_core_table()->hsa_executable_iterate_agent_symbols_fn(
executable, agent, executable_iterate_agent_symbols_unload_callback, &_last);
}
}
});
return HSA_STATUS_SUCCESS;
}
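// Returns one unload entry per tracked code object loaded in `executable`. The result is empty
// after shutdown or when the loader extension does not provide the iteration function.
std::vector<hsa::code_object_unload>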
get_unloaded_code_objects(hsa_executable_t executable)
{
auto _unloaded = std::vector<hsa::code_object_unload>{};
if(!is_shutdown && get_loader_table().hsa_ven_amd_loader_executable_iterate_loaded_code_objects)
get_loader_table().hsa_ven_amd_loader_executable_iterate_loaded_code_objects(
executable, code_object_unload_callback, &_unloaded);
return _unloaded;
}
// Storage for the original hsa_executable_freeze function pointer (set in initialize()).
auto&
get_freeze_function()
{
static decltype(::hsa_executable_freeze)* _v = nullptr;
return _v;
}
// Storage for the original hsa_executable_destroy function pointer (set in initialize()).
auto&
get_destroy_function()
{
static decltype(::hsa_executable_destroy)* _v = nullptr;
return _v;
}
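// Wrapper installed over hsa_executable_freeze. Forwards to the original function and, on
// success, records the executable and its loaded code objects, then invokes the LOAD-phase
// callbacks for code objects and kernel symbols on every active context that traces them.
hsa_status_t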
executable_freeze(hsa_executable_t executable, const char* options)
{
hsa_status_t status = CHECK_NOTNULL(get_freeze_function())(executable, options);
if(status != HSA_STATUS_SUCCESS) return status;
ROCP_INFO << "running " << __FUNCTION__ << " (executable=" << executable.handle << ")...";
CHECK_NOTNULL(get_executables())->wlock([executable](executable_array_t& data) {
data.emplace_back(executable);
});
auto* code_obj_vec = get_code_objects();
CHECK_NOTNULL(code_obj_vec)->wlock([executable](code_object_array_t& _vec) {
get_loader_table().hsa_ven_amd_loader_executable_iterate_loaded_code_objects(
executable, code_object_load_callback, &_vec);
});
constexpr auto CODE_OBJECT_KIND = ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT;
constexpr auto CODE_OBJECT_LOAD = ROCPROFILER_CODE_OBJECT_LOAD;
constexpr auto CODE_OBJECT_KERNEL_SYMBOL =
ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER;
auto&& context_filter = [](const context_t* ctx) {
return (ctx->callback_tracer && ctx->callback_tracer->domains(CODE_OBJECT_KIND) &&
(ctx->callback_tracer->domains(CODE_OBJECT_KIND, CODE_OBJECT_LOAD) ||
ctx->callback_tracer->domains(CODE_OBJECT_KIND, CODE_OBJECT_KERNEL_SYMBOL)));
};
static thread_local auto ctxs = context_array_t{};
context::get_active_contexts(ctxs, std::move(context_filter));
if(!ctxs.empty())
{
code_obj_vec->rlock([](const code_object_array_t& data) {
auto tidx = common::get_tid();
// set the contexts for each code object
for(const auto& ditr : data)
ditr->contexts = ctxs;
for(const auto& ditr : data)
{
for(const auto* citr : ditr->contexts)
{
if(citr->callback_tracer->domains(CODE_OBJECT_KIND, CODE_OBJECT_LOAD))
{
if(!ditr->beg_notified)
{
auto co_data = ditr->rocp_data;
auto record = rocprofiler_callback_tracing_record_t{
.context_id = rocprofiler_context_id_t{citr->context_idx},
.thread_id = tidx,
.correlation_id = rocprofiler_correlation_id_t{},
.kind = CODE_OBJECT_KIND,
.operation = CODE_OBJECT_LOAD,
.phase = ROCPROFILER_CALLBACK_PHASE_LOAD,
.payload = static_cast<void*>(&co_data)};
// invoke callback
auto& cb_data =
citr->callback_tracer->callback_data.at(CODE_OBJECT_KIND);
auto& user_data = ditr->user_data[citr];
cb_data.callback(record, &user_data, cb_data.data);
}
}
for(const auto& sitr : ditr->symbols)
{
if(sitr && citr->callback_tracer->domains(CODE_OBJECT_KIND,
CODE_OBJECT_KERNEL_SYMBOL))
{
if(!sitr->beg_notified)
{
auto sym_data = sitr->rocp_data;
auto record = rocprofiler_callback_tracing_record_t{
.context_id = rocprofiler_context_id_t{citr->context_idx},
.thread_id = tidx,
.correlation_id = rocprofiler_correlation_id_t{},
.kind = CODE_OBJECT_KIND,
.operation = CODE_OBJECT_KERNEL_SYMBOL,
.phase = ROCPROFILER_CALLBACK_PHASE_LOAD,
.payload = static_cast<void*>(&sym_data)};
// invoke callback
auto& cb_data =
citr->callback_tracer->callback_data.at(CODE_OBJECT_KIND);
auto& user_data = sitr->user_data[citr];
cb_data.callback(record, &user_data, cb_data.data);
}
}
}
}
}
for(const auto& ditr : data)
{
ditr->beg_notified = true;
for(auto& sitr : ditr->symbols)
sitr->beg_notified = true;
}
});
}
return HSA_STATUS_SUCCESS;
}
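// Wrapper installed over hsa_executable_destroy. Emits the UNLOAD-phase callbacks via
// shutdown(), removes the bookkeeping for the executable (kernel objects, code objects, and
// the executable entry), then forwards to the original function. Once finalize() has run,
// it returns HSA_STATUS_SUCCESS without forwarding.
hsa_status_t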
executable_destroy(hsa_executable_t executable)
{
if(is_shutdown) return HSA_STATUS_SUCCESS;
auto _unloaded = shutdown(executable);
if(get_kernel_object_map())
{
CHECK_NOTNULL(get_kernel_object_map())->wlock([&_unloaded](kernel_object_map_t& data) {
for(const auto& uitr : _unloaded)
{
for(const auto& sitr : uitr.symbols)
{
// the map is keyed by kernel object (see get_kernel_id), not by kernel id
data.erase(sitr->rocp_data.kernel_object);
}
}
});
}
if(get_code_objects())
{
CHECK_NOTNULL(get_code_objects())->wlock([executable](code_object_array_t& data) {
// drop every code object that belongs to this executable (and any stale null entries)
data.erase(std::remove_if(data.begin(),
data.end(),
[executable](const auto& itr) {
return (!itr || itr->hsa_executable.handle == executable.handle);
}),
data.end());
});
}
if(get_executables())
{
CHECK_NOTNULL(get_executables())->wlock([executable](executable_array_t& data) {
data.erase(std::remove_if(data.begin(),
data.end(),
[executable](hsa_executable_t itr) {
return (itr.handle == executable.handle);
}),
data.end());
});
}
return CHECK_NOTNULL(get_destroy_function())(executable);
}
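// Emits the UNLOAD-phase callbacks for every tracked code object of `executable` (code object
// first, then its kernel symbols) and returns the unload entries so the caller can erase the
// associated bookkeeping.
std::vector<hsa::code_object_unload>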
shutdown(hsa_executable_t executable)
{
ROCP_INFO << "running " << __FUNCTION__ << " (executable=" << executable.handle << ")...";
auto _unloaded = code_object::get_unloaded_code_objects(executable);
constexpr auto CODE_OBJECT_KIND = ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT;
constexpr auto CODE_OBJECT_LOAD = ROCPROFILER_CODE_OBJECT_LOAD;
constexpr auto CODE_OBJECT_KERNEL_SYMBOL =
ROCPROFILER_CODE_OBJECT_DEVICE_KERNEL_SYMBOL_REGISTER;
auto tidx = common::get_tid();
for(auto& itr : _unloaded)
{
ROCP_FATAL_IF(itr.object == nullptr);
for(const auto* citr : itr.object->contexts)
{
if(citr->callback_tracer->domains(CODE_OBJECT_KIND, CODE_OBJECT_LOAD))
{
if(!itr.object->end_notified)
{
auto record = rocprofiler_callback_tracing_record_t{
.context_id = rocprofiler_context_id_t{citr->context_idx},
.thread_id = tidx,
.correlation_id = rocprofiler_correlation_id_t{},
.kind = CODE_OBJECT_KIND,
.operation = CODE_OBJECT_LOAD,
.phase = ROCPROFILER_CALLBACK_PHASE_UNLOAD,
.payload = static_cast<void*>(&itr.object->rocp_data)};
// invoke callback
auto& cb_data = citr->callback_tracer->callback_data.at(CODE_OBJECT_KIND);
auto& user_data = itr.object->user_data.at(citr);
cb_data.callback(record, &user_data, cb_data.data);
}
}
// generate callbacks for kernel symbols after the callback for code object
// unloading so the code object unload can be used to flush the buffer before the
// symbol information is removed
if(citr->callback_tracer->domains(CODE_OBJECT_KIND, CODE_OBJECT_KERNEL_SYMBOL))
{
for(auto& sitr : itr.symbols)
{
if(!sitr->end_notified)
{
auto record = rocprofiler_callback_tracing_record_t{
.context_id = rocprofiler_context_id_t{citr->context_idx},
.thread_id = tidx,
.correlation_id = rocprofiler_correlation_id_t{},
.kind = CODE_OBJECT_KIND,
.operation = CODE_OBJECT_KERNEL_SYMBOL,
.phase = ROCPROFILER_CALLBACK_PHASE_UNLOAD,
.payload = static_cast<void*>(&sitr->rocp_data)};
// invoke callback
auto& cb_data = citr->callback_tracer->callback_data.at(CODE_OBJECT_KIND);
auto& user_data = sitr->user_data.at(citr);
cb_data.callback(record, &user_data, cb_data.data);
}
}
}
}
}
for(auto& itr : _unloaded)
{
itr.object->end_notified = true;
for(auto& sitr : itr.symbols)
sitr->end_notified = true;
}
return _unloaded;
}
} // namespace
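// Fetches the AMD loader extension table and, on success, saves the original
// hsa_executable_freeze / hsa_executable_destroy entries of the HSA core table and replaces
// them with the wrappers defined above.
void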
initialize(HsaApiTable* table)
{
auto& core_table = *table->core_;
get_status_string_function() = core_table.hsa_status_string_fn;
auto _status = core_table.hsa_system_get_major_extension_table_fn(
HSA_EXTENSION_AMD_LOADER, 1, sizeof(hsa_loader_table_t), &get_loader_table());
ROCP_ERROR_IF(_status != HSA_STATUS_SUCCESS)
<< "hsa_system_get_major_extension_table failed: " << get_status_string(_status);
if(_status == HSA_STATUS_SUCCESS)
{
get_freeze_function() = CHECK_NOTNULL(core_table.hsa_executable_freeze_fn);
get_destroy_function() = CHECK_NOTNULL(core_table.hsa_executable_destroy_fn);
core_table.hsa_executable_freeze_fn = executable_freeze;
core_table.hsa_executable_destroy_fn = executable_destroy;
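ROCP_FATAL_IF(get_freeze_function() == core_table.hsa_executable_freeze_fn)
<< "hsa_executable_freeze is already wrapped (calling it would recurse infinitely)";
ROCP_FATAL_IF(get_destroy_function() == core_table.hsa_executable_destroy_fn)
<< "hsa_executable_destroy is already wrapped (calling it would recurse infinitely)";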
}
}
// Returns the kernel id registered for `kernel_object`, or 0 if the kernel object is unknown.
uint64_t
get_kernel_id(uint64_t kernel_object)
{
return CHECK_NOTNULL(get_kernel_object_map())
->rlock(
[](const kernel_object_map_t& object_map, uint64_t _kern_obj) -> uint64_t {
auto itr = object_map.find(_kern_obj);
return (itr == object_map.end()) ? 0 : itr->second;
},
kernel_object);
}
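// Emits the unload callbacks for all tracked executables in reverse order of freezing, clears
// the code object list, and marks the module as shut down.
void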
finalize()
{
if(is_shutdown || !get_executables() || !get_code_objects()) return;
CHECK_NOTNULL(get_executables())->rlock([](const executable_array_t& edata) {
auto tmp = edata;
std::reverse(tmp.begin(), tmp.end());
for(auto itr : tmp)
shutdown(itr);
});
CHECK_NOTNULL(get_code_objects())->wlock([](code_object_array_t& data) { data.clear(); });
is_shutdown = true;
}
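// Invokes `func` on every tracked code object while holding the read lock. Does nothing once
// finalize() has run or if the bookkeeping containers are unavailable.
void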
iterate_loaded_code_objects(code_object_iterator_t&& func)
{
if(is_shutdown || !get_executables() || !get_code_objects()) return;
CHECK_NOTNULL(get_code_objects())
->rlock(
[](const code_object_array_t& data, code_object_iterator_t&& func_v) {
for(const auto& itr : data)
{
if(itr) func_v(*itr);
}
},
std::move(func));
}
} // namespace code_object
} // namespace rocprofiler