c641749fe6
* Update include/rocprofiler-sdk/hip*
- updates for intercept table
* Update lib/common/units.hpp
- clang-tidy fixes
* Add lib/rocprofiler-sdk/hip
- tracing implementation for the HIP intercept table
* Update source/lib/rocprofiler-sdk/CMakeLists.txt
- add_subdirectory(hip)
* Update source/lib/rocprofiler-sdk/hsa
- offset function in hsa_api_info<Idx>
- remove report_activity, set_callback
- Tweak HSA_API_TABLE_LOOKUP_DEFINITION
* Update lib/rocprofiler-sdk/hip
- rocprofiler::hip::copy_table
- stringize_impl print dereferenced pointers when possible
* Update lib/rocprofiler-sdk/hsa/utils.hpp
- stringize_impl print dereferenced pointers when possible
* Update lib/rocprofiler-sdk/tests/intercept_table.cpp
- remove failures for intercepting HIP API tables
* Update include/rocprofiler-sdk/fwd.h
- add ROCPROFILER_HIP_RUNTIME_LIBRARY (== ROCPROFILER_HIP_LIBRARY)
- add ROCPROFILER_HIP_COMPILER_LIBRARY
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- Support ROCPROFILER_BUFFER_TRACING_HIP_API in rocprofiler_query_buffer_tracing_kind_operation_name
- Support ROCPROFILER_BUFFER_TRACING_HIP_API in rocprofiler_iterate_buffer_tracing_kind_operations
* Update lib/rocprofiler-sdk/callback_tracing.cpp
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_query_callback_tracing_kind_operation_name
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_iterate_callback_tracing_kind_operations
- Support ROCPROFILER_CALLBACK_TRACING_HIP_API in rocprofiler_iterate_callback_tracing_kind_operation_args
* Update lib/rocprofiler-sdk/intercept_table.cpp
- support HipDispatchTable and HipCompilerDispatchTable
* Update lib/rocprofiler-sdk/internal_threading.cpp
- Support ROCPROFILER_HIP_COMPILER_LIBRARY
* Update lib/rocprofiler-sdk/registration.cpp
- Support "hip" and "hip_compiler" in rocprofiler_set_api_table
- Added some extra logging
* Update samples/api_{buffered,callback}_tracing
- Modifications to demonstrate HIP API tracing
* Update tests/kernel-tracing
- Modifications to handle/test HIP API tracing
* Separate HIP tracing from HIP compiler tracing
* Fix installation of include/rocprofiler-sdk/hip/*
- add compiler and table headers to install
* Fixes to HIP interception
- hip_api_trace.hpp was updated a bit
  - removed hipGetDeviceProperties (generic)
  - added hipGetDevicePropertiesR0600
  - added hipGetDevicePropertiesR0000
  - removed hipRegisterTracerCallback
  - reordered hipCreateChannelDesc, hipExtModuleLaunchKernel, hipHccModuleLaunchKernel
  - added hipDrvGraphAddMemsetNode
- static asserts in hsa_api_info ensuring ordering of pointers
* Update lib/rocprofiler-sdk/hip/hip.*
- use size_t instead of rocprofiler_hip_table_api_id_t as non-type template parameter (smaller binary)
- separated out population of callback_context_data and buffered_context_data into non-template function (significantly smaller binary)
* Update lib/rocprofiler-sdk/hsa/hsa.*
- separated out population of callback_context_data and buffered_context_data into non-template function (significantly smaller binary)
* Update tests/kernel-tracing/validate.py
- does not expect any hip_api_traces until libamdhip.so actually starts using rocprofiler-register
* Update tests/tools/json-tool.cpp
- fix context associated with "HIP_API_CALLBACK"
* Update external/CMakeLists.txt
- move misc variables to top of CMakeLists.txt so they apply to all external subprojects
  - BUILD_TESTING (OFF)
  - BUILD_SHARED_LIBS (OFF)
  - BUILD_OBJECT_LIBS (OFF)
  - BUILD_STATIC_LIBS (ON)
  - CMAKE_POSITION_INDEPENDENT_CODE (ON)
  - CMAKE_VISIBILITY_INLINES_HIDDEN (ON)
  - CMAKE_CXX_VISIBILITY_PRESET (hidden)
- disable using libunwind in glog
* Update lib/rocprofiler-{sdk,sdk-tool}/CMakeLists.txt
- remove explicit setting of SKIP_BUILD_RPATH
* Update CMakeLists.txt
- set high-level CMAKE_BUILD_RPATH and CMAKE_INSTALL_RPATH_USE_LINK_PATH
* Update tests/CMakeLists.txt
- include(GNUInstallDirs)
* Update samples/CMakeLists.txt
- include(GNUInstallDirs)
* Update include/rocprofiler-sdk/hip/{compiler_api,api}_args.h
- remove extern "C" due to incompatibility between empty struct in C (size 0) and empty struct in C++ (size 1)
* Update lib/rocprofiler-sdk/hip/details/ostream.hpp
- clang-tidy fixes
* Update cmake/rocprofiler_linting.cmake
- add a feature for clang tidy exe
* Update lib/rocprofiler-sdk/hip/hip.cpp
- use recursion instead of fold expression due to clang-tidy errors (maximum nesting level exceeded)
* Update lib/rocprofiler-sdk/buffer_tracing.cpp
- fix merge
* Update lib/rocprofiler-sdk/callback_tracing.cpp
- fix merge
* Update bin/rocprofv3
- args for marker, HIP runtime, and HIP compiler tracing
* Update tests/apps/simple-transpose
- use roctx
* Update tests/rocprofv3/tracing
- validate marker API data
* Update lib/rocprofiler-sdk-tool
- support for HIP runtime, HIP compiler, marker API
* Update queue/queue_controller/registration/utility
- call hsa::queue_controller_fini() during finalization
- add a yield function to common/utility.hpp
  - implements a thread yield + sleep
- add a sync function to Queue class
- add an iterate_queues member function to QueueController
  - this is used to sync each queue during queue_controller_fini()
* Fix data races: queue/context/stable_vector
- stable_vector::emplace_back returns reference
- correlation id map uses stable_vector
- queue_info_session has explicit fields for queue id, hsa agent, rocp agent
- use hsa::get_table() in AsyncSignalHandler
- WriteInterceptor does not use TLS for context array
* Update lib/rocprofiler-sdk/hsa/hsa.*
- static object for API subtables
- accessors for API subtables
- google tests for HSA API subtables
* Update lib/rocprofiler-sdk/hsa/{queue,async_copy}.cpp
- use HSA subtable accessors
* Update rocprofiler_memcheck and CI workflow
- use GCC 13 instead of GCC 11 due to suspected false positives in thread sanitizer
- GCC 13 uses libtsan.so.2
* Update CI workflow
* Update lib/rocprofiler-sdk/counters/{metrics,counters}
- fix possibly dangling reference to a temporary from gcc-13
* Update thread-sanitizer-suppr.txt
- Ignore data races originating in hsa-runtime library
* Update cmake/rocprofiler_memcheck.cmake
- Deduce the sanitizer library to preload by compiling an application and extracting the linked sanitizer library
* Update tests/rocprofv3/tracing/CMakeLists.txt
- add csv files to REQUIRED_FILES and ATTACH_ON_FAIL in validate test
* Update lib/common/container/record_header_buffer.hpp
- fix data race identified by gcc v13 and libtsan.so.2
* Update hip API id, args, and def
- remove hipDrvGraphAddMemsetNode (not part of ROCm 6.0)
* Update lib/common/container/record_header_buffer.hpp
- fix deadlock in save/read/reset
* Update source/docs/CMakeLists.txt
- remove COMMAND_ERROR_IS_FATAL ANY to allow for printing of stdout/stderr
* Update lib/rocprofiler-sdk/hip/details/ostream.hpp
- remove overloads for HIP_MEMSET_NODE_PARAMS
* Update docs/CMakeLists.txt
- use find_program for shell instead of hardcoded /bin/bash
250 lines
7.8 KiB
C++
// MIT License
//
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in
// all copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
// THE SOFTWARE.

#include "lib/rocprofiler-sdk/hsa/queue_controller.hpp"
#include "lib/common/static_object.hpp"
#include "lib/rocprofiler-sdk/agent.hpp"
#include "lib/rocprofiler-sdk/context/context.hpp"
#include "lib/rocprofiler-sdk/hsa/agent_cache.hpp"

#include <rocprofiler-sdk/fwd.h>

#include <glog/logging.h>

namespace rocprofiler
{
namespace hsa
{
namespace
{
// HSA Intercept Functions (create_queue/destroy_queue)
hsa_status_t
create_queue(hsa_agent_t agent,
             uint32_t size,
             hsa_queue_type32_t type,
             void (*callback)(hsa_status_t status, hsa_queue_t* source, void* data),
             void* data,
             uint32_t private_segment_size,
             uint32_t group_segment_size,
             hsa_queue_t** queue)
{
    for(const auto& [_, agent_info] : get_queue_controller().get_supported_agents())
    {
        if(agent_info.get_hsa_agent().handle == agent.handle)
        {
            auto new_queue = std::make_unique<Queue>(agent_info,
                                                     size,
                                                     type,
                                                     callback,
                                                     data,
                                                     private_segment_size,
                                                     group_segment_size,
                                                     get_queue_controller().get_core_table(),
                                                     get_queue_controller().get_ext_table(),
                                                     queue);
            get_queue_controller().add_queue(*queue, std::move(new_queue));
            return HSA_STATUS_SUCCESS;
        }
    }
    LOG(FATAL) << "Could not find agent - " << agent.handle;
    return HSA_STATUS_ERROR_FATAL;
}

hsa_status_t
destroy_queue(hsa_queue_t* hsa_queue)
{
    get_queue_controller().destory_queue(hsa_queue);
    return HSA_STATUS_SUCCESS;
}

constexpr rocprofiler_agent_t default_agent =
    rocprofiler_agent_t{sizeof(rocprofiler_agent_t),
                        rocprofiler_agent_id_t{std::numeric_limits<uint64_t>::max()}};
}  // namespace

void
QueueController::add_queue(hsa_queue_t* id, std::unique_ptr<Queue> queue)
{
    CHECK(queue);
    _callback_cache.wlock([&](auto& callbacks) {
        _queues.wlock([&](auto& map) {
            const auto agent_id = queue->get_agent().get_rocp_agent()->id.handle;
            map[id] = std::move(queue);
            for(const auto& [cbid, cb_tuple] : callbacks)
            {
                auto& [agent, qcb, ccb] = cb_tuple;
                if(agent.id.handle == default_agent.id.handle || agent.id.handle == agent_id)
                {
                    map[id]->register_callback(cbid, qcb, ccb);
                }
            }
        });
    });
}

void
QueueController::destory_queue(hsa_queue_t* id)
{
    _queues.wlock([&](auto& map) { map.erase(id); });
}

ClientID
QueueController::add_callback(std::optional<rocprofiler_agent_t> agent,
                              Queue::queue_cb_t qcb,
                              Queue::completed_cb_t ccb)
{
    static std::atomic<ClientID> client_id = 1;
    ClientID return_id;
    _callback_cache.wlock([&](auto& cb_cache) {
        return_id = client_id;
        if(agent)
        {
            cb_cache[client_id] = std::tuple(*agent, qcb, ccb);
        }
        else
        {
            cb_cache[client_id] = std::tuple(default_agent, qcb, ccb);
        }
        client_id++;

        _queues.wlock([&](auto& map) {
            for(auto& [_, queue] : map)
            {
                if(!agent || queue->get_agent().get_rocp_agent()->id.handle == agent->id.handle)
                {
                    queue->register_callback(return_id, qcb, ccb);
                }
            }
        });
    });
    return return_id;
}

void
QueueController::remove_callback(ClientID id)
{
    _callback_cache.wlock([&](auto& cb_cache) {
        cb_cache.erase(id);
        _queues.wlock([&](auto& map) {
            for(auto& [_, queue] : map)
            {
                queue->remove_callback(id);
            }
        });
    });
}

void
QueueController::init(CoreApiTable& core_table, AmdExtTable& ext_table)
{
    _core_table = core_table;
    _ext_table = ext_table;

    auto agents = agent::get_agents();

    // Generate supported agents
    for(const auto* itr : agents)
    {
        auto cached_agent = agent::get_agent_cache(itr);
        if(cached_agent && cached_agent->get_rocp_agent()->type == ROCPROFILER_AGENT_TYPE_GPU)
        {
            get_supported_agents().emplace(cached_agent->index(), *cached_agent);
        }
    }

    auto enable_intercepter = false;
    for(const auto& itr : context::get_registered_contexts())
    {
        constexpr auto expected_context_size = 160UL;
        static_assert(
            sizeof(context::context) == expected_context_size,
            "If you added a new field to context struct, make sure there is a check here if it "
            "requires queue interception. Once you have done so, increment expected_context_size");

        if(itr->counter_collection)
        {
            enable_intercepter = true;
            break;
        }
        else if(itr->buffered_tracer)
        {
            if(itr->buffered_tracer->domains(ROCPROFILER_BUFFER_TRACING_KERNEL_DISPATCH))
            {
                enable_intercepter = true;
                break;
            }
        }
    }

    if(enable_intercepter)
    {
        core_table.hsa_queue_create_fn = create_queue;
        core_table.hsa_queue_destroy_fn = destroy_queue;
    }
}

const Queue*
QueueController::get_queue(const hsa_queue_t& _hsa_queue) const
{
    return _queues.rlock(
        [](const queue_map_t& _data, const hsa_queue_t& _inp) -> const Queue* {
            for(const auto& itr : _data)
            {
                if(itr.first->id == _inp.id) return itr.second.get();
            }
            return nullptr;
        },
        _hsa_queue);
}

void
QueueController::iterate_queues(const queue_iterator_cb_t& cb) const
{
    _queues.rlock([&cb](const queue_map_t& _queues_v) {
        for(const auto& itr : _queues_v)
        {
            if(itr.second) cb(itr.second.get());
        }
    });
}

QueueController&
get_queue_controller()
{
    static auto*& controller = common::static_object<QueueController>::construct();
    return *(CHECK_NOTNULL(controller));
}

void
queue_controller_init(HsaApiTable* table)
{
    get_queue_controller().init(*table->core_, *table->amd_ext_);
}

void
queue_controller_fini()
{
    get_queue_controller().iterate_queues([](const Queue* _queue) { _queue->sync(); });
}
}  // namespace hsa
}  // namespace rocprofiler