4fa165ec1a
* Add ToolsApiTable

Add ToolsApiTable wrapping for scratch memory tracking

* Add initial support for scratch memory tracking

Buffering is implemented

* cmake formatting (cmake-format) (#525)

Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>

* source formatting (clang-format v11) (#524)

Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>

* Add callback tracing for scratch

Fixed the error where scratch tracking init was called irrespective of whether any client requested for it

* Apply suggestions from code review

Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>

* Fix tools api copy/update

Table were saved/updated incorrectly in previous commit. Also adds passing user data through the callback

* Fix OpKind sequence for scratch tracking

Previously scratch was using OpKind from rocprofiler-sdk, but templates were instantiated using API ID. These differ by 1

* Integration tests for scratch reporting

Added buffer and callback integration tests for scratch reporting

* source formatting (clang-format v11) (#550)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* cmake formatting (cmake-format) (#551)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* python formatting (black) (#549)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* CI fixes

* source formatting (clang-format v11) (#554)

Co-authored-by: MythreyaK <26112391+MythreyaK@users.noreply.github.com>

* Update api

Rebase on main and updates based on PR feedback

* Update scratch reporting and address PR comments

- Added agent id to buffer records
- Updated `test_internal_correlation_ids`
  - Is almost identical to one in async-copy
- Updated scratch test to check for agent id
- Updated queue id serialization in callback records (prints handle as nested key)
- Remove `marker_api_traces` from scratch `test_internal_correlation_ids` validation test
- Rename `amd_tools_api` to `scratch_memory`
- Added doxygen comments
- Remove scratch callback from `tool.cpp`
- Replace assert with `LOF_IF` in `scratch_memory.cpp`

* Update tools table

Changed to match up with changes to hsa tables in main branch

* Rework scratch memory structure

* Update tests

- Added suggestions from PR review, and updated tests accordingly

* Misc cleanup

* Update scratch test

As of Apr 4th, `hsa_amd_agent_set_async_scratch_limit` is disabled. Note,

> This API: `hsa_amd_agent_set_async_scratch_limit` is currently
> disabled. We need some changes in CP firmware to be able to do this
> and these changes are not ready yet.
> With the current code, you will also not get notifications for
> alternate-scratch allocations because this feature has been disabled
> while CP firmware is making additional changes
> We are hoping to have that feature enabled by ROCm-6.3

* Minor update to lib/rocprofiler-sdk/internal_threading.*

- delay destruction of shared_ptrs of the tasks to prevent rare (but possible) data race on the destruction of the shared_ptr

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: MythreyaK <MythreyaK@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
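The final item of the commit message above explains why `TaskGroup` in the header below keeps a second deque, `m_completed_tasks`. Finished tasks are retained there so that their `shared_ptr`s are destroyed later instead of right away. The following is a minimal sketch of that pattern that uses only the standard library. It is not the rocprofiler-sdk implementation. `DeferredTaskGroup` and `num_completed()` are hypothetical names, each task runs on its own `std::thread` in place of a `PTL::ThreadPool`, and the exact division of work between `wait()` and `join()` is an assumption.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical std-only stand-in for TaskGroup (one std::thread per task
// instead of a PTL::ThreadPool).
class DeferredTaskGroup
{
public:
    using task_type = std::packaged_task<void()>;

    DeferredTaskGroup() = default;
    ~DeferredTaskGroup() { join(); }

    DeferredTaskGroup(const DeferredTaskGroup&) = delete;
    DeferredTaskGroup& operator=(const DeferredTaskGroup&) = delete;

    // wrap fn in a shared task and start it on a new thread
    void exec(std::function<void()>&& fn)
    {
        auto task = std::make_shared<task_type>(std::move(fn));
        std::lock_guard<std::mutex> lk{m_mutex};
        m_futures.emplace_back(task->get_future());
        m_tasks.emplace_back(task);
        m_threads.emplace_back([task]() { (*task)(); });
    }

    // block until every submitted task has run; finished tasks are parked in
    // m_completed_tasks instead of being released here
    void wait()
    {
        std::deque<std::future<void>>          futures;
        std::deque<std::shared_ptr<task_type>> tasks;
        {
            std::lock_guard<std::mutex> lk{m_mutex};
            std::swap(futures, m_futures);
            std::swap(tasks, m_tasks);
        }
        for(auto& f : futures)
            f.wait();  // wait() rather than get(): never throws from here

        std::lock_guard<std::mutex> lk{m_mutex};
        for(auto& t : tasks)
            m_completed_tasks.emplace_back(std::move(t));
    }

    // wait, join every worker thread, and only then drop the completed tasks
    void join()
    {
        wait();
        std::vector<std::thread> threads;
        {
            std::lock_guard<std::mutex> lk{m_mutex};
            std::swap(threads, m_threads);
        }
        for(auto& t : threads)
            if(t.joinable()) t.join();

        std::lock_guard<std::mutex> lk{m_mutex};
        m_completed_tasks.clear();
    }

    std::size_t num_completed()
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        return m_completed_tasks.size();
    }

private:
    std::mutex                             m_mutex           = {};
    std::vector<std::thread>               m_threads         = {};
    std::deque<std::future<void>>          m_futures         = {};
    std::deque<std::shared_ptr<task_type>> m_tasks           = {};
    std::deque<std::shared_ptr<task_type>> m_completed_tasks = {};
};
```

Because finished tasks stay parked until `join()` has joined every worker, the last reference to each task is dropped on the joining thread after all workers have exited. Otherwise it would be dropped on whichever thread happened to release it last. For brevity, this sketch does not propagate exceptions thrown by tasks.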
93 lines
2.9 KiB
C++
// MIT License
//
// Copyright (c) 2023 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.

#pragma once

#include <rocprofiler-sdk/internal_threading.h>

#include "lib/common/container/stable_vector.hpp"
#include "lib/common/defines.hpp"
#include "lib/common/utility.hpp"
#include "lib/rocprofiler-sdk/allocator.hpp"

#include <PTL/TaskManager.hh>
#include <PTL/ThreadPool.hh>

#include <cstdint>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

namespace rocprofiler
{
namespace internal_threading
{
class TaskGroup : private PTL::TaskManager
{
public:
    using thread_pool_t = PTL::ThreadPool;
    using parent_type = PTL::TaskManager;
    using task_type = PTL::PackagedTask<void>;

    TaskGroup();
    ~TaskGroup() override;

    TaskGroup(const TaskGroup&) = delete;
    TaskGroup(TaskGroup&&) noexcept = delete;
    TaskGroup& operator=(const TaskGroup&) = delete;
    TaskGroup& operator=(TaskGroup&&) noexcept = delete;

    void exec(std::function<void()>&&);
    void wait();
    void join();

private:
    std::mutex m_mutex = {};
    thread_pool_t* m_pool = nullptr;
    std::deque<std::shared_ptr<task_type>> m_tasks = {};
    // finished tasks are kept here so that destruction of their shared_ptrs is
    // delayed, preventing a rare (but possible) data race on that destruction
    std::deque<std::shared_ptr<task_type>> m_completed_tasks = {};
};

using task_group_t = TaskGroup;

// notify before / after an internal thread is created on behalf of the given
// runtime library
void notify_pre_internal_thread_create(rocprofiler_runtime_library_t);
void notify_post_internal_thread_create(rocprofiler_runtime_library_t);

// initialize the default thread pool
void
initialize();

// destroy all the thread pools
void
finalize();

// creates a new callback thread and returns its identifier
rocprofiler_callback_thread_t
create_callback_thread();

// returns the task group for the given callback thread identifier
task_group_t* get_task_group(rocprofiler_callback_thread_t);
} // namespace internal_threading
} // namespace rocprofiler
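`create_callback_thread()` hands out a `rocprofiler_callback_thread_t`, and `get_task_group()` maps that identifier back to a `task_group_t*`. Below is a minimal sketch of such an identifier-to-group registry. It assumes the identifier is an opaque integer handle and that a returned pointer must stay valid while more callback threads are created. `callback_thread_id`, `worker_group`, and `callback_thread_registry` are hypothetical stand-ins, not rocprofiler-sdk types.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>

// hypothetical stand-in for rocprofiler_callback_thread_t: an opaque handle
struct callback_thread_id
{
    uint64_t handle = 0;  // 0 is reserved to mean "no thread"
};

// hypothetical placeholder for a per-thread task group
struct worker_group
{
    uint64_t owner = 0;
};

class callback_thread_registry
{
public:
    // create a new group and return the identifier that refers to it
    callback_thread_id create()
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        auto id = callback_thread_id{static_cast<uint64_t>(m_groups.size() + 1)};
        m_groups.emplace_back(std::make_unique<worker_group>(worker_group{id.handle}));
        return id;
    }

    // map an identifier back to its group; nullptr if it is unknown
    worker_group* get(callback_thread_id id)
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        if(id.handle == 0 || id.handle > m_groups.size()) return nullptr;
        return m_groups[id.handle - 1].get();
    }

    // destroy every group, in the spirit of finalize()
    void clear()
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        m_groups.clear();
    }

private:
    std::mutex m_mutex = {};
    // unique_ptr keeps each group at a fixed address when the vector grows
    std::vector<std::unique_ptr<worker_group>> m_groups = {};
};
```

Each group is held behind a `std::unique_ptr`, so previously returned pointers remain valid when the vector reallocates. Reserving handle `0` lets a value-initialized identifier act as "no thread". Handles are only reused after `clear()`, which this sketch treats as teardown.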
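`notify_pre_internal_thread_create()` and `notify_post_internal_thread_create()` take the runtime library on whose behalf a thread is being created. As their names suggest, they presumably run tool-registered callbacks immediately before and after the thread is spawned. Under that assumption, the bracketing can be sketched as follows. The library bitmask filter, the `runtime_library` enumerators, and every class and function name here are hypothetical; the real values of `rocprofiler_runtime_library_t` are not reproduced.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// hypothetical library flags (stand-in for rocprofiler_runtime_library_t)
enum runtime_library : uint32_t
{
    LIBRARY_HSA = 1u << 0,
    LIBRARY_HIP = 1u << 1,
};

using thread_create_cb = std::function<void(runtime_library)>;

// one tool's registration: the libraries it cares about plus its callbacks
struct thread_create_hooks
{
    uint32_t         libs = 0;
    thread_create_cb pre  = {};
    thread_create_cb post = {};
};

class thread_create_notifier
{
public:
    void add(thread_create_hooks hooks)
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        m_hooks.emplace_back(std::move(hooks));
    }

    void notify_pre(runtime_library lib) { invoke(lib, &thread_create_hooks::pre); }
    void notify_post(runtime_library lib) { invoke(lib, &thread_create_hooks::post); }

    // bracket the creation of a thread with the pre/post notifications
    template <typename FuncT>
    std::thread create_thread(runtime_library lib, FuncT&& fn)
    {
        notify_pre(lib);
        auto thr = std::thread{std::forward<FuncT>(fn)};
        notify_post(lib);
        return thr;
    }

private:
    void invoke(runtime_library lib, thread_create_cb thread_create_hooks::*which)
    {
        std::vector<thread_create_hooks> hooks;
        {
            // copy under the lock, then call back without holding it
            std::lock_guard<std::mutex> lk{m_mutex};
            hooks = m_hooks;
        }
        for(auto& h : hooks)
            if((h.libs & lib) != 0 && h.*which) (h.*which)(lib);
    }

    std::mutex                       m_mutex = {};
    std::vector<thread_create_hooks> m_hooks = {};
};
```

The hook list is copied before any callback runs, so a callback may register further hooks through `add()` without deadlocking on `m_mutex`.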