1517a398bf
* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements

Buffer Pool Design
------------------
Replace the fixed array-based double buffer with a dynamic pool design to fix
race conditions that caused "internal correlation id was retired prematurely"
errors.

The original design had a race where flush callbacks could be delivered
out-of-order: when buffer 0 fills and begins flushing, writes go to buffer 1.
If buffer 1 fills before buffer 0's flush completes, the buffer index wraps
back to 0 (which may still be flushing). Independent flush tasks submitted to
the thread pool can complete out of order.

The new pool design:
- Uses a std::deque of buffer instances that grows as needed
- Allocates buffers from the pool when the current buffer needs to flush
- Serializes flushes with a mutex to ensure FIFO callback ordering
- Returns buffers to the pool after flush completion
- Eliminates the race between buffer selection and write operations

New Unit Tests
--------------
- buffer_correlation_ordering.cpp: Tests that API records are always delivered
  before their corresponding retirement records
- buffer_ordering_stress.cpp: Stress tests buffer flush ordering under high
  contention with multiple threads rapidly filling buffers

HSA Tool Hooks
--------------
Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that waits
for pending flush tasks before tool finalization, preventing "retired
prematurely" errors during HSA shutdown.
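The serialized-flush idea from the Buffer Pool Design notes can be sketched as follows. This is a minimal, synchronous illustration and not the rocprofiler-sdk buffer class: `SerializedBuffer`, its members, and the `int` record type are all hypothetical, and the callback runs while the lock is held instead of on a thread pool. Holding one mutex across "swap out the full buffer" and "deliver it" is what keeps the callbacks in fill order.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch: NOT the rocprofiler-sdk implementation.
class SerializedBuffer
{
public:
    using batch_t    = std::vector<int>;
    using callback_t = std::function<void(const batch_t&)>;

    SerializedBuffer(std::size_t capacity, callback_t cb)
    : m_capacity{capacity}
    , m_callback{std::move(cb)}
    {
        m_current.reserve(m_capacity);
    }

    // Append one record; deliver the buffer once it reaches capacity.
    void emplace(int record)
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        m_current.push_back(record);
        if(m_current.size() >= m_capacity) flush_locked();
    }

    // Deliver whatever is buffered, even if the buffer is not full.
    void flush()
    {
        std::lock_guard<std::mutex> lk{m_mutex};
        if(!m_current.empty()) flush_locked();
    }

private:
    // Caller must hold m_mutex. Because selection of the next buffer and
    // delivery of the full one happen under the same lock, batches reach
    // the callback in the order they were filled.
    void flush_locked()
    {
        batch_t full = std::move(m_current);
        if(m_pool.empty())
        {
            // Pool exhausted: grow by allocating a fresh buffer.
            m_current = batch_t{};
            m_current.reserve(m_capacity);
        }
        else
        {
            // Reuse a buffer that an earlier flush returned to the pool.
            m_current = std::move(m_pool.front());
            m_pool.pop_front();
        }
        m_callback(full);
        full.clear();
        m_pool.push_back(std::move(full));  // return the buffer to the pool
    }

    std::size_t         m_capacity;
    callback_t          m_callback;
    std::mutex          m_mutex{};
    batch_t             m_current{};
    std::deque<batch_t> m_pool{};
};
```

The trade-off in this sketch is that writers block while a callback runs. The log notes that the mutex-based design was later reverted over signal-safety and complexity concerns.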
Sanitizer Improvements
----------------------
- LSAN: Set fast_unwind_on_malloc=1 to prevent deadlock in libgcc unwinder
- LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup)
- TSAN: Added suppression for false positive on C++11 thread-safe static
  initialization in create_write_functor
- ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto
- Disabled attachment tests for sanitizers due to library preloading issues

Other Fixes
-----------
- Thread-trace agent test: Use heap-allocated callback state
- Correlation ID: Refactored reference counting and finalization ordering

* [rocprofiler-sdk] Revert buffer pool design changes

Revert buffer.cpp and buffer.hpp to the original double-buffer design from
develop branch. The pool-based redesign introduced concerns about:
- Signal safety (mutex vs atomic_flag)
- API changes (flush() return type)
- Complexity of the new design

This revert removes:
- Dynamic buffer pool with std::deque
- std::mutex/condition_variable synchronization
- buffer_correlation_ordering.cpp test
- buffer_ordering_stress.cpp test

The underlying buffer flush ordering issue will need to be addressed with a
different approach that preserves the original API and synchronization
characteristics.
* [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization

- Revert TOCTOU CAS loop change in sub_ref_count() - not needed with
  consistent checks
- Add fini_status check in correlation_tracing_service::construct() with
  ROCP_CI_LOG warning
- Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp,
  memory_allocation.cpp)
- Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior:
  - hsa/queue.cpp (lines 105, 210)
  - hsa/async_copy.cpp (line 344)
  - hsa/hsa_barrier.cpp (line 43)
  - buffer.cpp (lines 107, 138, 185)

This ensures no correlation IDs are created once finalization starts
(fini_status != 0), preventing races between finalization and ongoing tracing
operations.

* [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation

Buffer records are not guaranteed to arrive in any specific order. Tests and
samples should use timestamps for temporal ordering validation instead.

Changes:
- samples/external_correlation_id_request: Replace 'retired prematurely'
  arrival order check with timestamp-based validation that retirement
  timestamp >= max(end_timestamps) for records with the same correlation ID
- tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check

Correlation IDs are not guaranteed to be monotonically increasing when records
are sorted by timestamp. Temporal ordering should be validated using the
timestamp fields in each record.

* [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal

Restore the SYSTEM keyword to target_include_directories for
rocprofiler-sdk-fmt to match develop branch.
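The timestamp-based rule described in this log (retirement timestamp >= max(end_timestamps) for records sharing a correlation ID) can be sketched as below. The `ApiRecord` and `RetirementRecord` structs and the function name are hypothetical stand-ins that carry only the fields the check needs; they are not the SDK's record types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, reduced record layouts (assumptions for illustration only).
struct ApiRecord
{
    std::uint64_t correlation_id;
    std::uint64_t end_timestamp;
};

struct RetirementRecord
{
    std::uint64_t correlation_id;
    std::uint64_t timestamp;
};

// Returns true when every retirement record is timestamped at or after the
// latest end timestamp seen for the same correlation ID. The arrival order of
// the records in either vector plays no role in the result.
inline bool
retirements_are_temporally_valid(const std::vector<ApiRecord>&        api_records,
                                 const std::vector<RetirementRecord>& retirements)
{
    std::unordered_map<std::uint64_t, std::uint64_t> max_end{};
    for(const auto& rec : api_records)
    {
        auto& latest = max_end[rec.correlation_id];  // value-initialized to 0
        latest       = std::max(latest, rec.end_timestamp);
    }

    for(const auto& ret : retirements)
    {
        auto itr = max_end.find(ret.correlation_id);
        if(itr != max_end.end() && ret.timestamp < itr->second) return false;
    }
    return true;
}
```

A retirement with no matching API record is treated as valid here; a stricter test might reject it instead.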
* [rccl] Remove orphaned rocSHMEM gitlink

Remove orphaned submodule reference that was introduced during a merge but
never had a corresponding .gitmodules entry, causing CI failures with
"fatal: no submodule mapping found in .gitmodules".

* [rocprofiler-sdk] Add HSA ABI version 0x09 support

Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09 which introduces
hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release functions
(added in rocr-runtime SWDEV-561708).

* [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations

This commit consolidates fixes for handling the finalization status during
buffer flush operations across the SDK.

Changes:
- Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully when
  flushing buffers, as this indicates buffers were already flushed during
  finalization (not an error condition)
- HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use > 0 check for
  fini_status to allow operations during finalization process
- buffer.cpp: Revert fini_status checks to use > 0 for consistency
- correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging to
  prevent correlation ID creation after finalization starts

Files modified:
- source/lib/rocprofiler-sdk-tool/tool.cpp
- tests/tools/json-tool.cpp
- source/lib/rocprofiler-sdk/tests/registration.cpp
- source/lib/rocprofiler-sdk/tests/roctx.cpp
- samples/api_buffered_tracing/client.cpp
- samples/counter_collection/buffered_client.cpp
- samples/counter_collection/device_counting_async_client.cpp
- samples/external_correlation_id_request/client.cpp
- samples/pc_sampling/client.cpp
- source/lib/rocprofiler-sdk/buffer.cpp
- source/lib/rocprofiler-sdk/context/correlation_id.cpp
- source/lib/rocprofiler-sdk/hsa/queue.cpp
- source/lib/rocprofiler-sdk/hsa/async_copy.cpp
- source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp

* [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling

Remove the hsa_tool_hooks infrastructure and simplify buffer flush calls in
samples and tools. The ERROR_FINALIZED handling was overly complex and the
hsa_tool_hooks OnUnload synchronization is no longer needed.

Changes:
- Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code
- Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL
- Simplify buffer flush in tool.cpp and json-tool.cpp
- Remove ERROR_FINALIZED special handling from test files

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Fix output_stream move semantics to null source pointers

The default move constructor and move assignment operator for output_stream
did not null out the source's pointers after the move. This caused
double-close when the moved-from temporary was destroyed, leading to
use-after-free crashes (SIGSEGV in std::ostream::sentry).

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration

- generatePerfetto.cpp: Move output_stream into shared_state to prevent
  use-after-free race conditions during Perfetto callback execution
- run-ci.py: Simplify and consolidate sanitizer environment variable
  configuration for better maintainability

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions

The previous changes removed MEMCHECK_SANITIZER_OPTIONS which is required for
CTest to properly pass suppression files to the sanitizers during memcheck
runs.

Co-Authored-By: Claude <noreply@anthropic.com>

* Revert "[rccl] Remove orphaned rocSHMEM gitlink"

This reverts commit 1ad21003941355658fff8114fa27768f11a948f7.

* [rocprofiler-sdk] Revert registration.cpp changes

Revert changes to registration.cpp to match develop branch.

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Remove suppression file content printing from run-ci.py

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix output_stream move ctor/assignment operator

* Fix erroneous revert of registration.cpp

* Fix handling of fini status in correlation ID construction

* [rocprofiler-sdk] Fix OMPT segfault during finalization

Add nullptr checks in OMPT tracing code to handle the case where
correlation_tracing_service::construct() returns nullptr during finalization.
This fixes segfaults in openmp-target-sample and
tests.integration.execute.openmp-tools.

The correlation ID construction now returns nullptr when fini_status > 0, but
the OMPT callbacks were not checking for this, causing crashes when
dereferencing the null pointer during OpenMP runtime shutdown.

Changes:
- event_common(): Return nullptr early if correlation ID is null
- event(): Check for nullptr before calling sub_ref_count()
- ompt_task_create_callback(): Return early if correlation ID is null
- ompt_task_schedule_callback(): Return early if correlation ID is null

* [rocprofiler-sdk] Fix HSA API tracing segfault during finalization

Add nullptr check in hsa_api_impl::functor after correlation ID construction.
During finalization, correlation_service::construct() returns nullptr, and
without this check the code would dereference the null pointer when accessing
corr_id->internal.

This fixes the SEGV at address 0x000000000008 (null + 8 byte offset) that
occurs when HSA async event threads call hsa_signal_destroy during runtime
shutdown after finalization has started.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
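The finalization guard that the last few entries describe (construction returns nullptr once finalization has started, and every caller checks before dereferencing) can be sketched as below. Everything in the `sketch` namespace is hypothetical: the real SDK functions, their signatures, and their return types are not reproduced here, and `std::unique_ptr` is used only to keep the example leak-free. The log uses both `> 0` and `!= 0` for the status test at different points; this sketch follows the `> 0` form from the final entries.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>

namespace sketch
{
// Hypothetical stand-in for the SDK's finalization status.
std::atomic<int> fini_status{0};

// Hypothetical, reduced correlation ID carrying only the internal value.
struct correlation_id
{
    std::uint64_t internal = 0;
};

// Returns nullptr once finalization has started, so no new correlation IDs
// are created while teardown is in progress.
inline std::unique_ptr<correlation_id>
construct()
{
    static std::atomic<std::uint64_t> next_id{1};
    if(fini_status.load() > 0) return nullptr;

    auto corr_id      = std::make_unique<correlation_id>();
    corr_id->internal = next_id.fetch_add(1);
    return corr_id;
}

// A caller in the style of a tracing wrapper: it must check for nullptr
// before touching corr_id->internal. Returns 0 when tracing was skipped.
inline std::uint64_t
trace_api_call()
{
    auto corr_id = construct();
    if(!corr_id) return 0;  // finalization in progress: skip tracing
    return corr_id->internal;
}
}  // namespace sketch
```

Without the `if(!corr_id)` check, the read of `corr_id->internal` after finalization would dereference a null pointer, which is the kind of crash the OMPT and HSA entries report.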
256 lines
9.8 KiB
C++
// MIT License
//
// Copyright (c) 2024-2025 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
// furnished to do so, subject to the following conditions:
//
// The above copyright notice and this permission notice shall be included in all
// copies or substantial portions of the Software.
//
// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
// SOFTWARE.
//
// undefine NDEBUG so asserts are implemented
#ifdef NDEBUG
#    undef NDEBUG
#endif

#include "trace_callbacks.hpp"

#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <mutex>
#include <set>
#include <string>
#include <vector>

namespace ATTTest
{
namespace Agent
{
rocprofiler_client_id_t* client_id = nullptr;
rocprofiler_context_id_t agent_ctx = {};
rocprofiler_context_id_t tracing_ctx = {};

// Callback state allocated on heap to control destruction order
struct CallbackState
{
    std::atomic<bool> isprofiling{false};
    std::atomic<bool> stop_profiling{false};
    std::mutex mut{};
    std::set<int> captured_ids{};
};

CallbackState* callback_state = nullptr;

void
tool_fini(void* tool_data)
{
    // Stop contexts to ensure no more callbacks are dispatched before static destruction
    rocprofiler_stop_context(tracing_ctx);
    rocprofiler_stop_context(agent_ctx);

    // Call the shared finalize logic
    Callbacks::finalize(tool_data);

    // Clean up heap-allocated callback state after finalize
    delete callback_state;
    callback_state = nullptr;
}

void
dispatch_tracing_callback(rocprofiler_callback_tracing_record_t record,
                          rocprofiler_user_data_t* /* user_data */,
                          void* /* userdata */)
{
    if(record.kind != ROCPROFILER_CALLBACK_TRACING_KERNEL_DISPATCH) return;
    if(record.phase == ROCPROFILER_CALLBACK_PHASE_EXIT) return;

    // Check if callback_state is still valid (may be null during shutdown)
    if(!callback_state) return;

    assert(record.payload);
    auto* rdata = static_cast<rocprofiler_callback_tracing_kernel_dispatch_data_t*>(record.payload);
    auto dispatch_id = rdata->dispatch_info.dispatch_id;

    // Choose two dispatches to begin(6) and end(10) the trace
    constexpr uint64_t begin_dispatch = 6;
    constexpr uint64_t end_dispatch = 10;

    if(record.phase == ROCPROFILER_CALLBACK_PHASE_ENTER)
    {
        if(dispatch_id == begin_dispatch)
        {
            ROCPROFILER_CALL(rocprofiler_start_context(agent_ctx), "context start");
            callback_state->isprofiling.store(true);
        }
        if(callback_state->isprofiling && dispatch_id <= end_dispatch)
        {
            std::unique_lock<std::mutex> lk(callback_state->mut);
            callback_state->captured_ids.insert(dispatch_id);
        }
        if(dispatch_id > end_dispatch) callback_state->stop_profiling.store(true);
        return;
    }

    assert(record.phase == ROCPROFILER_CALLBACK_PHASE_NONE);

    if(!callback_state->isprofiling) return;

    std::unique_lock<std::mutex> lk(callback_state->mut);
    callback_state->captured_ids.erase(dispatch_id);
    if(!callback_state->captured_ids.empty() || callback_state->stop_profiling == false) return;

    bool _exp = true;
    if(!callback_state->isprofiling.compare_exchange_strong(_exp, false, std::memory_order_relaxed))
        return;

    ROCPROFILER_CALL(rocprofiler_stop_context(agent_ctx), "context stop");
}

rocprofiler_status_t
query_available_agents(rocprofiler_agent_version_t /* version */,
                       const void** agents,
                       size_t num_agents,
                       void* user_data)
{
    rocprofiler_user_data_t user{};
    user.ptr = user_data;

    for(size_t idx = 0; idx < num_agents; idx++)
    {
        const auto* agent = static_cast<const rocprofiler_agent_v0_t*>(agents[idx]);
        if(agent->type != ROCPROFILER_AGENT_TYPE_GPU) continue;

        uint64_t buffer_size_gb = 1;

        // Are we testing for larger buffers?
        if(const char* var = std::getenv("ATT_LARGE_BUFFER_TEST"); var && atoi(var))
        {
            // To fully test this feature, we need >4GB per shader engine (>8GB total).
            // Some RDNA GPUs only have 8GB of VRAM, so we have to use 5GB total = 2.5GB per SE.
            uint64_t total_memory = 0;
            for(uint32_t i = 0; i < agent->mem_banks_count; i++)
                total_memory += agent->mem_banks[i].size_in_bytes;

            // Check we have >11GB VRAM. If so, allocate 10GB.
            if(total_memory > (11ul << 30))
                buffer_size_gb = 10;
            else
                buffer_size_gb = 5;
        }

        uint64_t buffer_size_bytes = buffer_size_gb << 30;
        if(agent->gfx_target_version / 10000 == 11u)
            buffer_size_bytes = 255ul << 20;  // gfx11 limitation

        auto parameters = std::vector<rocprofiler_thread_trace_parameter_t>{};
        parameters.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_TARGET_CU, {1}});
        parameters.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_SIMD_SELECT, {0xF}});
        parameters.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_BUFFER_SIZE, {buffer_size_bytes}});
        parameters.push_back({ROCPROFILER_THREAD_TRACE_PARAMETER_SHADER_ENGINE_MASK, {0x3}});

        static const bool extra_args =
            std::getenv("ATT_NODETAIL") ? std::stoi(std::getenv("ATT_NODETAIL")) != 0 : false;
        if(extra_args)
        {
            // Don't generate instruction profiling, only occupancy and shaderdata
            parameters.emplace_back(rocprofiler_thread_trace_parameter_t{
                ROCPROFILER_THREAD_TRACE_PARAMETER_NO_DETAIL, {1}});
        }

        ROCPROFILER_CALL(
            rocprofiler_configure_device_thread_trace_service(agent_ctx,
                                                              agent->id,
                                                              parameters.data(),
                                                              parameters.size(),
                                                              Callbacks::shader_data_callback,
                                                              user),
            "thread trace service configure");
    }
    return ROCPROFILER_STATUS_SUCCESS;
}

int
tool_init(rocprofiler_client_finalize_t /* fini_func */, void* /* tool_data */)
{
    Callbacks::init();

    // Allocate callback state on heap for controlled destruction order
    callback_state = new CallbackState{};

    ROCPROFILER_CALL(rocprofiler_create_context(&tracing_ctx), "context creation");
    ROCPROFILER_CALL(rocprofiler_create_context(&agent_ctx), "context creation");

    ROCPROFILER_CALL(
        rocprofiler_configure_callback_tracing_service(tracing_ctx,
                                                       ROCPROFILER_CALLBACK_TRACING_CODE_OBJECT,
                                                       nullptr,
                                                       0,
                                                       Callbacks::tool_codeobj_tracing_callback,
                                                       nullptr),
        "code object tracing service configure");

    ROCPROFILER_CALL(
        rocprofiler_configure_callback_tracing_service(tracing_ctx,
                                                       ROCPROFILER_CALLBACK_TRACING_KERNEL_DISPATCH,
                                                       nullptr,
                                                       0,
                                                       dispatch_tracing_callback,
                                                       nullptr),
        "dispatch tracing service configure");

    ROCPROFILER_CALL(rocprofiler_query_available_agents(ROCPROFILER_AGENT_INFO_VERSION_0,
                                                        &query_available_agents,
                                                        sizeof(rocprofiler_agent_t),
                                                        nullptr),
                     "Failed to find GPU agents");

    int valid_ctx = 0;
    ROCPROFILER_CALL(rocprofiler_context_is_valid(agent_ctx, &valid_ctx), "validity check");
    assert(valid_ctx != 0);
    ROCPROFILER_CALL(rocprofiler_context_is_valid(tracing_ctx, &valid_ctx), "validity check");
    assert(valid_ctx != 0);

    ROCPROFILER_CALL(rocprofiler_start_context(tracing_ctx), "context start");

    // no errors
    return 0;
}

}  // namespace Agent
}  // namespace ATTTest

extern "C" rocprofiler_tool_configure_result_t*
rocprofiler_configure(uint32_t /* version */,
                      const char* /* runtime_version */,
                      uint32_t priority,
                      rocprofiler_client_id_t* id)
{
    // only activate if main tool
    if(priority > 0) return nullptr;

    // set the client name
    id->name = "ATT_test_agent";

    // store client info
    ATTTest::Agent::client_id = id;

    // create configure data
    static auto cfg =
        rocprofiler_tool_configure_result_t{sizeof(rocprofiler_tool_configure_result_t),
                                            &ATTTest::Agent::tool_init,
                                            &ATTTest::Agent::tool_fini,
                                            nullptr};

    // return pointer to configure data
    return &cfg;
}