1517a398bf
* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements Buffer Pool Design ------------------ Replace the fixed array-based double buffer with a dynamic pool design to fix race conditions that caused "internal correlation id was retired prematurely" errors. The original design had a race where flush callbacks could be delivered out-of-order: when buffer 0 fills and begins flushing, writes go to buffer 1. If buffer 1 fills before buffer 0's flush completes, the buffer index wraps back to 0 (which may still be flushing). Independent flush tasks submitted to the thread pool can complete out of order. The new pool design: - Uses a std::deque of buffer instances that grows as needed - Allocates buffers from the pool when the current buffer needs to flush - Serializes flushes with a mutex to ensure FIFO callback ordering - Returns buffers to the pool after flush completion - Eliminates the race between buffer selection and write operations New Unit Tests -------------- - buffer_correlation_ordering.cpp: Tests that API records are always delivered before their corresponding retirement records - buffer_ordering_stress.cpp: Stress tests buffer flush ordering under high contention with multiple threads rapidly filling buffers HSA Tool Hooks -------------- Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that waits for pending flush tasks before tool finalization, preventing "retired prematurely" errors during HSA shutdown. 
Sanitizer Improvements ---------------------- - LSAN: Set fast_unwind_on_malloc=1 to prevent deadlock in libgcc unwinder - LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup) - TSAN: Added suppression for false positive on C++11 thread-safe static initialization in create_write_functor - ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto - Disabled attachment tests for sanitizers due to library preloading issues Other Fixes ----------- - Thread-trace agent test: Use heap-allocated callback state - Correlation ID: Refactored reference counting and finalization ordering * [rocprofiler-sdk] Revert buffer pool design changes Revert buffer.cpp and buffer.hpp to the original double-buffer design from develop branch. The pool-based redesign introduced concerns about: - Signal safety (mutex vs atomic_flag) - API changes (flush() return type) - Complexity of the new design This revert removes: - Dynamic buffer pool with std::deque - std::mutex/condition_variable synchronization - buffer_correlation_ordering.cpp test - buffer_ordering_stress.cpp test The underlying buffer flush ordering issue will need to be addressed with a different approach that preserves the original API and synchronization characteristics. 
* [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization - Revert TOCTOU CAS loop change in sub_ref_count() - not needed with consistent checks - Add fini_status check in correlation_tracing_service::construct() with ROCP_CI_LOG warning - Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp, memory_allocation.cpp) - Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior: - hsa/queue.cpp (lines 105, 210) - hsa/async_copy.cpp (line 344) - hsa/hsa_barrier.cpp (line 43) - buffer.cpp (lines 107, 138, 185) This ensures no correlation IDs are created once finalization starts (fini_status != 0), preventing races between finalization and ongoing tracing operations. * [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation Buffer records are not guaranteed to arrive in any specific order. Tests and samples should use timestamps for temporal ordering validation instead. Changes: - samples/external_correlation_id_request: Replace 'retired prematurely' arrival order check with timestamp-based validation that retirement timestamp >= max(end_timestamps) for records with the same correlation ID - tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check - tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check - tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check Correlation IDs are not guaranteed to be monotonically increasing when records are sorted by timestamp. Temporal ordering should be validated using the timestamp fields in each record. * [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal Restore the SYSTEM keyword to target_include_directories for rocprofiler-sdk-fmt to match develop branch. 
* [rccl] Remove orphaned rocSHMEM gitlink Remove orphaned submodule reference that was introduced during a merge but never had a corresponding .gitmodules entry, causing CI failures with "fatal: no submodule mapping found in .gitmodules". * [rocprofiler-sdk] Add HSA ABI version 0x09 support Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09 which introduces hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release functions (added in rocr-runtime SWDEV-561708). * [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations This commit consolidates fixes for handling the finalization status during buffer flush operations across the SDK. Changes: - Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully when flushing buffers, as this indicates buffers were already flushed during finalization (not an error condition) - HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use > 0 check for fini_status to allow operations during finalization process - buffer.cpp: Revert fini_status checks to use > 0 for consistency - correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging to prevent correlation ID creation after finalization starts Files modified: - source/lib/rocprofiler-sdk-tool/tool.cpp - tests/tools/json-tool.cpp - source/lib/rocprofiler-sdk/tests/registration.cpp - source/lib/rocprofiler-sdk/tests/roctx.cpp - samples/api_buffered_tracing/client.cpp - samples/counter_collection/buffered_client.cpp - samples/counter_collection/device_counting_async_client.cpp - samples/external_correlation_id_request/client.cpp - samples/pc_sampling/client.cpp - source/lib/rocprofiler-sdk/buffer.cpp - source/lib/rocprofiler-sdk/context/correlation_id.cpp - source/lib/rocprofiler-sdk/hsa/queue.cpp - source/lib/rocprofiler-sdk/hsa/async_copy.cpp - source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp * [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling Remove the hsa_tool_hooks infrastructure and 
simplify buffer flush calls in samples and tools. The ERROR_FINALIZED handling was overly complex and the hsa_tool_hooks OnUnload synchronization is no longer needed. Changes: - Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code - Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL - Simplify buffer flush in tool.cpp and json-tool.cpp - Remove ERROR_FINALIZED special handling from test files Co-Authored-By: Claude <noreply@anthropic.com> * [rocprofiler-sdk] Fix output_stream move semantics to null source pointers The default move constructor and move assignment operator for output_stream did not null out the source's pointers after the move. This caused double-close when the moved-from temporary was destroyed, leading to use-after-free crashes (SIGSEGV in std::ostream::sentry). Co-Authored-By: Claude <noreply@anthropic.com> * [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration - generatePerfetto.cpp: Move output_stream into shared_state to prevent use-after-free race conditions during Perfetto callback execution - run-ci.py: Simplify and consolidate sanitizer environment variable configuration for better maintainability Co-Authored-By: Claude <noreply@anthropic.com> * [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions The previous changes removed MEMCHECK_SANITIZER_OPTIONS which is required for CTest to properly pass suppression files to the sanitizers during memcheck runs. Co-Authored-By: Claude <noreply@anthropic.com> * Revert "[rccl] Remove orphaned rocSHMEM gitlink" This reverts commit 1ad21003941355658fff8114fa27768f11a948f7. * [rocprofiler-sdk] Revert registration.cpp changes Revert changes to registration.cpp to match develop branch. 
Co-Authored-By: Claude <noreply@anthropic.com> * [rocprofiler-sdk] Remove suppression file content printing from run-ci.py Co-Authored-By: Claude <noreply@anthropic.com> * Fix output_stream move ctor/assignment operator * Fix erroneous revert of registration.cpp * Fix handling of fini status in correlation ID construction * [rocprofiler-sdk] Fix OMPT segfault during finalization Add nullptr checks in OMPT tracing code to handle the case where correlation_tracing_service::construct() returns nullptr during finalization. This fixes segfaults in openmp-target-sample and tests.integration.execute.openmp-tools. The correlation ID construction now returns nullptr when fini_status > 0, but the OMPT callbacks were not checking for this, causing crashes when dereferencing the null pointer during OpenMP runtime shutdown. Changes: - event_common(): Return nullptr early if correlation ID is null - event(): Check for nullptr before calling sub_ref_count() - ompt_task_create_callback(): Return early if correlation ID is null - ompt_task_schedule_callback(): Return early if correlation ID is null * [rocprofiler-sdk] Fix HSA API tracing segfault during finalization Add nullptr check in hsa_api_impl::functor after correlation ID construction. During finalization, correlation_service::construct() returns nullptr, and without this check the code would dereference the null pointer when accessing corr_id->internal. This fixes the SEGV at address 0x000000000008 (null + 8 byte offset) that occurs when HSA async event threads call hsa_signal_destroy during runtime shutdown after finalization has started. --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
526 lines
19 KiB
Python
#!/usr/bin/env python3

# MIT License
#
# Copyright (c) 2024-2025 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

import sys
import pytest

test_api_traces = [
    "hsa_api_traces",
    "marker_api_traces",
    "hip_api_traces",
    "rccl_api_traces",
    "scratch_memory_traces",
]


# helper function
def node_exists(name, data, min_len=1):
    assert name in data
    assert data[name] is not None
    if isinstance(data[name], (list, tuple, dict, set)):
        assert len(data[name]) >= min_len, f"{name}:\n{data}"


def get_operation(record, kind_name, op_name=None):
    for idx, itr in enumerate(record["names"]):
        if kind_name == itr["kind"]:
            if op_name is None:
                return idx, itr["operations"]
            else:
                for oidx, oname in enumerate(itr["operations"]):
                    if op_name == oname:
                        return oidx

    return None


def get_operation_name(record, kind_idx, op_idx):
    for idx, itr in enumerate(record["names"]):
        if idx == kind_idx:
            return itr["operations"][op_idx]

    return None


def groupby_corr_id(trace_item, op_id=None):
    """
    If op_id is not None, returns only the records with that operation ID

    {
        corr_id-1: record with internal corr_id = corr_id-1
        corr_id-2: record with internal corr_id = corr_id-2
        ...
    }
    """
    from rocprofiler_sdk.pytest_utils.dotdict import dotdict

    ret = {}

    for x in trace_item:
        if op_id is not None and x.operation != op_id:
            continue

        corr_id = x.correlation_id["internal"]

        if corr_id in ret.keys():
            assert False, f"Duplicate internal corr_id {corr_id}"
        else:
            ret[corr_id] = x

    return dotdict(ret)


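# A minimal, self-contained sketch of the grouping done by groupby_corr_id
# above. It is illustrative only: it uses plain dicts instead of the dotdict
# helper, the name group_by_internal_corr_id is hypothetical, and the sample
# records are made up to mimic the shape of the API trace entries.

```python
def group_by_internal_corr_id(records, op_id=None):
    """Map internal correlation ID -> record, optionally filtered by operation ID."""
    grouped = {}
    for rec in records:
        # skip records for other operations when a filter is requested
        if op_id is not None and rec["operation"] != op_id:
            continue
        corr_id = rec["correlation_id"]["internal"]
        # each internal correlation ID is expected to appear at most once
        if corr_id in grouped:
            raise ValueError(f"Duplicate internal corr_id {corr_id}")
        grouped[corr_id] = rec
    return grouped


# hypothetical records shaped like the API trace entries
sample_records = [
    {"operation": 7, "correlation_id": {"internal": 1}},
    {"operation": 9, "correlation_id": {"internal": 2}},
    {"operation": 7, "correlation_id": {"internal": 3}},
]

all_grouped = group_by_internal_corr_id(sample_records)
only_op_7 = group_by_internal_corr_id(sample_records, op_id=7)
```
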
def test_data_structure(input_data):
    """verify minimum amount of expected data is present"""
    data = input_data

    node_exists("rocprofiler-sdk-json-tool", data)

    sdk_data = data["rocprofiler-sdk-json-tool"]

    node_exists("metadata", sdk_data)
    node_exists("pid", sdk_data["metadata"])
    node_exists("main_tid", sdk_data["metadata"])
    node_exists("init_time", sdk_data["metadata"])
    node_exists("fini_time", sdk_data["metadata"])

    node_exists("agents", sdk_data)
    node_exists("call_stack", sdk_data)
    node_exists("callback_records", sdk_data)
    node_exists("buffer_records", sdk_data)

    node_exists("names", sdk_data["callback_records"])
    node_exists("code_objects", sdk_data["callback_records"])
    node_exists("kernel_symbols", sdk_data["callback_records"])
    node_exists("host_functions", sdk_data["callback_records"])
    node_exists("hsa_api_traces", sdk_data["callback_records"])
    node_exists("hip_api_traces", sdk_data["callback_records"], 0)
    node_exists("marker_api_traces", sdk_data["callback_records"])
    node_exists("kernel_dispatch", sdk_data["callback_records"])
    node_exists("memory_copies", sdk_data["callback_records"], 24)

    node_exists("names", sdk_data["buffer_records"])
    node_exists("kernel_dispatch", sdk_data["buffer_records"])
    node_exists("memory_copies", sdk_data["buffer_records"], 12)
    node_exists("hsa_api_traces", sdk_data["buffer_records"])
    node_exists("hip_api_traces", sdk_data["buffer_records"], 0)
    node_exists("marker_api_traces", sdk_data["buffer_records"])
    node_exists("retired_correlation_ids", sdk_data["buffer_records"])


def test_size_entries(input_data):
    # check that size fields are > 0 but account for function arguments
    # which are named "size"
    def check_size(data, bt):
        if "size" in data.keys():
            if isinstance(data["size"], str) and bt.endswith('["args"]'):
                pass
            else:
                assert data["size"] > 0, f"origin: {bt}"

    # recursively check the entire data structure
    def iterate_data(data, bt):
        if isinstance(data, (list, tuple)):
            for i, itr in enumerate(data):
                if isinstance(itr, dict):
                    check_size(itr, f"{bt}[{i}]")
                iterate_data(itr, f"{bt}[{i}]")
        elif isinstance(data, dict):
            check_size(data, f"{bt}")
            for key, itr in data.items():
                iterate_data(itr, f'{bt}["{key}"]')

    # start recursive check over entire JSON dict
    iterate_data(input_data, "input_data")


def test_timestamps(input_data):
    data = input_data
    sdk_data = data["rocprofiler-sdk-json-tool"]

    cb_start = {}
    cb_end = {}
    for titr in test_api_traces:
        for itr in sdk_data["callback_records"][titr]:
            cid = itr["correlation_id"]["internal"]
            phase = itr["phase"]
            if phase == 1:
                cb_start[cid] = itr["timestamp"]
            elif phase == 2:
                cb_end[cid] = itr["timestamp"]
                assert cb_start[cid] <= itr["timestamp"]
            else:
                assert phase == 1 or phase == 2

        for itr in sdk_data["buffer_records"][titr]:
            assert itr["start_timestamp"] <= itr["end_timestamp"]

    for titr in ["kernel_dispatch", "memory_copies"]:
        for itr in sdk_data["buffer_records"][titr]:
            assert itr["start_timestamp"] < itr["end_timestamp"], f"[{titr}] {itr}"
            assert itr["correlation_id"]["internal"] > 0, f"[{titr}] {itr}"
            assert itr["correlation_id"]["external"] > 0, f"[{titr}] {itr}"
            assert (
                sdk_data["metadata"]["init_time"] < itr["start_timestamp"]
            ), f"[{titr}] {itr}"
            assert (
                sdk_data["metadata"]["init_time"] < itr["end_timestamp"]
            ), f"[{titr}] {itr}"
            assert (
                sdk_data["metadata"]["fini_time"] > itr["start_timestamp"]
            ), f"[{titr}] {itr}"
            assert (
                sdk_data["metadata"]["fini_time"] > itr["end_timestamp"]
            ), f"[{titr}] {itr}"

            api_start = cb_start[itr["correlation_id"]["internal"]]
            # api_end = cb_end[itr["correlation_id"]["internal"]]
            assert api_start < itr["start_timestamp"], f"[{titr}] {itr}"
            # assert api_end <= itr["end_timestamp"], f"[{titr}] {itr}"


def test_internal_correlation_ids(input_data):
    data = input_data
    sdk_data = data["rocprofiler-sdk-json-tool"]

    api_corr_ids = []
    for titr in test_api_traces:
        for itr in sdk_data["callback_records"][titr]:
            api_corr_ids.append(itr["correlation_id"]["internal"])

        for itr in sdk_data["buffer_records"][titr]:
            api_corr_ids.append(itr["correlation_id"]["internal"])

    api_corr_ids_sorted = sorted(api_corr_ids)
    api_corr_ids_unique = list(set(api_corr_ids))

    for itr in sdk_data["buffer_records"]["kernel_dispatch"]:
        assert itr["correlation_id"]["internal"] in api_corr_ids_unique

    for itr in sdk_data["buffer_records"]["memory_copies"]:
        assert itr["correlation_id"]["internal"] in api_corr_ids_unique

    len_corr_id_unq = len(api_corr_ids_unique)
    assert len(api_corr_ids) != len_corr_id_unq
    assert max(api_corr_ids_sorted) == len_corr_id_unq


def test_external_correlation_ids(input_data):
    data = input_data
    sdk_data = data["rocprofiler-sdk-json-tool"]

    extern_corr_ids = []
    for titr in test_api_traces:
        for itr in sdk_data["callback_records"][titr]:
            assert itr["correlation_id"]["external"] > 0
            assert itr["thread_id"] == itr["correlation_id"]["external"]
            extern_corr_ids.append(itr["correlation_id"]["external"])

    # only membership is checked below, so a set is sufficient
    extern_corr_ids = set(extern_corr_ids)
    for titr in test_api_traces:
        for itr in sdk_data["buffer_records"][titr]:
            assert itr["correlation_id"]["external"] > 0, f"[{titr}] {itr}"
            assert (
                itr["thread_id"] == itr["correlation_id"]["external"]
            ), f"[{titr}] {itr}"
            assert itr["thread_id"] in extern_corr_ids, f"[{titr}] {itr}"
            assert itr["correlation_id"]["external"] in extern_corr_ids, f"[{titr}] {itr}"

    for titr in ["kernel_dispatch", "memory_copies"]:
        for itr in sdk_data["buffer_records"][titr]:
            assert itr["correlation_id"]["external"] > 0, f"[{titr}] {itr}"
            assert itr["correlation_id"]["external"] in extern_corr_ids, f"[{titr}] {itr}"

        for itr in sdk_data["callback_records"][titr]:
            assert itr["correlation_id"]["external"] > 0, f"[{titr}] {itr}"
            assert itr["correlation_id"]["external"] in extern_corr_ids, f"[{titr}] {itr}"


def test_kernel_ids(input_data):
    data = input_data
    sdk_data = data["rocprofiler-sdk-json-tool"]

    symbol_info = {}
    for itr in sdk_data["callback_records"]["kernel_symbols"]:
        phase = itr["phase"]
        payload = itr["payload"]
        kern_id = payload["kernel_id"]

        assert phase == 1 or phase == 2
        assert kern_id > 0
        if phase == 1:
            assert len(payload["kernel_name"]) > 0
            symbol_info[kern_id] = payload
        elif phase == 2:
            assert payload["kernel_id"] in symbol_info.keys()
            assert payload["kernel_name"] == symbol_info[kern_id]["kernel_name"]

    for itr in sdk_data["buffer_records"]["kernel_dispatch"]:
        assert itr["dispatch_info"]["kernel_id"] in symbol_info.keys()

    for itr in sdk_data["callback_records"]["kernel_dispatch"]:
        assert itr["payload"]["dispatch_info"]["kernel_id"] in symbol_info.keys()


def test_kernel_dispatch_ids(input_data):
    data = input_data
    sdk_data = data["rocprofiler-sdk-json-tool"]

    num_dispatches = len(sdk_data["buffer_records"]["kernel_dispatch"])
    num_cb_dispatches = len(sdk_data["callback_records"]["kernel_dispatch"])

    assert num_cb_dispatches == (3 * num_dispatches)

    bf_seq_ids = []
    for itr in sdk_data["buffer_records"]["kernel_dispatch"]:
        bf_seq_ids.append(itr["dispatch_info"]["dispatch_id"])

    cb_seq_ids = []
    for itr in sdk_data["callback_records"]["kernel_dispatch"]:
        cb_seq_ids.append(itr["payload"]["dispatch_info"]["dispatch_id"])

    bf_seq_ids = sorted(bf_seq_ids)
    cb_seq_ids = sorted(cb_seq_ids)

    assert (3 * len(bf_seq_ids)) == len(cb_seq_ids)

    assert bf_seq_ids[0] == cb_seq_ids[0]
    assert bf_seq_ids[-1] == cb_seq_ids[-1]

    def get_uniq(data):
        # set iteration order is not guaranteed, so sort before the
        # list comparison against the sorted IDs below
        return sorted(set(data))

    bf_seq_ids_uniq = get_uniq(bf_seq_ids)
    cb_seq_ids_uniq = get_uniq(cb_seq_ids)

    assert bf_seq_ids == bf_seq_ids_uniq
    assert len(cb_seq_ids) == (3 * len(cb_seq_ids_uniq))
    assert len(bf_seq_ids) == num_dispatches
    assert len(bf_seq_ids_uniq) == num_dispatches
    assert len(cb_seq_ids_uniq) == num_dispatches


def test_async_copy_direction(input_data):
    data = input_data
    sdk_data = data["rocprofiler-sdk-json-tool"]

    # Direction values:
    #   0 == ??? (unknown)
    #   1 == H2H (host to host)
    #   2 == H2D (host to device)
    #   3 == D2H (device to host)
    #   4 == D2D (device to device)
    default_async_dir_cnt = dict([(idx, 0) for idx in range(0, 5)])
    thread_async_dir_cnt = {}
    for itr in sdk_data.buffer_records.memory_copies:
        tid = itr.thread_id
        if tid not in thread_async_dir_cnt.keys():
            # copy the defaults so each thread gets its own counters;
            # assigning the same dict object would merge all threads' counts
            thread_async_dir_cnt[tid] = dict(default_async_dir_cnt)
        op_id = itr.operation
        assert op_id > 1, f"{itr}"
        assert op_id < 4, f"{itr}"
        thread_async_dir_cnt[tid][op_id] += 1

    for itr in sdk_data.callback_records.memory_copies:
        tid = itr.thread_id
        if tid not in thread_async_dir_cnt.keys():
            thread_async_dir_cnt[tid] = dict(default_async_dir_cnt)
        op_id = itr.operation
        assert op_id > 1, f"{itr}"
        assert op_id < 4, f"{itr}"
        thread_async_dir_cnt[tid][op_id] += 1

        phase = itr.phase
        pitr = itr.payload

        assert phase is not None, f"{itr}"
        assert pitr is not None, f"{itr}"

        if phase == 1:
            assert pitr.start_timestamp == 0, f"{itr}"
            assert pitr.end_timestamp == 0, f"{itr}"
        elif phase == 2:
            assert pitr.start_timestamp > 0, f"{itr}"
            assert pitr.end_timestamp > 0, f"{itr}"
            assert pitr.end_timestamp >= pitr.start_timestamp, f"{itr}"
        else:
            assert phase == 1 or phase == 2, f"{itr}"

    # in the transpose test which generates the input file,
    # two threads each perform one H2D + one D2H memory copy.
    # there are at least two callback records (phase start +
    # phase end) and one buffer record for each memory copy,
    # i.e., at least 3 records per memory copy
    assert len(thread_async_dir_cnt) == 2, f"{thread_async_dir_cnt}"
    for tid, async_dir_cnt in thread_async_dir_cnt.items():
        min_copy_records = 3
        assert async_dir_cnt[0] == 0
        assert async_dir_cnt[1] == 0
        assert async_dir_cnt[2] >= min_copy_records, f"TID={tid}:\n\t{async_dir_cnt}"
        assert async_dir_cnt[3] >= min_copy_records, f"TID={tid}:\n\t{async_dir_cnt}"
        assert async_dir_cnt[4] == 0
        # HIP memory copies may be decomposed into more than one
        # memory copy at the HSA level so require it to be a multiple
        # of min_copy_records
        assert (
            async_dir_cnt[2] % min_copy_records
        ) == 0, f"TID={tid}:\n\t{async_dir_cnt}"
        assert (
            async_dir_cnt[3] % min_copy_records
        ) == 0, f"TID={tid}:\n\t{async_dir_cnt}"


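# A small sketch of the per-thread, per-direction counting described in the
# comments of test_async_copy_direction. It is an illustration under stated
# assumptions, not the SDK's own helper: count_copy_directions is a
# hypothetical name, records are plain dicts, and the sample data models two
# threads that each produce three records for one H2D (2) and one D2H (3) copy.

```python
def count_copy_directions(records):
    """Count memory-copy records per thread ID and per direction value (0..4)."""
    per_thread = {}
    for rec in records:
        # a fresh counter dict is built for every new thread ID, so the
        # counts of different threads are never shared
        counts = per_thread.setdefault(rec["thread_id"], {idx: 0 for idx in range(5)})
        counts[rec["operation"]] += 1
    return per_thread


# hypothetical input: 2 threads x 2 directions x 3 records per copy
sample = [
    {"thread_id": tid, "operation": op}
    for tid in (101, 102)
    for op in (2, 3)
    for _ in range(3)
]
counts = count_copy_directions(sample)
```
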
def test_ancestor_ids(input_data):
    """
    This test ensures that each memcpy can be traced back to
    a hipMemcpyAsync through ancestor IDs
    """
    from rocprofiler_sdk.pytest_utils.dotdict import dotdict

    data = input_data
    sdk_data = data["rocprofiler-sdk-json-tool"]
    buffer_records = sdk_data.buffer_records
    memcopies = buffer_records.memory_copies

    _, hip_op_ids = get_operation(buffer_records, "HIP_RUNTIME_API")
    hip_memcopy_id = get_operation(buffer_records, "HIP_RUNTIME_API", "hipMemcpyAsync")

    # dict with { internal id : record }
    hip_memcopies = groupby_corr_id(buffer_records.hip_api_traces, hip_memcopy_id)

    hsa_records = groupby_corr_id(buffer_records.hsa_api_traces)
    hip_records = groupby_corr_id(buffer_records.hip_api_traces)

    accounted_for_hip_ids = []

    for memcpy in memcopies:
        parent_hsa_call = hsa_records[memcpy.correlation_id.internal]
        parent_hip_call = hip_records[parent_hsa_call.correlation_id.ancestor]

        assert (
            parent_hip_call.thread_id == parent_hsa_call.thread_id
        ), "Expected hsa and hip calls to be on the same thread"
        assert hip_op_ids[parent_hip_call.operation] == "hipMemcpyAsync"
        accounted_for_hip_ids.append(parent_hip_call.correlation_id.internal)

    # Ensure we looked through all HIP entries.
    # Note: the message must follow the condition outside the parentheses;
    # asserting a (condition, message) tuple always passes.
    assert set(accounted_for_hip_ids) == set(
        hip_memcopies.keys()
    ), "Expected to account for all HIP memcpy calls through ancestor ID lookup"


def test_retired_correlation_ids(input_data):
    data = input_data
    sdk_data = data["rocprofiler-sdk-json-tool"]
    buffer_records = sdk_data["buffer_records"]
    api_name_info = {}

    def _sort_dict(inp):
        return dict(sorted(inp.items()))

    api_corr_ids = {}
    for titr in test_api_traces:
        for itr in sdk_data["buffer_records"][titr]:
            corr_id = itr["correlation_id"]["internal"]
            name = get_operation_name(buffer_records, itr["kind"], itr["operation"])

            assert corr_id not in api_corr_ids.keys()
            assert name is not None, f"{itr}"

            api_corr_ids[corr_id] = itr
            api_name_info[corr_id] = name

    async_corr_ids = {}
    for titr in ["kernel_dispatch", "memory_copies"]:
        for itr in sdk_data["buffer_records"][titr]:
            corr_id = itr["correlation_id"]["internal"]
            assert corr_id not in async_corr_ids.keys()
            async_corr_ids[corr_id] = itr

    retired_corr_ids = {}
    for itr in sdk_data["buffer_records"]["retired_correlation_ids"]:
        corr_id = itr["internal_correlation_id"]
        assert corr_id not in retired_corr_ids.keys()
        retired_corr_ids[corr_id] = itr

    api_corr_ids = _sort_dict(api_corr_ids)
    async_corr_ids = _sort_dict(async_corr_ids)
    retired_corr_ids = _sort_dict(retired_corr_ids)

    #
    # verify all the correlation ids were retired
    #
    num_api_corr_ids = len(api_corr_ids.keys())
    num_retired_corr_ids = len(retired_corr_ids.keys())

    missing_retired_corr_ids = [
        itr for itr in api_corr_ids.keys() if itr not in retired_corr_ids.keys()
    ]
    # log in case of failure
    sys.stderr.flush()
    for itr in missing_retired_corr_ids:
        name = api_name_info[itr]
        info = api_corr_ids[itr]
        sys.stderr.write(f"- unretired corr id: {itr} :: {name} :: {info}\n")
    sys.stderr.flush()

    assert (
        num_api_corr_ids == num_retired_corr_ids
    ), f"correlation ids not retired:\n\t{missing_retired_corr_ids}"

    #
    # verify the retirement timestamp is >= the end timestamp of the records
    #
    for cid, itr in api_corr_ids.items():
        assert cid in retired_corr_ids.keys()
        retired_ts = retired_corr_ids[cid]["timestamp"]
        end_ts = itr["end_timestamp"]
        name = api_name_info[cid]
        assert (
            retired_ts - end_ts
        ) >= 0, f"\n\tcorr: {cid}\n\tname: {name}\n\tdata: {itr}"

    # allow the retired timestamp to be 10 usec earlier than async end timestamp
    # since the async timestamps undergo conversion from the GPU clock domain to
    # the CPU clock domain. 10 microseconds was arbitrarily chosen to be an
    # acceptable amount of inaccuracy -- in an ideal world, retired_ts should
    # always be >= end_ts
    usec = 1000
    supported_fuzzing = 10 * usec

    for cid, itr in async_corr_ids.items():
        assert cid in retired_corr_ids.keys()
        retired_ts = retired_corr_ids[cid]["timestamp"]
        end_ts = itr["end_timestamp"]
        name = api_name_info[cid]
        assert (
            retired_ts - end_ts
        ) >= -supported_fuzzing, f"\n\tcorr: {cid}\n\tname: {name}\n\tdata: {itr}"


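# The tolerance rule used in test_retired_correlation_ids can be sketched in
# isolation as below. This is only an illustration: retired_after_end and
# NSEC_PER_USEC are hypothetical names, and the sketch assumes timestamps are
# expressed in nanoseconds, which is what the usec = 1000 factor suggests.

```python
NSEC_PER_USEC = 1000  # assumption: timestamps are in nanoseconds


def retired_after_end(retired_ts, end_ts, tolerance_usec=0):
    """Return True if retirement is no earlier than end_ts minus the tolerance."""
    return (retired_ts - end_ts) >= -(tolerance_usec * NSEC_PER_USEC)


# API records: no tolerance, retirement must not precede the end timestamp
api_ok = retired_after_end(1_000_000, 1_000_000)
api_early = retired_after_end(999_999, 1_000_000)

# async records: up to 10 usec of clock-domain conversion error is accepted
async_within = retired_after_end(995_000, 1_000_000, tolerance_usec=10)
async_outside = retired_after_end(980_000, 1_000_000, tolerance_usec=10)
```
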
if __name__ == "__main__":
    exit_code = pytest.main(["-x", __file__] + sys.argv[1:])
    sys.exit(exit_code)