1517a398bf

* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements

Buffer Pool Design
------------------
Replace the fixed array-based double buffer with a dynamic pool design to fix race conditions that caused "internal correlation id was retired prematurely" errors.

The original design had a race where flush callbacks could be delivered out-of-order: when buffer 0 fills and begins flushing, writes go to buffer 1. If buffer 1 fills before buffer 0's flush completes, the buffer index wraps back to 0 (which may still be flushing). Independent flush tasks submitted to the thread pool can complete out of order.

The new pool design:
- Uses a std::deque of buffer instances that grows as needed
- Allocates buffers from the pool when the current buffer needs to flush
- Serializes flushes with a mutex to ensure FIFO callback ordering
- Returns buffers to the pool after flush completion
- Eliminates the race between buffer selection and write operations

New Unit Tests
--------------
- buffer_correlation_ordering.cpp: Tests that API records are always delivered before their corresponding retirement records
- buffer_ordering_stress.cpp: Stress tests buffer flush ordering under high contention with multiple threads rapidly filling buffers

HSA Tool Hooks
--------------
Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that waits for pending flush tasks before tool finalization, preventing "retired prematurely" errors during HSA shutdown.

Sanitizer Improvements
----------------------
- LSAN: Set fast_unwind_on_malloc=1 to prevent deadlock in libgcc unwinder
- LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup)
- TSAN: Added suppression for false positive on C++11 thread-safe static initialization in create_write_functor
- ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto
- Disabled attachment tests for sanitizers due to library preloading issues

Other Fixes
-----------
- Thread-trace agent test: Use heap-allocated callback state
- Correlation ID: Refactored reference counting and finalization ordering

* [rocprofiler-sdk] Revert buffer pool design changes

Revert buffer.cpp and buffer.hpp to the original double-buffer design from develop branch. The pool-based redesign introduced concerns about:
- Signal safety (mutex vs atomic_flag)
- API changes (flush() return type)
- Complexity of the new design

This revert removes:
- Dynamic buffer pool with std::deque
- std::mutex/condition_variable synchronization
- buffer_correlation_ordering.cpp test
- buffer_ordering_stress.cpp test

The underlying buffer flush ordering issue will need to be addressed with a different approach that preserves the original API and synchronization characteristics.

* [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization

- Revert TOCTOU CAS loop change in sub_ref_count() - not needed with consistent checks
- Add fini_status check in correlation_tracing_service::construct() with ROCP_CI_LOG warning
- Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp, memory_allocation.cpp)
- Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior:
  - hsa/queue.cpp (lines 105, 210)
  - hsa/async_copy.cpp (line 344)
  - hsa/hsa_barrier.cpp (line 43)
  - buffer.cpp (lines 107, 138, 185)

This ensures no correlation IDs are created once finalization starts (fini_status != 0), preventing races between finalization and ongoing tracing operations.

* [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation

Buffer records are not guaranteed to arrive in any specific order. Tests and samples should use timestamps for temporal ordering validation instead.

Changes:
- samples/external_correlation_id_request: Replace 'retired prematurely' arrival order check with timestamp-based validation that retirement timestamp >= max(end_timestamps) for records with the same correlation ID
- tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check

Correlation IDs are not guaranteed to be monotonically increasing when records are sorted by timestamp. Temporal ordering should be validated using the timestamp fields in each record.

* [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal

Restore the SYSTEM keyword to target_include_directories for rocprofiler-sdk-fmt to match develop branch.

* [rccl] Remove orphaned rocSHMEM gitlink

Remove orphaned submodule reference that was introduced during a merge but never had a corresponding .gitmodules entry, causing CI failures with "fatal: no submodule mapping found in .gitmodules".

* [rocprofiler-sdk] Add HSA ABI version 0x09 support

Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09 which introduces hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release functions (added in rocr-runtime SWDEV-561708).

* [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations

This commit consolidates fixes for handling the finalization status during buffer flush operations across the SDK.

Changes:
- Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully when flushing buffers, as this indicates buffers were already flushed during finalization (not an error condition)
- HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use > 0 check for fini_status to allow operations during finalization process
- buffer.cpp: Revert fini_status checks to use > 0 for consistency
- correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging to prevent correlation ID creation after finalization starts

Files modified:
- source/lib/rocprofiler-sdk-tool/tool.cpp
- tests/tools/json-tool.cpp
- source/lib/rocprofiler-sdk/tests/registration.cpp
- source/lib/rocprofiler-sdk/tests/roctx.cpp
- samples/api_buffered_tracing/client.cpp
- samples/counter_collection/buffered_client.cpp
- samples/counter_collection/device_counting_async_client.cpp
- samples/external_correlation_id_request/client.cpp
- samples/pc_sampling/client.cpp
- source/lib/rocprofiler-sdk/buffer.cpp
- source/lib/rocprofiler-sdk/context/correlation_id.cpp
- source/lib/rocprofiler-sdk/hsa/queue.cpp
- source/lib/rocprofiler-sdk/hsa/async_copy.cpp
- source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp

* [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling

Remove the hsa_tool_hooks infrastructure and simplify buffer flush calls in samples and tools. The ERROR_FINALIZED handling was overly complex and the hsa_tool_hooks OnUnload synchronization is no longer needed.

Changes:
- Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code
- Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL
- Simplify buffer flush in tool.cpp and json-tool.cpp
- Remove ERROR_FINALIZED special handling from test files

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Fix output_stream move semantics to null source pointers

The default move constructor and move assignment operator for output_stream did not null out the source's pointers after the move. This caused double-close when the moved-from temporary was destroyed, leading to use-after-free crashes (SIGSEGV in std::ostream::sentry).

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration

- generatePerfetto.cpp: Move output_stream into shared_state to prevent use-after-free race conditions during Perfetto callback execution
- run-ci.py: Simplify and consolidate sanitizer environment variable configuration for better maintainability

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions

The previous changes removed MEMCHECK_SANITIZER_OPTIONS which is required for CTest to properly pass suppression files to the sanitizers during memcheck runs.

Co-Authored-By: Claude <noreply@anthropic.com>

* Revert "[rccl] Remove orphaned rocSHMEM gitlink"

This reverts commit 1ad21003941355658fff8114fa27768f11a948f7.

* [rocprofiler-sdk] Revert registration.cpp changes

Revert changes to registration.cpp to match develop branch.

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Remove suppression file content printing from run-ci.py

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix output_stream move ctor/assignment operator

* Fix erroneous revert of registration.cpp

* Fix handling of fini status in correlation ID construction

* [rocprofiler-sdk] Fix OMPT segfault during finalization

Add nullptr checks in OMPT tracing code to handle the case where correlation_tracing_service::construct() returns nullptr during finalization. This fixes segfaults in openmp-target-sample and tests.integration.execute.openmp-tools.

The correlation ID construction now returns nullptr when fini_status > 0, but the OMPT callbacks were not checking for this, causing crashes when dereferencing the null pointer during OpenMP runtime shutdown.

Changes:
- event_common(): Return nullptr early if correlation ID is null
- event(): Check for nullptr before calling sub_ref_count()
- ompt_task_create_callback(): Return early if correlation ID is null
- ompt_task_schedule_callback(): Return early if correlation ID is null

* [rocprofiler-sdk] Fix HSA API tracing segfault during finalization

Add nullptr check in hsa_api_impl::functor after correlation ID construction. During finalization, correlation_service::construct() returns nullptr, and without this check the code would dereference the null pointer when accessing corr_id->internal.

This fixes the SEGV at address 0x000000000008 (null + 8 byte offset) that occurs when HSA async event threads call hsa_signal_destroy during runtime shutdown after finalization has started.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
738 lines
24 KiB
Python
Executable file
#!/usr/bin/env python3

# MIT License
#
# Copyright (c) 2024-2025 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

import os
import re
import sys
import glob
import socket
import shutil
import argparse
import multiprocessing

# this constant is used to define CTEST_PROJECT_NAME
# and default value for CTEST_SUBMIT_URL
# _PROJECT_NAME = "rocprofiler-v2-internal"
# _BASE_URL = "10.194.116.31/cdash"
_PROJECT_NAME = "rocprofiler-sdk-alt"
_BASE_URL = "my.cdash.org"
_GCOVR_GENERATE_CMD = None

# these are various default values
_VISIBLE_PROJECT_NAME = _PROJECT_NAME.replace("-internal", "")
_DEFAULT_ROCM_PATH = os.path.realpath(os.environ.get("ROCM_PATH", "/opt/rocm"))
_DEFAULT_INSTALL_PREFIX = (
    os.path.realpath(_DEFAULT_ROCM_PATH)
    if os.path.exists(_DEFAULT_ROCM_PATH)
    else f"/opt/{_VISIBLE_PROJECT_NAME}"
)
_DEFAULT_GPU_TARGETS = os.environ.get(
    "GPU_TARGETS",
    "gfx900 gfx906 gfx908 gfx90a gfx942 gfx950 gfx1030 gfx1100 gfx1101 gfx1102",
).split()


def which(cmd, require):
    v = shutil.which(cmd)
    if require and v is None:
        raise RuntimeError(f"{cmd} not found")
    return v if v is not None else ""


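The lookup helper above can be exercised in isolation. The following is a minimal sketch that copies the helper and calls it with a made-up command name (`no-such-tool-xyz` is an assumption, chosen only because it should not exist on any `PATH`): optional tools come back as an empty string, required tools raise.

```python
import shutil


def which(cmd, require):
    """Return the resolved path of cmd, or "" when it is optional and missing."""
    v = shutil.which(cmd)
    if require and v is None:
        raise RuntimeError(f"{cmd} not found")
    return v if v is not None else ""


# hypothetical command name that is assumed to be absent from PATH
MISSING = "no-such-tool-xyz"

# optional lookup: a missing tool yields an empty string instead of None
optional_result = which(MISSING, require=False)

# required lookup: a missing tool raises RuntimeError
try:
    which(MISSING, require=True)
    required_error = None
except RuntimeError as err:
    required_error = str(err)

print(repr(optional_result))
print(required_error)
```

Returning `""` rather than `None` lets callers interpolate the result into a generated CMake file without a separate null check, which appears to be why `gcov` and `gcovr` are looked up with `require=False`.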
def generate_custom(args, cmake_args, ctest_args):
    if not os.path.exists(args.binary_dir):
        os.makedirs(args.binary_dir)

    if args.memcheck is not None:
        if args.coverage:
            raise ValueError(
                f"Enabling --memcheck={args.memcheck} and --coverage not supported"
            )
        cmake_args += [f"-DROCPROFILER_MEMCHECK={args.memcheck}"]

    NAME = args.name
    SITE = args.site
    BUILD_JOBS = args.build_jobs
    SUBMIT_URL = args.submit_url
    SOURCE_DIR = os.path.realpath(args.source_dir)
    BINARY_DIR = os.path.realpath(args.binary_dir)
    CMAKE_ARGS = " ".join(cmake_args)
    CTEST_ARGS = " ".join(['"{}"'.format(x.replace('"', '\\"')) for x in ctest_args])

    GIT_CMD = which("git", require=True)
    GCOV_CMD = which("gcov", require=False)
    GCOVR_CMD = which("gcovr", require=False)
    CMAKE_CMD = which("cmake", require=True)
    # CTEST_CMD = which("ctest", require=True)

    NAME = re.sub(r"(.*)-([0-9]+)/merge", "PR_\\2_\\1", NAME)

    def option_in_args(_key, _args):
        _union = [x for x in _args if f"{_key}=" in x]
        return len(_union) != 0

    DEFAULT_CMAKE_ARGS = []
    for key, value in dict(
        [
            ["CMAKE_BUILD_TYPE", "RelWithDebInfo"],
            ["CMAKE_INSTALL_PREFIX", f"{_DEFAULT_INSTALL_PREFIX}"],
            ["CPACK_PACKAGING_INSTALL_PREFIX", f"{_DEFAULT_INSTALL_PREFIX}"],
            ["CPACK_GENERATOR", "DEB;RPM;TGZ"],
            ["Python3_EXECUTABLE", sys.executable],
        ]
        + [[f"ROCPROFILER_BUILD_{x}", "ON"] for x in ["CI", "TESTS", "SAMPLES"]]
    ).items():
        if not option_in_args(key, cmake_args):
            DEFAULT_CMAKE_ARGS += [f"-D{key}={value}"]

    DEFAULT_CMAKE_ARGS = " ".join(DEFAULT_CMAKE_ARGS)

    GPU_TARGETS = ";".join(args.gpu_targets)
    MEMCHECK_TYPE = "" if args.memcheck is None else args.memcheck

    MEMCHECK_SANITIZER_OPTIONS = ""
    MEMCHECK_SUPPRESSION_FILE = ""

    if MEMCHECK_TYPE == "AddressSanitizer":
        # print_suppressions=1 shows which suppressions matched during the run
        MEMCHECK_SANITIZER_OPTIONS = (
            "detect_leaks=0 use_sigaltstack=0 print_suppressions=1"
        )
        MEMCHECK_SUPPRESSION_FILE = (
            f"{SOURCE_DIR}/source/scripts/address-sanitizer-suppr.txt"
        )
        os.environ["ASAN_OPTIONS"] = " ".join(
            [
                "detect_leaks=0",
                "use_sigaltstack=0",
                "print_suppressions=1",
                f"suppressions={SOURCE_DIR}/source/scripts/address-sanitizer-suppr.txt",
                os.environ.get("ASAN_OPTIONS", ""),
            ]
        )
    elif MEMCHECK_TYPE == "LeakSanitizer":
        # fast_unwind_on_malloc=1 avoids deadlock in libgcc unwinder during early init
        # print_suppressions=1 shows which suppressions matched during the run
        MEMCHECK_SANITIZER_OPTIONS = "fast_unwind_on_malloc=1 print_suppressions=1"
        MEMCHECK_SUPPRESSION_FILE = (
            f"{SOURCE_DIR}/source/scripts/leak-sanitizer-suppr.txt"
        )
        os.environ["LSAN_OPTIONS"] = " ".join(
            [
                f"suppressions={SOURCE_DIR}/source/scripts/leak-sanitizer-suppr.txt",
                "fast_unwind_on_malloc=1",
                "print_suppressions=1",
                os.environ.get("LSAN_OPTIONS", ""),
            ]
        )
    elif MEMCHECK_TYPE == "ThreadSanitizer":
        # print_suppressions=1 shows which suppressions matched during the run
        external_symbolizer_path = ""
        for version in range(8, 20):
            _symbolizer = shutil.which(f"llvm-symbolizer-{version}")
            if _symbolizer:
                external_symbolizer_path = f"external_symbolizer_path={_symbolizer}"
        os.environ["TSAN_OPTIONS"] = " ".join(
            [
                "history_size=5",
                "detect_deadlocks=0",
                "print_suppressions=1",
                f"suppressions={SOURCE_DIR}/source/scripts/thread-sanitizer-suppr.txt",
                external_symbolizer_path,
                os.environ.get("TSAN_OPTIONS", ""),
            ]
        )
    elif MEMCHECK_TYPE == "UndefinedBehaviorSanitizer":
        MEMCHECK_SUPPRESSION_FILE = (
            f"{SOURCE_DIR}/source/scripts/undef-behavior-sanitizer-suppr.txt"
        )
        os.environ["UBSAN_OPTIONS"] = " ".join(
            [
                "print_stacktrace=1",
                f"suppressions={SOURCE_DIR}/source/scripts/undef-behavior-sanitizer-suppr.txt",
                os.environ.get("UBSAN_OPTIONS", ""),
            ]
        )

    # Print the selected sanitizer and its option variables for debugging
    if MEMCHECK_TYPE:
        print(f"\n{'=' * 60}")
        print(f"Sanitizer: {MEMCHECK_TYPE}")
        print(f"{'=' * 60}")

        # Print environment variables for sanitizers that use them
        for env_var in ["TSAN_OPTIONS", "UBSAN_OPTIONS", "ASAN_OPTIONS", "LSAN_OPTIONS"]:
            if env_var in os.environ:
                print(f"\n{env_var}:")
                print(f"  {os.environ[env_var]}")

        print(f"\n{'=' * 60}\n")

    codecov_exclude = [
        "/usr/.*",
        "/opt/.*",
        "external/.*",
        "samples/.*",
        "tests/.*",
        ".*/external/.*",
        ".*/samples/.*",
        ".*/tests/.*",
        ".*/details/.*",
        ".*/counters/parser/.*",
    ]
    if args.coverage == "samples":
        codecov_exclude += [
            ".*/lib/common/.*",
            ".*/lib/output/.*",
            ".*/lib/att-tool/.*",
            ".*/lib/rocprofiler-sdk-tool/.*",
        ]

    COVERAGE_EXCLUDE = ";".join(codecov_exclude)

    if args.coverage and GCOVR_CMD:
        global _GCOVR_GENERATE_CMD

        codecov_dir = os.path.join(args.source_dir, ".codecov")
        codecov_xml = os.path.join(codecov_dir, f"{args.coverage}.xml")
        codecov_html = os.path.join(codecov_dir, f"{args.coverage}.html")

        if not os.path.exists(codecov_dir):
            os.makedirs(codecov_dir)

        with open(os.path.join(codecov_dir, ".gitignore"), "w") as f:
            f.write("/*\n")

        gcovr_codecov_exclude = []
        for itr in codecov_exclude:
            gcovr_codecov_exclude += ["--exclude", f"{itr}"]

        _GCOVR_GENERATE_CMD = (
            [GCOVR_CMD]
            + [
                "--root",
                f"{args.source_dir}",
                "--exclude-unreachable-branches",
                "--exclude-throw-branches",
                "--gcov-ignore-parse-errors",
                "--gcov-executable",
                GCOV_CMD,
                "-s",
                "-p",
                "--xml",
                codecov_xml,
                "--html-details",
                codecov_html,
            ]
            + gcovr_codecov_exclude
            + [args.source_dir]
        )

    return f"""
        set(CTEST_PROJECT_NAME "{_PROJECT_NAME}")
        set(CTEST_NIGHTLY_START_TIME "05:00:00 UTC")

        set(CTEST_DROP_METHOD "https")
        set(CTEST_DROP_SITE_CDASH TRUE)
        set(CTEST_SUBMIT_URL "https://{SUBMIT_URL}")

        set(CTEST_UPDATE_TYPE git)
        set(CTEST_UPDATE_VERSION_ONLY TRUE)
        set(CTEST_GIT_COMMAND {GIT_CMD})
        set(CTEST_GIT_INIT_SUBMODULES FALSE)

        set(CTEST_OUTPUT_ON_FAILURE TRUE)
        set(CTEST_USE_LAUNCHERS TRUE)
        set(CMAKE_CTEST_ARGUMENTS "--output-on-failure" {CTEST_ARGS})

        set(CTEST_CUSTOM_MAXIMUM_NUMBER_OF_ERRORS "100")
        set(CTEST_CUSTOM_MAXIMUM_NUMBER_OF_WARNINGS "100")
        set(CTEST_CUSTOM_MAXIMUM_PASSED_TEST_OUTPUT_SIZE "51200")
        set(CTEST_CUSTOM_COVERAGE_EXCLUDE "{COVERAGE_EXCLUDE}")

        set(CTEST_MEMORYCHECK_TYPE "{MEMCHECK_TYPE}")
        set(CTEST_MEMORYCHECK_SUPPRESSIONS_FILE "{MEMCHECK_SUPPRESSION_FILE}")
        set(CTEST_MEMORYCHECK_SANITIZER_OPTIONS "{MEMCHECK_SANITIZER_OPTIONS}")

        set(CTEST_SITE "{SITE}")
        set(CTEST_BUILD_NAME "{NAME}")

        set(CTEST_SOURCE_DIRECTORY {SOURCE_DIR})
        set(CTEST_BINARY_DIRECTORY {BINARY_DIR})

        set(CTEST_CONFIGURE_COMMAND "{CMAKE_CMD} -B {BINARY_DIR} {SOURCE_DIR} {DEFAULT_CMAKE_ARGS} -DGPU_TARGETS={GPU_TARGETS} {CMAKE_ARGS}")
        set(CTEST_BUILD_COMMAND "{CMAKE_CMD} --build {BINARY_DIR} --target all --parallel {BUILD_JOBS}")
        set(CTEST_COVERAGE_COMMAND {GCOV_CMD})
        """


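Two small pieces of `generate_custom` are easy to misread: the build-name rewrite for pull-request refs and the rule that a default `-D` option is dropped when the caller already supplied that key. The sketch below isolates both. The function names `pr_build_name` and `default_cmake_args` and the sample inputs are hypothetical; only the regular expression and the substring test are taken from the script.

```python
import re


def pr_build_name(name):
    """Rewrite '<label>-<number>/merge' as 'PR_<number>_<label>'."""
    return re.sub(r"(.*)-([0-9]+)/merge", "PR_\\2_\\1", name)


def option_in_args(key, args):
    """True when any argument already contains '<key>=' (substring match)."""
    return any(f"{key}=" in arg for arg in args)


def default_cmake_args(defaults, user_args):
    """Emit -D<key>=<value> only for keys the caller has not set."""
    return [
        f"-D{key}={value}"
        for key, value in defaults.items()
        if not option_in_args(key, user_args)
    ]


# hypothetical inputs, chosen only for illustration
renamed = pr_build_name("ubuntu-22.04-123/merge")
untouched = pr_build_name("nightly-build")
merged = default_cmake_args(
    {"CMAKE_BUILD_TYPE": "RelWithDebInfo", "ROCPROFILER_BUILD_TESTS": "ON"},
    ["-DCMAKE_BUILD_TYPE=Debug"],
)

print(renamed)    # PR_123_ubuntu-22.04
print(untouched)  # nightly-build
print(merged)     # ['-DROCPROFILER_BUILD_TESTS=ON']
```

Because the check is a plain substring match, a user argument whose name merely ends with the same key (for example a hypothetical `-DMY_CMAKE_BUILD_TYPE=...`) would also suppress the default.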
def generate_dashboard_script(args):
    CODECOV = 1 if args.coverage else 0
    DASHBOARD_MODE = args.mode
    SOURCE_DIR = os.path.realpath(args.source_dir)
    BINARY_DIR = os.path.realpath(args.binary_dir)
    MEMCHECK = 1 if args.memcheck is not None else 0
    SUBMIT = 0 if args.disable_cdash else 1
    STRICT_SUBMIT = 1 if args.require_cdash_submission else 0
    ARGN = "${ARGN}"
    SUBMIT_ERR = "${_cdash_submit_err}"
    REPO_SOURCE_DIR = (
        os.path.dirname(os.path.dirname((SOURCE_DIR)))
        if not os.path.exists(os.path.join(SOURCE_DIR, ".git"))
        else SOURCE_DIR
    )

    if args.memcheck == "ThreadSanitizer":
        MEMCHECK = 0

    _script = f"""
        cmake_minimum_required(VERSION 3.21 FATAL_ERROR)

        macro(dashboard_submit)
            if("{SUBMIT}" GREATER 0)
                ctest_submit({ARGN}
                             RETRY_COUNT 0
                             RETRY_DELAY 10
                             CAPTURE_CMAKE_ERROR _cdash_submit_err)

                if(NOT _cdash_submit_err EQUAL 0)
                    if("{STRICT_SUBMIT}" GREATER 0)
                        message(FATAL_ERROR "CDash submission failed: {SUBMIT_ERR}")
                    else()
                        message(AUTHOR_WARNING "CDash submission failure ignored due to absence of --require-cdash-submission")
                    endif()
                endif()
            endif()
        endmacro()
        """

    _script += """

        include("${CMAKE_CURRENT_LIST_DIR}/CTestCustom.cmake")

        macro(handle_error _message _ret)
            if(NOT ${${_ret}} EQUAL 0)
                dashboard_submit(PARTS Done RETURN_VALUE _submit_ret)
                message(FATAL_ERROR "${_message} failed: ${${_ret}}")
            endif()
        endmacro()
        """

    STAGES = ";".join([itr.upper() for itr in args.stages])

    _script += f"""
        set(STAGES "{STAGES}")
        ctest_start({DASHBOARD_MODE})
        ctest_update(SOURCE "{REPO_SOURCE_DIR}" RETURN_VALUE _update_ret
                     CAPTURE_CMAKE_ERROR _update_err)
        ctest_configure(BUILD "{BINARY_DIR}" RETURN_VALUE _configure_ret)
        dashboard_submit(PARTS Start Update Configure RETURN_VALUE _submit_ret)

        if(NOT _update_err EQUAL 0)
            message(WARNING "ctest_update failed")
        endif()

        handle_error("Configure" _configure_ret)

        if("BUILD" IN_LIST STAGES)
            ctest_build(BUILD "{BINARY_DIR}" RETURN_VALUE _build_ret)
            dashboard_submit(PARTS Build RETURN_VALUE _submit_ret)

            handle_error("Build" _build_ret)
        endif()

        if("TEST" IN_LIST STAGES)
            if("{MEMCHECK}" GREATER 0)
                ctest_memcheck(BUILD "{BINARY_DIR}" RETURN_VALUE _test_ret)
                dashboard_submit(PARTS Test RETURN_VALUE _submit_ret)
            else()
                ctest_test(BUILD "{BINARY_DIR}" RETURN_VALUE _test_ret)
                dashboard_submit(PARTS Test RETURN_VALUE _submit_ret)
            endif()
        endif()

        if("{CODECOV}" GREATER 0 AND "COVERAGE" IN_LIST STAGES)
            ctest_coverage(
                BUILD "{BINARY_DIR}"
                RETURN_VALUE _coverage_ret
                CAPTURE_CMAKE_ERROR _coverage_err)
            dashboard_submit(PARTS Coverage RETURN_VALUE _submit_ret)
        endif()

        handle_error("Testing" _test_ret)

        dashboard_submit(PARTS Done RETURN_VALUE _submit_ret)
        """
    return _script


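The dashboard generator mixes Python f-string substitution with CMake's own `${...}` variable syntax. It keeps the two apart by storing the literal CMake reference in a Python variable (`ARGN = "${ARGN}"`) and interpolating that. The following is a minimal sketch of the same trick; `render_submit_macro` is a hypothetical, cut-down stand-in for the real template, not the script's function.

```python
from textwrap import dedent


def render_submit_macro(submit_enabled):
    """Render a tiny CMake macro while passing ${ARGN} through untouched."""
    # The Python variable holds the literal text CMake should see.
    ARGN = "${ARGN}"
    SUBMIT = 1 if submit_enabled else 0
    return dedent(
        f"""
        macro(dashboard_submit)
            if("{SUBMIT}" GREATER 0)
                ctest_submit({ARGN})
            endif()
        endmacro()
        """
    )


enabled_script = render_submit_macro(True)
disabled_script = render_submit_macro(False)
print(enabled_script)
```

An alternative would be to write `${{ARGN}}` directly inside the f-string; the variable-based form used by the script keeps the template readable when many CMake references appear. Templates that contain no Python substitutions at all, such as the `handle_error` macro, are plain (non-f) strings so their `${...}` references need no escaping.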
def parse_cdash_args(args):
    BUILD_JOBS = multiprocessing.cpu_count()
    DASHBOARD_MODE = "Continuous"
    DASHBOARD_STAGES = [
        "Start",
        "Update",
        "Configure",
        "Build",
        "Test",
        "MemCheck",
        "Coverage",
        "Submit",
    ]
    SOURCE_DIR = os.getcwd()
    BINARY_DIR = os.path.join(SOURCE_DIR, "build")
    SITE = socket.gethostname()
    SUBMIT_URL = f"{_BASE_URL}/submit.php?project={_PROJECT_NAME}"

    parser = argparse.ArgumentParser()

    parser.add_argument(
        "-n", "--name", help="Job name", default=None, type=str, required=True
    )
    parser.add_argument("-s", "--site", help="Site name", default=SITE, type=str)
    parser.add_argument(
        "-q", "--quiet", help="Disable printing logs", action="store_true"
    )
    parser.add_argument(
        "-c",
        "--coverage",
        help="Enable code coverage",
        choices=("all", "tests", "samples"),
        type=str,
        default=None,
    )
    parser.add_argument(
        "-j",
        "--build-jobs",
        help="Number of build tasks",
        default=BUILD_JOBS,
        type=int,
    )
    parser.add_argument(
        "-B",
        "--binary-dir",
        help="Build directory",
        default=BINARY_DIR,
        type=str,
    )
    parser.add_argument(
        "-S",
        "--source-dir",
        help="Source directory",
        default=SOURCE_DIR,
        type=str,
    )
    parser.add_argument(
        "-F",
        "--clean",
        help="Remove existing build directory",
        action="store_true",
    )
    parser.add_argument(
        "-M",
        "--mode",
        help="Dashboard mode",
        default=DASHBOARD_MODE,
        choices=("Continuous", "Nightly", "Experimental"),
        type=str,
    )
    parser.add_argument(
        "-T",
        "--stages",
        help="Dashboard stages",
        nargs="+",
        default=DASHBOARD_STAGES,
        choices=DASHBOARD_STAGES,
        type=str,
    )
    parser.add_argument(
        "--submit-url",
        help="CDash submission site",
        default=SUBMIT_URL,
        type=str,
    )
    parser.add_argument(
        "--repeat-until-pass",
        help="<N> for --repeat until-pass:<N>",
        default=None,
        type=int,
    )
    parser.add_argument(
        "--repeat-until-fail",
        help="<N> for --repeat until-fail:<N>",
        default=None,
        type=int,
    )
    parser.add_argument(
        "--repeat-after-timeout",
        help="<N> for --repeat after-timeout:<N>",
        default=None,
        type=int,
    )
    parser.add_argument(
        "--disable-cdash",
        help="Disable submitting results to CDash dashboard",
        action="store_true",
    )
    parser.add_argument(
        "--require-cdash-submission",
        help="Failure to submit results to CDash dashboard causes CTest failure",
        action="store_true",
    )
    parser.add_argument(
        "--gpu-targets",
        help="GPU build architectures",
        default=_DEFAULT_GPU_TARGETS,
        type=str,
        nargs="+",
    )
    parser.add_argument(
        "--memcheck",
        help="Run dynamic analysis tool",
        default=None,
        type=str,
        choices=(
            "ThreadSanitizer",
            "AddressSanitizer",
            "LeakSanitizer",
            "MemorySanitizer",
            "UndefinedBehaviorSanitizer",
        ),
    )
    parser.add_argument(
        "--linter",
        help="Enable linting tool",
        default=None,
        type=str,
        choices=("clang-tidy",),
    )
    parser.add_argument(
        "--run-attempt",
        help="If > 1, will enable verbose logging of tests",
        default=1,
        type=int,
    )

    return parser.parse_args(args)


def parse_args(args=None):
    if args is None:
        args = sys.argv[1:]

    index = 0
    input_args = []
    ctest_args = []
    cmake_args = []
    data = [input_args, cmake_args, ctest_args]
    cmd = os.path.basename(sys.argv[0])

    for itr in args:
        if itr == "--":
            index += 1
            if index > 2:
                raise RuntimeError(
                    f"Usage: {cmd} <options> -- <cmake-args> -- <ctest-args>"
                )
        else:
            data[index].append(itr)

    cdash_args = parse_cdash_args(input_args)

    if cdash_args.run_attempt > 1:
        os.environ["ROCPROFILER_LOG_LEVEL"] = "info"
        os.environ["ROCPROF_LOG_LEVEL"] = "info"

    if cdash_args.coverage:
        cmake_args += ["-DROCPROFILER_BUILD_CODECOV=ON"]
        if cdash_args.coverage == "samples":
            ctest_args += ["-L", "samples"]
        elif cdash_args.coverage == "tests":
            ctest_args += ["-L", "tests"]

    if cdash_args.linter == "clang-tidy":
        cmake_args += ["-DROCPROFILER_ENABLE_CLANG_TIDY=ON"]

    if (
        cdash_args.mode == "Nightly"
        and not cdash_args.require_cdash_submission
        and not cdash_args.disable_cdash
    ):
        sys.stderr.write(
            "Enabling --require-cdash-submission for Nightly mode. Use --disable-cdash to suppress\n"
        )
        sys.stderr.flush()
        cdash_args.require_cdash_submission = True

    def get_repeat_val(_param):
        _value = getattr(cdash_args, f"repeat_{_param}".replace("-", "_"))
        return [f"{_param}:{_value}"] if _value is not None and _value > 1 else []

    repeat_args = (
        get_repeat_val("until-pass")
        + get_repeat_val("until-fail")
        + get_repeat_val("after-timeout")
    )
    ctest_args += ["--repeat"] + repeat_args if len(repeat_args) > 0 else []

    return [cdash_args, cmake_args, ctest_args]


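The command line accepted by `parse_args` has three sections separated by a bare `--`: script options, then CMake arguments, then CTest arguments. The sketch below reproduces only that splitting step; `split_args` and the sample argument list are hypothetical and stand in for the loop at the top of `parse_args`.

```python
def split_args(argv):
    """Split argv into [script options, cmake args, ctest args] at each '--'."""
    groups = [[], [], []]
    index = 0
    for item in argv:
        if item == "--":
            index += 1
            # a third separator would have no group to write into
            if index > 2:
                raise RuntimeError(
                    "Usage: <options> -- <cmake-args> -- <ctest-args>"
                )
        else:
            groups[index].append(item)
    return groups


# hypothetical invocation, for illustration only
opts, cmake_args, ctest_args = split_args(
    ["--name", "demo", "--", "-DCMAKE_BUILD_TYPE=Debug", "--", "-L", "tests"]
)
print(opts)        # ['--name', 'demo']
print(cmake_args)  # ['-DCMAKE_BUILD_TYPE=Debug']
print(ctest_args)  # ['-L', 'tests']
```

Only the first group is handed to `argparse`; the other two are forwarded verbatim, which is why unknown CMake or CTest flags never trigger an argument-parsing error in the script itself.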
def run(*args, **kwargs):
    import subprocess

    return subprocess.run(*args, **kwargs)


if __name__ == "__main__":
    args, cmake_args, ctest_args = parse_args()

    if args.clean and os.path.exists(args.binary_dir):
        if args.source_dir == args.binary_dir:
            raise RuntimeError(
                f"cannot clean binary directory == source directory ({args.source_dir})"
            )

        shutil.rmtree(args.binary_dir)

    if not os.path.exists(args.binary_dir):
        os.makedirs(args.binary_dir)

    from textwrap import dedent

    _config = dedent(generate_custom(args, cmake_args, ctest_args))
    _script = dedent(generate_dashboard_script(args))

    if not args.quiet:
        sys.stderr.write(f"##### CTestCustom.cmake #####\n\n{_config}\n\n")
        sys.stderr.write(f"##### dashboard.cmake #####\n\n{_script}\n\n")

    with open(os.path.join(args.binary_dir, "CTestCustom.cmake"), "w") as f:
        f.write(f"{_config}\n")

    with open(os.path.join(args.binary_dir, "dashboard.cmake"), "w") as f:
        f.write(f"{_script}\n")

    CTEST_CMD = which("ctest", require=True)

    dashboard_args = ["-D"]
    for itr in args.stages:
        dashboard_args.append(f"{args.mode}{itr}")

    try:
        verbose_options = (
            "--progress",
            "-V",
            "-VV",
            "--debug",
            "--output-on-failure",
            "-Q",
            "--quiet",
        )
        if not args.quiet and len(ctest_args) == 0:
            ctest_args = ["--output-on-failure", "-V"]
        elif not args.quiet:
            opts_union = [x for x in ctest_args if x in verbose_options]
            if len(opts_union) == 0:
                ctest_args += ["--progress", "--output-on-failure", "-V"]

        # always fail if no tests exist
        ctest_args += ["--no-tests=error"]

        run_args = (
            [CTEST_CMD]
            + dashboard_args
            + [
                "-S",
                os.path.join(args.binary_dir, "dashboard.cmake"),
            ]
            + ctest_args
        )

        print("CTest command: {}".format(" ".join(run_args)))

        run(
            run_args,
            check=True,
        )
    finally:
        if "-VV" not in ctest_args and not args.quiet:
            tag = None
            tagfpath = os.path.join(args.binary_dir, "Testing/TAG")
            with open(tagfpath, "r") as f:
                tag = f.readline().strip()

            for file in glob.glob(
                os.path.join(args.binary_dir, "Testing", tag, "**"),
                recursive=True,
            ):
                if not os.path.isfile(file):
                    continue
                elif "CoverageLog-" in os.path.basename(file):
                    continue
                elif "Test.xml" in os.path.basename(file):
                    continue
                print(f"\n\n###### Reading {file}... ######\n\n")
                with open(file, "r") as inpf:
                    fdata = inpf.read()
                    print(fdata)
            # print out memory checker files
            for file in glob.glob(
                os.path.join(args.binary_dir, "Testing/Temporary/MemoryChecker.*"),
                recursive=True,
            ):
                if not os.path.isfile(file):
                    continue
                print(f"\n\n\n###### Reading {file}... ######\n\n\n")
                with open(file, "r") as inpf:
                    fdata = inpf.read()
                    print(fdata)

    if _GCOVR_GENERATE_CMD:
        print("\n\n\n###### Generating Cobertura XML... ######")
        print(
            "###### GCOVR command: '{}'... ######\n".format(" ".join(_GCOVR_GENERATE_CMD))
        )
        with open("/dev/null", "w") as devnull:
            run(_GCOVR_GENERATE_CMD, stderr=devnull)

        codecov_dir = os.path.join(args.source_dir, ".codecov")
        codecov_xml = os.path.join(codecov_dir, f"{args.coverage}.xml")
        codecov_md = os.path.join(codecov_dir, f"{args.coverage}.md")

        PYCOBERTURA_CMD = which("pycobertura", require=False)
        if PYCOBERTURA_CMD:
            run(
                [
                    PYCOBERTURA_CMD,
                    "show",
                    "--format",
                    "markdown",
                    "--output",
                    codecov_md,
                    codecov_xml,
                ]
            )