rocm-systems/projects/rocprofiler-sdk/source/scripts/run-ci.py
Benjamin Welton 1517a398bf [rocprofiler-sdk] Buffer finalization fixes and HSA ABI 0x09 support (#2318)
* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements

Buffer Pool Design
------------------
Replace the fixed array-based double buffer with a dynamic pool design to
fix race conditions that caused "internal correlation id was retired
prematurely" errors.

The original design had a race where flush callbacks could be delivered
out-of-order: when buffer 0 fills and begins flushing, writes go to
buffer 1. If buffer 1 fills before buffer 0's flush completes, the
buffer index wraps back to 0 (which may still be flushing). Independent
flush tasks submitted to the thread pool can complete out of order.

The new pool design:
- Uses a std::deque of buffer instances that grows as needed
- Allocates buffers from the pool when the current buffer needs to flush
- Serializes flushes with a mutex to ensure FIFO callback ordering
- Returns buffers to the pool after flush completion
- Eliminates the race between buffer selection and write operations
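
A minimal Python sketch of the pool idea listed above, for illustration only. It is not the SDK's C++ implementation, and the names BufferPool, emit, flush, and on_flush are hypothetical. Full buffers are queued in fill order and drained under a single flush lock, so callbacks are delivered FIFO even when several threads fill buffers at once.

```python
import threading
from collections import deque


class BufferPool:
    """Toy model of a pool-backed buffer with serialized, FIFO flushes."""

    def __init__(self, capacity, on_flush):
        self._capacity = capacity            # records per buffer before it is flushed
        self._on_flush = on_flush            # callback receiving one list of records
        self._current = []                   # buffer currently accepting writes
        self._free = deque()                 # drained buffers available for reuse
        self._pending = deque()              # full buffers, in the order they filled
        self._state_lock = threading.Lock()  # guards _current, _free, _pending
        self._flush_lock = threading.Lock()  # serializes callback delivery

    def emit(self, record):
        with self._state_lock:
            self._current.append(record)
            full = len(self._current) >= self._capacity
            if full:
                self._rotate()
        if full:
            self._drain()

    def flush(self):
        """Deliver whatever is buffered, including a partially filled buffer."""
        with self._state_lock:
            if self._current:
                self._rotate()
        self._drain()

    def _rotate(self):
        # Caller holds _state_lock: queue the full buffer, take a fresh one.
        self._pending.append(self._current)
        self._current = self._free.popleft() if self._free else []

    def _drain(self):
        # Only one thread delivers at a time, always oldest buffer first.
        with self._flush_lock:
            while True:
                with self._state_lock:
                    if not self._pending:
                        return
                    batch = self._pending.popleft()
                self._on_flush(list(batch))
                batch.clear()
                with self._state_lock:
                    self._free.append(batch)  # return the buffer to the pool
```

In this sketch the callback runs while the flush lock is held, so a callback that itself fills a buffer would deadlock. Blocking on a mutex in this way is the kind of signal-safety concern that a lock-based design raises.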

New Unit Tests
--------------
- buffer_correlation_ordering.cpp: Tests that API records are always
  delivered before their corresponding retirement records
- buffer_ordering_stress.cpp: Stress tests buffer flush ordering under
  high contention with multiple threads rapidly filling buffers

HSA Tool Hooks
--------------
Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that
waits for pending flush tasks before tool finalization, preventing
"retired prematurely" errors during HSA shutdown.

Sanitizer Improvements
----------------------
- LSAN: Set fast_unwind_on_malloc=1 to prevent deadlock in libgcc unwinder
- LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup)
- TSAN: Added suppression for false positive on C++11 thread-safe static
  initialization in create_write_functor
- ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto
- Disabled attachment tests for sanitizers due to library preloading issues

Other Fixes
-----------
- Thread-trace agent test: Use heap-allocated callback state
- Correlation ID: Refactored reference counting and finalization ordering

* [rocprofiler-sdk] Revert buffer pool design changes

Revert buffer.cpp and buffer.hpp to the original double-buffer
design from develop branch. The pool-based redesign introduced
concerns about:
- Signal safety (mutex vs atomic_flag)
- API changes (flush() return type)
- Complexity of the new design

This revert removes:
- Dynamic buffer pool with std::deque
- std::mutex/condition_variable synchronization
- buffer_correlation_ordering.cpp test
- buffer_ordering_stress.cpp test

The underlying buffer flush ordering issue will need to be
addressed with a different approach that preserves the original
API and synchronization characteristics.

* [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization

- Revert TOCTOU CAS loop change in sub_ref_count() - not needed with consistent checks
- Add fini_status check in correlation_tracing_service::construct() with ROCP_CI_LOG warning
- Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp, memory_allocation.cpp)
- Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior:
  - hsa/queue.cpp (lines 105, 210)
  - hsa/async_copy.cpp (line 344)
  - hsa/hsa_barrier.cpp (line 43)
  - buffer.cpp (lines 107, 138, 185)

This ensures no correlation IDs are created once finalization starts (fini_status != 0),
preventing races between finalization and ongoing tracing operations.

* [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation

Buffer records are not guaranteed to arrive in any specific order. Tests and
samples should use timestamps for temporal ordering validation instead.

Changes:
- samples/external_correlation_id_request: Replace 'retired prematurely' arrival
  order check with timestamp-based validation that retirement timestamp >=
  max(end_timestamps) for records with the same correlation ID
- tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check

Correlation IDs are not guaranteed to be monotonically increasing when records
are sorted by timestamp. Temporal ordering should be validated using the
timestamp fields in each record.
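
A small Python sketch of the timestamp-based check described above, assuming a simplified record layout. The (correlation_id, end_timestamp) pairs and the find_premature_retirements helper are hypothetical and do not reflect the SDK's record structures. The check does not depend on arrival order: it only compares each retirement timestamp with the largest end timestamp seen for the same correlation ID.

```python
def find_premature_retirements(records, retirements):
    """Return correlation IDs retired before their latest record ended.

    records: iterable of (correlation_id, end_timestamp) pairs, in any order.
    retirements: mapping of correlation_id -> retirement timestamp.
    """
    latest_end = {}
    for corr_id, end_ts in records:
        latest_end[corr_id] = max(latest_end.get(corr_id, end_ts), end_ts)
    # A retirement is valid when retired_ts >= max(end timestamps) for that ID.
    return sorted(
        corr_id
        for corr_id, retired_ts in retirements.items()
        if retired_ts < latest_end.get(corr_id, retired_ts)
    )
```

Passing records in a shuffled order gives the same result, which is the property the arrival-order checks could not provide.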

* [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal

Restore the SYSTEM keyword to target_include_directories for
rocprofiler-sdk-fmt to match develop branch.

* [rccl] Remove orphaned rocSHMEM gitlink

Remove orphaned submodule reference that was introduced during a merge
but never had a corresponding .gitmodules entry, causing CI failures
with "fatal: no submodule mapping found in .gitmodules".

* [rocprofiler-sdk] Add HSA ABI version 0x09 support

Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09 which
introduces hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release
functions (added in rocr-runtime SWDEV-561708).

* [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations

This commit consolidates fixes for handling the finalization status during
buffer flush operations across the SDK.

Changes:
- Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully
  when flushing buffers, as this indicates buffers were already flushed
  during finalization (not an error condition)
- HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use > 0 check
  for fini_status to allow operations during finalization process
- buffer.cpp: Revert fini_status checks to use > 0 for consistency
- correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging
  to prevent correlation ID creation after finalization starts

Files modified:
- source/lib/rocprofiler-sdk-tool/tool.cpp
- tests/tools/json-tool.cpp
- source/lib/rocprofiler-sdk/tests/registration.cpp
- source/lib/rocprofiler-sdk/tests/roctx.cpp
- samples/api_buffered_tracing/client.cpp
- samples/counter_collection/buffered_client.cpp
- samples/counter_collection/device_counting_async_client.cpp
- samples/external_correlation_id_request/client.cpp
- samples/pc_sampling/client.cpp
- source/lib/rocprofiler-sdk/buffer.cpp
- source/lib/rocprofiler-sdk/context/correlation_id.cpp
- source/lib/rocprofiler-sdk/hsa/queue.cpp
- source/lib/rocprofiler-sdk/hsa/async_copy.cpp
- source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp

* [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling

Remove the hsa_tool_hooks infrastructure and simplify buffer flush calls
in samples and tools. The ERROR_FINALIZED handling was overly complex
and the hsa_tool_hooks OnUnload synchronization is no longer needed.

Changes:
- Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code
- Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL
- Simplify buffer flush in tool.cpp and json-tool.cpp
- Remove ERROR_FINALIZED special handling from test files

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Fix output_stream move semantics to null source pointers

The default move constructor and move assignment operator for
output_stream did not null out the source's pointers after the move.
This caused double-close when the moved-from temporary was destroyed,
leading to use-after-free crashes (SIGSEGV in std::ostream::sentry).

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration

- generatePerfetto.cpp: Move output_stream into shared_state to prevent
  use-after-free race conditions during Perfetto callback execution
- run-ci.py: Simplify and consolidate sanitizer environment variable
  configuration for better maintainability

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions

The previous changes removed MEMCHECK_SANITIZER_OPTIONS which is required
for CTest to properly pass suppression files to the sanitizers during
memcheck runs.
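
A hedged Python sketch of the two channels involved, modeled on how run-ci.py composes its AddressSanitizer settings. The helper name build_sanitizer_options and the example path are hypothetical. One string feeds CTEST_MEMORYCHECK_SANITIZER_OPTIONS, while the other is exported through an environment variable such as ASAN_OPTIONS and carries the suppression file plus any options inherited from the caller.

```python
def build_sanitizer_options(defaults, suppressions, inherited=""):
    """Return (ctest_options, env_options) for one sanitizer.

    ctest_options: value intended for CTEST_MEMORYCHECK_SANITIZER_OPTIONS.
    env_options: value intended for a variable such as ASAN_OPTIONS.
    """
    ctest_options = " ".join(defaults)
    env_parts = list(defaults) + [f"suppressions={suppressions}"]
    if inherited:
        # Keep options the caller already exported in the environment.
        env_parts.append(inherited)
    return ctest_options, " ".join(env_parts)


ctest_opts, env_opts = build_sanitizer_options(
    ["detect_leaks=0", "use_sigaltstack=0", "print_suppressions=1"],
    "/path/to/address-sanitizer-suppr.txt",  # hypothetical location
)
```

Dropping the first string while keeping only the environment variable is the kind of simplification that this revert undoes.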

Co-Authored-By: Claude <noreply@anthropic.com>

* Revert "[rccl] Remove orphaned rocSHMEM gitlink"

This reverts commit 1ad21003941355658fff8114fa27768f11a948f7.

* [rocprofiler-sdk] Revert registration.cpp changes

Revert changes to registration.cpp to match develop branch.

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Remove suppression file content printing from run-ci.py

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix output_stream move ctor/assignment operator

* Fix erroneous revert of registration.cpp

* Fix handling of fini status in correlation ID construction

* [rocprofiler-sdk] Fix OMPT segfault during finalization

Add nullptr checks in OMPT tracing code to handle the case where
correlation_tracing_service::construct() returns nullptr during
finalization. This fixes segfaults in openmp-target-sample and
tests.integration.execute.openmp-tools.

The correlation ID construction now returns nullptr when fini_status > 0,
but the OMPT callbacks were not checking for this, causing crashes when
dereferencing the null pointer during OpenMP runtime shutdown.

Changes:
- event_common(): Return nullptr early if correlation ID is null
- event(): Check for nullptr before calling sub_ref_count()
- ompt_task_create_callback(): Return early if correlation ID is null
- ompt_task_schedule_callback(): Return early if correlation ID is null

* [rocprofiler-sdk] Fix HSA API tracing segfault during finalization

Add nullptr check in hsa_api_impl::functor after correlation ID
construction. During finalization, correlation_service::construct()
returns nullptr, and without this check the code would dereference
the null pointer when accessing corr_id->internal.

This fixes the SEGV at address 0x000000000008 (null + 8 byte offset)
that occurs when HSA async event threads call hsa_signal_destroy
during runtime shutdown after finalization has started.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
2026-01-27 13:27:54 -05:00

738 lines
24 KiB
Python
Executable file

#!/usr/bin/env python3
# MIT License
#
# Copyright (c) 2024-2025 Advanced Micro Devices, Inc. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.
import os
import re
import sys
import glob
import socket
import shutil
import argparse
import multiprocessing

# this constant is used to define CTEST_PROJECT_NAME
# and default value for CTEST_SUBMIT_URL
# _PROJECT_NAME = "rocprofiler-v2-internal"
# _BASE_URL = "10.194.116.31/cdash"
_PROJECT_NAME = "rocprofiler-sdk-alt"
_BASE_URL = "my.cdash.org"
_GCOVR_GENERATE_CMD = None

# these are various default values
_VISIBLE_PROJECT_NAME = _PROJECT_NAME.replace("-internal", "")
_DEFAULT_ROCM_PATH = os.path.realpath(os.environ.get("ROCM_PATH", "/opt/rocm"))
_DEFAULT_INSTALL_PREFIX = (
    os.path.realpath(_DEFAULT_ROCM_PATH)
    if os.path.exists(_DEFAULT_ROCM_PATH)
    else f"/opt/{_VISIBLE_PROJECT_NAME}"
)
_DEFAULT_GPU_TARGETS = os.environ.get(
    "GPU_TARGETS",
    "gfx900 gfx906 gfx908 gfx90a gfx942 gfx950 gfx1030 gfx1100 gfx1101 gfx1102",
).split()


def which(cmd, require):
    v = shutil.which(cmd)
    if require and v is None:
        raise RuntimeError(f"{cmd} not found")
    return v if v is not None else ""


def generate_custom(args, cmake_args, ctest_args):
    if not os.path.exists(args.binary_dir):
        os.makedirs(args.binary_dir)

    if args.memcheck is not None:
        if args.coverage:
            raise ValueError(
                f"Enabling --memcheck={args.memcheck} and --coverage not supported"
            )
        cmake_args += [f"-DROCPROFILER_MEMCHECK={args.memcheck}"]

    NAME = args.name
    SITE = args.site
    BUILD_JOBS = args.build_jobs
    SUBMIT_URL = args.submit_url
    SOURCE_DIR = os.path.realpath(args.source_dir)
    BINARY_DIR = os.path.realpath(args.binary_dir)
    CMAKE_ARGS = " ".join(cmake_args)
    CTEST_ARGS = " ".join(['"{}"'.format(x.replace('"', '\\"')) for x in ctest_args])

    GIT_CMD = which("git", require=True)
    GCOV_CMD = which("gcov", require=False)
    GCOVR_CMD = which("gcovr", require=False)
    CMAKE_CMD = which("cmake", require=True)
    # CTEST_CMD = which("ctest", require=True)

    NAME = re.sub(r"(.*)-([0-9]+)/merge", "PR_\\2_\\1", NAME)

    def option_in_args(_key, _args):
        _union = [x for x in _args if f"{_key}=" in x]
        return len(_union) != 0

    DEFAULT_CMAKE_ARGS = []
    for key, value in dict(
        [
            ["CMAKE_BUILD_TYPE", "RelWithDebInfo"],
            ["CMAKE_INSTALL_PREFIX", f"{_DEFAULT_INSTALL_PREFIX}"],
            ["CPACK_PACKAGING_INSTALL_PREFIX", f"{_DEFAULT_INSTALL_PREFIX}"],
            ["CPACK_GENERATOR", "DEB;RPM;TGZ"],
            ["Python3_EXECUTABLE", sys.executable],
        ]
        + [[f"ROCPROFILER_BUILD_{x}", "ON"] for x in ["CI", "TESTS", "SAMPLES"]]
    ).items():
        if not option_in_args(key, cmake_args):
            DEFAULT_CMAKE_ARGS += [f"-D{key}={value}"]

    DEFAULT_CMAKE_ARGS = " ".join(DEFAULT_CMAKE_ARGS)
    GPU_TARGETS = ";".join(args.gpu_targets)
    MEMCHECK_TYPE = "" if args.memcheck is None else args.memcheck
    MEMCHECK_SANITIZER_OPTIONS = ""
    MEMCHECK_SUPPRESSION_FILE = ""

    if MEMCHECK_TYPE == "AddressSanitizer":
        # print_suppressions=1 shows which suppressions matched during the run
        MEMCHECK_SANITIZER_OPTIONS = (
            "detect_leaks=0 use_sigaltstack=0 print_suppressions=1"
        )
        MEMCHECK_SUPPRESSION_FILE = (
            f"{SOURCE_DIR}/source/scripts/address-sanitizer-suppr.txt"
        )
        os.environ["ASAN_OPTIONS"] = " ".join(
            [
                "detect_leaks=0",
                "use_sigaltstack=0",
                "print_suppressions=1",
                f"suppressions={SOURCE_DIR}/source/scripts/address-sanitizer-suppr.txt",
                os.environ.get("ASAN_OPTIONS", ""),
            ]
        )
    elif MEMCHECK_TYPE == "LeakSanitizer":
        # fast_unwind_on_malloc=1 avoids deadlock in libgcc unwinder during early init
        # print_suppressions=1 shows which suppressions matched during the run
        MEMCHECK_SANITIZER_OPTIONS = "fast_unwind_on_malloc=1 print_suppressions=1"
        MEMCHECK_SUPPRESSION_FILE = (
            f"{SOURCE_DIR}/source/scripts/leak-sanitizer-suppr.txt"
        )
        os.environ["LSAN_OPTIONS"] = " ".join(
            [
                f"suppressions={SOURCE_DIR}/source/scripts/leak-sanitizer-suppr.txt",
                "fast_unwind_on_malloc=1",
                "print_suppressions=1",
                os.environ.get("LSAN_OPTIONS", ""),
            ]
        )
    elif MEMCHECK_TYPE == "ThreadSanitizer":
        # print_suppressions=1 shows which suppressions matched during the run
        external_symbolizer_path = ""
        # the last match wins, so the highest installed version is selected
        for version in range(8, 20):
            _symbolizer = shutil.which(f"llvm-symbolizer-{version}")
            if _symbolizer:
                external_symbolizer_path = f"external_symbolizer_path={_symbolizer}"
        os.environ["TSAN_OPTIONS"] = " ".join(
            [
                "history_size=5",
                "detect_deadlocks=0",
                "print_suppressions=1",
                f"suppressions={SOURCE_DIR}/source/scripts/thread-sanitizer-suppr.txt",
                external_symbolizer_path,
                os.environ.get("TSAN_OPTIONS", ""),
            ]
        )
    elif MEMCHECK_TYPE == "UndefinedBehaviorSanitizer":
        MEMCHECK_SUPPRESSION_FILE = (
            f"{SOURCE_DIR}/source/scripts/undef-behavior-sanitizer-suppr.txt"
        )
        os.environ["UBSAN_OPTIONS"] = " ".join(
            [
                "print_stacktrace=1",
                f"suppressions={SOURCE_DIR}/source/scripts/undef-behavior-sanitizer-suppr.txt",
                os.environ.get("UBSAN_OPTIONS", ""),
            ]
        )

    # Print the active sanitizer configuration for debugging
    if MEMCHECK_TYPE:
        print(f"\n{'=' * 60}")
        print(f"Sanitizer: {MEMCHECK_TYPE}")
        print(f"{'=' * 60}")
        # Print environment variables for sanitizers that use them
        for env_var in ["TSAN_OPTIONS", "UBSAN_OPTIONS", "ASAN_OPTIONS", "LSAN_OPTIONS"]:
            if env_var in os.environ:
                print(f"\n{env_var}:")
                print(f" {os.environ[env_var]}")
        print(f"\n{'=' * 60}\n")

    codecov_exclude = [
        "/usr/.*",
        "/opt/.*",
        "external/.*",
        "samples/.*",
        "tests/.*",
        ".*/external/.*",
        ".*/samples/.*",
        ".*/tests/.*",
        ".*/details/.*",
        ".*/counters/parser/.*",
    ]

    if args.coverage == "samples":
        codecov_exclude += [
            ".*/lib/common/.*",
            ".*/lib/output/.*",
            ".*/lib/att-tool/.*",
            ".*/lib/rocprofiler-sdk-tool/.*",
        ]

    COVERAGE_EXCLUDE = ";".join(codecov_exclude)

    if args.coverage and GCOVR_CMD:
        global _GCOVR_GENERATE_CMD

        codecov_dir = os.path.join(args.source_dir, ".codecov")
        codecov_xml = os.path.join(codecov_dir, f"{args.coverage}.xml")
        codecov_html = os.path.join(codecov_dir, f"{args.coverage}.html")

        if not os.path.exists(codecov_dir):
            os.makedirs(codecov_dir)

        with open(os.path.join(codecov_dir, ".gitignore"), "w") as f:
            f.write("/*\n")

        gcovr_codecov_exclude = []
        for itr in codecov_exclude:
            gcovr_codecov_exclude += ["--exclude", f"{itr}"]

        _GCOVR_GENERATE_CMD = (
            [GCOVR_CMD]
            + [
                "--root",
                f"{args.source_dir}",
                "--exclude-unreachable-branches",
                "--exclude-throw-branches",
                "--gcov-ignore-parse-errors",
                "--gcov-executable",
                GCOV_CMD,
                "-s",
                "-p",
                "--xml",
                codecov_xml,
                "--html-details",
                codecov_html,
            ]
            + gcovr_codecov_exclude
            + [args.source_dir]
        )

    return f"""
        set(CTEST_PROJECT_NAME "{_PROJECT_NAME}")
        set(CTEST_NIGHTLY_START_TIME "05:00:00 UTC")
        set(CTEST_DROP_METHOD "https")
        set(CTEST_DROP_SITE_CDASH TRUE)
        set(CTEST_SUBMIT_URL "https://{SUBMIT_URL}")
        set(CTEST_UPDATE_TYPE git)
        set(CTEST_UPDATE_VERSION_ONLY TRUE)
        set(CTEST_GIT_COMMAND {GIT_CMD})
        set(CTEST_GIT_INIT_SUBMODULES FALSE)
        set(CTEST_OUTPUT_ON_FAILURE TRUE)
        set(CTEST_USE_LAUNCHERS TRUE)
        set(CMAKE_CTEST_ARGUMENTS "--output-on-failure" {CTEST_ARGS})
        set(CTEST_CUSTOM_MAXIMUM_NUMBER_OF_ERRORS "100")
        set(CTEST_CUSTOM_MAXIMUM_NUMBER_OF_WARNINGS "100")
        set(CTEST_CUSTOM_MAXIMUM_PASSED_TEST_OUTPUT_SIZE "51200")
        set(CTEST_CUSTOM_COVERAGE_EXCLUDE "{COVERAGE_EXCLUDE}")
        set(CTEST_MEMORYCHECK_TYPE "{MEMCHECK_TYPE}")
        set(CTEST_MEMORYCHECK_SUPPRESSIONS_FILE "{MEMCHECK_SUPPRESSION_FILE}")
        set(CTEST_MEMORYCHECK_SANITIZER_OPTIONS "{MEMCHECK_SANITIZER_OPTIONS}")
        set(CTEST_SITE "{SITE}")
        set(CTEST_BUILD_NAME "{NAME}")
        set(CTEST_SOURCE_DIRECTORY {SOURCE_DIR})
        set(CTEST_BINARY_DIRECTORY {BINARY_DIR})
        set(CTEST_CONFIGURE_COMMAND "{CMAKE_CMD} -B {BINARY_DIR} {SOURCE_DIR} {DEFAULT_CMAKE_ARGS} -DGPU_TARGETS={GPU_TARGETS} {CMAKE_ARGS}")
        set(CTEST_BUILD_COMMAND "{CMAKE_CMD} --build {BINARY_DIR} --target all --parallel {BUILD_JOBS}")
        set(CTEST_COVERAGE_COMMAND {GCOV_CMD})
        """


def generate_dashboard_script(args):
    CODECOV = 1 if args.coverage else 0
    DASHBOARD_MODE = args.mode
    SOURCE_DIR = os.path.realpath(args.source_dir)
    BINARY_DIR = os.path.realpath(args.binary_dir)
    MEMCHECK = 1 if args.memcheck is not None else 0
    SUBMIT = 0 if args.disable_cdash else 1
    STRICT_SUBMIT = 1 if args.require_cdash_submission else 0
    ARGN = "${ARGN}"
    SUBMIT_ERR = "${_cdash_submit_err}"
    REPO_SOURCE_DIR = (
        os.path.dirname(os.path.dirname((SOURCE_DIR)))
        if not os.path.exists(os.path.join(SOURCE_DIR, ".git"))
        else SOURCE_DIR
    )

    if args.memcheck == "ThreadSanitizer":
        MEMCHECK = 0

    _script = f"""
        cmake_minimum_required(VERSION 3.21 FATAL_ERROR)

        macro(dashboard_submit)
            if("{SUBMIT}" GREATER 0)
                ctest_submit({ARGN}
                             RETRY_COUNT 0
                             RETRY_DELAY 10
                             CAPTURE_CMAKE_ERROR _cdash_submit_err)
                if(NOT _cdash_submit_err EQUAL 0)
                    if("{STRICT_SUBMIT}" GREATER 0)
                        message(FATAL_ERROR "CDash submission failed: {SUBMIT_ERR}")
                    else()
                        message(AUTHOR_WARNING "CDash submission failure ignored due to absence of --require-cdash-submission")
                    endif()
                endif()
            endif()
        endmacro()
        """

    _script += """
        include("${CMAKE_CURRENT_LIST_DIR}/CTestCustom.cmake")

        macro(handle_error _message _ret)
            if(NOT ${${_ret}} EQUAL 0)
                dashboard_submit(PARTS Done RETURN_VALUE _submit_ret)
                message(FATAL_ERROR "${_message} failed: ${${_ret}}")
            endif()
        endmacro()
        """

    STAGES = ";".join([itr.upper() for itr in args.stages])

    _script += f"""
        set(STAGES "{STAGES}")

        ctest_start({DASHBOARD_MODE})
        ctest_update(SOURCE "{REPO_SOURCE_DIR}" RETURN_VALUE _update_ret
                     CAPTURE_CMAKE_ERROR _update_err)
        ctest_configure(BUILD "{BINARY_DIR}" RETURN_VALUE _configure_ret)
        dashboard_submit(PARTS Start Update Configure RETURN_VALUE _submit_ret)

        if(NOT _update_err EQUAL 0)
            message(WARNING "ctest_update failed")
        endif()

        handle_error("Configure" _configure_ret)

        if("BUILD" IN_LIST STAGES)
            ctest_build(BUILD "{BINARY_DIR}" RETURN_VALUE _build_ret)
            dashboard_submit(PARTS Build RETURN_VALUE _submit_ret)
            handle_error("Build" _build_ret)
        endif()

        if("TEST" IN_LIST STAGES)
            if("{MEMCHECK}" GREATER 0)
                ctest_memcheck(BUILD "{BINARY_DIR}" RETURN_VALUE _test_ret)
                dashboard_submit(PARTS Test RETURN_VALUE _submit_ret)
            else()
                ctest_test(BUILD "{BINARY_DIR}" RETURN_VALUE _test_ret)
                dashboard_submit(PARTS Test RETURN_VALUE _submit_ret)
            endif()
        endif()

        if("{CODECOV}" GREATER 0 AND "COVERAGE" IN_LIST STAGES)
            ctest_coverage(
                BUILD "{BINARY_DIR}"
                RETURN_VALUE _coverage_ret
                CAPTURE_CMAKE_ERROR _coverage_err)
            dashboard_submit(PARTS Coverage RETURN_VALUE _submit_ret)
        endif()

        handle_error("Testing" _test_ret)

        dashboard_submit(PARTS Done RETURN_VALUE _submit_ret)
        """
    return _script


def parse_cdash_args(args):
    BUILD_JOBS = multiprocessing.cpu_count()
    DASHBOARD_MODE = "Continuous"
    DASHBOARD_STAGES = [
        "Start",
        "Update",
        "Configure",
        "Build",
        "Test",
        "MemCheck",
        "Coverage",
        "Submit",
    ]
    SOURCE_DIR = os.getcwd()
    BINARY_DIR = os.path.join(SOURCE_DIR, "build")
    SITE = socket.gethostname()
    SUBMIT_URL = f"{_BASE_URL}/submit.php?project={_PROJECT_NAME}"

    parser = argparse.ArgumentParser()

    parser.add_argument(
        "-n", "--name", help="Job name", default=None, type=str, required=True
    )
    parser.add_argument("-s", "--site", help="Site name", default=SITE, type=str)
    parser.add_argument(
        "-q", "--quiet", help="Disable printing logs", action="store_true"
    )
    parser.add_argument(
        "-c",
        "--coverage",
        help="Enable code coverage",
        choices=("all", "tests", "samples"),
        type=str,
        default=None,
    )
    parser.add_argument(
        "-j",
        "--build-jobs",
        help="Number of build tasks",
        default=BUILD_JOBS,
        type=int,
    )
    parser.add_argument(
        "-B",
        "--binary-dir",
        help="Build directory",
        default=BINARY_DIR,
        type=str,
    )
    parser.add_argument(
        "-S",
        "--source-dir",
        help="Source directory",
        default=SOURCE_DIR,
        type=str,
    )
    parser.add_argument(
        "-F",
        "--clean",
        help="Remove existing build directory",
        action="store_true",
    )
    parser.add_argument(
        "-M",
        "--mode",
        help="Dashboard mode",
        default=DASHBOARD_MODE,
        choices=("Continuous", "Nightly", "Experimental"),
        type=str,
    )
    parser.add_argument(
        "-T",
        "--stages",
        help="Dashboard stages",
        nargs="+",
        default=DASHBOARD_STAGES,
        choices=DASHBOARD_STAGES,
        type=str,
    )
    parser.add_argument(
        "--submit-url",
        help="CDash submission site",
        default=SUBMIT_URL,
        type=str,
    )
    parser.add_argument(
        "--repeat-until-pass",
        help="<N> for --repeat until-pass:<N>",
        default=None,
        type=int,
    )
    parser.add_argument(
        "--repeat-until-fail",
        help="<N> for --repeat until-fail:<N>",
        default=None,
        type=int,
    )
    parser.add_argument(
        "--repeat-after-timeout",
        help="<N> for --repeat after-timeout:<N>",
        default=None,
        type=int,
    )
    parser.add_argument(
        "--disable-cdash",
        help="Disable submitting results to CDash dashboard",
        action="store_true",
    )
    parser.add_argument(
        "--require-cdash-submission",
        help="Failure to submit results to CDash dashboard causes CTest failure",
        action="store_true",
    )
    parser.add_argument(
        "--gpu-targets",
        help="GPU build architectures",
        default=_DEFAULT_GPU_TARGETS,
        type=str,
        nargs="+",
    )
    parser.add_argument(
        "--memcheck",
        help="Run dynamic analysis tool",
        default=None,
        type=str,
        choices=(
            "ThreadSanitizer",
            "AddressSanitizer",
            "LeakSanitizer",
            "MemorySanitizer",
            "UndefinedBehaviorSanitizer",
        ),
    )
    parser.add_argument(
        "--linter",
        help="Enable linting tool",
        default=None,
        type=str,
        choices=("clang-tidy",),
    )
    parser.add_argument(
        "--run-attempt",
        help="If > 1, will enable verbose logging of tests",
        default=1,
        type=int,
    )

    return parser.parse_args(args)


def parse_args(args=None):
    if args is None:
        args = sys.argv[1:]

    index = 0
    input_args = []
    ctest_args = []
    cmake_args = []
    data = [input_args, cmake_args, ctest_args]
    cmd = os.path.basename(sys.argv[0])

    for itr in args:
        if itr == "--":
            index += 1
            if index > 2:
                raise RuntimeError(
                    f"Usage: {cmd} <options> -- <cmake-args> -- <ctest-args>"
                )
        else:
            data[index].append(itr)

    cdash_args = parse_cdash_args(input_args)

    if cdash_args.run_attempt > 1:
        os.environ["ROCPROFILER_LOG_LEVEL"] = "info"
        os.environ["ROCPROF_LOG_LEVEL"] = "info"

    if cdash_args.coverage:
        cmake_args += ["-DROCPROFILER_BUILD_CODECOV=ON"]
        if cdash_args.coverage == "samples":
            ctest_args += ["-L", "samples"]
        elif cdash_args.coverage == "tests":
            ctest_args += ["-L", "tests"]

    if cdash_args.linter == "clang-tidy":
        cmake_args += ["-DROCPROFILER_ENABLE_CLANG_TIDY=ON"]

    if (
        cdash_args.mode == "Nightly"
        and not cdash_args.require_cdash_submission
        and not cdash_args.disable_cdash
    ):
        sys.stderr.write(
            "Enabling --require-cdash-submission for Nightly mode. Use --disable-cdash to suppress\n"
        )
        sys.stderr.flush()
        cdash_args.require_cdash_submission = True

    def get_repeat_val(_param):
        _value = getattr(cdash_args, f"repeat_{_param}".replace("-", "_"))
        return [f"{_param}:{_value}"] if _value is not None and _value > 1 else []

    repeat_args = (
        get_repeat_val("until-pass")
        + get_repeat_val("until-fail")
        + get_repeat_val("after-timeout")
    )
    ctest_args += ["--repeat"] + repeat_args if len(repeat_args) > 0 else []

    return [cdash_args, cmake_args, ctest_args]


def run(*args, **kwargs):
    import subprocess

    return subprocess.run(*args, **kwargs)


if __name__ == "__main__":
    args, cmake_args, ctest_args = parse_args()

    if args.clean and os.path.exists(args.binary_dir):
        if args.source_dir == args.binary_dir:
            raise RuntimeError(
                f"cannot clean binary directory == source directory ({args.source_dir})"
            )
        shutil.rmtree(args.binary_dir)

    if not os.path.exists(args.binary_dir):
        os.makedirs(args.binary_dir)

    from textwrap import dedent

    _config = dedent(generate_custom(args, cmake_args, ctest_args))
    _script = dedent(generate_dashboard_script(args))

    if not args.quiet:
        sys.stderr.write(f"##### CTestCustom.cmake #####\n\n{_config}\n\n")
        sys.stderr.write(f"##### dashboard.cmake #####\n\n{_script}\n\n")

    with open(os.path.join(args.binary_dir, "CTestCustom.cmake"), "w") as f:
        f.write(f"{_config}\n")

    with open(os.path.join(args.binary_dir, "dashboard.cmake"), "w") as f:
        f.write(f"{_script}\n")

    CTEST_CMD = which("ctest", require=True)

    dashboard_args = ["-D"]
    for itr in args.stages:
        dashboard_args.append(f"{args.mode}{itr}")

    try:
        verbose_options = (
            "--progress",
            "-V",
            "-VV",
            "--debug",
            "--output-on-failure",
            "-Q",
            "--quiet",
        )
        if not args.quiet and len(ctest_args) == 0:
            ctest_args = ["--output-on-failure", "-V"]
        elif not args.quiet:
            opts_union = [x for x in ctest_args if x in verbose_options]
            if len(opts_union) == 0:
                ctest_args += ["--progress", "--output-on-failure", "-V"]

        # always fail if no tests exist
        ctest_args += ["--no-tests=error"]

        run_args = (
            [CTEST_CMD]
            + dashboard_args
            + [
                "-S",
                os.path.join(args.binary_dir, "dashboard.cmake"),
            ]
            + ctest_args
        )
        print("CTest command: {}".format(" ".join(run_args)))
        run(
            run_args,
            check=True,
        )
    finally:
        if "-VV" not in ctest_args and not args.quiet:
            tag = None
            tagfpath = os.path.join(args.binary_dir, "Testing/TAG")
            with open(tagfpath, "r") as f:
                tag = f.readline().strip()
            for file in glob.glob(
                os.path.join(args.binary_dir, "Testing", tag, "**"),
                recursive=True,
            ):
                if not os.path.isfile(file):
                    continue
                elif "CoverageLog-" in os.path.basename(file):
                    continue
                elif "Test.xml" in os.path.basename(file):
                    continue
                print(f"\n\n###### Reading {file}... ######\n\n")
                with open(file, "r") as inpf:
                    fdata = inpf.read()
                    print(fdata)

        # print out memory checker files
        for file in glob.glob(
            os.path.join(args.binary_dir, "Testing/Temporary/MemoryChecker.*"),
            recursive=True,
        ):
            if not os.path.isfile(file):
                continue
            print(f"\n\n\n###### Reading {file}... ######\n\n\n")
            with open(file, "r") as inpf:
                fdata = inpf.read()
                print(fdata)

    if _GCOVR_GENERATE_CMD:
        print("\n\n\n###### Generating Cobertura XML... ######")
        print(
            "###### GCOVR command: '{}'... ######\n".format(" ".join(_GCOVR_GENERATE_CMD))
        )
        with open("/dev/null", "w") as devnull:
            run(_GCOVR_GENERATE_CMD, stderr=devnull)

        codecov_dir = os.path.join(args.source_dir, ".codecov")
        codecov_xml = os.path.join(codecov_dir, f"{args.coverage}.xml")
        codecov_md = os.path.join(codecov_dir, f"{args.coverage}.md")

        PYCOBERTURA_CMD = which("pycobertura", require=False)
        if PYCOBERTURA_CMD:
            run(
                [
                    PYCOBERTURA_CMD,
                    "show",
                    "--format",
                    "markdown",
                    "--output",
                    codecov_md,
                    codecov_xml,
                ]
            )