1517a398bf
* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements

Buffer Pool Design
------------------
Replace the fixed array-based double buffer with a dynamic pool design to fix race conditions that caused "internal correlation id was retired prematurely" errors.

The original design had a race where flush callbacks could be delivered out of order: when buffer 0 fills and begins flushing, writes go to buffer 1. If buffer 1 fills before buffer 0's flush completes, the buffer index wraps back to 0 (which may still be flushing). Independent flush tasks submitted to the thread pool can complete out of order.

The new pool design:
- Uses a std::deque of buffer instances that grows as needed
- Allocates buffers from the pool when the current buffer needs to flush
- Serializes flushes with a mutex to ensure FIFO callback ordering
- Returns buffers to the pool after flush completion
- Eliminates the race between buffer selection and write operations

New Unit Tests
--------------
- buffer_correlation_ordering.cpp: Tests that API records are always delivered before their corresponding retirement records
- buffer_ordering_stress.cpp: Stress-tests buffer flush ordering under high contention, with multiple threads rapidly filling buffers

HSA Tool Hooks
--------------
Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that waits for pending flush tasks before tool finalization, preventing "retired prematurely" errors during HSA shutdown.
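
The FIFO-ordering idea in the pool design above can be illustrated with a minimal C++17 sketch. It is not the rocprofiler-sdk implementation (the pool design is reverted later in this same change), and `record_buffer`, `ordered_buffer_pool`, and `run_ordered_flush_demo` are hypothetical names. Because `std::mutex` makes no fairness guarantee about which waiting thread acquires it next, the sketch assigns each full buffer a sequence number (ticket) under the write lock and has each flusher wait on a condition variable until its ticket is next, so callbacks stay in FIFO order even when flushes start out of order.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical record buffer; not the rocprofiler-sdk buffer type.
struct record_buffer
{
    std::vector<int> records;
};

class ordered_buffer_pool
{
public:
    // The callback receives the flush sequence number and the flushed records.
    using callback_t = std::function<void(uint64_t, const std::vector<int>&)>;

    ordered_buffer_pool(size_t capacity, callback_t cb)
    : m_capacity{capacity}
    , m_callback{std::move(cb)}
    , m_current{std::make_unique<record_buffer>()}
    {}

    void emplace(int record)
    {
        std::unique_ptr<record_buffer> full;
        uint64_t                       ticket = 0;
        {
            std::lock_guard<std::mutex> lk{m_write_mutex};
            m_current->records.push_back(record);
            if(m_current->records.size() < m_capacity) return;
            // Detach the full buffer so no writer touches it while it flushes,
            // and record its position in the flush sequence.
            full   = std::exchange(m_current, acquire());
            ticket = m_next_ticket++;
        }
        flush(std::move(full), ticket);
    }

private:
    // Reuse an idle buffer, or grow the pool if none is available.
    std::unique_ptr<record_buffer> acquire()
    {
        std::lock_guard<std::mutex> lk{m_pool_mutex};
        if(m_pool.empty()) return std::make_unique<record_buffer>();
        auto buf = std::move(m_pool.front());
        m_pool.pop_front();
        return buf;
    }

    // Callbacks run one at a time, strictly in ticket order.
    void flush(std::unique_ptr<record_buffer> buf, uint64_t ticket)
    {
        {
            std::unique_lock<std::mutex> lk{m_flush_mutex};
            m_flush_cv.wait(lk, [&] { return m_serving == ticket; });
            m_callback(ticket, buf->records);
            ++m_serving;
        }
        m_flush_cv.notify_all();

        // Return the emptied buffer to the pool for reuse.
        buf->records.clear();
        std::lock_guard<std::mutex> lk{m_pool_mutex};
        m_pool.push_back(std::move(buf));
    }

    size_t                                     m_capacity;
    callback_t                                 m_callback;
    std::unique_ptr<record_buffer>             m_current;
    uint64_t                                   m_next_ticket = 0;
    uint64_t                                   m_serving     = 0;
    std::mutex                                 m_write_mutex;
    std::mutex                                 m_pool_mutex;
    std::mutex                                 m_flush_mutex;
    std::condition_variable                    m_flush_cv;
    std::deque<std::unique_ptr<record_buffer>> m_pool;
};

// Usage: several threads fill the pool concurrently; returns the order in
// which the callback observed flush tickets.
inline std::vector<uint64_t> run_ordered_flush_demo(int n_threads, int per_thread, size_t capacity)
{
    std::vector<uint64_t> seen;
    ordered_buffer_pool   pool{capacity, [&seen](uint64_t ticket, const std::vector<int>&) {
                                 seen.push_back(ticket);  // safe: callbacks are serialized
                             }};
    std::vector<std::thread> threads;
    for(int t = 0; t < n_threads; ++t)
        threads.emplace_back([&pool, per_thread] {
            for(int i = 0; i < per_thread; ++i) pool.emplace(i);
        });
    for(auto& th : threads) th.join();
    return seen;
}
```

With 4 threads each writing 1000 records into 10-record buffers, exactly 400 buffers fill, and the callback sees tickets 0 through 399 in order. A partially filled buffer at shutdown is not flushed in this sketch; a real design would need an explicit final flush.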

Sanitizer Improvements
----------------------
- LSAN: Set fast_unwind_on_malloc=1 to prevent a deadlock in the libgcc unwinder
- LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup)
- TSAN: Added suppression for a false positive on C++11 thread-safe static initialization in create_write_functor
- ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto
- Disabled attachment tests for sanitizers due to library preloading issues

Other Fixes
-----------
- Thread-trace agent test: Use heap-allocated callback state
- Correlation ID: Refactored reference counting and finalization ordering

* [rocprofiler-sdk] Revert buffer pool design changes

Revert buffer.cpp and buffer.hpp to the original double-buffer design from the develop branch. The pool-based redesign introduced concerns about:
- Signal safety (mutex vs atomic_flag)
- API changes (flush() return type)
- Complexity of the new design

This revert removes:
- Dynamic buffer pool with std::deque
- std::mutex/condition_variable synchronization
- buffer_correlation_ordering.cpp test
- buffer_ordering_stress.cpp test

The underlying buffer flush ordering issue will need to be addressed with a different approach that preserves the original API and synchronization characteristics.

* [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization

- Revert TOCTOU CAS loop change in sub_ref_count() - not needed with consistent checks
- Add fini_status check in correlation_tracing_service::construct() with ROCP_CI_LOG warning
- Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp, memory_allocation.cpp)
- Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior:
  - hsa/queue.cpp (lines 105, 210)
  - hsa/async_copy.cpp (line 344)
  - hsa/hsa_barrier.cpp (line 43)
  - buffer.cpp (lines 107, 138, 185)

This ensures no correlation IDs are created once finalization starts (fini_status != 0), preventing races between finalization and ongoing tracing operations.

* [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation

Buffer records are not guaranteed to arrive in any specific order. Tests and samples should use timestamps for temporal ordering validation instead.

Changes:
- samples/external_correlation_id_request: Replace the 'retired prematurely' arrival-order check with timestamp-based validation that the retirement timestamp >= max(end_timestamps) for records with the same correlation ID
- tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check

Correlation IDs are not guaranteed to be monotonically increasing when records are sorted by timestamp. Temporal ordering should be validated using the timestamp fields in each record.

* [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal

Restore the SYSTEM keyword to target_include_directories for rocprofiler-sdk-fmt to match the develop branch.
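
The timestamp-based validation described for samples/external_correlation_id_request (retirement timestamp >= max(end_timestamps) for records sharing a correlation ID) can be sketched in C++ as below. The `traced_record` and `retirement_record` structs and the `find_premature_retirements` function are hypothetical simplifications, not rocprofiler-sdk types; the point is that all records are collected first and compared only by timestamp, so delivery order does not matter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified record types; real rocprofiler-sdk records differ.
struct traced_record
{
    uint64_t correlation_id;
    uint64_t start_timestamp;
    uint64_t end_timestamp;
};

struct retirement_record
{
    uint64_t correlation_id;
    uint64_t timestamp;
};

// Returns the correlation IDs whose retirement timestamp precedes the latest
// end timestamp among records sharing that ID. Arrival order is irrelevant:
// every record is collected first, then compared purely by timestamp.
inline std::vector<uint64_t> find_premature_retirements(
    const std::vector<traced_record>&     records,
    const std::vector<retirement_record>& retirements)
{
    // max(end_timestamps) per correlation ID
    std::unordered_map<uint64_t, uint64_t> max_end;
    for(const auto& rec : records)
    {
        auto& latest = max_end[rec.correlation_id];  // value-initialized to 0 on first use
        latest       = std::max(latest, rec.end_timestamp);
    }

    std::vector<uint64_t> violations;
    for(const auto& ret : retirements)
    {
        auto itr = max_end.find(ret.correlation_id);
        // Valid only if retirement timestamp >= max(end_timestamps)
        if(itr != max_end.end() && ret.timestamp < itr->second)
            violations.push_back(ret.correlation_id);
    }
    return violations;
}
```

For example, with records `{2, 10, 50}`, `{1, 5, 20}`, `{2, 30, 70}` and retirements `{2, 80}`, `{1, 15}`, only correlation ID 1 is reported, because it retires at 15 while one of its records ends at 20. Retirements with no matching records are ignored here; a stricter check could flag them as well.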

* [rccl] Remove orphaned rocSHMEM gitlink

Remove an orphaned submodule reference that was introduced during a merge but never had a corresponding .gitmodules entry, causing CI failures with "fatal: no submodule mapping found in .gitmodules".

* [rocprofiler-sdk] Add HSA ABI version 0x09 support

Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09, which introduces the hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release functions (added in rocr-runtime SWDEV-561708).

* [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations

This commit consolidates fixes for handling the finalization status during buffer flush operations across the SDK.

Changes:
- Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully when flushing buffers, as this indicates buffers were already flushed during finalization (not an error condition)
- HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use a > 0 check for fini_status to allow operations during the finalization process
- buffer.cpp: Revert fini_status checks to use > 0 for consistency
- correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging to prevent correlation ID creation after finalization starts

Files modified:
- source/lib/rocprofiler-sdk-tool/tool.cpp
- tests/tools/json-tool.cpp
- source/lib/rocprofiler-sdk/tests/registration.cpp
- source/lib/rocprofiler-sdk/tests/roctx.cpp
- samples/api_buffered_tracing/client.cpp
- samples/counter_collection/buffered_client.cpp
- samples/counter_collection/device_counting_async_client.cpp
- samples/external_correlation_id_request/client.cpp
- samples/pc_sampling/client.cpp
- source/lib/rocprofiler-sdk/buffer.cpp
- source/lib/rocprofiler-sdk/context/correlation_id.cpp
- source/lib/rocprofiler-sdk/hsa/queue.cpp
- source/lib/rocprofiler-sdk/hsa/async_copy.cpp
- source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp

* [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling

Remove the hsa_tool_hooks infrastructure and simplify buffer flush calls in samples and tools. The ERROR_FINALIZED handling was overly complex, and the hsa_tool_hooks OnUnload synchronization is no longer needed.

Changes:
- Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code
- Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL
- Simplify buffer flush in tool.cpp and json-tool.cpp
- Remove ERROR_FINALIZED special handling from test files

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Fix output_stream move semantics to null source pointers

The default move constructor and move assignment operator for output_stream did not null out the source's pointers after the move. This caused a double close when the moved-from temporary was destroyed, leading to use-after-free crashes (SIGSEGV in std::ostream::sentry).

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration

- generatePerfetto.cpp: Move output_stream into shared_state to prevent use-after-free race conditions during Perfetto callback execution
- run-ci.py: Simplify and consolidate sanitizer environment variable configuration for better maintainability

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions

The previous changes removed MEMCHECK_SANITIZER_OPTIONS, which is required for CTest to properly pass suppression files to the sanitizers during memcheck runs.

Co-Authored-By: Claude <noreply@anthropic.com>

* Revert "[rccl] Remove orphaned rocSHMEM gitlink"

This reverts commit 1ad21003941355658fff8114fa27768f11a948f7.

* [rocprofiler-sdk] Revert registration.cpp changes

Revert changes to registration.cpp to match the develop branch.

Co-Authored-By: Claude <noreply@anthropic.com>

* [rocprofiler-sdk] Remove suppression file content printing from run-ci.py

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix output_stream move ctor/assignment operator

* Fix erroneous revert of registration.cpp

* Fix handling of fini status in correlation ID construction

* [rocprofiler-sdk] Fix OMPT segfault during finalization

Add nullptr checks in OMPT tracing code to handle the case where correlation_tracing_service::construct() returns nullptr during finalization. This fixes segfaults in openmp-target-sample and tests.integration.execute.openmp-tools.

The correlation ID construction now returns nullptr when fini_status > 0, but the OMPT callbacks were not checking for this, causing crashes when dereferencing the null pointer during OpenMP runtime shutdown.

Changes:
- event_common(): Return nullptr early if correlation ID is null
- event(): Check for nullptr before calling sub_ref_count()
- ompt_task_create_callback(): Return early if correlation ID is null
- ompt_task_schedule_callback(): Return early if correlation ID is null

* [rocprofiler-sdk] Fix HSA API tracing segfault during finalization

Add a nullptr check in hsa_api_impl::functor after correlation ID construction. During finalization, correlation_service::construct() returns nullptr, and without this check the code would dereference the null pointer when accessing corr_id->internal.

This fixes the SEGV at address 0x000000000008 (null + 8-byte offset) that occurs when HSA async event threads call hsa_signal_destroy during runtime shutdown after finalization has started.

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
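
The output_stream move-semantics fix described above follows a standard C++ pattern: a type that owns a raw pointer must null the source in its move operations, otherwise the moved-from object's destructor closes the same resource a second time. The sketch below uses a hypothetical `owning_stream` class as a stand-in for output_stream, whose actual members are not shown in this log. It transfers ownership with `std::exchange` and keeps a static `close_count` purely so the single close is observable (C++17 for the inline static member).

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <utility>

// Hypothetical stand-in for a stream wrapper that owns a raw pointer.
// Defaulted move operations would copy the pointer and leave two owners.
class owning_stream
{
public:
    static inline int close_count = 0;  // counts real closes, for illustration only

    owning_stream() = default;
    explicit owning_stream(std::ostream* os)
    : m_stream{os}
    {}

    owning_stream(const owning_stream&) = delete;
    owning_stream& operator=(const owning_stream&) = delete;

    // Move: take ownership and null the source so its destructor is a no-op.
    owning_stream(owning_stream&& other) noexcept
    : m_stream{std::exchange(other.m_stream, nullptr)}
    {}

    owning_stream& operator=(owning_stream&& other) noexcept
    {
        if(this != &other)
        {
            close();  // release whatever this object currently owns
            m_stream = std::exchange(other.m_stream, nullptr);
        }
        return *this;
    }

    ~owning_stream() { close(); }

    std::ostream* get() const { return m_stream; }

private:
    void close()
    {
        if(m_stream)
        {
            ++close_count;
            delete m_stream;  // std::ostream has a virtual destructor
            m_stream = nullptr;
        }
    }

    std::ostream* m_stream = nullptr;
};
```

Moving `a` into `b` leaves `a.get() == nullptr`, so destroying both objects closes the stream once; move-assigning into an object that already owns a stream closes the old stream first, then adopts the new one.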
159 lines
5.7 KiB
YAML
name: rocprofiler-sdk Advanced Analysis

on:
  workflow_dispatch:
  pull_request:
    paths:
      - '.github/workflows/rocprofiler-sdk-codeql.yml'
      - 'projects/rocprofiler-sdk/**'
      - '!**/*.md'
      - '!**/*.rtf'
      - '!**/*.rst'
      - '!**/.markdownlint-ci2.yaml'
      - '!**/.readthedocs.yaml'
      - '!**/.spellcheck.local.yaml'
      - '!**/.wordlist.txt'
      - '!projects/rocprofiler-sdk/CODEOWNERS'
      - '!projects/rocprofiler-sdk/source/docs/**'

  push:
    branches:
      - develop
    paths:
      - '.github/workflows/rocprofiler-sdk-codeql.yml'
      - 'projects/rocprofiler-sdk/**'
      - '!**/*.md'
      - '!**/*.rtf'
      - '!**/*.rst'
      - '!**/.markdownlint-ci2.yaml'
      - '!**/.readthedocs.yaml'
      - '!**/.spellcheck.local.yaml'
      - '!**/.wordlist.txt'
      - '!projects/rocprofiler-sdk/CODEOWNERS'
      - '!projects/rocprofiler-sdk/source/docs/**'

env:
  ROCM_PATH: "/opt/rocm"
  GPU_TARGETS: "gfx906;gfx908;gfx90a;gfx942;gfx950;gfx1030;gfx1100;gfx1101;gfx1102;gfx1201"
  PATH: "/usr/bin:$PATH"
  EXCLUDED_PATHS: "external /tmp/build/external"
  GLOBAL_CMAKE_OPTIONS: "-DROCPROFILER_INTERNAL_RCCL_API_TRACE=ON"
  ENABLE_HIP_CLR_BUILD: "false"

jobs:
  analyze:
    name: Analyze (${{ matrix.language }})
    # Runner size impacts CodeQL analysis time. To learn more, please see:
    # - https://gh.io/recommended-hardware-resources-for-running-codeql
    # - https://gh.io/supported-runners-and-hardware-resources
    # - https://gh.io/using-larger-runners (GitHub.com only)
    # Consider using larger runners or machines with greater resources for possible analysis time improvements.
    runs-on: ubuntu-latest

    container: rocm/dev-ubuntu-22.04:latest
    permissions:
      # required for all workflows
      security-events: write

      # required to fetch internal or private CodeQL packs
      packages: read

      # only required for workflows in private repositories
      actions: read
      contents: read

    strategy:
      fail-fast: false
      matrix:
        include:
          # cpp analysis disabled - takes too long and frequently times out
          # - language: cpp
          #   build-mode: manual
          - language: python
            build-mode: none
          - language: actions
            build-mode: none
    steps:
      - name: Install requirements
        timeout-minutes: 10
        shell: bash
        env:
          DEBIAN_FRONTEND: noninteractive
        run: |
          sudo apt update -y
          sudo apt upgrade -y
          sudo apt install -y software-properties-common wget rocm-llvm-dev
          sudo apt-add-repository ppa:git-core/ppa
          wget https://repo.radeon.com/rocm/rocm.gpg.key -O - | gpg --dearmor | sudo tee /etc/apt/keyrings/rocm.gpg > /dev/null
          sudo tee /etc/apt/sources.list.d/rocm.list << EOF
          deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/rocm/apt/7.1 jammy main
          deb [arch=amd64 signed-by=/etc/apt/keyrings/rocm.gpg] https://repo.radeon.com/graphics/7.1/ubuntu jammy main
          EOF
          sudo apt update -y
          sudo apt upgrade -y
          sudo apt install -y git build-essential cmake g++-11 g++-12 python3-pip libdw-dev libsqlite3-dev rccl-dev libva-amdgpu-dev rocdecode-dev rocjpeg-dev
          sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-11 10 --slave /usr/bin/g++ g++ /usr/bin/g++-11 --slave /usr/bin/gcov gcov /usr/bin/gcov-11
          sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 20 --slave /usr/bin/g++ g++ /usr/bin/g++-12 --slave /usr/bin/gcov gcov /usr/bin/gcov-12
          git config --global --add safe.directory '*'

      - uses: actions/checkout@v4
        with:
          sparse-checkout: |
            projects/rocprofiler-sdk
            projects/clr
            projects/hip
            .github/workflows/rocprofiler-sdk-codeql.yml
            .github/workflows/rocprofiler-sdk-formatting.yml
          submodules: 'true'

      # Initializes the CodeQL tools for scanning.
      - name: Initialize CodeQL
        uses: github/codeql-action/init@v3
        with:
          languages: ${{ matrix.language }}
          build-mode: ${{ matrix.build-mode }}
          queries: security-extended

      - name: Build and Install HIP
        if: ${{ env.ENABLE_HIP_CLR_BUILD == 'true' }}
        shell: bash
        working-directory: projects
        run: |
          export HIP_DIR=$PWD/hip
          export CLR_DIR=$PWD/clr
          export LD_LIBRARY_PATH=${{ env.ROCM_PATH }}/lib:${{ env.ROCM_PATH }}/llvm/lib:$LD_LIBRARY_PATH
          export PATH=${{ env.ROCM_PATH }}/bin:${{ env.ROCM_PATH }}/llvm/bin:$PATH
          echo "Install HIP..."
          cd $CLR_DIR
          pip install CppHeaderParser
          cmake \
            -DHIP_COMMON_DIR=$HIP_DIR \
            -DCMAKE_BUILD_TYPE=RelWithDebInfo \
            -DHIP_PLATFORM=amd \
            -DCMAKE_PREFIX_PATH='${{ env.ROCM_PATH }};${{ env.ROCM_PATH }}/llvm' \
            -DCMAKE_INSTALL_PREFIX=${{ env.ROCM_PATH }} \
            -DHIP_LLVM_ROOT=${{ env.ROCM_PATH }}/lib/llvm \
            -DHIP_CATCH_TEST=0 \
            -DCLR_BUILD_HIP=ON \
            -DCLR_BUILD_OCL=ON \
            -S $CLR_DIR \
            -B build
          cmake --build build --target all --parallel 8
          cmake --build build --target install
          echo "✅ HIP Installation complete!"

      - name: Configure and Build
        timeout-minutes: 30
        shell: bash
        run: |
          cd projects/rocprofiler-sdk
          python3 -m pip install -r requirements.txt
          cmake -B /tmp/build -DCMAKE_PREFIX_PATH=/opt/rocm ${{ env.GLOBAL_CMAKE_OPTIONS }} -DPython3_EXECUTABLE=$(which python3) .
          cmake --build /tmp/build --target all --parallel 16
          rm -rf ${EXCLUDED_PATHS}

      - name: Perform CodeQL Analysis
        uses: github/codeql-action/analyze@v3
        with:
          category: "/language:${{matrix.language}}"