* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements
Buffer Pool Design
------------------
Replace the fixed array-based double buffer with a dynamic pool design to
fix race conditions that caused "internal correlation id was retired
prematurely" errors.
The original design had a race where flush callbacks could be delivered
out-of-order: when buffer 0 fills and begins flushing, writes go to
buffer 1. If buffer 1 fills before buffer 0's flush completes, the
buffer index wraps back to 0 (which may still be flushing). Independent
flush tasks submitted to the thread pool can complete out of order.
The new pool design:
- Uses a std::deque of buffer instances that grows as needed
- Allocates buffers from the pool when the current buffer needs to flush
- Serializes flushes with a mutex to ensure FIFO callback ordering
- Returns buffers to the pool after flush completion
- Eliminates the race between buffer selection and write operations
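The pool design listed above can be pictured with the minimal sketch below.
It is not the buffer.cpp implementation from this change: buffer_pool,
record_buffer, emplace(), acquire(), and release() are hypothetical names,
and records are plain ints. The sketch takes the flush mutex before it
releases the write mutex, so buffers are flushed in the order they filled.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a record buffer; not the SDK's buffer type.
using record_buffer = std::vector<int>;

class buffer_pool
{
public:
    using flush_callback = std::function<void(const record_buffer&)>;

    buffer_pool(std::size_t capacity, flush_callback cb)
    : m_capacity{capacity}
    , m_callback{std::move(cb)}
    , m_current{std::make_unique<record_buffer>()}
    {}

    // Append one record; flush the buffer once it reaches capacity.
    void emplace(int value)
    {
        std::unique_ptr<record_buffer> full{};
        std::unique_lock<std::mutex>   flush_lk{};
        {
            std::lock_guard<std::mutex> write_lk{m_write_mutex};
            m_current->push_back(value);
            if(m_current->size() < m_capacity) return;
            // Swap in a fresh (or recycled) buffer for subsequent writes.
            full = std::exchange(m_current, acquire());
            // Take the flush lock before the write lock is released, so the
            // flush order matches the order in which buffers filled (FIFO).
            flush_lk = std::unique_lock<std::mutex>{m_flush_mutex};
        }
        m_callback(*full);
        full->clear();
        release(std::move(full));  // return the buffer to the pool
    }

private:
    std::unique_ptr<record_buffer> acquire()
    {
        std::lock_guard<std::mutex> lk{m_pool_mutex};
        if(m_idle.empty()) return std::make_unique<record_buffer>();  // grow on demand
        auto buf = std::move(m_idle.front());
        m_idle.pop_front();
        return buf;
    }

    void release(std::unique_ptr<record_buffer> buf)
    {
        std::lock_guard<std::mutex> lk{m_pool_mutex};
        m_idle.push_back(std::move(buf));
    }

    std::size_t                                m_capacity;
    flush_callback                             m_callback;
    std::unique_ptr<record_buffer>             m_current;
    std::mutex                                 m_write_mutex;
    std::mutex                                 m_flush_mutex;
    std::mutex                                 m_pool_mutex;
    std::deque<std::unique_ptr<record_buffer>> m_idle;
};
```

The trade-off in this sketch is that a writer stalls if a second buffer fills
while an earlier flush is still running. It also uses std::mutex, which is
the signal-safety concern named in the revert further down.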
New Unit Tests
--------------
- buffer_correlation_ordering.cpp: Tests that API records are always
delivered before their corresponding retirement records
- buffer_ordering_stress.cpp: Stress tests buffer flush ordering under
high contention with multiple threads rapidly filling buffers
HSA Tool Hooks
--------------
Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that
waits for pending flush tasks before tool finalization, preventing
"retired prematurely" errors during HSA shutdown.
Sanitizer Improvements
----------------------
- LSAN: Set fast_unwind_on_malloc=1 to prevent deadlock in libgcc unwinder
- LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup)
- TSAN: Added suppression for false positive on C++11 thread-safe static
initialization in create_write_functor
- ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto
- Disabled attachment tests for sanitizers due to library preloading issues
Other Fixes
-----------
- Thread-trace agent test: Use heap-allocated callback state
- Correlation ID: Refactored reference counting and finalization ordering
* [rocprofiler-sdk] Revert buffer pool design changes
Revert buffer.cpp and buffer.hpp to the original double-buffer
design from the develop branch. The pool-based redesign raised
concerns about:
- Signal safety (mutex vs atomic_flag)
- API changes (flush() return type)
- Complexity of the new design
This revert removes:
- Dynamic buffer pool with std::deque
- std::mutex/condition_variable synchronization
- buffer_correlation_ordering.cpp test
- buffer_ordering_stress.cpp test
The underlying buffer flush ordering issue will need to be
addressed with a different approach that preserves the original
API and synchronization characteristics.
* [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization
- Revert TOCTOU CAS loop change in sub_ref_count() - not needed with consistent checks
- Add fini_status check in correlation_tracing_service::construct() with ROCP_CI_LOG warning
- Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp, memory_allocation.cpp)
- Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior:
  - hsa/queue.cpp (lines 105, 210)
  - hsa/async_copy.cpp (line 344)
  - hsa/hsa_barrier.cpp (line 43)
  - buffer.cpp (lines 107, 138, 185)
This ensures no correlation IDs are created once finalization starts (fini_status != 0),
preventing races between finalization and ongoing tracing operations.
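The check-and-bail pattern described above can be sketched as follows. All
names live in a hypothetical sketch namespace and only mirror the commit
text. The status encoding (0 = running, negative = finalizing, positive =
finalized) is an assumption made here to show why '!= 0' is stricter than
'> 0'.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>

namespace sketch
{
// Assumed encoding: 0 = running, negative = finalizing, positive = finalized.
inline std::atomic<int>& fini_status()
{
    static std::atomic<int> value{0};
    return value;
}

struct correlation_id
{
    std::uint64_t internal = 0;
};

// Hypothetical analogue of correlation_tracing_service::construct():
// refuses to create a correlation ID once finalization has started.
inline std::unique_ptr<correlation_id> construct()
{
    static std::atomic<std::uint64_t> next_id{1};
    if(fini_status().load() != 0) return nullptr;
    auto id      = std::make_unique<correlation_id>();
    id->internal = next_id.fetch_add(1);
    return id;
}

// Call-site pattern: return early instead of dereferencing a null pointer.
inline bool trace_operation()
{
    auto corr_id = construct();
    if(!corr_id) return false;
    return corr_id->internal != 0;
}
}  // namespace sketch
```

With a '> 0' check, the assumed negative "finalizing" value would still let
construct() hand out IDs. With '!= 0', both nonzero states are rejected.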
* [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation
Buffer records are not guaranteed to arrive in any specific order. Tests and
samples should use timestamps for temporal ordering validation instead.
Changes:
- samples/external_correlation_id_request: Replace 'retired prematurely' arrival
order check with timestamp-based validation that retirement timestamp >=
max(end_timestamps) for records with the same correlation ID
- tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
- tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check
Correlation IDs are not guaranteed to be monotonically increasing when records
are sorted by timestamp. Temporal ordering should be validated using the
timestamp fields in each record.
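A minimal sketch of the timestamp-based check, assuming simplified record
types: api_record, retirement_record, and
retirements_follow_end_timestamps() are hypothetical and are not the SDK's
record structs or the sample's code. The check does not depend on the order
in which records arrive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified record types for illustration only.
struct api_record
{
    std::uint64_t corr_id;
    std::uint64_t start_timestamp;
    std::uint64_t end_timestamp;
};

struct retirement_record
{
    std::uint64_t corr_id;
    std::uint64_t timestamp;
};

// Returns true when every retirement timestamp is >= the largest end
// timestamp among API records sharing its correlation ID.
inline bool
retirements_follow_end_timestamps(const std::vector<api_record>&        api,
                                  const std::vector<retirement_record>& retired)
{
    std::unordered_map<std::uint64_t, std::uint64_t> max_end{};
    for(const auto& rec : api)
    {
        auto& latest = max_end[rec.corr_id];
        latest       = std::max(latest, rec.end_timestamp);
    }

    for(const auto& rec : retired)
    {
        auto itr = max_end.find(rec.corr_id);
        if(itr != max_end.end() && rec.timestamp < itr->second) return false;
    }
    return true;
}
```

Retirement records with no matching API record are accepted here. A
stricter test could treat them as failures.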
* [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal
Restore the SYSTEM keyword to target_include_directories for
rocprofiler-sdk-fmt to match develop branch.
* [rccl] Remove orphaned rocSHMEM gitlink
Remove orphaned submodule reference that was introduced during a merge
but never had a corresponding .gitmodules entry, causing CI failures
with "fatal: no submodule mapping found in .gitmodules".
* [rocprofiler-sdk] Add HSA ABI version 0x09 support
Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09 which
introduces hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release
functions (added in rocr-runtime SWDEV-561708).
* [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations
This commit consolidates fixes for handling the finalization status during
buffer flush operations across the SDK.
Changes:
- Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully
when flushing buffers, as this indicates buffers were already flushed
during finalization (not an error condition)
- HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use > 0 check
for fini_status to allow operations during finalization process
- buffer.cpp: Revert fini_status checks to use > 0 for consistency
- correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging
to prevent correlation ID creation after finalization starts
Files modified:
- source/lib/rocprofiler-sdk-tool/tool.cpp
- tests/tools/json-tool.cpp
- source/lib/rocprofiler-sdk/tests/registration.cpp
- source/lib/rocprofiler-sdk/tests/roctx.cpp
- samples/api_buffered_tracing/client.cpp
- samples/counter_collection/buffered_client.cpp
- samples/counter_collection/device_counting_async_client.cpp
- samples/external_correlation_id_request/client.cpp
- samples/pc_sampling/client.cpp
- source/lib/rocprofiler-sdk/buffer.cpp
- source/lib/rocprofiler-sdk/context/correlation_id.cpp
- source/lib/rocprofiler-sdk/hsa/queue.cpp
- source/lib/rocprofiler-sdk/hsa/async_copy.cpp
- source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp
* [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling
Remove the hsa_tool_hooks infrastructure and simplify buffer flush calls
in samples and tools. The ERROR_FINALIZED handling was overly complex
and the hsa_tool_hooks OnUnload synchronization is no longer needed.
Changes:
- Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code
- Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL
- Simplify buffer flush in tool.cpp and json-tool.cpp
- Remove ERROR_FINALIZED special handling from test files
Co-Authored-By: Claude <noreply@anthropic.com>
* [rocprofiler-sdk] Fix output_stream move semantics to null source pointers
The default move constructor and move assignment operator for
output_stream did not null out the source's pointers after the move.
This caused double-close when the moved-from temporary was destroyed,
leading to use-after-free crashes (SIGSEGV in std::ostream::sentry).
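The fix can be illustrated with the sketch below, which uses a simplified,
hypothetical output_stream (a raw stream pointer plus a closer function
pointer). The real struct's members may differ. std::exchange moves each
pointer and nulls the source in one step, so the moved-from destructor has
nothing left to close.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <utility>

// Simplified, hypothetical stand-in for the tool's output_stream.
struct output_stream
{
    using closer_t = void (*)(std::ostream*);

    output_stream() = default;
    output_stream(std::ostream* s, closer_t c)
    : stream{s}
    , closer{c}
    {}
    ~output_stream() { close(); }

    output_stream(const output_stream&) = delete;
    output_stream& operator=(const output_stream&) = delete;

    // Move constructor: take ownership and null out the source's pointers.
    output_stream(output_stream&& rhs) noexcept
    : stream{std::exchange(rhs.stream, nullptr)}
    , closer{std::exchange(rhs.closer, nullptr)}
    {}

    // Move assignment: close what we hold, then take ownership from rhs.
    output_stream& operator=(output_stream&& rhs) noexcept
    {
        if(this != &rhs)
        {
            close();
            stream = std::exchange(rhs.stream, nullptr);
            closer = std::exchange(rhs.closer, nullptr);
        }
        return *this;
    }

    // Safe to call more than once; the second call is a no-op.
    void close()
    {
        if(stream && closer) closer(stream);
        stream = nullptr;
        closer = nullptr;
    }

    std::ostream* stream = nullptr;
    closer_t      closer = nullptr;
};
```

A defaulted move would copy both pointers and leave the source unchanged, so
both destructors would call the closer on the same stream.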
Co-Authored-By: Claude <noreply@anthropic.com>
* [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration
- generatePerfetto.cpp: Move output_stream into shared_state to prevent
use-after-free race conditions during Perfetto callback execution
- run-ci.py: Simplify and consolidate sanitizer environment variable
configuration for better maintainability
Co-Authored-By: Claude <noreply@anthropic.com>
* [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions
The previous changes removed MEMCHECK_SANITIZER_OPTIONS which is required
for CTest to properly pass suppression files to the sanitizers during
memcheck runs.
Co-Authored-By: Claude <noreply@anthropic.com>
* Revert "[rccl] Remove orphaned rocSHMEM gitlink"
This reverts commit 1ad21003941355658fff8114fa27768f11a948f7.
* [rocprofiler-sdk] Revert registration.cpp changes
Revert changes to registration.cpp to match develop branch.
Co-Authored-By: Claude <noreply@anthropic.com>
* [rocprofiler-sdk] Remove suppression file content printing from run-ci.py
Co-Authored-By: Claude <noreply@anthropic.com>
* Fix output_stream move ctor/assignment operator
* Fix erroneous revert of registration.cpp
* Fix handling of fini status in correlation ID construction
* [rocprofiler-sdk] Fix OMPT segfault during finalization
Add nullptr checks in OMPT tracing code to handle the case where
correlation_tracing_service::construct() returns nullptr during
finalization. This fixes segfaults in openmp-target-sample and
tests.integration.execute.openmp-tools.
The correlation ID construction now returns nullptr when fini_status > 0,
but the OMPT callbacks were not checking for this, causing crashes when
dereferencing the null pointer during OpenMP runtime shutdown.
Changes:
- event_common(): Return nullptr early if correlation ID is null
- event(): Check for nullptr before calling sub_ref_count()
- ompt_task_create_callback(): Return early if correlation ID is null
- ompt_task_schedule_callback(): Return early if correlation ID is null
* [rocprofiler-sdk] Fix HSA API tracing segfault during finalization
Add nullptr check in hsa_api_impl::functor after correlation ID
construction. During finalization, correlation_service::construct()
returns nullptr, and without this check the code would dereference
the null pointer when accessing corr_id->internal.
This fixes the SEGV at address 0x000000000008 (null + 8 byte offset)
that occurs when HSA async event threads call hsa_signal_destroy
during runtime shutdown after finalization has started.
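The faulting address is consistent with reading a member at offset 8 through
a null pointer. The sketch below assumes a hypothetical layout in which one
8-byte field precedes `internal`. Only the 8-byte offset comes from the
commit text: the preceding field and the helper internal_or_zero() are
illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout; only the offset of `internal` reflects the commit text.
struct correlation_id
{
    std::uint64_t preceding_field = 0;  // placeholder for whatever comes first
    std::uint64_t internal        = 0;
};

// With this layout, corr_id->internal on a null pointer touches address 0x8.
static_assert(offsetof(correlation_id, internal) == 8,
              "internal is expected at byte offset 8 in this sketch");

// Guarded access: check for nullptr before reading through the pointer.
inline std::uint64_t
internal_or_zero(const correlation_id* corr_id)
{
    if(corr_id == nullptr) return 0;
    return corr_id->internal;
}
```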
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Stop heavy GitHub CI workflows from triggering on docs-only changes.
Specifically, azure-ci-dispatcher.yml, therock-ci.yml, and many rocprofiler workflows no longer trigger when a change consists entirely of documentation files.
* [ROCProfiler-SDK] Remove 'gfx900' and 'gfx940' from GPU targets
* Remove unsupported GPU targets from workflow
* Remove gfx900 and gfx940 from GPU targets
* Increase rocDecode code coverage and add version check
* Update rocJPEG tests
* Fix rocJPEG tests
* Enable building tests/samples in rocm release compat workflow
* Re-added rocJPEG test skips
* formatting
* Adding ROCm libraries for the code-coverage job
* Added return value check for error message and updated compatibility to enable tests
* Disable rocm_release_compatibility samples and tests until openmp issue is resolved
---------
Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <Jonathan.Madsen@amd.com>
* Changing CDash Project
* Fixing CI
* Fixing AQLProfile CDash
* Fixing CI