develop
24 Коммитов
| Автор | SHA1 | Сообщение | Дата | |
|---|---|---|---|---|
|
|
1517a398bf |
[rocprofiler-sdk] Buffer finalization fixes and HSA ABI 0x09 support (#2318)
* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements Buffer Pool Design ------------------ Replace the fixed array-based double buffer with a dynamic pool design to fix race conditions that caused "internal correlation id was retired prematurely" errors. The original design had a race where flush callbacks could be delivered out-of-order: when buffer 0 fills and begins flushing, writes go to buffer 1. If buffer 1 fills before buffer 0's flush completes, the buffer index wraps back to 0 (which may still be flushing). Independent flush tasks submitted to the thread pool can complete out of order. The new pool design: - Uses a std::deque of buffer instances that grows as needed - Allocates buffers from the pool when the current buffer needs to flush - Serializes flushes with a mutex to ensure FIFO callback ordering - Returns buffers to the pool after flush completion - Eliminates the race between buffer selection and write operations New Unit Tests -------------- - buffer_correlation_ordering.cpp: Tests that API records are always delivered before their corresponding retirement records - buffer_ordering_stress.cpp: Stress tests buffer flush ordering under high contention with multiple threads rapidly filling buffers HSA Tool Hooks -------------- Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that waits for pending flush tasks before tool finalization, preventing "retired prematurely" errors during HSA shutdown. Sanitizer Improvements ---------------------- - LSAN: Set fast_unwind_on_malloc=1 to prevent deadlock in libgcc unwinder - LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup) - TSAN: Added suppression for false positive on C++11 thread-safe static initialization in create_write_functor - ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto - Disabled attachment tests for sanitizers due to library preloading issues Other Fixes ----------- - Thread-trace agent test: Use heap-allocated callback state - Correlation ID: Refactored reference counting and finalization ordering * [rocprofiler-sdk] Revert buffer pool design changes Revert buffer.cpp and buffer.hpp to the original double-buffer design from develop branch. The pool-based redesign introduced concerns about: - Signal safety (mutex vs atomic_flag) - API changes (flush() return type) - Complexity of the new design This revert removes: - Dynamic buffer pool with std::deque - std::mutex/condition_variable synchronization - buffer_correlation_ordering.cpp test - buffer_ordering_stress.cpp test The underlying buffer flush ordering issue will need to be addressed with a different approach that preserves the original API and synchronization characteristics. * [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization - Revert TOCTOU CAS loop change in sub_ref_count() - not needed with consistent checks - Add fini_status check in correlation_tracing_service::construct() with ROCP_CI_LOG warning - Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp, memory_allocation.cpp) - Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior: - hsa/queue.cpp (lines 105, 210) - hsa/async_copy.cpp (line 344) - hsa/hsa_barrier.cpp (line 43) - buffer.cpp (lines 107, 138, 185) This ensures no correlation IDs are created once finalization starts (fini_status != 0), preventing races between finalization and ongoing tracing operations. * [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation Buffer records are not guaranteed to arrive in any specific order. Tests and samples should use timestamps for temporal ordering validation instead. Changes: - samples/external_correlation_id_request: Replace 'retired prematurely' arrival order check with timestamp-based validation that retirement timestamp >= max(end_timestamps) for records with the same correlation ID - tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check - tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check - tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check Correlation IDs are not guaranteed to be monotonically increasing when records are sorted by timestamp. Temporal ordering should be validated using the timestamp fields in each record. * [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal Restore the SYSTEM keyword to target_include_directories for rocprofiler-sdk-fmt to match develop branch. * [rccl] Remove orphaned rocSHMEM gitlink Remove orphaned submodule reference that was introduced during a merge but never had a corresponding .gitmodules entry, causing CI failures with "fatal: no submodule mapping found in .gitmodules". * [rocprofiler-sdk] Add HSA ABI version 0x09 support Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09 which introduces hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release functions (added in rocr-runtime SWDEV-561708). * [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations This commit consolidates fixes for handling the finalization status during buffer flush operations across the SDK. Changes: - Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully when flushing buffers, as this indicates buffers were already flushed during finalization (not an error condition) - HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use > 0 check for fini_status to allow operations during finalization process - buffer.cpp: Revert fini_status checks to use > 0 for consistency - correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging to prevent correlation ID creation after finalization starts Files modified: - source/lib/rocprofiler-sdk-tool/tool.cpp - tests/tools/json-tool.cpp - source/lib/rocprofiler-sdk/tests/registration.cpp - source/lib/rocprofiler-sdk/tests/roctx.cpp - samples/api_buffered_tracing/client.cpp - samples/counter_collection/buffered_client.cpp - samples/counter_collection/device_counting_async_client.cpp - samples/external_correlation_id_request/client.cpp - samples/pc_sampling/client.cpp - source/lib/rocprofiler-sdk/buffer.cpp - source/lib/rocprofiler-sdk/context/correlation_id.cpp - source/lib/rocprofiler-sdk/hsa/queue.cpp - source/lib/rocprofiler-sdk/hsa/async_copy.cpp - source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp * [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling Remove the hsa_tool_hooks infrastructure and simplify buffer flush calls in samples and tools. The ERROR_FINALIZED handling was overly complex and the hsa_tool_hooks OnUnload synchronization is no longer needed. Changes: - Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code - Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL - Simplify buffer flush in tool.cpp and json-tool.cpp - Remove ERROR_FINALIZED special handling from test files Co-Authored-By: Claude <noreply@anthropic.com> * [rocprofiler-sdk] Fix output_stream move semantics to null source pointers The default move constructor and move assignment operator for output_stream did not null out the source's pointers after the move. This caused double-close when the moved-from temporary was destroyed, leading to use-after-free crashes (SIGSEGV in std::ostream::sentry). Co-Authored-By: Claude <noreply@anthropic.com> * [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration - generatePerfetto.cpp: Move output_stream into shared_state to prevent use-after-free race conditions during Perfetto callback execution - run-ci.py: Simplify and consolidate sanitizer environment variable configuration for better maintainability Co-Authored-By: Claude <noreply@anthropic.com> * [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions The previous changes removed MEMCHECK_SANITIZER_OPTIONS which is required for CTest to properly pass suppression files to the sanitizers during memcheck runs. Co-Authored-By: Claude <noreply@anthropic.com> * Revert "[rccl] Remove orphaned rocSHMEM gitlink" This reverts commit 1ad21003941355658fff8114fa27768f11a948f7. * [rocprofiler-sdk] Revert registration.cpp changes Revert changes to registration.cpp to match develop branch. Co-Authored-By: Claude <noreply@anthropic.com> * [rocprofiler-sdk] Remove suppression file content printing from run-ci.py Co-Authored-By: Claude <noreply@anthropic.com> * Fix output_stream move ctor/assignment operator * Fix erroneous revert of registration.cpp * Fix handling of fini status in correlation ID construction * [rocprofiler-sdk] Fix OMPT segfault during finalization Add nullptr checks in OMPT tracing code to handle the case where correlation_tracing_service::construct() returns nullptr during finalization. This fixes segfaults in openmp-target-sample and tests.integration.execute.openmp-tools. The correlation ID construction now returns nullptr when fini_status > 0, but the OMPT callbacks were not checking for this, causing crashes when dereferencing the null pointer during OpenMP runtime shutdown. Changes: - event_common(): Return nullptr early if correlation ID is null - event(): Check for nullptr before calling sub_ref_count() - ompt_task_create_callback(): Return early if correlation ID is null - ompt_task_schedule_callback(): Return early if correlation ID is null * [rocprofiler-sdk] Fix HSA API tracing segfault during finalization Add nullptr check in hsa_api_impl::functor after correlation ID construction. During finalization, correlation_service::construct() returns nullptr, and without this check the code would dereference the null pointer when accessing corr_id->internal. This fixes the SEGV at address 0x000000000008 (null + 8 byte offset) that occurs when HSA async event threads call hsa_signal_destroy during runtime shutdown after finalization has started. --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> |
||
|
|
d496bcef18 |
Fix dimension mismatch for multi-GPU systems with identical architect… (#1440)
* Fix dimension mismatch for multi-GPU systems with identical architectures This change addresses an issue where counter dimensions were incorrectly shared across all GPU agents with the same architecture name, even when those agents had different hardware configurations (e.g., different CU counts). Changes: - Updated getBlockDimensions() to accept agent ID instead of architecture name - Made dimension cache agent-specific instead of architecture-specific - Updated set_dimensions() in AST evaluation to use specific agent ID - Modified all API functions to handle agent-specific dimension lookups - Updated tests to work with agent-specific dimensions This fix ensures that dimensions accurately reflect the actual hardware configuration of each individual GPU agent, preventing dimension mismatches in multi-GPU systems where GPUs share the same architecture but have different physical configurations. Counter ID Representation Changes: - Modified counter_id encoding to include agent information in bits 37-32 - Agent logical_node_id is encoded as (value + 1) to ensure agent 0 is detectable - Counter records internally store only 16-bit base metric IDs (bits 15-0) - Tool reconstructs agent-encoded counter IDs from base metric ID & agent info - Instance record counter_id field uses bitwise AND mask to extract base metric ID (counter_id.handle & 0xFFFF) to fit in 16-bit storage - Output generators (CSV, JSON, Perfetto) use agent-encoded IDs for consistency - Updated counter_config.cpp and metrics.cpp to extract base metric ID when needed - All counter lookups now properly handle agent-encoded vs base metric IDs This ensures counter IDs are consistent between metadata and output records while maintaining compact storage in instance records. |
||
|
|
990946e956 |
[SDK] Fix null handles (#474)
* Fix null handle
- use .handle=0, not .handle=numeric_limits<>::max()
* Update lib.common.hasher
* Fix ROCPROFILER_CONTEXT_NONE
* Use context operator==
* Update CHANGELOG
* Updated null handle for scratch memory and changed allocation test so that free ops account for null agent
---------
Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
7243889d6a |
Add perfetto support for scratch memory (#303)
* Add perfetto support for scratch memory
* Updated tests and docs.
* Update docs data
* Added underflow check
* Record all free events to 0 bytes
* Add format
* Address review comment
* updated tests for scratch memory
* update scratch-memory tests.
[ROCm/rocprofiler-sdk commit:
|
||
|
|
fbf17a42d4 |
[SWDEV-516561][1/2] Add MARKER_RANGE_EXTENT to capture ROCTX ranges (#363)
* [SWDEV-516561][1/2] Add MARKER_RANGE_EXTENT to capture ROCTX ranges
Range extent to capture all work between roctxpush/pop operations. Entry callback takes place during roxtxpush and exit callback takes place in roctxpop. This is primarily to allow us to keep an ancestor id on the ancestor stack such that all operations that take place within the push/pop context can be annotated as being apart of this range. With the current setup (where push and pop are two separate operations that need to be combined externally), we cannot keep an ancestor id on the stack and thus cannot tie tracing events to particular ranges.
Correlation id information is inherited from the push operation. Ancestor id needs to be added in a future commit that also outputs this ancestor to CSV.
Output:
```
[ctest] {'size': 64, 'kind': 7, 'operation': 1, 'correlation_id': {'internal': 1525, 'external': 0, 'ancestor': 1524}, 'start_timestamp': 2932551479402642, 'end_timestamp': 2932551491178449, 'thread_id': 3254861}
[ctest] {'size': 64, 'kind': 8, 'operation': 2, 'correlation_id': {'internal': 1525, 'external': 0, 'ancestor': 1524}, 'start_timestamp': 2932551479405878, 'end_timestamp': 2932551491181214, 'thread_id': 3254861}
```
Note: Kind 8 = range extent op.
* Merge fix
Revert several changes
source/lib/rocprofiler-sdk/marker/range_marker.*
- separate out range marker implementation for standard marker implementation
Update public API with marker core range
Support marker core range in sdk (source/lib/rocprofiler-sdk)
Transition rocprofiler-sdk-tool and output lib to use marker core range
Misc fixes for tests
Fix logic in lib/output/generate{CSV,Stats}.cpp
Update tests/rocprofv3/tracing-hip-in-libraries (marker validation)
Fix test_otf2_data
* Test fixes
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
b461671093 |
Modified perfetto output for HIP stream display (#431)
* Modified perfetto output for HIP stream display
* Moved stream_map file location and changed perfetto output names Private_Segment_Size and Group_Segment_Size to Scratch_Size and LDS_Block_Size respectively
* Used const_cast to remove const modifier on void*
* Reverted stream_map changes, now using tool_metadata map to track mapping between stream ptrs and stream IDs
* Removed buffer tracing args in perfetto, added tool_...hip buffer record struct that stores the HIP stream ID for display purposes
* Updated rocpd perfetto.cpp to reflect stream changes. Still need to add vgpr values and stream ID for HIP API
* Changes pass-by const reference to pass-by const value
[ROCm/rocprofiler-sdk commit:
|
||
|
|
af8650c014 |
Fix calculation for counter collection in perfetto (#395)
* Fix counter values acuumulation
* Updated change log
* Fix warnings
* Address comment
[ROCm/rocprofiler-sdk commit:
|
||
|
|
b097e276a9 |
[rocprofv3] Add rocpd output support (part 1: prelude) (#401)
* [rocprofv3] Add rocpd output support (part 1: prelude)
- git submodules for sqlite3, GOTCHA, and pybind11
- HIP stream data
- rocprofiler_query_intercept_table_name(...)
- serialization load
- rocprofiler::sdk::get_perfetto_category(KindT)
- rocprofiler::sdk::parse::strip
- common library updates
- md5sum
- hasher
- simple_timer
- static_tl_object
- get_process_start_time_ns(pid_t)
- output library updates
- node_info
- file_generator (generator is now virtual base class)
- stream info updates
* Added submodules
* Code review updates
* Minor unused-but-set-X warning fixes
* Update CI
- install libsqlite3-dev package
* Update CI
- install libsqlite3-dev package
* Fix static thread-local object memory leak
- also fix signal handler chaining
* Remove URL from comment
* Remove page migration exception
* Enable ROCPROFILER_BUILD_SQLITE3 by default
- try find_package(SQLite3) first and then build when ROCPROFILER_BUILD_SQLITE3=ON
* Fix gotcha installation
- make install of target optional
* Validate tracing + counter collection dispatch data
- i.e. correlation ids, thread ids, timestamps
* Make find_package(SQLite3) optional
- ROCm CI does not have SQLite3 dev package installed and cannot build from source (missing tclsh)
* Fixes to tracing + counter collection test
* get_process_start_time_ns update
- original implementation did not work
* Fix pytest-packages test_perfetto_data for counter collection
- erroneous failure when used with same PMC + multiple agents
* cmake policy: option() honors normal variables
- for GOTCHA submodule
* Improve samples/api_buffered_tracing stability
- reduce likelihood of sporadic exception throw
* Update gotcha submodule
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
24f054f509 |
Fix HIP Streams Duplication Error (#313)
* Fix stream duplication and fixed tests
* Added comments to explain stream.cpp code, change stream nullptr check to occur in update table to prevent readding null stream, simplified hip-streams bin file code, add destroyStreams to hip-streams bin file code
* Removed roctx from CMakeLists.txt
* Updated documentation
* Fix documentation
* Removed update_table for HIP compiler table and updated stream.cpp to remove support for HIP compiler table
* Added runtime initialization check for HIP
* Changed tool name, working on fixing memory management
* Added context for counter collection kernel rename combination
* Changed name from map to set and changed description
* Fix documentation description for group-by-queue
* Merged memory copy and kernel operations onto a single track when on the same stream
* Updated perfetto output to remove hardware information from track name to merge all memory copy and kernel operations on the same stream to the same track:
* Most pr comments addressed
* Added filter for counter collection and removed kernel buffer tracing hack
* Added PR comment fixes
---------
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
5467c83188 |
rocDecode Buffer Tracing Support (#315)
* Added buffer tracing support for rocdecode and updated tests to work with buffer tracing
* Updated perfetto to output args individually rather than as a string list
* Updated docstrings and operation type, changed OTF2 code to remove warning due to change in operation type
* Updated tests for review comments
* Test args exist and return value
* Updated to use string entry
* Change function name
* Updated PR to reflect review comments
* Updated for PR review comments
* Change function name
[ROCm/rocprofiler-sdk commit:
|
||
|
|
8f891cdcc8 |
[SDK][rocprofv3] Buffer tracing records with args (HIP) (#285)
* [SDK][rocprofv3] HIP API buffer records with args (ext)
- New buffer tracing domain(s) for HIP APIs which include the arguments and the return value in the buffer records
- Update HIP stream support for extended HIP buffer tracing
- Update rocprofv3 tool library and output library to use extended HIP buffer tracing recods
* Update stream.cpp
- handle hipStream_t address being reused for a new stream
* Update doxygen docs for rocprofiler_iterate_buffer_tracing_record_args
* Update rocprofv3 tool.cpp
- configure buffer tracing services with HIP_*_API_EXT variants
- tweak logging level for hip_stream_display_callback
* Fix validation tests
- add HIP_RUNTIME_API_EXT and HIP_COMPILER_API_EXT to valid domain names
* Serialization support for buffer tracing args
* Disable stream service for __hipPopCallConfiguration
- this is interpreted as a stream create but it doesn't create a stream
* Fix execute_buffer_record_emplace for HIP extended contexts
* Add uint64_t_retval to rocprofiler_hip_api_retval_t union
- reading in hipError_t_retval during serialization of pointer return value causes undefined behavior
* Fix compilation warning about unused but set parameter
- in hip/stream.cpp
* Add synchronization for async_copy_data
* Fix compilation error
* Fix compilation error
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
fbcb5025e7 |
[SDK] Add Stack IDs (#269)
* Add Stack IDs
* Add memcpy test
* Add async corr id record
* Async events use `rocprofiler_async_correlation_id_t`
* Sync events use `rocprofiler_correlation_id_t`
* Update ATT to use asnyc IDs
* Review comments
[ROCm/rocprofiler-sdk commit:
|
||
|
|
b2c0f91aef |
Add perfetto support for counter collection
Fix endtimestamp for counter tracks
Add fix for rocprofv3 counter collection tests
Fix formats and refactors
Added docs and addressed review comments
Address more review comments.
[ROCm/rocprofiler-sdk commit:
|
||
|
|
7aeaffd871 |
HIP Streams to Queues Translation (#235)
* rocprofiler_stream_id_t: opaque handle for a stream
- e.g. HIP stream
- the same HIP stream may map to different HSA queues at different points in the application
- added to:
- rocprofiler_buffer_tracing_hip_api_record_t
- rocprofiler_buffer_tracing_memory_copy_record_t
- rocprofiler_callback_tracing_hip_api_data_t
- rocprofiler_callback_tracing_memory_copy_data_t
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Mark Meserve <mark.meserve@amd.com>
Co-authored-by: Elwazir, Ammar <Ammar.Elwazir@amd.com>
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Jakaraddi, Manjunath <Manjunath.Jakaraddi@amd.com>
Co-authored-by: Bhardwaj, Gopesh <Gopesh.Bhardwaj@amd.com>
Co-authored-by: Nagaraj, Sriraksha <Sriraksha.Nagaraj@amd.com>
Co-authored-by: U, Srihari <Srihari.U@amd.com>
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>
Co-authored-by: Benjamin Welton <ben@amd.com>
Co-authored-by: Indic, Vladimir <Vladimir.Indic@amd.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
864a9c328d |
Adding agent-index (#189)
* Adding agent-index
* review changes
* review comments addressed
* minor fix
* fix CI failure
* review comments
* Fix agent index test and address review comments
* Build Fixes
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
be053cb10b |
Temporarily Fix Incorrect Kernel Perfetto Trace Duration due to Firmware Timestamp Bug (#134)
* Perfetto duration temp fix setup
* Add timestamp change amounts to ROCP Info
* Groups kernel dispatch info by agent and queue id before sorting. Midpoint interpolation is then performed on the sorted kernels
* Moved dispatch bins into the for-loop
* Fix compilation error by using const ref
* Modified for review comments
* Changed variable names
[ROCm/rocprofiler-sdk commit:
|
||
|
|
74efdad1aa |
rocJPEG API Tracing (#73)
* rocDecode API Tracing support
* Test bin file added to rocdecode. Need to add validate python methods
* Added option to not make rocDecode tests
* Added rocdecode and rocprofv3 tests
* Added csv test
* Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependancy. Changed cmake option to disbale tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI
* Add option to avoid building rocdecode tests
* Added option to avoid building rocdecode bin file
* Support for rocJPEG API Trace
* Added newline to rocjpeg_version.h
* json-tool code added, initial test/bin commit
* Formatting
* Resolved rocjpeg bin test compilation errors
* Tests implemented. Perfetto module currently resulting in errors, so need to retest whenever it is fixed
* Formatting and compilation errors
* Minor fixes
* Copyright year update and minor fixes
* Doc update fix
* Added rocjpeg csv file in data
* Addresses review comments: Updated fixed Findroc.. and uses root directory as a hint, fixed documentation error, changed tables to use _CORE, minor style fixes
* Added rocdecode and rocjpeg to CI
* Removed rocdecode and rocjpeg from CI and added back build tests option
* Updated Cmake Files
* Added rocDecode and rocJPEG to CI
* Remove cmake line added in error
* Temporarily modified tests to pass if rocdecode or rocjpeg tracing are not supported for CI, cmake changes
* Added find_package for test
* Added back use of system rocDecode and rocJPEG, modifies system files to include prefix path
* Updated no-link to include INCLUDE_DIR/roc(decode|jpeg), added comments for tests
* Resolve merge conflicts and formatting
* Added regex find and replace instead of include for CI
* VAAPI package causing errors on Vega20
* Removed system rocjpeg and rocdecode use temporarily until cmake issues resolved
* Removed workflows regex
* Formatting and minor test modification
* Modified test for vega20
* Update rocDecode and rocJPEG cmake and tests
* Changelog
* Fix merge conflict
* Added back if-statements around add-tests since cmake-generator-expressions are resulting in errors when the packages are missing
* Removed if found statements, replaced with TARGET:EXISTS
* Skip json file for rocjpeg and rocdecode tests if not supported
* Add os import
---------
Co-authored-by: Kandula, Venkateshwar reddy <Venkateshwarreddy.Kandula@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
3a26de9e53 |
Memory Allocation Counter Track Shows Total Allocation (#71)
* Counter track for memory allocation is now a running sum showing total allocation
* Address review comments
* Update source/lib/output/generatePerfetto.cpp
Co-authored-by: Meserve, Mark <Mark.Meserve@amd.com>
* Updated to reflect review comments
* Fix compilation errors on CI
* remove braces on scalar
* Fix struct compilation issues
* Removed name_to_id for sanitizer
---------
Co-authored-by: Meserve, Mark <Mark.Meserve@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
908862e45e |
Initialize extremes to max and min values (#184)
* Initialize extremes to max and min values
* Address review comment
* Adding clang format
[ROCm/rocprofiler-sdk commit:
|
||
|
|
edb51fc861 |
update copyright date to 2025 (#102)
* Update LICENSE
* Update conf.py
* Update copyright year
* [fix] Update copyright year
* Update copyright year "ROCm Developer Tools"
* Add license headers to c++ files
* Add license to *.py
* Update licenses in rocdecode sources
---------
Co-authored-by: srawat <120587655+SwRaw@users.noreply.github.com>
Co-authored-by: Mythreya <mythreya.kuricheti@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
3e1b8ba4ec |
rocDecode API Tracing Support (#49)
* rocDecode API Tracing support
* Test bin file added to rocdecode. Need to add validate python methods
* Added option to not make rocDecode tests
* Added rocdecode and rocprofv3 tests
* Added csv test
* Address PR comments. Changed tests to use built-in rocstreambit decoder to remove ffmpeg dependancy. Changed cmake option to disbale tests rather than not build them. Tests work locally, but will fail until rocDecode is built with tracing enabled on CI
* Add option to avoid building rocdecode tests
* Added option to avoid building rocdecode bin file
* Merge conflict error
* CMake files changed in response to review comments. Attempting to implement callbacks.
* Turned off test building for rocdecode
* Minor fixes for review comments
* Review comments
* Updated formatting
* Document changes and format.hpp reversion. Need to remove iterate args support for now for later update.
* Remove iterate args support
* Remove iterate-args
* enforce abi versioning in macro if
* Fix doc error
* removed spaces to fix indentation error
---------
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
792329fefd |
SWDEV-492625 memory free functions (#11)
* SWDEV-492625: Track free memory HSA functions to help determine total amount of memory allocated on the system at any one time
* Minor fixes to address comments
* Update allocation size description
* Moved get function back to specialization, minor typo fixes
* Removed memory_operation_type field, removed memory_pool allocation enum, converted starting address to hex string for json format.
* Made conversion to hex_string a function, changed address to use union rocprofiler_address_t type, changed VMEM descriptors
* Removed as_hex from the global namespace
* Formatting
* Removed TRACK_EVENT for memory allocation, now TRACK_COUNTER for memory allocation is being performed
* Check if address was recorded before retrieving allocation size in generate Perfetto
* Formatting
* Update source/lib/output/generatePerfetto.cpp
* Explicitly disable app-abort tests
* Remove excluding app-abort test from workflow CI
- redundant bc these tests are explicitly marked as disabled now
---------
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
94f4f56c40 |
Memory Allocation Tracking (#1142)
* Initial commit: Need to implement wrapper function to collect data and test that wrapper function is correctly replacing core HSA functions
* Attempted to implement wrapper implementation for hsa memory allocation functions. Need to modify generate record files and test if implementation is working as expected
* Debugging and implementing generateCSV function
* Memory allocation size and starting address outputted to csv and json file formats
* Formatting
* Initial setup for OTF2 and Perfetto generation
* Collecting agent id for memory_allocation and formatting
* Modified memory_allocation.cpp to set up code for AMD_EXT commands
* Support for memory_pool_allocate added
* Removed accidently added file
* Made flag optional and added more OTF2 and Perfetto code. Needs testing to ensure perfetto and OTF2 works
* Formatting
* Fixed perfetto and otf2 output
* Fixed flag issue due to incorrect buffer use
* Updated documentation
* Small cleaning and comments
* Added test for HSA memory allocation tracing
* Fixed summary test validation errors due to allocation tracing. Added type to location_base to create unique event ids for allocation due to OTF2 trace error
* Decreased lower limit of hip calls for test
* Modified summary tests to vary number of allocate requests
* Minor fixes to address comments. Still need to address OTF2 comments
* Fix docs and changed OTF2 to use enum for type specified in location_base construction
* Fixed schema error
* Added vmem command tracking. Need to add test
* Updated test to work with vmem command and updated generateCSV to output int instead of hex string.
* OTF2 enum update and mispelling fix
* CI does not support Virtual Memory API. Removed vmem test. Will add back if CI is modifed to suport vmem API
* Update CMakeLists.txt for memory allocation test
* Updated summary test
* Minor fixes to address comments
* Moved domain_type.hpp enum to before LAST
* Fixed compile errors and formatting
* Fixed stats summary domain name error
* Added rocprofv3 test
* Page migration test fix
* Undo page migration test changes. Failures do not appear to have to do with memory allocation
[ROCm/rocprofiler-sdk commit:
|
||
|
|
c17952fd23 |
rocprofv3: refactor and reorganize rocprofiler-sdk-tool library (#1138)
* Add rocprofv3-multi-node.md to source/lib/rocprofiler-sdk-tool
* Initial source re-organization
- create "output" static library
* Update include/rocprofiler-sdk/cxx/serialization.hpp
- add GPR count fields to kernel symbol serialization
* Add source/scripts/generate-rocpd.py
- reads one or more JSON output files from rocprofv3 and writes rocpd SQLite3 database
- Note: preliminary implementation
* More reorganization b/t lib/rocprofiler-sdk-tool and lib/output
* Updates to generate-rocpd.py
- add SQL views
- option: --absolute-timestamps -> --normalize-timestamps
- option: --generic-markers
- misc fixes with regards to getting the views working
- support marker names
* Update generate-rocpd.py
- Add --marker-mode option
* Update generate-rocpd.py
- Improve debugging of bad bulk SQLite statements
* Update rocprofv3-multi-node.md
- cleanup of proposed SQL schema
* lib/output/format_path.{hpp,cpp}
- rename format to format_path (in config.hpp and config.cpp)
- move format_path functionality to format_path.{hpp,cpp}
* Rework lib/output/tmp_file_buffer.{hpp,cpp}
* Update output_key.cpp
- support %cwd%, %launch_date%
* Rework lib/output/buffered_output.hpp
* Support csv_output_file constructed via domain_type
* Update lib/output/domain_type.{hpp,cpp}
- get_domain_trace_file_name
- get_domain_stats_file_name
* Update lib/rocprofiler-sdk-tool/tool.cpp
- tweak headers
* Update lib/output/generate*.cpp
- remove include of helpers.hpp
- CSV uses domain_type for filenames
* Update samples/counter_collection/per_dev_serialization.cpp
- make wait_on volatile
* Remove tool_table from lib/output and lib/rocprofiler-sdk-tool
- Also split various structs into their own files
- lib/output/agent_info
- lib/output/metadata
- lib/output/kernel_symbol_info
- lib/output/counter_info
- Implemented rocprofiler::tool::metadata
* Optimize rocprofiler_tool_counter_collection_record_t
- reduce the size of the struct from 24784 bytes to 8376 bytes
* Introduced output_config
- split subset of config (from tools library) into output_config to be able to configure the output generating functions separately from the tool library
- this is a significant step towards the output generating functions not relying on static global memory
* Stream chunks of data into output instead of loading all info memory
* Remove duplicate group_segment_size in rocprofiler_kernel_dispatch_info_t serialization
* Adding Q&A to rocprofv3-multi-node.md
* Remove all remaining include lib/rocprofiler-sdk-tool from lib/output
- migrated a fair amount of code from lib/rocprofiler-sdk-tool/helper.hpp to lib/output
* Update Q&A of rocprofv3-multi-node.md
* Fix minor compilation errors + minor cleanup
* Update hsa/async_copy.cpp
- when ROCPROFILER_CI_STRICT_TIMESTAMPS > 0, reduce the active_signal sync wait time
* Update profiling_time.hpp
- fix log messages for when start/end time is less/greater than enqueue/current CPU time
* Fix generate_stats for tool_counter_record_t
* Dictionary optimization for generate-rocpd.py
---------
Co-authored-by: SrirakshaNag <104580803+SrirakshaNag@users.noreply.github.com>
[ROCm/rocprofiler-sdk commit:
|