develop
142 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
1517a398bf |
[rocprofiler-sdk] Buffer finalization fixes and HSA ABI 0x09 support (#2318)
* [rocprofiler-sdk] Fix buffer flush ordering and sanitizer CI improvements Buffer Pool Design ------------------ Replace the fixed array-based double buffer with a dynamic pool design to fix race conditions that caused "internal correlation id was retired prematurely" errors. The original design had a race where flush callbacks could be delivered out-of-order: when buffer 0 fills and begins flushing, writes go to buffer 1. If buffer 1 fills before buffer 0's flush completes, the buffer index wraps back to 0 (which may still be flushing). Independent flush tasks submitted to the thread pool can complete out of order. The new pool design: - Uses a std::deque of buffer instances that grows as needed - Allocates buffers from the pool when the current buffer needs to flush - Serializes flushes with a mutex to ensure FIFO callback ordering - Returns buffers to the pool after flush completion - Eliminates the race between buffer selection and write operations New Unit Tests -------------- - buffer_correlation_ordering.cpp: Tests that API records are always delivered before their corresponding retirement records - buffer_ordering_stress.cpp: Stress tests buffer flush ordering under high contention with multiple threads rapidly filling buffers HSA Tool Hooks -------------- Added hsa_tool_hooks.cpp/hpp to register an HSA OnUnload callback that waits for pending flush tasks before tool finalization, preventing "retired prematurely" errors during HSA shutdown. 
Sanitizer Improvements ---------------------- - LSAN: Set fast_unwind_on_malloc=1 to prevent deadlock in libgcc unwinder - LSAN: Added suppressions for external tools (liblzma, liblsan, seq, strdup) - TSAN: Added suppression for false positive on C++11 thread-safe static initialization in create_write_functor - ASAN/UBSAN: Added patterns for known issues in HSA runtime, HIP, perfetto - Disabled attachment tests for sanitizers due to library preloading issues Other Fixes ----------- - Thread-trace agent test: Use heap-allocated callback state - Correlation ID: Refactored reference counting and finalization ordering * [rocprofiler-sdk] Revert buffer pool design changes Revert buffer.cpp and buffer.hpp to the original double-buffer design from develop branch. The pool-based redesign introduced concerns about: - Signal safety (mutex vs atomic_flag) - API changes (flush() return type) - Complexity of the new design This revert removes: - Dynamic buffer pool with std::deque - std::mutex/condition_variable synchronization - buffer_correlation_ordering.cpp test - buffer_ordering_stress.cpp test The underlying buffer flush ordering issue will need to be addressed with a different approach that preserves the original API and synchronization characteristics. 
* [rocprofiler-sdk] Consistent fini_status checks to prevent correlation ID creation during finalization - Revert TOCTOU CAS loop change in sub_ref_count() - not needed with consistent checks - Add fini_status check in correlation_tracing_service::construct() with ROCP_CI_LOG warning - Add nullptr checks at all construct() call sites (queue.cpp, async_copy.cpp, memory_allocation.cpp) - Change all 'get_fini_status() > 0' to '!= 0' for consistent behavior: - hsa/queue.cpp (lines 105, 210) - hsa/async_copy.cpp (line 344) - hsa/hsa_barrier.cpp (line 43) - buffer.cpp (lines 107, 138, 185) This ensures no correlation IDs are created once finalization starts (fini_status != 0), preventing races between finalization and ongoing tracing operations. * [rocprofiler-sdk] Replace arrival-order checks with timestamp-based temporal validation Buffer records are not guaranteed to arrive in any specific order. Tests and samples should use timestamps for temporal ordering validation instead. Changes: - samples/external_correlation_id_request: Replace 'retired prematurely' arrival order check with timestamp-based validation that retirement timestamp >= max(end_timestamps) for records with the same correlation ID - tests/external_correlation.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check - tests/registration.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check - tests/roctx.cpp: Remove EXPECT_GT(corr_id, last_corr_id) check Correlation IDs are not guaranteed to be monotonically increasing when records are sorted by timestamp. Temporal ordering should be validated using the timestamp fields in each record. * [rocprofiler-sdk] Revert external/CMakeLists.txt SYSTEM keyword removal Restore the SYSTEM keyword to target_include_directories for rocprofiler-sdk-fmt to match develop branch. 
* [rccl] Remove orphaned rocSHMEM gitlink Remove orphaned submodule reference that was introduced during a merge but never had a corresponding .gitmodules entry, causing CI failures with "fatal: no submodule mapping found in .gitmodules". * [rocprofiler-sdk] Add HSA ABI version 0x09 support Add ABI checks for HSA_AMD_EXT_API_TABLE_STEP_VERSION 0x09 which introduces hsa_amd_counted_queue_acquire and hsa_amd_counted_queue_release functions (added in rocr-runtime SWDEV-561708). * [rocprofiler-sdk] Handle finalized status gracefully in buffer flush operations This commit consolidates fixes for handling the finalization status during buffer flush operations across the SDK. Changes: - Tool and samples: Handle ROCPROFILER_STATUS_ERROR_FINALIZED gracefully when flushing buffers, as this indicates buffers were already flushed during finalization (not an error condition) - HSA handlers (queue.cpp, async_copy.cpp, hsa_barrier.cpp): Use > 0 check for fini_status to allow operations during finalization process - buffer.cpp: Revert fini_status checks to use > 0 for consistency - correlation_id.cpp: Add fini_status > 0 check with ROCP_TRACE logging to prevent correlation ID creation after finalization starts Files modified: - source/lib/rocprofiler-sdk-tool/tool.cpp - tests/tools/json-tool.cpp - source/lib/rocprofiler-sdk/tests/registration.cpp - source/lib/rocprofiler-sdk/tests/roctx.cpp - samples/api_buffered_tracing/client.cpp - samples/counter_collection/buffered_client.cpp - samples/counter_collection/device_counting_async_client.cpp - samples/external_correlation_id_request/client.cpp - samples/pc_sampling/client.cpp - source/lib/rocprofiler-sdk/buffer.cpp - source/lib/rocprofiler-sdk/context/correlation_id.cpp - source/lib/rocprofiler-sdk/hsa/queue.cpp - source/lib/rocprofiler-sdk/hsa/async_copy.cpp - source/lib/rocprofiler-sdk/hsa/hsa_barrier.cpp * [rocprofiler-sdk] Remove hsa_tool_hooks and simplify buffer flush handling Remove the hsa_tool_hooks infrastructure and 
simplify buffer flush calls in samples and tools. The ERROR_FINALIZED handling was overly complex and the hsa_tool_hooks OnUnload synchronization is no longer needed. Changes: - Remove hsa_tool_hooks.cpp/hpp and related registration.cpp code - Simplify buffer flush calls in samples to use direct ROCPROFILER_CALL - Simplify buffer flush in tool.cpp and json-tool.cpp - Remove ERROR_FINALIZED special handling from test files Co-Authored-By: Claude <noreply@anthropic.com> * [rocprofiler-sdk] Fix output_stream move semantics to null source pointers The default move constructor and move assignment operator for output_stream did not null out the source's pointers after the move. This caused double-close when the moved-from temporary was destroyed, leading to use-after-free crashes (SIGSEGV in std::ostream::sentry). Co-Authored-By: Claude <noreply@anthropic.com> * [rocprofiler-sdk] Improve Perfetto trace writer and sanitizer configuration - generatePerfetto.cpp: Move output_stream into shared_state to prevent use-after-free race conditions during Perfetto callback execution - run-ci.py: Simplify and consolidate sanitizer environment variable configuration for better maintainability Co-Authored-By: Claude <noreply@anthropic.com> * [rocprofiler-sdk] Revert run-ci.py changes that broke sanitizer suppressions The previous changes removed MEMCHECK_SANITIZER_OPTIONS which is required for CTest to properly pass suppression files to the sanitizers during memcheck runs. Co-Authored-By: Claude <noreply@anthropic.com> * Revert "[rccl] Remove orphaned rocSHMEM gitlink" This reverts commit 1ad21003941355658fff8114fa27768f11a948f7. * [rocprofiler-sdk] Revert registration.cpp changes Revert changes to registration.cpp to match develop branch. 
Co-Authored-By: Claude <noreply@anthropic.com> * [rocprofiler-sdk] Remove suppression file content printing from run-ci.py Co-Authored-By: Claude <noreply@anthropic.com> * Fix output_stream move ctor/assignment operator * Fix erroneous revert of registration.cpp * Fix handling of fini status in correlation ID construction * [rocprofiler-sdk] Fix OMPT segfault during finalization Add nullptr checks in OMPT tracing code to handle the case where correlation_tracing_service::construct() returns nullptr during finalization. This fixes segfaults in openmp-target-sample and tests.integration.execute.openmp-tools. The correlation ID construction now returns nullptr when fini_status > 0, but the OMPT callbacks were not checking for this, causing crashes when dereferencing the null pointer during OpenMP runtime shutdown. Changes: - event_common(): Return nullptr early if correlation ID is null - event(): Check for nullptr before calling sub_ref_count() - ompt_task_create_callback(): Return early if correlation ID is null - ompt_task_schedule_callback(): Return early if correlation ID is null * [rocprofiler-sdk] Fix HSA API tracing segfault during finalization Add nullptr check in hsa_api_impl::functor after correlation ID construction. During finalization, correlation_service::construct() returns nullptr, and without this check the code would dereference the null pointer when accessing corr_id->internal. This fixes the SEGV at address 0x000000000008 (null + 8 byte offset) that occurs when HSA async event threads call hsa_signal_destroy during runtime shutdown after finalization has started. --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com> |
||
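Note on the `output_stream` move fix in 1517a398bf: the commit message says the defaulted move operations left the source's pointers intact, so the moved-from temporary closed the stream a second time. The following is a minimal sketch of that kind of fix, not the SDK's actual class. The member names, the `close_hits` counter (an illustrative test hook), and the ownership model are all assumptions made for the example.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <utility>

// Hypothetical stand-in for the SDK's output_stream: it owns a heap-allocated
// stream and closes (deletes) it exactly once.
struct output_stream
{
    std::ostream* stream     = nullptr;  // owned stream, deleted in close()
    int*          close_hits = nullptr;  // illustrative hook: counts real closes

    output_stream() = default;
    output_stream(std::ostream* s, int* hits)
    : stream{s}
    , close_hits{hits}
    {}

    output_stream(const output_stream&)            = delete;
    output_stream& operator=(const output_stream&) = delete;

    // Take ownership and null out the source so its destructor is a no-op.
    output_stream(output_stream&& rhs) noexcept
    : stream{std::exchange(rhs.stream, nullptr)}
    , close_hits{std::exchange(rhs.close_hits, nullptr)}
    {}

    output_stream& operator=(output_stream&& rhs) noexcept
    {
        if(this != &rhs)
        {
            close();  // release whatever this object currently owns
            stream     = std::exchange(rhs.stream, nullptr);
            close_hits = std::exchange(rhs.close_hits, nullptr);
        }
        return *this;
    }

    ~output_stream() { close(); }

    void close()
    {
        if(!stream) return;  // moved-from or already closed
        stream->flush();
        delete stream;
        stream = nullptr;
        if(close_hits) ++(*close_hits);
    }
};
```

With a defaulted move constructor both objects would hold the same raw pointer and each destructor would delete it; here `std::exchange` leaves the source empty, so only the destination closes the stream.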
|
|
7fcea905f3 |
[rocprofiler-sdk] Fix double-buffering emplace and flush synchronization (#2334)
* Fix buffer tracing synchronization lock - PR #529 (in rocprofiler-sdk-internal) introduced waiting on the syncer flag when emplacing in a buffer, to prevent overwriting buffer records that are currently being processed in a buffer flush callback - That fix blocked both buffers while a buffer flush callback was executing, instead of blocking only the buffer being flushed. * Add rocpd tests for duplicate records * Address code review comments |
||
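Note on 7fcea905f3: the message describes blocking only the buffer that is being flushed rather than both buffers. The sketch below models just that gating decision with one flag per buffer. It is an assumption-laden illustration, not the SDK's buffer code: the class and method names are hypothetical, records are plain `int`s, and the storage itself is not safe for concurrent writers.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical double buffer: one "flushing" flag per buffer, so a flush of
// buffer 0 does not stop writes that target buffer 1.
class double_buffer
{
public:
    // Append to the active buffer. Fails only if *that* buffer is flushing.
    bool try_emplace(int record)
    {
        const std::size_t idx = m_active.load();
        if(m_flushing[idx].load()) return false;
        m_data[idx].push_back(record);
        return true;
    }

    // Mark the active buffer as flushing and switch writes to the other one.
    // Returns the index of the buffer handed to the flush callback.
    std::size_t begin_flush()
    {
        const std::size_t old = m_active.load();
        m_flushing[old].store(true);
        m_active.store(1 - old);
        return old;
    }

    // Called when the flush callback for buffer idx has finished.
    void end_flush(std::size_t idx)
    {
        m_data[idx].clear();
        m_flushing[idx].store(false);
    }

    std::size_t size(std::size_t idx) const { return m_data[idx].size(); }

private:
    std::array<std::vector<int>, 2>  m_data{};
    std::array<std::atomic<bool>, 2> m_flushing{};  // value-initialized to false
    std::atomic<std::size_t>         m_active{0};
};
```

A write is refused only when the active buffer wraps back onto one whose flush has not completed, which is the case the original syncer flag was meant to protect.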
|
|
a2288eb50b |
[rocprofiler-sdk] Install unit tests and helper functions for integration tests (#921)
* [rocprofiler-sdk] Install unit tests and helper functions for integration tests * Fix rocprofiler-sdk-tests-target export * Fix handling of cmake policy CMP0174 * Remove -vv from new pytest.ini files * add unit tests and integration tests. * add path to ci workflow. * misc. fixes. * pc sampling tests. * bug fixes. * pc sampling tests fix. * misc. * Update CMakeLists.txt * Update rocprofiler_config_install_tests.cmake, correct license name * fix units tests install issues. * fix counters_def file path. * fix bug, arg shifting. * vendor pytest-cmake. * cmake config fix. missing endfunction() * disable tests, 1.rocprofv3-trace-hip-libs. 2.kernel-tracing. 3.external_correlation 4.rocpd. * disable buffered-tracing test and remove pytest-cmake from requirements.txt. * disable hip-graph-tracing test. * fix building standalone tests to load rocprofiler-sdk cmake package first and then find rocprofiler_sdk_pytest module. * addressed comments: 1.add local bin path to code cov workflow. 2.add to cmake prefix path local bin. 3.use ROCPROFILER_MEMCHECK_PRELOAD_ENV_VALUE 4.misc. fix * enabled back tests api_buffered, external_correlation_id, hip-graph, kernel-tracing, rocpd, tracing-hip-in-libraries. and misc fixes(formating, extra fixtures for agent-index tests.) * cpack to use llvm bin for .hsaco debug symbols. * psdb tests fixes. * EOL. * misc. fixes and Disable api_buffered_tracing, external_correlation_id, hip-graph-tracing, kernel-tracing, rocpd, summary, tracing-hip-libraries, tracing-plus-counter-collection. * fix incorrect cmakelists file. * strip smallkernel.bin * format. * revert disabled tests commit. * misc. fix in counter tests. * misc. * search codeobj unit test assets in curr bin and install bin. * refactor newly added rocpd tests. * modify tests for newly added hip-host-tracing. * add LD LIB path to units, psdb is failing due to libs not being found. 
--------- Co-authored-by: Venkateshwar Reddy Kandula <venkateshwar.kandula1306@gmail.com> Co-authored-by: Venkateshwar Reddy Kandula <Venkateshwarreddy.Kandula@amd.com> Co-authored-by: JeniferC99 <150404595+JeniferC99@users.noreply.github.com> |
||
|
|
061948a5ec |
[rocpd] Adding merge and package submodules for rocpd (#164)
* adding ROCpd database merge
* adding ROCpd database merge concatenating all tables
* update merge script
- copy all tables from files
* fix merge format
* Add package submodule, initial POC. Need to refine
* Minor fixes and clean up duplicated code in package.py
* Revamp metadata layout, add wildcard and .rpdb parsing
* Add auto merge & package when > 5 DBs, add examples, don't use auto_merge when using sub-commands merge & package
* - Extend package/yaml inputs to all rocpd modules
- Improve handling more corner cases for bad input files when parsing input parameters (bad yaml files, bad .rpdb folder, folders as input)
- Changed to use UUID in merged filename instead of the time, in auto-merge algorithm
* Minor text fixes for consistency between modules
* Add more wildcard support and add package, merge tests
* Make changes based on review suggestions
* Move parsing packages into importer.py, simplified adding required params to a function
* fix package test by flattening input list before processing
* Integrate merge.py changes from Jonathan to add name-collision checks, recreating indexes, foreign key check (disabled for now, due to processing time)
* Rework rocpd.<submodule>.{add_args,process_args}
- add_args function returns a functor which accepts input and args
- time_window functor returned from add_args automatically applies time windowing of input
* change merge&package limit to 1, merge should create data views
* Move files by default instead of making copies
- copying can be enabled by passing "copy=True" or --copy cmdline argument
* refactor package to make the logic cleaner, set merge limit back to 5
* Allow automerge-limit param to override limit, change default back to 1. Tests updated to use query, much quicker
* Update --help instructions for package
---------
Co-authored-by: acanadas <acanadas@amd.com>
Co-authored-by: a-canadasruiz <Araceli.CanadasRuiz@amd.com>
Co-authored-by: Young Hui <young.hui@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
|
||
|
|
11d12a82fb |
rocprofiler-sdk: attach: fix test permissions (#1528)
* attach: fix test permissions - Test is now skipped if insufficient permissions detected - Should fix test (for now) in Azure CI pipeline - Add more extensive permission checking for the tests - Add default parameters to prevent running rm -rf on a root directory - Add use for unused LOG_LEVEL parameter |
||
|
|
06bf110c84 |
Adding counters support for strix halo (#1358)
* Adding counters support for strix halo * Updated counters list * Added missing counter info * Updated arch support |
||
|
|
d496bcef18 |
Fix dimension mismatch for multi-GPU systems with identical architect… (#1440)
* Fix dimension mismatch for multi-GPU systems with identical architectures This change addresses an issue where counter dimensions were incorrectly shared across all GPU agents with the same architecture name, even when those agents had different hardware configurations (e.g., different CU counts). Changes: - Updated getBlockDimensions() to accept agent ID instead of architecture name - Made dimension cache agent-specific instead of architecture-specific - Updated set_dimensions() in AST evaluation to use specific agent ID - Modified all API functions to handle agent-specific dimension lookups - Updated tests to work with agent-specific dimensions This fix ensures that dimensions accurately reflect the actual hardware configuration of each individual GPU agent, preventing dimension mismatches in multi-GPU systems where GPUs share the same architecture but have different physical configurations. Counter ID Representation Changes: - Modified counter_id encoding to include agent information in bits 37-32 - Agent logical_node_id is encoded as (value + 1) to ensure agent 0 is detectable - Counter records internally store only 16-bit base metric IDs (bits 15-0) - Tool reconstructs agent-encoded counter IDs from base metric ID & agent info - Instance record counter_id field uses bitwise AND mask to extract base metric ID (counter_id.handle & 0xFFFF) to fit in 16-bit storage - Output generators (CSV, JSON, Perfetto) use agent-encoded IDs for consistency - Updated counter_config.cpp and metrics.cpp to extract base metric ID when needed - All counter lookups now properly handle agent-encoded vs base metric IDs This ensures counter IDs are consistent between metadata and output records while maintaining compact storage in instance records. |
||
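Note on the counter ID representation in d496bcef18: the message gives the layout as base metric ID in bits 15-0 and the agent's `logical_node_id + 1` in bits 37-32. The helpers below are a sketch of that bit arithmetic under exactly those stated assumptions. The function names are hypothetical and are not the SDK's API, and bits outside the two fields are simply left at zero.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout (from the commit message):
//   bits 15-0  : base metric ID
//   bits 37-32 : agent logical_node_id + 1 (0 means "no agent encoded")
constexpr std::uint64_t base_metric_mask = 0xFFFF;
constexpr std::uint64_t agent_shift      = 32;
constexpr std::uint64_t agent_mask       = 0x3F;  // six bits

// Build an agent-encoded handle from a base metric ID and an agent index.
// With six bits and the +1 offset, logical_node_id must be at most 62.
constexpr std::uint64_t
encode_counter_id(std::uint16_t base_metric, std::uint32_t logical_node_id)
{
    return (static_cast<std::uint64_t>(base_metric) & base_metric_mask) |
           (((static_cast<std::uint64_t>(logical_node_id) + 1) & agent_mask)
            << agent_shift);
}

// Extract the 16-bit base metric ID (the handle & 0xFFFF step).
constexpr std::uint16_t
base_metric_id(std::uint64_t handle)
{
    return static_cast<std::uint16_t>(handle & base_metric_mask);
}

// True if the handle carries agent information.
constexpr bool
has_agent(std::uint64_t handle)
{
    return ((handle >> agent_shift) & agent_mask) != 0;
}

// Recover the agent index. Only meaningful when has_agent(handle) is true.
constexpr std::uint32_t
agent_logical_node_id(std::uint64_t handle)
{
    return static_cast<std::uint32_t>(((handle >> agent_shift) & agent_mask) - 1);
}

static_assert(encode_counter_id(0x0123, 0) == 0x100000123ULL,
              "agent 0 is stored as 1 so it stays detectable");
static_assert(base_metric_id(encode_counter_id(0x0123, 0)) == 0x0123, "");
```

Storing `value + 1` is what lets agent 0 be told apart from a bare base metric ID, whose agent field is zero.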
|
|
e7a26594b7 |
[rocprofiler-sdk] Fix Stream ID Error for Attachment (#1142)
* Changed stream error warning, remove regex search from attach execute test * Formatting * Revert accidental change * Fix stream hang error due to grabbing same lock twice * Updated add stream code, need to update tests * Update attachment tests to use streams, threads, and multiple devices * Update tests and fix stream issues * Updated error messages to be more explicit, updated json to csv code in conftest to include streams and threads * Formatting * Add attachment label to attachment tests and update validation to fix errors * Fix attach twice conftest * Disabled thread san tests for attachment since they no longer work with bin file changes * Updated for comment * Added null check for getting attach status |
||
|
|
952d1dabe2 |
[ROCProfiler-SDK][ROCR] HSA New API changes for HSA_AMD_EXT_API_TABLE_STEP_VERSION 8 (#1182)
* add new hsa ext api for version 8. * use fmt instead of ostream. * override rccl from therock * Update rocprofiler-sdk-continuous_integration.yml * Update rocprofiler-sdk-continuous_integration.yml * Update rocprofiler-sdk-continuous_integration.yml * enable rocr-build * format * disable att consecutive-kernels tests. * Enable ROCR build in code coverage workflow --------- Co-authored-by: Venkateshwar Reddy Kandula <venkateshwar.kandula1306@gmail.com> |
||
|
|
abd6029603 |
[rocprofv3] MultiKernelDispatch ATT support (#774)
* Initial consecutive kernel WIP * Updated logic after discussion, create context only when needed, change set of captured ids to dispatch_id_t type * Updated to fix concurrency issues and revert kernel_iterations * Add captured id in first lock capture * Updated code to use wlock, added comments, removed some unnecessary atomic * Cleaned up, need to add test * Add test to check that generated stats csv file is not empty * Updated test to check if vector-ops kernels are being used * Fix phase bug * Updated for comments * Flattened ATT logic a bit * Fix incorrect if-statement * Fix merge conflict |
||
|
|
1e9d8abbf6 |
[rocpd] Convert to perfetto does not display scratch_memory correctly - SWDEV-542550 (#168)
Add scratch memory to pftrace generated with rocpd ---- Co-authored-by: Marko Crnobrnja <Marko.Crnobrnja@amd.com> Co-authored-by: Aleksei Tumakaev <atumakae@amd.com> Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com> |
||
|
|
9910705685 | Disable validation tests when execution test is disabled (#1060) | ||
|
|
3f001b0305 |
[rocpd] Refactor to use python to convert rocpd to CSV + add CSV tests + remove old cpp implementation (#159)
* Write agent info to CSV * Write kernel to CSV * Write memory copy to CSV * Write memory allocation to CSV * Write hip api to CSV * Write hsa api to CSV * Write marker api to CSV * Write counters to CSV * Write scratch memory to CSV * Write rccl api to CSV * Write rocdecode api to CSV * Write rocjpeg api to CSV * Remove info_process joins * Format agent id * Compose full file name is sql writer function * Add missing fields to kernel traces csv * Rename vgpr_count to arch_vgpr_count * Fix kernel name * Skip empty query results * Format csv.py * Delete c++ CSV writer * Add CSV header comparison test * Fix comment spacing in csv.py * Change ALLOC to ALLOCATE in memory allocation writer * Do not append trace to agent info file name * Revert changes for VGPR_Count * Fix csv validation test * Add sorting by guid * Use EXISTS to check query results are not empty * Merge API-specific queries * Optimize regions query * Column name mapping for agent info * Pass config to sql writer * Move agent id string building to a separate function * add titled_headers argument * Remove titled-columns argument * Improvements for regions csv * fix CSV validation test * improve CSV validation test * remove roctxMarkA from csv validation test * fix capability field titles in agent info * remove filter.py from query as that is still experimental * Remove some aliases, now that query will auto-title the column headers --------- Co-authored-by: Aleksei Tumakaev <atumakae@amd.com> Co-authored-by: Young Hui <young.hui@amd.com> Co-authored-by: Young Hui - AMD <145490163+yhuiYH@users.noreply.github.com> |
||
|
|
bf49039005 |
[rocprofiler-sdk][rocprofiler-register] Initial Attachment Support (#316)
* attach: milestone: API tracing - This pairs with another commit in rocprofiler-sdk to fully function - Add ptrace entry points for tool attachment - API tracing works at this commit - Queue tracing not supported yet * attach: cleanup - Remove hardcode for loading of tool library - Make invoke registration functions public again * attach: proxy queue first draft - Adds ability to trace with queues during attachment - Must be paired with updated rocprofiler-sdk * attach: prestore overhaul - Must be paired with commit in rocprofiler-sdk * attach: add dispatch table rework - Register will load the prestore library and provide entrypoints to sdk * attach: formatting and cleanup * attach: revise dispatch table scheme * attach: formatting * attach: milestone: API tracing - This change must be paired with a change in rocprofiler-register to fully function. - API tracing works at this commit - Queue tracing not supported yet * attach: cleanup and comments * attach: Formatting and crash fixes * attach: add attach duration - Add option attach-duration-msec for attachment * Formatting + sglang hang fix via signal handling * Changed FATAL_IF to DFATAL_IF for scratch_memory due to persistent crash when iterating queues * attach: proxy queue first draft - Adds ability to trace with queues during attachment - Must be paired with updated rocprofiler-register * Allow null agents for scratch output * attach: improve queue library interface - Significant changes to force exported interfaces back to C - Fixes bug with unknown agents at attachment - Code objects' names may still be incorrect * attach: add code_object support - Kernel traces will now have names and all other information for launches - Add capture of hsa_executable to the queue library - Various logging improvements * attach: rename queue library to prestore * attach: prestore overhaul - Must be paired with commit from rocprofiler-register - Massive overhaul of code organization in prestore library - Separates 
registrations for different object types - Sets up future changes for initialization * attach: add prestore dispatch table - Removes linkage to prestore library from sdk * attach: cleanup * attach: formatting * attach: fix input prompt not appearing * attach: fix component name in cmake * attach: revert change to export level * Make prestore API public * attach: update sdk attachment library WIP - This commit is NONFUNCTIONAL - Changes around structure to remove classes - Seperate C linkage where needed - Still needs updates to register for correct usage * attach: update register with dispatch table WIP - This commit is NONFUNCTIONAL - Changes rocprofiler_register to handle dispatch table from attach library. - Still needs changes in SDK with dispatch table usage * attach: dispatch table wip - This commit is NONFUNCTIONAL * attach: move attach component into core * attach: rename to rocprofv3-attach * attach: add callbacks for new queues and code objects * attach: finish dispatch table implementation - Fixes kernel tracing * attach: add cmake variable for attachment support * feat: Add --attach alias for rocprofv3 with comprehensive attachment tests - Add `--attach` as an alias to existing `-p/--pid` functionality in rocprofv3.py - Create comprehensive attachment test suite with CSV and JSON output validation: - New attachment-test application for testing dynamic profiling scenarios - Unified test script supporting both CSV and JSON output formats - Pytest-based validation for kernel traces, memory copies, HSA API calls, and agent info - Add CMake integration for automated attachment testing - Support parameterized output directory and filename specification - Implement proper environment setup for attachment queue registration Tests verify successful attachment to running processes and capture of: - Kernel dispatch traces with workgroup/grid dimensions - Memory copy operations (H2D/D2H) with size validation - HSA API call traces across multiple domains - GPU/CPU 
agent information and capabilities * Documentation Update * attach: make attach script callable * Added ROCPROFILER_REGISTER_ATTACHMENT_TOOL_LIB to remove hardcoded name * attach: revert metrics library path changes * Generic Attachment in Register (#942) Remove tool references in register * Add second param to attach call in rocprof register * Add experimental reattachment support for ROCprofiler-SDK This commit introduces experimental reattachment functionality allowing tools to dynamically reattach to running processes with comprehensive design changes to support multiple attach/detach cycles: **Core Reattachment API:** - Add rocprofiler_tool_configure_result_experimental_t with tool_reattach/tool_detach callbacks - Add rocprofiler_call_client_reattach and rocprofiler_call_client_detach C exports - Implement reattachment tracking in rocprofiler_register_attach to differentiate initial attachment from reattachment cycles - Add rocprofiler_register_invoke_reattach for handling reattachment requests **Design Changes - Registration System Flow:** The registration system now supports a dual-path initialization: 1. Initial Attachment Flow: - rocprofiler_register_attach() -> rocprofiler_register_invoke_all_registrations() - Full tool initialization with complete context setup - Sets prev_attached atomic flag to track state 2. 
Reattachment Flow: - rocprofiler_register_attach() detects prev_attached=true -> rocprofiler_register_invoke_reattach() - Bypasses full re-initialization, calls client reattach callbacks instead - Preserves existing contexts and buffers, only reactivates profiling services **Design Changes - Tool Library Loading:** Enhanced rocprofiler-register library loading with function pointer resolution: - Extended rocp_set_api_table_data_t tuple to include reattach/detach function pointers - Automatic symbol resolution for rocprofiler_call_client_reattach/detach functions - Support for both LD_PRELOAD and dlopen scenarios with consistent callback availability **Design Changes - Context Management:** Introduced dual context systems for attachment scenarios: - get_contexts() - Original contexts for standard tool initialization - get_attach_contexts() - Separate context map for attachment-specific lifecycle - attach_init() - Creates contexts for ALL buffer tracing services using existing buffers - attach_start() - Selectively starts contexts based on configuration options - attach_detach() - Cleanly stops and destroys attachment contexts **Design Changes - Buffer Management:** Added reset_tmp_file_buffer() template for clean reattachment state: - Properly closes and removes old temporary files - Deletes existing file_buffer instances to prevent stale file position tracking - Creates fresh file_buffer instances for clean reattachment cycles - Addresses core issue where file position metadata becomes stale between cycles **Design Changes - Environment Variable Injection:** Added ROCP_REGISTERED_TOOL_ATTACH environment variable: - Distinguishes attachment-loaded tools from LD_PRELOAD scenarios - Enables registration system to apply attachment-specific logic - Helps tools adapt behavior for attachment vs standard initialization **Attachment Context Management:** - Add attach_init/attach_start/attach_detach functions for dynamic context lifecycle - Add reset_tmp_file_buffer template 
for clean reattachment state management - Implement get_attach_contexts() for tracking active attachment contexts **Test Infrastructure:** - Add projects/rocprofiler-sdk/tests/rocprofv3/reattach/ comprehensive test suite - Include reattachment test scripts with unified attachment/detachment cycles - Add validate.py with trace data validation for kernel, memory copy, HSA API, and agent info - Add conftest.py for JSON and CSV data loading utilities **Configuration Updates:** - Update CMakeLists.txt to include reattachment tests in build system - Add environment variable ROCP_REGISTERED_TOOL_ATTACH for attachment state tracking - Enhance rocprofiler-register library loading with reattach/detach function resolution **Flow Impact Analysis:** This design enables robust multi-cycle attachment by: 1. Preventing duplicate initialization on reattachment 2. Maintaining separate context lifecycles for attachment vs standard operation 3. Ensuring clean temporary file state between attachment cycles 4. Providing tools with explicit reattach/detach callback hooks 5. Supporting both programmatic and environment-based tool configuration The experimental nature allows for iteration on the API while establishing the foundation for production-ready dynamic profiling capabilities. 
* Fix misc clang-tidy warnings/errors
* CMake Option and Environment Variable Updates
- CMake: ROCPROFILER_REGISTER_ALWAYS_SUPPORT_ATTACH -> ROCPROFILER_REGISTER_BUILD_DEFAULT_ATTACHMENT
- Env: ROCPROFILER_REGISTER_ATTACHMENT_ENABLED ->
* Source reorganization
* Formatting + new lines at EOF
* Fix flake8 F841: local variable is assigned to but never used
* Update attachment test
- get rid of 5 second start delay
- add roctx
* Rework implementation
- Remove rocprofiler_tool_configure_result_experimental_t in lieu of rocprofiler_configure_attach
- Add <rocprofiler-sdk/experimental/registration.h>
- TODO: Update process_attachment.rst
* Handle re-attachment options
- inherit options from previous attachment
- check previous options do not modify data collection services
* Fix support for tools w/o rocprofiler_configure_attach
- fix segfault when rocprofiler_configure_attach does not exist
- fix naming convention for functions accepting attach dispatch table
- cleanup rocprofiler_configure_attach implementation in rocprofv3 tool
* attach: remove unknown agent handling
- Change was from earlier commit, no longer needed
* attach: add error for attaching without library loaded
* attach: revise version numbering
* attach: register header revisions
* attach: clang format register
* attach: formatting
* attach: fix build failure
- Remove cross dependency into rocprofiler-sdk, fixes build on some systems
* attach: revise register library detection
* Update rocprofiler-register and attach library
- formatting
- proper signature of register_functor for rocprofiler-sdk-attach library callback
- remove get_dispatch_registration_table()
* Bump rocprofiler-register version to 0.6.0 + AnyNewerVersion
* Fix output support for rocprofiler-sdk-tool
* Fix formatting
* Fix clang tidy errors
* Misc rocprofiler-sdk-attach fixes
* attach: add sigint handling to attach python
* tool README.md formatting
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
* Fix buffered output issue
* attach: add errors for tool attach
* CI Fixes
* Rework tests
* attach: improve library loading in rocprofv3 attach
* formatting
* Update tests to use pytest framework
* Fix test_attachment_hsa_api_trace
* attach: catch ctypes exceptions
* attach: fix leak in registration
* attach: fix sanitizer tests
* attach: fix sanitizer tests further
* attach: disable attach asan tests
* attach: disable ubsan test
* attach: fix permissions in installed test package
* attach: formatting
---------
Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>
Co-authored-by: Tim Gu <Tim.Gu@amd.com>
Co-authored-by: Claude Code <claude@anthropic.com>
Co-authored-by: Benjamin Welton <bwelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <jrmadsen@users.noreply.github.com>
Co-authored-by: Benjamin Welton <bewelton@amd.com> |
||
|
|
dd44ae3295 |
[Palamida scan] SWDEV-553053 Adding missing copyrights information (#836)
* SWDEV-553053 Adding missing copyrights information |
||
|
|
b60c0ceddd |
[rocprofv3] Unconditionally collect stream and kernel rename data in rocprofv3 for rocpd (#171)
* Remove config checks for stream and kernel rename data collection
* Updated csv generation to check if kernel rename is on before calling get_kernel_name
* Update metadata to use kernel_rename bool argument
* Formatting + unconditionally store kernel name in rocpd
* Readded kernel rename parameter after rebase
* Fixed rebase conflicts
* Updated comment in line with github comments
* Added check in rocpd csv.cpp to output kernel name if region name is empty
* Add test for kernel rename
---------
Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com> |
||
|
|
9849073836 |
SWDEV-540648: Adding realtime clock to v3 tool. Update decoder header. (#666)
* SWDEV-540648: Adding realtime clock to v3 tool. Update header for decoder.
* Adding tests
* Review comments
* Review comment |
||
|
|
53dcae49c6 | [rocprofiler-sdk] Disable multiplex tests (#876) | ||
|
|
2cfedef6b6 |
[CI] Increase rocDecode and rocJPEG Code Coverage (#183)
* Increase rocDecode code coverage and add version check
* Update rocJPEG tests
* Fix rocJPEG tests
* Enable building tests/samples in rocm release compat workflow
* Readded rocJPEG test skips
* formatting
* Adding ROCm libraries for the code-coverage job
* Added return value check for error message and updated compatibility to enable tests
* Disable rocm_release_compatibility samples and tests until openmp issue is resolved
---------
Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Jonathan R. Madsen <Jonathan.Madsen@amd.com> |
||
|
|
4d98a0169f |
Handle special cases when stream value is hipStreamLegacy (0x01) or hipStreamPerThread (0x02) (#343)
* Updated stream code to handle special cases when stream value is 0x01 or 0x02
* Removed extra definitions and updated tests to account for special case
* Modified stream.cpp so that each thread is assigned a unique stream ID when hipStreamPerThread is used as stream value. Modified tests to check that threads are assigned unique, repeated values when hipStreamPerThread is called
* Updated idx_offset, stream_map, and thread counter to be in one struct.
* Update stream.cpp to only use add_stream() and update tests for separate unit test for hipStreamPerThread
* Remove unnecessary comment
* Removed unnecessary line
* Updated tests and stream.cpp to update stream ID correctly
* Updated test structure |
||
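The per-thread stream ID behaviour described in 4d98a0169f can be pictured with a small sketch. This is not the code in stream.cpp: the class name `StreamIdTable`, the Python signature of `add_stream`, the ID numbering, and the treatment of hipStreamLegacy as an ordinary shared value are all assumptions made for illustration. It only shows the idea stated in the commit message, namely that hipStreamPerThread (0x02) resolves to a distinct ID on each thread and to the same ID on repeated calls from one thread.

```python
import threading

HIP_STREAM_LEGACY = 0x01      # value named in the commit message
HIP_STREAM_PER_THREAD = 0x02  # value named in the commit message


class StreamIdTable:
    """Hypothetical table mapping stream values to small integer IDs."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ids = {}  # key -> stream ID

    def add_stream(self, stream_value, thread_id=None):
        """Return the ID for a stream value, creating one on first use."""
        if thread_id is None:
            thread_id = threading.get_ident()
        # hipStreamPerThread names a different stream on every thread, so the
        # thread ID becomes part of the key. Every other value, including
        # hipStreamLegacy, is keyed by the value alone (an assumption here).
        if stream_value == HIP_STREAM_PER_THREAD:
            key = (stream_value, thread_id)
        else:
            key = (stream_value, None)
        with self._lock:
            if key not in self._ids:
                self._ids[key] = len(self._ids)
            return self._ids[key]
```

Keying on the pair (value, thread) rather than on the value alone is what makes two threads that both pass 0x02 show up as two streams in the output.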
|
|
6a6b16be93 |
Adding GPU index as a parameter for ATT (#547)
* Adding GPU index as a parameter for ATT
* Tidy fix
* Using tokenize
* Update tests/rocprofv3/advanced-thread-trace/CMakeLists.txt
Co-authored-by: Indic, Vladimir <Vladimir.Indic@amd.com>
* Update tests/rocprofv3/advanced-thread-trace/CMakeLists.txt
* Adding error logging. Using idx instead of id.
---------
Co-authored-by: Giovanni <gbaraldi@amd.com>
Co-authored-by: Indic, Vladimir <Vladimir.Indic@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
6b2a4fcfc2 |
Revert memory allocation CSV output file header and update tests (#532)
* Reverted header and field location for csv memory allocation and updated tests
* Updated example csv file and made small update
[ROCm/rocprofiler-sdk commit:
|
||
|
|
fe6cfc9cc8 |
PCS test: cast agent name to str (#546)
* PCS test: cast agent name to str
[ROCm/rocprofiler-sdk commit:
|
||
|
|
68ae6cf65f |
[rocpd] Adding summary module to generate summaries from rocpd database + query submodule + rocpd command-line tools (#488)
* adding summary.py to generate tmp <category_region>_summary views
* migrating CSV summary to SDK method of writing CSVs
- Add domain_view to summary.py
- omit the C++ code of writing CSV because it gets reverted later anyway
* Add summary subparser and write_sql_view_to_csv function
* adding all <>_summary views generation to summary.py
* add summary_per_rank feature
* add --summary-per-rank
* reconstruct generate_summary_view and create_domain_view
- introduce by_rank
* remove sqr and variance in summary views
* use RocpdImportData instead of connection
* two fixes on summary.py
- modify the generate_summary_view function to return a tuple with view name and sql code
- add if_not_exits parameter to generate_summary_view
* Refactor summary.py to allow output path and filename args, and apply time_window
- clean up summary table column headers
- only generate by-rank views if that param is specified
* Add ProcessID to Hostname output and csv, so users can identify the system in the by-rank summaries
* Summary.py, just add hostname to by-rank summaries, instead of creating mapping table
* Summary - migrate csv writer to pandas, for more future flexibility
* Adding a few simple tests for summary.py
* Linting fixes
* add region_categories to summary options
- Automatically retrieve region categories from the database if argument is None
* add backticks for view_names
* fix tests after rebase
* Made code review changes
- fixed whitespace in CMakelists.txt
- adding query.py module & subparser in __main__.py
- refactor summary function to return query
- used query.py to output csv
- used query.py to also output summary to console
- provided new command line options to select summary output to csv or console
* Made fix to jinja template in query.py, as suggested by copilot
* Consolidated output calls to query in export_view function based on feedback
- refactored: helpers, query functions, create view functions
- extended formats to include what query supports (md, html, pdf, json)
- added json format to query, and changed orient=records
- adding jinja2 and reportlab to requirements.txt
* Add version_info for rocpd and roctx
* Add rocpd commandline tool
* Add executable permissions to source/bin/rocpd.py
* Removed rocpd2query, and cleaned up --help examples
---------
Co-authored-by: acanadas <acanadas@amd.com>
Co-authored-by: Jin Tao <jintao12@amd.com>
Co-authored-by: a-canadasruiz <Araceli.CanadasRuiz@amd.com>
Co-authored-by: Jonathan R. Madsen <Jonathan.Madsen@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
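The summary views described in 68ae6cf65f (a function that returns a tuple of view name and SQL, backtick-quoted view names, no variance columns) can be sketched with the standard sqlite3 module. The `regions` table, its columns, the sample rows, and the parameter name `if_not_exists` are hypothetical; the real rocpd schema and the actual signature in summary.py are not shown in this log.

```python
import sqlite3


def generate_summary_view(category, if_not_exists=True):
    """Return (view_name, sql) for a per-name duration summary of one category."""
    # SQLite views cannot take bound parameters, so the category is inlined.
    # Restrict it to identifier characters to keep the interpolation safe.
    if not category.replace("_", "").isalnum():
        raise ValueError(f"unsupported category name: {category!r}")
    view_name = f"{category}_summary"
    guard = "IF NOT EXISTS " if if_not_exists else ""
    sql = (
        f"CREATE VIEW {guard}`{view_name}` AS "
        "SELECT name, COUNT(*) AS calls, "
        "SUM(end_ns - start_ns) AS total_ns, "
        "AVG(end_ns - start_ns) AS avg_ns, "
        "MIN(end_ns - start_ns) AS min_ns, "
        "MAX(end_ns - start_ns) AS max_ns "
        f"FROM regions WHERE category = '{category}' "
        "GROUP BY name"
    )
    return view_name, sql


# Hypothetical data standing in for a rocpd database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE regions (category TEXT, name TEXT, start_ns INTEGER, end_ns INTEGER)"
)
conn.executemany(
    "INSERT INTO regions VALUES (?, ?, ?, ?)",
    [
        ("HIP", "hipMemcpy", 0, 30),
        ("HIP", "hipMemcpy", 40, 50),
        ("HIP", "hipLaunchKernel", 10, 15),
        ("HSA", "hsa_signal_wait", 0, 100),
    ],
)
view_name, sql = generate_summary_view("HIP")
conn.execute(sql)
rows = conn.execute(f"SELECT name, calls, total_ns FROM `{view_name}`").fetchall()
```

Returning the SQL text alongside the view name lets a caller either execute it or pass it on to another module, which matches the later refactor in which the summary function returns a query.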
|
|
4ca156e572 |
Thread trace and Trace Decoder API tests and samples (#416)
* Adding test and samples to decoder
* Fix sample
* Formatting
* Fix multi test
* Disable sample
* Fix tests
* Format
* Version fix
* Locking the decoder
* Add atomic
* Review comments
* Format
* Adding readme
* merge conflict and adding PCS+ATT test
* Review comments
* Properly disable PCS test
* Update tests/rocprofv3/advanced-thread-trace/CMakeLists.txt
* Adding back env var test
* Name fix
* Preload sample
* Addressing review comments
* Update docs
---------
Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
990946e956 |
[SDK] Fix null handles (#474)
* Fix null handle
- use .handle=0, not .handle=numeric_limits<>::max()
* Update lib.common.hasher
* Fix ROCPROFILER_CONTEXT_NONE
* Use context operator==
* Update CHANGELOG
* Updated null handle for scratch memory and changed allocation test so that free ops account for null agent
---------
Co-authored-by: Ian Trowbridge <Ian.Trowbridge@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
7243889d6a |
Add perfetto support for scratch memory (#303)
* Add perfetto support for scratch memory
* Updated tests and docs.
* Update docs data
* Added underflow check
* Record all free events to 0 bytes
* Add format
* Address review comment
* updated tests for scratch memory
* update scratch-memory tests.
[ROCm/rocprofiler-sdk commit:
|
||
|
|
fbf17a42d4 |
[SWDEV-516561][1/2] Add MARKER_RANGE_EXTENT to capture ROCTX ranges (#363)
* [SWDEV-516561][1/2] Add MARKER_RANGE_EXTENT to capture ROCTX ranges
Range extent to capture all work between roctxpush/pop operations. Entry callback takes place during roctxpush and exit callback takes place in roctxpop. This is primarily to allow us to keep an ancestor id on the ancestor stack such that all operations that take place within the push/pop context can be annotated as being a part of this range. With the current setup (where push and pop are two separate operations that need to be combined externally), we cannot keep an ancestor id on the stack and thus cannot tie tracing events to particular ranges.
Correlation id information is inherited from the push operation. Ancestor id needs to be added in a future commit that also outputs this ancestor to CSV.
Output:
```
[ctest] {'size': 64, 'kind': 7, 'operation': 1, 'correlation_id': {'internal': 1525, 'external': 0, 'ancestor': 1524}, 'start_timestamp': 2932551479402642, 'end_timestamp': 2932551491178449, 'thread_id': 3254861}
[ctest] {'size': 64, 'kind': 8, 'operation': 2, 'correlation_id': {'internal': 1525, 'external': 0, 'ancestor': 1524}, 'start_timestamp': 2932551479405878, 'end_timestamp': 2932551491181214, 'thread_id': 3254861}
```
Note: Kind 8 = range extent op.
* Merge fix
Revert several changes
source/lib/rocprofiler-sdk/marker/range_marker.*
- separate out range marker implementation for standard marker implementation
Update public API with marker core range
Support marker core range in sdk (source/lib/rocprofiler-sdk)
Transition rocprofiler-sdk-tool and output lib to use marker core range
Misc fixes for tests
Fix logic in lib/output/generate{CSV,Stats}.cpp
Update tests/rocprofv3/tracing-hip-in-libraries (marker validation)
Fix test_otf2_data
* Test fixes
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
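The range-extent idea in fbf17a42d4 (one record spanning push to pop, with an ancestor stack so that enclosed work can be tied to the range) can be sketched as follows. The class `RangeTracer`, its methods, and the record fields are hypothetical and are not the SDK's API; the commit also notes that ancestor output is left to a future change, so the annotation shown here illustrates what the design is meant to enable, not what that commit emits.

```python
import itertools
import time


class RangeTracer:
    """Hypothetical tracer that emits one record per push/pop range."""

    def __init__(self, clock=time.monotonic_ns):
        self._clock = clock
        self._next_id = itertools.count(1)
        self._stack = []   # open ranges: (correlation_id, ancestor, name, start)
        self.records = []  # finished records, in completion order

    def _ancestor(self):
        # ID of the innermost open range, or 0 when no range is open.
        return self._stack[-1][0] if self._stack else 0

    def push(self, name):
        """Entry callback: open a range and place its ID on the ancestor stack."""
        cid = next(self._next_id)
        self._stack.append((cid, self._ancestor(), name, self._clock()))
        return cid

    def record(self, name):
        """Record an operation, annotated with the enclosing range's ID."""
        self.records.append({
            "kind": "operation",
            "name": name,
            "correlation_id": next(self._next_id),
            "ancestor": self._ancestor(),
        })

    def pop(self):
        """Exit callback: close the innermost range and emit a single record."""
        cid, ancestor, name, start = self._stack.pop()
        self.records.append({
            "kind": "range_extent",
            "name": name,
            "correlation_id": cid,  # inherited from the push
            "ancestor": ancestor,
            "start": start,
            "end": self._clock(),
        })
        return cid
```

Because the range stays on the stack between push and pop, every operation recorded in between can read the current ancestor directly, which is not possible when push and pop are two unrelated records that have to be paired afterwards.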
|
|
145944dc30 |
[CI] Disable conversion script validation (#492)
- pytest fails when you selectively disable all the tests in the suite
[ROCm/rocprofiler-sdk commit:
|
||
|
|
fb51f0e5d4 |
[CI] Disable other unstable tests (#491)
Disable other unstable tests
- validation test_validate_counter_collection_pmc1 for conversion script
- increase timeout for tests/rocprofv3/pc-sampling execution phase
[ROCm/rocprofiler-sdk commit:
|
||
|
|
839c07c4aa |
[CI] Testing stability (#486)
* [CI] Testing Stability
- CMake option ROCPROFILER_DISABLE_UNSTABLE_CTESTS
- used for tests which periodically fail around 1 out of every 10 runs
- set to ON while instability remains, this needs to be set to OFF in ROCm 7.1 or, ideally, ROCm 7.0.1
- Use FIXTURES_SETUP and FIXTURES_REQUIRED for some tests
- replace "threw an exception" with "${ROCPROFILER_DEFAULT_FAIL_REGEX}" for misc FAIL_REGULAR_EXPRESSIONS
* Remove contents of all EXCLUDE_{TESTS,LABEL}_REGEX from CI workflow
* Disable patch git step in code-coverage run
* Tweak spin time of reproducible runtime
* Removed patch git step in code-coverage run
* Update ROCPROFILER_DEFAULT_FAIL_REGEX
* Mark test-counter-collection tests as unstable
- add fixtures setup/required
* Remove ATTACHED_FILES_ON_FAIL
- CDash doesn't store or enable downloading these properly anyway
* Relax collection-period fuzzing window
* Disable unstable collection-period test
- too unstable
* formatting
* Disable unstable device_counting_service_test.async_counters
* Suppress perfetto internal data race errors
* Switch code-coverage CI jobs to mi300 runner
* Timeout increases
* rocprofv3-test-rocpd updates
- add fixtures
- switch executable
- redefine input/output paths
* Revert code-coverage job to mi300a runner
* Update rocprofv3-test-rocpd-execute-multiproc
- reduce problem size
* disable multiproc rocpd
* Split code-coverage into separate workflow
- network issues cause this job to fail frequently
- when in a separate workflow, it can be restarted easily
* Fixtures for rocprofv3-test-trace-hip-in-libraries
* Disable unstable device_counting_service_test.sync_counters
* Potential fix for code scanning alert no. 171: Workflow does not contain permissions
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* Switch code-coverage to run on rocprof-azure
- mi300a EMU runner set is unstable (network issues)
* tests/rocprofv3/pc-sampling SKIP_REGULAR_EXPRESSION
* Update rocprofv3-test-list-avail-trace-execute
- reduce log level and increase timeout
* rocprofv3: Prevent recursive call to rocprofv3_error_signal_handler + log chaining
* rocprofv3: Use ROCP_ERROR + std::exit instead of ROCP_FATAL
- should help with SKIP_REGULAR_EXPRESSION
---------
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
4d79e1df30 |
[SDK] Support CMake option for using internal RCCL tracing + (temporary) enable in CI (#457)
* Temp: disable RCCL tracing
* Update continuous_integration.yml
* Update continuous_integration.yml
* Update continuous_integration.yml
* Adding option to disable rccl tracing from CMake
* Update codeql.yml
* Misc updates
- ROCPROFILER_BUILD_RCCL -> ROCPROFILER_INTERNAL_RCCL_API_TRACE
- env.EXTRA_TEMP_CMAKE_OPTIONS -> env.GLOBAL_CMAKE_OPTIONS
- add (advanced) option ROCPROFILER_INTERNAL_RCCL_API_TRACE
* Fix rocprofiler::sdk::get_enum_label
- missing enum labels for HIP_RUNTIME_API_TABLE_STEP_VERSION > 8
* Update tests/rocprofv3/advanced-thread-trace/CMakeLists.txt
- improve various aspects of cmake -- particularly echoing where attdecoder_LIBRARY was found
* Use CMAKE_MESSAGE_INDENT
- add prefix to cmake messages to help indicate where messages are coming from
- make find_package(Python3 ...) QUIET for bindings
* Fix rocprofiler::sdk::get_enum_label
- handle HSA_AMD_EXT_API_TABLE_MAJOR_VERSION
* Fix rocprofv3 message for att library path
* Fix tests/rocprofv3/advanced-thread-trace/att_input.yml config
* Fix rocprofv3 check_att_capability + soversion/version library resolution
- Account for ROCPROF_ATT_LIBRARY_PATH in env in check_att_capability
- Add resolve_library_path
- supports resolution of library names to SOVERSION and VERSION paths
* Fix python linting error (unused import)
---------
Co-authored-by: Ammar ELWazir <aelwazir@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit:
|
||
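Commit 4d79e1df30 adds resolve_library_path, described only as supporting resolution of library names to SOVERSION and VERSION paths. The sketch below is one plausible reading of that description, not the rocprofv3 implementation: the signature, the search order, the helper `_is_version`, and the file name `libdemo.so` used in the usage note are assumptions.

```python
import glob
import os


def _is_version(suffix):
    """True for dotted numeric suffixes such as '1' or '1.2.3'."""
    return all(part.isdigit() for part in suffix.split("."))


def resolve_library_path(name, search_dirs):
    """Return the path of `name` or of a versioned variant, else None."""
    for directory in search_dirs:
        exact = os.path.join(directory, name)
        if os.path.exists(exact):
            return exact
        # Fall back to versioned names such as libdemo.so.1 (SOVERSION)
        # or libdemo.so.1.2.3 (VERSION); prefer the shortest suffix.
        candidates = glob.glob(glob.escape(exact) + ".*")
        versioned = [c for c in candidates if _is_version(c[len(exact) + 1:])]
        if versioned:
            return min(versioned, key=lambda path: (len(path), path))
    return None
```

With a directory that holds only `libdemo.so.1` and `libdemo.so.1.2.3`, a lookup for `libdemo.so` returns the `.1` path; once the unversioned file exists, that file is returned instead.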
|
|
4f7d6d7f1c |
Removed roc_video_dec files due to reoccurring errors (#439)
* Removed roc_video_dec files due to reoccurring errors
* Added back rocDecCreateVideoParser
* Fix remaining errors with rocdecode.cpp
[ROCm/rocprofiler-sdk commit:
|
||
|
|
3a62fee4ac |
[rocprofv3-avail] Rework rocprofv3-avail tool (#312)
---------
Co-authored-by: vlaindic_amdeng <vladimir.indic@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
e5097d6a36 |
[ROCm 7.0] [PC sampling] Disable PC sampling tests if the GFX arch doesn't support it (#436)
Disable PC sampling tests if the GFX arch doesn't support it
[ROCm/rocprofiler-sdk commit:
|
||
|
|
bb942ef500 |
[rocprofv3] rocpd Python package (#384)
* Squashed commit of the following:
commit f764eb6f4a45baa25eb8f1b50b1035c84578c200
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Thu May 1 09:09:37 2025 -0500
Misc post rebase fixes
commit 447418b0765819eb2fb5c8b5c3ca9128a091d37e
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Thu May 1 07:22:07 2025 -0500
Formatting
commit 975661f5e498cde99f8c3ce5486c47db03856d1b
Author: Young Hui <young.hui@amd.com>
Date: Wed Apr 30 21:19:30 2025 -0400
Reorganize rocpd command line and grouped Required Arguments together
- had to add --input to each output.py file again since it was moved out of output_config.py
- ran formatter
commit 9322328611a332c3979f040b652a9e9a9482200e
Author: acanadas <acanadas@amd.com>
Date: Wed Apr 30 17:00:58 2025 +0000
corrected indices on some operation, and kind for kernel dispatch
commit 6c146cd0c508dca6f2453e3844e09e1ed3f9978a
Author: acanadas <acanadas@amd.com>
Date: Wed Apr 30 16:01:26 2025 +0000
some corrections on pf trace output: added categories
commit 4e02d3f8617324c95e4a449243ab9ab3f4695471
Author: acanadas <acanadas@amd.com>
Date: Wed Apr 30 14:35:19 2025 +0000
fixed perfetto cpp with adding stack id and parent stack id to views:tests
commit d7efd9334361cd7d6a842d083a3f8ca51efe72d3
Author: acanadas <acanadas@amd.com>
Date: Wed Apr 30 14:16:32 2025 +0000
fixed perfetto cpp with adding stack id and parent stack id to views
commit cdd0e2ec0788d44fdf2d5833822e055c43cddec6
Author: acanadas <acanadas@amd.com>
Date: Wed Apr 30 09:47:07 2025 -0500
restore output_config, add output_file arg for generate csv
commit 5f9b7d93dcbefd55e0ba6e2674602e809aa61632
Author: JIn Tao <jintao12@amd.com>
Date: Wed Apr 30 09:30:32 2025 +0000
add ROCDecode and ROCJpeg API calls
commit 7724de1263c5f960cc64c5b0e7afb3834d797f87
Author: JIn Tao <jintao12@amd.com>
Date: Wed Apr 30 09:05:14 2025 +0000
Json output: add counters_collection
commit a13930d6d2b87605ca1ece58291172f79d81d91f
Author: JIn Tao <jintao12@amd.com>
Date: Wed Apr 30 09:01:17 2025 +0000
Json output: add scratch_memory
commit 54e62e25c6d89e718324ce3bc51eb80c25756c48
Author: JIn Tao <jintao12@amd.com>
Date: Wed Apr 30 08:00:08 2025 +0000
Json output: add marker_api
commit ab920196c7ddb68a9a1fdc121f43528e513d5a67
Author: root <root@smc300x-ccs-aus-GPUF2C5.cs-aus.dcgpu>
Date: Tue Apr 29 10:48:53 2025 -0500
csv refactor, fix output-format argument for script
commit e033d18356f397e3a684e255dcffd0c0d64ec19e
Author: JIn Tao <jintao12@amd.com>
Date: Tue Apr 29 11:30:16 2025 +0000
minor revision
commit 748f6754ac0238eca63bb12b26f62b514de65a0d
Author: Jin Tao <jintao12@amd.com>
Date: Tue Apr 29 10:09:54 2025 +0000
Json output: complete structures
commit 52c8d77e0eeb8dca7476814ff03b5cdf88055fd6
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Mon Apr 28 15:44:47 2025 +0000
forced tests upd
commit 7fabc80d3b8db7d137b05a958c633ad5bf8dbae9
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Mon Apr 28 15:43:34 2025 +0000
fixed the relative-type index issue (missing load) for agent info and related parameter adjustments for python
commit f8f5bffc010ad6d43a9f8fee90a79e4342fb9562
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Mon Apr 28 12:28:35 2025 +0000
added 3 convenience views, some refactoring and fixing kernel-rename option
commit 831cd336115153d1e73f01c9120a67c904478f89
Author: Jin Tao <jintao12@amd.com>
Date: Mon Apr 28 14:45:31 2025 +0000
Json output: add kernel_dispatch, hip_api, hsa_api
commit 4c414a1abce51fbdd6d5856b2e36e6272279c671
Author: Jin Tao <jintao12@amd.com>
Date: Fri Apr 25 13:30:01 2025 +0000
optimize the json output code; certain problems still need to be addressed, e.g., empty counters and strings
commit ceecd7cc5b81f014766199c0a57645386ade23dd
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Fri Apr 25 10:51:16 2025 +0000
removed unused variable session from write_otf
commit 29fdb2db4fe0cc930cd6b3172092604ee5409242
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Fri Apr 25 10:14:48 2025 +0000
added tests for generated csv and otf2
commit 5091d2d51e7e4d68fcdc95a97a82a0df41f28350
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Fri Apr 25 10:12:30 2025 +0000
run formating from command line
commit abbb7637b1704ea904540c5ff717102bf450c76d
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Fri Apr 25 10:09:25 2025 +0000
added check that csv is not broken after other changes
commit 9ff614d6d8e87fc3647d8f3b0120425c24213f3b
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Thu Apr 24 15:09:40 2025 +0000
updated new csv.cpp and otf2.cpp to fit the string_view fix
commit e94ea0f3668c9b972f2dd4144cb4152c1b202f93
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Thu Apr 24 14:51:55 2025 +0000
fixed string view reporting which works at least for generateOTF2.cpp
commit 5c5ea532279fba0b7ef5abcd1916d20d0b7fb7b8
Author: Jin Tao <jintao12@amd.com>
Date: Thu Apr 24 14:32:10 2025 +0000
Json output: add strings.marker_api
commit d28d1c18c9693421f7676d6de82c2c20af11eaa0
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Thu Apr 24 13:10:13 2025 +0000
small upd on cmake for tests
commit 325cb3719517ad514291ab620dd85fb04daeb906
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Thu Apr 24 13:08:58 2025 +0000
fixed abs_index for connecting data and handles for otf2 location reporting
commit e9a648ade545795646f6aca61fdbece5a39fea5c
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Tue Apr 22 14:00:12 2025 +0000
commit after fixing warnings, executable name, validate works
commit 1cd2d63501b8f951996c9dfcd1d0af6d6f16c006
Author: Jin Tao <jintao12@amd.com>
Date: Thu Apr 24 12:38:03 2025 +0000
Json output: add marker_api in buffer_records
commit f01ab23d2f8f6c524568e2c453fe26a2e4320a1c
Author: Young Hui <young.hui@amd.com>
Date: Wed Apr 23 21:37:36 2025 -0400
Add python binding for agent-index-value to output_config
- command line passes correctly for csv, perfetto needs to be fixed
commit a92dd0c060dd398db365ec37af905dcca25c8a7e
Author: acanadas <acanadas@amd.com>
Date: Wed Apr 23 09:08:43 2025 -0500
provisional fix in json.cpp
- For now using absolute index for agents
commit fddacacbb54f5678a40d552ec8a3a2f9de65381b
Author: acanadas <acanadas@amd.com>
Date: Wed Apr 23 08:55:49 2025 -0500
adding agent indexes (abs, log, type) to types for agent_index_value options
- Fixing agent_id populated in rocpd_memory_allocate table
commit 3b0414ba5271d25996564eddc6d757b1afc637af
Author: Jin Tao <jintao12@amd.com>
Date: Wed Apr 23 11:22:59 2025 +0000
minor revision regarding json output
commit a84bc3d0f7dd9bf1014d024ef15eeb7c7ec990c5
Author: Jin Tao <jintao12@amd.com>
Date: Wed Apr 23 11:16:48 2025 +0000
add json output
commit e6c0dd98de0b5f24492ac4396cf8d59bd20d58ad
Author: Young Hui <young.hui@amd.com>
Date: Tue Apr 22 17:22:45 2025 -0400
Add rocpd commandline input param file check, to ensure DB exists
- added OTF2 script
- added placeholder JSON script
commit 1d482257b8f23bf4d64d57d8bd36775b38254026
Author: Young Hui <young.hui@amd.com>
Date: Mon Apr 21 12:47:52 2025 -0400
Clean up some rocpd python files
- removed some unused files
- cleanup __main__.py imports and duplicate main
commit c15af2aac9935ffce92d9d6ce35ab5e9eabed57c
Author: Young Hui <young.hui@amd.com>
Date: Thu Apr 17 18:40:13 2025 -0400
Add rocpd command line support
- right now pftrace and csv are supported
- also removed some otf2 test files, to fix cmake configure
- formatting edits
commit 10bee3bcf496edd8e1ad9521498c926915a33f07
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Thu Apr 17 15:17:15 2025 +0000
experimented with roctxA, formatted
commit 11ff50882bcbedbe516c6461c5f8d65e38d0aae5
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Thu Apr 17 15:17:05 2025 +0000
experimented with roctxA, formatted
commit 7cc87cbf56f6d7117df10dc2cfe45174bedff22a
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Thu Apr 17 14:16:16 2025 +0000
small refactoring on api-calls preprocessing
commit 421bb11d5b97a4b888c0f9a0b46fca229e4abf25
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Thu Apr 17 14:03:39 2025 +0000
added rccl, rocdecode, rocjpeg, corrected markers
commit 71c1122ec001ce2548aa1e6d7b0d4bbd5ac16d79
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Wed Apr 16 15:59:39 2025 +0000
integration tests for otf2 generation will stay till otf2 gen becomes stable
commit b8ff32bb269a4efec001804eb0064a8c6c7f8f6d
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Wed Apr 16 15:07:19 2025 +0000
intermediate local commit of functioning otf2 output after refactoring writer code
commit 4d6140fbad00a713aed20b72d41fa62219f9aed7
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Wed Apr 16 09:18:40 2025 +0000
intermediate local commit of functioning otf2 output
commit 96f40ebce93ff3a27c01d5e5267eda67c3ab68ec
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Fri Apr 11 17:58:24 2025 +0000
first working commit on generating otf2
commit 75ddceb4bd3dc6c32cd8a60450bd1c70bf4d3193
Author: acanadas <acanadas@amd.com>
Date: Wed Apr 16 11:24:30 2025 +0000
Add CSV export functionality to ROCm Profiler Data
Implement complete CSV export pipeline with the following components:
- Add write_csv function to libpyrocpd.cpp for core CSV generation
- Create csv.hpp and csv.cpp for CSV formatting and management
- Implement stats_summary.hpp and stats_summary.cpp for performance metrics
- Add comprehensive type definitions for markers, counters and statistics
- Create SQL views in schema_data for efficient data extraction
- Add csv.py module similar to pftrace.py for Python API consistency
- Implement convert.py script with multiple format support (CSV, Perfetto)
This change enables exporting profiling data in CSV format for easier analysis
and integration with external tools.
commit 953223e32faa862e79bd1f61e28a55874efa0589
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Tue Apr 1 19:58:18 2025 -0500
[output] Update generateRocpd.cpp to new schema + misc
- support guid, {{upid}}, {{view_upid}}, etc.
- improve env support for MPI in format_path
commit 54bd3b0def48d91a81045676fb2f5f549b813880
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Tue Apr 1 19:54:41 2025 -0500
[rocpd] Updates + cleanup
- remove stale {autograd,call_stacks,strings,subclass}.py
- reorganize the SQL schema files
- add source/functions.{hpp,cpp}
- custom SQL function
- update interop for defining functions
- misc improvements to "write_perfetto" function
- added rocpd.pftrace
- added rocpd.output_config
commit 32f668b019c961f0797eec9f613cf5dfea0aa377
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Tue Apr 1 14:35:31 2025 -0500
[common] md5sum calculator
commit b6ae75ba270ea92661f6cfe75647531a4d6202f3
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Sun Mar 30 01:57:50 2025 -0500
Optimize sql_generator when query requires ORDER BY
commit 51d6f33b0b1f80dd09de70c91592f928e31a730f
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Sat Mar 29 01:49:52 2025 -0500
Minimal support for merged pftrace from multiple database
commit 90c4add9001cad2af85d14783ac1fb35c89c7770
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Sat Mar 29 00:46:59 2025 -0500
Update cereal submodule
- fix recursive include
commit a5d75dcb5de9c0667af03ec7a34ac484ff864bac
Author: Ian Trowbridge <Ian.Trowbridge@amd.com>
Date: Fri Mar 28 14:30:32 2025 -0500
Formatting
commit 7345810d5ea5d76b9a4ed9bd548399cb8df1feda
Author: Ian Trowbridge <Ian.Trowbridge@amd.com>
Date: Fri Mar 28 13:44:24 2025 -0500
Fixed interpolation bug
commit 289739669a84ac83b920f417ef94310dd9ee40c6
Author: Ian Trowbridge <Ian.Trowbridge@amd.com>
Date: Fri Mar 28 13:32:29 2025 -0500
Initial memory copy implementation created
commit 91416d784bae05984b8e9670d6cd22231fbc8bed
Author: Ian Trowbridge <Ian.Trowbridge@amd.com>
Date: Fri Mar 28 13:12:43 2025 -0500
Added memory allocation counter tracks, removed midpoint interpolation temporarily due to strange outputs
commit 7e733e393d06fecc198ae4f7891edecb90882136
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Fri Mar 28 13:22:34 2025 -0500
Support multiple database connections
commit e82d7b3ee24a855258dee50ea3a4ee8e52f70509
Author: Mark Meserve <Mark.Meserve@amd.com>
Date: Fri Mar 28 17:44:05 2025 +0000
fix tracing_session init
commit d29ac6270e7466794ca43ffdf061b7514a29ad94
Author: Mark Meserve <Mark.Meserve@amd.com>
Date: Fri Mar 28 17:00:08 2025 +0000
separate perfetto init to RAII class
commit 0538261898fdc83a05dc3835ec07d90c4b8dd937
Author: Ian Trowbridge <Ian.Trowbridge@amd.com>
Date: Fri Mar 28 11:03:45 2025 -0500
Kernel trace in perfetto now functional
commit 3eb63de679e8dfbc3bc551302ca097356f17de7d
Author: Ian Trowbridge <Ian.Trowbridge@amd.com>
Date: Thu Mar 27 17:43:43 2025 -0500
Added memory copy and allocation types, need to test kernel perfetto output
commit c70ce8340a5e4603bed27d6f3f0d95bc77aad196
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Thu Mar 27 15:16:18 2025 -0500
Misc libpyrocpd updates
- support conditions on read
commit 189226fb3aeaf3485137335392b271f4f1271040
Author: Ian Trowbridge <Ian.Trowbridge@amd.com>
Date: Thu Mar 27 14:43:25 2025 -0500
kernel_dispatch type added, added load function for serialize
commit 07f6af65733f34c067365c94500b03a9ffff6b6b
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Thu Mar 27 14:08:30 2025 -0500
ROCPROFILER_BUILD_DEPRECATED_WARNINGS OFF by default
commit b3df97af8fee651d20030dbfc8d6c635774030a7
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Thu Mar 27 14:06:42 2025 -0500
Fix include
commit 49baef7d173385154d346120313e6c9511665b68
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Thu Mar 27 13:16:50 2025 -0500
Python interface improvements
commit 7d614ed3ab07836c420e216261e80e0b629739a4
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Thu Mar 27 13:16:35 2025 -0500
Output keys: support {...} format in addition to %...% format
commit afdb63a0814f0954dfb7500abe5c95fbacdddbc2
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Thu Mar 27 00:43:50 2025 -0500
[Release] Replace implementation usage
- rocprofiler_record_dimension_info_t -> rocprofiler_counter_record_dimension_info_t
commit d60ecf99334d96a75b08f99d7a5d8556588258e3
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Thu Mar 27 00:43:04 2025 -0500
[CMake] ROCPROFILER_BUILD_DEPRECATED_WARNINGS option
commit a89de9f205d500ffc9fdbef400b8b712b167782b
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Thu Mar 27 00:30:52 2025 -0500
[python rocpd] bindings updates
- read_{code_objects,kernel_symbols,nodes,processes,threads}
- support writing perfetto output for regions
- support fallback casting of python object
- defined various data types for python
commit bb048b42f5828ce742947e1d3b72a35c578c0b0c
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Wed Mar 26 22:18:56 2025 -0500
[SDK] hip stream tracing update
- disable hip stream tracing support for HIP compiler API functions
- add "stream_value" field to rocprofiler_callback_tracing_hip_stream_data_t
- data type: rocprofiler_address_t
commit 392a12b20ee795ab025097066444504aac3ddd88
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Wed Mar 26 22:02:22 2025 -0500
[output library] generateRocpd.cpp updates
- generic functions for accessing fields which may/may not exist
- simple timer logging
- populate parent_stack_id
- misc cleanup
commit 86ae21d178a0c52ec47e27414b63edbd2a62a94d
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Wed Mar 26 21:46:34 2025 -0500
[public API] update cxx/serialization/load.hpp
- ROCPROFILER_SDK_CXX_SERIALIZATION_LOAD_DEBUG
- load definitions
- rocprofiler_address_t
- rocprofiler_callback_tracing_code_object_load_data_t
- rocprofiler_callback_tracing_code_object_kernel_symbol_register_data_t
commit 4ba6232c23296791df484d47db8268e4bc997c0d
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Wed Mar 26 21:44:52 2025 -0500
[public API] update cxx/perfetto.hpp
- function: get_perfetto_category for callback tracing and buffer tracing kinds
commit 4c0fcf4395f6c337cf3b955ec45b0567f9b3a477
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Wed Mar 26 21:26:32 2025 -0500
[python] Rename rocpd/schema_data/* files
- {tables,views,indexes}.sql
commit 47b862ece9ed3f16d7a5ecd6b632f13b0086bc01
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Wed Mar 26 21:23:35 2025 -0500
[external] cleanup external/CMakeLists.txt
commit 4e2a71db9b224b11f4144104b4569a5efb45b6c2
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Wed Mar 26 21:23:05 2025 -0500
[samples] Minor tweak to counter collection sample app
commit c62dd55c5e82b4105bdce374dc45c6adf65a0cc4
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Wed Mar 26 21:22:08 2025 -0500
[tool library] Rework kernel rename and stream id data
- simplify and improve memory management
- support stream information for memory allocation
commit 2521d499664bb01c0e2f9f9454b5da5c38b29cfc
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Wed Mar 26 21:18:46 2025 -0500
[output library] Misc reorg and updates
- dropped "with_stream" from various data types
- add stream support to memory allocation records
- generator<T> base class resize function
- serialization load functions for kernel symbols, node_info
- reorganization of SQL code
- cereal::SQLite3InputArchive
commit 70a76d6352dca84241f1749da660f8af8e89c469
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Mon Mar 24 03:52:58 2025 -0500
Misc fixes after rebase
commit 4811201c3e07ac8b5b4edc4028df9cdbc3481bd1
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Thu Nov 14 22:30:43 2024 -0600
Python Bindings
Fix rocpd graph support
- remove uniqueness constraint from subclass
- replace assumption of "graphExec" with "pGraphExec" in the parsed API arguments
pybind11 submodule
Initial roctx python bindings
Fix testing
Add rest of python bindings for ROCTX
Add decorators and context manager for roctx
Fix: restore rocprofiler_reset_python3_cache from CMakeLists
work in progress! added stub test for pybind
added subdirectory for pybind-test
Fixing CMakeLists for pybind-test
Pybind-test
- Move simple-transpose-pybind.py to tests/bin
- Execute test in CMakeLists for pybind-test
Fix python format
Update rocprofv3 python bindings testing
- move to tests/rocprofv3/python-bindings
- remove misc code
- add marker.py
Update roctx python bindings
Update rocprofv3 python bindings test validation
Update rocprofv3 python bindings test validation
- Add checks for marker_api in json format
- Unify checks for csv and json
patch removing tmp trace_processor_python_api file after validate is run, otherwise it blocks the work of others
correction of previous commit: correct patch
removed patch because the sanitizing will be added to perfetto reader
perf reader: removing tmp user-owned file after the data are read
robust handling of removing tmp file via a context
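A minimal sketch of the context-based cleanup idea in the line above, assuming a temporary directory created with Python's standard `tempfile` module. The names `scratch_dir` and `data.tmp` are hypothetical and are not taken from the perfetto reader itself.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def scratch_dir(prefix="trace_processor_"):
    """Create a temporary directory and remove it on exit, even on error."""
    path = tempfile.mkdtemp(prefix=prefix)
    try:
        yield path
    finally:
        # ignore_errors keeps a cleanup failure from masking the original exception
        shutil.rmtree(path, ignore_errors=True)


def demo():
    """Write a file in the scratch directory and report whether cleanup happened."""
    with scratch_dir() as path:
        marker = os.path.join(path, "data.tmp")
        with open(marker, "w") as handle:
            handle.write("temporary")
        existed_inside = os.path.exists(marker)
    # After the with-block the whole directory should be gone
    return existed_inside, os.path.exists(path)
```

The `finally` clause is what makes the removal robust: the directory is deleted whether the block exits normally or through an exception.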
minor clean up perfetto reader, fixing hip-in-libraries-test
Fix generateRocpd for older compilers
Formatting
Remove rocpd-db-to-chrome-tracing console script via pip
Fix PYTHONPATH for rocprofv3-trace-roctx-python-bindings-*
replaced context by PerfettoReader destructor to remove tmp perfetto dir after the test run
formatting python
formatted cmake for pybind-tests
Update contexts and decorators roctx python bindings
added stub Jupyter notebook for python analysis
formatting notebook
Update Jupyter notebook for python analysis
- Create summary tables as views : memory_copy, kernel_dispatch, hip_api
upd the generate_rocpd.py by schema reading and rocpd_node table and queue to rocpd_kernelapi
comments to discuss on generateJson and generateRocpd.cpp
remove comments because they block pull
format generate_rocpd
formatted generateRocpd and generateJSON cpps
formatted the analysis notebook
reset the Rocpd and JSON generate to the state before I added comments because my local formatting with clang-format fails
add using pandas dataframe to generate summaries
added basic pytorch trace example in tests
upd execute toy pytorch test
pretty printing 3 temporary views bypassing pand frames
set instrument cells (functions, constants) in front of usage cells
added experimental routines for creating time slots via markers and counting given api calls within this time slot
formatting
simplified utilities
more simplified utilities, added label usage in report
forgotten commit: example matrix mult
added extracting copy operations by kind, added second db to compare, moved looping over database outside basic procedures
marker around print
replaced push.pop with just markers
added duration and matrixmult flops, lists of kernel names and api names
add bin directory variable to pass to generateRocpd
correcting schema
add bin directory variable to pass to generateRocpd
debug upd for matrixmult
create default timed views as copy of the original db
correct table naming in the original db
removing rocpd_string, rocpd_metadata from views
implemented timed views clean up and creation in notebook
formatted python
formatted files via make format and commented pytorch test
added new line at the end of the tableSchema
Add tracing script for rocpd time slicing views
- Start and end options: timestamps/percentage, markers
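The start/end options above accept either timestamps or percentages. A hedged sketch of how a percentage window could be mapped onto absolute timestamps is shown here; `resolve_window` and its parameters are hypothetical names for illustration and do not describe the actual script.

```python
def resolve_window(min_ns, max_ns, start_pct=0.0, end_pct=100.0):
    """Map a percentage window onto absolute timestamps within [min_ns, max_ns]."""
    if not 0.0 <= start_pct <= end_pct <= 100.0:
        raise ValueError("expected 0 <= start_pct <= end_pct <= 100")
    span = max_ns - min_ns
    # Offsets are computed relative to the first timestamp in the trace
    start_ns = min_ns + int(span * start_pct / 100)
    end_ns = min_ns + int(span * end_pct / 100)
    return start_ns, end_ns
```

For example, with a trace running from 1000 ns to 2000 ns, a 25% to 75% window selects 1250 ns through 1750 ns.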
rebase and format one cpp file
fix source formatting
Rocpd: Time slice data
- Rename tracing.py > time_slice.py
- Adapt time_slice.py to use RocpdImportData, fix typo 'rocpd_api_ops', add output argument (overwrite input otherwise), update rocpd_kernelapi view
- Simplified chrome_tracing.py : remove start+end arguments, avoid manipulating the time window/time slice
Fix rebase issues
[DO NOT MERGE] rocpd-schema.md
Schema updates
rocpd schema v3 updates/fixes
rocpd schema v3 updates/fixes
- samples view
- kernels view
- insert zero duration regions as samples
generateRocp update rocpd_memory_allocate table fill
Misc compilation cleanup
tableSchema.sql updates
Add core performance analysis views:
- Add busy view for GPU utilization metrics
- Add top view for overall performance summary
- Add top_kernels view for kernel performance analysis
- Add memory_copies view for memory transfer tracking
- Add memory_allocates view for memory allocation stats
- Add kernel_summary view for aggregate kernel metrics
utilitySchema.sql clean up TOP view and some consistency changes
added marker views
Update rocpd schema
- new: rocpd_pmc_event
- modified: rocpd_pmc
- modified: _rocpd_memory_copy
- name_id
- add rocdecode API
- populate rocpd_pmc and rocpd_pmc_event
- slight modifications to column names for consistency
memory allocation insert for db
commit after rebase
autoincrement for memory allocation
stringsanitizer applied for pmc description
replaced inner join with left join for stream id, queue id, region name id
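To illustrate why swapping an inner join for a left join matters when an id can be missing, here is a small sketch using Python's standard `sqlite3` module. The `kernel` and `stream` tables are made up for the example and are not the rocpd schema.

```python
import sqlite3


def join_rows(join_kind):
    """Return (kernel name, stream name) pairs using the given join kind."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE stream (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE kernel (id INTEGER PRIMARY KEY, name TEXT, stream_id INTEGER);
        INSERT INTO stream VALUES (1, 'stream-1');
        INSERT INTO kernel VALUES (1, 'with_stream', 1);
        INSERT INTO kernel VALUES (2, 'without_stream', NULL);
    """)
    # join_kind is either 'INNER' or 'LEFT'; it is not user input in this sketch
    query = (
        "SELECT K.name, S.name FROM kernel K "
        f"{join_kind} JOIN stream S ON S.id = K.stream_id ORDER BY K.id"
    )
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows
```

With `INNER`, the kernel without a stream id disappears from the result; with `LEFT`, it is kept and its stream name is `None`.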
Update chrome_tracing.py
- chrome_tracing script updated using new schema
- importer.py pending update with new schema, currently commented out
Misc compilation cleanup after rebase
Update rocpd/time_window.py
- time_window.py script updated using new schema
- Adapt chrome_tracing.py to use time windows
added rocpd_arg and the join view with events, populated
set back default python version to 3.6 for remote
added dereferencing stream ids and usage, some formatting
refine error messages and help arguments in time_windows.py
test if removing exact python version fixes integration test
removed unused variable
formatting and added back exact 3.6 to fix python linting (try)
removed the exact python version again because it causes advanced analysis to fail
revise microseconds to nanoseconds
made type and name text again, put back strict version for python
Reorganization of public cxx serialization headers
Rework generator<T> (abstract base) + file_generator<T>
- return file_generator<T> from buffered_output
Add gotcha submodule
Update cxx serialization headers
Revert some HIP stream changes
[output library] Misc updates
- fix serialization of agent_info
- add parseRocpd.{hpp,cpp}
- move common SQL utils to sql.{hpp,cpp}
[python libraries] Bindings for rocpd + reorg
- move common cmake code into source/lib/python/utilities.cmake
- build python bindings for rocpd (primitive implementation)
- Require ROCPROFILER_BUILD_SQLITE3=OFF for rocpd python bindings
- wrap sqlite3_open / sqlite3_close / sqlite3_open_v2 / sqlite3_close_v2
Update rocpd/importer.py
- importer.py updated using new schema
commit aff19818a5c9f9f6004013144ae00e2c31b21739
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Thu Nov 14 22:27:39 2024 -0600
[SEPARATE PR] HIP API buffer records with args (ext)
- Needs to be in separate PR
- New buffer tracing domain(s) for HIP APIs which include the arguments and the return value in the buffer records
Update HIP stream support for extended HIP buffer tracing
Update rocprofv3 tool library and output library to use extended HIP buffer tracing records
commit 43c3f0ddd5a104346d6db77b8a1b66fd9ec2f797
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Thu Nov 14 18:47:52 2024 -0600
[SEPARATE PR] Combo of several separate PRs
[SEPARATE PR] Update correlation ID retirement
- Needs to be in separate PR
- correlation_id_finalize
[SEPARATE PR] rocprofiler_query_intercept_table_name
- Needs to be in separate PR
- Function to get the name of intercept tables
[SEPARATE PR] Memory copy and memory alloc updates
commit e3da9738b06f974fb6b935893f4172852819b6bc
Author: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Date: Mon Nov 4 17:22:30 2024 -0600
Add SQLite3 build support
Initial SQL writing to rocpd schema in C++
task code
Implement kernel dispatch and memory copy writing to table
Update generateRocpd.cpp
- SQLITE3_CHECK macro
- Error messages for sqlite3_exec errors
- Tweaked rocpd_kernelcodeobject to use kernel ids as id
- fixed some issues:
- use string_entries.at(...) instead of (SELECT id FROM ... WHERE string = ...)
- use custom op_idx instead of relying on correlation ID (one correlation ID can map to multiple ops)
Update generate-rocpd.py
- misc tweaks to mirror generateRocpd.cpp implementation
- add strings for counter names
- use kernel id for rocpd_kernelcodeobject
Tweak tests/bin/hip-graph/hip-graph.cpp
- use stream sync
Update generateRocpd.cpp
- made table and view schema more readable
- sql_exec_callback
Add source/scripts/rocprofv3-db-to-tracing.py
- Script which reads a rocprofv3 rocpd database and outputs a chrome tracing format JSON
Common library updates
- static_tl_object: similar to static_object but for thread-local objects
- additional template metaprogramming constructs
- reverse
- function_traits
rocprofiler_stream_id_t: opaque handle for a stream
- e.g. HIP stream
- the same HIP stream may map to different HSA queues at different points in the application
- added to:
- rocprofiler_buffer_tracing_hip_api_record_t
- rocprofiler_buffer_tracing_memory_copy_record_t
- rocprofiler_callback_tracing_hip_api_data_t
- rocprofiler_callback_tracing_memory_copy_data_t
rocprofiler_stream_id_t: output support
- use stream_id in generatePerfetto.cpp
- use stream_id in generateRocpd.cpp
rocprofiler_stream_id_t: output support
- rocpd_kernelapi and rocpd_copyapi encode stream/queue as integer instead of string
Update source/scripts/rocprofv3-db-to-tracing.py
- Create temporary view from multiple db files
- Modify write_chrome_tracing_json to use the tmp views
Update rocprofv3-db-to-tracing.py
- retain rocpd tables names in create_temp_view instead of appending "_tmp"
- improve the GPU/Agent and Queue identifiers
- make the SQL statements easier to read
- remove rocpd_hsaApi usage (not necessary as HSA data resides in rocpd_api table)
- directly connect to input if only one database
Update generateRocpd.cpp
- use dispatch id for rocpd_kernelapi id (primary key)
- use dispatch_id for rocpd_op sequenceId
- create copy_id for rocpd_copyapi id (primary key)
- use copy_id for rocpd_op sequenceId
Update tests/{async-copy-tracing,kernel-tracing,page-migration}
- expand reading the RCCL and scratch memory trace data
Add sdk::parse::strip function
- utility function for stripping characters from beginning and end of string
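The behavior described above (stripping characters from the beginning and end of a string) can be sketched in Python as follows. This is only an illustration of the idea; the signature of the C++ `sdk::parse::strip` function is not shown in this log, so the parameters here are assumptions.

```python
def strip(text, chars=" \t\n"):
    """Remove any of `chars` from both ends of `text`, leaving the middle untouched."""
    begin, end = 0, len(text)
    # Advance past leading characters that belong to the strip set
    while begin < end and text[begin] in chars:
        begin += 1
    # Retreat past trailing characters that belong to the strip set
    while end > begin and text[end - 1] in chars:
        end -= 1
    return text[begin:end]
```

In Python this is equivalent to the built-in `str.strip(chars)`; the explicit loops are written out only to show the two-ended scan.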
Preliminary rocpd_node table
- rocpd_node is a meta-table for process ids
- added lib/output/node_info.{hpp,cpp}
- add tool::node_info instance to tool::metadata
Update page-migration test
Python packaging of rocpd
add temporary view rocpd_copyapi
Update rocpd/importer.py
- Add rocpd_kernelapi and rocpd_copyapi
- Create meta temporary views using RocpdSchema
JSON tool handling of HIP compiler API data
Misc rocpd updates
- SQL formatting
- remove unused imports
- fix relative import
lib/output metadata updates
- add process_init_ns and process_fini_ns
- add command_line member
rocprofv3 updates: tool.cpp
- simplify buffer service configuration
- rocprofiler_at_intercept_table_registration -> api_timestamps_callback -> start time
- record init, fini, start, and end times
Added simple_timer to common library
Add missing new lines at end of files
f
added labels for multiprocessed, tested kernel-rename, grouped by queue, type-relative of but bug in run-label
- fixed static marker name to dynamic
rocpd fix python linting errors
Squashed commit of JSON changes, small CSV, OTF2, pftrace fixes, and rocpd commandline params
Contains these original commits:
commit 2c7a10771d60ad0b93073f94f6226a6e92ade4cb
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Wed May 14 18:57:30 2025 +0000
removed run markers reporting from otf2 as they duplicate api calls
commit 826c0c13a164e7a5d2c7f1963cc437ee53416658
Author: Young Hui <young.hui@amd.com>
Date: Wed May 14 12:55:51 2025 -0400
rocpd command line updates
- some stats-summary params moved to generic params since applies to json and csv
- added --group-by-queue for perfetto
commit 33a82dcc9c05869f70219b75c76e4d0e6ae84a39
Author: acanadas <acanadas@amd.com>
Date: Wed May 14 07:12:45 2025 -0500
Fix corr_id for kernel in write_csv, fix std dev in summary views
commit b89dfbd02a243c3e1b5d1a4a968ab5b7c9ecb3a3
Author: acanadas <acanadas@amd.com>
Date: Wed May 14 04:03:56 2025 -0500
Fix generate json merged for multiple db
commit 4968f65f51a6539c95a801c1340487a5675b1f45
Author: Jin Tao <jin.tao@amd.com>
Date: Wed May 14 08:02:10 2025 +0000
clean the code for json: remove host function, but keep the basic structure
commit 0f5ed2011a6da5a6046974c03c64569a4747102e
Author: Jin Tao <jin.tao@amd.com>
Date: Wed May 14 07:12:23 2025 +0000
small fix
commit 152c149905e98f33688fdb0cfc5ca88b3c61694d
Author: Young Hui <young.hui@amd.com>
Date: Tue May 13 22:13:35 2025 -0400
Schema revert rocpd_event.correlation_id back to INT
- removed duplicate K.tid from kernel views
- re-enabled some json.cpp code to build
commit 8af5ff79b1f85cd1f2ac4c61af580dc891e6dd70
Author: Jin Tao <jin.tao@amd.com>
Date: Tue May 13 11:13:56 2025 +0000
add host_functions
commit 2a3e0307ab7571be0f46ffd2f706d2c9f34cdce1
Author: acanadas <acanadas@amd.com>
Date: Tue May 13 06:08:10 2025 -0500
Adding summary to write_json, add extra information for sql column type error
commit d339f1cc5eb856e9e3bd7461ac4d776e8265b7cd
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Mon May 12 17:42:47 2025 +0000
fixed counter annotations for kernel dispatch AND shadowed names in json.cpp
commit d3a8ece3135a8cb5a78b05f7ca653793d89d5d5a
Author: acanadas <acanadas@amd.com>
Date: Mon May 12 09:18:56 2025 -0500
Fix node for counters in write_json
commit 5153841b8d047bb4c78aeda5675848f2887aef29
Author: acanadas <acanadas@amd.com>
Date: Mon May 12 08:26:22 2025 -0500
Adding counters. code_objects and kernel_symbols in write_json
commit e979a1ab87d6b148d2b90ae25fc60d5e766e3251
Author: Jin Tao <jin.tao@amd.com>
Date: Mon May 12 13:10:13 2025 +0000
add json counter_collection.records
commit 026d022052122ee36f83022e270110840dc38aa3
Author: Young Hui <young.hui@amd.com>
Date: Sat May 10 00:29:33 2025 -0400
Fix kernel_dispatch thread_id in Perfetto traces, more pftrace args supported and formatting
- perfetto-backend=system results in 0KB pftrace file, but probably need the traced and perfetto daemons (have not tested with them)
commit f1341487723672b407fa89d2d383944d92663c55
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Fri May 9 17:08:40 2025 +0000
fixed counters for perfetto but need more testing
commit 33612f9bc6121e38ad9b41d2aecd035d1d629efb
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Fri May 9 13:34:50 2025 +0000
rearchitected generator tests into a dedicated rocpd-generators suite (to be extended) and JSON file-name input made a parameter
commit da1e386b77bc1bbbbdc1ca8ee2ffd8c09a99ae5b
Author: acanadas <acanadas@amd.com>
Date: Fri May 9 08:36:00 2025 -0500
fixing buffer_record missing values in json, fix using clang_tidy
commit f7adb8b28ab64127ef20ff642ced40ccace7fef8
Author: acanadas <acanadas@amd.com>
Date: Fri May 9 10:08:28 2025 +0000
revise callback_records.counters_collection, still has 3 TBDs
commit 49d4057e7b7e295953915b90cdeb7359948c5034
Author: acanadas <acanadas@amd.com>
Date: Thu May 8 11:10:32 2025 -0500
Fixing correlation_id for csv and json
commit bb37556c9d1da7fe1ccb0bc5bf0e05d4f0989476
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Thu May 8 13:27:01 2025 +0000
This is a combination of 3 commits.
tree node indexing updated, added extended agent struct for otf2 for effective handling agents, agent indexes and labeled agent names
rolled back tree id as process counter, tested well (source and test)
commit 38dc36da08a5a505b6d7d47840690b4d6c29b429
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Thu May 8 13:26:46 2025 +0000
tree node indexing updated, added extended agent struct for otf2 for effective handling agents, agent indexes and labeled agent names
commit 2958ba37abc86e5b03bc03f5d529c0efcfd2ce45
Author: acanadas <acanadas@amd.com>
Date: Thu May 8 08:08:10 2025 -0500
Adding size, kind and operation for buffer_records in write_json
commit 654c42d8a1c341ef1bfbb724eaf6b24e64fb5475
Author: acanadas <acanadas@amd.com>
Date: Thu May 8 07:40:30 2025 -0500
Fixing merge with schema change
commit 01f133bc0fed66d60cb5fe05d9e0b256938c7fb0
Author: acanadas <acanadas@amd.com>
Date: Thu May 8 06:48:58 2025 -0500
fixing write_json merge
commit dd3bbd5ed6faabc2207d9bf933c5c8efb8a2fe16
Author: root <root@smc300x-ccs-aus-GPUF2C5.cs-aus.dcgpu>
Date: Thu May 8 04:45:56 2025 -0500
update write_json adding strings, buffer_records and counters
commit 6a991305bf75f4f2e944b40e7194b43fcf1ec340
Author: Young Hui <young.hui@amd.com>
Date: Thu May 8 01:29:08 2025 -0400
Schema changed rocpd_event.correlation_id to TEXT type [WIP]
- when --kernel-rename is used, SQL stores correlation_id.external.value value with precision error. To store accurately, need to store as TEXT.
- work-in-progress commit, still getting a few rocpd SQL conversion warnings due to TEXT type change.
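The commit above does not say where the precision error arises. The sketch below assumes one plausible cause: an unsigned 64-bit value above 2^63 - 1 does not fit SQLite's signed 64-bit INTEGER, and storing it as a floating-point REAL loses low-order bits, while TEXT round-trips exactly. The table and column names are hypothetical.

```python
import sqlite3


def round_trip(value):
    """Store `value` as REAL and as TEXT, then read both back as integers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ids (as_real REAL, as_text TEXT)")
    # float() mimics what happens when the value is stored as a double
    conn.execute("INSERT INTO ids VALUES (?, ?)", (float(value), str(value)))
    as_real, as_text = conn.execute("SELECT as_real, as_text FROM ids").fetchone()
    conn.close()
    return int(as_real), int(as_text)


# Assumed example value: larger than the signed 64-bit maximum (2**63 - 1)
big = 2**63 + 1
```

Here `round_trip(big)` gives back `2**63` from the REAL column but the exact value from the TEXT column. Note that a later commit in this list reverts `rocpd_event.correlation_id` back to INT.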
commit d50b058dc764394f9fc9cfc264866f43f9354da0
Author: oshkarav_amdeng <oshkarav@amd.com>
Date: Fri May 2 09:32:12 2025 +0000
This is a combination of 2 commits.
fixed clang tidy complaint on otf2 and perfetto, added guid-s t info names, but still 2-process DBs fail
fixed multiprocess otf2, but agents still need to be labeled by guid
Misc fixes
- flush less often for perfetto
- metadata create_agent_index
- remove sqlite3_close from write_otf2
- warning if gotcha_wrap fails
- fix rocpd-generators tests
- fix multi-python build config
Whitespace cleanup
Revert tests/rocprofv3/summary/CMakeLists.txt
Remove unused scripts
- generate-rocpd.py is outdated
- rocprofv3-db-to-tracing.py is scratch
- simple-transpose-pybind.py is no longer used
Fix maybe-uninitialized compiler warning
Update rocpd.write_X functions to require RocpdImportData instance
- create libpyrocpd.RocpdImportData
- rocpd.importer.RocpdImportData inherits from libpyrocpd.RocpdImportData
- write_csv, write_json, etc. all require instance of RocpdImportData instead of list of sqlite3.Connection
output/rocpd source reorganization
- move lib/output/parseRocpd.* to lib/python/rocpd/source/common.*
- move lib/output/serialization to lib/python/rocpd/source/serialization
- move lib/output/sql/generator.hpp to lib/python/rocpd/source/sql_generator.hpp
Minor refactor of lib/output/sql/*
Remove lib/output/sql.hpp
- this file just included other headers
Fix rocpd source reorg
Update schema files
- replace quotes with backticks
remove short guid from perfetto
fix string sanitization in generateRocpd.cpp
* Update cereal submodule
* Updates following rebase
- remove MANIFEST.in
- remove rocpd/schema_data/*.sql
- use rocprofiler-sdk-rocpd
* rocpd command line update to pass importData instead of connection
* Update get_perfetto_category for ROCDECODE_API_EXT
* data_views.sql updates
- remove kernels_renamed view
- remove rccl view
- remove rocjpeg view
- remove rocdecode view
- remove api_regions view
- remove api_threads view
* Perfetto, OTF2, output config updates
- Support kernel rename for output config
- Combine memory copy and kernels into same stream track
- Fix get_category_string in perfetto
- Support kernel renaming in OTF2 and Perfetto
* Add samples to perfetto
* ORDER BY for perfetto regions/samples
* Fix busy view, fix integer overflow in summary views
* CSV adding --stats-per-rank and --stats-summary-per-rank reports
Squash of these original commits:
commit 4c0ded4efb6ef273953faf7fd100a54ce16ae00b
Author: Jin Tao <jintao12@amd.com>
Date: Mon Jun 2 07:25:06 2025 +0000
change all rank output to PID instead of GUID
commit 969950b009ad18a156a31bfee43f64e032720262
Author: Jin Tao <jintao12@amd.com>
Date: Wed May 28 11:22:29 2025 +0000
Change --stats-per-rank and --stats-summary-per-rank to csv.py
commit 970f88dec3bb83663d8a84d8bf4dfcfb8857e903
Author: Jin Tao <jintao12@amd.com>
Date: Thu May 22 11:39:51 2025 +0000
refactor and small bug fix
commit a4c5ab4290c0a9b7b45421c3fc21150914d12110
Author: Jin Tao <jintao12@amd.com>
Date: Thu May 22 10:59:13 2025 +0000
add args for --stats-per-node and --stats-summary-per-node
commit 432907d0a54de886d01f962f3084c6708bdb1456
Author: Jin Tao <jintao12@amd.com>
Date: Thu May 22 09:22:55 2025 +0000
add summary for marker, rccl, rocdecode, rocjpeg_calls
commit ab53c9c8f4a6f5f7bf40a32bf1f93add1dbd10f5
Author: Jin Tao <jintao12@amd.com>
Date: Wed May 21 13:33:04 2025 +0000
add hsa_api_stats_by_node, hip_api_stats_by_nod
commit b26974bdd6d0047759b4014f9c226cf771f0a2ad
Author: Jin Tao <jintao12@amd.com>
Date: Wed May 21 11:14:07 2025 +0000
Add memory allocation stats by node and memory copy stats by node
commit d02819c2c4bd57bfeb0537d3a42d9f22866e8600
Author: Jin Tao <jintao12@amd.com>
Date: Wed May 21 08:33:08 2025 +0000
Kernels (or categories) by node completed
* JSON fixes for agent in memory allocation if free in write_json, add group by nid for _summary_node views
Squash of these original fixes:
commit 25d8d985b319585fab16eb6d8e9ec5c91e767a91
Author: acanadas <acanadas@amd.com>
Date: Thu May 22 14:00:20 2025 +0000
fix agent for memory allocation if free in write_json, add group by nid for _summary_node views
commit d64905418a307db9e56b816e4b9f4de587bc3a14
Author: acanadas <acanadas@amd.com>
Date: Thu May 22 08:55:21 2025 +0000
Fix typo json.cpp for memory copy src agent
* CSV add domain_stats_per_rank and PID header fix
Squash of these 3 commits:
commit 10ec1cf40cc92a9d7ce7db84a8c6bbe20c51a340
Author: Jin Tao <jintao12@amd.com>
Date: Tue Jun 3 10:06:33 2025 +0000
small fix PID header rename
commit 6775fd3b2031b30ae389b165cf44b321026c1e80
Author: Jin Tao <jintao12@amd.com>
Date: Mon May 26 11:47:23 2025 +0000
Add domain_stats_per_rank
commit 9f3eb1793e354c1f04bab1fbc837e86f1f5b5030
Author: Jin Tao <jintao12@amd.com>
Date: Thu May 22 12:11:28 2025 +0000
add domain_summary_node view
CSV PID header formatting
* upd otf2 generate validation test went well
* fix Cmake formatting
* In data_views, change P.id/T.id to P.pid/T.tid
* Remove unapproved views for this PR
* Remove rocpd JSON output support
* Fix clang-tidy errors
* Revert output_config changes + cmake + remove chrome_tracing.py
* Revert
* Misc rocpd cleanup
- remove all CsvType per-node and stats enum values
- remove rocpd::types::marker
- remove rocpd/source/stats_summary.{hpp,cpp}
- add `sample_regions` view
- add `regions_and_samples` view
- move tests/rocprofv3/rocpd-generators to tests/rocprofv3/rocpd
- merge validate_perfetto.py and validate_otf2.py into single validate.py
* Remove stats options from rocpd.output_config
* Remove json from rocpd.__main__
* Remove rocpd.csv summary/stats options
* Remove generate_from_rocpd.py
* Add rocpd subparser (convert) + migrate tests to use python3 -m rocpd
* Add additional tests for the rocpd command line
- Check the --help flag works for:
- python3 -m rocpd --help
- python3 -m rocpd convert --help
- python3 -m rocpd.csv --help
- python3 -m rocpd.otf2 --help
- python3 -m rocpd.pftrace --help
* adding rocpd python shebangs and main function parameter ordering
* Fix sanitizer tests + remove read_code_objects and read_kernel_symbols
* Misc updates
- Update CHANGELOG.md and source/lib/python/rocpd/README.md
- update time_window.py
- find min/max in regions_and_samples/kernels/memory_allocations/memory_copies
- RocpdImportData supports all the same functions as sqlite3.Connection
* Improve time_window.py
- find the tables with start/end dynamically
- find the tables with timestamp dynamically
* Minor revert to time_window.py
* Remove tests/rocprofv3/pytorch-tests
* Fix python 3.6 error in time_window.py
- 'type' object is not subscriptable
* Fix rocpd installation
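The time_window.py items above mention finding the tables with start/end columns dynamically. A hedged sketch of one way to do that with Python's standard `sqlite3` module follows; `tables_with_columns` and the demo tables are hypothetical and are not the actual time_window.py code or rocpd schema.

```python
import sqlite3
from typing import List


def tables_with_columns(conn: sqlite3.Connection, required: List[str]) -> List[str]:
    """Return names of tables and views that contain every column in `required`."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type IN ('table', 'view') ORDER BY name"
        )
    ]
    matches = []
    for name in names:
        quoted = name.replace('"', '""')
        # PRAGMA table_info yields one row per column; index 1 is the column name
        columns = {row[1] for row in conn.execute(f'PRAGMA table_info("{quoted}")')}
        if set(required) <= columns:
            matches.append(name)
    return matches


def demo() -> List[str]:
    """Build a throwaway database and list the tables that have start/end columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE regions (id INTEGER, "start" INTEGER, "end" INTEGER);
        CREATE TABLE samples (id INTEGER, timestamp INTEGER);
        CREATE TABLE strings (id INTEGER, string TEXT);
    """)
    found = tables_with_columns(conn, ["start", "end"])
    conn.close()
    return found
```

The annotations use `typing.List` rather than `list[str]` because subscripting the built-in `list` in an annotation raises "'type' object is not subscriptable" on Python versions before 3.9, which matches the Python 3.6 error noted above.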
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Young Hui <young.hui@amd.com>
Co-authored-by: acanadas <acanadas@amd.com>
Co-authored-by: Jin Tao <jintao12@amd.com>
Co-authored-by: oshkarav_amdeng <oshkarav@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
7dc9c6c391 |
[ROCm 7.0] [PC Sampling] [Tests] Using yaml input directory (#435)
Using yaml input directory
[ROCm/rocprofiler-sdk commit:
|
||
|
|
0e93099fd7 |
[rocprofv3] SQLite3 database output (rocpd) support + rocprofiler-sdk-rocpd (#403)
* [rocprofv3] rocpd SQLite3 database output support
* Move counters xml and yaml to source/share/rocprofiler-sdk
- more representative of install hierarchy
* Add share/rocprofiler-sdk/rocpd SQL files
* Experimental rocprofiler-sdk SQL API
* rocprofv3 default output format is rocpd
* Fix rocpd event ids for counter collection w/o kernel dispatch
* Remove fktable entries from rocpd_tables.sql
* Fix rocpd schema path
* Fix install component for roctx python bindings
* rocprofiler-sdk-rocpd
- create include/rocprofiler-sdk-rocpd
- create rocprofiler-sdk-rocpd library, package, etc.
- default all "guid" fields to "{{guid}}" in tables
- remove "{{view_uuid}}" support (always unused)
* Migrate rocprofv3 to use rocprofiler-sdk-rocpd
* Fix missing foreign key reference
* Revert change
* Fix cmake comment
* Fix maybe-uninitialized compiler warning
* Add logging to rocpd_sql_load_schema
* Improve string sanitization when inserting json strings
* Initialize rocpd logging on rocprofiler-sdk-rocpd library load
* Revert lib/output/generatePerfetto.cpp changes
* [temporary] Tweak rocprofv3-test-list-avail-trace-execute test log level
* Update get_install_path for lib/rocprofiler-sdk-rocpd/sql.cpp
- try to resolve issues on RHEL/SLES for dladdr
* Update lib/common/logging.cpp
- enable environ overrides
* dlsym for rocpd_sql_load_schema
* Make dl_info.dli_fname lexically normal
* Implement node_info alternatives if /etc/machine-id does not exist
* Misc include fixes
* SHA256 and UUIDv7 support
* Implement UUIDv7 in generateRocpd.cpp
* Support push/pop environment variables
* Minor tweak
* Fix glog segfaults when unsetting glog env
* Updated CHANGELOG
* Updates tests/pytest-packages
- rocpd_reader.py: RocpdReader
* Update tests / marker_views.sql
- add test_rocpd_data
* Update rocpd_tables.sql
- Use AUTOINCREMENT
- insert "uuid" and "guid" into rocpd_metadata
* Minor updates to generateRocpd.cpp
- don't quote GUID
- use sqlite3_open_v2
- use sqlite3_close_v2
* Update execute_raw_sql_statements_impl
- uses sqlite3_last_insert_rowid for autoincrement
* Update SQL deferred_transaction
- CI check for nullptr to connection
* Apply suggestions from code review
Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>
* Code review updates
- formatting
- replace if with switch
- remove loop for {{uuid}}
* Fix pmc_groups handling in rocprofv3
* Address code review feedback
- Include rocm_version in rocprofv3 version info
- Note `--version` option for `rocprofv3` in CHANGELOG.md
- remove commented out code
* Fix packaging dependencies
* Fix install package step of CI workflow
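One item above notes that `execute_raw_sql_statements_impl` uses `sqlite3_last_insert_rowid` for autoincrement. The same idea can be sketched from Python, where `cursor.lastrowid` exposes the row id of the last inserted row. The `strings` table and `insert_strings` helper are hypothetical, not the rocpd schema or implementation.

```python
import sqlite3


def insert_strings(values):
    """Insert each value and record the id that SQLite assigned to it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE strings (id INTEGER PRIMARY KEY AUTOINCREMENT, string TEXT)"
    )
    ids = {}
    for value in values:
        cursor = conn.execute("INSERT INTO strings (string) VALUES (?)", (value,))
        # lastrowid is the Python counterpart of the C call sqlite3_last_insert_rowid()
        ids[value] = cursor.lastrowid
    conn.close()
    return ids
```

Reading the id back this way avoids a follow-up `SELECT` to discover which key the database generated for the new row.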
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Welton, Benjamin <Benjamin.Welton@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
8c9270f59c |
Fix logic error in PR #410 (#414)
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
Co-authored-by: Ma, Bing <Bing.Ma@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
4d29b74750 |
Unit test fix - calculate aggregate sum based on instance size (#410)
* Unit test fix- calculate aggregate sum based on instance size
* Update GRBM instance size
* aggregate sum
---------
Co-authored-by: Sushma Vaddireddy <svaddire@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
b097e276a9 |
[rocprofv3] Add rocpd output support (part 1: prelude) (#401)
* [rocprofv3] Add rocpd output support (part 1: prelude)
- git submodules for sqlite3, GOTCHA, and pybind11
- HIP stream data
- rocprofiler_query_intercept_table_name(...)
- serialization load
- rocprofiler::sdk::get_perfetto_category(KindT)
- rocprofiler::sdk::parse::strip
- common library updates
- md5sum
- hasher
- simple_timer
- static_tl_object
- get_process_start_time_ns(pid_t)
- output library updates
- node_info
- file_generator (generator is now virtual base class)
- stream info updates
* Added submodules
* Code review updates
* Minor unused-but-set-X warning fixes
* Update CI
- install libsqlite3-dev package
* Fix static thread-local object memory leak
- also fix signal handler chaining
* Remove URL from comment
* Remove page migration exception
* Enable ROCPROFILER_BUILD_SQLITE3 by default
- try find_package(SQLite3) first and then build when ROCPROFILER_BUILD_SQLITE3=ON
* Fix gotcha installation
- make install of target optional
* Validate tracing + counter collection dispatch data
- i.e. correlation ids, thread ids, timestamps
* Make find_package(SQLite3) optional
- ROCm CI does not have SQLite3 dev package installed and cannot build from source (missing tclsh)
* Fixes to tracing + counter collection test
* get_process_start_time_ns update
- original implementation did not work
* Fix pytest-packages test_perfetto_data for counter collection
- erroneous failure when used with same PMC + multiple agents
* cmake policy: option() honors normal variables
- for GOTCHA submodule
* Improve samples/api_buffered_tracing stability
- reduce likelihood of sporadic exception throw
* Update gotcha submodule
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
682b9967e0 |
[RSERP-1802] Add trace decoder to API (#398)
* Add trace decoder to API.
* Cleanup and activity
* Rename
* Minor fix
* Replace tt/TT with thread_trace/THREAD_TRACE
- public API types are not abbreviated
* Fix aliases
* Build system updates
- activate clang-tidy for all subfolders in lib
- fix addition of sources for att-tool
* Fix clang-tidy issues with lib/att-tool/counters.{hpp,cpp}
* Delete counters.cpp
* Formatting
---------
Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
cfce653d86 |
[SDK] Standardize rocprofiler-sdk counter definition YAML schema (#370)
* Convert YAML Format
Convert YAML format and reader to properly read the YAML.
Comparison between outputs from the YAML shows only changes in ordering
of architectures (and ids).
* Test fixes
* Add script for converting the YAML schema to source/scripts
* Update documentation
* Change the extra counter code block to YAML
* Add missing new line at EOF
* remove name issues
---------
Co-authored-by: Benjamin Welton <bewelton@amd.com>
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
d9bd0903f8 |
[PSDB] [CI] SWDEV-528922: Modify summary MEMORY_ALLOCATION for summary.txt (#375)
* Update validate.py
* Increase memalloc to 4 to account for new vmem.
* Format.
* Modify test to work both when vmem is in use and when it is not.
* address comments.
---------
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
24f054f509 |
Fix HIP Streams Duplication Error (#313)
* Fix stream duplication and fixed tests
* Added comments to explain stream.cpp code, changed stream nullptr check to occur in update table to prevent re-adding a null stream, simplified hip-streams bin file code, added destroyStreams to hip-streams bin file code
* Removed roctx from CMakeLists.txt
* Updated documentation
* Fix documentation
* Removed update_table for HIP compiler table and updated stream.cpp to remove support for HIP compiler table
* Added runtime initialization check for HIP
* Changed tool name, working on fixing memory management
* Added context for counter collection kernel rename combination
* Changed name from map to set and changed description
* Fix documentation description for group-by-queue
* Merged memory copy and kernel operations onto a single track when on the same stream
* Updated perfetto output to remove hardware information from track names so that all memory copy and kernel operations on the same stream merge onto the same track
* Most pr comments addressed
* Added filter for counter collection and removed kernel buffer tracing hack
* Added PR comment fixes
---------
Co-authored-by: Madsen, Jonathan <Jonathan.Madsen@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
49486fee5e |
[CI] Fix code coverage and thread sanitizer workflows (#378)
* Fix code coverage workflow
* Relocate rocprofv3 conversion test script + rename tests
- these are rocprofv3 tests and were not properly located and not properly named
* Fix thread sanitizer
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
9f7703f918 |
Build system (libdw), correlation ID, and shebang fixes (#354)
* Fix compilation for output library
- link to targets for ATT (amd-comgr, dw, elf)
* Relax correlation ID retirement log failures
- only fail for correlation ID retirement underflow when building in CI mode
* Fix shebang for several files
- license was inserted before shebang in several places
* Update code coverage exclude folders for samples
* Tweak to agent tests
- test to make sure hsa agent is not the old value instead of testing that it is the new value
* Fix libdw include/link
---------
Co-authored-by: Jonathan R. Madsen <jonathanrmadsen@gmail.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
f91f0712f7 |
SWDEV-528686: ATT fix for gfx12 s_wait_idle. Fixes for csv. Default to parse to trace. Fix for ROCR_VISIBLE_DEVICES. (#345)
* Fix for gfx12 s_wait_idle. Added wait field on att.csv
* Format and default to ATT to trace
* Update .mds
* No fatal error for invalid agent
* Tidy fixes
* Rename wait to idle, removed unneeded headers
* Remove unused traceID
* Tidy fix
* Fix csv output
* Formatting
* Fix tests
* Fix tests
* Fix for visible devices
* Review comment: Fix cmake
* Review suggestion
* Remove changelog/readme
* Review comments
* Review comment for CSV
* Formatting
---------
Co-authored-by: Giovanni Baraldi <gbaraldi@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
1264ffeb45 |
[PSDB] [CI] SWDEV-528922: Modify summary MEMORY_ALLOCATION kind tests to account for new hip runtime memory manager (#368)
* Modify summary MEMORY_ALLOCATION kind tests to account for the new hip runtime memory manager, which adds memory pool allocations.
* address comments.
---------
Co-authored-by: Venkateshwar Reddy Kandula <vkandula@amd.com>
[ROCm/rocprofiler-sdk commit:
|
||
|
|
3b63300a67 |
Video file location change for rocDecode (#353)
* Video file location change for rocDecode
* Formatting
* Changed rocJPEG images directory not found to a warning as well
* Minor name update
* Fix rocJPEG disable variable
[ROCm/rocprofiler-sdk commit:
|